I am working with a large data set (500k to 1M records) and need to filter these records in milliseconds.
Currently I am using a List with the FindAll method in C#, but it takes at least 1 second to filter 200k records.
I am using something like this:
var FilteredRecords = ListofAllRecords.FindAll(row => row["ID"].ToString().StartsWith("value"));
Is there any other faster way to do this (in the order of milliseconds)?
I got a hint from your replies and solved my problem using the code below:
var FilteredRecords = ListofAllRecords.AsParallel().Where(row =>
row.Field<string>("ID").StartsWith("value"));
Now it takes only milliseconds.
Actually, FindAll is the fastest, because it doesn't use the iterator pattern like Where (LINQ).
What you can try is doing the filtering in parallel:
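var filteredRows = new List<DataRow>(ListofAllRecords.Count / 5);
var sync = new object();
Parallel.For(0, ListofAllRecords.Count, i =>
{
    var row = ListofAllRecords[i];
    if (row.Field<string>("ID").StartsWith("value"))
    {
        // List<T>.Add is not thread-safe, so guard the shared list with a lock.
        lock (sync) { filteredRows.Add(row); }
    }
});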
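The parallel filtering idea above can also be written with PLINQ, which collects the matches into a list without any manual synchronization. The following is only a minimal sketch: the `Record` type is a hypothetical stand-in for a `DataRow` with an "ID" column, and `FilterByPrefix` is an illustrative name, not something from the original posts. It also assumes an ordinal (culture-insensitive) prefix match is acceptable, which is usually cheaper than the culture-sensitive `StartsWith(string)` overload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelFilterSketch
{
    // Hypothetical record type standing in for a DataRow with an "ID" column.
    public sealed class Record
    {
        public string Id { get; set; }
    }

    // Filters in parallel; PLINQ merges the per-thread results safely.
    // Result order is not guaranteed unless AsOrdered() is added.
    public static List<Record> FilterByPrefix(IReadOnlyList<Record> records, string prefix)
    {
        return records
            .AsParallel()
            .Where(r => r.Id != null && r.Id.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }

    public static void Main()
    {
        // 200,000 records; every even index gets an ID starting with "value".
        var records = Enumerable.Range(0, 200000)
            .Select(i => new Record { Id = (i % 2 == 0 ? "value" : "other") + i })
            .ToList();

        var filtered = FilterByPrefix(records, "value");
        Console.WriteLine(filtered.Count); // 100000
    }
}
```

Whether this actually beats a plain FindAll depends on the core count and on how expensive the predicate is, so it is worth measuring on the real data.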
I have a collection of data = IEnumerable<AnalyticsData> and I'm trying to group by multiple properties and Sum() on an integer column. The end result will be a collection of AnalyticsReportRow<dynamic>() as you can see below, though this isn't highly relevant.
In the final Select() method, I want to pass an object in, ideally from the original set, and would prefer not to recreate one in the middle of my chained queries if possible. Most of the examples seem to create either a new strongly-typed or dynamic object to pass into the next link in the chain.
Here's what I have spent a few hours trying to work with, and this returns the set as it is in the first code block below with all rows (I export to CSV, hence the formatting):
var pageViewsData = analyticsData.GroupBy(data => new { g1 = data.Webproperty, pv = data.PageViews, d = data })
.GroupBy(data => new { gg1 = data.Key.g1, dd = data.Key.d })
.Select(data => new AnalyticsReportRow<dynamic>(data.Key.dd, "Page_Views", data.Sum(datas => datas.Key.pv)));
Result is this:
"CustomerA","","","","","Page_Views",0,"A1-810","","",2,"4/10/2015 16:08:33"
"CustomerA","","","","","Page_Views",0,"A1-810","","",2,"4/10/2015 16:08:33"
"CustomerA","","","","","Page_Views",0,"GT-N8013","","",2,"4/10/2015 16:08:33"
"CustomerA","","","","","Page_Views",0,"GT-P3113","","",7,"4/10/2015 16:08:33"
"CustomerA","","","","","Page_Views",0,"GT-P3113","","",2,"4/10/2015 16:08:33"
"CustomerA","","","","","Page_Views",0,"GT-P3113","","",3,"4/10/2015 16:08:33"
"CustomerA","","","","","Page_Views",0,"GT-P3113","","",3,"4/10/2015 16:08:33"
"CustomerA","","","","","Page_Views",0,"GT-P3113","","",2,"4/10/2015 16:08:33"
And would like to end up with a Sum() on the second-last column, grouped by customer and then by device. For example:
"CustomerA","","","","","Page_Views",0,"A1-810","","",4,"4/10/2015 16:08:33"
"CustomerA","","","","","Page_Views",0,"GT-N8013","","",2,"4/10/2015 16:08:33"
"CustomerA","","","","","Page_Views",0,"GT-P3113","","",17,"4/10/2015 16:08:33"
I am having a hard time wrapping my head around the logic and could really use an example of how to group like this, even pseudocode and dynamic types.
Thank you.
After spending several hours on this today, and posting my question, I decided to try a few more things and read some more documentation on GroupBy().
As it turns out, I was missing the fact that you can provide a Key Selector and an Element Selector to the GroupBy method as explained in the MSDN documentation. If I understand correctly, this provides the ability to have a distinct qualifier that tells the query how to group.
In the end, this appears to give me what I need. I would really like some feedback on this to make sure I'm going about it correctly:
var pageViewsData = analyticsData.Where(data => data.PageViews > 0)
.GroupBy(data => new { g1 = data.Webproperty, g2 = data.DeviceModel }, data => data)
.Select(data => new AnalyticsReportRow<dynamic>(data.FirstOrDefault(), "Page_Views", data.Sum(d => d.PageViews)));
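To make the key selector / element selector distinction concrete, here is a small self-contained sketch. The `AnalyticsData` class below is a hypothetical, trimmed-down stand-in for the real type (only the three properties used here), the sample values are made up, and `SumPageViews` is an illustrative name. The key selector decides which rows end up in the same group; the element selector decides what each group contains, here just the PageViews integer, so that `Sum()` needs no further lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupBySketch
{
    // Hypothetical, trimmed-down stand-in for the AnalyticsData type in the question.
    public sealed class AnalyticsData
    {
        public string Webproperty { get; set; }
        public string DeviceModel { get; set; }
        public int PageViews { get; set; }
    }

    public static List<(string Webproperty, string DeviceModel, int PageViews)> SumPageViews(
        IEnumerable<AnalyticsData> rows)
    {
        return rows
            .GroupBy(
                d => new { d.Webproperty, d.DeviceModel }, // key selector: what defines a group
                d => d.PageViews)                          // element selector: what each group holds
            .Select(g => (g.Key.Webproperty, g.Key.DeviceModel, g.Sum()))
            .ToList();
    }

    public static void Main()
    {
        var data = new List<AnalyticsData>
        {
            new AnalyticsData { Webproperty = "CustomerA", DeviceModel = "A1-810", PageViews = 2 },
            new AnalyticsData { Webproperty = "CustomerA", DeviceModel = "A1-810", PageViews = 2 },
            new AnalyticsData { Webproperty = "CustomerA", DeviceModel = "GT-N8013", PageViews = 3 }
        };

        foreach (var row in SumPageViews(data))
        {
            Console.WriteLine($"{row.Webproperty} {row.DeviceModel} {row.PageViews}");
        }
        // CustomerA A1-810 4
        // CustomerA GT-N8013 3
    }
}
```

Anonymous types compare by value, which is why two rows with the same Webproperty and DeviceModel land in the same group.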
Try something like this:
var query = from d in analyticsData
group d by new { d.Webproperty, d.DeviceModel }
into g
select new
{
g.Key.Webproperty,
g.Key.DeviceModel,
Total = g.Sum(it => it.PageViews)
};
var result = query.ToList();
Let's say I'm doing a LINQ query like this (this is LINQ to Objects, BTW):
var rows =
from t in totals
let name = Utilities.GetName(t)
orderby name
select t;
So the GetName method just calculates a display name from a Total object and is a decent use of the let keyword. But let's say I have another method, Utilities.Sum() that applies some math on a Total object and sets some properties on it. I can use let to achieve this, like so:
var rows =
from t in totals
let unused = Utilities.Sum(t)
select t;
The thing that is weird here, is that Utilities.Sum() has to return a value, even if I don't use it. Is there a way to use it inside a LINQ statement if it returns void? I obviously can't do something like this:
var rows =
from t in totals
Utilities.Sum(t)
select t;
PS - I know this is probably not good practice to call a method with side effects in a LINQ expression. Just trying to understand LINQ syntax completely.
No, there is no LINQ method that performs an Action on all of the items in the IEnumerable<T>. It was very specifically left out because the designers actively didn't want it to be in there.
Answering the question
No, but you could cheat by creating a Func which just calls the intended method and returns a dummy value, a bool for example:
Func<Total, bool> dummy = (total) =>
{
Utilities.Sum(total);
return true;
};
var rows = from t in totals
let unused = dummy(t)
select t;
But this is not a good idea - it's not particularly readable.
The let statement behind the scenes
What the above query will translate to is something similar to this:
var rows = totals.Select(t => new { t, unused = dummy(t) })
.Select(x => x.t);
So if you prefer method syntax over query syntax, another option is:
var rows = totals.Select(t =>
{
Utilities.Sum(t);
return t;
});
A little better, but still abusing LINQ.
... but what you should do
But I really see no reason not to simply loop over totals separately:
foreach (var t in totals)
Utilities.Sum(t);
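One concrete reason to prefer the plain loop: LINQ queries are lazily evaluated, so a side effect placed inside Select does not run until the query is enumerated, and it runs again on every enumeration. The sketch below illustrates this under stated assumptions: `Total` and `Sum` are hypothetical stand-ins for the question's `Total` type and `Utilities.Sum`, and the `Calls` counter exists only to make the behavior visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSideEffectSketch
{
    // Hypothetical stand-in for the Total type in the question.
    public sealed class Total
    {
        public int A { get; set; }
        public int B { get; set; }
        public int Result { get; set; }
    }

    // Counts how many times Sum has been invoked.
    public static int Calls;

    // Hypothetical stand-in for Utilities.Sum: mutates its argument and returns void.
    public static void Sum(Total t)
    {
        Calls++;
        t.Result = t.A + t.B;
    }

    // Returns the call count before enumeration, after one pass, and after two passes.
    public static int[] CountCalls(List<Total> totals)
    {
        Calls = 0;
        var rows = totals.Select(t => { Sum(t); return t; });

        int before = Calls;      // the query is only defined, nothing has run yet
        rows.ToList();
        int afterFirst = Calls;  // one call per element
        rows.ToList();
        int afterSecond = Calls; // the side effect ran again for every element

        return new[] { before, afterFirst, afterSecond };
    }

    public static void Main()
    {
        var totals = new List<Total>
        {
            new Total { A = 1, B = 2 },
            new Total { A = 3, B = 4 }
        };
        Console.WriteLine(string.Join(", ", CountCalls(totals))); // 0, 2, 4
    }
}
```

A foreach loop runs exactly once at the point where it is written, which makes the mutation much easier to reason about.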
You should download the "Interactive Extensions" (NuGet Ix-Main) from Microsoft's Reactive Extensions team. It has a load of useful extensions. It'll let you do this:
var rows =
from t in totals.Do(x => Utilities.Sum(x))
select t;
It's there to allow side-effects on a traversed enumerable.
Please read my comment to the question. The simplest way to achieve this kind of functionality is a query like this:
var rows = from t in totals
group t by t.name into grp
select new
{
Name = grp.Key,
Sum = grp.Sum(x => x.Value) // Value is a placeholder for the numeric property to total
};
The query above returns an IEnumerable of anonymous objects.
For further information, please see: 101 LINQ Samples
I have lots of queries like sample 1, sample 2 and sample 3. There are more than 13 million records in the MongoDB collection, so these queries take a long time. Is there any way to make them faster?
I am thinking of using an IMongoQuery object to solve this problem. Is there a better way?
Sample 1:
var collection = new MongoDbRepo().DbCollection<Model>("tblmodel");
decimal total1 = collection.FindAll()
.SelectMany(x => x.MMB.MVD)
.Where(x => x.M01.ToLower() == "try")
.Sum(x => x.M06);
Sample 2:
var collection = new MongoDbRepo().DbCollection<Model>("tblmodel");
decimal total2 = collection.FindAll().Sum(x => x.MMB.MVO.O01);
Sample 3:
var list1= collection.FindAll()
.SelectMany(x => x.MHB.VLH)
.Where(x => x.V15 > 1).ToList();
var list2= list1.GroupBy(x => new { x.H03, x.H09 })
.Select(lg =>
new
{
Prop1= lg.Key.H03,
Prop2= lg.Count(),
Prop3= lg.Sum(w => w.H09),
});
The function FindAll returns a MongoCursor. When you add LINQ extension methods on to the FindAll, all of the processing happens on the client, not the Database server. Every document is returned to the client. Ideally, you'll need to pass in a query to limit the results by using Find.
Or, you could use the AsQueryable function to better utilize LINQ expressions and the extension methods:
var results = collection.AsQueryable().Where(....);
I don't understand your data model, so I can't offer any specific suggestions as to how to add a query that would filter more of the data on the server.
You can use the SetFields chainable method after FindAll to limit the fields that are returned if you really do need to return every document to the client for processing.
You also might find that writing some of the queries using the MongoDB aggregation framework might produce similar results, without sending any data to the client (except the results). Or, possibly a Map-Reduce depending on the nature of the data.
There are quite a few other questions similar to this, but none of them seem to do what I'm trying to do. I'd like to pass in a list of strings and query
SELECT ownerid where sysid in ('', '', '') -- i.e. List<string>
or like
var chiLst = new List<string>();
var parRec = Lnq.attlnks.Where(a => a.sysid IN chiLst).Select(a => a.ownerid);
I've been playing around with a.sysid.Contains() but haven't been able to get anywhere.
Contains is the way forward:
var chiLst = new List<string>();
var parRec = Lnq.attlnks.Where(a => chiLst.Contains(a.sysid))
.Select(a => a.ownerid);
Although you'd be better off with a HashSet<string> instead of a list, in terms of performance, given all the contains checks. (That's assuming there will be quite a few entries... for a small number of values, it won't make much difference either way, and a List<string> may even be faster.)
Note that the performance aspect is assuming you're using LINQ to Objects for this - if you're using something like LINQ to SQL, it won't matter as the Contains check won't be done in-process anyway.
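For the LINQ to Objects case, the HashSet suggestion could look roughly like the sketch below. The `AttLnk` class is a hypothetical in-memory stand-in for the rows of `Lnq.attlnks` (only the two members used in the question), and `OwnersFor` is an illustrative name. The set is built once, so each Contains call is a hash lookup rather than a scan of the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsSketch
{
    // Hypothetical in-memory stand-in for a row of Lnq.attlnks.
    public sealed class AttLnk
    {
        public string sysid { get; set; }
        public string ownerid { get; set; }
    }

    // Equivalent of: SELECT ownerid WHERE sysid IN (...)
    public static List<string> OwnersFor(IEnumerable<AttLnk> links, IEnumerable<string> sysIds)
    {
        // Build the lookup set once; duplicates in sysIds are collapsed.
        var wanted = new HashSet<string>(sysIds);

        return links
            .Where(a => wanted.Contains(a.sysid))
            .Select(a => a.ownerid)
            .ToList();
    }

    public static void Main()
    {
        var links = new List<AttLnk>
        {
            new AttLnk { sysid = "s1", ownerid = "o1" },
            new AttLnk { sysid = "s2", ownerid = "o2" },
            new AttLnk { sysid = "s3", ownerid = "o3" }
        };
        var chiLst = new List<string> { "s1", "s3" };

        Console.WriteLine(string.Join(", ", OwnersFor(links, chiLst))); // o1, o3
    }
}
```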
You wouldn't call a.sysid.Contains; the syntax for IN (SQL) is the reverse of the syntax for Contains (LINQ)
var parRec = Lnq.attlnks.Where(a => chiLst.Contains(a.sysid))
.Select(a => a.ownerid);
In addition to the Contains approach, you could join:
var parRec = from a in Lnq.attlnks
join sysid in chiLst
on a.sysid equals sysid
select a.ownerid;
I'm not sure whether this will do better than Contains with a HashSet, but it will at least have similar performance. It will certainly do better than using Contains with a list.