Slow Performance of Linq Where statement - c#

I have a List of objects (roughly 100k) that I must iterate over in order to produce a Dictionary.
However, the code is performing very slowly, specifically on one line.
public class Item{
public int ID;
public int Secondary_ID;
public string Text;
public int Number;
}
The data looks something like this (100k rows):
ID | Secondary_ID | Text | Number
1 | 1 | "something" | 3
1 | 1 | "something else"| 7
1 | 1 | "something1" | 4
1 | 2 | "something2" | 344
2 | 3 | "something3" | 74
2 | 3 | "something4" | 1
and I would like it to look like this when finished (to be honest, any collection will do):
Dictionary<int, string>
Key | Value
(secondary_ID) | (Text : Number)
1 | "Something : 3, Something else : 7, Something1 : 4"
2 | "Something2 : 344"
3 | "Something3 : 74, Something4 : 1"
My code currently works like this (ListAll contains all the data):
var Final = new Dictionary<int, string>();
var id1s = ListAll.Select(x => x.ID).Distinct().ToList();
foreach (var id1 in id1s) {
    var shortList = ListAll.Where(x => x.ID == id1).ToList(); // 99% of the time is spent here
    var id2s = shortList.Select(x => x.Secondary_ID).Distinct().ToList();
    foreach (var id2 in id2s) {
        var s = new StringBuilder();
        var items = shortList.Where(x => x.Secondary_ID == id2).ToList();
        foreach (var i in items) {
            if (s.Length > 0) s.Append(", "); // separator between "Text : Number" entries
            s.Append(String.Format("{0} : {1}", i.Text, i.Number));
        }
        Final.Add(id2, s.ToString());
    }
}
return Final;
Now the output is correct, but as noted in the comment above, this takes an incredibly long time to process (90 seconds, certainly more than I am comfortable with), and I was wondering if there is a faster way of achieving this.
This code is only going to be used once, so it is not really a normal use case, and normally I would ignore it for that reason, but I was wondering for learning purposes.

Here's what I would do (untested, but hopefully you get the idea):
var final = ListAll.GroupBy(x => x.Secondary_ID)
                   .ToDictionary(x => x.Key, x => String.Join(", ",
                                 x.Select(y => String.Format("{0} : {1}",
                                                             y.Text, y.Number))));
This first groups by Secondary_ID using GroupBy, then puts the result into a dictionary using ToDictionary.
The GroupBy will group your data into the following groups:
Key = 1:
ID | Secondary_ID | Text | Number
1 | 1 | "something" | 3
1 | 1 | "something else"| 7
1 | 1 | "something1" | 4
Key = 2:
ID | Secondary_ID | Text | Number
1 | 2 | "something2" | 344
Key = 3:
ID | Secondary_ID | Text | Number
2 | 3 | "something3" | 74
2 | 3 | "something4" | 1
Then the .ToDictionary method:
Selects the key as x.Key (the key we grouped on, i.e. Secondary_ID).
Selects the result of a String.Join operation as the value. What is being joined is the collection of "Text : Number" strings built from the elements inside that group: x.Select(y => String.Format("{0} : {1}", y.Text, y.Number)).
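A self-contained sketch of the query above, run against the sample rows from the question. It models the rows as value tuples rather than the question's Item class (an assumption made only so the snippet needs no type declarations) and uses string interpolation in place of String.Format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample rows from the question, as tuples named like the Item fields.
var listAll = new List<(int ID, int Secondary_ID, string Text, int Number)>
{
    (1, 1, "something", 3),
    (1, 1, "something else", 7),
    (1, 1, "something1", 4),
    (1, 2, "something2", 344),
    (2, 3, "something3", 74),
    (2, 3, "something4", 1),
};

// Bucket the rows by Secondary_ID, then build one joined string per bucket.
var final = listAll
    .GroupBy(x => x.Secondary_ID)
    .ToDictionary(
        g => g.Key,
        g => string.Join(", ", g.Select(y => $"{y.Text} : {y.Number}")));

foreach (var pair in final)
    Console.WriteLine($"{pair.Key} | {pair.Value}");
```

Note that ID plays no part in the result; only Secondary_ID decides which entry a row ends up in.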

A much more efficient (and even easier to write) way of grouping the items by Secondary_ID is to use GroupBy:
var query = ListAll.GroupBy(x => x.Secondary_ID)
    .ToDictionary(group => group.Key,
                  group => string.Join(", ",
                      //consider refactoring part of this line out to another method
                      group.Select(item => string.Format("{0} : {1}", item.Text, item.Number))));
As for why your code is so slow: you're searching through the entire list for each distinct ID. For n items and k distinct IDs that is O(n * k) work, which approaches O(n^2) when there are many IDs. GroupBy doesn't do that. It uses a hash-based structure internally, keyed on whatever you're grouping on, so that it can find the bucket any given item belongs in quickly (in expected O(1) time), as opposed to the O(n) scan your method performs for every ID. The whole grouping is therefore a single O(n) pass.
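To make the hash-bucket idea concrete, here is a rough hand-written equivalent: one pass over the list, one dictionary lookup per row. It is an illustrative sketch only (GroupBy's actual implementation is internal to .NET), again using value tuples in place of the Item class.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var listAll = new List<(int ID, int Secondary_ID, string Text, int Number)>
{
    (1, 1, "something", 3),
    (1, 1, "something else", 7),
    (1, 1, "something1", 4),
    (1, 2, "something2", 344),
    (2, 3, "something3", 74),
    (2, 3, "something4", 1),
};

// Single pass: each row goes straight to its bucket via a hash lookup
// (expected O(1)), so the loop is O(n) instead of O(n * distinct IDs).
var buckets = new Dictionary<int, StringBuilder>();
foreach (var item in listAll)
{
    if (!buckets.TryGetValue(item.Secondary_ID, out var sb))
    {
        sb = new StringBuilder();           // first row for this key: new bucket
        buckets.Add(item.Secondary_ID, sb);
    }
    else
    {
        sb.Append(", ");                    // separator between entries in a bucket
    }
    sb.Append(item.Text).Append(" : ").Append(item.Number);
}

var final = new Dictionary<int, string>();
foreach (var pair in buckets)
    final.Add(pair.Key, pair.Value.ToString());

foreach (var pair in final)
    Console.WriteLine($"{pair.Key} | {pair.Value}");
```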

First, remove all the ToList() calls; it may become somewhat faster, because ToList() performs eager evaluation and copies every filtered result into a new list.
I think what your code is trying to do is:
var Final=new Dictionary<int, string>();
foreach(var x in ListAll)
if(Final.ContainsKey(x.Secondary_ID))
Final[x.Secondary_ID]+=String.Format(", {0} : {1}", x.Text, x.Number);
else
Final.Add(x.Secondary_ID, String.Format("{0} : {1}", x.Text, x.Number));
return Final;
A Dictionary cannot contain duplicate keys, so here it doesn't matter whether you go through ID or Secondary_ID (as long as every Secondary_ID belongs to an existing ID), and you don't even need Distinct() in the code.
Simplified, your original code would be:
foreach(var id1 in ListAll.Select(x => x.ID).Distinct()) {
foreach(var id2 in ListAll.Where(x => x.ID==id1).Select(x => x.Secondary_ID).Distinct()) {
var s=new StringBuilder();
foreach(var i in ListAll.Where(x => x.ID==id1).Where(x => x.Secondary_ID==id2)) {
s.Append(String.Format("{0} : {1}", i.Text, i.Number));
}
Final.Add(id2, s.ToString());
}
}

Related

Column getting lost in LINQ with Method Syntax after group by

I'm pretty new to LINQ and trying to figure it out. I have the following statement:
Context.dataset1
.Join(
Context.dataset2,
r => r.ID, o => o.ID,
(r, o) => new { PartID = r.PartID, Quantity = r.Quantity1 - r.Quantity2, Date = o.Date })
.GroupBy(
column => new { column.Date },
(key, group) => new {Date = key.Date, Quantity = group.Sum(g => g.Quantity) })
.Where(x => x.Quantity > 0);
The returned data set looks like this:
| Date | Quantity |
| ------------- | ---------|
| 2022-01-01 | 333 |
| 2022-01-02 | 444 |
| 2022-03-03 | 444 |
What I want it to look like is:
| PartID | Date | Quantity |
|--------| ------------- | ---------|
|1 | 2022-01-01 | 333 |
|1 | 2022-01-02 | 444 |
|2 | 2022-03-03 | 444 |
Basically, it seems that when I do the GroupBy I lose access to the PartID column, since I'm not specifying it inside the GroupBy. I'm not sure how to make it appear without grouping by it, which I don't want to do.
Any help would be great. Thanks.
What if two different part IDs exist for the same date? Which part ID would it show? If you really want the part ID, then you need to include it in your group by. For example:
column => new { column.PartID, column.Date }
This means that if you have multiple part IDs for the same date, you will get as many rows for that date as there are distinct part IDs. Based on your comments, this seems to be what you're after.
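A runnable sketch of that fix, with hypothetical in-memory lists standing in for Context.dataset1 and Context.dataset2 (the real entities and EF context are not shown in the question). The quantities are chosen to reproduce the desired table, and the extra row for PartID 2 on 2022-01-01 is made-up data showing that a shared date now yields one row per part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the two tables.
var dataset1 = new List<(int ID, int PartID, int Quantity1, int Quantity2)>
{
    (10, 1, 400, 67),  // net 333
    (11, 1, 500, 56),  // net 444
    (12, 2, 450, 6),   // net 444
    (13, 2, 100, 0),   // net 100, same date as ID 10 but a different part
};
var dataset2 = new List<(int ID, DateTime Date)>
{
    (10, new DateTime(2022, 1, 1)),
    (11, new DateTime(2022, 1, 2)),
    (12, new DateTime(2022, 3, 3)),
    (13, new DateTime(2022, 1, 1)),
};

var rows = dataset1
    .Join(dataset2,
          r => r.ID, o => o.ID,
          (r, o) => new { r.PartID, Quantity = r.Quantity1 - r.Quantity2, o.Date })
    // Grouping on PartID and Date keeps PartID available on the key.
    .GroupBy(
        column => new { column.PartID, column.Date },
        (key, group) => new { key.PartID, key.Date, Quantity = group.Sum(g => g.Quantity) })
    .Where(x => x.Quantity > 0)
    .ToList();

foreach (var row in rows)
    Console.WriteLine($"{row.PartID} | {row.Date:yyyy-MM-dd} | {row.Quantity}");
```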

Filter LINQ To Entities (EF Core) query by List of HashSet of String and Enum

I have this LINQ to Entities (EF Core) query, which looks like this:
var query = (from p in db.Samples
             join q in db.Items on p.Id equals q.SampleId
             where p.active == IsActive && p.Id == GivenId
             group new
             {
                 p.Name,
                 p.Address,
                 p.Marks,
                 p.LocationId,
                 q.EmailId,
                 q.Grade
             }
             by new { q.Grade }
             into data
             select new DataSummary()
             {
                 UserName = data.Min(x => x.Name),
                 Grade = data.Min(x => x.Grade),
                 Email = data.Min(x => x.EmailId),
                 Total = data.Sum(x => x.Marks)
             }).ToList();
Now I have a constant list of (grades, locations) HashSet pairs that looks like this:
public List<(HashSet<string> Grades, HashSet<Location> Locations)> LocationGrades => new()
{
    (new() { "A", "B" }, new()), // includes all locations
    (new() { "C" }, new()
    {
        Location.Boston, // Location is an enum
        Location.Maine
    })
};
I want to get the data such that if the student has grade A or B, all locations are included, and if the student has grade C, only Boston and Maine are included.
Is it possible to integrate this within the LINQ to Entities query?
Samples table
| ID | Name | Address | Marks | LocationId |
|-----|-------|---------|-------|-------------|
| 234 | Test | 123 St | 240 | 3 (Maine) |
| 122 | Test1 | 234 St | 300 | 5 (Texas) |
| 142 | Test1 | 234 St | 390 | 1 (Boston) |
Items table
| ID | SampleId | Grade | Email |
|----|----------|-------|-------|
| 12 | 234 | A | a.com |
| 13 | 122 | C | b.com |
| 14 | 142 | C | c.com |
So, in the tables above I shouldn't get the Texas row but should get the Boston row: both have grade C, but Texas does not appear in the HashSet combination for grade C.
Okay, now I get it. You have to add dynamic ORed constraints to the query based on a given list of elements. This is a little tricky, because AND can be done with multiple .Where() calls, but OR cannot. I did something similar recently against CosmosDB using LinqKit, and the same approach should also work against EF.
In your case, you probably have to do something like this:
...
into data
.WhereAny(grades, (item, grade) => item.Grade == grade)
select new DataSummary()
...
I think the given example doesn't match your exact case, but it lets you define multiple ORed constraints from a given list, and I think that is the missing piece you're looking for. (WhereAny is not a standard LINQ operator; it stands for a helper built with LinqKit, which is not shown here.) Take care to use only constructs inside the lambda that EF Core can translate. The given inner enumeration (grades in this example) is iterated on the client side and can be built dynamically with anything available in C#.
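Since the LinqKit helper is not shown, here is a swapped-in alternative that builds the same kind of ORed predicate with plain System.Linq.Expressions: one clause per (grades, locations) rule, combined with OrElse into a single expression tree, then applied to an in-memory IQueryable. The flattened row shape, the int LocationId used in place of the Location enum, and the helper names BuildRuleFilter and ContainsCall are all assumptions for illustration; applying the filter to the joined rows (before the group by) is likewise an assumption, and whether a particular EF Core version translates the tree should be checked against that provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical flattened join rows: (sample Id, Grade, LocationId).
// LocationId follows the question's table: 1 = Boston, 3 = Maine, 5 = Texas.
var rows = new List<(int Id, string Grade, int LocationId)>
{
    (234, "A", 3),
    (122, "C", 5),
    (142, "C", 1),
};

// Same shape as LocationGrades, with int ids instead of the enum.
// An empty location set means "any location".
var rules = new List<(HashSet<string> Grades, HashSet<int> Locations)>
{
    (new HashSet<string> { "A", "B" }, new HashSet<int>()),
    (new HashSet<string> { "C" }, new HashSet<int> { 1, 3 }),
};

var kept = rows.AsQueryable()
    .Where(BuildRuleFilter(rules))
    .Select(r => r.Id)
    .ToList();

Console.WriteLine(string.Join(", ", kept));

// Builds row => false || (rule 1) || (rule 2) || ... as one expression tree, where each
// rule is Grades.Contains(row.Grade), plus Locations.Contains(row.LocationId) if non-empty.
Expression<Func<(int Id, string Grade, int LocationId), bool>> BuildRuleFilter(
    IEnumerable<(HashSet<string> Grades, HashSet<int> Locations)> ruleSet)
{
    var row = Expression.Parameter(typeof((int, string, int)), "row");
    var grade = Expression.Field(row, "Item2");    // the Grade element
    var location = Expression.Field(row, "Item3"); // the LocationId element

    Expression body = Expression.Constant(false);
    foreach (var (grades, locations) in ruleSet)
    {
        Expression clause = ContainsCall(grades.ToList(), grade);
        if (locations.Count > 0) // empty set: no location restriction
            clause = Expression.AndAlso(clause, ContainsCall(locations.ToList(), location));
        body = Expression.OrElse(body, clause);
    }
    return Expression.Lambda<Func<(int Id, string Grade, int LocationId), bool>>(body, row);
}

// Emits Enumerable.Contains(values, item) over a constant collection.
Expression ContainsCall<T>(List<T> values, Expression item)
{
    var contains = typeof(Enumerable).GetMethods()
        .Single(m => m.Name == nameof(Enumerable.Contains) && m.GetParameters().Length == 2)
        .MakeGenericMethod(typeof(T));
    return Expression.Call(contains, Expression.Constant(values, typeof(IEnumerable<T>)), item);
}
```

With the sample rows, the Texas row (122) is filtered out and the Maine and Boston rows remain. Building the tree by hand avoids Expression.Invoke, which keeps the predicate as plain member access, Contains calls and boolean operators.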

How to merge multiple list by id and get specific data?

I have 3 lists with common IDs. I need to group the objects in one list by name and extract data from the other two. Here is an example to make it clearer.
table for groupNames:
| Id | Name |
|--------------|
| 1 | Hello |
| 2 | Hello |
| 3 | Hey |
| 4 | Dude |
| 5 | Dude |
table for countId:
| Id | whatever |
|---------------|
| 1 | test0 |
| 1 | test1 |
| 2 | test2 |
| 3 | test3 |
| 3 | test4 |
table for lastTime:
| Id | timestamp |
|-----------------|
| 1 | 1636585230 |
| 1 | 1636585250 |
| 2 | 1636585240 |
| 3 | 1636585231 |
| 3 | 1636585230 |
| 5 | 1636585330 |
and I'm expecting a result list like this:
| Name | whateverCnt | lastTimestamp |
|---------------------------------------|
| Hello | 3 | 1636585250 |
| Hey | 2 | 1636585231 |
| Dude | 0 | 1636585330 |
For now I have something like this, but it doesn't work:
return groupNames
.GroupBy(x => x.Name)
.Select(x =>
{
return new myElem
{
Name = x.Name,
lastTimestamp = new DateTimeOffset(lastTime.Where(a => groupNames.Where(d => d.Name == x.Key).Select(d => d.Id).Contains(a.Id)).Max(m => m.timestamp)).ToUnixTimeMilliseconds(),
whateverCnt = countId.Where(q => (groupNames.Where(d => d.Name == x.Key).Select(d => d.Id)).ToList().Contains(q.Id)).Count()
};
})
.ToList();
Many thanks for any advice.
I think I'd mostly skip LINQ for this:
class Thing {
    public string Name { get; set; }
    public int Count { get; set; }
    public long LastTimestamp { get; set; }
}
...
var ids = new Dictionary<int, string>();      // id -> name
var result = new Dictionary<string, Thing>(); // name -> aggregated result
foreach (var g in groupNames) {
    ids[g.Id] = g.Name;
    result[g.Name] = new Thing { Name = g.Name };
}
foreach (var c in countId)
    result[ids[c.Id]].Count++;
foreach (var l in lastTime) {
    var t = result[ids[l.Id]];
    if (t.LastTimestamp < l.timestamp) t.LastTimestamp = l.timestamp;
}
We start off by making two dictionaries (you could use ToDictionary for this). If groupNames is already a dictionary that maps id:name, you can skip making the ids dictionary and use groupNames directly. This gives us fast lookup from ID to Name, but we actually want to collect results into a name:something mapping, so we make one of those too. Doing result[name] = thing always succeeds, even if we've seen name before; you could save some object creation with a ContainsKey check here if you want.
Then all we need to do is enumerate our other N collections, building the result. The result we want is accessed via result[ids[some_id_value_here]], and it always exists as long as the groupNames id space is complete (i.e. we never have an id in the counts that we do not have in groupNames).
For counts, we don't care about any of the other data; the mere presence of the id is enough to increment the count.
For timestamps, it's a simple max algorithm: "if the known max is less than the new value, make the new value the known max". If you know your timestamp list is sorted ascending, you can skip that if, too.
In your example, the safest approach would be to build a list of result objects and simply query the other collections with LINQ for the same id.
So something like:
public IEnumerable<SomeObject> MergeListsById(
    IEnumerable<GroupNames> groupNames,
    IEnumerable<CountId> countIds,
    IEnumerable<LastTime> lastTimes)
{
    var mergedList = new List<SomeObject>();
    foreach (var gn in groupNames) {
        mergedList.Add(new SomeObject {
            Name = gn.Name,
            // number of countIds rows with this id
            whateverCnt = countIds.Count(ci => ci.Id == gn.Id),
            // latest timestamp for this id, or null if there is none
            lastTimeStamp = lastTimes.Where(lt => lt.Id == gn.Id).Max(lt => (long?)lt.timestamp)
        });
    }
    // Note: this yields one entry per id; entries sharing a Name still need to be combined.
    return mergedList;
}
Try it in a Fiddle or throwaway project and tweak it to your needs. A pure-LINQ solution is probably not desirable here, for readability's and maintainability's sake.
And yes, as the comments say, do carefully consider whether LINQ is your best option here. While it works, it does not always perform better than a "simple" foreach. LINQ's main selling point is, and always has been, short one-line query statements that remain readable.
Well, having
List<(int id, string name)> groupNames = new List<(int id, string name)>() {
( 1, "Hello"),
( 2, "Hello"),
( 3, "Hey"),
( 4, "Dude"),
( 5, "Dude"),
};
List<(int id, string comments)> countId = new List<(int id, string comments)>() {
( 1 , "test0"),
( 1 , "test1"),
( 2 , "test2"),
( 3 , "test3"),
( 3 , "test4"),
};
List<(int id, int time)> lastTime = new List<(int id, int time)>() {
( 1 , 1636585230 ),
( 1 , 1636585250 ),
( 2 , 1636585240 ),
( 3 , 1636585231 ),
( 3 , 1636585230 ),
( 5 , 1636585330 ),
};
you can, technically, use the Linq below:
var result = groupNames
.GroupBy(item => item.name, item => item.id)
.Select(group => (Name : group.Key,
whateverCnt : group
.Sum(id => countId.Count(item => item.id == id)),
lastTimestamp : lastTime
.Where(item => group.Any(g => g == item.id))
.Max(item => item.time)));
Let's have a look:
Console.Write(string.Join(Environment.NewLine, result));
Outcome:
(Hello, 3, 1636585250)
(Hey, 2, 1636585231)
(Dude, 0, 1636585330)
But be careful: List<T> (I mean countId and lastTime) is not an efficient data structure here. In the LINQ query we have to scan these lists to compute Sum and Max. If countId and lastTime are long, turn them (by grouping) into a Dictionary<int, T> with id as the key.
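A sketch of that suggestion on the same sample data: countId and lastTime are grouped once into id-keyed dictionaries, so the per-name step only does lookups. The dictionary names countById and maxTimeById are illustrative, and returning null for a name with no timestamps at all is a choice made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var groupNames = new List<(int id, string name)> {
    (1, "Hello"), (2, "Hello"), (3, "Hey"), (4, "Dude"), (5, "Dude") };
var countId = new List<(int id, string comments)> {
    (1, "test0"), (1, "test1"), (2, "test2"), (3, "test3"), (3, "test4") };
var lastTime = new List<(int id, int time)> {
    (1, 1636585230), (1, 1636585250), (2, 1636585240),
    (3, 1636585231), (3, 1636585230), (5, 1636585330) };

// One pass over each list: id -> number of rows, id -> latest time.
Dictionary<int, int> countById = countId
    .GroupBy(item => item.id)
    .ToDictionary(g => g.Key, g => g.Count());
Dictionary<int, int> maxTimeById = lastTime
    .GroupBy(item => item.id)
    .ToDictionary(g => g.Key, g => g.Max(item => item.time));

// Per name, only dictionary lookups remain; ids missing from a dictionary count as 0 / no time.
var result = groupNames
    .GroupBy(item => item.name, item => item.id)
    .Select(group => (Name: group.Key,
                      whateverCnt: group.Sum(id => countById.TryGetValue(id, out var c) ? c : 0),
                      lastTimestamp: group
                          .Select(id => maxTimeById.TryGetValue(id, out var t) ? t : (int?)null)
                          .Max()))
    .ToList();

Console.WriteLine(string.Join(Environment.NewLine, result));
```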

C# grouping loses row

I have the following SQL table:
---------------------------------------------------
| ID | Line1 | Line2 | Line3 | Line4 | Line5 |
---------------------------------------------------
| 1 | Software | Citrix | XenApp | Null | Null |
---------------------------------------------------
| 2 | Software | Citrix | XenApp | Null | Null |
---------------------------------------------------
I used this code in order to group it by Line3:
var KategorienLine3 = result.GroupBy(x => x.Line3).ToList();
where result is the list containing the 2 entries.
Now this grouping results in this output:
[0] -> Key = XenApp
[1] -> Key = XenApp
But I don't have access to Line2, and I would like to include it in the result. How can I do that, so that I have access to it as well?
I don't want to group by it; I just want to have it in the result.
That's what it gives me after the grouping. I want to include Line2 as well.
The data is there; it is just inside the IGrouping<TKey, TElement> objects returned by GroupBy. The reason you don't see Line2 is that each grouping contains the collection of records belonging to that group, and each of those records is of your object's type and has the Line2 property.
To retrieve it project the data as you want it to show:
// method syntax
var result = data.GroupBy(key => key.Line3, item => item.Line2)
.Select(g => new
{
g.Key,
Line2 = g.ToList()
}).ToList();
// query syntax
var result = from item in data
group item.Line2 by item.Line3 into g
select new
{
g.Key,
Line2 = g.ToList()
};
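A small in-memory sketch of the point that the full rows are still inside each group, using the two sample rows as value tuples (an assumption; the question's entity class isn't shown, and Line4/Line5 are omitted). With these two rows, GroupBy produces a single group keyed XenApp that contains both rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var data = new List<(int ID, string Line1, string Line2, string Line3)>
{
    (1, "Software", "Citrix", "XenApp"),
    (2, "Software", "Citrix", "XenApp"),
};

var groups = data.GroupBy(x => x.Line3).ToList();

foreach (var g in groups)
{
    // g.Key is the Line3 value; enumerating g yields the complete rows in that group.
    foreach (var row in g)
        Console.WriteLine($"{g.Key}: ID {row.ID}, Line2 = {row.Line2}");
}

// If only the distinct Line2 values per Line3 are needed:
var line2ByLine3 = groups.ToDictionary(
    g => g.Key,
    g => g.Select(r => r.Line2).Distinct().ToList());
```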

Linq union usage?

Sql:
SELECT date,total_usage_T1 as TotalUsageValue,'T1' as UsageType FROM TblSayacOkumalari
UNION ALL
SELECT date,total_usage_T2 as TotalUsageValue,'T2' as UsageType FROM TblSayacOkumalari
And I tried to convert it to LINQ:
IEnumerable<TblSayacOkumalari> sayac_okumalari = entity.TblSayacOkumalari
.Select(x => new
{ x.date, x.total_usage_T1 })
.Union(entity.TblSayacOkumalari.Select(x => new
{ x.date, x.total_usage_T2 }));
But I don't know how to translate 'T1' as UsageType into LINQ, and my use of Union is incorrect too.
My table looks like this:
| date | total_usage_T1 | total_usage_T2 |
| 2010 | 30 | 40 |
| 2011 | 40 | 45 |
| 2012 | 35 | 50 |
I want it to look like this:
| date | TotalUsageValue | UsageType |
| 2010 | 30 | T1 |
| 2011 | 40 | T1 |
| 2012 | 35 | T1 |
| 2010 | 40 | T2 |
| 2011 | 45 | T2 |
| 2012 | 50 | T2 |
I tried very hard, but could not. Please help.
EDIT
Def. from MSDN
Enumerable.Concat - Concatenates two sequences.
Enumerable.Union - Produces the set union of two sequences by using the default equality comparer.
My post : Concat() vs Union()
// var: the result is a sequence of anonymous objects, not TblSayacOkumalari entities
var sayac_okumalari =
    entity.TblSayacOkumalari
        .Select(x => new
        {
            date = x.date,
            TotalUsageValue = x.total_usage_T1,
            UsageType = "T1"
        })
        .Concat(entity.TblSayacOkumalari
            .Select(x => new
            {
                date = x.date,
                TotalUsageValue = x.total_usage_T2,
                UsageType = "T2"
            }));
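For the usage type, you just need to add UsageType = "T1" (or "T2") to your new anonymous type, as I did above; that will do the job.
Also, you should use the Concat method rather than the Union method: SQL UNION ALL keeps duplicate rows, which is what Concat does, whereas Union removes duplicates like SQL UNION.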
Example
int[] ints1 = { 1, 2, 3 };
int[] ints2 = { 3, 4, 5 };
IEnumerable<int> union = ints1.Union(ints2);
Console.WriteLine("Union");
foreach (int num in union)
{
    Console.Write("{0} ", num);
}
Console.WriteLine();
IEnumerable<int> concat = ints1.Concat(ints2);
Console.WriteLine("Concat");
foreach (int num in concat)
{
    Console.Write("{0} ", num);
}
Output:
Union
1 2 3 4 5
Concat
1 2 3 3 4 5
Fact about Union and Concat
The output shows that the Concat() method just combines two enumerable collections into one; it doesn't process any elements, it simply returns a single sequence containing all the elements of both collections.
The Union() method returns the combined sequence with duplicates eliminated, i.e. an element that exists in both collections (or more than once in either) is returned only once.
Important point to note
Given this, we can say that Concat() is faster than Union(), because it doesn't do any duplicate checking.
However, if the single collection produced by Concat() contains many duplicate elements, further operations on it may take longer than on the collection created by Union(), because Union() eliminates duplicates and produces a collection with fewer elements.
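An in-memory check of the Concat approach, with the question's three rows as value tuples standing in for TblSayacOkumalari (an assumption; the entity class isn't shown). Both projections must produce the same anonymous type (same property names, types and order) for Concat to accept them, and the result reproduces the six-row table from the question.

```csharp
using System;
using System.Linq;

// In-memory stand-in for the TblSayacOkumalari rows.
var table = new[]
{
    (date: 2010, total_usage_T1: 30, total_usage_T2: 40),
    (date: 2011, total_usage_T1: 40, total_usage_T2: 45),
    (date: 2012, total_usage_T1: 35, total_usage_T2: 50),
};

// Same anonymous type on both sides; Concat keeps every row, like UNION ALL.
var rows = table
    .Select(x => new { x.date, TotalUsageValue = x.total_usage_T1, UsageType = "T1" })
    .Concat(table.Select(x => new { x.date, TotalUsageValue = x.total_usage_T2, UsageType = "T2" }))
    .ToList();

foreach (var r in rows)
    Console.WriteLine($"{r.date} | {r.TotalUsageValue} | {r.UsageType}");
```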
Use this:
var result = entity.TblSayacOkumalari
.Select(x => new
{
Date = x.date,
TotalUsage = x.total_usage_T1,
UsageType = "T1"
})
.Union(entity.TblSayacOkumalari.Select(x => new
{
Date = x.date,
TotalUsage = x.total_usage_T2,
UsageType = "T2"
}));
In order to get the expected property names on the anonymous type you probably want to do something like:
new { x.date, TotalUsage = x.total_usage_T1, UsageType="T1" }
and also
new { x.date, TotalUsage = x.total_usage_T2, UsageType="T2" }
