C# LINQ - What can I do here to improve performance? - c#

I'm doing some heavy filtering on a collection (which is nothing more than an encapsulated list of entries of "datalines").
I need to 'consolidate' these lines on 3 fields (Date (string), Route (string) and ConsolidationCode (string)).
Extracting the 3 Distinct Lists works fast. I'm more worried about the triple foreach...
I'd say that a normal "complete" _DealerCaseSetComplete contains around 5000 entries.
The Dates would be around 5, the Routes would be around 100 and the Consolidations 350-500.
I have written the following method. It does exactly what I want it to do, but it is very slow in calculation time.
Perhaps you people could guide me towards faster code execution.
If you require any other code (which is really plain, actually), please ask.
private void FillDataGridView()
{
    _LocalGridControl.Invoke(CreateDataGrid);

    //Filter by Date
    List<string> Dates = _DealerCaseSetComplete.Data.Select(rec => rec.DateAdded).Distinct().ToList();

    //Filter by Route
    List<string> Routes = _DealerCaseSetComplete.Data.Select(rec => rec.Route).Distinct().ToList();

    //Filter by Consolidation
    List<string> Consolidations = _DealerCaseSetComplete.Data.Select(rec => rec.DealerConsolidationCode).Distinct().ToList();

    foreach(string d in Dates)
    {
        foreach(string r in Routes)
        {
            foreach(string c in Consolidations)
            {
                List<DealerCaseLine> Filter = _DealerCaseSetComplete.Data.Where(rec => rec.DateAdded == d &&
                                                                                       rec.Route == r &&
                                                                                       rec.DealerConsolidationCode == c).ToList();
                if(Filter.Count > 0)
                    _LocalGridControl.Invoke(AddLineToDataGrid, Filter);
            }
        }
    }

    _LocalGridControl.Invoke(SortDataGrid);
}

Looks like you need grouping by three fields:
var filters = from r in _DealerCaseSetComplete.Data
              group r by new {
                  r.DateAdded,
                  r.Route,
                  r.DealerConsolidationCode
              } into g
              select g.ToList();

foreach(List<DealerCaseLine> filter in filters)
    _LocalGridControl.Invoke(AddLineToDataGrid, filter);
Your code iterates over all the data three times to get the distinct fields. Then it iterates over all the data once more for every combination of distinct fields (when you do the filtering with the Where clause). With grouping by these three fields you iterate over the data only once. Each resulting group has at least one item, so you don't need to check whether a group is empty before invoking the delegate.

It looks like you're trying to get every distinct combination of Dates, Routes and Consolidations.
Your current code is slow because it is, I think, O(n^4). You have three nested loops, the body of which is a linear search.
You can get much better performance by using the overload of Distinct that takes an IEqualityComparer<T>:
http://msdn.microsoft.com/en-us/library/bb338049.aspx
var Consolidated = _DealerCaseSetComplete.Data.Distinct(new DealerCaseComparer());
The class DealerCaseComparer would be implemented much as in the above MSDN link.
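For reference, a comparer over the three fields might look like the sketch below. The minimal DealerCaseLine class is included only to make the sketch self-contained; the property names come from the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's DealerCaseLine (property names from the question).
class DealerCaseLine
{
    public string DateAdded { get; set; }
    public string Route { get; set; }
    public string DealerConsolidationCode { get; set; }
}

// Sketch of an IEqualityComparer<T> keyed on the three consolidation fields.
class DealerCaseComparer : IEqualityComparer<DealerCaseLine>
{
    public bool Equals(DealerCaseLine x, DealerCaseLine y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.DateAdded == y.DateAdded
            && x.Route == y.Route
            && x.DealerConsolidationCode == y.DealerConsolidationCode;
    }

    public int GetHashCode(DealerCaseLine obj)
    {
        // Combine the three fields; null-safe so Distinct never throws.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (obj.DateAdded ?? "").GetHashCode();
            hash = hash * 31 + (obj.Route ?? "").GetHashCode();
            hash = hash * 31 + (obj.DealerConsolidationCode ?? "").GetHashCode();
            return hash;
        }
    }
}
```

Note that Distinct keeps only the first line per field combination, whereas the grouping answer above keeps every line of each group - pick whichever matches what the grid should show.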

Related

Retrieving non-duplicates from 2 Collections using LINQ

Background: I have two Collections of different types of objects with different name properties (both strings). Objects in Collection1 have a field called Name, objects in Collection2 have a field called Field.
I needed to compare these 2 properties, and get items from Collection1 where there is not a match in Collection2 based on that string property (Collection1 will always have a greater or equal number of items; when finished, all remaining items should have a matching item by Name/Field in Collection2).
The question: I've found answers using Lists and they have helped me a little (for what it's worth, I'm using Collections). I did find this answer which appears to be working for me, however I would like to convert what I've done from query syntax (if that's what it's called?) to LINQ method syntax. See below:
//Query for results. This code is what I'm specifically trying to convert.
var result = (from item in Collection1
              where !Collection2.Any(x => x.ColumnName == item.FieldName)
              select item).ToList();
//** Remove items in result from Collection1**
//...
I'm really not at all familiar with either syntax (working on it), but I think I generally understand what this is doing. I'm struggling trying to convert this to LINQ syntax though and I'd like to learn both of these options rather than some sort of nested loop.
End goal after I remove the query results from Collection1: Collection1.Count == Collection2.Count, and the following is true for each item in the collection: ItemFromCollection1.Name == SomeItemFromCollection2.Field (if that makes sense...).
You can convert this to LINQ methods like this:
var result = Collection1.Where(item => !Collection2.Any(x => x.ColumnName == item.FieldName))
                        .ToList();
Your first query is the opposite of what you asked for: it finds records that don't have an equivalent. The following will return all records in Collection1 where there is an equivalent:
var results = Collection1.Where(c1 => Collection2.Any(c2 => c2.Field == c1.Name));
Please note that this isn't the fastest approach, especially if there is a large number of records in collection2. You can find ways of speeding it up through HashSets or Lookups.
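As a sketch of the HashSet approach mentioned above - the Item1/Item2 classes here are hypothetical stand-ins for the question's two object types, using the Name/Field properties it describes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's two collection item types.
class Item1 { public string Name; }
class Item2 { public string Field; }

class Program
{
    static void Main()
    {
        var collection1 = new List<Item1> { new Item1 { Name = "a" }, new Item1 { Name = "b" } };
        var collection2 = new List<Item2> { new Item2 { Field = "a" } };

        // Build a HashSet of Collection2's Field values once (O(m));
        // each Contains check is then O(1) instead of a scan of Collection2.
        var fieldSet = new HashSet<string>(collection2.Select(x => x.Field));
        var unmatched = collection1.Where(item => !fieldSet.Contains(item.Name)).ToList();

        Console.WriteLine(unmatched.Count);   // 1 - only "b" has no match
    }
}
```

This turns the overall cost from O(n*m) for the nested Any into roughly O(n + m).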
If you want a list of the non-duplicate values that should be retained, do the following:
List<string> listNonDup = new List<String>{"6","1","2","4","6","5","1"};
var singles = listNonDup.GroupBy(n => n)
.Where(g => g.Count() == 1)
.Select(g => g.Key).ToList();
Yields: 2, 4, 5
If you want a list of all the duplicate values, you can do the opposite:
var duplicatesxx = listNonDup.GroupBy(s => s)
.SelectMany(g => g.Skip(1)).ToList();

lambda and linq expression

I am working on an ASP.NET MVC 5 application and I am trying to filter a list, but I always get a bad result when I have a multiple selection. I'm using a simple form with checkboxes to know which mission criteria are selected.
My Database:
Table Mission has a list of criteria (Table: CriteriaList)
// public virtual ICollection<Criteria> CriteriaList { get; set; }
int[] CriteriaSelected = List of criteria selected in the form
var items = from i in db.Missions select i;
foreach (var criteriaID in CriteriaSelected)
{
items = items.Where(m => m.CriteriaList.Any(c => c.CriteriaID == criteriaID ));
}
I know it might be a problem with the 'and' semantics created by chaining the multiple Where calls, because I get the right result with just one checkbox selected. But right now I'm a little lost on how to do a multiple selection of criteria.
Your help is really appreciated.
I would try:
var items = db.Missions.Where(m => m.CriteriaList
    .Any(c => CriteriaSelected.Contains(c.CriteriaID)));
This returns every mission where at least one of its criteria is found in the selected criteria.
But be aware that performance can suffer with many records, because of the repeated list searches.
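If the filtering runs in memory rather than being translated to SQL by Entity Framework, converting the selected IDs to a HashSet<int> makes each membership test O(1). A sketch with hypothetical in-memory Mission/Criteria classes standing in for the entities:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the question's entities.
class Criteria { public int CriteriaID; }
class Mission { public string Name; public List<Criteria> CriteriaList; }

class Program
{
    static void Main()
    {
        var missions = new List<Mission>
        {
            new Mission { Name = "m1", CriteriaList = new List<Criteria> { new Criteria { CriteriaID = 1 } } },
            new Mission { Name = "m2", CriteriaList = new List<Criteria> { new Criteria { CriteriaID = 3 } } }
        };
        int[] criteriaSelected = { 1, 2 };

        // O(1) membership tests instead of scanning the int[] for every criteria row.
        var selectedIds = new HashSet<int>(criteriaSelected);
        var items = missions
            .Where(m => m.CriteriaList.Any(c => selectedIds.Contains(c.CriteriaID)))
            .ToList();

        Console.WriteLine(items.Count);   // 1 - only "m1" has a selected criterion
    }
}
```

Against a real IQueryable, EF translates Contains on the array into a SQL IN clause anyway, so this only matters client-side.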
#loiti was close, but deleted his answer instead of revising it. Here's what you need (CriteriaSelected is already an int[] of IDs, so it can be used in Contains directly):
var items = db.Missions.Where(m =>
    m.CriteriaList.Any(c =>
        CriteriaSelected.Contains(c.CriteriaID)
    )
);

Linq intersect to filter multiple criteria against list

I'm trying to filter users by department. The filter may contain multiple departments, and the users may belong to multiple departments (n:m). I'm fiddling around with LINQ, but can't find the solution. The following example code uses simplified Tuples just to make it runnable; of course there are real user objects behind it.
Also on CSSharpPad, so you have some runnable code: http://csharppad.com/gist/34be3e2dd121ffc161c4
string Filter = "Dep1"; //can also contain multiple filters
var users = new List<Tuple<string, string>>
{
    Tuple.Create("Meyer", "Dep1"),
    Tuple.Create("Jackson", "Dep2"),
    Tuple.Create("Green", "Dep1;Dep2"),
    Tuple.Create("Brown", "Dep1")
};

//this is the line I can't get to work like I want to
var tuplets = users.Where(u => u.Item2.Intersect(Filter).Any());

if (tuplets.Distinct().ToList().Count > 0)
{
    foreach (var item in tuplets) Console.WriteLine(item.ToString());
}
else
{
    Console.WriteLine("No results");
}
Right now it returns:
(Meyer, Dep1)
(Jackson, Dep2)
(Green, Dep1;Dep2)
(Brown, Dep1)
What I would want it to return is: Meyer, Green, Brown. If Filter were set to "Dep1;Dep2" I would want an or-comparison and find Meyer, Jackson, Green, Brown (distinct as well, as I don't want Green twice). If Filter were set to "Dep2" I would only want Jackson, Green. I also played around with .Split(';'), but it got me nowhere.
Am I making sense? I have Users with single/multiple departments and want filtering for those departments. In my output I want to have all users from the specified department(s). The LINQ-magic is not so strong on me.
Since string implements IEnumerable, what you're doing right now is an Intersect on an IEnumerable<char> (i.e. you're checking each letter in the string). You need to split both Item2 and Filter on ';' and intersect those.
var tuplets = users.Where(u =>
    u.Item2.Split(';')
           .Intersect(Filter.Split(';'))
           .Any());
string[] Filter = {"Dep1","Dep2"}; //Easier if this is an enumerable
var users = new List<Tuple<string, string>>
{
    Tuple.Create("Meyer", "Dep1"),
    Tuple.Create("Jackson", "Dep2"),
    Tuple.Create("Green", "Dep1;Dep2"),
    Tuple.Create("Brown", "Dep1")
};

//I would use Any/Split/Contains
var tuplets = users.Where(u => Filter.Any(y => u.Item2.Split(';').Contains(y)));

if (tuplets.Distinct().ToList().Count > 0)
{
    foreach (var item in tuplets) Console.WriteLine(item.ToString());
}
else
{
    Console.WriteLine("No results");
}
In addition to the other answers, the Contains extension method may also be a good fit for what you're trying to do if you're matching on a value:
var result = list.Where(x => filter.Contains(x.Value));
Otherwise, the Any method will accept a delegate:
var result = list.Where(x => filter.Any(y => y.Value == x.Value));

List<> Iteration performance

Hi, I have a question about the efficiency of iterating through a list of values.
Say you have to look through a list of values, pulling out those that match your current search criteria: does it make sense to remove each match once you have found it, resulting in a smaller list to search through on the next iteration? Or does this make little difference? Here is my code.
foreach (Project prj in projectList)
{
    string prjCode = prj.Code;
    var match = usersProjects.FirstOrDefault(x => x.Code == prjCode);
    if (match != null)
    {
        usersProjects.Remove(match);
        //More logic here
    }
}
Basically I am searching for a project code that corresponds to a user from a list of all projects.
Say there are 50 projects, and the user has access to 20 of them. Does removing the found project on every loop, reducing the overall project count, make the iteration more efficient? Thanks.
I wouldn't recommend changing the list: removal is itself slow, order O(n).
Use a prepared lookup to do what you want instead of FirstOrDefault()
var projectLookup = usersProjects.ToLookup(x => x.Code);
foreach (Project prj in projectList)
{
    string prjCode = prj.Code;
    var match = projectLookup[prjCode].FirstOrDefault();
    if (match != null)
    {
        //More logic here
    }
}
Note that ToLookup() is expensive, so you want to retain the lookup if possible - consider recreating it only when usersProjects changes. After that, retrieving a match from the lookup takes only constant time.
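A sketch of retaining the lookup across calls - the minimal Project class and the wrapper's names are hypothetical, the idea is just to pay the O(n) ToLookup cost only when the list changes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal type carrying the Code property from the question.
class Project { public string Code; }

class ProjectMatcher
{
    private readonly List<Project> _usersProjects;
    private ILookup<string, Project> _lookup;   // cached; rebuilt only on change

    public ProjectMatcher(List<Project> usersProjects)
    {
        _usersProjects = usersProjects;
        RebuildLookup();
    }

    // Call whenever _usersProjects is modified; one O(n) pass.
    public void RebuildLookup()
    {
        _lookup = _usersProjects.ToLookup(p => p.Code);
    }

    // Constant-time probe against the cached lookup.
    public Project FindMatch(string code)
    {
        return _lookup[code].FirstOrDefault();
    }
}
```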
I would suggest using a group join for this:
var matches =
    from prj in projectList
    join x in usersProjects on prj.Code equals x.Code into xs
    where xs.Any()
    select xs.First();
Actually, a slightly better query would be:
var matches =
    from prj in projectList
    join x in usersProjects on prj.Code equals x.Code into xs
    from x1 in xs.Take(1)
    select x1;
If you then need to remove them from the usersProjects list you would need to do this:
foreach (var match in matches)
{
    usersProjects.Remove(match);
}
But, if you just want to know what's left in the usersProjects if you removed the matches you could then just do this:
var remainingUsersProjects = usersProjects.Except(matches);
At the end of all of this the only thing you need to do is time all of the options to see what is faster.
But I would think that it really won't matter unless your lists are huge. Otherwise I'd go with the simplest to understand code so that you can maintain your project in the future.
Instead of the loop and multiple FirstOrDefault() calls, you can use a single Where() to get the user projects that are left over (those without a matching project):
userProjects = userProjects.Where(up => projectList.All(p => up.Code != p.Code)).ToList();
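The same idea with a HashSet avoids the nested All() scan over projectList for every user project. A sketch, again with a hypothetical minimal Project type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal type carrying the Code property from the question.
class Project { public string Code; }

class Program
{
    static void Main()
    {
        var projectList = new List<Project> { new Project { Code = "P1" } };
        var userProjects = new List<Project>
        {
            new Project { Code = "P1" },
            new Project { Code = "P2" }
        };

        // One O(n) pass to build the set, then O(1) per membership test.
        var projectCodes = new HashSet<string>(projectList.Select(p => p.Code));
        userProjects = userProjects.Where(up => !projectCodes.Contains(up.Code)).ToList();

        Console.WriteLine(userProjects.Count);   // 1 - only "P2" remains
    }
}
```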

Single LINQ expression to tally up several columns in a DataSet

I have a DataSet with several rows and columns (as DataSets tend to have). I need to create a tally row on the bottom with sums of each column. I'd like to do this with a single LINQ expression, as it will simplify a bunch of my code. I can get the total of a single column like so:
var a = (from m in month
         where <some long expression>
         select m["BGCO_MINUTES"] as Decimal?).Sum();
However, I want totals for other columns as well. I don't want to use multiple LINQ expressions because there's also a complicated where clause in there, and I'm doing several tally rows with various expressions and only want to loop through this set once. I also don't want to manually loop through the dataset myself and add up the totals since I'm creating many of these tally rows and think it would be messier.
What I want is an anonymous type that contains a total of BGCO_MINUTES, 800IB_MINUTES and TSDATA_MINUTES.
Is there any way to do this?
You could do this:
// run the filters once and get List<DataRow> with the matching rows
var list = (from m in month
            where <some long expression>
            select m).ToList();
// build the summary object
var result = new {
    BGCO_MINUTES = list.Sum(m => m["BGCO_MINUTES"] as Decimal?),
    _800IB_MINUTES = list.Sum(m => m["800IB_MINUTES"] as Decimal?),
    TSDATA_MINUTES = list.Sum(m => m["TSDATA_MINUTES"] as Decimal?)
};
And that's assuming your where clause is not just long to type, but computationally expensive to evaluate. That will iterate through the list once per column.
If you really want to only iterate the list once, you can probably do it with Enumerable.Aggregate, but the code is less elegant (in my opinion):
// run the filters once and accumulate both sums in a single pass
var a = (from m in month
         where <some long expression>
         select m)
        .Aggregate(new { BGCO_MINUTES = (decimal?)0m,
                         _800IB_MINUTES = (decimal?)0m },
                   (ac, v) => new { BGCO_MINUTES = ac.BGCO_MINUTES + (decimal?)v["BGCO_MINUTES"],
                                    _800IB_MINUTES = ac._800IB_MINUTES + (decimal?)v["800IB_MINUTES"] });
Like I said, I think it's less elegant than the first version, but it should work. Even though the first one requires a temporary copy of the values that match the where clause (memory cost) and 1 pass through the list for each field (CPU cost), I think it's a lot more readable than the latter version - make sure the performance difference is worth it before using the less-understandable version.
Use Aggregate instead of Sum as it is more flexible - you will be able to have object (or simply dictionary) to hold sums for individual columns while iterating through each row.
(non-compiled code ahead)
class SumObject {
    public decimal First;
    public decimal Second;
}

var filtered = from m in month
               where <some long expression>
               select m;

var sums = filtered.Aggregate(new SumObject(), (currentSum, item) => {
    currentSum.First += item.First;
    currentSum.Second += item.Second;
    return currentSum;
});
