Hi, I have a question regarding the efficiency of iterating through a list of values.
Say you have to look through a list of values, pulling out those that match your current search criteria. Does it make sense to remove each match once you have found it, resulting in a smaller list of values to search through on the next iteration, or does this make little difference? Here is my code:
foreach (Project prj in projectList)
{
string prjCode = prj.Code;
var match = usersProjects.FirstOrDefault(x => x.Code == prjCode);
if (match != null)
{
usersProjects.Remove(match);
//More logic here
}
}
Basically, I am searching a list of all projects for the project codes that correspond to a user.
Say there are 50 projects, and the user has access to 20 of them. Does removing the found project on every pass, reducing the overall project count, make the iteration more efficient? Thanks.
I wouldn't recommend changing the list - removal is itself slow, O(n) per call.
Use a prepared lookup to do what you want instead of FirstOrDefault():
var projectLookup = usersProjects.ToLookup((x) => x.Code);
foreach (Project prj in projectList)
{
string prjCode = prj.Code;
var match = projectLookup[prjCode].FirstOrDefault();
if (match != null)
{
//More logic here
}
}
Note that ToLookup() is expensive, so you want to retain the lookup if possible - consider recreating it only when usersProjects changes. After that, actually using the lookup to retrieve a match takes only constant time.
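For example, a minimal sketch of caching the lookup - the field and method names here are illustrative, not from the original code:
private ILookup<string, Project> _projectLookup;
private void OnUsersProjectsChanged()
{
    _projectLookup = null; //invalidate; rebuilt lazily on next use
}
private ILookup<string, Project> GetProjectLookup()
{
    //Rebuild at most once per change; each indexer access afterwards is effectively constant time.
    return _projectLookup ?? (_projectLookup = usersProjects.ToLookup(x => x.Code));
}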
I would suggest using a group join for this:
var matches =
from prj in projectList
join x in usersProjects on prj.Code equals x.Code into xs
where xs.Any()
select xs.First();
Actually, a slightly better query would be:
var matches =
from prj in projectList
join x in usersProjects on prj.Code equals x.Code into xs
from x1 in xs.Take(1)
select x1;
If you then need to remove them from the usersProjects list you would need to do this:
foreach (var match in matches)
{
usersProjects.Remove(match);
}
But, if you just want to know what's left in the usersProjects if you removed the matches you could then just do this:
var remainingUsersProjects = usersProjects.Except(matches);
At the end of all of this, the only thing left to do is time all of the options to see which is faster.
But I would think it really won't matter unless your lists are huge. Otherwise, I'd go with the simplest code to understand, so that you can maintain your project in the future.
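If you do time them, a rough sketch with System.Diagnostics.Stopwatch gives a first impression (for anything serious, use a proper benchmarking harness):
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    //run the variant under test here
}
sw.Stop();
Console.WriteLine("Elapsed: " + sw.ElapsedMilliseconds + " ms");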
Instead of a loop with multiple FirstOrDefault() calls, you can use a single Where() call - for example, to get the user projects that are left once every match is removed:
usersProjects = usersProjects.Where(up => projectList.All(p => up.Code != p.Code)).ToList();
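If what you actually need are the matches rather than the leftovers, a hedged alternative (assuming Code is a string) is to build the set of user codes once and test membership in roughly constant time:
//Build the set of codes once (O(m)), then filter projectList in O(n).
var userCodes = new HashSet<string>(usersProjects.Select(x => x.Code));
var matches = projectList.Where(p => userCodes.Contains(p.Code)).ToList();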
I'm trying to get my head around this rather than just chalking it up to general voodoo.
I do an EF query and get some data back, and I .ToList() it, like this:
IEnumerable<DatabaseMatch<CatName>> nameMatches = nameLogicMatcher.Match(myIQueryableOfCats).ToList();
Some cats appear twice in the database because they have multiple names, but each cat has a primary name. So in order to filter this down, I get all of the ids of the cats in a list:
List<int> catIds = nameMatches.Select(c => c.Match.CatId).ToList();
I then iterate through all of the distinct ids, get all of the matching cat names, and remove anything that isn't a primary name from the list, like this:
foreach (int catId in catIds.Distinct())
{
var allCatNameMatches = nameMatches.Where(c => c.Match.CatId == catId);
var primaryMatch = allCatNameMatches.FirstOrDefault(c => c.Match.NameType == "Primary Name");
nameMatches = nameMatches.Except(allCatNameMatches.Where(c => c != primaryMatch));
}
Now this code, when I first ran it, just hung, which I thought was odd. I stepped through it, and it seemed to work, but after 10 iterations (it is capped at 100 cats in total) it started to slow down, eventually became glacial, and then hung completely.
I thought maybe it was doing some intensive database work by mistake, but the profiler shows no SQL executed except that which retrieves the initial list of cat names.
I decided to change nameMatches from an IEnumerable to a List, and put the appropriate .ToList() on the last line. After I did this, it worked instantly and perfectly.
The question I'd like to ask is, why?
Without the ToList() you are building up in nameMatches a nested chain of IEnumerables awaiting deferred execution. This might not be so bad, except you are also calling FirstOrDefault on each iteration, which executes the chain. So on iteration number n, you are executing the filter operations contained in the loop n-1 times. With 100 distinct cats, the LINQ chain gets executed 100 + 99 + ... + 1 times. (I think you have something that is O(n³)!)
The moral is: if you want to use deferred execution, make very sure that you're only executing your chain once.
Let's simplify your code a little:
foreach (int catId in catIds.Distinct())
{
var allCatNameMatches = nameMatches.Where(c => c.Match.CatId == catId);
DatabaseMatch<CatName> primaryMatch = null; //stands in for the FirstOrDefault call
nameMatches = nameMatches.Except(allCatNameMatches.Where(c => c != primaryMatch));
}
And a little more:
foreach (int catId in catIds.Distinct())
{
nameMatches = nameMatches.Where(c => c.Match.CatId == catId);
DatabaseMatch<CatName> primaryMatch = null; //stands in for the FirstOrDefault call
nameMatches = nameMatches.Except(nameMatches.Where(c => c != primaryMatch));
}
In the latter it is obvious that, due to deferred execution, each pass of the foreach body lengthens the chain of Where and Except. Now remember that var primaryMatch = allCatNameMatches.FirstOrDefault(...) is not deferred: on each iteration of the foreach it executes the entire chain built so far. That is why it hangs.
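Given that, the minimal fix (and effectively what the asker did) is to materialize each step with ToList() so the chain is executed once per iteration instead of being re-run in full every time:
foreach (int catId in catIds.Distinct())
{
    //ToList() forces evaluation here, so nameMatches stays a plain list
    //instead of an ever-growing chain of deferred Where/Except operators.
    var allCatNameMatches = nameMatches.Where(c => c.Match.CatId == catId).ToList();
    var primaryMatch = allCatNameMatches.FirstOrDefault(c => c.Match.NameType == "Primary Name");
    nameMatches = nameMatches.Except(allCatNameMatches.Where(c => c != primaryMatch)).ToList();
}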
Consider the following code snippet:
List<Order> orderList; //This list is pre-populated
foreach (System.Web.UI.WebControls.ListItem item in OrdersChoiceList.Items) // OrdersChoiceList is of type System.Web.UI.WebControls.CheckBoxList
{
foreach (Order o in orderList)
{
if (item.id == o.id)
{
item.Selected = scopeComputer.SelectedBox;
break;
}
}
}
There are thousands of items in these lists, hence the loops are time-consuming. How can we optimize them?
Also, how can we do the same thing with LINQ? I tried using a join operation but was not able to set the value of the "Selected" variable based on "SelectedBox". For now I have hardcoded the value in the select clause to "true"; how can we pass and use the SelectedBox value in the select clause?
var v = (from c in ComputersChoiceList.Items.Cast<ListItem>()
join s in scopeComputers on c.Text equals s.CName
select c).Select(x=>x.Selected = true);
I think you need to eliminate the nested iteration. As you state, both lists have a large set of items. If they both have 5,000 items, then you're looking at 25,000,000 iterations in the worst case.
There's no need to continually re-iterate orderList for every single ListItem. Instead create an ID lookup so you have fast O(1) lookups for each ID. Not sure what work is involved hitting scopeComputer.SelectedBox, but that may as well be resolved once outside the loop as well.
bool selectedState = scopeComputer.SelectedBox;
HashSet<int> orderIDs = new HashSet<int>(orderList.Select(o => o.id));
foreach (System.Web.UI.WebControls.ListItem item in OrdersChoiceList.Items)
{
if (orderIDs.Contains(item.id))
item.Selected = selectedState;
}
Using a HashSet lookup, you're now only iterating 5,000 times, with a super-fast lookup for each item.
EDIT: From what I can tell, there's no id property on ListItem, but I'm assuming that the code you've posted is condensed for brevity, but largely representative of your overall process. I'll keep my code API/usage to match what you have there; I'm assuming it's translatable back to your specific implementation.
EDIT: Based on your edited question, I think you're doing yet another lookup/iteration on retrieving the scopeComputer reference. Similarly, you can make another lookup for this:
HashSet<int> orderIDs = new HashSet<int>(orderList.Select(o => o.id));
Dictionary<string, bool> scopeComputersSelectedState =
scopeComputers.ToDictionary(s => s.CName, s => s.Selected);
foreach (System.Web.UI.WebControls.ListItem item in OrdersChoiceList.Items)
{
if (orderIDs.Contains(item.id))
item.Selected = scopeComputersSelectedState[item.Text];
}
Again, I'm not sure of the exact types/usage you have. You could also condense this down to a single LINQ query, but I don't think (performance-wise) you would see much of an improvement. I'm also assuming that there is a matching ScopeComputer for every ListItem.Text entry; otherwise you'll get an exception when accessing scopeComputersSelectedState[item.Text]. If not, it should be a trivial exercise for you to change it to perform a TryGetValue lookup instead.
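For reference, a hedged sketch of that TryGetValue variant (with the same assumed id property as above):
foreach (System.Web.UI.WebControls.ListItem item in OrdersChoiceList.Items)
{
    bool selectedState;
    //Only touch items that have both a matching order and a known scope computer.
    if (orderIDs.Contains(item.id) &&
        scopeComputersSelectedState.TryGetValue(item.Text, out selectedState))
    {
        item.Selected = selectedState;
    }
}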
I've got two parts to this question:
Is there a way to get rid of the outer loop?
List<Document> newDocuments = new List<Document>();
foreach (DocumentDetail documentDetail in documentDetails)
{
newDocuments.AddRange(documents.FindAll(d => d.Extension.ToUpperInvariant() == documentDetail.Extension.ToUpperInvariant()));
}
As you can see from the above, I'm just dealing with the "Extension", and even if I end up keeping my outer foreach loop, I still want to check every other property (i.e. description, etc...), so I may end up with multiple calls to the inner part, and it will end up looking like this, assuming I'm now checking Description:
newDocuments.AddRange(documents.FindAll(d => d.Description.ToUpperInvariant() == documentDetail.Description.ToUpperInvariant()));
The problem with the above is that I could end up with duplicate documents if a document happens to have a .pdf extension and then matches the description.
How can I ensure that no duplicate documents are added? Can I add something to my LINQ query (or to the lambda)? In terms of uniqueness, while not definitive, for now I have access to a "documentno" which is unique for all the documents held in the "documents" list.
If you know the answer to one part or the other, please let me know.
Appreciated!
Edited
What about this? It definitely works, but I'm not sure it's the right way to write it:
List<Document> newDocs = (from documentDetail in documentDetails
from document in documents
where document.Extension.ToUpperInvariant() == documentDetail.Extension.ToUpperInvariant()
select document).Distinct().ToList();
Am I better off just sticking with the foreach loop?
Using the above, I would end up with something like this if I wanted to check multiple properties from the DocumentDetails list against my Document List:
List<Document> newDocuments = null;
if (documentDetails.FindAll(dd => (dd.Extension != null || !string.IsNullOrEmpty(dd.Extension))).Count > 0)
{
newDocuments = (from documentDetail in documentDetails
from document in documents
where document.Extension.ToUpperInvariant() == documentDetail.Extension.ToUpperInvariant()
select document).Distinct().ToList();
}
if (documentDetails.FindAll(dd => (dd.Description != null || !string.IsNullOrEmpty(dd.Description))).Count > 0)
{
newDocuments = (from documentDetail in documentDetails
from document in documents
where document.Description.ToUpperInvariant() == documentDetail.Description.ToUpperInvariant()
select document).Distinct().ToList();
}
While I'd still like someone to confirm that the way I wrote this is correct, I'm also left with how to append the two result sets and then remove all duplicates. I guess when all is done I could apply Distinct, but that still leaves the "appending" issue. How do I do that?
Thanks.
You can get rid of any duplicates after building the list:
List<Document> newDocuments = new List<Document>();
foreach (DocumentDetail documentDetail in documentDetails)
{
newDocuments.AddRange(documents.FindAll(d => d.Extension.ToUpperInvariant() == documentDetail.Extension.ToUpperInvariant()));
}
List<Document> distinctItems = newDocuments.Distinct().ToList();
I'd be looking at something like this (not tested yet - class definitions would help!):
var extensions = documentDetails.Select(d => d.Extension.ToUpperInvariant()).ToList();
var newDocuments = documents.Where(d => extensions.Contains(d.Extension.ToUpperInvariant())).ToList();
What you're logically doing here is a Join:
var newDocuments = documentDetails.Join(documents,
documentDetail => documentDetail.Extension,
document => document.Extension,
(documentDetail, document) => document,
StringComparer.InvariantCultureIgnoreCase)
.Distinct();
(This could be done in query syntax, rather than method syntax, but then you couldn't supply the string comparer which allows you to compare the strings without needing to convert their case. That's the appropriate way of performing a case insensitive comparison.)
This code is, from a functional standpoint, exactly the same as your first edited version, except that it's much more efficient. What you were doing is pairing up every single document with every single detail record and then filtering out just those that you need, under the assumption that the item from each list has a "key" and you want the items where those keys match. This is such a common operation that it has its own operator (Join). It's also possible to optimize this specific case much more heavily than your code does: the Join operator creates only those pairs that it needs; it doesn't match up lots of items that don't belong together only to throw them out later.
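For comparison, a hedged query-syntax version would have to normalize the keys itself, paying for the case conversions that the comparer overload avoids:
var newDocuments = (from documentDetail in documentDetails
                    join document in documents
                    on documentDetail.Extension.ToUpperInvariant() equals document.Extension.ToUpperInvariant()
                    select document).Distinct();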
I'm doing some heavy filtering on a collection (which is nothing more than an encapsulated list of entries of "datalines").
I need to 'consolidate' these lines on 3 fields (Date (string), Route (string) and ConsolidationCode (string)).
Extracting the 3 Distinct Lists works fast. I'm more worried about the triple foreach...
I'd say a normal "complete" _DealerCaseSetComplete contains 5000 entries.
The Dates would be around 5, the Routes around 100 and the Consolidations 350-500.
I have written the following method. It does exactly what I want it to do, but is very slow in calculation time.
Perhaps you could guide me towards faster code execution.
If you require any other code (which is really plain, actually), please ask.
private void FillDataGridView()
{
//Create the grid on the UI thread
_LocalGridControl.Invoke(CreateDataGrid);
//Filter by Date
List<string> Dates = _DealerCaseSetComplete.Data.Select(rec => rec.DateAdded).Distinct().ToList();
//Filter by Route
List<string> Routes = _DealerCaseSetComplete.Data.Select(rec => rec.Route).Distinct().ToList();
//Filter by Consolidation
List<string> Consolidations = _DealerCaseSetComplete.Data.Select(rec => rec.DealerConsolidationCode).Distinct().ToList();
foreach(string d in Dates)
{
foreach(string r in Routes)
{
foreach(string c in Consolidations)
{
List<DealerCaseLine> Filter = _DealerCaseSetComplete.Data.Where(rec => rec.DateAdded == d &&
rec.Route == r &&
rec.DealerConsolidationCode == c).ToList();
if(Filter.Count > 0)
_LocalGridControl.Invoke(AddLineToDataGrid, Filter);
}
}
}
_LocalGridControl.Invoke(SortDataGrid);
}
Looks like you need grouping by three fields:
var filters = from r in _DealerCaseSetComplete.Data
group r by new {
r.DateAdded,
r.Route,
r.DealerConsolidationCode
} into g
select g.ToList();
foreach(List<DealerCaseLine> filter in filters)
_LocalGridControl.Invoke(AddLineToDataGrid, filter);
Your code iterates over all the data three times to get the distinct fields. Then it iterates over all the data once for every combination of distinct fields (when you filter with the where clause). With grouping by these three fields you iterate over the data only once. Each resulting group has at least one item, so you don't need to check whether a group contains any items before passing it to AddLineToDataGrid.
It looks like you're trying to get every distinct combination of Dates, Routes and Consolidations.
Your current code is slow because it is, I think, O(n^4). You have three nested loops, the body of which is a linear search.
You can get much better performance by using the overload of Distinct that takes an IEqualityComparer<T>:
http://msdn.microsoft.com/en-us/library/bb338049.aspx
var Consolidated =
    _DealerCaseSetComplete.Data.Distinct(new DealerCaseComparer());
The class DealerCaseComparer would be implemented much as in the above MSDN link.
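A hedged sketch of what that comparer might look like, assuming the three fields are strings as above:
class DealerCaseComparer : IEqualityComparer<DealerCaseLine>
{
    public bool Equals(DealerCaseLine x, DealerCaseLine y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.DateAdded == y.DateAdded &&
               x.Route == y.Route &&
               x.DealerConsolidationCode == y.DealerConsolidationCode;
    }

    public int GetHashCode(DealerCaseLine obj)
    {
        //Combine the three field hashes; nulls hash to 0.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (obj.DateAdded == null ? 0 : obj.DateAdded.GetHashCode());
            hash = hash * 31 + (obj.Route == null ? 0 : obj.Route.GetHashCode());
            hash = hash * 31 + (obj.DealerConsolidationCode == null ? 0 : obj.DealerConsolidationCode.GetHashCode());
            return hash;
        }
    }
}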
There are quite a few other questions similar to this, but none of them seem to do what I'm trying to do. I'd like to pass in a list of strings and query:
SELECT ownerid where sysid in ('', '', '') -- i.e. List<string>
or like
var chiLst = new List<string>();
var parRec = Lnq.attlnks.Where(a => a.sysid IN chiLst).Select(a => a.ownerid); //pseudocode - IN is not valid C#
I've been playing around with a.sysid.Contains() but haven't been able to get anywhere.
Contains is the way forward:
var chiLst = new List<string>();
var parRec = Lnq.attlnks.Where(a => chiLst.Contains(a.sysid))
.Select(a => a.ownerid);
Although you'd be better off with a HashSet<string> instead of a list, in terms of performance, given all the Contains checks. (That's assuming there will be quite a few entries... for a small number of values, it won't make much difference either way, and a List<string> may even be faster.)
Note that the performance aspect is assuming you're using LINQ to Objects for this - if you're using something like LINQ to SQL, it won't matter as the Contains check won't be done in-process anyway.
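For example (a sketch, assuming LINQ to Objects):
//Contains on a HashSet is O(1) rather than O(n) on a List.
var chiSet = new HashSet<string>(chiLst);
var parRec = Lnq.attlnks.Where(a => chiSet.Contains(a.sysid))
                        .Select(a => a.ownerid);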
You wouldn't call a.sysid.Contains; the syntax for IN (SQL) is the reverse of the syntax for Contains (LINQ)
var parRec = Lnq.attlnks.Where(a => chiLst.Contains(a.sysid))
.Select(a => a.ownerid);
In addition to the Contains approach, you could join:
var parRec = from a in Lnq.attlnks
join sysid in chiLst
on a.sysid equals sysid
select a.ownerid;
I'm not sure whether this will do better than Contains with a HashSet, but it will at least have similar performance. It will certainly do better than using Contains with a list.