Multiple LINQ Where vs foreach on List<T>


If I have a collection:
List<T> collection;
and I need to perform two tests on this collection, which is more efficient:
foreach (T t in collection.Where(w => w.value == true))
{
t.something = true;
}
foreach (T t in collection.Where(w => w.value2 == true))
{
t.something2 = true;
}
Or
foreach(T t in collection)
{
if (t.value == true)
{
//check 1
}
if (t.value2 == true)
{
//check 2
}
}
I think it'll be the latter, because I presume that each Where will iterate the collection, but I just wanted to be sure I wasn't missing something.

It might vary with the collection size, but I would go with the second code.
The first code enumerates the entire collection twice, once per Where. (Where is lazy, so the filtering happens while each foreach runs rather than in a separate pass.)
The second code iterates the entire collection just once. Also, it's cleaner.

The second will most likely be slightly faster, but the difference is quite small.
What might be more important is that you do things in a different order. The first code loops through all items acting on one condition first, then again acting on the other condition. The second code loops through the items once, checking both conditions for each item in turn. Depending on what you are doing with the items, that may make a difference.

The first case enumerates your collection twice (using a Where iterator to do the check for you); the second enumerates it only once, but you have to do the checks manually. The latter is also more efficient because it uses a simple conditional expression to test each item, while the Where iterator has to call the supplied delegate on each item.
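To make the "twice versus once" point concrete, here is a minimal sketch that counts how often each item is visited under both approaches. The Item class and the PassCounter helper are hypothetical stand-ins for the question's T and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type standing in for T from the question.
public class Item
{
    public bool Value;
    public bool Value2;
    public bool Something;
    public bool Something2;
}

public static class PassCounter
{
    // Two filtered loops: each Where walks the whole list,
    // so every item is tested twice in total.
    public static int TwoWherePasses(List<Item> items)
    {
        int visits = 0;
        foreach (var t in items.Where(w => { visits++; return w.Value; }))
            t.Something = true;
        foreach (var t in items.Where(w => { visits++; return w.Value2; }))
            t.Something2 = true;
        return visits;
    }

    // One loop with two ifs: every item is visited once.
    public static int SinglePass(List<Item> items)
    {
        int visits = 0;
        foreach (var t in items)
        {
            visits++;
            if (t.Value) t.Something = true;
            if (t.Value2) t.Something2 = true;
        }
        return visits;
    }
}
```

For a list of n items the first method reports 2n visits and the second n. Whether that matters in practice depends on the list size and on how expensive the loop body is.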

Related

Find and change the value of first element in list that meets a requirement, do something else if not found

This seems like it should be ridiculously easy, but I just can't find a way to do it. Basically the title: I want to find the first item in my list that meets a requirement and modify the value of that item, and if none of the items in the list meets it, do something else.
I was using a foreach loop to do this, but it is definitely not the fastest way.
foreach (CustomClass foo in bar)
{
if (!foo.Value)
{
foo.Value = true;
currentCount++;
break;
}
}
I then tried using List.First() and catching the exception when it can't find the value, but that is far slower, and I'm looking for performance.
EDIT: Never mind about what is below, I found how to make first or default work, but is there a faster way to do this multiple times than the foreach method? Thanks
So I tried FirstOrDefault, but I keep getting null reference exception
if (bar.FirstOrDefault(c => c.Value == false).Equals(null))
{
break;
}
else
{
thePicture.FirstOrDefault(c => c.Value == false).Value = true;
currentCount++;
}
Does anyone know how to make FirstOrDefault work? Or is there any other way to do this faster than the foreach method? (This will be run in another loop a lot of times.) Thanks!

FirstOrDefault will return a null reference if no element is found - assuming the element type is a reference type. Instead of calling Equals on the result, just use ==... and don't call it twice:
var first = bar.FirstOrDefault(c => !c.Value);
if (first == null)
{
...
}
else
{
// Use first, I suspect.
// (You don't in the sample code, but...)
}
Note that this won't be faster than an appropriate foreach loop, but it can be more readable.
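As a rough sketch of that pattern wrapped in a reusable method: the Flag class below is a hypothetical stand-in for the question's CustomClass, and TryMarkFirstUnset is an illustrative name, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's CustomClass.
public class Flag
{
    public bool Value;
}

public static class FirstMatch
{
    // Sets Value on the first element whose Value is false.
    // Returns false when no such element exists, so the caller
    // can "do something else" in that case.
    public static bool TryMarkFirstUnset(List<Flag> bar)
    {
        var first = bar.FirstOrDefault(c => !c.Value);
        if (first == null)
            return false;

        first.Value = true;
        return true;
    }
}
```

The caller can then write `if (FirstMatch.TryMarkFirstUnset(bar)) currentCount++; else ...`, which searches the list once per call instead of twice.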

(bar != null) ? ((bar[0].Value == true) ? (do something) : (do something)) : (do something)
Here you are only checking the first element in the list, right?
So why use a loop?

Converting foreach to Linq

Current Code:
For each element in MapEntryTable, check the properties IsDisplayedColumn and IsReturnColumn, and if either is true, add the element to the corresponding list. The running time is O(n). Many elements have both properties false, so they will not be added to any of the lists in the loop.
foreach (var mapEntry in MapEntryTable)
{
if (mapEntry.IsDisplayedColumn)
Type1.DisplayColumnId.Add(mapEntry.OutputColumnId);
if (mapEntry.IsReturnColumn)
Type1.ReturnColumnId.Add(mapEntry.OutputColumnId);
}
Following is the Linq version of doing the same:
MapEntryTable.Where(x => x.IsDisplayedColumn == true).ToList().ForEach(mapEntry => Type1.DisplayColumnId.Add(mapEntry.OutputColumnId));
MapEntryTable.Where(x => x.IsReturnColumn == true).ToList().ForEach(mapEntry => Type1.ReturnColumnId.Add(mapEntry.OutputColumnId));
I am converting all such foreach code to LINQ as I am learning it, but my questions are:
Do I get any advantage from the LINQ conversion in this case, or is it a disadvantage?
Is there a better way to do the same using LINQ?
UPDATE:
Consider the case where, out of 1000 elements in the list, 80% have both properties false. Does Where then give me the benefit of quickly finding the elements that match a given condition?
Type1 is a custom type with set of List<int> structures, DisplayColumnId and ReturnColumnId

ForEach isn't a LINQ method. It's a method of List. And not only is it not a part of LINQ, it's very much against the values and patterns of LINQ. Eric Lippert explains this in a blog post that was written when he was a principal developer on the C# compiler team.
Your "LINQ" approach also:
Completely unnecessarily copies all of the items to be added into a list, which is both wasteful in time and memory and also conflicts with LINQ's goals of deferred execution when executing queries.
Isn't actually a query with the exception of the Where operator. You're acting on the items in the query, rather than performing a query. LINQ is a querying tool, not a tool for manipulating data sets.
Iterates the source sequence twice. This may or may not be a problem, depending on what the source sequence actually is and what the costs of iterating it are.
A solution that uses LINQ for what it is designed for would look like this:
foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsDisplayedColumn))
list1.DisplayColumnId.Add(mapEntry.OutputColumnId);
foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsReturnColumn))
list2.ReturnColumnId.Add(mapEntry.OutputColumnId);

I would say stick with the original foreach loop, since it iterates through the list only once.
Also, your LINQ should look more like this:
list1.DisplayColumnId.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn).Select(mapEntry => mapEntry.OutputColumnId));
list2.ReturnColumnId.AddRange(MapEntryTable.Where(x => x.IsReturnColumn).Select(mapEntry => mapEntry.OutputColumnId));

The performance of foreach and List<T>.ForEach is almost exactly the same, within nanoseconds of each other, assuming the loop body is the same in both versions when testing.
However, a for loop outperforms both by a LARGE margin. for (int i = 0; i < count; ++i) is much faster than both, because a for loop doesn't rely on an IEnumerable implementation (overhead). The for loop compiles to x86 register index/jump code. It maintains a counter, and it's up to you to retrieve the item by its index in the loop.
Using ForEach does have a big disadvantage, though: you cannot break out of the loop. If you need to do that, you have to maintain a boolean like "breakLoop = false", set it to true, and have each remaining call return immediately if breakLoop is true... That performs badly. Secondly, you cannot use continue; instead you use "return".
I never use the ForEach method.
If you are dealing with linq, e.g.
List<Thing> things = .....;
var oldThings = things.Where(p => p.DateTime.Year < DateTime.Now.Year);
That internally will foreach with linq and give you back only the items with a year less than the current year. Cool..
But if I am doing this:
List<Thing> things = new List<Thing>();
foreach(XElement node in Results) {
things.Add(new Thing(node));
}
I don't need to use a ForEach call. Even if I did...
foreach (var node in thingNodes.Where(p => p.NodeType == "Thing")) {
if (node.Ignore) {
continue;
}
things.Add(node);
}
even though I could write that more cleanly as
foreach (var node in thingNodes.Where(p => p.NodeType == "Thing" && !p.Ignore)) {
things.Add(node);
}
There is no real reason I can think of to do this:
things.ForEach(thing => {
//do something
//can't break
//can't continue
return; //<- continue
});
And if I want the fastest loop possible,
for (int i = 0; i < things.Count; ++i) {
var thing = things[i];
//do something
}
Will be faster.

Your LINQ isn't quite right, as you're converting the results of Where to a List and then pseudo-iterating over those results with ForEach to add to another list. Use ToList or AddRange for converting or adding sequences to lists.
For example, to overwrite list1 (if it were actually a List<T>):
list1 = MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId).ToList();
or to append:
list1.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId));

In C#, to do what you want functionally in one call, you have to write your own partition method. If you are open to using F#, you can use List.partition:
https://msdn.microsoft.com/en-us/library/ee353782.aspx
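A hand-written C# partition method might look like the sketch below. Partition is an illustrative name for a custom extension method and is not part of LINQ. It splits a sequence on a single predicate, so it fits cases where every item lands in exactly one of two lists; the question's two independent flags would still need two conditions.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionExtensions
{
    // Splits source into (matching, non-matching) lists in a single pass,
    // similar in spirit to F#'s List.partition.
    public static (List<T> Matching, List<T> NonMatching) Partition<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var matching = new List<T>();
        var nonMatching = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
                matching.Add(item);
            else
                nonMatching.Add(item);
        }
        return (matching, nonMatching);
    }
}
```

Usage would be along the lines of `var (evens, odds) = numbers.Partition(n => n % 2 == 0);`, which requires C# 7 or later for the tuple syntax.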

Any benefit to loop with nested condition in the ReSharper way?

When using a foreach loop with a nested condition inside, I always write it in the following way:
foreach (RadioButton item in listOfRadioButtons)
{
if (item.IsChecked == true)
{
// something
}
}
But I've installed ReSharper and it suggests to change this loop to the following form (removing the if and using a lambda):
foreach (RadioButton item in listOfRadioButtons.Where(item => item.IsChecked == true))
{
// something
}
In my experience, the ReSharper way will loop two times: once to generate the filtered IEnumerable, and then again to loop over the results of the .Where query.
Am I correct? If so, why is ReSharper suggesting this? In my opinion, the first form is also more reliable.
Note: The IsChecked property of the WPF RadioButton is a nullable bool, so it needs a == true, a .Value, or a cast to bool inside a condition to produce a bool.

"In my experience, the ReSharper way will loop two times: one to generate the filtered IEnumerable, and after to loop the results of the .Where query again."
Nope, it will loop only once. Where does not loop over your collection; it only creates an iterator that will be used to enumerate your collection. Here is roughly what the LINQ solution looks like:
using (var iterator = listOfRadioButtons.Where(rb => rb.IsChecked == true).GetEnumerator())
{
while(iterator.MoveNext())
{
RadioButton item = iterator.Current;
// something
}
}
Your original code is better for performance: you avoid creating a delegate, passing it to an instance of WhereEnumerableIterator, and then executing the delegate for each item in the source sequence. But note that, as @dcastro pointed out, the difference will be really small and is not worth worrying about until you have to optimize this particular loop.
The solution suggested by ReSharper is (maybe) better for readability. I personally like a simple if condition in a loop.
UPDATE: The Where iterator can be simplified to the following (some interfaces are omitted):
public class WhereEnumerableIterator<T> : IEnumerable<T>, IDisposable
{
private IEnumerator<T> _enumerator;
private Func<T,bool> _predicate;
public WhereEnumerableIterator(IEnumerable<T> source, Func<T,bool> predicate)
{
_predicate = predicate;
_enumerator = source.GetEnumerator();
}
public bool MoveNext()
{
while (_enumerator.MoveNext())
{
if (_predicate(_enumerator.Current))
{
Current = _enumerator.Current;
return true;
}
}
return false;
}
public T Current { get; private set; }
public void Dispose()
{
if (_enumerator != null)
_enumerator.Dispose();
}
}
The main idea here is that it enumerates the original source only when you ask it to move to the next item. The iterator then advances through the original source until it finds an item that matches the predicate. If a match is found, it returns that item and puts enumeration of the source on hold.
So until you ask this iterator for items, it will not enumerate the source. If you call ToList() on the query, it will enumerate the whole source sequence and save all matching items to a new list.
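The deferred behaviour is easy to observe by counting predicate calls. The following is a minimal sketch; DeferredDemo and CountPredicateCalls are illustrative names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns how many times the predicate had run
    // before and after the query was enumerated.
    public static (int BeforeEnumeration, int AfterEnumeration) CountPredicateCalls()
    {
        var source = new List<int> { 1, 2, 3, 4 };
        int calls = 0;

        // Building the query does not touch the source yet.
        IEnumerable<int> query = source.Where(x => { calls++; return x % 2 == 0; });
        int before = calls;

        // Enumerating the query runs the predicate once per source element.
        foreach (var item in query)
        {
            // nothing to do; we only care about the side effect on calls
        }

        return (before, calls);
    }
}
```

The count is still zero right after Where is called and equals the number of source elements after the single foreach, which is the "loops only once" behaviour described above.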

A better way to loop through lists

So I have a couple of different lists that I'm trying to process and merge into one list.
Below is a snippet of code, and I want to see if there is a better way of doing it.
The reason I'm asking is that some of these lists are rather large, so I want to see if there is a more efficient way of doing this.
As you can see, I'm looping through a list, and the first thing I do is check whether the CompanyId exists in the list. If it does, I find the item in the list that I'm going to process.
pList is my processing list. I'm adding the values from my different lists into this list.
I'm wondering if there is a "better way" of accomplishing the Exists and Find.
bool tstFind = false;
foreach (parseAC item in pACList)
{
tstFind = pList.Exists(x => (x.CompanyId == item.key.ToString()));
if (tstFind == true)
{
pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
//Processing done here. pItem gets updated here
...
}
}
Just as a side note, I'm going to be researching a way to use joins to see if that is faster. But I haven't gotten there yet. The above code is my first cut at solving this issue and it appears to work. However, since I have the time I want to see if there is a better way still.
Any input is greatly appreciated.
Time Findings:
My current Find and Exists code takes about 84 minutes to loop through the 5.5M items in pACList.
Using pList.FirstOrDefault(x => x.CompanyId == item.key.ToString()); takes 54 minutes to loop through the 5.5M items in pACList.

You can retrieve the item with FirstOrDefault instead of searching for it twice (once to determine whether the item exists, and a second time to get it):
var tstFind = pList.FirstOrDefault(x => x.CompanyId == item.key.ToString());
if (tstFind != null)
{
//Processing done here. pItem gets updated here
}

Yes, use a hash table so that your algorithm is O(n + m) instead of the O(n*m) it is right now.
var pListByCompanyId = pList.ToDictionary(x => x.CompanyId);
foreach (parseAC item in pACList)
{
if (pListByCompanyId.ContainsKey(item.key.ToString()))
{
pItem = pListByCompanyId[item.key.ToString()];
//Processing done here. pItem gets updated here
...
}
}
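Here is a self-contained sketch of the dictionary approach. The Company and Entry classes, their fields, and the "add Amount to Total" processing step are hypothetical stand-ins, since the question does not show the real types. It uses TryGetValue so that each item costs one hash lookup rather than a ContainsKey call plus an indexer call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types.
public class Company
{
    public string CompanyId;
    public int Total;
}

public class Entry
{
    public int Key;
    public int Amount;
}

public static class Merger
{
    // Adds each entry's Amount to the company with the matching id.
    // Returns how many entries found a matching company.
    public static int Merge(List<Company> pList, List<Entry> pACList)
    {
        // Built once in O(N). Note: ToDictionary throws if two
        // companies share the same CompanyId.
        Dictionary<string, Company> byId = pList.ToDictionary(x => x.CompanyId);

        int matched = 0;
        foreach (var item in pACList)
        {
            // One hash lookup replaces the Exists + Find pair.
            if (byId.TryGetValue(item.Key.ToString(), out var pItem))
            {
                pItem.Total += item.Amount; // placeholder for the real processing
                matched++;
            }
        }
        return matched;
    }
}
```

If CompanyId values can repeat in pList, ToLookup or a GroupBy followed by ToDictionary would be needed instead; that is an assumption to check against the real data.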

You can iterate through the filtered list using LINQ:
foreach (parseAC item in pACList.Where(i=>pList.Any(x => (x.CompanyId == i.key.ToString()))))
{
pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
//Processing done here. pItem gets updated here
...
}

Using lists for this type of operation is O(M×N) (M is the count of pACList, N is the count of pList). Additionally, you are searching pList twice per item. To avoid that, use pList.FirstOrDefault as recommended by @lazyberezovsky.
However, if possible I would avoid using lists. A Dictionary indexed by the key you're searching on would greatly improve the lookup time.

Doing a linear search on the list for each item in another list is not efficient for large data sets. It is preferable to put the keys into a table or Dictionary that can be searched much more efficiently, which lets you join the two tables. You don't even need to code this yourself: what you want is a Join operation, which gets all of the pairs of items, one from each sequence, that map to the same key.
Either pull out the implementation of the method below, or change Foo and Bar to the appropriate types and use it as a method.
public static IEnumerable<Tuple<Bar, Foo>> Merge(IEnumerable<Bar> pACList
, IEnumerable<Foo> pList)
{
return pACList.Join(pList, item => item.Key.ToString()
, item => item.CompanyID.ToString()
, (a, b) => Tuple.Create(a, b));
}
You can use the results of this call to merge the two items together, as they will have the same key.
Internally the method will create a lookup table that allows for efficient searching before actually doing the searching.

Convert pList to a HashSet, then query pHashSet.Contains(). Complexity: O(N) + O(n).
Sort pList on CompanyId and use Array.BinarySearch(): O(N log N) + O(n log N).
If the maximum company id is not prohibitively large, simply create an array in which the item with company id i sits at the i-th position. Nothing can be faster.
Here N is the size of pList and n is the size of pACList.

Would removing a key from a Dictionary in a foreach cause a problem? Or should I construct a new Dictionary instead?

For example:
1.
foreach (var item in myDic)
{
if (item.Value == 42)
myDic.Remove(item.Key);
}
Would the iterator work properly no matter how the statements in the loop body affect myDic?
2.
var newDic = myDic.Where(x => x.Value != 42).ToDictionary(x => x.Key, x => x.Value);
Is the second approach good practice, in the spirit of functional programming and immutability?

The first approach will throw an InvalidOperationException at runtime (on .NET Framework and on .NET Core before 3.0), since the enumerator checks that nobody has modified the underlying collection while it's enumerating.
The second approach is a nice thought, but C# dictionaries are mutable, and it's neither idiomatic nor efficient to copy them around if you can accomplish the same thing with mutation.
This is a typical way:
var itemsToRemove = myDic.Where(f => f.Value == 42).ToArray();
foreach (var item in itemsToRemove)
myDic.Remove(item.Key);
EDIT: In response to your question in the comments, here's how the example in your other question works:
myList = myList.Where(x => x > 10).Select(x => x - 10);
This line of code doesn't run anything; it's totally lazy. Let's say for the sake of argument that we have a foreach after it, to make it look more like this question's example.
foreach (int n in myList)
Console.WriteLine(n);
When that executes, here's what'll happen on each iteration:
Call MoveNext on the enumerator
The enumerator finds the next value greater than ten
Then it takes that value minus ten and sets the Current property to that
Binds the Current property to the variable n
Console.WriteLines it
You can see that there's no mystery and no infinite loop and no whatever.
Now compare to my example, supposing we left out the ToArray.
var itemsToRemove = myDic.Where(f => f.Value == 42);
foreach (var item in itemsToRemove)
myDic.Remove(item.Key);
Call MoveNext on the enumerator
The enumerator finds the next pair with value 42 and sets the Current property to that
Binds the Current property to the variable item
Removes it
This doesn't work because while it's perfectly fine to WriteLine something from a collection while you have an enumerator open on it, you aren't permitted to Remove something from a collection while you have an enumerator open on it.
If you call ToArray up front, then you start out by enumerating over the dictionary and populating the array. When we get to the foreach, the foreach statement has an enumerator open on the array, not the dictionary. You're allowed to remove from the dictionary as you iterate over the array.
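The snapshot-then-remove pattern can be packaged as a small helper. This is only a sketch: DictionaryCleanup and RemoveByValue are illustrative names, and the string and int type arguments are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryCleanup
{
    // Removes every entry whose value equals target.
    // Returns how many entries were removed.
    public static int RemoveByValue(Dictionary<string, int> dic, int target)
    {
        // ToArray enumerates the dictionary to completion here, so the
        // foreach below has an enumerator open on the array only.
        string[] keys = dic.Where(kv => kv.Value == target)
                           .Select(kv => kv.Key)
                           .ToArray();

        foreach (var key in keys)
            dic.Remove(key);

        return keys.Length;
    }
}
```

Copying only the matching keys keeps the snapshot small when few entries match.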

You can also iterate over a copy of your collection:
foreach (var item in myDic.ToList())
{
if (item.Value == 42)
myDic.Remove(item.Key);
}
Notice the myDic.ToList() in the foreach statement.

According to the docs, starting from .NET Core 3.0, removing an element will no longer affect active enumerators. You can safely remove an item while iterating:
foreach (var item in myDic)
{
if (item.Value == 42)
myDic.Remove(item.Key);
}
Dictionary<TKey,TValue>.Remove Method
.NET Core 3.0+ only: this mutating method may be safely called without invalidating active enumerators on the Dictionary<TKey,TValue> instance. This does not imply thread safety.

Rather than copying the entire dictionary into an array, as others have suggested, I would copy only the keys.
mykeytype[] mykeys = new mykeytype[mydic.Keys.Count];
mydic.Keys.CopyTo(mykeys, 0);
foreach (var key in mykeys)
{
MyType thing;
if (!mydic.TryGetValue(key, out thing)) continue;
// remove or add to dictionary here
}
