LINQ vs nested loop - c#

The code I'm maintaining has a common pattern like the following: a nested loop with an if to find certain elements.
foreach (Storage storage in mStorage.Values)
foreach (OrderStorage oStorage in storage.OrderStorage)
if (oStorage.OrderStorageId == orderStorageId)
I was thinking to change this into LINQ:
foreach (OrderStorage oStorage in (from storage in mStorage.Values
from oStorage in storage.OrderStorage
where oStorage.OrderStorageId == orderStorageId
select oStorage))
But it doesn't seem all that appealing, because it's less transparent what's going on here, and more objects may be created, costing performance in terms of both memory and CPU. Will there actually be more objects created or does the C# compiler emit code resembling the nested loop with an if inside?

Will there actually be more objects created or does the C# compiler emit code resembling the nested loop with an if inside?
More objects; each LINQ operation (SelectMany, Where, Select etc) will result in a new place-holder object that represents the pending IEnumerable<T> query of that operation, and then when it is finally iterated, each of those will result in an enumerator instance, along with the context etc. Plus there is a captured-variable context for the hoisted orderStorageId, etc.
Note that the regular foreach will also result in an enumerator instance, but foreach has the advantage that it can also use duck-typed enumerators - which means that for things like List<T> it is actually using a struct enumerator, not a class enumerator. And of course using the local variable (orderStorageId) directly (rather than in an anonymous method) means that it doesn't need to be hoisted into a state/context object.
So yes, the raw foreach is more direct and efficient. The interesting question is: is the difference important? Sometimes it is, sometimes it isn't.
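To make the comparison concrete, here is a minimal sketch of both forms side by side. The Storage and OrderStorage classes are hypothetical stand-ins for the asker's types, and the LINQ version is a hand-written method-syntax equivalent, not the exact code the compiler generates for the query expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the types in the question.
public class OrderStorage
{
    public int OrderStorageId { get; set; }
}

public class Storage
{
    public List<OrderStorage> OrderStorage { get; } = new List<OrderStorage>();
}

public static class LoopVersusLinq
{
    // Nested loops: orderStorageId is used directly as a parameter, and the
    // inner foreach binds to List<T>'s struct enumerator.
    public static List<OrderStorage> FindWithLoops(IEnumerable<Storage> storages, int orderStorageId)
    {
        var matches = new List<OrderStorage>();
        foreach (Storage storage in storages)
            foreach (OrderStorage oStorage in storage.OrderStorage)
                if (oStorage.OrderStorageId == orderStorageId)
                    matches.Add(oStorage);
        return matches;
    }

    // LINQ: each operator returns a new deferred sequence object, and the
    // lambda captures orderStorageId in a compiler-generated closure.
    public static List<OrderStorage> FindWithLinq(IEnumerable<Storage> storages, int orderStorageId)
    {
        return storages
            .SelectMany(storage => storage.OrderStorage)
            .Where(oStorage => oStorage.OrderStorageId == orderStorageId)
            .ToList();
    }

    public static void Main()
    {
        var first = new Storage();
        first.OrderStorage.Add(new OrderStorage { OrderStorageId = 1 });
        first.OrderStorage.Add(new OrderStorage { OrderStorageId = 2 });
        var second = new Storage();
        second.OrderStorage.Add(new OrderStorage { OrderStorageId = 2 });
        var storages = new List<Storage> { first, second };

        Console.WriteLine(FindWithLoops(storages, 2).Count); // 2
        Console.WriteLine(FindWithLinq(storages, 2).Count);  // 2
    }
}
```

Both methods return the same elements; the difference is only in the intermediate objects allocated along the way, which matters mainly in hot paths.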

Related

Is foreach the same as a classic for?

Regarding collections that implement this[int], and assuming the collection won't change during the enumeration, does the foreach (var item in list) loop always produce the same sequence as for (var i = 0; i < list.Count; ++i)?
In other words, when I need ascending order by index, can I use foreach, or is it simply safer to use for? Or does it depend on the current collection implementation, so that it might vary or change over time?
foreach (var item in list)
{
// do things
}
translates to
var enumerator = list.GetEnumerator();
while(enumerator.MoveNext())
{
var item = enumerator.Current;
// do things
}
So as you can see, it's not using the indexer list[i] in the general case.
For most collection types, however, the semantics are the same.
edit
There are IList<T> implementations where the enumerator does not go through the indexer at all. If you implement IList<T> as a linked list, for example, it's very unlikely you will use the indexer in your enumerator implementation, as it would be very inefficient.
As a rule of thumb, using foreach ensures you use the most efficient algorithm for the class at hand, as it is the one chosen by the class's author. In the worst case, you will just suffer a small indirection overhead that is very unlikely to be noticeable.
edit 2 after nos's comment
There is a case where the semantics of the two constructs vary widely: modification of the collection.
With a simple for loop, nothing particular will happen if you change the collection while iterating through it. The program will behave as if it assumed you know what you're doing. This could result in some values being iterated over more than once or others being skipped, but no exception as long as you're not accessing outside of the range of the indexer (which would require a multithreaded program to happen).
With a foreach loop, if you modify the collection while iterating through it, you enter undefined behavior. The documentation tells us:
An enumerator remains valid as long as the collection remains
unchanged. If changes are made to the collection, such as adding,
modifying, or deleting elements, the enumerator is irrecoverably
invalidated and its behavior is undefined.
In that case, expect most of the built-in .NET collection types to throw an InvalidOperationException, but anything can happen in a custom implementation, from missed values to repeated values, including infinite loops...
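The contrast described above can be reproduced with List<int>. This is a small sketch; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ModifyWhileIterating
{
    // Returns true if removing an element inside foreach raised
    // InvalidOperationException, which is what List<T> does.
    public static bool ForeachThrows()
    {
        var list = new List<int> { 1, 2, 3, 4 };
        try
        {
            foreach (int item in list)
                if (item == 2) list.Remove(item);
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        return false;
    }

    // The same removal inside a for loop does not throw, but it silently
    // skips the element that slides into the removed slot.
    public static List<int> VisitedByFor()
    {
        var list = new List<int> { 1, 2, 3, 4 };
        var visited = new List<int>();
        for (int i = 0; i < list.Count; i++)
        {
            visited.Add(list[i]);
            if (list[i] == 2) list.RemoveAt(i); // 3 moves to index 1 and is never visited
        }
        return visited;
    }

    public static void Main()
    {
        Console.WriteLine(ForeachThrows());                  // True
        Console.WriteLine(string.Join(", ", VisitedByFor())); // 1, 2, 4
    }
}
```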
Generally speaking, yes, but strictly speaking: no. It really depends on the implementation.
Usually with for you would use the this[int] indexer property. foreach uses GetEnumerator() to get the enumerator that iterates over the collection. Depending on the implementation, the enumerator might yield a different result from the for loop.
The implied logic of a list is that it has a specific order, and when implementing IList you could argue it is safe to assume that the indexer and the enumerator use the same order.
There is no guarantee that this would be the case. The code paths can be completely separate. Of course collections like List will produce the same result but you can write data structures (even useful ones) that do not.
The indexer is just a property with additional index argument. You can return a random value if you feel like it.
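As an illustration of separate code paths, here is a sketch of a hypothetical History<T> collection whose indexer counts from the oldest item while its enumerator walks from the newest. Nothing in the language prevents this.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection: indexer and enumerator deliberately disagree on order.
public class History<T> : IEnumerable<T>
{
    private readonly List<T> _items = new List<T>();

    public int Count { get { return _items.Count; } }

    public void Add(T item) { _items.Add(item); }

    // Index 0 is the oldest entry.
    public T this[int index] { get { return _items[index]; } }

    // Enumeration starts with the newest entry.
    public IEnumerator<T> GetEnumerator()
    {
        for (int i = _items.Count - 1; i >= 0; i--)
            yield return _items[i];
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class HistoryDemo
{
    public static void Main()
    {
        var history = new History<string> { "a", "b", "c" };

        for (int i = 0; i < history.Count; i++)
            Console.Write(history[i]); // abc
        Console.WriteLine();

        foreach (string entry in history)
            Console.Write(entry);      // cba
        Console.WriteLine();
    }
}
```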
One important thing to keep in mind, as a difference between the two, is that inside foreach you can't make any changes to the collection being enumerated.
If you wish to alter the collection (basically, delete objects from it) during the iteration, you must use a for loop.

Select and ForEach on List<> [duplicate]

I am quite new to C# and was trying to use lambda expressions.
I am having a list of object. I would like to select items from the list and perform a foreach operation on the selected items. I know I could do it without using a lambda expression, but I wanted to know if this was possible using a lambda expression.
So I was trying to achieve a similar result
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
List<UserProfile> selecteditem = users.Where(i => i.UserName=="").ToList();
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
It was possible to do
users.Where(i => i.UserName=="").ToList().ForEach(i=>i.UserName="NA");
but not something like this
users.select(i => i.UserName=="").ForEach(i=>i.UserName="NA");
Can someone explain this behaviour..
Let's start here:
I am having a list of object.
It's important to understand that, while accurate, that statement leaves a C# programmer wanting more. What kind of object? In the .Net world, it pays to always keep in mind what specific type of object you are working with. In this case, that type is UserProfile. This may seem like a side issue, but it will become more relevant to the specific question very quickly. What you want to say instead is this:
I have a list of UserProfile objects.
Now let's look at your two expressions:
users.Where(i => i.UserName=="").ToList().ForEach(i=>i.UserName="NA");
and
users.Where(i => i.UserName=="").ForEach(i=>i.UserName="NA");
The difference (aside from the fact that only the first compiles or works) is that you need to call .ToList() to convert the results of the Where() function to a List type. Now we begin to see why it is that you want to always think in terms of types when working with .Net code, because it should now occur to you to wonder, "What type am I working with, then?" I'm glad you asked.
The .Where() function results in an IEnumerable<T> type, which is actually not a full type all by itself. It's an interface that describes certain things a type that implements its contract will be able to do. The IEnumerable interface can be confusing at first, but the important thing to remember is that it defines something that you can use with a foreach loop. That is its sole purpose. Anything in .Net that you can use with a foreach loop (arrays, lists, collections) pretty much always implements the IEnumerable interface. There are other things you can loop over, as well. Strings, for example. Many methods you have today that require a List or Array as an argument can be made more powerful and flexible simply by changing that argument type to IEnumerable.
.Net also makes it easy to create state machine-based iterators that will work with this interface. This is especially useful for creating objects that don't themselves hold any items, but do know how to loop over items in a different collection in a specific way. For example, I might loop over just items 3 through 12 in an array of size 20. Or I might loop over the items in alphabetical order. The important thing here is that I can do this without needing to copy or duplicate the originals. This makes it very efficient in terms of memory, and it's structured in such a way that you can easily compose different iterators together to get very powerful results.
The IEnumerable<T> type is especially important, because it is one of two types (the other being IQueryable) that form the core of the LINQ system. Most of the .Where(), .Select(), .Any() etc. LINQ operators you can use are defined as extensions to IEnumerable.
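The "items 3 through 12 in an array of size 20" case can be sketched with an iterator method. Slice is a hypothetical helper name, not a framework method; the compiler turns the yield return body into the state machine mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Iterators
{
    // Lazily yields source[first] through source[last] (inclusive, zero-based)
    // without copying the array. Nothing runs until the result is enumerated.
    public static IEnumerable<T> Slice<T>(T[] source, int first, int last)
    {
        for (int i = first; i <= last; i++)
            yield return source[i];
    }

    public static void Main()
    {
        int[] numbers = Enumerable.Range(0, 20).ToArray(); // 0..19

        // Iterators compose: slice first, then sort the slice in descending order.
        IEnumerable<int> view = Slice(numbers, 3, 12).OrderByDescending(n => n);

        Console.WriteLine(string.Join(" ", view)); // 12 11 10 9 8 7 6 5 4 3
    }
}
```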
But now we come to an exception: ForEach(). This method is not part of IEnumerable. It is defined directly as part of the List<T> type. So, we see again that it's important to understand what type you are working with at all times, including the results of each of the different expressions that make up a complete statement.
It's also instructive to go into why this particular method is not part of IEnumerable directly. I believe the answer lies in the fact that the LINQ system takes a lot of inspiration from the functional programming world. In functional programming, you want to have operations (functions) that do exactly one thing, with no side effects. Ideally, these functions will not alter the original data, but rather they will return new data. The ForEach() method is implicitly all about creating side effects that alter data. It's just bad functional style. Additionally, ForEach() breaks method chaining, in that it doesn't return a new IEnumerable.
There is one more lesson to learn here. Let's take a look at your original snippet:
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
List<UserProfile> selecteditem = users.Where(i => i.UserName=="").ToList();
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
I mentioned something earlier that should help you significantly improve this code. Remember that bit about how you can have IEnumerable items that loop over a collection, without duplicating it? Think about what happens if you wrote that code this way, instead:
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
var selecteditem = users.Where(i => i.UserName=="");
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
All I did was remove the call to .ToList(), but everything will still work. The only thing that changed is we avoided needing to copy the matching items into a second list. That should make this code faster. In some circumstances, it can make the code a lot faster. Something to keep in mind: when working with the LINQ operator methods, it's generally good to avoid calling .ToArray() or .ToList() whenever possible, and it's possible a lot more than you might think.
As for foreach() {...} vs .ForEach( ... ): the former is still perfectly appropriate style.
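Here is a runnable sketch of the foreach-over-Where form. The UserProfile class is a minimal hypothetical stand-in for the asker's type, and FillBlankNames is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal hypothetical stand-in for the type in the question.
public class UserProfile
{
    public string UserName { get; set; }
}

public static class BlankNames
{
    public static void FillBlankNames(List<UserProfile> users)
    {
        // Where is lazy: no second list is built, and the predicate runs as
        // foreach pulls each item. Setting a property does not modify the
        // list itself, so the enumeration stays valid.
        foreach (UserProfile user in users.Where(u => u.UserName == ""))
        {
            user.UserName = "NA";
        }
    }

    public static void Main()
    {
        var users = new List<UserProfile>
        {
            new UserProfile { UserName = "" },
            new UserProfile { UserName = "bob" },
            new UserProfile { UserName = "" }
        };

        FillBlankNames(users);

        Console.WriteLine(string.Join(", ", users.Select(u => u.UserName))); // NA, bob, NA
    }
}
```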
Sure, it's quite simple. List has a ForEach method. There is no such method, or extension method, for IEnumerable.
As to why one has a method and another doesn't, that's an opinion. Eric Lippert blogged on the topic if you're interested in his.

Write a lambda expression to perform a calculation on a list

I have a List/IEnumerable of objects and I'd like to perform a calculation on some of them.
e.g.
myList.Where(f=>f.Calculate==true).Calculate();
to update myList, based on the Where clause, so that the required calculation is performed and the entire list updated as appropriate.
The list contains "lines" where an amount is either in Month1, Month2, Month3...Month12, Year1, Year2, Year3-5 or "Long Term"
Most lines are fixed and always fall into one of these months, but some "lines" are calculated based upon their "Maturity Date".
Oh, and just to complicate things! the list (at the moment) is of an anonymous type from a couple of linq queries. I could make it a concrete class if required though, but I'd prefer not to if I can avoid it.
So, I'd like to call a method that works on only the calculated lines, and puts the correct amount into the correct "month".
I'm not worried about the calculation logic, but rather how to get this into an easily readable method that updates the list without, ideally, returning a new list.
[Is it possible to write a lambda extension method to do both the calculation AND the where - or is this overkill anyway as Where() already exists?]
Personally, if you want to update the list in place, I would just use a simple loop. It will be much simpler to follow and maintain:
for (int i=0;i<list.Count;++i)
{
if (list[i].ShouldCalculate)
list[i] = list[i].Calculate();
}
This, at least, is much more obvious that it's going to update. LINQ has the expectation of performing a query, not mutating the data.
If you really want to use LINQ for this, you can - but it will still require a copy if you want to have a List<T> as your results:
myList = myList.Select(f => f.ShouldCalculate ? f.Calculate() : f).ToList();
This would call your Calculate() method as needed, and copy the original when not needed. It does require a copy to create a new List<T>, though, as you mentioned that was a requirement (in comments).
However, my personal preference would still be to use a loop in this case. I find the intent much more clear - plus, you avoid the unnecessary copy operation.
Edit #2:
Given this comment:
Oh, and just to complicate things! the list (at the moment) is of an anonymous type from a couple of linq queries
If you really want to use LINQ style syntax, I would recommend just not calling ToList() on your original queries. If you leave them in their original, IEnumerable<T> form, you can easily do my second option above, but on the original query:
var myList = query.Select(f => f.ShouldCalculate ? f.Calculate() : f).ToList();
This has the advantage of only constructing the list one time, and preventing the copy, as the original sequence will not get evaluated until this operation.
LINQ is mostly geared around side-effect-free queries, and anonymous types themselves are immutable (although of course they can maintain references to mutable types).
Given that you want to mutate the list in place, LINQ isn't a great fit.
As per Reed's suggestion, I would use a straight for loop. However, if you want to perform different calculations at different points, you could encapsulate this:
public static void Recalculate<T>(IList<T> list,
                                  Func<T, bool> shouldCalculate,
                                  Func<T, T> calculation)
{
    for (int i = 0; i < list.Count; i++)
    {
        // Replace the element in place only when the predicate selects it.
        if (shouldCalculate(list[i]))
        {
            list[i] = calculation(list[i]);
        }
    }
}
If you really want to use this in a fluent way, you could make it return the list - but I would personally be against that, as it would then look like it was side-effect-free like LINQ.
And like Reed, I'd also prefer to do this by creating a new sequence...
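For reference, here is a self-contained sketch of how such a helper might be called. The int list and the two lambdas are made-up sample inputs; because elements are replaced rather than mutated, the same pattern also works for immutable items such as anonymous types.

```csharp
using System;
using System.Collections.Generic;

public static class ListUpdates
{
    // Replaces, in place, every element selected by shouldCalculate
    // with the result of calculation.
    public static void Recalculate<T>(IList<T> list,
                                      Func<T, bool> shouldCalculate,
                                      Func<T, T> calculation)
    {
        for (int i = 0; i < list.Count; i++)
        {
            if (shouldCalculate(list[i]))
            {
                list[i] = calculation(list[i]);
            }
        }
    }

    public static void Main()
    {
        var amounts = new List<int> { 100, -5, 20, -1 };

        // Sample rule: clamp negative amounts to zero.
        Recalculate(amounts, a => a < 0, a => 0);

        Console.WriteLine(string.Join(", ", amounts)); // 100, 0, 20, 0
    }
}
```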
Select doesn't copy or clone the objects it passes to the delegate; any state changes to such an object will be reflected through the reference in the container (unless it is a value type).
So updating reference types is not a problem.
To replace the objects (or when working with value types [1]) things are more complex and there is no built-in solution with LINQ. A for loop is clearest (as with the other answers).
[1] Remembering, of course, that mutable value types are evil.

What's the performance hit of List.OfType<> where the entire list is that type?

I have an architecture where we are passing our data nodes as IEnumerable<BaseNode>. It all works great, but in each subclass we want to store these as List<AnotherNode> as everything in that class creates and uses AnotherNode objects (we have about 15 different subclasses).
The one place where using the more strongly typed list doesn't work is the root class's method that returns IEnumerable<BaseNode>; with the covariance limitations in .NET 3.5, a List<AnotherNode> can't be returned there. (We have to stay on .NET 3.5 for now.)
But if I have List<AnotherNode> data; and return data.OfType<BaseNode>(); - that works fine. So here's my question.
As all of data is of type BaseNode, what's the performance hit of this call? The alternative is that I have to cast in various places, which has a small performance hit, but it's also a situation where we give up everything knowing its type.
Two minor things:
There is a small, but measurable overhead associated with yielding each item in the enumerator. If you need to care about this because you're in a very tight inner loop, you're actually better off iterating with a for loop on the list directly. Most likely this doesn't matter.
Because the result is IEnumerable<BaseNode> and has already been filtered through a yielding enumeration function, subsequent calls to methods like Count() or ElementAt() will not take advantage of optimizations in the LINQ implementation for Lists. This is also unlikely to be a problem unless you make frequent use of these extension methods and have a very large number of elements.
Have you seen the Cast<T>() Linq operator? It should be more performant than OfType<T>().
Basically there is a condition that is run with OfType<T>()
if (item is T) {
yield return (T)item;
}
Contrast that with what Cast<T>() does:
yield return (T)item;
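A small sketch of the two operators applied to the asker's scenario follows. BaseNode and AnotherNode are empty placeholder classes. Besides the extra type check, one observable difference is that OfType<T>() drops null entries, while Cast<T>() passes them through.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Empty placeholder classes named after the types in the question.
public class BaseNode { }
public class AnotherNode : BaseNode { }

public static class NodeViews
{
    // Filters with an "is" check per item, so nulls are skipped.
    public static IEnumerable<BaseNode> ViaOfType(List<AnotherNode> data)
    {
        return data.OfType<BaseNode>();
    }

    // Casts each item unconditionally; an upcast to the base type cannot fail.
    public static IEnumerable<BaseNode> ViaCast(List<AnotherNode> data)
    {
        return data.Cast<BaseNode>();
    }

    public static void Main()
    {
        var data = new List<AnotherNode> { new AnotherNode(), null, new AnotherNode() };

        Console.WriteLine(ViaOfType(data).Count()); // 2
        Console.WriteLine(ViaCast(data).Count());   // 3
    }
}
```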

Is garbage created in this foreach loop?

I came across a method to change a list in a foreach loop by converting to a list in itself like this:
foreach (var item in myList.ToList())
{
//add or remove items from myList
}
(If you attempt to modify myList directly an error is thrown since the enumerator basically locks it)
This works because it's not the original myList that's being enumerated. My question is, does this method create garbage when the loop is over (namely from the List that's returned from the ToList method)? For small loops, would it be preferable to use a for loop to avoid the creation of garbage?
The second list is going to be garbage, there will be garbage for an enumerator that is used in building the second list, and add in the enumerator that the foreach would spawn, which you would have had with or without the second list.
Should you switch to a for? Maybe, if you can point to this region of code being a true performance bottleneck. Otherwise, code for simplicity and maintainability.
Yes. ToList() would create another list that would need to be garbage collected.
That's an interesting technique which I will keep in mind for the future! (I can't believe I've never thought of that!)
Anyway, yes, the list that you are building doesn't magically deallocate itself. The possible performance problems with this technique are:
Increased memory usage (building a List, separate from the IEnumerable). Probably not that big of a deal, unless you do this very frequently, or the IEnumerable is very large.
Decreased speed, since it has to go through the whole IEnumerable up front to build the List.
Also, if enumerating the IEnumerable has side effects, they will all be triggered by this process.
Unless this is actually inside an inner loop, or you're working with very large data sets, you can probably do this without any problems.
Yes, the ToList() method creates "garbage". I would just use indexing.
for (int i = myList.Count - 1; 0 <= i; --i)
{
var item = myList[i];
//add or remove items from myList
}
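A concrete version of the backwards loop, as a sketch: removing even numbers from a list without creating a copy. The method name and sample data are illustrative. Iterating from the end means a removal only shifts elements that have already been visited.

```csharp
using System;
using System.Collections.Generic;

public static class InPlaceRemoval
{
    public static void RemoveEvens(List<int> myList)
    {
        // Walk from the last index down so RemoveAt never disturbs
        // the positions that are still to be visited.
        for (int i = myList.Count - 1; i >= 0; --i)
        {
            if (myList[i] % 2 == 0)
            {
                myList.RemoveAt(i);
            }
        }
    }

    public static void Main()
    {
        var myList = new List<int> { 1, 2, 3, 4, 5, 6 };
        RemoveEvens(myList);
        Console.WriteLine(string.Join(", ", myList)); // 1, 3, 5
    }
}
```

For the removal-only case, List<T> also provides RemoveAll, which takes a predicate and does the same job in one call.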
It's non-deterministic. But the reference created from the call ToList() will be GCd eventually.
I wouldn't worry about it too much, since all it would be holding at most would be references or small value types.
