I am quite new to C# and was trying to use lambda expressions.
I am having a list of object. I would like to select items from the list and perform a foreach operation on the selected items. I know I could do it without using a lambda expression, but I wanted to know if this was possible using a lambda expression.
So I was trying to achieve a similar result:
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
List<UserProfile> selecteditem = users.Where(i => i.UserName=="").ToList();
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
It was possible to do
users.Where(i => i.UserName=="").ToList().ForEach(i=>i.UserName="NA");
but not something like this
users.select(i => i.UserName=="").ForEach(i=>i.UserName="NA");
Can someone explain this behaviour?
Let's start here:
I am having a list of object.
It's important to understand that, while accurate, that statement leaves a C# programmer wanting more. What kind of object? In the .Net world, it pays to always keep in mind what specific type of object you are working with. In this case, that type is UserProfile. This may seem like a side issue, but it will become more relevant to the specific question very quickly. What you want to say instead is this:
I have a list of UserProfile objects.
Now let's look at your two expressions:
users.Where(i => i.UserName=="").ToList().ForEach(i=>i.UserName="NA");
and
users.Where(i => i.UserName=="").ForEach(i=>i.UserName="NA");
The difference (aside from the fact that only the first compiles) is that you need to call .ToList() to convert the result of the Where() function to a List type. Now we begin to see why it is that you want to always think in terms of types when working with .Net code, because it should now occur to you to wonder, "What type am I working with, then?" I'm glad you asked.
The .Where() function results in an IEnumerable<T> type, which is actually not a full type all by itself. It's an interface that describes certain things a type that implements its contract will be able to do. The IEnumerable interface can be confusing at first, but the important thing to remember is that it defines something that you can use with a foreach loop. That is its sole purpose. Almost anything in .Net that you can use with a foreach loop (arrays, lists, collections) implements the IEnumerable interface. There are other things you can loop over, as well. Strings, for example. Many methods you have today that require a List or Array as an argument can be made more powerful and flexible simply by changing that argument type to IEnumerable.
.Net also makes it easy to create state machine-based iterators that will work with this interface. This is especially useful for creating objects that don't themselves hold any items, but do know how to loop over items in a different collection in a specific way. For example, I might loop over just items 3 through 12 in an array of size 20. Or I might loop over the items in alphabetical order. The important thing here is that I can do this without needing to copy or duplicate the originals. This makes it very efficient in terms of memory, and it's structured in such a way that you can easily compose different iterators together to get very powerful results.
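To make the "items 3 through 12" idea concrete, here is a minimal sketch of such an iterator. The RangeIterator class and Slice method are names made up for illustration, and I'm assuming "3 through 12" means array indexes 3 to 12 inclusive. The compiler turns the yield return into the state machine for you.

```csharp
using System.Collections.Generic;

public static class RangeIterator
{
    // Hypothetical helper: lazily yields source[first] .. source[last] (inclusive).
    // Nothing is copied; each element is read from the original array on demand.
    public static IEnumerable<T> Slice<T>(T[] source, int first, int last)
    {
        for (int i = first; i <= last; i++)
        {
            yield return source[i];
        }
    }
}
```

You would consume it with an ordinary loop, for example foreach (int n in RangeIterator.Slice(numbers, 3, 12)) { ... }, and the array itself is never duplicated.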
The IEnumerable<T> type is especially important, because it is one of two types (the other being IQueryable) that form the core of the LINQ system. Most of the LINQ operators you can use (.Where(), .Select(), .Any(), etc.) are defined as extensions to IEnumerable.
But now we come to an exception: ForEach(). This method is not part of IEnumerable. It is defined directly as part of the List<T> type. So, we see again that it's important to understand what type you are working with at all times, including the results of each of the different expressions that make up a complete statement.
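A small sketch of the types at each step may help. The UserProfile class below is a minimal stand-in for the asker's type (only the UserName property is assumed), and ForEachDemo is a made-up name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the asker's class; only UserName is assumed.
public class UserProfile
{
    public string UserName { get; set; } = "";
}

public static class ForEachDemo
{
    public static void BlankNamesToNA(List<UserProfile> users)
    {
        // Where() returns IEnumerable<UserProfile>, which has no ForEach() method.
        IEnumerable<UserProfile> blank = users.Where(u => u.UserName == "");

        // blank.ForEach(u => u.UserName = "NA");   // does not compile

        // ToList() produces a List<UserProfile>, and List<T> does define ForEach().
        List<UserProfile> asList = blank.ToList();
        asList.ForEach(u => u.UserName = "NA");
    }
}
```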
It's also instructive to go into why this particular method is not part of IEnumerable directly. I believe the answer lies in the fact that the LINQ system takes a lot of inspiration from the Functional Programming world. In functional programming, you want to have operations (functions) that do exactly one thing, with no side effects. Ideally, these functions will not alter the original data, but rather they will return new data. The ForEach() method is implicitly all about creating bad side effects that alter data. It's just bad functional style. Additionally, ForEach() breaks method chaining, in that it doesn't return a new IEnumerable.
There is one more lesson to learn here. Let's take a look at your original snippet:
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
List<UserProfile> selecteditem = users.Where(i => i.UserName=="").ToList();
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
I mentioned something earlier that should help you significantly improve this code. Remember that bit about how you can have IEnumerable items that loop over a collection, without duplicating it? Think about what happens if you wrote that code this way, instead:
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
var selecteditem = users.Where(i => i.UserName=="");
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
All I did was remove the call to .ToList(), but everything will still work. The only thing that changed is that we avoided copying the matching items into a second list. That should make this code faster. In some circumstances, it can make the code a lot faster. Something to keep in mind: when working with the LINQ operator methods, it's generally good to avoid calling .ToArray() or .ToList() whenever possible, and it's possible a lot more than you might think.
As for the foreach() {...} vs .ForEach( ... ): the former is still perfectly appropriate style.
Sure, it's quite simple. List has a ForEach method. There is no such method, or extension method, for IEnumerable.
As to why one has a method and another doesn't, that's an opinion. Eric Lippert blogged on the topic if you're interested in his.
Related
I got into a discussion with two colleagues regarding a setup for an iteration over an IEnumerable (the contents of which will not be altered in any way during the operation). There are three conflicting theories on which is the optimal approach. Both of the others (and I as well) are very certain, and that got me unsure, so for the sake of clarity, I want to check with an external source.
The scenario is as follows. We had the code below as a starting point and discovered that some of the hazaas need not be acted upon. So, starting with the code below, we started to add a blocker for the action.
foreach(Hazaa hazaa in hazaas) ;
My suggestion is as follows.
foreach(Hazaa hazaa in hazaas.Where(element => condition)) ;
One of the guys wants to resolve it by a more explicit form, claiming that LINQ is not appropriate in this case (not sure why it'd be so but he seems to be very convinced). His solution is this.
foreach(Hazaa hazaa in hazaas)
  if(condition) ;
The other counter-suggestion is supported by the claim that Where risks repeating the filtering process needlessly and that it's more certain to minimize the computational workload by picking the appropriate elements once and for all with Select.
foreach(Hazaa hazaa in hazaas.Select(element => condition)) ;
I argue that the first is obsolete, since LINQ can handle data objects quite well.
I also believe that Select-ing is in this case equivalently fast to Where-ing and no needless steps will be taken (e.g. the evaluation of the condition on the elements will only be performed once). If anything, it should be faster using Where because we won't be creating an extra instance of anything.
Who's right?
Select is inappropriate. It doesn't filter anything.
if is a possible solution, but Where is just as explicit.
Where executes the condition exactly once per item, just as the if. Additionally, it is important to note that the call to Where doesn't iterate the list. So, using Where you iterate the list exactly once, just like when using if.
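If you want to convince the colleagues, a counter inside the predicate shows both points: nothing runs when Where is called, and the condition runs once per item when the loop runs. This is only a sketch; LazyWhereDemo and CountPredicateCalls are made-up names and the "is even" condition is a placeholder.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyWhereDemo
{
    // Returns { predicate calls right after Where(), predicate calls after one foreach }.
    public static int[] CountPredicateCalls(IEnumerable<int> source)
    {
        int calls = 0;

        // Building the query does not touch the source yet.
        IEnumerable<int> filtered = source.Where(x => { calls++; return x % 2 == 0; });
        int afterWhere = calls;

        // Enumerating runs the predicate exactly once per source item.
        foreach (int x in filtered) { }
        int afterLoop = calls;

        return new[] { afterWhere, afterLoop };
    }
}
```

For a five-element input the two counts come out as 0 and 5.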
I think you are discussing with one person that didn't understand LINQ - the guy that wants to use Select - and one that doesn't like the functional aspect of LINQ.
I would go with Where.
The .Where() and the if(condition) approach will be the same.
But since LINQ is nicely readable, I'd prefer that.
The approach with .Select() is nonsense, since it will not return the Hazaa-Object, but an IEnumerable<Boolean>
To be clear about the functions:
myEnumerable.Where(a => isTrueFor(a)) //This is filtering
myEnumerable.Select(a => a.b) //This is projection
Where() will run a function that returns a Boolean for each item of the enumerable, and return the item depending on the result of that function.
Select() will run a function for every item in the list and return the result of the function without doing any filtering.
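A quick sketch with plain integers makes the difference in result types visible. The class and method names are invented for the example, and "x > 2" stands in for the real condition.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WhereVersusSelect
{
    // Filtering: the result holds the items that passed the test.
    public static List<int> Filter(IEnumerable<int> source)
    {
        return source.Where(x => x > 2).ToList();
    }

    // Projection: the result holds one bool per item, and nothing is removed.
    public static List<bool> Project(IEnumerable<int> source)
    {
        return source.Select(x => x > 2).ToList();
    }
}
```

With the input 1, 2, 3, 4, Filter gives back 3 and 4, while Project gives back false, false, true, true.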
I have a List/IEnumerable of objects and I'd like to perform a calculation on some of them.
e.g.
myList.Where(f=>f.Calculate==true).Calculate();
to update myList, based on the Where clause, so that the required calculation is performed and the entire list updated as appropriate.
The list contains "lines" where an amount is either in Month1, Month2, Month3...Month12, Year1, Year2, Year3-5 or "Long Term"
Most lines are fixed and always fall into one of these months, but some "lines" are calculated based upon their "Maturity Date".
Oh, and just to complicate things! the list (at the moment) is of an anonymous type from a couple of linq queries. I could make it a concrete class if required though, but I'd prefer not to if I can avoid it.
So, I'd like to call a method that works on only the calculated lines, and puts the correct amount into the correct "month".
I'm not worried about the calculation logic, but rather how to get this into an easily readable method that updates the list without, ideally, returning a new list.
[Is it possible to write a lambda extension method to do both the calculation AND the where - or is this overkill anyway as Where() already exists?]
Personally, if you want to update the list in place, I would just use a simple loop. It will be much simpler to follow and maintain:
for (int i=0;i<list.Count;++i)
{
if (list[i].ShouldCalculate)
list[i] = list[i].Calculate();
}
This, at least, is much more obvious that it's going to update. LINQ has the expectation of performing a query, not mutating the data.
If you really want to use LINQ for this, you can - but it will still require a copy if you want to have a List<T> as your results:
myList = myList.Select(f => f.ShouldCalculate ? f.Calculate() : f).ToList();
This would call your Calculate() method as needed, and copy the original when not needed. It does require a copy to create a new List<T>, though, as you mentioned that was a requirement (in comments).
However, my personal preference would still be to use a loop in this case. I find the intent much more clear - plus, you avoid the unnecessary copy operation.
Edit #2:
Given this comment:
Oh, and just to complicate things! the list (at the moment) is of an anonymous type from a couple of linq queries
If you really want to use LINQ style syntax, I would recommend just not calling ToList() on your original queries. If you leave them in their original, IEnumerable<T> form, you can easily do my second option above, but on the original query:
var myList = query.Select(f => f.ShouldCalculate ? f.Calculate() : f).ToList();
This has the advantage of only constructing the list one time, and preventing the copy, as the original sequence will not get evaluated until this operation.
LINQ is mostly geared around side-effect-free queries, and anonymous types themselves are immutable (although of course they can maintain references to mutable types).
Given that you want to mutate the list in place, LINQ isn't a great fit.
As per Reed's suggestion, I would use a straight for loop. However, if you want to perform different calculations at different points, you could encapsulate this:
public static void Recalculate<T>(IList<T> list,
Func<T, bool> shouldCalculate,
Func<T, T> calculation)
{
for (int i = 0; i < list.Count; i++)
{
if (shouldCalculate(list[i]))
{
list[i] = calculation(list[i]);
}
}
}
If you really want to use this in a fluent way, you could make it return the list - but I would personally be against that, as it would then look like it was side-effect-free like LINQ.
And like Reed, I'd also prefer to do this by creating a new sequence...
Select doesn't copy or clone the objects it passes to the delegate; any state changes to such an object will be visible through the reference held in the container (unless it is a value type).
So updating reference types is not a problem.
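As a rough illustration, assuming the lines were turned into a concrete mutable class (the Line type, its fields and the doubling "calculation" are all invented here; anonymous types could not be changed this way because their properties are read-only):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical mutable reference type standing in for a "line".
public class Line
{
    public bool ShouldCalculate;
    public decimal Amount;
}

public static class MutateDemo
{
    public static void DoubleCalculatedLines(IEnumerable<Line> lines)
    {
        // Where() hands out references to the same objects the list holds,
        // so changing a property here changes the object in the original list.
        foreach (Line line in lines.Where(l => l.ShouldCalculate))
        {
            line.Amount *= 2;
        }
    }
}
```

The list itself is never modified (no items are added, removed or replaced), so enumerating it while changing properties is safe.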
To replace the objects (or when working with value types [1]) things are more complex, and there is no inbuilt solution with LINQ. A for loop is clearest (as with the other answers).
[1] Remembering, of course, that mutable value types are evil.
I'm using the new Resharper version 6. In several places in my code it has underlined some text and warned me that there may be a Possible multiple enumeration of IEnumerable.
I understand what this means, and have taken the advice where appropriate, but in some cases I'm not sure it's actually a big deal.
Like in the following code:
var properties = Context.ObjectStateManager.GetObjectStateEntry(this).GetModifiedProperties();
if (properties.Contains("Property1") || properties.Contains("Property2") || properties.Contains("Property3")) {
...
}
It's underlining each mention of properties on the second line, warning that I am enumerating over this IEnumerable multiple times.
If I add .ToList() to the end of line 1 (turning properties from an IEnumerable<string> to a List<string>), the warnings go away.
But surely, if I convert it to a List, then it will enumerate over the entire IEnumerable to build the List in the first place, and then enumerate over the List as required to find the properties (i.e. 1 full enumeration, and 3 partial enumerations). Whereas in my original code, it is only doing the 3 partial enumerations.
Am I wrong? What is the best method here?
I don't know exactly what your properties really is here - but if it's essentially representing an unmaterialized database query, then your if statement will perform three queries.
I suspect it would be better to do:
string[] propertiesToFind = { "Property1", "Property2", "Property3" };
if (properties.Any(x => propertiesToFind.Contains(x)))
{
...
}
That will logically only iterate over the sequence once - and if there's a database query involved, it may well be able to just use a SQL "IN" clause to do it all in the database in a single query.
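To see the difference in the number of passes, here is a sketch that swaps the real GetModifiedProperties() call for a hypothetical iterator that counts how often it is enumerated. Everything in EnumerationCounter is invented for the demonstration, including the sample property names it yields.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCounter
{
    public static int Enumerations;

    // Hypothetical stand-in for GetModifiedProperties(); bumps the counter
    // each time a caller starts enumerating the sequence.
    public static IEnumerable<string> ModifiedProperties()
    {
        Enumerations++;
        yield return "Other";
        yield return "Property2";
    }

    // The original shape: each Contains() call starts a fresh enumeration.
    public static bool ThreeContainsCalls()
    {
        IEnumerable<string> properties = ModifiedProperties();
        return properties.Contains("Property1")
            || properties.Contains("Property2")
            || properties.Contains("Property3");
    }

    // The suggested shape: one pass over the sequence, checking each item against the small array.
    public static bool SingleAnyCall()
    {
        string[] propertiesToFind = { "Property1", "Property2", "Property3" };
        IEnumerable<string> properties = ModifiedProperties();
        return properties.Any(p => propertiesToFind.Contains(p));
    }
}
```

With this sample data the first version enumerates twice before it finds a match, and the second enumerates once.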
If you invoke Contains() on an IEnumerable, it will invoke the extension method, which will just iterate through the items in order to find it. IList has a real implementation of Contains() that is probably more efficient than a regular iteration through the values (it might have a search tree with hashes?), hence it doesn't warn with IList.
Since the extension method will only be aware that it's an IEnumerable, it probably can not utilize any built-in methods for Contains() even though it would be possible in theory to identify known types and cast them accordingly in order to utilize them.
While I was programming I came up with this question,
What is better: having a method accept a single entity or a List of those entities?
For example, I need a List of strings. I can either have:
a method accepting a List and returning a List of strings with the results.
List<string> results = methodwithlist(objects);
or
a method accepting an object and returning a string. Then use this function in a loop to fill a list.
for (int i = 0; i < objects.Count; i++)
{
results.Add(methodwithsingleobject(objects[i]));
}
** This is just an example. I need to know which one is better, or more used, and why.
Thanks!
Well, it's easy to build the first form when you've got the second - but using LINQ, you really don't need to write your own, once you've got the projection. For example, you could write:
List<string> results = objectList.Select(x => MethodWithSingleObject(x)).ToList();
Generally it's easier to write and test a method which only deals with a single value, unless it actually needs to know the rest of the values in the collection (e.g. to find aggregates).
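Spelled out as a sketch (Describe, Single and All are placeholder names, and the string formatting is just a stand-in for the real work):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Describe
{
    // The single-value method: trivial to call and to unit test on its own.
    public static string Single(int value)
    {
        return "Item " + value;
    }

    // The list version falls out of the single-value one via Select().
    public static List<string> All(IEnumerable<int> values)
    {
        return values.Select(v => Single(v)).ToList();
    }
}
```

You only need to hand-write something like All if callers really want a named method; otherwise the Select call at the call site does the same job.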
I would choose the second because it's easier to use when you have a single string (i.e. it's more general purpose). Also, the responsibility of the method itself is clearer, because the method should not have anything to do with lists if its purpose is just to modify a string.
Also, you can simplify the call with Linq:
result = yourList.Select(p => methodwithsingleobject(p));
This question comes up a lot when learning any language. The answer is somewhat moot, since the standard coding practice is to rely upon LINQ to optimize the code for you at runtime, but this presumes you're using a version of the language that supports it. If you do want to do some research on this, there are a few Stack Overflow articles that delve into it and also give external resources to review:
In .NET, which loop runs faster, 'for' or 'foreach'?
C#, For Loops, and speed test... Exact same loop faster second time around?
What I have learned, though, is not to rely too heavily on Count and to use Length on typed Collections as that can be a lot faster.
Hope this is helpful.
More than about LINQ to [insert your favorite provider here], this question is about searching or filtering in-memory collections.
I know LINQ (or searching/filtering extension methods) works in objects implementing IEnumerable or IEnumerable<T>. The question is: because of the nature of enumeration, is every query complexity at least O(n)?
For example:
var result = list.FirstOrDefault(o => o.something > n);
In this case, every algorithm will take at least O(n) unless list is ordered with respect to 'something', in which case the search should take O(log(n)): it should be a binary search. However, if I understand correctly, this query will be resolved through enumeration, so it should take O(n), even if list was previously ordered.
Is there something I can do to solve a query in O(log(n))?
If I want performance, should I use Array.Sort and Array.BinarySearch?
Even with parallelisation, it's still O(n). The constant factor would be different (depending on your number of cores) but as n varied the total time would still vary linearly.
Of course, you could write your own implementations of the various LINQ operators over your own data types, but they'd only be appropriate in very specific situations - you'd have to know for sure that the predicate only operated on the optimised aspects of the data. For instance, if you've got a list of people that's ordered by age, it's not going to help you with a query which tries to find someone with a particular name :)
To examine the predicate, you'd have to use expression trees instead of delegates, and life would become a lot harder.
I suspect I'd normally add new methods which make it obvious that you're using the indexed/ordered/whatever nature of the data type, and which will always work appropriately. You couldn't easily invoke those extra methods from query expressions, of course, but you can still use LINQ with dot notation.
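For the "first value greater than n" query from the question, such an extra method might look like the sketch below. SortedSearch and IndexOfFirstGreaterThan are hypothetical names, and the method assumes the caller guarantees the array is sorted ascending; it does not check.

```csharp
public static class SortedSearch
{
    // Returns the index of the first element greater than n, or -1 if there is none.
    // Assumes 'sorted' is in ascending order; runs in O(log n).
    public static int IndexOfFirstGreaterThan(int[] sorted, int n)
    {
        int lo = 0;
        int hi = sorted.Length; // the answer, if any, lies in [lo, hi)

        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] > n)
            {
                hi = mid;       // mid qualifies; look for an earlier match
            }
            else
            {
                lo = mid + 1;   // mid is too small; discard it and everything before it
            }
        }

        return lo < sorted.Length ? lo : -1;
    }
}
```

The method name makes the ordering requirement obvious at the call site, which is the point: nobody will mistake it for a general-purpose LINQ operator.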
Yes, the generic case is always O(n), as Sklivvz said.
However, many LINQ methods special case for when the object implementing IEnumerable actually implements e.g. ICollection. (I've seen this for IEnumerable.Contains at least.)
In practice this means that LINQ IEnumerable.Contains calls the fast HashSet.Contains for example if the IEnumerable actually is a HashSet.
IEnumerable<int> mySet = new HashSet<int>();
// calls the fast HashSet.Contains because HashSet implements ICollection.
if (mySet.Contains(10)) { /* code */ }
You can use reflector to check exactly how the LINQ methods are defined, that is how I figured this out.
Oh, and also LINQ contains methods IEnumerable.ToDictionary (maps key to single value) and IEnumerable.ToLookup (maps key to multiple values). This dictionary/lookup table can be created once and used many times, which can speed up some LINQ-intensive code by orders of magnitude.
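A minimal sketch of both, using a made-up sequence of (Id, Name) tuples as the data; LookupDemo and its method names are invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // One value per key: throws if two people share an Id.
    public static Dictionary<int, string> NameById(IEnumerable<(int Id, string Name)> people)
    {
        return people.ToDictionary(p => p.Id, p => p.Name);
    }

    // Many values per key: every Id that carries a given name.
    public static ILookup<string, int> IdsByName(IEnumerable<(int Id, string Name)> people)
    {
        return people.ToLookup(p => p.Name, p => p.Id);
    }
}
```

Building either structure is a single O(n) pass; after that each lookup by key is a hash lookup rather than a scan of the whole sequence.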
Yes, it has to be, because the only way of accessing any member of an IEnumerable is by using its methods, which means O(n).
It seems like a classic case in which the language designers decided to trade performance for generality.