I have a foreach loop:
var axsEntities = GetAxsEntitiesForInvoicing(adapter)
    .GroupBy(x => x.AccountUsingAccountIdToAccountId);

foreach (var gbAccount in axsEntities)
{
    int i = gbAccount.Count();
}
Now when I run this without the loop it runs fine, but with the loop it uses far too much memory, 3 gigabytes in this case. What could be the reason for this?
Thanks
Without the loop, nothing is really happening.
axsEntities is just an IEnumerable with deferred execution.
Creating it is always cheap. Only when you iterate over it (the foreach) are things actually fetched and computed.
So you might simply have very many elements, or .Count() is using a lot of memory.
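To see the difference, here's a minimal, runnable sketch (the names are invented for illustration, not taken from the question): building the query allocates almost nothing, and all the work happens once the foreach starts pulling elements.

using System;
using System.Collections.Generic;
using System.Linq;

class DeferredExecutionDemo
{
    static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 3; i++)
        {
            Console.WriteLine($"producing {i}"); // runs only during enumeration
            yield return i;
        }
    }

    static void Main()
    {
        // Cheap: this only builds the query; nothing is produced yet.
        var groups = Numbers().GroupBy(n => n % 2);

        Console.WriteLine("query built, nothing executed yet");

        // Expensive part: the foreach pulls every element and builds the groups.
        foreach (var g in groups)
            Console.WriteLine($"key {g.Key}: {g.Count()} items");
    }
}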
We'd have to see what type axsEntities is to be sure, but I'm guessing it is an IQueryable? If so, without the foreach loop you aren't actually doing anything with that set. With the foreach loop you're actually iterating the result set.
The first expression is probably lazily evaluated. Try a simple
var test = axsEntities.ToList();
to see if that also uses a lot of memory.
The problem is likely NOT the foreach loop, but the GroupBy logic whose execution is deferred until the loop runs.
Unless GetAxsEntitiesForInvoicing returns an IQueryable that can push the grouping down to the data source, the grouping has to happen entirely in memory.
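To make that concrete, here's a hedged sketch (the data is synthetic; a real IQueryable provider such as Entity Framework is only mentioned in a comment, not shown): with LINQ to Objects, GroupBy has to buffer the entire source before it can hand out the first group.

using System.Collections.Generic;
using System.Linq;

class GroupByMemoryDemo
{
    static void Main()
    {
        // LINQ to Objects: GroupBy cannot emit a single group until it has
        // seen the whole source, so everything is buffered in memory.
        IEnumerable<int> source = Enumerable.Range(0, 1_000_000);
        var grouped = source.GroupBy(x => x % 10);

        // All 1,000,000 elements are consumed on this first access.
        System.Console.WriteLine(grouped.First().Count()); // 100000

        // With an IQueryable (e.g. an Entity Framework DbSet, not shown here),
        // the same expression could translate to a SQL GROUP BY and the
        // grouping would happen on the database server instead.
    }
}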
What about gbAccount.Count() inside the foreach loop? This might not be a good idea. I would first check whether this is what's responsible for using up the precious memory. My advice is that you could come up with a more specialized query, e.g. GetAccountsCountForGroupedAxsEntitiesForInvoicingByAccountUsingAccountIdToAccountIdWithSauce; this sounds like a really nice name to me.
Peace
Related
Today I noticed that when I run several LINQ statements on big data, the time taken may vary extremely.
Suppose we have a query like this:
var conflicts = features.Where(/* some condition */);
foreach (var c in conflicts) // log the conflicts
Here, features is a list of objects representing rows in a table. These objects are quite complex, and even querying one simple property of them is a huge operation (including the actual database query, validation, state changes...), so I assumed performing such a query would take a long time. Far from it: the first statement executes in a very short time, whereas simply looping over the results takes forever.
However, if I convert the collection retrieved by the LINQ expression to a List using IEnumerable<T>.ToList(), the first statement runs a bit slower, but looping over the results is very fast. That said, the total running time of the second approach is much lower than when not converting to a list.
var conflicts = features.Where(/* some condition */).ToList();
foreach (var c in conflicts) // log the conflicts
So I suppose that var conflicts = features.Where(...) does not actually run the query but merely prepares it. But I do not understand why converting to a list and then looping over it is so much faster. That's the actual question:
Does anybody have an explanation for this?
This statement just declares your intention:
var conflicts = features.Where(...);
to get the data that fulfils the criteria in the Where clause. Then, when you write this:
foreach (var c in conflicts)
the actual query will be executed and will start returning results. This is called lazy loading. Another term we use for this is deferred execution: we defer the execution of the query until we need its data.
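Here's a small runnable sketch of that timing (the names are illustrative): the Where line returns instantly, and the predicate only runs once the foreach starts asking for elements.

using System;
using System.Linq;

class LazyTimingDemo
{
    static void Main()
    {
        var numbers = Enumerable.Range(1, 5);

        // Nothing is printed here: Where only records the predicate.
        var evens = numbers.Where(n =>
        {
            Console.WriteLine($"testing {n}");
            return n % 2 == 0;
        });

        Console.WriteLine("query declared");

        // The "testing ..." lines appear only now, interleaved with results.
        foreach (var n in evens)
            Console.WriteLine($"got {n}");
    }
}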
On the other hand, if you had done something like this:
var conflicts = features.Where(...).ToList();
an in-memory collection would have been created, in which the results of the query would have been stored. In this case the query would have been executed immediately.
Generally speaking, as you can read on Wikipedia:
Lazy loading is a design pattern commonly used in computer programming
to defer initialization of an object until the point at which it is
needed. It can contribute to efficiency in the program's operation if
properly and appropriately used. The opposite of lazy loading is eager
loading.
Update
And I suppose this in-memory collection is much faster than lazy loading?
Here is a great article that answers your question.
Welcome to the wonderful world of lazy evaluation. With LINQ, the query is not executed until the result is needed. Until you try to get the result (ToList() gets the result and puts it in a list), you are just creating the query. Think of it as writing code vs. running the program. While this may be confusing, and may cause the code to execute at unexpected times (and even multiple times if you, for example, foreach the query twice), this is actually a good thing. It allows you to have a piece of code that returns a query (not the result, but the actual query) and another piece of code that creates a new query based on the original. For example, you may add additional filters on top of the original, or page it.
The performance difference you are seeing is basically the database call happening at different places in your code.
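As a sketch of the composition benefit described above (all the names are hypothetical): one method returns a query, another layers an extra filter and paging on top, and nothing executes until the final enumeration.

using System.Linq;

class QueryCompositionDemo
{
    // Returns a query, not results; callers can refine it further.
    static IQueryable<int> PositiveOnly(IQueryable<int> source) =>
        source.Where(n => n > 0);

    static void Main()
    {
        var data = Enumerable.Range(-5, 20).AsQueryable();

        // Compose: an extra filter plus paging on top of the original query.
        var page = PositiveOnly(data)
            .Where(n => n % 2 == 0)
            .Skip(2)
            .Take(3);

        // Only this enumeration executes the combined query.
        foreach (var n in page)
            System.Console.WriteLine(n); // 6, 8, 10
    }
}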
I have many methods like the one below:
void ValidateBuyerRules()
{
    var nodesWithRules = ActiveNodes.Where(x => x.RuleClass.IsNotNullOrEmpty());
    if (!nodesWithRules.Any()) return;
    foreach (var ruleClass in nodesWithRules)
    {
        // Do something here
    }
}
As you can see, I check whether nodesWithRules has any items and exit the method before the foreach statement, but is this unnecessary code?
Unless you have some logic after the foreach statement that you want to skip, it's unnecessary; the method will work the same without it.
When foreach iterates over nodesWithRules and finds that there are no items, it simply never enters the loop body.
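A tiny demonstration of that (the sequence is made up): the loop body simply never runs for an empty result, so no guard is needed.

using System.Linq;

class EmptyForeachDemo
{
    static void Main()
    {
        var empty = Enumerable.Empty<int>().Where(n => n > 0);

        // The body never executes; no exception, no extra check required.
        foreach (var n in empty)
            System.Console.WriteLine(n);

        System.Console.WriteLine("done");
    }
}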
If this is LINQ to SQL, never do that.
You cause an extra round trip.
Also, with any other type of IEnumerable you should avoid it. .NET does some tricks for underlying lists, but you shouldn't rely on those.
There's really no point in calling Any before the foreach. If no results come back from the Where query, you will never enter the loop anyway.
Calling Any first will actually end up hurting performance, because you're executing two queries.
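Here's a sketch that makes the double enumeration visible (the counter is only there for demonstration): Any() walks the query once and the foreach walks it again, which with LINQ to SQL would mean two round trips to the database.

using System;
using System.Linq;

class DoubleEnumerationDemo
{
    static void Main()
    {
        int evaluations = 0;
        var query = Enumerable.Range(1, 3).Where(n =>
        {
            evaluations++; // counts how often the predicate runs
            return n > 1;
        });

        if (!query.Any()) return;    // first pass: stops at the first match
        foreach (var n in query) { } // second pass: re-runs the whole query

        Console.WriteLine(evaluations); // 5, not 3: the source was enumerated twice
    }
}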
Currently I'm working on a 2D XNA game. It needs optimizing, because the multiplayer mode is not performing well. In the code we have mostly used foreach loops, and have not used LINQ or yield return statements anywhere, so I understand we can gain some performance here. Since for loops are faster, I was thinking of replacing all the foreach loops.
However, I can't benefit from the yield return statement in a for loop, can I?
Also, will LINQ still be useful when iterating using a for loop?
For example, I have a list of 1000+ shapes (squares, triangles, circles...), and I want to enumerate through all squares (75%) at a certain position. What's the best way of doing this?
What should I use? Arrays, lists, for loops, foreach loops, yield return and/or LINQ?
Do any grouping or sorting you need as items are added, not as they are retrieved. I say this because you'll (I assume) only add each item once, but, as you say, you're retrieving them multiple times.
Since it is a performance question, the only right answer is: measure.
In general, directly using for will likely be the fastest approach, as the alternatives add more code per iteration. Try and measure it yourself: see whether it matters in your case, and which version of the code you find most readable.
Definitely use the for loop. Even if the performance gain is negligible, you will not create an enumerator object, so you don't put unnecessary pressure on the GC.
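For the shapes scenario, here is a hedged sketch under the assumption that the shapes live in a List<Shape> with a kind and a position (all of these names are invented). One caveat: foreach over an array or directly over a List<T> already uses a non-allocating struct enumerator; a heap-allocated enumerator only appears when iterating through an IEnumerable<T> reference. A plain for loop does, however, avoid LINQ's delegate and iterator allocations, which matters when this runs every frame.

using System.Collections.Generic;

enum ShapeKind { Square, Triangle, Circle }

class Shape
{
    public ShapeKind Kind;
    public float X, Y;
}

class ShapeFilterDemo
{
    // Plain for loop: no LINQ delegates or iterator objects are allocated.
    static int CountSquaresNear(List<Shape> shapes, float x, float y, float radius)
    {
        int count = 0;
        for (int i = 0; i < shapes.Count; i++)
        {
            var s = shapes[i];
            if (s.Kind == ShapeKind.Square &&
                (s.X - x) * (s.X - x) + (s.Y - y) * (s.Y - y) <= radius * radius)
            {
                count++;
            }
        }
        return count;
    }

    static void Main()
    {
        var shapes = new List<Shape>
        {
            new Shape { Kind = ShapeKind.Square, X = 1, Y = 1 },
            new Shape { Kind = ShapeKind.Circle, X = 1, Y = 1 },
        };
        System.Console.WriteLine(CountSquaresNear(shapes, 0, 0, 2)); // 1
    }
}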
I came across a method for changing a list inside a foreach loop by making a copy of it with ToList(), like this:
foreach (var item in myList.ToList())
{
//add or remove items from myList
}
(If you attempt to modify myList directly, an error is thrown, since the enumerator basically locks it.)
This works because it's not the original myList that's being enumerated. My question is: does this method create garbage when the loop is over (namely, the List that's returned from the ToList method)? For small loops, would it be preferable to use a for loop to avoid the creation of garbage?
The second list is going to be garbage, as is the enumerator used in building it, plus the enumerator that the foreach spawns, which you would have had with or without the second list.
Should you switch to a for? Maybe, if you can point to this region of code being a true performance bottleneck. Otherwise, code for simplicity and maintainability.
Yes. ToList() would create another list that would need to be garbage collected.
That's an interesting technique which I will keep in mind for the future! (I can't believe I've never thought of that!)
Anyway, yes, the list that you are building doesn't magically unallocate itself. The possible performance problems with this technique are:
Increased memory usage (building a List, separate from the IEnumerable). Probably not that big of a deal, unless you do this very frequently, or the IEnumerable is very large.
Decreased speed, since it has to run through the whole IEnumerable up front to build the List.
Also, if enumerating the IEnumerable has side effects, they will all be triggered by this process.
Unless this is actually inside an inner loop, or you're working with very large data sets, you can probably do this without any problems.
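If the goal is specifically to remove items, List<T>.RemoveAll sidesteps the copy entirely; a minimal sketch (the list and the predicate are stand-ins):

using System.Collections.Generic;

class RemoveAllDemo
{
    static void Main()
    {
        var myList = new List<int> { 1, 2, 3, 4, 5 };

        // Removes matching items in place: no second list, no extra garbage
        // beyond the predicate delegate.
        myList.RemoveAll(item => item % 2 == 0);

        System.Console.WriteLine(string.Join(", ", myList)); // 1, 3, 5
    }
}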
Yes, the ToList() method creates "garbage". I would just use indexing:
for (int i = myList.Count - 1; i >= 0; i--)
{
    var item = myList[i];
    // add or remove items from myList; iterating backwards means
    // removals don't shift the elements we haven't visited yet
}
It's non-deterministic, but the reference created by the ToList() call will be GC'd eventually.
I wouldn't worry about it too much, since all it would be holding at most would be references or small value types.
I have a huge IEnumerable (suppose its name is myItems); which way is more efficient?
Solution 1: Filter it first, then ForEach.
Array.ForEach(myItems.Where(FILTER-IT-HERE).ToArray(), MY-ACTION);
Solution 2: Return early from MY-ACTION if the item doesn't cut the mustard.
Array.ForEach(myItems.ToArray(), MY-ACTION-WITH-FILTER);
Is one of them always better than the other? Or any other good suggestions? Thanks in advance.
Did you do any measurements? Since we can't measure the run time of MY-ACTION, only you can. Measure and decide.
Sometimes one has to create benchmarks, because similar-looking activities can produce radically different and unexpected results.
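A rough way to do that measurement (the filter and the action are placeholders, as in the question): time each variant with a Stopwatch over the same data and compare.

using System;
using System.Diagnostics;
using System.Linq;

class MeasureDemo
{
    static void Main()
    {
        var myItems = Enumerable.Range(0, 1_000_000).ToArray();

        var sw = Stopwatch.StartNew();
        // Variant 1: filter first, then act on the survivors.
        Array.ForEach(myItems.Where(i => i % 2 == 0).ToArray(), i => { /* action */ });
        Console.WriteLine($"filter first:  {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        // Variant 2: act on everything, filtering inside the action.
        Array.ForEach(myItems.ToArray(), i => { if (i % 2 != 0) return; /* action */ });
        Console.WriteLine($"filter inside: {sw.ElapsedMilliseconds} ms");
    }
}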
You do not say what your data source is, so I'm going to assume it may be data on an SQL server, in which case filtering on the server side will likely always be the best approach, because you minimize the amount of data transferred. Memory access is always faster than transferring data from disk to memory, so whenever you can transfer fewer records, you are likely to get better performance.
Well, both times you're converting to an array, which might not be so efficient if the IEnumerable is very large (like you said). You could create a generic extension method for IEnumerable<T>, like:
public static void ForEach<T>(this IEnumerable<T> current, Action<T> action) {
foreach (var i in current) {
action(i);
}
}
and then you could do this:
IEnumerable<int> ints = new List<int>();
ints.Where(i => i == 5).ForEach(i => Console.WriteLine(i));
If performance is a concern, it's unclear to me why you'd be bothering to construct an entire array in the first place. Why not just this?
foreach (var item in myItems.Where(FILTER-IT-HERE))
MY-ACTION;
Or:
foreach (var item in myItems)
MY-ACTION-WITH-FILTER;
I ask because, while the others are right that you can't really know without testing, I wouldn't expect much difference between the two options above. I would, on the other hand, expect a difference between creating and populating an array (seemingly for no reason) and not creating one.
Everything else being equal, calling ToArray() first will impart a greater performance hit than calling it last. Although, as others have stated before me,
Why use ToArray() and Array.ForEach() at all?
We don't know that everything else actually is equal since you do not reveal the implementation details of your filter and action.
The idea of LINQ is to work on enumerable collections, so the best LINQ query is one that doesn't use Array.ForEach() or .ToArray() at all.
I would say that this falls into the category of premature optimization. If, after establishing benchmarks, you find that the code is too slow, you can always try each approach and pick the result that works better for you.
Since we don't know how the IEnumerable<> is produced it's hard to say which approach will perform better. We also don't know how many items will remain after you apply your predicate - nor do we know whether the action or iteration steps are going to be the dominant factor in the execution of your code. The only way to know for sure is to try it both ways, profile the results, and pick the best.
Performance aside, I would choose the version that is clearest, which (for me) is to filter first and then apply the action to the result.