Which LINQ query is more effective? - c#

I have a huge IEnumerable (suppose its name is myItems). Which way is more efficient?
Solution 1: Filter it first then ForEach.
Array.ForEach(myItems.Where(FILTER-IT-HERE).ToArray(),MY-ACTION);
Solution 2: Return early in MY-ACTION if the item doesn't cut the mustard.
Array.ForEach(myItems.ToArray(),MY-ACTION-WITH-FILTER);
Is one of them always better than another? Or any other good suggestions? Thanks in advance.

Did you do any measurements? Since WE can't measure the run time of My-Action then only you can. Measure and decide.

Sometimes one has to create benchmarks because similar-looking activities can produce radically different and unexpected results.
You do not say what your data source is so I'm going to assume it may be data on an SQL server in which case filtering at the server side will likely always be the best approach because you have minimized the amount of data transfer. Memory access is always faster than data transfer from disk to memory so whenever you can transfer fewer records, you are likely to have better performance.
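One way to act on the advice to measure is a minimal Stopwatch sketch. It assumes a hypothetical filter (even numbers) and action (adding to a sum) in place of FILTER-IT-HERE and MY-ACTION, which the question does not show; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class FilterBenchmark
{
    // Hypothetical stand-in for FILTER-IT-HERE.
    private static bool IsWanted(int x) => x % 2 == 0;

    // Solution 1: filter, copy the survivors to an array, then act on each.
    public static long FilterThenAct(IEnumerable<int> items)
    {
        long sum = 0;
        Array.ForEach(items.Where(x => IsWanted(x)).ToArray(), x => sum += x);
        return sum;
    }

    // Solution 2: copy everything to an array, return early inside the action.
    public static long ActWithFilter(IEnumerable<int> items)
    {
        long sum = 0;
        Array.ForEach(items.ToArray(), x =>
        {
            if (!IsWanted(x)) return;
            sum += x;
        });
        return sum;
    }

    public static void Main()
    {
        IEnumerable<int> myItems = Enumerable.Range(0, 1000000);

        var watch = Stopwatch.StartNew();
        long first = FilterThenAct(myItems);
        watch.Stop();
        Console.WriteLine("Filter first:     " + watch.ElapsedMilliseconds + " ms, sum " + first);

        watch.Restart();
        long second = ActWithFilter(myItems);
        watch.Stop();
        Console.WriteLine("Filter in action: " + watch.ElapsedMilliseconds + " ms, sum " + second);
    }
}
```

A single pass like this includes JIT warm-up, so repeated runs with your real filter and action will give more trustworthy numbers.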

Well, both times, you're converting to an array, which might not be so efficient if the IEnumerable is very large (like you said). You could create a generic extension method for IEnumerable, like:
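// Extension methods must live in a static class; the class name here is arbitrary.
public static class EnumerableExtensions {
    public static void ForEach<T>(this IEnumerable<T> current, Action<T> action) {
        foreach (var i in current) {
            action(i);
        }
    }
}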
and then you could do this:
IEnumerable<int> ints = new List<int>();
ints.Where(i => i == 5).ForEach(i => Console.WriteLine(i));

If performance is a concern, it's unclear to me why you'd be bothering to construct an entire array in the first place. Why not just this?
foreach (var item in myItems.Where(FILTER-IT-HERE))
    MY-ACTION;
Or:
foreach (var item in myItems)
    MY-ACTION-WITH-FILTER;
I ask because, while the others are right that you can't really know without testing, I wouldn't expect there to be much difference between the above two options. I would expect there to be a difference, on the other hand, between creating/populating an array (seemingly for no reason) and not creating an array.
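To illustrate why no array is needed, here is a small sketch showing that a foreach over Where streams items one at a time. Produce is a hypothetical lazily generated source, and the odd-number filter stands in for FILTER-IT-HERE; neither comes from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Hypothetical lazy source; records each item at the moment it is generated.
    public static IEnumerable<int> Produce(int count, List<string> log)
    {
        for (int i = 0; i < count; i++)
        {
            log.Add("produce " + i);
            yield return i;
        }
    }

    // Filters and acts in a single pass, with no intermediate array.
    public static List<string> Run()
    {
        var log = new List<string>();
        foreach (var item in Produce(4, log).Where(x => x % 2 == 1))
        {
            log.Add("act on " + item);
        }
        return log;
    }

    public static void Main()
    {
        // Prints: produce 0, produce 1, act on 1, produce 2, produce 3, act on 3
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

Production and action interleave, so memory use stays flat regardless of how many items the source yields. Inserting ToArray() before the loop would force every item to be produced and stored first.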

Everything else being equal, calling ToArray() first will impart a greater performance hit than when calling it last. Although, as has been stated by others before me,
Why use ToArray() and Array.ForEach() at all?
We don't know that everything else actually is equal since you do not reveal the implementation details of your filter and action.

The idea of LINQ is to work on enumerable collections, so the best LINQ query is the one where you don't use Array.ForEach() and .ToArray() at all.

I would say that this falls into the category of premature optimization. If, after establishing benchmarks, you find that the code is too slow, you can always try each approach and pick the result that works better for you.
Since we don't know how the IEnumerable<> is produced it's hard to say which approach will perform better. We also don't know how many items will remain after you apply your predicate - nor do we know whether the action or iteration steps are going to be the dominant factor in the execution of your code. The only way to know for sure is to try it both ways, profile the results, and pick the best.
Performance aside, I would choose the version that is clearest, which (for me) is to filter first and then apply the action to the result.

Related

Is .Select<T>(...) to be preferred over .Where<T>(...)?

I got into a discussion with two colleagues regarding a setup for an iteration over an IEnumerable (the contents of which will not be altered in any way during the operation). There are three conflicting theories on which is the optimal approach. Both of the others (and I as well) are very certain, which made me unsure, so for the sake of clarity I want to check with an external source.
The scenario is as follows. We had the code below as a starting point and discovered that some of the hazaas need not be acted upon. So, starting with the code below, we started to add a blocker for the action.
foreach(Hazaa hazaa in hazaas) ;
My suggestion is as follows.
foreach(Hazaa hazaa in hazaas.Where(element => condition)) ;
One of the guys wants to resolve it with a more explicit form, claiming that LINQ is not appropriate in this case (not sure why that would be so, but he seems very convinced). His solution is this.
foreach(Hazaa hazaa in hazaas)
    if(condition) ;
The other counter-suggestion is supported by the claim that Where risks repeating the filtering process needlessly, and that it's more certain to minimize the computational workload by picking the appropriate elements once and for all with Select.
foreach(Hazaa hazaa in hazaas.Select(element => condition)) ;
I argue that the first is obsolete, since LINQ can handle data objects quite well.
I also believe that Select-ing is in this case equivalently fast to Where-ing and no needless steps will be taken (e.g. the evaluation of the condition on the elements will only be performed once). If anything, it should be faster using Where because we won't be creating an extra instance of anything.
Who's right?
Select is inappropriate. It doesn't filter anything.
if is a possible solution, but Where is just as explicit.
Where executes the condition exactly once per item, just as the if. Additionally, it is important to note that the call to Where doesn't iterate the list. So, using Where you iterate the list exactly once, just like when using if.
I think you are discussing with one person that didn't understand LINQ - the guy that wants to use Select - and one that doesn't like the functional aspect of LINQ.
I would go with Where.
The .Where() and the if(condition) approaches will be the same.
But since LINQ is nicely readable, I'd prefer that.
The approach with .Select() is nonsense, since it will not return the Hazaa objects, but an IEnumerable<Boolean>.
To be clear about the functions:
myEnumerable.Where(a => isTrueFor(a)) //This is filtering
myEnumerable.Select(a => a.b) //This is projection
Where() will run a function that returns a Boolean for each item of the enumerable, and return the item or not depending on the result of that function.
Select() will run a function for every item in the list and return the result of the function without doing any filtering.
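The difference is easy to see in a short sketch. It assumes a list of integers and an is-even condition as stand-ins for the hazaas and the condition in the question; all names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereVersusSelect
{
    // Hypothetical condition standing in for the one in the question.
    private static bool IsEven(int n) => n % 2 == 0;

    // Filtering: the result contains the original items that passed.
    public static List<int> Filter(IEnumerable<int> source)
    {
        return source.Where(n => IsEven(n)).ToList();
    }

    // Projection: the result contains one Boolean per item, nothing is removed.
    public static List<bool> Project(IEnumerable<int> source)
    {
        return source.Select(n => IsEven(n)).ToList();
    }

    // Counts how often the predicate runs during one full pass over Where.
    public static int CountPredicateCalls(IEnumerable<int> source)
    {
        int calls = 0;
        source.Where(n => { calls++; return IsEven(n); }).Count();
        return calls;
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(",", Filter(numbers)));   // 2,4
        Console.WriteLine(string.Join(",", Project(numbers)));  // False,True,False,True
        Console.WriteLine(CountPredicateCalls(numbers));        // 4
    }
}
```

The predicate runs exactly once per item under Where, the same amount of work as an if inside the loop.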

Memory usage in a foreach loop C#

I have a foreach loop
var axsEntities = GetAxsEntitiesForInvoicing(adapter)
    .GroupBy(x => x.AccountUsingAccountIdToAccountId);
foreach(var gbAccount in axsEntities)
{
    int i = gbAccount.Count(); // C# is case-sensitive: the LINQ method is Count()
}
Now when I run this without the loop it runs fine, but with the loop it uses way too much memory, 3 gigabytes in this case. What could be the reason for this?
Thanks
Without the loop, nothing is really happening.
axsEntities is just an IEnumerable with deferred execution.
Creating it is always cheap. Only when iterating over it (the foreach) are things actually fetched and computed.
So you just might have very many elements, or .count() uses a lot of memory.
We'd have to see what type axsEntities is to be sure, but I'm guessing it is an IQueryable? If so, without the foreach loop you aren't actually doing anything with that set. With the loop, you're actually iterating the result set.
The first expression is probably lazy evaluated. Try a simple
var test = axsEntities.ToList();
to see if that also uses a lot of memory.
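The deferred behaviour described in these answers can be observed with a small in-memory sketch. GetEntities is a hypothetical stand-in for GetAxsEntitiesForInvoicing that counts how many items have been pulled, and it assumes plain LINQ to Objects rather than a database-backed query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    private static int _fetched;

    // Hypothetical source: counts every item that is actually pulled from it.
    private static IEnumerable<int> GetEntities()
    {
        for (int id = 0; id < 6; id++)
        {
            _fetched++;
            yield return id;
        }
    }

    // Returns { items fetched before the loop, items fetched once the first group arrives }.
    public static int[] Run()
    {
        _fetched = 0;
        var groups = GetEntities().GroupBy(id => id % 2);
        int beforeLoop = _fetched;

        int afterFirstGroup = -1;
        foreach (var g in groups)
        {
            if (afterFirstGroup < 0) afterFirstGroup = _fetched;
        }
        return new[] { beforeLoop, afterFirstGroup };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("Fetched before loop: " + counts[0]);       // 0
        Console.WriteLine("Fetched at first group: " + counts[1]);    // 6
    }
}
```

Building the query fetches nothing, and the in-memory GroupBy has to read the whole source before it can hand out the first group, which is where the memory goes.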
The problem likely is NOT the foreach loop, but the GroupBy logic, whose execution is deferred until the loop runs.
Unless the GetAxsEntitiesForInvoicing method returns an IQueryable and does not return all entities, the grouping has to happen in memory.
What about gbAccount.count(); inside a foreach loop? This might not be a good idea. I would first check to see if this is responsible for using the precious memory. My advice is that you could come up with a more specialized query e.g. GetAccountsCountForGroupedAxsEntitiesForInvoicingByAccountUsingAccountIdToAccountIdWithSauce, this sounds like a really nice name to me.
Peace

Methods: What is better List or object?

While I was programming I came up with this question,
What is better, having a method accept a single entity or a List of those entities?
For example I need a List of strings. I can either have:
a method accepting a List and returning a List of strings with the results.
List<string> results = methodwithlist(objects);
or
a method accepting an object and returning a string. Then use this function in a loop and so fill a list.
for (int i = 0; i < objects.Count; i++)
{
    results.Add(methodwithsingleobject(objects[i]));
}
** This is just an example. I need to know which one is better, or more used, and why.
Thanks!
Well, it's easy to build the first form when you've got the second - but using LINQ, you really don't need to write your own, once you've got the projection. For example, you could write:
List<string> results = objectList.Select(x => MethodWithSingleObject(x)).ToList();
Generally it's easier to write and test a method which only deals with a single value, unless it actually needs to know the rest of the values in the collection (e.g. to find aggregates).
I would choose the second because it's easier to use when you have a single string (i.e. it's more general purpose). Also, the responsibility of the method itself is clearer, because the method should not have anything to do with lists if its purpose is just to modify a string.
Also, you can simplify the call with Linq:
result = yourList.Select(p => methodwithsingleobject(p));
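A compact sketch of that design: write the single-item method, then derive the list version from it with Select. Describe and DescribeAll are hypothetical names, and integers stand in for the unspecified objects in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Projection
{
    // Hypothetical single-item method: easy to write and to test in isolation.
    public static string Describe(int value)
    {
        return "item " + value;
    }

    // The list version falls out of the single-item one via Select.
    // ToList() is needed because Select alone returns a lazy IEnumerable<string>.
    public static List<string> DescribeAll(IEnumerable<int> values)
    {
        return values.Select(v => Describe(v)).ToList();
    }

    public static void Main()
    {
        List<string> results = DescribeAll(new[] { 1, 2, 3 });
        Console.WriteLine(string.Join(", ", results)); // item 1, item 2, item 3
    }
}
```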
This question comes up a lot when learning any language. The answer is somewhat moot, since the common practice is to write the single-item method and rely on LINQ to do the iteration for you (LINQ to Objects keeps the code concise; it does not optimize it at runtime). This presumes you're using a version of the language that supports it. If you do want to do some research on this, there are a few Stack Overflow questions that delve into it and also give external resources to review:
In .NET, which loop runs faster, 'for' or 'foreach'?
C#, For Loops, and speed test... Exact same loop faster second time around?
What I have learned, though, is not to rely too heavily on Count and to use Length on typed Collections as that can be a lot faster.
Hope this is helpful.

Best Practice - Removing item from generic collection in C#

I'm using C# in Visual Studio 2008 with .NET 3.5.
I have a generic dictionary that maps types of events to a generic list of subscribers. A subscriber can be subscribed to more than one event.
private static Dictionary<EventType, List<ISubscriber>> _subscriptions;
To remove a subscriber from the subscription list, I can use either of these two options.
Option 1:
ISubscriber subscriber; // defined elsewhere
foreach (EventType eventType in _subscriptions.Keys) { // "event" is a reserved word in C#
    if (_subscriptions[eventType].Contains(subscriber)) {
        _subscriptions[eventType].Remove(subscriber);
    }
}
Option 2:
ISubscriber subscriber; // defined elsewhere
foreach (EventType eventType in _subscriptions.Keys) { // "event" is a reserved word in C#
    _subscriptions[eventType].Remove(subscriber);
}
I have two questions.
First, notice that Option 1 checks for existence before removing the item, while Option 2 uses a brute force removal since Remove() does not throw an exception. Of these two, which is the preferred, "best-practice" way to do this?
Second, is there another, "cleaner," more elegant way to do this, perhaps with a lambda expression or using a LINQ extension? I'm still getting acclimated to these two features.
Thanks.
EDIT
Just to clarify, I realize that the choice between Options 1 and 2 is a choice of speed (Option 2) versus maintainability (Option 1). In this particular case, I'm not necessarily trying to optimize the code, although that is certainly a worthy consideration. What I'm trying to understand is if there is a generally well-established practice for doing this. If not, which option would you use in your own code?
Option 1 will be slower than Option 2. Lambda expressions and LINQ will be slower. I would use HashSet<> instead of List<>.
If you need confirmation about item removal, then Contains has to be used.
EDITED:
Since there is a high probability of your code being used inside a lock statement, and best practice is to reduce the time spent executing inside a lock, it may be useful to apply Option 2. It looks like there is no best practice on whether or not to use Contains with Remove.
The Remove() method 'approaches O(1)' and is OK when a key does not exist.
But otherwise: when in doubt, measure. Getting some timings isn't that difficult...
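As a rough illustration of the HashSet<> suggestion, the sketch below uses strings as hypothetical stand-ins for ISubscriber instances. HashSet<T>.Remove returns a bool instead of throwing, so the return value already gives the confirmation that a separate Contains call would.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetSubscriptions
{
    // Removes the same subscriber twice and reports what Remove returned each time.
    public static bool[] RemoveTwice()
    {
        var subscribers = new HashSet<string> { "alice", "bob" };
        bool first = subscribers.Remove("alice");   // present: removed
        bool second = subscribers.Remove("alice");  // absent: no exception
        return new[] { first, second };
    }

    public static void Main()
    {
        bool[] results = RemoveTwice();
        Console.WriteLine(results[0] + ", " + results[1]); // True, False
    }
}
```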
Why enumerate the keys when all you're concerned with is the values?
foreach (List<ISubscriber> list in _subscriptions.Values)
{
list.Remove(subscriber);
}
That said, the LINQ solution suggested by Eric P is certainly more concise. Performance might be an issue, though.
I'd opt for the second option. Contains() and Remove() are both O(n) methods, and there's no reason to call both since Remove doesn't throw. At least with method 2, you're only calling one expensive operation instead of two.
I don't know of a faster way to handle it.
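If you wanted to use LINQ to do this, be aware that All stops at the first list for which Remove returns false, so All(x => x.Remove(subscriber)) can leave later lists untouched. Something like this avoids that:
_subscriptions.Values.ToList().ForEach(x => x.Remove(subscriber));
You might want to check the performance of that, though.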
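A small sketch of why Enumerable.All is risky when the lambda has side effects. Two plain lists of strings stand in for _subscriptions.Values and the subscribers; the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemoveDemo
{
    // Hypothetical data: the first list does not contain "alice", the second does.
    private static List<List<string>> BuildLists()
    {
        return new List<List<string>>
        {
            new List<string> { "bob" },
            new List<string> { "alice", "bob" }
        };
    }

    // All short-circuits as soon as Remove returns false for the first list.
    public static int RemainingAfterAll()
    {
        var lists = BuildLists();
        lists.All(list => list.Remove("alice"));
        return lists.Count(list => list.Contains("alice"));
    }

    // A plain loop visits every list; Remove simply returns false when absent.
    public static int RemainingAfterLoop()
    {
        var lists = BuildLists();
        foreach (var list in lists)
        {
            list.Remove("alice");
        }
        return lists.Count(list => list.Contains("alice"));
    }

    public static void Main()
    {
        Console.WriteLine(RemainingAfterAll());  // 1
        Console.WriteLine(RemainingAfterLoop()); // 0
    }
}
```

The loop also shows that Option 2 is safe: List<T>.Remove does not throw for a missing item, so the Contains check adds a second O(n) scan without changing the outcome.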

Generic list FindAll() vs. foreach

I'm looking through a generic list to find items based on a certain parameter.
In General, what would be the best and fastest implementation?
1. Looping through each item in the list and saving each match to a new list and returning that
foreach(string s in list)
{
    if(s == "match")
    {
        newList.Add(s);
    }
}
return newList;
Or
2. Using the FindAll method and passing it a delegate.
newList = list.FindAll(delegate(string s){return s == "match";});
Don't they both run in ~ O(N)? What would be the best practice here?
Regards,
Jonathan
You should definitely use the FindAll method, or the equivalent LINQ method. Also, consider using the more concise lambda instead of your delegate if you can (requires C# 3.0):
var list = new List<string>();
var newList = list.FindAll(s => s.Equals("match"));
I would use the FindAll method in this case, as it is more concise, and IMO, has easier readability.
You are right that they are pretty much going to both perform in O(N) time, although the foreach statement should be slightly faster given it doesn't have to perform a delegate invocation (delegates incur a slight overhead as opposed to directly calling methods).
I have to stress how insignificant this difference is: it's more than likely never going to matter unless you are doing a massive number of operations on a massive list.
As always, test to see where the bottlenecks are and act appropriately.
Jonathan,
You can find a good answer to this in chapter 5 (performance considerations) of LINQ in Action.
They measure a foreach search that executes about 50 times and come up with foreach = 68 ms per cycle / List.FindAll = 62 ms per cycle. Really, it would probably be in your interest to just create a test and see for yourself.
List.FindAll is O(n) and will search the entire list.
If you want to run your own iterator with foreach, I'd recommend using the yield statement, and returning an IEnumerable if possible. This way, if you end up only needing one element of your collection, it will be quicker (since you can stop your caller without exhausting the entire collection).
Otherwise, stick to the BCL interface.
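The yield suggestion might look like the sketch below. FindMatches is a hypothetical helper, and the inspected list is instrumentation added only to show how far the iterator got; neither comes from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazySearch
{
    // Yields matches one at a time; 'inspected' records every element examined.
    public static IEnumerable<string> FindMatches(
        IEnumerable<string> source, string wanted, List<string> inspected)
    {
        foreach (string s in source)
        {
            inspected.Add(s);
            if (s == wanted)
            {
                yield return s;
            }
        }
    }

    public static void Main()
    {
        var list = new List<string> { "a", "match", "b", "match" };
        var inspected = new List<string>();

        // First() stops pulling from the iterator after the first match.
        string first = FindMatches(list, "match", inspected).First();

        // Prints: match found after inspecting 2 of 4 items
        Console.WriteLine(first + " found after inspecting " + inspected.Count
            + " of " + list.Count + " items");
    }
}
```

A caller that wants every match can still call ToList() on the result, at which point the whole list is scanned just as FindAll would.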
Any perf difference is going to be extremely minor. I would suggest FindAll for clarity, or, if possible, Enumerable.Where. I prefer using the Enumerable methods because it allows for greater flexibility in refactoring the code (you don't take a dependency on List<T>).
Yes, both implementations are O(n). They need to look at every element in the list to find all matches. In terms of readability I would also prefer FindAll. For performance considerations have a look at LINQ in Action (Ch 5.3). If you are using C# 3.0 you could also apply a lambda expression. But that's just the icing on the cake:
var newList = aList.FindAll(s => s == "match");
I'm with the lambdas:
List<String> newList = list.FindAll(s => s.Equals("match"));
Unless the C# team has improved the performance for LINQ and FindAll, the following article seems to suggest that for and foreach would outperform LINQ and FindAll on object enumeration: LINQ on Objects Performance.
This article is dated March 2009, just before this question was originally asked.
