LINQ Optimization Question

So I've been using LINQ for a while, and I have a question.
I have a collection of objects. They fall into two categories based on their property values. I need to set a different property one way for one group, and one way for the other:
foreach (MyItem currentItem in myItemCollection)
{
    if (currentItem.myProp == "CATEGORY_ONE")
    {
        currentItem.value = 1;
    }
    else if (currentItem.myProp == "CATEGORY_TWO")
    {
        currentItem.value = 2;
    }
}
Alternately, I could do something like:
myItemCollection.Where(currentItem=>currentItem.myProp == "CATEGORY_ONE").ForEach(item=>item.value = 1);
myItemCollection.Where(currentItem=>currentItem.myProp == "CATEGORY_TWO").ForEach(item=>item.value = 2);
I would think the first one is faster, but figured it couldn't hurt to check.

Iterating through the collection only once (and not calling any delegates, and not using as many iterators) is likely to be slightly faster, but I very much doubt that it'll be significant.
Write the most readable code which does the job, and only worry about performance at the micro level (i.e. where it's easy to change) when it's a problem.
I think the first piece of code is more readable in this case. Less LINQy, but more readable.

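How about doing it like this? Note that it assigns 2 to every item that is not CATEGORY_ONE, so it only matches the original loop if those are the only two categories.
myItemCollection.ForEach(item => item.value = item.myProp == "CATEGORY_ONE" ? 1 : 2);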

Real Answer
Only a profiler will really tell you which one is faster.
Fuzzy Answer
The first one is most likely faster in terms of raw speed. There are two reasons why:
The list is only iterated a single time.
The second one requires two delegate invocations for every element in the list.
The real question though is "does the speed difference between the two solutions matter?" That is the only question that is relevant to your application. And only profiling can really give you much data on this.
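For a quick first look before reaching for a real profiler, something like the following Stopwatch sketch could be used. MyItem here is a hypothetical stand-in for the asker's type, the item count and repetition count are arbitrary, and the numbers it prints will vary by machine and build settings, so treat it as a rough comparison only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the asker's item type.
class MyItem
{
    public string myProp;
    public int value;
}

static class TimingSketch
{
    // Single pass with an if/else chain, as in the first snippet.
    public static void SinglePass(List<MyItem> items)
    {
        foreach (MyItem item in items)
        {
            if (item.myProp == "CATEGORY_ONE") item.value = 1;
            else if (item.myProp == "CATEGORY_TWO") item.value = 2;
        }
    }

    // Two filtered passes, as in the second snippet.
    public static void TwoPasses(List<MyItem> items)
    {
        foreach (MyItem item in items.Where(i => i.myProp == "CATEGORY_ONE")) item.value = 1;
        foreach (MyItem item in items.Where(i => i.myProp == "CATEGORY_TWO")) item.value = 2;
    }

    // Builds test data with alternating categories.
    public static List<MyItem> MakeItems(int count)
    {
        var items = new List<MyItem>(count);
        for (int i = 0; i < count; i++)
            items.Add(new MyItem { myProp = i % 2 == 0 ? "CATEGORY_ONE" : "CATEGORY_TWO" });
        return items;
    }

    // Runs the action repeatedly and returns the elapsed milliseconds.
    public static long Time(Action action, int repetitions)
    {
        action(); // warm-up run so JIT compilation is not measured
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++) action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        List<MyItem> items = MakeItems(100000);
        Console.WriteLine("single pass: {0} ms", Time(() => SinglePass(items), 100));
        Console.WriteLine("two passes:  {0} ms", Time(() => TwoPasses(items), 100));
    }
}
```

If the two timings are close for realistic collection sizes, that is a reasonable sign that readability should decide.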

Related

Efficiently gather a list of objects

List<MyClass> options = new List<MyClass>();
foreach (MyClass entity in ExistingList) {
    if (entity.IsCoolEnough) {
        options.Add(entity);
    }
}
I'm simply curious what the fastest, most efficient way of doing this is. The list isn't very large, but it's run frequently, so I'd like to keep it snappy. I'm not looking for a change in verbosity either. I just want runtime as fast as possible.

Well, using LINQ it reads more intuitively:
var options = (from e in ExistingList where e.IsCoolEnough select e).ToList();
I'm not sure whether it is faster or more efficient, though.
I'd say that for small lists, this is actually some kind of over-optimization, as for short lists, foreach, for and the above approach shouldn't make a difference at all. So instead of optimizing this, first check whether it imposes a runtime-problem at all.
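To make the comparison concrete, here is a small sketch of the three forms side by side: the explicit loop, the query expression, and the equivalent method-call form. MyClass with an Id field is a hypothetical stand-in for the asker's type. All three should produce the same list; which one is fastest for a given workload is something only measurement can show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's MyClass.
class MyClass
{
    public int Id;
    public bool IsCoolEnough;
}

static class FilterSketch
{
    // Explicit loop, as in the question.
    public static List<MyClass> WithLoop(List<MyClass> source)
    {
        var options = new List<MyClass>();
        foreach (MyClass entity in source)
        {
            if (entity.IsCoolEnough) options.Add(entity);
        }
        return options;
    }

    // Query-expression form, as in the answer.
    public static List<MyClass> WithQuery(List<MyClass> source)
    {
        return (from e in source where e.IsCoolEnough select e).ToList();
    }

    // Method-call form of the same filter.
    public static List<MyClass> WithMethod(List<MyClass> source)
    {
        return source.Where(e => e.IsCoolEnough).ToList();
    }

    static void Main()
    {
        var source = new List<MyClass>
        {
            new MyClass { Id = 1, IsCoolEnough = true },
            new MyClass { Id = 2, IsCoolEnough = false },
            new MyClass { Id = 3, IsCoolEnough = true }
        };
        Console.WriteLine(WithLoop(source).Count);   // 2
        Console.WriteLine(WithQuery(source).Count);  // 2
        Console.WriteLine(WithMethod(source).Count); // 2
    }
}
```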

I see two possible cases of "general" optimization, either at db-level or model-level depending on where the resource hog is.
If you can select via a where on IsCoolEnough in db-level:
var result = ExistingList.Where(e => e.IsCoolEnough) // still in db?
.ToList(); // enumerate
If you can't and the expensive operation is IsCoolEnough:
var result = new ConcurrentBag<MyClass>(); // Thread safe but unordered.
Parallel.ForEach(ExistingList, current =>
{
    if (current.IsCoolEnough) result.Add(current);
});
It's impossible for me to give more solid advice given the quite unknown requirements. But you do not have any general "faults" in your implementation.
I like referring to this article, so I'm doing that again now. It's worth the read (Ie click somewhere on/in this line/sentence, it's a link).

Methods: What is better List or object?

While I was programming I came up with this question:
What is better, having a method accept a single entity or a List of those entities?
For example, I need a List of strings. I can either have:
a method accepting a List and returning a List of strings with the results.
List<string> results = methodwithlist(objects);
or
a method accepting an object and returning a string. Then I use this method in a loop to fill a list.
List<string> results = new List<string>();
for (int i = 0; i < objects.Count; i++)
{
    results.Add(methodwithsingleobject(objects[i]));
}
** This is just an example. I need to know which one is better, or more commonly used, and why.
Thanks!

Well, it's easy to build the first form when you've got the second - but using LINQ, you really don't need to write your own, once you've got the projection. For example, you could write:
List<string> results = objectList.Select(x => MethodWithSingleObject(x)).ToList();
Generally it's easier to write and test a method which only deals with a single value, unless it actually needs to know the rest of the values in the collection (e.g. to find aggregates).
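As a rough illustration of building the list form on top of the single-value form, consider the sketch below. Describe and DescribeAll are hypothetical names invented for this example, and int is used as the element type only to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ProjectionSketch
{
    // Hypothetical single-value method: easy to write and test in isolation.
    public static string Describe(int value)
    {
        return "item " + value;
    }

    // The list form is just a projection over the single-value form.
    public static List<string> DescribeAll(IEnumerable<int> values)
    {
        return values.Select(v => Describe(v)).ToList();
    }

    static void Main()
    {
        List<string> results = DescribeAll(new[] { 1, 2, 3 });
        Console.WriteLine(string.Join(", ", results.ToArray())); // item 1, item 2, item 3
    }
}
```

With this shape, callers that have only one value call Describe directly, and callers that have a collection can either call DescribeAll or write the Select inline.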

I would choose the second because it's easier to use when you have a single string (i.e. it's more general purpose). Also, the responsibility of the method itself is clearer, because the method should not have anything to do with lists if its purpose is just to modify a string.
Also, you can simplify the call with LINQ:
result = yourList.Select(p => methodwithsingleobject(p));

This question comes up a lot when learning any language. The answer is somewhat moot, since the standard coding practice is to rely upon LINQ to optimize the code for you at runtime, although this presumes you're using a version of the language that supports it. If you do want to do some research on this, there are a few Stack Overflow questions that delve into it and also give external resources to review:
In .NET, which loop runs faster, 'for' or 'foreach'?
C#, For Loops, and speed test... Exact same loop faster second time around?
What I have learned, though, is not to rely too heavily on Count and to use Length on typed Collections as that can be a lot faster.
Hope this is helpful.

LINQ or foreach - style/readability and speed

I have a piece of code for some validation logic, which in generalized form goes like this:
private bool AllItemsAreSatisfactoryV1(IEnumerable<Source> collection)
{
    foreach (var foo in collection)
    {
        Target target = SomeFancyLookup(foo);
        if (!target.Satisfactory)
        {
            return false;
        }
    }
    return true;
}
This works, is pretty easy to understand, and has early-out optimization. It is, however, pretty verbose. The main purpose of this question is what is considered readable and good style. I'm also interested in the performance; I'm a firm believer that premature {optimization, pessimization} is the root of all evil, and try to avoid micro-optimizing as well as introducing bottlenecks.
I'm pretty new to LINQ, so I'd like some comments on the two alternative versions I've come up with, as well as any other suggestions wrt. readability.
private bool AllItemsAreSatisfactoryV2(IEnumerable<Source> collection)
{
    return null ==
        (from foo in collection
         where !(SomeFancyLookup(foo).Satisfactory)
         select foo).First();
}
private bool AllItemsAreSatisfactoryV3(IEnumerable<Source> collection)
{
    return !collection.Any(foo => !SomeFancyLookup(foo).Satisfactory);
}
I don't believe that V2 offers much over V1 in terms of readability, even if shorter. I find V3 to be clear & concise, but I'm not too fond of the Method().Property part; of course I could turn the lambda into a full delegate, but then it loses its one-line elegance.
What I'd like comments on are:
Style - ever so subjective, but what do you feel is readable?
Performance - are any of these a definite no-no? As far as I understand, all three methods should early-out.
Debuggability - anything to consider?
Alternatives - anything goes.
Thanks in advance :)

I think All would be clearer:
private bool AllItemsAreSatisfactoryV1(IEnumerable<Source> collection)
{
    return collection.Select(f => SomeFancyLookup(f)).All(t => t.Satisfactory);
}
I think it's unlikely using linq here would cause a performance problem over a regular foreach loop, although it would be straightforward to change if it did.

I personally have no problem with the style of V3, and that one would be my first choice. You're essentially looking through the list for any whose lookup is not satisfactory.
V2 is difficult to grasp the intent of, and in its current form will throw an exception (First() requires the source IEnumerable to not be empty; I think you're looking for FirstOrDefault()). Why not just tack Any() on the end instead of comparing a result from the list to null?
V1 is fine, if a bit loquacious, and probably the easiest to debug, as I've found debugging lambdas to be a bit persnickety at times. You can remove the inner braces to lose some whitespace without sacrificing readability.
Really, all three will boil down into very similar opcodes; iterate through collection, call SomeFancyLookup(), and check a property of its return value; get out on the first failure. Any() "hides" a very similar foreach algorithm. The difference between V1 and all others is the use of a named variable, which MIGHT be a little less performant, but you have a reference to a Target in all three cases so I doubt it's significant, if a difference even exists.
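The early-out behaviour and the First() pitfall can both be observed with a small sketch. IsSatisfactory is a hypothetical stand-in for SomeFancyLookup(foo).Satisfactory that counts its own calls, and the sample data is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EarlyOutSketch
{
    // Counts how often the lookup runs, to make the early-out observable.
    public static int LookupCalls;

    // Hypothetical stand-in for the lookup: non-negative values are "satisfactory".
    public static bool IsSatisfactory(int value)
    {
        LookupCalls++;
        return value >= 0;
    }

    // All() stops at the first element that fails the predicate.
    public static bool AllSatisfactory(IEnumerable<int> values)
    {
        return values.All(v => IsSatisfactory(v));
    }

    // Negated Any(), as in V3; stops at the first unsatisfactory element.
    public static bool NoneUnsatisfactory(IEnumerable<int> values)
    {
        return !values.Any(v => !IsSatisfactory(v));
    }

    // FirstOrDefault() returns null for an empty sequence of a reference type.
    public static string FirstLongNameOrNull(IEnumerable<string> names)
    {
        return names.Where(n => n.Length > 5).FirstOrDefault();
    }

    static void Main()
    {
        int[] values = { 1, -1, 2, 3 };

        LookupCalls = 0;
        Console.WriteLine(AllSatisfactory(values));    // False
        Console.WriteLine(LookupCalls);                // 2

        LookupCalls = 0;
        Console.WriteLine(NoneUnsatisfactory(values)); // False
        Console.WriteLine(LookupCalls);                // 2

        string[] names = { "a", "b" };
        Console.WriteLine(FirstLongNameOrNull(names) == null); // True
        try
        {
            // First() on an empty filtered sequence throws, which is the problem with V2.
            names.Where(n => n.Length > 5).First();
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("First() threw");
        }
    }
}
```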

Which LINQ query is more effective?

I have a huge IEnumerable(suppose the name is myItems), which way is more effective?
Solution 1: Filter it first then ForEach.
Array.ForEach(myItems.Where(FILTER-IT-HERE).ToArray(),MY-ACTION);
Solution 2: Do RETURN in MY-ACTION if the item is not up to the mustard.
Array.ForEach(myItems.ToArray(),MY-ACTION-WITH-FILTER);
Is one of them always better than another? Or any other good suggestions? Thanks in advance.

Did you do any measurements? Since WE can't measure the run time of My-Action then only you can. Measure and decide.

Sometimes one has to create benchmarks because similar-looking activities can produce radically different and unexpected results.
You do not say what your data source is so I'm going to assume it may be data on an SQL server in which case filtering at the server side will likely always be the best approach because you have minimized the amount of data transfer. Memory access is always faster than data transfer from disk to memory so whenever you can transfer fewer records, you are likely to have better performance.

Well, both times, you're converting to an array, which might not be so efficient if the IEnumerable is very large (like you said). You could create a generic extension method for IEnumerable, like:
public static void ForEach<T>(this IEnumerable<T> current, Action<T> action) {
    foreach (var i in current) {
        action(i);
    }
}
and then you could do this:
IEnumerable<int> ints = new List<int>();
ints.Where(i => i == 5).ForEach(i => Console.WriteLine(i));

If performance is a concern, it's unclear to me why you'd be bothering to construct an entire array in the first place. Why not just this?
foreach (var item in myItems.Where(FILTER-IT-HERE))
MY-ACTION;
Or:
foreach (var item in myItems)
MY-ACTION-WITH-FILTER;
I ask because, while the others are right that you can't really know without testing, I wouldn't expect there to be much difference between the above two options. I would expect there to be a difference, on the other hand, between creating/populating an array (seemingly for no reason) and not creating an array.
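A minimal sketch of the difference, with made-up names: Numbers is a hypothetical lazy source, and the "action" is simply adding to a sum. The streaming version never materializes the filtered items, while the array version allocates a copy of them first. Both should compute the same result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StreamingSketch
{
    // Hypothetical lazy source: yields 0 .. count-1 one at a time.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++) yield return i;
    }

    // Filter and act in one streaming pass; no intermediate array.
    public static int SumEvenStreaming(IEnumerable<int> items)
    {
        int sum = 0;
        foreach (int item in items.Where(i => i % 2 == 0)) sum += item;
        return sum;
    }

    // Same work, but the filtered items are first copied into an array.
    public static int SumEvenViaArray(IEnumerable<int> items)
    {
        int sum = 0;
        Array.ForEach(items.Where(i => i % 2 == 0).ToArray(), i => sum += i);
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumEvenStreaming(Numbers(10))); // 20
        Console.WriteLine(SumEvenViaArray(Numbers(10)));  // 20
    }
}
```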

Everything else being equal, calling ToArray() first will impart a greater performance hit than calling it last. Although, as has been stated by others before me: why use ToArray() and Array.ForEach() at all?
We don't know that everything else actually is equal, since you do not reveal the implementation details of your filter and action.

The idea of LINQ is to work on enumerable collections, so the best LINQ query is the one where you don't use Array.ForEach() and .ToArray() at all.

I would say that this falls into the category of premature optimization. If, after establishing benchmarks, you find that the code is too slow, you can always try each approach and pick the result that works better for you.
Since we don't know how the IEnumerable<> is produced it's hard to say which approach will perform better. We also don't know how many items will remain after you apply your predicate - nor do we know whether the action or iteration steps are going to be the dominant factor in the execution of your code. The only way to know for sure is to try it both ways, profile the results, and pick the best.
Performance aside, I would choose the version that is most clear - which (for me) is to first filter and then apply the projection to the result.

Find() vs. enumeration on lists

I'm working with a code base where lists need to be frequently searched for a single element.
Is it faster to use a Predicate and Find() than to manually do an enumeration on the List?
for example:
string needle = "example";
FooObj result = _list.Find(delegate(FooObj foo) {
    return foo.Name == needle;
});
vs.
string needle = "example";
foreach (FooObj foo in _list)
{
    if (foo.Name == needle)
        return foo;
}
While they are equivalent in functionality, are they equivalent in performance as well?

They are not equivalent in performance. The Find() method requires a method (in this case delegate) invocation for every item in the list. Method invocation is not free and is relatively expensive as compared to an inline comparison. The foreach version requires no extra method invocation per object.
That being said, I wouldn't pick one or the other based on performance until I actually profiled my code and found this to be a problem. I haven't yet found the overhead of this scenario to ever be a "hot path" problem for code I've written, and I use this pattern a lot with Find and other similar methods.

If searching your list is too slow as-is, you can probably do better than a linear search. If you can keep the list sorted, you can use a binary search to find the element in O(lg n) time.
If you're searching a whole lot, consider replacing that list with a Dictionary to index your objects by name.
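Both suggestions can be sketched roughly as follows. FooObj with a Name field mirrors the question; the ByName comparer and the helper method names are assumptions made up for this example. The binary search only works if the list is kept sorted with the same comparer, and ToDictionary throws if two objects share a name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal version of the type from the question.
class FooObj
{
    public string Name;
}

static class LookupSketch
{
    // Orders FooObj instances by Name using ordinal comparison.
    class ByName : IComparer<FooObj>
    {
        public int Compare(FooObj x, FooObj y)
        {
            return string.CompareOrdinal(x.Name, y.Name);
        }
    }

    // Sorts the list in place so that FindSorted can be used on it.
    public static void SortByName(List<FooObj> list)
    {
        list.Sort(new ByName());
    }

    // Binary search over a list already sorted by Name: O(log n) per lookup.
    public static FooObj FindSorted(List<FooObj> sortedByName, string needle)
    {
        int index = sortedByName.BinarySearch(new FooObj { Name = needle }, new ByName());
        return index >= 0 ? sortedByName[index] : null;
    }

    // Index by name for constant-time lookups on average; assumes unique names.
    public static Dictionary<string, FooObj> IndexByName(IEnumerable<FooObj> items)
    {
        return items.ToDictionary(f => f.Name);
    }

    static void Main()
    {
        var list = new List<FooObj>
        {
            new FooObj { Name = "pear" },
            new FooObj { Name = "example" },
            new FooObj { Name = "apple" }
        };

        SortByName(list);
        Console.WriteLine(FindSorted(list, "example") != null); // True
        Console.WriteLine(FindSorted(list, "missing") == null); // True

        Dictionary<string, FooObj> byName = IndexByName(list);
        FooObj found;
        Console.WriteLine(byName.TryGetValue("example", out found)); // True
    }
}
```

Whether the cost of keeping the list sorted or maintaining the dictionary pays off depends on how often the list changes compared with how often it is searched.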

Technically, the runtime performance of the delegate version will be slightly worse than the other version - but in most cases you'd be hard pressed to perceive any difference.
Of more importance (IMHO) is the code-time performance of being able to write what you want, rather than how you want it. This makes a big difference in maintainability.
This original code:
string needle = "example";
foreach (FooObj foo in _list)
{
    if (foo.Name == needle)
        return foo;
}
requires any maintainer to read the code and understand that you're looking for a particular item.
This code
string needle = "example";
return _list.Find(
    delegate(FooObj foo)
    {
        return foo.Name == needle;
    });
makes it clear that you're looking for a particular item - quicker to understand.
Finally, this code, using features from C# 3.0:
string needle = "example";
return _list.Find( foo => foo.Name == needle);
does exactly the same thing, but in one line that's even faster to read and understand (well, once you understand lambda expressions, anyway).
In summary, given that the performance of the alternatives is nearly equal, choose the one that makes the code easier to read and maintain.

"I'm working with a code base where lists need to be frequently searched for a single element"
It is better to change your data structure to a Dictionary instead of a List to get better performance.

A similar question was asked for List.ForEach vs. foreach-iteration (foreach vs someList.Foreach(){}).
In that case List.ForEach was a bit faster.

As Jared pointed out, there are differences.
But, as always, don't worry unless you know it's a bottleneck. And if it is a bottleneck, that's probably because the lists are big, in which case you should consider using a faster find - a hash table or binary tree, or even just sorting the list and doing binary search will give you log(n) performance which will have far more impact than tweaking your linear case.
