When writing a LINQ query with multiple "and" conditions, should I write a single where clause containing && or multiple where clauses, one for each condition?
static void Main(string[] args)
{
    var ints = new List<int>(Enumerable.Range(-10, 20));

    var positiveEvensA = from i in ints
                         where (i > 0) && ((i % 2) == 0)
                         select i;

    var positiveEvensB = from i in ints
                         where i > 0
                         where (i % 2) == 0
                         select i;

    System.Diagnostics.Debug.Assert(positiveEvensA.Count() ==
                                    positiveEvensB.Count());
}
Is there any difference other than personal preference or coding style (long lines, readability, etc.) between positiveEvensA and positiveEvensB?
One possible difference that comes to mind is that different LINQ providers may be able to better cope with multiple wheres rather than a more complex expression; is this true?
I personally would always go with the && vs. two where clauses whenever it doesn't make the statement unintelligible.
In your case, it probably won't be noticeable at all, but having two where clauses definitely will have a performance impact if you have a large collection and you use all of the results from this query. For example, if you call .Count() on the results, or iterate through the entire list, the first where clause will run, creating a new IEnumerable<T> that will be completely enumerated again, with a second delegate.
Chaining the 2 clauses together causes the query to form a single delegate that gets run as the collection is enumerated. This results in one enumeration through the collection and one call to the delegate each time a result is returned.
If you split them, things change. As your first where clause enumerates through the original collection, the second where clause enumerates its results. This causes, potentially (worst case), two full enumerations through your collection and two delegate calls per member, which means this statement could (theoretically) take twice as long to run.
If you do decide to use 2 where clauses, placing the more restrictive clause first will help quite a bit, since the second where clause is only run on the elements that pass the first one.
Now, in your case, this won't matter. On a large collection, it could. As a general rule of thumb, I go for:
1. Readability and maintainability
2. Performance
In this case, I think both options are equally maintainable, so I'd go for the more performant option.
This is mostly a personal style issue. Personally, as long as the where clause fits on one line, I group the clauses.
Using multiple wheres will tend to be less performant because it requires an extra delegate invocation for every element that makes it that far. However, it's likely to be an insignificant issue and should only be considered if a profiler shows it to be a problem.
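To make the difference concrete, here is what the two variants from the question look like in method syntax (a sketch using the ints list from the question):
// Single predicate: one delegate invocation per element.
var positiveEvensA = ints.Where(i => i > 0 && (i % 2) == 0);

// Chained predicates: a second delegate invocation for every element
// that passes the first filter.
var positiveEvensB = ints.Where(i => i > 0).Where(i => (i % 2) == 0);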
As JaredPar has already said: it depends on your personal preference, readability, and the use case. For example, if your method has some optional parameters and you want to filter a collection only when they are given, multiple Where calls are perfect:
IEnumerable<SomeClass> matchingItems = allItems;

if (!string.IsNullOrWhiteSpace(name))
    matchingItems = matchingItems
        .Where(c => c.Name == name);
if (date.HasValue)
    matchingItems = matchingItems
        .Where(c => c.Date == date.Value);
if (typeId.HasValue)
    matchingItems = matchingItems
        .Where(c => c.TypeId == typeId.Value);

return matchingItems;
If you wanted to do this with &&, have fun ;)
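For contrast, here is a sketch of what the same optional filtering would look like with a single && predicate (same assumed parameters as above); every check is now evaluated on every call:
return allItems.Where(c =>
    (string.IsNullOrWhiteSpace(name) || c.Name == name)
    && (!date.HasValue || c.Date == date.Value)
    && (!typeId.HasValue || c.TypeId == typeId.Value));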
Where I don't agree with JaredPar and Reed is the performance penalty that multiple Wheres are supposed to have. Actually, Where is optimized in a way that combines multiple predicates into one, as you can see here (in CombinePredicates).
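The idea is roughly the following (a simplified sketch, not the actual BCL source): calling Where on the result of a previous Where merges the two predicates into a single delegate, so the collection is still enumerated only once:
// Simplified sketch of what CombinePredicates does internally.
static Func<TSource, bool> CombinePredicates<TSource>(
    Func<TSource, bool> predicate1, Func<TSource, bool> predicate2)
{
    return x => predicate1(x) && predicate2(x);
}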
But I wanted to know whether it really has no big impact when the collection is large and there are multiple Wheres which all have to be evaluated. I was surprised that the following benchmark revealed the multiple-Where approach to be even slightly more efficient. The summary:
| Method        | Mean    | Error    | StdDev   |
|---------------|---------|----------|----------|
| MultipleWhere | 1.555 s | 0.0310 s | 0.0392 s |
| MultipleAnd   | 1.571 s | 0.0308 s | 0.0649 s |
Here's the benchmark code; I think it's good enough for this test:
#LINQPad optimize+

void Main()
{
    var summary = BenchmarkRunner.Run<WhereBenchmark>();
}

public class WhereBenchmark
{
    string[] fruits = new string[] { "apple", "mango", "papaya", "banana", "guava", "pineapple" };
    private IList<string> longFruitList;

    [GlobalSetup]
    public void Setup()
    {
        Random rnd = new Random();
        int size = 1_000_000;
        longFruitList = new List<string>(size);
        for (int i = 0; i < size; i++)
            longFruitList.Add(GetRandomFruit());

        string GetRandomFruit()
        {
            return fruits[rnd.Next(0, fruits.Length)];
        }
    }

    [Benchmark]
    public int MultipleWhere()
    {
        // Returning the count prevents dead-code elimination of the query.
        return longFruitList
            .Where(f => f.EndsWith("le"))
            .Where(f => f.Contains("app"))
            .Where(f => f.StartsWith("pine"))
            .Count(); // counting pineapples
    }

    [Benchmark]
    public int MultipleAnd()
    {
        return longFruitList
            .Where(f => f.EndsWith("le") && f.Contains("app") && f.StartsWith("pine"))
            .Count(); // counting pineapples
    }
}
The performance issue only applies to memory-based collections; LINQ to SQL generates expression trees that defer execution. More details here:
Multiple WHERE Clauses with LINQ extension methods
If you run SQL Profiler and check the generated queries, you can see that there is no difference between the two types of queries in terms of performance. So it just comes down to your taste in code style.
Like others have suggested, it's more of a personal preference. I like the use of && as it's more readable and mimics the syntax of other mainstream languages.
Related
I have a foreach statement as shown below:
foreach (var fieldMappingOption in collectionHelper.FieldMappingOptions
.Where(fmo => fmo.IsRequired && !fmo.IsCalculated
&& !fmo.FieldDefinition.Equals(MMPConstants.FieldDefinitions.FieldValue)
&& (implicitParents || anyParentMappings
|| fmo.ContainerType == collectionHelper.SelectedOption.ContainerType)))
{
if (!collectionHelper.FieldMappingHelpers
.Any(fmh => fmh.SelectedOption.Equals(fieldMappingOption)))
{
requiredMissing = true;
var message = String.Format(
"The MMP column {0} is required and therefore must be mapped to a {1} column.",
fieldMappingOption.Label, session.ImportSource.CollectionLabel);
session.ErrorMessages.Add(message);
}
}
Can I break the above complex foreach and if statements into a better-formatted LINQ expression? Also, which will be better performance-wise? Please suggest.
Re: Changing the foreach to a LINQ statement
Well, you could convert the loop into a LINQ Select, and since inside the loop you have only one branch with an additional predicate, you can combine that predicate into the outer Where, something like so:
var missingFieldMappingOptions = collectionHelper.FieldMappingOptions
    .Where(fmo => fmo.IsRequired && !fmo.IsCalculated
        && !fmo.FieldDefinition.Equals(MMPConstants.FieldDefinitions.FieldValue)
        && (implicitParents || anyParentMappings
            || fmo.ContainerType == collectionHelper.SelectedOption.ContainerType)
        && !collectionHelper.FieldMappingHelpers
            .Any(fmh => fmh.SelectedOption.Equals(fmo)))
    .Select(fmo =>
        $"The MMP column {fmo.Label} is required and therefore" +
        $" must be mapped to a {session.ImportSource.CollectionLabel} column.")
    .ToList(); // materialize once; the results are used twice below

var requiredMissing = missingFieldMappingOptions.Any();
session.ErrorMessages.AddRange(missingFieldMappingOptions);
However, even LINQ can't make the filter clauses in the .Where disappear, so the LINQ Select is hardly more readable than the foreach loop, and isn't really any more performant either (there may be some marginal benefit to setting the requiredMissing flag and adding to session.ErrorMessages in one bulk chunk).
Performance
From a performance perspective, the below is problematic: combined with the outer foreach it is O(N×M), i.e. quadratic in the worst case (fortunately .Any() returns early if a match is found, which helps on average):
if (!collectionHelper
.FieldMappingHelpers.Any(fmh => fmh.SelectedOption.Equals(fieldMappingOption)))
Does FieldMappingOption have a unique key? If so, then I suggest adding a Dictionary<Key, FieldMappingOption> to collectionHelper and then using .ContainsKey(key), which approaches O(1), e.g.
!collectionHelper
.SelectedFieldMappingOptions.ContainsKey(fieldMappingOption.SomeKey)
Even if there isn't a unique key, you could use a decent HashCode on FieldMappingOption and key by that to get a similar effect, although you'll need to consider what happens in the event of a hash collision.
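A hypothetical sketch of building such a lookup (the Id key and the variable names are assumptions; substitute whatever unique key FieldMappingOption actually has):
// Build once; assumes each option is selected by at most one helper,
// since ToDictionary throws on duplicate keys.
var selectedOptionsByKey = collectionHelper.FieldMappingHelpers
    .ToDictionary(fmh => fmh.SelectedOption.Id, fmh => fmh.SelectedOption);

// The O(N) .Any() check then becomes an O(1) probe:
bool alreadyMapped = selectedOptionsByKey.ContainsKey(fieldMappingOption.Id);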
Readability
The Where predicate in the outer for loop is arguably messy and could use some refactoring (for readability, if not for performance).
IMO most of the where clauses could be moved into FieldMappingOption as a meta property, e.g. wrap up
fmo.IsRequired
&& !fmo.IsCalculated
&& !fmo.FieldDefinition.Equals(MMPConstants.FieldDefinitions.FieldValue)
into a property, e.g. fmo.MustBeValidated etc.
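For instance, a sketch of such a property on FieldMappingOption (the MustBeValidated name is just a suggestion):
// Wraps the three filter clauses from the outer Where.
public bool MustBeValidated =>
    IsRequired
    && !IsCalculated
    && !FieldDefinition.Equals(MMPConstants.FieldDefinitions.FieldValue);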
You could squeeze out minor performance gains by ensuring the predicate returns false as soon as possible, i.e. by placing the && clauses most likely to fail first, but I wouldn't do so if it impacts the readability of the code.
I have a HashSet of ID numbers, stored as integers:
HashSet<int> IDList; // Assume that this is created with a new statement in the constructor.
I have a SortedList of objects, indexed by the integers found in the HashSet:
SortedList<int,myClass> masterListOfMyClass;
I want to use the HashSet to create a List<myClass> that is a subset of masterListOfMyClass.
After wasting all day trying to figure out the LINQ query, I eventually gave up and wrote the following, which works:
public List<myClass> SubSet {
    get {
        List<myClass> xList = new List<myClass>();
        foreach (int x in IDList) {
            if (masterListOfMyClass.ContainsKey(x)) {
                xList.Add(masterListOfMyClass[x]);
            }
        }
        return xList;
    }
    private set { }
}
So, I have two questions here:
What is the appropriate LINQ query? I'm finding LINQ extremely frustrating to figure out. Just when I think I've got it, it turns around and "goes on strike".
Is a LINQ query any better -- or worse -- than what I have written here?
var xList = IDList
.Where(masterListOfMyClass.ContainsKey)
.Select(x => masterListOfMyClass[x])
.ToList();
If both of your lists have equally large numbers of items, you may wish to consider inverting the query (i.e. iterating through masterListOfMyClass and querying IDList), since a HashSet is faster for random lookups.
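A sketch of the inverted form, enumerating the SortedList's key/value pairs and probing the HashSet (whose Contains is O(1)):
var xList = masterListOfMyClass
    .Where(kvp => IDList.Contains(kvp.Key))
    .Select(kvp => kvp.Value)
    .ToList();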
Edit:
It's less neat, but you could save a lookup into masterListOfMyClass with the following query, which would be a bit faster:
var xList = IDList
.Select(x => { myClass y; masterListOfMyClass.TryGetValue(x, out y); return y; })
.Where(x => x != null)
.ToList();
foreach (int x in IDList.Where(x => masterListOfMyClass.ContainsKey(x)))
{
xList.Add(masterListOfMyClass[x]);
}
This is the appropriate LINQ query for your loop.
In my view, a LINQ query will not be any more effective here.
Here is the LINQ expression:
List<myClass> xList = masterListOfMyClass
.Where(x => IDList.Contains(x.Key))
.Select(x => x.Value).ToList();
There is no big difference in performance with such a small example; LINQ is slower in general, since it uses iteration under the hood too. What you get with LINQ is, IMHO, clearer code, plus execution that is deferred until it is needed. Not in my example, though, since I call .ToList().
Another option would be the following (which is intentionally the same as Sankarann's first answer):
return (
from x in IDList
where masterListOfMyClass.ContainsKey(x)
select masterListOfMyClass[x]
).ToList();
However, are you sure you want a List to be returned? Usually, when working with IEnumerable<> you should chain your calls using IEnumerable<> until the point where you actually need the data. There you can decide to e.g. loop once (use the iterator) or actually pull the data into some sort of cache using the ToList(), ToArray() etc. methods.
Also, exposing a List<> to the public implies that modifying this list has an impact on the calling class. I would leave it to the user of the property to decide whether to make a local copy or continue using the IEnumerable<>.
Second, as your private setter is empty, setting SubSet has no effect. This again is confusing, and I would avoid it.
An alternate (and maybe less confusing) declaration of your property might look like this:
public IEnumerable<myClass> SubSet {
    get {
        return from x in IDList
               where masterListOfMyClass.ContainsKey(x)
               select masterListOfMyClass[x];
    }
}
Assume we have a list of objects (to keep it clear, no properties etc. are used):
public class SomeObject
{
    public bool IsValid;
    public int Height;
}
List<SomeObject> objects = new List<SomeObject>();
Now I want to get the one object from the list which is both valid and has the lowest height.
Classically, I would have used something like:
SomeObject temp = null;
foreach (SomeObject so in objects)
{
    if (so.IsValid)
    {
        if (null == temp)
            temp = so;
        else if (temp.Height > so.Height)
            temp = so;
    }
}
return temp;
I was thinking that this could be done more clearly with LINQ.
The first approach that came to my mind was:
List<SomeObject> sos = objects.Where(obj => obj.IsValid).ToList();
if (sos.Count > 0)
{
    return sos.OrderBy(obj => obj.Height).FirstOrDefault();
}
But then I was thinking: in the foreach approach I am going through the list once. With LINQ I would go through the list once for filtering and once more for ordering, even though I do not need the list completely ordered.
Would something like
return objects.OrderBy(obj => obj.Height).FirstOrDefault(o => o.IsValid);
also go twice through the list?
Can this be somehow optimized, so that the LINQ also only needs to run through the list once?
You can use GroupBy:
IEnumerable<SomeObject> validLowestHeights = objects
    .Where(o => o.IsValid)
    .GroupBy(o => o.Height)
    .OrderBy(g => g.Key)
    .First();
This group contains all valid objects with the lowest height.
The most efficient way to do this with LINQ is as follows:
var result = objects.Aggregate(
    default(SomeObject),                        // seed: no candidate yet (null)
    (acc, current) =>
        !current.IsValid ? acc :                // skip invalid items
        acc == null ? current :                 // first valid item becomes the candidate
        current.Height < acc.Height ? current : // keep the lower height
        acc);
This will loop over the collection only once.
However, you said "I was thinking that it can be done more clearly with LinQ." Whether this is more clear or not, I leave that up to you to decide.
You can try this one:
return (from _Object in Objects where _Object.IsValid orderby _Object.Height select _Object).FirstOrDefault();
or
return _Objects.Where(_Object => _Object.IsValid).OrderBy(_Object => _Object.Height).FirstOrDefault();
Would something like
return objects.OrderBy(obj => obj.Height).FirstOrDefault(o => o.IsValid);
also go twice through the list?
Only in the worst case scenario, where the first valid object is the last in order of obj.Height (or there is none to be found). Iterating the collection using FirstOrDefault will stop as soon as a valid element is found.
Can this be somehow optimized, so that the LINQ also only needs to run
through the list once?
I'm afraid you'd have to write your own extension method. Considering what I've written above, though, I'd consider it pretty well optimized as it is.
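(That said, on .NET 6 or newer, the built-in Enumerable.MinBy does this in a single pass, so no custom extension method is needed; note that this API is newer than this answer:)
// Single pass; returns null when no valid object exists.
return objects.Where(o => o.IsValid).MinBy(o => o.Height);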
**UPDATE**
Actually, the following would be a bit faster, as we'd avoid sorting invalid items:
return objects.Where(o => o.IsValid).OrderBy(o => o.Height).FirstOrDefault();
I am currently trying to optimize a .NET application with the help of the VS profiling tools.
One function, which gets called quite often, contains the following code:
if (someObjectContext.someObjectSet
        .Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
        .OrderByDescending(i => i.Position)
        .Select(i => i.Position)
        .Count() == 0)
{
    lastPosition = 0;
}
else
{
    lastPosition = someObjectContext.someObjectSet
        .Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
        .OrderByDescending(i => i.Position)
        .Select(i => i.Position)
        .Cast<int>()
        .First();
}
Which I changed to something like this:
var relevantEntities = someObjectContext.someObjectSet
    .Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
    .OrderByDescending(i => i.Position)
    .Select(i => i.Position);

if (relevantEntities.Count() == 0)
{
    lastPosition = 0;
}
else
{
    lastPosition = relevantEntities.Cast<int>().First();
}
I was hoping that the change would speed up the method a bit, as I was unsure whether the compiler would notice that the query is done twice and cache the results.
To my surprise, the execution time (the number of inclusive samples) of the method has not decreased, but actually increased by 9% (according to the profiler).
Can someone explain why this is happening?
I was hoping that the change would speed up the method a bit, as I was unsure whether the compiler would notice that the query is done twice and cache the results.
It will not. In fact it cannot. The database might not return the same results for the two queries. It's entirely possible for a result to be added or removed after the first query and before the second. (Making this code not only inefficient, but potentially broken if that were to happen.) Since it's entirely possible that you want two queries to be executed, knowing that the results could differ, it's important that the results of the query not be re-used.
The important point here is the idea of deferred execution. relevantEntities is not the results of a query, it's the query itself. It's not until the IQueryable is iterated (by a method such as Count, First, a foreach loop, etc.) that the database will be queried, and each time you iterate the query it will perform another query against the database.
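If you really did want to reuse a single set of results, you would have to materialize the query yourself; here is a sketch using the names from the question (note this changes the behavior to a single snapshot):
// Executes the database query exactly once and caches the results in memory.
var positions = relevantEntities.Cast<int>().ToList();
lastPosition = positions.Count == 0 ? 0 : positions.First();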
In your case you can just do this:
var lastPosition = someObjectContext.someObjectSet
.Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
.OrderByDescending(i => i.Position)
.Select(i => i.Position)
.Cast<int>()
.FirstOrDefault();
This leverages the fact that the default value of an int is 0, which is what you were setting the value to when there was no match before.
Note that this query is functionally the same as yours; it just avoids executing it twice. An even better query would be the one suggested by lazyberezovsky, in which you leverage Max rather than ordering and taking the first. If there is an index on that column there wouldn't be much of a difference, but if there's no index, ordering would be a lot more expensive.
You can use Max() to get the maximum position instead of ordering and taking the first item, and DefaultIfEmpty() to provide a default value (zero for int) if there are no entities matching your condition. By the way, you can also provide a custom default value to return if the sequence is empty (see the example below the query).
lastPosition = someObjectContext.someObjectSet
.Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
.Select(i => i.Position)
.Cast<int>()
.DefaultIfEmpty()
.Max();
This way you avoid executing two queries: one to determine whether there are any positions, and another to get the latest position.
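For example, to get -1 instead of 0 when no entity matches, pass the desired default to DefaultIfEmpty:
lastPosition = someObjectContext.someObjectSet
    .Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
    .Select(i => i.Position)
    .Cast<int>()
    .DefaultIfEmpty(-1) // returned when no entity matches
    .Max();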
Here's the C# code that I have:
private double get806Fees(Loan loan)
{
    // Initialize the collection so it can be added to.
    Loan.Fee.Items class806 = new Loan.Fee.Items();
    foreach (Loan.Fee.Item currentFee in loan.Item.Fees)
    {
        if (currentFee.Classification == 806) class806.Add(currentFee);
    }
    // then down here I will return the sum of all items in class806
}
Can I do this using LINQ? If so, how? I have never used LINQ, and I've read in several places that using LINQ instead of a foreach loop is faster... is this true?
Similar to some existing answers, but doing the projection in the query, to make the Sum call a lot simpler:
var sum = (from fee in loan.Items.Fees
where fee.Classification == 806
select fee.SomeValueToSum).Sum();
loan.Item.Fees
    .Where(x => x.Classification == 806)
    .Sum(x => x.SomeValueProperty)
Whether it is faster or not is debatable. IMO both have the same complexity; the non-LINQ version may be marginally faster.
var q =
from currentFee in loan.Item.Fees
where currentFee.Classification == 806
select currentFee;
var sum = q.Sum(currentFee => currentFee.Fee);
private double get806Fees(Loan loan)
{
    return loan.Item.Fees
        .Where(f => f.Classification == 806)
        .Sum(f => f.ValueToCalculateSum);
}
I'm assuming here that ValueToCalculateSum is also a double. If it's not then you have to convert it before it is returned.
All of the answers so far assume that you're summing up loan.Item.Fees directly. But the code you actually posted calls Add() to put each Loan.Fee.Item from loan.Item.Fees into a separate Loan.Fee.Items object, and it's that Items object (and not loan.Item.Fees, which is also an Items collection) that you say you want to sum up.
Now, if Items is just a simple collection class, then there's no need to do anything other than what people are suggesting here. But if there's some side-effect of the Add method that we don't know about (or, worse, that you don't know about), simply summing up a filtered list of Item objects might not give you the results you're looking for.
You could still use LINQ:
foreach (Loan.Fee.Item currentFee in loan.Item.Fees.Where(x => x.Classification == 806))
{
    class806.Add(currentFee);
}
return class806.Sum(x => x.Fee);
I'll confess that I'm a little perplexed by the class hierarchy implied here, though, in which the Loan.Item.Fees property is a collection of Loan.Fee.Item objects. I don't know if what I'm seeing is a namespace hierarchy that conflicts with a class hierarchy, or if you're using nested classes, or what. I know I don't like it.