Stacking where conditions with Lambda expressions - c#

We generally add multiple conditions to a Where expression by separating them with && (or ||).
If I stack multiple Where calls instead, is there any difference in performance?
For example:
Is this line
dbContext.Students.Where(s => s.Section == 5 && s.Marks > 50).ToList();
similar to
dbContext.Students.Where(s => s.Section == 5).Where(s => s.Marks > 50).ToList();
Note: The second line is possible because Where returns an IQueryable, which in turn has a Where method.

Your statement hits the database only when .ToList() is called, so what you do before that makes little difference in practice.
Strictly speaking, there may still be some difference in the translation process that happens behind the scenes. You can find out through an experiment, as #sujith karivelil suggests, or by some deep reading.
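A minimal sketch of such an experiment, run against an in-memory list rather than a real dbContext (the Students data and the Source() wrapper are hypothetical stand-ins, and this is LINQ to Objects, so it says nothing about the SQL that EF generates). It checks that both forms return the same rows and counts how many elements each form reads from the source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for dbContext.Students (LINQ to Objects, not EF).
var students = new List<(int Section, int Marks)>
{
    (5, 80), (5, 40), (4, 90), (5, 55), (3, 20)
};

int sourceReads = 0;

// Wraps the list so that every element handed to a query is counted.
IEnumerable<(int Section, int Marks)> Source()
{
    foreach (var s in students)
    {
        sourceReads++;
        yield return s;
    }
}

// One Where with a combined condition.
var combined = Source().Where(s => s.Section == 5 && s.Marks > 50).ToList();
int readsCombined = sourceReads;

// Two stacked Where calls.
sourceReads = 0;
var chained = Source().Where(s => s.Section == 5).Where(s => s.Marks > 50).ToList();
int readsChained = sourceReads;

Console.WriteLine(combined.SequenceEqual(chained));           // True
Console.WriteLine($"{readsCombined} vs {readsChained} reads"); // 5 vs 5 reads
```

In this sketch both forms read each source element exactly once; nothing runs until ToList() is called.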

I think that using '&&' and '||' in a single Where clause means one enumeration over the collection with one predicate call per element. With multiple Where clauses on an in-memory collection the source is still enumerated only once, because execution is deferred and the filters are chained, but each extra clause adds another delegate call for the elements that passed the previous one.

I would suggest using the "&&" operator, because it applies all the conditions in a single filter.
Multiple Where calls add one filter step per call when the query runs in memory, which can cost a little. For an IQueryable translated to SQL, the conditions end up in one WHERE clause either way.

Related

Does the performance change when I switch Distinct() and OrderBy() in a LINQ Query? [closed]

I am just considering which provides me the best performance when I use both OrderBy() and Distinct() inside a LINQ Query. It seems to me they're both equal in speed as the Distinct() method will use a hash table while in-memory and I assume that any SQL query would be optimized first by .NET before it gets executed.
Am I correct in assuming this or does the order of these two commands still affect the performance of LINQ in general?
As for how it would work... When you build a LINQ query, you're basically building an expression tree but nothing gets executed yet. So calling MyList.Distinct().OrderBy() would just make this tree, yet won't execute it. (It's deferred.) Only when you call another function like ToList() would the expression tree get executed and the runtime could optimize the expression tree before it gets executed.
For LINQ to Objects, even if we assume that OrderBy(...).Distinct() and Distinct().OrderBy(...) return the same result (which is not guaranteed), the performance will depend on the data.
If the data contains a lot of duplicates, running Distinct first should be faster. The following benchmark shows that (at least on my machine):
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class LinqBench
{
    // 1,000 values: the sequence 1..100 repeated 10 times.
    private static List<int> test = Enumerable.Range(1, 100)
        .SelectMany(i => Enumerable.Repeat(i, 10))
        .Select((i, index) => (i, index))
        .OrderBy(t => t.index % 10)
        .Select(t => t.i)
        .ToList();

    [Benchmark]
    public List<int> OrderByThenDistinct() => test.OrderBy(i => i).Distinct().ToList();

    [Benchmark]
    public List<int> DistinctThenOrderBy() => test.Distinct().OrderBy(i => i).ToList();
}
On my machine, for .NET Core 3.1, it gives:

|              Method |      Mean |    Error |   StdDev |
|-------------------- |----------:|---------:|---------:|
| OrderByThenDistinct | 129.74 us | 2.120 us | 1.879 us |
| DistinctThenOrderBy |  19.58 us | 0.384 us | 0.794 us |
First, seq.OrderBy(...).Distinct() and seq.Distinct().OrderBy(...) are not guaranteed to return the same result, because Distinct() may return an unordered enumeration. MS implementation conveniently preserves the order, but if you pass a LINQ query to the database, the results may come back in any order the DB engine sees fit.
Second, in the extreme case when you have lots of duplication (say, five values repeated randomly 1,000,000 times) you would be better off doing a Distinct before OrderBy().
Long story short, if you want your results to be ordered, use Distinct().OrderBy(...) regardless of the performance.
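To make the duplication argument concrete, here is a small sketch for LINQ to Objects that counts key comparisons instead of measuring time. The data set (five values spread over 1,000 elements) and the counting comparer are assumptions made for illustration. The exact counts depend on the runtime's sort implementation, so none are quoted here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: five distinct values, 1,000 elements in total.
var test = Enumerable.Range(0, 1000).Select(i => i % 5 + 1).ToList();

int comparisons = 0;
// Comparer that counts how often the sort compares two keys.
var counting = Comparer<int>.Create((a, b) => { comparisons++; return a.CompareTo(b); });

// Sort all 1,000 elements, then drop duplicates.
var orderFirst = test.OrderBy(i => i, counting).Distinct().ToList();
int comparisonsOrderFirst = comparisons;

// Drop duplicates first, then sort the five survivors.
comparisons = 0;
var distinctFirst = test.Distinct().OrderBy(i => i, counting).ToList();
int comparisonsDistinctFirst = comparisons;

Console.WriteLine($"OrderBy first:  {comparisonsOrderFirst} comparisons");
Console.WriteLine($"Distinct first: {comparisonsDistinctFirst} comparisons");
```

Both lists come out equal here only because the in-memory Distinct happens to preserve order, as noted above.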
"I assume that any SQL query would be optimized first by .NET before it gets executed."
And how do you think that would work, given that:
Only the SQL-executing side (the server) has the knowledge for this (e.g. which indices to use) AND has a query optimizer that is supposed to optimize the executed query based on the statistics of the table.
You have to be VERY sure that you do not change the result in any way.
Sorry, this makes no sense. There are pretty much no optimizations that you CAN safely do in C# without having all the internal details of the database, so the query is sent to the database for analysis.
As such, an OrderBy or a Distinct (ESPECIALLY a Distinct) WILL impact performance. How much depends on, e.g., whether the OrderBy can rely on an index.
"or does the order of these two commands still affect the performance of LINQ in general?"
Here it gets funny (and you give no example).
DISTINCT and ORDER BY appear in SQL in a specific order, regardless of how you formulated the query in LINQ. There is only ONE allowed syntax as per the SQL definition. LINQ puts the query together and optimizes that out. If you look at the syntax, there is a specific place for the DISTINCT (which is a SQL term, at least in SQL Server) and for the ORDER BY.
On the other side...
.Distinct().OrderBy() and .OrderBy().Distinct()
have DIFFERENT RESULTS. They CAN be done in SQL (you can use the output of the Distinct as a virtual table that you then order), but they have a different semantic. Unless you think that LINQ will magically read your mind, there is no context for the compiler other than to assume you are competent in writing what you do (as long as it is legal) and execute these steps in the order you gave.
Except: the DOCUMENTATION for Distinct in Queryable is clear that this is not done:
https://learn.microsoft.com/en-us/dotnet/api/system.linq.queryable.distinct?redirectedfrom=MSDN&view=net-5.0#System_Linq_Queryable_Distinct__1_System_Linq_IQueryable___0__
says that Distinct returns an unordered sequence.
So, there is a fundamental difference and they are not the same.

What is the correct order to use LINQ statements?

I often use LINQ statements to query with EF, to filter data, or to search my data collections, but I have always been unsure which statement to write first.
Let's say we have a query similar to this:
var result = Data.Where(x => x.Text.StartsWith("ABC")).OrderBy(x => x.Id).Select(x => x.Text).Take(5).ToList();
The same query works even if the statements are in different order, for example:
var result = Data.OrderBy(x => x.Id).Select(x => x.Text).Where(x => x.Text.StartsWith("ABC")).Take(5).ToList();
I understand that there are certain statements that do modify the expected result, but my doubt is with those that do not modify, as in the previous example. Does a specified order or any good practice guide exist for this?
It will give you different results. Let's assume that you have following ids:
6,5,4,3,2,1
The first statement will give you
1,2,3,4,5
and the second one
2,3,4,5,6
I assumed that all objects with following ids start with ABC
Edit: I think I haven't answered the question properly. Yes, there is a difference in performance as well. In the first example you sort only the elements that pass the filter, whereas in the second one you sort all elements, which is slower.
Does a specified order or any good practice guide exist for this?
No, because the order determines what the result is. In SQL (a declarative language), SELECT always comes before WHERE, which comes before GROUP BY, etc., and the parsing engine turns that into an execution plan which will execute in whatever order the optimizer thinks is best.
So selecting, then ordering, then grouping all happens on the data specified by the FROM clause(s), so order does not matter.
C# (within methods) is a procedural language, meaning that statements will be executed in the exact order that you provide them.
When you select, then order, the ordering applies to the selection, meaning that if you select a subset of fields (or project to different fields), the ordering applies to the projection. If you order, then select, the ordering applies to the original data, then the projection applies to the ordered data.
In your second edited example, the query seems to be broken because you are specifying properties that would be lost from the projection:
var result = Data.OrderBy(x => x.Id).Select(x => x.Text).Where(x => x.Text.StartsWith("ABC")).Take(5).ToList();
                                                        ^
At this (^) point, you are projecting just the Text property, which I'm assuming is a string, and thus the subsequent Where is working on a collection of strings, which would not have a Text property to filter on.
Certainly you could change the Where to filter the strings directly, but it illustrates that shifting the order of commands can have a catastrophic impact on the query. It might not make a difference, as you are trying to illustrate, for example, ordering then filtering should be logically equivalent to filtering then ordering (assuming that one doesn't impact the other), and there's no "best practice" to say which should go first, so the right answer (if there is one) would be determined on a case-by-case basis.
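The following sketch illustrates both points with LINQ to Objects. The data (ids 6 down to 1, every Text starting with "ABC") is a hypothetical stand-in for the question's Data collection. It shows that moving Take relative to OrderBy changes the result, and how the Where can be written once the projection has reduced each element to a string:

```csharp
using System;
using System.Linq;

// Hypothetical data: ids 6, 5, 4, 3, 2, 1; every Text starts with "ABC".
var data = Enumerable.Range(1, 6).Reverse()
    .Select(id => (Id: id, Text: "ABC" + id))
    .ToList();

// Order, then take: the five smallest ids.
var orderThenTake = data.OrderBy(x => x.Id).Take(5).Select(x => x.Id).ToList();

// Take, then order: the first five in source order, sorted afterwards.
var takeThenOrder = data.Take(5).OrderBy(x => x.Id).Select(x => x.Id).ToList();

// After Select(x => x.Text) each element is a string, so the filter tests the string itself.
var projectedThenFiltered = data.OrderBy(x => x.Id)
    .Select(x => x.Text)
    .Where(text => text.StartsWith("ABC"))
    .Take(5)
    .ToList();

Console.WriteLine(string.Join(",", orderThenTake));         // 1,2,3,4,5
Console.WriteLine(string.Join(",", takeThenOrder));         // 2,3,4,5,6
Console.WriteLine(string.Join(",", projectedThenFiltered)); // ABC1,ABC2,ABC3,ABC4,ABC5
```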

Does the order of EF linq query clauses influence performance?

Should I worry about the order of Entity Framework LINQ query clauses in terms of performance?
In the example below, could changing the order of the two where clauses have a performance impact on the DB lookup?
using (var context = new ModelContext())
{
    var fetchedImages = (from images in context.Images.Include("ImageSource")
                         where images.Type.Equals("jpg")
                         where images.ImageSource.Id == 5
                         select images).ToArray();
}
No, changing the order of these two where clauses will not affect performance.
Generated SQL will look like this anyway:
WHERE [condition1] AND [condition2]
Besides, you can write conditions, combined with logical operators:
where images.Type.Equals("jpg") && images.ImageSource.Id == 5
The general case is that the where operators will "short circuit", so if the first is false, the second won't even be examined, so if one will fail more frequently, generally you want to check it first.
However, this breaks down if the more frequent failure is slow to process.
The only way to tell for sure is by profiling your code using appropriate tools.
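As a rough illustration of the short-circuit effect, the sketch below counts predicate calls for an in-memory sequence. It applies to LINQ to Objects only; for an EF query the database decides how to evaluate the WHERE clause. The predicates IsRare and IsCostly are hypothetical: the first rejects most values, the second stands in for slow work.

```csharp
using System;
using System.Linq;

var values = Enumerable.Range(1, 1000).ToList();

int rareCalls = 0, costlyCalls = 0;

// Hypothetical predicates: IsRare rejects 99% of values, IsCostly stands in for slow work.
bool IsRare(int v) { rareCalls++; return v % 100 == 0; }
bool IsCostly(int v) { costlyCalls++; return v > 500; }

// The selective check runs first, so the costly one sees only the 10 survivors.
var rareFirst = values.Where(v => IsRare(v)).Where(v => IsCostly(v)).ToList();
int costlyCallsRareFirst = costlyCalls;

// The costly check runs first, so it is called for every element.
rareCalls = 0; costlyCalls = 0;
var costlyFirst = values.Where(v => IsCostly(v)).Where(v => IsRare(v)).ToList();
int costlyCallsCostlyFirst = costlyCalls;

Console.WriteLine(rareFirst.SequenceEqual(costlyFirst));                // True
Console.WriteLine($"{costlyCallsRareFirst} vs {costlyCallsCostlyFirst}"); // 10 vs 1000
```

Call counts are only a proxy; profiling remains the way to find out whether the difference matters.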

C# Linq - Delayed Execution

If I build a query say:
(the query is built using the XDocument class from System.Xml.Linq)
var elements = from e in calendarDocument.Root.Elements("elementName") select e;
and then I call elements.Last() SEVERAL times. Will each call return the most up to date Last() element?
For example if I do
elements.Last().AddAfterSelf(new XElement("elementName", "someValue1"));
elements.Last().AddAfterSelf(new XElement("elementName", "someValue2"));
elements.Last().AddAfterSelf(new XElement("elementName", "someValue3"));
elements.Last().AddAfterSelf(new XElement("elementName", "someValue4"));
Is it actually getting the latest element each time and adding a new one to the end, or is elements.Last() the same element each time?
Yes, LINQ queries are lazily evaluated. It's not until you call Last() that the query is executed. In this case it will get the most up-to-date last element each time.
I think that this actually is the first time I've seen a proper use of calling Last() (or any other operator that executes the query) several times. When people do it, it is usually by mistake causing bad performance.
I think that the linq-to-xml library is smart enough to get good performance for your query, but I wouldn't trust it without trying.
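A short sketch that tries this out, assuming a hypothetical one-element document in place of the real calendar file. If each Last() call re-runs the query against the current tree, the new elements should end up in insertion order:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical document; the element name mirrors the question.
var calendarDocument = XDocument.Parse("<root><elementName>start</elementName></root>");

var elements = from e in calendarDocument.Root.Elements("elementName") select e;

// Each Last() call re-evaluates the query, so it sees the element added just before.
elements.Last().AddAfterSelf(new XElement("elementName", "someValue1"));
elements.Last().AddAfterSelf(new XElement("elementName", "someValue2"));
elements.Last().AddAfterSelf(new XElement("elementName", "someValue3"));

var order = string.Join(",", elements.Select(e => e.Value));
Console.WriteLine(order); // start,someValue1,someValue2,someValue3
```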

NHibernate - Equivalent of CountDistinct projection using LINQ

I'm in the midst of trying to replace the Criteria queries I'm using for a multi-field search page with LINQ queries using the new LINQ provider. However, I'm running into a problem getting record counts so that I can implement paging. I'm trying to achieve a result equivalent to that produced by a CountDistinct projection from the Criteria API using LINQ. Is there a way to do this?
The Distinct() method provided by LINQ doesn't seem to behave the way I would expect, and appending ".Distinct().Count()" to the end of a LINQ query grouped by the field I want a distinct count of (an integer ID column) seems to return a non-distinct count of those values.
I can provide the code I'm using if needed, but since there are so many fields, it's pretty long, so I didn't want to crowd the post if it wasn't needed.
Thanks!
I figured out a way to do this, though it may not be optimal in all situations. Just doing a .Distinct() on the LINQ query does, in fact, produce a "distinct" in the resulting SQL query when used without .Count(). If I cause the query to be enumerated by using .Distinct().ToList() and then use the .Count() method on the resulting in-memory collection, I get the result I want.
This is not exactly equivalent to what I was originally doing with the Criteria query, since the counting is actually being done in the application code, and the entire list of IDs must be sent from the DB to the application. In my case, though, given the small number of distinct IDs, I think it will work, and won't be too much of a performance bottleneck.
I do hope, however, that a true CountDistinct() LINQ operation will be implemented in the future.
You could try selecting the column you want a distinct count of first. It would look something like: Select(p => p.id).Distinct().Count(). As it stands, you're distincting the entire object, which will compare the reference of the object and not the actual values.
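The difference between a distinct over whole rows and a distinct over one column can be sketched in memory. This is LINQ to Objects with hypothetical tuple rows, so it does not show what the NHibernate provider generates. Tuples compare by value rather than by reference, but the effect is the same in kind: rows that share an id still count as different rows.

```csharp
using System;
using System.Linq;

// Hypothetical rows: three records share id 1, two share id 2.
var rows = new[]
{
    (Id: 1, Field: "a"), (Id: 1, Field: "b"), (Id: 1, Field: "c"),
    (Id: 2, Field: "d"), (Id: 2, Field: "e")
};

// Distinct over whole rows: every row differs, so nothing is removed.
int distinctRows = rows.Distinct().Count();

// Project the id first, then count the distinct ids.
int distinctIds = rows.Select(p => p.Id).Distinct().Count();

Console.WriteLine($"{distinctRows} rows, {distinctIds} ids"); // 5 rows, 2 ids
```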
