I'm in the midst of trying to replace the Criteria queries I'm using for a multi-field search page with LINQ queries using the new LINQ provider. However, I'm running into a problem getting record counts so that I can implement paging. I'm trying to achieve a result equivalent to that produced by a CountDistinct projection from the Criteria API. Is there a way to do this with LINQ?
The Distinct() method provided by LINQ doesn't behave the way I would expect: appending .Distinct().Count() to a LINQ query grouped by the field I want a distinct count of (an integer ID column) seems to return a non-distinct count of those values.
I can provide the code I'm using if needed, but since there are so many fields, it's
pretty long, so I didn't want to crowd the post if it wasn't needed.
Thanks!
I figured out a way to do this, though it may not be optimal in all situations. Just doing a .Distinct() on the LINQ query does, in fact, produce a "distinct" in the resulting SQL query when used without .Count(). If I cause the query to be enumerated by using .Distinct().ToList() and then use the .Count() method on the resulting in-memory collection, I get the result I want.
This is not exactly equivalent to what I was originally doing with the Criteria query, since the counting is actually being done in the application code, and the entire list of IDs must be sent from the DB to the application. In my case, though, given the small number of distinct IDs, I think it will work, and won't be too much of a performance bottleneck.
I do hope, however, that a true CountDistinct() LINQ operation will be implemented in the future.
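A minimal sketch of that workaround; the entity and property names here (session.Query<Order>(), CustomerId) are placeholders, not from my actual code:

// Hypothetical names; the shape is what matters.
// DISTINCT is emitted in the SQL; the count happens in application memory.
var distinctIds = session.Query<Order>()
    .Where(o => o.Status == "Open")   // ...plus the rest of the search criteria
    .Select(o => o.CustomerId)        // project to the ID column
    .Distinct()
    .ToList();                        // enumerates: distinct IDs come back from the DB
int total = distinctIds.Count;        // counted in application code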
You could try selecting the column you want a distinct count of first. It would look something like: Select(p => p.id).Distinct().Count(). As it stands, you're applying Distinct() to the entire object, which compares object references rather than the actual values.
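For example, against a hypothetical people source (whether the provider folds this into a single COUNT(DISTINCT id) or runs DISTINCT and COUNT separately depends on the provider):

// Distinct() on whole entities compares references/entity identity, not column values:
var wrong = people.Distinct().Count();
// Projecting to the ID first makes Distinct() compare the values themselves:
var right = people.Select(p => p.id).Distinct().Count();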
When I try to do this:
// data.Photos is IEnumerable<Photo>; the comparer compares by Id.
List<Photo> inDb = db.Photos
.Intersect(data.Photos, new PhotoComparer())
.ToList();
I get an exception:
NotSupportedException: Could not parse expression
'value(Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1[ReportViewer.Models.DbContexts.Photo]).Intersect(__p_0, __p_1)'
This overload of the method 'System.Linq.Queryable.Intersect' is currently not supported.
// This works
List<Photo> inDb = db.Photos
.ToList()
.Intersect(data.Photos, new PhotoComparer())
.ToList();
// But won't this take a long time?
What do I need to do to use Intersect with an IQueryable and an IEnumerable collection?
Because of the custom comparer, even though its functionality might be trivial, the framework is currently not able to translate your statement to SQL (which I suspect you are using).
Next, it seems that you have an in-memory collection on which you want to perform this intersect.
So if you're wondering about speed: to get it working, you'll need to send your keys to the database server and retrieve your data based on the Ids.
Essentially, you are looking for a way to perform an inner join, which is the SQL equivalent of the intersect.
You could do that with the following LINQ query:
// disclaimer: off the top of my head
var list = from dbPhoto in db.Photos
join dataPhoto in data.Photos on dbPhoto.Id equals dataPhoto.Id
select dbPhoto;
This will not work though, since as far as I know EF isn't able to perform a join against an in-memory dataset.
So, alternatively you could:
fetch the data as IEnumerable (but yes, you'll be retrieving the whole set first)
use a Contains, as sketched after this list; be careful though, if you're not using primitive types this can translate to a bunch of SQL OR statements
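Here is a sketch of the Contains option, using the names from the question (and assuming Id is an int); pulling the primitive keys out of the in-memory collection first lets EF translate Contains into a SQL IN clause:

// Extract the primitive keys from the in-memory collection first.
List<int> ids = data.Photos.Select(p => p.Id).ToList();

// EF translates Contains on a list of primitives to WHERE Id IN (...).
List<Photo> inDb = db.Photos
    .Where(p => ids.Contains(p.Id))
    .ToList();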
But basically it depends on the amount of data you're querying. You might want to reconsider your setup and try to make the data queryable based on some ownership, like a user or other means.
I often use LINQ statements to query with EF, to filter data, or to search my data collections, but I've always wondered about the order in which to write the statements.
Let's say we have a query similar to this:
var result = Data.Where(x => x.Text.StartsWith("ABC")).OrderBy(x => x.Id).Select(x => x.Text).Take(5).ToList();
The same query works even if the statements are in a different order, for example:
var result = Data.OrderBy(x => x.Id).Select(x => x.Text).Where(x => x.Text.StartsWith("ABC")).Take(5).ToList();
I understand that there are certain statements that do modify the expected result, but my doubt is about those that do not, as in the previous example. Does a specified order or any good practice guide exist for this?
It will give you different results. Let's assume that you have following ids:
6,5,4,3,2,1
The first statement will give you
1,2,3,4,5
and the second one
2,3,4,5,6
I assumed that the Text of all objects with these ids starts with "ABC".
Edit: I think I haven't answered the question properly. Yes, there is a difference: in the first example you only sort the elements that survive the filter, whereas in the second one you order all elements, which is definitely slower.
Does a specified order or any good practice guide exist for this?
No, because the order determines what the result is. In SQL (a declarative language), SELECT always comes before WHERE, which comes before GROUP BY, etc., and the parsing engine turns that into an execution plan which will execute in whatever order the optimizer thinks is best.
Selecting, ordering, and grouping all happen on the data specified by the FROM clause(s), so the textual order does not matter.
C# (within methods) is a procedural language, meaning that statements will be executed in the exact order that you provide them.
When you select, then order, the ordering applies to the selection: if you project to a subset of fields (or to different fields), the ordering applies to the projection. If you order, then select, the ordering applies to the original data, and the projection is applied to the ordered data.
In your second example, the query seems to be broken because you are referring to properties that were lost in the projection:
var result = Data.OrderBy(x => x.Id).Select(x => x.Text).Where(x => x.Text.StartsWith("ABC")).Take(5).ToList();
At the point of the Select(x => x.Text), you are projecting just the Text property, which I'm assuming is a string, so the subsequent Where is working on a collection of strings, which has no Text property to filter on.
Certainly you could change the Where to filter the strings directly, but this illustrates that shifting the order of operators can have a catastrophic impact on the query. In other cases it might not make a difference, as you are trying to illustrate: for example, ordering then filtering should be logically equivalent to filtering then ordering (assuming one doesn't impact the other), and there's no "best practice" to say which should go first, so the right answer (if there is one) has to be determined on a case-by-case basis.
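To illustrate with the question's Data source: a reordering that is logically equivalent, and the projection-first variant fixed so the filter works on the strings themselves:

// Order-then-filter: same result as filter-then-order here, though it may
// sort rows that the filter would have discarded anyway.
var a = Data.OrderBy(x => x.Id)
    .Where(x => x.Text.StartsWith("ABC"))
    .Select(x => x.Text)
    .Take(5)
    .ToList();

// After Select, later operators see strings, so the filter must change shape:
var b = Data.OrderBy(x => x.Id)
    .Select(x => x.Text)
    .Where(t => t.StartsWith("ABC"))
    .Take(5)
    .ToList();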
I store my data as binary-looking strings, e.g. "100010", and I want to check whether a string has the same value as another string, e.g. "100000", at corresponding positions.
So I try to use an intersection. In this condition, the result of the intersection would be "100000", which would tell me the item meets my requirement. But how can I use this concept in a LINQ-to-Entities query?
Here is my thought:
var chemicals = db.ChemicalItem.Where(c => c.CategoryNumber.ToCharArray().Intersect(catekey.ToCharArray()).Count()>0);
"CategoryNumber" is my data, and "catekey" is the string for comparing. Both of them are binary-look string(cantain 6 chars). And if the count is not 0,they have '1's in the same index. And I can get the correct query.
Sadly, It didn't work. I always get DbExpressionBinding Error. Can somone tell me What's Wrong? Thanks.
PS:I'm not good at English and post the question here first time, sorry for my bad expression and thank for your reading.
LINQ to Entities is trying to create a SQL query out of your condition, but is not able to do it for the expression you specified.
One way to "fix" the problem would be to do the filtering in code instead of in SQL, but this will impact performance, because all of the records will be retrieved to the client and filtered there. This is how you could do it (notice the added ToList()):
var chemicals = db.ChemicalItem.ToList().Where(c => c.CategoryNumber.ToCharArray().Intersect(catekey.ToCharArray()).Count()>0);
A suggested way would be to do the filtering in SQL, but in this case you will need to write an equivalent stored procedure that does the filtering and call it from your EF code. Even then the filtering will not be very efficient, because SQL will not be able to use any indexes and will always need to do a table scan.
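Another option, if you are free to change the schema (an assumption on my part, not something from the question): store the six flags as an integer instead of a string, because LINQ to Entities can translate bitwise AND on integers to SQL. This keeps the filtering in the database, and unlike the char-array Intersect it genuinely compares positions:

// Assumes a hypothetical integer column CategoryFlags holding the same six
// bits that the "100010"-style string encodes.
int keyFlags = Convert.ToInt32(catekey, 2);   // parse "100010" as base 2
var chemicals = db.ChemicalItem
    .Where(c => (c.CategoryFlags & keyFlags) != 0);   // translated to SQL '&'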
I need to write a query pulling distinct values from columns defined by a user for any given data set. There could be millions of rows so the statements must be as efficient as possible. Below is the code I have.
What is the order of this LINQ query? Is there a more efficient way of doing this?
var MyValues = from r in MyDataTable.AsEnumerable()
orderby r.Field<double>(_varName)
select r.Field<double>(_varName);
IEnumerable<double> result = MyValues.Distinct();
I can't speak much to the AsEnumerable() call or the field conversions, but for the LINQ side of things, the orderby is a stable quick sort and should be O(n log n). If I had to guess, everything but the orderby should be O(n), so overall you're still just O(n log n).
Update: the LINQ Distinct() call should also be O(n).
So altogether, the Big-O for this thing is still O(n log n), give or take constant factors.
Is there a more efficient way of doing this?
You could get better efficiency if you do the sort as part of the query that initializes MyDataTable, instead of sorting in memory afterwards.
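A sketch of that idea, assuming the DataTable is filled from SQL Server via a SqlDataAdapter (the table name and connection string are placeholders, and _varName must be validated first, since column names cannot be parameterized):

using System.Data;
using System.Data.SqlClient;

// _varName must come from a known-safe whitelist of column names;
// splicing a raw user value here would be open to SQL injection.
string sql = $"SELECT DISTINCT [{_varName}] FROM MyTable ORDER BY [{_varName}]";
var table = new DataTable();
using (var adapter = new SqlDataAdapter(sql, connectionString))
{
    adapter.Fill(table);   // DISTINCT and ORDER BY run in the DBMS, not in memory
}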
From the comments:
I actually use MyDistinct.Distinct()
If you want distinct _varName values and you cannot do it all in the SELECT query in the DBMS (which would be the most efficient way), you should use Distinct before OrderBy. The order matters here: otherwise you would order all million rows before you start to filter out the duplicates, whereas with Distinct first you only need to order what remains.
var values = from r in MyDataTable.AsEnumerable()
select r.Field<double>(_varName);
IEnumerable<double> orderedDistinctValues = values.Distinct()
.OrderBy(d => d);
I recently asked a related question, which Eric Lippert answered with a good explanation of when the order matters and when it does not:
Order of LINQ extension methods does not affect performance?
Here's a little demo where you can see that the order matters, but also that it does not matter much in practice, since comparing doubles is trivial for a CPU:
Time for first orderby then distinct: 00:00:00.0045379
Time for first distinct then orderby: 00:00:00.0013316
Your query above (LINQ) is fine if you want all of the million records and you have enough memory on a 64-bit OS.
If you look at the underlying command, the query would be translated to
Select <_varname> from MyDataTable order by <_varname>
and this is as good as it gets when run in a database IDE or on the command line.
To give you a short answer regarding performance:
Put in a where clause if you can (with columns that are indexed).
Ensure that the user can only choose columns (_varName) that are indexed. Imagine the DB trying to sort a million records on an unindexed column, which is evidently slow, but it is LINQ that would get the bad press.
Ensure that (if possible) initialisation of MyDataTable is done with only the records that are of value (again, based on a where clause).
Profile your underlying query.
If possible, create stored procs (debatable); you can create an entity model which includes stored procs as well.
It may be fast today, but as the tablespace grows, unindexed data is where things get slow, even with a good LINQ expression.
Hope this helps
Quick LINQ performance question.
I have a database with many many records and it's used for a webshop.
All query logic and paging is done with LINQ, and it performs quite well.
This is because the usual search for products contains one or more where clauses, which shortens my result set to a couple of hundred results at most.
But there is an option to list all products (when no search criteria are provided), and that query is slow, real slow. Even though I'm just asking for a single page with .Skip(20).Take(10), it's still slow because the total result is something like 140,000 products. Is there a way to limit this (or every) query so that the speed of the whole thing is kept okay?
I don't want to force my customers to provide one or more criteria, but on the other hand I have no problem telling them that they can never find more than 2000 products.
Thanks for helping!
Tys
Why don't you limit the number of records on the SQL side, as described in this post:
http://www.sqlservercurry.com/2009/06/skip-and-take-n-number-of-records-in.html
Watch out for any "premature" enumerations when you pass down queries/results in your code!
There are also several LINQ visualizers available, which can help you see what the LINQ expressions actually translate to. Or you can play around with expressions in LINQPad before integrating them into your code…
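For instance, here is what a premature enumeration looks like against a hypothetical Products set:

// Paging stays in SQL: the provider emits OFFSET/FETCH (or TOP/ROW_NUMBER).
var page = db.Products
    .OrderBy(p => p.Id)   // paging needs a deterministic order
    .Skip(20)
    .Take(10)
    .ToList();

// Anti-pattern: ToList() materializes all ~140,000 rows first,
// then Skip/Take page the in-memory list.
var slow = db.Products.ToList().Skip(20).Take(10).ToList();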
What you can do is have LINQ use a stored procedure from the database.
In that case it will be faster, because it is the database engine that does the work and returns the result to LINQ; the database engine is made for that, and it is closer to the data than LINQ.
I suggest you give it a try and give us feedback.
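A sketch of calling such a procedure via EF6's SqlQuery API (the procedure name GetProductsPage and its parameters are hypothetical):

using System.Data.SqlClient;

// Hypothetical stored procedure that does the paging on the server.
var page = db.Database.SqlQuery<Product>(
        "EXEC GetProductsPage @Skip, @Take",
        new SqlParameter("@Skip", 20),
        new SqlParameter("@Take", 10))
    .ToList();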
You can check which indexes the table has and what the PK is. It could be that the table has no index at all, so records are compared by field values. You can also capture the query in SQL Profiler, run it separately, and analyse its query plan.