Quick LINQ performance question.
I have a database with many many records and it's used for a webshop.
All query logic and paging is done with LINQ, and it performs quite well.
This is, because the usual search for products contains one or more where clause, and that shortens my result set to a couple of hundred results at max.
But.. there is an option to list all products (when no search criteria is provided), and that query is slow.. real slow. Even though i'm just asking for a single page with .Skip(20).Take(10), it's still slow because the total result is something like 140000 products. Is there a way to limit this (or all) query, so that the speed of the whole thing is kept okay?
I don't want to force my customers to provide one or more criteria.. but on the other hand i have no problem with telling them that they can never find more than 2000 products.
Thanks for helping!
Tys
Why don't you limit the number of records on the sql side as described in this post
http://www.sqlservercurry.com/2009/06/skip-and-take-n-number-of-records-in.html
Watch out for any "premature" enumerations when you pass down queries/results in your code!
There are also several LINQ visualizers available, which can help to see what the LINQ expressions actually translate to. Or you can play around with expressions in LINQPad before integrating in your codeā¦
What you can do is to have Linq use stored procedure from the database.
In that case, it will be faster because it is the database engine who will do the work and return it to Linq; the database engine is made for that, and it is closer to data than Linq.
I suggest you give it a try and give us feedback
You can check what indexes has the table and what PK is. It could be the table has no index at all so records compared by field values. Also you can catch the query in the SqlProfiler, run it separately and analyse its query plan.
Related
I want to execute a complex query using Entity Framework (Code First).
I know it can be done by LINQ, but what is the best way to write LINQ query for complex data and inner joining?
I want to execute the following query:
var v = from m in WebAppDbContext.UserSessionTokens
from c in WebAppDbContext.Companies.Include(a => a.SecurityGroups)
from n in WebAppDbContext.SecurityGroups.Include(x => x.Members)
where m.TokenString == userTokenString &&
n.Members.Contains(m.User) &&
c.SecurityGroups.Contains(n)
select c;
Is this the best way to do this?
Does this query get any entire list of records from the db and then executes the filtration on this list? (might be huge list)
And most important: Does it query the database several times?
In my opinion and based on my own experiences, talking about performance, especially joining data sets, it's faster when you write it in SQL. But since you used code first approach then it's not an option. To answer your questions, your query will not query DB several times (you can try debugging and see Events log in VS). EF will transform your query into SQL statement and execute it.
TL:DR; don't micromanage the robots. Let them do their thing and 99% of the time you'll be fine. Linq doesn't really expose methods to micromanage the underlying data query, anyway, its whole purpose is to be an abstraction layer.
In my experience, the Linq provider is pretty smart about converting valid Linq syntax into "pretty good" SQL. Your looks like your query is all inner joins, and those should all be on the primary indexes/foreign keys of each table, so it's going to come up with something pretty good.
If that's not enough to soothe your worries, I'd suggest:
put on a SQL Trace to see the actual query that's being presented to the Database. I bet it's not as simple as Companies join SecurityGroups join Members join UserTokens, but it's probably equivalent to it.
Only worry about optimizing if this becomes an actual performance problem.
If it does become a performance problem, consider refactoring your problem space. It looks like you're trying to get a Company record from a UserToken, by going through three other tables. Is it possible to just relate Users and Companies directly, rather than through the security model? (please excuse my total ignorance of your problem space, I'm just saying "maybe look at the problem from another angle?")
In short, it sounds like you're burning brain cycles attempting to optimize something prematurely. Now, it's reasonable to think if there is a performance problem, this section of the code could be responsible, but that would only be noticeable if you're doing this query a lot. Based on coming from a security token, I'd guess you're doing this once per session to get the current user's contextual information. In that case, the problem isn't usually with Linq, but with your approach to solving the problem for finding Company information. Can you cache the result?
I am fetching a list of products including their prices. I want to get just enable prices.
I wrote two type of queries:
context.Products.Include("Prices").Where(p=>p.Prices.Where(pr=>pr.Enable==true).Count()>0).ToList();
And the other one is:
context.Products.Include("Prices").ToList().RemoveAll(p => p.Prices.Where(pr => pr.Enable == true).ToList().Count == 0);
Which one is more optimized?
Assuming you are using an EntityFramework context, the first one is way better.
This is because Linq to SQL will translate the statement into an SQL statement. The Where statements will result in an according SQL Where. So only the necessary subset of the elements are retrieved.
The second statement retrieves all Products and Prices and then removes the unwanted elements.
This assumes that you have a remote database. If your database is running locally or you already have all Products and Prices in memory its not so easy to tell (you would have to use the profiler for that).
This kind of question really depends on a lot of things, so it is not so easy to say which is better.
But from the code, the first one is doing the where clause at sql side, where the second code is getting all the data out from sql and do the where in application.
so it will depend on the sql server, the application hardware and data amount.
I am currently fiddeling around with the AdventureWorks sample database and LinqPad, for scratching out some ideas.
This is the query in question:
SalesOrderHeaders.GroupBy (soh => new {soh.CustomerID, soh.BillToAddressID})
.Where(soh => soh.Skip(1).Any())
.Dump();
The idea was to find duplicates based on some criteria and then display them except the first set of data. The result should be deleted from the table.
After executing the query I get result A)
After executing the query again I get Result B)
I do not care about the correct results of the query, but about the ordering of the resultset. Only those two possibilities exist and they alternate on every run of the query.
Surely I could just order by Key, but I am more interested in why does this even occur? Why is the order chaning/alternating?
Sql server select query's result set order is not deterministic. It's just how sql server works and it's not a bug in neither linq or linqpad. The only way to get deterministic results on your query, as yourself noted, is to use an OrderBy clause.
Edit: About getting same results in SSMS if you run the query multiple times, see this. This post explains why you might get the same results if you execute a query multiple times and why you shouldn't rely on it.
Ordering is never deterministic as suggested earlier, but try and insert orderby clause in either your sql query or the Linq query, that's the only way to make it deterministic.
In fact let's look little deeper in the database for the reason. DB would be fetching all the data from the disk via i/o. data is stored in the internal sql server structures like pages, extents, segments (these are Oracle data blocks, I hope sql server have the similar stuff). Now when the query is fired, database would know all the different locations to fetch the data from, but this is not a serial operation, it is sort of parallel fetch, where different datasets are then combined to provide a user view. Now as we know in case of threads , it can never be deterministic that who goes first and who returns first, that's totally OS scheduling of the threads, hope that clarifies further little more.
OrderBy clause will work on fetched data to make in a particular order, so will always yield deterministic result.
I have the following query:
if (idUO > 0)
{
query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
query = query.Where(b => b.DependencyId == dependencyId );
}
else
{
var dependencyIds = dependencies.Select(d => d.Id).ToList();
query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket object, that has 23 properties, 5 of them are relationships (FKs) and i fill 3 of these relationships with lazy loading
When I access the first page, its OK, the query takes less than one 1 second, but when I try to access the page 30000, the query takes more than 20 seconds. There is a way in the linq query, that I can improve the performance of the query? Or only in the database level? And in the database level, for this kind of query, which is the best way to improve the performance?
There is no much space here, imo, to make things better (at least looking on the code provided).
When you're trying to achieve a good performance on such numbers, I would recommend do not use LINQ at all, or at list use it on the stuff with smaler data access.
What you can do here, is introduce paging of that data on DataBase level, with some stored procedure, and invoke it from your C# code.
1- Create a view in DB which orders items by date including all related relationships, like Products etc.
2- Create a stored procedure querying this view with related parameters.
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about Indexes that you should add.. This has had great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why;
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open sql profiler and view the random mess of sql it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on Compiled Queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an Object Query with explicit Joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred Loading can be a real performance killer.
I'm in the midst of trying to replace a the Criteria queries I'm using for a multi-field search page with LINQ queries using the new LINQ provider. However, I'm running into a problem getting record counts so that I can implement paging. I'm trying to achieve a result
equivalent to that produced by a CountDistinct projection from the Criteria API using LINQ. Is there a way to do this?
The Distinct() method provided by LINQ doesn't seem to behave the way I would expect, and appending ".Distinct().Count()" to the end of a LINQ query grouped by the field I want a distinct count of (an integer ID column) seems to return a non-distinct count of those values.
I can provide the code I'm using if needed, but since there are so many fields, it's
pretty long, so I didn't want to crowd the post if it wasn't needed.
Thanks!
I figured out a way to do this, though it may not be optimal in all situations. Just doing a .Distinct() on the LINQ query does, in fact, produce a "distinct" in the resulting SQL query when used without .Count(). If I cause the query to be enumerated by using .Distinct().ToList() and then use the .Count() method on the resulting in-memory collection, I get the result I want.
This is not exactly equivalent to what I was originally doing with the Criteria query, since the counting is actually being done in the application code, and the entire list of IDs must be sent from the DB to the application. In my case, though, given the small number of distinct IDs, I think it will work, and won't be too much of a performance bottleneck.
I do hope, however, that a true CountDistinct() LINQ operation will be implemented in the future.
You could try selecting the column you want a distinct count of first. It would look something like: Select(p => p.id).Distinct().Count(). As it stands, you're distincting the entire object, which will compare the reference of the object and not the actual values.