I want to execute a complex query using Entity Framework (Code First).
I know it can be done by LINQ, but what is the best way to write LINQ query for complex data and inner joining?
I want to execute the following query:
var v = from m in WebAppDbContext.UserSessionTokens
from c in WebAppDbContext.Companies.Include(a => a.SecurityGroups)
from n in WebAppDbContext.SecurityGroups.Include(x => x.Members)
where m.TokenString == userTokenString &&
n.Members.Contains(m.User) &&
c.SecurityGroups.Contains(n)
select c;
Is this the best way to do this?
Does this query get any entire list of records from the db and then executes the filtration on this list? (might be huge list)
And most important: Does it query the database several times?
In my opinion and based on my own experiences, talking about performance, especially joining data sets, it's faster when you write it in SQL. But since you used code first approach then it's not an option. To answer your questions, your query will not query DB several times (you can try debugging and see Events log in VS). EF will transform your query into SQL statement and execute it.
TL:DR; don't micromanage the robots. Let them do their thing and 99% of the time you'll be fine. Linq doesn't really expose methods to micromanage the underlying data query, anyway, its whole purpose is to be an abstraction layer.
In my experience, the Linq provider is pretty smart about converting valid Linq syntax into "pretty good" SQL. Your looks like your query is all inner joins, and those should all be on the primary indexes/foreign keys of each table, so it's going to come up with something pretty good.
If that's not enough to soothe your worries, I'd suggest:
put on a SQL Trace to see the actual query that's being presented to the Database. I bet it's not as simple as Companies join SecurityGroups join Members join UserTokens, but it's probably equivalent to it.
Only worry about optimizing if this becomes an actual performance problem.
If it does become a performance problem, consider refactoring your problem space. It looks like you're trying to get a Company record from a UserToken, by going through three other tables. Is it possible to just relate Users and Companies directly, rather than through the security model? (please excuse my total ignorance of your problem space, I'm just saying "maybe look at the problem from another angle?")
In short, it sounds like you're burning brain cycles attempting to optimize something prematurely. Now, it's reasonable to think if there is a performance problem, this section of the code could be responsible, but that would only be noticeable if you're doing this query a lot. Based on coming from a security token, I'd guess you're doing this once per session to get the current user's contextual information. In that case, the problem isn't usually with Linq, but with your approach to solving the problem for finding Company information. Can you cache the result?
Related
I am fetching a list of products including their prices. I want to get just enable prices.
I wrote two type of queries:
context.Products.Include("Prices").Where(p=>p.Prices.Where(pr=>pr.Enable==true).Count()>0).ToList();
And the other one is:
context.Products.Include("Prices").ToList().RemoveAll(p => p.Prices.Where(pr => pr.Enable == true).ToList().Count == 0);
Which one is more optimized?
Assuming you are using an EntityFramework context, the first one is way better.
This is because Linq to SQL will translate the statement into an SQL statement. The Where statements will result in an according SQL Where. So only the necessary subset of the elements are retrieved.
The second statement retrieves all Products and Prices and then removes the unwanted elements.
This assumes that you have a remote database. If your database is running locally or you already have all Products and Prices in memory its not so easy to tell (you would have to use the profiler for that).
This kind of question really depends on a lot of things, so it is not so easy to say which is better.
But from the code, the first one is doing the where clause at sql side, where the second code is getting all the data out from sql and do the where in application.
so it will depend on the sql server, the application hardware and data amount.
I have the following query:
if (idUO > 0)
{
query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
query = query.Where(b => b.DependencyId == dependencyId );
}
else
{
var dependencyIds = dependencies.Select(d => d.Id).ToList();
query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket object, that has 23 properties, 5 of them are relationships (FKs) and i fill 3 of these relationships with lazy loading
When I access the first page, its OK, the query takes less than one 1 second, but when I try to access the page 30000, the query takes more than 20 seconds. There is a way in the linq query, that I can improve the performance of the query? Or only in the database level? And in the database level, for this kind of query, which is the best way to improve the performance?
There is no much space here, imo, to make things better (at least looking on the code provided).
When you're trying to achieve a good performance on such numbers, I would recommend do not use LINQ at all, or at list use it on the stuff with smaler data access.
What you can do here, is introduce paging of that data on DataBase level, with some stored procedure, and invoke it from your C# code.
1- Create a view in DB which orders items by date including all related relationships, like Products etc.
2- Create a stored procedure querying this view with related parameters.
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about Indexes that you should add.. This has had great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why;
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open sql profiler and view the random mess of sql it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on Compiled Queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an Object Query with explicit Joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred Loading can be a real performance killer.
I'm fairly new to nHibernate having come from an EF background and I'm struggling with the following query :
_patientSearchResultModel = (from patient in _patientRepository.Query(patientSearch.BuildPatientSpecification())
join admission in _admissionRepository.Query(patientSearch.BuildAdmissionSpecification())
on patient.Id equals admission.Patient.Id
orderby admission.AdmissionDate
select new PatientSearchResultModel(patient.Id,
admission.Id,
false,
_phaseTypeMapper.GetPhaseTypeModel(admission.PhaseType),
patient.Last, patient.First,
admission.InPatientLocation,
admission.AdmissionDate,
admission.DischargeDate,
admission.RRI,
null,
admission.CompletionStatus,
admission.FollowupStatus)).ToList();
The intent of this query is to allow users to filter the two queries on parameters built up using the two Build???Specification functions and return the resultset. There could be many admission records and I would only like one PatientSearchResultModel per patient object, with the admission object being the newest one by Admission Date.
These objects are coming from nHibernate and it keeps return a Not Supported exception. There is also an association between Patient and Admissions thus : Patient.Admissions but i couldn't figure out how to then add the query filters return from the function Build???Specifications.
I'd be really grateful if someone could point me in the right direction; am I up against the Linq provider implementation here in nHibernate and need to move to Criteria or is it my Linq query ?
If anyone has any links or suggestions for good books or other learning materials in this area that would also be really helpful too.
I see several potential problems:
If you're using NHibernate 2.x + Linq2NHibernate explicit joins like that are not supported; in other versions they're just considered a smell.
I dont think NHibernate supports calling parameterized constructors in select clauses
I'm very sure NHibernate does not support calling instance methods in the select lambda
I'd suggest using the lambda syntax and SelectMany to alleviate potential join issues. Points #2 & #3 can be solved by projecting into an anonymous type, calling AsEnumerable then projecting into your model type.
Overall I'd suggest restructuring your code like:
var patientSpec = patientSearch.BuildPatientSpecification();
var admissionSpec = patientSearch.BuildAdmissionSpecification();
_patientSearchResultModel = _patientRepository.Where(patientSpec)
.SelectMany(p=>p.Admissions).Where(admissionSpec)
.Select(a=> new {
PatientId = a.Patient.Id,
AdminssionId = a.Id,
a.PhaseType,
a.Patient.Last,
a.Patient.First,
a.InPatientLocation,
a.AdmissionDate,
a.DischargeDate,
a.RRI,
a.CompletionStatus,
a.FollowupStatus
}).AsEnumerable()
.Select(x=> new PatientSearchResultModel(x.PatientId, x.AdmissionId ...))
.ToList();
Divide your query into parts and check which part runs and which doesn't.
My take on this is that select new ... is not supported in Linq to nHibernate.
I would recomend using something else, because it is simply too imature and feature-less to use seriously.
As with most popular LINQ-to-Database query providers, NHibernate will try to translate the whole query into a SQL statement to run against the database. This requires that all elements of your query are possible to express in the SQL flavour you're using.
In your query, the select new statement cannot be expressed in SQL, because you're making a call to the constructor of your PatientSearchResultModel class and are making a call to a GetPhaseTypeModel method.
You should restructure your query to express what you want to execute on the SQL database, then call AsEnumerable() to force the remainder of the query to be evaluated in-memory. After that call, you can call the constructor of your class and any .NET methods, and they will be executed as native code.
This query is too complex to describe it using Linq. It would give wrong result finally (if Patient has more than one admission records, result would have duplicate entries).
I see two steps for solution:
1) At development stage, use in-memory query. So, take Patients using ToList() first (query db at this moment). Some predicates (Patient filter like MRN, First, Last) could be used at this stage.
And then do search in-memory. Not performance, but working solution. Mark it for refactor to optimize later.
2) Finally, use NHibernate IQuery (ISQLQuery) and build sql query manually to make sure it would work as expected and work fast enough on SQL Server side. This is just read-only query and do not require Nhibernate query engine (Linq to Nhibernate) at all.
Quick LINQ performance question.
I have a database with many many records and it's used for a webshop.
All query logic and paging is done with LINQ, and it performs quite well.
This is, because the usual search for products contains one or more where clause, and that shortens my result set to a couple of hundred results at max.
But.. there is an option to list all products (when no search criteria is provided), and that query is slow.. real slow. Even though i'm just asking for a single page with .Skip(20).Take(10), it's still slow because the total result is something like 140000 products. Is there a way to limit this (or all) query, so that the speed of the whole thing is kept okay?
I don't want to force my customers to provide one or more criteria.. but on the other hand i have no problem with telling them that they can never find more than 2000 products.
Thanks for helping!
Tys
Why don't you limit the number of records on the sql side as described in this post
http://www.sqlservercurry.com/2009/06/skip-and-take-n-number-of-records-in.html
Watch out for any "premature" enumerations when you pass down queries/results in your code!
There are also several LINQ visualizers available, which can help to see what the LINQ expressions actually translate to. Or you can play around with expressions in LINQPad before integrating in your codeā¦
What you can do is to have Linq use stored procedure from the database.
In that case, it will be faster because it is the database engine who will do the work and return it to Linq; the database engine is made for that, and it is closer to data than Linq.
I suggest you give it a try and give us feedback
You can check what indexes has the table and what PK is. It could be the table has no index at all so records compared by field values. Also you can catch the query in the SqlProfiler, run it separately and analyse its query plan.
I have a table:
-- Tag
ID | Name
-----------
1 | c#
2 | linq
3 | entity-framework
I have a class that will have the following methods:
IEnumerable<Tag> GetAll();
IEnumerable<Tag> GetByName();
Should I use a compiled query in this case?
static readonly Func<Entities, IEnumerable<Tag>> AllTags =
CompiledQuery.Compile<Entities, IEnumerable<Tag>>
(
e => e.Tags
);
Then my GetByName method would be:
IEnumerable<Tag> GetByName(string name)
{
using (var db = new Entities())
{
return AllTags(db).Where(t => t.Name.Contains(name)).ToList();
}
}
Which generates a SELECT ID, Name FROM Tag and execute Where on the code. Or should I avoid CompiledQuery in this case?
Basically I want to know when I should use compiled queries. Also, on a website they are compiled only once for the entire application?
You should use a CompiledQuery when all of the following are true:
The query will be executed more than once, varying only by parameter values.
The query is complex enough that the cost of expression evaluation and view generation is "significant" (trial and error)
You are not using a LINQ feature like IEnumerable<T>.Contains() which won't work with CompiledQuery.
You have already simplified the query, which gives a bigger performance benefit, when possible.
You do not intend to further compose the query results (e.g., restrict or project), which has the effect of "decompiling" it.
CompiledQuery does its work the first time a query is executed. It gives no benefit for the first execution. Like any performance tuning, generally avoid it until you're sure you're fixing an actual performance hotspot.
2012 Update: EF 5 will do this automatically (see "Entity Framework 5: Controlling automatic query compilation") . So add "You're not using EF 5" to the above list.
Compiled queries save you time, which would be spent generating expression trees. If the query is used often and you'll save the compiled query, you should definitely use it. I had many cases when the query parsing took more time than the actual round trip to the database.
In your case, if you are sure that it would generate SELECT ID, Name FROM Tag without the WHERE case (which I doubt, as your AllQueries function should return IQueryable and the actual query should be made only after calling ToList) - you shouldn't use it.
As someone already mentioned, on bigger tables SELECT * FROM [someBigTable] would take very long and you'll spend even more time filtering that on the client side. So you should make sure that your filtering is made on the database side, no matter if you are using compiled queries or not.
compiled queries are more helpfull with linq queries with large expression trees say complex queries to gain performance over building expression tree again and again while reusing query. in your case i guess it will save a very little time.
Compiled queries are compiled when the application is compiled and every time you reuse a query often or it is complex you should definitely try compiled queries to make execution faster.
But I would not go for it on all queries as it is a little more code to write and for simple queries it might not be worthwhile.
But for maximum performance you should also evaluate Stored Procedures where you do all the processing on the database server, even if Linq tries to push as much of the work to the db as possible you will have situations where a stored procedure will be faster.
Compiled queries offer a performance improvement, but it's not huge. If you have complex queries, I'd rather go with a stored procedure or a view, if possible; letting the database do it's thing might be a better approach.