How to boost performance EF Linq to Entities

How to boost performance EF Linq to Entities - c#

I use asp.net 4 c# and EF 4.
I'm profiling my application. I have this code which result expensive.
I woud like to know if you know a better way to write it. I need to speed it up.
string htmlhead = context.CmsOptions.SingleOrDefault(op => op.OptionId == 7).Value;
if (htmlhead != null)
uxHtmlHead.Text = htmlhead;
else
uxHtmlHead.Text = "No Html Head.";
Thanks
Useful article
http://weblogs.asp.net/zeeshanhirani/archive/2010/09/20/which-one-is-faster-singleordefault-or-firstordefault.aspx

Use FirstOrDefault() ... This will exit as soon as it finds a result.
On the other hand SingleOrDefault() searches the entire collection for a single result and throws exception if it finds more than one result for the query

There are some basic steps you can take when optimizing an Entity Framework query.
Here you can find a list of all the options.
The ones I would suggest in your case are:
NoTracking. You only retrieve the entity for getting a value. Not for changing it so you don't need change tracking
Precompile your query. This is an easy step and it really gives you a performance boost
Return the correct amount of data. You are selecting a whole entity just to retrieve it's value property. If you would only select the Value property less data will be returned from SQL Server (BTW, your code will throw a Null Exception if a CmsOption can't be found because of the First/Single -OrDefault part. If you really know there should be at least one entity remove the OrDefault part)
If you are using Visual Studio Premium you could put this code in a Unit Test and then profile the unit test. After each change you can compare the reports to make sure you are improving performance.

Related

Entity Framework Complex query with inner joins

I want to execute a complex query using Entity Framework (Code First).
I know it can be done by LINQ, but what is the best way to write LINQ query for complex data and inner joining?
I want to execute the following query:
var v = from m in WebAppDbContext.UserSessionTokens
from c in WebAppDbContext.Companies.Include(a => a.SecurityGroups)
from n in WebAppDbContext.SecurityGroups.Include(x => x.Members)
where m.TokenString == userTokenString &&
n.Members.Contains(m.User) &&
c.SecurityGroups.Contains(n)
select c;
Is this the best way to do this?
Does this query get any entire list of records from the db and then executes the filtration on this list? (might be huge list)
And most important: Does it query the database several times?

In my opinion and based on my own experiences, talking about performance, especially joining data sets, it's faster when you write it in SQL. But since you used code first approach then it's not an option. To answer your questions, your query will not query DB several times (you can try debugging and see Events log in VS). EF will transform your query into SQL statement and execute it.

TL:DR; don't micromanage the robots. Let them do their thing and 99% of the time you'll be fine. Linq doesn't really expose methods to micromanage the underlying data query, anyway, its whole purpose is to be an abstraction layer.
In my experience, the Linq provider is pretty smart about converting valid Linq syntax into "pretty good" SQL. Your looks like your query is all inner joins, and those should all be on the primary indexes/foreign keys of each table, so it's going to come up with something pretty good.
If that's not enough to soothe your worries, I'd suggest:
put on a SQL Trace to see the actual query that's being presented to the Database. I bet it's not as simple as Companies join SecurityGroups join Members join UserTokens, but it's probably equivalent to it.
Only worry about optimizing if this becomes an actual performance problem.
If it does become a performance problem, consider refactoring your problem space. It looks like you're trying to get a Company record from a UserToken, by going through three other tables. Is it possible to just relate Users and Companies directly, rather than through the security model? (please excuse my total ignorance of your problem space, I'm just saying "maybe look at the problem from another angle?")
In short, it sounds like you're burning brain cycles attempting to optimize something prematurely. Now, it's reasonable to think if there is a performance problem, this section of the code could be responsible, but that would only be noticeable if you're doing this query a lot. Based on coming from a security token, I'd guess you're doing this once per session to get the current user's contextual information. In that case, the problem isn't usually with Linq, but with your approach to solving the problem for finding Company information. Can you cache the result?

Improving Linq query

I have the following query:
if (idUO > 0)
{
query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
query = query.Where(b => b.DependencyId == dependencyId );
}
else
{
var dependencyIds = dependencies.Select(d => d.Id).ToList();
query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket object, that has 23 properties, 5 of them are relationships (FKs) and i fill 3 of these relationships with lazy loading
When I access the first page, its OK, the query takes less than one 1 second, but when I try to access the page 30000, the query takes more than 20 seconds. There is a way in the linq query, that I can improve the performance of the query? Or only in the database level? And in the database level, for this kind of query, which is the best way to improve the performance?

There is no much space here, imo, to make things better (at least looking on the code provided).
When you're trying to achieve a good performance on such numbers, I would recommend do not use LINQ at all, or at list use it on the stuff with smaler data access.
What you can do here, is introduce paging of that data on DataBase level, with some stored procedure, and invoke it from your C# code.

1- Create a view in DB which orders items by date including all related relationships, like Products etc.
2- Create a stored procedure querying this view with related parameters.

I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about Indexes that you should add.. This has had great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)

I think you'll find that the bottleneck is occurring at the database. Here's why;
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open sql profiler and view the random mess of sql it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on Compiled Queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an Object Query with explicit Joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred Loading can be a real performance killer.

Should I be using LINQ queries on RavenDB?

I'm learning RavenDB and I am rather confused. As far as I understand, one should create indexes in order to have really efficient queries. However, it is possible to simply make LINQ queries, such as
using(IDocumentSession session = _store.OpenSession())
{
MyDocument doc = session.Query<MyDocument>()
.Where(d => d.Property == value)
.Single();
}
This type of query works perfectly fine. I have, however, never created an index for it (and never reference an index when making the query, of course).
Should I be using this kind of query when working with RavenDB? If not, why is it even available in the API?

There's two things you are asking, here.
Can we use Indexes .. which are suppose to be more efficient than dynamic queries?
If we use indexes .. then should we use Linq and chaining?
Indexes
As Matt Warren correctly said, you're not using any indexes in your sample query. Right now, with your sample query, RavenDb is smart enough to create a temp (dynamic) index. If that dynamic index is used enough, it get auto-promoted to a static / perminent index.
So .. should you use indexes? If you can, then yeah!
here's your statement again, this time with an Index defined.
using(IDocumentSession session = _store.OpenSession())
{
MyDocument doc = session.Query<MyDocument>("ByProperty")
.Where(d => d.Property == value)
.Single();
}
In this case an index called MyDocument_ByProperty was created somewhere. I'm not going to explain the details of indexes .. go and read all about them here.
Linq and chaining
(Not sure if that is the correct terminology ... )
If you create a linq statement (which I did above) with OR without an index .. a query is still generated .. which then is translated into an HTTP RESTful request to the RavenDB Server. If you have an index .. then the query is smart enough to ask to use that. None? Then the server will create a dynamic index .. which means it will also have to go through the motions of indexing first, then retrieving your results.
TL;DR;
Yes use indexes. Yes use Linq chaining.

RavenDb comes with a native support for .Net and Linq.
The Linq provider, under the hood, does normal REST calls to the ravendb server, but for you it's easier to code on it since you can use IQueryable<T> with strongly typed classes.
So yes, you can and you should use linq/lambda to work with RavenDB in a .Net envorinment.

Something to be aware of that caught me out is that if you include a linq statement such as .Where(d => d.SomeProperty == null) then you might expect that if the document does not have the property then you would return a match. However this is not the case. If the document does not have the property then its value is not considered to be null (or any other value).

When should I use a CompiledQuery?

I have a table:
-- Tag
ID | Name
-----------
1 | c#
2 | linq
3 | entity-framework
I have a class that will have the following methods:
IEnumerable<Tag> GetAll();
IEnumerable<Tag> GetByName();
Should I use a compiled query in this case?
static readonly Func<Entities, IEnumerable<Tag>> AllTags =
CompiledQuery.Compile<Entities, IEnumerable<Tag>>
(
e => e.Tags
);
Then my GetByName method would be:
IEnumerable<Tag> GetByName(string name)
{
using (var db = new Entities())
{
return AllTags(db).Where(t => t.Name.Contains(name)).ToList();
}
}
Which generates a SELECT ID, Name FROM Tag and execute Where on the code. Or should I avoid CompiledQuery in this case?
Basically I want to know when I should use compiled queries. Also, on a website they are compiled only once for the entire application?

You should use a CompiledQuery when all of the following are true:
The query will be executed more than once, varying only by parameter values.
The query is complex enough that the cost of expression evaluation and view generation is "significant" (trial and error)
You are not using a LINQ feature like IEnumerable<T>.Contains() which won't work with CompiledQuery.
You have already simplified the query, which gives a bigger performance benefit, when possible.
You do not intend to further compose the query results (e.g., restrict or project), which has the effect of "decompiling" it.
CompiledQuery does its work the first time a query is executed. It gives no benefit for the first execution. Like any performance tuning, generally avoid it until you're sure you're fixing an actual performance hotspot.
2012 Update: EF 5 will do this automatically (see "Entity Framework 5: Controlling automatic query compilation") . So add "You're not using EF 5" to the above list.

Compiled queries save you time, which would be spent generating expression trees. If the query is used often and you'll save the compiled query, you should definitely use it. I had many cases when the query parsing took more time than the actual round trip to the database.
In your case, if you are sure that it would generate SELECT ID, Name FROM Tag without the WHERE case (which I doubt, as your AllQueries function should return IQueryable and the actual query should be made only after calling ToList) - you shouldn't use it.
As someone already mentioned, on bigger tables SELECT * FROM [someBigTable] would take very long and you'll spend even more time filtering that on the client side. So you should make sure that your filtering is made on the database side, no matter if you are using compiled queries or not.

compiled queries are more helpfull with linq queries with large expression trees say complex queries to gain performance over building expression tree again and again while reusing query. in your case i guess it will save a very little time.

Compiled queries are compiled when the application is compiled and every time you reuse a query often or it is complex you should definitely try compiled queries to make execution faster.
But I would not go for it on all queries as it is a little more code to write and for simple queries it might not be worthwhile.
But for maximum performance you should also evaluate Stored Procedures where you do all the processing on the database server, even if Linq tries to push as much of the work to the db as possible you will have situations where a stored procedure will be faster.

Compiled queries offer a performance improvement, but it's not huge. If you have complex queries, I'd rather go with a stored procedure or a view, if possible; letting the database do it's thing might be a better approach.

Loading behaviour of EF v1?

another Entity Framework (ADO.NET) question from me.
I'm using EF1 (no choice there) and have a MySQL database as a backend.
A simple question I can't really find a satisfying answer for:
What exactly do I have to do for loading? IE., when I have an entity and want to enumerate through its children, say I have the Entity "Group" and it has a child "User", and I want to do "from n in g.Users where n.UserID = 4 select n", I first have to call g.Users.Load();
This is a bit annoying, because when I do a query against a non-loaded collection, I would expect the EF to load it automatically - AT LEAST throw some exception, not simply return 0 results?
Another case of me having to take care of loading:
I have a query:
from n in Users where n.Group.Name == "bla" select n
For some reason it fails, giving a null pointer for n.Group, even though n.GroupID (the key for the group) is correctly set. Also, when I before do Server.Groups.Load() (Groups are children of one Server), it works.
Is there any exact policy about when to call Load() of which collection?
Thank you again,
Michael

There is no lazy loading in the first version of entity framework. Any time you want to access something you have not previously loaded, be it a reference to a single object or a collection of objects, you will either have to explicitly tell it to load that reference. The Include() option (first from Rup above) is going to try to load all the data you want in one large query and processing call, the result being that Include() performs slowly. The other method (2nd from Rup above), checking and then loading unloaded references, performed much faster and allowed us to limit loads to what we needed.
Basically our policy was to load only what you had to in the initial query, minimizing performance impact. Then we would check and load the reference later when we wanted to access a referenced entity or entity collection. This resulted in more queries to the database, but they were faster and we were only loading the ancillary data when we needed it, instead of pre-loading everything we could potentially need. It was possible that the same property would get checked as couple times in a function, but it would only have been loaded once and we could be sure we were only loading it because we were using it.

Do you mean ObjectQuery.Include, e.g.
var g = from g in MyDb.Groups.Include("Users") where g.Id = 123 select g;
from n in g.Users where n.UserID = 4 select n
from n in Users.Include("Group") where n.Group.Name == "bla" select n
You can also wrap Load()s in a check if you're worried about over-using them,
if (g.CanLoadReferences() && !g.Users.IsLoaded)
{
g.Users.Load();
}
(apologies for any silly syntax slips here - I use the other LINQ syntax and EF4 now)

This works when we run against MS SQL server, it could be a limitation in the MySQL Adapter.
Are you using the latest version 6.2.3? See: http://www.mysql.com/downloads/connector/net

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.