How does Entity Framework entity loading work? - c#

I've been playing around with making my own Entity Framework (for personal projects and out of curiosity about what building something like this would take).
While I was running Entity Framework performance tests against a table with 700k rows and 5 columns (named MassData), I ran into a peculiar issue that I'm hoping someone can explain to me.
Running the following test:
var Context = new EntityFrameworkContext();
var first = Context.MassData.Where(x => x.Id == 1);
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
The context creation takes 35ms, getting 'first' takes about 215ms, and getting 'firstFifty' takes 14ms.
Removing 'first', getting 'firstFifty' takes about 210ms.
The results were the same when I switched the 'first' query for a Where() that selects everything (still with no iteration).
My first thought was that this was some case of lazy loading of the data in the DbSet, with the first query loading the data the next one accesses (even though the first one doesn't iterate through anything). That would roughly explain why the first query always takes a minimum of 200ms regardless of what it selects, while the second runs as fast as if no database connection were involved at all (the 'firstFifty' query takes at least 25ms to run as plain SQL, more than the 15ms I'm seeing here).
Except loading all of MassData takes 5 seconds, and just reading it takes about 2.5. So it can't be loading everything, but it's clearly loading more than the first query requires. So obviously I'm missing something.
Would anyone happen to have an explanation for why the
var first = Context.MassData.Where(x => x.Id == 1);
query speeds up the
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
query?
EDIT:
Turns out, it really had nothing to do with lazy loading at all. The first query opens the connection and (I presume) performs and caches the validation of the entity type against the database table. The second query then doesn't have to open the connection or do much, if any, validation, in which case the duration of the second query matches up and everything makes sense.
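A minimal sketch of one way to verify this, assuming EntityFrameworkContext is the DbContext subclass and MassData the set from the test above (Stopwatch comes from System.Diagnostics):
using System;
using System.Diagnostics;
using System.Linq;

using (var context = new EntityFrameworkContext())
{
    // Absorb the one-time cost first: opening the connection and validating the model.
    context.MassData.Any();

    var sw = Stopwatch.StartNew();
    var firstFifty = context.MassData.Where(x => x.Id < 50).ToArray();
    sw.Stop();

    // With the warm-up in place, this should report roughly the warm 14ms figure
    // from the question rather than the ~215ms cold one.
    Console.WriteLine("firstFifty took {0}ms", sw.ElapsedMilliseconds);
}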
EDIT 2:
Modified title to better match what the question ended up really being about (How does lazy-loading work => how does entity loading work).

Because you're still loading the entity type and its prerequisites, regardless of what you're trying to query. A lambda expression for EF still ends up as SQL, with the conversion from the lambda into string clauses and statements. So the initial slowness is not from the query but from EF's initial setup.
Remember, you're still creating an instance of the EF context, so it will eat up some runtime; the rest is query-fetching time. This is an unavoidable process, of course, because of the CLR.
So, generally, the second query is ready to run, since the model you set up for EF is still in use. But once garbage collection decides it is not going to be used anywhere, your first query in the next session will again be slow during the initial EF setup. In other words: your connection to the DB is still open, as simple as that.

There are tools to show SQL Server activity, so you do not have to guess (for example, SQL Profiler for Microsoft SQL Server). But the lag on the first query probably has nothing to do with the database; it is just EF's internal initialization. EF is notoriously lazy.

Related

Poor performance when loading child entities with Entity Framework

I'm building an ASP.Net application with Entity Framework (Code First) and I've implemented a Repository pattern like the one in this example.
I only have two tables in my database. One called Sensor and one called MeasurePoint (containing only TimeStamp and Value). A sensor can have multiple measure points. At the moment I have 5 sensors and around 15000 measure points (approximately 3000 points for each sensor).
In one of my MVC controllers I execute the following lines (to get the most recent MeasurePoint for a Sensor)
DbSet<Sensor> dbSet = context.Set<Sensor>();
var sensor = dbSet.Find(sensorId);
var point = sensor.MeasurePoints.OrderByDescending(measurePoint => measurePoint.TimeStamp).First();
This call takes ~1s to execute which feels like a lot to me. The call results in the following SQL query
SELECT
[Extent1].[MeasurePointId] AS [MeasurePointId],
[Extent1].[Value] AS [Value],
[Extent1].[TimeStamp] AS [TimeStamp],
[Extent1].[Sensor_SensorId] AS [Sensor_SensorId]
FROM [dbo].[MeasurePoint] AS [Extent1]
WHERE ([Extent1].[Sensor_SensorId] IS NOT NULL) AND ([Extent1].[Sensor_SensorId] = @EntityKeyValue1)
Which only takes ~200ms to execute, so the time is spent somewhere else.
I've profiled the code with the help of Visual Studio Profiler and found that the call that causes the delay is
System.Data.Objects.Internal.LazyLoadBehavior.<>c__DisplayClass7`2.<GetInterceptorDelegate>b__1(!0,!1)
So I guess it has something to do with lazy loading. Do I have to live with performance like this, or are there improvements I can make? Is it the ordering by time that causes the performance drop, and if so, what options do I have?
Update:
I've updated the code to show where sensor comes from.
What that will do is load the entire child collection into memory and then perform the .First() LINQ query against the loaded (approximately 3000) children.
If you just want the most recent, use this instead:
context.MeasurePoints
    .Where(p => p.Sensor.SensorId == sensorId) // keep the per-sensor filter when querying the set directly
    .OrderByDescending(p => p.TimeStamp)
    .First();
If that's the query it's running, it's loading all 3000 points for the sensor into memory. Try running the query directly on your DbContext instead of going through the navigation property and see what the performance difference is. Your overhead may be coming from the 2999 points you don't need being loaded.

Improving Linq query

I have the following query:
if (idUO > 0)
{
    query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
    query = query.Where(b => b.DependencyId == dependencyId);
}
else
{
    var dependencyIds = dependencies.Select(d => d.Id).ToList();
    query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}

[...] <- Other filters...

if (specialDateId != 0)
{
    query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned objects are Tickets; Ticket has 23 properties, 5 of which are relationships (FKs), and I fill 3 of those relationships via lazy loading
When I access the first page it's OK, the query takes less than 1 second, but when I try to access page 30000, the query takes more than 20 seconds. Is there a way, in the LINQ query, to improve the performance? Or only at the database level? And at the database level, what is the best way to improve performance for this kind of query?
There is not much room here, IMO, to make things better (at least looking at the code provided).
When you're trying to achieve good performance on numbers like these, I would recommend not using LINQ at all, or at least using it only on the parts with smaller data access.
What you can do here is introduce paging of that data at the database level, with a stored procedure, and invoke it from your C# code (see the sketch after this list):
1- Create a view in the DB which orders items by date, including all related relationships, like Products etc.
2- Create a stored procedure querying this view with the related parameters.
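A minimal sketch of step 2's C# side, assuming a DbContext-based model; MyContext is an illustrative name for the context class, and the procedure name GetTicketsPage and its parameters are hypothetical:
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public List<Ticket> GetPage(MyContext context, int page, int pageSize)
{
    // SqlQuery materializes the rows returned by the stored procedure into Ticket objects;
    // the paging itself happens inside the procedure, not in memory.
    return context.Database
        .SqlQuery<Ticket>("EXEC GetTicketsPage @PageNumber, @PageSize",
            new SqlParameter("@PageNumber", page),
            new SqlParameter("@PageSize", pageSize))
        .ToList();
}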
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull the trace into the Database Engine Tuning Advisor to get some tips about indexes that you should add. This has had great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why:
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open sql profiler and view the random mess of sql it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on compiled queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an ObjectQuery with explicit joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred loading can be a real performance killer.
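A minimal sketch of those two suggestions, assuming an ObjectContext-based model; MyEntities and its Tickets set are illustrative names, and Ticket is the entity from the question:
using System;
using System.Collections.Generic;
using System.Data.Objects;
using System.Linq;

class TicketPager
{
    // Compiled once: the LINQ-to-SQL translation cost is not paid again on every call.
    static readonly Func<MyEntities, int, IQueryable<Ticket>> PageQuery =
        CompiledQuery.Compile<MyEntities, int, IQueryable<Ticket>>(
            (ctx, page) => ctx.Tickets
                .OrderBy(b => b.Date)
                .Skip(20 * page)
                .Take(20));

    public List<Ticket> GetPage(MyEntities context, int page)
    {
        return PageQuery(context, page).ToList();
    }

    public List<Ticket> GetPageReadOnly(MyEntities context, int page)
    {
        // Read-only path: LINQ to Entities queries are ObjectQuery instances at runtime,
        // so cast and turn off change tracking before materializing.
        var query = (ObjectQuery<Ticket>)context.Tickets
            .OrderBy(b => b.Date)
            .Skip(20 * page)
            .Take(20);
        query.MergeOption = MergeOption.NoTracking;
        return query.ToList();
    }
}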

LINQ to Entities: Query not working with certain parameter value

I have a very strange problem with a LINQ to Entities query with EF1.
I have a method with a simple query:
public DateTime GetLastSuccessfulRun(string job)
{
    var entities = GetEntities();
    var query = from jr in entities.JOBRUNS
                where jr.JOB_NAME == job && jr.JOB_INFO == "SUCCESS"
                orderby jr.JOB_END descending
                select jr.JOB_END;
    var result = query.ToList().FirstOrDefault();
    return result.HasValue ? result.Value : default(DateTime);
}
The method GetEntities returns an instance of a class that is derived from System.Data.Objects.ObjectContext and has automatically been created by the EF designer when I imported the schema of the database.
The query worked just fine for the last 15 or 16 months. And it still runs fine on our test system. In the live system however, there is a strange problem: Depending on the value of the parameter job, it returns the correct results or an empty result set, although there is data it should return.
Anyone ever had a strange case like that? Any ideas what could be the problem?
Some more info:
The database we query against is an Oracle 10g; we are using an enhanced version of the OracleEFProvider v0.2a.
The SQL statement that is returned by ToTraceString works just fine when executed directly via SQL Developer, even with the same parameter that is causing the problem in the LINQ query.
The following also returns the correct result:
entities.JOBRUNS.ToList().Where(x => x.JOB_NAME == job && x.JOB_INFO == "SUCCESS").Count();
The difference here is the call to ToList on the table before applying the where clause. This means two things:
The data is in the database and it is correct.
The problem seems to be the query including the where clause when executed by the EF Provider.
What really stuns me is that this is a live system and the problem occurred without any changes to the database or the program. One call to that method returned the correct result, and the next call five minutes later returned the wrong result. Since then, it has only returned the wrong results.
Any hints, suggestions, ideas etc. are welcome, no matter how far-fetched they seem! Please post them as answers so I can vote on them, if only for reading my lengthy question and bothering to think about this strange problem... ;-)
First of all, remove the ObjectContext caching. The object context internally uses the Unit of Work and Identity Map patterns, and this can have a big impact on queries.
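A minimal sketch of that change, assuming GetEntities() currently hands back a shared, long-lived context; MyEntities stands in for the designer-generated context class:
public DateTime GetLastSuccessfulRun(string job)
{
    // A fresh context per call: no stale identity-map entries can mask current database state.
    using (var entities = new MyEntities())
    {
        var query = from jr in entities.JOBRUNS
                    where jr.JOB_NAME == job && jr.JOB_INFO == "SUCCESS"
                    orderby jr.JOB_END descending
                    select jr.JOB_END;
        var result = query.ToList().FirstOrDefault();
        return result.HasValue ? result.Value : default(DateTime);
    }
}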

Linq2Entities Include with Skip / Take - load issue

Note: I know there are a number of questions around for issues with Linq's .Include(table) not loading data, I believe I have exhausted the options people have listed, and still had problems.
I have a large Linq2Entities query on an application I'm maintaining. The query is built up as such:
IQueryable<Results> query = context.MyTable
    .Where(r =>
        r.RelatedTable.ID == 2 &&
        r.AnotherRelatedTable.ID == someId);
Then predicates are built up depending on various business logic, such as:
if (sortColumn.Contains("dob "))
{
    if (orderByAscending)
        query = query.OrderBy(p => p.RelatedTable.OrderByDescending(q => q.ID).FirstOrDefault().FieldName);
    else
        query = query.OrderByDescending(p => p.RelatedTable.OrderByDescending(q => q.ID).FirstOrDefault().FieldName);
}
Note - there is always a sort order provided.
Originally the included tables were set at the beginning; after reading articles such as the famous Tip 22, they are now done at the end (which didn't fix the problem):
var resultsList = (query.Select(r => r) as ObjectQuery<Results>)
    .Include("RelatedTable")
    .Include("AnotherRelatedTable")
    .Skip((page - 1) * rowsPerPage)
    .Take(rowsPerPage);
Seemingly at random (approximately once per 5000 users of the site), the RelatedTable data won't load. It can be brute-forced by calling Load() on the related table. But even the failure to load isn't consistent: I've run the query in testing and it has worked, but most of the time it hasn't, without any change to code or data.
It is fine when the Skip and Take aren't included and the whole dataset is returned, but I would expect the Skip and Take to be done on the complete dataset - it certainly appears to be, from profiling the SQL...
UPDATE 16/11/10: I have profiled the SQL against a problem data set, and I've been able to reproduce the query failing about 9/10 times, but succeeding the rest. The SQL being executed is identical when the query fails or succeeds except, as expected, for the parameters passed to the SQL.
The issue has been solved with the following change, but the question remains as to why this should be.
Failing - get LINQ to handle the rows:
var resultsList = (query.Select(r => r) as ObjectQuery<Results>)
    .Include("RelatedTable")
    .Include("AnotherRelatedTable")
    .Skip((page - 1) * rowsPerPage)
    .Take(rowsPerPage)
    .ToList();
Working - enumerate the data then get the rows:
var resultsList = (query.Select(r => r) as ObjectQuery<Results>)
    .Include("RelatedTable")
    .Include("AnotherRelatedTable")
    .ToList()
    .Skip((page - 1) * rowsPerPage)
    .Take(rowsPerPage);
Unfortunately the SQL this query creates contains some sensitive schema data so I can't post it, it is also 1400 lines long, so I wouldn't subject the public to it anyway!
The sole effect of Take() is to change the generated SQL. Other than that, the Entity Framework does not care about it at all. Same for .Skip(). It's hard to believe that this would have an effect on query materialization (although stranger things have happened).
So what could be causing this behavior? Off the top of my head:
A bug in your application or mapping which is causing an incorrect query to be generated.
A bug in the Entity Framework which would cause returned data to be materialized into objects incorrectly in certain circumstances.
Bad data in your database.
A bug in your database's SQL parser.
I don't think you're going to get a lot further with this until you can capture the generated SQL and run it yourself. This is actually not terribly hard, as you can set up a SQL profiler with an appropriate filter. If you find that the generated SQL is different in the buggy case, you can work backwards from there. If you find that the generated SQL is identical in the buggy case, the next step would be to look at the rows returned, preferably in the same context as the application ran it.
In short, I think you just have to keep tweaking your SQL profiling until you have the information you need.

Loading behaviour of EF v1?

Another Entity Framework (ADO.NET) question from me.
I'm using EF1 (no choice there) and have a MySQL database as a backend.
A simple question I can't really find a satisfying answer for:
What exactly do I have to do for loading? I.e., when I have an entity and want to enumerate through its children - say I have the entity "Group" and it has child "User" entities, and I want to do "from n in g.Users where n.UserID == 4 select n" - I first have to call g.Users.Load();
This is a bit annoying, because when I run a query against a non-loaded collection, I would expect EF to load it automatically, or AT LEAST throw some exception, not simply return 0 results.
Another case of me having to take care of loading:
I have a query:
from n in Users where n.Group.Name == "bla" select n
For some reason it fails, giving a null reference for n.Group, even though n.GroupID (the key for the group) is correctly set. It works, however, when I first do Server.Groups.Load() (Groups are children of one Server).
Is there any exact policy about when to call Load(), and on which collection?
Thank you again,
Michael
There is no lazy loading in the first version of Entity Framework. Any time you want to access something you have not previously loaded, be it a reference to a single object or a collection of objects, you have to explicitly tell it to load that reference. The Include() option (first from Rup above) tries to load all the data you want in one large query and processing call, with the result that Include() performs slowly. The other method (second from Rup above), checking and then loading unloaded references, performed much faster and allowed us to limit loads to what we needed.
Basically our policy was to load only what we had to in the initial query, minimizing performance impact, and then check and load the reference later when we wanted to access a referenced entity or entity collection. This resulted in more queries to the database, but they were faster, and we were only loading the ancillary data when we needed it instead of pre-loading everything we could potentially need. The same property might get checked a couple of times in a function, but it would only be loaded once, and we could be sure we were only loading it because we were using it.
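A small sketch of that check-then-load policy for the single-reference case from the question (n.Group coming back null). This assumes EF1's designer-generated classes, where a reference navigation property Group is accompanied by a GroupReference property of type EntityReference<Group>; the user variable is illustrative:
// Load the reference only if it hasn't been materialized yet.
if (!user.GroupReference.IsLoaded)
{
    user.GroupReference.Load(); // one targeted round-trip
}
// user.Group is now populated, so this no longer fails with a null reference:
var groupName = user.Group.Name;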
Do you mean ObjectQuery.Include, e.g.
var groups = from g in MyDb.Groups.Include("Users") where g.Id == 123 select g;
from n in g.Users where n.UserID == 4 select n
from n in Users.Include("Group") where n.Group.Name == "bla" select n
You can also wrap Load()s in a check if you're worried about over-using them,
if (g.CanLoadReferences() && !g.Users.IsLoaded)
{
    g.Users.Load();
}
(apologies for any silly syntax slips here - I use the other LINQ syntax and EF4 now)
This works when we run against MS SQL Server; it could be a limitation in the MySQL adapter.
Are you using the latest version 6.2.3? See: http://www.mysql.com/downloads/connector/net
