I'm building an ASP.NET application with Entity Framework (Code First) and I've implemented a Repository pattern like the one in this example.
I only have two tables in my database. One called Sensor and one called MeasurePoint (containing only TimeStamp and Value). A sensor can have multiple measure points. At the moment I have 5 sensors and around 15000 measure points (approximately 3000 points for each sensor).
In one of my MVC controllers I execute the following lines (to get the most recent MeasurePoint for a Sensor):
DbSet<Sensor> dbSet = context.Set<Sensor>();
var sensor = dbSet.Find(sensorId);
var point = sensor.MeasurePoints.OrderByDescending(measurePoint => measurePoint.TimeStamp).First();
This call takes ~1s to execute, which feels like a lot to me. The call results in the following SQL query:
SELECT
[Extent1].[MeasurePointId] AS [MeasurePointId],
[Extent1].[Value] AS [Value],
[Extent1].[TimeStamp] AS [TimeStamp],
[Extent1].[Sensor_SensorId] AS [Sensor_SensorId]
FROM [dbo].[MeasurePoint] AS [Extent1]
WHERE ([Extent1].[Sensor_SensorId] IS NOT NULL) AND ([Extent1].[Sensor_SensorId] = @EntityKeyValue1)
Which only takes ~200ms to execute, so the time is spent somewhere else.
I've profiled the code with the help of Visual Studio Profiler and found that the call that causes the delay is
System.Data.Objects.Internal.LazyLoadBehavior.<>c__DisplayClass7`2.<GetInterceptorDelegate>b__1(!0,!1)
So I guess it has something to do with lazy loading. Do I have to live with performance like this, or are there improvements I can make? Is it the ordering by time that causes the performance drop, and if so, what options do I have?
Update:
I've updated the code to show where sensor comes from.
What that will do is load the entire child collection into memory and then perform the .First() LINQ query against the loaded (approx. 3000) children.
If you just want the most recent, use this instead:
context.MeasurePoints
    .Where(mp => mp.Sensor.SensorId == sensorId)
    .OrderByDescending(mp => mp.TimeStamp)
    .First();
If that's the query it's running, it's loading all 3000 points into memory for the sensor. Try running the query directly on your DbContext instead of using the navigation property and see what the performance difference is. Your overhead may be coming from the 2999 points you don't need being loaded.
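If you'd rather keep the navigation property, EF 4.1+ can still push the filter down to SQL through the entry's collection query. A minimal sketch (FirstOrDefault is used so a sensor with no points yields null rather than an exception):

// Entry(...).Collection(...).Query() returns an IQueryable that EF translates
// to SQL, so only the single most recent row is fetched instead of all ~3000.
var latestPoint = context.Entry(sensor)
    .Collection(s => s.MeasurePoints)
    .Query()
    .OrderByDescending(mp => mp.TimeStamp)
    .FirstOrDefault();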
I've been playing around with making my own version of Entity Framework (for personal projects and out of curiosity about what making something like this would take).
While I was doing Entity Framework performance tests with a data table of 700k rows and 5 columns (named MassData), I ran into a peculiar issue that I'm hoping someone can explain to me.
Running the following test:
var Context = new EntityFrameworkContext();
var first = Context.MassData.Where(x => x.Id == 1);
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
The context creation takes 35ms, getting first takes about 215ms and getting firstFifty takes 14ms.
Removing 'first', getting 'firstFifty' takes about 210ms.
The results were the same if I switched the 'first' query for a Where() that selects everything (still with no iteration).
My first thought was that this was some case of the DbSet lazily loading data, with the first query enumerating data that the next one then accesses (even though the first one doesn't iterate through anything). That would kind of explain why the first query always takes a minimum of 200ms regardless of what it is, while the second runs as fast as if no database connection were even involved (the 'firstFifty' query takes at least 25ms to run as raw SQL, more than the 15ms I'm seeing here).
Except loading all of MassData takes 5 seconds, and just reading it takes about 2.5. So it can't be loading everything, but it's clearly loading more than the first query requires. Obviously I'm missing something.
Would anyone happen to have an explanation for why the
var first = Context.MassData.Where(x => x.Id == 1);
query speeds up the
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
query?
EDIT:
Turns out it really had nothing to do with lazy loading at all. The first query opens the connection and (I presume) performs and caches the validation of the entity type against the database table. The second query then doesn't have to open the connection or do much (if any) validation, in which case the duration of the second query matches up and everything makes sense.
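A quick way to see this effect, as a sketch along the lines of the test above (note that ToArray() forces execution here, which the original 'first' query never did):

var Context = new EntityFrameworkContext();

var sw = System.Diagnostics.Stopwatch.StartNew();
var cold = Context.MassData.Where(x => x.Id == 1).ToArray(); // pays the one-time set-up + connection cost
sw.Stop();
Console.WriteLine("cold: " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
var warm = Context.MassData.Where(x => x.Id == 1).ToArray(); // pays only for the query itself
sw.Stop();
Console.WriteLine("warm: " + sw.ElapsedMilliseconds + " ms");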
EDIT 2:
Modified title to better match what the question ended up really being about (How does lazy-loading work => how does entity loading work).
Because you're still loading the entity type and its prerequisites, regardless of what you're trying to query. A lambda expression in EF still becomes SQL, via the conversion from the lambda to string clauses and statements. So the initial slowness is not from the query but from the initial EF set-up.
Remember, you're still creating an instance of the EF context, so it will eat up some runtime processing; the rest is query-fetching time. This is an unavoidable process, of course.
So, generally, the second query is ready to run because the model you set EF up with is still in use. But once garbage collection decides it is not going to be used anywhere, the first query of your next session will again be slow during EF's initial set-up. Meaning: your connection to the DB is still open, as easy as that.
There are tools that show SQL Server activity, so you do not have to guess (for example, SQL Server Profiler for Microsoft SQL Server). But the lag on the first query probably has nothing to do with the database; it is just EF's internal initialization. EF is notoriously lazy.
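If that first-hit initialization is the cost you're seeing, a common mitigation is to pay it at application startup instead of on the first real query. A sketch, assuming an EF 4.3+/EF6-style DbContext:

using System.Data.Entity;

// Triggers model building and other one-time set-up without running a user
// query; "EntityFrameworkContext" stands in for your own context type.
using (var context = new EntityFrameworkContext())
{
    context.Database.Initialize(force: false);
}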
So I have the below EF query that returns about 12,000 records from a flat table. I use a projection that selects only the necessary fields (about 15) and then puts them into a list of a custom class. It takes almost 3 seconds, which seems like a long time for 12,000 records. I've tried wrapping the entire thing in a transaction scope with "read uncommitted", and I've also tried AsNoTracking(). Neither made any difference. Does anyone have any idea why the performance on this would be so lousy?
List<InfoModel> results = new List<InfoModel>();
using (InfoData data = new InfoData())
{
    results = (from S in data.InfoRecords
               select new
               {
                   ...bunch of entity fields...
               }).AsEnumerable().Select(x => new InfoModel()
               {
                   ...bunch of model fields...
               }).ToList();
}
It's difficult to answer, because there are so many things that can affect it: your network, the number of other requests hitting your SQL Server or Windows server, your model, ...
Although in the latest versions the quality of the generated queries and the performance of Entity Framework have improved a lot, it is still far from other options in terms of speed. There are some performance considerations you can look at: https://msdn.microsoft.com/en-us/data/hh949853.aspx
If speed matters and 3 seconds is too much for you, I probably wouldn't use Entity Framework to retrieve so many rows; for me, Entity Framework is great when you need just a few items, but not thousands when speed is important.
To improve speed you can use one of the many other ORMs, like Dapper, the one used by Stack Overflow, which is pretty fast.
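Since Dapper came up, here is what that fetch looks like there. A minimal sketch, assuming a table named InfoRecords and that the selected columns match InfoModel's property names (Dapper maps them by name; Field1/Field2 are placeholders):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using Dapper; // Query<T> is an extension method on IDbConnection

public static List<InfoModel> LoadInfoModels(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        // One SQL round trip, no change tracking, no projection-to-anonymous step.
        return connection.Query<InfoModel>(
            "SELECT Field1, Field2 /* ...the ~15 fields you need... */ FROM InfoRecords")
            .ToList();
    }
}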
I have the following query:
if (idUO > 0)
{
    query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
    query = query.Where(b => b.DependencyId == dependencyId);
}
else
{
    var dependencyIds = dependencies.Select(d => d.Id).ToList();
    query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
    query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket, which has 23 properties; 5 of them are relationships (FKs), and I fill 3 of these relationships with lazy loading
When I access the first page it's OK; the query takes less than 1 second. But when I try to access page 30000, the query takes more than 20 seconds. Is there a way, in the LINQ query, to improve its performance? Or only at the database level? And at the database level, what is the best way to improve performance for this kind of query?
There is not much room here, imo, to make things better (at least looking at the code provided).
When you're trying to achieve good performance on numbers like these, I would recommend not using LINQ at all, or at least using it only on the parts with smaller data access.
What you can do here is introduce paging of that data at the database level, with a stored procedure, and invoke it from your C# code.
1- Create a view in the DB which orders the items by date and includes all the related relationships (Products etc.).
2- Create a stored procedure querying this view with the related parameters.
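For step 2, invoking the procedure from C# could look like this. A sketch: "dbo.GetTicketsPage", its parameters, and "MyContext" are hypothetical names, and Database.SqlQuery assumes a DbContext-based context:

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public static List<Ticket> GetTicketsPage(MyContext context, int page)
{
    // SqlQuery materializes the procedure's result set straight into Ticket
    // objects (columns matched to properties by name), without change tracking.
    return context.Database.SqlQuery<Ticket>(
            "EXEC dbo.GetTicketsPage @PageNumber, @PageSize",
            new SqlParameter("@PageNumber", page),
            new SqlParameter("@PageSize", 20))
        .ToList();
}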
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull the trace into the Database Engine Tuning Advisor to get some tips about indexes that you should add. This has had great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why:
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open SQL Profiler and view the random mess of SQL it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
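As an aside, one LINQ-level way around that scan is keyset ("seek") paging, which is not what EF's Skip/Take emits. A sketch, where lastDate and lastId are the sort values of the final row on the previous page, and b.Id is an assumed key property on Ticket:

// With an index on (Date, Id), this filter is an index seek rather than a
// scan-and-discard of 20 * page rows.
var nextPage = query
    .Where(b => b.Date > lastDate ||
               (b.Date == lastDate && b.Id > lastId)) // Id breaks date ties
    .OrderBy(b => b.Date)
    .ThenBy(b => b.Id)
    .Take(20)
    .ToList();

The trade-off is that you can only step page by page; it cannot jump straight to page 30,000.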
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on compiled queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an ObjectQuery with explicit joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred loading can be a real performance killer.
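Roughly, those last two suggestions look like this. A sketch against ObjectContext-era APIs; "MyEntities" and its "Tickets" set are illustrative names, not from the question:

using System;
using System.Data.Objects; // CompiledQuery, MergeOption
using System.Linq;

public static class TicketQueries
{
    // Compiled once per AppDomain; later calls skip the LINQ-to-SQL translation.
    public static readonly Func<MyEntities, int, IQueryable<Ticket>> Page =
        CompiledQuery.Compile((MyEntities ctx, int page) =>
            ctx.Tickets.OrderBy(b => b.Date).Skip(20 * page).Take(20));
}

// Usage, with tracking turned off for read-only work:
// ctx.Tickets.MergeOption = MergeOption.NoTracking;
// var rows = TicketQueries.Page(ctx, page).ToList();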
Another Entity Framework (ADO.NET) question from me.
I'm using EF1 (no choice there) and have a MySQL database as a backend.
A simple question I can't really find a satisfying answer for:
What exactly do I have to do for loading? I.e., when I have an entity and want to enumerate through its children — say I have the entity "Group" and it has a child collection "Users", and I want to do "from n in g.Users where n.UserID == 4 select n" — I first have to call g.Users.Load();
This is a bit annoying, because when I do a query against a non-loaded collection, I would expect EF to load it automatically, or at least throw some exception, not simply return 0 results.
Another case of me having to take care of loading:
I have a query:
from n in Users where n.Group.Name == "bla" select n
For some reason it fails, giving a null pointer for n.Group, even though n.GroupID (the key for the group) is correctly set. However, when I call Server.Groups.Load() beforehand (Groups are children of one Server), it works.
Is there any exact policy about when to call Load(), and on which collection?
Thank you again,
Michael
There is no lazy loading in the first version of Entity Framework. Any time you want to access something you have not previously loaded, be it a reference to a single object or a collection of objects, you will have to explicitly tell it to load that reference. The Include() option (first from Rup above) is going to try to load all the data you want in one large query and processing call, with the result that Include() performs slowly. The other method (second from Rup above), checking and then loading unloaded references, performed much faster and allowed us to limit the loads to what we needed.
Basically our policy was to load only what you had to in the initial query, minimizing the performance impact. Then we would check and load the reference later, when we wanted to access a referenced entity or entity collection. This resulted in more queries to the database, but they were faster, and we were only loading the ancillary data when we needed it, instead of pre-loading everything we could potentially need. It was possible that the same property would get checked a couple of times in a function, but it would only have been loaded once, and we could be sure we were only loading it because we were using it.
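That "check, then load" policy can be captured in a tiny helper. A sketch for EF1's ObjectContext world; EntityCollection<T> and EntityReference<T> both derive from RelatedEnd, which exposes IsLoaded and Load():

using System.Data.Objects.DataClasses;

public static class RelatedEndExtensions
{
    // Loads a navigation property only if it has not been loaded yet, so
    // repeated checks in one function cost at most one database round trip.
    public static void EnsureLoaded(this RelatedEnd relatedEnd)
    {
        if (!relatedEnd.IsLoaded)
        {
            relatedEnd.Load();
        }
    }
}

Usage would then be group.Users.EnsureLoaded(); right before the access that needs the data.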
Do you mean ObjectQuery.Include, e.g.
var groups = from g in MyDb.Groups.Include("Users") where g.Id == 123 select g;
from n in g.Users where n.UserID == 4 select n
from n in MyDb.Users.Include("Group") where n.Group.Name == "bla" select n
You can also wrap Load()s in a check if you're worried about over-using them:
if (g.CanLoadReferences() && !g.Users.IsLoaded)
{
    g.Users.Load();
}
(apologies for any silly syntax slips here - I use the other LINQ syntax and EF4 now)
This works when we run against MS SQL Server, so it could be a limitation in the MySQL adapter.
Are you using the latest version 6.2.3? See: http://www.mysql.com/downloads/connector/net
The executeTime below is 30 seconds the first time, and 25 seconds the next time I execute the same set of code. When watching in SQL Profiler, I immediately see a login, and then it just sits there for about 30 seconds. As soon as the select statement is run, the app finishes the ToList call. When I run the generated query from Management Studio, the database query only takes about 400ms. It returns 14 rows and 350 columns. The time spent transforming the database results into entities appears to be so small that it is not noticeable.
So what is happening in the 30 seconds before the database call is made?
If Entity Framework is this slow, it is not possible for us to use it. Is there something I am doing wrong, or something I can change to speed this up dramatically?
UPDATE:
All right, if I use a compiled query, the first time it takes 30 seconds and the second time it takes a quarter of a second. Is there anything I can do to speed up the first call?
using (EntitiesContext context = new EntitiesContext())
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    var groupQuery = (from g in context.Groups.Include("DealContract")
                          .Include("DealContract.Contracts")
                          .Include("DealContract.Contracts.AdvertiserAccountType1")
                          .Include("DealContract.Contracts.ContractItemDetails")
                          .Include("DealContract.Contracts.Brands")
                          .Include("DealContract.Contracts.Agencies")
                          .Include("DealContract.Contracts.AdvertiserAccountType2")
                          .Include("DealContract.Contracts.ContractProductLinks.Products")
                          .Include("DealContract.Contracts.ContractPersonnelLinks")
                          .Include("DealContract.Contracts.ContractSpotOrderTypes")
                          .Include("DealContract.Contracts.Advertisers")
                      where g.GroupKey == 6
                      select g).OfType<Deal>();
    sw.Stop();
    var queryTime = sw.Elapsed;
    sw.Reset();
    sw.Start();
    var groups = groupQuery.ToList();
    sw.Stop();
    var executeTime = sw.Elapsed;
}
I had this exact same problem; my query was taking 40 seconds.
I found the problem was with the .Include("table_name") calls. The more of these I had, the worse it was. Instead, I changed my code to lazy load all the data I needed right after the query, which knocked the total time down to about 1.5 seconds from 40 seconds. As far as I know, this accomplishes the exact same thing.
So for your code it would be something like this:
var groupQuery = (from g in context.Groups
                  where g.GroupKey == 6
                  select g).OfType<Deal>();
var groups = groupQuery.ToList();
foreach (var g in groups)
{
    // Assuming DealContract is a single object, not a collection of objects.
    // If a navigation property is a collection, you can just do a straight ".Load()";
    // if it is a single object, you call ".Load()" on its "...Reference" property
    // instead, as with "g.DealContractReference" here.
    g.DealContractReference.Load();
    if (g.DealContract != null)
    {
        var d = g.DealContract;
        d.Contracts.Load();
        foreach (var c in d.Contracts)
        {
            c.AdvertiserAccountType1Reference.Load();
            // etc....
        }
    }
}
Incidentally, if you were to add this line of code above the query in your current code, it would knock the time down to about 4-5 seconds (still too long in my opinion). From what I understand, the MergeOption.NoTracking option disables a lot of the tracking overhead for updating and inserting stuff back into the database:
context.Groups.MergeOption = MergeOption.NoTracking;
It is because of the Includes. My guess is that you are eager loading a lot of objects into memory. It takes a lot of time to build the C# objects that correspond to your DB entities.
My recommendation for you is to try to lazy load only the data you need.
The only way I know of to make the initial compilation of the query faster is to make the query less complex. The MSDN documentation on performance considerations for the Entity Framework and compiled queries doesn't indicate any way to save a compiled query for use in a different application execution session.
I would add that we have found that having lots of Includes can make query execution slower than having fewer Includes and doing more Loads on related entities later. Some trial and error is required to find the right medium.
However, I have to ask whether you really need every property of every entity you are including here. It seems to me that there is a large number of different entity types in this query, so materializing them could well be quite expensive. If you are just trying to get tabular results which you don't intend to update, projecting the (relatively) few fields that you actually need into a flat, anonymous type should be significantly faster for various reasons. Also, this frees you from having to worry about eager loading, calling Load/IsLoaded, and so on.
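For example, something along these lines — a sketch only, with ContractId and Brands/Name as guessed property names based on the Includes above:

// Projecting into an anonymous type: EF generates one query for exactly these
// columns, nothing is change-tracked, and no Include/Load calls are needed.
var rows = (from g in context.Groups
            where g.GroupKey == 6
            from c in g.DealContract.Contracts
            select new
            {
                g.GroupKey,
                ContractId = c.ContractId,
                BrandNames = c.Brands.Select(b => b.Name)
            }).ToList();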
You can certainly speed up the initial view generation by pre-compiling the entity views. There is documentation on MSDN for this. But since you pay that cost at the time the first query is executed, your test with a simple query suggests that view generation is running in the neighborhood of 2 seconds for you. It's nice to save that 2 seconds, but it won't save anything else.
EF takes a while to start up. It needs to build metadata from XML and probably generate the objects used for mapping.
So it takes a few seconds to start up; I don't think there is a way to get around that, except never restarting your program.