Loading behaviour of EF v1? - c#

another Entity Framework (ADO.NET) question from me.
I'm using EF1 (no choice there) and have a MySQL database as a backend.
A simple question I can't really find a satisfying answer for:
What exactly do I have to do for loading? IE., when I have an entity and want to enumerate through its children, say I have the Entity "Group" and it has a child "User", and I want to do "from n in g.Users where n.UserID = 4 select n", I first have to call g.Users.Load();
This is a bit annoying, because when I do a query against a non-loaded collection, I would expect the EF to load it automatically - AT LEAST throw some exception, not simply return 0 results?
Another case of me having to take care of loading:
I have a query:
from n in Users where n.Group.Name == "bla" select n
For some reason it fails, giving a null pointer for n.Group, even though n.GroupID (the key for the group) is correctly set. Also, when I before do Server.Groups.Load() (Groups are children of one Server), it works.
Is there any exact policy about when to call Load() of which collection?
Thank you again,
Michael

There is no lazy loading in the first version of entity framework. Any time you want to access something you have not previously loaded, be it a reference to a single object or a collection of objects, you will either have to explicitly tell it to load that reference. The Include() option (first from Rup above) is going to try to load all the data you want in one large query and processing call, the result being that Include() performs slowly. The other method (2nd from Rup above), checking and then loading unloaded references, performed much faster and allowed us to limit loads to what we needed.
Basically our policy was to load only what you had to in the initial query, minimizing performance impact. Then we would check and load the reference later when we wanted to access a referenced entity or entity collection. This resulted in more queries to the database, but they were faster and we were only loading the ancillary data when we needed it, instead of pre-loading everything we could potentially need. It was possible that the same property would get checked as couple times in a function, but it would only have been loaded once and we could be sure we were only loading it because we were using it.

Do you mean ObjectQuery.Include, e.g.
var g = from g in MyDb.Groups.Include("Users") where g.Id = 123 select g;
from n in g.Users where n.UserID = 4 select n
from n in Users.Include("Group") where n.Group.Name == "bla" select n
You can also wrap Load()s in a check if you're worried about over-using them,
if (g.CanLoadReferences() && !g.Users.IsLoaded)
{
g.Users.Load();
}
(apologies for any silly syntax slips here - I use the other LINQ syntax and EF4 now)

This works when we run against MS SQL server, it could be a limitation in the MySQL Adapter.
Are you using the latest version 6.2.3? See: http://www.mysql.com/downloads/connector/net

Related

How does Entity Framework entity loading work?

I've been playing around with making my own Entity Framework (for personal projects and out of curiosity on what making something like this would take).
While I was doing Entity Framework performance tests with a data table with 700k rows and 5 columns (named MassData), I ran into something peculiar issues that I'm hoping someone could explain to me.
Running the following test:
var Context = new EntityFameworkContext();
var first = Context.MassData.Where(x => x.Id == 1);
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
The context creation takes 35ms, getting first takes about 215ms and getting firstFifty takes 14ms.
Removing 'first', getting 'firstFifty' takes about 210ms.
The results were the same if I switch the 'first' query with a Where() that selects everything (still with no iteration).
My first thought, was that this was some case of loading the lazy data in the DbSet, with the first query enumerating data the next one accesses (even though the first one doesn't iterate through anything). This would kind of explain why the first always takes a minimum of 200ms regardless of the query, while the second runs as fast as if no database connection was even involved (the 'firstFifty' takes 25ms minimum to run as an SQL query, more than the 15ms I'm seeing here).
Except loading all of MassData takes 5 seconds. Just reading it takes about 2,5. So it can't be loading everything, but it's clearly loading more than the first query requires. So obviously I'm missing something.
Would anyone happen to have an explanation for why the
var first = Context.MassData.Where(x => x.Id == 1);
query speeds up the
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
query?
EDIT:
Turns out, it really had nothing to do with lazy loading at all. The first query opens the connection and (I presume) does and stores the validation of the entity type against the database table. The second query then doesn't have to open the connection or do much if any validation, in which case the duration of the second query matches up, and everything makes sense.
EDIT 2:
Modified title to better match what the question ended up really being about (How does lazy-loading work => how does entity loading work).
Because you're still loading the Entity Type and the prerequisites of it. Regardless of what you're trying to query. Lambda expression for EF still is SQL, with the conversion from Lambda to String clauses and statements. So the first slowness is not from the Query but from the EF Initial set up.
Remember, You're still creating an Instance of the EF so it will eat up some runtime process. Then the rest is Query Fetching time. This is unavoidable process of-course because of the CLR process.
So, Generally. The Second Query is ready for the Query since in your Model where you set up the EF is still on use, But when the Garbage collection decides that it is not going to be used anywhere, then your query for the next session will be slower at the begin Init for EF. "MEANING YOUR CONNECTION FROM THE DB IS STILL OPEN" as easy as that.
There are tools to show sql server activity, so you do not have to guess (for example sql profiler for microsoft sql server). But the lag on first query has probably nothing to do with database, it is just EF internal initialization. EF is notoriously lazy.

How to boost performance EF Linq to Entities

I use asp.net 4 c# and EF 4.
I'm profiling my application. I have this code which result expensive.
I woud like to know if you know a better way to write it. I need to speed it up.
string htmlhead = context.CmsOptions.SingleOrDefault(op => op.OptionId == 7).Value;
if (htmlhead != null)
uxHtmlHead.Text = htmlhead;
else
uxHtmlHead.Text = "No Html Head.";
Thanks
Useful article
http://weblogs.asp.net/zeeshanhirani/archive/2010/09/20/which-one-is-faster-singleordefault-or-firstordefault.aspx
Use FirstOrDefault() ... This will exit as soon as it finds a result.
On the other hand SingleOrDefault() searches the entire collection for a single result and throws exception if it finds more than one result for the query
There are some basic steps you can take when optimizing an Entity Framework query.
Here you can find a list of all the options.
The ones I would suggest in your case are:
NoTracking. You only retrieve the entity for getting a value. Not for changing it so you don't need change tracking
Precompile your query. This is an easy step and it really gives you a performance boost
Return the correct amount of data. You are selecting a whole entity just to retrieve it's value property. If you would only select the Value property less data will be returned from SQL Server (BTW, your code will throw a Null Exception if a CmsOption can't be found because of the First/Single -OrDefault part. If you really know there should be at least one entity remove the OrDefault part)
If you are using Visual Studio Premium you could put this code in a Unit Test and then profile the unit test. After each change you can compare the reports to make sure you are improving performance.

Entity Framework include vs where

My database structure is this: an OptiUser belongs to multiple UserGroups through the IdentityMap table, which is a matching table (many to many) with some additional properties attached to it. Each UserGroup has multiple OptiDashboards.
I have a GUID string which identifies a particular user (wlid in this code). I want to get an IEnumerable of all of the OptiDashboards for the user identified by wlid.
Which of these two Linq-to-Entities queries is the most efficient? Do they run the same way on the back-end?
Also, can I shorten option 2's Include statements to just .Include("IdentityMaps.UserGroup.OptiDashboards")?
using (OptiEntities db = new OptiEntities())
{
// option 1
IEnumerable<OptiDashboard> dashboards = db.OptiDashboards
.Where(d => d.UserGroups
.Any(u => u.IdentityMaps
.Any(i => i.OptiUser.WinLiveIDToken == wlid)));
// option 2
OptiUser user = db.OptiUsers
.Include("IdentityMaps")
.Include("IdentityMaps.UserGroup")
.Include("IdentityMaps.UserGroup.OptiDashboards")
.Where(r => r.WinLiveIDToken == wlid).FirstOrDefault();
// then I would get the dashboards through user.IdentityMaps.UserGroup.OptiDashboards
// (through foreach loops...)
}
You may be misunderstanding what the Include function actually does. Option 1 is purely a query syntax which has no effect on what is returned by the entity framework. Option 2, with the Include function instructs the entity framework to Eagerly Fetch the related rows from the database when returns the results of the query.
So option 1 will result in some joins, but the "select" part of the query will be restricted to the OptiDashboards table.
Option 2 will result in joins as well, but in this case it will be returning the results from all the included tables, which obviously is going to introduce more of a performance hit. But at the same time, the results will include all the related entities you need, avoiding the [possible] need for more round-trips to the database.
I think the Include will render as joins an you will the able to access the data from those tables in you user object (Eager Loading the properties).
The Any query will render as exists and not load the user object with info from the other tables.
For best performance if you don't need the additional info use the Any query
As has already been pointed out, the first option would almost certainly perform better, simply because it would be retrieving less information. Besides that, I wanted to point out that you could also write the query this way:
var dashboards =
from u in db.OptiUsers where u.WinLiveIDToken == wlid
from im in u.IdentityMaps
from d in im.UserGroup.OptiDashboards
select d;
I would expect the above to perform similarly to the first option, but you may (or may not) prefer the above form.

Dynamically Generating a Linq/Lambda Where Clause

I've been searching here and Google, but I'm at a loss. I need to let users search a database for reports using a form. If a field on the form has a value, the app will get any reports with that field set to that value. If a field on a form is left blank, the app will ignore it. How can I do this? Ideally, I'd like to just write Where clauses as Strings and add together those that are not empty.
.Where("Id=1")
I've heard this is supposed to work, but I keep getting an error: "could not be resolved in the current scope of context Make sure all referenced variables are in scope...".
Another approach is to pull all the reports then filter it one where clause at a time. I'm hesitant to do this because 1. that's a huge chunk of data over the network and 2. that's a lot of processing on the user side. I'd like to take advantage of the server's processing capabilities. I've heard that it won't query until it's actually requested. So doing something like this
var qry = ctx.Reports
.Select(r => r);
does not actually run the query until I do:
qry.First()
But if I start doing:
qry = qry.Where(r => r.Id = 1).Select(r => r);
qry = qry.Where(r => r.reportDate = '2010/02/02').Select(r => r);
Would that run the query? Since I'm adding a where clause to it. I'd like a simple solution...in the worst case I'd use the Query Builder things...but I'd rather avoid that (seems complex).
Any advice? :)
Linq delays record fetching until a record must be fetched.
That means stacking Where clauses is only adding AND/OR clauses to the query, but still not executing.
Execution of the generated query will be done in the precise moment you try to get a record (First, Any etc), a list of records(ToList()), or enumerate them (foreach).
.Take(N) is not considered fetching records - but adding a (SELECT TOP N / LIMIT N) to the query
No, this will not run the query, you can structure your query this way, and it is actually preferable if it helps readability. You are taking advantage of lazy evaluation in this case.
The query will only run if you enumerate results from it by using i.e. foreach or you force eager evaluation of the query results, i.e. using .ToList() or otherwise force evaluation, i.e evaluate to a single result using i.e First() or Single().
Try checking out this dynamic Linq dll that was released a few years back - it still works just fine and looks to be exactly what you are looking for.

LINQ stack overflow selecting many

I have a scenario where I have a table of "batch" and a table of "test" where "test" contains an FK to "batch" and a many tests can belong to a batch.
I want to be able to select multiple batches and find all tests that belong to them. I do this by producing a list of PKs to the batches I'm interested in and then the following LINQ query:
var ret =
from t in tests
from b in indices //indices is a list of long PK's belonging to selected batches
where t.batch_id == b
select t;
It works but when my selection size exceeds 14 batches, I get a "SQLite error
parser stack overflow" on the LINQ expression regardless of how many tests are found.
I want to be able to handle large selections if possible. How can I do this?
If JeffN825's query doesn't resolve your issue, which I would give high odds on it doing so, you may need to compile your own SQLite and set the -DYYSTACKDEPTH value to something bigger than the default. So you will need to find out what it was set to and then maybe double it and go from there. The full line you would pass is CFLAGS="-DYYSTACKDEPTH=1000" changing the 1000 to be what how deep you want the stack to be.
The LINQ provider may be blowing up because it's trying to issue 1 query per index. You can verify this (if SQL is in fact getting generated at all) by profiling the DB and seeing if it's actually issuing 1 query per index.
Try this instead:
var ret =
from t in tests
where indices.Contains(t.batch_id)
select t;

Categories

Resources