Entity Framework - behind the scenes: DataReaders and connection life period - c#

Another question regarding EF:
I was wondering what's going behind the scenes when iterating over a query result.
For example, check out the following code:
var activeSources = from e in entitiesContext.Sources
where e.IsActive
select e;
and then:
foreach (Source currSource in allSources)
{
code based on the current source...
}
Important note: Each iteration takes a while to complete (from 1 to 25 seconds).
Now, I assume EF is based on DataReaders for maximum efficiency, so based on that assumption, I figure that in the above case, the Database connection will be kept open until I finish iterating over the results, which will be a very long time (when talking in terms of code), which is something I obviously don't want.
Is there a way to fetch the entire data like I would've done with plain old ADO.NET DataAdapters, DataSets and the fill() method instead of using DataReaders?
Or maybe i'm way off with my assumptions?
In any case I would've loved to be pointed to a good source explaining this if available.
Thanks,
Mikey

If you want to get all of the data up front, similar to Fill(), you need to force the query to execute.
var activeSources = from e in entitiesContext.Sources
where e.IsActive
select e;
var results = activeSources.ToList();
After ToList() is called you will have the data and be disconnected from the database.

If you want to return all results at once use .ToList(); Then deferred execution won't happen.
var activeSources = (from e in entitiesContext.Sources
where e.IsActive
select e).ToList();

Related

Slow LINQ Performance on DataTable Where Clause?

I'm dumping a table out of MySQL into a DataTable object using MySqlDataAdapter. Database input and output is doing fine, but my application code seems to have a performance issue I was able to track down to a specific LINQ statement.
The goal is simple, search the contents of the DataTable for a column value matching a specific string, just like a traditional WHERE column = 'text' SQL clause.
Simplified code:
foreach (String someValue in someList) {
String searchCode = OutOfScopeFunction(someValue);
var results = emoteTable.AsEnumerable()
.Where(myRow => myRow.Field<String>("code") == searchCode)
.Take(1);
if (results.Any()) {
results.First()["columnname"] = 10;
}
}
This simplified code is executed thousands of times, once for each entry in someList. When I run Visual Studio Performance Profiler I see that the "results.Any()" line is highlighted as consuming 93.5% of the execution time.
I've tried several different methods for optimizing this code, but none have improved performance while keeping the emoteTable DataTable as the primary source of the data. I can convert emoteTable to Dictionary<String, DataRow> outside of the foreach, but then I have to keep the DataTable and the Dictionary in sync, which while still a performance improvement, feels wrong.
Three questions:
Is this the proper way to search for a value in a DataTable (equivalent of a traditional SQL WHERE clause)? If not, how SHOULD it be done?
Addendum to 1, regardless of the proper way, what is the fastest (execution time)?
Why does the results.Any() line consume 90%+ resources? In this situation it makes more sense that the var results line should consume the resources, after all, it's the line doing the actual search, right?
Thank you for your time. If I find an answer I shall post it here as well.
Any() is taking 90% of the time because the result is only executed when you call Any(). Before you call Any(), the query is not actually made.
It would seem the problem is that you first fetch entire table into the memory and then search. You should instruct your database to search.
Moreover, when you call results.First(), the whole results query is executed again.
With deferred execution in mind, you should write something like
var result = emoteTable.AsEnumerable()
.Where(myRow => myRow.Field<String>("code") == searchCode)
.FirstOrDefault();
if (result != null) {
result["columnname"] = 10;
}
What you have implemented is pretty much join :
var searchCodes = someList.Select(OutOfScopeFunction);
var emotes = emoteTable.AsEnumerable();
var results = Enumerable.Join(emotes, searchCodes, e=>e, sc=>sc.Field<String>("code"), (e, sc)=>sc);
foreach(var result in results)
{
result["columnname"] = 10;
}
Join will probably optimize the access to both lists using some kind of lookup.
But first thing I would do is to completely abandon idea of combining DataTable and LINQ. They are two different technologies and trying to assert what they might do inside when combined is hard.
Did you try doing raw UPDATE calls? How many items are you expecting to update?

EF Linq QUery cause lock compared to SQL

I have a simple count query using LINQ and EF:
var count = (from I in db.mytable
where xyz
select I).Count();
the code above shows the query being locked in the database.
while the execute sql executes right away:
var count = db.SqlQuery<int>("select count(*) from mytable where xyz").FirstOrDefault();
the code above returns immediately.
I few have suggested to remove the .ToList() which I did and not difference. One thing is that this only happens on the PROD server. The QA server executes pretty fast as expected. But the prod server shows that it gets suspended. I suspect this could be a data storage limitation or server related. But wanted to make sure I am not doing something stupid in the code.
UPDATE:
One thing I noticed is the first time it execute is takes longer the first time. When I set next statement to run it again, it executes immediately. Is there a compile of the query the first time?
Because you are calling ToList in the first query and that causes fetching all records from DB and do the counting in memory. Instead of ToList you can just call Count() to get the same behaviour:
var count = (from I in db.mytable
where xyz
select I).Count();
You must not call .ToList() method, because you start retrieve all objects from database.
Just call .Count()
var count = (from I in db.mytable
where xyz
select I).Count();
Count can take a predicate. I'm not sure if it will speed up your code any but you can write the count as such.
var count = db.mytable.Count(x => predicate);
Where predicate is whatever you are testing for in your where clause.
Simple fiddling in LINQPad shows that this will generate similar, if not exactly the same, SQL as above. This is about the simplest way, in terseness of code, that I know how to do it.
If you need much higher speeds than what EF provides, yet stay in the confines of EF without using inline SQL you could make a stored procedure and call it from EF.

C# Query can change independently in context

I have a difficult question. I have seen that the number of the record in a query can change without re-run the same query.
The code below shows the scenario:
using (var db = new MyContext()) {
var query = from e in db.Entities select e;
//here the query.Count is equals to 100 for example
Thread.Sleep(10000);
//after some times the db has been populated
//here the query.Count is equals to 200 for example without run again the query
}
My question is: why this behaviour? why it seems to be an automatic binding between the query result and the data layer? Entity framework works in background in order to update the query result?
Thanks in advance.
Remember that with an IQueryable, thanks to deferred execution, the query will be evaluated and executed against the database every time that you enumerate it, that is, when you run .Count(), .ToList() etc.
If in doubt, use a profiler, such as MiniProfiler or EF Profiler to understand exactly when you are hitting the database.

Improving Linq query

I have the following query:
if (idUO > 0)
{
query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
query = query.Where(b => b.DependencyId == dependencyId );
}
else
{
var dependencyIds = dependencies.Select(d => d.Id).ToList();
query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket object, that has 23 properties, 5 of them are relationships (FKs) and i fill 3 of these relationships with lazy loading
When I access the first page, its OK, the query takes less than one 1 second, but when I try to access the page 30000, the query takes more than 20 seconds. There is a way in the linq query, that I can improve the performance of the query? Or only in the database level? And in the database level, for this kind of query, which is the best way to improve the performance?
There is no much space here, imo, to make things better (at least looking on the code provided).
When you're trying to achieve a good performance on such numbers, I would recommend do not use LINQ at all, or at list use it on the stuff with smaler data access.
What you can do here, is introduce paging of that data on DataBase level, with some stored procedure, and invoke it from your C# code.
1- Create a view in DB which orders items by date including all related relationships, like Products etc.
2- Create a stored procedure querying this view with related parameters.
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about Indexes that you should add.. This has had great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why;
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open sql profiler and view the random mess of sql it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on Compiled Queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an Object Query with explicit Joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred Loading can be a real performance killer.

Dynamically Generating a Linq/Lambda Where Clause

I've been searching here and Google, but I'm at a loss. I need to let users search a database for reports using a form. If a field on the form has a value, the app will get any reports with that field set to that value. If a field on a form is left blank, the app will ignore it. How can I do this? Ideally, I'd like to just write Where clauses as Strings and add together those that are not empty.
.Where("Id=1")
I've heard this is supposed to work, but I keep getting an error: "could not be resolved in the current scope of context Make sure all referenced variables are in scope...".
Another approach is to pull all the reports then filter it one where clause at a time. I'm hesitant to do this because 1. that's a huge chunk of data over the network and 2. that's a lot of processing on the user side. I'd like to take advantage of the server's processing capabilities. I've heard that it won't query until it's actually requested. So doing something like this
var qry = ctx.Reports
.Select(r => r);
does not actually run the query until I do:
qry.First()
But if I start doing:
qry = qry.Where(r => r.Id = 1).Select(r => r);
qry = qry.Where(r => r.reportDate = '2010/02/02').Select(r => r);
Would that run the query? Since I'm adding a where clause to it. I'd like a simple solution...in the worst case I'd use the Query Builder things...but I'd rather avoid that (seems complex).
Any advice? :)
Linq delays record fetching until a record must be fetched.
That means stacking Where clauses is only adding AND/OR clauses to the query, but still not executing.
Execution of the generated query will be done in the precise moment you try to get a record (First, Any etc), a list of records(ToList()), or enumerate them (foreach).
.Take(N) is not considered fetching records - but adding a (SELECT TOP N / LIMIT N) to the query
No, this will not run the query, you can structure your query this way, and it is actually preferable if it helps readability. You are taking advantage of lazy evaluation in this case.
The query will only run if you enumerate results from it by using i.e. foreach or you force eager evaluation of the query results, i.e. using .ToList() or otherwise force evaluation, i.e evaluate to a single result using i.e First() or Single().
Try checking out this dynamic Linq dll that was released a few years back - it still works just fine and looks to be exactly what you are looking for.

Categories

Resources