C# Query can change independently in context - c#

I have a difficult question. I have seen that the number of records returned by a query can change without re-running the same query.
The code below shows the scenario:
using (var db = new MyContext()) {
    var query = from e in db.Entities select e;
    // here query.Count() returns 100, for example
    Thread.Sleep(10000);
    // in the meantime, more rows have been added to the database
    // here query.Count() returns 200, for example, without the query being defined again
}
My question is: why does this happen? Why does there seem to be an automatic binding between the query result and the data layer? Does Entity Framework work in the background to update the query result?
Thanks in advance.

Remember that with an IQueryable, thanks to deferred execution, the query will be evaluated and executed against the database every time that you enumerate it, that is, when you run .Count(), .ToList() etc.
If in doubt, use a profiler, such as MiniProfiler or EF Profiler to understand exactly when you are hitting the database.
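
The same effect can be reproduced without a database. The following is a minimal sketch, assuming an in-memory list as a stand-in for the table (the class name DeferredCountDemo is made up for illustration); it only shows that each call to Count() runs the query again against the current data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredCountDemo
{
    // Returns the counts observed before and after the source grows.
    public static (int Before, int After) Run()
    {
        // Assumption: this list plays the role of the database table.
        var table = new List<int>(Enumerable.Range(1, 100));

        // Defining the query does not execute it.
        IEnumerable<int> query = from e in table select e;

        int before = query.Count();                  // first execution: 100 rows

        table.AddRange(Enumerable.Range(101, 100));  // "the db has been populated"

        int after = query.Count();                   // second execution: 200 rows
        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine($"{before} then {after}"); // prints "100 then 200"
    }
}
```

To keep a stable result, materialize it once (for example with ToList()) and count the resulting list instead of the query.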

Related

Performance of IQueryable

I needed to write a dynamic query against the customer database to obtain a few fields of a customer.
The code is as follows:
[Route("api/getBasicCustList/{argType}/{argValue}")]
[HttpGet]
[Authorize]
public dynamic getCustomerDataUsername(String argType, String argValue)
{
    IQueryable<CustomerDTO> query =
        (from recordset in db.Customers
         select new CustomerDTO
         {
             companyId = recordset.Company.Id,
             contactNum = recordset.ContactNum,
             username = recordset.UserName,
             emailAddress = recordset.Email,
             fullName = recordset.FullName,
             accountNumber = recordset.RCustId
         }
        );
    switch (argType)
    {
        case "username":
            query = query.Where(c => c.username.StartsWith(argValue));
            break;
        case "contactnum":
            long mobNum = Int64.Parse(argValue);
            query = query.Where(c => c.contactNum == mobNum);
            break;
        case "fullname":
            query = query.Where(c => c.fullName.Contains(argValue));
            break;
    }
    return new { data = query.ToList() };
}
This works fine and solves my problem.
My question is: when I write the first part of the query to get all the customer records and later apply the where condition dynamically, will all the results be brought into memory, or is the complete query generated and executed at the database in one shot?
Since I have just 500 records as of now, I cannot see any performance lag, but in production I will be dealing with at least 200,000 to 300,000 records.
OK, the answer is:
The query won't be executed until you reach that "ToList" at the end of your method.
The MSDN link shared by @GeorgPatscheider says:
At what point query expressions are executed can vary. LINQ queries are always executed when the query variable is iterated over, not when the query variable is created. This is called deferred execution.
Deferred execution enables multiple queries to be combined or a query to be extended. When a query is extended, it is modified to include the new operations, and the eventual execution will reflect the changes.
It also says that queries using any of Average, Count, First, or Max are executed immediately.
Thanks.
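
The difference between extending a query and executing it can be sketched with LINQ to Objects. This is an illustration only, assuming a hypothetical Customers() iterator in which each enumeration stands in for one database round trip; the counter records when the source is actually read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExecutionTimingDemo
{
    // How many times the source has actually been read.
    public static int Enumerations;

    // Hypothetical source: the body runs only when enumeration starts.
    private static IEnumerable<string> Customers()
    {
        Enumerations++;
        yield return "alice";
        yield return "bob";
        yield return "alan";
    }

    // Returns the counter after each step: define + extend, ToList, Count.
    public static int[] Run()
    {
        Enumerations = 0;
        var log = new List<int>();

        IEnumerable<string> query = Customers();
        // Extending the query with Where still executes nothing.
        query = query.Where(n => n.StartsWith("al", StringComparison.Ordinal));
        log.Add(Enumerations);               // 0

        List<string> list = query.ToList();  // executes the query
        log.Add(Enumerations);               // 1

        int count = query.Count();           // executes it again, immediately
        log.Add(Enumerations);               // 2

        return log.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // prints "0,1,2"
    }
}
```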
The biggest factor in improving the performance of a query against large tables is doing the filtering on the server (database) side. In Entity Framework 6.x and earlier, a query would fail (EF throws a NotSupportedException when it runs the query) if EF could not convert the entire query to SQL. In EF Core 1.x and 2.x, this is no longer the case. Instead, as much of the query as possible is converted to SQL and the rest is evaluated on the client side. (EF Core 3.0 and later throw an exception again, except when the untranslatable part is in the final projection.)
All three of your filtering lambda expressions can be converted to SQL. However, if you were to write a predicate that can't be converted, then on those EF Core versions your performance would suffer. All the records in the Customers table would be sent to the client for filtering, even though the evaluation of the query is still deferred until ToList() is called. Your logs would contain a warning, but that is easy to miss.
A good reference for this is Jon Smith's article Entity Framework Core: Client vs. Server evaluation.

EF LINQ query causes lock compared to SQL

I have a simple count query using LINQ and EF:
var count = (from I in db.mytable
             where xyz
             select I).Count();
The code above shows the query being locked in the database,
while the raw SQL version executes right away:
var count = db.SqlQuery<int>("select count(*) from mytable where xyz").FirstOrDefault();
The code above returns immediately.
A few people have suggested removing the .ToList(), which I did, and it made no difference. One thing to note is that this only happens on the PROD server. The QA server executes the query quickly, as expected, but the PROD server shows it as suspended. I suspect this could be a data storage limitation or something server related, but I wanted to make sure I am not doing something stupid in the code.
UPDATE:
One thing I noticed is that the query takes longer the first time it executes. When I set the next statement to run it again, it executes immediately. Is the query compiled the first time?
Because you are calling ToList in the first query, all records are fetched from the DB and the counting is done in memory. Instead of ToList you can just call Count() to get the same result:
var count = (from I in db.mytable
             where xyz
             select I).Count();
You must not call the .ToList() method, because it retrieves all the objects from the database.
Just call .Count():
var count = (from I in db.mytable
             where xyz
             select I).Count();
Count can take a predicate. I'm not sure if it will speed up your code any but you can write the count as such.
var count = db.mytable.Count(x => predicate);
Where predicate is whatever you are testing for in your where clause.
Simple fiddling in LINQPad shows that this will generate similar, if not exactly the same, SQL as above. This is about the simplest way, in terseness of code, that I know how to do it.
If you need much higher speeds than what EF provides, yet stay in the confines of EF without using inline SQL you could make a stored procedure and call it from EF.

Reusing LINQ query results in another LINQ query without re-querying the database

I have a situation where my application constructs a dynamic LINQ query using PredicateBuilder based on user-specified filter criteria (aside: check out this link for the best EF PredicateBuilder implementation). The problem is that this query usually takes a long time to run and I need the results of this query to perform other queries (i.e., joining the results with other tables). If I were writing T-SQL, I'd put the results of the first query into a temporary table or a table variable and then write my other queries around that. I thought of getting a list of IDs (e.g., List<Int32> query1IDs) from the first query and then doing something like this:
var query2 = DbContext.TableName.Where(x => query1IDs.Contains(x.ID))
This will work in theory; however, the number of IDs in query1IDs can be in the hundreds or thousands (and the LINQ expression x => query1IDs.Contains(x.ID) gets translated into a T-SQL "IN" statement, which is bad for obvious reasons) and the number of rows in TableName is in the millions. Does anyone have any suggestions as to the best way to deal with this kind of situation?
Edit 1: Additional clarification as to what I'm doing.
Okay, I'm constructing my first query (query1) which just contains the IDs that I'm interested in. Basically, I'm going to use query1 to "filter" other tables. Note: I am not using a ToList() at the end of the LINQ statement---the query is not executed at this time and no results are sent to the client:
var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID)
Then I take query1 and use it to filter another table (TableName2). I now put ToList() at the end of this statement because I want to execute it and bring the results to the client:
var query2 = (from a in DbContext.TableName2 join b in query1 on a.ID equals b.ID select new { b.Column1, b.column2, b.column3,...,b.columnM }).ToList();
Then I take query1 and re-use it to filter yet another table (TableName3), execute it and bring the results to the client:
var query3 = (from a in DbContext.TableName3 join b in query1 on a.ID equals b.ID select new { b.Column1, b.column2, b.column3,...,b.columnM }).ToList();
I can keep doing this for as many queries as I like:
var queryN = (from a in DbContext.TableNameN join b in query1 on a.ID equals b.ID select new { b.Column1, b.column2, b.column3,...,b.columnM }).ToList();
The Problem: query1 takes a long time to execute. When I execute query2, query3...queryN, query1 is being executed (N-1) times...this is not a very efficient way of doing things (especially since query1 isn't changing). As I said before, if I were writing T-SQL, I would put the result of query1 into a temporary table and then use that table in the subsequent queries.
Edit 2:
I'm going to give the credit for answering this question to Albin Sunnanbo for his comment:
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
I think that's really the best that one can do with Entity Framework. In the end, if the performance gets really bad, I'll probably go with John Wooley's suggestion:
This may be a situation where dropping to native ADO against a stored proc returning multiple results and using an internal temp table might be your best option for this operation. Use EF for the other 90% of your app.
Thanks to everyone who commented on this post...I appreciate everyone's input!
If TableName is small enough to load in full, you can use
var tableNameById = DbContext.TableName.ToDictionary(x => x.ID);
to fetch the whole table and automatically put it in a local Dictionary with ID as the key.
Another way is to just "force" the LINQ evaluation with .ToList(); in that case you fetch the whole table and do the Where part locally with LINQ to Objects.
var query1Lookup = new HashSet<int>(query1IDs);
var query2 = DbContext.TableName.ToList().Where(x => query1Lookup.Contains(x.ID));
Edit:
Storing a list of IDs from one query in a list and using that list as a filter in another query can usually be rewritten as a join.
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
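
As a rough illustration of that rewrite, here is a sketch using LINQ to Objects, assuming two small in-memory lists with made-up contents in place of the real tables. It shows that the Contains filter and the join select the same rows when the IDs from the first query are unique; with Entity Framework the join form would be composed into a single SQL statement instead of passing an ID list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinInsteadOfContainsDemo
{
    public static (List<string> ViaContains, List<string> ViaJoin) Run()
    {
        // Assumption: hypothetical in-memory stand-ins for TableName1 and TableName2.
        var table1 = new List<(int ID, string Name)> { (1, "a"), (2, "b"), (3, "c") };
        var table2 = new List<(int ID, string Name)> { (2, "x"), (3, "y"), (4, "z") };

        // The "expensive" first query: only the IDs of interest, still deferred.
        IEnumerable<int> query1 = table1.Where(r => r.ID >= 2).Select(r => r.ID);

        // Variant 1: materialize the IDs, then filter with Contains.
        List<int> ids = query1.ToList();
        List<string> viaContains = table2
            .Where(r => ids.Contains(r.ID))
            .Select(r => r.Name)
            .ToList();

        // Variant 2: join directly against the deferred query.
        List<string> viaJoin =
            (from a in table2
             join id in query1 on a.ID equals id
             select a.Name).ToList();

        return (viaContains, viaJoin);
    }

    public static void Main()
    {
        var (viaContains, viaJoin) = Run();
        Console.WriteLine(string.Join(",", viaContains)); // prints "x,y"
        Console.WriteLine(string.Join(",", viaJoin));     // prints "x,y"
    }
}
```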
Since you are running a subsequent query off the results, take your first query and use it as a View on your SQL Server, add the view to your context, and build your LINQ queries against the view.
Have you considered composing your query as per this article (using the decorator design pattern):
Composed LINQ Queries using the Decorator Pattern
The premise is that, instead of enumerating your first (very costly) query, you basically use the decorator pattern to produce a chain of IQueryable that is a result of query 1 and query N. This way you always execute the filtered form of the query.
Hope this might help

Entity Framework - behind the scenes: DataReaders and connection life period

Another question regarding EF:
I was wondering what's going behind the scenes when iterating over a query result.
For example, check out the following code:
var activeSources = from e in entitiesContext.Sources
                    where e.IsActive
                    select e;
and then:
foreach (Source currSource in activeSources)
{
    // code based on the current source...
}
Important note: Each iteration takes a while to complete (from 1 to 25 seconds).
Now, I assume EF is based on DataReaders for maximum efficiency, so based on that assumption, I figure that in the above case, the Database connection will be kept open until I finish iterating over the results, which will be a very long time (when talking in terms of code), which is something I obviously don't want.
Is there a way to fetch the entire data like I would've done with plain old ADO.NET DataAdapters, DataSets and the fill() method instead of using DataReaders?
Or maybe I'm way off with my assumptions?
In any case, I would love to be pointed to a good source explaining this, if one is available.
Thanks,
Mikey
If you want to get all of the data up front, similar to Fill(), you need to force the query to execute.
var activeSources = from e in entitiesContext.Sources
where e.IsActive
select e;
var results = activeSources.ToList();
After ToList() is called you will have the data and be disconnected from the database.
If you want to return all results at once use .ToList(); Then deferred execution won't happen.
var activeSources = (from e in entitiesContext.Sources
where e.IsActive
select e).ToList();
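
The effect of ToList() on the lifetime of the underlying reader can be sketched without a database. The example below is only an illustration: ReadSources() is a hypothetical iterator that records "open" when enumeration starts and "close" when it ends, mimicking the streaming behaviour the question assumes. It does not show what Entity Framework does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConnectionLifetimeDemo
{
    public static List<string> Events = new List<string>();

    // Hypothetical stand-in for a data reader over the Sources table.
    private static IEnumerable<int> ReadSources()
    {
        Events.Add("open");
        try
        {
            for (int i = 1; i <= 2; i++)
                yield return i;
        }
        finally
        {
            Events.Add("close");   // runs when enumeration finishes
        }
    }

    // Slow work interleaved with reading: the "reader" stays open throughout.
    public static List<string> Streamed()
    {
        Events = new List<string>();
        foreach (int s in ReadSources())
            Events.Add("work " + s);
        return Events;             // open, work 1, work 2, close
    }

    // ToList() drains the "reader" first; the slow work happens afterwards.
    public static List<string> Buffered()
    {
        Events = new List<string>();
        List<int> results = ReadSources().ToList();
        foreach (int s in results)
            Events.Add("work " + s);
        return Events;             // open, close, work 1, work 2
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Streamed()));
        Console.WriteLine(string.Join(", ", Buffered()));
    }
}
```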

At what point does LINQ to SQL or LINQ send a request to the database

I want to make my queries better but have been unable to find a resource that lays out when a query is shipped off to the db.
DBContext db = new DBContext();
Order _order = (from o in db
where o.OrderID == "qwerty-asdf-xcvb"
select o).FirstOrDefault();
String _custName = _order.Customer.Name +" "+_order.Customer.Surname;
Does the assignment of _custName need to make any request to the database?
Does the assignment of _custName need to make any request to the database?
It depends on whether or not Order.Customer is lazily loaded. If it is lazily loaded, then yes. Otherwise, no.
By the way, you can investigate this easily if you set the DataContext.Log property:
db.Log = Console.Out;
Then you can watch the SQL statements on the console. By stepping through your program you can see exactly when the SQL statement hits the database.
Check out MSDN on Deferred versus Immediate Loading. In particular, you can turn off lazy loading. Watch out for the SELECT N + 1 problem.
Just FYI, besides lazy loading, there is another reason why database activity may not occur when you expect it to when using LINQ. For example, if I change your example code slightly:
DBContext db = new DBContext();
var orders = (from o in db
              where o.OrderID == "qwerty-asdf-xcvb"
              select o);
var order = orders.FirstOrDefault();
String _custName = order.Customer.Name + " " + order.Customer.Surname;
Someone unfamiliar with how LINQ works may expect that all orders are retrieved from the database when the second line of code is executed. In fact, LINQ delays querying the database until the last possible moment, which in this case is the call to FirstOrDefault. Of course, at this point LINQ knows to only retrieve at most one record.
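
The timing can be observed with a small LINQ to Objects sketch, assuming a hypothetical Orders() iterator that counts how many rows it hands out. With a real database the filtering and the "at most one record" limit happen in SQL; this example only shows that nothing is read until FirstOrDefault is called and that reading stops at the first match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstOrDefaultDemo
{
    // How many rows the source has handed out so far.
    public static int RowsRead;

    // Hypothetical order IDs; only the second one matches the filter.
    private static IEnumerable<string> Orders()
    {
        foreach (var id in new[] { "a", "qwerty-asdf-xcvb", "c", "d" })
        {
            RowsRead++;
            yield return id;
        }
    }

    public static (int AfterDefine, int AfterFirst, string Found) Run()
    {
        RowsRead = 0;

        // Defining the query reads nothing.
        var orders = from o in Orders()
                     where o == "qwerty-asdf-xcvb"
                     select o;
        int afterDefine = RowsRead;              // 0

        // Execution happens here and stops at the first match.
        string found = orders.FirstOrDefault();
        int afterFirst = RowsRead;               // 2

        return (afterDefine, afterFirst, found);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine($"{r.AfterDefine} {r.AfterFirst} {r.Found}"); // prints "0 2 qwerty-asdf-xcvb"
    }
}
```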
