LINQ to Entities: Query not working with certain parameter value - c#

I have a very strange problem with a LINQ to Entities query with EF1.
I have a method with a simple query:
public DateTime GetLastSuccessfulRun(string job)
{
    var entities = GetEntities();
    var query = from jr in entities.JOBRUNS
                where jr.JOB_NAME == job && jr.JOB_INFO == "SUCCESS"
                orderby jr.JOB_END descending
                select jr.JOB_END;
    var result = query.ToList().FirstOrDefault();
    return result.HasValue ? result.Value : default(DateTime);
}
The method GetEntities returns an instance of a class that is derived from System.Data.Objects.ObjectContext and has automatically been created by the EF designer when I imported the schema of the database.
The query worked just fine for the last 15 or 16 months. And it still runs fine on our test system. In the live system however, there is a strange problem: Depending on the value of the parameter job, it returns the correct results or an empty result set, although there is data it should return.
Anyone ever had a strange case like that? Any ideas what could be the problem?
Some more info:
The database we query against is an Oracle 10g database; we are using an enhanced version of the OracleEFProvider v0.2a.
The SQL statement that is returned by ToTraceString works just fine when executed directly via SQL Developer, even with the same parameter that is causing the problem in the LINQ query.
The following also returns the correct result:
entities.JOBRUNS.ToList().Where(x => x.JOB_NAME == job && x.JOB_INFO == "SUCCESS").Count();
The difference here is the call to ToList on the table before applying the where clause. This means two things:
The data is in the database and it is correct.
The problem seems to be the query including the where clause when executed by the EF Provider.
What really stuns me is that this is a live system and the problem occurred without any changes to the database or the program. One call to that method returned the correct result and the next call five minutes later returned the wrong result. And since then, it only returns the wrong results.
Any hints, suggestions, ideas etc. are welcome, no matter how far-fetched they seem! Please post them as answers, so I can vote on them, just for reading my lengthy question and bothering to think about this strange problem... ;-)

First of all, remove ObjectContext caching. The object context internally uses the Unit of Work and Identity Map patterns, and this can have a big impact on queries.

Related

Selecting the first item from a query efficiently

Been reading Microsoft's LINQ docs for a while in search of the correct way to do this. Microsoft's example is the following:
Customer custQuery =
    (from custs in db.Customers
     where custs.CustomerID == "BONAP"
     select custs)
    .First();
This obviously works and it's the obvious way to do it (except for using FirstOrDefault() rather than First()), however to me, it looks like this runs the query and after it's done it selects the first.
Is there a way to return the first result and not continue the query?
"however to me, it looks like this runs the query and after it's done it selects the first"
Nope. The query inside the parentheses returns an IQueryable object, which is basically the representation of a query that hasn't been run yet. Only when you call .First() does it actually process the IQueryable object and translate it into a database query, and without looking I guarantee you it only asks the database for the first item.
However, if you were to write .ToList().First() instead of just .First() (and you see beginners making this mistake in less obvious ways), it would indeed load everything into memory and then pull the first object from it.
But the code you've pasted is perfectly efficient.
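To make the difference visible without a database, here is a minimal sketch using plain LINQ to Objects. It is only an in-memory analogy: the class FirstVersusToList, its counter and its 1000-item source are made up for illustration, and a real LINQ provider would translate First() into SQL rather than pull items one by one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstVersusToList
{
    // Number of items the source has produced so far.
    public static int Produced;

    // A lazily evaluated source (hypothetical stand-in for a table)
    // that counts every item it yields.
    public static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 1000; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Resets the counter, runs the given pipeline over the source,
    // and returns how many items the pipeline pulled from it.
    public static int CountPulled(Func<IEnumerable<int>, int> pipeline)
    {
        Produced = 0;
        pipeline(Source());
        return Produced;
    }
}
```

Calling FirstVersusToList.CountPulled(s => s.Where(x => x > 5).First()) pulls 6 items, because enumeration stops at the first match. Calling it with s => s.Where(x => x > 5).ToList().First() pulls all 1000, because ToList() materializes everything before First() gets to look at it.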

LINQ to Entities chaining commands with differing results

My question is more general, but I have an example to help illustrate:
db.aTable.Where(x => x.Date < someDateInThePast).OrderByDescending(x => x.Date).First()
That gives me one item, which differs from the item returned by this command:
db.aTable.Where(x => x.Date < someDateInThePast).ToList().OrderByDescending(x => x.Date).First()
(note the "ToList()" in the middle).
From what I can see, what is actually happening in the 1st example is the OrderBy is completely disregarding the filtering that is done by the .Where(). It is ordering the entire aTable.
And the 2nd query is giving the actually correct item.
The .Date property is a DateTime type (on the SQL side it is a 'datetime').
Is this behaviour to be expected from LINQ to Entities?
By adding .ToList() you actually change the context in which the data is processed.
Your first query is handled completely by your database, and only the value from .First() is returned to your entity instance.
In the second one, .ToList() first loads the filtered rows of aTable into memory; only THEN are the ordering and .First() applied, to that already materialized list.
Microsoft states that a CLR change might lead to unexpected results, which is what you are doing.
One way to know exactly what is happening is to execute your statement directly on your SQL Server:
Select Top(1) Date
From aTable
Where Date < someDateInThePast
order by Date desc
And then create a dbset for your data up to the point where the context changes:
Select *
From aTable
Where Date < someDateInThePast
order by Date desc
And then call it separately in your C# environment. Then check whether the results still differ.
Hope this helps!
I can't fully explain why it works like this, but I have found that the inclusion of First() is what is causing the issue. When I view the raw SQL that is generated by the LINQ query, there is no reference to 'ORDER BY' in it. I can only assume the ordering happens on the client side. But there is a reference to 'TOP (1)' in the SQL. This means that, because the SQL server returns only 1 result, the ordering is applied to just that 1 result and doesn't do anything useful.
If I change First() to ToList() then the ordering works as expected. This doesn't solve my issue but it explains the behaviour.

How does Entity Framework entity loading work?

I've been playing around with making my own Entity Framework (for personal projects and out of curiosity on what making something like this would take).
While I was doing Entity Framework performance tests with a data table with 700k rows and 5 columns (named MassData), I ran into a peculiar issue that I'm hoping someone could explain to me.
Running the following test:
var Context = new EntityFameworkContext();
var first = Context.MassData.Where(x => x.Id == 1);
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
The context creation takes 35ms, getting first takes about 215ms and getting firstFifty takes 14ms.
Removing 'first', getting 'firstFifty' takes about 210ms.
The results were the same if I switch the 'first' query with a Where() that selects everything (still with no iteration).
My first thought was that this was some case of loading the lazy data in the DbSet, with the first query enumerating data the next one accesses (even though the first one doesn't iterate through anything). This would kind of explain why the first always takes a minimum of 200ms regardless of the query, while the second runs as fast as if no database connection was even involved (the 'firstFifty' takes 25ms minimum to run as an SQL query, more than the 15ms I'm seeing here).
Except loading all of MassData takes 5 seconds. Just reading it takes about 2.5 seconds. So it can't be loading everything, but it's clearly loading more than the first query requires. So obviously I'm missing something.
Would anyone happen to have an explanation for why the
var first = Context.MassData.Where(x => x.Id == 1);
query speeds up the
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
query?
EDIT:
Turns out, it really had nothing to do with lazy loading at all. The first query opens the connection and (I presume) does and stores the validation of the entity type against the database table. The second query then doesn't have to open the connection or do much if any validation, in which case the duration of the second query matches up, and everything makes sense.
EDIT 2:
Modified title to better match what the question ended up really being about (How does lazy-loading work => how does entity loading work).
Because you're still loading the entity type and its prerequisites, regardless of what you're trying to query. A lambda expression for EF still ends up as SQL: the lambda is converted into SQL clauses and statements. So the initial slowness comes not from the query but from EF's initial setup.
Remember, you're still creating an instance of the EF context, so it will eat up some run time. The rest is query fetching time. This is unavoidable, of course, because of how the CLR works.
So, generally, the second query runs fast because the model you set up for EF is still in use. Once garbage collection decides that it is not going to be used anywhere, your query in the next session will again be slower while EF initializes. In other words, your connection to the DB is still open; it is as easy as that.
There are tools to show SQL Server activity, so you do not have to guess (for example, SQL Profiler for Microsoft SQL Server). But the lag on the first query probably has nothing to do with the database; it is just EF's internal initialization. EF is notoriously lazy.

LINQ and selection rows from big database

I have a database, and right now it contains a table with about 100 rows. In the future, though, it will have not 100 but 1,000,000+ rows, so I have to be careful with the web application I'm developing now.
The problem is this: on a web page I need to create a paged list that shows records to the user. Here is a sample of the code that I plan to use:
public IQueryable<MyTable> GetRows(int from, int to)
{
    var queryRes = (from row in SomeDataContext.MyTable
                    orderby row.id
                    select row).AsQueryable();
    return queryRes.Take(to).Skip(from);
}
It is only a sample of code. I did not run it.
But the question is: what will happen in this case? I see two scenarios:
1. It will load all rows from the database to the server side, and only the records in the range from 'from' to 'to' will be returned; the others will be ignored. In this case my application will be in big trouble. Imagine loading 1,000,000 rows from the database every time. It would be a disaster.
2. It will construct a SQL request that returns only the rows I need without loading the others. That's exactly what I need.
I think it will be scenario 2, but I'm not sure and can't check it. Am I correct?
As a side-note, you don't have to call AsQueryable. It is enough to do
var queryRes = SomeDataContext.MyTable.OrderBy(r => r.Id);
return queryRes.Take(to).Skip(from);
And to answer your question: scenario 2 will be executed. You can always check the generated SQL by using the SQL Server Profiler, but in case you are using Entity Framework, you can even do queryRes.ToString(). And as @Aron correctly pointed out, the query will actually be executed against the database only when enumerating the results (e.g. calling queryRes.ToList()).
These questions address the issue of looking up the SQL code in more detail:
How to view generated SQL from Entity Framework?
exact sql query executed by Entity Framework
Strictly speaking, neither 1 nor 2 is correct. Running the code DOES NOT hit the database. It constructs an expression tree. The calling code can still modify the expression tree further without hitting the database.
With the IQueryable interface alone, no SQL is run. Only at the point when you call IEnumerable.GetEnumerator() does the underlying LINQ provider convert the WHOLE expression into a query (in this case a SQL query) and run it.
So for example, with this code, you could have
void Main()
{
    var foo = from x in GetRows(10, 10)
              where x.Id > 1000
              select x;
    foreach (var f in foo)
    {
        // Stuff
    }
}
The SQL that is actually run will be closer to
SELECT a,b,c FROM
    (SELECT a,b,c, ROW_NUMBER() OVER (ORDER BY ...) as row_number
     FROM Table
     WHERE id > 1000) t0
WHERE t0.row_number BETWEEN 10 and 20;
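The "nothing runs until you enumerate" point can be checked with an in-memory stand-in. This is a rough sketch, not how a SQL-backed provider works internally: DeferredQuery, Rows() and GetPage() are hypothetical names, and AsQueryable() over an iterator is used only so that a counter can show when the source is actually read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredQuery
{
    // How many times the underlying "table" has actually been read.
    public static int Enumerations;

    // Hypothetical stand-in for a table with Ids 1..100.
    // The body of an iterator runs only once enumeration starts.
    private static IEnumerable<int> Rows()
    {
        Enumerations++;
        for (int id = 1; id <= 100; id++)
        {
            yield return id;
        }
    }

    // Builds a paged query (an expression tree) but does not run it.
    public static IQueryable<int> GetPage(int skip, int count)
    {
        return Rows().AsQueryable().OrderBy(id => id).Skip(skip).Take(count);
    }
}
```

After var query = DeferredQuery.GetPage(10, 10).Where(id => id > 15); the counter is still 0, because the caller has only extended the expression tree. The source is read once, when query.ToList() is called. In this in-memory version the later Where filters the 10-row page, so the list holds 16 through 20.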
To be honest you are going about this wrong. You don't need a GetRows method. I would directly call the Linq query when constructing the table itself. You should take a look at the IRepository pattern that MVC scaffolding uses.
Finally if this is meant to be called as a WebQuery for AJAX I would look at the two OData implementations in .net (WCF Data Services and WebAPI OData).
You are right.
Scenario 2 is what will happen when the query is eventually executed.
I would suggest reversing Take and Skip, so that you start with Skip:
queryRes.Skip(from).Take(to)
Debugging this method will not make any calls to the database. It just returns the query, not the result.
If you want to test exactly what will happen, try downloading LINQPad; it is a great tool for demystifying LINQ queries.
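One thing to watch when swapping the two calls: Take(n) after Skip(from) takes n rows, not "up to row n". A small in-memory sketch of the difference, where the Paging class and its method names are invented for illustration and 'from' and 'to' are treated as row positions as in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Order used in the question: keep the first 'to' rows, then drop the first 'from'.
    public static List<int> TakeThenSkip(IEnumerable<int> rows, int from, int to)
    {
        return rows.Take(to).Skip(from).ToList();
    }

    // Skip first, then take exactly the page size (to - from).
    public static List<int> SkipThenTake(IEnumerable<int> rows, int from, int to)
    {
        return rows.Skip(from).Take(to - from).ToList();
    }

    // Skip first, then Take(to) unchanged: this returns 'to' rows, not a page ending at 'to'.
    public static List<int> SkipThenTakeTo(IEnumerable<int> rows, int from, int to)
    {
        return rows.Skip(from).Take(to).ToList();
    }
}
```

With Enumerable.Range(1, 100), from = 10 and to = 20, the first two methods both return 11 through 20, while the third returns 20 rows (11 through 30). So if the parameters stay 'from' and 'to', the page size has to be computed when Skip comes first.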

LINQ to Entities query takes long to compile, SQL runs fast

I'm working on a piece of code, written by a coworker, that interfaces with a CRM application our company uses. There are two LINQ to Entities queries in this piece of code that get executed many times in our application, and I've been asked to optimize them because one of them is really slow.
These are the queries:
First query, this one compiles pretty much instantly. It gets relation information from the CRM database, filtering by a list of relation IDs given by the application:
from relation in context.ADRELATION
where ((relationIds.Contains(relation.FIDADRELATION)) && (relation.FLDELETED != -1))
join addressTable in context.ADDRESS on relation.FIDADDRESS equals addressTable.FIDADDRESS
into temporaryAddressTable
from address in temporaryAddressTable.DefaultIfEmpty()
join mailAddressTable in context.ADDRESS on relation.FIDMAILADDRESS equals
mailAddressTable.FIDADDRESS into temporaryMailAddressTable
from mailAddress in temporaryMailAddressTable.DefaultIfEmpty()
select new { Relation = relation, Address = address, MailAddress = mailAddress };
The second query, which takes about 4-5 seconds to compile, and takes information about people from the database (again filtered by a list of IDs):
from role in context.ROLE
join relationTable in context.ADRELATION on role.FIDADRELATION equals relationTable.FIDADRELATION into temporaryRelationTable
from relation in temporaryRelationTable.DefaultIfEmpty()
join personTable in context.PERSON on role.FIDPERS equals personTable.FIDPERS into temporaryPersonTable
from person in temporaryPersonTable.DefaultIfEmpty()
join nationalityTable in context.TBNATION on person.FIDTBNATION equals nationalityTable.FIDTBNATION into temporaryNationalities
from nationality in temporaryNationalities.DefaultIfEmpty()
join titelTable in context.TBTITLE on person.FIDTBTITLE equals titelTable.FIDTBTITLE into temporaryTitles
from title in temporaryTitles.DefaultIfEmpty()
join suffixTable in context.TBSUFFIX on person.FIDTBSUFFIX equals suffixTable.FIDTBSUFFIX into temporarySuffixes
from suffix in temporarySuffixes.DefaultIfEmpty()
where ((rolIds.Contains(role.FIDROLE)) && (relation.FLDELETED != -1))
select new { Role = role, Person = person, relation = relation, Nationality = nationality, Title = title.FTXTBTITLE, Suffix = suffix.FTXTBSUFFIX };
I've set up the SQL Profiler and took the SQL from both queries, then ran it in SQL Server Management Studio. Both queries ran very fast, even with a large (~1000) number of IDs. So the problem seems to lie in the compilation of the LINQ query.
I have tried to use a compiled query, but since those can only contain primitive parameters, I had to strip out the part with the filter and apply that after the Invoke() call, so I'm not sure if that helps much. Also, since this code runs in a WCF service operation, I'm not sure if the compiled query will even still exist on subsequent calls.
Finally what I tried was to only select a single column in the second query. While this obviously won't give me the information I need, I figured it would be faster than the ~200 columns we're selecting now. No such case, it still took 4-5 seconds.
I'm not a LINQ guru at all, so I can barely follow this code (I have a feeling it's not written optimally, but can't put my finger on it). Could anyone give me a hint as to why this problem might be occurring?
The only solution I have left is to manually select all the information instead of joining all these tables. I'd then end up with about 5-6 queries. Not too bad I guess, but since I'm not dealing with horribly inefficient SQL here (or at least an acceptable level of inefficiency), I was hoping to prevent that.
Thanks in advance, hope I made things clear. If not, feel free to ask and I'll provide additional details.
Edit:
I ended up adding associations on my entity framework (the target database didn't have foreign keys specified) and rewriting the query thusly:
context.ROLE.Where(role => rolIds.Contains(role.FIDROLE) && role.Relation.FLDELETED != -1)
.Select(role => new
{
ContactId = role.FIDROLE,
Person = role.Person,
Nationality = role.Person.Nationality.FTXTBNATION,
Title = role.Person.Title.FTXTBTITLE,
Suffix = role.Person.Suffix.FTXTBSUFFIX
});
Seems a lot more readable and it's faster too.
Thanks for the suggestions, I will definitely keep the one about making multiple compiled queries for different numbers of arguments in mind!
Gabriel's answer is correct: Use a compiled query.
It looks like you are compiling it again for every WCF request which of course defeats the purpose of one-time initialization. Instead, put the compiled query into a static field.
Edit:
Do this: Send maximum load to your service and pause the debugger 10 times. Look at the call stack. Did it stop more often in L2S code or in ADO.NET code? This will tell you if the problem is still with L2S or with SQL Server.
Next, let's fix the filter. We need to push it back into the compiled query. This is only possible by transforming this:
rolIds.Contains(role.FIDROLE)
to this:
role.FIDROLE == rolIds_0 || role.FIDROLE == rolIds_1 || ...
You need a new compiled query for every cardinality of rolIds. This is nasty, but it is necessary to get it to compile. In my project, I have automated this task but you can do a one-off solution here.
I guess most queries will have very few role IDs, so you can materialize 10 compiled queries for cardinalities 1-10, and if the cardinality exceeds 10 you fall back to client-side filtering.
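To show the shape of the rewritten filter, here is a rough sketch that builds the "== id0 || == id1 || ..." predicate with System.Linq.Expressions. Note the assumptions: OrPredicate and BuildEqualsAny are made-up names, and the IDs are embedded as constants, so this only illustrates how Contains unrolls into an OR chain. It is not the parameterized CompiledQuery.Compile approach itself, which would need one query parameter per ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class OrPredicate
{
    // Builds x => selector(x) == ids[0] || selector(x) == ids[1] || ...
    // (hypothetical helper; IDs are embedded as constants, not parameters).
    public static Expression<Func<T, bool>> BuildEqualsAny<T>(
        Expression<Func<T, int>> selector, IReadOnlyList<int> ids)
    {
        if (ids.Count == 0)
        {
            // An empty ID list matches nothing.
            return x => false;
        }

        Expression body = null;
        foreach (int id in ids)
        {
            // selector(x) == id
            Expression equals = Expression.Equal(selector.Body, Expression.Constant(id));
            body = body == null ? equals : Expression.OrElse(body, equals);
        }

        // Reuse the selector's parameter so the body stays bound to it.
        return Expression.Lambda<Func<T, bool>>(body, selector.Parameters);
    }
}

// Hypothetical entity with only the key used above.
public class Role
{
    public int FIDROLE { get; set; }
}
```

For example, OrPredicate.BuildEqualsAny<Role>(r => r.FIDROLE, new[] { 1, 3 }) yields a predicate with two comparisons joined by OrElse. The number of comparisons equals the number of IDs, which is why each cardinality needs its own compiled query.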
If you decide to keep the query inside the code, you could compile it. You still have to compile the query once when you run your app, but all subsequent calls will use that already compiled query. You can take a look at the MSDN help here: http://msdn.microsoft.com/en-us/library/bb399335.aspx.
Another option would be to use a stored procedure and call the procedure from your code. Hence no compile time.
