LINQ join behaving oddly - c#

I am attempting to perform a join between two tables and limit results by 3 conditions. 2 of the conditions belong to the primary table, the third condition belongs to the secondary table. Here is the query I'm attempting:
var articles = (from article in this.Context.contents
                join meta in this.Context.content_meta on article.ID equals meta.contentID
                where meta.metaID == 1 && article.content_statusID == 1 && article.date_created > created
                orderby article.date_created ascending
                select article.content_text_key);
It is meant to join the two tables by the contentID, then filter based on the metaID (type of article), statusID, and then get all articles that are greater than the datetime created. The problem is that it returns 2 records (out of 4 currently). One has a date_created less than created and the other is the record that produced created in the first place (thus equal).
By removing the join and the where clause for the meta, the result produces no records (expected). What I can't understand is that when I translate this join into regular SQL it works just fine. Obviously I'm misunderstanding what the functionality of join is in this context. What would cause this behavior?
Edit:
Having tried this in LINQPad, I've noticed that LINQPad provides the expected results. I have tried these queries separately in code, and the odd results only appear once the join is added. It appears to happen on any date comparison where the record falls on the same day as the limiter.

I can't seem to add a comment, but in debug mode you should be able to put a breakpoint on this line of code. When you do, hovering over the query should show you the SQL that LINQ generates. Please post that SQL.

At your suggestion, I'm posting my comment as the answer:
"It might also help to see your schema. The data types for metaID,
content_statusID, and date_created might come into play as well -- and
it's easy for me (somebody who's unfamiliar with your code) to make
assumptions about those data types."

Related

LINQ to Entities chaining commands with differing results

My question is more general, but I have an example to help illustrate:
db.aTable.Where(x => x.Date < someDateInThePast).OrderByDescending(x => x.Date).First()
That gives me one item, which differs from the item returned by this command:
db.aTable.Where(x => x.Date < someDateInThePast).ToList().OrderByDescending(x => x.Date).First()
(note the "ToList()" in the middle).
From what I can see, what is actually happening in the 1st example is the OrderBy is completely disregarding the filtering that is done by the .Where(). It is ordering the entire aTable.
And the 2nd query is giving the actually correct item.
The .Date parameter is a DateTime type (on SQL side it is a 'datetime').
Is this behaviour to be expected from LINQ to Entities?
By adding .ToList() you actually change the context in which the data is processed.
Your first query is handled by your database completely and you only return the value from .First() to your Entity-Instance.
In the second one, .ToList() tells it to load the filtered aTable rows into memory first, and only THEN are the ordering and .First() applied to that already materialized list.
Microsoft states that a CLR change might lead to unexpected results, which is what you are doing.
One way to know exactly what is happening would be that you execute your statement on your SQL Server directly:
Select Top(1) Date
From aTable
Where Date < someDateInThePast
order by Date desc
And then create a dbset for your data up to the point where the context changes:
Select *
From aTable
Where Date < someDateInThePast
order by Date desc
And then call it separately in your c# environment. Then check whether the results still differ.
Hope this helps!
I can't fully explain why it works like this, but I have found that the inclusion of First() is what is causing the issue. When I view the raw SQL that is generated by the LINQ, there is no reference to 'Order By' in it. I can only assume the ordering happens on the client side. But there is a reference to 'TOP (1)' in the SQL. This means that, because the SQL server is only returning 1 result, the ordering is applied to just 1 result and doesn't do anything useful.
If I change First() to ToList() then the ordering works as expected. This doesn't solve my issue but it explains the behaviour.
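As a point of comparison, here is a minimal in-memory sketch (LINQ to Objects, no database) of the two query shapes from the question. The rows and the cutoff date are hypothetical sample data. In memory both shapes pick the same row, so a difference seen against the database would point at how the provider translates the query rather than at LINQ's operator semantics.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for aTable; only the Date column matters here.
var rows = new[]
{
    new { Id = 1, Date = new DateTime(2020, 1, 1) },
    new { Id = 2, Date = new DateTime(2021, 6, 15) },
    new { Id = 3, Date = new DateTime(2022, 3, 10) },
    new { Id = 4, Date = new DateTime(2023, 9, 5) },
};
var someDateInThePast = new DateTime(2022, 12, 31);

// Shape of the first query: filter, sort, take the first row.
var direct = rows
    .Where(x => x.Date < someDateInThePast)
    .OrderByDescending(x => x.Date)
    .First();

// Shape of the second query: materialize after the filter, then sort.
var viaList = rows
    .Where(x => x.Date < someDateInThePast)
    .ToList()
    .OrderByDescending(x => x.Date)
    .First();

// Both variables refer to the newest row that is older than the cutoff.
Console.WriteLine($"{direct.Id} {viaList.Id}");
```

Comparing this against the SQL the provider actually emits for the first shape is one way to narrow down where the results start to diverge.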

How are join operators unnecessary in LINQ to SQL?

I've periodically seen it written that joins are unnecessary in LINQ to SQL. Most recently, I saw this statement in one of Joseph Albahari's LINQPad samples (Chapter 9 - LINQ Operators > Filtering > Joining > Simple Join).
The comment says:
// Note: before delving into this section, make sure you've read the preceding two
// sections: Select and SelectMany. The Join operators are actually unnecessary
// in LINQ to SQL, and the equivalent of SQL inner and outer joins is most easily
// achieved in LINQ to SQL using Select/SelectMany and subqueries!
I've gone through the Select and SelectMany sections in LINQPad and I definitely want to be doing this the easy way, but my attempts to completely remove joins (and get the same results) have failed.
Anyway, below is the 100% working query I'm trying this out on (full schema pictured below).
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
 join projectsBillingSchedule in dbContext.ProjectsBillingSchedules
     on workOrder.ProjectId equals projectsBillingSchedule.ProjectId
 join partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
     on projectsBillingSchedule.BillingSchId equals partyPricing.BILLING_SCH_ID
 join measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
     on partyPricing.PARTY_RETROFIT_CODE_ID equals measuresPartyRetrofitCode.PartyRetrofitCodeId
 join measure in dbContext.Measures on measuresPartyRetrofitCode.ConvId equals measure.ConvId
 select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)
Please note, certain entities are omitted from the code because they are not strictly necessary for the query to run properly. Aside from that, you can see the joins are done in order of the relationships in the schema image, i.e., from WORK_ORDERS to MEASURES (start by moving away from WORK_ORDER_LINES):
I have tried using navigation properties, but I run into 2 problems:
The SQL outputs in multiple statements (N+1 problem?), and
I can't seem to get it all in one statement.
So, back to my question - using the example above (or something else with a lot of joins), how are join operators unnecessary in LINQ to SQL?
UPDATE
Ok, I think I've figured out one solution, but this actually requires more lines of code to achieve the same result.
Since that is the case, I'm not sure why it is worth exclaiming that join operators are unnecessary. I'll leave this question open for a while to see if someone wants to make a compelling case against joins.
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
from projectsBillingSchedule in dbContext.ProjectsBillingSchedules
where workOrder.ProjectId == projectsBillingSchedule.ProjectId
from partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
where projectsBillingSchedule.BillingSchId == partyPricing.BILLING_SCH_ID
from measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
where partyPricing.PARTY_RETROFIT_CODE_ID == measuresPartyRetrofitCode.PartyRetrofitCodeId
from measure in dbContext.Measures
where measure.ConvId == measuresPartyRetrofitCode.ConvId
select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)
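To make the join versus from/where comparison concrete, here is a small in-memory sketch with two hypothetical collections standing in for WorkOrders and ProjectsBillingSchedules (the values are invented). It only shows that the two query shapes return the same rows in LINQ to Objects. How a LINQ to SQL provider translates each form into SQL is not something this sketch can demonstrate.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; names loosely follow the question's entities.
var workOrders = new[]
{
    new { WoId = 1, ProjectId = 10 },
    new { WoId = 2, ProjectId = 20 },
};
var schedules = new[]
{
    new { ProjectId = 10, BillingSchId = 100 },
    new { ProjectId = 10, BillingSchId = 101 },
    new { ProjectId = 30, BillingSchId = 300 },
};

// Explicit join operator.
var withJoin =
    (from wo in workOrders
     join s in schedules on wo.ProjectId equals s.ProjectId
     select new { wo.WoId, s.BillingSchId }).ToList();

// Same result expressed with a second "from" (SelectMany) and a where clause.
var withSelectMany =
    (from wo in workOrders
     from s in schedules
     where wo.ProjectId == s.ProjectId
     select new { wo.WoId, s.BillingSchId }).ToList();

// Anonymous types compare by value, so SequenceEqual checks row-by-row equality.
Console.WriteLine(withJoin.SequenceEqual(withSelectMany));
```

In memory the from/where form evaluates every pair of rows, so the join operator is usually the better choice for local collections. The claim quoted in the question is specifically about queries that get translated to SQL.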

Exception in a CRM LINQ query with joins. Attribute in second table doesn't exist

First of all, I'm sorry that this is the second time I'm posting this question; the first version was badly explained and has been closed.
I'm writing a LINQ query for a search page against a CRM database, and a normal query like the one below is not working. I'm getting this exception:
[System.ServiceModel.FaultException<Microsoft.Xrm.Sdk.OrganizationServiceFault>] = {"'Contact' entity doesn't contain attribute with Name = 'title'."}
For a join query whose where clause was something like r.Name == "Me" && j.LastName == "He", I had to write the query with two where clauses, because I was getting the same kind of exception as above, saying that table 'r' doesn't have a 'LastName' attribute.
var cms = from i in aux_pr
join cal in Contact on i.aux_CallerRequestorID.Id equals cal.ContactId.Value
join sub in Subject on i.aux_ClassificationID.Id equals sub.SubjectId
where cal.FullName.Contains(searchTerm) ||
sub.Title.Contains(searchTerm)
In this case, how can I write this query? Thanks in advance!
I want to share what I have learned and the solution I found to my problem, hoping it could help someone. There are some documented limitations in CRM LINQ.
The first one I found involves an entity reference like this:
CrmEntityReference Caller
{
Guid ID;
string name;
}
I can select Caller.name, but I CAN'T use Caller.name in the where clause. The solution for this is to join the table.
The second limitation applies when the query has joins: the where clause can reference different tables only if the conditions are combined with AND, and then each table needs its own where clause, like this:
where cal.FullName.Contains(searchTerm)
where sub.Title.Contains(searchTerm)
But the problem comes when we need an OR predicate instead of an AND. The only solution we have is to run two queries and then take the Union of their results.
I have four queries for a call that could be done with just one. In the development stage performance is good because of the small number of records, but we'll see in the testing stage how this works.
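The two-queries-plus-Union workaround described above can be sketched in memory as follows. All collections, property names and values are hypothetical stand-ins for the CRM entities, and against a real CRM service each query would presumably have to be materialized before the union; that part is an assumption, not something shown here.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for Contact, Subject and aux_pr.
var contacts = new[]
{
    new { ContactId = 1, FullName = "Ann Lee" },
    new { ContactId = 2, FullName = "Bob Ray" },
};
var subjects = new[]
{
    new { SubjectId = 1, Title = "Billing" },
    new { SubjectId = 2, Title = "Lee account" },
};
var requests = new[]
{
    new { Id = 1, CallerId = 1, SubjectId = 1 }, // caller name matches
    new { Id = 2, CallerId = 2, SubjectId = 2 }, // subject title matches
    new { Id = 3, CallerId = 2, SubjectId = 1 }, // neither matches
    new { Id = 4, CallerId = 1, SubjectId = 2 }, // both match
};
var searchTerm = "Lee";

// Query 1: filter only on the contact table.
var byCaller = from r in requests
               join c in contacts on r.CallerId equals c.ContactId
               where c.FullName.Contains(searchTerm)
               select r.Id;

// Query 2: filter only on the subject table.
var bySubject = from r in requests
                join s in subjects on r.SubjectId equals s.SubjectId
                where s.Title.Contains(searchTerm)
                select r.Id;

// Union emulates the OR and removes the duplicate produced by request 4.
var matches = byCaller.Union(bySubject).OrderBy(id => id).ToList();
Console.WriteLine(string.Join(", ", matches));
```

Selecting only the key in each query keeps the union cheap; the full records can be looked up afterwards if needed.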
Try to create two different filters. Note that two where clauses combine as AND, not OR:
var cms = from i in aux_pr
join cal in Contact on i.aux_CallerRequestorID.Id equals cal.ContactId.Value
join sub in Subject on i.aux_ClassificationID.Id equals sub.SubjectId
where cal.FullName.Contains(searchTerm)
where sub.Title.Contains(searchTerm)

LINQ to Entities query takes long to compile, SQL runs fast

I'm working on a piece of code, written by a coworker, that interfaces with a CRM application our company uses. There are two LINQ to Entities queries in this piece of code that get executed many times in our application, and I've been asked to optimize them because one of them is really slow.
These are the queries:
First query, this one compiles pretty much instantly. It gets relation information from the CRM database, filtering by a list of relation IDs given by the application:
from relation in context.ADRELATION
where ((relationIds.Contains(relation.FIDADRELATION)) && (relation.FLDELETED != -1))
join addressTable in context.ADDRESS on relation.FIDADDRESS equals addressTable.FIDADDRESS
into temporaryAddressTable
from address in temporaryAddressTable.DefaultIfEmpty()
join mailAddressTable in context.ADDRESS on relation.FIDMAILADDRESS equals
mailAddressTable.FIDADDRESS into temporaryMailAddressTable
from mailAddress in temporaryMailAddressTable.DefaultIfEmpty()
select new { Relation = relation, Address = address, MailAddress = mailAddress };
The second query, which takes about 4-5 seconds to compile, and takes information about people from the database (again filtered by a list of IDs):
from role in context.ROLE
join relationTable in context.ADRELATION on role.FIDADRELATION equals relationTable.FIDADRELATION into temporaryRelationTable
from relation in temporaryRelationTable.DefaultIfEmpty()
join personTable in context.PERSON on role.FIDPERS equals personTable.FIDPERS into temporaryPersonTable
from person in temporaryPersonTable.DefaultIfEmpty()
join nationalityTable in context.TBNATION on person.FIDTBNATION equals nationalityTable.FIDTBNATION into temporaryNationalities
from nationality in temporaryNationalities.DefaultIfEmpty()
join titelTable in context.TBTITLE on person.FIDTBTITLE equals titelTable.FIDTBTITLE into temporaryTitles
from title in temporaryTitles.DefaultIfEmpty()
join suffixTable in context.TBSUFFIX on person.FIDTBSUFFIX equals suffixTable.FIDTBSUFFIX into temporarySuffixes
from suffix in temporarySuffixes.DefaultIfEmpty()
where ((rolIds.Contains(role.FIDROLE)) && (relation.FLDELETED != -1))
select new { Role = role, Person = person, relation = relation, Nationality = nationality, Title = title.FTXTBTITLE, Suffix = suffix.FTXTBSUFFIX };
I've set up the SQL Profiler and took the SQL from both queries, then ran it in SQL Server Management Studio. Both queries ran very fast, even with a large (~1000) number of IDs. So the problem seems to lie in the compilation of the LINQ query.
I have tried to use a compiled query, but since those can only contain primitive parameters, I had to strip out the part with the filter and apply that after the Invoke() call, so I'm not sure if that helps much. Also, since this code runs in a WCF service operation, I'm not sure if the compiled query will even still exist on subsequent calls.
Finally what I tried was to only select a single column in the second query. While this obviously won't give me the information I need, I figured it would be faster than the ~200 columns we're selecting now. No such case, it still took 4-5 seconds.
I'm not a LINQ guru at all, so I can barely follow this code (I have a feeling it's not written optimally, but can't put my finger on it). Could anyone give me a hint as to why this problem might be occurring?
The only solution I have left is to manually select all the information instead of joining all these tables. I'd then end up with about 5-6 queries. Not too bad I guess, but since I'm not dealing with horribly inefficient SQL here (or at least an acceptable level of inefficiency), I was hoping to prevent that.
Thanks in advance, hope I made things clear. If not, feel free to ask and I'll provide additional details.
Edit:
I ended up adding associations to my Entity Framework model (the target database didn't have foreign keys specified) and rewriting the query thusly:
context.ROLE.Where(role => rolIds.Contains(role.FIDROLE) && role.Relation.FLDELETED != -1)
    .Select(role => new
    {
        ContactId = role.FIDROLE,
        Person = role.Person,
        Nationality = role.Person.Nationality.FTXTBNATION,
        Title = role.Person.Title.FTXTBTITLE,
        Suffix = role.Person.Suffix.FTXTBSUFFIX
    });
Seems a lot more readable and it's faster too.
Thanks for the suggestions, I will definitely keep the one about making multiple compiled queries for different numbers of arguments in mind!
Gabriel's answer is correct: use a compiled query.
It looks like you are compiling it again for every WCF request which of course defeats the purpose of one-time initialization. Instead, put the compiled query into a static field.
Edit:
Do this: Send maximum load to your service and pause the debugger 10 times. Look at the call stack. Did it stop more often in L2S code or in ADO.NET code? This will tell you if the problem is still with L2S or with SQL Server.
Next, let's fix the filter. We need to push it back into the compiled query. This is only possible by transforming this:
rolIds.Contains(role.FIDROLE)
to this:
role.FIDROLE == rolIds_0 || role.FIDROLE == rolIds_1 || ...
You need a new compiled query for every cardinality of rolIds. This is nasty, but it is necessary to get it to compile. In my project, I have automated this task but you can do a one-off solution here.
I guess most queries will have very few role IDs, so you can materialize 10 compiled queries for cardinalities 1-10, and if the cardinality exceeds 10 you fall back to client-side filtering.
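The Contains-to-OR rewrite described above can be sketched with expression trees. This is only an illustration of the predicate shape: WhereIdIn is a hypothetical helper, the role data is invented, and the ids are embedded as constants, whereas a compiled query would take them as parameters (rolIds_0, rolIds_1, ...). It runs against an in-memory IQueryable rather than a real context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for context.ROLE; only FIDROLE is used by the filter.
var roles = new[]
{
    new { FIDROLE = 1, Name = "first" },
    new { FIDROLE = 2, Name = "second" },
    new { FIDROLE = 3, Name = "third" },
    new { FIDROLE = 4, Name = "fourth" },
};
var rolIds = new List<int> { 2, 4 };

var filtered = WhereIdIn(roles.AsQueryable(), "FIDROLE", rolIds).ToList();
Console.WriteLine(string.Join(", ", filtered.Select(r => r.FIDROLE)));

// Hypothetical helper: builds role => role.P == ids[0] || role.P == ids[1] || ...
// for the int property named propertyName, then applies it as a Where filter.
static IQueryable<T> WhereIdIn<T>(IQueryable<T> source, string propertyName, IReadOnlyList<int> ids)
{
    if (ids.Count == 0)
        throw new ArgumentException("At least one id is required.", nameof(ids));

    var parameter = Expression.Parameter(typeof(T), "role");
    var property = Expression.Property(parameter, propertyName);

    // Start with the first comparison, then OR in one comparison per remaining id.
    Expression body = Expression.Equal(property, Expression.Constant(ids[0]));
    for (var i = 1; i < ids.Count; i++)
    {
        body = Expression.OrElse(body, Expression.Equal(property, Expression.Constant(ids[i])));
    }

    var predicate = Expression.Lambda<Func<T, bool>>(body, parameter);
    return source.Where(predicate);
}
```

The predicate has one comparison per id, which is why the answer needs a separate compiled query for each cardinality of rolIds.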
If you decide to keep the query inside the code, you could compile it. You still have to compile the query once when you run your app, but all subsequent calls will use that already compiled query. You can take a look at the MSDN help here: http://msdn.microsoft.com/en-us/library/bb399335.aspx.
Another option would be to use a stored procedure and call the procedure from your code. Hence no compile time.

LINQ Join 2 Datatables and With SUM AND GROUP BY

I simply can not get this to work out at all, so any expert help would be very much appreciated.
I'm trying (as the subject suggests) to join 2 datatables on Zip Code, but return a table which grouped this by State and has a SUM() of sales.
Here's the latest version of my troubles:
var results =(
from a in dtList.AsEnumerable()
join b in dtListCoded.AsEnumerable()
on a.Field<string>("ZIP") equals b.Field<string>("zip")
group a by {a.Field<string>("StateCode")} into g
select new {
StateCode = a.Field<string>("StateCode"),
SumSales = b.Sum(b => b.Field<double>("SUMSales"))
});
I can join the 2 tables, but it's getting the result I need that seems to be the tricky bit. If need be I will just have to do 2 queries, but that just seems a bit backward.
Thanks in advance.
Two queries wouldn't be any slower (LINQ queries are deferred, so composing one query on top of another still runs as a single pipeline when the result is enumerated), and would be a lot more readable, transparent during debugging and reusable. I'd recommend breaking it down.
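For reference, here is one way the join, group and sum could be written as a single query. This is a sketch under assumptions: the sample rows are invented, and the column placement (StateCode on dtList, SUMSales on dtListCoded) is inferred from how the question's query uses them. The key changes from the question's version are grouping the b rows by the state code taken from a, and reading the key and the sum from the group g, since a and b are no longer in scope after "into g".

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample tables; column placement follows the question's query.
var dtList = new DataTable();
dtList.Columns.Add("ZIP", typeof(string));
dtList.Columns.Add("StateCode", typeof(string));
dtList.Rows.Add("10001", "NY");
dtList.Rows.Add("10002", "NY");
dtList.Rows.Add("90001", "CA");

var dtListCoded = new DataTable();
dtListCoded.Columns.Add("zip", typeof(string));
dtListCoded.Columns.Add("SUMSales", typeof(double));
dtListCoded.Rows.Add("10001", 100.0);
dtListCoded.Rows.Add("10002", 50.5);
dtListCoded.Rows.Add("90001", 20.0);

var results =
    (from a in dtList.AsEnumerable()
     join b in dtListCoded.AsEnumerable()
         on a.Field<string>("ZIP") equals b.Field<string>("zip")
     // Group the sales rows (b) by the state code of the matching row in a.
     group b by a.Field<string>("StateCode") into g
     select new
     {
         StateCode = g.Key,
         SumSales = g.Sum(row => row.Field<double>("SUMSales"))
     }).ToList();

foreach (var r in results)
{
    Console.WriteLine($"{r.StateCode}: {r.SumSales}");
}
```

Splitting this into a join query and a separate grouping query, as suggested above, would give the same result.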
