How are join operators unnecessary in LINQ to SQL?

How are join operators unnecessary in LINQ to SQL? - c#

I've periodically seen it written that joins are unnecessary in LINQ to SQL. Most recently, I saw this statement in one of Joseph Albahari's LINQPad samples (Chapter 9 - LINQ Operators > Filtering > Joining > Simple Join).
The comment says:
// Note: before delving into this section, make sure you've read the preceding two
// sections: Select and SelectMany. The Join operators are actually unnecessary
// in LINQ to SQL, and the equivalent of SQL inner and outer joins is most easily
// achieved in LINQ to SQL using Select/SelectMany and subqueries!
I've gone through the Select and SelectMany sections in LINQPad and I definitely want to be doing this the easy way, but my attempts to completely remove joins (and get the same results) have failed.
Anyway, below is the 100% working query I'm trying this out on (full schema pictured below).
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
join projectsBillingSchedule in dbContext.ProjectsBillingSchedules
on workOrder.ProjectId equals projectsBillingSchedule.ProjectId
join partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
on projectsBillingSchedule.BillingSchId equals partyPricing.BILLING_SCH_ID
join measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
on partyPricing.PARTY_RETROFIT_CODE_ID equals measuresPartyRetrofitCode.PartyRetrofitCodeId
join measure in dbContext.Measures on measuresPartyRetrofitCode.ConvId equals measure.ConvId
select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)
Please note, certain entities are omitted from the code because they are not strictly necessary for the query to run properly. Aside from that, you can see the joins are done in order of the relationships in the schema image, i.e., from WORK_ORDERS to MEASURES (start by moving away from WORK_ORDER_LINES):
I have tried some using navigation properties, but I run into 2 problems:
The SQL outputs in multiple statements (N+1 problem?), and
I can't seem to get it all in one statement.
So, back to my question - using the example above (or something else with a lot of joins), how are join operators unnecessary in LINQ to SQL?
UPDATE
Ok, I think I've figured out one solution, but this actually requires more lines of code to achieve the same result.
Since that is the case, I'm not sure why it is worth exclaiming that join operators are unnecessary. I'll leave this question open for a while to see if someone wants to make a compelling case against joins.
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
from projectsBillingSchedule in dbContext.ProjectsBillingSchedules
where workOrder.ProjectId == projectsBillingSchedule.ProjectId
from partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
where projectsBillingSchedule.BillingSchId == partyPricing.BILLING_SCH_ID
from measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
where partyPricing.PARTY_RETROFIT_CODE_ID == measuresPartyRetrofitCode.PartyRetrofitCodeId
from measure in dbContext.Measures
where measure.ConvId == measuresPartyRetrofitCode.ConvId
select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)

Related

Entity Framework gets progressively slow with extra join added even though SQL generated is fast

We have 18 table join which is typical for ERP systems. The join is done via LINQ over Entity Framework.
The join gets progressively slower as more joins are added. The return result set is small(15 records). The LINQ generated query is captured via SQL Profiler and when we run this via Microsoft Management Console it is very fast : 10ms. When we run it via our C# LINQ-over-EntityFramework it takes 4 seconds.
What i guess is happening:
The time it takes to compile expression tree into SQL is 2 seconds out of total 4 seconds, and another 2 seconds i guess is spent internally to convert SQL result set into actually C# classes. Also it is not connected to initialization of entity framework because we run some queries before and repetitive calls to this join produce same 4 seconds.
Is there a way to speed this up. Otherwise we are considering abandoning Entity Framework for being absolutely inefficient...

In case it helps, I had a nasty performance issue, whereby a simple query that took 1-2 seconds in raw SQL took about 11 seconds via EF.
I went from using...
List<GeographicalLocation> geographicalLocations = new SalesTrackerCRMEntities()
.CreateObjectSet<GeographicalLocation>()
.Where(g => g.Active)
.ToList();
which took about 11 seconds via EF, to using...
var geographicalLocations = getContext().CreateObjectSet<GeographicalLocation>()
.AsNoTracking()
.Where(g => g.Active).ToList();
which took less than 200 milliseconds.
The disadvantage to this is that it won't load related entities, so you have to load them manually afterwards, but it gives such an immense performance boost that it was well worth it (in this case at least).
You would have to assess each case individually to see if the extra speed is worth the extra code.

You correctly identified bottlenecks.
If you have quite complex queries, I would suggest you to use compiled queries to overcome expression tree to sql query conversion.
You can refer Compiled Queries in EF from here.
Fo second part if EF is using two much time materialize your object graph then I would suggest to use some other means to retrieve data apart from EF.
One option can be Dapper.NET, You can have your concise sql query and you can directly retrieve its result in concrete model objects using Dapper (or any other tiny ORM)

I suspect your query takes so long to generate becuase you are treating Entity Framework like it is a SQL Query, which is not correct. You have many joins and akward calls in your linq syntax. Generally, your syntax should be similar to the following fictitious modeling query:
var result = (from appointment in appointments
from operation in appointment.Operations
where appointment.Id == 12
select new Model {
Id = appointment.Id,
Name = appointment.Name,
// etc, etc
}).ToList();
There is no use of joins above, the navigation property between Appointment and Operations takes care of the neccessary plumbing. Remember, this is an ORM, there is no concept of a join, only a concept of relationships.
The call to Distinct at the end, also indicates the structure of the db schema may be problematic if it returns too many duplicate results.
If after refactoring the entity model and correctly constructing the query still leaves with underperformance, it is advisable to use a stored procedure and map the result with EF's built in methods for doing so.

It is hard to tell what is going wrong here without seeing how you are using linq, but I suspect this will fix your problem:
var myResult = dataContext.table.Where(x == "Your joins and otherstuff").ToList();
//after converting it to a list use it how you need, but not before.
If this does not help please post your code.

The problem is that you are probably passing it to a data source that is running all sorts of additional queries based on you open result set.
Try this instead:
IEnumerable<SigmaTEK.Web.Models.SchedulerGridModel> tasks = (from appointment in _appointmentRep.Value.Get().Where(a => (a.Start < DbContext.MaxTime && DbContext.MinTime < a.Expiration))
join timeApplink in _timelineAppointmentLink.Value.Get().Where(a => a.AppointmentId != Guid.Empty)
on appointment.Id equals timeApplink.AppointmentId
join timeline in timelineRep.Value.Get().Where(i => timelines.Contains(i.Id))
on timeApplink.TimelineId equals timeline.Id
join repeater in _appointmentRepeaterRep.Value.Get().Where(repeater => (repeater.Start < DbContext.MaxTime && DbContext.MinTime < repeater.Expiration))
on appointment.Id equals repeater.Appointment
into repeaters
from repeater in repeaters.DefaultIfEmpty()
join aInstance in _appointmentInstanceRep.Value.Get()
on appointment.Id equals aInstance.Appointment
into instances
from instance in instances.DefaultIfEmpty()
join opRes in opResRep.Get()
on instance.ResourceOwner equals opRes.Id
into opResources
from op in opResources.DefaultIfEmpty()
join itemResource in _opDocItemResourcelinkRep.Value.Get()
on op.Id equals itemResource.Resource
into itemsResources
from itemresource in itemsResources.DefaultIfEmpty()
join opDocItem in opDocItemRep.Get()
on itemresource.OpDocItem equals opDocItem.Id
into opDocItems
from opdocitem in opDocItems.DefaultIfEmpty()
join opDocSection in opDocOpSecRep.Get()
on opdocitem.SectionId equals opDocSection.Id
into sections
from section in sections.DefaultIfEmpty()
join opDoc in opDocRep.Get()
on section.InternalOperationalDocument equals opDoc.Id
into opdocs
from opdocitem2 in opDocItems.DefaultIfEmpty()
join opDocItemLink in opDocItemStrRep.Get()
on opdocitem2.Id equals opDocItemLink.Parent
into opDocItemLinks
from link in opDocItemLinks.DefaultIfEmpty()
join finItem in finItemsRep.Get()
on link.Child equals finItem.Id
into temp1
from rd1 in temp1.DefaultIfEmpty()
join sec in finSectionRep.Get()
on rd1.SectionId equals sec.Id
into opdocsections
from finopdocsec in opdocsections.DefaultIfEmpty()
join finopdoc in opDocRep.Get().Where(i => i.DocumentType == "Quote")
on finopdocsec.InternalOperationalDocument equals finopdoc.Id
into finOpdocs
from finOpDoc in finOpdocs.DefaultIfEmpty()
join entry in entryRep.Get()
on rd1.Transaction equals entry.Transaction
into entries
from entry2 in entries.DefaultIfEmpty()
join resproduct in resprosductRep.Get()
on entry2.Id equals resproduct.Entry
into resproductlinks
from resprlink in resproductlinks.DefaultIfEmpty()
join res in resRep.Get()
on resprlink.Resource equals res.Id
into rootResource
from finopdoc in finOpdocs.DefaultIfEmpty()
join rel in orgDocIndRep.Get().Where(i => (i.Relationship == "OrderedBy"))
on finopdoc.Id equals rel.OperationalDocument
into orgDocIndLinks
from orgopdoclink in orgDocIndLinks.DefaultIfEmpty()
join org in orgRep.Get()
on orgopdoclink.Organization equals org.Id
into toorgs
from opdoc in opdocs.DefaultIfEmpty()
from rootresource in rootResource.DefaultIfEmpty()
from toorg in toorgs.DefaultIfEmpty()
select new SigmaTEK.Web.Models.SchedulerGridModel()
{
Id = appointment.Id,
Description = appointment.Description,
End = appointment.Expiration,
Start = appointment.Start,
OperationDisplayId = op.DisplayId,
OperationName = op.Name,
AppContextId = _appContext.Id,
TimelineId = timeline.Id,
AssemblyDisplayId = rootresource.DisplayId,
//Duration = SigmaTEK.Models.App.Utils.StringHelpers.TimeSpanToString((appointment.Expiration - appointment.Start)),
WorkOrder = opdoc.DisplayId,
Organization = toorg.Name
}).Distinct().ToList();
//In your UI
MyGrid.DataSource = tasks;
MyGrid.DataBind();
//Do not use an ObjectDataSource! It makes too many extra calls

How is OrderBy Applied?

I have the following query:
var vendors = (from pp in this.ProductPricings
join pic in this.ProductItemCompanies
on pp.CompanyId equals pic.CompanyId into left
from pic in left.DefaultIfEmpty()
orderby pp.EffectiveDate descending
group pp by new { pp.Company, SortOrder = (pic != null) ? pic.SortOrder : short.MinValue } into v
select v).OrderBy(z => z.Key.SortOrder);
Does anyone know how the last OrderBy() is applied? Does that become part of the SQL query, or are all the results loaded in to memory and then passed to OrderBy()?
And if it's the second case, is there any way to make it all one query? I only need the first item and it would be very inefficent to return all the results.

Well it will try to apply the OrderBy to the original query since you are still using an IQueryable - meaning it hasn't been converted to an IEnumerable or hydrated to a collection using ToList or an equivalent.
Whether it can or not depends on the complexity of the resulting query. You'd have to try it to find out. My guess is it will turn the main query into a subquery and layer on a "SELECT * FROM (...) ORDER BY SortOrder" outer query.

Given your specific example the order by in this situation most, likely be appliead as part of the expression tree when it getting build, there for it will be applied to sql generated by the LINQ query, if you would convert it to Enumarable like ToList as mentioned in another answer then Order by would be applied as an extension to Enumerable.

Might use readable code, because as you write it is not understandable.
You will have a problem in the future with the linq statement. The problem is that if your statement does not return any value the value will be null and whenever you make cause a exception.
You must be careful.
I recommend you to do everything separately to understand the code friend.

where vs join. which one is better to use

Currently I'm running some code and got some question about this. Below are the code listings of two LINQ to Entites queries.
Code listing A:
IQueryable list =
from tableProject in db.Project
select new {StaffInCharge = (
from tableStaff in db.Staff
where tableStaff.StaffId == tableProject.StaffInChargeId
select tableStaff.StaffName)};
Code listing B:
IQueryable list =
from tableProjectin db.Project
join tableStaff in db.Staff
on tableProject.StaffInChargeId
equal tableStaff.StaffId
select new {StaffInCharge = tableStaff.StaffName};
What I want to figure out is which one will be better and faster if I have to select many column from many others table.
Thanks.

this is the comment from #Tim Schmelter
"The article(actually my SO-Question) relates to LINQ-To-DataSet what is based on LINQ-To-Objects. Linq to SQL or Linq to Entities might be optimized by the DBMS in that way that a where clause has the same performance as a join."
and the link is
Why is LINQ JOIN so much faster than linking with WHERE?
i think it is very useful.

I am attempting to perform a join between two tables and limit results by 3 conditions. 2 of the conditions belong to the primary table, the third condition belongs to the secondary table. Here is the query I'm attempting:
var articles = (from article in this.Context.contents
join meta in this.Context.content_meta on article.ID equals meta.contentID
where meta.metaID == 1 && article.content_statusID == 1 && article.date_created > created
orderby article.date_created ascending
select article.content_text_key);
It is meant to join the two tables by the contentID, then filter based on the metaID (type of article), statusID, and then get all articles that are greater than the datetime created. The problem is that it returns 2 records (out of 4 currently). One has a date_created less than created and the other is the record that produced created in the first place (thus equal).
By removing the join and the where clause for the meta, the result produces no records (expected). What I can't understand is that when I translate this join into regular SQL it works just fine. Obviously I'm misunderstanding what the functionality of join is in this context. What would cause this behavior?
Edit:
Having tried this in LinqPad, I've noticed that LinqPad provides the expected results. I have tried these queries separately in code and it isn't until the join is added that odd results begin populating it appears to be happening on any date comparison where the record occurs on the same day as the limiter.

I can't seem to be able to add a comment but in debug mode you should be able to put a break point on this line of code. When you do you should be able to hover over it and have it tell you the sql that LINQ generates. Please post that sql.

At your suggestion, I'm posting my comment as the answer:
"It might also help to see your schema. The data types for metaID,
content_statusID, and date_created might come into play as well -- and
it's easy for me (somebody who's unfamiliar with your code) to make
assumptions about those data types."

Linq: What is the difference between == and equals in a join? [duplicate]

This question already has an answer here:
Reason of equals keyword in LINQ's join statement
(1 answer)
Closed 8 months ago.
I always wondered why there's an equals keyword in linq joins rather than using the == operator.
Property deadline =
(from p in properties
join w in widgets
on p.WidgetID equals w.ID
select p).First();
Instead of
Property deadline =
(from p in properties
join w in widgets
on p.WidgetID == w.ID
select p).First();
[EDIT] Rephrased the question and revised the examples.

There's a nice explanation by Matt Warren at The Moth:
"The reason C# has the word ‘equals’ instead of the ‘==’ operator was to make it clear that the ‘on’ clause needs you to supply two separate expressions that are compared for equality not a single predicate expression. The from-join pattern maps to the Enumerable.Join() standard query operator that specifies two separate delegates that are used to compute values that can then be compared. It needs them as separate delegates in order to build a lookup table with one and probe into the lookup table with the other. A full query processor like SQL is free to examine a single predicate expression and choose how it is going to process it. Yet, to make LINQ operate similar to SQL would require that the join condition be always specified as an expression tree, a significant overhead for the simple in-memory object case."
However, this concerns join. I'm not sure equals should be used in your code example (does it even compile?).

Your first version doesn't compile. You only use equals in joins, to make the separate halves of the equijoin clear to the compiler.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.