I have always wondered why LINQ joins use an equals keyword rather than the == operator.
Property deadline =
(from p in properties
join w in widgets
on p.WidgetID equals w.ID
select p).First();
Instead of
Property deadline =
(from p in properties
join w in widgets
on p.WidgetID == w.ID
select p).First();
[EDIT] Rephrased the question and revised the examples.
There's a nice explanation by Matt Warren at The Moth:
"The reason C# has the word ‘equals’ instead of the ‘==’ operator was to make it clear that the ‘on’ clause needs you to supply two separate expressions that are compared for equality not a single predicate expression. The from-join pattern maps to the Enumerable.Join() standard query operator that specifies two separate delegates that are used to compute values that can then be compared. It needs them as separate delegates in order to build a lookup table with one and probe into the lookup table with the other. A full query processor like SQL is free to examine a single predicate expression and choose how it is going to process it. Yet, to make LINQ operate similar to SQL would require that the join condition be always specified as an expression tree, a significant overhead for the simple in-memory object case."
However, this concerns join. I'm not sure equals should be used in your code example (does it even compile?).
Your first version doesn't compile. You only use equals in joins, to make the separate halves of the equijoin clear to the compiler.
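To make the mapping onto Enumerable.Join concrete, here is a minimal sketch of the same join written both ways. The Property and Widget types and the sample data are made up for illustration; only Enumerable.Join and its two key-selector parameters come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types standing in for the question's properties and widgets.
public record Property(string Name, int WidgetID);
public record Widget(int ID);

public static class JoinDemo
{
    public static readonly List<Property> Properties = new()
    {
        new Property("deadline", 1),
        new Property("owner", 2),
        new Property("orphan", 99),
    };

    public static readonly List<Widget> Widgets = new() { new Widget(1), new Widget(2) };

    // Query syntax: 'equals' separates the outer key (left) from the inner key (right).
    public static List<string> QuerySyntax() =>
        (from p in Properties
         join w in Widgets on p.WidgetID equals w.ID
         select p.Name).ToList();

    // Method syntax: the two halves of the condition become two separate delegates.
    public static List<string> MethodSyntax() =>
        Properties.Join(
            Widgets,
            p => p.WidgetID,   // outer key selector: only p is in scope
            w => w.ID,         // inner key selector: only w is in scope
            (p, w) => p.Name)  // result selector sees both
        .ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", QuerySyntax()));   // deadline, owner
        Console.WriteLine(string.Join(", ", MethodSyntax()));  // deadline, owner
    }
}
```

Because the key selectors are separate lambdas, there is no single predicate for == to belong to, which is the point of the quote above.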
I have been reading Programming Microsoft LINQ in Microsoft .NET Framework 4 and am starting to understand the join clause in LINQ, but I have a question about its definition. The book defines it as:
You can define equality comparisons only by using a special equals keyword that behaves differently from the == operator, because the position of the operands is significant. With equals, the left key consumes the outer source sequence, and the right key consumes the inner source sequence. The outer source sequence is in scope only on the left side of equals, and the inner source sequence is in scope only on the right side.
And there is also a formal definition about this operator:
join-clause ::= join innerItem in innerSequence on outerKey equals innerKey
Can someone please explain the above concept in other words?
I guess it's because 'equals' in the join doesn't work like == does, and so the language designers decided not to call what it is doing the same thing.
In C#, it is sort of given that a == b is exactly the same as b == a. In the definition of a join, this is not so:
var list = from a in ctx.TableA
join b in ctx.TableB on a.Id equals b.tableAId
select new { a, b };
This, above, is valid.
var list = from a in ctx.TableA
join b in ctx.TableB on b.tableAId equals a.Id
select new { a, b };
This will not compile. What the language spec says is that the 'outer' table (TableA in this case) must be specified first and the inner one (TableB) must be second. I suppose that the language designers thought that this was sufficiently different from the way that == works that it would be a bad idea to use it and they came up with the idea to use 'equals'.
I think I'm probably right, but only the language designers involved will really know the truth.
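For what it's worth, the left/right rule falls out naturally if you picture the join as "build a lookup from the inner sequence, then probe it with each outer item". The sketch below is a simplified stand-in, not the framework's actual implementation; SimpleJoin and the tuple data are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashJoinSketch
{
    // A simplified stand-in for Enumerable.Join, not the framework's actual code.
    public static IEnumerable<TResult> SimpleJoin<TOuter, TInner, TKey, TResult>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKey,   // left side of 'equals': sees only the outer item
        Func<TInner, TKey> innerKey,   // right side of 'equals': sees only the inner item
        Func<TOuter, TInner, TResult> result)
    {
        // Build a lookup table from the inner sequence using the inner key...
        ILookup<TKey, TInner> lookup = inner.ToLookup(innerKey);

        // ...then probe it once per outer item using the outer key.
        foreach (TOuter o in outer)
            foreach (TInner i in lookup[outerKey(o)])
                yield return result(o, i);
    }

    public static List<string> Demo()
    {
        var tableA = new[] { (Id: 1, Name: "a1"), (Id: 2, Name: "a2") };
        var tableB = new[] { (TableAId: 1, Tag: "b1"), (TableAId: 1, Tag: "b2"), (TableAId: 3, Tag: "b3") };

        return SimpleJoin(tableA, tableB, a => a.Id, b => b.TableAId,
                          (a, b) => a.Name + "-" + b.Tag).ToList();
    }

    public static void Main() => Console.WriteLine(string.Join(", ", Demo()));  // a1-b1, a1-b2
}
```

Each key selector is a function of exactly one side, so writing b.tableAId on the left would mean referring to a variable that the outer key selector never receives.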
I have the following query:
var vendors = (from pp in this.ProductPricings
join pic in this.ProductItemCompanies
on pp.CompanyId equals pic.CompanyId into left
from pic in left.DefaultIfEmpty()
orderby pp.EffectiveDate descending
group pp by new { pp.Company, SortOrder = (pic != null) ? pic.SortOrder : short.MinValue } into v
select v).OrderBy(z => z.Key.SortOrder);
Does anyone know how the last OrderBy() is applied? Does that become part of the SQL query, or are all the results loaded into memory and then passed to OrderBy()?
And if it's the second case, is there any way to make it all one query? I only need the first item and it would be very inefficient to return all the results.
Well it will try to apply the OrderBy to the original query since you are still using an IQueryable - meaning it hasn't been converted to an IEnumerable or hydrated to a collection using ToList or an equivalent.
Whether it can or not depends on the complexity of the resulting query. You'd have to try it to find out. My guess is it will turn the main query into a subquery and layer on a "SELECT * FROM (...) ORDER BY SortOrder" outer query.
In your specific example, the OrderBy will most likely be added to the expression tree while the query is being built, so it will be applied in the SQL generated from the LINQ query. If you first converted the query to an in-memory collection (for example with ToList, as mentioned in another answer), the OrderBy would instead run in memory as an Enumerable extension method.
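A small sketch of the distinction, using AsQueryable() over a list as an in-memory stand-in for a database table (an assumption for the example; a real LINQ to SQL or EF provider decides for itself how much of the tree it can translate, so check the generated SQL):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class DeferredOrderByDemo
{
    // Name of the outermost operator recorded in the query's expression tree.
    public static string OutermostOperator(IQueryable query) =>
        ((MethodCallExpression)query.Expression).Method.Name;

    public static IQueryable<int> BuildQuery()
    {
        // AsQueryable() over a list is an in-memory stand-in for a database table.
        IQueryable<int> source = new List<int> { 3, 1, 2 }.AsQueryable();
        IQueryable<int> filtered = source.Where(n => n > 1);

        // Still IQueryable: Queryable.OrderBy only appends a node to the expression tree.
        return filtered.OrderBy(n => n);
    }

    public static void Main()
    {
        IQueryable<int> query = BuildQuery();
        Console.WriteLine(OutermostOperator(query));   // OrderBy

        // Nothing has run yet; First() passes the whole tree to the query provider.
        Console.WriteLine(query.First());              // 2

        // After ToList() the data is in memory, so this call is Enumerable.OrderBy.
        IEnumerable<int> inMemory = query.ToList().OrderBy(n => n);
        Console.WriteLine(inMemory.First());           // 2
    }
}
```

As long as the variable stays IQueryable, appending First() or FirstOrDefault() after the OrderBy keeps everything in one composed query, which is what you want when only the first item is needed.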
Consider writing more readable code; as written, the query is hard to follow.
You may also run into a problem with this LINQ statement later: if it returns no value, the result will be null, and using that result will throw an exception.
Be careful with that.
I recommend doing each step separately so the code is easier to understand.
I've periodically seen it written that joins are unnecessary in LINQ to SQL. Most recently, I saw this statement in one of Joseph Albahari's LINQPad samples (Chapter 9 - LINQ Operators > Filtering > Joining > Simple Join).
The comment says:
// Note: before delving into this section, make sure you've read the preceding two
// sections: Select and SelectMany. The Join operators are actually unnecessary
// in LINQ to SQL, and the equivalent of SQL inner and outer joins is most easily
// achieved in LINQ to SQL using Select/SelectMany and subqueries!
I've gone through the Select and SelectMany sections in LINQPad and I definitely want to be doing this the easy way, but my attempts to completely remove joins (and get the same results) have failed.
Anyway, below is the 100% working query I'm trying this out on (full schema pictured below).
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
join projectsBillingSchedule in dbContext.ProjectsBillingSchedules
on workOrder.ProjectId equals projectsBillingSchedule.ProjectId
join partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
on projectsBillingSchedule.BillingSchId equals partyPricing.BILLING_SCH_ID
join measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
on partyPricing.PARTY_RETROFIT_CODE_ID equals measuresPartyRetrofitCode.PartyRetrofitCodeId
join measure in dbContext.Measures on measuresPartyRetrofitCode.ConvId equals measure.ConvId
select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)
Please note, certain entities are omitted from the code because they are not strictly necessary for the query to run properly. Aside from that, you can see the joins are done in order of the relationships in the schema image, i.e., from WORK_ORDERS to MEASURES (start by moving away from WORK_ORDER_LINES):
I have tried using navigation properties, but I run into two problems:
The SQL outputs in multiple statements (N+1 problem?), and
I can't seem to get it all in one statement.
So, back to my question - using the example above (or something else with a lot of joins), how are join operators unnecessary in LINQ to SQL?
UPDATE
Ok, I think I've figured out one solution, but this actually requires more lines of code to achieve the same result.
Since that is the case, I'm not sure why it is worth exclaiming that join operators are unnecessary. I'll leave this question open for a while to see if someone wants to make a compelling case against joins.
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
from projectsBillingSchedule in dbContext.ProjectsBillingSchedules
where workOrder.ProjectId == projectsBillingSchedule.ProjectId
from partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
where projectsBillingSchedule.BillingSchId == partyPricing.BILLING_SCH_ID
from measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
where partyPricing.PARTY_RETROFIT_CODE_ID == measuresPartyRetrofitCode.PartyRetrofitCodeId
from measure in dbContext.Measures
where measure.ConvId == measuresPartyRetrofitCode.ConvId
select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)
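For comparison, here is a cut-down sketch of the two styles side by side on in-memory data. The WorkOrder and BillingSchedule records are hypothetical two-column versions of the tables above, not the real schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, cut-down versions of two tables from the schema.
public record WorkOrder(int WoId, int ProjectId);
public record BillingSchedule(int ProjectId, int BillingSchId);

public static class JoinVsSelectMany
{
    static readonly WorkOrder[] WorkOrders = { new(1, 10), new(2, 20) };
    static readonly BillingSchedule[] Schedules = { new(10, 100), new(10, 101), new(30, 300) };

    // Explicit join: compiles to a call to Join with two key selectors.
    public static List<(int WoId, int BillingSchId)> WithJoin() =>
        (from wo in WorkOrders
         join bs in Schedules on wo.ProjectId equals bs.ProjectId
         select (wo.WoId, bs.BillingSchId)).ToList();

    // Two 'from' clauses plus 'where': compiles to SelectMany followed by Where.
    public static List<(int WoId, int BillingSchId)> WithFromWhere() =>
        (from wo in WorkOrders
         from bs in Schedules
         where wo.ProjectId == bs.ProjectId
         select (wo.WoId, bs.BillingSchId)).ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithJoin()));       // (1, 100), (1, 101)
        Console.WriteLine(string.Join(", ", WithFromWhere()));  // (1, 100), (1, 101)
    }
}
```

The results match, but the cost model differs: over in-memory collections the from/where form is a nested loop over every pair, while Join builds a lookup first. Against a database the provider translates both forms to SQL, so the generated statement is what to compare.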
I was looking for some tips to improve my Entity Framework query performance and came across this useful article.
The author of this article mentioned the following:
08 Avoid using Contains
In LINQ, we use contains method for checking existence. It is converted to "WHERE IN" in SQL which cause performance degrades.
What faster alternatives do I have?
Contains is perfectly valid for scenarios where you actually want WHERE IN.
E.g.:
var q = from p in products where new[]{1,50,77}.Contains(p.productId) select p;
gets (essentially) converted to
SELECT * FROM products WHERE ProductId IN (1,50,77)
However, if you are checking for existence, I would advise you to use .Any(), which gets converted to EXISTS.
E.g.:
var q = from p in products
where p.productsLinkGroups.Any(x => x.GroupID == 5)
select p;
Gets (more or less) converted to:
SELECT * FROM products p
WHERE EXISTS(
SELECT NULL FROM productsLinkGroups plg
WHERE plg.GroupId = 5 AND plg.ProductId = p.ProductId
)
It is very context dependent. What you should be looking at is not how to avoid .Contains() but rather how to avoid WHERE xx IN yy in SQL. Could you do a join instead? Is it possible to specify an interval rather than discrete values?
A perfect example is presented here: Avoid SQL WHERE NOT IN Clause
Where it was possible to avoid it by using a join.
I would say that WHERE xx IN yy is usually just half a solution. Often what you really want is something else, such as a join, and you only get halfway there instead of going there directly.
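A rough sketch of what "do a join instead" can look like, shown on in-memory data with a hypothetical CatalogProduct type. Whether the join form is actually faster against a database depends on the provider and the data, so treat this as the shape of the rewrite, not a benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical product type and data, for illustration only.
public record CatalogProduct(int ProductId, string Name);

public static class ContainsVsJoin
{
    static readonly CatalogProduct[] Products =
    {
        new(1, "bolt"), new(2, "nut"), new(50, "gear"), new(77, "spring"),
    };

    // Filter with Contains: the shape that gets converted to WHERE ... IN.
    public static List<string> WithContains(IEnumerable<int> ids) =>
        Products.Where(p => ids.Contains(p.ProductId))
                .Select(p => p.Name)
                .ToList();

    // Same filter as a join. Distinct() matters: a join repeats a product once per
    // matching id, so duplicate ids would otherwise duplicate rows.
    public static List<string> WithJoin(IEnumerable<int> ids) =>
        (from p in Products
         join id in ids.Distinct() on p.ProductId equals id
         select p.Name).ToList();

    public static void Main()
    {
        int[] ids = { 1, 50, 77, 50 };
        Console.WriteLine(string.Join(", ", WithContains(ids)));  // bolt, gear, spring
        Console.WriteLine(string.Join(", ", WithJoin(ids)));      // bolt, gear, spring
    }
}
```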
var selectedProducts = from p in products
where p.Category == 1
select p;
var selectedProducts = products.Where(p => p.Category == 1);
The above two statements seem to produce the same result.
What is the difference between them, if any (perhaps internally)?
Which one is more efficient?
There is no difference. The first (the query expression) is translated to the second by the compiler, and has no impact on run time.
See also:
How query expressions work - By Jon Skeet
Query transformations are syntactic - By Eric Lippert
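One way to see this for yourself is to build both forms over an IQueryable and print the expression trees. This is a minimal sketch; the Item record and sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type and data, for illustration only.
public record Item(string Name, int Category);

public static class SyntaxEquivalence
{
    static readonly List<Item> Items = new()
    {
        new Item("a", 1), new Item("b", 2), new Item("c", 1),
    };

    // Query expression form.
    public static IQueryable<Item> QuerySyntax() =>
        from p in Items.AsQueryable()
        where p.Category == 1
        select p;

    // Extension-method form.
    public static IQueryable<Item> MethodSyntax() =>
        Items.AsQueryable().Where(p => p.Category == 1);

    public static void Main()
    {
        // Both lines print the same tree: a single Where call, with no trailing Select.
        Console.WriteLine(QuerySyntax().Expression);
        Console.WriteLine(MethodSyntax().Expression);
    }
}
```

The trailing "select p" disappears because the compiler drops a select that merely returns the range variable when another clause precedes it.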
There isn't any difference between the two forms in this case, but in some cases query syntax is the better choice, and in others the extension methods are better, or query syntax cannot be used at all.
Query syntax helps where the extension-method form becomes complicated and unreadable, as with Join.
Extension methods are needed for operators such as Distinct, which have no query-syntax equivalent, and method chaining can also improve readability.
You can mix extension methods and query syntax, but it hurts readability, for example:
(from p in products
where p.Category == 1
select p).Distinct()