I have been reading Programming Microsoft LINQ in Microsoft .NET Framework 4, and now I am understanding the join clause in LINQ, but I have a doubt or question respect about its definition; in the book it is defines as:
You can define equality comparisons only by using a special equals keyword that behaves differently from the == operator, because the position of the operands is significant. With equals, the left key consumes the outer source sequence, and the right key consumes the inner source sequence. The outer source sequence is in scope only on the left side of equals, and the inner source sequence is in scope only on the right side.
And there is also a formal definition about this operator:
join-clause ::= join innerItem in innerSequence on outerKey equals innerKey
Please, can someone explain me the above concept in other words or by paraphrasing it?
I guess it's because 'equals' in the join doesn't work like == does, and so the language designers decided not to call what it is doing the same thing.
In C#, it is sort of given that a == b is exactly the same as b == a. In the definition of a join, this is not so:
var list = from a in ctx.TableA
join b from ctx.TableB on a.Id equals b.tableAId
This, above, is valid.
var list = from a in ctx.TableA
join b from ctx.TableB on b.tableAId equals a.Id
This will not compile. What the language spec says is that the 'outer' table (TableA in this case) must be specified first and the inner one (TableB) must be second. I suppose that the language designers thought that this was sufficiently different from the way that == works that it would be a bad idea to use it and they came up with the idea to use 'equals'.
I think I'm probably right, but only the language designers involved will really know the truth.
Related
I would be grateful if someone could explain the meaning of the term into while using LINQ. In general, I am trying to understand how to make INNER JOIN, LEFT OUTER JOIN etc. in C#.
I have the main table Students that stores a few foreign ID keys which then are substituted by their names when running a query. The names are read from look up tables such as Marks, SoftwareVersions, Departments etc. All fields are required but MarkID. The query I tried to build in LINQ is this:
SELECT * FROM dbo.Students
INNER JOIN dbo.Departments ON dbo.Students.DepartmentID=dbo.Departments.DepartmentID
INNER JOIN dbo.SoftwareVersions ON dbo.Students.SoftwareVersionID=dbo.SoftwareVersions.SoftwareVersionID
INNER JOIN dbo.Statuses ON dbo.Students.StatusID=dbo.Statuses.StatusID
LEFT JOIN dbo.Marks ON dbo.Students.MarkID=dbo.Marks.MarkID
WHERE dbo.Students.DepartmentID=17;
I somehow managed to get the code below worked after reading plenty of articles and watching some videos but I don't feel like I have a complete understanding of the code. The bits that confuse me are in 5th line ending with into and then in the very next line beginning with from m .... I'm confused what into does and and what really happens in from m .... And this is the code in LINQ:
var result = from st in dbContext.Students where st.DepartmentID == 17
join d in dbContext.Departments on st.DepartmentID equals d.DepartmentID
join sv in dbContext.SoftwareVersions on st.SoftwareVersionID equals sv.SoftwareVersionID
join stat in dbContext.Statuses on st.StatusID equals stat.StatusID
join m in dbContext.Marks on st.MarkID equals m.MarkID into marksGroup
from m in marksGroup.DefaultIfEmpty()
select new
{
student = st.StudentName,
department = p.DepartmentName,
software = sv.SoftwareVersionName,
status = st.StatusName,
marked = m != null ? m.MarkName : "-- Not marked --"
};
I believe Example section from How to: Perform Left Outer Joins MSDN page is really well explained. Let's project it to your example. To quote first paragraph from the page
The first step in producing a left outer join of two collections is to
perform an inner join by using a group join. (See How to: Perform
Inner Joins (C# Programming Guide) for an explanation of this
process.) In this example, the list of Person objects is inner-joined
to the list of Pet objects based on a Person object that matches
Pet.Owner.
So in your case, the first step is to perform an inner join of list of Students objects with the list of Marks objects based on MarkID in Students object matches MarkID in Marks object. As can be seen in the quote, inner join is being performed using group join. If you check Note section in MSDN page on how to perform group join, you can see that
Each element of the first collection appears in the result set of a
group join regardless of whether correlated elements are found in the
second collection. In the case where no correlated elements are found,
the sequence of correlated elements for that element is empty. The
result selector therefore has access to every element of the first
collection.
What this means in the context of your example, is that by using into you have group joined results where you have all Students objects, and sequence of correlated elements of Marks objects (in case there is no matching Marks objects, the sequence is going to be empty).
Now let's go back to How to: Perform Left Outer Joins MSDN page, in particular second paragraph
The second step is to include each element of the first (left)
collection in the result set even if that element has no matches in
the right collection. This is accomplished by calling DefaultIfEmpty
on each sequence of matching elements from the group join. In this
example, DefaultIfEmpty is called on each sequence of matching Pet
objects. The method returns a collection that contains a single,
default value if the sequence of matching Pet objects is empty for any
Person object, thereby ensuring that each Person object is represented
in the result collection.
Again, to project this to your example, DefaultIsEmpty() is being called on each sequence of matching Marks objects. As explained above, the method returns a collection that contains a single, default value if the sequence of matching Marks objects is empty for any Student object, which ensures each Student object will be represented in the resulting collection. As a result what you have is set of elements, that contain all Student objects, and matching Marks object, or if there is no matching Marks object, default value of Marks, which in this case is null.
what I can say is that "into MarksGroup" stores the result data of your joined tables into a temporary (application based, not database based) resultset (in sql terms: a table, so its a SELECT INTO)
In the next line, your code then selects from Marksgroup the columns with your data (in sql terms: SELECT student, department, software, status, marked FROM Marksgroup
So basically, it's getting your data from the db, then putting it aside to "Marksgroup, and in the very next step getting Marksgroup back in your fingers to take out the data you want to use in your c# code.
Try to get rid of Marksgroup, it should be possible (haven't tested ist with your code). It should be something like this:
from st in dbContext.Students where st.DepartmentID == 17
join d in dbContext.Departments on st.DepartmentID equals d.DepartmentID
join sv in dbContext.SoftwareVersions on st.SoftwareVersionID equals sv.SoftwareVersionID
join stat in dbContext.Statuses on st.StatusID equals stat.StatusID
join m in dbContext.Marks on st.MarkID equals m.MarkID
select new
{
student = st.StudentName,
department = p.DepartmentName,
software = sv.SoftwareVersionName,
status = st.StatusName,
marked = m != null ? m.MarkName : "-- Not marked --"
};
Your second question with 'm' : This should also show a different behaviour without your temporary resultset "Marksgroup"
I've periodically seen it written that joins are unnecessary in LINQ to SQL. Most recently, I saw this statement in one of Joseph Albahari's LINQPad samples (Chapter 9 - LINQ Operators > Filtering > Joining > Simple Join).
The comment says:
// Note: before delving into this section, make sure you've read the preceding two
// sections: Select and SelectMany. The Join operators are actually unnecessary
// in LINQ to SQL, and the equivalent of SQL inner and outer joins is most easily
// achieved in LINQ to SQL using Select/SelectMany and subqueries!
I've gone through the Select and SelectMany sections in LINQPad and I definitely want to be doing this the easy way, but my attempts to completely remove joins (and get the same results) have failed.
Anyway, below is the 100% working query I'm trying this out on (full schema pictured below).
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
join projectsBillingSchedule in dbContext.ProjectsBillingSchedules
on workOrder.ProjectId equals projectsBillingSchedule.ProjectId
join partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
on projectsBillingSchedule.BillingSchId equals partyPricing.BILLING_SCH_ID
join measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
on partyPricing.PARTY_RETROFIT_CODE_ID equals measuresPartyRetrofitCode.PartyRetrofitCodeId
join measure in dbContext.Measures on measuresPartyRetrofitCode.ConvId equals measure.ConvId
select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)
Please note, certain entities are omitted from the code because they are not strictly necessary for the query to run properly. Aside from that, you can see the joins are done in order of the relationships in the schema image, i.e., from WORK_ORDERS to MEASURES (start by moving away from WORK_ORDER_LINES):
I have tried some using navigation properties, but I run into 2 problems:
The SQL outputs in multiple statements (N+1 problem?), and
I can't seem to get it all in one statement.
So, back to my question - using the example above (or something else with a lot of joins), how are join operators unnecessary in LINQ to SQL?
UPDATE
Ok, I think I've figured out one solution, but this actually requires more lines of code to achieve the same result.
Since that is the case, I'm not sure why it is worth exclaiming that join operators are unnecessary. I'll leave this question open for a while to see if someone wants to make a compelling case against joins.
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
from projectsBillingSchedule in dbContext.ProjectsBillingSchedules
where workOrder.ProjectId == projectsBillingSchedule.ProjectId
from partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
where projectsBillingSchedule.BillingSchId == partyPricing.BILLING_SCH_ID
from measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
where partyPricing.PARTY_RETROFIT_CODE_ID == measuresPartyRetrofitCode.PartyRetrofitCodeId
from measure in dbContext.Measures
where measure.ConvId == measuresPartyRetrofitCode.ConvId
select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)
I have three tables (Table A, B and C).
I would like to do the following:
Left Join A with B and Left join A with C.
Now I have used CreateCriteria, to do the joins using jointype which worked upto a certain point but it's throwing Query exception. It seems this is because it's seems to attempt to left join B with C rather than A and C.
Code:
currencies = session.CreateCriteria(typeof(Currency), "TableA")
.CreateCriteria("FXRates", "TableB",
JoinType.LeftOuterJoin,
Expression.Eq("fxrate.RateDate",date))
.CreateCriteria("FundingRates", "TableC",
JoinType.LeftOuterJoin,
Expression.Eq("fundingrate.RateDate", date))
.Add(Restrictions.IsNotNull("currency.code"))
.List<Currency>();
Apologies in advanced if I have missed out anything or not provided enough detail, let me know if you need more...
You can use NHibernate "QueryOver" to do it:
session.QueryOver<Item_A>()
.Left.JoinQueryOver(item_A => item_A.Item_B)
.Left.JoinQueryOver(item_A => item_A.Item_C)
.TransformUsing(Transformers.DistinctRootEntity)
.List();
I can't seem to find any solid answer to the problem, I'm hoping someone will be able to help me here.
Sample query:
select * from A a inner join B b on a.Id = b.Id Or a.Date = b.Date
Basically I want to know if it's possible to implement the second part of the join condition using criteria, and if it is possible, how to go about it. If anyone can please let me know, that will be great! Thanks a bunch!
It might be, but that query is more clear written in HQL:
select a from A a, B b where a.Id = b.Id or a.Date = b.Date
As you can see, it's almost the same as the SQL.
Unfortunately you cannot define ANSI syntax joins with NHibernate.
With NH2 and onward you can define them on HQL using a WITH clause and i say so because if you use Diego's solution you will have to set up a ISNULL OR pattern in your queries if you want to perform multiple joins.
to add conditions, use Expressions. For the OR disjunction, its less complex if you use Expresion.In
session.CreateCriteria(typeof(A), "a").CreateCriteria("B", "b", NHibernate.SqlCommand.JoinType.FullJoin)
.Add(Expression.Eq("a.Date", a.Date))
.Add(Expression.Eq("b.Date", b.Date))
This question already has an answer here:
Reason of equals keyword in LINQ's join statement
(1 answer)
Closed 8 months ago.
I always wondered why there's an equals keyword in linq joins rather than using the == operator.
Property deadline =
(from p in properties
join w in widgets
on p.WidgetID equals w.ID
select p).First();
Instead of
Property deadline =
(from p in properties
join w in widgets
on p.WidgetID == w.ID
select p).First();
[EDIT] Rephrased the question and revised the examples.
There's a nice explanation by Matt Warren at The Moth:
"The reason C# has the word ‘equals’ instead of the ‘==’ operator was to make it clear that the ‘on’ clause needs you to supply two separate expressions that are compared for equality not a single predicate expression. The from-join pattern maps to the Enumerable.Join() standard query operator that specifies two separate delegates that are used to compute values that can then be compared. It needs them as separate delegates in order to build a lookup table with one and probe into the lookup table with the other. A full query processor like SQL is free to examine a single predicate expression and choose how it is going to process it. Yet, to make LINQ operate similar to SQL would require that the join condition be always specified as an expression tree, a significant overhead for the simple in-memory object case."
However, this concerns join. I'm not sure equals should be used in your code example (does it even compile?).
Your first version doesn't compile. You only use equals in joins, to make the separate halves of the equijoin clear to the compiler.