Left outer join using LINQ -- understanding the code

I would be grateful if someone could explain what the into keyword means in LINQ. More generally, I am trying to understand how to write INNER JOIN, LEFT OUTER JOIN, etc. in C#.
I have a main table, Students, that stores several foreign key IDs, which are replaced by their names when the query runs. The names are read from lookup tables such as Marks, SoftwareVersions, Departments, etc. All fields are required except MarkID. The query I am trying to build in LINQ is this:
SELECT * FROM dbo.Students
INNER JOIN dbo.Departments ON dbo.Students.DepartmentID=dbo.Departments.DepartmentID
INNER JOIN dbo.SoftwareVersions ON dbo.Students.SoftwareVersionID=dbo.SoftwareVersions.SoftwareVersionID
INNER JOIN dbo.Statuses ON dbo.Students.StatusID=dbo.Statuses.StatusID
LEFT JOIN dbo.Marks ON dbo.Students.MarkID=dbo.Marks.MarkID
WHERE dbo.Students.DepartmentID=17;
After reading plenty of articles and watching some videos I somehow managed to get the code below working, but I don't feel I fully understand it. The bits that confuse me are the 5th line, which ends with into, and the very next line, which begins with from m .... What does into do, and what really happens in from m ...? This is the LINQ code:
var result = from st in dbContext.Students where st.DepartmentID == 17
             join d in dbContext.Departments on st.DepartmentID equals d.DepartmentID
             join sv in dbContext.SoftwareVersions on st.SoftwareVersionID equals sv.SoftwareVersionID
             join stat in dbContext.Statuses on st.StatusID equals stat.StatusID
             join m in dbContext.Marks on st.MarkID equals m.MarkID into marksGroup
             from m in marksGroup.DefaultIfEmpty()
             select new
             {
                 student = st.StudentName,
                 department = d.DepartmentName,
                 software = sv.SoftwareVersionName,
                 status = stat.StatusName,
                 marked = m != null ? m.MarkName : "-- Not marked --"
             };

I believe the Example section of the How to: Perform Left Outer Joins MSDN page explains this really well. Let's project it onto your example. To quote the first paragraph from the page:
The first step in producing a left outer join of two collections is to
perform an inner join by using a group join. (See How to: Perform
Inner Joins (C# Programming Guide) for an explanation of this
process.) In this example, the list of Person objects is inner-joined
to the list of Pet objects based on a Person object that matches
Pet.Owner.
So in your case, the first step is to perform an inner join of the list of Students objects with the list of Marks objects, matching MarkID in the Students object against MarkID in the Marks object. As the quote says, the inner join is performed using a group join. If you check the Note section of the MSDN page on how to perform a group join, you can see that:
Each element of the first collection appears in the result set of a
group join regardless of whether correlated elements are found in the
second collection. In the case where no correlated elements are found,
the sequence of correlated elements for that element is empty. The
result selector therefore has access to every element of the first
collection.
What this means in the context of your example is that by using into you get group-joined results containing every Students object together with its sequence of correlated Marks objects (if there are no matching Marks objects, the sequence is empty).
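To make the group join step concrete, here is a minimal in-memory sketch. The arrays and their sample values are hypothetical stand-ins for the Students and Marks tables (not the original dbContext), and the cast to int? is only there so both join keys have the same type.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Students and Marks tables.
var students = new[]
{
    new { StudentName = "Ann", MarkID = (int?)1 },
    new { StudentName = "Bob", MarkID = (int?)null }, // MarkID is optional
};
var marks = new[]
{
    new { MarkID = 1, MarkName = "Pass" },
};

// "into marksGroup" makes this a group join: one result per student,
// each with the (possibly empty) sequence of its matching marks.
var grouped = (from st in students
               join m in marks on st.MarkID equals (int?)m.MarkID into marksGroup
               select new { st.StudentName, MatchCount = marksGroup.Count() }).ToList();

foreach (var g in grouped)
    Console.WriteLine($"{g.StudentName}: {g.MatchCount} matching mark(s)");
```

Both students appear in the output; Bob simply has an empty group.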
Now let's go back to the How to: Perform Left Outer Joins MSDN page, in particular the second paragraph:
The second step is to include each element of the first (left)
collection in the result set even if that element has no matches in
the right collection. This is accomplished by calling DefaultIfEmpty
on each sequence of matching elements from the group join. In this
example, DefaultIfEmpty is called on each sequence of matching Pet
objects. The method returns a collection that contains a single,
default value if the sequence of matching Pet objects is empty for any
Person object, thereby ensuring that each Person object is represented
in the result collection.
Again, to project this onto your example, DefaultIfEmpty() is called on each sequence of matching Marks objects. As explained above, the method returns a collection that contains a single default value if the sequence of matching Marks objects is empty for a Student object, which ensures that each Student object is represented in the resulting collection. The result is a set of elements containing every Student object paired with its matching Marks object or, if there is no matching Marks object, with the default value of Marks, which in this case is null.
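The two steps together can be sketched on in-memory data as follows. The sample arrays are hypothetical stand-ins for the real tables, so treat this as an illustration of the pattern rather than the exact query EF would run.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Students and Marks tables.
var students = new[]
{
    new { StudentName = "Ann", MarkID = (int?)1 },
    new { StudentName = "Bob", MarkID = (int?)null },
};
var marks = new[]
{
    new { MarkID = 1, MarkName = "Pass" },
};

// Step 1: group join (into marksGroup).
// Step 2: flatten each group; DefaultIfEmpty() yields a single null
// when a student has no matching mark, so the student is kept.
var result = (from st in students
              join m in marks on st.MarkID equals (int?)m.MarkID into marksGroup
              from m in marksGroup.DefaultIfEmpty()
              select new
              {
                  student = st.StudentName,
                  marked = m != null ? m.MarkName : "-- Not marked --"
              }).ToList();

foreach (var row in result)
    Console.WriteLine($"{row.student}: {row.marked}");
```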

What I can say is that "into marksGroup" stores the result of the join in a temporary result set that lives in the application, not in the database (loosely, in SQL terms: a table, so it resembles a SELECT INTO).
In the next line, your code then selects from marksGroup the columns with your data (in SQL terms: SELECT student, department, software, status, marked FROM marksGroup).
So basically, it gets your data from the db, puts it aside in marksGroup, and in the very next step picks marksGroup up again to take out the data you want to use in your C# code.
Try to get rid of marksGroup; it should be possible (I haven't tested it with your code). It should be something like this:
from st in dbContext.Students where st.DepartmentID == 17
join d in dbContext.Departments on st.DepartmentID equals d.DepartmentID
join sv in dbContext.SoftwareVersions on st.SoftwareVersionID equals sv.SoftwareVersionID
join stat in dbContext.Statuses on st.StatusID equals stat.StatusID
join m in dbContext.Marks on st.MarkID equals m.MarkID
select new
{
    student = st.StudentName,
    department = d.DepartmentName,
    software = sv.SoftwareVersionName,
    status = stat.StatusName,
    marked = m != null ? m.MarkName : "-- Not marked --"
};
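Your second question, about 'm': this will also behave differently without your temporary result set marksGroup. Without into, the join is a plain inner join, so students that have no matching mark are left out of the result and m is never null.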
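A small in-memory sketch of that difference, using hypothetical sample data in place of the real tables: the same join without into keeps only students that have a matching mark.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Students and Marks tables.
var students = new[]
{
    new { StudentName = "Ann", MarkID = (int?)1 },
    new { StudentName = "Bob", MarkID = (int?)null },
};
var marks = new[]
{
    new { MarkID = 1, MarkName = "Pass" },
};

// No "into": this is an inner join, so m is never null here
// and unmarked students do not appear at all.
var inner = (from st in students
             join m in marks on st.MarkID equals (int?)m.MarkID
             select new { st.StudentName, m.MarkName }).ToList();

Console.WriteLine(inner.Count); // only Ann is returned
```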

Related

LINQ Join Query Structure Resulting in CS0119 Error

I'm trying to join two different class models in an MVC project together so I can order them ascending/descending. I've tried several permutations but can't seem to get the LINQ query to play nice. Any suggestions on what I'm doing wrong?
var lateContact = from c in JPLatestContact
                  join s in JPStudent on c.ApplicationUserId equals s.ApplicationUserId
                  orderby c.JPLatestContactDate ascending
                  select s;
I'm a beginner when it comes to this, but if I'm understanding this correctly, the "c" and "s" are variables I make up myself. "JPLatestContact" and "JPStudent" are the two models/classes/tables I want to join, and both have "ApplicationUserId" that I can join them on, and I want to order all the results by the value "JPLatestContactDate" found in the JPLatestContact model, in ascending order.
With the query I've written above, I'm getting a CS0119 error "'JPLatestContact' is a type, which is not valid in the given context."
I'm not sure where I'm going wrong with my structure, or have I misused the JOIN structure in some way?

You cannot run a LINQ select on a type, only on a collection of that type - i.e. anything that implements IEnumerable<JPLatestContact> or IQueryable<JPLatestContact>, such as List<JPLatestContact>, dbContext.JPLatestContact, etc. Same goes for JPStudent - you need a collection or IQueryable<JPStudent> for it.
Assuming that you are querying EF, the query should look like this:
var lateContact = from c in dbContext.JPLatestContact
                  join s in dbContext.JPStudent on c.ApplicationUserId equals s.ApplicationUserId
                  orderby c.JPLatestContactDate ascending
                  select s;
Make sure that all entity names and property names match the actual names as defined in your EF model.
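If you are not querying EF, the same idea applies to any in-memory collection. The sketch below uses hypothetical arrays in place of your real data, and the Name property is made up for illustration; the point is only that the query runs over collections of objects, not over the type names.

```csharp
using System;
using System.Linq;

// Hypothetical collections standing in for the JPLatestContact and JPStudent data.
var contacts = new[]
{
    new { ApplicationUserId = "u2", JPLatestContactDate = new DateTime(2020, 6, 1) },
    new { ApplicationUserId = "u1", JPLatestContactDate = new DateTime(2020, 5, 1) },
};
var jpStudents = new[]
{
    new { ApplicationUserId = "u1", Name = "Ann" },
    new { ApplicationUserId = "u2", Name = "Bob" },
};

// The query sources are collections (instances), which is what LINQ needs.
var lateContact = (from c in contacts
                   join s in jpStudents on c.ApplicationUserId equals s.ApplicationUserId
                   orderby c.JPLatestContactDate ascending
                   select s).ToList();

foreach (var s in lateContact)
    Console.WriteLine(s.Name);
```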

DefaultIfEmpty returns empty rows

I've been trying to perform a Left Join kind of expression in LINQ to Entities, however the DefaultIfEmpty method works differently to what I expected - it returns an empty row for each CounterNo that doesn't have a match in the Readings table.
var leftjoin = from counter in database.Counters
join reading in database.Readings
on counter.CounterNo equals reading.CounterNo into gj
from x in gj.DefaultIfEmpty()
select x;
This way I don't know which rows from the Counters table don't have a corresponding row in the Readings table.
How do I make this work?

It sounds like you simply don't want the from x in gj.DefaultIfEmpty() line. Instead, you want each item in the left table paired with a group of items from the right table. That group may have zero elements, which is how you know when there are no matching items, and it is exactly the behavior you get when you remove that line and select the counter together with gj.
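A minimal in-memory sketch of that shape, with hypothetical sample data (the Value property is made up for illustration). Whether LINQ to Entities translates the projection the same way is something you would need to check against your own model.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for database.Counters and database.Readings.
var counters = new[] { new { CounterNo = 1 }, new { CounterNo = 2 } };
var readings = new[] { new { CounterNo = 1, Value = 42 } };

// Group join only: keep the counter and its group of readings together.
var withGroups = (from counter in counters
                  join reading in readings
                      on counter.CounterNo equals reading.CounterNo into gj
                  select new { counter.CounterNo, Readings = gj.ToList() }).ToList();

// Counters whose group is empty have no corresponding reading.
var withoutReadings = withGroups.Where(c => c.Readings.Count == 0)
                                .Select(c => c.CounterNo)
                                .ToList();

Console.WriteLine(string.Join(", ", withoutReadings));
```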

Comparison operators not supported for 'System.String[]' in a Linq query

I'm working on a Linq query to join data from two tables (using Linq to SQL), with the logic as follows:
Banners contains a field which has comma separated values in it. I want to split this column and have a list of IDs (for example 1,2,3,4)
References contains a list of these mappings with 1:1 mapping between the ID in banners and the ID in the reference table
Once the tables are merged I want to return the description from the reference table, which is the text representation of the ID.
I've been fiddling with this for a while and have hit a brick wall. Below is the code I am using (in LinqPad):
var results = (from b in Banners
where b.BannerCode == "1234"
from a in b.VesselBoatAreaY.Split (',').AsEnumerable()
join r in References on a equals r.ReferenceCode
where r.Context == "TestContext"
select r.Description).ToList();
I have confirmed that the first part of the query works, i.e. that banner code exists and returns 4 separate values. When I run the query as a whole however I get the following:
NotSupportedException
Comparison operators not supported for type 'System.String[]'.
I have also tried the following:
var results = (from b in Banners
where b.BannerCode == "1234"
from a in b.VesselBoatAreaY.Split (',').AsEnumerable()
from r in References
where r.Context == "TestContext" &&
a.Contains(r.ReferenceCode)
select r.Description).ToList();
When I run this I get the following:
ArgumentException
The argument 'value' was the wrong type. Expected 'System.String'. Actual 'System.String[]'.
Any help appreciated!

Thanks for everyone's help. I've solved the problem and it was actually very easy. As the table I am reading from is quite small, I can apply AsEnumerable to the Banners table and it works fine. I realise this means it will get processed in memory, so it's not good for bigger tables, but it's fine for what I need.
For reference the code is now:
var results = (from b in Banners.AsEnumerable()
where b.BannerCode == "1234"
from a in b.VesselBoatAreaY.Split (',')
from r in References.AsEnumerable()
where r.Context == "TestContext" &&
a.Contains(r.ReferenceCode)
select r.Description).ToList();
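For anyone adapting this, here is a hedged in-memory variant with hypothetical sample rows. It swaps a.Contains(...) for an equality join, which is an assumption about the intent: Contains is a substring test, so a code such as "12" would also match reference codes "1" and "2".

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for the Banners and References tables.
var banners = new[]
{
    new { BannerCode = "1234", VesselBoatAreaY = "1,12" },
};
var references = new[]
{
    new { ReferenceCode = "1",  Context = "TestContext", Description = "Bow" },
    new { ReferenceCode = "2",  Context = "TestContext", Description = "Stern" },
    new { ReferenceCode = "12", Context = "TestContext", Description = "Deck" },
};

// Split in memory, then match each ID exactly against ReferenceCode.
var results = (from b in banners
               where b.BannerCode == "1234"
               from a in b.VesselBoatAreaY.Split(',')
               join r in references on a.Trim() equals r.ReferenceCode
               where r.Context == "TestContext"
               select r.Description).ToList();

Console.WriteLine(string.Join(", ", results)); // prints "Bow, Deck"
```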

Correct Join statement in Linq

I searched for an answer to my question but, being a beginner, I could not follow what I found, so I am asking my own.
I have two tables, SUBMENU and AUTHORIZATION, and I am trying to join them, but since I am new to LINQ it is not clear to me whether to use a left join, a right join, or something else.
Here is what I have done so far:
var _lststage =
from sm in db.SUB_MENUs
join a in db.AUTHORISATIONs
on sm.SUB_MENU_ID equals a.SUB_MENU_ID into joined_autho
from jA in joined_autho.DefaultIfEmpty()
where sm.MENU_ID.Equals(ViewState["MenuId"]) &&
jA.Roleid Equals ddlroleid.Selectedvalue
select new
{
sm.SUB_MENU_ID,
sm.SUB_MENU_NAME,
jA.checkbox,
};
I want to get all the submenus from the submenu table based on the menu ID in view state, and
I need to get the check box value from the Authorization table based on the role ID and submenu ID. If there is no value for that role ID in the Authorization table, a default value of false should be returned.
I hope I have explained my scenario well.
Apologies if this is a duplicate question.

Your query will not even compile. In your where condition, the second Equals should be a method call:
// instead of: jA.Roleid Equals ddlroleid.Selectedvalue
jA.Roleid.Equals(ddlroleid.Selectedvalue)
You also have an unnecessary comma on the last line of the select statement:
// instead of: jA.checkbox,
jA.checkbox
The complete query should look like:
from sm in db.SUB_MENUs
join a in db.AUTHORISATIONs
on sm.SUB_MENU_ID equals a.SUB_MENU_ID into joined_autho
from jA in joined_autho.DefaultIfEmpty()
where sm.MENU_ID.Equals(ViewState["MenuId"]) &&
jA.Roleid.Equals(ddlroleid.Selectedvalue)
select new
{
sm.SUB_MENU_ID,
sm.SUB_MENU_NAME,
jA.checkbox
};
I believe you have tagged your question appropriately, and this is a LINQ to SQL query.
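One more thing worth checking against the stated requirement (all submenus, with false when the role has no row): filtering on jA in the where clause discards the submenus that have no matching authorisation. A possible in-memory sketch, with hypothetical sample data and plain variables standing in for ViewState["MenuId"] and ddlroleid, moves the role filter into the joined sequence instead. I have not verified how LINQ to SQL translates it.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for db.SUB_MENUs and db.AUTHORISATIONs.
var subMenus = new[]
{
    new { SUB_MENU_ID = 1, SUB_MENU_NAME = "Users",   MENU_ID = 10 },
    new { SUB_MENU_ID = 2, SUB_MENU_NAME = "Reports", MENU_ID = 10 },
    new { SUB_MENU_ID = 3, SUB_MENU_NAME = "Other",   MENU_ID = 20 },
};
var authorisations = new[]
{
    new { SUB_MENU_ID = 1, Roleid = "admin", checkbox = true },
    new { SUB_MENU_ID = 2, Roleid = "guest", checkbox = true },
};
int menuId = 10;         // assumed to come from ViewState["MenuId"]
string roleId = "admin"; // assumed to come from the role drop-down

var stage = (from sm in subMenus
             where sm.MENU_ID == menuId
             // Filter by role before the group join so unmatched submenus survive.
             join a in authorisations.Where(x => x.Roleid == roleId)
                 on sm.SUB_MENU_ID equals a.SUB_MENU_ID into joined_autho
             from jA in joined_autho.DefaultIfEmpty()
             select new
             {
                 sm.SUB_MENU_ID,
                 sm.SUB_MENU_NAME,
                 checkbox = jA != null && jA.checkbox // false when there is no row
             }).ToList();

foreach (var row in stage)
    Console.WriteLine($"{row.SUB_MENU_NAME}: {row.checkbox}");
```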

Having some trouble understanding Linq's INTO keyword

1)
into keyword creates temporary identifier for storing results of join,
group or select clauses.
I assume into keyword can only be used as part of group, join or select clauses?
2)
a) I've read that when into is used as a part of group or select clauses, it splices the query in two halves and because of that range variables declared in first half of the query ALWAYS go out of scope in the second half of the query. Correct?
b) But when into is used as part of the join clause, range variables NEVER go out of scope within the query (unless the query also contains group...into or select...into). I assume this is because into does not splice the query in two halves when used with a join clause?
c)
A query expression consists of a from clause followed by an optional query body (from, where, let clauses) and must end with either a select or a group clause.
d) If into indeed splices query into two halves, is in the following example group clause part of the body:
var result = from c1 in a1
group c1 by c1.name into GroupResult
select ...
thank you
Reply to Ufuk:
a)
After a group by you get a sequence like this
IEnumerable<Key,IEnumerable<Foo>>
Doesn't the GroupBy operator return a result of type IEnumerable<IGrouping<Key,Foo>> and not IEnumerable<Key,IEnumerable<Foo>>?
b) Couldn't we argue that group...by...into or join...into do splice the query, in the sense that the first half of the query, at least conceptually, must run before the second half can run?
Reply to Robotsushi:
The more I think about it, the more I get the feeling that my question is pretty pointless, since it has no practical value whatsoever. Still...
When you say it gets split. Do you mean the scope of the variables
gets split or the sql query generated gets split
Here is the quote:
In many cases the range variables on one side of this divide cannot be
mixed with the range variables on the other side. The into keyword
that is part of this group-by clause is used to link or splice the
two halves of this query. As such, it marks the boundary in the midst
of the query over which range variables typically cannot climb. The
range variables above the into keyword go out of scope in the last
part of this query.
My question is whether both halves are still considered a single query and as such the entire query still consists of just three parts. If that is the case, then in my code example ( under d) ) group clause is part of the body. But if both halves are considered two queries, then each of the two queries will consist of three parts
2. reply to Robotsushi:
This chunk of your query is evaluated as one data pull.
I'm not familiar with the term "data pull", so I'm going to guess that what you were trying to say is that first half of the query executes/evaluates as a unit, and then second half of the query takes the results from the first half and uses the results in its execution/evaluation? In other words, conceptually we have two queries?

group... by ...into
A group by has to provide a different kind of sequence after the operation.
You have a sequence like this:
IEnumerable<Foo>
After a group by you get a sequence like this
IEnumerable<Key,IEnumerable<Foo>>
Now your items are in nested sequences and you don't have direct access to them. That's why identifiers in the first part are out of scope. Since your first part is out of scope, you are left with the identifier after the into. The first query has ended and a new one can begin. The second part of the query works on a totally different sequence from the first one. It's a continuation.
from foo in foolist
group foo by foo.name into grouped
//foo is out of scope, you are working on a different sequence now
//and you have a ready to use range variable for your second query
join ... on ... into
On the other hand, group join is not that kind of operation. It operates on two sequences, whereas group by operates on one. It provides the matching elements of the right sequence for each element of the left sequence.
IEnumerable<Left> and IEnumerable<Right>
After the operation it lets you use the identifier from the left sequence, but the identifier on the right is out of scope. That's because the join now returns a sequence of them, so again you don't have direct access to them. The outcome of a group join is like:
IEnumerable<Left,IEnumerable<Right>>
When you use group join, only the right range variable goes out of scope. The left part remains, so you are still working on the same sequence. You haven't provided a projection yet, so you can't continue with a second query.
from left in leftList
join right in rightList
    on left.Key equals right.Key into joinedRights
// left is still your range variable, you are still enumerating leftList
// you have to provide a projection here but you won't have a ready to use range variable
// that's why it's not a continuation.
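Both scoping rules can be seen side by side in a small runnable sketch. The sample arrays and property names are hypothetical and only serve to illustrate which range variables are still usable in the select.

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
var fooList = new[]
{
    new { name = "a", value = 1 },
    new { name = "a", value = 2 },
    new { name = "b", value = 3 },
};
var leftList = new[] { new { Key = "a" }, new { Key = "c" } };

// group ... into: a continuation. After "into", foo is out of scope
// and only "grouped" can be used.
var sums = (from foo in fooList
            group foo by foo.name into grouped
            select new { Name = grouped.Key, Total = grouped.Sum(f => f.value) }).ToList();

// join ... into: a group join. "left" is still in scope after "into";
// only "right" has been replaced by the sequence joinedRights.
var counts = (from left in leftList
              join right in fooList on left.Key equals right.name into joinedRights
              select new { left.Key, Count = joinedRights.Count() }).ToList();

foreach (var s in sums) Console.WriteLine($"{s.Name}: total {s.Total}");
foreach (var c in counts) Console.WriteLine($"{c.Key}: {c.Count} match(es)");
```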

1) Correct. To be more specific, into provides a reference to the results of a join, group, or select clause that would otherwise be out of scope.
2) I don't think your query is split as a result of using into, as its most common usage is:
The use of into in a group clause is only necessary when you want to
perform additional query operations on each group
Added Response
I've read that when into is used as a part of group or select
clauses, it splices the query in two halves and because of that range
variables declared in first half of the query ALWAYS go out of scope
in the second half of the query. Correct?
This chunk of your query is evaluated as one data pull. The group keyword requires a sort operation to continue evaluation of your LINQ Query:
from c1 in a1
group c1 by c1.name into GroupResult
So in the following select:
select ...
The variables from the first part of the query would have been evaluated, however since you include the into keyword you can work with the results of the query in the select because they are stored into the GroupResult variable.
But when into is used as part of the join clause, range variables
NEVER go out of the scope within the query ( unless query also
contains group...into or select...into ). I assume this is due to into
not splicing the query in two halves when used with join clause?
The query is still evaluated in two parts however the GroupResult gives you access to what was declared before the group keyword.
A query expression consists of a from clause followed by optional
query body ( from,where,let clauses ) and must end with either select
or group clause.
This is a definition not a question.
If into indeed splices query into two halves, is in the following
example group clause part of the body:
The group is part of the first half of the query.
In case you were curious, the LINQ query shown would generate a single SQL statement.
2nd Update
I'm not familiar with the term "data pull", so I'm going to guess that
what you were trying to say is that first half of the query
executes/evaluates as a unit, and then second half of the query takes
the results from the first half and uses the results in its
execution/evaluation? In other words, conceptually we have two
queries?
Yes there are two different parts of the query.
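One way to picture the two parts is through the method-call form the compiler produces: per the C# query-expression translation rules, a continuation with into becomes a second operator chained onto the result of the first. The sketch below uses hypothetical sample data and compares both spellings; it is still one deferred query object, composed of two chained steps.

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
var a1 = new[] { new { name = "x" }, new { name = "y" }, new { name = "x" } };

// Query syntax with a continuation.
var querySyntax = (from c1 in a1
                   group c1 by c1.name into GroupResult
                   select $"{GroupResult.Key}:{GroupResult.Count()}").ToList();

// Equivalent method syntax: the part before "into" feeds the part after it.
var methodSyntax = a1.GroupBy(c1 => c1.name)
                     .Select(GroupResult => $"{GroupResult.Key}:{GroupResult.Count()}")
                     .ToList();

Console.WriteLine(string.Join(", ", querySyntax));
Console.WriteLine(querySyntax.SequenceEqual(methodSyntax));
```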
