Having some trouble understanding Linq's INTO keyword

1)
The into keyword creates a temporary identifier for storing the results of a join,
group or select clause.
I assume the into keyword can only be used as part of a group, join or select clause?
2)
a) I've read that when into is used as part of a group or select clause, it splices the query into two halves, and because of that, range variables declared in the first half of the query ALWAYS go out of scope in the second half of the query. Correct?
b) But when into is used as part of a join clause, range variables NEVER go out of scope within the query (unless the query also contains group...into or select...into). I assume this is because into does not splice the query into two halves when used with a join clause?
c)
A query expression consists of a from clause followed by an optional query body (from, where, let clauses) and must end with either a select or a group clause.
d) If into indeed splices query into two halves, is in the following example group clause part of the body:
var result = from c1 in a1
group c1 by c1.name into GroupResult
select ...
thank you
Reply to Ufuk:
a)
After a group by you get a sequence like this
IEnumerable<Key,IEnumerable<Foo>>
Doesn't a GroupBy operator return a result of type IEnumerable<IGrouping<Key,Foo>> and not IEnumerable<Key,IEnumerable<Foo>>?
b) Couldn't we argue that group...by...into or join...into do splice the query, in the sense that the first half of the query must, at least conceptually, run before the second half can run?
Reply to Robotsushi:
The more I think about it, the more I get the feeling that my question is pretty pointless, since it has no practical value whatsoever. Still...
When you say it gets split. Do you mean the scope of the variables
gets split or the sql query generated gets split
Here is the quote:
In many cases the range variables on one side of this divide cannot be
mixed with the range variables on the other side. The into keyword
that is part of this group-by clause is used to link or splice the
two halves of this query. As such, it marks the boundary in the midst
of the query over which range variables typically cannot climb. The
range variables above the into keyword go out of scope in the last
part of this query.
My question is whether both halves are still considered a single query and as such the entire query still consists of just three parts. If that is the case, then in my code example ( under d) ) group clause is part of the body. But if both halves are considered two queries, then each of the two queries will consist of three parts
Second reply to Robotsushi:
This chunk of your query is evaluated as one data pull.
I'm not familiar with the term "data pull", so I'm going to guess that what you were trying to say is that first half of the query executes/evaluates as a unit, and then second half of the query takes the results from the first half and uses the results in its execution/evaluation? In other words, conceptually we have two queries?

group... by ...into
A group by has to provide a different kind of sequence after the operation.
You have a sequence like this:
IEnumerable<Foo>
After a group by you get a sequence like this:
IEnumerable<Key,IEnumerable<Foo>>
Now your items are in nested sequences and you don't have direct access to them. That's why identifiers in the first part are out of scope. Since your first part is out of scope, you are left with the identifier after the into. The first query has ended and a new one can begin. The second part of your query works on a totally different sequence from the first one. It's a continuation.
from foo in foolist
group foo by foo.name into grouped
//foo is out of scope, you are working on a different sequence now
//and you have a ready to use range variable for your second query
join ... on ... into
On the other hand, a group join is not that kind of operation. It operates on two sequences, where group by operates on one. It provides the matching elements of the right sequence for each element of the left sequence.
IEnumerable<Left> and IEnumerable<Right>
After the operation you can still use the identifier from the left sequence, but the identifier from the right sequence is out of scope. That's because the join now returns a sequence of them, so again you don't have direct access to the individual elements. The outcome of a group join is like:
IEnumerable<Left,IEnumerable<Right>>
When you use a group join, only the right range variable goes out of scope. The left part remains, so you are still working on the same sequence. You haven't provided a projection yet, so you can't continue with a second query.
from left in leftList
join right in rightList
on left.Key equals right.Key into joinedRights
// left is still your range variable, you are still enumerating leftList
// you have to provide a projection here but you won't have a ready to use range variable
// that's why it's not a continuation.
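The scoping difference described above can be tried out with a small LINQ to Objects sketch. The Foo and Right classes and the sample data are hypothetical, made up only for illustration; the two query shapes follow the snippets in this answer, with a select added so that they compile.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntoScopeDemo
{
    // Hypothetical sample types; they do not come from the original posts.
    public class Foo { public string Name { get; set; } }
    public class Right { public string Key { get; set; } }

    // group ... into: a query continuation.
    public static List<string> GroupInto(IEnumerable<Foo> fooList)
    {
        return (from foo in fooList
                group foo by foo.Name into grouped
                // "foo" is out of scope here; only "grouped" can be used.
                orderby grouped.Key
                select grouped.Key + ":" + grouped.Count()).ToList();
    }

    // join ... into: a group join, not a continuation.
    public static List<string> JoinInto(IEnumerable<Foo> leftList, IEnumerable<Right> rightList)
    {
        return (from left in leftList
                join right in rightList on left.Name equals right.Key into joinedRights
                // "left" is still in scope; "right" is not.
                select left.Name + ":" + joinedRights.Count()).ToList();
    }

    public static void Main()
    {
        var foos = new[] { new Foo { Name = "a" }, new Foo { Name = "b" }, new Foo { Name = "a" } };
        var rights = new[] { new Right { Key = "a" }, new Right { Key = "a" }, new Right { Key = "c" } };

        Console.WriteLine(string.Join(", ", GroupInto(foos)));        // a:2, b:1
        Console.WriteLine(string.Join(", ", JoinInto(foos, rights))); // a:2, b:0, a:2
    }
}
```

Referring to foo after the into in GroupInto, or to right after the into in JoinInto, is a compile-time error, which is the scoping rule the question asks about.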

1) Correct. To be more specific, into provides a reference to the results of a join, group, or select clause that would otherwise be out of scope.
2) I don't think your query is split as a result of using into, as its most common usage is:
The use of into in a group clause is only necessary when you want to
perform additional query operations on each group
Added Response
I've read that when into is used as a part of group or select
clauses, it splices the query in two halves and because of that range
variables declared in first half of the query ALWAYS go out of scope
in the second half of the query. Correct?
This chunk of your query is evaluated as one data pull. The group keyword requires a sort operation to continue evaluation of your LINQ Query:
from c1 in a1
group c1 by c1.name into GroupResult
So in the following select:
select ...
The variables from the first part of the query would have been evaluated, however since you include the into keyword you can work with the results of the query in the select because they are stored into the GroupResult variable.
But when into is used as part of the join clause, range variables
NEVER go out of the scope within the query ( unless query also
contains group...into or select...into ). I assume this is due to into
not splicing the query in two halves when used with join clause?
The query is still evaluated in two parts however the GroupResult gives you access to what was declared before the group keyword.
A query expression consists of a from clause followed by optional
query body ( from,where,let clauses ) and must end with either select
or group clause.
This is a definition not a question.
If into indeed splices query into two halves, is in the following
example group clause part of the body:
The group is part of the first half of the query.
This LINQ query shown would generate one sql statement just in case you were curious.
2nd Update
I'm not familiar with the term "data pull", so I'm going to guess that
what you were trying to say is that first half of the query
executes/evaluates as a unit, and then second half of the query takes
the results from the first half and uses the results in its
execution/evaluation? In other words, conceptually we have two
queries?
Yes there are two different parts of the query.

Related

force to use join in nhibernate syntax instead of iteration

I am trying to figure out why an NHibernate query iterates over values instead of using joins internally. This iteration makes it slower, because it goes through all the values one by one,
i.e. it generates n queries and executes them one by one instead of using a join.
documentClrType is evaluated dynamically at runtime, so I can't use it directly in the QueryOver<> syntax.
documentClrType is FactSheetPrivate as of now.
I looked at the queries in the logger; they look something like this:
select * from foo where col1=#val1
select * from foo where col1=#val2
select * from foo where col1=#val3
select * from foo where col1=#val4
So:
How can I turn this query into joins instead of iterations?
What would the syntax be for these dynamic types with QueryOver?
I am a bit new to NHibernate; any guidance will be appreciated.
var criteria =
    store.Session
        .CreateCriteria(documentClrType)
        .Add(Restrictions.Disjunction()
            .Add(Restrictions.Le("CreationDate", DateTime.Now))
            .Add(Restrictions.Le("AccurateDate", DateTime.Now)));
criteria = criteria.CreateCriteria("Entity")
    .Add(Restrictions.Eq("DBTypeString", receiverType));
return criteria.List<IDocument>();
// At this line, instead of a join, the query iterates over the values one by one.
[Screenshot: NHibernate XML mapping file for Entity]
[Screenshot: NHibernate XML mapping file for Packets]

NHibernate has a feature to batch the selects:
configuration.SetProperty(NHibernate.Cfg.Environment.DefaultBatchFetchSize, "20");
Or use eager fetching:
return criteria.SetFetchMode("TheReferenceOrCollectionProperty", FetchMode.Eager).List<IDocument>();

What is the correct order to use LinQ statements?

I often use LINQ statements to query with EF, to filter data, or to search my data collections, but I've always wondered which statement should be written first.
Let's say we have a query similar to this:
var result = Data.Where(x => x.Text.StartsWith("ABC")).OrderBy(x => x.Id).Select(x => x.Text).Take(5).ToList();
The same query works even if the statements are in different order, for example:
var result = Data.OrderBy(x => x.Id).Select(x => x.Text).Where(x => x.Text.StartsWith("ABC")).Take(5).ToList();
I understand that there are certain statements that do modify the expected result, but my doubt is with those that do not modify, as in the previous example. Does a specified order or any good practice guide exist for this?

It will give you different results. Let's assume that you have the following ids:
6,5,4,3,2,1
The first statement will give you
1,2,3,4,5
and the second one
2,3,4,5,6
I assumed that the Text of every object with these ids starts with ABC.
Edit: I think I haven't answered the question properly. Yes, there is a difference. In the first example you only sort 5 elements however in the second one you order all elements which is definitely slower than the first one.

Does a specified order or any good practice guide exist for this?
No, because the order determines what the result is. In SQL (a declarative language), SELECT always comes before WHERE, which comes before GROUP BY, etc., and the parsing engine turns that into an execution plan which will execute in whatever order the optimizer thinks is best.
So selecting, then ordering, then grouping all happens on the data specified by the FROM clause(s), so order does not matter.
C# (within methods) is a procedural language, meaning that statements will be executed in the exact order that you provide them.
When you select, then order, the ordering applies to the selection, meaning that if you select a subset of fields (or project to different fields), the ordering applies to the projection. If you order, then select, the ordering applies to the original data, then the projection applies to the ordered data.
In your second edited example, the query seems to be broken because you are specifying properties that would be lost from the projection:
var result = Data.OrderBy(x => x.Id).Select(x => x.Text).Where(x => x.Text.StartsWith("ABC")).Take(5).ToList();
^
At this (^) point, you are projecting just the Text property, which I'm assuming is a string, and thus the subsequent Where is working on a collection of strings, which would not have a Text property to filter on.
Certainly you could change the Where to filter the strings directly, but it illustrates that shifting the order of commands can have a catastrophic impact on the query. It might not make a difference, as you are trying to illustrate: for example, ordering then filtering should be logically equivalent to filtering then ordering (assuming that one doesn't impact the other). There's no "best practice" to say which should go first, so the right answer (if there is one) has to be determined on a case-by-case basis.
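To make the effect of operator order concrete, here is a minimal LINQ to Objects sketch. The Item class and the sample ids 6,5,4,3,2,1 are assumptions made up for illustration (every Text starts with "ABC"); it only shows in-memory behaviour, not what EF would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorOrderDemo
{
    // Hypothetical element type standing in for the question's Data items.
    public class Item { public int Id { get; set; } public string Text { get; set; } }

    public static List<Item> Sample()
    {
        return new[] { 6, 5, 4, 3, 2, 1 }
            .Select(id => new Item { Id = id, Text = "ABC" + id })
            .ToList();
    }

    // Filter, sort everything, then keep the first five.
    public static List<int> OrderThenTake(IEnumerable<Item> data)
    {
        return data.Where(x => x.Text.StartsWith("ABC")).OrderBy(x => x.Id).Select(x => x.Id).Take(5).ToList();
    }

    // Keep the first five in source order, then sort only those five.
    public static List<int> TakeThenOrder(IEnumerable<Item> data)
    {
        return data.Where(x => x.Text.StartsWith("ABC")).Take(5).OrderBy(x => x.Id).Select(x => x.Id).ToList();
    }

    // After Select(x => x.Text) the elements are strings,
    // so the Where lambda has to test the string itself.
    public static List<string> ProjectThenFilter(IEnumerable<Item> data)
    {
        return data.OrderBy(x => x.Id).Select(x => x.Text).Where(t => t.StartsWith("ABC")).Take(5).ToList();
    }

    public static void Main()
    {
        var data = Sample();
        Console.WriteLine(string.Join(",", OrderThenTake(data)));     // 1,2,3,4,5
        Console.WriteLine(string.Join(",", TakeThenOrder(data)));     // 2,3,4,5,6
        Console.WriteLine(string.Join(",", ProjectThenFilter(data))); // ABC1,ABC2,ABC3,ABC4,ABC5
    }
}
```

Moving Take in front of OrderBy changes which elements survive, while moving Where behind Select only changes what the lambda parameter is.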

Selecting Consecutive String Entries with LINQ to Entities

At first you might think this is duplicate of this question but hopefully you will see it is not.
I also want to select groups of rows that are consecutive but consider that this time the entries are telephone numbers, therefore, stored as string.
I have been trying something like:
var numbers = await (from a in context.Telephones
from b in context.Telephones
where Convert.ToInt32(a.Number) < Convert.ToInt32(b.Number) &&
Convert.ToInt32(b.Number) < (Convert.ToInt32(a.Number) + numberQuantity)
group b by new { a.Number }
into myGroup
where myGroup.Count() + 1 == numberQuantity
select myGroup.Key.Number).ToListAsync();
But this fails with:
LINQ to Entities does not recognize the method 'Int32 ToInt32(System.String)' method, and this method cannot be translated into a store expression.
I understand that LINQ to Entities does not support Convert.ToInt32 but I am running out of ideas here to make it work.
So if my database has:
2063717608
2063717609
2063717610
2063717611
2063717613
2063717614
How can I select consecutive rows based on the string values? And when querying for 3 consecutive numbers get results like:
From 2063717608 to 2063717610
From 2063717609 to 2063717611

1- If you are aware of the performance side effect of calling AsEnumerable(), cast your query and do the conversion in memory on the retrieved entities.
2- If you don't want solution #1, you have to find a way to solve the conversion problem:
2-1- Either change the column type in the database to int,
2-2- Or use one of the solutions previously proposed by other developers, such as:
Problem with converting int to string in Linq to entities
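As a rough sketch of option 1, the conversion and the search for consecutive runs can be done in memory once the strings have been retrieved. The ConsecutiveRuns helper below is hypothetical and works on a plain sequence of strings; with EF it would be fed the numbers after AsEnumerable() or ToListAsync(), which means the whole column is loaded first. It assumes numberQuantity is at least 1 and that every entry parses as a number.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConsecutiveNumbersDemo
{
    // Hypothetical helper: returns one line per run of numberQuantity consecutive numbers.
    public static List<string> ConsecutiveRuns(IEnumerable<string> numbers, int numberQuantity)
    {
        // Parse in memory; long avoids overflow for long phone numbers.
        var set = new HashSet<long>(numbers.Select(n => long.Parse(n)));

        return set.OrderBy(n => n)
            // A start qualifies when each of the next numberQuantity - 1 values is present.
            .Where(start => Enumerable.Range(1, numberQuantity - 1)
                                      .All(offset => set.Contains(start + offset)))
            .Select(start => "From " + start + " to " + (start + numberQuantity - 1))
            .ToList();
    }

    public static void Main()
    {
        var stored = new[]
        {
            "2063717608", "2063717609", "2063717610",
            "2063717611", "2063717613", "2063717614"
        };

        foreach (var line in ConsecutiveRuns(stored, 3))
        {
            Console.WriteLine(line);
        }
        // From 2063717608 to 2063717610
        // From 2063717609 to 2063717611
    }
}
```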

Which LINQ expression do I need for this, without looping?

I have an MSSQL database with LINQ to SQL.
I have three tables.
Requests -> id, string name
Results -> id, requestID, int jumps
Places -> id, resultID, int location
Then, using an input string, I need to get an ICollectable or array or something of Place which meets the following:
Each Request that has name=input, take its ID.[you can assume only one has]
Each Result that has requestID=ID[from above] - take its id.
Each Place that has resultID='id[from above]' - append to array for further processing.
I did it by looping over all Results and then executing another LINQ statement, but it's extremely slow [about 500ms for a single request!]. Can I make it any faster?
Thank you!
Edit: Whoops, I also need it grouped by result. aka a List of List of Places, while each inner list contains one column from Result.

You can perform table joins in Linq2Sql using the join keyword:
var places = from request in Requests
join result in Results on request.Id equals result.requestID
join place in Places on result.Id equals place.ResultId
where request.name == input
select place;
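The question's edit also asks for the places grouped by result. One possible shape, sketched here with LINQ to Objects, replaces the last join with a group join (join ... into). The Request, Result and Place classes and their property names are hypothetical stand-ins for the three tables, and whether LINQ to SQL translates this into a single efficient statement has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlacesByResultDemo
{
    // Hypothetical in-memory stand-ins for the three tables.
    public class Request { public int Id { get; set; } public string Name { get; set; } }
    public class Result { public int Id { get; set; } public int RequestId { get; set; } }
    public class Place { public int Id { get; set; } public int ResultId { get; set; } public int Location { get; set; } }

    // Returns one inner list of locations per matching result.
    public static List<List<int>> LocationsByResult(
        IEnumerable<Request> requests, IEnumerable<Result> results,
        IEnumerable<Place> places, string input)
    {
        return (from request in requests
                where request.Name == input
                join result in results on request.Id equals result.RequestId
                // Group join: collects all places of the current result.
                join place in places on result.Id equals place.ResultId into resultPlaces
                select resultPlaces.Select(p => p.Location).ToList()).ToList();
    }

    public static void Main()
    {
        var requests = new[] { new Request { Id = 1, Name = "r1" }, new Request { Id = 2, Name = "r2" } };
        var results = new[]
        {
            new Result { Id = 10, RequestId = 1 },
            new Result { Id = 11, RequestId = 1 },
            new Result { Id = 12, RequestId = 2 }
        };
        var places = new[]
        {
            new Place { Id = 100, ResultId = 10, Location = 5 },
            new Place { Id = 101, ResultId = 10, Location = 6 },
            new Place { Id = 102, ResultId = 11, Location = 7 },
            new Place { Id = 103, ResultId = 12, Location = 8 }
        };

        foreach (var group in LocationsByResult(requests, results, places, "r1"))
        {
            Console.WriteLine(string.Join(",", group));
        }
        // 5,6
        // 7
    }
}
```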

Something like
Requests.Where(r => r.name == input).SelectMany(r => r.Results).SelectMany(res => res.Places);
If this is too slow then I expect you need some indexes on your database.
If you don't have the relationships in your model then you need to establish some foreign key constraints on your tables and rebuild your model.

How can I make this SelectMany use a Join?

Given that I have three tables (Customer, Orders, and OrderLines) in a Linq To Sql model where
Customer -- One to Many -> Orders -- One to Many -> OrderLines
When I use
var customer = Customers.First();
var manyWay = from o in customer.CustomerOrders
from l in o.OrderLines
select l;
I see one query getting the customer, that makes sense. Then I see a query for the customer's orders and then a single query for each order getting the order lines, rather than joining the two. Total of n + 1 queries (not counting getting customer)
But if I use
var tableWay = from o in Orders
from l in OrderLines
where o.Customer == customer
&& l.Order == o
select l;
Then instead of seeing a single query for each order getting the order lines, I see a single query joining the two tables. Total of 1 query (not counting getting customer)
I would prefer to use the first LINQ query, as it seems more readable to me, but why isn't L2S joining the tables as I would expect in the first query? Using LINQPad I see that the second query is compiled into a SelectMany, though I see no alteration to the first query; I'm not sure whether that's an indicator of some problem in my query.

I think the key here is
customer.CustomerOrders
That's an EntitySet, not an IQueryable, so your first query doesn't translate directly into a SQL query. Instead, it is interpreted as many queries, one for each Order.
That's my guess, anyway.

How about this:
Customers.First().CustomerOrders.SelectMany(item => item.OrderLines)

I am not 100% sure. But my guess is because you are traversing down the relationship that is how the query is built up, compared to the second solution where you are actually joining two sets by a value.

So after Francisco's answer and experimenting with LINQPad I have come up with a decent workaround.
var lines = from c in Customers
where c == customer
from o in c.CustomerOrders
from l in o.OrderLines
select l;
This forces the EntitySet into an Expression which the provider then turns into the appropriate query. The first two lines are the key, by querying the IQueryable and then putting the EntitySet in the SelectMany it becomes an expression. This works for the other operators as well, Where, Select, etc.

Try this query:
IQueryable<OrderLine> query =
from c in myDataContext.customers.Take(1)
from o in c.CustomerOrders
from l in o.OrderLines
select l;
You can go to the CustomerOrders property definition and see how the property acts when it is used with an actual instance. When the property is used in a query expression, the behavior is up to the query provider; the property code is usually not run in that case.
See also this answer, which demonstrates a method that behaves differently in a query expression, than if it is actually called.
