How does a LINQ expression know that Where() comes before Select()? - c#

I'm trying to create a LINQ provider. I'm using the guide LINQ: Building an IQueryable provider series, and I have added the code up to LINQ: Building an IQueryable Provider - Part IV.
I am getting a feel of how it is working and the idea behind it. Now I'm stuck on a problem, which isn't a code problem but more about the understanding.
I'm firing off this statement:
QueryProvider provider = new DbQueryProvider();
Query<Customer> customers = new Query<Customer>(provider);
int i = 3;
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name}).Where(p => p.Id == 2 | p.Id == i).ToList();
Somehow the code, or expression, knows that the Where comes before the Select. But how and where?
There is no way in the code that sorts the expression, in fact the ToString() in debug mode, shows that the Select comes before the Where.
I was trying to make the code fail. Normal I did the Where first and then the Select.
So how does the expression sort this? I have not done any change to the code in the guide.

The expressions are "interpreted", "translated" or "executed" in the order you write them - so the Where does not come before the Select
If you execute:
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name})
.Where(p => p.Id == 2 | p.Id == i).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the anonymous type.
If you execute:
var newLinqCustomer = customers.Where(p => p.Id == 2 | p.Id == i)
.Select(c => new { c.Id, c.Name}).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the customer type.
The only thing I can think of is that maybe you're seeing some generated SQL where the SELECT and WHERE have been reordered? In which case I'd guess that there's an optimisation step somewhere in the (e.g.) LINQ to SQL provider that takes SELECT Id, Name FROM (SELECT Id, Name FROM Customer WHERE Id=2 || Id=#i) and converts it to SELECT Id, Name FROM Customer WHERE Id=2 || Id=#i - but this must be a provider specific optimisation.

No, in the general case (such as LINQ to Objects) the select will be executed before the where statement. Think of it is a pipeline, your first step is a transformation, the second a filter. Not the other way round, as it would be the case if you wrote Where...Select.
Now, a LINQ Provider has the freedom to walk the expression tree and optimize it as it sees fit. Be aware that you may not change the semantics of the expression though. This means that a smart LINQ to SQL provider would try to pull as many where clauses it can into the SQL query to reduce the amount of data travelling over the network. However, keep the example from Stuart in mind: Not all query providers are clever, partly because ruling out side effects from query reordering is not as easy as it seems.

Related

Filtering CosmosDb using LINQ Query; .Contains or .Any LINQ causing exception

I am using CosmosDB and what to load filtered data. I want to do the filtering on the database side but I had some problems, which I think I managed to resolve but not sure why one way works but not another.
I have a list of CarPositions
// I fill this list from cosmosdb data
var carPosition = new List<CarPositions>()
My goal is to only get the cars if I have their positions in the carPosition list
I tried to do something like this
var iterator = CarContainer
.GetItemLinqQueryable<Car>(true)
.Where(x => carPosition.Any(cp => cp.Id == x.Id))
.ToFeedIterator();
Which throws an exception; "Input is not of type IDocumentQuery"
From what I understood from looking online is that CosmosDb provider does not support ANY linq when translating the query to SQL
This then lead me to try doing the same thing but with using Select and Contain
1)
var iterator = CarContainer
.GetItemLinqQueryable<Car>(true)
.Where(x => carPosition.Select(cp => cp.Id).ToList().Contain(x.Id))
.ToFeedIterator();
Again causing the same exception
I then tried this which works but to me this is doing the exact same thing
2)
var carPosIds = carPosition.Select(cp => cp.Id).ToList();
var iterator = CarContainer
.GetItemLinqQueryable<Car>(true)
.Where(x => carPosIds.Contain(x.Id))
.ToFeedIterator();
Can someone explain why the second one works but not the first? Also why the Any linq did not work?
Because the linq statement needs to be translated into a sql query which will run on the database server. The Contains statement inside the Where statement is supported because the SDK has implemented a translation using the sql IN clause (presumably).
While your Any query seems the same, just imagine it being a different expression and how difficult (or nearly impossible) it would become to translate it into an SQL query as the expression grows more complex. E.g. .Any(cp => cp.Id * 2 - x.Id / 2 == x.Id).
If this would be pure C# code running in memory instead of being translated by the SDK into a query for Cosmos all above methods would work (some better than others).

Is it possible to combine 2 LINQ queries, each filtering data, before fetching the results of the queries?

I need to retrieve data from 2 SQL tables, using LINQ. I was hoping to combine them using a Join. I've looked this problem up on Stack Overflow, but all the questions and answers I've seen involve retrieving the data using ToList(), but I need to use lazy loading. The reason for this is there's too much data to fetch it all. Therefore, I've got to apply a filter to both queries before performing a ToList().
One of these queries is easily specified:
var solutions = ctx.Solutions.Where(s => s.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || s.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear);
It retrieves all the data from the Solution table, where the SolutionNumber starts with either the current or previous year. It returns an IQueryable.
The thing that's tough for me to figure out is how to retrieve a filtered list from another table named Proficiency. At this point all I've got is this:
var profs = ctx.Proficiencies;
The Proficiency table has a column named SolutionID, which is a foreign key to the ID column in the Solution table. If I were doing this in SQL, I'd do a subquery where SolutionID is in a collection of IDs from the Solution table, where those Solution records match the same Where clause I'm using to retrieve the IQueryable for Solutions above. Only when I've specified both IQueryables do I want to then perform a ToList().
But I don't know how to specify the second LINQ query for Proficiency. How do I go about doing what I'm trying to do?
As far as I understand, you are trying to fetch Proficiencies based on some Solutions. This might be achieved in two different ways. I'll try to provide solutions in Linq as it is more readable. However, you can change them in Lambda Expressions later.
Solution 1
var solutions = ctx.Solutions
.Where(s => s.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || s.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear)
.Select(q => q.SolutionId);
var profs = (from prof in ctx.Proficiencies where (from sol in solutions select sol).Contains(prof.SolutionID) select prof).ToList();
or
Solution 2
var profs = (from prof in ctx.Proficiencies
join sol in ctx.Solutions on prof.SolutionId equals sol.Id
where sol.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || sol.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear
select prof).Distinct().ToList();
You can trace both queries in SQL Profiler to investigate the generated queries. But I'd go for the first solution as it will generate a subquery that is faster and does not use Distinct function that is not recommended unless you have to.

Modify linq query at runtime

Problem statement
Say I have a query which searches the names of people:
var result = (from person in container.people select person)
.Where(p => p.Name.Contains(some_criterion)
This will be translated to a SQL query containing the following like clause:
WHERE NAME LIKE '%some_criterion%'
This has some performance implications, as the database is unable to effectively use the index on the name column (index scan v.s. index seek if I'm not mistaken).
To remedy this, I can decide to just StartsWith() instead, generating a query using a like clause like:
WHERE NAME LIKE 'some_criterion%'
Which enables SQL server to use the index seek and delivering performance at the cost of some functionality.
I'd like to be able to provide the user with a choice: defaulting the behavior to use StartsWith, but if the user want the 'added flexibility' of searching using Contains(), than that should used.
What have I tried
I thought this to be trivial and went on and implemented an extension method on string. But of course, LINQ does not accept this and an exception is thrown.
Now, of course I can go about and use an if or switch statement and create a query for each of the cases, but I'd much rather solve this 'on a higher level' or more generically.
In short: using an if statement to differentiate between use cases isn't feasible due to the complexity of the real-life application. This would lead to alot of repetition and clutter. I'd really like to be able to encapsulate the varying behavior (Contains, StartsWith, EndsWith) somehow.
Question
Where should I look or what should I look for? Is this a case for composability with IQueryables? I'm quite puzzled!
Rather than overcomplicate things, how about just using an if statement?
var query = from person in container.people
select person;
if (userWantsStartsWith)
{
query = from p in query
where p.Name.Contains(some_criterion)
select p;
}
else
{
query = from p in query
where p.Name.StartsWith(some_criterion)
select p;
}
Update
If you really need something more complex try looking at LinqKit. It allows you to do the following.
var stringFunction = Lambda.Expression((string s1, string s2) => s1.Contains(s2));
if (userWantsStartsWith)
{
stringFunction = Lambda.Expression((string s1, string s2) => s1.StartsWith(s2));
}
var query = from p in container.people.AsExpandable()
where stringFunction.Invoke(p.Name, some_criterion)
select p;
I believe this fulfils your requirement of
I'd really like to be able to encapsulate the varying behavior
(Contains, StartsWith, EndsWith) somehow.
You can dynamically alter the query before enumerating it.
var query = container.people.AsQueryable();
if (contains)
{
query = query.Where(p => p.Name.Contains(filter));
}
else
{
query = query.Where(p => p.Name.StartsWith(filter));
}
Try dis:
var result = (from person in container.people select person)
.Where(p => some_bool_variable ? p.Name.Contains(some_criterium) : p.Name.StartsWith(some_criterium));
The real life queries are quite huge and unioned with several others. This, like my problem states, isn't te solution I'm looking for
Sinse your queries are huge: can't you just define stored procedure that handles everything and call it with specific to query parameters (probably several stored procedures that are called by main, e.g. one of em searches by Name, another - by Age, have different sort order and so on to keep code clear)?

How to Linq-ify this query?

I currently have this Linq query:
return this.Context.StockTakeFacts
.OrderByDescending(stf => stf.StockTakeId)
.Where(stf => stf.FactKindId == ((int)kind))
.Take(topCount)
.ToList<IStockTakeFact>();
The intent is to return every fact for the topCount of StockTakes but instead I can see that I will only get the topCount number of facts.
How do I Linq-ify this query to achieve my aim?
I could use 2 queries to get the top-topCount StockTakeId and then do a "between" but I wondered what tricks Linq might have.
This is what I'm trying to beat. Note that it's really more about learning that not being able to find a solution. Also concerned about performance not for these queries but in general, I don't want to just to easy stuff and find out it's thrashing behind the scenes. Like what is the penalty of that contains clause in my second query below?
List<long> stids = this.Context.StockTakes
.OrderByDescending(st => st.StockTakeId)
.Take(topCount)
.Select(st => st.StockTakeId)
.ToList<long>();
return this.Context.StockTakeFacts
.Where(stf => (stf.FactKindId == ((int)kind)) && (stids.Contains(stf.StockTakeId)))
.ToList<IStockTakeFact>();
What about this?
return this.Context.StockTakeFacts
.OrderByDescending(stf => stf.StockTakeId)
.Where(stf => stf.FactKindId == ((int)kind))
.Take(topCount)
.Select(stf=>stf.Fact)
.ToList();
If I've understood what you're after correctly, how about:
return this.Context.StockTakes
.OrderByDescending(st => st.StockTakeId)
.Take(topCount)
.Join(
this.Context.StockTakeFacts,
st => st.StockTakeId,
stf => stf.StockTakeId,
(st, stf) => stf)
.OrderByDescending(stf => stf.StockTakeId)
.ToList<IStockTakeFact>();
Here's my attempt using mostly query syntax and using two separate queries:
var stids =
from st in this.Context.StockTakes
orderby st.StockTakeId descending
select st.StockTakeId;
var topFacts =
from stid in stids.Take(topCount)
join stf in this.Context.StockTakeFacts
on stid equals stf.StockTakeId
where stf.FactKindId == (int)kind
select stf;
return topFacts.ToList<IStockTakeFact>();
As others suggested, what you were looking for is a join. Because the join extension has so many parameters they can be a bit confusing - so I prefer query syntax when doing joins - the compiler gives errors if you get the order wrong, for instance. Join is by far preferable to a filter not only because it spells out how the data is joined together, but also for performance reasons because it uses indexes when used in a database and hashes when used in linq to objects.
You should note that I call Take in the second query to limit to the topCount stids used in the second query. Instead of having two queries, I could have used an into (i.e., query continuation) on the select line of the stids query to combine the two queries, but that would have created a mess for limiting it to topCount items. Another option would have been to put the stids query in parentheses and invoked Take on it. Instead, separating it out into two queries seemed the cleanest to me.
I ordinarily avoid specifying generic types whenever I think the compiler can infer the type; however, IStockTakeFact is almost certainly an interface and whatever concrete type implements it is likely contained by this.Context.StockTakeFacts; which creates the need to specify the generic type on the ToList call. Ordinarily I omit the generic type parameter to my ToList calls - that seems to be an element of my personal tastes, yours may differ. If this.Context.StockTakeFacts is already a List<IStockTakeFact> you could safely omit the generic type on the ToList call.

How linq 2 sql is translated to TSQL

It seems Linq2sql doesn't know how to construct the TSQL while you transform linq2sql object into domain object with constructors. Such as:
from c in db.Companies
select new Company (c.ID, c.Name, c.Location).Where(x => x.Name =="Roy");
But when using settable attributes, it will be OK.
from c in db.Companies
select new Company { ID = c.ID, Name = c.Name, Location = c.Location }.Where(x => x.Name =="Roy");
I don't want to allow those attributes to be settable. How can I achieve this? And can anybody provide food for thought on how linq 2 sql is translated into TSQL? Thanks in advance!
It's probably to do with the way L2S parses the expressions - it can parse the object initialiser expression, but not the constructor expression. Basically, the way L2S works is to parse the linq expressions the way any LINQ provider does and then convert the result into SQL.
You could achieve what you'd like by converting it into an IEnumerable first, as then you'll be free to use LINQ to Objects. In the example you gave this is trivial, but let's generalise to a case with a more complex where clause:
var companyData =
from c in db.Companies
where c.Name.StartsWith("Roy")
select new { c.ID, c.Name, c.Location };
var companies =
from c in companyData.AsEnumerable()
select new Company(c.ID, c.Name, c.Location);
The first query is incorrect because the from statement will return a collection of company entities. To get only 1 company you would need to change the first statement to:
Company c = (from c in db.Companies where c.ID = someId select c).First();
The second statement does the where statement implicitly.
I suggest you run SQL Profiler while executing the second query to see what is actually being used as TSQL statement.
It cannot translate a query involving a constructor, because it doesn't know what said constructor is supposed to do. It has field mappings for properties, but your constructor could do absolutely anything - why should it try to second-guess that?
It's not clear what the purpose of making the properties read-only would be for L2S entity classes anyway - it is trivially circumvented by deleting the object and re-creating a new one with the same primary key but new property values.

Categories

Resources