Problem statement
Say I have a query which searches the names of people:
var result = (from person in container.people select person)
.Where(p => p.Name.Contains(some_criterion));
This will be translated to a SQL query containing the following like clause:
WHERE NAME LIKE '%some_criterion%'
This has performance implications: because of the leading wildcard, the database cannot use the index on the name column effectively (an index scan vs. an index seek, if I'm not mistaken).
To remedy this, I can decide to just use StartsWith() instead, which generates a query with a like clause such as:
WHERE NAME LIKE 'some_criterion%'
This enables SQL Server to use an index seek, delivering performance at the cost of some functionality.
I'd like to be able to give the user a choice: default to StartsWith, but if the user wants the 'added flexibility' of searching with Contains(), then that should be used.
What have I tried
I thought this to be trivial and went on and implemented an extension method on string. But of course, LINQ does not accept this and an exception is thrown.
Now, of course I can go about and use an if or switch statement and create a query for each of the cases, but I'd much rather solve this 'on a higher level' or more generically.
In short: using an if statement to differentiate between use cases isn't feasible due to the complexity of the real-life application. It would lead to a lot of repetition and clutter. I'd really like to be able to encapsulate the varying behavior (Contains, StartsWith, EndsWith) somehow.
Question
Where should I look or what should I look for? Is this a case for composability with IQueryables? I'm quite puzzled!
Rather than overcomplicate things, how about just using an if statement?
var query = from person in container.people
select person;
if (userWantsStartsWith)
{
query = from p in query
where p.Name.StartsWith(some_criterion)
select p;
}
else
{
query = from p in query
where p.Name.Contains(some_criterion)
select p;
}
Update
If you really need something more complex try looking at LinqKit. It allows you to do the following.
var stringFunction = Lambda.Expression((string s1, string s2) => s1.Contains(s2));
if (userWantsStartsWith)
{
stringFunction = Lambda.Expression((string s1, string s2) => s1.StartsWith(s2));
}
var query = from p in container.people.AsExpandable()
where stringFunction.Invoke(p.Name, some_criterion)
select p;
I believe this fulfils your requirement of
I'd really like to be able to encapsulate the varying behavior
(Contains, StartsWith, EndsWith) somehow.
You can dynamically alter the query before enumerating it.
var query = container.people.AsQueryable();
if (contains)
{
query = query.Where(p => p.Name.Contains(filter));
}
else
{
query = query.Where(p => p.Name.StartsWith(filter));
}
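If the if/else ends up repeated across many queries, the same idea can be wrapped once in an extension method that builds the predicate as an expression tree. This is only a minimal sketch: TextMatch and WhereText are hypothetical names (not part of LINQ or LinqKit), and I have only reasoned about it against LINQ to Objects. Whether a given provider translates the resulting call into a LIKE clause is an assumption you would need to verify.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical enum: the member names deliberately match the string method names.
public enum TextMatch { StartsWith, Contains, EndsWith }

public static class TextFilterExtensions
{
    // Builds row => selector(row).<Method>(value) as an expression tree,
    // so the provider sees an ordinary string method call.
    public static IQueryable<T> WhereText<T>(
        this IQueryable<T> source,
        Expression<Func<T, string>> selector,
        string value,
        TextMatch match = TextMatch.StartsWith)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (selector == null) throw new ArgumentNullException("selector");

        // Pick the single-argument overload, e.g. string.StartsWith(string).
        var method = typeof(string).GetMethod(match.ToString(), new[] { typeof(string) });

        var body = Expression.Call(
            selector.Body,
            method,
            Expression.Constant(value, typeof(string)));

        // Reuse the selector's own parameter so no rebinding is needed.
        var predicate = Expression.Lambda<Func<T, bool>>(body, selector.Parameters);
        return source.Where(predicate);
    }
}
```

Usage would then look like container.people.WhereText(p => p.Name, filter) for the default StartsWith, or container.people.WhereText(p => p.Name, filter, TextMatch.Contains) when the user opts in to the slower search.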
Try this:
var result = (from person in container.people select person)
.Where(p => some_bool_variable ? p.Name.Contains(some_criterion) : p.Name.StartsWith(some_criterion));
The real-life queries are quite huge and unioned with several others. This, as my problem statement says, isn't the solution I'm looking for.
Since your queries are huge: can't you just define a stored procedure that handles everything and call it with query-specific parameters? (Probably several stored procedures called by a main one, e.g. one searches by Name, another by Age, each with its own sort order, to keep the code clear.)
Related
I am trying to work with dynamic data and running into some odd behavior with LINQ that I can't find much information about online. I want to point out that this issue happens on any nested collection.
I want to take a collection of dynamic data, then filter it with a where query. That where query simply checks all the values to see if it contains "FL" and then I want it to return the dynamic collection... not just the fields that contain FL.
I've explicitly put in the type in the where clause to make it easier to read online, it is redundant otherwise.
IEnumerable<dynamic> query = from agent in agentRecords
from values in (ExpandoObject)agent
where ((KeyValuePair<string, object>)values).Value.ToString().Contains("FL")
select agent;
The query works, but returns 3 times the expected result. (I get 9 agents instead of 3, with multiple duplicates.)
I am able to filter it by calling distinct, but something tells me I am not doing this right.
The other way to do this is by using LINQ extension methods
var result = agentRecords.Cast<ExpandoObject>().Where(x => x.Any(y => y.Value.ToString().Contains("FL")));
According to https://learn.microsoft.com/en-us/dotnet/csharp/linq/query-expression-basics, there are multiple examples of "multiple/nested from" linq queries and it doesn't seem to run into this duplicate result problem... what am I overlooking?
Instead of cross joining each agent with its collection of values, test each agent once:
IEnumerable<dynamic> query = from agent in agentRecords
where (from values in (ExpandoObject)agent
where ((KeyValuePair<string, object>)values).Value.ToString().Contains("FL")
select values).Any()
select agent;
Lambda syntax does seem clearer to me, and this looks to be identical to your second expression:
IEnumerable<dynamic> query2 = agentRecords.Where(agent => ((ExpandoObject)agent).Any(values => values.Value.ToString().Contains("FL")));
from a in agentRecords
where (from i in (ExpandoObject)a
where ((KeyValuePair<string, object>)i).Value.ToString().Contains("FL")
select i).Count() > 0
select a;
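To see where the duplicates come from: a second from clause is compiled into SelectMany, so the query emits one row per (agent, matching field) pair rather than one row per agent. The sketch below uses made-up sample data (three agents whose three fields all contain "FL", which is an assumption chosen to reproduce the 9-versus-3 count from the question) and hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;

public static class AgentFilterDemo
{
    // Hypothetical sample data: every agent has three fields containing "FL".
    public static List<ExpandoObject> MakeAgents(int count)
    {
        var agents = new List<ExpandoObject>();
        for (int n = 0; n < count; n++)
        {
            var agent = new ExpandoObject();
            var fields = (IDictionary<string, object>)agent;
            fields["State"] = "FL";
            fields["City"] = "FLORAL PARK";
            fields["Code"] = "FL-" + n;
            agents.Add(agent);
        }
        return agents;
    }

    // Nested from: one result per (agent, matching field) pair.
    public static int CountWithNestedFrom(IEnumerable<ExpandoObject> agents)
    {
        return (from agent in agents
                from field in agent
                where field.Value.ToString().Contains("FL")
                select agent).Count();
    }

    // Any: one result per agent that has at least one matching field.
    public static int CountWithAny(IEnumerable<ExpandoObject> agents)
    {
        return agents.Count(agent =>
            agent.Any(field => field.Value.ToString().Contains("FL")));
    }
}
```

With three such agents, CountWithNestedFrom returns 9 and CountWithAny returns 3, so Distinct only hides the cross join while Any avoids it.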
I currently have this Linq query:
return this.Context.StockTakeFacts
.OrderByDescending(stf => stf.StockTakeId)
.Where(stf => stf.FactKindId == ((int)kind))
.Take(topCount)
.ToList<IStockTakeFact>();
The intent is to return every fact for the latest topCount StockTakes, but I can see that this will only return topCount facts in total.
How do I Linq-ify this query to achieve my aim?
I could use 2 queries to get the top-topCount StockTakeId and then do a "between" but I wondered what tricks Linq might have.
This is what I'm trying to beat. Note that it's really more about learning than about not being able to find a solution. I'm also concerned about performance, not for these queries but in general: I don't want to just do the easy thing and then find out it's thrashing behind the scenes. For example, what is the penalty of that Contains clause in my second query below?
List<long> stids = this.Context.StockTakes
.OrderByDescending(st => st.StockTakeId)
.Take(topCount)
.Select(st => st.StockTakeId)
.ToList<long>();
return this.Context.StockTakeFacts
.Where(stf => (stf.FactKindId == ((int)kind)) && (stids.Contains(stf.StockTakeId)))
.ToList<IStockTakeFact>();
What about this?
return this.Context.StockTakeFacts
.OrderByDescending(stf => stf.StockTakeId)
.Where(stf => stf.FactKindId == ((int)kind))
.Take(topCount)
.Select(stf=>stf.Fact)
.ToList();
If I've understood what you're after correctly, how about:
return this.Context.StockTakes
.OrderByDescending(st => st.StockTakeId)
.Take(topCount)
.Join(
this.Context.StockTakeFacts,
st => st.StockTakeId,
stf => stf.StockTakeId,
(st, stf) => stf)
.OrderByDescending(stf => stf.StockTakeId)
.ToList<IStockTakeFact>();
Here's my attempt using mostly query syntax and using two separate queries:
var stids =
from st in this.Context.StockTakes
orderby st.StockTakeId descending
select st.StockTakeId;
var topFacts =
from stid in stids.Take(topCount)
join stf in this.Context.StockTakeFacts
on stid equals stf.StockTakeId
where stf.FactKindId == (int)kind
select stf;
return topFacts.ToList<IStockTakeFact>();
As others suggested, what you were looking for is a join. Because the join extension has so many parameters they can be a bit confusing - so I prefer query syntax when doing joins - the compiler gives errors if you get the order wrong, for instance. Join is by far preferable to a filter not only because it spells out how the data is joined together, but also for performance reasons because it uses indexes when used in a database and hashes when used in linq to objects.
You should note that I call Take in the second query to limit to the topCount stids used in the second query. Instead of having two queries, I could have used an into (i.e., query continuation) on the select line of the stids query to combine the two queries, but that would have created a mess for limiting it to topCount items. Another option would have been to put the stids query in parentheses and invoked Take on it. Instead, separating it out into two queries seemed the cleanest to me.
I ordinarily avoid specifying generic types whenever I think the compiler can infer the type; however, IStockTakeFact is almost certainly an interface and whatever concrete type implements it is likely contained by this.Context.StockTakeFacts; which creates the need to specify the generic type on the ToList call. Ordinarily I omit the generic type parameter to my ToList calls - that seems to be an element of my personal tastes, yours may differ. If this.Context.StockTakeFacts is already a List<IStockTakeFact> you could safely omit the generic type on the ToList call.
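For anyone who wants to try the shape of this without a database, here is a small LINQ to Objects sketch of "take the top parents first, then join the children". The StockTake and StockTakeFact classes below are stand-ins I made up with just the two properties the queries touch; the real entities and the IStockTakeFact interface are not reproduced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real entities.
public sealed class StockTake
{
    public long StockTakeId { get; set; }
}

public sealed class StockTakeFact
{
    public long StockTakeId { get; set; }
    public int FactKindId { get; set; }
}

public static class StockTakeQueries
{
    // Returns every fact of the given kind for the topCount most recent stock takes.
    public static List<StockTakeFact> FactsForLatest(
        IEnumerable<StockTake> stockTakes,
        IEnumerable<StockTakeFact> facts,
        int kind,
        int topCount)
    {
        // Limit the parents first; Take applies to stock takes, not to facts.
        var latestIds = stockTakes
            .OrderByDescending(st => st.StockTakeId)
            .Take(topCount)
            .Select(st => st.StockTakeId);

        return (from id in latestIds
                join f in facts on id equals f.StockTakeId
                where f.FactKindId == kind
                select f).ToList();
    }
}
```

The point of the arrangement is that Take runs before the join, so the number of facts returned is not capped by topCount.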
I'm trying to create a LINQ provider. I'm using the guide LINQ: Building an IQueryable provider series, and I have added the code up to LINQ: Building an IQueryable Provider - Part IV.
I am getting a feel of how it is working and the idea behind it. Now I'm stuck on a problem, which isn't a code problem but more about the understanding.
I'm firing off this statement:
QueryProvider provider = new DbQueryProvider();
Query<Customer> customers = new Query<Customer>(provider);
int i = 3;
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name}).Where(p => p.Id == 2 | p.Id == i).ToList();
Somehow the code, or the expression, knows that the Where comes before the Select. But how and where?
Nothing in the code sorts the expression; in fact, ToString() in debug mode shows that the Select comes before the Where.
I was trying to make the code fail. Normally I put the Where first and then the Select.
So how does the expression sort this? I have not made any changes to the code in the guide.
The expressions are "interpreted", "translated" or "executed" in the order you write them, so the Where does not come before the Select.
If you execute:
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name})
.Where(p => p.Id == 2 | p.Id == i).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the anonymous type.
If you execute:
var newLinqCustomer = customers.Where(p => p.Id == 2 | p.Id == i)
.Select(c => new { c.Id, c.Name}).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the customer type.
The only thing I can think of is that maybe you're seeing some generated SQL where the SELECT and WHERE have been reordered? In which case I'd guess that there's an optimisation step somewhere in the (e.g.) LINQ to SQL provider that takes SELECT Id, Name FROM (SELECT Id, Name FROM Customer WHERE Id=2 OR Id=@i) and converts it to SELECT Id, Name FROM Customer WHERE Id=2 OR Id=@i, but this must be a provider-specific optimisation.
No, in the general case (such as LINQ to Objects) the select will be executed before the where statement. Think of it as a pipeline: your first step is a transformation, the second a filter. Not the other way round, as would be the case if you wrote Where...Select.
Now, a LINQ Provider has the freedom to walk the expression tree and optimize it as it sees fit. Be aware that you may not change the semantics of the expression though. This means that a smart LINQ to SQL provider would try to pull as many where clauses it can into the SQL query to reduce the amount of data travelling over the network. However, keep the example from Stuart in mind: Not all query providers are clever, partly because ruling out side effects from query reordering is not as easy as it seems.
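Both points can be checked with a few lines of plain LINQ, no custom provider involved. This is a sketch with hypothetical names: Trace logs the order in which the lambdas run under LINQ to Objects, and CallChain walks the expression tree of an IQueryable built with AsQueryable() to show how the calls are nested before any provider gets a chance to rewrite them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class PipelineOrderDemo
{
    // LINQ to Objects: each element flows through Select, then Where.
    public static List<string> Trace()
    {
        var log = new List<string>();
        new[] { 1, 2 }
            .Select(n => { log.Add("select " + n); return new { Id = n }; })
            .Where(p => { log.Add("where " + p.Id); return p.Id == 2; })
            .ToList();
        return log; // select 1, where 1, select 2, where 2
    }

    // IQueryable: the last method written is the outermost node of the tree.
    public static List<string> CallChain()
    {
        var query = new[] { 1, 2, 3 }.AsQueryable()
            .Select(n => new { Id = n })
            .Where(p => p.Id == 2);

        var names = new List<string>();
        Expression node = query.Expression;
        while (node is MethodCallExpression call)
        {
            names.Add(call.Method.Name);
            node = call.Arguments[0]; // the source the call was applied to
        }
        return names; // Where, Select
    }
}
```

So the tree itself records Where applied to the result of Select. If the generated SQL shows a different arrangement, that comes from how the provider translates the tree, not from the tree being sorted.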
Could someone help me with this exception? I don't understand what it means or how to fix it... It is an SqlException with the following message:
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
I get it when running a query in pseudo code looking like this:
// Some filtering of data
var query = data.Subjects
.Where(has value)
.Where(has other value among some set of values);
// More filtering, where I need to have two different options
var a = query
.Where(some foreign key is null);
var b = query
.Where(some foreign key is not null)
.Where(and that foreign key has a property which is what I want);
query = a.Union(b);
// Final filter and then get result as a list
var list = query
.Where(last requirement)
.ToList();
If I remove the a.Union(b) parts, it runs without the exception. So I know the error is there. But why do I get it? And how can I fix it? Am I doing something too crazy here? Have I misunderstood how to use the Union thing?
Basically what I have is some entities which have a foreign key to some other entity. And I need to get all the entities which either have that foreign key set to null or where that foreign entity fulfills some requirements.
Judging from the SQL error you listed, you may be experiencing the same issue I was. Basically, when a LINQ to SQL query uses the Concat or Union extension method on two different queries, there appears to be a bug in LINQ to SQL that optimizes each projection separately, without regard to the fact that the projections must stay the same for the SQL UNION to work.
References:
LINQ to SQL produces incorrect TSQL when using UNION or CONCAT
Linq to SQL Union Same Fieldname generating Error
If this happens to be your problem as well I've found a solution that is working for me as shown below.
var queryA =
from a in context.TableA
select new
{
a.id,
a.name,
a.onlyInTableA,
};
var queryB =
from b in context.TableB
let onlyInTableA = default(string)
select new
{
b.id,
b.name,
onlyInTableA,
};
var results = queryA.Union(queryB).ToList();
Since this looks like a problem with the generated SQL, you should try to use either SQL Profiler or this code for a DebuggerWriter class to write the SQL to your Output window in Visual Studio.
The SQL error is normally caused by the two queries in the UNION not retrieving the same fields. For example, if the first query has 3 fields but the second query has 4, this error will occur. So seeing the generated SQL will definitely help in this case.
Can you perhaps write it in a single query?
.Where(row => row.ForeignKey == null || row.ForeignKey.SomeCondition);
There are also ways of merging expressions (OrElse), but that isn't trivial.
Not sure where the error comes from, though!
edit: haven't tested it, but this should be logically equivalent to a UNION:
public static IQueryable<T> WhereAnyOf<T>(
this IQueryable<T> source,
params Expression<Func<T, bool>>[] predicates)
{
if (source == null) throw new ArgumentNullException("source");
if (predicates == null) throw new ArgumentNullException("predicates");
if (predicates.Length == 0) return source.Where(row => false);
if (predicates.Length == 1) return source.Where(predicates[0]);
var param = Expression.Parameter(typeof(T), "row");
Expression body = Expression.Invoke(predicates[0], param);
for (int i = 1; i < predicates.Length; i++)
{
body = Expression.OrElse(body,
Expression.Invoke(predicates[i], param));
}
return source.Where(Expression.Lambda<Func<T, bool>>(body, param));
}
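One caveat with the method above: it relies on Expression.Invoke, which LINQ to SQL handles but which some other providers reject (Entity Framework 6, for instance, reports the Invoke node type as unsupported). A variant that swaps in a different technique, rebinding parameters with an ExpressionVisitor instead of invoking the lambdas, might look like the sketch below. PredicateCombiner, AnyOf and ParameterReplacer are names I made up for illustration, and I have only checked the logic against LINQ to Objects.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class PredicateCombiner
{
    // Replaces one lambda parameter with another throughout an expression.
    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public ParameterReplacer(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _from ? _to : base.VisitParameter(node);
        }
    }

    // ORs the predicates together into a single lambda with one shared parameter.
    public static Expression<Func<T, bool>> AnyOf<T>(
        params Expression<Func<T, bool>>[] predicates)
    {
        if (predicates == null) throw new ArgumentNullException("predicates");
        if (predicates.Length == 0) return row => false;

        var param = predicates[0].Parameters[0];
        Expression body = predicates[0].Body;
        for (int i = 1; i < predicates.Length; i++)
        {
            // Rewrite predicate i so it uses the first predicate's parameter.
            var rebound = new ParameterReplacer(predicates[i].Parameters[0], param)
                .Visit(predicates[i].Body);
            body = Expression.OrElse(body, rebound);
        }
        return Expression.Lambda<Func<T, bool>>(body, param);
    }
}
```

It would be used as query.Where(PredicateCombiner.AnyOf<Subject>(s => s.ForeignKey == null, s => s.ForeignKey.SomeCondition)), where Subject and the property names stand in for the real entity.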
query = a.Union(b);
Not a good idea to mutate captured variables... Likely the cause of the error.
UPDATE: ok not
Here is another idea. The hint is in the error message.
var a = query
.Where(some foreign key is null)
.Select(x => x);
Or play by adding another 'fake' Where till they do become equal :)
I would call data.GetCommand(query) and analyze the resulting DbCommand (especially the generated SQL string). That should give you a clue to what goes wrong.
There is no projection going on anywhere so I would expect both target lists to be the same.
You could try to reduce your query to a smaller one that still doesn't work. Start with query.Union(query) (this should at least work). Then add your Where calls one by one to see when it stops working.
It must be one of your Where calls that adds extra columns to your select list.
Are you by any chance passing in a value to the 'select' side in a variable, or are you returning the same field more than once? SP1 introduced a bug where it tries to 'optimize' out such things and that can cause union queries to break (due to the query parts 'optimizing' out different passed-in params).
If you post your actual query rather than pseudo code it makes it easier to identify if this is the case.
(And a workaround if this is the case is to materialize the individual parts first and then do a client-side (L2O) union).
jpierson has the problem summarised correctly.
I also had the problem, this time caused by some literals in the select statement:
Dim results = (From t in TestDataContext.Table1 _
Where t.ID = WantedID _
Select t.name, SpecialField = 0, AnotherSpecialField = 0, t.Address).Union( _
From t in TestDataContext.Table1 _
Where t.SecondID = WantedSecondID _
Select t.name, SpecialField = 1, AnotherSpecialField = 0, t.Address)
In the first sub-query, "SpecialField = 0" and "AnotherSpecialField = 0" were optimised into a single field, so the union was given one field instead of two, which will obviously fail.
I had to change the first query so that the SpecialField & AnotherSpecialField had different values, much like in the second sub-query.
Well, I had an issue with this. Using SQL Server 2008, I had two table functions that each returned an int and a string. I created a complex object and used LINQ to attempt a UNION, with an IEqualityComparer to do the comparison. It all compiled fine, but crashed with an unsupported overload. I realised the problem discussed here seemed to smack of deferred execution, so I now get the collections, call ToList() on each, then do the UNION, and it is all good. Not sure if this is helpful, but it works for me.
LINQ is one of the greatest improvements to .NET since generics and it saves me tons of time and lines of code. However, the fluent syntax seems to come much more naturally to me than the query expression syntax.
var title = entries.Where(e => e.Approved)
.OrderBy(e => e.Rating).Select(e => e.Title)
.FirstOrDefault();
var query = (from e in entries
where e.Approved
orderby e.Rating
select e.Title).FirstOrDefault();
Is there any difference between the two or is there any particular benefit of one over other?
Neither is better: they serve different needs. Query syntax comes into its own when you want to leverage multiple range variables. This happens in three situations:
When using the let keyword
When you have multiple generators (from clauses)
When doing joins
Here's an example (from the LINQPad samples):
string[] fullNames = { "Anne Williams", "John Fred Smith", "Sue Green" };
var query =
from fullName in fullNames
from name in fullName.Split()
orderby fullName, name
select name + " came from " + fullName;
Now compare this to the same thing in method syntax:
var query = fullNames
.SelectMany (fName => fName.Split().Select (name => new { name, fName } ))
.OrderBy (x => x.fName)
.ThenBy (x => x.name)
.Select (x => x.name + " came from " + x.fName);
Method syntax, on the other hand, exposes the full gamut of query operators and is more concise with simple queries. You can get the best of both worlds by mixing query and method syntax. This is often done in LINQ to SQL queries:
var query =
from c in db.Customers
let totalSpend = c.Purchases.Sum (p => p.Price) // Method syntax here
where totalSpend > 1000
from p in c.Purchases
select new { p.Description, totalSpend, c.Address.State };
I prefer to use the latter (sometimes called "query comprehension syntax") when I can write the whole expression that way.
var titlesQuery = from e in entries
where e.Approved
orderby e.Rating
select e.Title;
var title = titlesQuery.FirstOrDefault();
As soon as I have to add (parentheses) and .MethodCalls(), I change.
When I use the former, I usually put one clause per line, like this:
var title = entries
.Where (e => e.Approved)
.OrderBy (e => e.Rating)
.Select (e => e.Title)
.FirstOrDefault();
I find that a little easier to read.
Each style has their pros and cons. Query syntax is nicer when it comes to joins and it has the useful let keyword that makes creating temporary variables inside a query easy.
Fluent syntax on the other hand has a lot more methods and operations that aren't exposed through the query syntax. Also since they are just extension methods you can write your own.
I have found that every time I start writing a LINQ statement using the query syntax I end up having to put it in parentheses and fall back to using fluent LINQ extension methods. Query syntax just doesn't have enough features to use by itself.
In VB.NET I very much prefer query syntax.
I hate to repeat the ugly Function keyword:
Dim fullNames = { "Anne Williams", "John Fred Smith", "Sue Green" }
Dim query =
fullNames.SelectMany(Function(fName) fName.Split().
Select(Function(Name) New With {Name, fName})).
OrderBy(Function(x) x.fName).
ThenBy(Function(x) x.Name).
Select(Function(x) x.Name & " came from " & x.fName)
This neat query is much more readable and maintainable in my opinion:
query = From fullName In fullNames
From name In fullName.Split()
Order By fullName, name
Select name & " came from " & fullName
VB.NET's query syntax is also more powerful and less verbose than in C#: https://stackoverflow.com/a/6515130/284240
For example this LINQ to DataSet(Objects) query
VB.NET:
Dim first10Rows = From r In dataTable1 Take 10
C#:
var first10Rows = (from r in dataTable1.AsEnumerable()
select r)
.Take(10);
I don't get the query syntax at all. There's just no reason for it in my mind. let can be achieved with .Select and anonymous types. I just think things look much more organized with the "punctuation" in there.
I use the fluent interface if there's just a where. If I need a select or orderby, I generally use the query syntax.
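As a small illustration of that claim about let, here are two hypothetical methods that express the same query both ways. The method-syntax version carries the extra value along in an anonymous type, which is roughly the shape the compiler produces for let as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetDesugaring
{
    // Query syntax: let introduces a second range variable.
    public static List<string> WithLet(IEnumerable<string> words)
    {
        return (from w in words
                let len = w.Length
                where len > 3
                orderby len
                select w + ":" + len).ToList();
    }

    // Method syntax: the anonymous type plays the role of let.
    public static List<string> WithSelect(IEnumerable<string> words)
    {
        return words
            .Select(w => new { w, len = w.Length })
            .Where(x => x.len > 3)
            .OrderBy(x => x.len)
            .Select(x => x.w + ":" + x.len)
            .ToList();
    }
}
```

For the input kiwi, fig, banana, plum both return kiwi:4, plum:4, banana:6, so which one to write is a readability choice rather than a functional one.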
Fluent syntax does seem more powerful indeed, it should also work better for organizing code into small reusable methods.
I know this question is tagged with C#, but the Fluent syntax is painfully verbose with VB.NET.
I really like the fluent syntax and I try to use it where I can, but in certain cases, for example where I use joins, I usually prefer the query syntax. In those cases I find it easier to read, and I think some people are more familiar with the query (SQL-like) syntax than with lambdas.
While I do understand and like the fluent format , I've stuck to Query for the time being for readability reasons. People just being introduced to LINQ will find Query much more comfortable to read.
I prefer the query syntax as I came from traditional web programming using SQL. It is much easier for me to wrap my head around. However, I think I will start to use .Where(lambda), as it is definitely much shorter.
I've been using Linq for about 6 months now. When I first started using it I preferred the query syntax as it's very similar to T-SQL.
But, I'm gradually coming round to the former now, as it's easy to write reusable chunks of code as extension methods and just chain them together. Although I do find putting each clause on its own line helps a lot with readability.
I have just set up our company's standards and we enforce the use of the Extension methods. I think it's a good idea to choose one over the other and don't mix them up in code. Extension methods read more like the other code.
The comprehension syntax does not have all the operators, and having to wrap the query in parentheses and add extension methods afterwards just begs for using extension methods from the start.
But for the most part it is just personal preference with a few exceptions.
From Microsoft's docs:
As a rule when you write LINQ queries, we recommend that you use query syntax whenever possible and method syntax whenever necessary. There is no semantic or performance difference between the two different forms. Query expressions are often more readable than equivalent expressions written in method syntax.
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/#query-expression-overview
They also say:
To get started using LINQ, you do not have to use lambdas extensively. However, certain queries can only be expressed in method syntax and some of those require lambda expressions.
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/query-syntax-and-method-syntax-in-linq