This question already has an answer here:
Why does the Contains() operator degrade Entity Framework' Linq queries?
(1 answer)
Closed 9 years ago.
I was looking for some tips to improve my entity framework query performance and came accross this useful article.
The author of this article mentioned following:
08 Avoid using Contains
In LINQ, we use contains method for checking existence. It is converted to "WHERE IN" in SQL which cause performance degrades.
Which faster alternatives are remaining for me?
Contains is perfectly valid for the scenarios you WANT WHERE IN
EG:
var q = from p in products where new[]{1,50,77}.Contains(p.productId) select p;
gets (essentially) converted to
SELECT * FROM products WHERE ProductId IN (1,50,77)
However if you are checking for existence I would advice you to use .Any() , which gets converted to EXISTS -
EG
var q = from p in products
where p.productsLinkGroups.Any(x => x.GroupID == 5)
select p
Gets (more or less) coverted to:
SELECT * FROM products p
WHERE EXISTS(
SELECT NULL FROM productsLinkGroups plg
WHERE plg.GroupId = 5 AND plg.ProductId = p.ProductId
)
It is very context dependent, what you should be looking at is not avoiding .Contains() but rather how do you avoid WHERE xx IN yy in SQL. Could you do a join instead? Is it possible to specify an interval rather than discrete values?
A perfect example is presented here: Avoid SQL WHERE NOT IN Clause
Where it was possible to avoid it by using a join.
I would say that WHERE xx IN yy is usually just a half a solution, often what you really want is something else and you only get halfway there instead of going there directly, like in the case of a join.
Related
This question already has answers here:
How to make LINQ execute a (SQL) LIKE range search
(3 answers)
Closed 5 years ago.
I try to do a request to my database and I want to use something like the "LIKE" operator in SQL to resolve this pattern: %{%}%
Using the "SQL-friendly" syntax.
Currently my request looks like:
var routelist = (from dbHost in db.Hosts
where dbHost.Host == host
join dbRoute in db.Route on dbHost.HostsId equals dbRoute.HostId
select dbRoute).ToList();
If someone know how to do, can he tells it to me ? Thanks
EDIT :
I want to do something like it :
var routelist = (from dbHost in db.Hosts
where dbHost.Host == host
join dbRoute in db.Route on dbHost.HostsId equals dbRoute.HostId
where dbRoute.alias like "%{%}%"
select dbRoute).ToList();
Entity Framework Core 2.0 has now available EF.Functions property that includes EF.Functions.Like().
https://blogs.msdn.microsoft.com/dotnet/2017/08/14/announcing-entity-framework-core-2-0/
Example:
var customers =
from c in context.Customers
where EF.Functions.Like(c.Name, "a%");
select c;
When I need to perform such LIKE queries with entity framework (and this happens very rarely, because such like queries usually cannot use any index and so perform full table scan and are quite slow on big tables), I use PATINDEX, like this:
var routelist = (from dbHost in db.Hosts
where dbHost.Host == host
join dbRoute in db.Route on dbHost.HostsId equals dbRoute.HostId
where SqlFunctions.PatIndex("%{%}%",dbRoute.alias) > 0
select dbRoute).ToList();
PATINDEX function in sql server is like LIKE, but returns the position of the first match.
If you want to perform LIKE query in a form of "%something%", you can use Contains("something"). If it has the form of "%something" - use StartsWith("something"). If it has the form of "something%" - use EndsWith("something").
I've periodically seen it written that joins are unnecessary in LINQ to SQL. Most recently, I saw this statement in one of Joseph Albahari's LINQPad samples (Chapter 9 - LINQ Operators > Filtering > Joining > Simple Join).
The comment says:
// Note: before delving into this section, make sure you've read the preceding two
// sections: Select and SelectMany. The Join operators are actually unnecessary
// in LINQ to SQL, and the equivalent of SQL inner and outer joins is most easily
// achieved in LINQ to SQL using Select/SelectMany and subqueries!
I've gone through the Select and SelectMany sections in LINQPad and I definitely want to be doing this the easy way, but my attempts to completely remove joins (and get the same results) have failed.
Anyway, below is the 100% working query I'm trying this out on (full schema pictured below).
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
join projectsBillingSchedule in dbContext.ProjectsBillingSchedules
on workOrder.ProjectId equals projectsBillingSchedule.ProjectId
join partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
on projectsBillingSchedule.BillingSchId equals partyPricing.BILLING_SCH_ID
join measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
on partyPricing.PARTY_RETROFIT_CODE_ID equals measuresPartyRetrofitCode.PartyRetrofitCodeId
join measure in dbContext.Measures on measuresPartyRetrofitCode.ConvId equals measure.ConvId
select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)
Please note, certain entities are omitted from the code because they are not strictly necessary for the query to run properly. Aside from that, you can see the joins are done in order of the relationships in the schema image, i.e., from WORK_ORDERS to MEASURES (start by moving away from WORK_ORDER_LINES):
I have tried some using navigation properties, but I run into 2 problems:
The SQL outputs in multiple statements (N+1 problem?), and
I can't seem to get it all in one statement.
So, back to my question - using the example above (or something else with a lot of joins), how are join operators unnecessary in LINQ to SQL?
UPDATE
Ok, I think I've figured out one solution, but this actually requires more lines of code to achieve the same result.
Since that is the case, I'm not sure why it is worth exclaiming that join operators are unnecessary. I'll leave this question open for a while to see if someone wants to make a compelling case against joins.
(from workOrder in dbContext.WorkOrders.Where(wo => wo.WoId == workOrderLine.WoId)
from projectsBillingSchedule in dbContext.ProjectsBillingSchedules
where workOrder.ProjectId == projectsBillingSchedule.ProjectId
from partyPricing in dbContext.PARTY_PRICING.Where(pp => pp.END_DATE_ACTIVE == endDateActive)
where projectsBillingSchedule.BillingSchId == partyPricing.BILLING_SCH_ID
from measuresPartyRetrofitCode in dbContext.MeasuresPartyRetrofitCodes
where partyPricing.PARTY_RETROFIT_CODE_ID == measuresPartyRetrofitCode.PartyRetrofitCodeId
from measure in dbContext.Measures
where measure.ConvId == measuresPartyRetrofitCode.ConvId
select measure).FirstOrDefault(m => m.ConvId == workOrderLine.ConvId)
Currently I'm running some code and got some question about this. Below are the code listings of two LINQ to Entites queries.
Code listing A:
IQueryable list =
from tableProject in db.Project
select new {StaffInCharge = (
from tableStaff in db.Staff
where tableStaff.StaffId == tableProject.StaffInChargeId
select tableStaff.StaffName)};
Code listing B:
IQueryable list =
from tableProjectin db.Project
join tableStaff in db.Staff
on tableProject.StaffInChargeId
equal tableStaff.StaffId
select new {StaffInCharge = tableStaff.StaffName};
What I want to figure out is which one will be better and faster if I have to select many column from many others table.
Thanks.
this is the comment from #Tim Schmelter
"The article(actually my SO-Question) relates to LINQ-To-DataSet what is based on LINQ-To-Objects. Linq to SQL or Linq to Entities might be optimized by the DBMS in that way that a where clause has the same performance as a join."
and the link is
Why is LINQ JOIN so much faster than linking with WHERE?
i think it is very useful.
I want to find the fastest way to get the min and max of a column in a table with a single Linq to SQL roundtrip. So I know this would work in two roundtrips:
int min = MyTable.Min(row => row.FavoriteNumber);
int max = MyTable.Max(row => row.FavoriteNumber);
I know I can use group but I don't have a group by clause, I want to aggregate over the whole table! And I can't use the .Min without grouping first. I did try this:
from row in MyTable
group row by true into r
select new {
min = r.Min(z => z.FavoriteNumber),
max = r.Max(z => z.FavoriteNumber)
}
But that crazy group clause seems silly, and the SQL it makes is more complex than it needs to be.
So, is there any way to just get the correct SQL out?
EDIT: These guys failed too: Linq to SQL: how to aggregate without a group by? ... lame oversight by LINQ designers if there's really no answer.
EDIT 2: I looked at my own solution (with the nonsensical constant group by clause) in the SQL Server Management Studio execution plan analysis, and it looks to me like it is identical to the plan generated by:
SELECT MIN(FavoriteNumber), MAX(FavoriteNumber)
FROM MyTable
so unless someone can come up with a simpler-or-equally-as-good answer, I think I have to mark it as answered-by-myself. Thoughts?
As stated in the question, this method seems to actually generate optimal SQL code, so while it looks a bit squirrely in LINQ, it should be optimal performance-wise.
from row in MyTable
group row by true into r
select new {
min = r.Min(z => z.FavoriteNumber),
max = r.Max(z => z.FavoriteNumber)
}
I could find only this one which produces somewhat clean sql still not really effective comparing to select min(val), max(val) from table:
var r =
(from min in items.OrderBy(i => i.Value)
from max in items.OrderByDescending(i => i.Value)
select new {min, max}).First();
the sql is
SELECT TOP (1)
[t0].[Value],
[t1].[Value] AS [Value2]
FROM
[TestTable] AS [t0],
[TestTable] AS [t1]
ORDER BY
[t0].[Value],
[t1].[Value] DESC
still there is another option to use single connection for both min and max queries (see Multiple Active Result Sets (MARS))
or stored procedure..
I'm not sure how to translate it into C# yet (I'm working on it)
This is the Haskell version
minAndMax :: Ord a => [a] -> (a,a)
minAndMax [x] = (x,x)
minAndMax (x:xs) = (min a x, max b x)
where (a,b) = minAndMax xs
The C# version should involve Aggregate some how (I think).
You could select the whole table, and do your min and max operations in memory:
var cache = // select *
var min = cache.Min(...);
var max = cache.Max(...);
Depending on how large your dataset is, this might be the way to go about not hitting your database more than once.
A LINQ to SQL query is a single expression. Thus, if you can't express your query in a single expression (or don't like it once you do) then you have to look at other options.
Stored procedures, since they can have statements, enable you to accomplish this in a single round-trip. You will either have two output parameters or select a result set with two rows. Either way, you will need custom code to read the stored procedure's result.
(I don't personally see the need to avoid two round-trips here. It seems like a premature optimization, especially since you will probably have to jump through hoops to get it working. Not to mention the time you will spend justifying this decision and explaining the solution to other developers.)
Put another way: you've already answered your own question. "I can't use the .Min without grouping first", followed by "that crazy group clause seems silly, and the SQL it makes is more complex than it needs to be", are clues that the simple and easily-understood two-round-trip solution is the best expression of your intent (unless you write custom SQL).
This question already has an answer here:
Reason of equals keyword in LINQ's join statement
(1 answer)
Closed 8 months ago.
I always wondered why there's an equals keyword in linq joins rather than using the == operator.
Property deadline =
(from p in properties
join w in widgets
on p.WidgetID equals w.ID
select p).First();
Instead of
Property deadline =
(from p in properties
join w in widgets
on p.WidgetID == w.ID
select p).First();
[EDIT] Rephrased the question and revised the examples.
There's a nice explanation by Matt Warren at The Moth:
"The reason C# has the word ‘equals’ instead of the ‘==’ operator was to make it clear that the ‘on’ clause needs you to supply two separate expressions that are compared for equality not a single predicate expression. The from-join pattern maps to the Enumerable.Join() standard query operator that specifies two separate delegates that are used to compute values that can then be compared. It needs them as separate delegates in order to build a lookup table with one and probe into the lookup table with the other. A full query processor like SQL is free to examine a single predicate expression and choose how it is going to process it. Yet, to make LINQ operate similar to SQL would require that the join condition be always specified as an expression tree, a significant overhead for the simple in-memory object case."
However, this concerns join. I'm not sure equals should be used in your code example (does it even compile?).
Your first version doesn't compile. You only use equals in joins, to make the separate halves of the equijoin clear to the compiler.