Where vs. join: which one is better to use? - C#

I'm currently running some code and have a question about it. Below are the code listings of two LINQ to Entities queries.
Code listing A:
IQueryable list =
from tableProject in db.Project
select new {StaffInCharge = (
from tableStaff in db.Staff
where tableStaff.StaffId == tableProject.StaffInChargeId
select tableStaff.StaffName)};
Code listing B:
IQueryable list =
from tableProject in db.Project
join tableStaff in db.Staff
on tableProject.StaffInChargeId
equals tableStaff.StaffId
select new {StaffInCharge = tableStaff.StaffName};
What I want to figure out is which one is better and faster if I have to select many columns from many other tables.
Thanks.

This is a comment from Tim Schmelter:
"The article(actually my SO-Question) relates to LINQ-To-DataSet what is based on LINQ-To-Objects. Linq to SQL or Linq to Entities might be optimized by the DBMS in that way that a where clause has the same performance as a join."
The link is:
Why is LINQ JOIN so much faster than linking with WHERE?
I think it is very useful.
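To make the comment above concrete, here is a minimal LINQ to Objects sketch of the two query shapes. The in-memory `staff` and `projects` arrays are hypothetical stand-ins for `db.Staff` and `db.Project`, and `First()` is added only to flatten the subquery result for printing. In memory, the `where` form rescans `staff` for every project, while `Join` builds a lookup once. With LINQ to Entities both shapes are translated to SQL, so the database optimizer may well treat them alike.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for db.Staff and db.Project.
var staff = new[]
{
    (StaffId: 1, StaffName: "Ann"),
    (StaffId: 2, StaffName: "Bob"),
};
var projects = new[]
{
    (ProjectId: 10, StaffInChargeId: 2),
    (ProjectId: 11, StaffInChargeId: 1),
    (ProjectId: 12, StaffInChargeId: 2),
};

// Listing A shape: a correlated subquery that scans staff once per project.
var viaWhere = projects
    .Select(p => staff.Where(s => s.StaffId == p.StaffInChargeId)
                      .Select(s => s.StaffName)
                      .First())
    .ToList();

// Listing B shape: Join builds a lookup over staff once, then probes it per project.
var viaJoin = projects
    .Join(staff, p => p.StaffInChargeId, s => s.StaffId, (p, s) => s.StaffName)
    .ToList();

Console.WriteLine(string.Join(", ", viaWhere));
Console.WriteLine(string.Join(", ", viaJoin));
```

Both lines print the same names in project order, so the choice between the two forms is about cost, not about results.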

Related

Is it possible to combine 2 LINQ queries, each filtering data, before fetching the results of the queries?

I need to retrieve data from 2 SQL tables, using LINQ. I was hoping to combine them using a Join. I've looked this problem up on Stack Overflow, but all the questions and answers I've seen involve retrieving the data using ToList(), but I need to use lazy loading. The reason for this is there's too much data to fetch it all. Therefore, I've got to apply a filter to both queries before performing a ToList().
One of these queries is easily specified:
var solutions = ctx.Solutions.Where(s => s.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || s.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear);
It retrieves all the data from the Solution table, where the SolutionNumber starts with either the current or previous year. It returns an IQueryable.
The thing that's tough for me to figure out is how to retrieve a filtered list from another table named Proficiency. At this point all I've got is this:
var profs = ctx.Proficiencies;
The Proficiency table has a column named SolutionID, which is a foreign key to the ID column in the Solution table. If I were doing this in SQL, I'd do a subquery where SolutionID is in a collection of IDs from the Solution table, where those Solution records match the same Where clause I'm using to retrieve the IQueryable for Solutions above. Only when I've specified both IQueryables do I want to then perform a ToList().
But I don't know how to specify the second LINQ query for Proficiency. How do I go about doing what I'm trying to do?
As far as I understand, you are trying to fetch Proficiencies based on some Solutions. This can be achieved in two different ways. I'll write the solutions in LINQ query syntax because it is more readable; you can convert them to lambda expressions later.
Solution 1
var solutions = ctx.Solutions
.Where(s => s.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || s.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear)
.Select(q => q.ID);
var profs = (from prof in ctx.Proficiencies where solutions.Contains(prof.SolutionID) select prof).ToList();
or
Solution 2
var profs = (from prof in ctx.Proficiencies
join sol in ctx.Solutions on prof.SolutionID equals sol.ID
where sol.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || sol.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear
select prof).Distinct().ToList();
You can trace both queries in SQL Profiler to inspect the generated SQL. I'd go for the first solution, though: it should generate a subquery, which is likely to be faster, and it avoids Distinct, which is best used only when you have to.
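The two solutions can be compared on in-memory data. This is only a sketch: the `solutions` and `proficiencies` arrays, the `ProfID` field, and the year strings are hypothetical stand-ins for the real tables and `yearsToConsider`. Against a database the work would be done in SQL rather than in memory.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for ctx.Solutions and ctx.Proficiencies.
var solutions = new[]
{
    (ID: 1, SolutionNumber: "24-001"),
    (ID: 2, SolutionNumber: "23-007"),
    (ID: 3, SolutionNumber: "19-042"),
};
var proficiencies = new[]
{
    (ProfID: 100, SolutionID: 1),
    (ProfID: 101, SolutionID: 3),
    (ProfID: 102, SolutionID: 2),
    (ProfID: 103, SolutionID: 1),
};
string previousYear = "23", currentYear = "24";

// Solution 1 shape: filter the IDs first, then keep proficiencies whose key is in that set.
var wantedIds = solutions
    .Where(s => s.SolutionNumber.Substring(0, 2) == previousYear
             || s.SolutionNumber.Substring(0, 2) == currentYear)
    .Select(s => s.ID);
var viaContains = proficiencies
    .Where(p => wantedIds.Contains(p.SolutionID))
    .Select(p => p.ProfID)
    .ToList();

// Solution 2 shape: join, filter on the joined solution, then de-duplicate.
var viaJoin = (from p in proficiencies
               join s in solutions on p.SolutionID equals s.ID
               where s.SolutionNumber.Substring(0, 2) == previousYear
                  || s.SolutionNumber.Substring(0, 2) == currentYear
               select p.ProfID).Distinct().ToList();

Console.WriteLine(string.Join(", ", viaContains));
Console.WriteLine(string.Join(", ", viaJoin));
```

Both forms select the same three proficiencies here. Because the ID column is unique in this data, the join produces no duplicates and Distinct has nothing to remove.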

Reusing LINQ query results in another LINQ query without re-querying the database

I have a situation where my application constructs a dynamic LINQ query using PredicateBuilder based on user-specified filter criteria (aside: check out this link for the best EF PredicateBuilder implementation). The problem is that this query usually takes a long time to run and I need the results of this query to perform other queries (i.e., joining the results with other tables). If I were writing T-SQL, I'd put the results of the first query into a temporary table or a table variable and then write my other queries around that. I thought of getting a list of IDs (e.g., List<Int32> query1IDs) from the first query and then doing something like this:
var query2 = DbContext.TableName.Where(x => query1IDs.Contains(x.ID));
This will work in theory; however, the number of IDs in query1IDs can be in the hundreds or thousands (and the LINQ expression x => query1IDs.Contains(x.ID) gets translated into a T-SQL "IN" statement, which is bad for obvious reasons) and the number of rows in TableName is in the millions. Does anyone have any suggestions as to the best way to deal with this kind of situation?
Edit 1: Additional clarification as to what I'm doing.
Okay, I'm constructing my first query (query1) which just contains the IDs that I'm interested in. Basically, I'm going to use query1 to "filter" other tables. Note: I am not using a ToList() at the end of the LINQ statement---the query is not executed at this time and no results are sent to the client:
var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID);
Then I take query1 and use it to filter another table (TableName2). I now put ToList() at the end of this statement because I want to execute it and bring the results to the client:
var query2 = (from a in DbContext.TableName2 join id in query1 on a.ID equals id select new { a.Column1, a.column2, a.column3,...,a.columnM }).ToList();
Then I take query1 and re-use it to filter yet another table (TableName3), execute it and bring the results to the client:
var query3 = (from a in DbContext.TableName3 join id in query1 on a.ID equals id select new { a.Column1, a.column2, a.column3,...,a.columnM }).ToList();
I can keep doing this for as many queries as I like:
var queryN = (from a in DbContext.TableNameN join id in query1 on a.ID equals id select new { a.Column1, a.column2, a.column3,...,a.columnM }).ToList();
The Problem: query1 takes a long time to execute. When I execute query2, query3...queryN, query1 is being executed (N-1) times...this is not a very efficient way of doing things (especially since query1 isn't changing). As I said before, if I were writing T-SQL, I would put the result of query1 into a temporary table and then use that table in the subsequent queries.
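The repeated execution described above is a consequence of deferred execution, which can be observed with LINQ to Objects. In this sketch the `evaluations` counter and the small integer arrays are hypothetical stand-ins for the expensive filter and the tables. It illustrates the behaviour only; it is not a recommendation to pull thousands of IDs to the client.

```csharp
using System;
using System.Linq;

int evaluations = 0;

// Stand-in for the expensive query1: nothing runs until the query is enumerated.
var query1 = Enumerable.Range(1, 5)
    .Where(id => { evaluations++; return id % 2 == 1; });

var table2 = new[] { 1, 2, 3 };
var table3 = new[] { 3, 4, 5 };

// Each join enumerates query1 again, so the filter runs once per dependent query.
var query2 = table2.Join(query1, a => a, id => id, (a, id) => a).ToList();
var query3 = table3.Join(query1, a => a, id => id, (a, id) => a).ToList();
int deferredEvaluations = evaluations;

// Materializing once runs the filter a single time; later joins reuse the list.
evaluations = 0;
var cachedIds = query1.ToList();
var cached2 = table2.Join(cachedIds, a => a, id => id, (a, id) => a).ToList();
var cached3 = table3.Join(cachedIds, a => a, id => id, (a, id) => a).ToList();
int cachedEvaluations = evaluations;

Console.WriteLine($"deferred: {deferredEvaluations}, cached: {cachedEvaluations}");
```

In the deferred case the filter runs over all five source items for each of the two joins; after `ToList()` it runs over them only once.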
Edit 2:
I'm going to give the credit for answering this question to Albin Sunnanbo for his comment:
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
I think that's really the best that one can do with Entity Framework. In the end, if the performance gets really bad, I'll probably go with John Wooley's suggestion:
This may be a situation where dropping to native ADO against a stored proc returning multiple results and using an internal temp table might be your best option for this operation. Use EF for the other 90% of your app.
Thanks to everyone who commented on this post...I appreciate everyone's input!
If TableName is small enough to load in full, you can use
var tableNameById = DbContext.TableName.ToDictionary(x => x.ID);
to fetch the whole table and automatically put it in a local Dictionary with ID as key.
Another way is to "force" the LINQ evaluation with .ToList(); in that case you fetch the whole table and do the Where part locally with LINQ to Objects.
var query1Lookup = new HashSet<int>(query1IDs);
var query2 = DbContext.TableName.ToList().Where(x => query1Lookup.Contains(x.ID));
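As a rough in-memory sketch of the two local approaches above, assuming the table has already been loaded: the `tableName` list and its `Name` field are hypothetical stand-ins for the rows of `DbContext.TableName`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: IDs from the first query and the rows of the loaded table.
var query1IDs = new List<int> { 2, 4, 6 };
var tableName = Enumerable.Range(1, 6)
    .Select(i => (ID: i, Name: "row" + i))
    .ToList();

// A HashSet gives constant-time membership tests instead of scanning the ID list per row.
var query1Lookup = new HashSet<int>(query1IDs);
var viaHashSet = tableName
    .Where(x => query1Lookup.Contains(x.ID))
    .Select(x => x.Name)
    .ToList();

// A Dictionary keyed by ID turns the filter around: look up each wanted ID directly.
var tableNameById = tableName.ToDictionary(x => x.ID);
var viaDictionary = query1IDs
    .Where(id => tableNameById.ContainsKey(id))
    .Select(id => tableNameById[id].Name)
    .ToList();

Console.WriteLine(string.Join(", ", viaHashSet));
Console.WriteLine(string.Join(", ", viaDictionary));
```

Either way the whole table has to be in memory first, so this only makes sense when the table is small.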
Edit:
Storing the IDs from one query in a list and using that list as a filter in another query can usually be rewritten as a join.
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
Since you are running a subsequent query off the results, take your first query and use it as a View on your SQL Server, add the view to your context, and build your LINQ queries against the view.
Have you considered composing your query as per this article (using the decorator design pattern):
Composed LINQ Queries using the Decorator Pattern
The premise is that, instead of enumerating your first (very costly) query, you use the decorator pattern to produce a chain of IQueryable that combines query 1 and query N. This way you always execute the filtered form of the query.
Hope this might help

How can I make this SelectMany use a Join?

Given that I have three tables (Customer, Orders, and OrderLines) in a Linq To Sql model where
Customer -- One to Many -> Orders -- One to Many -> OrderLines
When I use
var customer = Customers.First();
var manyWay = from o in customer.CustomerOrders
from l in o.OrderLines
select l;
I see one query getting the customer, that makes sense. Then I see a query for the customer's orders and then a single query for each order getting the order lines, rather than joining the two. Total of n + 1 queries (not counting getting customer)
But if I use
var tableWay = from o in Orders
from l in OrderLines
where o.Customer == customer
&& l.Order == o
select l;
Then instead of seeing a single query for each order getting the order lines, I see a single query joining the two tables. Total of 1 query (not counting getting customer)
I would prefer to use the first LINQ query as it seems more readable to me, but why isn't L2S joining the tables as I would expect in the first query? Using LINQPad I see that the second query is being compiled into a SelectMany, though I see no alteration to the first query. I'm not sure if that's an indicator of some problem in my query.
I think the key here is
customer.CustomerOrders
That's an EntitySet, not an IQueryable, so your first query doesn't translate directly into a SQL query. Instead, it is interpreted as many queries, one for each Order.
That's my guess, anyway.
How about this:
Customers.First().CustomerOrders.SelectMany(item => item.OrderLines)
I am not 100% sure. But my guess is because you are traversing down the relationship that is how the query is built up, compared to the second solution where you are actually joining two sets by a value.
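For reference, two `from` clauses in query syntax and an explicit `SelectMany` call are the same operation. The following in-memory sketch uses a hypothetical `orders` array in place of `customer.CustomerOrders`. It shows only the equivalence of the two syntaxes and says nothing about how LINQ to SQL translates an EntitySet.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for customer.CustomerOrders: each order carries its own lines.
var orders = new[]
{
    (OrderId: 1, OrderLines: new[] { "1-a", "1-b" }),
    (OrderId: 2, OrderLines: new[] { "2-a" }),
};

// Query syntax: the second from clause is compiled into a SelectMany call.
var querySyntax = (from o in orders
                   from l in o.OrderLines
                   select l).ToList();

// Method syntax: the same flattening written explicitly.
var methodSyntax = orders.SelectMany(o => o.OrderLines).ToList();

Console.WriteLine(string.Join(", ", querySyntax));
Console.WriteLine(string.Join(", ", methodSyntax));
```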
So after Francisco's answer and experimenting with LINQPad I have come up with a decent workaround.
var lines = from c in Customers
where c == customer
from o in c.CustomerOrders
from l in o.OrderLines
select l;
This forces the EntitySet into an Expression, which the provider then turns into the appropriate query. The first two lines are the key: by querying the IQueryable and then putting the EntitySet in the SelectMany, it becomes an expression. This works for the other operators as well (Where, Select, etc.).
Try this query:
IQueryable<OrderLine> query =
from c in myDataContext.customers.Take(1)
from o in c.CustomerOrders
from l in o.OrderLines
select l;
You can go to the CustomerOrders property definition and see how the property acts when it is used with an actual instance. When the property is used in a query expression, the behavior is up to the query provider; the property code is usually not run in that case.
See also this answer, which demonstrates a method that behaves differently in a query expression, than if it is actually called.

LINQ Join 2 Datatables and With SUM AND GROUP BY

I simply can not get this to work out at all, so any expert help would be very much appreciated.
I'm trying (as the subject suggests) to join 2 DataTables on Zip Code, but return a table that is grouped by State and has a SUM() of sales.
Here's the latest version of my troubles:
var results =(
from a in dtList.AsEnumerable()
join b in dtListCoded.AsEnumerable()
on a.Field<string>("ZIP") equals b.Field<string>("zip")
group a by {a.Field<string>("StateCode")} into g
select new {
StateCode = a.Field<string>("StateCode"),
SumSales = b.Sum(b => b.Field<double>("SUMSales"))
});
I can join the 2 tables, but getting the result I need seems to be the tricky bit. If need be I will just have to do 2 queries, but that just seems a bit backward.
Thanks in advance.
Two queries wouldn't be any slower (they should be brought together into a single SQL query upon execution), and would be a lot more readable, transparent during debugging and reusable. I'd recommend breaking it down.
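For comparison, a single-query version that compiles could look like the sketch below. It assumes, based on the question's code, that `StateCode` lives in `dtList` and `SUMSales` in `dtListCoded`. The sample rows are made up for illustration. The key changes are grouping the joined `b` rows by the state taken from `a`, and reading the key and the sum from the group `g` rather than from `a` and `b`, which are out of scope after `into g`.

```csharp
using System;
using System.Data;
using System.Linq;

// Made-up sample data; the column placement is an assumption based on the question.
var dtList = new DataTable();
dtList.Columns.Add("ZIP", typeof(string));
dtList.Columns.Add("StateCode", typeof(string));
dtList.Rows.Add("10001", "NY");
dtList.Rows.Add("10002", "NY");
dtList.Rows.Add("94105", "CA");

var dtListCoded = new DataTable();
dtListCoded.Columns.Add("zip", typeof(string));
dtListCoded.Columns.Add("SUMSales", typeof(double));
dtListCoded.Rows.Add("10001", 100.0);
dtListCoded.Rows.Add("10002", 50.5);
dtListCoded.Rows.Add("94105", 20.0);

// Join on zip code, group the sales rows by state, then sum within each group.
var results = (from a in dtList.AsEnumerable()
               join b in dtListCoded.AsEnumerable()
                   on a.Field<string>("ZIP") equals b.Field<string>("zip")
               group b by a.Field<string>("StateCode") into g
               select new
               {
                   StateCode = g.Key,
                   SumSales = g.Sum(row => row.Field<double>("SUMSales"))
               }).ToList();

foreach (var r in results)
    Console.WriteLine($"{r.StateCode}: {r.SumSales}");
```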

Use a Criteria on top of another Criteria

I have a question on criteria:
How can I use a Criteria (or similar) that filters, or otherwise operates on, the result of another Criteria?
Something like:
select clients.* from
(select * from clients) as clients
The real problem is something else, but achieving this behaviour would be terrific...
(btw, both java and .net are welcome to help)
thanks
It can't be done, AFAIK. The tutorial about HQL says:
Note that HQL subqueries can occur only in the select or where clauses.
I can't find the same statement about Criteria, but in the API the only way to create a criteria is to give it a mapped type. There is support for subqueries, but only in the where clause. Here is the javadoc.
Your FROM clause needs to be a mapped object. You could do a subselect inside the WHERE clause... something like:
select c from clients c where c.id in (select c2.id from clients c2)
It would help if you could give a better example. The example you gave can be reduced down to the following HQL:
"from clients"
...which isn't terribly useful.
You could try adding a NHibernate.Criterion.InExpression to your criteria.
Found an example on this blog:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/08/26/parameter-lists-in-nhibernate.aspx
I'm not sure I understand your question correctly, but if you want to do a select on a list of objects, you can use subqueries with DetachedCriteria. I use it all the time, especially for paging objects while creating a left outer join, which could otherwise give me an incorrect number of entities.
Imagine you've got users who buy products, with a relationship many-many:
Dim dc As DetachedCriteria = DetachedCriteria.For(GetType(User)).SetFirstResult(pageNumber * itemsPerPage).SetMaxResults(itemsPerPage)
Session.CreateCriteria(GetType(user)).Add(Subqueries.PropertyIn("Id", dc)).CreateAlias("ProductsBought", "pb", NHibernate.SqlCommand.JoinType.LeftOuterJoin)
Thom's right, perhaps you should be more precise...
