How to query entities from unrelated tables in one batch - C#

I would like to query two different tables, say apples and cars, which have no relationship, so that ActiveRecord goes to the database only once.
Example in pseudocode:
var q1 = new Query("select * from apple");
var q2 = new Query("select * from car");
var batchQuery = new BatchQuery(q1, q2);
var result = batchQuery.Execute(); // only one trip to the database
var apples = result[0] as IEnumerable<Apple>;
var cars = result[1] as IEnumerable<Car>;
I have tried ActiveRecordMultiQuery, but there all queries need to target the same table.

I don't believe there is a way to do this.
It seems like you might be going a bit overboard with optimization here: does it really make a noticeable difference to your application to issue two separate queries? Your time might be better spent hunting down N+1 select queries elsewhere in your application instead.
If the cost of one extra query is in fact significant, then you probably have an issue with the database server or the connection to it.
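If the round trip really does matter, one workaround outside ActiveRecord is to drop to raw ADO.NET and send both SELECTs in a single command, reading the two result sets with NextResult(). A minimal sketch, assuming SQL Server, a connection string, and Apple/Car classes with the columns shown (all of which are assumptions, not part of the question's model):

```csharp
var apples = new List<Apple>();
var cars = new List<Car>();

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT Id, Variety FROM Apple; SELECT Id, Model FROM Car;", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())   // first result set: apples
            apples.Add(new Apple { Id = reader.GetInt32(0), Variety = reader.GetString(1) });

        reader.NextResult();    // advance to the second result set; still one round trip
        while (reader.Read())   // second result set: cars
            cars.Add(new Car { Id = reader.GetInt32(0), Model = reader.GetString(1) });
    }
}
```

You lose ActiveRecord's mapping for these two queries, but both statements travel to the server in one batch.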

Related

Optimizing LINQ Query using Entity Framework

I have the following LINQ query to get product information using Entity Framework:
productDetails.Items = (from productDetail in db.ToList()
                        select new prod
                        {
                            ID = productDetail.ID,
                            ProdName = productDetail.ProductName,
                            ...
                            Calaculation1 = GetCalaculation(productDetail.Calc1),
                            Calaculation2 = GetCalaculation(productDetail.Calc2),
                            ...
                            Calaculation15 = GetCalaculation(productDetail.Calc15)
                        }
                       ).ToList();
where the GetCalaculation method also queries the DB using LINQ. The query is slow when I am fetching hundreds of records. How can I optimize it?
First of all, the structure of your select looks a "little" problematic to me, since you are fetching 15 Calaculation properties for each record. Even if you create a view in the database, it will have 15 joins to the Calculations table, which is very bad for performance. So the first thing you should do is review your object structure and confirm that you REALLY need all those calculations fetched in one request.
If you insist that your structure cannot be changed, here are some steps that could significantly improve the performance:
If the tables don't change too often, you may consider creating a materialized view (a view with a clustered index in SQL Server) that contains the already-calculated data. In that case the query will be very fast, but inserts/updates into the underlying tables will be much slower.
Do not use db.ToList() in your query - by doing that you fetch the whole table into memory and then issue separate queries for each one of the calculations.
I am a little confused about this query:
var dbQuery = from calculation in db.Calculations
              where calculation.calc == calc1
              select calculation;
var totalsum = (from xyz in dbQuery select (Decimal?)xyz.calc).Sum() ?? 0;
You are fetching all the records that have calc == calc1 and then calculating their sum? Wouldn't it be much easier to count how many records have calc == calc1 and then multiply that count by calc1:
db.Calculations.Count(c => c.calc == calc1) * calc1;
It could be cheaper to fetch the whole Calculations table into memory together with the Product table (var calcTable = db.Calculations.ToList()) if it has a limited number of records; then GetCalaculation will work with in-memory objects, which will be faster. If you are going to do that, you may consider doing it in parallel or in separate Tasks.
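As a sketch of that pre-loading idea: compute the per-key sums in one grouped query up front, then let each GetCalaculation become an in-memory dictionary lookup. The Calculations/calc names follow the snippet above; everything else is an assumption about the real model:

```csharp
// One round trip: group and sum on the server, then cache the results locally.
var calcSums = db.Calculations
                 .GroupBy(c => c.calc)
                 .Select(g => new { g.Key, Sum = g.Sum(x => (Decimal?)x.calc) ?? 0 })
                 .ToDictionary(g => g.Key, g => g.Sum);

// Inside the projection, replace the per-row DB call with a lookup, e.g.:
// Calaculation1 = calcSums.ContainsKey(productDetail.Calc1)
//                     ? calcSums[productDetail.Calc1] : 0,
```

This trades 15 queries per record for one grouped query plus O(1) lookups.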

Entity Framework COUNT is doing a SELECT of all records

Profiling my code because it is taking a long time to execute, I found it is generating a SELECT instead of a COUNT, and as there are 20,000 records it is very, very slow.
This is the code:
var catViewModel= new CatViewModel();
var catContext = new CatEntities();
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
catViewModel.NumberOfCats = catAccount.Cats.Count();
It is straightforward stuff, but the SQL that the profiler is showing is:
exec sp_executesql N'SELECT
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy -- you get the idea
FROM [dbo].[Cats] AS [Extent1]
WHERE [Extent1].[AccountId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=7
I've never seen this behaviour before, any ideas?
Edit: It is fixed if I simply do this instead:
catViewModel.NumberOfRecords = catContext.Cats.Where(c => c.AccountId == accountId).Count();
I'd still like to know why the former didn't work though.
So you have two completely separate queries going on here, and I think I can explain why you get different results. Let's look at the first one:
// pull a single account record
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
// count all the associated Cat records against said account
catViewModel.NumberOfCats = catAccount.Cats.Count();
Going on the assumption that Cats has a 0..* relationship with Account, and assuming you are leveraging the framework's ability to lazily load navigation properties, your first access to catAccount.Cats triggers a SELECT for all the associated Cat records for that particular account. This brings the whole collection into memory, so the subsequent call to Count() is just an internal check of the in-memory collection's count (hence no COUNT SQL is generated).
The second query
catViewModel.NumberOfRecords =
catContext.Cats.Where(c => c.AccountId == accountId).Count();
is directly against the Cats table (which is IQueryable<T>), therefore the only operations performed against the table are Where/Count, and both of these are evaluated on the database side, so it's obviously a lot more efficient than the first.
However, if you need both the Account and its Cats, then I would recommend you eager-load the data on the fetch; that way you take the hit upfront, once:
var catAccount = catContext.Account.Include(a => a.Cats).Single(...);
Most times, when somebody accesses a sub-collection of an entity, it is because there is a limited number of records and it is acceptable to populate the collection. Thus, when you access:
catAccount.Cats
(regardless of what you do next), it fills that collection. Your .Count() then operates on the local in-memory collection. The problem is that you don't want that. Now you have a few options:
check whether your provider offers some mechanism to make that a query rather than a collection
build the query dynamically
access the core data model instead
I'm pretty confident that if you did:
catViewModel.NumberOfRecords =
catContext.Cats.Count(c => c.AccountId == accountId);
it will work just fine. Less convenient? Sure. But "works" is better than "convenient".
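On the first of those options: if you are on the DbContext API (EF 4.1+), the explicit-loading entry point can count the related rows without materializing them, while keeping the navigation-property style. A sketch against the question's model:

```csharp
// Query() returns an IQueryable<Cat> already filtered to this account,
// so Count() translates to a server-side SELECT COUNT(*) rather than
// loading the collection into memory first.
int numberOfCats = catContext.Entry(catAccount)
                             .Collection(a => a.Cats)
                             .Query()
                             .Count();
```

This only applies to the DbContext API; on the older ObjectContext API you would fall back to querying catContext.Cats directly as shown above.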

linq to sql query with 2 dbml files

I have an object of MyFriendFollowStatus; for each friend I need information from two different databases, so I wrote something like this:
db1Context FFdb = new db1Context();
db2Context EEdb = new db2Context();
foreach (fbFriendsFollowStatus a in fbids)
{
    long ffID = FFdb.FFUsers.Where(x => x.FacebookID == a.fbid).FirstOrDefault().FFUserID;
    a.ffID = ffID;
    int status = EEdb.StatusTable.Where(x => x.ffID == ffID).FirstOrDefault().Status;
    a.Status = status;
}
this works, but it doesn't really seem right - hitting the two databases once each for every user. Is there something built into LINQ to SQL that helps with something like this? Or some type of join I can use across two different databases?
Well, you can always reduce your N+1 query problem to three queries: one to get the users, one to get the users' data from the first database, and one for the second database. Then connect all the results in memory - this limits the number of connections to the databases, which should improve the performance of your application.
I don't know whether LINQ to SQL or Entity Framework offers building a model from different databases - this would probably pose some performance problems (with Includes and the like), but I may simply not be aware of such a feature.
Sample code to do what you're trying to achieve would look something like this:
var facebookIds = fbids.Select(a => a.fbid).ToList();
var FFUserIds = FFdb.FFUsers.Where(x => facebookIds.Contains(x.FacebookID))
                            .Select(x => new { x.FacebookID, x.FFUserID }).ToList();
var ffIds = FFUserIds.Select(x => x.FFUserID).ToList();
var statuses = EEdb.StatusTable.Where(x => ffIds.Contains(x.ffID))
                               .Select(x => new { x.ffID, x.Status }).ToList();
And then some code to match the results up in memory - that part is straightforward.
Please note that this is sample code - I may have mismatched some ids or names - but the idea should be clear.
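The in-memory matching step could look something like this sketch, assuming FFUserIds holds the { FacebookID, FFUserID } pairs and statuses holds the { ffID, Status } pairs fetched above (property names follow the question's snippet and are assumptions):

```csharp
// Build O(1) lookups from the two result sets, then stitch them onto
// the friend objects - no further database calls needed.
var idByFacebook = FFUserIds.ToDictionary(x => x.FacebookID, x => x.FFUserID);
var statusById = statuses.ToDictionary(x => x.ffID, x => x.Status);

foreach (fbFriendsFollowStatus a in fbids)
{
    long ffID;
    if (idByFacebook.TryGetValue(a.fbid, out ffID))
    {
        a.ffID = ffID;
        int status;
        if (statusById.TryGetValue(ffID, out status))
            a.Status = status;
    }
}
```

TryGetValue also protects against friends that have no matching row in either database, which the original FirstOrDefault().FFUserID code would have turned into a NullReferenceException.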

Manipulating entity framework to eliminate round trips to the database

Let's say I have the following bit of code (which I know could be easily modified to perform better, but it illustrates what I want to do)
List<Query> l = new List<Query>();
// Query is a class that doesn't exist; it represents an EF operation
foreach (var x in Xs)
{
    Query o = { context.someEntity.Where(s => s.Id == x.Id).First(); };
    // it wouldn't execute it; this is pseudocode for a delegate/anonymous function
    l.Add(o);
}
Then I would send this list of Query objects to EF and have it optimize them so that it makes the fewest round trips possible. Let's call it BatchOptimizeAndRun; you would say:
var results = BatchOptimizeAndRun(l);
And knowing what it knows from the schema, it would reduce the overall work to an optimal query, execute that, and place the read results in an array.
I hope I've described what I'm looking for accurately and more importantly that it exists.
And if I sound like a rambling mad man, let's pretend this question never existed.
I'd have to echo Mr. Moore's advice, as I too have spent far too long constructing a LINQ-to-Entities query of monolithic proportions, only to find that I could have written a stored procedure in less time that was easier to read and faster to execute. That being said, in your example...
List<int> ids = Xs.Select(x => x.Id).ToList();
var results = context.someEntity.Where(s => ids.Contains(s.Id)).ToList();
I believe this will translate to something like:
SELECT
*
FROM
someEntity
WHERE
Id IN (ids) --Where ids is a comma separated list of INT
Which will provide you with what you need.

Reusing LINQ query results in another LINQ query without re-querying the database

I have a situation where my application constructs a dynamic LINQ query using PredicateBuilder based on user-specified filter criteria (aside: check out this link for the best EF PredicateBuilder implementation). The problem is that this query usually takes a long time to run and I need the results of this query to perform other queries (i.e., joining the results with other tables). If I were writing T-SQL, I'd put the results of the first query into a temporary table or a table variable and then write my other queries around that. I thought of getting a list of IDs (e.g., List<Int32> query1IDs) from the first query and then doing something like this:
var query2 = DbContext.TableName.Where(x => query1IDs.Contains(x.ID))
This will work in theory; however, the number of IDs in query1IDs can be in the hundreds or thousands (and the LINQ expression x => query1IDs.Contains(x.ID) gets translated into a T-SQL "IN" statement, which is bad for obvious reasons) and the number of rows in TableName is in the millions. Does anyone have any suggestions as to the best way to deal with this kind of situation?
Edit 1: Additional clarification as to what I'm doing.
Okay, I'm constructing my first query (query1), which just selects the IDs that I'm interested in. Basically, I'm going to use query1 to "filter" other tables. Note: I am not calling ToList() at the end of the LINQ statement - the query is not executed at this time and no results are sent to the client:
var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID)
Then I take query1 and use it to filter another table (TableName2). I now put ToList() at the end of this statement because I want to execute it and bring the results to the client:
var query2 = (from a in DbContext.TableName2 join b in query1 on a.ID equals b select new { a.Column1, a.Column2, a.Column3, ..., a.ColumnM }).ToList();
Then I take query1 and re-use it to filter yet another table (TableName3), execute it and bring the results to the client:
var query3 = (from a in DbContext.TableName3 join b in query1 on a.ID equals b select new { a.Column1, a.Column2, a.Column3, ..., a.ColumnM }).ToList();
I can keep doing this for as many queries as I like:
var queryN = (from a in DbContext.TableNameN join b in query1 on a.ID equals b select new { a.Column1, a.Column2, a.Column3, ..., a.ColumnM }).ToList();
The Problem: query1 takes a long time to execute. When I execute query2, query3, ..., queryN, query1 is executed (N-1) more times... this is not a very efficient way of doing things (especially since query1 isn't changing). As I said before, if I were writing T-SQL, I would put the result of query1 into a temporary table and then use that table in the subsequent queries.
Edit 2:
I'm going to give the credit for answering this question to Albin Sunnanbo for his comment:
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
I think that's really the best one can do with Entity Framework. In the end, if the performance gets really bad, I'll probably go with John Wooley's suggestion:
This may be a situation where dropping to native ADO against a stored proc returning multiple results and using an internal temp table might be your best option for this operation. Use EF for the other 90% of your app.
Thanks to everyone who commented on this post...I appreciate everyone's input!
If the size of TableName is not too big to load the whole table, you can use
var tableNameById = DbContext.TableName.ToDictionary(x => x.ID);
to fetch the whole table and automatically put it in a local Dictionary keyed by ID.
Another way is to just "force" the LINQ evaluation with .ToList(), in this case fetching the whole table and doing the Where part locally with LINQ to Objects:
var query1Lookup = new HashSet<int>(query1IDs);
var query2 = DbContext.TableName.ToList().Where(x => query1Lookup.Contains(x.ID));
Edit:
Storing a list of IDs from one query and using that list as a filter in another query can usually be rewritten as a join.
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
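As a sketch of that join rewrite, using the table names from the question: instead of materializing query1's IDs and filtering with Contains, compose the two IQueryables so the server runs a single joined query.

```csharp
// query1 is never enumerated on its own; it is folded into query2's SQL.
var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID);

var query2 = (from a in DbContext.TableName2
              join id in query1 on a.ID equals id
              select a).ToList();   // one SQL statement with a join, no IN list
```

The expensive filter still runs once per final query, but entirely on the server, where indexes can help - which is why the comment emphasizes index tuning.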
Since you are running subsequent queries off the results, take your first query and create it as a view on your SQL Server, add the view to your context, and build your LINQ queries against the view.
Have you considered composing your query as per this article (using the decorator design pattern):
Composed LINQ Queries using the Decorator Pattern
The premise is that, instead of enumerating your first (very costly) query, you use the decorator pattern to produce a chain of IQueryables that is the result of query 1 through query N. This way you always execute the fully filtered form of the query.
Hope this helps.
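A minimal sketch of that composition idea (the entity and the IsActive filter are placeholders, not from the question):

```csharp
// Each step wraps the previous IQueryable and adds a filter; nothing runs
// against the database until ToList() enumerates the final composed query.
IQueryable<SomeEntity> query = DbContext.SomeEntities;   // base query
query = query.Where(ComplexFilterLogic);                 // decorator 1: the costly user filter
query = query.Where(x => x.IsActive);                    // decorator 2: an assumed extra filter
var results = query.ToList();                            // single round trip, fully composed SQL
```

Because every decorator returns an IQueryable rather than executing, the expensive filter is translated into each final query instead of being run and re-used as a list of IDs.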
