Optimizing LINQ Query using Entity Framework - C#

I have the following LINQ query to get product information using Entity Framework:
productDetails.Items = (from productDetail in db.ToList()
                        select new prod
                        {
                            ID = productDetail.ID,
                            ProdName = productDetail.ProductName,
                            ...
                            ...
                            ...
                            ...
                            Calaculation1 = GetCalaculation(productDetail.Calc1),
                            Calaculation2 = GetCalaculation(productDetail.Calc2),
                            ...
                            ...
                            ...
                            ...
                            Calaculation15 = GetCalaculation(productDetail.Calc15)
                        }).ToList();
where the GetCalaculation method also queries the DB using LINQ. The query is slow when fetching hundreds of records. How can I optimize it?

First of all, the structure of your select looks a "little" problematic to me, since you are fetching 15 Calaculation properties for each record. Even if you create a view in the database, it will have 15 joins to the Calculations table, which is very bad for performance. So the first thing you should do is review your object structure and confirm that you REALLY need all those calculations fetched in one request.
If you insist that your structure cannot be changed, here are some steps that could significantly improve the performance:
If the tables don't change too often, you may consider creating a materialized view (a view with a clustered index in SQL Server) that contains the already-calculated data. In this case the query will be very fast, but inserts/updates to the underlying tables will be much slower.
Do not use db.ToList() in your query - by doing that you fetch your whole table into memory and then issue a separate query for each one of the calculations.
I am a little confused about this query:
var dbQuery = from calculation in db.Calculations
              where calculation.calc == calc1
              select calculation;
var totalsum = (from xyz in dbQuery select (Decimal?)xyz.calc).Sum() ?? 0;
You are fetching all the records that have calc == calc1 and then calculating their sum? Wouldn't it be much easier to count how many records have calc == calc1 and then multiply that count by calc1:
db.Calculations.Count(c => c.calc == calc1) * calc1;
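Alternatively, if you want to keep the Sum, it can be pushed to the database rather than computed over fetched rows; a minimal sketch, assuming db.Calculations is an EF IQueryable<T> rather than an in-memory list:

// Translates to SELECT SUM(...) WHERE ... on the SQL side;
// no rows are materialized in memory.
var totalSum = db.Calculations
    .Where(c => c.calc == calc1)
    .Sum(c => (decimal?)c.calc) ?? 0;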
It could be cheaper to fetch the whole Calculations table into memory together with the Product table (var calcTable = db.Calculations.ToList()) if it has a limited number of records; then GetCalaculation will work with in-memory objects, which will be faster. If you are going to do that, you may consider doing it in parallel or in separate Tasks.
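A minimal sketch of that in-memory approach, assuming the Calculations table is small and that GetCalaculation just sums the matching rows (the real calculation is not shown in the question):

// One round trip: materialize the small Calculations table once,
// then group it for cheap in-memory lookups instead of one DB query per property.
var calcLookup = db.Calculations.ToList().ToLookup(c => c.calc);

decimal GetCalaculation(decimal calc) =>
    calcLookup[calc].Sum(c => (decimal?)c.calc) ?? 0;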

Related

LINQ Query is running slow but only on two of my statements

I have 2 LINQ statements below that are part of a larger query. I have about 6 other statements that do very similar things to the 2 statements below. My query without these 2 statements executes in about 237ms. When I add these 2, it adds on about 10 seconds.
The demandXPCILStatuses table has about 30k records and the demand table has about 13k.
The PCILStatuses table has 6 records in it.
After doing timing on other tables that have about the same number of records, I have pretty much ruled out it being too much data, which I never really thought it was anyway, but I thought I would run some tests.
DemandXPCILStatus = (from demandXPCILStatus in demandXPCILStatuses
                     where demand.ID == demandXPCILStatus.DemandID
                        && demandXPCILStatus.Active == true
                     select demandXPCILStatus).FirstOrDefault(),

PCILStatus = (from demandXPCILStatus in demandXPCILStatuses
              join PCILStatus in PCILStatuses
                 on new { A = demandXPCILStatus.PCILStatusID,
                          B = demandXPCILStatus.DemandID,
                          C = demandXPCILStatus.Active }
                 equals new { A = PCILStatus.ID, B = demand.ID, C = true }
              select PCILStatus).FirstOrDefault(),
Here is how my tables are designed. I tried to post an image of my database design but I don't have enough points to do that, so here is how it is designed:
DemandXPCILStatus
    ID (PK, int, not null)
    DemandID (int, not null)
    PCILStatusID (int, not null)
    PCILTime (datetime, null)
    LastUpdatedOn (datetime, null)
    Active (bit, null)

PCILStatus
    ID (PK, int, not null)
    Status (nvarchar(50), null)
    Code (nvarchar(10), null)
    Class (nvarchar(30), null)
At this point I don't know what else to try. Any suggestions? FYI this is my first LINQ query so I have almost no idea what I am doing.
I am using Dapper to retrieve the data and put it into memory before running the query. The DemandXPCILStatus table was returning just over 30k records. I know I didn't post the rest of my query, but it makes pretty heavy use of LINQ, and I guess 30k records was just too many for performance. I filtered the data on that table before putting it into memory, and that portion of the query went from about 4.5 seconds to about 2ms.
I guess I was a little unclear on the amount of data LINQ to Objects could handle and map to complex objects. But now that I know, I fixed up my query and it went from running and displaying in about 15 seconds to 1.2 seconds.
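For reference, the filtering fix looks roughly like this; a minimal sketch, assuming a hypothetical IDbConnection named connection and that only Active rows are needed by the rest of the query:

// Filter in SQL before materializing, instead of loading all 30k+ rows;
// Dapper maps the result set straight onto DemandXPCILStatus objects.
var demandXPCILStatuses = connection.Query<DemandXPCILStatus>(
    "SELECT * FROM DemandXPCILStatus WHERE Active = 1").ToList();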

Entity Framework COUNT is doing a SELECT of all records

Profiling my code because it is taking a long time to execute, I found it is generating a SELECT instead of a COUNT, and as there are 20,000 records it is very, very slow.
This is the code:
var catViewModel= new CatViewModel();
var catContext = new CatEntities();
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
catViewModel.NumberOfCats = catAccount.Cats.Count();
It is straightforward stuff, but the code that the profiler is showing is:
exec sp_executesql N'SELECT
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy // You get the idea
FROM [dbo].[Cats] AS [Extent1]
WHERE Cats.[AccountId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=7
I've never seen this behaviour before, any ideas?
Edit: It is fixed if I simply do this instead:
catViewModel.NumberOfRecords = catContext.Cats.Where(c => c.AccountId == accountId).Count();
I'd still like to know why the former didn't work though.
So you have 2 completely separate queries going on here, and I think I can explain why you get different results. Let's look at the first one:
// pull a single account record
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
// count all the associated Cat records against said account
catViewModel.NumberOfCats = catAccount.Cats.Count();
Going on the assumption that Cats has a 0..* relationship with Account, and assuming you are leveraging the framework's ability to lazily load foreign tables, your first call to catAccount.Cats is going to result in a SELECT of all the associated Cat records for that particular account. This brings the whole collection into memory, therefore the call to Count() results in an internal check of the Count property of the in-memory collection (hence no COUNT SQL is generated).
The second query
catViewModel.NumberOfRecords =
catContext.Cats.Where(c => c.AccountId == accountId).Count();
is directly against the Cats table (which is IQueryable<T>), therefore the only operations performed against the table are Where/Count, and both of these are translated into the SQL that gets executed on the DB side, so it's obviously a lot more efficient than the first.
However, if you need both the Account and its Cats, then I would recommend you eager load the data on the fetch; that way you take the hit upfront only once:
var catAccount = catContext.Account.Include(a => a.Cats).Single(...);
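Alternatively, if all you need alongside the account is the number of cats, a projection keeps the COUNT on the SQL side in a single round trip. A minimal sketch, assuming the navigation property is named Cats:

// One query: the database computes the count and no Cat entities
// are materialized in memory.
var accountWithCount = catContext.Account
    .Where(a => a.AccountId == accountId)
    .Select(a => new { Account = a, CatCount = a.Cats.Count() })
    .Single();
catViewModel.NumberOfCats = accountWithCount.CatCount;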
Most times, when somebody accesses a sub-collection of an entity, it is because there are a limited number of records, and it is acceptable to populate the collection. Thus, when you access:
catAccount.Cats
(regardless of what you do next), it is filling that collection. Your .Count() is then operating on the local, in-memory collection. The problem is that you don't want that. Now you have a few options:
check whether your provider offer some mechanism to make that a query rather than a collection
build the query dynamically
access the core data-model instead
I'm pretty confident that if you did:
catViewModel.NumberOfRecords =
catContext.Cats.Count(c => c.AccountId == accountId);
it will work just fine. Less convenient? Sure. But "works" is better than "convenient".

NHibernate: Object hierarchy and performance

I have a database with a Customer table. Each customer has a foreign key to an Installation table, which in turn has a foreign key to an Address table (tables renamed for simplicity).
In NHibernate I'm trying to query the Customer table like this:
ISession session = tx.Session;
var customers = session.QueryOver<Customer>().Where(x => x.Country == country);
var installations = customers.JoinQueryOver(x => x.Installation, JoinType.LeftOuterJoin);
var addresses = installations.JoinQueryOver(x => x.Address, JoinType.LeftOuterJoin);
if (installationType != null)
{
installations.Where(x => x.Type == installationType);
}
return customers.TransformUsing(new DistinctRootEntityResultTransformer()).List<Customer>();
which results in a SQL query similar to this (captured by NHibernate Profiler):
SELECT *
FROM Customer this_
left outer join Installation installati1_
on this_.InstallationId = installati1_.Id
left outer join Address address2_
on installati1_.AddressId = address2_.Id
WHERE this_.CountryId = 4
and installati1_.TypeId = 1
When I execute the above SQL query in Microsoft SQL Server Management Studio, it completes in about 5 seconds but returns ~200,000 records. Nevertheless, it takes a very long time to retrieve the list when running the code; I've been waiting for 10 minutes without any results. The debug log indicated that a lot of objects are constructed and initialized because of the object hierarchy. Is there a way to fix this performance issue?
I'm not sure what you are trying to do, but loading and saving 200,000 records through any OR mapper is not feasible. 200,000 objects will take a lot of memory and time to create. Depending on what you want to do, loading them in pages or running an update query directly on the database (stored procedure or named query) can fix your performance. Paging can be done with:
criteria.SetFirstResult(START).SetMaxResults(PAGESIZE);
NHibernate Profiler shows two times in the duration column x/y, with x being the time to execute the query and y the time to initialize the objects. The first step is to determine where the problem lies. If the query is slow, get the actual query sent to the database using SQL Profiler (assuming SQL Server) and check its performance in SSMS.
However, I suspect your issue may be the logging level. If you have the logging level set to DEBUG, NHibernate will generate very verbose logs and this will significantly impact performance.
Even if you can get it to perform well with 200,000 records, that's more than you can display to the user in a meaningful way. You should use paging/filtering to reduce the size of the result set.
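With the QueryOver API from the question, paging could look like the following; a minimal sketch, assuming hypothetical page and pageSize variables:

// Fetch one page of customers instead of all ~200,000 rows.
var pageOfCustomers = session.QueryOver<Customer>()
    .Where(x => x.Country == country)
    .Skip(page * pageSize)  // equivalent to SetFirstResult
    .Take(pageSize)         // equivalent to SetMaxResults
    .List<Customer>();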

How to query entities from unrelated tables in one batch

I would like to query two different tables, say apples and cars, which have no relationship, so that ActiveRecord goes to the database only once.
Example in pseudocode:
var q1 = new Query("select * from apple");
var q2 = new Query("select * from car");
var batchQuery = new BatchQuery(q1, q2);
var result = batchQuery.Execute(); // only one trip to the database
var apples = result[0] as IEnumerable<Apple>;
var cars = result[1] as IEnumerable<Car>;
I have tried ActiveRecordMultiQuery, but there all queries need to target the same table.
I don't believe there is a way to do this.
It seems like you might be going a bit overboard with optimization here: does it really make a noticeable difference to your application to make 2 separate queries? I think your time might be better spent looking for N+1 select queries elsewhere in your application instead.
If the cost of one extra query is in fact significant then you probably have an issue with the database server or the connection to it.
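That said, if a single round trip really is required, you can drop below ActiveRecord for this one call: plain ADO.NET lets you batch two SELECTs in one command and read both result sets. A minimal sketch, assuming SQL Server and hypothetical Apple/Car classes with manual mapping:

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT * FROM apple; SELECT * FROM car;", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        var apples = new List<Apple>();
        while (reader.Read())
            apples.Add(new Apple { Id = reader.GetInt32(0) /* map remaining columns */ });

        reader.NextResult(); // advance to the second (car) result set

        var cars = new List<Car>();
        while (reader.Read())
            cars.Add(new Car { Id = reader.GetInt32(0) /* map remaining columns */ });
    }
}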

Entity model .net querying 1 million records from MySQL performance issues

I am using an ADO.NET Entity Data Model for querying a MySQL database. I was very happy with its implementation and usage. Then I decided to see what would happen if I queried 1 million records, and it has serious performance issues that I don't understand.
The system hangs for some time and then I get either a deadlock exception or a MySQL exception.
My code is as follows:
try
{
    // works very fast
    var data = from employees in dataContext.employee_table
                   .Include("employee_type")
                   .Include("employee_status")
               orderby employees.EMPLOYEE_ID descending
               select employees;

    // This hangs the system and causes some deadlock exception
    IList<employee_table> result = data.ToList<employee_table>();
    return result;
}
catch (Exception ex)
{
    throw new MyException("Error in fetching all employees", ex);
}
My question is why is ToList() taking such a long time?
Also how can I avoid this exception and what is the ideal way to query a million records?
The ideal way to query a million records is to use an IQueryable<T> to make sure that you aren't actually executing a query against the database until you need the data. I highly doubt that you need a million records at once.
The reason it is deadlocking is that you are asking the MySQL server to pull those million records from the database, sort them by EMPLOYEE_ID, and return them all to your program. So I imagine the deadlocks come from your program waiting for that to finish while trying to read it all into memory. The MySQL problems are probably related to timeouts.
The reason the var data section runs quickly is that you haven't actually done anything yet; you've just constructed the query. When you call ToList(), all of the SQL is executed and the results are read. This is what is known as deferred execution (often loosely called lazy loading).
I would suggest trying this:
var data = from employees in dataContext.employee_table
               .Include("employee_type")
               .Include("employee_status")
           orderby employees.EMPLOYEE_ID descending
           select employees;
Then, when you actually need something from the list, just call:
data.Where(/* your filter expression */).ToList()
So if you needed the employee with ID 10:
var employee = data.Where(e => e.ID == 10).ToList();
Or if you need all the employees whose last names start with S (I don't know if your table has a last name column; this is just an example):
var employees = data.Where(e => e.LastName.StartsWith("S")).ToList();
Or if you want to page through all of the employees in chunks of 100:
var employees = data.Skip(page * 100).Take(100).ToList();
If you want to defer your database calls even further, you can skip ToList() entirely and just iterate the query when you need it. So let's say you want to add up all of the salaries of the people whose names start with A:
var salaries = data.Where(s => s.LastName.StartsWith("A"));

foreach (var employee in salaries)
{
    salaryTotal += employee.Salary;
}
This would still issue only a single query, looking something like
SELECT ... FROM EmployeeTable WHERE LastName LIKE 'A%'
with the rows streamed to you as you iterate, so you are only getting the information when you need it, and only the information that you need.
If for some crazy reason you wanted to actually query all the million records from the database, ignoring the fact that this would eat up a massive amount of system resources, I would suggest doing it in chunks; you would probably need to play around with the chunk size to get the best performance.
The general idea is to run smaller queries to avoid timeout issues from the database.
int chunkSize = 100; // for example purposes
var employees = new HashSet<employee_table>();
// Assuming it's exactly 1 million records
int recordsToGet = 1000000;
for (int record = 0; record < recordsToGet; record += chunkSize)
{
    // Skip/Take needs a stable ordering to page reliably
    var chunk = dataContext.employee_table
        .OrderBy(e => e.EMPLOYEE_ID)
        .Skip(record)
        .Take(chunkSize);
    foreach (var e in chunk)
        employees.Add(e);
}
I chose to use a HashSet<T> since they are designed for large sets of data, but I don't know what performance would look like with 1,000,000 objects.
