LINQ Query is running slow but only on two of my statements - c#

I have 2 LINQ statements below that are part of a larger query. I have about 6 other statements that do very similar things to these 2. My query without these 2 statements executes in about 237ms. When I add these 2, it adds about 10 seconds.
The demandXPCILStatuses table has about 30k records and the demand has about 13k.
The PCILStatuses table has 6 records in it.
After timing other tables that have about the same number of records, I have pretty much ruled out it being too much data. I never really thought it was, but I thought I would run some tests.
DemandXPCILStatus = (from demandXPCILStatus in demandXPCILStatuses
where demand.ID == demandXPCILStatus.DemandID
&& demandXPCILStatus.Active == true
select demandXPCILStatus).FirstOrDefault(),
PCILStatus = (from demandXPCILStatus in demandXPCILStatuses
join PCILStatus in PCILStatuses
on new { A = demandXPCILStatus.PCILStatusID,
B = demandXPCILStatus.DemandID,
C = demandXPCILStatus.Active }
equals new { A = PCILStatus.ID, B = demand.ID, C = true }
select PCILStatus).FirstOrDefault(),
Here is how my tables are designed
I tried to post an image of my database design but I don't have enough points to do that.
So here is how it is designed
DemandXPCILStatus
ID (PK, int, not null)
DemandID (int, not null)
PCILStatusID (int not null)
PCILTime (datetime, null)
LastUpdatedOn (datetime, null)
Active (bit, null)
PCILStatus
ID (PK, int, not null)
Status (nvarchar(50), null)
Code (nvarchar(10), null)
Class (nvarchar(30), null)
At this point I don't know what else to try. Any suggestions? FYI this is my first LINQ query so I have almost no idea what I am doing.

I am using Dapper to retrieve data and put it into memory before running the query. The DemandXPCILStatus table was returning just over 30k records. I know I didn't post the rest of my query, but it makes pretty heavy use of LINQ, and I guess 30k records was just too many for it to perform well. I filtered the data from that table before putting it into memory, and that portion of the query went from about 4.5 seconds to about 2ms.
I guess I was a little unclear on the amount of data LINQ could handle and map to complex objects. Now that I know, I fixed up my query, and it went from running and displaying in about 15 seconds to 1.2 seconds.
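Part of the cost in the original statements is that each subquery scans the roughly 30k status rows once for every one of the roughly 13k demands. As an alternative to the pre-filtering described above (this is a different technique, not what the asker did), a minimal sketch is to index the in-memory rows by DemandID once with ToLookup. The record types and the StatusIndex helper are hypothetical stand-ins for the asker's classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the rows Dapper returned in the question.
public record DemandXPCILStatus(int ID, int DemandID, int PCILStatusID, bool? Active);
public record PCILStatus(int ID, string Status);

public static class StatusIndex
{
    // Build the index once: a single pass over the status rows
    // instead of one full scan per demand.
    public static ILookup<int, DemandXPCILStatus> ActiveByDemand(
        IEnumerable<DemandXPCILStatus> rows) =>
        rows.Where(r => r.Active == true).ToLookup(r => r.DemandID);

    // Per-demand access is then a hash probe rather than a scan.
    // Returns null when the demand has no active row or the status ID is unknown.
    public static PCILStatus FindStatus(
        int demandId,
        ILookup<int, DemandXPCILStatus> activeByDemand,
        IReadOnlyDictionary<int, PCILStatus> statusesById)
    {
        var link = activeByDemand[demandId].FirstOrDefault();
        if (link == null) return null;
        return statusesById.TryGetValue(link.PCILStatusID, out var status) ? status : null;
    }
}
```

Inside the larger query, the two FirstOrDefault subqueries would then become activeByDemand[demand.ID].FirstOrDefault() and a FindStatus call, assuming the PCILStatuses rows were first put into a dictionary keyed by ID.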

Related

How to retrieve data from very large datasets with optional parameters?

I have an app that retrieves data requested by the user. All parameters except Type are optional. If a parameter is not specified, all items are retrieved. If it is specified, only items corresponding that parameter are retrieved. For example, here I retrieve products by year of release (-1 is the default value, if the user hasn't specified one):
var products = context.Products.Where(p => p.type == Type).ToList();
if (!(Year == -1))
products = products.Where(p => p.year == Year).ToList();
This works perfectly fine for some of the years. E.g., if I search 2001, I get all entries needed. But since products has a limited size and only retrieves 1500 entries, later years are simply not retrieved, not in the products list, and it comes up as no data for that year, even though there is data in the DB.
How can I get around this problem?
One of the nice things about deferred execution on LINQ is it can help make code that has variable filtering rules a lot more neat and readable. If you're not sure what deferred execution is, in a nutshell it's a mechanism that only runs the LINQ query when you ask for the results rather than when you make the statements that comprise the query.
In essence this means we can have code like:
//always adults
var p = person.Where(x => x.Age > 18);
//we maybe filter on these
if(email != null)
p = p.Where(x => x.Email == email);
if(socialSN != null)
p = p.Where(x => x.SSN == socialSN);
var r = p.ToList(); //the query is only actually run now
The multiple calls to Where here are cumulative; they conceptually build a where clause but do not execute the query until ToList is called. At that point, if a database is in use, the db sees the query with all its Where clauses and can leverage indexes and statistics.
If we were to use ToList after every Where, then the first Where would hit the db and its whole dataset would download to the client app, and the runtime would set about converting an enumerable to a list (a lot of copying and memory allocating). The subsequent Where would filter the list in the client app, enumerating it and then converting it to a list again. The big problem is that it's done in the memory of the client app as a naive unindexed loop, and all those millions of dollars of R&D Microsoft poured into making the SQL Server query optimizer pull huge amounts of data very quickly are wasted :)
Consider also that the first clause in my example set, Age > 18, could match a huge number of rows: with a million people of a spread of ages over age 12, for example, a large amount of data is true for that predicate. Email or SSN would match a far smaller dataset, and is probably indexed. It's a contrived example, sure, but hopefully it illustrates the point about performance: by calling ToList() too early we end up downloading too much data.
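Applied to the products code from the question, the same pattern might look like the sketch below. The Product record and the ProductSearch helper are hypothetical; the real entity and context are the asker's. The method takes an IQueryable<Product> so it can be given context.Products directly, or an in-memory list via AsQueryable() for a quick check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's product entity.
public record Product(string Type, int Year);

public static class ProductSearch
{
    // year == -1 means "no year filter", as in the question.
    public static List<Product> Find(IQueryable<Product> products, string type, int year = -1)
    {
        // Nothing is executed here; each Where only extends the query.
        IQueryable<Product> query = products.Where(p => p.Type == type);
        if (year != -1)
            query = query.Where(p => p.Year == year);

        // The query runs once, with every filter already applied.
        return query.ToList();
    }
}
```

Because the year filter is part of the query before it runs, it is no longer applied to whatever subset of rows an earlier ToList() happened to return.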

Optimizing LINQ Query using Entity Framework

I have a following LINQ query to get product information using Entity Framework
productDetails.Items = (from productDetail in db.ToList()
select new prod
{
ID = productDetail.ID,
ProdName = productDetail.ProductName,
...
...
...
...
Calaculation1 = GetCalaculation(productDetail.Calc1),
Calaculation2 = GetCalaculation(productDetail.Calc2),
...
...
...
...
Calaculation15 = GetCalaculation(productDetail.Calc3)
}
).ToList();
where the GetCalaculation method also queries DB using LINQ. The query is slow if I am fetching 100's of records. How I can optimize it?
First of all, the structure of your select looks a "little" problematic to me, since you are fetching 15 Calaculation properties for each record. Even if you create a view in the database, it will have 15 joins to the Calculations table, which is very bad for performance. So the first thing you should do is review your object structure and confirm that you REALLY need all those calculations to be fetched in one request.
If you insist that your structure can not be changed here are some steps that could significantly improve the performance:
If the tables don't change too often, you may consider creating a materialized view (an indexed view, i.e. a view with a clustered index, in SQL Server) that contains the already calculated data. In this case the query will be very fast, but inserts and updates to the underlying tables will be much slower.
Do not use db.ToList() in your query - by doing it you fetch all your table in to the memory and then you issue separate queries for each one of the calculations.
I am a little confused about this query:
var dbQuery = from calculation in db.Calculations
where calculation.calc == calc1 select calculation;
var totalsum = (from xyz in dbQuery select (Decimal?)xyz.calc).Sum() ?? 0;
You are fetching all the records that have calc == calc1 and then calculating their sum? Wouldn't it be much easier to count how many records have calc == calc1 and then multiply that count by calc1:
db.Calculations.Count(c=>c.calc == calc1) * calc1;
It could be cheaper to fetch the whole Calculations table into memory together with the Product table ( var calcTable = db.Calculations.ToList() ), provided it has a limited number of records; GetCalaculation will then work with in-memory objects, which will be faster. If you are going to do this, you may consider doing it in parallel or in separate Tasks.
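A minimal sketch of that in-memory approach, assuming the Calculations rows have already been loaded once: group them in a single pass and let each lookup read from a dictionary instead of issuing a query. The Calculation record and the CalcTotals helper are hypothetical; only the calc column and the "sum of matching rows, else 0" behaviour come from the query quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the Calculations table.
public record Calculation(decimal Calc);

public static class CalcTotals
{
    // One pass over the loaded rows: the total of Calc for each distinct Calc value.
    public static Dictionary<decimal, decimal> SumByCalc(IEnumerable<Calculation> rows) =>
        rows.GroupBy(r => r.Calc)
            .ToDictionary(g => g.Key, g => g.Sum(r => r.Calc));

    // In-memory replacement for a per-value database query; 0 when nothing matches,
    // mirroring the "?? 0" in the original query.
    public static decimal Get(IReadOnlyDictionary<decimal, decimal> totals, decimal calc) =>
        totals.TryGetValue(calc, out var total) ? total : 0m;
}
```

With the totals built once per request, each of the 15 calculation properties becomes a dictionary read, so the number of database round trips no longer grows with the number of products.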

Entity Framework COUNT is doing a SELECT of all records

Profiling my code because it is taking a long time to execute, it is generating a SELECT instead of a COUNT and as there are 20,000 records it is very very slow.
This is the code:
var catViewModel= new CatViewModel();
var catContext = new CatEntities();
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
catViewModel.NumberOfCats = catAccount.Cats.Count();
It is straightforward stuff, but the code that the profiler is showing is:
exec sp_executesql N'SELECT
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy // You get the idea
FROM [dbo].[Cats] AS [Extent1]
WHERE Cats.[AccountId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=7
I've never seen this behaviour before, any ideas?
Edit: It is fixed if I simply do this instead:
catViewModel.NumberOfRecords = catContext.Cats.Where(c => c.AccountId == accountId).Count();
I'd still like to know why the former didn't work though.
So you have 2 completely separate queries going on here and I think I can explain why you get different results. Let's look at the first one
// pull a single account record
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
// count all the associated Cat records against said account
catViewModel.NumberOfCats = catAccount.Cats.Count();
Going on the assumption that Cats has a 0..* relationship with Account, and assuming you are leveraging the framework's ability to lazily load related tables, your first access to catAccount.Cats is going to result in a SELECT for all the Cat records associated with that particular account. This brings those rows into memory, so the call to Count() is just an internal check of the Count property of the in-memory collection (hence no COUNT SQL is generated).
The second query
catViewModel.NumberOfRecords =
catContext.Cats.Where(c => c.AccountId == accountId).Count();
Is directly against the Cats table (which would be IQueryable<T>) therefore the only operations performed against the table are Where/Count, and both of these will be evaluated on the DB-side before execution so it's obviously a lot more efficient than the first.
However, if you need both Account and Cats, then I would recommend you eager load the data on the fetch; that way you take the hit upfront, once:
var catAccount = catContext.Account.Include(a => a.Cats).Single(...);
Most times, when somebody accesses a sub-collection of an entity, it is because there are a limited number of records, and it is acceptable to populate the collection. Thus, when you access:
catAccount.Cats
(regardless of what you do next), it is filling that collection. Your .Count() is then operating on the local in-memory collection. The problem is that you don't want that. Now you have a few options:
check whether your provider offers some mechanism to make that a query rather than a collection
build the query dynamically
access the core data-model instead
I'm pretty confident that if you did:
catViewModel.NumberOfRecords =
catContext.Cats.Count(c => c.AccountId == accountId);
it will work just fine. Less convenient? Sure. But "works" is better than "convenient".

NHibernate: Object hierarchy and performance

I have a database with a Customer table. Each of these customers has a foreign key to an Installation table, which in turn has a foreign key to an Address table (tables renamed for simplicity).
In NHibernate I'm trying to query the Customer table like this:
ISession session = tx.Session;
var customers = session.QueryOver<Customer>().Where(x => x.Country == country);
var installations = customers.JoinQueryOver(x => x.Installation, JoinType.LeftOuterJoin);
var addresses = installations.JoinQueryOver(x => x.Address, JoinType.LeftOuterJoin);
if (installationType != null)
{
installations.Where(x => x.Type == installationType);
}
return customers.TransformUsing(new DistinctRootEntityResultTransformer()).List<Customer>();
Which results in a SQL query similar to this (captured by NHibernate Profiler):
SELECT *
FROM Customer this_
left outer join Installation installati1_
on this_.InstallationId = installati1_.Id
left outer join Address address2_
on installati1_.AddressId = address2_.Id
WHERE this_.CountryId = 4
and installati1_.TypeId = 1
When I execute the above SQL query in Microsoft SQL Server Management Studio it executes in about 5 seconds but returns ~200.000 records. Nevertheless it takes a lot of time to retrieve the List when running the code. I've been waiting for 10 minutes without any results. The debug-log indicated that a lot of objects are constructed and initiated because of the object hierarchy. Is there a way to fix this performance issue?
I'm not sure what you are trying to do, but loading and saving 200000 records through any OR mapper is not feasible. 200000 objects will take a lot of memory and time to create. Depending on what you want to do, loading them in pages or running an update query directly on the database (sp or named query) can fix your performance. Batching can be done by:
criteria.SetFirstResult(START).SetMaxResults(PAGESIZE);
NHibernate Profiler shows two times in the duration column x/y, with x being the time to execute the query and y the time to initialize the objects. The first step is to determine where the problem lies. If the query is slow, get the actual query sent to the database using SQL Profiler (assuming SQL Server) and check its performance in SSMS.
However, I suspect your issue may be the logging level. If you have the logging level set to DEBUG, NHibernate will generate very verbose logs and this will significantly impact performance.
Even if you can get it to perform well with 200000 records that's more than you can display to the user in a meaningful way. You should use paging/filtering to reduce the size of the result set.
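The paging idea can be sketched in plain LINQ with Skip and Take, which play the same role as the first-result and max-results settings mentioned above. This is an illustration over any IQueryable<T>, not NHibernate-specific code; the Paging helper and its parameter names are hypothetical, and the sequence is assumed to already have a stable ordering, since pages are only meaningful over ordered results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Returns one page of an already ordered sequence; pageIndex is zero-based.
    public static List<T> Page<T>(IQueryable<T> ordered, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        // Skip the preceding pages, then materialize only pageSize items.
        return ordered.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }
}
```

When the source is backed by a database provider, the Skip and Take are typically translated into the query itself, so only one page of rows is turned into objects at a time.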

Is SQL View faster than Table while using Linq?

My WPF application has a lookup screen for selecting customers. The customer table contains nearly 10,000 records. It's very slow when loading and filtering records using my Linq query (I am not doing any ordering of records). Is there a way to increase speed? I have heard about using indexed views. Can someone please give some ideas?
lstCustomerData = dbContext.customers.Where(c => c.Status == "Activated").ToList();
dgCustomers.ItemsSource = lstCustomerData;
filtering:
string searchKey = TxtCustName.Text.Trim();
var list = (from c in lstCustomerData
where (c.LastName == null ? "" : c.LastName.ToUpper()).Contains(searchKey.ToUpper())
select c).ToList();
if (list != null)
dgCustomers.ItemsSource = list;
Depends on what is slow. Is the SQL query slow? Is the UI rendering slow? Are you sorting/filtering in memory or going back to the DB?
You should profile your app to find out exactly what the slowest piece is, then tackle that first.
If the Linq query you added is what is slow then adding an index to the Status column in your database may help.
You might get some improvement by changing your Where clause:
var list = (from c in lstCustomerData
where c.LastName != null && c.LastName.ToUpper().Contains(searchKey.ToUpper())
select c).ToList();
if (list != null)
dgCustomers.ItemsSource = list;
since it doesn't have to compare an empty string. However, if you have very few NULL records then this probably won't help much.
In this case, however, all of the filtering is done in memory so using an indexed view in the DB won't help unless you push the filtering back to the source repository.
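For the in-memory filtering itself, one further option is to skip the two ToUpper() calls per row and let the comparison ignore case. The sketch below assumes an ordinal, culture-insensitive match is acceptable, which is not identical to ToUpper() under every culture. The Customer record and the CustomerFilter helper are hypothetical stand-ins, and unlike the original query, rows with a null LastName are excluded even when the search key is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the customer entity in the question.
public record Customer(string LastName);

public static class CustomerFilter
{
    public static List<Customer> ByLastName(IEnumerable<Customer> customers, string searchKey)
    {
        // Normalize the key once instead of once per row.
        var key = (searchKey ?? string.Empty).Trim();

        // Case-insensitive substring match without allocating upper-cased copies.
        return customers
            .Where(c => c.LastName != null &&
                        c.LastName.IndexOf(key, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

Whether this is measurable for roughly 10,000 rows depends on where the time is actually going, which is why profiling first, as suggested above, still applies.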
