I have built a document management system for my company. A desktop application connects to an ASP.NET Web API, hosted as an Azure Web App, which in turn connects to an Azure SQL database.
As my database has become more populated, it has started to slow down significantly, so I need help with some query optimization. Here is the LINQ query that is currently causing me problems. Basically, this query retrieves all the projects from the database and then populates a list in my desktop application:
var projects = (from p in db.Projects.Include(c => c.ProjectType)
select new
{
ID = p.ID,
HasSubProjects = (db.Projects.Where(u => u.ParentProjectID == p.ID).Count() > 0) ? 1 : 0,
ParentProjectID = p.ParentProjectID,
Name = p.Name,
Description = p.Description,
DateLastEdited = p.DateLastEdited,
DateCreated = p.DateCreated,
ProjectTypeID = p.ProjectTypeID,
LastEditedByGoesby = p.LastEditedByGoesby,
ProjectComponentSecurityType = p.ProjectComponentSecurityType,
ClonedFrom = p.ClonedFrom,
DateAnyVersionLastEditedByUser = p.DateAnyVersionLastEditedByUser,
DateAnyVersionLastEdited = p.UserProjectActivityLookups
.OrderByDescending(u => u.LastActivityDate)
.Select(v => v.LastActivityDate)
.FirstOrDefault(),
// p.DateAnyVersionLastEdited,
ProjectType = p.ProjectType
}).ToList()
.Select(x2 => new Project
{
ID = x2.ID,
HasSubProjects = x2.HasSubProjects,
ParentProjectID = x2.ParentProjectID,
Name = x2.Name,
Description = x2.Description,
DateLastEdited = x2.DateLastEdited,
DateCreated = x2.DateCreated,
ProjectTypeID = x2.ProjectTypeID,
LastEditedByGoesby = x2.LastEditedByGoesby,
ProjectComponentSecurityType = x2.ProjectComponentSecurityType,
ClonedFrom = x2.ClonedFrom,
DateAnyVersionLastEditedByUser = x2.DateAnyVersionLastEditedByUser,
DateAnyVersionLastEdited = x2.DateAnyVersionLastEdited,
ProjectType = x2.ProjectType
});
Any ideas on how to optimize this, avoid problems related to this query, create better indexes, use stored procedures, etc. would be helpful. I'm looking for direction on this specific query, but also for any guidance on how to improve the performance of my other queries and how to go about doing that.
Thanks.
Azure SQL Database has several built-in tools that help with query optimization.
Query Store acts as a "flight data recorder" for queries, capturing query plans and execution times and surfacing the most expensive queries. You can look at the execution statistics for your query above to see if there are any obvious problems.
Index Advisor analyzes historical resource usage, provides index recommendations, and monitors index usage.
Various dynamic management views (DMVs), such as sys.dm_exec_query_stats, expose query execution statistics.
The MSDN article on Azure SQL Database performance guidance provides high-level performance guidance.
Azure SQL Database shares a common codebase with SQL Server, so most of the query optimization techniques used for SQL Server also apply here. A Bing/Google search on SQL Server query optimization will turn up lots of pointers.
Srini Acharya
I think the best way to go about this is to find out what SQL query this code is generating, then see what the issue is with that SQL and change your LINQ query accordingly.
Some things to look for:
ParentProjectID: check the indexes on it, and try a join instead of the per-row subquery.
Similarly, check DateAnyVersionLastEdited, since that subquery goes against all the user activity data. This can slow you down if that table is large and poorly indexed.
The last thing I would recommend is a where clause appropriate to the data you are actually retrieving.
So, if you are ultimately retrieving records in the thousands, try to restrict the count either by paging or by some other business condition, as in the sketch below.
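Putting those suggestions together, one possible shape for the query is a single projection with an EXISTS-style check, an aggregate instead of the sort-and-take, and paging on top. This is only a sketch, not a drop-in replacement: page and pageSize are assumed parameters, and the final mapping back to Project entities is omitted.
var projects = db.Projects
    .OrderBy(p => p.ID)                // paging requires a stable order
    .Skip(page * pageSize)
    .Take(pageSize)
    .Select(p => new
    {
        Project = p,
        ProjectType = p.ProjectType,   // projected directly, so no Include is needed
        // Any() translates to EXISTS, which is cheaper than COUNT(*) > 0
        HasSubProjects = db.Projects.Any(u => u.ParentProjectID == p.ID),
        // MAX avoids sorting the activity rows for every project
        DateAnyVersionLastEdited = p.UserProjectActivityLookups
            .Max(u => (DateTime?)u.LastActivityDate)
    })
    .ToList();
This also drops the double projection in the original (anonymous type, ToList, then a second Select into Project), which forces the whole intermediate result through memory.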
Related
This may be a silly question, but I have a fluent LINQ query that is pulling data off a DB, but, of course, it processes the data in memory, not on the DB.
Is there a way to take a query (say, table.Where(t => t.id > 22)) and have it run as a DB query (i.e. SELECT * FROM TABLE WHERE ID > 22)?
I want to use the fluent style because I am building up the query from various places. I know I can turn this specific query into query syntax, but my actual queries are more complex.
I have tried standard searches, but I suspect it is not going to be possible, as nothing comes up.
Yes, I am using EF Core.
The code is not necessarily clear:
var getTableData = table.Where(base.whereSelection);
whereSelection.ForEach(w => getTableData = getTableData.Where(w));
At the moment, all of the where selections are Funcs - they could be converted to something else (like Expressions) if I knew that would make them run on the DB. The rest of the code also builds up this set of details programmatically:
var selectedData = getTableData.Select(Convert);
var sortedTableData = GetSort(selectedData, Ordering);
var sorted = PickDataFrom(sortedTableData);
return CheckAndTake(sorted, pageSize);
Everything seems to work. Just in the wrong place.
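In case it helps, the usual way to keep composed filters on the database is to store them as Expression<Func<T, bool>> and compose them on an IQueryable<T>. A minimal sketch (the entity and context names here are illustrative, not taken from the code above):
// requires System.Linq and System.Linq.Expressions
var filters = new List<Expression<Func<MyEntity, bool>>>
{
    t => t.Id > 22,
    t => t.Name != null
};

IQueryable<MyEntity> query = dbContext.MyEntities;   // stays IQueryable

foreach (var filter in filters)
    query = query.Where(filter);   // each filter composes into the SQL WHERE clause

var results = query.ToList();      // one translated SELECT with both conditions
The catch is overload resolution: the moment a Func<T, bool> is passed to Where, the compiler picks the Enumerable overload, the sequence degrades to IEnumerable<T>, and everything from that point on runs in memory. Keeping the predicates as Expressions is what lets the provider translate them.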
I have the following LINQ query that runs against a table with 1 million records:
var result = await lmsDb.SendGridEvents
.Where(s => s.InternalId == internalId && s.Email.ToLower() == email.ToLower())
.Select(s => new MailEventDTO
{
InternalId = s.InternalId,
Email = s.Email,
error = s.error,
Event = s.Event,
Reason = s.Reason,
Response = s.Response,
Url = s.Url,
TimeStamp = s.TimeStamp
})
.OrderByDescending(a => a.TimeStamp) // get the latest
.FirstOrDefaultAsync();
return result;
How can I improve the performance of this query? It has started to become really, really slow.
Check the collation setting on your database. If it is _CI_, string comparisons are already case insensitive, so you do not need to perform explicit case conversions; dropping them allows SQL Server to utilize indexes on the Email column. If it's _CS_, comparisons are case sensitive, which will crimp your performance; in that case, pre-case the variable and apply ToLower only to the entity value in the expression.
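For example, with a _CI_ collation the query can drop the case conversions entirely; a sketch of the reworked query under that assumption:
var result = await lmsDb.SendGridEvents
    // no ToLower on either side: the _CI_ collation already compares case-insensitively,
    // and leaving s.Email untouched keeps an index on Email usable
    .Where(s => s.InternalId == internalId && s.Email == email)
    .OrderByDescending(s => s.TimeStamp)   // get the latest
    .Select(s => new MailEventDTO
    {
        InternalId = s.InternalId,
        Email = s.Email,
        error = s.error,
        Event = s.Event,
        Reason = s.Reason,
        Response = s.Response,
        Url = s.Url,
        TimeStamp = s.TimeStamp
    })
    .FirstOrDefaultAsync();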
The next step would be to look at the database advisor for Azure SQL in the environments where you are experiencing the performance issues, typically production. This will give you an overall view of the performance of the database, including suggestions for index changes based on the types of queries running.
For SQL Server, I like to capture the actual queries being run using a profiler, then execute them individually to inspect the execution plan being used and look for any index suggestions. At a guess, for this query you would likely want an index on the combination of InternalId ASC, Email ASC, and TimeStamp DESC. If you are using a _CS_ collation, Email might be better off not in the index, but ultimately base index additions/changes/deletions on the suggestions from the advisor. When it comes to index creation, follow the tooling's suggestions: creating the wrong indexes just leads to storage bloat and performance costs with no benefit.
This should give you a few initial places to start looking into.
I am using EntityFramework 6 with the official MySQL provider.
I have a database containing a list of VenuePlans which each consist of Areas.
In order to show these values I am using this very simple LINQ query:
model.VenuePlans = CurrentOrganization.VenuePlans.Select(p => new ViewModels.VenuePlans.IndexViewModel.VenuePlan
{
ID = p.MaskID,
Name = p.DisplayName,
AreaCount = p.VenuePlans_Areas.Count()
}).ToArray();
But when looking at the executed queries using MiniProfiler I see that this results in duplicate queries as follows:
Retrieving the VenuePlans:
SELECT
`Extent1`.`PlanID`,
`Extent1`.`MaskID`,
`Extent1`.`DisplayName`,
`Extent1`.`OrganizationID`,
`Extent1`.`SVG`
FROM `VenuePlans` AS `Extent1`
WHERE `Extent1`.`OrganizationID` = #EntityKeyValue1
Retrieving the Areas for the first VenuePlan:
SELECT
`Extent1`.`AreaID`,
`Extent1`.`PlanID`,
`Extent1`.`DisplayName`,
`Extent1`.`MaskID`,
`Extent1`.`FillColor`,
`Extent1`.`InternalName`
FROM `VenuePlans_Areas` AS `Extent1`
WHERE `Extent1`.`PlanID` = #EntityKeyValue1
Now this latter query is repeated for every VenuePlan present in the database.
CurrentOrganization is an instance of another model retrieved earlier.
Now when writing the query directly on the DbContext instance I don't have this issue:
model.VenuePlans = DbContext.VenuePlans
.Where(p => p.OrganizationID == CurrentOrganization.OrganizationID)
.Select(p => new ViewModels.VenuePlans.IndexViewModel.VenuePlan
{
ID = p.MaskID,
Name = p.DisplayName,
AreaCount = p.VenuePlans_Areas.Count()
}).ToArray();
What is the reason for this?
DbContext is a property declared in my BaseController that returns the current DbContext instance stored in HttpRequest.Items.
What can I do to prevent this behavior?
I've never found the MySQL LINQ provider to be very good. I used it recently and had to call ToList earlier than I would have liked to stop the query generation from spouting gibberish.
That said, the behavior here isn't only the provider's fault: CurrentOrganization.VenuePlans is a navigation property on an already-loaded entity, so it is lazy-loaded as an in-memory collection and the Select runs as LINQ to Objects, firing one Areas query per plan. DbContext.VenuePlans, by contrast, is an IQueryable that the provider can translate into a single SQL statement. So you'd be best off using the version of the query that's fluent from your context instead of from your object.
Having said that, I'd be interested in seeing if anybody does have a better solution, because I tend to avoid LINQ when using MySQL.
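One more option, if you want to keep starting from the loaded entity rather than the context: EF6 exposes the underlying query for a collection navigation through Entry(...).Collection(...).Query(). A sketch, assuming CurrentOrganization is attached to the current DbContext (whether the MySQL provider translates it cleanly is another question):
model.VenuePlans = DbContext.Entry(CurrentOrganization)
    .Collection(o => o.VenuePlans)
    .Query()   // IQueryable over the navigation: one SQL query instead of lazy loading
    .Select(p => new ViewModels.VenuePlans.IndexViewModel.VenuePlan
    {
        ID = p.MaskID,
        Name = p.DisplayName,
        AreaCount = p.VenuePlans_Areas.Count()
    }).ToArray();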
I have a list of MyFriendFollowStatus objects; for each friend I need information from 2 different databases, so I wrote something like this:
db1Context FFdb = new db1Context();
db2Context EEdb = new db2Context();
foreach (fbFriendsFollowStatus a in fbids)
{
long ffID = FFdb.FFUsers.Where(x => x.FacebookID == a.fbid).FirstOrDefault().FFUserID;
a.ffID = ffID;
int status = EEdb.StatusTable.Where(x => x.ffID == ffID).FirstOrDefault().Status;
a.Status = status;
}
This works, but it doesn't really seem right to hit 2 databases once each for every user. Is there something built into LINQ to SQL that helps with this, or some type of join I can use across 2 different databases?
Well, you can always reduce your N+1 query problem to 3 queries: one to get the users, one to get the users' data from the first database, and one for the second database. Then connect all the results in memory. This limits the round trips to the databases, which should improve the performance of your application.
I don't know whether LINQ to SQL or Entity Framework offers building a model over different databases - that would probably pose performance problems of its own (with Includes and the like), but I may simply not be aware of such a feature.
Sample code to do what you're trying to achieve would look something like this:
var facebookIds = fbids.Select(a => a.fbid).ToList();
var ffUsers = FFdb.FFUsers.Where(x => facebookIds.Contains(x.FacebookID)).Select(x => new { x.FacebookID, x.FFUserID }).ToList();
var ffUserIds = ffUsers.Select(u => u.FFUserID).ToList();
var statuses = EEdb.StatusTable.Where(x => ffUserIds.Contains(x.ffID)).Select(x => new { x.ffID, x.Status }).ToList();
Then you need some simple code to match the results in memory, but that part is easy.
Please note that this code is a sample - I may have mismatched some IDs or similar - but the idea should be clear.
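For completeness, the in-memory matching step could look something like this (a sketch that assumes the result sets above were materialized with ToList()):
// build lookups once (assumes the IDs are unique), then match each friend in O(1)
var userIdByFacebookId = ffUsers.ToDictionary(u => u.FacebookID, u => u.FFUserID);
var statusByUserId = statuses.ToDictionary(s => s.ffID, s => s.Status);

foreach (fbFriendsFollowStatus a in fbids)
{
    long ffUserId;
    if (userIdByFacebookId.TryGetValue(a.fbid, out ffUserId))
    {
        a.ffID = ffUserId;
        int status;
        if (statusByUserId.TryGetValue(ffUserId, out status))
            a.Status = status;
    }
}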
I have a database with a Customer table. Each customer has a foreign key to an Installation table, which in turn has a foreign key to an Address table (tables renamed for simplicity).
In NHibernate I'm trying to query the Customer table like this:
ISession session = tx.Session;
var customers = session.QueryOver<Customer>().Where(x => x.Country == country);
var installations = customers.JoinQueryOver(x => x.Installation, JoinType.LeftOuterJoin);
var addresses = installations.JoinQueryOver(x => x.Address, JoinType.LeftOuterJoin);
if (installationType != null)
{
installations.Where(x => x.Type == installationType);
}
return customers.TransformUsing(new DistinctRootEntityResultTransformer()).List<Customer>();
This results in a SQL query similar to the following (captured by NHibernate Profiler):
SELECT *
FROM Customer this_
left outer join Installation installati1_
on this_.InstallationId = installati1_.Id
left outer join Address address2_
on installati1_.AddressId = address2_.Id
WHERE this_.CountryId = 4
and installati1_.TypeId = 1
When I execute the above SQL query in Microsoft SQL Server Management Studio it completes in about 5 seconds, but returns ~200,000 records. Nevertheless, it takes a very long time to retrieve the List when running the code; I've been waiting for 10 minutes without any results. The debug log indicated that a lot of objects are constructed and initialized because of the object hierarchy. Is there a way to fix this performance issue?
I'm not sure what you are trying to do, but loading and saving 200,000 records through any OR mapper is not feasible. 200,000 objects take a lot of memory and time to create. Depending on what you want to do, loading them in pages or running an update query directly on the database (stored procedure or named query) can fix your performance. Paging can be done with:
criteria.SetFirstResult(START).SetMaxResults(PAGESIZE);
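Since the question uses QueryOver rather than ICriteria, the equivalent there is Skip/Take; a sketch, with page and pageSize as assumed parameters:
return customers
    .TransformUsing(new DistinctRootEntityResultTransformer())
    .Skip(page * pageSize)   // zero-based page index
    .Take(pageSize)
    .List<Customer>();
One caveat: with the left outer joins in place, paging is applied to the joined rows, so combining it with the distinct-root transformer can return fewer root entities per page than expected.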
NHibernate Profiler shows two times in the duration column, x/y, with x being the time to execute the query and y the time to initialize the objects. The first step is to determine where the problem lies. If the query itself is slow, get the actual query sent to the database using SQL Profiler (assuming SQL Server) and check its performance in SSMS.
However, I suspect your issue may be the logging level. If you have the logging level set to DEBUG, NHibernate will generate very verbose logs, and this will significantly impact performance.
Even if you can get it to perform well with 200,000 records, that's more than you can display to the user in any meaningful way. You should use paging/filtering to reduce the size of the result set.