LINQ query to Azure SQL database timing out - c#

I'm querying my SQL database, which is in Azure (my web app is on Azure as well).
Every time I perform this particular query the behaviour changes (e.g. sometimes a timeout occurs, sometimes it works perfectly, sometimes it takes extremely long to load).
I've noted that I am using the ToList method here to enumerate the query, and I suspect that's where the degradation comes in.
Is there any way I can fix this or make it better... or should I just use native SQL to execute the query?
I should also note that in my web.config the database connection timeout is set to 30 seconds. Would this have any performance benefit?
I'm putting the suspect code here:
case null:
lstQueryEvents = db.vwTimelines.Where(s => s.UserID == UserId)
.Where(s => s.blnHide == false)
.Where(s => s.strEmailAddress.Contains(strSearch) || s.strDisplayName.Contains(strSearch) || s.strSubject.Contains(strSearch))
.OrderByDescending(s => s.LatestEventTime)
.Take(intNumRecords)
.ToList();
break;
It's basically querying for 50 records... I don't understand why it's timing out sometimes.

Here are some tips:
Make sure that your SQL data types match the types in your model
Judging by your code, the types should be something like this:
UserID should be int (cannot tell for sure from the code);
blnHide should be bit;
strEmailAddress should be nvarchar;
strDisplayName should be nvarchar;
strSubject should be nvarchar;
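For reference, here is a sketch of how the entity behind vwTimelines would then look. The property types are assumptions inferred from the naming conventions, not confirmed by the question:

using System;

// Assumed shape of the vwTimelines entity; adjust to your actual schema.
public class vwTimeline
{
    public int UserID { get; set; }             // int in SQL
    public bool blnHide { get; set; }           // bit in SQL
    public string strEmailAddress { get; set; } // nvarchar in SQL
    public string strDisplayName { get; set; }  // nvarchar in SQL
    public string strSubject { get; set; }      // nvarchar in SQL
    public DateTime LatestEventTime { get; set; }
}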
Make use of indexes
You should create Non-Clustered Indexes on columns that you use to filter and order data.
In order of importance:
LatestEventTime as you order ALL data by this column;
UserID as you filter out most of the data by this column;
blnHide as you filter out part of the data by this column (see the sketch below).
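If you happen to use EF6 code-first migrations, the indexes can be created from C# as below; plain CREATE NONCLUSTERED INDEX statements achieve the same. Note that the migration and table name here are hypothetical, since the query actually targets a view (vwTimelines) over an unknown base table:

using System.Data.Entity.Migrations;

public partial class AddTimelineIndexes : DbMigration
{
    public override void Up()
    {
        // "dbo.tblTimeline" is a placeholder for the base table behind the view.
        CreateIndex("dbo.tblTimeline", "LatestEventTime"); // ordering column
        CreateIndex("dbo.tblTimeline", "UserID");          // main filter
        CreateIndex("dbo.tblTimeline", "blnHide");         // secondary filter
    }

    public override void Down()
    {
        DropIndex("dbo.tblTimeline", new[] { "blnHide" });
        DropIndex("dbo.tblTimeline", new[] { "UserID" });
        DropIndex("dbo.tblTimeline", new[] { "LatestEventTime" });
    }
}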
Make use of indexes for text lookup
You could make use of indexes for text lookup if you change your filter behaviour slightly and search for the text only at the start of the column value.
To achieve that:
replace .Contains() with .StartsWith(), as that allows an index to be used (see the sketch after this list);
create Non-Clustered Indexes on the strEmailAddress, strDisplayName, and strSubject columns.
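Put together, a sketch of the reworked search (note this is not a drop-in replacement: StartsWith matches prefixes only, so it finds fewer rows than Contains):

// StartsWith translates to LIKE 'term%', which can seek on an index;
// Contains translates to LIKE '%term%', which forces a scan.
lstQueryEvents = db.vwTimelines
    .Where(s => s.UserID == UserId && s.blnHide == false)
    .Where(s => s.strEmailAddress.StartsWith(strSearch)
             || s.strDisplayName.StartsWith(strSearch)
             || s.strSubject.StartsWith(strSearch))
    .OrderByDescending(s => s.LatestEventTime)
    .Take(intNumRecords)
    .ToList();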
Try out full-text search
Microsoft has only recently introduced full-text search in Azure SQL Database. You can use it to find rows matching a partial string. It is a bit complicated to achieve through EF, but it is certainly doable.
Here are some links to get you started:
Entity Framework, Code First and Full Text Search
https://azure.microsoft.com/en-us/blog/full-text-search-is-now-available-for-preview-in-azure-sql-database/

string.Contains(...) is converted to a WHERE ... LIKE ... SQL statement, which is very expensive. Try to reformulate your query to avoid it.
Plus, Azure SQL has its own limits on query run time (5 seconds as far as I remember, but better check the SLA), so it will generally ignore your web.config settings if they are longer.
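For what it's worth, the connection timeout in web.config only governs how long opening a connection may take; the per-query limit is the command timeout. If you're on EF6, a sketch of raising it (Azure-side limits still apply regardless):

// EF6: raise the command timeout (in seconds) for queries on this context.
db.Database.CommandTimeout = 60;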

Related

Calling SKIP() in code or using TOP in function

I'm coding an application with Entity Framework in which I rely heavily on user-defined functions.
I have a question about the best (most optimized) way to limit and page my result sets. Basically, I am wondering whether these two options are the same, or whether one is preferred performance-wise.
Option 1.
//C#
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
Option 2.
//C#
var result2 = _DB.fn_GetData(page: 0, size: 100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
ORDER BY Id
OFFSET (@size * @page) ROWS FETCH NEXT @size ROWS ONLY
To me these seem to produce about the same result, but maybe I am missing some key aspect.
You'll have to be aware of when your LINQ statement is enumerable and when it is queryable. As long as your statement is an IQueryable<...>, the software will try to translate it into SQL and let your database do the query. Once it has really lost the IQueryable and become an implementation of an IEnumerable, the data has been brought into local memory, and all further LINQ statements will be performed by your process, not by the database.
If you use your debugger, you will see that fn_GetData returns an IEnumerable. This means that the result of fn_GetData is brought into local memory and your OrderBy etc. is performed by your process.
Usually it is much more efficient to only move the records that you will actually use into local memory. Besides, do not fetch complete records, but only the properties that you plan to use. So in this case I guess you'll have to create an extended version of fn_GetData that returns only the values you plan to use.
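A small illustration of where that line gets drawn, with a hypothetical TableRows DbSet standing in for the real mapping:

// IQueryable: OrderBy/Skip/Take are translated to SQL and executed by the
// database; only one page of rows crosses the wire.
var page1 = _DB.TableRows
    .OrderBy(x => x.Id)
    .Skip(page * 100)
    .Take(100)
    .ToList();

// IEnumerable: AsEnumerable() draws the line; everything after it runs in
// local memory, so the whole table is fetched before paging happens.
var page2 = _DB.TableRows
    .AsEnumerable()
    .OrderBy(x => x.Id)
    .Skip(page * 100)
    .Take(100)
    .ToList();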
I suggest the second option, because SQL Server can do this faster than C# methods can.
With your first option, you take all of the records in the table and loop through them. With the second option, SQL Server does it for you and you get exactly what you asked for.
You should push the limiting and where clauses down to the database (how much this helps depends on table indexes) as far as possible. Regarding the first example:
var result1 = _DB.fn_GetData().OrderBy(x => Id).Skip(page *100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
The whole table is retrieved from the database into memory, which kills performance and reliability. I strongly advise against it. You should consider adding some limitations to filter records in the database. So, the second option is the better approach in this case.

How to improve this LINQ query to find users with all skills in a list

I'm having some trouble optimizing this query.
This is to filter out users that have a particular set of skills.
These skills are sent to the server in the form of a list of their IDs, which are GUIDs.
Unfortunately, I cannot just get the user as easily as I would like, because the last person working on this placed it in a SQL view.
This is where we try and find all users with all skills selected.
skillIDs is the list of GUIDs
Here is what it looks like
myview.Where(view => skillIDs.All(skill => view.User.Skills.Any(s => s.ID == skill)))
Other things we have tried
myView.Where(view => !skillIDs.Except(view.User.Skills.Select(skill => skill.ID)).Any())
myview.Where(view => skillIDs.All(skill => view.User.Skills.Select(s => s.ID).Contains(skill)))
I realize the way it is working is highly inefficient, and yes, we are paginating the results, but not until after this query. What I believe is happening is that the query is executed here rather than waiting for the .Skip(0).Take(10).ToList(), which is when it should execute. Right now it takes 45 seconds to run; when it's not trying to execute the query above, it takes less than a second, so I know this is the culprit.
In this case, playing with different variations of LINQ won't make a difference, as the issue is likely in the backend table indexing, not in how you construct the LINQ statement. You really have two options:
Index the backend table; the view will be able to use the table's index if it needs to.
Index the view directly. The definition of an indexed view must be deterministic and the view must be created WITH SCHEMABINDING (see MSDN):
CREATE UNIQUE CLUSTERED INDEX IDX_V1
ON myview (skillid);
GO

Entity Framework Core count does not have optimal performance

I need to get the number of records matching a certain filter.
Theoretically this instruction:
_dbContext.People.Count(w => w.Type == 1);
It should generate SQL like:
SELECT COUNT(*)
FROM People
WHERE Type = 1
However, the generated SQL is:
Select Id, Name, Type, DateCreated, DateLastUpdate, Address
from People
Where Type = 1
The query being generated takes much longer to run in a database with many records.
I need to generate the first query.
If I just do this:
_dbContext.People.Count();
Entity Framework generates the following query:
SELECT COUNT(*)
FROM People
... which runs very fast.
How can I generate this second query while passing search criteria to the count?
There is not much to answer here. If your ORM tool does not produce the expected SQL query from such a simple LINQ query, there is no way you can make it do so by rewriting the query (and you shouldn't have to in the first place).
EF Core has a concept of mixed client/database evaluation in LINQ queries, which allows them to release EF Core versions with incomplete/very inefficient query processing like in your case.
Excerpt from Features not in EF Core (note the word not) and Roadmap:
Improved translation to enable more queries to successfully execute, with more logic being evaluated in the database (rather than in-memory).
In short, they are planning to improve the query processing, but we don't know when that will happen and to what degree (remember, the mixed mode allows them to consider such a query "working").
So what are the options?
First, stay away from EF Core until it becomes really useful. Go back to EF6; it has no such issues.
If you can't use EF6, then stay updated with the latest EF Core version.
For instance, in both v1.0.1 and v1.1.0 your query generates the intended SQL (tested), so you can simply upgrade and this concrete issue will be gone.
But note that along with improvements, new releases introduce bugs/regressions (see, for instance, EFCore returning too many columns for a simple LEFT OUTER join), so do that at your own risk (and consider the first option again, i.e. which one is right for you :)
Try this lambda expression to execute the query faster:
_dbContext.People.Where(x => x.Type == 1).Select(x => x.Id).Count();
Try this
(from x in _dbContext.People where x.Type == 1 select x).Count();
or you could do the async version of it like:
await (from x in _dbContext.People where x.Type == 1 select x).CountAsync();
and if those don't work out for you, then you could at least make the query more efficient by doing:
(from x in _dbContext.People where x.Type == 1 select x.Id).Count();
or
await (from x in _dbContext.People where x.Type == 1 select x.Id).CountAsync();
If you want to optimize performance and the current EF provider is not (yet) capable of producing the desired query, you can always rely on raw SQL.
Obviously, this is a trade-off as you are using EF to avoid writing SQL directly, but using raw SQL can be useful if the query you want to perform can't be expressed using LINQ, or if using a LINQ query is resulting in inefficient SQL being sent to the database.
A sample raw SQL query would look like this:
var results = _context.People.FromSql(
    "SELECT Id, Name, Type, DateCreated, DateLastUpdate, Address " +
    "FROM People " +
    "WHERE Type = @p0",
    1);
As far as I know, raw SQL queries passed to the FromSql extension method currently require that you return a model type, i.e. returning a scalar result may not yet be supported.
You can however always go back to plain ADO.NET queries:
using (var connection = _context.Database.GetDbConnection())
{
connection.Open();
using (var command = connection.CreateCommand())
{
command.CommandText = "SELECT COUNT(*) FROM People WHERE Type = 1";
var result = command.ExecuteScalar().ToString();
}
}
It seems that there was a problem with one of the early releases of Entity Framework Core. Unfortunately, you have not specified the exact version, so I am not able to dig into the EF source code to tell what exactly went wrong.
To test this scenario, I have installed the latest EF Core package and managed to get correct result.
Here is my test program:
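(The original snippet did not survive the copy, so the following is a minimal reconstruction of such a test; the entity shape and connection string are assumptions.)

using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Type { get; set; }
}

public class TestContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("<your connection string>");
}

public static class Program
{
    public static void Main()
    {
        using (var ctx = new TestContext())
        {
            // Expected to translate to: SELECT COUNT(*) FROM [People] WHERE [Type] = 1
            var count = ctx.People.Count(p => p.Type == 1);
            Console.WriteLine(count);
        }
    }
}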
The SQL that gets generated, captured with SQL Server Profiler, is the expected SELECT COUNT(*) FROM [People] WHERE [Type] = 1. As you can see, it matches all the expectations.
Here is the relevant excerpt from the packages.config file:
...
<package id="Microsoft.EntityFrameworkCore" version="1.1.0" targetFramework="net452" />
...
So, in your situation the only solution is to update to the latest package which is 1.1.0 at the time of writing this.
Does this get you what you want:
_dbContext.People.Where(w => w.Type == 1).Count();
I am using EFCore 1.1 here.
This can occur if EF Core cannot translate the entire Where clause to SQL. The culprit can be something as simple as a DateTime.Now call that you might not even think about.
The following statement results in a SQL query that will, surprisingly, run a SELECT * and then count the loaded rows in C# with .Count() once it has pulled in the entire table!
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > DateTime.Now.AddDays(-7)).Count();
But this query will run an SQL SELECT COUNT(*) as you would expect / hope for:
DateTime earliestDate = DateTime.Now.AddDays(-7);
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template
&& x.SendConfirmedDate > earliestDate).Count();
Crazy but true. Fortunately this also works:
DateTime now = DateTime.Now;
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > now.AddDays(-7)).Count();
Sorry for the bump, but...
The reason the query with the where clause is slow is probably that you didn't give your database a fast way to execute it.
In the case of the SELECT COUNT(*) FROM People query, we don't need to know the actual data in each field, so the engine can use a small index that doesn't carry all those fields and therefore costs little slow I/O. The database software is clever enough to see that the primary key index requires the least I/O to do the count: the PK ids take less space than the full row, so more values fit per I/O block and the count completes faster.
Now, in the case of the query filtering on Type, the engine needs to read Type to determine its value. You should create an index on Type if you want your query to be fast; otherwise it will have to do a very slow full table scan, reading all rows. It helps when your values are more discriminating: a Gender column (usually) has only two values and isn't very discriminating, while a primary key column where every value is unique is highly discriminating. More discriminating values result in a shorter index range scan and a faster count.
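Since the question uses EF Core, such an index can be declared in the model configuration inside your DbContext. A sketch, assuming code-first with a Person entity mapped to the People table:

// EF Core: declare a non-clustered index on Type so the filtered COUNT
// can use an index range scan instead of a full table scan.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Person>()
        .HasIndex(p => p.Type);
}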
What I used to count rows matching a search query was:
_dbContext.People.Where(w => w.Type == 1).Count();
This can also be achieved by
List<People> people = _dbContext.People.Where(w => w.Type == 1).ToList();
int count = people.Count;
This way you will get the people list too if you need it further.

How can I speed up my EF query?

How can I speed up this query? Right now it's taking me around 2 minutes to pull back 210K records.
I turned off lazy loading and used AsNoTracking in my queries.
I know it's a lot of data but surely it shouldn't take 2 minutes to retrieve the data?
context.Configuration.LazyLoadingEnabled = false;
List<MY_DATA> data = context
.MY_DATA.AsNoTracking()
.Include(x => x.MY_DATA_DETAILS)
.Where(x => startDate <= DbFunctions.TruncateTime(x.DB_DATE)
&& endDate >= DbFunctions.TruncateTime(x.DB_DATE)
&& x.MY_DATA_DETAILS.CODE.Trim().ToUpper() == myCode.Trim().ToUpper())
.ToList();
You can do without DbFunctions.TruncateTime() and probably also without these Trim().ToUpper() calls.
If you execute a function on a database column before it is filtered, it's impossible for the query optimizer to use any indexes on that column. This is known as being non-sargable. To execute the query in its present form, the database engine has to transform the data first and then scan all the transformed data to do the filtering. No index involved.
DbFunctions.TruncateTime() is meaningless. You have to choose startDate and endDate wisely and use x.DB_DATE as it is.
Further, if x.MY_DATA_DETAILS.CODE is a varchar column (most text columns are), it will be auto-trimmed in searches: even if the database value contains trailing spaces, they will be ignored, so Trim isn't necessary. Next, most text columns by default have a case-insensitive database collation. You should check it. If this is SQL Server, look for collations like SQL_Latin1_General_CP1_CI_AS; the CI part means Case-Insensitive. If so, you can also do away with the ToUpper part. If not, you can either change the collation to a case-insensitive one, or maybe you should conclude that the column is case-sensitive for a reason, so it does matter whether you look for Abc or abc.
Either way, with these transforming functions removed from the database columns, the query should run considerably faster, provided that proper indexes are in place.
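A sketch of the sargable rewrite, assuming startDate and endDate carry no time component and CODE has a case-insensitive collation:

// Compare the raw column so indexes can be used (sargable predicates).
DateTime endExclusive = endDate.AddDays(1); // upper bound, exclusive
string code = myCode.Trim();                // normalize the parameter once, client-side

List<MY_DATA> data = context
    .MY_DATA.AsNoTracking()
    .Include(x => x.MY_DATA_DETAILS)
    .Where(x => x.DB_DATE >= startDate
             && x.DB_DATE < endExclusive
             && x.MY_DATA_DETAILS.CODE == code)
    .ToList();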
Usually, defining indexes on the columns that you frequently use in the WHERE clause can improve your performance when selecting rows from a large table.
I recommend that you create a stored procedure, move your query into it, apply the performance tuning in the database, and call the SP from your C# code.
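Calling such a stored procedure from EF6 could look like the sketch below; the procedure name and its parameters are hypothetical:

using System.Data.SqlClient;

// Materialize MY_DATA rows returned by a (hypothetical) tuned procedure.
List<MY_DATA> data = context.Database
    .SqlQuery<MY_DATA>(
        "EXEC dbo.GetMyData @StartDate, @EndDate, @Code",
        new SqlParameter("@StartDate", startDate),
        new SqlParameter("@EndDate", endDate),
        new SqlParameter("@Code", myCode))
    .ToList();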
In addition to what the others said about moving your query to a stored proc and creating the proper indexes, I'd say you'd be better off using SQL reporting rather than trying to import the data into your application and reporting from there, especially with 210K rows. SQL has internal optimizations that you'll never be able to come close to in application code.
You can see this very easily:
1) Try writing a simple console app that pulls down that entire table and writes it to a CSV file; it'll be extremely slow.
2) Try the data export through SQL Management Studio and export to a CSV; it'll be done in a few seconds.
Do you need all the attributes of the object?
If not, you can project only the columns you need, like this:
var data = context.MY_DATA
    .Where(x => startDate <= DbFunctions.TruncateTime(x.DB_DATE)
             && endDate >= DbFunctions.TruncateTime(x.DB_DATE)
             && x.MY_DATA_DETAILS.CODE.Trim().ToUpper() == myCode.Trim().ToUpper())
    .Select(x => new
    {
        // EF cannot project onto a mapped entity type, so use an
        // anonymous type (or a DTO) with only the columns you need.
        x.DB_DATE,
        Code = x.MY_DATA_DETAILS.CODE
    })
    .ToList();

Improving Linq query

I have the following query:
if (idUO > 0)
{
query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
query = query.Where(b => b.DependencyId == dependencyId );
}
else
{
var dependencyIds = dependencies.Select(d => d.Id).ToList();
query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket that has 23 properties, 5 of them relationships (FKs); I fill 3 of those relationships with lazy loading
When I access the first page, it's OK; the query takes less than 1 second. But when I try to access page 30000, the query takes more than 20 seconds. Is there a way, within the LINQ query, to improve its performance? Or only at the database level? And at the database level, what is the best way to improve performance for this kind of query?
There is not much room here, IMO, to make things better (at least looking at the code provided).
When you're trying to achieve good performance at such row counts, I would recommend not using LINQ at all, or at least using it only for the parts with smaller data access.
What you can do here is introduce paging of that data at the database level, with a stored procedure, and invoke it from your C# code.
1- Create a view in the DB which orders items by date, including all related relationships (Products etc.).
2- Create a stored procedure that queries this view with the related parameters.
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull the trace into the Database Engine Tuning Advisor to get some tips about indexes that you should add. This has had great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why:
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
OK, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open SQL Profiler and view the mess of SQL it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on compiled queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an ObjectQuery with explicit joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred loading can be a real performance killer.
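Another LINQ-level option worth naming here is keyset ("seek") paging, which sidesteps the deep-Skip scan entirely by filtering past the last row of the previous page. A sketch, assuming the Ticket entity has an Id column to break ties on Date (lastDate and lastId come from the previous page):

// Keyset paging: instead of Skip(20 * page), seek past the previous page.
// The database can use an index on (Date, Id) to jump straight there.
var nextPage = query
    .Where(b => b.Date > lastDate
            || (b.Date == lastDate && b.Id > lastId))
    .OrderBy(b => b.Date)
    .ThenBy(b => b.Id)
    .Take(20)
    .ToList();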
