I needed to write a dynamic query on the customer database to obtain a few fields of a customer. The following is the code:
[Route("api/getBasicCustList/{argType}/{argValue}")]
[HttpGet]
[Authorize]
public dynamic getCustomerDataUsername(String argType, String argValue)
{
    IQueryable<CustomerDTO> query =
        from recordset in db.Customers
        select new CustomerDTO
        {
            companyId = recordset.Company.Id,
            contactNum = recordset.ContactNum,
            username = recordset.UserName,
            emailAddress = recordset.Email,
            fullName = recordset.FullName,
            accountNumber = recordset.RCustId
        };

    switch (argType)
    {
        case "username":
            query = query.Where(c => c.username.StartsWith(argValue));
            break;
        case "contactnum":
            long mobNum = Int64.Parse(argValue);
            query = query.Where(c => c.contactNum == mobNum);
            break;
        case "fullname":
            query = query.Where(c => c.fullName.Contains(argValue));
            break;
    }

    return new { data = query.ToList() };
}
This works fine and solves my purpose.
My question: when I write the first part of the query to fetch all the customer records and then apply the Where condition dynamically, will the results be brought into memory first, or is the complete query generated and executed at the database in one shot?
Since I have just 500 records as of now, I cannot detect any performance lag, but in production I will be dealing with at least 200,000 to 300,000 records.
OK, the answer is:
The query won't be executed until you reach that "ToList" at the end of your method.
From the MSDN link shared by @GeorgPatscheider, it is mentioned:
At what point query expressions are executed can vary. LINQ queries are always executed when the query variable is iterated over, not when the query variable is created. This is called deferred execution.
Deferred execution enables multiple queries to be combined or a query to be extended. When a query is extended, it is modified to include the new operations, and the eventual execution will reflect the changes.
It is also written that if a query uses Average, Count, First, or Max, it performs an immediate execution.
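To illustrate with the question's own context (the "jo" prefix is just a made-up value):
// Deferred: building the query sends nothing to the database yet
var filtered = db.Customers.Where(c => c.UserName.StartsWith("jo")); // "jo" is a made-up example value

// Immediate: Count() forces execution right here (a SELECT COUNT(*) is sent)
int n = filtered.Count();

// Deferred until now: ToList() executes the full filtered SELECT at this point
var rows = filtered.ToList();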
Thanks.
The biggest factor in improving the performance of a query against large tables is doing the filtering on the server (database) side. In Entity Framework 6.x and earlier, a query would fail at runtime with a NotSupportedException if EF could not convert the entire query to SQL. In EF Core, this is no longer the case: as much of the query as possible is converted to SQL, and the rest is evaluated on the client side.
All three of your filtering lambda expressions can be converted to SQL. However, if you were to write a predicate that can't be converted, then on EF Core your performance would suffer. All the records in the Customers table would be sent to the client for filtering despite the fact that the evaluation of the query is still deferred until ToList() is called. Your logs would contain a warning but that is easy to miss.
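For example (IsVip is a hypothetical helper, not part of your code):
// Hypothetical helper: EF cannot translate an arbitrary C# method into SQL
private static bool IsVip(string userName)
{
    return userName.StartsWith("vip_");
}

// On EF Core 1.x/2.x this still runs, but silently falls back to client
// evaluation: every Customers row is fetched, then filtered in memory
var vips = db.Customers.Where(c => IsVip(c.UserName)).ToList();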
A good reference for this is Jon Smith's article Entity Framework Core: Client vs. Server evaluation.
Related
I want to use Entity Framework to pull a database list into memory with AsQueryable(), because when I use ToList() my LINQ queries stop working and return lists of 0 elements.
Is it possible to do something like
var vwEntityList = dc.vwEntity.ToList().AsQueryable();
? I've tried this, but LINQ fails when I query the object. If I use just .AsQueryable() it works, but it is really slow since it queries the database for every operation, which I want to avoid; I want to bring the EF view into memory as a queryable object.
When I have my list and try to use LINQ on it, like
var newlist = vwEntityList.Where(x => x.deal_status == "B").ToList();
newlist is a list with 0 items, even though I know there should be items in it: a similar query run directly against the database returns rows, and it also works when using just .AsQueryable().
Thanks.
Based on what you've provided, it sounds like this works (but is too slow for your liking):
var list = context.vwEntity.Where(x => x.deal_status == "B").ToList();
but this does not:
var totalList = context.vwEntity.ToList();
var list = totalList.Where(x => x.deal_status == "B").ToList();
The reason this would be the case is database collation. Some databases, such as SQL Server, use a case-insensitive comparison for strings by default. So if a record had a deal_status of "b", the first query would return that record, because in SQL "B" == "b"; however, once the data is loaded into memory, a Linq2Object Where clause does a case-sensitive comparison, so "B" != "b". SQL Server databases can also be configured with case-sensitive collations, so it is dangerous to assume strings will always be treated case-insensitively.
LINQ expressions work against IEnumerable, so you don't need to force a List back to IQueryable.
If you are working with Linq2Object, or against a case-sensitive database, you should ensure case-insensitive matching is done where you want it, rather than relying on collation:
// this should work...
var totalList = context.vwEntity.ToList();
var list = totalList.Where(x => x.deal_status.ToUpper() == "B").ToList();
That said, reading an entire table/view into memory and then querying the objects should never really be "faster" than querying the entities, unless perhaps you need to run many different queries and want to cache the table in memory. Queries across strings commonly hit performance snags when the fields being queried are not indexed. I would suggest getting familiar with a query profiler for whatever database you are running against, so you can capture the exact queries EF executes and identify the bottlenecks. When facing slow queries, I don't think I've ever recommended "load the entire 200k records into memory first" as a solution. :)
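If you happen to be on EF6, one quick way to capture those queries without a separate profiler is the built-in log hook (a sketch):
// EF6: echo every generated SQL statement to the console (or your logger)
context.Database.Log = Console.Write;

var list = context.vwEntity.Where(x => x.deal_status == "B").ToList();
// The exact SELECT ... WHERE clause sent to the server is now visible,
// ready to paste into an execution-plan viewer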
I have the following code that takes 5-8 seconds to complete:
recent_items returnedData = (from d in db.recent_items
                             where d.item_number == scannerInput.Text.ToUpper()
                             select d).FirstOrDefault();
Whereas the following executes in less than half a second:
string search = scannerInput.Text.ToUpper();
recent_items returnedData = (from d in db.recent_items
                             where d.item_number == search
                             select d).FirstOrDefault();
What in the world is going on?
The most likely reason for the discrepancy is that EF evaluates the first condition in memory, after retrieving all rows from the RDBMS.
Microsoft documentation says the following about Client vs. Server Evaluation:
Entity Framework Core supports parts of the query being evaluated on the client and parts of it being pushed to the database. It is up to the database provider to determine which parts of the query will be evaluated in the database.
Your EF DB provider could not send scannerInput.Text.ToUpper() to the RDBMS for evaluation, so it did all the comparisons in memory, after retrieving the data rows from the database. This decision is correct, because the EF DB provider cannot assume that consecutive evaluations of that expression would yield identical results.
The second query, on the other hand, used a captured variable. The value of that variable is guaranteed to stay constant while EF runs the query, so EF proceeded to evaluate the request on the RDBMS side.
Our front-end UI has a filtering system that, in the back end, operates over millions of rows. It uses an IQueryable that is built up over the course of the logic, then executed all at once. Each individual UI component is ANDed together (for example, Dropdown1 and Dropdown2 will only return rows that have both of what is selected in common). This is not a problem. However, Dropdown3 has two types of data in it, and the checked items need to be ORed together, then ANDed with the rest of the query.
Due to the large number of rows it operates over, it keeps timing out. Since there are some additional joins that need to happen, it is somewhat tricky. Here is my code, with the table names replaced:
//The end list has driver ids in it--but the data comes from two different places. Build a list of all the driver ids.
driverIds = db.CarDriversManyToManyTable
    .Where(cd => filter.CarIds.Contains(cd.CarId)) // get driver IDs for each car ID listed in the filter object
    .Select(cd => cd.DriverId)
    .Distinct()
    .ToList();
driverIds = driverIds.Concat(
        db.DriverShopManyToManyTable
            .Where(ds => filter.ShopIds.Contains(ds.ShopId)) // get driver IDs for each shop listed in the filter object
            .Select(ds => ds.DriverId)
            .Distinct())
    .Distinct()
    .ToList();
//Now we have a list solely of driver IDs
//The query operates over the Driver table. The query is built up like this for each item in the UI. Changing from Linq is not an option.
query = query.Where(d => driverIds.Contains(d.Id));
How can I streamline this query so that I don't have to retrieve thousands and thousands of IDs into memory, then feed them back into SQL?
There are several ways to produce a single SQL query. All of them require keeping the parts of the query typed as IQueryable<T>, i.e. not using the ToList, ToArray, AsEnumerable, etc. methods that force them to be executed and evaluated in memory.
One way is to create a Union query containing the filtered Ids (which will be unique by definition) and use the join operator to apply it to the main query:
var driverIdFilter1 = db.CarDriversManyToManyTable
.Where(cd => filter.CarIds.Contains(cd.CarId))
.Select(cd => cd.DriverId);
var driverIdFilter2 = db.DriverShopManyToManyTable
.Where(ds => filter.ShopIds.Contains(ds.ShopId))
.Select(ds => ds.DriverId);
var driverIdFilter = driverIdFilter1.Union(driverIdFilter2);
query = query.Join(driverIdFilter, d => d.Id, id => id, (d, id) => d);
Another way could be to use two OR-ed Any-based conditions, which would translate to an EXISTS(...) OR EXISTS(...) SQL query filter:
query = query.Where(d =>
db.CarDriversManyToManyTable.Any(cd => d.Id == cd.DriverId && filter.CarIds.Contains(cd.CarId))
||
db.DriverShopManyToManyTable.Any(ds => d.Id == ds.DriverId && filter.ShopIds.Contains(ds.ShopId))
);
You could try and see which one performs better.
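If you are on EF6, one quick way to compare what each version sends to the server is to render the SQL (a sketch; EF Core would need query logging instead):
// EF6: for LINQ-to-Entities queries, ToString() returns the SQL that would run
Console.WriteLine(query.ToString());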
The answer to this question is complex and has many facets that, individually, may or may not help in your particular case.
First of all, consider using pagination: .Skip(PageNum * PageSize).Take(PageSize). I doubt your user needs to see millions of rows at once in the front end. Show them only 100, or whatever smaller number seems reasonable to you.
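A minimal sketch of that (pageNum and pageSize are assumed inputs):
var page = query
    .OrderBy(d => d.Id)          // Skip/Take need a stable ordering, or page contents can shift
    .Skip(pageNum * pageSize)    // pageNum and pageSize are assumed inputs
    .Take(pageSize)
    .ToList();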
You've mentioned that you need joins to get the data you need. These joins can be done while forming your IQueryable (Entity Framework), rather than in memory (LINQ to Objects). Read up on join syntax in LINQ.
HOWEVER - performing explicit joins in LINQ is not the best practice, especially if you are designing the database yourself. If you are doing database-first generation of your entities, consider placing foreign-key constraints on your tables. This will allow database-first entity generation to pick those up and provide you with navigation properties, which will greatly simplify your code (see the sketch below).
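The sketch assumes the foreign keys surfaced a Cars navigation property on the Driver entity (not shown in the question):
// Cars is an assumed navigation property; EF composes the join for you
// from the relationship, so no explicit join syntax is needed
query = query.Where(d => d.Cars.Any(c => filter.CarIds.Contains(c.Id)));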
If you do not have any control or influence over the database design, however, then I recommend you construct your query in SQL first to see how it performs. Optimize it there until you get the desired performance, and then translate it into an entity framework linq query that uses explicit joins as a last resort.
To speed such queries up, you will likely need indexes on all of the "key" columns you are joining on. The best way to figure out which indexes will improve performance is to take the SQL query generated by your EF LINQ and bring it over to SQL Server Management Studio. From there, update the generated SQL to provide some predefined values for the @p parameters, just as an example. Once you've done this, right-click on the query and use either Display Estimated Execution Plan or Include Actual Execution Plan. If indexing can improve your query performance, there is a pretty good chance this feature will tell you about it and even provide you with scripts to create the indexes you need.
It looks to me like using the instance versions of the LINQ extensions creates several collections before you're done. Using the query-expression versions should cut that down quite a bit:
driverIds = (from record in db.CarDriversManyToManyTable
             where filter.CarIds.Contains(record.CarId)
             select record.DriverId)
            .Concat(from record in db.DriverShopManyToManyTable
                    where filter.ShopIds.Contains(record.ShopId)
                    select record.DriverId)
            .Distinct()
            .ToList();
Also, using the GroupBy extension would give better performance than querying each driver ID individually.
I need to get the number of records matching a certain filter.
Theoretically, this instruction:
_dbContext.People.Count(w => w.Type == 1);
It should generate SQL like:
Select count (*)
from People
Where Type = 1
However, the generated SQL is:
Select Id, Name, Type, DateCreated, DateLastUpdate, Address
from People
Where Type = 1
The query being generated takes much longer to run in a database with many records.
I need to generate the first query.
If I just do this:
_dbContext.People.Count ();
Entity Framework generates the following query:
Select count (*)
from People
...which runs very fast.
How can I generate this second query while passing search criteria to the count?
There is not much to answer here. If your ORM tool does not produce the expected SQL query from a simple LINQ query, there is no way you can make it do that by rewriting the query (and you shouldn't have to do that in the first place).
EF Core has a concept of mixed client/database evaluation in LINQ queries, which allows them to release EF Core versions with incomplete/very inefficient query processing, like in your case.
Excerpt from Features not in EF Core (note the word not) and Roadmap:
Improved translation to enable more queries to successfully execute, with more logic being evaluated in the database (rather than in-memory).
In short, they are planning to improve the query processing, but we don't know when that will happen and to what degree (remember, the mixed mode allows them to consider a query "working").
So what are the options?
First, stay away from EF Core until it becomes really useful. Go back to EF6; it has no such issues.
If you can't use EF6, then stay updated with the latest EF Core version.
For instance, in both v1.0.1 and v1.1.0 your query generates the intended SQL (tested), so you can simply upgrade and this concrete issue will be gone.
But note that along with improvements, the new releases introduce bugs/regressions (see, for instance, EFCore returning too many columns for a simple LEFT OUTER join), so do that at your own risk (and consider the first option again, i.e. decide which one is right for you :)
Try using this lambda expression to execute the query faster:
_dbContext.People.Select(x => x.Id).Count();
Try this:
(from x in _dbContext.People where x.Type == 1 select x).Count();
or you could do the async version of it like:
await (from x in _dbContext.People where x.Type == 1 select x).CountAsync();
and if those don't work out for you, then you could at least make the query more efficient by doing:
(from x in _dbContext.People where x.Type == 1 select x.Id).Count();
or
await (from x in _dbContext.People where x.Type == 1 select x.Id).CountAsync();
If you want to optimize performance and the current EF provider is not (yet) capable of producing the desired query, you can always rely on raw SQL.
Obviously, this is a trade-off as you are using EF to avoid writing SQL directly, but using raw SQL can be useful if the query you want to perform can't be expressed using LINQ, or if using a LINQ query is resulting in inefficient SQL being sent to the database.
A sample raw SQL query would look like this:
var results = _context.People.FromSql(
    "SELECT Id, Name, Type " +
    "FROM People " +
    "WHERE Type = @p0",
    1);
As far as I know, raw SQL queries passed to the FromSql extension method currently require that you return a model type, i.e. returning a scalar result may not yet be supported.
You can however always go back to plain ADO.NET queries:
using (var connection = _context.Database.GetDbConnection())
{
connection.Open();
using (var command = connection.CreateCommand())
{
command.CommandText = "SELECT COUNT(*) FROM People WHERE Type = 1";
var result = command.ExecuteScalar().ToString();
}
}
It seems that there was a problem with one of the early releases of Entity Framework Core. Unfortunately, you have not specified the exact version, so I am not able to dig into the EF source code to tell what exactly went wrong.
To test this scenario, I have installed the latest EF Core package and managed to get correct result.
Here is my test program:
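(A minimal sketch of such a test; the entity, context, and connection string here are assumptions, not the original listing.)
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Assumed model matching the question's People table
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Type { get; set; }
}

public class PeopleContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
    {
        // Placeholder connection string
        options.UseSqlServer(@"Server=.;Database=CountTest;Trusted_Connection=True;");
    }
}

public static class Program
{
    public static void Main()
    {
        using (var ctx = new PeopleContext())
        {
            // With EF Core 1.1 this translates to SELECT COUNT(*) ... WHERE Type = 1
            int count = ctx.People.Count(w => w.Type == 1);
            Console.WriteLine(count);
        }
    }
}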
And here is the SQL that gets generated, as captured by SQL Server Profiler: essentially SELECT COUNT(*) FROM [People] WHERE [Type] = 1.
As you can see, it matches all the expectations.
Here is the excerpt from packages.config file:
...
<package id="Microsoft.EntityFrameworkCore" version="1.1.0" targetFramework="net452" />
...
So, in your situation the only solution is to update to the latest package, which is 1.1.0 at the time of writing.
Does this get what you want?
_dbContext.People.Where(w => w.Type == 1).Count();
I am using EF Core 1.1 here.
This can occur if EF Core cannot translate the entire Where clause to SQL. The culprit can be something as simple as DateTime.Now that you might not even think about.
The following statement results in a SQL query that, surprisingly, runs a SELECT * and then counts in C# with .Count() once it has loaded the entire table!
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > DateTime.Now.AddDays(-7)).Count();
But this query will run an SQL SELECT COUNT(*) as you would expect / hope for:
DateTime earliestDate = DateTime.Now.AddDays(-7);
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template
&& x.SendConfirmedDate > earliestDate).Count();
Crazy but true. Fortunately this also works:
DateTime now = DateTime.Now;
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > now.AddDays(-7)).Count();
Sorry for the bump, but...
Probably the reason the query with the where clause is slow is that you didn't give your database a fast way to execute it.
In the case of the select count(*) from People query, we don't need the actual data for each field, so the database can use a small index that doesn't carry all those fields and doesn't cost as much slow I/O to read. The database software is clever enough to see that the primary-key index requires the least I/O to do the count on: the PK ids take less space than the full row, so more values fit in each I/O block and the count completes faster.
Now, in the case of the query with the Type filter, the database needs to read Type to determine its value. You should create an index on Type if you want the query to be fast; otherwise it has to do a very slow full table scan, reading all rows. It also helps when your values are more discriminating: a Gender column (usually) has only two values and isn't very discriminating, while a primary-key column where every value is unique is highly discriminating. More discriminating values result in a shorter index range scan and a faster count.
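If the schema happens to be managed through EF Core migrations rather than hand-written DDL, one way to declare such an index is a sketch like this (the Person entity name is an assumption):
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Person is an assumed entity name; this produces something like:
    // CREATE INDEX IX_People_Type ON People (Type)
    modelBuilder.Entity<Person>()
        .HasIndex(p => p.Type);
}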
What I used to count rows with a search query was:
_dbContext.People.Where(w => w.Type == 1).Count();
This can also be achieved by:
List<People> people = _dbContext.People.Where(w => w.Type == 1).ToList();
int count = people.Count;
This way you will get the people list too if you need it further.
I have a situation where my application constructs a dynamic LINQ query using PredicateBuilder based on user-specified filter criteria (aside: check out this link for the best EF PredicateBuilder implementation). The problem is that this query usually takes a long time to run and I need the results of this query to perform other queries (i.e., joining the results with other tables). If I were writing T-SQL, I'd put the results of the first query into a temporary table or a table variable and then write my other queries around that. I thought of getting a list of IDs (e.g., List<Int32> query1IDs) from the first query and then doing something like this:
var query2 = DbContext.TableName.Where(x => query1IDs.Contains(x.ID))
This will work in theory; however, the number of IDs in query1IDs can be in the hundreds or thousands (and the LINQ expression x => query1IDs.Contains(x.ID) gets translated into a T-SQL "IN" statement, which is bad for obvious reasons) and the number of rows in TableName is in the millions. Does anyone have any suggestions as to the best way to deal with this kind of situation?
Edit 1: Additional clarification as to what I'm doing.
Okay, I'm constructing my first query (query1), which just contains the IDs that I'm interested in. Basically, I'm going to use query1 to "filter" other tables. Note: I am not using ToList() at the end of the LINQ statement; the query is not executed at this time and no results are sent to the client:
var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID)
Then I take query1 and use it to filter another table (TableName2). I now put ToList() at the end of this statement because I want to execute it and bring the results to the client:
var query2 = (from a in DbContext.TableName2 join id in query1 on a.ID equals id select new { a.Column1, a.column2, a.column3,...,a.columnM }).ToList();
Then I take query1 and re-use it to filter yet another table (TableName3), execute it and bring the results to the client:
var query3 = (from a in DbContext.TableName3 join id in query1 on a.ID equals id select new { a.Column1, a.column2, a.column3,...,a.columnM }).ToList();
I can keep doing this for as many queries as I like:
var queryN = (from a in DbContext.TableNameN join id in query1 on a.ID equals id select new { a.Column1, a.column2, a.column3,...,a.columnM }).ToList();
The Problem: query1 takes a long time to execute. When I execute query2, query3, ..., queryN, query1 is executed (N-1) times... this is not a very efficient way of doing things (especially since query1 isn't changing). As I said before, if I were writing T-SQL, I would put the result of query1 into a temporary table and then use that table in the subsequent queries.
Edit 2:
I'm going to give the credit for answering this question to Albin Sunnanbo for his comment:
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
I think that's really the best that one can do with Entity Framework. In the end, if the performance gets really bad, I'll probably go with John Wooley's suggestion:
This may be a situation where dropping to native ADO against a stored proc returning multiple results and using an internal temp table might be your best option for this operation. Use EF for the other 90% of your app.
Thanks to everyone who commented on this post...I appreciate everyone's input!
If the size of TableName is not too big to load the whole table, you can use
var tableNameById = DbContext.TableName.ToDictionary(x => x.ID);
to fetch the whole table and automatically put it in a local Dictionary keyed by ID.
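Lookups against the in-memory copy are then O(1) (a sketch; someId and the TableName entity type are assumptions):
TableName row; // TableName as the entity type is an assumption
if (tableNameById.TryGetValue(someId, out row)) // someId is an assumed input
{
    // work with row here; no database round trip is made
}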
Another way is to just "force" the LINQ evaluation with .ToList(), in that case fetching the whole table and doing the Where part locally with Linq2Objects:
var query1Lookup = new HashSet<int>(query1IDs);
var query2 = DbContext.TableName.ToList().Where(x => query1Lookup.Contains(x.ID));
Edit:
Storing a list of IDs from one query in a list and using that list as a filter in another query can usually be rewritten as a join (sketched below).
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
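A sketch of that rewrite, reusing the question's names: keep query1 unexecuted and join on it, so everything runs as one SQL statement.
// No ToList() on query1, so it stays an IQueryable of IDs
var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID);

// The join is composed into the same SQL statement; no ID list is
// pulled to the client and shipped back in an IN (...) clause
var query2 = (from a in DbContext.TableName2
              join id in query1 on a.ID equals id
              select a).ToList();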
Since you are running subsequent queries off the results, take your first query and create it as a View on your SQL Server, add the view to your context, and build your LINQ queries against the view.
Have you considered composing your query as per this article (using the decorator design pattern):
Composed LINQ Queries using the Decorator Pattern
The premise is that, instead of enumerating your first (very costly) query, you basically use the decorator pattern to produce a chain of IQueryable that is the result of query 1 through query N. This way you always execute the filtered form of the query.
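The idea in miniature (a sketch; Entity, ComplexFilterLogic, and the IsActive filter are assumed names):
// The costly base query stays unexecuted (Entity is an assumed type)
IQueryable<Entity> query1 = DbContext.TableName1.Where(ComplexFilterLogic);

// A "decorator" wraps an IQueryable and returns a further-filtered IQueryable
// (IsActive is an assumed property)
Func<IQueryable<Entity>, IQueryable<Entity>> withActiveFilter =
    q => q.Where(x => x.IsActive);

var composed = withActiveFilter(query1); // still nothing executed
var results = composed.ToList();         // the filtered form runs as one query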
Hope this might help.