Fluent LINQ and SQL Server - C#

This may be a silly question, but I have a fluent LINQ query that is pulling data off a DB, but, of course, it processes the data in memory, not on the DB.
Is there a means of taking a query (say table.Where(t => t.id > 22)) and having it run as a DB query (i.e. SELECT * FROM TABLE WHERE ID > 22)?
I want to use the fluent style because I am building up the query from various places. I know I can make this specific query into query syntax, but my actual queries are more complex.
I have tried standard searches, but I suspect it is not going to be possible, as nothing comes up.
Yes, I am using EF Core.
The code is not necessarily clear:
var getTableData = table.Where(base.whereSelection);
whereSelection.ForEach(w => getTableData = getTableData.Where(w));
At the moment, all of the Where selections are Funcs - they could be converted to something else (like Expressions) if I knew that would make them run on the DB. The rest of the code also builds up this set of details programmatically:
var selectedData = getTableData.Select(Convert);
var sortedTableData = GetSort(selectedData, Ordering);
var sorted = PickDataFrom(sortedTableData);
return CheckAndTake(sorted, pageSize);
Everything seems to work. Just in the wrong place.
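For context, what decides where the filter runs is whether the predicate is an Expression<Func<T, bool>> (which Queryable.Where can translate to SQL) or a plain Func<T, bool> (which silently binds to Enumerable.Where and filters in memory). A minimal sketch, where MyEntity and Id are placeholder names:
using System;
using System.Linq;
using System.Linq.Expressions;

// 'table' is assumed to be an IQueryable<MyEntity>, e.g. a DbSet<MyEntity>.

// A Func<T, bool> cannot be translated: Where binds to Enumerable.Where,
// so every row is pulled back and the filter runs in memory.
Func<MyEntity, bool> funcPredicate = t => t.Id > 22;
var inMemory = table.Where(funcPredicate);            // IEnumerable<MyEntity>

// An Expression<Func<T, bool>> keeps the chain an IQueryable, so the
// provider can emit SELECT * FROM [Table] WHERE [Id] > 22 on the server.
Expression<Func<MyEntity, bool>> exprPredicate = t => t.Id > 22;
var onDatabase = table.Where(exprPredicate);          // IQueryable<MyEntity>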

Related

Linq query timing out, how to streamline query

Our front-end UI has a filtering system that, in the back end, operates over millions of rows. It uses an IQueryable that is built up over the course of the logic, then executed all at once. Each individual UI component is ANDed together (for example, Dropdown1 and Dropdown2 will only return rows that have both of what is selected in common). This is not a problem. However, Dropdown3 has two types of data in it, and the checked items need to be ORed together, then ANDed with the rest of the query.
Due to the large amount of rows it is operating over, it keeps timing out. Since there are some additional joins that need to happen, it is somewhat tricky. Here is my code, with the table names replaced:
//The end list has driver ids in it--but the data comes from two different places. Build a list of all the driver ids.
driverIds = db.CarDriversManyToManyTable
    .Where(cd => filter.CarIds.Contains(cd.CarId)) //get driver IDs for each car ID listed in filter object
    .Select(cd => cd.DriverId)
    .Distinct()
    .ToList();
driverIds = driverIds.Concat(
db.DriverShopManyToManyTable.Where(ds => filter.ShopIds.Contains(ds.ShopId)) //Get driver IDs for each Shop listed in filter object
.Select(ds => ds.DriverId)
.Distinct()).Distinct().ToList();
//Now we have a list solely of driver IDs
//The query operates over the Driver table. The query is built up like this for each item in the UI. Changing from Linq is not an option.
query = query.Where(d => driverIds.Contains(d.Id));
How can I streamline this query so that I don't have to retrieve thousands and thousands of IDs into memory, then feed them back into SQL?
There are several ways to produce a single SQL query. All of them require keeping the parts of the query as IQueryable<T>, i.e. not using ToList, ToArray, AsEnumerable, or other methods that force them to be executed and evaluated in memory.
One way is to create a Union query containing the filtered Ids (which will be unique by definition) and use the Join operator to apply it to the main query:
var driverIdFilter1 = db.CarDriversManyToManyTable
.Where(cd => filter.CarIds.Contains(cd.CarId))
.Select(cd => cd.DriverId);
var driverIdFilter2 = db.DriverShopManyToManyTable
.Where(ds => filter.ShopIds.Contains(ds.ShopId))
.Select(ds => ds.DriverId);
var driverIdFilter = driverIdFilter1.Union(driverIdFilter2);
query = query.Join(driverIdFilter, d => d.Id, id => id, (d, id) => d);
Another way is to use two OR-ed Any-based conditions, which translate to an EXISTS(...) OR EXISTS(...) SQL filter:
query = query.Where(d =>
db.CarDriversManyToManyTable.Any(cd => d.Id == cd.DriverId && filter.CarIds.Contains(cd.CarId))
||
db.DriverShopManyToManyTable.Any(ds => d.Id == ds.DriverId && filter.ShopIds.Contains(ds.ShopId))
);
You could try and see which one performs better.
The answer to this question is complex and has many facets that, individually, may or may not help in your particular case.
First of all, consider using pagination: .Skip(PageNum * PageSize).Take(PageSize). I doubt your user needs to see millions of rows at once in the front end. Show them only 100, or whatever smaller number seems reasonable to you.
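For example (pageNum and pageSize are hypothetical variables here, and paging needs a deterministic ordering to be meaningful):
var page = query
    .OrderBy(d => d.Id)           // paging requires a stable order
    .Skip(pageNum * pageSize)
    .Take(pageSize)
    .ToList();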
You've mentioned that you need to use joins to get the data you need. These joins can be done while forming your IQueryable (entity framework), rather than in-memory (linq to objects). Read up on join syntax in linq.
HOWEVER - performing explicit joins in LINQ is not best practice, especially if you are designing the database yourself. If you are doing database-first generation of your entities, consider placing foreign-key constraints on your tables. This will allow database-first entity generation to pick those up and provide you with navigation properties, which will greatly simplify your code, as in the sketch below.
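A rough sketch of the difference (db.Drivers and the CarDrivers navigation property are hypothetical names):
// Explicit join written out by hand:
var withJoin =
    from d in db.Drivers
    join cd in db.CarDriversManyToManyTable on d.Id equals cd.DriverId
    where filter.CarIds.Contains(cd.CarId)
    select d;   // may need Distinct() if a driver matches several cars

// With a foreign-key constraint and the generated navigation property,
// the same intent is expressed directly on the entity:
var withNavigation = db.Drivers
    .Where(d => d.CarDrivers.Any(cd => filter.CarIds.Contains(cd.CarId)));
Both stay IQueryable, so either version is translated into a single SQL statement.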
If you do not have any control or influence over the database design, however, then I recommend you construct your query in SQL first to see how it performs. Optimize it there until you get the desired performance, and then translate it into an entity framework linq query that uses explicit joins as a last resort.
To speed such queries up, you will likely need to add indexes on all of the "key" columns that you are joining on. The best way to figure out which indexes you need is to take the SQL query generated by your EF LINQ and bring it over to SQL Server Management Studio. From there, update the generated SQL to provide some predefined values for your @p parameters, just as an example. Once you've done this, right-click on the query and use either Display Estimated Execution Plan or Include Actual Execution Plan. If indexing can improve your query performance, there is a pretty good chance that this feature will tell you about it and even provide you with scripts to create the indexes you need.
It looks to me like using the instance versions of the LINQ extensions is creating several collections before you're done. Using the query-syntax (from ...) versions should cut that down quite a bit:
driverIds = (from record in db.CarDriversManyToManyTable
             where filter.CarIds.Contains(record.CarId)
             select record.DriverId)
            .Concat(from record in db.DriverShopManyToManyTable
                    where filter.ShopIds.Contains(record.ShopId)
                    select record.DriverId)
            .Distinct();
Also, using the GroupBy extension would give better performance than querying each driver Id individually.

Entity Framework Core count does not have optimal performance

I need to get the amount of records with a certain filter.
Theoretically this instruction:
_dbContext.People.Count(w => w.Type == 1);
It should generate SQL like:
Select count (*)
from People
Where Type = 1
However, the generated SQL is:
Select Id, Name, Type, DateCreated, DateLastUpdate, Address
from People
Where Type = 1
The query being generated takes much longer to run in a database with many records.
I need to generate the first query.
If I just do this:
_dbContext.People.Count();
Entity Framework generates the following query:
Select count (*)
from People
.. which runs very fast.
How to generate this second query passing search criteria to the count?
There is not much to answer here. If your ORM tool does not produce the expected SQL query from a simple LINQ query, there is no way to make it do so by rewriting the query (and you shouldn't have to do that in the first place).
EF Core has a concept of mixed client/database evaluation in LINQ queries, which allows them to release EF Core versions with incomplete or very inefficient query processing like in your case.
Excerpt from Features not in EF Core (note the word not) and Roadmap:
Improved translation to enable more queries to successfully execute, with more logic being evaluated in the database (rather than in-memory).
In short, they are planning to improve the query processing, but we don't know when that will happen or to what degree (remember that the mixed mode allows them to consider the query "working").
So what are the options?
First, stay away from EF Core until it becomes really useful. Go back to EF6; it has no such issues.
If you can't use EF6, then stay updated with the latest EF Core version.
For instance, in both v1.0.1 and v1.1.0 your query generates the intended SQL (tested), so you can simply upgrade and this concrete issue will be gone.
But note that along with improvements, new releases also introduce bugs/regressions (see EFCore returning too many columns for a simple LEFT OUTER join, for instance), so do that at your own risk (and consider the first option again, i.e. which one is right for you :)
Try using this lambda expression to execute the query faster.
_dbContext.People.Select(x => x.Id).Count();
Try this
(from x in _dbContext.People where x.Type == 1 select x).Count();
or you could do the async version of it like:
await (from x in _dbContext.People where x.Type == 1 select x).CountAsync();
and if those don't work out for you, then you could at least make the query more efficient by doing:
(from x in _dbContext.People where x.Type == 1 select x.Id).Count();
or
await (from x in _dbContext.People where x.Type == 1 select x.Id).CountAsync();
If you want to optimize performance and the current EF provider is not (yet) capable of producing the desired query, you can always rely on raw SQL.
Obviously, this is a trade-off as you are using EF to avoid writing SQL directly, but using raw SQL can be useful if the query you want to perform can't be expressed using LINQ, or if using a LINQ query is resulting in inefficient SQL being sent to the database.
A sample raw SQL query would look like this:
var results = _context.People.FromSql(
    "SELECT Id, Name, Type, DateCreated, DateLastUpdate, Address " +
    "FROM People " +
    "WHERE Type = {0}",
    1);
As far as I know, raw SQL queries passed to the FromSql extension method currently require that you return a model type, i.e. returning a scalar result may not yet be supported.
You can however always go back to plain ADO.NET queries:
using (var connection = _context.Database.GetDbConnection())
{
connection.Open();
using (var command = connection.CreateCommand())
{
command.CommandText = "SELECT COUNT(*) FROM People WHERE Type = 1";
var result = command.ExecuteScalar().ToString();
}
}
It seems that there has been some problem with one of the early releases of Entity Framework Core. Unfortunately you have not specified the exact version, so I am not able to dig into the EF source code to tell what exactly has gone wrong.
To test this scenario, I have installed the latest EF Core package and managed to get correct result.
I put together a small test program and captured the SQL it generates with SQL Server Profiler; the generated query matches all the expectations.
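A comparable minimal repro would look something like this (not the original test program; People and Type come from the question, while the context name is hypothetical):
using (var db = new PeopleContext())
{
    // With EF Core 1.1.0 this is translated into a single
    // SELECT COUNT(*) FROM [People] WHERE [Type] = 1
    var count = db.People.Count(w => w.Type == 1);
    Console.WriteLine(count);
}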
Here is the excerpt from packages.config file:
...
<package id="Microsoft.EntityFrameworkCore" version="1.1.0" targetFramework="net452" />
...
So, in your situation the only solution is to update to the latest package which is 1.1.0 at the time of writing this.
Does this get what you want:
_dbContext.People.Where(w => w.Type == 1).Count();
I am using EFCore 1.1 here.
This can occur if EF Core cannot translate the entire Where clause to SQL. The cause can be something as simple as DateTime.Now that you might not even think about.
The following statement results in a SQL query that will surprisingly run a SELECT * and then a C# .Count() once it has loaded the entire table!
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > DateTime.Now.AddDays(-7)).Count();
But this query will run an SQL SELECT COUNT(*) as you would expect / hope for:
DateTime earliestDate = DateTime.Now.AddDays(-7);
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template
&& x.SendConfirmedDate > earliestDate).Count();
Crazy but true. Fortunately this also works:
DateTime now = DateTime.Now;
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > now.AddDays(-7)).Count();
sorry for the bump, but...
Probably the reason the query with the where clause is slow is that you didn't give your database a fast way to execute it.
In the case of the select count(*) from People query, we don't need the actual data of each field, so the database can use a small index that doesn't contain all those fields and therefore costs much less slow I/O. The database software is clever enough to see that the primary key index requires the least I/O to do the count on. The PK ids take less space than the full row, so more of them fit per I/O block and the count completes faster.
Now, in the case of the query with the Type filter, the database needs to read Type to determine its value. You should create an index on Type if you want your query to be fast; otherwise it will have to do a very slow full table scan, reading all rows. It helps when your values are more discriminating: a Gender column (usually) has only two values and isn't very discriminating, while a primary key column where every value is unique is highly discriminating. More discriminating values result in a shorter index range scan and a faster count.
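If the schema is managed through EF Core, a hedged sketch of declaring such an index in the model (entity and column names taken from the question, everything else assumed):
// In your DbContext:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Declares an index on Type; the next migration will create it,
    // letting the filtered COUNT avoid a full table scan.
    modelBuilder.Entity<People>()
        .HasIndex(p => p.Type);
}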
What I used to count rows using a search query was
_dbContext.People.Where(w => w.Type == 1).Count();
This can also be achieved by
List<People> people = _dbContext.People.Where(w => w.Type == 1).ToList();
int count = people.Count();
This way you will get the people list too if you need it further.

How can I get a nested IN clause with a linq2sql query?

I am trying to implement some search functionality within our app and have a situation where a User can select multiple Topics from a list and we want to return all activities that match at least one of the selected Topics. Each Activity can have 0-to-many Topics.
I can write a straight SQL query that gives me the results I want like so:
SELECT *
FROM dbo.ACTIVITY_VERSION av
WHERE ACTIVITY_VERSION_ID IN (
SELECT ACTIVITY_VERSION_ID
FROM dbo.ACTIVITY_TOPIC at
WHERE at.TOPIC_ID IN (3,4)
)
What I can't figure out is how to write a LINQ query (we are using Linq to Sql) that returns the same results.
I've tried:
activities.Where(x => criteria.TopicIds.Intersect(x.TopicIds).Any());
this works if activities is a list of in memory objects (i.e. a Linq to Objects query), but I get an error if I try to use the same code in a query that hits the database. The error I receive is:
Local sequence cannot be used in LINQ to SQL implementations of query operators except the Contains operator.
I believe that this means that Linq to Sql doesn't know how to translate either Intersect or Any (or possibly both). If that is the case, I understand why it isn't working, but I still don't know how to make it do what I want it to and my Google-fu has not provided me with anything that works.
Haven't tested it, but this is how you'd go about it.
List<int> IDs = new List<int>();
IDs.Add(3);
IDs.Add(4);
var ACTIVITY_VERSION_IDs = ACTIVITY_TOPIC
    .Where(AT => IDs.Contains(AT.TOPIC_ID))
    .Select(AT => AT.ACTIVITY_VERSION_ID);
var results = ACTIVITY_VERSION
    .Where(AV => ACTIVITY_VERSION_IDs.Contains(AV.ACTIVITY_VERSION_ID));
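Because ACTIVITY_VERSION_IDs is left as an IQueryable (no ToList), LINQ to SQL should compose both statements into a single query, with the Contains call translating to the nested IN subquery shown in the question.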

Entity Framework Query filters implementation for best perfomance

As the title says, I'm using Entity Framework 4.0 for a financial application. I have a WinForms form where I list all the cheques (checks) I have. But in that form, the user can specify some filters.
If the user does not apply any filter, I just can make the query like this:
lista_cheques = db.Cheque.Include("Operacion").Include("Cliente").ToList();
datagridview.Datasource = lista_cheques;
That is simple. But when the user applies filters, the problem gets bigger.
As you see, the user can use filters to see cheques (checks) for a specific client, date, bank, CUIT number, check state, etc.
Now, my question is related to performance in the queries.
I was thinking of doing the filters separately, like this:
lista_cheques = db.Cheque.Include("Operacion").Include("Cliente").Where(x => x.fecha_deposito == fecha).ToList();
lista_cheques = lista_cheques.Where(x => x.banco.id_banco == banco).ToList();
lista_cheques = lista_cheques.Where(x => x.Operacion.Cliente.id_cliente == id_cliente).ToList();
Translation:
fecha is date
Operacion is a group of checks
Cliente is client.
In this way, I'm doing a query, then a query on that query's result, then a new query on that new result, and so on.
I think this way might have big performance issues. I know that SQL Server optimizes queries, so if I'm doing fragmented queries, the optimizer can't work properly.
The other way I thought about, but it's very tedious, is to create one big query to handle every possible filter selection.
For example, it would be like this:
lista_cheques = db.Cheque.Include("Operacion").Include("Cliente").Where(x => x.fecha_deposito == fecha && x.banco.id_banco == banco && x.Operacion.Cliente.id_cliente == id_cliente).ToList();
The big problem is that I would need lots of combinations to be able to handle all the filter possibilities.
OK, so will I have performance issues with the first code example? There I'm doing one big query against the database and then querying the resulting list of objects (which I think will be faster). I'm pretty new to this ORM, and this listing will have to handle a lot of records.
Can somebody give me some advice? I made pretty much a mess explaining this; I hope you can understand.
lista_cheques = db.Cheque.Include("Operacion").Include("Cliente").Where(x => x.fecha_deposito == fecha).ToList();
lista_cheques = lista_cheques.Where(x => x.banco.id_banco == banco).ToList();
lista_cheques = lista_cheques.Where(x => x.Operacion.Cliente.id_cliente == id_cliente).ToList();
Nearly perfect. Kill all those ToList and it is good.
The ToList means the SQL is evaluated at that point, so if all 3 filters trigger, filters 2 and 3 are evaluated in memory.
Kick the ToList away, and the different Where clauses get combined on the database.
Standard LINQ 101. Works like a charm and is always nice to see.
Then add as the LAST line:
lista_cheques = lista_cheques.ToList();
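Putting it together, a rough sketch of how the conditional filters compose on the IQueryable before the single ToList (the boolean flags are hypothetical):
IQueryable<Cheque> consulta = db.Cheque.Include("Operacion").Include("Cliente");

if (filtrarPorFecha)
    consulta = consulta.Where(x => x.fecha_deposito == fecha);
if (filtrarPorBanco)
    consulta = consulta.Where(x => x.banco.id_banco == banco);
if (filtrarPorCliente)
    consulta = consulta.Where(x => x.Operacion.Cliente.id_cliente == id_cliente);

// One round trip: every Where that was applied is combined into a single SQL statement.
lista_cheques = consulta.ToList();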

Can I easily evaluate many IQueryables in a single database call using Entity Framework?

Suppose I have a collection (of arbitrary size) of IQueryable<MyEntity> (all for the same MyEntity type). Each individual query has successfully been dynamically built to encapsulate various pieces of business logic into a form that can be evaluated in a single database trip. Is there any way I can now have all these IQueryables executed in a single round-trip to the database?
For example (simplified; my actual queries are more complex!), if I had
ObjectContext context = ...;
var myQueries = new[] {
context.Widgets.Where(w => w.Price > 500),
context.Widgets.Where(w => w.Colour == 5),
context.Widgets.Where(w => w.Supplier.Name.StartsWith("Foo"))
};
I would like to have EF perform the translation of each query (which it can do individually), then in one database visit, execute
SELECT * FROM Widget WHERE Price > 500
SELECT * FROM Widget WHERE Colour = 5
SELECT W.* FROM Widget
INNER JOIN Supplier ON Widget.SupplierId = Supplier.Id
WHERE Supplier.Name LIKE 'Foo%'
then convert each result set into an IEnumerable<Widget>, updating the ObjectContext in the usual way.
I've seen various posts about dealing with multiple result sets from a stored procedure, but this is slightly different (not least because I don't know at compile time how many results sets there are going to be). Is there an easy way, or do I have to use something along the lines of Does the Entity Framework support the ability to have a single stored procedure that returns multiple result sets??
No. EF doesn't have query batching (future queries). One queryable is one database round trip. As a workaround you can try to play with it and, for example, use:
string sql = ((ObjectQuery<Widget>)context.Widgets.Where(...)).ToTraceString();
to get the SQL of each query and build your own custom command from all the SQL statements to be executed. After that you can use a similar approach as with stored procedures to translate the results.
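A hedged sketch of what that could look like, using ObjectContext.Translate to materialize each result set (namespaces differ between EF versions, parameter values are not carried by ToTraceString, and the batching assumes a store such as SQL Server that accepts semicolon-separated statements):
// Collect the store SQL for each queryable from the question's myQueries array.
var sqls = myQueries
    .Select(q => ((ObjectQuery<Widget>)q).ToTraceString())
    .ToArray();

var entityConnection = (EntityConnection)context.Connection;
var storeConnection = entityConnection.StoreConnection;
storeConnection.Open();

var resultSets = new List<List<Widget>>();
using (var command = storeConnection.CreateCommand())
{
    command.CommandText = string.Join(";" + Environment.NewLine, sqls);
    using (var reader = command.ExecuteReader())
    {
        do
        {
            // Translate materializes the current result set into Widget objects
            // (use the entity-set overload if you need change tracking).
            resultSets.Add(context.Translate<Widget>(reader).ToList());
        } while (reader.NextResult());
    }
}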
Unless you really need to have each query executed separately, you can also union them into a single query:
context.Widgets.Where(...).Union(context.Widgets.Where(...));
This will result in a UNION. If you just need UNION ALL, you can use the Concat method instead.
It might be a late answer, but hopefully it will help someone else with the same issue.
There is the Entity Framework Extended Library on NuGet, which provides a future-queries feature (among others). I played a bit with it and it looks promising.
You can find more information here.
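As a rough sketch of how its future queries are meant to be used (the Future extension method comes from the EntityFramework.Extended package; the exact namespace depends on the package version):
// Nothing is sent to the database yet; both queries are registered as "future" queries.
var pricey = context.Widgets.Where(w => w.Price > 500).Future();
var byColour = context.Widgets.Where(w => w.Colour == 5).Future();

// Enumerating either one triggers a single batched round trip
// that brings back both result sets.
var priceyList = pricey.ToList();
var byColourList = byColour.ToList();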
