EF - IQueryable - two selects with one db call - c#

I am carrying out a pretty normal select from a table using Entity Framework Core and IQueryable. I am using paging in my search, so I want to fetch rows x to y, depending on the page size and current page.
Here's my code:
_dbContext.Orders.Where(o => o.UserId == userId)
.Skip((pageNo - 1) * pageSize).Take(pageSize).ToListAsync();
And resulting SQL:
SELECT [o].[UserId], [o].[OrderId], [o].[OrderDate], [o].[OrderType], [o].[FromNameAddressId], [o].[ToNameAddressId], [o].[Status]
FROM [Orders] AS [o]
WHERE [o].[UserId] = 12
ORDER BY (SELECT 1) OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY;
The problem I'm facing is that I also want to return the total record count as part of the query results and end up (currently) having to do that as a separate call to the db.
Here is my code:
_dbContext.Orders.Where(o => o.UserId == userId).CountAsync();
And resulting SQL:
SELECT COUNT(*) FROM [Orders] AS [o] WHERE [o].[UserId] = 12;
I am looking to make this more efficient, so I was looking either to return the total record count as part of the first query or to run the two selects in one db call rather than two. Has anyone achieved this before? I'm fairly new to Entity Framework, so this is probably fairly straightforward to achieve.
Thanks in advance for any pointers!
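One approach that often comes up (a sketch, not guaranteed across all EF Core versions): reference the base query's Count() inside the page projection, so the provider can translate both results into a single SQL statement with a scalar subquery. The anonymous projection and variable names below are my own, not from the question:
var baseQuery = _dbContext.Orders.Where(o => o.UserId == userId);
var page = await baseQuery
    .OrderBy(o => o.OrderId) // give the paging a deterministic order
    .Select(o => new { Order = o, Total = baseQuery.Count() })
    .Skip((pageNo - 1) * pageSize)
    .Take(pageSize)
    .ToListAsync();
var totalCount = page.FirstOrDefault()?.Total ?? 0; // caveat: 0 when the requested page is past the end
var orders = page.Select(x => x.Order).ToList();
Whether the single round trip is actually cheaper than two separate queries depends on the provider and the plan it produces, so check the generated SQL and measure both variants.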

Related

C# Linq query execution order

Consider the following method:
public IEnumerable<Owner> GetOwners(OwnerParameters ownerParameters)
{
    return FindAll()
        .OrderBy(on => on.Name)
        .Skip((ownerParameters.PageNumber - 1) * ownerParameters.PageSize)
        .Take(ownerParameters.PageSize)
        .ToList();
}
Where FindAll() is a repository-pattern method that returns IQueryable<Owner>. Does having .OrderBy() before the .Skip() and .Take() calls mean that all of the Owner rows will be retrieved and ordered, or does LINQ take into account that .Skip() and .Take() might narrow down the required rows, with the ordering happening only after those rows have been retrieved?
EDIT: Profiler log:
SELECT XXX
FROM [Owners] AS [a]
ORDER BY [a].[Name]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY',N'@__p_0 int,@__p_1 int',@__p_0=0,@__p_1=10
Ultimately, this depends on what FindAll() does and what it returns:
if it returns IEnumerable<T>, then it is using LINQ-to-Objects, which mostly just does literally what it is told: if you sort then page, it sorts then pages; if you page then sort, it pages then sorts
however, if it returns IQueryable<T>, then the query is being composed, and only actually executed in ToList(). At that point the provider model gets a chance to inspect your query tree and build the most suitable implementation possible, which often means writing a SQL query that includes an ORDER BY and the paging hints suitable for the specific RDBMS. If your code paged and then sorted (which is... unusual), I would expect most providers to either write some kind of horrible sub-query to try to describe that, or just throw an exception (perhaps NotSupportedException) in disgust
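To make the distinction concrete, here's a small sketch (assuming an EF Core DbContext db with an Owners DbSet; the names are illustrative):
IQueryable<Owner> q = db.Owners;                 // nothing executed yet
var page = q.OrderBy(o => o.Name)
            .Skip(20)
            .Take(10);                           // still nothing executed; the expression tree is only composed
var list = page.ToList();                        // ONE SQL query: ORDER BY [Name] OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY

IEnumerable<Owner> e = db.Owners.AsEnumerable(); // switches to LINQ-to-Objects from here on
var inMemory = e.OrderBy(o => o.Name)            // pulls ALL rows from the database, sorts them in memory
                .Skip(20)
                .Take(10)
                .ToList();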
All records would not be retrieved. Depending on your backend, a query is generated that orders by Name, skips N rows, and takes the next M rows. Only the query result is retrieved, triggered by .ToList().
i.e. in MS SQL Server the query might look like:
Select *
from (
    select row_number() over (order by Name) as RowNo, *
    from myTable
) t
where t.RowNo > N and t.RowNo <= N + M
That type of query is not very efficient, but that is another matter, and you could create your own workaround.
EDIT: I remembered the generated MS SQL wrong. It was actually:
SELECT ...
FROM ...
ORDER BY ...
OFFSET x ROWS FETCH NEXT y ROWS ONLY
That one is also slow. If speed is a consideration, write your own SQL and send it via LINQ. Basically, what you do is keep the last retrieved value and set it as a parameter:
select top(NTake) ... from ...
where orderedByValue > @lastRetrievedValue
order by ...
(You can send raw SQL queries through LINQ; in EF Core, for example, via FromSqlRaw.)
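For completeness, the same keyset ("seek") idea can be expressed in LINQ itself rather than raw SQL. A sketch, assuming Name is unique and lastRetrievedName holds the last value from the previous page (both assumptions are mine):
var page = await _dbContext.Owners
    .Where(o => string.Compare(o.Name, lastRetrievedName) > 0) // EF Core translates this to [Name] > @lastRetrievedName
    .OrderBy(o => o.Name)
    .Take(pageSize)
    .ToListAsync();
Because the WHERE clause can use an index on Name, the database does not have to skip over all the earlier rows the way OFFSET does.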
A quick example to demonstrate that the order of query operators matters. The explanations given above are good.
var test1 = await _dbContext.UserActivityLogs
.Where(x => x.ExternalSyncLogId == request.Id)
.Skip(0).Take(25)
.AsNoTracking().ToListAsync();
This query translates to
exec sp_executesql N'SELECT [u].[Id], [u].[ActionType], [u].[ActivityEndTime], [u].[ActivityStartTime], [u].[ActivityType], [u].[Created], [u].[CreatedBy], [u].[EntityId], [u].[ExternalSyncLogId], [u].[LastModified], [u].[LastModifiedBy], [u].[RequestBody], [u].[ResponseBody], [u].[Source], [u].[TenantId]
FROM [UserActivityLogs] AS [u]
WHERE [u].[ExternalSyncLogId] IS NULL
ORDER BY (SELECT 1)
OFFSET @__p_1 ROWS FETCH NEXT @__p_2 ROWS ONLY',N'@__p_1 int,@__p_2 int',@__p_1=0,@__p_2=25
Execution time: 29
But if we just change the sequence:
var test2 = await _dbContext.UserActivityLogs
.Skip(0).Take(25)
.Where(x => x.ExternalSyncLogId == request.Id)
.AsNoTracking().ToListAsync();
Translated as
exec sp_executesql N'SELECT [t].[Id], [t].[ActionType], [t].[ActivityEndTime], [t].[ActivityStartTime], [t].[ActivityType], [t].[Created], [t].[CreatedBy], [t].[EntityId], [t].[ExternalSyncLogId], [t].[LastModified], [t].[LastModifiedBy], [t].[RequestBody], [t].[ResponseBody], [t].[Source], [t].[TenantId]
FROM (
SELECT [u].[Id], [u].[ActionType], [u].[ActivityEndTime], [u].[ActivityStartTime], [u].[ActivityType], [u].[Created], [u].[CreatedBy], [u].[EntityId], [u].[ExternalSyncLogId], [u].[LastModified], [u].[LastModifiedBy], [u].[RequestBody], [u].[ResponseBody], [u].[Source], [u].[TenantId]
FROM [UserActivityLogs] AS [u]
ORDER BY (SELECT 1)
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY
) AS [t]
WHERE [t].[ExternalSyncLogId] IS NULL',N'@__p_0 int,@__p_1 int',@__p_0=0,@__p_1=25
Execution time: 474

.NET Core EF LINQ - is this a bug? Very strange behavior

I have two tables in SQL: Document and User. Document has a relation to User, and I want to get the users I sent documents to recently.
I need to sort by the date the document was sent and get the unique (distinct) users related to those documents.
This is my LINQ query:
var recentClients = documentCaseRepository.Entities
.Where(docCase => docCase.AssignedByAgentId == WC.UserContext.UserId)
.OrderByDescending(userWithDate => userWithDate.LastUpdateDate)
.Take(1000) // I need this because if I comment this line out, EF generates a completely different SQL query.
.Select(doc => new { doc.AssignedToClient.Id, doc.AssignedToClient.FirstName, doc.AssignedToClient.LastName })
.Distinct()
.Take(configuration.MaxRecentClientsResults)
.ToList();
and the generated SQL query is:
SELECT DISTINCT TOP(5) [t].*
FROM (
SELECT TOP(1000) [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
ORDER BY [docCase].[LastUpdateDate] DESC
)
AS [t]
Everything is correct so far. But if I delete this line
.Take(1000) // I need this because...
EF generates a completely different query:
SELECT DISTINCT TOP(5)
[docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
My question is: why does EF not generate the ORDER BY clause and the subquery with DISTINCT?
Is this an EF bug, or am I doing something wrong? And what must I do in LINQ to generate this SQL query:
SELECT DISTINCT TOP 5 [t].*
FROM ( SELECT [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON [docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id]
WHERE [docCase].[AssignedByAgentId] = 1
ORDER BY [docCase].[LastUpdateDate] DESC
) AS [t]
OrderBy information is not always retained across other operators such as Distinct. Entity Framework does not document (to my knowledge) exactly how OrderBy is propagated.
This kind of makes sense, because some operators have an undefined output order. The fact that ordering is retained in many situations is a convenience for the developer.
Move the OrderBy to the end of the query (or at least past the Distinct).
The reason for the difference in queries is that Distinct messes up result order. So when you first execute OrderBy and then Distinct, you might just as well not execute OrderBy, because that order is lost anyway. So EF can simply optimize it away.
Calling Take in between makes the result set semantically different: you first order the items, take the first 1000 items in that order, and then call Distinct on them.
What you can change in your query depends mainly on the result you want to achieve. Maybe you want to make the result set distinct first, then order by date, and finally take the number of items you need. Other options are also conceivable, depending on your requirements.
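For instance, one way to express "distinct clients, most recent first" without relying on OrderBy surviving Distinct is to group by the client and order by the newest document per group. A sketch under that interpretation (the grouping shape is mine, not from the question; recent EF Core versions translate it to GROUP BY ... ORDER BY MAX(...), while older ones may evaluate parts client-side):
var recentClients = documentCaseRepository.Entities
    .Where(docCase => docCase.AssignedByAgentId == WC.UserContext.UserId)
    .GroupBy(doc => new { doc.AssignedToClient.Id, doc.AssignedToClient.FirstName, doc.AssignedToClient.LastName })
    .Select(g => new { Client = g.Key, LastUpdate = g.Max(doc => doc.LastUpdateDate) }) // newest document per client
    .OrderByDescending(x => x.LastUpdate)
    .Take(configuration.MaxRecentClientsResults)
    .Select(x => x.Client)
    .ToList();
Here the de-duplication is a GROUP BY, so the ordering is defined on the already-distinct set and cannot be optimized away.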

Linq to Entities strange results

I have the following view in my database:
SELECT YEAR(Received) AS YEAR,
MONTH(Received) AS MONTH,
LEFT(DATENAME(MONTH, Received), 3) AS MMM,
COUNT(Received) AS Submissions,
COUNT(Quoted) AS Quotes,
COUNT(Bound) AS Binders,
COALESCE (SUM(BndPremium), 0) AS Premium,
ProducerID
FROM dbo.Quote AS Q WITH (NOLOCK)
WHERE (Received >= DATEADD(year, - 1, GETDATE()))
GROUP BY ProducerID, YEAR(Received), MONTH(Received), DATENAME(MONTH, Received)
And I have added the view to my EDMX. I query the view this way:
var submissions = from s in db.WSS_PortalSubmissions
where s.ProducerID == ID
select s;
The result in 'submissions', however, is 12 copies of the first month rather than the results from the past 12 months. Running the query through LINQ today, I get 12 copies of the results from April 2016. If I run the query in SSMS, I get the expected results: a list of the last 12 months.
I have tried .ToList(), .ToArray(), and even some sorting of the results, but nothing changes; it only ever gives me 12 copies of the first month. Is there a reason for this that I'm not seeing?
I would change the view so that there is a unique column (or combination of columns), if there is not one already, and make sure it is mapped as the primary key in EF. Without a true key, EF's identity resolution kicks in: every row materializes with the same key values, so EF hands back the same cached entity for each row.
If that is not an option, try changing the code to
var submissions = from s in db.WSS_PortalSubmissions.AsNoTracking()
where s.ProducerID == ID
select s;
so that EF does not track the entity. This should force it to return exactly the results of the query, without trying to match rows to already-tracked entities by primary key.
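The question uses an EDMX (EF6), but for readers hitting the same symptom on EF Core, the equivalent fix is to map the view as keyless, or to give it a genuine key. A sketch with hypothetical type and column names:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<WssPortalSubmission>(b =>
    {
        b.ToView("WSS_PortalSubmissions");
        b.HasNoKey(); // no identity resolution: rows come back exactly as the view returns them
        // Alternatively, if (ProducerID, YEAR, MONTH) is genuinely unique, map it as the key instead of HasNoKey:
        // b.HasKey(x => new { x.ProducerID, x.YEAR, x.MONTH });
    });
}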

SQL TOP 1 vs System.Linq FirstOrDefault

I am rewriting a stored procedure in C#. The problem is that in the SProc there is a query like this:
select top 1 *
from ClientDebt
where ClientID = 11234
order by Balance desc
For example: I have a client with 3 debts, all of which have the same balance. The debt Ids are 1, 2, 3.
The C# equivalent of that query is:
debts.OrderByDescending(d => d.Balance)
.FirstOrDefault()
debts represents the client's 3 debts.
The interesting part is that the SQL returns the debt with Id 2, but the C# code returns Id 1.
Id 1 makes sense to me, but in order to keep the functionality the same, I need to change the C# code to return the middle one.
I am not sure what the logic is behind SQL's TOP 1 when several rows match the query.
The query will select one debt and update the database. I would like the LINQ to return the same result as the SQL.
Thanks
debts.OrderByDescending(d => d.Balance).ThenByDescending(d => d.Id)
    .FirstOrDefault()
Note that this makes the LINQ side deterministic (among tied balances it returns the highest Id), but it does not reproduce the SProc's arbitrary pick of Id 2; a tie-break that was never specified cannot be reproduced. To keep both sides in sync, add the same tiebreaker to the stored procedure's ORDER BY as well.
You can start SQL Profiler, execute the stored procedure, review the result, and then catch the query the application sends through LINQ and review its result as well.
You can also easily view the execution plan of your procedure and try to optimize it; with a LINQ query, you cannot do this as easily.
AFAIK, in SQL, if you select rows without an ORDER BY, the result order is undefined; in practice it often appears to follow the clustered index (commonly the primary key), but that is not guaranteed.
Likewise, an ORDER BY [field] clause does not implicitly fall back to the primary key for ties: rows with equal [field] values can come back in any order, so add an explicit tiebreaker if you need deterministic results.

Enhance performance of large slow dataloading query

I'm trying to load data from Oracle to SQL Server. (Sorry for not writing this before.)
I have a table (actually a view which has data from different tables) with at least 1 million records. I designed my package in such a way that I have functions for the business logic and call them in the select query directly.
Ex:
X1(id varchar2)
x2(id varchar2, d1 date)
x3(id varchar2, d2 date)
Select id, x, y, z, decode (.....), x1(id), x2(id), x3(id)
FROM Table1
Note: my table has 20 columns, and I call 5 different functions on at least 6-7 columns.
Some functions compare the parameters passed with an audit table and perform logic.
How can I improve the performance of my query, or is there a better way to do this?
I tried doing it in C# code, but the initial select of records is too large for a DataSet and I get an OutOfMemoryException.
My function does selects and then performs logic, for example:
Function(c_x2, eid)
Select col1
into p_x1
from tableP
where eid = eid;
IF (p_x1 IS NULL) THEN
ret_var := 'INITIAL';
ELSIF (p_x1 = 'L') AND (c_x2 = 'A') THEN
ret_var:= 'RL';
INSERT INTO Audit
(old_val, new_val, audit_event, id, pname)
VALUES
(p_x1, c_x2, 'RL', eid, 'PackageProcName');
ELSIF (p_x1 = 'A') AND (c_x2 = 'L') THEN
ret_var := 'GL';
INSERT INTO Audit
(old_val, new_val, audit_event, id, pname)
VALUES
(p_x1, c_x2, 'GL', eid, 'PackageProcName');
END IF;
RETURN ret_var;
I'm getting each row, performing the logic in C#, and then inserting.
If possible, INSERT from the SELECT:
INSERT INTO YourNewTable
(col1, col2, col3)
SELECT
col1, col2, col3
FROM YourOldTable
WHERE ....
This will run significantly faster than a single query where you then loop over the result set and issue an INSERT for each row.
EDIT, as for the OP's question edit:
You should be able to replace the function call with plain SQL in your query. Mimic the 'INITIAL' case using a LEFT JOIN on tableP, and the 'RL' or 'GL' values can be calculated using CASE.
EDIT, based on the OP's recent comments:
Since you are loading data from Oracle into SQL Server, this is what I would do. Most people that could help have moved on and will not read this question again, so open a new question where you state: 1) you need to load data from Oracle (version) to SQL Server (version); 2) currently you are loading it with one query, processing each row in C#, and inserting it into SQL Server, and it is slow; plus all the other details. There are much better ways of bulk loading data into SQL Server. As for this question, you could accept an answer, answer it yourself explaining that you need to ask a new question, or just leave it unaccepted.
My recommendation is that you do not use functions and then call them within other SELECT statements. This:
SELECT t.id, ...
x1(t.id) ...
FROM TABLE t
...is equivalent to:
SELECT t.id, ...
(SELECT x.column FROM x1 x WHERE x.id = t.id)
FROM TABLE t
Encapsulation doesn't work in SQL the way it does in C# and similar languages. While the approach makes maintenance easier, performance suffers because the sub-selects execute for every row returned.
A better approach is to update the supporting function to include the join criteria (i.e. "where x.id = t.id", for lack of a real one) in the SELECT:
SELECT x.id,
x.column
FROM x1 x
...so you can use it as a JOIN:
SELECT t.id, ...
x1.column
FROM TABLE t
JOIN (SELECT x.id,
x.column
FROM MY_PACKAGE.x) x1 ON x1.id = t.id
I prefer that to having to incorporate the function logic into the queries, for the sake of maintenance, but sometimes it can't be helped.
Personally, I'd create an SSIS import to do this task. Using a bulk insert you can improve speed dramatically, and SSIS can handle the functions part after the bulk insert.
First you need to find where the performance problem actually is. Then you can look at trying to solve it.
What is the performance of the view like? How long does it take the view to execute without any of the function calls? Try running the command:
create table the_view_table
as
select *
from the_view;
How well does it perform? Does it take 1 minute or 1 hour?
How well do the functions perform? According to the description, you are making approximately 5 million function calls, so they had better be pretty efficient! Also, are the functions defined as deterministic? If they are declared with the DETERMINISTIC keyword, Oracle has a chance of optimizing away some of the calls.
Is there a way of reducing the number of function calls? The functions are being called once the view has been evaluated and the million rows of data are available. But are all the input values really only available at the top level of the query? Can the function calls be embedded into the view at a lower level? Consider the following two queries; which would be quicker?
select
f.dim_id,
d.dim_col_1,
long_slow_function(d.dim_col_2) as dim_col_2
from large_fact_table f
join small_dim_table d on (f.dim_id = d.dim_id)
select
f.dim_id,
d.dim_col_1,
d.dim_col_2
from large_fact_table f
join (
select
dim_id,
dim_col_1,
long_slow_function(dim_col_2) as dim_col_2
from small_dim_table) d on (f.dim_id = d.dim_id)
Ideally, the second query should run quicker, as it calls the function fewer times (once per dimension row rather than once per joined fact row).
The performance issue could be in any of these places, and until you investigate it, it is difficult to know where to direct your tuning efforts.
A couple of tips:
Don't load all records into RAM; process them one by one.
Try to run as many functions on the client as possible. Databases are really slow at executing user-defined functions.
If you need to join two tables, it's sometimes possible to create two connections on the client. Fetch the main data with connection 1 and the audit data with connection 2. Order the data for both connections in the same way so you can read single records from both connections and perform whatever you need on them.
If your functions always return the same result for the same input, use a computed column or a materialized view. The database will run the function once and save the result in a table somewhere. That will make INSERTs slow but SELECTs quick.
Create a sorted index on your table.
See an introduction to SQL Server indexes; other RDBMSs are similar.
Edit, since you edited your question:
Using a view is even more sub-optimal, especially when querying single rows from it. I think your "business functions" are actually something like stored procedures?
As others suggested, in SQL always go set-based. I assumed you had already done that, hence my tip to start using indexing.
