Entity Framework Skip method running very slow - C#

I'm using Entity Framework 5, ObjectContext and POCOs on my data access layer. I have a generic repository implementation, and I have a method that queries the database with paging using Skip() and Take(). Everything works fine, except that query performance is very slow when skipping a lot of rows (I'm talking about 170k rows).
This is an excerpt of my query on Linq to Entities:
C# Code:
ObjectContext oc = TheOBJEntitiesFactory.CreateOBJEntitiesContext(connection);
var idPred = oc.CreateObjectSet<view_Trans>("view_Trans").AsQueryable();
// OrderBy(string, bool) is not a standard LINQ overload; presumably a custom
// extension that sorts by column name and direction (see the sketch below)
idPred = idPred.OrderBy(sortColumn, sortDirection.ToLower().Equals("desc"));
var result = idPred.Skip(iDisplayStart).Take(iDisplayLength);
return new PagedResult<view_Trans>(result, totalRecords);
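For context, the OrderBy(sortColumn, ...) call above is not a standard LINQ overload, so it is presumably a custom extension. A minimal sketch of what such an extension could look like (hypothetical, not the asker's actual helper), building the key selector from the column name at runtime:

using System.Linq;
using System.Linq.Expressions;

public static class QueryableSortExtensions
{
    public static IQueryable<T> OrderBy<T>(this IQueryable<T> source,
                                           string column, bool descending)
    {
        var param = Expression.Parameter(typeof(T), "x");
        var body = Expression.Property(param, column);
        var keySelector = Expression.Lambda(body, param);
        string method = descending ? "OrderByDescending" : "OrderBy";
        // Build Queryable.OrderBy[Descending]<T, TKey>(source, keySelector)
        var call = Expression.Call(typeof(Queryable), method,
            new[] { typeof(T), body.Type },
            source.Expression, Expression.Quote(keySelector));
        return source.Provider.CreateQuery<T>(call);
    }
}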
In the query translated to Transact-SQL, I noticed that instead of applying the ROW_NUMBER() clause to the view directly, it makes a sub-query and applies ROW_NUMBER() to the results of that sub-query...
example:
select top(10) [Extent1].[A], [Extent1].[B], [Extent1].[C] from (
    select [Extent1].[A], [Extent1].[B], [Extent1].[C],
        row_number() OVER (ORDER BY [Extent1].[A] DESC) AS [row_number]
    from (
        select A, B, C from table as Extent1
    ) as [Extent1]
) as [Extent1]
WHERE [Extent1].[row_number] > 176610
ORDER BY [Extent1].[A] DESC
This takes about 165 seconds to complete. Any idea on how to improve the performance of the translated query statement?

For those not following the comments above: I suspected the problem was not the extra SELECT, since that extra SELECT is present in many, many EF queries which do not take 165s to run. I eventually noticed that his ObjectSet referenced a VIEW and wondered if that might be part of the problem. After some experimentation, he narrowed the problem down to a LEFT JOIN inside the view. I suggested that he run the Database Tuning Advisor on that query; he did, and the two indexes it suggested fixed the problem.

One reason for the slowness is probably that your SQL is ordering the rows twice.
To control the query, the only option I know of is to call idPred.SqlQuery("Select ...", params). This will let you write your own optimized query for the data request.
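As a rough illustration of that suggestion (my sketch, not the answerer's code; the column names are placeholders, and it assumes SQL Server 2012 or later for OFFSET/FETCH), the hand-written paging query could look like this:

// Hand-rolled paging via ObjectContext.ExecuteStoreQuery; the {0}/{1}
// placeholders are turned into SQL parameters. On older SQL Server versions,
// keep the ROW_NUMBER() form instead of OFFSET/FETCH.
string sql =
    "SELECT A, B, C FROM view_Trans " +
    "ORDER BY A DESC " +
    "OFFSET {0} ROWS FETCH NEXT {1} ROWS ONLY";
var page = oc.ExecuteStoreQuery<view_Trans>(sql, iDisplayStart, iDisplayLength).ToList();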

Related

Entity Framework Core 2 LINQ - each join runs as a separate query

I have something like this in my C# MVC controller:
from table1 in db.Table1.AsQueryable()
join table2 in db.Table2.AsQueryable() on table1.Col1 equals table2.Col1
join table3 in db.Table3.AsQueryable() on new { table2.Col2, table2.Col5 } equals new { table3.Col2, table3.Col5 }
// ... a few more joins ...
where ......
select new { table1.Prop1, table2.Prop2, table3.Prop3 }
When I watched what it runs in SQL Profiler, I was expecting a single query with all the joins. What it does instead is select all columns from all tables in separate queries, i.e. it runs
SELECT * FROM Table2 --Instead of * it has all column names
when that's finished running, it runs
SELECT * FROM Table3 --Instead of * it has all column names
and so on for each table. The tables are big, so it takes too long and uses a lot of memory. I added AsQueryable() on the entities but it didn't make a difference; there are still multiple queries. db is a DbContext, using EF Core 2.
How can I change the LINQ or some other setting so the whole thing runs as a single query?
Update
It looks like the problem was caused by calling Convert.ToInt32() on one of the join columns. The int column I was joining on is nullable in one table and non-nullable in the other, and I had Convert.ToInt32() on the nullable side; removing the convert generated a single query.
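For illustration, a sketch of the difference (entity and column names are hypothetical, echoing the update above; assume Table2.Col2 is int? and Table3.Col2 is int):

// Convert.ToInt32 is not translatable, so EF Core 2 silently falls back to
// client evaluation and pulls each table with its own SELECT:
var slow = from t2 in db.Table2
           join t3 in db.Table3
               on Convert.ToInt32(t2.Col2) equals t3.Col2
           select new { t2.Prop2, t3.Prop3 };

// Lifting the non-nullable side to int? keeps the whole join in SQL, so a
// single query is sent:
var fast = from t2 in db.Table2
           join t3 in db.Table3
               on t2.Col2 equals (int?)t3.Col2
           select new { t2.Prop2, t3.Prop3 };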
According to the LINQ to SQL documentation:
When you query for an object, you actually retrieve only the object you requested. The related objects are not automatically fetched at the same time.
The DataLoadOptions class provides two methods to achieve immediate loading of specified related data. The LoadWith method allows for immediate loading of data related to the main target. The AssociateWith method allows for filtering related objects.
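A minimal sketch of the LoadWith method mentioned in that quote (the DataContext and entity names here are hypothetical):

using System.Data.Linq; // LINQ to SQL

var db = new MyDataContext();               // hypothetical DataContext
var options = new DataLoadOptions();
// Eagerly load the related Table2 rows whenever a Table1 row is materialized
options.LoadWith<Table1>(t1 => t1.Table2s);
db.LoadOptions = options;                   // must be set before querying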
This is an issue with Lazy Loading vs Eager Loading.
This is a great post with very good explanation.
Lazy Loading And Eager Loading In LINQ To SQL

Why is this Entity Framework code faster than its native SQL equivalent? (in MySQL)

I'm running Entity Framework 5 in my C# .NET application, against a MySQL database using the MySQL .NET connector 6.6.5.
Usually, when EF isn't fast enough, I resort to either stored procedures or direct SQL execution with context.Database.SqlQuery. However, I recently had an issue where a SQL call actually took far longer than its EF equivalent, and I wondered if anybody knows why?
Here's the (slow) SQL query:
public sbyte? getFirstRouteTypeFromStop(string primaryCode) {
string sql = string.Format("SELECT r.route_type FROM stoptimes st INNER JOIN trips t ON st.trip_id = t.trip_id INNER JOIN routes r ON t.route_id = r.route_id WHERE st.stop_id = '{0}' LIMIT 1;", primaryCode);
return context.Database.SqlQuery<sbyte?>(sql).FirstOrDefault();
}
Here's the (fast) EF code:
public sbyte? getFirstRouteTypeFromStop(string primaryCode) {
return context.stoptimes.Where(st => st.stop_id.Equals(primaryCode)).FirstOrDefault().trip.route.route_type;
}
This method gets called repeatedly in a loop and EF is a LOT faster. (at least 1000%)
Why?
Important Notes:
The MySQL database has all these columns appropriately indexed.
When the native SQL query is run directly in MySQL it seems to execute much faster than when run in the C# app - I suspect this is quite an important observation.
You are using an INNER JOIN in the SqlQuery, which conceptually means:
total rows considered = (no. of rows in stoptimes) * (no. of rows in routes) * (no. of rows in trips)
and then "WHERE st.stop_id = '{0}'" is executed on that huge list...
I suspect this is the problem in the SQL query... the inner joins make a huge selection and then filter the records from that...
Whereas the EF code filters only on the stoptimes table:
Where(st => st.stop_id.Equals(primaryCode))
so the filtering is fast, and then the route and trip are fetched for the single selected record.
Note: try using a LEFT OUTER JOIN... It will make your query faster.
Hope it helps...
Maybe you have all or some stoptimes (and navigation properties like trip and route) already in memory. If this is the case, EF is only hitting the database once (or at least not in every iteration of the loop). With SqlQuery you go to the database in every iteration of the loop. Maybe primaryCode repeats a lot in the loop? Or maybe you retrieve stoptimes earlier in your app without disposing the context?
Try changing the SQL to
SELECT r.route_type
FROM stoptimes st
JOIN trips t
ON st.trip_id = t.trip_id
AND st.stop_id = '{0}'
JOIN routes r
ON t.route_id = r.route_id
LIMIT 1;
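Separately (my addition, not part of the answers above): the slow version formats primaryCode straight into the SQL text, so every call in the loop sends a different literal. A hedged sketch of a parameterized version, assuming the MySQL connector's MySqlParameter type:

using MySql.Data.MySqlClient;

public sbyte? getFirstRouteTypeFromStop(string primaryCode) {
    const string sql =
        "SELECT r.route_type FROM stoptimes st " +
        "INNER JOIN trips t ON st.trip_id = t.trip_id " +
        "INNER JOIN routes r ON t.route_id = r.route_id " +
        "WHERE st.stop_id = @stopId LIMIT 1";
    // The statement text is now identical across iterations; only the
    // parameter value changes
    return context.Database.SqlQuery<sbyte?>(
        sql, new MySqlParameter("@stopId", primaryCode)).FirstOrDefault();
}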

Why does the Entity Framework generate nested SQL queries?

Why does the Entity Framework generate nested SQL queries?
I have this code
var db = new Context();
var result = db.Network.Where(x => x.ServerID == serverId)
.OrderBy(x=> x.StartTime)
.Take(limit);
Which generates this! (Note the double select statement)
SELECT
`Project1`.`Id`,
`Project1`.`ServerID`,
`Project1`.`EventId`,
`Project1`.`StartTime`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`ServerID`,
`Extent1`.`EventId`,
`Extent1`.`StartTime`
FROM `Networkes` AS `Extent1`
WHERE `Extent1`.`ServerID` = @p__linq__0) AS `Project1`
ORDER BY
`Project1`.`StartTime` DESC LIMIT 5
What should I change so that it results in one select statement? I'm using MySQL and Entity Framework with Code First.
Update
I have the same result regardless of the type of the parameter passed to the OrderBy() method.
Update 2: Timed
Total Time (hh:mm:ss.ms) 05:34:13.000
Average Time (hh:mm:ss.ms) 25:42.000
Max Time (hh:mm:ss.ms) 51:54.000
Count 13
First Seen Nov 6, 12 19:48:19
Last Seen Nov 6, 12 20:40:22
Raw query:
SELECT `Project?`.`Id`, `Project?`.`ServerID`, `Project?`.`EventId`, `Project?`.`StartTime` FROM (SELECT `Extent?`.`Id`, `Extent?`.`ServerID`, `Extent?`.`EventId`, `Extent?`.`StartTime` FROM `Network` AS `Extent?` WHERE `Extent?`.`ServerID` = ?) AS `Project?` ORDER BY `Project?`.`StartTime` DESC LIMIT ?
I used a program to take snapshots of the current processes in MySQL.
Other queries were executed at the same time, but when I change it to just one SELECT statement, it NEVER goes over one second. Maybe something else is going on; I'm asking 'cause I'm not so into DBs...
Update 3: The explain statement
The Entity Framework generated
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '46', 'Using filesort'
'2', 'DERIVED', 'Extent?', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', '', '45', 'Using where'
One liner
'1', 'SIMPLE', 'network', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', 'const', '45', 'Using where; Using filesort'
This is from my QA environment, so the timing I pasted above is not related to the row counts in the explain statements. I think there are about 500,000 records matching one server ID.
Solution
I switched from MySQL to SQL Server, as I didn't want to end up completely rewriting the application layer.
It's the easiest way to build the query logically from the expression tree. Usually performance will not be an issue, but if you are having performance problems you can try something like this to get the entities back:
var results = db.ExecuteStoreQuery<Network>(
    "SELECT Id, ServerID, EventId, StartTime FROM Network WHERE ServerID = {0}",
    serverId); // {0} is turned into a DbParameter by ExecuteStoreQuery
// Note: the ordering and paging now happen in memory, after the query has run
var page = results.OrderBy(x => x.StartTime).Take(limit);
My initial impression was that doing it this way would actually be more efficient, although in testing against a MSSQL server, I got <1 second responses regardless.
With a single select statement, it sorts all the records (Order By), and then filters them to the set you want to see (Where), and then takes the top 5 (Limit 5 or, for me, Top 5). On a large table, the sort takes a significant portion of the time. With a nested statement, it first filters the records down to a subset, and only then does the expensive sort operation on it.
Edit: I did test this, but I realized I had an error in my test which invalidated it. Test results removed.
Why does Entity Framework produce a nested query? The simple answer is because Entity Framework breaks your query expression down into an expression tree and then uses that expression tree to build your query. A tree naturally generates nested query expressions (i.e. a child node generates a query and a parent node generates a query on that query).
Why doesn't Entity Framework simplify the query down and write it as you would? The simple answer is because there is a limited amount of work that can go into the query generation engine, and while it's better now than it was in earlier versions it's not perfect and probably never will be.
All that said there should be no significant speed difference between the query you would write by hand and the query EF generated in this case. The database is clever enough to generate an execution plan that applies the WHERE clause first in either case.
If you want to get the EF to generate the query without the subselect, use a constant within the query, not a variable.
I have previously created my own .Where and other LINQ methods that first traverse the expression tree and convert all variables, method calls, etc. into Expression.Constant nodes. I did this precisely because of this issue in Entity Framework...
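A rough sketch of that idea (my reconstruction, not the answerer's actual code): an ExpressionVisitor that evaluates closure member accesses and inlines them as Expression.Constant nodes before the query reaches the EF provider:

using System.Linq.Expressions;

class ConstantInliner : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        // Captured local variables show up as field accesses on a
        // compiler-generated closure object, i.e. a MemberExpression over a
        // ConstantExpression. Evaluate them and inline the value.
        if (node.Expression is ConstantExpression)
        {
            object value = Expression.Lambda(node).Compile().DynamicInvoke();
            return Expression.Constant(value, node.Type);
        }
        return base.VisitMember(node);
    }
}

// Usage idea (inside a custom Where extension):
// Expression<Func<Network, bool>> pred = x => x.ServerID == serverId;
// pred = (Expression<Func<Network, bool>>)new ConstantInliner().Visit(pred);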
I just stumbled upon this post because I suffer from the same problem. I have already spent days tracking this down, and it is just poor query generation in MySQL.
I already filed a bug at mysql.com http://bugs.mysql.com/bug.php?id=75272
To summarize the problem:
This simple query
context.products
.Include(x => x.category)
.Take(10)
.ToList();
gets translated into
SELECT
`Limit1`.`C1`,
`Limit1`.`id`,
`Limit1`.`name`,
`Limit1`.`category_id`,
`Limit1`.`id1`,
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`
and performs pretty well. Anyway, the outer query is pretty much useless. Now if I add an OrderBy
context.products
.Include(x => x.category)
.OrderBy(x => x.id)
.Take(10)
.ToList();
the query changes to
SELECT
`Project1`.`C1`,
`Project1`.`id`,
`Project1`.`name`,
`Project1`.`category_id`,
`Project1`.`id1`,
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
ORDER BY
`Project1`.`id` ASC LIMIT 10
This is bad because the ORDER BY is in the outer query. That means MySQL has to pull every record in order to perform the ORDER BY, which results in a filesort.
I verified that SQL Server (Compact, at least) does not generate nested queries for the same code
SELECT TOP (10)
[Extent1].[id] AS [id],
[Extent1].[name] AS [name],
[Extent1].[category_id] AS [category_id],
[Extent2].[id] AS [id1],
[Extent2].[name] AS [name1]
FROM [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC
Admittedly, the queries generated by Entity Framework are a bit ugly (less so than LINQ to SQL's, but still ugly).
However, your database engine will very probably produce the desired execution plan, and the query will run smoothly.

Does LINQ with a scalar result trigger lazy loading

I read the Loading Related Entities post by the Entity Framework team and got a bit confused by the last paragraph:
Sometimes it is useful to know how many entities are related to another entity in the database without actually incurring the cost of loading all those entities. The Query method with the LINQ Count method can be used to do this. For example:
using (var context = new BloggingContext())
{
var blog = context.Blogs.Find(1);
// Count how many posts the blog has
var postCount = context.Entry(blog)
.Collection(b => b.Posts)
.Query()
.Count();
}
Why are the Query + Count methods needed here?
Can't we simply use LINQ's Count method instead?
var blog = context.Blogs.Find(1);
var postCount = blog.Posts.Count();
Will that trigger lazy loading, with the whole collection loaded into memory, and only then will I get my desired scalar value?
You will get your desired scalar value in both cases. But consider the difference in what's happening.
With .Query().Count() you run a query on the database of the form SELECT COUNT(*) FROM Posts and assign that value to your integer variable.
With .Posts.Count(), you run (something like) SELECT * FROM Posts on the database (much more expensive already). Each row of the result is then mapped field-by-field into your C# object type as the collection is enumerated to find your count. By asking for the count in this way, you force all of the data to be loaded so that C# can count how many rows there are.
Hopefully it's obvious that asking the database for the count of rows (without actually returning all of those rows) is much more efficient!
The first method does not load all rows, since Count is invoked on an IQueryable; the second method loads all rows, since it is invoked on an ICollection.
I did some testing to verify this. I tested with Table1 and Table2, where Table1 has the PK Id and Table2 has the FK Id1 (1:N). I used EF Profiler from http://efprof.com/.
First method:
var t1 = context.Table1.Find(1);
var count1 = context.Entry(t1)
.Collection(t => t.Table2)
.Query()
.Count();
No SELECT * FROM Table2 is issued:
SELECT TOP (2) [Extent1].[Id] AS [Id]
FROM [dbo].[Table1] AS [Extent1]
WHERE [Extent1].[Id] = 1 /* @p0 */
SELECT [GroupBy1].[A1] AS [C1]
FROM (SELECT COUNT(1) AS [A1]
FROM [dbo].[Table2] AS [Extent1]
WHERE [Extent1].[Id1] = 1 /* @EntityKeyValue1 */) AS [GroupBy1]
Second method:
var t1 = context.Table1.Find(1);
var count2 = t1.Table2.Count();
Table2 is loaded into memory:
SELECT TOP (2) [Extent1].[Id] AS [Id]
FROM [dbo].[Table1] AS [Extent1]
WHERE [Extent1].[Id] = 1 /* @p0 */
SELECT [Extent1].[Id] AS [Id],
[Extent1].[Id1] AS [Id1]
FROM [dbo].[Table2] AS [Extent1]
WHERE [Extent1].[Id1] = 1 /* @EntityKeyValue1 */
Why is this happening?
The result of Collection(t => t.Table2) is a class that implements ICollection, but it does not load all the rows, and it has a property named IsLoaded. The result of the Query method is an IQueryable, which allows calling Count without preloading rows.
The result of t1.Table2 is an ICollection, and it loads all rows to get the count.
By the way, even if you only touch t1.Table2 without asking for the count, the rows are loaded into memory.
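To see this in code, a small sketch along the lines of the tests above (same hypothetical Table1/Table2 model):

var entry = context.Entry(t1).Collection(t => t.Table2);
bool before = entry.IsLoaded;            // false: nothing fetched yet
var first = t1.Table2.FirstOrDefault();  // touching the property loads every row
bool after = entry.IsLoaded;             // true: the collection is now in memory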
The first solution doesn't trigger lazy loading because it most probably never accesses the collection property directly. The Collection method accepts an Expression, not just a delegate. It is used only to get the name of the property, which is then used to access mapping information and build the correct query.
Even if it did access the collection property, it could use the same strategy as other internal parts of EF (for example validation), which turn off lazy loading temporarily before accessing navigation properties to avoid unexpected lazy loading.
By the way, this is a huge improvement over the ObjectContext API, where building the query required accessing the navigation property and thus could trigger lazy loading.
There is one more difference between those two approaches:
The first always executes a query against the database and returns the count of the items in the database.
The second queries the database only once, to load all the items, and then returns the count of the items in the application, without checking the state in the database.
As a third, quite interesting option, you can use extra lazy loading. The implementation by Arthur Vickers shows how to use a navigation property to get the count from the database without lazily loading the items.

Enhance performance of large, slow data-loading query

I'm trying to load data from Oracle to SQL Server (sorry for not writing this before).
I have a table (actually a view which combines data from different tables) with at least 1 million records. I designed my package in such a way that I have functions for the business logic and call them directly in the select query.
Ex:
X1(id varchar2)
x2(id varchar2, d1 date)
x3(id varchar2, d2 date)
Select id, x, y, z, decode (.....), x1(id), x2(id), x3(id)
FROM Table1
Note: my table has 20 columns, and I call 5 different functions on at least 6-7 columns.
Some of the functions compare the parameters passed against an audit table and perform logic.
How can I improve the performance of my query, or is there a better way to do this?
I tried doing it in C# code, but the initial select of records is too large for a DataSet and I get an OutOfMemoryException.
My function does selects and then performs logic. For example:
Function(c_x2, eid)
  Select col1
  into p_x1
  from tableP
  where eid = eid;
  -- NB: the parameter name shadows the column here, so "eid = eid" resolves to
  -- column = column (kept as written); also, if no row matches, SELECT INTO
  -- raises NO_DATA_FOUND rather than leaving p_x1 NULL
  IF p_x1 IS NULL THEN   -- "p_x1 = NULL" is never true in SQL; IS NULL intended
    ret_var := 'INITIAL';
  ELSIF p_x1 = 'L' AND c_x2 = 'A' THEN
    ret_var := 'RL';
    INSERT INTO Audit
      (old_val, new_val, audit_event, id, pname)
    VALUES
      (p_x1, c_x2, 'RL', eid, 'PackageProcName');
  ELSIF p_x1 = 'A' AND c_x2 = 'L' THEN
    ret_var := 'GL';
    INSERT INTO Audit
      (old_val, new_val, audit_event, id, pname)
    VALUES
      (p_x1, c_x2, 'GL', eid, 'PackgProcName');
  END IF;
  RETURN ret_var;
I'm getting each row, performing the
logic in C#, and then inserting.
If possible INSERT from the SELECT:
INSERT INTO YourNewTable
(col1, col2, col3)
SELECT
col1, col2, col3
FROM YourOldTable
WHERE ....
This will run significantly faster than a single query where you then loop over the result set and issue an INSERT for each row.
EDIT, as for the OP's question edit:
You should be able to replace the function calls with plain SQL in your query. Mimic the "INITIAL" case using a LEFT JOIN to tableP, and calculate the "RL" or "GL" values using CASE.
EDIT, based on the OP's recent comments:
Since you are loading data from Oracle into SQL Server, this is what I would do: most people who could help have moved on and will not read this question again, so open a new question where you say: 1) you need to load data from Oracle (version) to SQL Server (version); 2) currently you load it with one query, processing each row in C# and inserting it into SQL Server, and it is slow; and all the other details. There are much better ways of bulk loading data into SQL Server. As for this question, you could accept an answer, answer it yourself explaining that you need to ask a new question, or just leave it unaccepted.
My recommendation is that you do not use functions and then call them within other SELECT statements. This:
SELECT t.id, ...
x1(t.id) ...
FROM TABLE t
...is equivalent to:
SELECT t.id, ...
(SELECT x.column FROM x1 x WHERE x.id = t.id)
FROM TABLE t
Encapsulation doesn't work in SQL the way it does in C# and similar languages. While this approach makes maintenance easier, performance suffers because the sub-selects execute once for every row returned.
A better approach would be to update the supporting function to include the join criteria (i.e. "where x.id = t.id", for lack of the real one) in the SELECT:
SELECT x.id
x.column
FROM x1 x
...so you can use it as a JOIN:
SELECT t.id, ...
x1.column
FROM TABLE t
JOIN (SELECT x.id,
x.column
FROM MY_PACKAGE.x) x1 ON x1.id = t.id
I prefer that to having to incorporate the function logic into the queries for sake of maintenance, but sometimes it can't be helped.
Personally, I'd create an SSIS import to do this task. Using a bulk insert you can improve speed dramatically, and SSIS can handle the functions part after the bulk insert.
Firstly you need to find where the performance problem actually is. Then you can look at trying to solve it.
What is the performance of the view like? How long does the view take to execute
without any of the function calls? Try running the command:
create table the_view_table
as
select *
from the_view;
How well does it perform? Does it take 1 minute or 1 hour?
How well do the functions perform? According to the description, you are making approximately 5 million function calls; they had better be pretty efficient! Also, are the functions defined as deterministic? If the functions are declared with the DETERMINISTIC keyword, Oracle has a chance of optimizing away some of the calls.
Is there a way of reducing the number of function calls? The functions are called once the view has been evaluated and the million rows of data are available. But are all the input values from the highest level of the query? Can the function calls be embedded into the view at a lower level? Consider the following two queries. Which would be quicker?
select
f.dim_id,
d.dim_col_1,
long_slow_function(d.dim_col_2) as dim_col_2
from large_fact_table f
join small_dim_table d on (f.dim_id = d.dim_id)
select
f.dim_id,
d.dim_col_1,
d.dim_col_2
from large_fact_table f
join (
select
dim_id,
dim_col_1,
long_slow_function(dim_col_2) as dim_col_2
from small_dim_table) d on (f.dim_id = d.dim_id)
Ideally, the second query should run quicker, as it calls the function fewer times (once per dimension row instead of once per joined fact row).
The performance issue could be in any of these places and until you investigate the issue, it would be difficult to know where to direct your tuning efforts.
A couple of tips:
Don't load all records into RAM; process them one by one (see the SqlBulkCopy sketch after this list).
Try to run as many functions on the client as possible. Databases are really slow at executing user-defined functions.
If you need to join two tables, it's sometimes possible to create two connections on the client. Fetch the main data with connection 1 and the audit data with connection 2. Order the data for both connections in the same way, so you can read single records from both connections and perform whatever you need on them.
If your functions always return the same result for the same input, use a computed column or a materialized view. The database will run the function once and save the result in a table somewhere. That will make INSERTs slow but SELECTs quick.
Create a sorted index on your table.
Introduction to SQL Server Indexes; other RDBMSs are similar.
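A hedged sketch of the first tip above: stream rows from Oracle straight into SQL Server with SqlBulkCopy instead of materializing a DataSet. The connection strings, the view name, the target table, and the Oracle.ManagedDataAccess provider are assumptions, not details from the question:

using System.Data.SqlClient;
using Oracle.ManagedDataAccess.Client; // assumed Oracle ADO.NET provider

static void CopyViewToSqlServer(string oracleConnStr, string sqlConnStr)
{
    using (var src = new OracleConnection(oracleConnStr))
    using (var dst = new SqlConnection(sqlConnStr))
    {
        src.Open();
        dst.Open();
        using (var cmd = new OracleCommand("SELECT id, x, y, z FROM the_view", src))
        using (var reader = cmd.ExecuteReader())   // rows stream one by one
        using (var bulk = new SqlBulkCopy(dst)
        {
            DestinationTableName = "dbo.TargetTable", // placeholder
            BatchSize = 10000
        })
        {
            bulk.WriteToServer(reader); // bulk insert without buffering a DataSet
        }
    }
}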
Edit, since you edited your question:
Using a view is even more sub-optimal, especially when querying single rows from it. I think your "business functions" are actually something like stored procedures?
As others suggested, in SQL always go set-based. I assumed you had already done that, hence my tip to start using indexing.
