Entity Framework Query Slow for Small Result Set - c#

So I have the below EF query that returns about 12,000 records from a flat table. I use a projection that selects only the necessary fields (about 15) and then puts them into a list of a custom class. It takes almost 3 seconds, which seems like a long time for 12,000 records. I've tried wrapping the entire thing in a transaction scope with "read uncommitted", and I've also tried using "AsNoTracking()". Neither made any difference. Does anyone have any idea why the performance on this would be so lousy?
List<InfoModel> results = new List<InfoModel>();
using (InfoData data = new InfoData())
{
    results = (from S in data.InfoRecords
               select new
               {
                   ...bunch of entity fields...
               }).AsEnumerable().Select(x => new InfoModel()
               {
                   ...bunch of model fields...
               }).ToList();
}

It's difficult to answer, because there are so many things that can affect performance: your network, the number of other requests hitting your SQL Server or Windows server, your model, ...
Although the quality of the generated queries and the performance of Entity Framework have improved a lot in the latest versions, it is still far from other options in terms of speed. There are some performance considerations you can look at: https://msdn.microsoft.com/en-us/data/hh949853.aspx
If speed matters and 3 seconds is too much for you, I probably wouldn't use Entity Framework to retrieve so many rows. For me, Entity Framework is great when you need just a few items, but not thousands if speed is important.
To improve speed you can use one of the many other ORMs, such as Dapper, the one used by Stack Overflow, which is pretty fast.
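For illustration, here is a minimal Dapper sketch of the same kind of projection. Treat it as a sketch under assumptions: the table name, column names, and InfoModel fields are invented here, since the original query elides the actual field list.

using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;

// Hypothetical model mirroring the projection in the question;
// the real class has ~15 fields.
public class InfoModel
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class InfoQueries
{
    public static List<InfoModel> LoadInfoModels(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            // Dapper materializes each row directly into InfoModel,
            // skipping EF's entity plumbing and change tracking.
            return connection
                .Query<InfoModel>("SELECT Id, Name FROM InfoRecords")
                .AsList();
        }
    }
}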

Related

Entity Framework Complex query with inner joins

I want to execute a complex query using Entity Framework (Code First).
I know it can be done with LINQ, but what is the best way to write a LINQ query for complex data with inner joins?
I want to execute the following query:
var v = from m in WebAppDbContext.UserSessionTokens
        from c in WebAppDbContext.Companies.Include(a => a.SecurityGroups)
        from n in WebAppDbContext.SecurityGroups.Include(x => x.Members)
        where m.TokenString == userTokenString &&
              n.Members.Contains(m.User) &&
              c.SecurityGroups.Contains(n)
        select c;
Is this the best way to do this?
Does this query fetch the entire list of records from the DB and then execute the filtering on that list? (It might be a huge list.)
And most important: Does it query the database several times?
In my opinion, and based on my own experience, when talking about performance, especially joining data sets, it's faster when you write it in SQL. But since you used the code-first approach, that's not an option. To answer your questions: your query will not hit the DB several times (you can try debugging and watching the Events log in VS). EF will transform your query into a single SQL statement and execute it.
TL;DR: don't micromanage the robots. Let them do their thing and 99% of the time you'll be fine. LINQ doesn't really expose methods to micromanage the underlying data query anyway; its whole purpose is to be an abstraction layer.
In my experience, the LINQ provider is pretty smart about converting valid LINQ syntax into "pretty good" SQL. It looks like your query is all inner joins, and those should all be on the primary indexes/foreign keys of each table, so it's going to come up with something pretty good.
If that's not enough to soothe your worries, I'd suggest:
Put a SQL trace on to see the actual query being presented to the database. I bet it's not as simple as Companies join SecurityGroups join Members join UserTokens, but it's probably equivalent to it.
Only worry about optimizing if this becomes an actual performance problem.
If it does become a performance problem, consider refactoring your problem space. It looks like you're trying to get a Company record from a UserToken, by going through three other tables. Is it possible to just relate Users and Companies directly, rather than through the security model? (please excuse my total ignorance of your problem space, I'm just saying "maybe look at the problem from another angle?")
In short, it sounds like you're burning brain cycles attempting to optimize something prematurely. Now, it's reasonable to think that if there is a performance problem, this section of the code could be responsible, but that would only be noticeable if you're running this query a lot. Since it starts from a security token, I'd guess you're doing this once per session to get the current user's contextual information. In that case, the problem isn't usually with LINQ, but with your approach to solving the problem of finding Company information. Can you cache the result?
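To make "another angle" concrete, here is a hedged sketch that walks from the token to the company through navigation properties instead of three cross joins. The User.Company navigation is an assumption about the model, not something the question confirms:

// Hypothetical: if User had a direct Company navigation property,
// the whole walk would collapse into one simple, index-friendly query.
var company = WebAppDbContext.UserSessionTokens
    .Where(t => t.TokenString == userTokenString)
    .Select(t => t.User.Company) // assumed navigation, not in the original model
    .FirstOrDefault();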

Reading thousands of objects with EF Core FAST

I am reading 40,000 small objects/rows from SQLite with EF Core, and it's taking 18 seconds, which is too long for my UWP app.
While this happens, CPU usage on a single core reaches 100%, but the disk read speed is around 1% of capacity.
var dataPoints = _db.DataPoints.AsNoTracking().ToArray();
Without AsNoTracking() the time taken is even longer.
DataPoint is a small POCO with a few primitive properties. Total amount of data I am loading is 4.5 MB.
public class DataPointDto
{
    [Key]
    public ulong Id { get; set; }

    [Required]
    public DateTimeOffset TimeStamp { get; set; }

    [Required]
    public bool trueTime { get; set; }

    [Required]
    public double Value { get; set; }
}
Question: Is there a better way of loading this many objects, or am I stuck with this level of performance?
Fun fact: x86 takes 11 seconds, x64 takes 18. 'Optimise code' shaves off a second. Using Async pushes execution time to 30 seconds.
Most answers follow the common wisdom of loading less data, but in some circumstances, such as here, you Absolutely Positively Must load a lot of entities. So how do we do that?
Cause of poor performance
Is it unavoidable for this operation to take this long?
Well, it's not. We are loading just a few megabytes of data from disk; the cause of the poor performance is that the data is split across 40,000 tiny entities. The database can handle that, but Entity Framework seems to struggle setting up all those entities, change tracking, and so on. If we do not intend to modify the data, there is a lot we can do.
I tried three things
Primitives
Load just one property, then you get a list of primitives.
List<double> dataPoints = _db.DataPoints.Select(dp => dp.Value).ToList();
This bypasses all of the entity creation normally performed by Entity Framework. This query took 0.4 seconds, compared to 18 seconds for the original query. That is a 45x (!) improvement.
Anonymous types
Of course, most of the time we need more than just an array of primitives.
We can create new objects right inside the LINQ query. Entity framework won't create the entities it normally would, and the operation runs much faster. We can use anonymous objects for convenience.
var query = db.DataPoints.Select(dp => new { ID = dp.sensorID, Timestamp = dp.TimeStamp, Value = dp.Value });
This operation takes 1.2 seconds, compared to 18 seconds for retrieving the same amount of data normally.
Tuples
I found that in my case using tuples instead of anonymous types improved performance a little; the following query executed roughly 30% faster:
var query = db.DataPoints.Select(dp => Tuple.Create(dp.sensorID, dp.TimeStamp, dp.Value));
Other ways
- You cannot use structs inside LINQ queries, so that's not an option.
- In many cases you can combine many records together to reduce the overhead associated with retrieving many individual records. By retrieving fewer, larger records you could improve performance. For instance, in my use case I've got some measurements that are being taken every 5 minutes, 24/7. At the moment I am storing them individually, and that's silly: nobody will ever query less than a day's worth of them. I plan to update this post when I make the change and find out how performance changed. (See the sketch just after this list for what that packing might look like.)
- Some recommend using an object-oriented DB or a micro ORM. I have never used either, so I can't comment.
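As a minimal sketch of that combining idea, assuming a hypothetical DailyDataPoint record that packs one day of readings into a single row (names and schema are illustrative only):

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical "wide" record: one row per day instead of 288 rows
// (one reading every 5 minutes, 24/7).
public class DailyDataPoint
{
    public ulong Id { get; set; }
    public DateTime Day { get; set; }
    public double[] Values { get; set; } // serialized to a blob column in practice
}

public static class DataPointPacking
{
    public static List<DailyDataPoint> Combine(IEnumerable<DataPointDto> points)
    {
        // Group individual readings by calendar day and pack each day's
        // values into one record, cutting the row count by ~288x.
        return points
            .GroupBy(p => p.TimeStamp.Date)
            .Select(g => new DailyDataPoint
            {
                Day = g.Key,
                Values = g.OrderBy(p => p.TimeStamp).Select(p => p.Value).ToArray()
            })
            .ToList();
    }
}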
You can use a different technique to load all your items: create your own logic to load parts of the data while the user is scrolling the ListView (I am guessing you are using one).
Fortunately, UWP has an easy way to do this technique: incremental loading. Please see the documentation and example:
https://msdn.microsoft.com/library/windows/apps/Hh701916
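A minimal sketch of what that could look like with ISupportIncrementalLoading, assuming a hypothetical MyDbContext and the DataPointDto from the question; the paging key and page size are illustrative:

using System.Collections.ObjectModel;
using System.Linq;
using System.Runtime.InteropServices.WindowsRuntime;
using System.Threading.Tasks;
using Windows.Foundation;
using Windows.UI.Xaml.Data;
using Microsoft.EntityFrameworkCore; // for AsNoTracking()

// A collection the ListView can ask for more items as the user
// scrolls, pulling one page at a time instead of all 40,000 rows.
public class IncrementalDataPoints : ObservableCollection<DataPointDto>, ISupportIncrementalLoading
{
    private const int PageSize = 200;
    private readonly MyDbContext _db; // assumed DbContext type
    private bool _hasMore = true;

    public IncrementalDataPoints(MyDbContext db) { _db = db; }

    public bool HasMoreItems => _hasMore;

    public IAsyncOperation<LoadMoreItemsResult> LoadMoreItemsAsync(uint count)
    {
        return AsyncInfo.Run(async _ =>
        {
            // Run the EF query off the UI thread; Count is the number
            // of items already loaded into this collection.
            var page = await Task.Run(() => _db.DataPoints.AsNoTracking()
                .OrderBy(p => p.Id)
                .Skip(Count)
                .Take(PageSize)
                .ToArray());

            foreach (var p in page) Add(p);
            _hasMore = page.Length == PageSize;
            return new LoadMoreItemsResult { Count = (uint)page.Length };
        });
    }
}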
Performance test on 26 million records (1 datetime, 1 double, 1 int), EF Core 3.1.5:
- Anonymous types or tuples, as suggested in the accepted answer: about 20 sec, 1.3 GB RAM
- Struct: about 15 sec, 0.8 GB RAM
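For reference, a hedged sketch of the struct variant measured above; the field names are assumptions matching the benchmark's shape (1 datetime, 1 double, 1 int), and the projection happens in the final Select, which EF Core evaluates on the client:

using System;
using System.Linq;

// Structs avoid one heap allocation per row, which is where the
// reported memory savings (1.3 GB -> 0.8 GB) would come from.
public readonly struct DataPointStruct
{
    public readonly DateTime TimeStamp;
    public readonly double Value;
    public readonly int SensorId;

    public DataPointStruct(DateTime timeStamp, double value, int sensorId)
    {
        TimeStamp = timeStamp;
        Value = value;
        SensorId = sensorId;
    }
}

// Usage sketch (db.DataPoints and its columns are assumed):
// var points = db.DataPoints
//     .Select(dp => new DataPointStruct(dp.TimeStamp, dp.Value, dp.SensorId))
//     .ToArray();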

Poor performance when loading child entities with Entity Framework

I'm building an ASP.NET application with Entity Framework (Code First) and I've implemented a Repository pattern like the one in this example.
I only have two tables in my database. One called Sensor and one called MeasurePoint (containing only TimeStamp and Value). A sensor can have multiple measure points. At the moment I have 5 sensors and around 15000 measure points (approximately 3000 points for each sensor).
In one of my MVC controllers I execute the following line (to get the most recent MeasurePoint for a Sensor)
DbSet<Sensor> dbSet = context.Set<Sensor>();
var sensor = dbSet.Find(sensorId);
var point = sensor.MeasurePoints.OrderByDescending(measurePoint => measurePoint.TimeStamp).First();
This call takes ~1 s to execute, which feels like a lot to me. The call results in the following SQL query:
SELECT
    [Extent1].[MeasurePointId] AS [MeasurePointId],
    [Extent1].[Value] AS [Value],
    [Extent1].[TimeStamp] AS [TimeStamp],
    [Extent1].[Sensor_SensorId] AS [Sensor_SensorId]
FROM [dbo].[MeasurePoint] AS [Extent1]
WHERE ([Extent1].[Sensor_SensorId] IS NOT NULL) AND ([Extent1].[Sensor_SensorId] = @EntityKeyValue1)
Which only takes ~200ms to execute, so the time is spent somewhere else.
I've profiled the code with the help of Visual Studio Profiler and found that the call that causes the delay is
System.Data.Objects.Internal.LazyLoadBehavior.<>c__DisplayClass7`2.<GetInterceptorDelegate>b__1(!0,!1)
So I guess it has something to do with lazy loading. Do I have to live with performance like this, or are there improvements I can make? Is it the ordering by time that causes the performance drop? If so, what options do I have?
Update:
I've updated the code to show where sensor comes from.
What that will do is load the entire child collection into memory and then perform the .First() LINQ query against the loaded (approximately 3,000) children.
If you just want the most recent point for the sensor, query the context directly instead:
context.MeasurePoints.Where(measurePoint => measurePoint.Sensor.SensorId == sensorId).OrderByDescending(measurePoint => measurePoint.TimeStamp).First();
If that's the query it's running, it's loading all 3,000 points for the sensor into memory. Try running the query directly on your DbContext instead of using the navigation property and see what the performance difference is. Your overhead may be coming from the 2,999 points you don't need being loaded.

Improving Linq query

I have the following query:
if (idUO > 0)
{
    query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
    query = query.Where(b => b.DependencyId == dependencyId);
}
else
{
    var dependencyIds = dependencies.Select(d => d.Id).ToList();
    query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}

[...] <- Other filters...

if (specialDateId != 0)
{
    query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket object that has 23 properties, 5 of which are relationships (FKs); I fill 3 of these relationships with lazy loading
When I access the first page it's OK; the query takes less than 1 second. But when I try to access page 30,000, the query takes more than 20 seconds. Is there a way, in the LINQ query, to improve the performance of the query? Or only at the database level? And at the database level, what is the best way to improve the performance of this kind of query?
There is not much room here, IMO, to make things better (at least looking at the code provided).
When you're trying to achieve good performance on such numbers, I would recommend not using LINQ at all, or at least using it only on the parts with smaller data access.
What you can do here is introduce paging of that data at the database level, with a stored procedure, and invoke it from your C# code:
1- Create a view in the DB which orders items by date, including all related relationships, like Products etc.
2- Create a stored procedure querying this view with the related parameters. (A sketch of the C# calling side follows.)
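For illustration, calling such a stored procedure from EF6 could look like the following; the procedure name, parameters, the TicketDto shape, and MyDbContext are all assumptions:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

// Hypothetical flattened shape returned by the stored procedure / view.
public class TicketDto
{
    public int Id { get; set; }
    public DateTime Date { get; set; }
    // ...other columns the view exposes...
}

public static class TicketPaging
{
    public static List<TicketDto> GetTicketPage(MyDbContext context, int page, int pageSize)
    {
        // EF6's Database.SqlQuery<T> maps each result row onto TicketDto;
        // the paging itself happens inside the procedure, next to the indexes.
        return context.Database
            .SqlQuery<TicketDto>(
                "EXEC dbo.GetTicketsPage @Page, @PageSize",
                new SqlParameter("@Page", page),
                new SqlParameter("@PageSize", pageSize))
            .ToList();
    }
}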
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about indexes that you should add. This has had a great effect for me in the past. Of course, if you know which indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why;
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
OK, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open SQL Profiler and view the random mess of SQL it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on compiled queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an ObjectQuery with explicit joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred loading can be a real performance killer.
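One more hedged option, not from the answers above: keyset ("seek") pagination sidesteps the deep-Skip problem by remembering where the previous page ended. The Ticket shape and its Id key are assumptions; lastDate/lastId are bookmark values carried over from the previous page:

using System;
using System.Linq;

public static class TicketKeysetPaging
{
    // Instead of Skip(20 * page), seek past the last row of the previous
    // page. The database can jump straight to (Date, Id) via an index
    // rather than scanning and discarding 20 * page rows.
    public static IQueryable<Ticket> NextPage(
        IQueryable<Ticket> query, DateTime lastDate, int lastId, int pageSize = 20)
    {
        return query
            .Where(b => b.Date > lastDate ||
                        (b.Date == lastDate && b.Id > lastId)) // tie-break on the key
            .OrderBy(b => b.Date).ThenBy(b => b.Id)
            .Take(pageSize);
    }
}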

Is it possible to accelerate (dynamic) LINQ queries using GPU?

I have been searching for some days for solid information on the possibility to accelerate LINQ queries using a GPU.
Technologies I have "investigated" so far:
Microsoft Accelerator
Cudafy
Brahma
In short, would it even be possible at all to do an in-memory filtering of objects on the GPU?
Let's say we have a list of some objects and we want to filter them with something like:
var result = myList.Where(x => x.SomeProperty == SomeValue);
Any pointers on this one?
Thanks in advance!
UPDATE
I'll try to be more specific about what I am trying to achieve :)
The goal is to use any technology that is able to filter a list of objects (ranging from ~50,000 to ~2,000,000) in the absolutely fastest way possible.
The operations I perform on the data when the filtering is done (sum, min, max, etc.) are made using the built-in LINQ methods and are already fast enough for our application, so that's not a problem.
The bottleneck is "simply" the filtering of the data.
UPDATE
Just wanted to add that I have tested about 15 databases, including MySQL (checking a possible cluster approach / memcached solution), H2, HSQLDB, VelocityDB (currently investigating further), SQLite, MongoDB, etc., and NONE is good enough when it comes to the speed of filtering data (of course, the NoSQL solutions do not offer this like the SQL ones do, but you get the idea) and/or the returning of the actual data.
Just to summarize what I/we need:
A database which is able to sort data in the format of 200 columns and about 250,000 rows in less than 100 ms.
I currently have a solution with parallelized LINQ which is able (on a specific machine) to spend only nanoseconds on each row when filtering AND processing the result!
So, we need nanosecond-scale filtering on each row.
Why does it seem that only in-memory LINQ is able to provide this?
Why would this be impossible?
Some figures from the logfile:
Total tid för 1164 frågor: 2579
This is Swedish and translates:
Total time for 1164 queries: 2579
Where the queries in this case are queries like:
WHERE SomeProperty = SomeValue
And those queries are all being run in parallel on 225,639 rows.
So, 225,639 rows are being filtered in memory 1,164 times in about 2.5 seconds.
That's about 9.5e-9 seconds per row, BUT that also includes the actual processing of the numbers! We do Count (not null), total count, Sum, Min, Max, Avg, and Median. So we have 7 operations on these filtered rows.
So, we could say it's actually 7 times faster than the databases we've tried, since we do NOT do any of the aggregation stuff in those cases!
So, in conclusion, why are the databases so poor at filtering data compared to in-memory LINQ filtering? Has Microsoft really done such a good job that it is impossible to compete with it? :)
It makes sense that in-memory filtering should be faster, but I don't want a sense that it is faster. I want to know what is faster, and, if possible, why.
I will answer definitively about Brahma since it's my library, but it probably applies to other approaches as well. The GPU has no knowledge of objects. Its memory is also mostly completely separate from CPU memory.
If you do have a LARGE set of objects and want to operate on them, you can only pack the data you want to operate on into a buffer suitable for the GPU/API you're using and send it off to be processed.
Note that this will make two round trips over the CPU-GPU memory interface, so if you aren't doing enough work on the GPU to make it worthwhile, you'll be slower than if you simply used the CPU in the first place (like the sample above).
Hope this helps.
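As a concrete illustration of the "pack into a buffer" step (plain C#, not Brahma-specific; SomeProperty being numeric is an assumption):

using System.Linq;

// GPUs operate on flat value arrays, not object graphs, so the field
// you filter on must first be extracted into contiguous memory.
float[] buffer = myList.Select(x => (float)x.SomeProperty).ToArray();
// ...upload `buffer` to the device, run the filter kernel, download an
// index/bitmask of matches, then map the hits back to the objects...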
The GPU is really not intended for all general purpose computing purposes, especially with object oriented designs like this, and filtering an arbitrary collection of data like this would really not be an appropriate thing.
GPU computations are great for things where you perform the same operation on a large dataset, which is why things like matrix operations and transforms can be very nice. There, the data copying can be outweighed by the incredibly fast computational capabilities of the GPU.
In this case, you'd have to copy all of the data into the GPU to make this work, and restructure it into some form the GPU will understand, which would likely be more expensive than just performing the filter in software in the first place.
Instead, I would recommend looking at using PLINQ for speeding up queries of this nature. Provided your filter is thread-safe (which it would have to be for any GPU-related work anyway), this is likely a better option for general-purpose query optimization, as it won't require the memory copying of your data. PLINQ would work by rewriting your query as:
var result = myList.AsParallel().Where(x => x.SomeProperty == SomeValue);
If the predicate is an expensive operation, or the collection is very large (and easily partitionable), this can make a significant improvement to the overall performance when compared to standard LINQ to Objects.
GpuLinq
GpuLinq's main mission is to democratize GPGPU programming through LINQ. The main idea is that we represent the query as an expression tree and, after various transformations and optimizations, we compile it into fast OpenCL kernel code. In addition, we provide a very easy-to-work-with API without the need to mess with the details of the OpenCL API.
https://github.com/nessos/GpuLinq
select *
from table1 -- contains 100k rows
left join table2 -- contains 1M rows
on table1.id1 = table2.id2 -- this would run ~100G times
                           -- unless they are cached on the SQL side
where table1.id between 1 and 100000 -- but this optimizes things (depends)
could be turned into
select id1 from table1 -- 400 KB if id1 is 32-bit; no need to order
stored in memory, and
select id2 from table2 -- 4 MB if id2 is 32-bit; no need to order
stored in memory, with both arrays sent to the GPU using a kernel (CUDA, OpenCL) like the one below:
int i = get_global_id(0);      // one thread per id2 element
int selectedID2 = id2[i];
int summary__ = -1;            // -1 means "no matching id1 found"
for (int j = 0; j < id1Length; j++)
{
    int selectedID1 = id1[j];
    summary__ = (selectedID2 == selectedID1 ? j : summary__); // no branching
}
summary[i] = summary__;        // accumulates target indexes of the
                               // "on table1.id1 = table2.id2" part
On the host side, you can run
select * from table1 --- query3
and
select * from table2 --- query4
then use the index list from the GPU to stitch the joined data together:
// x is table1's data; ForAll is PLINQ's parallel terminal operation
myList.AsParallel().ForAll(x => x.leftJoinData = query4[summary[x.index]]);
The GPU code shouldn't be slower than 50 ms for a GPU with constant memory, global broadcast ability, and some thousands of cores.
If any trigonometric function is used for filtering, performance would drop fast. Also, the row counts of the left-joined tables make this O(m*n) in complexity, so millions versus millions would be much slower. GPU memory bandwidth is important here.
Edit:
A single operation of gpu.findIdToJoin(table1, table2, "id1", "id2") on my HD 7870 (1280 cores) and R7 240 (320 cores) with a "products" table (64k rows) and a "categories" table (64k rows) (left-join filter) took 48 milliseconds with an unoptimized kernel.
ADO.NET's "nosql"-style LINQ join took more than 2000 ms with only a 44k-row products table and a 4k-row categories table.
Edit-2:
A left join with a string search condition gets 50 to 200x faster on the GPU when the tables grow to thousands of rows, each having at least hundreds of characters.
The simple answer for your use case is no.
1) There's no solution for that kind of workload, even in raw LINQ to Objects, much less in something that would replace your database.
2) Even if you were fine with loading the whole set of data at once (this takes time), it would still be much slower: GPUs have high throughput, but their access is high latency, so if you're looking for "very" fast solutions, GPGPU is often not the answer. Just preparing/sending the workload and getting back the results will be slow, and in your case it would probably need to be done in chunks, too.
