I am creating a web application on top of the ASP.NET 6 framework. I am trying to figure out the best ORM to use for this project. I am leaning toward Entity Framework for the following reasons:
I'll be able to use LINQ to write my queries
I'll be able to access my relations easily and directly using native C# models.
Here is where the complication starts. This app will be connecting to a very large database with over 500 tables. Also, the app is going to be broken down into many small logical areas so it's easy for me to maintain it.
If Entity Framework is the way to go, how should I set up the DbContext so I can manage 500+ DbSets and their relations? In other words, should I create a single DbContext for the entire app even when the app is broken down into multiple Areas? Or should I create a DbContext for each area? But if I do that, what if I need to establish relations across multiple areas? For example, what if a model in A-area needs a relation to a model in B-area and another in C-area? I thought about introducing DbContext inheritance, where CAreaDbContext would inherit from BAreaDbContext, which inherits from AAreaDbContext, but that would break down really quickly.
Is Entity Framework the right framework for a large database app? If so, how can I manage the DbContext across multiple areas? If not, what would be the alternative, short of writing plain SQL queries?
EF is perfectly fine for large databases. When mapping a large number of tables and relationships there is a one-time startup cost for the very first query as EF initializes and validates its mapping, but this is a static cost for the application, not one paid each time a DbContext is instantiated.
You can split the application across several DbContexts to make organizing entities more logical and to reduce those initial setup costs. This is generally referred to as using Bounded Contexts if you want to search for examples. These typically organize your application down to aggregate roots, or top-level entities, with everything else falling under those aggregates or serving as lookups, etc. Entities can be registered with multiple DbContexts, though you should aim to nominate one aggregate root as responsible for editing and creating a given entity.
The most important details to consider with EF, both for performance and for avoiding unwanted/unexpected behaviour, are to ensure you generally don't load more data than you need, more often than you need to.
Some general advice would include:
Absolutely AVOID the temptation to use the Generic Repository pattern with EF. Non-generic repositories are great for facilitating unit testing or centralizing important, common rules/validation, but generic flavours lead to code that is inefficient and expensive, or overly complex, and usually both.
Keep DbContext lifetimes as short as possible. For Web applications this should be kept no longer than the Request length (when using an IoC container for instance) or shorter. Worst case, use using blocks to scope your DbContext. The longer a DbContext is kept alive, the more entities it tracks, and the more it tracks, the more it needs to sift through looking for references when loading other entities that might have navigation properties and the slower it gets. Long-lived DbContexts can also get "poisoned" when you have an issue attempting to save entity changes. Those invalid entities will remain tracked by the DbContext and interfere with future unrelated SaveChanges calls until they are removed (Detached) or corrected.
Gain an understanding of projection using Select or AutoMapper's ProjectTo method. Loading entire entity graphs gets expensive, especially if the DbContext is left to track all of those instances. Projecting down to ViewModels/DTOs helps ensure that only as much data as is needed is ever loaded and transmitted, and makes it crystal clear what is being passed around. (As opposed to passing detached entities, or worse, partially filled detached entities.)
Understand IQueryable and everything that Linq can bring to working with the data. EF query building is extremely valuable, so you can leverage sorting, filtering, pagination, projection, as well as scenarios to get Counts and check existence (.Any()) all without fetching a ton of data via Entities. See point #1 to avoid falling into this trap.
Use ToList/ToListAsync sparingly, and be aware that any logic you feed EF in Linq expressions needs to be translatable to SQL. Sometimes you will find yourself trying to build a query where EF complains that it cannot evaluate your expression: things like calls to private methods or unmapped properties. Adding a ToList before the expression will seem like a magic fix, as it forces client-side evaluation, but this is an expensive operation: you are effectively fetching (and typically tracking) all entities up to that point and then continuing in memory, which gets expensive in memory use.
Asynchronous methods are not a silver bullet and do not make queries faster. Awaiting asynchronous EF methods is very useful when you have queries that are going to take a while to run or be called extremely often. My advice is to default to synchronous methods and test-run your code against production-like volumes as early as possible. I use 250 ms as a threshold, but pick something acceptable to you and profile your queries. Anything over that threshold would likely benefit from being made asynchronous. Typically things like searches, especially ones that involve text-match searches, are good candidates, as these can be a bit slow and are generally run fairly frequently by several users at a time. The same goes for any operation that might get called a lot through the course of an application by many users at the same time. async/await doesn't make an individual query faster (it makes it slightly slower), but it does make your server more responsive by not tying up the request until the query finishes. Using it by default makes your code a touch slower and a bit tougher to debug for no real benefit, since it can easily be introduced as needed.
Profile your queries. With traditional data access you would create your schema and write your access queries (Sprocs etc.) creating indexes as you go. With EF building your queries, indexing becomes more of a reactionary process where you might add your typical indexes, but should look at the queries being run in a production-like scenario to refine indexes based on high-volume queries that EF is building. This also provides key insight into other inefficiencies that might creep into your queries, as well as performance problems like lazy loading being tripped. Expensive queries should be investigated and optimized where possible.
Prepare to employ things like queuing for truly expensive queries. Systems will often call for things like Reports and data exports or just really expensive query options. Aim to set reasonable expectations by default so for instance avoiding things like string Contains() in text searches opting for string StartsWith(). Where you do need to support expensive queries, build a mechanism to allow users/processes to queue the query details as a request and employ a background worker/pool to pick up and process these requests. The temptation might be to just employ async/await here but the important thing is to avoid situations where too many of these queries are kicked off at once. Queries like this will "touch" a lot of data leading to locks and deadlocks in a system. Users have a bad tendency to repeatedly kick off actions when it looks like one isn't responding which compounds the problem on the back-end.
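As a sketch of the projection and IQueryable advice above: compose the filter, sort, projection, and paging on IQueryable and materialize once at the end. The entity, DTO, and DbContext names here are hypothetical, not from the question.

```csharp
// Sketch only: "StoreContext", "Order", and "OrderSummaryDto" are made-up names.
public class OrderSummaryDto
{
    public int OrderId { get; set; }
    public string CustomerName { get; set; }
}

public static List<OrderSummaryDto> GetOpenOrders(StoreContext context, int page, int pageSize)
{
    // Everything below composes on IQueryable, so EF translates the filter,
    // sort, projection, and paging into a single SQL statement. Only the two
    // projected columns for one page of rows ever leave the database, and
    // nothing is added to the change tracker.
    return context.Orders
        .Where(o => o.Status == "Open")
        .OrderByDescending(o => o.CreatedOn)
        .Select(o => new OrderSummaryDto
        {
            OrderId = o.Id,
            CustomerName = o.Customer.Name  // navigation used inside the projection; no Include needed
        })
        .Skip(page * pageSize)
        .Take(pageSize)
        .ToList();  // the only materializing call, kept at the very end
}
```

The same shape works for Count() or Any() checks: swap the Select/Skip/Take tail for the aggregate and EF emits the corresponding SQL without fetching entities.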
I've got Entity Framework 4.1 with .NET 4.5 running on ASP.NET in Windows 2008R2. I'm using EF code-first to connect to SQL Server 2008R2, and executing a fairly complex LINQ query, but resulting in just a Count().
I've reproduced the problem on two different web servers but only one database (production of course). It recently started happening with no application, database structure, or server changes on the web or database side.
My problem is that executing the query under certain circumstances takes a ridiculous amount of time (close to 4 minutes). I can take the actual query, pulled from SQL Profiler, and execute in SSMS in about 1 second. This is consistent and reproducible for me, but if I change the value of one of the parameters (a "Date after 2015-01-22" parameter) to something earlier, like 2015-01-01, or later like 2015-02-01, it works fine in EF. But I put it back to 2015-01-22 and it's slow again. I can repeat this over and over again.
I can then run a similar but unrelated query in EF, then come back to the original, and it runs fine this time - same exact query as before. But if I open a new browser, the cycle starts over again. That part also makes no sense - we're not doing anything to retain the data context in a user session, so I have no clue whatsoever why that comes into play.
But this all tells me that the data itself is fine.
In Profiler, when the query runs properly, it takes about a second or two, and shows about 2,000,000 in reads and about 2,000 in CPU. When it runs slowly, it takes 3.5 minutes, and the values are 300,000,000 and 200,000 - so reads are about 150 times higher and CPU is 100 times higher. Again, for the identical SQL statement.
Any suggestions on what EF might be doing differently that wouldn't show up in the query text? Is there some kind of hidden connection property which might cause a different execution plan in certain circumstances?
EDIT
The query that EF builds is one of the ones where it builds a giant string with the parameter included in the text, not as a SQL parameter:
exec sp_executesql
N'SELECT [GroupBy1].[A1] AS [C1]
FROM (
SELECT COUNT(1) AS [A1]
...
AND ([Extent1].[Added_Time] >= convert(datetime2, ''2015-01-22 00:00:00.0000000'', 121))
...
) AS [GroupBy1]'
EDIT
I'm not adding this as an answer since it doesn't actually address the underlying issue, but this did end up getting resolved by rebuilding indexes and recomputing statistics. That hadn't been done in longer than usual, and it seems to have cleared up whatever caused the issue.
I'll keep reading up on some of the links here in case this happens again, but since it's all working now and unreproducible, I don't know if I'll ever know for sure exactly what it was doing.
Thanks for all the ideas.
I recently had a very similar scenario: a query ran very fast when executed directly against the database, but had terrible performance through EF (version 5, in my case). It was not a network issue; the difference was from 4 ms to 10 minutes.
The problem ended up being a mapping problem. I had a column mapped to NVARCHAR, while it was VARCHAR in the database. It seems harmless, but that resulted in an implicit conversion in the database, which completely ruined the performance.
I'm not entirely sure why this happens, but from the tests I made, the mismatch resulted in the database doing an Index Scan instead of an Index Seek, and the two are very different performance-wise.
I blogged about this here (disclaimer: it is in Portuguese), but later I found that Jimmy Bogard described this exact problem in a post from 2012; I suggest you check it out.
Since you do have a convert in your query, I would say start from there. Double check all your column mappings and check for differences between your table's column and your entity's property. Avoid having implicit conversions in your query.
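As an illustration of the fix, the store type can be declared explicitly in the mapping so the parameter is sent as VARCHAR rather than the default NVARCHAR. This is a sketch against the EF 5/6 fluent API; the entity and column names are made up.

```csharp
// Sketch: force the provider to send the parameter as VARCHAR so SQL Server
// does not implicitly convert the indexed column on every row.
// "Customer" and "Code" are hypothetical names.
public class CustomerMap : EntityTypeConfiguration<Customer>
{
    public CustomerMap()
    {
        Property(c => c.Code)
            .HasColumnType("varchar")   // matches the column's actual store type
            .HasMaxLength(20);
    }
}
```

The data-annotation equivalent is decorating the property with `[Column(TypeName = "varchar")]`.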
If you can, check your execution plan for inconsistencies, and watch for the yellow warning triangle that may indicate problems like this implicit conversion.
I hope this helps you somehow, it was a really difficult problem for us to find out, but made sense in the end.
Just to put this out there since it has not been addressed as a possibility:
Given that you are using Entity Framework (EF), if you are using lazy loading of entities, then EF requires Multiple Active Result Sets (MARS) to be enabled via the connection string. While it might seem entirely unrelated, MARS does sometimes produce this exact behavior of something running quickly in SSMS but horribly slowly (seconds become several minutes) via EF.
One way to test this is to turn off Lazy Loading and either remove MultipleActiveResultSets=True; (the default is "false") or at least change it to be MultipleActiveResultSets=False;.
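A sketch of that test, assuming an EF 5/6-style DbContext (the context name and connection string are illustrative):

```csharp
// 1) Remove MARS from the connection string, or set it to false explicitly.
var connectionString =
    "Data Source=.;Initial Catalog=MyDb;Integrated Security=True;" +
    "MultipleActiveResultSets=False;";   // the default when omitted is also false

using (var context = new MyDbContext(connectionString))
{
    // 2) Turn off lazy loading for the duration of the test so that nothing
    //    in the query path actually needs MARS.
    context.Configuration.LazyLoadingEnabled = false;

    // ... run the slow query here and compare the timings ...
}
```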
As far as I know, there is unfortunately no work-around or fix (currently) for this behavior.
Here is an instance of this issue: Same query with the same query plan takes ~10x longer when executed from ADO.NET vs. SMSS
There is an excellent article about Entity Framework performance considerations here.
I would like to draw your attention to the section on Cold vs. Warm Query Execution:
The very first time any query is made against a given model, the Entity Framework does a lot of work behind the scenes to load and validate the model. We frequently refer to this first query as a "cold" query. Further queries against an already loaded model are known as "warm" queries, and are much faster.
During LINQ query execution, the "metadata loading" step has a high impact on performance for cold query execution. However, once loaded, the metadata is cached and future queries will run much faster. The metadata is cached outside of the DbContext and is reusable as long as the application pool lives.
In order to improve performance, consider the following actions:
use pre-generated views
use query plan caching
use no tracking queries (only if accessing for read-only)
create a native image of Entity Framework (only relevant if using EF 6 or later)
All those points are well documented in the link provided above. In addition, you can find additional information about creating a native image of Entity Framework here.
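As a sketch of the "no tracking queries" suggestion above, EF exposes this through the AsNoTracking extension (the entity names here are hypothetical):

```csharp
// Read-only query: entities are materialized but never added to the change
// tracker, avoiding both the tracking overhead and accidental updates.
var openInvoices = context.Invoices
    .AsNoTracking()
    .Where(i => i.Status == "Open")
    .ToList();
```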
I don't have a specific answer as to WHY this is happening, but it certainly looks to be related to how the query is handled rather than the query itself. Since you don't have any issues running the same generated query from SSMS, the query text itself isn't the problem.
A workaround you can try: A stored procedure. EF can handle them very well, and it is the ideal way to deal with potentially complicated or expensive queries.
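Since EF 4.1 you can map raw SQL or a stored procedure call straight onto a type via Database.SqlQuery. A hedged sketch (the procedure name and parameter are made up to mirror the question's count query):

```csharp
// Sketch: call a stored procedure through the DbContext and materialize the
// scalar result. "dbo.CountRecentEntries" is a hypothetical procedure.
var cutoff = new SqlParameter("@addedAfter", new DateTime(2015, 1, 22));
var count = context.Database
    .SqlQuery<int>("EXEC dbo.CountRecentEntries @addedAfter", cutoff)
    .Single();
```

This keeps the execution plan entirely in the procedure's hands while still letting EF do the materialization.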
Realizing you are using Entity Framework 4.1, I would suggest you upgrade to Entity Framework 6.
There has been a lot of performance improvement and EF 6 is much faster than EF 4.1.
The MSDN article about Entity Framework performance considerations mentioned in my other response also has a comparison between EF 4.1 and EF 6.
There might be a bit of refactoring needed as a result, but the improvement in performance should be worth it (and that would reduce the technical debt at the same time).
I am working on an application with a fairly simple data model (4 tables: two small ones with around 10 rows each, and two bigger ones with hundreds of rows).
I'm working with C# and currently use an OdbcDriver for my Data Access Layer.
I was wondering if there is any difference in terms of performance between this driver and NHibernate.
The application works, but I'd like to know if installing NHibernate instead of a classic OdbcDriver would make it faster. If so, is the difference really worth installing NHibernate (given that I have never used such a technology)?
Thanks!
Short answer: no, NHibernate will actually slow your performance in most cases.
Longer answer: NHibernate uses the basic ADO.NET drivers, including OdbcConnection (if there's nothing better), to perform the actual SQL queries. On top of that, it is using no small amount of reflection to digest queries into SQL, and to turn SQL results into lists of objects. This extra layer, as flexible and powerful as it is, is going to perform more slowly than a hard-coded "firehose" solution based on a DataReader.
Where NHibernate may get you the APPEARANCE of faster performance is in "lazy-loading". Say you have a list of People, who each have a list of PhoneNumbers. You are retrieving People from the database, just to get their names. A naive DataReader-based implementation may involve calling a stored procedure for the People that includes a join to their PhoneNumbers, which you don't need in this case. Instead, NHibernate will retrieve only People, and set a "proxy" into the reference to the list of PhoneNumbers; when the list needs to be evaluated, the proxy object will perform another call. If the phone number is never needed, the proxy is never evaluated, saving you the trouble of pulling phone numbers you don't need.
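The behaviour described above looks roughly like this with NHibernate's ISession (entity names are illustrative, not from the question):

```csharp
// Sketch: lazy loading in NHibernate.
using (var session = sessionFactory.OpenSession())
{
    // Loads only the Person row; PhoneNumbers is set to a lazy proxy collection.
    var person = session.Get<Person>(personId);
    Console.WriteLine(person.Name);          // no extra query so far

    // The first enumeration of the collection fires a second SELECT.
    foreach (var number in person.PhoneNumbers)
        Console.WriteLine(number.Number);
}
```

If the loop never runs, the proxy is never evaluated and the phone numbers are never fetched.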
NHibernate isn't about making things faster, and it will always be slower than using the database primitives directly as you are (it uses them "under the hood").
In my opinion, NHibernate is about making a reusable entity layer that can be applied to different applications, or at the very least reused in multiple areas of one medium-to-large application. Therefore moving your application to NHibernate would be a waste of time (it sounds very small).
You might get better performance by using a database driver specific to your database engine.
For the amount of data in your database it won't make any difference. But in general, using NHibernate will slow down application performance while increasing development speed. This is generally true for all ORMs.
Some hints: NHibernate is not magic; it sits on top of ADO.NET. Want a faster driver? Get one. Why are you using a slow, outdated technology like ODBC anyway? What is your data source? Does it not support any newer standard, like OLE DB?
So I have an application which requires very fast access to large volumes of data, and we're at the stage where we're undergoing a large re-design of the database, which gives a good opportunity to re-write the data access layer if necessary!
Currently our data access layer uses manually created entities along with plain SQL to fill them. This is pretty fast, but the technology is really getting old, and I'm concerned we're missing out on a newer framework or data access method that could be better in terms of neatness and maintainability.
We've looked at the Entity Framework, but after some research it just seems that the benefit of the ORM is not enough to justify the lower performance, and as some of our queries are getting complex I'm sure performance with EF would become more of an issue.
So it is a case of sticking with our current methods of data access, or is there something a bit neater than manually creating and maintaining entities?
I guess the thing that's bugging me is opening our data layer solution and seeing lots of entities, all of which need to be maintained exactly in line with the database, which can sometimes be a lot of work; but then maybe this is the price we pay for performance?
Any ideas, comments and suggestions are very appreciated! :)
Thanks,
Andy.
** Update **
Forgot to mention that we really need to be able to handle Azure (a client requirement), which currently stops us from using stored procedures.
** Update 2 **
Actually, we have an interface layer for our DAL, which means we can create an Azure implementation that just overrides the data access methods from the local implementation that aren't suitable for Azure. So I guess we could use stored procedures for performance-sensitive local databases, with EF for the cloud.
I would use an ORM layer (Entity Framework, NHibernate etc) for management of individual entities. For example, I would use the ORM / entities layers to allow users to make edits to entities. This is because thinking of your data as entities is conceptually simpler and the ORMs make it pretty easy to code this stuff without ever having to program any SQL.
For the bulk reporting side of things, I would definitely not use an ORM layer. I would probably create a separate class library specifically for standard reports, which creates SQL statements itself or calls sprocs. ORMs are not really for bulk reporting and you'll never get the same flexibility of querying through the ORM as through hand-coded SQL.
Stored procedures for performance. ORMs for ease of development
Do you feel up to troubleshooting opaque generated SQL when it runs badly? When it generates several round trips where one would do? Or when it insists on using the wrong datatypes?
You could try using MyBatis (previously known as iBATIS). It allows you to map SQL statements to domain objects. This way you keep full control over the SQL being executed and get a cleanly defined domain model at the same time.
Don't rule out plain old ADO.NET. It may not be as hip as EF4, but it just works.
With ADO.NET you know what your SQL queries are going to look like because you get 100% control over them. ADO.NET forces developers to think about SQL instead of falling back on the ORM to do the magic.
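For comparison, the hand-rolled ADO.NET path looks something like this (the table and column names are made up):

```csharp
// Sketch: a plain SqlDataReader "firehose" read with full control over the SQL.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT Id, Name FROM dbo.Customers WHERE IsActive = @active", connection))
{
    command.Parameters.AddWithValue("@active", true);
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            var id = reader.GetInt32(0);
            var name = reader.GetString(1);
            // map into your hand-maintained entity here
        }
    }
}
```

The query text is exactly what you wrote, so there is nothing generated to second-guess when profiling.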
If performance is high on your list, I'd be reluctant to take a dependency on any ORM, especially EF, which is new on the scene and highly complex. ORMs speed up development (a little), but they make your SQL query performance hard to predict, and in most cases slower than hand-rolled SQL/stored procs.
You can also unit test SQL/Stored Procs independently of the application and therefore isolate performance issues as either DB/query related or application related.
I guess you are using ADO.NET in your DAL already, so I'd suggest investing the time and effort in refactoring it rather than throwing it out.
I have one query on my page that takes at least a half second to execute using EF 3.5. When I used a stored procedure the speed was noticably faster. It is a very complex query. Will there be any performance improvements in the upcoming EF 4.0? And does EF 4.0 really beat out 3.5 performance wise?
The short answer is that it's too early to tell. The .NET team is focusing almost entirely on performance between now and the April 12th release, as the feature set has to be finalized and localized. Also, what is meant by faster? Faster can be viewed in many ways; for example:
Entity Framework 4.0 has new features, the object tracking improvements alone may mean huge wins since you're not doing that manual work yourself...in any case, at least the development's faster.
Lighter-weight objects with POCO support may mean a lot less memory being shifted when dealing with lots of objects as well. No matter how small the cost of extra properties being populated when fetching from the DB, there is a cost both in instantiating them and in tracking them (load time and memory consumption).
In your specific case, a half second is a long time for anything but a very complex or high-volume query. Have you looked to see how much time is spent in the database and how much is spent once .NET has the data? If you're spending most of your time outside of SQL then yes, the base improvements to reflection in .NET 4.0 should provide some speed improvement; however, if you're spending all your time in SQL, it won't help much at all. The bulk of your performance problem may be the indexing of the generated SQL, not Entity Framework hydration performance.
I would follow Kane's comment and look at the SQL it's generating for your query. Is it possible for you to post that, along with the stored procedure that is quick, so we can maybe find where the problem lies?
From the ADO.NET blog:
Customizing Queries – Adding support for existing LINQ operators, recognizing a larger set of patterns with LINQ, writing model-defined functions along with the ability to use these in LINQ, and a number of other ways to create and customize queries.
SQL Generation Readability Improvements – Improving the readability, along with TSQL performance optimizations, of the generated queries to make it much easier to understand what is happening.
So these two points imply you could see improvements in the way it's generating your query from LINQ.
However, it's unlikely that an ORM will ever be able to out-perform a query you've written from scratch, as it has to cater for so many different scenarios and usually defaults to the most common one. EF 3.5 seemed to produce some very efficient join SQL when I used it, probably the best I've seen from an ORM, so there is hope you can ditch the SP in 4.0.
If you've got a stored procedure, I'm guessing it's a big query; sending that SQL text to the server each time will cause a lot of network traffic, which is one other thing you may or may not have considered. Obviously, on the same server or inside the same internal network this is a "cutting your hair to lose weight" style optimisation.
When it comes to really complex queries, I've not seen any evidence that any of L2S, NH, or EF can generate a better query plan than I can in a sproc. I love ORMs (especially NH), but there are still times when ORM execution time can get curb-stomped by a well-written sproc.