I am working on an application with a fairly simple data model (4 tables: two small ones with around 10 rows each, and two bigger ones with hundreds of rows).
I'm working with C# and currently use an OdbcDriver for my Data Access Layer.
I was wondering if there is any difference in terms of performance between this driver and NHibernate.
The application works, but I'd like to know whether installing NHibernate instead of a classic OdbcDriver would make it faster. If so, is the difference really worth installing NHibernate, given that I have never used such a technology?
Thanks!
Short answer: no, NHibernate will actually slow your performance in most cases.
Longer answer: NHibernate uses the basic ADO.NET drivers, including OdbcConnection (if there's nothing better), to perform the actual SQL queries. On top of that, it is using no small amount of reflection to digest queries into SQL, and to turn SQL results into lists of objects. This extra layer, as flexible and powerful as it is, is going to perform more slowly than a hard-coded "firehose" solution based on a DataReader.
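For reference, the hard-coded "firehose" path looks something like the following minimal sketch (the connection string, table and column are hypothetical, not anything from your application):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.Odbc;

public static class PeopleReader
{
    // Hedged sketch of the hand-coded DataReader path that NHibernate sits on
    // top of: one SQL statement, one forward-only reader, no reflection.
    // The connection string, table and column names are hypothetical.
    public static List<string> GetPersonNames(string connectionString)
    {
        var names = new List<string>();
        using (var conn = new OdbcConnection(connectionString))
        using (var cmd = new OdbcCommand("SELECT Name FROM People", conn))
        {
            conn.Open();
            using (IDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    names.Add(reader.GetString(0));
            }
        }
        return names;
    }
}
```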
Where NHibernate may get you the APPEARANCE of faster performance is in "lazy-loading". Say you have a list of People, who each have a list of PhoneNumbers. You are retrieving People from the database, just to get their names. A naive DataReader-based implementation may involve calling a stored procedure for the People that includes a join to their PhoneNumbers, which you don't need in this case. Instead, NHibernate will retrieve only People, and set a "proxy" into the reference to the list of PhoneNumbers; when the list needs to be evaluated, the proxy object will perform another call. If the phone number is never needed, the proxy is never evaluated, saving you the trouble of pulling phone numbers you don't need.
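As a rough sketch of how that lazy collection looks on the NHibernate side (the class names just follow the example above; collections are lazy by default in NHibernate mappings):

```csharp
using System.Collections.Generic;

// Hedged sketch: NHibernate requires virtual members so it can substitute
// proxies. Loading a Person does not load PhoneNumbers; a separate SELECT
// runs only if the collection is actually enumerated.
public class Person
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual IList<PhoneNumber> PhoneNumbers { get; set; }
}

public class PhoneNumber
{
    public virtual int Id { get; set; }
    public virtual string Number { get; set; }
}
```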
NHibernate isn't about making it faster, and it will always be slower than using the database primitives as you are now (it uses them "under the hood").
In my opinion NHibernate is about making a reusable entity layer that can be applied to different applications, or at the very least reused in multiple areas of one medium to large application. Therefore moving your application to NHibernate would be a waste of time (it sounds very small).
You might get better performance by using a specific database driver for your database engine.
For the amount of data in your database it won't make any difference. In general, though, using NHibernate will slow down application performance while increasing development speed. This is generally true for all ORMs.
Some hints: NHibernate is not magic. It sits on top of ADO.NET. Want a faster driver? Get one. Why are you using a slow, outdated technology like ODBC anyway? What is your data source? Doesn't it support any newer standard like OLE DB?
I have already seen an unanswered question about this here.
My question is -
Is EF really production ready for large application?
The question originated from these underlying questions -
EF pulls all the records into memory and then performs the query operation. How would EF behave when a table has around 1,000 records?
For a simple edit I have to pull the record, edit it, and then push it to the DB using SaveChanges().
I faced a similar situation where we had a large database with many tables of 7-10 million records each. We used Entity Framework to display the data. To get decent performance, here's what I learned. My 10 golden rules for Entity Framework:
1. Understand that a call to the database is made only when the actual records are required; all the other operations just build up the query (SQL). So try to fetch only the piece of data you need rather than a large number of records, and trim the fetch size as much as possible (see the sketch below).
2. Yes, you should use stored procedures where necessary (in some cases stored procedures are a better choice; they are not as evil as some make you believe). Import them into your model and have function imports for them. You can also call them directly with ExecuteStoreCommand() and ExecuteStoreQuery<>(). The same goes for functions and views, but EF has a really odd way of calling scalar functions: "SELECT dbo.blah(@id)".
3. EF performs more slowly when it has to populate an entity with a deep hierarchy, so be extremely careful with entities that have deep hierarchies.
4. Sometimes when you are requesting records and are not required to modify them, you should tell EF not to watch for property changes (AutoDetectChanges). That way record retrieval is much faster.
5. Indexing your database is always good, but with EF it becomes very important. The columns you use for retrieval and sorting should be properly indexed.
6. When your model is large, the VS2010/VS2012 model designer gets really unwieldy, so break your model into medium-sized models. There is a limitation that entities from different models cannot be shared, even though they may point to the same table in the database.
7. When you have to make changes to the same entity in different places, use the same entity instance, make the changes, and save it only once. The point is to AVOID retrieving the same record, making changes and saving it multiple times. (Real performance gain tip.)
8. When you need the info from only one or two columns, try not to fetch the full entity. You can either execute your SQL directly or use a mini entity of some sort. You may also need to cache some frequently used data in your application.
9. Transactions are slow; be careful with them.
10. SQL Profiler or any query profiler is your friend. Run it while developing your application to see what EF sends to the database. When you perform a join using LINQ or a lambda expression in your application, EF usually generates a Select-Where-In-Select style query which may not always perform well. If you find such a case, roll up your sleeves, perform the join on the DB side, and have EF retrieve the results. (I forgot this one, the most important one!)
If you keep these things in mind, EF should give you performance close to plain ADO.NET, if not the same.
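As a rough illustration of rules 1, 4 and 8, here is a minimal sketch against the EF DbContext API; SalesContext, the Customer entity and its columns are made-up names used only for the example:

```csharp
using System.Collections.Generic;
using System.Data.Entity;   // DbContext, DbSet, AsNoTracking()
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

public class SalesContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

public class CustomerNameDto
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerQueries
{
    // Hedged sketch of rules 1, 4 and 8: filter on the server, skip change
    // tracking for read-only data, and project only the columns you need.
    // SalesContext, Customer and the column names are made up for the example.
    public static List<CustomerNameDto> GetActiveCustomerNames()
    {
        using (var db = new SalesContext())
        {
            return db.Customers
                     .AsNoTracking()                   // rule 4: no change tracking
                     .Where(c => c.IsActive)           // rule 1: filter in SQL, not in memory
                     .Select(c => new CustomerNameDto  // rule 8: fetch only the needed columns
                     {
                         Id = c.Id,
                         Name = c.Name
                     })
                     .ToList();                        // the query runs only here
        }
    }
}
```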
1. EF pulls all the records into memory and then performs the query operation. How would EF behave when a table has around 1,000 records?
That's not true! EF fetches only the necessary records, and queries are transformed into proper SQL statements. EF can cache objects locally within a DataContext (and track all changes made to entities), but as long as you follow the rule of keeping the context open only when needed, it won't be a problem.
2. For a simple edit I have to pull the record, edit it, and then push it to the DB using SaveChanges().
It's true, but I would not bother doing that unless you really see performance problems. Because point 1 is not true, only one record is fetched from the DB before it's saved. You can bypass that by writing the SQL query as a string and executing it directly.
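For example, a minimal sketch of that bypass using the DbContext API (EF 4.1+); ShopContext, the Customers table and its columns are hypothetical:

```csharp
using System.Data.Entity;

public class ShopContext : DbContext
{
    // No DbSets needed for this sketch; we only use Database.ExecuteSqlCommand.
}

public static class CustomerSqlUpdater
{
    // Hedged sketch: issue the UPDATE as plain SQL instead of loading the entity
    // first. ShopContext and the Customers table are hypothetical; the values
    // are passed as parameters (@p0, @p1), not concatenated into the string.
    public static void RenameCustomer(int customerId, string newName)
    {
        using (var db = new ShopContext())
        {
            db.Database.ExecuteSqlCommand(
                "UPDATE Customers SET Name = @p0 WHERE Id = @p1",
                newName,      // becomes @p0
                customerId);  // becomes @p1
        }
    }
}
```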
EF translates your LINQ query into an SQL query, so it doesn't pull all records into memory. The generated SQL might not always be the most efficient, but a thousand records won't be a problem at all.
Yes, that's one way of doing it (assuming you only want to edit one record). If you are changing several records, you can get them all using one query and SaveChanges() will persist all of those changes.
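To illustrate, a small sketch of changing several records with a single SaveChanges() call; BillingContext, Invoice and its properties are made up for the example:

```csharp
using System;
using System.Data.Entity;
using System.Linq;

public class Invoice
{
    public int Id { get; set; }
    public DateTime DueDate { get; set; }
    public bool Reminded { get; set; }
}

public class BillingContext : DbContext
{
    public DbSet<Invoice> Invoices { get; set; }
}

public static class ReminderJob
{
    // Hedged sketch: fetch every record to change with one query, modify them
    // in memory, and persist everything with a single SaveChanges() call.
    public static void FlagOverdueInvoices()
    {
        using (var db = new BillingContext())
        {
            var overdue = db.Invoices
                            .Where(i => i.DueDate < DateTime.Today && !i.Reminded)
                            .ToList();                // one SELECT

            foreach (var invoice in overdue)
                invoice.Reminded = true;              // changes only tracked in memory here

            db.SaveChanges();                         // all UPDATEs issued here, in one transaction
        }
    }
}
```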
EF is not a bad ORM framework. It is a different one with its own characteristics. Compare Microsoft Entity Framework 6 against, say, NetTiers, which is powered by Microsoft Enterprise Library 6.
These are two entirely different beasts. The accepted answer is really good because it goes through the nuances of EF6. What's key to understand is that each ORM has its own strengths and weaknesses. Compare the project requirements and their data access patterns against the ORM's behavior patterns.
For example: NetTiers will always give you higher raw performance than EF6. However, that is primarily because it is not a point-and-click ORM, and as part and parcel of generating the ORM you will be optimizing your data model, adding custom stored procedures where relevant, and so on. If you engineered your data model with the same effort for EF6, you could probably get close to the same performance.
Also consider whether you can modify the ORM. For example, with NetTiers you can add extensions to the CodeSmith templates to include your own design patterns over and above what is generated by the base ORM library.
Also consider that EF6 makes significant use of reflection, whereas NetTiers, or any library powered by Microsoft Enterprise Library, will make heavy use of generics instead. These are two entirely different approaches. Why so? Because EF6 is based on dynamic reflection whereas NetTiers is based on static reflection. Which is faster and which is better depends entirely on the usage patterns that will be required of the ORM.
Sometimes a hybrid approach works better. Consider, for example, EF6 for Web API OData endpoints, a few large tables wrapped with NetTiers and Microsoft Enterprise Library with custom stored procedures, and a few large master-data tables wrapped with a custom-built write-through object cache where, on initial load, the record set is streamed into the memory cache using an ADO.NET data reader.
These are all different and they all have their best-fit scenarios: EF6, NetTiers, NHibernate, Wilson OR Mapper, XPO from DevExpress, and so on.
There is no simple answer to your question. The main thing is what you want to do with your data, and whether you need that much data at one time.
EF translates your queries to SQL, so at that point there are no objects in memory. When you fetch the data, the selected records are in memory. If you are selecting a large number of large objects, that can be a performance killer if you need to manipulate them all.
If you don't need to manipulate them all you can disable change tracking and enable it later for single objects you need to manipulate.
So you see it depends on your type of application.
If you need to manipulate a mass of data efficiently, then don't use an O/R mapper!
Otherwise EF is fine, but consider how many objects you really need at one time and what you want to do with them.
I want to understand the purpose of datasets when we can directly communicate with the database using simple SQL statements.
Also, which way is better: updating the data in the dataset and then transferring it to the database all at once, or updating the database directly?
I want to understand the purpose of datasets when we can directly communicate with the database using simple SQL statements.
Why do you have food in your fridge, when you can just go directly to the grocery store every time you want to eat something? Because going to the grocery store every time you want a snack is extremely inconvenient.
The purpose of DataSets is to avoid directly communicating with the database using simple SQL statements. The purpose of a DataSet is to act as a cheap local copy of the data you care about so that you do not have to keep making expensive high-latency calls to the database. They let you drive to the data store once, pick up everything you're going to need for the next week, and stuff it in the fridge in the kitchen so that it's there when you need it.
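A minimal sketch of that "fill the fridge once" idea (the connection string, query and "Products" table are hypothetical):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public static class ProductCatalogLoader
{
    // Hedged sketch: fill a DataSet once and read from the local copy instead
    // of querying the database for every read. The connection string, query
    // and table name ("Products") are hypothetical.
    public static DataSet LoadProducts(string connectionString)
    {
        var ds = new DataSet();
        using (var conn = new SqlConnection(connectionString))
        using (var adapter = new SqlDataAdapter("SELECT Id, Name, Price FROM Products", conn))
        {
            adapter.Fill(ds, "Products");   // one round trip to the database
        }
        return ds;
    }

    public static void PrintProducts(DataSet ds)
    {
        // All further reads hit the in-memory copy, not the database.
        foreach (DataRow row in ds.Tables["Products"].Rows)
            Console.WriteLine("{0}: {1}", row["Name"], row["Price"]);
    }
}
```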
Also, which way is better: updating the data in the dataset and then transferring it to the database all at once, or updating the database directly?
You order a dozen different products from a web site. Which way is better: delivering the items one at a time as soon as they become available from their manufacturers, or waiting until they are all available and shipping them all at once? The first way, you get each item as soon as possible; the second way has lower delivery costs. Which way is better? How the heck should we know? That's up to you to decide!
The data update strategy that is better is the one that does the thing in a way that better meets your customer's wants and needs. You haven't told us what your customer's metric for "better" is, so the question cannot be answered. What does your customer want -- the latest stuff as soon as it is available, or a low delivery fee?
DataSets support a disconnected architecture. You can add local data, delete from it, and then commit everything to the database using a SqlDataAdapter. You can even load an XML file directly into a DataSet. It really depends upon what your requirements are. You can even set up in-memory relations between tables in a DataSet.
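For instance, a small sketch of that disconnected flow (the table and column names are hypothetical): fill the DataSet, change it locally, then commit everything back with one Update() call.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class DisconnectedUpdateExample
{
    // Hedged sketch: work against the DataSet offline, then push all pending
    // inserts, updates and deletes back to the database with one Update() call.
    // The table and column names are hypothetical.
    public static void RenameProduct(string connectionString, int productId, string newName)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var adapter = new SqlDataAdapter("SELECT Id, Name FROM Products", conn))
        using (var builder = new SqlCommandBuilder(adapter))   // generates INSERT/UPDATE/DELETE commands
        {
            var ds = new DataSet();
            adapter.Fill(ds, "Products");

            // Local, disconnected change; nothing is sent to the database yet.
            foreach (DataRow row in ds.Tables["Products"].Select("Id = " + productId))
                row["Name"] = newName;

            // One call commits every pending change back to the database.
            adapter.Update(ds, "Products");
        }
    }
}
```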
And by the way, using direct SQL queries embedded in your application is a really poor way of designing an application. Your application will be prone to SQL injection. Secondly, if you write queries embedded in the application like that, SQL Server has to build an execution plan every time, whereas stored procedures are compiled and their execution plan is already decided when they are compiled. SQL Server can also change its plan as the data gets large, so you will get a performance improvement from this. At least use stored procedures and validate junk input in them; they are inherently resistant to SQL injection.
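For comparison, a minimal sketch of a parameterized stored-procedure call (the procedure and parameter names are hypothetical):

```csharp
using System.Data;
using System.Data.SqlClient;

public static class OrderRepository
{
    // Hedged sketch: a parameterized stored-procedure call. The procedure name
    // "dbo.GetOrdersForCustomer" and the parameter are hypothetical; the value
    // never becomes part of the SQL text, so it cannot inject SQL.
    public static DataTable GetOrdersForCustomer(string connectionString, int customerId)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.GetOrdersForCustomer", conn))
        using (var adapter = new SqlDataAdapter(cmd))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;

            var table = new DataTable();
            adapter.Fill(table);   // opens and closes the connection itself
            return table;
        }
    }
}
```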
Stored Procedures and Dataset are the way to go.
Edit: If you are on .NET Framework 3.5 or 4.0, you can use a number of ORMs like Entity Framework, NHibernate, or SubSonic. ORMs represent your business model more realistically. You can always use stored procedures with ORMs if some features are not supported by the ORM.
For example, if you are writing a recursive CTE (Common Table Expression), stored procedures are very helpful. You will run into too many problems if you use Entity Framework for that.
This page explains in detail in which cases you should use a DataSet and in which cases you should use direct access to the database.
My usual practice is that if I need to perform a bunch of analytical processes on a large set of data, I fill a DataSet (or a DataTable, depending on the structure). That way it is a disconnected model from the database.
But for DML queries I prefer quick hits directly to the database (preferably through stored procs). I have found this is the most efficient, and with well-tuned queries it is not bad at all on the DB.
So I have an application which requires very fast access to large volumes of data, and we're at the stage where we're undergoing a large redesign of the database, which gives a good opportunity to rewrite the data access layer if necessary!
Currently in our data access layer we use manually created entities along with plain SQL to fill them. This is pretty fast, but this technology is really getting old, and I'm concerned we're missing out on a newer framework or data access method which could be better in terms of neatness and maintainability.
We've looked at the Entity Framework, but after some research it just seems that the benefit of the ORM is not enough to justify the lower performance, and as some of our queries are getting complex, I'm sure performance with EF would become more of an issue.
So it is a case of sticking with our current methods of data access, or is there something a bit neater than manually creating and maintaining entities?
I guess the thing that's bugging me is just opening our data layer solution and seeing lots of entities, all of which need to be maintained exactly in line with the database, which sometimes can be a lot of work, but then maybe this is the price we pay for performance?
Any ideas, comments and suggestions are very appreciated! :)
Thanks,
Andy.
** Update **
Forgot to mention that we really need to be able to handle using Azure (client requirements), which currently stops us from using stored procedures.
** Update 2 **
Actually, we have an interface layer for our DAL, which means we can create an Azure implementation that just overrides the data access methods from the local implementation that aren't suitable for Azure. So I guess we could use stored procedures for performance-sensitive local databases, with EF for the cloud.
I would use an ORM layer (Entity Framework, NHibernate etc) for management of individual entities. For example, I would use the ORM / entities layers to allow users to make edits to entities. This is because thinking of your data as entities is conceptually simpler and the ORMs make it pretty easy to code this stuff without ever having to program any SQL.
For the bulk reporting side of things, I would definitely not use an ORM layer. I would probably create a separate class library specifically for standard reports, which creates SQL statements itself or calls sprocs. ORMs are not really for bulk reporting and you'll never get the same flexibility of querying through the ORM as through hand-coded SQL.
Stored procedures for performance. ORMs for ease of development
Do you feel up to troubleshooting some opaque generated SQL when it runs badly? Or when it generates several round trips where one would do? Or insists on using the wrong data types?
You could try using MyBatis (previously known as iBATIS). It allows you to map SQL statements to domain objects. This way you keep full control over the SQL being executed and get a cleanly defined domain model at the same time.
Don't rule out plain old ADO.NET. It may not be as hip as EF4, but it just works.
With ADO.NET you know what your SQL queries are going to look like because you get 100% control over them. ADO.NET forces developers to think about SQL instead of falling back on the ORM to do the magic.
If performance is high on your list, I'd be reluctant to take a dependency on any ORM, especially EF, which is new on the scene and highly complex. ORMs speed up development (a little) but are going to make your SQL query performance hard to predict, and in most cases slower than hand-rolled SQL/stored procs.
You can also unit test SQL/Stored Procs independently of the application and therefore isolate performance issues as either DB/query related or application related.
I guess you are using ADO.NET in your DAL already, so I'd suggest investing the time and effort in refactoring it rather than throwing it out.
Currently, my entire website does updating from SQL parameterized queries. It works, we've had no problems with it, but it can occasionally be very slow.
I was wondering if it makes sense to refactor some of these SQL commands into classes so that we would not have to hit the database so often. I understand hitting the database is generally the slowest part of any web application. For example, say we have a class structure like this:
Project (comprised of) Tasks (comprised of) Assignments
Where Project, Task, and Assignment are classes.
At certain points in the site you are only working on one project at a time, and so creating a Project class and passing it among pages (using Session, Profile, something else) might make sense. I imagine this class would have a Save() method to save value changes.
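For illustration, a rough sketch of the kind of class being described (the names, the dirty-tracking and the repository call are all hypothetical):

```csharp
using System;
using System.Collections.Generic;

[Serializable]
public class Task
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public interface IProjectRepository
{
    void Update(Project project);
}

// Hedged sketch: a Project object kept in Session between pages, with a Save()
// that only hits the database when something actually changed. All names,
// the dirty flag and IProjectRepository are hypothetical.
[Serializable]
public class Project
{
    private bool _isDirty;
    private string _name;

    public int Id { get; set; }
    public IList<Task> Tasks { get; private set; }

    public string Name
    {
        get { return _name; }
        set { _name = value; _isDirty = true; }
    }

    public Project()
    {
        Tasks = new List<Task>();
    }

    public void Save(IProjectRepository repository)
    {
        if (!_isDirty)
            return;                  // nothing changed, skip the database hit

        repository.Update(this);     // hypothetical data-access call
        _isDirty = false;
    }
}
```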
Does it make sense to invest the time into doing this? Under what conditions might it be worth it?
If your site is slow, you need to figure out what the bottleneck is before you randomly start optimizing things.
Caching is certainly a good idea, but you shouldn't assume that this will solve the problem.
Caching is almost always underutilized in ASP .NET applications. Any time you hit your database, you should look for ways to cache the results.
Serializing objects into the session can be costly in itself, but most likely faster than hitting the database every single time. You are benefiting now from execution plan caching in SQL Server, so it's very likely that what you're getting is optimal performance out of your stored procedure.
One option you might consider to increase performance is to abstract your data into objects via LINQ to SQL (against your sprocs) and then use AppFabric to cache the objects.
http://msdn.microsoft.com/en-us/windowsserver/ee695849.aspx
As for your updates, you should do those directly against the sprocs, but you will also need to clear out the cache in AppFabric for objects that are affected by the insert/update/delete.
You could also do the same thing simply using the standard Cache as well, but AppFabric has some added benefits.
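For instance, a cache-aside sketch against the standard ASP.NET cache (the key, the expiry and the LoadProductsFromDb loader are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ProductCache
{
    // Hedged sketch of the cache-aside pattern on the standard ASP.NET cache:
    // read from the cache first, hit the database only on a miss, and evict the
    // entry whenever an insert/update/delete invalidates it. The key, expiry
    // and LoadProductsFromDb are hypothetical.
    private const string CacheKey = "products.all";

    public static List<Product> GetProducts()
    {
        var cached = HttpRuntime.Cache[CacheKey] as List<Product>;
        if (cached != null)
            return cached;                          // cache hit, no database call

        var products = LoadProductsFromDb();        // placeholder for the sproc-backed loader
        HttpRuntime.Cache.Insert(
            CacheKey,
            products,
            null,                                   // no cache dependency
            DateTime.UtcNow.AddMinutes(10),         // absolute expiration
            Cache.NoSlidingExpiration);
        return products;
    }

    public static void InvalidateProducts()
    {
        // Call this after any insert/update/delete that affects the cached data.
        HttpRuntime.Cache.Remove(CacheKey);
    }

    private static List<Product> LoadProductsFromDb()
    {
        // Stand-in for the real data access (e.g. LINQ to SQL over a stored proc).
        return new List<Product>();
    }
}
```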
Use the SQL Profiler to identify your slowest queries, and see if you can improve them with some simple index changes (removing unused indexes, adding missing indexes).
You could very easily improve your application performance by an order of magnitude without changing your front-end app at all.
See http://sqlserverpedia.com/wiki/Find_Missing_Indexes
If you have lookup data only, you can store it in the Cache object. This will avoid hits to the DB. Only data that can be used globally should be stored in the Cache.
If this data requires filtering, you can retrieve it from the Cache and filter it before rendering.
Session can be used to store user-specific data, but take care: too many session variables can easily cause performance problems.
This book may be helpful.
http://www.amazon.com/Ultra-Fast-ASP-NET-Build-Ultra-Scalable-Server/dp/1430223839
Currently I am working with a custom business object layer (adopting the facade pattern) in which the objects' properties are loaded from stored procedures and which also provides a place for business logic. This has been working well in the attempt to move our code base to a more tiered and standardized application model, but I feel that this approach is more of an evolutionary step than a permanent one.
I am currently looking into moving to a more formal framework so that certain architecture decisions won't have to be my own. In the past I have worked with CSLA and LINQ to SQL; while I like a lot of the design decisions in CSLA, I find it a bit bloated for my tastes, and LINQ to SQL might not have the performance I want. I have been interested in the popularity of NHibernate and the push toward LINQ to Entities; however, performance is a key issue, since there are instances where a large number of records need to be fetched at a time (> 15k) (please do not debate the reason for this). As far as performance goes, which looks like the best choice for adopting a formal .NET object framework?
NOTE: This will be used primarily in Winform and WPF applications.
Duplicate: https://stackoverflow.com/questions/146087/best-performing-orm-for-net
http://ormbattle.net - the performance tests there seem to be almost exactly what you want to see.
You should look at the materialization test (the performance of fetching a large number of items is exactly what it shows); moreover, you can compare ORM performance with the performance of nearly ideal SQL on plain ADO.NET doing the same.
With any ORM you're going to get a boost out of the box via a Level 1 in-proc cache. Especially with loads: if the object is already there, it won't take a trip to Pluto (the DB). Most ORMs also give you the opportunity to plug in an L2 out-of-proc cache. The nice thing about these is that they just plug into the ORM. Check out NCache for NHibernate.
O/R mapper performance will depend greatly on how your application is designed and how you map the business objects. For example, you could easily kill performance by lazy loading a child object in a loop, so that 1 select for 1000 objects turns into 1001 selects (google "n+1 select").
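A quick sketch of that n+1 shape and one fix, using EF syntax as an example (NHibernate has equivalent fetch options); the context and entities are made up for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity;   // DbContext, DbSet, Include()
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<PhoneNumber> PhoneNumbers { get; set; }  // virtual -> lazy-loaded
}

public class PhoneNumber
{
    public int Id { get; set; }
    public string Number { get; set; }
}

public class DirectoryContext : DbContext
{
    public DbSet<Person> People { get; set; }
}

public static class PhoneNumberReport
{
    // Hedged sketch of the n+1 select problem and an eager-loading fix.
    public static void PrintCounts()
    {
        using (var db = new DirectoryContext())
        {
            // BAD: 1 SELECT for the people, then 1 more SELECT per person when the
            // lazy PhoneNumbers collection is touched -> 1001 queries for 1000 people.
            foreach (var person in db.People.ToList())
                Console.WriteLine("{0}: {1}", person.Name, person.PhoneNumbers.Count);

            // BETTER: a single query that eagerly fetches the collection.
            foreach (var person in db.People.Include(p => p.PhoneNumbers).ToList())
                Console.WriteLine("{0}: {1}", person.Name, person.PhoneNumbers.Count);
        }
    }
}
```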
One of the biggest performance gains with o/r mappers is in developer productivity, which is often more important than application performance. Application performance is usually acceptable to end users for most applications running on recent hardware. Developer performance continues to be a bottleneck, no matter how much Mountain Dew is applied to the problem. :-)