I am going back and forth between using NHibernate and hand-written ADO.NET/stored procedures.
I currently use CodeSmith with templates I wrote that spit out simple classes mapping my database tables, wrap my stored procedures for my data layer, and generate a thin business logic layer that just calls the data layer and returns the objects (a single object or a collection).
This application is a web application, used for online communities (basically a forum).
I am watching the Summer of NHibernate videos right now.
Will using nHibernate make my life easier? Will updates to the database schema be any easier? What effects will there be on performance?
Is setting up NHibernate, and ensuring it performs optimally, a headache of its own?
I don't want a complicated or deep OOP model; I simply want classes that map my tables, and a way to fetch data from the other tables that have foreign keys to them.
NHibernate can definitely make your life easier, and updates to your database schema will be easier too, because when you use an ORM you don't have an API of stored procedures hindering you from refactoring your database schema to meet changes in your business model.
OR mappers have a LOT to offer, and are sadly misunderstood by a significant portion of the developer community, and almost all of the DBA community.
Stored procedures in general give the DBA more options for tuning performance in a database, because they have the freedom to rewrite a stored proc so long as they don't change its output. However, in my experience, stored procedures are rarely rewritten, due to other issues that can arise as a result (e.g. when a new version of the software is deployed, any modified versions of existing procs will overwrite the version optimized by the DBA, negating the benefit and creating both a maintenance problem and unexpected performance regressions).
Another grave misconception (and this is primarily from the SQL Server camp; I have very little experience with Oracle) is that stored procedures are the only thing whose execution plan can be compiled and cached. As far as SQL Server is concerned, any parameterized query can, and probably will, be compiled and its plan cached.
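To make the parameterized-query point concrete, here is a minimal ADO.NET sketch; the connection string, table, and column names are hypothetical. Because the command text stays constant and the value travels as a parameter, SQL Server can compile the statement once and reuse the cached plan for any value of @name, no stored procedure required:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class UserQueries
{
    // Hypothetical table/column names; the point is the parameter placeholder.
    public static List<string> FindUserEmails(string connectionString, string name)
    {
        var emails = new List<string>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Email FROM Users WHERE Name = @name", conn))
        {
            // The value never becomes part of the SQL text, so the plan is reusable.
            cmd.Parameters.Add("@name", SqlDbType.NVarChar, 50).Value = name;
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    emails.Add(reader.GetString(0));
            }
        }
        return emails;
    }
}
```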
A benefit of OR mappers is that they are adaptive: with a stored procedure, you write a single statement that will be used regardless of contextual nuances when that query is executed. LINQ to SQL has an amazing capacity to generate the most efficient queries I've ever seen, and often throws DBAs for a serious loop. I've shown DBAs queries generated by L2S that were full of subqueries and unconventional constructs, which were immediately scoffed at. However, given the challenge, the supposedly superior query written by a DBA often ended up performing significantly worse (sometimes on the scale of 30 physical reads for L2S vs. 400 physical reads for the DBA's version).
Another detractor, as far as DBAs are concerned, is that because ORMs generate dynamic SQL, they have no way to optimize those queries. On the contrary (and again, this is restricted to SQL Server), SQL Server offers a multitude of optimization paths (horizontal and vertical table partitioning, distribution of physical files across disks for any table or view, indexes, etc.) that can be taken before modifying a query becomes a necessity. Even when a query does need to be modified, SQL Server 2005 and later provide Plan Guides, which allow you to moderately tune any query (stored proc, straight SQL, etc.). If tuning a query isn't enough, you can match any particular query to a complete replacement query, allowing the DBA to tune it as much as they need to (but as a last resort).
There are many, many benefits to be gained by using an OR mapper, and NHibernate is one of the best free ones (LLBLGen is also very nice, but is not free). LINQ to SQL and Entity Framework are newer offerings from Microsoft (L2S is soon to be replaced by EF 4.0 in the .NET 4.0 framework, which will at least rival, if not outpace, NHibernate). The biggest hurdle to adopting an ORM is usually not the ORM product itself, nor its capabilities or performance. The greatest hurdle is usually convincing your DBA (if you're lucky/unlucky enough, depending on your experience, to have one) that an ORM can improve efficiency and reduce maintenance costs without costing the DBA optimization paths.
NHibernate works very well, especially for a simple model. It will make your life much easier and isn't too tough to learn. Look at "Fluent NHibernate" instead of using XML mappings; it is much easier.
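For a feel of what that looks like, here is a hedged Fluent NHibernate sketch; the forum-style entities and column names are invented for illustration:

```csharp
using FluentNHibernate.Mapping;

// Hypothetical forum-style entities; names are illustrative only.
public class Member
{
    public virtual int Id { get; set; }          // NHibernate proxies need virtual members
    public virtual string UserName { get; set; }
}

public class Post
{
    public virtual int Id { get; set; }
    public virtual string Body { get; set; }
    public virtual Member Author { get; set; }   // follows the FK to the Members table
}

public class PostMap : ClassMap<Post>
{
    public PostMap()
    {
        Table("Posts");
        Id(x => x.Id);
        Map(x => x.Body);
        References(x => x.Author, "AuthorId");   // many-to-one via the AuthorId column
    }
}
```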
I am currently trying to find different areas where LINQ is not sufficient and FromSqlRaw or ExecuteSqlRaw have to be used (both sketched below).
Some examples I have found are:
Bulk updates https://learn.microsoft.com/en-us/ef/core/performance/efficient-updating
Executing stored procedures https://learn.microsoft.com/en-us/ef/core/querying/raw-sql
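For reference, a rough sketch of those two escape hatches in EF Core; db is an assumed DbContext, and the procedure and table names are invented:

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

// A stored procedure materialized onto an entity type:
var blogs = db.Blogs
    .FromSqlRaw("EXEC dbo.GetBlogsByAuthor @p0", authorId)
    .ToList();

// A set-based bulk update that LINQ could not express before
// ExecuteUpdate arrived in EF Core 7:
int affected = db.Database.ExecuteSqlRaw(
    "UPDATE Blogs SET Rating = Rating + 1 WHERE AuthorId = {0}", authorId);
```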
However, I am looking for more areas where LINQ does not perform well enough, and even for queries that cannot be generated from LINQ in EF Core when it comes to database access.
My goal is to find poorly performing LINQ translations and examine the cause.
This is a bit of a solution looking for a problem. Given an application, hunting for inefficiencies that might benefit from a different approach is something I would start with a profiler, observing the database access in as close to a production capacity as I am allowed to get.
EF is like any tool: it can be leveraged to create works of art, and it can be abused and misused to create shanties. Even when EF is used correctly, optimizations like indexes are tuned based on real-world observation. There are many options I would look at to address performance issues before considering going direct to SQL. Typical culprits that can be easily identified via profiling:
Lazy loading. (Dozens to hundreds of queries following up a "main" query; see the sketch after this list.)
Over-use of eager loading. (Queries involving a heck of a lot of joins)
Sloppy use of client-side evaluation. (Either enabling that feature in EF Core, or slapping a ToList somewhere to "fix" a query that complains. AsSplitQuery can help here; projection is a better solution in most cases.)
Lack of pagination where more data is returned than necessary. (Similar to the previous point: having methods like "GetAll" and then applying filtering, pagination, etc. afterwards.)
Giving users more querying flexibility than they need 99% of the time, so that in the 1% of cases where someone does try it, the system grinds to a halt. (Giving users filters/sorts on ALL columns and performing things like string.Contains by default for text searches.)
Giving users access to expensive, but necessary queries in real-time. (Big, justified queries, but being run against the production dataset and not "throttled" by something like a Queue to ensure too many of these monsters don't get run at once.)
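As a concrete illustration of the first culprit, here is a minimal EF Core sketch; db is an assumed DbContext with lazy loading enabled, and the Blog/Post model is invented. The projection at the end avoids the problem:

```csharp
using System;
using System.Linq;

// With lazy loading enabled, this loop issues one query for the blogs
// plus one query per blog for its posts (the N+1 pattern):
var blogs = db.Blogs.ToList();                             // 1 query
foreach (var blog in blogs)
    Console.WriteLine($"{blog.Name}: {blog.Posts.Count}"); // +1 query per blog

// Projection pulls exactly what the caller needs in a single query:
var summaries = db.Blogs
    .Select(b => new { b.Name, PostCount = b.Posts.Count() })
    .ToList();
```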
Those are some of the top culprits that come to mind around performance, and none of them resorts to going to SQL. Batch processing in your list is certainly one case that I believe deserves looking outside of LINQ, and potentially outside of EF altogether. Stored procs I am mixed on: they make sense if there is business logic shared between an EF-supported application and another existing system and I want to share that logic as-is. The trouble is that if I'm relying on the sproc for business rules, then there's little point to EF, and if I'm splitting business rules between C#/EF and sprocs, then I'm managing logic in two locations.
I've used the Entity Framework in a couple of projects. In every project, I've used stored procedures mapped to the entities because of the well-known benefits of stored procedures: security, maintainability, etc. However, 99% of the stored procedures are basic CRUD stored procedures. This seems to negate one of the major, time-saving features of the Entity Framework: SQL generation.
I've read some of the arguments regarding stored procedures vs. generated SQL from the Entity Framework. While using CRUD SPs is better for security, and the SQL generated by EF is often more complex than necessary, does it really buy anything in terms of performance or maintainability to use SPs?
Here is what I believe:
Most of the time, modifying an SP requires updating the data model anyway. So, it isn't buying much in terms of maintainability.
For web applications, the connection to the database uses a single user ID specific to the application. So, users don't even have direct database access. That reduces the security benefit.
For a small application, the slightly decreased performance from using generated SQL probably wouldn't be much of an issue. For high-volume, performance-critical applications, would EF even be a wise choice? Plus, are the insert/update/delete statements generated by EF really that bad?
Sending every attribute to a stored procedure has its own performance penalties, whereas the EF-generated code only sends the attributes that were actually changed. When updating large tables, the increased network traffic and overhead of updating all attributes probably negates the performance benefit of stored procedures.
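A minimal sketch of that behavior; db is an assumed DbContext, and Users, Email, userId, and newEmail are hypothetical names:

```csharp
var user = db.Users.Find(userId);  // load the entity and start tracking it
user.Email = newEmail;             // the change tracker marks only Email as modified
db.SaveChanges();                  // generates roughly:
                                   // UPDATE Users SET Email = @0 WHERE Id = @1
```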
With that said, my specific questions are:
Are my beliefs listed above correct? Is the idea of always using SPs something that is "old school" now that ORMs are gaining in popularity? In your experience, which is the better way to go with EF -- mapping SPs for all insert / update / deletes, or using EF generated SQL for CRUD operations and only using SPs for more complex stuff?
I think always using SPs is somewhat old school. I used to code that way, and now do everything I can in EF-generated code... and when I have a performance problem or other special need, I add back in a strategic SP to solve that particular problem. It doesn't have to be either/or: use both.
All my basic CRUD operations are straight EF-generated code. My web apps used to have hundreds of SPs; now a typical one has a dozen, and everything else is done in my C# code... and my productivity has gone WAY up by eliminating those 95% of CRUD stored procs.
Yes, your beliefs are absolutely correct. Using stored procedures for data manipulation mainly makes sense if:
Database follows strict security rules where changing data is allowed only through stored procedures
You are using views or custom queries for mapping your entities and you need advanced logic in stored procedure to push data back
You have some advanced logic (related to data) in the procedure for any other reason
Using procedures for pure CUD where none of the mentioned cases applies is redundant, and it doesn't provide any measurable performance boost, with a single exception:
You will use stored procedures for batch / bulk modifications
EF doesn't have bulk/batch functionality, so changing 1,000 records results in 1,000 updates, each executed with a separate database roundtrip! But such procedures cannot be mapped to entities anyway; they must be executed separately via a function import (if possible) or directly via ExecuteStoreCommand or old ADO.NET (for example, if you want to use a table-valued parameter).
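For illustration, a hedged sketch against the EF 4 ObjectContext API; the table and column names are invented. One set-based statement replaces 1,000 entity-by-entity round trips:

```csharp
// context is an assumed EF 4 ObjectContext; Orders, Status, and CustomerId
// are hypothetical. The placeholders become DbParameters, not string splices.
int rowsAffected = context.ExecuteStoreCommand(
    "UPDATE Orders SET Status = {0} WHERE CustomerId = {1}",
    newStatus, customerId);
```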
The R in CRUD can be a whole different story: stored procedures with your own optimized queries can deliver a significant performance boost when reading data.
If performance is your primary concern, then you should take one of your existing apps that uses EF with SPs, disable the SPs, and benchmark the new version. That's the only way to get an answer perfectly applicable to your situation. You might find that EF no matter what you do isn't fast enough for your performance needs compared to custom code, but outside of very high volume sites I think EF 4.1 is actually pretty reasonable.
From my PoV, EF is a great developer productivity boost. You lose a fair bit of that if you're writing SPs for simple CRUD operations, and for Insert/Update/Delete in particular I really don't see you gaining much in performance because those operations are so straightforward to generate SQL for. There are definitely some Select cases where EF will not do the optimal thing and you can get major performance increases by writing a SP (hierarchical queries using CONNECT BY in Oracle come to mind as an example).
The best way to deal with that type of thing is to write your app letting EF generate the SQL. Benchmark it. Find areas where there's performance issues and write SPs for those. Delete is almost never going to be one of the cases you need to do this.
As you mentioned, the security gain here is somewhat lessened because you should have EF on an Application tier that has its own account for the app anyway, so you can restrict what it does. SPs do give you a bit more control but in typical usage I don't think it matters.
It's an interesting question that doesn't have a truly right or wrong answer. I use EF primarily so that I don't have to write generic CRUD SPs and can instead spend my time working on the more complex cases, so for me I'd say you should write fewer of them. :)
I agree broadly with E.J, but there are a couple of other options. It really boils down to the requirements for the particular system:
Do you need to get the app developed FAST? - Then use entity framework and its automatic SQL
Need fine-grained and solid security? - Get onto stored procedures
Need it to run as fast as possible? - You're probably looking at some happy medium!
In my opinion, as long as your application/database does not suffer from performance issues and you are mostly using the database for CRUD, accessing it with just one DB user, it is better to use generated SQL. It is faster to develop and more maintainable, and the modest security and privacy benefits are not worth the trade-off (if the data is not especially sensitive). Also, model-based database access and LINQ neutralize the threat of SQL injection.
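As a small illustration of that last point; db and the Users set are assumed names:

```csharp
using System.Linq;

// Concatenated SQL is injectable: input can break out of the string literal.
string bad = "SELECT * FROM Users WHERE Name = '" + input + "'";

// A LINQ query is translated with parameters, so the input value is never
// spliced into the SQL text.
var safe = db.Users.Where(u => u.Name == input).ToList();
```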
So I have an application which requires very fast access to large volumes of data, and we're at the stage where we're undergoing a large redesign of the database, which gives a good opportunity to rewrite the data access layer if necessary!
Currently in our data access layer we use manually created entities along with plain SQL to fill them. This is pretty fast, but this technology is really getting old, and I'm concerned we're missing out on a newer framework or data access method which could be better in terms of neatness and maintainability.
We've seen the Entity Framework, but after some research it just seems that the benefit of the ORM it gives is not enough to justify the lower performance and as some of our queries are getting complex I'm sure performance with the EF would become more of an issue.
So is it a case of sticking with our current methods of data access, or is there something a bit neater than manually creating and maintaining entities?
I guess the thing that's bugging me is opening our data layer solution and seeing lots of entities, all of which need to be maintained exactly in line with the database, which can sometimes be a lot of work. But then, maybe this is the price we pay for performance?
Any ideas, comments and suggestions are very appreciated! :)
Thanks,
Andy.
** Update **
Forgot to mention that we really need to be able to handle using Azure (a client requirement), which currently stops us from using stored procedures.
** Update 2 **
Actually, we have an interface layer for our DAL, which means we can create an Azure implementation that just overrides the data access methods from the local implementation which aren't suitable for Azure. So I guess we could use stored procedures for performance-sensitive local databases, with EF for the cloud.
I would use an ORM layer (Entity Framework, NHibernate etc) for management of individual entities. For example, I would use the ORM / entities layers to allow users to make edits to entities. This is because thinking of your data as entities is conceptually simpler and the ORMs make it pretty easy to code this stuff without ever having to program any SQL.
For the bulk reporting side of things, I would definitely not use an ORM layer. I would probably create a separate class library specifically for standard reports, which creates SQL statements itself or calls sprocs. ORMs are not really for bulk reporting and you'll never get the same flexibility of querying through the ORM as through hand-coded SQL.
Stored procedures for performance; ORMs for ease of development.
Do you feel up to troubleshooting some opaque generated SQL when it runs badly? When it generates several round trips where one would do? Or insists on using the wrong datatypes?
You could try using MyBatis (previously known as iBATIS). It allows you to map SQL statements to domain objects. This way you keep full control over the SQL being executed and get a cleanly defined domain model at the same time.
Don't rule out plain old ADO.NET. It may not be as hip as EF4, but it just works.
With ADO.NET you know what your SQL queries are going to look like because you get 100% control over them. ADO.NET forces developers to think about SQL instead of falling back on the ORM to do the magic.
If performance is high on your list, I'd be reluctant to take a dependency on any ORM especially EF which is new on the scene and highly complex. ORM's speed up development (a little) but are going to make your SQL query performance hard to predict, and in most cases slower than hand rolled SQL/Stored Procs.
You can also unit test SQL/Stored Procs independently of the application and therefore isolate performance issues as either DB/query related or application related.
I guess you are using ADO.NET in your DAL already, so I'd suggest investing the time and effort in refactoring it rather than throwing it out.
I work in a team with 15 developers in a large enterprise. We deal with many tables that have millions of records on the OLTP database. The data warehouse database is much larger.
We're embarking on developing a new system that will go against a very similarly sized database. Each one of us is highly proficient and very comfortable with SQL, stored procedures, etc., examining execution plans, defining the correct indexes, and so on. We're all also very comfortable in .NET, C#, and ASP.NET.
However, we've each been looking into ORMs independently, and none of us is able to understand the real problem they solve. On the contrary, what we do see is the performance issues people have and all the tweaks that need to be done to work around what ORMs lack.
Another aspect seems to be that people use ORMs so as not to have to get their hands dirty dealing with the database and SQL, but in fact it seems that you can't escape it for long, especially when it comes to performance.
So my question is: what are the problems ORMs solve (or attempt to solve)?
I should note that we have about 900 tables and over 2,000 stored procedures in our OLTP database, our data layer is auto-generated off of our stored procedures, and we currently use core ADO.NET.
ORMs attempt to solve the object-relational impedance mismatch.
That is, since relational databases are ... relational in nature, the data modeling used for them is very different from the type of modeling you would use in OOP.
This difference is known as the "object-relational impedance mismatch". ORMs try to let you use OOP without consideration of how the database is modeled.
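A tiny illustrative sketch of the mismatch, with invented types:

```csharp
using System.Collections.Generic;

// In the object model, a customer owns its orders as an in-memory graph...
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Order> Orders { get; } = new List<Order>();
}

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

// ...while the database stores two flat tables related by a key
// (Orders.CustomerId -> Customers.Id). The ORM's job is translating
// between these two representations.
```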
ORMs are for object-oriented people who think that everything works better when they're in objects. They have their best opportunity on projects where OO skills are stronger and more plentiful than SQL and relational databases.
If you're writing in an object-oriented language, you'll have to deal with objects at one time or another. Whether you use an ORM or not, you'll have to get that data out of the tables and into objects on the middle tier so you can work with them. An ORM can help you with the tedium of mapping (sketched below). But it's not 100% necessary; it's a choice like any other.
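Here is the kind of hand-written mapping an ORM automates; the User class and Users columns are assumed names:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class UserMapper
{
    // Mapping by hand: reader is an open SqlDataReader over a hypothetical
    // Users table. An ORM generates the equivalent of this for every entity.
    public static List<User> MapUsers(SqlDataReader reader)
    {
        var users = new List<User>();
        while (reader.Read())
        {
            users.Add(new User
            {
                Id = reader.GetInt32(reader.GetOrdinal("Id")),
                Name = reader.GetString(reader.GetOrdinal("Name")),
            });
        }
        return users;
    }
}
```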
Don't make the mistake of writing a client/server app with objects. If your objects are little more than carriers of data from the database to the client, I'd say you're doing something wrong. There's value in encapsulating behavior in objects where it makes sense.
I think, as with all these things, the answer to the question of whether you should use an ORM is 'it depends'. In a lot of cases, people may be writing relatively simple applications with relatively small databases in the background.
In these cases, an ORM makes sense, as it allows for easy maintenance (adding a column is a one-place change that ripples through your application) and quick turnaround.
However, if you are dealing with very large databases and complex data manipulation, then an ORM is possibly not for you. That said, tables with millions of rows should still not be a problem for an ORM, it all depends on how you return and use the data - a well structured database should allow for reasonable performance.
In your case, you can't see the benefit because it's maybe not suitable for your application; it is for some others.
BTW, what you describe in your question (stored procedures used to generate classes used in your business layer) is essentially what a good ORM gives you: getting away from writing the boilerplate data access code and working on the business logic instead.
Very good question. It depends on what you are building.
If you have complex object structures (objects with many relationships and encapsulated objects) and you manipulate these objects in memory before committing the transaction, it is much easier to use an ORM like Hibernate, because you don't need to worry about SQL, caching, just-in-time object loading, etc. As a bonus you get pretty good database independence.
If you have very simple objects in your application, without much functionality/methods, you can of course use plain SQL over a DB connection. However, I would recommend using an ORM anyway, because you will be database independent and more portable, and you will be ready to grow when your system needs just-in-time object loading, long transactions (with caching), etc.
I have worked with many persistence frameworks in my life and I would recommend Hibernate.
In short, it's really trying to make the DB appear as objects to the programmer, so the programmer can be more productive. The only benefit (besides assisting dev laziness) is that the columns mapped to classes are strongly typed: no more trying to fetch a string into an integer variable!
From all the years of ORM libraries that have come and gone, they all seem (IMHO) to be just another programmer toy. Ultimately, it's another abstraction that might help you when you start out, or when writing small apps. My opinion here is that once you learn a good way of accessing the DB, you might as well continue to use that way rather than learn many ways of doing the same task. I always feel more productive being a specialist, but that could just be me.
The tools required to create and maintain the mapping, generate the classes, etc. are another nuisance. In this regard a built-in framework (e.g. Ruby on Rails' ActiveRecord approach) is much better.
Performance can be an issue, as can the SQL that is generated at the back end: you will nearly always be fetching much more data than you need when using an ORM, compared to the small SQL statements you might otherwise write.
The strong typing is good though, and I would praise ORMs for that.
Currently I am working with a custom business object layer (adopting the facade pattern) in which the object's properties are loaded from stored procedures as well as provide a place for business logic. This has been working well in the attempt to move our code base to a more tiered and standardized application model but feel that this approach is more of an evolutionary step rather than a permanent one.
I am currently looking into moving to a more formal framework so that certain architecture decisions won't have to be my own. In the past I have worked with CSLA and LINQ to SQL, and while I like a lot of the design decisions in CSLA, I find it a bit bloated for my tastes, and LINQ to SQL might not have the performance I want. I have been interested in the popularity of NHibernate and the push toward LINQ to Entities; however, performance is a key issue, since there are instances where a large number of records need to be fetched at a time (> 15k) (please do not debate the reason for this). So I am curious: performance-wise, what looks like the best choice for adopting a formal .NET object framework?
NOTE: This will be used primarily in Winform and WPF applications.
Duplicate: https://stackoverflow.com/questions/146087/best-performing-orm-for-net
http://ormbattle.net - the performance tests there seem to be almost exactly what you want to see.
You should look at the materialization test (it shows exactly the performance of fetching a large number of items); moreover, you can compare ORM performance with the performance of nearly ideal SQL on plain ADO.NET doing the same thing.
With any ORM you're going to get a boost out of the box via a first-level (in-process) cache. Especially with loads: if an object is already there, fetching it won't take a trip to Pluto (the DB). Most ORMs also give you the opportunity to plug in a second-level (out-of-process) cache, and the nice thing is that it just plugs into the ORM. Check out NCache for NHibernate.
O/R mapper performance will depend greatly on how your application is designed and how you map the business objects. For example, you could easily kill performance by lazy loading a child object in a loop, so that 1 select for 1,000 objects turns into 1,001 selects (google "n+1 select"). A sketch of the problem and the fix follows.
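A hedged NHibernate sketch; session is an assumed ISession, and the Order/Customer model is invented:

```csharp
using System;
using System.Linq;
using NHibernate.Linq;

// Lazy loading a child in a loop:
var orders = session.Query<Order>().ToList();   // 1 select
foreach (var o in orders)
    Console.WriteLine(o.Customer.Name);         // +1 select per order (n+1)

// Eager fetching collapses it back into a single joined select:
var fetched = session.Query<Order>()
    .Fetch(o => o.Customer)
    .ToList();
```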
One of the biggest performance gains with o/r mappers is in developer productivity, which is often more important than application performance. Application performance is usually acceptable to end users for most applications running on recent hardware. Developer performance continues to be a bottleneck, no matter how much Mountain Dew is applied to the problem. :-)