Usage of raw SQL in EF Core 6 - c#

I am currently trying to find different areas where Linq is not sufficient and FromSqlRaw or ExecuteSqlRaw have to be used.
Some examples I have found are
Bulk updates https://learn.microsoft.com/en-us/ef/core/performance/efficient-updating
Executing stored procedures https://learn.microsoft.com/en-us/ef/core/querying/raw-sql
However I am looking for more areas where Linq does not perform good enough and even queries that cannot be generated from Linq in EF Core when it comes to database access.
My goal is to find poor performing Linq translations and examine the cause.

This is a bit of a solution looking for a problem. Given an application and looking for inefficiencies that might benefit from a different approach is something I would start off using a profiler and observing the database access in as close to a production capacity as I am allowed to get.
EF is like any tool, it can be leveraged to create works of art, and it can be abused and misused to create shanties. Even when done correctly, optimizations like indexes are something that are tuned based on looking at real-world options. There are many options that I would look at to address performance issues before considering direct to SQL. Typical culprits that can be easily identified via profiling:
Lazy loading. (Dozens to hundreds of queries following up a "main" query.)
Over-use of eager loading. (Queries involving a heck of a lot of joins)
Sloppy use of client-side evaluation. (Either enabling that feature in EF Core, or slapping a ToList somewhere when a query complains to "fix" it, AsSplitQuery can help here, Projection is a better solution in most cases)
Lack of pagination where more data is returned than necessary. (Similar to #3, having methods like "GetAll" and then applying filtering, pagination, etc.)
Giving users too much flexibility in querying that they don't need 99% of the time, but in that 1% someone does try it, grinds the system to a halt. (Giving users filters/sorts on ALL columns and performing things like string.Contains by default for text searches)
Giving users access to expensive, but necessary queries in real-time. (Big, justified queries, but being run against the production dataset and not "throttled" by something like a Queue to ensure too many of these monsters don't get run at once.)
Those are some of the top culprits that come to mind around performance, and none of them resort to going to SQL. Batch processing in your list is certainly one case that I believe does deserve looking outside of Linq, and potentially outside of EF all-together. Stored Procs I am mixed on. If there is business logic that is shared between an EF-supported application and another existing system and I want to share that business logic as-is. The trouble is that if I'm relying on the Sproc for business rules then there's little point to EF, and if I'm splitting business rules between C#/EF and Sprocs, then that's having to manage logic in two locations.

Related

Is there a way to break DbContext to area in ASP.NET 6 framework?

I am creating a web application on the top of ASP.NET 6 framework. I am trying to figure out the best ORM to use for this project. I am leaning toward Entity Framework for the following reason
I'll be able to use LINQ to write my queries
I'll be able to access my relations easily and directly using native C# model.
Here is where the complication starts. This app will be connecting to a very large database with over 500 tables. Also, the app is going to be broken down into many small logical areas so it's easy for me to maintain it.
If Entity Framework is the way to go, how should I setup the DbContext so I can manage 500+ DbSet and the relations? In other words, should I create a single DbContext for the entire app even when my app is broken down into multiple Areas? Or should I create a DbContext for each area? But if I do that, what if I need to establish relation across multiple areas? For example, X model in X-area need to create a relation to B model in B-area and C model in C-area? I thought about introducing DbContext inheritance where CAreaDbContext would inherit from BAreaDbContext which inherits from AAreaDbContext but that would break real quick.
Is Entity Framework if the right framework for a large database app? If so, how can I manage the DbContext across multiple areas? If not, what would be the alternative to use without having to write plain SQL queries?
EF is perfectly fine for large databases. When mapping a large number of tables and relationships there is a single-time startup cost for the very first query as EF initializes and validates its mapping, but this is a static cost for an application, not each time a DbContext is initialized.
You can split the application across several DbContexts to help make organizing entities more logical and reduce those initial setup costs. This is generally referred to as using Bounded Contexts if you want to search up examples. These typically organize your application down to aggregate roots or top-level entities with everything else falling under those aggregates or serving as lookups, etc. Entities can be registered with multiple DbContexts, though you should aim to ensure that one aggregate root is nominated for being responsible for editing and creating a given entity.
The most important details to consider with EF and areas of performance and avoiding unwanted/unexpected behaviour would be to ensure you generally don't load more data than you need through the entities, more often than you need to.
Some general advice would include:
Absolutely AVOID the temptation to use the Generic Repository pattern with EF. Non-generic Repositories are great to facilitate unit testing or centralize important, common rules/validation, but Generic flavours lead to inefficient and expensive, or overly complex code, usually both.
Keep DbContext lifetimes as short as possible. For Web applications this should be kept no longer than the Request length (when using an IoC container for instance) or shorter. Worst case, use using blocks to scope your DbContext. The longer a DbContext is kept alive, the more entities it tracks, and the more it tracks, the more it needs to sift through looking for references when loading other entities that might have navigation properties and the slower it gets. Long-lived DbContexts can also get "poisoned" when you have an issue attempting to save entity changes. Those invalid entities will remain tracked by the DbContext and interfere with future unrelated SaveChanges calls until they are removed (Detached) or corrected.
Gain an understanding of Projection using Select or AutoMapper's ProjectTo method. Loading entire entity graphs will get expensive, especially if the DbContext is left to track all of those instances. Projection down to ViewModels/DTOs help ensure that only as much data is needed is ever loaded and transmitted and makes it crystal clear what is being passed around. (As opposed to passing detached entities, or worse, partially filled detached entities)
Understand IQueryable and everything that Linq can bring to working with the data. EF query building is extremely valuable, so you can leverage sorting, filtering, pagination, projection, as well as scenarios to get Counts and check existence (.Any()) all without fetching a ton of data via Entities. See point #1 to avoid falling into this trap.
Use ToList/ToListAsync sparingly and be aware that any logic you feed EF in Linq expressions needs to be able to translated down to SQL. Sometimes you will find yourself trying to build a query where EF complains that it cannot evaluate your expression. Things like calling private methods / unmapped properties. Adding a ToList before the expression will seem like a magic fix, forcing a client-side evaluation. This is an expensive operation as you are effectively fetching (and typically tracking) all entities up to that point then continuing in memory. This gets expensive for memory use.
Asynchronous methods are not a silver bullet and does not make queries faster. Awaiting asynchronous EF methods is very useful when you have queries that are going to take a while to run, or be called extremely often. My advice is to default to synchronous methods and test run your code against production-like volumes as early as possible. I use 250ms as a threshold, but pick something acceptable to you and profile your queries. Anything over that threshold is something that would likely benefit from being made asynchronous. Typically things like searches, especially ones that involve text match searches are good candidates as these can be a bit slow and are generally run fairly frequently by several users at a time. The same goes for any operation that might get called a lot through the course of an application by many users at the same time. async/await doesn't make queries faster, they make them slightly slower, but they do make your server more responsive by not hanging the request until the query finishes. Using this by default makes your code a touch slower and a bit tougher to debug for no real benefit. (As it can easily be introduced as needed.)
Profile your queries. With traditional data access you would create your schema and write your access queries (Sprocs etc.) creating indexes as you go. With EF building your queries, indexing becomes more of a reactionary process where you might add your typical indexes, but should look at the queries being run in a production-like scenario to refine indexes based on high-volume queries that EF is building. This also provides key insight into other inefficiencies that might creep into your queries, as well as performance problems like lazy loading being tripped. Expensive queries should be investigated and optimized where possible.
Prepare to employ things like queuing for truly expensive queries. Systems will often call for things like Reports and data exports or just really expensive query options. Aim to set reasonable expectations by default so for instance avoiding things like string Contains() in text searches opting for string StartsWith(). Where you do need to support expensive queries, build a mechanism to allow users/processes to queue the query details as a request and employ a background worker/pool to pick up and process these requests. The temptation might be to just employ async/await here but the important thing is to avoid situations where too many of these queries are kicked off at once. Queries like this will "touch" a lot of data leading to locks and deadlocks in a system. Users have a bad tendency to repeatedly kick off actions when it looks like one isn't responding which compounds the problem on the back-end.

Best option for dynamic queries?

I'm working on porting an old application to from WebForms to MVC, and part of that process is tearing out the existing data layer, moving the logic from stored procedures to code. As I have initially only worked with basic C# SQL functions (System.Data.SqlClient), I went with a lightweight pseudo-ORM (PetaPoco), which just takes a SQL statement as a string and executes it. Building dynamic queries would work about the same in SQL - lots of conditionals that add and remove additional code (average query has ~30 filters).
So after looking around a bit, I found some choices:
A bunch of strings and conditionals that add bits of the query as they are needed. Really nasty, especially when queries get complex, and not something I want to pursue if a better solution exists.
A bunch of conditionals using L2E. Looks more elegant, but I tested L2E is too bloated in general was an awful experience. Could I do the same thing in L2S? If so, is L2S going to stick around for the next 5-10 years?
Use a PredicateBuilder. Still looking into this, same questions regarding L2S.
EDIT: I can also just stick to the existing stored procedure model, but I have to rewrite them anyway, so it can't hurt to look at other options as I'm still going to have to do the leg work.
Are there any other options out there? Can anyone weigh in with some experience on any of the mentioned methods - mainly, did the method you choose make you want to build a time machine and kill past you for implementing it?
I'd look at LLBLGen. The code that it generates is quite good and customizable. They also provide a robust linq provider which may help with your queries. I used it for a couple large projects and was quite happy.
http://www.llblgen.com/
In my opinion, neither L2S nor L2E can generate efficient SQL code, especially when it comes to complex queries. Even in some relatively simple cases generating queries via either of the two methods would yield inefficient SQL code, here's an example: Why does this additional join increase # of queries?
That being said, if you're using SQL Server L2S is a better option, as L2E is meant to handle any database; Because of which L2E will generate inefficient SQL code. Also another point to keep in mind is neither L2S or L2E will leverage the tempDB, i.e. generating temp-tables or table variables or CTEs.
I would re-write the stored procedures, optimizing them as much as possible, and use L2S/L2E for simple queries, that would generate one round-trip (this should be as low as possible) to the server, and also ensure that the execution plan SQL Server uses is the most efficient (i.e. uses indexes etc).
Hasanain
Not really an answer, but too long for a comment:
I have built a mid-sized web app using the 'concatenate pieces of SQL' method, and am currently in the process of doing a similar job but using L2E.
I found that with some self-control, the concatenate-pices-of-sql method is not that bad. Of course use parameterized queries, don't try to stick user input into the SQL directly.
I have been slowly growing an appreciation for the L2E method though. It gives you type safety, though you do have to do some things "backwards" from how you might do it with SQL -- such as WHERE X IN (...) constructs. But so far I haven't hit anything that L2E can't handle.
I feel like the L2E approach would be a little easier to maintain if other people were to be heavily involved.
Do you have actual use cases where the "bloat" of L2E is a problem? Or is it just a general sense of malaise where you feel the framework is doing too much behind the scenes?
I definitely had that feeling at first (ok, still do), and certainly don't like reading the generated SQL (esp. compared to my handwritten SQL from the previous project), but so far have found L2E pretty good with regard to only hitting the DB when it is actually necessary.
Another concern is what DB you're using, and how up-to-date its L2E bindings are. If you're using SQL Server, then no problem. MySql might be more flaky though. A chunk of L2E's slickness comes from its nice integration with VStudio, and VStudio's ability to build entity models from your DB automagically. Not sure how good the support is for non-MS DB backends.

Does Entity Framework Use Reflection and Hurt Performance?

I ultimately have two areas with a few questions each about Entity Framework, but let me give a little background so you know what context I am asking for this information in.
At my place of work my team is planning a complete re-write of our application structure so we can adhere to more modern standards. This re-write includes a completely new data layer project. In this project most of the team wants to use Entity Framework. I too would like to use it because I am very familiar with it from my using it in personal projects. However, one team member is opposed to this vehemently, stating that Entity Framework uses reflection and kills performance. His other argument is that EF uses generated SQL that is far less efficient than stored procedures. I'm not so familiar with the inner-workings of EF and my searches haven't turned up anything extremely useful.
Here are my questions. I've tried to make them as specific as possible. If you need some clarification please ask.
Issue 1 Questions - Reflection
Is this true about EF using reflection and hurting performance?
Where does EF use reflection if it does?
Are there any resources that compare performance? Something that I could use to objectively compare technologies for data access in .NET and then present it to my team?
Issue 2 Questions - SQL
What are the implications of this?
Is it possible to use stored procedures to populate EF entities?
Again are there some resources that compare the generated queries with stored procedures, and what the implications of using stored procedures to populate entities (if you can) would be?
I did some searching on my own but didn't come up with too much about EF under the hood.
Yes, it does like many other ORMs (NHibernate) and useful frameworks (DI tools). For example WPF cannot work without Reflection.
While the performance implications of using Reflection has not changed much over the course of the last 10 years since .NET 1.0 (although there has been improvements), with the faster hardware and general trend towards readability, it is becoming less of a concern now.
Remember that main performance hit is at the time of reflecting aka binding which is reading the type metadata into xxxInfo (such as MethodInfo) and this happens at the application startup.
Calling reflected method is definitely slower but is not considered much of an issue.
UPDATE
I have used Reflector to look at the source code of EF and I can confirm it heavily uses Reflection.
Answer for Issue 1:
You can take a look at exactly what is output by EF by examining the Foo.Designer.cs file that is generated. You will see that the resulting container does not use reflection, but does make heavy use of generics.
Here are the places that Entity Framework certainly does use reflection:
The Expression<T> interface is used in creating the SQL statements. The extension methods in System.Linq are based around the idea of Expression Trees which use the types in System.Reflection to represent function calls and types, etc.
When you use a stored procedure like this: db.ExecuteStoreQuery<TEntity>("GetWorkOrderList #p0, #p1", ...), Entity Framework has to populate the entity, and at very least has to check that the TEntity type provided is tracked.
Answer for Issue 2:
It is true that the queries are often strange-looking but that does not indicate that it is less efficient. You would be hard pressed to come up with a query whose acutal query plan is worse.
On top of that, you certainly can use Stored Procedures, or even Inline SQL with entity framework, for querying and for Creating, Updating and Deleting.
Aside:
Even if it used reflection everywhere, and didn't let you use stored procedures, why would that be a reason not to use it? I think that you need to have your coworker prove it.
I can comment on Issue 2 about Generated EF Queries are less efficient than Stored Procedures.
Basically yes sometimes the generated queries are a mess and need some tuning. There are many tools to help you correct this, SQL Profiler, LinqPad, and others. But in the end the Generated Queries may look like crap but they do typically run quickly.
Yes you can map EF entities to Procedures. This is possible and will allow you to control some of the nasty generated EF queries. In turn you could also map views to your entities allowing you to control how the views select the data.
I cannot think of resources but I must say this. The comparison to using EF vs SQL stored procedures is apples to oranges. EF provides a robust way of mapping your Database to your code directly. This combined with LINQ to Entity queries will allow your developers to quickly produce code. EF is an ORM where as SQL store procedures is not.
The entity framework likely uses reflection, but I would not expect this to hurt performance. High-end librairies that are based on reflection typically use light-weight code generation to mitigate the cost. They inspect each type just once to generate the code and then use the generated code from that point on. You pay a little when your application starts up, but the cost is negligible from there on out.
As for stored procedures, they are faster than plain old queries, but this benefit is often oversold. The primary advantage is that the database will precompile and store the execution plan for each stored procedure. But the database will also cache the execution plans it creates for plain old SQL queries. So this benefit varies a great deal depending on the number and type of queries your application executes. And yes, you can use stored procedures with the entity framework if you need to.
I don't know if EF uses reflection (I don't believe it does... I can't think of what information would have to be determined at run-time); but even if it did, so what?
Reflection is used all over the place in .NET (e.g. serializers), some of which are called all of the time.
Reflection really isn't all that slow; especially in the context of this discussion. I imagine the overhead of making a database call, running the query, returning it and hydrating the objects dwarf the performance overhead of reflection.
EDIT
Article by Rick Strahl on reflection performance: .Net Reflection and Performance(old, but still relevant).
Entity Framework generated sql queries are fine, even if they are not exactly as your DBA would write by hand.
However it generates complete rubbish when it comes to queries over base types. If you are planning on using a Table-per-Type inheritance scheme, and you anticipate queries on the base types, then I would recommend proceeding with caution.
For a more detailed explanation of this bizarre shortcoming see my question here. take special note of the accepted answer to that question --- this is arguably a bug.
As for the issue of reflection -- I think your co-worker is grasping at straws. It's highly unlikely that reflection will be the root cause of any performance problems in your app.
EF uses reflection. I didn't check it by I think it is the main point of the mapping when materializing instance of entity from database record. You will say what is a column's name and what is the name of a property and when a data reader is executed you must somehow fill the property which you only know by its name.
I believe that all these "performance like" issues are solved correctly by caching needed information for mapping. Performance impact caused by reflection will probably be nothing comparing to network communication with the server, executing the complex query, etc.
If we talk about EF performance check these two documents: Part 1 and Part2. It describes some reason why people sometimes think the EF is very slow.
Are stored procedures faster then EF queries? My answer: I don't think so. The main advantage of the linq is that you can build your query in the application. This is extreamly handy when you need anything like list filtering, paging and sorting + changing number of displayed columns, loading realted entities etc. If you want to implement this in a stored procedure you will either have tens of procedures for diffent query configurations or you will use a dynamic sql. Dynamic sql is exactly what uses EF. But in case of EF that query has compile time validation which is not the case of plain SQL. The only difference in performance is the amount of transfered data when you send whole query to the server and when you send only exec procecdeure and parameters.
It is true that queries are sometimes very strange. Especially inheritance produces sometimes bad queries. But this is already solved. You can always use custom stored procedure or SQL query to return entities, complex types or custom types. EF will materialize results for you so you don't need to bother with data readers etc.
If you want to be objective when presenting Entity framework you should also mention its cons. Entity framework doesn't support command batching. If you need to update or insert multiple entities each sql command have its own roundrip to database. Because of that you should not use EF for data migrations etc. Another problem with EF is almost no hooks for extensibility.

Comparing hand-written ADO/Sprocs w/nHibernate

I am going back and forth between using nHibernate and hand written ado.net/stored procedures.
I currently use codesmith with templates I wrote that spits out simple classes that map my database tables, and it wraps my stored procedures for my data layer, and a thin business logic layer that just calls my data layer and returns the objects (1 object or collection).
This application is a web application, used for online communities (basically a forum).
I am watching summer of nhibernate videos right now.
Will using nHibernate make my life easier? Will updates to the database schema be any easier? What effects will there be on performance?
Is setting up nhibernate, and ensuring it performs optimally a headache of its own?
I don't want a complicated or deep object model, I simply want classes that map my tables, and a way to fetch data from my other tables that have foreign keys to them. I don't want a very complicated OOP model.
NHibernate can definitely make your life easier. Updates to your database schema will definitely be easier, because when you use an ORM, you don't have an API of stored procedures hindering you from refactoring your database schema to meet changes in your business model.
OR mappers have a LOT to offer, and are sadly misunderstood by a significant portion of the developer community, and almost all of the DBA community.
Stored procedures in general give the DBA more options for tuning performance in a database, because they have the freedom to rewrite the stored proc so long as they don't change its output. However, in my experience, stored procedures are rarely rewritten, due to other issues that can arise as a result (i.e. when a deployment of a new version of software is performed, any modified versions of existing procs will overwrite the optimized version that was changed by a DBA...thus negating the benefit and creating a maintenance and unexpected performance issue problem.)
Another grave misconception (and this is primarily from the SQL Server camp...I have very little experience with Oracle), is that Stored Procedures are the only thing that can be compiled and the execution plan cached. As far as SQL Server is concerned, any parameterized query can and probably will be compiled and cached.
A benefit of OR mappers is that they are adaptive...with a stored procedure, you write a single statement that will be used regardless of contextual nuances when that query is executed. LINQ to SQL has an amazing capacity to generate the most efficient queries I've ever seen, and often throws DBA's for a serious loop. I've shown DBA's queries generated by L2S that were full of sub queries and unconventional things which were immediately scoffed at. However, given the challenge, the performance (namely physical reads) of a query written by a DBA that was supposedly superior ended up being significantly inferior (sometimes on a scale of 30 physical reads for L2S vs 400 physical reads for the DBA.)
Another detractor as far as DBA's are concerned is that, because ORM's generate dynamic SQL, they have no way to optimize those queries. On the contrary (and again, this is restricted to SQL Server), SQL Server offers a multitude of optimization paths (horizontal and vertical table partitioning, distribution of physical files accross disks for any table or view, indexes, etc.) that can be taken before the need to modify a query is a necessity. Even in the event that a query needs to be modified, SQL Server 2005 and later provide something called Plan Guides, which allow you to moderately tune any query (stored proc, strait sql, etc.). In the event that tuning a query isn't enough, you can match any particular query to a complete replacement query, allowing the DBA to tune the query as much as they need to (but as a last resort.)
There are many, many benefits that can be gained by using an OR mapper, and NHibernate is one of the best free ones (LLBLGen is also very nice, but is not free.) LINQ to Sql and Entity Framework are some new offerings from Microsoft (L2S is soon to be replaced by EF 4.0 from the .NET 4.0 framework...which will at least rival, if not outpace, NHibernate.) The biggest hurdle to adopting an ORM is usually not the ORM product itself, nor its capabilities or performance. The greatest hurdle is usually convincing your DBA (if your lucky/unlucky enough ... depends on your experience ... to have one) that an ORM can improve efficiency and reduce maintenance costs without a cost of optimization paths for the DBA.
NHibernate works very well, especially for a simple model. It will make your life much easier and isn't too tough to learn. Look at "Fluent NHibernate" instead of using XML mappings, it is much easier.

Are there good reasons not to use an ORM? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
During my apprenticeship, I have used NHibernate for some smaller projects which I mostly coded and designed on my own. Now, before starting some bigger project, the discussion arose how to design data access and whether or not to use an ORM layer. As I am still in my apprenticeship and still consider myself a beginner in enterprise programming, I did not really try to push in my opinion, which is that using an object relational mapper to the database can ease development quite a lot. The other coders in the development team are much more experienced than me, so I think I will just do what they say. :-)
However, I do not completely understand two of the main reasons for not using NHibernate or a similar project:
One can just build one’s own data access objects with SQL queries and copy those queries out of Microsoft SQL Server Management Studio.
Debugging an ORM can be hard.
So, of course I could just build my data access layer with a lot of SELECTs etc, but here I miss the advantage of automatic joins, lazy-loading proxy classes and a lower maintenance effort if a table gets a new column or a column gets renamed. (Updating numerous SELECT, INSERT and UPDATE queries vs. updating the mapping config and possibly refactoring the business classes and DTOs.)
Also, using NHibernate you can run into unforeseen problems if you do not know the framework very well. That could be, for example, trusting the Table.hbm.xml where you set a string’s length to be automatically validated. However, I can also imagine similar bugs in a “simple” SqlConnection query based data access layer.
Finally, are those arguments mentioned above really a good reason not to utilise an ORM for a non-trivial database based enterprise application? Are there probably other arguments they/I might have missed?
(I should probably add that I think this is like the first “big” .NET/C# based application which will require teamwork. Good practices, which are seen as pretty normal on Stack Overflow, such as unit testing or continuous integration, are non-existing here up to now.)
The short answer is yes, there are really good reasons. As a matter of fact there are cases where you just cannot use an ORM.
Case in point, I work for a large enterprise financial institution and we have to follow a lot of security guidelines. To meet the rules and regulations that are put upon us, the only way to pass audits is to keep data access within stored procedures. Now some may say that's just plain stupid, but honestly it isn't. Using an ORM tool means the tool/developer can insert, select, update or delete whatever he or she wants. Stored procedures provide a lot more security, especially in environments when dealing with client data. I think this is the biggest reason to consider. Security.
The sweet spot of ORMs
ORMs are useful for automating the 95%+ of queries where they are applicable. Their particular strength is where you have an application with a strong object model architecture and a database that plays nicely with that object model. If you're doing a new build and have strong modelling skills on your team then you will probably get good results with an ORM.
You may well have a handful of queries that are better done by hand. In this case, don't be afraid to write a few stored procedures to handle this. Even if you intend to port your app across multiple DBMS platforms the database dependent code will be in a minority. Bearing in mind that you will need to test your application on any platform on which you intend to support it, a little bit of extra porting effort for some stored procedures isn't going to make a lot of difference to your TCO. For a first approximation, 98% portable is just as good as 100% portable, and far better than convoluted or poorly performing solutions to work around the limits of an ORM.
I have seen the former approach work well on a very large (100's of staff-years) J2EE project.
Where an ORM may not be the best fit
In other cases there may be approaches that suit the application better than an ORM. Fowler's Patterns of Enterprise Application Architecture has a section on data access patterns that does a fairly good job of cataloguing various approaches to this. Some examples I've seen of situations where an ORM may not be applicable are:
On an application with a substantial legacy code base of stored procedures you may want to use a functionally oriented (not to be confused with functional languages) data access layer to wrap the incumbent sprocs. This re-uses the existing (and therefore tested and debugged) data access layer and database design, which often represents quite a substantial development and testing effort, and saves on having to migrate data to a new database model. It is often quite a good way wrapping Java layers around legacy PL/SQL code bases, or re-targeting rich client VB, Powerbuilder or Delphi apps with web interfaces.
A variation is where you inherit a data model that is not necessarily well suited to O-R mapping. If (for example) you are writing an interface that populates or extracts data from a foreign interface you may be better off working direclty with the database.
Financial applications or other types of systems where cross-system data integrity is important, particularly if you're using complex distributed transactions with two-phase commit. You may need to micromanage your transactions better than an ORM is capable of supporting.
High-performance applications where you want to really tune your database access. In this case, it may be preferable to work at a lower level.
Situations where you're using an incumbent data access mechanism like ADO.Net that's 'good enough' and playing nicely with the platform is of greater benefit than the ORM brings.
Sometimes data is just data - it may be the case (for example) that your application is working with 'transactions' rather than 'objects' and that this is a sensible view of the domain. An example of this might be a financials package where you've got transactions with configurable analysis fields. While the application itself may be built on an O-O platform, it is not tied to a single business domain model and may not be aware of much more than GL codes, accounts, document types and half a dozen analysis fields. In this case the application isn't aware of a business domain model as such and an object model (beyond the ledger structure itself) is not relevant to the application.
First off - using an ORM will not make your code any easier to test, nor will it necessarily provide any advantages in a Continuous Integration scenerio.
In my experience, whilst using an ORM can increase the speed of development, the biggest issues you need to address are:
Testing your code
Maintaining your code
The solutions to these are:
Make your code testable (using SOLID principles)
Write automated tests for as much of the code as possible
Run the automated tests as often as possible
Coming to your question, the two objections you list seem more like ignorance than anything else.
Not being able to write SELECT queries by hand (which, I presume, is why the copy-paste is needed) seems to indicate that there's a urgent need for some SQL training.
There are two reasons why I'd not use an ORM:
It is strictly forbidden by the company's policy (in which case I'd go work somewhere else)
The project is extremely data intensive and using vendor specific solutions (like BulkInsert) makes more sense.
The usual rebuffs about ORMs (NHibernate in particular) are:
Speed
There is no reason why using an ORM would be any slower than hand coded Data Access. In fact, because of the caching and optimisations built into it, it can be quicker.
A good ORM will produce a repeatable set of queries for which you can optimise your schema.
A good ORM will also allow efficient retrieval of associated data using various fetching strategies.
Complexity
With regards to complexity, using an ORM means less code, which generally means less complexity.
Many people using hand-written (or code generated) data access find themselves writing their own framework over "low-level" data access libraries (like writing helper methods for ADO.Net). These equate to more complexity, and, worse yet, they're rarely well documented, or well tested.
If you are looking specifically at NHibernate, then tools like Fluent NHibernate and Linq To NHibernate also soften the learning curve.
The thing that gets me about the whole ORM debate is that the same people who claim that using an ORM will be too hard/slow/whatever are the very same people who are more than happy using Linq To Sql or Typed Datasets. Whilst the Linq To Sql is a big step in the right direction, it's still light years behind where some of the open source ORMs are. However, the frameworks for both Typed Datasets and for Linq To Sql is still hugely complex, and using them to go too far of the (Table=Class) + (basic CRUD) is stupidly difficult.
My advice is that if, at the end of the day, you can't get an ORM, then make sure that your data access is separated from the rest of the code, and that you you follow the Gang Of Four's advice of coding to an interface. Also, get a Dependancy Injection framework to do the wiring up.
(How's that for a rant?)
There are a wide range of common problems for which ORM tools like Hibernate are a god-send, and a few where it is a hindrance. I don't know enough about your project to know which it is.
One of Hibernate's strong points is that you get to say things only 3 times: every property is mentioned in the class, the .hbm.xml file, and the database. With SQL queries, your properties are in the class, the database, the select statements, the insert statements, the update statements, the delete statements, and all the marshalling and unmarshalling code supporting your SQL queries! This can get messy fast. On the other hand, you know how it works. You can debug it. It's all right there in your own persistence layer, not buried in the bowels of a 3rd party tool.
Hibernate could be a poster-child for Spolsky's Law of Leaky Abstractions. Get a little bit off the beaten path, and you need to know deep internal workings of the tool. It can be very annoying when you know you could have fixed the SQL in minutes, but instead you are spending hours trying to cajole your dang tool into generating reasonable SQL. Debugging is sometimes a nightmare, but it's hard to convince people who have not been there.
EDIT: You might want to look into iBatis.NET if they are not going to be turned around about NHibernate and they want control over their SQL queries.
EDIT 2: Here's the big red flag, though: "Good practices, which are seen as pretty normal on Stack Overflow, such as unit testing or continuous integration, are non-existing here up to now." So, these "experienced" developers, what are they experienced in developing? Their job security? It sounds like you might be among people who are not particularly interested in the field, so don't let them kill your interest. You need to be the balance. Put up a fight.
There's been an explosion of growth with ORMs in recent years and your more experienced coworkers may still be thinking in the "every database call should be through a stored procedure" mentality.
Why would an ORM make things harder to debug? You'll get the same result whether it comes from a stored proc or from the ORM.
I guess the only real detriment that I can think of with an ORM is that the security model is a little less flexible.
EDIT: I just re-read your question and it looks they are copy and pasting the queries into inline sql. This makes the security model the same as an ORM, so there would be absolutely no advantage over this approach over an ORM. If they are using unparametrized queries then it would actually be a security risk.
I worked on one project where not using an ORM was very successfully. It was a project that
Had to be horizontally scalealbe from the start
Had to be developed quickly
Had a relatively simple domain model
The time that it would have taken to get NHibernate to work in a horizontally partitioned structure would have been much longer than the time that it took to develop a super simple datamapper that was aware of our partitioning scheme...
So, in 90% of projects that I have worked on an ORM has been an invaluable help. But there are some very specific circumstances where I can see not using an ORM as being best.
Let me first say that ORMs can make your development life easier if integrated properly, but there are a handful of problems where the ORM can actually prevent you from achieving your stated requirements and goals.
I have found that when designing systems that have heavy performance requirements that I am often challenged to find ways to make the system more performant. Many times, I end up with a solution that has a heavy write performance profile (meaning we're writing data a lot more than we're reading data). In these cases, I want to take advantage of the facilities the database platform offers to me in order to reach our performance goals (it's OLTP, not OLAP). So if I'm using SQL Server and I know I have a lot of data to write, why wouldn't I use a bulk insert... well, as you may have already discovered, most ORMS (I don't know if even a single one does) do not have the ability to take advantage of platform specific advantages like bulk insert.
You should know that you can blend the ORM and non-ORM techniques. I've just found that there are a handful of edge cases where ORMs can not support your requirements and you have to work around them for those cases.
For a non-trivial database based enterprise application, there really is no justifying not using an ORM.
Features aside:
By not using an ORM, you are solving a problem that has already
solved repeatedly by large communities or companies with significant
resources.
By using an ORM, the core piece of your data access layer benefits
from the debugging efforts of that community or company.
To put some perspective in the argument, consider the advantages of using ADO.NET vs. writing the code to parse the tabular data stream oneself.
I have seen ignorance of how to use an ORM justify a developer's disdain for ORMs For example: eager loading (something I noticed you didn't mention). Imagine you want to retrieve a customer and all of their orders, and for those all of the order detail items. If you rely on lazy loading only, you will walk away from your ORM experience with the opinion: "ORMs are slow." If you learn how to use eager loading, you will do in 2 minutes with 5 lines of code, what your colleagues will take a half a day to implement: one query to the database and binding the results to a hierarchy of objects. Another example would be the pain of manually writing SQL queries to implement paging.
The possible exception to using an ORM would be if that application were an ORM framework designed to apply specialized business logic abstractions, and designed to be reused on multiple projects. Even if that were the case, however, you would get faster adoption by enhancing an existing ORM with those abstractions.
Do not let the experience of your senior team members drag you in the opposite direction of the evolution of computer science. I have been developing professionally for 23 years, and one of the constants is the disdain for the new by the old-school. ORMs are to SQL as the C language was to assembly, and you can bet that the equivalents to C++ and C# are on their way. One line of new-school code equals 20 lines of old-school.
When you need to update 50000000 records. Set a flag or whatever.
Try doing this using an ORM without calling a stored procedure or native SQL commands..
Update 1 : Try also retrieving one record with only a few of its fields. (When you have a very "wide" table). Or a scalar result. ORMs suck at this too.
UPDATE 2 : It seems that EF 5.0 beta promises batch updates but this is very hot news (2012, January)
I think there are many good reasons to not use an ORM. First and foremost, I'm a .NET developer and I like to stick within what the wonderful .NET framework has already provided to me. It does everything I possibly need it to. By doing this, you stay with a more standard approach, and thus there is a much better chance of any other developer working on the same project down the road being able to pick up what's there and run with it. The data access capabilities already provided by Microsoft are quite ample, there's no reason to discard them.
I've been a professional developer for 10 years, lead multiple very successful million+ dollar projects, and I have never once written an application that needed to be able to switch to any database. Why would you ever want a client to do this? Plan carefully, pick the right database for what you need, and stick with it. Personally SQL Server has been able to do anything I've ever needed to do. It's easy and it works great. There's even a free version that supports up to 10GB data. Oh, and it works awesome with .NET.
I have recently had to start working on several projects that use an ORM as the datalayer. I think it's bad, and something extra I had to learn how to use for no reason whatsoever. In the insanely rare circumstance the customer did need to change databases, I could have easily reworked the entire datalayer in less time than I've spent fooling with the ORM providers.
Honestly I think there is one real use for an ORM: If you're building an application like SAP that really does need the ability to run on multiple databases. Otherwise as a solution provider, I tell my clients this application is designed to run on this database and that is how it is. Once again, after 10 years and a countless number of applications, this has never been a problem.
Otherwise I think ORMs are for developers that don't understand less is more, and think the more cool 3rd party tools they use in their app, the better their app will be. I'll leave things like this to the die hard uber geeks while I crank out much more great software in the meantime that any developer can pick up and immediately be productive with.
I think that maybe when you work on bigger systems you can use a code generator tool like CodeSmith instead of a ORM... I recently found this: Cooperator Framework which generates SQL Server Stored Procedures and also generates your business entities, mappers, gateways, lazyload and all that stuff in C#...check it out...it was written by a team here in Argentina...
I think it's in the middle between coding the entire data access layer and use a ORM...
Personally, i have (until recently) opposed to use an ORM, and used to get by with writing a data access layer encapsulating all the SQL commands. The main objection to ORMs was that I didn't trust the ORM implementation to write exactly the right SQL. And, judging by the ORMs i used to see (mostly PHP libraries), i think i was totally right.
Now, most of my web development is using Django, and i found the included ORM really convenient, and since the data model is expressed first in their terms, and only later in SQL, it does work perfectly for my needs. I'm sure it wouldn't be too hard to outgrow it and need to supplement with hand-written SQL; but for CRUD access is more than enough.
I don't know about NHibernate; but i guess it's also "good enough" for most of what you need. But if other coders don't trust it; it will be a prime suspect on every data-related bug, making verification more tedious.
You could try to introduce it gradually in your workplace, focus first on small 'obvious' applications, like simple data access. After a while, it might be used on prototypes, and it might not be replaced...
If it is an OLAP database (e.g. static, read-only data used for reporting/analytics, etc.) then implementing an ORM framework is not appropriate. Instead, using the database's native data access functionality such as stored procedures would be preferable. ORMs are better suited for transactional (OLTP) systems.
Runtime performance is the only real downside I can think of but I think that's more than a fair trade-off for the time ORM saves you developing/testing/etc. And in most cases you should be able to locate data bottlenecks and alter your object structures to be more efficient.
I haven't used Hibernate before but one thing I have noticed with a few "off-the-shelf" ORM solutions is a lack of flexibility. I'm sure this depends on which you go with and what you need to do with it.
There are two aspects of ORMs that are worrisome. First, they are code written by someone else, sometimes closed source, sometimes open source but huge in scope. Second, they copy the data.
The first problem causes two issues. You are relying on outsiders code. We all do this, but the choice to do so should not be taken lightly. And what if it doesn't do what you need? When will you discover this? You live inside the box that your ORM draws for you.
The second problem is one of two phase commit. The relational database is being copied to a object model. You change the object model and it is supposed to update the database. This is a two phase commit and not the easiest thing to debug.
I suggest this reading for a list of the downsides of ORMs.
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
For my self, I've found ORMs very useful for most applications I've written!
/Asger
The experience I've had with Hibernate is that its semantics are subtle, and when there's problems, it's a bit hard to understand what's going wrong under the hood. I've heard from a friend that often one starts with Criteria, then needs a bit more flexibility and needs HQL, and later notices that after all, raw SQL is needed (for example, Hibernate doesn't have union AFAIK).
Also with ORM, people easily tend to overuse existing mappings/models, which leads to that there's an object with lots of attributes that aren't initiliazed. So after the query, inside transaction Hibernate makes additional data fetching, which leads to potential slow down. Also sadly, the hibernate model object is sometimes leaked into the view architecture layer, and then we see LazyInitializationExceptions.
To use ORM, one should really understand it. Unfortunately one gets easily impression that it's easy while it's not.
Not to be an answer per se, I want to rephrase a quote I've heard recently. "A good ORM is like a Yeti, everyone talks about one but no one sees it."
Whenever I put my hands on an ORM, I usually find myself struggling with the problems/limitations of that ORM. At the end, yes it does what I want and it was written somewhere in that lousy documentation but I find myself losing another hour I will never get. Anyone who used nhibernate, then fluent nhibernate on postgresql would understand what I've been thru. Constant feeling of "this code is not under my control" really sucks.
I don't point fingers or say they're bad, but I started thinking of what I'm giving away just to automate CRUD in a single expression. Nowadays I think I should use ORM's less, maybe create or find a solution that enables db operations at minimum. But it's just me. I believe some things are wrong in this ORM arena but I'm not skilled enough to express it what not.
I think that using an ORM is still a good idea. Especially considering the situation you give. It sounds by your post you are the more experienced when it comes to the db access strategies, and I would bring up using an ORM.
There is no argument for #1 as copying and pasting queries and hardcoding in text gives no flexibility, and for #2 most orm's will wrap the original exception, will allow tracing the queries generated, etc, so debugging isnt rocket science either.
As for validation, using an ORM will also usually allow much easier time developing validation strategies, on top of any built in validation.
Writing your own framework can be laborious, and often things get missed.
EDIT: I wanted to make one more point. If your company adopts an ORM strategy, that further enhances its value, as you will develop guidelines and practices for using and implementing and everyone will further enhance their knowledge of the framework chosen, mitigating one of the issues you brought up. Also, you will learn what works and what doesnt when situations arise, and in the end it will save lots of time and effort.
Every ORM, even a "good one", comes saddled with a certain number of assumptions that are related to the underlying mechanics that the software uses to pass data back and forth between your application layer and your data store.
I have found that for moderately sophisticated application, that working around those assumptions usually takes me more time than simply writing a more straightfoward solution such as: query the data, and manually instantiate new entities.
In particular, you are likely to run into hitches as soon as you employ multi-column keys or other moderately-complex relationships that fall just outside the scope of the handy examples that your ORM provided you when you downloaded the code.
I concede that for certain types of applications, particularly those that have a very large number of database tables, or dynamically-generated database tables, that the auto-magic process of an ORM can be useful.
Otherwise, to hell with ORMs. I now consider them to basically be a fad.

Categories

Resources