I've used the Entity Framework in a couple of projects. In every project, I've used stored procedures mapped to the entities because of the well-known benefits of stored procedures -- security, maintainability, etc. However, 99% of the stored procedures are basic CRUD stored procedures. This seems to negate one of the major, time-saving features of the Entity Framework -- SQL generation.
I've read some of the arguments regarding stored procedures vs. generated SQL from the Entity Framework. While using CRUD SPs is better for security, and the SQL generated by EF is often more complex than necessary, does it really buy anything in terms of performance or maintainability to use SPs?
Here is what I believe:
Most of the time, modifying an SP requires updating the data model anyway. So, it isn't buying much in terms of maintainability.
For web applications, the connection to the database uses a single user ID specific to the application. So, users don't even have direct database access. That reduces the security benefit.
For a small application, the slightly decreased performance from using generated SQL probably wouldn't be much of an issue. For high-volume, performance-critical applications, would EF even be a wise choice? Plus, are the insert / update / delete statements generated by EF really that bad?
Sending every attribute to a stored procedure has its own performance penalties, whereas the EF-generated code only sends the attributes that were actually changed. When doing updates to large tables, the increased network traffic and overhead of updating all attributes probably negates the performance benefit of stored procedures.
With that said, my specific questions are:
Are my beliefs listed above correct? Is the idea of always using SPs something that is "old school" now that ORMs are gaining in popularity? In your experience, which is the better way to go with EF -- mapping SPs for all insert / update / deletes, or using EF generated SQL for CRUD operations and only using SPs for more complex stuff?
I think always using SPs is somewhat old school. I used to code that way, and now I do everything I can in EF-generated code. When I have a performance problem or another special need, I add back in a strategic SP to solve that particular problem. It doesn't have to be either-or -- use both.
All my basic CRUD operations are straight EF-generated code. My web apps used to have hundreds of SPs; now a typical one will have a dozen SPs, and everything else is done in my C# code. My productivity has gone WAY up by eliminating the 95% of stored procs that were pure CRUD.
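For illustration, here is a minimal sketch of what that EF-generated CRUD looks like (the context and entity names are hypothetical):

// Hypothetical DbContext and entity; EF generates all the SQL.
using (var db = new ShopContext())
{
    var customer = db.Customers.Find(42);   // SELECT by primary key
    customer.Name = "New Name";
    db.SaveChanges();                       // UPDATE sends only the changed column
}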
Yes, your beliefs are absolutely correct. Using stored procedures for data manipulation mainly makes sense if:
Database follows strict security rules where changing data is allowed only through stored procedures
You are using views or custom queries for mapping your entities and you need advanced logic in a stored procedure to push data back
You have some advanced logic (related to data) in the procedure for any other reason
Using procedures for pure CUD where none of the mentioned cases applies is redundant, and it doesn't provide any measurable performance boost except in a single scenario:
You will use stored procedures for batch / bulk modifications
EF doesn't have bulk/batch functionality, so changing 1,000 records results in 1,000 updates, each executed with a separate database roundtrip! But such procedures cannot be mapped to entities anyway and must be executed separately via a function import (if possible), directly as ExecuteStoreCommand, or with old ADO.NET (for example, if you want to use a table-valued parameter).
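As a hedged sketch of the ExecuteStoreCommand route (EF 4 ObjectContext API; the container, table, and column names are assumptions):

// One set-based statement instead of 1,000 tracked-entity updates.
using (var context = new MyEntities())   // hypothetical ObjectContext
{
    int rows = context.ExecuteStoreCommand(
        "UPDATE dbo.Orders SET Status = {0} WHERE CustomerId = {1}",
        "Archived", customerId);
}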
The R in CRUD can be a whole different story: a stored procedure can deliver a significant performance boost for reading data with your own optimized queries.
If performance is your primary concern, then you should take one of your existing apps that uses EF with SPs, disable the SPs, and benchmark the new version. That's the only way to get an answer perfectly applicable to your situation. You might find that, no matter what you do, EF isn't fast enough for your performance needs compared to custom code, but outside of very high-volume sites I think EF 4.1 is actually pretty reasonable.
From my PoV, EF is a great developer productivity boost. You lose a fair bit of that if you're writing SPs for simple CRUD operations, and for Insert/Update/Delete in particular I really don't see you gaining much in performance because those operations are so straightforward to generate SQL for. There are definitely some Select cases where EF will not do the optimal thing and you can get major performance increases by writing a SP (hierarchical queries using CONNECT BY in Oracle come to mind as an example).
The best way to deal with that type of thing is to write your app letting EF generate the SQL. Benchmark it. Find areas where there's performance issues and write SPs for those. Delete is almost never going to be one of the cases you need to do this.
As you mentioned, the security gain here is somewhat lessened because you should have EF on an Application tier that has its own account for the app anyway, so you can restrict what it does. SPs do give you a bit more control but in typical usage I don't think it matters.
It's an interesting question that doesn't have a truly right or wrong answer. I use EF primarily so that I don't have to write generic CRUD SPs and can instead spend my time working on the more complex cases, so for me I'd say you should write fewer of them. :)
I agree broadly with E.J, but there are a couple of other options. It really boils down to the requirements for the particular system:
Do you need to get the app developed FAST? - Then use entity framework and its automatic SQL
Need fine-grained and solid security? - Get onto stored procedures
Need it to run as fast as possible? - You're probably looking at some happy medium!
In my opinion, as long as your application/database does not suffer from performance issues, you mostly use the database for CRUD, and you access it with just one DB user, it is better to use generated SQL. It is faster to develop and more maintainable, and the marginal security or privacy benefits are not worth it (if the data is not especially sensitive). Also, model-based database access and LINQ mitigate the threat of SQL injection.
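To illustrate that last point (the entity set and property names here are hypothetical), LINQ to Entities translates the lambda into a parameterized query, so user input is never spliced into the SQL text:

// Unsafe string concatenation, the classic injection vector:
//   var sql = "SELECT * FROM Users WHERE Name = '" + input + "'";
// The LINQ equivalent is parameterized by EF:
var users = context.Users.Where(u => u.Name == input).ToList();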
Related
I am working on a .NET Web API service (with OData support) to support a mobile client. The service should support both Oracle and SQL Server databases, but only one database type will be used at a time, according to whichever database technology the client is using.
How do I create a database-agnostic data access layer? I don't want to write the code twice -- once for SQL Server and once for Oracle.
Also, it seems that in order to support Oracle in EF, third-party Oracle drivers are required -- either from Devart or Oracle's ODP.NET.
I am debating whether I should use old-style ADO.NET or EF for building the data access layer.
I will appreciate any help on this.
Thanks!
Your question seems to revolve around multiple concerns; I'll give answers based on my views on each:
1.- How can you create a database (DB engine) agnostic DAL?
A: One approach is to follow the Repository pattern and/or use interfaces to decouple the code that manipulates the data from the code that retrieves/inserts it. The actual implementation of the interfaces your code uses to get the data can also be tailored to be DB-engine agnostic; if you're going to use ADO.NET, you can check out the Enterprise Library for some very useful code that is DB-engine agnostic. Entity Framework is also compatible with different DB engines but, as you mentioned, can only interact with one DB at a time, so whenever you generate the model, you tie it to the specifics of the DB engine that hosts your DB.
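As a rough sketch of that decoupling (the repository interface and context names are made up for illustration):

public interface ICustomerRepository
{
    Customer GetById(int id);
    void Add(Customer customer);
}

// One possible EF-backed implementation; an Oracle or plain ADO.NET
// version would implement the same interface without callers noticing.
public class EfCustomerRepository : ICustomerRepository
{
    private readonly MyDbContext _context;   // hypothetical DbContext

    public EfCustomerRepository(MyDbContext context)
    {
        _context = context;
    }

    public Customer GetById(int id)
    {
        return _context.Customers.Find(id);
    }

    public void Add(Customer customer)
    {
        _context.Customers.Add(customer);
        _context.SaveChanges();
    }
}

This is related to another concern in your question: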
2.- Should you use plain old ADO.NET or EF?
A: This is a very good question, which I'm sure has been asked many times before. Given that both approaches give you the same practical result of being able to retrieve and manipulate data, the real question is: what is your personal preference for coding, and what are the time/resource constraints of the project?
IMO, Entity Framework is best suited to code-first projects where your business logic doesn't require complex logging, transactions, or other security or performance constraints on the DB side -- not because EF is incapable of meeting those requirements, but because it becomes rather convoluted and impractical to do so, and I personally believe that defeats the purpose of EF: providing a tool that allows for rapid development.
So, if the people involved in the project are not very comfortable writing stored procedures in SQL, and the data manipulation will revolve mostly around your service without the need for very complex operations on the DB side, then EF is a suitable approach, and you can leverage the Repository pattern as well as interfaces to implement "DbContext" objects that will allow you to create a DB-agnostic DAL.
However, if you are required to implement transactions, security, or extensive logging, and are more comfortable writing SQL stored procedures, Entity Framework will often prove to be a burden simply because it is not yet suited to advanced tasks. For example:
Imagine you have a User table with multiple fields (address, phone, etc.) that are not always necessary for all user-related operations (such as authentication). Trying to map an entity to the results of a stored procedure that does not return all of the fields the entity contains will result in an error; you will either need to create different models with more or fewer members, or return additional columns from the SP that you might not need for a particular operation, increasing bandwidth consumption unnecessarily.
Another situation is taking advantage of features such as table-valued parameters in SQL Server to optimize sending multiple records at once to the DB. Entity Framework does not include anything that automatically optimizes operations on multiple records, so to use TVPs you will need to define that operation manually, much as you would if you had gone the ADO.NET route.
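A hedged sketch of that manual TVP route (the table type, procedure, and variable names are assumptions; it presumes a matching user-defined table type exists on the SQL Server side):

// Needs System.Data and System.Data.SqlClient.
// Assumes: CREATE TYPE dbo.IdList AS TABLE (Id int)
//          and a proc dbo.DeleteOrders taking @OrderIds dbo.IdList READONLY.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
foreach (var id in orderIds)
    table.Rows.Add(id);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.DeleteOrders", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.AddWithValue("@OrderIds", table);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.IdList";
    conn.Open();
    cmd.ExecuteNonQuery();   // one roundtrip for the whole batch
}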
Eventually, you will have to weigh the considerations of your project against what the alternatives provide. ADO.NET gives you the best performance and customization for your DB operations; it is highly scalable and allows optimizations, but it takes more time to code. EF is very straightforward and practical for object manipulation, and though it is constantly evolving and improving, its performance and capabilities are not quite on par with ADO.NET yet.
And regarding the drivers issue, it shouldn't weigh too heavily in the matter, since even Oracle encourages you to use their driver instead of the default one provided by Microsoft.
When I am developing an ASP.NET website I do really like to use Entity Framework with both database-first or code-first models (+ asp.net mvc controllers scaffolding).
For an application requiring to access an existing database, I naturally thought to create a database model and to use asp.net mvc scaffolding to get all the basic CRUD operations done in a few minutes with nearly no development costs.
But I discussed with a friend who told me that accessing data stored in the database only through stored procedures is the best approach to take.
My question is thus, what do you think of this sentence? Is it better to create stored procedures for any required operations on a table in the database (e.g. create and read on this table, update and delete only on another one, ...)? And what are the advantages/disadvantages of doing so instead of using a database-first model created from the tables in the database?
What I thought at first is that it doubles development costs to do everything through stored procedures, since you have to write all those stored procedures where Entity Framework could have provided a DbContext in a few clicks, allowing me to use LINQ over entities. But then I read a bit about ownership chains, which can improve security by granting only permission to execute the stored procedures and no permissions for any operations (select, insert, update, delete) on the tables themselves.
Thank you for your answers.
It's a cost-benefit analysis. Being a DB-focused guy, I would agree with that statement. It is best. It also makes your code easier to read (no crazy SQL statements uglifying it), gives increased performance through cached execution plans, and lets you modify a query without recompiling the code, etc.
Many of the people I work with are not all that familiar with writing sprocs, so it tends to be a constant fight to get them to use them. Personally, I don't see any reason to ever bury SQL statements in your code. They tend to shy away from sprocs because it is more work for them up front.
Yes, it's a good approach.
Whether it's the best approach or not, that depends on a lot of factors, some of them which you don't even know yet.
One important factor is how much further development there will be, and how much maintenance. If the initial development is a big part of the total job, then you should rather use a method that gets you there as fast and easily as possible.
If you will be working with and maintaining the system for a long time, you should focus less on the initial development time and more on how easy it is to make changes to the system once it's up and running. Using stored procedures is one way to make the code less dependent on the exact data layout, and it allows you to make changes without a lot of downtime.
Note that it's not necessarily a choice between stored procedures and Entity Framework. You can also use stored procedures with Entity Framework.
This is primarily an opinion-based question, and the answer may depend on the situation. Using stored procedures is definitely one of the best ways to query the database, but since its emergence, Entity Framework has also come into wide use. The advantage of Entity Framework is that it provides a higher level of abstraction.
Entity Framework applications provide the following benefits:
Applications can work in terms of a more application-centric conceptual model, including types with inheritance, complex members, and relationships.
Applications are freed from hard-coded dependencies on a particular data engine or storage schema.
Mappings between the conceptual model and the storage-specific schema can change without changing the application code.
Developers can work with a consistent application object model that can be mapped to various storage schemas, possibly implemented in different database management systems.
Multiple conceptual models can be mapped to a single storage schema.
Language-integrated query (LINQ) support provides compile-time syntax validation for queries against a conceptual model.
You may also check this related question Best practice to query data from MS SQL Server in C Sharp?
Following are some stored procedure advantages:
Encapsulate multiple statements as single transactions using stored procedures
Implement business logic using temp tables
Better error handling by having tables for capturing/logging errors
Parameter validations / domain validations can be done at database level
Control the query plan by forcing the choice of an index
Use sp_getapplock to enforce single execution of procedure at any time
In addition, Entity Framework adds overhead to each request you make, since it uses reflection for each query. So, by implementing stored procedures you gain time, as they are compiled rather than interpreted each time like a normal Entity Framework query.
The link below gives some reasons why you should use Entity Framework:
http://kamelbrahim.blogspot.com/2013/10/why-you-should-use-entity-framework.html
Hope this can enlighten you a bit
So I'm gonna give you a suggestion, and it will be something I've done, but not many would say "I do that".
So, yes, I used stored procedures when using ADO.NET.
I also (at times) use ORM's, like NHibernate and EntityFramework.
When I use ADO.NET, I use stored procedures.
When you get data from the database, you have to turn it into something on the DotNet side.
The quickest thing is to put data into a DataTable or DataSet.
I no longer favor this method. While it may make for RAPID development ("just stuff the data into a datatable")......it does not work well for MAINTENANCE, even if that maintenance is only 2-3 months down the road.
So what do I put the data into?
I create DTO/POCO objects and hydrate the data from the database into these objects.
For example.
The NorthWind database has
Customer(s)
Order(s)
and OrderDetail(s)
So I create C# classes called Customer.cs, Order.cs, and OrderDetail.cs.
These ONLY contain properties of the entity. Most of the time, the properties simply reflect the columns in the database for that entity. (Order.cs has properties that simulate a Select * from dbo.Order where OrderID = 123, for example.)
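For example (a sketch using a few of the real Northwind column names; the remaining columns are elided):

public class Order
{
    public int OrderID { get; set; }
    public string CustomerID { get; set; }
    public DateTime? OrderDate { get; set; }
    // ...one property per remaining column in dbo.Orders
}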
Then I create a child-collection object
public class OrderCollection : List<Order>{}
and then the parent object gets a property.
public class Customer
{
    /* a bunch of scalar properties */
    public OrderCollection Orders { get; set; }
}
So now you have a stored procedure. And it gets data.
When that data comes back, one way to get it is with an IDataReader. (.ExecuteReader).
When this IDataReader comes back, I loop over it, and populate the Customer(.cs), the Orders, and the OrderDetails.
This is basic, poor man's ORM (object relation mapping).
Back to how I code my stored procedures: I would write a procedure that returns 3 resultsets (one DB hit) with the info about the Customer, the Order(s) (if any), and the OrderDetail(s) (if any exist).
Note that I do NOT do a lot of JOINing.
When you do a "Select * from dbo.Customer c join dbo.Orders o on c.CustomerID = o.CustomerId", you'll note you get redundant data in the first columns. This is what I do not like.
I prefer multiple resultsets OVER joining and bringing back a single resultset with redundant data.
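A rough sketch of that hydration, assuming a procedure (hypothetically named dbo.uspCustomerGetWithOrders) that returns the three resultsets in the order described; all names and column positions are illustrative:

// Needs System.Data and System.Data.SqlClient.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.uspCustomerGetWithOrders", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@CustomerID", customerId);
    conn.Open();

    using (IDataReader reader = cmd.ExecuteReader())
    {
        var customer = new Customer { Orders = new OrderCollection() };

        if (reader.Read())                // resultset 1: the customer
        {
            // map each column to a property, e.g.:
            // customer.CustomerID = reader.GetString(0);
        }

        reader.NextResult();              // resultset 2: the orders
        while (reader.Read())
        {
            customer.Orders.Add(new Order
            {
                OrderID = reader.GetInt32(0)
                // ...map the remaining columns
            });
        }

        reader.NextResult();              // resultset 3: the order details
        while (reader.Read())
        {
            // attach each OrderDetail to its parent Order by OrderID
        }
    }
}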
Now for the little special trick.
Whenever I select from a table, I always select all columns on that table.
So whenever I write a stored procedure that needs customer data, I do a
Select A,B,C,D,E,F,G from dbo.Customer where (......)
Now, a lot of people will argue with that: "Why do you bring back more info than you need?"
Well, real ORM's do this anyway. So I am poor-man reflecting this.
And my code for taking the resultset(s) from the stored procedure and turning them into instances of objects stays consistent.
Because if you write 3 stored procedures, and each one selects data from the Customer table, BUT you select different columns and/or in a different order, your "object mapper" code needs to have a method for each stored procedure.
This method of ADO.NET has served me well.
And once my team swapped out ADO.NET for a real ORM, that transition was very pain-free because of the way we did the ADO.NET from the get-go.
Quick rules of thumb:
1. If using ADO.NET, use stored procedures.
2. Get multiple result-sets, instead of redundant data via joins.
3. Make your columns consistent from any table you select from.
4. Take the results of your stored procedure call and write a "hydrator" to take that info and put it into your domain model (the .cs classes) as soon as you can.
That has served me well for many years.
Good luck.
In my opinion:
Stored Procedures are written in big iron database "languages" like PL/SQL or T-SQL
Stored Procedures typically cannot be debugged in the same IDE your write your UI.
Stored Procedures don't provide much feedback when things go wrong.
Stored Procedures can't pass objects.
Stored Procedures hide business logic.
Source:
http://www.codinghorror.com/blog/2004/10/who-needs-stored-procedures-anyways.html
So I have an application which requires very fast access to large volumes of data, and we're at the stage where we're undergoing a large redesign of the database, which gives a good opportunity to rewrite the data access layer if necessary!
Currently in our data access layer we use manually created entities along with plain SQL to fill them. This is pretty fast, but this technology is really getting old, and I'm concerned we're missing out on a newer framework or data access method which could be better in terms of neatness and maintainability.
We've looked at the Entity Framework, but after some research it just seems that the benefit the ORM gives is not enough to justify the lower performance, and as some of our queries are getting complex I'm sure performance with EF would become more of an issue.
So it is a case of sticking with our current methods of data access, or is there something a bit neater than manually creating and maintaining entities?
I guess the thing that's bugging me is just opening our data layer solution and seeing lots of entities, all of which need to be maintained exactly in line with the database, which sometimes can be a lot of work, but then maybe this is the price we pay for performance?
Any ideas, comments and suggestions are very appreciated! :)
Thanks,
Andy.
** Update **
Forgot to mention that we really need to be able to handle using Azure (a client requirement), which currently stops us from using stored procedures.
** Update 2 **
Actually, we have an interface layer for our DAL, which means we can create an Azure implementation that just overrides the data access methods from the local implementation that aren't suitable for Azure. So I guess we could use stored procedures for performance-sensitive local databases, with EF for the cloud.
I would use an ORM layer (Entity Framework, NHibernate etc) for management of individual entities. For example, I would use the ORM / entities layers to allow users to make edits to entities. This is because thinking of your data as entities is conceptually simpler and the ORMs make it pretty easy to code this stuff without ever having to program any SQL.
For the bulk reporting side of things, I would definitely not use an ORM layer. I would probably create a separate class library specifically for standard reports, which creates SQL statements itself or calls sprocs. ORMs are not really for bulk reporting and you'll never get the same flexibility of querying through the ORM as through hand-coded SQL.
Stored procedures for performance. ORMs for ease of development
Do you feel up to troubleshooting some opaque generated SQL when it runs badly? Or when it generates several round trips where one would do? Or when it insists on using the wrong data types?
You could try using MyBatis (previously known as iBATIS). It allows you to map SQL statements to domain objects. This way you keep full control over the SQL being executed and get a cleanly defined domain model at the same time.
Don't rule out plain old ADO.NET. It may not be as hip as EF4, but it just works.
With ADO.NET you know what your SQL queries are going to look like because you get 100% control over them. ADO.NET forces developers to think about SQL instead of falling back on the ORM to do the magic.
If performance is high on your list, I'd be reluctant to take a dependency on any ORM, especially EF, which is new on the scene and highly complex. ORMs speed up development (a little) but are going to make your SQL query performance hard to predict, and in most cases slower than hand-rolled SQL/stored procs.
You can also unit test SQL/Stored Procs independently of the application and therefore isolate performance issues as either DB/query related or application related.
I guess you are using ADO.NET in your DAL already, so I'd suggest investing the time and effort in refactoring it rather than throwing it out.
Hi all, I wanted to know when I should prefer writing stored procedures over writing programming logic and pulling data using an ORM or something else.
Stored procedures are executed on server side.
This means that processing large amounts of data does not require passing these data over the network connection.
Also, with stored procedures, you can build consistent complicated business logic.
Say, you need to update the account balance each time you insert a transaction, and you need to insert many transactions at once.
Instead of doing this with triggers (which are implemented using inefficient record-by-record approach in many systems), you can pass a table variable or temporary table with the inputs and issue a set-based SQL statement inside the procedure. This will be much more efficient.
I prefer SPs over programming logic mainly for two reasons:
Performance: anything that will reduce the result set or can be done more effectively on the server, e.g.:
paging
filtering
ordering (on indexed columns)
Security: if someone has gained the application's access to the database and wants to wipe out all your records, having to execute Row_Delete for each single one of them instead of DELETE FROM Rows already sounds good.
Never unless you identify a performance issue. (largely opinion)
(a Jeff blog post!)
http://www.codinghorror.com/blog/2004/10/who-needs-stored-procedures-anyways.html
If you see stored procs as optimizations:
http://en.wikipedia.org/wiki/Program_optimization#When_to_optimize
When appropriate.
complex data validation/checking logic
avoid several round trips to do one action in the DB
several clients
anything that should be set based
You can't say "never" or "always".
There is also the case where the database engine will outlive your client code. I bet there's more DAL or ORM upgrading/refactoring going on than DB engine upgrading/refactoring.
Finally, why can't I encapsulate code in a stored proc? Isn't that a good thing?
As ever, much of your decision as to which to use will depend on your application and its environment.
There are a couple of schools of thought here, and this debate always arouses strong sentiments on both sides.
The advantages of stored procedures (as well as the large data moving that Quassnoi has mentioned) are that the logic is tied down in the database, and therefore potentially more secure. It is also only ever in one place.
However, others believe that the place for application logic is in the application, especially if you are planning to access other types of databases (for which you will often have to write different SPs).
Another consideration may be the skills of the resources you have to implement your application.
The point at which stored procedures become preferable to an ORM is that point at which you have multiple applications talking to the same database. At this point, you want your query logic embedded in one place, rather than once per application. And even here, you might want to prefer a service layer (which can scale horizontally) instead of the database (which only scales vertically).
I am going back and forth between using NHibernate and hand-written ADO.NET/stored procedures.
I currently use codesmith with templates I wrote that spits out simple classes that map my database tables, and it wraps my stored procedures for my data layer, and a thin business logic layer that just calls my data layer and returns the objects (1 object or collection).
This application is a web application, used for online communities (basically a forum).
I am watching summer of nhibernate videos right now.
Will using nHibernate make my life easier? Will updates to the database schema be any easier? What effects will there be on performance?
Is setting up nhibernate, and ensuring it performs optimally a headache of its own?
I don't want a complicated or deep object model, I simply want classes that map my tables, and a way to fetch data from my other tables that have foreign keys to them. I don't want a very complicated OOP model.
NHibernate can definitely make your life easier. Updates to your database schema will definitely be easier, because when you use an ORM, you don't have an API of stored procedures hindering you from refactoring your database schema to meet changes in your business model.
OR mappers have a LOT to offer, and are sadly misunderstood by a significant portion of the developer community, and almost all of the DBA community.
Stored procedures in general give the DBA more options for tuning performance in a database, because they have the freedom to rewrite the stored proc so long as they don't change its output. However, in my experience, stored procedures are rarely rewritten, due to other issues that can arise as a result (i.e. when a new version of the software is deployed, any modified versions of existing procs will overwrite the optimized version changed by the DBA, thus negating the benefit and creating maintenance problems and unexpected performance issues).
Another grave misconception (and this is primarily from the SQL Server camp...I have very little experience with Oracle), is that Stored Procedures are the only thing that can be compiled and the execution plan cached. As far as SQL Server is concerned, any parameterized query can and probably will be compiled and cached.
A benefit of OR mappers is that they are adaptive: with a stored procedure, you write a single statement that will be used regardless of contextual nuances when that query is executed. LINQ to SQL has an amazing capacity to generate the most efficient queries I've ever seen, and often throws DBAs for a serious loop. I've shown DBAs queries generated by L2S that were full of subqueries and unconventional constructs, which were immediately scoffed at. However, given the challenge, the performance (namely physical reads) of the supposedly superior query written by a DBA ended up being significantly inferior (sometimes on a scale of 30 physical reads for L2S vs. 400 physical reads for the DBA's version).
Another detractor as far as DBAs are concerned is that, because ORMs generate dynamic SQL, they have no way to optimize those queries. On the contrary (and again, this is restricted to SQL Server), SQL Server offers a multitude of optimization paths (horizontal and vertical table partitioning, distribution of physical files across disks for any table or view, indexes, etc.) that can be taken before modifying a query becomes a necessity. Even in the event that a query needs to be modified, SQL Server 2005 and later provide something called plan guides, which allow you to moderately tune any query (stored proc, straight SQL, etc.). In the event that tuning a query isn't enough, you can match any particular query to a complete replacement query, allowing the DBA to tune the query as much as they need to (but as a last resort).
There are many, many benefits to be gained by using an OR mapper, and NHibernate is one of the best free ones (LLBLGen is also very nice, but is not free). LINQ to SQL and Entity Framework are some newer offerings from Microsoft (L2S is soon to be replaced by EF 4.0 in the .NET 4.0 framework, which will at least rival, if not outpace, NHibernate). The biggest hurdle to adopting an ORM is usually not the ORM product itself, nor its capabilities or performance. The greatest hurdle is usually convincing your DBA (if you're lucky/unlucky enough, depending on your experience, to have one) that an ORM can improve efficiency and reduce maintenance costs without costing the DBA their optimization paths.
NHibernate works very well, especially for a simple model. It will make your life much easier and isn't too tough to learn. Look at Fluent NHibernate instead of using XML mappings; it is much easier.
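For a flavor of what a Fluent NHibernate mapping looks like (a minimal sketch; the entity, table, and column names are assumptions):

// Maps the Customer entity in code, no XML needed.
public class CustomerMap : ClassMap<Customer>
{
    public CustomerMap()
    {
        Table("Customers");
        Id(x => x.Id);
        Map(x => x.Name);
        HasMany(x => x.Orders)
            .Inverse()
            .Cascade.All();
    }
}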