SQL Server stored procedure vs an external dll - c#

I am trying to convince someone that using an external DLL to manage SQL data is better than using stored procedures. Currently the person I am working with uses VBA and calls SQL stored procedures to get the complicated data they need from many different sources. It is my understanding that the best way to go about this kind of task is to use a DLL (or some other intermediate layer) to fetch the data and format it as needed.
Some things to keep in mind:
The person I am working with doesn't care much about being able to scale much further than we are now
They don't care about being able to switch to different platforms
They don't see too much of a performance problem with the current setup
Using a DLL requires more work, and work in a different direction
They don't want to switch if there's no current problem with doing it the way it is now. (So "just because it's not the right way" won't work... I tried)
So can anyone tell me some benefits of using an external DLL rather than SQL stored procedures?

Use stored procedures, and write your data access layer, which calls them via parameterized commands, in a separate DLL. Stored procedures are a standard and give you a ton of benefits; parameterized commands give you automatic string safety.
This type of design is basically so standardized and has been for years now that Microsoft has included a framework that constructs it for you in .NET 4.
More or less, both you and this other fellow are right: use sprocs for security, and separate your DAL for security, reusability, and lots of other reasons.
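As an illustration, here is a minimal sketch of that layered approach: a repository class compiled into its own DLL that calls a stored procedure through a parameterized command. The usp_GetCustomerOrders procedure and all other names here are hypothetical.

```csharp
using System.Data;
using System.Data.SqlClient;

// A minimal data access class, compiled into its own DLL.
// The procedure name and schema are hypothetical.
public class OrderRepository
{
    private readonly string _connectionString;

    public OrderRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public DataTable GetCustomerOrders(int customerId)
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand("usp_GetCustomerOrders", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            // The value is bound as a typed parameter, never concatenated
            // into the SQL text: this is the "automatic string safety".
            command.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;

            var orders = new DataTable();
            new SqlDataAdapter(command).Fill(orders); // opens/closes the connection
            return orders;
        }
    }
}
```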

ORM/DLL Approach
Pro:
You don't have to learn SQL, or stored procedure syntax
Con:
Complicates multiple operations in a single transaction
Risks increasing trips between the application and the database, which means data sync/concurrency issues
Utterly fails at complex queries; most ORMs support dropping down to stored procedures because of this
You can save SQL, including stored procedures, in flat files. The file extension could be .txt, but most use .sql, so storing SQL source in CVS and the like is no harder than storing .NET or Java source code.

Agree with the points about controlling the code, much easier in a DLL. Same with source control. However, from a pure performance perspective, the stored procedures will win the day because they are compiled, not just cached. I don't know if it will make enough difference, but thought I'd throw that in.
Using stored procedures can also be much more secure as you can lock down access to only stored procedures and you don't (have to) expose your table data to anyone with a connection.
I guess I'm not really answering your question as much as pointing out holes in your argument. Sorry about that but I'm looking at it from their perspective.

I really think it comes down to a matter of preference. Personally I like ORM & saved queries in a DLL vs. stored procs; I find them much easier to maintain and distribute than deploying S.Procs to a DB. There are certain advantages that a S.Proc has over a raw query though: some optimizations, and some server-side logic that could improve performance in some areas.
All in all though, personally I prefer to work in code than in DB mumbo-jumbo, so that's really why I opt for the DLL approach.
Plus you can keep your source code in source control too, which is much harder to do with a stored proc.
Just my 2c.

Related

Stored procedure vs. Entity Framework (LINQ queries) [duplicate]

I've used the entity framework in a couple of projects. In every project, I've used stored procedures mapped to the entities because of the well known benefits of stored procedures - security, maintainability, etc. However, 99% of the stored procedures are basic CRUD stored procedures. This seems like it negates one of the major, time saving features of the Entity Framework -- SQL generation.
I've read some of the arguments regarding stored procedures vs. generated SQL from the Entity Framework. While using CRUD SPs is better for security, and the SQL generated by EF is often more complex than necessary, does it really buy anything in terms of performance or maintainability to use SPs?
Here is what I believe:
Most of the time, modifying an SP requires updating the data model anyway. So, it isn't buying much in terms of maintainability.
For web applications, the connection to the database uses a single user ID specific to the application. So, users don't even have direct database access. That reduces the security benefit.
For a small application, the slightly decreased performance from using generated SQL probably wouldn't be much of an issue. For high-volume, performance-critical applications, would EF even be a wise choice? Plus, are the insert / update / delete statements generated by EF really that bad?
Sending every attribute to a stored procedure has its own performance penalties, whereas the EF-generated code only sends the attributes that were actually changed. When doing updates to large tables, the increased network traffic and overhead of updating all attributes probably negates the performance benefit of stored procedures.
With that said, my specific questions are:
Are my beliefs listed above correct? Is the idea of always using SPs something that is "old school" now that ORMs are gaining in popularity? In your experience, which is the better way to go with EF -- mapping SPs for all insert / update / deletes, or using EF generated SQL for CRUD operations and only using SPs for more complex stuff?
I think always using SPs is somewhat old school. I used to code that way, and now do everything I can in EF-generated code... and when I have a performance problem or other special need, I then add back in a strategic SP to solve that particular problem... it doesn't have to be either/or: use both.
All my basic CRUD operations are straight EF-generated code. My web apps used to have hundreds of SPs; now a typical one has a dozen SPs, and everything else is done in my C# code... and my productivity has gone WAY up by eliminating those 95% of CRUD stored procs.
Yes, your beliefs are absolutely correct. Using stored procedures for data manipulation makes sense mainly if:
Database follows strict security rules where changing data is allowed only through stored procedures
You are using views or custom queries for mapping your entities and you need advanced logic in stored procedure to push data back
You have some advanced logic (related to data) in the procedure for any other reason
Using procedures for pure CUD where none of the mentioned cases applies is redundant, and it doesn't provide any measurable performance boost except in a single scenario:
You use stored procedures for batch / bulk modifications
EF doesn't have bulk / batch functionality, so changing 1,000 records results in 1,000 updates, each executed with a separate database roundtrip! But such procedures cannot be mapped to entities anyway and must be executed separately via a function import (if possible) or directly via ExecuteStoreCommand or old ADO.NET (for example, if you want to use a table-valued parameter).
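For illustration, here is a rough sketch of the table-valued-parameter route via plain ADO.NET. The dbo.IntIdList user-defined table type and the usp_DeactivateAccounts procedure are assumed to already exist in the database; both names are purely hypothetical.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class BulkOperations
{
    // One roundtrip for the whole batch instead of one UPDATE per record.
    // Assumes a user-defined table type dbo.IntIdList (single int column
    // named Id) and a procedure usp_DeactivateAccounts(@Ids dbo.IntIdList
    // READONLY) -- both hypothetical.
    public static void DeactivateAccounts(string connectionString, IEnumerable<int> accountIds)
    {
        var ids = new DataTable();
        ids.Columns.Add("Id", typeof(int));
        foreach (var id in accountIds)
            ids.Rows.Add(id);

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("usp_DeactivateAccounts", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            var parameter = command.Parameters.Add("@Ids", SqlDbType.Structured);
            parameter.TypeName = "dbo.IntIdList";
            parameter.Value = ids;

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```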
The R in CRUD is a whole different story: a stored procedure can give a significant performance boost for reading data with your own optimized queries.
If performance is your primary concern, then you should take one of your existing apps that uses EF with SPs, disable the SPs, and benchmark the new version. That's the only way to get an answer perfectly applicable to your situation. You might find that, no matter what you do, EF isn't fast enough for your performance needs compared to custom code, but outside of very high-volume sites I think EF 4.1 is actually pretty reasonable.
From my PoV, EF is a great developer productivity boost. You lose a fair bit of that if you're writing SPs for simple CRUD operations, and for Insert/Update/Delete in particular I really don't see you gaining much in performance because those operations are so straightforward to generate SQL for. There are definitely some Select cases where EF will not do the optimal thing and you can get major performance increases by writing a SP (hierarchical queries using CONNECT BY in Oracle come to mind as an example).
The best way to deal with that type of thing is to write your app letting EF generate the SQL. Benchmark it. Find areas where there's performance issues and write SPs for those. Delete is almost never going to be one of the cases you need to do this.
As you mentioned, the security gain here is somewhat lessened because you should have EF on an Application tier that has its own account for the app anyway, so you can restrict what it does. SPs do give you a bit more control but in typical usage I don't think it matters.
It's an interesting question that doesn't have a truly right or wrong answer. I use EF primarily so that I don't have to write generic CRUD SPs and can instead spend my time working on the more complex cases, so for me I'd say you should write fewer of them. :)
I agree broadly with E.J, but there are a couple of other options. It really boils down to the requirements for the particular system:
Do you need to get the app developed FAST? - Then use Entity Framework and its automatic SQL
Need fine-grained and solid security? - Get onto stored procedures
Need it to run as fast as possible? - You're probably looking at some happy medium!
In my opinion, as long as your application/database does not suffer from performance issues and you are mostly using the database for CRUD, accessing it with just one DB user, it is better to use generated SQL. It is faster to develop and more maintainable, and the small security or privacy benefits are not worth it (if the data is not that sensitive). Also, model-based database access and LINQ remove the threat of SQL injection.

Should I manually code ADO.Net database access?

I'm really late to the .NET game and struggling to learn ADO.NET. I prefer to learn how to do data access the "right way". Somewhere I've picked up on the idea that it's considered superior to manually code your own Connections, Data Adapters, DataSets, DataTables, and even command statements for updating, adding, and deleting, rather than using Visual Studio's data wizards. I understand from my reading that there are some things you can only do by writing your own command statements, but it isn't completely clear to me what those might be.
Should I always code my own connections, data adapters, datasets and datatables? What about my update, insert, and delete command statements? How do I know when I should code those manually?
There is no right or wrong way. However, I would suggest you first do things the "hard way", in that you write your own code for each of the data access routines you need. Of course, that means you'll also need to know and understand SQL. Eventually you could use or build tools that generate all of your code just the way you need it.
Preferably you'll use stored procedures instead of SQL statements in code, because stored procedures provide an additional level of abstraction, abstracting your database schema from even your data layer and of course your business layer.
I'd use core ADO.NET (that is, writing your own code for data access and such). I'd use DataSets/DataTables (if you have to) purely as in-memory data structures, without using them to do automatic updates/deletes and the like. Stick to DataReaders to the extent possible, converting them into DTOs (for data retrieval methods). For data modification methods, your data layer should take DTOs as parameters (or simple data types if there are just one or two).
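A minimal sketch of that pattern, assuming a hypothetical dbo.Customer table: the DataReader stays inside the data layer, and only DTOs come out.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

// Simple DTO; the dbo.Customer table and its columns are hypothetical.
public class CustomerDto
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerData
{
    public static List<CustomerDto> GetCustomers(string connectionString)
    {
        var customers = new List<CustomerDto>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT Id, Name FROM dbo.Customer", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Copy each row into a DTO; the reader itself never
                    // leaks out of the data layer.
                    customers.Add(new CustomerDto
                    {
                        Id = reader.GetInt32(0),
                        Name = reader.GetString(1)
                    });
                }
            }
        }
        return customers;
    }
}
```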
Personally, I use tools to generate data access layer code that uses core ADO.NET (not EF or LINQ to SQL and such). That is my personal preference, and depending on the size of your application it goes a very long way towards performance, while requiring in-depth knowledge of only two things: your database/SQL, and your C# code, without also having to learn the nuances of abstraction layers and specialized languages (in some cases).
In large projects (and teams), leaving the database schema and stored procedures to people specialized in that area becomes a necessity, and in those cases using core ADO.NET also becomes a requirement.
On my blog I have posted an article wherein I introduce a tool that generates all of this code. The tool and source code are available for download. The tool also generates code for strongly typed DataReaders; that is, under the covers you're using a DataReader, while in code it looks and feels like a DTO in terms of strongly typed properties.
Data Access Layer CodeGen
DataReader Wrappers - TypeSafe
In my own experience, it is preferable to always hand-code data access rather than use the smart control wizards.
I think you should learn how its done under the covers first and then pick your own abstraction layer of which there are many.
LINQ to SQL does a great job of automating common DB tasks. All your basic CRUD (Create, Read, Update, Delete) operations will be much easier to code by using a DataContext .dbml file. The code is much easier to write, does not rely on strings, is compatible with other ADO.NET commands (you can execute a direct DbCommand against your DataContext), and is more highly optimized than anything most people will write (especially a beginner!). You will save yourself a whole lot of time by using something like LINQ to SQL or another ORM. Unless your objective is pure learning, you would be best off creating a working DataContext and analyzing the source to see how it works, instead of teaching yourself ADO.NET. The fact that you are at a point where you need to ask this question probably indicates that you will not add value to your application by writing your own boilerplate DB access code.
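As a rough sketch of what that looks like in practice (MyDataContext and the Product entity stand in for whatever your .dbml file generates; the names here are hypothetical):

```csharp
using System.Linq;

public static class ProductCrudDemo
{
    // MyDataContext and Product would be generated from a .dbml file.
    public static void Run()
    {
        using (var db = new MyDataContext())
        {
            // Create
            db.Products.InsertOnSubmit(new Product { Name = "Widget", Price = 9.99m });
            db.SubmitChanges();

            // Read: the lambda is translated to parameterized SQL
            var cheap = db.Products.Where(p => p.Price < 10m).ToList();

            // Update: change a tracked entity and submit
            var widget = db.Products.First(p => p.Name == "Widget");
            widget.Price = 8.99m;
            db.SubmitChanges();

            // Delete
            db.Products.DeleteOnSubmit(widget);
            db.SubmitChanges();
        }
    }
}
```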
It looks like a lot of people are recommending that you hand-code your DAL first, before you use an ORM like LINQ to SQL. I would just like to point out that the logic involved in this line of thinking would necessitate that we also learn to code in IL before writing C# code, build a computer before we use one, and sail across the ocean before we take an international flight.
There's not really going to be a black-and-white answer for this, but in my experience, I've always been better off coding my own stuff. This has largely been because I'm just an anal-retentive obsessive-compulsive control freak, and I just don't trust wizards to write code the way I want it written. I'm sure that many people agree with me, just as I'm sure that many people disagree with me.
The fact that OR/Ms exist is plenty of proof to prove that you don't always need to roll your own code. The fact that it's not mandatory is also proof that you aren't compelled to use it.
Do whatever feels right and meets the needs of your solution and its time and budgetary constraints.

What should I use for performance sensitive data access?

So I have an application which requires very fast access to large volumes of data, and we're at the stage where we're undergoing a large redesign of the database, which gives a good opportunity to rewrite the data access layer if necessary!
Currently in our data access layer we use manually created entities along with plain SQL to fill them. This is pretty fast, but this technology is really getting old, and I'm concerned we're missing out on a newer framework or data access method which could be better in terms of neatness and maintainability.
We've seen the Entity Framework, but after some research it just seems that the benefit the ORM gives is not enough to justify the lower performance, and as some of our queries are getting complex, I'm sure performance with EF would become more of an issue.
So it is a case of sticking with our current methods of data access, or is there something a bit neater than manually creating and maintaining entities?
I guess the thing that's bugging me is just opening our data layer solution and seeing lots of entities, all of which need to be maintained exactly in line with the database, which sometimes can be a lot of work, but then maybe this is the price we pay for performance?
Any ideas, comments and suggestions are very appreciated! :)
Thanks,
Andy.
Update:
Forgot to mention that we really need to be able to handle Azure (a client requirement), which currently stops us from using stored procedures.
Update 2:
Actually, we have an interface layer for our DAL, which means we can create an Azure implementation that just overrides the data access methods from the local implementation that aren't suitable for Azure. So I guess we could use stored procedures for performance-sensitive local databases, with EF for the cloud.
I would use an ORM layer (Entity Framework, NHibernate etc) for management of individual entities. For example, I would use the ORM / entities layers to allow users to make edits to entities. This is because thinking of your data as entities is conceptually simpler and the ORMs make it pretty easy to code this stuff without ever having to program any SQL.
For the bulk reporting side of things, I would definitely not use an ORM layer. I would probably create a separate class library specifically for standard reports, which creates SQL statements itself or calls sprocs. ORMs are not really for bulk reporting and you'll never get the same flexibility of querying through the ORM as through hand-coded SQL.
Stored procedures for performance. ORMs for ease of development.
Do you feel up to troubleshooting some opaque generated SQL when it runs badly...? When it generates several round trips where one would do? Or insists on using the wrong datatypes?
You could try using mybatis (previously known as ibatis). It allows you to map SQL statements to domain objects. This way you keep full control over the SQL being executed and get a cleanly defined domain model at the same time.
Don't rule out plain old ADO.NET. It may not be as hip as EF4, but it just works.
With ADO.NET you know what your SQL queries are going to look like because you get 100% control over them. ADO.NET forces developers to think about SQL instead of falling back on the ORM to do the magic.
If performance is high on your list, I'd be reluctant to take a dependency on any ORM, especially EF, which is new on the scene and highly complex. ORMs speed up development (a little) but are going to make your SQL query performance hard to predict, and in most cases slower than hand-rolled SQL/stored procs.
You can also unit test SQL/Stored Procs independently of the application and therefore isolate performance issues as either DB/query related or application related.
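One common way to do that is to run the procedure inside a transaction that is always rolled back, so the test database is left untouched. A sketch, assuming xUnit and a hypothetical usp_DeactivateAccount procedure with a seeded Account row:

```csharp
using System.Data;
using System.Data.SqlClient;
using Xunit;

public class StoredProcTests
{
    // Hypothetical: connection string, procedure, and schema.
    private const string ConnectionString = "<test database connection string>";

    [Fact]
    public void DeactivateAccount_ClearsIsActiveFlag()
    {
        using (var connection = new SqlConnection(ConnectionString))
        {
            connection.Open();
            // Everything runs in a transaction that is rolled back,
            // so the test leaves no trace in the database.
            using (var transaction = connection.BeginTransaction())
            {
                var proc = new SqlCommand("usp_DeactivateAccount", connection, transaction)
                {
                    CommandType = CommandType.StoredProcedure
                };
                proc.Parameters.Add("@AccountId", SqlDbType.Int).Value = 42; // assumed seed row
                proc.ExecuteNonQuery();

                var check = new SqlCommand(
                    "SELECT IsActive FROM dbo.Account WHERE Id = @Id", connection, transaction);
                check.Parameters.Add("@Id", SqlDbType.Int).Value = 42;
                Assert.False((bool)check.ExecuteScalar());

                transaction.Rollback();
            }
        }
    }
}
```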
I guess you are using ADO.NET in your DAL already, so I'd suggest investing the time and effort in refactoring it rather than throwing it out.

ETL Processing Design and Performance

I am working on an ETL process for a data warehouse, using C#, that supports both SQL Server and Oracle. During development I have been writing stored procedures that synchronize data from one database to another. The stored procedure code is rather ugly because it involves dynamic SQL: it needs to build the SQL strings since we have dynamic database names.
My team lead wants to use C# code to do the ETL. We have code generation that automatically generates new classes when the database definition changes. That's also why I decided not to use Rhino ETL.
Here are the pros and cons:
Stored Procedure:
Pros:
fast loading process, everything is handled by the database
easy deployment, no compiling is needed
Cons:
poor readability due to dynamic SQL
need to maintain both T-SQL and PL/SQL scripts when the database definition changes
slow development because there is no IntelliSense when writing dynamic SQL
C# Code:
Pros:
easier to develop the ETL process because we get IntelliSense from our generated classes
easier to maintain because of the generated classes
better logging and error handling
Cons:
slower performance compared with stored procedures
I would prefer to use application code to do the ETL process, but the performance was horrible compared with stored procedures. In one test where I tried to update 10,000 rows, the stored procedure took only 1 second, while my ETL code took 70 seconds. Even after I somehow managed to reduce the overhead, 20% of the 70 seconds was purely calling the update statement from application code.
Could someone provide me suggestions or comment on how to speed up the ETL process using application code?
My next idea is to try a parallel ETL process by opening multiple database connections and performing the updates and inserts in parallel.
Thanks
You say you have code generation that automatically generates new classes - why don't you have code generation that automatically generates new stored procedures?
That should give you the best of both worlds; encapsulate it into a few nice classes that can inspect the database and update things as necessary, and you can, well, not increase readability, but hide it (you would not need to update the SPs manually).
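A toy sketch of what such a generator might look like. A real one would read column names and types from the schema catalog views (and emit PL/SQL as well as T-SQL); this version takes them as arguments, assumes NVARCHAR(MAX) for every non-key column, and uses SQL Server 2016+ CREATE OR ALTER. All names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class ProcGenerator
{
    // Emits a boilerplate UPDATE procedure for one table.
    public static string GenerateUpdateProc(
        string table, string keyColumn, IReadOnlyList<string> columns)
    {
        var sql = new StringBuilder();
        sql.AppendLine($"CREATE OR ALTER PROCEDURE dbo.usp_Update_{table}");
        sql.AppendLine($"    @{keyColumn} INT,");
        sql.AppendLine(string.Join(",\n", columns.Select(c => $"    @{c} NVARCHAR(MAX)")));
        sql.AppendLine("AS");
        sql.AppendLine($"UPDATE dbo.{table} SET");
        sql.AppendLine(string.Join(",\n", columns.Select(c => $"    {c} = @{c}")));
        sql.AppendLine($"WHERE {keyColumn} = @{keyColumn};");
        return sql.ToString();
    }
}

// Usage: ProcGenerator.GenerateUpdateProc("Customer", "Id",
//     new[] { "Name", "Email" }) produces a ready-to-deploy script.
```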
Also, the difference should not be so huge; it sounds as if you are not doing something right (not reusing connections, moving data unnecessarily from the server to the application, or processing data in batches that are too small - row by row?).
Also, regarding better logging - care to elaborate on that? You can have logging on the database layer, too, or you can design your SPs so that application layer can still do the logging.
If your C# code is already slow with 10,000 rows, I cannot imagine it in a real environment...
Most ETL is done either within the database (stored procedures, packages, or even code compiled within the database: PL/SQL, or Java for Oracle). These can handle millions of rows.
Alternatively, professional tools can be used (Informatica, or others), but they will still be slower than stored procedures, though easier to manage.
So my conclusion is: If you want to come anywhere close to stored procedure performances, you will have to code an application as good as those professional ones on the market, that took years to develop and mature... Do you think you can?
Plus, if you have to handle different database types (SQL Server, Oracle), you CANNOT make a generic application AND optimize it at the same time, it's a choice. Because Oracle does not work the same way SQL Server does.
To give you an idea, in ETLs for Oracle, hints are used (like the Parallel Execution hints), and some indexes may be dropped or integrity constraints disabled temporarily to optimize the ETL.
There is no way that I know of to do the exact same thing in SQL Server (it might have similar options, but with different syntax).
So "one ETL for all databases" can hardly be done without losing efficiency and speed.
So I think your pros and cons are very accurate; you have to choose between speed and ease of development, but not both.
You might consider tuning up your application.
A few tricks of mine:
Don't call connection.Open() and connection.Close() too much.
In some cases LINQ will slow things down.
Use a procedure and pass more parameters when loading to reduce the number of calls; for example, change proc_load_to_table(p1 text) to proc_load_to_table(p1 text, p2 text, p3 text, p4 text, p5 text). A sketch of the widened call follows.
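A sketch of what that widened call might look like from C#, assuming the hypothetical proc_load_to_table procedure above, adapted to SQL Server with NVARCHAR parameters:

```csharp
using System.Data;
using System.Data.SqlClient;

public static class BatchLoader
{
    // Widened call: five values per roundtrip instead of one.
    // proc_load_to_table is the hypothetical procedure from the answer
    // above. Takes an already-open connection, per the first trick.
    public static void LoadBatch(SqlConnection openConnection, string[] fiveValues)
    {
        using (var command = new SqlCommand("proc_load_to_table", openConnection))
        {
            command.CommandType = CommandType.StoredProcedure;
            for (int i = 0; i < 5; i++)
            {
                command.Parameters.Add($"@p{i + 1}", SqlDbType.NVarChar, 4000)
                       .Value = fiveValues[i];
            }
            command.ExecuteNonQuery(); // one call instead of five
        }
    }
}
```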

When to use Stored Procedures instead of using any ORM with programming logic?

Hi all, I wanted to know when I should prefer writing stored procedures over writing programming logic and pulling data using an ORM or something else.
Stored procedures are executed on server side.
This means that processing large amounts of data does not require passing that data over the network connection.
Also, with stored procedures, you can build consistent complicated business logic.
Say, you need to update the account balance each time you insert a transaction, and you need to insert many transactions at once.
Instead of doing this with triggers (which are implemented using inefficient record-by-record approach in many systems), you can pass a table variable or temporary table with the inputs and issue a set-based SQL statement inside the procedure. This will be much more efficient.
I prefer SPs over programming logic mainly for two reasons:
Performance -- anything that will reduce the result set or that can be done more effectively on the server, e.g. (see the paging sketch after this list):
paging
filtering
ordering (on indexed columns)
Security -- if someone has gained the application's access to the database and wants to wipe out all your records, having to execute Row_Delete for each single record instead of DELETE FROM Rows already sounds good.
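As a sketch of the paging point: with server-side paging, only one page of rows ever crosses the network. This uses SQL Server 2012+ OFFSET/FETCH; the table and column names are hypothetical.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class OrderQueries
{
    // Returns one page of rows; the caller owns the reader and the
    // already-open connection.
    public static SqlDataReader GetOrderPage(
        SqlConnection openConnection, int pageIndex, int pageSize)
    {
        var command = new SqlCommand(
            @"SELECT Id, CustomerId, Total
              FROM dbo.[Order]
              ORDER BY Id
              OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY",
            openConnection);
        command.Parameters.Add("@Offset", SqlDbType.Int).Value = pageIndex * pageSize;
        command.Parameters.Add("@PageSize", SqlDbType.Int).Value = pageSize;
        return command.ExecuteReader();
    }
}
```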
Never unless you identify a performance issue. (largely opinion)
(a Jeff blog post!)
http://www.codinghorror.com/blog/2004/10/who-needs-stored-procedures-anyways.html
If you see stored procs as optimizations:
http://en.wikipedia.org/wiki/Program_optimization#When_to_optimize
When appropriate.
complex data validation/checking logic
avoid several round trips to do one action in the DB
several clients
anything that should be set based
You can't say "never" or "always".
There is also the case where the database engine will outlive your client code. I bet there is more DAL or ORM upgrading/refactoring going on than DB engine upgrading/refactoring.
Finally, why can't I encapsulate code in a stored proc? Isn't that a good thing?
As ever, much of your decision as to which to use will depend on your application and its environment.
There are a couple of schools of thought here, and this debate always arouses strong sentiments on both sides.
The advantages of stored procedures (as well as the large-data handling that Quassnoi has mentioned) are that the logic is tied down in the database, and therefore potentially more secure. It is also only ever in one place.
However, there will be others who believe that the place for application logic is in the application, especially if you are planning to access other types of databases (for which you would often have to write different SPs).
Another consideration may be the skills of the resources you have to implement your application.
The point at which stored procedures become preferable to an ORM is that point at which you have multiple applications talking to the same database. At this point, you want your query logic embedded in one place, rather than once per application. And even here, you might want to prefer a service layer (which can scale horizontally) instead of the database (which only scales vertically).
