I want to understand the purpose of datasets when we can directly communicate with the database using simple SQL statements.
Also, which way is better: updating the data in the dataset and then transferring it to the database all at once, or updating the database directly?
I want to understand the purpose of datasets when we can directly communicate with the database using simple SQL statements.
Why do you have food in your fridge, when you can just go directly to the grocery store every time you want to eat something? Because going to the grocery store every time you want a snack is extremely inconvenient.
The purpose of DataSets is to avoid directly communicating with the database using simple SQL statements. The purpose of a DataSet is to act as a cheap local copy of the data you care about so that you do not have to keep on making expensive high-latency calls to the database. They let you drive to the data store once, pick up everything you're going to need for the next week, and stuff it in the fridge in the kitchen so that it's there when you need it.
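In ADO.NET terms, that one trip to the store might look like this (a minimal sketch; the connection string and table names are my assumptions, not from the original answer):

using System;
using System.Data;
using System.Data.SqlClient;

class DataSetCacheDemo
{
    static void Main()
    {
        // One expensive, high-latency trip to the database...
        var adapter = new SqlDataAdapter(
            "SELECT OrderID, CustomerID, OrderDate FROM dbo.Orders",
            "Server=.;Database=Northwind;Integrated Security=true");
        var ds = new DataSet();
        adapter.Fill(ds, "Orders");   // connection opens and closes here

        // ...then many cheap local reads with no further round trips.
        foreach (DataRow row in ds.Tables["Orders"].Select("CustomerID = 'ALFKI'"))
            Console.WriteLine(row["OrderID"]);
    }
}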
Also, which way is better: updating the data in the dataset and then transferring it to the database all at once, or updating the database directly?
You order a dozen different products from a web site. Which way is better: delivering the items one at a time as soon as they become available from their manufacturers, or waiting until they are all available and shipping them all at once? The first way, you get each item as soon as possible; the second way has lower delivery costs. Which way is better? How the heck should we know? That's up to you to decide!
The data update strategy that is better is the one that does the thing in a way that better meets your customer's wants and needs. You haven't told us what your customer's metric for "better" is, so the question cannot be answered. What does your customer want -- the latest stuff as soon as it is available, or a low delivery fee?
Datasets support disconnected architecture. You can add local data, delete from it, and then using a SqlDataAdapter you can commit everything to the database. You can even load an XML file directly into a DataSet. It really depends upon what your requirements are. You can even set in-memory relations between tables in a DataSet.
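A minimal sketch of that round trip (table and column names are assumptions; SqlCommandBuilder derives the INSERT/UPDATE/DELETE commands from the SELECT):

using System;
using System.Data;
using System.Data.SqlClient;

class DisconnectedUpdateDemo
{
    static void Main()
    {
        var adapter = new SqlDataAdapter(
            "SELECT OrderID, CustomerID, OrderDate FROM dbo.Orders",
            "Server=.;Database=Northwind;Integrated Security=true");
        var builder = new SqlCommandBuilder(adapter); // auto-generates the write commands

        var ds = new DataSet();
        adapter.Fill(ds, "Orders");

        // Work disconnected: edit the local copy with no connection held open.
        DataTable orders = ds.Tables["Orders"];
        DataRow added = orders.NewRow();
        added["CustomerID"] = "ALFKI";
        added["OrderDate"] = DateTime.UtcNow;
        orders.Rows.Add(added);
        orders.Rows[0].Delete();

        // One call commits every pending insert, update and delete.
        adapter.Update(ds, "Orders");
    }
}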
And by the way, using direct SQL queries embedded in your application is a really poor way of designing an application. Your application will be prone to SQL injection. Secondly, if you write queries embedded in the application like that, SQL Server has to work out an execution plan every time, whereas stored procedures are compiled and their execution plan is already decided at compile time. SQL Server can also change the plan as the data gets large, so you will get a performance improvement from this. At least use stored procedures and validate junk input in them. They are inherently resistant to SQL injection.
Stored Procedures and Dataset are the way to go.
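For instance, a parameterized stored procedure call keeps user input out of the SQL text entirely (the procedure and parameter names here are made up for illustration):

using System;
using System.Data;
using System.Data.SqlClient;

static class OrderQueries
{
    public static void PrintOrders(string connectionString, string customerId)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.GetOrdersForCustomer", conn)) // hypothetical proc
        {
            cmd.CommandType = CommandType.StoredProcedure;
            // Input travels as a typed parameter value, never as SQL text,
            // so it cannot change the shape of the statement.
            cmd.Parameters.Add("@CustomerID", SqlDbType.NChar, 5).Value = customerId;
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine(reader["OrderID"]);
        }
    }
}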
Edit: If you are on .NET Framework 3.5 or 4.0, you can use a number of ORMs such as Entity Framework, NHibernate, or SubSonic. ORMs represent your business model more realistically. You can always use stored procedures alongside an ORM if some features are not supported by it.
For example, if you are writing a recursive CTE (Common Table Expression), stored procedures are very helpful. You will run into too many problems if you use Entity Framework for that.
This page explains in detail in which cases you should use a DataSet and in which cases you should use direct access to the database.
My usual practice: if I need to perform a bunch of analytical processes on a large set of data, I will fill a DataSet (or a DataTable, depending on the structure). That way it is a model disconnected from the database.
But for DML queries I prefer quick hits directly to the database (preferably through stored procs). I have found this is the most efficient, and with well-tuned queries it is not bad at all on the DB.
This might seem like an odd question, but it's been bugging me for a while now. Given that I'm not a hugely experienced programmer, and I'm the sole application/C# developer in the company, I felt the need to sanity-check this with you guys.
We have created an application that handles shipping information internally within our company; this application works with a central DB at our IT office.
We've recently switched the DB from MySQL to MSSQL, and during the transition we decided to forgo the web services previously used and connect directly to the DB using an application role. For added security we only allow access to stored procedures, and all CRUD operations are handled via these.
However, we currently have stored procedures for updating every field in one of our objects, which is quite a few stored procedures, and as such quite a bit of work on the client for the DataRepository (needing separate code to call the procedure and pass the right params for each procedure).
So I'm thinking: would it be better to simply update the entire object (in this case, an object represents a table, for example shipments), given that a lot of that data would change one field at a time after the initial insert, and that we are trying to keep network usage down, as some of the clients will run with limited internet?
What's the standard practice for this kind of thing? Or is there a method that I've overlooked?
I would say that updating all the columns for the entire row is a much more common practice.
If you have a proc for each field, and you change multiple fields in one update, you will have to wrap all the stored procedure calls into a single transaction to avoid the database getting into an inconsistent state. You also have to detect which field changed (which means you need to compare the old row to the new row).
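As a rough sketch of the whole-row alternative (repository, proc and column names are invented here), one proc call replaces a dozen per-field calls, and the single statement inside the proc is atomic on its own:

using System.Data;
using System.Data.SqlClient;

class Shipment
{
    public int Id { get; set; }
    public string Carrier { get; set; }
    public decimal Weight { get; set; }
    public string Status { get; set; }
}

static class ShipmentRepository
{
    public static void Update(string connectionString, Shipment s)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.Shipment_Update", conn)) // hypothetical proc
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@ShipmentID", s.Id);
            cmd.Parameters.AddWithValue("@Carrier", s.Carrier);
            cmd.Parameters.AddWithValue("@Weight", s.Weight);
            cmd.Parameters.AddWithValue("@Status", s.Status);
            conn.Open();
            cmd.ExecuteNonQuery(); // no client-side transaction to coordinate
        }
    }
}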
Look into using an Object-Relational Mapper (ORM) like Entity Framework for these kinds of operations. You will find that there is no general consensus on whether ORMs are a great solution for all data access needs, but it's hard to deny that they solve the problem of CRUD pretty comprehensively.
Connecting directly to the DB over the internet isn't something I'd switch to in a hurry.
"we decided to forgo the webservices previously used and connect directly to the DB"
What made you decide this?
If you are intent on this model, then a single SPROC to update an entire row would be advantageous over one per column. I have a similar application which uses SPROCs in this way; however, the data from the client comes in via XML, and a middleware application on our server end deals with updating the DB.
The standard practice is not to connect to the DB over the internet.
Even for a small app, this should be the overall model:
Client app -> over internet -> server-side app (WCF WebService) -> LAN/localhost -> SQL DB
Benefits:
your client app would not even know that you have switched DB implementations.
It would not know anything about DB security, etc.
you, as a programmer, would not be thinking in terms of "rows" and "columns" on the client side. Those would be objects and fields.
you would be able to use different protocols: send only single-field updates between the client app and server app, but update entire rows between the server app and DB (see the sketch below).
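A minimal WCF contract for that middle layer might look like this (all names are illustrative, not a prescription):

using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class Shipment
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Carrier { get; set; }
    [DataMember] public string Status { get; set; }
}

[ServiceContract]
public interface IShipmentService
{
    // The client talks objects and fields over the internet;
    // only the server talks rows and columns to SQL over the LAN.
    [OperationContract]
    Shipment GetShipment(int shipmentId);

    [OperationContract]
    void UpdateShipment(Shipment shipment);
}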
Now, given your situation, updating entire row (the entire object) is definitely more of a standard practice than updating a single column.
It's better to only update what you changed, if you know what you changed (if using an ORM like Entity Framework, for example), but if you're going down the stored proc route then yes, definitely update everything in a row at once; that's granular enough.
You should take the switch as an opportunity to change over to LINQ to Entities, though, while you're already in the middle of a big change, and ditch stored procedures in the process wherever possible.
When I am developing an ASP.NET website I really like to use Entity Framework with both database-first and code-first models (+ ASP.NET MVC controller scaffolding).
For an application requiring access to an existing database, I naturally thought to create a database model and to use ASP.NET MVC scaffolding to get all the basic CRUD operations done in a few minutes with nearly no development cost.
But I discussed it with a friend who told me that accessing data stored in the database only through stored procedures is the best approach to take.
My question is thus: what do you think of this statement? Is it better to create stored procedures for all required operations on a table in the database (e.g. create and read on this table; update and delete only on another one; ...)? And what are the advantages/disadvantages of doing so instead of using a database-first model created from the tables in the database?
What I thought at first is that it doubles the development cost to do everything through stored procedures, as you have to write these stored procedures where Entity Framework could have provided a DbContext in a few clicks, allowing me to use LINQ over entities. But then I've read a few things about ownership chains that might improve security by granting only permission to execute the stored procedures, and no permissions for any operations (select, insert, update, delete) on the tables.
Thank you for your answers.
It's a cost-benefit analysis. Being a DB-focused guy, I would agree with that statement. It is best. It also makes your code easier to read (no crazy SQL statements uglifying it), gives increased performance with cached execution plans, ease of modifying the queries without recompiling the code, etc.
Many of the people I work with are not all that familiar with writing SPROCs, so it tends to be a constant fight to get them to use them. Personally I don't see any reason to ever bury SQL statements in your code. They tend to shy away from SPROCs because it is more work for them up front.
Yes, it's a good approach.
Whether it's the best approach or not, that depends on a lot of factors, some of them which you don't even know yet.
One important factor is how much further development there will be, and how much maintenance. If the initial development is a big part of the total job, then you should rather use a method that gets you there as fast and easily as possible.
If you will be working with and maintaining the system for a long time, you should focus less on the initial development time, and more on how easy it is to make changes to the system once it's up and running. Using stored procedures is one way to make the code less dependent on the exact data layout, and it allows you to make changes without a lot of downtime.
Note that it's not necessarily a choice between stored procedures and Entity Framework. You can also use stored procedures with Entity Framework.
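For example, with EF 6 you can run LINQ queries and call a stored procedure through the same context; a minimal sketch (context, entity and proc names are hypothetical):

using System.Data.Entity;
using System.Data.SqlClient;
using System.Linq;

class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class ShopContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

static class MixedAccessDemo
{
    static void Run()
    {
        using (var db = new ShopContext())
        {
            // Ordinary LINQ-to-Entities query...
            var locals = db.Customers.Where(c => c.Name.StartsWith("A")).ToList();

            // ...and a stored procedure through the same context.
            var vips = db.Database.SqlQuery<Customer>(
                "EXEC dbo.GetVipCustomers @minOrders",   // hypothetical proc
                new SqlParameter("@minOrders", 10)).ToList();
        }
    }
}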
This is primarily an opinion-based question and the answer may depend on the situation. Using stored procedures is definitely one of the best ways to query the database, but since its emergence Entity Framework has also become widely used. The advantage of Entity Framework is that it provides a higher level of abstraction.
Entity Framework applications provide the following benefits:
Applications can work in terms of a more application-centric conceptual model, including types with inheritance, complex members, and relationships.
Applications are freed from hard-coded dependencies on a particular data engine or storage schema.
Mappings between the conceptual model and the storage-specific schema can change without changing the application code.
Developers can work with a consistent application object model that can be mapped to various storage schemas, possibly implemented in different database management systems.
Multiple conceptual models can be mapped to a single storage schema.
Language-integrated query (LINQ) support provides compile-time syntax validation for queries against a conceptual model (a quick illustration follows this list).
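The last point is easy to see in practice (the context and entity below are hypothetical): misspell a property and the query stops compiling, whereas a typo inside a raw SQL string surfaces only at runtime.

using System.Data.Entity;
using System.Linq;

class Contact
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Crm : DbContext
{
    public DbSet<Contact> Contacts { get; set; }
}

static class LinqValidationDemo
{
    static void Run()
    {
        using (var db = new Crm())
        {
            // Checked at compile time: misspell Name as Nmae and this line
            // no longer compiles. "SELECT Nmae FROM Contacts" in a raw SQL
            // string would fail only when executed against the database.
            var names = db.Contacts.Select(c => c.Name).ToList();
        }
    }
}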
You may also check this related question Best practice to query data from MS SQL Server in C Sharp?
Following are some stored procedure advantages:
Encapsulate multiple statements as single transactions using stored procedures
Implement business logic using temp tables
Better error handling by having tables for capturing/logging errors
Parameter validations / domain validations can be done at database level
Control the query plan by forcing the choice of index
Use sp_getapplock to enforce single execution of procedure at any time
In addition, Entity Framework adds an overhead for each request you make, as it uses reflection for each query. So by implementing a stored procedure you will gain time, as it's compiled rather than interpreted each time like a normal Entity Framework query.
The link below gives some reasons why you should use Entity Framework:
http://kamelbrahim.blogspot.com/2013/10/why-you-should-use-entity-framework.html
Hope this can enlighten you a bit
So I'm gonna give you a suggestion, and it will be something I've done, but not many would say "I do that".
So, yes, I used stored procedures when using ADO.NET.
I also (at times) use ORMs, like NHibernate and Entity Framework.
When I use ADO.NET, I use stored procedures.
When you get data from the database, you have to turn it into something on the DotNet side.
The quickest thing is to put data into a DataTable or DataSet.
I no longer favor this method. While it may make for RAPID development ("just stuff the data into a datatable")......it does not work well for MAINTENANCE, even if that maintenance is only 2-3 months down the road.
So what do I put the data into?
I create DTO/POCO objects and hydrate the data from the database into these objects.
For example.
The Northwind database has
Customer(s)
Order(s)
and OrderDetail(s)
So I create C# classes called Customer.cs, Order.cs and OrderDetail.cs.
These ONLY contain properties of the entity. Most of the time, the properties simply reflect the columns in the database for that entity. (Order.cs has properties that simulate a Select * from dbo.Orders where OrderID = 123, for example.)
Then I create a child-collection object:
public class OrderCollection : List<Order>{}
and then the parent object gets a property:
public class Customer
{
/* a bunch of scalar properties */
public OrderCollection Orders {get;set;}
}
So now you have a stored procedure. And it gets data.
When that data comes back, one way to get it is with an IDataReader (.ExecuteReader).
When this IDataReader comes back, I loop over it, and populate the Customer(.cs), the Orders, and the OrderDetails.
This is basic, poor man's ORM (object-relational mapping).
Back to how I code my stored procedures: I would write a procedure that returns 3 resultsets (one DB hit), returning the info about the Customer, the Order(s) (if any) and the OrderDetail(s) (if any exist).
Note that I do NOT do a lot of JOINING.
When you do a "Select * from dbo.Customer c join dbo.Orders o on c.CustomerID = o.CustomerId", you'll note you get redundant data in the first columns. This is what I do not like.
I prefer multiple resultsets OVER joining and bringing back a single resultset with redundant data.
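A sketch of that pattern (the proc name and columns are invented; it assumes the Customer/Order/OrderCollection classes above expose simple Id/Name properties): one reader walks all three resultsets via NextResult().

using System.Data;
using System.Data.SqlClient;

static class CustomerHydrator
{
    public static Customer Load(string connectionString, int customerId)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.uspCustomerGetWithOrders", conn)) // hypothetical
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@CustomerID", customerId);
            conn.Open();

            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                // Resultset 1: the Customer row.
                Customer customer = null;
                if (reader.Read())
                    customer = new Customer
                    {
                        Id = (int)reader["CustomerID"],       // assumed columns
                        Name = (string)reader["CompanyName"],
                        Orders = new OrderCollection()
                    };

                // Resultset 2: the Orders, if any.
                if (reader.NextResult() && customer != null)
                    while (reader.Read())
                        customer.Orders.Add(new Order { Id = (int)reader["OrderID"] });

                // Resultset 3: the OrderDetails would be hydrated the same way.
                return customer;
            }
        }
    }
}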
Now for the little special trick.
Whenever I select from a table, I always select all columns on that table.
So whenever I write a stored procedure that needs customer data, I do a
Select A,B,C,D,E,F,G from dbo.Customer where (......)
Now, a lot of people will argue with that: "Why do you bring back more info than you need?"
Well, real ORMs do this anyway. So I am mirroring that, poor-man style.
And my code for taking the resultset(s) from the stored procedure and turning them into instances of objects........stays consistent.
Because if you write 3 stored procedures, and each one selects data from the Customer table, BUT you select different columns and/or in a different order, your "object mapper" code needs to have a method for each stored procedure.
This method of ADO.NET has served me well.
And once my team swapped out ADO.NET for a real ORM, that transition was very pain-free because of the way we did the ADO.NET from the get-go.
Quick rules of thumb:
1. If using ADO.NET, use stored procedures.
2. Get multiple result-sets, instead of redundant data via joins.
3. Make your columns consistent from any table you select from.
4. Take the results of your stored procedure call, and write a "hydrater" to take that info and put it into your domain model (the .cs classes) as soon as you can.
That has served me well for many years.
Good luck.
In my opinion:
Stored Procedures are written in big iron database "languages" like PL/SQL or T-SQL
Stored Procedures typically cannot be debugged in the same IDE you write your UI in.
Stored Procedures don't provide much feedback when things go wrong.
Stored Procedures can't pass objects.
Stored Procedures hide business logic.
Source:
http://www.codinghorror.com/blog/2004/10/who-needs-stored-procedures-anyways.html
We have a requirement where we need to query data across 2 different databases (one in SQL Server, the other in Oracle).
Here are the scenarios which need to be implemented:
Query: get the data from one database and match against values in the other
Update: get the data from one database and update the objects in the other
Technologies that we are using: ASP.NET, C#
The options that we have thought about:
Staging area in one database
Linked Server (can't go with this approach as it is not allowed due to an organization-wide policy)
Create web services
Create 2 different DALs and perform list operations on the data from the 2 sources in the DAL
I would like to know the best design strategy to deal with this kind of scenario, and the pros and cons of each approach.
Is it not possible to use an SSIS package to do the data transformation between the 2 servers, and invoke it either via an ASP.NET/C# project or via a scheduled job invoked on demand?
Will the results from one of the databases be small enough to efficiently pass around?
If so, I would suggest treating the databases as two independent datasources.
If the datasets are large, then you may have to consider some form of ETL into a staging area on one of the databases. You may have issues if you need the queries to return up-to-date data from both databases, because you will need to do real-time ETL.
There is an article here about performing distributed transactions between Microsoft SQL Server and Oracle:
https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1054237.html
I don't know how well this works, however if it does work, this will probably be the best solution for you:
It will almost certainly be the fastest method of querying across multiple database servers.
It should also allow for true transactional support even when writing to both databases.
The best strategy for this would be to use a Linked Server, as it is designed for querying and writing to heterogeneous databases as you described above. But obviously, due to the policy constraint you mentioned, this is not an option.
Therefore, to achieve the result you want with optimal performance, here is what I suggest:
Decide which database contains only the lookup data (the minimal dataset); you will need to execute a query on it to pull the info out
Insert the lookup data using bulk copy into a temp/dummy table in the main database (the one containing most of the data that you will want to retrieve and return to the caller)
Use a stored procedure or query to join the temp table with other tables in your main database to retrieve the desired dataset (see the sketch below)
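A sketch of steps 2 and 3 (all names invented; the reader over the Oracle lookup rows is assumed to come from your Oracle DAL): bulk-load into a temp table on the open SQL Server connection, then join.

using System.Data;
using System.Data.SqlClient;

static class CrossDbQuery
{
    // oracleLookupReader: any IDataReader positioned over the Oracle lookup rows.
    public static DataTable MatchAgainstMain(string sqlConnString, IDataReader oracleLookupReader)
    {
        using (var conn = new SqlConnection(sqlConnString))
        {
            conn.Open();

            // The temp table survives as long as this connection stays open.
            new SqlCommand("CREATE TABLE #Lookup (Id INT PRIMARY KEY)", conn).ExecuteNonQuery();

            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#Lookup" })
                bulk.WriteToServer(oracleLookupReader);

            // Join the staged lookup rows with the main tables in one query.
            var result = new DataTable();
            new SqlDataAdapter(new SqlCommand(
                "SELECT m.* FROM dbo.MainTable m JOIN #Lookup l ON l.Id = m.Id", conn))
                .Fill(result);
            return result;
        }
    }
}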
The decision whether or not to write this as a web service isn't going to change the data retrieval process. But consideration should be given to reducing the overhead of data transfer time by keeping the process as close to your DB server as possible, either on the same machine or within a LAN/high-speed connection link.
Data update will be quite straightforward. It will just be the standard two-phase operation of pulling data out of one and updating the other.
It's hard to tell what the best solution is. But we have a scenario that's nearly the same.
RealTime:
For real-time data updating we are using WebServices, since in our case the two different databases belong to distinct projects. So every project offers a WebService which can be used for data retrieval and data update. That has the advantage that the project does not have to care about database structure changes as long as the WebService interface does not change.
Static Data:
Static data (e.g. employees) is mirrored for faster access. For that huge amount of data we use flat files for the nightly update.
In the case of static data I think it's important to explicitly define data owners. For every piece of data it should be clear which database has the original and which database only has shadow copies for faster access.
So static data is read-only in the shadow database, or only updatable through designated WebServices.
The problem with using multiple data sources in your .NET code is that you run the risk of having your CRUD ops fail ACID tests and ending up with data inconsistencies.
I would be most inclined to pursue @Will A's comment on your question...
Set up replication to a remote server, then link the two remote servers.
Have multiple DALs and handle it in the application. Thousands is not a big number; you need to worry only if you are into the 100,000s or millions, in which case your application will hang.
Use LINQ to perform data operations on the datasets that are generated, rather than looping through them.
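For example (the two record types and the fetch results are placeholders for whatever your two DALs return):

using System.Collections.Generic;
using System.Linq;

class SqlCustomer { public string AccountNumber { get; set; } }
class OracleAccount { public string AccountNumber { get; set; } }

class Matched
{
    public SqlCustomer Customer;
    public OracleAccount Account;
}

static class Matcher
{
    public static List<Matched> Match(List<SqlCustomer> fromSqlServer,
                                      List<OracleAccount> fromOracle)
    {
        // Join builds a hash table internally, so this is far cheaper than
        // nested foreach loops over a few thousand rows.
        return fromSqlServer
            .Join(fromOracle,
                  c => c.AccountNumber,
                  a => a.AccountNumber,
                  (c, a) => new Matched { Customer = c, Account = a })
            .ToList();
    }
}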
Hi all, I wanted to know when I should prefer writing stored procedures over writing programming logic and pulling data using an ORM or something else.
Stored procedures are executed on server side.
This means that processing large amounts of data does not require passing these data over the network connection.
Also, with stored procedures, you can build consistent complicated business logic.
Say, you need to update the account balance each time you insert a transaction, and you need to insert many transactions at once.
Instead of doing this with triggers (which in many systems are implemented using an inefficient record-by-record approach), you can pass a table variable or temporary table with the inputs and issue a set-based SQL statement inside the procedure. This will be much more efficient.
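A sketch of the client side of that pattern (the table type and proc names are hypothetical and must already exist in the database):

using System.Data;
using System.Data.SqlClient;

static class TransactionBatch
{
    // 'transactions' must have columns matching the user-defined table type,
    // e.g. AccountId INT, Amount DECIMAL(18,2).
    public static void InsertMany(string connectionString, DataTable transactions)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.uspInsertTransactions", conn)) // hypothetical
        {
            cmd.CommandType = CommandType.StoredProcedure;
            SqlParameter p = cmd.Parameters.AddWithValue("@Transactions", transactions);
            p.SqlDbType = SqlDbType.Structured;        // table-valued parameter
            p.TypeName = "dbo.TransactionTableType";   // hypothetical table type
            conn.Open();
            cmd.ExecuteNonQuery(); // one round trip; the proc does set-based work
        }
    }
}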
I prefer SPs over programming logic mainly for two reasons:
Performance: anything that will reduce the result set, or that can be done more effectively on the server, e.g.:
paging
filtering
ordering (on indexed columns)
Security: if someone has gained the application's access to the database and wants to wipe out all your records, having to execute Row_Delete for each single one of them instead of DELETE FROM Rows already sounds good.
Never unless you identify a performance issue. (largely opinion)
(a Jeff blog post!)
http://www.codinghorror.com/blog/2004/10/who-needs-stored-procedures-anyways.html
If you see stored procs as optimizations:
http://en.wikipedia.org/wiki/Program_optimization#When_to_optimize
When appropriate.
complex data validation/checking logic
avoid several round trips to do one action in the DB
several clients
anything that should be set-based
You can't say "never" or "always".
There is also the case where the database engine will outlive your client code. I bet there's more DAL or ORM upgrading/refactoring going on than DB engine upgrading/refactoring.
Finally, why can't I encapsulate code in a stored proc? Isn't that a good thing?
As ever, much of your decision as to which to use will depend on your application and its environment.
There are a couple of schools of thought here, and this debate always arouses strong sentiments on both sides.
The advantages of stored procedures (as well as the large-data-moving case that Quassnoi has mentioned) are that the logic is tied down in the database, and therefore potentially more secure. It is also only ever in one place.
However, there will be others who believe that the place for application logic should be in the application, especially if you are planning to access other types of databases (for which you will often have to write different SPs).
Another consideration may be the skills of the resources you have to implement your application.
The point at which stored procedures become preferable to an ORM is that point at which you have multiple applications talking to the same database. At this point, you want your query logic embedded in one place, rather than once per application. And even here, you might want to prefer a service layer (which can scale horizontally) instead of the database (which only scales vertically).
I have been developing many applications and have been confused about using DataSets.
To date I don't use DataSets; I work in my applications directly against my database, using queries and procedures that run on the database engine.
But I would like to know what the good practice is:
Using a DataSet?
or
Working directly on the database?
Please also try to give me certain cases of when to use a DataSet, along with the operation (insert/update).
Can we set a read/write lock on a DataSet with respect to our database?
You should either embrace stored procedures, or make your database dumb. That means you have no logic whatsoever in your DB, only CRUD operations. If you go with the dumb-database model, DataSets are bad. You are better off working with real objects so you can add business logic to them. This approach is more complicated than just operating directly on your database with stored procs, but you can manage complexity better as your system grows. If you have a large system with lots of little rules, stored procedures become very difficult to manage.
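For example, with the dumb-database model a rule lives on the object, where it can be unit-tested without a database (a toy sketch; the rule itself is invented):

class Order
{
    public decimal Subtotal { get; set; }
    public bool IsRush { get; set; }

    // The business rule sits with the data it governs, not in a stored
    // procedure, so a plain unit test can exercise it.
    public decimal Total()
    {
        decimal shipping = IsRush ? 25m : 5m;
        return Subtotal + shipping;
    }
}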
In ye olde times before MVC was a mere twinkle in Haack's eye, it was jolly handy to have DataSet handle sorting, multiple relations and caching and whatnot.
Us real developers didn't care about such trivia as locks on the database. No, we had conflict resolution strategies that generally just stamped all over the most recent edits. User friendliness? < Pshaw >.
But in these days of decent generic collections, a plethora of ORMs and an awareness of separation of concerns they really don't have much place any more. It would be fair to say that whenever I've seen a DataSet recently I've replaced it. And not missed it.
As a rule of thumb, I would put logic that refers to data consistency, integrity etc. as close to that data as possible, i.e. in the database. Also, if I am having to fetch my data in a way that is interdependent (i.e. fetch from tables A, B and C where the relationship between A, B and C's contributions is known at request time), then it makes sense to save on callout overhead and do it in one go, via a database object such as a function or procedure (as already pointed out by OMGPonies). For logic that is a level or two removed, it makes sense to have it where dealing with it "procedurally" is a bit more intuitive, such as in a dataset. Having said all that, rules of thumb are sometimes what their acronym infers... ROT!
In past .NET projects I've often done data imports/transformations (e.g. for bank transaction data files) in the database (one callout; all logic is encapsulated in the procedure and is transaction-protected), but have "parsed" items from that same data in a second stage in my .NET code, using DataTables and the like (although these days I would most likely skip the DataSet stage and work on them from a higher level of abstraction, using class objects).
I have seen datasets used very well in one application, but that is over 7 years of development across quite a few different applications (at least double figures).
There are so many best practices around these days that point towards developing with objects rather than datasets for enterprise development. Objects, along with an ORM like NHibernate or Entity Framework, can be very powerful and take a lot of the grunt work out of creating CRUD stored procedures. This is the way I favour developing applications, as I can separate business logic nicely this way in a domain layer.
That is not to say that datasets don't have their place. I am sure that in certain circumstances they may be a better fit than objects, but for me I would need to be very sure of this before going with them.
I have also been wondering about this, since I haven't needed DataSets in my source code for months.
Actually, if your objects are O/R-mapped, and use serialization and generics, you would never need DataSets.
But DataSet has a great use in generating reports.
This is because reports have no specific structure that can be, or should be, O/R-mapped.
I only use DataSets in tandem with reporting tools.