upsert data for a list of objects to database - c#

Oftentimes I find myself needing to send a user-updated collection of records to a stored procedure. For example, let's say there is a contacts table in the database. On the front end, I display, say, 10 contact records for the user to edit. The user makes changes and hits save.
At that point, I can either call my upsertContact stored procedure 10 times in a loop with the user-modified data, or send all 10 records together to the stored procedure as XML formatted like <contact><firstname>name</firstname><lastname>lname</lastname></contact>. I always end up doing XML.
Is there a better way to accomplish this? Is the XML method going to break with a large number of records because of the size? If so, how do people achieve this kind of functionality?
FYI, it is usually not just a direct table update, so I have not looked into SqlDataSource.
Edit: as requested, the version so far has been SQL Server 2005, but we are upgrading to 2008 now, so any new features are welcome. Thanks.
Update: based on this article and the feedback below, I think table-valued parameters are the best approach to choose. Also, the new MERGE functionality of SQL Server 2008 is really cool with TVPs.

What version of SQL Server? You can use table-valued parameters in SQL Server 2008+ ... they are very powerful even though they are read-only, and they are going to be less hassle than XML and less trouble than converting to an ORM (IMHO). Hit up the following resources:
MSDN : Table-Valued Parameters:
http://msdn.microsoft.com/en-us/library/bb510489%28SQL.100%29.aspx
Erland Sommarskog's Arrays and Lists in SQL Server 2008 / Table-Valued Parameters:
http://www.sommarskog.se/arrays-in-sql-2008.html#TVP_in_TSQL
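As a rough illustration of the client side, here is a minimal TVP sketch; the table type dbo.ContactType, the procedure dbo.upsertContacts, and the column names are assumptions for this example rather than anything from the question:

    // Assumed to already exist in the database (illustrative names):
    //   CREATE TYPE dbo.ContactType AS TABLE (FirstName nvarchar(100), LastName nvarchar(100));
    //   CREATE PROCEDURE dbo.upsertContacts @Contacts dbo.ContactType READONLY
    //   AS ... (e.g. a single MERGE against the Contacts table) ...
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public class Contact
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    public static class ContactSaver
    {
        public static void SaveAll(IEnumerable<Contact> contacts, string connectionString)
        {
            // Shape a DataTable to match the columns of the table type.
            var table = new DataTable();
            table.Columns.Add("FirstName", typeof(string));
            table.Columns.Add("LastName", typeof(string));
            foreach (var contact in contacts)
                table.Rows.Add(contact.FirstName, contact.LastName);

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("dbo.upsertContacts", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                SqlParameter parameter = command.Parameters.AddWithValue("@Contacts", table);
                parameter.SqlDbType = SqlDbType.Structured;   // mark it as a TVP
                parameter.TypeName = "dbo.ContactType";        // must match the table type name
                connection.Open();
                command.ExecuteNonQuery();                     // all rows travel in one round trip
            }
        }
    }

Inside the procedure, @Contacts can be joined or MERGEd like any other table, which is where the SQL Server 2008 MERGE mentioned in the question's update comes in.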

I would think directly manipulating XML in the database would be more trouble than it's worth; I would instead suggest making each call separately, as you describe: 10 calls, one to save each contact.
There are benefits and drawbacks to that approach; obviously, you're having to create the database connection. However, you could simply queue up a bunch of commands to send over one connection.
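A rough sketch of that "many commands, one connection" idea, assuming a stored procedure named dbo.upsertContact with @FirstName/@LastName parameters (all names illustrative):

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public static class OneConnectionSaver
    {
        public static void SaveAll(IEnumerable<(string FirstName, string LastName)> contacts,
                                   string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();                      // one connection for all calls
                using (SqlTransaction transaction = connection.BeginTransaction())
                {
                    foreach (var contact in contacts)
                    {
                        using (var command = new SqlCommand("dbo.upsertContact", connection, transaction))
                        {
                            command.CommandType = CommandType.StoredProcedure;
                            command.Parameters.AddWithValue("@FirstName", contact.FirstName);
                            command.Parameters.AddWithValue("@LastName", contact.LastName);
                            command.ExecuteNonQuery();  // one call per contact
                        }
                    }
                    transaction.Commit();               // all saves succeed or fail together
                }
            }
        }
    }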

The SQL Server xml data type has the same 2 GB size limit as VARCHAR(MAX), so it would take a really large changeset to cause it to break.
I have used a similar method in the past when saving XML requests and responses and found no issues with it. Not sure if it's the "best" solution, but "best" is always relative.

It sounds like you could use an object-relational mapping (ORM) solution like NHibernate or the Entity Framework. These solutions give you the ability to make changes to objects and have the changes propagated to the database by the ORM provider. This makes them much more flexible than issuing your own SQL statements to the database. They can also make optimizations such as sending all changes in a single transaction over a single connection.
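To give a feel for the ORM approach, a minimal Entity Framework sketch; the ContactsContext class, the Contact entity, and the edit being applied are all illustrative assumptions:

    using System.Data.Entity;   // Entity Framework 6 chosen for illustration
    using System.Linq;

    public class Contact
    {
        public int Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    public class ContactsContext : DbContext
    {
        public DbSet<Contact> Contacts { get; set; }
    }

    public static class OrmExample
    {
        public static void TidyUpNames()
        {
            using (var db = new ContactsContext())
            {
                // Edit tracked objects in memory...
                foreach (Contact contact in db.Contacts.Take(10).ToList())
                    contact.LastName = (contact.LastName ?? "").Trim();

                // ...and let the ORM push every change back in one call,
                // wrapped in a single transaction on a single connection.
                db.SaveChanges();
            }
        }
    }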

Related

Inserting/updating huge amount of rows into SQL Server with C#

I have to parse a big XML file and import (insert/update) its data into various tables with foreign key constraints.
So my first thought was: I create a list of SQL insert/update statements and execute them all at once by using SqlCommand.ExecuteNonQuery().
Another method I found was shown by AMissico: Method
where I would execute the SQL commands one by one. No one complained, so I think it's also a viable practice.
Then I found out about SqlBulkCopy, but it seems that I would have to create a DataTable with the data I want to upload. So, SqlBulkCopy for every table. For this I could create a DataSet.
I think every option supports SqlTransaction. It's approximately 100 - 20000 records per table.
Which option would you prefer and why?
You say that the XML is already in the database. First, decide whether you want to process it in C# or in T-SQL.
C#: You'll have to send all data back and forth once, but C# is a far better language for complex logic. Depending on what you do it can be orders of magnitude faster.
T-SQL: No need to copy data to the client but you have to live with the capabilities and perf profile of T-SQL.
Depending on your case one might be far faster than the other (not clear which one).
If you want to compute in C#, use a single streaming SELECT to read the data and a single SqlBulkCopy to write it. If your writes are not insert-only, write to a temp table and execute as few DML statements as possible to update the target table(s) (maybe a single MERGE).
If you want to stay in T-SQL, minimize the number of statements executed. Use set-based logic.
All of this is simplified/shortened. I left out many considerations because they would be too long for a Stack Overflow answer. Be aware that the best strategy depends on many factors. You can ask follow-up questions in the comments.
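A sketch of the C# path described above (one streaming read, parse in C#, one SqlBulkCopy into a staging table, then a single MERGE); every table, column, and XML element name here is an assumption for illustration:

    using System.Data;
    using System.Data.SqlClient;
    using System.Xml.Linq;

    public static class BulkImportSketch
    {
        public static void Run(string connectionString)
        {
            // In-memory table shaped like the (assumed) staging table dbo.Staging_Target.
            var staging = new DataTable();
            staging.Columns.Add("Id", typeof(int));
            staging.Columns.Add("Name", typeof(string));

            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();

                // 1. One streaming SELECT to read the source rows.
                using (var select = new SqlCommand("SELECT Id, Payload FROM dbo.SourceXml", connection))
                using (SqlDataReader reader = select.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // 2. Do the complex parsing in C# (element names are made up).
                        var payload = XElement.Parse(reader.GetString(1));
                        staging.Rows.Add(reader.GetInt32(0), (string)payload.Element("name"));
                    }
                }

                // 3. One SqlBulkCopy into the staging table.
                using (var bulk = new SqlBulkCopy(connection))
                {
                    bulk.DestinationTableName = "dbo.Staging_Target";
                    bulk.WriteToServer(staging);
                }

                // 4. As few DML statements as possible against the real table, e.g. one MERGE.
                const string merge = @"
                    MERGE dbo.Target AS t
                    USING dbo.Staging_Target AS s ON t.Id = s.Id
                    WHEN MATCHED THEN UPDATE SET t.Name = s.Name
                    WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (s.Id, s.Name);";
                using (var command = new SqlCommand(merge, connection))
                    command.ExecuteNonQuery();
            }
        }
    }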
Don't do it from C# unless you have to; it's a huge overhead, and SQL can do it so much faster and better by itself.
Insert to table from XML file using INSERT INTO SELECT

Best way to have 2 connections to sql server (one read one write)

I have a very large number of rows (10 million) which I need to select out of a SQL Server table. I will go through each record and parse out each one (they are xml), and then write each one back to a database via a stored procedure.
The question I have is, what's the most efficient way to do this?
The way I am doing it currently is I open two SqlConnections (one for reading, one for writing). The read one uses a SqlDataReader which basically does a select * from the table, and I loop through the results. After I parse each record, I do an ExecuteNonQuery (using parameters) on the second connection.
Are there any suggestions to make this more efficient, or is this just the way to do it?
Thanks
It seems that you are writing rows one by one. That is the slowest possible model. Write bigger batches.
There is no need for two connections when you use MARS. Unfortunately, MARS forces a 14 byte row versioning tag in each written row. Might be totally acceptable, or not.
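To illustrate just the MARS point (batching the writes is still the bigger win), a sketch in which the connection string flag is the important part and the table and procedure names are made up:

    using System.Data;
    using System.Data.SqlClient;

    public static class MarsSketch
    {
        public static void Run()
        {
            // Illustrative connection string; the key part is MultipleActiveResultSets.
            const string connectionString =
                "Data Source=.;Initial Catalog=MyDb;Integrated Security=True;" +
                "MultipleActiveResultSets=True";

            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();

                using (var select = new SqlCommand("SELECT Id, Payload FROM dbo.Source", connection))
                using (SqlDataReader reader = select.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        string parsed = reader.GetString(1); // stand-in for the real XML parsing

                        // With MARS enabled, this write runs on the same connection
                        // while the reader is still open.
                        using (var write = new SqlCommand("dbo.SaveParsedRow", connection))
                        {
                            write.CommandType = CommandType.StoredProcedure;
                            write.Parameters.AddWithValue("@Id", reader.GetInt32(0));
                            write.Parameters.AddWithValue("@Value", parsed);
                            write.ExecuteNonQuery();
                        }
                    }
                }
            }
        }
    }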
I had a very similar situation, and here is what I did:
I made two copies of the same database.
One is optimized for reading and the other is optimized for writing.
In config, I kept two connection strings, ConnectionRead and ConnectionWrite.
Now, in the data layer, when I have a read statement (SELECT...), I switch to the ConnectionRead connection string, and when writing I use the other one.
Since I have to keep both databases in sync, I use SQL replication for that job.
I understand the implementation depends on many aspects, but the approach may help you.
I agree with Tim Schmelter's post - I did something very similar. I actually used a SQLCLR procedure which read the data from an XML column in a SQL table into an in-memory table using .NET (System.Data), then used the .NET System.Xml namespace to deserialize the XML, populated another in-memory table (in the shape of the destination table), and used SqlBulkCopy to populate that destination SQL table with the parsed attributes I needed.
SQL Server is engineered for set-based operations... if ever I'm shredding/iterating (row by row), I tend to use SQLCLR, as .NET is generally better at iterative/data-manipulative processing. An exception to my rule is when working with a little metadata for data-driven processes or cleanup routines, where I may use a cursor.

SQL Server - insert multiple values - what is the right way?

I have a .NET application that works against a SQL Server. This app gets data from a remote third party API, and I need to insert that data to my database in a transaction.
First I delete all existing data from the tables, then I insert each row of data that I get from the API.
I wrote a stored procedure that accepts parameters and does the insert. Then I call that stored procedure in a loop inside a transaction from .NET.
I'm guessing there's a smarter way to do this?
Thanks
If you're doing thousands or maybe even tens of thousands, you can probably do best with table-valued parameters.
If you're doing more than that, then you should probably look at the dedicated SQL Server bulk insert feature. That might not work great transactionally, if I remember correctly.
Either way, TRUNCATE is way faster than DELETE.
What I've done in the past to avoid needing transactions is to create two tables, and use another one to record which is the active one. That way you always have a table with valid data and no write locks.
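For what it's worth, SqlBulkCopy can take part in an external transaction via one of its constructor overloads, so a truncate-and-reload can still be atomic; a sketch, with dbo.ApiData standing in for the real table:

    using System.Data;
    using System.Data.SqlClient;

    public static class ReloadSketch
    {
        public static void Run(DataTable freshRows, string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                using (SqlTransaction transaction = connection.BeginTransaction())
                {
                    // TRUNCATE is faster than DELETE and, inside a SQL Server
                    // transaction, can still be rolled back.
                    using (var truncate = new SqlCommand("TRUNCATE TABLE dbo.ApiData", connection, transaction))
                        truncate.ExecuteNonQuery();

                    using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
                    {
                        bulk.DestinationTableName = "dbo.ApiData";
                        bulk.WriteToServer(freshRows);   // DataTable shaped like dbo.ApiData
                    }

                    transaction.Commit();   // readers never see a half-loaded table
                }
            }
        }
    }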

Reading Huge Data Using SQL Stored Procedure and C# (SQL Server 2005)

We have a requirement to pull a large amount of data from a SQL Server 2005 database for reporting purposes. Our stored procedure returns more than 15,000 rows.
When I call the procedure from the application (MVC 4.0), the request times out! (Maybe because of the data size.)
Is there any best practice for reading such a large amount of data from a SQL Server 2005 database using an MVC 4.0 application?
You're seeing a timeout because your SQL query takes a long time to finish. This is not due to the size of the result (15,000 records is not a huge amount of data), but because the query runs inefficiently.
Maybe you're missing a couple of indices, maybe the stored procedure is written the wrong way - it's impossible to know from here. Try optimizing your query or database (if you have a DBA available, they can help; if not, Management Studio may have some tips for you).
If you can't optimize the query or the database, you're left with increasing the time out, as others suggested.
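If you do end up raising the timeout, the knob lives on SqlCommand (the default is 30 seconds); a sketch, with dbo.GetReportData standing in for the real procedure:

    using System.Data;
    using System.Data.SqlClient;

    public static class TimeoutSketch
    {
        public static void Run(string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("dbo.GetReportData", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.CommandTimeout = 300;   // seconds; 0 means wait indefinitely
                connection.Open();
                using (SqlDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Stream rows into the report model here instead of
                        // buffering the whole result set first.
                    }
                }
            }
        }
    }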
I faced the same problem too, but I was about to render more than 148,000 records. The solution for this is multithreading: have one method which fetches the data from the database, and call that particular method on a separate thread. Your data will be loaded in less than 5 seconds. Multithreading was introduced precisely to handle large amounts of data without degrading performance.
My first question is: why are you not using a dataset and a data source view for the reporting (if it's reporting in SQL Server)?
If it's not Reporting Services and you only want to use C# code, then try writing some helper functions for it.
See here for the timeout option:
http://forums.asp.net/t/1040377.aspx
Also look into optimising the code and the stored procedure.
Here are a couple of tips on how you can optimize this:
Optimize the query – see if you can optimize your query in some way. Add indices to your tables, check your WHERE clauses, and so on. I can't really give you any specific recommendations without seeing the query and knowing the schema. See what others have already suggested on this topic.
Limit the amount of data the stored procedure returns – my guess is that the MVC app doesn't really need all 15k rows, but far fewer. Check out this post: efficient way to implement paging. This will not speed up the query that much, but it will make the app more efficient.
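A paging sketch that works on SQL Server 2005 (ROW_NUMBER rather than the later OFFSET/FETCH); the table, columns, and sort order are illustrative:

    using System.Data;
    using System.Data.SqlClient;

    public static class PagingSketch
    {
        public static DataTable GetPage(string connectionString, int pageNumber, int pageSize)
        {
            const string sql = @"
                WITH Numbered AS (
                    SELECT Id, Name, CreatedOn,
                           ROW_NUMBER() OVER (ORDER BY CreatedOn DESC) AS RowNum
                    FROM dbo.ReportRows
                )
                SELECT Id, Name, CreatedOn
                FROM Numbered
                WHERE RowNum BETWEEN (@Page - 1) * @PageSize + 1 AND @Page * @PageSize;";

            var page = new DataTable();
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(sql, connection))
            {
                command.Parameters.AddWithValue("@Page", pageNumber);
                command.Parameters.AddWithValue("@PageSize", pageSize);
                using (var adapter = new SqlDataAdapter(command))
                    adapter.Fill(page);   // only one page of rows crosses the wire
            }
            return page;
        }
    }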

What's the purpose of Datasets?

I want to understand the purpose of datasets when we can directly communicate with the database using simple SQL statements.
Also, which way is better? Updating the data in the dataset and then transferring it to the database at once, or updating the database directly?
I want to understand the purpose of datasets when we can directly communicate with the database using simple SQL statements.
Why do you have food in your fridge, when you can just go directly to the grocery store every time you want to eat something? Because going to the grocery store every time you want a snack is extremely inconvenient.
The purpose of DataSets is to avoid directly communicating with the database using simple SQL statements. The purpose of a DataSet is to act as a cheap local copy of the data you care about so that you do not have to keep on making expensive high-latency calls to the database. They let you drive to the data store once, pick up everything you're going to need for the next week, and stuff it in the fridge in the kitchen so that it's there when you need it.
Also, which way is better? Updating the data in the dataset and then transferring it to the database at once, or updating the database directly?
You order a dozen different products from a web site. Which way is better: delivering the items one at a time as soon as they become available from their manufacturers, or waiting until they are all available and shipping them all at once? The first way, you get each item as soon as possible; the second way has lower delivery costs. Which way is better? How the heck should we know? That's up to you to decide!
The data update strategy that is better is the one that does the thing in a way that better meets your customer's wants and needs. You haven't told us what your customer's metric for "better" is, so the question cannot be answered. What does your customer want -- the latest stuff as soon as it is available, or a low delivery fee?
DataSets support a disconnected architecture. You can add local data, delete from it, and then, using a SqlDataAdapter, commit everything to the database. You can even load an XML file directly into a DataSet. It really depends on what your requirements are. You can even set up in-memory relations between tables in a DataSet.
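A minimal disconnected round trip with SqlDataAdapter, assuming an illustrative dbo.Contacts table:

    using System.Data;
    using System.Data.SqlClient;

    public static class DataSetSketch
    {
        public static void Run(string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var adapter = new SqlDataAdapter("SELECT Id, FirstName, LastName FROM dbo.Contacts", connection))
            using (var builder = new SqlCommandBuilder(adapter))   // generates the INSERT/UPDATE/DELETE commands
            {
                var data = new DataSet();
                adapter.Fill(data, "Contacts");                    // pull a local, disconnected copy

                // Work offline against the local copy; no connection is held open here.
                DataTable contacts = data.Tables["Contacts"];
                if (contacts.Rows.Count > 0)
                    contacts.Rows[0]["LastName"] = "Updated";

                adapter.Update(data, "Contacts");                  // push all local changes back in one call
            }
        }
    }

SqlCommandBuilder is the lazy option; in practice you would often assign your own stored-procedure-backed insert/update/delete commands to the adapter instead.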
And by the way, using direct SQL queries embedded in your application is a really poor way of designing an application. Your application will be prone to SQL injection. Secondly, if you write queries embedded in the application like that, SQL Server has to build an execution plan every time, whereas stored procedures are compiled and their execution plan is already decided at compile time. SQL Server can also change its plan as the data grows, so you get a performance improvement from this. At least use stored procedures and validate junk input in them; they are inherently resistant to SQL injection.
Stored procedures and DataSets are the way to go.
Edit: if you are on .NET Framework 3.5 or 4.0, you can use a number of ORMs like Entity Framework, NHibernate, or SubSonic. ORMs represent your business model more realistically. You can always use stored procedures alongside an ORM when some feature isn't supported by the ORM.
For example, if you are writing a recursive CTE (common table expression), stored procedures are very helpful; you will run into too many problems if you try to do that through Entity Framework.
This page explains in detail in which cases you should use a DataSet and in which cases you should use direct access to the database.
My usual practice is: if I need to perform a bunch of analytical processes on a large set of data, I fill a DataSet (or a DataTable, depending on the structure). That way it is a model disconnected from the database.
But for DML queries I prefer quick hits directly against the database (preferably through stored procs). I have found this is the most efficient, and with well-tuned queries it is not bad on the database at all.
