I have a .NET application that works against a SQL Server database. The app gets data from a remote third-party API, and I need to insert that data into my database in a transaction.
First I delete all existing data from the tables, then I insert each row of data that I get from the API.
I wrote a stored procedure that accepts parameters and does the insert. Then I call that stored procedure in a loop, inside a transaction, from .NET.
I'm guessing there's a smarter way to do this?
Thanks
If you're doing thousands or maybe even tens of thousands of rows, you can probably do best with table-valued parameters.
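For what it's worth, here's a rough sketch of what the TVP route might look like from .NET. The table type (dbo.ContactType), the procedure (dbo.usp_InsertContacts), and the column names are made up for the example; swap in your own.

// Assumed to already exist on the server (hypothetical names):
//   CREATE TYPE dbo.ContactType AS TABLE (FirstName nvarchar(50), LastName nvarchar(50));
//   CREATE PROCEDURE dbo.usp_InsertContacts @rows dbo.ContactType READONLY
//   AS INSERT INTO dbo.Contacts (FirstName, LastName) SELECT FirstName, LastName FROM @rows;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void SaveAll(string connectionString, IEnumerable<(string First, string Last)> apiRows)
{
    var table = new DataTable();
    table.Columns.Add("FirstName", typeof(string));
    table.Columns.Add("LastName", typeof(string));
    foreach (var row in apiRows)
        table.Rows.Add(row.First, row.Last);

    using var conn = new SqlConnection(connectionString);
    conn.Open();
    using var tx = conn.BeginTransaction();

    // Clear out the old data inside the same transaction.
    using (var delete = new SqlCommand("DELETE FROM dbo.Contacts", conn, tx))
        delete.ExecuteNonQuery();

    // One round trip inserts every row via the table-valued parameter.
    using (var cmd = new SqlCommand("dbo.usp_InsertContacts", conn, tx))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        var p = cmd.Parameters.AddWithValue("@rows", table);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.ContactType";
        cmd.ExecuteNonQuery();
    }
    tx.Commit();
}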
If you're doing more than that, then you should probably look at SQL Server's dedicated bulk insert feature. That might not work great transactionally, if I remember correctly.
Either way, TRUNCATE is way faster than DELETE.
What I've done in the past to avoid needing transactions is to create two copies of the table and use another table to decide which one is active. That way you always have a table with valid data and no write locks.
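Something along these lines; the table names dbo.Data_A / dbo.Data_B and the one-row pointer table dbo.ActiveTable are made up, and BulkLoad stands in for whatever load method you pick (TVPs or bulk copy, as above).

using System.Data.SqlClient;

static void RefreshAndSwap(string connectionString)
{
    using var conn = new SqlConnection(connectionString);
    conn.Open();

    // Find out which copy readers are currently using.
    string active;
    using (var cmd = new SqlCommand("SELECT TableName FROM dbo.ActiveTable", conn))
        active = (string)cmd.ExecuteScalar();
    string inactive = active == "Data_A" ? "Data_B" : "Data_A";

    // Reload the inactive copy; nobody reads it, so no transaction or write locks are needed.
    using (var truncate = new SqlCommand($"TRUNCATE TABLE dbo.{inactive}", conn))
        truncate.ExecuteNonQuery();
    BulkLoad(conn, $"dbo.{inactive}");   // hypothetical helper: your chosen load method

    // Flip the pointer so readers switch to the freshly loaded copy.
    using (var flip = new SqlCommand("UPDATE dbo.ActiveTable SET TableName = @t", conn))
    {
        flip.Parameters.AddWithValue("@t", inactive);
        flip.ExecuteNonQuery();
    }
}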
I have to parse a big XML file and import (insert/update) its data into various tables with foreign key constraints.
So my first thought was: I create a list of SQL insert/update statements and execute them all at once by using SqlCommand.ExecuteNonQuery().
Another method I found was shown by AMissico: Method
where I would execute the SQL commands one by one. No one complained, so I think it's also a viable practice.
Then I found out about SqlBulkCopy, but it seems that I would have to create a DataTable with the data I want to upload, so one SqlBulkCopy per table. For this I could create a DataSet.
I think every option supports SqlTransaction. It's approximately 100 - 20000 records per table.
Which option would you prefer and why?
You say that the XML is already in the database. First, decide whether you want to process it in C# or in T-SQL.
C#: You'll have to send all data back and forth once, but C# is a far better language for complex logic. Depending on what you do it can be orders of magnitude faster.
T-SQL: No need to copy data to the client but you have to live with the capabilities and perf profile of T-SQL.
Depending on your case one might be far faster than the other (not clear which one).
If you want to compute in C#, use a single streaming SELECT to read the data and a single SqlBulkCopy to write it. If your writes are not insert-only, write to a temp table and execute as few DML statements as possible to update the target table(s) (maybe a single MERGE).
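Roughly, the C# route could look like this. I've left out the actual C#-side processing step and made up the table and column names; dbo.Target_Staging is assumed to be a permanent staging table with the same shape as dbo.Target.

using System.Data.SqlClient;

static void Process(string connectionString)
{
    using var readConn = new SqlConnection(connectionString);
    using var writeConn = new SqlConnection(connectionString);
    readConn.Open();
    writeConn.Open();

    // 1. A single streaming SELECT; rows are not buffered in memory.
    using var readCmd = new SqlCommand("SELECT Id, Payload FROM dbo.Source", readConn);
    using var reader = readCmd.ExecuteReader();

    // 2. A single SqlBulkCopy feeding the staging table straight from the reader.
    using (var bulk = new SqlBulkCopy(writeConn) { DestinationTableName = "dbo.Target_Staging" })
    {
        bulk.ColumnMappings.Add("Id", "Id");
        bulk.ColumnMappings.Add("Payload", "Payload");
        bulk.WriteToServer(reader);
    }

    // 3. As few DML statements as possible against the real table -- here, one MERGE.
    const string merge = @"
        MERGE dbo.Target AS t
        USING dbo.Target_Staging AS s ON t.Id = s.Id
        WHEN MATCHED THEN UPDATE SET t.Payload = s.Payload
        WHEN NOT MATCHED THEN INSERT (Id, Payload) VALUES (s.Id, s.Payload);";
    using var mergeCmd = new SqlCommand(merge, writeConn);
    mergeCmd.ExecuteNonQuery();
}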
If you want to stay in T-SQL minimize the number of statements executed. Use set-based logic.
All of this is simplified/shortened. I left out many considerations because they would be too long for a Stack Overflow answer. Be aware that the best strategy depends on many factors. You can ask follow-up questions in the comments.
Don't do it from C# unless you have to; it's a huge overhead, and SQL Server can do it so much faster and better by itself.
Insert to table from XML file using INSERT INTO SELECT
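The idea is to hand the whole XML document to SQL Server and let one INSERT ... SELECT shred it. A rough sketch, with invented element and column names:

using System.Data;
using System.Data.SqlClient;

static void ImportContacts(string connectionString, string xml)
{
    const string sql = @"
        INSERT INTO dbo.Contacts (FirstName, LastName)
        SELECT c.value('(firstname)[1]', 'nvarchar(50)'),
               c.value('(lastname)[1]',  'nvarchar(50)')
        FROM @xml.nodes('/contacts/contact') AS t(c);";

    using var conn = new SqlConnection(connectionString);
    conn.Open();
    using var cmd = new SqlCommand(sql, conn);
    cmd.Parameters.Add("@xml", SqlDbType.Xml).Value = xml;   // the whole file in one parameter
    cmd.ExecuteNonQuery();
}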
I have a very large number of rows (10 million) that I need to select out of a SQL Server table. I will go through and parse each record (they are XML), and then write each one back to the database via a stored procedure.
The question I have is, what's the most efficient way to do this?
The way I am doing it currently is I open two SqlConnections (one for reading, one for writing). The read one uses a SqlDataReader that basically does a SELECT * FROM table, and I loop through the results. After I parse each record I do an ExecuteNonQuery (using parameters) on the second connection.
Are there any suggestions to make this more efficient, or is this just the way to do it?
Thanks
It seems that you are writing rows one by one. That is the slowest possible model. Write bigger batches.
There is no need for two connections when you use MARS. Unfortunately, MARS forces a 14-byte row-versioning tag onto each written row. That might be totally acceptable, or not.
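As a rough sketch of "write bigger batches": buffer the parsed rows client-side and flush them with one SqlBulkCopy call per few thousand rows instead of one ExecuteNonQuery per row. The table, column names, and the Parse helper below are placeholders for your own.

using System.Data;
using System.Data.SqlClient;

static void ParseAndWrite(string connectionString)
{
    const int batchSize = 5000;
    var buffer = new DataTable();
    buffer.Columns.Add("Id", typeof(int));
    buffer.Columns.Add("ParsedValue", typeof(string));

    using var readConn = new SqlConnection(connectionString);
    using var writeConn = new SqlConnection(connectionString);
    readConn.Open();
    writeConn.Open();

    using var bulk = new SqlBulkCopy(writeConn) { DestinationTableName = "dbo.Parsed" };
    using var reader = new SqlCommand("SELECT Id, XmlPayload FROM dbo.Source", readConn).ExecuteReader();

    while (reader.Read())
    {
        buffer.Rows.Add(reader.GetInt32(0), Parse(reader.GetString(1)));  // Parse(): your existing XML logic

        if (buffer.Rows.Count >= batchSize)
        {
            bulk.WriteToServer(buffer);   // one round trip per 5,000 rows
            buffer.Clear();
        }
    }
    if (buffer.Rows.Count > 0)
        bulk.WriteToServer(buffer);
}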
I had a very similar situation, and here's what I did:
I made two copies of the same database.
One is optimized for reading and the other is optimized for writing.
In config, I kept two connection strings: ConnectionRead and ConnectionWrite.
In the data layer, when I have a read statement (SELECT...) I switch to the ConnectionRead connection string, and when writing I use the other one.
Since I have to keep both databases in sync, I use SQL Server replication for that job.
I understand the implementation depends on many aspects, but the approach may help you.
I agree with Tim Schmelter's post - I did something very similar... I actually used a SQLCLR procedure which read the data from an XML column in a SQL table into an in-memory DataTable using .NET (System.Data), then used the .NET System.Xml namespace to deserialize the XML, populated another in-memory table (in the shape of the destination table), and used SqlBulkCopy to populate that destination SQL table with the parsed attributes I needed.
SQL Server is engineered for set-based operations... If I'm ever shredding/iterating (row by row) I tend to use SQLCLR, as .NET is generally better at iterative/data-manipulation processing. An exception to my rule is when working with a little metadata for data-driven processes or cleanup routines, where I may use a cursor.
I have one database server, acting as the main SQL Server, containing a table to hold all data. Other database servers (different instances of SQL Server) come in and out. When they come online, they need to download data from the main table (for a given time period); they then generate their own additional data into the same table in their local SQL Server database, and then want to update the main server with only the new data, using a C# program, through a scheduled service, every so often. Multiple additional servers could be generating data at the same time, although it's not going to be that many.
The main table will always be online. The additional non-main database table is not always online, and should not be an identical copy of main: first it will contain a subset of the main data, then it generates its own additional data into the local table and updates the main table with its changes every so often. There could be a decent number of rows generated and/or downloaded, so an efficient algorithm is needed to copy from the extra database to the main table.
What is the most efficient way to transfer this in C#? SqlBulkCopy doesn't look like it will work, because I can't have duplicate entries in the main server, and it would fail the constraint checks since some entries already exist.
You could do it in the DB or in C#. In either case you must do something like Using FULL JOINs to Compare Datasets. You know that already.
The most important thing is to do it in a transaction. If you have 100k rows, split them into batches of 1,000 rows per transaction, or try to determine what number of rows per transaction works best for you.
Use Dapper. It's really fast.
If you have all your data in C#, use a TVP to pass it to a database stored procedure. In the stored procedure, use MERGE to UPDATE/DELETE/INSERT the data (a sketch follows this answer).
And last: in C#, use Dictionary<TKey, TValue> or something else with O(1) access time.
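To make the batching concrete, here's a hedged sketch: 1,000 rows per transaction, each batch handed to a MERGE stored procedure through a TVP. The names dbo.usp_MergeRows and dbo.RowType, and the (Id, Value) row shape, are invented.

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

static void Sync(string connectionString, IList<(int Id, string Value)> allRows)
{
    const int batchSize = 1000;
    using var conn = new SqlConnection(connectionString);
    conn.Open();

    for (int offset = 0; offset < allRows.Count; offset += batchSize)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Value", typeof(string));
        foreach (var row in allRows.Skip(offset).Take(batchSize))
            table.Rows.Add(row.Id, row.Value);

        using var tx = conn.BeginTransaction();
        using var cmd = new SqlCommand("dbo.usp_MergeRows", conn, tx) { CommandType = CommandType.StoredProcedure };
        var p = cmd.Parameters.AddWithValue("@rows", table);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.RowType";
        cmd.ExecuteNonQuery();
        tx.Commit();   // if anything throws, only this 1,000-row batch is rolled back
    }
}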
SqlBulkCopy is the fastest way to insert data into a table from a C# program. I have used it to copy data between databases, and so far nothing beats it speed-wise. Here is a nice generic example: Generic bulk copy.
I would use an IsProcessed flag in the main server's table and keep track of the main table's primary keys when you download data to the local DB server. Then you should be able to do a delete and update against the main server again.
Here's how I would do it:
Create a stored procedure on the main database which receives a user-defined table type with the same structure as the main table.
It should do something like:
INSERT INTO yourtable SELECT * FROM @tablevar
Or you could use the MERGE statement for insert-or-update functionality.
In code (a Windows service), load all (or part) of the data from the secondary table and send it to the stored procedure as a table-valued parameter.
You could do it in batches of 1,000, and each time a batch is updated you should mark it in the source table / source updater code (see the sketch below).
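Here's a loose sketch of that service-side loop. The names dbo.usp_PushToMain, dbo.MainRowType, dbo.LocalData, and the IsSynced flag are invented; adjust to your schema.

using System.Data;
using System.Data.SqlClient;

static void PushPending(string localConnectionString, string mainConnectionString)
{
    using var localConn = new SqlConnection(localConnectionString);
    using var mainConn = new SqlConnection(mainConnectionString);
    localConn.Open();
    mainConn.Open();

    while (true)
    {
        // Grab the next 1,000 rows that haven't been sent yet.
        var batch = new DataTable();
        using (var da = new SqlDataAdapter(
            "SELECT TOP (1000) Id, Payload FROM dbo.LocalData WHERE IsSynced = 0 ORDER BY Id", localConn))
            da.Fill(batch);
        if (batch.Rows.Count == 0) break;

        // Send the batch to the main server as a table-valued parameter.
        using (var push = new SqlCommand("dbo.usp_PushToMain", mainConn) { CommandType = CommandType.StoredProcedure })
        {
            var p = push.Parameters.AddWithValue("@rows", batch);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.MainRowType";
            push.ExecuteNonQuery();
        }

        // Only after the push succeeded, mark those rows as synced locally.
        var maxId = (int)batch.Rows[batch.Rows.Count - 1]["Id"];
        using var mark = new SqlCommand(
            "UPDATE dbo.LocalData SET IsSynced = 1 WHERE IsSynced = 0 AND Id <= @maxId", localConn);
        mark.Parameters.AddWithValue("@maxId", maxId);
        mark.ExecuteNonQuery();
    }
}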
Can you use linked servers for this? If yes, it will make copying data to and from the main server much easier.
When copying data back to the main server, I'd use IF EXISTS before each INSERT statement to additionally make sure there are no duplicates, and encapsulate all the INSERT statements in a transaction so that if an error occurs the transaction is rolled back.
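A minimal sketch of that, assuming a made-up dbo.MainData(Id, Payload) on the main server:

using System.Collections.Generic;
using System.Data.SqlClient;

static void CopyBatch(string mainConnectionString, IEnumerable<(int Id, string Payload)> batch)
{
    const string guardedInsert = @"
        IF NOT EXISTS (SELECT 1 FROM dbo.MainData WHERE Id = @Id)
            INSERT INTO dbo.MainData (Id, Payload) VALUES (@Id, @Payload);";

    using var conn = new SqlConnection(mainConnectionString);
    conn.Open();
    using var tx = conn.BeginTransaction();
    try
    {
        foreach (var row in batch)
        {
            using var cmd = new SqlCommand(guardedInsert, conn, tx);
            cmd.Parameters.AddWithValue("@Id", row.Id);
            cmd.Parameters.AddWithValue("@Payload", row.Payload);
            cmd.ExecuteNonQuery();
        }
        tx.Commit();
    }
    catch
    {
        tx.Rollback();   // any failure undoes the whole batch
        throw;
    }
}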
I also agree with others on doing this in batches of 1,000 or so records, so that if something goes wrong you can limit the damage.
I have a table tblSource in SourceDB (a SQL Server database) and tblTarget in TargetDB (also SQL Server). Data from tblSource has to be moved to tblTarget. tblSource has a bit field to indicate which data has been moved to tblTarget, so when a row is copied to tblTarget this flag has to be set. I need to do it in C#, but suggestions in T-SQL are welcome too. My question is: what are the possible solutions, and which is the best approach?
MERGE will work for you if you're on SQL Server 2008.
OUTPUT will work for you on SQL Server 2005+.
You need to UPDATE the records to set your bit flag and OUTPUT INSERTED.* into your destination table.
You can consider outputting only selected records if you are planning to insert only selected records into your destination table.
This is good in terms of performance, as this technique requires SQL Server to traverse the records only once.
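Something like this, assuming both tables are reachable from one connection and with made-up column names (Id, Col1, Col2) and a made-up Moved bit flag. Note that the OUTPUT INTO target cannot have enabled triggers or take part in a foreign key constraint.

using System.Data.SqlClient;

static int MoveRows(string connectionString)
{
    const string moveSql = @"
        UPDATE dbo.tblSource
        SET    Moved = 1
        OUTPUT inserted.Id, inserted.Col1, inserted.Col2
        INTO   dbo.tblTarget (Id, Col1, Col2)
        WHERE  Moved = 0;";

    using var conn = new SqlConnection(connectionString);
    conn.Open();
    using var cmd = new SqlCommand(moveSql, conn);
    return cmd.ExecuteNonQuery();   // rows flagged in tblSource and copied to tblTarget in one pass
}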
Check these links for how OUTPUT is used.
http://msdn.microsoft.com/en-us/library/ms177564.aspx &&
http://blog.sqlauthority.com/2007/10/01/sql-server-2005-output-clause-example-and-explanation-with-insert-update-delete/
You could use the TSQL MERGE statement, which would remove the need to keep a flag on each row.
This could be executed from C# if need be, or wrapped in a stored procedure. If they are in separate server instances, you can create a linked server.
I would use the SqlBulkCopy class for this. I've used it in the past and had good luck with it. It's plenty fast and easy to use. There's plenty of sample code at that link to get you started.
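In outline it's something like this (again, the Moved flag and column names are placeholders), streaming the unmoved rows straight from a reader into the target table:

using System.Data.SqlClient;

static void BulkMove(string sourceConnectionString, string targetConnectionString)
{
    using var sourceConn = new SqlConnection(sourceConnectionString);
    using var targetConn = new SqlConnection(targetConnectionString);
    sourceConn.Open();
    targetConn.Open();

    using var readCmd = new SqlCommand(
        "SELECT Id, Col1, Col2 FROM dbo.tblSource WHERE Moved = 0", sourceConn);
    using var reader = readCmd.ExecuteReader();

    using var bulk = new SqlBulkCopy(targetConn)
    {
        DestinationTableName = "dbo.tblTarget",
        BatchSize = 5000
    };
    bulk.WriteToServer(reader);

    // Afterwards, flip the Moved flag on the source rows (not shown).
}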
Is there any reason why a simple INSERT is not an option?
INSERT INTO tblTarget (destcol1, destcol2)
SELECT sourcecol1, sourcecol2 FROM tblSource
Oftentimes I find myself needing to send a user-updated collection of records to a stored procedure. For example, let's say there is a contacts table in the database. On the front end, I display, let's say, 10 contact records for the user to edit. The user makes his changes and hits save.
At that point, I can either call my upsertContact stored procedure 10 times in a loop with the user-modified data, or send XML formatted like <contact><firstname>name</firstname><lastname>lname</lastname></contact> with all 10 together to the stored procedure. I always end up using XML.
Is there any better way to accomplish this? Is the XML method going to break if there is a large number of records, due to size? If so, how do people achieve this kind of functionality?
FYI, it is usually not just a direct table update, so I have not looked into SqlDataSource.
Change: Based on the request, the version so far has been SQL 2005, but we are upgrading to 2008 now. So, any new features are welcome. Thanks.
Update: based on this article and the feedback below, I think table-valued parameters are the best approach to choose. Also, the new MERGE functionality of SQL 2008 is really cool with TVPs.
What version of SQL Server? You can use table-valued parameters in SQL Server 2008+... They are very powerful, even though they are read-only, and they are going to be less hassle than XML and less trouble than converting to an ORM (IMHO). Hit up the following resources:
MSDN : Table-Valued Parameters:
http://msdn.microsoft.com/en-us/library/bb510489%28SQL.100%29.aspx
Erland Sommarskog's Arrays and Lists in SQL Server 2008 / Table-Valued Parameters:
http://www.sommarskog.se/arrays-in-sql-2008.html#TVP_in_TSQL
I would think directly manipulating XML in the database would be more trouble than it is worth; I would suggest instead making each call separately as you suggest: 10 calls, one to save each contact.
There are benefits and drawbacks to that approach; obviously, you're having to create the database connection. However, you could simply queue up a bunch of commands to send over one connection.
The SQL Server XML data type is stored much like VARCHAR(MAX) (up to 2 GB), so it would take a really large changeset to cause it to break.
I have used a similar method in the past when saving XML requests and responses and found no issues with it. Not sure if it's the "best" solution, but "best" is always relative.
It sounds like you could use an Object-Relational Mapping (ORM) solution like NHibernate or Entity Framework. These solutions give you the ability to make changes to objects and have the changes propagated to the database by the ORM provider. This makes them much more flexible than issuing your own SQL statements to the database. They also make optimizations like sending all changes over a single connection in a single transaction.