When I want to insert a huge amount of data into a database, which way is more efficient?
1. Insert the data row by row by calling an insert statement for each row.
2. Create a user-defined table type in my database and write a stored procedure that inserts a DataTable into the database.
And why?
The most efficient would be SQL bulk insert.
To improve performance further, you can run SqlBulkCopy to SQL Server in parallel.
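For reference, here is a minimal SqlBulkCopy sketch; the destination table and column names (dbo.TargetTable, Name, Quantity) are assumptions for the example. Running several copies in parallel also means partitioning the source data yourself; this shows a single copy.

```csharp
// Minimal SqlBulkCopy sketch; table and column names are illustrative.
using System.Data;
using System.Data.SqlClient;

class BulkLoader
{
    static void Load(string connectionString, DataTable rows)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.TargetTable";   // hypothetical target table
                bulkCopy.BatchSize = 10000;                          // tune against your own workload
                bulkCopy.ColumnMappings.Add("Name", "Name");         // map source -> destination columns
                bulkCopy.ColumnMappings.Add("Quantity", "Quantity");
                bulkCopy.WriteToServer(rows);                        // single bulk write
            }
        }
    }
}
```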
This will probably vary based on the specific workload.
We can only tell you here what options are available to you - it is down to you to measure how efficient each option is in your specific case.
Options:
SQLBulkCopy
INSERT statements with a single row
INSERT statements with multiple rows (experiment with different numbers of rows; see the sketch after this list)
Stored procedure (with variations on the above)
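As a rough illustration of the multi-row INSERT option, here is a sketch that batches several rows into one parameterized statement; the table and column names (dbo.TargetTable, Name, Quantity) are assumptions for the example.

```csharp
// Sketch: several rows batched into one parameterized INSERT statement.
using System.Data.SqlClient;
using System.Text;

class MultiRowInsert
{
    static void InsertBatch(SqlConnection connection, (string Name, int Quantity)[] rows)
    {
        var sql = new StringBuilder("INSERT INTO dbo.TargetTable (Name, Quantity) VALUES ");
        using (var command = new SqlCommand { Connection = connection })
        {
            for (int i = 0; i < rows.Length; i++)
            {
                if (i > 0) sql.Append(", ");
                sql.Append($"(@n{i}, @q{i})");
                command.Parameters.AddWithValue($"@n{i}", rows[i].Name);
                command.Parameters.AddWithValue($"@q{i}", rows[i].Quantity);
            }
            command.CommandText = sql.ToString();
            command.ExecuteNonQuery();   // one round trip for the whole batch
        }
    }
}
```

Note that SQL Server caps a single command at 2,100 parameters, so the batch size has to stay below that limit (here, roughly 1,000 rows at two parameters each).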
Option 2 is much, much faster, because anything that involves per-row operations (RBAR) is inherently slow. I did quite a lot of testing with table-valued parameters and found them to be close to the performance of bulk inserts for batch sizes of up to 1,000,000 rows. However, that's purely anecdotal and your mileage may vary.
Also, I would advise against using a DataTable and would instead write something that streams the rows more directly. I blogged about that here.
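A minimal sketch of that streaming idea, assuming a user-defined table type dbo.RowType and a stored procedure dbo.InsertRows (both names are hypothetical): rows are yielded one at a time as SqlDataRecord instances instead of being buffered in a DataTable.

```csharp
// Streaming rows into a table-valued parameter without building a DataTable.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

class TvpStreamer
{
    // Yields one SqlDataRecord per source row; nothing is materialized in memory.
    static IEnumerable<SqlDataRecord> GetRecords(IEnumerable<(string Name, int Quantity)> rows)
    {
        var metaData = new[]
        {
            new SqlMetaData("Name", SqlDbType.NVarChar, 100),
            new SqlMetaData("Quantity", SqlDbType.Int)
        };
        foreach (var row in rows)
        {
            var record = new SqlDataRecord(metaData);
            record.SetString(0, row.Name);
            record.SetInt32(1, row.Quantity);
            yield return record;
        }
    }

    static void Insert(SqlConnection connection, IEnumerable<(string Name, int Quantity)> rows)
    {
        using (var command = new SqlCommand("dbo.InsertRows", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            var parameter = command.Parameters.AddWithValue("@Rows", GetRecords(rows));
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.RowType";   // the user-defined table type
            command.ExecuteNonQuery();
        }
    }
}
```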
Related
Insert query vs. SqlBulkCopy - which one performs best for inserting records from one database into another?
I know SqlBulkCopy is used for a large number of records.
If there are fewer than 10 records, which one is better?
Please share your views.
Since you are asking about fewer than 10 records, I would suggest using a simple insert query.
But if you want to use SqlBulkCopy, then you should first know when to use it.
For background:
BULK INSERT
The BULK INSERT command is the in-process method for bringing data from a text file into SQL Server. Because it runs in-process with Sqlservr.exe, it is a very fast way to load data files into SQL Server.
IMO, in your case, using SqlBulkCopy is overkill.
You can use a user-defined type (UDT) as a table-valued parameter (TVP) passed to a stored procedure if you have fewer than 1,000 rows or so. Above that threshold, the performance gain of SqlBulkCopy begins to outweigh its initial overhead. SqlBulkCopy works best with heap tables (tables without clustered indexes).
I found this article helpful for my own solution - https://www.sentryone.com/blog/sqlbulkcopy-vs-table-valued-parameters-bulk-loading-data-into-sql-server.
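This is the DataTable-based counterpart to the streaming TVP example shown earlier; the table type dbo.RowType and stored procedure dbo.InsertRows are assumed names, and the procedure would do something like an INSERT INTO the target table by SELECTing from the @Rows parameter.

```csharp
// Passing a DataTable as a table-valued parameter to a stored procedure.
using System.Data;
using System.Data.SqlClient;

class TvpCall
{
    static void InsertViaTvp(SqlConnection connection, DataTable rows)
    {
        using (var command = new SqlCommand("dbo.InsertRows", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            var parameter = command.Parameters.AddWithValue("@Rows", rows);
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.RowType";   // the user-defined table type
            command.ExecuteNonQuery();
        }
    }
}
```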
Here's a link to a site with some benchmarks.
I've used SqlBulkCopy extensively in the past and it's very efficient. However, you may need to redesign your tables to be able to use it effectively. For example, when you use SqlBulkCopy you may want to use client side generated identifiers, so you will have to invent some scheme that allows you to generate stable, robust and predictable IDs. This is very different from your typical insert into table with auto generated identity column.
As an alternative to SqlBulkCopy, you can use the method discussed in the link I provided, where you use a combination of UNION ALL and INSERT INTO statements; it has excellent performance as well. Although, as the dataset grows, I think SqlBulkCopy will be the faster option. A hybrid approach is probably warranted here, where you switch to SqlBulkCopy when the record count is high.
I reckon SqlBulkCopy will win out for larger data sets, but you need to understand that a SqlBulkCopy operation can be used to forgo some checks. This will of course speed things up even more, but it also allows you to violate conditions that you have imposed on your database.
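For concreteness, these are the SqlBulkCopy options that control which checks are applied. By default, a bulk copy skips CHECK and foreign-key constraints and does not fire insert triggers; you have to opt back in explicitly. The destination table name is illustrative.

```csharp
// SqlBulkCopy with constraint checking and triggers explicitly re-enabled.
using System.Data;
using System.Data.SqlClient;

class CheckedBulkCopy
{
    static void Load(string connectionString, DataTable rows)
    {
        const SqlBulkCopyOptions options =
            SqlBulkCopyOptions.CheckConstraints |   // enforce CHECK and foreign-key constraints
            SqlBulkCopyOptions.FireTriggers;        // fire INSERT triggers on the destination table

        using (var bulkCopy = new SqlBulkCopy(connectionString, options))
        {
            bulkCopy.DestinationTableName = "dbo.TargetTable";   // hypothetical table
            bulkCopy.WriteToServer(rows);
        }
    }
}
```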
I have to parse a big XML file and import (insert/update) its data into various tables with foreign key constraints.
So my first thought was to create a list of SQL insert/update statements and execute them all at once using SqlCommand.ExecuteNonQuery().
Another method I found was shown by AMissico: Method
where I would execute the SQL commands one by one. No one complained, so I think it's also a viable practice.
Then I found out about SqlBulkCopy, but it seems I would have to create a DataTable with the data I want to upload - so, one SqlBulkCopy per table. For this I could create a DataSet.
I think every option supports SqlTransaction. It's approximately 100 to 20,000 records per table.
Which option would you prefer and why?
You say that the XML is already in the database. First, decide whether you want to process it in C# or in T-SQL.
C#: You'll have to send all data back and forth once, but C# is a far better language for complex logic. Depending on what you do it can be orders of magnitude faster.
T-SQL: No need to copy data to the client but you have to live with the capabilities and perf profile of T-SQL.
Depending on your case one might be far faster than the other (not clear which one).
If you want to compute in C#, use a single streaming SELECT to read the data and a single SqlBulkCopy to write it. If your writes are not insert-only, write to a temp table and execute as few DML statements as possible to update the target table(s) (maybe a single MERGE).
If you want to stay in T-SQL minimize the number of statements executed. Use set-based logic.
All of this is simplified/shortened. I left out many considerations because they would be too long for a Stack Overflow answer. Be aware that the best strategy depends on many factors. You can ask follow-up questions in the comments.
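A minimal sketch of the "stage, then one MERGE" approach described above: one bulk copy into a temp table, then a single set-based statement to apply the changes. The temp table, target table, and column names are assumptions; the MERGE would have to match your real schema.

```csharp
// Bulk copy into a staging temp table, then apply everything with one MERGE.
using System.Data;
using System.Data.SqlClient;

class StageAndMerge
{
    static void Run(string connectionString, DataTable massagedRows)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // 1. Session-scoped staging table.
            new SqlCommand(
                "CREATE TABLE #Stage (Id INT PRIMARY KEY, Name NVARCHAR(100), Quantity INT)",
                connection).ExecuteNonQuery();

            // 2. One bulk copy into the staging table.
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "#Stage";
                bulkCopy.WriteToServer(massagedRows);
            }

            // 3. One set-based statement for inserts and updates.
            new SqlCommand(@"
                MERGE dbo.TargetTable AS t
                USING #Stage AS s ON t.Id = s.Id
                WHEN MATCHED THEN
                    UPDATE SET t.Name = s.Name, t.Quantity = s.Quantity
                WHEN NOT MATCHED THEN
                    INSERT (Id, Name, Quantity) VALUES (s.Id, s.Name, s.Quantity);",
                connection).ExecuteNonQuery();
        }
    }
}
```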
Don't do it from C# unless you have to; it's a huge overhead, and SQL can do it much faster and better by itself.
Insert into the table from the XML file using INSERT INTO ... SELECT.
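A sketch of that INSERT INTO ... SELECT idea, passing the XML to the server as a parameter and shredding it there with nodes()/value(). The XML shape (/rows/row with @id and @name attributes) and the target table are assumptions for the example.

```csharp
// Shred an XML parameter on the server and insert the rows with one statement.
using System.Data;
using System.Data.SqlClient;

class XmlInsert
{
    static void Insert(SqlConnection connection, string xml)
    {
        const string sql = @"
            INSERT INTO dbo.TargetTable (Id, Name)
            SELECT r.value('@id',   'int'),
                   r.value('@name', 'nvarchar(100)')
            FROM @Xml.nodes('/rows/row') AS t(r);";

        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.Add("@Xml", SqlDbType.Xml).Value = xml;
            command.ExecuteNonQuery();
        }
    }
}
```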
I am building a web application using C# .NET with SQL Server 2008 as the back end. The application reads data from Excel and inserts it into a SQL table. For this I have used SqlBulkCopy, which works very well. The SQL table has 50 fields, of which system_error and mannual_error are two. After inserting records into the other 48 columns, I need to re-check all of these records and update those two columns with specific errors, e.g. "Name field has a number", "Qty not specified", etc. For this I currently fetch each row into a DataTable and check each column in a for loop.
This works very well when there are 1,000 to 5,000 records, but it takes a huge amount of time, around 50 minutes, when there are 100,000 records or more.
Initially I used a simple SQL UPDATE query, then I tried a stored procedure, but both take about the same time.
How can I increase the performance of the application? What other approaches are there for updating large amounts of data? Please share suggestions.
I suppose this is why people use MongoDB and NoSQL systems. You can update huge data sets by optimizing your query. Read more here:
http://www.sqlservergeeks.com/blogs/AhmadOsama/personal/450/sql-server-optimizing-update-queries-for-large-data-volumes
Also check: Best practices for inserting/updating large amounts of data in SQL Server 2008
One thing to consider is that iterating over a database table row by row, rather than performing set based update operations would incur a significant performance hit.
If you are in fact performing set-based updates on your data and still have significant performance problems, you should look at the execution plan of your queries so that you can work out where and why they are performing so badly.
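As a sketch of what "set-based" means here, the two error columns from the question can be filled in a single UPDATE instead of a per-row loop. The column names follow the question (system_error, mannual_error); the table name and validation rules are illustrative.

```csharp
// One set-based UPDATE that fills both error columns in a single pass.
using System.Data.SqlClient;

class SetBasedValidation
{
    static void FlagErrors(SqlConnection connection)
    {
        const string sql = @"
            UPDATE dbo.ImportTable
            SET system_error  = CASE WHEN [Name] LIKE '%[0-9]%' THEN 'Name field has a number'
                                     ELSE system_error END,
                mannual_error = CASE WHEN Qty IS NULL THEN 'Qty not specified'
                                     ELSE mannual_error END;";

        new SqlCommand(sql, connection).ExecuteNonQuery();
    }
}
```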
I have two tables where I need to extract data from one, do some modifications to that data and then write it to a different table.
I was wondering what is the most space/time efficient way to do this.
Is it better to read one record, modify it, and write that single record to the other table in a loop, or is it better to read everything, modify it, and then write it all to the other table at once?
I will be writing this using C# and Linq.
The tables have different column headings and structure.
Indeed, the most performant way is to use a stored procedure or something similar (and then, of course, use batch/set-based operations).
If you have to do it in C#, choose the option with the fewest I/O operations, since those are almost always the performance breakers. That usually means: read everything in one go, modify it, and then write it all back in one go, but this depends heavily on the amount of data you are modifying.
The most efficient way would be to do this entirely in the back end. Write a stored procedure (most likely no loops will be required; it should be a matter of INSERT/SELECT) and call that SP from your .NET code.
The best way to do this is with an ETL process or script.
If the modification requires any end-user UI activity before inserting into the table, then C# with LINQ is fine. If the modification is the same for every record, use an ETL process or SQL scripts instead.
A few guidelines:
For fetching data from and inserting into one table, use a stored procedure on the SQL side; that is better.
Using ADO.NET for fetching and inserting records is faster than LINQ.
For the same processing applied to multiple records:
For bulk processing of records, use table variables to fetch and iterate over each record.
A table variable is quite a bit faster than a temporary table, and it also helps when querying the records.
Apply the modification (whether a cursor, any logic-based iteration over records, or edits to column data), then
insert the result into the other table through a bulk insert.
SQL Server is quite proficient at processing records at large scale. I would not suggest moving that business logic into a C#/LINQ-based app; try to run your business logic on the SQL Server side unless the end user needs to edit the records.
I have moved 1.8 million records from existing tables into tables with a new database structure; the move itself completed accurately in 11 minutes, but it took 27 minutes of verification queries to confirm that everything was in the right place.
Hope this helps.
A big question is your amount of data. A .Net client cannot "write the whole thing" in one request. Inserts and updates happen row-by-row. It certainly makes sense to read the data in one request (or in batches if it is too big to process everything in-memory).
But if you have hundreds of thousands or millions of rows, this process will take many minutes regardless. I would therefore re-evaluate your assertion that "the manipulation in the middle requires it [C#]". There are probably ways around this, for example by creating some sort of control table in your database beforehand that you can then use in a query or stored procedure to apply the modifications. The difference in performance makes it worthwhile to be creative in this situation.
I have a complete working mini example here:
http://granadacoder.wordpress.com/2009/01/27/bulk-insert-example-using-an-idatareader-to-strong-dataset-to-sql-server-xml/
but the gist is this
Use an IDataReader to read data from the source database.
Perform some validations and calculations in the C# code.
The massaged data needs to be put into the destination database.
The trouble here is that "row by row" kills performance, but if you do ~everything in one hit, it may be too much.
So what I do is send down X rows at a time (think 1,000 as an example), via XML, to a stored procedure, and bulk insert those X rows in one go.
You may have poison records (or records that don't pass validation). My model is: get as many as will work into the database, then log and deal with the poison records later.
The code will log the XML that did not pass.
Not included in the demo, but a possible enhancement:
If the bulk insert (of 1,000) does not work, perhaps have a subroutine that passes the rows in one by one at that point, and logs the handful that do not work.
The downloadable example is older, but has the skeleton.
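A small sketch of that batch-with-fallback idea: send rows in chunks, and if a chunk fails, retry its rows one by one so only the poison records get logged and skipped. SaveBatch here stands in for whatever bulk mechanism you use (XML to a stored procedure, a TVP, or SqlBulkCopy), and MyRow is a placeholder for the massaged record.

```csharp
// Batch the rows, and fall back to row-by-row only for a failing batch.
using System;
using System.Collections.Generic;
using System.Linq;

class BatchImporter
{
    const int BatchSize = 1000;

    static void Import(IReadOnlyList<MyRow> rows,
                       Action<IEnumerable<MyRow>> saveBatch,
                       Action<MyRow, Exception> logPoison)
    {
        for (int offset = 0; offset < rows.Count; offset += BatchSize)
        {
            var batch = rows.Skip(offset).Take(BatchSize).ToList();
            try
            {
                saveBatch(batch);                 // one round trip for up to 1,000 rows
            }
            catch
            {
                foreach (var row in batch)        // retry row by row for this batch only
                {
                    try { saveBatch(new[] { row }); }
                    catch (Exception ex) { logPoison(row, ex); }
                }
            }
        }
    }
}

class MyRow { /* fields of the massaged record */ }
```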
Well, you can try to update column by column, so that you don't have to loop over every row; your server trips can then be reduced to 2 × the number of columns you have (one trip for fetching the data and one for inserting).
You can get the primary keys while fetching the records, use a WHERE ... IN clause, and insert those values into the other table.
But you would have to elaborate more on your problem to get a satisfactory answer; the approach above will reduce server trips and looping.
Or you can use SqlDataAdapter's InsertCommand:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqldataadapter.insertcommand.aspx
Hope it works.
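A sketch of that SqlDataAdapter route, following the link above: define an InsertCommand once and let the adapter push the added DataTable rows, with UpdateBatchSize cutting down the round trips. Table and column names are assumptions.

```csharp
// SqlDataAdapter with an explicit InsertCommand and batched updates.
using System.Data;
using System.Data.SqlClient;

class AdapterInsert
{
    static void InsertRows(SqlConnection connection, DataTable newRows)
    {
        var insert = new SqlCommand(
            "INSERT INTO dbo.TargetTable (Name, Quantity) VALUES (@Name, @Quantity)", connection);
        insert.Parameters.Add("@Name", SqlDbType.NVarChar, 100, "Name");   // last argument: source column
        insert.Parameters.Add("@Quantity", SqlDbType.Int, 0, "Quantity");
        insert.UpdatedRowSource = UpdateRowSource.None;                    // required when batching

        using (var adapter = new SqlDataAdapter { InsertCommand = insert, UpdateBatchSize = 100 })
        {
            adapter.Update(newRows);   // sends every row in the Added state
        }
    }
}
```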
I am inserting records into the database (100, 1,000, 10,000 and 100,000 rows) using two methods
(the table has no primary key and no index):
using a for loop and inserting rows one by one
using a stored procedure
The times are, of course, better using the stored procedure.
My questions are: 1) if I use an index, will the operation go faster, and 2) is there any other way to do the insertion?
PS: I am using iBATIS as the ORM, if that makes any difference.
Check out SqlBulkCopy.
It's designed for fast insertion of bulk data. I've found it to be fastest when using the TableLock option and setting a BatchSize of around 10,000, but it's best to test the different scenarios with your own data.
You may also find the following useful.
SQLBulkCopy Performance Analysis
No, I suspect that, if you use an index, it will actually go slower. That's because it has to update the index as well as inserting the data.
If you're reasonably certain that the data won't have duplicate keys, add the index after you've inserted all the rows. That way, it is built once rather than being added to and re-balanced on every insert.
That's a function of the DBMS. I know it's true for the one I use frequently (which is not SQL Server).
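A small sketch of deferring the index until after the load, per the advice above; the table and index names are illustrative.

```csharp
// Load the rows first, then build the index once.
using System.Data.SqlClient;

class LoadThenIndex
{
    static void BuildIndexAfterLoad(SqlConnection connection)
    {
        // ...bulk load all rows into dbo.TargetTable first (SqlBulkCopy, batched INSERTs, etc.)...

        // Build the index in one pass instead of maintaining it on every insert.
        var createIndex = new SqlCommand(
            "CREATE UNIQUE CLUSTERED INDEX IX_TargetTable_Id ON dbo.TargetTable (Id)",
            connection)
        { CommandTimeout = 0 };   // no timeout; index builds on large tables can exceed the default 30 seconds
        createIndex.ExecuteNonQuery();
    }
}
```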
I know this is slightly off-topic, but it's a shame you're not using SQL Server 2008, as there has been a massive improvement in this area with the advent of the MERGE statement and user-defined table types (which allow you to pass a 'table' of data in to a stored procedure or statement so you can insert/update many records in one go).
For some more information, have a look at http://www.sql-server-helper.com/sql-server-2008/merge-statement-with-table-valued-parameters.aspx
This was already discussed: Insert data into SQL Server with best performance.