There are many examples of using the SqlBulkCopy class in System.Data.SqlClient, but only in relation to a single file. What I would like to know is how to use it when you have multiple files. I have read that it should only be used once, but how do you achieve that? Can someone give me an example of how to use SqlBulkCopy with multiple files?
If you are a novice, you should probably call SqlBulkCopy in a loop and be done with it.
If you are advanced and have higher performance requirements, pass in a single combined stream that contains the data of all files - for example an IEnumerable<SqlDataRecord> exposed through an IDataReader wrapper. That way you insert one stream of rows. This can be more efficient than many smaller inserts because the query processor can sort all rows and insert them sequentially. There are also minimal-logging considerations; sometimes an empty destination table is required to get it.
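For the loop approach, a minimal sketch might look like this - one connection, one SqlBulkCopy instance, one WriteToServer call per file. The destination table name, the file format and the LoadFileIntoTable helper are placeholders for your own parsing logic:

    // Minimal sketch of the loop approach: one connection, one SqlBulkCopy
    // instance, one WriteToServer call per file. The destination table name,
    // file format and LoadFileIntoTable helper are placeholders.
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    static void BulkLoadFiles(string connectionString, string[] files)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulk = new SqlBulkCopy(connection))
            {
                bulk.DestinationTableName = "dbo.TargetTable"; // placeholder
                bulk.BatchSize = 10000;

                foreach (var file in files)
                {
                    DataTable table = LoadFileIntoTable(file); // your parsing logic
                    bulk.WriteToServer(table);                 // same SqlBulkCopy reused for every file
                }
            }
        }
    }

    // Hypothetical helper: parse one file into a DataTable whose columns
    // line up with the destination table.
    static DataTable LoadFileIntoTable(string path)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        foreach (var line in File.ReadLines(path))
        {
            var parts = line.Split(',');
            table.Rows.Add(int.Parse(parts[0]), parts[1]);
        }
        return table;
    }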
Related
Insert query vs SqlBulkCopy - which one performs better for inserting records from one database into another?
I know SqlBulkCopy is used for a large number of records.
If there are fewer than 10 records, which one is better?
Please share your views.
As you are asking about fewer than 10 records, I would suggest you use a simple insert query.
But if you want to use SqlBulkCopy, you should first know when to use it.
Read For Knowledge
BULK INSERT
The BULK INSERT command is the in-process method for bringing data from a text file into SQL Server. Because it runs in process with Sqlservr.exe, it is a very fast way to load data files into SQL Server.
IMO, in your case, using SqlBulkCopy is way overkill.
You can use a user-defined table type as a table-valued parameter (TVP) passed to a stored procedure if you have fewer than 1,000 rows or so. Beyond that threshold, the performance gain of SqlBulkCopy begins to outweigh its initial overhead. SqlBulkCopy works best with heap tables (tables without clustered indexes).
I found this article helpful for my own solution - https://www.sentryone.com/blog/sqlbulkcopy-vs-table-valued-parameters-bulk-loading-data-into-sql-server.
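For the small-row case, a sketch of the TVP route could look like the following. It assumes a user-defined table type dbo.RecordType and a stored procedure dbo.InsertRecords with a READONLY parameter of that type already exist (both names are placeholders):

    // Sketch of passing a small batch as a table-valued parameter. The table
    // type dbo.RecordType and the procedure dbo.InsertRecords are assumed to
    // exist; the DataTable's columns must match the table type's columns.
    using System.Data;
    using System.Data.SqlClient;

    static void InsertViaTvp(string connectionString, DataTable records)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.InsertRecords", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            var parameter = command.Parameters.AddWithValue("@Records", records);
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.RecordType"; // name of the user-defined table type

            connection.Open();
            command.ExecuteNonQuery();
        }
    }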
Here's a link to a site with some benchmarks.
I've used SqlBulkCopy extensively in the past and it's very efficient. However, you may need to redesign your tables to be able to use it effectively. For example, when you use SqlBulkCopy you may want to use client-side generated identifiers, so you will have to invent some scheme that allows you to generate stable, robust and predictable IDs. This is very different from your typical insert into a table with an auto-generated identity column.
As an alternative to SqlBulkCopy you can use the method discussed in the link I provided, where you use a combination of UNION ALL and INSERT INTO statements; it has excellent performance as well. Although, as the dataset grows, I think SqlBulkCopy will be the faster option. A hybrid approach is probably warranted here, where you switch to SqlBulkCopy when the record count is high.
I reckon SqlBulkCopy will win out for larger data sets, but you need to understand that a SqlBulkCopy operation can be used to forgo some checks (by default it does not check constraints or fire triggers). This of course speeds things up even more, but it also allows you to violate constraints that you have imposed on your database.
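If you do go hybrid, a rough sketch can be as simple as switching on the row count. The 1,000-row threshold, the table name and the columns here are assumptions you would want to benchmark against your own workload, and the small-batch helper stands in for whatever per-row or UNION ALL method you prefer:

    // Rough sketch of a hybrid strategy: small batches go through ordinary
    // parameterized inserts, large batches go through SqlBulkCopy.
    // Threshold, table and column names are assumptions.
    using System.Data;
    using System.Data.SqlClient;

    static void SaveRows(string connectionString, DataTable rows)
    {
        const int bulkCopyThreshold = 1000; // arbitrary cut-over point; measure it

        if (rows.Rows.Count < bulkCopyThreshold)
        {
            InsertWithStatements(connectionString, rows);
            return;
        }

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulk = new SqlBulkCopy(connection))
            {
                bulk.DestinationTableName = "dbo.TargetTable"; // placeholder
                bulk.WriteToServer(rows);
            }
        }
    }

    // Placeholder for the small-batch path (e.g. the UNION ALL / INSERT INTO
    // method), shown here as plain parameterized inserts for brevity.
    static void InsertWithStatements(string connectionString, DataTable rows)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            foreach (DataRow row in rows.Rows)
            {
                using (var command = new SqlCommand(
                    "INSERT INTO dbo.TargetTable (Id, Name) VALUES (@Id, @Name)", connection))
                {
                    command.Parameters.AddWithValue("@Id", row["Id"]);
                    command.Parameters.AddWithValue("@Name", row["Name"]);
                    command.ExecuteNonQuery();
                }
            }
        }
    }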
I have to parse a big XML file and import (insert/update) its data into various tables with foreign key constraints.
So my first thought was: I create a list of SQL insert/update statements and execute them all at once by using SqlCommand.ExecuteNonQuery().
Another method I found was shown by AMissico: Method
where I would execute the SQL commands one by one. No one complained, so I think it's also a viable practice.
Then I found out about SqlBulkCopy, but it seems that I would have to create a DataTable with the data I want to upload. So, one SqlBulkCopy call for every table; for this I could create a DataSet.
I think every option supports SqlTransaction. It's approximately 100 - 20000 records per table.
Which option would you prefer and why?
You say that the XML is already in the database. First, decide whether you want to process it in C# or in T-SQL.
C#: You'll have to send all data back and forth once, but C# is a far better language for complex logic. Depending on what you do it can be orders of magnitude faster.
T-SQL: No need to copy data to the client but you have to live with the capabilities and perf profile of T-SQL.
Depending on your case one might be far faster than the other (not clear which one).
If you want to compute in C#, use a single streaming SELECT to read the data and a single SqlBulkCopy to write it. If your writes are not insert-only, write to a temp table and execute as few DML statements as possible to update the target table(s) (maybe a single MERGE).
If you want to stay in T-SQL, minimize the number of statements executed. Use set-based logic.
All of this is simplified/shortened. I left out many considerations because they would be too long for a Stack Overflow answer. Be aware that the best strategy depends on many factors. You can ask follow-up questions in the comments.
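To make the C# route concrete, here is a simplified sketch: one SELECT streams the source rows, the transformed rows go into a staging table via SqlBulkCopy, and a single MERGE applies them to the target. The staging/target table names, columns and the Transform method are placeholders, and for brevity the rows are buffered in a DataTable rather than streamed through an IDataReader:

    // Sketch of the C# path: read once, transform in C#, bulk copy into a
    // staging table, then apply one set-based MERGE to the real table.
    using System.Data;
    using System.Data.SqlClient;

    static void ProcessInCSharp(string connectionString)
    {
        var staged = new DataTable();
        staged.Columns.Add("Id", typeof(int));
        staged.Columns.Add("Value", typeof(string));

        using (var source = new SqlConnection(connectionString))
        {
            source.Open();
            using (var query = new SqlCommand("SELECT Id, Xml FROM dbo.Source", source))
            using (var reader = query.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Transform is your complex C# logic (e.g. XML parsing).
                    staged.Rows.Add(reader.GetInt32(0), Transform(reader.GetString(1)));
                }
            }
        }

        using (var target = new SqlConnection(connectionString))
        {
            target.Open();
            using (var bulk = new SqlBulkCopy(target))
            {
                bulk.DestinationTableName = "dbo.Staging"; // placeholder staging table
                bulk.WriteToServer(staged);
            }

            // One set-based statement to update the target from the staging table.
            var merge = @"MERGE dbo.Target AS t
                          USING dbo.Staging AS s ON t.Id = s.Id
                          WHEN MATCHED THEN UPDATE SET t.Value = s.Value
                          WHEN NOT MATCHED THEN INSERT (Id, Value) VALUES (s.Id, s.Value);";
            using (var command = new SqlCommand(merge, target))
            {
                command.ExecuteNonQuery();
            }
        }
    }

    static string Transform(string xml) => xml; // placeholder for the real parsing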
Don't do it from C# unless you have to; it adds a lot of overhead, and SQL Server can do it much faster and better by itself.
Insert to table from XML file using INSERT INTO SELECT
I have a very large number of rows (10 million) which I need to select out of a SQL Server table. I will go through each record and parse it (they are XML), and then write each one back to the database via a stored procedure.
The question I have is, what's the most efficient way to do this?
The way I am doing it currently is that I open two SqlConnections (one for reading, one for writing). The read connection uses a SqlDataReader that basically does a SELECT * FROM the table, and I loop through the result set. After I parse each record, I do an ExecuteNonQuery (using parameters) on the second connection.
Are there any suggestions to make this more efficient, or is this just the way to do it?
Thanks
It seems that you are writing rows one-by-one. That is the slowest possible model. Write bigger batches.
There is no need for two connections when you use MARS. Unfortunately, MARS forces a 14-byte row-versioning tag onto each written row. That might be totally acceptable, or not.
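To illustrate the batching point: instead of one ExecuteNonQuery per parsed record, you can buffer the parsed rows and flush them with SqlBulkCopy every few thousand rows. This sketch writes into a table directly rather than through your stored procedure; the table names, batch size and ParseXml helper are assumptions:

    // Sketch of batching the writes: buffer parsed rows in a DataTable and
    // flush them with SqlBulkCopy every batchSize rows instead of writing
    // one row per round trip.
    using System.Data;
    using System.Data.SqlClient;

    static void CopyInBatches(string readConnectionString, string writeConnectionString)
    {
        const int batchSize = 5000;
        var buffer = new DataTable();
        buffer.Columns.Add("Id", typeof(int));
        buffer.Columns.Add("ParsedValue", typeof(string));

        using (var read = new SqlConnection(readConnectionString))
        using (var write = new SqlConnection(writeConnectionString))
        {
            read.Open();
            write.Open();

            using (var bulk = new SqlBulkCopy(write) { DestinationTableName = "dbo.Parsed" })
            using (var command = new SqlCommand("SELECT Id, XmlData FROM dbo.Source", read))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    buffer.Rows.Add(reader.GetInt32(0), ParseXml(reader.GetString(1)));
                    if (buffer.Rows.Count >= batchSize)
                    {
                        bulk.WriteToServer(buffer);
                        buffer.Clear();
                    }
                }
                if (buffer.Rows.Count > 0)
                {
                    bulk.WriteToServer(buffer); // flush the last partial batch
                }
            }
        }
    }

    static string ParseXml(string xml) => xml; // placeholder for the real parsing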
I had a very similar situation, and here is what I did:
I made two copies of the same database.
One is optimized for reading and the other is optimized for writing.
In the config, I kept two connection strings, ConnectionRead and ConnectionWrite.
Now in the data layer, when I have a read statement (SELECT ...), I switch to the ConnectionRead connection string, and when writing I use the other one.
Since I have to keep both databases in sync, I am using SQL replication for this job.
I understand that the implementation depends on many aspects, but this approach may help you.
I agree with Tim Schmelter's post - I did something very similar. I actually used a SQLCLR procedure which read the data from an XML column in a SQL table into an in-memory table using .NET (System.Data), then used the .NET System.Xml namespace to deserialize the XML, populated another in-memory table (in the shape of the destination table), and used SqlBulkCopy to populate that destination SQL table with the parsed attributes I needed.
SQL Server is engineered for set-based operations. If ever I'm shredding/iterating (row by row) I tend to use SQLCLR, as .NET is generally better at iterative/data-manipulation processing. An exception to my rule is when working with a little metadata for data-driven processes or cleanup routines, where I may use a cursor.
I am trying to perform insert/update/delete operations on a SQL table based on an input CSV file which is loaded into a data table from a web application. Currently I am using a DataSet to do the CRUD operations, but I would like to know if there would be any advantages to using LINQ over a DataSet. I am assuming the code will be shorter and more strongly typed, but I am not sure whether I need to switch to LINQ. Any input is appreciated.
Edit
It is not a bulk operation; the CSV might contain 200 records at most.
I used the LumenWorks CSV reader, which is very fast. It has its own API for extracting data, using the IDataReader interface. There is a brief example on codeplex.com. I use it for all my CSV projects, as it's very fast at reading CSV data. I was surprised at how fast it actually was.
If you go with a reader like this, you're essentially working through a data reader API, and as such would probably find it easiest to work with a DataTable (you could create a DataTable matching the result set and copy the data over column by column).
A lot of updates can be slower with LINQ, depending on whether you are using Entity Framework or something else, and what flavor you are using. A DataTable, IMHO would probably be faster. I had issues with LINQ and change tracking with a lot of objects (if you are using attached entities, not using POCOs). I've had pretty good performance taking a CSV file from Lumenworks and copying it to a DataTable.
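For reference, reading the CSV into a DataTable with the LumenWorks reader can be as short as this sketch (the file path and the assumption that the first line holds headers are placeholders):

    // Sketch of loading a small CSV into a DataTable via the LumenWorks
    // CsvReader, which implements IDataReader, so DataTable.Load can
    // consume it directly.
    using System.Data;
    using System.IO;
    using LumenWorks.Framework.IO.Csv;

    static DataTable LoadCsvIntoTable(string csvPath)
    {
        using (var csv = new CsvReader(new StreamReader(csvPath), true /* first line is headers */))
        {
            var table = new DataTable();
            table.Load(csv); // pulls every CSV row into the table, columns taken from the headers
            return table;
        }
    }

From there the DataTable can be compared against the existing rows, fed to a DataAdapter, or handed to SqlBulkCopy for the actual insert/update/delete work.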
What's the most efficient method to load large volumes of data from CSV (3 million+ rows) into a database?
The data needs to be formatted (e.g. the name column needs to be split into first name and last name, etc.).
I need to do this as efficiently as possible, i.e. there are time constraints.
I am siding with the option of reading, transforming and loading the data row by row using a C# application. Is this ideal? If not, what are my options? Should I use multithreading?
You will be I/O bound, so multithreading will not necessarily make it run any faster.
Last time I did this, it was about a dozen lines of C#. In one thread it ran the hard disk as fast as it could read data from the platters. I read one line at a time from the source file.
If you're not keen on writing it yourself, you could try the FileHelpers libraries. You might also want to have a look at Sébastien Lorion's work. His CSV reader is written specifically to deal with performance issues.
You could use the csvreader to quickly read the CSV.
Assuming you're using SQL Server, you can use csvreader's CachedCsvReader to read the data into a DataTable, which you can then use with SqlBulkCopy to load into SQL Server.
I would agree with your solution. Reading the file one line at a time should avoid the overhead of reading the whole file into memory at once, which should make the application run quickly and efficiently, primarily taking time to read from the file (which is relatively quick) and parse the lines. The one note of caution I have for you is to watch out if you have embedded newlines in your CSV. I don't know if the specific CSV format you're using might actually output newlines between quotes in the data, but that could confuse this algorithm, of course.
Also, I would suggest batching the insert statements (including many insert statements in one string) before sending them to the database, provided this doesn't present problems in retrieving generated key values that you need to use for subsequent foreign keys (hopefully you don't need to retrieve any generated key values). Keep in mind that SQL Server (if that's what you're using) can only handle 2,100 parameters per batch, so limit your batch size to account for that. I would also recommend using parameterized T-SQL statements to perform the inserts. I suspect more time will be spent inserting records than reading them from the file.
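A sketch of that batching idea, sized to stay under the parameter limit (the table and column names are placeholders, and this variant assumes you don't need generated keys back):

    // Sketch of batching many parameterized INSERTs into one command, sized so
    // the total parameter count stays under SQL Server's 2,100-parameter limit.
    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Text;

    static void InsertBatched(string connectionString, DataTable rows)
    {
        const int parametersPerRow = 2;           // @f{i} and @l{i}
        const int maxParametersPerCommand = 2000; // stay safely under 2,100
        int rowsPerBatch = maxParametersPerCommand / parametersPerRow;

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            for (int offset = 0; offset < rows.Rows.Count; offset += rowsPerBatch)
            {
                int end = Math.Min(offset + rowsPerBatch, rows.Rows.Count);
                using (var command = new SqlCommand { Connection = connection })
                {
                    var sql = new StringBuilder();
                    for (int i = offset; i < end; i++)
                    {
                        sql.AppendLine($"INSERT INTO dbo.People (FirstName, LastName) VALUES (@f{i}, @l{i});");
                        command.Parameters.AddWithValue($"@f{i}", rows.Rows[i]["FirstName"]);
                        command.Parameters.AddWithValue($"@l{i}", rows.Rows[i]["LastName"]);
                    }
                    command.CommandText = sql.ToString();
                    command.ExecuteNonQuery(); // one round trip per batch of rows
                }
            }
        }
    }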
You don't state which database you're using, but given the language you mention is C# I'm going to assume SQL Server.
If the data can't be imported using BCP (which it sounds like it can't if it needs significant processing) then SSIS is likely to be the next fastest option. It's not the nicest development platform in the world, but it is extremely fast. Certainly faster than any application you could write yourself in any reasonable timeframe.
BCP is pretty quick so I'd use that for loading the data. For string manipulation I'd go with a CLR function on SQL once the data is there. Multi-threading won't help in this scenario except to add complexity and hurt performance.
Read the contents of the CSV file line by line into an in-memory DataTable. You can manipulate the data (e.g. split the first name and last name) as the DataTable is being populated.
Once the CSV data has been loaded into memory, use SqlBulkCopy to send the data to the database.
See http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.writetoserver.aspx for the documentation.
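A minimal sketch of that approach follows. The CSV layout ("Name,Email"), the destination table and the naive comma split are simplifying assumptions; a real file may need a proper CSV parser for quoted fields:

    // Sketch: read the CSV line by line, split the name column while
    // populating the DataTable, then hand the table to SqlBulkCopy.
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    static void LoadFormattedCsv(string connectionString, string csvPath)
    {
        var table = new DataTable();
        table.Columns.Add("FirstName", typeof(string));
        table.Columns.Add("LastName", typeof(string));
        table.Columns.Add("Email", typeof(string));

        foreach (var line in File.ReadLines(csvPath))
        {
            var fields = line.Split(',');         // naive split; no quoted-field handling
            var nameParts = fields[0].Split(' '); // "First Last" -> two columns
            table.Rows.Add(nameParts[0],
                           nameParts.Length > 1 ? nameParts[1] : string.Empty,
                           fields[1]);
        }

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulk = new SqlBulkCopy(connection))
            {
                bulk.DestinationTableName = "dbo.Contacts"; // placeholder
                bulk.BatchSize = 10000;
                bulk.WriteToServer(table);
            }
        }
    }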
If you really want to do it in C#, create & populate a DataTable, truncate the target db table, then use System.Data.SqlClient.SqlBulkCopy.WriteToServer(DataTable dt).