I am inserting records with a bulk query from a source table into a destination table. The source table has 10,000 records. Suppose the source table has columns sid int, sname varchar(60) and the destination table has sid int, sname varchar(30).
I was not able to insert all the records successfully because of the length mismatch between the source sname and the destination sname, but only a few rows actually have the problem.
My question: is there any way to do the bulk insert into the destination table so that the correct records are inserted and the incorrect records are skipped?
I am using C# 3.5. This is the code I am using:
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(constring, SqlBulkCopyOptions.UseInternalTransaction))
{
    try
    {
        // Edit mapping.
        bulkCopy.ColumnMappings.Add("AddressID", "AddressID");
        bulkCopy.ColumnMappings.Add("AddressLine1", "AddressLine1");
        bulkCopy.ColumnMappings.Add("City", "City");
        // Specify destination table.
        bulkCopy.DestinationTableName = "[Address]";
        bulkCopy.BatchSize = 100;
        bulkCopy.NotifyAfter = 100;
        // bulkCopy.SqlRowsCopied += new SqlRowsCopiedEventHandler(bulkCopy_SqlRowsCopied);
        bulkCopy.WriteToServer(table);
    }
    catch (Exception ex)
    {
        richTextBox1.AppendText("\n\n" + ex.Message);
    }
}
Thanks.
Are you using BULK INSERT through SQL Server, or the System.Data.SqlClient.SqlBulkCopy class?
If you are using the BULK INSERT T-SQL statement, then you can just set MAXERRORS to whatever threshold you find acceptable (the default is 10, which is why your BULK INSERT is probably cancelling out):
BULK INSERT YourDb.YourSchema.YourTable
FROM 'c:\YourFile.txt'
WITH
(
FIELDTERMINATOR =' |',
ROWTERMINATOR =' |\n',
MAXERRORS = 1000 -- or however many rows you think are exceeding the 30 char limit
)
EDIT
After seeing that you are using the SqlBulkCopy class, I think your best bet is to modify the DataTable before calling the SqlBulkCopy.WriteToServer() method:
foreach (DataRow row in table.Rows)
{
    if (row["YourColumnThatExceeds30Chars"].ToString().Length > 30)
        row["YourColumnThatExceeds30Chars"] =
            row["YourColumnThatExceeds30Chars"].ToString().Substring(0, 30);
}
I have two DB tables with the same columns, but their data types are different (e.g. the "Check" column is of type integer in table 1, but varchar in table 2). I am trying to copy the data from one table to the other using SqlBulkCopy. I have code like:
using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
{
    cmdSQLT = new SqlCommand("SELECT " + ColumnsNames + " FROM [transfer].[" + SelectedScheme + ".OldTable]", conn);
    cmdSQLT.CommandTimeout = 1200;
    reader = cmdSQLT.ExecuteReader();
    sbc.ColumnMappings.Add("CHECK", "CHECK");
    sbc.DestinationTableName = "[" + SelectedScheme + "_Newtable]";
    sbc.BulkCopyTimeout = 1200;
    sbc.WriteToServer(reader);
}
I am getting an error saying
The locale id '0' of the source column 'CHECK' and the locale id
'1033' of the destination column 'CHECK' do not match.
This is happening because of the data type differences between the tables. How can I do the data type conversion in the previous code?
Your help is much appreciated!
You can do the conversion in the source SELECT using a CAST() statement.
However, if the target connection has access to the source database, then instead of doing a SqlBulkCopy, a single "INSERT INTO <target> SELECT ... FROM <source>" statement would be a much more effective solution.
i.e.:
var colNames = ColumnsNames.Split(',').ToArray();
for (int i = 0; i < colNames.Length; i++)
{
    if (colNames[i].ToUpper() == "CHECK")
    {
        colNames[i] = "cast([CHECK] as varchar(10)) as [CHECK]";
    }
}
ColumnsNames = string.Join(",", colNames);
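With ColumnsNames rewritten like that, and assuming the target connection really can see the source table, the whole copy can then be a single INSERT ... SELECT run on conn instead of a SqlBulkCopy. A rough sketch reusing the names from your snippet (it also assumes the column lists of both tables line up):
string insertSelect =
    "INSERT INTO [" + SelectedScheme + "_Newtable] " +
    "SELECT " + ColumnsNames +
    " FROM [transfer].[" + SelectedScheme + ".OldTable]";

using (var cmd = new SqlCommand(insertSelect, conn))
{
    cmd.CommandTimeout = 1200;
    cmd.ExecuteNonQuery();   // the data never leaves the server
}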
Might not be what you expect but for simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements. For example:
INSERT tbl_A (col, col2)
SELECT col, col2
FROM tbl_B
WHERE NOT EXISTS (SELECT col FROM tbl_A A2 WHERE A2.col = tbl_B.col);
If it's more about column/table sync, the MERGE keyword could be what you are looking for.
I have a process that takes a lists and inserts it into a database using SQL bulk copy because of how particularly large this list can be. It works fine, checks constraints and all which is perfect. The problem is, if I have 10,000 records and one of those records has an error, I still want to commit the other 9,999. Is there a way to do this other than manually checking each constraint before SQL bulk copy or inserting one at a time? Seems tedious and slow which kind of defeats the point. Thanks.
var copy = new SqlBulkCopy(ConfigurationManager.ConnectionStrings["constr"].ConnectionString, SqlBulkCopyOptions.CheckConstraints)
{
    DestinationTableName = obj.TableName
};
var table = new DataTable();
copy.WriteToServer(table);
Without setting the batch size to 1 (which would defeat the purpose of the bulk copy) or pre-checking the data before the copy, the normal way around this issue is to copy into a temporary table with the same schema as your target table but with no constraints, remove the rows that would violate the constraints on insert, and then do a normal insert from the temp table into your live table.
const string _createTableString = "Create table #temp (/* SNIP */)";
const string _insertTableString = @"
    declare @sql nvarchar(2000)
    set @sql = N'INSERT INTO ' + QUOTENAME(@tableName) + N' SELECT * from #temp'
    exec sp_executesql @sql";

using (var connection = new SqlConnection(ConfigurationManager.ConnectionStrings["constr"].ConnectionString))
{
    connection.Open();
    using (var command = new SqlCommand(_createTableString, connection))
    {
        command.ExecuteNonQuery();
    }
    using (var copy = new SqlBulkCopy(connection))
    {
        copy.DestinationTableName = "#temp";
        copy.WriteToServer(table);
    }
    using (var command = new SqlCommand(_insertTableString, connection))
    {
        command.Parameters.AddWithValue("@tableName", obj.TableName);
        command.ExecuteNonQuery();
    }
}
Note the use of QUOTENAME to make sure that no SQL injections can sneak in via the name of the table passed in to obj.TableName.
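The "remove the rows that would violate the constraints" step is not shown above because it depends entirely on your constraints; it belongs between the WriteToServer call and the final INSERT, on the same connection (so the #temp table is still in scope). A sketch, assuming a hypothetical check constraint that a Quantity column must be non-negative:
// Hypothetical clean-up: delete the rows that would fail the live table's
// constraints before they are moved out of #temp.
const string _deleteInvalidString = "DELETE FROM #temp WHERE [Quantity] < 0";

using (var command = new SqlCommand(_deleteInvalidString, connection))
{
    command.ExecuteNonQuery();
}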
I have written a program in .NET that should copy tables' data from one server to another. However, I am getting an error:
cannot access destination table "mytable"
Despite googling and looking everywhere, I cannot find a solution to the error I am getting.
Some posts mentions permissions and I have done the following:
GRANT SELECT, UPDATE, DELETE, INSERT TO bulkadmin
but still no success.
Am I missing the obvious?
Help is greatly appreciated.
EDIT
I bulk copy 3 databases with 1000 tables to one "target" database.
I have simplified the code that I use and also tested it, with no luck. The intention is to do this in parallel, but I want to get it working with a simple table first.
private void TestBulkCopy(string sourceServer, string sourceDatabase, List<string> sourceTables)
{
    string connectionStringSource = ConfigurationManager.ConnectionStrings["TestDB"].ConnectionString;
    string connectionStringTarget = ConfigurationManager.ConnectionStrings["TestDB"].ConnectionString;
    string sqlGetDataFromSource = string.Format("SELECT * FROM {0}", "testTable");

    using (var sourceConnection = new SqlConnection(connectionStringSource))
    {
        sourceConnection.Open();
        using (var cmdSource = new SqlCommand(sqlGetDataFromSource, sourceConnection))
        using (SqlDataReader readerSource = cmdSource.ExecuteReader())
        {
            using (var sqlTargetConnection = new SqlConnection(connectionStringTarget))
            {
                sqlTargetConnection.Open();
                using (var bulkCopy = new SqlBulkCopy(sqlTargetConnection, SqlBulkCopyOptions.TableLock, null))
                {
                    bulkCopy.DestinationTableName = "testTable";
                    bulkCopy.SqlRowsCopied += OnSqlRowsCopied;
                    bulkCopy.BatchSize = 2600;
                    bulkCopy.NotifyAfter = 50;
                    bulkCopy.BulkCopyTimeout = 60;
                    bulkCopy.WriteToServer(readerSource);
                }
            }
        }
    }
}
Write the schema before the table name.
Change
bulkCopy.DestinationTableName = "testTable";
to
bulkCopy.DestinationTableName = "dbo.testTable";
I think your destination table has a column defined as an auto-increment identity, so SqlBulkCopy cannot copy values into that column. You have to switch identity insert on for the destination table while inserting explicit values, using code like this:
BEGIN
    SET IDENTITY_INSERT [Table2] ON;
    INSERT INTO [Table2] (.....)
    VALUES (@id, @id_project, ....);
    SET IDENTITY_INSERT [Table2] OFF;
END
Or edit the definition of the destination table and remove the auto-increment identity on that column.
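Alternatively, if the source rows already carry the ID values you want to keep, SqlBulkCopy can preserve them itself through the KeepIdentity option rather than T-SQL IDENTITY_INSERT. A minimal sketch reusing the names from the question's code:
// KeepIdentity tells the server to keep the identity values coming from the
// source reader instead of generating new ones.
using (var bulkCopy = new SqlBulkCopy(sqlTargetConnection,
    SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.TableLock, null))
{
    bulkCopy.DestinationTableName = "dbo.testTable";
    bulkCopy.WriteToServer(readerSource);
}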
The table name used by the WriteToServer method of SqlBulkCopy must be surrounded with [ ] brackets.
I am using MySQL 5.6.9-rc with .NET connector 6.5.4 to insert data into a table that has two fields (integer ID, integer Data; ID is the primary key). It is very slow (about 35 seconds) to insert 2000 rows into the table (with not much difference between UpdateBatchSize = 1 and UpdateBatchSize = 500). I also tried connector 6.6.4, and the problem remains.
However, it is fast with MySQL 5.4.3 and connector 6.20: it took just one second to insert 2000 rows into the table if UpdateBatchSize is set to 500 (it is also slow if UpdateBatchSize = 1). I then tested it with MySQL 5.4.3 and connector 6.5.4 or 6.6.4, and it is slow!
I wrote the code to insert data as below, and ran it with MySQL 5.6.9 and connector 6.5.4, on Windows XP with VS2008.
public void Test()
{
    MySqlConnection conn = new MySqlConnection("Database=myDatabase;Server=localhost;User Id=root;Password=myPassword");
    string sql = "Select * from myTable";
    MySqlDataAdapter adapter = new MySqlDataAdapter(sql, conn);
    adapter.UpdateBatchSize = 500;
    MySqlCommandBuilder commandBuilder = new MySqlCommandBuilder(adapter);
    DataTable table = new DataTable();
    adapter.Fill(table); // it is an empty table
    Add2000RowsToTable(table);

    int count = adapter.Update(table); // it took 35 seconds to complete

    adapter.Dispose();
    conn.Close();
}

private void Add2000RowsToTable(DataTable table)
{
    DataRow row;
    for (int i = 0; i < 2000; i++)
    {
        row = table.NewRow();
        row[0] = i;
        row[1] = i;
        table.Rows.Add(row);
    }
}
It seems to me that MySqlDataAdapter.UpdateBatchSize is not functional with connectors 6.5.4 and 6.6.4. Is something wrong with my code?
Thanks in advance.
Although this takes a bit of initial coding (and doesn't solve your issue directly), I highly recommend using LOAD DATA INFILE for anything longer than maybe 100 records.
In fact, in my own system, I've coded it once and I reuse it for all my inserts and updates, whether bulk or not.
LOAD DATA INFILE is much more scalable: I've used it to insert 100 million rows without noticeable performance degradation.
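For reference, a rough sketch of how that can be driven from C# with the MySQL connector (the CSV path and the two-column myTable layout from the question are placeholders, and LOCAL INFILE may need to be enabled on both the server and the connection):
// Requires System.IO, System.Linq, System.Data and MySql.Data.MySqlClient.
// Dump the DataTable to a CSV file, then load it in a single statement.
string csvPath = Path.Combine(Path.GetTempPath(), "mytable.csv");
File.WriteAllLines(csvPath,
    table.Rows.Cast<DataRow>().Select(r => r[0] + "," + r[1]).ToArray());

using (var conn = new MySqlConnection("Database=myDatabase;Server=localhost;User Id=root;Password=myPassword"))
{
    conn.Open();
    string load = "LOAD DATA LOCAL INFILE '" + csvPath.Replace('\\', '/') + "' " +
                  "INTO TABLE myTable FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'";
    using (var cmd = new MySqlCommand(load, conn))
    {
        cmd.ExecuteNonQuery();
    }
}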
Did some more testing...
Checking the logs on the MySQL server: with connector 6.20 it generates the SQL statements for the batched update like this:
insert into mytable (id, data) values (0,0),(1,1),(2,2) ...
But with connectors 6.5.4 and 6.6.4 the statements are different:
insert into mytable (id, data) values (0,0); insert into mytable (id, data) values (1,1); insert into mytable (id, data) values (2,2); ...
I think this is why the batched update is so slow with connector 6.5.4/6.6.4. Is it a bug in 6.5.4/6.6.4, or should the server (I tried MySQL 5.5.29/5.6.9) handle the statements more intelligently?
The solution I went with was to write the bulk row data as CSV into a file and then import using the following command:
LOAD DATA LOCAL INFILE 'C:/path/to/file.csv'
INTO TABLE <tablename>
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(<field1>,<field2>);
It only took about 4 seconds for 30,000 rows. It's similar to the above recommendation, but allows you to use a file local to your system and not the server.
I am using C# with SqlBulkCopy. I have a problem, though: I need to do a mass insert into one table and then another mass insert into another table.
These 2 have a PK/FK relationship.
Table A
Field1 - PK, auto-incrementing (easy to do with SqlBulkCopy, as it's straightforward)
Table B
Field1 - PK/FK - this field makes the relationship and is also the PK of this table. It is not auto-incrementing and needs to have the same id as the corresponding row in Table A.
So these tables have a one-to-one relationship, but I am unsure how to get back all the PK ids that the mass insert generated, since I need them for Table B.
Edit
Could I do something like this?
SELECT *
FROM Product
WHERE NOT EXISTS (SELECT * FROM ProductReview WHERE Product.ProductId = ProductReview.ProductId AND Product.Qty IS NULL AND Product.ProductName != 'Ipad')
This should find all the rows that were just inserted with the SQL bulk copy. I am not sure how to take the results from this and then do a mass insert with them from a SP.
The only problem I can see with this is that if a user is adding records one at a time and this statement runs at the same time, it could try to insert a row twice into the ProductReview table.
So say one user is using the manual way and another user is doing the mass way at about the same time.
Manual way:
1. User submits data
2. A Linq to SQL Product object is made, filled with the data, and submitted.
3. This object now contains the ProductId.
4. Another Linq to SQL object is made for the ProductReview table and is inserted (the ProductId from step 3 is sent along).
Mass way:
1. User grabs data from a user sharing the data.
2. All Product rows from the sharing user are grabbed.
3. SQL Bulk copy insert on Product rows happens.
4. My SP selects all rows that only exist in the Product table and meet some other conditions.
5. Mass insert happens with those rows.
So what happens if step 3 (manual way) happens at the same time as step 4 (mass way)? I think it would try to insert the same row twice, causing a primary key constraint exception.
In that scenario, I would use SqlBulkCopy to insert into a staging table (i.e. one that looks like the data I want to import, but isn't part of the main transactional tables), and then at the DB do an INSERT/SELECT to move the data into the first real table.
Now I have two choices depending on the server version; I could do a second INSERT/SELECT to the second real table, or I could use the INSERT/OUTPUT clause to do the second insert, using the identity rows from the first table.
For example:
-- dummy schema
CREATE TABLE TMP (data varchar(max))
CREATE TABLE [Table1] (id int not null identity(1,1), data varchar(max))
CREATE TABLE [Table2] (id int not null identity(1,1), id1 int not null, data varchar(max))
-- imagine this is the SqlBulkCopy
INSERT TMP VALUES('abc')
INSERT TMP VALUES('def')
INSERT TMP VALUES('ghi')
-- now push into the real tables
INSERT [Table1]
OUTPUT INSERTED.id, INSERTED.data INTO [Table2](id1,data)
SELECT data FROM TMP
If your app allows it, you could add another column in which you store an identifier of the bulk insert (a guid for example). You would set this id explicitly.
Then after the bulk insert, you just select the rows that have that identifier.
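A sketch of that idea, assuming a BatchId uniqueidentifier column has been added to the Product table and that table is the DataTable being bulk copied (add explicit ColumnMappings if the destination has more columns than the DataTable):
// Tag every row of this bulk insert with the same batch identifier.
Guid batchId = Guid.NewGuid();
table.Columns.Add("BatchId", typeof(Guid));
foreach (DataRow row in table.Rows)
    row["BatchId"] = batchId;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var copy = new SqlBulkCopy(conn))
    {
        copy.DestinationTableName = "Product";
        copy.WriteToServer(table);
    }

    // Read back the identity values that were generated for just this batch.
    using (var cmd = new SqlCommand("SELECT ProductId FROM Product WHERE BatchId = @batchId", conn))
    {
        cmd.Parameters.AddWithValue("@batchId", batchId);
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                int productId = reader.GetInt32(0);
                // map productId back to the object it belongs to
            }
        }
    }
}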
I had the same issue where I had to get back ids of the rows inserted with SqlBulkCopy.
My ID column was an identity column.
Solution:
I have inserted 500+ rows with bulk copy, and then selected them back with the following query:
SELECT TOP (@InsertedRowCount) *
FROM MyTable
ORDER BY ID DESC
This query returns the rows I have just inserted, with their ids. In my case I had another unique column, so I selected that column and the id, then mapped them with an IDictionary like so:
IDictionary<string, int> mymap = new Dictionary<string, int>();
mymap[Name] = ID;
Hope this helps.
My approach is similar to what RiceRiceBaby described, except one important thing to add is that the call to retrieve Max(Id) needs to be a part of a transaction, along with the call to SqlBulkCopy.WriteToServer. Otherwise, someone else may insert during your transaction and this would make your Id's incorrect. Here is my code:
public static void BulkInsert<T>(List<ColumnInfo> columnInfo, List<T> data, string destinationTableName, SqlConnection conn = null, string idColumn = "Id")
{
    NLogger logger = new NLogger();
    var closeConn = false;

    if (conn == null)
    {
        closeConn = true;
        conn = new SqlConnection(_connectionString);
        conn.Open();
    }

    SqlTransaction tran = conn.BeginTransaction(System.Data.IsolationLevel.Serializable);
    try
    {
        var options = SqlBulkCopyOptions.KeepIdentity;
        var sbc = new SqlBulkCopy(conn, options, tran);

        var command = new SqlCommand($"SELECT Max({idColumn}) from {destinationTableName};", conn, tran);
        var id = command.ExecuteScalar();

        int maxId = 0;
        if (id != null && id != DBNull.Value)
        {
            maxId = Convert.ToInt32(id);
        }

        data.ForEach(d =>
        {
            maxId++;
            d.GetType().GetProperty(idColumn).SetValue(d, maxId);
        });

        var dt = ConvertToDataTable(columnInfo, data);

        sbc.DestinationTableName = destinationTableName;
        foreach (System.Data.DataColumn dc in dt.Columns)
        {
            sbc.ColumnMappings.Add(dc.ColumnName, dc.ColumnName);
        }

        sbc.WriteToServer(dt);
        tran.Commit();

        if (closeConn)
        {
            conn.Close();
            conn = null;
        }
    }
    catch (Exception ex)
    {
        tran.Rollback();
        logger.Write(LogLevel.Error, $@"An error occurred while performing a bulk
            insert into table {destinationTableName}. The entire
            transaction has been rolled back.
            {ex.ToString()}");
        throw;
    }
}
Depending on your needs and how much control you have of the tables, you may want to consider using UNIQUEIDENTIFIERs (Guids) instead of your IDENTITY primary keys. This moves key management outside of the database and into your application. There are some serious tradeoffs to this approach, so it may not meet your needs. But it may be worth considering. If you know for sure that you'll be pumping a lot of data into your tables via bulk-insert, it is often really handy to have those keys managed in your object model rather than your application relying on the database to give you back the data.
You could also take a hybrid approach with staging tables as suggested before. Get the data into those tables using GUIDs for the relationships, and then via SQL statements you could get the integer foreign keys in order and pump data into your production tables.
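For illustration, a sketch of what application-managed keys look like (Product and ProductReview here are hypothetical POCOs with Guid ProductId properties, backed by uniqueidentifier columns):
// Keys are generated in the object model, so child rows can reference their
// parent rows before anything is written to the database.
foreach (var product in products)
{
    product.ProductId = Guid.NewGuid();
    foreach (var review in product.Reviews)
        review.ProductId = product.ProductId;   // FK is known up front
}
// Each list can now be converted to a DataTable and bulk copied independently,
// with no need to read identity values back after the insert.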
I would:
Turn on identity insert on the table
Grab the Id of the last row of the table
Loop with for (int i = Id; i < datatable.Rows.Count + 1; i++)
In the loop, assign the Id column of your datatable row to i + 1.
Run your SQL bulk insert with your keep identity turned on.
Turn identity insert back off
I think that's the safest way to get your ids from a SQL bulk insert, because it will prevent mismatched ids that could be caused by the application executing on another thread.
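A sketch of those steps (the table and column names are assumptions, and as the earlier answer points out, you would want this inside a transaction or behind a table lock if other writers can be active):
// Steps 1-2: grab the current highest Id.
int lastId;
using (var cmd = new SqlCommand("SELECT ISNULL(MAX(Id), 0) FROM Product", conn))
    lastId = Convert.ToInt32(cmd.ExecuteScalar());

// Steps 3-4: assign explicit Ids to the rows about to be inserted.
foreach (DataRow row in table.Rows)
    row["Id"] = ++lastId;

// Steps 5-6: bulk insert keeping the Ids we just assigned
// (KeepIdentity is the SqlBulkCopy equivalent of turning identity insert on).
using (var copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, null))
{
    copy.DestinationTableName = "Product";
    copy.WriteToServer(table);
}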
Disclaimer: I'm the owner of the project C# Bulk Operations
The library overcomes SqlBulkCopy limitations and adds flexible features, like outputting the inserted identity values.
Under the hood it does exactly what the accepted answer does, but it is much easier to use.
var bulk = new BulkOperation(connection);
// Output Identity
bulk.ColumnMappings.Add("ProductID", ColumnMappingDirectionType.Output);
// ... Column Mappings...
bulk.BulkInsert(dt);