We are importing a CSV file with CsvReader and then using SqlBulkCopy to insert that data into SQL Server. This code works for us and is very simple, but we are wondering if there is a faster method (some of our files have 100,000 rows) that would also not get too complex?
SqlConnection conn = new SqlConnection(connectionString);
conn.Open();
SqlTransaction transaction = conn.BeginTransaction();
try
{
    using (TextReader reader = File.OpenText(sourceFileLocation))
    {
        CsvReader csv = new CsvReader(reader, true);
        SqlBulkCopy copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, transaction);
        copy.DestinationTableName = reportType.ToString();
        copy.WriteToServer(csv);
        transaction.Commit();
    }
}
catch (Exception ex)
{
    transaction.Rollback();
    success = false;
    SendFileImportErrorEmail(Path.GetFileName(sourceFileLocation), ex.Message);
}
finally
{
    conn.Close();
}
Instead of building your own tool to do this, have a look at the SQL Server Import and Export Wizard / SSIS. You can target flat files and SQL Server databases directly. The resulting .dtsx package can also be run from the command line or as a job through the SQL Server Agent.
The reason I suggest it is that the wizard is optimized for parallelism and works really well on large flat files.
You should consider using a Table-Valued Parameter (TVP), which is based on a User-Defined Table Type (UDTT). This ability was introduced in SQL Server 2008 and allows you to define a strongly-typed structure that can be used to stream data into SQL Server (if done properly). An advantage of this approach over using SqlBulkCopy is that you can do more than a simple INSERT into a table; you can do any logic that you want (validate / upsert / etc) since the data arrives in the form of a Table Variable. You can deal with all of the import logic in a single Stored Procedure that can easily use local temporary tables if any of the data needs to be staged first. This makes it rather easy to isolate the process such that you can run multiple instances at the same time as long as you have a way to logically separate the rows being imported.
I posted a detailed answer on this topic here on S.O. a while ago, including example code and links to other info:
How can I insert 10 million records in the shortest time possible?
There is even a link to a related answer of mine that shows another variation on that theme. I have a third answer somewhere that shows a batched approach if you have millions of rows, which you don't, but as soon as I find that I will add the link here.
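To make that concrete, here is a minimal sketch of passing a DataTable as a TVP; the table type dbo.ImportRowType, its columns, and the stored procedure dbo.ImportRows are hypothetical names used only for illustration.
// Assumed (hypothetical) one-time setup on the server:
//   CREATE TYPE dbo.ImportRowType AS TABLE (Device VARCHAR(50), LogDate DATETIME, Value INT);
//   CREATE PROCEDURE dbo.ImportRows (@Rows dbo.ImportRowType READONLY) AS
//     INSERT INTO dbo.TargetTable (Device, LogDate, Value)
//     SELECT Device, LogDate, Value FROM @Rows;
using System.Data;
using System.Data.SqlClient;

static void ImportViaTvp(string connectionString, DataTable rows)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.ImportRows", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;

        // The DataTable's columns must match the user-defined table type.
        SqlParameter p = cmd.Parameters.AddWithValue("@Rows", rows);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.ImportRowType";

        conn.Open();
        cmd.ExecuteNonQuery();
    }
}
For true streaming (without buffering the whole file in a DataTable), the same parameter can instead be fed from an IEnumerable<SqlDataRecord>.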
I have a lot of data in several tables that I am pulling from and combining into one view. I need a daily job, written in C#, to pull all of this data and then insert it into a separate database/table running on a different server. The data consists of some 150+ columns once combined, and I don't want to call reader.Read(), reader.GetString(), etc. for every column and then combine it all into a string to insert again. Is there a way to just pass the results of an SQL query to an insert in a simple and compact way?
private static void GetPrimaryData(string query)
{
    using (MySqlConnection connection = new MySqlConnection(_awsOptionsDBConnectionString))
    {
        connection.Open();
        using (MySqlCommand command = new MySqlCommand(query, connection))
        {
            using (MySqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine(reader.GetInt32("tsid"));
                }
            }
        }
        connection.Close();
    }
}
Ideally I'd just replace the Console.WriteLine(reader... part of code with some sort of insert where I pass the reader or the entire result of the reader query in.
Whether the data comes directly from a database, or from a user, one still needs to send the data to a database in the structure the database defines.
Your question is asking for a magical converter which would work with any type of data and send it to any database. There is no such tool or library in C#/.Net.
One option is a cloud solution that has already been developed, such as Azure Data Factory, Informatica, etc.
Otherwise, write the mappings/translations as needed in C# and use an ORM to send the data to the database.
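As a rough illustration of that "map it yourself" route, here is a minimal sketch using the Dapper micro-ORM; the UsageRow class, column names, and table names are placeholders standing in for your 150+ columns.
// Hypothetical sketch: read from the source (MySQL), map each row to a POCO by
// column name, and insert into the destination (SQL Server) with Dapper.
using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;
using MySql.Data.MySqlClient;

class UsageRow
{
    public int Tsid { get; set; }
    public string Device { get; set; }
    // ... remaining columns mapped as properties with matching names
}

static void CopyRows(string sourceCs, string destCs, string selectQuery)
{
    using (var source = new MySqlConnection(sourceCs))
    using (var dest = new SqlConnection(destCs))
    {
        // Dapper maps each result column onto the property with the same name.
        IEnumerable<UsageRow> rows = source.Query<UsageRow>(selectQuery);

        // Passing a sequence executes the INSERT once per item.
        dest.Execute(
            "INSERT INTO dbo.UsageArchive (Tsid, Device) VALUES (@Tsid, @Device)",
            rows);
    }
}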
I've inherited an application with a lot of ADO work in it, but the insert/update helper method that was written returns void. We've also been experiencing a lot of issues with data updates/inserts not actually happening. My goal is to update all of them to check the rows affected and act accordingly, but in the meantime, while tracking down what may be causing the issue, I wanted to log the SQL statements that are run against the server and the number of rows affected by each statement.
This is the statement I'm attempting:
SqlCommand com = new SqlCommand(String.Format("'INSERT INTO
SqlUpdateInsertHistory(Statement, AffectedRows) VALUES (''{0}'', {1});'",
statement.Replace("'", "''"), rows), con);
but it seems to constantly break somewhere in the SQL that is being passed in (in some cases on single quotes, but I imagine there are other characters that could cause it as well).
Is there a safe way to prep a statement string to be inserted?
I just can't rightly propose a solution to this question without totally modifying what you're doing. You're currently wide open to SQL Injection. Even if this is a local application, practice how you want to play.
using (SqlCommand com = new SqlCommand("INSERT INTO SqlUpdateInsertHistory(Statement, AffectedRows) VALUES (@Statement, @AffectedRows)", con))
{
    com.Parameters.AddWithValue("@Statement", statement);
    com.Parameters.AddWithValue("@AffectedRows", rows);
    com.ExecuteNonQuery();
}
Have you tried SQL Server Profiler? It's already been written and logs queries, etc.
Someone else tried this and got a lot of decent answers here.
I have a CSV file with 3 million lines and want to store it in a database using C#. Each line of the CSV file looks like "device;date;value".
Should I write it into an array or directly into a System.Data.DataTable? And what is the fastest way to store this DataTable in a database (SQL Server, for example)?
I tried to store the lines using 3 million INSERT INTO statements but it was too slow :)
thanks
You can load the data into a DataTable and then use SqlBulkCopy to copy the data to the table in SQL Server.
The SqlBulkCopy class can be used to write data only to SQL Server tables. However, the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded to a DataTable instance or read with an IDataReader instance.
I'd guess BCP would be pretty fast. Once you have the data in a DataTable you can try
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(yourConnectionString))
{
    bulkCopy.DestinationTableName = "TargetTable";
    bulkCopy.WriteToServer(dataTable);
}
I think the best way is to open a StreamReader and build the rows line by line: use ReadLine in a while loop and Split each line to get the individual fields.
Sending 3 million individual INSERT statements is bordering on crazy slow!
Buffer the work with transactions: read in, for example, 200-1000 lines at a time (the smaller each row, the more you can read per batch), then commit those inserts to the database before moving on to the next batch.
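A minimal sketch of that batching idea, assuming a table dbo.Measurements(Device, Date, Value) and a semicolon-delimited file; the table name, column types, and batch size are illustrative.
using System;
using System.Data.SqlClient;
using System.IO;

static void ImportInBatches(string connectionString, string path, int batchSize = 500)
{
    using (var conn = new SqlConnection(connectionString))
    using (var reader = new StreamReader(path))
    {
        conn.Open();
        SqlTransaction tx = conn.BeginTransaction();
        int count = 0;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split(';'); // device;date;value

            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.Measurements (Device, Date, Value) VALUES (@d, @t, @v)",
                conn, tx))
            {
                cmd.Parameters.AddWithValue("@d", parts[0]);
                cmd.Parameters.AddWithValue("@t", DateTime.Parse(parts[1]));
                cmd.Parameters.AddWithValue("@v", decimal.Parse(parts[2]));
                cmd.ExecuteNonQuery();
            }

            // Commit every batchSize rows so the durable log flush happens once per batch.
            if (++count % batchSize == 0)
            {
                tx.Commit();
                tx = conn.BeginTransaction();
            }
        }

        tx.Commit(); // commit the final partial batch
    }
}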
I have some code that, at the end of the program's life, uploads the entire contents of 6 different lists into a database. The problem is, they're parallel lists with about 14,000 items in each, and I have to run an INSERT query for each item. This takes a long time; is there a faster way to do this? Here's a sample of the relevant code:
public void uploadContent()
{
    var cs = Properties.Settings.Default.Database;
    SqlConnection dataConnection = new SqlConnection(cs);
    dataConnection.Open();
    for (int i = 0; i < urlList.Count; i++)
    {
        SqlCommand dataCommand = new SqlCommand(Properties.Settings.Default.CommandString, dataConnection);
        try
        {
            dataCommand.Parameters.AddWithValue("@user", userList[i]);
            dataCommand.Parameters.AddWithValue("@computer", computerList[i]);
            dataCommand.Parameters.AddWithValue("@date", timestampList[i]);
            dataCommand.Parameters.AddWithValue("@itemName", domainList[i]);
            dataCommand.Parameters.AddWithValue("@itemDetails", urlList[i]);
            dataCommand.Parameters.AddWithValue("@timesUsed", hitsList[i]);
            dataCommand.ExecuteNonQuery();
        }
        catch (Exception e)
        {
            using (StreamWriter sw = File.AppendText("errorLog.log"))
            {
                sw.WriteLine(e);
            }
        }
    }
    dataConnection.Close();
}
Here is the command string the code is pulling from the config file:
CommandString:
INSERT dbo.InternetUsage VALUES (@user, @computer, @date, @itemName, @itemDetails, @timesUsed)
As mentioned in @alerya's answer, doing the following will help (explanation added here):
1) Create the command and its parameters outside of the for loop
Since the same command is used each time, it doesn't make sense to re-create it on every iteration. In addition to allocating a new object (which takes time), the command must also be verified each time it is created (does the table exist, etc.), which introduces a lot of overhead.
2) Put the inserts within a transaction
Putting all of the inserts within a single transaction speeds things up because, by default, a command that is not within a transaction is treated as its own transaction. Every time you insert something, the database server must then verify that what it just inserted has actually been saved (usually to a hard disk, which is limited by the speed of the disk). When multiple INSERTs are within one transaction, that check only needs to be performed once.
The downside to this approach, based on the code you've already shown, is that one bad INSERT will spoil the bunch. Whether or not this is acceptable depends on your specific requirements. A sketch combining both points is shown below.
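A minimal sketch of both points together, reusing the names from the question; the parameter types and sizes are assumptions and would need to match your actual columns.
// Assumes using System.Data; and using System.Data.SqlClient;
using (SqlConnection dataConnection = new SqlConnection(cs))
{
    dataConnection.Open();
    using (SqlTransaction tx = dataConnection.BeginTransaction())
    using (SqlCommand dataCommand = new SqlCommand(Properties.Settings.Default.CommandString, dataConnection, tx))
    {
        // Create the parameters once; only their values change inside the loop.
        dataCommand.Parameters.Add("@user", SqlDbType.NVarChar, 100);
        dataCommand.Parameters.Add("@computer", SqlDbType.NVarChar, 100);
        dataCommand.Parameters.Add("@date", SqlDbType.DateTime);
        dataCommand.Parameters.Add("@itemName", SqlDbType.NVarChar, 255);
        dataCommand.Parameters.Add("@itemDetails", SqlDbType.NVarChar, 2000);
        dataCommand.Parameters.Add("@timesUsed", SqlDbType.Int);

        for (int i = 0; i < urlList.Count; i++)
        {
            dataCommand.Parameters["@user"].Value = userList[i];
            dataCommand.Parameters["@computer"].Value = computerList[i];
            dataCommand.Parameters["@date"].Value = timestampList[i];
            dataCommand.Parameters["@itemName"].Value = domainList[i];
            dataCommand.Parameters["@itemDetails"].Value = urlList[i];
            dataCommand.Parameters["@timesUsed"].Value = hitsList[i];
            dataCommand.ExecuteNonQuery();
        }

        // One commit (and therefore one durable log flush) for the whole batch.
        tx.Commit();
    }
}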
Aside
Another thing you really should be doing (though this won't speed things up in the short term) is properly using the IDisposable interface. This means either calling .Dispose() on all IDisposable objects (SqlConnection, SqlCommand), or, ideally, wrapping them in using() blocks:
using (SqlConnection dataConnection = new SqlConnection(cs))
{
    // Code goes here
}
This will prevent memory leaks from these spots, which will become a problem quickly if your loops get too large.
Create the command and parameters outside of the for (int i = 0; i < urlList.Count; i++) loop.
Also, create the inserts within a transaction.
If possible, create a stored procedure and pass the parameters as a DataTable.
Sending INSERT commands one by one to a database will really make the whole process slow, because of the round trips to the database server. If you're worried about performance, you should consider using a bulk insert strategy. You could:
Generate a flat file with all your information, in the format that BULK INSERT understands.
Use the BULK INSERT command to import that file to your database (http://msdn.microsoft.com/en-us/library/ms188365(v=sql.90).aspx).
P.S. I am guessing that when you say SQL you mean MS SQL Server.
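If you want to kick off the BULK INSERT from C#, a minimal sketch might look like the following; the table name, file path, and delimiters are placeholders, and keep in mind that the path is resolved by the SQL Server service, not by the client machine.
using System.Data.SqlClient;

static void BulkInsertFromFile(string connectionString)
{
    // Placeholder table name, path, and delimiters; BULK INSERT reads the file
    // on the SQL Server machine, so the service account must be able to reach it.
    const string sql = @"
        BULK INSERT dbo.InternetUsage
        FROM 'C:\imports\usage.csv'
        WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\n');";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        cmd.CommandTimeout = 0; // large files can exceed the 30-second default
        cmd.ExecuteNonQuery();
    }
}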
Why don't you run your uploadContent() method from a separate thread?
That way you don't need to worry about how long the query takes to execute.
I am working on a feature that exports some tables (~50) to a disk file and imports the file back into the database. Export is quite easy: serialize the DataSet to a file stream. But when importing, the table structure needs to be determined dynamically. What I am doing now:
foreach table in dataset
    compare the table schemas in the db and in the imported dataset
    define a batch command
    foreach row in table
        construct a single INSERT SqlCommand and add it to the batch command
    execute the batch insert command
This is very inefficient, and I also have problems converting the data types in the dataset's DataTables to the database's column types. So I want to know: is there a better method for doing this?
Edit:
In fact, import and export are two functions (buttons) in the program. On the UI there is a grid that lists many tables; what I need to implement is exporting the selected tables' data to a disk file and importing that data back into the database later.
Why not use SQL Server's native Backup and Restore functionality? You can do incremental Restores on the data, and it's by far the fastest way to export and then import data again.
There are a lot of very advanced options to take into account some fringe cases, but at its heart, it's two commands: Backup Database and Restore Database.
backup database mydb to disk = 'c:\my\path\to\backup.bak'
restore database mydb from disk = 'c:\my\path\to\backup.bak'
When doing this against TB-sized databases, it takes about 45 minutes to an hour in my experience. Much faster than trying to go through every row!
I'm guessing you are using SQL Server? If so, I would:
a) make sure the table names are showing up in the export
b) look into SqlBulkCopy. That will allow you to push an entire table in, so you can loop through the DataTables and bulk copy each one in.
using (SqlBulkCopy copy = new SqlBulkCopy(MySQLExpConn))
{
    copy.ColumnMappings.Add(0, 0);
    copy.ColumnMappings.Add(1, 1);
    copy.ColumnMappings.Add(2, 2);
    copy.ColumnMappings.Add(3, 3);
    copy.ColumnMappings.Add(4, 4);
    copy.ColumnMappings.Add(5, 5);
    copy.ColumnMappings.Add(6, 6);
    copy.DestinationTableName = ds.Tables[i].TableName;
    copy.WriteToServer(ds.Tables[i]);
}
You can use XML serialization, but you will need a good ORM tool like NHibernate to help you with it. XML serialization will maintain the data types and will work flawlessly.
You can read an entire table and serialize all of its values into an XML file, then read the XML file back into a list of objects and store them in the database. With a good ORM tool you will not need to write any SQL, and I think this can work across different database servers as well.
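For the file round trip itself, ADO.NET's built-in DataSet XML support (rather than a full ORM) is already enough to preserve both schema and data types; a minimal sketch, with the file path as a placeholder:
using System.Data;

// Export: write the selected tables, including their schema, to one XML file.
static void ExportToXml(DataSet selectedTables, string path)
{
    selectedTables.WriteXml(path, XmlWriteMode.WriteSchema);
}

// Import: read the file back; the inline schema restores table and column types.
static DataSet ImportFromXml(string path)
{
    var ds = new DataSet();
    ds.ReadXml(path, XmlReadMode.ReadSchema);
    return ds;
}
The tables read back this way can then be pushed into the database with SqlBulkCopy, as shown in the previous answer.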
I finally chose SqlCommandBuilder to build the insert commands automatically.
See
SqlCommandBuilder Class
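For reference, a minimal sketch of that approach; it assumes the destination table has the same name and columns as the imported DataTable, which is a placeholder assumption. The SqlDataAdapter selects the destination schema, SqlCommandBuilder derives the INSERT command, and Update pushes the imported rows.
using System.Data;
using System.Data.SqlClient;

static void InsertWithCommandBuilder(string connectionString, DataTable importedTable)
{
    using (var conn = new SqlConnection(connectionString))
    using (var adapter = new SqlDataAdapter(
        "SELECT * FROM " + importedTable.TableName, conn)) // placeholder: destination table name matches the imported table
    using (var builder = new SqlCommandBuilder(adapter))
    {
        adapter.InsertCommand = builder.GetInsertCommand();

        // Rows must be in the Added state for Update() to run the generated INSERT.
        foreach (DataRow row in importedTable.Rows)
        {
            if (row.RowState == DataRowState.Unchanged)
            {
                row.SetAdded();
            }
        }

        adapter.Update(importedTable);
    }
}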