Fastest way to store data in a database using C#

I have a CSV file with 3 million lines and want to store it in a database using C#. Each line of the CSV file looks like "device;date;value".
Should I read it into an array or directly into a System.Data.DataTable? And what is the fastest way to store this DataTable in a database (SQL Server, for example)?
I tried storing the lines using 3 million INSERT INTO statements, but it was too slow :)
Thanks

You can load the data into a DataTable and then use SqlBulkCopy to copy the data to the table in SQL Server.
The SqlBulkCopy class can be used to write data only to SQL Server tables. However, the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded to a DataTable instance or read with an IDataReader instance.

I'd guess BCP would be pretty fast. Once you have the data in a DataTable you can try:
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(yourConnectionString))
{
    bulkCopy.DestinationTableName = "TargetTable";
    bulkCopy.WriteToServer(dataTable);
}

I think the best way is to open a StreamReader and build the rows line by line: call ReadLine in a while loop and Split each line to get its parts.
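A minimal sketch of that approach, combined with the SqlBulkCopy suggestion above; the connection string, column types, and the Measurements destination table are assumptions:
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class CsvLoader
{
    static void Main()
    {
        // Build the DataTable schema to match the "device;date;value" layout.
        var table = new DataTable();
        table.Columns.Add("Device", typeof(string));
        table.Columns.Add("Date", typeof(DateTime));
        table.Columns.Add("Value", typeof(double));

        using (var reader = new StreamReader("data.csv"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var parts = line.Split(';');
                table.Rows.Add(parts[0], DateTime.Parse(parts[1]), double.Parse(parts[2]));
            }
        }

        using (var bulkCopy = new SqlBulkCopy("your connection string"))
        {
            bulkCopy.DestinationTableName = "Measurements"; // assumed table name
            bulkCopy.WriteToServer(table);
        }
    }
}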

Sending 3 million individual insert statements is bordering on crazy slow!
Buffer it by using transactions: read in, for example, 200-1000 lines at a time (the smaller your rows, the more you can read in at once), then commit the inserts for that batch before reading the next one.
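A rough sketch of that batching pattern, assuming the same hypothetical Measurements table and a parameterised INSERT:
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

// Commit the inserts in batches inside explicit transactions instead of
// paying for one auto-committed INSERT per CSV line.
using (var connection = new SqlConnection("your connection string"))
{
    connection.Open();

    var transaction = connection.BeginTransaction();
    var command = new SqlCommand(
        "INSERT INTO Measurements (Device, Date, Value) VALUES (@device, @date, @value)",
        connection, transaction);
    command.Parameters.Add("@device", SqlDbType.NVarChar, 50);
    command.Parameters.Add("@date", SqlDbType.DateTime);
    command.Parameters.Add("@value", SqlDbType.Float);

    int count = 0;
    foreach (var line in File.ReadLines("data.csv"))
    {
        var parts = line.Split(';');
        command.Parameters["@device"].Value = parts[0];
        command.Parameters["@date"].Value = DateTime.Parse(parts[1]);
        command.Parameters["@value"].Value = double.Parse(parts[2]);
        command.ExecuteNonQuery();

        if (++count % 1000 == 0)      // commit every 1000 rows
        {
            transaction.Commit();
            transaction = connection.BeginTransaction();
            command.Transaction = transaction;
        }
    }
    transaction.Commit();             // commit the final partial batch
}
This is much faster than auto-committed single inserts, but for 3 million rows it will typically still be slower than SqlBulkCopy or bcp.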

Related

Save large result set to multiple XML files

I have a stored procedure that returns a large result set (nearly 20 million records). I need to save this result to multiple XML files. I am currently using ADO.NET to fill a DataSet, but it quickly throws a System.OutOfMemoryException. What other methods can I use to accomplish this?
Are you using SQL Server?
If so, there is a SQL clause (FOR XML) that automatically converts the result of a query into an XML structure; you would then get it as a string in the application.
Options:
split the string into several pieces and save them to files (in the app)
modify the stored procedure to split the result into several XML objects, return them as different strings (one row per object), and save each of them into a file
write a new stored procedure that calls the original one, splits the result into X XML objects, and returns X XML strings that you just have to save in the application
Not using SQL Server?
do the XML formatting in the stored procedure, or write a new one that does it
Either way, I think it will be easier to do the XML formatting server side (see the sketch below).
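If the stored procedure (or a query over its result) is changed to return FOR XML, here is a minimal sketch for streaming that result from C# to a file; the query, table, and output path are assumptions:
using System.Data.SqlClient;
using System.Xml;

using (var connection = new SqlConnection("your connection string"))
using (var command = new SqlCommand(
    "SELECT Device, Date, Value FROM Measurements FOR XML PATH('row'), ROOT('rows')",
    connection))
{
    connection.Open();

    // ExecuteXmlReader streams the FOR XML result; WriteNode copies it to the
    // file without building the whole document in memory.
    using (XmlReader xmlReader = command.ExecuteXmlReader())
    using (XmlWriter xmlWriter = XmlWriter.Create(@"C:\exports\result.xml"))
    {
        xmlWriter.WriteNode(xmlReader, defattr: true);
    }
}
Splitting the output into several files can then be done either in the stored procedure, as suggested above, or by paging the query as in the next answer.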
Assuming you are using SQL Server, you can use paging in your stored procedure. ROW_NUMBER is one option; SQL Server 2012 and above support OFFSET and FETCH (see the sketch below).
Also, how many DataTables are you filling? There are row limits for DataTables:
The maximum number of rows that a DataTable can store is 16,777,216.
https://msdn.microsoft.com/en-us/library/system.data.datatable.aspx
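A rough OFFSET/FETCH paging sketch that writes one XML file per page; the query, page size, and output path are assumptions, and the ORDER BY column should be a stable key:
using System.Data;
using System.Data.SqlClient;

const int pageSize = 500000;
int pageIndex = 0;

using (var connection = new SqlConnection("your connection string"))
{
    connection.Open();
    while (true)
    {
        using (var adapter = new SqlDataAdapter(
            "SELECT Id, Payload FROM dbo.BigResult ORDER BY Id " +
            "OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY", connection))
        {
            adapter.SelectCommand.Parameters.AddWithValue("@offset", pageIndex * pageSize);
            adapter.SelectCommand.Parameters.AddWithValue("@pageSize", pageSize);

            var page = new DataTable("Record");
            adapter.Fill(page);
            if (page.Rows.Count == 0)
                break;                                    // no more pages

            // One file per page keeps memory usage bounded.
            page.WriteXml($@"C:\exports\page_{pageIndex}.xml", XmlWriteMode.WriteSchema);
            pageIndex++;
        }
    }
}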

Out of memory exception when pulling huge data from DB

We are pulling a huge amount of data from a SQL Server DB. It has around 25,000 rows with 2,500 columns. The requirement is to read the data and export it to a spreadsheet, so pagination is not an option. With fewer records it is able to pull the data, but when it grows to the size I mentioned above it throws the exception.
public DataSet Exportexcel(string Username)
{
    Database db = DatabaseFactory.CreateDatabase(Config);
    DbCommand dbCommand = db.GetStoredProcCommand("Sp_ExportADExcel");
    db.AddInParameter(dbCommand, "@Username", DbType.String, Username);
    return db.ExecuteDataSet(dbCommand);
}
Please help me in resolving this issue.
The requirement is to read the data and export it to a spreadsheet, so pagination is not an option.
Why not read the data in steps? Instead of getting all the records at once, fetch a limited number of records each time and write them to Excel. Continue until you have processed all the records.
Your problem is purely down to the fact that you are trying to extract so much data in one go.
You may get around the problem by installing more memory in the machine doing the query, but that is just a bodge.
Your best bet is to retrieve such amounts of data in steps.
You could quite easily read the data back row by row and export/append it in CSV format to a file, and this could all be done in a stored procedure.
You don't say what database you are using, but handling such large amounts of data is what database engines are designed to cope with.
Other than that, when handling large quantities of data objects in C# code it's best to look into using generic collections, as these avoid the boxing that non-generic collections impose on value types and so reduce the memory footprint.
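A minimal sketch of the row-by-row streaming approach with SqlDataReader, appending rows to a CSV file instead of buffering everything in a DataSet; the output path and sample user name are assumptions, while the procedure and parameter names are taken from the question:
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

using (var connection = new SqlConnection("your connection string"))
using (var command = new SqlCommand("Sp_ExportADExcel", connection)
{
    CommandType = CommandType.StoredProcedure
})
using (var writer = new StreamWriter(@"C:\exports\export.csv"))
{
    command.Parameters.AddWithValue("@Username", "someUser");
    connection.Open();

    using (SqlDataReader reader = command.ExecuteReader())
    {
        // Header row from the result set's column names.
        var columnNames = Enumerable.Range(0, reader.FieldCount)
                                    .Select(reader.GetName);
        writer.WriteLine(string.Join(",", columnNames));

        // Stream one row at a time; only a single row is held in memory.
        var values = new object[reader.FieldCount];
        while (reader.Read())
        {
            reader.GetValues(values);
            writer.WriteLine(string.Join(",", values));
        }
    }
}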
You can use batch-processing logic to fetch the records in batches, say 5,000 records per execution, and store the results in a temporary dataset; once all the batches have been processed, dump the data from the temporary dataset to Excel.
You can use the C# BulkCopy class for this purpose.
If it is enough to have the data available for Excel as CSV, you can use bulk copy (bcp):
bcp "select col1, col2, col3 from database.schema.SomeTable" queryout "c:\MyData.txt" -c -t"," -r"\n" -S ServerName -T
This is magnitudes faster and has a small footprint.

Process 46,000 rows of a document in groups of 1000 using C# and Linq

I have the code below that executes. There are 46,000 records in the text file that I need to process and insert into the database. It takes FOREVER if I just call it directly and loop one row at a time.
I was trying to use LINQ to pull every 1,000 rows or so and throw them into a thread so I could process 3,000 rows at once and cut the processing time. I can't figure it out, though, so I need some help.
Any suggestions would be welcome. Thank you in advance.
var reader = ReadAsLines(tbxExtended.Text);

var ds = new DataSet();
var dt = new DataTable();

string headerNames = "Long|list|of|strings|";
var headers = headerNames.Split('|');
foreach (var header in headers)
    dt.Columns.Add(header);

var records = reader.Skip(1);
foreach (var record in records)
    dt.Rows.Add(record.Split('|'));

ds.Tables.Add(dt);
ds.AcceptChanges();

ProcessSmallList(ds);
If you are looking for high performance and you are using SQL Server, then look at SqlBulkCopy. The performance is significantly better than inserting row by row.
Here is an example using a custom CSVDataReader that I used for a project, but any IDataReader-compatible reader (SqlDataReader, OleDbDataReader, etc.), DataRow[] or DataTable can be used as a parameter to WriteToServer.
Dim sr As CSVDataReader
Dim sbc As SqlClient.SqlBulkCopy
sbc = New SqlClient.SqlBulkCopy(mConnectionString, SqlClient.SqlBulkCopyOptions.TableLock Or SqlClient.SqlBulkCopyOptions.KeepIdentity)
sbc.DestinationTableName = "newTable"
'sbc.BulkCopyTimeout = 0
sr = New CSVDataReader(parentfileName, theBase64Map, ","c)
sbc.WriteToServer(sr)
sr.Close()
There are quite a number of options available. (See the link in the item)
To bulk-insert data into a database, you should probably be using the database engine's bulk-insert utility (e.g. bcp in SQL Server). You might want to first do the processing, write the processed data out to a separate text file, then bulk-insert it into the database of concern.
If you really want to do the processing and the inserts online, memory is also a (small) factor, for example:
ReadAllLines reads the whole text file into memory, creating 46,000 strings. That would occupy a sizable chunk of memory. Try to use ReadLines instead, which returns an IEnumerable and yields the strings one line at a time.
Your dataset may contain all 46,000 rows in the end, which will be slow in detecting changed rows. Try to Clear() the dataset table right after each insert.
I believe the slowness you observed actually came from the dataset. Datasets issue one INSERT statement per new record, which means that you won't be saving anything by doing Update() 1,000 rows at a time rather than one row at a time. You still have 46,000 INSERT statements going to the database, which makes it slow.
To improve performance, I'm afraid LINQ can't help you here, since the bottleneck is the 46,000 INSERT statements. You should:
Forgo the use of datasets
Dynamically create the INSERT statement in a string
Batch the update, say, 100-200 rows per command
Dynamically build the INSERT statement with multiple VALUES rows
Run the SQL command to insert 100-200 rows per batch (see the sketch after this list)
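A rough sketch of building such a multi-row INSERT with parameters; the table name, column names, and row layout are assumptions:
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;

class BatchInserter
{
    // Builds one "INSERT ... VALUES (...), (...), ..." command per batch.
    // SQL Server allows at most 1000 rows per VALUES list and 2100 parameters
    // per command, so batches of 100-200 rows stay well inside both limits.
    public static void InsertBatch(SqlConnection connection, IList<string[]> rows)
    {
        var sql = new StringBuilder("INSERT INTO TargetTable (Col1, Col2, Col3) VALUES ");
        using (var command = new SqlCommand { Connection = connection })
        {
            var valueClauses = new List<string>();
            for (int i = 0; i < rows.Count; i++)
            {
                valueClauses.Add($"(@a{i}, @b{i}, @c{i})");
                command.Parameters.AddWithValue($"@a{i}", rows[i][0]);
                command.Parameters.AddWithValue($"@b{i}", rows[i][1]);
                command.Parameters.AddWithValue($"@c{i}", rows[i][2]);
            }
            command.CommandText = sql.Append(string.Join(", ", valueClauses)).ToString();
            command.ExecuteNonQuery();
        }
    }
}
Call it every 100-200 parsed lines and clear the row buffer before reading the next chunk.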
If you insist on using datasets, you don't have to do it with LINQ -- LINQ solves a different type of problem. Do something like:
// code to create dataset "ds" and datatable "dt" omitted
// code to create data adaptor omitted
int count = 0;
foreach (string line in File.ReadLines(filename)) {
    // Do processing based on line, perhaps split it
    dt.Rows.Add(...);
    count++;
    if (count >= 1000) {
        adaptor.Update(dt);
        dt.Clear();
        count = 0;
    }
}
This will improve performance somewhat, but you're never going to approach the performance you obtain by using dedicated bulk-insert utilities (or function calls) for your database engine.
Unfortunately, using those bulk-insert facilities will make your code less portable to another database engine. This is the trade-off you'll need to make.

Bulk Insert into access database from c#?

How can I do this? I have about 10,000 records in an Excel file and I want to insert all of them as fast as possible into an Access database.
Any suggestions?
What you can do is something like this:
Dim AccessConn As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\Test Files\db1 XP.mdb")
AccessConn.Open()
Dim AccessCommand As New System.Data.OleDb.OleDbCommand("SELECT * INTO [ReportFile] FROM [Text;DATABASE=C:\Documents and Settings\...\My Documents\My Database\Text].[ReportFile.txt]", AccessConn)
AccessCommand.ExecuteNonQuery()
AccessConn.Close()
Switch off the indexing on the affected tables before starting the load, then rebuild the indexes from scratch after the bulk load has finished. Rebuilding the indexes from scratch is faster than trying to keep them up to date while loading a large amount of data into a table.
If you choose to insert row by row, then you may want to consider using transactions: open a transaction, insert 1,000 records, commit the transaction, and repeat. This should work fine.
Use the default data import features in Access. If that does not suit your needs and you want to use C#, use standard ADO.NET and simply write record-for-record. 10K records should not take too long.
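A minimal ADO.NET sketch combining the two suggestions above (parameterised row-by-row inserts wrapped in a transaction); the database path is taken from the earlier example, while the table and column names and the parsed-row collection are assumptions:
using System.Collections.Generic;
using System.Data.OleDb;

class AccessBulkLoader
{
    // 'rows' is a hypothetical collection of already-parsed Excel rows.
    public static void Insert(IEnumerable<object[]> rows)
    {
        using (var connection = new OleDbConnection(
            @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Test Files\db1 XP.mdb"))
        {
            connection.Open();
            using (OleDbTransaction transaction = connection.BeginTransaction())
            using (var command = new OleDbCommand(
                "INSERT INTO [ReportFile] ([Field1], [Field2], [Field3]) VALUES (?, ?, ?)",
                connection, transaction))
            {
                // OLE DB uses positional '?' parameters; the names are only labels.
                command.Parameters.Add("@p1", OleDbType.VarWChar, 255);
                command.Parameters.Add("@p2", OleDbType.VarWChar, 255);
                command.Parameters.Add("@p3", OleDbType.VarWChar, 255);

                foreach (var row in rows)
                {
                    command.Parameters[0].Value = row[0];
                    command.Parameters[1].Value = row[1];
                    command.Parameters[2].Value = row[2];
                    command.ExecuteNonQuery();
                }
                transaction.Commit();   // a single commit for the whole batch
            }
        }
    }
}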

Import table data into database that was exported before

I am working on a feature that exports some tables (~50) to a disk file and imports the file back into the database. Export is quite easy: serialize the dataset to a file stream. But when importing, the table structure needs to be determined dynamically. What I am doing now:
foreach table in dataset
    (compare the table schemas in the db and the imported dataset)
    define a batch command
    foreach row in table
        construct a single insert SqlCommand and add it to the batch command
    execute the batch insert command
This is very inefficient, and I also run into problems converting the data types of the dataset's DataTable to the database's types. So I want to know: is there a better way to do this?
Edit:
In fact, import and export are two functions (buttons) in the program. On the UI there is a grid that lists lots of tables; what I need to implement is to export the selected tables' data to a disk file and import the data back into the database later.
Why not use SQL Server's native Backup and Restore functionality? You can do incremental restores of the data, and it's by far the fastest way to export and then import data again.
There are a lot of very advanced options to take into account some fringe cases, but at its heart it's two commands: BACKUP DATABASE and RESTORE DATABASE.
backup database mydb to disk = 'c:\my\path\to\backup.bak'
restore database mydb from disk = 'c:\my\path\to\backup.bak'
When doing this against TB-sized databases, it takes about 45 minutes to an hour in my experience. Much faster than trying to go through every row!
I'm guessing you are using SQL Server? If so, I would:
a) make sure the table names are showing up in the export
b) look into the SqlBulkCopy class. That will allow you to push an entire table in, so you can loop through the DataTables and bulk copy each one in.
using (SqlBulkCopy copy = new SqlBulkCopy(MySQLExpConn))
{
    copy.ColumnMappings.Add(0, 0);
    copy.ColumnMappings.Add(1, 1);
    copy.ColumnMappings.Add(2, 2);
    copy.ColumnMappings.Add(3, 3);
    copy.ColumnMappings.Add(4, 4);
    copy.ColumnMappings.Add(5, 5);
    copy.ColumnMappings.Add(6, 6);
    copy.DestinationTableName = ds.Tables[i].TableName;
    copy.WriteToServer(ds.Tables[i]);
}
You can use XML serialization, but you will need a good ORM tool like NHibernate etc. to help you with it. XML serialization will maintain the data types and will work flawlessly.
You can read an entire table and serialize all its values into an XML file, and you can read the entire XML file back into a list of objects and store them in the database. Using a good ORM tool you will not need to write any SQL. And I think they can work against different database servers as well.
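For the dataset-based export/import described in the question, a simpler alternative worth noting is DataSet's built-in XML support (WriteXml/ReadXml with the schema included) rather than an ORM; a minimal sketch, with the file path as an assumption:
using System.Data;

class DataSetXmlStore
{
    // Export: write the selected tables, including their schema, to one XML file.
    public static void Export(DataSet selectedTables)
    {
        selectedTables.WriteXml(@"C:\exports\tables.xml", XmlWriteMode.WriteSchema);
    }

    // Import: read the schema and data back into a fresh DataSet.
    public static DataSet Import()
    {
        var ds = new DataSet();
        ds.ReadXml(@"C:\exports\tables.xml", XmlReadMode.ReadSchema);
        return ds;
    }
}
The imported tables can then be pushed to the database with SqlBulkCopy as shown above.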
I finally chose SqlCommandBuilder to build the insert commands automatically.
See the SqlCommandBuilder Class documentation.
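A rough sketch of that approach, assuming the imported DataTable's name matches the destination table and the destination table has a primary key (which SqlCommandBuilder needs in order to generate commands):
using System.Data;
using System.Data.SqlClient;

class TableImporter
{
    // Inserts all rows of an imported DataTable using an automatically
    // generated INSERT command.
    public static void Import(DataTable importedTable, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var adapter = new SqlDataAdapter(
            $"SELECT * FROM [{importedTable.TableName}]", connection))
        using (var builder = new SqlCommandBuilder(adapter))
        {
            // Copy the rows into a table whose rows are all in the Added state,
            // otherwise adapter.Update() would not issue any INSERTs.
            DataTable target = importedTable.Clone();
            foreach (DataRow row in importedTable.Rows)
                target.Rows.Add(row.ItemArray);

            adapter.Update(target);
        }
    }
}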
