Insert a large amount of data into a database using LINQ - C#

I want to insert around 1 million records into a database using LINQ in ASP.NET MVC, but when I try the following code it doesn't work: it throws an OutOfMemoryException, and the loop alone took 3 days. Can anyone please help me with this?
db.Database.ExecuteSqlCommand("DELETE FROM [HotelServices]");

DataTable tblRepeatService = new DataTable();
tblRepeatService.Columns.Add("HotelCode", typeof(System.String));
tblRepeatService.Columns.Add("Service", typeof(System.String));
tblRepeatService.Columns.Add("Category", typeof(System.String));

foreach (DataRow row in xmltable.Rows)
{
    string[] servicesarr = Regex.Split(row["PAmenities"].ToString(), ";");
    for (int a = 0; a < servicesarr.Length; a++)
    {
        tblRepeatService.Rows.Add(row["HotelCode"].ToString(), servicesarr[a], "PA");
    }

    string[] servicesarrA = Regex.Split(row["RAmenities"].ToString(), ";");
    for (int b = 0; b < servicesarrA.Length; b++)
    {
        tblRepeatService.Rows.Add(row["HotelCode"].ToString(), servicesarrA[b], "RA");
    }
}

HotelAmenties _hotelamenties;
foreach (DataRow hadr in tblRepeatService.Rows)
{
    _hotelamenties = new HotelAmenties();
    _hotelamenties.Id = Guid.NewGuid();
    _hotelamenties.ServiceName = hadr["Service"].ToString();
    _hotelamenties.HotelCode = hadr["HotelCode"].ToString();
    db.HotelAmenties.Add(_hotelamenties);
}
db.SaveChanges();
The tblRepeatService table has around 1 million rows.

Bulk inserts like this are highly inefficient in LINQ to SQL and Entity Framework alike. Every insert creates at least three objects (the DataRow, the HotelAmenties entity and its change-tracking entry), chewing up memory on objects you don't need.
Given that you already have a DataTable, you can use System.Data.SqlClient.SqlBulkCopy to push the content of the table to a temporary table on the SQL server, then use a single insert statement to load the data into its final destination. This is the fastest way I have found so far to move many thousands of records from memory to SQL.
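As a rough sketch of that staging-table approach (the staging table name, column types and connection string below are assumptions, not part of the original question):
using (var conn = new System.Data.SqlClient.SqlConnection(connectionString))
{
    conn.Open();

    // 1. Create a session-scoped staging table shaped like the DataTable
    //    (column types are guesses).
    using (var create = new System.Data.SqlClient.SqlCommand(
        @"CREATE TABLE #HotelAmentiesStaging
              (HotelCode NVARCHAR(50), Service NVARCHAR(255), Category NVARCHAR(10))", conn))
    {
        create.ExecuteNonQuery();
    }

    // 2. Push the in-memory rows to the staging table in one streamed operation.
    using (var bulk = new System.Data.SqlClient.SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "#HotelAmentiesStaging";
        bulk.BatchSize = 10000;
        bulk.WriteToServer(tblRepeatService);   // columns map by ordinal here
    }

    // 3. One set-based INSERT into the real table.
    using (var insert = new System.Data.SqlClient.SqlCommand(
        @"INSERT INTO HotelAmenties (Id, HotelCode, ServiceName)
          SELECT NEWID(), HotelCode, Service FROM #HotelAmentiesStaging", conn))
    {
        insert.CommandTimeout = 0;   // a large set-based insert can exceed the default timeout
        insert.ExecuteNonQuery();
    }
}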

If performance doesn't matter and this is a one-shot job, you can stick with the approach you're using. Your problem is that you only save at the end, so Entity Framework has to track and generate the SQL for 1 million operations at once. Modify your code so that you save every 1,000 or so inserts instead of only at the end and it should work just fine.
int i = 0;
foreach (DataRow hadr in tblRepeatService.Rows)
{
    _hotelamenties = new HotelAmenties();
    _hotelamenties.Id = Guid.NewGuid();
    _hotelamenties.ServiceName = hadr["Service"].ToString();
    _hotelamenties.HotelCode = hadr["HotelCode"].ToString();
    db.HotelAmenties.Add(_hotelamenties);
    if ((i % 1000) == 0)
    {
        db.SaveChanges();
    }
    i++;
}
db.SaveChanges();
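One caveat worth adding: SaveChanges alone does not detach the entities, so the context's change tracker still grows over the full million rows. If memory is still tight, recreating the DbContext every batch (or disabling automatic change detection) helps; the AddToFullImportContext example further down this page does exactly that.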


How to fill PostgreSQL tables from C# DataSet via Npgsql?

I have a DataSet in C# with DataTables, and a PostgreSQL database with the same tables. I fill the DataTables in my code and want to INSERT them into the PostgreSQL database. I tried inserting with simple SQL queries (INSERT INTO ...), but it's very slow when I have a hundred tables of thousands of rows each. I guess using a DataAdapter would improve performance, but I can't understand how it works. Can you explain it to me with examples for two cases?
Case 1: inserting the DataSet's tables into PostgreSQL with a DataAdapter.
Case 2: inserting only unique values from the DataSet into PostgreSQL (when a table in the database has rows with unique keys and the DataTable contains the same keys).
Or maybe you can suggest what to read to learn about DataAdapters. Anyway, thanks.
With the exception of trivially small datasets, you're going to have a hard time beating the performance of Npgsql's implementation of COPY, which can be accessed via the BeginTextImport method of your NpgsqlConnection object.
So, regardless of how your data exists in your application, if you dump the output via the text import (COPY), it should be very zippy. Here is an example of how you would do that with a DataTable. Bear in mind that the columns in the DataTable and the columns in the table have to line up; if not, you need to manage that one way or the other.
This presupposes NpgSql 3.1.9 or higher.
object[] outRow = new object[dt.Columns.Count];

using (var writer = conn.BeginTextImport("COPY <table> FROM STDIN WITH NULL AS '' CSV"))
{
    foreach (DataRow rw in dt.Rows)
    {
        for (int col = 0; col < dt.Columns.Count; col++)
            outRow[col] = rw[col];

        writer.WriteLine(string.Join(",", outRow));
    }
}
As far as duplicates... wow, that really depends. Define "duplicates." If it's just a "select distinct," then it also depends on how many duplicates you expect. If it's a small amount, a List.Exists<> would probably be adequate, but if you have a large number of dupes a Dictionary object would make each lookup a lot more efficient. A typical List lookup is O(n), while a Dictionary lookup would be O(1).
Here's a pretty brute-force example of a dictionary distinct insert for the above example:
object[] outRow = new object[dt.Columns.Count];
Dictionary<string, bool> already = new Dictionary<string, bool>();
bool test;

using (var writer = conn.BeginTextImport("COPY <table> FROM STDIN WITH NULL AS '' CSV"))
{
    foreach (DataRow rw in dt.Rows)
    {
        for (int col = 0; col < dt.Columns.Count; col++)
            outRow[col] = rw[col];

        string output = string.Join(",", outRow);
        if (!already.TryGetValue(output, out test))
        {
            writer.WriteLine(output);
            already.Add(output, true);
        }
    }
}
Disclaimer: this is a memory pig. If you can manage dupes any other way, or guarantee the ordering of the data, there are numerous other options.
If you can't (or won't) use a bulk copy insert, something that would help performance would be to wrap your inserts in a transaction (NpgsqlTransaction), but for hundreds of thousands of rows I can't see why you would.
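For reference, a minimal sketch of that transaction fallback might look like the following (the table name my_table and its two text columns are placeholders; dt and conn are the DataTable and open NpgsqlConnection from above):
using (var tx = conn.BeginTransaction())
using (var cmd = new NpgsqlCommand(
    "INSERT INTO my_table (col_a, col_b) VALUES (@a, @b)", conn, tx))
{
    var pA = cmd.Parameters.Add("a", NpgsqlTypes.NpgsqlDbType.Text);
    var pB = cmd.Parameters.Add("b", NpgsqlTypes.NpgsqlDbType.Text);

    foreach (DataRow rw in dt.Rows)
    {
        pA.Value = rw[0];
        pB.Value = rw[1];
        cmd.ExecuteNonQuery();   // one round trip per row, but only a single commit
    }

    tx.Commit();
}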

Using SQLBulkCopy to Insert into Related Tables

I am using SqlBulkCopy to read data from Excel into a SQL database. In the database I have two tables into which I need to insert this data from Excel: Table A, and Table B, which uses the ID (primary key IDENTITY) from Table A to insert its corresponding rows.
I am able to insert into one table (Table A) using the following code.
using (SqlConnection connection = new SqlConnection(strConnection))
{
    connection.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.[EMPLOYEEINFO]";
        try
        {
            // Write from the source to the destination.
            SqlBulkCopyColumnMapping NameMap = new SqlBulkCopyColumnMapping(data.Columns[0].ColumnName, "EmployeeName");
            SqlBulkCopyColumnMapping GMap = new SqlBulkCopyColumnMapping(data.Columns[1].ColumnName, "Gender");
            SqlBulkCopyColumnMapping CMap = new SqlBulkCopyColumnMapping(data.Columns[2].ColumnName, "City");
            SqlBulkCopyColumnMapping AMap = new SqlBulkCopyColumnMapping(data.Columns[3].ColumnName, "HomeAddress");
            bulkCopy.ColumnMappings.Add(NameMap);
            bulkCopy.ColumnMappings.Add(GMap);
            bulkCopy.ColumnMappings.Add(CMap);
            bulkCopy.ColumnMappings.Add(AMap);
            bulkCopy.WriteToServer(data);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
But I am not sure how to extend this to two tables that are bound by a foreign key relationship, especially since Table B uses the identity value from Table A. Any example would be great. I googled it, and none of the threads on SO gave a working example.
AFAIK bulk copy can only be used to upload into a single table. In order to achieve a bulk upload into two tables, you will therefore need two bulk uploads. Your problem comes from using a foreign key which is an identity. You can work around this, however. I am pretty sure that bulk copy uploads sequentially, which means that if you upload 1,000 records and the last record gets an ID of 10,197, then the ID of the first record is 9,198! So my recommendation would be to upload your first table, check the max id after the upload, deduct the number of records and work from there!
Of course in a high-use database, someone might insert after you, so you would need to get the top ID by selecting the record which matches your last one by other details (assuming a combination of (up to) all fields would be guaranteed to be unique). Only you know if this is likely to be a problem.
The alternative is not to use an identity column in the first place, but I presume you have no control over the design? In my younger days, I made the mistake of using identities, I never do now. They always find a way of coming back to bite!
For example, to add the second set of data:
DataTable secondTable = new DataTable("SecondTable");
secondTable.Columns.Add("ForeignKey", typeof(int));
secondTable.Columns.Add("DataField", typeof(yourDataType));
...
Then add the data to secondTable (how exactly depends on the format of the second data set):
int cnt = 0;
foreach (var d in mySecondData)
{
    DataRow newRow = secondTable.NewRow();
    newRow["ForeignKey"] = cnt;      // provisional value: the row's position for now
    newRow["DataField"] = d.YourData;
    secondTable.Rows.Add(newRow);
    cnt++;
}
Then, once you have found the starting identity (int startID):
for (int i = 0; i < secondTable.Rows.Count; i++)
{
    secondTable.Rows[i]["ForeignKey"] = (int)secondTable.Rows[i]["ForeignKey"] + startID;
}
Finally:
bulkCopy.DestinationTableName = "YourSecondTable";
bulkCopy.WriteToServer(secondTable);
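For completeness, here is one hedged way to derive startID after the first bulk copy, following the "max ID minus row count" idea above (the Id column name is an assumption about Table A's schema; connection and data are from the question's snippet):
int startID;
using (var cmd = new SqlCommand("SELECT MAX(Id) FROM dbo.[EMPLOYEEINFO]", connection))
{
    int maxId = (int)cmd.ExecuteScalar();
    // If data.Rows.Count rows were just uploaded and the last one got maxId,
    // then the first one got maxId - (count - 1).
    startID = maxId - data.Rows.Count + 1;
}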

Filling table takes a lot of time in MS Word

I wrote the following code to copy an external data table into a table in an MS Word document. It works fine, but it takes a lot of time once the number of rows exceeds 100, and with more than 500 rows it fills the Word table so slowly that it can't complete the task.
I tried hiding the document and disabling screen updating for the document, but the performance is still slow.
// Get the required external data into the DT data table.
DataTable DT = XDt.GetData();
Word.Table TB;      // references an existing table in the document
int X = 1;
foreach (DataRow Rw in DT.Rows)
{
    Word.Row Rn = TB.Rows.Add(TB.Rows[X + 1]);
    for (int i = 0; i <= DT.Columns.Count - 1; i++)
    {
        Rn.Cells[i + 1].Range.Text = Rw[i].ToString();
    }
    X++;
}
So is there a way to make this process faster?
The most efficient way to add a table to Word is to first concatenate the data in a delimited text string, where "\n" (the paragraph mark) is the symbol for end-of-row (record separator). The end-of-cell (field separator) can be any character you like that's not in the string content that makes up the table.
Assign this string to a Range object, then use the ConvertToTable() method to create the table.
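A minimal sketch of that approach, assuming the DT DataTable from the question, a Word.Document object named doc (with the usual Word interop alias), and tab as the cell separator; the names here are illustrative, not from the original answer:
var lines = new List<string>();
foreach (DataRow rw in DT.Rows)
    lines.Add(string.Join("\t", rw.ItemArray));     // one line per row, tab between cells
string data = string.Join("\n", lines);

// Collapsed range just before the final paragraph mark, i.e. the end of the document.
Word.Range rng = doc.Range(doc.Content.End - 1, doc.Content.End - 1);
rng.Text = data;

Word.Table tbl = rng.ConvertToTable(
    Separator: Word.WdTableFieldSeparator.wdSeparateByTabs,
    NumRows: DT.Rows.Count,
    NumColumns: DT.Columns.Count);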
You're retrieving the last row of the current table for the BeforeRow parameter of TB.Rows.Add. This is significantly slower than simply adding the row. You should replace this:
Word.Row Rn = TB.Rows.Add(TB.Rows[X + 1]);
With this:
Word.Row Rn = TB.Rows.Add();
Utilizing parallelization, as suggested in the comments, might help slightly, but I'm afraid it's not going to do much good, since the table-adding code runs on the main thread, as mentioned in this link.
EDIT:
If performance is still an issue, I'd look into creating the Word table independently of the Word object model by using OpenXML. It's orders of magnitude faster.
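As a rough, hypothetical illustration of that OpenXML route (it assumes the DocumentFormat.OpenXml package, an existing .docx on disk, and the DT DataTable from the question; borders and styling are omitted):
// Requires the DocumentFormat.OpenXml package.
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static void AppendTable(string docxPath, System.Data.DataTable DT)
{
    using (var doc = WordprocessingDocument.Open(docxPath, true))
    {
        var table = new Table();
        foreach (System.Data.DataRow rw in DT.Rows)
        {
            var tableRow = new TableRow();
            foreach (var value in rw.ItemArray)
            {
                // Every cell must contain at least one paragraph.
                tableRow.Append(new TableCell(new Paragraph(new Run(new Text(value.ToString())))));
            }
            table.Append(tableRow);
        }
        doc.MainDocumentPart.Document.Body.Append(table);
        doc.MainDocumentPart.Document.Save();
    }
}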
The ConvertToTable method is orders of magnitude faster than adding rows/cells one at a time.
// reader is an open IDataReader over the source data.
var items = new List<string>();
object[] values;

while (reader.Read())
{
    values = new object[reader.FieldCount];
    reader.GetValues(values);
    items.Add(String.Join("\t", values));
}

string data = String.Join("\n", items.ToArray());

var tempDocument = application.Documents.Add();
var range = tempDocument.Range();
range.Text = data;
var tempTable = range.ConvertToTable(
    Separator: Microsoft.Office.Interop.Word.WdTableFieldSeparator.wdSeparateByTabs,
    NumColumns: reader.FieldCount,
    NumRows: items.Count,
    DefaultTableBehavior: WdDefaultTableBehavior.wdWord9TableBehavior,
    AutoFitBehavior: WdAutoFitBehavior.wdAutoFitWindow);

Update 1 column using bulkinsert C#

I have a database with 2.5 million records. One column in the database is empty. Now I want to fill that column using a bulk insert, because a simple UPDATE query takes a lot of time. But the problem is that the bulk insert starts inserting records at the end of the table, rather than filling the column from the start. Here is my code:
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
    for (int j = 0; j < rows.Length; j++)
    {
        DataRow dailyProductSalesRow = prodSalesData.NewRow();
        string[] temp = ((string)rows[j]["Area"]).Split(new string[] { " ", "A", "a" }, StringSplitOptions.None);
        if (Int32.TryParse(temp[0], out number))
        {
            int num1 = Int32.Parse(temp[0]);
            dailyProductSalesRow["Area1"] = 1;
        }
        else
        {
            dailyProductSalesRow["Area1"] = 0;
            Console.WriteLine("Invalid " + temp[0]);
        }
        prodSalesData.Rows.Add(dailyProductSalesRow);
    }

    s.DestinationTableName = prodSalesData.TableName;
    foreach (var column in prodSalesData.Columns)
    {
        s.ColumnMappings.Add(column.ToString(), column.ToString());
    }

    try
    {
        s.WriteToServer(prodSalesData);
    }
    catch (Exception e)
    {
        Console.WriteLine(e.ToString());
    }
}
If you need to update, use an UPDATE statement.
BULK INSERT is designed to make inserting many records faster, since plain INSERTs can be very slow and awkward when trying to get a large amount of data into a database.
UPDATE is already the fastest way of updating records in a database. In your case I'd guess you want a statement a bit like this, based on what your BULK INSERT is apparently trying to do:
UPDATE SalesData
SET Area1 = CONVERT(INT, SUBSTRING(Area, 0, CHARINDEX('A', Area)))
If you want to try to make it faster, then look at Optimizing Bulk Import Performance. Some of the advice there (e.g. the recovery model used by the database) may also be applicable to UPDATE performance when updating many records.

Fastest way to import from Excel to MVC3 Application

I'm working on an import from a CSV file into my ASP.NET MVC3/C#/Entity Framework application.
Currently this is my code, but I'm looking to optimise it:
var excel = new ExcelQueryFactory(file);
var data = from c in excel.Worksheet(0)
           select c;
var dataList = data.ToList();

List<FullImportExcel> importList = new List<FullImportExcel>();

foreach (var s in dataList.ToArray())
{
    if ((s[0].ToString().Trim().Length < 6) && (s[1].ToString().Trim().Length < 7))
    {
        FullImportExcel item = new FullImportExcel();
        item.Carrier = s[0].ToString().Trim();
        item.FlightNo = s[1].ToString().Trim();
        item.CodeFlag = s[2].ToString().Trim();
        //etc etc (50 more columns here)
        importList.Add(item);
    }
}
PlannerEntities context = new PlannerEntities();
context.Configuration.AutoDetectChangesEnabled = false;
int count = 0;
foreach (var item in importList)
{
    ++count;
    context = AddToFullImportContext(context, item, count, 100, true);
}

private PlannerEntities AddToFullImportContext(PlannerEntities context, FullImportExcel entity, int count, int commitCount, bool recreateContext)
{
    context.Set<FullImportExcel>().Add(entity);
    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new PlannerEntities();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }
    return context;
}
This works fine, but isn't as quick as it could be, and the import I'm going to need to do will be a minimum of 2 million lines every month. Are there any better methods out there for bulk imports?
Am I better off avoiding EF altogether and using SqlConnection and inserting that way?
Thanks
I do like how you're only committing records every X number of records (100 in your case.)
I've recently written a system that once a month, needed to update the status of upwards of 50,000 records in one go - this is updating each record and inserting an audit record for each updated record.
Originally I wrote this with the entity framework, and it took 5-6 minutes to do this part of the task. SQL Profiler showed me it was doing 100,000 SQL queries - one UPDATE and one INSERT per record (as expected I guess.)
I changed this to a stored procedure which takes a comma-separated list of record IDs, the status and user ID as parameters, which does a mass-update followed by a mass-insert. This now takes 5 seconds.
In your case, for this number of records, I'd recommend creating a BULK IMPORT file and passing that over to SQL to import.
http://msdn.microsoft.com/en-us/library/ms188365.aspx
For a large number of inserts into SQL Server, bulk copy is the fastest way. You can use the SqlBulkCopy class to access bulk copy from code. You have to create an IDataReader for your List, or you can use this IDataReader for inserting generic lists that I have written.
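If you'd rather not implement IDataReader, one alternative sketch (not the answerer's code) is to copy the list into a DataTable via reflection and hand that to SqlBulkCopy. The destination table name and the assumption that its columns match the FullImportExcel property names are hypothetical:
// Copies the public properties of each item into a DataTable row.
static DataTable ToDataTable<T>(IEnumerable<T> items)
{
    var table = new DataTable(typeof(T).Name);
    var props = typeof(T).GetProperties();
    foreach (var p in props)
        table.Columns.Add(p.Name, Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType);
    foreach (var item in items)
        table.Rows.Add(props.Select(p => p.GetValue(item, null) ?? (object)DBNull.Value).ToArray());
    return table;
}

// Usage:
DataTable bulkTable = ToDataTable(importList);
using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.FullImportExcels";   // assumed table name
    bulk.BatchSize = 5000;
    foreach (DataColumn col in bulkTable.Columns)
        bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);
    bulk.WriteToServer(bulkTable);
}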
Thanks to Andy for the heads up - this was the code used in SQL, with a little help from the ever helpful, Pinal Dave - http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/ :)
DECLARE @bulkinsert NVARCHAR(2000)
DECLARE @filepath NVARCHAR(100)

SET @filepath = 'C:\Users\Admin\Desktop\FullImport.csv'

SET @bulkinsert =
    N'BULK INSERT FullImportExcel2s FROM ''' +
    @filepath +
    N''' WITH (FIRSTROW = 2, FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'')'

EXEC sp_executesql @bulkinsert
Still got a bit of work to do to work it into the code, but we're down to 25 seconds for 50000 rows instead of an hour, so a huge improvement!
