Fast insert relational(normalized) data tables into SQL Server 2008 database - c#

I'm trying to find a better, faster way to insert a fairly large amount of data (~50K rows) than the LINQ code I'm using now.
The data I'm trying to write to a local db is in a list of ORM mapped data serialized and received from WCF.
I'm keen on using SqlBulkCopy, but the problem is that the tables are normalized and are actually a sequence of interconnected tables with one-to-many relationships.
Here's some code that illustrates my point:
foreach (var meeting in meetingsList)
{
    int meetingId = dbAccess.InsertM(value1, value2...);
    foreach (var competition in meeting.COMPETITIONs)
    {
        int competitionId = dbAccess.InsertC(meetingId, value1, value2...);
        foreach (var competitor in competition.COMPETITORs)
        {
            int competitorId = dbAccess.InsertCO(competitionId, value1,....);
            // and so on
        }
    }
}
where dbAccess.InsertM looks something like this:
// check if meeting exists
int meetingId = GetMeeting(meeting, date);
if (meetingId == 0)
{
    // if meeting doesn't exist insert new
    var m = new MEETING
    {
        NAME = name,
        DATE = date
    };
    _db.InsertOnSubmit(m);
    _db.SubmitChanges();
}
Thanks in advance for any answers.
Bojan

I would still use SqlBulkCopy to quickly copy your data from the external file into a staging table that has the same (flat) structure as the file (you'll need to create that table ahead of time).
Once it's loaded, you can split the data up across multiple tables using, for example, a stored procedure. That should be pretty fast, since everything is already on the server.
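If the data arrives as an object graph rather than a file, the staging approach still needs a flat table to hand to SqlBulkCopy. A minimal sketch of that flattening step, assuming hypothetical `Meeting`, `Competition` and `Competitor` classes and made-up staging column names (none of these come from the question's real schema):

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical stand-ins for the ORM-mapped entities in the question.
public class Competitor { public string Name; }

public class Competition
{
    public string Name;
    public List<Competitor> Competitors = new List<Competitor>();
}

public class Meeting
{
    public string Name;
    public DateTime Date;
    public List<Competition> Competitions = new List<Competition>();
}

public static class StagingFlattener
{
    // Produces one flat row per competitor, repeating the parent
    // meeting and competition values on every row.
    public static DataTable Flatten(IEnumerable<Meeting> meetings)
    {
        var table = new DataTable("MeetingStage"); // assumed staging table name
        table.Columns.Add("MeetingName", typeof(string));
        table.Columns.Add("MeetingDate", typeof(DateTime));
        table.Columns.Add("CompetitionName", typeof(string));
        table.Columns.Add("CompetitorName", typeof(string));

        foreach (var meeting in meetings)
            foreach (var competition in meeting.Competitions)
                foreach (var competitor in competition.Competitors)
                    table.Rows.Add(meeting.Name, meeting.Date, competition.Name, competitor.Name);

        return table;
    }
}
```

The resulting DataTable can be passed to SqlBulkCopy.WriteToServer with DestinationTableName set to the staging table. Note that in this sketch a meeting with no competitions, or a competition with no competitors, produces no rows at all, so such parents would need separate handling.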

Related

Using SQLBulkCopy to Insert into Related Tables

I am using SQL Bulk copy to read data from Excel into a SQL DB. In the database, I have two tables into which I need to insert this data from Excel: Table A, and Table B, which uses the ID (primary key, IDENTITY) from Table A to insert the corresponding row records into Table B.
I am able to insert into one table (Table A) using the following Code.
using (SqlConnection connection = new SqlConnection(strConnection)) {
    connection.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection)) {
        bulkCopy.DestinationTableName = "dbo.[EMPLOYEEINFO]";
        try {
            // Write from the source to the destination.
            SqlBulkCopyColumnMapping NameMap = new SqlBulkCopyColumnMapping(data.Columns[0].ColumnName, "EmployeeName");
            SqlBulkCopyColumnMapping GMap = new SqlBulkCopyColumnMapping(data.Columns[1].ColumnName, "Gender");
            SqlBulkCopyColumnMapping CMap = new SqlBulkCopyColumnMapping(data.Columns[2].ColumnName, "City");
            SqlBulkCopyColumnMapping AMap = new SqlBulkCopyColumnMapping(data.Columns[3].ColumnName, "HomeAddress");
            bulkCopy.ColumnMappings.Add(NameMap);
            bulkCopy.ColumnMappings.Add(GMap);
            bulkCopy.ColumnMappings.Add(CMap);
            bulkCopy.ColumnMappings.Add(AMap);
            bulkCopy.WriteToServer(data);
        }
        catch (Exception ex) {
            Console.WriteLine(ex.Message);
        }
    }
}
But I am not sure how to extend it to two tables that are bound by a foreign key relationship, especially since Table B uses the identity value from Table A. Any example would be great. I googled it, and none of the threads on SO gave a working example.
AFAIK, bulk copy can only upload into a single table, so a bulk upload into two tables needs two bulk uploads. Your problem comes from using a foreign key that is an identity, but you can work around this. I am pretty sure that bulk copy inserts rows sequentially, which means that if you upload 1,000 records and the last record gets an ID of 10,197, then the ID of the first record is 9,198 (10,197 - 1,000 + 1). So my recommendation is to upload your first table, check the max ID after the upload, subtract the number of records, add one, and work from there.
Of course, in a heavily used database someone might insert after you, so you would need to get the top ID by selecting the record that matches your last one by its other details (assuming some combination of fields, up to all of them, is guaranteed to be unique). Only you know whether this is likely to be a problem.
The alternative is not to use an identity column in the first place, but I presume you have no control over the design? In my younger days, I made the mistake of using identities, I never do now. They always find a way of coming back to bite!
For example, to add the second table's data:
DataTable secondTable = new DataTable("SecondTable");
secondTable.Columns.Add("ForeignKey", typeof(int));
secondTable.Columns.Add("DataField", typeof(yourDataType));
...
Add data to secondTable (how depends on the format of the second data):
int cnt = 0;
foreach (var d in mySecondData)
{
    DataRow newRow = secondTable.NewRow();
    // Zero-based position of the parent row in the first upload
    // (assumes one second-table row per first-table row, in the same order)
    newRow["ForeignKey"] = cnt++;
    newRow["DataField"] = d.YourData;
    secondTable.Rows.Add(newRow);
}
Then, after you have found the starting identity (int startID):
for (int i = 0; i < secondTable.Rows.Count; i++)
{
    DataRow row = secondTable.Rows[i];
    row["ForeignKey"] = (int)row["ForeignKey"] + startID;
}
Finally:
bulkCopy.DestinationTableName = "YourSecondTable";
bulkCopy.WriteToServer(secondTable);
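The arithmetic behind the starting identity can be written down as a small helper. This is only a sketch of the answer's assumption, not something SQL Server guarantees: it presumes the upload received a contiguous block of identity values with an increment of 1 and that nobody else inserted in between. `IdentityRange` and its method names are hypothetical.

```csharp
using System;

public static class IdentityRange
{
    // First ID of a contiguous block of rowCount identity values ending at maxId.
    // Only valid if the block really is contiguous with an increment of 1.
    public static int FirstId(int maxId, int rowCount)
    {
        if (rowCount <= 0) throw new ArgumentOutOfRangeException("rowCount");
        return maxId - rowCount + 1;
    }

    // Maps a zero-based position in the uploaded batch to its presumed identity value.
    public static int IdForPosition(int startId, int position)
    {
        if (position < 0) throw new ArgumentOutOfRangeException("position");
        return startId + position;
    }
}
```

With the numbers from the answer, IdentityRange.FirstId(10197, 1000) gives 9198, and the row at position 999 maps back to 10197.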

Pulling data from one SQL Azure table, add a column, then populate a different table

I am using C# and LINQ to pull/push data housed in SQL Azure. The basic scenario is that we have a Customer table that contains all customers (PK = CompanyID) and supporting tables like LaborTypes and Materials (FK CompanyID to the Customer table).
When a new customer signs up, a new record is created in the Customers table. Once that is complete, I want to load a set of default materials and laborTypes from a separate table. It is simple enough if I just wanted to copy data direct from one table to another but in order to populate the existing tables for the new customer, I need to take the seed data (e.g. laborType, laborDescription), add the CompanyID for each row of seed data, then do the insert to the existing table.
What is the best method to accomplish this using C# and LINQ with SQL Azure?
An example of a direct insert from user input for LaborTypes is below for contextual reference.
using (var context = GetContext(memCustomer))
{
    var u = GetUserByUsername(context, memUser);
    var l = (from lbr in context.LaborTypes
             where lbr.LaborType1.ToLower() == laborType
                && lbr.Company == u.Company
             select lbr).FirstOrDefault();
    if (l == null)
    {
        l = new AccountDB.LaborType();
        l.Company = u.Company;
        l.Description = laborDescription;
        l.LaborType1 = laborType;
        l.FlatRate = flatRate;
        l.HourlyRate = hourlyRate;
        context.LaborTypes.InsertOnSubmit(l);
        context.SubmitChanges();
    }
    result = true;
}
What you'll want to do is write a query retrieving data from table B and do an Insert Statement on Table A using the result(s).
This has been covered elsewhere on SO, I think; here would be a good place to start.
I don't know the syntax for LINQ specifically, but by constructing something similar to @Murph's answer beyond that link, I think this might work.
var fromB= from b in TableB
where ... ;//identify the row/data from table B
// you may need to make fromB populate from the DB here.
var toInsert = ...; //construct your object with the desired data
// do other necessary things here
TableA.InsertAllOnSubmit(toInsert);
dc.SubmitChanges(); // Submit your changes
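The "add the CompanyID to each seed row" step can be done as an in-memory projection once the seed rows have been read. A rough sketch, assuming hypothetical `SeedLaborType` and `LaborTypeRow` classes in place of the real mapped entities (the actual property names in the question's data context may differ):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a row in the seed (defaults) table.
public class SeedLaborType
{
    public string LaborType;
    public string Description;
    public decimal FlatRate;
    public decimal HourlyRate;
}

// Hypothetical shape of a row in the per-customer table.
public class LaborTypeRow
{
    public int CompanyID;
    public string LaborType;
    public string Description;
    public decimal FlatRate;
    public decimal HourlyRate;
}

public static class SeedCopier
{
    // Copies every seed row and stamps it with the new customer's CompanyID.
    public static List<LaborTypeRow> ForCompany(IEnumerable<SeedLaborType> seed, int companyId)
    {
        return seed.Select(s => new LaborTypeRow
        {
            CompanyID = companyId,
            LaborType = s.LaborType,
            Description = s.Description,
            FlatRate = s.FlatRate,
            HourlyRate = s.HourlyRate
        }).ToList();
    }
}
```

In a LINQ to SQL setting the seed query would probably be materialized first (for example with ToList()), since constructing mapped entity types inside the query itself is generally not allowed; the projected rows could then go to InsertAllOnSubmit followed by a single SubmitChanges.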

Making safe a mass insert of data

I'm currently building an application that needs a feature to import a user-supplied CSV file as data into a database. Each "cell" in the CSV will be stored in its own row.
Initially I was using parameterized queries to insert each row one-by-one, but the speed of the operation (520,000 inserts in one example file!) meant I'm having to reconsider that. I'm now parsing the CSV file into an IEnumerable<Answer> and handing it over to the following code to be inserted into the database in batches:
public void AddAnswers(IEnumerable<Answer> answers)
{
    const int batchSize = 1000;
    var values = new StringBuilder();
    var i = 0;
    foreach (var answer in answers)
    {
        if (i++ > 0)
        {
            values.Append(",");
        }
        values.AppendFormat("({0},{1},'{2}')", answer.AnswerSetId, answer.QuestionId, answer.Value.Replace("'", "''"));
        if (i == batchSize)
        {
            // We've reached the batch size limit - send what we have so far
            SendAnswerBatch(values.ToString());
            values.Clear();
            i = 0;
        }
    }
    if (i > 0)
    {
        // Ensure any leftovers that didn't reach the maximum batch size are sent over
        SendAnswerBatch(values.ToString());
    }
}
private void SendAnswerBatch(string values)
{
    var query = String.Format("INSERT INTO Answers (AnswerSetId,QuestionId,Value) VALUES {0}", values);
    Context.Database.ExecuteSqlCommand(query);
}
This changed a large set of data from taking over 5 minutes to less than 5 seconds to insert, however I realise that basic replacing of ' with '' is not safe.
Obviously the safest way to insert a single row would be to use a parameterized query but is there a way to make such a thing work with a batch insert like this?
If at all possible, I also need it to be non-database specific - I had already considered SqlBulkCopy but the application needs to support multiple database engines.
I would suggest you use SqlBulkCopy when inserting a lot of values; it proved to be really useful to me.
Place your items into a DataTable and let SqlBulkCopy do the rest.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
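Since the question rules out SqlBulkCopy, one way to keep the batching while dropping the manual quote escaping is a multi-row INSERT whose values are all parameters. The sketch below only builds the SQL text and the parameter list. `BatchCommand` and `AnswerBatchBuilder` are hypothetical names, the `Answer` class is a stand-in for the one in the question, and the `@name` parameter prefix is an assumption that does not hold for every provider (Oracle's, for instance, uses `:`).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Stand-in for the Answer class from the question.
public class Answer
{
    public int AnswerSetId;
    public int QuestionId;
    public string Value;
}

// SQL text plus the name/value pairs to bind to it.
public class BatchCommand
{
    public string Sql;
    public List<KeyValuePair<string, object>> Parameters = new List<KeyValuePair<string, object>>();
}

public static class AnswerBatchBuilder
{
    private const string Prefix = "INSERT INTO Answers (AnswerSetId,QuestionId,Value) VALUES ";

    // Splits the answers into batches of at most batchSize rows, each expressed
    // as one multi-row INSERT with three parameters per row.
    public static IEnumerable<BatchCommand> Build(IEnumerable<Answer> answers, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        BatchCommand current = null;
        var values = new StringBuilder();
        int row = 0;

        foreach (var answer in answers)
        {
            if (current == null)
            {
                current = new BatchCommand();
                values.Clear();
                row = 0;
            }
            if (row > 0) values.Append(",");
            values.AppendFormat("(@s{0},@q{0},@v{0})", row);
            current.Parameters.Add(new KeyValuePair<string, object>("@s" + row, answer.AnswerSetId));
            current.Parameters.Add(new KeyValuePair<string, object>("@q" + row, answer.QuestionId));
            // The raw value is bound as a parameter, so no quote escaping is needed.
            current.Parameters.Add(new KeyValuePair<string, object>("@v" + row, (object)answer.Value ?? DBNull.Value));
            row++;

            if (row == batchSize)
            {
                current.Sql = Prefix + values;
                yield return current;
                current = null;
            }
        }

        if (current != null)
        {
            // Leftover rows that did not fill a whole batch.
            current.Sql = Prefix + values;
            yield return current;
        }
    }
}
```

Each batch can then be executed through the provider-neutral ADO.NET types: create a command with connection.CreateCommand(), set CommandText to the batch's Sql, and add one parameter per pair via CreateParameter(). Batch size needs to respect engine limits; SQL Server, for example, accepts at most 2,100 parameters per command, so with three parameters per row a batch of 700 rows is the ceiling there.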

Fastest way to import from Excel to MVC3 Application

I'm working on an import from a CSV file to my ASP.NET MVC3/C#/Entity Framework Application.
Currently this is my code, but I'm looking to optimise:
var excel = new ExcelQueryFactory(file);
var data = from c in excel.Worksheet(0)
           select c;
var dataList = data.ToList();
List<FullImportExcel> importList = new List<FullImportExcel>();
foreach (var s in dataList.ToArray())
{
    if ((s[0].ToString().Trim().Length < 6) && (s[1].ToString().Trim().Length < 7))
    {
        FullImportExcel item = new FullImportExcel();
        item.Carrier = s[0].ToString().Trim();
        item.FlightNo = s[1].ToString().Trim();
        item.CodeFlag = s[2].ToString().Trim();
        //etc etc (50 more columns here)
        importList.Add(item);
    }
}
PlannerEntities context = null;
context = new PlannerEntities();
context.Configuration.AutoDetectChangesEnabled = false;
int count = 0;
foreach (var item in importList)
{
    ++count;
    context = AddToFullImportContext(context, item, count, 100, true);
}
// Save any rows left over from the last partial batch
context.SaveChanges();
private PlannerEntities AddToFullImportContext(PlannerEntities context, FullImportExcel entity, int count, int commitCount, bool recreateContext)
{
    context.Set<FullImportExcel>().Add(entity);
    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new PlannerEntities();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }
    return context;
}
This works fine, but isn't as quick as it could be, and the import that I'm going to need to do will be a minimum of 2 million lines every month. Are there any better methods out there for bulk imports?
Am I better avoiding EF altogether and using SQLConnection and inserting that way?
Thanks
I do like how you're only committing records every X number of records (100 in your case.)
I've recently written a system that once a month, needed to update the status of upwards of 50,000 records in one go - this is updating each record and inserting an audit record for each updated record.
Originally I wrote this with the entity framework, and it took 5-6 minutes to do this part of the task. SQL Profiler showed me it was doing 100,000 SQL queries - one UPDATE and one INSERT per record (as expected I guess.)
I changed this to a stored procedure which takes a comma-separated list of record IDs, the status and user ID as parameters, which does a mass-update followed by a mass-insert. This now takes 5 seconds.
In your case, for this number of records, I'd recommend creating a file for BULK INSERT and passing that over to SQL Server to import.
http://msdn.microsoft.com/en-us/library/ms188365.aspx
For a large number of inserts into SQL Server, bulk copy is the fastest way. You can use the SqlBulkCopy class to access bulk copy from code. You have to create an IDataReader for your List, or you can use this IDataReader for inserting generic Lists that I have written.
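As an alternative to writing an IDataReader, SqlBulkCopy.WriteToServer also accepts a DataTable, at the cost of holding all rows in memory. A minimal sketch of a generic list-to-DataTable conversion via reflection; `DataTableBuilder` is a hypothetical helper, and it assumes the item type exposes simple public properties whose names match the destination columns:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

public static class DataTableBuilder
{
    // Builds a DataTable with one column per public instance property of T
    // and one row per item.
    public static DataTable FromList<T>(IEnumerable<T> items, string tableName)
    {
        var table = new DataTable(tableName);
        PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (var prop in props)
        {
            // DataTable columns cannot be Nullable<T>; use the underlying type
            // and store DBNull.Value for missing values instead.
            Type columnType = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;
            table.Columns.Add(prop.Name, columnType);
        }

        foreach (var item in items)
        {
            var values = new object[props.Length];
            for (int i = 0; i < props.Length; i++)
            {
                values[i] = props[i].GetValue(item, null) ?? DBNull.Value;
            }
            table.Rows.Add(values);
        }

        return table;
    }
}
```

For millions of rows per import, a streaming IDataReader as the answer suggests may well be the better fit, since it avoids materializing the whole table; building and writing the DataTable in chunks is a middle ground.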
Thanks to Andy for the heads up - this was the code used in SQL, with a little help from the ever helpful, Pinal Dave - http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/ :)
DECLARE @bulkinsert NVARCHAR(2000)
DECLARE @filepath NVARCHAR(100)
SET @filepath = 'C:\Users\Admin\Desktop\FullImport.csv'
SET @bulkinsert =
    N'BULK INSERT FullImportExcel2s FROM ''' +
    @filepath +
    N''' WITH (FIRSTROW = 2, FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'')'
EXEC sp_executesql @bulkinsert
Still got a bit of work to do to work it into the code, but we're down to 25 seconds for 50000 rows instead of an hour, so a huge improvement!

What is the best way to fast insert SQL data and dependant rows?

I need to write some code to insert around 3 million rows of data.
At the same time I need to insert the same number of companion rows.
I.e. schema looks like this:
Item
- Id
- Title
Property
- Id
- FK_Item
- Value
My first attempt was something vaguely like this:
BaseDataContext db = new BaseDataContext();
foreach (var value in values)
{
    Item i = new Item() { Title = value["title"] };
    ItemProperty ip = new ItemProperty() { Item = i, Value = value["value"] };
    db.Items.InsertOnSubmit(i);
    db.ItemProperties.InsertOnSubmit(ip);
}
db.SubmitChanges();
Obviously this was terribly slow so I'm now using something like this:
BaseDataContext db = new BaseDataContext();
DataTable dt = new DataTable("Item");
dt.Columns.Add("Title", typeof(string));
foreach (var value in values)
{
    DataRow item = dt.NewRow();
    item["Title"] = value["title"];
    dt.Rows.Add(item);
}
using (System.Data.SqlClient.SqlBulkCopy sb = new System.Data.SqlClient.SqlBulkCopy(db.Connection.ConnectionString))
{
    sb.DestinationTableName = "dbo.Item";
    sb.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Title", "Title"));
    sb.WriteToServer(dt);
}
But this doesn't allow me to add the corresponding 'Property' rows.
I'm thinking the best solution might be to add a Stored Procedure like this one that generically lets me do a bulk insert (or at least multiple inserts, but I can probably disable logging in the stored procedure somehow for performance) and then returns the corresponding ids.
Can anyone think of a better (i.e. more succinct, near equal performance) solution?
To combine the previous best two answers and add in the missing piece for the IDs:
1) Use BCP to Load the data into a temporary "staging" table defined like this
CREATE TABLE stage(Title VARCHAR(??), value {whatever});
and you'll need the appropriate index for performance later:
CREATE INDEX ix_stage ON stage(Title);
2) Use SQL INSERT to load the Item table:
INSERT INTO Item(Title) SELECT Title FROM stage;
3) Finally load the Property table by joining stage with Item:
INSERT INTO Property(FK_Item, Value)
SELECT Item.Id, stage.value
FROM stage
JOIN Item ON Item.Title = stage.Title;
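One caveat with joining back on Title is that it only pairs rows one-to-one when titles are unique. The in-memory illustration below mimics the join from step 3 with LINQ to Objects to show how duplicates multiply rows; `TitleJoinCheck` is a hypothetical helper, and it assumes step 2 inserted exactly one Item per staged row. String comparison here is ordinal, which may differ from the database's collation.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TitleJoinCheck
{
    // Number of Property rows the step 3 join would produce, given the staged
    // titles and assuming one Item row was inserted per staged row.
    public static int JoinedRowCount(IEnumerable<string> stagedTitles)
    {
        var titles = stagedTitles.ToList();
        // Each staged row matches every Item row that shares its title.
        return titles.Join(titles, staged => staged, item => item, (staged, item) => 1).Count();
    }

    // True when the join on Title pairs each staged row with exactly one Item.
    public static bool TitlesAreUnique(IEnumerable<string> stagedTitles)
    {
        var titles = stagedTitles.ToList();
        return titles.Distinct().Count() == titles.Count;
    }
}
```

For three staged rows titled "A", "A" and "B", the join yields five rows rather than three, so a uniqueness check on the staged titles (or an extra key column in the staging table) is worth considering before relying on this approach.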
The best way to move that much data into SQL Server is bcp. Assuming that the data starts in some sort of file, you'll need to write a small script to funnel the data into the two tables. Alternately you could use bcp to funnel the data into a single table and then use an SP to INSERT the data into the two tables.
Bulk copy the data into a temporary table, and then call a stored proc that splits the data into the two tables you need to populate.
You can bulk copy in code as well, using the .NET SqlBulkCopy class.
