I am using SqlBulkCopy to read data from Excel into a SQL Server database. In the database I have two tables into which I need to insert this data from Excel: Table A, and Table B, which uses the ID (the IDENTITY primary key) from Table A to insert the corresponding records into Table B.
I am able to insert into one table (Table A) using the following code.
using (SqlConnection connection = new SqlConnection(strConnection)) {
    connection.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection)) {
        bulkCopy.DestinationTableName = "dbo.[EMPLOYEEINFO]";
        try {
            // Write from the source to the destination.
            SqlBulkCopyColumnMapping NameMap = new SqlBulkCopyColumnMapping(data.Columns[0].ColumnName, "EmployeeName");
            SqlBulkCopyColumnMapping GMap = new SqlBulkCopyColumnMapping(data.Columns[1].ColumnName, "Gender");
            SqlBulkCopyColumnMapping CMap = new SqlBulkCopyColumnMapping(data.Columns[2].ColumnName, "City");
            SqlBulkCopyColumnMapping AMap = new SqlBulkCopyColumnMapping(data.Columns[3].ColumnName, "HomeAddress");
            bulkCopy.ColumnMappings.Add(NameMap);
            bulkCopy.ColumnMappings.Add(GMap);
            bulkCopy.ColumnMappings.Add(CMap);
            bulkCopy.ColumnMappings.Add(AMap);
            bulkCopy.WriteToServer(data);
        }
        catch (Exception ex) {
            Console.WriteLine(ex.Message);
        }
    }
}
But I am not sure how to extend this to two tables that are bound by a foreign-key relationship, especially since Table B uses the identity value generated in Table A. Any example would be great. I googled it, and none of the threads on SO gave a working example.
AFAIK bulk copy can only be used to upload into a single table, so to achieve a bulk upload into two tables you will need two bulk uploads. Your problem comes from the foreign key being an identity. You can work around this, however. I am pretty sure that bulk copy uploads sequentially, which means that if you upload 1,000 records and the last record gets an ID of 10,197, then the first record got an ID of 9,198! So my recommendation would be to upload your first table, check the max ID after the upload, subtract one less than the number of records, and work from there.
Of course, in a heavily used database someone might insert after you, so you would need to get the top ID by selecting the record which matches your last one by its other details (assuming a combination of (up to) all fields is guaranteed to be unique). Only you know if this is likely to be a problem.
The alternative is not to use an identity column in the first place, but I presume you have no control over the design? In my younger days I made the mistake of using identities; I never do now. They always find a way of coming back to bite!
For example, to build the second table's data:
DataTable secondTable = new DataTable("SecondTable");
secondTable.Columns.Add("ForeignKey", typeof(int));
secondTable.Columns.Add("DataField", typeof(yourDataType));
...
Then add the data to secondTable (how depends on the format of your second data set), using a running counter as a placeholder foreign key:
int cnt = 0;
foreach (var d in mySecondData)
{
    DataRow newRow = secondTable.NewRow();
    newRow["ForeignKey"] = cnt;
    newRow["DataField"] = d.YourData;
    secondTable.Rows.Add(newRow);
    cnt++; // advance the placeholder key for the next row
}
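Before fixing up those placeholder keys you need the identity value assigned to the first row of the first bulk copy. One way to get it is to query the maximum ID straight after that upload and subtract the number of rows you just uploaded. This is only a sketch under this answer's assumption that the IDs were assigned as one contiguous block; dbo.TableA and Id are placeholder names, connection is the still-open connection and data is the DataTable from the first bulk copy:
// Sketch only: assumes the first bulk copy assigned a contiguous block of IDs.
// "dbo.TableA" and "Id" are placeholders for your real table and identity column.
int startID;
using (SqlCommand cmd = new SqlCommand("SELECT MAX(Id) FROM dbo.TableA", connection))
{
    int lastID = Convert.ToInt32(cmd.ExecuteScalar());
    startID = lastID - data.Rows.Count + 1; // ID assigned to the first uploaded row
}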
With that starting identity in hand (int startID), offset the placeholder foreign keys:
for (int i = 0; i < secondTable.Rows.Count; i++)
{
    secondTable.Rows[i]["ForeignKey"] = (int)secondTable.Rows[i]["ForeignKey"] + startID;
}
Finally:
bulkCopy.DestinationTableName = "YourSecondTable";
bulkCopy.WriteToServer(secondTable);
I'm currently using Mono on Ubuntu with MonoDevelop. I have a DataTable matching a table in the database and am attempting to use it to update that table.
The following code uses a DataSet loaded from an XML file, which was created with DataSet.WriteXml on another machine.
try
{
    if (ds.Tables.Contains(s))
    {
        ds.Tables[s].AcceptChanges();
        foreach (DataRow dr in ds.Tables[s].Rows)
            dr.SetModified(); // Set to Modified so that it updates, rather than inserts, into the database
        hc.Data.Database.Update(hc.Data.DataDictionary.GetTableInfo(s), ds.Tables[s]);
    }
}
catch (Exception ex)
{
    Log.WriteError(ex);
}
This is the code for inserting/updating into the database.
public override int SQLUpdate(DataTable dt, string tableName)
{
    MySqlDataAdapter da = new MySqlDataAdapter();
    try
    {
        int rowsChanged = 0;
        int tStart = Environment.TickCount;
        da.SelectCommand = new MySqlCommand("SELECT * FROM " + tableName);
        da.SelectCommand.Connection = connection;
        MySqlCommandBuilder cb = new MySqlCommandBuilder(da);
        da.UpdateCommand = cb.GetUpdateCommand();
        da.DeleteCommand = cb.GetDeleteCommand();
        da.InsertCommand = cb.GetInsertCommand();
        da.ContinueUpdateOnError = true;
        da.AcceptChangesDuringUpdate = true;
        rowsChanged = da.Update(dt);
        Log.WriteVerbose("Tbl={0},Rows={1},tics={2},", dt.TableName, rowsChanged, Misc.Elapsed(tStart));
        return rowsChanged;
    }
    catch (Exception ex)
    {
        Log.WriteError("{0}", ex.Message);
        return -1;
    }
}
I'm trying the above code, and rowsChanged becomes 4183, the number of rows I'm editing. However, when I use HeidiSQL to check the database itself, it doesn't change anything at all.
Is there a step I'm missing?
Edit: Alternatively, being able to overwrite all rows in the database would work as well. This is a setup for updating remote computers using USB sticks, forcing it to match a source data table.
Edit 2: Added more code sample to show the source of the DT. The DataTable is prefilled in the calling function, and all rows have DataRow.SetModified(); applied.
Edit 3: Additional information. The Table is being filled with data from an XML file. Attempting fix suggested in comments.
Edit 4: Adding calling code, just in case.
Thank you for your help.
The simplest way, which you may want to look into, might be to TRUNCATE the destination table and then simply save the XML import into it (with auto-increment off so it keeps the imported IDs if necessary). The only problem may be having the rights to do that; a sketch of that route follows. Otherwise, read on for a Merge-based approach.
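A minimal sketch of the TRUNCATE route, reusing the LoadXMLToDT helper shown further down. It assumes the destination table is simply called Destination and that the XML rows should replace whatever is currently there; the table name is a placeholder, not code from the question:
using (MySqlConnection dbCon = new MySqlConnection(MySQLOtherDB))
{
    dbCon.Open();
    // Empty the destination first (this is where the rights issue can bite).
    using (MySqlCommand cmd = new MySqlCommand("TRUNCATE TABLE Destination", dbCon))
        cmd.ExecuteNonQuery();
    // Reload the XML and mark every row as Added so the adapter issues INSERTs.
    DataTable dt = LoadXMLToDT(@"C:\Temp\dtsample.xml");
    dt.AcceptChanges();
    foreach (DataRow dr in dt.Rows)
        dr.SetAdded();
    MySqlDataAdapter da = new MySqlDataAdapter("SELECT * FROM Destination", dbCon);
    MySqlCommandBuilder cb = new MySqlCommandBuilder(da);
    da.InsertCommand = cb.GetInsertCommand();
    da.Update(dt); // inserts every row from the XML
}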
What you are trying to do can almost be handled using the Merge method. However, it can't/won't know about deleted rows. Since the method is acting on DataTables, if a row was deleted in the master database, it will simply not exist in the XML extract (versus a RowState of Deleted). These can be weeded out with a loop.
Likewise, any new rows may get a different PK for an AI int. To prevent that, just use a simple non-AI PK in the destination db so it can accept any number.
The XML loading:
private DataTable LoadXMLToDT(string filename)
{
DataTable dt = new DataTable();
dt.ReadXml(filename);
return dt;
}
The merge code:
DataTable dtMaster = LoadXMLToDT(@"C:\Temp\dtsample.xml");
// just a debug monitor
var changes = dtMaster.GetChanges();
string SQL = "SELECT * FROM Destination";
using (MySqlConnection dbCon = new MySqlConnection(MySQLOtherDB))
{
DataTable dtSample = new DataTable();
MySqlDataAdapter daSample = new MySqlDataAdapter(SQL, dbCon);
MySqlCommandBuilder cb = new MySqlCommandBuilder(daSample);
daSample.UpdateCommand = cb.GetUpdateCommand();
daSample.DeleteCommand = cb.GetDeleteCommand();
daSample.InsertCommand = cb.GetInsertCommand();
daSample.FillSchema(dtSample, SchemaType.Source);
dbCon.Open();
// the destination table
daSample.Fill(dtSample);
// handle deleted rows
var drExisting = dtMaster.AsEnumerable()
.Select(x => x.Field<int>("Id"));
var drMasterDeleted = dtSample.AsEnumerable()
.Where( q => !drExisting.Contains(q.Field<int>("Id")));
// delete based on missing ID
foreach (DataRow dr in drMasterDeleted)
dr.Delete();
// merge the XML into the tbl read
dtSample.Merge(dtMaster,false, MissingSchemaAction.Add);
int rowsChanged = daSample.Update(dtSample);
}
For whatever reason, rowsChanged always reports as many changes as there are total rows, but changes from the master/XML DataTable do flow through to the other/destination table.
The delete code gets a list of existing IDs, then determines which rows need to be deleted from the destination DataTable based on whether the new XML table still has a row with that ID. All the missing rows are deleted, then the tables are merged.
The key is dtSample.Merge(dtMaster,false, MissingSchemaAction.Add); which merges the data from dtMaster with dtSample. The false param is what allows the incoming XML changes to overwrite values in the other table (and eventually be saved to the db).
I have no idea whether issues such as non-matching auto-increment PKs are a big deal for you or not, but this seems to handle everything I could find. In reality, what you are trying to do is database synchronization; with one table and a modest number of rows, though, the above should work.
I want to insert around 1 million records into a database using LINQ in ASP.NET MVC, but when I try the following code it doesn't work: it throws an OutOfMemoryException, and the loop also took 3 days to run. Can anyone please help me with this?
db.Database.ExecuteSqlCommand("DELETE From [HotelServices]");
DataTable tblRepeatService = new DataTable();
tblRepeatService.Columns.Add("HotelCode",typeof(System.String));
tblRepeatService.Columns.Add("Service",typeof(System.String));
tblRepeatService.Columns.Add("Category",typeof(System.String));
foreach (DataRow row in xmltable.Rows)
{
string[] servicesarr = Regex.Split(row["PAmenities"].ToString(), ";");
for (int a = 0; a < servicesarr.Length; a++)
{
tblRepeatService.Rows.Add(row["HotelCode"].ToString(), servicesarr[a], "PA");
}
String[] servicesarrA = Regex.Split(row["RAmenities"].ToString(), ";");
for (int b = 0; b < servicesarrA.Length; b++)
{
tblRepeatService.Rows.Add(row["hotelcode"].ToString(), servicesarrA[b], "RA");
}
}
HotelAmenties _hotelamenties;
foreach (DataRow hadr in tblRepeatService.Rows)
{
_hotelamenties = new HotelAmenties();
_hotelamenties.Id = Guid.NewGuid();
_hotelamenties.ServiceName = hadr["Service"].ToString();
_hotelamenties.HotelCode = hadr["HotelCode"].ToString();
db.HotelAmenties.Add(_hotelamenties);
}
db.SaveChanges();
tblRepeatService table has around 1 million rows.
Bulk inserts like this are highly inefficient in LINQ to SQL and Entity Framework. Every insert creates at least three objects (the DataRow, the HotelAmenties entity and the change-tracking record for it), chewing up memory on objects you don't need.
Given that you already have a DataTable, you can use System.Data.SqlClient.SqlBulkCopy to push the content of the table to a temporary table on the SQL server, then use a single insert statement to load the data into its final destination. This is the fastest way I have found so far to move many thousands of records from memory to SQL.
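A sketch of that approach, assuming a staging table (here called dbo.HotelServicesStaging) with the same three columns already exists, and guessing the destination column names from the entity in your code; all of these names are illustrative, not taken from the question:
string connStr = db.Database.Connection.ConnectionString; // reuse the EF connection string
using (SqlConnection con = new SqlConnection(connStr))
{
    con.Open();
    // 1. Push the whole DataTable into the staging table in one shot.
    using (SqlBulkCopy bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "dbo.HotelServicesStaging";
        bulk.ColumnMappings.Add("HotelCode", "HotelCode");
        bulk.ColumnMappings.Add("Service", "Service");
        bulk.ColumnMappings.Add("Category", "Category");
        bulk.WriteToServer(tblRepeatService);
    }
    // 2. Move the rows into the final table with a single INSERT ... SELECT.
    string sql = @"INSERT INTO dbo.HotelServices (Id, HotelCode, ServiceName, Category)
                   SELECT NEWID(), HotelCode, Service, Category FROM dbo.HotelServicesStaging;
                   TRUNCATE TABLE dbo.HotelServicesStaging;";
    using (SqlCommand cmd = new SqlCommand(sql, con))
        cmd.ExecuteNonQuery();
}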
If performance doesn't matter and this is a one-shot job, you can stick with the approach you're using. Your problem is that you only save at the end, so Entity Framework has to track and generate the SQL for 1 million operations at once. Modify your code so that you save every 1,000 or so inserts instead of only at the end and it should work fine. (Since the context keeps tracking everything it has seen, recreating the DbContext after each batch should also help keep memory down.)
int i = 0;
foreach (DataRow hadr in tblRepeatService.Rows)
{
    _hotelamenties = new HotelAmenties();
    _hotelamenties.Id = Guid.NewGuid();
    _hotelamenties.ServiceName = hadr["Service"].ToString();
    _hotelamenties.HotelCode = hadr["HotelCode"].ToString();
    db.HotelAmenties.Add(_hotelamenties);
    i++;
    if (i % 1000 == 0)
    {
        db.SaveChanges(); // flush a batch of 1,000 inserts
    }
}
db.SaveChanges(); // save the final partial batch
I have come across a problem in using the DataAdapter, which I hope someone can help with. Basically I am creating a system, which is as follows:
Data is read in from a data source (MS Access, SQL Server or Excel), converted to data tables and inserted into a local SQL Server database using DataAdapters. This bit works fine. The SQL Server table has a PK, which is an identity field with auto-increment set to on.
Subsequent data loads read in the data from the source and compare it to what we already have. If the record is missing then it is added (this works fine). If the record is different then it needs to be updated (this doesn't work).
When doing the differential data load I create a data table which reads in the schema from the destination table (SQL server) and ensures it has the same columns etc.
The PK in the destination table is column 0, so when a record is inserted all of the values from column 1 onwards are set (as mentioned, this works perfectly). I don't change the row status for items I am adding. The PK in the data table is set correctly and I can confirm this.
When updating data I set column 0 (the PK column) to be the value of the record I am updating and set all of the columns to be the same as the source data.
For updated records I call AcceptChanges and SetModified on the row to ensure (I thought) that the application calls the correct method.
The DataAdapter is set with SelectCommand and UpdateCommand using the command builder.
When I run it, I have traced it using SQL Profiler and can see that the insert command is being run correctly, but the update command isn't being run at all, which is the crux of the problem. For reference, the data table will look something like the following:
PK    Value1  Value2  RowState
===   ======  ======  ========
124   Test1   Test2   Added
123   Test3   Test4   Updated
A couple of things to be aware of:
I have tested this by loading the row to be changed into the DataTable, changing some column fields and running Update, and this works. However, this is impractical for my solution because the data is HUGE (>1 GB), so I can't simply load it all into a DataTable without taking a huge performance hit. What I am doing instead is creating the data table with a maximum of 500 rows and then running the Update. Testing during the initial data load showed this to be the most efficient in terms of memory usage and performance. The data table is cleared after each batch is run.
Anyone any ideas on where I am going wrong here?
Thanks in advance
Andrew
==========Update==============
Following is the code to create the insert/update rows
private static void AddNewRecordToDataTable(DbDataReader pReader, ref DataTable pUpdateDataTable)
{
// create a new row in the table
DataRow pUpdateRow = pUpdateDataTable.NewRow();
// loop through each item in the data reader - setting all the columns apart from the PK
for (int addCount = 0; addCount < pReader.FieldCount; addCount++)
{
pUpdateRow[addCount + 1] = pReader[addCount];
}
// add the row to the update table
pUpdateDataTable.Rows.Add(pUpdateRow);
}
private static void AddUpdateRecordToDataTable(DbDataReader pReader, int pKeyValue,
ref DataTable pUpdateDataTable)
{
DataRow pUpdateRow = pUpdateDataTable.NewRow();
// set the first column (PK) to the value passed in
pUpdateRow[0] = pKeyValue;
// loop through each column apart from the PK column
for (int addCount = 0; addCount < pReader.FieldCount; addCount++)
{
pUpdateRow[addCount + 1] = pReader[addCount];
}
// add the row to the table and then update it
pUpdateDataTable.Rows.Add(pUpdateRow);
pUpdateRow.AcceptChanges();
pUpdateRow.SetModified();
}
The following code is used to actually do the update:
updateAdapter.Fill(UpdateTable);
updateAdapter.Update(UpdateTable);
UpdateTable.AcceptChanges();
The following is used to create the data table to ensure it has the same fields/data types as the source data
private static DataTable CreateDataTable(DbDataReader pReader)
{
DataTable schemaTable = pReader.GetSchemaTable();
DataTable resultTable = new DataTable(<tableName>); // edited out personal info
// loop for each row in the schema table
try
{
foreach (DataRow dataRow in schemaTable.Rows)
{
// create a new DataColumn object and set values depending
// on the current DataRows values
DataColumn dataColumn = new DataColumn();
dataColumn.ColumnName = dataRow["ColumnName"].ToString();
dataColumn.DataType = Type.GetType(dataRow["DataType"].ToString());
dataColumn.ReadOnly = (bool)dataRow["IsReadOnly"];
dataColumn.AutoIncrement = (bool)dataRow["IsAutoIncrement"];
dataColumn.Unique = (bool)dataRow["IsUnique"];
resultTable.Columns.Add(dataColumn);
}
}
catch (Exception ex)
{
message = "Unable to create data table " + ex.Message;
throw new Exception(message, ex);
}
return resultTable;
}
In case anyone is interested, I did manage to get around the problem, though I never got the original DataAdapter approach to work. Basically what I did was as follows (a rough sketch follows the list):
Create a list of objects, each with an index and a list of field values as members.
Read in the rows that have changed and store the values from the source data (i.e. the values that will overwrite the current ones in the object). In addition, build a comma-separated list of the indexes.
When finished, use the comma-separated list in a SQL IN clause to return those rows and load them into the DataAdapter.
For each one, run a LINQ query against the index and copy in the new values, updating the data set. This sets the row status to Modified.
Then run the Update and the rows are updated correctly.
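A rough sketch of that flow; every name here (ChangedRecord, DestinationTable, Id, connection and so on) is hypothetical and only illustrates the shape of the workaround, not the actual code:
// Hypothetical sketch of the batched-update workaround described above.
// Requires System.Data, System.Data.SqlClient, System.Linq and System.Collections.Generic.
class ChangedRecord
{
    public int Index;        // the PK of the destination row
    public object[] Values;  // the new column values from the source
}

List<ChangedRecord> changes = new List<ChangedRecord>();
// ... fill `changes` while comparing source rows to the destination ...

string idList = string.Join(",", changes.Select(c => c.Index));

// Load only the affected rows into the adapter's table.
DataTable destTable = new DataTable();
SqlDataAdapter adapter = new SqlDataAdapter(
    "SELECT * FROM DestinationTable WHERE Id IN (" + idList + ")", connection);
SqlCommandBuilder builder = new SqlCommandBuilder(adapter); // supplies the UPDATE command
adapter.Fill(destTable);

// Copy the new values onto each row; this marks the row as Modified.
foreach (DataRow row in destTable.Rows)
{
    ChangedRecord change = changes.Single(c => c.Index == (int)row[0]);
    for (int col = 1; col < destTable.Columns.Count; col++)
        row[col] = change.Values[col - 1];
}

adapter.Update(destTable); // issues UPDATEs for the modified rows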
This isn't the quickest or neatest solution, but it does work and allows me to run the changes in batches.
Thanks
Andrew
I'm performing an operation on a DataSet containing data from a SQL table named Test_1, then getting the updated records using DataSet.GetChanges(DataRowState.Modified). I then try to save the DataSet containing the updated records to a different table than the source one (the table is named Test and has the same structure as Test_1) using the following statement:
sqlDataAdapter.Update(changesDataSet,"Test");
I'm getting the following error: Update unable to find TableMapping['Test'] or DataTable 'Test'.
I'm new to ADO.NET and don't even know whether this is possible. Any advice is welcome.
Just to provide a bit of context: ETL jobs import data into a temp table with the same structure as the original but with a _jobid suffix. Right after that, a rule engine performs validation before updating the original table.
Why don't you create an update trigger on the table instead, which inserts into the other table?
If Test and Test_1 are completely identical, this works:
DataSet1 ds = new DataSet1();
var da = new DataSet1TableAdapters.DepositTableAdapter();
var da1 = new DataSet1TableAdapters.Deposit_1TableAdapter();
da.Fill(ds.Deposit);
foreach (DataSet1.DepositRow row in ds.Deposit.Rows)
{
if (row.ID == 3)
{
row.Amount++;
}
foreach (var c in row.ItemArray)
{
Console.Write(c);
}
Console.WriteLine("");
}
Console.WriteLine(ds.Deposit.GetChanges(System.Data.DataRowState.Modified).Rows.Count); // number of modified rows
var updateTable = new DataSet1.Deposit_1DataTable();
foreach (DataSet1.DepositRow row in ds.Deposit.GetChanges(System.Data.DataRowState.Modified).Rows)
{
updateTable.ImportRow((System.Data.DataRow)row);
}
da1.Update(updateTable);
Is Test_1 always empty after the rule engine has worked with it?
Do Test_1 and Test contain exactly the same rows?
I hope I can provide further help if you answer my questions.
I need to write some code to insert around 3 million rows of data.
At the same time I need to insert the same number of companion rows.
I.e. schema looks like this:
Item
- Id
- Title
Property
- Id
- FK_Item
- Value
My first attempt was something vaguely like this:
BaseDataContext db = new BaseDataContext();
foreach (var value in values)
{
Item i = new Item() { Title = value["title"]};
ItemProperty ip = new ItemProperty() { Item = i, Value = value["value"]};
db.Items.InsertOnSubmit(i);
db.ItemProperties.InsertOnSubmit(ip);
}
db.SubmitChanges();
Obviously this was terribly slow so I'm now using something like this:
BaseDataContext db = new BaseDataContext();
DataTable dt = new DataTable("Item");
dt.Columns.Add("Title", typeof(string));
foreach (var value in values)
{
DataRow item = dt.NewRow();
item["Title"] = value["title"];
dt.Rows.Add(item);
}
using (System.Data.SqlClient.SqlBulkCopy sb = new System.Data.SqlClient.SqlBulkCopy(db.Connection.ConnectionString))
{
sb.DestinationTableName = "dbo.Item";
sb.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Title", "Title"));
sb.WriteToServer(dt);
}
But this doesn't allow me to add the corresponding 'Property' rows.
I'm thinking the best solution might be to add a Stored Procedure like this one that generically lets me do a bulk insert (or at least multiple inserts, but I can probably disable logging in the stored procedure somehow for performance) and then returns the corresponding ids.
Can anyone think of a better (i.e. more succinct, near equal performance) solution?
To combine the previous best two answers and add in the missing piece for the IDs:
1) Use BCP to Load the data into a temporary "staging" table defined like this
CREATE TABLE stage(Title VARCHAR(??), Value {whatever});
and you'll need the appropriate index for performance later:
CREATE INDEX ix_stage ON stage(Title);
2) Use SQL INSERT to load the Item table:
INSERT INTO Item(Title) SELECT Title FROM stage;
3) Finally load the Property table by joining stage with Item:
INSERT INTO Property(FK_Item, Value)
SELECT Item.Id, stage.Value
FROM stage
JOIN Item ON Item.Title = stage.Title;
The best way to move that much data into SQL Server is bcp. Assuming that the data starts in some sort of file, you'll need to write a small script to funnel the data into the two tables. Alternately you could use bcp to funnel the data into a single table and then use an SP to INSERT the data into the two tables.
Bulk copy the data into a temporary table, and then call a stored proc that splits the data into the two tables you need to populate.
You can bulk copy in code as well, using the .NET SqlBulkCopy class.
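For example, a minimal sketch of that pattern in C#, where the staging table dbo.ItemStage and the procedure dbo.SplitStagedItems are assumptions for illustration (the procedure would contain the INSERT ... SELECT logic from the answer above), and dt is assumed to carry both Title and Value columns:
using (SqlConnection con = new SqlConnection(db.Connection.ConnectionString))
{
    con.Open();
    // 1. Bulk copy the flat data into the staging table.
    using (SqlBulkCopy bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "dbo.ItemStage";
        bulk.ColumnMappings.Add("Title", "Title");
        bulk.ColumnMappings.Add("Value", "Value");
        bulk.WriteToServer(dt);
    }
    // 2. Let a stored procedure split the staged rows into Item and Property.
    using (SqlCommand cmd = new SqlCommand("dbo.SplitStagedItems", con))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.CommandTimeout = 0; // 3 million rows can take a while
        cmd.ExecuteNonQuery();
    }
}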