SqlBulkCopy string NULL CSV - c#

I'm trying to mass-insert a CSV file into a SQL Server database.
The process is: .CSV file -> DataTable -> SqlBulkCopy -> SQL Server.
In the file I have several NULL values that this code loads as text, but they should not be text:
var linea = line.Split(delimiter);
row = dt.NewRow();
for (j = 0; j < linea.Length; j++)
{
    if (linea[j].ToString().ToLower() == "null")
    {
        row[j] = DBNull.Value;
    }
}
dt.Rows.Add(row);

It's pretty vague what your problem is, but I think you want the data to be stored as a database NULL instead of the string "NULL" in the DB.
From what I know about SqlBulkCopy with a DataTable, you can't manipulate the data to store NULL in the database table. To deal with this, I used two workarounds:
Remove the "NULL" text from the CSV before uploading. If the CSV comes from your users, advise them not to use "NULL" for a cell that is supposed to be blank/empty; just leave the cell blank. This is the most direct way, because the DataTable will somehow store NULL in the database table when the cell is blank.
A rather sluggish way: replace every "NULL" value in the DataTable with a blank ("") and store that in the database table. Once SqlBulkCopy has inserted all the data, run an UPDATE statement that sets all blank columns to NULL.
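For comparison, the replacement can also be done while the DataTable is being filled, which is what the code in the question starts to do. The sketch below is only an illustration: LoadLines, the column names and the sample lines are made up, and it assumes every column is a string column. A cell holding DBNull.Value is how ADO.NET represents a database null; whether the destination column really ends up NULL also depends on the column allowing nulls and on its default (see SqlBulkCopyOptions.KeepNulls), so treat this as something to test rather than a guaranteed fix.

```csharp
using System;
using System.Data;

// Hypothetical helper: builds a DataTable from delimited lines and stores
// DBNull.Value wherever a field is the literal text "NULL" (any casing).
static DataTable LoadLines(string[] lines, char delimiter, string[] columnNames)
{
    var dt = new DataTable();
    foreach (var name in columnNames)
    {
        dt.Columns.Add(name, typeof(string));
    }

    foreach (var line in lines)
    {
        var fields = line.Split(delimiter);
        var row = dt.NewRow();
        for (int j = 0; j < fields.Length && j < dt.Columns.Count; j++)
        {
            bool isNullToken = string.Equals(fields[j].Trim(), "NULL", StringComparison.OrdinalIgnoreCase);
            row[j] = isNullToken ? (object)DBNull.Value : fields[j];
        }
        dt.Rows.Add(row);
    }
    return dt;
}

// Made-up sample data with ';' as the delimiter.
var table = LoadLines(new[] { "1;NULL;Madrid", "2;Ana;null" }, ';', new[] { "Id", "Name", "City" });
Console.WriteLine(table.Rows[0].IsNull("Name"));   // True
Console.WriteLine(table.Rows[1]["Name"]);          // Ana
```

Every other value is copied through unchanged, so only the "NULL" tokens are affected.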

Related

DataTable update() inserts duplicate new rows without checking if it exists

I'm trying to use the update() method, but it is inserting my datatable data into my database without checking if the row exists, so it is inserting duplicate data. It is also not deleting rows that don't exist in datatable. How to resolve this? I want to synchronize my datatable with server table.
private void Form1_Load(object sender, EventArgs e)
{
    // TODO: This line of code loads data into the 'MyDatabaseDataSet11.Vendor_GUI_Test_Data' table. You can move, or remove it, as needed.
    this.vendor_GUI_Test_DataTableAdapter.Fill(this.MyDatabaseDataSet11.Vendor_GUI_Test_Data);
    // read target table on SQL Server and store in a tabledata var
    this.ServerDataTable = this.MyDatabaseDataSet11.Vendor_GUI_Test_Data;
}
Insertion
private void convertGUIToTableFormat()
{
    ServerDataTable.Rows.Clear();
    // loop through GUIDataTable rows
    for (int i = 0; i < GUIDataTable.Rows.Count; i++)
    {
        String guiKEY = (String)GUIDataTable.Rows[i][0] + "," + (String)GUIDataTable.Rows[i][8] + "," + (String)GUIDataTable.Rows[i][9];
        //Console.WriteLine("guiKey: " + guiKEY);
        // loop through every DOW value, make a new row for every true
        for (int d = 1; d < 8; d++)
        {
            if ((bool)GUIDataTable.Rows[i][d] == true)
            {
                DataRow toInsert = ServerDataTable.NewRow();
                toInsert[0] = GUIDataTable.Rows[i][0];
                toInsert[1] = d + "";
                toInsert[2] = GUIDataTable.Rows[i][8];
                toInsert[3] = GUIDataTable.Rows[i][9];
                ServerDataTable.Rows.InsertAt(toInsert, 0);
                //printDataRow(toInsert);
                //Console.WriteLine("---------------");
            }
        }
    }
}
Trying to update
// I got this adapter from datagridview, casting my datatable to their format
CSharpFirstGUIWinForms.MyDatabaseDataSet1.Vendor_GUI_Test_DataDataTable DT = (CSharpFirstGUIWinForms.MyDatabaseDataSet1.Vendor_GUI_Test_DataDataTable)ServerDataTable;
DT.PrimaryKey = new DataColumn[] { DT.Columns["Vendor"], DT.Columns["DOW"], DT.Columns["LeadTime"], DT.Columns["DemandPeriod"] };
this.vendor_GUI_Test_DataTableAdapter.Update(DT);
Let's look at what happens in the code posted.
First this line:
this.ServerDataTable = this.MyDatabaseDataSet11.Vendor_GUI_Test_Data;
This is not a copy, but just an assignment between two variables. The assigned one (ServerDataTable) receives the 'reference' to the memory area where the data coming from the database has been stored. So these two variables 'point' to the same memory area. Whatever you do with one affects what the other sees.
Now look at this line:
ServerDataTable.Rows.Clear();
Uh! Why? You are clearing the memory area where the data loaded from the database were. Now the Datatable is empty and no records (DataRow) are present there.
Let's look at what happens inside the loop:
DataRow toInsert = ServerDataTable.NewRow();
A new DataRow has been created. Every DataRow has a property called RowState; when you create a new row this property has the default value DataRowState.Detached, but when you add the row to the DataRow collection with
ServerDataTable.Rows.InsertAt(toInsert, 0);
then the DataRow.RowState property becomes DataRowState.Added.
At this point the missing information is how a TableAdapter behaves when you call Update. The adapter needs to build the appropriate INSERT/UPDATE/DELETE SQL command to update the database. And what information does it use to choose the proper command? It looks at the RowState property, and it sees that all your rows are in the Added state. So it chooses the INSERT command for your table and, barring any duplicate key violation, you end up with duplicate records in your table.
What should you do to resolve the problem? Well, the first thing is to remove the line that clears the loaded data from memory. Then, instead of always calling InsertAt, you should first check whether the row is already in memory. You can do this with the DataTable.Select method. This method takes a string that works like a WHERE clause, and you should filter on the primary key value(s) of your table:
var rows = ServerDataTable.Select("PrimaryKeyFieldName = " + valueToSearchFor);
If you get a row count greater than zero, you can take the first row returned and update its existing values with your changes; if no row matches the condition, you can use InsertAt as you are doing now.
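The check-then-update idea can be sketched on a plain in-memory DataTable, without any database. The table, the column names and the Upsert helper are invented for the illustration; AcceptChanges is called only to imitate rows that have just been loaded by Fill.

```csharp
using System;
using System.Data;

var table = new DataTable("Vendors");
table.Columns.Add("Vendor", typeof(string));
table.Columns.Add("LeadTime", typeof(int));

// Imitate a row loaded from the database: after AcceptChanges it is Unchanged.
table.Rows.Add("ACME", 5);
table.AcceptChanges();

// Hypothetical helper: update the row if the key exists, otherwise add a new one.
static void Upsert(DataTable t, string vendor, int leadTime)
{
    // Double any single quote so the value is safe inside the filter expression.
    DataRow[] rows = t.Select("Vendor = '" + vendor.Replace("'", "''") + "'");
    if (rows.Length > 0)
    {
        rows[0]["LeadTime"] = leadTime;   // existing row: RowState becomes Modified
    }
    else
    {
        t.Rows.Add(vendor, leadTime);     // new row: RowState is Added
    }
}

Upsert(table, "ACME", 7);
Upsert(table, "Globex", 3);

foreach (DataRow row in table.Rows)
{
    Console.WriteLine(row["Vendor"] + ": " + row.RowState);
}
// ACME: Modified
// Globex: Added
```

With these row states an adapter has what it needs to pick an UPDATE for the first row and an INSERT for the second.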
You're trying too hard, I think, and you're unfortunately getting nearly everything wrong
// read target table on SQL Server and store in a tabledata var
this.ServerDataTable = this.MyDatabaseDataSet11.Vendor_GUI_Test_Data;
No, this line of code doesn't do anything at all with the database, it just assigns an existing datatable to a property called ServerDataTable.
for (int i = 0; i < GUIDataTable.Rows.Count; i++)
It isn't clear if GUIDataTable is strongly or weakly typed, but if it's strong (i.e. it lives in your dataset, or is of a type that is part of your dataset) you will do yourself massive favors if you do not access its Rows collection at all. The way to access a strongly typed datatable is as if it were an array:
myStronglyTypedTable[2] //yes, third row
myStronglyTypedTable.Rows[2] //no, do not do this- you end up with a base type DataRow that is massively harder to work with
Then we have..
DataRow toInsert = ServerDataTable.NewRow();
Again, don't do this.. you're working with strongly typed datatables. This makes your life easy:
var r = MyDatabaseDataSet11.Vendor_GUI_Test_Data.NewVendor_GUI_Test_DataRow();
Because now you can refer to everything by name and type, not numerical index and object:
r.Total = r.Quantity * r.Price; //yes
toInsert["Ttoal"] = (int)toInsert["Quantity"] * (double)toInsert["Price"]; //no. Messy, hard work, "stringly" typed, casting galore, no intellisense.. The typo was deliberate btw
You can also easily add data to a typed datatable like:
MyPersonDatatable.AddPersonRow("John", "smith", 29, "New York");
Next up..
// I got this adapter from datagridview, casting my datatable to their format
CSharpFirstGUIWinForms.MyDatabaseDataSet1.Vendor_GUI_Test_DataDataTable DT = (CSharpFirstGUIWinForms.MyDatabaseDataSet1.Vendor_GUI_Test_DataDataTable)ServerDataTable;
DT.PrimaryKey = new DataColumn[] { DT.Columns["Vendor"], DT.Columns["DOW"], DT.Columns["LeadTime"], DT.Columns["DemandPeriod"] };
this.vendor_GUI_Test_DataTableAdapter.Update(DT);
You need to straighten out the concepts and terminology in your mind here. That is not an adapter, it didn't come from a datagridview (grid views never provide adapters), and your datatable variable was always in their format. If you typed it as DataTable ServerDataTable, that just makes it massively harder to work with, in the same way as saying object o = new Person(): now you have to cast o every time you want to do nearly anything Person-specific with it. You could declare every variable in every program as type object, but you don't. Hence don't do the equivalent by putting your strongly typed datatables inside DataTable-typed variables, because you're just hiding away the very things that make them useful and easy to work with.
If you download rows from a database into a datatable, and you want to...
... delete them from the db, then call Delete on them in the datatable
... update them in the db, then set new values on the existing rows in the datatable
... insert more rows into the db alongside the existing rows, then add more rows to the datatable
Datatables track what you do to their rows. If you clear a datatable it doesn't mark every row as deleted, it just jettisons the rows. No db side rows will be affected. If you delete rows then they gain a rowstate of deleted and a delete query will fire when you call adapter.Update
Modify rows to cause an update to fire. Add new rows for insert
As Steve noted, you jettisoned all the rows, added (probably uselessly) a primary key (the strongly typed table will likely have had this key already), which neither associates the new rows with the old ones nor causes them to be updated, then inserted a load of new rows and wrote them to the db. This process was never going to update or delete anything.
The way this is supposed to work is, you download rows, you see them in the grid, you add some, you change some, you delete some, you hit the save button. Behind the scenes the grid just poked some new rows into the datatable, marked some as deleted, changed others. It didn't go to the huge (and unfortunately incorrect) lengths your code went to. If you want your code to behave the same you follow the same idea:
var pta = new PersonTableAdapter();
var pdt = pta.GetData(); //query that returns all rows
pta.Fill(somedataset.Person); //or can do this
pdt = somedataset.Person; //alias of Person table
var p = pdt.FindByPersonId(123); //PersonId is the primary key in the datatable
p.Delete(); //mark person 123 as deleted
p = pdt.First(r => r.Name == "Joe"); //LINQ just works on strongly typed datatables, out of the box, no messing
p.Name = "John"; //modify joes name to John
pdt.AddPersonRow("Jane", 22);
pta.Update(pdt); //saves changes(delete 123, rename joe, add Jane) to db
What you need to appreciate is that all these commands are just finding or creating DataRow objects that live inside a table. The table tracks what you do, and the adapter uses the appropriate SQL to send the changes to the db. If you wanted to mark all rows in a datatable as deleted, you can visit each of them and call Delete() on it, then update the datatable to save the changes to the db.
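The tracking described above can be observed on an untyped DataTable without touching a database. This is a minimal sketch: the Person table and its columns are invented, and AcceptChanges stands in for the state the rows would have right after an adapter's Fill.

```csharp
using System;
using System.Data;

var table = new DataTable("Person");
DataColumn idColumn = table.Columns.Add("PersonId", typeof(int));
table.Columns.Add("Name", typeof(string));
table.PrimaryKey = new[] { idColumn };

// Imitate three rows downloaded from the database.
table.Rows.Add(1, "Joe");
table.Rows.Add(2, "Ann");
table.Rows.Add(3, "Bob");
table.AcceptChanges();                       // all rows are now Unchanged

table.Rows.Find(1)["Name"] = "John";         // Modified: an adapter would send UPDATE
table.Rows.Find(2).Delete();                 // Deleted:  an adapter would send DELETE
table.Rows.Add(4, "Jane");                   // Added:    an adapter would send INSERT

DataTable changes = table.GetChanges();      // copy holding only the changed rows
Console.WriteLine(changes.Rows.Count);       // 3

// Clear() throws the rows away instead of marking them as deleted,
// so afterwards there is nothing left for an adapter to act on.
table.Clear();
Console.WriteLine(table.Rows.Count);             // 0
Console.WriteLine(table.GetChanges() == null);   // True
```

Row 3 is never touched, stays Unchanged, and therefore does not appear in the GetChanges result.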

InvalidOperationException while trying bulkInsert the datatable into SQL Server

I got data from text file into datatable and now when I am trying to insert that data into a SQL Server 2008 database, I get the following error:
InvalidOperationException: String or binary data would be truncated
I cannot find the source of the error, i.e. which record is causing it.
The code is as below
for (int i = 0; i < dt.Columns.Count; i++)
{
    if (i == 159)
    {
    }
    bulkCopy.ColumnMappings.Add(dt.Columns[i].ColumnName, DestTable.Columns[i].ColumnName);
}
bulkCopy.BulkCopyTimeout = 600;
bulkCopy.DestinationTableName = "dbo.TxtFileInfo";
bulkCopy.WriteToServer(dt);
I have the datatable in the dt variable. And matched columns for both datatable created from text file and also the empty table created in database to add the values to it.
I have copied all records from text file into datatable using below code.
while (reader.Read())
{
    int count1 = reader.FieldCount;
    for (int i = 0; i < count1; i++)
    {
        string value = reader[i].ToString();
        list.Add(value);
    }
    dt.Rows.Add(list.ToArray());
    list.Clear();
}
I have got proper records from the text file. Also the number of columns are equal. My database table TextToTable has all columns of datatype nvarchar(50) and I am fetching each record as string from text file. But during bulk insert the error shown is
Cannot convert string to nvarchar(50)
It seems you are trying to insert values that are longer than the destination column allows (for example, the data is 20 characters long but the DB column accepts only 10).
Check the data coming from the text file: add an if condition on the length of each value in your code and set a breakpoint there to troubleshoot.
If that is the cause, increase the length of the column in the DB:
ALTER TABLE tablename
ALTER COLUMN columnname varchar(xxx)
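The length check can be done over the whole DataTable before calling WriteToServer, so that the offending record is reported instead of hunted for with a breakpoint. The helper name FindOversizedCells and the sample data are made up; the limit of 50 comes from the nvarchar(50) columns mentioned in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: lists every cell whose text is longer than maxLength.
static List<string> FindOversizedCells(DataTable dt, int maxLength)
{
    var problems = new List<string>();
    for (int r = 0; r < dt.Rows.Count; r++)
    {
        foreach (DataColumn col in dt.Columns)
        {
            // DBNull converts to an empty string, so nulls are never reported.
            string text = dt.Rows[r][col].ToString() ?? "";
            if (text.Length > maxLength)
            {
                problems.Add("row " + r + ", column " + col.ColumnName + ", length " + text.Length);
            }
        }
    }
    return problems;
}

// Made-up sample data: the second row has a value that is too long.
var dt = new DataTable();
dt.Columns.Add("Code", typeof(string));
dt.Columns.Add("Description", typeof(string));
dt.Rows.Add("A1", "short text");
dt.Rows.Add("B2", new string('x', 60));

foreach (var problem in FindOversizedCells(dt, 50))
{
    Console.WriteLine(problem);   // row 1, column Description, length 60
}
```

If the columns have different sizes in the destination table, the same loop could take a per-column limit instead of a single number.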

Optimizing query that uses AsEnumerable and SingleOrDefault

Not long ago there was a feature request in the program I am maintaining. Basically it has to fill up a table in the database with info from a text file. These files can be pretty big, but it was fairly easy to do because these files were defined as the complete list of user data. Therefore the table could be truncated and then just filled up again with data from the text file.
But then a week ago it was decided that these files are actually updates of current user info, so now I have to retrieve the correct MeteringPointId (which only exist once if it does exist) and then update info on it. If it doesn't exist, just insert data as before.
The way I do this is retrieving the complete database table with data from the database into memory and then just updating on that info before finally saving the changes by calling the datatables update function. It works fine, except that finding the row with the MeteringPointId is slow:
DataRow row = MeteringPointsDataTable.NewRow();
// this is called for each line in the text file to find the corresponding MeteringPointId. It can be 300.000 times.
row = MeteringPointsDataTable.AsEnumerable().SingleOrDefault(r => r.Field<string>("MeteringPointId").ToString() == MeteringPointId);
Is there a way to retrieve a DataRow from a DataTable that is faster than this?
If you are sure that only one item can fulfill the condition, use FirstOrDefault instead of SingleOrDefault. SingleOrDefault has to keep scanning the rest of the table to verify that the match is unique, whereas FirstOrDefault stops at the first matching entry.
You can use Select method of DataTable.
var expression = "[MeteringPointId] = '" + MeteringPointId + "'";
DataRow[] result = MeteringPointsDataTable.Select(expression);
Also you can create an expression like,
var idList = new []{"id1", "id2", "id3", ...};
var expression = "[MeteringPointId] in " + string.Format("({0})", string.Join(",", idList.Select(i=> "'"+i+"'")));
Similar usage is here
Hope it helps..
You could put the whole table in a dictionary:
//At the start
var meteringPoints = MeteringPointsDataTable.AsEnumerable().ToDictionary(r => r.Field<string>("MeteringPointId").ToString());
//For each row of the text file:
DataRow row;
if (!meteringPoints.TryGetValue(MeteringPointId, out row))
{
    row = MeteringPointsDataTable.NewRow();
    meteringPoints[MeteringPointId] = row;
}
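A fuller version of the dictionary approach might look like the sketch below. It is an illustration under assumptions: the Owner column, the sample ids and the Upsert helper are invented, and the new row is explicitly added to the table (the snippet above only creates it), since a row returned by NewRow is not part of the table until Rows.Add is called.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

var points = new DataTable("MeteringPoints");
points.Columns.Add("MeteringPointId", typeof(string));
points.Columns.Add("Owner", typeof(string));
points.Rows.Add("MP-1", "Alice");
points.Rows.Add("MP-2", "Bob");
points.AcceptChanges();   // imitate rows freshly loaded from the database

// Build the index once; each later lookup is a hash lookup instead of a table scan.
var index = new Dictionary<string, DataRow>();
foreach (DataRow r in points.Rows)
{
    index[(string)r["MeteringPointId"]] = r;
}

// Hypothetical helper, called once per line of the text file.
static void Upsert(DataTable table, Dictionary<string, DataRow> index, string id, string owner)
{
    if (!index.TryGetValue(id, out DataRow row))
    {
        row = table.NewRow();
        row["MeteringPointId"] = id;
        table.Rows.Add(row);      // attach the new row so it is tracked as Added
        index[id] = row;
    }
    row["Owner"] = owner;
}

Upsert(points, index, "MP-2", "Carol");   // existing id: row becomes Modified
Upsert(points, index, "MP-3", "Dave");    // unknown id: row is Added

Console.WriteLine(points.Rows.Count);           // 3
Console.WriteLine(index["MP-2"].RowState);      // Modified
Console.WriteLine(index["MP-3"].RowState);      // Added
```

The dictionary has to be kept in sync with the table, which is why the helper registers every row it creates.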

DataAdapter not updating source

I have come across a problem in using the DataAdapter, which I hope someone can help with. Basically I am creating a system, which is as follows:
Data is read in from a data source (MS-Access, SQL Server or Excel), converted to data tables and inserted into a local SQL Server database, using DataAdapters. This bit works fine. The SQL server table has a PK, which is an identity field with auto increment set to on.
Subsequent data loads read in the data from the source and compare it to what we already have. If the record is missing then it is added (this works fine). If the record is different then it needs to be updated (this doesn't work).
When doing the differential data load I create a data table which reads in the schema from the destination table (SQL server) and ensures it has the same columns etc.
The PK in the destination table is column 0, so when a record is inserted all of the values from column 1 onwards are set (as mentioned this works perfectly.). I don't change the row status for items I am adding. The PK in the data table is set correctly and I can confirm this.
When updating data I set column 0 (the PK column) to be the value of the record I am updating and set all of the columns to be the same as the source data.
For updated records I call AcceptChanges and SetModified on the row to ensure (I thought) that the application calls the correct method.
The DataAdapter is set with SelectCommand and UpdateCommand using the command builder.
When I run it, I have traced it using SQL Profiler and can see that the insert command is being run correctly, but the update command isn't being run at all, which is the crux of the problem. For reference, an insert table will look something like the following:
PK   Value1  Value 2  Row State
==   ======  =======  =========
124  Test1   Test 2   Added
123  Test3   Test4    Updated
Couple of things to be aware of....
I have tested this by loading the row to be changed into the datatable, changing some column fields and running update, and this works. However, this is impractical for my solution because the data is HUGE (>1 GB), so I can't simply load it into a datatable without taking a huge performance hit. What I am doing is creating the data table with a max of 500 rows and then running the Update. Testing during the initial data load showed this to be the most efficient in terms of memory usage and performance. The data table is cleared after each batch is run.
Anyone any ideas on where I am going wrong here?
Thanks in advance
Andrew
==========Update==============
Following is the code to create the insert/update rows
private static void AddNewRecordToDataTable(DbDataReader pReader, ref DataTable pUpdateDataTable)
{
// create a new row in the table
DataRow pUpdateRow = pUpdateDataTable.NewRow();
// loop through each item in the data reader - setting all the columns apart from the PK
for (int addCount = 0; addCount < pReader.FieldCount; addCount++)
{
pUpdateRow[addCount + 1] = pReader[addCount];
}
// add the row to the update table
pUpdateDataTable.Rows.Add(pUpdateRow);
}
private static void AddUpdateRecordToDataTable(DbDataReader pReader, int pKeyValue,
ref DataTable pUpdateDataTable)
{
DataRow pUpdateRow = pUpdateDataTable.NewRow();
// set the first column (PK) to the value passed in
pUpdateRow[0] = pKeyValue;
// loop for each row apart from the PK row
for (int addCount = 0; addCount < pReader.FieldCount; addCount++)
{
pUpdateRow[addCount + 1] = pReader[addCount];
}
// add the row to the table and then update it
pUpdateDataTable.Rows.Add(pUpdateRow);
pUpdateRow.AcceptChanges();
pUpdateRow.SetModified();
}
The following code is used to actually do the update:
updateAdapter.Fill(UpdateTable);
updateAdapter.Update(UpdateTable);
UpdateTable.AcceptChanges();
The following is used to create the data table to ensure it has the same fields/data types as the source data
private static DataTable CreateDataTable(DbDataReader pReader)
{
DataTable schemaTable = pReader.GetSchemaTable();
DataTable resultTable = new DataTable(<tableName>); // edited out personal info
// loop for each row in the schema table
try
{
foreach (DataRow dataRow in schemaTable.Rows)
{
// create a new DataColumn object and set values depending
// on the current DataRows values
DataColumn dataColumn = new DataColumn();
dataColumn.ColumnName = dataRow["ColumnName"].ToString();
dataColumn.DataType = Type.GetType(dataRow["DataType"].ToString());
dataColumn.ReadOnly = (bool)dataRow["IsReadOnly"];
dataColumn.AutoIncrement = (bool)dataRow["IsAutoIncrement"];
dataColumn.Unique = (bool)dataRow["IsUnique"];
resultTable.Columns.Add(dataColumn);
}
}
catch (Exception ex)
{
message = "Unable to create data table " + ex.Message;
throw new Exception(message, ex);
}
return resultTable;
}
In case anyone is interested I did manage to get around the problem, but never managed to get the data adapter to work. Basically what I did was as follows:
Create a list of objects with an index and a list of field values as members
Read in the rows that have changed and store the values from the source data (i.e. the values that will overwrite the current ones in the object). In addition I create a comma separated list of the indexes
When I am finished I use the comma separated list in a sql IN statement to return the rows and load them into my data adapter
For each one I run a LINQ query against the index and extract the new values, updating the data set. This sets the row status to modified
I then run the update and the rows are updated correctly.
This isn't the quickest or neatest solution, but it does work and allows me to run the changes in batches.
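Roughly, the steps above could look like the following sketch. It is not the original code: the table name dbo.Target, the column names and the sample values are invented, a Dictionary replaces the list of objects and the LINQ query, and a hand-built DataTable stands in for the rows an adapter would load with the IN query.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// New values keyed by primary key (the "index" in the description above).
var pending = new Dictionary<int, string>
{
    [123] = "Test3-changed",
    [125] = "Test5-changed",
};

// Comma separated key list for the IN clause. The keys are ints, so
// concatenating them cannot inject SQL. Table and column names are made up.
string sql = "SELECT PK, Value1 FROM dbo.Target WHERE PK IN (" + string.Join(",", pending.Keys) + ")";
Console.WriteLine(sql);

// Stand-in for the rows an adapter would load with the query above.
var batch = new DataTable();
batch.Columns.Add("PK", typeof(int));
batch.Columns.Add("Value1", typeof(string));
batch.Rows.Add(123, "Test3");
batch.Rows.Add(125, "Test5");
batch.AcceptChanges();   // loaded rows start out Unchanged

// Overwrite the loaded values; this alone moves each row to Modified.
foreach (DataRow row in batch.Rows)
{
    row["Value1"] = pending[(int)row["PK"]];
}

Console.WriteLine(batch.Rows[0].RowState);   // Modified
```

Because the rows were loaded and then edited, no manual AcceptChanges/SetModified juggling is needed before the update.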
Thanks
Andrew

How to delete all rows from datatable

I want to delete all rows from datatable with rowstate property value Deleted.
DataTable dt;
dt.Clear(); // this will not set rowstate property to delete.
Currently I am iterating through all rows and deleting each row.
Is there any efficient way?
I don't want to delete in SQL Server I want to use DataTable method.
We are using this way:
for (int i = table.Rows.Count - 1; i >= 0; i--)
{
    DataRow row = table.Rows[i];
    if (row.RowState == DataRowState.Deleted) { table.Rows.RemoveAt(i); }
}
This will satisfy any FK cascade relationships, like 'delete' (that DataTable.Clear() will not):
DataTable dt = ...;
// Remove all
while (dt.Rows.Count > 0)
{
    dt.Rows[0].Delete();
}
dt.Rows.Clear();
dt.Columns.Clear(); //warning: All Columns delete
dt.Dispose();
I typically execute the following SQL command:
DELETE FROM TABLE WHERE ID>0
Since you're using an SQL Server database, I would advocate simply executing the SQL command "DELETE FROM " + dt.TableName.
I would drop the table, fastest way to delete everything. Then recreate the table.
You could create a stored procedure on the SQL Server db that deletes all the rows in the table, execute it from your C# code, then requery the datatable.
Here is the solution that I settled on in my own code after searching for this question, taking inspiration from Jorge's answer.
DataTable RemoveRowsTable = ...;
int i = 0;
//Remove All
while (i < RemoveRowsTable.Rows.Count)
{
    DataRow currentRow = RemoveRowsTable.Rows[i];
    if (currentRow.RowState != DataRowState.Deleted)
    {
        currentRow.Delete();
    }
    else
    {
        i++;
    }
}
This way, you ensure all rows either get deleted, or have their DataRowState set to Deleted.
Also, you won't get the InvalidOperationException caused by modifying a collection while enumerating it, because foreach isn't used. And the infinite loop bug that Jorge's solution is vulnerable to isn't a problem here, because the code steps past any DataRow whose RowState has already been set to Deleted.
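To see the two cases side by side, here is the same loop wrapped in a helper and run against a small in-memory table. The helper name DeleteAllRows and the sample table are invented; AcceptChanges is used only to make the first two rows look like rows loaded from a database.

```csharp
using System;
using System.Data;

// The loop from above, wrapped in a helper (hypothetical name).
static void DeleteAllRows(DataTable table)
{
    int i = 0;
    while (i < table.Rows.Count)
    {
        DataRow row = table.Rows[i];
        if (row.RowState != DataRowState.Deleted)
        {
            row.Delete();   // Added rows leave the collection; others become Deleted
        }
        else
        {
            i++;            // already marked as Deleted, step past it
        }
    }
}

var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Rows.Add(1);
table.Rows.Add(2);
table.AcceptChanges();      // rows 1 and 2 now look like loaded rows (Unchanged)
table.Rows.Add(3);          // row 3 is new (Added)

DeleteAllRows(table);

Console.WriteLine(table.Rows.Count);          // 2
Console.WriteLine(table.Rows[0].RowState);    // Deleted
Console.WriteLine(table.Rows[1].RowState);    // Deleted
```

The two loaded rows stay in the collection with RowState Deleted, while the row that was only ever in the Added state is removed outright, which is why the loop cannot simply wait for Rows.Count to reach zero.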
