Issue Understanding SQL Server Merge for Bulk Insert

I generate a DataTable in C# and first remove all duplicate rows from it.
I then pass the DataTable to a stored procedure for the bulk insert, but I randomly get this error:
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row.
What confuses me is that I remove all duplicate rows from the DataTable before sending it to the stored procedure.
Code:
public static DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
    Hashtable hTable = new Hashtable();
    ArrayList duplicateList = new ArrayList();
    // Add each unique value to the hashtable (which stores key/value pairs)
    // and collect the rows holding duplicate values in the ArrayList.
    foreach (DataRow drow in dTable.Rows)
    {
        if (hTable.Contains(drow[colName]))
            duplicateList.Add(drow);
        else
            hTable.Add(drow[colName], string.Empty);
    }
    // Remove the duplicate rows from the DataTable.
    foreach (DataRow dRow in duplicateList)
        dTable.Rows.Remove(dRow);
    // The DataTable now contains only unique records.
    return dTable;
}
And this is the procedure:
ALTER PROCEDURE [dbo].[Update_DataFeed_Discoverable]
    @tblCustomers DateFeed_Discoverable READONLY
AS
BEGIN
    SET NOCOUNT ON;
    MERGE INTO DataFeed c1
    USING @tblCustomers c2
    ON c1.CheckSumProductName = checksum(c2.aw_deep_link)
    WHEN MATCHED THEN
        UPDATE SET
            c1.merchant_name = c2.merchant_name
            ,c1.aw_deep_link = c2.aw_deep_link
            ,c1.brand_name = c2.brand_name
            ,c1.product_name = c2.product_name
            ,c1.merchant_image_url = c2.merchant_image_url
            ,c1.Price = c2.Price
    WHEN NOT MATCHED THEN
        INSERT VALUES(c2.merchant_name, c2.aw_deep_link,
            c2.brand_name, c2.product_name, c2.merchant_image_url,
            c2.Price, 1, checksum(c2.aw_deep_link)
        );
END
So how does it end up with duplicate rows when the duplicates were removed from the DataTable before it was passed to the procedure?
Any help is appreciated.
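One possible cause, offered as a hypothesis rather than a diagnosis: the C# de-duplication compares values with exact, case-sensitive equality, while the MERGE matches on CHECKSUM(aw_deep_link). Under a case-insensitive collation, strings that differ only in case produce the same checksum, so two source rows that look distinct in C# can match the same target row. The sketch below de-duplicates case-insensitively and ignores trailing spaces. The class and method names are hypothetical, and whether this comparison mirrors the server's depends on the database collation, which is an assumption here.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class MergeKeyDeduplicator
{
    // Keeps the first row for each key, comparing keys case-insensitively
    // and ignoring trailing spaces (an assumption about the collation).
    public static DataTable RemoveDuplicatesIgnoreCase(DataTable table, string columnName)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var duplicates = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            string key = Convert.ToString(row[columnName]).TrimEnd(' ');
            if (!seen.Add(key))
            {
                // The key was already seen, so this row is a duplicate.
                duplicates.Add(row);
            }
        }

        // Remove after the loop; removing while enumerating would throw.
        foreach (DataRow row in duplicates)
        {
            table.Rows.Remove(row);
        }
        return table;
    }
}
```

This does not address genuine CHECKSUM collisions between different strings; matching on the column itself, or on a unique key, would avoid those.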

Related

Taking distinct rows from datatable fails

I am trying to take distinct rows from my table based on a column value (the column named Id).
I want to take all rows with a distinct Id from my master DataTable. My code is like this:
DataTable distinctDt = dt.DefaultView.ToTable(true, "Id", "ProductName",
"ProductDescription","ProductCategory_Id", "Fair_Id", "Price","ImagePath",
"FairName", "FairLogo", "StartDate", "EndDate", "picId");
But this still returns duplicate rows. What am I doing wrong here?

Your approach does not work in this scenario because ToTable(true, ...) treats two rows as duplicates only when every listed column matches, whereas you need all the columns for each unique Id. Here is a method that does this:
public static DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
    Hashtable hTable = new Hashtable();
    ArrayList duplicateList = new ArrayList();
    // Add each unique value to the hashtable (which stores key/value pairs)
    // and collect the rows holding duplicate values in the ArrayList.
    foreach (DataRow drow in dTable.Rows)
    {
        if (hTable.Contains(drow[colName]))
            duplicateList.Add(drow);
        else
            hTable.Add(drow[colName], string.Empty);
    }
    // Remove the duplicate rows from the DataTable.
    foreach (DataRow dRow in duplicateList)
        dTable.Rows.Remove(dRow);
    // The DataTable now contains only unique records.
    return dTable;
}
Simply pass your master DataTable and the column name to this method and it will give you what you want.
The code is tested and works.
Hope it helps!

Update: Result is not transferred to the original dataset. Sorting rows in a DataSet using C#

Update: even though I get the required result, when the second function accesses the data table the values are still the same.
It is a sequential program with two functions in different classes: first a sort function, then a replace function. The first should sort the values and the second should be able to retrieve the sorted table, but when it retrieves the DataTable it gets the unsorted table.
I have used AcceptChanges(), but it gives the same result.
The program sorts the table by the required field and stores the result in the sortedTable variable. I am trying to copy this back to the original, i.e. sourceTables, but it does not work: it adds more rows instead of updating the existing ones. I have tried copying the whole table, which does not work, and adding rows does not give the required result either. I have tried different methods without success.
List<DataTable> sourceTables = context.GetDataByTable(sourceTable.StringValue);
List<DataTable> targetTables = context.GetDataByTable(targetTable.StringValue, sourceTables.Count);
string orderDesc = orderField.StringValue + " DESC";
for (int i = 0; i < sourceTables.Count; i++)
{
    DataView dv = sourceTables[i].DefaultView;
    if (orderDirection.StringValue == OrderDirectionAsc)
    {
        // Sort in ascending order
        dv.Sort = orderField.StringValue;
    }
    else
    {
        // Sort in descending order
        dv.Sort = orderDesc;
    }
    DataTable sortedTable = dv.ToTable();
    DataTable dttableNew = sortedTable.Clone();
    //sourceTables[i] = sortedTable.Copy();
    //targetTables[i] = dv.ToTable();
    //targetTables[i] = sortedTable.Copy();
    //foreach (DataRow dr in sortedTable.Rows)
    //{
    //    targetTables[i].Rows.Add(dr.ItemArray);
    //}
    for (int j = 0; j < sourceTables[i].Rows.Count; j++)
    {
        if (sourceTable.GetValue().ToString() == targetTable.GetValue().ToString())
        {
            foreach (DataRow dr in sortedTable.Rows)
            {
                targetTables[i].Rows.Add(dr.ItemArray);
            }
        }
        else
        {
            foreach (DataRow dr in sortedTable.Rows)
            {
                targetTables[i].Rows.Add(dr.ItemArray);
            }
            // targetTables[i] = sortedTable.Copy(); does not work
            //foreach (DataRow drtableOld in sortedTable.Rows)
            //{
            //    targetTables[i].ImportRow(drtableOld);
            //}
        }
    }
}
Instead of replacing the existing values, it adds more rows.
Any help would be appreciated.

If anyone has a problem with duplicated data, or with changes that stay local and do not affect the original DataTable: modify the original table instance in place. Call Clear() to remove the old rows from the original table, then use ImportRow(dr) to copy each sorted row into it. In my case, adding rows with Tables[i].Rows.Add(dr.ItemArray) changed only the local table and not the original one. Operations performed directly on the original table affect its rows; operations performed on a clone or copy do not affect the original table.
Here is the complete code:
DataTable sortTable = dv.ToTable();
if (sTable.GetValue().ToString() == tTable.GetValue().ToString())
{
    sTables[i].Clear();
    foreach (DataRow dr in sortTable.Rows)
    {
        sTables[i].ImportRow(dr);
    }
    sTables[i].AcceptChanges();
}
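The clear-then-import approach above can be wrapped in a small helper. This is a minimal sketch, not the poster's actual code: the class and method names are hypothetical, and it assumes the table has no enforced child relations that would block Clear().

```csharp
using System.Data;

public static class DataTableSorter
{
    // Sorts the rows of 'table' in place, so every reference to the same
    // DataTable instance sees the new order.
    public static void SortInPlace(DataTable table, string sortExpression)
    {
        // Materialize the sorted rows in a separate table first, because
        // the original rows are discarded by Clear() below.
        DataView view = new DataView(table) { Sort = sortExpression };
        DataTable sorted = view.ToTable();

        table.Clear();
        foreach (DataRow row in sorted.Rows)
        {
            table.ImportRow(row);
        }
        table.AcceptChanges();
    }
}
```

A call such as DataTableSorter.SortInPlace(sourceTables[i], "Price DESC") would then reorder the existing table rather than create a new one, which is why other code holding the same reference sees the sorted rows.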

Using GetSchemaTable() to retrieve only column names

Is it possible to use GetSchemaTable() to retrieve only column names?
I have been trying to retrieve only the column names using this method:
DataTable table = myReader.GetSchemaTable();
foreach (DataRow myField in table.Rows)
{
    foreach (DataColumn myProperty in table.Columns)
    {
        fileconnectiongrid.Rows.Add(myProperty.ColumnName + " = "
            + myField[myProperty].ToString());
    }
}
This code retrieves a lot of unwanted table data. I only need a list containing the column names.

You need to use ExecuteReader(CommandBehavior.SchemaOnly):
DataTable schema = null;
using (var con = new SqlConnection(connection))
{
    using (var schemaCommand = new SqlCommand("SELECT * FROM table", con))
    {
        con.Open();
        using (var reader = schemaCommand.ExecuteReader(CommandBehavior.SchemaOnly))
        {
            schema = reader.GetSchemaTable();
        }
    }
}
SchemaOnly:
The query returns column information only. When using SchemaOnly, the
.NET Framework Data Provider for SQL Server precedes the statement
being executed with SET FMTONLY ON.
The column name is in the first column of every row. I don't think it is possible to omit the other column information such as ColumnOrdinal, ColumnSize, NumericPrecision and so on, since in this case you cannot use reader.GetString but only reader.GetSchemaTable.
But your loop is incorrect if you only want the column names:
foreach (DataRow col in schema.Rows)
{
    Console.WriteLine("ColumnName={0}", col.Field<String>("ColumnName"));
}
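If only the names are needed, a different technique from GetSchemaTable() may be simpler: any IDataReader exposes FieldCount and GetName(int). The following is a minimal sketch using those two members; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Data;

public static class ReaderColumns
{
    // Returns the result-set column names in ordinal order,
    // without building a schema table.
    public static List<string> GetColumnNames(IDataReader reader)
    {
        var names = new List<string>(reader.FieldCount);
        for (int i = 0; i < reader.FieldCount; i++)
        {
            names.Add(reader.GetName(i));
        }
        return names;
    }
}
```

This works with any provider's reader; combined with CommandBehavior.SchemaOnly, the names should be available without fetching rows.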

Change your code to the version below if all you want is to display the column names. Your original code displayed every schema property of every column, not just the names.
DataTable table = myReader.GetSchemaTable();
foreach (DataRow myField in table.Rows)
{
    // Each row of the schema table describes one column of the result set.
    fileconnectiongrid.Rows.Add(myField["ColumnName"].ToString());
}

This will give you all column names of a filled DataTable (here called dataTable); you can place them in a string[] and do with them what you like.
foreach (DataColumn column in dataTable.Columns)
{
    Console.WriteLine(column.ColumnName);
}

// Retrieve the column schema into a DataTable.
schemaTable = reader.GetSchemaTable();
int index = schemaTable.Columns.IndexOf("ColumnName");
DataColumn columnName = schemaTable.Columns[index];
// For each field in the table...
foreach (DataRow myField in schemaTable.Rows)
{
    String columnNameValue = myField[columnName].ToString();
    Console.WriteLine("ColumnName " + columnNameValue);
}

I use the same technique to add a MAX-STRING-LENGTH constraint to a custom TextBox in my VB.Net program.
I use a SQL SELECT command to get the values of 4 columns:
SELECT code_pays
,nom
,code_pays_short
,default_devise
FROM pays
ORDER BY nom
I use the result returned by an IDataReader object to fill a DataGridView.
Finally, I display each field of the selected row in a Panel that contains 4 TextBox controls.
To prevent the SQL UPDATE command that saves changes made in a TextBox from failing because a column value is too long, I added a property to a custom TextBox that tells the user directly when the value exceeds the column size.
Here is my form:
Here is the VB.Net code used to initialize the MaxStringLength properties:
Private Sub PushColumnConstraints(dr As IDataReader)
    Dim tb As DataTable = dr.GetSchemaTable()
    Dim nColIndex As Integer = -1
    For Each col As DataColumn In tb.Columns
        If col.ColumnName = "ColumnSize" Then
            nColIndex = col.Ordinal
            Exit For
        End If
    Next
    If nColIndex < 0 Then
        oT.ThrowException("[ColumnSize] columns's index not found !")
        Exit Sub
    End If
    txtCodePays.MaxStringLength = tb.Rows(0).Item(nColIndex)
    txtPays.MaxStringLength = tb.Rows(1).Item(nColIndex)
    txtShortCodePays.MaxStringLength = tb.Rows(2).Item(nColIndex)
    txtDefaultDevise.MaxStringLength = tb.Rows(3).Item(nColIndex)
End Sub
The For loop finds the index of the ColumnSize column in the schema table.
The MaxStringLength property is assigned using the following syntax:
tb.Rows(%TEXT-BOX-INDEX%).Item(nColIndex)
.Rows(%TEXT-BOX-INDEX%) selects the schema row that describes one column of the SQL SELECT.
.Item(nColIndex) gets one specific metadata value for that column.
Item(n) can return a String or an Integer, but VB.Net performs an implicit conversion when necessary.
This line of code can also be written more briefly:
tb.Rows(%TEXT-BOX-INDEX%)(nColIndex)
tb(%TEXT-BOX-INDEX%)(nColIndex)
but that is less readable.
Caution: MaxStringLength is a custom property. It is not part of the standard TextBox.
In the screenshot above, you can see that the program tells the user that the value is too long for the Code Pays (3 lettres) TextBox.
The error message is displayed in the StatusBar at the bottom of the form.
This information is displayed before the user clicks the SAVE button, which generates the SQL UPDATE command.
The code that calls the PushColumnConstraints method is the following:
Public Sub FillPanel()
SQL =
<sql-select>
SELECT code_pays
,nom
,code_pays_short
,default_devise
FROM pays
ORDER BY nom
</sql-select>
Dim cmd As New NpgsqlCommand(SQL, cn)
Dim dr As NpgsqlDataReader
Try
dr = cmd.ExecuteReader()
Catch ex As Exception
ThrowException(ex)
End Try
Call PushColumnConstraints(dr)

Enumerate over DataTable, filter items, then revert to DataTable

I'd like to filter the rows of my DataTable by whether a column value is contained in a string array. To do that I convert the table to an IEnumerable<DataRow>; afterwards I'd like to convert the result back to a DataTable, since that's what my method has to return.
Here's my code so far:
string[] ids = /*Gets string array of IDs here*/
DataTable dt = /*Databasecall returning a DataTable here*/
IEnumerable<DataRow> ie = dt.AsEnumerable();
ie = ie.Where<DataRow>(row => ids.Contains(row["id"].ToString()));
/*At this point I've filtered out the entries I don't want, now how do I convert this back to a DataTable? The following does NOT work.*/
ie.CopyToDataTable(dt, System.Data.LoadOption.PreserveChanges);
return dt;

I would create an empty clone of the data table:
DataTable newTable = dt.Clone();
Then import the rows from the old table that match the filter:
foreach (DataRow row in ie)
{
    newTable.ImportRow(row);
}
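Put together, the clone-and-import approach might look like the sketch below. The class, method, and parameter names are hypothetical, and a HashSet is used for the ID lookup purely as a convenience; it assumes the source table contains no rows marked as deleted.

```csharp
using System.Collections.Generic;
using System.Data;

public static class DataTableFilter
{
    // Returns a new DataTable with the same schema as 'source', containing
    // only the rows whose idColumn value (as a string) is in 'ids'.
    public static DataTable KeepRowsWithIds(DataTable source, IEnumerable<string> ids, string idColumn)
    {
        var wanted = new HashSet<string>(ids);
        DataTable result = source.Clone(); // schema only, no rows

        foreach (DataRow row in source.Rows)
        {
            if (wanted.Contains(row[idColumn].ToString()))
            {
                // ImportRow copies the row; the source table is untouched.
                result.ImportRow(row);
            }
        }
        return result;
    }
}
```

Because the source table is left unmodified, the caller can return the new table and still use the original afterwards.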

Assuming that you want to filter the rows in place, that is, the filtered rows should be returned in the same DataTable that was created by the original database query, you should first copy the values of the filtered rows to an array, then clear the DataTable.Rows collection and add the saved values back sequentially:
// Snapshot the values before clearing; rows removed by Clear() no longer hold data.
object[][] kept = ie.Where(row => ids.Contains(row["id"].ToString()))
    .Select(row => row.ItemArray)
    .ToArray();
dt.Rows.Clear();
foreach (object[] values in kept)
{
    dt.Rows.Add(values);
}
An alternative way to achieve this is to iterate through the rows of the DataTable once and delete the ones that should be filtered out:
foreach (DataRow row in dt.Rows)
{
    if (!ids.Contains(row["id"].ToString()))
    {
        // Delete() only marks an unchanged row; AcceptChanges() below removes it.
        row.Delete();
    }
}
dt.AcceptChanges();
Note that if the DataTable is part of a DataSet that is being used to update the database, all modifications made to the DataTable.Rows collection will be reflected in the corresponding database table during an update.

Remove duplicate column values from a datatable without using LINQ

Consider my datatable,
Id Name MobNo
1 ac 9566643707
2 bc 9944556612
3 cc 9566643707
How can I remove row 3, which contains a duplicate MobNo column value, in C# without using LINQ? I have seen similar questions on SO, but all the answers use LINQ.

The following method did what I wanted:
public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
    Hashtable hTable = new Hashtable();
    ArrayList duplicateList = new ArrayList();
    // Add each unique value to the hashtable (which stores key/value pairs)
    // and collect the rows holding duplicate values in the ArrayList.
    foreach (DataRow drow in dTable.Rows)
    {
        if (hTable.Contains(drow[colName]))
            duplicateList.Add(drow);
        else
            hTable.Add(drow[colName], string.Empty);
    }
    // Remove the duplicate rows from the DataTable.
    foreach (DataRow dRow in duplicateList)
        dTable.Rows.Remove(dRow);
    // The DataTable now contains only unique records.
    return dTable;
}

As you are reading your CSV file ( a bit of pseudo code, but you get the picture ):
// readYourFile() and parseLine() are placeholders for your own file-reading code.
HashSet<string> uniqueMobiles = new HashSet<string>();
string[] fileLines = readYourFile();
foreach (string line in fileLines)
{
    DataRow row = parseLine(line);
    // Add() returns false when the number has already been seen.
    if (!uniqueMobiles.Add(row["MobNo"].ToString()))
    {
        continue;
    }
    yourDataTable.Rows.Add(row);
}
This will only load the records with unique mobiles into your data table.

This is the simplest way:
var uniqueContacts = dt.AsEnumerable()
    .GroupBy(x => x.Field<string>("Email"))
    .Select(g => g.First());
I found it in this thread:
LINQ to remove duplicate rows from a datatable based on the value of a specific row
What actually worked for me was returning the result as a DataTable:
DataTable uniqueContacts = dt.AsEnumerable()
    .GroupBy(x => x.Field<string>("Email"))
    .Select(g => g.First()).CopyToDataTable();

You can also do this in SQL. Be sure to back up before running it on your production DB. The following statement (or something very similar) keeps the row with the lowest Id for each MobNo and deletes the rest:
DELETE FROM YourTable WHERE Id NOT IN (SELECT MIN(Id) FROM YourTable GROUP BY MobNo);

You can use "IEqualityComparer" in C#
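As a rough illustration of that suggestion, the sketch below defines an IEqualityComparer<DataRow> that treats two rows as equal when one column holds the same value, and uses it with a HashSet, so no LINQ is involved. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Data;

// Treats two rows as equal when they hold the same value in one column.
public sealed class ColumnValueComparer : IEqualityComparer<DataRow>
{
    private readonly string _columnName;

    public ColumnValueComparer(string columnName)
    {
        _columnName = columnName;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return object.Equals(x[_columnName], y[_columnName]);
    }

    public int GetHashCode(DataRow row)
    {
        return row[_columnName].GetHashCode();
    }
}

public static class DuplicateRemover
{
    // Removes every row whose column value already appeared in an earlier row.
    public static void RemoveDuplicates(DataTable table, string columnName)
    {
        var seen = new HashSet<DataRow>(new ColumnValueComparer(columnName));
        var duplicates = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            if (!seen.Add(row))
            {
                duplicates.Add(row);
            }
        }

        // Remove after enumerating to avoid modifying the collection mid-loop.
        foreach (DataRow row in duplicates)
        {
            table.Rows.Remove(row);
        }
    }
}
```

For the table in the question, DuplicateRemover.RemoveDuplicates(table, "MobNo") would keep rows 1 and 2 and remove row 3.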
