Remove duplicate column values from a datatable without using LINQ - c#

Consider my datatable,
Id Name MobNo
1 ac 9566643707
2 bc 9944556612
3 cc 9566643707
How to remove the row 3 which contains duplicate MobNo column value in c# without using LINQ. I have seen similar questions on SO but all the answers uses LINQ.

The following method did what i want....
public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
//Add list of all the unique item value to hashtable, which stores combination of key, value pair.
//And add duplicate item value in arraylist.
foreach (DataRow drow in dTable.Rows)
{
if (hTable.Contains(drow[colName]))
duplicateList.Add(drow);
else
hTable.Add(drow[colName], string.Empty);
}
//Removing a list of duplicate items from datatable.
foreach (DataRow dRow in duplicateList)
dTable.Rows.Remove(dRow);
//Datatable which contains unique records will be return as output.
return dTable;
}

As you are reading your CSV file ( a bit of pseudo code, but you get the picture ):
List<String> uniqueMobiles = new List<String>();
String[] fileLines = readYourFile();
for (String line in fileLines) {
DataRow row = parseLine(line);
if (uniqueMobiles.Contains(row["MobNum"])
{
continue;
}
uniqueMobiles.Add(row["MobNum"]);
yourDataTable.Rows.Add(row);
}
This will only load the records with unique mobiles into your data table.

This is the simplest way .
**
var uniqueContacts = dt.AsEnumerable()
.GroupBy(x=>x.Field<string>("Email"))
.Select(g=>g.First());
**
I found it in this thread
LINQ to remove duplicate rows from a datatable based on the value of a specific row
what actually was for me that I return it as datatable
DataTable uniqueContacts = dt.AsEnumerable()
.GroupBy(x=>x.Field<string>("Email"))
.Select(g=>g.First()).CopyToDataTable();

You might want to look up the inner workings on DISTINCT before running this on your sharp DB (be sure to back up!), but if it works as I think it does (grabbing the first value) you should be able to use (something very similar to) the following SQL:
DELETE FROM YourTable WHERE Id NOT IN (SELECT DISTINCT Id, MobNo FROM YourTable);

You can use "IEqualityComparer" in C#

Related

Issue Understanding SQL Server Merge for Bulk Insert

I generate a DataTable and I first remove all rows from that DataTable in C#.
Then I pass on the DataTable to a stored procedure for the bulk insert but I am getting error randomly stating:
The Merge statement updated /delete same row more than once. This happens when a target row matches more than one source row.
But what is confusing me is I remove all duplicate rows from the DataTable before sending it to the stored procedure.
CODE:
public static DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
//Add list of all the unique item value to hashtable, which stores combination of key, value pair.
//And add duplicate item value in arraylist.
foreach (DataRow drow in dTable.Rows)
{
if (hTable.Contains(drow[colName]))
duplicateList.Add(drow);
else
hTable.Add(drow[colName], string.Empty);
}
//Removing a list of duplicate items from datatable.
foreach (DataRow dRow in duplicateList)
dTable.Rows.Remove(dRow);
//Datatable which contains unique records will be return as output.
return dTable;
}
And this is the procedure:
ALTER PROCEDURE [dbo].[Update_DataFeed_Discoverable]
#tblCustomers DateFeed_Discoverable READONLY
AS
BEGIN
SET NOCOUNT ON;
MERGE INTO DataFeed c1
USING #tblCustomers c2
ON c1.CheckSumProductName=checksum(c2.aw_deep_link)
WHEN MATCHED THEN
UPDATE SET
c1.merchant_name = c2.merchant_name
,c1.aw_deep_link = c2.aw_deep_link
,c1.brand_name = c2.brand_name
,c1.product_name = c2.product_name
,c1.merchant_image_url = c2.merchant_image_url
,c1.Price = c2.Price
WHEN NOT MATCHED THEN
INSERT VALUES(c2.merchant_name, c2.aw_deep_link,
c2.brand_name, c2.product_name, c2.merchant_image_url,
c2.Price,1,checksum(c2.aw_deep_link)
);
END
So does it end up with duplicate rows when they are not Inserted and removed before using the DataTable.
Any help is appreciated.

Taking distinct rows from datatable fails

I am trying to take distinct rows from my table based on a column value(Column Name Id)
my master datatable is like this
And I want to take all rows with distinct Id, My code is like this
DataTable distinctDt = dt.DefaultView.ToTable(true, "Id", "ProductName",
"ProductDescription","ProductCategory_Id", "Fair_Id", "Price","ImagePath",
"FairName", "FairLogo", "StartDate", "EndDate", "picId");
But this still returns duplicate rows. What am i doing wrong here?
Your used approach is not useful in this scenario because you need all the columns based on the unique columns ID's, here is the method of doing this:
public static DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
Hashtable hTable = new Hashtable();
ArrayList duplicateList = new ArrayList();
//Add list of all the unique item value to hashtable, which stores combination of key, value pair.
//And add duplicate item value in arraylist.
foreach (DataRow drow in dTable.Rows)
{
if (hTable.Contains(drow[colName]))
duplicateList.Add(drow);
else
hTable.Add(drow[colName], string.Empty);
}
//Removing a list of duplicate items from datatable.
foreach (DataRow dRow in duplicateList)
dTable.Rows.Remove(dRow);
//Datatable which contains unique records will be return as output.
return dTable;
}
Simply pass your MasterDatatable and column name to this method and it will give you what you want.
It is tested code and works perfectly. See the working.
Hope it helps!

Cannot sort DDL using multiple fields in asp.net

I am trying to sort a dynamically created DDL. This DDL has a combination of fields being Location1, Location2, Location3.
The code used to create the DDL is as follows:
ddlLocation.Items.Clear();
dtLocation = dataSource.GetFilteredRiskInfo("location");
ddlLocation.Items.Insert(0, new ListItem("All", "-1"));
foreach (DataRow row in dtLocation.Rows)
{
if (this.ddlLocation.Items.FindByText(row["Location1"].ToString()) == null)
this.ddlLocation.Items.Add(new ListItem(row["Location1"].ToString()));
if (this.ddlLocation.Items.FindByText(row["Location2"].ToString()) == null)
this.ddlLocation.Items.Add(new ListItem(row["Location2"].ToString()));
if (this.ddlLocation.Items.FindByText(row["Location3"].ToString()) == null)
this.ddlLocation.Items.Add(new ListItem(row["Location3"].ToString()));
}
The following code works for location1 (I think).
[DisplayColumn("location","Location1")]
public partial class Location
{
}
I have been researching how to add multiple fields ie: location2 & location3, it does not work. The code itself shows an error for Location1 & Location2 ("The name 'Location1' does not exist in the current context")
[DisplayColumn("location","SortColumn")]
public partial class Location
{
public string SortColumn
{
get { return Location1 + Location2; }
}
}
All help will be much apprecaited.
The list of locations need to be as follows. Say the locations have the following data.
Location1:
Sydney, Wollongong Perth
Location2:
Adelaide, Northern Territory
Location3:
Brisbane, Canberra, Hobart
I need the combined list to look as follows:
Adelaide
Brisbane
Canberra
Hobart
Northern Territory
Perth
Sydney
Wollongong
My code now looks as follows:
ddlLocation.Items.Clear();
// new table to combine the 3 columns of dtLocation into a one column datatable
DataTable sortedDt = new DataTable();
dtLocation = dataSource.GetFilteredRiskInfo("location");
sortedDt.Columns.Add("Location");
// combining columns
foreach (DataRow row in dtLocation.Rows)
{
sortedDt.Rows.Add(row["Location1"]);
sortedDt.Rows.Add(row["Location2"]);
sortedDt.Rows.Add(row["Location3"]);
}
// now sort these now that they're all in the same column
sortedDt.DefaultView.Sort = "Location";
sortedDt = sortedDt.DefaultView.ToTable(); // should be the new sorted table
// now your original code, but modified to populate the ddl with the new sorted data
ddlLocation.Items.Insert(0, new ListItem("All", "-1"));
foreach (DataRow row in sortedDt.Rows)
{
this.ddlLocation.Items.Add(new ListItem(row["Location"].ToString()));
}
Now the list is in alphabetical order but has duplicate entries. I need to be able to remove the duplicates/make the entries distinct.
Not only should you presort it in GetFilteredRiskInfo as FrankO suggests, but it sounds like you have to combine the 3 location columns into one. If you can do that in GetFilteredRiskInfo it might be cleaner, if not you can try combining the columns and sorting in code with something like this:
// new table to combine the 3 columns of dtLocation into a one column datatable
DataTable sortedDt = new DataTable();
sortedDt.Columns.Add("Location");
// combining columns
foreach (DataRow row in dtLocation.Rows)
{
sortedDt.Rows.Add(row["Location1"]);
sortedDt.Rows.Add(row["Location2"]);
sortedDt.Rows.Add(row["Location3"]);
}
// now sort these now that they're all in the same column
sortedDt.DefaultView.Sort = "Location";
sortedDt = sortedDt.DefaultView.ToTable(); // should be the new sorted table
// now your original code, but modified to populate the ddl with the new sorted data
ddlLocation.Items.Clear();
dtLocation = dataSource.GetFilteredRiskInfo("location");
ddlLocation.Items.Insert(0, new ListItem("All", "-1"));
foreach (DataRow row in sortedDt.Rows)
{
this.ddlLocation.Items.Add(new ListItem(row["Location"].ToString()));
}
Thanks both FronkO (I now have sorted lists on all my drop downs) and madamission (for the extra help with the locations sorting). Between the 2 of you and some help from the internet I now have it working. The working code is as follows:
//Location
ddlLocation.Items.Clear();
// new table to combine the 3 columns of dtLocation into a one column datatable
DataTable sortedDt = new DataTable();
sortedDt.Columns.Add("Location");
dtLocation = dataSource.GetFilteredRiskInfo("location");
// combining columns
foreach (DataRow row in dtLocation.Rows)
{
sortedDt.Rows.Add(row["Location1"]);
sortedDt.Rows.Add(row["Location2"]);
sortedDt.Rows.Add(row["Location3"]);
}
// now make them distinct & sort them now that they're all in the same column
sortedDt = sortedDt.DefaultView.ToTable(/*distinct*/ true);
sortedDt.DefaultView.Sort = "Location";
sortedDt = sortedDt = sortedDt.DefaultView.ToTable();
// now your original code, but modified to populate the ddl with the new sorted data
ddlLocation.Items.Insert(0, new ListItem("All", "-1"));
foreach (DataRow row in sortedDt.Rows)
{
this.ddlLocation.Items.Add(new ListItem(row["Location"].ToString()));
}

Delete row from datatable where first column does not contain (some string)

I have DataTable with the following columns:
ClientID date numberOfTransactions price
ClientID is of type string and I need to ensure that its contents include "A-" and "N6" for every value in the table.
I need to delete all rows from the DataTable where this first column (ClientID) does not contain both "A-" and "N6" (some totals and other unnecessary data). How can I select and delete these rows specifically from the DataTable?
I know this:
foreach (DataRow row in table.Rows) // Loop over the rows.
{
//Here should come part "if first column contains mentioned values
}
I also know this
If (string.Contains("A-") == true && string.Contains("N6") == true)
{
//Do something
}
I need help how to implement this for first column of each row.
Try this:
EDIT: Totally messed up that last line, so if you tried it, try it now that I made it not stupid. =)
List<int> IndicesToRemove = new List<int>();
DataTable table = new DataTable(); //Obviously, your table will already exist at this point
foreach (DataRow row in table.Rows)
{
if (!(row["ClientID"].ToString().Contains("A-") && row["ClientID"].ToString().Contains("N6")))
IndicesToRemove.Add(table.Rows.IndexOf(row));
}
IndicesToRemove.Sort();
for (int i = IndicesToRemove.Count - 1; i >= 0; i--) table.Rows.RemoveAt(IndicesToRemove[i]);
try using this,
assuming dt as your Datatabe object and ClientID as your first column (hence using ItemArray[0])
for(int i=0; i<dt.Rows.Count; i++)
{
temp = dt.Rows[i].ItemArray[0].ToString();
if (System.Text.RegularExpressions.Regex.IsMatch(temp, "A-", System.Text.RegularExpressions.RegexOptions.IgnoreCase) || System.Text.RegularExpressions.Regex.IsMatch(temp, "N6", System.Text.RegularExpressions.RegexOptions.IgnoreCase))
{
dt.Rows.RemoveAt(i);
i--;
}
}
Simple and straight forward solution... hope it helps
this should be more efficient, both in lines of Code and Time, try this :)
for(int x=0; x<table.Rows.Count;)
{
if (!table.Rows[x].ItemArray[0].contains("A-") && !table.Rows[x].ItemArray[0].contains("N6"))
table.Rows.RemoveAt(x);
else x++;
}
Happy Coding
Preface: C.Barlow's existing answer is awesome, this is just another route someone could take.
This is one way to do it where you never have to loop all the way through the original table (by taking advantage of the DataTable.Select() method):
DataTable table = new DataTable(); // This would be your existing DataTable
// Grab only the rows that meet your criteria using the .Select() method
DataRow[] newRows = table.Select("ClientID LIKE '%A-%' AND ClientID LIKE '%N6%'");
// Create a new table with the same schema as your existing one.
DataTable newTable = table.Clone();
foreach (DataRow r in newRows)
{
// Dump the selected rows into the table.
newTable.LoadDataRow(r.ItemArray, true);
}
And now you have a DataTable with only the rows you want. If necessary, at this point you could clear out the original table and replace it with the contents of the new one:
table.Clear();
table = newTable.Copy();
Edit: I thought of a memory optimization last night, you can just overwrite the existing table once you have the rows you need, which avoids the need for the temporary table.
DataTable table = new DataTable(); // This would be your existing DataTable
// Grab only the rows that meet your criteria using the .Select() method
DataRow[] newRows = table.Select("ClientID LIKE '%A-%' AND ClientID LIKE '%N6%'");
// Clear out the old table
table.Clear();
foreach (DataRow r in newRows)
{
// Dump the selected rows into the table.
table.LoadDataRow(r.ItemArray, true);
}

Compare two DataTables to determine rows in one but not the other

I have two DataTables, A and B, produced from CSV files. I need to be able to check which rows exist in B that do not exist in A.
Is there a way to do some sort of query to show the different rows or would I have to iterate through each row on each DataTable to check if they are the same? The latter option seems to be very intensive if the tables become large.
Assuming you have an ID column which is of an appropriate type (i.e. gives a hashcode and implements equality) - string in this example, which is slightly pseudocode because I'm not that familiar with DataTables and don't have time to look it all up just now :)
IEnumerable<string> idsInA = tableA.AsEnumerable().Select(row => (string)row["ID"]);
IEnumerable<string> idsInB = tableB.AsEnumerable().Select(row => (string)row["ID"]);
IEnumerable<string> bNotA = idsInB.Except(idsInA);
would I have to iterate through each row on each DataTable to check if they are the same.
Seeing as you've loaded the data from a CSV file, you're not going to have any indexes or anything, so at some point, something is going to have to iterate through every row, whether it be your code, or a library, or whatever.
Anyway, this is an algorithms question, which is not my specialty, but my naive approach would be as follows:
1: Can you exploit any properties of the data? Are all the rows in each table unique, and can you sort them both by the same criteria? If so, you can do this:
Sort both tables by their ID (using some useful thing like a quicksort). If they're already sorted then you win big.
Step through both tables at once, skipping over any gaps in ID's in either table. Matched ID's mean duplicated records.
This allows you to do it in (sort time * 2 ) + one pass, so if my big-O-notation is correct, it'd be (whatever-sort-time) + O(m+n) which is pretty good.
(Revision: this is the approach that ΤΖΩΤΖΙΟΥ describes )
2: An alternative approach, which may be more or less efficient depending on how big your data is:
Run through table 1, and for each row, stick it's ID (or computed hashcode, or some other unique ID for that row) into a dictionary (or hashtable if you prefer to call it that).
Run through table 2, and for each row, see if the ID (or hashcode etc) is present in the dictionary. You're exploiting the fact that dictionaries have really fast - O(1) I think? lookup. This step will be really fast, but you'll have paid the price doing all those dictionary inserts.
I'd be really interested to see what people with better knowledge of algorithms than myself come up with for this one :-)
You can use the Merge and GetChanges methods on the DataTable to do this:
A.Merge(B); // this will add to A any records that are in B but not A
return A.GetChanges(); // returns records originally only in B
The answers so far assume that you're simply looking for duplicate primary keys. That's a pretty easy problem - you can use the Merge() method, for instance.
But I understand your question to mean that you're looking for duplicate DataRows. (From your description of the problem, with both tables being imported from CSV files, I'd even assume that the original rows didn't have primary key values, and that any primary keys are being assigned via AutoNumber during the import.)
The naive implementation (for each row in A, compare its ItemArray with that of each row in B) is indeed going to be computationally expensive.
A much less expensive way to do this is with a hashing algorithm. For each DataRow, concatenate the string values of its columns into a single string, and then call GetHashCode() on that string to get an int value. Create a Dictionary<int, DataRow> that contains an entry, keyed on the hash code, for each DataRow in DataTable B. Then, for each DataRow in DataTable A, calculate the hash code, and see if it's contained in the dictionary. If it's not, you know that the DataRow doesn't exist in DataTable B.
This approach has two weaknesses that both emerge from the fact that two strings can be unequal but produce the same hash code. If you find a row in A whose hash is in the dictionary, you then need to check the DataRow in the dictionary to verify that the two rows are really equal.
The second weakness is more serious: it's unlikely, but possible, that two different DataRows in B could hash to the same key value. For this reason, the dictionary should really be a Dictionary<int, List<DataRow>>, and you should perform the check described in the previous paragraph against each DataRow in the list.
It takes a fair amount of work to get this working, but it's an O(m+n) algorithm, which I think is going to be as good as it gets.
Just FYI:
Generally speaking about algorithms, comparing two sets of sortable (as ids typically are) is not an O(M*N/2) operation, but O(M+N) if the two sets are ordered. So you scan one table with a pointer to the start of the other, and:
other_item= A.first()
only_in_B= empty_list()
for item in B:
while other_item > item:
other_item= A.next()
if A.eof():
only_in_B.add( all the remaining B items)
return only_in_B
if item < other_item:
empty_list.append(item)
return only_in_B
The code above is obviously pseudocode, but should give you the general gist if you decide to code it yourself.
Thanks for all the feedback.
I do not have any index's unfortunately. I will give a little more information about my situation.
We have a reporting program (replaced Crystal reports) that is installed in 7 Servers across EU. These servers have many reports on them (not all the same for each country). They are invoked by a commandline application that uses XML files for their configuration. So One XML file can call multiple reports.
The commandline application is scheduled and controlled by our overnight process. So the XML file could be called from multiple places.
The goal of the CSV is to produce a list of all the reports that are being used and where they are being called from.
I am going through the XML files for all references, querying the scheduling program and producing a list of all the reports. (this is not too bad).
The problem I have is I have to keep a list of all the reports that might have been removed from production. So I need to compare the old CSV with the new data. For this I thought it best to put it into DataTables and compare the information, (this could be the wrong approach. I suppose I could create an object that holds it and compares the difference then create iterate through them).
The data I have about each report is as follows:
String - Task Name
String - Action Name
Int - ActionID (the Action ID can be in multiple records as a single action can call many reports, i.e. an XML file).
String - XML File called
String - Report Name
I will try the Merge idea given by MusiGenesis (thanks). (rereading some of the posts not sure if the Merge will work, but worth trying as I have not heard about it before so something new to learn).
The HashCode Idea sounds interesting as well.
Thanks for all the advice.
I found an easy way to solve this. Unlike previous "except method" answers, I use the except method twice. This not only tells you what rows were deleted but what rows were added. If you only use one except method - it will only tell you one difference and not both. This code is tested and works. See below
//Pass in your two datatables into your method
//build the queries based on id.
var qry1 = datatable1.AsEnumerable().Select(a => new { ID = a["ID"].ToString() });
var qry2 = datatable2.AsEnumerable().Select(b => new { ID = b["ID"].ToString() });
//detect row deletes - a row is in datatable1 except missing from datatable2
var exceptAB = qry1.Except(qry2);
//detect row inserts - a row is in datatable2 except missing from datatable1
var exceptAB2 = qry2.Except(qry1);
then execute your code against the results
if (exceptAB.Any())
{
foreach (var id in exceptAB)
{
//execute code here
}
}
if (exceptAB2.Any())
{
foreach (var id in exceptAB2)
{
//execute code here
}
}
Could you not simply compare the CSV files before loading them into DataTables?
string[] a = System.IO.File.ReadAllLines(#"cvs_a.txt");
string[] b = System.IO.File.ReadAllLines(#"csv_b.txt");
// get the lines from b that are not in a
IEnumerable<string> diff = b.Except(a);
//... parse b into DataTable ...
public DataTable compareDataTables(DataTable First, DataTable Second)
{
First.TableName = "FirstTable";
Second.TableName = "SecondTable";
//Create Empty Table
DataTable table = new DataTable("Difference");
DataTable table1 = new DataTable();
try
{
//Must use a Dataset to make use of a DataRelation object
using (DataSet ds4 = new DataSet())
{
//Add tables
ds4.Tables.AddRange(new DataTable[] { First.Copy(), Second.Copy() });
//Get Columns for DataRelation
DataColumn[] firstcolumns = new DataColumn[ds4.Tables[0].Columns.Count];
for (int i = 0; i < firstcolumns.Length; i++)
{
firstcolumns[i] = ds4.Tables[0].Columns[i];
}
DataColumn[] secondcolumns = new DataColumn[ds4.Tables[1].Columns.Count];
for (int i = 0; i < secondcolumns.Length; i++)
{
secondcolumns[i] = ds4.Tables[1].Columns[i];
}
//Create DataRelation
DataRelation r = new DataRelation(string.Empty, firstcolumns, secondcolumns, false);
ds4.Relations.Add(r);
//Create columns for return table
for (int i = 0; i < First.Columns.Count; i++)
{
table.Columns.Add(First.Columns[i].ColumnName, First.Columns[i].DataType);
}
//If First Row not in Second, Add to return table.
table.BeginLoadData();
foreach (DataRow parentrow in ds4.Tables[0].Rows)
{
DataRow[] childrows = parentrow.GetChildRows(r);
if (childrows == null || childrows.Length == 0)
table.LoadDataRow(parentrow.ItemArray, true);
table1.LoadDataRow(childrows, false);
}
table.EndLoadData();
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
return table;
}
try
{
if (ds.Tables[0].Columns.Count == ds1.Tables[0].Columns.Count)
{
for (int i = 0; i < ds.Tables[0].Rows.Count; i++)
{
for (int j = 0; j < ds.Tables[0].Columns.Count; j++)
{
if (ds.Tables[0].Rows[i][j].ToString() == ds1.Tables[0].Rows[i][j].ToString())
{
}
else
{
MessageBox.Show(i.ToString() + "," + j.ToString());
}
}
}
}
else
{
MessageBox.Show("Table has different columns ");
}
}
catch (Exception)
{
MessageBox.Show("Please select The Table");
}
I'm continuing tzot's idea ...
If you have two sortable sets, then you can just use:
List<string> diffList = new List<string>(sortedListA.Except(sortedListB));
If you need more complicated objects, you can define a comparator yourself and still use it.
The usual usage scenario considers a user that has a DataTable in hand and changes it by Adding, Deleting or Modifying some of the DataRows.
After the changes are performed, the DataTable is aware of the proper DataRowState for each row, and also keeps track of the Original DataRowVersion for any rows that were changed.
In this usual scenario, one can Merge the changes back into a source table (in which all rows are Unchanged). After merging, one can get a nice summary of only the changed rows with a call to GetChanges().
In a more unusual scenario, a user has two DataTables with the same schema (or perhaps only the same columns and lacking primary keys). These two DataTables consist of only Unchanged rows. The user may want to find out what changes does he need to apply to one of the two tables in order to get to the other one. That is, which rows need to be Added, Deleted, or Modified.
We define here a function called GetDelta() which does the job:
using System;
using System.Data;
using System.Xml;
using System.Linq;
using System.Collections.Generic;
using System.Data.DataSetExtensions;
public class Program
{
private static DataTable GetDelta(DataTable table1, DataTable table2)
{
// Modified2 : row1 keys match rowOther keys AND row1 does not match row2:
IEnumerable<DataRow> modified2 = (
from row1 in table1.AsEnumerable()
from row2 in table2.AsEnumerable()
where table1.PrimaryKey.Aggregate(true, (boolAggregate, keycol) => boolAggregate & row1[keycol].Equals(row2[keycol.Ordinal]))
&& !row1.ItemArray.SequenceEqual(row2.ItemArray)
select row2);
// Modified1 :
IEnumerable<DataRow> modified1 = (
from row1 in table1.AsEnumerable()
from row2 in table2.AsEnumerable()
where table1.PrimaryKey.Aggregate(true, (boolAggregate, keycol) => boolAggregate & row1[keycol].Equals(row2[keycol.Ordinal]))
&& !row1.ItemArray.SequenceEqual(row2.ItemArray)
select row1);
// Added : row2 not in table1 AND row2 not in modified2
IEnumerable<DataRow> added = table2.AsEnumerable().Except(modified2, DataRowComparer.Default).Except(table1.AsEnumerable(), DataRowComparer.Default);
// Deleted : row1 not in row2 AND row1 not in modified1
IEnumerable<DataRow> deleted = table1.AsEnumerable().Except(modified1, DataRowComparer.Default).Except(table2.AsEnumerable(), DataRowComparer.Default);
Console.WriteLine();
Console.WriteLine("modified count =" + modified1.Count());
Console.WriteLine("added count =" + added.Count());
Console.WriteLine("deleted count =" + deleted.Count());
DataTable deltas = table1.Clone();
foreach (DataRow row in modified2)
{
// Match the unmodified version of the row via the PrimaryKey
DataRow matchIn1 = modified1.Where(row1 => table1.PrimaryKey.Aggregate(true, (boolAggregate, keycol) => boolAggregate & row1[keycol].Equals(row[keycol.Ordinal]))).First();
DataRow newRow = deltas.NewRow();
// Set the row with the original values
foreach(DataColumn dc in deltas.Columns)
newRow[dc.ColumnName] = matchIn1[dc.ColumnName];
deltas.Rows.Add(newRow);
newRow.AcceptChanges();
// Set the modified values
foreach (DataColumn dc in deltas.Columns)
newRow[dc.ColumnName] = row[dc.ColumnName];
// At this point newRow.DataRowState should be : Modified
}
foreach (DataRow row in added)
{
DataRow newRow = deltas.NewRow();
foreach (DataColumn dc in deltas.Columns)
newRow[dc.ColumnName] = row[dc.ColumnName];
deltas.Rows.Add(newRow);
// At this point newRow.DataRowState should be : Added
}
foreach (DataRow row in deleted)
{
DataRow newRow = deltas.NewRow();
foreach (DataColumn dc in deltas.Columns)
newRow[dc.ColumnName] = row[dc.ColumnName];
deltas.Rows.Add(newRow);
newRow.AcceptChanges();
newRow.Delete();
// At this point newRow.DataRowState should be : Deleted
}
return deltas;
}
private static void DemonstrateGetDelta()
{
DataTable table1 = new DataTable("Items");
// Add columns
DataColumn column1 = new DataColumn("id1", typeof(System.Int32));
DataColumn column2 = new DataColumn("id2", typeof(System.Int32));
DataColumn column3 = new DataColumn("item", typeof(System.Int32));
table1.Columns.Add(column1);
table1.Columns.Add(column2);
table1.Columns.Add(column3);
// Set the primary key column.
table1.PrimaryKey = new DataColumn[] { column1, column2 };
// Add some rows.
DataRow row;
for (int i = 0; i <= 4; i++)
{
row = table1.NewRow();
row["id1"] = i;
row["id2"] = i*i;
row["item"] = i;
table1.Rows.Add(row);
}
// Accept changes.
table1.AcceptChanges();
PrintValues(table1, "table1:");
// Create a second DataTable identical to the first.
DataTable table2 = table1.Clone();
// Add a row that exists in table1:
row = table2.NewRow();
row["id1"] = 0;
row["id2"] = 0;
row["item"] = 0;
table2.Rows.Add(row);
// Modify the values of a row that exists in table1:
row = table2.NewRow();
row["id1"] = 1;
row["id2"] = 1;
row["item"] = 455;
table2.Rows.Add(row);
// Modify the values of a row that exists in table1:
row = table2.NewRow();
row["id1"] = 2;
row["id2"] = 4;
row["item"] = 555;
table2.Rows.Add(row);
// Add a row that does not exist in table1:
row = table2.NewRow();
row["id1"] = 13;
row["id2"] = 169;
row["item"] = 655;
table2.Rows.Add(row);
table2.AcceptChanges();
Console.WriteLine();
PrintValues(table2, "table2:");
DataTable delta = GetDelta(table1,table2);
Console.WriteLine();
PrintValues(delta,"delta:");
// Verify that the deltas DataTable contains the adequate Original DataRowVersions:
DataTable originals = table1.Clone();
foreach (DataRow drow in delta.Rows)
{
if (drow.RowState != DataRowState.Added)
{
DataRow originalRow = originals.NewRow();
foreach (DataColumn dc in originals.Columns)
originalRow[dc.ColumnName] = drow[dc.ColumnName, DataRowVersion.Original];
originals.Rows.Add(originalRow);
}
}
originals.AcceptChanges();
Console.WriteLine();
PrintValues(originals,"delta original values:");
}
private static void Row_Changed(object sender,
DataRowChangeEventArgs e)
{
Console.WriteLine("Row changed {0}\t{1}",
e.Action, e.Row.ItemArray[0]);
}
private static void PrintValues(DataTable table, string label)
{
// Display the values in the supplied DataTable:
Console.WriteLine(label);
foreach (DataRow row in table.Rows)
{
foreach (DataColumn col in table.Columns)
{
Console.Write("\t " + row[col, row.RowState == DataRowState.Deleted ? DataRowVersion.Original : DataRowVersion.Current].ToString());
}
Console.Write("\t DataRowState =" + row.RowState);
Console.WriteLine();
}
}
public static void Main()
{
DemonstrateGetDelta();
}
}
The code above can be tested in https://dotnetfiddle.net/. The resulting output is shown below:
table1:
0 0 0 DataRowState =Unchanged
1 1 1 DataRowState =Unchanged
2 4 2 DataRowState =Unchanged
3 9 3 DataRowState =Unchanged
4 16 4 DataRowState =Unchanged
table2:
0 0 0 DataRowState =Unchanged
1 1 455 DataRowState =Unchanged
2 4 555 DataRowState =Unchanged
13 169 655 DataRowState =Unchanged
modified count =2
added count =1
deleted count =2
delta:
1 1 455 DataRowState =Modified
2 4 555 DataRowState =Modified
13 169 655 DataRowState =Added
3 9 3 DataRowState =Deleted
4 16 4 DataRowState =Deleted
delta original values:
1 1 1 DataRowState =Unchanged
2 4 2 DataRowState =Unchanged
3 9 3 DataRowState =Unchanged
4 16 4 DataRowState =Unchanged
Note that if your tables don't have a PrimaryKey, the where clause in the LINQ queries gets simplified a little bit. I'll let you figure that out on your own.
Achieve it simply using linq.
private DataTable CompareDT(DataTable TableA, DataTable TableB)
{
DataTable TableC = new DataTable();
try
{
var idsNotInB = TableA.AsEnumerable().Select(r => r.Field<string>(Keyfield))
.Except(TableB.AsEnumerable().Select(r => r.Field<string>(Keyfield)));
TableC = (from row in TableA.AsEnumerable()
join id in idsNotInB
on row.Field<string>(ddlColumn.SelectedItem.ToString()) equals id
select row).CopyToDataTable();
}
catch (Exception ex)
{
lblresult.Text = ex.Message;
ex = null;
}
return TableC;
}

Categories

Resources