Compare 2 Datatables to find difference/accuracy between the columns - c#

So, I have 2 separate datatables, that look pretty identical but the values in their rows might be different for instance.
EDIT:
I can have an unique ID BY creating a temporary identity column that can be used as primary key if that will make it easier. so think of ID column as the primary key than.
Table A
ID | Name | Value1 | Value2 | Value3
-------------------------------------
1 | Bob | 50 | 150 | 35
2 | Bill | 55 | 47 | 98
3 | Pat | 10 | 15 | 45
4 | Cat | 70 | 150 | 35
Table B
ID | Name | Value1 | Value2 | Value3
-------------------------------------
1 | Bob | 30 | 34 | 67
2 | Bill | 55 | 47 | 98
3 | Pat | 100 | 15 | 45
4 | Cat | 70 | 100 | 20
Output Should be:
Table C
ID | Name | TableAValue1 | TableBValue1 | DiffValue1 ....Samething for Value2 .....samething for value3
------------------------------------------------------
1 | Bob | 50 | 30 | 20
2 | Bill | 55 | 55 | 0
3 | Pat | 10 | 100 | 90
4 | Cat | 70 | 70 | 0
I Know the tedious method to do this is by using a forloop and looping through each row comparing column rows with each other. But I am not sure how to create a new Table C with the results I want. Also I think there might be a simpler solution using Linq which I am not very familiar with but I would be interested in the solution with linq if it faster and less lines of code. I am looking for the most optimal/efficient way of going about this. as these datatables can be anywhere between 5,000 to 15,000+ rows in size so memory usage becomes an issue.

LINQ is not faster, at least not in general. But it can help to increase readability.
You can use Enumerable.Join which might be more efficient than nested loops, but you need a loop to fill your third table anyway. So the first two columns are the identifiers and the rest are the values:
var query = from r1 in table1.AsEnumerable()
join r2 in table2.AsEnumerable()
on new { ID = r1.Field<int>("ID"), Name = r1.Field<string>("Name") }
equals new { ID = r2.Field<int>("ID"), Name = r2.Field<string>("Name") }
select new { r1, r2 };
var columnsToCompare = table1.Columns.Cast<DataColumn>().Skip(2);
foreach (var rowInfo in query)
{
var row = table3.Rows.Add();
row.SetField("ID", rowInfo.r1.Field<int>("ID"));
row.SetField("Name", rowInfo.r1.Field<int>("Name"));
foreach (DataColumn col in columnsToCompare)
{
int val1 = rowInfo.r1.Field<int>(col.ColumnName);
int val2 = rowInfo.r2.Field<int>(col.ColumnName);
int diff = (int)Math.Abs(val1-val2);
row.SetField(col.ColumnName, diff);
}
}

var tableC = new DataTable();
tableC.Columns.Add(new DataColumn("ID"));
tableC.Columns.Add(new DataColumn("Name"));
tableC.Columns.Add(new DataColumn("TableAValue1"));
tableC.Columns.Add(new DataColumn("TableBValue1"));
tableC.Columns.Add(new DataColumn("DiffValue1"));
foreach (DataRow rowA in tableA.Rows)
{
foreach (DataRow rowB in tableB.Rows)
{
if (Convert.ToInt32(rowA["ID"]) == Convert.ToInt32(rowB["ID"]) &&
rowA["Name"].ToString() == rowB["Name"].ToString() &&
Convert.ToInt32(rowA["Value1"]) != Convert.ToInt32(rowB["Value1"]))
{
var newRow = tableC.NewRow();
newRow["ID"] = rowA["ID"];
newRow["Name"] = rowA["Name"];
newRow["TableAValue1"] = rowA["Value1"];
newRow["TableBValue1"] = rowB["Value1"];
newRow["DiffValue1"] = Convert.ToInt32(rowA["Value1"]) - Convert.ToInt32(rowB["Value1"]);
tableC.Rows.Add(newRow);
}
}
}

Using LINQ, create an anonymous type as follows
var joinedRows = (from rowA in TableA.AsEnumerable()
from rowB in TableB.AsEnumerable()
where rowA.Field<String>("Name") == rowB.Field<String>("Name")
select new
{
ID = rowA.Field<int>("ID"),
Name = rowA.Field<String>("Name"),
TableAValue1 = rowA.Field<int>("Value1"),
TableBValue1 = rowB.Field<int>("Value1"),
DiffValue1 = Math.Abs(rowA.Field<int>("Value1") - rowB.Field<int>("Value1")),
TableAValue2 = rowA.Field<int>("Value2"),
TableBValue2 = rowB.Field<int>("Value2"),
DiffValue2 = Math.Abs(rowA.Field<int>("Value2") - rowB.Field<int>("Value2")),
TableAValue3 = rowA.Field<int>("Value3"),
TableBValue3 = rowB.Field<int>("Value3"),
DiffValue3 = Math.Abs(rowA.Field<int>("Value3") - rowB.Field<int>("Value3"))
});
Table.AsEnumerable() will give you an IEnumerable(of DataRow)
row.Field will cast it to the correct type for you
You can now use the anonymous type of joinedRows and create your new dataTable from it

This uses a strategy similar to kippermand's, but will probably perform slightly better on large sets of data by avoiding the O(n²) complexity of checking every ID against every other ID, and by reusing the values extracted from the data table:
// joining by row location
var joinedTableRows =
dt1.AsEnumerable().Zip(dt2.AsEnumerable(),
(r1, r2) => new{r1, r2});
// or, joining by ID
var joinedTableRows2 =
dt1.AsEnumerable().Join(dt2.AsEnumerable(),
r => r.Field<int>("ID"),
r => r.Field<int>("ID"),
(r1, r2) => new{r1, r2});
var result =
from row in joinedTableRows
let rowA = row.r1
let rowB = row.r2
let tableAValue1 = rowA.Field<int>("Value1")
let tableBValue1 = rowB.Field<int>("Value1")
let tableAValue2 = rowA.Field<int>("Value2")
let tableBValue2 = rowB.Field<int>("Value2")
let tableAValue3 = rowA.Field<int>("Value3")
let tableBValue3 = rowB.Field<int>("Value3")
select new
{
ID = row.r1.Field<int>("ID"),
Name = row.r1.Field<string>("Name"),
TableAValue1 = tableAValue1,
TableBValue1 = tableBValue1,
DiffValue1 = Math.Abs(tableAValue1 - tableBValue1),
TableAValue2 = tableAValue2,
TableBValue2 = tableBValue2,
DiffValue2 = Math.Abs(tableAValue2 - tableBValue2),
TableAValue3 = tableAValue3,
TableBValue3 = tableBValue3,
DiffValue3 = Math.Abs(tableAValue3 - tableBValue3)
};
Depending on how your data needs to be consumed, you could either declare a class matching this anonymous type, and consume that directly (which is what I'd prefer), or you can create a DataTable from these objects, if you have to.

Related

Copy row from datatable to another where there are common column headers

I have two datatables, I am trying to copy row from one table to another, I have tried this. the thing is that my tables are not exactly the same, both tables have common headers, but to the second table have more columns, therefore I need "smart" copy, i.e to copy the row according to the column header name.
d1:
+--------+--------+--------+
| ID | aaa | bbb |
+--------+--------+--------+
| 23 | value1 | value2 | <----copy this row
d2:
+--------+--------+--------+--------+
| ID | ccc | bbb | aaa |
+--------+--------+--------+--------+
| 23 | | value2 | value1 | <----I need this result
but this code:
string rowID=23;
DataRow[] result = dt1.Select($"ID = {rowID}");
dt2.Rows.Add(result[0].ItemArray);
gives:
d2:
+--------+--------+--------+--------+
| ID | ccc | bbb | aaa |
+--------+--------+--------+--------+
| 23 | value1 | value2 | | <---- :( NOT what I need
I think this is your homework, but here you have some simple and not very smart solution:
private DataTable DTCopySample()
{
int cnt = 0;
DataTable dt1 = new DataTable();
dt1.Columns.Add("ID");
dt1.Columns.Add("aaa");
dt1.Columns.Add("bbb");
DataTable dt2 = new DataTable();
dt2.Columns.Add("ID");
dt2.Columns.Add("ccc");
dt2.Columns.Add("bbb");
dt2.Columns.Add("aaa");
dt1.Rows.Add();
dt1.Rows[0]["ID"] = "23";
dt1.Rows[0]["aaa"] = "val1";
dt1.Rows[0]["bbb"] = "val2";
dt1.Rows.Add();
dt1.Rows[1]["ID"] = "99";
dt1.Rows[1]["aaa"] = "val99";
dt1.Rows[1]["bbb"] = "val98";
string colName = string.Empty;
foreach (DataRow row in dt1.Rows)
{
dt2.Rows.Add();
foreach (DataColumn col in dt1.Columns)
{
dt2.Rows[cnt][col.ColumnName] = row[col.ColumnName].ToString();
}
cnt++;
}
return dt2;
}
There are more smart and better solutions, but this is fast-written (2 mins) and works.
Remeber, that you have not specified columns datatypes or anything else, so I assumed there are strings everywhere for creating simple sample.

linq Group By from DataTable

I have DataTable Like this
Thank you Bob Vale for your help
what is (Select(X,i) mean in your linq,
but as i made a mistake in my table
I have this
No | Size | Type | FB | FP
----------------------------------------
100 | 2 | typeA | FB1 | A1
101 | 3 | typeB | FB1 | A1
101 | 4 | typec | FB1 | A1
103 | 4 | typeC | FB2 | A2
103 | 5 | typeD | FB2 | A2
103 | 6 | typeE | FB2 | A2
I want to have some thing like that
No | Size | Type | FB | FP
---------------------------------
100 | 2 | typeA | FB1 | A1
101 | 3 | typeB | FB1 | A1
| 4 | typec | |
103 | 4 | typeC | FB2 | A2
| 5 | typeD | |
| 6 | typeE | |
How can I make it? I can make Group By
var result = from row in cableDataTable.AsEnumerable()
group row by new
{
FB = row.Field<string>("FB"),
FP = row.Field<string>("FP"),
Size = row.Field<int>("Size"),
Type = row.Field<int>("Type"),
no= row.Field<int>("no"),
} into g
select new
{
FB = g.Key.FB,
FP = g.Key.FP,
Size = g.Key.Size,
Type = g.Key.Type
no= g.Key.no
};
but it that could't give the result
thank you for your attention
How about this:
// First declare a conversion from the DataTable to an anon type
var rows = cableDataTable.AsEnumerable()
.Select(x => new {
Size = x.Field<int>("Size"),
Type= x.Field<string>("Type"),
FB = x.Field<string>("FB"),
FP = x.Field<string>("FP")
});
// Now use group by, ordering and select many to select the rows
var result = rows.GroupBy (row => new {row.FB, row.FP} )
.OrderBy (g => g.Key.FB)
.ThenBy(g => g.Key.FP)
.SelectMany(g => g.OrderBy(row => row.Size)
.Select((x,i) =>
new {
Size = x.Size,
Type = x.Type,
FB = (i==0) ? x.FB : null,
FP= (i==0) ? x.FP : null
}));
You can use linq query as
var result = cableDataTable.AsEnumerable().GroupBy(g => new { g.FB, g.FP}).Select(x => x);

Check occurrence of word appearing in datatable column

I have the data below in a datatable this is example data. I would like get the occurrence of 12,13 in the datatable as normally there would be 10-20 million row in the datatable.
Customer | quantity | Product | Code
1 | 3 | Product | 12
2 | 4 | Product | 13
3 | 1 | Product | 12
4 | 6 | Product | 13
how about simple for each loop
private int getCount(int yourSearchDigit)
{
int counter = 0;
foreach (DataRow dr in youDataTable.Rows)
{
if (Convert.ToInt32(dr["Code"]) == yourSearchDigit)
counter++;
}
return counter;
}
You can use Linq-To-DataTable:
int[] allowedCodes = new []{ 12, 13 };
var rows = table.AsEnumerable()
.Where(r => allowedCodes.Contains(r.Field<int>("Code")));
However, if you have 10-20 million row in the datatable you should consider to do the filtering in the database itself.
If you want to know the number they occur:
int count = table.AsEnumerable()
.Count(r => allowedCodes.Contains(r.Field<int>("Code")));

How to select rows from DataTable based on Index / Row Number?

I have a DataTable. I want to select the rows based on the Index/Row Number of the rows in DataTable.
Suppose below is the DataTable:
---------------- ---------------
| ID | Name | | Index/RowNo |
---------------- ---------------
| A001 | John | | 1 |
| A002 | Foo | | 2 |
| A003 | Rambo | | 3 |
| A004 | Andy | | 4 |
| ... | ... | | 5 |
---------------- ---------------
Now, i want to select the Rows from above shown DataTable using criteria say for example Index > 2, In that case First entry at Index 1, A001 | John, will not become part of the resultant DataTable. How can i do it efficiently?
Moreover, i want to have my result both in the form of DataTable and Linq query outcome.
I am trying to do something like this:
var result = dt.Select("RowNum > 1", "");
OR
var result = from row in dt.AsEnumerable()
where RowNum > 1
select row;
I am trying to do something like this:
var result = dt.Select("RowNum > 1", "");
You can use Enumerable.Skip even with a DataTable since it is an IEnumerable<DataRow>:
IEnumerable<DataRow> allButFirst = table.AsEnumerable().Skip(1);
get a new DataTable with:
DataTable tblAllButFirst = allButFirst.CopyToDataTable();
If your next question is how you can take only rows with given indices:
var allowedIndices = new[]{ 2, 4, 7, 8, 9, 10 };
DataTable tblAllowedRows = table.AsEnumerable()
.Where((r, i) => allowedIndices.Contains(i))
.CopyToDataTable();
var result = table.AsEnumerable()
.Where((row, index) => index > 1)
.CopyToDataTable()

Sort several duplicate values in DataTable to one row

I've imported a DataTable from a SQL Database using SqlDataAdapter and Fill-Method.
My datatable looks like this:
Timestamp(unix time) | Value
x | 10
x | 42
x | 643
y | 5
y | 9
y | 70
...and so on. The table contains a lot of values (1000+) but has always three rows with the same timestamp.
Now I want it to look like this:
Timestamp(unix time) | Value 1 | Value 2 | Value 3
x | 10 | 42 | 643
y | 5 | 9 | 70
How can I sort it this way?
(If there are more than three values, the programm should just insert the first three values it has found)
Thanks for any help!
Thanks for your approach! I solved it myself now.
This is how I've done it:
var grouped = from myRow in myDataTable.AsEnumerable()
group myRow by myRow.Field<int>("TIMESTAMP");
foreach (var timestamp in grouped)
{
string[] myRow = new string[5];
myRow[0] = timestamp.Key.ToString();
int i = 1;
foreach (var value in timestamp)
{
myRow[i] = value.Field<double>("VALUE").ToString();
i++;
if (i > 4)
break;
}
mySortedTable.Rows.Add(myRow);
}
I think this may also be solvable in SQL, but if you want to do it programmatically, I have tested the following in LinqPad:
void Main()
{
var list = new List<Tuple<string,int>> {
Tuple.Create("x", 10),
Tuple.Create("x", 42),
Tuple.Create("x", 643),
Tuple.Create("y", 5),
Tuple.Create("y", 9),
Tuple.Create("y", 70),
};
var result =
from grp in list.GroupBy(t => t.Item1)
let firstThree = grp.Select(t => t.Item2).Take(3).ToList()
select new {
Key = grp.Key,
Value1 = firstThree[0],
Value2 = firstThree[1],
Value3 = firstThree[2] };
foreach (var item in result)
Console.WriteLine(item);
}
It assumes that you have at least three elements, otherwise you'll get an out of range exception.
While the end result is an anonymous type, you could easily pipe the results of the operation into a DataRow instead.

Categories

Resources