Compare Data tables - c#

I need to compare two DataTables.
Both DataTables have a systemuserid column. DataTable 1 has two rows, whose system user IDs start with c2dd... and 53cf...
I need to check whether every systemuserid in DataTable 1 is also present in DataTable 2.
Here, the c2dd... system user is missing from DataTable 2, so I need to add a c2dd... row to DataTable 2 with noofCall set to 0.

If you have both DataTables available, you can find the rows of table1 whose systemuserid does not appear in table2 like this:
IEnumerable<DataRow> differenceRows = table1.AsEnumerable()
    .Where(x => table2.AsEnumerable()
        .All(y => y.Field<string>("systemuserid") != x.Field<string>("systemuserid")));
After getting differenceRows, iterate through it and add a new row to table2 for each entry.
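The last step can be sketched as follows. This is a minimal sketch, not the answerer's code: it assumes systemuserid is a string column and noofCall is an int column in table2, and the class and method names (TableSync, AddMissingUsers) are made up for illustration. It collects the IDs of table2 into a HashSet first, so each lookup avoids rescanning table2.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TableSync
{
    // Hypothetical helper: adds a row with noofCall = 0 to table2 for every
    // systemuserid that occurs in table1 but not in table2.
    public static void AddMissingUsers(DataTable table1, DataTable table2)
    {
        // IDs already present in table2 (assumed to be a string column).
        var known = new HashSet<string>(
            table2.AsEnumerable().Select(r => r.Field<string>("systemuserid")));

        // Materialize the missing IDs before modifying table2.
        List<string> missing = table1.AsEnumerable()
            .Select(r => r.Field<string>("systemuserid"))
            .Where(id => !known.Contains(id))
            .Distinct()
            .ToList();

        foreach (string id in missing)
        {
            DataRow row = table2.NewRow();
            row["systemuserid"] = id;
            row["noofCall"] = 0; // assumed to be an int column
            table2.Rows.Add(row);
        }
    }
}
```

Any other columns in table2 are left at their default values (normally DBNull) for the added rows.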

Related

LINQ join to return dynamic column list

I'm looking for a way to return a dynamic column list from a LINQ join of two datatables.
First, this is not a duplicate. I have already studied and discarded:
C# LINQ list select columns dynamically from a joined dataset
Creating a LINQ select from multiple tables
How to do a LINQ join that behaves exactly like a physical database inner join?
(and many others)
Here is my starting point:
public static DataTable JoinDataTables(DataTable dt1, DataTable dt2, string table1KeyField, string table2KeyField, string[] columns) {
    DataTable result = ( from dataRows1 in dt1.AsEnumerable()
                         join dataRows2 in dt2.AsEnumerable()
                         on dataRows1.Field<string>(table1KeyField) equals dataRows2.Field<string>(table2KeyField)
                         [...I NEED HELP HERE with the SELECT....]).CopyToDataTable();
    return result;
}
A few notes and requirements:
There is no database engine. The data sources are large CSV files (500K+ records) being read into C# DataTables.
Because the CSVs are large, looping through each record in the join is a bad solution for performance reasons. I've already tried record looping and it's just too slow. I get great performance on the join above, but I can't find a way to have it return just the columns I want (specified by the caller) without looping records.
If I need to loop over columns in the join, that is perfectly fine, I just don't want to loop rows.
I want to be able to pass in an array of column names and return just those columns in the resulting DataTable. If both datatables being passed in happen to have a column named the same, and if that column is in my array of column names, just pass back either column because the data will be the same between the 2 columns in that case.
If I need to pass in 2 arrays (1 for each datatable's desired columns) that's fine, but 1 array of column names would be ideal.
The column list cannot be static and hardcoded into the function. The reason is because my JoinDataTables() is called from many different places in my system in order to join a wide variety of CSVs-turned-datatables, and each CSV file has very different columns.
I don't want all columns returned in the resulting DataTable -- just the columns I specify in the columns array.
So suppose, before calling JoinDataTables(), I have the following 2 datatables:
Table: T1
T1A  T1B  T1C  T1D
==================
10   AA   H1   Foo1
11   AB   H1   Foo2
12   AA   H2   Foo1
13   AB   H2   Foo2
Table: T2
T2A  T2X  T2Y  T2Z
===================
12   N1   O1   Yeah1
17   N2   O2   Yeah2
18   N3   O1   Yeah1
19   N4   O2   Yeah2
Now suppose we join these 2 tables like so:
ON T1.T1A = T2.T2A
select * from [join]
and that yields this resultset:
T1A  T1B  T1C  T1D   T2A  T2X  T2Y  T2Z
=======================================
12   AA   H2   Foo1  12   N1   O1   Yeah1
Notice that only 1 row is yielded by the join.
Now to the crux of my question. Suppose that for a given use case, I want to return only 4 columns from this join: T1A, T1D, T2A, and T2Y. So my resultset would then look like this:
T1A  T1D   T2A  T2Y
===================
12   Foo1  12   O1
I'd like to be able to call my JoinDataTables function like so:
DataTable dt = JoinDataTables(dt1, dt2, "T1A", "T2A", new string[] {"T1A", "T1D", "T2A", "T2Y"});
Keeping in mind performance and the fact that I don't want to loop through records (because it's slow for large sets of data), how can this be accomplished? (The join is already working well, now I just need a correct select segment (whether via new{..} or whatever you think)).
I cannot accept a solution with a hardcoded column list inside the function. I have found examples of that approach all over SO.
Any ideas?
EDIT: I'd be ok getting ALL columns back every time, but every attempt I've made to include all columns has resulted in some kind of FULL OUTER JOIN or CROSS JOIN, returning orders of magnitude more records than it should. So, I'd be open to getting all columns back, as long as I don't get the cross join.
I'm not sure of the performance with 500k records, but here is an attempted solution.
Since you are combining two subsets of DataRows from different tables, there are no easy operations that will create the subset or create a new DataTable from the subsets (though I have an extension method for flattening an IEnumerable<anon> where anon = new { DataRow1, DataRow2, ... } from a join, it would probably be slow for you).
Instead, I pre-create an answer DataTable with the columns requested and then use LINQ to build the value arrays to be added as the rows.
public static DataTable JoinDataTables(DataTable dt1, DataTable dt2, string table1KeyField, string table2KeyField, string[] columns) {
    // Requested columns that exist in dt1
    var rtnCols1 = dt1.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName)).ToList();
    var rc1 = rtnCols1.Select(dc => dc.ColumnName).ToList();
    // Requested columns that exist in dt2, skipping names already taken from dt1
    var rtnCols2 = dt2.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName) && !rc1.Contains(dc.ColumnName)).ToList();
    var rc2 = rtnCols2.Select(dc => dc.ColumnName).ToList();
    // Each joined pair of rows becomes one object[] holding only the requested values
    var work = from dataRows1 in dt1.AsEnumerable()
               join dataRows2 in dt2.AsEnumerable()
               on dataRows1.Field<string>(table1KeyField) equals dataRows2.Field<string>(table2KeyField)
               select (from c1 in rc1 select dataRows1[c1]).Concat(from c2 in rc2 select dataRows2[c2]).ToArray();
    // Build the result schema, then add the value arrays as rows
    var result = new DataTable();
    foreach (var rc in rtnCols1)
        result.Columns.Add(rc.ColumnName, rc.DataType);
    foreach (var rc in rtnCols2)
        result.Columns.Add(rc.ColumnName, rc.DataType);
    foreach (var rowVals in work)
        result.Rows.Add(rowVals);
    return result;
}
Since you were using query syntax, I did as well, but normally I would probably do the select like so:
select rc1.Select(c1 => dataRows1[c1]).Concat(rc2.Select(c2 => dataRows2[c2])).ToArray();
Updated: It is probably worthwhile to use the column ordinals instead of the names to index into each DataRow by replacing the definitions of rc1 and rc2:
var rc1 = rtnCols1.Select(dc => dc.Ordinal).ToList();
var rc1Names = rtnCols1.Select(dc => dc.ColumnName).ToHashSet();
var rtnCols2 = dt2.Columns.Cast<DataColumn>().Where(dc => columns.Contains(dc.ColumnName) && !rc1Names.Contains(dc.ColumnName)).ToList();
var rc2 = rtnCols2.Select(dc => dc.Ordinal).ToList();
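For reference, here is one way the ordinal-based update could look once folded back into the full method. Treat it as a sketch under assumptions rather than a drop-in replacement: the wrapper class name DataTableJoiner is made up, the key columns are assumed to hold strings (as in the question), and HashSet constructors are used instead of ToHashSet() so that it also compiles on older frameworks.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableJoiner // hypothetical wrapper class
{
    public static DataTable JoinDataTables(DataTable dt1, DataTable dt2,
        string table1KeyField, string table2KeyField, string[] columns)
    {
        var wanted = new HashSet<string>(columns);

        // Requested columns found in dt1, then those found in dt2 that were
        // not already taken from dt1.
        var rtnCols1 = dt1.Columns.Cast<DataColumn>()
            .Where(dc => wanted.Contains(dc.ColumnName)).ToList();
        var rc1Names = new HashSet<string>(rtnCols1.Select(dc => dc.ColumnName));
        var rtnCols2 = dt2.Columns.Cast<DataColumn>()
            .Where(dc => wanted.Contains(dc.ColumnName) && !rc1Names.Contains(dc.ColumnName))
            .ToList();

        // Ordinals are used to index into each DataRow instead of names.
        int[] rc1 = rtnCols1.Select(dc => dc.Ordinal).ToArray();
        int[] rc2 = rtnCols2.Select(dc => dc.Ordinal).ToArray();

        var result = new DataTable();
        foreach (var dc in rtnCols1.Concat(rtnCols2))
            result.Columns.Add(dc.ColumnName, dc.DataType);

        // Inner join on the key fields; each match yields one value array.
        var work = from r1 in dt1.AsEnumerable()
                   join r2 in dt2.AsEnumerable()
                   on r1.Field<string>(table1KeyField) equals r2.Field<string>(table2KeyField)
                   select rc1.Select(i => r1[i]).Concat(rc2.Select(i => r2[i])).ToArray();

        foreach (object[] rowVals in work)
            result.Rows.Add(rowVals);
        return result;
    }
}
```

Called as in the question, JoinDataTables(dt1, dt2, "T1A", "T2A", new string[] {"T1A", "T1D", "T2A", "T2Y"}), the result columns follow the column order of dt1 and then dt2, not the order of the columns array. In the question's example the two orders happen to coincide.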

Copy to Datatable

I am joining DataTables to create a new DataTable.
Code:
var row = from r0w1 in dt_vi.AsEnumerable()
          join r0w2 in dt_w.AsEnumerable()
          on r0w1.Field<int>("ID") equals r0w2.Field<int>("iD")
          join r0w3 in dt_re.AsEnumerable()
          on r0w1.Field<int?>("ID") equals r0w3.Field<int?>("id")
          join r0w4 in dt_def.AsEnumerable()
          on r0w1.Field<int?>("ID") equals r0w4.Field<int?>("id") into ps
          from r0w4 in ps.DefaultIfEmpty()
          select r0w1.ItemArray.Concat(r0w2.ItemArray.Concat(r0w3.ItemArray.Concat(r0w4 != null ? r0w4.ItemArray : new object[] { }))).ToArray();
foreach (object[] values in row)
    dt.Rows.Add(values);
In the above code,
foreach (object[] values in row)
    dt.Rows.Add(values);
is slow for hundreds of thousands of rows. I want to put the data of row into dt.
Is there any way to create the new DataTable without using a loop?
As @Thanos Markow mentioned, you need to use DataTable.Merge, passing the second DataTable as the argument.
An Important Note: The merge operation takes into account only the original table, and the table to be merged. Child tables are not affected or included. If a table has one or more child tables, defined as part of a relationship, each child table must be merged individually.
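Separately from Merge, if the row-adding loop from the question has to stay, one option that may be worth measuring is wrapping it in BeginLoadData/EndLoadData, which suspends notifications, index maintenance, and constraint checks while rows are loaded. The sketch below is an assumption-laden illustration, not part of the answer above: the class and method names (BulkLoad, AddRows) are made up, and whether it helps for a given table has to be measured.

```csharp
using System.Collections.Generic;
using System.Data;

public static class BulkLoad // hypothetical helper class
{
    // Adds each value array as a row of 'target' while notifications,
    // index maintenance, and constraints are suspended.
    public static void AddRows(DataTable target, IEnumerable<object[]> rows)
    {
        target.BeginLoadData();
        try
        {
            foreach (object[] values in rows)
            {
                // With no primary key defined, LoadDataRow creates a new row;
                // 'true' accepts the change, leaving the row Unchanged.
                target.LoadDataRow(values, true);
            }
        }
        finally
        {
            // Re-enables notifications, indexes, and constraints.
            target.EndLoadData();
        }
    }
}
```

With the question's variables this would be called as BulkLoad.AddRows(dt, row). Note that if dt has a primary key, LoadDataRow updates an existing row with the same key instead of adding a new one.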

How to pivot DataTable by Column

I would like to group a DataTable by a known column, but the rest of the columns are unknown. The first table in the picture is the source and the second table is the one I would like to produce. Only the column to group by is sure to be there. I don't know the rest of the columns, so the solution must be dynamic.
So far, I have tried using LINQ, but it doesn't produce the output I want.
var dt = res.AsEnumerable()
.GroupBy(r => r.Field<string>("GroupBy"))
.SelectMany(t => t.ToList())
.CopyToDataTable();
When you talk about pivoting a table, you are usually summarizing the data in some fashion -- counting, totaling, averaging. If you only know one column, you can't really pivot it other than to count how many rows are in each group:
var dt = res
.AsEnumerable()
.GroupBy(r => r.Field<String>("ColumnToGroup"))
.Select(r => new { Key = r.Key, Count = r.Count() });
Gives you a pivot table that looks something like:
Key Count
London 2
Manchester 2
To do a useful pivot, you have to know something about the data in the table.
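If the counts are needed as a DataTable rather than as a sequence of anonymous objects, the grouping above could be materialized roughly like this. It is a sketch only: the class and method names (GroupCounter, CountByColumn) are invented for illustration, and the grouping column is assumed to hold strings, as in the snippet above.

```csharp
using System.Data;
using System.Linq;

public static class GroupCounter // hypothetical helper class
{
    // Returns a two-column table (Key, Count) with one row per distinct
    // value of groupColumn in 'source'.
    public static DataTable CountByColumn(DataTable source, string groupColumn)
    {
        var result = new DataTable();
        result.Columns.Add("Key", typeof(string));
        result.Columns.Add("Count", typeof(int));

        var groups = source.AsEnumerable()
            .GroupBy(r => r.Field<string>(groupColumn)); // assumes a string column

        foreach (var g in groups)
            result.Rows.Add(g.Key, g.Count());

        return result;
    }
}
```

The loop here runs once per group, not once per source row, so the result stays small even when the source table is large.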

Merging 2 datatables in to 1 datatable with same number of rows.

How can I merge two DataTables so that their rows are combined side by side? I am using different stored procedures to get data into DataSets. In ASP.NET using C#, I want to merge them so that the result has the same number of rows as table 1, with an added column from table 2.
For example:
DataTable table1 = dsnew.Tables[0];
DataTable table2 = dsSpotsLeft.Tables[0];
table1.Merge(table2);
This gives me 4 rows instead of 2. What am I missing here? Thanks in advance!
You cannot use the Merge method in this case. Instead, create a new DataTable dt3 and add columns and rows based on tables 1 and 2 (dt1 and dt2 below). Note that this pairs rows by position and assumes the two tables have no column names in common:
var dt3 = new DataTable();
var columns = dt1.Columns.Cast<DataColumn>()
    .Concat(dt2.Columns.Cast<DataColumn>());
foreach (var column in columns)
{
    dt3.Columns.Add(column.ColumnName, column.DataType);
}
//TODO Check if dt2 has more rows than dt1...
for (int i = 0; i < dt1.Rows.Count; i++)
{
    var row = dt3.NewRow();
    row.ItemArray = dt1.Rows[i].ItemArray
        .Concat(dt2.Rows[i].ItemArray).ToArray();
    dt3.Rows.Add(row);
}
Without knowing more about the design of these tables, some of this is speculation.
What it sounds like you want to perform is a JOIN. For example, if you have one table that looks like:
StateId, StateName
and another table that looks like
EmployeeId, EmployeeName, StateId
and you want to end up with a result set that looks like
EmployeeId, EmployeeName, StateId, StateName
You would perform the following query:
SELECT Employee.EmployeeId, Employee.EmployeeName, Employee.StateId, State.StateName
FROM Employee
INNER JOIN State ON Employee.StateId = State.StateId
This gives you a resultset but doesn't update any data. Again, speculating on your dataset, I'm assuming that your version of the Employee table might look like the resultset:
EmployeeId, EmployeeName, StateId, StateName
but with StateName in need of being populated. In this case, you could write the query:
UPDATE Employee
SET Employee.StateName = State.StateName
FROM Employee
INNER JOIN State ON Employee.StateId = State.StateId
Tested in SQL Server.
Assuming you have tables Category and Product related by CategoryID, try this:
var joined = from p in prod.AsEnumerable()
             join c in categ.AsEnumerable()
             on p["categid"] equals c["categid"]
             select new
             {
                 ProductName = p["prodname"],
                 Category = c["name"]
             };
var myjoined = joined.ToList();
Sources
LINQ query on a DataTable
Inner join of DataTables in C#
http://social.msdn.microsoft.com/Forums/en-US/adodotnetdataset/thread/ecb6a83d-b9b0-4e64-8107-1ca8757fe58c/
That was a LINQ solution. You can also loop through the first DataTable and add columns from the second DataTable.
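The loop-based alternative might look something like the sketch below. The names here are illustrative assumptions, not taken from the question: ColumnAppender and AppendColumn are made up, and keyColumn stands for whatever column the two tables share. The second table is indexed in a dictionary first so that the loop over the first table does not rescan it for every row.

```csharp
using System.Collections.Generic;
using System.Data;

public static class ColumnAppender // hypothetical helper class
{
    // Adds 'valueColumn' from 'source' to 'target', matching rows on
    // 'keyColumn'. The row count of 'target' does not change.
    public static void AppendColumn(DataTable target, DataTable source,
        string keyColumn, string valueColumn)
    {
        // Index the source rows by key; a later duplicate key overwrites
        // an earlier one.
        var lookup = new Dictionary<object, object>();
        foreach (DataRow row in source.Rows)
            lookup[row[keyColumn]] = row[valueColumn];

        target.Columns.Add(valueColumn, source.Columns[valueColumn].DataType);

        foreach (DataRow row in target.Rows)
        {
            object value;
            if (lookup.TryGetValue(row[keyColumn], out value))
                row[valueColumn] = value; // unmatched rows stay DBNull
        }
    }
}
```

The key columns need to have the same .NET type in both tables (for example both int), because the dictionary compares boxed values and an int key never equals a long key.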

How to join multiple tables into one

How do I join multiple tables into one?
I have 3 DataTables, each with a single column: Test 1, Test 2 and Test 3.
Each DataTable has 3 rows with values.
I want them in one DataTable, like the example below:
TEST 1 | TEST 2 | TEST 3
I've tried the merge function, dt1.Merge(dt2), but it adds 3 additional rows for every column.
The table is for a DataGridView.
This is an example of how I retrieve a table from the database:
string queryStatusTest =
    "SELECT status AS 'uppgift " + t + "' FROM b_personuppgift "
    + "WHERE uppgiftid IN (SELECT uppgiftid FROM b_uppgift "
    + "WHERE kursid = 'ABC123') "
    + "AND uppgiftid=" + t + " ORDER BY uppgiftid";
DataTable dTest = dataBase.Select(queryStatusTest);
The dataBase.Select() function returns only one table.
Assuming all 3 columns are of type string:
var mergeTable = dt1.AsEnumerable()
    .Concat(dt2.AsEnumerable())
    .Concat(dt3.AsEnumerable())
    .GroupBy(row => new {
        Test1 = row.Field<string>("Test1"),
        Test2 = row.Field<string>("Test2"),
        Test3 = row.Field<string>("Test3")
    })
    .Select(g => g.First())
    .CopyToDataTable();
I guess that by "database" you mean "table". You can insert data from all 3 tables into one using this statement:
INSERT INTO table_all(column1,column2,column3)
VALUES(
(SELECT column1 FROM test1),
(SELECT column1 FROM test2),
(SELECT column1 FROM test3)
)
The most common reason for this problem is that you haven't set the primary key on the DataTables. The Merge method uses the primary key to match rows.
The Merge method documentation in MSDN states:
When merging a new source DataTable into the target, any source rows
with a DataRowState value of Unchanged, Modified, or Deleted, is
matched to target rows with the same primary key values. Source rows
with a DataRowState value of Added are matched to new target rows with
the same primary key values as the new source rows.
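To illustrate the primary-key point, here is a minimal sketch of a keyed merge. It rests on an assumption the question does not confirm: that both tables contain a shared key column (called keyColumn here; something like a hypothetical "id"). The tables in the question have only one column each, so such a key would have to be added to the queries first. The class and method names (KeyedMerge, MergeOnKey) are also made up.

```csharp
using System.Data;

public static class KeyedMerge // hypothetical helper class
{
    // Merges 'source' into 'target', matching rows on 'keyColumn' so that
    // columns are combined instead of rows being appended.
    public static void MergeOnKey(DataTable target, DataTable source, string keyColumn)
    {
        // Setting PrimaryKey requires the key values to be unique and
        // non-null; it throws otherwise.
        target.PrimaryKey = new[] { target.Columns[keyColumn] };
        source.PrimaryKey = new[] { source.Columns[keyColumn] };

        // MissingSchemaAction.Add brings over columns that exist only in
        // 'source'; preserveChanges = false lets source values win on
        // columns the two tables share.
        target.Merge(source, false, MissingSchemaAction.Add);
    }
}
```

Source rows whose key has no match in the target are still appended as new rows, so the row count stays the same only when both tables contain the same set of keys.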
