I'm processing 2 DataTables:
SSFE: Contains the values I want to find
FFE: Is larger, smaller or equally large as SSFE, but does not necessarily contain every value of SSFE
The values I need to match between these tables are integers, both tables are sorted from small to large.
My idea was to start searching on the first item in FFE, start looping through SSFE, and when I find a match -> remember current index -> save match -> select next item from FFE and continue from the previous index.
Also FFE can contain integers, but can also contain strings, that is why I cast the values to a string and compare these.
I made some code, but it takes, too much time.
It will take about a minute to compare SSFE(1.000 items) to FFE(127.000) items.
int whereami = 0;
bool firstiteration = true;
for (int i = 0; i < FFEData.Rows.Count - 1; i++)
{
for (int j = 0; j < SSFEData.Rows.Count - 1; j++)
{
if (firstiteration)
{
j = whereami;
firstiteration = false;
}
if (SSFEData.Rows[j][0] == FFEData.Rows[i][0].ToString())
{
found++;
whereami = j;
firstiteration = true;
break;
}
}
}
I'm only storing how many occurences I have found for testing. In this example it will find 490 matches, not that this is relevant.
Any suggestions would be great!
Could try the DataRelation class. It creates a foreign key / join between two DataTables in a DataSet.
using System.Data;
using System.Text;
public int GetMatches(DataTable table1, DataTable table2)
{
DataSet set = new DataSet();
//wrap the tables in a DataSet.
set.Tables.Add(table1);
set.Tables.Add(table2);
//Creates a ForeignKey like Join between two tables.
//Table1 will be the parent. Table2 will be the child.
DataRelation relation = new DataRelation("IdJoin", table1.Columns[0], table2.Columns[0], false);
//Have the DataSet perform the join.
set.Relations.Add(relation);
int found = 0;
//Loop through table1 without using LINQ.
for(int i = 0; i < table1.Rows.Count; i++)
{
//If any rows in Table2 have the same Id as the current row in Table1
if (table1.Rows[i].GetChildRows(relation).Length > 0)
{
//Add a counter
found++;
//For debugging, proof of match:
//Get the id's that matched.
string id1 = table1.Rows[i][0].ToString();
string id2 = table1.Rows[i].GetChildRows(relation)[0][0].ToString();
}
}
return found;
}
I randomly populated two non-indexed tables with nvarchar(2) strings, with 10,000 rows each. The match took sub 1 second, including the time spent populating the tables. I would get between 3500 and 4000 matches a run on average.
However, the major caveat is that the DataColumns being matched must be the same data type. So if both columns are string, or at least integers stored as string, then this will work.
But if one column is an integer, you will have to add a new column, and store the integers as string in that column first. The string translations will add a hefty amount of time.
Another option will be uploading the tables to a database and performing a query. That large an upload will probably take a few seconds, but the query will be under one second as well. So still better than 60 seconds+.
Related
I made the following code to add external data table to another table in MS word document, its working fine but takes a lot of time in case that the number of rows is more than 100, and in case of adding table with rows count more that 500 it fills the ms word table really slow and can't complete the task.
I tried to hide the document and disable the screen update for the document but still no solution for the slow performance.
//Get the required external data to the DT data table
DataTable DT = XDt.GetData();
Word.Table TB;
int X = 1;
foreach (DataRow Rw in DT.Rows)
{
Word.Row Rn = TB.Rows.Add(TB.Rows[X + 1]);
for(int i=0;i<=DT.Columns.Count-1;i++)
{
Rn.Cells[i+1].Range.Text = Rw[i].ToString());
}
X++;
}
So is there a way to make this process go faster ?
The most efficient way to add a table to Word is to first concatenate the data in a delimited text string, where "/n" must be the symbol for end-of-row (record separator). The end-of-cell (field separator) can be any character you like that's not in the string content that makes up the table.
Assign this string to a Range object, then use the ConvertToTable() method to create the table.
You're retrieving the last row of the current table for the BeforeRow parameter of TB.Rows.Add. This is significantly slower than simply adding the row. You should replace this:
Word.Row Rn = TB.Rows.Add(TB.Rows[X + 1]);
With this:
Word.Row Rn = TB.Rows.Add();
Utilizing parallelization as suggested in the comments might help slightly, but I'm afraid it's not going to do much good seeing the table add code runs on the main thread as mentioned in this link.
EDIT:
If performance is still an issue, I'd look into creating the Word table independently of the Word object model by using OpenXML. It's orders of magnitude faster.
ConvertToTable method is orders of magnitude faster than adding Rows/Cells one at a time.
while (reader.Read())
{
values = new object[reader.FieldCount];
var cols = reader.GetValues(values);
var item = String.Join("\t", values);
items.Add(item);
};
data = String.Join("\n", items.ToArray());
var tempDocument = application.Documents.Add();
var range = tempDocument.Range();
range.Text = data;
var tempTable = range.ConvertToTable(Separator: Microsoft.Office.Interop.Word.WdTableFieldSeparator.wdSeparateByTabs,
NumColumns: reader.FieldCount,
NumRows: rows, DefaultTableBehavior: WdDefaultTableBehavior.wdWord9TableBehavior,
AutoFitBehavior: WdAutoFitBehavior.wdAutoFitWindow);
I want to insert around 1 million records into a database using Linq in ASP.NET MVC. But when I try the following code it didn't work. It's throwing an OutOfMemoryException. And also it took 3 days in the loop. Can anyone please help me on this???
db.Database.ExecuteSqlCommand("DELETE From [HotelServices]");
DataTable tblRepeatService = new DataTable();
tblRepeatService.Columns.Add("HotelCode",typeof(System.String));
tblRepeatService.Columns.Add("Service",typeof(System.String));
tblRepeatService.Columns.Add("Category",typeof(System.String));
foreach (DataRow row in xmltable.Rows)
{
string[] servicesarr = Regex.Split(row["PAmenities"].ToString(), ";");
for (int a = 0; a < servicesarr.Length; a++)
{
tblRepeatService.Rows.Add(row["HotelCode"].ToString(), servicesarr[a], "PA");
}
String[] servicesarrA = Regex.Split(row["RAmenities"].ToString(), ";");
for (int b = 0; b < servicesarrA.Length; b++)
{
tblRepeatService.Rows.Add(row["hotelcode"].ToString(), servicesarrA[b], "RA");
}
}
HotelAmenties _hotelamenties;
foreach (DataRow hadr in tblRepeatService.Rows)
{
_hotelamenties = new HotelAmenties();
_hotelamenties.Id = Guid.NewGuid();
_hotelamenties.ServiceName = hadr["Service"].ToString();
_hotelamenties.HotelCode = hadr["HotelCode"].ToString();
db.HotelAmenties.Add(_hotelamenties);
}
db.SaveChanges();
tblRepeatService table has around 1 million rows.
Bulk inserts like this are highly inefficient in LINQtoSQL. Every insert creates at least three objects (the DataRow, the HotelAmenities object and the tracking record for it), chewing up memory on objects you don't need.
Given that you already have a DataTable, you can use System.Data.SqlClient.SqlBulkCopy to push the content of the table to a temporary table on the SQL server, then use a single insert statement to load the data into its final destination. This is the fastest way I have found so far to move many thousands of records from memory to SQL.
If performance doesn't matter and this is a 1 shot job you can stick to the way you're using. Your problem is you're only saving at the end, so entity Framework has to store and generate the SQL for 1 million operations at once, modify your code so that you save every 1000 or so inserts instead of only at the end and it should work just fine.
int i = 0;
foreach (DataRow hadr in tblRepeatService.Rows)
{
_hotelamenties = new HotelAmenties();
_hotelamenties.Id = Guid.NewGuid();
_hotelamenties.ServiceName = hadr["Service"].ToString();
_hotelamenties.HotelCode = hadr["HotelCode"].ToString();
db.HotelAmenties.Add(_hotelamenties);
if((i%1000)==0){
db.SaveChanges();
}
i++;
}
db.SaveChanges();
I am using c#, asp.net
I have two different tables from two different databases.
Both Have one field in common that I am interested in say custid
I have assigned both to data tables but what I would like to do is this
Compare the two datatables (maybe with a dataview, which I have never used before)
to see what custid are missing from either table,
then i can insert or delete them from the first table , as I want the first table to have the same custid's as the second table.
Table 1 Table 2
1 1
2 - want to delete from table 1
3 3
4 - want to add to table 1
Any help would be appreciated.
As I didn't want to display the data just manipulate it
What I did in the end was
string [] servercustomers = new string [servercustomertable.lenght];
for (i = 0; i < servercustomertable.length; i++)
{
// run query to see if current record exists in localtable
// if it does update it
// if it doesnt create it
servercustomer[i] = servercustomertable.Rows[i]["Custid"];
}
// check through the localtable
for (i = 0; i < localcustomertable.length; i++)
{ bool recordexist = false;
// check through the array to see if the custid is there
for (j = 0; j < servercustomer.length; j++)
{
if (servercustomer[j] = localcustomertable.Rows[i]["custId"])
{
recordexists = true;
break;
}
if(!recordexists)
// delete the record from the local table
}
}
Thanks for all your help
Rachael
I'm importing the data from three Tab delimited files in the DataTables and after that I need to go thru every row of master table and find all the rows in two child tables. Against each DataRow[] array I found from the child tables, I have to again go thru individually each row and check the values based upon different paramenters and at the end I need to create a final record which will be merger of master and two child table columns.
Now I have done that and its working but the problem is its Performance. I'm using the DataTable.Select to find all child rows from child table which I believe making it very slow.
Please remember the None of the table has any Primary key as the duplicate rows are acceptable.
At the moment I have 1200 rows in master table and aroun 8000 rows in child table and the total time it takes to do that is 8 minutes.
Any idea how can I increase the Performance.
Thanks in advance
The code is below ***************
DataTable rawMasterdt = importMasterFile();
DataTable rawDespdt = importDescriptionFile();
dsHelper = new DataSetHelper();
DataTable distinctdt = new DataTable();
distinctdt = dsHelper.SelectDistinct("DistinctOffers", rawMasterdt, "C1");
if (distinctdt.Rows.Count > 0)
{
int count = 0;
foreach (DataRow offer in distinctdt.Rows)
{
string exp = "C1 = " + "'" + offer[0].ToString() + "'" + "";
DataRow masterRow = rawMasterdt.Select(exp)[0];
count++;
txtBlock1.Text = "Importing Offer " + count.ToString() + " of " + distinctdt.Rows.Count.ToString();
if (masterRow != null )
{
Product newProduct = new Product();
newProduct.Code = masterRow["C4"].ToString();
newProduct.Name = masterRow["C5"].ToString();
// -----
newProduct.Description = getProductDescription(offer[0].ToString(), rawDespdt);
newProduct.Weight = getProductWeight(offer[0].ToString(), rawDespdt);
newProduct.Price = getProductRetailPrice(offer[0].ToString(), rawDespdt);
newProduct.UnitPrice = getProductUnitPrice(offer[0].ToString(), rawDespdt);
// ------- more functions similar to above here
productList.Add(newProduct);
}
}
txtBlock1.Text = "Import Completed";
public string getProductDescription(string offercode, DataTable dsp)
{
string exp = "((C1 = " + "'" + offercode + "')" + " AND ( C6 = 'c' ))";
DataRow[] dRows = dsp.Select( exp);
string descrip = "";
if (dRows.Length > 0)
{
for (int i = 0; i < dRows.Length - 1; i++)
{
descrip = descrip + " " + dRows[i]["C12"];
}
}
return descrip;
}
.Net 4.5 and the issue is still there.
Here are the results of a simple benchmark where DataTable.Select and different dictionary implementations are compared for CPU time (results are in milliseconds)
#Rows Table.Select Hashtable[] SortedList[] Dictionary[]
1000 43,31 0,01 0,06 0,00
6000 291,73 0,07 0,13 0,01
11000 604,79 0,04 0,16 0,02
16000 914,04 0,05 0,19 0,02
21000 1279,67 0,05 0,19 0,02
26000 1501,90 0,05 0,17 0,02
31000 1738,31 0,07 0,20 0,03
Problem:
The DataTable.Select method creates a "System.Data.Select" class instance internally, and this "Select" class creates indexes based on the fields (columns) specified in the query. The Select class makes re-use of the indexes it had created but the DataTable implementation does not re-use the Select class instance hence the indexes are re-created every time DataTable.Select is invoked. (This behaviour can be observed by decompiling System.Data)
Solution:
Assume the following query
DataRow[] rows = data.Select("COL1 = 'VAL1' AND (COL2 = 'VAL2' OR COL2 IS NULL)");
Instead, create and fill a Dictionary with keys corresponding to the different value combinations of the values of the columns used as the filter. (This relatively expensive operation must be done only once and the dictionary instance must then be re-used)
Dictionary<string, List<DataRow>> di = new Dictionary<string, List<DataRow>>();
foreach (DataRow dr in data.Rows)
{
string key = (dr["COL1"] == DBNull.Value ? "<NULL>" : dr["COL1"]) + "//" + (dr["COL2"] == DBNull.Value ? "<NULL>" : dr["COL2"]);
if (di.ContainsKey(key))
{
di[key].Add(dr);
}
else
{
di.Add(key, new List<DataRow>());
di[key].Add(dr);
}
}
Query the Dictionary (multiple queries may be required) to filter the rows and combine the results into a List
string key1 = "VAL1//VAL2";
string key2 = "VAL1//<NULL>";
List<DataRow>() results = new List<DataRow>();
if (di.ContainsKey(key1))
{
results.AddRange(di[key1]);
}
if (di.ContainsKey(key2))
{
results.AddRange(di[key2]);
}
I know this is an old question, and code underpinning this issue may have changed, but I've recently encountered (and gain some insight into) this very issue.
For anyone coming along at a later date ... here's what I found.
Performance of the DataTable.Select(condition) is quite sensitive to the nature and structure of the 'condition' you provide. This looks like a bug to me (where would I report it to Microsoft?) but it may merely be a quirk.
I've written a set of tests to demonstrate the issue that are structured as follows:
Define a datatable with a few simple columns,like this:
var dataTable = new DataTable();
var idCol = dataTable.Columns.Add("Id", typeof(Int32));
dataTable.Columns.Add("Code", typeof(string));
dataTable.Columns.Add("Name", typeof(string));
dataTable.Columns.Add("FormationDate", typeof(DateTime));
dataTable.Columns.Add("Income", typeof(Decimal));
dataTable.Columns.Add("ChildCount", typeof(Int32));
dataTable.Columns.Add("Foreign", typeof(Boolean));
dataTable.PrimaryKey = new DataColumn[1] { idCol };
Populate the table with 40000 records, each with a unique 'Code' field.
Perform a batch of 'selects' (each with different parameters) against the datatable using two similar, but differently formatted, queries and record and compare the total time taken by each of the two formats.
You get remarkable results. Testing, for example, the below two conditions side-by-side:
Q1: [Code] = 'XX'
Q2: ([Code] = 'XX')
[ I do multiple Select calls using the above two queries, each iteration I replace the XX with a valid code that exists in the datatable ]
The result?
Time comparison for 320 lookups against 40000 records: 180 msec total search time with no brackets, 6871 msec total search time for search WITH brackets
Yes - 38 times slower if you just have the extra brackets surrounding the condition.
There are other scenarios which react differently.
For example,
[Code] = '{searchCode}' OR 1=0 vs ([Code] = '{searchCode}' OR 1=0) take similar (slow) times to execute, but:
[Code] = '{searchCode}' AND 1=1 vs ([Code] = '{searchCode}' AND 1=1) again shows the non-bracketed version to be close to 40 times faster.
I've not investigated all scenarios, but it seems that the introduction of brackets - either redundantly around a simple comparison check, or as required to specify sub-expression precedence - or the presence of an 'OR' slows the query down considerably.
I could speculate that the issue is caused by how the datatable parses the condition you use and how it creates and uses internal indexes ... but I won't.
You can speed it up a lot by using a dictionary. For example:
if (distinctdt.Rows.Count > 0)
{
// build index of C1 values to speed inner loop
Dictionary<string, DataRow> masterIndex = new Dictionary<string, DataRow>();
foreach (DataRow row in rawMasterdt.Rows)
masterIndex[row["C1"].ToString()] = row;
int count = 0;
foreach (DataRow offer in distinctdt.Rows)
{
Then in place of
string exp = "C1 = " + "'" + offer[0].ToString() + "'" + "";
DataRow masterRow = rawMasterdt.Select(exp)[0];
You would do this
DataRow masterRow;
if (masterIndex.ContainsKey(offer[0].ToString())
masterRow = masterIndex[offer[0].ToString()];
else
masterRow = null;
If you create a DataRelation between your parent and child DataTables, you can look up child rows by invoking DataRow.GetChildRows(DataRelation) on the parent row (resp. DataRow.GetChildRelName in case of typed DataSets). The search will apply a TreeMap lookup, and performance should be fine even with a lot of child rows.
In case you have to search for rows based on other criteria than on a DataRelation's foreign keys, I recommend to use DataView.Sort / DataView.FindRows() instead of DataTable.Select(), as soon as you have to query the data more than once. DataView.FindRows() will be based on TreeMap lookup (O(log(N)), where as DataTable.Select() has to scan all rows (O(N)). This article contains more details: http://arnosoftwaredev.blogspot.com/2011/02/when-datatableselect-is-slow-use.html
DataTables can be made to have Relationships with other DataTables in a DataSet. See http://msdn.microsoft.com/en-us/library/ay82azad%28VS.71%29.aspx for a bit of discussion and as a start point to browsing. I've not much experience of using them but as I understand it they will do what you want (assuming your tables are in a suitable format). I would assume that these have greater efficiency than a manual process of doing the same but I may be wrong. Might be worth seeing if they work for you and benchmarking to see if they are an improvement or not...
Have you ran it through a profiler? That should be the first step. Anyhow, this might help:
Read the master text file into memory line by line. Put the master record into a dictionary as the key. Add it to the dataset (1 pass through master).
Read child text file line by line, add this as a value for the appropriate master record in the dictionary created above
Now you have everything in the dictionary in memory, only doing 1 pass through each file.
Do a final pass through the dictionary/children and process each column and perform final calcs.
I have two DataTables, A and B, produced from CSV files. I need to be able to check which rows exist in B that do not exist in A.
Is there a way to do some sort of query to show the different rows or would I have to iterate through each row on each DataTable to check if they are the same? The latter option seems to be very intensive if the tables become large.
Assuming you have an ID column which is of an appropriate type (i.e. gives a hashcode and implements equality) - string in this example, which is slightly pseudocode because I'm not that familiar with DataTables and don't have time to look it all up just now :)
IEnumerable<string> idsInA = tableA.AsEnumerable().Select(row => (string)row["ID"]);
IEnumerable<string> idsInB = tableB.AsEnumerable().Select(row => (string)row["ID"]);
IEnumerable<string> bNotA = idsInB.Except(idsInA);
would I have to iterate through each row on each DataTable to check if they are the same.
Seeing as you've loaded the data from a CSV file, you're not going to have any indexes or anything, so at some point, something is going to have to iterate through every row, whether it be your code, or a library, or whatever.
Anyway, this is an algorithms question, which is not my specialty, but my naive approach would be as follows:
1: Can you exploit any properties of the data? Are all the rows in each table unique, and can you sort them both by the same criteria? If so, you can do this:
Sort both tables by their ID (using some useful thing like a quicksort). If they're already sorted then you win big.
Step through both tables at once, skipping over any gaps in ID's in either table. Matched ID's mean duplicated records.
This allows you to do it in (sort time * 2 ) + one pass, so if my big-O-notation is correct, it'd be (whatever-sort-time) + O(m+n) which is pretty good.
(Revision: this is the approach that ΤΖΩΤΖΙΟΥ describes )
2: An alternative approach, which may be more or less efficient depending on how big your data is:
Run through table 1, and for each row, stick it's ID (or computed hashcode, or some other unique ID for that row) into a dictionary (or hashtable if you prefer to call it that).
Run through table 2, and for each row, see if the ID (or hashcode etc) is present in the dictionary. You're exploiting the fact that dictionaries have really fast - O(1) I think? lookup. This step will be really fast, but you'll have paid the price doing all those dictionary inserts.
I'd be really interested to see what people with better knowledge of algorithms than myself come up with for this one :-)
You can use the Merge and GetChanges methods on the DataTable to do this:
A.Merge(B); // this will add to A any records that are in B but not A
return A.GetChanges(); // returns records originally only in B
The answers so far assume that you're simply looking for duplicate primary keys. That's a pretty easy problem - you can use the Merge() method, for instance.
But I understand your question to mean that you're looking for duplicate DataRows. (From your description of the problem, with both tables being imported from CSV files, I'd even assume that the original rows didn't have primary key values, and that any primary keys are being assigned via AutoNumber during the import.)
The naive implementation (for each row in A, compare its ItemArray with that of each row in B) is indeed going to be computationally expensive.
A much less expensive way to do this is with a hashing algorithm. For each DataRow, concatenate the string values of its columns into a single string, and then call GetHashCode() on that string to get an int value. Create a Dictionary<int, DataRow> that contains an entry, keyed on the hash code, for each DataRow in DataTable B. Then, for each DataRow in DataTable A, calculate the hash code, and see if it's contained in the dictionary. If it's not, you know that the DataRow doesn't exist in DataTable B.
This approach has two weaknesses that both emerge from the fact that two strings can be unequal but produce the same hash code. If you find a row in A whose hash is in the dictionary, you then need to check the DataRow in the dictionary to verify that the two rows are really equal.
The second weakness is more serious: it's unlikely, but possible, that two different DataRows in B could hash to the same key value. For this reason, the dictionary should really be a Dictionary<int, List<DataRow>>, and you should perform the check described in the previous paragraph against each DataRow in the list.
It takes a fair amount of work to get this working, but it's an O(m+n) algorithm, which I think is going to be as good as it gets.
Just FYI:
Generally speaking about algorithms, comparing two sets of sortable (as ids typically are) is not an O(M*N/2) operation, but O(M+N) if the two sets are ordered. So you scan one table with a pointer to the start of the other, and:
other_item= A.first()
only_in_B= empty_list()
for item in B:
while other_item > item:
other_item= A.next()
if A.eof():
only_in_B.add( all the remaining B items)
return only_in_B
if item < other_item:
empty_list.append(item)
return only_in_B
The code above is obviously pseudocode, but should give you the general gist if you decide to code it yourself.
Thanks for all the feedback.
I do not have any index's unfortunately. I will give a little more information about my situation.
We have a reporting program (replaced Crystal reports) that is installed in 7 Servers across EU. These servers have many reports on them (not all the same for each country). They are invoked by a commandline application that uses XML files for their configuration. So One XML file can call multiple reports.
The commandline application is scheduled and controlled by our overnight process. So the XML file could be called from multiple places.
The goal of the CSV is to produce a list of all the reports that are being used and where they are being called from.
I am going through the XML files for all references, querying the scheduling program and producing a list of all the reports. (this is not too bad).
The problem I have is I have to keep a list of all the reports that might have been removed from production. So I need to compare the old CSV with the new data. For this I thought it best to put it into DataTables and compare the information, (this could be the wrong approach. I suppose I could create an object that holds it and compares the difference then create iterate through them).
The data I have about each report is as follows:
String - Task Name
String - Action Name
Int - ActionID (the Action ID can be in multiple records as a single action can call many reports, i.e. an XML file).
String - XML File called
String - Report Name
I will try the Merge idea given by MusiGenesis (thanks). (rereading some of the posts not sure if the Merge will work, but worth trying as I have not heard about it before so something new to learn).
The HashCode Idea sounds interesting as well.
Thanks for all the advice.
I found an easy way to solve this. Unlike previous "except method" answers, I use the except method twice. This not only tells you what rows were deleted but what rows were added. If you only use one except method - it will only tell you one difference and not both. This code is tested and works. See below
//Pass in your two datatables into your method
//build the queries based on id.
var qry1 = datatable1.AsEnumerable().Select(a => new { ID = a["ID"].ToString() });
var qry2 = datatable2.AsEnumerable().Select(b => new { ID = b["ID"].ToString() });
//detect row deletes - a row is in datatable1 except missing from datatable2
var exceptAB = qry1.Except(qry2);
//detect row inserts - a row is in datatable2 except missing from datatable1
var exceptAB2 = qry2.Except(qry1);
then execute your code against the results
if (exceptAB.Any())
{
foreach (var id in exceptAB)
{
//execute code here
}
}
if (exceptAB2.Any())
{
foreach (var id in exceptAB2)
{
//execute code here
}
}
Could you not simply compare the CSV files before loading them into DataTables?
string[] a = System.IO.File.ReadAllLines(#"cvs_a.txt");
string[] b = System.IO.File.ReadAllLines(#"csv_b.txt");
// get the lines from b that are not in a
IEnumerable<string> diff = b.Except(a);
//... parse b into DataTable ...
public DataTable compareDataTables(DataTable First, DataTable Second)
{
First.TableName = "FirstTable";
Second.TableName = "SecondTable";
//Create Empty Table
DataTable table = new DataTable("Difference");
DataTable table1 = new DataTable();
try
{
//Must use a Dataset to make use of a DataRelation object
using (DataSet ds4 = new DataSet())
{
//Add tables
ds4.Tables.AddRange(new DataTable[] { First.Copy(), Second.Copy() });
//Get Columns for DataRelation
DataColumn[] firstcolumns = new DataColumn[ds4.Tables[0].Columns.Count];
for (int i = 0; i < firstcolumns.Length; i++)
{
firstcolumns[i] = ds4.Tables[0].Columns[i];
}
DataColumn[] secondcolumns = new DataColumn[ds4.Tables[1].Columns.Count];
for (int i = 0; i < secondcolumns.Length; i++)
{
secondcolumns[i] = ds4.Tables[1].Columns[i];
}
//Create DataRelation
DataRelation r = new DataRelation(string.Empty, firstcolumns, secondcolumns, false);
ds4.Relations.Add(r);
//Create columns for return table
for (int i = 0; i < First.Columns.Count; i++)
{
table.Columns.Add(First.Columns[i].ColumnName, First.Columns[i].DataType);
}
//If First Row not in Second, Add to return table.
table.BeginLoadData();
foreach (DataRow parentrow in ds4.Tables[0].Rows)
{
DataRow[] childrows = parentrow.GetChildRows(r);
if (childrows == null || childrows.Length == 0)
table.LoadDataRow(parentrow.ItemArray, true);
table1.LoadDataRow(childrows, false);
}
table.EndLoadData();
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
return table;
}
try
{
if (ds.Tables[0].Columns.Count == ds1.Tables[0].Columns.Count)
{
for (int i = 0; i < ds.Tables[0].Rows.Count; i++)
{
for (int j = 0; j < ds.Tables[0].Columns.Count; j++)
{
if (ds.Tables[0].Rows[i][j].ToString() == ds1.Tables[0].Rows[i][j].ToString())
{
}
else
{
MessageBox.Show(i.ToString() + "," + j.ToString());
}
}
}
}
else
{
MessageBox.Show("Table has different columns ");
}
}
catch (Exception)
{
MessageBox.Show("Please select The Table");
}
I'm continuing tzot's idea ...
If you have two sortable sets, then you can just use:
List<string> diffList = new List<string>(sortedListA.Except(sortedListB));
If you need more complicated objects, you can define a comparator yourself and still use it.
The usual usage scenario considers a user that has a DataTable in hand and changes it by Adding, Deleting or Modifying some of the DataRows.
After the changes are performed, the DataTable is aware of the proper DataRowState for each row, and also keeps track of the Original DataRowVersion for any rows that were changed.
In this usual scenario, one can Merge the changes back into a source table (in which all rows are Unchanged). After merging, one can get a nice summary of only the changed rows with a call to GetChanges().
In a more unusual scenario, a user has two DataTables with the same schema (or perhaps only the same columns and lacking primary keys). These two DataTables consist of only Unchanged rows. The user may want to find out what changes does he need to apply to one of the two tables in order to get to the other one. That is, which rows need to be Added, Deleted, or Modified.
We define here a function called GetDelta() which does the job:
using System;
using System.Data;
using System.Xml;
using System.Linq;
using System.Collections.Generic;
using System.Data.DataSetExtensions;
public class Program
{
private static DataTable GetDelta(DataTable table1, DataTable table2)
{
// Modified2 : row1 keys match rowOther keys AND row1 does not match row2:
IEnumerable<DataRow> modified2 = (
from row1 in table1.AsEnumerable()
from row2 in table2.AsEnumerable()
where table1.PrimaryKey.Aggregate(true, (boolAggregate, keycol) => boolAggregate & row1[keycol].Equals(row2[keycol.Ordinal]))
&& !row1.ItemArray.SequenceEqual(row2.ItemArray)
select row2);
// Modified1 :
IEnumerable<DataRow> modified1 = (
from row1 in table1.AsEnumerable()
from row2 in table2.AsEnumerable()
where table1.PrimaryKey.Aggregate(true, (boolAggregate, keycol) => boolAggregate & row1[keycol].Equals(row2[keycol.Ordinal]))
&& !row1.ItemArray.SequenceEqual(row2.ItemArray)
select row1);
// Added : row2 not in table1 AND row2 not in modified2
IEnumerable<DataRow> added = table2.AsEnumerable().Except(modified2, DataRowComparer.Default).Except(table1.AsEnumerable(), DataRowComparer.Default);
// Deleted : row1 not in row2 AND row1 not in modified1
IEnumerable<DataRow> deleted = table1.AsEnumerable().Except(modified1, DataRowComparer.Default).Except(table2.AsEnumerable(), DataRowComparer.Default);
Console.WriteLine();
Console.WriteLine("modified count =" + modified1.Count());
Console.WriteLine("added count =" + added.Count());
Console.WriteLine("deleted count =" + deleted.Count());
DataTable deltas = table1.Clone();
foreach (DataRow row in modified2)
{
// Match the unmodified version of the row via the PrimaryKey
DataRow matchIn1 = modified1.Where(row1 => table1.PrimaryKey.Aggregate(true, (boolAggregate, keycol) => boolAggregate & row1[keycol].Equals(row[keycol.Ordinal]))).First();
DataRow newRow = deltas.NewRow();
// Set the row with the original values
foreach(DataColumn dc in deltas.Columns)
newRow[dc.ColumnName] = matchIn1[dc.ColumnName];
deltas.Rows.Add(newRow);
newRow.AcceptChanges();
// Set the modified values
foreach (DataColumn dc in deltas.Columns)
newRow[dc.ColumnName] = row[dc.ColumnName];
// At this point newRow.DataRowState should be : Modified
}
foreach (DataRow row in added)
{
DataRow newRow = deltas.NewRow();
foreach (DataColumn dc in deltas.Columns)
newRow[dc.ColumnName] = row[dc.ColumnName];
deltas.Rows.Add(newRow);
// At this point newRow.DataRowState should be : Added
}
foreach (DataRow row in deleted)
{
DataRow newRow = deltas.NewRow();
foreach (DataColumn dc in deltas.Columns)
newRow[dc.ColumnName] = row[dc.ColumnName];
deltas.Rows.Add(newRow);
newRow.AcceptChanges();
newRow.Delete();
// At this point newRow.DataRowState should be : Deleted
}
return deltas;
}
private static void DemonstrateGetDelta()
{
DataTable table1 = new DataTable("Items");
// Add columns
DataColumn column1 = new DataColumn("id1", typeof(System.Int32));
DataColumn column2 = new DataColumn("id2", typeof(System.Int32));
DataColumn column3 = new DataColumn("item", typeof(System.Int32));
table1.Columns.Add(column1);
table1.Columns.Add(column2);
table1.Columns.Add(column3);
// Set the primary key column.
table1.PrimaryKey = new DataColumn[] { column1, column2 };
// Add some rows.
DataRow row;
for (int i = 0; i <= 4; i++)
{
row = table1.NewRow();
row["id1"] = i;
row["id2"] = i*i;
row["item"] = i;
table1.Rows.Add(row);
}
// Accept changes.
table1.AcceptChanges();
PrintValues(table1, "table1:");
// Create a second DataTable identical to the first.
DataTable table2 = table1.Clone();
// Add a row that exists in table1:
row = table2.NewRow();
row["id1"] = 0;
row["id2"] = 0;
row["item"] = 0;
table2.Rows.Add(row);
// Modify the values of a row that exists in table1:
row = table2.NewRow();
row["id1"] = 1;
row["id2"] = 1;
row["item"] = 455;
table2.Rows.Add(row);
// Modify the values of a row that exists in table1:
row = table2.NewRow();
row["id1"] = 2;
row["id2"] = 4;
row["item"] = 555;
table2.Rows.Add(row);
// Add a row that does not exist in table1:
row = table2.NewRow();
row["id1"] = 13;
row["id2"] = 169;
row["item"] = 655;
table2.Rows.Add(row);
table2.AcceptChanges();
Console.WriteLine();
PrintValues(table2, "table2:");
DataTable delta = GetDelta(table1,table2);
Console.WriteLine();
PrintValues(delta,"delta:");
// Verify that the deltas DataTable contains the adequate Original DataRowVersions:
DataTable originals = table1.Clone();
foreach (DataRow drow in delta.Rows)
{
if (drow.RowState != DataRowState.Added)
{
DataRow originalRow = originals.NewRow();
foreach (DataColumn dc in originals.Columns)
originalRow[dc.ColumnName] = drow[dc.ColumnName, DataRowVersion.Original];
originals.Rows.Add(originalRow);
}
}
originals.AcceptChanges();
Console.WriteLine();
PrintValues(originals,"delta original values:");
}
private static void Row_Changed(object sender,
DataRowChangeEventArgs e)
{
Console.WriteLine("Row changed {0}\t{1}",
e.Action, e.Row.ItemArray[0]);
}
private static void PrintValues(DataTable table, string label)
{
// Display the values in the supplied DataTable:
Console.WriteLine(label);
foreach (DataRow row in table.Rows)
{
foreach (DataColumn col in table.Columns)
{
Console.Write("\t " + row[col, row.RowState == DataRowState.Deleted ? DataRowVersion.Original : DataRowVersion.Current].ToString());
}
Console.Write("\t DataRowState =" + row.RowState);
Console.WriteLine();
}
}
public static void Main()
{
DemonstrateGetDelta();
}
}
The code above can be tested in https://dotnetfiddle.net/. The resulting output is shown below:
table1:
0 0 0 DataRowState =Unchanged
1 1 1 DataRowState =Unchanged
2 4 2 DataRowState =Unchanged
3 9 3 DataRowState =Unchanged
4 16 4 DataRowState =Unchanged
table2:
0 0 0 DataRowState =Unchanged
1 1 455 DataRowState =Unchanged
2 4 555 DataRowState =Unchanged
13 169 655 DataRowState =Unchanged
modified count =2
added count =1
deleted count =2
delta:
1 1 455 DataRowState =Modified
2 4 555 DataRowState =Modified
13 169 655 DataRowState =Added
3 9 3 DataRowState =Deleted
4 16 4 DataRowState =Deleted
delta original values:
1 1 1 DataRowState =Unchanged
2 4 2 DataRowState =Unchanged
3 9 3 DataRowState =Unchanged
4 16 4 DataRowState =Unchanged
Note that if your tables don't have a PrimaryKey, the where clause in the LINQ queries gets simplified a little bit. I'll let you figure that out on your own.
Achieve it simply using linq.
private DataTable CompareDT(DataTable TableA, DataTable TableB)
{
DataTable TableC = new DataTable();
try
{
var idsNotInB = TableA.AsEnumerable().Select(r => r.Field<string>(Keyfield))
.Except(TableB.AsEnumerable().Select(r => r.Field<string>(Keyfield)));
TableC = (from row in TableA.AsEnumerable()
join id in idsNotInB
on row.Field<string>(ddlColumn.SelectedItem.ToString()) equals id
select row).CopyToDataTable();
}
catch (Exception ex)
{
lblresult.Text = ex.Message;
ex = null;
}
return TableC;
}