Find intersecting DataRows in a List of DataTables - C#

I have a List<DataTable>. I would like to filter the rows in the list of tables to find the rows that appear in every DataTable in the list.
If possible, the comparison should be on the "ID" column that is present in every row.
I have tried to solve this with Linq but got stuck. This is what I have so far:
List<DataTable> dataTables = new List<DataTable>();
// fill up the list
List<DataRow> dataRows = dataTables
    .SelectMany(dt => dt.Rows.Cast<DataRow>().AsEnumerable())
    .Aggregate((r1, r2) => r1.Intersect(r2));
Any suggestions?

Not a simple question. Here's a solution (which seems too complicated to me, but it works):
Obtain the ID value from each row using LINQ to DataSets.
Intersect the multiple lists of IDs to find all the common values.
Take a single occurrence of a row from among all the rows that have one of the matching IDs.
To use LINQ on a DataTable, see this article for a start.
You could get the ids from one table like this
var ids = dt.AsEnumerable().Select(d => d.Field<int>("ID"));
and from multiple tables
var setsOfIds = dataTables.Select(
    t => t.AsEnumerable().Select(x => x.Field<int>("ID")));
To intersect multiple lists, try this article. Using one of the methods there you could obtain the intersection of all of the ids.
Using Jon Skeet's helper method
public static class MyExtensions
{
    // Seed a HashSet with the first list, then intersect it with each of the rest.
    public static List<T> IntersectAll<T>(this IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> hashSet = new HashSet<T>(lists.First());
        foreach (var list in lists.Skip(1))
        {
            hashSet.IntersectWith(list);
        }
        return hashSet.ToList();
    }
}
we can write
var commonIds = setsOfIds.IntersectAll();
Now flatten all the rows from the DataTables and filter by the common ids:
var rows = dataTables.SelectMany(t => t.AsEnumerable())
    .Where(r => commonIds.Contains(r.Field<int>("ID")));
Now group by id and take the first instance of each row:
var result = rows.GroupBy(r => r.Field<int>("ID")).Select(g => g.First());
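Putting the steps together, a minimal end-to-end sketch (assuming every table has an int "ID" column as in the question, and that the list is non-empty, since IntersectAll calls First()):
using System.Collections.Generic;
using System.Data;
using System.Linq;

List<DataTable> dataTables = new List<DataTable>();
// ... fill up the list ...

// One sequence of IDs per table.
var setsOfIds = dataTables.Select(
    t => t.AsEnumerable().Select(r => r.Field<int>("ID")));

// IDs present in every table (helper method from above).
var commonIds = setsOfIds.IntersectAll();

// Flatten, keep rows with a common ID, take one row per ID.
List<DataRow> result = dataTables
    .SelectMany(t => t.AsEnumerable())
    .Where(r => commonIds.Contains(r.Field<int>("ID")))
    .GroupBy(r => r.Field<int>("ID"))
    .Select(g => g.First())
    .ToList();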

Try this to find the intersection between two row sets (DataRow has no Id property, so key on the "ID" field):
r1.Join(r2, a => a.Field<int>("ID"), b => b.Field<int>("ID"), (a, b) => a);
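Note that Join only handles two sequences at a time; to cover the whole list you could fold it with Aggregate. A sketch, assuming an int "ID" column as above (beware that Join emits one result per matching pair, so duplicate IDs within a table produce duplicate rows):
var common = dataTables
    .Select(t => t.AsEnumerable())
    .Aggregate((acc, next) => acc.Join(next,
        a => a.Field<int>("ID"),
        b => b.Field<int>("ID"),
        (a, b) => a));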

Related

LINQ - Simplify two lists by keeping index and removing items by condition

I need to simplify two lists using LINQ, keeping their indexes aligned and removing pairs where either partner is null/empty. Or, if I could combine them into a dictionary of key-value pairs (int and decimal), that would also be great.
list1 = ["1", "2", "4", "5", "6"]
list2 = ["20.20", "", "", "50.0", ""]
to
list1 = ["1", "5"]
list2 = ["20.20", "50.0"]
I get the lists from a form collection of paymentcategory and amounts. The categories and amounts are dynamically generated.
You can work with Enumerable.Zip() to combine both lists into one and then perform the filtering.
using System.Linq;
var combineList = list1.Zip(list2,
        (a, b) => new { a = a?.ToString(), b = b?.ToString() })
    .Where(x => !String.IsNullOrEmpty(x.a)
             && !String.IsNullOrEmpty(x.b))
    .ToList();
list1 = combineList.Select(x => x.a).ToList();
list2 = combineList.Select(x => (object)x.b).ToList();
Demo @ .NET Fiddle
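If the end goal is the Dictionary<int, decimal> mentioned in the question, a sketch along the same lines (assuming every pair that survives the filter parses cleanly):
using System.Globalization;

var data = list1.Zip(list2, (a, b) => (Key: a, Value: b?.ToString()))
    .Where(p => !string.IsNullOrEmpty(p.Key) && !string.IsNullOrEmpty(p.Value))
    .ToDictionary(p => int.Parse(p.Key),
                  p => decimal.Parse(p.Value, CultureInfo.InvariantCulture));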
Here is a solution using a loop:
var data = new Dictionary<int, decimal>();
if (list1.Count == list2.Count)
{
    for (var i = 0; i < list1.Count; i++)
    {
        // Keep only pairs where both entries are non-empty.
        if (!string.IsNullOrEmpty(list1[i]) && !string.IsNullOrEmpty(list2[i]))
            data.Add(Convert.ToInt32(list1[i]), Convert.ToDecimal(list2[i]));
    }
}
Now the data variable holds a Dictionary with the int and decimal data.
Note: both lists need to contain the same number of items; the count check above guards against an out-of-range access.

Delete rows in datatable from another datatable

I have two DataTables: allRows and rowsToDelete. I want to delete the rows from allRows that rowsToDelete contains. Both tables have the same structure (same columns and their datatypes), but I don't know the exact column names or even how many columns there are.
The object.Equals() method treats rows from different tables as not equal, so I can't use that approach.
From googling and reading StackOverflow I got an idea that probably it can be even done in one line, but I don't know how to build condition for this case:
allRows = allRows.AsEnumerable().Where(???).CopyToDataTable();
I don't know the structure of your data, but generally speaking you can do:
var cleanedUp = allRows.AsEnumerable().Where(row =>
    !rowsToDelete.AsEnumerable().Any(row2 =>
        row2.Field<int>("Id") == row.Field<int>("Id")));
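To get back to a DataTable, as in the one-liner from the question (still assuming the "Id" column from this answer; note that CopyToDataTable throws an InvalidOperationException when the filtered sequence is empty):
allRows = allRows.AsEnumerable()
    .Where(row => !rowsToDelete.AsEnumerable()
        .Any(row2 => row2.Field<int>("Id") == row.Field<int>("Id")))
    .CopyToDataTable();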
If you don't have a primary key, look for a combination of columns (a composite key) that provides uniqueness; you can then delete the rows in allRows whose composite key also exists in rowsToDelete.
OR
First inner-join the two tables on the shared columns to retrieve the data you want to delete, insert that into a temp table, and delete that data from the allRows table.
Hope that helps.
Your task as stated has no solution, because you keep changing the conditions.
First you write:
Both tables have the same structure (same columns and their datatypes)
then you write:
That's the problem - structure can be different and there can be no "Id" column
If you can at least access the data, you can take both tables and investigate their structure, columns, and so on. But you insist that you do not know them.
Once you know the structure, the task becomes elementary, something like this (I use the structure from my own project):
var foo = DbContext
    .Set<Task>()
    .Select(x => new { x.Assignee, x.Availability })
    .ToList();
var foo2 = DbContext
    .Set<Task2>()
    .Select(x => x)
    .ToList();
var bar = foo2.Where(x => foo.Select(y => y.Assignee).Contains(x.Assignee)
                       && foo.Select(y => y.Availability).Contains(x.Availability));
DbContext.RemoveRange(bar);
DbContext.SaveChanges();
This could be written more elegantly, but the idea is obvious.
This solution works for any data structure, and any number and type of columns.
It treats each DataRow as an array (via ItemArray) and compares the rows cell by cell.
public static void DeleteCopies(DataTable allRows, DataTable rowsToDelete)
{
    foreach (DataRow rowToDelete in rowsToDelete.Rows)
    {
        var rowToDeleteArray = rowToDelete.ItemArray;
        foreach (DataRow row in allRows.Rows)
        {
            var rowArray = row.ItemArray;
            // Compare the two rows cell by cell.
            bool equalRows = true;
            for (int i = 0; i < rowArray.Length; i++)
            {
                if (!rowArray[i].Equals(rowToDeleteArray[i]))
                {
                    equalRows = false;
                    break;
                }
            }
            if (equalRows)
            {
                // Remove the first match, then stop scanning for this row.
                allRows.Rows.Remove(row);
                break;
            }
        }
    }
}
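Usage is straightforward; a small sketch with two identically-structured tables:
var allRows = new DataTable();
allRows.Columns.Add("Name", typeof(string));
allRows.Columns.Add("Qty", typeof(int));
allRows.Rows.Add("a", 1);
allRows.Rows.Add("b", 2);

var rowsToDelete = allRows.Clone(); // same schema, no data
rowsToDelete.Rows.Add("b", 2);

DeleteCopies(allRows, rowsToDelete); // allRows now holds only ("a", 1)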

List of Objects with int property compared to List of Int

I have 2 lists. First is a list of objects that has an int property ID. The other is a list of ints.
I need to compare these 2 lists and copy the objects to a new list with only the objects that matches between the two lists based on ID. Right now I am using 2 foreach loops as follows:
var matched = new List<Cars>();
foreach (var car in cars)
{
    foreach (var i in intList)
    {
        if (car.id == i)
            matched.Add(car);
    }
}
This seems like it is going to be very slow as it is iterating over each list many times. Is there way to do this without using 2 foreach loops like this?
One slow but clear way would be
var matched = cars.Where(car => intList.Contains(car.id)).ToList();
You can make this quicker by turning the intList into a dictionary and using ContainsKey instead (note that ToDictionary throws if intList contains duplicate values):
var intLookup = intList.ToDictionary(k => k);
var matched = cars.Where(car => intLookup.ContainsKey(car.id)).ToList();
Even better still, a HashSet:
var intHash = new HashSet<int>(intList);
var matched = cars.Where(car => intHash.Contains(car.id)).ToList();
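A join is another hash-based option; note it behaves differently with duplicates (a car is emitted once per matching occurrence of its id in intList):
var matched = cars.Join(intList, car => car.id, i => i, (car, i) => car)
                  .ToList();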
You could try some simple LINQ; something like this should work:
var matched = cars.Where(w => intList.Contains(w.id)).ToList();
This will take your list of cars and find only those items whose id is contained in your intList.

Parameterise LINQ GroupBy

The following C# code takes a large DataTable with many columns and an array of two column names. It produces a new DataTable with one row for each duplicated combination of the two supplied fields, staff no & skill.
This is too specific: I need to be able to supply any number of fields for the group by.
Can someone help me?
string[] excelField = new string[2]; // contains the field names for uniqueness
excelField[0] = "staff No";
excelField[1] = "skill";

DataTable dataTableDuplicateRows = new DataTable();
dataTableDuplicateRows.Clear();
dataTableDuplicateRows.Columns.Clear();
foreach (string fieldName in excelField)
{
    dataTableDuplicateRows.Columns.Add(fieldName);
}

var duplicateValues = dataTableCheck.AsEnumerable()
    .GroupBy(row => new { Field0 = row[excelField[0]], Field1 = row[excelField[1]] })
    .Where(group => group.Count() > 1)
    .Select(g => g.Key);

foreach (var duplicateValuesRow in duplicateValues)
{
    dataTableDuplicateRows.Rows.Add(duplicateValuesRow.Field0, duplicateValuesRow.Field1);
}
I think what you require is a way to make the LINQ query more dynamic. Even though you could achieve this with expression trees, the DynamicLinq library solves the issue in an easier way.
For your case, with the library, just use the GroupBy extension method that accepts a string.
More info about DynamicLinq library:
Scott Gu's blog
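If you would rather not pull in a library, a plain-LINQ sketch is to group on a composite string key built from the field-name array (this assumes the separator character never occurs in the data itself):
var duplicateValues = dataTableCheck.AsEnumerable()
    .GroupBy(row => string.Join("\u001F", excelField.Select(f => row[f])))
    .Where(g => g.Count() > 1)
    .Select(g => g.First());

foreach (var duplicateRow in duplicateValues)
    dataTableDuplicateRows.Rows.Add(excelField.Select(f => duplicateRow[f]).ToArray());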

C# LINQ - What can I do here to improve performance?

I'm doing some heavy filtering on a collection (which is nothing more than an encapsulated list of "dataline" entries).
I need to 'consolidate' these lines on 3 fields (Date (string), Route (string) and ConsolidationCode (string)).
Extracting the 3 distinct lists works fast. I'm more worried about the triple foreach...
A normal, "complete" _DealerCaseSetComplete contains about 5000 entries.
The Dates would be around 5, the Routes around 100 and the Consolidations 350-500.
I have written the following method. It does exactly what I want it to do, but it is very slow.
Perhaps you could guide me towards faster code.
If you require any other code (which is really plain, actually), please ask.
private void FillDataGridView()
{
    _LocalGridControl.Invoke(CreateDataGrid);
    //Filter by Date
    List<string> Dates = _DealerCaseSetComplete.Data.Select(rec => rec.DateAdded).Distinct().ToList();
    //Filter by Route
    List<string> Routes = _DealerCaseSetComplete.Data.Select(rec => rec.Route).Distinct().ToList();
    //Filter by Consolidation
    List<string> Consolidations = _DealerCaseSetComplete.Data.Select(rec => rec.DealerConsolidationCode).Distinct().ToList();
    foreach (string d in Dates)
    {
        foreach (string r in Routes)
        {
            foreach (string c in Consolidations)
            {
                List<DealerCaseLine> Filter = _DealerCaseSetComplete.Data
                    .Where(rec => rec.DateAdded == d &&
                                  rec.Route == r &&
                                  rec.DealerConsolidationCode == c).ToList();
                if (Filter.Count > 0)
                    _LocalGridControl.Invoke(AddLineToDataGrid, Filter);
            }
        }
    }
    _LocalGridControl.Invoke(SortDataGrid);
}
Looks like you need grouping by three fields:
var filters = from r in _DealerCaseSetComplete.Data
              group r by new
              {
                  r.DateAdded,
                  r.Route,
                  r.DealerConsolidationCode
              } into g
              select g.ToList();

foreach (List<DealerCaseLine> filter in filters)
    _LocalGridControl.Invoke(AddLineToDataGrid, filter);
Your code iterates over all the data three times to get the distinct fields, and then once more for every combination of distinct fields (the filtering with the Where clause). With grouping by these three fields you iterate over the data only once. Each resulting group has at least one item, so you don't need to check whether a group contains items before invoking the delegate.
It looks like you're trying to get every distinct combination of Dates, Routes and Consolidations.
Your current code is slow because it is roughly O(n^4): you have three nested loops whose body performs a linear search.
You can get much better performance by using the overload of Distinct that takes an IEqualityComparer<T>:
http://msdn.microsoft.com/en-us/library/bb338049.aspx
var Consolidated = _DealerCaseSetComplete.Data.Distinct(new DealerCaseComparer());
The DealerCaseComparer class would be implemented much as in the MSDN link above.
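A minimal sketch of that comparer, assuming DealerCaseLine exposes the three string properties used in the question (HashCode.Combine needs .NET Core 2.1+; on older frameworks, combine the hash codes manually):
class DealerCaseComparer : IEqualityComparer<DealerCaseLine>
{
    public bool Equals(DealerCaseLine x, DealerCaseLine y) =>
        x.DateAdded == y.DateAdded &&
        x.Route == y.Route &&
        x.DealerConsolidationCode == y.DealerConsolidationCode;

    public int GetHashCode(DealerCaseLine obj) =>
        HashCode.Combine(obj.DateAdded, obj.Route, obj.DealerConsolidationCode);
}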
