C# - Remove rows with the same column value from a DataTable

I have a DataTable which looks like this:
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
2 cc 1.2.12
3 cd 2.3.12
Which is the fastest way to remove the rows with the same ID, to get something like this (keep the first occurrence, delete the next ones):
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
3 cd 2.3.12
I don't want to pass over the table rows twice, because the number of rows is large.
I want to use LINQ if possible, but I guess it will be a big query and I will have to use a comparer.

You can use LINQ to DataSet. To get distinct rows based on the ID column, group by that column and then select the first row of each group:
var result = dt.AsEnumerable()
.GroupBy(r => r.Field<int>("ID"))
.Select(g => g.First())
.CopyToDataTable();
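
Since the question asks to avoid walking the rows more than once, the same "keep the first occurrence" rule can also be sketched as an explicit single loop with a HashSet. This is only a minimal sketch: the class and method names (DataTableDedup, KeepFirstByColumn) are hypothetical, and it assumes the table contains no rows in the Deleted state.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataTableDedup
{
    // Hypothetical helper: returns a new table that keeps only the first
    // row encountered for each distinct value in the given column.
    public static DataTable KeepFirstByColumn(DataTable source, string columnName)
    {
        DataTable result = source.Clone();   // same schema, no rows
        var seen = new HashSet<object>();

        foreach (DataRow row in source.Rows)
        {
            // HashSet.Add returns false when the value is already present,
            // so only the first row per key is imported.
            if (seen.Add(row[columnName]))
            {
                result.ImportRow(row);
            }
        }
        return result;
    }
}
```

With the sample table from the question, KeepFirstByColumn(dt, "ID") would keep the rows for aa, bb and cd. The source table is left unchanged, which may or may not be what you want for a very large table.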

I was solving the same situation and found it quite interesting and would like to share my finding.
If rows are to be distinct based on ALL COLUMNS.
DataTable newDatatable = dt.DefaultView.ToTable(true, "ID", "Name", "DateBirth");
Only the columns you list here will be returned in newDatatable.
If distinct based on one column and column type is int then I would prefer LINQ query.
DataTable newDatatable = dt.AsEnumerable()
.GroupBy(dr => dr.Field<int>("ID"))
.Select(dg => dg.First())
.CopyToDataTable();
If distinct based on one column and column type is string then I would prefer loop.
HashSet<string> seen = new HashSet<string>();
for (int i = 0; i < dt.Rows.Count; i++)
{
    var idValue = (string)dt.Rows[i]["ID"];
    // Add returns false when the value was seen before, so this row is a duplicate.
    if (!seen.Add(idValue))
    {
        dt.Rows.RemoveAt(i);
        i--; // stay on the same index, because the following rows shifted up
    }
}
Third being my favorite.
I may have answered few things which are not asked in the question. It was done in good intent and with little excitement as well.
Hope it helps.

You can try this (note that the resulting table contains only the ID column):
DataTable uniqueCols = dt.DefaultView.ToTable(true, "ID");

Not necessarily the most efficient approach, but maybe the most readable:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup.First())
.CopyToDataTable();
LINQ is also more powerful. For example, if you want to change the logic and select not the first (arbitrary) row of each ID group but the last one according to DateBirth:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup
.OrderByDescending(r => r.Field<DateTime>("DateBirth"))
.First())
.CopyToDataTable();

// Group the rows by ID and keep only the groups that contain duplicates.
var rowsToDelete =
    (from row in dataTable.AsEnumerable()
     group row by row.Field<int>("ID") into g
     where g.Count() > 1
     // Determine which record to keep (I don't know your criteria, so I just sort by
     // DateBirth, then Name, and keep the first record) and select the rest.
     select g.OrderBy(dr => dr.Field<DateTime>("DateBirth"))
             .ThenBy(dr => dr.Field<string>("Name"))
             .Skip(1))
    // Flatten the groups into a single sequence of rows.
    .SelectMany(g => g)
    // Materialize the query so the table is not modified while it is being enumerated.
    .ToList();
// Delete the rows.
rowsToDelete.ForEach(dr => dr.Delete());
// Accept the changes.
dataTable.AcceptChanges();

Here's a way to achieve this.
All you need is the MoreLINQ library and its DistinctBy function.
Code:
protected void Page_Load(object sender, EventArgs e)
{
var DistinctByIdColumn = getDT2().AsEnumerable()
.DistinctBy(
row => new { Id = row["Id"] });
DataTable dtDistinctByIdColumn = DistinctByIdColumn.CopyToDataTable();
}
public DataTable getDT2()
{
DataTable dt = new DataTable();
dt.Columns.Add("Id", typeof(string));
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Dob", typeof(string));
dt.Rows.Add("1", "aa","1.1.11");
dt.Rows.Add("2", "bb","2.3.11");
dt.Rows.Add("2", "cc","1.2.12");
dt.Rows.Add("3", "cd","2.3.12");
return dt;
}
Output: as expected.
For MoreLINQ sample code, see my blog.

Related

How do I GroupBy one column on this DataTable

Suppose I have a call log DataTable where each row represents a call placed with the following columns:
AccountNumber1, AccountNumber2, AccountListDate, AccountDisposition
I want to GroupBy column AccountNumber1 and want a new DataTable with the same columns + 1 additional column NumCalls which will be the count of calls for each AccountNumber1.
New DataTable after GroupBy:
AccountNumber1, AccountNumber2, AccountListDate, AccountDisposition, NumCalls
So far I have the following:
table.AsEnumerable()
.GroupBy(x => x.Field<int>("AccountNumber1"))
.Select(x => new { x.Key.AccountNumber1, NumCalls = x.Count() })
.CopyToDataTable()
Which gives me a DataTable with just two columns AccountNumber1 and NumCalls. How do I get the other columns as I described above?? I would appreciate any help. Thank you.
There's no magic, you need to use a loop and initialize the new table with the new column:
DataTable tblResult = table.Clone();
tblResult.Columns.Add("NumCalls", typeof(int));
var query = table.AsEnumerable().GroupBy(r => r.Field<string>("AccountNumber1"));
foreach (var group in query)
{
DataRow newRow = tblResult.Rows.Add();
DataRow firstOfGroup = group.First();
newRow.SetField<string>("AccountNumber1", group.Key);
newRow.SetField<string>("AccountNumber2", firstOfGroup.Field<string>("AccountNumber2"));
newRow.SetField<DateTime>("AccountListDate", firstOfGroup.Field<DateTime>("AccountListDate"));
newRow.SetField<string>("AccountDisposition", firstOfGroup.Field<string>("AccountDisposition"));
newRow.SetField<int>("NumCalls", group.Count());
}
This takes arbitrary values from the first row of each group which seems to be desired.
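If you would rather not name every column by hand, the same loop can be sketched generically by copying the first row's ItemArray and appending the count. This is a sketch under assumptions: the helper name GroupWithCount and its parameters are hypothetical, and it assumes the count column does not already exist in the source table.

```csharp
using System;
using System.Data;
using System.Linq;

public static class CallLogGrouping
{
    // Hypothetical helper: one output row per distinct key value, carrying the
    // values of the first row in each group plus a count of rows in the group.
    public static DataTable GroupWithCount(DataTable table, string keyColumn, string countColumn)
    {
        DataTable result = table.Clone();              // same columns as the source
        result.Columns.Add(countColumn, typeof(int));  // extra column, added last

        foreach (var group in table.AsEnumerable().GroupBy(r => r[keyColumn]))
        {
            DataRow first = group.First();
            // The source values line up with the cloned columns; the count
            // is appended as the value for the new last column.
            object[] values = first.ItemArray
                .Concat(new object[] { group.Count() })
                .ToArray();
            result.Rows.Add(values);
        }
        return result;
    }
}
```

For the call log in the question, this would be called as GroupWithCount(table, "AccountNumber1", "NumCalls").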

Comparing Values in One DataTable Column

I have a DataTable that I read in from a CSV file. What I would like to do is find all the duplicate names within one column titled "name" and add them to another DataTable for use later. The code I have so far:
private DataTable MatcherTable(DataTable table)
{
DataTable match = new DataTable();
match = table.Clone();
var equalRows = table.Rows.Cast<DataRow>().Where(dataRow => dataRow["name"] == dataRow["name"]).ToList();
foreach (var equalRow in equalRows)
{
match.Rows.Add(equalRow.ItemArray);
}
return match;
}
However when I return the table that should be full of matches, it returns the exact same table that I read in. Am I missing something simple?
The code is simply copying all the datarows in the output table because the comparison expression compares the same row and column with itself.
You could resolve your problem with a single Linq expression
private DataTable MatcherTable(DataTable table)
{
DataTable match = table.Rows.Cast<DataRow>()
.GroupBy(x => x["Name"])
.Where(g => g.Count() > 1)
.Select(k => k.FirstOrDefault())
.CopyToDataTable();
return match;
}
Here we group the rows by the value in the Name column and filter out all groups with fewer than 2 occurrences. Next we take the first row from each group, which builds a DataRow sequence that is finally copied into the output table.
The code above will return just one row of the duplicate ones. If you want to keep all duplicate rows then you need
DataTable match = table.Rows.Cast<DataRow>()
.GroupBy(x => x["Name"])
.Where(g => g.Count() > 1)
.SelectMany(k => k)
.CopyToDataTable();
Create an empty list of the names seen so far, then you can do this:
List<string> names = new List<string>();
foreach (DataRow row in table.Rows)
{
    string name = row["name"].ToString();
    if (names.Contains(name))
    {
        // The name was seen before, so this row is a duplicate.
        match.Rows.Add(row.ItemArray);
    }
    else
    {
        names.Add(name);
    }
}
I might have some mistakes in spelling or something, but this is just to give you an idea!

c# - Copy only selected data to new datatable with linq

I've searched the web for quite some time now and can't seem to find an elegant way to
read data from one datatable,
group it by two variables with linq
select only those two variables (forget about the others in the source datatable) and
copy these items to a new datatable.
I got it working without selecting specific variables, but at the amount of data the program is going to process later I'd rather only copy what's really needed.
var temp123 = from row in oldDataTable.AsEnumerable()
orderby row["Column1"] ascending
group row by new { Column1 = row["Column1"], Column2 = row["Column2"] } into grp
select grp.First();
newDataTable = temp123.CopyToDataTable();
Can anyone please be so kind to help me out here? Thanks!
You can use the custom implementation of the CopyToDataTable method from the article "How to: Implement CopyToDataTable Where the Generic Type T Is Not a DataRow":
newDataTable =
oldDataTable
.AsEnumerable()
.GroupBy(r => new { Column1 = r["Column1"], Column2 = r["Column2"] })
.Select(g => g.Key) // project only the two grouping columns
.OrderBy(x => x.Column1)
.CopyToDataTable(); // your custom extension
Another option, as Tim suggested - manual creation of DataTable.
var newDataTable = new DataTable();
newDataTable.Columns.Add("Column1");
newDataTable.Columns.Add("Column2");
foreach (var item in temp123) // temp123 is the DataRow query from the question
newDataTable.Rows.Add(item["Column1"], item["Column2"]);
And last option (if possible) - don't use DataTable - simply use collection of strongly typed objects.

How to use LINQ to get unique columns from a DataTable

I have a DataTable in C# with columns defined as follows:
DataTable dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Columns.Add("UserExId", typeof(string));
dt.Columns.Add("UserEmail", typeof(string));
"UserName", "UserExId", and "UserEmail" are all unique and they are grouped by "OrgName" and "OrgExId"
I want to write a LINQ query to make a new DataTable that contains unique "OrgExId's" and "OrgName's"
This is as far as I got:
var results = from row in dt.AsEnumerable()
group row by row["OrgExId"] into orgs
select orgs;
Specifically in this query, I don't understand how I am supposed to select the rows from the original DataTable. Visual Studio says orgs is of the type IGrouping, but I have never really seen this type before and am not sure how to manipulate it.
Is this a key value pair?
Sorry about that all. I did not specify my end result.
I want to end up with a DataTable with two columns, distinct "OrgExId" and "OrgName". (There is a one to one relationship between "OrgExId" and "OrgName")
All you really need is a Distinct clause
var output = dt.AsEnumerable()
.Select(x => new {OrgExId = x["OrgExId"], OrgName = x["OrgName"]})
.Distinct();
You can then iterate over this and build a DataTable or whatever you need.
UPDATE: You asked for the output to be a DataTable and the above solution didn't quite sit well with me since it requires extra work. To make this more efficient you could do a custom equality comparer.
Your linq looks like this...
// This returns a DataTable
var output = dt.AsEnumerable()
.Distinct(new OrgExIdEqualityComparer())
.CopyToDataTable();
And your comparer looks like this...
public class OrgExIdEqualityComparer : IEqualityComparer<DataRow>
{
public bool Equals(DataRow x, DataRow y)
{
return x["OrgExId"].Equals(y["OrgExId"]);
}
public int GetHashCode(DataRow obj)
{
return obj["OrgExId"].GetHashCode();
}
}
Use the Key property of IGrouping:
var results = from row in dt.AsEnumerable()
group row by new {
OrgExId = row.Field<string>("OrgExId"),
OrgName = row.Field<string>("OrgName")
} into orgs
select orgs.Key;
It will give you a collection of anonymous-type objects. To get a DataTable you can simply iterate over the results and add each one to a new DataTable.
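The "iterate and add" step can be sketched as follows. This is a minimal sketch, not the answerer's code: the class and method names (OrgTableBuilder, BuildDistinctOrgs) are hypothetical, and it assumes both columns are typed as string, as in the question.

```csharp
using System;
using System.Data;
using System.Linq;

public static class OrgTableBuilder
{
    // Hypothetical helper: builds a two-column table holding each distinct
    // (OrgExId, OrgName) pair found in the source table.
    public static DataTable BuildDistinctOrgs(DataTable dt)
    {
        var orgs = new DataTable();
        orgs.Columns.Add("OrgExId", typeof(string));
        orgs.Columns.Add("OrgName", typeof(string));

        // Anonymous types compare by value, so Distinct removes repeated pairs.
        var keys = dt.AsEnumerable()
            .Select(r => new
            {
                OrgExId = r.Field<string>("OrgExId"),
                OrgName = r.Field<string>("OrgName")
            })
            .Distinct();

        foreach (var key in keys)
        {
            orgs.Rows.Add(key.OrgExId, key.OrgName);
        }
        return orgs;
    }
}
```

Unlike CopyToDataTable, this loop simply returns an empty table when the source has no rows.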
DataTable dt = new DataTable();
dt.Columns.Add("OrgName", typeof(string));
dt.Columns.Add("OrgExId", typeof(string));
dt.Columns.Add("UserName", typeof(string));
dt.Columns.Add("UserExId", typeof(string));
dt.Columns.Add("UserEmail", typeof(string));
// put some data for testing purpose
var id = Guid.NewGuid().ToString();
for (var i = 0; i < 10; i++)
dt.Rows.Add(id, i.ToString(), "user_name", Guid.NewGuid().ToString());
var userNames = dt.Rows.Cast<DataRow>().Select(r => r.Field<string>("UserName")).Distinct();
Console.WriteLine(string.Join(", ", userNames));

How do I use LINQ to filter a datatable against a List of strings that need to be split?

I have a datatable and I want to use LINQ to filter against a List of strings, with each string delimited using the pipe ('|'), and contains two values.
The list (List<string> Actions) of strings looks like this. There are only two strings in this list, but it can have many more.
8/1/2013 9:57:52 PM|Login for bill.lock#cap.com
8/1/2013 9:57:37 PM|Login for bill.lock#cap.com
The datatable has five (5) fields in each row, and I'm using each string from the list above to compare two fields (Text and Time) in the datatable to omit or delete those rows.
The datatable is structured like this
DataTable stdTable = new DataTable("Actions");
DataColumn col1 = new DataColumn("Area");
DataColumn col2 = new DataColumn("Action");
DataColumn col3 = new DataColumn("Time");
DataColumn col4 = new DataColumn("Text");
Currently I'm manually performing all this, but I know it can be done in LINQ with just a few lines of code. I'm not sure how to iterate thru the list and use the split. I saw this example, but the split is beyond me.
// Get all checked id's.
var ids = chkGodownlst.Items.OfType<ListItem>()
.Where(cBox => cBox.Selected)
.Select(cBox => cBox.Value)
.ToList();
// Now get all the rows that has a CountryID in the selected id's list.
var a = dt.AsEnumerable().Where(r =>
ids.Any(id => id == r.Field<int>("CountryID"))
);
// Create a new table.
DataTable newTable = a.CopyToDataTable();
Any help would be appreciated.
Thanks
List<string> list = new List<string> {
"8/1/2013 9:57:52 PM|Login for bill.lock#cap.com",
"8/1/2013 9:57:37 PM|Login for bill.lock#cap.com"
};
var a = dt.AsEnumerable().Where(x =>
!list.Select(y => new {
Time = DateTime.Parse(y.Split('|')[0]),
Text = y.Split('|')[1]
})
// Convert.ToDateTime handles a Time column stored either as string or as DateTime.
.Any(z => z.Time == Convert.ToDateTime(x["Time"]) && z.Text == x["Text"].ToString()));
or
var a = dt.AsEnumerable().Where(x=>
!list.Any(y=> y == string.Format("{0}|{1}",x["Time"],x["Text"])));
DataTable newTable = a.CopyToDataTable();
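Both queries above re-scan the list for every row, and CopyToDataTable throws an InvalidOperationException when no rows survive the filter. A possible alternative is to split each entry once into a (Time, Text) pair, store the pairs in a HashSet, and import the rows that do not match. This is only a sketch: the names ActionFilter and ExcludeActions are hypothetical, and it assumes the Time and Text columns hold strings formatted exactly as in the list (new DataColumn("Time") defaults to the string type).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ActionFilter
{
    // Hypothetical helper: returns a copy of the table without the rows whose
    // Time and Text values match one of the "Time|Text" entries.
    public static DataTable ExcludeActions(DataTable table, IEnumerable<string> actions)
    {
        // Split each entry on the first pipe only, so a pipe inside the
        // text part does not break the pair apart.
        var keys = new HashSet<Tuple<string, string>>(
            actions.Select(a => a.Split(new[] { '|' }, 2))
                   .Where(parts => parts.Length == 2)
                   .Select(parts => Tuple.Create(parts[0], parts[1])));

        DataTable result = table.Clone();   // same schema, no rows
        foreach (DataRow row in table.Rows)
        {
            var key = Tuple.Create(row["Time"].ToString(), row["Text"].ToString());
            if (!keys.Contains(key))
            {
                result.ImportRow(row);
            }
        }
        return result;
    }
}
```

If the Time column is actually typed as DateTime, the comparison would need to parse the list entries instead of comparing strings.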
