c# - Summarizing duplicate rows in datatable

I have a table and I want to sum up duplicate rows:
Input:
|name  | n |
|------+---|
|leo   | 1 |
|wayne | 1 |
|joe   | 1 |
|wayne | 1 |
|leo   | 1 |
|leo   | 1 |
Desired result:
|name  | n |
|------+---|
|leo   | 3 |
|wayne | 2 |
|joe   | 1 |
I can delete the duplicates like this, but how do I sum them up?
ArrayList UniqueRecords = new ArrayList();
ArrayList DuplicateRecords = new ArrayList();
foreach (DataRow dRow in table.Rows)
{
    // a name that has been seen before marks the row as a duplicate
    if (UniqueRecords.Contains(dRow["name"]))
        DuplicateRecords.Add(dRow);
    else
        UniqueRecords.Add(dRow["name"]);
}
// remove afterwards: the Rows collection cannot be modified while it is being enumerated
foreach (DataRow dRow in DuplicateRecords)
{
    table.Rows.Remove(dRow);
}

This is how you do it with a dictionary: map each "name" to the first DataRow with that name, then add the "n" value of every later duplicate to that row:
// create intermediate dictionary to group the records by name
// (assumes "name" is a string column and "n" is an int column)
Dictionary<string, DataRow> summarizedRecords = new Dictionary<string, DataRow>();
// iterate over all records
foreach (DataRow dRow in table.Rows)
{
    string name = dRow.Field<string>("name");
    // if the name is in the dictionary already -> add this row's "n" to the stored row
    if (summarizedRecords.TryGetValue(name, out DataRow existing))
    {
        existing["n"] = existing.Field<int>("n") + dRow.Field<int>("n");
    }
    else
    {
        // otherwise just add the row
        summarizedRecords[name] = dRow;
    }
}
// transform the dictionary values back into a list for further usage
// (the duplicate rows are still in the table; remove them as in the question if needed)
List<DataRow> summarizedList = summarizedRecords.Values.ToList();
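If you would rather keep the original table untouched, the same dictionary idea can build a separate summary table instead. A minimal sketch, assuming `name` is a string column and `n` an int column as in the question; `DuplicateSummary.SumByName` is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DuplicateSummary
{
    // Hypothetical helper: sums column "n" per distinct "name" into a new table.
    // Assumes "name" is a string column and "n" is an int column.
    public static DataTable SumByName(DataTable table)
    {
        // Running totals per name; first-appearance order is tracked separately
        // because Dictionary does not guarantee enumeration order.
        var totals = new Dictionary<string, int>();
        var order = new List<string>();

        foreach (DataRow row in table.Rows)
        {
            string name = (string)row["name"];
            int n = (int)row["n"];
            if (totals.TryGetValue(name, out int current))
            {
                totals[name] = current + n;
            }
            else
            {
                totals[name] = n;
                order.Add(name);
            }
        }

        // Clone() copies the schema but no rows, so the source table is not modified.
        DataTable result = table.Clone();
        foreach (string name in order)
        {
            DataRow summary = result.NewRow();
            summary["name"] = name;
            summary["n"] = totals[name];
            result.Rows.Add(summary);
        }
        return result;
    }
}
```

For the six-row table in the question this produces leo = 3, wayne = 2, joe = 1, in order of first appearance.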
I think this can be done more elegantly (1 line of code) with LINQ. Let me think some more about it :)
Edit
Here is a LINQ version. However, it creates new DataRow objects, which may not be what you want:
List<DataRow> summarizedRecords = table.AsEnumerable()     // DataRowCollection needs AsEnumerable() for LINQ
    .GroupBy(row => row.Field<string>("name"))              // this line groups the records by "name"
    .Select(group =>
    {
        int sum = group.Sum(item => item.Field<int>("n"));  // this line sums the "n"'s of the group
        DataRow newRow = table.NewRow();                    // DataRow has no public constructor; NewRow() creates a detached row
        newRow["name"] = group.Key;                         // set the "name" (key of the group)
        newRow["n"] = sum;                                  // set the "n" to sum
        return newRow;                                      // return that new DataRow
    })
    .ToList();                                              // make the resulting enumerable a list
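`NewRow()` only creates a detached row; the rows still have to be added to a table before they show up anywhere. A sketch that collects the summed rows into an empty copy of the source schema, assuming the same `name` (string) and `n` (int) columns; `LinqSummary.Summarize` is a hypothetical name:

```csharp
using System;
using System.Data;
using System.Linq;

public static class LinqSummary
{
    // Hypothetical helper: groups rows by "name" and sums "n" (assumed to be an int column).
    public static DataTable Summarize(DataTable table)
    {
        // Clone() copies the column definitions but none of the rows.
        DataTable result = table.Clone();

        // GroupBy yields groups in the order in which each name first appears.
        var groups = table.AsEnumerable()
            .GroupBy(row => row.Field<string>("name"));

        foreach (var group in groups)
        {
            DataRow newRow = result.NewRow();            // detached row with result's schema
            newRow["name"] = group.Key;
            newRow["n"] = group.Sum(item => item.Field<int>("n"));
            result.Rows.Add(newRow);                     // attach it to the result table
        }
        return result;
    }
}
```

Creating the rows from `result` rather than from `table` keeps the new rows tied to the table they are added to.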

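Thanks for your replies, another variant:
var result = from row in table.AsEnumerable()
             group row by row.Field<string>("name") into grp
             select new
             {
                 name = grp.Key,
                 // Sum adds up "n"; grp.Count() counts rows, which only matches when every n is 1
                 n = grp.Sum(r => r.Field<int>("n"))
             };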

Related

Splitting hierarchies from datatable

I have a DataTable that contains multiple hierarchies of different heights, which I need to split.
For example:
| Account | Hierarchy Account |
|---------|-------------------|
| 1       |                   |
| 2       | 1                 |
| 3       | 1                 |
| 4       | 2                 |
| 5       | 3                 |
| 6       |                   |
| 7       | 6                 |
| 8       | 6                 |
Below is what I have tried so far.
private List<DataTable> SplitDataTablesOnHierarchy(DataTable dataTable)
{
    List<DataTable> dataTablesList = new List<DataTable>();
    // top accounts have no Hierarchy Account Number (empty string or DBNull)
    var topAccounts = dataTable.AsEnumerable().Where(m => string.IsNullOrEmpty(m.Field<string>("Hierarchy Account Number")));
    foreach (var topAccount in topAccounts)
    {
        //Check if account exists in Hierarchy Account Number
        var topAccountExists = dataTable.AsEnumerable().Any(m => m.Field<string>("Hierarchy Account Number") == topAccount.Field<string>("Account Number"));
        if (topAccountExists)
        {
            DataTable newDataTable = dataTable.Clone();
            newDataTable.ImportRow(topAccount);
            dataTablesList.Add(newDataTable);
        }
        //Top Accounts found and added to dataTablesList
    }
    //CreateDataTable with Top Accounts
    foreach (DataTable dTable in dataTablesList)
    {
        bool bottomHierarchyReached = true;
        var TempSearch = dTable.Rows;
        while (bottomHierarchyReached)
        {
            foreach (DataRow account in TempSearch)
            {
                var rows = dataTable.AsEnumerable().Where(m => m.Field<string>("Hierarchy Account Number") == account.Field<string>("Account Number")).CopyToDataTable();
                if (rows.Rows.Count == 0)
                {
                    bottomHierarchyReached = false;
                    break;
                }
                else
                {
                    TempSearch = rows.Rows;
                    dTable.Rows.Add(rows.Rows);
                }
            }
        }
    }
    return dataTablesList;
}
My idea was to first find the top accounts of the hierarchies, create a new DataTable for each of them, and then drill down, adding each following level to the relevant DataTable, since I do not know the height of each hierarchy.

I found a solution by creating a tempList that collects all accounts of the next lower level while the level above (the SearchList) is being searched.
Once the loop through the SearchList is done, the tempList is assigned to it.
Then the next level of the hierarchy is searched, until a level has no children.
foreach (DataTable dTable in dataTablesList)
{
    // start with the account(s) already in this table, i.e. the top account
    var SearchList = dTable.AsEnumerable()
        .Select(p => p.Field<string>("Account Number"))
        .ToList();
    while (SearchList.Count > 0)
    {
        // collects the accounts of the next lower level
        var tempList = new List<string>();
        foreach (var account in SearchList)
        {
            var rows = dataTable.AsEnumerable()
                .Where(m => m.Field<string>("Hierarchy Account Number") == account)
                .ToList();
            // an account without children adds nothing; its siblings are still searched
            foreach (var row in rows)
            {
                dTable.ImportRow(row);
                tempList.Add(row.Field<string>("Account Number"));
            }
        }
        // the next level becomes the new search list; an empty level ends the loop
        SearchList = tempList;
    }
}
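The level-by-level search can also be packaged as one standalone method. A minimal sketch, assuming string columns named `Account Number` and `Hierarchy Account Number` (as in the code above), that a top account has a null or empty parent, and that the data contains no cycles; `HierarchySplitter.Split` is a hypothetical name. Unlike the question's first loop, it also returns a table for a top account that has no children:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class HierarchySplitter
{
    const string AccountCol = "Account Number";
    const string ParentCol = "Hierarchy Account Number";

    // Hypothetical helper: one table per top-level account, containing
    // that account and all of its descendants (assumes no cycles).
    public static List<DataTable> Split(DataTable source)
    {
        // Index children by parent once, instead of rescanning the table per account.
        ILookup<string, DataRow> childrenByParent = source.AsEnumerable()
            .Where(r => !string.IsNullOrEmpty(r.Field<string>(ParentCol)))
            .ToLookup(r => r.Field<string>(ParentCol));

        var result = new List<DataTable>();
        var topAccounts = source.AsEnumerable()
            .Where(r => string.IsNullOrEmpty(r.Field<string>(ParentCol)));

        foreach (DataRow top in topAccounts)
        {
            DataTable tree = source.Clone();
            tree.ImportRow(top);

            // Breadth-first: accounts found on one level are searched on the next.
            var searchList = new List<string> { top.Field<string>(AccountCol) };
            while (searchList.Count > 0)
            {
                var tempList = new List<string>();
                foreach (string account in searchList)
                {
                    // A missing key yields an empty sequence, so leaf accounts add nothing.
                    foreach (DataRow child in childrenByParent[account])
                    {
                        tree.ImportRow(child);
                        tempList.Add(child.Field<string>(AccountCol));
                    }
                }
                searchList = tempList;
            }
            result.Add(tree);
        }
        return result;
    }
}
```

For the example table this yields two tables: accounts 1 to 5, and accounts 6 to 8.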

Replace values of List<Class>

I have two List<class> objects, List1 and List2, whose elements have multiple properties (RowNo, Value1, Value2, etc.), as follows:
List1
| RowNo | Value |
|-------|-------|
| 1 | 11 |
| 2 | 22 |
| 3 | 33 |
| 4 | 88 |
List2
| RowNo | Value |
|-------|-------|
| 1 | 44 |
| 2 | 55 |
| 3 | 66 |
I want to replace the Value of an element of List1 with the Value of the element of List2 that has the same RowNo. The output I want to generate is as follows:
Desired result
| RowNo | Value |
|-------|-------|
| 1 | 44 |
| 2 | 55 |
| 3 | 66 |
| 4 | 88 |
Any ideas or suggestions? How can I achieve this, and what is the best and most efficient way to do it?

You can just use a loop to compare the values in List1 with List2 and, if a match is found, update the Value:
foreach (var item in List1)
{
    var match = List2.FirstOrDefault(x => x.RowNo == item.RowNo);
    if (match != null)
    {
        item.Value = match.Value;
    }
}

Using LINQ:
List1.ForEach(l1 => l1.Value = (List2.FirstOrDefault(l2 => l2.RowNo == l1.RowNo) ?? l1).Value);
If no matching element is found in List2, the Value property of the List1 element is set to its own value.
Full code:
class MyClass
{
    public int RowNo { get; set; }
    public int Value { get; set; }
}
var List1 = new List<MyClass>()
{
    new MyClass(){RowNo = 1, Value = 11},
    new MyClass(){RowNo = 2, Value = 22},
    new MyClass(){RowNo = 3, Value = 33},
    new MyClass(){RowNo = 4, Value = 88},
};
var List2 = new List<MyClass>()
{
    new MyClass(){RowNo = 1, Value = 44},
    new MyClass(){RowNo = 2, Value = 55},
    new MyClass(){RowNo = 3, Value = 66}
};
List1.ForEach(l1 => l1.Value = (List2.FirstOrDefault(l2 => l2.RowNo == l1.RowNo) ?? l1).Value);

List1.ForEach(x =>
{
    var item = List2.FirstOrDefault(y => y.RowNo == x.RowNo);
    if (item != null)
    {
        x.Value = item.Value;
    }
});

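Put all the data of List1 into a Dictionary (the key is the RowNo).
Loop over List2 to update the Dictionary.
Convert the data of the Dictionary back to a List.
This runs in roughly O(n + m) time, because each Dictionary lookup is O(1) on average, whereas calling FirstOrDefault inside a loop is O(n × m).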
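The three steps above can be sketched as follows, assuming the `MyClass` shape (`RowNo`, `Value`) from the other answers and that `RowNo` is unique within List1 (`ToDictionary` throws on duplicate keys); `ListMerge.ReplaceValues` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public int RowNo { get; set; }
    public int Value { get; set; }
}

public static class ListMerge
{
    // Hypothetical helper following the three steps: index, update, convert back.
    public static List<MyClass> ReplaceValues(List<MyClass> list1, List<MyClass> list2)
    {
        // 1. Index list1 by RowNo: O(n). Throws if a RowNo appears twice.
        Dictionary<int, MyClass> byRowNo = list1.ToDictionary(x => x.RowNo);

        // 2. Walk list2 and update matching entries: O(m), one hash lookup each.
        foreach (MyClass item in list2)
        {
            if (byRowNo.TryGetValue(item.RowNo, out MyClass target))
            {
                target.Value = item.Value;
            }
        }

        // 3. Back to a list. Dictionary enumeration order is not guaranteed,
        //    so sort by RowNo (this sort is the only non-linear step).
        return byRowNo.Values.OrderBy(x => x.RowNo).ToList();
    }
}
```

Because `MyClass` is a reference type, the dictionary holds the same objects as List1, so List1's elements are updated as well.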

Use this extension method to achieve what you want:
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqExtentions
{
    public static void Project<T>(this IEnumerable<T> lst1, IEnumerable<T> lst2,
        Func<T, object> key, Action<T, T> action)
    {
        foreach (var item1 in lst1)
        {
            // find the first element of lst2 with the same key as item1
            var item2 = lst2.FirstOrDefault(x => key(x).Equals(key(item1)));
            if (item2 != null)
            {
                action(item1, item2);
            }
        }
    }
}
Then you can use it like this:
List1.Project(List2, x => x.RowNo, (y, z) => { y.Value = z.Value; });
What it does:
It projects one list over the other and matches the key values in both (RowNo in your example). When two items have the same key, the action supplied as the third parameter is applied. In this example, you want elements in the first list to take the Value of the matching elements in the second list, which is exactly what this delegate does:
(y, z) => { y.Value = z.Value; }
You can use this extension method for the same kind of requirement on any pair of lists:
Call Project on the list you want to change.
Pass the list with the values you want to assign to the first list as the first parameter.
Pass the key selector as the second parameter.
Pass the action you want to apply to matching items as the third parameter.

You can loop over List1, check whether List2 has a matching row at the same position, and collect the result in a new list (this assumes both lists are sorted by RowNo):
List<YourClass> result = new List<YourClass>();
for (int i = 0; i < List1.Count; i++)
{
    YourClass resRowValue = List1[i];
    if (List2.Count > i && List2[i].RowNo == resRowValue.RowNo)
        resRowValue.Value = List2[i].Value;
    result.Add(resRowValue);
}
//set the result to List1
List1 = result;
You can also do this using LINQ:
List1 = List1.Select(x =>
{
    YourClass newValue = List2.FirstOrDefault(y => y.RowNo == x.RowNo);
    if (newValue != null)
        x.Value = newValue.Value;
    return x;
}).ToList();

Transform SQL Records to Object

I am currently developing a SharePoint-like application and have run into the following difficulty.
I have a DB table that stores my content, like the following:
+----+---------------+----------+---------+----------+--------------------------+
| ID | Content_type | List_ID | COL_ID | ITEM_ID | VALUE |
+----+---------------+----------+---------+----------+--------------------------+
| 1 | "Column" | 1 | 0 | | "ABC" |
| 2 | "Column" | 1 | 1 | | "DEF" |
| 3 | "Item" | 1 | 0 | 1 | "<VALUE OF Column ABC>" |
| 4 | "Item" | 1 | 1 | 1 | "<VALUE OF Column DEF>" |
+----+---------------+----------+---------+----------+--------------------------+
and I would like to display these records on the web using LINQ and C#, like the following:
ITEM_ID |ABC |DEF
------------+---------------------+----------------------
1 |<VALUE OF Column ABC>|<VALUE OF Column DEF>
EDITED:
My questions are:
I would like to use the DB records marked as "Column" in the Content_type field as the DataColumns of a DataTable.
I would like to map all DB records marked as "Item" with the same ITEM_ID to one DataRow of the DataTable. The VALUE field of each of these records should go into the DataTable column that matches its COL_ID.

Thanks for the help. I managed to solve it myself:
First, get the records from the DB where content_type = "Column" and use these records to build the columns of a DataTable.
Get all records from the DB where content_type = "Item" and re-group each item to a List where ITEM_id =
Map Item.title to a column of the DataTable and Item.value to the value in the corresponding DataTable row.
public static DataTable PopulateRec(int list_id, string web_app, string host_auth)
{
    DataTable dt = new DataTable();
    // one DataColumn per "Column" record
    List<string> Column = Get_Column(list_id, web_app, host_auth);
    for (int i = 0; i < Column.Count(); i++)
    {
        DataColumn datacolumn = new DataColumn();
        datacolumn.ColumnName = Column[i].ToString();
        dt.Columns.Add(datacolumn);
    }
    List<MyItem> Items = Get_Item(list_id, web_app, host_auth);
    if (Items.Count != 0)
    {
        // Item_id is used as the row index, so rows 0..max(Item_id) are created
        int ItemCount = Convert.ToInt32((from Itms in Items
                                         select Itms.Item_id).Max());
        for (int j = 0; j <= ItemCount; j++)
        {
            dt.Rows.Add();
            List<MyItem> IndvItem = (from Indv in Items
                                     where Indv.Item_id == j
                                     select Indv).ToList();
            // each record of the item fills the column named by its title
            foreach (var val in IndvItem)
            {
                dt.Rows[j][val.title] = val.value;
            }
            IndvItem = null;
        }
        // delete rows for item IDs without records (empty first column);
        // iterate backwards so deleting a row does not shift the rows still to be checked
        for (int k = dt.Rows.Count - 1; k >= 0; k--)
        {
            if (dt.Rows[k][0].ToString() == string.Empty)
            {
                dt.Rows[k].Delete();
            }
        }
    }
    Column = null;
    Items = null;
    return dt;
}
private static List<string> Get_Column(int list_id, string web_app, string host_auth)
{
    List<string> content_db = new List<string>();
    List<string> columnname = new List<string>();
    Config_DB_Context configdb = new Config_DB_Context("dms_config");
    // content databases mapped to this host and web application
    content_db = (from c in configdb.site_mapping
                  where c.host_auth == host_auth
                     && c.web_app == web_app
                  select c.content_db).ToList();
    for (int i = 0; i < content_db.Count(); i++)
    {
        Content_DB_Context contentdb = new Content_DB_Context(content_db[i]);
        // titles of the "Column" records of this list
        columnname = (from c in contentdb.content
                      where c.content_type == "Column"
                         && c.list_id == list_id
                      select c.title).ToList();
    }
    content_db = null;
    return columnname;
}
private static List<MyItem> Get_Item(int list_id, string web_app, string host_auth)
{
    List<string> content_db = new List<string>();
    List<MyItem> Itm = new List<MyItem>();
    Config_DB_Context configdb = new Config_DB_Context("dms_config");
    // content databases mapped to this host and web application
    content_db = (from c in configdb.site_mapping
                  where c.host_auth == host_auth
                     && c.web_app == web_app
                  select c.content_db).ToList();
    for (int i = 0; i < content_db.Count(); i++)
    {
        Content_DB_Context contentdb = new Content_DB_Context(content_db[i]);
        // all "Item" records of this list, projected into MyItem objects
        Itm = (from c in contentdb.content
               where c.content_type == "Item"
                  && c.list_id == list_id
               select new MyItem
               {
                   col_id = (int)c.column_id,
                   list_id = (int)c.list_id,
                   title = c.title,
                   value = c.value,
                   Item_id = (int)c.item_id,
                   hidden = c.hidden
               }).ToList();
    }
    content_db = null;
    return Itm;
}
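The steps above can be tried without a database. A minimal in-memory sketch, assuming the record fields mirror the table in the question (Content_type, List_ID, COL_ID, ITEM_ID, VALUE), that a "Column" record stores the column name in VALUE, and that each item value is placed by its column ID rather than by title; `ContentRecord` and `ContentPivot.PivotContent` are hypothetical names, not part of the author's code:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical record type mirroring the columns of the content table.
public sealed class ContentRecord
{
    public ContentRecord(string contentType, int listId, int colId, int? itemId, string value)
    {
        ContentType = contentType; ListId = listId; ColId = colId; ItemId = itemId; Value = value;
    }
    public string ContentType { get; }
    public int ListId { get; }
    public int ColId { get; }
    public int? ItemId { get; }   // null for "Column" records
    public string Value { get; }
}

public static class ContentPivot
{
    // One column per "Column" record, one row per distinct ItemId of the "Item" records.
    public static DataTable PivotContent(IEnumerable<ContentRecord> records, int listId)
    {
        var inList = records.Where(r => r.ListId == listId).ToList();
        var table = new DataTable();
        table.Columns.Add("ITEM_ID", typeof(int));

        // ColId -> column name, taken from the VALUE of each "Column" record.
        var columnNames = new Dictionary<int, string>();
        foreach (var col in inList.Where(r => r.ContentType == "Column").OrderBy(r => r.ColId))
        {
            columnNames[col.ColId] = col.Value;
            table.Columns.Add(col.Value, typeof(string));
        }

        // Group the "Item" records by ItemId; each group becomes one DataRow.
        var items = inList
            .Where(r => r.ContentType == "Item" && r.ItemId.HasValue)
            .GroupBy(r => r.ItemId.Value)
            .OrderBy(g => g.Key);

        foreach (var item in items)
        {
            DataRow row = table.NewRow();
            row["ITEM_ID"] = item.Key;
            foreach (var cell in item)
            {
                // Place each value in the column with the matching COL_ID.
                if (columnNames.TryGetValue(cell.ColId, out string name))
                    row[name] = cell.Value;
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

With the four records from the question, this produces a single row: ITEM_ID 1, ABC = "<VALUE OF Column ABC>", DEF = "<VALUE OF Column DEF>".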

How to remove complete row from a data table when it is a duplicate row in c#

I have a DataTable that contains duplicate rows, as follows:
|cid | usrnme          | pname   | prate | cabin |
|----|-----------------|---------|-------|-------|
|c11 | demo1#gmail.com | sample1 | 2000  | B2    | *******
|c14 | demo2#live.com  | sample2 | 5000  | B3    |
|c15 | demo3#yahoo.com | sample3 | 8000  | B2    |
|c11 | demo1#gmail.com | sample1 | 2000  | B2    | *******
|c18 | demo4#gmail.com | sample4 | 3000  | L1    |
|c11 | demo5#gmail.com | sample5 | 7400  | B4    | &&&&&&&
NOTE: the same ID can have different data; see the &&&&&&& row.
How can I keep only one of the two duplicate rows above? This is the code I tried:
public DataTable RemoveduplicateRows(DataTable dTable, string colName)
{
    colName = "cabin";
    Hashtable hTable = new Hashtable();
    ArrayList duplicateArrayList = new ArrayList();
    foreach (DataRow drow in dTable.Rows)
    {
        if (hTable.Contains(drow[colName]))
            duplicateArrayList.Add(drow);
        else
        {
            hTable.Add(drow[colName], string.Empty);
        }
    }
    foreach (DataRow dRow in duplicateArrayList)
        dTable.Rows.Remove(dRow);
    return dTable;
}
If I use the code above, it removes duplicates according to cabin: it removes all records whose cabin is B2 and keeps only the first one. What I want is to remove only fully duplicated rows (keep one and delete the others). How can I do that?

It's your cid that decides the uniqueness of the record. In the provided example, the two rows with the same cid are identical in every column, so if you use cid to find the duplicates, you get the desired output for those rows.
Change only this line of code:
colName = "cabin"; to colName = "cid";
Note, however, that the &&&&&&& row also has cid c11, so it would be removed too, even though its other columns differ.

Try using a DataView:
DataView view = new DataView(table);
DataTable distinctValues = view.ToTable(true, "Column1", "Column2" ...);
In your case, list all the columns.
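Instead of typing every column name, the names can be read from the table's schema. A short sketch of the same `DataView.ToTable(distinct, columnNames)` call; `DistinctRows.RemoveExactDuplicates` is a hypothetical helper name:

```csharp
using System.Data;
using System.Linq;

public static class DistinctRows
{
    // Hypothetical helper: returns a new table with one copy of each fully identical row.
    public static DataTable RemoveExactDuplicates(DataTable table)
    {
        // Read every column name from the schema so all columns take part in the comparison.
        string[] allColumns = table.Columns
            .Cast<DataColumn>()
            .Select(c => c.ColumnName)
            .ToArray();

        // distinct = true keeps one row per distinct combination of the listed columns.
        return new DataView(table).ToTable(true, allColumns);
    }
}
```

On the six-row table from the question this leaves five rows: the second demo1 row is dropped, while the &&&&&&& row stays because its other columns differ.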

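You can use Enumerable.Distinct with DataRowComparer.Default, which compares rows by the values of all their columns:
DataTable dataTable = // from data source
DataTable distinctDataTable = dataTable.AsEnumerable()
    .Distinct(DataRowComparer.Default)
    .CopyToDataTable();   // Distinct returns IEnumerable<DataRow>; CopyToDataTable turns it back into a DataTable
Also see Comparing DataRows (LINQ to DataSet) as a reference.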

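The problem with your solution is that it decides duplicates by the cabin column only, so it keeps the first row for each cabin and removes every other row that merely shares that cabin.
To keep one row from each group of fully identical rows, group the rows by all of their column values and, from each group, delete all rows except the first one.
I haven't tested it in Visual Studio, but the code below should point you in the right direction.
var duplicates = dataTable.Rows
    .Cast<DataRow>()
    .GroupBy(r => r, DataRowComparer.Default)   // rows are equal when all column values are equal
    .SelectMany(g => g.Skip(1))                 // every row after the first in a group is a duplicate
    .ToList();                                  // materialize before removing rows from the collection
duplicates.ForEach(dataTable.Rows.Remove);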

It's very simple.
You can try this method, passing true for distinct and the names of all columns:
DataTable.DefaultView.ToTable(bool distinct, string[] ColumnNames)

This may seem like a lot of code, but it makes the Distinct method work on DataTable rows, so you only need one instruction in the main code. It uses only standard built-in library methods.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            DataTable dt = new DataTable();
            dt.Columns.Add("cid", typeof(string));
            dt.Columns.Add("usrnme", typeof(string));
            dt.Columns.Add("pname", typeof(string));
            dt.Columns.Add("prate", typeof(int));
            dt.Columns.Add("cabin", typeof(string));
            dt.Rows.Add(new object[] { "c11", "demo1#gmail.com", "sample1", 2000, "B2" });
            dt.Rows.Add(new object[] { "c14", "demo2#live.com", "sample2", 5000, "B3" });
            dt.Rows.Add(new object[] { "c15", "demo3#yahoo.com", "sample3", 8000, "B2" });
            dt.Rows.Add(new object[] { "c11", "demo1#gmail.com", "sample1", 2000, "B2" });
            dt.Rows.Add(new object[] { "c18", "demo4#gmail.com", "sample4", 3000, "L1" });
            dt.Rows.Add(new object[] { "c11", "demo5#gmail.com", "sample5", 7400, "B4" });
            // wrap each row so Distinct() compares rows by value, then unwrap and copy back
            dt = dt.AsEnumerable().Select(x => new UniqueColumns(x)).Distinct().Select(y => y.row).CopyToDataTable();
        }
    }
    // Distinct() without a comparer calls the Equals(object)/GetHashCode() overrides below
    public class UniqueColumns : EqualityComparer<UniqueColumns>
    {
        public DataRow row { get; set; }
        public UniqueColumns(DataRow row)
        {
            this.row = row;
        }
        // XOR of the hash codes of all column values
        public override int GetHashCode(UniqueColumns _this)
        {
            int hash = 0;
            foreach (var x in _this.row.ItemArray) { hash ^= x.GetHashCode(); }
            return hash;
        }
        public override int GetHashCode()
        {
            return this.GetHashCode(this);
        }
        // two rows are equal when every column value is equal
        public override Boolean Equals(UniqueColumns _this, UniqueColumns other)
        {
            Boolean results = _this.row.ItemArray.Select((x, i) => x.Equals(other.row.ItemArray[i])).All(y => y);
            return results;
        }
        public override bool Equals(object other)
        {
            return this.Equals(this, (UniqueColumns)other);
        }
    }
}

Implementing COUNT() and ROUND() method in LINQ query

Following another example of returning the data of a pivot table, I defined a LINQ query to solve this problem. Now my question is: how do I count the values of a column?
Here is the C# code:
var query = from q in db.DS
            where q.datum >= fromDate && q.datum <= toDate
            group q by q.quot_rate
            into grp
            select new
            {
                Grade = grp.Key,
                Total = grp.Select(t => new { t.fon, t.quot_rate }).AsQueryable()
            };
var rate = (from q in db.DS
            select q.fon).Distinct();
DataTable dt = new DataTable();
dt.Columns.Add("Grade");
foreach (var r in rate)
{
    dt.Columns.Add(r.ToString());
}
foreach (var q in query)
{
    DataRow dr = dt.NewRow();
    dr["Grade"] = q.Grade; //round q_grade
    foreach (var t in q.Total)
    {
        dr[t.fon.ToString()] = t.quot_rate; //count t.quot_rate
    }
    dt.Rows.Add(dr);
}
return dt;
The comments mark where the numbers have to be rounded (ROUND()) and counted (COUNT()).
How can I do this?
EDIT:
The output is currently as follows:
Grade | AB001 | AB002 | AB003 ...
90,045| 90,045| null | null
85,590| null | 85,590| 85,590
85,450| null | 85,450| null
84,901| null | 84,901| null
and I want the result as follows:
Grade | AB001 | AB002 | AB003 ...
90 | 1 | 0 | 0
86 | 0 | 1 | 1
85 | 0 | 2 | 0

So it appears that you actually want the rounding to happen inside the query, so that the grouping is done on the rounded values. The first part of the question can therefore be answered by rounding before grouping:
group q by Math.Round(q.quot_rate) into grp
and then using Grade = grp.Key in the select.
Then the counts come out naturally as:
q.Total.Count()
However, it seems that you actually want counts per rate item, so I would suggest something like this for each table row:
foreach (var r in rate)
{
    dr[r.ToString()] = q.Total.Count(x => x.fon == r);
}
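To check the round-then-count logic without the database, here is an in-memory sketch. It assumes `quot_rate` is a `decimal` and `fon` is a string; `Rate` and `GradePivot.CountByRoundedGrade` are hypothetical names. It passes `MidpointRounding.AwayFromZero` explicitly because `Math.Round(decimal)` rounds midpoints to the nearest even number by default:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-in for the (fon, quot_rate) pairs returned by the query.
public sealed class Rate
{
    public Rate(string fon, decimal quotRate) { Fon = fon; QuotRate = quotRate; }
    public string Fon { get; }
    public decimal QuotRate { get; }
}

public static class GradePivot
{
    public static DataTable CountByRoundedGrade(IEnumerable<Rate> data)
    {
        List<Rate> rows = data.ToList();
        string[] fons = rows.Select(r => r.Fon).Distinct().OrderBy(f => f).ToArray();

        var dt = new DataTable();
        dt.Columns.Add("Grade", typeof(decimal));
        foreach (string fon in fons)
            dt.Columns.Add(fon, typeof(int));

        // Round before grouping, so 85.450 and 84.901 both land in grade 85.
        var groups = rows
            .GroupBy(r => Math.Round(r.QuotRate, MidpointRounding.AwayFromZero))
            .OrderByDescending(g => g.Key);

        foreach (var grp in groups)
        {
            DataRow dr = dt.NewRow();
            dr["Grade"] = grp.Key;
            // One count per fon column; fons without entries get 0 instead of null.
            foreach (string fon in fons)
                dr[fon] = grp.Count(x => x.Fon == fon);
            dt.Rows.Add(dr);
        }
        return dt;
    }
}
```

With the five values shown in the question's current output (AB001 at 90.045, AB002 and AB003 at 85.590, AB002 at 85.450 and 84.901), it produces the three rows of the desired result.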
