Find duplicate and merge record into single datatable c#

I am able to find the duplicate rows in a DataTable, like this:
var groups = table.AsEnumerable()
.GroupBy(r => new
{
c1 = r.Field<String>("Version"),
});
var tblDuplicates = groups
.Where(grp => grp.Count() > 1)
.SelectMany(grp => grp)
.CopyToDataTable();
Now I want to merge each set of duplicate records into a single record and sum the values of its Value column.
Something like the following:
DataTable with Duplicates:
Version Value
1 2
2 2
2 1
1 3
2 1
3 2
DataTable with no duplicates and Value summed:
Version Value
1 5
2 4
3 2
I am aware of this link, which does it with the help of reflection:
http://forums.asp.net/t/1570562.aspx/1
Is there any other way to do it?
Edit:
However, what if I have more than two columns, say five, and I still want to sum the Value column but also keep the other columns' data in the resulting summed DataTable? With the current approach I only get Version and Value in my result DataTable; I want the other columns and their values as well. For example:
Version col1 col2 Value
1 A A 2
2 B B 2
2 B B 1
1 A A 3
2 B B 1
3 C C 2

var result = table.AsEnumerable()
.GroupBy(r => r.Field<string>("Version"))
.Select(g =>
{
var row = table.NewRow();
row.ItemArray = new object[]
{
g.Key,
g.Sum(r => r.Field<int>("Value"))
};
return row;
}).CopyToDataTable();
Edit:
If you want to keep the other fields, try the following:
var result = table.AsEnumerable()
.GroupBy(r => new
{
Version = r.Field<String>("Version"),
Col1 = r.Field<String>("Col1"),
Col2 = r.Field<String>("Col2")
})
.Select(g =>
{
// Note: g.First() is a row of the source table, so SetField updates that row in place.
var row = g.First();
row.SetField("Value", g.Sum(r => r.Field<int>("Value")));
return row;
}).CopyToDataTable();
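
If the source table must stay untouched, one option is to copy the first row of each group into a fresh table and write the sum there. The following is only a sketch under assumptions: the class and method names (DuplicateMerger, MergeDuplicates) are made up for illustration, the columns are named Version, col1, col2 and Value as in the question, and Value is assumed to be an int column.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DuplicateMerger
{
    // Returns a new table with one row per (Version, col1, col2) group and
    // Value summed. The source table is not modified.
    public static DataTable MergeDuplicates(DataTable table)
    {
        // Clone() copies the schema (columns and types) but no rows.
        DataTable result = table.Clone();

        var groups = table.AsEnumerable()
            .GroupBy(r => new
            {
                Version = r.Field<string>("Version"),
                Col1 = r.Field<string>("col1"),
                Col2 = r.Field<string>("col2")
            });

        foreach (var g in groups)
        {
            // ImportRow copies the first row of the group into the result table.
            result.ImportRow(g.First());
            DataRow merged = result.Rows[result.Rows.Count - 1];
            // Overwrite Value on the copy only (assumes an int column).
            merged.SetField("Value", g.Sum(r => r.Field<int>("Value")));
        }

        return result;
    }
}
```

Calling DuplicateMerger.MergeDuplicates(table) on the six-row example from the question should give three rows, in order of first appearance of each Version.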

Related

sum distinct values from a column in datagridview

I have a datagridview with two columns like this:
group | quantity
------------------------
chest | 3
legs | 7
back | 2
chest | 1
back | 5
legs | 2
What I'm trying to do is get the sum of quantity for each distinct group into a list, and use that list to populate another datagridview.
So the result must be in this example:
chest | 4
legs | 9
back | 7
I've tried some LINQ queries, but without any success.
How can I do it?
Here is one of the LINQ queries I tried:
List<string> vv = dataGridView1.Rows.Cast<DataGridViewRow>()
.Where(x => !x.IsNewRow)
// either..
.Where(x => x.Cells[7].Value != null)
//..or or both
.Select(x => x.Cells[7].Value.ToString())
.Distinct()
.ToList();
dataGridView6.DataSource = vv;
EDIT
The group column is filled automatically after a selection in another column's combobox; the quantity is filled in manually. For the group-by I found this code, which works but throws an error if a cell is empty:
var Sums = dataGridView1.Rows.Cast<DataGridViewRow>()
.GroupBy(row => row.Cells[7].Value.ToString()) // group column
.Select(g => new { User = g.Key, Sum = g.Sum(row => Convert.ToInt32(row.Cells[1].Value)) });
dataGridView6.DataSource = Sums.ToList();

OK, here is the solution that works:
var Sums = dataGridView1.Rows.Cast<DataGridViewRow>()
.Where(row => row.Cells[7].Value != null)
.GroupBy(row => row.Cells[7].Value.ToString()) // group column
.Select(g => new { User = g.Key, Sum = g.Sum(row => Convert.ToInt32(row.Cells[1].Value)) }); // quantity column
dataGridView6.DataSource = Sums.ToList();
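
The null filter above guards the group cell, but Convert.ToInt32 still throws a FormatException if a quantity cell holds non-numeric text. As a hedged sketch of one way to tolerate that, the helper below works on plain value pairs standing in for the two cell values of each grid row, so it runs without Windows Forms. The GroupTotals class, its Sum method, and the choice to count unparsable quantities as 0 are all assumptions for illustration, not part of the original solution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupTotals
{
    // Each pair stands in for the raw Value objects of the group cell and
    // the quantity cell of one grid row (hypothetical input shape).
    public static List<KeyValuePair<string, int>> Sum(
        IEnumerable<(object Group, object Quantity)> cells)
    {
        return cells
            .Where(c => c.Group != null)              // skip rows with an empty group cell
            .GroupBy(c => c.Group.ToString())
            .Select(g => new KeyValuePair<string, int>(
                g.Key,
                // Empty or non-numeric quantities count as 0 instead of throwing.
                g.Sum(c => int.TryParse(Convert.ToString(c.Quantity), out int n) ? n : 0)))
            .ToList();
    }
}
```

With a real grid, the pairs could be produced from dataGridView1.Rows by selecting Cells[7].Value and Cells[1].Value for each row that is not the new row.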

datatable colum sum using linq in c#

Here is my first datatable dt
sscode scons cscons cstagged
A 10 2 20
A 10 2 20
B 10 2 40
Here is my second datatable dt1
Unit sscode
A101 A
A101 A
B101 B
and I want this output
Unit scons cscons cstagged
A101 20 4 40
I'm getting an error when I execute this query.
Here is my code:
IEnumerable<DataRow> result = from data1 in dt.AsEnumerable()
join data2 in dt1.AsEnumerable()
on data1.Field<string>("sscode") equals
data2.Field<string>("substation_code")
group data2.Field<string>("Unit") by new {unit= data2.Field<string>("Unit")} into grp
orderby grp.Key.unit
select new
{
unit = grp.Key.unit,
sscons = grp.Sum(s => s.Field<string>("cscons")),
cscons = grp.Sum(s => s.Field<string>("cscons")),
cstagged = grp.Sum(s => s.Field<string>("cstagged"))
};
result.CopyToDataTable();

The problem with your current code is that grp only holds the values taken from the second DataTable, so you cannot get at the items from the first DataTable directly.
If I have understood your question correctly, this should give you the expected output:
var result = from data2 in dt1.AsEnumerable()
group data2 by data2.Field<string>("Unit") into g
select new { Unit = g.Key, dt2Obj = g.FirstOrDefault() } into t3
let filteredData1 = dt.AsEnumerable()
.Where(x => x.Field<string>("sscode") == t3.dt2Obj.Field<string>("sscode"))
select new
{
unit = t3.Unit,
sscons = filteredData1.Sum(s => s.Field<int>("scons")),
cscons = filteredData1.Sum(s => s.Field<int>("cscons")),
cstagged = filteredData1.Sum(s => s.Field<int>("cstagged"))
};
First we group by Unit in the second DataTable (as that is the group we need), then we project the entire row with FirstOrDefault to get its sscode. After that, we simply filter the first table by the sscode we got from the group and project the summed items.

First, you have to select after the group by, otherwise only the grouped field is selected.
Second, you cannot sum strings, only numeric fields (int, double...).
I'm not fluent in the LINQ query syntax, so I've changed it to a method chain.
var result =
dt.AsEnumerable()
.Join(dt1.AsEnumerable(), data1 => data1.Field<string>("sscode"), data2 => data2.Field<string>("substation_code"),
(data1, data2) => new {data1, data2})
.GroupBy(t => new {unit = t.data2.Field<string>("Unit")},
t => t.data1)
.Select(
grp =>
new
{
unit = grp.Key.unit,
sscons = grp.Sum(s => s.Field<int>("sscons")),
cscons = grp.Sum(s => s.Field<int>("cscons")),
cstagged = grp.Sum(s => s.Field<int>("cstagged"))
});
Note: you cannot call CopyToDataTable on this query, because it projects anonymous objects rather than DataRow instances.
Update
Since I understand that your fields are stored as strings, you should use Convert.ToInt32:
grp.Sum(s => Convert.ToInt32(s.Field<string>("cscons")))
Update 2
As per the chat, it seems that the values are decimals and not ints:
sscons = grp.Sum(s => s.Field<decimal>("sscons")),
cscons = grp.Sum(s => s.Field<decimal>("cscons")),
cstagged = grp.Sum(s => s.Field<decimal>("cstagged"))
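
Because the query yields anonymous objects, the totals have to be copied into a DataTable by hand if one is needed. The sketch below shows one way, under these assumptions: the numeric columns are int, dt1 uses the column name sscode as in the sample tables (the question's code uses substation_code), and the UnitTotals class and Build method are hypothetical names. It also removes repeated (Unit, sscode) pairs from dt1 before joining, so that a unit listed twice does not double the sums.

```csharp
using System;
using System.Data;
using System.Linq;

public static class UnitTotals
{
    public static DataTable Build(DataTable dt, DataTable dt1)
    {
        var result = new DataTable();
        result.Columns.Add("Unit", typeof(string));
        result.Columns.Add("scons", typeof(int));
        result.Columns.Add("cscons", typeof(int));
        result.Columns.Add("cstagged", typeof(int));

        // Distinct (Unit, sscode) pairs; anonymous types compare by value.
        var unitCodes = dt1.AsEnumerable()
            .Select(r => new { Unit = r.Field<string>("Unit"), Code = r.Field<string>("sscode") })
            .Distinct();

        // Attach the matching rows of dt to each unit, then group them by unit.
        var totals = unitCodes
            .Join(dt.AsEnumerable(),
                  u => u.Code,
                  d => d.Field<string>("sscode"),
                  (u, d) => new { u.Unit, Row = d })
            .GroupBy(x => x.Unit, x => x.Row);

        foreach (var g in totals)
        {
            result.Rows.Add(
                g.Key,
                g.Sum(r => r.Field<int>("scons")),
                g.Sum(r => r.Field<int>("cscons")),
                g.Sum(r => r.Field<int>("cstagged")));
        }

        return result;
    }
}
```

For string or decimal columns, the Field<int> calls and the column types would need to change accordingly, as described in the updates above.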

Find duplicates in datatable with multiple columns except two

I am new to coding and am trying to check a spreadsheet for duplicate rows. The spreadsheet has 50 columns, and every column has to be compared except two. If rows are duplicated, they should be combined into one row, and the amounts in columns REQNUM and AUTHNUM should be summed. Most of the samples I found use "Field("a column name")". Because of the large number of columns, I want to use a variable that excludes the two columns I don't need in the comparison.
Example:
Before (the dots represent more columns):
COL1 | COL2 | COL3 | ... | REQNUM | AUTHNUM
------------------------------------------
x | y | z | ... | 1 | 1
x | y | z | ... | 2 | 3
After:
COL1 | COL2 | COL3 | ... | REQNUM | AUTHNUM
------------------------------------------
x | y | z | ... | 3 | 4
This is the code I have; it seems close but is not quite right. I was expecting a result containing just the duplicate rows, so that later I can run it through a foreach that sums them and deletes the extra rows. dtRow gets me the columns I want (thanks to "Linq Excluding a column"). When I use this variable in my query, I get no results, and if I remove the "g.Count() > 1" I get all the rows, but without the two columns. I would like to keep those two columns in the results and not have to add them back in later.
var dtRow = dtExcel.Columns.Cast<DataColumn>().Where(c => c.ColumnName != "REQNUM" && c.ColumnName != "AUTHNUM").ToList();
var checkExcel = dtExcel.Rows.Cast<DataRow>()
.GroupBy(x => dtRow.Select(c => x[c]))
.Where(g => g.Count() > 1)
.Select(gr => gr);
//.CopyToDataTable();
Thanks to Ken for the help. This worked great for what I needed. I used the GroupBy clause so I can combine the duplicates into one row and add up the number fields. GroupBy also creates a key that I use in an if statement.
var dtRow = dtExcel.Columns.Cast<DataColumn>().Where(c => c.ColumnName != "REQNUM" && c.ColumnName != "AUTHNUM").ToList();
var excelDup = dtExcel.Rows.Cast<DataRow>()
.GroupBy(x => String.Join("", dtRow.Select(c => x[c])))
.Select(g =>
{
var row = g.First();
row.SetField("REQNUM", g.Sum(x => x.Field<double>("REQNUM")));
row.SetField("AUTHNUM", g.Sum(x => x.Field<double>("AUTHNUM")));
return row;
})
.CopyToDataTable();
I also used a Where clause to build the row sequences for a DataRow comparison, so no key is needed.
// Lists all columns except three; used in the queries below
var dtExcelRow = dtExcel.Columns
.Cast<DataColumn>().Where(c => c.ColumnName != "TITLE" && c.ColumnName != "REQSTR" && c.ColumnName != "AUTHSTR").ToList();
var dtListRow = dtList.Columns
.Cast<DataColumn>().Where(c => c.ColumnName != "TITLE" && c.ColumnName != "REQSTR" && c.ColumnName != "AUTHSTR").ToList();
// Queries build the DataRow lists for the comparison
IEnumerable<DataRow> eRow = dtExcel.AsEnumerable()
.Where(w => dtExcelRow.Select(c => w[c]).Any())
.Select(x => x);
IEnumerable<DataRow> lRow = dtList.AsEnumerable()
.Where(w => dtListRow.Select(c => w[c]).Any())
.Select(x => x);
// First comparison: records that are new or have changes. Second comparison: old records that are being changed.
var newRecords = eRow.AsEnumerable().Except(lRow.AsEnumerable(), DataRowComparer.Default);
var oldRecords = lRow.AsEnumerable().Except(eRow.AsEnumerable(), DataRowComparer.Default);

You cannot simply group the data by dtRow.Select(c => x[c]), because it returns an IEnumerable: two such sequences may have the same content, but they are still different objects, and GroupBy compares them by reference.
If the values are strings, you can group the data by the joined string:
x => String.Join("", dtRow.Select(c => x[c]))
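
One caveat with an empty separator is that different rows can produce the same key: the values "ab", "c" and "a", "bc" both join to "abc". A possible refinement, sketched below, is to join with a separator that is assumed never to occur in the data. The RowKey helper and the choice of the unit separator character (U+001F) are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowKey
{
    // Unit separator control character; assumed not to occur in the cell values.
    private const string Separator = "\u001F";

    // Builds a grouping key from the given columns of a row.
    // Convert.ToString turns DBNull into an empty string.
    public static string For(DataRow row, IEnumerable<DataColumn> keyColumns)
    {
        return string.Join(Separator, keyColumns.Select(c => Convert.ToString(row[c])));
    }
}
```

It would be used as .GroupBy(x => RowKey.For(x, dtRow)) in place of the String.Join call.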

Sum/Count Column Data Datatable C# Console App

I connect to an ODBC source and populate a DataTable.
Depending on the identifier type, INVOICE is positive or negative.
I need to sum column two for each identifier.
I currently group by the IDENTIFIER column, but I use a count, which ignores the sign: it simply counts each row.
Here is example.
IDENTIFIER----| INVOICE
1A557--------| 1 -----------|
2B123--------| 1 -----------|
1A557--------| -1 -----------|
1A557--------| 1 -----------|
2B123--------| 1 -----------|
9C437--------| 1 -----------|
What I want to see is a summary.
This is the result of the above.
1A557--------| 1 -----------|
2B123--------| 2 -----------|
9C437--------| 1 -----------|
This is the code I currently use, which does not do the job.
var accountGroups = completeDT_units.AsEnumerable()
.GroupBy(row => row.Field<String>("IDENTIFIER"))
.Select(grp => new
{
Account = grp.Key,
Count = grp.Count()
});
Once this has run, I need to see the summary counts.
I have previously copied the results to another DataTable using the following code.
var tblAccCounts = new DataTable();
tblAccCounts.Columns.Add("IDENTIFIER");
tblAccCounts.Columns.Add("Totals"); //, typeof(int)
foreach (var grp in accountGroups)
tblAccCounts.Rows.Add(grp.Account, grp.Count);

You should use grp.Sum instead of grp.Count.
Something like:
var accountGroups = completeDT_units.AsEnumerable()
.GroupBy(row => row.Field<String>("IDENTIFIER"))
.Select(grp => new
{
Account = grp.Key,
Count = grp.Sum(row=>row.Field<int>("INVOICE"))
});

Try this one:
var accountGroups = completeDT_units.AsEnumerable()
.GroupBy(row => row.Field<String>("IDENTIFIER"))
.Select(grp => new
{
Account = grp.Key,
Count = grp.Sum(r => r.Field<int>("INVOICE"))
});

Entity Framework group by and get max

Let's say I have a table with the structure below:
MyRow:
Id Name Date
1 A 2015/01/01
2 B 2015/01/01
3 C 2015/01/02
4 A 2015/01/03
5 B 2015/01/03
6 A 2015/01/02
7 C 2015/01/01
Using EF, I would like to get a list of MyRow containing one element per distinct name, each with the newest date, so in this case it would be:
4 A 2015/01/03
5 B 2015/01/03
3 C 2015/01/02
I started with something like this:
var myRows = context.MyRows.GroupBy(mr => mr.Name).Select(..now wth with max..)

Order each group and take the last of each.
Or since EF doesn't (last time I checked) support Last(), order each group in reverse and take the first:
var myRows = context.MyRows
.GroupBy(mr => mr.Name)
.Select(grp => grp.OrderByDescending(mr => mr.Date).First());
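
The same query shape can be tried in memory with LINQ to Objects. The sketch below assumes a MyRow class with Id, Name and Date properties matching the table in the question; the NewestPerName class and Pick method are hypothetical names. It only illustrates the grouping logic and says nothing about how a particular EF version translates the query to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the entity, based on the table in the question.
public class MyRow
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime Date { get; set; }
}

public static class NewestPerName
{
    // Returns, for each distinct Name, the row with the latest Date,
    // sorted by Name for a stable output order.
    public static List<MyRow> Pick(IEnumerable<MyRow> rows)
    {
        return rows
            .GroupBy(r => r.Name)
            .Select(g => g.OrderByDescending(r => r.Date).First())
            .OrderBy(r => r.Name)
            .ToList();
    }
}
```

For the seven sample rows, this should pick the rows with Id 4, 5 and 3.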

var data = context.MyRows.GroupBy(p => p.Name)
.Select(g => new {
Type = g.Key,
Date = g.OrderByDescending(p => p.Date)
.FirstOrDefault()
});
