Find duplicates in datatable with multiple columns except two - c#

I am new to coding and am trying to check a spreadsheet for duplicate rows. The spreadsheet has 50 columns, and every column has to be compared except two. If rows are duplicated, they should be combined into one row, with the amounts in columns REQNUM and AUTHNUM summed. Most of the samples I found use Field("a column name"). Because of the large number of columns, I want to use a variable that excludes the two I don't need in the comparison.
Example:
Before (the dots represent more columns):
COL1 | COL2 | COL3 | ... | REQNUM | AUTHNUM
:---: | :---: | :---: | ... | :---: | :---:
x | y | z | ... | 1 | 1
x | y | z | ... | 2 | 3
After:
COL1 | COL2 | COL3 | ... | REQNUM | AUTHNUM
:---: | :---: | :---: | ... | :---: | :---:
x | y | z | ... | 3 | 4
This is the code I have; it seems close but is not quite right. I was expecting a result containing just the duplicate rows, so that later I can run it through a foreach that sums the amounts and deletes the extra rows. dtRow gets me the columns I want (thanks to "Linq Excluding a column"). When I use this variable in my query, I get no results, and if I remove the "g.Count() > 1" I get all the rows, but they are missing the two columns. I would like to keep those two columns in the results and not have to add them back in later.
var dtRow = dtExcel.Columns.Cast<DataColumn>().Where(c => c.ColumnName != "REQNUM" && c.ColumnName != "AUTHNUM").ToList();
var checkExcel = dtExcel.Rows.Cast<DataRow>()
.GroupBy(x => dtRow.Select(c => x[c]))
.Where(g => g.Count() > 1)
.Select(gr => gr);
//.CopyToDataTable();
Thanks to Ken for the help. This worked great for what I needed. I used the GroupBy clause so I can combine the duplicates into one row and sum the number fields. GroupBy also creates a key that I use in an if statement.
var dtRow = dtExcel.Columns.Cast<DataColumn>().Where(c => c.ColumnName != "REQNUM" && c.ColumnName != "AUTHNUM").ToList();
var excelDup = dtExcel.Rows.Cast<DataRow>()
.GroupBy(x => String.Join("", dtRow.Select(c => x[c])))
.Select(g =>
{
var row = g.First();
row.SetField("REQNUM", g.Sum(x => x.Field<double>("REQNUM")));
row.SetField("AUTHNUM", g.Sum(x => x.Field<double>("AUTHNUM")));
return row;
})
.CopyToDataTable();
I also used a Where clause to build the row lists for a DataRow comparison, where no key is needed.
//Creates variable with all columns except three. It is used in next query
var dtExcelRow = dtExcel.Columns
    .Cast<DataColumn>().Where(c => c.ColumnName != "TITLE" && c.ColumnName != "REQSTR" && c.ColumnName != "AUTHSTR").ToList();
var dtListRow = dtList.Columns
    .Cast<DataColumn>().Where(c => c.ColumnName != "TITLE" && c.ColumnName != "REQSTR" && c.ColumnName != "AUTHSTR").ToList();
// Queries create the DataRow lists for the compare
IEnumerable<DataRow> eRow = dtExcel.AsEnumerable()
.Where(w => dtExcelRow.Select(c => w[c]).Any())
.Select(x => x);
IEnumerable<DataRow> lRow = dtList.AsEnumerable()
.Where(w => dtListRow.Select(c => w[c]).Any())
.Select(x => x);
// The 1st compare gets the list of records that are new or have changes. The 2nd is the list of old records being changed.
var newRecords = eRow.AsEnumerable().Except(lRow.AsEnumerable(), DataRowComparer.Default);
var oldRecords = lRow.AsEnumerable().Except(eRow.AsEnumerable(), DataRowComparer.Default);
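One caveat on the compare above: DataRowComparer.Default compares every column of the two rows, and the Where(... .Any()) filter does not drop any columns, so TITLE, REQSTR and AUTHSTR still take part in the comparison. A minimal sketch of a comparer limited to chosen columns follows. The names SelectedColumnsComparer and RowDiff.OnlyInLeft are hypothetical, both tables are assumed to share column names and types, and System.HashCode assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two rows are equal when the named columns hold equal values.
public sealed class SelectedColumnsComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _columns;

    public SelectedColumnsComparer(IEnumerable<string> columns)
    {
        _columns = columns.ToArray();
    }

    public bool Equals(DataRow a, DataRow b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        // Columns are looked up by name because the rows come from different tables.
        return _columns.All(c => object.Equals(a[c], b[c]));
    }

    public int GetHashCode(DataRow row)
    {
        var hash = new HashCode();
        foreach (var c in _columns) hash.Add(row[c]);
        return hash.ToHashCode();
    }
}

public static class RowDiff
{
    // Rows of 'left' with no match in 'right', ignoring the columns named in 'ignored'.
    public static List<DataRow> OnlyInLeft(DataTable left, DataTable right, params string[] ignored)
    {
        var columns = left.Columns.Cast<DataColumn>()
            .Select(c => c.ColumnName)
            .Where(name => !ignored.Contains(name));
        var comparer = new SelectedColumnsComparer(columns);
        return left.AsEnumerable().Except(right.AsEnumerable(), comparer).ToList();
    }
}
```

With this, RowDiff.OnlyInLeft(dtExcel, dtList, "TITLE", "REQSTR", "AUTHSTR") would play the role of newRecords, and swapping the two tables would give oldRecords.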

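You cannot just group the data by dtRow.Select(c => x[c]) because it returns an IEnumerable, and GroupBy compares keys with the default equality comparer, which for an IEnumerable is reference equality. Two sequences may have the same content, but they are still different IEnumerable instances, so every row ends up in its own group.
If the values are strings, you can group the data by the joined string: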
x => String.Join("", dtRow.Select(c => x[c]))
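Joining with an empty separator can also merge rows that differ, since "ab" + "c" and "a" + "bc" produce the same key. A minimal sketch of an alternative that keeps the values separate and compares them element by element is below. ColumnValuesComparer and DuplicateRows.GroupByAllExcept are hypothetical names, and System.HashCode assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: two object[] keys are equal when their elements are pairwise equal.
public sealed class ColumnValuesComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] a, object[] b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.SequenceEqual(b);
    }

    public int GetHashCode(object[] key)
    {
        var hash = new HashCode();
        foreach (var value in key) hash.Add(value);
        return hash.ToHashCode();
    }
}

public static class DuplicateRows
{
    // Groups rows on every column except those named in 'excluded'.
    // The grouped rows are the original DataRow objects, so all columns stay available.
    public static List<IGrouping<object[], DataRow>> GroupByAllExcept(
        DataTable table, params string[] excluded)
    {
        var keyColumns = table.Columns.Cast<DataColumn>()
            .Where(c => !excluded.Contains(c.ColumnName))
            .ToList();

        return table.AsEnumerable()
            .GroupBy(r => keyColumns.Select(c => r[c]).ToArray(), new ColumnValuesComparer())
            .ToList();
    }
}
```

Filtering the returned groups with Where(g => g.Count() > 1) would then leave only the duplicated rows, REQNUM and AUTHNUM included.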

Related

sum distinct values from a column in datagridview

I have a datagridview with two columns like this:
group | quantity
------------------------
chest | 3
legs | 7
back | 2
chest | 1
back | 5
legs | 2
What I'm trying to do is get the sum for each distinct group into a list and use that list to populate another datagridview.
So in this example the result must be:
chest | 4
legs | 9
back | 7
I've tried some linq query code but without any success.
How can I do it?
Here's some Linq queries I tried:
List<string> vv = dataGridView1.Rows.Cast<DataGridViewRow>()
.Where(x => !x.IsNewRow)
// either..
.Where(x => x.Cells[7].Value != null)
//..or or both
.Select(x => x.Cells[7].Value.ToString())
.Distinct()
.ToList();
dataGridView6.DataSource = vv;
EDIT
The group column is filled automatically after a selection in another column's combobox; the quantity is filled in manually. For the group-by I found this code, which works but throws an error if a cell is empty:
var Sums = dataGridView1.Rows.Cast<DataGridViewRow>()
.GroupBy(row => row.Cells[7].Value.ToString()) // group column
.Select(g => new { User = g.Key, Sum = g.Sum(row => Convert.ToInt32(row.Cells[1].Value)) });
dataGridView6.DataSource = Sums.ToList();
OK, here is the solution that works:
var Sums = dataGridView1.Rows.Cast<DataGridViewRow>()
.Where(row => row.Cells[7].Value != null)
.GroupBy(row => row.Cells[7].Value.ToString()) // group column
.Select(g => new { User = g.Key, Sum = g.Sum(row => Convert.ToInt32(row.Cells[1].Value)) }); // quantity column
dataGridView6.DataSource = Sums.ToList();
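The same grouping can be tried outside WinForms. The sketch below is an illustration only: it takes plain (group, quantity) pairs in place of the grid cells, skips rows whose group cell is empty, and counts a quantity that is empty or not a number as 0. GroupTotals.Sum is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupTotals
{
    // 'cells' stands in for the (group cell value, quantity cell value) pairs read from the grid.
    public static List<KeyValuePair<string, int>> Sum(
        IEnumerable<(object Group, object Quantity)> cells)
    {
        return cells
            // Skip rows with no group, such as the empty "new row" at the bottom of a grid.
            .Where(c => !string.IsNullOrWhiteSpace(c.Group?.ToString()))
            .GroupBy(c => c.Group.ToString())
            .Select(g => new KeyValuePair<string, int>(
                g.Key,
                // An empty or non-numeric quantity contributes 0 instead of throwing.
                g.Sum(c => int.TryParse(c.Quantity?.ToString(), out var n) ? n : 0)))
            .ToList();
    }
}
```

Using int.TryParse instead of Convert.ToInt32 is a design choice here: Convert.ToInt32 throws a FormatException on an empty string, while TryParse simply reports failure.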

C# SQL filter for a pair of values?

I'm working with a data table that has some rows in it copied over as a kind of backup. I'm trying to filter out data that's been copied, but I'm having some trouble because I need to filter on a pair of columns. Each row has a name and a date, a flag saying whether it's a copy or one of a few other things (it has more than just two values, but these are the ones I'm interested in), and some other information. I'm trying to get all the rows that do not appear as backups, so for example:
ABC 1/1/2001 dataSet ... | ABC 1/1/2001 backupSet ...
DEF 2/2/2002 dataSet ... | DEF 2/2/2002 backupSet ...
GHI 3/3/2003 dataSet ... | ABC 4/4/2004 backupSet ...
ABC 4/4/2004 dataSet ... |
DEF 5/5/2005 dataSet ... |
ABC 6/6/2006 dataSet ... |
Would result in:
GHI 3/3/2003 dataSet ...
DEF 5/5/2005 dataSet ...
ABC 6/6/2006 dataSet ...
I can filter on one column, but I don't know how to do both simultaneously.
var result = from a in db.table
where a.type == "dataSet"
let backupData = (from b in db.table where b.type == "backupSet" select b.name)
where !backupData.Contains(a.name)
select new DataObject
{
...
};
Is as far as I got.
I'm also trying to keep it to just one query since the result set could potentially be quite large, so I didn't want to just create a pair of collections in memory and then try and filter them out. Is that possible? Still a bit inexperienced at SQL, any help is appreciated.
You can use either Any (with !):
var result =
from a in db.table
where a.type == "dataSet"
&& !db.table.Any(b => b.type == "backupSet"
&& b.name == a.name && b.date == a.date)
select new DataObject
{
...
};
or (looking a bit more complicated, but in general more efficient) antijoin (implemented as left outer join with null right side):
var result =
from a in db.table.Where(x => x.type == "dataSet")
join b in db.table.Where(x => x.type == "backupSet")
on new { a.name, a.date } equals new { b.name, b.date } into bGroup
from b in bGroup.DefaultIfEmpty()
where b == null
select new DataObject
{
...
};
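To check the pair-of-columns logic against the sample rows from the question, here is a minimal in-memory sketch (LINQ to Objects, not the database query itself). The Entry record and BackupFilter.WithoutBackup are hypothetical names, records assume C# 9 or later, and the group join with !bGroup.Any() is a variation on the left outer join above that expresses the same anti-join.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type holding only the three columns the filter needs.
public sealed record Entry(string Name, DateTime Date, string Type);

public static class BackupFilter
{
    // Returns the "dataSet" rows that have no "backupSet" row with the same name and date.
    public static List<Entry> WithoutBackup(IEnumerable<Entry> table)
    {
        var rows = table.ToList();
        var query =
            from a in rows.Where(x => x.Type == "dataSet")
            join b in rows.Where(x => x.Type == "backupSet")
                on new { a.Name, a.Date } equals new { b.Name, b.Date } into bGroup
            where !bGroup.Any() // no matching backup row for this (name, date) pair
            select a;
        return query.ToList();
    }
}
```

The anonymous type in the join key is what makes the match work on both columns at once: two anonymous objects with equal Name and Date compare as equal.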
Depending on what you are trying to do (and what you can do), you may want to use group by, assuming name is the key to link backupSet and dataSet together:
var result = db.Table
.GroupBy(x => new { x.name, x.type }) // Composite key
.Select(g => new {
Name = g.Key.name, // ABC, DEF, etc
Type = g.Key.type, // "backupSet", "dataSet"
LastRecord = g.OrderByDescending(x => x.date).FirstOrDefault() // Last record for either backupSet or dataSet
})
.ToList() // Materialize... query can be very complex
.GroupBy(x => x.Name)
.Select(g => new {
Name = g.Key,
LastBackupSet = g.First(x => x.Type == "backupSet").LastRecord,
LastDataSet = g.First(x => x.Type == "dataSet").LastRecord
})
.ToList();
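Be careful, though: in the last expression, g.First can throw an InvalidOperationException ("Sequence contains no matching element") when a name has no row of that type, and if you switch to FirstOrDefault, accessing .LastRecord on the null result throws a NullReferenceException ("Object reference not set to an instance of an object").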
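One way to sidestep both exceptions in the in-memory second step is FirstOrDefault combined with the null-conditional operator. The sketch below is a simplified illustration: LatestByType stands in for the anonymous type produced by the first GroupBy, with a date in place of the full LastRecord, and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the materialized result of the first GroupBy.
public sealed record LatestByType(string Name, string Type, DateTime? LastDate);

public static class LatestPairs
{
    // Pairs the latest backupSet and dataSet date per name; a missing type yields null.
    public static List<(string Name, DateTime? LastBackup, DateTime? LastData)> Pair(
        IEnumerable<LatestByType> latest)
    {
        return latest
            .GroupBy(x => x.Name)
            .Select(g => (
                Name: g.Key,
                // FirstOrDefault returns null when no row matches; ?. then skips the member access.
                LastBackup: g.FirstOrDefault(x => x.Type == "backupSet")?.LastDate,
                LastData: g.FirstOrDefault(x => x.Type == "dataSet")?.LastDate))
            .ToList();
    }
}
```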

Successive SelectMany in Linq Request

I have three tables built with EF code first.
I try to retrieve some information with SelectMany so that I can flatten the query and get only the fields that I need among those three tables.
My tables are as follows:
Tables: ProductOptions *-* ProductOptionValues 1-* LanguageProductOptionValue
ProductOptions | ProductOptionValues | LanguageProductOptionValue
ProductOptionID | OVPriceOffset | LanguagesListID
PriceOffset | OptionValueCategory | ProductOptionValueName
 | ... | 
var queryCabColor = _db.ProductOptions
.Where(c => c.ProductOptionTypeID == 18 && c.ProductId == 1)
.SelectMany(z => z.ProductOptionValues, (productOptions, productOptionValues)
=> new
{
productOptions.ProductOptionID,
productOptions.PriceOffset,
productOptionValues.OVPriceOffset,
productOptionValues.OptionValueCategory,
productOptionValues.ProductOptionValuesID,
productOptionValues.Value,
productOptionValues.LanguageProductOptionValue
})
.SelectMany(d => d.LanguageProductOptionValue, (productOptionValues, productOptionValuesTranslation)
=> new
{
productOptionValuesTranslation.LanguagesListID,
productOptionValuesTranslation.ProductOptionValueName
})
.Where(y => y.LanguagesListID == currentCulture);
So far, when I loop over the query I can retrieve only LanguagesListID and ProductOptionValueName, and I can't find a way to get all of the above-mentioned fields. Any suggestions?
I think in your case LINQ query syntax is more appropriate than explicit SelectMany calls, because every range variable stays in scope for the final select. Something like this should work:
var queryCabColor =
from productOptions in _db.ProductOptions
where productOptions.ProductOptionTypeID == 18 && productOptions.ProductId == 1
from productOptionValues in productOptions.ProductOptionValues
from productOptionValuesTranslation in productOptionValues.LanguageProductOptionValue
where productOptionValuesTranslation.LanguagesListID == currentCulture
select new
{
productOptions.ProductOptionID,
productOptions.PriceOffset,
productOptionValues.OVPriceOffset,
productOptionValues.OptionValueCategory,
productOptionValues.ProductOptionValuesID,
productOptionValues.Value,
productOptionValuesTranslation.LanguagesListID,
productOptionValuesTranslation.ProductOptionValueName
};
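If you prefer to stay with method syntax, the fields disappear in the question's version because the second SelectMany projects only the translation. Each step has to carry the earlier objects forward, which is roughly what the compiler does for query syntax. Below is a minimal in-memory sketch; the record types are simplified, hypothetical stand-ins for the EF entities, and records assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical stand-ins for the three entities.
public sealed record Translation(int LanguagesListID, string ProductOptionValueName);
public sealed record OptionValue(int ProductOptionValuesID, decimal OVPriceOffset, List<Translation> Translations);
public sealed record Option(int ProductOptionID, decimal PriceOffset, List<OptionValue> Values);

public static class Flatten
{
    public static List<(int ProductOptionID, int ProductOptionValuesID, string Name)> Names(
        IEnumerable<Option> options, int culture)
    {
        return options
            // Keep the option (o) next to each of its values (v).
            .SelectMany(o => o.Values, (o, v) => new { o, v })
            // Keep both o and v next to each translation (t).
            .SelectMany(x => x.v.Translations, (x, t) => new { x.o, x.v, t })
            .Where(x => x.t.LanguagesListID == culture)
            .Select(x => (
                ProductOptionID: x.o.ProductOptionID,
                ProductOptionValuesID: x.v.ProductOptionValuesID,
                Name: x.t.ProductOptionValueName))
            .ToList();
    }
}
```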

Find duplicate and merge record into single datatable c#

I am able to find the duplicates among the DataTable rows, like the following:
var groups = table.AsEnumerable()
.GroupBy(r => new
{
c1 = r.Field<String>("Version"),
});
var tblDuplicates = groups
.Where(grp => grp.Count() > 1)
.SelectMany(grp => grp)
.CopyToDataTable();
Now I want to merge all the duplicate records into a single record and sum its Value column.
Pretty much like the following:
DataTable with Duplicates:
Version Value
1 2
2 2
2 1
1 3
2 1
3 2
DataTable with no duplicates and Value summed.:
Version Value
1 5
2 4
3 2
I am aware of this link, which does this with the help of reflection:
http://forums.asp.net/t/1570562.aspx/1
Is there any other way to do it?
Edit:
However, what if I have more than two columns, say five, and I still want to sum the Value column but also need the other columns' data in the resulting summed DataTable? How can I do that? Here I get only Version and Value in my result DataTable. I want the other columns and their values too, as in the following:
Version col1 col2 Value
1 A A 2
2 B B 2
2 B B 1
1 A A 3
2 B B 1
3 C C 2
var result = table.AsEnumerable()
.GroupBy(r => r.Field<string>("Version"))
.Select(g =>
{
var row = table.NewRow();
row.ItemArray = new object[]
{
g.Key,
g.Sum(r => r.Field<int>("Value"))
};
return row;
}).CopyToDataTable();
Edit:
If you want to keep the other fields, try the following:
var result = table.AsEnumerable()
.GroupBy(r => new
{
Version = r.Field<String>("Version"),
Col1 = r.Field<String>("Col1"),
Col2 = r.Field<String>("Col2")
})
.Select(g =>
{
var row = g.First();
row.SetField("Value", g.Sum(r => r.Field<int>("Value")));
return row;
}).CopyToDataTable();
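Two things to be aware of in the version above: g.First() returns a row of the original table, so SetField overwrites Value in the source, and CopyToDataTable throws an InvalidOperationException when the source table has no rows. A minimal sketch that writes into a cloned table instead is below. MergeDuplicates.SumValue is a hypothetical name, and the column names and types follow the example in the question.

```csharp
using System;
using System.Data;
using System.Linq;

public static class MergeDuplicates
{
    // Returns a new table with one row per (Version, Col1, Col2); 'source' is left unchanged.
    public static DataTable SumValue(DataTable source)
    {
        var result = source.Clone(); // same schema, no rows

        var groups = source.AsEnumerable()
            .GroupBy(r => new
            {
                Version = r.Field<string>("Version"),
                Col1 = r.Field<string>("Col1"),
                Col2 = r.Field<string>("Col2")
            });

        foreach (var g in groups)
        {
            result.ImportRow(g.First()); // copies the values of the first row in the group
            var merged = result.Rows[result.Rows.Count - 1];
            merged.SetField("Value", g.Sum(r => r.Field<int>("Value")));
        }

        return result;
    }
}
```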

Remove Duplicate based on column value-linq

I have a many-to-many relationship between employee and group. The following LINQ statement
int[] GroupIDs = {6,7};
var result = from g in umGroups
join empGroup in umEmployeeGroups on g.GroupID equals empGroup.GroupID
where GroupIDs.Contains(g.GroupID)
select new { GrpId = g.GroupID,EmployeeID = empGroup.EmployeeID };
returns the group ID and the employee ID. The result is:
GrpId | EmployeeID
6 | 18
6 | 20
7 | 19
7 | 20
I need to remove the rows in which the employee ID repeats, e.g. either one of the rows with EmployeeID = 20.
Thanks
Okay, if you don't care which employee is removed, you could try something like:
var result = query.GroupBy(x => x.EmployeeID)
.Select(group => group.First());
You haven't specified whether this is in LINQ to SQL, LINQ to Objects or something else... I don't know what the SQL translation of this would be. If you're dealing with a relatively small amount of data you could always force this last bit to be in-process:
var result = query.AsEnumerable()
.GroupBy(x => x.EmployeeID)
.Select(group => group.First());
At that point you could actually use MoreLINQ which has a handy DistinctBy method:
var result = query.AsEnumerable()
.DistinctBy(x => x.EmployeeID);
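On .NET 6 and later, System.Linq ships its own Enumerable.DistinctBy, so MoreLINQ is not required for this case. A minimal in-memory sketch using the rows from the question follows; UniqueEmployees.FirstPerEmployee is a hypothetical name and value tuples stand in for the anonymous type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueEmployees
{
    // Keeps one row per EmployeeID (assumes .NET 6+ for Enumerable.DistinctBy).
    public static List<(int GrpId, int EmployeeID)> FirstPerEmployee(
        IEnumerable<(int GrpId, int EmployeeID)> rows)
    {
        return rows.DistinctBy(r => r.EmployeeID).ToList();
    }
}
```

As with the GroupBy version, which of the duplicate rows survives is not something to rely on unless you order the input first.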
