Linq Distinct not bringing back the correct results - c#

I'm trying to select distinct values from a DataTable using LINQ. The DataTable is populated from an Excel sheet whose columns are dynamic, except that every sheet has a mandatory column named SERIAL NUMBER.
I have a DataTable for demo purposes which consists of 4 serial numbers:
12345
12345
98765
98765
When I do
var distinctList = dt.AsEnumerable().Select(a => a).Distinct().ToList();
I get all four records instead of 2.
If I do
var distinctList = dt.AsEnumerable().Select(a => a.Field<string>("SERIAL NUMBER")).Distinct().ToList();
then I get the correct results, but the list only contains that one column from dt and not all the other columns.
Can someone tell me where I'm going wrong, please?

The problem is that the Distinct method uses the default equality comparer, which for DataRow compares by reference. To get the desired result, use the Distinct overload that accepts an IEqualityComparer<T> and pass DataRowComparer.Default:
The DataRowComparer<TRow> class is used to compare the values of the DataRow objects and does not compare the object references.
var distinctList = dt.AsEnumerable().Distinct(DataRowComparer.Default).ToList();
For more info, see Comparing DataRows (LINQ to DataSet).
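As a minimal sketch of the difference, the class below rebuilds the four-row demo table from the question and counts distinct rows both ways. The NAME column and the class and method names are hypothetical, added only for illustration.

```csharp
using System.Data;
using System.Linq;

public static class DistinctRowsDemo
{
    // Rebuilds the question's demo table. "NAME" is a hypothetical extra column.
    public static DataTable BuildTable()
    {
        var dt = new DataTable();
        dt.Columns.Add("SERIAL NUMBER", typeof(string));
        dt.Columns.Add("NAME", typeof(string));
        dt.Rows.Add("12345", "A");
        dt.Rows.Add("12345", "A");
        dt.Rows.Add("98765", "B");
        dt.Rows.Add("98765", "B");
        return dt;
    }

    // Default comparer: every DataRow is a separate object, so nothing is removed.
    public static int CountByReference(DataTable dt) =>
        dt.AsEnumerable().Distinct().Count();

    // DataRowComparer.Default compares the column values of the rows.
    public static int CountByValue(DataTable dt) =>
        dt.AsEnumerable().Distinct(DataRowComparer.Default).Count();
}
```

Note that DataRowComparer.Default compares every column, so two rows that share a serial number but differ in any other column are still treated as distinct.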

So, you want to group them by Serial Number and retrieve the full DataRow? Assuming that after grouping them we want to retrieve the first item:
var distinctList = dt.AsEnumerable().GroupBy(a => a.Field<string>("SERIAL NUMBER"))
.Select(a => a.FirstOrDefault()).Distinct().ToList();
EDIT: As requested
var distinctValues = dt.AsEnumerable().Select(a => a.Field<string>("SERIAL NUMBER")).Distinct().ToList();
var duplicateValues = dt.AsEnumerable().GroupBy(a => a.Field<string>("SERIAL NUMBER")).SelectMany(a => a.Skip(1)).Distinct().ToList();
var duplicatesRemoved = dt.AsEnumerable().Except(duplicateValues);
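The grouping approach can be sketched as follows for the case where rows with the same serial number differ in other columns and only the first row per serial number should be kept. The class and method names are hypothetical.

```csharp
using System.Data;
using System.Linq;

public static class FirstRowPerSerialDemo
{
    // Keeps the first row seen for each serial number and copies the
    // surviving rows, with all their columns, into a new DataTable.
    public static DataTable FirstRowPerSerial(DataTable dt)
    {
        var firstRows = dt.AsEnumerable()
            .GroupBy(r => r.Field<string>("SERIAL NUMBER"))
            .Select(g => g.First());

        // CopyToDataTable throws InvalidOperationException when the source has no rows.
        return firstRows.CopyToDataTable();
    }
}
```

Unlike a value comparison over the whole row, this treats the serial number alone as the key, so which row survives depends on the row order in the source table.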

In the ToTable method, the first parameter specifies whether you want distinct records, and the following parameters name the columns to make distinct on. Note that the returned table contains only the columns you list.
DataTable returnVals = dt.DefaultView.ToTable(true, "ColumnNameOnWhichYouWantDistinctRecords");
There is no need to use LINQ for this task!
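A small sketch of what ToTable returns for the question's table, assuming the SERIAL NUMBER column from the question. The class and method names are hypothetical.

```csharp
using System.Data;

public static class ToTableDemo
{
    // Returns a new table holding only the "SERIAL NUMBER" column,
    // with duplicate values removed.
    public static DataTable DistinctSerialsOnly(DataTable dt) =>
        dt.DefaultView.ToTable(true, "SERIAL NUMBER");
}
```

Because the result carries only the listed columns, this suits the case where the serial numbers alone are needed, not the case where the other columns must be kept.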

With LINQ, a GroupBy sounds better suited.
var groups = dt.AsEnumerable().GroupBy(a => a.Field<string>("SERIAL NUMBER")).Select(g => new { Key = g.Key, Items = g });
This yields one group per serial number. The rows in each group share the serial number but may differ in their other column values.

Try this:
List<string> distinctValues = (from row in dt.AsEnumerable() select row.Field<string>("SERIAL NUMBER")).Distinct().ToList();
This also works for me:
List<string> distinctValues = dt.AsEnumerable().Select(row => row.Field<string>("SERIAL NUMBER")).Distinct().ToList();

Related

How do I select a single grouped column from a dataview?

I have a data table tblWorkList with multiple columns: RecordNr, GroupNum, Section, SubscriberID, and quite a few others.
What I need to do is create a dataview or second datatable that is the equivalent of:
SELECT SubscriberID FROM tblWorkList GROUP BY SubscriberID;
I'm doing it in the application because I need this to end up in a dataview that will then be filtered based on multiple user inputs. I have that part working. I've spent several hours now beating my head against the internet trying to figure out how to do this, but I keep running up against errors in solutions that LOOK like they should work but end up failing spectacularly. Although, that said, I'm VERY inexperienced with LINQ right now, so I'm sure I'm missing something pretty straightforward.
(The basic functionality is this: The table contains a list of records to be processed. Basically, I need to take the table full of records, pull the subscriber IDs into a dataview, allow the user to filter that dataview down by a variety of methods (and providing the user a running count of the number of SubscriberID's matching the selected criteria), and when they're done, assign all of the records associated with the resulting SubscriberID collection to a specific analyst to be processed.)
All of the methods I've attempted to use to create the list or dataview of SubscriberID values are enclosed in this:
using (DataTable dt = dsWorkData.Tables["tblWorkData"])
The table tblWorkData contains approximately 23,000 records.
Here are several of my attempts.
Attempt 1 - Error is
'Parameter may not be null. Parameter: source'
var result1 = from row in dt.AsEnumerable()
group row by row.Field<string>("SubscriberID") into grp
select new { SubscriberID = grp.Key };
ShowMessage(result1.Count().ToString());
Attempt 2 - Error is
'Cannot implicitly convert anonymous type: string SubscriberID to DataRow'
EnumerableRowCollection<DataRow> query =
from row in dt.AsEnumerable()
group row by row.Field<string>("SubscriberID") into grp
select new { SubscriberID = grp.Key };
Attempt 3 - Error is
'The [third] name 'row' does not exist in the current context.'
EnumerableRowCollection<DataRow> query2 =
from row in dt.AsEnumerable()
group row by row.Field<string>("SubscriberID") into grp
select row;
Attempt 4 - same error as Attempt 1:
DataTable newDt = dt.AsEnumerable()
.GroupBy(r => new { SubscriberID = r["SubscriberID"] })
.Select(g => g.OrderBy(r => r["SubscriberID"]).First())
.CopyToDataTable();
MessageBox.Show(newDt.Rows.Count.ToString());
Attempt 5 - same error as Attempt 1:
var result = dt.AsEnumerable().GroupBy(row => row.Field<string>("SubscriberID"));
MessageBox.Show(result.Count().ToString());
Attempt 6 - same error as Attempt 1:
var results = dt.AsEnumerable().GroupBy(g => g["SubscriberID"])
.Select(x => x.First());
MessageBox.Show(results.Count().ToString());
So can someone explain what I'm doing wrong here, or at least point me in the right direction? I don't really care WHICH approach gets used, for the record, as long as there's a way to do this.
Answer was a pair of comments from NetMage:
Your SQL query is really using GROUP BY to do DISTINCT, so just use the LINQ Distinct: dt.AsEnumerable().Select(r => r.Field<string>("SubscriberID") ).Distinct().
PS Your first error implies that dt is null - source is the parameter name to AsEnumerable.
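Both comments can be combined into one sketch: look the table up, fail with a clear message if the lookup returned null, then take the distinct IDs. The class and method names here are hypothetical. The DataSet.Tables indexer returns null when no table has the given name, which is one way dt can end up null.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class SubscriberIdDemo
{
    public static List<string> DistinctSubscriberIds(DataSet ds, string tableName)
    {
        // Tables[name] returns null rather than throwing when the name is unknown.
        DataTable dt = ds.Tables[tableName];
        if (dt == null)
            throw new ArgumentException("No table named '" + tableName + "' in the DataSet.");

        // Equivalent of SELECT SubscriberID ... GROUP BY SubscriberID.
        return dt.AsEnumerable()
                 .Select(r => r.Field<string>("SubscriberID"))
                 .Distinct()
                 .ToList();
    }
}
```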

With LINQ DISTINCT a Data Table Multiple Columns Excluding a Single Column

I have a C# DataTable that I am loading data into. After that, I am trying to keep only the distinct entries while creating a List<MyObject> from them.
Here is the code I am working with:
viewModelList = (from item in response.AsEnumerable()
select new
{
description = DataTableOperationHelper.GetStringValue(item, "description"),
unitCost = DataTableOperationHelper.GetDecimalValue(item, "unitcost"),
defaultChargeable = DataTableOperationHelper.GetBoolValue(item, "defaultChargeable"),
contractId = DataTableOperationHelper.GetIntValue(item, "contractID"),
consumableid = DataTableOperationHelper.GetIntValue(item, "consumableid")
})
.Distinct()
.Select(x => new ConsumablesViewModel(
x.description,
x.unitCost,
x.defaultChargeable,
x.contractId,
x.consumableid)
)
.ToList();
I just want to exclude a single column (consumableid) from the DISTINCT comparison. How can I apply DISTINCT to the rest of the data while ignoring that one value (consumableid)?
Take a look at this answered question (LinQ distinct with custom comparer leaves duplicates).
Basically, you create an equality comparer for your type that allows you to decide what makes an object distinct.
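A minimal sketch of such a comparer follows. The Consumable class is a hypothetical stand-in for the question's ConsumablesViewModel, and the comparer and helper names are also assumptions. The comparer ignores ConsumableId in both Equals and GetHashCode, which must agree on the fields they use.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's ConsumablesViewModel.
public sealed class Consumable
{
    public string Description { get; set; }
    public decimal UnitCost { get; set; }
    public bool DefaultChargeable { get; set; }
    public int ContractId { get; set; }
    public int ConsumableId { get; set; }
}

// Treats two items as equal when every field except ConsumableId matches.
public sealed class ConsumableComparer : IEqualityComparer<Consumable>
{
    public bool Equals(Consumable x, Consumable y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Description == y.Description
            && x.UnitCost == y.UnitCost
            && x.DefaultChargeable == y.DefaultChargeable
            && x.ContractId == y.ContractId;
    }

    // Hash only the fields that Equals compares.
    public int GetHashCode(Consumable obj) =>
        (obj.Description, obj.UnitCost, obj.DefaultChargeable, obj.ContractId).GetHashCode();
}

public static class ConsumableDemo
{
    public static List<Consumable> DistinctIgnoringId(IEnumerable<Consumable> items) =>
        items.Distinct(new ConsumableComparer()).ToList();
}
```

Distinct keeps the first item of each set of equal items, so the ConsumableId that survives is the one on the first matching row.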

c# - Copy only selected data to new datatable with linq

I've searched the web for quite some time now and can't seem to find an elegant way to
read data from one datatable,
group it by two variables with linq
select only those two variables (forget about the others in the source datatable) and
copy these items to a new datatable.
I got it working without selecting specific variables, but at the amount of data the program is going to process later I'd rather only copy what's really needed.
var temp123 = from row in oldDataTable.AsEnumerable()
orderby row["Column1"] ascending
group row by new { Column1 = row["Column1"], Column2 = row["Column2"] } into grp
select grp.First();
newDataTable = temp123.CopyToDataTable();
Can anyone please be so kind to help me out here? Thanks!
You can use a custom implementation of the CopyToDataTable method from the article "How to: Implement CopyToDataTable Where the Generic Type T Is Not a DataRow":
newDataTable =
oldDataTable
.AsEnumerable()
.GroupBy(r => new { Column1 = r["Column1"], Column2 = r["Column2"] })
.Select(g => g.Key) // keep only the two grouping columns
.OrderBy(x => x.Column1)
.CopyToDataTable(); // your custom extension
Another option, as Tim suggested, is to create the DataTable manually.
var newDataTable = new DataTable();
newDataTable.Columns.Add("Column1");
newDataTable.Columns.Add("Column2");
foreach(var item in temp123)
newDataTable.Rows.Add(item["Column1"], item["Column2"]); // temp123 yields DataRow objects
And the last option (if possible): don't use a DataTable at all, and simply use a collection of strongly typed objects.

How to convert to int and then compare in linq query c#

IEnumerable<classB> list = getItems();
//dt is datatable
list = list.Where(x => Convert.ToInt32( !dt.Columns["Id"]) == (x.Id));
I want to keep only the items in the list whose Id matches a value in the datatable's Id column. The rest should be removed. I'm not doing it right.
The datatable can have: ID - 1,3,4,5,7
The list can have: ID - 1,2,3,4,5,6,7,8,9,10
I want the output list to have: ID - 1,3,4,5,7
Your code won't work because you're comparing a definition of a column to an integer value. That's not a sensible comparison to make.
What you can do is put all of the values from the data table into a collection that can be effectively searched and then get all of the items in the list that are also in that collection:
var ids = new HashSet<int>(dt.AsEnumerable()
.Select(row => row.Field<int>("Id")));
list = list.Where(x => ids.Contains(x.Id));
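The lookup-set approach can be sketched end to end as below, using plain int values in place of the question's classB items. The class and method names are hypothetical, and the Id column is assumed to be of type int.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class IdFilterDemo
{
    // Keeps only the candidate ids that also appear in the table's "Id" column.
    public static List<int> KeepMatching(DataTable dt, IEnumerable<int> candidates)
    {
        // Build the set once so each membership check is a hash lookup.
        var ids = new HashSet<int>(dt.AsEnumerable().Select(row => row.Field<int>("Id")));
        return candidates.Where(id => ids.Contains(id)).ToList();
    }
}
```

With the table holding 1, 3, 4, 5, 7 and the candidates 1 through 10, the result is 1, 3, 4, 5, 7, in the order of the candidate list.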
Try this one
var idList = dt.AsEnumerable().Select(d => (int) d["Id"]).ToList();
list = list.Where(x => idList.Contains(x.Id));
You can't do it like that. Your dt.Columns["Id"] returns the DataColumn and not the value inside that column in a specific DataRow. You need to join two LINQ queries: the first one you already have, and the other you need to build from the DataTable.
var queryDt = (from dtRow in dt.AsEnumerable()
where !dtRow.IsNull("Id")
select Convert.ToInt32(dtRow["Id"])).ToList();
Now the join:
var qry = from nonNull in queryDt
join existing in list on nonNull equals existing.Id
select existing;

Get an array of IDs(values) from a datatable

I have a datatable with 50 rows and an ID column. I am trying to get an array that holds only the IDs, like:
string [] IDs = (from row in DataTable.Rows
select row["ID"].ToString()).ToArray();
Is there a way to do this? I always get the error "Could not find the implementation of the query...."
Use the DataTableExtensions.AsEnumerable method by adding a reference to System.Data.DataSetExtensions and a using System.Data; directive. Then you should be able to use the following query:
var query = from row in datatable.AsEnumerable()
select row["ID"].ToString();
string[] ids = query.ToArray();
If you really need an array you can use the last line above or enclose the query in parentheses and call ToArray() as you did originally. I'm generally not a fan of the latter approach.
In fluent syntax it would be:
string[] ids = datatable.AsEnumerable()
.Select(row => row["ID"].ToString())
.ToArray();
Is there any way to select a data table into a customer object array, assuming all the columns are going to be the same?
