Group by one column and Distinct by another column using Linq

I am using a LINQ query to group by a column and return the list of rows in each group.
var query = from row in ProcessSummaryData.AsEnumerable()
            group row by new { Key = row.Field<string>("GroupDescription") } into g
            select new
            {
                GroupDescription = g.Key,
                Values = g.ToList(),
            };
The output of this query is something like this:
GroupDescription   Values
1                  12,abc,xyz
                   12,abx,yut
                   13,tye,lki
2                  14,asd,acd
In the example above, each entry in Values is a DataRow; I have only shown some sample values from it.
What I want is for GroupDescription '1' to contain only one row with the value '12'.
I have tried a few things, one of which is running a second LINQ query over the first list, but that overcomplicates things.
How do I use LINQ to group by the first column and then apply Distinct on a certain column of each returned list, so that I get only distinct rows?

To keep only the first occurrence of each value of a field, you can group by that field and then take the first row of each grouping.
var query = from row in ProcessSummaryData.AsEnumerable()
            group row by new { Key = row.Field<string>("GroupDescription") } into g
            select new
            {
                GroupDescription = g.Key,
                Values = (from value in g
                          group value by value["Id"] into valueGroup
                          select valueGroup.First()).ToList()
            };
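A minimal, self-contained sketch of the same approach in method syntax. It assumes the value column is an int column named "Id" (as the answer does) and adds a hypothetical "Name" column only so the sample rows can be told apart. It groups by GroupDescription and then keeps the first row for each Id within each group.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DistinctWithinGroups
{
    // Sample data mirroring the question; "Id" and "Name" are assumed column names.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        table.Columns.Add("GroupDescription", typeof(string));
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add("1", 12, "abc");
        table.Rows.Add("1", 12, "abx");
        table.Rows.Add("1", 13, "tye");
        table.Rows.Add("2", 14, "asd");
        return table;
    }

    // Outer grouping: one entry per GroupDescription.
    // Inner grouping: one row per distinct Id, keeping the first row seen in source order.
    public static Dictionary<string, List<DataRow>> GroupDistinctById(DataTable table)
    {
        return table.AsEnumerable()
            .GroupBy(row => row.Field<string>("GroupDescription"))
            .ToDictionary(
                g => g.Key,
                g => g.GroupBy(row => row.Field<int>("Id"))
                      .Select(idGroup => idGroup.First())
                      .ToList());
    }
}
```

On .NET 6 or later, `g.DistinctBy(row => row.Field<int>("Id"))` should express the same per-group step more directly.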

Related

Duplicates In DataTable, Getting Last By Specifying Two Properties

I have a DataTable with two columns of type String named ID and Value. These values are not required to be unique.
As I add rows to my DataTable throughout my application, at some point I need to get the last row that was added for a given combination of the two values. For example, there may be several records where ID = 1 and Value = 2; I need the last one.
I have been trying to use LINQ GroupBy; MyDataTable is my DataTable:
var groupQuery = from table in MyDataTable.AsEnumerable()
group table by new {column1 = table["PERSON_GU"], column2 = table["FIELD"]}
into groupedTable
select new
{
x = groupedTable.Key, // Each Key contains column1 and column2
y = groupedTable.Count()
};
I can't figure out how to make this select the last row, though; it returns an anonymous type, which is a little outside my development wheelhouse.
In summary: I have a DataTable with two columns, and I am trying to group it by these column values and then get the last row of each group.

If you want the last DataRow of each group:
var groupQuery =
from table in MyDataTable.AsEnumerable()
group table by new {column1 = table["PERSON_GU"], column2 = table["FIELD"]}
into groupedTable
select groupedTable.Last();
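A hedged sketch using the column names from the question (ID and Value, both assumed to be strings). It covers both variants: the last row for every pair, and the last row for one specific pair such as ID = 1 and Value = 2. Both rely on rows enumerating in the order they were added, which holds for rows appended with Rows.Add.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class LastPerKey
{
    // Last row for every (ID, Value) pair. GroupBy keeps each group's
    // elements in source order, so Last() is the most recently added row.
    public static List<DataRow> LastRowPerPair(DataTable table)
    {
        return table.AsEnumerable()
            .GroupBy(row => new
            {
                Id = row.Field<string>("ID"),
                Value = row.Field<string>("Value")
            })
            .Select(g => g.Last())
            .ToList();
    }

    // Last row for one specific pair, or null if no row matches.
    public static DataRow FindLast(DataTable table, string id, string value)
    {
        return table.AsEnumerable()
            .LastOrDefault(row => row.Field<string>("ID") == id
                               && row.Field<string>("Value") == value);
    }
}
```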

How to get a column value from data table with Linq

How do I get a column value from a DataTable? I have the value of an ID column, and based on it I want to get another column's value.
For example, ApplicationId is the primary key column whose value I have, and I want the xyz column value for that ApplicationId.

I achieved this with the following LINQ statement:
List<string> lstResult= (from table in dt.AsEnumerable()
where table.Field<int>("Id") == id
select table.Field<string>("status")).ToList();
string dtStatus = lstResult[0];

You can do it like this:
var results = (from row in dt.AsEnumerable()
               where object.Equals(row["columnname"], value)
               select new { resultcolumnname = row["resultcolumnname"] }).ToList();

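var x = from myRow in myDataTable.AsEnumerable()
        where myRow.Field<int>("ApplicationId") == yourValue  // assumes ApplicationId is an int column
        select myRow.Field<string>("ColumnYouWant");          // assumes the wanted column holds strings
I am not great when it comes to LINQ, but this should do the trick.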
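A small sketch assuming ApplicationId is an int column and xyz a string column. The LINQ version uses FirstOrDefault, so a missing ID yields null instead of the index exception that lstResult[0] throws on an empty list. The second method is a non-LINQ alternative, DataRowCollection.Find, which only works once the table's PrimaryKey is set.

```csharp
using System;
using System.Data;
using System.Linq;

public static class ColumnLookup
{
    // Sample table; the column types are assumptions for illustration.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        DataColumn idColumn = table.Columns.Add("ApplicationId", typeof(int));
        table.Columns.Add("xyz", typeof(string));
        table.PrimaryKey = new[] { idColumn };
        table.Rows.Add(1, "alpha");
        table.Rows.Add(2, "beta");
        return table;
    }

    // LINQ version: returns null when no row has the requested ApplicationId.
    public static string GetXyz(DataTable table, int applicationId)
    {
        return table.AsEnumerable()
            .Where(row => row.Field<int>("ApplicationId") == applicationId)
            .Select(row => row.Field<string>("xyz"))
            .FirstOrDefault();
    }

    // Non-LINQ version: Find searches by the table's PrimaryKey
    // and returns null when the key is not present.
    public static string GetXyzByKey(DataTable table, int applicationId)
    {
        DataRow row = table.Rows.Find(applicationId);
        return row?.Field<string>("xyz");
    }
}
```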

C# Linq filter DataTable using array elements

I want to filter the data in a DataTable using LINQ.
In my scenario I have a dynamically created array of dates, and the DataTable has columns such as id, date, etc.
I need to retrieve the IDs that have all of the dates in the array.
For example:
string[] arr = { "10/10/2012", "11/11/2012", "9/9/2012" };
Table:
ID   date
1    10/10/2012
2    11/11/2012
1    9/9/2012
6    9/9/2012
3    9/9/2012
6    11/11/2012
1    11/11/2012
The output would be 1, because only ID 1 has all the array elements.
To accomplish this I am using the LINQ query below (in VB.NET), but it does not work:
Dim volunteers As DataTable =
(From leftTable In dtavailableVolunteers.AsEnumerable()
Join rightTable In dtavailableVolunteers.AsEnumerable()
On leftTable.VolunteerId Equals rightTable.VolunteerId
Where SelectedDatesArray.All(Function(i) rightTable.Field(Of String)("SelectedDate").Equals(i.ToString()))
Select rightTable).CopyToDataTable()

Let's say your DataTable is dt:
DataRow[] dr = dt.Select("date in ('" + string.Join("','", arr) + "')");
string[] st = dr.Select(ss => ss["id"].ToString()).ToArray();
OR
DataTable newdt = dr.CopyToDataTable();
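The second line uses LINQ. Note that this matches rows that have any of the dates; it does not by itself check that an ID has all of them.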

You could group the rows by ID and then keep only the groups for which no element of arr is missing from the group's dates. Something like:
var result = from item in list
             group item by item.ID into grouping
             where !Array.Exists(arr, date =>
                 !grouping.Select(x => x.Date).Contains(date))
             select grouping.Key;

Here is another version:
from volunteer in dtavailableVolunteers.AsEnumerable()
group volunteer by volunteer["VolunteerId"] into g
let volunteerDates = g.Select(groupedElement => groupedElement.Field<string>("SelectedDate"))
where arr.All(date => volunteerDates.Contains(date))
select g.Key
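A self-contained version of the group-then-check approach, built on the sample table from the question. It assumes ID is an int column and date a string column in the same format as the array; each ID's dates are put into a HashSet so that checking each required date is cheap.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class AllDatesFilter
{
    // Sample data from the question; column types are assumptions.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("date", typeof(string));
        table.Rows.Add(1, "10/10/2012");
        table.Rows.Add(2, "11/11/2012");
        table.Rows.Add(1, "9/9/2012");
        table.Rows.Add(6, "9/9/2012");
        table.Rows.Add(3, "9/9/2012");
        table.Rows.Add(6, "11/11/2012");
        table.Rows.Add(1, "11/11/2012");
        return table;
    }

    // Keeps the IDs whose rows, taken together, contain every required date.
    public static List<int> IdsWithAllDates(DataTable table, string[] requiredDates)
    {
        return table.AsEnumerable()
            .GroupBy(row => row.Field<int>("ID"))
            .Where(g =>
            {
                var datesForId = new HashSet<string>(g.Select(row => row.Field<string>("date")));
                return requiredDates.All(date => datesForId.Contains(date));
            })
            .Select(g => g.Key)
            .ToList();
    }
}
```

Because the dates are compared as strings, "9/9/2012" and "09/09/2012" would not match; parsing both sides to DateTime avoids that. An equivalent condition is `!requiredDates.Except(datesForId).Any()`.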

LINQ GroupBy confusion

I have
var result = (from rev in Revisions
join usr in Users on rev.UserID equals usr.ID
join clc in ChangedLinesCounts on rev.Revision equals clc.Revision
select new {rev.Revision,
rev.Date, usr.UserName, usr.ID, clc.LinesCount}).Take(6);
I join a couple of tables (which keys are used is not relevant to this question); at the end of this query my result "table" contains:
{Revision, Date, UserName, ID, LinesCount}
Now I execute a GroupBy to calculate the total line count per user:
from row in result group row by row.ID into g // {1}
select new {
g.Key,
totalCount = g.Sum(count=>count.LinesCount)
};
So I get Key = ID and totalCount = Sum, but...
Confusion
I would also like to have the other fields in the final result.
In my understanding, the "table" after the {1} grouping query consists of
{Revision, Date, UserName, ID, LinesCount, TotalCount}
If my assumption is correct, why can't I do something like this:
from row in result group row by row.ID into g // {1}
select new {
g.Key,
g.Revision, // Revision doesn't exist! Why??
totalCount = g.Sum(count=>count.LinesCount)
};
but
from row in result group row by row.ID into g // {1}
select new {
g.Key,
Revision = g.Select(x=>x.Revision), // Works!
totalCount = g.Sum(count=>count.LinesCount)
};
It works, but IMO it sucks, because I execute another Select.
In fact, looking at the SQL output in LINQPad, I get two SQL queries.
Question
Is there an elegant and optimal way to do this, or do I always need to run a Select on the grouped data in order to access the fields that exist?

The problem is that you only group by ID; if you did that in SQL, you couldn't access the other fields either.
To get the other fields as well, you have to include them in your group clause:
from row in result group row by new { row.ID, row.Revision } into g
select new {
g.Key.ID,
g.Key.Revision,
totalCount = g.Sum(count=>count.LinesCount)
};
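A LINQ to Objects sketch of grouping on a composite key, using a hypothetical RevisionRow record in place of the joined query result; the sample data is invented for illustration. Because both ID and Revision are part of the key, both are available directly on g.Key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CompositeKeyGrouping
{
    // Hypothetical stand-in for one row of the joined result.
    public sealed record RevisionRow(int Revision, string UserName, int ID, int LinesCount);

    public static List<RevisionRow> Sample() => new List<RevisionRow>
    {
        new RevisionRow(3587, "Bob", 1, 34),
        new RevisionRow(3587, "Bob", 1, 10),
        new RevisionRow(3588, "Bob", 1, 64),
        new RevisionRow(3589, "Jim", 2, 37),
    };

    // One result per (ID, Revision) pair, with LinesCount summed inside each pair.
    public static List<(int ID, int Revision, int TotalCount)> Totals(IEnumerable<RevisionRow> rows)
    {
        return (from row in rows
                group row by new { row.ID, row.Revision } into g
                select (g.Key.ID, g.Key.Revision, TotalCount: g.Sum(x => x.LinesCount)))
               .ToList();
    }
}
```

Note that the totals are now per ID and Revision, not per ID alone.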

The problem here is that your output logically looks something like this:
Key = 1
    Id = 1, Revision = 3587, UserName = Bob, LinesCount = 34, TotalCount = 45
    Id = 1, Revision = 3588, UserName = Joe, LinesCount = 64, TotalCount = 54
    Id = 1, Revision = 3589, UserName = Jim, LinesCount = 37, TotalCount = 26
Key = 2
    Id = 2, Revision = 3587, UserName = Bob, LinesCount = 34, TotalCount = 45
    Id = 2, Revision = 3588, UserName = Joe, LinesCount = 64, TotalCount = 54
    Id = 2, Revision = 3589, UserName = Jim, LinesCount = 37, TotalCount = 26
Much as with an SQL GROUP BY, a value is either part of the key, and thus unique per group, or part of the details, and thus repeated multiple times and possibly different for each row.
Logically, it might be that Revision and UserName are unique for each Id, but LINQ has no way to know that (just as SQL has no way to know it).
To solve this, you need to somehow specify which revision you want. For instance:
Revision = g.Select(x => x.Revision).FirstOrDefault()
To avoid the multiple-query problem, you would need an aggregate function that can be translated into SQL, because most SQL dialects do not have a FIRST operator (the result set is considered unordered, so technically no item is "first"). For example:
Revision = g.Min(x => x.Revision)
Revision = g.Max(x => x.Revision)
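Min and Max are not limited to numbers in LINQ to Objects (the generic Min<TSource, TResult> and Max<TSource, TResult> overloads also accept string selectors), but whether a given LINQ provider translates them to SQL for a particular column type depends on the provider.
If it does not, you can produce an intermediate result set for the Id and totals, then join it back to the original set to get the details, e.g.: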
from d in items
join t in (
    from i in items
    group i by i.Id into g
    select new { Id = g.Key, Total = g.Sum(x => x.LinesCount) }
) on d.Id equals t.Id
select new { Id = d.Id, Revision = d.Revision, Total = t.Total }
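An in-memory sketch of the join-back approach, using a hypothetical RevisionRow record: per-ID totals are computed in one query and joined back to the detail rows. This runs against LINQ to Objects, so it says nothing about how many SQL statements a particular provider would emit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinBackTotals
{
    // Hypothetical row type standing in for the joined query result.
    public sealed record RevisionRow(int Revision, string UserName, int ID, int LinesCount);

    // Computes per-ID totals first, then joins them back so that every
    // detail row carries its ID's total alongside its own Revision.
    public static List<(int ID, int Revision, int Total)> WithTotals(IEnumerable<RevisionRow> items)
    {
        var list = items.ToList(); // enumerate the source only once

        var totals = from i in list
                     group i by i.ID into g
                     select new { Id = g.Key, Total = g.Sum(x => x.LinesCount) };

        return (from d in list
                join t in totals on d.ID equals t.Id
                select (d.ID, d.Revision, t.Total))
               .ToList();
    }
}
```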

Revision doesn't exist in your second example because it's not a member of IGrouping<TKey, TElement>. An IGrouping<TKey, TElement> has a Key property, and it is also an IEnumerable<TElement> of all the rows grouped together. Each of those rows has a Revision, but the grouping itself does not.
If the Revision will be the same for all rows with the same ID, you could use FirstOrDefault() so that the Select yields at most one value:
from row in result group row by row.ID into g
select new {
g.Key,
Revision = g.Select(x=>x.Revision).FirstOrDefault(),
totalCount = g.Sum(count=>count.LinesCount)
};
If the Revision is not unique per ID, though, you'd want to use an anonymous type for the grouping, as @Tobias suggests; then you will get a grouping based on both ID and Revision.
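A sketch showing that g is an IGrouping<int, RevisionRow> (RevisionRow is a hypothetical record): the ID comes from g.Key, while per-row fields are reached through the group's elements. UserName is used for FirstOrDefault because, unlike Revision, it is the same for every row of a given user ID; Max is the aggregate alternative mentioned in an earlier answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstPerGroup
{
    // Hypothetical row type standing in for the joined query result.
    public sealed record RevisionRow(int Revision, string UserName, int ID, int LinesCount);

    // g has a Key (the ID) and is itself a sequence of the rows in that group,
    // so per-row fields are read through the elements, not through g itself.
    public static List<(int ID, string UserName, int LatestRevision, int TotalCount)> Summaries(
        IEnumerable<RevisionRow> rows)
    {
        return (from row in rows
                group row by row.ID into g
                select (ID: g.Key,
                        UserName: g.Select(x => x.UserName).FirstOrDefault(),
                        LatestRevision: g.Max(x => x.Revision),
                        TotalCount: g.Sum(x => x.LinesCount)))
               .ToList();
    }
}
```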

LINQ TO DataSet: Multiple group by on a data table

I am using LINQ to DataSet to query a DataTable. To group by "Column1" on the DataTable, I use the following query:
var groupQuery = from table in MyTable.AsEnumerable()
                 group table by table["Column1"] into groupedTable
                 select new
                 {
                     x = groupedTable.Key,
                     y = groupedTable.Count()
                 };
Now I want to group by two columns, "Column1" and "Column2". Can anybody tell me the syntax, or point me to a link explaining how to group by multiple columns on a DataTable?
Thanks

You should create an anonymous type to group by multiple columns:
var groupQuery = from table in MyTable.AsEnumerable()
                 group table by new { column1 = table["Column1"], column2 = table["Column2"] }
                 into groupedTable
                 select new
                 {
                     x = groupedTable.Key, // Each Key contains column1 and column2
                     y = groupedTable.Count()
                 };
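A runnable sketch of the anonymous-type key against a DataTable, assuming both columns hold strings. Using Field<string> instead of table["Column1"] gives the key properties a static type, so g.Key.Column1 and g.Key.Column2 can be read without casts; anonymous types compare equal when all their properties are equal, which is what makes them usable as a composite key.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class MultiColumnGroupBy
{
    // Counts rows per (Column1, Column2) pair; both columns are assumed to be strings.
    public static List<(string Column1, string Column2, int Count)> CountByTwoColumns(DataTable table)
    {
        return (from row in table.AsEnumerable()
                group row by new
                {
                    Column1 = row.Field<string>("Column1"),
                    Column2 = row.Field<string>("Column2")
                } into g
                select (g.Key.Column1, g.Key.Column2, g.Count()))
               .ToList();
    }
}
```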
