How do I select a single grouped column from a dataview? - c#

I have a data table tblWorkList with multiple columns: RecordNr, GroupNum, Section, SubscriberID, and quite a few others.
What I need to do is create a dataview or second datatable that is the equivalent of:
SELECT SubscriberID FROM tblWorkList GROUP BY SubscriberID;
I'm doing it in the application because I need this to end up in a dataview that will then be filtered based on multiple user inputs. I have that part working. I've spent several hours now beating my head against the internet trying to figure out how to do this, but I keep running up against errors in solutions that LOOK like they should work but end up failing spectacularly. Although, that said, I'm VERY inexperienced with LINQ right now, so I'm sure I'm missing something pretty straightforward.
(The basic functionality is this: The table contains a list of records to be processed. Basically, I need to take the table full of records, pull the subscriber IDs into a dataview, allow the user to filter that dataview down by a variety of methods (and providing the user a running count of the number of SubscriberID's matching the selected criteria), and when they're done, assign all of the records associated with the resulting SubscriberID collection to a specific analyst to be processed.)
All of the methods I've attempted to use to create the list or dataview of SubscriberID values are enclosed in this:
using (DataTable dt = dsWorkData.Tables["tblWorkData"])
The table tblWorkData contains approximately 23,000 records.
Here are several of my attempts.
Attempt 1 - Error is
'Parameter may not be null. Parameter: source'
var result1 = from row in dt.AsEnumerable()
group row by row.Field<string>("SubscriberID") into grp
select new { SubscriberID = grp.Key };
ShowMessage(result1.Count().ToString());
Attempt 2 - Error is
'Cannot implicitly convert anonymous type: string SubscriberID to DataRow'
EnumerableRowCollection<DataRow> query =
from row in dt.AsEnumerable()
group row by row.Field<string>("SubscriberID") into grp
select new { SubscriberID = grp.Key };
Attempt 3 - Error is
'The [third] name 'row' does not exist in the current context.'
EnumerableRowCollection<DataRow> query2 =
from row in dt.AsEnumerable()
group row by row.Field<string>("SubscriberID") into grp
select row;
Attempt 4 - same error as Attempt 1:
DataTable newDt = dt.AsEnumerable()
.GroupBy(r => new { SubscriberID = r["SubscriberID"] })
.Select(g => g.OrderBy(r => r["SubscriberID"]).First())
.CopyToDataTable();
MessageBox.Show(newDt.Rows.Count.ToString());
Attempt 5 - same error as Attempt 1:
var result = dt.AsEnumerable().GroupBy(row => row.Field<string>("SubscriberID"));
MessageBox.Show(result.Count().ToString());
Attempt 6 - same error as Attempt 1:
var results = dt.AsEnumerable().GroupBy(g => g["SubscriberID"])
.Select(x => x.First());
MessageBox.Show(results.Count().ToString());
So can someone explain what I'm doing wrong here, or at least point me in the right direction? I don't really care WHICH approach gets used, for the record, as long as there's a way to do this.

Answer was a pair of comments from NetMage:
Your SQL query is really using GROUP BY to do DISTINCT, so just use the LINQ Distinct: dt.AsEnumerable().Select(r => r.Field<string>("SubscriberID") ).Distinct().
PS Your first error implies that dt is null - source is the parameter name to AsEnumerable.
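A minimal sketch of both comments, assuming a small stand-in table (the sample rows and the `ds`, `missing`, and `distinctView` names are made up for illustration). It also shows `DataView.ToTable(true, ...)` as one possible way to land the distinct IDs in a filterable DataView, and how a table lookup by a name that was never added returns null, which is the situation the PS describes.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical stand-in for tblWorkList with a few sample rows.
var dt = new DataTable("tblWorkList");
dt.Columns.Add("RecordNr", typeof(int));
dt.Columns.Add("SubscriberID", typeof(string));
dt.Rows.Add(1, "A100");
dt.Rows.Add(2, "A100");
dt.Rows.Add(3, "B200");

var ds = new DataSet();
ds.Tables.Add(dt);

// Looking a table up by a name that does not exist yields null,
// and calling AsEnumerable() on it would throw for the 'source' parameter.
var missing = ds.Tables["tblWorkData"];
Console.WriteLine(missing == null); // True

// Equivalent of SELECT SubscriberID ... GROUP BY SubscriberID.
var subscriberIds = dt.AsEnumerable()
    .Select(r => r.Field<string>("SubscriberID"))
    .Distinct()
    .ToList();
Console.WriteLine(subscriberIds.Count); // 2

// Distinct IDs as a DataView that can then be filtered.
DataView distinctView = dt.DefaultView.ToTable(true, "SubscriberID").DefaultView;
distinctView.RowFilter = "SubscriberID LIKE 'A%'";
Console.WriteLine(distinctView.Count); // 1
```

`distinctView.Count` can serve as the running count while the user narrows the filter.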

Related

How to group by records from database based on date (day) c#?

I know this could be a possible duplicate question, pardon me if it is.
Is there a way to GroupBy all the records from the database by date?
So:
say i have multiple records for this date 22/05/2022
and say i have multiple records from this date: 23/05/2022
Can i group all the records based on date parameter 22/05 and 23/05?
So that i would end up with a list containing n list for each day.
Here is what i did:
var grpQuery = await ctx.Registration.GroupBy(c => c.DateReference.Day).ToListAsync();
Where:
Registration is my table from where i am pulling the data
DateReference is a Date object containing the date
But i am getting this error "the linq expression could not be translated".
Can someone give me some advice on this?
EDIT
I tried this but it does not seem to load any data; even with a breakpoint set, nothing is returned:
var grpQuery = await query.GroupBy(d => new { DateReference = d.DateReference.Date }).Select(c => new RegistrationViewModel()
{
RegistrationId = c.FirstOrDefault().RegistrationId,
PeopleId = c.FirstOrDefault().PeopleId,
DateReference = c.Key.DateReference,
DateChange = c.FirstOrDefault().DateChange,
UserRef = c.FirstOrDefault().UserRef,
CommissionId = c.FirstOrDefault().CommissionId,
ActivityId = c.FirstOrDefault().ActivityId,
MinuteWorked = c.FirstOrDefault().MinuteWorked,
}).OrderBy(d => d.DateReference).ToListAsync();
Where:
RegistrationViewModel contains all those properties including DateReference
If i call the method through the API, the request is stuck at "pending".
First, don't. Even if the query was fixed, the equivalent query in the database would be GROUP BY DATEPART(day,registration.Date) which can't use indexes and therefore is slow.
According to the docs the equivalent of DATEPART(day, @dateTime) is dateTime.Day. The query still needs to have a proper Select though.
A correct query would be :
var counts = ctx.Registrations.GroupBy(r => r.RegistrationDate.Day)
.Select(g => new { Day = g.Key, Count = g.Count() })
.ToList();
The equivalent, slow query would be
SELECT DATEPART(day,registration.Date) as Day,count(*)
FROM Registrations
GROUP BY DATEPART(day,registration.Date)
Things get worse if we have to eg filter by date. The query would have to scan the entire table because it wouldn't be able to use any indexes covering the Date column
SELECT DATEPART(day,registration.Date) as Day,count(*)
FROM Registrations
WHERE Date >'20220901'
GROUP BY DATEPART(day,registration.Date)
Imagine having to scan 10 years of registrations only to get the current month.
This is a reporting query. For date related reports, using a prepopulated Calendar table can make the query infinitely easier.
SELECT Calendar.Day,COUNT(*)
FROM Registrations r
INNER JOIN Calendar on r.RegistrationDate=Calendar.Date
GROUP BY Calendar.Day
or
SELECT Calendar.Year, Calendar.Semester, Calendar.Day,COUNT(*)
FROM Registrations r
INNER JOIN Calendar on r.RegistrationDate=Calendar.Date
WHERE Calendar.Year = @someYear
GROUP BY Calendar.Year, Calendar.Semester,Calendar.Day
A Calendar table or Date dimension is a table with prepopulated dates, years, months, semesters or any other reporting period along with their names or anything needed to make reporting easier. Such a table can contain eg 10 or 20 years of data without taking a lot of space. To speed up queries, the columns can be aggressively indexed without taking too much extra space.
Doing the same in EF Core requires mapping Calendar as an entity and performing the JOIN in LINQ. This is one of the cases where it makes no sense to add a relation between entities :
var query = from registration in ctx.Registrations
join date in ctx.Calendar
on registration.Date equals date.Date
group registration by date.Day into g
select new { Day = g.Key, Count = g.Count() };
var counts = query.ToList();
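The same join shape can be tried out in memory. This is only a sketch with LINQ to Objects, not EF Core: the `calendar` rows, the semester rule, and the sample registrations are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical calendar rows: one entry per date in 2022.
// The semester rule (months 1-6 vs 7-12) is an assumption.
var calendar = Enumerable.Range(0, 365)
    .Select(i => new DateTime(2022, 1, 1).AddDays(i))
    .Select(d => new { Date = d, d.Year, Semester = d.Month <= 6 ? 1 : 2, d.Day })
    .ToList();

// Hypothetical registrations, dates without a time part.
var registrations = new[]
{
    new { RegistrationId = 1, Date = new DateTime(2022, 5, 22) },
    new { RegistrationId = 2, Date = new DateTime(2022, 5, 22) },
    new { RegistrationId = 3, Date = new DateTime(2022, 5, 23) },
};

// Join each registration to its calendar row, then count per calendar day.
var counts = (from registration in registrations
              join date in calendar on registration.Date equals date.Date
              group registration by date.Day into g
              orderby g.Key
              select new { Day = g.Key, Count = g.Count() }).ToList();

foreach (var c in counts)
    Console.WriteLine($"{c.Day}: {c.Count}"); // 22: 2, then 23: 1
```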
If you are using EF Core Please try this:
var grpQuery = await ctx.Registration.Select(a=>new {Re = a, G = (EF.Functions.DateDiffDay(a.End,DateTime.Today))}).ToListAsync().ContinueWith(d=>d.Result.GroupBy(a=>a.G));
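For the "list containing n lists, one per day" shape the question asks for, a client-side grouping might look like the sketch below. It assumes the rows were already loaded (ideally after filtering to a date range in the database); the tuple shape and the sample values are hypothetical. It groups on `DateReference.Date` rather than `.Day`, because `.Day` alone would merge, say, 22 May with 22 June.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the materialized query result.
var rows = new List<(int RegistrationId, DateTime DateReference)>
{
    (1, new DateTime(2022, 5, 22, 9, 0, 0)),
    (2, new DateTime(2022, 5, 22, 17, 30, 0)),
    (3, new DateTime(2022, 5, 23, 8, 15, 0)),
};

// One inner list per calendar date, ordered by date.
List<List<(int RegistrationId, DateTime DateReference)>> perDay = rows
    .GroupBy(r => r.DateReference.Date)
    .OrderBy(g => g.Key)
    .Select(g => g.ToList())
    .ToList();

foreach (var day in perDay)
    Console.WriteLine($"{day[0].DateReference:yyyy-MM-dd}: {day.Count} record(s)");
```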

Why is linq reversing order in group by

I have a linq query which seems to be reversing one column of several in some rows of an earlier query:
var dataSet = from fb in ds.Feedback_Answers
where fb.Feedback_Questions.Feedback_Questionnaires.QuestionnaireID == criteriaType
&& fb.UpdatedDate >= dateFeedbackFrom && fb.UpdatedDate <= dateFeedbackTo
select new
{
fb.Feedback_Questions.Feedback_Questionnaires.QuestionnaireID,
fb.QuestionID,
fb.Feedback_Questions.Text,
fb.Answer,
fb.UpdatedBy
};
Gets the first dataset and is confirmed working.
This is then grouped like this:
var groupedSet = from row in dataSet
group row by row.UpdatedBy
into grp
select new
{
Survey = grp.Key,
QuestionID = grp.Select(i => i.QuestionID),
Question = grp.Select(q => q.Text),
Answer = grp.Select(a => a.Answer)
};
While grouping, the resulting returnset (of type: string, list int, list string, list int) sometimes, but not always, turns the question order back to front, without inverting answer or questionID, which throws it off.
i.e. if the set is questionID 1,2,3 and question A,B,C it sometimes returns 1,2,3 and C,B,A
Can anyone advise why it may be doing this? Why only on the one column? Thanks!
edit: Got it thanks all! In case it helps anyone in future, here is the solution used:
var groupedSet = from row in dataSet
group row by row.UpdatedBy
into grp
select new
{
Survey = grp.Key,
QuestionID = grp.OrderBy(x=>x.QuestionID).Select(i => i.QuestionID),
Question = grp.OrderBy(x=>x.QuestionID).Select(q => q.Text),
Answer = grp.OrderBy(x=>x.QuestionID).Select(a => a.Answer)
};
Reversal of a grouped order is a coincidence: IQueryable<T>'s GroupBy returns groups in no particular order. Unlike in-memory GroupBy, which specifies the order of its groups, queries performed in RDBMS depend on implementation:
The query behavior that occurs as a result of executing an expression tree that represents calling GroupBy<TSource,TKey,TElement>(IQueryable<TSource>, Expression<Func<TSource,TKey>>, Expression<Func<TSource,TElement>>) depends on the implementation of the type of the source parameter.
If you would like to have your rows in a specific order, you need to add OrderBy to your query to force it.
How do I do that while maintaining the relative list order, rather than applying an order to the resulting set?
One approach is to apply grouping to your data after bringing it into memory. Call ToList() on dataSet to bring the data into memory. After that, the order of the subsequent GroupBy query will be consistent with dataSet. A drawback is that the grouping is no longer done in the RDBMS.
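A small sketch of the in-memory approach, using invented sample rows in place of the materialized dataSet. With LINQ to Objects, groups come out in order of first appearance of each key, and elements within a group keep their source order, so the three projected lists stay aligned.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for dataSet.ToList().
var dataSet = new[]
{
    new { UpdatedBy = "bob", QuestionID = 1, Text = "A", Answer = 5 },
    new { UpdatedBy = "amy", QuestionID = 1, Text = "A", Answer = 3 },
    new { UpdatedBy = "bob", QuestionID = 2, Text = "B", Answer = 4 },
    new { UpdatedBy = "amy", QuestionID = 2, Text = "B", Answer = 2 },
    new { UpdatedBy = "bob", QuestionID = 3, Text = "C", Answer = 1 },
}.ToList();

// In-memory GroupBy: element order inside each group follows the source.
var groupedSet = dataSet
    .GroupBy(row => row.UpdatedBy)
    .Select(grp => new
    {
        Survey = grp.Key,
        QuestionID = grp.Select(i => i.QuestionID).ToList(),
        Question = grp.Select(q => q.Text).ToList(),
        Answer = grp.Select(a => a.Answer).ToList()
    })
    .ToList();

foreach (var g in groupedSet)
    Console.WriteLine($"{g.Survey}: {string.Join(",", g.QuestionID)} / {string.Join(",", g.Question)}");
// bob: 1,2,3 / A,B,C
// amy: 1,2 / A,B
```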

Linq Distinct not bringing back the correct results

I'm trying to select distinct values from a DataTable using Linq. The DataTable gets populated from an Excel sheet whose columns are dynamic, except that every sheet has a mandatory column named SERIAL NUMBER.
I have a DataTable for demo purpose which consist of 4 serial number as:
12345
12345
98765
98765
When I do
var distinctList = dt.AsEnumerable().Select(a => a).Distinct().ToList();
I get all four records instead of 2.
If I do
var distinctList = dt.AsEnumerable().Select(a => a.Field<string>("SERIAL NUMBER")).Distinct().ToList();
then I get the correct results, but it only contains the one column from dt and not all the other columns. Can someone tell me where I'm going wrong please?
The problem is that Distinct method by default uses the default equality comparer, which for DataRow is comparing by reference. To get the desired result, you can use the Distinct overload that allows you to pass IEqualityComparer<T>, and pass DataRowComparer.Default:
The DataRowComparer<TRow> class is used to compare the values of the DataRow objects and does not compare the object references.
var distinctList = dt.AsEnumerable().Distinct(DataRowComparer.Default).ToList();
For more info, see Comparing DataRows (LINQ to DataSet).
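To see the difference between the comparers, here is a sketch on a made-up table (the `Model` column and its values are assumptions). Note that `DataRowComparer.Default` compares every column, so two rows sharing a serial number but differing elsewhere still count as distinct; grouping on the serial number is shown alongside for contrast.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical demo table: the Model column is invented.
var dt = new DataTable();
dt.Columns.Add("SERIAL NUMBER", typeof(string));
dt.Columns.Add("Model", typeof(string));
dt.Rows.Add("12345", "X");
dt.Rows.Add("12345", "X");
dt.Rows.Add("98765", "Y");
dt.Rows.Add("98765", "Z");

// Default equality on DataRow is by reference: nothing is removed.
int byReference = dt.AsEnumerable().Distinct().Count();

// Value comparison across all columns: only the exact duplicate goes.
int byValue = dt.AsEnumerable().Distinct(DataRowComparer.Default).Count();

// One row per serial number, regardless of the other columns.
int bySerial = dt.AsEnumerable()
    .GroupBy(r => r.Field<string>("SERIAL NUMBER"))
    .Select(g => g.First())
    .Count();

Console.WriteLine($"{byReference} {byValue} {bySerial}"); // 4 3 2
```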
So, you want to group them by Serial Number and retrieve the full DataRow? Assuming that after grouping them we want to retrieve the first item:
var distinctList = dt.AsEnumerable().GroupBy(a => a.Field<string>("SERIAL NUMBER"))
.Select(a => a.FirstOrDefault()).Distinct().ToList();
EDIT: As requested
var distinctValues = dt.AsEnumerable().Select(a => a.Field<string>("SERIAL NUMBER")).Distinct().ToList();
var duplicateValues = dt.AsEnumerable().GroupBy(a => a.Field<string>("SERIAL NUMBER")).SelectMany(a => a.Skip(1)).Distinct().ToList();
var duplicatesRemoved = dt.AsEnumerable().Except(duplicateValues);
In ToTable method the first parameter specifies if you want Distinct records, the second specify by which column name we will make distinct.
DataTable returnVals = dt.DefaultView.ToTable(true, "ColumnNameOnWhichYouWantDistinctRecords");
There is no need to use LINQ for this task!
Using Linq a GroupBy would be better suited, by the sounds of it.
var groups = dt.AsEnumerable().GroupBy(a => a.Field<string>("SERIAL NUMBER")).Select(g => new { Key = g.Key, Items = g });
This will then contain groupings based on the Serial Number. With each group of items having the same serial number, but other property values different.
Try this:
List<string> distinctValues = (from row in dt.AsEnumerable() select row.Field<string>("SERIAL NUMBER")).Distinct().ToList();
However to me this also works:
List<string> distinctValues = dt.AsEnumerable().Select(row => row.Field<string>("SERIAL NUMBER")).Distinct().ToList();

c# - Copy only selected data to new datatable with linq

I've searched the web for quite some time now and can't seem to find an elegant way to
read data from one datatable,
group it by two variables with linq
select only those two variables (forget about the others in the source datatable) and
copy these items to a new datatable.
I got it working without selecting specific variables, but at the amount of data the program is going to process later I'd rather only copy what's really needed.
var temp123 = from row in oldDataTable.AsEnumerable()
orderby row["Column1"] ascending
group row by new { Column1 = row["Column1"], Column2 = row["Column2"] } into grp
select grp.First();
newDataTable = temp123.CopyToDataTable();
Can anyone please be so kind to help me out here? Thanks!
You can use custom implementation of CopyToDataTable method from this article How to: Implement CopyToDataTable Where the Generic Type T Is Not a DataRow
newDataTable =
oldDataTable
.AsEnumerable()
.GroupBy(r => new { Column1 = r["Column1"], Column2 = r["Column2"] })
.Select(g => g.Key) // keep only the two grouping columns
.OrderBy(x => x.Column1)
.CopyToDataTable(); // your custom extension
Another option, as Tim suggested - manual creation of DataTable.
var newDataTable = new DataTable();
newDataTable.Columns.Add("Column1");
newDataTable.Columns.Add("Column2");
foreach(var item in temp123) // temp123 yields DataRow objects
newDataTable.Rows.Add(item["Column1"], item["Column2"]);
And last option (if possible) - don't use DataTable - simply use collection of strongly typed objects.
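The manual-creation option can be sketched end to end as follows. The sample rows, the `Other` column, and the column types are assumptions; the point is that only the grouping key is projected, so the unneeded columns never reach the new table.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical source table with one extra column to be dropped.
var oldDataTable = new DataTable();
oldDataTable.Columns.Add("Column1", typeof(string));
oldDataTable.Columns.Add("Column2", typeof(int));
oldDataTable.Columns.Add("Other", typeof(string));
oldDataTable.Rows.Add("b", 1, "x");
oldDataTable.Rows.Add("a", 2, "y");
oldDataTable.Rows.Add("b", 1, "z");

// Group by the two columns and keep only the key of each group.
var keys = oldDataTable.AsEnumerable()
    .GroupBy(r => new
    {
        Column1 = r.Field<string>("Column1"),
        Column2 = r.Field<int>("Column2")
    })
    .Select(g => g.Key)
    .OrderBy(k => k.Column1);

// Build the target table with just the two needed columns.
var newDataTable = new DataTable();
newDataTable.Columns.Add("Column1", typeof(string));
newDataTable.Columns.Add("Column2", typeof(int));
foreach (var k in keys)
    newDataTable.Rows.Add(k.Column1, k.Column2);

Console.WriteLine(newDataTable.Rows.Count); // 2
```

Typed `Field<T>` accessors are used here instead of `row["Column1"]` so the key compares and sorts as a string and an int rather than as boxed objects.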

Get an array of IDs(values) from a datatable

I have a DataTable with 50 rows that has an ID column. I am trying to get an array that holds only the IDs, like:
string [] IDs = (from row in DataTable.Rows
select row["ID"].ToString()).ToArray();
Is there a way to do this? I always get the error "Could not find an implementation of the query...."
Use the DataTableExtensions.AsEnumerable method by adding a reference to System.Data.DataSetExtensions and the directives using System.Data; and using System.Linq;. Then you should be able to use the following query:
var query = from row in datatable.AsEnumerable()
select row["ID"].ToString();
string[] ids = query.ToArray();
If you really need an array you can use the last line above or enclose the query in parentheses and call ToArray() as you did originally. I'm generally not a fan of the latter approach.
In fluent syntax it would be:
string[] ids = datatable.AsEnumerable()
.Select(row => row["ID"].ToString())
.ToArray();
Is there any way to select the data table rows into a custom object array, assuming all the columns are going to be the same?
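One possible answer to that follow-up, as a sketch: project each row with `Select` and the typed `Field<T>` accessor. The `Name` column and the sample rows are invented, and an anonymous type stands in for whatever custom class would be used in practice.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical table: the Name column is an assumption.
var datatable = new DataTable();
datatable.Columns.Add("ID", typeof(int));
datatable.Columns.Add("Name", typeof(string));
datatable.Rows.Add(1, "Ann");
datatable.Rows.Add(2, "Ben");

// Array of IDs as strings, as in the answer above.
string[] ids = datatable.AsEnumerable()
    .Select(row => row.Field<int>("ID").ToString())
    .ToArray();

// Array of objects, one per row; swap the anonymous type for a real class.
var customers = datatable.AsEnumerable()
    .Select(row => new
    {
        Id = row.Field<int>("ID"),
        Name = row.Field<string>("Name")
    })
    .ToArray();

Console.WriteLine(string.Join(",", ids));  // 1,2
Console.WriteLine(customers[1].Name);      // Ben
```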
