Why is linq reversing order in group by - c#

I have a LINQ query that seems to reverse one of several columns in some of the rows produced by an earlier query:
var dataSet = from fb in ds.Feedback_Answers
              where fb.Feedback_Questions.Feedback_Questionnaires.QuestionnaireID == criteriaType
                    && fb.UpdatedDate >= dateFeedbackFrom && fb.UpdatedDate <= dateFeedbackTo
              select new
              {
                  fb.Feedback_Questions.Feedback_Questionnaires.QuestionnaireID,
                  fb.QuestionID,
                  fb.Feedback_Questions.Text,
                  fb.Answer,
                  fb.UpdatedBy
              };
This gets the first dataset and is confirmed to be working.
It is then grouped like this:
var groupedSet = from row in dataSet
                 group row by row.UpdatedBy
                 into grp
                 select new
                 {
                     Survey = grp.Key,
                     QuestionID = grp.Select(i => i.QuestionID),
                     Question = grp.Select(q => q.Text),
                     Answer = grp.Select(a => a.Answer)
                 };
After grouping, the result set (of type string, list of int, list of string, list of int) sometimes, but not always, comes back with the question order reversed, while the answers and question IDs are not inverted, which throws the lists out of alignment.
For example, if the set is questionID 1,2,3 and question A,B,C, it sometimes returns 1,2,3 and C,B,A.
Can anyone advise why it may be doing this? Why only on the one column? Thanks!
Edit: Got it, thanks all! In case it helps anyone in the future, here is the solution I used:
var groupedSet = from row in dataSet
                 group row by row.UpdatedBy
                 into grp
                 select new
                 {
                     Survey = grp.Key,
                     QuestionID = grp.OrderBy(x => x.QuestionID).Select(i => i.QuestionID),
                     Question = grp.OrderBy(x => x.QuestionID).Select(q => q.Text),
                     Answer = grp.OrderBy(x => x.QuestionID).Select(a => a.Answer)
                 };

The reversal is a coincidence: GroupBy on an IQueryable<T> guarantees no particular order. Unlike the in-memory GroupBy, whose documentation specifies the order of its groups and of the elements within them, a query executed by the RDBMS depends on the implementation. From the documentation:
"The query behavior that occurs as a result of executing an expression tree that represents calling GroupBy<TSource,TKey,TElement>(IQueryable<TSource>, Expression<Func<TSource,TKey>>, Expression<Func<TSource,TElement>>) depends on the implementation of the type of the source parameter."
If you would like your rows in a specific order, you need to add OrderBy to your query to force it. Each grp.Select(...) is a separate projection, so without an explicit ordering nothing guarantees that the three lists come back in the same order.
How do I do it and maintain the relative list order, rather than applying an order to the resulting set?
One approach is to group your data after bringing it into memory. Call ToList() on dataSet to bring the data into memory. After that, the order produced by the subsequent GroupBy query will be consistent with dataSet. A drawback is that the grouping is no longer done in the RDBMS.
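The in-memory approach can be sketched as follows. This is a minimal illustration rather than the asker's actual code: AnswerRow and SurveyGroup are hypothetical stand-ins for the anonymous types in the question, and the rows are assumed to arrive already in the order that should be preserved. LINQ to Objects documents that GroupBy yields groups in the order in which each key first appears and keeps the elements of a group in source order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the anonymous row type selected into dataSet.
public sealed class AnswerRow
{
    public string UpdatedBy { get; set; }
    public int QuestionID { get; set; }
    public string Text { get; set; }
    public int Answer { get; set; }
}

// Hypothetical stand-in for the anonymous type produced by the grouping.
public sealed class SurveyGroup
{
    public string Survey { get; set; }
    public List<int> QuestionIDs { get; set; }
    public List<string> Questions { get; set; }
    public List<int> Answers { get; set; }
}

public static class InMemoryGrouping
{
    public static List<SurveyGroup> GroupByRespondent(IEnumerable<AnswerRow> dataSet)
    {
        // ToList() materializes the rows; everything after it is LINQ to Objects.
        var rows = dataSet.ToList();

        return rows
            .GroupBy(r => r.UpdatedBy)
            .Select(g =>
            {
                // Enumerate the group once so the three lists share one order.
                var items = g.ToList();
                return new SurveyGroup
                {
                    Survey = g.Key,
                    QuestionIDs = items.Select(i => i.QuestionID).ToList(),
                    Questions = items.Select(i => i.Text).ToList(),
                    Answers = items.Select(i => i.Answer).ToList()
                };
            })
            .ToList();
    }
}
```

Note that ToList() only preserves whatever order the database returned, so if a particular order matters, an OrderBy on the database query before ToList() is still needed.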

Related

How do I select a single grouped column from a dataview?

I have a data table tblWorkList with multiple columns: RecordNr, GroupNum, Section, SubscriberID, and quite a few others.
What I need to do is create a dataview or second datatable that is the equivalent of:
SELECT SubscriberID FROM tblWorkList GROUP BY SubscriberID;
I'm doing it in the application because I need this to end up in a dataview that will then be filtered based on multiple user inputs. I have that part working. I've spent several hours now beating my head against the internet trying to figure out how to do this, but I keep running up against errors in solutions that LOOK like they should work but end up failing spectacularly. Although, that said, I'm VERY inexperienced with LINQ right now, so I'm sure I'm missing something pretty straightforward.
(The basic functionality is this: The table contains a list of records to be processed. Basically, I need to take the table full of records, pull the subscriber IDs into a dataview, allow the user to filter that dataview down by a variety of methods (and providing the user a running count of the number of SubscriberID's matching the selected criteria), and when they're done, assign all of the records associated with the resulting SubscriberID collection to a specific analyst to be processed.)
All of the methods I've attempted to use to create the list or dataview of SubscriberID values are enclosed in this:
using (DataTable dt = dsWorkData.Tables["tblWorkData"])
The table tblWorkData contains approximately 23,000 records.
Here are several of my attempts.
Attempt 1 - Error is
'Parameter may not be null. Parameter: source'
var result1 = from row in dt.AsEnumerable()
              group row by row.Field<string>("SubscriberID") into grp
              select new { SubscriberID = grp.Key };
ShowMessage(result1.Count().ToString());
Attempt 2 - Error is
'Cannot implicitly convert anonymous type: string SubscriberID to DataRow'
EnumerableRowCollection<DataRow> query =
    from row in dt.AsEnumerable()
    group row by row.Field<string>("SubscriberID") into grp
    select new { SubscriberID = grp.Key };
Attempt 3 - Error is
'The [third] name 'row' does not exist in the current context.'
EnumerableRowCollection<DataRow> query2 =
    from row in dt.AsEnumerable()
    group row by row.Field<string>("SubscriberID") into grp
    select row;
Attempt 4 - same error as Attempt 1:
DataTable newDt = dt.AsEnumerable()
    .GroupBy(r => new { SubscriberID = r["SubscriberID"] })
    .Select(g => g.OrderBy(r => r["SubscriberID"]).First())
    .CopyToDataTable();
MessageBox.Show(newDt.Rows.Count.ToString());
Attempt 5 - same error as Attempt 1:
var result = dt.AsEnumerable().GroupBy(row => row.Field<string>("SubscriberID"));
MessageBox.Show(result.Count().ToString());
Attempt 6 - same error as Attempt 1:
var results = dt.AsEnumerable().GroupBy(g => g["SubscriberID"])
    .Select(x => x.First());
MessageBox.Show(results.Count().ToString());
So can someone explain what I'm doing wrong here, or at least point me in the right direction? I don't really care WHICH approach gets used, for the record, as long as there's a way to do this.
Answer was a pair of comments from NetMage:
Your SQL query is really using GROUP BY to do DISTINCT, so just use the LINQ Distinct: dt.AsEnumerable().Select(r => r.Field<string>("SubscriberID") ).Distinct().
PS Your first error implies that dt is null - source is the parameter name to AsEnumerable.
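As a rough sketch of the two comments above: the first method applies Distinct to the SubscriberID column, and the null check makes the "source" error explicit. The second method uses DataView.ToTable(true, ...), a built-in way to get a table of distinct column values that can then be wrapped in a DataView for filtering. The class name, method names and sample data are hypothetical; only the column name SubscriberID comes from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class SubscriberIds
{
    // Distinct SubscriberID values via LINQ, following the comment above.
    public static List<string> DistinctViaLinq(DataTable table)
    {
        // A null table is what makes AsEnumerable complain about "source".
        if (table == null) throw new ArgumentNullException(nameof(table));

        return table.AsEnumerable()
                    .Select(r => r.Field<string>("SubscriberID"))
                    .Distinct()
                    .ToList();
    }

    // Alternative without LINQ: ToTable(true, ...) keeps only distinct rows
    // for the listed columns.
    public static DataTable DistinctViaDataView(DataTable table)
    {
        if (table == null) throw new ArgumentNullException(nameof(table));

        return table.DefaultView.ToTable(true, "SubscriberID");
    }

    // Hypothetical sample data, for illustration only.
    public static DataTable BuildSample()
    {
        var table = new DataTable("tblWorkList");
        table.Columns.Add("RecordNr", typeof(int));
        table.Columns.Add("SubscriberID", typeof(string));
        table.Rows.Add(1, "S100");
        table.Rows.Add(2, "S200");
        table.Rows.Add(3, "S100");
        return table;
    }
}
```

If the table lookup by name returns null (for example because the name passed to Tables[...] does not match any table in the DataSet), both methods fail fast with a clearer message than the one raised inside the LINQ pipeline.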

Lambda Distinct not working

I am unable to get a distinct list of 'Order' from my lambda query. Even though I am calling Distinct(), it still returns repeated select list items.
public ActionResult Index()
{
    var query = _dbContext.Orders
        .ToList()
        .Select(x => new SelectListItem
        {
            Text = x.OrderID.ToString(),
            Value = x.ShipCity
        })
        .OrderBy(y => y.Value)
        .Distinct();
    ViewBag.DropDownValues = new SelectList(query, "Text", "Value");
    return View();
}
Any suggestions please?
UPDATE
Sorry guys I genuinely missed out the Distinct() from my code. I have now added it to my code.
I am basically trying to get distinct rows where the values are the same but the IDs differ.
It should behave the same as this SQL query:
SELECT distinct [ShipCity] FROM [northwind].[dbo].[Orders] ORDER by ShipCity
I'm assuming you removed your Distinct from the end of the query.
For that matter, I don't see how you could get duplicate orders at all: your query does nothing except select from a database table, so you can't get the same row more than once.
What do you call a "duplicate"? If you mean two rows with the same values except for their ID, that's not a duplicate at all; that's just two unrelated rows that happen to share values...
If, on the other hand, you expect them to be equal because you call .Distinct() after the Select and only use OrderID and ShipCity there (and I really don't see why a column named OrderID in an orders table would contain duplicates, but that's another issue), that still won't work. You are not selecting OrderID or ShipCity; you are selecting a new SelectListItem. Two instances of a reference type that hold the same values are not equal in .NET by default: they need to be the same instance to be equal, not two instances with the same values.
Edited following your comment:
var query = _dbContext.Orders
    .ToList()
    // Group them by what you want to be "distinct" on
    .GroupBy(item => item.ShipCity)
    // For each of those groups grab the first item (we just faked a distinct)
    .Select(item => item.First())
    .Select(x => new SelectListItem
    {
        Text = x.OrderID.ToString(),
        Value = x.ShipCity
    })
    .OrderBy(y => y.Value)
    .Distinct();
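The reference-equality point made in this answer can be illustrated with a small sketch. ListItem is a hypothetical stand-in for SelectListItem (so the example does not depend on ASP.NET MVC), and the class and method names are assumptions made for illustration. Distinct over freshly created objects removes nothing, while Distinct over the plain city strings before projecting mirrors the SELECT DISTINCT query in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SelectListItem: a plain class with no Equals override.
public sealed class ListItem
{
    public string Text { get; set; }
    public string Value { get; set; }
}

public static class DistinctDemo
{
    // Distinct compares the objects by reference, so nothing is removed here.
    public static int CountAfterObjectDistinct(IEnumerable<string> cities)
    {
        return cities
            .Select(c => new ListItem { Text = c, Value = c })
            .Distinct()
            .Count();
    }

    // Distinct on the string values first, then project to list items.
    public static List<ListItem> DistinctCities(IEnumerable<string> cities)
    {
        return cities
            .Distinct()
            .OrderBy(c => c)
            .Select(c => new ListItem { Text = c, Value = c })
            .ToList();
    }
}
```

Applying Distinct to the values rather than to the projected objects avoids both the GroupBy workaround and the need for a custom comparer, at the cost of no longer carrying an OrderID along with each city.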

Linq query to filter on most recent value / record

I have a 'complex' linq query I would like to improve and to understand.
(from x in tblOrder
 orderby x.OrderNo
 // where x.Filename is most recent filename for this order
 group x by new { x.OrderNo, x.Color } into groupedByColorCode
 select new
 {
     OrderNo = groupedByColorCode.Key.OrderNo,
     ProductRef = groupedByColorCode.FirstOrDefault().ProductRef,
     Color = groupedByColorCode.Key.Color,
     Packing = groupedByColorCode.FirstOrDefault().Packing,
     TotalQuantity = groupedByColorCode.Sum(bcc => bcc.OriQty).ToString()
 })
x is an Order. I would also like to filter by Filename, which is a field of tblOrder: I want to keep only the orders from the most recent file.
What where clause should I add to my LINQ query to filter on that latest file name?
Thank you
First, it is better to put orderby at the end of the query, because sorting is quicker on a smaller set of data.
Second, put where at the top of the query (right after the from line); it shrinks the set before grouping and sorting.
Finally, grouping creates a dictionary-like structure with Key = new { x.OrderNo, x.Color } and Value = IEnumerable, so groupedByColorCode becomes an IEnumerable of { Key, Value } pairs. The group clause should therefore come near the end, just before orderby.
There is also MaxBy() or MinBy() if you need the max or min by some criterion.
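Putting those points together, the query could look roughly like the sketch below. It rests on assumptions the question does not confirm: OrderLine and OrderSummary are hypothetical types, "most recent file" is taken to mean the file name that sorts last by ordinal comparison (for example names with an embedded yyyyMMdd date), and the filter is applied across all orders rather than per order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a row in tblOrder.
public sealed class OrderLine
{
    public string OrderNo { get; set; }
    public string Color { get; set; }
    public string ProductRef { get; set; }
    public string Packing { get; set; }
    public int OriQty { get; set; }
    public string Filename { get; set; }
}

// Hypothetical result type replacing the anonymous type in the question.
public sealed class OrderSummary
{
    public string OrderNo { get; set; }
    public string ProductRef { get; set; }
    public string Color { get; set; }
    public string Packing { get; set; }
    public string TotalQuantity { get; set; }
}

public static class LatestFileQuery
{
    public static List<OrderSummary> Summarize(IEnumerable<OrderLine> tblOrder)
    {
        var lines = tblOrder.ToList();
        if (lines.Count == 0) return new List<OrderSummary>();

        // ASSUMPTION: the most recent file is the one whose name sorts last.
        string latest = lines.Select(l => l.Filename)
                             .OrderBy(f => f, StringComparer.Ordinal)
                             .Last();

        return (from x in lines
                where x.Filename == latest                // filter first
                group x by new { x.OrderNo, x.Color } into g
                orderby g.Key.OrderNo                     // sort the smaller set last
                select new OrderSummary
                {
                    OrderNo = g.Key.OrderNo,
                    ProductRef = g.First().ProductRef,
                    Color = g.Key.Color,
                    Packing = g.First().Packing,
                    TotalQuantity = g.Sum(l => l.OriQty).ToString()
                }).ToList();
    }
}
```

If "most recent" is determined by something other than the file name, such as a timestamp column, only the computation of latest would need to change.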

With LINQ DISTINCT a Data Table Multiple Columns Excluding a Single Column

I have a C# DataTable that I am filling with data. After that, I am trying to remove duplicate entries with Distinct while creating a List<MyObject>.
Here is the code I am working with:
viewModelList = (from item in response.AsEnumerable()
                 select new
                 {
                     description = DataTableOperationHelper.GetStringValue(item, "description"),
                     unitCost = DataTableOperationHelper.GetDecimalValue(item, "unitcost"),
                     defaultChargeable = DataTableOperationHelper.GetBoolValue(item, "defaultChargeable"),
                     contractId = DataTableOperationHelper.GetIntValue(item, "contractID"),
                     consumableid = DataTableOperationHelper.GetIntValue(item, "consumableid")
                 })
                 .Distinct()
                 .Select(x => new ConsumablesViewModel(
                     x.description,
                     x.unitCost,
                     x.defaultChargeable,
                     x.contractId,
                     x.consumableid)
                 )
                 .ToList();
I just want to exclude a single column (consumableid) from the DISTINCT comparison. How can I apply DISTINCT to the rest of the data while ignoring that one value (consumableid)?
Take a look at this answered question (LinQ distinct with custom comparer leaves duplicates).
Basically, you create an equality comparer for your type that allows you to decide what makes an object distinct.
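A comparer along those lines might look like the sketch below. Consumable and IgnoreConsumableIdComparer are hypothetical names; the properties mirror the columns in the question. Equals and GetHashCode both leave ConsumableId out, so two rows that differ only in that value count as the same item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the columns in the question.
public sealed class Consumable
{
    public string Description { get; set; }
    public decimal UnitCost { get; set; }
    public bool DefaultChargeable { get; set; }
    public int ContractId { get; set; }
    public int ConsumableId { get; set; }
}

// Treats two consumables as equal when everything except ConsumableId matches.
public sealed class IgnoreConsumableIdComparer : IEqualityComparer<Consumable>
{
    public bool Equals(Consumable a, Consumable b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null)) return false;

        return a.Description == b.Description
            && a.UnitCost == b.UnitCost
            && a.DefaultChargeable == b.DefaultChargeable
            && a.ContractId == b.ContractId;
    }

    public int GetHashCode(Consumable c)
    {
        if (ReferenceEquals(c, null)) return 0;

        // Must use exactly the fields compared in Equals (no ConsumableId).
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (c.Description == null ? 0 : c.Description.GetHashCode());
            hash = hash * 31 + c.UnitCost.GetHashCode();
            hash = hash * 31 + c.DefaultChargeable.GetHashCode();
            hash = hash * 31 + c.ContractId.GetHashCode();
            return hash;
        }
    }
}
```

It would be used as items.Distinct(new IgnoreConsumableIdComparer()). Which ConsumableId survives for a set of otherwise identical rows is not something to rely on; if that matters, grouping on the four compared fields and picking the row explicitly is the safer choice.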

Most recent records with 2 tables and take / skip

What I want to do is basically what this question asks: SQL Server - How to display most recent records based on dates in two tables. The only difference is that I am using LINQ to SQL.
I have two tables:
Assignments
ForumPosts
These are not very similar, but they both have a "LastUpdated" field. I want to get the most recent joined records. However, I also need a take/skip functionality for paging (and no, I don't have SQL 2012).
I don't want to create a new list (with ToList and AddRange) containing ALL my records just so that I know the whole set of records and can then order it. That seems extremely inefficient.
My attempt:
Please don't laugh at my inefficient code.. Well ok, a little (both because it's inefficient and... it doesn't do what I want when skip is more than 0).
public List<TempContentPlaceholder> LatestReplies(int take, int skip)
{
    using (GKDBDataContext db = new GKDBDataContext())
    {
        var forumPosts = db.dbForumPosts.OrderBy(c => c.LastUpdated).Skip(skip).Take(take).ToList();
        var assignMents = db.dbUploadedAssignments.OrderBy(c => c.LastUpdated).Skip(skip).Take(take).ToList();
        List<TempContentPlaceholder> fps =
            forumPosts.Select(
                c =>
                    new TempContentPlaceholder()
                    {
                        Id = c.PostId,
                        LastUpdated = c.LastUpdated,
                        Type = ContentShowingType.ForumPost
                    }).ToList();
        List<TempContentPlaceholder> asm =
            assignMents.Select(
                c =>
                    new TempContentPlaceholder()
                    {
                        Id = c.UploadAssignmentId,
                        LastUpdated = c.LastUpdated,
                        Type = ContentShowingType.ForumPost
                    }).ToList();
        fps.AddRange(asm);
        return fps.OrderBy(c => c.LastUpdated).ToList();
    }
}
Are there any awesome LINQ to SQL people who can throw me a hint? I am sure someone can join their way out of this!
First, you should be using OrderByDescending to get the most recent updates, since later dates have greater values than earlier dates. Second, I think what you are doing will work for the first page, but you also need to take only the top take values from the merged list. That is, if you want the last 20 entries from both tables combined, take the last 20 entries from each, merge them, then take the last 20 entries from the merged list.
The problem comes when you attempt to use paging, because you would need to know how many elements from each list went into making up the previous pages. I think your best bet is probably to merge them first, then use Skip/Take. I know you don't want to hear that, but other solutions are probably more complex. Alternatively, you could take the top skip+take values from each table, then merge, skip the skip values and apply take.
using (GKDBDataContext db = new GKDBDataContext())
{
    var fps = db.dbForumPosts.Select(c => new TempContentPlaceholder()
        {
            Id = c.PostId,
            LastUpdated = c.LastUpdated,
            Type = ContentShowingType.ForumPost
        })
        .Concat(db.dbUploadedAssignments.Select(c => new TempContentPlaceholder()
        {
            // Assignments are keyed by UploadAssignmentId, as in the question
            Id = c.UploadAssignmentId,
            LastUpdated = c.LastUpdated,
            Type = ContentShowingType.ForumPost
        }))
        .OrderByDescending(c => c.LastUpdated)
        .Skip(skip)
        .Take(take)
        .ToList();
    return fps;
}
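The alternative mentioned at the end of the answer (take the top skip+take rows from each source, merge, then skip and take) can be sketched in memory as follows. ContentItem and MergedPaging are hypothetical names standing in for TempContentPlaceholder and the data context; against a real database the two Take calls would run as separate queries. The approach works because any row in the requested page must be among the newest skip+take rows of its own source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for TempContentPlaceholder.
public sealed class ContentItem
{
    public int Id { get; set; }
    public DateTime LastUpdated { get; set; }
    public string Kind { get; set; }
}

public static class MergedPaging
{
    public static List<ContentItem> Page(
        IEnumerable<ContentItem> forumPosts,
        IEnumerable<ContentItem> assignments,
        int skip,
        int take)
    {
        // No source can contribute more than skip + take rows to this page.
        int window = skip + take;

        var newestPosts = forumPosts
            .OrderByDescending(c => c.LastUpdated)
            .Take(window);

        var newestAssignments = assignments
            .OrderByDescending(c => c.LastUpdated)
            .Take(window);

        // Merge the two windows, then page over the combined ordering.
        return newestPosts
            .Concat(newestAssignments)
            .OrderByDescending(c => c.LastUpdated)
            .Skip(skip)
            .Take(take)
            .ToList();
    }
}
```

The window grows with the page number, so deep pages fetch more rows per source; for shallow paging this keeps the transferred data small without materializing both tables.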
