LINQ GroupBy confusion

I have:
var result = (from rev in Revisions
              join usr in Users on rev.UserID equals usr.ID
              join clc in ChangedLinesCounts on rev.Revision equals clc.Revision
              select new { rev.Revision, rev.Date, usr.UserName, usr.ID, clc.LinesCount }).Take(6);
I make a couple of joins on different tables (the keys are not relevant to this question), and at the end of this query my result "table" contains
{Revision, Date, UserName, ID, LinesCount}
Now I execute a GroupBy in order to calculate the total line count per user:
from row in result group row by row.ID into g // {1}
select new {
g.Key,
totalCount = g.Sum(count=>count.LinesCount)
};
So I get Key=ID and totalCount=Sum, but:
Confusion
I would also like to have the other fields in the final result.
As I understand it, the "table" after the {1} grouping query consists of
{Revision, Date, UserName, ID, LinesCount, TotalCount}
If my assumption is correct, why can't I do something like this:
from row in result group row by row.ID into g // {1}
select new {
g.Key,
g.Revision, // Revision doesn't exist! Why?
totalCount = g.Sum(count=>count.LinesCount)
};
but
from row in result group row by row.ID into g // {1}
select new {
g.Key,
Revision = g.Select(x=>x.Revision), //Works !
totalCount = g.Sum(count=>count.LinesCount)
};
It works, but IMO it's ugly, because I execute another Select.
In fact, looking at the LINQPad SQL output, I get 2 SQL queries.
Question
Is there an elegant and optimal way to do this, or do I always need to run a Select
on the grouped data in order to access fields that exist?

The problem is that you only group by ID. If you did that in SQL, you couldn't access the other fields either.
To have the other fields as well, you have to include them in your group clause:
from row in result group row by new { row.ID, row.Revision } into g
select new {
g.Key.ID,
g.Key.Revision,
totalCount = g.Sum(count=>count.LinesCount)
};

The problem here is your output logically looks something like this:
Key = 1
Id = 1, Revision = 3587, UserName = Bob, LinesCount = 34, TotalCount = 45
Id = 1, Revision = 3588, UserName = Joe, LinesCount = 64, TotalCount = 54
Id = 1, Revision = 3589, UserName = Jim, LinesCount = 37, TotalCount = 26
Key = 2
Id = 2, Revision = 3587, UserName = Bob, LinesCount = 34, TotalCount = 45
Id = 2, Revision = 3588, UserName = Joe, LinesCount = 64, TotalCount = 54
Id = 2, Revision = 3589, UserName = Jim, LinesCount = 37, TotalCount = 26
Much as with an SQL GROUP BY, a value is either part of the key and thus unique per group, or it is in the details and thus repeated multiple times and possibly different for each row.
Now, logically, it might be that Revision and UserName are unique for each Id, but LINQ has no way to know that (just as SQL has no way to know that).
To solve this you'll need to somehow specify which revision you want. For instance:
Revision = g.Select(x => x.Revision).FirstOrDefault()
To avoid the multiple SQL queries you would need to use an aggregate function that can be translated into SQL, since most SQL dialects do not have a first operator (the result set is considered unordered, so technically no item is "first").
Revision = g.Min(x => x.Revision)
Revision = g.Max(x => x.Revision)
Unfortunately, Min/Max over string columns may not translate to SQL with every LINQ provider, so although the SQL might support this, the LINQ query might not.
In this case you can produce an intermediate result set for the Id and totals, then join this back to the original set to get the details, e.g.:
from d in items
join t in (
    from i in items
    group i by i.Id into g
    select new { Id = g.Key, Total = g.Sum(x => x.LinesCount) }
) on d.Id equals t.Id
select new { Id = d.Id, Revision = d.Revision, Total = t.Total }
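The join-back idea can be tried in memory first. This is only a sketch with LINQ to Objects: the sample rows and the names items, totals and detailed are made up for illustration, and the SQL a real provider generates for the same shape may differ.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows standing in for the joined query result.
var items = new[]
{
    (Id: 1, Revision: 3587, LinesCount: 34),
    (Id: 1, Revision: 3588, LinesCount: 64),
    (Id: 2, Revision: 3589, LinesCount: 37),
};

// Step 1: one row per Id with the summed line count.
var totals = from i in items
             group i by i.Id into g
             select new { Id = g.Key, Total = g.Sum(x => x.LinesCount) };

// Step 2: join the totals back so every detail row carries its group's total.
var detailed = (from d in items
                join t in totals on d.Id equals t.Id
                select new { d.Id, d.Revision, t.Total }).ToList();

foreach (var row in detailed)
{
    Console.WriteLine($"Id={row.Id}, Revision={row.Revision}, Total={row.Total}");
}
```

Every detail row keeps its own Revision, and both rows for Id 1 carry the same total of 98.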

Revision doesn't exist in your second example because it's not a member of IGrouping<TKey, TElement>. An IGrouping<TKey, TElement> has a Key property, and it is also an IEnumerable<TElement> of all the rows grouped together. Thus each of those rows has a Revision, but there is no Revision for the grouping itself.
If the Revision will be the same for all rows with the same ID, you could use FirstOrDefault() so that the select yields at most one value:
from row in result group row by row.ID into g // {1}
select new {
g.Key,
Revision = g.Select(x=>x.Revision).FirstOrDefault(),
totalCount = g.Sum(count=>count.LinesCount)
};
If the Revision is not unique per ID, though, you'd want to use an anonymous type for the grouping, as @Tobias suggests; then you will get a grouping based on ID and Revision.
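A small in-memory sketch of what a group exposes, assuming made-up rows (the tuple fields and values are hypothetical, not the asker's data). The group itself only has Key; anything else has to come from an aggregate or from projecting the rows inside it.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows standing in for the joined query result.
var rows = new[]
{
    (ID: 1, Revision: 3587, LinesCount: 34),
    (ID: 1, Revision: 3588, LinesCount: 64),
    (ID: 2, Revision: 3589, LinesCount: 37),
};

var perUser = rows
    .GroupBy(r => r.ID)
    .Select(g => new
    {
        ID = g.Key,                                      // the only field the group itself has
        LatestRevision = g.Max(r => r.Revision),         // an aggregate picks one value per group
        Revisions = g.Select(r => r.Revision).ToList(),  // or keep every value from the rows
        TotalCount = g.Sum(r => r.LinesCount),
    })
    .ToList();

foreach (var u in perUser)
{
    Console.WriteLine($"{u.ID}: latest={u.LatestRevision}, total={u.TotalCount}");
}
```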

Related

SQL to LINQ duplicated group count

I am converting a SQL query to LINQ.
The SQL is simple:
select NAME, DESC, count(*) total from dbo.TBL_ITEM_BY_PROVIDER p
inner join dbo.TBL_TYPE_PROVIDER tp on tp.id = p.provider_id
group by NAME, DESC, SORT_ORDER
order by SORT_ORDER
The output is simple:
NAME    DESC                   Count(*)
CSD     Census and Statistics  5
LandsD  Lands Department       52
PlandD  Planning Department    29
My LINQ:
from p in data.TBL_ITEM_BY_PROVIDERs
join tp in data.TBL_TYPE_PROVIDERs on p.PROVIDER_ID equals tp.ID
group new { p, tp } by new { tp.NAME, tp.DESC } into provider
orderby (provider.Key.NAME)
select new {
provider.Key.NAME,
provider.Key.DESC,
count = (from pp in provider select pp.tp.NAME.ToList().Count())
};
and the output is a duplicated count array: [5,5,5,5,5]
0:{NAME: "CSD", DESC: "Census and Statistics", count: [5, 5, 5, 5, 5]}
1:{NAME: "LandsD", DESC: "Lands Department", count: [52, 52, 52, 52...]}
2:{NAME: "PlandD", DESC: "Planning Department", count: [29, 29, 29, 29...]}
How do I properly write a group statement like the SQL one?

You can write the grouping a bit differently. As you only want the count of how many items there are in the group, you can just do this:
var result = from p in data.TBL_ITEM_BY_PROVIDERs
join tp in data.TBL_TYPE_PROVIDERs on p.PROVIDER_ID equals tp.ID
group 1 by new { tp.NAME, tp.DESC } into provider
orderby provider.Key.NAME
select new {
provider.Key.NAME,
provider.Key.DESC,
Count = provider.Count()
};
Notice that the following does not do what you expect:
pp.tp.NAME.ToList().Count()
NAME is a string. Calling ToList() on it returns a List<char>, so Count() on that counts the number of characters in the string. As you are doing it in the select clause of a nested query, you get back a collection of counts instead of a single number.
Lastly, notice that your SQL orders by SORT_ORDER while your LINQ orders by provider.Key.NAME. These are not the same field; it is only by chance that they give the same ordering for this data.
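Both points can be checked in memory. The sketch below uses LINQ to Objects with invented sample data (providers and itemProviderIds are hypothetical stand-ins for the two tables), so it only illustrates the query shape, not what a database provider would generate.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for TBL_TYPE_PROVIDER and the PROVIDER_ID column of TBL_ITEM_BY_PROVIDER.
var providers = new[] { (ID: 1, NAME: "CSD"), (ID: 2, NAME: "LandsD") };
var itemProviderIds = new[] { 1, 1, 2, 2, 2 };

// "group 1 by ..." keeps only a placeholder per row, which is enough for counting.
var counts = (from pid in itemProviderIds
              join tp in providers on pid equals tp.ID
              group 1 by tp.NAME into g
              orderby g.Key
              select new { NAME = g.Key, Count = g.Count() }).ToList();

// ToList() on a string enumerates its characters, so this is the string length.
int letters = "CSD".ToList().Count();

foreach (var c in counts)
{
    Console.WriteLine($"{c.NAME}: {c.Count}");
}
Console.WriteLine($"Characters in CSD: {letters}");
```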

According to the documentation, the LINQ group clause returns a sequence of IGrouping<TKey,TElement>. Since IGrouping<TKey,TElement> implements IEnumerable<TElement>, you can just call the Count() method to get the number of items in the group.
You can also simplify the group clause of your query:
from item in data.TBL_ITEM_BY_PROVIDERs
join provider in data.TBL_TYPE_PROVIDERs
on item.PROVIDER_ID equals provider.ID
group item by provider into itemsByProvider
orderby itemsByProvider.Key.NAME
select new
{
itemsByProvider.Key.NAME,
itemsByProvider.Key.DESC,
count = itemsByProvider.Count()
};

Cleaner Way to Update Multiple Field Based on Condition?

Currently I am writing an application where speed is extremely important. The app processes a number of records, and at the end I'd like to mark those records as processed. A beta version had the following logic, which worked fine:
string ids = string.Join(", ", listOfIds.Select(q => q.ID));
_db.ExecuteCommand(string.Format("update table set processed = 1 where id in ({0})", ids));
Where listOfIds contains a list of all of the Ids that have been processed. This works great, but now I need to set 'processed' to different values, based on what happened during the process. So I can't just set processed = 1, it's conditional. So listOfIds is actually defined like this:
List<CustomClass> listOfIds = new List<CustomClass>();
class CustomClass
{
public int Status { get; set; }
public int ID { get; set; }
}
My solution would be as follows. Instead of adding all of the records to the listOfIds, I'd have to add each possible value of 'status' to a separate list. Like this:
List<CustomClass> listOfSuccessfulIds = new List<CustomClass>();
List<CustomClass> listOfFailedIds = new List<CustomClass>();
List<CustomClass> listOfSomethingElseIds = new List<CustomClass>();
...
_db.ExecuteCommand(string.Format("update table set processed = 1 where id in ({0})", string.Join(", ", listOfSuccessfulIds.Select(q => q.ID))));
_db.ExecuteCommand(string.Format("update table set processed = 2 where id in ({0})", string.Join(", ", listOfFailedIds.Select(q => q.ID))));
_db.ExecuteCommand(string.Format("update table set processed = 3 where id in ({0})", string.Join(", ", listOfSomethingElseIds.Select(q => q.ID))));
This is certainly functional, but it seems messy, especially if there are a large number of possible values for 'processed'. I feel like, as always, there's a better way to handle this.

If you don't have too many distinct values, you could use a case statement:
List<CustomClass> toUpdate = ...
var query = string.Format(@"
UPDATE table
SET processed = CASE {0} ELSE 1/0 END
WHERE id IN ({1})
",
    string.Join(
        " ",
        toUpdate.GroupBy(c => c.Status)
            .Select(g => string.Format("WHEN id IN ({0}) THEN {1}", string.Join(", ", g.Select(c => c.ID)), g.Key))
    ),
    string.Join(", ", toUpdate.Select(c => c.ID))
);
This will give a query like:
UPDATE table
SET processed = CASE WHEN id IN (1, 2) THEN 1 WHEN id IN (3, 4) THEN 2 ELSE 1/0 END
WHERE id IN (1, 2, 3, 4)
If you have a large number of different ids, you may be best off generating a subquery and joining to that:
var subQuery = string.Join(
    " UNION ALL ",
    toUpdate.Select(c => string.Format("SELECT {0} AS id, {1} AS status", c.ID, c.Status))
);
Then you would execute a query like:
UPDATE t
SET t.processed = q.status
FROM table t
JOIN ({subQuery}) q
ON q.id = t.id
Finally, if this is still generating too much text, you could insert the "table" represented by the subquery into a temporary table first (e.g. using SqlBulkCopy) and then execute the above query joining to the temporary table rather than the SELECT ... UNION ALL subquery.
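To see what the CASE approach generates without touching a database, here is a minimal sketch that only builds and prints the SQL text. The (ID, Status) tuples are hypothetical sample data standing in for CustomClass instances, and the statement is written on a single line for easy comparison.

```csharp
using System;
using System.Linq;

// Hypothetical processed records: (ID, Status) pairs standing in for CustomClass.
var toUpdate = new[]
{
    (ID: 1, Status: 1),
    (ID: 2, Status: 1),
    (ID: 3, Status: 2),
    (ID: 4, Status: 2),
};

// One WHEN branch per distinct status, listing the ids that share it.
var whenClauses = string.Join(
    " ",
    toUpdate.GroupBy(c => c.Status)
            .Select(g => string.Format(
                "WHEN id IN ({0}) THEN {1}",
                string.Join(", ", g.Select(c => c.ID)),
                g.Key)));

// The WHERE clause restricts the update to the ids that were processed.
var allIds = string.Join(", ", toUpdate.Select(c => c.ID));

var sql = string.Format(
    "UPDATE table SET processed = CASE {0} ELSE 1/0 END WHERE id IN ({1})",
    whenClauses,
    allIds);

Console.WriteLine(sql);
```

The ELSE 1/0 branch is kept from the answer above: it makes the statement fail loudly if a row matches the WHERE clause but none of the WHEN branches.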

How to select a subset from a DataTable through LINQ?

I am new to LINQ.
I have the following DataTable
Name    Date        price1   price2
string  DateTime    decimal  decimal
Jan09   14.01.2009  10.0     12.0
Feb09   14.01.2009  11.0     13.0
Jan09   15.01.2009  10.0     12.5
Feb09   15.01.2009  9.0      10.0
Jan09   18.01.2009  10.0     12.5
Feb09   18.01.2009  9.0      10.0
Name and Date form the compound primary key.
I want to select all Names for each Date, then iterate through the new collection and select the next date.
var subCollection = tab.Rows.Cast<DataRow>().Select(r1 => r1["Date"]).Select<string>(r2 => r2["Name"])
foreach (DataRow row in subCollection)
{
// do something with row
}
My Linq expression is wrong

I think what you want is to group by your Date, look at all the Names for a given date, then move on to the next.
If that is the case, you want to use the LINQ group syntax:
var query = from row in table.AsEnumerable()
group row by row["Date"] into g
select g;
You can find a lot of examples online for doing various things with the Linq group syntax. The thing I find important is realizing that you can group by multiple columns and still apply aggregate functions like Sum, Max, or Count using the following syntax:
var query = from row in table.AsEnumerable()
group row by new { Date = row["Date"], Price1 = row["Price1"] } into g
select new
{
Date = g.Key.Date,
Price = g.Key.Price1,
Count = g.Count()
};
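As a rough end-to-end sketch of the grouping applied to the question's table, the following builds a small DataTable and lists the names per date. The table here is a cut-down, hypothetical copy of the asker's data (three rows, one price column), and Field<T> is used instead of the indexer to get typed values.

```csharp
using System;
using System.Data;
using System.Linq;

// Cut-down, hypothetical copy of the table from the question.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Date", typeof(DateTime));
table.Columns.Add("price1", typeof(decimal));
table.Rows.Add("Jan09", new DateTime(2009, 1, 14), 10.0m);
table.Rows.Add("Feb09", new DateTime(2009, 1, 14), 11.0m);
table.Rows.Add("Jan09", new DateTime(2009, 1, 15), 10.0m);

// Group the rows by date and collect the names found on each date.
var namesByDate = table.AsEnumerable()
    .GroupBy(row => row.Field<DateTime>("Date"))
    .Select(g => new
    {
        Date = g.Key,
        Names = g.Select(r => r.Field<string>("Name")).ToList(),
    })
    .ToList();

// Iterate date by date, then over the names within that date.
foreach (var day in namesByDate)
{
    var names = string.Join(", ", day.Names);
    Console.WriteLine($"{day.Date:yyyy-MM-dd}: {names}");
}
```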

sql query to linq-to-entities

How can this query be transformed to LINQ?
SELECT materialId, SUM(totalAmount) as quantity FROM Inventory
It's the sum part that I don't know how to write...
query = from inv in context.Inventory
select new MaterialQuantity()
{
MaterialId = inv.materialId,
Quantity = ??
};
EDIT
Trying to sum the value of totalAmount.
It's a view that looks like this:
materialId  totalSum  (and other fields)
1           5
1           10
1           20
So I want my linq to return me
MaterialId = 1, Quantity = 35

I'm going to give a complete guess here... assuming your inventory has multiple rows with the same materialId and you want to sum in those groups, you could use:
var query = from inv in context.Inventory
group inv.totalAmount by inv.materialId into g
select new { MaterialId = g.Key, Quantity = g.Sum() };
If you're not trying to group though, you'll need to clarify your question. Sample data and expected output would help.
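Run against the sample data from the edit, the grouped sum behaves as sketched below. This uses an in-memory array in place of context.Inventory, and the second material (id 2) is an invented row added only to show that groups stay separate.

```csharp
using System;
using System.Linq;

// In-memory stand-in for the Inventory view; the row for material 2 is hypothetical.
var inventory = new[]
{
    (materialId: 1, totalAmount: 5),
    (materialId: 1, totalAmount: 10),
    (materialId: 1, totalAmount: 20),
    (materialId: 2, totalAmount: 7),
};

// Group the amounts by material, then sum each group.
var quantities = (from inv in inventory
                  group inv.totalAmount by inv.materialId into g
                  select new { MaterialId = g.Key, Quantity = g.Sum() }).ToList();

foreach (var q in quantities)
{
    Console.WriteLine($"MaterialId = {q.MaterialId}, Quantity = {q.Quantity}");
}
```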

Need SELECT WHERE COUNT = INT in LINQ

I have the following table that records when a particular room in a hotel (designated by a three character code [dlx, sup, jac, etc..]) is sold out on a particular DATETIME.
CREATE TABLE [dbo].[RoomSoldOut](
[SoldOutID] [int] IDENTITY(1,1) NOT NULL,
[RoomType] [nchar](3) NOT NULL,
[SoldOutDate] [datetime] NOT NULL,
CONSTRAINT [PK_RoomSoldOut5] PRIMARY KEY CLUSTERED
I need to find out when the entire hotel is sold out on a particular date. There are 8 room types, and if all 8 are sold out then the hotel is booked solid for that night.
The LINQ statement to count the room types sold out for a given night works like this:
var solds = from r in RoomSoldOuts
group r by r.SoldOutDate into s
select new
{
Date = s.Key,
RoomTypeSoldOut = s.Count()
};
From this LINQ statement I can get a list of all the sold-out DATETIMEs with a COUNT of the number of room types that are sold out.
I need to filter this list to only those DATETIMEs where the COUNT = 8, because then the hotel is sold out for that day.
This should be simple, but I cannot figure out how to do it in LINQ.

I think that you need to add the following to the query: where s.Count()==8

You can also try
var solds = (from r in RoomSoldOuts
group r by r.SoldOutDate into s
select new
{
Date = s.Key,
RoomTypeSoldOut = s.Count()
}).Where(x => x.RoomTypeSoldOut == 8);
You could then also shorten it to select only the dates:
var solds = from r in RoomSoldOuts
group r by r.SoldOutDate into s
where s.Count() == 8
select s.Key;
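A quick in-memory check of the filtered grouping, as a sketch only: the rows are generated sample data, and apart from dlx, sup and jac the room type codes are invented for illustration.

```csharp
using System;
using System.Linq;

// Eight room type codes; only dlx, sup and jac come from the question, the rest are made up.
var roomTypes = new[] { "dlx", "sup", "jac", "std", "fam", "pen", "twn", "sgl" };
var nightA = new DateTime(2010, 1, 1);
var nightB = new DateTime(2010, 1, 2);

// Night A has all eight room types sold out; night B has only three.
var roomSoldOuts = roomTypes.Select(t => (RoomType: t, SoldOutDate: nightA))
    .Concat(roomTypes.Take(3).Select(t => (RoomType: t, SoldOutDate: nightB)))
    .ToList();

// Keep only the dates whose group contains every room type.
var fullyBooked = (from r in roomSoldOuts
                   group r by r.SoldOutDate into s
                   where s.Count() == roomTypes.Length
                   select s.Key).ToList();

foreach (var date in fullyBooked)
{
    Console.WriteLine($"Sold out: {date:yyyy-MM-dd}");
}
```

This assumes each room type appears at most once per date; if duplicates are possible, counting distinct RoomType values per group would be the safer test.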
