Using C# LINQ, I would like to be able to turn the following:
[
{
Id: 1,
StartDate: '2021-03-10',
EndDate: '2021-03-21',
Quantity: 1
},
{
Id: 2,
StartDate: '2021-03-10',
EndDate: '2021-03-21',
Quantity: 1
},
{
Id: 3,
StartDate: '2021-03-10',
EndDate: '2021-03-23',
Quantity: 2
},
{
Id: 4,
StartDate: '2021-03-10',
EndDate: '2021-03-25',
Quantity: 1
}
]
Into this:
[
{
StartDate: '2021-03-10',
EndDate: '2021-03-21',
Quantity: 5,
Ids: [1, 2, 3, 4]
},
{
StartDate: '2021-03-22',
EndDate: '2021-03-23',
Quantity: 3,
Ids: [3, 4]
},
{
StartDate: '2021-03-24',
EndDate: '2021-03-25',
Quantity: 1,
Ids: [4]
}
]
In this scenario:
EndDate may change per entry in the input, but StartDate will always be the same.
It is possible that two entries may have the same EndDate. In this case, the results will aggregate, with the results showing one entry for those dates and a summed quantity.
Desired solution:
The LINQ would group by unique date range, sum the quantity value and include an array of ids, indicating which date ranges have been covered.
Help is much appreciated.
This question is a step in the right direction, but it doesn't take care of indicating the date ranges as demonstrated above:
Select multiple fields group by and sum
So given this class:
public class Entry
{
public int Id { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Quantity { get; set; }
}
And this starting point:
Entry[] entries = JsonConvert.DeserializeObject<Entry[]>(jsonText);
I first get a list of the distinct dates involved:
DateTime[] dates =
entries
.SelectMany(x => new [] { x.StartDate, x.EndDate })
.Distinct()
.OrderBy(x => x)
.ToArray();
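For the sample input above, that produces the four distinct boundary dates. A minimal sketch with the sample data inlined (tuples stand in for the deserialized entries):

```csharp
using System;
using System.Linq;

// The sample entries' dates, inlined for illustration.
var entries = new[] {
    (Start: new DateTime(2021, 3, 10), End: new DateTime(2021, 3, 21)),
    (Start: new DateTime(2021, 3, 10), End: new DateTime(2021, 3, 21)),
    (Start: new DateTime(2021, 3, 10), End: new DateTime(2021, 3, 23)),
    (Start: new DateTime(2021, 3, 10), End: new DateTime(2021, 3, 25)),
};

// Collect every start and end date, de-duplicate, and sort.
DateTime[] dates = entries
    .SelectMany(x => new[] { x.Start, x.End })
    .Distinct()
    .OrderBy(x => x)
    .ToArray();

Console.WriteLine(string.Join(", ", dates.Select(d => d.ToString("yyyy-MM-dd"))));
// 2021-03-10, 2021-03-21, 2021-03-23, 2021-03-25
```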
Now I can query to split each Entry into the set of distinct date ranges:
var query =
from e in entries
let splitters =
dates
.Where(x => x >= e.StartDate)
.Where(x => x <= e.EndDate)
.ToArray()
from s in
splitters
.Skip(1)
.Zip(
splitters,
(s1, s0) => new Entry()
{
Id = e.Id,
StartDate = s0,
EndDate = s1,
Quantity = e.Quantity
})
group new { s.Id, s.Quantity } by new { s.StartDate, s.EndDate } into gss
select new
{
gss.Key.StartDate,
gss.Key.EndDate,
Quantity = gss.Sum(gs => gs.Quantity),
Ids = gss.Select(gs => gs.Id).ToArray(),
};
That gives me the following (serialized to JSON):
[
{
"StartDate": "2021-03-10T00:00:00",
"EndDate": "2021-03-21T00:00:00",
"Quantity": 5,
"Ids": [
1,
2,
3,
4
]
},
{
"StartDate": "2021-03-21T00:00:00",
"EndDate": "2021-03-23T00:00:00",
"Quantity": 3,
"Ids": [
3,
4
]
},
{
"StartDate": "2021-03-23T00:00:00",
"EndDate": "2021-03-25T00:00:00",
"Quantity": 1,
"Ids": [
4
]
}
]
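To see what the splitting step does for a single entry, consider Id = 4, whose range 2021-03-10..2021-03-25 spans all four distinct dates. An isolated sketch of just its splitters:

```csharp
using System;
using System.Linq;

// Entry Id = 4 covers all four distinct dates, so its splitters are the
// full date list; Skip(1).Zip(splitters, ...) pairs each date with its
// successor, yielding three consecutive sub-ranges.
var splitters = new[] {
    new DateTime(2021, 3, 10), new DateTime(2021, 3, 21),
    new DateTime(2021, 3, 23), new DateTime(2021, 3, 25),
};

var subRanges = splitters
    .Skip(1)
    .Zip(splitters, (s1, s0) => (Start: s0, End: s1))
    .ToArray();

foreach (var r in subRanges)
    Console.WriteLine($"{r.Start:yyyy-MM-dd} -> {r.End:yyyy-MM-dd}");
// 2021-03-10 -> 2021-03-21
// 2021-03-21 -> 2021-03-23
// 2021-03-23 -> 2021-03-25
```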
If I'm allowed to use Microsoft's "System.Interactive" library then it can be done like this:
var query =
from e in entries
from s in
dates
.Where(x => x >= e.StartDate)
.Where(x => x <= e.EndDate)
.Buffer(2, 1)
.SkipLast(1)
.Select(x => new Entry()
{
Id = e.Id,
StartDate = x[0],
EndDate = x[1],
Quantity = e.Quantity
})
group new { s.Id, s.Quantity } by new { s.StartDate, s.EndDate } into gss
select new
{
gss.Key.StartDate,
gss.Key.EndDate,
Quantity = gss.Sum(gs => gs.Quantity),
Ids = gss.Select(gs => gs.Id).ToArray(),
};
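Buffer(2, 1) produces overlapping windows of two items, advancing one item at a time, and SkipLast(1) drops the trailing one-item window. The same pairing can be sketched with plain LINQ, so the behavior can be checked without the extra package (day numbers stand in for the distinct dates):

```csharp
using System;
using System.Linq;

// Plain-LINQ equivalent of Buffer(2, 1).SkipLast(1): pair each item
// with its successor.
var dates = new[] { 10, 21, 23, 25 };

var windows = dates
    .Zip(dates.Skip(1), (first, second) => new[] { first, second })
    .ToArray();

foreach (var w in windows)
    Console.WriteLine($"[{w[0]}, {w[1]}]");
// [10, 21]
// [21, 23]
// [23, 25]
```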
Here are a couple of LINQ extension methods based on an extended version of the APL scan operator. Scan is like Aggregate, but it also returns the intermediate results. These variations take pairs of items and process them.
public static class IEnumerableExt {
// TRes firstResFn(T firstValue)
// TRes combineFn(T PrevValue, T CurValue)
// returns firstResFn(items.First()) then ScanByPairs(items, combineFn)
public static IEnumerable<TRes> ScanByPairs<T, TRes>(this IEnumerable<T> items, Func<T, TRes> firstResFn, Func<T, T, TRes> combineFn) {
using (var itemsEnum = items.GetEnumerator())
if (itemsEnum.MoveNext()) {
var prev = itemsEnum.Current;
yield return firstResFn(prev);
while (itemsEnum.MoveNext())
yield return combineFn(prev, prev = itemsEnum.Current);
}
}
// THelper helperSeedFn(T FirstValue)
// TRes resSeedFn(T FirstValue)
// (THelper Helper, TRes Res) combineFn((THelper Helper, TRes PrevRes), T CurValue)
// returns resSeedFn, combineFn,...
public static IEnumerable<TRes> ScanToPairsWithHelper<T, THelper, TRes>(this IEnumerable<T> items, Func<T, THelper> helperSeedFn, Func<T, TRes> resSeedFn, Func<(THelper Helper, TRes PrevRes), T, (THelper Helper, TRes Res)> combineFn) {
using (var itemsEnum = items.GetEnumerator())
if (itemsEnum.MoveNext()) {
var seed = (Helper: helperSeedFn(itemsEnum.Current), Res: resSeedFn(itemsEnum.Current));
while (itemsEnum.MoveNext()) {
yield return seed.Res;
seed = combineFn(seed, itemsEnum.Current);
}
yield return seed.Res;
}
}
}
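To illustrate what ScanByPairs yields, here is a small standalone demo (the method body is repeated inline as a local function so the snippet compiles on its own; running differences are just an arbitrary example of a pairwise combine):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Inline copy of ScanByPairs so this sketch is self-contained.
static IEnumerable<TRes> ScanByPairs<T, TRes>(
    IEnumerable<T> items, Func<T, TRes> firstResFn, Func<T, T, TRes> combineFn)
{
    using var e = items.GetEnumerator();
    if (!e.MoveNext()) yield break;
    var prev = e.Current;
    yield return firstResFn(prev);                      // seed from the first item
    while (e.MoveNext())
        yield return combineFn(prev, prev = e.Current); // each adjacent pair
}

// The first item passes through; every later item is combined with its predecessor.
var diffs = ScanByPairs(new[] { 1, 4, 9, 16 }, x => x, (a, b) => b - a).ToArray();
Console.WriteLine(string.Join(", ", diffs)); // 1, 3, 5, 7
```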
First, create an intermediate list that combines any identical date ranges:
var int1 = src.GroupBy(s => s.EndDate)
.Select(sg => new {
Ids = sg.Select(s => s.Id).ToList(),
StartDate = sg.First().StartDate,
EndDate = sg.Key,
Quantity = sg.Sum(s => s.Quantity)
});
Then use ScanByPairs to compute the new start dates so ranges don't overlap (this assumes EndDate is in increasing order):
var int2 = int1.ScanByPairs(s => new { s.Ids, s.Quantity, s.StartDate, s.EndDate },
(prev, next) => new { next.Ids, next.Quantity, StartDate = prev.EndDate.AddDays(1), next.EndDate });
Finally, process the previous result in reverse, aggregating the quantities and IDs as you go:
var ans = int2.Reverse()
.ScanToPairsWithHelper(first => new { first.Ids, first.Quantity },
first => new { first.Ids, first.Quantity, first.StartDate, first.EndDate },
(helpernext, prev) => (new { Ids = helpernext.Helper.Ids.Concat(prev.Ids).ToList(), Quantity = helpernext.Helper.Quantity + prev.Quantity },
new { Ids = helpernext.Helper.Ids.Concat(prev.Ids).ToList(), Quantity = helpernext.Helper.Quantity + prev.Quantity, prev.StartDate, prev.EndDate }))
.Reverse();
This is rather tough to do with LINQ. I recommend using MoreLINQ for that.
// these helpers make things much easier
using static MoreLinq.Extensions.GroupAdjacentExtension;
using static MoreLinq.Extensions.ScanRightExtension;
using static MoreLinq.Extensions.WindowRightExtension;
// ...
var output = input
// aggregate identical end dates
.GroupAdjacent(x => x.EndDate)
.Select(xs => xs.Aggregate(
new { Ids = Enumerable.Empty<int>(), Quantity = 0, StartDate = default(DateTime), EndDate = default(DateTime) },
(acc, curr) => new {
Ids = acc.Ids.Append(curr.Id),
Quantity = acc.Quantity + curr.Quantity,
curr.StartDate,
curr.EndDate
}))
// cut off overlapping start dates
.WindowRight(2)
.Select(win => win.Count == 1
? win[0]
: new {
win[1].Ids,
win[1].Quantity,
StartDate = win[0].EndDate.AddDays(1),
win[1].EndDate
})
// accumulate IDs and quantities
.ScanRight((curr, prev) => new {
Ids = curr.Ids.Concat(prev.Ids),
Quantity = curr.Quantity + prev.Quantity,
curr.StartDate,
curr.EndDate
});
This doesn't iterate multiple times and streams everything efficiently, but it's a lot of code.
Working example: https://dotnetfiddle.net/YxDQEX
The main trick is to iterate the input backwards. This lets us easily accumulate the IDs and quantities, because they always grow when scanning the list from back to front. E.g. the second-to-last item's quantity will be its own quantity plus the quantity of the last item; the third-to-last item's quantity will be its own plus the quantity of the second-to-last item; etc.
ScanRight() does the backwards iteration for us. It is similar to Aggregate(), but it also yields the intermediate values of the accumulator, thus returning an IEnumerable<TAccumulate> rather than a single TAccumulate.
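Those semantics can be sketched without MoreLINQ: fold from the back and keep every intermediate accumulator value. This stand-in is a hypothetical helper for illustration, not MoreLINQ's actual implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain-LINQ stand-in for ScanRight: fold from the back and
// keep every intermediate accumulator value.
static IEnumerable<T> ScanRightSketch<T>(IEnumerable<T> source, Func<T, T, T> f)
{
    var acc = new LinkedList<T>();
    foreach (var x in source.Reverse())
        acc.AddFirst(acc.Count == 0 ? x : f(x, acc.First.Value));
    return acc;
}

// [1, 2, 3, 4] with (+) yields [10, 9, 7, 4]: each position holds its own
// value plus everything to its right, so values only grow toward the front.
var sums = ScanRightSketch(new[] { 1, 2, 3, 4 }, (curr, prev) => curr + prev).ToArray();
Console.WriteLine(string.Join(", ", sums)); // 10, 9, 7, 4
```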
It might be possible to get a cleaner version of this using a traditional foreach loop, but I kind of doubt it. With a foreach you would have to do everything at the same time (i.e. in the same loop body) in order to do it efficiently and that will probably become unreadable as well (considering all the state that has to be managed).
With the LINQ version you at least have three disparate steps that you could extract using extension functions:
public static IEnumerable<T> MergeEqualEndDates<T>(this IEnumerable<T> xs) => xs
.GroupAdjacent(x => x.EndDate)
.Select(xs => xs.Aggregate(
new { Ids = Enumerable.Empty<int>(), Quantity = 0, StartDate = default(DateTime), EndDate = default(DateTime) },
(acc, curr) => new {
Ids = acc.Ids.Append(curr.Id),
Quantity = acc.Quantity + curr.Quantity,
curr.StartDate,
curr.EndDate
}));
Which could leave you with:
var output = input
.MergeEqualEndDates()
.TrimStartDates()
.AccumulateIdsAndQuantities();
Thank you to @GoodNightNerdPride and @NetMage for their help.
This is the version I came up with. I cheated and used a for-loop. It is less elegant, but I find it easier to read.
Below is the main extension method (which borrows from @NetMage) and beneath that is an XUnit file I used to test the whole thing.
Extension Method
public static List<Response> SumBasedUponOverlappingDates(this List<Request> requests) {
var requestsWithGroupedIds = requests
.GroupBy(x => x.EndDate)
.Select(x => new {
Ids = x.Select(y => y.Id).ToArray(),
StartDate = x.First().StartDate,
EndDate = x.Key,
Quantity = x.Sum(y => y.Quantity)
})
.OrderBy(x => x.EndDate)
.ToList();
var responses = new List<Response>();
for (int i = 0; i < requestsWithGroupedIds.Count(); i++) {
var req = requestsWithGroupedIds[i];
var overlappingEntries = requestsWithGroupedIds
.Where(x => x.StartDate <= req.StartDate && x.EndDate >= req.EndDate)
.ToList();
var resp = new Response {
Ids = overlappingEntries.SelectMany(x => x.Ids.Select(y => y)).OrderBy(x => x).ToArray(),
Quantity = overlappingEntries.Sum(x => x.Quantity),
StartDate = (i == 0) ? req.StartDate : requestsWithGroupedIds[i - 1].EndDate.AddDays(1),
EndDate = req.EndDate
};
responses.Add(resp);
}
return responses;
}
XUnit Code
using System;
using System.Linq;
using Xunit;
using System.Collections.Generic;
namespace StackOverflow.Tests {
public class GroupingAndAggregationTests {
[Fact]
void ShouldAggregateDuplicateDataIntoSingle() {
var requests = new List<Request>() {
new Request{Id = 9, StartDate = new DateTime(2021, 2, 20), EndDate = new DateTime(2021,3, 5), Quantity = 6 },
new Request{Id = 345, StartDate = new DateTime(2021, 2, 20), EndDate = new DateTime(2021,3, 5), Quantity = 29 }
};
var responses = requests.SumBasedUponOverlappingDates();
Assert.Equal(1, responses.Count());
var expectedResponse =
new Response() {
StartDate = new DateTime(2021, 2, 20),
EndDate = new DateTime(2021, 3, 5),
Quantity = 35,
Ids = new int[] { 9, 345 }
};
var actualResponse = responses[0];
Assert.True(actualResponse.IsEqual(expectedResponse));
}
[Fact]
void ShouldAggregateMultipleBasedOnOverlappingDates() {
var requests = new List<Request>() {
new Request{Id = 1, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 21), Quantity = 1 },
new Request{Id = 2, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 21), Quantity = 1 },
new Request{Id = 3, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 23), Quantity = 2 },
new Request{Id = 4, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 25), Quantity = 1 }
};
var responses = requests.SumBasedUponOverlappingDates();
Assert.Equal(3, responses.Count());
var expectedResponse1 =
new Response() {
StartDate = new DateTime(2021, 3, 10),
EndDate = new DateTime(2021, 3, 21),
Quantity = 5,
Ids = new int[] { 1, 2, 3, 4 }
};
var actualResponse1 = responses[0];
Assert.True(actualResponse1.IsEqual(expectedResponse1));
var expectedResponse2 =
new Response() {
StartDate = new DateTime(2021, 3, 22),
EndDate = new DateTime(2021, 3, 23),
Quantity = 3,
Ids = new int[] { 3, 4 }
};
var actualResponse2 = responses[1];
Assert.True(actualResponse2.IsEqual(expectedResponse2));
var expectedResponse3 =
new Response() {
StartDate = new DateTime(2021, 3, 24),
EndDate = new DateTime(2021, 3, 25),
Quantity = 1,
Ids = new int[] { 4 }
};
var actualResponse3 = responses[2];
Assert.True(actualResponse3.IsEqual(expectedResponse3));
}
}
public class Request {
public int Id { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Quantity { get; set; }
}
public class Response {
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Quantity { get; set; }
public int[] Ids { get; set; }
public bool IsEqual(Response resp)
=>
StartDate == resp.StartDate &&
EndDate == resp.EndDate &&
Quantity == resp.Quantity &&
Ids.OrderBy(x => x).SequenceEqual(resp.Ids.OrderBy(x => x));
}
public static class ExtensionMethods {
public static List<Response> SumBasedUponOverlappingDates(this List<Request> requests) {
var requestsWithGroupedIds = requests
.GroupBy(x => x.EndDate)
.Select(x => new {
Ids = x.Select(y => y.Id).ToArray(),
StartDate = x.First().StartDate,
EndDate = x.Key,
Quantity = x.Sum(y => y.Quantity)
})
.OrderBy(x => x.EndDate)
.ToList();
var responses = new List<Response>();
for (int i = 0; i < requestsWithGroupedIds.Count(); i++) {
var req = requestsWithGroupedIds[i];
var overlappingEntries = requestsWithGroupedIds
.Where(x => x.StartDate <= req.StartDate && x.EndDate >= req.EndDate)
.ToList();
var resp = new Response {
Ids = overlappingEntries.SelectMany(x => x.Ids.Select(y => y)).OrderBy(x => x).ToArray(),
Quantity = overlappingEntries.Sum(x => x.Quantity),
StartDate = (i == 0) ? req.StartDate : requestsWithGroupedIds[i - 1].EndDate.AddDays(1),
EndDate = req.EndDate
};
responses.Add(resp);
}
return responses;
}
}
}
I have a collection of group users which has a GroupId and UserId. I need to filter out any duplicate GroupId/UserId objects which may exist in the collection. How can I write a non-query expression GroupBy to filter out the duplicate rows? The following example is adapted from a group by example that I found online but I'm not quite clear on how to refine this code for my particular scenario:
var groupByResults =
groupUsers.GroupBy(
x => x.GroupId,
x => x.UserId,
(key, g) => new
{
[?] = key,
[?] = g.ToList()
}
);
If your data looks like the list below, you can group by the compound key and then take the first value in each group. The OrderBy is optional.
var groupUsers = new List<dynamic>() {
new { groupId = 1, userId = 1, name = "a" },
new { groupId = 1, userId = 1, name = "b" },
new { groupId = 1, userId = 2, name = "c" }
};
var result = groupUsers
.GroupBy(u => new { u.groupId, u.userId} )
.Select(g => g.OrderBy(u => u.name).FirstOrDefault());
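To check the idea, here is a runnable version using anonymous types with concrete values rather than dynamic (the names are arbitrary):

```csharp
using System;
using System.Linq;

var groupUsers = new[] {
    new { groupId = 1, userId = 1, name = "a" },
    new { groupId = 1, userId = 1, name = "b" },
    new { groupId = 1, userId = 2, name = "c" },
};

// Group by the compound key, then keep one representative per group.
var result = groupUsers
    .GroupBy(u => new { u.groupId, u.userId })
    .Select(g => g.OrderBy(u => u.name).First())
    .ToList();

foreach (var u in result)
    Console.WriteLine($"{u.groupId}/{u.userId}: {u.name}");
// 1/1: a
// 1/2: c
```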
To find the duplicated UserId/GroupId pairs:
GroupBy UserId and GroupId
Keep any group whose count is >= 2
SelectMany to flatten the groups back into a collection
Code:
var duplicatedUsers = groupUsers
.GroupBy(gu => new { gu.UserId, gu.GroupId })
.Where(g => g.Count() >= 2)
.SelectMany(g => g);
The following code will be helpful to you:
class GroupUsers
{
public int GroupId {get;set;}
public int UserId {get;set;}
}
public class Program
{
public static void Main()
{
var groupUsers = new List<GroupUsers>() {
new GroupUsers{ GroupId = 1, UserId = 1},
new GroupUsers{ GroupId = 1, UserId = 1},
new GroupUsers{ GroupId = 1, UserId = 2},
new GroupUsers{ GroupId = 1, UserId = 2},
new GroupUsers{ GroupId = 1, UserId = 3},
new GroupUsers{ GroupId = 1, UserId = 4},
new GroupUsers{ GroupId = 1, UserId = 5},
new GroupUsers{ GroupId = 1, UserId = 3}
};
var result1 = groupUsers
.GroupBy(u => new { u.GroupId, u.UserId} )
.Where(g => g.Count()>=2) // check for duplicate value by checking whether the count is greater than or equal to 2.
.SelectMany(g=>g); // flatten the list
foreach(var user in result1) // Iterate over the result
{
Console.WriteLine(user.GroupId +" "+user.UserId);
}
// Or
var result2 = from a in groupUsers
              group a by new { a.GroupId, a.UserId } into grp
              where grp.Count() >= 2
              from g in grp
              select new { g };
foreach(var user in result2)
{
Console.WriteLine(user.g.GroupId +" "+user.g.UserId);
}
}
}
var ids = new int[] { 3, 2, 20, 1 };
var entities = categories.Where(entity => ids.Contains(entity.Id));
I have to sort the entities in exactly the same order as in the ids array. How can I do that?
This should do the trick (written off the top of my head, so may have mistakes)
var ids = new int[] { 3, 2, 20, 1 };
var ordering = ids.Select((id,index) => new {id,index});
var entities =
categories
.Where(entity => ids.Contains(entity.Id))
.AsEnumerable() //line not necessary if 'categories' is a local sequence
.Join(ordering, ent => ent.Id, ord => ord.id, (ent,ord) => new {ent,ord})
.OrderBy(x => x.ord.index)
.Select(x => x.ent);
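A quick check of the approach with a made-up in-memory categories list (the Category shape here is an assumption, since the question doesn't show it):

```csharp
using System;
using System.Linq;

var ids = new[] { 3, 2, 20, 1 };
// Hypothetical categories data for illustration.
var categories = new[] {
    new { Id = 1, Name = "one" },
    new { Id = 2, Name = "two" },
    new { Id = 3, Name = "three" },
    new { Id = 20, Name = "twenty" },
};

// Pair each id with its position, join on Id, then sort by that position.
var ordering = ids.Select((id, index) => new { id, index });
var entities = categories
    .Where(e => ids.Contains(e.Id))
    .Join(ordering, ent => ent.Id, ord => ord.id, (ent, ord) => new { ent, ord })
    .OrderBy(x => x.ord.index)
    .Select(x => x.ent)
    .ToList();

Console.WriteLine(string.Join(", ", entities.Select(e => e.Id))); // 3, 2, 20, 1
```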
You could use OrderBy with the index of the Ids in ids.
To get the index of an Id from ids, you could create a map of Id to index. That way you can look up the index in almost constant time, instead of having to call IndexOf and traverse the whole list each time.
Something like this:
var idToIndexMap = ids
    .Select((value, index) => new { Value = value, Index = index })
    .ToDictionary(
        pair => pair.Value,
        pair => pair.Index
    );
var sortedEntities = categories
.Where(e => ids.Contains(e.Id))
.ToList() // Isn't necessary if this is Linq-to-Objects instead of entities...
.OrderBy(e => idToIndexMap[e.Id])
;
You may have a go with this:
public class Foo
{
public void Bar()
{
int[] idOrder = new int[] { 3, 2, 20, 1 };
var lookup = idOrder.ToDictionary(i => i,
i => Array.IndexOf(idOrder, i));
foreach(var a in idOrder.OrderBy(i => new ByArrayComparable<int>(lookup, i)))
Console.WriteLine(a);
}
}
public class ByArrayComparable<T> : IComparable<ByArrayComparable<T>> where T : IComparable<T>
{
public readonly IDictionary<T, int> order;
public readonly T element;
public ByArrayComparable(IDictionary<T, int> order, T element)
{
this.order = order;
this.element = element;
}
public int CompareTo(ByArrayComparable<T> other)
{
return this.order[this.element].CompareTo(this.order[other.element]);
}
}
This works for unique elements only, but the lookup effort is constant.