LINQ - Sum Based on Overlapping Dates - c#

Using C# LINQ, I would like to be able to turn the following:
[
{
Id: 1,
StartDate: '2021-03-10',
EndDate: '2021-03-21',
Quantity: 1
},
{
Id: 2,
StartDate: '2021-03-10',
EndDate: '2021-03-21',
Quantity: 1
},
{
Id: 3,
StartDate: '2021-03-10',
EndDate: '2021-03-23',
Quantity: 2
},
{
Id: 4,
StartDate: '2021-03-10',
EndDate: '2021-03-25',
Quantity: 1
}
]
Into this:
[
{
StartDate: '2021-03-10',
EndDate: '2021-03-21',
Quantity: 5,
Ids: [1, 2, 3, 4]
},
{
StartDate: '2021-03-22',
EndDate: '2021-03-23',
Quantity: 3,
Ids: [3, 4]
},
{
StartDate: '2021-03-24',
EndDate: '2021-03-25',
Quantity: 1,
Ids: [4]
}
]
In this scenario:
EndDate may change per entry on the input but StartDate will always be the same.
Two entries may have the same EndDate. In that case their results are aggregated: the output shows a single entry for that date range with the summed quantity.
Desired solution:
The LINQ query would group by unique date range, sum the quantity, and include an array of IDs indicating which entries cover each date range.
Help is much appreciated.
The question "Select multiple fields group by and sum" is a step in the right direction, but it doesn't produce the date ranges shown above.

So given this class:
public class Entry
{
public int Id { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Quantity { get; set; }
}
And this starting point:
Entry[] entries = JsonConvert.DeserializeObject<Entry[]>(jsonText);
I first get a list of the distinct dates involved:
DateTime[] dates =
entries
.SelectMany(x => new [] { x.StartDate, x.EndDate })
.Distinct()
.OrderBy(x => x)
.ToArray();
Now I can query to split each Entry into the set of distinct date ranges:
var query =
from e in entries
let splitters =
dates
.Where(x => x >= e.StartDate)
.Where(x => x <= e.EndDate)
.ToArray()
from s in
splitters
.Skip(1)
.Zip(
splitters,
(s1, s0) => new Entry()
{
Id = e.Id,
StartDate = s0,
EndDate = s1,
Quantity = e.Quantity
})
group new { s.Id, s.Quantity } by new { s.StartDate, s.EndDate } into gss
select new
{
gss.Key.StartDate,
gss.Key.EndDate,
Quantity = gss.Sum(gs => gs.Quantity),
Ids = gss.Select(gs => gs.Id).ToArray(),
};
That gives me:
[
{
"StartDate": "2021-03-10T00:00:00",
"EndDate": "2021-03-21T00:00:00",
"Quantity": 5,
"Ids": [
1,
2,
3,
4
]
},
{
"StartDate": "2021-03-21T00:00:00",
"EndDate": "2021-03-23T00:00:00",
"Quantity": 3,
"Ids": [
3,
4
]
},
{
"StartDate": "2021-03-23T00:00:00",
"EndDate": "2021-03-25T00:00:00",
"Quantity": 1,
"Ids": [
4
]
}
]
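The output above is close to the requested one, but adjacent ranges share a boundary date (for example, the second range starts on 2021-03-21 rather than 2021-03-22). A minimal sketch of one way to get exactly the requested output, assuming (as the question states) that all entries share the same StartDate and that both ends of a range are inclusive: sort the distinct end dates, start each range the day after the previous one ends, and sum every entry that is still running at the range's end. Tuples stand in for the Entry class so the snippet runs on its own (C# 9+ top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Input mirrors the question; tuples stand in for the Entry class.
var entries = new List<(int Id, DateTime StartDate, DateTime EndDate, int Quantity)>
{
    (1, new DateTime(2021, 3, 10), new DateTime(2021, 3, 21), 1),
    (2, new DateTime(2021, 3, 10), new DateTime(2021, 3, 21), 1),
    (3, new DateTime(2021, 3, 10), new DateTime(2021, 3, 23), 2),
    (4, new DateTime(2021, 3, 10), new DateTime(2021, 3, 25), 1),
};

// Assumption: every entry has the same StartDate, so the first range starts there.
DateTime firstStart = entries.Min(e => e.StartDate);

// Distinct end dates in ascending order mark where each output range ends.
DateTime[] ends = entries.Select(e => e.EndDate).Distinct().OrderBy(d => d).ToArray();

var ranges = ends
    .Select((end, i) => (
        // Each range after the first starts the day after the previous range ends.
        StartDate: i == 0 ? firstStart : ends[i - 1].AddDays(1),
        EndDate: end))
    .Select(range =>
    {
        // An entry covers the range if it is still running at the range's end.
        var covering = entries.Where(e => e.EndDate >= range.EndDate).ToList();
        return (StartDate: range.StartDate, EndDate: range.EndDate,
                Quantity: covering.Sum(e => e.Quantity),
                Ids: covering.Select(e => e.Id).ToArray());
    })
    .ToList();

foreach (var row in ranges)
    Console.WriteLine($"{row.StartDate:yyyy-MM-dd} to {row.EndDate:yyyy-MM-dd}: {row.Quantity} [{string.Join(", ", row.Ids)}]");
```

This rescans the entries once per range, which is fine for small inputs; for large inputs the backwards accumulation used in the other answers avoids that.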
If I'm allowed to use Microsoft's "System.Interactive" library then it can be done like this:
var query =
from e in entries
from s in
dates
.Where(x => x >= e.StartDate)
.Where(x => x <= e.EndDate)
.Buffer(2, 1)
.SkipLast(1)
.Select(x => new Entry()
{
Id = e.Id,
StartDate = x[0],
EndDate = x[1],
Quantity = e.Quantity
})
group new { s.Id, s.Quantity } by new { s.StartDate, s.EndDate } into gss
select new
{
gss.Key.StartDate,
gss.Key.EndDate,
Quantity = gss.Sum(gs => gs.Quantity),
Ids = gss.Select(gs => gs.Id).ToArray(),
};

These are a couple of LINQ extension methods based on an extended version of the APL scan operator. Scan is like Aggregate, but it also returns the intermediate results. These variations take pairs of items and process them.
public static class IEnumerableExt {
// TRes firstResFn(T firstValue)
// TRes combineFn(T PrevValue, T CurValue)
// returns firstResFn(items.First()) then ScanByPairs(items, combineFn)
public static IEnumerable<TRes> ScanByPairs<T, TRes>(this IEnumerable<T> items, Func<T, TRes> firstResFn, Func<T, T, TRes> combineFn) {
using (var itemsEnum = items.GetEnumerator())
if (itemsEnum.MoveNext()) {
var prev = itemsEnum.Current;
yield return firstResFn(prev);
while (itemsEnum.MoveNext())
yield return combineFn(prev, prev = itemsEnum.Current);
}
}
// THelper helperSeedFn(T FirstValue)
// TRes resSeedFn(T FirstValue)
// (THelper Helper, TRes Res) combineFn((THelper Helper, TRes PrevRes), T CurValue)
// returns resSeedFn, combineFn,...
public static IEnumerable<TRes> ScanToPairsWithHelper<T, THelper, TRes>(this IEnumerable<T> items, Func<T, THelper> helperSeedFn, Func<T, TRes> resSeedFn, Func<(THelper Helper, TRes PrevRes), T, (THelper Helper, TRes Res)> combineFn) {
using (var itemsEnum = items.GetEnumerator())
if (itemsEnum.MoveNext()) {
var seed = (Helper: helperSeedFn(itemsEnum.Current), Res: resSeedFn(itemsEnum.Current));
while (itemsEnum.MoveNext()) {
yield return seed.Res;
seed = combineFn(seed, itemsEnum.Current);
}
yield return seed.Res;
}
}
}
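To illustrate what "scan" means compared with Aggregate, here is a minimal sketch of a plain left scan. Scan is a hypothetical local function written for this example; it is not part of LINQ or of the extension class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: like Aggregate, but yields every intermediate accumulator.
static IEnumerable<TAcc> Scan<T, TAcc>(IEnumerable<T> items, TAcc seed, Func<TAcc, T, TAcc> step)
{
    var acc = seed;
    foreach (var item in items)
    {
        acc = step(acc, item);
        yield return acc;
    }
}

int[] quantities = { 2, 2, 1 };

// Aggregate returns only the final result.
int total = quantities.Aggregate(0, (acc, q) => acc + q);

// Scan returns the running totals along the way.
int[] runningTotals = Scan(quantities, 0, (acc, q) => acc + q).ToArray();

Console.WriteLine(total);                            // 5
Console.WriteLine(string.Join(", ", runningTotals)); // 2, 4, 5
```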
First, create an intermediate list that combines any identical date ranges:
var int1 = src.GroupBy(s => s.EndDate)
.Select(sg => new {
Ids = sg.Select(s => s.Id).ToList(),
StartDate = sg.First().StartDate,
EndDate = sg.Key,
Quantity = sg.Sum(s => s.Quantity)
});
Then use ScanByPairs to compute the new start dates so ranges don't overlap (this assumes EndDate is in increasing order):
var int2 = int1.ScanByPairs(s => new { s.Ids, s.Quantity, s.StartDate, s.EndDate },
(prev, next) => new { next.Ids, next.Quantity, StartDate = prev.EndDate.AddDays(1), next.EndDate });
Finally, process the previous result in reverse, aggregating the quantities and IDs as you go:
var ans = int2.Reverse()
.ScanToPairsWithHelper(first => new { first.Ids, first.Quantity },
first => new { first.Ids, first.Quantity, first.StartDate, first.EndDate },
(helpernext, prev) => (new { Ids = helpernext.Helper.Ids.Concat(prev.Ids).ToList(), Quantity = helpernext.Helper.Quantity + prev.Quantity },
new { Ids = helpernext.Helper.Ids.Concat(prev.Ids).ToList(), Quantity = helpernext.Helper.Quantity + prev.Quantity, prev.StartDate, prev.EndDate }))
.Reverse();

This is rather tough to do with LINQ. I recommend using MoreLINQ for that.
// these helpers make things much easier
using static MoreLinq.Extensions.GroupAdjacentExtension;
using static MoreLinq.Extensions.ScanRightExtension;
using static MoreLinq.Extensions.WindowRightExtension;
// ...
var output = input
// aggregate identical end dates
.GroupAdjacent(x => x.EndDate)
.Select(xs => xs.Aggregate(
new { Ids = Enumerable.Empty<int>(), Quantity = 0, StartDate = default(DateTime), EndDate = default(DateTime) },
(acc, curr) => new {
Ids = acc.Ids.Append(curr.Id),
Quantity = acc.Quantity + curr.Quantity,
curr.StartDate,
curr.EndDate
}))
// cut off overlapping start dates
.WindowRight(2)
.Select(win => win.Count == 1
? win[0]
: new {
win[1].Ids,
win[1].Quantity,
StartDate = win[0].EndDate.AddDays(1),
win[1].EndDate
})
// accumulate IDs and quantities
.ScanRight((curr, prev) => new {
Ids = curr.Ids.Concat(prev.Ids),
Quantity = curr.Quantity + prev.Quantity,
curr.StartDate,
curr.EndDate
});
This iterates the input only once, although ScanRight has to buffer the sequence internally because it scans from the end. But it's a lot of code.
Working example: https://dotnetfiddle.net/YxDQEX
The main trick is to iterate the input backwards. This lets us easily accumulate the IDs and quantities, because they always grow when scanning the list from back to front. E.g. the second-to-last item's quantity will be its own quantity plus the quantity of the last item; the third-to-last item's quantity will be its own plus the quantity of the second-to-last item; etc.
ScanRight() does the backwards iteration for us. It is similar to Aggregate(), but it yields the intermediate values of the accumulator as well, thus returning an IEnumerable<TAccumulate> rather than a single TAccumulate.
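Without MoreLINQ, the same back-to-front accumulation can be sketched by reversing the sequence, accumulating in a loop, and reversing the result. The groups below are the question's entries already merged per end date (quantity 2 for IDs 1 and 2, 2 for ID 3, 1 for ID 4); like any right scan, this buffers the whole sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Merged groups per end date, in ascending end-date order (from the question's data).
var groups = new List<(int[] Ids, int Quantity)>
{
    (new[] { 1, 2 }, 2),
    (new[] { 3 }, 2),
    (new[] { 4 }, 1),
};

// Walk from the last group to the first, carrying the accumulated IDs and quantity.
var accumulated = new List<(int[] Ids, int Quantity)>();
(int[] Ids, int Quantity) carry = (Array.Empty<int>(), 0);
foreach (var g in Enumerable.Reverse(groups))
{
    carry = (g.Ids.Concat(carry.Ids).ToArray(), g.Quantity + carry.Quantity);
    accumulated.Add(carry);
}
// Restore the original (ascending) order.
accumulated.Reverse();

foreach (var a in accumulated)
    Console.WriteLine($"{a.Quantity} [{string.Join(", ", a.Ids)}]");
// 5 [1, 2, 3, 4]
// 3 [3, 4]
// 1 [4]
```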
It might be possible to get a cleaner version of this using a traditional foreach loop, but I kind of doubt it. With a foreach you would have to do everything at the same time (i.e. in the same loop body) in order to do it efficiently and that will probably become unreadable as well (considering all the state that has to be managed).
With the LINQ version you at least have three disparate steps that you could extract into extension methods:
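// Sketch: assumes the input is the Entry class from the first answer. A hypothetical
// DateRange record carries the merged values, because an anonymous type cannot be
// returned from a method. Declare the method inside a static class.
public record DateRange(IEnumerable<int> Ids, int Quantity, DateTime StartDate, DateTime EndDate);
public static IEnumerable<DateRange> MergeEqualEndDates(this IEnumerable<Entry> xs) => xs
.GroupAdjacent(x => x.EndDate)
.Select(g => g.Aggregate(
new DateRange(Enumerable.Empty<int>(), 0, default, default),
(acc, curr) => new DateRange(
acc.Ids.Append(curr.Id),
acc.Quantity + curr.Quantity,
curr.StartDate,
curr.EndDate)));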
Which could leave you with:
var output = input
.MergeEqualEndDates()
.TrimStartDates()
.AccumulateIdsAndQuantities();

Thank you to @GoodNightNerdPride and @NetMage for their help.
This is the version I came up with. I cheated and used a for-loop. It is less elegant but I do find it easier to read.
Below is the main extension method (which borrows from @NetMage), and beneath that is an xUnit file I used to test the whole thing.
Extension Method
public static List<Response> SumBasedUponOverlappingDates(this List<Request> requests) {
var requestsWithGroupedIds = requests
.GroupBy(x => x.EndDate)
.Select(x => new {
Ids = x.Select(y => y.Id).ToArray(),
StartDate = x.First().StartDate,
EndDate = x.Key,
Quantity = x.Sum(y => y.Quantity)
})
.OrderBy(x => x.EndDate)
.ToList();
var responses = new List<Response>();
for (int i = 0; i < requestsWithGroupedIds.Count(); i++) {
var req = requestsWithGroupedIds[i];
var overlappingEntries = requestsWithGroupedIds
.Where(x => x.StartDate <= req.StartDate && x.EndDate >= req.EndDate)
.ToList();
var resp = new Response {
Ids = overlappingEntries.SelectMany(x => x.Ids.Select(y => y)).OrderBy(x => x).ToArray(),
Quantity = overlappingEntries.Sum(x => x.Quantity),
StartDate = (i == 0) ? req.StartDate : requestsWithGroupedIds[i - 1].EndDate.AddDays(1),
EndDate = req.EndDate
};
responses.Add(resp);
}
return responses;
}
xUnit Code
using System;
using System.Linq;
using Xunit;
using System.Collections.Generic;
namespace StackOverflow.Tests {
public class GroupingAndAggregationTests {
[Fact]
void ShouldAggregateDuplicateDataIntoSingle() {
var requests = new List<Request>() {
new Request{Id = 9, StartDate = new DateTime(2021, 2, 20), EndDate = new DateTime(2021,3, 5), Quantity = 6 },
new Request{Id = 345, StartDate = new DateTime(2021, 2, 20), EndDate = new DateTime(2021,3, 5), Quantity = 29 }
};
var responses = requests.SumBasedUponOverlappingDates();
Assert.Equal(1, responses.Count());
var expectedResponse =
new Response() {
StartDate = new DateTime(2021, 2, 20),
EndDate = new DateTime(2021, 3, 5),
Quantity = 35,
Ids = new int[] { 9, 345 }
};
var actualResponse = responses[0];
Assert.True(actualResponse.IsEqual(expectedResponse));
}
[Fact]
void ShouldAggregateMultipleBasedOnOverlappingDates() {
var requests = new List<Request>() {
new Request{Id = 1, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 21), Quantity = 1 },
new Request{Id = 2, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 21), Quantity = 1 },
new Request{Id = 3, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 23), Quantity = 2 },
new Request{Id = 4, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 25), Quantity = 1 }
};
var responses = requests.SumBasedUponOverlappingDates();
Assert.Equal(3, responses.Count());
var expectedResponse1 =
new Response() {
StartDate = new DateTime(2021, 3, 10),
EndDate = new DateTime(2021, 3, 21),
Quantity = 5,
Ids = new int[] { 1, 2, 3, 4 }
};
var actualResponse1 = responses[0];
Assert.True(actualResponse1.IsEqual(expectedResponse1));
var expectedResponse2 =
new Response() {
StartDate = new DateTime(2021, 3, 22),
EndDate = new DateTime(2021, 3, 23),
Quantity = 3,
Ids = new int[] { 3, 4 }
};
var actualResponse2 = responses[1];
Assert.True(actualResponse2.IsEqual(expectedResponse2));
var expectedResponse3 =
new Response() {
StartDate = new DateTime(2021, 3, 24),
EndDate = new DateTime(2021, 3, 25),
Quantity = 1,
Ids = new int[] { 4 }
};
var actualResponse3 = responses[2];
Assert.True(actualResponse3.IsEqual(expectedResponse3));
}
}
public class Request {
public int Id { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Quantity { get; set; }
}
public class Response {
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Quantity { get; set; }
public int[] Ids { get; set; }
public bool IsEqual(Response resp)
=>
StartDate == resp.StartDate &&
EndDate == resp.EndDate &&
Quantity == resp.Quantity &&
Ids.OrderBy(x => x).SequenceEqual(resp.Ids.OrderBy(x => x));
}
public static class ExtensionMethods {
public static List<Response> SumBasedUponOverlappingDates(this List<Request> requests) {
var requestsWithGroupedIds = requests
.GroupBy(x => x.EndDate)
.Select(x => new {
Ids = x.Select(y => y.Id).ToArray(),
StartDate = x.First().StartDate,
EndDate = x.Key,
Quantity = x.Sum(y => y.Quantity)
})
.OrderBy(x => x.EndDate)
.ToList();
var responses = new List<Response>();
for (int i = 0; i < requestsWithGroupedIds.Count(); i++) {
var req = requestsWithGroupedIds[i];
var overlappingEntries = requestsWithGroupedIds
.Where(x => x.StartDate <= req.StartDate && x.EndDate >= req.EndDate)
.ToList();
var resp = new Response {
Ids = overlappingEntries.SelectMany(x => x.Ids.Select(y => y)).OrderBy(x => x).ToArray(),
Quantity = overlappingEntries.Sum(x => x.Quantity),
StartDate = (i == 0) ? req.StartDate : requestsWithGroupedIds[i - 1].EndDate.AddDays(1),
EndDate = req.EndDate
};
responses.Add(resp);
}
return responses;
}
}
}


Linq-to-Entities contains and order by

I use EF Core and want to select only the IDs I need, as I would with an SQL IN expression. How can I get the results in the same order as the IDs in the array, and fill the OrderNum value in the result DTO?
public IEnumerable<ResultDto> Foo()
{
int[] validIds = { 100, 2, 3, 4, 5, 6, 8, 13, 14, 16, 22 };
// Without the required sorting
var query = dc.LeaveRequests.Where(x => validIds.Contains(x.Id));
...
}
class Model
{
public int Id { get; set; }
public string Name { get; set; }
}
class ResultDto
{
public int Id { get; set; }
public string Name { get; set; }
public int OrderNum { get; set; }
}
I would create an index lookup dictionary with the ID as the key and the index as the value. You can then order the result by looking up each index in the dictionary in O(1) time (using .IndexOf on the array would be an O(n) operation per lookup).
int[] validIds = { 100, 2, 3, 4, 5, 6, 8, 13, 14, 16, 22 };
var result = dc.LeaveRequests.Where(x => validIds.Contains(x.Id)).AsEnumerable();
var indexLookup = validIds.Select((v,i) => (v,i)).ToDictionary(x => x.v, x => x.i);
var sortedResult = result.OrderBy(x => indexLookup[x.Id]);
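A minimal in-memory sketch of the dictionary approach that also fills OrderNum, by projecting with the indexed Select overload after sorting. An array of tuples stands in for the rows returned by dc.LeaveRequests (an assumption so the snippet runs without a database), and a tuple replaces ResultDto.

```csharp
using System;
using System.Linq;

int[] validIds = { 100, 2, 3, 4, 5, 6, 8, 13, 14, 16, 22 };

// Stand-in for the rows the database returns, in arbitrary order.
var rows = new[] { (Id: 4, Name: "d"), (Id: 100, Name: "a"), (Id: 22, Name: "z"), (Id: 2, Name: "b") };

// Map each ID to its position in validIds: O(1) lookups instead of IndexOf's O(n).
var indexLookup = validIds
    .Select((id, index) => (Id: id, Index: index))
    .ToDictionary(x => x.Id, x => x.Index);

var results = rows
    .OrderBy(row => indexLookup[row.Id])
    // The indexed Select overload numbers the sorted rows 0, 1, 2, ...
    .Select((row, i) => (Id: row.Id, Name: row.Name, OrderNum: i))
    .ToList();

foreach (var r in results)
    Console.WriteLine($"{r.OrderNum}: {r.Id} {r.Name}");
// 0: 100 a
// 1: 2 b
// 2: 4 d
// 3: 22 z
```

With EF Core, the sorting and numbering would run after AsEnumerable(), as in the answer, because the dictionary lookup cannot be translated to SQL.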
Perhaps an even simpler solution would be to join validIds with the result of the query. The order of the first (outer) collection is preserved, and Join builds a hash-based lookup of the inner collection internally. It should also perform better, since ordering the result with OrderBy can be avoided.
int[] validIds = { 100, 2, 3, 4, 5, 6, 8, 13, 14, 16, 22 };
var result = dc.LeaveRequests.Where(x => validIds.Contains(x.Id)).AsEnumerable();
var sortedResult = validIds.Join(result, x => x, x => x.Id, (x, y) => y);
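The ordering behavior of Enumerable.Join can be checked in memory: results come out in the order of the outer sequence (validIds), and IDs without a matching row are simply skipped. Tuples again stand in for the database rows.

```csharp
using System;
using System.Linq;

int[] validIds = { 100, 2, 3, 4, 5, 6, 8, 13, 14, 16, 22 };

// Stand-in rows in arbitrary order; most IDs in validIds have no row.
var rows = new[] { (Id: 22, Name: "z"), (Id: 2, Name: "b"), (Id: 100, Name: "a") };

// Outer = validIds, inner = rows; the output follows the outer order.
var sorted = validIds
    .Join(rows, id => id, row => row.Id, (id, row) => row)
    .ToList();

Console.WriteLine(string.Join(", ", sorted.Select(r => r.Id))); // 100, 2, 22
```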
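Assuming the valid IDs may be provided in any order, you could order by the index position of each ID in validIds (using a List<int> instead of an array) and map the index position of each result to the OrderNum property. Because EF Core cannot translate IndexOf or the indexed Select overload to SQL, switch to in-memory evaluation with AsEnumerable() after the filter:
var query = dc.LeaveRequests
.Where(x => validIds.Contains(x.Id))
.AsEnumerable() // IndexOf and the indexed Select run in memory
.OrderBy(x => validIds.IndexOf(x.Id))
.Select((x, i) => new ResultDto
{
Id = x.Id,
Name = x.Name,
OrderNum = i
});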
Try OrderBy if you don't have more requirements.
var query = dc.LeaveRequests
.Where(x => validIds.Contains(x.Id))
.AsEnumerable() // Array.IndexOf cannot be translated to SQL
.OrderBy(x => Array.IndexOf(validIds, x.Id))
.Select(x => new ResultDto {
Id = x.Id,
Name = x.Name,
// OrderNum still has to be filled in here
});
var results = dc.LeaveRequests
.Where(x => validIds.Contains(x.Id))
.Select(x => new ResultDto
{
Id = x.Id,
Name = x.Name,
})
.AsEnumerable()
.OrderBy(x => Array.IndexOf(validIds, x.Id))
.ToList();
for (int i = 0; i < results.Count; i++)
{
results[i].OrderNum = i;
}

Convert list entries to array

In order to generate a graph using d3 I need to convert my list of time entries to several arrays.
I store my data in a list of work records per day per staff member.
I need to get an array of all days, and then an array for each member of staff.
So let's say staff member X has 3.5h against 1/1/19 and 4.5h against 3/1/19,
and staff member Y has 6h against 2/1/19.
I'd expect my arrays to look as follows:
Dates[1/1/19, 2/1/19, 3/1/19]
X[3.5,0,4.5]
Y[0,6,0]
Some of my code is:
public IEnumerable<TicketWorkRecord> TimeByDateByStaff { get; set; }
public class TicketWorkRecord
{
public DateTime Date { get; set; }
public decimal TimeSpent { get; set; }
}
Assuming you have a class called StaffMember that looks like this:
public class StaffMember
{
public IEnumerable<TicketWorkRecord> TimeByDateByStaff { get; set; }
// Other properties
}
And after adding the following constructor to your TicketWorkRecord class:
public TicketWorkRecord(DateTime date, decimal timeSpent)
{
Date = date;
TimeSpent = timeSpent;
}
Let's create some dummy data for staff members X and Y:
StaffMember X = new StaffMember
{
TimeByDateByStaff = new List<TicketWorkRecord>()
{
new TicketWorkRecord(DateTime.Today.Date, 3.5M),
new TicketWorkRecord(DateTime.Today.Date.AddDays(2), 4.5M)
}
};
StaffMember Y = new StaffMember
{
TimeByDateByStaff = new List<TicketWorkRecord>()
{ new TicketWorkRecord(DateTime.Today.Date.AddDays(1), 6M) }
};
var staffMembers = new List<StaffMember>() { X, Y };
Now, you can construct your desired 3 arrays using the following code:
var dates = staffMembers.SelectMany(s => s.TimeByDateByStaff)
.Select(t => t.Date)
.Distinct()
.OrderBy(d => d).ToArray();
var xTimes = dates.Select(d => X.TimeByDateByStaff
.FirstOrDefault(t => t.Date == d)?.TimeSpent ?? 0).ToArray();
var yTimes = dates.Select(d => Y.TimeByDateByStaff
.FirstOrDefault(t => t.Date == d)?.TimeSpent ?? 0).ToArray();
To test it:
Console.WriteLine("Dates: " + string.Join(",", dates.Select(d => d.ToShortDateString())));
Console.WriteLine("xTimes: " + string.Join(",", xTimes));
Console.WriteLine("yTimes: " + string.Join(",", yTimes));
Output:
Dates: 12/01/2019,13/01/2019,14/01/2019
xTimes: 3.5,0,4.5
yTimes: 0,6,0
If TicketWorkRecord has a property specifying which staff member it belongs to (X or Y), then this is pretty straightforward using LINQ:
var dates = TimeByDateByStaff.Select(x => x.Date.ToString("MM/dd/yy")).ToArray();
var staffXTimeSpent = TimeByDateByStaff.Select(x => x.StaffMember == "X" ? x.TimeSpent : 0M).ToArray();
var staffYTimeSpent = TimeByDateByStaff.Select(x => x.StaffMember == "Y" ? x.TimeSpent : 0M).ToArray();
Alternatively, if the exact staff members aren't known at compile time then you can get the time entries by staff member at runtime:
var timeSpentByStaffMembers = TimeByDateByStaff
.Select(x => x.StaffMember)
.Distinct()
.ToDictionary(
key => key,
value => TimeByDateByStaff.Select(x => x.StaffMember == value ? x.TimeSpent : 0M).ToArray());
With the following data:
var TimeByDateByStaff = new List<TicketWorkRecord>
{
new TicketWorkRecord
{
Date = new DateTime(2019, 1, 1),
TimeSpent = 3.5M,
StaffMember = "X"
},
new TicketWorkRecord
{
Date = new DateTime(2019, 2, 1),
TimeSpent = 6M,
StaffMember = "Y"
},
new TicketWorkRecord
{
Date = new DateTime(2019, 3, 1),
TimeSpent = 4.5M,
StaffMember = "X"
}
};
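With this data, the LINQ statements above give staffXTimeSpent = [3.5, 0, 4.5] and staffYTimeSpent = [0, 6, 0], and the dictionary maps "X" and "Y" to those same two arrays.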
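The Select-based arrays above line up with the dates only when there is exactly one record per date. A minimal sketch that also handles several records on the same date builds the distinct, sorted dates first and then sums each staff member's time per date. The record shape (Date, TimeSpent, StaffMember) follows this answer's assumption; tuples stand in for TicketWorkRecord, and the fourth sample record is invented to show a shared date.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var records = new List<(DateTime Date, decimal TimeSpent, string StaffMember)>
{
    (new DateTime(2019, 1, 1), 3.5M, "X"),
    (new DateTime(2019, 1, 2), 6M, "Y"),
    (new DateTime(2019, 1, 3), 4.5M, "X"),
    (new DateTime(2019, 1, 3), 1M, "Y"), // a second record on the same date
};

// One entry per distinct date, in order: the x axis of the chart.
DateTime[] dates = records.Select(r => r.Date).Distinct().OrderBy(d => d).ToArray();

// For every staff member, one value per date (0 when nothing was logged).
Dictionary<string, decimal[]> timeByStaff = records
    .GroupBy(r => r.StaffMember)
    .ToDictionary(
        g => g.Key,
        g => dates.Select(d => g.Where(r => r.Date == d).Sum(r => r.TimeSpent)).ToArray());

foreach (var kv in timeByStaff)
    Console.WriteLine($"{kv.Key}: [{string.Join(", ", kv.Value)}]");
```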
If I understand you correctly, you want to split your list of objects into arrays of individual fields.
If so, let's say you have the following list:
List<TicketWorkRecord> records = TimeByDateByStaff.ToList();
DateTime[] extractDates = records.Select(x => x.Date).ToArray();
decimal[] timeSpent = records.Select(x => x.TimeSpent).ToArray();
And so forth. You can apply a Where condition to filter by staff member.

Setting EndDate based on StartDate in next list item in a C# List

I have a C# list of items as follows:
List<MyClass> All_Items = GetListItems();
GetListItems() returns the following result:
Category StartDate EndDate
AA 2008-05-1
AA 2012-02-1
BB 2009-09-1
BB 2010-08-1
CC 2009-10-1
Using LINQ on All_Items, I want to update the EndDate column as follows:
If there is a later StartDate within the same Category, set EndDate to one day before that next later StartDate.
If there is no later date, set EndDate to 2099-12-31.
The final result should be:
Category StartDate EndDate
AA 2008-05-1 2012-01-31
AA 2012-02-1 2099-12-31
BB 2009-09-1 2010-07-31
BB 2010-08-1 2099-12-31
CC 2009-10-1 2099-12-31
I can only think of getting it done using too many loops. What is the better option?
Try this code. It loops over all items and selects the next later StartDate in the same category.
If no such item exists, it sets your default date.
I couldn't test the code as I'm writing on my mobile, so any correction is welcome.
foreach(var item in All_Items)
{
var nextItem = (from i in All_Items
where i != null &&
i.Category == item.Category &&
i.StartDate > item.StartDate
orderby i.StartDate
select i).FirstOrDefault();
item.EndDate = nextItem != null ? nextItem.StartDate.AddDays(-1) : new DateTime(2099,12,31);
}
LINQ is not good at processing dependencies between elements of a sequence, and it is certainly not intended for updating.
Here is the simple and efficient way to achieve the goal:
var groups = All_Items.OrderBy(item => item.StartDate).GroupBy(item => item.Category);
foreach (var group in groups)
{
MyClass last = null;
foreach (var item in group)
{
if (last != null) last.EndDate = item.StartDate.AddDays(-1);
last = item;
}
last.EndDate = new DateTime(2099, 12, 31);
}
So we use LINQ just to order the elements by StartDate and group the result by Category (which preserves the ordering inside each group). Then simply iterate the LINQ query result and update the EndDate accordingly.
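If you would rather not mutate the items, the same idea (order, group by category, look at the next item) can be written as a projection. A minimal sketch that pairs each item with its successor in the same category via Zip; the tuple fields mirror MyClass, and 2099-12-31 is the question's open-ended date.

```csharp
using System;
using System.Linq;

var items = new[]
{
    (Category: "AA", StartDate: new DateTime(2008, 5, 1)),
    (Category: "AA", StartDate: new DateTime(2012, 2, 1)),
    (Category: "BB", StartDate: new DateTime(2009, 9, 1)),
    (Category: "BB", StartDate: new DateTime(2010, 8, 1)),
    (Category: "CC", StartDate: new DateTime(2009, 10, 1)),
};
var openEnded = new DateTime(2099, 12, 31);

var withEndDates = items
    .OrderBy(x => x.StartDate)
    .GroupBy(x => x.Category) // GroupBy keeps the StartDate order inside each group
    .SelectMany(g =>
    {
        var ordered = g.ToList();
        // Each item ends the day before its successor starts; the last one is open-ended.
        var ends = ordered.Skip(1).Select(n => n.StartDate.AddDays(-1)).Append(openEnded);
        return ordered.Zip(ends, (item, end) =>
            (Category: item.Category, StartDate: item.StartDate, EndDate: end));
    })
    .ToList();

foreach (var row in withEndDates)
    Console.WriteLine($"{row.Category} {row.StartDate:yyyy-MM-dd} {row.EndDate:yyyy-MM-dd}");
```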
You can select the dates for each category and put them into a dictionary to save time later.
Then you just go through all your items and check whether there is a later start date in the same category, according to your requirements.
Here it is:
var categoryDictionary = All_Items
.GroupBy(i => i.Category)
.ToDictionary(
g => g.Key,
g => g.Select(i => i.StartDate));
var defaultDate = DateTime.Parse("2099-12-31");
foreach (var item in All_Items)
{
var nextDateInCategory = categoryDictionary[item.Category]
.Where(i => i > item.StartDate)
.OrderBy(i => i)
.FirstOrDefault();
item.EndDate =
nextDateInCategory != default(DateTime)
? nextDateInCategory.AddDays(-1)
: defaultDate;
}
Let's assume your MyClass looks something like this:
public class MyClass
{
public string Category { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
}
Here is how you can do it, see the comments in the code for an explanation.
IEnumerable<MyClass> All_Items = new List<MyClass>
{
new MyClass { Category = "AA", StartDate = new DateTime(2008, 5, 1) },
new MyClass { Category = "AA", StartDate = new DateTime(2012, 2, 1) },
new MyClass { Category = "BB", StartDate = new DateTime(2009, 9, 1) },
new MyClass { Category = "BB", StartDate = new DateTime(2010, 8, 1) },
new MyClass { Category = "CC", StartDate = new DateTime(2009, 10, 1) }
}
// Group by category
.GroupBy(c => c.Category)
// Collapse the groups into a single IEnumerable
.SelectMany(g =>
{
// Store the already used dates
List<DateTime> usedDates = new List<DateTime>();
// Get a new MyClass that has the EndDate set, from each MyClass in the category
return g.Select(c =>
{
// Get all biggerDates that were not used already
var biggerDates = g.Where(gc => gc.StartDate > c.StartDate && !usedDates.Any(ud => ud == gc.StartDate));
// Set the endDate to the default one
DateTime date = new DateTime(2099, 12, 31);
// If a bigger date was found, mark it as used and set the EndDate to it
if (biggerDates.Any()) {
date = biggerDates.Min(gc => gc.StartDate).AddDays(-1);
usedDates.Add(date);
}
return new MyClass
{
Category = c.Category,
StartDate = c.StartDate,
EndDate = date
};
});
});
In a single LINQ statement (maxEndDate is 2099-12-31):
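var maxEndDate = new DateTime(2099, 12, 31);
All_Items.GroupBy(category => category.Category).Select(key =>
{
return key.Select(v => {
// The next later StartDate in the same category (null if there is none)
var nextStartDate = key
.Where(other => other.StartDate > v.StartDate)
.Select(other => (DateTime?)other.StartDate)
.Min();
v.EndDate = nextStartDate.HasValue
? nextStartDate.Value - TimeSpan.FromDays(1)
: maxEndDate;
return v;
});
}
).SelectMany(x => x).ToList(); // ToList forces the query to run, which performs the updates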

Filter large list based on date time

I have the following Item class:
public class Item
{
public Item()
{
}
public string Id {get; set;}
public string Name {get; set;}
public string Price {get; set;}
public DateTime CreatedDate {get; set;}
}
In my code I have a List<Item> items that contains a lot of items. What do you recommend as the best way/practice to sort/filter the items in the list by CreatedDate in the following scenarios:
all items where the CreatedDate is before date x
all items where the CreatedDate is after date x
all items where the CreatedDate is between date x and date y
P.S. What if I also include the time, e.g. before/after/between date x and time y?
You can use LINQ:
var beforeDateX = items
.Where(i => i.CreatedDate.Date < DateX); // remove the .Date if you want to include the time
var afterDateX = items
.Where(i => i.CreatedDate.Date > DateX);
var betweenDates = items
.Where(i => i.CreatedDate.Date >= DateX && i.CreatedDate.Date <= DateY);
You can use a foreach or methods like ToList to execute the query and materialize the result.
foreach(Item i in beforeDateX)
Console.WriteLine("Name:{0} CreatedDate:{1}", i.Name, i.CreatedDate);
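On the P.S. about times: comparing .Date discards the time of day, while comparing the full DateTime keeps it. A small sketch with sample timestamps and cutoffs invented for illustration:

```csharp
using System;
using System.Linq;

var created = new[]
{
    new DateTime(2010, 11, 16, 9, 0, 0),
    new DateTime(2010, 11, 16, 15, 30, 0),
    new DateTime(2010, 11, 17, 8, 0, 0),
};

// Date-only cutoff: everything created on or before 16 Nov, whatever the time.
var onOrBeforeDay = created.Where(d => d.Date <= new DateTime(2010, 11, 16)).ToList();

// Date-and-time cutoff: only what was created before 16 Nov at 12:00.
var beforeNoon = created.Where(d => d < new DateTime(2010, 11, 16, 12, 0, 0)).ToList();

// Between two instants, inclusive on both ends.
var start = new DateTime(2010, 11, 16, 12, 0, 0);
var end = new DateTime(2010, 11, 17, 12, 0, 0);
var between = created.Where(d => d >= start && d <= end).ToList();

Console.WriteLine($"{onOrBeforeDay.Count} {beforeNoon.Count} {between.Count}"); // 2 1 2
```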
Use Linq:
var itemsBefore = items.Where(i => i.CreatedDate <= timeBefore);
var itemsAfter = items.Where(i => i.CreatedDate >= timeAfter);
var itemsBetween = items.Where(i => i.CreatedDate >= timeStart && i.CreatedDate <= timeEnd);
For ordering:
var ordered = items.OrderBy(i => i.CreatedDate);
Considering you have a List<>, I suggest:
List<Item> itemsBefore = items.FindAll(i => i.CreatedDate <= timeBefore);
List<Item> itemsAfter = items.FindAll(i => i.CreatedDate >= timeAfter);
List<Item> itemsBetween = items.FindAll(i => i.CreatedDate >= timeStart && i.CreatedDate <= timeEnd);
There is a subtle difference between what I suggested and what the others have suggested.
The .Where method doesn't "cache" the returned list, so if you do:
var filtered = items.Where(condition);
foreach (var item in filtered)
{
}
foreach (var item in filtered)
{
}
your whole list will be scanned twice to find the items for which the condition is true. To avoid this (sometimes it can be a problem), you can add .ToList() after .Where().
The List<>.FindAll() method returns a new List<> containing only the selected items, so you can enumerate it as many times as you want, because it has been "materialized".
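The difference is easy to observe by counting how often the predicate runs. A minimal sketch with a counter inside the condition (the integer list stands in for the items):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var items = new List<int> { 1, 2, 3, 4, 5 };
int calls = 0;

// Deferred: the predicate runs again on every enumeration.
IEnumerable<int> filtered = items.Where(n => { calls++; return n > 2; });
foreach (var item in filtered) { }
foreach (var item in filtered) { }
int deferredCalls = calls; // 10: five items, enumerated twice

// Materialized: FindAll runs the predicate once per item and returns a new List.
calls = 0;
List<int> found = items.FindAll(n => { calls++; return n > 2; });
foreach (var item in found) { }
foreach (var item in found) { }
int materializedCalls = calls; // 5

Console.WriteLine($"{deferredCalls} {materializedCalls}");
```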
All the LINQ approaches are great, but they iterate the list 3 times. If there are really a LOT of items, then maybe an old-fashioned way will be more efficient (that is, if you want all three scenarios at once; otherwise the LINQ answers are the way to go):
List<Item> before = new List<Item>();
List<Item> after = new List<Item>();
List<Item> between = new List<Item>();
foreach (var item in items)
{
if (item.CreatedDate <= timeBefore)
{
before.Add(item);
}
else if (item.CreatedDate >= timeAfter)
{
after.Add(item);
}
else
{
between.Add(item);
}
}
You could use LINQ Where:
static void Main(string[] args)
{
Item item1 = new Item() { CreatedDate = new DateTime(2010, 11, 10), Id = "1", Name = "foo1", Price = "10.00" };
Item item2 = new Item() { CreatedDate = new DateTime(2010, 11, 11), Id = "2", Name = "foo2", Price = "11.00" };
Item item3 = new Item() { CreatedDate = new DateTime(2010, 11, 12), Id = "3", Name = "foo3", Price = "12.00" };
Item item4 = new Item() { CreatedDate = new DateTime(2010, 11, 13), Id = "4", Name = "foo4", Price = "13.00" };
Item item5 = new Item() { CreatedDate = new DateTime(2010, 11, 14), Id = "5", Name = "foo5", Price = "14.00" };
Item item6 = new Item() { CreatedDate = new DateTime(2010, 11, 15), Id = "6", Name = "foo6", Price = "15.00" };
Item item7 = new Item() { CreatedDate = new DateTime(2010, 11, 16), Id = "7", Name = "foo7", Price = "16.00" };
Item item8 = new Item() { CreatedDate = new DateTime(2010, 11, 17), Id = "8", Name = "foo8", Price = "17.00" };
List<Item> items = new List<Item>();
items.Add(item1);
items.Add(item2);
items.Add(item3);
items.Add(item4);
items.Add(item5);
items.Add(item6);
items.Add(item7);
items.Add(item8);
List<Item> filtered = ItemsBeforeDate(items, new DateTime(2010, 11, 16));
foreach (Item i in filtered)
{
Console.WriteLine(i.Name);
}
Console.Read();
}
public static List<Item> ItemsBeforeDate(List<Item> items, DateTime beforeDate)
{
return items.Where(i => i.CreatedDate < beforeDate).ToList();
}
public static List<Item> ItemsAfterDate(List<Item> items, DateTime afterDate)
{
return items.Where(i => i.CreatedDate > afterDate).ToList();
}
public static List<Item> ItemsBetweenDates(List<Item> items, DateTime startDate, DateTime endDate)
{
return items.Where(i => i.CreatedDate >= startDate && i.CreatedDate <= endDate).ToList();
}
Prints:
foo1
foo2
foo3
foo4
foo5
foo6
You need to take a look at the Enumerable methods.
For filtering, use Where:
list.Where(x => x.CreatedDate < yourDate).ToList();
For ordering, use OrderBy:
list.OrderBy(x => x.CreatedDate).ToList();
IMHO the third way is OK.
But if you don't want to filter, you can implement pagination when you retrieve your list, because with a large date range, filtering alone won't resolve your performance issue.
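A minimal pagination sketch with Skip and Take: sort by CreatedDate once, then fetch one page at a time. GetPage, the page size, and the zero-based page index are invented for the example; with a database provider such as EF Core, the same Skip/Take calls are typically translated into the query, so only one page is loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns page `pageIndex` (zero-based) of `pageSize` items.
static List<T> GetPage<T>(IEnumerable<T> source, int pageIndex, int pageSize) =>
    source.Skip(pageIndex * pageSize).Take(pageSize).ToList();

var createdDates = Enumerable.Range(0, 10)
    .Select(day => new DateTime(2010, 11, 10).AddDays(day))
    .ToList();

// Sort once so that pages are stable, then read the second page of four.
var sorted = createdDates.OrderBy(d => d).ToList();
var page = GetPage(sorted, pageIndex: 1, pageSize: 4);

Console.WriteLine(string.Join(", ", page.Select(d => d.Day))); // 14, 15, 16, 17
```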

LINQ - Grouping a list by multiple properties and returning an object with an array member

This is going to be a two part question.
I am trying to build a data structure for use with the Google Charts API (specifically, their data table).
Here is my code as it stands now:
return Json.Encode(
RMAs
.Where(r => r.CreatedDate.Year > DateTime.Now.Year - 4) //Only grab the last 4 years worth of RMAs
.GroupBy(r => new { Problem = r.Problem, Year = r.CreatedDate.Year, Quarter = (r.CreatedDate.Month - 1) / 3 + 1 }) // months 1-3 -> quarter 1, ..., months 10-12 -> quarter 4
.Select(r => new { Problem = r.Key.Problem, Year = r.Key.Year, Quarter = r.Key.Quarter, Count = r.Count() })
);
This gets me very close: it produces an array similar to the following:
{"Problem":"It broke!","Year":2012,"Quarter":2,"Count":3},
{"Problem":"It broke!","Year":2012,"Quarter":1,"Count":1}
But what I want is for the data to be grouped further by the "Problem" property so that the quarters form an array for each problem (this makes the data structure much easier to iterate over). An example of the desired structure:
{"Problem":"It broke!",
{"Year":2012,"Quarter":2,"Count":3},
{"Year":2012,"Quarter":1,"Count":1}
},
{"Problem":"Some other problem",
{"Year":2012,"Quarter":1,"Count":31}
}
The second part of the question: How can I ensure that I have data for each quarter (again, this makes it much easier to iterate over for building the data table with the API), even if a "Problem" did not occur in that quarter? Using the same example as last time:
{"Problem":"It broke!",
{"Year":2012,"Quarter":2,"Count":3},
{"Year":2012,"Quarter":1,"Count":1}
},
{"Problem":"Some other problem",
{"Year":2012,"Quarter":2,"Count":0},
{"Year":2012,"Quarter":1,"Count":31}
}
Thanks to Mr. TA for the inspiration and for showing me that you can use LINQ against a grouping.
I have tested this out in a local environment and the LINQ does indeed return a list of Problems tied to an array of Year/Quarter groupings with a total Count. I don't know if Json.Encode encodes it in the correct format though.
The following LINQ should return an anonymous type that fits the format you needed:
Edit: the query now returns Count = 0 for quarters in which at least one problem occurred but the specified problem did not.
var quarters = RMAs
.Where(rma => rma.CreatedDate.Year > DateTime.Now.Year - 4)
.GroupBy(rma => new {
Year = rma.CreatedDate.Year,
// Months 1-3 -> quarter 1, ..., months 10-12 -> quarter 4
Quarter = (rma.CreatedDate.Month - 1) / 3 + 1
});
return Json.Encode(
RMAs
//Only grab the last 4 years worth of RMAs
.Where(r => r.CreatedDate.Year > DateTime.Now.Year - 4)
// Group all records by problem
.GroupBy(r => new { Problem = r.Problem })
.Select(grouping => new
{
Problem = grouping.Key.Problem,
Occurrences = quarters.Select(quarter => new
{
Year = quarter.Key.Year,
Quarter = quarter.Key.Quarter,
// Count this problem's records that fall into the quarter
Count = grouping.Count(record =>
record.CreatedDate.Year == quarter.Key.Year
&& (record.CreatedDate.Month - 1) / 3 + 1 == quarter.Key.Quarter)
}).ToArray()
}));
Update: thanks to JamieSee for adding this example of the JSON output:
[{"Problem":"P","Occurrences":[{"Year":2012,"Quarter":4,"Count":2},{"Year":2012,"Quarter":2,"Count":1},{"Year":2012,"Quarter":1,"Count":1}]},{"Problem":"Q","Occurrences":[{"Year":2012,"Quarter":3,"Count":1},{"Year":2012,"Quarter":2,"Count":1},{"Year":2012,"Quarter":1,"Count":1}]}]
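A minimal, self-contained sketch of the same idea, assuming a hypothetical `Rma` record that carries only `Problem` and `CreatedDate` (the real entity and the four-year filter are left out). It collects every (Year, Quarter) that occurs anywhere in the data, then counts each problem's records per quarter, so a quarter in which only some other problem occurred shows up with `Count` 0. The quarter is computed as `(Month - 1) / 3 + 1`; plain `Month / 3` would put January and February in quarter 0 and December in quarter 4. `RmaReport`, `QuarterCount` and `ProblemOccurrences` are illustrative names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: only the fields the query needs.
public record Rma(string Problem, DateTime CreatedDate);
public record QuarterCount(int Year, int Quarter, int Count);
public record ProblemOccurrences(string Problem, List<QuarterCount> Occurrences);

public static class RmaReport
{
    // Maps months 1..12 onto quarters 1..4.
    public static int QuarterOf(DateTime date) => (date.Month - 1) / 3 + 1;

    public static List<ProblemOccurrences> Build(IEnumerable<Rma> rmas)
    {
        var all = rmas.ToList();

        // Every (Year, Quarter) in which any problem occurred, oldest first.
        var quarters = all
            .Select(r => (Year: r.CreatedDate.Year, Quarter: QuarterOf(r.CreatedDate)))
            .Distinct()
            .OrderBy(q => q.Year)
            .ThenBy(q => q.Quarter)
            .ToList();

        // One entry per problem; each lists all shared quarters, with Count 0
        // where this problem had no records.
        return all
            .GroupBy(r => r.Problem)
            .Select(g => new ProblemOccurrences(
                g.Key,
                quarters
                    .Select(q => new QuarterCount(
                        q.Year,
                        q.Quarter,
                        g.Count(r => r.CreatedDate.Year == q.Year
                                     && QuarterOf(r.CreatedDate) == q.Quarter)))
                    .ToList()))
            .ToList();
    }
}
```

Serializing the returned list should give the Problem/Occurrences nesting shown above, with each problem's quarters ordered oldest first.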
Add the following to your query:
.GroupBy(x => x.Problem)
.ToDictionary(g => g.Key, g => g.Select(x=>new { Year=x.Year, Quarter=x.Quarter, Count = x.Count }));
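For the second part, insert the following before the .ToDictionary() call above. Because each group then becomes an object with Key and Items, the ToDictionary element selector must read g.Items.Select(...) instead of g.Select(...):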
.Select(g =>
new {
Key = g.Key,
Items =
g
.GroupBy(r => r.Year)
.SelectMany(gy =>
gy.Concat(
Enumerable.Range(1, 4) // quarters 1..4 (start 1, count 4)
.Where(q => !gy.Any(r=>r.Quarter == q))
.Select(q => new { Problem = g.Key, Year = gy.Key, Quarter = q, Count = 0 })
)
)
}
)
I think... try it out :)
I would advise against this approach, however; create the "empty" records on the client instead, to avoid excessive bandwidth use.
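Both steps of this answer can be combined into one sketch. It assumes a hypothetical named record `ProblemQuarterCount` for the flat rows produced by the question's first query; a named type means `Concat` does not depend on two anonymous types happening to match. For each year a problem appears in, any of quarters 1 to 4 that are missing are added with `Count` 0, and the result is keyed by problem. `QuarterPadding` and `ByProblem` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical named type for one flat row of the question's first query.
public record ProblemQuarterCount(string Problem, int Year, int Quarter, int Count);

public static class QuarterPadding
{
    // Keys the rows by Problem; within each year a problem appears in,
    // adds Count = 0 rows for any of quarters 1..4 that are missing.
    public static Dictionary<string, List<ProblemQuarterCount>> ByProblem(
        IEnumerable<ProblemQuarterCount> rows) =>
        rows
            .GroupBy(r => r.Problem)
            .ToDictionary(
                g => g.Key,
                g => g
                    .GroupBy(r => r.Year)
                    .SelectMany(gy => gy.Concat(
                        Enumerable.Range(1, 4) // start 1, count 4: quarters 1..4
                            .Where(q => !gy.Any(r => r.Quarter == q))
                            .Select(q => new ProblemQuarterCount(g.Key, gy.Key, q, 0))))
                    .OrderBy(r => r.Year)
                    .ThenBy(r => r.Quarter)
                    .ToList());
}
```

Padding on the server like this is the simplest to consume, but it sends the zero rows over the wire; doing the same fill on the client keeps the payload to the rows that actually have data.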
Here's a full rewrite that meets all of your criteria:
public static IEnumerable<DateTime> GetQuarterDates()
{
for (DateTime quarterDate = DateTime.Now.AddYears(-4); quarterDate <= DateTime.Now; quarterDate = quarterDate.AddMonths(3))
{
yield return quarterDate;
}
}
public static void RunSnippet()
{
var RMAs = new[] {
new { Problem = "P", CreatedDate = new DateTime(2012, 6, 2) },
new { Problem = "P", CreatedDate = new DateTime(2011, 12, 7) },
new { Problem = "P", CreatedDate = new DateTime(2011, 12, 8) },
new { Problem = "P", CreatedDate = new DateTime(2011, 8, 1) },
new { Problem = "P", CreatedDate = new DateTime(2011, 4, 1) },
new { Problem = "Q", CreatedDate = new DateTime(2011, 11, 11) },
new { Problem = "Q", CreatedDate = new DateTime(2011, 6, 6) },
new { Problem = "Q", CreatedDate = new DateTime(2011, 3, 3) }
};
var quarters = GetQuarterDates().Select(quarterDate => new { Year = quarterDate.Year, Quarter = Math.Ceiling(quarterDate.Month / 3.0) });
var rmaProblemQuarters = from rma in RMAs
where rma.CreatedDate > DateTime.Now.AddYears(-4)
group rma by rma.Problem into rmaProblems
select new {
Problem = rmaProblems.Key,
Quarters = (from quarter in quarters
join rmaProblem in rmaProblems on quarter equals new { Year = rmaProblem.CreatedDate.Year, Quarter = Math.Ceiling(rmaProblem.CreatedDate.Month / 3.0) } into joinedQuarters
// The group join already yields one row per quarter; joinedQuarters.Count() is 0 when nothing matched
select new {
Year = quarter.Year,
Quarter = quarter.Quarter,
Count = joinedQuarters.Count()
})
};
string json = System.Web.Helpers.Json.Encode(rmaProblemQuarters);
Console.WriteLine(json);
}
Which yields:
[{"Problem":"P","Quarters":[{"Year":2008,"Quarter":2,"Count":0},{"Year":2008,"Quarter":3,"Count":0},{"Year":2008,"Quarter":4,"Count":0},{"Year":2009,"Quarter":1,"Count":0},{"Year":2009,"Quarter":2,"Count":0},{"Year":2009,"Quarter":3,"Count":0},{"Year":2009,"Quarter":4,"Count":0},{"Year":2010,"Quarter":1,"Count":0},{"Year":2010,"Quarter":2,"Count":0},{"Year":2010,"Quarter":3,"Count":0},{"Year":2010,"Quarter":4,"Count":0},{"Year":2011,"Quarter":1,"Count":0},{"Year":2011,"Quarter":2,"Count":1},{"Year":2011,"Quarter":3,"Count":1},{"Year":2011,"Quarter":4,"Count":2},{"Year":2012,"Quarter":1,"Count":0},{"Year":2012,"Quarter":2,"Count":1}]},{"Problem":"Q","Quarters":[{"Year":2008,"Quarter":2,"Count":0},{"Year":2008,"Quarter":3,"Count":0},{"Year":2008,"Quarter":4,"Count":0},{"Year":2009,"Quarter":1,"Count":0},{"Year":2009,"Quarter":2,"Count":0},{"Year":2009,"Quarter":3,"Count":0},{"Year":2009,"Quarter":4,"Count":0},{"Year":2010,"Quarter":1,"Count":0},{"Year":2010,"Quarter":2,"Count":0},{"Year":2010,"Quarter":3,"Count":0},{"Year":2010,"Quarter":4,"Count":0},{"Year":2011,"Quarter":1,"Count":1},{"Year":2011,"Quarter":2,"Count":1},{"Year":2011,"Quarter":3,"Count":0},{"Year":2011,"Quarter":4,"Count":1},{"Year":2012,"Quarter":1,"Count":0},{"Year":2012,"Quarter":2,"Count":0}]}]
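The counting in this answer relies on a group join: `join … into joinedQuarters` pairs each quarter with the (possibly empty) set of matching records, so it already produces one row per quarter, and `joinedQuarters.Count()` is 0 where nothing matched. Following it with `from … in joinedQuarters.DefaultIfEmpty()` flattens the group back into one row per matching record, which repeats a quarter whenever it holds more than one record. The sketch below isolates the group join over plain dates. The helper names `QuarterKeys` and `CountPerQuarter` are hypothetical, and quarters are enumerated by an integer index (`Year * 4 + quarter - 1`) rather than by stepping a date with `AddMonths`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuarterJoin
{
    // Yields every (Year, Quarter) from the quarter containing `first`
    // through the quarter containing `last`, via an integer quarter index.
    public static IEnumerable<(int Year, int Quarter)> QuarterKeys(DateTime first, DateTime last)
    {
        int start = first.Year * 4 + (first.Month - 1) / 3;
        int end = last.Year * 4 + (last.Month - 1) / 3;
        for (int i = start; i <= end; i++)
        {
            yield return (i / 4, i % 4 + 1);
        }
    }

    // Group join: each quarter key is paired with its (possibly empty) set of
    // matching dates, so every quarter yields exactly one row; empty ones count 0.
    public static List<(int Year, int Quarter, int Count)> CountPerQuarter(
        IEnumerable<DateTime> dates, DateTime first, DateTime last)
    {
        var query =
            from q in QuarterKeys(first, last)
            join d in dates
                on new { q.Year, q.Quarter }
                equals new { d.Year, Quarter = (d.Month - 1) / 3 + 1 }
                into matches
            select (Year: q.Year, Quarter: q.Quarter, Count: matches.Count());
        return query.ToList();
    }
}
```

Both join keys are anonymous types with the same property names and types (`int Year`, `int Quarter`), so they compare by value, just as the `{ Year, Quarter }` keys do in the answer above.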
