Optimising the processing of a huge List<T> in C#

I'm working on a scheduling algorithm that generates/assigns time-slots to a List of Recipients based on the following restrictions:
Max Recipients per Minute
Max Recipients per Hour
Suppose the delivery start time is 2018-10-17 9:00 AM and we have 19 recipients, with a maximum of 5 per minute and 10 per hour. The output should be:
5 Recipients will be scheduled on 2018-10-17 9:00 AM
5 Recipients will be scheduled on 2018-10-17 9:01 AM
5 Recipients will be scheduled on 2018-10-17 10:00 AM
4 Recipients will be scheduled on 2018-10-17 10:01 AM
The algorithm produces correct results, but it works as follows:
First, it generates a list of time-slots (time windows) sized to fit the number of recipients under the restrictions I mentioned above.
Then, for each group of recipients, it takes the next available entry from the list of time-slots.
Each time-slot carries a counter that is incremented for every recipient added to it, so I can track how many recipients each time-slot holds and respect the max per minute/hour restrictions.
The process is simplified in the code snippet below. I'm using a while loop to iterate, and with 500K recipients it takes 28 minutes to finish!
I tried to use Parallel.ForEach, but I couldn't figure out how to apply it in this case.
DateTime DeliveryStart = DateTime.Now;
//This list has DateTime: Time-windows values starting from DeliveryStart to the Max value of the time needed to schedule the Recipients
var listOfTimeSlots = new List<Tuple<DateTime, bool, int>>();
//List of Recipients with Two types of data: DateTime to tell when its scheduled and int value refers to the Recipient's ID
var ListOfRecipients = new List<Tuple<DateTime, int>>();
List<Tuple<int, DateTime>> RecipientsWithTimeSlots= new List<Tuple<int, DateTime>>();
int noOfRecipients = ListOfRecipients.Count;
int Prevhour = 0, _AddedPerHour = 0, Prevday = 0;
// Scheduling restrictions
int _MaxPerHour = 5400, _MaxPerMinute = 90;
int i = 0;
int indexStart = 0;
// ...
// ...
// Code to fill listOfTimeSlots ListOfRecipients with Data
while (noOfRecipients > 0)
{
var TimeStamp = listOfTimeSlots[i];
int hour = TimeStamp.Item1.Hour;
int day = TimeStamp.Item1.Day;
if (Prevhour == 0)
{
Prevhour = hour;
Prevday = day;
}
if (Prevhour != hour)
{
Prevhour = hour;
_AddedPerHour = 0;
}
if (_AddedPerHour >= _MaxPerHour)
{
var tmpItem = listOfTimeSlots.Where(l => l.Item1.Hour == hour && l.Item1.Day == day).LastOrDefault();
int indexOfNextItem = listOfTimeSlots.LastIndexOf(tmpItem) + 1;
i = indexOfNextItem;
_AddedPerHour = 0;
continue;
}
else
{
int endIndex;
endIndex = _MaxPerMinute > noOfRecipients ? noOfRecipients : _MaxPerMinute;
if (endIndex > Math.Abs(_AddedPerHour - _MaxPerHour))
endIndex = Math.Abs(_AddedPerHour - _MaxPerHour);
var RecipientsToIteratePerMinute = ListOfRecipients.GetRange(indexStart, endIndex);
foreach (var item in RecipientsToIteratePerMinute)
{
RecipientsWithTimeSlots.Add(new Tuple<int, DateTime>(item.Item2, TimeStamp.Item1));
listOfTimeSlots[i] = new Tuple<DateTime, bool, int>(TimeStamp.Item1, true, listOfTimeSlots[i].Item3 + 1);
_AddedPerHour++;
}
indexStart += endIndex;
noOfRecipients -= endIndex;
i++;
}
}
I simplified the surrounding code here to keep it easy to follow. All I want is to speed up the while loop or replace it with a Parallel.ForEach.
Note: the while loop itself is not simplified; this is exactly how it works.
Any help or suggestion is appreciated.

Here is a different approach. It creates the groups of ids first, then assigns them the date based on the requirements.
First, a class to represent the groups (better than using tuples):
public class RecipientGroup
{
public RecipientGroup(DateTime scheduledDateTime, IEnumerable<int> recipients)
{
ScheduledDateTime= scheduledDateTime;
Recipients = recipients;
}
public DateTime ScheduledDateTime { get; private set; }
public IEnumerable<int> Recipients { get; private set; }
public override string ToString()
{
return $"Date: {ScheduledDateTime.ToShortDateString()} {ScheduledDateTime.ToLongTimeString()}, count: {Recipients.Count()}";
}
}
Then a class to iterate through the groups. You will see why this is needed later:
public class GroupIterator
{
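// count defaults to 0; the hour-rollover branch below passes the carried-over count.
public GroupIterator(DateTime scheduledDateTime, int count = 0)
{
ScheduledDateTime = scheduledDateTime;
Count = count;
}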
public DateTime ScheduledDateTime { get; set; }
public int Count { get; set; }
}
Now, the code:
DateTime DeliveryStart = new DateTime(2018, 10, 17);
//List of Recipients (fake populate function)
IEnumerable<int> allRecipients = PopulateRecipients();
// Scheduling restrictions
int maxPerMinute = 90;
int maxPerHour = 270;
//Creates groups broken down by the max per minute.
var groupsPerMinute = allRecipients
.Select((s, i) => new { Value = s, Index = i })
.GroupBy(x => x.Index / maxPerMinute)
.Select(group => group.Select(x => x.Value).ToArray());
//This will be the resulting groups
var deliveryDateGroups = new List<RecipientGroup>();
//Perform an aggregate run on the groups using the iterator
groupsPerMinute.Aggregate(new GroupIterator(DeliveryStart), (iterator, ids) =>
{
var nextBreak = iterator.Count + ids.Count();
if (nextBreak >= maxPerHour)
{
//Will go over limit, split
var difference = nextBreak-maxPerHour;
var groupSize = ids.Count() - difference;
//This group completes the batch
var group = new RecipientGroup(iterator.ScheduledDateTime, ids.Take(groupSize));
deliveryDateGroups.Add(group);
var newDate = iterator.ScheduledDateTime.AddHours(1).AddMinutes(-iterator.ScheduledDateTime.Minute);
//Add a new group with the remaining recipients, if there are any.
if (difference > 0)
{
var stragglers = new RecipientGroup(newDate, ids.Skip(groupSize));
deliveryDateGroups.Add(stragglers);
}
return new GroupIterator(newDate, difference);
}
else
{
var group = new RecipientGroup(iterator.ScheduledDateTime, ids);
deliveryDateGroups.Add(group);
iterator.ScheduledDateTime = iterator.ScheduledDateTime.AddMinutes(1);
iterator.Count += ids.Count();
return iterator;
}
});
//Output minute group count
Console.WriteLine($"Group count: {deliveryDateGroups.Count}");
//Groups by hour
var byHour = deliveryDateGroups.GroupBy(g => new DateTime(g.ScheduledDateTime.Year, g.ScheduledDateTime.Month, g.ScheduledDateTime.Day, g.ScheduledDateTime.Hour, 0, 0));
Console.WriteLine($"Hour Group count: {byHour.Count()}");
foreach (var group in byHour)
{
Console.WriteLine($"Date: {group.Key.ToShortDateString()} {group.Key.ToShortTimeString()}; Count: {group.Count()}; Recipients: {group.Sum(g => g.Recipients.Count())}");
}
Output:
Group count: 5556
Hour Group count: 1852
Date: 10/17/2018 12:00 AM; Count: 3; Recipients: 270
Date: 10/17/2018 1:00 AM; Count: 3; Recipients: 270
Date: 10/17/2018 2:00 AM; Count: 3; Recipients: 270
Date: 10/17/2018 3:00 AM; Count: 3; Recipients: 270
Date: 10/17/2018 4:00 AM; Count: 3; Recipients: 270
Date: 10/17/2018 5:00 AM; Count: 3; Recipients: 270
... and so on for all 1852 groups.
This takes about 3 seconds to complete.
I am sure there are edge cases. I wrote this in a hurry, so test it against those before relying on it.
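For comparison, the same restrictions can also be applied in a single pass with two counters and no LINQ grouping. The following is only a sketch under stated assumptions, not the answerer's method: SlotScheduler and Schedule are hypothetical names, both limits are assumed to be at least 1, and the hourly limit is assumed to reset at the top of each clock hour, as the while loop in the question does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SlotScheduler
{
    // Assigns each recipient ID a minute-granular slot in one pass.
    // Assumptions: maxPerMinute >= 1, maxPerHour >= 1, and the hourly
    // quota resets at the top of every clock hour.
    public static List<(int Id, DateTime Slot)> Schedule(
        IReadOnlyList<int> recipientIds, DateTime start, int maxPerMinute, int maxPerHour)
    {
        var result = new List<(int Id, DateTime Slot)>(recipientIds.Count);
        // Truncate the start time to the whole minute.
        DateTime slot = start.AddTicks(-(start.Ticks % TimeSpan.TicksPerMinute));
        int inMinute = 0, inHour = 0;

        foreach (int id in recipientIds)
        {
            if (inHour >= maxPerHour)
            {
                // Hour quota used up: jump to the top of the next hour.
                slot = slot.AddMinutes(60 - slot.Minute);
                inMinute = 0;
                inHour = 0;
            }
            else if (inMinute >= maxPerMinute)
            {
                // Minute quota used up: move on to the next minute.
                DateTime next = slot.AddMinutes(1);
                if (next.Hour != slot.Hour) inHour = 0; // crossed into a new clock hour
                slot = next;
                inMinute = 0;
            }
            result.Add((id, slot));
            inMinute++;
            inHour++;
        }
        return result;
    }

    public static void Main()
    {
        // The example from the question: 19 recipients, 5 per minute, 10 per hour.
        var ids = Enumerable.Range(1, 19).ToList();
        var slots = Schedule(ids, new DateTime(2018, 10, 17, 9, 0, 0), maxPerMinute: 5, maxPerHour: 10);
        foreach (var g in slots.GroupBy(s => s.Slot))
            Console.WriteLine($"{g.Count()} recipients at {g.Key:yyyy-MM-dd HH:mm}");
    }
}
```

With the numbers from the question this should give the 5 / 5 / 5 / 4 split at 9:00, 9:01, 10:00 and 10:01. The work per recipient is constant, so the total cost grows linearly with the list size.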

Related

Finding the longest overlapping period

I have a list of records containing Id, DateFrom, DateTo. For the sake of this question we can use this one:
List<(int, DateTime, DateTime)> data = new List<(int, DateTime, DateTime)>
{
(1, new DateTime(2012, 5, 16), new DateTime(2018, 1, 25)),
(2, new DateTime(2009, 1, 1), new DateTime(2011, 4, 27)),
(3, new DateTime(2014, 1, 1), new DateTime(2016, 4, 27)),
(4, new DateTime(2015, 1, 1), new DateTime(2015, 1, 3)),
(2, new DateTime(2013, 5, 10), new DateTime(2017, 4, 27)),
(5, new DateTime(2013, 5, 16), new DateTime(2018, 1, 24)),
(2, new DateTime(2017, 4, 28), new DateTime(2018, 1, 24)),
};
In my real case the list could be a lot bigger. Initially I assumed there could be only one record per Id, and I was able to come up with a pretty good solution. Now, as you can see, an Id can have several periods, and all of them must be taken into account when comparing the total overlap time.
The task is to find the two Ids that have the longest total time overlap and to return the Ids and the number of days overlapped.
In this sample that means Ids 1 and 2.
My implementation of this is the following:
public (int, int, int) GetLongestElapsedPeriodWithDuplications(List<(int, DateTime, DateTime)> periods)
{
Dictionary<int, List<(DateTime, DateTime)>> periodsByPeriodId = new Dictionary<int, List<(DateTime, DateTime)>>();
foreach (var period in periods)
{
if (periodsByPeriodId.ContainsKey(period.Item1))
{
periodsByPeriodId[period.Item1].Add((period.Item2, period.Item3));
}
else
{
periodsByPeriodId[period.Item1] = new List<(DateTime, DateTime)>();
periodsByPeriodId[period.Item1].Add((period.Item2, period.Item3));
}
}
int firstId = -1;
int secondId = -1;
int periodInDays = 0;
foreach (var period in periodsByPeriodId)
{
var Id = period.Key;
foreach (var currPeriod in periodsByPeriodId)
{
int currentPeriodInDays = 0;
if (Id != currPeriod.Key)
{
for (var i = 0; i < period.Value.Count; i++)
{
for (var j = 0; j < currPeriod.Value.Count; j++)
{
var firstPeriodDateFrom = period.Value[i].Item1;
var firstPeriodDateTo = period.Value[i].Item2;
var secondPeriodDateFrom = currPeriod.Value[j].Item1;
var secondPeriodDateTo = currPeriod.Value[j].Item2;
if (secondPeriodDateFrom < firstPeriodDateTo && secondPeriodDateTo > firstPeriodDateFrom)
{
DateTime commonStartingDate = secondPeriodDateFrom > firstPeriodDateFrom ? secondPeriodDateFrom : firstPeriodDateFrom;
DateTime commonEndDate = secondPeriodDateTo > firstPeriodDateTo ? firstPeriodDateTo : secondPeriodDateTo;
currentPeriodInDays += (int)(commonEndDate - commonStartingDate).TotalDays;
}
}
}
if (currentPeriodInDays > periodInDays)
{
periodInDays = currentPeriodInDays;
firstId = Id;
secondId = currPeriod.Key;
}
}
}
}
return (firstId, secondId, periodInDays);
}
As you can see, the method is pretty big and, in my opinion, far from optimized in terms of execution speed. I know the nested loops raise the complexity a lot, but the additional requirement to handle more than one period per Id really left me without ideas. How can I optimize this logic so that it executes faster on bigger inputs?
As in your original solution, you need to compare each interval with every other interval, except those with the same Id, so I'd code it like this:
Supporting classes, just to simplify the actual algorithm:
class Period {
public DateTime Start { get; }
public DateTime End { get; }
public Period(DateTime start, DateTime end) {
this.Start = start;
this.End = end;
}
public int Overlap(Period other) {
DateTime a = this.Start > other.Start ? this.Start : other.Start;
DateTime b = this.End < other.End ? this.End : other.End;
return (a < b) ? b.Subtract(a).Days : 0;
}
}
class IdData {
public IdData() {
this.Periods = new List<Period>();
this.Overlaps = new Dictionary<int, int>();
}
public List<Period> Periods { get; }
public Dictionary<int, int> Overlaps { get; }
}
Method to find max overlap:
static int GetLongestElapsedPeriod(List<(int, DateTime, DateTime)> periods) {
int maxOverlap = 0;
Dictionary<int, IdData> ids = new Dictionary<int, IdData>();
foreach (var period in periods) {
int id = period.Item1;
Period idPeriod = new Period(period.Item2, period.Item3);
// preserve interval for ID
var idData = ids.GetValueOrDefault(id, new IdData());
idData.Periods.Add(idPeriod);
ids[id] = idData;
foreach (var idObj in ids) {
if (idObj.Key != id) {
// accumulate the overlap of the new interval with every interval seen so far for this other ID
int o = idObj.Value.Overlaps.GetValueOrDefault(id, 0);
foreach (var otherPeriods in idObj.Value.Periods)
o += idPeriod.Overlap(otherPeriods);
idObj.Value.Overlaps[id] = o;
// check whether the newly calculated overlap is the maximum; store the IDs here too if you need them
if (o > maxOverlap)
maxOverlap = o;
}
}
}
return maxOverlap;
}
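To sanity-check any of these implementations against the sample list from the question, a small brute-force reference can help. This is a sketch only: OverlapCheck, OverlapDays and LongestOverlap are hypothetical names, and it counts whole days via TimeSpan.Days, as the Overlap method above does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverlapCheck
{
    // Whole days shared by two periods, or 0 if they do not overlap.
    public static int OverlapDays(DateTime from1, DateTime to1, DateTime from2, DateTime to2)
    {
        DateTime start = from1 > from2 ? from1 : from2;
        DateTime end = to1 < to2 ? to1 : to2;
        return end > start ? (end - start).Days : 0;
    }

    // Brute force over every pair of distinct IDs, summing the overlap
    // of all their periods; returns the best pair and its total.
    public static (int Id1, int Id2, int Days) LongestOverlap(
        IEnumerable<(int Id, DateTime From, DateTime To)> periods)
    {
        var byId = periods.GroupBy(p => p.Id).ToList();
        var best = (Id1: -1, Id2: -1, Days: 0);
        for (int i = 0; i < byId.Count; i++)
            for (int j = i + 1; j < byId.Count; j++)
            {
                int total = byId[i].Sum(a => byId[j].Sum(b => OverlapDays(a.From, a.To, b.From, b.To)));
                if (total > best.Days) best = (byId[i].Key, byId[j].Key, total);
            }
        return best;
    }

    public static void Main()
    {
        // The sample data from the question.
        var data = new List<(int Id, DateTime From, DateTime To)>
        {
            (1, new DateTime(2012, 5, 16), new DateTime(2018, 1, 25)),
            (2, new DateTime(2009, 1, 1), new DateTime(2011, 4, 27)),
            (3, new DateTime(2014, 1, 1), new DateTime(2016, 4, 27)),
            (4, new DateTime(2015, 1, 1), new DateTime(2015, 1, 3)),
            (2, new DateTime(2013, 5, 10), new DateTime(2017, 4, 27)),
            (5, new DateTime(2013, 5, 16), new DateTime(2018, 1, 24)),
            (2, new DateTime(2017, 4, 28), new DateTime(2018, 1, 24)),
        };
        Console.WriteLine(LongestOverlap(data)); // prints (1, 2, 1719)
    }
}
```

Ids 1 and 2 overlap for 1448 + 271 = 1719 days, just ahead of Ids 1 and 5 with 1714 days, which matches the expected answer stated in the question.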
You can use TimePeriodLibrary.NET:
PM> Install-Package TimePeriodLibrary.NET
TimePeriodCollection timePeriods = new TimePeriodCollection(
data.Select(q => new TimeRange(q.Item2, q.Item3)));
var longestOverlap = timePeriods
.OverlapPeriods(new TimeRange(timePeriods.Start, timePeriods.End))
.OrderByDescending(q => q.Duration)
.FirstOrDefault();
With an extension method:
public static T MaxBy<T, TKey>(this IEnumerable<T> src, Func<T, TKey> key, Comparer<TKey> keyComparer = null) {
keyComparer = keyComparer ?? Comparer<TKey>.Default;
return src.Aggregate((a, b) => keyComparer.Compare(key(a), key(b)) > 0 ? a : b);
}
And some helper functions
DateTime Max(DateTime a, DateTime b) => (a > b) ? a : b;
DateTime Min(DateTime a, DateTime b) => (a < b) ? a : b;
int OverlappingDays((DateTime DateFrom, DateTime DateTo) span1, (DateTime DateFrom, DateTime DateTo) span2) {
var maxFrom = Max(span1.DateFrom, span2.DateFrom);
var minTo = Min(span1.DateTo, span2.DateTo);
return Math.Max((minTo - maxFrom).Days, 0);
}
You can group together the spans with matching Ids
var dg = data.GroupBy(d => d.Id);
Generate all pairs of Ids
var pdgs = from d1 in dg
from d2 in dg.Where(d => d.Key > d1.Key)
select new[] { d1, d2 };
Then compute the overlap in days between each pair of Ids and find the maximum:
var MaxOverlappingPair = pdgs.Select(pdg => new {
Id1 = pdg[0].Key,
Id2 = pdg[1].Key,
OverlapInDays = pdg[0].SelectMany(d1 => pdg[1].Select(d2 => OverlappingDays((d1.DateFrom, d1.DateTo), (d2.DateFrom, d2.DateTo)))).Sum()
}).MaxBy(TwoOverlap => TwoOverlap.OverlapInDays);
Since efficiency is mentioned, I should say that implementing some of these operations directly instead of using LINQ is more efficient, but you are using Tuples and in-memory structures so I don't think it will make much difference.
I ran some performance tests using a list of 24000 spans with 1249 unique IDs. The LINQ code took about 16 seconds. By inlining some of the LINQ and replacing anonymous objects with tuples, it came down to about 3.1 seconds. By adding a shortcut skipping any IDs whose cumulative days were shorter than the current max overlapping days and a few more optimizations, I got it down to less than 1 second.
var baseDate = new DateTime(1970, 1, 1);
int OverlappingDays(int DaysFrom1, int DaysTo1, int DaysFrom2, int DaysTo2) {
var maxFrom = DaysFrom1 > DaysFrom2 ? DaysFrom1 : DaysFrom2;
var minTo = DaysTo1 < DaysTo2 ? DaysTo1 : DaysTo2;
return (minTo > maxFrom) ? minTo - maxFrom : 0;
}
var dgs = data.Select(d => {
var DaysFrom = (d.DateFrom - baseDate).Days;
var DaysTo = (d.DateTo - baseDate).Days;
return (d.Id, DaysFrom, DaysTo, Dist: DaysTo - DaysFrom);
})
.GroupBy(d => d.Id)
.Select(dg => (Id: dg.Key, Group: dg, Dist: dg.Sum(d => d.Dist)))
.ToList();
var MaxOverlappingPair = (Id1: 0, Id2: 0, OverlapInDays: 0);
for (int j1 = 0; j1 < dgs.Count; ++j1) {
var dg1 = dgs[j1];
if (dg1.Dist > MaxOverlappingPair.OverlapInDays)
for (int j2 = j1 + 1; j2 < dgs.Count; ++j2) {
var dg2 = dgs[j2];
if (dg2.Dist > MaxOverlappingPair.OverlapInDays) {
var testOverlapInDays = 0;
foreach (var d1 in dg1.Group)
foreach (var d2 in dg2.Group)
testOverlapInDays += OverlappingDays(d1.DaysFrom, d1.DaysTo, d2.DaysFrom, d2.DaysTo);
if (testOverlapInDays > MaxOverlappingPair.OverlapInDays)
MaxOverlappingPair = (dg1.Id, dg2.Id, testOverlapInDays);
}
}
}
Optimizations applied:
Convert each span's DateTimes to a number of days from an arbitrary baseDate, so the date conversion is done once and the overlap calculation works on plain ints.
Compute the total days for each ID and skip any ID pairs whose totals can't exceed the current maximum overlap.
Replace SelectMany/Select with nested foreach to compute overlapping days.
Use ValueTuples instead of anonymous objects which are (slightly) faster for this problem.
Replace pair generation LINQ with nested for loops generating each possible pair directly
Pass individual from/to parameters instead of objects to OverlappingDays function
Note: I tried a smarter overlapping days calculation but when the number of spans per ID is small, the overhead took longer than just doing the calculation directly.
There are already a few solutions, but if you want to improve efficiency you don't have to compare every value with every other value. You can use an interval search tree for this problem; finding all intersections then takes time proportional to R log N, where R is the number of intersections between intervals.
I recommend watching Robert Sedgewick's video lecture on the topic; the accompanying book is also available online.
Your basic problem here is how to identify a unique set of time periods. Give each one its own unique ID yourself.
When you write your final answer, include the additional details in the output so the user can understand which (original) IDs and original time periods resulted in the final answer.
Remember - the problem is still the same as in the original post (https://codereview.stackexchange.com/questions/186014/finding-the-longest-overlapping-period/186031?noredirect=1#comment354707_186031) and you still have the same information to work with. Don't get too hung up on the "ID"s as provided in the original list - you are still iterating through a list of time periods.

ASP.NET MVC Filter datetime by weeks

I've got a Web API and a Get method, returning a query:
var query = from results in context.Table
where results.Date>= startDate && results.Date <= endDate
select new
{
Week = { this is where I need a method to group by weeks },
Average = results.Where(x => x.Number).Average()
}
return query.ToList();
I want to calculate the average for each 7 days (that being the first week).
Example:
Average 1 ... day 7 (Week 1)
Average 2 ... day 14 (Week 2)
How can I do that? Being given an interval of datetimes, to filter it by weeks (not week of year)
Try this (not tested with tables)
var avgResult = context.QuestionaireResults
.Where(r => (r.DepartureDate >= startDate && r.DepartureDate <= endDate)).ToList()
.GroupBy( g => (Decimal.Round(g.DepartureDate.Day / 7)+1))
.Select( g => new
{
Week = g.Key,
Avg = g.Average(n => n.Number)
});
You will need to group by the number of days, since a reference date, divided by 7, so
.GroupBy(x => Math.Floor(((x.DepartureDate - new DateTime(1980,1,1)).TotalDays + 2) / 7))
Subtracting "Jan 1, 1980" from your departure date, gives you a TimeSpan object with the difference between the two dates. The TotalDays property of that timespan gives you timespan in days. Adding 2 corrects for the fact that "Jan 1, 1980" was a Tuesday. Dividing by 7 gives you the number of weeks since then. Math.Floor rounds it down, so that you get a consistent integer for the week, given any day of the week or portion of days within the week.
You could simplify a little by picking a reference date that is a Sunday (assuming that is your "first day of the week"), so you don't have to add 2 to correct. Like so:
.GroupBy(x => Math.Floor(((x.DepartureDate - new DateTime(1979,12,30)).TotalDays) / 7))
If you are sure that your data all falls within a single calendar year, you could maybe use the Calendar.GetWeekOfYear method to figure out the week, but I am not sure it would be any simpler.
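Since the question asks for weeks counted from the start of the interval rather than calendar weeks, the same divide-by-7 idea can use startDate itself as the reference date. A minimal sketch, assuming the rows are already in memory (LINQ to Objects); WeeklyAverages and Compute are hypothetical names, and whether the date arithmetic translates to SQL depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WeeklyAverages
{
    // Groups rows into consecutive 7-day windows counted from startDate
    // (week 1 = days 0..6 after startDate) and averages each window.
    public static List<(int Week, double Average)> Compute(
        IEnumerable<(DateTime Date, double Number)> rows, DateTime startDate, DateTime endDate)
    {
        return rows
            .Where(r => r.Date >= startDate && r.Date <= endDate)
            // Integer division maps days 0..6 to 0, days 7..13 to 1, and so on.
            .GroupBy(r => (r.Date.Date - startDate.Date).Days / 7 + 1)
            .OrderBy(g => g.Key)
            .Select(g => (Week: g.Key, Average: g.Average(r => r.Number)))
            .ToList();
    }

    public static void Main()
    {
        var start = new DateTime(2016, 9, 1);
        var rows = new List<(DateTime Date, double Number)>
        {
            (start, 10),            // day 0 -> week 1
            (start.AddDays(6), 20), // day 6 -> week 1
            (start.AddDays(7), 30), // day 7 -> week 2
        };
        foreach (var w in Compute(rows, start, start.AddDays(13)))
            Console.WriteLine($"Week {w.Week}: {w.Average}");
    }
}
```

Weeks with no rows simply do not appear in the result; if empty weeks are needed, they would have to be filled in separately.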
Why not write a stored procedure? I think LINQ may limit your flexibility here, because GroupBy normally groups by the value of a property, so you can group by State or Age. But I guess you can group by week too... (new thought)
Add a property called EndOfWeek. For example, if the week ends on Sunday, then for this week EndOfWeek = 9.2.16, whereas last week it was 8.28.16, etc. Then you can easily group, but you still have to arrange the data.
I know I didn't answer the question but I hope that I sparked some brain activity in an area that allows you to solve the problem.
--------- UPDATED ----------------
Simple solution: loop through your records and, for each record, determine the EndOfWeek for that record. After this you have a groupable value, so you can easily group by EndOfWeek. Now, #MikeMcCaughan, please tell me how this doesn't work. Is it illogical to extend an object? What are you talking about?
------------ HERE IS THE CODE ----------------
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace SandboxConsole
{
class Program
{
static void Main(string[] args)
{
var t = new Transactions();
List<Transactions> transactions = t.GetTransactions();
// Now let's add a Weeks end date so we can determine the average per week
foreach(var transaction in transactions)
{
var transactionDayOfWeek = transaction.TransactionDate;
int daysUntilEndOfWeek_Sat = ((int)DayOfWeek.Saturday - (int)transactionDayOfWeek.DayOfWeek + 7) % 7;
transaction.Newly_Added_Property_To_Group_By_Week_To_Get_Averages = transactionDayOfWeek.AddDays(daysUntilEndOfWeek_Sat).ToShortDateString();
//Console.WriteLine("{0} {")
}
foreach(var weekEnd in transactions.GroupBy(tt => tt.Newly_Added_Property_To_Group_By_Week_To_Get_Averages))
{
decimal weekTotal = 0;
foreach(var trans in weekEnd)
{
weekTotal += trans.Amount;
}
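// Note: this is the week's total divided by 7 days; use weekEnd.Average(tr => tr.Amount) for the average per transaction.
var weekAverage = weekTotal / 7;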
Console.WriteLine("Week End: {0} - Avg {1}", weekEnd.Key.ToString(), weekAverage.ToString("C"));
}
Console.ReadKey();
}
}
class Transactions
{
public int Id { get; set; }
public string SomeOtherProp { get; set; }
public DateTime TransactionDate { get; set; }
public decimal Amount { get; set; }
public string Newly_Added_Property_To_Group_By_Week_To_Get_Averages { get; set; }
public List<Transactions> GetTransactions()
{
var results = new List<Transactions>();
for(var i = 0; i<100; i++)
{
results.Add(new Transactions
{
Id = i,
SomeOtherProp = "Customer " + i.ToString(),
TransactionDate = GetRandomDate(i),
Amount = GetRandomAmount()
});
}
return results;
}
// One shared generator: creating a new Random() on every call in a tight loop
// can reuse the same time-based seed (and so repeat values) on .NET Framework.
private static readonly Random Rng = new Random();
public DateTime GetRandomDate(int i)
{
DateTime startTime = new DateTime(2016, 1, 1);
int range = (DateTime.Today - startTime).Days + i;
return startTime.AddDays(Rng.Next(range));
}
public int GetRandomAmount()
{
return Rng.Next(1000, 10000);
}
}
}

C# - Grouping Rows with dates into Blocks with a time duration

This is a little hard to explain. I have a datatable with schedule information. Each row represents a schedule with a start date/time and an end date/time. I need to group these such that the overall start to end time matches a given duration.
For example, I might have the following in my datatable:
Schedule1: Start - 9:00AM, End - 9:30AM
Schedule2: Start - 9:30AM, End - 10:00AM
Schedule3: Start - 10:00AM, End - 10:30AM
Schedule4: Start - 10:30AM, End - 11:00AM
Now if I'm given a duration value of 60 min, then I need to be able to produce the following as output:
Block1: Schedules(1,2): 9:00AM - 10:00AM
Block2: Schedules(2,3): 9:30AM - 10:30AM
Block3: Schedules(3,4): 10:00AM - 11:00AM
If however the duration was instead 120 min, then I would need to produce the following:
Block1: Schedules(1,2,3,4): 9:00AM - 11:00AM
Let me know if this needs clarification. I need to write a method in C# to do this conversion. Please help me with this as I've been stuck on it for a long time.
Whether you choose to do this in C# or SQL depends partly on the scale of the data. Assuming that we're working with a relatively small number of time ranges (say < 10), it would be reasonable to pull all the times into memory and find the blocks in C#.
Given the following classes:
public class Schedule {
public int ID { get; set; }
public DateTime Start { get; set; }
public DateTime End { get; set; }
public int Minutes { get; set; }
}
public class ScheduleBlock : Schedule {
public List<Schedule> Schedules { get; set; }
}
Here is a simple algorithm that iteratively combines ranges together until every possible combination is represented (note that the number of combinations grows as O(n^2)):
public List<ScheduleBlock> CombineAllSchedules(List<Schedule> origschedules, out int added)
{
added = 0;
var schedules = new List<ScheduleBlock>();
foreach (var s in origschedules) {
var snew = new ScheduleBlock { Schedules = new List<Schedule> { s }, Start = s.Start, End = s.End, Minutes = s.Minutes };
schedules.Add(snew);
}
for (var i = 0; i < schedules.Count; i++) {
var s = schedules[i];
var matchstart = schedules.Where (s2 => s2.End == s.Start).ToList();
var matchend = schedules.Where (s2 => s2.Start == s.End).ToList();
foreach (var s2 in matchstart) {
var newschedule = CombineSchedules(s2, s);
if (!schedules.Any (sc => sc.Start == newschedule.Start && sc.End == newschedule.End)) {
schedules.Add(newschedule);
added++;
}
}
foreach (var s2 in matchend) {
var newschedule = CombineSchedules(s, s2);
if (!schedules.Any (sc => sc.Start == newschedule.Start && sc.End == newschedule.End)) {
schedules.Add(newschedule);
added++;
}
}
}
return schedules;
}
public ScheduleBlock CombineSchedules(Schedule s1, Schedule s2)
{
var schedules = new List<Schedule>();
if (s1 is ScheduleBlock) schedules.AddRange(((ScheduleBlock)s1).Schedules);
else schedules.Add(s1);
if (s2 is ScheduleBlock) schedules.AddRange(((ScheduleBlock)s2).Schedules);
else schedules.Add(s2);
var s = new ScheduleBlock {
Schedules = schedules,
Start = s1.Start, End = s2.End, Minutes = s1.Minutes + s2.Minutes
};
return s;
}
Once the combinations are put together, then it is a simple matter to query them and get specific lengths (like 60 minutes or 120 minutes):
public List<ScheduleBlock> FindBlocks(List<Schedule> schedules, int blockLength)
{
int added;
var combinedSchedules = CombineAllSchedules(schedules, out added);
var result = combinedSchedules.Where (s => s.Minutes == blockLength).ToList();
return result;
}
With this algorithm in place, you can do something like this for example to get the output you're looking for:
var schedules = new List<Schedule> {
new Schedule { ID = 1, Start = DateTime.Parse("09:00 AM"), End = DateTime.Parse("09:30 AM") },
new Schedule { ID = 2, Start = DateTime.Parse("09:30 AM"), End = DateTime.Parse("10:00 AM") },
new Schedule { ID = 3, Start = DateTime.Parse("10:00 AM"), End = DateTime.Parse("10:30 AM") },
new Schedule { ID = 4, Start = DateTime.Parse("10:30 AM"), End = DateTime.Parse("11:00 AM") },
};
foreach (var s in schedules) {
s.Minutes = (int)(s.End - s.Start).TotalMinutes;
}
Console.WriteLine("60 Minute Blocks");
Console.WriteLine("----------------");
var blocks = FindBlocks(schedules, 60);
var blockId = 1;
foreach (var block in blocks) {
var output = "Block" + blockId +
": Schedules(" + string.Join(",", block.Schedules.Select (s => s.ID)) + "): " +
block.Start.ToString("h:mmtt") + " - " + block.End.ToString("h:mmtt");
Console.WriteLine(output);
blockId++;
}
Console.WriteLine();
Console.WriteLine("120 Minute Blocks");
Console.WriteLine("----------------");
blocks = FindBlocks(schedules, 120);
blockId = 1;
foreach (var block in blocks) {
var output = "Block" + blockId +
": Schedules(" + string.Join(",", block.Schedules.Select (s => s.ID)) + "): " +
block.Start.ToString("h:mmtt") + " - " + block.End.ToString("h:mmtt");
Console.WriteLine(output);
blockId++;
}
Sample Result:
60 Minute Blocks
----------------
Block1: Schedules(1,2): 9:00AM - 10:00AM
Block2: Schedules(2,3): 9:30AM - 10:30AM
Block3: Schedules(3,4): 10:00AM - 11:00AM
120 Minute Blocks
----------------
Block1: Schedules(1,2,3,4): 9:00AM - 11:00AM
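If the schedules never overlap each other, the blocks can also be found without building every combination first, by extending a run of back-to-back schedules from each starting point. This is only a sketch of that alternative, not the algorithm above: BlockFinder and Find are hypothetical names, and it assumes every schedule has a positive duration and that a block may only chain schedules where one ends exactly when the next starts.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class BlockFinder
{
    // From each starting schedule, extends a run of back-to-back schedules
    // until the run spans exactly `minutes` (emit a block) or it overshoots
    // or hits a gap (give up on that starting point).
    public static List<(List<int> Ids, DateTime Start, DateTime End)> Find(
        IEnumerable<(int Id, DateTime Start, DateTime End)> schedules, int minutes)
    {
        var sorted = schedules.OrderBy(s => s.Start).ToList();
        var target = TimeSpan.FromMinutes(minutes);
        var blocks = new List<(List<int> Ids, DateTime Start, DateTime End)>();

        for (int i = 0; i < sorted.Count; i++)
        {
            var ids = new List<int>();
            for (int j = i; j < sorted.Count; j++)
            {
                // Stop at the first gap or overlap: the run is no longer contiguous.
                if (j > i && sorted[j].Start != sorted[j - 1].End) break;
                ids.Add(sorted[j].Id);
                TimeSpan span = sorted[j].End - sorted[i].Start;
                if (span == target) { blocks.Add((ids, sorted[i].Start, sorted[j].End)); break; }
                if (span > target) break;
            }
        }
        return blocks;
    }

    public static void Main()
    {
        var day = new DateTime(2020, 1, 1);
        // Four consecutive half-hour schedules starting at 9:00, as in the question.
        var schedules = Enumerable.Range(0, 4)
            .Select(k => (Id: k + 1, Start: day.AddMinutes(540 + 30 * k), End: day.AddMinutes(570 + 30 * k)))
            .ToList();
        foreach (var b in Find(schedules, 60))
            Console.WriteLine("Schedules(" + string.Join(",", b.Ids) + "): " +
                b.Start.ToString("h:mmtt", CultureInfo.InvariantCulture) + " - " +
                b.End.ToString("h:mmtt", CultureInfo.InvariantCulture));
    }
}
```

For the four sample schedules, a 60-minute duration yields the pairs (1,2), (2,3) and (3,4), and a 120-minute duration yields the single block (1,2,3,4).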

LINQ query to Sum value over date ranges

I'm trying to create a LINQ query that produces a collection of date ranges, each with the sum of the Capacity values that apply to it. Ranges can overlap, so I'd like each overlapping period to appear as its own distinct date range with the summed capacity. Thanks.
public ActionResult Index()
{
List<Capacities> _list = new List<Capacities>{
new Capacities {StartDate = DateTime.Parse("01/01/2013"), StopDate = DateTime.Parse("01/01/2013 06:00"), Capacity = 100},
new Capacities {StartDate = DateTime.Parse("01/01/2013 04:00"), StopDate = DateTime.Parse("01/02/2013 00:00"), Capacity = 120},
new Capacities {StartDate = DateTime.Parse("01/04/2013"), StopDate = DateTime.Parse("01/04/2013 15:00"), Capacity = 100},
new Capacities {StartDate = DateTime.Parse("01/04/2013 15:00"), StopDate = DateTime.Parse("01/04/2013 18:00"), Capacity = 150}
};
//results expected
//01/01/2013 00:00 - 01/01/2013 04:00 100
//01/01/2013 04:00 - 01/01/2013 06:00 220
//01/01/2013 06:00 - 01/02/2013 00:00 120
//01/04/2013 00:00 - 01/04/2013 15:00 100
//01/04/2013 15:00 - 01/04/2013 18:00 150
return View();
}
public class Capacities
{
public DateTime StartDate { get; set; }
public DateTime StopDate { get; set; }
public int Capacity {get;set;}
}
I did some programming and extended your code quite a bit, but I was able to use LINQ at the very end :-)
My code:
SortedSet<DateTime> splitdates = new SortedSet<DateTime>();
foreach (var item in _list)
{
splitdates.Add(item.Period.Start);
splitdates.Add(item.Period.End);
}
var list = splitdates.ToList();
var ranges = new List<DateRange>();
for (int i = 0; i < list.Count - 1; i++)
ranges.Add(new DateRange() { Start = list[i], End = list[i + 1] });
var result = from range in ranges
from c in _list
where c.Period.Intersect(range) != null
group c by range into r
select new Capacities(r.Key.Start, r.Key.End, r.Sum(a => a.Capacity));
Complete code is here: http://pastebin.com/wazbb1r3
Note that the output differs because of locale. Also, some bits are not necessary like DateRange.Contains().
On the two loops above, I have no idea how to transform them to LINQ in a readable manner.
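Because the snippet above relies on the Period and DateRange types from the linked paste, here is a self-contained sketch of the same boundary-splitting idea using plain tuples. CapacitySplitter and SumByRange are hypothetical names, and the sketch assumes every item has Start earlier than Stop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CapacitySplitter
{
    public static List<(DateTime Start, DateTime Stop, int Capacity)> SumByRange(
        IReadOnlyList<(DateTime Start, DateTime Stop, int Capacity)> items)
    {
        // Every start or stop is a potential boundary; de-duplicate and sort them.
        var cuts = items.SelectMany(c => new[] { c.Start, c.Stop })
                        .Distinct()
                        .OrderBy(d => d)
                        .ToList();

        var result = new List<(DateTime Start, DateTime Stop, int Capacity)>();
        for (int i = 0; i < cuts.Count - 1; i++)
        {
            DateTime rangeStart = cuts[i], rangeStop = cuts[i + 1];
            // An item contributes when it covers the whole elementary range.
            var covering = items.Where(c => c.Start <= rangeStart && c.Stop >= rangeStop).ToList();
            if (covering.Count > 0) // skip gaps where nothing is scheduled
                result.Add((rangeStart, rangeStop, covering.Sum(c => c.Capacity)));
        }
        return result;
    }

    public static void Main()
    {
        // The sample data from the question, built without culture-dependent parsing.
        var list = new List<(DateTime Start, DateTime Stop, int Capacity)>
        {
            (new DateTime(2013, 1, 1), new DateTime(2013, 1, 1, 6, 0, 0), 100),
            (new DateTime(2013, 1, 1, 4, 0, 0), new DateTime(2013, 1, 2), 120),
            (new DateTime(2013, 1, 4), new DateTime(2013, 1, 4, 15, 0, 0), 100),
            (new DateTime(2013, 1, 4, 15, 0, 0), new DateTime(2013, 1, 4, 18, 0, 0), 150),
        };
        foreach (var r in SumByRange(list))
            Console.WriteLine($"{r.Start} - {r.Stop} {r.Capacity}");
    }
}
```

On the sample data this should produce the five expected ranges with capacities 100, 220, 120, 100 and 150; the gap between 01/02 and 01/04 is skipped because no item covers it.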

LINQ aggregate 30 minute interval to hour

I'm not an expert on LINQ. I have the data below, provided by a third party:
Data
Start: 6:00
End: 6:30
value: 1
Start: 7:00
End: 7:30
value: 1
Start: 8:00
End: 8:30
value: 1
Start: 9:00
End: 9:30
value: 1
Start: 10:00
End: 10:30
value: 1
Start: 11:00
End: 11:30
value: 1
Start: 12:00
End: 12:30
value: 1
Start: 13:00
End: 13:30
value: 1
Start: 14:00
End: 14:30
value: 1
...
Start: 05:00
End: 05:30
value: 1
This data keeps going for a week then 30 days and 365days.
I need to merge the 30-minute blocks into one-hour blocks.
e.g
Start: 6:00
End: 7:00
Value: 2
Start:7:00
End: 8:00
Value:2
......
Assuming that Start, End and Value come as one row, could someone help with how the above can be achieved?
This query is able to group by the given AggregationType and it is able to filter out incomplete groups using the second parameter checkType.
private enum AggerationType { Year = 1, Month = 2, Day = 3, Hour = 4 }
private IList<Data> RunQuery(AggerationType groupType, AggerationType checkType)
{
// The actual query which does the trick
var result =
from d in testList
group d by new {
d.Start.Year,
Month = (int)groupType >= (int)AggerationType.Month ? d.Start.Month : 1,
Day = (int)groupType >= (int)AggerationType.Day ? d.Start.Day : 1,
Hour = (int)groupType >= (int)AggerationType.Hour ? d.Start.Hour : 1
} into g
// The where clause checks how much data needs to be in the group
where CheckAggregation(g.Count(), checkType)
select new Data() { Start = g.Min(m => m.Start), End = g.Max(m => m.End), Value = g.Sum(m => m.Value) };
return result.ToList();
}
private bool CheckAggregation(int groupCount, AggerationType checkType)
{
int requiredCount = 1;
switch(checkType)
{
// For year all data must be multiplied by 12 months
case AggerationType.Year:
requiredCount = requiredCount * 12;
goto case AggerationType.Month;
// For months all data must be multiplied by days in month
case AggerationType.Month:
// I use 30 but this depends on the given month and year
requiredCount = requiredCount * 30;
goto case AggerationType.Day;
// For days all data need to be multiplied by 24 hour
case AggerationType.Day:
requiredCount = requiredCount * 24;
goto case AggerationType.Hour;
// For hours all data need to be multiplied by 2 (because slots of 30 minutes)
case AggerationType.Hour:
requiredCount = requiredCount * 2;
break;
}
return groupCount == requiredCount;
}
Here some Test data if you want:
class Data
{
public DateTime Start { get; set; }
public DateTime End { get; set; }
public int Value { get; set; }
}
// Just set up some test data similar to your example
IList<Data> testList = new List<Data>();
DateTime date = DateTime.Parse("6:00");
// This loop fills just some data over several years, months and days
for (int year = date.Year; year > 2010; year--)
{
for(int month = date.Month; month > 0; month--)
{
for (int day = date.Day; day > 0; day--)
{
for(int hour = date.Hour; hour > 0; hour--)
{
DateTime testDate = date.AddHours(-hour).AddDays(-day).AddMonths(-month).AddYears(-(date.Year - year));
testList.Add(new Data() { Start = testDate, End = testDate.AddMinutes(30), Value = 1 });
testList.Add(new Data() { Start = testDate.AddMinutes(30), End = testDate.AddHours(1), Value = 1 });
}
}
}
}
Below is the code. It seems a little bit ugly because of switch statement. It would be better to refactor it but it should show the idea.
var items = input.Split('\n');
Func<string, string> f = s =>
{
var strings = s.Split(new[] {':'}, 2);
var key = strings[0];
var value = strings[1];
switch (key.ToLower())
{
case "start":
return s;
case "value":
return String.Format("{0}: {1}", key, Int32.Parse(value) + 1);
case "end":
return String.Format("{0}: {1:h:mm}", key,
DateTime.Parse(value) +
TimeSpan.FromMinutes(30));
default:
return "";
}
};
var resultItems = items.Select(f);
Console.Out.WriteLine("result = {0}",
String.Join(Environment.NewLine, resultItems));
It's actually quite hard to approach this with pure LINQ alone. To make life easier, you'll need to write at least one helper method that transforms an enumeration. Take a look at the example below. Here I use an IEnumerable of TimeInterval and a custom Split method (implemented with C# iterators) that joins two consecutive elements together in one Tuple:
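class TimeInterval
{
// Public properties so the object initializers below can set them.
public DateTime Start { get; set; }
public DateTime End { get; set; }
public int Value { get; set; }
}
IEnumerable<TimeInterval> ToHourlyIntervals(
IEnumerable<TimeInterval> halfHourlyIntervals)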
{
return
from pair in Split(halfHourlyIntervals)
select new TimeInterval
{
Start = pair.Item1.Start,
End = pair.Item2.End,
Value = pair.Item1.Value + pair.Item2.Value
};
}
static IEnumerable<Tuple<T, T>> Split<T>(
IEnumerable<T> source)
{
using (var enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
T first = enumerator.Current;
if (enumerator.MoveNext())
{
T second = enumerator.Current;
yield return Tuple.Create(first, second);
}
}
}
}
The same can be applied to the first part of the problem (extracting half hourly TimeIntervals from the list of strings):
IEnumerable<TimeInterval> ToHalfHourlyIntervals(
IEnumerable<string> inputLines)
{
return
from triple in TripleSplit(inputLines)
select new TimeInterval
{
Start = DateTime.Parse(triple.Item1.Replace("Start: ", "")),
End = DateTime.Parse(triple.Item2.Replace("End: ", "")),
// Strip the "value: " label before parsing, as is done for Start and End.
Value = Int32.Parse(triple.Item3.Replace("value: ", ""))
};
}
Here I make use of a custom TripleSplit method that returns a Tuple<T, T, T> (which will be easy to write). With this in place, the complete solution would look like this:
// Read data lazily from disk (or any other source)
var lines = File.ReadLines(path);
var halfHourlyIntervals = ToHalfHourlyIntervals(lines);
var hourlyIntervals = ToHourlyIntervals(halfHourlyIntervals);
foreach (var interval in hourlyIntervals)
{
// process
}
What's nice about this solution is that it is completely deferred. It processes one line at a time, which lets you process arbitrarily large sources without the risk of an OutOfMemoryException. That seems important considering your stated requirement:
This data keeps going for a week then 30 days and 365days.
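The TripleSplit helper mentioned above is not shown in the answer. One possible shape for it, mirroring the Split iterator, is sketched below; the class name EnumerableChunks is hypothetical, and dropping an incomplete trailing group is an assumption carried over from how Split treats a leftover element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableChunks
{
    // Lazily yields consecutive, non-overlapping triples. A trailing group
    // of one or two leftover elements is dropped.
    public static IEnumerable<Tuple<T, T, T>> TripleSplit<T>(IEnumerable<T> source)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                T first = enumerator.Current;
                if (!enumerator.MoveNext()) yield break;
                T second = enumerator.Current;
                if (!enumerator.MoveNext()) yield break;
                yield return Tuple.Create(first, second, enumerator.Current);
            }
        }
    }

    public static void Main()
    {
        var lines = new[] { "Start: 6:00", "End: 6:30", "value: 1", "Start: 6:30", "End: 7:00", "value: 1" };
        foreach (var t in TripleSplit(lines))
            Console.WriteLine(t.Item1 + " | " + t.Item2 + " | " + t.Item3);
    }
}
```

Like Split, it only pulls three elements at a time from the source, so it keeps the pipeline deferred.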
