Increase performance of time interval calculation - C#

I have the code fragment below (short version first, complete version after) which loops over lots of records in chronological order. The number of records ranges from hundreds of thousands to millions. I need to compare the time interval between successive records and determine the difference in minutes to decide on some action and set a value. This is the performance bottleneck of the whole application, so I need to do something. The profiler clearly shows that
(DayList[nextIndex].ThisDate - entry.ThisDate).Minutes
is the bottleneck of the bottleneck. When this is solved, the next bottleneck will be the date call in the DayList creation:
List<MonthfileValue> DayList = thisList.Where(x => x.ThisDate.Date == i.Date).ToList();
Those two lines take roughly 60%-70% of all CPU time.
So the question is: how can I increase performance (dramatically), or should I abandon this approach completely (because the current performance is unacceptable)?
for (DateTime i = startdate; i <= enddate; i = i.AddDays(1))
{
    int nextIndex = 0;
    List<MonthfileValue> DayList = thisList.Where(x => x.ThisDate.Date == i.Date).ToList();

    foreach (MonthfileValue entry in DayList)
    {
        if (++nextIndex < DayList.Count - 1)
        {
            IntervalInMinutes = (DayList[nextIndex].ThisDate - entry.ThisDate).Minutes;
        }
        // do some calculations
    }
    // do some calculations
}
The complete version is below:
for (DateTime i = startdate; i <= enddate; i = i.AddDays(1))
{
    int nextIndex = 0;
    DaySolarValues tmp = new DaySolarValues();
    List<MonthfileValue> DayList = thisList.Where(x => x.ThisDate.Date == i.Date).ToList();

    foreach (MonthfileValue entry in DayList)
    {
        if (++nextIndex < DayList.Count - 1)
        {
            OldIntervalInMinutes = IntervalInMinutes;
            IntervalInMinutes = (DayList[nextIndex].ThisDate - entry.ThisDate).Minutes;

            if (IntervalInMinutes > 30)
            {
                IntervalInMinutes = OldIntervalInMinutes; // reset the value and try again
                continue;                                 // If more than 30 minutes, then skip this data
            }
            else if (IntervalInMinutes != OldIntervalInMinutes)
            {
                // Log some message and continue
            }
        }

        tmp.SolarHours += entry.SolarRad / entry.SolarTheoreticalMax >= SunThreshold ? IntervalInMinutes : 0;
        tmp.SolarEnergy += entry.SolarRad * IntervalInMinutes * 60;
        tmp.SunUpTimeInMinutes += IntervalInMinutes;
    }

    tmp.SolarHours /= 60;
    tmp.SolarEnergy /= 3600;
    tmp.ThisDate = i;
    DailySolarValuesList.Add(tmp);
}

I can clearly see that the Where(...) call steals performance. My first step would be to try this:
var dayLookup = thisList.ToLookup(x => x.ThisDate.Date);

for (DateTime currentDate = startdate; currentDate <= enddate; currentDate = currentDate.AddDays(1))
{
    int nextIndex = 0;
    List<MonthfileValue> DayList = dayLookup[currentDate].ToList();
    ...
}
This way you build a hash-based lookup once before the loop, so getting the DayList becomes a much cheaper operation.
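Building on that, here is a minimal sketch of the whole loop with the lookup in place, assuming MonthfileValue exposes a DateTime ThisDate as in the question; the interval is computed directly from Ticks, which avoids constructing an intermediate TimeSpan per comparison:

var dayLookup = thisList.ToLookup(x => x.ThisDate.Date);

for (DateTime currentDate = startdate; currentDate <= enddate; currentDate = currentDate.AddDays(1))
{
    List<MonthfileValue> dayList = dayLookup[currentDate].ToList();

    // Records within a day are already chronological, so walk consecutive pairs once.
    for (int i = 0; i < dayList.Count - 1; i++)
    {
        long deltaTicks = dayList[i + 1].ThisDate.Ticks - dayList[i].ThisDate.Ticks;
        int intervalInMinutes = (int)(deltaTicks / TimeSpan.TicksPerMinute);
        // do some calculations
    }
    // do some calculations
}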

Related

Improving performance of loop by using different list types

I have the following algorithm, which works fine for smaller date ranges. However, if I increase the date range to about a year (startDate, endDate) it obviously drops in performance, because I loop over every minute of each day. Is there a way to improve the performance by using different list types such as hash sets or dictionaries, or are there other options I'm not aware of?
listWorkTime contains around 300+ entries, where some date ranges may overlap or be identical while having different TimeRangeId values.
private List<DateSharedWork> CalculateDateSharedWork(DateTime startDate,
    DateTime endDate, ICollection<WorkTime> listWorkTime)
{
    List<DateSharedWork> listDateSharedWork = new List<DateSharedWork>();

    // +1 to include last day at full
    int range = endDate.Subtract(startDate).Days + 1;

    // start at startDate
    Parallel.For(0, range, i =>
    {
        DateTime currDate = startDate.AddDays(i);

        // set minute interval
        double everyNMinutes = 1.0;
        double minutesADay = 1440.0;

        // reset counters
        int work_counter = 0;
        int lowWork_counter = 0;
        int noWork_counter = 0;

        int l = (int)(minutesADay / everyNMinutes);
        for (int j = 0; j < l; j++)
        {
            DateTime check15 = currDate.AddMinutes(j * everyNMinutes);

            // check if listWorkTime includes the current date
            var foundTime = listWorkTime
                .Where(x => check15 >= x.FromDate && check15 <= x.ToDate).ToList();

            if (foundTime.Count(x => x.TimeRangeId == 1) > 0)
            {
                // found interval that is within work hours
                work_counter++;
                noWork_counter++;
            }
            else
            {
                if (foundTime.Count(x => x.TimeRangeId == 2) > 0)
                {
                    // found interval that is within low work hours
                    lowWork_counter++;
                    noWork_counter++;
                }
            }
        }

        double work = everyNMinutes / minutesADay * work_counter;
        double lowWork = everyNMinutes / minutesADay * lowWork_counter;
        double noWork = 1.0 - (everyNMinutes / minutesADay * noWork_counter);

        listDateSharedWork.Add(new DateSharedWork(currDate, work, lowWork, noWork));
    });

    listDateSharedWork.Sort((x, y) => DateTime.Compare(x.Date, y.Date));

    return listDateSharedWork;
}
Edit: class definitions
public class DateSharedWork
{
    public DateSharedWork(DateTime date, double? work = 0.0, double? lowWork = 0.0, double? noWork = 1.0)
    {
        this.Date = date;
        this.Work = work.Value;
        this.LowWork = lowWork.Value;
        this.NoWork = noWork.Value;
    }

    public DateTime Date { get; private set; }
    public double Work { get; private set; }
    public double LowWork { get; private set; }
    public double NoWork { get; private set; }
}
I would try adding a ulong field to DateSharedWork that stores an integer representation of the date,
i.e.: 09-07-2021 10:05:12.13546 => 202109071005 (a value this large no longer fits in a uint, hence ulong).
You can add the calculation of this field in the DateSharedWork constructor. It will add some cost when loading the data.
Alternatively, you could add the field and its calculation in the underlying DB and perform the calculation during the upsert.
In the end I think it might improve query performance in your code. I have at least seen this approach used as a Data Cube performance hint.
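As a minimal sketch of that key (a hypothetical helper, not part of the original code), truncated to minute precision as in the example above:

static ulong ToDateKey(DateTime d) =>
      (ulong)d.Year  * 100000000UL
    + (ulong)d.Month * 1000000UL
    + (ulong)d.Day   * 10000UL
    + (ulong)d.Hour  * 100UL
    + (ulong)d.Minute;

// Example: ToDateKey(new DateTime(2021, 9, 7, 10, 5, 12)) == 202109071005UL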
You could speed up the calculation of the foundTime list by limiting the search to only those WorkTime instances that are relevant to the currDate. To do this you must first build a Dictionary with a DateTime as key and a List<WorkTime> as value:
Dictionary<DateTime, List<WorkTime>> perDay = new();

foreach (var workTime in listWorkTime)
{
    for (var d = workTime.FromDate.Date; d < workTime.ToDate.Date.AddDays(1); d = d.AddDays(1))
    {
        if (!perDay.TryGetValue(d, out var list)) perDay.Add(d, list = new());
        list.Add(workTime);
    }
}
This is extra work that must be done before the calculations start, but hopefully it speeds up the calculations enough to compensate for the initial cost.
Then you will be able to replace this:
var foundTime = listWorkTime
    .Where(x => check15 >= x.FromDate && check15 <= x.ToDate).ToList();
With this:
List<WorkTime> foundTime;
if (perDay.TryGetValue(currDate, out List<WorkTime> currDateList))
{
    foundTime = currDateList
        .Where(x => check15 >= x.FromDate && check15 <= x.ToDate)
        .ToList();
}
else
{
    foundTime = new();
}
I would also suggest replacing the Parallel.For loop with a normal for loop. Parallel programming has quite a lot of gotchas ready to catch the unwary by surprise; for example, List<T>.Add is not thread-safe, so adding to listDateSharedWork from multiple threads as in the code above can corrupt the list.
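If you do keep the parallel loop, one way to sidestep that shared-list issue (a sketch only, with the per-day counting elided) is to let each iteration write to its own slot of a pre-sized array:

// Each iteration writes only to results[i], so no locking or shared list is needed.
var results = new DateSharedWork[range];

Parallel.For(0, range, i =>
{
    DateTime currDate = startDate.AddDays(i);

    // ... the existing per-day counting logic goes here ...
    double work = 0, lowWork = 0, noWork = 1.0; // placeholders for the computed values

    results[i] = new DateSharedWork(currDate, work, lowWork, noWork);
});

// The array is already in date order, so the final Sort is no longer required.
List<DateSharedWork> listDateSharedWork = results.ToList();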

How to check if my date range is totally covered by CONSECUTIVE rows?

I need to check if a date range is totally covered by this date range table, which is sorted in ascending order of dFrom; both columns are of Date type:
dFrom dTo
----- -----
10/01 10/03
10/05 10/08
10/08 10/09
10/09 10/12
10/13 10/18
10/15 10/17
10/19 10/24
range A: 10/01-10/14 is NOT totally covered because 10/04 is missing from table.
range B: 10/10-10/20 is totally covered.
What I can think of is, for a given date range like A or B, checking whether each day is covered by the table:
var dRangeFrom = rangeFrom.Date; // "var" is DateTime here; C# has no separate date-only type
var dRangeTo = rangeTo.Date;
int DaysCovered = 0;
int HowManyDays = (int)(dRangeTo - dRangeFrom).TotalDays + 1;
int StartFromRow = 0;

while (dRangeFrom <= dRangeTo)
{
    for (int i = StartFromRow; i < table.Rows.Count; i++)
    {
        if ((DateTime)table.Rows[i]["dFrom"] > dRangeFrom) // optimization 1: no need to continue.
            break;
        if (dRangeFrom >= (DateTime)table.Rows[i]["dFrom"] && dRangeFrom <= (DateTime)table.Rows[i]["dTo"])
        {
            DaysCovered++;
            StartFromRow = i; // optimization 2: next day's comparison simply starts from here
            break;
        }
    }
    dRangeFrom = dRangeFrom.AddDays(1);
}

if (DaysCovered == HowManyDays)
    Console.Write("Totally covered");
else
    Console.Write("NOT");
if (DaysCovered == HowManyDays)
Console.Write("Totally covered");
else
Console.Write("NOT");
One way to solve it would be to write a helper method that gets all the days in a range:
public static List<DateTime> GetDaysCovered(DateTime from, DateTime to)
{
    var result = new List<DateTime>();
    for (var i = 0; i <= (to.Date - from.Date).TotalDays; i++) // <= so the 'to' day is included
    {
        result.Add(from.Date.AddDays(i));
    }
    return result;
}
And then we can join all the ranges from the table together and see if they match the days in the range we're trying to cover:
var tableDates = new List<DateTime>();
foreach (DataRow row in table.Rows)
{
    tableDates.AddRange(GetDaysCovered(
        row.Field<DateTime>("dFrom").Date,
        row.Field<DateTime>("dTo").Date));
}

var rangeDates = GetDaysCovered(dRangeFrom, dRangeTo);
var missingDates = rangeDates
    .Where(rangeDate => !tableDates.Contains(rangeDate))
    .ToList();
if (missingDates.Any())
{
    Console.Write("These dates are not covered: ");
    Console.Write(string.Join(",",
        missingDates.Select(date => date.ToShortDateString())));
}
else
{
    Console.Write("Totally covered");
}
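A small additional tweak (a sketch): looking the dates up in a HashSet instead of a List avoids scanning tableDates once for every date in the range:

var tableDateSet = new HashSet<DateTime>(tableDates);
var missingDates = rangeDates
    .Where(rangeDate => !tableDateSet.Contains(rangeDate))
    .ToList();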
A naive solution is to check for each date in the range whether it is covered by any row.
var totallyCovered = true;

for (var date = rangeFrom.Date; date <= rangeTo.Date; date = date.AddDays(1))
{
    var covered = dates.Any(x => date >= x.dFrom && date <= x.dTo);
    if (!covered)
    {
        totallyCovered = false;
        break;
    }
}

if (totallyCovered)
{
    Console.WriteLine("Totally covered.");
}
else
{
    Console.WriteLine("No.");
}
That's kinda long and ugly, but thankfully you can fit that into a single LINQ query:
var dateRange = Enumerable.Range(0, 1 + rangeTo.Subtract(rangeFrom).Days)
    .Select(offset => rangeFrom.Date.AddDays(offset));

var totallyCovered = dateRange.All(d => dates.Any(x => d >= x.dFrom && d <= x.dTo));
Note: This has time complexity of O(|range| * |rows|), which might be too much. To fix that you'd have to employ a more sophisticated data structure that would allow you to query ranges in logarithmic time, but since your original sample also contained nested loops, I'll assume it's unnecessary.
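For completeness, here is a sketch of one such alternative that relies only on the rows being sorted by dFrom (as stated in the question): walk the rows once, track the furthest covered date, and fail as soon as a gap appears.

// Assumes `rows` is sorted by dFrom; dFrom/dTo are inclusive day boundaries.
static bool IsCovered(DateTime rangeFrom, DateTime rangeTo,
                      IEnumerable<(DateTime dFrom, DateTime dTo)> rows)
{
    DateTime coveredUpTo = rangeFrom.Date.AddDays(-1); // nothing covered yet

    foreach (var (dFrom, dTo) in rows)
    {
        if (dFrom.Date > coveredUpTo.AddDays(1))
            break;                          // a gap starts before this row begins
        if (dTo.Date > coveredUpTo)
            coveredUpTo = dTo.Date;         // extend the covered region
        if (coveredUpTo >= rangeTo.Date)
            return true;                    // the whole range is covered
    }

    return coveredUpTo >= rangeTo.Date;
}

This runs in O(|rows|) per query instead of O(|range| * |rows|).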

Aggregate value takes a very long time

I have one big list of 15-minute values covering around a year, and I would like to aggregate them into hours. I am doing it in a very simple way:
for (; from <= to; from = from.AddHours(1))
{
    List<DataPoint> valuesToAgregate = data.Where(x => x.TimeStamp >= from && x.TimeStamp < from.AddHours(1)).ToList();
    dailyInputData.Add(valuesToAgregate.Sum(x => x.Val));
}
This takes a lot of time, around 30 seconds for 35k values. Is there any way to optimize it? Maybe use ordering, somehow add an index to the list, or use a group-by instead of the for loop?
Of course, if you order your list by TimeStamp beforehand, this will work quicker. Example:
var orderedData = data.OrderBy(item => item.TimeStamp).ToList();
int firstIndex = 0;
var from = orderedData.First().TimeStamp;
var to = orderedData.Last().TimeStamp;

while (from < to)
{
    var sum = 0;
    var newTo = from.AddHours(1);

    while (firstIndex < orderedData.Count && orderedData[firstIndex].TimeStamp < newTo)
    {
        sum += orderedData[firstIndex].Val;
        ++firstIndex;
    }

    dailyInputData.Add(sum);
    from = from.AddHours(1);
}
data = data.OrderBy(x => x.TimeStamp).ToList();
int counter = 0;
var boundary = from.AddHours(1);

foreach (var d in data)
{
    if (d.TimeStamp > boundary)
    {
        dailyInputData.Add(counter); // flush the finished hour first...
        counter = 0;                 // ...then reset for the new hour
        boundary = boundary.AddHours(1);
    }
    ++counter;
}
dailyInputData.Add(counter); // flush the last (possibly partial) hour
The problem lies in the logic:

the list is scanned from start to end every time to find the candidate values (your Where clause)
the candidate values are inserted into another temporary list
the temporary list is THEN scanned from start to end to calculate the sum

The fastest approach:

sort the list
go through the items; if an item belongs to the current group, add it to the accumulator, otherwise you've jumped to a new group, so flush the accumulator to record the value and start it over again
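As a sketch of the group-by idea the question asks about (assuming DataPoint has a DateTime TimeStamp and a numeric Val), one LINQ pass looks like this:

// Bucket each point by the start of its hour, then sum each bucket once.
var hourlySums = data
    .GroupBy(x => new DateTime(x.TimeStamp.Year, x.TimeStamp.Month, x.TimeStamp.Day,
                               x.TimeStamp.Hour, 0, 0))
    .OrderBy(g => g.Key)
    .Select(g => g.Sum(x => x.Val))
    .ToList();

Unlike the original loop, this only emits hours that actually contain data; empty hours would need to be filled in afterwards if they matter.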

Data Aggregations over time, when time is variable

I am in the process of developing an application which calculates the share acquired in a product over a specified time period (term).
After the calculations have been performed, I need to aggregate the data into groups based on a predefined review period. For example, if the time required to gain 100% ownership of the product is 25 years and the review period is 5 years, I would have 5 sets of data aggregations for the agreement.
I perform the aggregations as shown by looping through my calculation result set:
if (Year % ReviewPeriod == 0)
{
    // Perform Aggregations
}
This works fine in most scenarios.
However, I do have a number of scenarios where the product reaches 100% ownership before the end of the term.
What I need to be able to do is aggregate the calculations based on the ReviewPeriod variable, but if the final number of values does not fill a whole review period, aggregate the remaining items based on the number of items left over.
For example, given a 22-year term, data would be aggregated based on the review period variable, and any remainder would be aggregated based on the size of the remainder.
Worked Example
Year 0 - 5 = 5 Aggregations
Year 6 - 10 = 5 Aggregations
Year 11 - 15 = 5 Aggregations
Year 16 - 20 = 5 Aggregations
Year 21 - 22 = 2 Aggregations
Could anyone help me with the logic to aggregate the data as I have described?
Probably the simplest way would be something like:
for (int year = 0; year <= max_year; year++)
{
    if (year % reviewPeriod == 0)
    {
        // start a new aggregation
    }
    // add year to current aggregation
}
You could keep a list of aggregations and add a new one at the start of each period.
Here is a working example that just groups years in lists:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Aggregations
{
    class Program
    {
        static void Main(string[] args)
        {
            int maxYear = 22;
            int period = 5;
            int year = 1985;

            List<List<int>> aggregations = new List<List<int>>();
            int i = -1;

            for (int y = 0; y <= maxYear; y++)
            {
                if (y % period == 0)
                {
                    aggregations.Add(new List<int>());
                    i++;
                }
                aggregations[i].Add(year);
                year++;
            }

            foreach (List<int> l in aggregations)
            {
                foreach (int yy in l)
                {
                    Console.Write(yy + " ");
                }
                Console.WriteLine();
            }
        }
    }
}
You've not really given enough of your code to go on, but hopefully you can use this however your loop is currently set up. It "leaks" the mod value to the outside of the loop; after the loop is over, you can check the final mod value to see how many aggregations are left.
int modValue = 0;

for // foreach/while/... - your loop here
{
    ...
    modValue = Year % ReviewPeriod;
    if (modValue == 0)
    {
        // Perform Aggregations
    }
    ...
} // end of your loop

if (modValue != 0)
{
    // Perform final aggregation. There are modValue items to aggregate.
}
I think my suggestion is not worth a 300-rep bounty, so either I misunderstood your problem or you've overshot the bounty.
Does your existing code that calculates the final aggregations work well? If so, then to determine the ranges you may just use modulo (%) and simple math:
int minYear = ...the first year; // inclusive, e.g. 1970
int maxYear = ...the last year;  // inclusive, e.g. 2012

int span = maxYear - minYear + 1; // 1970..2012 -> 43, 2001..2006 -> 6
int fullFives = span / 5;         // 1970..2012 -> 8,  2001..2006 -> 1
int remainder = span % 5;         // 1970..2012 -> 3,  2001..2006 -> 1

for (int i = 0; i < fullFives; ++i)
{
    int yearFrom = minYear + 5 * i;
    int yearTo = minYear + 5 * (i + 1) - 1;
    // 1970..2012 -> 1970-1974, 1975-1979, 1980-1984, 1985-1989, 1990-1994, 1995-1999, 2000-2004, 2005-2009
    // 2001..2006 -> 2001-2005
    aggregate(yearFrom, yearTo);
}

if (remainder > 0)
{
    int yearFrom = minYear + 5 * fullFives;
    int yearTo = maxYear;
    // 1970..2012 -> 2010-2012
    // 2001..2006 -> 2006-2006
    aggregate(yearFrom, yearTo);
}
This is written "out of thin air", I've not checked/compiled it - it is just to sketch the idea.
Note: you've said that everything works but sometimes there are "scenarios where the product reaches 100% ownership before the end of term" - that would suggest you have an error in the calculations rather than in the looping. If the error were in the loop or in the year-boundary detection, then probably almost everything would be off. It's hard to say without seeing more of the calculation code.
The code sample will fire on years 0, 5, 10, etc. rather than for every year.
If you just need the number of years to aggregate when that code fires, and the term can be set in advance when a product reaches 100% ownership early, I think this would work:
int term = 22;
int reviewperiod = 5;

for (int year = 0; year < term; year++)
{
    if (year % reviewperiod == 0)
    {
        var endyear = Math.Min(year + reviewperiod, term);
        Console.WriteLine("Aggregate years {0} to {1}, {2} Aggregations", year, endyear, endyear - year);
    }
}
Are you thinking of something like
private int reviewPeriod = 5;

public void Aggregate(int term)
{
    Enumerable.Range(0, term)
        .ToList()
        .ForEach(this.AggregateYear);
}
where this.AggregateYear is defined as follows:
public void AggregateYear(int year)
{
    var currentRemainder = year % this.reviewPeriod;
    var aggregatePeriod = (currentRemainder == 0)
        ? this.reviewPeriod
        : currentRemainder;

    this.PerformAggregation(aggregatePeriod);
}
and this.PerformAggregation is defined as follows
private void PerformAggregation(int aggregatePeriod)
{
    // ...
}
Assuming this data is in memory (since you have not specified otherwise), you can just use the GroupBy function from LINQ:
struct YearValue
{
    public int Year, Value;
}

static void Main()
{
    // Create some data, hopefully representative of what you are dealing with...
    Random r = new Random();
    YearValue[] dataValues = new YearValue[22];
    for (int i = 0; i < dataValues.Length; i++)
        dataValues[i] = new YearValue { Year = i, Value = r.Next(200) };

    // Average of values across 'ReviewPeriod' of five:
    foreach (var item in dataValues.AsEnumerable().GroupBy(i => i.Year / 5))
    {
        YearValue[] items = item.ToArray();
        Console.WriteLine("Group {0} had {1} item(s) averaging {2}",
            item.Key,
            items.Length,
            items.Average(i => i.Value)
        );
    }
}
This program then outputs the following text:
Group 0 had 5 item(s) averaging 143.6
Group 1 had 5 item(s) averaging 120.4
Group 2 had 5 item(s) averaging 83
Group 3 had 5 item(s) averaging 145.2
Group 4 had 2 item(s) averaging 98.5

Calculate the average TimeSpan between a collection of DateTimes

Let's say we're tracking the times when a user is performing a certain action, and we want to know the average time between said actions.
For example, if the user performed this action at these times:
today, 1 PM
today, 3 PM
today, 6 PM
The result would be 2.5 hours.
I've actually solved this already, but I felt my solution was more complicated than necessary. I'll post it as an answer.
It seems that you are basically looking for (Max - Min) divided by (Count - 1).
public TimeSpan? Average
{
    get
    {
        var diff = _dateTimes.Max().Subtract(_dateTimes.Min());
        var avgTs = TimeSpan.FromMilliseconds(diff.TotalMilliseconds / (_dateTimes.Count() - 1));
        return avgTs;
    }
}
Make sure you check that there is more than one DateTime.
Update: it's even more accurate if you use Ticks:
TimeSpan.FromTicks(diff.Ticks / (_dateTimes.Count() - 1));
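Putting those pieces together, a minimal sketch of the guarded property (assuming _dateTimes is the underlying collection of timestamps, as in the other snippets here):

public TimeSpan? Average
{
    get
    {
        var count = _dateTimes.Count();
        if (count < 2) return null; // no interval to average

        var diff = _dateTimes.Max() - _dateTimes.Min();
        return TimeSpan.FromTicks(diff.Ticks / (count - 1));
    }
}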
I recently had a similar task, where I had a long-running operation iterating over thousands of rows with 20-30 iterations within each.
void LongRunningOperation()
{
    int r = 5000;
    int sR = 20;
    List<TimeSpan> timeSpanList = new List<TimeSpan>();

    for (int i = 0; i < r; i++)
    {
        DateTime n = DateTime.Now; // Gets start time of this iteration.

        for (int x = 0; x < sR; x++)
        {
            // DOING WORK HERE
        }

        timeSpanList.Add(DateTime.Now - n); // Gets the duration of this iteration and adds it to the list.

        double avg = timeSpanList.Select(x => x.TotalSeconds).Average(); // Use LINQ to get an average of the TimeSpan durations.
        TimeSpan timeRemaining = DateTime.Now.AddSeconds((r - i) * avg) - DateTime.Now;
        // Calculate time remaining: the number of rows left multiplied by the average duration per row.

        UpdateStatusLabel(timeRemaining);
    }
}
This is how I solved it, but I don't like it much:
public class HistoryItem
{
    private IEnumerable<DateTime> _dateTimes;

    public TimeSpan? Average
    {
        get
        {
            TimeSpan total = default(TimeSpan);
            DateTime? previous = null;
            int quotient = 0; // number of gaps between consecutive times
            var sortedDates = _dateTimes.OrderBy(x => x);

            foreach (var dateTime in sortedDates)
            {
                if (previous != null)
                {
                    total += dateTime - previous.Value;
                    ++quotient;
                }
                previous = dateTime;
            }

            return quotient > 0 ? (TimeSpan.FromMilliseconds(total.TotalMilliseconds / quotient)) as TimeSpan? : null;
        }
    }
}
