I am creating time buckets of 60 minutes each from data returned by a Postgres query. The duration considered for bucketing is something like 1 day or 7 days.
In some cases buckets are missing. To fill them, I am writing a method that identifies the missing points and adds them to the series.
This method is built with C#. To demonstrate the logic, I have created a sample:
DateTimeOffset startTime = DateTime.Now.AddDays(-1), startTargetTime = DateTime.Now.AddDays(-1);
DateTimeOffset endTime = DateTimeOffset.Now;
int interval = 60;
List<DateTimeOffset> series = new List<DateTimeOffset>();
List<DateTimeOffset> target = new List<DateTimeOffset>(); // this will be the database values
int cntr = 0;
while (startTime < endTime)
{
    startTime = startTime.AddMinutes(interval);
    if (cntr % 2 == 0)
    {
        target.Add(startTime);
    }
    cntr++;
}
while (startTargetTime < endTime)
{
    startTargetTime = startTargetTime.AddMinutes(interval);
    series.Add(startTargetTime);
}
if (target.Count != series.Count)
{
    foreach (var item in series)
    {
        if (!target.Exists(val => val == item))
        {
            target.Add(item);
        }
    }
}
The current problem is that even though an item already exists, it is still added to the target list. What could be the problem?
Also, are there more efficient ways to make the comparison and add the missing items?
This portion of your code, which populates series, is independent of anything going on in target. It just adds all values in the interval.
while (startTargetTime < endTime)
{
    startTargetTime = startTargetTime.AddMinutes(interval);
    series.Add(startTargetTime);
}
To fix it, just check whether the value already exists in target:
while (startTargetTime < endTime) {
    startTargetTime = startTargetTime.AddMinutes(interval);
    if (!target.Contains(startTargetTime)) {
        series.Add(startTargetTime);
    }
}
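On the efficiency question: List<T>.Contains and Exists scan the whole list, so filling a long series this way is O(n²). A sketch of the same fill using a HashSet<DateTimeOffset> for O(1) membership checks, reusing the target and series lists from the sample above:

```csharp
// Index the timestamps that already exist (e.g. the database values).
var existing = new HashSet<DateTimeOffset>(target);

// Walk the expected series and append only the missing buckets.
foreach (DateTimeOffset item in series)
{
    if (existing.Add(item)) // Add returns false when the value is already present
    {
        target.Add(item);
    }
}
```

Note that DateTimeOffset equality is exact to the tick, which may also explain the duplicate adds: in the sample, startTime and startTargetTime come from two separate DateTime.Now calls, so the two series differ by a few ticks and no comparison ever matches. Initializing both from a single captured timestamp avoids that.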
Related
I have the following algorithm, which works fine for smaller date ranges. However, if I increase the date range to about a year (startDate, endDate), performance obviously drops because I loop over every minute of each day. Is there a way to improve performance by using different collection types such as hash sets or dictionaries, or are there other approaches I'm not aware of?
listWorkTime contains around 300+ entries, where some date ranges may overlap or be identical while having different TimeRangeId values.
private List<DateSharedWork> CalculateDateSharedWork(DateTime startDate,
    DateTime endDate, ICollection<WorkTime> listWorkTime)
{
    List<DateSharedWork> listDateSharedWork = new List<DateSharedWork>();
    // +1 to include last day at full
    int range = endDate.Subtract(startDate).Days + 1;
    // start at startDate
    Parallel.For(0, range, i =>
    {
        DateTime currDate = startDate.AddDays(i);
        // set minute interval
        double everyNMinutes = 1.0;
        double minutesADay = 1440.0;
        // reset counters
        int work_counter = 0;
        int lowWork_counter = 0;
        int noWork_counter = 0;
        int l = (int)(minutesADay / everyNMinutes);
        for (int j = 0; j < l; j++)
        {
            DateTime check15 = currDate.AddMinutes(j * everyNMinutes);
            // check if listWorkTime includes current date
            var foundTime = listWorkTime
                .Where(x => check15 >= x.FromDate && check15 <= x.ToDate).ToList();
            if (foundTime.Count(x => x.TimeRangeId == 1) > 0)
            {
                // found interval that is within work hours
                work_counter++;
                noWork_counter++;
            }
            else
            {
                if (foundTime.Count(x => x.TimeRangeId == 2) > 0)
                {
                    // found interval that is within low work hours
                    lowWork_counter++;
                    noWork_counter++;
                }
            }
        }
        double work = everyNMinutes / minutesADay * work_counter;
        double lowWork = everyNMinutes / minutesADay * lowWork_counter;
        double noWork = 1.0 - (everyNMinutes / minutesADay * noWork_counter);
        listDateSharedWork.Add(new DateSharedWork(currDate, work, lowWork, noWork));
    });
    listDateSharedWork.Sort((x, y) => DateTime.Compare(x.Date, y.Date));
    return listDateSharedWork;
}
Edit: class definitions
public class DateSharedWork
{
    public DateSharedWork(DateTime date, double? work = 0.0, double? lowWork = 0.0, double? noWork = 1.0)
    {
        this.Date = date;
        this.Work = work.Value;
        this.LowWork = lowWork.Value;
        this.NoWork = noWork.Value;
    }
    public DateTime Date { get; private set; }
    public double Work { get; private set; }
    public double LowWork { get; private set; }
    public double NoWork { get; private set; }
}
I would try adding an integer field to DateSharedWork that stores a numeric representation of the date,
i.e.: 09-07-2021 10:05:12.13546 => 202109071005 (note that a value this large overflows uint, so it would need to be a ulong).
You can add the calculation of this field in the DateSharedWork constructor; it will add some cost when loading the data.
Or you can add the field and its calculation in the underlying DB and perform the calculation during the upsert.
In the end I think it might improve query performance in your code. At least I have already seen this approach in a data-cube performance hint.
You could speed up the calculation of the foundTime list by limiting the search to only those WorkTime instances that are relevant to the currDate. To do this you must first build a Dictionary that has a DateTime as key and a List<WorkTime> as value:
Dictionary<DateTime, List<WorkTime>> perDay = new();
foreach (var workTime in listWorkTime)
{
    for (var d = workTime.FromDate.Date; d < workTime.ToDate.Date.AddDays(1); d = d.AddDays(1))
    {
        if (!perDay.TryGetValue(d, out var list)) perDay.Add(d, list = new());
        list.Add(workTime);
    }
}
This constitutes an extra work that must be done before starting the calculations, but hopefully it will speed the calculations enough to compensate for the initial cost.
Then you will be able to replace this:
var foundTime = listWorkTime
.Where(x => check15 >= x.FromDate && check15 <= x.ToDate).ToList();
With this:
List<WorkTime> foundTime;
if (perDay.TryGetValue(currDate, out List<WorkTime> currDateList))
{
    foundTime = currDateList
        .Where(x => check15 >= x.FromDate && check15 <= x.ToDate)
        .ToList();
}
else
{
    foundTime = new();
}
I would also suggest replacing the Parallel.For loop with a normal for loop. Parallel programming has quite a lot of gotchas, ready to catch the unwary by surprise.
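One such gotcha is in fact already present in the posted method: listDateSharedWork is a plain List<T>, and calling Add on it concurrently from Parallel.For is not thread-safe. If the parallel loop is kept, a minimal sketch of one way to guard it (gate is a name introduced here for illustration; only the list mutation changes, the per-day counting stays exactly as in the question):

```csharp
object gate = new object();
Parallel.For(0, range, i =>
{
    DateTime currDate = startDate.AddDays(i);
    // ... per-minute counting and work/lowWork/noWork calculation as before ...
    var result = new DateSharedWork(currDate, work, lowWork, noWork);
    lock (gate) // serialize only the shared-list mutation
    {
        listDateSharedWork.Add(result);
    }
});
```

Since each iteration only adds one item, the lock is held very briefly and should not meaningfully hurt the parallel speedup.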
I need to check whether a date range is totally covered by this date-range table, sorted in ascending order of dFrom. Both columns are of Date type:
dFrom dTo
----- -----
10/01 10/03
10/05 10/08
10/08 10/09
10/09 10/12
10/13 10/18
10/15 10/17
10/19 10/24
Range A: 10/01-10/14 is NOT totally covered because 10/04 is missing from the table.
Range B: 10/10-10/20 is totally covered.
What I can think of is, for a given date range like A or B, to check whether each day is covered in the table:
var dRangeFrom = rangeFrom.Date; // use "var" as C# has no date type
var dRangeTo = rangeTo.Date;
int DaysCovered = 0;
int HowManyDays = (int)(dRangeTo - dRangeFrom).TotalDays + 1;
int StartFromRow = 0;
while (dRangeFrom <= dRangeTo)
{
    for (int i = StartFromRow; i < table.Rows.Count; i++)
    {
        if ((DateTime)table.Rows[i]["dFrom"] > dRangeFrom) // optimization 1: no need to continue.
            break;
        if (dRangeFrom >= (DateTime)table.Rows[i]["dFrom"] && dRangeFrom <= (DateTime)table.Rows[i]["dTo"])
        {
            DaysCovered++;
            StartFromRow = i; // optimization 2: next day's comparison simply starts from here
            break;
        }
    }
    dRangeFrom = dRangeFrom.AddDays(1);
}
if (DaysCovered == HowManyDays)
    Console.Write("Totally covered");
else
    Console.Write("NOT");
One way to solve it would be to write a helper method that gets all the days in a range:
public static List<DateTime> GetDaysCovered(DateTime from, DateTime to)
{
    var result = new List<DateTime>();
    for (var i = 0; i <= (to.Date - from.Date).TotalDays; i++)
    {
        result.Add(from.Date.AddDays(i));
    }
    return result;
}
And then we can join all the ranges from the table together and see if they match the days in the range we're trying to cover:
var tableDates = new List<DateTime>();
foreach (DataRow row in table.Rows)
{
    tableDates.AddRange(GetDaysCovered(
        row.Field<DateTime>("dFrom").Date,
        row.Field<DateTime>("dTo").Date));
}
var rangeDates = GetDaysCovered(dRangeFrom, dRangeTo);
var missingDates = rangeDates
    .Where(rangeDate => !tableDates.Contains(rangeDate))
    .ToList();
if (missingDates.Any())
{
    Console.Write("These dates are not covered: ");
    Console.Write(string.Join(",",
        missingDates.Select(date => date.ToShortDateString())));
}
else
{
    Console.Write("Totally covered");
}
A naive solution is to check for each date in the range whether it is covered by any row.
var totallyCovered = true;
for (var date = rangeFrom.Date; date <= rangeTo.Date; date = date.AddDays(1))
{
    var covered = dates.Any(x => date >= x.dFrom && date <= x.dTo);
    if (!covered)
    {
        totallyCovered = false;
        break;
    }
}
if (totallyCovered)
{
    Console.WriteLine("Totally covered.");
}
else
{
    Console.WriteLine("No.");
}
That's kinda long and ugly, but thankfully you can fit that into a single LINQ query:
var dateRange = Enumerable.Range(0, 1 + rangeTo.Subtract(rangeFrom).Days)
    .Select(offset => rangeFrom.Date.AddDays(offset));
var totallyCovered = dateRange.All(d => dates.Any(x => d >= x.dFrom && d <= x.dTo));
Note: This has time complexity of O(|range| * |rows|), which might be too much. To fix that you'd have to employ a more sophisticated data structure that would allow you to query ranges in logarithmic time, but since your original sample also contained nested loops, I'll assume it's unnecessary.
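Since the table is already sorted by dFrom, one such approach needs no extra data structure at all: merge the sorted intervals in a single pass, tracking the first day that still needs coverage. This is a sketch under the assumption that the rows are available as (DateTime dFrom, DateTime dTo) tuples sorted ascending by dFrom, as in the question:

```csharp
// Assumes 'rows' is sorted ascending by dFrom, as in the question's table.
static bool IsTotallyCovered(List<(DateTime dFrom, DateTime dTo)> rows,
                             DateTime rangeFrom, DateTime rangeTo)
{
    DateTime coveredUntil = rangeFrom.Date; // first day still needing coverage
    foreach (var row in rows)
    {
        if (row.dFrom.Date > coveredUntil)
            break; // gap before this row starts: a day is uncovered
        if (row.dTo.Date >= coveredUntil)
            coveredUntil = row.dTo.Date.AddDays(1); // extend coverage past this row
        if (coveredUntil > rangeTo.Date)
            return true; // entire range covered
    }
    return false;
}
```

For range A (10/01-10/14) this fails at the 10/05 row because 10/04 was never covered; for range B (10/10-10/20) it succeeds at the 10/19 row. The pass is O(rows) regardless of the range length.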
I have the code fragment below (short version first, complete version after), which loops over lots of records in chronological order. The number of records ranges from hundreds of thousands to millions. I need to compare the time interval between successive records and determine the difference in minutes to decide on some action and set a value. This is the performance bottleneck of the whole application, so I need to do something. The profiler clearly shows that
(DayList[nextIndex].ThisDate - entry.ThisDate).Minutes
is the bottleneck of the bottleneck. When this is solved, the next bottleneck will be the date call in the DayList creation:
List<MonthfileValue> DayList = thisList.Where(x => x.ThisDate.Date == i.Date).ToList();
Those two lines roughly take 60% - 70% of all CPU.
So the question is: how can I increase performance (dramatically) or should I abandon this road completely (because this performance is unacceptable)?
for (DateTime i = startdate; i <= enddate; i = i.AddDays(1))
{
    int nextIndex = 0;
    List<MonthfileValue> DayList = thisList.Where(x => x.ThisDate.Date == i.Date).ToList();
    foreach (MonthfileValue entry in DayList)
    {
        if (++nextIndex < DayList.Count - 1)
        {
            IntervalInMinutes = (DayList[nextIndex].ThisDate - entry.ThisDate).Minutes;
        }
        // do some calculations
    }
    // do some calculations
}
The complete version is below:
for (DateTime i = startdate; i <= enddate; i = i.AddDays(1))
{
    int nextIndex = 0;
    DaySolarValues tmp = new DaySolarValues();
    List<MonthfileValue> DayList = thisList.Where(x => x.ThisDate.Date == i.Date).ToList();
    foreach (MonthfileValue entry in DayList)
    {
        if (++nextIndex < DayList.Count - 1)
        {
            OldIntervalInMinutes = IntervalInMinutes;
            IntervalInMinutes = (DayList[nextIndex].ThisDate - entry.ThisDate).Minutes;
            if (IntervalInMinutes > 30)
            {
                IntervalInMinutes = OldIntervalInMinutes; // reset the value and try again
                continue; // If more than 30 minutes, then skip this data
            }
            else if (IntervalInMinutes != OldIntervalInMinutes)
            {
                // Log some message and continue
            }
        }
        tmp.SolarHours += entry.SolarRad / entry.SolarTheoreticalMax >= SunThreshold ? IntervalInMinutes : 0;
        tmp.SolarEnergy += entry.SolarRad * IntervalInMinutes * 60;
        tmp.SunUpTimeInMinutes += IntervalInMinutes;
    }
    tmp.SolarHours /= 60;
    tmp.SolarEnergy /= 3600;
    tmp.ThisDate = i;
    DailySolarValuesList.Add(tmp);
}
I can clearly see that the Where(...) call steals performance.
For me, the first step would be to try this:
var dayLookup = thisList.ToLookup(x => x.ThisDate.Date);
for (DateTime currentDate = startdate; currentDate <= enddate; currentDate = currentDate.AddDays(1))
{
    int nextIndex = 0;
    List<MonthfileValue> DayList = dayLookup[currentDate].ToList();
    ...
}
This way you create a hash lookup once, before the loop, so getting the DayList becomes a much cheaper operation.
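One more thing worth checking in the bottleneck line the profiler flagged: TimeSpan.Minutes is only the minutes component of the interval, whereas TotalMinutes is the whole interval expressed in minutes; for records more than an hour apart the two differ. A quick illustration:

```csharp
// 90 minutes between two readings:
var gap = new DateTime(2021, 9, 7, 12, 30, 0) - new DateTime(2021, 9, 7, 11, 0, 0);
Console.WriteLine(gap.Minutes);      // 30 (minutes component of hh:mm:ss only)
Console.WriteLine(gap.TotalMinutes); // 90 (full interval)
```

Whether that matters here depends on whether gaps over an hour can occur in the data; the > 30 check in the complete version suggests they normally should not.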
My objective is to populate a combo box with time intervals of 30 min covering 24 hours, i.e. 12.00am, 12.30am, 1.00am, 1.30am, and so on. I need to know how to put these values into an array. Thank you.
Perhaps:
string[] comboboxDataSource = Enumerable.Range(0, 2 * 24)
    .Select(min => DateTime.Today.AddMinutes(30 * min).ToString("h.mmtt", CultureInfo.InvariantCulture))
    .ToArray();
One way is to iterate over the day in 30-minute steps and add the DateTime values, with a specific string representation, to your list. Like:
List<string> list = new List<string>();
DateTime start = DateTime.Today;
DateTime end = DateTime.Today.AddDays(1);
while (end > start)
{
    list.Add(start.ToString("h.mmtt", CultureInfo.InvariantCulture));
    start = start.AddMinutes(30);
}
If you want them as an array, just use list.ToArray(). Also, the time designators in the .NET Framework are mostly (I haven't checked all of them) upper case. That means you will get AM or PM when you use the tt specifier, not am or pm. In that case, you need to replace these values with their lower-case equivalents.
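One simple way to do that replacement, assuming the list-building loop above, is to lower-case the formatted string as it is added:

```csharp
// "1.30PM" becomes "1.30pm"
list.Add(start.ToString("h.mmtt", CultureInfo.InvariantCulture).ToLowerInvariant());
```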
I don't know exactly what you mean. I would start with something like this:
private IEnumerable<TimeSpan> Get30MinuteIntervalls()
{
    var currentValue = TimeSpan.Zero;
    while (currentValue <= TimeSpan.FromHours(24))
    {
        yield return currentValue;
        currentValue = currentValue.Add(TimeSpan.FromMinutes(30));
    }
}
var values = Get30MinuteIntervalls().ToArray();
Try:
var d = new DateTime();
d = d.Date.AddHours(0).AddMinutes(0);
for (int i = 0; i < 48; i++)
{
    d = d.AddMinutes(30);
    cbo.AddItem(d.TimeOfDay.ToString());
}
Let's say we're tracking the times when a user is performing a certain action, and we want to know the average time between said actions.
For example, if the user performed this action at these times:
today, 1 PM
today, 3 PM
today, 6 PM
The result would be 2.5 hours.
I actually have solved this already, but I felt my solution was more complicated than necessary. I'll post it as an answer.
It seems that you are basically looking for Max - Min divided by (Count - 1).
public TimeSpan? Average
{
    get
    {
        var diff = _dateTimes.Max().Subtract(_dateTimes.Min());
        var avgTs = TimeSpan.FromMilliseconds(diff.TotalMilliseconds / (_dateTimes.Count() - 1));
        return avgTs;
    }
}
Make sure you check that there is more than one DateTime.
Update: Even more accurate if you use Ticks.
TimeSpan.FromTicks(diff.Ticks / (_dateTimes.Count() - 1));
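Putting the guard and the Ticks variant together, the whole property might look like this (a sketch, reusing the _dateTimes field from the question's own solution):

```csharp
public TimeSpan? Average
{
    get
    {
        int count = _dateTimes.Count();
        if (count < 2) return null; // an average interval needs at least two timestamps

        long diffTicks = _dateTimes.Max().Ticks - _dateTimes.Min().Ticks;
        return TimeSpan.FromTicks(diffTicks / (count - 1));
    }
}
```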
I recently had a similar task where I had a long-running operation iterating over thousands of rows, with 20-30 iterations within each.
void LongRunningOperation()
{
    int r = 5000;
    int sR = 20;
    List<TimeSpan> timeSpanList = new List<TimeSpan>();
    for (int i = 0; i < r; i++)
    {
        DateTime n = DateTime.Now; // Gets start time of this iteration.
        for (int x = 0; x < sR; x++)
        {
            // DOING WORK HERE
        }
        timeSpanList.Add(DateTime.Now - n); // Gets the duration of this iteration and adds it to the list.
        double avg = timeSpanList.Select(ts => ts.TotalSeconds).Average(); // Use LINQ to get an average of the TimeSpan durations.
        TimeSpan timeRemaining = TimeSpan.FromSeconds((r - i) * avg);
        // Calculate time remaining: the number of rows left multiplied by the average duration.
        UpdateStatusLabel(timeRemaining);
    }
}
This is how I solved it, but I don't like it much:
public class HistoryItem
{
    private IEnumerable<DateTime> _dateTimes;

    public TimeSpan? Average
    {
        get
        {
            TimeSpan total = default(TimeSpan);
            DateTime? previous = null;
            int quotient = 0;
            var sortedDates = _dateTimes.OrderBy(x => x);
            foreach (var dateTime in sortedDates)
            {
                if (previous != null)
                {
                    total += dateTime - previous.Value;
                }
                ++quotient;
                previous = dateTime;
            }
            // Divide by the number of intervals (quotient - 1), not the number of dates,
            // so that 1 PM / 3 PM / 6 PM yields the expected 2.5 hours.
            return quotient > 1 ? TimeSpan.FromMilliseconds(total.TotalMilliseconds / (quotient - 1)) as TimeSpan? : null;
        }
    }
}