How to group a time series by interval (OHLC bars) with LINQ - c#

I've seen variations on this question before, but without a definitive answer.
I have a list of objects with a timestamp (stock trade data, or 'ticks'):
class Tick
{
    public DateTime Timestamp;
    public double Price;
}
I want to generate another list based on those values, grouped by a specified interval, in order to create OHLC bars (Open, High, Low, Close).
These bars may be of any interval specified (1 minute, 5, 10 or even 1 hour).
I also need an efficient way to sort new "ticks" into the list, as they may arrive at a high rate (3-5 ticks per second).
Would appreciate any thoughts on this, thanks!

I want to generate another list based on those values which is grouped by certain interval in order to create an OHLC bar (Open, High, Low, Close). These bars may be of any interval specified (1 minute, 5, 10 or even 1 hour).
Unfortunately, you haven't specified:
What the phase of the bar-series will be.
Whether a bar's begin / end times are purely "natural-time" based (depending solely on a fixed schedule rather than on the timestamps of the first and last ticks in it) or not.
Assuming natural-time intra-day bars, the phases are usually clamped to midnight. So hourly bars will be 00:00 - 01:00, 01:00 - 02:00, etc. In this case, the begin / end-time of a bar can serve as its unique-key.
So the problem becomes: to which bar begin / end time does a tick's timestamp belong? If we assume everything I've assumed above, that can be solved easily with some simple integer math. The query can then be something like (untested, algo only):
var bars = from tick in ticks
           // Calculate the chronological, natural-time, intra-day index
           // of the bar associated with a tick.
           let barIndexForDay = tick.Timestamp.TimeOfDay.Ticks / barSizeInTicks
           // Calculate the begin-time of the bar associated with a tick.
           // For example, turn 2011/04/28 14:23.45
           // into 2011/04/28 14:20.00, assuming 5 min bars.
           let barBeginDateTime = tick.Timestamp.Date.AddTicks(barIndexForDay * barSizeInTicks)
           // Produce raw tick-data for each bar by grouping.
           group tick by barBeginDateTime into tickGroup
           // Order prices for a group chronologically.
           let orderedPrices = tickGroup.OrderBy(t => t.Timestamp)
                                        .Select(t => t.Price)
           select new Bar
           {
               Open = orderedPrices.First(),
               Close = orderedPrices.Last(),
               High = orderedPrices.Max(),
               Low = orderedPrices.Min(),
               BeginTime = tickGroup.Key,
               EndTime = tickGroup.Key.AddTicks(barSizeInTicks)
           };
It's common to want to locate a bar by index / date-time as well as to enumerate all bars in a series chronologically. In this case, you might want to consider storing the bars in a collection such as a SortedList<DateTime, Bar> (where the key is a bar's begin or end time), which will fill all these roles nicely.
I also need to find an efficient way to sort new "ticks" into the list, as they may arrive at high rate (3-5 ticks per second).
It depends on what you mean.
If these ticks are coming off a live price-feed (chronologically), you don't need a look-up at all - just store the current, incomplete, "partial" bar. When a new tick arrives, inspect its timestamp. If it is still part of the current "partial" bar, just update the bar with the new information (i.e. Close = tick.Price, High = Max(oldHigh, tick.Price) etc.). Otherwise, the "partial" bar is done - push it into your bar-collection. Do note that if you are using "natural-time" bars, the end of a bar could also be brought on by the passage of time rather than by a price-event (e.g. an hourly bar completes on the hour).
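A minimal sketch of that live-update path (assuming a Bar class with settable fields matching the query above and the Tick class from the question; a bar completion driven purely by the passage of time would additionally need a timer):

// requires: using System; using System.Collections.Generic;
// Sketch only: maintain one in-progress bar and push it into the bar
// collection when a tick falls outside the current interval.
void OnTick(Tick tick, long barSizeInTicks,
            SortedList<DateTime, Bar> completedBars, ref Bar partialBar)
{
    // Begin-time of the natural-time bar this tick belongs to.
    long barIndexForDay = tick.Timestamp.TimeOfDay.Ticks / barSizeInTicks;
    DateTime barBegin = tick.Timestamp.Date.AddTicks(barIndexForDay * barSizeInTicks);

    if (partialBar == null || partialBar.BeginTime != barBegin)
    {
        // The previous partial bar (if any) is complete - store it.
        if (partialBar != null)
            completedBars[partialBar.BeginTime] = partialBar;

        partialBar = new Bar
        {
            BeginTime = barBegin,
            EndTime = barBegin.AddTicks(barSizeInTicks),
            Open = tick.Price,
            High = tick.Price,
            Low = tick.Price,
            Close = tick.Price
        };
    }
    else
    {
        // Still inside the current bar - just update it.
        partialBar.High = Math.Max(partialBar.High, tick.Price);
        partialBar.Low = Math.Min(partialBar.Low, tick.Price);
        partialBar.Close = tick.Price;
    }
}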
EDIT:
Otherwise, you'll need to do a lookup. If you're storing the bars in a sorted list (keyed by begin-time / end-time) as I've mentioned above, then you'll just need to calculate the bar begin-time / end-time associated with a tick. That should be easy enough; I've already given you a sample of how you might accomplish that in the LINQ query above.
For example:
myBars[GetBeginTime(tick.Timestamp)].Update(tick);
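GetBeginTime isn't shown in the answer; a hedged sketch of what it might look like, reusing the integer math from the LINQ query (the extra barSizeInTicks parameter is an assumption - it could just as well be a field):

// Sketch: map a tick timestamp to the begin-time of its natural-time bar.
static DateTime GetBeginTime(DateTime timestamp, long barSizeInTicks)
{
    long barIndexForDay = timestamp.TimeOfDay.Ticks / barSizeInTicks;
    return timestamp.Date.AddTicks(barIndexForDay * barSizeInTicks);
}

// Usage against a SortedList<DateTime, Bar> keyed by bar begin-time:
// myBars[GetBeginTime(tick.Timestamp, barSizeInTicks)].Update(tick);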

Related

Are DateTimes passed by reference in C#? If not, why is my object updating as I change a variable?

I have run into a problem which I simply cannot get my head around. When I debug the program, I can see it works fine - this is the strange part.
The issue I am facing is that when I add the new object to a List, it seems to completely change. Let me explain better by showing my code.
System.DateTime timeNow = System.DateTime.Now;
List<Train> trainsOnNet = new List<Train>();

for (int i = 1; i <= 3; i++)
{
    Train t = new Train();
    t.NewCarrige(true, "A"[0]);
    t.NewCarrige(false, "B"[0]);

    t.addStation(App.Stations.FirstOrDefault(x => x.GetShortCode().Equals("MTG")));
    t.addStation(App.Stations.FirstOrDefault(x => x.GetShortCode().Equals("BNS")));
    t.addStation(App.Stations.FirstOrDefault(x => x.GetShortCode().Equals("BSH")));

    int minsToAdd = 5;
    t.GetStations().ForEach(x =>
    {
        timeNow = timeNow.AddMinutes(minsToAdd);
        x.SetArrivalTime(timeNow);
        minsToAdd += 10;
    });

    timeNow = timeNow.AddMinutes(15);
    trainsOnNet.Add(t);
}
When I add t to the trainsOnNet list, the times seem to change to the last times that the loop will generate, even before it generates them.
If I place a breakpoint on this line, I can see that the t instance has the correct time values (i.e. 5 minutes from the current execution time and then 10 minutes between each). However, when I then press continue and inspect the trainsOnNet list, the times have been changed to the next train's set of times.
It appears to me that timeNow is being passed by reference, and as the loop increases the time, the stored time updates until I am left with 3 trains all showing the same times.
For example, if I execute the program at 19:51 with a breakpoint on trainsOnNet.Add(t), I can see that t holds 3 stations, in which the first station's arrival time is 19:56 and the next two are 20:11 and 20:26. Perfect. I then hit continue and inspect the newly appended object in my list. On inspection, the t instance's arrival time properties have now changed to:
20:56, 21:11, 21:36
Doing the maths of my code, the next train should arrive 20 minutes after the previous train has arrived at its end station, which brings me to 20:46. That makes it even more confusing why the first train is being changed past the second train's expected times, let alone being updated without being told to.
Any ideas would be greatly appreciated.
Breakpoint on the line on the first execution (screenshot omitted).
Breakpoint on the same line, after pressing continue (screenshot omitted, properties changed).
As you can see, the values are being changed - in this case, by a whole hour.
System.DateTime is a "value type". You can check that in IntelliSense in Visual Studio or in the documentation: it is declared as a struct, not a class (classes are reference types).
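A quick illustration of value-type copy semantics (not from the original post):

DateTime a = DateTime.Now;
DateTime b = a;           // b gets a copy of the value, not a reference
b = b.AddMinutes(30);     // changing the copy...
Console.WriteLine(a);     // ...does not affect the original
Console.WriteLine(b);     // 30 minutes later than a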
Your problem, however, is not really about this.
It's not clear exactly what you intend to do, and the flaw is probably in your logic implementation.
1 - You reuse the same variable timeNow for all iterations; it's hard (at least for me) to keep track of what it is supposed to be and what its value actually is after a few "mental" iterations.
2 - You set minsToAdd to 5, then increase it by 10 in the loop. So, after a few iterations of the inner loop, minsToAdd takes these values:
15, 25, 35, 45, etc...
Then you add this to timeNow. So, imagine that timeNow is 00:00 at the beginning; in the inner loop it takes the values
00:05,
00:05 + 15min = 00:20,
00:20 + 25min = 00:45,
00:45 + 35min = 01:20,
etc...
and after the inner loop you add 15 min to the last value.
Is that what you expect?
Try to change these two lines:
timeNow = timeNow.AddMinutes(minsToAdd);
x.SetArrivalTime(timeNow);
To just:
x.SetArrivalTime(timeNow.AddMinutes(minsToAdd));
As others explained, you keep overwriting the shared timeNow variable on each iteration. This change should give you the expected result.

Modify code to get synthetic data that trends smoothly from bull to bear market cycles

I have this class that generates synthetic-looking (stock) data and it works fine. However, I want to modify it so that NewPrice generates smoothly trending data for, say, n bars.
I know that if I reduce the volatility, I get smoother prices. However, I'm not sure how to guarantee that the data moves in alternating, persistent trends up and down - a sine-wave-looking thing, but with stock-looking prices, i.e. no negative prices.
Price = Trend + Previous Price + Random Component. I am missing the trend component in the implementation below.
Any suggestions?
class SyntheticData
{
    public static double previous = 1.0;

    public static double NewPrice(double volatility, double rnd)
    {
        var change_percent = 2 * volatility * rnd;
        if (change_percent > volatility)
            change_percent -= (2 * volatility);
        var change_amount = previous * change_percent;
        var new_price = previous + change_amount;
        previous = new_price;
        return new_price;
    }
}
SyntheticData.previous = 100.0;
Price = SyntheticData.NewPrice(.03, rnd.NextDouble()),
Exponential smoothing or exponential moving average will create the type of data you want. Ideally, you would have existing stock price data that represents the type of time series that you want to generate. You fit an exponential smoothing model to your data. This will determine a number of parameters for that model. You can then use the model and its parameters to generate similar time series with the same kind of trends, and you can control the volatility (standard deviation) of the random variable associated with the model.
As an example of what you can do: in the image referenced in the original answer (not reproduced here), the blue and yellow parts are real data, and the green part is synthetic data generated with a model that was fit to the real data.
Time series forecasting is a large topic, and I do not know how familiar you are with it. See Time Series Analysis, which covers a large range of time series with clear presentations and examples in Excel, and see exponential smoothing for more theoretical background.
Here is a specific example of how such a time series can be generated. I chose one of the 30 exponential smoothing models, one that has an additive trend and volatility and no seasonal component. The equations for generating the time series (shown as images in the original answer) use the following notation:
The time index is t, an integer. The values of the time series are yt. lt and bt are respectively the offset and slope components of the time series. Alpha and beta are parameters, and l-1 and b-1 are the initial values of the offset and slope components. et is the value of a random variable that follows some distribution, e.g. normal. Alpha and beta must satisfy certain relations for the time series to be stable.
To generate different time series you choose values for alpha, beta, l-1, b-1, and the standard deviation of et (assuming a normal law), and calculate the successive values of yt. I have done this in Excel for several combinations of values and generated several time series with this model. Sigma is the standard deviation (volatility) of et.
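As an illustration only, a commonly used additive-trend, no-seasonality formulation with additive errors (an assumption, not necessarily the exact model the answer used) can be simulated along these lines:

// requires: using System;
// Sketch of a Holt-type (additive trend, additive error) synthetic price series.
// Parameter names follow the answer's description: alpha, beta, initial level l(-1),
// initial slope b(-1), and sigma (volatility of the random error).
class TrendingPriceGenerator
{
    private readonly double alpha, beta, sigma;
    private double level, slope;
    private readonly Random rng = new Random();

    public TrendingPriceGenerator(double alpha, double beta,
                                  double initialLevel, double initialSlope, double sigma)
    {
        this.alpha = alpha; this.beta = beta; this.sigma = sigma;
        level = initialLevel; slope = initialSlope;
    }

    public double Next()
    {
        // Normally distributed error via Box-Muller.
        double u1 = 1.0 - rng.NextDouble();
        double u2 = rng.NextDouble();
        double e = sigma * Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);

        double y = level + slope + e;        // observed price
        level = level + slope + alpha * e;   // update offset (level) component
        slope = slope + beta * e;            // update slope (trend) component
        return Math.Max(y, 0.01);            // crude floor to keep prices positive
    }
}

For example, new TrendingPriceGenerator(0.3, 0.1, 100.0, 0.5, 0.5) starts around 100 with a gentle upward drift; because the slope itself is perturbed by the errors, the series wanders through persistent up and down phases rather than producing pure noise.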
The equations for all 30 models (also shown as an image in the original answer) use this notation: N means no trend / seasonal component, A means an additive component, M means a multiplicative component, and the d subscript indicates a damped variant. You can get all of the details from the references above.
Something like this is what I was looking for:
public static double[] Sine(int n)
{
    const int FS = 64; // sampling rate
    return MathNet.Numerics.Generate.Sinusoidal(n, FS, 1.0, 20.0);
}
Although it is not intuitive for someone who wants to deal in prices and time-based periodicity rather than in mathematical functions.
https://numerics.mathdotnet.com/Generate.html

Calculate event rate per second

I have a game file with millions of events; the file size can be > 10 GB.
Each line is a game action, like:
player 1, action=kill, timestamp=xxxx(ms granularity)
player 1, action=jump, timestamp=xxxx
player 2, action=fire, timestamp=xxxx
Each action is unique and finite for this data set.
I want to perform analysis on this file, like the total number of events per second, while tracking the individual number of actions in that second.
My plan in semi pseudocode:
string line;
var lastReadGameEventTime = DateTime.MinValue;
var datapoints = new Dictionary<string, int>();
while ((line = getNextLine()) != null)
{
    parse_values(line, out var timestamp, out var action);
    if (lastReadGameEventTime == DateTime.MinValue)
    {
        lastReadGameEventTime = timestamp;
    }
    else if (timestamp.Subtract(lastReadGameEventTime).TotalSeconds > 1)
    {
        notify_points_for_this_second(datapoints);
        datapoints = new Dictionary<string, int>();
        lastReadGameEventTime = timestamp;
    }
    if (!datapoints.TryGetValue(action, out var count))
        datapoints[action] = 1;
    else
        datapoints[action] = count + 1;
}
My worry is that this is too naive. I was thinking maybe count the entire minute and get the average per second. But of course I will miss game event spikes.
And if I want to calculate a 5 day average, it will further degrade the result set.
Any clever ideas?
You're asking several different questions here. All are related. Your requirements aren't real detailed, but I think I can point you in the right direction. I'm going to assume that all you want is number of events per second, for some period in the past. So all we need is some way to hold an integer (count of events) for every second during that period.
There are 86,400 seconds in a day. Let's say you want 10 days worth of information. You can build a circular buffer of size 864,000 to hold 10 days' worth of counts:
const int SecondsPerDay = 86400;
const int TenDays = 10 * SecondsPerDay;
int[] TenDaysEvents = new int[TenDays];
So you always have the last 10 days' of counts.
Assuming you have an event handler that reads your socket data and passes the information to a function, you can easily keep your data updated:
DateTime lastEventTime = DateTime.MinValue;
int lastTimeIndex = 0;

void ProcessReceivedEvent(string eventLine)   // "event" is a C# keyword, so use another name
{
    // here, parse the event string to get the DateTime
    DateTime eventTime = GetEventDate(eventLine);
    if (lastEventTime == DateTime.MinValue)
    {
        lastTimeIndex = 0;
    }
    else if (eventTime != lastEventTime)
    {
        // get number of seconds since last event
        var elapsedTime = eventTime - lastEventTime;
        var elapsedSeconds = (int)elapsedTime.TotalSeconds;
        // For each of those seconds, set the number of events to 0
        for (int i = 1; i <= elapsedSeconds; ++i)
        {
            lastTimeIndex = (lastTimeIndex + 1) % TenDays; // wrap around if we get past the end
            TenDaysEvents[lastTimeIndex] = 0;
        }
    }
    lastEventTime = eventTime;   // remember this event's time for the next call
    // Now increment the count for the current time index
    ++TenDaysEvents[lastTimeIndex];
}
This keeps the last 10 days in memory at all times, and is easy to update. Reporting is a bit more difficult because the start might be in the middle of the array. That is, if the current index is 469301, then the starting time is at 469302. It's a circular buffer. The naive way to report on this would be to copy the circular buffer to another array or list, with the starting point at position 0 in the new collection, and then report on that. Or, you could write a custom enumerator that counts back from the current position and starts there. That wouldn't be especially difficult to create.
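A hedged sketch of that enumeration, walking the circular buffer from oldest to newest without copying it (member names reuse the snippet above):

// requires: using System.Collections.Generic;
// Yields (secondsAgo, count) pairs for the last `window` seconds, oldest first.
// Assumes lastTimeIndex currently points at the slot for the most recent second.
IEnumerable<(int SecondsAgo, int Count)> LastCounts(int window)
{
    for (int i = window - 1; i >= 0; --i)
    {
        int index = ((lastTimeIndex - i) % TenDays + TenDays) % TenDays; // wrap backwards safely
        yield return (i, TenDaysEvents[index]);
    }
}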
The beauty of the above is that your array remains static. You allocate it once and just re-use it. You might want to add an extra 60 entries, though, so that there's some "buffer" between the current time and the time from 10 days ago; that will prevent the data from 10 days ago from being changed during a query. Or add an extra 300 items to give yourself a 5-minute buffer.
Another option is to create a linked list of entries. Again, one per second. With that, you add items to the end of the list, and remove older items from the front. Whenever an event comes in for a new second, add the event entry to the end of the list, and then remove entries that are more than 10 days (or whatever your threshold is) from the front of the list. You can still use LINQ to report on things, as recommended in another answer.
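A rough sketch of that alternative (assumed types and names, with a 10-day retention window), using a queue of per-second buckets that can be queried with LINQ:

// requires: using System; using System.Collections.Generic; using System.Linq;
class SecondBucket
{
    public DateTime Second;                  // timestamp truncated to the whole second
    public Dictionary<string, int> ActionCounts = new Dictionary<string, int>();
}

Queue<SecondBucket> buckets = new Queue<SecondBucket>();

void Record(DateTime timestamp, string action)
{
    var second = timestamp.AddTicks(-(timestamp.Ticks % TimeSpan.TicksPerSecond));

    if (buckets.Count == 0 || buckets.Last().Second != second)
        buckets.Enqueue(new SecondBucket { Second = second });

    var bucket = buckets.Last();
    bucket.ActionCounts[action] = bucket.ActionCounts.TryGetValue(action, out var n) ? n + 1 : 1;

    // Drop buckets older than the retention window.
    while (buckets.Peek().Second < second.AddDays(-10))
        buckets.Dequeue();
}

// Example report: average events per second over the retained window.
// double avgPerSecond = buckets.Average(b => b.ActionCounts.Values.Sum());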
You could use a hybrid, too. As each second goes by, write a record to the database, and keep the last minute, or hour, or whatever in memory. That way, you have up-to-the-second data available in memory for quick reports and real-time updates, but you can also use the database to report on any period since you first started collecting data.
Whatever you decide, you probably should keep some kind of database, because you can't guarantee that your system won't go down. In fact, you can pretty much guarantee that your system will go down at some point. It's no fun losing data, or having to scan through terabytes of log data to re-build the data that you've collected over time.

How to disable the booked time slots from a list of business hour time slots

Friends,
I'm working on an appointment booking project. Details are as follows:
Business hours run from 9:00 to 7:00 with a default duration of 30 mins, so the slots start like (9:00, 9:30, 10:00, ..., 7:00).
Here, to show the available slots, I'm using the following logic:
Store all the slots with 30-min duration in a list (LIST A), like [9:00, 9:30, 10:00, 10:30, ..., 7:00].
Loop through the booked appointments (each has a start and end time), and if a start time matches any of LIST A's elements, remove that element from the list; the loop continues.
Here, the problem is: consider an appointment booked 9:30-10:00.
Based on my logic, 9:30 matches a LIST A element, so it is removed from the list.
So the available slots are displayed as [9:00, X, 10:00, 10:30, ..., 7:00]. Actually it should be [9:00, 9:30, 10:30, 11:00, ..., 7:00].
Instead of showing available slots 9:00-9:30 and 10:30-11:00, it shows 9:00-10:00 and 10:30-11:00, since 9:30 is removed from the list.
Please help me solve this, or suggest some alternative approaches for this problem. Badly needed.
The thing you are mixing up is that you are taking the second slot's start time as the first slot's end time. Rather than doing that, what you can do is store the start time and the duration.
And to simply compute the end time, you do
StartTime.AddMinutes(30);
And to add one more comment at the end: you are trying to build a very rigid structure, and you will face problems if you try to extend the application, IMHO.
I suggest, instead of using a single-dimensional array, using a multidimensional (jagged) array like
[[9:00, 9:30], [9:30, 10:00], [10:00, 10:30], ..., nth item]
Here, the logic would be something like this:
var start = [start time]
var end = [end time]
var duration = [duration]
for (i = start; i < end; i += duration)
{
    if (start == A[i][0])
        remove(A[i][0]);
}
A.Sort();
return A;
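As a concrete illustration of the overlap idea (a sketch with assumed types and names, not the answer's exact code), one way to disable every 30-minute slot that a booking covers:

// requires: using System; using System.Collections.Generic; using System.Linq;
// Assumed shapes: slots are identified by their start time; bookings have Start/End.
class Booking { public TimeSpan Start; public TimeSpan End; }

static List<TimeSpan> AvailableSlots(IEnumerable<TimeSpan> allSlotStarts,
                                     IEnumerable<Booking> bookings,
                                     TimeSpan slotDuration)
{
    return allSlotStarts
        // keep a slot only if no booking overlaps the [slot, slot + duration) interval
        .Where(slot => !bookings.Any(b => slot < b.End && b.Start < slot + slotDuration))
        .OrderBy(slot => slot)
        .ToList();
}

With slots every 30 minutes from 9:00 to 19:00 and a booking from 9:30 to 10:00, this leaves 9:00 and 10:00 available and removes only the 9:30 slot.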

Counter of type RateOfCountsPerSecond32 always shows 0

I have a windows service that serves messages of some virtual queue via a WCF service interface.
I wanted to expose two performance counters -
The number of items on the queue
The number of items removed from the queue per second
The first one works fine, the second one always shows as 0 in PerfMon.exe, despite the RawValue appearing to be correct.
I'm creating the counters as such -
internal const string PERF_COUNTERS_CATEGORY = "HRG.Test.GDSSimulator";
internal const string PERF_COUNTER_ITEMSINQUEUE_COUNTER = "# Messages on queue";
internal const string PERF_COUNTER_PNR_PER_SECOND_COUNTER = "# Messages read / sec";
if (!PerformanceCounterCategory.Exists(PERF_COUNTERS_CATEGORY))
{
System.Diagnostics.Trace.WriteLine("Creating performance counter category: " + PERF_COUNTERS_CATEGORY);
CounterCreationDataCollection counters = new CounterCreationDataCollection();
CounterCreationData numberOfMessagesCounter = new CounterCreationData();
numberOfMessagesCounter.CounterHelp = "This counter provides the number of messages exist in each simulated queue";
numberOfMessagesCounter.CounterName = PERF_COUNTER_ITEMSINQUEUE_COUNTER;
numberOfMessagesCounter.CounterType = PerformanceCounterType.NumberOfItems32;
counters.Add(numberOfMessagesCounter);
CounterCreationData messagesPerSecondCounter= new CounterCreationData();
messagesPerSecondCounter.CounterHelp = "This counter provides the number of messages read from the queue per second";
messagesPerSecondCounter.CounterName = PERF_COUNTER_PNR_PER_SECOND_COUNTER;
messagesPerSecondCounter.CounterType = PerformanceCounterType.RateOfCountsPerSecond32;
counters.Add(messagesPerSecondCounter);
PerformanceCounterCategory.Create(PERF_COUNTERS_CATEGORY, "HRG Queue Simulator performance counters", PerformanceCounterCategoryType.MultiInstance,counters);
}
Then, on each service call, I increment the relevant counter; for the per-second counter this currently looks like this -
messagesPerSecCounter = new PerformanceCounter();
messagesPerSecCounter.CategoryName = QueueSimulator.PERF_COUNTERS_CATEGORY;
messagesPerSecCounter.CounterName = QueueSimulator.PERF_COUNTER_PNR_PER_SECOND_COUNTER;
messagesPerSecCounter.MachineName = ".";
messagesPerSecCounter.InstanceName = this.ToString().ToLower();
messagesPerSecCounter.ReadOnly = false;
messagesPerSecCounter.Increment();
As mentioned, if I put a breakpoint after the call to Increment I can see the RawValue constantly increasing, consistent with the calls to the service (fairly frequently, more than once a second, I would think).
But the performance counter itself stays on 0.
The performance counter providing the count of items on the 'queue', which is implemented in the same way (although I assign the RawValue rather than call Increment), works just fine.
What am I missing?
I also initially had problems with this counter. MSDN has a full working example that helped me a lot:
http://msdn.microsoft.com/en-us/library/4bcx21aa.aspx
As their example was fairly long winded, I boiled it down to a single method to demonstrate the bare essentials. When run, I see the expected value of 10 counts per second in PerfMon.
public static void Test()
{
    var ccdc = new CounterCreationDataCollection();

    // add the counter
    const string counterName = "RateOfCountsPerSecond64Sample";
    var rateOfCounts64 = new CounterCreationData
    {
        CounterType = PerformanceCounterType.RateOfCountsPerSecond64,
        CounterName = counterName
    };
    ccdc.Add(rateOfCounts64);

    // ensure category exists
    const string categoryName = "RateOfCountsPerSecond64SampleCategory";
    if (PerformanceCounterCategory.Exists(categoryName))
    {
        PerformanceCounterCategory.Delete(categoryName);
    }
    PerformanceCounterCategory.Create(categoryName, "",
        PerformanceCounterCategoryType.SingleInstance, ccdc);

    // create the counter
    var pc = new PerformanceCounter(categoryName, counterName, false);

    // send some sample data - roughly ten counts per second
    while (true)
    {
        pc.IncrementBy(10);
        System.Threading.Thread.Sleep(1000);
    }
}
I hope this helps someone.
When you are working with Average-type performance counters there are two components - a numerator and a denominator. Because you are working with an average, the counter is calculated as 'x instances per y instances'. In your case you are working out 'number of items' per 'number of seconds'. In other words, you need to count both how many items you take out of the queue and how many seconds they take to be removed.
The Average type Performance Counters actually create two counters - a numerator component called {name} and a denominator component called {name}Base. If you go to the Performance Counter snap-in you can view all the categories and counters; you can check the name of the Base counter. When the queue processing process is started, you should
begin a stopwatch
remove item(s) from the queue
stop the stopwatch
increment the {name} counter by the number of items removed from the queue
increment the {name}Base counter by the number of ticks on the stopwatch
The counter is supposed to automatically know to divide the first counter by the second to give the average rate. Check CodeProject for a good example of how this works.
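A minimal sketch of the numerator/denominator pattern described in the list above, using AverageTimer32 with its AverageBase (average time per dequeued item) as the example; the category and counter names, and DequeueSomeItems, are illustrative assumptions:

// requires: using System.Diagnostics;
var counters = new CounterCreationDataCollection
{
    new CounterCreationData("Avg dequeue time", "Average seconds per dequeued item",
                            PerformanceCounterType.AverageTimer32),
    // The base counter must be added immediately after the counter it supports.
    new CounterCreationData("Avg dequeue time base", "",
                            PerformanceCounterType.AverageBase)
};
if (!PerformanceCounterCategory.Exists("MyQueueCategory"))
    PerformanceCounterCategory.Create("MyQueueCategory", "",
                                      PerformanceCounterCategoryType.SingleInstance, counters);

var avg = new PerformanceCounter("MyQueueCategory", "Avg dequeue time", false);
var avgBase = new PerformanceCounter("MyQueueCategory", "Avg dequeue time base", false);

var sw = Stopwatch.StartNew();
int itemsRemoved = DequeueSomeItems();   // hypothetical work being measured
sw.Stop();

avg.IncrementBy(sw.ElapsedTicks);        // numerator: elapsed stopwatch ticks
avgBase.IncrementBy(itemsRemoved);       // denominator: number of items/operations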
It's quite likely that you don't want this type of counter. These Average counters are used to determine how many instances happen per second of operation; e.g. the average number of seconds it takes to complete an order, or to do some complex transaction or process. What you may want is an average number of instances in 'real' time as opposed to processing time.
Consider if you had 1 item in your queue, and it took 1ms to remove, that's a rate of 1000 items per second. But after one second you have only removed 1 item (because that's all there is) and so you are processing 1 item per second in 'real' time. Similarly, if there are a million items in the queue but you've only processed one because your server is busy doing some other work, do you want to see 1000 items / second theoretical or 1 item / second real?
If you want this 'real' figure, as opposed to the theoretical throughput figure, then this scenario isn't really suited to performance counters - instead you need to know a start time, and end time, and a number of items processed. It can't really be done with a simple 'counter'. Instead you would store a system startup time somewhere, and calculate (number of items) / (now - startup time).
I had the same problem. In my testing, I believe the issue was some combination of multi-instance and rate-of-counts-per-second. If I used a single-instance or a number-of-items counter, it worked. Something about that combination of multi-instance and rate per second caused it to always read zero.
As "Performance counter of type RateOfCountsPerSecond64 has always the value 0" mentions, a reboot may do the trick. Worked for me, anyway.
Another thing that worked for me was initializing the counter in a block like this:
counter.BeginInit();
counter.RawValue = 0;
counter.EndInit();
I think you need some way to persist the counter. It appears to me that each time the service call is initiated then the counter is recreated.
So you could save the counter to a DB, flat file, or perhaps even a session variable if you wanted it unique to a user.
