Performance issues with Frame windowing vs. Series - C#

I am using Deedle from C#, and windowing over a frame is very slow compared with the same operation on a series. For example, for a series and a frame of similar size I am seeing 60 ms vs. 3,500 ms (series vs. frame).
Has anyone seen this before?
var msftRaw = Frame.ReadCsv(@"C:\Users\olivi\source\repos\ConsoleApp\MSFT.csv");
var msft = msftRaw.IndexRows<DateTime>("Date").SortRowsByKey();
var rollingFrame = msft.Window(60); // 7700 ms
var openSeries = msft.GetColumn<double>("Open");
var rollingSeries = openSeries.Window(60); // 14 ms
var oneSeriesFrame = Frame.FromColumns(new Dictionary<string, Series<DateTime, double>> { { "Open", openSeries } });
var rollingFakeFrame = oneSeriesFrame.Window(60); // 3300 ms
This is quite a common operation when working with financial time series data, for example calculating rolling correlation between prices, or calculating rolling realized volatility when there is a condition on another price time series.
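Rolling statistics like these do not strictly require Deedle's frame windowing. As a point of comparison, here is a minimal stdlib-only C# sketch of a rolling Pearson correlation over two aligned price arrays; the data and window size are made up for illustration, and neither Deedle nor MathNet is used:

```csharp
using System;
using System.Linq;

class RollingCorrelationSketch
{
    // Pearson correlation of two equal-length samples.
    public static double Pearson(double[] x, double[] y)
    {
        double mx = x.Average(), my = y.Average();
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            cov += (x[i] - mx) * (y[i] - my);
            vx += (x[i] - mx) * (x[i] - mx);
            vy += (y[i] - my) * (y[i] - my);
        }
        return cov / Math.Sqrt(vx * vy);
    }

    // Rolling correlation over a window of `size` observations.
    public static double[] RollingCorrelation(double[] x, double[] y, int size)
    {
        var result = new double[x.Length - size + 1];
        for (int i = 0; i < result.Length; i++)
            result[i] = Pearson(x.Skip(i).Take(size).ToArray(),
                                y.Skip(i).Take(size).ToArray());
        return result;
    }

    static void Main()
    {
        // Hypothetical price series; y is an exact multiple of x,
        // so every window has correlation 1.
        var x = new double[] { 1, 2, 3, 4, 5, 6 };
        var y = new double[] { 2, 4, 6, 8, 10, 12 };
        Console.WriteLine(string.Join(", ", RollingCorrelation(x, y, 3)));
    }
}
```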

I found a workaround for the performance issue: perform the rolling operation on each series individually, join the rolling series in a frame so they are aligned by date, and write the processing function on the frame, selecting each series inside the processing function.
Continuing from the example above:
private static double CalculateRealizedCorrelation(ObjectSeries<string> objectSeries)
{
    var openSeries = objectSeries.GetAs<Series<DateTime, double>>("Open");
    var closeSeries = objectSeries.GetAs<Series<DateTime, double>>("Close");
    return MathNet.Numerics.Statistics.Correlation.Pearson(openSeries.Values, closeSeries.Values);
}
var rollingAgg = new Dictionary<string, Series<DateTime, Series<DateTime, double>>>();
foreach (var column in msft.ColumnKeys)
{
    // Window each series individually; this is the fast path.
    rollingAgg[column] = msft.GetColumn<double>(column).Window(60);
}
var rollingDf = Frame.FromColumns(rollingAgg);
var rollingCorr = rollingDf.Rows.Select(kvp => CalculateRealizedCorrelation(kvp.Value));

Related

C# AutoML regression training won't show RSquared or MeanAbsoluteError

I am using C# AutoML to train a regression model, and I can't see the RSquared or MeanAbsoluteError for any of the algorithms.
// loading train data through TextLoader
var trainData = loader.Load(_filePath);
Console.WriteLine("created train data");
var settings = new RegressionExperimentSettings
{
    MaxExperimentTimeInSeconds = 10,
    //OptimizingMetric = RegressionMetric.MeanAbsoluteError
};
var progress = new Progress<RunDetail<RegressionMetrics>>(p =>
{
    if (p.ValidationMetrics != null)
    {
        Console.WriteLine($"Current Result - {p.TrainerName}, {p.ValidationMetrics.RSquared}, {p.ValidationMetrics.MeanAbsoluteError}");
    }
});
var experiment = context.Auto().CreateRegressionExperiment(settings);
// find best model
var labelColumnInfo = new ColumnInformation()
{
    LabelColumnName = "median_house_value"
};
var result = experiment.Execute(trainData, labelColumnInfo, progressHandler: progress);
Console.WriteLine(Environment.NewLine);
Console.WriteLine("Best run:");
Console.WriteLine($"Trainer name - {result.BestRun.TrainerName}");
Console.WriteLine($"RSquared - {result.BestRun.ValidationMetrics.RSquared}");
Console.WriteLine($"MAE - {result.BestRun.ValidationMetrics.MeanAbsoluteError}");
Console.ReadLine();
When I run the console application, the metrics I get are 0, -Infinity, or NaN.
I've gotten similar results when my dataset has been too small.
AutoML uses 10-fold cross-validation, if I recall correctly. This can lead to the test data set being too small to get any usable metrics out of it.
So if your dataset is small, you could try with a bigger one and see if you get better metrics, at least to rule out that case.
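The degenerate values in the question fall out of how R² is computed: R² = 1 - SS_res/SS_tot, and a validation fold whose labels are all (nearly) identical has SS_tot = 0, so the division produces NaN or -Infinity. A stdlib-only sketch with made-up numbers, not ML.NET internals:

```csharp
using System;
using System.Linq;

class RSquaredSketch
{
    // R-squared = 1 - SS_res / SS_tot. When all actuals are equal,
    // SS_tot is zero and the result degenerates to NaN or -Infinity.
    public static double RSquared(double[] actual, double[] predicted)
    {
        double mean = actual.Average();
        double ssTot = actual.Sum(a => (a - mean) * (a - mean));
        double ssRes = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
        return 1 - ssRes / ssTot;
    }

    static void Main()
    {
        // A healthy fold: varied labels, decent fit.
        Console.WriteLine(RSquared(new[] { 1.0, 2.0, 3.0 }, new[] { 1.1, 1.9, 3.2 }));
        // A degenerate fold: identical labels, so SS_tot = 0.
        Console.WriteLine(RSquared(new[] { 2.0, 2.0 }, new[] { 1.5, 2.5 })); // -Infinity
    }
}
```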

Accumulate values in chart series - WPF DevExpress

I am creating several line series for a chart control in DevExpress at run-time. The series must be created at run-time since the number of series varies with the data query I run. Here is how I create the series:
foreach (var item in lstSPCPrintID)
{
    string seriesName = Convert.ToString(item);
    LineSeries2D series = new LineSeries2D();
    dxcSPCDiagram.Series.Add(series);
    series.DisplayName = seriesName;
    var meas = from x in lstSPCChart
               where x.intSPCPrintID == item
               select new { x.intSPCMeas };
    foreach (var item2 in meas)
    {
        series.Points.Add(new SeriesPoint(item2.intSPCMeas));
    }
}
This happens inside a BackgroundWorker completed event handler, and all the data needed is in the appropriate lists. In the test instance I am running, six series are created.
Each series consists of some test measurements that I need on the x-axis. These measurements can be the same value (and are the same value in a lot of cases). What I want, then, is for the y-axis to contain the count of how many times a measurement is, for example, -21. This will in the end create a curve.
Right now I create a series point for each measurement, but I do not know how to handle the ArgumentDataMember/ValueDataMember in this specific scenario. Is there a way for the chart to do the counting automatically, or do I need to do it manually? Can anyone help me back on track?
I ended up doing a distinct count of the measurements before adding the series points.
foreach (var item in lstSPCPrintID)
{
    string seriesName = String.Format("Position: {0}", Convert.ToString(item));
    LineStackedSeries2D series = new LineStackedSeries2D();
    series.ArgumentScaleType = ScaleType.Numerical;
    series.DisplayName = seriesName;
    series.SeriesAnimation = new Line2DUnwindAnimation();
    var meas = from x in lstSPCChart
               where x.intSPCPrintID == item
               select new { x.dblSPCMeas };
    var measDistinctCount = meas
        .GroupBy(x => x.dblSPCMeas)
        .Select(group => new { Meas = group.Key, Count = group.Count() })
        .OrderBy(y => y.Meas);
    foreach (var item2 in measDistinctCount)
    {
        series.Points.Add(new SeriesPoint(item2.Meas, item2.Count));
    }
    dxcSPCDiagram.Series.Add(series);
    series.Animate();
}
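The counting step can be separated from the charting code entirely. A stdlib-only sketch of the same GroupBy/Count/OrderBy pipeline on made-up measurement values:

```csharp
using System;
using System.Linq;

class DistinctCountSketch
{
    // Bucket identical measurements, count each bucket,
    // and order ascending so the x-axis reads left to right.
    public static (double Meas, int Count)[] DistinctCounts(double[] measurements)
    {
        return measurements
            .GroupBy(m => m)
            .Select(g => (Meas: g.Key, Count: g.Count()))
            .OrderBy(p => p.Meas)
            .ToArray();
    }

    static void Main()
    {
        // Hypothetical measurements; -21 occurs three times.
        var measurements = new double[] { -21, -20, -21, -19, -21, -20 };
        foreach (var p in DistinctCounts(measurements))
            Console.WriteLine($"{p.Meas}: {p.Count}"); // -21: 3, -20: 2, -19: 1
    }
}
```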

Optimize or Shard Large set of permutations

I have a data set for which I generate every permutation and then check some properties on each one to see if it is an object that I want to keep and use. The number of permutations is staggering, in the quadrillions. Is there anything you can see in the code below that I can use to speed this up? I suspect I can't get it down to a reasonable amount of time, so I'm also looking at sharding the work onto multiple servers, but I'm having a hard time deciding where to shard it.
Any opinions or ideas are appreciated.
// Materialize each collection so Count and repeated enumeration are cheap.
var boats = _warMachineRepository.AllBoats().ToList();
var marines = _warMachineRepository.AllMarines().ToList();
var bombers = _warMachineRepository.AllBombers().ToList();
var carriers = _warMachineRepository.AllCarriers().ToList();
var tanks = _warMachineRepository.AllTanks().ToList();
var submarines = _warMachineRepository.AllSubmarines().ToList();
var armies = new List<Army>();
var lockObject = new object();
long processed = 0;
Console.WriteLine((long)boats.Count * marines.Count * bombers.Count * carriers.Count * tanks.Count * submarines.Count);
// 70k of these
Parallel.ForEach(boats, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, boat =>
{
    // 7500 of these
    foreach (var marine in marines)
    {
        // 200 of these
        foreach (var bomber in bombers)
        {
            // 200 of these
            foreach (var carrier in carriers)
            {
                // 400 of these
                foreach (var tank in tanks)
                {
                    // 50 of these
                    foreach (var submarine in submarines)
                    {
                        var lineup = new Army
                        {
                            Tank = tank,
                            Submarine = submarine,
                            Carrier = carrier,
                            Marine = marine,
                            Bomber = bomber,
                            Boats = boat
                        };
                        if (lineup.Hitpoints > 50000)
                        {
                            lock (lockObject)
                            {
                                armies.Add(lineup);
                            }
                        }
                        // Interlocked keeps the counter correct across parallel iterations.
                        if (Interlocked.Increment(ref processed) % 10000000 == 0)
                        {
                            Console.WriteLine("Processed: {0}, valid: {1}, DateTime: {2}", processed, armies.Count, DateTime.Now);
                        }
                    }
                }
            }
        }
    }
});
return armies;
If this code is referring to a simulation, you might want to add some optimizations:
- Mark an object as changed (put it in a list) when it changes, so there is no need to search multiple times.
- Decrease/throttle/tune the object update frequency.
- Use other information available to filter objects: are objects close enough to one another that they might affect/hurt/heal each other? Only then investigate changes.
- Change the data structure: by putting all attributes of all objects in a smartly set-up matrix, you might be able to use simple matrix multiplication to make the objects interact. You might even be able to offload the multiplication to the GPU.
- You might be asking too much, so scale out by using more nodes/machines.
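One concrete way to apply the filtering idea is to prune whole subtrees of the loop nest: if even the best possible choice at the inner levels cannot push the partial total over the hitpoint threshold, skip those iterations entirely. A simplified two-level sketch with hypothetical unit hitpoints, not the original repository types:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class PruningSketch
{
    // Keep (boat, tank) pairs whose combined hitpoints exceed the threshold,
    // skipping the inner loop when no tank could possibly qualify.
    public static List<(int Boat, int Tank)> Filter(int[] boats, int[] tanks, int threshold)
    {
        var kept = new List<(int, int)>();
        int bestTank = tanks.Max(); // best possible contribution from the inner level
        foreach (var boat in boats)
        {
            // Prune: even the strongest tank cannot push this boat over the line.
            if (boat + bestTank <= threshold) continue;
            foreach (var tank in tanks)
                if (boat + tank > threshold)
                    kept.Add((boat, tank));
        }
        return kept;
    }

    static void Main()
    {
        // Made-up numbers: only combinations over 50 survive.
        var result = Filter(new[] { 10, 40, 60 }, new[] { 5, 15 }, 50);
        Console.WriteLine(result.Count); // 3
    }
}
```

With five inner levels, precomputing the best possible remaining total per level prunes far earlier; sorting each list descending and breaking out of a loop once qualification becomes impossible helps further.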

Microsoft .NET Charting components break when doing a Simple Moving Average on a stock chart

I am creating a simple Windows Forms application that will display a stock chart using the Microsoft .NET Chart Controls library. I can successfully display the stock data using LINQ to SQL:
IEnumerable<Quote> quotes = Store.Quotes.Where(q => q.Ticker == "MSFT" && q.Date > DateTime.Parse("7/01/2013"));
MainChart.Series["SeriesTop"].Points.Clear();
MainChart.Series["SeriesTop"].Points.DataBindXY(quotes, "Date", quotes, "Low,High,Open,Close");
MainChart.Series["SeriesTop"].LegendText = quotes.First().Ticker;
However, if I add a simple moving average, I get a big red X instead of a chart, with no exception or other message that would help. This is the line of code that breaks it:
MainChart.DataManipulator.FinancialFormula(FinancialFormula.MovingAverage, "5", "SeriesTop", "AvgTop");
When I use Visual Studio to examine the contents of the "AvgTop" series, it looks OK to me, but the chart won't display.
Thanks,
Ken
I played around with DataManipulator.InsertEmptyPoints (as per http://msdn.microsoft.com/en-us/library/dd456677.aspx) and the big red X went away, but the weekends got filled with empty data, and I didn't want that: I wanted the weekends (and other non-trading days) to disappear from the graph (see Series.IsXValueIndexed = true). So I rolled my own method to align the two data series:
public static void AlignSeries(Series seriesA, Series seriesB)
{
    var aligned = seriesA.Points
        .GroupJoin(seriesB.Points, a => a.XValue, b => b.XValue,
            (a, b) => new { a = a, b = b.SingleOrDefault() })
        .ToArray();
    DataPointCollection bCollection = seriesB.Points;
    bCollection.Clear();
    foreach (var pair in aligned)
    {
        DataPoint bPoint = new DataPoint();
        bPoint.XValue = pair.a.XValue;
        if (null != pair.b)
        {
            bPoint.YValues = pair.b.YValues;
        }
        else
        {
            bPoint.IsEmpty = true;
        }
        bCollection.Add(bPoint);
    }
}
I would certainly hope that someone with more wisdom than me could recommend a better approach or an API call that I missed.
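The same GroupJoin alignment pattern can be shown on plain keyed tuples, independent of the charting types. A stdlib-only sketch with made-up data, where a missing key comes back as null rather than DataPoint.IsEmpty:

```csharp
using System;
using System.Linq;

class AlignSketch
{
    // Align series b to series a by key: every x in a appears in the result,
    // with b's y value where present and null where b has no matching point.
    public static (double X, double? BY)[] Align(
        (double X, double Y)[] a, (double X, double Y)[] b)
    {
        return a.GroupJoin(b, p => p.X, q => q.X,
                (p, matches) => (X: p.X, BY: matches.Select(q => (double?)q.Y).SingleOrDefault()))
            .ToArray();
    }

    static void Main()
    {
        var a = new[] { (X: 1.0, Y: 10.0), (X: 2.0, Y: 20.0), (X: 3.0, Y: 30.0) };
        var b = new[] { (X: 1.0, Y: 0.5), (X: 3.0, Y: 0.7) }; // no point at x = 2
        foreach (var (x, by) in Align(a, b))
            Console.WriteLine($"x={x}, b={(by?.ToString() ?? "empty")}");
    }
}
```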

How can I create an even distribution of objects with different weights (scheduling)?

Here is my problem in English:
I've got several WidgetContainer objects.
Each WidgetContainer will have at least one Widget.
Each WidgetContainer wants to display one of its Widgets n amount of times per day.
Widgets could be displayed on 'x' number of Venues.
A Widget is displayed for exactly t seconds before the next scheduled WidgetContainer's Widget takes its place.
If the entire day's schedule is not filled up, nothing should be displayed during those times (ads should be evenly dispersed throughout the day, t seconds at a time).
And here are the objects represented by pseudo code:
var WidgetContainers = [
    {
        DailyImpressionsRequired: 52, // should be split between Venues
        Widgets: ["one", "two"],
        Venues: ["here", "there"]
    },
    {
        DailyImpressionsRequired: 20,
        Widgets: ["foo"],
        Venues: ["here", "there", "everywhere"]
    },
    {
        DailyImpressionsRequired: 78,
        Widgets: ["bar", "bat", "heyhey!"],
        Venues: ["up", "down", "allAround"]
    }
];
var SecondsInADay = 86400;
var DisplayInterval = 30; // seconds
var TotalNumberOfVenues = /* eh, some calculations... */;
var AvailableSlots = /* eh, some calculations... */;
var SlotsNeeded = /* eh, some calculations... */;
I need to find an efficient way of calculating an evenly distributed schedule for these objects. These "objects" are LINQ-to-SQL objects, so some LINQ suggestions would be nice.
My idea right now is to flatten the WidgetContainers to their Widgets, dividing their DailyImpressionsRequired by the number of Widgets.
I could figure it out easily if there weren't multiple and differing Venues to take into account.
I have a feeling I just need to see someone else's perspective on the problem, since I've been staring at it for so long.
So, any help that could point me in the right direction or provide some perspective on the problem, even if it is obvious, would be greatly appreciated!
Based on that lot, if I've understood, this should give you correct answers:
static void Main(string[] args)
{
    List<WidgetContainer> data = new List<WidgetContainer>();
    data.Add(new WidgetContainer {
        Widgets = new List<String> { "one", "two" },
        Venues = new List<String> { "here", "there" },
        DailyImpressionsRequired = 52 });
    data.Add(new WidgetContainer {
        Widgets = new List<String> { "foo" },
        Venues = new List<String> { "here", "there", "everywhere" },
        DailyImpressionsRequired = 20 });
    data.Add(new WidgetContainer {
        Widgets = new List<String> { "bar", "bat", "heyhey!" },
        Venues = new List<String> { "up", "down", "allAround" },
        DailyImpressionsRequired = 78 });

    var SecondsInADay = 86400;
    var DisplayInterval = 30; // seconds
    var TotalNumberOfVenues = data.SelectMany(x => x.Venues).Distinct().Count();
    var AvailableSlots = SecondsInADay * data.SelectMany(x => x.Venues).Distinct().Count() / DisplayInterval; // assuming you didn't already have the count as a variable - will re-evaluate, so don't use this for real!
    //var AvailableSlots = SecondsInADay * TotalNumberOfVenues / DisplayInterval; // the better way - avoids recalculating the count
    var SlotsNeeded = data.Sum(x => x.DailyImpressionsRequired);
}
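Once AvailableSlots and SlotsNeeded are known, one simple way to disperse a container's impressions evenly is to place each one at intervals of totalSlots/needed. A stdlib-only sketch with hypothetical numbers; it ignores the per-Venue split, which would be layered on top:

```csharp
using System;
using System.Linq;

class SpreadSketch
{
    // Spread `needed` impressions as evenly as possible across `totalSlots`
    // time slots, returning the chosen slot indices (assumes needed <= totalSlots).
    public static int[] SpreadEvenly(int needed, int totalSlots)
    {
        double spacing = (double)totalSlots / needed;
        return Enumerable.Range(0, needed)
            .Select(i => (int)(i * spacing))
            .ToArray();
    }

    static void Main()
    {
        // Hypothetical numbers: 5 impressions over a 20-slot day.
        var slots = SpreadEvenly(5, 20);
        Console.WriteLine(string.Join(", ", slots)); // 0, 4, 8, 12, 16
    }
}
```

Running the containers in descending DailyImpressionsRequired order and bumping an impression to the next free slot on collision is one way to merge several containers onto the same timeline.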
