I am using C# AutoML (ML.NET) to train a regression model, and I can't see the RSquared or MeanAbsoluteError for any of the algorithms.
// Load the training data through the TextLoader
var trainData = loader.Load(_filePath);
Console.WriteLine("created train data");

var settings = new RegressionExperimentSettings
{
    MaxExperimentTimeInSeconds = 10,
    //OptimizingMetric = RegressionMetric.MeanAbsoluteError
};

var progress = new Progress<RunDetail<RegressionMetrics>>(p =>
{
    if (p.ValidationMetrics != null)
    {
        Console.WriteLine($"Current Result - {p.TrainerName}, {p.ValidationMetrics.RSquared}, {p.ValidationMetrics.MeanAbsoluteError}");
    }
});

var experiment = context.Auto().CreateRegressionExperiment(settings);

// Find the best model
var labelColumnInfo = new ColumnInformation()
{
    LabelColumnName = "median_house_value"
};

var result = experiment.Execute(trainData, labelColumnInfo, progressHandler: progress);

Console.WriteLine(Environment.NewLine);
Console.WriteLine("Best run:");
Console.WriteLine($"Trainer name - {result.BestRun.TrainerName}");
Console.WriteLine($"RSquared - {result.BestRun.ValidationMetrics.RSquared}");
Console.WriteLine($"MAE - {result.BestRun.ValidationMetrics.MeanAbsoluteError}");
Console.ReadLine();
When I run the console application, the metrics printed are 0, -Infinity, or NaN.
I've gotten similar results when my dataset was too small.
If I recall correctly, AutoML uses 10-fold cross-validation, which can leave each validation fold too small to produce usable metrics.
So if your dataset is small, try a bigger one and see whether the metrics improve, at least to rule out that case.
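Another option worth trying: hold out an explicit validation set and pass it to Execute, so the metrics aren't computed on tiny cross-validation folds. A minimal sketch, assuming the same context, trainData, settings, labelColumnInfo, and progress handler as in the question:

```csharp
// Split off an explicit validation set (20% here) instead of relying on
// AutoML's internal cross-validation over a small dataset.
var split = context.Data.TrainTestSplit(trainData, testFraction: 0.2);

var experiment = context.Auto().CreateRegressionExperiment(settings);

// Execute has an overload that accepts a separate validation set;
// ValidationMetrics are then computed on that set.
var result = experiment.Execute(
    split.TrainSet,
    split.TestSet,
    columnInformation: labelColumnInfo,
    progressHandler: progress);

Console.WriteLine($"RSquared - {result.BestRun.ValidationMetrics.RSquared}");
```

With very small data even this can yield unstable RSquared values, but it removes the tiny-fold problem from the equation.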
I have an ML.NET anomaly-detection model configured with 99% confidence, but it flags the wrong points as anomalies.
I use the code below to detect spikes:
var mlcontext = new MLContext();
var dataView = mlcontext.Data.LoadFromEnumerable(list);
string outputColumnName = nameof(IidSpikePrediction.Prediction);
string inputColumnName = nameof(TimeSeriesData.VALUE);
var transformedData = mlcontext.Transforms.DetectIidSpike(outputColumnName, inputColumnName, 99, list.Count / 4).Fit(dataView).Transform(dataView);
var predictionColumn = mlcontext.Data.CreateEnumerable<IidSpikePrediction>(transformedData, reuseRowObject: false);
Here is the result.
The value only rises from 5 to 6, yet the detector reports an alert even though it is not a real spike.
How can this be explained?
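For reference, each element that DetectIidSpike emits is a three-item vector: [alert, raw score, p-value]. Dumping all three can help diagnose why a small jump gets flagged. A sketch, assuming an IidSpikePrediction class like the one implied above (a VectorType(3) double[] Prediction field):

```csharp
// Each prediction vector holds [alert, raw score, p-value].
// A point is flagged when its p-value falls below (1 - confidence/100),
// so with a short p-value history even a 5 -> 6 jump can look improbable.
foreach (var p in predictionColumn)
{
    Console.WriteLine(
        $"Alert: {p.Prediction[0]}  Score: {p.Prediction[1]}  P-Value: {p.Prediction[2]}");
}
```

If the p-values hover near the threshold, increasing the pvalueHistoryLength argument (list.Count / 4 in the question) gives the detector more context and tends to reduce false alerts.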
I use an e-commerce SOAP API; forgive me if this is too confusing to answer. I just want to know whether I'm making a silly mistake before I file a ticket with the API maintainers.
public void WriteXML()
{
    var results = GetProductsFromApi().results.ToList();
    using (var file = File.Create(@"products.xml"))
    {
        var list = new List<Product>();
        var writer = new XmlSerializer(typeof(List<Product>));
        foreach (var result in results)
        {
            var newProduct = new Product
            {
                Id = result.productId,
                Index = result.productDisplayedCode,
                Stock = result.productStocksData.productStocksQuantities[0].productSizesData[0].productSizeQuantity,
                IsIgnored = false,
                IsInDelivery = false
            };
            list.Add(newProduct);
        }
        writer.Serialize(file, list);
    }
}
I made a request to the API and I want to store the results in a list, so each result gets serialized into XML later. The code above works if I set Stock to a constant like Stock = 1; however, left as it is, the program quits with an unhandled System.IndexOutOfRangeException.
What's weird though is that if I do something similar to the following before building the Product object:
Console.WriteLine(result.productStocksData.productStocksQuantities[0].productSizesData[0].productSizeQuantity);
...I am met with a correct API response.
I have no clue what's going on. I tried checking if the result is null before constructing Product, but it didn't help. The whole exception is so confusing to me I don't really know where to start looking.
Edit: Using Fildor's proposed code, I wrote this:
float? lol = result.productStocksData.productStocksQuantities.FirstOrDefault().productSizesData.FirstOrDefault()?.productSizeQuantity ?? null; // doesn't actually change anything if it's null or 0
if (lol.GetValueOrDefault() == 0)
{
    lol = 1;
}

var newProduct = new Product
{
    Id = result.productId,
    Index = result.productDisplayedCode,
    Stock = (float)lol,
    IsIgnored = false,
    IsInDelivery = false
};
Console.WriteLine("Processed product has a stock of " + lol);
list.Add(newProduct);
It now throws a System.NullReferenceException. It prints the actual stock size, and the error appears afterwards.
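One way to make the whole chain null-safe is to use the conditional-access operator at every hop, so a missing collection or empty list yields null instead of throwing. A sketch under the assumption that the API types look roughly like the ones used above (property names are taken from the question; requires System.Linq for FirstOrDefault):

```csharp
// Guard every hop: FirstOrDefault() returns null for an empty sequence,
// so each subsequent member access needs ?. as well. The original edit
// only guarded the second FirstOrDefault(), which is why the first one
// still blew up with a NullReferenceException.
float? quantity = result?.productStocksData
    ?.productStocksQuantities?.FirstOrDefault()
    ?.productSizesData?.FirstOrDefault()
    ?.productSizeQuantity;

var newProduct = new Product
{
    Id = result.productId,
    Index = result.productDisplayedCode,
    Stock = quantity ?? 0f, // fall back to 0 when any link in the chain is missing
    IsIgnored = false,
    IsInDelivery = false
};
```

Whether 0 is the right fallback (versus skipping the product or logging it) depends on what a missing stock entry means in the API.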
I am using Deedle from C#, and windowing over a frame is very slow compared with the same operation on a series. For a series and a frame of similar size I am seeing 60 ms vs. 3500 ms (series vs. frame).
Has anyone seen this before ?
var msftRaw = Frame.ReadCsv(@"C:\Users\olivi\source\repos\ConsoleApp\MSFT.csv");
var msft = msftRaw.IndexRows<DateTime>("Date").SortRowsByKey();
var rollingFrame = msft.Window(60); // 7700 ms
var openSeries = msft.GetColumn<double>("Open");
var rollingSeries = openSeries.Window(60); // 14 ms
var oneSeriesFrame = Frame.FromColumns(new Dictionary<string, Series<DateTime, double>> { { "Open", openSeries } });
var rollingFakeFrame = oneSeriesFrame.Window(60); // 3300 ms
This is quite a common operation when working with financial time series data, for example calculating rolling correlation between prices, or calculating rolling realized volatility when there is a condition on another price time series.
I found a workaround for the performance issue: perform the rolling operation on each series individually, join the rolling series into a frame so they are aligned by date, and write the processing function over the frame, selecting each series inside it.
Continuing from the example above:
private static double CalculateRealizedCorrelation(ObjectSeries<string> objectSeries)
{
    var openSeries = objectSeries.GetAs<Series<DateTime, double>>("Open");
    var closeSeries = objectSeries.GetAs<Series<DateTime, double>>("Close");
    return MathNet.Numerics.Statistics.Correlation.Pearson(openSeries.Values, closeSeries.Values);
}
var rollingAgg = new Dictionary<string, Series<DateTime, Series<DateTime, double>>>();
foreach (var column in msft.ColumnKeys)
{
    // Window each series individually - this is the fast path.
    rollingAgg[column] = msft.GetColumn<double>(column).Window(60);
}
var rollingDf = Frame.FromColumns(rollingAgg);
var rollingCorr = rollingDf.Rows.Select(kvp => CalculateRealizedCorrelation(kvp.Value));
I'm trying to migrate my model from ML.NET 0.5 to 0.6 and I have a question.
I copy-pasted this example from the ML.NET Cookbook:
// Create a new environment for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var env = new LocalEnvironment();

// Create the reader: define the data columns and where to find them in the text file.
var reader = TextLoader.CreateReader(env, ctx => (
        // We read the first 11 values as a single float vector.
        FeatureVector: ctx.LoadFloat(0, 10),
        // Separately, read the target variable.
        Target: ctx.LoadFloat(11)
    ),
    // Default separator is tab, but we need a comma.
    separator: ',');

// Now read the file (remember though, readers are lazy, so the actual
// reading will happen when the data is accessed).
var data = reader.Read(new MultiFileSource(dataPath));
So I started implementing it in my model:
using System;
using Microsoft.ML.Legacy;
using Microsoft.ML.Legacy.Data;
using Microsoft.ML.Legacy.Transforms;
using Microsoft.ML.Legacy.Trainers;
using Microsoft.ML.Legacy.Models;
using Microsoft.ML.Runtime.Data;

public static PredictionModel<CancerData, CancerPrediction> Train()
{
    var pipeline = new LearningPipeline();

    // The 0.6 way to load data into the model
    var env = new LocalEnvironment();
    var reader = Microsoft.ML.Runtime.Data.TextLoader.CreateReader(env, ctx => (
            FeatureVector: ctx.LoadFloat(0, 30),
            Target: ctx.LoadText(31)
        ),
        separator: ';');
    var data = reader.Read(new MultiFileSource("Cancer-Train.csv"));

    //pipeline.Add(new TextLoader("Cancer-Train.csv").CreateFrom<CancerData>(useHeader: true, separator: ';'));
    pipeline.Add(new Dictionarizer(("Diagnosis", "Label")));
    pipeline.Add(data); // doesn't work; included to show what I want to do

    // Below, the 0.5 way to load data into the pipeline:
    //pipeline.Add(new ColumnConcatenator(outputColumn: "Features",
    //    "RadiusMean",
    //    "TextureMean",
    //    .. and so on...
    //    "SymmetryWorst",
    //    "FractalDimensionWorst"));

    pipeline.Add(new StochasticDualCoordinateAscentBinaryClassifier());
    pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });

    PredictionModel<CancerData, CancerPrediction> model = pipeline.Train<CancerData, CancerPrediction>();
    model.WriteAsync(modelPath);
    return model;
}
The question is: how do I add var data to my existing pipeline? What do I need to do to make the 0.6-style data work with the 0.5 pipeline?
I don't think the LearningPipeline APIs are compatible with the new statically-typed APIs (e.g. TextLoader.CreateReader). The cookbook shows the new APIs for training as well as other scenarios, such as using the model for predictions. This test might also be helpful for binary classification.
For your code specifically, I believe the training code would look something like:
var env = new LocalEnvironment();
var reader = Microsoft.ML.Runtime.Data.TextLoader.CreateReader(env, ctx => (
        FeatureVector: ctx.LoadFloat(0, 30),
        Target: ctx.LoadBool(31)
    ),
    separator: ';');
var data = reader.Read(new MultiFileSource("Cancer-Train.csv"));

BinaryClassificationContext bcc = new BinaryClassificationContext(env);
var estimator = reader.MakeNewEstimator()
    .Append(row => (
        label: row.Target,
        features: row.FeatureVector.Normalize()))
    .Append(row => (
        row.label,
        score: bcc.Trainers.Sdca(row.label, row.features)))
    .Append(row => (
        row.label,
        row.score,
        predictedLabel: row.score.predictedLabel));
var model = estimator.Fit(data);
Here is my problem in English:
I've got several WidgetContainer objects.
Each WidgetContainer will have at least one Widget.
Each WidgetContainer wants to display one of its Widgets n times per day.
Widgets could be displayed on 'x' number of Venues.
A Widget is displayed for exactly t seconds before the next scheduled WidgetContainer's Widget takes its place.
If the entire day is not filled, nothing should be displayed during the empty slots (ads should be evenly dispersed throughout the day, t seconds at a time).
And here are the objects represented by pseudo code:
var WidgetContainers = [
    {
        DailyImpressionsRequired: 52, // should be split between Venues
        Widgets: ["one", "two"],
        Venues: ["here", "there"]
    },
    {
        DailyImpressionsRequired: 20,
        Widgets: ["foo"],
        Venues: ["here", "there", "everywhere"]
    },
    {
        DailyImpressionsRequired: 78,
        Widgets: ["bar", "bat", "heyhey!"],
        Venues: ["up", "down", "allAround"]
    }
];

var SecondsInADay = 86400;
var DisplayInterval = 30; // seconds
var TotalNumberOfVenues = /* eh, some calculations... */;
var AvailableSlots = /* eh, some calculations... */;
var SlotsNeeded = /* eh, some calculations... */;
I need to find an efficient way of calculating an evenly distributed schedule for these objects. The "objects" are LINQ-to-SQL objects, so LINQ suggestions would be welcome.
My current idea is to flatten the WidgetContainers to their Widgets, dividing each DailyImpressionsRequired by the number of Widgets.
I could figure it out easily if there weren't multiple, differing Venues to take into account.
I have a feeling I just need someone else's perspective on the problem, since I've been staring at it for so long.
So any help that points me in the right direction or offers some perspective, even if it seems obvious, would be greatly appreciated!
Based on all that, if I've understood correctly, this should give you the right numbers:
static void Main(string[] args)
{
    List<WidgetContainer> data = new List<WidgetContainer>();
    data.Add(new WidgetContainer {
        Widgets = new List<String> { "one", "two" },
        Venues = new List<String> { "here", "there" },
        DailyImpressionsRequired = 52 });
    data.Add(new WidgetContainer {
        Widgets = new List<String> { "foo" },
        Venues = new List<String> { "here", "there", "everywhere" },
        DailyImpressionsRequired = 20 });
    data.Add(new WidgetContainer {
        Widgets = new List<String> { "bar", "bat", "heyhey!" },
        Venues = new List<String> { "up", "down", "allAround" },
        DailyImpressionsRequired = 78 });

    var SecondsInADay = 86400;
    var DisplayInterval = 30; // seconds
    var TotalNumberOfVenues = data.SelectMany(x => x.Venues).Distinct().Count();

    // Assuming you didn't already have the count as a variable - this
    // re-evaluates the query, so don't use it for real!
    var AvailableSlots = SecondsInADay * data.SelectMany(x => x.Venues).Distinct().Count() / DisplayInterval;
    // The better way - avoids recalculating the count:
    //var AvailableSlots = SecondsInADay * TotalNumberOfVenues / DisplayInterval;

    var SlotsNeeded = data.Sum(x => x.DailyImpressionsRequired);
}