I'm trying to upgrade my model from ML.NET 0.5 to 0.6 and I have a question.
I copy-pasted an example from the ML.NET Cookbook that says:
// Create a new environment for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var env = new LocalEnvironment();

// Create the reader: define the data columns and where to find them in the text file.
var reader = TextLoader.CreateReader(env, ctx => (
        // We read the first 11 values as a single float vector.
        FeatureVector: ctx.LoadFloat(0, 10),
        // Separately, read the target variable.
        Target: ctx.LoadFloat(11)
    ),
    // Default separator is tab, but we need a comma.
    separator: ',');

// Now read the file (remember though, readers are lazy, so the actual
// reading will happen when the data is accessed).
var data = reader.Read(new MultiFileSource(dataPath));
So I started implementing it in my model:
using System;
using Microsoft.ML.Legacy;
using Microsoft.ML.Legacy.Data;
using Microsoft.ML.Legacy.Transforms;
using Microsoft.ML.Legacy.Trainers;
using Microsoft.ML.Legacy.Models;
using Microsoft.ML.Runtime.Data;
public static PredictionModel<CancerData, CancerPrediction> Train()
{
    var pipeline = new LearningPipeline();

    // 0.6 way to load data into the model
    var env = new LocalEnvironment();
    var reader = Microsoft.ML.Runtime.Data.TextLoader.CreateReader(env, ctx => (
            FeatureVector: ctx.LoadFloat(0, 30),
            Target: ctx.LoadText(31)
        ),
        separator: ';');
    var data = reader.Read(new MultiFileSource("Cancer-Train.csv"));

    //pipeline.Add(new TextLoader("Cancer-Train.csv").CreateFrom<CancerData>(useHeader: true, separator: ';'));
    pipeline.Add(new Dictionarizer(("Diagnosis", "Label")));
    pipeline.Add(data); // doesn't work; I only wrote it to show what I want to do

    // below, the 0.5 way to load data into the pipeline!
    //pipeline.Add(new ColumnConcatenator(outputColumn: "Features",
    //    "RadiusMean",
    //    "TextureMean",
    //    .. and so on...
    //    "SymmetryWorst",
    //    "FractalDimensionWorst"));

    pipeline.Add(new StochasticDualCoordinateAscentBinaryClassifier());
    pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });

    PredictionModel<CancerData, CancerPrediction> model = pipeline.Train<CancerData, CancerPrediction>();
    model.WriteAsync(modelPath);
    return model;
}
The question is: how do I add `var data` to my existing pipeline? What do I need to do so that `var data` from version 0.6 works with the 0.5 pipeline?
I don't think the LearningPipeline APIs are compatible with the new static typing APIs (e.g. TextLoader.CreateReader). The cookbook helps to show the new APIs for training and also other scenarios like using the model for predictions. This test might also be helpful for binary classification.
For your code specifically, I believe the training code would look something like:
var env = new LocalEnvironment();
var reader = Microsoft.ML.Runtime.Data.TextLoader.CreateReader(env, ctx => (
        FeatureVector: ctx.LoadFloat(0, 30),
        Target: ctx.LoadBool(31)
    ),
    separator: ';');
var data = reader.Read(new MultiFileSource("Cancer-Train.csv"));

BinaryClassificationContext bcc = new BinaryClassificationContext(env);
var estimator = reader.MakeNewEstimator()
    .Append(row => (
        label: row.Target,
        features: row.FeatureVector.Normalize()))
    .Append(row => (
        row.label,
        score: bcc.Trainers.Sdca(row.label, row.features)))
    .Append(row => (
        row.label,
        row.score,
        predictedLabel: row.score.predictedLabel));

var model = estimator.Fit(data);
I have a test where I will be comparing two objects.
I am open to knowing the best way to do it.
I have created something for which I have an issue that needs help.
The following code has an object property that needs to be present.
I would like to assert that all fields are present, except the id property.
I feel like the last 5 statements are inappropriate; if there is a clearer way of doing it, I would like to know.
[Fact]
public void CreateTransaction_AddFirstTransaction_ShouldUpdateTransactionJson()
{
    // Arrange
    var mockFileSystem = new MockFileSystem();
    var buyCrypto = new BuyCrypto(mockFileSystem);
    var bitcoin = new Currency()
    {
        name = "bitcoin",
        code = "btc",
        price = 10
    };

    // Act
    buyCrypto.CreateTransaction(true, bitcoin, 10);

    // Assert
    var result = JsonSerializer
        .Deserialize<List<Transaction>>(mockFileSystem.GetFile(TransactionJson).TextContents);
    Assert.Equal("bitcoin", result[0].currency);
    Assert.Equal(DateTime.Now.ToString(), result[0].dateTime);
    Assert.Equal("TestName", result[0].name);
    Assert.Equal(10, result[0].quantity);
    Assert.Equal(100, result[0].total);
}
I love using Fluent Assertions for these kinds of tests (docs). You could do something like this:
// Arrange
// ... other stuff
var expectedTransaction = new Transaction
{
    currency = "bitcoin",
    dateTime = DateTime.Now.ToString(),
    name = "TestName",
    quantity = 10,
    total = 100
};

// Act
// ...

// Assert
result[0].Should().BeEquivalentTo(expectedTransaction, options => options.Excluding(t => t.Id));
I have tried the below code snippet in a .NET console app and I am able to see the forecasted predictions.
var context = new MLContext();
DatabaseLoader loader = context.Data.CreateDatabaseLoader<TimeSeriesInput>();
Entities db = new Entities();

// _energy_hourly_for_ml : new table
string query = "select cast([Date] as datetime) [Timestamp], cast(Energy as real) [Data] from _energy_hourly_for_ml";
var mldata = db.Database.SqlQuery<TimeSeriesInput>(query).AsEnumerable();
var data = context.Data.LoadFromEnumerable(mldata);

var pipeline = context.Forecasting.ForecastBySsa(
    nameof(TimeSeriesOutput.Forecast),
    nameof(TimeSeriesInput.Data),
    windowSize: 5,
    seriesLength: 10,
    trainSize: 100,
    horizon: 4); // number of predictions in the next output set

var model = pipeline.Fit(data);
var forecastingEngine = model.CreateTimeSeriesEngine<TimeSeriesInput, TimeSeriesOutput>(context);
var forecasts = forecastingEngine.Predict();
Now I want to save the entire code above in a database. What I want to do is:
fetch the code from the database
execute it dynamically
fetch the forecasted predictions as the output of the previous step
display them on the view
Please let me know of any reference pointers on this.
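For the "execute it dynamically" step, one option is the Roslyn scripting API (the Microsoft.CodeAnalysis.CSharp.Scripting NuGet package). This is only a hedged sketch, not a tested solution: the `ForecastRunner` class and `RunForecastScriptAsync` helper are made-up names, and the snippet stored in the database would need references and imports for every assembly it uses (ML.NET, Entity Framework, etc.).

```csharp
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

public static class ForecastRunner
{
    // Evaluates C# source fetched from the database. The script's final
    // expression (e.g. the forecasts object) becomes the return value.
    public static async Task<object> RunForecastScriptAsync(string codeFromDb)
    {
        var options = ScriptOptions.Default
            // Reference the core runtime plus the ML.NET assembly;
            // add whatever else the stored snippet needs.
            .AddReferences(typeof(object).Assembly,
                           typeof(Microsoft.ML.MLContext).Assembly)
            .AddImports("System", "Microsoft.ML");

        return await CSharpScript.EvaluateAsync<object>(codeFromDb, options);
    }
}
```

The controller could then pass the returned forecasts to the view model. Be aware that executing code fetched from a database is effectively remote code execution, so access to that table needs to be locked down.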
Using ElasticSearch NEST .Net package 7.13.2 in Visual Studio 2019
For a list of products I am currently updating existing documents in my product index by using the following code:
var productIndex = "productindex";
foreach (var product in products)
{
    var productClassIdScript = $"ctx._source.productClassId = \"{product.ProductClassId}\"; ";
    elasticClient.Update<Product, object>(product.Id, q => q
        .Index(productIndex)
        .Script(s => s.Source(productClassIdScript).Lang("painless")));
}
I do this for more than 10000 products and it takes about 2 hours.
I know I can insert new documents with the Bulk API.
Can I do the updates with the BulkAll method?
Something like this:
var bulkAllObservable = elasticClient.BulkAll<Product>(myBulkAllRequest)
.Wait(TimeSpan.FromMinutes(15), next =>
{
// do something e.g. write number of pages to console
});
How should I construct myBulkAllRequest?
Any help is much appreciated.
Bulk indexing will drastically reduce your indexing/updating time, so this is a good way to go.
You can still use BulkAll for updates: if Elasticsearch already has a document with the provided id, that document will be updated.
var bulk = elasticClient.BulkAll<EsDocument>(
    new List<EsDocument> { new EsDocument { Id = "1", Name = "1" } }, d => d);
using var subscribe = bulk.Subscribe(new BulkAllObserver(onNext: response => Console.WriteLine("inserted")));
bulk.Wait(TimeSpan.FromMinutes(1), response => Console.WriteLine("Bulk insert done"));

var bulk2 = elasticClient.BulkAll<EsDocument>(
    new List<EsDocument> { new EsDocument { Id = "1", Name = "1_updated" } }, d => d);
using var subscribe2 = bulk2.Subscribe(new BulkAllObserver(onNext: response => Console.WriteLine("inserted")));
bulk2.Wait(TimeSpan.FromMinutes(1), response => Console.WriteLine("Bulk insert done"));
The first BulkAll will insert the document with Id "1"; the second will update the document with Id "1".
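Note that BulkAll with full documents re-indexes the whole `_source`, while the original loop changes only one field via a painless script. If only that field should change, a script-based bulk update may map more directly. Below is a rough, untested sketch using the NEST fluent Bulk API, reusing `Product`, `products`, and the index name from the question; the batching is left out for brevity and would be needed for 10000+ documents.

```csharp
// Untested sketch: send script-based updates in one Bulk call instead of
// one Update request per product.
var response = elasticClient.Bulk(b =>
{
    b.Index("productindex"); // default index for all operations below

    foreach (var product in products)
    {
        var script = $"ctx._source.productClassId = \"{product.ProductClassId}\";";
        b.Update<Product>(u => u
            .Id(product.Id)
            .Script(s => s.Source(script).Lang("painless")));
    }

    return b;
});
// response.Errors / response.ItemsWithErrors report per-item failures.
```

In practice you would chunk `products` into batches of a few thousand per Bulk call rather than sending all of them at once.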
I am using C# AutoML to train a model for regression and I can't see the RSquared or MeanAbsoluteError for any of the algorithms.
//loading train data through Text Loader
var trainData = loader.Load(_filePath);
Console.WriteLine("created train data");
var settings = new RegressionExperimentSettings
{
    MaxExperimentTimeInSeconds = 10,
    //OptimizingMetric = RegressionMetric.MeanAbsoluteError
};

var progress = new Progress<RunDetail<RegressionMetrics>>(p =>
{
    if (p.ValidationMetrics != null)
    {
        Console.WriteLine($"Current Result - {p.TrainerName}, {p.ValidationMetrics.RSquared}, {p.ValidationMetrics.MeanAbsoluteError}");
    }
});

var experiment = context.Auto().CreateRegressionExperiment(settings);

// find the best model
var labelColumnInfo = new ColumnInformation()
{
    LabelColumnName = "median_house_value"
};
var result = experiment.Execute(trainData, labelColumnInfo, progressHandler: progress);
Console.WriteLine(Environment.NewLine);
Console.WriteLine("Best run:");
Console.WriteLine($"Trainer name - {result.BestRun.TrainerName}");
Console.WriteLine($"RSquared - {result.BestRun.ValidationMetrics.RSquared}");
Console.WriteLine($"MAE - {result.BestRun.ValidationMetrics.MeanAbsoluteError}");
Console.ReadLine();
When I run the console application, the outputs I get are 0, -Infinity, or NaN.
I've gotten similar results when my dataset was too small.
AutoML uses 10-fold cross-validation, if I recall correctly. This can leave each test fold too small to produce any usable metrics.
So if your dataset is small, try a bigger one and see if the metrics improve, at least to rule out that case.
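Another way to rule this out is to hold out an explicit validation set instead of letting AutoML split a tiny dataset itself. This is a hedged sketch reusing `context`, `trainData`, `experiment`, `labelColumnInfo`, and `progress` from the question; the 0.2 fraction is arbitrary.

```csharp
// Split once up front and hand AutoML an explicit validation set.
var split = context.Data.TrainTestSplit(trainData, testFraction: 0.2);

var result = experiment.Execute(
    split.TrainSet,
    split.TestSet,          // metrics are computed on this held-out set
    labelColumnInfo,
    progressHandler: progress);
```

If the metrics are sensible with an explicit split but not without one, the dataset size (relative to the cross-validation folds) is the likely culprit.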
We need to log specific actions in an MVC 3 project. A database table will store logs something like:
"User [SessionUserHere...] changed [Name][Lastname][OtherAttributesHere...] values of [ChangedEmployeeHere...]"
I need to learn which attributes of a model changed and which ones kept their original values.
Is there any way to track which attributes of a model changed?
For audit trails in MVC3, a database trigger is usually proposed, but we use SQL Server Compact for this project.
Thanks...
Have you had a look at the INotifyPropertyChanged interface?
http://msdn.microsoft.com/en-us/library/ms743695.aspx
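For illustration, here is a minimal sketch of how INotifyPropertyChanged could feed an audit log. The `Employee` class and its single `Name` property are made up for the example; a real model would do this for every audited property.

```csharp
using System.Collections.Generic;
using System.ComponentModel;

// Minimal sketch: the model raises PropertyChanged and also remembers
// which properties were modified, so the audit log can list them later.
public class Employee : INotifyPropertyChanged
{
    private readonly HashSet<string> _changed = new HashSet<string>();
    private string _name;

    public event PropertyChangedEventHandler PropertyChanged;

    // Names of all properties modified since the object was loaded.
    public IEnumerable<string> ChangedProperties { get { return _changed; } }

    public string Name
    {
        get { return _name; }
        set
        {
            if (_name == value) return;
            _name = value;
            OnPropertyChanged("Name");
        }
    }

    private void OnPropertyChanged(string propertyName)
    {
        _changed.Add(propertyName);
        var handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));
    }
}
```

The drawback is that every setter must go through this pattern; the comparison-based approaches below avoid touching the model classes at all.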
You could do the audit in code rather than the database. In the HttpPost handler, get the original value and compare the objects using an auditing function. I have a home brew implementation, but this does a similar thing Compare .NET Objects.
It means you can do the following:
var original = GetDatabaseRecord(xx);
var newRec = GetFormSubmission(); // However you do this

var auditor = new CompareObjects();
if (auditor.Compare(original, newRec))
{
    foreach (var diff in auditor.Differences)
    {
        // work through the deltas
    }
}
else
{
    // Nothing changed!
}
My own version returns a structure of:
Name (such as "Issue.Priority")
Change (ChangedValue, OnlyLeft, OnlyRight)
Old Value
New Value
The link provided may suffice for you or act as a starting point.
I created a library to do just this and provide some additional metadata. It relies on MVC ModelMetadata and DataAnnotations to provide a "readable version" of the diff for non technical users.
https://github.com/paultyng/ObjectDiff
Given objects like (no metadata obviously):
var before = new
{
    Property1 = "",
    MultilineText = "abc\ndef\nghi",
    ChildObject = new { ChildProperty = 7 },
    List = new string[] { "a", "b" }
};

var after = new
{
    Property1 = (string)null,
    MultilineText = "123\n456",
    NotPreviouslyExisting = "abc",
    ChildObject = new { ChildProperty = 6 },
    List = new string[] { "b", "c" }
};
It would output something like:
ChildObject - ChildProperty: '6', was '7'
List - [2, added]: 'c', was not present
List - [removed]: No value present, was 'a'
MultilineText:
-----
123
456
-----
was
-----
abc
def
ghi
-----
NotPreviouslyExisting: 'abc', was not present