My Situation
I am attempting to use ML.NET to create a neural network that classifies two types of signals (essentially "yes" or "no"). I have one set of data that maps to "no" and another that maps to "yes", and I want to train the network with this data.
My Problem
Since my training data is supervised (I know the desired output), how do I "tell" the LoadFromTextFile function that all of this data should map to "yes" (or 1, it doesn't matter)?
My Question
In short, how do you train a network with supervised data (I know the desired output of my training data) in ML.NET?
My Data Model:
public class Analog
{
[LoadColumn(0, Global.SAMPLE_SIZE - 1)]
[VectorType(Global.SAMPLE_SIZE)]
public float[] DiscreteSignal { get; set; }
}
Loading code:
//Create MLContext
static MLContext mCont = new MLContext();
//Load Data
IDataView data = mCont.Data.LoadFromTextFile<Analog>("myYesSignalData.csv", separatorChar: ',', hasHeader: false);
ML.NET supports loading multiple files into one IDataView by using the MultiFileSource class. Create a text loader first (LoadFromTextFile requires a path, so use CreateTextLoader), then load both files through it:
var loader = mCont.Data.CreateTextLoader<Analog>(separatorChar: ',', hasHeader: false);
IDataView data = loader.Load(new MultiFileSource("myYesSignalData.csv", "myNoSignalData.csv"));
However, I currently see no way to tell the trainer which examples are positive and which are negative, other than adding a label column to both files: an all-ones column in the "yes" file and an all-zeros column in the "no" file. Then define the Analog class this way:
public class Analog
{
[LoadColumn(0, Global.SAMPLE_SIZE - 1)]
[VectorType(Global.SAMPLE_SIZE)]
public float[] DiscreteSignal { get; set; }
[LoadColumn(Global.SAMPLE_SIZE)]
public float Label { get; set; }
}
Adding the label column can be done with a simple C# program, such as this:
public class AnalogNoLabel
{
[LoadColumn(0, Global.SAMPLE_SIZE - 1)]
[VectorType(Global.SAMPLE_SIZE)]
public float[] DiscreteSignal { get; set; }
}
public void AddLabel(MLContext mCont)
{
IDataView data = mCont.Data.LoadFromTextFile<AnalogNoLabel>("myYesSignalData.csv", separatorChar: ',', hasHeader: false);
var pipeline = mCont.Transforms.CustomMapping<AnalogNoLabel, Analog>((input, output) => {
output.DiscreteSignal = input.DiscreteSignal;
output.Label = 1;
}, contractName: null);
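IDataView dataWithLabel = pipeline.Fit(data).Transform(data);
// Write comma-separated rows without a header row or schema information, so the file can be reloaded with separatorChar: ',' and hasHeader: false
using (var stream = new FileStream("myNewYesSignalData.txt", FileMode.Create))
mCont.Data.SaveAsText(dataWithLabel, stream, separatorChar: ',', headerRow: false, schema: false);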
}
and similar code for "myNoSignalData.csv", with output.Label = 0 instead of output.Label = 1.
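If the labelling step does not need to go through ML.NET at all, the same label column can be appended with plain file I/O. This is a minimal sketch, not the answerer's method, assuming the signal files are comma-separated, have no header row and hold one example per line; LabelAppender and its methods are hypothetical names.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper: appends a constant label as the last comma-separated field of every non-empty line.
public static class LabelAppender
{
    // Skips blank lines, trims trailing whitespace, then adds ",<label>" to each remaining row.
    public static string[] AppendLabel(string[] lines, string label) =>
        lines.Where(line => !string.IsNullOrWhiteSpace(line))
             .Select(line => line.TrimEnd() + "," + label)
             .ToArray();

    // Reads a header-less CSV, appends the label to every row and writes the result to a new file.
    public static void AppendLabelToFile(string inputPath, string outputPath, string label)
    {
        string[] labelled = AppendLabel(File.ReadAllLines(inputPath), label);
        File.WriteAllLines(outputPath, labelled);
    }
}
```

For example, call LabelAppender.AppendLabelToFile("myYesSignalData.csv", "myNewYesSignalData.csv", "1"), and make the same call for "myNoSignalData.csv" with "0". If every row holds exactly Global.SAMPLE_SIZE values, the appended label lands at column index Global.SAMPLE_SIZE, which is what [LoadColumn(Global.SAMPLE_SIZE)] on Label expects.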
I am trying to load an ONNX file, pass my parameters in as an array, and make a prediction.
This is what my code looks like:
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.Transforms.Onnx;
using System.Collections.Generic;
namespace MachineLearning
{
public class MLPrediction
{
public static double PredictOutput(string onnxFilePath, float[] inputParameterValues)
{
// Set up the MLContext
var mlContext = new MLContext();
// Load the ONNX model
OnnxScoringEstimator estimator = mlContext.Transforms.ApplyOnnxModel(onnxFilePath);
// Create an instance of InputData
InputData inputData = new InputData
{
FeatureVector = inputParameterValues
};
// Create an IEnumerable with a single element: the InputData instance
IEnumerable<InputData> inputEnumerable = new InputData[] { inputData };
// Load the data from the IEnumerable
IDataView prediction = mlContext.Data.LoadFromEnumerable<InputData>(inputEnumerable);
var model = estimator.Fit(prediction);
// Return the requested output parameter
double output = (double)model.GetType().GetProperty("Target").GetValue(model);
return output;
}
class InputData
{
[VectorType]
public float[] FeatureVector { get; set; }
}
class OutputData
{
public double Target { get; set; }
}
}
}
However, I get the error "System.ArgumentOutOfRangeException: 'Could not find input column 'Target' '" when I try to create the model.
Could anyone point out where I am going wrong?
I have tried adding a Target column to the inputs, but I don't believe the inputs should include it.
I am new to machine learning and ML.NET. I want to solve an Excel column-identification task.
My Excel sheets have columns such as 序号 (serial number), 编号 (number), 编码 (code), 名称 (name) and 项目名称 (item name), and each column has a corresponding field name, as follows:
Column_Field.csv
Column	FieldName
序号	OrdCode
编号	OrdCode
编码	OrdCode
名称	Name
项目名称	Name
Each field may have one or more column names, such as 序号, 编号 and 编码 for OrdCode. The task is to identify the corresponding field name for an incoming column.
Using the dataset above, I want ML.NET to predict the right field for columns read from an Excel file.
I use the Naive Bayes algorithm. The code:
public class Program
{
private static readonly string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data", "Column_Field.csv");
private static void Main(string[] args)
{
MLContext mlContext = new MLContext();
IDataView dataView = mlContext.Data.LoadFromTextFile<ColumnInfo>(_dataPath, hasHeader: true, separatorChar: '\t');
var pipeline = mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: "Label", outputColumnName: "Label")
.Append(mlContext.Transforms.Text.FeaturizeText(outputColumnName: "Features", inputColumnName: "Column"))
.Append(mlContext.MulticlassClassification.Trainers.NaiveBayes())
.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
var model = pipeline.Fit(dataView);
// evaluate
//List<ColumnInfo> dataForEvaluation = new List<ColumnInfo>()
//{
// new ColumnInfo{ Column="名称", FieldName="Name" },
// new ColumnInfo{ Column="<名称>", FieldName="Name" },
// new ColumnInfo{ Column="序号", FieldName="OrdName" },
//};
//IDataView testDataSet = mlContext.Data.LoadFromEnumerable(dataForEvaluation);
//var metrics = mlContext.MulticlassClassification.Evaluate(testDataSet);
//Console.WriteLine($"MicroAccuracy: {metrics.MicroAccuracy:P2}");
//Console.WriteLine($"MicroAccuracy: {metrics.MicroAccuracy:P2}");
// predict
var dataForPredictation = new List<ColumnInfo>();
dataForPredictation.Add(new ColumnInfo { Column = "名称" });
dataForPredictation.Add(new ColumnInfo { Column = "ABC" });
dataForPredictation.Add(new ColumnInfo { Column = "名" });
var engine = mlContext.Model.CreatePredictionEngine<ColumnInfo, Predication>(model);
foreach (var data in dataForPredictation)
{
var result = engine.Predict(data);
Console.WriteLine($"{data.Column}: \t{result.FieldName}");
}
Console.ReadLine();
}
}
public class ColumnInfo
{
[LoadColumn(0)]
public string Column { get; set; }
[LoadColumn(1), ColumnName("Label")]
public string FieldName { get; set; }
}
public class Predication
{
[ColumnName("PredictedLabel")]
public string FieldName { get; set; }
}
However, the result is not as expected.
Result:
名称: OrdCode
ABC: OrdCode
名: OrdCode
So what is wrong with the code? I suspect the problem is that the data is not processed properly in the pipeline before training.
Thanks.
I have been struggling to create the proper data structure for ML.NET and load it into my application. Essentially, the training data in my application is dynamic, and its type and/or size are not known until runtime. In addition, I have to convert the training data from non-standard primitive types (i.e. App_Bool or App_Number, rather than simply bool or double). This has been a problem as I try to convert my training data into a generic data type that can then be loaded from memory with the LoadFromEnumerable function.
I have four basic data type classes:
public class MLIntData
{
public MLIntData(string label, List<object> l)
{
Label = label;
foreach (App_Integer element in l)
Features.Add((int)element.Value);
}
public List<int> Features { get; set; } = new List<int>();
public string Label { get; set; } = "";
}
public class MLNumberData
{
public MLNumberData(string label, List<object> l)
{
Label = label;
foreach (App_Number element in l)
Features.Add((double)element.Value);
}
public List<double> Features { get; set; } = new List<double>();
public string Label { get; set; } = "";
}
public class MLBoolData
{
public MLBoolData(string label, List<object> l)
{
Label = label;
foreach (App_Boolean element in l)
Features.Add((bool)element.Value);
}
public List<bool> Features { get; set; } = new List<bool>();
public string Label { get; set; } = "";
}
public class MLTextData
{
public MLTextData(string label, List<object> l)
{
Label = label;
foreach (App_String element in l)
Features.Add(element.Value.ToString());
}
public List<string> Features { get; set; } = new List<string>();
public string Label { get; set; } = "";
}
Each class contains a label for the data and a list of features of type bool, double, int or string.
In my ML.NET code, I'm trying to load the training data and then create an IDataView from it. First I loop through the input data (which is originally of the generic type object) and create instances of the new data classes:
List<object> data = new List<object>();
for(int i = 0; i < input.Count; i++)
{
MLCodifiedData codifiedData = input[i].Value as MLCodifiedData;
Type dataType = codifiedData.Features[0].GetType();
if (dataType == typeof(App_Boolean))
{
data.Add(new MLBoolData(codifiedData.Label, codifiedData.Features));
}
else if (dataType == typeof(App_Number))
{
data.Add(new MLNumberData(codifiedData.Label, codifiedData.Features));
}
else if (dataType == typeof(App_Integer))
{
data.Add(new MLIntData(codifiedData.Label, codifiedData.Features));
}
if (dataType == typeof(App_String))
{
data.Add(new MLTextData(codifiedData.Label, codifiedData.Features));
}
}
IDataView TrainingData = mlContext.Data.LoadFromEnumerable<object>(data);
I have tried creating a schema definition (which can be passed as the second parameter of the LoadFromEnumerable method), but I can't get that to work. I've also tried building a schema with the schema builder, but that doesn't work either. Right now, I'm using one of the datasets included in one of the sample files. And to preempt questions: yes, I know I could simply load the data from a file. However, my app first has to read the CSV into memory and then create the data structure, so I can't really use the many examples geared toward reading a CSV file with the LoadFromTextFile method. How can I set up a dynamic in-memory collection and convert it into an IDataView object?
I'd like to add a custom column after loading my IDataView from file.
In each row, the column value should be the sum of the previous two values, a sort of Fibonacci series.
I was thinking of creating a custom transformer, but I couldn't find anything that helped me understand how to proceed.
I also cloned the ML.NET Git repository to see how other transformers are implemented, but many classes are marked internal, so I cannot reuse them in my project.
You can create a custom transform with CustomMapping.
Here's an example I wrote for this answer.
The input and output classes:
class InputData
{
public int Age { get; set; }
}
class CustomMappingOutput
{
public string AgeName { get; set; }
}
class TransformedData
{
public int Age { get; set; }
public string AgeName { get; set; }
}
Then, in the ML.NET program:
MLContext mlContext = new MLContext();
var samples = new List<InputData>
{
new InputData { Age = 16 },
new InputData { Age = 35 },
new InputData { Age = 60 },
new InputData { Age = 28 },
};
var data = mlContext.Data.LoadFromEnumerable(samples);
Action<InputData, CustomMappingOutput> mapping =
(input, output) =>
{
if (input.Age < 18)
{
output.AgeName = "Child";
}
else if (input.Age < 55)
{
output.AgeName = "Man";
}
else
{
output.AgeName = "Grandpa";
}
};
var pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null);
var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);
var dataEnumerable = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: true);
foreach (var row in dataEnumerable)
{
Console.WriteLine($"{row.Age}\t {row.AgeName}");
}
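Note that CustomMapping calls the mapping once per row, with no access to other rows, so on its own it cannot compute a value that depends on previous rows. ML.NET also has a StatefulCustomMapping transform whose mapping action additionally receives a state object; check its exact signature in the ML.NET API reference. The plain C# sketch below only illustrates the per-row state logic such a mapping would need, assuming "the previous two values" means the previous two rows of a hypothetical input column Value, with 0 used while fewer than two rows have been seen. All class and property names are hypothetical.

```csharp
using System.Collections.Generic;

public class RowInput { public float Value { get; set; } }                 // hypothetical input column
public class RowOutput { public float SumOfPreviousTwo { get; set; } }     // hypothetical new column
public class RunningState                                                   // carried from row to row
{
    public float Previous1 { get; set; }
    public float Previous2 { get; set; }
}

public static class PreviousTwoSum
{
    // (input, output, state) shape: the new column is computed from the state, then the state is updated.
    public static void Map(RowInput input, RowOutput output, RunningState state)
    {
        output.SumOfPreviousTwo = state.Previous1 + state.Previous2;
        // Shift the window so the next row sees this row as its immediate predecessor.
        state.Previous2 = state.Previous1;
        state.Previous1 = input.Value;
    }

    // Applies Map to each value in order, starting from a fresh (all-zero) state.
    public static List<float> Apply(IEnumerable<float> values)
    {
        var state = new RunningState();
        var result = new List<float>();
        foreach (float v in values)
        {
            var output = new RowOutput();
            Map(new RowInput { Value = v }, output, state);
            result.Add(output.SumOfPreviousTwo);
        }
        return result;
    }
}
```

For the inputs 1, 2, 3, 4 this yields 0, 1, 3, 5. Because the state depends on row order, the result is only meaningful when rows are processed sequentially in their original order.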
This is straightforward, assuming you know how to use pipelines.
This is part of my project, where I merge two columns together:
IEstimator<ITransformer> pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null)
.Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question1", outputColumnName: "question1Featurized"))
.Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question2", outputColumnName: "question2Featurized"))
.Append(mlContext.Transforms.Concatenate("Features", "question1Featurized", "question2Featurized"))
//.Append(mlContext.Transforms.NormalizeMinMax("Features"))
//.AppendCacheCheckpoint(mlContext)
.Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: nameof(customTransform.Label), featureColumnName: "Features"));
As you can see, the two columns question1Featurized and question2Featurized are combined into Features, which is created by the pipeline and can be used like any other column of the IDataView. The Features column does not need to be declared in a separate class.
So in your case, first transform each column according to its data type: for strings you can do what I did, and for numeric values use a custom transformer (CustomMapping).
The documentation of the Concatenate function might help as well!
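For each row, Concatenate produces one vector containing the values of the input columns laid end to end, in the order the column names are passed. The plain C# sketch below mimics that for a single row of float vectors; it is only an illustration of the result, not ML.NET's implementation, and ConcatenateRow is a hypothetical name.

```csharp
using System.Linq;

public static class ConcatenateSketch
{
    // Joins the per-row values of several vector columns, in the order given, into one feature vector.
    public static float[] ConcatenateRow(params float[][] columns) =>
        columns.SelectMany(column => column).ToArray();
}
```

So if question1Featurized has n values per row and question2Featurized has m, the resulting Features vector has n + m values, with the question1 values first.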
I am creating a Windows Store app based on the Split App template. What is the best way to save the data from SampleDataSource for later use?
I tried:
Windows.Storage.ApplicationDataContainer roamingSettings = Windows.Storage.ApplicationData.Current.RoamingSettings;
roamingSettings.Values["Data"] = AllGroups;
It throws the exception 'Data of this type is not supported'.
RoamingSettings only supports the Windows Runtime data types (with the exception of Uri); in addition, there are limits on how much data you can save per setting and in total.
You'd be better off using RoamingFolder (or perhaps LocalFolder) for the storage aspects.
For the serialization aspect, you might try DataContractSerializer. If you have a class like:
public class MyData
{
public int Prop1 { get; set; }
public int Prop2 { get; set; }
}
public ObservableCollection<MyData> coll;
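then write it as follows (ReplaceExisting avoids an exception when data.txt already exists; OpenStreamForWriteAsync is an extension method from the System.IO namespace):
var f = await Windows.Storage.ApplicationData.Current.LocalFolder.CreateFileAsync("data.txt", Windows.Storage.CreationCollisionOption.ReplaceExisting);
using (var st = await f.OpenStreamForWriteAsync())
{
var s = new DataContractSerializer(typeof(ObservableCollection<MyData>),
new Type[] { typeof(MyData) });
s.WriteObject(st, coll);
}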
and read it back like this:
using (var st = await Windows.Storage.ApplicationData.Current.LocalFolder.OpenStreamForReadAsync("data.txt"))
{
var t = new DataContractSerializer(typeof(ObservableCollection<MyData>),
new Type[] { typeof(MyData) });
var col2 = t.ReadObject(st) as ObservableCollection<MyData>;
}
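The serialization step does not depend on the Windows storage APIs, so it can be checked on any .NET runtime by using a MemoryStream in place of the file stream. A minimal sketch that repeats the MyData class from above; SerializerRoundTrip is a hypothetical helper name.

```csharp
using System.Collections.ObjectModel;
using System.IO;
using System.Runtime.Serialization;

public class MyData
{
    public int Prop1 { get; set; }
    public int Prop2 { get; set; }
}

public static class SerializerRoundTrip
{
    // Writes the collection to an in-memory stream and reads it back with the same serializer.
    public static ObservableCollection<MyData> RoundTrip(ObservableCollection<MyData> coll)
    {
        var serializer = new DataContractSerializer(typeof(ObservableCollection<MyData>));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, coll);
            stream.Position = 0; // rewind before reading
            return (ObservableCollection<MyData>)serializer.ReadObject(stream);
        }
    }
}
```

MyData is the collection's item type, so the serializer knows it without the extra known-types array; passing new Type[] { typeof(MyData) } as in the answer is harmless.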