Add custom column to IDataView in ML.NET - c#

I'd like to add a custom column after loading my IDataView from file.
In each row, the column value should be the sum of previous 2 values. A sort of Fibonacci series.
I was thinking of creating a custom transformer, but I couldn't find anything that explains how to proceed.
I also cloned the ML.NET Git repository to see how other transformers are implemented, but many of the classes are marked as internal, so I cannot reuse them in my project.

There is a way to create a custom transform with CustomMapping.
Here's an example I used for this answer.
The input and output classes:
class InputData
{
public int Age { get; set; }
}
class CustomMappingOutput
{
public string AgeName { get; set; }
}
class TransformedData
{
public int Age { get; set; }
public string AgeName { get; set; }
}
Then, in the ML.NET program:
MLContext mlContext = new MLContext();
var samples = new List<InputData>
{
new InputData { Age = 16 },
new InputData { Age = 35 },
new InputData { Age = 60 },
new InputData { Age = 28 },
};
var data = mlContext.Data.LoadFromEnumerable(samples);
Action<InputData, CustomMappingOutput> mapping =
(input, output) =>
{
if (input.Age < 18)
{
output.AgeName = "Child";
}
else if (input.Age < 55)
{
output.AgeName = "Man";
}
else
{
output.AgeName = "Grandpa";
}
};
var pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null);
var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);
var dataEnumerable = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: true);
foreach (var row in dataEnumerable)
{
Console.WriteLine($"{row.Age}\t {row.AgeName}");
}
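The example above maps each row independently, which is not quite what the question asks for, since a value that depends on the two previous rows needs state carried across rows. One hedged option is to compute the column in plain C# before handing the rows to mlContext.Data.LoadFromEnumerable. The sketch below assumes the new column should hold the sum of an existing numeric column's values in the two preceding rows (0 where there is no preceding row); the names Row, Value, PrevTwoSum and RunningSum are made up for illustration and are not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type: Value is the loaded column, PrevTwoSum is the new one.
public class Row
{
    public float Value { get; set; }
    public float PrevTwoSum { get; set; }
}

public static class RunningSum
{
    // Walks the values in order and stores, for each row, the sum of the
    // two values that came before it (missing predecessors count as 0).
    public static List<Row> AddPrevTwoSum(IEnumerable<float> values)
    {
        var rows = new List<Row>();
        float prev1 = 0f; // value one row back
        float prev2 = 0f; // value two rows back
        foreach (var v in values)
        {
            rows.Add(new Row { Value = v, PrevTwoSum = prev1 + prev2 });
            prev2 = prev1;
            prev1 = v;
        }
        return rows;
    }
}
```

The resulting list can then be loaded with LoadFromEnumerable, as in the example above. Doing the work up front keeps the row order explicit instead of relying on the order in which a per-row mapping happens to be invoked.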

This is easy, assuming you know how to use pipelines.
This is a part of my project, where I merge two columns together:
IEstimator<ITransformer> pipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null)
.Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question1", outputColumnName: "question1Featurized"))
.Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "question2", outputColumnName: "question2Featurized"))
.Append(mlContext.Transforms.Concatenate("Features", "question1Featurized", "question2Featurized"))
//.Append(mlContext.Transforms.NormalizeMinMax("Features"))
//.AppendCacheCheckpoint(mlContext)
.Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: nameof(customTransform.Label), featureColumnName: "Features"));
As you can see, the two columns question1Featurized and question2Featurized are combined into Features, which is created by the pipeline and can be used like any other column of the IDataView. The Features column does not need to be declared in a separate class.
So in your case you should first convert the columns to the right data type: for strings you can do what I did, and for numeric values use a custom transformer (CustomMapping).
The documentation of the Concatenate function might help as well.

Item classification problem using ML.net with Naive Bayes

I am new to machine learning and ML.NET. I want to solve a task about Excel column identification.
The Excel columns have names like 序号, 编号, 编码, 名称, 项目名称, and each column has a corresponding field name, as follows:
Column_Field.csv
Column	FieldName
序号	OrdCode
编号	OrdCode
编码	OrdCode
名称	Name
项目名称	Name
Each field may have one or more column names, such as 序号, 编号, 编码 for OrdCode. The task is to identify the corresponding field name for an incoming column.
Based on the above dataset, I use ML.NET and want to predict the right field for columns that are read from an Excel file.
I use the Naive Bayes algorithm. The code:
public class Program
{
private static readonly string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data", "Column_Field.csv");
private static void Main(string[] args)
{
MLContext mlContext = new MLContext();
IDataView dataView = mlContext.Data.LoadFromTextFile<ColumnInfo>(_dataPath, hasHeader: true, separatorChar: '\t');
var pipeline = mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: "Label", outputColumnName: "Label")
.Append(mlContext.Transforms.Text.FeaturizeText(outputColumnName: "Features", inputColumnName: "Column"))
.Append(mlContext.MulticlassClassification.Trainers.NaiveBayes())
.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
var model = pipeline.Fit(dataView);
// evaluate
//List<ColumnInfo> dataForEvaluation = new List<ColumnInfo>()
//{
// new ColumnInfo{ Column="名称", FieldName="Name" },
// new ColumnInfo{ Column="<名称>", FieldName="Name" },
// new ColumnInfo{ Column="序号", FieldName="OrdName" },
//};
//IDataView testDataSet = mlContext.Data.LoadFromEnumerable(dataForEvaluation);
//var metrics = mlContext.MulticlassClassification.Evaluate(testDataSet);
//Console.WriteLine($"MicroAccuracy: {metrics.MicroAccuracy:P2}");
//Console.WriteLine($"MicroAccuracy: {metrics.MicroAccuracy:P2}");
// predict
var dataForPredictation = new List<ColumnInfo>();
dataForPredictation.Add(new ColumnInfo { Column = "名称" });
dataForPredictation.Add(new ColumnInfo { Column = "ABC" });
dataForPredictation.Add(new ColumnInfo { Column = "名" });
var engine = mlContext.Model.CreatePredictionEngine<ColumnInfo, Predication>(model);
foreach (var data in dataForPredictation)
{
var result = engine.Predict(data);
Console.WriteLine($"{data.Column}: \t{result.FieldName}");
}
Console.ReadLine();
}
}
public class ColumnInfo
{
[LoadColumn(0)]
public string Column { get; set; }
[LoadColumn(1), ColumnName("Label")]
public string FieldName { get; set; }
}
public class Predication
{
[ColumnName("PredictedLabel")]
public string FieldName { get; set; }
}
However, the result is not as expected.
Result:
名称: OrdCode
ABC: OrdCode
名: OrdCode
So what is wrong with the code? I suppose the problem may be a lack of proper data processing in the pipeline before training.
Thanks.

Mongo error when trying to update nested arrays: No array filter found for identifier

I am trying to update a document in Mongo that represents a community with the following scenario.
A community has a collection of blocks
A block has a collection of floors
A floor has a collection of doors
A door has a collection of label names
Given a document Id and information about the labels that must be placed into each door, I want to use the MongoDb C# driver v2.10.4 and mongo:latest to update nested lists (several levels).
I've been reading the documentation about array filters, but I can't get it working.
I've created a repository from scratch to reproduce the problem, with instructions in the Readme on how to run the integration test and a local MongoDB with Docker.
In summary, my method groups the labels so that I can place names on the desired door in bulk. It then iterates over these groups and, for each one, updates the specific document in Mongo, setting the desired value inside an object nested several levels deep. I couldn't think of a more efficient way.
All the code in the above repo.
The DB document:
public class Community
{
public Guid Id { get; set; }
public IEnumerable<Block> Blocks { get; set; } = Enumerable.Empty<Block>();
}
public class Block
{
public string Name { get; set; } = string.Empty;
public IEnumerable<Floor> Floors { get; set; } = Enumerable.Empty<Floor>();
}
public class Floor
{
public string Name { get; set; } = string.Empty;
public IEnumerable<Door> Doors { get; set; } = Enumerable.Empty<Door>();
}
public class Door
{
public string Name { get; set; } = string.Empty;
public IEnumerable<string> LabelNames = Enumerable.Empty<string>();
}
The problematic method with array filters:
public async Task UpdateDoorNames(Guid id, IEnumerable<Label> labels)
{
var labelsGroupedByHouse =
labels
.ToList()
.GroupBy(x => new { x.BlockId, x.FloorId, x.DoorId })
.ToList();
var filter =
Builders<Community>
.Filter
.Where(x => x.Id == id);
foreach (var house in labelsGroupedByHouse)
{
var houseBlockName = house.Key.BlockId;
var houseFloorName = house.Key.FloorId;
var houseDoorName = house.Key.DoorId;
var names = house.Select(x => x.Name).ToList();
var update =
Builders<Community>
.Update
.Set($"Blocks.$[{houseBlockName}].Floors.$[{houseFloorName}].Doors.$[{houseDoorName}].LabelNames", names);
await _communities.UpdateOneAsync(filter, update);
}
}
The exception is
MongoDB.Driver.MongoWriteException with the message "A write operation resulted in an error.
No array filter found for identifier 'Block 1' in path 'Blocks.$[Block 1].Floors.$[Ground Floor].Doors.$[A].LabelNames'"
Here's a more visual sample of what the nested structure looks like in the database. Notice that the value I want to update is LabelNames, which is an array of strings.
I appreciate any help to have this working and suggestions on whether it's the right approach assuming that I cannot change the repository's method signature.
SOLUTION RESULT:
Thanks for the quick answer @mickl, it works perfectly.
The result is at this repo's specific point of history, exactly as suggested.
The $[{houseBlockName}] expects an identifier that acts as a placeholder and has a corresponding filter defined in arrayFilters (the filtered positional operator). It seems you're passing the filter value directly, which is incorrect.
Your C# code can look like this:
var houseBlockName = house.Key.BlockId;
var houseFloorName = house.Key.FloorId;
var houseDoorName = house.Key.DoorId;
var names = house.Select(x => x.Name).ToList();
var update = Builders<Community>.Update.Set("Blocks.$[block].Floors.$[floor].Doors.$[door].LabelNames", names);
var arrayFilters = new List<ArrayFilterDefinition>();
ArrayFilterDefinition<BsonDocument> blockFilter = new BsonDocument("block.Name", new BsonDocument("$eq", houseBlockName));
ArrayFilterDefinition<BsonDocument> floorFilter = new BsonDocument("floor.Name", new BsonDocument("$eq", houseFloorName));
ArrayFilterDefinition<BsonDocument> doorFilter = new BsonDocument("door.Name", new BsonDocument("$eq", houseDoorName));
arrayFilters.Add(blockFilter);
arrayFilters.Add(floorFilter);
arrayFilters.Add(doorFilter);
var updateOptions = new UpdateOptions { ArrayFilters = arrayFilters };
var result = _communities.UpdateOne(filter, update, updateOptions);
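To illustrate why the original path was rejected: MongoDB requires an array filter identifier to begin with a lowercase letter and contain only alphanumeric characters, so a data value such as "Block 1" cannot act as one. The following is a minimal sketch in plain C# with no driver dependency; ArrayFilterPath, IsValidIdentifier and Build are hypothetical helper names introduced here, not part of the MongoDB driver.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArrayFilterPath
{
    // Identifier rule: starts with a lowercase letter, then letters or digits only.
    public static bool IsValidIdentifier(string id) =>
        Regex.IsMatch(id, @"\A[a-z][a-zA-Z0-9]*\z");

    // Builds the update path from placeholder identifiers (not from data values).
    public static string Build(string blockId, string floorId, string doorId)
    {
        foreach (var id in new[] { blockId, floorId, doorId })
        {
            if (!IsValidIdentifier(id))
                throw new ArgumentException($"'{id}' is not a valid array filter identifier.");
        }
        return $"Blocks.$[{blockId}].Floors.$[{floorId}].Doors.$[{doorId}].LabelNames";
    }
}
```

With this helper, Build("block", "floor", "door") yields the path used in the answer above, while Build("Block 1", "Ground Floor", "A") throws before anything is sent to the server. The actual values ("Block 1" and so on) belong in the array filter documents, as the answer shows.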
{
var filterCompany = Builders<CompanyInfo>.Filter.Eq(x => x.Id, Timekeepping.CompanyID);
var update = Builders<CompanyInfo>.Update.Set("LstPersonnel.$[i].Timekeeping.$[j].CheckOutDate", DateTime.UtcNow);
var arrayFilters = new List<ArrayFilterDefinition>
{
new BsonDocumentArrayFilterDefinition<BsonDocument>(new BsonDocument("i.MacAddress",new BsonDocument("$eq", Timekeepping.MacAddress) )),
new BsonDocumentArrayFilterDefinition<BsonDocument>(new BsonDocument("j.Id", new BsonDocument("$eq", timeKeeping.Id)))
};
var updateOptions = new UpdateOptions { ArrayFilters = arrayFilters};
var updateResult = await _companys.UpdateOneAsync(filterCompany, update, updateOptions);
return updateResult.ModifiedCount != 0;
}

Converting two-dimensional array to an object c#

This all originates from querying Google Analytics data. For a basic query the main factors that change are the dimensions and the metrics. The object that is returned is of a type called GaData, and the actual results that you need reside in GaData.Rows.
The format of GaData.Rows looks like this:
There will first be a row for each dimension, in this example there is a row for "New Visitor" and a 2nd row for "Returning Visitor". Within those rows will be another set of rows that contain the Dimension value, and then each metric that you specify (I've only asked for one metric).
So far the class setup I have is as follows:
public class Results
{
public List<Dimension> Dimensions { get; set; }
}
public class Dimension
{
public string Value { get; set; }
public List<Metric> Metrics { get; set; }
}
public class Metric
{
public int Value { get; set; }
}
Finally, maybe it's just late and my brain isn't functioning well, but I'm having some difficulty converting this data into the Results type, I think because of the multiple layers. Any help?
Edit
I added an answer below for how I ended up accomplishing it, if anyone has a more condensed example let me know!
Well, I don't know what Rows is inside Ga, but maybe this will point you in the right direction.
var results
= GaData.Rows.Select(x => x.Rows.Select(y =>
new Dimension { Value = y.Value, Metrics = new List<Metric> { y.Metric } }));
I ended up creating an extension method for GaData called ToDimensionResults(). I'm not sure if I would have been able to accomplish this using LINQ as I needed to know the index of some of the rows (like the Dimension Value). So I opted to just loop through both dimensions and metrics and create the class manually. NOTE: if you do not include a dimension in your query, the results do not contain the dimension value, only a list of metrics, so this accommodates that possibility.
public static Results ToDimensionResults(this GaData ga)
{
var results = new Results();
var dimensions = new List<Dimension>();
List<Metric> metrics;
var value = "";
var metricStartIndex = 1;
for (var i = 0; i < ga.Rows.Count; i++)
{
// accommodate data without dimensions
if (!string.IsNullOrEmpty(ga.Query.Dimensions))
{
value = ga.Rows[i][0].ToString();
}
else
{
value = "";
metricStartIndex = 0;
}
metrics = new List<Metric>();
for (var x = metricStartIndex; x < ga.Rows[i].Count; x++)
{
metrics.Add(new Metric
{
Value = Convert.ToInt32(ga.Rows[i][x])
});
}
dimensions.Add(new Dimension
{
Value = value,
Metrics = metrics
});
}
results.Dimensions = dimensions;
return results;
}
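For anyone after a more condensed version, the same conversion can probably be written with LINQ. The sketch below is an assumption-laden variant, not a drop-in replacement: it takes the rows as a sequence of string lists (which is how I understand GaData.Rows to be shaped) plus a hasDimension flag in place of the ga.Query.Dimensions check, and it redeclares the classes from the question so it stands alone. RowConverter and ToResults are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Metric { public int Value { get; set; } }

public class Dimension
{
    public string Value { get; set; }
    public List<Metric> Metrics { get; set; }
}

public class Results { public List<Dimension> Dimensions { get; set; } }

public static class RowConverter
{
    // Each row is [dimensionValue, metric1, metric2, ...] when a dimension was
    // requested, or just [metric1, metric2, ...] when it was not.
    public static Results ToResults(IEnumerable<IList<string>> rows, bool hasDimension)
    {
        int metricStartIndex = hasDimension ? 1 : 0;
        return new Results
        {
            Dimensions = rows.Select(row => new Dimension
            {
                Value = hasDimension ? row[0] : "",
                Metrics = row.Skip(metricStartIndex)
                             .Select(cell => new Metric { Value = Convert.ToInt32(cell) })
                             .ToList()
            }).ToList()
        };
    }
}
```

Skip(metricStartIndex) removes the need to track the index by hand, which was the main reason for using explicit loops.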

Creating Relationships Between Nodes in Neo4j with Neo4jClient in C#

I'm working with Neo4j using the .Net Neo4jClient (http://hg.readify.net/neo4jclient/wiki/Home). In my code, nodes are airports and relationships are flights.
If I want to create nodes and relationships at the same time, I can do it with the following code:
Classes
public class Airport
{
public string iata { get; set; }
public string name { get; set; }
}
public class flys_toRelationship : Relationship, IRelationshipAllowingSourceNode<Airport>, IRelationshipAllowingTargetNode<Airport>
{
public static readonly string TypeKey = "flys_to";
// Assign Flight Properties
public string flightNumber { get; set; }
public flys_toRelationship(NodeReference targetNode)
: base(targetNode)
{ }
public override string RelationshipTypeKey
{
get { return TypeKey; }
}
}
Main
// Create a New Graph Object
var client = new GraphClient(new Uri("http://localhost:7474/db/data"));
client.Connect();
// Create New Nodes
var lax = client.Create(new Airport() { iata = "lax", name = "Los Angeles International Airport" });
var jfk = client.Create(new Airport() { iata = "jfk", name = "John F. Kennedy International Airport" });
var sfo = client.Create(new Airport() { iata = "sfo", name = "San Francisco International Airport" });
// Create New Relationships
client.CreateRelationship(lax, new flys_toRelationship(jfk) { flightNumber = "1" });
client.CreateRelationship(lax, new flys_toRelationship(sfo) { flightNumber = "2" });
client.CreateRelationship(sfo, new flys_toRelationship(jfk) { flightNumber = "3" });
The problem, however, is when I want to add relationships to already existing nodes. Say I have a graph consisting of only two nodes (airports), say SNA and EWR, and I would like to add a relationship (flight) from SNA to EWR. I try the following and it fails:
// Create a New Graph Object
var client = new GraphClient(new Uri("http://localhost:7474/db/data"));
client.Connect();
Node<Airport> departure = client.QueryIndex<Airport>("node_auto_index", IndexFor.Node, "iata:sna").First();
Node<Airport> arrival = client.QueryIndex<Airport>("node_auto_index", IndexFor.Node, "iata:ewr").First();
//Response.Write(departure.Data.iata); <-- this works fine, btw: it prints "sna"
// Create New Relationships
client.CreateRelationship(departure, new flys_toRelationship(arrival) { flightNumber = "4" });
The two errors I'm receiving are as follows:
1) Argument 1: cannot convert from 'Neo4jClient.Node' to 'Neo4jClient.NodeReference'
2) The type arguments for method 'Neo4jClient.GraphClient.CreateRelationship(Neo4jClient.NodeReference, TRelationship)' cannot be inferred from the usage. Try specifying the type arguments explicitly.
The method the error is referring to is in the following class: http://hg.readify.net/neo4jclient/src/2c5446c17a65d6e5accd420a2dff0089799cbe16/Neo4jClient/GraphClient.cs?at=default
Any ideas?
In your CreateRelationship call you will need to use the node references, not the nodes, so:
client.CreateRelationship(departure.Reference, new flys_toRelationship(arrival.Reference) { flightNumber = "4" });
The reason why your initial creation code works and this didn't is because Create returns you a NodeReference<Airport> (the var is hiding that for you), and the QueryIndex returns a Node<Airport> instance instead.
Neo4jClient uses NodeReferences for the majority of its operations.
The second error was a consequence of the first: without the .Reference property the compiler couldn't determine the type arguments. Once you use .Reference, that error will go away as well.

How to cast from a list with objects into another list.

I have a method, BindControls, which takes data from the GUI and stores it in a list, "ieis".
Then, in another class, which sends this data through web services, I want to take the data from "ieis" and put it into the fields required by the WS class (a code snippet is below).
This is the interface:
void BindControls(ValidationFrameBindModel<A.B> model)
{
model.Bind(this.mtbxTax, (obj, value) =>
{
var taxa = TConvertor.Convert<double>((string)value, -1);
if (taxa > 0)
{
var ieis = new List<X>();
var iei = new X
{
service = new ServiceInfo
{
id = Constants.SERVICE_TAX
},
amount = taxa,
currency = new CurrencyInfo
{
id = Constants.DEFAULT_CURRENCY_ID
}
};
ieis.Add(iei);
}
},"Tax");
}
This is the intermediate property:
//**********
class A
{
public B BasicInfo
{
get;
set;
}
class B
{
public X Tax
{
get;
set;
}
}
}
//***********
This is the class which sends through WS:
void WebServiceExecute(SomeType someParam)
{
// 'iai' holds the data that comes from the interface
var iai = base.Params.FetchOrDefault<A>( INFO, null);
var convertedObj = new IWEI();
//...
var lx = new List<X>();
// 1st way: I tried to put all the data from 'Tax' into my local list 'lx'
// lx.Add(iai.BasicInfo.Tax); - this does not work
// 2nd way: I tried to copy the data into 'lx' field by field
var iei = new X
{
service = new ServiceInfo
{
id = iai.BasicInfo.Tax.service.id
},
amount = iai.BasicInfo.Tax.amount,
currency = new CurrencyInfo
{
id = iai.BasicInfo.Tax.currency.id
}
};
lx.Add(iei);
// but this does not work either
Can you please suggest a way to do this properly (take the data from "ieis" and put it into "lx")?
Thank you so much.
As noted in my comment, it looks like iai.BasicInfo.Tax is null. Once you find out why it is null, your original Add() (#1) will work.
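While tracking down the null, a guard around the Add() call avoids the exception. The sketch below uses simplified stand-ins for the classes in the question (A, B, X, ServiceInfo, CurrencyInfo are flattened here, and the int type of the id properties is an assumption); TaxCollector and Collect are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the question's types; property types are assumed.
public class ServiceInfo { public int id { get; set; } }
public class CurrencyInfo { public int id { get; set; } }

public class X
{
    public ServiceInfo service { get; set; }
    public double amount { get; set; }
    public CurrencyInfo currency { get; set; }
}

public class B { public X Tax { get; set; } }
public class A { public B BasicInfo { get; set; } }

public static class TaxCollector
{
    // Adds Tax to the list only when every link in the chain is non-null.
    public static List<X> Collect(A info)
    {
        var lx = new List<X>();
        var tax = info?.BasicInfo?.Tax;
        if (tax != null)
        {
            lx.Add(tax);
        }
        return lx;
    }
}
```

One place worth checking: in the BindControls snippet the list "ieis" is a local variable inside the callback and is never assigned to the model, so nothing may ever be writing to Tax.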
