Why is the ReadFromEnumerable method not working? ML.NET - c#

I'm trying to recreate the following sample https://github.com/dotnet/machinelearning/blob/master/docs/samples/Microsoft.ML.Samples/Dynamic/SsaSpikeDetectorTransform.cs
but I keep getting an error that DataOperations doesn't contain a definition for the ReadFromEnumerable method.
I also get an error that the CreateEnumerable method doesn't exist, but I suspect it is related to the ReadFromEnumerable error.
I've copied all of the namespaces and code in case I missed something, but the error still occurs.
ReadFromEnumerable method:
var ml = new MLContext();
// Generate sample series data with a recurring pattern and a spike within the pattern
const int SeasonalitySize = 5;
const int TrainingSeasons = 3;
const int TrainingSize = SeasonalitySize * TrainingSeasons;
var data = new List<SsaSpikeData>();
for (int i = 0; i < TrainingSeasons; i++)
    for (int j = 0; j < SeasonalitySize; j++)
        data.Add(new SsaSpikeData(j));
// This is a spike
data.Add(new SsaSpikeData(100));
for (int i = 0; i < SeasonalitySize; i++)
    data.Add(new SsaSpikeData(i));
// Convert data to IDataView.
var dataView = ml.Data.ReadFromEnumerable(data); // This is where the error occurs
CreateEnumerable method:
var predictionColumn = ml.CreateEnumerable<SsaSpikePrediction>(transformedData, reuseRowObject: false);

Just like dlatikay said, it was a version mismatch.
The sample I provided is from a version that's still in preview.
For ML.NET 0.9.0 and older versions, you need to use CreateStreamingDataView.
To get ReadFromEnumerable and CreateEnumerable working, you can download the ML.NET 0.10.0 or 0.11.0 preview packages from here: https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML/0.11.0-preview-27404-5

Related

Possibilities to improve performance using vectorization for the following function in C#?

I have a function that estimates correlation between two input arrays.
The input is fed from dataDict, a Dictionary<string, double?[]> with 153 keys, each mapping to an array of 1500 values.
For each individual key, I need to estimate its correlation with all other keys and store the result to a double[,] that has a size of double[dataDict.Count(), dataDict.Count()]
The following function prepares two double[] arrays whose correlation needs to be estimated.
public double[,] CalculateCorrelation(Dictionary<string, double?[]> dataDict, string corrMethod = "kendall")
{
    CorrelationLogicModule correlationLogicModule = new CorrelationLogicModule();
    double[,] correlationMatrix = new double[dataDict.Count(), dataDict.Count()];
    for (int i = 0; i < dataDict.Count; i++)
    {
        for (int j = 0; j < dataDict.Count; j++)
        {
            var arrayA = dataDict[dataDict.ElementAt(i).Key].Cast<double>().ToArray();
            var arrayB = dataDict[dataDict.ElementAt(j).Key].Cast<double>().ToArray();
            correlationMatrix[i, j] = correlationLogicModule.Kendall_Formula(arrayA, arrayB);
        }
    }
    return correlationMatrix;
}
The following function (which I found on the internet) computes the correlation between two input arrays using Kendall's method.
public double Kendall_Formula(double[] Ticker1, double[] Ticker2)
{
    double NbrConcord, NbrDiscord, S;
    NbrConcord = 0;
    NbrDiscord = 0;
    S = 0;
    for (int i = 0; i < Ticker1.Length - 1; i++)
    {
        for (int j = i + 1; j < Ticker1.Length; j++)
        {
            // Compute the number of concordant pairs
            if (((Ticker1[i] < Ticker1[j]) & (Ticker2[i] < Ticker2[j])) | ((Ticker1[i] > Ticker1[j]) & (Ticker2[i] > Ticker2[j])))
            {
                NbrConcord++;
            }
            // Compute the number of discordant pairs
            else if (((Ticker1[i] > Ticker1[j]) & (Ticker2[i] < Ticker2[j])) | ((Ticker1[i] < Ticker1[j]) & (Ticker2[i] > Ticker2[j])))
            {
                NbrDiscord++;
            }
        }
    }
    S = NbrConcord - NbrDiscord;
    // Proportion with the total pairs
    return 2 * S / (Ticker1.Length * (Ticker1.Length - 1));
}
With this approach, it takes a very long time to calculate the correlations for all the keys.
Is there a way to improve the performance?
I am new to C#, but I have used Python for a long time, and with NumPy and pandas I am sure the above operation would take seconds to compute. For example, if I had the above data in a pandas DataFrame, then data[[list of columns]].corr('method') would return the result in seconds. This is because pandas uses NumPy under the hood, which benefits from vectorization. I would like to learn how I can use vectorization to improve the performance of the above code in C#, and whether there are other factors I need to consider. Thank you!
You are using dataDict[dataDict.ElementAt(i).Key] to access the dictionary values in an undefined order. I don't know if that's what you intended, but the following code should give the same results.
If you call dataDict.Values.ToArray(); you will get the dictionary values in the same order as you would when using foreach to iterate over it. That means that it will be the same as the order when using dataDict[dataDict.ElementAt(i).Key].
Therefore this code should be equivalent, and it should be faster:
public double[,] CalculateCorrelation(Dictionary<string, double?[]> dataDict, string corrMethod = "kendall")
{
    CorrelationLogicModule correlationLogicModule = new CorrelationLogicModule();
    var values = dataDict.Values.Select(array => array.Cast<double>().ToArray()).ToArray();
    double[,] correlationMatrix = new double[dataDict.Count, dataDict.Count];
    for (int i = 0; i < dataDict.Count; i++)
    {
        for (int j = 0; j < dataDict.Count; j++)
        {
            var arrayA = values[i];
            var arrayB = values[j];
            correlationMatrix[i, j] = correlationLogicModule.Kendall_Formula(arrayA, arrayB);
        }
    }
    return correlationMatrix;
}
Note that the .ElementAt() call in your original code is a LINQ extension method, not a member of Dictionary<TKey,TValue>. It iterates from the start of the dictionary EVERY TIME you call it, and it also returns items in an unspecified order. From the documentation: "For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair<TKey,TValue> structure representing a value and its key. The order in which the items are returned is undefined."
Also:
You should change the bitwise & to logical && in your conditions. The use of & will prevent the compiler applying a boolean short-circuit optimisation, meaning that all the < / > comparisons will be performed, even if the first condition is false.
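One further saving, separate from SIMD vectorization, is that Kendall's coefficient as computed above is symmetric in its two inputs, so only the upper triangle of the matrix needs to be calculated. The following is a minimal sketch of that idea combined with the short-circuiting && and || operators. The names KendallSketch, Tau and Matrix are hypothetical and not part of the original code, and the sketch assumes the inputs have already been converted to double[][] as in the answer above.

```csharp
using System;

public static class KendallSketch
{
    // Same formula as Kendall_Formula above, with short-circuiting operators.
    public static double Tau(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Series must have the same length.");

        long concordant = 0, discordant = 0;
        for (int i = 0; i < a.Length - 1; i++)
        {
            for (int j = i + 1; j < a.Length; j++)
            {
                if ((a[i] < a[j] && b[i] < b[j]) || (a[i] > a[j] && b[i] > b[j]))
                    concordant++;
                else if ((a[i] > a[j] && b[i] < b[j]) || (a[i] < a[j] && b[i] > b[j]))
                    discordant++;
            }
        }

        double n = a.Length;
        return 2.0 * (concordant - discordant) / (n * (n - 1));
    }

    // Fills the full matrix while computing each off-diagonal pair only once.
    public static double[,] Matrix(double[][] series)
    {
        int count = series.Length;
        var matrix = new double[count, count];
        for (int i = 0; i < count; i++)
        {
            // The diagonal is still computed so that tied values behave as in the original.
            matrix[i, i] = Tau(series[i], series[i]);
            for (int j = i + 1; j < count; j++)
            {
                double tau = Tau(series[i], series[j]);
                matrix[i, j] = tau;
                matrix[j, i] = tau; // symmetry: Tau(a, b) == Tau(b, a)
            }
        }
        return matrix;
    }
}
```

With 153 series this roughly halves the number of Kendall_Formula-style calls. Because the rows are independent, the outer loop could also be parallelized, but that is not shown here.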

How to create a table of contents using NPOI?

I can barely find anything for NPOI; what I do find is for POI, and even that is lacking.
I found this for poi:
CTP ctP = p.getCTP();
CTSimpleField toc = ctP.addNewFldSimple();
toc.setInstr("TOC \\h");
toc.setDirty(STOnOff.TRUE);
And I was able to "adapt" it into this:
XWPFParagraph p=doc.CreateParagraph();
CT_P ctP = p.GetCTP();
CT_SimpleField toc = ctP.***Field not working***;
toc.instr="TOC \\h";
toc.dirty=ST_OnOff.True;
(Where I wrote "Field not working", it's because I can't find the C# equivalent.)
Also found
XWPFDocument doc = new XWPFDocument();
doc.CreateTOC();
But can't find how to set it up.
It might be simple, but I'm still learning and can't find proper documentation. (Also, if you can help me add pagination, that would be awesome.)
Thanks in advance :)
private static XWPFTable createXTable(XWPFDocument myDoc, DataTable dtSource)
{
    int rowCount = dtSource.Rows.Count;
    int columnCount = dtSource.Columns.Count;
    CT_Tbl table = new CT_Tbl();
    XWPFTable xTable = new XWPFTable(table, myDoc, rowCount, columnCount);
    xTable.Width = 5000;
    for (int i = 0; i < rowCount; i++)
    {
        for (int j = 0; j < columnCount; j++)
        {
            xTable.GetRow(i).GetCell(j).SetParagraph(SetCellText(xTable, dtSource.Rows[i][j].ToString()));
        }
    }
    return xTable;
}

How can I convert a 3D C# array to a 3D Matlab array as an importable .mat file?

I have a 3D double array double[,,] surfaceData = new double[5, 304, 304]; that I then populate with nested for loops. It works great in C#, but how do I convert it to a .mat Matlab-readable file?
I am using csmatio. I can output .mat files with it:
List<MLArray> mlList = new List<MLArray>();
mlList.Add(mlDouble);
MatFileWriter mfw = new MatFileWriter("SurfaceDataTest.mat", mlList, false);
...where mlDouble is an MLDouble object in csmatio. This is no issue. The issue is populating that mlDouble when I can't directly reference three indices (mlDouble[4,3,60] for example). Instead, the usage guidelines suggest I populate my 3D array like so...
I have tried many nested for loops and haven't yet found a solution.
Here is a messy example:
for (int i = 0; i < 304; i++)
{
    for (int j = 0; j < 304; j++)
    {
        for (int k = 0; k < 5; k++)
        {
            mlDouble.Set(surfaceData[k, j, i], i, j * k);
        }
    }
}
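For background on the indexing problem: MATLAB stores multi-dimensional arrays in column-major order, so the first index varies fastest. The sketch below only illustrates that mapping by flattening a C# double[,,] into a column-major double[]. ColumnMajorSketch and Flatten are hypothetical names, and the sketch does not call csmatio at all; how a flat index maps onto the arguments of mlDouble.Set is an assumption that would need to be checked against the csmatio documentation.

```csharp
using System;

public static class ColumnMajorSketch
{
    // Flattens source[i0, i1, i2] into a column-major buffer:
    // flatIndex = i0 + d0 * (i1 + d1 * i2), where d0 and d1 are the first two dimension sizes.
    public static double[] Flatten(double[,,] source)
    {
        int d0 = source.GetLength(0);
        int d1 = source.GetLength(1);
        int d2 = source.GetLength(2);
        var flat = new double[d0 * d1 * d2];

        for (int i2 = 0; i2 < d2; i2++)
        {
            for (int i1 = 0; i1 < d1; i1++)
            {
                for (int i0 = 0; i0 < d0; i0++)
                {
                    flat[i0 + d0 * (i1 + d1 * i2)] = source[i0, i1, i2];
                }
            }
        }
        return flat;
    }
}
```

For the 5 x 304 x 304 array in the question, element [k, j, i] would land at flat index k + 5 * (j + 304 * i). Note that the j * k expression in the loop above maps many different (j, k) pairs to the same value, which is one reason that attempt overwrites data.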
In case this helps anyone, I found it easier to use MatFileHandler instead of csmatio.
In MatFileHandler, simply define a DataBuilder:
DataBuilder builder = new DataBuilder();
Define a MATLAB variable with a string name and the C# object:
var matVar = builder.NewVariable("<VariableNameForML>", csharpvar);
Create a list of variables you want to add, even if only adding one:
List<IVariable> matList = new List<IVariable>();
matList.Add(matVar);
Then:
var matFile = builder.NewFile(matList);
using (var fileStream = new FileStream("SurfaceData.mat", FileMode.Create))
{
    var writer = new MatFileWriter(fileStream);
    writer.Write(matFile);
}
Hope this helped anyone in the same position as me :)

System.Numerics.Vector.GreaterThan and bool results

I am trying to convert some existing code so that it can be optimized using SIMD instructions. I am testing how much performance SIMD can give me for a piece of mask-generation code, and below is an oversimplified chunk that I am using to profile it.
Random r = new Random();
var random1 = new double[65536000*4];
var random2 = new double[random1.Length];
var result = new bool[random1.Length];
for (int i = 0; i < random1.Length; i++)
{
    random1[i] = r.Next();
    random2[i] = r.Next();
}
var longRes = new long[random1.Length];
// The array length is a multiple of Vector<double>.Count, so no scalar tail loop is needed here
for (int i = 0; i < result.Length; i += Vector<double>.Count)
{
    Vector<double> v1 = new Vector<double>(random1, i);
    Vector<double> v2 = new Vector<double>(random2, i);
    Vector<long> res = System.Numerics.Vector.GreaterThan(v1, v2);
    res.CopyTo(longRes, i);
}
Is there a technique I could use to efficiently put the result res into the result array?
Originally I thought I could live with Vector<long> and keep the masks in long[] but I realized that maybe this is not feasible.
As commented on the original question, I came to realize that System.Numerics.Vector.GreaterThan and similar methods such as LessThan are designed for use with ConditionalSelect().
In my case, I was trying to generate a bool array representing an image mask that is later used throughout the API, and converting the long values to bool wouldn't be feasible.
In other words, these comparison methods were not meant for general-purpose use.
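To illustrate the intended pairing, here is a minimal sketch in which the Vector<long> mask from GreaterThan is fed straight into ConditionalSelect rather than stored. MaskSketch, SelectGreater and the fallback parameter are hypothetical names chosen for this example; the selection rule (keep a[i] where a[i] > b[i], otherwise use a fallback value) is only an assumed use case, not the original mask logic.

```csharp
using System;
using System.Numerics;

public static class MaskSketch
{
    // output[i] = a[i] if a[i] > b[i], otherwise fallback.
    public static void SelectGreater(double[] a, double[] b, double fallback, double[] output)
    {
        int width = Vector<double>.Count;
        var fallbackVector = new Vector<double>(fallback);
        int i = 0;

        // Vectorized part: process full vectors only.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<double>(a, i);
            var vb = new Vector<double>(b, i);
            Vector<long> mask = Vector.GreaterThan(va, vb);
            Vector.ConditionalSelect(mask, va, fallbackVector).CopyTo(output, i);
        }

        // Scalar tail for lengths that are not a multiple of the vector width.
        for (; i < a.Length; i++)
        {
            output[i] = a[i] > b[i] ? a[i] : fallback;
        }
    }
}
```

The mask never leaves the vector domain here, which is the usage pattern these comparison methods appear to be designed for.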

To add the element to List

Below is my code,
List<float?> LValues = new List<float?>();
List<float?> IValues = new List<float?>();
List<float?> BValues = new List<float?>();
List<HMData>[] data = new List<HMData>[4];
List<HMData>[] Data = new List<HMData>[7];
float? Value_LfromList = 0;
float? Value_IfromList = 0;
float? Value_BfromList = 0;
int indexer = 0;
foreach (var item in Read_xml_for_childobjects_id.Root.Descendants("object"))
{
    data[indexer] = new List<HMData>(); // Error occurring on this line
    for (int k = 0; k < 7; k++)
    {
        Value_LfromList = LValues.ElementAt(k);
        Value_IfromList = IValues.ElementAt(k);
        Value_BfromList = BValues.ElementAt(k);
        Data[k].Add(new HMData { x = Value_LfromList, y = Value_IfromList, z = Value_BfromList });
    }
    indexer++;
}
As soon as I try to add an element to the Data list in the following line,
Data[k].Add(new HMData { x = Value_LfromList, y = Value_IfromList, z = Value_BfromList });
I get the error "Object reference not set to an instance of an object".
I want the output to be as shown in the linked question.
I have tried lots of approaches but could not make it work. I would really appreciate any help. Thanks.
Your code is a nightmare. You should really think about refactoring...
You have to initialize the lists within Data array.
List<HMData>[] Data = new List<HMData>[7];
for (int i = 0; i < 7; i++)
    Data[i] = new List<HMData>();
There are tons of other problems and questions that should be asked (such as: what's the difference between data and Data? Why are these arrays sized explicitly?). Without that knowledge, any advice may not be enough to solve your real problem.
You just need to declare the list as
List<HMData> Data = new List<HMData>();
and add new element to the list by
Data.Add(new HMData { x = Value_LfromList, y = Value_IfromList, z = Value_BfromList });
