CNTK Sequences in C#

I made a working script in Python to train a CNTK model with some data samples. Now I'm trying to translate it to C# in CNTK v2.2, but I'm getting different results.
This is what I have in Python to create the model:
import cntk
from cntk.layers import Sequential, For, Dense

def create_model_function(num_hidden_layers, hidden_layers_dim, num_output_classes):
    return Sequential([For(range(num_hidden_layers),
                           lambda i: Dense(hidden_layers_dim, activation=cntk.tanh)),
                       Dense(num_output_classes, init=cntk.glorot_uniform(),
                             activation=cntk.softmax)])
Thanks
My C# function looks like this:
private Function CreateModel(DeviceDescriptor device, int HiddenLayerCount, int HiddenLayerDimension, int OutputClassesCount, Variable Input)
{
    Function[] HiddenLayers = new Function[HiddenLayerCount];
    for (int i = 1; i < HiddenLayerCount - 1; i++)
    {
        HiddenLayers[i] = Dense(HiddenLayers[i - 1], HiddenLayerDimension, device, Activation.Tanh, "");
    }
    return Dense(HiddenLayers[HiddenLayerCount - 1], OutputClassesCount, device, Activation.Sigmoid, "");
}
I'm just not sure this is the equivalent of the Python sequential.

The Python Dense layer is not directly available in the C# API yet, so the Dense helper you used in C# may differ from the CNTK Python implementation. Could you build the model in both C# and Python using only operators that are available in C#, and check whether the two are the same?
I am attaching a C# function to help you inspect the model graph. Use it on the Python model loaded into C#, and compare its output with the graph of the model you created in C#. Thanks.
static void PrintGraph(Function function, int spaces, bool useName = false)
{
    string indent = new string('.', spaces);
    if (function.Inputs.Count() == 0)
    {
        Console.WriteLine(indent + "(" + (useName ? function.Name : function.Uid) + ")" +
            "(" + function.OpName + ")" + function.AsString());
        return;
    }
    foreach (var input in function.Inputs)
    {
        Console.WriteLine(indent + "(" + (useName ? function.Name : function.Uid) + ")" +
            "(" + function.OpName + ")" + "->" +
            "(" + (useName ? input.Name : input.Uid) + ")" + input.AsString());
    }
    foreach (var input in function.Inputs)
    {
        if (input.Owner != null)
        {
            Function f = input.Owner;
            PrintGraph(f, spaces + 4, useName);
        }
    }
}

The following example shows a simple feed-forward path from left to right. To generate the deep network, adjust the for loop to your requirements; in this example, the loop alternates the number of nodes between even and odd iterations. CreateUnitLayer builds a unit layer: LXNodes on the left side connected to LYNodes on the right side. The other variables are self-explanatory.
The ParameterVector NetParamVec is needed to create the trainer; pass it as a parameter if you use a CNTKLib.xxxx_learner function.
Carefully check the connectivity of the input features to the first layer, of the first layer to the intermediate layers, and of the last layer, which finally leads to the sigmoid. Adjust the LXNodes and LYNodes variables to your needs; if all layers have the same number of nodes, set LXNodes = LYNodes = the number of nodes per layer.
Add this code to a class, or move it into a method, as appropriate for your application. NetOut represents the final output of the deep network.
Note that CreateUnitLayer computes only W·x + b with no activation, so a stack of these layers is still a single linear map. To mirror the Python model, wrap each hidden layer in CNTKLib.Tanh and use CNTKLib.Softmax rather than CNTKLib.Sigmoid on the output.
Hope this helps you build the net you are looking for.
Best wishes.
// Assumes features, inputDim, outDim, LayerCount, LXNodes, LYNodes,
// InitWeight, InitBias and device are already defined. NetParamVec must be
// a class field, because CreateUnitLayer adds its parameters to it.
List<Function> Layers = new List<Function>();
ParameterVector NetParamVec = new ParameterVector();

// Define the first layer immediately after the input.
Function layer1 = CreateUnitLayer(features, LXNodes, inputDim, "NetLayer0", InitWeight, InitBias);
Layers.Add(layer1);

// Define the intermediate hidden layers, alternating LXNodes and LYNodes.
for (int i = 1; i < LayerCount; i++)
{
    Function ly;
    if (i % 2 == 0)
        ly = CreateUnitLayer(Layers[i - 1], LXNodes, LYNodes, "NetLayer" + i.ToString(), InitWeight, InitBias);
    else
        ly = CreateUnitLayer(Layers[i - 1], LYNodes, LXNodes, "NetLayer" + i.ToString(), InitWeight, InitBias);
    Layers.Add(ly);
}

// Define the last layer; its input size depends on whether LayerCount is even or odd.
int lastDim = LXNodes;
if (LayerCount % 2 == 0) lastDim = LYNodes;
Function layerLast = CreateUnitLayer(Layers[LayerCount - 1], outDim, lastDim, "NetLayerOut", InitWeight, InitBias);
Layers.Add(layerLast);
Function NetOut = CNTKLib.Sigmoid(layerLast);

// Builds one affine layer: LYNodes outputs computed from LXNodes inputs as W * x + b.
public Function CreateUnitLayer(Variable LXIn, int LYNodes, int LXNodes, string LYName, float InitWeight, float InitBias)
{
    Parameter weightParamy = new Parameter(new int[] { LYNodes, LXNodes }, DataType.Float, InitWeight, device, "W" + LYName);
    Parameter biasParamy = new Parameter(new int[] { LYNodes }, DataType.Float, InitBias, device, "B" + LYName);
    Function LayerY = CNTKLib.Plus(CNTKLib.Times(weightParamy, LXIn), biasParamy);
    // Register the parameters so the learner can update them.
    NetParamVec.Add(weightParamy);
    NetParamVec.Add(biasParamy);
    return LayerY;
}
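Because the alternating LXNodes/LYNodes sizes are easy to get wrong, the dimension bookkeeping of the loop above can be sanity-checked without CNTK at all. The sketch below is a hypothetical helper (LayerShapes and IsConsistent are illustrative names, not CNTK APIs) that replays the same even/odd logic on plain integers and confirms that each layer consumes exactly what the previous one produces.

```csharp
using System;
using System.Collections.Generic;

public static class LayerShapeCheck
{
    // Replays the even/odd loop from the answer on plain integers and returns
    // (input size, output size) for every layer, including the output layer.
    public static List<(int In, int Out)> LayerShapes(int inputDim, int lxNodes, int lyNodes, int layerCount, int outDim)
    {
        var shapes = new List<(int In, int Out)>();
        // First layer: features -> LXNodes.
        shapes.Add((inputDim, lxNodes));
        // Hidden layers: odd i maps LXNodes -> LYNodes, even i maps LYNodes -> LXNodes.
        for (int i = 1; i < layerCount; i++)
        {
            if (i % 2 == 0)
                shapes.Add((lyNodes, lxNodes));
            else
                shapes.Add((lxNodes, lyNodes));
        }
        // Output layer: lastDim -> outDim, with lastDim chosen as in the answer.
        int lastDim = layerCount % 2 == 0 ? lyNodes : lxNodes;
        shapes.Add((lastDim, outDim));
        return shapes;
    }

    // True when every layer's input size equals the previous layer's output size.
    public static bool IsConsistent(List<(int In, int Out)> shapes)
    {
        for (int i = 1; i < shapes.Count; i++)
            if (shapes[i].In != shapes[i - 1].Out) return false;
        return true;
    }
}
```

For example, LayerShapes(4, 8, 6, 3, 2) gives 4→8, 8→6, 6→8 and 8→2, which are the shapes CreateUnitLayer would be asked to build for LayerCount = 3.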

Related

Possible memory leak in simple batch file processing function in c#

I'm running a very simple function that reads lines from a text file in batches. Each line contains a SQL query, so the function grabs a specified number of queries, executes them against the SQL database, then grabs the next batch of queries until the entire file is read. The problem is that over time, with very large files, the process slows down considerably. I'm guessing there is a memory leak somewhere in the function but can't determine where it may be. Nothing else is going on while this function is running. My programming skills are crude at best, so please go easy on me. :)
for (int x = 0; x <= totalBatchesInt; x++)
{
    var lines = System.IO.File.ReadLines(file).Skip(skipCount).Take(batchSize).ToArray();
    string test = string.Join("\n", lines);
    SqlCommand cmd = new SqlCommand(test.ToString());
    try
    {
        var rowsEffected = qm.ExecuteNonQuery(CommandType.Text, cmd.CommandText, 6000, true);
        totalRowsEffected = totalRowsEffected + rowsEffected;
        globalRecordCounter = globalRecordCounter + rowsEffected;
        fileRecordCounter = fileRecordCounter + rowsEffected;
        skipCount = skipCount + batchSize;
        TraceSource.TraceEvent(TraceEventType.Information, (int)ProcessEvents.Starting,
            "Rows progress for " + folderName + "_" + fileName + " : " +
            fileRecordCounter.ToString() + " of " + linesCount + " records");
    }
    catch (Exception esql)
    {
        TraceSource.TraceEvent(TraceEventType.Information, (int)ProcessEvents.Cancelling,
            "Error processing file " + folderName + "_" + fileName + " : " +
            esql.Message.ToString() + ". Aborting file read");
    }
}

There are many things wrong with your code:
1. You never dispose your command. SqlCommand implements IDisposable, and leaving it for the garbage collector to clean up is very bad practice; wrap it in a using block.
2. You shouldn't be sending those commands individually anyway. Either send them all at once in one command, or use transactions to group them together.
3. This one is the reason it gets slower over time: File.ReadLines(file).Skip(skipCount).Take(batchSize) reads the same file over and over, skipping a growing number of lines on every attempt, so each batch is slower than the last as the number of lines skipped (but still read) keeps growing.
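To see why the slowdown keeps growing, assume the file has N lines read in K = N/B batches of B lines. Batch k (counting from 0) skips kB lines and then takes B more, so it enumerates (k + 1)B lines, and the total is quadratic in N:

$$\sum_{k=0}^{K-1} (k+1)\,B \;=\; B\,\frac{K(K+1)}{2} \;=\; \frac{N}{2}\left(\frac{N}{B} + 1\right) \;\approx\; \frac{N^2}{2B}$$

For example, with N = 1,000,000 lines and B = 1,000, that is 500,500,000 lines read instead of 1,000,000.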
To fix #3, simply create the enumerator once and iterate it in batches. In plain C# (the using declaration needs C# 8 or later), you can do something like:
using var enumerator = File.ReadLines(file).GetEnumerator();
for (int x = 0; x <= totalBatchesInt; x++)
{
    var lines = new List<string>();
    // Check the count first, so MoveNext() doesn't consume a line that would then be dropped.
    while (lines.Count < batchSize && enumerator.MoveNext())
        lines.Add(enumerator.Current);
    string test = string.Join("\n", lines);
    // your code...
}
Or, if you're using MoreLINQ (which I recommend), something like this:
foreach (var lines in File.ReadLines(file).Batch(batchSize))
{
    // your code...
}
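If you'd rather not take a dependency on MoreLINQ, a batching extension is only a few lines. This is a minimal sketch; ToBatches is a hypothetical name chosen so it does not clash with MoreLINQ's Batch. On .NET 6 or later, the built-in Enumerable.Chunk does the same job.

```csharp
using System;
using System.Collections.Generic;

public static class BatchExtensions
{
    // Splits a sequence into consecutive chunks of at most batchSize items.
    // The source is enumerated only once, so with File.ReadLines the file is
    // streamed from start to end a single time.
    public static IEnumerable<List<T>> ToBatches<T>(this IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                // Start a fresh list so the caller's batch is never mutated.
                batch = new List<T>(batchSize);
            }
        }
        // Emit the final, possibly partial, batch.
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Usage in the loop from the question would then be foreach (var lines in File.ReadLines(file).ToBatches(batchSize)) { string test = string.Join("\n", lines); ... }, with no Skip calls at all.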

Running multiple sql queries to get matrix results in .NET Core

I am trying to fetch results from the database to generate a matrix of results to send back to the front end. I have percentile values for the X and Y axes, which I divide into 10 parts to get a 10x10 table. For each cell I count distinct user IDs, so it goes like 1-1, 1-2 ... 10-10.
This is my current code (unfinished, just the idea I have so far), which I want to improve because running 100 queries one after another doesn't seem like a nice solution. However, I am a bit stuck on how to improve the performance, and on whether I should return the results in a dictionary of length 100 or in a multidimensional array to make it good-practice code. Thanks in advance for any tips; my code is below:
public async Task GenerateMatrix(List<double> x, string xAxis, List<double> y, string yAxis, Parameters parameters)
{
    IDictionary<string, string> xDict = GenerateRanges(x, parameters.XAxis);
    IDictionary<string, string> yDict = GenerateRanges(y, parameters.YAxis);
    var innerJoin = GenerateInnerJoin(parameters);
    var whereClauses = GenerateWhereClause(parameters);
    var sql = $@"SELECT COUNT(DISTINCT [dbo].[{nameof(Table)}].[{nameof(Table.UserId)}]) FROM [dbo].[{nameof(Table)}] {innerJoin} ";
    if (whereClauses.Any())
    {
        sql += " WHERE " + string.Join(" AND ", whereClauses);
    }
    for (int i = 0; i < x.Count; i++)
    {
        var queryToExecute = "";
        for (int j = 0; j < y.Count; j++)
        {
            queryToExecute = sql + " AND " + xDict.Values.ElementAt(i) + " AND " + yDict.Values.ElementAt(j);
            var userCount = await Store().QueryScalar<int>(queryToExecute);
        }
    }
    return null;
}
private IDictionary<string, string> GenerateRanges(List<double> axis, string columnTitle)
{
    IDictionary<string, string> d = new Dictionary<string, string>();
    for (int i = 0; i < axis.Count; i++)
    {
        var rangeSql = $@" [dbo].[{nameof(Table)}].[{columnTitle}]";
        if (i == 0)
        {
            d.Add(axis[i].ToString(), rangeSql + " < " + axis[i]);
        }
        else if (i == axis.Count - 1)
        {
            d.Add(axis[i] + "+", rangeSql + " > " + axis[i]);
        }
        else
        {
            d.Add(axis[i - 1] + "-" + axis[i], rangeSql + " > " + axis[i - 1] + " AND " + rangeSql + " < " + axis[i]);
        }
    }
    return d;
}
The generated SQL looks like this:
SELECT
    COUNT(DISTINCT [dbo].[Table].[UserId])
FROM [Table]
WHERE Table.[ClientId] = '2'
    AND [dbo].[Table].[ProbabilityAlive] < 0.1
    AND [dbo].[Table].[SpendAverage] < 24.86
so there are going to be 100 queries like this.
ProbabilityAlive and SpendAverage are column titles that come from the front end; there could be any other column titles.
For these two columns I calculate percentile values, which I divide into ten parts, one for the X axis and the other for the Y axis. Then I use the SQL query above to get each matrix value, which means 100 queries since the matrix is 10x10.
As a result I want to get 100 integer values. I'm still trying to figure out whether it's best to put the data in a dictionary, with the x-y range as the key and the select result as the value (e.g. "0-1", 5472), or in a multidimensional array, or something else. xDict contains the range as the key (e.g. "0-1") and the SQL condition as the value (e.g. ProbabilityAlive > 0 AND ProbabilityAlive < 1), and yDict does the same for the Y axis. The two lists x and y contain the 10 double values used for these ranges.

It looks like you want to calculate the user counts for specific ranges of ProbabilityAlive and SpendAverage.
First, you need to generate the range combinations. The easy way to generate combinations in SQL is to join two tables or sets of values.
If you had two tables with the range values like these:
create table ProbabilityRanges
(
    LowBound decimal(3,2), UpperBound decimal(3,2)
)
-- decimal(3,2) tops out at 9.99, so spend values such as 24.86 need a wider type
create table SpendRanges
(
    LowBound decimal(10,2), UpperBound decimal(10,2)
)
You could use a cross join to generate all the combinations:
SELECT
    SpendRanges.LowBound AS SLow,
    SpendRanges.UpperBound AS SUpper,
    ProbabilityRanges.LowBound AS PLow,
    ProbabilityRanges.UpperBound AS PUpper
FROM ProbabilityRanges CROSS JOIN SpendRanges
You can use those combinations to filter and count the rows in another table that fall within those bounds:
SELECT
    SpendRanges.LowBound AS SpendValue,
    ProbabilityRanges.LowBound AS ProbabilityValue,
    COUNT(DISTINCT UserID) AS Count
FROM SomeTable CROSS JOIN ProbabilityRanges CROSS JOIN SpendRanges
WHERE
    SomeTable.ClientID = 2
    AND SomeTable.SpendAverage >= SpendRanges.LowBound
    AND SomeTable.SpendAverage < SpendRanges.UpperBound
    AND SomeTable.ProbabilityAlive >= ProbabilityRanges.LowBound
    AND SomeTable.ProbabilityAlive < ProbabilityRanges.UpperBound
GROUP BY SpendRanges.LowBound, ProbabilityRanges.LowBound
It's possible to create the bounds dynamically for a specific number of bins, e.g. using a Numbers table. You'll have to provide more information on what you actually want, though.
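On the dictionary-versus-array question: once a single grouped query returns one row per (X bin, Y bin) cell, the counts map naturally onto a 10x10 array indexed by bin number. The sketch below assumes you can turn each row into zero-based bin indices, for example by applying BinIndex to each row's low bound; BinIndex and ToMatrix are hypothetical helpers. BinIndex uses half-open intervals so every value falls into exactly one bin, which the strict < and > comparisons in GenerateRanges do not guarantee for values equal to a boundary.

```csharp
using System;
using System.Collections.Generic;

public static class MatrixBins
{
    // Zero-based bin of value given ascending interior cut points:
    // bin 0 is value < cutPoints[0], bin i is cutPoints[i-1] <= value < cutPoints[i],
    // and the last bin (index cutPoints.Count) holds value >= the final cut point.
    // For 10 bins per axis, pass 9 cut points.
    public static int BinIndex(double value, IReadOnlyList<double> cutPoints)
    {
        for (int i = 0; i < cutPoints.Count; i++)
            if (value < cutPoints[i]) return i;
        return cutPoints.Count;
    }

    // Builds an xBins-by-yBins jagged array from (xBin, yBin, count) rows.
    // Cells the GROUP BY omits (no matching users) stay 0.
    public static int[][] ToMatrix(IEnumerable<(int XBin, int YBin, int Count)> rows, int xBins, int yBins)
    {
        var matrix = new int[xBins][];
        for (int i = 0; i < xBins; i++)
            matrix[i] = new int[yBins];

        foreach (var (xBin, yBin, count) in rows)
            matrix[xBin][yBin] = count;
        return matrix;
    }
}
```

A jagged int[][] maps directly to nested JSON arrays, which is convenient when the matrix goes back to the front end; a dictionary keyed by "x-y" strings works too, but then the front end has to parse the keys.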

With just two dimensions, the easiest thing to build and maintain would be a T-SQL stored procedure that outputs a result set (the default output) with what you want.
Pass in as parameters what you got as input from the front end.
What if you had that functionality as a web service (HTTP GET) returning JSON (or XML)?
Simulate it with T-SQL instead.
Since you have direct access to SQL Server, you call a "get"-type stored procedure, pass it parameters, and get a result set back.
It's easy to write and test independently of your .NET Core application, and it will be very fast too.
A "get" type is just a read-only SP; I name mine USP_GET_method_name.
I also use SPs for saving with SQL-side validations, so I call them USP_PUT_method_name.
Cheers

Problems with EquationMgr & SelectionManager SolidWorks API C#

I'm trying to understand the principles of the SolidWorks API, but I have several problems.
Here is my code:
for (var i = 0; i < selMgr.GetSelectedObjectCount(); i++)
{
    var Face = selMgr.GetSelectedObject(i + 1);
    surfaces.Add(Face.GetSurface());
    measure = swModel.Extension.CreateMeasure();
    if (surfaces[i].IsCylinder())
    {
        // Problem # 1
        Console.WriteLine("Cylinder " + i);
        measure.Calculate(surfaces[i]);
        var diameter = measure.Diameter * 1000;
        var length = 1000 * measure.Perimeter / (measure.Diameter * Math.PI);
        var temp = swApp.OpenDoc6(@"E:\OAK\Locator9.SLDPRT", 1, 1, "", 0, 0);
        var part = component.AddComponent5(@"E:\OAK\Locator9.SLDPRT", 0, "", true, "", 0, 0, 0.3);
        swApp.CloseDoc(@"E:\OAK\Locator9.SLDPRT");
        ModelDoc2 locator = part.GetModelDoc();
        var eqMgr = locator.GetEquationMgr();
        Console.WriteLine("Evaluated diameter " + diameter);
        Console.WriteLine("Evaluated length " + length);
        Console.WriteLine(eqMgr.Equation[1] + " " + eqMgr.Equation[2]);
        // Problem #2
        eqMgr.set_Equation(1, $@"""D""={diameter}");
        eqMgr.set_Equation(2, $@"""L""={length}");
        eqMgr.EvaluateAll();
        locator.EditRebuild3();
        locator.ForceRebuild3(false);
    }
    else
    {
        // TODO: Handle other type of surface
    }
}
1) I want to measure the perimeter and diameter of the selected surface. But if GetSelectedObjectCount() returns more than 1, measure.Diameter and measure.Perimeter both return -1. I kind of understand why, because such an operation isn't possible via the UI either, but can I do something to solve the problem?
2) The code above has no effect on the equations of the inserted component, even though it writes them to the console.
Help please!

1) For primitive surfaces you can use the *Params properties of the ISurface object to get the information you need; for a cylinder it would be CylinderParams. I can't find the link right now, but I remember reading that measure shouldn't be used for precise calculations, as it is not guaranteed to be accurate at all times. If you don't care about precision and still want to use measure, you can manually manipulate the set of selected objects.
2) I haven't used IEquationMgr, but in general I try to stay away from VB-style parameterized properties like Equation. I'd suggest trying to delete the equation and then add it again.

Dice Sorensen Distance error calculating Bigrams without using Intersect method

I have been writing a method to calculate the Dice-Sørensen distance between two strings. The logic of the operation is not difficult: you work out which two-letter pairs (bigrams) exist in a string, compare them with those of a second string, and then evaluate this formula:
$$\frac{2\,|X \cap Y|}{|X| + |Y|}$$
where |X| and |Y| are the numbers of bigrams in X and Y. See https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient for further details.
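As a worked check of the formula (the strings "night" and "nacht" are an illustrative choice, not from the question): a string of length L has L − 1 overlapping bigrams, so each of these five-letter words has four, and they share only "ht":

$$X = \{\text{ni}, \text{ig}, \text{gh}, \text{ht}\},\quad Y = \{\text{na}, \text{ac}, \text{ch}, \text{ht}\},\quad \frac{2\,|X \cap Y|}{|X| + |Y|} = \frac{2 \cdot 1}{4 + 4} = 0.25$$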
I have looked up how to do this online in various places, but every method I have come across uses the Intersect method between two lists, and as far as I know this won't work, because if a bigram already exists in the string, Intersect won't add another one. For example, for the string
'aaaa'
I would like there to be 3 'aa' bigrams, but the Intersect method only produces one. If I am wrong about this, please tell me, because I wondered why so many people use the Intersect method. My assumption is based on the MSDN documentation: https://msdn.microsoft.com/en-us/library/bb460136(v=vs.90).aspx
So here is the code I have written:
public static double SorensenDiceDistance(this string source, string target)
{
    // formula   2|X intersection Y|
    //          --------------------
    //               |X| + |Y|
    //create variables needed
    List<string> bigrams_source = new List<string>();
    List<string> bigrams_target = new List<string>();
    int source_length;
    int target_length;
    double intersect_count = 0;
    double result = 0;
    Console.WriteLine("DEBUG: string length source is " + source.Length);
    //base case
    if (source.Length == 0 || target.Length == 0)
    {
        return 0;
    }
    //extract bigrams from string 1
    bigrams_source = source.ListBiGrams();
    //extract bigrams from string 2
    bigrams_target = target.ListBiGrams();
    source_length = bigrams_source.Count();
    target_length = bigrams_target.Count();
    Console.WriteLine("DEBUG: bigram counts are source: " + source_length + " . target length : " + target_length);
    //now we have two sets of bigrams compare them in a non distinct loop
    for (int i = 0; i < bigrams_source.Count(); i++)
    {
        for (int y = 0; y < bigrams_target.Count(); y++)
        {
            if (bigrams_source.ElementAt(i) == bigrams_target.ElementAt(y))
            {
                intersect_count++;
                //Console.WriteLine("intersect count is :" + intersect_count);
            }
        }
    }
    Console.WriteLine("intersect line value : " + intersect_count);
    result = (2 * intersect_count) / (source_length + target_length);
    if (result < 0)
    {
        result = Math.Abs(result);
    }
    return result;
}
Somewhere in the code I call a method named ListBiGrams; this is what it looks like:
public static List<string> ListBiGrams(this string source)
{
    return ListNGrams(source, 2);
}
public static List<string> ListTriGrams(this string source)
{
    return ListNGrams(source, 3);
}
public static List<string> ListNGrams(this string source, int n)
{
    List<string> nGrams = new List<string>();
    if (n > source.Length)
    {
        return null;
    }
    else if (n == source.Length)
    {
        nGrams.Add(source);
        return nGrams;
    }
    else
    {
        for (int i = 0; i < source.Length - n; i++)
        {
            nGrams.Add(source.Substring(i, n));
        }
        return nGrams;
    }
}
So my understanding of the code, step by step, is:
1) pass in the strings
2) check for zero length
3) create the lists and fill them with the bigrams
4) get the length of each bigram list
5) in a nested loop, compare the source bigram at position [i] against every bigram in the target string, then increment i until there are no more source bigrams to check
6) evaluate the formula mentioned above, taken from Wikipedia
7) if the result is negative, apply Math.Abs to return a positive result (however, I know the result should already be between 0 and 1; this is what tipped me off that I was doing something wrong)
The source string I used was source = "this is not a correct string" and the target string was target = "this is a correct string".
The result I got was -0.090909090908.
I'm SURE (99%) that what I'm missing is something small, like a miscalculated length or a miscount somewhere. If anyone could point out what I'm doing wrong, I'd be really grateful. Thank you for your time!
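Two things in the code above appear to distort the result. ListNGrams loops while i < source.Length - n, so it drops the last n-gram (a string of length L has L − n + 1 of them). And the nested loop counts every matching pair, so repeated bigrams are over-counted: once extraction is fixed, "aaaa" compared with itself gives 9 matches among 3 + 3 bigrams, i.e. 2·9/6 = 3, outside [0, 1]. Below is a minimal sketch, not the only valid definition, that treats bigrams as multisets, which is what the question asks for ('aaaa' has three 'aa' bigrams): each shared bigram counts min(occurrences in source, occurrences in target) times. DiceMultiset, NGrams and DiceCoefficient are illustrative names.

```csharp
using System;
using System.Collections.Generic;

public static class DiceMultiset
{
    // All overlapping n-grams: a string of length L yields L - n + 1 of them
    // (none if the string is shorter than n).
    public static List<string> NGrams(string s, int n)
    {
        var grams = new List<string>();
        for (int i = 0; i + n <= s.Length; i++)
            grams.Add(s.Substring(i, n));
        return grams;
    }

    // 2|X ∩ Y| / (|X| + |Y|) over bigram multisets.
    public static double DiceCoefficient(string source, string target)
    {
        var x = NGrams(source, 2);
        var y = NGrams(target, 2);
        if (x.Count + y.Count == 0) return 0;

        // How many unmatched occurrences of each bigram the target still has.
        var remaining = new Dictionary<string, int>();
        foreach (var g in y)
            remaining[g] = remaining.TryGetValue(g, out var c) ? c + 1 : 1;

        // Each source bigram may consume at most one unmatched target occurrence.
        int intersection = 0;
        foreach (var g in x)
        {
            if (remaining.TryGetValue(g, out var c) && c > 0)
            {
                remaining[g] = c - 1;
                intersection++;
            }
        }
        return 2.0 * intersection / (x.Count + y.Count);
    }
}
```

This returns a similarity in [0, 1]; if you need a distance instead, 1 minus this value is the usual choice.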

This looks like homework, but this string similarity metric was new to me, so I took a look.
Algorithm implementations in various languages
As you may notice, the C# version uses a HashSet and takes advantage of the IntersectWith method:
A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
This solves your 'aaaa' puzzle: there is only one bigram there.
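For comparison with the multiset reading in the question, here is a sketch of the set-based version this answer describes, using HashSet<string>.IntersectWith. DiceSet, BigramSet and DiceCoefficient are illustrative names. Duplicates collapse, so "aaaa" contributes a single "aa"; which behaviour is right depends on whether repeated bigrams should count in your application.

```csharp
using System;
using System.Collections.Generic;

public static class DiceSet
{
    // Distinct bigrams of s; the HashSet silently drops repeats.
    public static HashSet<string> BigramSet(string s)
    {
        var set = new HashSet<string>();
        for (int i = 0; i + 2 <= s.Length; i++)
            set.Add(s.Substring(i, 2));
        return set;
    }

    // 2|X ∩ Y| / (|X| + |Y|) with X and Y as sets of distinct bigrams.
    public static double DiceCoefficient(string source, string target)
    {
        var x = BigramSet(source);
        var y = BigramSet(target);
        if (x.Count + y.Count == 0) return 0;

        var shared = new HashSet<string>(x);
        shared.IntersectWith(y); // keeps only bigrams present in both sets
        return 2.0 * shared.Count / (x.Count + y.Count);
    }
}
```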
My naive implementation on Rextester
If you prefer LINQ, I'd suggest Enumerable.Distinct, Enumerable.Union and Enumerable.Intersect. These mimic the duplicate removal of a HashSet very well.
I also found this nice StringMetric framework written in Scala.

If statement in Calculator

I'm a beginner programmer designing a calculator in Visual Studio 2013 using C#. I've run into a slight problem that I can't seem to fix. When I click on the square root button, I want the program to display the square root in the text box, as well as show what's happening in the label.
For example, if I want to calculate 9 + sqrt(16), I need to click 9 + 16, then press the square root button. The label above the text box should show "9 + sqrt(16)" and the text box itself should say "4". This all works like it should. But if I then take the square root of 4, I want the label to say "9 + sqrt(4)". I tried storing the first part (9 + ) as a string, but when the square root button is pressed twice, it displays "9 + sqrt(16) sqrt(4)".
Is there a different way to fix this or am I doing something wrong?
Here's part of the code I tried (result is the text box and expression is the label):
private void sqrt_Click(object sender, EventArgs e)
{
    bool square_root_pressed = false;
    string exp = "";
    Button b = (Button)sender;
    double res = Convert.ToDouble(result.Text);
    if ((equal_pressed) || (operation == ""))
    {
        if (b.Text == "√")
        {
            result.Text = Convert.ToString(Math.Sqrt(res));
            expression.Text = "sqrt(" + Convert.ToString(res) + ") =";
        }
        else
        {
            result.Text = Convert.ToString(Math.Sqrt(res));
            if (square_root_pressed)
            {
                expression.Text = exp + " sqrt(" + Convert.ToString(res) + ")";
            }
            else
            {
                exp = expression.Text;
                expression.Text = expression.Text + " sqrt(" + Convert.ToString(res) + ")";
                square_root_pressed = true;
            }
        }
    }
    // (rest of the handler omitted)
}

The local variable square_root_pressed is always set to false when the event handler is called, so the statement
expression.Text = exp + " sqrt(" + Convert.ToString(res) + ")";
will never be reached. The same applies to exp, which is reset to an empty string on every call.
Try to store the state of the keys pressed outside of the event handler, for example in fields of the form.
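A minimal sketch of that idea, assuming the √ label logic is pulled out of the form into a small hypothetical class (SqrtLabelState is not part of the original code). The flag and the saved prefix are fields, so they survive between clicks instead of being reset on every call.

```csharp
using System;

public class SqrtLabelState
{
    private string _prefix = "";   // label text before the first √, e.g. "9 +"
    private bool _sqrtPressed;     // persists across event-handler calls

    // Returns the new label text after √ is pressed for the given operand.
    public string PressSqrt(string currentLabel, double operand)
    {
        if (!_sqrtPressed)
        {
            // First press: remember what came before the square root.
            _prefix = currentLabel;
            _sqrtPressed = true;
        }
        // Later presses reuse the saved prefix instead of appending again.
        return _prefix + " sqrt(" + Convert.ToString(operand) + ")";
    }

    // Call when a digit or operator starts a new operand.
    public void Reset() => _sqrtPressed = false;
}
```

In the form you would keep one instance as a field and write expression.Text = sqrtState.PressSqrt(expression.Text, res); inside sqrt_Click, calling Reset() from the digit and operator handlers.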

You can add another event handler for the square root button and double the number when that event is called;
once you click the sqrt_Click event, "res" holds the correctly calculated square root.

If I understand your question properly, when the square root button is pressed twice, the label repeats sqrt. A possible solution: when the statement exp = expression.Text executes, check whether the text sqrt already exists in the label, and if so, remove it to avoid duplication.
exp = expression.Text;
if (exp.Contains("sqrt"))
{
    // Drop everything from the first "sqrt" on, plus the space before it.
    exp = exp.Remove(exp.IndexOf("sqrt")).TrimEnd();
}
expression.Text = exp + " sqrt(" + Convert.ToString(res) + ")";
square_root_pressed = true;

I would suggest a totally different approach: I'd store the entire term as a list of objects and write a method to visualize them.
An example for what I mean could be the following:
public class Operation
{
    public string Operator;
}
public class Function
{
    public string Functor;
    public List<object> Term;
}
These two classes would be my little helpers. Now if the user presses only numbers and operators, you can fill a list of term items like this:
public List<object> term = new List<object>();
...
term.Add(3);
term.Add(new Operation() { Operator = "+" });
term.Add(4);
This would be the equivalent of 3+4.
If the user presses the SQRT button, you could do this:
// Get the last element in the term and remove it
object o = term[term.Count - 1];
term.RemoveAt(term.Count - 1);
// Wrap the element in a function
Function f = new Function() { Functor = "SQRT" };
f.Term = new List<object>();
f.Term.Add(o);
// Add the function as new element to the term
term.Add(f);
This would turn the list from
3
+
4
into
3
+
SQRT
4
Similarly, if the user presses SQRT again, you can remove the function:
// Get the last element in the term and remove it
object o = term[term.Count - 1];
term.RemoveAt(term.Count - 1);
// Add the function term as new element to the term
term.AddRange((o as Function).Term);
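The answer leaves the "method to visualize them" open; here is one possible sketch. It repeats the two helper classes so the block compiles on its own, and TermRenderer.Render is a hypothetical name. It walks the term list left to right and recurses into Function items, so the 3, +, SQRT(4) list from above renders as "3 + SQRT(4)".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public class Operation { public string Operator; }
public class Function { public string Functor; public List<object> Term; }

public static class TermRenderer
{
    // Renders numbers, operators and wrapped functions separated by spaces.
    public static string Render(List<object> term)
    {
        var sb = new StringBuilder();
        foreach (var item in term)
        {
            if (sb.Length > 0) sb.Append(' ');
            switch (item)
            {
                case Operation op:
                    sb.Append(op.Operator);
                    break;
                case Function f:
                    // Recurse so nested functions such as SQRT(SQRT(16)) also render.
                    sb.Append(f.Functor).Append('(').Append(Render(f.Term)).Append(')');
                    break;
                default:
                    // Plain numbers; invariant culture keeps "." as the decimal separator.
                    sb.Append(Convert.ToString(item, CultureInfo.InvariantCulture));
                    break;
            }
        }
        return sb.ToString();
    }
}
```

In the calculator, the label would then be refreshed with expression.Text = TermRenderer.Render(term); after every button press.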
