This is probably a long shot, but I asked a question earlier about converting one of the Statistics Toolbox functions into C#, having realised that it would be a huge and lengthy process with little available to automate it (automation was really what I was after, since the references I provided explained why it is so hard to do by hand; the comments I got were along the lines of "why don't you try converting it and ask questions where you get stuck", so my question obviously wasn't understood).
The reason I was looking to do this is the long processing time MATLAB needs for what I'm working on (k-means and Bayes classifiers on large data sets). So I thought: why not convert the code to C# and try my hand at multithreading and multiprocessing? That might provide a practical way to reduce the processing time. But it is obviously extremely hard to convert all of MATLAB's functions to C# by hand to accommodate this.
So my question is: if I import MATLAB's files into C#, is it possible to run them in a multithreaded or multiprocess fashion, or will the imported files just run the way they do in MATLAB?
The reason (I think) it runs slowly in MATLAB is that only some of the functions used by the Statistics Toolbox benefit from MATLAB's built-in multithreading, specifically:
MATHEMATICS
Arrays and matrices
• Basic information: ISFINITE, ISINF, ISNAN, MAX, MIN
• Operators: +, -, .*, ./, .\, .^, *, ^, \ (MLDIVIDE), / (MRDIVIDE)
• Array operations: PROD, SUM
• Array manipulation: BSXFUN, SORT
Linear algebra
• Matrix Analysis: DET, RCOND
• Linear Equations: CHOL, INV, LINSOLVE, LU, QR
• Eigenvalues and singular values: EIG, HESS, SCHUR, SVD, QZ
Elementary math
• Trigonometric: ACOS, ACOSD, ACOSH, ASIN, ASIND, ASINH, ATAN, ATAND, ATANH, COS, COSD, COSH, HYPOT, SIN, SIND, SINH, TAN, TAND, TANH
• Exponential: EXP, POW2, SQRT
• Complex: ABS
• Rounding and remainder: CEIL, FIX, FLOOR, MOD, REM, ROUND
Special Functions
• ERF, ERFC, ERFCINV, ERFCX, ERFINV, GAMMA, GAMMALN
DATA ANALYSIS
• CONV2, FILTER, FFT and IFFT of multiple columns or long vectors, FFTN, IFFTN
So I'm not too sure how, or in what way, I could decrease the processing time; running k-means and the Bayes classifier on tens of thousands of records really is unbearably slow (understandably).
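For context, the kind of C# parallelism I was hoping to exploit looks roughly like this. It is only a hypothetical sketch of the k-means assignment step (not converted toolbox code), and the array layout is just an assumption.

// Hypothetical sketch: parallelising the k-means assignment step with Parallel.For.
// 'data' is N rows of D features, 'centroids' is K rows of D features.
static int[] AssignClusters(double[][] data, double[][] centroids)
{
    var labels = new int[data.Length];
    System.Threading.Tasks.Parallel.For(0, data.Length, i =>
    {
        double best = double.MaxValue;
        int bestK = 0;
        for (int k = 0; k < centroids.Length; k++)
        {
            double dist = 0.0;
            for (int d = 0; d < centroids[k].Length; d++)
            {
                double diff = data[i][d] - centroids[k][d];
                dist += diff * diff;
            }
            if (dist < best) { best = dist; bestK = k; }
        }
        labels[i] = bestK;   // each i is written by exactly one iteration, so no locking is needed
    });
    return labels;
}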
This is not something you will be able to do easily. In fact I would say it is not possible.
If you attempt it you have the following issues to deal with:
Find a (semi-)automated way to convert MATLAB functionality into C#
This does not exist to my knowledge.
Alter the resulting code to be multithreading-enabled
Modifying a mathematical algorithm to support multiple threads is very difficult, and sometimes even impossible, because of the data structures used.
Also keep in mind that some mathematical problems do not scale with the number of processors, so you might not even get the benefit you expect.
I'm looking for a simple function in C# to interpolate my 3D data.
I already have a list of around 100-150 data sets, each with 3 double values:
-25.000000 -0.770568 2.444945
-20.000000 -0.726583 2.467809
-15.000000 -0.723274 2.484167
-10.000000 -0.723114 2.506445
and so on...
The chart created from these values usually looks like this; I'm not sure whether it counts as scattered data or rather still as gridded data ...
In the end I want to hand over two double values and get the third back from the interpolation function. It shouldn't flatten the surface; it should still go through all the given data points.
Since I'm not given the time to look into all possible algorithms and I lack the mathematical background, I'm a bit overwhelmed by all the possibilities that get thrown at me: kriging, Delaunay triangulation, NURBS and many more ...
In addition, most solutions I found on the net were either for a different language, outdated, or paid products (e.g. ILNumerics; I'm still not sure whether they offer a solution).
In MATLAB there is a griddata function that does exactly this (and is based on a kriging algorithm, as far as I know), but in this case C# is mandatory for me.
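Just to illustrate the call signature I'm after, here is a naive inverse-distance-weighting sketch. It does pass through the given points, but it is only a placeholder to show the intended usage (IDW tends to produce flat spots near the samples), not one of the methods listed above.

// Naive inverse distance weighting (Shepard's method): a placeholder to show the intended call shape.
static double Interpolate(double[] xs, double[] ys, double[] zs, double x, double y, double power = 2.0)
{
    double weightSum = 0.0, valueSum = 0.0;
    for (int i = 0; i < xs.Length; i++)
    {
        double dx = x - xs[i], dy = y - ys[i];
        double distSq = dx * dx + dy * dy;
        if (distSq < 1e-12) return zs[i];                  // exactly on a data point
        double w = 1.0 / Math.Pow(distSq, power / 2.0);
        weightSum += w;
        valueSum += w * zs[i];
    }
    return valueSum / weightSum;
}

// double z = Interpolate(xColumn, yColumn, zColumn, -22.5, -0.75);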
Thank you for your help; criticism and suggestions are welcome.
I'm implementing Ng's OCR neural network example in C#.
I think I have all the formulas correctly implemented [vectorized version], and my app is training the network.
Any advice on how I can see my network improving in recognition, without manually testing examples by drawing them after the training is done? I want to see where the training is going while it is running.
I've tested my trained weights on drawn digits; the output is quite similar on all neurons (approximately 0.077, or something like that, on every neuron), and the largest value is on the wrong neuron. So the result doesn't match the drawn image.
This is the only test I'm doing so far: Cost Function changes with epochs
So this is what happens with the cost function (some call it the objective function?) over 50 epochs.
My lambda value is set to 3.0, the learning rate is 0.01, and there are 5000 examples. I do a batch update after each epoch, i.e. after those 5000 examples. Activation function: sigmoid.
input: 400
hidden: 25
output: 10
I don't know what the proper values are for lambda and the learning rate so that my network can learn without overfitting or underfitting.
Any suggestions on how to find out whether my network is learning well?
Also, what value should the cost function J have after all this training?
Should it approach zero?
Should I use more epochs?
Is it bad that my examples are all ordered by digit?
Any help is appreciated.
Q: Any suggestions on how to find out whether my network is learning well?
A: Split the data into three groups: training, cross-validation and test. Validate your result with the test data. This is actually addressed later in the course.
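For example, a small helper for that kind of monitoring could look like the sketch below; predict here is a placeholder for whatever your forward pass exposes (the index of the most activated output neuron).

// Fraction of correctly classified examples on a held-out set.
// 'predict' is a placeholder for your network's forward pass returning the winning output index.
static double Accuracy(double[][] inputs, int[] labels, Func<double[], int> predict)
{
    int correct = 0;
    for (int i = 0; i < inputs.Length; i++)
        if (predict(inputs[i]) == labels[i]) correct++;
    return (double)correct / inputs.Length;
}

// e.g. print Accuracy(validationInputs, validationLabels, Predict) after every epoch and watch the trend.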
Q: Also, what value should the cost function J have after all this training? Should it approach zero?
A: I recall that in the homework Ng mentioned what the expected value is. The regularized cost should not be zero, since it includes a sum over all the weights.
Q: Should I use more epochs?
A: If you run your program long enough (less than 20 minutes?) you will see the cost stop getting smaller. I assume it has reached a local/global optimum, so more epochs would not be necessary.
Q: Is it bad that my examples are all ordered by digit?
A: The algorithm modifies the weights for every example, so a different order of the data does affect each step within a batch. However, the final result should not differ much.
Long story short, I have to solve 20-200 block-tridiagonal linear systems during an iterative process. The systems are 50-100 blocks in size, each block being 50x50 to 100x100. I will write down my thoughts on it here, and I ask you to share your opinion on them, as it is possible that I am mistaken in one regard or another.
To solve those equations I use a matrix version of the Thomas algorithm. It is exactly like the scalar one, except that instead of scalar coefficients in the equations I have matrices (i.e. instead of "a_i x_{i-1} + b_i x_i + c_i x_{i+1} = f_i" I have "A_i X_{i-1} + B_i X_i + C_i X_{i+1} = F_i", where A_i, B_i, C_i are matrices and F_i, X_i are vectors).
The asymptotic complexity of this algorithm is O(N*M^3), where N is the size of the overall matrix in blocks and M is the size of each block.
Right now my bottleneck is the inversion operation. Deep inside nested loops I have to calculate a lot of inversions of the form "(C_i - A_i * alpha_i)^-1", where alpha_i is a dense MxM matrix. I do this with the Gauss-Jordan algorithm, using additional memory (which I will have to use later in the program anyway) and O(M^3) operations.
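To make the structure concrete, here is a stripped-down sketch of the sweep with naive reference helpers (no buffer reuse; the exact naming of the block that gets inverted depends on the sweep direction, so this is not literally my code).

// Block Thomas sweep for A_i X_{i-1} + B_i X_i + C_i X_{i+1} = F_i, with A_0 = C_{N-1} = 0.
// Recurrence X_i = -alpha_{i+1} X_{i+1} + beta_{i+1}; the (B_i - A_i*alpha_i)^-1 term is the bottleneck.
static double[][] SolveBlockTridiagonal(double[][,] A, double[][,] B, double[][,] C, double[][] F)
{
    int N = B.Length, M = B[0].GetLength(0);
    var alpha = new double[N + 1][,];
    var beta = new double[N + 1][];
    alpha[0] = new double[M, M];                    // zero block
    beta[0] = new double[M];                        // zero vector

    for (int i = 0; i < N; i++)                     // forward sweep
    {
        var inv = Invert(Sub(B[i], Mul(A[i], alpha[i])));   // O(M^3) inversion, done N times
        alpha[i + 1] = Mul(inv, C[i]);
        beta[i + 1] = MulVec(inv, SubVec(F[i], MulVec(A[i], beta[i])));
    }

    var X = new double[N][];                        // backward substitution
    X[N - 1] = beta[N];
    for (int i = N - 2; i >= 0; i--)
        X[i] = SubVec(beta[i + 1], MulVec(alpha[i + 1], X[i + 1]));
    return X;
}

// Gauss-Jordan inversion with partial pivoting, O(M^3), working on a scratch copy.
static double[,] Invert(double[,] S)
{
    int m = S.GetLength(0);
    var a = (double[,])S.Clone();
    var inv = new double[m, m];
    for (int i = 0; i < m; i++) inv[i, i] = 1.0;

    for (int col = 0; col < m; col++)
    {
        int piv = col;                              // partial pivoting
        for (int r = col + 1; r < m; r++)
            if (Math.Abs(a[r, col]) > Math.Abs(a[piv, col])) piv = r;
        for (int c = 0; c < m; c++)
        {
            double t = a[col, c]; a[col, c] = a[piv, c]; a[piv, c] = t;
            t = inv[col, c]; inv[col, c] = inv[piv, c]; inv[piv, c] = t;
        }

        double d = a[col, col];
        for (int c = 0; c < m; c++) { a[col, c] /= d; inv[col, c] /= d; }

        for (int r = 0; r < m; r++)
        {
            if (r == col) continue;
            double f = a[r, col];
            for (int c = 0; c < m; c++) { a[r, c] -= f * a[col, c]; inv[r, c] -= f * inv[col, c]; }
        }
    }
    return inv;
}

// Naive dense helpers for block-sized matrices and vectors.
static double[,] Mul(double[,] X, double[,] Y)
{
    int m = X.GetLength(0);
    var r = new double[m, m];
    for (int i = 0; i < m; i++)
        for (int k = 0; k < m; k++)
            for (int j = 0; j < m; j++)
                r[i, j] += X[i, k] * Y[k, j];
    return r;
}

static double[,] Sub(double[,] X, double[,] Y)
{
    int m = X.GetLength(0);
    var r = new double[m, m];
    for (int i = 0; i < m; i++)
        for (int j = 0; j < m; j++)
            r[i, j] = X[i, j] - Y[i, j];
    return r;
}

static double[] MulVec(double[,] X, double[] v)
{
    var r = new double[v.Length];
    for (int i = 0; i < v.Length; i++)
        for (int j = 0; j < v.Length; j++)
            r[i] += X[i, j] * v[j];
    return r;
}

static double[] SubVec(double[] x, double[] y)
{
    var r = new double[x.Length];
    for (int i = 0; i < x.Length; i++) r[i] = x[i] - y[i];
    return r;
}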
Trying to find information on how to optimize the inversion, I've found only threads about solving AX=B systems 'canonically', i.e. X = A^-1 B, with suggestions to use LU factorization instead. Sadly, as my inversion is part of the Thomas algorithm, if I resorted to LU factorization I would have to do it for an (M*N)x(M*N) matrix, which would raise the complexity of solving the linear system by an extra N^2, to O(N^3*M^3). That is a slowdown by a factor of 2500-10000, which is quite bad.
Approximate or iterative inversions are out of scope too, as the slightest residual relative to the exact inverse accumulates very quickly and causes the global iterative process to blow up.
I do the calculations in parallel with Parallel.For(), solving each of the 20-200 systems separately.
Right now, solving 20 such systems with N, M = 50 takes 872 ms on average (i7-3630QM, 2.4 GHz, 4 cores (8 with Hyper-Threading)).
And finally, here come the questions.
Am I correct in what I wrote here? Is there an algorithm that would significantly speed up the calculations over what they are now?
Inside the number-crunching part of my program I use only for loops (most of them with constant bounds, the exception being one of the loops inside the inversion algorithm), double arithmetic (+, -, *, /) and standard arrays ([], [,], [,,]). Would there be any speed-up if I rewrote this part as unsafe code? Or as a library in C?
How much overhead does C# add on such tasks (crunching arrays of doubles)? Are C compilers better at optimizing such simple code than the C# compiler/JIT?
What should I look at when optimizing a number cruncher in C#? Is C# suited to such a task at all?
Possible Duplicate:
I need a fast runtime expression parser
How do I make it so that when someone types x*y^z into a textbox on my page, I can evaluate that expression in the code-behind and get the result?
.NET does not have a built-in function for evaluating arbitrary strings. However, an open source .NET library named NCalc does.
NCalc is a mathematical expressions evaluator in .NET. NCalc can parse
any expression and evaluate the result, including static or dynamic
parameters and custom functions.
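A minimal usage sketch, assuming the NCalc package is referenced (x, y and z are just example parameter names, and Pow is used for the exponent):

var expression = new NCalc.Expression("x * Pow(y, z)");
expression.Parameters["x"] = 2.0;
expression.Parameters["y"] = 3.0;
expression.Parameters["z"] = 4.0;
Console.WriteLine(expression.Evaluate());   // 2 * 3^4 = 162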
Answer from the question "operators as strings" by user https://stackoverflow.com/users/1670022/matt-crouch, using built-in .NET functionality:
"If all you need is simple arithmetic, do this.
DataTable temp = new DataTable();
Console.WriteLine(temp.Compute("15 / 3",string.Empty));
EDIT: a little more information. Check out the MSDN documentation for the Expression property of the System.Data.DataColumn class. The section on "Expression Syntax" outlines a list of commands you can use in addition to the arithmetic operators (e.g. IIF, LEN, etc.).
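For example, the same expression syntax drives computed columns; a small sketch with made-up column names:

var orders = new System.Data.DataTable();
orders.Columns.Add("Price", typeof(double));
orders.Columns.Add("Quantity", typeof(int));
// DataColumn.Expression accepts the same syntax that DataTable.Compute does.
orders.Columns.Add("Total", typeof(double), "Price * Quantity");
orders.Rows.Add(2.5, 4);
Console.WriteLine(orders.Rows[0]["Total"]);   // 10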
EDIT 2: For convenience, you can put this into a little function like:
public string Eval(string expr)
{
    var temp = new System.Data.DataTable();
    string result = null;
    try
    {
        result = $"{temp.Compute(expr, string.Empty)}";
    }
    catch (System.Data.EvaluateException ex)
    {
        if (ex.Message.ToLower().Contains("cannot find column"))
            throw new System.Data.SyntaxErrorException($"Syntax error: Invalid expression: '{expr}'."
                + " Variables as operands are not supported.");
        else
            throw;
    }
    return result;
}
So you can use it like:
Console.WriteLine(Eval("15 * (3 + 5) / (7 - 2)"));
giving the expected output:
24
Note that the error handler helps to deal with exceptions caused by using variables, which are not allowed here. Example: Eval("a") - instead of returning "Cannot find column [a]", which doesn't make much sense in this context (we're not using it in a database context), it returns "Syntax error: Invalid expression: 'a'. Variables as operands are not supported."
Run it on DotNetFiddle
There are two main approaches to this problem, each with some variations, as illustrated in the variety of answers.
Option A: Find an existing mathematical expression evaluator
Option B: Write your own parser and the logic to compute the result
Before going into the details, it is appropriate to stress that interpreting arbitrary mathematical expressions is not a trivial task for any expression grammar other than "toy" grammars that only accept one or two arithmetic operations and do not allow parentheses, etc.
Understanding that such a task is only deceptively trivial, and acknowledging that interpreting arithmetic expressions of average complexity is a relatively common need for various applications [hence one for which mature solutions should be available], it is probably wise to try and make do with "Option A".
I'd therefore second Jed's recommendation of a ready-made expression evaluator such as NCalc.
It may be useful, however, to take the time to understand the various concepts and methods associated with parsing and interpreting arithmetic expressions, as if one were going to whip up one's own implementation.
The key concept is that of a formal grammar. The arithmetic expressions which the evaluator will accept must follow a set of rules, such as the list of arithmetic operations allowed. For example, will the evaluator support trigonometric functions, and if it does, will that also include, say, atan2()? The rules also indicate what constitutes an operand, for example whether numerical values as big as, say, 45 digits may be entered, etc. The point is that all these rules are formalized in a grammar.
Typically a grammar works on tokens which have previously been extracted from the raw input text. Essentially, at some point in the process, some logic needs to analyze the input string, character by character, and determine which sequences of characters go together. For example, in the expression 123 + 45 / 9.3, the tokens are the integer value 123, the plus operator, the integer value 45, the division operator and finally the real value 9.3. The task of identifying the tokens and associating them with a token type is the job of a lexer. Lexers can themselves be built on a grammar (a grammar whose "tokens" are single characters, as opposed to the grammar for the arithmetic expression parser, whose tokens are the short strings produced by the lexer).
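To make the lexing step concrete, a toy tokenizer along those lines might look like the following sketch (not production code; it only knows numbers, the four operators and parentheses):

using System;
using System.Collections.Generic;

// Toy lexer: turns "123 + 45 / 9.3" into the token sequence "123", "+", "45", "/", "9.3".
static IEnumerable<string> Tokenize(string input)
{
    int i = 0;
    while (i < input.Length)
    {
        char c = input[i];
        if (char.IsWhiteSpace(c)) { i++; continue; }

        if (char.IsDigit(c) || c == '.')
        {
            int start = i;                                  // a numeric literal token
            while (i < input.Length && (char.IsDigit(input[i]) || input[i] == '.')) i++;
            yield return input.Substring(start, i - start);
        }
        else if ("+-*/()".IndexOf(c) >= 0)
        {
            yield return c.ToString();                      // an operator or parenthesis token
            i++;
        }
        else
        {
            throw new ArgumentException($"Unexpected character '{c}' at position {i}");
        }
    }
}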
Incidentally, grammars are used to define many other things beyond arithmetic expressions. Computer languages follow [rather sophisticated] grammars, and it is relatively common to introduce Domain Specific Languages, aka DSLs, in support of various features of computer applications.
For very simple grammars, one may be able to write the corresponding lexer and parser from scratch. But sooner or later the grammar may get complicated to the point that hand-writing these modules becomes tedious, bug-prone and, maybe more importantly, difficult to read. Hence the existence of lexer and parser generators, which are stand-alone programs that produce the code of lexers and parsers (in a particular programming language such as C, Java or C#) from a list of rules (expressed in a syntax particular to the generator, though many generators tend to use similar syntaxes, loosely based on BNF).
When using such a lexer/parser generator, work is done in multiple steps:
- first one writes a definition of the grammar (in the generator-specific language/syntax)
- one runs this grammar through the generator
- one often repeats the above two steps several times, because writing a grammar is an exacting exercise: the generator will complain about the many possible ambiguities one may write into the grammar
- eventually the generator produces a source file (in the desired target language such as C#)
- this source is included in the overall project
- other source files in the project may invoke the functions exposed in the source files produced by the generator, and/or some logic corresponding to various patterns identified during parsing may readily be embedded in the generator-produced code
- the project can then be built as usual, i.e. as if the parser and lexer had been hand-written.
And that's about it for a 20,000-foot overview of the process of working with formal grammars and code generators.
A list of parser generators (aka compiler-compilers) can be found at this link. For simple work in C# I also want to mention Irony. It may be very insightful to peruse these sites to get a better feel for these concepts, even without the intent of becoming a practitioner at this time.
As said, I wish to stress that for this particular application, a ready-made arithmetic evaluator is likely the better approach. The main downsides of these would be:
- some limitations on the allowed expression syntax (either the grammar is too restrictive: you also need, say, stddev(); or it is too broad: you don't want your users to use trig functions). With the more mature evaluators, there will be some form of configuration/extension feature which allows dealing with this problem.
- the learning curve of such a 3rd-party module. Hopefully many of them are relatively "plug-and-play".
Solved with this library: http://www.codeproject.com/Articles/21137/Inside-the-Mathematical-Expressions-Evaluator
My final code:
Calculator Cal = new Calculator();
txt_LambdaNoot.Text = (Cal.Evaluate(txt_C.Text) / fo).ToString();
Now when someone types 3*10^11, they will get 300000000000.
You will need to implement (or find from a third-party source) an expression parser. This is not a trivial thing to do.
What you need - if you want to do it yourself - is a scanner (also known as a lexer) plus a parser in the code-behind which interprets the expression. Alternatively, you can find a 3rd-party library which does the job and works much like the JavaScript eval(string) function.
Please take a look here; it describes a recursive descent parser. The example is written in C, but you should be able to adapt it to C# easily once you have grasped the idea described in the article.
It is less complicated than it sounds, especially if you have a limited set of operators to support.
The advantage is that you keep full control over what expressions can be executed (preventing malicious code injection by the end users of your website).
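As a rough illustration of the recursive descent idea (a minimal sketch handling only +, -, *, / and parentheses, with no unary minus or error recovery):

// Minimal recursive-descent evaluator: expression := term (('+'|'-') term)*,
// term := factor (('*'|'/') factor)*, factor := number | '(' expression ')'.
class TinyParser
{
    private readonly string _s;
    private int _pos;

    public TinyParser(string s) { _s = s; }

    public double Parse()
    {
        double value = ParseExpression();
        SkipSpaces();
        if (_pos != _s.Length) throw new FormatException("Unexpected trailing input");
        return value;
    }

    private double ParseExpression()
    {
        double value = ParseTerm();
        while (true)
        {
            SkipSpaces();
            if (Match('+')) value += ParseTerm();
            else if (Match('-')) value -= ParseTerm();
            else return value;
        }
    }

    private double ParseTerm()
    {
        double value = ParseFactor();
        while (true)
        {
            SkipSpaces();
            if (Match('*')) value *= ParseFactor();
            else if (Match('/')) value /= ParseFactor();
            else return value;
        }
    }

    private double ParseFactor()
    {
        SkipSpaces();
        if (Match('('))
        {
            double value = ParseExpression();
            SkipSpaces();
            if (!Match(')')) throw new FormatException("Missing ')'");
            return value;
        }
        int start = _pos;
        while (_pos < _s.Length && (char.IsDigit(_s[_pos]) || _s[_pos] == '.')) _pos++;
        return double.Parse(_s.Substring(start, _pos - start),
                            System.Globalization.CultureInfo.InvariantCulture);
    }

    private void SkipSpaces() { while (_pos < _s.Length && char.IsWhiteSpace(_s[_pos])) _pos++; }
    private bool Match(char c) { if (_pos < _s.Length && _s[_pos] == c) { _pos++; return true; } return false; }
}

// new TinyParser("15 * (3 + 5) / (7 - 2)").Parse() returns 24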
I'd like your advice: could you recommend a library that allows you to add/subtract/multiply/divide PDFs (Probability Density Functions) like real numbers?
Behind the scenes it would have to run a Monte Carlo simulation to work out the result, so I'd probably prefer something fast and efficient that can take advantage of any GPU in the system.
Update:
This is the sort of C# code I am looking for:
var a = new Normal(0.0, 1.0); // Creates a PDF with mean=0, std. dev=1.0.
var b = new Normal(0.0, 2.0); // Creates a PDF with mean=0, std. dev=2.0.
var x = a + b; // Creates a PDF which is the sum of a and b.
// i.e. perform a Monte Carlo by taking thousands of samples
// of a and b to construct the resultant PDF.
Update:
What I'm looking for is a method to implement the algebra on "probability shapes" in The Flaw of Averages by Sam Savage. The video Monte Carlo Simulation in Matlab explains the effect I want - a library to perform math on a series of input distributions.
Update:
Searching for the following will produce info on the appropriate libraries:
"monte carlo library"
"monte carlo C++"
"monte carlo Matlab"
"monte carlo .NET"
The @RISK Developer Kit allows you to start with a set of probability density functions, then perform algebra on the inputs to get some output, i.e. P = A + B.
The keywords on this page can be used to find other competing offerings, e.g. try searching for:
"monte carlo simulation model C++"
"monte carlo simulation model .NET"
"risk analysis toolkit"
"distributing fitting capabilties".
It's not all that difficult to code this up yourself in a language such as C++ or .NET. The Monte Carlo portion is probably only about 50 lines of code:
Read "The Flaw Of Averages" by Sam Savage to understand how you can use algebra on "probability shapes".
Have some method of generating a "probability shape", either by bootstrapping from some sampled data, or from a pre-determined probability density function, or by using the Math.NET probability library.
Take 10000 samples from the input probability shapes.
Do the algebra on the samples, i.e. +, -, /, *, etc., to get 10000 outputs (see the sketch after this list). You can also form a probability tree that combines the inputs with and, or, etc.
Combine these 10000 outputs into a new "probability shape" by putting the results into 100 discrete "buckets".
Now that we have a new "probability shape", we can then use that as the input into a new probability tree, or perform an integration to get the area, which converts it back into a hard probability number given some threshold.
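A bare-bones sketch of steps 3-5 above, assuming Math.NET's Normal class for the inputs (sample and bucket counts chosen arbitrarily):

using System;
using System.Linq;
using MathNet.Numerics.Distributions;   // the Math.NET probability library mentioned above

var a = new Normal(0.0, 1.0);
var b = new Normal(0.0, 2.0);

const int sampleCount = 10000, bucketCount = 100;
var sums = new double[sampleCount];
for (int i = 0; i < sampleCount; i++)
    sums[i] = a.Sample() + b.Sample();                  // sample the inputs and do the algebra

// Bucket the outputs into a discrete "probability shape".
double min = sums.Min(), max = sums.Max();
var histogram = new int[bucketCount];
foreach (double s in sums)
    histogram[(int)((s - min) / (max - min) * (bucketCount - 1))]++;

// histogram[k] / (double)sampleCount approximates the probability mass of bucket k.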
The video Monte Carlo Simulation in Matlab explains this entire process much better than I can.
@Gravitas - based on that exchange with @user207442, it sounds like you just want an object that abstracts away a convolution for addition and subtraction. There is certainly a closed-form solution for the product of two random variables, but it might depend on the distribution.
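For what it's worth, for the independent normals in the question's example the sum even has a closed form (means add, variances add), so no sampling is needed; a one-line sketch using Math.NET's Normal class:

var a = new MathNet.Numerics.Distributions.Normal(0.0, 1.0);
var b = new MathNet.Numerics.Distributions.Normal(0.0, 2.0);
// Sum of independent normals: add the means, add the variances.
var sum = new MathNet.Numerics.Distributions.Normal(a.Mean + b.Mean, Math.Sqrt(a.Variance + b.Variance));
// sum is Normal(0, sqrt(5)) ≈ Normal(0, 2.236)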
C#'s hot new step-sister, F#, lets you use some fun FP techniques, and it integrates seamlessly with C#. Your goal of abstracting out a "random variable" type that can be "summed" (convolved) or "multiplied" (??) seems like it is screaming for a monad. Here is a simple example.
Edit: do you need to reinvent MCMC in C#? We use WinBUGS for this at my school... this is the C++ library WinBUGS uses: http://darwin.eeb.uconn.edu/mcmc++/mcmc++.html. Rather than reinventing the wheel, could you just wrap your code around the C++ (again, it seems like monads would come in handy here)?
Take a look at the Math.NET Numerics library. Here is the page specific to probability distribution support.