I started using the Math.NET Numerics library, and I need it to calculate the largest eigenvalues and their corresponding eigenvectors of my adjacency matrix.
With a large number of points, my adjacency matrix gets quite big (e.g. 5782x5782 entries).
Most of the entries are '0', so I thought I could use a SparseMatrix. But when I use it, the computation still takes ages - in fact, I never waited long enough for it to finish.
I tried the whole thing in MATLAB and there was no problem at all: MATLAB solved it within a few seconds.
Do you have any suggestions for me?
Here is what I'm doing:
// initialize matrix and fill it with zeros
Matrix<double> A = SparseMatrix.Create(count, count, 0);
... fill matrix with values ...
// get eigenvalues and eigenvectors / this part takes centuries =)
Evd<double> eigen = A.Evd(Symmetricity.Symmetric);
Vector<Complex> eigenvalues = eigen.EigenValues;
Math.NET Numerics' implementation is purely C#-based, so performance may not be on par with tools such as MATLAB, which rely on native, highly optimized BLAS/LAPACK libraries for their numerical computations.
You may want to use the native wrappers that come with Math.NET, which leverage highly optimized linear algebra libraries (such as Intel's MKL or AMD's ACML). There is a guide on this MSDN page that explains how to build Math.NET with ACML support (look under "Compiling and Using AMD ACML in Math.NET Numerics").
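For example, once a native provider package is installed (e.g. the MathNet.Numerics.MKL.Win NuGet package; the exact package depends on your platform), switching providers is a one-line call at startup. A minimal sketch:
using MathNet.Numerics;

// Switch the linear algebra provider to native MKL; requires the
// native provider binaries to be present next to the application.
Control.UseNativeMKL();

// ... run the Evd as before. To revert to the managed provider:
// Control.UseManaged();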
I'm trying to reproduce in C# the same curve fitting (called "trending") that Excel offers: Exponential, Linear, Logarithmic, Polynomial and Power.
I found Linear and Polynomial:
Tuple<double, double> line = Fit.Line(xdata, ydata);
double[] poly2 = Fit.Polynomial(xdata, ydata, 2);
I also found an Exponential fit.
But I wonder how to do curve fitting for Power. Does anybody have an idea?
I should be able to get both constants shown in the Excel screenshot formula:
power (the exponent)
multiplier (the coefficient before x)
Before anybody becomes the fifth to vote to close this question...
I asked it directly on the mathdotnet forum (which I recently discovered). Christoph Ruegg, the main developer of the library, gave me an excellent answer that I want to share, to help others with the same problem:
Assuming with power you're referring to a target function along the lines of y : x -> a*x^b, then this is a simpler version of what I've described in Linearizing non-linear models by transformation.
This seems to be used often enough, so I've started to add a new Fit.Power and Fit.Exponential locally for this case - not pushed yet since it first needs more testing, but I expect it to be part of v4.1.
Alternatively, by now we also support non-linear optimization, which could also be used for use cases like this (FindMinimum module).
Link to my question: mathdotnet - Curve fitting: Power
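For reference, here is what the transformation approach from the answer could look like in C# (a minimal sketch, assuming xdata and ydata are the arrays from the question and contain strictly positive values): y = a*x^b is fitted by applying Fit.Line to the log-log transformed data.
using System;
using System.Linq;
using MathNet.Numerics;

// Fit y = a * x^b by linearizing: ln(y) = ln(a) + b * ln(x).
double[] lnX = xdata.Select(Math.Log).ToArray();
double[] lnY = ydata.Select(Math.Log).ToArray();

Tuple<double, double> p = Fit.Line(lnX, lnY); // Item1 = intercept, Item2 = slope
double a = Math.Exp(p.Item1); // multiplier
double b = p.Item2;           // power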
Having a matrix like this in C#
double[,] M
I would like to get the same fast manipulation of its content that MATLAB provides. In particular, consider this MATLAB code:
for i = 1:N
    M(i, 1:i) = 1;
end
I would like to have its equivalent in C# without a second loop. I'm not sure about this, but as far as I know, MATLAB uses a process called vectorization for the line M(i, 1:i) = 1, which is faster than implementing an inner loop from 1 to i that sets each cell to 1. Maybe I'm wrong; please correct me.
So how can I achieve fast manipulation of matrices in C#, like MATLAB does?
A common solution is to use a matrix library such as Math.NET Numerics for matrix operations.
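For illustration, a minimal sketch of the MATLAB loop above using Math.NET Numerics (N is assumed; note the 0-based indices) - the whole row segment is written in one call instead of an inner loop:
using MathNet.Numerics.LinearAlgebra;

int N = 1000;
Matrix<double> M = Matrix<double>.Build.Dense(N, N);

for (int i = 0; i < N; i++)
{
    // Equivalent of MATLAB's M(i, 1:i) = 1 (0-based here):
    // write a 1 x (i+1) block of ones into row i in a single call.
    M.SetSubMatrix(i, 0, Matrix<double>.Build.Dense(1, i + 1, 1.0));
}
Keep in mind that this is mainly a convenience: MATLAB's speed comes from optimized native kernels, and a plain inner loop over a double[,] in C# is often just as fast once JIT-compiled.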
I understand that some matrices have a lot of data, while others have mainly 0's or are empty. But what is the advantage of creating a SparseMatrix object to hold a sparsely populated matrix over creating a DenseMatrix object to hold a sparsely populated matrix? They both seem to offer more or less the same operations as far as methods go.
I'm also wondering when you would use a plain Matrix object to hold data - are there any advantages or situations where it would be preferred over the other two?
For small matrices (e.g. less than 1000x1000), dense matrices work well. But in practice there are many problems that require much larger matrices in which almost all values are zero (often with the non-zero values close to the diagonal). Sparse matrices make it possible to handle very large matrices in cases where the dense structure is unfeasible, because it would need too much memory or would be far too expensive to compute with, CPU-time wise. For example, a dense 100000x100000 matrix of doubles would already need 100000^2 * 8 bytes ≈ 80 GB of memory, while a sparse matrix only stores its non-zero entries.
Note that as of today the Math.NET Numerics direct matrix decomposition methods are optimized for dense matrices only; use iterative solvers for sparse data instead.
Regarding types, in Math.NET Numerics v3 the hierarchy for double-valued matrices is as follows:
Matrix<double>
|- Double.Matrix
   |- Double.DenseMatrix
   |- Double.SparseMatrix
   |- Double.DiagonalMatrix
With Matrix<T> I refer to the full type MathNet.Numerics.LinearAlgebra.Matrix<T>, with
Double.Matrix to MathNet.Numerics.LinearAlgebra.Double.Matrix, etc.
Matrix<double>: always declare all variables, properties and arguments using this generic type only; in most cases this is the only type needed in user code.
Double.Matrix: do not use.
Double.DenseMatrix: use for creating a dense matrix only, if you do not wish to use the builder (Matrix<double>.Build.Dense...); see the sketch after this list.
Double.SparseMatrix: use for creating a sparse matrix only, if you do not wish to use the builder.
Double.DiagonalMatrix: use for creating a diagonal matrix only, if you do not wish to use the builder.
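A minimal sketch of both construction styles (sizes are arbitrary):
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.LinearAlgebra.Double;

// Preferred: build via the generic builder, declared as Matrix<double>.
Matrix<double> dense  = Matrix<double>.Build.Dense(100, 100);    // zero-filled
Matrix<double> sparse = Matrix<double>.Build.Sparse(5000, 5000); // no storage for zeros

// Alternative: use the concrete types for construction only.
Matrix<double> dense2  = new DenseMatrix(100, 100);
Matrix<double> sparse2 = new SparseMatrix(5000, 5000);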
Each of them is optimized for its specific use. For example, SparseMatrix uses the CSR format:
Compressed sparse row (CSR or CRS)
CSR is effectively identical to the Yale Sparse Matrix format, except that the column array is normally stored ahead of the row index array. I.e. CSR is (val, col_ind, row_ptr), where val is an array of the (left-to-right, then top-to-bottom) non-zero values of the matrix; col_ind holds the column indices corresponding to the values; and row_ptr is the list of value indexes where each row starts. The name is based on the fact that row index information is compressed relative to the COO format. One typically uses another format (LIL, DOK, COO) for construction. This format is efficient for arithmetic operations, row slicing, and matrix-vector products. See scipy.sparse.csr_matrix.
See the Wikipedia article on sparse matrices for more info.
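To make the format concrete, here is a small hand-written illustration in C# (this is the textbook layout of the three arrays, not Math.NET's internal API):
// Illustrative CSR encoding of a 4x4 matrix:
//   0 0 0 0
//   5 8 0 0
//   0 0 3 0
//   0 6 0 0
double[] val    = { 5, 8, 3, 6 };     // non-zero values, left-to-right, top-to-bottom
int[]    colInd = { 0, 1, 2, 1 };     // column index of each value
int[]    rowPtr = { 0, 0, 2, 3, 4 };  // index into val where each row starts

// Matrix-vector product y = A*x straight off the CSR arrays:
double[] x = { 1, 2, 3, 4 };
double[] y = new double[4];
for (int row = 0; row < 4; row++)
    for (int k = rowPtr[row]; k < rowPtr[row + 1]; k++)
        y[row] += val[k] * x[colInd[k]];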
Long story short: I have to solve 20..200 block-tridiagonal linear systems during an iterative process. The systems have 50..100 blocks, each block being 50..100 x 50..100. I will write down my thoughts on it here, and I ask you to share your opinion on them, as it is possible that I am mistaken in one regard or another.
To solve those systems, I use a block version of the Thomas algorithm. It is exactly like the scalar one, except that instead of scalar coefficients I have matrices (i.e. instead of a_i x_{i-1} + b_i x_i + c_i x_{i+1} = f_i I have A_i X_{i-1} + B_i X_i + C_i X_{i+1} = F_i, where A_i, B_i, C_i are matrices and F_i, X_i are vectors).
The asymptotic complexity of this algorithm is O(N*M^3), where N is the size of the overall matrix in blocks, and M is the size of each block.
Right now my bottleneck is the inversion operation. Deep inside nested loops I have to calculate a lot of inversions of the form (c_i - a_i * alpha_i)^-1, where alpha_i is a dense MxM matrix. I do this with the Gauss-Jordan algorithm, using additional memory (which I will need later in the program anyway) and O(M^3) operations.
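For what it's worth, that inversion site can also be expressed as an M x M LU factorization followed by solves, at the same O(M^3) per-block cost but without materializing the inverse. A minimal sketch with Math.NET Numerics (the names are hypothetical, and whether this beats a hand-rolled Gauss-Jordan needs measuring):
using MathNet.Numerics.LinearAlgebra;

// One inversion site of the sweep: apply (c - a*alpha)^-1 to a
// right-hand side. All blocks are dense M x M matrices.
static Matrix<double> ApplyInverse(Matrix<double> c, Matrix<double> a,
                                   Matrix<double> alpha, Matrix<double> rhs)
{
    var lu = (c - a * alpha).LU(); // factor once, O(M^3)
    return lu.Solve(rhs);          // multi-RHS solve, same as inverse * rhs
}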
Trying to find information on how to optimize the inversion operation, I have found only threads about solving AX=B systems "canonically", i.e. X = A^-1 B, with suggestions to use LU factorization instead. Sadly, as my inversion is part of the Thomas algorithm, if I resort to LU factorization of the full system, I will have to do it for an M*N x M*N matrix, which will raise the complexity of solving the linear system by an extra factor of N^2, to O(N^3*M^3). That is a slowdown by a factor of 2500..10000, which is quite bad.
Approximate or iterative inversions are out of scope, too, as the slightest residual relative to the exact inverse will accumulate very quickly and cause the global iterative process to blow up.
I do the calculations in parallel with Parallel.For(), solving each of the 20..200 systems separately.
Right now, solving 20 such systems with N,M = 50 takes 872 ms on average (i7-3630QM, 2.4 GHz, 4 cores, 8 with Hyper-Threading).
And finally, here come the questions.
Am I correct in what I wrote here? Is there an algorithm that would significantly speed up the calculations over what they are now?
In the number-crunching part of my program I use only for loops (most of them with constant bounds, the exception being one of the loops inside the inversion algorithm), double arithmetic (+, -, *, /) and plain arrays ([], [,], [,,]). Would there be any speed-up if I rewrote this part using unsafe code? Or as a library in C?
How much overhead does C# add on such tasks (grinding through double arrays)? Are C compilers better at optimizing such simple code than the C# compiler and JIT?
What should I look at when optimizing a number-cruncher in C#? Is C# suited for this kind of task at all?
I'd like your advice: can you recommend a library that lets you add/subtract/multiply/divide PDFs (probability density functions) as if they were real numbers?
Behind the scenes it would have to run a Monte Carlo simulation to work out the result, so I'd prefer something fast and efficient that can take advantage of any GPU in the system.
Update:
This is the sort of C# code I am looking for:
var a = new Normal(0.0, 1.0); // Creates a PDF with mean=0, std. dev=1.0.
var b = new Normal(0.0, 2.0); // Creates a PDF with mean=0, std. dev=2.0.
var x = a + b; // Creates a PDF which is the sum of a and b.
// i.e. perform a Monte Carlo by taking thousands of samples
// of a and b to construct the resultant PDF.
Update:
What I'm looking for is a method to implement the algebra on "probability shapes" in The Flaw of Averages by Sam Savage. The video Monte Carlo Simulation in Matlab explains the effect I want - a library to perform math on a series of input distributions.
Update:
Searching for the following will produce info on the appropriate libraries:
"monte carlo library"
"monte carlo C++"
"monte carlo Matlab"
"monte carlo .NET"
The @RISK Developer Kit allows you to start with a set of probability density functions and then perform algebra on the inputs to get some output, i.e. P = A + B.
The keywords on this page can be used to find other competing offerings, e.g. try searching for:
"monte carlo simulation model C++"
"monte carlo simulation model .NET"
"risk analysis toolkit"
"distributing fitting capabilties".
It's not all that difficult to code this up yourself in a language such as C++ or C#/.NET. The Monte Carlo portion is probably only about 50 lines of code:
Read "The Flaw Of Averages" by Sam Savage to understand how you can use algebra on "probability shapes".
Have some method of generating a "probability shape", either by bootstrapping from some sampled data, or from a pre-determined probability density function, or by using the Math.NET probability library.
Take 10000 samples from the input probability shapes.
Do the algebra on the samples, i.e. +, -, /, *, etc., to get 10000 outputs. You can also form a probability tree, which implies and, or, etc. on the inputs.
Combine these 10000 outputs into a new "probability shape" by putting the results into 100 discrete "buckets".
Now that we have a new "probability shape", we can then use that as the input into a new probability tree, or perform an integration to get the area, which converts it back into a hard probability number given some threshold.
The video Monte Carlo Simulation in Matlab explains this entire process much better than I can.
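To make these steps concrete, here is a minimal C# sketch of steps 2-5 using Math.NET's distributions and its Histogram class (the sample count, bucket count and threshold are arbitrary):
using System;
using System.Linq;
using MathNet.Numerics.Distributions;
using MathNet.Numerics.Statistics;

// Two input "probability shapes" and 10000 samples of each.
var a = new Normal(0.0, 1.0);
var b = new Normal(0.0, 2.0);

const int n = 10000;
double[] outputs = new double[n];
for (int i = 0; i < n; i++)
    outputs[i] = a.Sample() + b.Sample(); // the algebra, here "+"

// Bucket the outputs into a new "probability shape" with 100 buckets.
var shape = new Histogram(outputs, 100);

// Convert back to a hard probability for a given threshold (here 2.0).
double pAbove = outputs.Count(s => s > 2.0) / (double)n;
Console.WriteLine($"P(a + b > 2.0) ≈ {pAbove}");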
@Gravitas - Based on that exchange with @user207442, it sounds like you just want an object that abstracts away a convolution for addition and subtraction. There is certainly a closed-form solution for the product of two random variables, but it might depend on the distribution.
C#'s hot new step-sister, F#, lets you use some fun FP techniques, and it integrates seamlessly with C#. Your goal of abstracting out a "random variable" type that can be "summed" (convolved) or "multiplied" (??) seems like it is screaming for a monad. Here is a simple example.
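In C# the same idea can be sketched with operator overloading over a sampling function (a toy type invented here purely for illustration, not a library API):
using System;

// A "random variable" is just a sampling function; the operators combine
// samples pair-wise, which is what a Monte Carlo convolution boils down to.
public sealed class RandomVar
{
    private readonly Func<Random, double> _sample;
    public RandomVar(Func<Random, double> sample) { _sample = sample; }

    public double Sample(Random rng) => _sample(rng);

    public static RandomVar operator +(RandomVar a, RandomVar b) =>
        new RandomVar(rng => a.Sample(rng) + b.Sample(rng));

    public static RandomVar operator *(RandomVar a, RandomVar b) =>
        new RandomVar(rng => a.Sample(rng) * b.Sample(rng));
}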
Edit: Do you need to reinvent MCMC in C#? We use WinBUGS for this at my school... this is the C++ library WinBUGS uses: http://darwin.eeb.uconn.edu/mcmc++/mcmc++.html. Rather than reinventing the wheel, could you just wrap your code around the C++ library (again, it seems like monads would come in handy here)?
Take a look at the Math.NET Numerics library. Here is the page specific to probability distribution support.