This will be a long question, sorry in advance. I don't expect a full code solution; I am looking for some input from people with a different perspective and more experience than me.
My company is developing software for a product that does some rather expensive calculations on a film from an IR camera, where every pixel contains a temperature value. The most costly of those methods is called Thermal Signal Reconstruction (if you are interested, you can read about it here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321698/ ). It basically performs a polynomial fit for each pixel over time (the number of frames). My C# implementation looks something like this:
public static double[,,] ThermalSignalReconstruction(List<Frame<double>> thermalFilm, byte polyOrder)
{
    Resolution filmResolution = thermalFilm[0].Resolution;
    uint width = filmResolution.Width;
    uint height = filmResolution.Height;
    int frames = thermalFilm.Count;
    double[,,] result = new double[polyOrder + 1, height, width];

    // using frame indexes as x-values for poly fit
    List<double> frameIndexes = new List<double>(frames);
    for (var frame = 0U; frame < frames; ++frame)
        frameIndexes.Add(frame);

    // make local copy of thermal film and fill with difference images
    List<Frame<double>> localThermalFilm = new List<Frame<double>>(frames);
    for (var frame = 0U; frame < frames; ++frame)
        localThermalFilm.Add(new Frame<double>(filmResolution));
    Parallel.For(0U, frames, frame =>
    {
        for (var row = 0U; row < height; ++row)
            for (var col = 0U; col < width; ++col)
                localThermalFilm[(int)frame].Data[row, col] = thermalFilm[(int)frame].Data[row, col] - thermalFilm[0].Data[row, col];
    });

    // determine flashpoint by finding the frame with the maximum average pixel value;
    // the shared maximum must be guarded by a lock, otherwise parallel iterations race on it
    double maxAverage = double.MinValue;
    uint maxIndex = 0U;
    object maxAverageLock = new object();
    Parallel.For(0U, frames, frame =>
    {
        double average = Math.MatrixMean(localThermalFilm[(int)frame].Data);
        lock (maxAverageLock)
        {
            if (average > maxAverage)
            {
                maxAverage = average;
                maxIndex = (uint)frame;
            }
        }
    });

    // remove frames preceding the flashpoint, including itself, from the film
    localThermalFilm.RemoveRange(0, (int)maxIndex + 1);
    frameIndexes.RemoveRange(0, (int)maxIndex + 1);
    frames -= (int)maxIndex + 1;

    // calculate base-10 logarithm of all pixels and frame indexes
    Parallel.For(0U, frames, frame =>
    {
        for (var row = 0U; row < height; ++row)
            for (var col = 0U; col < width; ++col)
                localThermalFilm[(int)frame].Data[row, col] = System.Math.Log10(localThermalFilm[(int)frame].Data[row, col]);
        frameIndexes[(int)frame] = System.Math.Log10(frameIndexes[(int)frame]);
    });

    // perform polynomial fit for each pixel
    Parallel.For(0U, height, row =>
    {
        for (var col = 0U; col < width; ++col)
        {
            // extract poly fit input y-values for current pixel
            double[] pixelValues = new double[frames];
            for (var frame = 0U; frame < frames; ++frame)
                pixelValues[frame] = localThermalFilm[(int)frame].Data[row, col];
            // (...) do some value validations, producing frameIndexesValidated and pixelValuesValidated (elided here)
            // poly fit for current pixel - this is the longest step
            double[] coefficients = Math.PolynomialRegression(frameIndexesValidated.ToArray(), pixelValuesValidated.ToArray(), polyOrder);
            // insert into coefficient images result array
            for (var coefficient = 0U; coefficient < result.GetLength(0); ++coefficient)
                result[coefficient, row, col] = coefficients[coefficient];
        }
    });
    return result;
}
As you can see, several parallelized loops performing several operations on the frames are executed in sequence, with the polynomial fit (Math.PolynomialRegression) being the last and most expensive one. This is a polynomial fit function I pieced together myself, since one doesn't exist in the standard System.Math library, and the only other one I tried, from the Math.NET library, actually runs slower than the one I wrote. My code is based on the examples given on Rosetta Code: https://rosettacode.org/wiki/Polynomial_regression
My point is that I wrote this entire algorithm in unmanaged C++ before; our company decided to move away from that because of licensing issues with the GUI framework we were using back then, and to use C#/.NET instead. A direct comparison of the old unmanaged C++ code with the managed C# code posted above recently showed me that the C# code takes about 48% (!!!) longer to execute, even though the algorithm is identical. I am aware that C# is a higher-level, managed language and therefore further removed from the machine than C++, so I fully expected it to run slower, but I didn't expect it to be this bad. 48% is quite a big deal, which leads me to believe that I may be doing something wrong. On the other hand, I don't have that much experience yet, so to be perfectly honest I also don't really know what to expect in a situation like this.
What I have tried so far:
switching between running the various individual loops sequentially and in parallel; it is fastest with all of them parallelized, as above
adjusting the variables accessed by the individual parallel loop instances (e.g. not accessing the same resolution object each time, but declaring separate variables for width and height before starting the loop), which already improved performance quite a bit, but the 48% remained even after that
trying a Parallel.ForEach(Partitioner.Create(0, frames) ...) approach, i.e. partitioning the chunks of data more coarsely with the Partitioner class; it didn't help and made the code run slower
optimizing other functions that are called, as well as code on the caller's side, as best I can
To home in on the question: is it even possible to make C# code like this run with performance comparable to the same code in C++, and if so, how? Or is what I observed perfectly normal, and do I just have to live with it?
EDIT: I added the first three loop bodies in TSR per request. My polynomial regression implementation looks like this:
public static double[] PolynomialRegression(in double[] xValues, in double[] yValues, byte order)
{
    Debug.Assert(xValues != null && yValues != null);
    Debug.Assert(xValues.Length == yValues.Length);
    Debug.Assert(xValues.Length != 0 || yValues.Length != 0);
    int dataSamples = xValues.Length;
    double[] result = new double[order + 1];

    // array containing N, sigma(xi), sigma(xi^2), sigma(xi^3) ... sigma(xi^(2*poly_order)), where N = number of samples
    double[] sigmaX = new double[2 * order + 1];
    for (var index = 0U; index < sigmaX.Length; ++index)
    {
        sigmaX[index] = 0.0;
        for (var dataPoint = 0U; dataPoint < dataSamples; ++dataPoint)
            sigmaX[index] += System.Math.Pow(xValues[(int)dataPoint], index);
    }

    // array containing sigma(yi), sigma(xi*yi), sigma(xi^2*yi) ... sigma(xi^poly_order*yi)
    double[] sigmaY = new double[order + 1];
    for (var pOrder = 0U; pOrder < sigmaY.Length; ++pOrder)
    {
        sigmaY[pOrder] = 0.0;
        for (var dataPoint = 0U; dataPoint < dataSamples; ++dataPoint)
            sigmaY[pOrder] += System.Math.Pow(xValues[(int)dataPoint], pOrder) * yValues[(int)dataPoint];
    }

    // equation system's augmented normal matrix
    int matrixRows = order + 1;
    int matrixCols = order + 2;
    double[,] matrix = new double[matrixRows, matrixCols];
    for (var row = 0U; row < matrixRows; ++row)
        for (var col = 0U; col < matrixCols - 1; ++col)
            matrix[row, col] = sigmaX[row + col];
    for (var row = 0U; row < matrixRows; ++row)
        matrix[row, order + 1] = sigmaY[row];

    // pivotisation of matrix
    for (var pivotRow = 0U; pivotRow < matrixRows; ++pivotRow)
        for (var lowerRow = pivotRow + 1U; lowerRow < matrixRows; ++lowerRow)
            if (matrix[pivotRow, pivotRow] < matrix[lowerRow, pivotRow])
                for (var col = 0U; col < matrixCols; ++col)
                {
                    double temp = matrix[pivotRow, col];
                    matrix[pivotRow, col] = matrix[lowerRow, col];
                    matrix[lowerRow, col] = temp;
                }

    // Gaussian elimination
    for (var pivotRow = 0U; pivotRow < matrixRows; ++pivotRow)
        for (var lowerRow = pivotRow + 1U; lowerRow < matrixRows; ++lowerRow)
        {
            double ratio = matrix[lowerRow, pivotRow] / matrix[pivotRow, pivotRow];
            for (var col = 0U; col < matrixCols; ++col)
                matrix[lowerRow, col] -= ratio * matrix[pivotRow, col];
        }

    // back-substitution
    for (var row = (short)order; row >= 0; --row)
    {
        result[row] = matrix[row, order + 1];
        for (var col = 0U; col < matrixCols - 1; ++col)
            if (col != row)
                result[row] -= matrix[row, col] * result[col];
        result[row] /= matrix[row, row];
    }
    return result;
}
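As a quick sanity check of this method, fitting a first-order polynomial to points on a known line should recover its coefficients. A hypothetical usage example (not part of the production code):
double[] x = { 0.0, 1.0, 2.0, 3.0 };
double[] y = { 1.0, 3.0, 5.0, 7.0 }; // points on y = 2x + 1
double[] c = Math.PolynomialRegression(x, y, 1);
// expected: c[0] close to 1.0 (intercept), c[1] close to 2.0 (slope)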
Thank you to everyone who commented. I have tried some of the suggestions: rewriting the PolynomialRegression method using fixed pointers, which did not have any effect. I also combined some of the loops in the TSR method, so I now have just two parallel loops running in sequence (the first one is definitely needed to find the flash point); this helped, but only a little (something like 1m21s instead of 1m26s).
Then I analyzed the code a little more with the VS CPU profiler. 85% of all CPU cycles within the TSR method were spent in the PolynomialRegression method; as expected, this one does the bulk of the work. What surprised me was inside the polynomial regression: the System.Math.Pow method is actually a huge bottleneck.
the first call, sigmaX[index] += System.Math.Pow(xValues[(int)dataPoint], index); within the first loop, owned about 55% of the CPU cycles
the second call, in the second loop, sigmaY[pOrder] += System.Math.Pow(xValues[(int)dataPoint], pOrder) * yValues[(int)dataPoint]; owned about 26%
the other steps, even the big matrix pivotisation and Gaussian elimination steps, were almost negligible by comparison.
I gathered this is because System.Math.Pow is the general implementation of exponentiation and thus includes all sorts of checks for negative and fractional exponents. Since in my current problem I only ever have positive integer exponents, I wrote my own specialized method instead:
public static double UIntPow(double @base, uint power)
{
    if (power == 0)
        return 1.0;
    else if (power == 1)
        return @base;
    else
        return @base * UIntPow(@base, power - 1);
}
While this recursive method runs extremely slowly in debug mode (about twice as slow as System.Math.Pow), it is actually very fast in the release build, where the code gets optimized. The execution of TSR now actually runs faster than the C++ equivalent, although I assume I could have gotten the same performance improvement there if I had also used my own UIntPow method.
Again, thanks to everyone who took the time to look at my problem; maybe this solution helps someone in the future.
EDIT: Thanks again for the input! This algorithm (exponentiation by squaring) runs even faster than my recursive attempt:
public static double UIntPow(double @base, uint power)
{
    double result = 1.0;
    while (power != 0)
    {
        if ((power & 1) == 1)
            result *= @base;
        @base *= @base;
        power >>= 1;
    }
    return result;
}
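For illustration, the sigmaX accumulation in PolynomialRegression could then drop System.Math.Pow entirely (a sketch based on the loop shown earlier; the sigmaY loop changes the same way):
// same sigmaX accumulation as above, with UIntPow replacing System.Math.Pow
for (var index = 0U; index < sigmaX.Length; ++index)
{
    sigmaX[index] = 0.0;
    for (var dataPoint = 0; dataPoint < dataSamples; ++dataPoint)
        sigmaX[index] += UIntPow(xValues[dataPoint], index);
}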
Related
I am trying to figure out if this really is the fastest approach. I want this to be as fast as possible and cache-friendly, and to have good time complexity.
DEMO: https://dotnetfiddle.net/BUGz8s
private static void InvokeMe()
{
    int hz = horizontal.GetLength(0) * horizontal.GetLength(1);
    int vr = vertical.GetLength(0) * vertical.GetLength(1);
    int hzcol = horizontal.GetLength(1);
    int vrcol = vertical.GetLength(1);
    // determine true position from horizontal information:
    for (int i = 0; i < hz; i++)
    {
        if (horizontal[i / hzcol, i % hzcol] == true)
            System.Console.WriteLine("True, on position: {0},{1}", i / hzcol, i % hzcol);
    }
    // determine true position from vertical information:
    for (int i = 0; i < vr; i++)
    {
        if (vertical[i / vrcol, i % vrcol] == true)
            System.Console.WriteLine("True, on position: {0},{1}", i / vrcol, i % vrcol);
    }
}
Pages I read:
Is there a "faster" way to iterate through a two-dimensional array than using nested for loops?
Fastest way to loop through a 2d array?
Time Complexity of a nested for loop that parses a matrix
Determining the big-O runtimes of these different loops?
EDIT: The code example is now closer to what I am dealing with. It's about determining a true point (x, y) in an N*N grid. The information available is the horizontal and vertical 2D arrays.
To avoid confusion: imagine that, over time, some positions in vertical or horizontal get set to true. This currently works perfectly well. All I am asking about is the current approach of using one for-loop per 2D array, as shown, instead of two nested loops per 2D array.
The time complexity of the single-loop and the nested-loop approaches is the same, O(row * col) (which is O(n^2) for row == col, as in your example), so any difference in execution time comes from the constant cost of the operations (since the traversal direction is the same). You can use BenchmarkDotNet to measure it. The following benchmark:
[SimpleJob]
public class Loops
{
    int[,] matrix = new int[10, 10];

    [Benchmark]
    public void NestedLoops()
    {
        int row = matrix.GetLength(0);
        int col = matrix.GetLength(1);
        for (int i = 0; i < row; i++)
            for (int j = 0; j < col; j++)
            {
                matrix[i, j] = i * row + j + 1;
            }
    }

    [Benchmark]
    public void SingleLoop()
    {
        int row = matrix.GetLength(0);
        int col = matrix.GetLength(1);
        var l = row * col;
        for (int i = 0; i < l; i++)
        {
            matrix[i / col, i % col] = i + 1;
        }
    }
}
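Run with the standard BenchmarkDotNet entry point (the attributes above come from the BenchmarkDotNet.Attributes namespace):
using BenchmarkDotNet.Running;

class Program
{
    static void Main() => BenchmarkRunner.Run<Loops>();
}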
Gives on my machine:
Method      |     Mean |    Error |   StdDev |   Median
NestedLoops | 144.5 ns |  2.94 ns |  4.58 ns | 144.7 ns
SingleLoop  | 578.2 ns | 11.37 ns | 25.42 ns | 568.6 ns
This makes the single loop actually slower.
If you change the loop body to some "dummy" operation, for example incrementing an outer variable or updating a fixed (say, the first) element of the matrix, you will see that the performance of both loops is roughly the same.
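For instance, a dummy-body pair of benchmarks could be added to the Loops class above (a sketch; the matrix access and the / and % disappear, leaving only the loop overhead):
int counter;

[Benchmark]
public void NestedLoopsDummy()
{
    int row = matrix.GetLength(0);
    int col = matrix.GetLength(1);
    for (int i = 0; i < row; i++)
        for (int j = 0; j < col; j++)
            counter++; // no element access, no index arithmetic
}

[Benchmark]
public void SingleLoopDummy()
{
    int l = matrix.GetLength(0) * matrix.GetLength(1);
    for (int i = 0; i < l; i++)
        counter++; // same dummy work as the nested version
}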
Did you consider
for (int i = 0; i < row; i++)
{
    for (int j = 0; j < col; j++)
    {
        Console.Write(string.Format("{0:00} ", matrix[i, j]));
    }
    Console.Write(Environment.NewLine + Environment.NewLine);
}
It is basically the same loop as yours, but without the / and % operations that the compiler may or may not optimize away.
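If you want to keep a single loop but still avoid the division and modulo, one option (a sketch, not from the original answer) is to carry the row and column indices along incrementally:
// single loop over all elements; i and j are updated incrementally
// instead of being recomputed with / and % on every iteration
int i = 0, j = 0;
for (int n = 0; n < row * col; n++)
{
    matrix[i, j] = n + 1;
    if (++j == col)
    {
        j = 0;
        ++i;
    }
}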
This is a piece of my code, which calculates a derivative (a filter convolution). It works correctly, but it takes a long time (because of the height and width).
"Data" is a grey-image bitmap.
"Filter" is a [3,3] matrix.
"Fh" and "Fw" maximum values are 3.
I am looking to speed up this code.
I also tried using Parallel.For, but it didn't work correctly (out-of-bounds errors).
private float[,] Differentiate(int[,] Data, int[,] Filter)
{
    int i, j, k, l, Fh, Fw;
    Fw = Filter.GetLength(0);
    Fh = Filter.GetLength(1);
    float sum = 0;
    float[,] Output = new float[Width, Height];
    for (i = Fw / 2; i <= (Width - Fw / 2) - 1; i++)
    {
        for (j = Fh / 2; j <= (Height - Fh / 2) - 1; j++)
        {
            sum = 0;
            for (k = -Fw / 2; k <= Fw / 2; k++)
            {
                for (l = -Fh / 2; l <= Fh / 2; l++)
                {
                    sum = sum + Data[i + k, j + l] * Filter[Fw / 2 + k, Fh / 2 + l];
                }
            }
            Output[i, j] = sum;
        }
    }
    return Output;
}
For parallel execution you need to drop the C-style variable declarations at the beginning of the method and declare each variable in the actual scope where it is used, so they are not shared between threads. Making it parallel should provide some performance benefit, but making all the loops Parallel.Fors is not a good idea, as there is a limit to how many threads can actually run in parallel. I would parallelize the top-level loop only:
private static float[,] Differentiate(int[,] Data, int[,] Filter)
{
    var Fw = Filter.GetLength(0);
    var Fh = Filter.GetLength(1);
    float[,] Output = new float[Width, Height];
    // note: Parallel.For takes an exclusive upper bound, so the sequential
    // condition i <= (Width - Fw / 2) - 1 translates to Width - Fw / 2 here
    Parallel.For(Fw / 2, Width - Fw / 2, i =>
    {
        for (var j = Fh / 2; j <= (Height - Fh / 2) - 1; j++)
        {
            var sum = 0;
            for (var k = -Fw / 2; k <= Fw / 2; k++)
            {
                for (var l = -Fh / 2; l <= Fh / 2; l++)
                {
                    sum = sum + Data[i + k, j + l] * Filter[Fw / 2 + k, Fh / 2 + l];
                }
            }
            Output[i, j] = sum;
        }
    });
    return Output;
}
This is a perfect example of a task where using the GPU is better than using the CPU. A GPU is able to perform trillions of floating-point operations per second (TFlops), while CPU performance is still measured in GFlops. The catch is that the GPU only pays off for SIMD-style work (Single Instruction, Multiple Data): it excels at data-parallel tasks. If different data needs different instructions, using the GPU has no advantage.
In your program, the elements of your bitmap all go through the same calculations: the same computations, just with slightly different data (SIMD!). So using the GPU is a great option. This won't be too complex, because with your calculations the threads on the GPU would not need to exchange information, nor would they depend on the results of previous iterations (each element would be processed by a different thread on the GPU).
You can use, for example, OpenCL to easily access the GPU. More on OpenCL and using the GPU here: https://www.codeproject.com/Articles/502829/GPGPU-image-processing-basics-using-OpenCL-NET
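To make the SIMD idea concrete, here is a minimal CPU-side sketch using System.Numerics.Vector<float> (this is CPU SIMD rather than the GPU path described in the link, but it illustrates "one instruction, many data elements"):
using System.Numerics;

static void AddArrays(float[] a, float[] b, float[] result)
{
    int i = 0;
    int width = Vector<float>.Count;      // e.g. 8 floats per operation on AVX hardware
    for (; i <= a.Length - width; i += width)
    {
        var va = new Vector<float>(a, i); // load 'width' elements at once
        var vb = new Vector<float>(b, i);
        (va + vb).CopyTo(result, i);      // one vector add replaces 'width' scalar adds
    }
    for (; i < a.Length; i++)             // scalar tail for the leftover elements
        result[i] = a[i] + b[i];
}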
I have a problem that I don't understand in this code:
ilProbekUcz = valuesUcz.Count; // valuesUcz is a List of float[]
for (int i = 0; i < ilWezlowDanych; i++)
    nodesValueArrayUcz[i] = new BitArray(ilProbekUcz);
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < ilProbekUcz; i++)
{
    int index = 0;
    linia = (float[])valuesUcz[i]; // removing this line does not solve the problem
    for (int a = 0; a < ileRazem; a++)
        for (int b = 0; b < ileRazem; b++)
            if (a != b)
            {
                bool value = linia[a] >= linia[b];
                nodesValueArrayUcz[index][i] = value;
                nodesValueArrayUcz[ilWezlowDanychP2 + index][i] = !value;
                index++;
            }
}
sw.Stop();
When I increase the size of valuesUcz 2x, the execution time is 4x longer.
When I increase the size of valuesUcz 4x, the execution time is 8x longer.
etc...
(ileRazem and ilWezlowDanych stay the same)
I understand that increasing ilProbekUcz increases the size of the BitArrays, but I have tested that many times and it is not the problem; the time should grow linearly. In this code:
ilProbekUcz = valuesUcz.Count; // valuesTest is a List of float[]
for (int i = 0; i < ilWezlowDanych; i++)
    nodesValueArrayUcz[i] = new BitArray(ilProbekUcz);
BitArray test1 = nodesValueArrayUcz[10];
BitArray test2 = nodesValueArrayUcz[20];
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < ilProbekUcz; i++)
{
    int index = 0;
    linia = (float[])valuesUcz[i]; // removing this line does not solve the problem
    for (int a = 0; a < ileRazem; a++)
        for (int b = 0; b < ileRazem; b++)
            if (a != b)
            {
                bool value = linia[a] >= linia[b];
                test1[i] = value;
                test2[i] = !value;
                index++;
            }
}
the time grows linearly, so the problem is fetching a BitArray out of the array...
Is there any method to do it faster? (I want the time to grow linearly.)
You have to understand that when measuring time there are many factors that make the measurement inaccurate. The biggest factor when you have huge arrays, as in your example, is cache misses: the same logic, written with the cache in mind, can be 2-5 or more times faster. Very roughly, this is how the cache works: the cache is memory inside the CPU. It is far faster than RAM, so when you fetch a variable from memory you want that variable to already be in the cache rather than in RAM. If it is in the cache we say we have a hit, otherwise a miss. (Sometimes, not so often, a program is so big that some of its memory gets paged out to the hard drive; in that case you pay a huge delay when you fetch it.) An example of how the cache behaves:
Let's say we have an array of 10 elements in memory (RAM).
When you read the first element, testArray[0], it is not in the cache, so the CPU brings in this value along with a number of adjacent elements of the array (say 3; the number depends on the CPU), i.e. it stores testArray[0], testArray[1], testArray[2] and testArray[3] in the cache.
Now when we read testArray[1] it is in the cache, so we have a hit. The same goes for testArray[2] and testArray[3]. testArray[4] isn't in the cache, so the CPU fetches testArray[4] along with another three: testArray[5], testArray[6], testArray[7].
And so on...
Cache misses are very costly. You might expect an array of double the size to be accessible in double the time, but this is not true: bigger arrays mean more misses,
and the time may increase 2 or 3 or 4 or more times over what you expect. This is normal, and it is what is happening in your example: you go from 100 million elements (first array) to 400 million (second one), and the misses are not merely doubled but far more than that, as you saw. A very useful trick has to do with the order in which you access an array: in your example, ba1[j][i] = (j % 2) == 0; is far worse than ba1[i][j] = (j % 2) == 0; (and the same for ba2). You can test it; just swap i and j. It has to do with the way a 2D array is laid out in memory: in the second case you traverse it in storage order, so you get far more hits than in the first.
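A minimal sketch demonstrating the traversal-order effect (the array size is illustrative; run it as a release build):
using System;
using System.Diagnostics;

class CacheDemo
{
    static void Main()
    {
        const int n = 5000;
        var a = new int[n, n]; // ~100 MB, far larger than any CPU cache
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)  // storage order: walks memory sequentially
            for (int j = 0; j < n; j++)
                a[i, j]++;
        Console.WriteLine($"row-major:    {sw.ElapsedMilliseconds} ms");
        sw.Restart();
        for (int j = 0; j < n; j++)  // strided order: jumps n ints between accesses
            for (int i = 0; i < n; i++)
                a[i, j]++;
        Console.WriteLine($"column-major: {sw.ElapsedMilliseconds} ms");
    }
}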
I am new to optimization problems and have gone through several math libraries like Alglib, DotNumerics and Microsoft Solver Foundation, but I have had no luck getting started; perhaps some experts can shed some light.
I want to find the optimal translation from 3D points on a reference contour to a target contour.
Below is the constrained optimization problem. How do I optimize it if I wish to use DotNumerics, for instance? I have no idea how to kick-start it:
Pr : 3d points on reference contour
Pt : 3d points on target contour
t(Pr): translation vector of point Pr <-- this is what I am looking for
Below is the example provided by DotNumerics; how should I feed all my 3D points in as input and get the translation vector out?
public void OptimizationLBFGSBConstrained()
{
    // This example minimizes the function
    // f(x0,x1,...,xn) = (x0-0)^2 + (x1-1)^2 + ... + (xn-n)^2
    // The minimum is at (0,1,2,3,...,n) for the unconstrained case.
    // using DotNumerics.Optimization;
    L_BFGS_B LBFGSB = new L_BFGS_B();
    int numVariables = 5;
    OptBoundVariable[] variables = new OptBoundVariable[numVariables];

    // constrained minimization on the interval (-10,10), initial guess = -2
    for (int i = 0; i < numVariables; i++)
        variables[i] = new OptBoundVariable("x" + i.ToString(), -2, -10, 10);
    double[] minimum = LBFGSB.ComputeMin(ObjetiveFunction, Gradient, variables);
    ObjectDumper.Write("L-BFGS-B Method. Constrained Minimization on the interval (-10,10)");
    for (int i = 0; i < minimum.Length; i++)
        ObjectDumper.Write("x" + i.ToString() + " = " + minimum[i].ToString());

    // constrained minimization on the interval (-10,3), initial guess = -2
    for (int i = 0; i < numVariables; i++)
        variables[i].UpperBound = 3;
    minimum = LBFGSB.ComputeMin(ObjetiveFunction, Gradient, variables);
    ObjectDumper.Write("L-BFGS-B Method. Constrained Minimization on the interval (-10,3)");
    for (int i = 0; i < minimum.Length; i++)
        ObjectDumper.Write("x" + i.ToString() + " = " + minimum[i].ToString());

    //f(x0,x1,...,xn) = (x0-0)^2 + (x1-1)^2 + ... + (xn-n)^2
    //private double ObjetiveFunction(double[] x)
    //{
    //    int numVariables = 5;
    //    double f = 0;
    //    for (int i = 0; i < numVariables; i++) f += Math.Pow(x[i] - i, 2);
    //    return f;
    //}

    //private double[] Gradient(double[] x)
    //{
    //    int numVariables = 5;
    //    double[] grad = new double[x.Length];
    //    for (int i = 0; i < numVariables; i++) grad[i] = 2 * (x[i] - i);
    //    return grad;
    //}
}
Edit 1:
To make things more concrete, here is the real problem I have been working on in Unity. I sampled 5 iso-contour lines from the reference model and did the same on the target model (different mesh, different vertex positions). On the first iso-contour of the reference model I sampled 8 normalized points (equally split by distance), and again did the same on the target model, so I have 2 pairs of corresponding point sets (the target model's normalized point positions will always change, given that every user has a different body size). Next, I repeated the steps above for the rest of the iso-contours. Once that is done, I want to use the formula above to solve for the optimal translation vector, so that I can translate all vertices from the reference to the target model with one single translation vector (not sure this is possible). Is this how optimization works?
Please ignore the red line in the yellow iso-contour.
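One observation that may help before reaching for a solver: if the model really is a single unconstrained translation vector t applied to all points, then minimizing the sum of squared distances sum ||Pr_i + t - Pt_i||^2 has a closed-form solution, namely the mean of the pointwise differences. A hedged sketch using Unity's Vector3 (it assumes the two point sets are in corresponding order; the constrained formulation in your problem statement would still need an iterative solver such as L-BFGS-B):
using UnityEngine;

static Vector3 OptimalTranslation(Vector3[] reference, Vector3[] target)
{
    // least-squares optimum of sum ||reference[i] + t - target[i]||^2:
    // the gradient vanishes at t = mean(target[i] - reference[i])
    Vector3 sum = Vector3.zero;
    for (int i = 0; i < reference.Length; i++)
        sum += target[i] - reference[i];
    return sum / reference.Length;
}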
Hi and thanks for looking!
Background
I have a computing task that requires either a lot of time, or parallel computing.
Specifically, I need to loop through a list of about 50 images, Base64-encode them, and then calculate the Levenshtein distance between each newly encoded item and the values in an XML file containing about 2000 Base64-encoded images, in order to find the string in the XML file that has the smallest Levenshtein distance from the benchmark string.
A regular foreach loop works, but is too slow, so I have chosen to use PLINQ to take advantage of my Core i7 multi-core processor:
Parallel.ForEach(candidates, item => findImage(total,currentWinner,benchmark,item));
The task starts brilliantly, racing along at high speed, but then I get an "Out of Memory" exception.
I am using C#, .NET 4, Forms App.
Question
How do I tweak my PLINQ code so that I don't run out of available memory?
Update/Sample Code
Here is the method that is called to initiate the parallel foreach:
private void btnGo_Click(object sender, EventArgs e)
{
    XDocument doc = XDocument.Load(@"C:\Foo.xml");
    var imagesNode = doc.Element("images").Elements("image"); // each "image" node contains a Base64 encoded string
    string benchmark = tbData.Text; // a Base64 encoded string
    IEnumerable<XElement> candidates = imagesNode;
    currentWinner = 1000000; // set the "current" low score to a million and bubble lower scores into its place iteratively
    Parallel.ForEach(candidates, i =>
    {
        // note: dist, currentWinner and path are shared across iterations here
        // without synchronization, which is not thread-safe (see the answer below)
        dist = Levenshtein(benchmark, i.Element("score").Value);
        if (dist < currentWinner)
        {
            currentWinner = dist;
            path = i.Element("path").Value;
        }
    });
}
...and here is the Levenshtein distance method:
public static int Levenshtein(string s, string t)
{
    int n = s.Length;
    int m = t.Length;
    var d = new int[n + 1, m + 1];

    // Step 1
    if (n == 0)
    {
        return m;
    }
    if (m == 0)
    {
        return n;
    }

    // Step 2
    for (int i = 0; i <= n; d[i, 0] = i++)
    {
    }
    for (int j = 0; j <= m; d[0, j] = j++)
    {
    }

    // Step 3
    for (int i = 1; i <= n; i++)
    {
        // Step 4
        for (int j = 1; j <= m; j++)
        {
            // Step 5
            int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
            // Step 6
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                d[i - 1, j - 1] + cost);
        }
    }
    // Step 7
    return d[n, m];
}
Thanks in advance!
Update
Ran into this error again today under different circumstances. I was working on a desktop app with high memory demands. Make sure you have set the project to the 64-bit architecture to access all available memory; my project was set to x86 by default, so I kept getting out-of-memory exceptions. Of course, this only works if you can count on 64-bit processors for your deployment.
End Update
After struggling a bit with this, it appears to be operator error:
I was making calls to the UI thread from the parallel threads in order to update progress labels, but I was not doing it in a thread-safe way.
Additionally, I was running the app without the debugger, so there was an uncaught exception each time the code attempted to update the UI thread from a parallel thread which caused the overflow.
Without being an expert on PLINQ, I am guessing that it handles all of the low-level allocation stuff for you as long as you don't make a goofy smelly code error like this one.
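For reference, a minimal thread-safe version of the winner-tracking loop from the question might look like this (a sketch: dist becomes a local so iterations don't share it, and a lock guards the shared best result):
object winnerLock = new object();
int currentWinner = 1000000;
string path = null;
Parallel.ForEach(candidates, i =>
{
    // compute the distance in a local, private to this iteration
    int dist = Levenshtein(benchmark, i.Element("score").Value);
    lock (winnerLock) // serialize access to the shared "current best"
    {
        if (dist < currentWinner)
        {
            currentWinner = dist;
            path = i.Element("path").Value;
        }
    }
});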
Hope this helps someone else.