Hi and thanks for looking!
Background
I have a computing task that requires either a lot of time, or parallel computing.
Specifically, I need to loop through a list of about 50 images, Base64 encode them, and then calculate the Levenshtein distance between each newly encoded item and values in an XML file containing about 2000 Base64 string-encoded images in order to find the string in the XML file that has the smallest Lev. Distance from the benchmark string.
A regular foreach loop works, but is too slow so I have chosen to use PLINQ to take advantage of my Core i7 multi-core processor:
Parallel.ForEach(candidates, item => findImage(total,currentWinner,benchmark,item));
The task starts brilliantly, racing along at high speed, but then I get an "Out of Memory" exception.
I am using C#, .NET 4, Forms App.
Question
How do I tweak my PLINQ code so that I don't run out of available memory?
Update/Sample Code
Here is the method that is called to iniate the PLINQ foreach:
private void btnGo_Click(object sender, EventArgs e)
{
XDocument doc = XDocument.Load(#"C:\Foo.xml");
var imagesNode = doc.Element("images").Elements("image"); //Each "image" node contains a Base64 encoded string.
string benchmark = tbData.Text; //A Base64 encoded string.
IEnumerable<XElement> candidates = imagesNode;
currentWinner = 1000000; //Set the "Current" low score to a million and bubble lower scores into it's place iteratively.
Parallel.ForEach(candidates, i => {
dist = Levenshtein(benchmark, i.Element("score").Value);
if (dist < currentWinner)
{
currentWinner = dist;
path = i.Element("path").Value;
}
});
}
. . .and here is the Levenshtein Distance Method:
public static int Levenshtein(string s, string t) {
int n = s.Length;
int m = t.Length;
var d = new int[n + 1, m + 1];
// Step 1
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
// Step 2
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
// Step 3
for (int i = 1; i <= n; i++)
{
//Step 4
for (int j = 1; j <= m; j++)
{
// Step 5
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
// Step 6
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
Thanks in advance!
Update
Ran into this error again today under different circumstances. I was working on a desktop app with high memory demand. Make sure that you have set the project for 64-bit architecture to access all available memory. My project was set on x86 by default and so I kept getting out of memory exceptions. Of course, this only works if you can count on 64-bit processors for your deployment.
End Update
After struggling a bit with this it appears to be operator error:
I was making calls to the UI thread from the parallel threads in order to update progress labels, but I was not doing it in a thread-safe way.
Additionally, I was running the app without the debugger, so there was an uncaught exception each time the code attempted to update the UI thread from a parallel thread which caused the overflow.
Without being an expert on PLINQ, I am guessing that it handles all of the low-level allocation stuff for you as long as you don't make a goofy smelly code error like this one.
Hope this helps someone else.
Related
I try to optimize pefrormance using IJobParallelFor in Unity code:
Unfortunatelly I met error likie this:
System.IndexOutOfRangeException: Index {0} is out of restricted
IJobParallelFor range [{1}...{2}] in ReadWriteBuffer.
I tried to use
[NativeDisableParallelForRestriction] and
[NativeDisableContainerSafetyRestriction]
but without effect
[BurstCompile(CompileSynchronously = true)]
public struct DilationJob : IJobParallelFor
{
[ReadOnly]
public NativeArray<Color32> colorsArray;
public NativeArray<int> voxelToColor;
public int kernelSize;
public NativeArray<int> neighboursArray;
public int cubeNumber;
public void Execute(int index)
{
int dimx = 512;
int dimy = 512;
//int[] neighboursArray = new int[kernelSize * kernelSize * kernelSize];
int listIndex = 0;
for (int i = -1; i < kernelSize - 1; i++)
{
for (int j = -1; j < kernelSize - 1; j++)
{
for (int k = -1; k < kernelSize - 1; k++)
{
int neigbourIndex = (i + 1) * (j + 1) * kernelSize + (j + 1) * kernelSize + (k + 1);
if (neigbourIndex < 0)
{
neigbourIndex = 0;
}
neighboursArray[neigbourIndex] =
index + k * dimx * dimy + j * dimx + i;
if (neighboursArray[neigbourIndex] < colorsArray.Length && neighboursArray[neigbourIndex] >= 0 &&
colorsArray[neighboursArray[neigbourIndex]].b == 255)
{
voxelToColor[listIndex] = index;
listIndex++;
}
}
}
}
}
}
Afaik this exception is caused by the fact that an IJobParallelFor as the same says is executed in parallel
Execute(int index) will be executed once for each index from 0 to the provided length. Each iteration must be independent from other iterations (The safety system enforces this rule for you). The indices have no guaranteed order and are executed on multiple cores in parallel.
Unity automatically splits the work into chunks of no less than the provided batchSize, and schedules an appropriate number of jobs based on the number of worker threads, the length of the array and the batch size.
What this means is lets say you have an array of length 10 an you say the batchSize can be up to 4 then you will probably end up with 3 parallel job chunks for the indices [0, 1, 2, 3], [4, 5, 6, 7] and [8, 9]. Since each of these 3 chunks is possibly on a different kernel they only get access to their according chunk of the NativeArray(s). (More on that here)
What probably happens is that multiple of your parallel jobs try to access and write to the same index of your output NativeArrays voxelToColor and neighboursArray. In specific without doing the rest of calculation you definitely will possibly try to write to voxelToColor[0] in each and every of your parallel jobs which is neither allowed nor makes a lot of sense to me.
Within one Execute call you are restricted to only write to the given index.
Afaik the message should further read
ReadWriteBuffers are restricted to only read & write the element at the job index. You can use double buffering strategies to avoid race conditions due to reading & writing in parallel to the same elements from a job.
[ReadOnly] tagged arrays are the exception because here multiple parallel accesses can't corrupt the data as long as you only read.
Then the [NativeDisableContainerSafetyRestriction] if I understand correctly solves race conditions between different jobs and the main thread.
While what you probably want to go with is more [NativeDisableParallelForRestriction] which as far as I understand disables the security restriction for parallel access to array indices. To be honest the Unity API is quite spare on these.
As a little example
public class Example : MonoBehaviour
{
public Color32[] colors = new Color32[10];
private void Awake()
{
var job = new DilationJob()
{
colorsArray = new NativeArray<Color32>(colors, Allocator.Persistent),
voxelToColor = new NativeArray<int>(colors.Length, Allocator.Persistent)
};
var handle = job.Schedule(colors.Length, 4);
handle.Complete();
foreach (var i in job.voxelToColor)
{
Debug.Log(i);
}
job.colorsArray.Dispose();
job.voxelToColor.Dispose();
}
}
[BurstCompile(CompileSynchronously = true)]
public struct DilationJob : IJobParallelFor
{
[ReadOnly] public NativeArray<Color32> colorsArray;
[NativeDisableParallelForRestriction]
public NativeArray<int> voxelToColor;
public void Execute(int index)
{
voxelToColor[index] = colorsArray[index].a;
if (index + 1 < colorsArray.Length - 1) voxelToColor[index + 1] = 0;
}
}
should not throw any exception. But if you comment out the [NativeDisableParallelForRestriction] you will get the exception you are getting.
This is a piece of my code, which calculate the differentiate. It works correctly but it takes a lot (because of height and width).
"Data" is a grey image bitmap.
"Filter" is [3,3] matrix.
"fh" and "fw" maximum values are 3.
I am looking to speed up this code.
I also tried with using parallel, for but it didn't work correct (error with out of bounds).
private float[,] Differentiate(int[,] Data, int[,] Filter)
{
int i, j, k, l, Fh, Fw;
Fw = Filter.GetLength(0);
Fh = Filter.GetLength(1);
float sum = 0;
float[,] Output = new float[Width, Height];
for (i = Fw / 2; i <= (Width - Fw / 2) - 1; i++)
{
for (j = Fh / 2; j <= (Height - Fh / 2) - 1; j++)
{
sum=0;
for(k = -Fw/2; k <= Fw/2; k++)
{
for(l = -Fh/2; l <= Fh/2; l++)
{
sum = sum + Data[i+k, j+l] * Filter[Fw/2+k, Fh/2+l];
}
}
Output[i,j] = sum;
}
}
return Output;
}
For parallel execution you need to drop c language like variable declaration at the beginning of method and declare them in actual scope that they are used so they are not shared between threads. Making it parallel should provide some benefit for performance, but making them all ParallerFors is not a good idea as there is a limit for threads amount that actually can run in parallel. I would try to make it with top level loop only:
private static float[,] Differentiate(int[,] Data, int[,] Filter)
{
var Fw = Filter.GetLength(0);
var Fh = Filter.GetLength(1);
float[,] Output = new float[Width, Height];
Parallel.For(Fw / 2, Width - Fw / 2 - 1, (i, state) =>
{
for (var j = Fh / 2; j <= (Height - Fh / 2) - 1; j++)
{
var sum = 0;
for (var k = -Fw / 2; k <= Fw / 2; k++)
{
for (var l = -Fh / 2; l <= Fh / 2; l++)
{
sum = sum + Data[i + k, j + l] * Filter[Fw / 2 + k, Fh / 2 + l];
}
}
Output[i, j] = sum;
}
});
return Output;
}
This is a perfect example of a task where using the GPU is better than using the CPU. A GPU is able to perform trillions of floating point operations per second (TFlops), while CPU performance is still measured in GFlops. The catch is that it's only any good if you use SIMD instructions (Single Instruction Multiple Data). The GPU excels at data-parallel tasks. If different data needs different instructions, using the GPU has no advantage.
In your program, the elements of your bitmap go through the same calculations: the same computations just with slightly different data (SIMD!). So using the GPU is a great option. This won't be too complex because with your calculations threads on the GPU would not need to exchange information, nor would they be dependent on results of previous iterations (Each element would be processed by a different thread on the GPU).
You can use, for example, OpenCL to easily access the GPU. More on OpenCL and using the GPU here: https://www.codeproject.com/Articles/502829/GPGPU-image-processing-basics-using-OpenCL-NET
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Improve this question
namespace SquareStars
{
class Program
{
static void Main(string[] args)
{
var n = int.Parse(Console.ReadLine());
for (int i = 0; i <n-1; i++)
{
Console.Write("*");
}
for (int i = 0; i < n-1; i++)
{
Console.WriteLine("*");
}
for (int i = 0; i < n; i++)
{
Console.Write("*");
}
}
}
}
My exercises is to do Square of Stars like this:
depends of "n".I try to use for loops,but I can do right side of square.
I do this:
***
*
***
but I want this:
*** ****
* * * *
*** or * *
****
Can sameone help me to this code????
The following should do:
public static string CreateASCIISquare(int squareSideLength, char c = '*')
{
if (squareSideLength < 1)
return "";
if (squareSideLength == 1)
return c.ToString();
var horizontalOuterRow = new String(c, squareSideLength);
var horizontalInnerRow = $"{c}{new string(' ', squareSideLength - 2)}{c}";
var squareBuilder = new StringBuilder();
squareBuilder.AppendLine(horizontalOuterRow);
for (int i = 0; i < squareSideLength - 2; i++)
{
squareBuilder.AppendLine(horizontalInnerRow);
}
squareBuilder.Append(horizontalOuterRow);
return squareBuilder.ToString();
}
Ok, so lets explain the code a little bit:
Always validate the data going into your method. In your case the user can specify the length of the square's side. Are there any invalid values? Well clearly yes, -2 doesn't seem to be a valid choice.
You have many options here. You can tell the user the value is not valid, you can simply ignore it, do nothing and let your application crash and die miserably, or you can return an empty square. All are valid choices, but consciously choose one and design according to your decision. In my case I chose to return an empty square:
if (squareSideLength < 0)
return "";
Are there any trivial cases I can manage easily without for loops and Console.Writes etc.? Yes, lengths 0 and 1 seem pretty straightforward. It stands to reason that if I've returned the empty square for negative values, I do the same for 0 sized squares. So I change the previous code to:
if (squareSideLength < 1)
return "";
Good, now what about 1? Well that's pretty easy too isn't it?
if (squareSideLength == 1)
return c.ToString();
Moral of the story: take care of invalid data or trivial cases first. Many times trivial cases can also be corner cases that can complicate your general solution, get them out of the way fast!
Ok, now lets think about what a square looks like:
**
**
***
* *
***
****
* *
* *
****
Well the pattern seems pretty obvious. A square is made up of two rows containing the specified number of stars, and 0 or more rows containing only 2 stars and squareSideLength - 2 spaces in between. Well, its only two types of rows, lets build them up front:
var horizontalOuterRow = new String(c, squareSideLength);
var horizontalInnerRow = $"{c}{new string(' ', squareSideLength - 2)}{c}";
Great! We got our building blocks, now lets build our square:
So how does that go? Well we simply start by adding a horizontalOuterRow then we add the squareSideLength - 2 horizontalInnerRows and we finish up adding one more horizontalOuterRow.
And voilá, we got ourselves our nice little ASCII square:
squareBuilder.AppendLine(horizontalOuterRow);
for (int i = 0; i < squareSideLength - 2; i++)
{
squareBuilder.AppendLine(horizontalInnerRow);
}
squareBuilder.Append(horizontalOuterRow);
The fact that I've used a StringBuilder is not really germane to the question, but get into the habit of using this tool when constructing dynamically built strings you don't know the length of beforehand. Concatenating strings can give you pretty bad performance so it's best to avoid it if possible.
Now we proudly return our beautiful ASCII square back to our extatic user and bask under the general round of applause:
return squareBuilder.ToString();
Hope this little answer helps you.
Remember, to code well, break everything up into very small tasks, and take care of them one at a time. Think how you'd do it by hand and write it that way. Once you have it working there'll be time to see if you need to optimize, refactor, etc. your code. But the key is getting it to work with code as clear as possible.
You can use the string(char, int) constructor to create strings of repeating characters. This lets you simplify the code to use a single for loop:
var n = int.Parse(Console.ReadLine());
Console.WriteLine(new string('*', n));
for (int i = 0; i < n - 2; i++)
{
Console.WriteLine("*" + new string(' ', n - 2) + "*");
}
Console.WriteLine(new string('*', n));
class Program
{
static void Main(string[] args)
{
var n = int.Parse(Console.ReadLine());
for (int row = 1; row <= n; row++)
{
for (int col = 1; col <= n; col++)
{
if (row == 1 || row == n)
{
Console.Write("*");
}
else
{
if (col == 1 || col == n)
{
Console.Write("*");
}
else
{
Console.Write(" ");
}
}
}
Console.WriteLine();
}
Console.ReadKey();
}
}
You can change Console.WriteLine in 2nd loop like this :
class Program
{
static void Main(string[] args)
{
var n = int.Parse(Console.ReadLine());
for (int i = 0; i < n - 1; i++)
{
Console.Write("*");
}
for (int i = 0; i < n - 1; i++)
{
Console.WriteLine(i == 0 ? "*" : "*" + string.Empty.PadLeft(n - 2) + "*");
}
for (int i = 0; i < n; i++)
{
Console.Write("*");
}
}
}
Lately I've implemented my own neural network (using different guides, but mainly from here), for future use (I intend to use it for an OCR program i'l develop). currently I'm testing it, and I'm having this weird problem.
Whenever I give my network a training example, the algorithm changes the weights in a way that leads to the desired output. However, after a few training examples, the weights get messed up- making the network work well for some outputs, and making it wrong for other outputs (even if I enter the input of the training examples, exactly as it was).
I would appreciate if someone directed me towards the problem, should they see it.
Here are the methods for calculating the error of the neurons and the weight adjusting-
private static void UpdateOutputLayerDelta(NeuralNetwork Network, List<double> ExpectedOutputs)
{
for (int i = 0; i < Network.OutputLayer.Neurons.Count; i++)
{
double NeuronOutput = Network.OutputLayer.Neurons[i].Output;
Network.OutputLayer.Neurons[i].ErrorFactor = ExpectedOutputs[i]-NeuronOutput; //calculating the error factor
Network.OutputLayer.Neurons[i].Delta = NeuronOutput * (1 - NeuronOutput) * Network.OutputLayer.Neurons[i].ErrorFactor; //calculating the neuron's delta
}
}
//step 3 method
private static void UpdateNetworkDelta(NeuralNetwork Network)
{
NeuronLayer UpperLayer = Network.OutputLayer;
for (int i = Network.HiddenLayers.Count - 1; i >= 0; i--)
{
foreach (Neuron LowerLayerNeuron in Network.HiddenLayers[i].Neurons)
{
for (int j = 0; j < UpperLayer.Neurons.Count; j++)
{
Neuron UpperLayerNeuron = UpperLayer.Neurons[j];
LowerLayerNeuron.ErrorFactor += UpperLayerNeuron.Delta * UpperLayerNeuron.Weights[j + 1]/*+1 because of bias*/;
}
LowerLayerNeuron.Delta = LowerLayerNeuron.Output * (1 - LowerLayerNeuron.Output) * LowerLayerNeuron.ErrorFactor;
}
UpperLayer = Network.HiddenLayers[i];
}
}
//step 4 method
private static void AdjustWeights(NeuralNetwork Network, List<double> NetworkInputs)
{
//Adjusting the weights of the hidden layers
List<double> LowerLayerOutputs = new List<double>(NetworkInputs);
for (int i = 0; i < Network.HiddenLayers.Count; i++)
{
foreach (Neuron UpperLayerNeuron in Network.HiddenLayers[i].Neurons)
{
UpperLayerNeuron.Weights[0] += -LearningRate * UpperLayerNeuron.Delta;
for (int j = 1; j < UpperLayerNeuron.Weights.Count; j++)
UpperLayerNeuron.Weights[j] += -LearningRate * UpperLayerNeuron.Delta * LowerLayerOutputs[j - 1] /*-1 because of bias*/;
}
LowerLayerOutputs = Network.HiddenLayers[i].GetLayerOutputs();
}
//Adjusting the weight of the output layer
foreach (Neuron OutputNeuron in Network.OutputLayer.Neurons)
{
OutputNeuron.Weights[0] += -LearningRate * OutputNeuron.Delta * 1; //updating the bias - TODO: change this if the bias is also changed throughout the program
for (int j = 1; j < OutputNeuron.Weights.Count; j++)
OutputNeuron.Weights[j] += -LearningRate * OutputNeuron.Delta * LowerLayerOutputs[j - 1];
}
}
The learning rate is 0.5, and the neurons' activation function is a sigmoid function.
EDIT: I've noticed I never implemented the function to calculate the overall error: E=0.5 * Sum(t-y) for each training example. could that be the problem? and if so, how should I fix it?
The learning rate 0.5 seems a bit too large. Usually values closer to 0.01 or 0.1 are used. Also, it usually helps in convergence if training patterns are presented in random order. More useful hints can be found here: Neural Network FAQ (comp.ai.neural archive).
I'm creating System.Threading.Tasks from within a for loop, then running a ContinueWith within the same loop.
int[,] referencedArray = new int[input.Length / 4, 5];
for (int i = 0; i <= input.Length/4; i += 2048)
{
int[,] res = new int[input.Length / 4, 5];
int[,] arrayToReference = new int[input.Length / 4, 5];
Array.Clear(arrayToReference, 0, arrayToReference.Length);
Array.Copy(input, i, arrayToReference, 0, input.Length / 4);
Task<Tuple<int[,], int>> test = Task.Factory.StartNew(() => addReferences(arrayToReference, i));
test.ContinueWith(t => { Array.Copy(t.Result.Item1, 0, referencedArray, t.Result.Item2, input.Length / 4); MessageBox.Show("yai", t.Result.Item2.ToString()); });
Array.Copy(res, 0, referencedArray, i, input.Length / 4);
where the addReference sbroutine is:
public Tuple<int[,],int> addReferences(int[,] input, int index)
{
for (int i = 0; i < 2048; i++)
{
for (int j = 0; j < i; j++)
{
if ((input[i, 0] == input[j, 0]) && (input[i, 1] == input[j, 1]) && (input[i, 2] == input[j, 2]) && (input[i, 3] == input[j, 3]))
{
input[i, 4] = (j - i);
}
}
}
}
return new Tuple<int[,],int>(input,index);
}
However, I am getting really strange results:
1.The index (generated from the loop counter when the task is started) somehow becomes too large. I initially thought that this was because the loop counter had incremented past its maximum, stopping the loop, but when the continueWith executed it used the new value. However, even when I send the loop counter's value to the task as index when it starts, the error persists. Edit: Solved
I made I mistake in the original count
2.For some reason fewer than expected tasks are actually completed (e.g. even though 85 tasks are expected to be created, based on a loop counter maximum of 160,000 divided by a iteration of 2,048 = 78, only 77 pop ups appeared) - Edit Checking the console, it's apparent that the loop counter got up until about 157,696 before suddenly stopping.
I'm really confused by Tasks and Threading, so if you could be of any assistance that would be really useful - thanks in advance.
Edit I've made the changes suggested by Douglas, which solved my first problem - I'm still stuck on the second.
This might not completely resolve the issues you're having, but as a starting point, you need to avoid the closure issue by copying your loop counter to an inner variable before referencing it in your anonymous method:
for (int i = 0; i <= input.Length/4; i += 2048)
{
// ...
int iCopy = i;
var test = Task.Factory.StartNew(() => addReferences(arrayToReference, iCopy));
// ...
}
See Jon Skeet's answer (particularly the linked article) for an explanation of why this is necessary.
Edit: Regarding your second problem, I suspect it might have to do with integer division. What is the value of input.Length for which you're expecting 85 iterations? You say that 174,000 divided by 2,048 should give 85, but in practice, it will give 84, since integer division causes results to be truncated.
Edit2: Another reason you're not seeing all expected iterations could be that your program is terminating without waiting for the tasks (which typically execute on background threads) to complete. Check whether waiting on the tasks resolves your issue:
List<Task> tasks = new List<Task>();
for (int i = 0; i <= input.Length/4; i += 2048)
{
// ...
var test = Task.Factory.StartNew(() => addReferences(arrayToReference, iCopy));
tasks.Add(test);
// ...
}
Task.WaitAll(tasks);