C#.NET - Threads are grabbing the same files within a loop

I'm trying to design a program that uses an external OCR application to flip an image until it is right side up. All of the image locations are kept in files[].
The problem is that doing one file at a time is too slow to handle the tens of thousands of images I have. I need to launch several instances of the OCR program to scan multiple images at the same time.
My crappy implementation is the following:
public Program(string[] files)
{
    for (int i = 0; i < files.Length; i++)
    {
        ThreadStart start = () => { flip(files[i]); };
        Thread t = new Thread(start);
        t.Start();
        if (i % 5 == 0)
        {
            t.Join();
        }
    }
}
The code is supposed to launch five instances of the OCR program, and on every fifth iteration it waits for that thread to close before continuing. This is supposed to act as a buffer.
However, what's happening instead is that repeated files are being passed into the OCR program instead of a different one for each iteration: different threads are grabbing the same file. This causes a crash when the different instances of the OCR application go to work on the same file.
Does anyone have any idea what's going on, or know a completely different approach I can take?

You're suffering from a problem called accessing a modified closure. The value of i is changing as the threads are starting. Change the code to capture a local copy instead:
for (int i = 0; i < files.Length; i++)
{
    int currenti = i;
    ThreadStart start = () => { flip(files[currenti]); };
    Thread t = new Thread(start);
    t.Start();
    if (i % 5 == 0)
    {
        t.Join();
    }
}

The problem is that your lambda expression is capturing the variable i, rather than its value for that iteration of the loop.
There are two options:
Capture a copy
for (int i = 0; i < files.Length; i++)
{
    int copy = i;
    ThreadStart start = () => flip(files[copy]); // Braces aren't needed
    ...
}
Use foreach - C# 5 only!
This won't help as much in your case because you're joining on every fifth item, so you need the index; but if you didn't have that requirement and you were using C# 5, you could just use:
foreach (var file in files)
{
    ThreadStart start = () => flip(file);
    ...
}
Note that prior to C# 5, this would have had exactly the same problem.
For more details of the problem, see Eric Lippert's blog posts (part one; part two).
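A minimal sketch of the difference (the names here are illustrative, not from the question): delegates that capture the for-loop variable all see its final value, while delegates that capture a per-iteration copy each keep their own value. Note that only foreach changed in C# 5; for loops still share one variable in every C# version.

```csharp
using System;
using System.Collections.Generic;

class ClosureDemo
{
    public static void Main()
    {
        var captured = new List<Func<int>>();
        var copied = new List<Func<int>>();

        for (int i = 0; i < 3; i++)
        {
            captured.Add(() => i);  // all three delegates share the same i
            int copy = i;
            copied.Add(() => copy); // each delegate gets its own copy
        }

        foreach (var f in captured) Console.Write(f() + " "); // 3 3 3
        Console.WriteLine();
        foreach (var f in copied) Console.Write(f() + " ");   // 0 1 2
        Console.WriteLine();
    }
}
```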

Related

How to best use multiple Tasks? (progress reporting and performance)

I created the following code to compare images and check if they are similar. Since that takes quite a while, I tried to optimize my code using multithreading.
I worked with BackgroundWorker in the past and have now started to use Tasks, but I am still not fully familiar with them.
Code below:
allFiles is a list of images to be compared.
chunksToCompare contains subsets of the Tuples of files to compare (always a combination of two files to compare), so each task can compare e.g. 20 Tuples of files.
The code below works fine in general but has two issues:
progress reporting does not really make sense, since progress is only updated when all Tasks have been completed, which takes quite a while
depending on the size of the files, each thread has a different processing time: the code below always waits until all (64) tasks are completed before the next batch is started, which is obviously not optimal
Many thanks in advance for any hint / idea.
// List for results
List<SimilarImage> similarImages = new List<SimilarImage>();
// create chunks of files to send to the threads
var chunksToCompare = GetChunksToCompare(allFiles);
// position of processed chunks of files
var i = 0;
// number of tasks
var taskCount = 64;
while (true)
{
    // list of all tasks
    List<Task<List<SimilarImage>>> tasks = new();
    // create single tasks
    for (var n = 0; n < taskCount; n++)
    {
        var task = (i + n < chunksToCompare.Count) ?
            GetSimilarImageAsync2(chunksToCompare[i + n], threshold) : null;
        if (task != null) tasks.Add(task);
    }
    // wait for all tasks to complete
    await Task.WhenAll(tasks);
    // get the results of each task and add them to the list
    foreach (var task in tasks)
    {
        if (task?.Result != null) similarImages.AddRange(task.Result);
    }
    // progress of processing
    i += tasks.Count;
    // report the progress
    progress.Report(new ProgressInformation() { Count = chunksToCompare.Count,
        Position = i + 1 });
    // exit condition
    if (i + 1 >= chunksToCompare.Count) break;
}
return similarImages;
return similarImages;
More info: I am using .NET 6. Images are stored on an SSD. With my test dataset it took 6:30 minutes with sequential and 4:00 with parallel execution. I am using a lib which only takes the paths of two images and then compares them. There is a lot of overhead because the same image is reloaded multiple times. I was looking for a different lib to compare images, but I was not successful.
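For what it's worth, one way around both issues is to bound concurrency instead of batching: keep a fixed number of chunks in flight and report progress as each one completes, so no chunk waits for a whole batch of 64. This is a sketch against the question's own names (chunksToCompare, GetSimilarImageAsync2, threshold, progress, ProgressInformation and SimilarImage are assumed to exist as described in the post), using .NET 6's Parallel.ForEachAsync:

```csharp
// Bounded concurrency without a batch barrier: a new chunk starts as soon
// as a slot frees up, and progress is reported per completed chunk.
var similarImages = new ConcurrentBag<SimilarImage>();
int done = 0;

await Parallel.ForEachAsync(
    chunksToCompare,
    new ParallelOptions { MaxDegreeOfParallelism = 64 },
    async (chunk, ct) =>
    {
        var result = await GetSimilarImageAsync2(chunk, threshold);
        if (result != null)
            foreach (var image in result)
                similarImages.Add(image);

        // Interlocked keeps the counter correct across threads.
        int position = Interlocked.Increment(ref done);
        progress.Report(new ProgressInformation
        {
            Count = chunksToCompare.Count,
            Position = position
        });
    });

return similarImages.ToList();
```

For CPU-bound comparison work, a MaxDegreeOfParallelism near the core count is usually a better fit than 64.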

Is there a way of running a single method call using more threads?

I am currently working on a project at my university involving traffic lights. I am using SUMO as my simulation program and I have stumbled upon the TraCI library for controlling the traffic lights.
I have programmed a genetic algorithm, but I have one problem which, in essence, is a bottleneck so small no particle can pass, and that is the simulation program itself.
When controlling multiple clients from the same program (my program), all of the clients run on 2 threads, whereas in my case I have 8 available. My intention with running the program in multiple threads is that the program will run faster, since 100 simulations take roughly 1.5 hours to complete even though I have only simulated about 40 minutes of traffic.
I have posted below the method in which I initialize and start the clients and control them.
The main culprit is probably the two method calls in the last for-loop (the one that controls the traffic lights).
So my question is: how can this be parallelized to run on multiple threads, so the program runs faster?
Best regards
private async Task RunSimulationAsync()
{
    List<TraCIClient> listOfClients = new List<TraCIClient>();
    List<SimulationCommands> listOfSimulations = new List<SimulationCommands>();
    List<TrafficLightCommands> listOfTrafficLights = new List<TrafficLightCommands>();
    // initialize clients, simulationCommands and trafficlightCommands used for controlling sumo
    for (int i = 0; i < numberOfInstances; ++i)
    {
        listOfClients.Add(new TraCIClient());
        listOfSimulations.Add(new SimulationCommands(listOfClients[i]));
        listOfTrafficLights.Add(new TrafficLightCommands(listOfClients[i]));
    }
    // open SUMO clients
    for (int i = 0; i < numberOfInstances; ++i)
    {
        OpenSumo(portNumber, sumoOutputFilePath + $"{i}.xml");
        await listOfClients[i].ConnectAsync("127.0.0.1", portNumber);
        ++portNumber;
    }
    // control traffic lights in simulation
    for (int i = 0; i < dnaSize; ++i)
    {
        for (int j = 0; j < numberOfInstances; j++)
        {
            listOfTrafficLights[j].SetRedYellowGreenState("n0", $" {Population[j].genes[i]}");
            listOfClients[j].Control.SimStep();
        }
    }
}
how can this be parallelized to run on multiple threads, so the program runs faster?
First, you need to be sure that the library you're using is capable of being called in a parallel fashion. Not all of them are.
Second, since you have a for index, the most straightforward translation would be to use Parallel.For, e.g.:
for (int i = 0; i < dnaSize; ++i)
{
    Parallel.For(0, numberOfInstances, j =>
    {
        listOfTrafficLights[j].SetRedYellowGreenState("n0", $" {Population[j].genes[i]}");
        listOfClients[j].Control.SimStep();
    });
}
This will parallelize out to the number of instances.

Time Slicing Between Five Threads In C#

Here's a description of what the program should do. The program should create a file and five threads to write to that file...
The first thread should write from 1 to 5 into that file.
The second thread should write from 1 to 10.
The third thread should write from 1 to 15.
The fourth thread should write from 1 to 20.
The fifth thread should write from 1 to 25.
Moreover, an algorithm should be implemented to make each thread print 2 numbers and stop; the next thread should then print two numbers and stop, and so on until all the threads finish printing their numbers.
Here's the code I've developed so far...
using System;
using System.IO;
using System.Threading;
using System.Collections;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    public static class OSAssignment
    {
        // First Thread Tasks...
        static void FirstThreadTasks(StreamWriter WritingBuffer)
        {
            for (int i = 1; i <= 5; i++)
            {
                if (i % 2 == 0)
                {
                    Console.WriteLine("[Thread1] " + i);
                    Thread.Sleep(i);
                }
                else
                {
                    Console.WriteLine("[Thread1] " + i);
                }
            }
        }

        // Second Thread Tasks...
        static void SecondThreadTasks(StreamWriter WritingBuffer)
        {
            for (int i = 1; i <= 10; i++)
            {
                if (i % 2 == 0)
                {
                    if (i == 10)
                        Console.WriteLine("[Thread2] " + i);
                    else
                    {
                        Console.WriteLine("[Thread2] " + i);
                        Thread.Sleep(i);
                    }
                }
                else
                {
                    Console.WriteLine("[Thread2] " + i);
                }
            }
        }

        // Third Thread Tasks...
        static void ThirdThreadTasks(StreamWriter WritingBuffer)
        {
            for (int i = 1; i <= 15; i++)
            {
                if (i % 2 == 0)
                {
                    Console.WriteLine("[Thread3] " + i);
                    Thread.Sleep(i);
                }
                else
                {
                    Console.WriteLine("[Thread3] " + i);
                }
            }
        }

        // Fourth Thread Tasks...
        static void FourthThreadTasks(StreamWriter WritingBuffer)
        {
            for (int i = 1; i <= 20; i++)
            {
                if (i % 2 == 0)
                {
                    if (i == 20)
                        Console.WriteLine("[Thread4] " + i);
                    else
                    {
                        Console.WriteLine("[Thread4] " + i);
                        Thread.Sleep(i);
                    }
                }
                else
                {
                    Console.WriteLine("[Thread4] " + i);
                }
            }
        }

        // Fifth Thread Tasks...
        static void FifthThreadTasks(StreamWriter WritingBuffer)
        {
            for (int i = 1; i <= 25; i++)
            {
                if (i % 2 == 0)
                {
                    Console.WriteLine("[Thread5] " + i);
                    Thread.Sleep(i);
                }
                else
                {
                    Console.WriteLine("[Thread5] " + i);
                }
            }
        }

        // Main Function...
        static void Main(string[] args)
        {
            FileStream File = new FileStream("output.txt", FileMode.Create, FileAccess.Write, FileShare.Write);
            StreamWriter Writer = new StreamWriter(File);
            Thread T1 = new Thread(() => FirstThreadTasks(Writer));
            Thread T2 = new Thread(() => SecondThreadTasks(Writer));
            Thread T3 = new Thread(() => ThirdThreadTasks(Writer));
            Thread T4 = new Thread(() => FourthThreadTasks(Writer));
            Thread T5 = new Thread(() => FifthThreadTasks(Writer));
            Console.WriteLine("Initiating Jobs...");
            T1.Start();
            T2.Start();
            T3.Start();
            T4.Start();
            T5.Start();
            Writer.Flush();
            Writer.Close();
            File.Close();
        }
    }
}
Here are the problems I'm facing...
I cannot figure out how to make the 5 threads write into the same file at the same time, even with FileShare.Write. So I simply decided to write to the console for the time being, to develop the algorithm and see how it behaves in the console first.
Each time I run the program, the output is slightly different from the previous run. It always happens that a thread prints only one of its numbers in a specific iteration and outputs the second number only after another thread finishes its current iteration.
I've got a question that might be somewhat off track. If I remove the Console.WriteLine("Initiating Jobs..."); from the main method, the algorithm won't behave like I mentioned in point 2. I really can't figure out why.
Your main function is finishing and closing the file before the threads have started writing to it, so use Thread.Join to wait for each thread to exit. Also, I'd advise using a using statement for IDisposable objects.
When you have a limited resource you want to share among threads, you'll need a locking mechanism. Thread scheduling is not deterministic: you've started 5 threads, and at that point there's no guarantee which one will run first. lock will force a thread to wait for a resource to become free. The order is still not determined, so T3 might run before T2 unless you add additional logic/locking to enforce the order as well.
I'm not seeing much difference in the behavior, but free-running threads will produce some very hard-to-find bugs, especially relating to timing issues.
As an extra note, I'd avoid using Sleep as a way of synchronizing threads.
To effectively get one thread to write at a time you need to block all the other threads. There are a few mechanisms for doing that, such as lock, Mutex, Monitor, AutoResetEvent, etc. I'd use an AutoResetEvent for this situation. The problem you then face is that each thread needs to know which thread it's waiting for, so that it can wait on the correct event.
Please see James' answer as well. He points out a critical bug that escaped my notice: you're closing the file before the writer threads have finished. Consider posting a new question to ask how to solve that problem, since this "question" is already three questions rolled into one.
FileShare.Write tells the operating system to allow other attempts to open the file for writing. Typically this is used for systems that have multiple processes writing to the same file. In your case, you have a single process and it only opens the file once, so this flag really makes no difference. It's the wrong tool for the job.
To coordinate writes between multiple threads, you should use locking. Add a new static field to the class:
private static object synchronizer = new object();
Then wrap each write operation on the file with a lock on that object:
lock (synchronizer)
{
    Console.WriteLine("[Thread1] " + i);
}
This will make no difference while you're using the Console, but I think it will solve the problem you had with writing to the file.
Speaking of which, switching from file writes to console writes to sidestep the file problem was a clever idea, so kudos for that. However, an even better implementation of that idea would be to replace all of the write calls with a call to a single function, e.g. WriteOutput(string), so that you can switch everything from file to console just by changing one line in that function.
And then you could put the lock into that function as well.
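Putting those pieces together, here is a sketch of that idea (WriteOutput and the writer field are illustrative names; the Main body also joins the threads before closing the file, per the other answer's point):

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;

public static class LockedWriterDemo
{
    private static readonly object synchronizer = new object();
    private static StreamWriter writer;

    // Single choke point for all output: switching between console and
    // file output means changing only this method, and the lock ensures
    // only one thread writes at a time.
    static void WriteOutput(string line)
    {
        lock (synchronizer)
        {
            writer.WriteLine(line);
        }
    }

    public static void Main()
    {
        using (writer = new StreamWriter("locked-output.txt"))
        {
            // Thread n writes 1..n*5, mirroring the assignment's five threads.
            var threads = Enumerable.Range(1, 5)
                .Select(n => new Thread(() =>
                {
                    for (int i = 1; i <= n * 5; i++)
                        WriteOutput($"[Thread{n}] {i}");
                }))
                .ToList();

            threads.ForEach(t => t.Start());
            threads.ForEach(t => t.Join()); // wait before closing the file
        }
    }
}
```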
Threaded code is not deterministic. It's guaranteed that each thread will run, but there are no guarantees about ordering, when threads will be interrupted, which thread will interrupt which, etc. It's a roll of the dice every time. You just have to get used to it, or go out of your way to force things to happen in a certain sequence if that really matters for your application.
I dunno about this one. Seems like that shouldn't matter.
OK, I'm coming to this rather late, but from a theoretical point of view, I/O from multiple threads to a single end-point is inevitably fraught.
In the example above, it would almost certainly be faster and safer to queue the output into an in-memory structure, with each thread taking an exclusive lock before doing so, and then have a separate thread write to the device.
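That pattern can be sketched with a BlockingCollection feeding one dedicated writer thread (all names here are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

class SingleWriterDemo
{
    public static void Main()
    {
        var queue = new BlockingCollection<string>();

        // One thread owns the output device; producers never touch the
        // file, they only enqueue. BlockingCollection does the locking.
        var writerThread = new Thread(() =>
        {
            using var writer = new StreamWriter("queued-output.txt");
            foreach (var line in queue.GetConsumingEnumerable())
                writer.WriteLine(line);
        });
        writerThread.Start();

        var producers = new Thread[3];
        for (int n = 0; n < producers.Length; n++)
        {
            int id = n; // capture a copy, not the loop variable
            producers[n] = new Thread(() =>
            {
                for (int i = 1; i <= 10; i++)
                    queue.Add($"[Producer{id}] {i}");
            });
            producers[n].Start();
        }

        foreach (var p in producers) p.Join();
        queue.CompleteAdding();   // lets GetConsumingEnumerable finish
        writerThread.Join();
    }
}
```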

Run a function on multiple threads with a limited count

Let's say I have a list containing the files I want to download, and a function that takes a file URL and downloads it. I want at most 4 parallel downloads. I know I can use a perfect solution for this:
Parallel.ForEach
(
    Result,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    file => DownloadSingleFile(file)
);
But what do you suggest if we don't want to use this method? What is best in your idea?
Thank you.
How about a good old-fashioned Thread, like this (blockingFileQueue is assumed to be a blocking queue whose Dequeue waits until a file name is available):
for (int i = 0; i < numThreads; i++)
{
    Thread t = new Thread(() =>
    {
        try
        {
            while (working)
            {
                file = DownloadSingleFile(blockingFileQueue.Dequeue());
            }
        }
        catch (ThreadInterruptedException)
        {
            // eat the exception and exit the thread
        }
    });
    t.IsBackground = true;
    t.Start();
}
We may start 4 threads or use ThreadPool to add 4 work items.
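If Parallel.ForEach really is off the table, another common approach is SemaphoreSlim plus Task.WhenAll to cap concurrency. This is only a sketch: DownloadSingleFile here is a placeholder for the asker's real download routine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThrottledDownloads
{
    // Placeholder for the real download; simulates a bit of I/O work.
    static async Task DownloadSingleFile(string file)
    {
        await Task.Delay(10);
    }

    public static async Task DownloadAllAsync(IEnumerable<string> files, int maxParallel)
    {
        using var gate = new SemaphoreSlim(maxParallel, maxParallel);
        var tasks = files.Select(async file =>
        {
            await gate.WaitAsync();          // blocks when maxParallel downloads run
            try { await DownloadSingleFile(file); }
            finally { gate.Release(); }      // always free the slot
        }).ToList();
        await Task.WhenAll(tasks);
    }

    static async Task Main()
    {
        var files = Enumerable.Range(1, 20).Select(i => $"file{i}");
        await DownloadAllAsync(files, maxParallel: 4);
        Console.WriteLine("done");
    }
}
```

Unlike the raw-thread version, this needs no background threads of its own and completes naturally when the list is exhausted.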

Running multiple threads, starting new one as another finishes

I have an application that has many cases. Each case has many multipage tif files. I need to convert the tif files to pdf files. Since there are so many files, I thought I could thread the conversion process. I'm currently limiting the process to ten conversions at a time (i.e. ten threads). When one conversion completes, another should start.
This is the current setup I'm using.
private void ConvertFiles()
{
    List<AutoResetEvent> semaphores = new List<AutoResetEvent>();
    foreach (String fileName in filesToConvert)
    {
        String file = fileName;
        if (semaphores.Count >= 10)
        {
            WaitHandle.WaitAny(semaphores.ToArray());
        }
        AutoResetEvent semaphore = new AutoResetEvent(false);
        semaphores.Add(semaphore);
        ThreadPool.QueueUserWorkItem(
            delegate
            {
                Convert(file);
                semaphore.Set();
                semaphores.Remove(semaphore);
            }, null);
    }
    if (semaphores.Count > 0)
    {
        WaitHandle.WaitAll(semaphores.ToArray());
    }
}
Using this sometimes results in an exception stating that the WaitHandle.WaitAll() or WaitHandle.WaitAny() array parameters must not exceed a length of 65. What am I doing wrong in this approach, and how can I correct it?
There are a few problems with what you have written.
First, it isn't thread safe: you have multiple threads adding, removing, and waiting on the list of AutoResetEvents. The individual elements of the List can be accessed on separate threads, but anything that adds, removes, or checks all elements (like the WaitAny call) needs to do so inside a lock.
Second, there is no guarantee that your code will only process 10 files at a time: the window between when the size of the List is checked and when a new item is added is open for multiple threads to get through.
Third, there is potential for the threads started in QueueUserWorkItem to convert the same file. Without capturing fileName inside the loop, the thread that converts the file will use whatever value is in fileName when it executes, NOT whatever was in fileName when you called QueueUserWorkItem.
This CodeProject article should point you in the right direction for what you are trying to do: http://www.codeproject.com/KB/threads/SchedulingEngine.aspx
EDIT:
var semaphores = new List<AutoResetEvent>();
foreach (String fileName in filesToConvert)
{
    String file = fileName;
    AutoResetEvent[] array;
    lock (semaphores)
    {
        array = semaphores.ToArray();
    }
    if (array.Length >= 10)
    {
        WaitHandle.WaitAny(array);
    }
    var semaphore = new AutoResetEvent(false);
    lock (semaphores)
    {
        semaphores.Add(semaphore);
    }
    ThreadPool.QueueUserWorkItem(
        delegate
        {
            Convert(file);
            lock (semaphores)
            {
                semaphores.Remove(semaphore);
            }
            semaphore.Set();
        }, null);
}
Personally, I don't think I'd do it this way...but, working with the code you have, this should work.
Are you using a real semaphore (System.Threading)? When using semaphores, you typically allocate your max resources and it'll block for you automatically (as you add & release). You can go with the WaitAny approach, but I'm getting the feeling that you've chosen the more difficult route.
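A sketch of that suggestion, using a real System.Threading.Semaphore with 10 slots (Convert here is just a stand-in for the question's tif-to-pdf conversion, and CountdownEvent replaces the WaitAll at the end):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class SemaphoreThrottle
{
    // Stand-in for the real tif-to-pdf conversion.
    static void Convert(string file)
    {
        Thread.Sleep(10);
    }

    public static void ConvertFiles(List<string> filesToConvert)
    {
        // 10 slots: WaitOne blocks the loop once 10 conversions are
        // running; each finished conversion releases a slot.
        using var throttle = new Semaphore(10, 10);
        using var allDone = new CountdownEvent(filesToConvert.Count);

        foreach (string fileName in filesToConvert)
        {
            string file = fileName; // capture a copy per iteration
            throttle.WaitOne();
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try { Convert(file); }
                finally
                {
                    throttle.Release(); // free the slot for the next file
                    allDone.Signal();
                }
            });
        }

        allDone.Wait(); // no 64-handle limit, unlike WaitHandle.WaitAll
    }
}
```

Because the semaphore does the counting, there is no shared list to lock and no array of wait handles to outgrow.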
Looks like you need to remove the handle that triggered the WaitAny before you proceed:
if (semaphores.Count >= 10)
{
    int index = WaitHandle.WaitAny(semaphores.ToArray());
    semaphores.RemoveAt(index);
}
So basically I would remove the semaphores.Remove(semaphore); call from the thread, use the above to remove the signaled event instead, and see if that works.
Maybe you shouldn't create so many events?
// input
var filesToConvert = new List<string>();
Action<string> Convert = Console.WriteLine;
// limit
const int MaxThreadsCount = 10;
var fileConverted = new AutoResetEvent(false);
long threadsCount = 0;
// start
foreach (var file in filesToConvert)
{
    if (Interlocked.Read(ref threadsCount) >= MaxThreadsCount) // reached max threads count
        fileConverted.WaitOne(); // wait for one of the started threads
    Interlocked.Increment(ref threadsCount);
    ThreadPool.QueueUserWorkItem(
        delegate
        {
            Convert(file);
            Interlocked.Decrement(ref threadsCount);
            fileConverted.Set();
        });
}
// wait
while (Interlocked.Read(ref threadsCount) > 0) // paranoia?
    fileConverted.WaitOne();
