Here's a description of what the program should do. The program should create a file and five threads to write in that file...
The first thread should write from 1 to 5 into that file.
The second thread should write from 1 to 10.
The third thread should write from 1 to 15.
The fourth thread should write from 1 to 20.
The fifth thread should write from 1 to 25.
Moreover, an algorithm should be implemented to make each thread print two numbers and then stop; the next thread should then print two numbers and stop, and so on until all the threads have finished printing their numbers.
Here's the code I've developed so far...
using System;
using System.IO;
using System.Threading;
using System.Collections;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
public static class OSAssignment
{
// First Thread Tasks...
static void FirstThreadTasks(StreamWriter WritingBuffer)
{
for (int i = 1; i <= 5; i++)
{
if (i % 2 == 0)
{
Console.WriteLine("[Thread1] " + i);
Thread.Sleep(i);
}
else
{
Console.WriteLine("[Thread1] " + i);
}
}
}
// Second Thread Tasks...
static void SecondThreadTasks(StreamWriter WritingBuffer)
{
for (int i = 1; i <= 10; i++)
{
if (i % 2 == 0)
{
if (i == 10)
Console.WriteLine("[Thread2] " + i);
else
{
Console.WriteLine("[Thread2] " + i);
Thread.Sleep(i);
}
}
else
{
Console.WriteLine("[Thread2] " + i);
}
}
}
// Third Thread Tasks..
static void ThirdThreadTasks(StreamWriter WritingBuffer)
{
for (int i = 1; i <= 15; i++)
{
if (i % 2 == 0)
{
Console.WriteLine("[Thread3] " + i);
Thread.Sleep(i);
}
else
{
Console.WriteLine("[Thread3] " + i);
}
}
}
// Fourth Thread Tasks...
static void FourthThreadTasks(StreamWriter WritingBuffer)
{
for (int i = 1; i <= 20; i++)
{
if (i % 2 == 0)
{
if (i == 20)
Console.WriteLine("[Thread4] " + i);
else
{
Console.WriteLine("[Thread4] " + i);
Thread.Sleep(i);
}
}
else
{
Console.WriteLine("[Thread4] " + i);
}
}
}
// Fifth Thread Tasks...
static void FifthThreadTasks(StreamWriter WritingBuffer)
{
for (int i = 1; i <= 25; i++)
{
if (i % 2 == 0)
{
Console.WriteLine("[Thread5] " + i);
Thread.Sleep(i);
}
else
{
Console.WriteLine("[Thread5] " + i);
}
}
}
// Main Function...
static void Main(string[] args)
{
FileStream File = new FileStream("output.txt", FileMode.Create, FileAccess.Write, FileShare.Write);
StreamWriter Writer = new StreamWriter(File);
Thread T1 = new Thread(() => FirstThreadTasks(Writer));
Thread T2 = new Thread(() => SecondThreadTasks(Writer));
Thread T3 = new Thread(() => ThirdThreadTasks(Writer));
Thread T4 = new Thread(() => FourthThreadTasks(Writer));
Thread T5 = new Thread(() => FifthThreadTasks(Writer));
Console.WriteLine("Initiating Jobs...");
T1.Start();
T2.Start();
T3.Start();
T4.Start();
T5.Start();
Writer.Flush();
Writer.Close();
File.Close();
}
}
}
Here are the problems I'm facing...
I cannot figure out how to make the five threads write into the same file at the same time, even with FileShare.Write. So I decided to write to the console for the time being, to develop the algorithm and see how it behaves there first.
Each time I run the program, the output is slightly different from the previous run. It always happens that a thread prints only one of its numbers in a specific iteration and outputs the second number only after another thread finishes its current iteration.
I've got a question that might be somewhat off track. If I remove Console.WriteLine("Initiating Jobs..."); from the main method, the algorithm doesn't behave as I described in point 2. I really can't figure out why.
Your main function is finishing and closing the file before the threads have started writing to it, so you can use Thread.Join to wait for a thread to exit. Also, I'd advise using a using statement for IDisposable objects.
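A minimal sketch of that advice, assuming a hypothetical helper named WriteWithThreads and a private lock object (neither appears in the question's code). It joins every thread before the using block disposes the writer:

```csharp
using System.IO;
using System.Threading;

public static class JoinSketch
{
    // Hypothetical helper: starts threadCount writer threads and returns
    // only after all of them have exited. Thread N writes 1..N*5.
    public static void WriteWithThreads(string path, int threadCount)
    {
        object gate = new object();
        // 'using' flushes and closes the writer when the block exits.
        using (var writer = new StreamWriter(path))
        {
            var threads = new Thread[threadCount];
            for (int t = 0; t < threadCount; t++)
            {
                int id = t + 1; // per-thread copy for the closure
                threads[t] = new Thread(() =>
                {
                    for (int i = 1; i <= id * 5; i++)
                    {
                        // StreamWriter is not thread-safe, so serialize each write.
                        lock (gate) { writer.WriteLine("[Thread" + id + "] " + i); }
                    }
                });
                threads[t].Start();
            }
            // Block here until every thread has finished writing.
            foreach (var thread in threads) thread.Join();
        }
    }
}
```

Calling JoinSketch.WriteWithThreads("output.txt", 5) should leave all 75 lines in the file, in an order that still varies from run to run.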
When you have a limited resource that you want to share among threads, you'll need a locking mechanism. Thread scheduling is not deterministic. You've started 5 threads, and at that point it's not guaranteed which one will run first. lock will force a thread to wait for a resource to become free. The order is still not determined, so T3 might run before T2 unless you add additional logic/locking to force the order as well.
I'm not seeing much difference in the behavior but free running threads will produce some very hard to find bugs especially relating to timing issues.
As an extra note I'd avoid using Sleep as a way of synchronizing threads.
To effectively get one thread to write at a time you need to block all other threads. There are a few mechanisms for doing that, such as lock, Mutex, Monitor, AutoResetEvent, etc. I'd use an AutoResetEvent for this situation. The problem you then face is that each thread needs to know which thread it's waiting for so that it can wait on the correct event.
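One way this could look, as a rough sketch rather than a definitive solution: each thread waits on its own AutoResetEvent, writes at most two numbers, and then signals the next unfinished thread. The class name RoundRobinSketch, the done array, and the List<string> standing in for the file are all assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class RoundRobinSketch
{
    // limits[t] is the last number thread t writes; each limit must be at least 1.
    public static List<string> Run(int[] limits)
    {
        int n = limits.Length;
        var turn = new AutoResetEvent[n]; // turn[t] is signaled when it is thread t's turn
        var done = new bool[n];           // only touched by the thread holding the turn
        var output = new List<string>();  // stand-in for the shared file
        var threads = new Thread[n];

        for (int t = 0; t < n; t++) turn[t] = new AutoResetEvent(false);

        for (int t = 0; t < n; t++)
        {
            int id = t; // per-thread copy for the closure
            threads[t] = new Thread(() =>
            {
                int next = 1;
                while (next <= limits[id])
                {
                    turn[id].WaitOne(); // block until it is this thread's turn

                    // Write at most two numbers per turn.
                    for (int k = 0; k < 2 && next <= limits[id]; k++)
                    {
                        output.Add("[Thread" + (id + 1) + "] " + next);
                        next++;
                    }
                    if (next > limits[id]) done[id] = true;

                    // Hand the turn to the next unfinished thread (possibly this one).
                    for (int step = 1; step <= n; step++)
                    {
                        int candidate = (id + step) % n;
                        if (!done[candidate]) { turn[candidate].Set(); break; }
                    }
                }
            });
            threads[t].Start();
        }

        turn[0].Set(); // give the first turn to the first thread
        foreach (var thread in threads) thread.Join();
        foreach (var e in turn) e.Close();
        return output;
    }
}
```

With limits of 5, 10, 15, 20 and 25 the order is fully determined: the first thread's 1 and 2, then the second thread's 1 and 2, and so on, with finished threads skipped. Replacing the list with a StreamWriter needs no extra lock, because only the thread holding the turn ever writes.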
Please see James' answer as well. He points out a critical bug that escaped my notice: you're closing the file before the writer threads have finished. Consider posting a new question to ask how to solve that problem, since this "question" is already three questions rolled into one.
FileShare.Write tells the operating system to allow other attempts to open the file for writing. Typically this is used for systems that have multiple processes writing to the same file. In your case, you have a single process and it only opens the file once, so this flag really makes no difference. It's the wrong tool for the job.
To coordinate writes between multiple threads, you should use locking. Add a new static field to the class:
private static object synchronizer = new object();
Then wrap each write operation on the file with a lock on that object:
lock(synchronizer)
{
Console.WriteLine("[Thread1] " + i);
}
This will make no difference while you're using the Console, but I think it will solve the problem you had with writing to the file.
Speaking of which, switching from file writes to console writes to sidestep the file problem was a clever idea, so kudos for that. However, an even better implementation of that idea would be to replace all of the write calls with a call to a single function, e.g. "WriteOutput(string)", so that you can switch everything from file to console just by changing one line in that function.
And then you could put the lock into that function as well.
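A small sketch of such a function, assuming a static Target field (a name invented here) that defaults to the console and can be pointed at a StreamWriter later:

```csharp
using System;
using System.IO;

public static class Output
{
    private static readonly object synchronizer = new object();

    // Assumption: a swappable destination. Assign a StreamWriter here
    // to send everything to a file instead of the console.
    public static TextWriter Target = Console.Out;

    public static void WriteOutput(string text)
    {
        // Only one thread at a time may touch the underlying writer.
        lock (synchronizer)
        {
            Target.WriteLine(text);
        }
    }
}
```

Each thread body then calls Output.WriteOutput("[Thread1] " + i) and never touches the writer directly.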
Threaded stuff is not deterministic. It's guaranteed that each thread will run, but there are no guarantees about ordering, when threads will be interrupted, which thread will interrupt which, etc. It's a roll of the dice every time. You just have to get used to it, or go out of your way to force things to happen in a certain sequence if that really matters for your application.
I dunno about this one. Seems like that shouldn't matter.
OK, I'm coming to this rather late, but from a theoretical point of view, I/O from multiple threads to a specific end-point is inevitably fraught.
In the example above, it would almost certainly be faster and safer to queue the output into an in-memory structure, with each thread taking an exclusive lock before doing so, and then have a separate thread write the output to the device.
using System;
using System.Threading;
namespace Threading
{
class Program
{
static void Main(string[] args)
{
Semaphore even = new Semaphore(1, 1);
Semaphore odd = new Semaphore(1, 1);
Thread evenThread = new Thread(() =>
{
for (int i = 1; i <= 100; i++)
{
even.WaitOne();
if(i % 2 == 0)
{
Console.WriteLine(i);
}
odd.Release();
}
});
Thread oddThread = new Thread(() =>
{
for(int i = 1; i <=100; i++)
{
odd.WaitOne();
if(i%2 != 0)
{
Console.WriteLine(i);
}
even.Release();
}
});
oddThread.Start();
evenThread.Start();
}
}
}
So I have written this code where one thread produces odd numbers and the other produces even numbers.
Using semaphores I have made sure that they print the numbers in order, and it works perfectly.
But I have a special situation in mind: each thread waits until the other thread releases its semaphore. So can there be a condition where both threads are waiting, no thread is making any progress, and there is a deadlock?
For deadlock to occur, two or more threads must be trying to acquire two or more resources, but do so in different orders. See e.g. "Deadlock" and "Would you explain lock ordering?".
Your code does not involve more than one lock per thread† and so does not have the ability to deadlock.
It does have the ability to throw an exception. As noted in this comment, it is theoretically possible for one of the threads to get far enough ahead of the other thread that it attempts to release a semaphore lock that hasn't already been taken. For example, if evenThread is pre-empted (or simply doesn't get scheduled to start running) before it gets to its first call to even.WaitOne(), but oddThread gets to run, then oddThread can acquire the odd semaphore, handle the if statement, and then try to call even.Release() before evenThread has had a chance to acquire that semaphore.
This will result in a SemaphoreFullException being thrown by the call to Release().
This would be a more likely possibility on a single-CPU system, something that is very hard to find these days. :) But it's still theoretically possible for any CPU configuration.
† Actually, there's an implicit lock in the Console.WriteLine() call, which is thread-safe by design. But from your code's point of view, that's an atomic operation. It's not possible for your code to acquire that lock and then wait on another. So it doesn't have any relevance to your specific question.
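One possible way to rule out the SemaphoreFullException described above is to start the even semaphore at a count of zero, so that Release() on it can only follow a completed step by the odd thread. This is a sketch under that assumption, not the asker's code: the class name OddEvenSketch is invented, each thread loops only over its own numbers, and a list stands in for the console so the order can be checked.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class OddEvenSketch
{
    public static List<int> Run(int max)
    {
        var output = new List<int>();
        using (var odd = new Semaphore(1, 1))   // odd thread may go first
        using (var even = new Semaphore(0, 1))  // even thread must wait for a release
        {
            var oddThread = new Thread(() =>
            {
                for (int i = 1; i <= max; i += 2)
                {
                    odd.WaitOne();
                    output.Add(i);
                    even.Release(); // even's count is 0 here, so this cannot overflow
                }
            });
            var evenThread = new Thread(() =>
            {
                for (int i = 2; i <= max; i += 2)
                {
                    even.WaitOne();
                    output.Add(i);
                    odd.Release(); // odd's count is 0 here, so this cannot overflow
                }
            });
            evenThread.Start();
            oddThread.Start();
            oddThread.Join();
            evenThread.Join();
        }
        return output;
    }
}
```

Because exactly one permit exists across the two semaphores at any moment, the threads strictly alternate regardless of which one the scheduler starts first.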
I have developed an application in c#. The class structure is as follows.
Form1 => The UI form. Has a BackgroundWorker, a progress bar, and an "ok" button.
SourceReader, TimedWebClient, HttpWorker, ReportWriter // classes that do some work
Controller => Has overall control. On the "ok" button click, an instance of this class called "cntrlr" is created. This cntrlr is a global variable in Form1.cs.
(In the constructor of the Controller I create SourceReader, TimedWebClient, HttpWorker, and ReportWriter instances.)
Then I call the RunWorkerAsync() of the background worker.
Within it, the code is as follows.
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
int iterator = 1;
for (iterator = 1; iterator <= this.urlList.Count; iterator++)
{
cntrlr.Vmain(iterator-1);
backgroundWorker1.ReportProgress(iterator);
}
}
At the moment ReportProgress updates the progress bar.
The urlList mentioned above has 1000 URLs. cntrlr.Vmain(int i) currently handles the whole process. I want to split the work among several threads, each processing 100 URLs. Access to the other instances and their methods need not be restricted, but access to ReportWriter should be limited to one thread at a time. I can't find a way to do this. If anyone has an idea or an answer, please explain.
If you do want to restrict multiple threads using the same method concurrently then I would use the Semaphore class to facilitate the required thread limit; here's how...
A semaphore is like a mean night club bouncer: it has been given a club capacity and is not allowed to exceed this limit. Once the club is full, no one else can enter... A queue builds up outside. Then as one person leaves another can enter (analogy thanks to J. Albahari).
A Semaphore with a value of one is equivalent to a Mutex or Lock except that the Semaphore has no owner so that it is thread ignorant. Any thread can call Release on a Semaphore whereas with a Mutex/Lock only the thread that obtained the Mutex/Lock can release it.
Now, for your case we are able to use Semaphores to limit concurrency and prevent too many threads from executing a particular piece of code at once. In the following example five threads try to enter a night club that only allows entry to three...
class BadAssClub
{
static SemaphoreSlim sem = new SemaphoreSlim(3);
static void Main()
{
for (int i = 1; i <= 5; i++)
new Thread(Enter).Start(i);
}
// Enforce only three threads running this method at once.
// The parameter is typed as object so the method matches ParameterizedThreadStart.
static void Enter(object i)
{
try
{
Console.WriteLine(i + " wants to enter.");
sem.Wait();
Console.WriteLine(i + " is in!");
Thread.Sleep(1000 * (int)i);
Console.WriteLine(i + " is leaving...");
}
finally
{
sem.Release();
}
}
}
Note that SemaphoreSlim is a lighter-weight version of the Semaphore class and incurs about a quarter of the overhead. It is sufficient for what you require.
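Applied to the one-thread-at-a-time requirement for the report writer, a sketch might look like the following. ReportWriterSketch is a hypothetical stand-in (the real ReportWriter is not shown in the question), and it keeps lines in memory purely for illustration.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the question's ReportWriter.
public sealed class ReportWriterSketch
{
    // A capacity of one means at most one thread is inside Write at a time.
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);
    private readonly List<string> lines = new List<string>();

    public void Write(string line)
    {
        gate.Wait(); // taken before 'try' so 'finally' only releases a permit we hold
        try
        {
            lines.Add(line);
        }
        finally
        {
            gate.Release();
        }
    }

    public int Count
    {
        get
        {
            gate.Wait();
            try { return lines.Count; }
            finally { gate.Release(); }
        }
    }
}
```

Ten worker threads that each handle 100 URLs can then share a single instance and call Write freely. A plain lock would serve equally well here, since the same thread always acquires and releases.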
I hope this helps.
I think I would have used the ThreadPool instead of a background worker, and given each thread 1, not 100, URLs to process. The thread pool will limit the number of threads it starts at once, so you won't have to worry about getting 1000 requests at once. Have a look here for a good example:
http://msdn.microsoft.com/en-us/library/3dasc8as.aspx
Feeling a little more adventurous? Consider using TPL DataFlow to download a bunch of urls:
var urls = new[]{
"http://www.google.com",
"http://www.microsoft.com",
"http://www.apple.com",
"http://www.stackoverflow.com"};
var tb = new TransformBlock<string, string>(async url => {
using(var wc = new WebClient())
{
var data = await wc.DownloadStringTaskAsync(url);
Console.WriteLine("Downloaded : {0}", url);
return data;
}
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 4});
var ab = new ActionBlock<string>(data => {
//process your data
Console.WriteLine("data length = {0}", data.Length);
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 1});
tb.LinkTo(ab); //join output of producer to consumer block
foreach(var u in urls)
{
tb.Post(u);
}
tb.Complete();
Note how you can control the parallelism of each block explicitly, so you can gather in parallel but process without going concurrent (for example).
Just grab it with nuget. Easy.
So this is a continuation from my last question - So the question was
"What is the best way to build a program that is thread safe in terms that it needs to write double values to a file. If the function that saves the values via streamwriter is being called by multiple threads? Whats the best way of doing it?"
And I modified some code found at MSDN, how about the following? This one correctly writes everything to the file.
namespace SafeThread
{
class Program
{
static void Main()
{
Threading threader = new Threading();
AutoResetEvent autoEvent = new AutoResetEvent(false);
Thread regularThread =
new Thread(new ThreadStart(threader.ThreadMethod));
regularThread.Start();
ThreadPool.QueueUserWorkItem(new WaitCallback(threader.WorkMethod),
autoEvent);
// Wait for foreground thread to end.
regularThread.Join();
// Wait for background thread to end.
autoEvent.WaitOne();
}
}
class Threading
{
List<double> Values = new List<double>();
static readonly Object locker = new Object();
StreamWriter writer = new StreamWriter("file");
static int bulkCount = 0;
static int bulkSize = 100000;
public void ThreadMethod()
{
lock (locker)
{
while (bulkCount < bulkSize)
Values.Add(bulkCount++);
}
bulkCount = 0;
}
public void WorkMethod(object stateInfo)
{
lock (locker)
{
foreach (double V in Values)
{
writer.WriteLine(V);
writer.Flush();
}
}
// Signal that this thread is finished.
((AutoResetEvent)stateInfo).Set();
}
}
}
Thread and QueueUserWorkItem are the lowest available APIs for threading. I wouldn't use them unless I absolutely, finally, had no other choice. Try the Task class for a much higher-level abstraction. For details, see my recent blog post on the subject.
You can also use BlockingCollection<double> as a proper producer/consumer queue instead of trying to build one by hand with the lowest available APIs for synchronization.
Reinventing these wheels correctly is surprisingly difficult. I highly recommend using the classes designed for this type of need (Task and BlockingCollection, to be specific). They are built-in to the .NET 4.0 framework and are available as an add-on for .NET 3.5.
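As a rough illustration of that recommendation, the sketch below has several producer tasks add doubles to a BlockingCollection<double> while a single consumer task owns the StreamWriter. The class and parameter names are invented, and Task.Run assumes .NET 4.5 or later (on 4.0, Task.Factory.StartNew plays the same role).

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class QueueSketch
{
    public static void Run(string path, int producers, int valuesPerProducer)
    {
        using (var queue = new BlockingCollection<double>())
        {
            // Single consumer: the only code that ever touches the writer.
            Task consumer = Task.Run(() =>
            {
                using (var writer = new StreamWriter(path))
                {
                    // Blocks while the queue is empty; ends after CompleteAdding.
                    foreach (double value in queue.GetConsumingEnumerable())
                    {
                        writer.WriteLine(value);
                    }
                }
            });

            var tasks = new Task[producers];
            for (int p = 0; p < producers; p++)
            {
                int offset = p * valuesPerProducer; // per-task copy for the closure
                tasks[p] = Task.Run(() =>
                {
                    for (int i = 0; i < valuesPerProducer; i++) queue.Add(offset + i);
                });
            }

            Task.WaitAll(tasks);     // all values have been queued
            queue.CompleteAdding();  // lets the consumer's loop finish
            consumer.Wait();         // the file is flushed and closed after this
        }
    }
}
```

No explicit lock is needed: the collection handles the synchronization, and the writer is confined to one task.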
The code has the writer as an instance variable but uses a static locker. If you had multiple instances writing to different files, there's no reason they would need to share the same lock.
On a related note, since you already have the writer (as a private instance variable), you can use that for locking instead of using a separate locker object in this case; that makes things a little simpler.
The 'right answer' really depends on what you're looking for in terms of locking/blocking behavior. For instance, the simplest thing would be to skip the intermediate data structure and just have a WriteValues method such that each thread 'reporting' its results goes ahead and writes them to the file. Something like:
StreamWriter writer = new StreamWriter("file");
public void WriteValues(IEnumerable<double> values)
{
lock (writer)
{
foreach (var d in values)
{
writer.WriteLine(d);
}
writer.Flush();
}
}
Of course, this means worker threads serialize during their 'report results' phases - depending on the performance characteristics, that may be just fine though (5 minutes to generate, 500ms to write, for example).
On the other end of the spectrum, you'd have the worker threads write to a data structure. If you're in .NET 4, I'd recommend just using a ConcurrentQueue rather than doing that locking yourself.
Also, you may want to do the file i/o in bigger batches than those being reported by the worker threads, so you might choose to just do writing in a background thread on some frequency. That end of the spectrum looks something like the below (you'd remove the Console.WriteLine calls in real code, those are just there so you can see it working in action)
public class ThreadSafeFileBuffer<T> : IDisposable
{
private readonly StreamWriter m_writer;
private readonly ConcurrentQueue<T> m_buffer = new ConcurrentQueue<T>();
private readonly Timer m_timer;
public ThreadSafeFileBuffer(string filePath, int flushPeriodInSeconds = 5)
{
m_writer = new StreamWriter(filePath);
var flushPeriod = TimeSpan.FromSeconds(flushPeriodInSeconds);
m_timer = new Timer(FlushBuffer, null, flushPeriod, flushPeriod);
}
public void AddResult(T result)
{
m_buffer.Enqueue(result);
Console.WriteLine("Buffer is up to {0} elements", m_buffer.Count);
}
public void Dispose()
{
Console.WriteLine("Turning off timer");
m_timer.Dispose();
Console.WriteLine("Flushing final buffer output");
FlushBuffer(); // flush anything left over in the buffer
Console.WriteLine("Closing file");
m_writer.Dispose();
}
/// <summary>
/// Since this is only done by one thread at a time (almost always the background flush thread, but one time via Dispose), no need to lock
/// </summary>
/// <param name="unused"></param>
private void FlushBuffer(object unused = null)
{
T current;
while (m_buffer.TryDequeue(out current))
{
Console.WriteLine("Buffer is down to {0} elements", m_buffer.Count);
m_writer.WriteLine(current);
}
m_writer.Flush();
}
}
class Program
{
static void Main(string[] args)
{
var tempFile = Path.GetTempFileName();
using (var resultsBuffer = new ThreadSafeFileBuffer<double>(tempFile))
{
Parallel.For(0, 100, i =>
{
// simulate some 'real work' by waiting for awhile
var sleepTime = new Random().Next(10000);
Console.WriteLine("Thread {0} doing work for {1} ms", Thread.CurrentThread.ManagedThreadId, sleepTime);
Thread.Sleep(sleepTime);
resultsBuffer.AddResult(Math.PI*i);
});
}
foreach (var resultLine in File.ReadAllLines(tempFile))
{
Console.WriteLine("Line from result: {0}", resultLine);
}
}
}
So you're saying you want a bunch of threads to write data to a single file using a StreamWriter? Easy. Just lock the StreamWriter object.
The code here will create 5 threads. Each thread will perform 5 "actions," and at the end of each action it will write 5 lines to a file named "file."
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
namespace ConsoleApplication1 {
class Program {
static void Main() {
StreamWriter Writer = new StreamWriter("file");
Action<int> ThreadProcedure = (i) => {
// A thread may perform many actions and write out the result after each action
// The outer loop here represents the multiple actions this thread will take
for (int x = 0; x < 5; x++) {
// Here is where the thread would generate the data for this action
// We'll simulate work time using a call to Sleep
Thread.Sleep(1000);
// After generating the data the thread needs to lock the Writer before using it.
lock (Writer) {
// Here we'll write a few lines to the Writer
for (int y = 0; y < 5; y++) {
Writer.WriteLine("Thread id = {0}; Action id = {1}; Line id = {2}", i, x, y);
}
}
}
};
// Now that we have a delegate for the thread code, let's make a few instances
List<IAsyncResult> AsyncResultList = new List<IAsyncResult>();
for (int w = 0; w < 5; w++) {
AsyncResultList.Add(ThreadProcedure.BeginInvoke(w, null, null));
}
// Wait for all threads to complete
foreach (IAsyncResult r in AsyncResultList) {
r.AsyncWaitHandle.WaitOne();
}
// Flush/Close the writer so all data goes to disk
Writer.Flush();
Writer.Close();
}
}
}
The result should be a file "file" with 125 lines in it with all "actions" performed concurrently and the result of each action written synchronously to the file.
The code you have there is subtly broken - in particular, if the queued work item runs first, then it will flush the (empty) list of values immediately, before terminating, after which point your worker goes and fills up the List (which will end up being ignored). The auto-reset event also does nothing, since nothing ever queries or waits on its state.
Also, since each thread uses a different lock, the locks have no meaning! You need to make sure you hold a single, shared lock whenever accessing the streamwriter. You don't need a lock between the flushing code and the generation code; you just need to make sure the flush runs after the generation finishes.
You're probably on the right track, though - although I'd use a fixed-size array instead of a list, and flush all entries from the array when it gets full. This avoids the possibility of running out of memory if the thread is long-lived.
I have an application I have already started working on, and it seems I need to rethink things a bit. The application is a WinForms application at the moment. I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread. What I have done is loop through the number-of-threads variable and create the threads accordingly. I am not performing any locking on the threads (and I am not sure whether I need to). I am new to threading and am running into possible issues with multiple cores. I need some advice on how I can make this perform better.
Before a thread is created, some records are pulled from my database to be processed. That list object is sent to the thread and looped through. Once it reaches the end of the loop, the thread calls the data functions to pull some new records, replacing the old ones in the list. This keeps going until there are no more records. Here is my code:
private void CreateThreads()
{
_startTime = DateTime.Now;
var totalThreads = 0;
var totalRecords = 0;
progressThreadsCreated.Maximum = _threadCount;
progressThreadsCreated.Step = 1;
LabelThreadsCreated.Text = "0 / " + _threadCount.ToString();
this.Update();
for(var i = 1; i <= _threadCount; i++)
{
LabelThreadsCreated.Text = i + " / " + _threadCount;
progressThreadsCreated.Value = i;
var adapter = new Dystopia.DataAdapter();
var records = adapter.FindAllWithLocking(_recordsPerThread,_validationId,_validationDateTime);
if(records != null && records.Count > 0)
{
totalThreads += 1;
LabelTotalProcesses.Text = "Total Processes Created: " + totalThreads.ToString();
var paramss = new ArrayList { i, records };
var thread = new Thread(new ParameterizedThreadStart(ThreadWorker));
thread.Start(paramss);
}
this.Update();
}
}
private void ThreadWorker(object paramList)
{
try
{
var parms = (ArrayList) paramList;
var stopThread = false;
var threadCount = (int) parms[0];
var records = (List<Candidates>) parms[1];
var runOnce = false;
var adapter = new Dystopia.DataAdapter();
var lastCount = records.Count;
var runningCount = 0;
while (_stopThreads == false)
{
if (!runOnce)
{
CreateProgressArea(threadCount, records.Count);
}
else
{
ResetProgressBarMethod(threadCount, records.Count);
}
runOnce = true;
var counter = 0;
if (records.Count > 0)
{
foreach (var record in records)
{
counter += 1;
runningCount += 1;
_totalRecords += 1;
var rec = record;
var proc = new ProcRecords();
proc.Validate(ref rec);
adapter.Update(rec);
UpdateProgressBarMethod(threadCount, counter, records.Count, runningCount);
if (_stopThreads)
{
break;
}
}
UpdateProgressBarMethod(threadCount, -1, lastCount, runningCount);
if (!_noRecordsInPool)
{
records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
if (records == null || records.Count <= 0)
{
_noRecordsInPool = true;
break;
}
else
{
lastCount = records.Count;
}
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
Something simple you could do that would improve perf would be to use a ThreadPool to manage your thread creation. This allows the OS to allocate a group of threads, paying the thread creation penalty once instead of multiple times.
If you decide to move to .NET 4.0, Tasks would be another way to go.
"I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread."
This isn't something you really want to expose to the user. What are they supposed to put? How can they determine what's best? This is an implementation detail best left to you, or even better, the CLR or another library.
"I am not performing any locking (and not sure I need to or not) on the threads."
The majority of issues you'll have with multithreading will come from shared state. Specifically, in your ThreadWorker method, it looks like you refer to the following shared data: _stopThreads, _totalRecords, _noRecordsInPool, _recordsPerThread, _validationId, and _validationDateTime.
Just because these data are shared, however, doesn't mean you'll have issues. It all depends on who reads and writes them. For example, I think _recordsPerThread is only written once initially, and then read by all threads, which is fine. _totalRecords, however, is both read and written by each thread. You can run into threading issues here since _totalRecords += 1; consists of a non-atomic read-then-write. In other words, you could have two threads read the value of _totalRecords (say they both read the value 5), then increment their copy and then write it back. They'll both write back the value 6, which is now incorrect since it should be 7. This is a classic race condition. For this particular case, you could use Interlocked.Increment to atomically update the field.
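A small sketch of the Interlocked.Increment fix for a counter like _totalRecords. The class and method names are made up for illustration; only the Interlocked call itself comes from the advice above.

```csharp
using System.Threading;

public static class CounterSketch
{
    private static int totalRecords; // shared by every worker thread

    public static int CountWithThreads(int threadCount, int incrementsPerThread)
    {
        totalRecords = 0;
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++)
                {
                    // Atomic read-modify-write: no update can be lost,
                    // unlike a plain "totalRecords += 1".
                    Interlocked.Increment(ref totalRecords);
                }
            });
            threads[t].Start();
        }
        foreach (var thread in threads) thread.Join();
        return totalRecords;
    }
}
```

With four threads and 100000 increments each, the result is always 400000, whereas the plain += version can come up short on a multi-core machine.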
In general, to do synchronization between threads in C#, you can use the classes in the System.Threading namespace, e.g. Mutex, Semaphore, and probably the most common, Monitor (equivalent to lock) which allows only one thread to execute a specific portion of code at a time. The mechanism you use to synchronize depends entirely on your performance requirements. For example, if you throw a lock around the body of your ThreadWorker, you'll destroy any performance gains you got through multithreading by effectively serializing the work. Safe, but slow :( On the other hand, if you use Interlocked.Increment and judiciously add other synchronization where necessary, you'll maintain your performance and your app will be correct :)
Once you've gotten your worker method to be thread-safe, you should use some other mechanism to manage your threads. ThreadPool was mentioned, and you could also use the Task Parallel Library, which abstracts over the ThreadPool and smartly determines and scales how many threads to use. This way, you take the burden off of the user to determine what magic number of threads they should run.
The obvious answer is to question why you want threads in the first place? Where is the analysis and benchmarks that show that using threads will be an advantage?
How are you ensuring that non-gui threads do not interact with the gui? How are you ensuring that no two threads interact with the same variables or datastructures in an unsafe way? Even if you realise you do need to use locking, how are you ensuring that the locks don't result in each thread processing their workload serially, removing any advantages that multiple threads might have provided?
I have an application that has many cases. Each case has many multipage TIF files. I need to convert the TIF files to PDF files. Since there are so many files, I thought I could thread the conversion process. I'm currently limiting the process to ten conversions at a time (i.e. ten threads). When one conversion completes, another should start.
This is the current setup I'm using.
private void ConvertFiles()
{
List<AutoResetEvent> semaphores = new List<AutoResetEvent>();
foreach(String fileName in filesToConvert)
{
String file = fileName;
if(semaphores.Count >= 10)
{
WaitHandle.WaitAny(semaphores.ToArray());
}
AutoResetEvent semaphore = new AutoResetEvent(false);
semaphores.Add(semaphore);
ThreadPool.QueueUserWorkItem(
delegate
{
Convert(file);
semaphore.Set();
semaphores.Remove(semaphore);
}, null);
}
if(semaphores.Count > 0)
{
WaitHandle.WaitAll(semaphores.ToArray());
}
}
Using this sometimes results in an exception stating that the WaitHandle.WaitAll() or WaitHandle.WaitAny() array parameter must not exceed a length of 65. What am I doing wrong in this approach and how can I correct it?
There are a few problems with what you have written.
1st, it isn't thread safe. You have multiple threads adding, removing and waiting on the array of AutoResetEvents. The individual elements of the List can be accessed on separate threads, but anything that adds, removes, or checks all elements (like the WaitAny call), need to do so inside of a lock.
2nd, there is no guarantee that your code will only process 10 files at a time. The code between when the size of the List is checked, and the point where a new item is added is open for multiple threads to get through.
3rd, there is potential for the threads started in the QueueUserWorkItem to convert the same file. Without capturing the fileName inside the loop, the thread that converts the file will use whatever value is in fileName when it executes, NOT whatever was in fileName when you called QueueUserWorkItem.
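The capture pitfall behind the third point can be shown with a short sketch (the names are invented for illustration). A for loop is used because, in C# 5 and later, foreach already gives each iteration a fresh variable, so the shared-variable behavior only appears there with older compilers. A local copy such as the question's "String file = fileName;" is the usual remedy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureSketch
{
    // Every delegate captures the same loop variable, so all of them
    // see its final value once the loop has finished.
    public static List<int> SharedVariable()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            actions.Add(() => i);
        }
        return actions.Select(a => a()).ToList(); // 3, 3, 3
    }

    // Copying into a variable declared inside the loop body gives
    // each delegate its own value.
    public static List<int> CopiedVariable()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            actions.Add(() => copy);
        }
        return actions.Select(a => a()).ToList(); // 0, 1, 2
    }
}
```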
This codeproject article should point you in the right direction for what you are trying to do: http://www.codeproject.com/KB/threads/SchedulingEngine.aspx
EDIT:
var semaphores = new List<AutoResetEvent>();
foreach (String fileName in filesToConvert)
{
String file = fileName;
AutoResetEvent[] array;
lock (semaphores)
{
array = semaphores.ToArray();
}
if (array.Length >= 10)
{
WaitHandle.WaitAny(array);
}
var semaphore = new AutoResetEvent(false);
lock (semaphores)
{
semaphores.Add(semaphore);
}
ThreadPool.QueueUserWorkItem(
delegate
{
Convert(file);
lock (semaphores)
{
semaphores.Remove(semaphore);
}
semaphore.Set();
}, null);
}
Personally, I don't think I'd do it this way...but, working with the code you have, this should work.
Are you using a real semaphore (System.Threading)? When using semaphores, you typically allocate your max resources and it'll block for you automatically (as you add & release). You can go with the WaitAny approach, but I'm getting the feeling that you've chosen the more difficult route.
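A sketch of what the semaphore route could look like, assuming a SemaphoreSlim with ten slots and Task.Run (.NET 4.5 or later) in place of QueueUserWorkItem. The class name and the convert delegate are placeholders for the real conversion routine.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedConversionSketch
{
    public static void ConvertAll(IEnumerable<string> files, int maxConcurrent, Action<string> convert)
    {
        using (var slots = new SemaphoreSlim(maxConcurrent, maxConcurrent))
        {
            var tasks = new List<Task>();
            foreach (string file in files)
            {
                // Blocks the scheduling loop while maxConcurrent conversions are running.
                slots.Wait();
                string current = file; // per-iteration copy for the closure
                tasks.Add(Task.Run(() =>
                {
                    try { convert(current); }
                    finally { slots.Release(); } // free the slot even if convert throws
                }));
            }
            // Wait for the conversions that are still in flight.
            Task.WaitAll(tasks.ToArray());
        }
    }
}
```

The semaphore does the counting, so there is no list of wait handles to maintain and no limit on the number of handles to run into.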
Looks like you need to remove the handle that triggered the WaitAny function to proceed:
if(semaphores.Count >= 10)
{
int index = WaitHandle.WaitAny(semaphores.ToArray());
semaphores.RemoveAt(index);
}
So basically I would remove the:
semaphores.Remove(semaphore);
call from the thread and use the above to remove the signaled event and see if that works.
Maybe you shouldn't create so many events?
// input
var filesToConvert = new List<string>();
Action<string> Convert = Console.WriteLine;
// limit
const int MaxThreadsCount = 10;
var fileConverted = new AutoResetEvent(false);
long threadsCount = 0;
// start
foreach (var file in filesToConvert) {
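// Only Interlocked.Increment below changes the count here; a post-increment in the check would count each file twice.
while (Interlocked.Read(ref threadsCount) >= MaxThreadsCount) // reached max threads count
fileConverted.WaitOne(); // wait for one of started threads
Interlocked.Increment(ref threadsCount);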
ThreadPool.QueueUserWorkItem(
delegate {
Convert(file);
Interlocked.Decrement(ref threadsCount);
fileConverted.Set();
});
}
// wait
while (Interlocked.Read(ref threadsCount) > 0) // paranoia?
fileConverted.WaitOne();