I have a list of 10 items that I need to process, with each item using a separate thread. Should the code be like this:
foreach (Item item in items)
{
    Thread t = new Thread(() =>
    {
        ProcessItem(item);
    });
    t.Start();
}
I would also need to pause the thread for (1 second minus the time taken to execute the thread). Should I use Thread.Sleep in this case?
If you don't mind skipping the manual handling of Threads, the following line should do exactly what you want:
Parallel.ForEach(items, ProcessItem);
Or sleeping before processing each item (although that does not make much sense):
Parallel.ForEach(items, item => { Thread.Sleep(1000); ProcessItem(item); });
You will use Thread.Join to wait for other threads to finish their work.
Thread.Sleep will essentially wait for the specified number of milliseconds.
Thread.Sleep indeed has side-effects and is not recommended.
Some points to note in your context:
What if no more threads are available (e.g., if the number of items increases)?
Do the threads access any shared resources?
Check out thread pooling and thread-safe operations too.
The code for starting the threads looks fine.
You will have to use Thread.Sleep(durationInMilliseconds) to make the thread pause for that duration.
Join will halt the current thread until the thread on which you called Join completes its processing.
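For the "1 second minus the time taken" part specifically, a Stopwatch can measure how long the work took so you only sleep for the remainder. A minimal sketch (Item and ProcessItem are from the question; Stopwatch lives in System.Diagnostics):
void ProcessItemPaced(Item item)
{
    Stopwatch sw = Stopwatch.StartNew();
    ProcessItem(item);
    sw.Stop();

    // Sleep only for whatever is left of the one-second budget
    int remaining = 1000 - (int)sw.ElapsedMilliseconds;
    if (remaining > 0)
        Thread.Sleep(remaining);
}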
Use the following if, for some reason, you don't want to use Parallel.ForEach:
Thread[] threads = new Thread[10];
int count = 0;
foreach (Item item in items)
{
    Thread t = new Thread(() =>
    {
        ProcessItem(item);
    });
    t.Start();
    threads[count++] = t;
}
for (int i = 0; i < threads.Length; ++i)
    threads[i].Join();
Use Thread.Sleep.
Thread.Sleep and Thread.Join are different things.
Thread.Sleep blocks (stops) the current thread for a certain time.
Thread.Join blocks (stops) the current thread until the one on which Join was called finishes.
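A tiny sketch illustrating the difference:
Thread worker = new Thread(() => Thread.Sleep(2000)); // the worker blocks itself for 2 seconds
worker.Start();
worker.Join();     // the main thread blocks here until the worker finishes
Thread.Sleep(500); // the main thread blocks itself for 500 ms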
Also, consider using Parallel.ForEach as #nvoigt suggested.
Below is my code:
class Program
{
    static void Main(string[] args)
    {
        Test();
    }

    static void Test()
    {
        string[] strList = { "first", "second", "third" };
        Parallel.ForEach(strList, currStr =>
        {
            Console.WriteLine($"Thread {Thread.CurrentThread.ManagedThreadId} is handling {currStr}");
            if (Thread.CurrentThread.ManagedThreadId != 1) // if not primary thread, sleep for 5 secs
            {
                Thread.Sleep(5000);
            }
        });
        Console.WriteLine($"Here is thread {Thread.CurrentThread.ManagedThreadId}");
        ...
        doMoreWork();
        ...
    }
}
So Parallel.ForEach fetches two threads from the ThreadPool plus the existing primary thread. The output is:
Thread 1 is handling first
Thread 3 is handling second
Thread 4 is handling third
and after 5 seconds:
Here is thread 1
Obviously, thread 1 (the primary thread) was blocked. But why was the primary thread blocked? I can kind of get the idea that the primary thread is blocked to wait for the other threads to finish their jobs. But isn't that very inefficient? Because the primary thread is blocked, it cannot continue to execute doMoreWork() until all the other threads finish.
It isn't inefficient; it is simply the way you have coded it. While parallel thread execution is useful, so is sequential execution. The main purpose of Parallel.ForEach is to iterate over an enumeration by partitioning the enumeration across multiple threads. Let's say, for example, the loop calculates a value by applying operations to each item in the enumeration. You then want to use this single value in a call to doMoreWork. If the loop and doMoreWork executed in parallel, you would have to introduce some form of wait to ensure the loop completed before calling doMoreWork.
You might want to take a look at the Task class documentation and examples. If you really want to have the Parallel.ForEach and doMoreWork running in separate threads at the same time, you can use Task.Run to start a function (or lambda), then independently wait on it to finish.
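For example, a minimal sketch of that approach (ProcessString stands in for the loop body from the question):
// Run the parallel loop on a background task instead of blocking the caller
Task loopTask = Task.Run(() => Parallel.ForEach(strList, ProcessString));

// The initiating thread is free to do other work in the meantime
doMoreWork();

// Wait for the loop only when its completion actually matters
loopTask.Wait();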
I will note that parallel execution doesn't guarantee efficiency or speed. There are many factors to consider, such as Amdahl's law, the effect of locking memory to ensure coherence, total system resources, etc. It's a very big topic.
How else could this possibly work? The purpose of a parallel for loop is to speed up a calculation by performing parts of it in parallel. The program cannot continue until all parts of the loop have completed (and the final result of the calculation can be computed). Its purpose is not to hand off work to execute asynchronously while the initiating thread continues on its way. You're using the wrong tool for the job. You should look into Task objects.
In a Parallel.For, is it possible to synchronize the threads with a 'WaitAll'?
Parallel.For(0, maxIter, i =>
{
    // Do stuff

    // Synchronization: wait for all threads => ???

    // Do more stuff
});
Parallel.For, in the background, batches the iterations of the loop into one or more Tasks, which can execute in parallel. Unless you take ownership of the partitioning, the number of tasks (and threads) is (and should be!) abstracted away. Control will only exit the Parallel.For loop once all the tasks have completed (i.e. there is no need for WaitAll).
The idea of course is that each loop iteration is independent and doesn't require synchronization.
If synchronization is required in the tight loop, then you haven't isolated the Tasks correctly, or it means that Amdahl's Law is in effect and the problem can't be sped up through parallelization.
However, for an aggregation type pattern, you may need to synchronize after completion of each Task - use the overload with the localInit / localFinally to do this, e.g.:
// allTheStrings is a shared resource which isn't thread safe
var allTheStrings = new List<string>();

Parallel.For( // for (
    0, // var i = 0;
    numberOfIterations, // i < numberOfIterations;
    () => new List<string>(), // localInit - setup for each task: List<string> --> localStrings
    (i, parallelLoopState, localStrings) =>
    {
        // The "tight" loop. If you need to synchronize here, there is no point
        // using Parallel at all.
        localStrings.Add(i.ToString());
        return localStrings;
    },
    (localStrings) => // localFinally for each task
    {
        // Synchronization is needed here - runs once per task
        lock (allTheStrings)
        {
            allTheStrings.AddRange(localStrings);
        }
    });
In the above example, you could also have just declared allTheStrings as
var allTheStrings = new ConcurrentBag<string>();
in which case the lock in the localFinally would not have been required.
You shouldn't (for reasons stated by other users), but if you want to, you can use Barrier. A Barrier causes threads to wait (block) at a certain point until the expected number of participants have hit the barrier, at which point the barrier releases and the threads unblock and proceed. The downside of this approach, as others have said, is the risk of deadlocks.
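A minimal sketch, assuming exactly three participants that all run concurrently:
const int participants = 3;
Barrier barrier = new Barrier(participants);

Parallel.For(0, participants, i =>
{
    // Do stuff (phase 1)

    // Block until all participants have arrived; if the scheduler runs
    // fewer than 'participants' iterations concurrently, this deadlocks -
    // the downside mentioned above
    barrier.SignalAndWait();

    // Do more stuff (phase 2)
});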
The code below uses a background worker thread to process work items one by one. The worker thread starts waiting on a ManualResetEvent whenever it runs out of work items. The main thread periodically adds new work items and wakes the worker thread.
The waking mechanism has a race condition. If a new item is added by the main thread while the worker thread is at the place indicated by *, the worker thread will not get woken.
Is there a simple and correct way of waking the worker thread that does not have this problem?
ManualResetEvent m_waitEvent;

// Worker thread processes work items one by one
void WorkerThread()
{
    while (true)
    {
        m_waitEvent.WaitOne();
        bool noMoreItems = ProcessOneWorkItem();
        if (noMoreItems)
        {
            // *
            m_waitEvent.Reset(); // No more items, wait for more
        }
    }
}

// Main thread code that adds a new work item
AddWorkItem();
m_waitEvent.Set(); // Wake worker thread
You're using the wrong synchronization mechanism. Rather than an MRE, just use a Semaphore. The semaphore's count then represents the number of items yet to be processed: releasing it adds one, and waiting on it reduces it by one. There is no if; you always perform every semaphore operation, and as a result there is no race condition.
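A sketch of that approach using SemaphoreSlim (the classic Semaphore works the same way; ProcessOneWorkItem and AddWorkItem are from the question):
SemaphoreSlim m_itemsAvailable = new SemaphoreSlim(0);

// Worker thread
void WorkerThread()
{
    while (true)
    {
        m_itemsAvailable.Wait(); // count drops by one; blocks while it is zero
        ProcessOneWorkItem();    // exactly one item is guaranteed to be available
    }
}

// Main thread code that adds a new work item
AddWorkItem();
m_itemsAvailable.Release(); // count rises by one; wakes the worker if needed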
That said, you can avoid the problem entirely. Rather than managing the synchronization primitives yourself you can just use a BlockingCollection. Have the producer add items and the consumer consume them. The synchronization will all be taken care of for you by that class, and likely more efficiently than your implementation would be as well.
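A sketch of the BlockingCollection version (WorkItem and Process are placeholder names):
BlockingCollection<WorkItem> m_queue = new BlockingCollection<WorkItem>();

// Worker thread: GetConsumingEnumerable blocks when the collection is
// empty and resumes as soon as a producer adds an item
void WorkerThread()
{
    foreach (WorkItem item in m_queue.GetConsumingEnumerable())
    {
        Process(item);
    }
}

// Main thread code that adds a new work item
m_queue.Add(newWorkItem);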
I tend to use a counter of current work items, incrementing and decrementing it as items arrive and are processed. You can turn your processor thread into a loop that checks that counter and then sleeps, rather than running once and finishing. That way, no matter where you are when an item is added, you are at most one sleep cycle from the item being processed.
I am using multiple threads in my application with a while(true) loop, and now I want to exit the loop when all the active threads have completed their work.
Assuming that you have a list of the threads themselves, here are two approaches.
Solution the first:
Use Thread.Join() with a TimeSpan parameter to sync up with each thread in turn. The return value tells you whether the thread has finished or not.
Solution the second:
Check the Thread.IsAlive property to see if the thread is still running.
In either situation, make sure that your main thread yields processor time to the running threads, else your wait loop will consume most/all the CPU and starve your worker threads.
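For example, a sketch of the Thread.Join approach with a timeout (childThreadList is a stand-in name for your list of threads):
foreach (Thread t in childThreadList)
{
    // Joining with a timeout yields the CPU while waiting; the return
    // value says whether the thread finished within the timeout
    while (!t.Join(TimeSpan.FromMilliseconds(100)))
    {
        // still running - update a progress indicator, check for
        // cancellation, etc.
    }
}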
You can use Process.GetCurrentProcess().Threads.Count.
There are various approaches here, but ultimately most of them come down to having the executed threads do something whenever they exit (whether by success or via an exception, which you don't want to happen anyway). A simple approach might be to use Interlocked.Decrement to reduce a counter; if it reaches zero (or goes negative, which probably means an error), release a ManualResetEvent or Monitor.Pulse an object; in either case, the original thread would be waiting on that object. A number of such approaches are discussed here.
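A sketch of the counter idea (workerCount, pendingCount and allDone are names introduced here for illustration):
int pendingCount = workerCount; // e.g. a field, set before starting the workers
ManualResetEvent allDone = new ManualResetEvent(false);

// At the very end of each worker thread (ideally in a finally block):
if (Interlocked.Decrement(ref pendingCount) == 0)
{
    allDone.Set(); // the last worker out signals the waiting thread
}

// Original thread:
allDone.WaitOne(); // blocks until all workers have exited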
Of course, it might be easier to look at the TPL bits in .NET 4.0, which provide a lot of new options here (not least things like Parallel.For and PLINQ).
If you are using a synchronized work queue, it might also be possible to set that queue to close (drain) itself, and simply wait for the queue to be empty? The assumption here being that your worker threads are doing something like:
T workItem;
// TryDequeue may block until there is either something to do,
// or the queue has been terminated
while (queue.TryDequeue(out workItem))
{
    ProcessWorkItem(workItem);
}
// queue has closed - exit the thread
in which case, once the queue is empty, all your worker threads should already be in the process of shutting themselves down.
You can use Thread.Join(). The Join method will block the calling thread until the thread (the one on which the Join method is called) terminates.
So if you have a list of threads, you can loop through and call Join on each thread. Your loop will only exit when all the threads are dead. Something like this:
for (int i = 0; i < childThreadList.Count; i++)
{
    childThreadList[i].Join();
}
// The following code will execute once all threads in the list have terminated
I find that using the Join() method is the cleanest way. I use multiple threads frequently, and each thread is typically loading data from different data sources (Informix, Oracle, and SQL at the same time). A simple loop, as mentioned above, calling Join() on each thread object (which I store in a simple List object) works!
I prefer using a HashSet of Threads:
// create a HashSet of heavy tasks (threads) to run
HashSet<Thread> Threadlist = new HashSet<Thread>();
Threadlist.Add(new Thread(() => SomeHeavyTask1()));
Threadlist.Add(new Thread(() => SomeHeavyTask2()));
Threadlist.Add(new Thread(() => SomeHeavyTask3()));

// start the threads
foreach (Thread T in Threadlist)
    T.Start();

// these will execute sequentially
NotSoHeavyTask1();
NotSoHeavyTask2();
NotSoHeavyTask3();

// join each thread to the main thread; Join returns immediately if the
// thread has already finished, so no ThreadState check is needed (and
// checking for ThreadState.Running would wrongly skip a sleeping thread)
foreach (Thread T in Threadlist)
    T.Join();

// finally this code will execute
MoreTasksToDo();
This isn't about the different methods I could or should be using to utilize the queues in the best manner, rather something I have seen happening that makes no sense to me.
void Runner()
{
    // member variable
    queue = Queue.Synchronized(new Queue());
    while (true)
    {
        if (0 < queue.Count)
        {
            queue.Dequeue();
        }
    }
}
This is run in a single thread:
var t = new Thread(Runner);
t.IsBackground = true;
t.Start();
Other events are "Enqueue"ing elsewhere. What I've seen happen is that, over a period of time, the Dequeue will actually throw an InvalidOperationException: queue empty. This should be impossible, seeing as how the count guarantees there is something there, and I'm positive that nothing else is "Dequeue"ing.
The question(s):
Is it possible that the Enqueue actually increases the count before the item is fully on the queue (whatever that means...)?
Is it possible that the thread is somehow restarting (expiring, resetting, ...) at the Dequeue statement, but immediately after it has already removed an item?
Edit (clarification):
These code pieces are part of a Wrapper class that implements the background helper thread. The Dequeue here is the only Dequeue, and all Enqueue/Dequeue are on the Synchronized member variable (queue).
Using Reflector, you can see that no, the count does not get increased until after the item is added.
As Ben points out, it does seem as though you have multiple callers of Dequeue.
You say you are positive that nothing else is calling Dequeue. Is that because you only have the one thread calling Dequeue? Is Dequeue called anywhere else at all?
EDIT:
I wrote a little sample code, but could not get the problem to reproduce. It just kept running and running without any exceptions.
How long was it running before you got errors? Maybe you can share a bit more of the code.
class Program
{
    static Queue q = Queue.Synchronized(new Queue());
    static bool running = true;

    static void Main()
    {
        Thread producer1 = new Thread(() =>
        {
            while (running)
            {
                q.Enqueue(Guid.NewGuid());
                Thread.Sleep(100);
            }
        });

        Thread producer2 = new Thread(() =>
        {
            while (running)
            {
                q.Enqueue(Guid.NewGuid());
                Thread.Sleep(25);
            }
        });

        Thread consumer = new Thread(() =>
        {
            while (running)
            {
                if (q.Count > 0)
                {
                    Guid g = (Guid)q.Dequeue();
                    Console.Write(g.ToString() + " ");
                }
                else
                {
                    Console.Write(" . ");
                }
                Thread.Sleep(1);
            }
        });

        consumer.IsBackground = true;
        consumer.Start();
        producer1.Start();
        producer2.Start();

        Console.ReadLine();
        running = false;
    }
}
Here is what I think the problematic sequence is:
(0 < queue.Count) evaluates to true, the queue is not empty.
This thread gets preempted and another thread runs.
The other thread removes an item from the queue, emptying it.
This thread resumes execution, but is now within the if block, and attempts to dequeue an empty queue.
However, you say nothing else is dequeuing...
Try outputting the count inside the if block. If you see the count jump downwards unexpectedly, someone else is dequeuing.
Here's a possible answer from the MSDN page on this topic:
Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads.
My guess is that you're correct - at some point, there's a race condition happening, and you end up dequeuing something that isn't there.
A Mutex or a Monitor lock (the C# lock statement) is probably appropriate here.
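For instance, one fix is to make the count check and the Dequeue a single atomic step by locking the queue's SyncRoot (a sketch against the code from the question):
lock (queue.SyncRoot)
{
    if (0 < queue.Count)
    {
        queue.Dequeue(); // the check and the removal now happen atomically
    }
}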
Good luck!
Are the other areas that are "Enqueue"ing data also using the same synchronized queue object? In order for the Queue.Synchronized wrapper to be thread-safe, all Enqueue and Dequeue operations must go through the same synchronized queue object.
From MSDN:
To guarantee the thread safety of the Queue, all operations must be done through this wrapper only.
Edited:
If you are looping over many items that involve heavy computation, or if you are using a long-running thread loop (communications, etc.), you should consider having a wait function such as System.Threading.Thread.Sleep, System.Threading.WaitHandle.WaitOne, System.Threading.WaitHandle.WaitAll, or System.Threading.WaitHandle.WaitAny in the loop; otherwise it might kill system performance.
Question 1: If you're using a synchronized queue, then no, you're safe! But you'll need to use the synchronized instance on both sides, the producer and the consumer.
Question 2: Terminating your worker thread when there is no work to do is a simple job. However, you would either need a monitoring thread, or have the queue start a background worker thread whenever it has something to do. The latter sounds more like the Active Object pattern than a simple queue (whose single responsibility says it should only do queueing).
In addition, I'd go for a blocking queue instead of your code above. The way your code works, it consumes CPU processing power even when there is no work to do. A blocking queue lets your worker thread sleep whenever there is nothing to do. You can have multiple sleeping threads without using CPU processing power.
C# doesn't come with a blocking queue implementation, but there are many out there. See this example and this one.
Another option for thread-safe use of queues is the ConcurrentQueue<T> class, introduced after this question was asked (in 2009). It may help you avoid having to write your own synchronization code, or at least make it much simpler.
From .NET Framework 4.6 onward, ConcurrentQueue<T> also implements the interface IReadOnlyCollection<T>.
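For example, the consumer loop from the question could be rewritten like this (Process stands in for whatever is done with the dequeued item, and running is the same kind of flag used in the sample code above):
ConcurrentQueue<object> queue = new ConcurrentQueue<object>();

// Consumer: TryDequeue atomically checks for an item and removes it,
// so the "Count said 1 but the queue was empty" race cannot happen
object item;
while (running)
{
    if (queue.TryDequeue(out item))
    {
        Process(item);
    }
}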