Differences between multithreading and multitasking in C# [duplicate] - c#

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What is the difference between task and thread?
I understand the title itself may appear to be a duplicate question but I've really read all the previous posts related to this topic and still don't quite understand the program behavior.
I'm currently writing a small program that checks around 1,000 E-mail accounts. Undoubtedly I feel multithreading or multitasking is the right approach since each thread / task is not computationally expensive but the duration of each thread relies heavily on network I/O.
I think that under such a scenario, it would also be reasonable to set the number of threads / tasks at a number that is much larger than the number of cores. (four for i5-750). Therefore I've set the number of threads or tasks at 100.
The code snippet written using Tasks:
const int taskCount = 100;
var tasks = new Task[taskCount];
var loopVal = (int) Math.Ceiling(1.0*EmailAddress.Count/taskCount);
for (int i = 0; i < taskCount; i++)
{
var objContainer = new AutoCheck(i*loopVal, i*loopVal + loopVal);
tasks[i] = new Task(objContainer.CheckMail);
tasks[i].Start();
}
Task.WaitAll(tasks);
The same code snippet written using Threads:
const int threadCount = 100;
var threads = new Thread[threadCount];
var loopVal = (int)Math.Ceiling(1.0 * EmailAddress.Count / threadCount);
for (int i = 0; i < threadCount; i++)
{
var objContainer = new AutoCheck(i * loopVal, i * loopVal + loopVal);
threads[i] = new Thread(objContainer.CheckMail);
threads[i].Start();
}
foreach (Thread t in threads)
t.Join();
runningTime.Stop();
Console.WriteLine(runningTime.Elapsed);
So what are the essential differences between these two?

Tasks do not necessarily correspond to threads. They will be scheduled by the task library to threadpool threads in a much more efficient manner than your thread code.
Threads are fairly expensive to create. Tasks will queue up and reuse threads as they become available, so when a thread is waiting for network IO, it can actually be reused to execute another task. Threads that sit idle are wasted resources. You can only execute the number of threads that corresponds to your processor core count (simultaneously), so 100 threads means context switches on all your cores at least 25 times each.
If using tasks, just queue up all 1000 email processing tasks instead of batching them up and let it rip. The task library will handle how many threads to run it on.

The sort quick answer is that a Task does not equal a Thread. A task is queued up in a Task Scheduler and then executed on a Thread, but queuing up 100 Tasks does not mean you will have 100 threads running.
Usually a task will run on a thread from the thread pool, which has a finite size. Once all of those threads are busy, then your tasks will have to wait for a thread to become available to execute your task. It's also possible to queue them up on things like the UI thread using the appropriate task scheduler, in which case they end up being run synchronously by the UI thread's message loop (this is useful when you're interacting with a UI).
In your task example, there is no need to actually limit the number of tasks started, since the tasks will already get queued up and will have to wait until there is a thread available to run them. This is probably the best approach, because you're letting the system determine the maximum number of threads based on what the system can handle, rather than assuming you have enough CPU/Memory to handle it.
Whereas your example using threads specifically mean that you will definitely spin up the number of threads you're alloting for.

Related

Let threads wait for all tasks to complete before starting on the next set of tasks

I have a pipeline that consists of several stages. Jobs in the same stage can be worked on in parallel. But all jobs in stage 1 have to completed before anyone can start working on jobs in stage 2, etc..
I was thinking of synchronizing this work using a CountDownEvent.
My basis structure would be
this.WorkerCountdownEvent = new CountdownEvent(MaxJobsInStage);
this.WorkerCountdownEvent.Signal(MaxJobsInStage); // Starts all threads
// Each thread runs the following code
for (this.currentStage = 0; this.currentStage < this.PipelineStages.Count; this.currentStage++)
{
this.WorkerCountdownEvent.Wait();
var stage = this.PipelineStages[this.currentStage];
if (stage.Systems.Count < threadIndex)
{
var system = stage.Systems[threadIndex];
system.Process();
}
this.WorkerCountdownEvent.Signal(); // <--
}
This would work well for processing one stage. But the first thread that reaches this.WorkerCountdownEvent.Signal() will crash the application as its trying to decrement the signal to below zero.
Of course if I want prevent this, and have the jobs to wait again, I have to call this.WorkerCountdownEvent.Reset(). But I have to call it after all threads have started working, but before one thread is done with its work. Which seems like an impossible task?
Am I using the wrong synchronization primitive? Or should I use two countdown events? Or am I missing something completely?
(Btw usually jobs will take less than a milliseconds so bonus points if someone has a better way to do this using 'slim' primitives like ManualResetEventSlim. ThreadPools, or Task<> are not the direction I'm looking at since these threads will live for very long (hours) and need to go through the pipeline 60x per second. So the overhead of stopping/starting a Tasks is considerable here).
Edit: this questions was flagged as a duplciate of two questions. One of the questons was answered with "use thread.Join()" and the other one was answered with "Use TPL" both answers are (in my opnion) clearly not answers to a question about pipelining and threading primitives such as CountDownEvent.
I think that the most suitable synchronization primitive for this case is the Barrier.
Enables multiple tasks to cooperatively work on an algorithm in parallel through multiple phases.
Usage example:
private Barrier _barrier = new Barrier(this.WorkersCount);
// Each worker thread runs the following code
for (i = 0; i < this.StagesCount; i++)
{
// Here goes the work of a single worker for a single stage...
_barrier.SignalAndWait();
}
Update: In case you want the workers to wait for the signal asynchronously, there is an AsyncBarrier implementation here.

ThreadPool or Task.Factory

I have windows service that can get requests, I want to handle in each request in separated thread.
I want to limit also the number of the threads, i.e maximum 5 threads.
And i want to wait for all threads before i'm close the application,
What is the best way to do that?
What I'm tried:
for (int i = 0; i < 10; i++)
{
var i1 = i;
Task.Factory.StartNew(() => RequestHandle(i1.ToString())).ContinueWith(t => Console.WriteLine("Done"));
}
Task.WaitAll();//Not waiting actually for all threads, why?
In this way i can limit the number of the theads?
Or
var events = new ManualResetEvent[10];
ThreadPool.SetMaxThreads(5, 5);
for (int i = 0; i < 10; i++)
{
var i1 = i;
ThreadPool.QueueUserWorkItem(x =>
{
Test(i1.ToString());
events[i1].Set();
});
}
WaitHandle.WaitAll(events);
There is another way to implement that?
Both approaches should work, neither is a guarantee that each operation will run in a different thread (and that is a good thing). Controlling the maximun number of threads is another thing...
You can set the maximun number of threads of the ThreadPool with SetMaxThreads. Remember that this change is global, you only have one ThreadPool.
The default TaskScheduler will use the thread pool (except in some particular situations where it can run the taks inline on the same thread that is calling them), so changing the parameters of the ThreadPool will also affect the Tasks.
Now, notice I said default TaskScheduler. You can roll your own, which will give you more control over how the tasks will run. It is not advised to create a custom TaskScheduler (unless you really need it and you know what you are doing).
For the embedded question:
Task.WaitAll();//Not waiting actually for all threads, why?
To correctly call Task.WaitAll you need to pass the tasks you want to wait for as parameters. There is no way to wait for all currently existing tasks implicitly, you need to tell it what tasks you want to wait for.
Task.WaitAll();//Not waiting actually for all threads, why?
you have to the following
List<Task> tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
var i1 = i;
tasks.Add(Task.Factory.StartNew(() => RequestHandle(i1.ToString())).ContinueWith(t => Console.WriteLine("Done")));
}
Task.WaitAll(tasks);
I think that you should make the differences between Task and thread
Task are not threads Task is just a promise of result in the future and your code can execute on only one thread even if you have many tasks scheduled
Thread is a low-level concept if you start a thread you know that it will be a separate thread
I think you first implentation is well enough to be sure that all your code is executed but you have to use Task.Run instead of Task.Factory.StartNew

Controlling number of tasks being executed at an instance at runtime [duplicate]

I have the following code:
var factory = new TaskFactory();
for (int i = 0; i < 100; i++)
{
var i1 = i;
factory.StartNew(() => foo(i1));
}
static void foo(int i)
{
Thread.Sleep(1000);
Console.WriteLine($"foo{i} - on thread {Thread.CurrentThread.ManagedThreadId}");
}
I can see it only does 4 threads at a time (based on observation). My questions:
What determines the number of threads used at a time?
How can I retrieve this number?
How can I change this number?
P.S. My box has 4 cores.
P.P.S. I needed to have a specific number of tasks (and no more) that are concurrently processed by the TPL and ended up with the following code:
private static int count = 0; // keep track of how many concurrent tasks are running
private static void SemaphoreImplementation()
{
var s = new Semaphore(20, 20); // allow 20 tasks at a time
for (int i = 0; i < 1000; i++)
{
var i1 = i;
Task.Factory.StartNew(() =>
{
try
{
s.WaitOne();
Interlocked.Increment(ref count);
foo(i1);
}
finally
{
s.Release();
Interlocked.Decrement(ref count);
}
}, TaskCreationOptions.LongRunning);
}
}
static void foo(int i)
{
Thread.Sleep(100);
Console.WriteLine($"foo{i:00} - on thread " +
$"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurently: {count}");
}
When you are using a Task in .NET, you are telling the TPL to schedule a piece of work (via TaskScheduler) to be executed on the ThreadPool. Note that the work will be scheduled at its earliest opportunity and however the scheduler sees fit. This means that the TaskScheduler will decide how many threads will be used to run n number of tasks and which task is executed on which thread.
The TPL is very well tuned and continues to adjust its algorithm as it executes your tasks. So, in most cases, it tries to minimize contention. What this means is if you are running 100 tasks and only have 4 cores (which you can get using Environment.ProcessorCount), it would not make sense to execute more than 4 threads at any given time, as otherwise it would need to do more context switching. Now there are times where you want to explicitly override this behaviour. Let's say in the case where you need to wait for some sort of IO to finish, which is a whole different story.
In summary, trust the TPL. But if you are adamant to spawn a thread per task (not always a good idea!), you can use:
Task.Factory.StartNew(
() => /* your piece of work */,
TaskCreationOptions.LongRunning);
This tells the DefaultTaskscheduler to explicitly spawn a new thread for that piece of work.
You can also use your own Scheduler and pass it in to the TaskFactory. You can find a whole bunch of Schedulers HERE.
Note another alternative would be to use PLINQ which again by default analyses your query and decides whether parallelizing it would yield any benefit or not, again in the case of a blocking IO where you are certain starting multiple threads will result in a better execution you can force the parallelism by using WithExecutionMode(ParallelExecutionMode.ForceParallelism) you then can use WithDegreeOfParallelism, to give hints on how many threads to use but remember there is no guarantee you would get that many threads, as MSDN says:
Sets the degree of parallelism to use in a query. Degree of
parallelism is the maximum number of concurrently executing tasks that
will be used to process the query.
Finally, I highly recommend having a read of THIS great series of articles on Threading and TPL.
If you increase the number of tasks to for example 1000000 you will see a lot more threads spawned over time. The TPL tends to inject one every 500ms.
The TPL threadpool does not understand IO-bound workloads (sleep is IO). It's not a good idea to rely on the TPL for picking the right degree of parallelism in these cases. The TPL is completely clueless and injects more threads based on vague guesses about throughput. Also to avoid deadlocks.
Here, the TPL policy clearly is not useful because the more threads you add the more throughput you get. Each thread can process one item per second in this contrived case. The TPL has no idea about that. It makes no sense to limit the thread count to the number of cores.
What determines the number of threads used at a time?
Barely documented TPL heuristics. They frequently go wrong. In particular they will spawn an unlimited number of threads over time in this case. Use task manager to see for yourself. Let this run for an hour and you'll have 1000s of threads.
How can I retrieve this number? How can I change this number?
You can retrieve some of these numbers but that's not the right way to go. If you need a guaranteed DOP you can use AsParallel().WithDegreeOfParallelism(...) or a custom task scheduler. You also can manually start LongRunning tasks. Do not mess with process global settings.
I would suggest using SemaphoreSlim because it doesn't use Windows kernel (so it can be used in Linux C# microservices) and also has a property SemaphoreSlim.CurrentCount that tells how many remaining threads are left so you don't need the Interlocked.Increment or Interlocked.Decrement. I also removed i1 because i is value type and it won't be changed by the call of foo method passing the i argument so it's no need to copy it into i1 to ensure it never changes (if that was the reasoning for adding i1):
private static void SemaphoreImplementation()
{
var maxThreadsCount = 20; // allow 20 tasks at a time
var semaphoreSlim = new SemaphoreSlim(maxTasksCount, maxTasksCount);
var taskFactory = new TaskFactory();
for (int i = 0; i < 1000; i++)
{
taskFactory.StartNew(async () =>
{
try
{
await semaphoreSlim.WaitAsync();
var count = maxTasksCount-semaphoreSlim.CurrentCount; //SemaphoreSlim.CurrentCount tells how many threads are remaining
await foo(i, count);
}
finally
{
semaphoreSlim.Release();
}
}, TaskCreationOptions.LongRunning);
}
}
static async void foo(int i, int count)
{
await Task.Wait(100);
Console.WriteLine($"foo{i:00} - on thread " +
$"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurently: {count}");
}

C# Multi-Threading - Limiting the amount of concurrent threads

I have question on controlling the amount of concurrent threads I want running. Let me explain with what I currently do: For example
var myItems = getItems(); // is just some generic list
// cycle through the mails, picking 10 at a time
int index = 0;
int itemsToTake = myItems.Count >= 10 ? 10 : myItems.Count;
while (index < myItems.Count)
{
var itemRange = myItems.GetRange(index, itemsToTake);
AutoResetEvent[] handles = new AutoResetEvent[itemsToTake];
for (int i = 0; i < itemRange.Count; i++)
{
var item = itemRange[i];
handles[i] = new AutoResetEvent(false);
// set up the thread
ThreadPool.QueueUserWorkItem(processItems, new Item_Thread(handles[i], item));
}
// wait for all the threads to finish
WaitHandle.WaitAll(handles);
// update the index
index += itemsToTake;
// make sure that the next batch of items to get is within range
itemsToTake = (itemsToTake + index < myItems.Count) ? itemsToTake : myItems.Count -index;
This is a path that I currently take. However I do not like it at all. I know I can 'manage' the thread pool itself, but I have heard it is not advisable to do so. So what is the alternative? The semaphore class?
Thanks.
Instead of using ThreadPool directly, you might also consider using TPL or PLINQ. For example, with PLINQ you could do something like this:
getItems().AsParallel()
.WithDegreeOfParallelism(numberOfThreadsYouWant)
.ForAll(item => process(item));
or using Parallel:
var options = new ParallelOptions {MaxDegreeOfParallelism = numberOfThreadsYouWant};
Parallel.ForEach(getItems, options, item => process(item));
Make sure that specifying the degree of parallelism does actually improve performance of your application. TPL and PLINQ use ThreadPool by default, which does a very good job of managing the number of threads that are running. In .NET 4, ThreadPool implements algorithms that add more processing threads only if that improves performance.
Don't use THE treadpool, get another one (just look for google, there are half a dozen implementations out) and manage that yourself.
Managing THE treadpool is not advisable as a lot of internal workings may go ther, managing your OWN threadpool instance is totally ok.
It looks like you can control the maximum number of threads using ThreadPool.SetMaxThreads, although I haven't tested this.
Assuming the question is; "How do I limit the number of worker threads?" The the answer would be use a producer-consumer queue where you control the number of worker threads. Just queue your items and let it handle workers.
Here is a generic implementation you could use.
you can use ThreadPool.SetMaxThreads Method
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.setmaxthreads.aspx
In the documentation, there is a mention of SetMaxThreads ...
public static bool SetMaxThreads (
int workerThreads,
int completionPortThreads
)
Sets the number of requests to the thread pool that can be active concurrently. All requests above that number remain queued until thread pool threads become available.
However:
You cannot set the number of worker threads or the number of I/O completion threads to a number smaller than the number of processors in the computer.
But I guess you are anyways better served by using a non-singleton thread pool.
There is no reason to deal with hybrid thread synchronization constructs (such is AutoResetEvent) and the ThreadPool.
You can use a class that can act as the coordinator responsible for executing all of your code asynchronously.
Wrap using a Task or the APM pattern what the "Item_Thread" does. Then use the AsyncCoordinator class by Jeffrey Richter (can be found at the code from the book CLR via C# 3rd Edition).

Simple Multithreading Question

Ok I should already know the answer but...
I want to execute a number of different tasks in parallel on a separate thread and wait for the execution of all threads to finish before continuing. I am aware that I could have used the ThreadPool.QueueUserWorkItem() or the BackgroundWorker but did not want to use either (for no particular reason).
So is the code below the correct way to execute tasks in parallel on a background thread and wait for them to finish processing?
Thread[] threads = new Thread[3];
for (int i = 0; i < threads.Length; i++)
{
threads[i] = new Thread(SomeActionDelegate);
threads[i].Start();
}
for (int i = 0; i < threads.Length; i++)
{
threads[i].Join();
}
I know this question must have been asked 100 times before so thanks for your answer and patience.
Yes, this is a correct way to do this. But if SomeActionDelegate is relatively small then the overhead of creating 3 threads will be significant. That's why you probaly should use the ThreadPool anyway. Even though it has no clean equivalent to Join.
A BackgroundWorker is mainly useful when interacting with the GUI.
It's always hard to say what the "correct" way is, but your approach does work correctly.
However, by using Join, the UI thread (if the main thread is a UI thread) will be blocked, therefore freezing the UI. If that is not the case, your approach is basically OK, even if it has different problems (scalability/number of concurrent threads, etc.).
if your are (or will be using .Net 4.0) try using Task parallel Library

Categories

Resources