My app does a lot of background tasks, and for each such action, I created a new thread for each action
Thread StreamResponse = new Thread(() =>
{
DB.ToDb(flag);
});
StreamResponse.Start();
These actions take place thousands per minute. Noticed that the app starts eating RAM. That's normal, because the application creates thousands of threads and not closing them.
From here there is a question, how do I make it so that these actions were in a separate thread. For example, I create a thread in a separate class and in this thread commit acts.
Or otherwise can? The task that the thread was made and after the completion of the action is automatically closed. But perhaps more correctly just all these steps to do in another thread. What do you think? How to make?
May be using Dispatcher.BeginInvoke ?
It seems that you could benefit from using the ThreadPool Class. From MSDN:
Provides a pool of threads that can be used to execute tasks, post work items, process asynchronous I/O, wait on behalf of other threads, and process timers.
Here is a sample code for using it:
ThreadPool.QueueUserWorkItem((x) =>
{
DB.ToDb(flag);
});
And you can set the maximum number of concurrent threads available in the thread pool using the ThreadPool.SetMaxThreads Method in order to improve performance and limit memory usage in your app.
Why not use task factory?
var task = Task.Factory.StartNew(()=>{ /*do stuff*/ });
Task factory works the same way as Queue does but has the advantage of allowing you to handle any return information more elegantly
var result = await task;
The Factory will only spawn threads when it makes sense to, so essentailly is the same as the thread pool
If you want to use schedule longer running tasks this is also considered the better option, in terms of recommended practices, but would need to specify that information on the creation of the task.
If you need further information on there is a good answer available here : ThreadPool.QueueUserWorkItem vs Task.Factory.StartNew
Related
I know the differences between a thread and a task., but I cannot understand if creating threads inside tasks is the same as creating only threads.
It depends on how you use the multithreaded capabilities and the asynchronous programming semantics of the language.
Simple facts first. Assume you have an initial, simple, single-threaded, and near empty application (that just reads a line of input with Console.ReadLine for simplicity sake). If you create a new Thread, then you've created it from within another thread, the main thread. Therefore, creating a thread from within a thread is a perfectly valid operation, and the starting point of any multithreaded application.
Now, a Task is not a thread per se, but it gets executed in one when you do Task.Run which is selected from a .NET managed thread pool. As such, if you create a new thread from within a task, you're essentially creating a thread from within a thread (same as above, no harm done). The caveat here is, that you don't have control of the thread or its lifetime, that is, you can't kill it, suspend it, resume it, etc., because you don't have a handle to that thread. If you want some unit of work done, and you don't care which thread does it, just that's it not the current one, then Task.Run is basically the way to go. With that said, you can always start a new thread from within a task, actually, you can even start a task from within a task, and here is some official documentation on unwrapping nested tasks.
Also, you can await inside a task, and create a new thread inside an async method if you want. However, the usability pattern for async and await is that you use them for I/O bound operations, these are operations that require little CPU time but can take long because they need to wait for something, such as network requests, and disk access. For responsive UI implementations, this technique is often used to prevent blocking of the UI by another operation.
As for being pointless or not, it's a use case scenario. I've faced situations where that could have been the solution, but found that redesigning my program logic so that if I need to use a thread from within a task, then what I do is to have two tasks instead of one task plus the inner thread, gave me a cleaner, and more readable code structure, but that it's just personal flair.
As a final note, here are some links to official documentation and another post regarding multithreaded programming in C#:
Async in Depth
Task based asynchronous programming
Chaining Tasks using Continuation Tasks
Start multiple async Tasks and process them as they complete
Should one use Task.Run within another Task
It depends how you use tasks and what your reason is for wanting another thread.
Task.Run
If you use Task.Run, the work will "run on the ThreadPool". It will be done on a different thread than the one you call it from. This is useful in a desktop application where you have a long-running processor-intensive operation that you just need to get off the UI thread.
The difference is that you don't have a handle to the thread, so you can't control that thread in any way (suspend, resume, kill, reuse, etc.). Essentially, you use Task.Run when you don't care which thread the work happens on, as long as it's not the current one.
So if you use Task.Run to start a task, there's nothing stopping you from starting a new thread within, if you know why you're doing it. You could pass the thread handle between tasks if you specifically want to reuse it for a specific purpose.
Async methods
Methods that use async and await are used for operations that use very little processing time, but have I/O operations - operations that require waiting. For example, network requests, read/writing local storage, etc. Using async and await means that the thread is free to do other things while you wait for a response. The benefits depend on the type of application:
Desktop app: The UI thread will be free to respond to user input while you wait for a response. I'm sure you've seen some programs that totally freeze while waiting for a response from something. This is what asynchronous programming helps you avoid.
Web app: The current thread will be freed up to do any other work required. This can include serving other incoming requests. The result is that your application can handle a bigger load than it could if you didn't use async and await.
There is nothing stopping you from starting a thread inside an async method too. You might want to move some processor-intensive work to another thread. But in that case you could use Task.Run too. So it all depends on why you want another thread.
It would be pointless in most cases of everyday programming.
There are situations where you would create threads.
Okay , let me try to put it in sentences ...
Lets consider an example,
where I create an async method and call it with await keyword,
As far as my knowledge tells me,
The main thread will be released
In a separate thread, async method will start executing
Once it is executed, The pointer will resume from last position It left in main thread.
Question 1 : Will it come back to main thread or it will be a new thread ?
Question 2: Does it make any difference if the async method is CPU bound or network bound ? If yes, what ?
The important question
Question 3 : Assuming that is was a CPU bound method, What did I achieve? I mean - main thread was released, but at the same time, another thread was used from thread pool. what's the point ?
async does not start a new thread. Neither does await. I recommend you read my async intro post and follow up with the resources at the bottom.
async is not about parallel programming; it's about asynchronous programming. If you need parallel programming, then use the Task Parallel Library (e.g., PLINQ, Parallel, or - in very complex cases - raw Tasks).
For example, you could have an async method that does I/O-bound operations. There's no need for another thread in this scenario, and none will be created.
If you do have a CPU-bound method, then you can use Task.Run to create an awaitable Task that executes that method on a thread pool thread. For example, you could do something like await Task.Run(() => Parallel...); to treat some parallel processing as an asynchronous operation.
Execution of the caller and async method will be entirely on the current thread. async methods don't create a new thread and using async/await does not actually create additional threads. Instead, thread completions/callbacks are used with a synchronization context and suspending/giving control (think Node.js style programming). However, when control is issued to or returns to the await statement, it may end up being on a different completion thread (this depends on your application and some other factors).
Yes, it will run slower if it is CPU or Network bound. Thus the await will take longer.
The benefit is not in terms of threads believe it or not... Asynchronous programming does not necessarily mean multiple threads. The benefit is that you can continue doing other work that doesn't require the async result, before waiting for the async result... An example is a web server HTTP listener thread pool. If you have a pool of size 20 then your limit is 20 concurrent requests... If all of these requests spend 90% of their time waiting on database work, you could async/await the database work and the time during which you await the database result callback will be freed... The thread will return to the HTTP listener thread pool and another user can access your site while the original one waits for the DB work to be done, upping your total limit.
It's really about freeing up threads that wait on externally-bound and slow operations to do other things while those operations execute... Taking advantage of built-in thread pools.
Don't forget that the async part could be some long-running job, e.g. running a giant database query over the network, downloading a file from the internet, etc.
When I create an array of tasks like this:
var taskArray = new Task<double>[]
{
Task.Factory.StartNew(() => new Random().NextDouble()),
Task.Factory.StartNew(() => new Random().NextDouble()),
Task.Factory.StartNew(() => new Random().NextDouble())
};
Will this create 3 threads for sure, or it is up to the CLR to create threads as it sees fit?
So if I perform this in a web request, this means there will be at least 4 threads created to service the request correct? (the web request + 1 for each task)
Will this create 3 threads for sure, or it is up to the CLR to create threads as it sees fit?
The latter. In particular, as those tasks complete so quickly I wouldn't be surprised if they all executed on the same thread (although not the same thread as the one calling StartNew) - particularly if this is in a "clean" process and the thread pool hasn't had to fire up many threads yet. (IIRC, the thread pool only starts one new thread every 0.5 seconds, which would give plenty of time for all of your tasks to execute on a single thread.)
You can use your own custom TaskScheduler if you really want to, but that would be relatively extreme.
You should read the MSDN article on task schedulers (including the default one) for more information.
No easy answer. It depend on available resources on server. It'll setup queue and if server can run 3 thread in same time (it run it), otherwise it'll queue it.
Maybe I did not understand it right ... all the Parallel class issue :(
But from what I am reading now, I understand that when I use the Parallel I actually mobilize all the threads that exists in the threadPool for some task/mission.
For example:
var arrayStrings = new string[1000];
Parallel.ForEach<string>(arrayStrings, someString =>
{
DoSomething(someString);
});
So the Parallel.ForEach in this case is mobilizing all the threads that exists in the threadPool for the 'DoSomething' task/mission.
But does the call Parallel.ForEach will create any new thread at all?
Its clear that there will be no 1000 new threads. But lets assume that there are 1000 new threads, some case that the threadPool release all the thread that it hold so, in this case ... the Parallel.ForEach will create any new thread?
Short answer: Parallel.ForEach() does not “mobilize all the threads”. And any operation that schedules some work on the ThreadPool (which Parallel.ForEach() does) can cause creation of new thread in the pool.
Long answer: To understand this properly, you need to know how three levels of abstraction work: Parallel.ForEach(), TaskScheduler and ThreadPool:
Parallel.ForEach() (and Parallel.For()) schedule their work on a TaskScheduler. If you don't specify a scheduler explicitly, the current one will be used.
Parallel.ForEach() splits the work between several Tasks. Each Task will process a part of the input sequence, and when it's done, it will request another part if one is available, and so on.
How many Tasks will Parallel.ForEach() create? As many as the TaskScheduler will let it run. The way this is done is that each Task first enqueues a copy of itself when it starts executing (unless doing so would violate MaxDegreeOfParallelism, if you set it). This way, the actual concurrency level is up to the TaskScheduler.
Also, the first Task will actually execute on the current thread, if the TaskScheduler supports it (this is done using RunSynchronously()).
The default TaskScheduler simply enqueues each Task to the ThreadPool queue. (Actually, it's more complicated if you start a Task from another Task, but that's not relevant here.) Other TaskSchedulers can do completely different things and some of them (like TaskScheduler.FromCurrentSynchronizationContext()) are completely unsuitable for use with Parallel.ForEach().
The ThreadPool uses quite a complex algorithm to decide exactly how many threads should be running at any given time. But the most important thing here is that scheduling new work item can cause the creating of a new thread (although not necessarily immediately). And because with Parallel.ForEach(), there is always some item queued to be executed, it's completely up to the internal algorithm of ThreadPool to decide the number of threads.
Put together, it's pretty much impossible to decide how many threads will be used by a Parallel.ForEach(), because it depends on many variables. Both extremes are possible: that the loop will run completely synchronously on the current thread and that each item will be run on its own, newly created thread.
But generally, is should be close to optimal efficiency and you probably don't have to worry about all those details.
Parallel.Foreach does not create new threads, nor does it "mobilize all the threads". It uses a limited number of threads from the threadpool and submits tasks to them for parallel execution. In the current implementation the default is to use one thread per core.
I think you have this the wrong way round.
From PATTERNS OF PARALLEL PROGRAMMING you'll see that Parallel.ForEach is just really syntactic sugar.
The Parallel.ForEach is largely boiled down to something like this,
for (int p = 0; p < arrayStrings.Count(); p++)
{
ThreadPool.QueueUserWorkItem(DoSomething(arrayStrings[p]);
}
The ThreadPool takes care of the scheduling. There are some excellent articles around how the ThreadPool's scheduler behaves to some degree if you're interested, but that's nothing to do with TPL.
Parallel does not deal with threads at all - it schedules TASKS to the task framework. THat then has a scheduler and the default scheduler goes to the threadpool. This one will try to find a goo number of threads (better in 4.5 than 4.0) and the Threadpool may slowly spin up new threads.
But that is not a functoin of parallel.foreach ;)
the Parallel.ForEach will create any new thread ???
It never will. As I said - it has 1000 foreach, then it queues 10.000 tasks, Point. THe Task factory scheduler will do what it is programmed to do ((you can replace it). Generally, default - yes, slowly new threads will spring up WITHIN REASON.
I have a Main thread that spawns around 20 worker threads.
I need to stop the Main thread until all the other threads are finished.
I know about (thread).Join. But that only works for one thread.
and multiple Joins hurt performance like this.
t1.Join()
t2.Join()
...
t20.Join()
as the program waits one by one for each to stop.
How would I make it such that
the main thread waits for all of a set of threads to end?
You should really look into Task Parallelism (Task Parallel Library). It uses a thread-pool, but also manage task-stealing etc.
Quote: "The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details." on Task Parallel Library
You can use it like this:
Task[] tasks = new Task[3]
{
Task.Factory.StartNew(() => MethodA()),
Task.Factory.StartNew(() => MethodB()),
Task.Factory.StartNew(() => MethodC())
};
//Block until all tasks complete.
Task.WaitAll(tasks);
Or if you use some kind of a loop to spawn your threads:
Data Parallelism (Task Parallel Library)
The joins are fine if that's what you want it to do. The main thread still has to wait for all the worker threads to terminate. Check out this website which is a sample chapter from C# in a Nutshell. It just so happens to be the threading chapter: http://www.albahari.com/threading/part4.aspx.
I can't see an obvious performance penalty for waiting for the threads to finish one-by-one. So, a simple foreach does what you want without any unnecerrasy bells and whistles:
foreach (Thread t in threads) t.Join();
Note: Of course, there's a Win32 API function that allows waiting for several objects (threads, in this case) at once — WaitForMultipleObjectsEx. There are many helper classes or threading frameworks out there on the Internet that utilize it for what you want. But do you really need them for a simple case?
and multiple Joins hurt performance
like this.
There's no "performance hurting", if you want to wait for all of your threads to exit, you call .join() on the threads.
Stuff your threads in a list and do
foreach(var t in myThread)
t.join();
If you are sure you will always have < 64 threads then you could have each new thread reliably set an Event before it exits, and WaitAll on the events in your main thread, once all threads are started up. The Event object would be created in the main thread and passed to the relevant child thread in a thread-safe way at thread creation time.
In native code you could do the same thing on the thread handles themselves, but not sure how to do this in .Net.
See also this prior question: C#: Waiting for all threads to complete