C# .NET choice of Multithreading approach

I've looked over multiple similar questions on SO, but I still couldn't answer my own question.
I have a console app (an Azure WebJob, actually) which does file processing and DB management. Some heavy data is downloaded from multiple sources and processed in the DB.
Here's an example of my code:
var dbLongIndpendentProcess = doProcesAsync();
var myfilesTasks = files.Select(file => Task.Run(
    async () =>
    {
        // files processing
    }));
await Task.WhenAll(myfilesTasks);
await dbLongIndpendentProcess;
// continue with other stuff
It all works fine and does what I am expecting it to do. There are other tasks running in this whole process, but I guess the idea is clear from the code above.
My question: Is this a fair way of approaching this, or would I get more performance (or sense?) by doing the good old "manual" multithreading? The main reason I chose this approach was that it's simple and straightforward.
However, wasn't async/await primarily aimed at doing asynchronous work so as not to block the main (UI) thread? Here I don't have any UI and I am not doing anything event-driven.
Thanks,

I don't think you're multithreading by using this approach (except where you call Task.Run). Async doesn't generally run things on separate threads; it only prevents things from blocking. See: https://msdn.microsoft.com/en-gb/library/mt674882.aspx#Anchor_5
The async and await keywords don't cause additional threads to be
created. Async methods don't require multithreading because an async
method doesn't run on its own thread. The method runs on the current
synchronization context and uses time on the thread only when the
method is active. You can use Task.Run to move CPU-bound work to a
background thread, but a background thread doesn't help with a process
that's just waiting for results to become available.
It would be much better to use tasks for the things you want to multithread, so that you can take better advantage of the machine's cores and resources. You might want to look at a task-based solution such as pipelining (which may work in this scenario): https://msdn.microsoft.com/en-gb/library/ff963548.aspx or another alternative.
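To make the distinction in the quoted passage concrete, here is a minimal sketch, not the asker's actual code. DownloadAsync and Crunch are hypothetical stand-ins (a simulated download and a simulated CPU-heavy step). The I/O-bound part is awaited directly with no Task.Run, and only the CPU-bound part is moved to the thread pool.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class FileJobs
{
    // Hypothetical stand-in for an I/O-bound download: no thread is held while waiting.
    public static async Task<int> DownloadAsync(string name)
    {
        await Task.Delay(100); // simulates network latency
        return name.Length;    // pretend this is the payload size
    }

    // Hypothetical stand-in for CPU-bound processing of a downloaded file.
    public static long Crunch(int size)
    {
        long sum = 0;
        for (int i = 0; i < size * 1000; i++) sum += i % 7;
        return sum;
    }

    public static async Task<long[]> ProcessAllAsync(string[] files)
    {
        // I/O-bound: start every download, then await them together. No Task.Run needed.
        int[] sizes = await Task.WhenAll(files.Select(f => DownloadAsync(f)));

        // CPU-bound: Task.Run moves each computation onto a thread pool thread.
        return await Task.WhenAll(sizes.Select(s => Task.Run(() => Crunch(s))));
    }

    public static async Task Main()
    {
        long[] results = await ProcessAllAsync(new[] { "a.csv", "bb.csv", "ccc.csv" });
        Console.WriteLine(string.Join(", ", results));
    }
}
```

Whether this split is worth it in a WebJob depends on how much of the file processing is really CPU-bound; if it is mostly waiting on downloads and the database, the first half alone may be enough.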

Related

Is there a neat way to force a pile of `async` C# code to run single-threadly as though it weren't actually `async`

Suppose (entirely hypothetically ;)) I have a big pile of async code.
10s of classes; 100s of async methods, of which 10s are actually doing async work (e.g. where we WriteToDbAsync(data) or ReadFileFromInternetAsync(uri), or where we WhenAll(parallelTasks)).
And I want to do a bunch of diagnostic debugging on it. I want to perf profile it, and step through a bunch of it manually to see what's what.
All my tools are designed around synchronous C# code. They will sort of work with async, but it's definitely much less effective, and debugging is way harder, even when I try to directly manage the threads a bit.
If I'm only interested in a small portion of the code, then it's definitely a LOT easier to temporarily un-async that portion of the code: read and write synchronously, and just Task.Wait() on each of my "parallel" Tasks in sequence. But that's not viable if I want to poke around in a large swathe of the code.
Is there any way to ask C# to run some "async" code like that for me?
i.e. some sort of (() => MyAsyncMethod()).RunAsThoughAsyncDidntExist() which knows that any time it does real async communication with the outside world, it should just spin (within the same thread) until it gets an answer. Any time it's asked to run code in parallel ... don't; just run them in series on its single thread. etc. etc.
I'm NOT talking about just awaiting the Task until it finishes, or calling Task.Wait(). Those won't change how that Task executes itself.
I strongly assume that this sort of thing doesn't exist, and I just have to live with my tools not being well architected for async code.
But it would be great if someone with some expertise in the area, could confirm that.
EDIT: (Because SO told me to explain why the suggestion isn't an answer)...
Sinatr suggested this: How do I create a custom SynchronizationContext so that all continuations can be processed by my own single-threaded event loop? but (as I understand it) that is going to ensure that each time there's an await command then the code after that await continues on the same thread. But I want the actual contents of the await to be on the same thread.
Keep in mind that asynchronous != parallel.
Parallel means running two or more pieces of code at the same time, which can only be done with multithreading. It's about how code runs.
Asynchronous code frees the current thread to do other things while it is waiting for something else. It is about how code waits.
Asynchronous code with a synchronization context can run on a single thread. It starts running on one thread, then fires off an I/O request (like an HTTP request), and while it waits there is no thread. Then the continuation (because there is a synchronization context) can happen on the same thread depending on what the synchronization context requires, like in a UI application where the continuation happens on the UI thread.
When there is no synchronization context, then the continuation can be run on any ThreadPool thread (but might still happen on the same thread).
So if your goal is to make it initially run and then resume all on the same thread, then the answer you were already referred to is indeed the best way to do it, because it's that synchronization context that decides how the continuation is executed.
However, that won't help you if there are any calls to Task.Run, because the entire purpose of that method is to run the work on a ThreadPool thread (and give you an asynchronous way to wait for that work to finish).
It also may not help if the code uses .ConfigureAwait(false) in any of the await calls, since that explicitly means "I don't need to resume on the synchronization context", so it may still run on a ThreadPool thread. I don't know if Stephen's solution does anything for that.
But if you really want it to "RunAsThoughAsyncDidntExist" and lock the current thread while it waits, then that's not possible. Take this code for example:
var myTask = DoSomethingAsync();
DoSomethingElse();
var results = await myTask;
This code starts an I/O request, then does something else while waiting for that request to finish, then finishes waiting and processes the results after. The only way to make that behave synchronously is to refactor it, since synchronous code isn't capable of doing other work while waiting. A decision would have to be made whether to do the I/O request before or after DoSomethingElse().
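As a rough illustration of that ordering, here is a small sketch in which the method bodies are made up for the demo (Task.Delay stands in for the I/O request, and a shared log records what happened when):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class OverlapDemo
{
    private static readonly List<string> Log = new List<string>();

    // Hypothetical I/O request: runs synchronously up to the first await, then yields.
    private static async Task<string> DoSomethingAsync()
    {
        Log.Add("request started");
        await Task.Delay(200); // simulated I/O wait; no thread is blocked here
        return "response";
    }

    private static void DoSomethingElse() => Log.Add("other work done");

    public static async Task<List<string>> RunAsync()
    {
        Log.Clear();
        var myTask = DoSomethingAsync(); // the request is now in flight
        DoSomethingElse();               // runs while the request is still pending
        var results = await myTask;      // only now do we wait for the response
        Log.Add("got " + results);
        return new List<string>(Log);
    }

    public static async Task Main()
    {
        foreach (var entry in await RunAsync()) Console.WriteLine(entry);
        // prints:
        // request started
        // other work done
        // got response
    }
}
```

A synchronous rewrite would have to pick one of two orders: block on the request and then call DoSomethingElse(), or call DoSomethingElse() first and then block. Neither overlaps the two the way the async version does.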

Is it pointless to use Threads inside Tasks in C#?

I know the differences between a thread and a task, but I cannot understand whether creating threads inside tasks is the same as creating only threads.
It depends on how you use the multithreaded capabilities and the asynchronous programming semantics of the language.
Simple facts first. Assume you have an initial, simple, single-threaded, and near-empty application (that just reads a line of input with Console.ReadLine, for simplicity's sake). If you create a new Thread, then you've created it from within another thread, the main thread. Therefore, creating a thread from within a thread is a perfectly valid operation, and the starting point of any multithreaded application.
Now, a Task is not a thread per se, but when you call Task.Run it gets executed on one, selected from the .NET managed thread pool. As such, if you create a new thread from within a task, you're essentially creating a thread from within a thread (same as above, no harm done). The caveat here is that you don't have control over the thread pool thread or its lifetime: you can't kill it, suspend it, resume it, etc., because you don't have a handle to that thread. If you want some unit of work done, and you don't care which thread does it, just that it's not the current one, then Task.Run is basically the way to go. With that said, you can always start a new thread from within a task. You can even start a task from within a task, and the official documentation covers unwrapping nested tasks.
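For the nested-task case, a minimal sketch (the values and delays are arbitrary): starting an async lambda with Task.Factory.StartNew gives you a task whose result is another task, which Unwrap flattens, while Task.Run does the unwrapping for you.

```csharp
using System;
using System.Threading.Tasks;

public static class NestedTaskDemo
{
    public static async Task<int> RunAsync()
    {
        // StartNew with an async lambda yields a Task<Task<int>>:
        // the outer task completes as soon as the lambda hits its first await.
        Task<Task<int>> nested = Task.Factory.StartNew(async () =>
        {
            await Task.Delay(50);
            return 42;
        });

        // Unwrap returns a proxy Task<int> that completes when the inner task does.
        int a = await nested.Unwrap();

        // Task.Run understands async lambdas and unwraps automatically.
        int b = await Task.Run(async () =>
        {
            await Task.Delay(50);
            return 1;
        });

        return a + b;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync()); // prints 43
    }
}
```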
Also, you can await inside a task, and create a new thread inside an async method if you want. However, the usability pattern for async and await is that you use them for I/O bound operations, these are operations that require little CPU time but can take long because they need to wait for something, such as network requests, and disk access. For responsive UI implementations, this technique is often used to prevent blocking of the UI by another operation.
As for being pointless or not, it depends on the use case. I've faced situations where that could have been the solution, but I found that redesigning my program logic to use two tasks, instead of one task plus an inner thread, gave me a cleaner and more readable code structure. That's just personal preference, though.
As a final note, here are some links to official documentation and another post regarding multithreaded programming in C#:
Async in Depth
Task based asynchronous programming
Chaining Tasks using Continuation Tasks
Start multiple async Tasks and process them as they complete
Should one use Task.Run within another Task
It depends how you use tasks and what your reason is for wanting another thread.
Task.Run
If you use Task.Run, the work will "run on the ThreadPool". It will be done on a different thread than the one you call it from. This is useful in a desktop application where you have a long-running processor-intensive operation that you just need to get off the UI thread.
The difference is that you don't have a handle to the thread, so you can't control that thread in any way (suspend, resume, kill, reuse, etc.). Essentially, you use Task.Run when you don't care which thread the work happens on, as long as it's not the current one.
So if you use Task.Run to start a task, there's nothing stopping you from starting a new thread within, if you know why you're doing it. You could pass the thread handle between tasks if you specifically want to reuse it for a specific purpose.
Async methods
Methods that use async and await are meant for operations that use very little processing time but involve I/O, that is, operations that require waiting. For example, network requests, reading/writing local storage, etc. Using async and await means that the thread is free to do other things while you wait for a response. The benefits depend on the type of application:
Desktop app: The UI thread will be free to respond to user input while you wait for a response. I'm sure you've seen some programs that totally freeze while waiting for a response from something. This is what asynchronous programming helps you avoid.
Web app: The current thread will be freed up to do any other work required. This can include serving other incoming requests. The result is that your application can handle a bigger load than it could if you didn't use async and await.
There is nothing stopping you from starting a thread inside an async method too. You might want to move some processor-intensive work to another thread. But in that case you could use Task.Run too. So it all depends on why you want another thread.
It would be pointless in most cases of everyday programming.
There are situations where you would create threads.

Task Parallel Library consuming lots of space on production server

I am using –
Task task = new Task(delegate { GetRecordsForEmailReplies(headingList, partialEntity); });
task.Start();
to run some heavy methods, but the problem is that it consumes a lot of CPU on the server. Sometimes the IIS worker process rises above 60%, which is why the server gets stuck.
Is there any solution to this problem, or any other option to run these heavy methods without blocking the page load?
The suggested means for creating a new Task or Task<T> is to use Task.Factory.StartNew with .NET 4.0 or Task.Run with .NET 4.5. There is a detailed explanation from Stephen Toub here on the topic.
He explains that it's more efficient:
For example, we take a lot of care within TPL to make sure that when accessing tasks from multiple threads concurrently, the "right" thing happens. A Task is only ever executed once, and that means we need to ensure that multiple calls to a task's Start method from multiple threads concurrently will only result in the task being scheduled once.
Again, use Task.Run instead:
Task.Run(() => GetRecordsForEmailReplies(headingList, partialEntity));

await Console.ReadLine()

I am currently building an asynchronous console application in which I have created classes to handle separate areas of the application.
I have created an InputHandler class which I envisioned would await Console.ReadLine() input. However, you cannot await such a function (since it is not async), so my current solution is simply:
private Task<string> GetInputAsync()
{
    // Queue the blocking read on the thread pool and return the task directly.
    return Task.Run(() => Console.ReadLine());
}
which runs perfectly fine. However, my (limited) understanding is that calling Task.Run will fire off a new (parallel?) thread. This defeats the purpose of async methods since that new thread is now being blocked until ReadLine() returns, right?
I know that threads are an expensive resource so I feel really wasteful and hacky doing this. I also tried Console.In.ReadLineAsync() but it is apparently buggy? (It seems to hang).
I know that threads are an expensive resource so I feel really wasteful and hacky doing this. I also tried Console.In.ReadLineAsync() but it is apparently buggy? (It seems to hang).
The console streams unfortunately do have surprising behavior. The underlying reason is that they block to ensure thread safety for the console streams. Personally I think that blocking in an asynchronous method is a poor design choice, but Microsoft decided to do this (only for console streams) and have stuck by their decision.
So, this API design forces you to use background threads (e.g., Task.Run) if you do want to read truly asynchronously. This is not a pattern you should normally use, but in this case (console streams) it is an acceptable hack to work around their API.
However, my (limited) understanding is that calling Task.Run will fire off a new (parallel?) thread.
Not quite. Task.Run will queue some work to the thread pool, which will have one of its threads execute the code. The thread pool manages the creation of threads as necessary, and you usually don't have to worry about it. So, Task.Run is not as wasteful as actually creating a new thread every time.
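A minimal sketch of the workaround, assuming a static helper rather than the asker's actual InputHandler class: the blocking Console.ReadLine call is queued to the thread pool, and the caller stays free until it awaits the result.

```csharp
using System;
using System.Threading.Tasks;

public static class InputHandler
{
    // Queues the blocking read to the thread pool; a pool thread (not the caller) blocks.
    public static Task<string> GetInputAsync() => Task.Run(() => Console.ReadLine());

    public static async Task Main()
    {
        Task<string> pending = GetInputAsync();

        // The caller is not blocked here and can do other work.
        Console.WriteLine("Waiting for input without blocking the caller...");

        string line = await pending;
        Console.WriteLine("You typed: " + line);
    }
}
```

One pool thread is still tied up for as long as the read is pending, which is the cost of this hack, but no new thread has to be created for each call.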

Blocking Methods within Task

I'm currently developing a small server application and getting to grips with Task<>, and other associated operations.
I'm wondering how Blocking operations work within a Task.
So for example, I currently use several Libraries with "blocking" operations. One of them is Npgsql (PostgreSQL provider.)
If I do the following...
Task myTask = new Task(() =>
{
    using (var db = new PostgresqlDatabaseConnection())
    {
        db.ExecuteQuery("SELECT takes 50 ms to get data...");
        db.Insert(anObject); // etc.
    }
});
myTask.Start();
And say, chain it to a bunch of other tasks that process that data.
Is this efficient? I.e., let's say that ExecuteQuery calls some kind of Thread.Sleep(1) or somehow blocks the thread; is this going to affect my Task execution?
I'm asking because my server uses several libraries that would have to be rewritten to accommodate a totally asynchronous methodology. Or is this async enough?
My thoughts
I'm really just not sure.
I know that if, for example, db.ExecuteQuery() just ran a while(true) loop until it got its data, it would almost certainly be blocking my server, because a lot of time would be spent processing while(true). Or is Task smart enough to know that it should spend less time on this? Or, if internally it is using some waiting mechanism, does the Task library know? Does it know that it should be processing some other task while it waits?
I'm currently developing a small server application and getting to
grips with Task<>, and other associated operations.
You won't benefit from using new Task, Task.Factory.StartNew, or Task.Run in a server-side application, unless the number of concurrent client connections is really low. Check this and this for some more details.
You would, however, greatly benefit from using a naturally asynchronous API. Such APIs don't block a pool thread while the operation is "in-flight", so the thread is returned to the pool and can then get busy serving another client request. This improves your server app's scalability.
I'm not sure if PostgreSQL provides such API, look for something like ExecuteQueryAsync or BeginExecuteQuery/EndExecuteQuery. If it doesn't have that, just use the synchronous ExecuteQuery method, but do not offload it to a pool thread as you do in your code fragment.
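To illustrate why a naturally asynchronous call scales, here is a rough sketch. QueryAsync is a hypothetical stand-in for something like ExecuteQueryAsync, with Task.Delay simulating a 100 ms round trip; it is not a real database call.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ScalabilityDemo
{
    // Hypothetical stand-in for a naturally asynchronous query.
    // Task.Delay holds no thread while it waits, like real async I/O.
    public static async Task<int> QueryAsync(int id)
    {
        await Task.Delay(100); // simulated 100 ms round trip
        return id * 2;
    }

    // Starts all requests at once and reports the results plus the total elapsed time.
    public static async Task<(int[] Results, TimeSpan Elapsed)> ServeAsync(int requests)
    {
        var watch = Stopwatch.StartNew();
        int[] results = await Task.WhenAll(
            Enumerable.Range(0, requests).Select(i => QueryAsync(i)));
        return (results, watch.Elapsed);
    }

    public static async Task Main()
    {
        var (results, elapsed) = await ServeAsync(500);
        Console.WriteLine($"{results.Length} queries in {elapsed.TotalMilliseconds:F0} ms");
    }
}
```

Because no pool thread is parked per request, the 500 simulated queries overlap instead of queuing behind each other. Wrapping a blocking call in Task.Run would instead hold one pool thread for each in-flight query.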
Using the async/await features of C# 5 will definitely make things easier. It can make asynchronous code easier to write, as you write it very similar to how you would write synchronous code.
Take the following example. I am using Thread.Sleep to simulate a long-running operation, which shows that libraries that don't support async natively can still be used via Task.Run. While Thread.Sleep is holding up the thread pool thread, your UI is still responsive. If you had written this code synchronously, your UI would freeze for 1.5 seconds until Thread.Sleep finished.
private async void button1_Click(object sender, EventArgs e)
{
    Console.WriteLine("0");
    await DoWorkAsync();
    Console.WriteLine("3");
}
private async Task DoWorkAsync()
{
    Console.WriteLine("1");
    await Task.Run(() =>
    {
        // Do your db work here.
        Thread.Sleep(1500);
    });
    Console.WriteLine("2");
}
So in short, if you have long-running database operations and you want to keep your UI responsive, you should leverage async/await. While this does keep your UI responsive, it does introduce new challenges, such as what happens if the user clicks a button multiple times, or what happens if the user closes the window while you are still processing, to name some simple cases.
I encourage you to read further on the subject. Jon Skeet has a multi-part series on async. There are also numerous MSDN articles on the subject.
Async programming does nothing to improve the efficiency of your logic or of the database. All it influences is the performance of switching between operations.
You cannot make a query or a computation faster wrapping it in a Task. You only add overhead.
Async IO is used on the server to achieve scalability to hundreds of concurrent requests. You don't need it here.
