Asynchronous Delegate Invocation (ADI) vs. Task Parallel Library (TPL) - c#

I came across this comment on ADI while reading Essential C# 4.0:
Unfortunately, the underlying technology used by the asynchronous delegate invocation pattern is an end-of-further-development technology for distributed programming known as remoting. And although Microsoft still supports the use of asynchronous delegate invocation and it will continue to function as it does today for the foreseeable future, the performance characteristics are suboptimal given other approaches, namely Thread, ThreadPool, and TPL. Therefore, developers should tend to favor one of these alternatives rather than implementing new development using the asynchronous delegate invocation API. Further discussion of the pattern is included in the Advanced Topic text that follows so that developers who encounter it will understand how it works.
So are there any limitations that ADI has and TPL doesn't, besides that TPL probably uses a not-end-of-further-development-yet technology?

Tasks and async delegates both use the thread pool.
Tasks and async delegates are similar in that exceptions can be propagated to the caller. Tasks go one step further: they accumulate all thrown exceptions (wrapped in an AggregateException) and present them together, across all thread pool workers.
Tasks allow for cancellation.
There's a free chapter that describes all of this in more detail:
http://www.albahari.com/threading/
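To make the exception and cancellation points above concrete, here is a minimal sketch. The class and method names (TaskErrorDemo, CollectFailures, CancelLoop, Fail) are made up for illustration; only the Task, AggregateException and CancellationTokenSource APIs come from the framework.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type, not part of any library.
public static class TaskErrorDemo
{
    private static void Fail(string message)
    {
        throw new InvalidOperationException(message);
    }

    // Starts three tasks, two of which throw, and returns the messages of
    // every exception gathered in the resulting AggregateException.
    public static string[] CollectFailures()
    {
        Task[] tasks =
        {
            Task.Run(() => Fail("first")),
            Task.Run(() => { /* succeeds */ }),
            Task.Run(() => Fail("second")),
        };

        try
        {
            Task.WaitAll(tasks);
            return new string[0];
        }
        catch (AggregateException ex)
        {
            // One inner exception per faulted task.
            return ex.InnerExceptions.Select(e => e.Message).OrderBy(m => m).ToArray();
        }
    }

    // Starts a polling loop, cancels it cooperatively, and reports whether
    // the task ended in the Canceled state.
    public static bool CancelLoop()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;
            Task work = Task.Run(() =>
            {
                for (int i = 0; i < 1000; i++)
                {
                    token.ThrowIfCancellationRequested();
                    Thread.Sleep(10);
                }
            }, token);

            cts.Cancel();
            try { work.Wait(); }
            catch (AggregateException) { /* wraps a TaskCanceledException */ }
            return work.IsCanceled;
        }
    }
}
```

Calling TaskErrorDemo.CollectFailures() should hand back both messages at once, which is the "accumulating" behaviour described above; an async delegate's EndInvoke would only rethrow a single exception.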

You ask for "limitations".
I don't think you will find anything that can't be done with ADI (also called APM). The point is performance and programmer effort.
The verdict seems unanimous: Joe Duffy also warns you away from ADI/APM.
And the conclusion is easy: use the TPL if you can. It is easy and efficient, and it is at the just-the-beginning-of-further-development point.

I am no expert in the TPL, but from what I understand it takes the decision about the degree of parallelism out of your code and turns it into configuration/specification.
For instance, in a parallel for loop:
Parallel.For(0, 1000, a => Thread.Sleep(10000));
You don't necessarily spawn 1000 threads. The TPL will "parallelise" the loop over an appropriate number of threads. Compare that with asynchronously invoking a method 1000 times, which won't create 1000 threads either; the calls simply queue up until the required resources are freed.
The TPL also gives you higher-level control over the parallel work. In the above example, you can break out of or stop the loop easily, for example:
Parallel.For(0, 1000, (a, loopState) => loopState.Break());
It's a bit of a hassle to achieve the same with conventional asynchronous method invocation.
TL;DR: the TPL is more efficient and easier to use.
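As a rough illustration of the loop control mentioned above, the sketch below uses ParallelLoopState.Break to find the first element that meets a condition. The type and method names (ParallelBreakDemo, FirstIndexAtLeast) are hypothetical; Parallel.For, ParallelLoopState.Break and ParallelLoopResult.LowestBreakIteration are the real TPL members.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo type, not part of any library.
public static class ParallelBreakDemo
{
    // Returns the lowest index whose value is >= threshold, or null if none is.
    public static long? FirstIndexAtLeast(int[] values, int threshold)
    {
        ParallelLoopResult result = Parallel.For(0, values.Length, (i, loopState) =>
        {
            if (values[i] >= threshold)
            {
                // Break lets iterations below i still run, but asks the loop
                // not to start iterations above i.
                loopState.Break();
            }
        });

        // Lowest index at which Break was called; null when it never was.
        return result.LowestBreakIteration;
    }
}
```

For example, FirstIndexAtLeast(new[] { 1, 5, 3, 9, 7, 9 }, 9) should give 3. Getting the same early-exit behaviour out of hand-rolled BeginInvoke/EndInvoke calls would need shared flags and manual bookkeeping.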

Related

Is Task.Run or TaskFactory.StartNew always inappropriate to use in async methods?

I've heard that the responsibility for threading should lie with the application, and that I shouldn't use Task.Run or TaskFactory.StartNew in async methods.
However, if I have a library whose methods do quite heavy computation, then to free the threads that are, for example, accepting ASP.NET Core HTTP requests, couldn't I make the method async and have it run a long-running task? Or should this be a sync method, with the ASP.NET Core application responsible for starting the task?
First, let's think about why we need asynchrony.
Asynchrony is needed either for scalability or for offloading.
For scalability, exposing an async version of that call does nothing: you are typically still consuming the same amount of resources as you would have if you had invoked it synchronously, or even a bit more. Scalability is achieved by decreasing the amount of resources you use, and Task.Run() does not decrease them.
For offloading, you can expose async wrappers of your sync methods. This can be very useful for responsiveness, because it lets you move long-running operations to a different thread, so the async wrapper does give you some benefit.
Result:
Wrapping a synchronous method in a simple asynchronous façade does not yield any scalability benefits, but it does yield offloading benefits. Even so, exposing only the synchronous method gives you some nice benefits. For example:
The surface area of your library is reduced.
Your users will know whether there are actually scalability benefits to using the asynchronous APIs you do expose.
If both the synchronous method and an asynchronous wrapper around it are exposed, developers are led to think they should invoke the asynchronous version for scalability reasons, when in reality they will be hurting their throughput by paying the additional offloading overhead without getting any scalability benefit.
The source is "Should I expose asynchronous wrappers for synchronous methods?" by Stephen Toub, and I strongly recommend reading it.
Update:
Question in the comment:
Scalability is well explained in that article with an example. Take Thread.Sleep: there are two possible ways to implement an async version of that call:
public Task SleepAsync(int millisecondsTimeout)
{
    return Task.Run(() => Sleep(millisecondsTimeout));
}
And another new implementation:
public Task SleepAsync(int millisecondsTimeout)
{
    TaskCompletionSource<bool> tcs = null;
    var t = new Timer(delegate { tcs.TrySetResult(true); }, null, -1, -1);
    tcs = new TaskCompletionSource<bool>(t);
    t.Change(millisecondsTimeout, -1);
    return tcs.Task;
}
Both of these implementations provide the same basic behavior, both completing the returned task after the timeout has expired. However, from a scalability perspective, the latter is much more scalable. The former implementation consumes a thread from the thread pool for the duration of the wait time, whereas the latter simply relies on an efficient timer to signal the Task when the duration has expired.
So in your case, wrapping the call in Task.Run would give you offloading, not scalability, and the users of your library would have no way of knowing that.
A user of your library can wrap the call in Task.Run himself, and I really think that is where that decision belongs.
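A small sketch of that split, under the assumption of a made-up library: PrimeLibrary.CountPrimes stands in for the heavy computation and App.CountPrimesInBackground for application code. Both names are hypothetical; only Task.Run is a real API here.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical library type: it exposes only the synchronous, CPU-bound method.
public static class PrimeLibrary
{
    public static int CountPrimes(int limit)
    {
        int count = 0;
        for (int n = 2; n <= limit; n++)
        {
            bool isPrime = true;
            for (int d = 2; d * d <= n; d++)
            {
                if (n % d == 0) { isPrime = false; break; }
            }
            if (isPrime) count++;
        }
        return count;
    }
}

// Hypothetical application code: the caller, not the library, decides
// whether the work is worth offloading to a thread pool thread.
public static class App
{
    public static Task<int> CountPrimesInBackground(int limit)
    {
        return Task.Run(() => PrimeLibrary.CountPrimes(limit));
    }
}
```

A UI application might await App.CountPrimesInBackground(...) to keep its UI thread free, while an ASP.NET Core handler would more likely call PrimeLibrary.CountPrimes directly, since moving the work to another pool thread buys it nothing.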
This doesn't exactly answer the question (I think the other answer is good enough for that), but here is some additional advice: be careful with using Task.Run in a library that other people will use. It can cause unexpected thread pool starvation for the library's users. Suppose a developer uses a lot of third-party libraries and all of them call Task.Run() internally. When the developer then tries to use Task.Run in his own app, it slows the app down, because the thread pool is already used up by the third-party libraries.
Parallelizing work with Parallel.ForEach is a different issue.

C# TPL Threading Task

What is the difference between the Task class and the Parallel class, both part of the TPL, from an implementation point of view?
I believe the Task class has more benefits than ThreadPool and Thread, but context switches still happen with tasks as well.
Is the Parallel class basically designed to run programs on multicore processors?
Your question is extremely broad and an answer could go into a lot of detail, so let me restrict myself to a few specifics.
Task wraps a method for execution later on, using a lambda (an Action or Func delegate) to do so. You can wrap the work now and execute it at any later time.
Parallel is an API that helps achieve data parallelism: you divide a collection (an IEnumerable type) into smaller chunks, each of which can be processed in parallel, and the results are finally aggregated.
There are broadly two kinds of parallelism. In the first, you subdivide a bigger job into smaller ones, wrap them in Task objects and wait for all or some of them to complete in parallel. This is task parallelism.
In the other, you take each data unit in a collection and work on it independently of the others. This is data parallelism, achieved with the Parallel.ForEach or Parallel.For APIs.
These were introduced in .NET 4.0 to make parallelism easy for the developer. Before that we had to work with the Thread and ThreadPool classes directly, which requires a much more in-depth understanding of how threads work; here a lot of that complexity is taken care of internally.
However, don't be under the impression that the current mechanism doesn't use threads. By default, both forms of parallelism rely on ThreadPool threads, which is why things like context switching and multiple threads being invoked still happen. Microsoft has just made the developer's life easier.
You may want to go through following links for a better understanding, let me know if there's still a specific query:
Parallel.ForEach vs Task.Factory.StartNew
Past, Present and Future of Parallelism
Parallel.ForEach vs Task.Run and Task.WhenAll
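The two kinds of parallelism described in this answer could be sketched roughly as follows. The type and method names (ParallelismKinds, MinAndMax, SumOfSquares) are invented for the example; Task.Run, Task.WaitAll, Parallel.ForEach and Interlocked.Add are the framework calls.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type, not part of any library.
public static class ParallelismKinds
{
    // Task parallelism: two different computations run as separate tasks.
    public static int[] MinAndMax(int[] data)
    {
        Task<int> min = Task.Run(() => data.Min());
        Task<int> max = Task.Run(() => data.Max());
        Task.WaitAll(min, max);
        return new[] { min.Result, max.Result };
    }

    // Data parallelism: the same operation is applied to every element,
    // and Parallel.ForEach decides how to split the collection.
    public static long SumOfSquares(int[] data)
    {
        long total = 0;
        Parallel.ForEach(data, x =>
        {
            // Interlocked keeps the shared accumulator safe across threads.
            Interlocked.Add(ref total, (long)x * x);
        });
        return total;
    }
}
```

With the input { 3, 1, 4, 1, 5 }, MinAndMax should give 1 and 5, and SumOfSquares should give 52. For real workloads a thread-local accumulator overload of Parallel.ForEach would usually contend less than a single Interlocked counter; the simple form is used here only to keep the sketch short.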
TPL is designed to minimize pre-emptive context switching (caused by thread oversubscription, that is, having more threads than cores). Task abstractions, of which TPL is an implementation, are designed for cooperative parallelism, where the developer controls when a task will relinquish its execution (typically upon completion). If you schedule more tasks than you have cores, TPL will only execute approximately as many tasks concurrently as you have cores; the rest will be queued. This promotes throughput, since it avoids the overhead of context switching, but it reduces responsiveness, as each task may take longer to start being processed.
The Parallel class is yet a higher level of abstraction that builds on top of TPL. Implementation-wise, Parallel generates a graph of tasks, but can use heuristics for deciding the granularity of the said tasks depending on your work.

PLINQ vs Tasks vs Async vs Producer/Consumer queue? What to use?

I was reading C# 5.0 in a Nutshell, and after reading the author's views I am quite confused about which approach to adopt. Say I have a really long-running (computationally heavy) task, for example calculating the SHA1 (or some other) hash of millions of files, or really anything else that is computationally heavy and likely to take some time. What should my approach be to developing it (in WinForms, if that matters, using VS 2012 and C# 5.0) so that I can also report progress to the user?
The following scenarios come to mind...
Create a Task (with the LongRunning option) that computes the hashes and reports progress to the user, either by implementing IProgress<T> or using Progress<T>, or by letting the task capture the SynchronizationContext and posting to the UI.
Create an async method like
async Task CalculateHashesAsync()
{
    // await here for the tasks that calculate the hash
    await Task.Run(() => CalculateHash());
    // how do I report progress???
}
Use TPL (or PLINQ) as
void CalculateHashes()
{
    Parallel.For(0, allFiles.Count, i => calcHash(allFiles[i]));
    // how do I report progress here?
}
Use a producer / consumer Queue.
Don't really know how?
The author in the book says...
Running one long running task on a pooled thread won't cause trouble. It's when you run multiple long running tasks in parallel (particularly ones that block) that performance can suffer. In that case, there are usually better solutions than TaskCreationOptions.LongRunning:
If tasks are IO bound, TaskCompletionSource and asynchronous functions let you implement concurrency with callbacks instead of threads.
If tasks are compute bound, a producer/consumer queue lets you throttle the concurrency for those tasks, avoiding starvation for other threads and processes.
About the Producer/Consumer the author says...
A producer/consumer queue is a useful structure, both in parallel programming and general concurrency scenarios as it gives you precise control over how many worker threads execute at once, which is useful not only in limiting CPU consumption, but other resources as well.
So should I not use a task, meaning that the first option is out? Is the second one the best option? Are there any other options? And if I were to follow the author's advice and implement a producer/consumer queue, how would I do that? (I don't even know how to get started with producer/consumer in my scenario, if that is the best approach!)
I'd like to know whether anyone has come across such a scenario and how they implemented it. If not, what would be the most performant and/or the easiest to develop and maintain? (I know the word performance is subjective, but let's just consider the very general case that it works, and works well!)
really long running (computationally heavy) task, say for example, calculate SHA1 (or some other) hash of millions of file
That example clearly has both heavy CPU (hashing) and I/O (file) components. Perhaps this is a non-representative example, but in my experience even a secure hash is far faster than reading the data from disk.
If you just have CPU-bound work, the best solution is either Parallel or PLINQ. If you just have I/O-bound work, the best solution is to use async. If you have a more realistic and complex scenario (with both CPU and I/O work), then you should either hook up your CPU and I/O parts with producer/consumer queues or use a more complete solution such as TPL Dataflow.
TPL Dataflow works well with both parallel (MaxDegreeOfParallelism) and async, and has a builtin producer/consumer queue in-between each block.
One thing to keep in mind when mixing massive amounts of I/O and CPU usage is that different situations can cause massively different performance characteristics. To be safe, you'll want to throttle the data going through your queues so you won't end up with memory usage issues. TPL Dataflow has built-in support for throttling via BoundedCapacity.
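For readers who asked how to get started with a producer/consumer queue, here is a minimal sketch. Note that it uses BlockingCollection from the base class library rather than TPL Dataflow, so it is a stand-in for the idea and not the Dataflow API itself. The names HashPipeline and HashAll are hypothetical, in-memory byte arrays stand in for file contents, and SHA256 is used in place of SHA1.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Hypothetical demo type, not part of any library.
public static class HashPipeline
{
    // Hashes every payload with a fixed number of consumers fed by a bounded queue.
    public static ConcurrentDictionary<int, string> HashAll(
        IList<byte[]> payloads, int consumers, int capacity)
    {
        var results = new ConcurrentDictionary<int, string>();

        // The capacity bound is the throttle: Add blocks while the queue is full.
        using (var queue = new BlockingCollection<KeyValuePair<int, byte[]>>(capacity))
        {
            // Producer: stands in for the I/O side (reading files from disk).
            Task producer = Task.Run(() =>
            {
                for (int i = 0; i < payloads.Count; i++)
                {
                    queue.Add(new KeyValuePair<int, byte[]>(i, payloads[i]));
                }
                queue.CompleteAdding();
            });

            // Consumers: the CPU-bound side; their count caps the concurrency.
            var workers = new Task[consumers];
            for (int w = 0; w < consumers; w++)
            {
                workers[w] = Task.Run(() =>
                {
                    using (SHA256 sha = SHA256.Create())
                    {
                        foreach (var item in queue.GetConsumingEnumerable())
                        {
                            results[item.Key] = BitConverter.ToString(sha.ComputeHash(item.Value));
                        }
                    }
                });
            }

            producer.Wait();
            Task.WaitAll(workers);
        }
        return results;
    }
}
```

The capacity argument plays roughly the role that BoundedCapacity plays in TPL Dataflow, and the consumers argument the role of MaxDegreeOfParallelism. Progress could be reported from the consumer loop, for example through an IProgress<T> passed in by the caller.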

When is the System.Threading.Task useful?

I have used most of the Threading library extensively. I am fairly familiar with creating new Threads, creating BackgroundWorkers and using the built-in .NET ThreadPool (which are all very cool).
However, I have never found a reason to use the Task class. I have seen maybe one or two examples of people using them, but the examples weren't very clear and they didn't give a high-level overview of why one should use a task instead of a new thread.
Question 1: From a high-level, when is using a task useful versus one of the other methods for parallelism in .NET?
Question 2: Does anyone have a simple and/or medium difficulty example demonstrating how to use tasks?
There are two main advantages in using Tasks:
Task can represent any result that will be available in the future (the general concept is not specific to .Net and it's called future), not just a computation. This is especially important with async-await, which uses Tasks for asynchronous operations. Since the operation that gets the result might fail, Tasks can also represent failures.
Tasks come with lots of methods that operate on them. You can synchronously wait until a Task finishes (Wait()), wait for its result (Result), set up some operation to run when the Task finishes (ContinueWith()), and there are also methods that work on several Tasks at once (WaitAll(), WaitAny(), ContinueWhenAll()). All of this is possible using other parallel processing methods, but you would have to do it manually.
And there are also some smaller advantages to using Task:
You can use a custom TaskScheduler to decide when and where will the Task run. This can be useful for example if you want to run a Task on the UI thread, limit the degree of parallelism or have a Task-level readers–writer lock.
Tasks support cooperative cancellation through CancellationToken.
Tasks that represent computations have some performance improvements. For example, they use work-stealing queue for more efficient processing and they also support inlining (executing Task that hasn't started yet on a thread that synchronously waits for it).
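Since the question also asked for a simple example, here is one possible sketch of a few of the points above: tasks as future results, combining several tasks, and attaching a continuation. TaskBasics, SumRange and SumInTwoHalves are hypothetical names chosen for the example.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo type, not part of any library.
public static class TaskBasics
{
    // Plain synchronous helper: sums the integers from..to inclusive.
    public static int SumRange(int from, int to)
    {
        int sum = 0;
        for (int i = from; i <= to; i++) sum += i;
        return sum;
    }

    public static int SumInTwoHalves()
    {
        // Each Task<int> represents a result that will be available later.
        Task<int> first = Task.Run(() => SumRange(1, 100));
        Task<int> second = Task.Run(() => SumRange(101, 200));

        // WhenAll combines both tasks; the continuation runs once they finish.
        Task<int> total = Task.WhenAll(first, second)
            .ContinueWith(t => t.Result[0] + t.Result[1]);

        // Result blocks until the whole chain has completed.
        return total.Result;
    }
}
```

SumInTwoHalves() should return 20100 (5050 + 15050). Doing the same with raw Thread objects would mean creating the threads, joining them and passing results back through shared state by hand.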

when to use Task and when to use Thread?

I've just asked a question about Task, but realized that I actually want to ask a more general question.
Could someone summarize the pros and cons of Tasks and Threads?
How do I decide whether I should use a Task or a Thread?
A Task is a request to the program to do something asynchronously. A Thread is an actual OS kernel object that executes what was requested. Think of Task as a clever thread aggregator/organizer that "knows" how many tasks are best run concurrently on your CPU. It is simply smarter than common hand-written multithreading implementations (which is why it is Microsoft's suggested choice). It is a feature that helps you manage threads more easily.
Also have a look at "Should I use ThreadPools or Task Parallel Library for IO-bound operations", which may give you some hints about the performance issues you may be interested in.
