I am new to the .NET platform. I did a search and found that there are several ways to do parallel computing in .NET:
Parallel tasks in the Task Parallel Library (TPL), which shipped with .NET 4.0 (a preview was available for .NET 3.5).
PLINQ, .NET 4.0.
Asynchronous programming, .NET 2.0 (async is mainly used for I/O-heavy tasks; F# has a concise syntax supporting this). I list this because Mono seems to have no TPL or PLINQ, so if I need to write cross-platform parallel programs, I can use async.
.NET threads. No version limitation.
Could you give some short comments on these, or add more methods to this list?
You do need to do a fair amount of research to determine how to multithread effectively. There are some good technical articles on the Microsoft Parallel Computing team's site.
Off the top of my head, there are several ways to go about multithreading:
Thread class.
ThreadPool, which also has support for I/O-bound operations and an I/O completion port.
Begin*/End* asynchronous operations.
Event-based asynchronous programming (or "EBAP") components, which use SynchronizationContext.
BackgroundWorker, which is an EBAP that defines an asynchronous operation.
Task class (Task Parallel Library) in .NET 4.
Parallel LINQ. There is a good article on Parallel.ForEach (Task Parallel Library) vs. PLINQ.
Rx or "LINQ to Events", which does not yet have a non-Beta version but is nearing completion and looks promising.
(F# only) Asynchronous workflows.
Update: There is an article Understanding and Applying Parallel Patterns with the .NET Framework 4 available for download that gives some direction on which solutions to use for which kinds of parallel scenarios (though it assumes .NET 4 and doesn't cover Rx).
Strictly speaking, the distinction between parallel, asynchronous and concurrent should be made here.
Parallel means that a "task" is split among several smaller sub-"tasks" that can be run at the same time. This requires a multi-core CPU or a multi-CPU computer, where each task has its dedicated core or CPU. Or multiple computers. PLINQ (data parallelism) and TPL (task parallelism) fall into this category.
Asynchronous means that tasks run without blocking each other. F#'s async expression, Rx, Begin/End pattern are all APIs for async programming.
Concurrency is a broader concept than parallelization and asynchrony.
Concurrency means that several "tasks" run at the same time, interacting with each other. But these "tasks" don't have to run on separate physical computing units, as parallelization implies. For example, multitasking operating systems can execute multiple processes concurrently even on single-core, single-CPU computers, using time slices.
Concurrency can be achieved, for example, with the Actor model and message passing (e.g. F#'s mailbox, Erlang processes, or Retlang in .NET).
Threads are a relatively low-level concept compared to the concepts above. Threads are tasks running within a process, running concurrently and managed directly by the operating system's scheduler. You can implement parallelization when the operating system maps each thread to a separate core, or an Actor model by implementing message queuing, routing, etc on each thread.
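To make the data-parallel versus task-parallel distinction concrete, here is a minimal sketch. The class and method names are made up for illustration; only `ParallelEnumerable`, `Parallel.Invoke` and LINQ come from the framework (.NET 4 or later is assumed).

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelKinds
{
    // Data parallelism: the same operation applied to every element.
    // PLINQ partitions the range across the available cores.
    public static long SumOfSquaresPlinq(int n)
    {
        return ParallelEnumerable.Range(1, n).Sum(i => (long)i * i);
    }

    // Task parallelism: two different operations run side by side.
    // Each lambda writes to its own variable, so no locking is needed.
    public static long[] SumAndMax(int[] data)
    {
        long sum = 0;
        long max = 0;
        Parallel.Invoke(
            () => sum = data.Sum(x => (long)x),
            () => max = data.Max());
        return new[] { sum, max };
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquaresPlinq(100));  // 338350
        long[] r = SumAndMax(new[] { 3, 9, 4 });
        Console.WriteLine(r[0] + " " + r[1]);       // 16 9
    }
}
```

Whether either version actually runs faster than a plain loop depends on how much work each element or operation involves.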
There are also some .NET libraries for data parallel programming which target the Graphics Processing Unit (GPU) including:
Microsoft Accelerator is for data parallel programming and can target either the GPU or multi-core processors.
Brama is for LINQ-style data transformations that run on the GPU.
CUDA.NET provides a wrapper that allows CUDA to be used from .NET programs.
There is also the Reactive Extensions for .NET (Rx)
Rx is basically LINQ queries for events. It allows you to process and combine asynchronous data streams in the same way LINQ allows you to work with collections. So you would probably use it in conjunction with other parallel technologies as a way of bringing the results of your parallel operations together without having to worry about locks and other low-level threading primitives.
Expert to Expert: Brian Beckman and Erik Meijer - Inside the .NET Reactive Framework (Rx) gives a good overview of what Rx is all about.
EDIT: Another library worth mentioning is the Concurrency and Coordination Runtime (CCR). It has been around for a long time (since before '06) and ships as part of the Microsoft Robotics Studio.
Rx has a lot of the same cool ideas that the CCR has inside it, but with a much nicer API in my opinion. There's still some interesting stuff in the CCR though so it might be worth checking out. There's also a distributed services framework that works with the CCR that might make it useful depending on what you're doing.
Expert to Expert: Meijer and Chrysanthakopoulos - Concurrency, Coordination and the CCR
One more is the new Task Parallel library in .NET 4.0, which is similar and along the lines of what you've already discovered, but this may be an interesting read:
Task Parallel Library
The two major ways to do parallel programming are threads and the new task-based library, the TPL.
The asynchronous programming you mention is nothing more than running the work on a thread from the thread pool.
PLINQ, Rx and the others mentioned are actually extensions sitting on top of the new task scheduler.
The best article explaining the architecture of the new task scheduler and all the libraries on top of it, Visual Studio 2010 and new TPL .NET 4.0 Task-based Parallelism, is here (by Steve Teixeira, Product Unit Manager for Parallel Developer Tools at Microsoft):
http://www.drdobbs.com/visualstudio/224400670
Dr Dobbs also has a dedicated parallel programming section here: http://www.drdobbs.com/go-parallel/index.jhtml
The main difference between threads and the new task-based parallel programming is that you no longer need to think in terms of threads, or about how to manage pools and the underlying OS and hardware. The TPL takes care of that; you just use tasks. That is a huge change in the way you do parallel programming at every level, including abstraction.
So in .NET you actually do not have many choices:
Thread
The new task-based model, built on the task scheduler.
Obviously, the task-based approach is the way to go.
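As a rough illustration of that difference, the sketch below runs the same made-up `Square` function once on a hand-managed thread and once as a task. All names here are hypothetical; .NET 4 or later is assumed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadVersusTask
{
    // Hypothetical unit of work used for illustration.
    public static int Square(int x) { return x * x; }

    // With a raw thread, the caller creates, starts and joins the thread
    // and gets the result out through a captured variable.
    public static int WithThread(int x)
    {
        int result = 0;
        var thread = new Thread(() => result = Square(x));
        thread.Start();
        thread.Join();
        return result;
    }

    // With a task, the scheduler picks a thread-pool thread and the
    // result travels back through Task<int>.Result.
    public static int WithTask(int x)
    {
        Task<int> task = Task.Factory.StartNew(() => Square(x));
        return task.Result;
    }

    public static void Main()
    {
        Console.WriteLine(WithThread(7)); // 49
        Console.WriteLine(WithTask(7));   // 49
    }
}
```

The task version never names a thread at all, which is the point being made above.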
I'm working on upgrading a job scheduling system we use in-house that uses Quartz.net. Looking at the source of the latest version of Quartz, I noticed that it still uses its own thread pool implementation, as opposed to the much-improved thread pool (or anything from System.Threading.Tasks) that started shipping with .NET 4.0.
I'd be curious to know if anyone has successfully implemented a job scheduling system that uses Quartz.net for its scheduling features and TPL for thread pooling. Is it relatively easy to swap out Quartz's thread pool for that of TPL? Is Quartz even still relevant in the world of Tasks? Alternatively, as sold as I am on the great improvements with the .NET 4.x thread pool (core awareness, local queues, improved locking, etc.), is Quartz's thread pool good enough for typical coarse-grained background jobs and not worth the effort of forcing TPL into the mix?
Thanks in advance for any insights on using (or not using) these two tools together.
Quartz.NET solves a somewhat different problem than the TPL. Quartz.NET is intended for recurring job scheduling, with a rich set of capabilities for execution timing. The TPL, on the other hand, is meant for high-performance parallel execution of computational workloads.
So in essence you (usually) use Quartz.NET for precision scheduling and the TPL for concurrent workloads that need to be completed as quickly as possible using all computing resources (cores etc.).
Having said this, I'd say the thread pool implementation that Quartz.NET uses is quite sufficient for the job. Also bear in mind that Quartz.NET is .NET 3.5 compliant and cannot use 4.0 only features.
Of course, you can also always combine the two in your solution.
I have a configuration XML file that is used by a batch module in my .NET 3.5 Windows application.
Each node in the XML is mapped to a .NET class. Each class does processing such as mathematical calculations, making DB calls, etc.
The batch module loads the XML, identifies the class associated with each node, and then processes it.
Now, we have the following requirements:
1. Let's say there are 3 classes (3 nodes in the XML): A, B, and C.
Class A can be dependent on class B, i.e. we need to execute class B before processing class A. Class C processing should be done on a separate thread.
2. If a thread is running, then we should be able to cancel that thread in the middle of its processing.
We need to implement this whole module using .NET multithreading.
My questions are:
1. Is it possible to implement requirement #1 above? If yes, how?
2. Given these requirements, is .NET 3.5 a good idea, or would .NET 4.0 be a better choice? I would like to know the advantages and disadvantages, please.
Thanks for reading.
You'd be better off using the Task Parallel Library (TPL) in .NET 4.0. It'll give you lots of nice features for abstracting the actual business of creating threads in the thread pool. You could use the parallel tasks pattern to create a Task for each of the jobs defined in the XML and the TPL will handle the scheduling of those tasks regardless of the hardware. In other words if you move to a machine with more cores the TPL will schedule more threads.
1) The TPL supports the notion of continuation tasks. You can use these to enforce task ordering and pass the result of one Task or future from the antecedent to the continuation. This is the futures pattern.
    // The antecedent task. Can also be created with Task.Factory.StartNew.
    Task<DayOfWeek> taskA = new Task<DayOfWeek>(() => DateTime.Today.DayOfWeek);

    // The continuation. Its delegate takes the antecedent task
    // as an argument and can return a different type.
    Task<string> continuation = taskA.ContinueWith((antecedent) =>
    {
        return String.Format("Today is {0}.",
                             antecedent.Result);
    });

    // Start the antecedent.
    taskA.Start();

    // Use the continuation's result.
    Console.WriteLine(continuation.Result);
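Applied to the A/B/C requirement from the question, a continuation could look roughly like the sketch below. The jobs are stand-ins that only record their name; in the real module each lambda would call into the class mapped from the XML node. `BatchOrdering` and `Run` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BatchOrdering
{
    // Runs B, then A as a continuation of B, with C running independently.
    // Returns the order in which the jobs recorded themselves.
    public static string[] Run()
    {
        var log = new ConcurrentQueue<string>();

        Task taskB = Task.Factory.StartNew(() => log.Enqueue("B"));
        Task taskA = taskB.ContinueWith(antecedent => log.Enqueue("A"));
        Task taskC = Task.Factory.StartNew(() => log.Enqueue("C"));

        Task.WaitAll(taskA, taskB, taskC);
        return log.ToArray();
    }

    public static void Main()
    {
        // "B" always precedes "A"; the position of "C" may vary between runs.
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```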
2) Cancellation is supported by the TPL, but it is cooperative cancellation. In other words, the code running in the Task must periodically check whether it has been cancelled and shut down cleanly. The TPL has good support for this. Note that if you were to use threads directly you would run into the same limitation: Thread.Abort is not a viable solution in almost all cases.
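A minimal sketch of cooperative cancellation follows, assuming .NET 4 or later. The loop body and the `Thread.Sleep` call are placeholders for real work, and the class and method names are made up.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CooperativeCancellation
{
    // Starts a long-running job, requests cancellation, and reports
    // whether the task ended in the Canceled state.
    public static bool RunAndCancel()
    {
        var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;
        var started = new ManualResetEventSlim(false);

        Task worker = Task.Factory.StartNew(() =>
        {
            started.Set();
            for (int i = 0; i < 1000; i++)
            {
                // The job checks the token at a safe point in each iteration.
                token.ThrowIfCancellationRequested();
                Thread.Sleep(10); // stand-in for one unit of real work
            }
        }, token);

        started.Wait();   // make sure the job is actually running
        cts.Cancel();     // request (not force) cancellation

        try
        {
            worker.Wait();
        }
        catch (AggregateException ex)
        {
            // Cancellation surfaces as an OperationCanceledException inside
            // the AggregateException; anything else is rethrown by Handle.
            ex.Handle(inner => inner is OperationCanceledException);
        }
        return worker.IsCanceled;
    }

    public static void Main()
    {
        Console.WriteLine(RunAndCancel()); // True
    }
}
```

Passing the same token to `StartNew` is what lets the TPL mark the task as cancelled rather than faulted.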
While you're at it you might want to look at a dependency injection container like Unity for generating configured objects from your XML configuration.
Answer to comment (below)
Jimmy: I'm not sure I understand holtavolt's comment. What is true is that parallelism only pays off if the amount of work being done is significant; otherwise your program may spend more time managing parallelism than doing useful work. The actual datasets don't have to be large, but the work needs to be significant.
For example, if your inputs were large numbers and you were checking whether they were prime, the dataset would be very small but parallelism would still pay off, because the computation is costly for each number or block of numbers. Conversely, you might have a very large dataset of numbers that you were testing for evenness. This would involve a very large set of data, but the calculation is so cheap that a parallel implementation might still not be more efficient.
The canonical example is using Parallel.For instead of for to iterate over a dataset (large or small) but only perform a simple numerical operation like addition. In this case the expected performance improvement of utilizing multiple cores is outweighed by the overhead of creating parallel tasks and scheduling and managing them.
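The prime-checking case can be sketched as below. The trial-division test is deliberately naive and the names are hypothetical. No timings are claimed here; measure on your own hardware before assuming the parallel version wins.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PrimeCount
{
    // Deliberately naive trial division, so each number costs real CPU time.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static int CountSequential(int from, int to)
    {
        int count = 0;
        for (int n = from; n < to; n++)
        {
            if (IsPrime(n)) count++;
        }
        return count;
    }

    public static int CountParallel(int from, int to)
    {
        int count = 0;
        Parallel.For(from, to, n =>
        {
            // Interlocked keeps the shared counter correct across threads.
            if (IsPrime(n)) Interlocked.Increment(ref count);
        });
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountSequential(0, 100)); // 25
        Console.WriteLine(CountParallel(0, 100));   // 25
    }
}
```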
Of course it can be done.
Assuming you're new to multithreading and want one thread per class, I would look into the BackgroundWorker class and use it in the different classes to do the processing.
Which version of .NET you want to use also depends on whether this is going to run on client machines. But I would go for .NET 4 simply because it's the newest, and if you want to split up a single task into multiple threads it has built-in classes for this.
Given your use case, the Thread and BackgroundWorker classes should be sufficient. As you'll discover in reading the MSDN information regarding these classes, you will want to support cancellation as your means of shutting down a running thread before it's complete. (Thread "killing" is something to be avoided if at all possible.)
.NET 4.0 has added some advanced items in the Task Parallel Library (TPL) - where Tasks are defined and managed with some smarter affinity for their most recently used core (to provide better cache behavior, etc.), however this seems like overkill for your use case, unless you expect to be running very large datasets. See these sites for more information:
http://msdn.microsoft.com/en-us/library/dd460717.aspx
http://archive.msdn.microsoft.com/ParExtSamples
I'm in the process of writing a library that deals with long-running tasks like file downloading and processing large amounts of text. I want to multi-thread this library so that these tasks won't freeze up the applications that use them.
Do you have any advice for doing so in a structured manner, or specific classes I should use/avoid? I was thinking of using the IAsyncResult interface: http://msdn.microsoft.com/en-us/library/system.iasyncresult.aspx, or perhaps some BackgroundWorkers.
so that these tasks won't freeze up the applications that use them.
If this is your goal, you should look into the standard asynchronous programming patterns in the framework.
If your library is targeting .NET 4, have it return Task and Task<T>, as this will ease transition into the async support coming in the next release of C# and VB.NET. This also has the very nice addition of allowing synchronous usage with no extra work on your part, since the user can always just do:
var result = foo.BarAsync().Result; // Getting Task<T>.Result blocks, effectively making this synchronous
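For instance, a library method shaped like this could serve both kinds of caller. `TextProcessor` and `CountWordsAsync` are invented names, not part of any real library; .NET 4 or later is assumed.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical library class; the name and method are illustrative only.
public class TextProcessor
{
    // Returns immediately; the counting runs on a thread-pool thread.
    public Task<int> CountWordsAsync(string text)
    {
        return Task.Factory.StartNew(() =>
            text.Split(new[] { ' ', '\t', '\n', '\r' },
                       StringSplitOptions.RemoveEmptyEntries).Length);
    }
}

public static class TextProcessorDemo
{
    public static void Main()
    {
        var processor = new TextProcessor();

        // Asynchronous use: attach a continuation instead of blocking.
        Task printed = processor.CountWordsAsync("the quick brown fox")
            .ContinueWith(t => Console.WriteLine("words: " + t.Result));

        // Synchronous use: Result blocks until the task completes.
        int count = processor.CountWordsAsync("one two  three").Result;
        Console.WriteLine(count); // 3

        printed.Wait(); // keep the process alive until the continuation has run
    }
}
```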
If you're targeting .NET 3.5 or earlier, you should consider using the Event-based asynchronous pattern, as it is used in more of the current APIs than the APM.
I came across this comment on asynchronous delegate invocation (ADI) while reading Essential C# 4.0:
Unfortunately, the underlying technology used by the asynchronous delegate invocation pattern is an end-of-further-development technology for distributed programming known as remoting. And although Microsoft still supports the use of asynchronous delegate invocation and it will continue to function as it does today for the foreseeable future, the performance characteristics are suboptimal given other approaches, namely Thread, ThreadPool, and TPL. Therefore, developers should tend to favor one of these alternatives rather than implementing new development using the asynchronous delegate invocation API. Further discussion of the pattern is included in the Advanced Topic text that follows so that developers who encounter it will understand how it works.
So are there any limitations that ADI has and TPL doesn't, besides that TPL probably uses a not-end-of-further-development-yet technology?
Tasks and async delegates both use the thread pool.
Tasks and async delegates are similar in the sense that an exception can be propagated to the caller. Tasks go one step further: they accumulate all the exceptions thrown by the thread-pool workers and present them together, as an AggregateException.
Tasks allow for cancellation.
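The exception accumulation can be sketched as follows. The failing jobs and the `Fail` helper are made up for illustration; `Task.WaitAll` and `AggregateException` are the framework pieces being shown (.NET 4 or later).

```csharp
using System;
using System.Threading.Tasks;

public static class TaskExceptions
{
    // Hypothetical job that always fails.
    private static void Fail(string message)
    {
        throw new InvalidOperationException(message);
    }

    // Starts two failing tasks and returns how many exceptions were collected.
    public static int CountFailures()
    {
        Task first = Task.Factory.StartNew(() => Fail("first job failed"));
        Task second = Task.Factory.StartNew(() => Fail("second job failed"));

        try
        {
            // WaitAll waits for both tasks, then throws one AggregateException
            // that carries every failure.
            Task.WaitAll(first, second);
        }
        catch (AggregateException ex)
        {
            foreach (Exception inner in ex.InnerExceptions)
            {
                Console.WriteLine(inner.Message);
            }
            return ex.InnerExceptions.Count;
        }
        return 0;
    }

    public static void Main()
    {
        Console.WriteLine(CountFailures()); // 2 (after the two messages)
    }
}
```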
There's a free chapter that describes all of this in more detail:
http://www.albahari.com/threading/
You ask for "limitations".
I don't think you will find anything that can't be done with ADI (also called APM). The point is performance and programmer effort.
The verdict seems unanimous: Joe Duffy also warns you away from the ADI/APM.
And the conclusion is easy: use the TPL if you can. It is easy and efficient. And it's at the just-the-beginning-of-further-development point.
I am not an expert in the TPL, but from what I understand, it abstracts the decisions about the level of parallelism into configuration/specification.
For instance, in a parallel for loop.
Parallel.For(0, 1000, a => Thread.Sleep(10000));
You don't necessarily spawn 1000 threads. The TPL will "parallelise" to the appropriate number of threads, as opposed to asynchronously invoking a method 1000 times (which won't create 1000 threads either, but you will just have blocked execution calls until the required resources are freed up).
Also, the TPL gives you higher-level control of the parallel tasks. In the above example, you can break out of or stop the for loop easily, like this:
Parallel.For(0, 1000, (a, loopState) => loopState.Break());
It's a bit of hassle to achieve the above using conventional async method invoke.
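Here is a small sketch of `Break` doing something useful: finding the first index that meets a condition. The method name and the sample data are invented; `ParallelLoopState.Break` and `ParallelLoopResult.LowestBreakIteration` are the framework members involved.

```csharp
using System;
using System.Threading.Tasks;

public static class LoopBreak
{
    // Returns the index of the first element >= threshold, or -1 if none.
    public static long FirstIndexAtLeast(int[] data, int threshold)
    {
        ParallelLoopResult result = Parallel.For(0, data.Length, (i, loopState) =>
        {
            if (data[i] >= threshold)
            {
                // Break lets iterations below i finish but asks the loop
                // not to start further iterations above it.
                loopState.Break();
            }
        });

        // LowestBreakIteration is null when Break was never called.
        return result.LowestBreakIteration ?? -1;
    }

    public static void Main()
    {
        int[] data = { 1, 5, 9, 5, 12 };
        Console.WriteLine(FirstIndexAtLeast(data, 5));   // 1
        Console.WriteLine(FirstIndexAtLeast(data, 100)); // -1
    }
}
```

Because every iteration below a break index is still guaranteed to run, the lowest break iteration ends up being the first matching index even though iterations execute out of order.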
TL;DR: The TPL is more efficient and easier to use.
Since the launch of .NET 4.0, parallel computing has come into the limelight. Does parallel computing give us real benefits, or is it just another concept or feature?
Further, is .NET really going to use it in applications?
Further, is parallel computing different from parallel programming?
Please shed some light on this from a .NET perspective; some examples would be helpful.
It is not exactly a new term, just a new emphasis.
And yes, more programmers will have to write more parallel code if they want to profit from new hardware. See "The Free Lunch Is Over".
Further, is parallel computing different from parallel programming?
No.
And here are some samples on MSDN (the PLINQ raytracer is cool).
You use parallel programming methods to enable parallel computing of your operations. .NET will utilize it if you tell it to via code.
The benefit to parallel computing is overall speed of execution. As you may have noticed over the past few years, processors aren't getting any faster, but the number of CPU cores per system is increasing. Parallel programming is the means by which you can take advantage of this form of upgrade, by splitting large jobs into smaller tasks that can be handled concurrently by separate cores.
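As a hand-rolled illustration of splitting a large job into smaller tasks, the sketch below sums an array in chunks and then combines the partial results. In practice `Parallel.For` or PLINQ would do the partitioning for you; the class name and the chunking scheme are assumptions made for the example.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ChunkedSum
{
    // Splits the array into roughly equal chunks, sums each chunk in its
    // own task, then combines the partial sums.
    public static long Sum(int[] data, int chunks)
    {
        if (chunks < 1) throw new ArgumentOutOfRangeException("chunks");
        int chunkSize = (data.Length + chunks - 1) / chunks; // ceiling division
        var partials = new Task<long>[chunks];

        for (int c = 0; c < chunks; c++)
        {
            // Locals declared inside the loop give each closure its own copy.
            int start = Math.Min(c * chunkSize, data.Length);
            int end = Math.Min(start + chunkSize, data.Length);
            partials[c] = Task.Factory.StartNew(() =>
            {
                long sum = 0;
                for (int i = start; i < end; i++) sum += data[i];
                return sum;
            });
        }

        Task.WaitAll(partials);
        return partials.Sum(t => t.Result);
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 1000).ToArray();
        Console.WriteLine(Sum(data, Environment.ProcessorCount)); // 500500
    }
}
```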