I used to make good use of Java's ThreadPoolExecutor class and have yet to find a good equivalent in C#. I know of ThreadPool.QueueUserWorkItem which is useful in many cases but no good if you want to control the number of threads assigned to a task or have multiple individual queues for different task types.
For example, I liked to use a ThreadPoolExecutor with a single thread to guarantee sequential execution of asynchronous calls. Is there an easy way to do this in C#? Is there a non-static thread pool implementation?
There is no such feature built in until .NET 4.0 and the TPL.
However, see this article:
As part of the Reactive Extensions (Rx), the Task Parallel Library was backported to .NET 3.5. If you add a reference to the System.Threading.dll included in its distribution, you can use the TPL with .NET 3.5.
There are also thread pools built into the Concurrency and Coordination Runtime, which is freely available. See this MSDN article for details.
Related
I'm currently working on code which is supposed to be thread safe. Lots of asynchronous calls and events and stuff that generally requires quite a bit of work to keep synchronized and thread safe.
Are there any classes in the .NET framework which deal with this sort of thing, which I could look at (decompile), to see how things are supposed to be done? The more complex the better really...
.NET 4.0 Thread-Safe Collections
Though targeting Windows rather than just .NET, Joe Duffy's book is worth noting: Concurrent Programming on Windows
MSDN has some good information on asynchronous programming in .NET. Check out Asynchronous Programming Design Patterns.
Also check out the Monitor and Mutex classes in System.Threading
There are a few ways to keep things atomic. There are classes that support fencing, so that code is guaranteed to execute in a certain order. You can use the volatile keyword to ensure that every access to a variable goes to memory (no stale cached values and no reordering), but note that it does not make compound read-modify-write operations atomic; for those, use the Interlocked class. These are just a few tools; I would suggest searching for "atomic operations in C#/F#".
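To make the distinction concrete, here is a minimal sketch (the Counter class and its members are invented for illustration): volatile covers visibility of single reads and writes, while Interlocked covers read-modify-write operations:

using System.Threading;

class Counter
{
    // volatile: every read and write goes to memory and is not reordered,
    // but it does NOT make compound operations such as ++ atomic.
    private volatile bool _stopRequested;

    private int _count;

    public void RequestStop() { _stopRequested = true; }            // a single write is safe
    public bool StopRequested { get { return _stopRequested; } }

    // ++ is a read-modify-write, so use Interlocked to keep it atomic.
    public void Increment() { Interlocked.Increment(ref _count); }
    public int Count { get { return Thread.VolatileRead(ref _count); } }
}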
I've a configuration xml which is being used by a batch module in my .Net 3.5 windows application.
Each node in the xml is mapped to a .Net class. Each class does processing like mathematical calculations, making db calls etc.
The batch module loads the xml, identifies the class associated with each node and then processes it.
Now, we have the following requirements:
1. Let's say there are 3 classes [3 nodes in the XML]: A, B, and C.
Class A can be dependent on class B, i.e. we need to execute class B before processing class A. Class C processing should be done on a separate thread.
2. If a thread is running, then we should be able to cancel that thread in the middle of its processing.
We need to implement this whole module using .net multi-threading.
My questions are:
1. Is it possible to implement requirement #1 above? If yes, how?
2. Given these requirements, is .NET 3.5 a good idea, or would .NET 4.0 be a better choice? I would like to know the advantages and disadvantages, please.
Thanks for reading.
You'd be better off using the Task Parallel Library (TPL) in .NET 4.0. It'll give you lots of nice features for abstracting the actual business of creating threads in the thread pool. You could use the parallel tasks pattern to create a Task for each of the jobs defined in the XML and the TPL will handle the scheduling of those tasks regardless of the hardware. In other words if you move to a machine with more cores the TPL will schedule more threads.
1) The TPL supports the notion of continuation tasks. You can use these to enforce task ordering and pass the result of one Task or future from the antecedent to the continuation. This is the futures pattern.
// The antecedent task. Can also be created with Task.Factory.StartNew.
Task<DayOfWeek> taskA = new Task<DayOfWeek>(() => DateTime.Today.DayOfWeek);

// The continuation. Its delegate takes the antecedent task
// as an argument and can return a different type.
Task<string> continuation = taskA.ContinueWith(antecedent =>
{
    return String.Format("Today is {0}.", antecedent.Result);
});

// Start the antecedent.
taskA.Start();

// Use the continuation's result.
Console.WriteLine(continuation.Result);
2) Thread cancellation is supported by the TPL, but it is cooperative cancellation. In other words, the code running in the Task must periodically check whether it has been cancelled and shut down cleanly. The TPL has good support for cancellation. Note that if you were to use threads directly you would run into the same limitation; Thread.Abort is not a viable solution in almost all cases.
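As a rough sketch of cooperative cancellation with the TPL (assuming .NET 4's CancellationTokenSource; the loop body is a placeholder for real processing):

using System;
using System.Threading;
using System.Threading.Tasks;

class CancellationSketch
{
    static void Main()
    {
        var cts = new CancellationTokenSource();

        Task worker = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 100000; i++)
            {
                // Cooperative cancellation: the task itself checks the token
                // and bails out cleanly when cancellation has been requested.
                cts.Token.ThrowIfCancellationRequested();
                // ... the real work for item i would go here ...
            }
        }, cts.Token);

        cts.Cancel(); // e.g. triggered by a Cancel button

        try { worker.Wait(); }
        catch (AggregateException) { /* wraps the cancellation exception */ }
    }
}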
While you're at it you might want to look at a dependency injection container like Unity for generating configured objects from your XML configuration.
Answer to comment (below)
Jimmy: I'm not sure I understand holtavolt's comment. What is true is that using parallelism only pays off if the amount of work being done is significant; otherwise your program may spend more time managing parallelism than doing useful work. The actual datasets don't have to be large, but the work needs to be significant.
For example, if your inputs were large numbers and you were checking to see if they were prime, then the dataset would be very small but parallelism would still pay off, because the computation is costly for each number or block of numbers. Conversely, you might have a very large dataset of numbers that you were checking for evenness. That requires a very large set of data, but the calculation per item is still very cheap, so a parallel implementation might still not be more efficient.
The canonical example is using Parallel.For instead of for to iterate over a dataset (large or small) but only perform a simple numerical operation like addition. In this case the expected performance improvement of utilizing multiple cores is outweighed by the overhead of creating parallel tasks and scheduling and managing them.
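As a hypothetical illustration, summing an array does so little work per element that the parallel version often loses to the plain loop:

using System;
using System.Threading;
using System.Threading.Tasks;

class SumComparison
{
    static void Main()
    {
        int[] data = new int[10000000];

        // Sequential: trivial work per element, no scheduling overhead.
        long sequentialSum = 0;
        for (int i = 0; i < data.Length; i++)
            sequentialSum += data[i];

        // Parallel: one addition per element is so cheap that partitioning,
        // scheduling and the Interlocked.Add often outweigh the benefit of
        // spreading the work across cores.
        long parallelSum = 0;
        Parallel.For(0, data.Length, i => Interlocked.Add(ref parallelSum, data[i]));

        Console.WriteLine("{0} {1}", sequentialSum, parallelSum);
    }
}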
Of course it can be done.
Assuming you're new to multithreading and you want one thread per class, I would look into the BackgroundWorker class and basically use it in the different classes to do the processing.
Which version of .NET you want to use also depends on whether this is going to run on client machines. But I would go for .NET 4, simply because it's the newest, and if you want to split a single task into multiple threads it has built-in classes for this.
Given your use case, the Thread and BackgroundWorker classes should be sufficient. As you'll discover in reading the MSDN information regarding these classes, you will want to support cancellation as your means of shutting down a running thread before it's complete. (Thread "killing" is something to be avoided if at all possible.)
.NET 4.0 has added some advanced items in the Task Parallel Library (TPL) - where Tasks are defined and managed with some smarter affinity for their most recently used core (to provide better cache behavior, etc.), however this seems like overkill for your use case, unless you expect to be running very large datasets. See these sites for more information:
http://msdn.microsoft.com/en-us/library/dd460717.aspx
http://archive.msdn.microsoft.com/ParExtSamples
I'm in the process of writing a library that deals with long-running tasks like file downloading and processing large amounts of text. I want to multi-thread this library so that these tasks won't freeze up the applications that use them.
Do you have any advice for doing so in a structured manner, or specific classes I should use/avoid? I was thinking of using the IAsyncResult interface: http://msdn.microsoft.com/en-us/library/system.iasyncresult.aspx, or perhaps some BackgroundWorkers.
so that these tasks won't freeze up the applications that use them.
If this is your goal, you should look into the standard asynchronous programming patterns in the framework.
If your library is targeting .NET 4, have it return Task and Task<T>, as this will ease transition into the async support coming in the next release of C# and VB.NET. This also has the very nice addition of allowing synchronous usage with no extra work on your part, since the user can always just do:
var result = foo.BarAsync().Result; // Getting Task<T>.Result blocks, effectively making this synchronous
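For example, a library method that exposes a download as a Task<T> might look roughly like this (the Downloader class, its method name and the WebClient usage are invented for illustration):

using System.Net;
using System.Threading.Tasks;

public static class Downloader
{
    // Expose the long-running work as a Task<string> so callers can block
    // on .Result, attach a ContinueWith, or await it in later C# versions.
    public static Task<string> DownloadStringTask(string url)
    {
        return Task.Factory.StartNew(() =>
        {
            using (var client = new WebClient())
            {
                return client.DownloadString(url);
            }
        });
    }
}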
If you're targeting .NET 3.5 or earlier, you should consider using the Event-based asynchronous pattern, as it is used in more of the current APIs than the APM.
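For reference, a minimal sketch of the EAP shape with an invented TextAnalyzer component: a CountWordsAsync method paired with a CountWordsCompleted event, with completion posted back to the caller's SynchronizationContext:

using System;
using System.ComponentModel;
using System.Threading;

public class WordCountCompletedEventArgs : AsyncCompletedEventArgs
{
    public WordCountCompletedEventArgs(int count, Exception error)
        : base(error, false, null) { Count = count; }
    public int Count { get; private set; }
}

public class TextAnalyzer
{
    public event EventHandler<WordCountCompletedEventArgs> CountWordsCompleted;

    public void CountWordsAsync(string text)
    {
        // Capture the caller's context so the event is raised back on the
        // calling (e.g. UI) thread rather than on the worker thread.
        AsyncOperation operation = AsyncOperationManager.CreateOperation(null);

        ThreadPool.QueueUserWorkItem(_ =>
        {
            int count = 0;
            Exception error = null;
            try { count = text.Split(' ').Length; }   // stand-in for real work
            catch (Exception ex) { error = ex; }

            operation.PostOperationCompleted(state =>
            {
                var handler = CountWordsCompleted;
                if (handler != null)
                    handler(this, new WordCountCompletedEventArgs(count, error));
            }, null);
        });
    }
}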
I am new to .Net platform. I did a search and found that there are several ways to do parallel computing in .Net:
Parallel task in Task Parallel Library, which is .Net 3.5.
PLINQ, .Net 4.0
Asynchronous programming, .NET 2.0 (async is mainly used for I/O-heavy tasks; F# has a concise syntax supporting this). I list this because in Mono there seems to be no TPL or PLINQ, so if I need to write cross-platform parallel programs, I can use async.
.Net threads. No version limitation.
Could you give some short comments on these or add more methods in this list?
You do need to do a fair amount of research in order to determine how to effectively multithread. There are some good technical articles, part of the Microsoft Parallel Computing team's site.
Off the top of my head, there are several ways to go about multithreading:
Thread class.
ThreadPool, which also has support for I/O-bound operations and an I/O completion port.
Begin*/End* asynchronous operations.
Event-based asynchronous programming (or "EBAP") components, which use SynchronizationContext.
BackgroundWorker, which is an EBAP that defines an asynchronous operation.
Task class (Task Parallel Library) in .NET 4.
Parallel LINQ. There is a good article on Parallel.ForEach (Task Parallel Library) vs. PLINQ.
Rx or "LINQ to Events", which does not yet have a non-Beta version but is nearing completion and looks promising.
(F# only) Asynchronous workflows.
Update: There is an article Understanding and Applying Parallel Patterns with the .NET Framework 4 available for download that gives some direction on which solutions to use for which kinds of parallel scenarios (though it assumes .NET 4 and doesn't cover Rx).
Strictly speaking, the distinction between parallel, asynchronous and concurrent should be made here.
Parallel means that a "task" is split among several smaller sub-"tasks" that can be run at the same time. This requires a multi-core CPU or a multi-CPU computer, where each task has its dedicated core or CPU. Or multiple computers. PLINQ (data parallelism) and TPL (task parallelism) fall into this category.
Asynchronous means that tasks run without blocking each other. F#'s async expression, Rx, Begin/End pattern are all APIs for async programming.
Concurrency is a concept more broad than parallelization and asynchrony.
Concurrency means that several "tasks" run at the same time, interacting with each other. But these "tasks" don't have to run on separate physical computing units, as is meant in parallelization. For example, multitasking operating systems can execute multiple processes concurrently even on single-core single-CPU computers, using time slices.
Concurrency can be achieved, for example, with the Actor model and message passing (e.g. F#'s MailboxProcessor, Erlang-style processes via Retlang in .NET).
Threads are a relatively low-level concept compared to the concepts above. Threads are tasks running within a process, running concurrently and managed directly by the operating system's scheduler. You can implement parallelization when the operating system maps each thread to a separate core, or an Actor model by implementing message queuing, routing, etc on each thread.
There are also some .NET libraries for data parallel programming which target the Graphics Processing Unit (GPU) including:
Microsoft Accelerator is for data parallel programming and can target either the GPU or multi-core processors.
Brama is for LINQ style data transformations that run on the GPU.
CUDA.NET provides a wrapper that allows CUDA to be used from .NET programs.
There is also the Reactive Extensions for .NET (Rx)
Rx is basically LINQ queries for events. It allows you to process and combine asynchronous data streams in the same way LINQ allows you to work with collections. So you would probably use it in conjunction with the other parallel technologies as a way of bringing the results of your parallel operations together without having to worry about locks and other low-level threading primitives.
Expert to Expert: Brian Beckman and Erik Meijer - Inside the .NET Reactive Framework (Rx) gives a good overview of what Rx is all about.
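As a rough flavor of that "LINQ over events" style (this assumes the Rx assemblies are referenced; the namespaces holding the Observable operators differ between Rx releases):

using System;
using System.Reactive.Linq; // namespace varies by Rx version

class RxSketch
{
    static void Main()
    {
        // Treat a timer as an asynchronous stream and query it like a collection.
        IObservable<long> ticks = Observable.Interval(TimeSpan.FromMilliseconds(250));

        IDisposable subscription = ticks
            .Where(t => t % 2 == 0)          // filter, just like LINQ to Objects
            .Select(t => "even tick " + t)   // project
            .Subscribe(Console.WriteLine);   // react as values arrive

        Console.ReadLine();
        subscription.Dispose();
    }
}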
EDIT: Another library worth mention is the Concurrency and Coordination Runtime (CCR), it's been around for a long time (earlier than '06) and is shipped as part of the Microsoft Robotics Studio.
Rx has a lot of the same cool ideas that the CCR has inside it, but with a much nicer API in my opinion. There's still some interesting stuff in the CCR though so it might be worth checking out. There's also a distributed services framework that works with the CCR that might make it useful depending on what you're doing.
Expert to Expert: Meijer and Chrysanthakopoulos - Concurrency, Coordination and the CCR
One more is the new Task Parallel Library in .NET 4.0, which is along the lines of what you've already discovered, but this may be an interesting read:
Task Parallel Library
The two major ways to do parallel work are threads and the new task-based library, the TPL.
The asynchronous programming you mention is nothing more than a new thread from the thread pool.
PLINQ, Rx and the others mentioned are actually extensions sitting on top of the new task scheduler.
The best article explaining the new architecture of the task scheduler and all the libraries on top of it (Visual Studio 2010 and the new .NET 4.0 task-based parallelism) is this one by Steve Teixeira, Product Unit Manager for Parallel Developer Tools at Microsoft:
http://www.drdobbs.com/visualstudio/224400670
Dr. Dobb's also has a dedicated parallel programming section here: http://www.drdobbs.com/go-parallel/index.jhtml
The main difference between, say, threads and the new task-based parallel programming is that you no longer need to think in terms of threads or manage pools, the underlying OS, and the hardware. The TPL takes care of that; you just use tasks. That is a huge change in the way you do parallel work at any level of abstraction.
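A minimal sketch of that shift in thinking (the Work method is just a placeholder):

using System;
using System.Threading;
using System.Threading.Tasks;

class ThreadVsTask
{
    static void Work(int i) { Console.WriteLine("item {0}", i); }

    static void Main()
    {
        // Thread level: you create, start and join the OS thread yourself.
        var thread = new Thread(() => Work(0));
        thread.Start();
        thread.Join();

        // Task level: you describe the work; the TPL's scheduler decides
        // how many threads to use and where to run them.
        Task task = Task.Factory.StartNew(() => Work(1));
        task.Wait();

        // Data-parallel loop built on top of the same scheduler.
        Parallel.For(2, 10, Work);
    }
}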
So in .NET you actually do not have many choices:
Thread
New task based, task scheduler.
Obviously the task-based approach is the way to go.
Recently I was blogging about the oft over-used idea of multi-threading in .NET. I put together a starting list of "APIs you should know first":
Thread
ThreadPool
ManualResetEvent
AutoResetEvent
EventWaitHandle
WaitHandle
Monitor
Mutex
Semaphore
Interlocked
BackgroundWorker
AsyncOperation
lock Statement
volatile
ThreadStaticAttribute
Thread.MemoryBarrier
Thread.VolatileRead
Thread.VolatileWrite
Then I started thinking maybe not all of these are important. For instance, Thread.MemoryBarrier could probably be safely removed from the list. Add to that the obvious statement that I don't know everything and I decided to turn here.
So this is a broad and opinionated question, but I'm curious about the community's collective opinion on a best-practice study list. Essentially I'm looking for a short hit list that new and/or junior developers can work from when beginning to write multi-threaded code in C#.
So without further commentary, what should be added or removed from the above list?
I think you need to classify the levels of multithreading rather than the individual APIs. Depending on your threading needs, you may or may not need to know certain subsets of the APIs you have listed. If I were to organize them, I would do it along these lines:
Basic Multi-threading:
Requirements
Need to run concurrent processes.
Do not need access to shared resources.
Maximizing utilization of available hardware resources.
API Knowledge
Thread
ThreadPool
BackgroundWorker
Asynchronous Operations/Delegates
Shared Resource Multi-threading:
Requirements
Basic Multi-threading requirements
Use of shared resources
API Knowledge
Basic Multi-threading APIs
lock()/Monitor (they are the same thing: the lock statement compiles down to Monitor.Enter and Exit; see the sketch after this list)
Interlocked
ReaderWriterLock and variants
volatile
Multi-thread Synchronization
Requirements
Basic Multi-threading requirements
Shared Resource Multi-threading requirements
Synchronization of behavior across multiple threads
API Knowledge
Basic Multi-threading APIs
Shared Resource Multi-threading APIs
WaitHandle
Manual/AutoResetEvent
Mutex
Semaphore
Concurrent Shared Resource Multi-threading (hyperthreading)
Requirements
Basic Multi-threading requirements
Shared Resource Multi-threading requirements
Concurrent read/write access to shared collections
API Knowledge
Basic Multi-threading APIs
Shared Resource Multi-threading APIs
Parallel Extensions to .NET/.NET 4.0
The rest of the APIs I would simply lump into general threading knowledge: stuff that can be picked up as needed, since it fits into all of these levels. Things like MemoryBarrier are pretty fringe, and there are usually better ways to accomplish the same thing with less ambiguity about behavior and meaning.
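As a side note on the lock()/Monitor point above, here is a minimal sketch of that equivalence (the Account class is invented for illustration):

using System.Threading;

class Account
{
    private readonly object _sync = new object();
    private decimal _balance;

    // The lock statement...
    public void Deposit(decimal amount)
    {
        lock (_sync)
        {
            _balance += amount;
        }
    }

    // ...is essentially compiler shorthand for Monitor.Enter/Exit in a try/finally.
    public void Withdraw(decimal amount)
    {
        Monitor.Enter(_sync);
        try
        {
            _balance -= amount;
        }
        finally
        {
            Monitor.Exit(_sync);
        }
    }
}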
IMHO, BackgroundWorker should be at the very top of your list.
It's fairly simple to explain, can be used in most cases, and delivers a lot of "bang for the buck". For somebody new to threading, it gives them something to actually work with without having to spend hours studying before they get their first thing right.
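For reference, a minimal sketch of typical BackgroundWorker usage (the loop inside DoWork is a stand-in for real work):

using System;
using System.ComponentModel;
using System.Threading;

class BackgroundWorkerSketch
{
    static void Main()
    {
        var worker = new BackgroundWorker { WorkerReportsProgress = true };

        worker.DoWork += (s, e) =>
        {
            // Runs on a thread-pool thread; never touch UI controls here.
            for (int i = 1; i <= 10; i++)
            {
                Thread.Sleep(100); // placeholder for real work
                worker.ReportProgress(i * 10);
            }
            e.Result = "done";
        };

        // In WinForms/WPF these two events are raised back on the UI thread.
        worker.ProgressChanged += (s, e) =>
            Console.WriteLine("{0}%", e.ProgressPercentage);
        worker.RunWorkerCompleted += (s, e) =>
            Console.WriteLine(e.Error != null ? e.Error.Message : (string)e.Result);

        worker.RunWorkerAsync();
        Console.ReadLine(); // keep this console sketch alive while the worker runs
    }
}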
How about the ParameterizedThreadStart and ThreadStart delegates?
.NET 4.0 will be bringing some new tools to this problem; many of these tools will hide the details of the low-level threading APIs. You can start getting ready to leverage this new functionality today by doing things such as using LINQ or learning functional-programming techniques.
I'd recommend looking at the Parallel Extensions coming in .NET 4.0, the ThreadPool, BackgroundWorker (if they're working in WinForms) and the lock keyword. Those provide most of the functionality that you'll need from multi-threading, whilst still being a relatively safe environment in which to experiment. Also, you should add the Dispatcher from WPF to your list; developers are more likely to come across that than VolatileRead.