I have a situation where I have a polling thread for a TcpClient (is that the best plan for a discrete TCP device?) which aggregates messages and occasionally responds to them by firing off events. The event producer really doesn't care much if the thread is blocked for a long time, but the consumer's design is such that I'd prefer to have it invoke the handlers on a single worker thread that I've got for handling a state machine.
The question then is this: how should I best manage the creation, configuration (thread name, is background, etc.), lifetime, and marshaling of calls for these threads using the Task library? I'm somewhat familiar with doing this explicitly using the Thread type, but whenever possible my company prefers to do what we can just through the use of Task.
Edit: I believe what I need here will be based around a SynchronizationContext on the consumer's type that ensures that tasks are scheduled on a single thread tied to that context.
The question then is this: how should I best manage the creation, configuration (thread name, is background, etc.), lifetime, and marshaling of calls for these threads using the Task library?
This sounds like a perfect use case for BlockingCollection<T>. This class is designed specifically for producer/consumer scenarios: it allows any number of threads to add to the collection (which acts like a thread-safe queue), and one (or more) thread or task to call blockingCollection.GetConsumingEnumerable() to "consume" the items.
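A minimal sketch of that setup, where Message and HandleMessage are hypothetical stand-ins for your message type and state-machine handler:

using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var queue = new BlockingCollection<Message>(boundedCapacity: 1000);

// consumer: one long-running task, so every handler runs on the same thread
var consumer = Task.Factory.StartNew(() =>
{
    Thread.CurrentThread.Name = "StateMachineWorker"; // configure the worker here
    foreach (var msg in queue.GetConsumingEnumerable()) // blocks until an item arrives
        HandleMessage(msg);
}, TaskCreationOptions.LongRunning); // LongRunning = a dedicated (background) thread

// producer (the TCP polling thread) just calls: queue.Add(msg);
// shutdown: queue.CompleteAdding(); then consumer.Wait();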
You could consider using TPL Dataflow, where you set up an ActionBlock<T> that you push messages into from your TCP thread; TPL Dataflow will then take care of the rest, scaling out the processing of the actions as much as your hardware can handle. You can also control exactly how much parallel processing happens by configuring the ActionBlock<T> with a MaxDegreeOfParallelism.
Since processing sometimes can't keep up with the flow of incoming data, you might want to consider "linking" a BufferBlock<T> in front of the ActionBlock<T> to ensure that the TCP processing thread doesn't get too far ahead of what you can actually process. This would have the same effect as using BlockingCollection<T> with a bounded capacity.
Finally, note that I'm linking to the .NET 4.5 documentation because it's easiest, but TPL Dataflow is also available for .NET 4.0 via a separate download; at the time of writing there was no NuGet package for it.
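A minimal sketch of the linked pipeline described above, where Process is a hypothetical stand-in for your message handling:

using System;
using System.Threading.Tasks.Dataflow;

var buffer = new BufferBlock<byte[]>(
    new DataflowBlockOptions { BoundedCapacity = 1000 }); // bounds how far the producer can get ahead

var worker = new ActionBlock<byte[]>(
    data => Process(data),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

buffer.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });

// TCP thread: buffer.SendAsync(data).Wait(); // blocks (backpressure) when the buffer is full
// shutdown: buffer.Complete(); worker.Completion.Wait();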
Related
I'm currently picking up C# again and developing a simple application that sends broadcast messages which, when received, are shown on a Windows Form.
I have a discovery class with two threads, one that broadcasts every 30 seconds, the other thread listens on a socket. It is in a thread because of the blocking call:
if (listenSocket.Poll(-1, SelectMode.SelectRead))
The first thread works much like a timer in a class library, it broadcasts the packet and then sleeps for 30 seconds.
Now in principle it works fine: when a packet is received I raise an event and the WinForm places it in a list. The problems start with the form, though, because the main UI thread requires Invoke. Right now I only have two threads, but this doesn't seem like the most effective approach in the long run; it will become complex as the number of threads grows.
I have explored Tasks, but these seem to be more oriented towards one-off long-running operations (much like the BackgroundWorker for a form).
Most threading examples I find just report to the console and don't run into the problems of Invoke and locking of variables.
As I'm using .NET 4.5, should I move to Tasks or stick with threads?
Async programming will still delegate some aspects of your application to a different thread (from the thread pool); if you try to update the GUI from such a thread you are going to have problems similar to the ones you have today with regular threads.
However, async/await lets you delegate work to a background thread and then continue on the GUI thread once that operation is finished: by default, await captures the UI's SynchronizationContext and resumes there, which effectively allows you to update the GUI without Invoke while keeping it responsive. (ConfigureAwait(false) is how you opt out of that capture when you don't need to return to the UI thread.) There are other techniques as well.
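For example, a minimal sketch of this in a WinForms handler, where listening, ReceivePacket and packetListBox are hypothetical stand-ins for your own members:

private async void listenButton_Click(object sender, EventArgs e)
{
    while (listening)
    {
        // the blocking receive runs on a threadpool thread...
        string packet = await Task.Run(() => ReceivePacket());

        // ...but after the await we are back on the UI thread: no Invoke needed
        packetListBox.Items.Add(packet);
    }
}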
If you don't know the async/await mechanism yet, it will take some investment of your time to learn all these new things. But you'll find it very rewarding.
But it is up to you to decide if you are willing to spend a few days learning and experimenting with a technology that is new to you.
Google around a bit on async/await; there are some excellent articles, for instance from Stephen Cleary: http://blog.stephencleary.com/2012/02/async-and-await.html
Firstly, if you're worried about scalability you should probably start off with an approach that scales easily. The ThreadPool would work nicely. Tasks are based on the ThreadPool as well, and they allow for somewhat more complex situations, like tasks/threads firing in a sequence (also based on conditions), synchronization, etc. In your case (server and client) this seems unneeded.
Secondly, it looks to me like you are worried about a bottleneck scenario where more than one thread will try to access a common resource like the UI or a DB. With DBs, don't worry: they can handle concurrent access well. But in the case of the UI, or another not-multithread-friendly resource, you have to manage parallel access yourself. I would suggest something like BlockingCollection, which is a nice way to implement the "many producers, one consumer" pattern. This way you can have multiple threads adding items and just one thread reading them from the collection and passing them on to the single-threaded resource like the UI.
BTW, Tasks can also be long-running, i.e. run loops; see the documentation for TaskCreationOptions.LongRunning.
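For instance, your 30-second broadcast loop fits a long-running task naturally. A minimal sketch, where SendBroadcast is a hypothetical stand-in for your broadcast code:

var cts = new CancellationTokenSource();

var broadcastTask = Task.Factory.StartNew(() =>
{
    while (!cts.Token.IsCancellationRequested)
    {
        SendBroadcast(); // hypothetical: send the UDP broadcast packet

        // sleep for 30 seconds, but wake up immediately if cancellation is requested
        cts.Token.WaitHandle.WaitOne(TimeSpan.FromSeconds(30));
    }
}, cts.Token, TaskCreationOptions.LongRunning, TaskScheduler.Default);

// shutdown: cts.Cancel(); broadcastTask.Wait();

The LongRunning hint tells the scheduler to give the loop its own dedicated thread instead of tying up a threadpool thread.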
I'm writing a service that has to read tasks from an AMQP message queue and perform a synchronous action based on the message type. These actions might be to send an email or hit a web service, but will generally be on the order of a couple hundred milliseconds assuming no errors.
I want this to be extensible so that other actions can be added in the future. Either way, the volume of messages could be quite high, with bursts of hundreds per second coming in.
I'm playing around with several designs, but my questions are as follows:
What type of threading model should I go with? Do I:
a) Go with a single thread to consume from the queue and put tasks on a thread pool? If so, how do I represent those tasks?
b) Create multiple threads to host their own consumers and have them handle the task synchronously?
c) Create multiple threads to host their own consumers and have them all register a delegate to handle the tasks as they come in?
In the case of a or c, what's the best way to have the spawned thread communicate back with the main thread? I need to ack the message that came off the queue. Do I raise an event from the spawned thread that the main thread listens to?
Is there a guideline as to how many threads I should run, given x cores? Is it x, 2*x? There are other services running on this system too.
You should generally* avoid direct thread programming in favor of the Task Parallel Library and concurrent collections built into .NET 4.0 and higher. Fortunately, the producer/consumer problem you described is common and Microsoft has a general-purpose tool for this: the BlockingCollection. This article has a good summary of its features. You may also refer to this white paper for performance analysis of the BlockingCollection<T> (among other things).
However, before pursuing BlockingCollection<T> or an equivalent, given the scenario you described, why not go for the simple solution of using Tasks? The TPL gives you asynchronous execution of tasks with a lot of extras like cancellation and continuations. If, however, you need more advanced lifecycle management, then go for something like BlockingCollection<T>.
* By "generally", I'm insinuating that the generic solution will not necessarily perform the best for your specific case as it's almost certain that a properly designed custom solution will be better. As with every decision, perform the cost/benefit analysis.
In the typical .NET world, we use the event-based asynchronous pattern (event handlers) for most I/O operations. More specifically, as far as I know, I/O completion ports were introduced to improve the efficiency of scheduling threads (as in the ThreadPool), so that we don't need to manually maintain (create and destroy) the threads that handle the massive number of I/O responses.
Meanwhile, I had naturally assumed that waiting for an I/O response doesn't need to block any thread on a modern Windows system, thanks to hardware interrupts - until I saw some pieces of C++ code in my recent project, and even some sample code on the web.
(I don't have any C++ experience.)
The first code piece is about listening on a serial port; the pseudo-code (which I've written in C# style) looks like this:
// poll the status in a loop
while (serialPort.Buffer.Count == 0)
{
    Thread.Sleep(100); // busy-wait: check again every 100 ms
}
byte[] data = serialPort.Buffer;
// process the actual data...
The second code piece is about the usage of I/O completion port in C++:
while (::GetQueuedCompletionStatus(port,
&bytesCopied,
&completionKey,
&overlapped,
INFINITE))
{
if (0 == bytesCopied && 0 == completionKey && 0 == overlapped)
{
break;
}
else
{
// Process completion packet
}
}
Obviously, both of these block a thread.
So my questions are:
Why doesn't this code use an event-based, non-blocking approach?
If .NET uses something like the second sample under the hood, are threads actually blocked during I/O operations after all?
(Maybe a little off-topic) Do .NET I/O callbacks allow concurrent re-entry while a previous callback is still executing? (From my limited tests, the answer is no.) And why?
Well, first, blocking is not, in itself, bad. The 'main' GUI thread in a Windows app fires its 'OnClick' etc. events in response to messages received from a Windows message queue - a blocking producer-consumer queue. When no messages are received, the thread blocks on the queue. The same goes for most 'non-blocking' select()-based servers - select() is a blocking call (though it can be made to poll by setting a low or zero timeout, which is a poor design).
1) Asynchronous designs are intrinsically more complex. Per-socket context data (e.g. buffers) cannot be kept in stack-based auto variables; it must be maintained across events, either by keeping a global container of context objects (which have to be looked up by socket handle when the events fire) or by issuing context objects with the I/O requests and retrieving them from callback parameters in the events. Asynchronous designs should also be totally asynchronous - calling anything that might block for any extended period has to be avoided if possible. Calls to opaque external libraries, DB queries and the like can be troublesome in this respect, blocking the supposedly asynchronous thread and preventing it from responding to events.
The first code snippet is just horrible, and I struggle to find any justification for it at all. The Sleep() polling loop has a built-in average 50ms latency in responding to input (half the 100ms sleep interval). Just mega-lame when better sync and async solutions exist. A dedicated read thread, queued APCs (completion routines), and IOCP are all available for serial ports.
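In .NET, for example, SerialPort raises a DataReceived event that removes the need for the polling loop entirely. A minimal sketch (the port name and baud rate are made up):

using System.IO.Ports;

var port = new SerialPort("COM3", 9600);
port.DataReceived += (sender, e) =>
{
    // fires on a threadpool thread when data arrives; no Sleep() polling
    int count = port.BytesToRead;
    var data = new byte[count];
    port.Read(data, 0, count);
    // hand the data off for processing...
};
port.Open();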
The second code-snippet IS, effectively, event-based async. You could make it look even more 'event-based' by having the handler threads call an event-handler with the parameters returned by the completion message.
IOCP is the preferred high-performance I/O system for Windows. It can handle many types of I/O operations, and its threadpool-based handlers can withstand occasional blocking or lengthy operations without holding up the processing of further I/O completions. Passing user buffers in with the call allows the driver(s) to load them directly in kernel space and removes a layer of copying. What it does not do is avoid the need to maintain context across the asynchronous calls.
Synchronous thread-per-client is commonly used where the requirements for scalability are swamped by the simple in-line code and immunity from blocking calls that are inherent in such designs. Handling serial comms is not something where scalability to thousands of ports is ever an issue.
2) IOCP handler threads block while waiting for completion messages, sure. If there is nothing to be done, threads should block:)
3) They should do. Adding on an extra layer of signaling to ensure that the callbacks are handled serially involves more overhead and adds back in the vulnerability to any kind of blocking in the callback holding up the handling of other callbacks from other IOCP handler threads that would not need to block. Since context is passed in as a parameter, there is no intrinsic requirement for IOCP-driven callbacks to be run in a serial manner. The code in the callback handler can just operate on the passed information in the manner of a state-machine.
That said, I would not be surprised if MS .NET did indeed provide signaling/queueing to enforce serial, non-reentrant callbacks. Insufficiently experienced devs often do things in multithreaded callbacks that they should not, e.g. accessing global/persistent state without any locking, or accessing thread-bound GUI controls directly. Serializing the calls, whether by wrapping them in Windows messages or otherwise, removes this risk at the expense of performance.
Probably because asynchronous programming is hard.
For most I/O operations, .NET exposes both a synchronous and an asynchronous version where it can. So, for example, you have TcpClient.Connect and TcpClient.ConnectAsync/TcpClient.BeginConnect. Anything that starts with "Begin" or ends with "Async" is at least supposed to be async, which means there are no blocked threads. TcpClient.Connect is blocking and hence less scalable.
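A minimal sketch of the contrast (the host and port are made up):

var client = new TcpClient();

// Option 1 - blocking: the calling thread waits until the connection completes
// client.Connect("example.com", 80);

// Option 2 - asynchronous (APM style): no thread is blocked while the connection is in flight
client.BeginConnect("example.com", 80, ar =>
{
    client.EndConnect(ar);
    // connected; this callback runs on a threadpool (IOCP) thread
}, null);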
I'm not really sure, but I think they can. The question is why you would want to do that, and how you would match a callback with its call.
I am building an application where it is possible to monitor some MCU hardware (sensors readings etc) in real time. For the communication I am using a CAN bus.
Basically, I have two threads as of now. One is the main thread where the GUI is running, and the other manages/monitors the communication with the device. So the obvious thing is that I need to pass the data from the communications thread to the GUI thread. But what is the right way to do it? I know how to pass data back to the calling thread when a child thread has finished working, but in this case the communications thread runs all the time.
Of course the communications logic is represented by a separate class (CANManager).
I have a couple of ideas of my own, but I would like to know the "right" way this should be done.
Thanks in advance :)
Generally in any programming language you need to consider a pub-sub architecture for communicating across threads. This means that for each thread A which wishes to send a message to thread B, you should post a 'message' or event from that thread onto a queue, to be consumed by another thread when it is free. If you just google 'Cross Thread communication c#' you will find numerous articles to read over.
Specifically, in .NET the way to invoke a method or delegate on another thread is to use SynchronizationContext. This is common to both Windows Forms and WPF; WPF additionally has a Dispatcher, which is specific to that framework and invokes on the UI thread only.
There are many frameworks, libraries and patterns available for this sort of technique. One of them is the Task Parallel Library. The TPL allows you to create a Task (or Task<TResult>) and invoke it on a threadpool, UI, same, or specific thread. The TPL allows thread marshalling via the use of schedulers; you can use the built-in schedulers or create your own. Schedulers use SynchronizationContext at their heart to do the thread marshalling.
One particularly interesting feature of the TPL is the ability to run a delegate on one thread and then chain multiple operations on other threads, e.g. on completion or on error. I would look into the Task-based Asynchronous Pattern (TAP) and consider returning Task from async methods so you can chain on them using ContinueWith.
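A minimal sketch of that chaining, where canManager.ReadFrame and UpdateSensorDisplay are hypothetical stand-ins for your CANManager API and GUI update code:

// capture this on the UI thread, e.g. in the form's constructor
var uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();

Task.Factory.StartNew(() => canManager.ReadFrame())   // read the bus on a pool thread
    .ContinueWith(t => UpdateSensorDisplay(t.Result), // marshal the result to the UI thread
                  uiScheduler);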
According to this post:
How to write a scalable Tcp/Ip based server
jerrylvl states:
----------*
Processing
When you get the callback from the Begin call you made, it is very important to realise that the code in the callback will execute on the low-level IOCP thread. It is absolutely essential that you avoid lengthy operations in this callback. Using these threads for complex processing will kill your scalability just as effectively as using 'thread-per-connection'.
The suggested solution is to use the callback only to queue up a work item to process the incoming data, that will be executed on some other thread. Avoid any potentially blocking operations inside the callback so that the IOCP thread can return to its pool as quickly as possible. In .NET 4.0 I'd suggest the easiest solution is to spawn a Task, giving it a reference to the client socket and a copy of the first byte that was already read by the BeginReceive call. This task is then responsible for reading all data from the socket that represent the request you are processing, executing it, and then making a new BeginReceive call to queue the socket for IOCP once more. Pre .NET 4.0, you can use the ThreadPool, or create your own threaded work-queue implementation.
----------*
My question is how exactly would I be doing this in .Net 4.0? Could someone please provide me with a code example which would work well in a scalable environment?
Thanks!
My question was answered in more depth here: C# - When to use standard threads, ThreadPool, and TPL in a high-activity server
This goes into specifics on using the TPL, the ThreadPool, and standard threads to perform work items, and on when to use each method.
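For reference, a minimal sketch of the pattern jerrylvl describes above, where ConnectionState and ProcessRequest are hypothetical stand-ins for your own per-connection state and request handling:

using System;
using System.Net.Sockets;
using System.Threading.Tasks;

class ConnectionState
{
    public Socket Socket;
    public byte[] Buffer = new byte[8192];
}

void ReceiveCallback(IAsyncResult ar)
{
    var state = (ConnectionState)ar.AsyncState;
    int bytesRead = state.Socket.EndReceive(ar);
    if (bytesRead == 0) { state.Socket.Close(); return; }

    // copy out the bytes read so far; the IOCP thread must not do the heavy lifting
    var data = new byte[bytesRead];
    Buffer.BlockCopy(state.Buffer, 0, data, 0, bytesRead);

    // hand off immediately so this IOCP thread can return to its pool
    Task.Factory.StartNew(() =>
    {
        ProcessRequest(state.Socket, data); // hypothetical: read the rest of the request and execute it
        state.Socket.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                                  SocketFlags.None, ReceiveCallback, state);
    });
}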