Asynchronous server design - C#

We have a server receiving data from 500-1500 GPS devices. Each device sends a packet containing around 1-4 GPS coordinates every 10-30 seconds. The server is designed asynchronously, with a listener handling connections using BeginAccept/EndAccept and communication using BeginReceive/EndReceive. Once a packet is received, the data is processed and stored in a database.
With few devices (500-700), this takes barely 50 ms, we have fewer than 50 concurrent threads running, and CPU usage is realistic (20-40%). However, when the server is pressured with connections (1000+), the number of threads explodes to 500-600 and CPU usage drops to a few percent. Processing time also increases to several minutes.
Is the asynchronous design bad for this particular scenario, with many small packets being sent at this rate, or might there be a problem in the code?
We have currently had to distribute the load across three servers to accommodate all devices; they are all VMs with 6 CPUs and 4 GB memory hosted on a Hyper-V server.
SOLUTION:
The solution I found from the answers was to immediately schedule the received packet as a task using the .NET Task Parallel Library, as its scheduler is much smarter about distributing work across multiple cores:
void EndReceive(IAsyncResult res)
{
    // Hand the work off to the TPL instead of processing it on the IO callback thread.
    Task.Factory.StartNew((object o) => HandleReceive(o as IAsyncResult),
                          res, TaskCreationOptions.PreferFairness);
}
Now the threads rarely exceed 50.

It sounds like somewhere in your application you're using non-asynchronous IO and blocking on the results of the operation. You may be using proper asynchrony in many places, such as the primary connection between client and server, but perhaps you're not when connecting to a database or something like that. This mixing of async and non-async code is likely why so many threads are being created.
By ensuring you have no blocking IO, you won't have lots of thread pool threads sitting around doing nothing, which appears to be the situation you're in.
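For example, if the receive path stores the coordinates with a synchronous ADO.NET call, that call parks a thread-pool thread for the whole database round trip, and the pool responds by spawning more threads. A minimal sketch of the difference, assuming SQL Server storage (the table name and the StoreCoordinates helpers are invented for illustration):

using System.Data.SqlClient;
using System.Threading.Tasks;

// Blocking version: the thread sits idle while SQL Server does the work.
void StoreCoordinatesBlocking(SqlConnection conn, byte[] payload)
{
    using (var cmd = new SqlCommand("INSERT INTO Coordinates (Payload) VALUES (@p)", conn))
    {
        cmd.Parameters.AddWithValue("@p", payload);
        cmd.ExecuteNonQuery();            // holds a thread-pool thread for the round trip
    }
}

// Asynchronous version: the thread returns to the pool while the IO is in flight.
async Task StoreCoordinatesAsync(SqlConnection conn, byte[] payload)
{
    using (var cmd = new SqlCommand("INSERT INTO Coordinates (Payload) VALUES (@p)", conn))
    {
        cmd.Parameters.AddWithValue("@p", payload);
        await cmd.ExecuteNonQueryAsync(); // no thread is held while waiting
    }
}

Under load, the blocking version multiplies threads exactly the way the question describes; the async version keeps the thread count near the number of cores.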

What kind of operations are you doing on the server?
If they are CPU-bound, it's useless to have more threads than cores, and adding more may clutter your server with a bunch of threads fighting like dogs ;)
In that case you'd likely have better luck with simple processing loops, one per core.

I have never worked with this many requests at the same time, but what you could try is creating as many threads as you have CPU cores and then implementing a queueing system. Your threads would consume the queue one device's coordinates at a time. This way, I guess, your CPU would be used at full throttle...
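A minimal sketch of that idea, with one long-lived consumer per core draining a shared queue (GpsPacket and Process are placeholders, not from the original code):

using System;
using System.Collections.Concurrent;
using System.Threading;

class GpsPacket { /* placeholder for a device payload */ }

class PacketQueue
{
    private readonly BlockingCollection<GpsPacket> _queue = new BlockingCollection<GpsPacket>();

    public void Start()
    {
        // One consumer loop per core; the receive callbacks only enqueue.
        for (int i = 0; i < Environment.ProcessorCount; i++)
        {
            new Thread(ConsumeLoop) { IsBackground = true }.Start();
        }
    }

    public void Enqueue(GpsPacket packet)
    {
        _queue.Add(packet);
    }

    private void ConsumeLoop()
    {
        // Blocks cheaply when the queue is empty; no busy waiting.
        foreach (var packet in _queue.GetConsumingEnumerable())
        {
            Process(packet); // parse the coordinates and write them to the database
        }
    }

    private void Process(GpsPacket packet) { /* placeholder */ }
}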

Related

More TCPClients per Thread or More Threads

I need some guidance on a project we are developing. When triggered, my program needs to contact 1,000 devices by TCP and exchange about 200 bytes of information. All the clients are wireless on a private network. The majority of the time the program will be sitting idle, but then needs to send these messages as quickly as possible. I have come up with two possible methods:
Method 1
Use thread pooling to establish a number of worker threads and have these threads process their way through the 1,000 conversations. One thread handles one conversation until completion. The number of threads in the thread pool would then be tuned for best use of resources.
Method 2
A number of threads would be used to handle multiple conversations per thread. For example, a thread would open 10 socket connections, start the conversations, and then use asynchronous methods to wait for responses. As each conversation completed, a new device would be contacted.
Method 2 looks like it would be more effective, in that operations wouldn't have to wait while the device responded. It would also save the overhead of starting and stopping all those threads.
Am I headed in the right direction here? What am I missing or not considering?
There is a well-established way to deal with this problem. Simply use async IO. There is no need to maintain any threads at all. Async IO uses no threads while the IO is in progress.
Thanks to await, doing this is quite easy.
The select/poll model is obsolete in .NET.
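Roughly what that looks like with await; the 200-byte response framing and the port number are invented for illustration:

using System.Linq;
using System.Net.Sockets;
using System.Threading.Tasks;

static async Task ExchangeAsync(string host, int port, byte[] request)
{
    using (var client = new TcpClient())
    {
        await client.ConnectAsync(host, port);
        var stream = client.GetStream();
        await stream.WriteAsync(request, 0, request.Length);

        var response = new byte[200];   // ~200 bytes per the question
        int read = 0;
        while (read < response.Length)
        {
            int n = await stream.ReadAsync(response, read, response.Length - read);
            if (n == 0) break;          // peer closed the connection
            read += n;
        }
    }
}

// All 1,000 conversations in flight at once; no thread is blocked while IO is pending.
static Task ContactAllAsync(string[] hosts, byte[] request)
{
    return Task.WhenAll(hosts.Select(h => ExchangeAsync(h, 5000, request)));
}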

How is the ASP.NET thread pool managed, priority-wise?

We have a computation-intensive web application that serves only a few dozen users at any given time, at most. However, we have encountered situations where a task performed by just one user can prevent the entire site from processing any other request.
I read that the thread pool has up to 128 threads by default. How can a single thread deprive the remaining threads of CPU time? How does that work? Is the operating system judging that data access, for example, requires higher priority, since the TCP connection to the SQL server should be kept reliable in case a large dataset is being fetched or saved?
Can someone with deeper insight into how things actually work enlighten me on this?
What about multi-core CPUs? We have 8 CPU cores on the server. Will they participate in the processing, or do we have to increase the number of processes / actively engage in parallel processing to take advantage of the multi-core environment?

How to improve TCP/IP server scalability when restricted by application thread pool

I have a TCP/IP server written in C#/.NET which can easily have 10,000 connections at once. However, when a callback is received from a socket, it is dealt with by a new thread from the application thread pool. This means that the real limit on concurrent communication comes down to the number of threads in the thread pool: if those 10,000 connections all attempt to send data at the same time, the majority will have to wait while the thread pool works through them as fast as it can. Can anyone share their experience with high-performance socket services and advise how a large corporation would go about ensuring the 10,000 connections can not only be connected at the same time, but can also communicate at the same time? Thanks
Don't process the packets inline in the callback. Do the absolute minimum work there, and then hand them off to a separate worker thread pool via a producer-consumer queue that (ideally) never blocks the producer threads, which are your socket listeners. BlockingCollection<T> may be useful here.
You have to be careful that the queue does not grow unbounded. If your consumers are a lot slower than your producers and the queue grows under normal load, you have a problem, to which throttling the network receives is the obvious solution, despite its undesirability.
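A sketch of that shape, using a bounded BlockingCollection<T> so the queue cannot grow without limit; once it fills up, Add blocks and effectively throttles the receives (the Packet type, capacity, and worker count are illustrative):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Packet { /* placeholder */ }

class PacketPipeline
{
    // Bounded: Add blocks once 10,000 items are queued, throttling the producers.
    private readonly BlockingCollection<Packet> _queue =
        new BlockingCollection<Packet>(boundedCapacity: 10000);

    public PacketPipeline(int workers)
    {
        for (int i = 0; i < workers; i++)
        {
            Task.Factory.StartNew(Consume, TaskCreationOptions.LongRunning);
        }
    }

    // Called from the socket callback: do the minimum and get out.
    public void OnPacketReceived(Packet p)
    {
        _queue.Add(p);
    }

    private void Consume()
    {
        foreach (var p in _queue.GetConsumingEnumerable())
        {
            // full parsing / database work happens here, off the IO threads
        }
    }
}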
You're making a conceptual mistake here. Regardless of how many threads you have, data always has to wait unless you have one CPU core per connection. Scalability is not about having unlimited parallelism, but about being able to handle a lot of connections and keep the CPU at full power.
The thread pool is perfectly sized for that. Once the CPU reaches full utilization, you can't do anything more anyway.
"and advise how a large corporation would go about ensuring the 10,000 connections can not only be connected at the same time, but can also communicate at the same time?"
With MANY computers, totaling something like 500 processor cores. The trick is: what latency is acceptable? You don't need instant communication. You're trying to solve this from the wrong end.

Too many Tasks causes SQL db to timeout

My problem is that I'm apparently using too many tasks (threads?) that call a method that queries a SQL Server 2008 database. Here is the code:
for (int i = 0; i < 100000; i++)
{
    Task.Factory.StartNew(() => MethodThatQueriesDataBase()).ContinueWith(t => OtherMethod(t));
}
After a while I get a SQL timeout exception. I want to keep the actual number of tasks lower than 100,000, with a buffer of, say, "no more than 10 at a time". I know I can manage my own threads using the ThreadPool, but I want to be able to use the beauty of the TPL with ContinueWith.
I looked at Task.Factory.Scheduler.MaximumConcurrencyLevel, but it has no setter.
How do I do that?
Thanks in advance!
UPDATE 1
I just tested the LimitedConcurrencyLevelTaskScheduler class (pointed out by Skeet) and it's still doing the same thing (SQL timeout).
BTW, this database receives more than 800,000 events per day and has never had crashes or timeouts from those. It seems weird that this would cause them.
You could create a TaskScheduler with a limited degree of concurrency, as explained here, then create a TaskFactory from that, and use that factory to start the tasks instead of Task.Factory.
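With the LimitedConcurrencyLevelTaskScheduler from the MSDN ParallelExtensionsExtras samples (the class the linked article describes; it is not built into the framework), the loop would look roughly like this:

// At most 10 tasks execute at a time; the rest wait in the scheduler's queue.
var scheduler = new LimitedConcurrencyLevelTaskScheduler(10);
var factory = new TaskFactory(scheduler);

for (int i = 0; i < 100000; i++)
{
    factory.StartNew(() => MethodThatQueriesDataBase())
           .ContinueWith(t => OtherMethod(t), scheduler); // keep continuations throttled too
}

Note that this only limits how many tasks run concurrently; if the timeouts come from how connections are handled inside MethodThatQueriesDataBase, throttling alone won't fix them, which matches the update above.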
Tasks are not 1:1 with threads - tasks are assigned threads for execution out of a pool of threads, and the pool of threads is normally kept fairly small (number of threads == number of CPU cores) unless a task/thread is blocked waiting for a long-running synchronous result - such as perhaps a synchronous network call or file I/O.
So spinning up 10,000 tasks should not result in the production of 10,000 actual threads. However, if every one of those tasks immediately dives into a blocking call, then you may wind up with more threads, but it still shouldn't be 10,000.
What may be happening here is you are overwhelming the SQL db with too many requests all at once. Even if the system only sets up a handful of threads for your thousands of tasks, a handful of threads can still cause a pileup if the destination of the call is single-threaded. If every task makes a call into the SQL db, and the SQL db interface or the db itself coordinates multithreaded requests through a single thread lock, then all the concurrent calls will pile up waiting for the thread lock to get into the SQL db for execution. There is no guarantee of which threads will be released to call into the SQL db next, so you could easily end up with one "unlucky" thread that starts waiting for access to the SQL db early but doesn't get into the SQL db call before the blocking wait times out.
It's also possible that the SQL back-end is multithreaded, but limits the number of concurrent operations due to licensing level. That is, a SQL demo engine only allows 2 concurrent requests but the fully licensed engine supports dozens of concurrent requests.
Either way, you need to do something to reduce your concurrency to more reasonable levels. Jon Skeet's suggestion of using a TaskScheduler to limit the concurrency sounds like a good place to start.
I suspect there is something wrong with the way you're handling DB connections. Web servers can have thousands of concurrent page requests running, all in various stages of SQL activity. I'm betting that the attempts to reduce the concurrent task count are really masking a different problem.
Can you profile the SQL connections? Check out perfmon to see how many active connections there are. See if you can grab, use, and release connections as quickly as possible.

Are Socket.*Async methods threaded?

I'm currently trying to figure out the best way to minimize the number of threads I use in a TCP master server, in order to maximize performance.
As I've been reading a lot recently with the new async features of C# 5.0, asynchronous does not necessarily mean multithreaded. It could mean work separated into smaller chunks of finite-state objects, processed alongside other operations by alternating. However, I don't see how this could be done in networking, since I'm basically "waiting" for input (from the client).
Therefore, I wouldn't use ReceiveAsync() for all my sockets, since it would just be creating and ending threads continuously (assuming it does create threads).
Consequently, my question is more or less: what architecture can a master server take without having one "thread" per connection?
Side question for bonus coolness points: why is having multiple threads bad, considering that having more threads than processing cores simply makes the machine "fake" multithreading, just like any other asynchronous method would?
No, you would not necessarily be creating threads. There are two possible ways you can do async without setting up and tearing down threads all the time:
You can have a "small" number of long-lived threads, and have them sleep when there's no work to do (this means that the OS will never schedule them for execution, so the resource drain is minimal). Then, when work arrives (i.e. Async method called), wake one of them up and tell it what needs to be done. Pleased to meet you, managed thread pool.
In Windows, the most efficient mechanism for async is I/O completion ports which synchronizes access to I/O operations and allows a small number of threads to manage massive workloads.
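In .NET, the SocketAsyncEventArgs pattern sits closest to that completion-port model. A bare-bones receive loop, with pooling and error handling trimmed for brevity, might look like this:

using System;
using System.Net.Sockets;

static void StartReceive(Socket socket)
{
    var args = new SocketAsyncEventArgs();
    args.SetBuffer(new byte[4096], 0, 4096);
    args.UserToken = socket;
    args.Completed += OnReceiveCompleted;

    // ReceiveAsync returns false when the operation completed synchronously;
    // in that case the Completed event will not fire, so handle it inline.
    if (!socket.ReceiveAsync(args))
        OnReceiveCompleted(socket, args);
}

static void OnReceiveCompleted(object sender, SocketAsyncEventArgs args)
{
    var socket = (Socket)args.UserToken;
    if (args.SocketError == SocketError.Success && args.BytesTransferred > 0)
    {
        // process args.Buffer[0..args.BytesTransferred) here, then post the next receive
        if (!socket.ReceiveAsync(args))
            OnReceiveCompleted(socket, args);
    }
}

A production server would pool the SocketAsyncEventArgs instances rather than allocating one per connection, but the shape is the same: no thread exists per connection; completions are dispatched onto a small pool of IO threads.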
Regarding multiple threads:
Having multiple threads is not bad for performance if:
- the number of threads is not excessive
- the threads do not oversaturate the CPU
If the number of threads is excessive then obviously we are taxing the OS with having to keep track of and schedule all these threads, which uses up global resources and slows it down.
If the threads are CPU-bound, then the OS will need to perform much more frequent context switches in order to maintain fairness, and context switches kill performance. In fact, with user-mode threads (which all highly scalable systems use -- think RDBMS) we make our lives harder just so we can avoid context switches.
Update:
I just found this question, which lends support to the position that you can't say how many threads are too much beforehand -- there are just too many unknown variables.
Seems like the *Async methods use IOCP (by looking at the code with Reflector).
Jon's answer is great. As for the "side question"... see http://en.wikipedia.org/wiki/Amdahl%27s_law. Amdahl's law says that serial code quickly diminishes the gains to be had from parallel code. We also know that thread coordination (scheduling, context switching, etc.) is serial - so at some point, more threads means so many serial steps that the parallelization benefits are lost and you get a net negative performance. This is tricky stuff; that's why there is so much effort going into letting .NET manage threads while we define "tasks" and let the framework decide which thread to run them on. The framework can switch between tasks much more efficiently than the OS can switch between threads, because the OS has a lot of extra things it needs to worry about when doing so.
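For reference, Amdahl's law bounds the speedup S from N processors when only a fraction P of the work can be parallelized:

S(N) = \frac{1}{(1 - P) + \frac{P}{N}}

As N grows, S approaches 1/(1 - P), so the serial fraction (here, the thread coordination overhead) sets a hard ceiling no matter how many threads you add.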
Asynchronous work can be done without one-thread-per-connection or a thread pool, using OS support for select or poll (Windows supports this, and it is exposed via Socket.Select). I am not sure of the performance on Windows, but this is a very common idiom elsewhere.
One thread is the "pump" that manages the IO connections, monitors changes to the streams, and dispatches messages to/from other threads (conceivably 0...n, depending upon the model). Approaches with 0 or 1 additional threads fall into the "event machine" category, like Twisted (Python) or POE (Perl). With more than one thread, the callers form an "implicit thread pool" (themselves) and basically just offload the blocking IO.
There are also approaches like actors, continuations, or fibers, exposed in the underlying models of some languages, which alter how the basic problem is approached - don't wait, react.
Happy coding.
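For completeness, a minimal single-threaded pump over Socket.Select might look like the sketch below (as noted, this is a common idiom elsewhere, though likely not the fastest option on Windows):

using System.Collections.Generic;
using System.Net.Sockets;

static void PumpLoop(List<Socket> clients)
{
    var buffer = new byte[4096];
    while (true)
    {
        // Select trims the list down to the sockets that are readable,
        // so pass a fresh copy on every iteration.
        var readable = new List<Socket>(clients);
        Socket.Select(readable, null, null, 1000000 /* microseconds */);

        foreach (var s in readable)
        {
            int n = s.Receive(buffer);
            if (n == 0) clients.Remove(s);    // peer closed the connection
            else Dispatch(buffer, n);         // hand off to 0..n worker threads
        }
    }
}

static void Dispatch(byte[] data, int count) { /* placeholder */ }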
