Performance of ReceiveAsync vs. BeginReceive - c#

I'm currently programming a client application and I'm wondering whether I should use the Socket class' ReceiveAsync or BeginReceive method. I have been using the latter so far; however, it seems to stress the CPU quite a bit. Here is what my receive loop basically looks like:
private void socket_ReceiveCallback(IAsyncResult result_)
{
    // does nothing else at the moment
    socket.EndReceive(result_);
    byte[] buffer = (byte[])result_.AsyncState;
    // receive new packet
    byte[] newBuffer = new byte[1024];
    socket.BeginReceive(newBuffer, 0, newBuffer.Length, SocketFlags.None,
                        socket_ReceiveCallback, newBuffer);
}
Now I'm wondering whether I'm doing something wrong here, since other applications that do network communication hardly stress the CPU at all. I'm also wondering whether I would be better off using SocketAsyncEventArgs and ReceiveAsync.
So here are my questions:
Why is my loop stressing the CPU so much?
Should I use SocketAsyncEventArgs and ReceiveAsync instead of BeginReceive?

BeginReceive and EndReceive are remnants of the legacy Asynchronous Programming Model (APM) that was used before the introduction of the modern async and await keywords in C# 5.
So you should prefer ReceiveAsync over BeginReceive and EndReceive for asynchronous programming.
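As a rough illustration, here is a minimal sketch of an await-based receive loop. It assumes a connected Socket, the Task-returning ReceiveAsync overload (available via SocketTaskExtensions on newer frameworks), and a hypothetical HandleMessage method for your own processing:
// Minimal sketch of an await-based receive loop.
// Requires: using System; using System.Net.Sockets; using System.Threading; using System.Threading.Tasks;
private async Task ReceiveLoopAsync(Socket socket, CancellationToken token)
{
    var buffer = new byte[8192];                    // reused for every receive
    while (!token.IsCancellationRequested)
    {
        int received = await socket.ReceiveAsync(
            new ArraySegment<byte>(buffer), SocketFlags.None);
        if (received == 0)                          // remote side closed the connection
            break;
        HandleMessage(buffer, received);            // hypothetical handler for the bytes actually read
    }
}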
For really high-performance scenarios you should use SocketAsyncEventArgs. It was designed for high performance and is used by the Kestrel web server.
From the Remarks section of the SocketAsyncEventArgs documentation:
The SocketAsyncEventArgs class is part of a set of enhancements to the System.Net.Sockets.Socket class that provide an alternative asynchronous pattern that can be used by specialized high-performance socket applications. This class was specifically designed for network server applications that require high performance. An application can use the enhanced asynchronous pattern exclusively or only in targeted hot areas (for example, when receiving large amounts of data).
The main feature of these enhancements is the avoidance of the repeated allocation and synchronization of objects during high-volume asynchronous socket I/O. The Begin/End design pattern currently implemented by the System.Net.Sockets.Socket class requires a System.IAsyncResult object be allocated for each asynchronous socket operation.
In the new System.Net.Sockets.Socket class enhancements, asynchronous socket operations are described by reusable SocketAsyncEventArgs objects allocated and maintained by the application. High-performance socket applications know best the amount of overlapped socket operations that must be sustained. The application can create as many of the SocketAsyncEventArgs objects that it needs. For example, if a server application needs to have 15 socket accept operations outstanding at all times to support incoming client connection rates, it can allocate 15 reusable SocketAsyncEventArgs objects for that purpose.
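As a sketch of what that reuse looks like in practice (this is not the full pooled server from the docs; ProcessReceive is a hypothetical handler), a single SocketAsyncEventArgs and its buffer can be reused for every receive on a connection:
// Sketch: one SocketAsyncEventArgs and one buffer reused for every receive on a connection.
// Requires: using System; using System.Net.Sockets;
private void StartReceiving(Socket socket)
{
    var args = new SocketAsyncEventArgs();
    args.SetBuffer(new byte[8192], 0, 8192);        // buffer allocated once, reused
    args.Completed += (s, e) => OnReceiveCompleted(socket, e);
    PostReceive(socket, args);
}

private void PostReceive(Socket socket, SocketAsyncEventArgs args)
{
    // ReceiveAsync returns false when the operation completed synchronously;
    // in that case the Completed event will NOT fire, so handle the result here.
    if (!socket.ReceiveAsync(args))
        OnReceiveCompleted(socket, args);
}

private void OnReceiveCompleted(Socket socket, SocketAsyncEventArgs args)
{
    if (args.SocketError != SocketError.Success || args.BytesTransferred == 0)
        return;                                     // connection failed or was closed
    ProcessReceive(args.Buffer, args.Offset, args.BytesTransferred);   // hypothetical handler
    PostReceive(socket, args);                      // reuse the same args object
}
In a real server you would loop instead of recursing on synchronous completions, and return the args object to a pool when the connection closes.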

I have been benchmarking synchronous vs. asynchronous sockets on a localhost loopback connection. My result was that the asynchronous version was about 30% slower. That was surprising to me considering that async IO is all the rage now. It didn't matter how many threads I used: even with 128 threads, synchronous IO was faster.
The reason for that is, I believe, that async IO requires more allocations and more kernel mode transitions.
So you could just switch to synchronous IO, if you don't expect hundreds of simultaneous connections.
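For comparison, a blocking receive loop on a dedicated thread is only a few lines. A minimal sketch, assuming a connected socket field and a hypothetical HandleMessage method:
// Sketch: blocking Receive loop running on its own background thread.
// Requires: using System.Net.Sockets; using System.Threading;
var receiveThread = new Thread(() =>
{
    var buffer = new byte[8192];
    while (true)
    {
        int received = socket.Receive(buffer, 0, buffer.Length, SocketFlags.None);
        if (received == 0)                  // remote side shut down the connection
            break;
        HandleMessage(buffer, received);    // hypothetical handler
    }
});
receiveThread.IsBackground = true;
receiveThread.Start();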

To answer this you'd have to profile your application.
What I wonder is:
why you ignore the byte count returned by EndReceive
why you don't use the received buffer at all
and why you allocate a new buffer time and time again - this is the only operation here that should take any resources (CPU/memory). A corrected callback sketch follows below.
Have a look at this: http://msdn.microsoft.com/de-de/library/dxkwh6zw.aspx
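To make those points concrete, here is a hedged sketch of a corrected callback that reuses a single buffer and actually consumes the received bytes (HandleMessage is a hypothetical placeholder for your processing):
// Sketch: BeginReceive callback that reuses one buffer instead of allocating
// a new one per receive, and uses the byte count returned by EndReceive.
private void ReceiveCallback(IAsyncResult result)
{
    byte[] buffer = (byte[])result.AsyncState;
    int received = socket.EndReceive(result);
    if (received == 0)
        return;                                 // connection closed
    HandleMessage(buffer, received);            // hypothetical handler
    socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
                        ReceiveCallback, buffer);
}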

I did a comparison at maximum load; the results are in GB/s (gigabytes per second):
ReceiveAsync: ~1.2 GB/s
BeginReceive: ~1.1 GB/s
Receive (in a thread loop): ~1.4 GB/s
Notes:
All results were measured using the loopback address (localhost) and a dedicated thread for the sending socket
8192-byte buffer size
For a large bulk transfer I would suggest using blocking Receive in a thread, but for better CPU usage across many connections I would use ReceiveAsync or BeginReceive.

Related

C# Tcp communication Threadpool or async call

I have a C# application which listens for incoming TCP connections and receives data from previously accepted connections. Should I use a thread pool or async methods to write the program? Note that once a connection is accepted it is not closed; the application continuously receives data from it while also accepting more connections.
A threadpool thread works best when the code takes less than half a second and does not do a lot of I/O that will block the thread. Which is exactly the opposite of the scenario you describe.
Using Socket.BeginReceive() is strongly indicated here. It is highly optimized at both the operating system level and the framework level; your program uses a single thread to wait for all pending reads to complete. Scaling to handle thousands of active connections is quite feasible.
Writing asynchronous code cleanly can be quite difficult: variables that you'd normally make locals in a method running on a threadpool thread turn into fields of a class, and you need a state machine to keep track of the connection state. You'll greatly benefit from the async/await support available in C# 5, which lets you turn those state variables back into local variables. The little wrappers you find in this answer or this blog post will help a great deal.
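One such wrapper can be sketched with Task.Factory.FromAsync so the Begin/End pair becomes awaitable (the extension-method name here is made up for illustration):
// Sketch: expose Socket.BeginReceive/EndReceive as an awaitable Task<int>.
// Requires: using System.Net.Sockets; using System.Threading.Tasks;
public static class SocketExtensions
{
    public static Task<int> ReceiveTaskAsync(this Socket socket, byte[] buffer)
    {
        return Task.Factory.FromAsync(
            (callback, state) => socket.BeginReceive(
                buffer, 0, buffer.Length, SocketFlags.None, callback, state),
            socket.EndReceive,
            null);
    }
}

// Usage inside an async method:
// int received = await clientSocket.ReceiveTaskAsync(buffer);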
It mainly depends on what you want to do with your connections. If you have an unknown number of connections and you don't know how long they will stay open, I think it's better to use async calls.
But if you at least know the average number of connections and the connections are short-lived, like a web server's connections, then it's better to use a thread pool, since you won't waste time creating a thread for each socket.
First off, if you possibly can, don't use TCP/IP. I recommend you self-host WebAPI and/or SignalR instead. But if you do decide to use TCP/IP...
You should always use asynchronous APIs for sockets. Ideally, you want to be constantly reading from the socket and periodically writing (keepalive messages, if nothing else). What you don't want to do is to have time where you're only reading (e.g., waiting for the next message), or time where you're only writing (e.g., sending a message). When you're reading, you should be periodically writing; and when you're writing, you should be continuously reading.
This helps you detect half-open connections, and also avoids deadlocks.
You may find my TCP/IP .NET Sockets FAQ helpful.
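A rough sketch of the shape described above - a read that is always pending plus a periodic keepalive writer - assuming the awaitable ReceiveAsync/SendAsync overloads and hypothetical HandleMessage and keepalive contents:
// Sketch: keep a read always pending while periodically writing keepalives.
// Requires: using System; using System.Net.Sockets; using System.Threading.Tasks;
async Task RunConnectionAsync(Socket socket, byte[] keepalive)
{
    var reading = ReadLoopAsync(socket);
    var writing = KeepaliveLoopAsync(socket, keepalive);
    await Task.WhenAny(reading, writing);       // either side ending or faulting ends the connection
    socket.Close();
}

async Task ReadLoopAsync(Socket socket)
{
    var buffer = new byte[8192];
    while (true)
    {
        int n = await socket.ReceiveAsync(new ArraySegment<byte>(buffer), SocketFlags.None);
        if (n == 0) return;                     // remote side closed
        HandleMessage(buffer, n);               // hypothetical handler
    }
}

async Task KeepaliveLoopAsync(Socket socket, byte[] keepalive)
{
    while (true)
    {
        await Task.Delay(TimeSpan.FromSeconds(10));
        await socket.SendAsync(new ArraySegment<byte>(keepalive), SocketFlags.None);
    }
}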
Definitely use asynchronous sockets... It's never a good idea to block a thread waiting for IO.
If you decide you have high performance needs, you should consider using the EAP design pattern for your sockets.
This will allow you to create an asynchronous solution with a lower memory profile. However, some find that using events with sockets is awkward and a bit clunky... if you fall into this category, you could take a look at this blog post on using .NET 4.5's async/await keywords with it: http://blogs.msdn.com/b/pfxteam/archive/2011/12/15/10248293.aspx#comments
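As a simpler (allocating) variation on the idea in that post, a SocketAsyncEventArgs receive can be exposed as an awaitable task via TaskCompletionSource; a hedged sketch:
// Sketch: await a SocketAsyncEventArgs receive via TaskCompletionSource.
// Note: this allocates a TaskCompletionSource per call, unlike the reusable
// awaitable described in the blog post above.
// Requires: using System; using System.Net.Sockets; using System.Threading.Tasks;
public static Task<int> ReceiveAwaitable(Socket socket, SocketAsyncEventArgs args)
{
    var tcs = new TaskCompletionSource<int>();
    EventHandler<SocketAsyncEventArgs> handler = null;
    handler = (s, e) =>
    {
        e.Completed -= handler;
        if (e.SocketError == SocketError.Success) tcs.SetResult(e.BytesTransferred);
        else tcs.SetException(new SocketException((int)e.SocketError));
    };
    args.Completed += handler;

    if (!socket.ReceiveAsync(args))              // completed synchronously, no event will fire
    {
        args.Completed -= handler;
        if (args.SocketError == SocketError.Success) tcs.SetResult(args.BytesTransferred);
        else tcs.SetException(new SocketException((int)args.SocketError));
    }
    return tcs.Task;
}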

What benefits do I gain using BeginAccept vs a dedicated thread that blocks on Accept using TcpListener?

I was trying to think whether I would get any scalability benefits from using BeginAccept vs. just blocking in a dedicated thread waiting for connections. Obviously, the individual clients are going to use BeginXXX/EndXXX pairs to utilize IOCP for network IO, but I'm thinking waiting on a client connection should have very low latency. I plan on creating a Task to process incoming connections, so my follow-up code after the Accept completes won't block the main accept thread for very long (long enough to create a Task object, pretty much) and I can go right back to blocking on new connections. This is pretty much what I would do with BeginAccept/EndAccept, only without the complexities of managing the asynchronous call.
So, my question is: what, if any, scalability benefits do I get by using IOCP for accept? Please note, this is not for sending/receiving on individual client sockets, but just for accepting connections on the server's listening socket.
If you only have a single port you're listening on, it probably isn't worth it - just like if you only need to deal with a few connections at a time, you may not bother using asynchronous operations to handle those.
The server-side benefit of asynchrony is usually when you scale up - for handling connections, it's when you get a lot of connections; for BeginAccept it's when you're listening on a lot of different ports. That's probably rarer, but if you ever do want to listen on 100 different ports (e.g. if you host lots of web sites on one server and for some reason want to listen on different ports instead of using Host headers) then you don't want 100 threads sitting around just consuming stack space.
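For a single listening port, the blocking-accept version described in the question really is only a few lines; a minimal sketch (the port number and HandleClientAsync are placeholders):
// Sketch: dedicated thread blocking on Accept, handing each client off to a Task.
// Requires: using System.Net; using System.Net.Sockets; using System.Threading; using System.Threading.Tasks;
var listener = new TcpListener(IPAddress.Any, 12345);       // example port
listener.Start();

var acceptThread = new Thread(() =>
{
    while (true)
    {
        TcpClient client = listener.AcceptTcpClient();      // blocks until a client connects
        Task.Run(() => HandleClientAsync(client));          // hypothetical per-client handler; accept thread keeps going
    }
});
acceptThread.IsBackground = true;
acceptThread.Start();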

One server many clients: Threads or classes

I'm writing an application in C# with a server and some clients (no more than 60), and I would like to be able to deal with each client independently. The communication between server and client is simple, but I have to wait for some ACKs and I don't want to block any query.
So far, I've done two versions of the server side. One is based on this:
http://aviadezra.blogspot.com.es/2008/07/code-sample-net-sockets-multiple.html
and in the other one I basically create a new thread for each client. Both versions work fine... but I would like to know the pros and cons of the two methods.
Any programming pattern to follow in this sort of situation?
To answer your question, it's both. You have threads, and classes running in those threads. Whether you use WCF, async, sockets, or whatever, you will be running some object in a thread (or shuffled around a thread pool, as with async). With WCF you can configure the concurrency model, and if you have to wait for ACKs or other acknowledgements you'd be best to set it to multiple threads so you don't block other requests.
In the example you linked to the author is using AsyncCallback as the mechanism for telling you that a socket has data. But, from the MSDN you can see:
Use an AsyncCallback delegate to process the results of an asynchronous operation in a separate thread
So it's really no different for small-scale apps. Using async like this can help you avoid allocating stack space for each thread; if you were writing a large application this would matter, but for a small app I think it just adds complexity. C# 5 (.NET 4.5) and F# do a cleaner job with async, so if you can use something like that, then maybe go for it.
Doing it the way you have, you have a single thread that is responsible for socket management. It'll sit and accept new connections. When it gets a request, it hands that socket to a new dedicated thread that will then sit on that socket and read from it. This thread is your client connection. I like to encapsulate the socket-client reading into a base class that does the low-level IO required and then acts as a router for requests, i.e. when I get request XYZ, I'll do request ABC. You can even have it dispatch events and subscribe to those events elsewhere (like in the async example). Now you've decoupled your client logic from your socket-reading logic.
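A rough sketch of that layout, where ClientConnection is a hypothetical class that encapsulates the low-level reading and routing described above:
// Sketch: accept loop handing each accepted socket to a dedicated client thread.
// ClientConnection is a hypothetical class whose ReadLoop() reads from the socket
// and routes/dispatches requests.
// Requires: using System.Net; using System.Net.Sockets; using System.Threading;
var listener = new TcpListener(IPAddress.Any, 9000);        // example port
listener.Start();
while (true)
{
    Socket clientSocket = listener.AcceptSocket();           // blocks until a client connects
    var client = new ClientConnection(clientSocket);
    var thread = new Thread(client.ReadLoop) { IsBackground = true };
    thread.Start();                                          // one dedicated thread per client
}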
If you do things with WCF you don't need sockets and all that extra handling, but you should still be aware that calls are multi-threaded and synchronize your application properly where applicable.
For 60 clients I think you should choose whatever works best for you. WCF is easy to set up and easy to work with, so I'd use that, but sockets are fine too. If you are concerned about the number of threads running, don't be. While it's bad to have too many threads running, most of your threads will actually be blocked while they wait on IO. Threads in a wait state aren't scheduled by the OS and don't really matter. Not to mention the waiting most likely uses IO completion ports under the hood, so the wait overhead is pretty much negligible for a small application like yours.
In the end, I'd go with whatever is easiest to write, maintain, and extend.

Non-blocking Sockets vs BeginXXX vs SocketAsyncEventArgs

Can anyone please enlighten me about current .NET socket techniques?
Non-blocking sockets
If I set Socket.Blocking = false and use async operations - what will happen?
Is there any method of polling multiple non-blocking sockets instead of checking them for availability one by one (something like the classic select(), or any other mechanism, maybe IOCP-related) aside from Socket.Select()?
BeginXXX and SocketAsyncEventArgs
Are they operating on blocking sockets under the hood and just hide thread creation?
Will manual creation of threads be equal to using BeginXXX methods?
Are there any other pros to using SocketAsyncEventArgs, other than that it allows you to create a pool of sockets and everything related to them?
And one final question: if the app works as some kind of heavily loaded binary proxy with most logic done in a single thread - what provides better scalability: the non-blocking approach or async operations?
1: Socket.Select should do that, although I don't tend to use that approach personally; in particular, those IList parameters get annoying at high volumes.
2: No, the other way around; the blocking operations are essentially using the non-blocking ones in the background, but with gates. And no, they don't create threads under the hood - unless you count the callback when something is inbound. I have an example here that is serving 12k connections using SocketAsyncEventArgs - the thread count is something like 20. Among the intentions of SocketAsyncEventArgs are that:
it is far easier to pool effectively, without having lots of objects created/collected per operation
you can handle the "data is available now" scenario very efficiently without needing a callback at all (if the method returns false, you are meant to process the data immediately - no callback will be forthcoming)
For scalability: async
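The "returns false" point above can be written as a loop, so back-to-back synchronous completions don't grow the stack. A hedged sketch, assuming args already has a buffer set and its Completed event wired to OnCompleted, and HandleData is a hypothetical handler:
// Sketch: process inline whenever ReceiveAsync completes synchronously (returns false);
// only rely on the Completed event when it returns true.
private void ReceiveLoop(Socket socket, SocketAsyncEventArgs args)
{
    while (!socket.ReceiveAsync(args))          // "data is available now" - no callback is coming
    {
        if (!ProcessCompleted(args))
            return;                             // error or connection closed
    }
    // Returned true: Completed will fire later on an IO-completion thread.
}

private void OnCompleted(object sender, SocketAsyncEventArgs args)
{
    if (ProcessCompleted(args))
        ReceiveLoop((Socket)sender, args);      // resume the loop after an async completion
}

private bool ProcessCompleted(SocketAsyncEventArgs args)
{
    if (args.SocketError != SocketError.Success || args.BytesTransferred == 0)
        return false;
    HandleData(args.Buffer, args.Offset, args.BytesTransferred);   // hypothetical handler
    return true;
}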

.NET sockets vs C++ sockets at high performance

My question is to settle an argument with my co-workers on C++ vs C#.
We have implemented a server that receives a large number of UDP streams. This server was developed in C++ using asynchronous sockets and overlapped I/O with completion ports. We use 5 completion ports with 5 threads. This server can easily handle a throughput of 500 Mbps on a gigabit network without any packet loss or errors (we didn't push our tests farther than 500 Mbps).
We have tried to re-implement the same kind of server in C# and we have not been able to reach the same incoming throughput. We are using asynchronous receives via the ReceiveAsync method and a pool of SocketAsyncEventArgs to avoid the overhead of creating a new object for every receive call. Each SocketAsyncEventArgs has a buffer assigned to it, so we do not need to allocate memory for every receive. The pool is very, very large, so we can queue more than 100 receive requests. This server is unable to handle an incoming throughput of more than 240 Mbps. Over that limit, we lose some packets in our UDP streams.
My question is this: should I expect the same performance using C++ sockets and C# sockets? My opinion is that it should be the same performance if memory is managed correctly in .NET.
Side question: would anybody know a good article/reference explaining how .NET sockets use I/O completion ports under the hood?
would anybody know a good article/reference explaining how .NET sockets use I/O completion ports under the hood?
I suspect the only reference would be the implementation (i.e. Reflector or another assembly decompiler). With that you will find that all asynchronous IO goes through an IO completion port, with callbacks being processed in the IO thread pool (which is separate from the normal thread pool).
use 5 completion ports
I would expect to use a single completion port processing all the IO into a single pool of threads, with one thread per core servicing completions (assuming you are doing any other IO, including disk, asynchronously as well).
Multiple completion ports would make sense if you have some form of prioritisation going on.
My question is this: should I expect the same performance using C++ sockets and C# sockets?
Yes or no, depending on how narrowly you define the "using ... sockets" part. In terms of the operations from the start of the asynchronous operation until the completion is posted to the completion port I would expect no significant difference (all the processing is in the Win32 API or Windows kernel).
However, the safety that the .NET runtime provides will add some overhead. E.g. buffer lengths will be checked, delegates validated, etc. If the limit on the application is CPU then this is likely to make a difference, and at the extreme a small difference can easily add up.
Also, the .NET version will occasionally pause for GC (.NET 4.5 does background collection, so this will get better in the future). There are techniques to minimise garbage accumulation (e.g. reuse objects rather than creating them, use structs while avoiding boxing).
In the end, if the C++ version works and is meeting your performance needs, why port?
You can't do a straight port of the code from C++ to C# and expect the same performance. .NET does a lot more than C++ when it comes to memory management (GC) and making sure that your code is safe (boundary checks etc).
I would allocate one large buffer for all IO operations (for instance 65535 x 500 = 32,767,500 bytes) and then assign a chunk of it to each SocketAsyncEventArgs (and to each send operation). Memory is cheaper than CPU. Use a buffer manager/factory to provide chunks for all connections and IO operations (the Flyweight pattern). Microsoft does this in their async socket example.
Both the Begin/End and the Async methods use IO completion ports in the background. The latter doesn't need to allocate objects for each operation, which boosts performance.
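A minimal sketch of that chunking idea (a simplified take on the kind of buffer manager used in Microsoft's async socket example; not thread-safe as written):
// Sketch: slice one large pre-allocated array into fixed-size chunks and hand
// each chunk to a SocketAsyncEventArgs, so the GC sees a single long-lived buffer.
// Requires: using System.Collections.Generic; using System.Net.Sockets;
public class SimpleBufferManager
{
    private readonly byte[] _buffer;
    private readonly int _chunkSize;
    private readonly Stack<int> _freeOffsets = new Stack<int>();

    public SimpleBufferManager(int chunkSize, int chunkCount)
    {
        _chunkSize = chunkSize;
        _buffer = new byte[chunkSize * chunkCount];          // one allocation up front
        for (int i = 0; i < chunkCount; i++)
            _freeOffsets.Push(i * chunkSize);
    }

    public bool Assign(SocketAsyncEventArgs args)
    {
        if (_freeOffsets.Count == 0)
            return false;                                    // pool exhausted
        args.SetBuffer(_buffer, _freeOffsets.Pop(), _chunkSize);
        return true;
    }

    public void Release(SocketAsyncEventArgs args)
    {
        _freeOffsets.Push(args.Offset);
        args.SetBuffer(null, 0, 0);                          // detach the chunk
    }
}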
My guess is that you're not seeing the same performance because .NET and C++ are actually doing different things. Your C++ code may not be as safe, or check boundaries. Also, are you simply measuring the ability to receive the packets without any processing? Or does your throughput include packet processing time? If so, then the code you may have written to process the packets may not be as efficient.
I'd suggest using a profiler to check where the most time is being spent and trying to optimize that. The actual socket code should be quite performant.
