I am using a System.Net.Sockets.Socket in TCP mode to send data to some other machine. I implemented a listener to receive data formatted in a certain way and it is all working nicely. Now I am looking for ways to optimize use of bandwidth and delivery speed.
I wonder if it is worth sending a single array of bytes (or Span in the newer frameworks) instead of using a consecutive series of Send calls. I now send my encoded message in parts (head, length, tail, end, using separate Send calls).
Sure I could test this and I will but I do not have much experience with this and may be missing important considerations.
I realize there is a NoDelay option that may have an impact, and that the socket and whatever sits lower in the stack may apply its own optimization policy.
I would like to be able to prioritize delivery time (the time between the call to Send and reception at the other end) over bandwidth use for those messages to which it matters, and be more lenient with messages that are not time critical. But then again, I create a new socket whenever I find there is something in my queue and then use that until the queue is empty, which could cover more than one message, so this may not always work. Ideally I would want to be lenient so the socket can optimize payload until a time-critical message hits the queue, and then tell the socket to hurry until no more time-critical messages are in the queue.
So my primary question is should I build my message before calling Send once (would that potentially do any good or just waste CPU cycles) and are there any caveats an experienced TCP programmer could make me aware of?
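One way to avoid a series of Send calls without first copying the parts into one array is a gather send: Socket.Send has an overload that takes a list of buffer segments and hands them to the stack in a single call. The sketch below is only an illustration; the class name GatherSend, the method SendParts and the four-part layout (head, length, payload, end) are assumptions modelled on the description above, not the poster's actual format.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

static class GatherSend
{
    // Hypothetical helper: passes all parts of one message to the socket in a
    // single Send call, so the stack sees them together rather than one by one.
    public static int SendParts(Socket socket, byte[] head, byte[] length, byte[] payload, byte[] end)
    {
        var parts = new List<ArraySegment<byte>>
        {
            new ArraySegment<byte>(head),
            new ArraySegment<byte>(length),
            new ArraySegment<byte>(payload),
            new ArraySegment<byte>(end),
        };
        // Returns the total number of bytes accepted by the socket.
        return socket.Send(parts);
    }
}
```

Whether this measurably beats several small Send calls depends on the NoDelay setting and the network, so it is still worth measuring both variants.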
I have a UDP sender and a UDP listener. Transferring messages works nicely. But...
It appears when I am overfeeding (sending sustained data quickly) the listening socket may throw on the call to ReceiveFrom with error code 10040 which means some buffer was not large enough. The exception message is
A message sent on a datagram socket was larger than the internal
message buffer or some other network limit, or the buffer used to
receive a datagram into was smaller than the datagram itself.
Fair enough. But the problem is I will then get this exception on every following call to ReceiveFrom. The socket appears broken. I am willing to accept the transfer failed but I now want to flush the socket's receive buffer and continue.
I can prevent this from happening by setting a substantial receive buffer size of 128K on the listening socket (as opposed to the default of 8K). I can also fix it by having the sender pause for 1 ms after sending a chunk of 65507 bytes of a multi-chunk bulk message.
But I do not feel safe. If this exception still occurs, I want to log and continue (better luck next time). Recreating the socket and restarting the listen thread seems blunt. Is there a better way?
Something unrelated I do not like: Socket.ReceiveFrom throws an exception after the timeout. This is stupid, timeouts are normal behavior. I would expect a TryReceiveFrom method and I do not like using the exception handler as a flow control statement, which seems to be the only option I have. Is there a better way?
[Edit]
On further scrutiny (I ran into exceptions being thrown again after sending messages in one piece in an effort to optimize) I found the main reason for my troubles. It turns out the ReceiveFrom method is not the friendliest API...
Here it says:
"With connectionless protocols, ReceiveFrom will read the first
enqueued datagram received into the local network buffer. If the
datagram you receive is larger than the size of buffer, the
ReceiveFrom method will fill buffer with as much of the message as is
possible, and throw a SocketException."
In other words: with UDP, ReceiveFrom always consumes a whole datagram. The size argument does not let you read a datagram in pieces; if the datagram does not fit, you get the exception, so you'd better make sure your buffer is big enough.
So you want the buffer passed to ReceiveFrom to be at least 64K in order for it to be big enough for the biggest possible datagram, see what you got by checking the return value and work with that.
It gets a little worse still: the size argument is also sanity-checked. If offset plus size exceeds the length of your buffer, you get an exception as well. So on the one hand it cannot be used to read part of a datagram and leave the rest for later, but on the other hand it still has to be valid.
After discovering this quirk and respecting it I have not had any overruns, no matter how hard I bashed it from the sending end (I send a large bitmap repeatedly without pausing). The report on my journey may save others some frustration.
I accept now that when the buffer overruns, the socket is broken and needs to be recreated.
The exception being thrown after a timeout of Socket.ReceiveFrom can be prevented by first checking if any data is available using the Socket.Poll method. This has its own timeout argument. So it is pointless to set ReceiveTimeout on the socket, using Poll in tandem with ReceiveFrom works much nicer.
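The Poll-then-ReceiveFrom combination described above could be wrapped roughly as follows. This is a sketch, not a framework API: the class UdpReceiver, the method TryReceiveFrom and the constant MaxDatagramSize are names made up for illustration. It assumes the caller passes a buffer of at least 64 KB so that any datagram fits.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class UdpReceiver
{
    // 64 KB is enough for the largest possible UDP datagram.
    public const int MaxDatagramSize = 65536;

    // Hypothetical helper: returns false on timeout instead of throwing.
    // 'received' holds the number of bytes of the datagram that was read.
    public static bool TryReceiveFrom(Socket socket, byte[] buffer, int timeoutMicroseconds,
                                      ref EndPoint remote, out int received)
    {
        received = 0;
        // Poll waits up to the given time (in microseconds) for readable data.
        if (!socket.Poll(timeoutMicroseconds, SelectMode.SelectRead))
            return false;
        received = socket.ReceiveFrom(buffer, 0, buffer.Length, SocketFlags.None, ref remote);
        return true;
    }
}
```

ReceiveFrom can still throw for genuine socket errors, so a catch for SocketException remains useful around the call; the point is only that an ordinary timeout no longer goes through the exception path.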
I have an application that needs to transfer many small (1-3 byte) messages via a TCP connection (WLAN).
I know that sending 100kB at once is much faster than sending them in very small pieces (nearly bytewise).
Nevertheless, the information I need to transfer is only 1-3 bytes in size. Collecting data before sending would increase throughput, but it is important that the small pieces of data are transferred as early/fast as possible. So gathering data before sending is the problem.
Now I ask, what would be the best way, not to send every message individually on the one hand and on the other hand not to delay their transmission longer than necessary.
Now I'm thinking about creating a little buffer. When the first message is added, I start a timer with a 1 ms timeout. After that millisecond, the data will be transferred, independent of how many bytes are in the queue. But I don't know if this is a good solution.
Don't the TcpClient/server classes of .NET themselves have a solution for such a problem? I mean, they should know when the current transmission is finished. In the meantime they could accumulate all send requests and send them out in one transaction.
The TCP stack by default will already buffer the data internally in order to reduce overhead when sending. See Nagle's algorithm for details. Just make sure that you have this algorithm enabled, i.e. NoDelay is set to false (which is the default).
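For reference, the setting in question is a single property on the socket (TcpClient exposes a NoDelay property as well). A minimal sketch, with the class name LatencyMode and the method Apply invented for illustration:

```csharp
using System.Net.Sockets;

static class LatencyMode
{
    // timeCritical == false keeps Nagle's algorithm enabled (NoDelay = false),
    // so the stack may coalesce small writes into fewer packets.
    // timeCritical == true disables it (NoDelay = true), so small writes are
    // sent without waiting to be combined.
    public static void Apply(Socket socket, bool timeCritical)
    {
        socket.NoDelay = timeCritical;
    }
}
```

Which value suits 1-3 byte messages depends on whether latency or packet count matters more, so it is worth trying both under realistic load.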
Let's say I have a static list List<string> dataQueue, where data keeps getting added at random intervals and also at a varying rate (1-1000 entries/second).
My main objective is to send the data from the list to the server, I'm using a TcpClient class.
What I've done so far: I'm sending the data synchronously in a single thread:
byte[] bytes = Encoding.ASCII.GetBytes(message);
tcpClient.GetStream().Write(bytes, 0, bytes.Length);
//The client is already connected at the start
And I remove the entry from the list, once the data is sent.
This works fine, but the speed of data being sent is not fast enough, the list gets populated and consumes more memory, as the list gets iterated and sent one by one.
My question is can I use the same tcpClient object to write concurrently from another thread or can I use another tcpClient object with a new connection to the same server in another thread? What is the most efficient(quickest) way to send this data to the server?
PS: I don't want to use UDP
Right; this is a fun topic which I think I can opine about. It sounds like you are sharing a single socket between multiple threads - perfectly valid as long as you do it very carefully. A TCP socket is a logical stream of bytes, so you can't use it concurrently as such, but if your code is fast enough, you can share the socket very effectively, with each message being consecutive.
Probably the very first thing to look at is: how are you actually writing the data to the socket? what is your framing/encoding code like? If this code is simply bad/inefficient: it can probably be improved. For example, is it indirectly creating a new byte[] per string via a naive Encode call? Are there multiple buffers involved? Is it calling Send multiple times while framing? How is it approaching the issue of packet fragmentation? etc
As a very first thing to try - you could avoid some buffer allocations:
// requires: using System.Buffers; using System.Text;
var enc = Encoding.ASCII;
byte[] bytes = ArrayPool<byte>.Shared.Rent(enc.GetMaxByteCount(message.Length));
try
{
    // note: leased buffers can be oversized; and in general, GetMaxByteCount will
    // also be oversized; so it is *very* important to track how many bytes you've used
    int byteCount = enc.GetBytes(message, 0, message.Length, bytes, 0);
    tcpClient.GetStream().Write(bytes, 0, byteCount);
}
finally
{
    // hand the buffer back to the pool even if Write throws
    ArrayPool<byte>.Shared.Return(bytes);
}
This uses a leased buffer to avoid creating a byte[] each time - which can massively improve GC impact. If it was me, I'd also probably be using a raw Socket rather than the TcpClient and Stream abstractions, which frankly don't gain you a lot. Note: if you have other framing to do: include that in the size of the buffer you rent, use appropriate offsets when writing each piece, and only write once - i.e. prepare the entire buffer once - avoid multiple calls to Send.
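To illustrate the note about framing, here is a sketch that reserves room for a header in the rented buffer and writes header and payload with a single Write. The 4-byte little-endian length prefix is an assumed framing chosen only for the example, and FramedWriter / WriteMessage are hypothetical names.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO;
using System.Text;

static class FramedWriter
{
    private const int HeaderSize = 4; // assumed framing: 4-byte length prefix

    public static void WriteMessage(Stream stream, string message)
    {
        var enc = Encoding.ASCII;
        // Rent enough space for the header plus the worst-case payload size.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(HeaderSize + enc.GetMaxByteCount(message.Length));
        try
        {
            // Encode the payload after the header, then fill in the header
            // once the real payload length is known.
            int payloadLength = enc.GetBytes(message, 0, message.Length, buffer, HeaderSize);
            BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(0, HeaderSize), payloadLength);
            // One write for the whole frame.
            stream.Write(buffer, 0, HeaderSize + payloadLength);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```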
Right now, it sounds like you have a queue and dedicated writer; i.e. your app code appends to the queue, and your writer code dequeues things and writes them to the socket. This is a reasonable way to implement things, although I'd add some notes:
List<T> is a terrible way to implement a queue - removing things from the start requires a reshuffle of everything else (which is expensive); if possible, prefer Queue<T>, which is implemented perfectly for your scenario
it will require synchronization, meaning you need to ensure that only one thread alters the queue at a time - this is typically done via a simple lock, i.e. lock(queue) {queue.Enqueue(newItem);} and SomeItem next; lock(queue) { next = queue.Count == 0 ? null : queue.Dequeue(); } if (next != null) {...write it...}.
This approach is simple, and has some advantages in terms of avoiding packet fragmentation - the writer can use a staging buffer, and only actually write to the socket when a certain threshold is buffered, or when the queue is empty, for example - but it has the possibility of creating a huge backlog when stalls occur.
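The queue-plus-staging-buffer idea might look something like the sketch below. It is only one possible shape: BatchingWriter, its default staging size of 8192 bytes and the choice to return the number of writes are all assumptions made for illustration, and it writes to a Stream so it works with a NetworkStream or anything else.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

sealed class BatchingWriter
{
    private readonly Queue<byte[]> _queue = new Queue<byte[]>();
    private readonly byte[] _staging;
    private int _staged;

    public BatchingWriter(int stagingSize = 8192) => _staging = new byte[stagingSize];

    // Called from app code on any thread.
    public void Enqueue(byte[] frame)
    {
        lock (_queue) { _queue.Enqueue(frame); }
    }

    // Called from the single writer thread. Writes to the stream only when the
    // staging buffer cannot take the next frame or the queue has run empty.
    // Returns how many Write calls were made.
    public int Drain(Stream output)
    {
        int writes = 0;
        while (true)
        {
            byte[] next;
            lock (_queue) { next = _queue.Count == 0 ? null : _queue.Dequeue(); }
            if (next == null) break;

            if (next.Length > _staging.Length - _staged) writes += Flush(output);
            if (next.Length > _staging.Length)
            {
                // Frame is bigger than the whole staging buffer: write it directly.
                output.Write(next, 0, next.Length);
                writes++;
                continue;
            }
            Buffer.BlockCopy(next, 0, _staging, _staged, next.Length);
            _staged += next.Length;
        }
        return writes + Flush(output);
    }

    private int Flush(Stream output)
    {
        if (_staged == 0) return 0;
        output.Write(_staging, 0, _staged);
        _staged = 0;
        return 1;
    }
}
```

The writer thread would call Drain in a loop (with some wait or signal when the queue is empty, omitted here); the sketch does nothing to bound the backlog.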
However! The fact that a backlog has occurred indicates that something isn't keeping up; this could be the network (bandwidth), the remote server (CPU) - or perhaps the local outbound network hardware. If this is only happening in small blips that then resolve themselves - fine (especially if it happens when some of the outbound messages are huge), but: one to watch.
If this kind of backlog is recurring, then frankly you need to consider that you're simply saturated for the current design, so you need to unblock one of the pinch points:
making sure your encoding code is efficient is step zero
you could move the encode step into the app-code, i.e. prepare a frame before taking the lock, encode the message, and only enqueue an entirely prepared frame; this means that the writer thread doesn't have to do anything except dequeue, write, recycle - but it makes buffer management more complex (obviously you can't recycle buffers until they've been completely processed)
reducing packet fragmentation may help significantly, if you're not already taking steps to achieve that
otherwise, you might need (after investigating the blockage):
better local network hardware (NIC) or physical machine hardware (CPU etc)
multiple sockets (and queues/workers) to round-robin between, distributing load
perhaps multiple server processes, with a port per server, so your multiple sockets are talking to different processes
a better server
multiple servers
Note: in any scenario that involves multiple sockets, you want to be careful not to go mad and have too many dedicated worker threads; if that number goes above, say, 10 threads, you probably want to consider other options - perhaps involving async IO and/or pipelines (below).
For completeness, another basic approach is to write from the app-code; this approach is even simpler, and avoids the backlog of unsent work, but: it means that now your app-code threads themselves will back up under load. If your app-code threads are actually worker threads, and they're blocked on a sync/lock, then this can be really bad; you do not want to saturate the thread-pool, as you can end up in the scenario where no thread-pool threads are available to satisfy the IO work required to unblock whichever writer is active, which can land you in real problems. This is not usually a scheme that you want to use for high load/volume, as it gets problematic very quickly - and it is very hard to avoid packet fragmentation since each individual message has no way of knowing whether more messages are about to come in.
Another option to consider, recently, is "pipelines"; this is a new IO framework in .NET that is designed for high volume networking, giving particular attention to things like async IO, buffer re-use, and a well-implemented buffer/back-log mechanism that makes it possible to use the simple writer approach (synchronize while writing) and have that not translate into direct sends - it manifests as an async writer with access to a backlog, which makes packet fragmentation avoidance simple and efficient. This is quite an advanced area, but it can be very effective. The problematic part for you will be: it is designed for async usage throughout, even for writes - so if your app-code is currently synchronous, this could be a pain to implement. But: it is an area to consider. I have a number of blog posts talking about this topic, and a range of OSS examples and real-life libraries that make use of pipelines that I can point you at, but: this isn't a "quick fix" - it is a radical overhaul of your entire IO layer. It also isn't a magic bullet - it can only remove overhead due to local IO processing costs.
I have two PCs connected by a direct Ethernet cable over a 1 Gbps link. One of them acts as the TCP server and the other as the TCP client(s). Now I would like to achieve the maximum possible network throughput between these two.
Options I tried:
Creating multiple clients on PC-1 with different port numbers, connecting to the TCP Server. The reason for creating multiple clients is to increase the network throughput but here I have an issue.
I have a buffer queue of events to be sent to the server. There will be multiple messages with the same event number. The server has to acquire all the messages and then sort them by event number. Each client dequeues a message from a concurrent queue and sends it to the server, then repeats. I have put a constraint on the client side that Event-2 will not be sent until all messages labelled Event-1 have been sent. Hence, I see the correct event order on the sending side. The TCP server continuously receives from all the clients.
Now let's come to the problem:
The server is receiving the data in a somewhat random order, as I have shown in the image. The randomness between two successive events gets worse after some time of acquisition. I think this random behaviour is due to parallel worker threads executing the IO completion callbacks.
Technology used: F# async sockets with SocketAsyncEventArgs
Solution I tried: Instead of allowing receive from all the clients at server side, I tried to poll for the next available client with pending data then it ensured the correct order but its performance is not at all comparable to the earlier approach.
I want to receive in the same order/ nearly same order (but not non-deterministic randomness) as being sent from the clients. Is there any way I can preserve the order and also maintain the better throughput? What are the best ways to achieve nearly 100% network throughput over two PCs?
As others have pointed out in the comments, a single TCP connection is likely to give you the highest throughput, if it's TCP you want to use.
You can possibly achieve slightly (really marginally) higher throughput with UDP, but then you have the hassle of recreating all the goodies TCP gives you for free.
If you want bidirectional high volume high speed throughput (as opposed to high volume just one way at a time), then it's possible one connection for each direction is easier to cope with, but I don't have that much experience with it.
Design tips
You should keep the connection open. The client will need to ask "are you still there?" at regular intervals if no other communication goes on. (On second thought, I realize that the only purpose of this is to allow quick response and the possibility for the server to initiate a message transaction. So I revise it to: keep the connection open for a full transaction at least.)
Also, you should split up large messages - messages over a certain size. Keep the number of bytes you send in each chunk to a maximum round hex number, typically 8K, 16K, 32K or 64K on a local network. Experiment with sizes. The suggested max sizes have been optimal since Windows 3 at least. You need some sort of protocol with a chunk consisting of a fixed header (typically a magic number for check and resync, a chunk number also for check and for analysis, and a total packet length) followed by the data.
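A fixed chunk header of the kind described could be sketched like this. The field widths (three 32-bit big-endian values, 12 bytes in total), the magic value and the names ChunkHeader, Write and TryRead are all assumptions for the sake of the example; any real protocol would pick its own.

```csharp
using System;
using System.Buffers.Binary;

static class ChunkHeader
{
    public const int Size = 12;            // magic + chunk number + total length
    public const uint Magic = 0xC0DEFEED;  // hypothetical marker used for check/resync

    // Writes the header into the first 12 bytes of 'destination'.
    public static void Write(Span<byte> destination, uint chunkNumber, uint totalLength)
    {
        BinaryPrimitives.WriteUInt32BigEndian(destination.Slice(0, 4), Magic);
        BinaryPrimitives.WriteUInt32BigEndian(destination.Slice(4, 4), chunkNumber);
        BinaryPrimitives.WriteUInt32BigEndian(destination.Slice(8, 4), totalLength);
    }

    // Returns false if there are too few bytes or the magic number does not match.
    public static bool TryRead(ReadOnlySpan<byte> source, out uint chunkNumber, out uint totalLength)
    {
        chunkNumber = 0;
        totalLength = 0;
        if (source.Length < Size || BinaryPrimitives.ReadUInt32BigEndian(source) != Magic)
            return false;
        chunkNumber = BinaryPrimitives.ReadUInt32BigEndian(source.Slice(4, 4));
        totalLength = BinaryPrimitives.ReadUInt32BigEndian(source.Slice(8, 4));
        return true;
    }
}
```

The receiver reads exactly ChunkHeader.Size bytes first, validates them with TryRead, and then knows how much data to expect.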
You can possibly further improve throughput with compression (usually low quick compression) - it depends very much on the data, and whether you're on a fast or slow network.
Then there's this hassle that one typically runs into - problems with the Nagle algorithm - and I no longer remember enough of the details there. I believe I used to overcome that by sending an acknowledgement in return for each chunk sent, and I suspect by doing that you satisfy the design requirements, and so avoid waiting for the last bytes to come in. But do google this.