SignalR message batching - C#

I am researching the development of real-time messaging with SignalR, using WebSockets as the transport.
My application will generate multiple messages at high rate and one question I came across is whether it would be a good idea to consider batching multiple messages before sending them out to the clients.
I have looked at the streaming functionality SignalR offers but I don't think it fits well in this case.
The messages will vary in size from just a few bytes up to kilobytes.
As I understand it, if messages are batched, less time will be spent on serialization?
Of course this will depend on the serializer being used and may vary depending on message size.
Also there will be less round trips between client and server?
So the question is whether there would be performance gains by batching multiple messages before sending them out to clients.
I understand that it would be hard to give a conclusive answer, but I would still like to hear some ideas on the topic.
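To make the batching idea concrete, here is a minimal sketch of a size-or-time batcher. It is written in Python only as a language-neutral illustration and does not use any SignalR API. The class name `MessageBatcher`, the `send` callback, and the `max_items` / `max_delay` thresholds are all assumptions made for the example; `send` stands in for whatever call actually pushes a batch to clients.

```python
import time
from typing import Callable, List, Optional


class MessageBatcher:
    """Collects messages and hands them to `send` as one list.

    A batch is flushed when it holds `max_items` messages, or when the
    oldest queued message has waited at least `max_delay` seconds.
    """

    def __init__(self, send: Callable[[List[bytes]], None],
                 max_items: int = 50, max_delay: float = 0.02,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send              # hypothetical transport callback
        self._max_items = max_items
        self._max_delay = max_delay
        self._clock = clock            # injectable so the logic can be tested
        self._pending: List[bytes] = []
        self._first_at: Optional[float] = None

    def add(self, message: bytes) -> None:
        if not self._pending:
            self._first_at = self._clock()
        self._pending.append(message)
        if len(self._pending) >= self._max_items:
            self.flush()

    def poll(self) -> None:
        """Call periodically (for example from a timer) to enforce max_delay."""
        if self._pending and self._clock() - self._first_at >= self._max_delay:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            batch, self._pending = self._pending, []
            self._first_at = None
            self._send(batch)
```

The trade-off is visible in the two thresholds: a larger `max_items` means fewer sends, while `max_delay` bounds the extra latency any single message can pick up while waiting for its batch. Whether this is a net gain depends on the serializer and message sizes, so it would need measuring.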

Related

SignalR best practices: Notify clients about new data and pull or push new data

I am currently implementing SignalR in my project and I am wondering if I should notify the clients about data changes and let them pull the new data or should I push the new data to the clients?
A bit concerned about the size limitations, what if the data that has to be sent is quite large?
Also, what if there are a lot of users, around 1000 at once?
Generally, I am looking for best practices to make sure this works reliably.
This is an opinion question so there isn't "an answer", just things to consider.
Definitely depends on how big the data is. You need to think about the fact that you're sending X bytes to N users at R rate. You can do the math to figure out if this will be problematic for performance but "quite large" is relative.
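The "X bytes to N users at R rate" math can be sketched as below. The 10 KB payload and the rate of one message per second are made-up figures for illustration; only the 1000 users come from the question.

```python
def outbound_bytes_per_second(payload_bytes: int, clients: int,
                              messages_per_second: float) -> float:
    """Total server egress: every client receives its own copy of each message."""
    return payload_bytes * clients * messages_per_second


# Assumed example: a 10 KB payload pushed to 1000 clients once per second.
rate = outbound_bytes_per_second(10_000, 1000, 1)
print(f"{rate / 1e6:.1f} MB/s, {rate * 8 / 1e6:.0f} Mbit/s")  # 10.0 MB/s, 80 Mbit/s
```

This counts payload bytes only; serialization overhead, framing, and per-connection buffer copies come on top.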
Some things to take into account:
General .NET performance overhead. Are you going to cause GC pressure by sending these large messages?
How does the serializer handle these types of payloads?
Remember that the data needs to be copied into individual connection buffers so are you going to be causing an insane amount of GC's with this scenario?
If you're using scale out, I highly recommend against sending large payloads as you're now sending that payload to each of the servers involved in your farm/cluster and it could get chatty.
Other things to consider:
Is the data really that dynamic?
Does each client need to get the entire dataset or can it be a diff?

How to preserve the message sent order at TCP server side with multiple clients

I have two PCs connected by a direct Ethernet cable over a 1Gbps link. One of them acts as the TCP Server and the other acts as the TCP Client/s. Now I would like to achieve the maximum possible network throughput between these two.
Options I tried:
Creating multiple clients on PC-1 with different port numbers, connecting to the TCP Server. The reason for creating multiple clients is to increase the network throughput but here I have an issue.
I have a buffer Queue of Events to be sent to the Server. There will be multiple messages with the same Event Number. The server has to acquire all the messages and then sort them based on the Event number. Each client dequeues a message from the Concurrent Queue and sends it to the server, then repeats. I have put a constraint on the client side that Event-2 will not be sent until all messages labelled with Event-1 are sent. Hence, I see the sent Event order is correct. And the TCP server continuously receives from all the clients.
Now let's come to the problem:
The server is receiving the data in a somewhat random manner, like I have shown in the image. The randomness between two successive events gets worse after some time of acquisition. I think this random behaviour is due to parallel worker threads being executed for IO Completion callbacks.
technology used: F# Socket Async with SocketEventArgs
Solution I tried: Instead of allowing receives from all the clients at the server side, I tried to poll for the next available client with pending data. This ensured the correct order, but its performance is not at all comparable to the earlier approach.
I want to receive in the same order/ nearly same order (but not non-deterministic randomness) as being sent from the clients. Is there any way I can preserve the order and also maintain the better throughput? What are the best ways to achieve nearly 100% network throughput over two PCs?
As others have pointed out in the comments, a single TCP connection is likely to give you the highest throughput, if it's TCP you want to use.
You can possibly achieve slightly (really marginally) higher throughput with UDP, but then you have the hassle of recreating all the goodies TCP gives you for free.
If you want bidirectional high volume high speed throughput (as opposed to high volume just one way at a time), then it's possible one connection for each direction is easier to cope with, but I don't have that much experience with it.
Design tips
You should keep the connection open. The client will need to ask "are you still there?" at regular intervals if no other communication goes on. (On second thought, I realize that the only purpose of this is to allow quick response and the possibility for the server to initiate a message transaction. So I revise it to: keep the connection open for a full transaction at least.)
Also, you should split up large messages - messages over a certain size. Keep the number of bytes you send in each chunk to a maximum round hex number, typically 8K, 16K, 32K or 64K on a local network. Experiment with sizes. The suggested max sizes have been optimal since Windows 3 at least. You need some sort of protocol with a chunk consisting of a fixed header (typically a magic number for check and resynch, a chunk number also for check and for analysis, and a total packet length) followed by the data.
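A minimal sketch of such a chunk format follows. The exact layout is an assumption: three unsigned 32-bit big-endian fields (magic number, chunk number, payload length), with the length field interpreted here as the size of this chunk's data. The `MAGIC` value and function names are hypothetical.

```python
import struct
from typing import Iterator, Tuple

MAGIC = 0xC0DEC0DE                # hypothetical marker used to detect loss of sync
HEADER = struct.Struct("!III")    # magic, chunk number, payload length (assumed layout)
CHUNK_SIZE = 16 * 1024            # one of the sizes suggested above; tune by experiment


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield header + payload for each chunk of `data`, numbered from 0."""
    for number, offset in enumerate(range(0, len(data), chunk_size)):
        payload = data[offset:offset + chunk_size]
        yield HEADER.pack(MAGIC, number, len(payload)) + payload


def parse_chunk(chunk: bytes) -> Tuple[int, bytes]:
    """Validate the header and return (chunk number, payload)."""
    magic, number, length = HEADER.unpack_from(chunk)
    if magic != MAGIC:
        raise ValueError("bad magic number; stream is out of sync")
    payload = chunk[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated chunk")
    return number, payload
```

The receiver can use the chunk numbers to check that nothing was skipped, and the magic number to notice when it has lost its place in the stream.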
You can possibly further improve throughput with compression (usually low quick compression) - it depends very much on the data, and whether you're on a fast or slow network.
Then there's this hassle that one typically runs into - problems with the Nagle algorithm - and I no longer remember enough of the details there. I believe I used to overcome that by sending an acknowledgement in return for each chunk sent, and I suspect by doing that you satisfy the design requirements, and so avoid waiting for the last bytes to come in. But do google this.
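For reference, the usual way to sidestep Nagle-related delays is to disable the algorithm on the socket with the `TCP_NODELAY` option, rather than relying on acknowledgement messages. This is a different technique from the one described above. A minimal Python sketch (the function name is hypothetical; in .NET the corresponding setting is the `Socket.NoDelay` property):

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 tells the stack to send small writes immediately
    # instead of coalescing them while waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Disabling Nagle trades some bandwidth efficiency for latency: many small writes become many small packets, so it pairs best with application-level batching of small writes.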

SocketAsyncEventArgs ReceiveAsync Limitations (arg.Complete not called)

We have a pretty standard TCP implementation of SocketAsyncEventArgs (no real difference to the numerous examples you can google).
We have a load testing console app (also using SocketAsyncEventArgs) that sends x many messages per second. We use thread spinning to introduce mostly accurate intervals within the 1000ms to send the message (as opposed to sending x messages as fast as possible and then waiting for the rest of the 1000ms to elapse).
The messages we send are approximately 2k in size, to which the server implementation responds (on the same socket) with a pre-allocated HTTP OK 200 response.
We would expect to be able to send 100's if not 1000's of messages per second using SocketAsyncEventArgs. We found that with a simple blocking TcpListener/TcpClient we were able to process ~150msg/s. However, even with just 50 messages per second over 20 seconds, we lose 27 of the 1000 messages on average.
This is a TCP implementation, so we of course expected to lose no messages; especially given such a low throughput.
I'm trying to avoid pasting the entire implementation (~250 lines), but code available on request if you believe it helps. My question is, what load should we expect from SAEA? Given that we preallocate separate pools for Accept/Receive/Send args which we have confirmed are never starved, why do we not receive an arg.Complete callback for each message?
NB: No socket errors are witnessed during execution
Responses to comments:
@usr: Like you, we were concerned that our implementation may have serious issues cooked in. To confirm this we took the downloadable zip from this popular Code Project example project. We adapted the load test solution to work with the new example and re-ran our tests. We experienced EXACTLY the same results using someone else's code (which is primarily why we decided to approach the SO community).
We sent 50 msg/sec for 20 seconds, both the code project example and our own code resulted in an average of 973/1000 receive operations. Please note, we took our measurements at the most rudimentary level to reduce risk of incorrect monitoring. That is, we used a static int with Interlocked.Increment on the onReceive method - onComplete is only called for asynchronous operations, onReceive is invoked both by onComplete and when !willRaiseEvent.
All operations performed on a single machine using the loopback address.
Having experienced issues with two completely different implementations, we then doubted our load test implementation. We confirmed via Wireshark that our load test project did indeed send the traffic as expected (fragmentation was present in the pcap log, but Wireshark indicated the packets were reassembled as expected). My networking understanding at low levels is weaker than I'd like, I admit, but given that the amount of fragmentation nowhere near matches the number of missing messages, we are for now assuming the two are not related. As I understand it, fragmentation should be handled at a lower layer, and completely abstracted at our level of API calls.
@Peter,
Fair point, in a normal networking scenario such level of timing accuracy would be utterly pointless. However, the waiting is very simple to implement and wireshark confirms the timing of our messages to be as accurate as the pcap log's precision allows. Given we are only testing on loopback (the same code has been deployed to Azure cloud services also which is the intended destination for the code once it is production level, but the same if not worse results were found on A0, A1, and A8 instances), we wanted to ensure some level of throttling. The code would easily push 1000 async args in a few ms if there was no throttling, and that is not a level of stress we are aiming for.
I would agree, given it is a TCP implementation, there must be a bug in our code. Are you aware of any bugs in the linked Code Project example? Because it exhibits the same issues as our code.
@usr, as predicted the buffers did contain multiple messages. We now need to work out how it is we're going to marry messages back together (TCP guarantees sequence of delivery, but in using multiple SAEA's we lose that guarantee through threading).
The best solution is to abandon custom TCP protocols. Use HTTP and protocol buffers for example. Or, web services. For all of this there are fast and easy to use asynchronous libraries available. Assuming this is not what you want:
Define a message framing format. For example, prepend BitConverter.GetBytes(messageLengthInBytes) to each message. That way you can deconstruct the stream.
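A minimal sketch of length-prefix framing, in Python for illustration. It assumes a 4-byte little-endian signed length (what `BitConverter.GetBytes` produces for an `int` on typical little-endian hardware); the `frame` function and `Deframer` class are hypothetical names. The key point is that one receive may contain several messages, or only part of one, so the receiver has to buffer bytes and cut messages out by length.

```python
import struct
from typing import List

LENGTH_PREFIX = struct.Struct("<i")  # 4-byte little-endian int (assumed byte order)


def frame(message: bytes) -> bytes:
    """Prepend the message length so the receiver can find the boundary."""
    return LENGTH_PREFIX.pack(len(message)) + message


class Deframer:
    """Accumulates received bytes and extracts complete messages."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add bytes from one receive call; return all messages now complete."""
        self._buffer += data
        messages = []
        while len(self._buffer) >= LENGTH_PREFIX.size:
            (length,) = LENGTH_PREFIX.unpack_from(self._buffer)
            if length < 0:
                raise ValueError("negative length prefix; stream is corrupt")
            end = LENGTH_PREFIX.size + length
            if len(self._buffer) < end:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buffer[LENGTH_PREFIX.size:end]))
            del self._buffer[:end]
        return messages
```

For example, feeding the bytes of two framed messages in arbitrary slices yields each message exactly once, as soon as its last byte has arrived. This is why counting receive callbacks undercounts messages: the count that matters is the number of messages the deframer returns.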

Message queue considerations - MSMQ storage issue kills current app

I am working on two apps that use an MSMQ as a message bus mechanism so that A transfers messages to B. This clearly has to be robust so initially we chose MSMQ to store and transfer the messages.
When testing the app we noticed that in real-world conditions, where MSMQ is called to handle approximately 50,000 messages a minute (which sounds quite low to me), we quickly reach the max storage size of the msmq /storage directory (defaults to 1.2GB, I think).
We can increase that but I was wondering whether there is a better approach to handle slow receivers and fast senders. Is there a better queue or a better approach to use in this case?
Actually it isn't so much a problem of slow receivers, since MSMQ will maintain the (received) messages in the storage dir for something like 6 hours or until the service is restarted. So essentially if in 5 minutes we reach the 1GB threshold then in a few hours we will reach terabytes of data!
Please read this blog to understand how MSMQ uses resources which I put together after years of supporting MSMQ at Microsoft.
It really does cover all the areas you need to know about.
If you have heard something about MSMQ that isn't in the blog then it is almost certainly wrong - such as the 1.2GB storage limit for MSMQ. The maximum size of the msmq\storage directory is the hard disk capacity - it's an NTFS folder!
You should be able to have a queue with millions of messages in it (assuming you have enough kernel memory, as mentioned in the blog)
Cheers
John Breakwell
You should apply an SLA to your subscribers: they have to read their messages within X amount of time or they lose them. You can scale this SLA to match the volume of messages that arrive.
For subscribers that cannot meet their SLA then simply put, they don't really care about receiving their messages that quickly (if they did, they would be available). For these subscribers you can offer a slower channel, such as an XML dump of the messages in the last hour (or what ever granularity is required). You probably wouldn't store each individual message here, but just an aggregate of changes (eg, something that can be queried from a DB).
Use separate queues for each message type, this way you can apply different priorities depending on the importance of the message, if one queue becomes full, messages of other types won't be blocked. It also makes it simpler to monitor if each message is being processed within its SLA by looking at the first message in the queue and seeing when it was added to determine how long it was waiting (see NServiceBus).
From your above metrics of 1GB in 5 minutes at 50,000 messages/minute I calculate each message to be about 4kb. This is quite large for a message since messages should normally only be carrying top level details about something happening, mostly IDs of what was changed, and the intent of what was changed. Larger data is better served from some other out-of-band channel for transferring large blobs (eg, file share, sftp, etc).
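The arithmetic behind the "about 4kb" estimate, as a small sketch. It assumes 1GB means 2^30 bytes; with 10^9 bytes the result is 4000 bytes, so the conclusion is the same either way.

```python
def average_message_bytes(storage_bytes: int, messages_per_minute: int,
                          minutes: int) -> float:
    """Average size per message if `storage_bytes` fills up in `minutes`."""
    return storage_bytes / (messages_per_minute * minutes)


# Figures from the question: 1 GB in 5 minutes at 50,000 messages/minute.
size = average_message_bytes(1024 ** 3, 50_000, 5)
print(f"{size:.0f} bytes, about {size / 1024:.1f} KiB per message")
```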
Also, since a service should encapsulate its own data, you shouldn't need to share much data between services. So large data within a service using messages to say what happened isn't unusual, large data between separate services using messages indicates that some boundaries are probably leaking.

Tcp Reliability versus Udp Burdens for serious, high-performance server

Speed, optimization, and scalability are the typical comparisons between the Udp and Tcp protocols. Tcp touts reliability with the disadvantage of a little extra overhead, but speed is good to excellent. Once a Tcp socket is instanced, keeping the socket open requires some overhead. But compared to the oft-described burdens of Udp, which protocol actually has more overhead? I've also heard that there are scalability issues with Tcp...yet the Internet (Web pages/servers) runs on Tcp - so what is it about Tcp that inhibits scalability?
Okay...so Udp doesn't require that overhead of keeping a connection open. But, it requires that you write extra methods to ensure all of the packet gets there, hopefully in the order that you want it received. If a packet isn't received in full, then you have to tell the client or server to resend. And you also have to keep some sort of message collection for partial packets, rebuild the partial messages, and check for a complete message before the message can finally be processed. Not to mention if the second part of a message never makes it, you have to either say resend the entire thing, or resend the part we are missing, or whatever.
Basically, my questions are:
Why would I choose Udp over Tcp for a serious, high-performance server with the added "overhead" of message checking and manual ACK versus the "overhead" of a continuous stream?
If Tcp is good enough for the likes of World of Warcraft, why isn't Tcp more widely accepted as the protocol to use for a game server?
Note: I am not opposed to implementing Udp options for a server. We are using C# on .Net 3.5 framework. So I would also be interested in the best practices for dealing with Udp burdens. I am also using the asynchronous methods at the socket level rather than using TcpListener, TcpClient, etc. etc.
Well, I would recommend reading up some more. There are plenty of places to look at the pros and cons of TCP vs. UDP and vice versa; here are a few:
What Are The Advantages Of Using TCP Over UDP?
When should I use UDP instead of TCP?
TCP and UDP
What are the advantages of UDP over TCP?
However, this link may interest you the most, as it is directly about networked game programming:
Gaffer on Games - UDP vs. TCP
If I were to quote something small:
"The decision seems pretty clear then, TCP does everything we want and its super easy to use, while UDP is a huge pain in the ass and we have to code everything ourselves from scratch. So obviously we just use TCP right?
Wrong.
Using TCP is the worst possible mistake you can make when developing a networked game! To understand why, you need to see what TCP is actually doing above IP to make everything look so simple!"
I still recommend doing your own research on the matter though, and making sure which of the protocols suits your needs at the end of the day. That being said, it does seem to be the case that the majority of games use UDP for their data. Anything that updates the entire state continuously does not need the overhead of guaranteed packet delivery.
First, I'll just paraphrase Stevens from Unix Network Programming Section 22.4 "When to Use UDP instead of TCP":
He basically says the following:
UDP is the only option for broadcast / multicast - so you have to use it there.
UDP can be used for simple request / reply apps. But you have to add your own error detection, meaning at least acks, timeouts and retransmission.
UDP should not be used for bulk data transfer ( file transfers ) since you would have to build in all the functionality already in TCP to make it work right.
UDP should be used for real time data where speed of delivery is most important and some data loss is not an issue such as real time sensor data, live multimedia streams, real time stock quotes, etc.
The answer to your first question is very dependent on your definition of "high-performance". If your primary concern is low latency, i.e. the individual data packets / requests arriving as quickly as possible, then UDP would be the way to go. There are two primary reasons for this. Assuming packets / requests are fairly independent of each other, then using TCP would introduce a problem known as head-of-line blocking.
Let's say you send two independent packets / requests. First A then B. Since TCP is stream based, if A gets lost in the network and needs to be retransmitted then even if B has already successfully arrived it can't be delivered to the application by the stack until A arrives, introducing unnecessary latency. Not only that, but until A arrives, B can't be acknowledged by the stack which might cause B to also be retransmitted causing needless network congestion.
One way around this problem is to use separate connections for each request, however this also introduces latency and hogs system resources. UDP bypasses all these problems.
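Head-of-line blocking can be illustrated with a toy model (not a network simulation). Assume `arrivals[i]` is the time at which packet `i` finally reaches the receiving host, including any retransmission. With in-order delivery, a packet reaches the application only once every earlier packet has arrived, so its delivery time is the running maximum of the arrival times. The function name and timings are hypothetical.

```python
from itertools import accumulate
from typing import List


def in_order_delivery_times(arrivals: List[float]) -> List[float]:
    """Time each packet is handed to the application under in-order delivery.

    Packet i is released only when packets 0..i have all arrived, which is
    the running maximum of the arrival times.
    """
    return list(accumulate(arrivals, max))


# A is lost and its retransmission arrives at 250 ms; B arrives at 20 ms.
arrivals_ms = [250.0, 20.0]
print(in_order_delivery_times(arrivals_ms))  # [250.0, 250.0]
```

With independent datagrams, B would be usable at 20 ms; with in-order delivery it waits until 250 ms even though it arrived intact.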
Another issue in high performance ( low latency ) servers is the Nagle Algorithm which can add significant latency in TCP communications.
The answer to your second question is that WoW probably sends streams of data, not independent request / reply pairs. Also, some of the latency of TCP can be removed by disabling the Nagle algorithm. If they do use some request / reply communications they may have simply made a design decision that reliability is more important to them than latency.
Define "serious high performance" - how many concurrent connections are you talking about and how much data is flowing?
Take a look at the answers to this question What do you use when you need reliable UDP? which list some of the reliable protocols that have already been built on UDP. You might find one that works for your situation, or you may at least find some useful ideas.
The key to using UDP effectively here is to have some level of reliability and some level of unreliability and you get more of an advantage the more each datagram is able to be handled independently of others. The advantage over TCP is that you get to act on each datagram and decide if you can use it as it arrives. This is why it works for action games.
So, IMHO, if you need 100% reliability AND in order delivery then go with TCP; don't try and reimplement TCP in UDP.
It's Reliability vs Performance.
FPS games don't require -all- the packets to reach the destination, to reach it in order, to be exceptionally big, or to assure big throughput. They only require the packets to reach the server AS SOON AS POSSIBLE. This is the ultimate priority and overhead of TCP is simply an unnecessary burden.
WoW, in its "not quite realtime" communication and often tons of data to transmit (in crowded areas), may have to deal with packets exceeding MTU (requiring fragmentation) and requires reliability (fewer bigger packets = packet lost hurts more). So its choice of TCP is logical. Same would go for most turn-based strategy games and the like. In games where the player with ping of 30ms beats the player with ping 50ms UDP is the king.
I think the biggest part of TCP/IP that inhibits scalability is that it maintains a buffer on all incoming / outgoing connections up to basically the size of the window. So if I have a high latency but high throughput client I'm talking to, I have to keep all sent packets in the buffer until I receive an ack. For a few connections this is fine, but for handling 100K connections, it can start to be problematic overhead. On the receiving end, if a packet is dropped, again it will buffer all new packets received until the one required is retransmitted.
If you're going to implement retransmission, you need to do the same thing, and hence will have the same overhead. However, UDP does give you an advantage, if you know the end-to-end link speeds, or if certain message can be delivered out of order, or certain messages don't need retransmission. Keeping the gaming scenario:
packet 1 = move to 1,1
packet 2 = shoot
packet 3 = move to 2,2
For most game designers, if packet 1 is lost but packet 3 is received, packet 1 is no longer important because it contains out-of-date information anyway. However, you could opt to say packet 2 is important, so if it's not acked, send a retransmission.
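The selective-retransmission idea in this example can be sketched as follows. The `Packet` type, its `reliable` flag, and the function name are hypothetical; the point is only that the sender marks which messages matter and resends just those when no ack arrives.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Packet:
    seq: int
    kind: str        # "move" or "shoot" in this example
    reliable: bool   # True if it must be resent until acknowledged


def packets_to_retransmit(sent: List[Packet], acked: Set[int]) -> List[Packet]:
    """Resend only unacknowledged packets that were marked reliable.

    Unreliable packets (such as position updates) are simply dropped,
    because a newer update supersedes them anyway.
    """
    return [p for p in sent if p.reliable and p.seq not in acked]


sent = [
    Packet(1, "move", reliable=False),   # move to 1,1
    Packet(2, "shoot", reliable=True),   # shoot
    Packet(3, "move", reliable=False),   # move to 2,2
]
# Only packet 3 was acknowledged: packet 1 is stale, packet 2 is resent.
print([p.seq for p in packets_to_retransmit(sent, acked={3})])  # [2]
```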
If you need high throughput, and connect two servers directly with 1000Mbps ethernet, TCP/IP will take a while to scale and have additional overhead, and will likely never achieve a true gigabit connection due to the congestion avoidance mechanisms. However, you know it's 1 Gbps, so you can set up your UDP to transmit at up to 1 Gbps (minus overhead) yourself.
To answer your questions more directly:
If you are going to ack every packet anyways, there isn't a massive benefit to having UDP, other than you can process some messages while waiting for retransmission (unless you want in-order delivery as well).
TCP isn't considered for game servers as much, mainly because of the scenario above and real-time combat systems such as first-person shooters, where a message can be dropped and the next message to come will invalidate the dropped message anyway. World of Warcraft can get away with using TCP, since they don't have to be as precise with timing, and likely have some good logic that makes it more difficult for you to tell the difference anyway. The combat system simply doesn't require the speed.
I'd also contend that some of the justification is holdover from years ago, when everyone had less-reliable, and slower Internet connections. TCP is also more lenient for sharing the network, so if there's a lot going on, it will slow down so everyone gets a share of the connection (congestion avoidance).
TCP/IP is a protocol designed by people far smarter than I over years of research. Tuning in the last several years has allowed it to perform better with the faster and faster average network speeds we are seeing, and doesn't require a great understanding to use.
However, replacing this with UDP, does require a significant understanding of networking. I've seen badly written UDP programs saturate 1Gbps links and kill all traffic on the link, because they implemented a rather naive retransmission algorithm.
Here's a list of things TCP/IP can now do that you'd lose by going UDP:
- In order arrival to your program
- retransmission (Now with Fast retransmit, selective acknowledgement, and other features)
- Maximum segment size
- Path MTU Discovery
- Black Hole Detection (extension of Path MTU)
- Congestion avoidance
Because of this, I'd highly recommend sticking with TCP/IP if it suits your needs.
Also, not to nitpick, but your comment about the Internet running on TCP/IP is wrong; there are in fact dozens of Internet-routable protocols, check them out here. I think you were referring to web pages and web servers all running on top of TCP/IP, which again is great for the web, where we humans won't notice a delay as long as the page shows up correctly. Even for TCP/IP, there is some challenge that TCP/IP isn't aggressive enough for the web: Google thinks tcp/ip should be more aggressive by default
