We are currently investigating the most efficient way of communicating between 120-140 embedded hardware devices running on the .NET Micro Framework and a server.
Each embedded device needs to send information to, and request information from, the server on a fairly regular basis, all in real time over TCP.
My question is this: would it be better to initialise 140 TCP connections to the server and then hang on to them, or to initialise a new connection for each request to and from the devices? Would holding on to and managing 140 TCP connections put a lot of strain on the server?
When the server detects new data in the database, it needs to send this new information to one or more devices (the information is targeted at specific devices). If I held on to the 140 connections, I would need to look up the correct connection each time I needed to send information, instead of just sending to an IP:PORT associated with the new data.
I guess another, possibly stupid, question is: is it actually possible to hang on to 140 TCP connections on a single port?
Any suggestions/comments are appreciated!
In general you are better off maintaining the connections for as long as possible. If each device opens a connection every time it sends a message, you can end up effectively DoS'ing the server, as it accumulates lots of sockets in the TIME_WAIT state taking up space in its tables.
I worked on a system where a bunch of clients talked to a server, and while they could be turned on and off regularly, it was still better to maintain the connection (and re-establish it when it had dropped and a new message needed to be sent). You may end up needing to write slightly more complex code, but I've found it well worth the effort for the reduced load on the server.
Modern operating systems may have bigger buffers than the ones on which I actually encountered the DoS effect, but fundamentally it's not a good idea to open lots of short-lived connections like that.
Things can get relatively complicated on the client side, especially when the device tends to go to sleep transparently to the application, because connections will then time out while the app thinks they are still open. When we did this we ended up with relatively complex network code, because we needed to deal with the fact that the sockets could (and would) fail as a matter of course, and we simply needed to set up a new connection and retry sending the message. Once it's done, though, you just tuck this code away in your libraries and forget about it.
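Here is a minimal C# sketch of that "reconnect and retry" wrapper. `ReconnectingSender` is a hypothetical name. Its `connect` factory is assumed to return the stream of a freshly connected socket, for example `new TcpClient(host, port).GetStream()`. The wrapper treats any I/O failure as a dead connection and opens a new one on the next attempt.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Hypothetical helper: holds one long-lived connection and, when a send fails
// (e.g. the device slept and the socket silently died), reconnects and retries.
public sealed class ReconnectingSender : IDisposable
{
    private readonly Func<Stream> _connect;   // assumed: opens a fresh connection
    private readonly int _maxAttempts;
    private Stream _stream;

    public ReconnectingSender(Func<Stream> connect, int maxAttempts = 3)
    {
        _connect = connect;
        _maxAttempts = maxAttempts;
    }

    public int Reconnects { get; private set; }

    public void Send(byte[] message)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                if (_stream == null) _stream = _connect();   // lazily (re)connect
                _stream.Write(message, 0, message.Length);
                _stream.Flush();
                return;
            }
            catch (Exception ex) when (IsConnectionFailure(ex) && attempt < _maxAttempts)
            {
                // Treat the socket as dead: drop it and open a new one on the next pass.
                _stream?.Dispose();
                _stream = null;
                Reconnects++;
            }
        }
    }

    private static bool IsConnectionFailure(Exception ex) =>
        ex is IOException || ex is SocketException || ex is ObjectDisposedException;

    public void Dispose() => _stream?.Dispose();
}
```

A write that fails may still have reached the server, so a retry like this can deliver the same message twice. That is the problem the next paragraph runs into.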
In practice, our initial application had even more complex code, because it was dealing with a network library that was semi-aware of the stop-start nature of the devices and tried to resend failed messages, which sometimes meant the same message got sent twice. We ended up adding an extra layer of communication on top to ensure duplicates got rejected. If you're using C# or regular BSD-style sockets, I'd guess you shouldn't have that problem, though. This was a proprietary library that managed the reconnects but caused headaches with the resends and its inappropriate default time-outs.
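Here is a C# sketch of a duplicate-rejection layer like that. It assumes, which the original system may not have done, that every message carries the device's ID and a per-device sequence number. The device increments that number once per logical message and reuses it when it resends. With a single TCP connection per device, messages arrive in order, so anything not newer than the last accepted number is a resend. `DuplicateFilter` is a hypothetical name.

```csharp
using System.Collections.Concurrent;

// Hypothetical duplicate filter. Assumes one TCP connection per device at a time, so a
// device's messages arrive in order and "not newer than the last one" means "already seen".
public sealed class DuplicateFilter
{
    private readonly ConcurrentDictionary<string, long> _lastSeen =
        new ConcurrentDictionary<string, long>();

    // Returns true if the message is new and should be processed, false for a resend.
    public bool Accept(string deviceId, long sequence)
    {
        while (true)
        {
            if (!_lastSeen.TryGetValue(deviceId, out long last))
            {
                if (_lastSeen.TryAdd(deviceId, sequence)) return true;
                continue;                                   // another thread added it first
            }
            if (sequence <= last) return false;             // already processed
            if (_lastSeen.TryUpdate(deviceId, sequence, last)) return true;
            // Lost a race with another thread: loop and re-check.
        }
    }
}
```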
You can usually connect many more than 140 "clients" to a server (given a decent network, hardware and RAM), and they can all share the server's single listening port, since each TCP connection is identified by the combination of client IP, client port, server IP and server port.
I always recommend testing this sort of thing with real scenarios (load etc.) before deciding, since there are aspects such as the network (performance, stability...), the hardware (server RAM etc.) and the software (what exactly does the server do?) that only you can check.
Depending on the protocol, you could (or should) even add some timeout/reconnect mechanism.
The lookup you mean would be really fast: just use a ConcurrentDictionary to hold the needed information, with IP:PORT as the key (assuming the server runs on the full .NET Framework 4 or later).
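Here is a minimal sketch of that lookup. It assumes the server stores each device's connected stream in the dictionary. `DeviceConnections` and its methods are hypothetical names. One design note: a device that reconnects usually arrives from a new ephemeral source port, so a device ID may be a more stable key than IP:PORT. The code works the same with either string.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;

// Hypothetical registry of live device connections keyed by "IP:PORT" (or a device ID).
public sealed class DeviceConnections
{
    private readonly ConcurrentDictionary<string, Stream> _byKey =
        new ConcurrentDictionary<string, Stream>();

    public int Count => _byKey.Count;

    // Adds or replaces the entry, e.g. after a device reconnects.
    public void Register(string key, Stream connection) => _byKey[key] = connection;

    public void Unregister(string key) => _byKey.TryRemove(key, out _);

    // Average O(1) lookup. Returns false if the device isn't connected or the send failed.
    // Assumes the caller serialises writes to any single device's stream.
    public bool TrySend(string key, byte[] data)
    {
        if (!_byKey.TryGetValue(key, out Stream stream)) return false;
        try
        {
            stream.Write(data, 0, data.Length);
            return true;
        }
        catch (Exception ex) when (ex is IOException || ex is ObjectDisposedException)
        {
            // Remove this entry only if it still maps to the dead stream, so a
            // replacement registered concurrently by a reconnect is left alone.
            ((ICollection<KeyValuePair<string, Stream>>)_byKey)
                .Remove(new KeyValuePair<string, Stream>(key, stream));
            return false;
        }
    }
}
```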
For some references see:
http://msdn.microsoft.com/en-us/library/dd287191.aspx
http://geekswithblogs.net/BlackRabbitCoder/archive/2011/02/17/c.net-little-wonders-the-concurrentdictionary.aspx
EDIT - as per comments:
Holding on to a TCP/IP connection doesn't take much processing on the client side; it costs a bit of memory. I would recommend doing a small test (1-2 clients) to check this assumption for your specific case.
If you are talking about a system with hardware devices, then I suggest closing the connection every time the client finishes sending data.
To make sure the client gets updates from the server, the client can wait for a 5-second period for any data to arrive from the server. If data is received within this timeframe, close the connection and process the data. If not, close the connection and wait for data again after sending the next set of data.
This way scaling becomes much easier. Keeping connections open always puts some strain on resources and, in my opinion, is not necessary unless it is a life-saving device such as a heart rate monitor or an oxygen supply monitor.
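Here is a sketch of that connect, send, wait up to 5 seconds, close cycle from the device side. It is written against the desktop `TcpClient` class, which is an assumption, since the .NET Micro Framework socket API differs. `ShortLivedExchange.SendThenPoll` is a hypothetical name. It does a single read, so it assumes the server's reply fits in one small message.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

public static class ShortLivedExchange
{
    // Opens a connection, sends one batch, then waits up to waitMs for anything the
    // server has queued for this device. Returns null if nothing arrived in time.
    public static byte[] SendThenPoll(string host, int port, byte[] payload, int waitMs = 5000)
    {
        using (var client = new TcpClient())
        {
            client.Connect(host, port);
            client.ReceiveTimeout = waitMs;
            NetworkStream stream = client.GetStream();
            stream.Write(payload, 0, payload.Length);

            var buffer = new byte[4096];
            int n;
            try
            {
                n = stream.Read(buffer, 0, buffer.Length);   // one read: assumes small replies
            }
            catch (IOException)
            {
                return null;   // receive timed out (or the connection broke): nothing to process
            }
            if (n == 0) return null;                        // server closed without sending data
            var reply = new byte[n];
            Array.Copy(buffer, reply, n);
            return reply;
        }   // disposing the TcpClient closes the connection in every case
    }
}
```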
Related
I have two PCs connected by a direct Ethernet cable over a 1 Gbps link. One of them acts as the TCP server and the other acts as the TCP client(s). I would like to achieve the maximum possible network throughput between the two.
Options I tried:
Creating multiple clients on PC-1 with different port numbers, connecting to the TCP Server. The reason for creating multiple clients is to increase the network throughput but here I have an issue.
I have a buffer queue of events to be sent to the server. There will be multiple messages with the same event number. The server has to acquire all the messages and then sort them by event number. Each client dequeues a message from a ConcurrentQueue and sends it to the server, then repeats. I have put a constraint on the client side that Event-2 will not be sent until all messages labelled Event-1 have been sent, so I see the events sent in the correct order. The TCP server continuously receives from all the clients.
Now let's come to the problem:
The server is receiving the data in a somewhat random order, as shown in the image. The randomness between two successive events gets worse after some time of acquisition. I think this random behaviour is due to parallel worker threads executing the IO completion callbacks.
Technology used: F# Socket async with SocketAsyncEventArgs
Solution I tried: instead of allowing receives from all the clients on the server side, I tried polling for the next available client with pending data. That ensured the correct order, but its performance is not at all comparable to the earlier approach.
I want to receive in the same order/ nearly same order (but not non-deterministic randomness) as being sent from the clients. Is there any way I can preserve the order and also maintain the better throughput? What are the best ways to achieve nearly 100% network throughput over two PCs?
As others have pointed out in the comments, a single TCP connection is likely to give you the highest throughput, if it's TCP you want to use.
You can possibly achieve slightly (really marginally) higher throughput with UDP, but then you have the hassle of recreating all the goodies TCP gives you for free.
If you want bidirectional high volume high speed throughput (as opposed to high volume just one way at a time), then it's possible one connection for each direction is easier to cope with, but I don't have that much experience with it.
Design tips
You should keep the connection open. The client will need to ask "are you still there?" at regular intervals if no other communication goes on. (On second thought, I realize that the only purpose of this is to allow a quick response and the possibility for the server to initiate a message transaction. So I revise it to: keep the connection open for at least a full transaction.)
Also, you should split up large messages (messages over a certain size). Keep the number of bytes you send in each chunk to a round power-of-two maximum, typically 8K, 16K, 32K or 64K on a local network. Experiment with sizes. The suggested maximum sizes have been optimal since Windows 3 at least. You need some sort of protocol where a chunk consists of a fixed header (typically a magic number for checking and resynchronisation, a chunk number, also for checking and analysis, and a total packet length) followed by the data.
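Here is a C# sketch of such a chunk header with the three fields listed above: magic number, chunk number and total length. The magic value, the 16K default and the class name are assumptions. Every chunk except the last carries exactly `maxChunk` bytes, so the receiver can work out each chunk's size from the total length. Both sides must therefore use the same `maxChunk`. On a bad magic number this sketch simply throws instead of scanning forward to resynchronise.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical chunk framing: [magic u32][chunk number i32][total message length i32][data].
// Every chunk except the last carries exactly maxChunk bytes, so both ends must agree on it.
public static class ChunkFraming
{
    public const uint Magic = 0x43484B31;   // arbitrary marker used for checking the stream
    public const int HeaderSize = 12;

    public static void WriteMessage(Stream output, byte[] message, int maxChunk = 16 * 1024)
    {
        int chunkCount = Math.Max(1, (message.Length + maxChunk - 1) / maxChunk);
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            for (int i = 0; i < chunkCount; i++)
            {
                int offset = i * maxChunk;
                int size = Math.Min(maxChunk, message.Length - offset);
                writer.Write(Magic);
                writer.Write(i);                 // chunk number
                writer.Write(message.Length);    // total length, repeated in every header
                writer.Write(message, offset, size);
            }
        }   // disposing the writer flushes it but leaves the stream open
    }

    public static byte[] ReadMessage(Stream input, int maxChunk = 16 * 1024)
    {
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            byte[] message = null;
            int received = 0, expectedChunk = 0;
            do
            {
                // A real implementation might scan ahead for the next magic to resynchronise.
                if (reader.ReadUInt32() != Magic) throw new InvalidDataException("Bad magic number");
                int chunk = reader.ReadInt32();
                int total = reader.ReadInt32();
                if (chunk != expectedChunk++) throw new InvalidDataException("Unexpected chunk " + chunk);
                if (message == null) message = new byte[total];
                else if (total != message.Length) throw new InvalidDataException("Length changed mid-message");

                int size = Math.Min(maxChunk, total - received);
                byte[] data = reader.ReadBytes(size);
                if (data.Length != size) throw new EndOfStreamException();
                Buffer.BlockCopy(data, 0, message, received, size);
                received += size;
            } while (received < message.Length);
            return message;
        }
    }
}
```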
You can possibly improve throughput further with compression (usually fast, low-ratio compression); it depends very much on the data, and on whether you're on a fast or slow network.
Then there's the hassle one typically runs into: problems with the Nagle algorithm, and I no longer remember enough of the details there. I believe I used to overcome it by sending an acknowledgement back for each chunk received, and I suspect that by doing so you satisfy the algorithm's conditions and so avoid waiting for the last bytes to come in. But do google this.
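For reference, .NET exposes the switch that turns Nagle off directly as `TcpClient.NoDelay` (`Socket.NoDelay`), which sets TCP_NODELAY. The sketch below combines that with the per-chunk acknowledgement described above. The one-byte `Ack` value and the `AckedChunkSender` name are hypothetical. The receiver is assumed to reply with that byte after reading each whole chunk.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net.Sockets;

public static class AckedChunkSender
{
    public const byte Ack = 0x06;   // hypothetical 1-byte acknowledgement (ASCII ACK)

    // Disables the Nagle algorithm (TCP_NODELAY) so small writes are sent immediately.
    public static void DisableNagle(TcpClient client) => client.NoDelay = true;

    // Sends each chunk and waits for a one-byte application-level ack before sending
    // the next one. Returns the number of chunks acknowledged.
    public static int SendWithAcks(Stream stream, IEnumerable<byte[]> chunks)
    {
        int acked = 0;
        foreach (byte[] chunk in chunks)
        {
            stream.Write(chunk, 0, chunk.Length);
            stream.Flush();
            int reply = stream.ReadByte();   // -1 means the peer closed the connection
            if (reply != Ack) throw new IOException("Expected ack, got " + reply);
            acked++;
        }
        return acked;
    }
}
```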
Requirements:
The need is for a C# .NET 4.5 TCP server architecture, running as a Windows service, with always-on (or at least long-lived) connections, vertical and horizontal scaling, and each server handling the maximum possible number of connections. Clients can be any IoT (Internet of Things) devices.
I am aware of the limitations on ports, but I still wonder why these limitations exist in this era of technology (we always have limits, but why still the old ones?!). Also, short-lived TCP/HTTP connections would scale fine, but they are not the requirement here.
Design:
Single thread per server for async-accept new connections (lifetime of server).
code: rawTcpClient = await tcpListener.AcceptTcpClientAsync();
One thread per client connection (loop) to hold the client connection?
(see my Q below)
A Task for performing client operations (short-term, intermittent operations)
My question on optimization (if possible):
How can I hold all the client connections in a set of threads or a thread pool instead of one thread per connection, given that each connection lasts for the client's lifetime, which may be a long time?
For example: per server, only 50 thread-based tasks allocated to hold the connected clients so that they don't get disconnected while waiting for client data?
Efficient & Scalable
The very first thing you need to decide is how efficient you want to be. The socket APIs can get extremely complex if efficiency is your top priority. However, efficiency is almost never the top priority, even though a lot of people think it is. The problem is that complexity can increase exponentially with efficiency/scalability, and if you simply maximize efficiency/scalability, you'll end up with an almost unmaintainable system. So you'll need to decide where to draw the line on that scale.
Particularly if you have horizontal scaling, you probably don't need to use the extreme-efficiency socket APIs.
I am aware of the limitations on ports but still wonder why these limitations in this era of tech (we always have limits but why still the old ones?!).
Compatibility. Ports in particular are represented by a 16-bit value. The only way this would change is if a new standard came out and everything upgraded: NICs, gateways, ISPs, and IoT devices. That's a tall order, and will probably never happen.
Single thread per server for async-accept new connections (lifetime of server).
That's fine. If you have a large amount of connection turnover, you can have multiple accept threads, too. Just keep your backlog high (it should be high by default on Windows Server OSes).
One thread per client connection (loop) to hold client connection?
Er, no.
You'll want to use asynchronous I/O, for sure.
You should have continuous (asynchronous) reads going on all connected clients, and then do (asynchronous) writes as necessary. Also, if the protocol permits it, you should periodically write heartbeat messages to each connected client; otherwise, you'll need a timer for each client to drop the connection. Depending on the nature of your writes, you may need to have a queue of pending writes per client.
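Here is a minimal sketch of that structure using async/await, which .NET 4.5 supports. There is one accept loop, and for each client a read loop that awaits `ReadAsync`, so no thread is tied to a connection while it waits for data. The idle timeout stands in for the per-client timer mentioned above. `AsyncTcpServer` and the `onData` callback are hypothetical names. Heartbeats and the per-client write queue are left out.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Sketch: one async accept loop plus one async read loop per client. No thread is
// dedicated to a connection; while a read is pending, no thread is blocked on it.
public sealed class AsyncTcpServer
{
    private readonly TcpListener _listener;
    private readonly TimeSpan _idleTimeout;                  // assumed policy: drop silent clients
    private readonly Action<TcpClient, byte[], int> _onData; // hypothetical message handler
    private volatile bool _stopping;

    public AsyncTcpServer(IPEndPoint endpoint, TimeSpan idleTimeout, Action<TcpClient, byte[], int> onData)
    {
        _listener = new TcpListener(endpoint);
        _idleTimeout = idleTimeout;
        _onData = onData;
    }

    public int Port => ((IPEndPoint)_listener.LocalEndpoint).Port;

    public void Start() => _listener.Start();

    public void Stop()
    {
        _stopping = true;
        _listener.Stop();   // makes the pending AcceptTcpClientAsync fail
    }

    public async Task AcceptLoopAsync()
    {
        while (true)
        {
            TcpClient client;
            try
            {
                client = await _listener.AcceptTcpClientAsync();
            }
            catch (Exception) when (_stopping)
            {
                return;
            }
            _ = HandleClientAsync(client);   // not awaited: keep accepting while this client is served
        }
    }

    private async Task HandleClientAsync(TcpClient client)
    {
        using (client)
        {
            NetworkStream stream = client.GetStream();
            var buffer = new byte[4096];
            try
            {
                while (true)
                {
                    Task<int> read = stream.ReadAsync(buffer, 0, buffer.Length);
                    if (await Task.WhenAny(read, Task.Delay(_idleTimeout)) != read)
                    {
                        // Idle too long. Disposing the client (end of using) faults the pending
                        // read; observe that fault so it isn't reported as unobserved.
                        _ = read.ContinueWith(t => t.Exception, TaskContinuationOptions.OnlyOnFaulted);
                        return;
                    }
                    int n = await read;
                    if (n == 0) return;          // client closed its side
                    _onData(client, buffer, n);  // the handler must copy the bytes if it keeps them
                }
            }
            catch (IOException) { }              // connection reset or similar: just drop it
            catch (ObjectDisposedException) { }
        }
    }
}
```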
a Task for performing client operation (short term, intermittent operations)
If you use asynchronous tasks, then all your actual code will just run on whatever threadpool thread is available. No need for dedicated tasks at all.
You may find my TCP/IP .NET Sockets FAQ helpful.
Is there any way (preferably in C#) to regularly measure connection-layer latency (round trip) without changing the application protocol and without creating a separate dedicated connection, e.g. using some SYN-ACK trick similar to what tcping does, but without closing/opening the connection?
I'm connecting to the servers via given ASCII based protocol (and always using TCP_NODELAY). Servers send me large amount of discrete messages and I'm regularly sending 'heartbeat' payload (but there is no response payload to the heartbeat).
I cannot change the protocol and in many cases I also cannot create more than one physical connection to the server.
Keep in mind that TCP does windowing, so this could cause issues when trying to implement an elegant SEQ/ACK-based solution (you would want the sequence number, not SYN).
[EDIT: Snipped a very overcomplicated and confusing explanation.]
I'd say the best way is a simple stopwatch method: start a timer, make a very thin request or poll, and measure the time until the reply comes back. If that query really is the lightest you can make it, it should give you the minimum time you can reasonably expect to wait, which is sometimes more valuable than the ping (which can be misleading).
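Here is a minimal stopwatch sketch of that approach. `RoundTrip.Fastest` is a hypothetical helper. The `request` delegate is assumed to send the lightest request the existing protocol supports and block until its reply arrives. Taking the fastest of several samples filters out scheduling and server-side hiccups.

```csharp
using System;
using System.Diagnostics;

public static class RoundTrip
{
    // Times `request` several times and returns the fastest sample, the closest
    // approximation to network round trip plus minimal server work.
    public static TimeSpan Fastest(Action request, int samples = 5)
    {
        if (samples < 1) throw new ArgumentOutOfRangeException(nameof(samples));
        TimeSpan best = TimeSpan.MaxValue;
        var watch = new Stopwatch();
        for (int i = 0; i < samples; i++)
        {
            watch.Restart();
            request();          // send the thin request and wait for its reply
            watch.Stop();
            if (watch.Elapsed < best) best = watch.Elapsed;
        }
        return best;
    }
}
```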
If you really absolutely need just the network time to machine and back, just use an ICMP ping.
Speed, optimization, and scalability are the typical comparisons between the Udp and Tcp protocols. Tcp touts reliability at the cost of a little extra overhead, but its speed is good to excellent. Once a Tcp socket is instantiated, keeping it open requires some overhead. But compared to the oft-described burdens of Udp, which protocol actually has more overhead? I've also heard that there are scalability issues with Tcp... yet the Internet (web pages/servers) runs on Tcp, so what is it about Tcp that inhibits scalability?
Okay...so Udp doesn't require that overhead of keeping a connection open. But, it requires that you write extra methods to ensure all of the packet gets there, hopefully in the order that you want it received. If a packet isn't received in full, then you have to tell the client or server to resend. And you also have to keep some sort of message collection for partial packets, rebuild the partial messages, and check for a complete message before the message can finally be processed. Not to mention if the second part of a message never makes it, you have to either say resend the entire thing, or resend the part we are missing, or whatever.
Basically, my questions are:
Why would I choose Udp over Tcp for a serious, high-performance server, with the added "overhead" of message checking and manual ACK versus the "overhead" of a continuous stream?
If Tcp is good enough for the likes of World of Warcraft, why isn't Tcp more widely accepted as the protocol to use for a game server?
Note: I am not opposed to implementing Udp options for a server. We are using C# on .Net 3.5 framework. So I would also be interested in the best practices for dealing with Udp burdens. I am also using the asynchronous methods at the socket level rather than using TcpListener, TcpClient, etc. etc.
Well, I would recommend reading up some more. There are plenty of places to look at the pros and cons of TCP vs. UDP and vice versa; here are a few:
What Are The Advantages Of Using TCP Over UDP?
When should I use UDP instead of TCP?
TCP and UDP
What are the advantages of UDP over TCP?
However, this link may interest you the most, as it is directly about networked game programming:
Gaffer on Games - UDP vs. TCP
If I were to quote something small:
The decision seems pretty clear then, TCP does everything we want and its super easy to use, while UDP is a huge pain in the ass and we have to code everything ourselves from scratch. So obviously we just use TCP right?
Wrong.
Using TCP is the worst possible mistake you can make when developing a networked game! To understand why, you need to see what TCP is actually doing above IP to make everything look so simple!
I still recommend doing your own research on the matter, though, and making sure which of the protocols suits your needs at the end of the day. That said, it does seem that the majority of games use UDP for their data. Anything that updates the entire state continuously does not need the overhead of guaranteed packet delivery.
First, I'll just paraphrase Stevens from Unix Network Programming Section 22.4 "When to Use UDP instead of TCP":
He basically says the following:
UDP is the only option for broadcast / multicast - so you have to use it there.
UDP can be used for simple request / reply apps. But you have to add your own error detection, meaning at least acks, timeouts and retransmission.
UDP should not be used for bulk data transfer (file transfers), since you would have to build in all the functionality already in TCP to make it work right.
UDP should be used for real time data where speed of delivery is most important and some data loss is not an issue such as real time sensor data, live multimedia streams, real time stock quotes, etc.
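To make Stevens's second point concrete, here is a C# sketch of a UDP request/reply that adds the minimum he lists: a request ID to match replies, a receive timeout, and retransmission. The 4-byte ID prefix and the assumption that the server echoes it back are hypothetical protocol choices, not part of any standard.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Sketch of UDP request/reply with an id for matching replies, a receive timeout,
// and a bounded number of retransmissions.
public static class UdpRequestReply
{
    private static int _nextId;

    public static byte[] Request(UdpClient udp, IPEndPoint server, byte[] payload,
                                 int timeoutMs = 500, int maxAttempts = 4)
    {
        int id = Interlocked.Increment(ref _nextId);
        var datagram = new byte[4 + payload.Length];       // [request id][payload]
        BitConverter.GetBytes(id).CopyTo(datagram, 0);
        payload.CopyTo(datagram, 4);
        udp.Client.ReceiveTimeout = timeoutMs;

        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            udp.Send(datagram, datagram.Length, server);   // first send or retransmission
            try
            {
                while (true)
                {
                    IPEndPoint from = null;
                    byte[] reply = udp.Receive(ref from);
                    // Assumed protocol: the server echoes the id in the first 4 bytes.
                    // Anything else (e.g. a late reply to an older request) is ignored,
                    // and each ignored datagram restarts the receive timeout.
                    if (reply.Length >= 4 && BitConverter.ToInt32(reply, 0) == id)
                    {
                        var result = new byte[reply.Length - 4];
                        Array.Copy(reply, 4, result, 0, result.Length);
                        return result;
                    }
                }
            }
            catch (SocketException ex) when (ex.SocketErrorCode == SocketError.TimedOut
                                             || ex.SocketErrorCode == SocketError.WouldBlock)
            {
                // No matching reply within the timeout: loop round and retransmit.
            }
        }
        throw new TimeoutException("No reply after " + maxAttempts + " attempts");
    }
}
```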
The answer to your first question depends very much on your definition of "high-performance". If your primary concern is low latency, i.e. individual data packets / requests arriving as quickly as possible, then UDP would be the way to go. There are two primary reasons for this. First, assuming packets / requests are fairly independent of each other, using TCP would introduce a problem known as head-of-line blocking.
Let's say you send two independent packets / requests, first A then B. Since TCP is stream based, if A gets lost in the network and needs to be retransmitted, then even if B has already arrived successfully, the stack can't deliver it to the application until A arrives, introducing unnecessary latency. Not only that, but until A arrives, B can't be (cumulatively) acknowledged by the stack, which might cause B to be retransmitted as well, causing needless network congestion.
One way around this problem is to use separate connections for each request, however this also introduces latency and hogs system resources. UDP bypasses all these problems.
Another issue in high performance ( low latency ) servers is the Nagle Algorithm which can add significant latency in TCP communications.
The answer to your second question is that WoW probably sends streams of data, not independent request / reply pairs. Also, some of the latency of TCP can be removed by disabling the Nagle algorithm. If they do use some request / reply communications they may have simply made a design decision that reliability is more important to them than latency.
Define "serious high performance" - how many concurrent connections are you talking about and how much data is flowing?
Take a look at the answers to this question What do you use when you need reliable UDP? which list some of the reliable protocols that have already been built on UDP. You might find one that works for your situation, or you may at least find some useful ideas.
The key to using UDP effectively here is to mix some level of reliability with some level of unreliability; the more each datagram can be handled independently of the others, the greater the advantage. The advantage over TCP is that you get to act on each datagram and decide whether you can use it as it arrives. This is why it works for action games.
So, IMHO, if you need 100% reliability AND in order delivery then go with TCP; don't try and reimplement TCP in UDP.
It's Reliability vs Performance.
FPS games don't require -all- the packets to reach the destination, to reach it in order, to be exceptionally big, or to assure big throughput. They only require the packets to reach the server AS SOON AS POSSIBLE. This is the ultimate priority and overhead of TCP is simply an unnecessary burden.
WoW, with its "not quite realtime" communication and often tons of data to transmit (in crowded areas), may have to deal with packets exceeding the MTU (requiring fragmentation) and requires reliability (with fewer, bigger packets, a lost packet hurts more). So its choice of TCP is logical. The same would go for most turn-based strategy games and the like. In games where a player with a 30 ms ping beats a player with a 50 ms ping, UDP is king.
I think the biggest part of TCP/IP that inhibits scalability is that it maintains a buffer on every incoming / outgoing connection of up to basically the size of the window. So if I'm talking to a high-latency but high-throughput client, I have to keep all sent packets in a buffer until I receive an ack. For a few connections this is fine, but for handling 100K connections it can become problematic overhead. On the receiving end, if a packet is dropped, it will again buffer all new packets received until the missing one is retransmitted.
If you're going to implement retransmission yourself, you need to do the same thing, and hence will have the same overhead. However, UDP does give you an advantage if you know the end-to-end link speeds, if certain messages can be delivered out of order, or if certain messages don't need retransmission. Sticking with the gaming scenario:
packet 1 = move to 1,1
packet 2 = shoot
packet 3 = move to 2,2
For most game designers, if packet 1 is lost but packet 3 is received, packet 1 is no longer important because it contains out-of-date information anyway. However, you could decide that packet 2 is important, so if it's not acked, you send a retransmission.
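Here is a minimal C# sketch of that per-message policy. It assumes each datagram carries a sequence number and a kind. "Move" packets are fire-and-forget, and the receiver drops any move older than the newest one it has applied. "Shoot" packets are tracked until acked and retransmitted on timeout. `PacketKind`, `SelectiveSender` and `SelectiveReceiver` are hypothetical names. The transport and the ack datagrams themselves are left out, and sequence-number wraparound is ignored.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum PacketKind { Move, Shoot }

// Hypothetical sender: numbers every packet but only tracks reliable ones for retransmission.
public sealed class SelectiveSender
{
    private uint _nextSeq = 1;
    private readonly Dictionary<uint, long> _unackedSentAt = new Dictionary<uint, long>(); // seq -> ms

    public uint Send(PacketKind kind, long nowMs)
    {
        uint seq = _nextSeq++;
        if (kind == PacketKind.Shoot) _unackedSentAt[seq] = nowMs;   // must eventually arrive
        return seq;                                // moves: a newer move supersedes a lost one
    }

    public void OnAck(uint seq) => _unackedSentAt.Remove(seq);

    // Reliable packets not acked within timeoutMs; the caller resends them, and they are re-stamped.
    public List<uint> DueForRetransmit(long nowMs, long timeoutMs)
    {
        List<uint> due = _unackedSentAt.Where(p => nowMs - p.Value >= timeoutMs)
                                       .Select(p => p.Key).ToList();
        foreach (uint seq in due) _unackedSentAt[seq] = nowMs;
        return due;
    }
}

// Hypothetical receiver: applies a move only if it is newer than the last applied move.
public sealed class SelectiveReceiver
{
    private uint _lastMoveSeq;                                // 0 = no move applied yet
    private readonly HashSet<uint> _appliedShots = new HashSet<uint>();

    public bool ShouldApply(uint seq, PacketKind kind)
    {
        if (kind == PacketKind.Move)
        {
            if (seq <= _lastMoveSeq) return false;            // stale: a later move already arrived
            _lastMoveSeq = seq;
            return true;
        }
        return _appliedShots.Add(seq);                        // a retransmitted shot is applied once
    }
}
```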
If you need high throughput and connect two servers directly with 1000 Mbps Ethernet, TCP/IP will take a while to ramp up and has additional overhead, and will likely never achieve a true gigabit connection due to the congestion avoidance mechanisms. However, you know it's 1 Gbps, so you can set up your UDP to transmit at up to 1 Gbps (minus overhead) yourself.
To answer your questions more directly:
If you are going to ack every packet anyways, there isn't a massive benefit to having UDP, other than you can process some messages while waiting for retransmission (unless you want in-order delivery as well).
TCP isn't considered for game servers as much, mainly because of the scenario above, and because of real-time combat systems such as first-person shooters, where a message can be dropped and the next message to arrive will invalidate the dropped one anyway. World of Warcraft can get away with using TCP, since they don't have to be as precise with timing, and likely have some good logic that makes it harder for you to tell the difference anyway. The combat system simply doesn't require the speed.
I'd also contend that some of the justification is a holdover from years ago, when everyone had less reliable and slower Internet connections. TCP is also more lenient about sharing the network, so if there's a lot going on, it will slow down so that everyone gets a share of the connection (congestion avoidance).
TCP/IP is a protocol designed by people far smarter than I over years of research. Tuning in the last several years has allowed it to perform better with the faster and faster average network speeds we are seeing, and doesn't require a great understanding to use.
However, replacing it with UDP does require a significant understanding of networking. I've seen badly written UDP programs saturate 1 Gbps links and kill all traffic on the link because they implemented a rather naive retransmission algorithm.
Here's a list of things TCP/IP can now do that you'd lose by going UDP:
- In-order arrival to your program
- retransmission (Now with Fast retransmit, selective acknowledgement, and other features)
- Maximum segment size
- Path MTU Discovery
- Black Hole Detection (extension of Path MTU)
- Congestion avoidance
Because of this, I'd highly recommend sticking with TCP/IP if it suits your needs.
Also, not to nitpick, but your comment about the Internet running on TCP/IP is wrong; there are in fact dozens of Internet-routable protocols, check them out here. I think you were referring to web pages and web servers, which all run on top of TCP/IP. And again, for the web that is great, since we humans won't notice a delay as long as the page shows up correctly. Even there, some argue that TCP/IP isn't aggressive enough for the web: Google thinks tcp/ip should be more aggressive by default
I have a series of systems on a LAN running a synchronized display routine. For example, think of a chorus line. The program they ran is fixed. I have each "client" download the entire routine, and then contact the central "server" at fixed points in the routine for synchronization. The routine itself is mundane with, perhaps, 20 possible instructions.
Each client runs the same routine, but they can be doing completely different things at any one time. One part of the chorus line can be kicking left, another part kicking right, but all in time with each other. Clients can join and drop out at any time, but they're all assigned a part. If no-one is there to run the part, it just doesn't get run.
This is all coded in C# .Net.
The client display is a Windows Forms application. The server accepts TCP connections and then services them in round-robin fashion, keeping a master clock of what's going on. The clients send a signal that says "I've reached sync-point 32" (or 19, or 5, or whatever), wait for the server to acknowledge, and then move on. Or the server can say "No, you need to start at sync-point 15".
This all works great. There is a minor bit of delay between the first and last clients to hit a sync-point, but it's hardly noticeable. Ran for months.
Then the Specification changed.
Now the clients need to respond to near real-time instructions from the server -- it's no longer a pre-set dance program. The server is going to be sending instructions out and the dance program is made up on the fly. I get the fun job of re-designing the protocol, the servicing loops, and the programming instructions.
My toolkit includes anything in a standard .Net 3.5 toolbox. Installing new software is a pain in the arse, since so many systems (clients) can be involved.
I'm looking for suggestions on keeping the clients synced (some sort of latching system? UDP? Broadcast?), distribution of the "dance program", anything that might make this easier than a traditional Client/Server TCP arrangement.
Keep in mind that there are time/speed limitations going on as well. I could put the dance program in a network database, but I'd have to shove instructions in fairly quickly, and there'd be a lot of readers using a rather thick protocol (DBI, SqlClient, etc.) to get a small bit of text. That seems overly complex. And I still need something to keep them all displaying in sync.
Suggestions? Opinions? Wild-ass speculation? Code examples?
PS: Answers may not get marked as "correct" (since this isn't a "correct" answer), but +1 votes for good suggestions for sure.
I did something similar (quite a while back) with synchronizing a bank of 4 displays, each run by a single system, receiving messages from a central server.
The architecture we finally settled on after a fair amount of testing involved having one "master" machine. In your case, this would be having one of your 20 clients that acts as the master, and have it connect to the server via TCP.
The server would then send the entire series of commands through to that one machine.
That machine then used UDP to broadcast real-time instructions to each of the other machines (the 19 other clients on its LAN) to keep their displays up to date. We used UDP here for a couple of reasons: there was lower overhead involved, which helped keep total resource usage down. Also, since you're updating in real time, if one or two "frames" were out of sync, it was never noticeable, at least not noticeable enough for our purposes (a human sitting and interacting with the system).
The key point to this working smoothly, though, is having an intelligent communication means between the main server and the "master" machine - you want to keep the bandwidth as low as possible. In a case like yours, I'd probably come up with a single binary blob that had the current instruction set for the 20 machines, in its smallest form. (Maybe something like 20 bytes, or 40 bytes if you need it, etc). The "master" machine would then worry about translating this out to the other 19 machines and itself.
There are some nice things about this - the server has a much easier time transmitting to one machine in the cluster instead of every machine in the cluster. This let us, for example, have one single, centralized server "drive" multiple clusters efficiently, without having ridiculous hardware requirements anywhere. It also keeps the client code very, very simple. It just has to listen for a UDP datagram and do whatever it says - in your case, it sounds like it would have one of 20 commands, so the client becomes very, very simple.
The "master" machine is the trickiest. In our implementation, we actually ran the same client code on it as on the other 19 (as a separate process), plus one "translation" process that took the blob, broke it into 20 pieces, and transmitted them. It was fairly simple to write, and worked very well.
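Here is a sketch of that "translation" step. It assumes the blob from the server holds one instruction byte per part, matching the roughly 20-byte layout suggested above. Each piece goes out as a two-byte datagram (part number, then instruction), and every client keeps only the datagram addressed to its part. `DanceRelay` and the datagram layout are hypothetical.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical relay on the "master" machine. Blob layout (assumed): byte i is the
// instruction for part i. Each datagram is [part number][instruction].
public static class DanceRelay
{
    // Splits the blob into one datagram per part and sends each to the target endpoint.
    public static int Distribute(UdpClient udp, IPEndPoint target, byte[] blob)
    {
        int sent = 0;
        for (int part = 0; part < blob.Length; part++)
        {
            byte[] datagram = { (byte)part, blob[part] };
            udp.Send(datagram, datagram.Length, target);
            sent++;
        }
        return sent;
    }

    // Client side: every client sees every datagram and keeps only its own part's instruction.
    public static byte? InstructionFor(byte[] datagram, int myPart) =>
        datagram.Length == 2 && datagram[0] == myPart ? datagram[1] : (byte?)null;
}
```

On the real LAN you would set `udp.EnableBroadcast = true` and pass `new IPEndPoint(IPAddress.Broadcast, port)` as the target. Each client then binds a `UdpClient` to that (hypothetical) port and calls `InstructionFor` with its own part number.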