I am writing a client for a server that typically sends data as strings of 500 bytes or fewer. Occasionally, though (on initialization or significant events), the data will exceed that, and for all the client knows a single set of data could contain 200,000 bytes. I would rather not have each client running with a 50 MB socket buffer (if that's even possible).
Each set of data is delimited by a null \0 character. What kind of structure should I look at for storing partially sent data sets?
For example, the server may send ABCDEFGHIJKLMNOPQRSTUV\0WXYZ\0123!\0. I would want to process ABCDEFGHIJKLMNOPQRSTUV, WXYZ, and 123! independently. Also, the server could send ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890LOL123HAHATHISISREALLYLONG without the terminating character. I would want that data set stored somewhere for later appending and processing.
Also, I'm using asynchronous socket methods (BeginSend, EndSend, BeginReceive, EndReceive) if that matters.
Currently I'm debating between List<Byte> and StringBuilder. Any comparison of the two for this situation would be very helpful.
Read the data from the socket into a buffer. When you get the terminating character, turn it into a message and send it on its way to the rest of your code.
Also, remember that TCP is a stream, not a packet. So you should never assume that you will get everything sent at one time in a single read.
As far as buffers go, you should probably only need one per connection at most. I'd probably start with the max size that you reasonably expect to receive, and if that fills, create a new buffer of a larger size - a typical strategy is to double the size when you run out to avoid churning through too many allocations.
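To illustrate the doubling strategy, here is a minimal sketch. The class and method names (GrowableBuffer, EnsureCapacity) are made up for this example, and it assumes sizes stay far below int.MaxValue, so overflow is not handled.

```csharp
using System;

// Hypothetical helper, not part of the .NET framework.
public static class GrowableBuffer
{
    // Returns a buffer with room for 'needed' more bytes after the first
    // 'used' bytes. If the current buffer is too small, its size is doubled
    // until the data fits and the bytes already in use are copied across.
    public static byte[] EnsureCapacity(byte[] buffer, int used, int needed)
    {
        if (used + needed <= buffer.Length)
            return buffer; // still fits, no allocation

        int newSize = Math.Max(buffer.Length, 1);
        while (newSize < used + needed)
            newSize *= 2; // doubling keeps the number of reallocations small

        byte[] bigger = new byte[newSize];
        Buffer.BlockCopy(buffer, 0, bigger, 0, used);
        return bigger;
    }
}
```

You would call it before each receive, for example buffer = GrowableBuffer.EnsureCapacity(buffer, bytesUsed, 512), and then receive into the buffer starting at offset bytesUsed.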
If you have multiple incoming connections, you may want to do something like create a pool of buffers, and just return "big" ones to the pool when done with them.
You could just use a List<byte> as your buffer, so the .NET framework takes care of automatically expanding it as needed. When you find a null terminator you can use List.RemoveRange() to remove that message from the buffer and pass it to the next layer up.
You'd probably want to add a check and throw an exception if it exceeds a certain length, rather than just wait until the client runs out of memory.
(This is very similar to Ben S's answer, but I think a byte array is a bit more robust than a StringBuilder in the face of encoding issues. Decoding bytes to a string is best done higher up, once you have a complete message.)
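A rough sketch of that List<byte> approach might look like the following. NullDelimitedFramer and its maxMessageBytes parameter are names invented for this example. The length check here only applies to the unterminated remainder left in the buffer, which is one of several reasonable places to put it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: accumulates received bytes and splits them on '\0'.
public sealed class NullDelimitedFramer
{
    private readonly List<byte> _pending = new List<byte>();
    private readonly int _maxMessageBytes;

    public NullDelimitedFramer(int maxMessageBytes)
    {
        _maxMessageBytes = maxMessageBytes;
    }

    // Call with the receive buffer and the byte count returned by EndReceive.
    // Returns every message completed by this chunk, without its terminator.
    public List<byte[]> Append(byte[] buffer, int count)
    {
        for (int i = 0; i < count; i++)
            _pending.Add(buffer[i]);

        var messages = new List<byte[]>();
        int end;
        while ((end = _pending.IndexOf((byte)0)) >= 0)
        {
            messages.Add(_pending.GetRange(0, end).ToArray());
            _pending.RemoveRange(0, end + 1); // drop the message and its '\0'
        }

        // Whatever is left has no terminator yet; refuse to let it grow forever.
        if (_pending.Count > _maxMessageBytes)
            throw new InvalidOperationException("Unterminated message exceeds the maximum length.");

        return messages;
    }
}
```

Each byte[] that comes back is one complete message, so it can be decoded to a string at that point (with whatever encoding the server actually uses) and handed to the next layer.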
I would just use a StringBuilder and read in one character at a time, copying and emptying the builder whenever I hit a null terminator.
I wrote this answer regarding Java sockets but the concept is the same.
What's the best way to monitor a socket for new data and then process that data?
I'm writing a simple chat program using sockets. When I send a long message, flush the stream, and then send a short message, the end of the long message gets appended to the short one. It looks like this:
Send "aaasdsd"
Receive "aaasdsd"
Send "bb"
Receive "bbasdsd"
Through debugging I've found that the Flush method, which is supposed to clear all data from the stream, does not do that. According to MSDN, this is the expected behaviour, because NetworkStream is not buffered. How do I clear the stream in that case? I could follow every message with an empty one (consisting of \0 chars) of the same length, but I don't think that's the correct approach, and it would also break some features I need.
TCP doesn't work this way. It's as simple as that.
TCP is a stream-based protocol. That means that you shouldn't ever treat it as a message-based protocol (unlike, say, UDP). If you need to send messages over TCP, you have to add your own messaging protocol on top of TCP.
What you're trying to do here is send two separate messages, and receive two separate messages on the other side. This would work fine on UDP (which is message-based), but it will not work on TCP, because TCP is a stream with no organisation.
So yeah, Flush works just fine. It's just that no matter how many times you call Flush on one side, and how many times you call individual Sends, each Receive on the other end will get as much data as can fit in its buffer, with no respect to the Sends on the other side.
The solution you've devised (almost - just separate the strings with a single \0) is actually one of the proper ways to handle this. By doing that, you're working with messages on top of the stream again. This is called message framing - it allows you to tell individual messages apart. In your case, you've added delimiters between the messages. Think about writing the same data in a file - again, you'll need some way of your own to separate the individual messages (for example, using end lines).
Another way to handle message framing is using a length prefix - before you send the string itself, send its length. Then, when you read on the other side, you know that every string is preceded by a length prefix, so the reader knows where each message ends.
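As a sketch of the sending side of a length prefix, assuming a 4-byte big-endian length followed by a UTF-8 payload (both of which are choices made for this example, as is the name LengthPrefixedWriter):

```csharp
using System;
using System.Text;

// Hypothetical helper for building length-prefixed frames.
public static class LengthPrefixedWriter
{
    // Builds one frame: a 4-byte big-endian payload length, then the payload.
    public static byte[] Frame(string message)
    {
        byte[] payload = Encoding.UTF8.GetBytes(message);
        byte[] frame = new byte[4 + payload.Length];

        // The prefix counts bytes, not characters.
        frame[0] = (byte)(payload.Length >> 24);
        frame[1] = (byte)(payload.Length >> 16);
        frame[2] = (byte)(payload.Length >> 8);
        frame[3] = (byte)payload.Length;

        Buffer.BlockCopy(payload, 0, frame, 4, payload.Length);
        return frame;
    }
}
```

The result of Frame("bb") is six bytes: 0, 0, 0, 2, then the two bytes of "bb". You would write that array to the stream in place of the raw string bytes.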
Yet another way probably isn't very useful for your case - you can work with fixed-length data. So a message will always be exactly 100 bytes, for example. This is very powerful when combined with pre-defined message types - so message type 1 would contain exactly two integers, representing some coördinates, for example.
In any case, though, you'll need your own buffering on the receiving end. This is because (as you've already seen) a single receive can read multiple messages at once, and at the same time, it's not guaranteed to read a whole message in a single read. Writing your own networking is actually pretty tricky - unless you're doing this to actually learn network programming, I'd recommend using an existing technology - for example, Lidgren (a nice networking library, optimized for games but works fine for general networking as well) or WCF. For a chat system, simple HTTP (especially with the bi-directional WebSockets) might be just fine as well.
EDIT:
As Damien correctly noted, there seems to be another problem with your code - you seem to be ignoring the return value of Read. The return value tells you the number of bytes you've actually read. Since you (apparently) have a fixed-size persistent buffer on the receiving side, every byte after the ones you've just read will still contain the old data. To fix this, just make sure you only work with as many bytes as Read returned. Also, since this suggests you're ignoring the Read return value altogether, make sure to properly handle the case when Read returns 0 - that means the other side has gracefully shut down its connection - and the receiving side should do the same.
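Here is a small sketch of honouring the return value of Read with a reused buffer. ChunkReader and TryReadChunk are names invented for this example; Stream.Read itself is the real API.

```csharp
using System;
using System.IO;

// Hypothetical helper showing how to respect the return value of Read.
public static class ChunkReader
{
    // Reads once into the reusable buffer and copies out ONLY the bytes that
    // this call produced. Returns false when Read returns 0, which means the
    // other side has shut down its end of the connection.
    public static bool TryReadChunk(Stream stream, byte[] buffer, out byte[] chunk)
    {
        int read = stream.Read(buffer, 0, buffer.Length);
        if (read == 0)
        {
            chunk = Array.Empty<byte>();
            return false;
        }

        // Bytes past 'read' are stale leftovers from earlier reads.
        chunk = new byte[read];
        Buffer.BlockCopy(buffer, 0, chunk, 0, read);
        return true;
    }
}
```

With this, a 7-byte read of "aaasdsd" followed by a 2-byte read of "bb" into the same buffer yields a 2-byte chunk, even though the buffer itself still holds "bbasdsd". The chunk still needs to go through your message framing, since it may hold part of a message or several messages.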
I am working on a network application that can send live video feed asynchronously from one application to another, sort of like Skype. The main issue I am having is that I want to be able to send the frames but not have to know their size each time before receiving.
The way AForge.NET works when handling images is that the size of the current frame will most likely be different than the one before it. The size is not static so I was just wondering if there was a way to achieve this. And, I already tried sending the length first and then the frame, but that is not what I was looking for.
First, make sure you understand that TCP itself has no concept of "packet" at all, not at the user code level. If one is conceptualizing one's TCP network I/O in terms of packets, they are probably getting it wrong.
Now that said, you can impose a packet structure on the TCP stream of bytes. To do that where the packets are not always the same size, you can only transmit the length before the data, or delimit the data in some way, such as wrapping it in a self-describing encoding, or terminating the data in some way.
Note that adding structure around the data (encoding, terminating, whatever) when you're dealing with binary data is fraught with hassles, because binary data usually is required to support any combination of bytes. This introduces a need for escaping the data or otherwise being able to flag something that would normally look like a delimiter or terminator, so that it can be treated as binary data instead of some boundary of the data.
Personally, I'd just write a length before the data. It's a simple and commonly used technique. If you still don't want to do it that way, you should be specific and explain why you don't, so that your specific scenario can be better understood.
I wasn't quite sure how to explain my problem in the title, but I'll try to elaborate on my problem.
Basically I'm coding a chat that is not P2P, but where all users connect to a central server, similar to IRC. The connections are asynchronous and it almost works flawlessly. The main issue is that, when a lot of data is sent to one user (or to the server from one user) at once, the bytes may merge, resulting in errors. I've approached this by adding a header of 4 bytes containing the length of the data in front of the rest of the data. Still, the bytes seem to merge. I've also tried setting NoDelay to true and DontFragment to false; still, it doesn't work.
I'm guessing the problem is that when the bytes merge, I only handle the first bytes and then do nothing with the remaining. What would be the best way to approach this issue?
Receive callback code: http://pastebin.com/f0MvjHag
That's why they call it a stream. You put bytes in at one end and TCP guarantees they come out in the same order, none missing or duplicated, at the far end. Anything bigger than a byte is your problem.
You have to accumulate enough bytes in a buffer to have your header. Then interpret it and start processing additional bytes. You may have a few left over that start the next header.
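A sketch of that accumulate-then-interpret loop follows. It assumes the 4-byte header holds the body length in big-endian order; the question doesn't say which byte order is used, so adjust accordingly. LengthPrefixedReader and maxMessageBytes are names invented here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: reassembles length-prefixed messages from a byte stream.
public sealed class LengthPrefixedReader
{
    private readonly List<byte> _pending = new List<byte>();
    private readonly int _maxMessageBytes;

    public LengthPrefixedReader(int maxMessageBytes)
    {
        _maxMessageBytes = maxMessageBytes;
    }

    // Call with the receive buffer and the byte count the receive returned.
    // Returns the bodies of all messages completed so far.
    public List<byte[]> Append(byte[] buffer, int count)
    {
        for (int i = 0; i < count; i++)
            _pending.Add(buffer[i]);

        var messages = new List<byte[]>();
        while (_pending.Count >= 4) // a full header is available
        {
            int length = (_pending[0] << 24) | (_pending[1] << 16)
                       | (_pending[2] << 8) | _pending[3];
            if (length < 0 || length > _maxMessageBytes)
                throw new InvalidOperationException("Invalid message length in header.");

            if (_pending.Count - 4 < length)
                break; // body not complete yet, wait for more bytes

            messages.Add(_pending.GetRange(4, length).ToArray());
            _pending.RemoveRange(0, 4 + length); // leftovers start the next header
        }
        return messages;
    }
}
```

The while loop is the part that deals with "merged" bytes: if one receive delivers two and a half messages, two bodies are returned and the remaining half stays in the buffer until the rest arrives.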
This is normal behavior. When your application is not receiving data the system will be buffering it for you. It will try to hand off the available data the next time you make a request. On the other side, a large write may travel over connections that do not support an adequate frame size. It will be split as needed and arrive eventually in dribs and drabs.
This usually happens when two or more packets of data are sent at close intervals.
I recently had this problem myself, and the way I resolved it was to add a separating key. You can then tokenize each message. For example, you could add ASCII character 4 (the End-of-Transmission character) to the end of each message being sent, like I did.
Write("Message1" + ((char)4).ToString());
Write("Message2" + ((char)4).ToString());
Then, when the client receives the data, you can iterate through the received data. When it finds that special character, it knows it's the end of one message, and (maybe) the beginning of a new one.
"Message1(EOT char)Message2(EOT char)"
A newline (\n) may be easier to work with than a non-printable control character.
C# socket server, which has roughly 200 - 500 active connections, each one constantly sending messages to our server.
About 70% of the time the messages are handled fine (in the correct order etc.); however, in the other 30% of cases we get jumbled-up messages and things get screwed up. We should note that some clients send data in Unicode and others in ASCII, so that's handled as well.
Messages sent to the server are variable-length strings that end in a char3. It's the char3 that we break on; until we see it, we keep receiving data.
Could anyone shed any light on our ProcessReceive code and see what could possibly be causing us issues and how we can solve this small issue (here's hoping it's a small issue!)
Code below:
Firstly, I'm sure you know, but it's always worth repeating; TCP is a stream of bytes. It knows nothing of any application level "messages" that you may determine exist in that stream of bytes. All successful socket Recv calls, whether sync or async, can return any number of bytes between 1 and the size of the buffer supplied.
With that in mind you should really be dealing with your message framing (i.e. looking for your delimiter) before you do anything else. If you don't find a delimiter then simply reissue the read using the same SocketAsyncEventArgs, the same buffer and set the offset to where you currently are, this will read some more data into the buffer and you can take another look for the delimiter once the next read has completed... Ideally you'd keep track of where you last got to when searching for a delimiter in this buffer to reduce repeated scanning...
Right now you're not doing that and your use of e.Buffer[e.Offset] == 255 will fail if you get a message that arrives in pieces as you could be referring to any byte in the message if the message is split over multiple reads.
The problem I am seeing is that you are calling Encoding.Unicode.GetString() on a buffer you received in the current read from the socket. However, the contents of that buffer might not be a valid Unicode encoding of a string, for example when a read ends in the middle of a multi-byte character.
What you need to do is to buffer your entire stream, and then decode it as a string in one final operation, after you have received all the data.
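To make the difference concrete, here is a small sketch comparing the two approaches. MessageDecoding and its two methods are invented for this example, and it assumes the chunks passed in together make up exactly one complete message.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper contrasting whole-message and per-read decoding.
public static class MessageDecoding
{
    // Concatenates the raw chunks of ONE complete message, then decodes once.
    public static string DecodeWhole(IEnumerable<byte[]> chunks, Encoding encoding)
    {
        var all = new List<byte>();
        foreach (byte[] chunk in chunks)
            all.AddRange(chunk);
        return encoding.GetString(all.ToArray());
    }

    // The fragile alternative: decode each chunk separately as it arrives.
    // A character split across two chunks is decoded incorrectly.
    public static string DecodePerChunk(IEnumerable<byte[]> chunks, Encoding encoding)
    {
        var text = new StringBuilder();
        foreach (byte[] chunk in chunks)
            text.Append(encoding.GetString(chunk));
        return text.ToString();
    }
}
```

In Encoding.Unicode (UTF-16) the string "hi" is four bytes. If the socket delivers three of them in one read and the last one in the next, DecodeWhole still gives "hi", while DecodePerChunk does not, because each read is decoded with half a character missing.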
I'm writing a program that will have both a server side and a client side, and the client side will connect to a server hosted by the same program (but by another instance of it, and usually on another machine). So basically, I have control over both aspects of the protocol.
I am using BeginReceive() and BeginSend() on both sides to send and receive data. My question is if these two statements are true:
Using a call to BeginReceive() will give me the entire data that was sent by a single call to BeginSend() on the other end when the callback function is called.
Using a call to BeginSend() will send the entire data I pass it to the other end, and it will all be received by a single call to BeginReceive() on the other end.
The two are basically the same in fact.
If the answer is no, which I'm guessing is the case based on what I've read about sockets, what is the best way to handle commands? I'm writing a game that will have commands such as PUT X Y. I was thinking of appending a special character (# for example) to the end of each command, and each time I receive data, I append it to a buffer, then parse it only after I encounter a #.
No, you can't expect BeginReceive to necessarily receive all of the data from one call to BeginSend. You can send a lot of data in one call to BeginSend, which could very well be split across several packets. You may receive each packet's data in a separate receive call.
The two main ways of splitting a stream into multiple chunks are:
Use a delimiter (as per your current suggestion). This has the benefit of not needing to know the size beforehand, but has the disadvantage of being relatively hard to parse, and potentially introducing requirements such as escaping the delimiter.
Prepend the size of each message before the message. The receiver can read the length first, and then know exactly how much data to expect.