How to do non-blocking socket reads with Protobuf using C#?

Let's say I want to do non-blocking reads from a network socket.
I can async/await for the socket to read x bytes and all is fine.
But how do I combine this with deserialization via protobuf?
Reading objects from a stream must be blocking, mustn't it? That is, if the stream contains too little data for the parser, then there has to be some blocking going on behind the scenes so that the reader can fetch all the bytes it needs.
I guess I can use length-prefix delimiters, read the first few bytes, and then figure out the minimum number of bytes I have to fetch before I parse. Is this the right way to go about it?
E.g. if my buffer is 500 bytes, await those 500 bytes, parse the length prefix, and if the length is more than 500 then await again until all of it has been read.
What is the idiomatic way to combine non-blocking IO and protobuf parsing?
(I'm using Jon Skeet's implementation right now http://code.google.com/p/protobuf-csharp-port/)

As a general rule, serializers don't often contain a DeserializeAsync method, because that is really, really hard to do (at least, efficiently). If the data is of moderate size, then I would advise buffering the required amount of data using asynchronous code - and then deserializing when all of the required data is available. If the data is very large and you don't want to have to buffer everything in memory, then consider using a regular synchronous deserialize on a worker thread.
(Note that none of this is implementation-specific, but if the serializer you are using does support an async deserialize: then sure, use that.)
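For the length-prefixed framing the question describes, a minimal sketch might look like the following. It assumes a generated message type MyMessage with a static ParseFrom(byte[]) factory (as the protobuf-csharp-port generator produces) and a 4-byte little-endian length prefix; adjust to whatever framing you actually use.
using System;
using System.IO;
using System.Threading.Tasks;

// Await exactly `count` bytes from the stream.
static async Task<byte[]> ReadExactAsync(Stream stream, int count)
{
    var buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = await stream.ReadAsync(buffer, offset, count - offset);
        if (read == 0) throw new EndOfStreamException();
        offset += read;
    }
    return buffer;
}

// Await the length prefix, then await the payload, then parse synchronously;
// by the time ParseFrom runs, all of the bytes it needs are already in memory.
static async Task<MyMessage> ReadMessageAsync(Stream stream)
{
    var lengthBytes = await ReadExactAsync(stream, 4);
    int length = BitConverter.ToInt32(lengthBytes, 0);
    var payload = await ReadExactAsync(stream, length);
    return MyMessage.ParseFrom(payload); // hypothetical generated message type
}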

Use the BeginReceive()/EndReceive() methods to receive your data into a byte buffer (typically 1024 or 2048 bytes). In the AsyncCallback, after ensuring that you didn't read -1/0 bytes (end of stream/disconnect/IO error), attempt to deserialize the packet with protobuf.
Your receive callback will be asynchronous, and it makes sense to parse the packet in the same thread as the reading, IMHO. It's the handling that will likely cause the biggest performance hit.
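A rough sketch of that callback pattern; the buffer size and the HandleBytes hook are placeholders, and real code would still need length-prefix framing inside HandleBytes to know where one message ends and the next begins:
using System;
using System.Net.Sockets;

class Receiver
{
    private readonly Socket _socket;
    private readonly byte[] _buffer = new byte[2048];

    public Receiver(Socket socket)
    {
        _socket = socket;
        _socket.BeginReceive(_buffer, 0, _buffer.Length, SocketFlags.None, OnReceive, null);
    }

    private void OnReceive(IAsyncResult ar)
    {
        int read = _socket.EndReceive(ar);
        if (read <= 0) return; // end of stream / disconnect

        // Hand the received bytes to your framing + protobuf deserialization logic.
        HandleBytes(_buffer, read);

        // Queue the next receive.
        _socket.BeginReceive(_buffer, 0, _buffer.Length, SocketFlags.None, OnReceive, null);
    }

    private void HandleBytes(byte[] buffer, int count) { /* accumulate and parse here */ }
}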

Related

Pipelines buffer preserving until processing is complete

I am researching the possibility of using pipelines for processing binary messages coming from the network.
The binary messages I will be processing come with a payload, and it is desirable to keep the payload in its binary form.
The idea is to read out the whole message and create a slice of the message and its payload. Once the message is completely read, it will be passed to a channel chain for processing. The processing will not be instant and might take some time or be executed later, and the goal is not to have the pipe reader wait until the processing is complete. Once the message processing is complete, I would need to release the processed buffer region back to the pipe writer.
Now of course I could just create a new byte array and copy the data coming from the pipe writer, but that would defeat the purpose of no-copy. So, as I understand it, I would need some buffer synchronization between the pipeline and the channel?
I looked at the available APIs of the pipe reader (AdvanceTo), where it's possible to tell the pipe reader what was consumed and what was examined, but I can't work out how this could be synchronized outside of the pipe-reading method.
So the question would be whether there are some techniques or examples on how this can be achieved.
The buffer obtained from TryRead/ReadAsync is only valid until you call AdvanceTo, with the expectation that as soon as you've done that: anything you reported as consumed is available to be recycled for use elsewhere (which could be parallel/concurrent readers). Strictly speaking: even the bits you haven't reported as consumed: you still shouldn't treat as valid once you've called AdvanceTo (although in reality, it is likely that they'll still be the same segments - just: that isn't the concern of the caller; to the caller, it is only valid between the read and the advance).
This means that you explicitly can't do:
while (...)
{
    var result = await reader.ReadAsync();
    var buffer = result.Buffer;
    if (TryIdentifyFrameBoundary(buffer, out var frame))
    {
        BeginProcessingInBackground(frame); // <==== THIS IS A PROBLEM!
        reader.AdvanceTo(frame.End, frame.End);
    }
    else // take nothing
    {
        reader.AdvanceTo(buffer.Start, buffer.End);
        if (result.IsCompleted) break; // that's all folks
    }
}
because the "in background" bit, when it fires, could now be reading someone else's data (due to it being reused already).
So: either you need to process the frame contents as part of the read loop, or you're going to have to make a copy of the data, most likely by using:
var len = checked((int)buffer.Length);
var oversized = ArrayPool<byte>.Shared.Rent(len);
buffer.CopyTo(oversized);
and pass oversized to your background processing, remembering to only look at the first len bytes of it. You could pass this as a ReadOnlyMemory<byte>, but you need to consider that you're also going to want to return it to the array-pool afterwards (probably in a finally block), and passing it as a memory makes it a little more awkward (but not impossible, thanks to MemoryMarshal.TryGetArray).
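To make that concrete, here is a sketch of the hand-off; ProcessFrame is a placeholder for whatever the background work actually is:
using System;
using System.Buffers;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

static void BeginProcessingInBackground(byte[] oversized, int len)
{
    _ = Task.Run(() =>
    {
        try
        {
            // Only the first `len` bytes are meaningful; the rest is pool slack.
            ProcessFrame(new ReadOnlyMemory<byte>(oversized, 0, len));
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(oversized);
        }
    });
}

static void ProcessFrame(ReadOnlyMemory<byte> frame) { /* placeholder */ }

// If all you have is the memory, MemoryMarshal.TryGetArray gets you back to the
// underlying array so it can be returned to the pool.
static void ReturnToPool(ReadOnlyMemory<byte> memory)
{
    if (MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
        ArrayPool<byte>.Shared.Return(segment.Array);
}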
Note: in early versions of the pipelines API, there was an element of reference-counting, which did allow you to preserve buffers, but it had a few problems:
it complicated the API hugely
it led to leaked buffers
it was ambiguous and confusing what "preserved" meant; is the count until it gets reused? or released completely?
so that feature was dropped.

How to read bytes from SerialPort.BaseStream without Length

I want to use the stream class to read/write data to/from a serial port. I use the BaseStream to get the stream (link below), but the Length property doesn't work. Does anyone know how I can read the full buffer without knowing how many bytes there are?
http://msdn.microsoft.com/en-us/library/system.io.ports.serialport.basestream.aspx
You can't. That is, you can't guarantee that you've received everything if all you have is the BaseStream.
There are two ways you can know if you've received everything:
Send a length word as the first 2 or 4 bytes of the packet. That says how many bytes will follow. Your reader then reads that length word, reads that many bytes, and knows it's done.
Agree on a record separator. That works great for text. For example, you might decide that a null byte or an end-of-line character signals the end of the data. This is somewhat more difficult to do with arbitrary binary data, but possible. See comment.
Or, depending on your application, you can do some kind of timing. That is, if you haven't received anything new for X number of seconds (or milliseconds?), you assume that you've received everything. That has the obvious drawback of not working well if the sender is especially slow.
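A minimal sketch of the length-word option against SerialPort.BaseStream; the 2-byte little-endian prefix is an assumption, so use whatever framing the sender actually emits:
using System;
using System.IO;

static byte[] ReadLengthPrefixedPacket(Stream stream)
{
    // Read the 2-byte length word first (little-endian here by convention).
    var lengthBytes = ReadExact(stream, 2);
    int length = lengthBytes[0] | (lengthBytes[1] << 8);

    // Then read exactly that many payload bytes.
    return ReadExact(stream, length);
}

static byte[] ReadExact(Stream stream, int count)
{
    var buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = stream.Read(buffer, offset, count - offset);
        if (read == 0) throw new EndOfStreamException("Port closed before the full packet arrived.");
        offset += read;
    }
    return buffer;
}
Call it as ReadLengthPrefixedPacket(serialPort.BaseStream); Read blocks until at least one byte is available, so the loop keeps going until the whole packet has arrived.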
Maybe you can try the SerialPort.BytesToRead property.

Reading from NetworkStream. Which is better ReadLine() vs. Read into ByteArray?

I am using TcpClient to communicate with a server that sends information in form of "\n" delimited strings. The data flow is pretty high and once the channel is set, the stream would always have information to read from. The messages can be of variable sizes.
My question now is, would it be better to use the ReadLine() method to read the messages from the stream, as they are already "\n" delimited, or would it be advisable to read a byte array of some fixed size and pick up the message strings from it using Split("\n") or similar? (Yes, I do understand that there may be cases when the byte array gets only a part of a message, and we would have to implement logic for that too.)
Points that need to be considered here are:
Performance.
Data Loss. Will some data be lost if the client isn't reading as fast as the data is coming in?
Multi-Threaded setup. What if this setup has to be implemented in a multi-threaded environment, where each thread would have a separate communication channel, however would share the same resources on the client.
If performance is your main concern, then I would prefer Read over the ReadLine method. I/O is one of the slower things a program can do, so you want to minimize the time spent in I/O routines by reading as much data up front as possible.
Data loss is not really a concern here if you are using TCP. The TCP protocol guarantees delivery and will deal with congestion issues that result in lost packets.
For the threading portion of the question we're going to need a bit more information. What resources are shared, are they sharing TcpClient's, etc ...
I would say go with a pool of buffers and do the reads manually (Read() on the socket) if you need a lot of performance. Pooling the buffers avoids generating garbage, as I believe ReadLine() will generate some.
Since you're using TCP, data loss should not be a problem.
For the multi-threaded setup you will have to be more specific, but in general resource sharing is troublesome as it might produce data races.
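A sketch of that manual approach, assuming the messages are ASCII text delimited by '\n'; the onMessage callback and the 8 KB buffer size are placeholders:
using System;
using System.Buffers;
using System.Net.Sockets;
using System.Text;

static void ReadLines(NetworkStream stream, Action<string> onMessage)
{
    var buffer = ArrayPool<byte>.Shared.Rent(8192);
    var pending = new StringBuilder();
    try
    {
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // A single read may hold part of a message, one message, or several.
            pending.Append(Encoding.ASCII.GetString(buffer, 0, read));
            int newline;
            while ((newline = IndexOf(pending, '\n')) >= 0)
            {
                onMessage(pending.ToString(0, newline));
                pending.Remove(0, newline + 1);
            }
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}

static int IndexOf(StringBuilder sb, char c)
{
    for (int i = 0; i < sb.Length; i++)
        if (sb[i] == c) return i;
    return -1;
}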
Why not use a BufferedStream to ensure you are reading optimally out of the stream:
var req = (HttpWebRequest)WebRequest.Create("http://www.stackoverflow.com");
using (var resp = (HttpWebResponse)req.GetResponse())
using (var stream = resp.GetResponseStream())
using (var bufferedStream = new BufferedStream(stream))
using (var streamReader = new StreamReader(bufferedStream))
{
    while (!streamReader.EndOfStream)
    {
        string currentLine = streamReader.ReadLine();
        Console.WriteLine(currentLine);
    }
}
Of course, if you're looking to scale, going async would be a necessity. As such, ReadLine is out of the question and you're back to byte array manipulations.
I would read into a byte array. The only downside: limited size. You'd need to pick a byte limit, or periodically flush the byte array into a byte collection, then transform the collection back into an array of bytes, convert it to a string using BitConverter, and finally split it into the real messages.
You will have some overhead flushing the array into a collection, but flushing into a string would require more resources, as the bytes have to be decoded as you flush them into the string. So it's up to you: you can choose simplicity with strings or efficiency with bytes. Either way it wouldn't be a serious performance difference, but I'd personally go with the byte collection to avoid the implicit byte conversion.
Important: this comes from personal experience with previous TCP socket usage (Quake RCON stuff), not any book or anything :) Please correct me if I'm mistaken.

C# performance methods of receiving data from a socket?

Let's assume we have a simple internet socket, and it's going to send 10 megabytes (because I want to ignore memory issues) of random data through.
Is there any performance difference or a best practice method that one should use for receiving data? The final output data should be represented by a byte[]. Yes I know writing an arbitrary amount of data to memory is bad, and if I was downloading a large file I wouldn't be doing it like this. But for argument's sake let's ignore that and assume it's a smallish amount of data. I also realise that the bottleneck here is probably not the memory management but rather the socket receiving. I just want to know what would be the most efficient method of receiving data.
A few dodgy ways I can think of are:
Have a List<byte> and a buffer; after the buffer is full, add it to the list, and at the end call list.ToArray() to get the byte[].
Write the buffer to a MemoryStream; after it's complete, construct a byte[] of stream.Length and read it all into it in order to get the byte[] output.
Is there a more efficient/better way of doing this?
Just write to a MemoryStream and then call ToArray - that does the business of constructing an appropriately-sized byte array for you. That's effectively what a List<byte> would be like anyway, but using a MemoryStream will be a lot simpler.
Well, Jon Skeet's answer is great (as usual), but there's no code, so here's my interpretation. (Worked fine for me.)
using (var mem = new MemoryStream())
{
    using (var tcp = new TcpClient())
    {
        tcp.Connect(new IPEndPoint(IPAddress.Parse("192.0.0.192"), 8880));
        tcp.GetStream().CopyTo(mem);
    }
    var bytes = mem.ToArray();
}
(Why not combine the two usings? Well, if you want to debug, you might want to release the tcp connection before taking your time looking at the bytes received.)
This code will receive multiple packets and aggregate their data, FYI. So it's a great way to simply receive all tcp data sent during a connection.
What is the encoding of your data? Is it plain ASCII, or is it something else, like UTF-8/Unicode?
If it is plain ASCII, you could just allocate a StringBuilder of the required size (get the size from the ContentLength header of the response) and keep appending your data to the builder, after converting it into a string using Encoding.ASCII.
If it is Unicode/UTF-8 then you have an issue - you cannot just call Encoding.UTF8.GetString(buffer, 0, bytesRead) on the bytes read, because the bytes read might not constitute a complete character sequence in that encoding. In that case you will need to buffer the entire entity body into memory (or a file), then read it back and decode it using the encoding.
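A small sketch of that buffer-then-decode approach for UTF-8, assuming the whole body fits comfortably in memory:
using System.IO;
using System.Net.Sockets;
using System.Text;

// Buffer the entire body first, then decode once, so a multi-byte UTF-8
// character split across two reads cannot be decoded incorrectly.
static string ReceiveUtf8(NetworkStream stream)
{
    using (var mem = new MemoryStream())
    {
        stream.CopyTo(mem);                             // buffer everything
        return Encoding.UTF8.GetString(mem.ToArray());  // decode once, at the end
    }
}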
You could write to a memory stream, then use a StreamReader or something like that to get the data. What are you doing with the data? I ask because it would be more efficient, from a memory standpoint, to write the incoming data to a file or database table as it is being received rather than storing the entire contents in memory.

Send serialised object via socket

What's the best way to format a message to a server? At the moment I'm serialising an object using the BinaryFormatter and then sending it to the server.
At the server end it's listening in an async fashion, and when the buffer received is not 100% full it assumes that the transfer is complete.
This is working at the moment, and I can deserialise the object at the other end. I'm just concerned that if I start sending asynchronously this method will fail, as messages could be blurred together.
I know that I need to mark the message somehow, to say "that's the end of message one; this other bit belongs to message two", but I'm unsure of the correct way to do this.
Could anyone point me in the right direction and maybe give me some examples?
Thanks
You could always serialize it to a memory stream, see how big it is, send the length as a 4-byte binary number then send the stream's contents.
On the other side you can just sit and wait for 4 bytes, combine them into an integer then sit and wait for that number of bytes.
When your read function returns (make them blocking reads), you know you have the entire object into a buffer, so you just deserialize it and cast it to your common interface type.
Edit: this is the answer to your specific question. That being said, you would be better off using a higher-level library than pure TCP.
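A sketch of that length-prefix framing; BinaryFormatter is used here only because the question does, and any serializer that writes to a Stream works the same way:
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static void SendObject(Stream network, object obj)
{
    var formatter = new BinaryFormatter();
    using (var mem = new MemoryStream())
    {
        formatter.Serialize(mem, obj);          // serialize first to learn the size
        byte[] payload = mem.ToArray();
        byte[] lengthPrefix = BitConverter.GetBytes(payload.Length); // 4 bytes
        network.Write(lengthPrefix, 0, 4);
        network.Write(payload, 0, payload.Length);
    }
}

static object ReceiveObject(Stream network)
{
    int length = BitConverter.ToInt32(ReadExact(network, 4), 0); // wait for the 4-byte length
    byte[] payload = ReadExact(network, length);                 // then wait for that many bytes
    return new BinaryFormatter().Deserialize(new MemoryStream(payload));
}

static byte[] ReadExact(Stream stream, int count)
{
    var buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = stream.Read(buffer, offset, count - offset);
        if (read == 0) throw new EndOfStreamException();
        offset += read;
    }
    return buffer;
}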
If your object has a fixed length, you can receive the specified amount of bytes on the other end and then create your object.
Otherwise you can send delimiters (a symbol or sequence of symbols that you are not using in your object) between your objects and keep reading the received data byte by byte until you see the delimiter.
You might want to take a look at Protocol Buffers (http://code.google.com/p/protobuf), an (implementation-language-independent) data interchange format/framework. There are at least two .NET implementations of it (see http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns).
