I'm working on a client app, written in C#, that needs to communicate with a legacy app (let's call it the server). The problem is that the server's API is represented as a bunch of plain old C structs. Every struct has a 4-byte header followed by its data; it's just a byte stream.
I understand that this particular binary format is unique (dictated by the legacy server app). Because of that, it's not possible to use SerDes libraries like Protocol Buffers, which use their own way of encoding binary data.
Is there any project/library for binary serialization that lets me specify both the type of a message (like protobuf does) and its binary format? Every library I've seen is based on JSON, XML, or its own proprietary binary format.
Suppose I decided to write my own SerDes library (in C#). What would be the best/recommended strategy for doing this? I want to do it the professional way, at least once in my life. Thanks!
PS: We're talking about little-endian only.
This is how the server defines a message:
struct Message1
{
    byte Size;       // Header, 1st byte
    byte Type;
    byte ReqI;
    byte Zero;       // Header, 4th byte
    word UDPPort;    // Actual data starts here
    word Flags;
    byte Sp0;
    byte Prefix;
    word Interval;
    char Admin[16];
    char IName[16];
};
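If you do write your own SerDes, one reasonable strategy is to parse the byte stream field by field with explicit little-endian reads, rather than relying on memory layout. Here is a minimal sketch of that idea for the Message1 layout above; the field names come from the server definition, but the helper names (Message1Parsed, readU16, readFixedStr) and the assumption that word is a 16-bit unsigned integer are mine:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Parsed representation of the Message1 wire layout shown above.
struct Message1Parsed {
    uint8_t  size, type, reqI, zero;   // 4-byte header
    uint16_t udpPort, flags;
    uint8_t  sp0, prefix;
    uint16_t interval;
    std::string admin, iname;          // the fixed 16-byte char fields
};

// Read a little-endian 16-bit word without relying on host byte order.
static uint16_t readU16(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Copy a fixed-size char field up to its first NUL terminator.
static std::string readFixedStr(const uint8_t* p, size_t n) {
    size_t len = 0;
    while (len < n && p[len] != '\0') ++len;
    return std::string(reinterpret_cast<const char*>(p), len);
}

static Message1Parsed parseMessage1(const std::vector<uint8_t>& buf) {
    assert(buf.size() >= 44);          // 4 header + 8 data + 2*16 chars
    Message1Parsed m;
    m.size     = buf[0];
    m.type     = buf[1];
    m.reqI     = buf[2];
    m.zero     = buf[3];
    m.udpPort  = readU16(&buf[4]);
    m.flags    = readU16(&buf[6]);
    m.sp0      = buf[8];
    m.prefix   = buf[9];
    m.interval = readU16(&buf[10]);
    m.admin    = readFixedStr(&buf[12], 16);
    m.iname    = readFixedStr(&buf[28], 16);
    return m;
}
```

The same approach translates directly to C# using a BinaryReader over the socket stream: one small read method per field, in declaration order.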
It sounds like you have fixed-size C structs being sent over a socket connection, and you need to interpret them as handy C# classes.
The easiest way may be to do all the message handling in code written in managed C++. In that you'd have a structure that, possibly with a bunch of pragmas, I'm sure could be made to have the same memory layout as the structure being sent through the socket. You would then also define a similar managed C++ class (e.g. containing managed strings instead of char arrays), and write code that converts the struct, field by field, into one of these managed classes. Wrap the whole thing up in a DLL and include it in your C# project as a dependency.
The reason for this is that managed C++, weird though it is as a language, is a far easier bridge between unmanaged and managed code and data structures. There's no need to marshal anything; that's done for you. I've used this route to create libraries that make calls into Windows' hardware discovery facilities, for which there isn't (or wasn't) any pre-existing C# library. Using managed C++ to call the necessary Win32 functions was far easier than doing the same thing from C#.
Good luck!
Related
I am receiving data via UDP from a C/C++ application. This application is doing a memcpy of the class into a buffer and throwing it our way. Our application is written in C# and I need to somehow make sense of the data. We have access to the header files of the structures - everything is basically a struct or an enum. We can't change the format the data comes in and the header files are likely to change fairly often.
I have considered re-writing our comms classes in C++ to receive the data and then I have more control of its serialisation, but that will take a long time and my C++ is rusty, not to mention I don't have a lot of experience with C++ threading which would be a requirement.
I have also created a few prototype C++ libraries with the provided header files to be accessed via C#, but I can't quite get my head around how I actually create and use an actual instance of the class in C# itself (every time I look into this, all I see are extern function calls, not the use of external types).
I have also looked into Marshalling. However, as the data is liable to change quite often, I think this is a last resort and feels quite manual.
Does anyone know of any options or have any more targeted reading or advice on this matter?
Why not use Google Protocol Buffers on each end, i.e. C++ and C#? You would take your C++ definition and let PB do all the serialisation for you.
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. more...
It works across different OSs, even where primitive type conversion would normally be a problem.
I need to create a list (array) from .net which consists of about 50 000 elements, pass it to c++ dll, operate on it and return a list (array) from c++ to .net.
First option which comes to my mind, is to create a struct on both sides. Return an array of structs from .net to c++.
Here are my concerns:
a) If a struct consists only of non-reference types (ints, doubles, etc.),
will an array of such structs be stored on the stack?
Is there a limit when creating an array of structs? Is it efficient?
Is it an efficient way to initialize such a large array on the .NET side?
Do you have any sample which shows how to pass references to objects, etc.,
without COM, interoperability wrappers, and so on?
Generally, I'm seeking advice on how to perform the following efficiently:
1) Fetch data from the DB
2) Allocate it in a structure which I can efficiently pass to a C++ Win32 DLL
3) Perform operations on the C++ side, then return an array back to .NET
I also need advice on which side the objects should be allocated/deallocated
when performing the above operations.
Thanks for help in advance
P.S. I also don't understand the claim that making an array public in a class makes a whole copy every time I access it... Could someone explain that to me?
I think the simplest way to do this is to pass the buffer to the C++ DLL along with a length argument denoting the number of items. This way the client is responsible for creating the buffer, not the DLL. The DLL is no longer responsible for creation or deletion of the buffer, just its manipulation.
This is the way that most, if not all Windows API functions accomplish this. Most of those API functions deal with character buffers, but the same principle applies.
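The caller-allocates pattern described above might look like this on the DLL side; the function name and the fill logic are purely illustrative:

```cpp
// The DLL-side function only fills a buffer the caller hands it, and
// reports how many items it actually wrote. It never allocates or
// frees the buffer itself.
extern "C" int FillValues(double* buffer, int capacity) {
    if (buffer == nullptr || capacity <= 0)
        return 0;                      // nothing to do, nothing written
    int written = 0;
    for (int i = 0; i < capacity && i < 10; ++i) {
        buffer[i] = i * 1.5;           // stand-in for the real computation
        ++written;
    }
    return written;                    // caller knows how many slots are valid
}
```

On the C# side this maps to a [DllImport] declaration taking a double[] and an int length; the CLR pins the array for the duration of the call, so no copying is needed for blittable element types.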
For structs, here is a link describing how to define your C# struct to be compatible with a C++ (or C) struct:
Convert C++ struct to C#
I'm trying to marshal an unsigned char** (which is in a C++ interface) in order to call the method from C#.
How can this be done? Is there a list mapping C++ data types to C# data types?
Thanks!
What is the semantics of this unsigned char**? If it is a byte array, use ref byte[].
If it is a zero terminated string, use ref string.
You can find many popular method signatures mapped to C# at http://www.pinvoke.net, which may give you the idea.
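For context, here is a sketch of what an unsigned char** parameter often means on the C++ side: an "out" pointer the function points at a zero-terminated buffer. The function and variable names are illustrative, not from the original interface:

```cpp
#include <cstring>

// A buffer owned by the native side; the caller only borrows it.
static unsigned char g_messageText[] = "hello";

// Typical "out" semantics for unsigned char**: the function writes a
// pointer to its internal zero-terminated buffer into *out.
extern "C" void GetMessageText(unsigned char** out) {
    *out = g_messageText;
}
```

If this is the semantics you are dealing with, the ref byte[] / ref string suggestions above apply; the ownership question (who frees the buffer, if anyone) is what determines the exact C# signature.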
I think you should use a serialization library that has an interface for both C++ and C#.
Both Protocol Buffers from Google or Thrift from Facebook support these two languages.
It would definitely make things much easier and safer for you.
If you decide to change the transferred data types (i.e. use integers, structures, etc. instead of raw strings), using a serialization library is the way to go.
That would be marshalable as ref string. Be sure to use the right character set with a [MarshalAs] attribute.
I'm moving some old C code that generates a binary file into our C# system. The problem is, the resulting binary file will still need to be read by another old C program.
The original code outputs several structs to a binary file, and many of those structs contain linked lists, with *next pointers.
How can I write these in C# so that the original program will still be able to read them?
The old C code reads and writes the file a whole struct at a time, with freads and fwrites i.e.
fread ( &file, sizeof ( struct file_items ), 1, hdata.fp );
I can't find a whole lot of info on how fwrite would output the pointers, etc.
If the old code was writing pointers to a file, then odds are you're dealing with very poorly written code. Those pointers would be meaningless to any other process reading that file...
Also, reading whole structures with a single fread() is a bad idea because different compilers may pad those structures differently (so the structure written by one application may be laid out differently than one read by another application).
If your code is depending on reading and writing pointer values to a file then it's broken. Every time you run the program it could potentially have a slightly different memory layout.
Instead of writing pointers you should probably convert the pointers into file offsets on write and convert the file offsets back to pointers on read.
(This is true for C, C++ and C#)
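The pointer-to-offset conversion described above can be sketched as follows; the Node layout and the record format are illustrative, assuming the nodes live in one contiguous array:

```cpp
#include <cstdint>
#include <vector>

// An in-memory linked-list node, as the old C code might define it.
struct Node {
    int32_t value;
    Node*   next;
};

// The on-disk form: the pointer is replaced by the index of the next
// node in the array, with -1 marking the end of the list.
struct NodeRecord {
    int32_t value;
    int32_t next_index;
};

// On write: convert each `next` pointer to an array index.
// On read, the inverse walk (index -> &nodes[index]) rebuilds pointers.
static std::vector<NodeRecord> toRecords(const std::vector<Node>& nodes) {
    std::vector<NodeRecord> out;
    for (const Node& n : nodes) {
        int32_t idx = -1;
        if (n.next != nullptr)
            idx = static_cast<int32_t>(n.next - &nodes[0]); // position in the array
        out.push_back({n.value, idx});
    }
    return out;
}
```

The records are position-independent, so any process (including a C# reader) can reconstruct the list regardless of where it lands in memory.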
The pointers will be meaningless after reading them back, in C or any other language. I assume the pointer structures are rebuilt after reading. This means you can just treat them as fillers while reading/writing.
In .NET, streams only accept byte and byte[] as data types, so you will have to convert your structs to/from that format.
One way is to write custom code reading/writing the fields in order. Gives you the most control but it is a lot of work.
The other approach is to map your struct to a byte[] wholesale, I'll look for an example.
The only way you can be (correctly) writing pointers to disk is if you are using something like based addressing:
"A linked list that consists of pointers based on a pointer can be saved to disk, then reloaded to another place in memory, with the pointers remaining valid."
Handling this in C# would be extremely difficult and require some kind of mapping layer during serialization.
A pointer refers to a memory location; when you store a pointer in a file, it is meaningless, because it refers to something ephemeral. So either it does not matter in this application because the referenced data is discarded, or you have not stated the format correctly. Normally in such a case you would apply 'serialization', so that the data pointed to is also stored in the file, in such a way that the original data and what it pointed to can be reconstructed ('deserialized') later.
There is no fundamental difference between file storage in C and C#; that much is independent of the language. However, there may be differences in structure packing, so just storing the raw structure was always a bad idea (packing can vary even between C compilers). Also, of course, you need to realise that the char type in C# is 16-bit, not 8. Let the existing storage format be the specification, and then implement it in C# using serialisation, to avoid problems with the differences in structure implementation.
Is there an easy way to serialize data in c++ (either to xml or binary), and then deserialize the data in C#?
I'm working with some remote WINNT machines that won't run .Net. My server app is written entirely in C#, so I want an easy way to share simple data (key value pairs mostly, and maybe some representation of a SQL result set). I figure the best way is going to be to write the data to xml in some predefined format on the client, transfer the xml file to my server, and have a C# wrapper read the xml into a usable c# object.
The client and server are communicating over a tcp connection, and what I really want is to serialize the data in memory on the client, transfer the binary data over the socket to a c# memory stream that I can deserialize into a c# object (eliminating file creation, transfer, etc), but I don't think anything like that exists. Feel free to enlighten me.
Edit
I know I can create a struct in the c++ app and define it in c# and transfer data that way, but in my head, that feels like I'm limiting what can be sent. I'd have to set predefined sizes for objects, etc
Protocol Buffers might be useful to you.
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.
.NET ports are available from Marc Gravell and Jon Skeet.
I checked out all the mentioned projects like Protocol Buffers, JSON, XML, etc., but after I found BSON I settled on it, for the following reasons:
Easy-to-use API
Available in many languages (C, C++, Haskell, Go, Erlang, Perl, PHP, Python, Ruby, C#, ...)
Binary, therefore very space-efficient and fast (fewer bytes -> less time)
Consistent across platforms (no problems with endianness, etc.)
Hierarchical. The data model is comparable to JSON (as the name suggests), so most data-modelling tasks should be solvable.
No precompiler necessary
Widely used (MongoDB, many languages)
C++ doesn't have structural introspection (you can't find out the fields of a class at runtime), so there aren't general mechanisms to write a C++ object. You either have to adopt a convention and use code generation, or (more typically) write the serialisation yourself.
There are some libraries for standard formats such as ASN.1, HDF5, and so on which are implementation-language neutral. There are proprietary libraries which serve the same purpose (e.g. Protocol Buffers).
If you're targeting a particular architecture and compiler, then you can also just dump the C++ object as raw bytes, and create a parser on the C# side.
Which is better depends on how tightly coupled you want your endpoints to be, and whether the data is mainly numerical (HDF5), tree and sequence structures (ASN.1), or simple plain data objects (directly writing the values in memory).
Other options would be:
creating a binary file that contains the data in the way you need it (not an easy & portable solution)
XML
YAML
plain text files
There are a lot of options you can choose from: named pipes, shared memory, DDE, remoting... Depends on your particular need.
Quick googling gave the following:
Named pipes
Named Shared Memory
DDE
As mentioned already, Protocol Buffers are a good option.
If that option doesn't suit your needs, then I would look at sending the XML over to the client (you would have to prefix the message with the length so you know how much to read) and then using an implementation of IXmlSerializer or use the DataContract/DataMember attributes in conjunction with the DataContractSerializer to get your representation in .NET.
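The length-prefix framing mentioned above can be sketched as follows; a little-endian 4-byte prefix is assumed, and the function name is illustrative:

```cpp
#include <cstdint>
#include <vector>

// Prefix a serialized payload with its length so the receiver knows
// exactly how many bytes to read off the socket before deserializing.
static std::vector<uint8_t> frame(const std::vector<uint8_t>& payload) {
    uint32_t len = static_cast<uint32_t>(payload.size());
    std::vector<uint8_t> out;
    // Emit the 4-byte length prefix explicitly, little-endian.
    out.push_back(static_cast<uint8_t>(len & 0xFF));
    out.push_back(static_cast<uint8_t>((len >> 8) & 0xFF));
    out.push_back(static_cast<uint8_t>((len >> 16) & 0xFF));
    out.push_back(static_cast<uint8_t>((len >> 24) & 0xFF));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

On the .NET side the receiver reads 4 bytes, decodes the length, then loops until exactly that many payload bytes have arrived (TCP does not preserve message boundaries), and only then hands the buffer to the deserializer.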
I would recommend against using the marshaling attributes, as they aren't supported on things like List<T> and a number of other standard .NET classes which you would use normally.