So here's the background:
We have a legacy program that writes data logs in C++. the data is contained in different structures. The program that reads the log files uses those same structures to display the data. I rewrote the program that reads the log files and C# and had to create C# copy of all those structures by hand.
Is there a better way to do this? I have considered setting up a lookup path to the structures and a sort of parser that would generate a C# structure at build time, but it seems excessively complicated to handle all the special cases. Are there any suggestions to do this? it seems kind of ridiculous that C# doesn't have any backwards compatibility to handle C/C++ structures.
How many structures are there and how complicated are they?
It's a costs vs benefits question I'd say. I'll bet that, judging from your question, just quickly coding the structs in C# is the best way to go.
Just my 2 cents, before taxes...
Summary of your problem:
You have a large number of structs in an existing C++ program that you serialize to disk. You want to port the structs to C# so you can deserialize the data from disk into your C# program. You don't just want to do this once. You want to keep the two sets of structs in sync as both programs evolve.
What you need is an Interface Definition Language (IDL) in which you can describe your data in a language independent way. Something like Apache Thrift, Google Protocol Buffers or MessagePack.
The steps you'd have to take would be:
Convert your existing C++ structs into IDL. This is a one-off process. Use a custom script or look for an existing one. Someone must have solved this by now.
Setup your build system to generate both the C# and the C++ definitions at build time.
Use the Thrift/ProtoBuf/MessagePack C# and C++ APIs to serialize and deserialize your data as needed.
The disadvantages are:
You need build-time code generation. But you already considered this yourself and this way the work has already been done for you.
You will have to change both your C++ and C# to correctly use whatever data structures are generated for you.
The binary on disk format will change so legacy logs won't be readable. But you can write some C++ to convert them to the new format quite easily.
I think this is outweighed by the advantages:
The IDL will contain the canonical description of your data. Any changes to it will be reflected in both your C++ and C#. You won't have to manually update your C# version when the C++ one changes.
The binary data format will be machine-independent. You will be able to read/write your data from many different languages on a variety of platforms.
The third-party libraries I mentioned are robust and widely used. Better than hacking something yourself.
Related
I am receiving data via UDP from a C/C++ application. This application is doing a memcpy of the class into a buffer and throwing it our way. Our application is written in C# and I need to somehow make sense of the data. We have access to the header files of the structures - everything is basically a struct or an enum. We can't change the format the data comes in and the header files are likely to change fairly often.
I have considered re-writing our comms classes in C++ to receive the data and then I have more control of its serialisation, but that will take a long time and my C++ is rusty, not to mention I don't have a lot of experience with C++ threading which would be a requirement.
I have also created a few prototype C++ libraries with the provided header files to be accessed via C#, but I can't quite get my head around how I actually create and use an actual instance of the class in C# itself (every time I look into this, all I see are extern function calls, not the use of external types).
I have also looked into Marshalling. However, as the data is liable to change quite often, I think this is a last resort and feels quite manual.
Does anyone know of any options or have any more targeted reading or advice on this matter?
Why not use Google Protocol Buffers on each end i.e. c++ and c#. You would take your c++ definition and let PB do all the serialisation for you.
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. more...
It works across different OSs even where primitive type conversation would normally be a problem.
Server side - C# or java
Client side Objective C
I need a way to serialize an object in C#\java and de-serialize it in Objective C.
I'm new to Objective C and I was wondering where I can get information about this issue.
Thanks.
Apart from the obvious JSON/XML solutions, protobuf may also be interesting. There are Java//c++/python backends for it and 3rd parties have created backends for C# and objective-c (never used that one though) as well.
The main advantages are it being much, much faster to parse[1], much smaller[2] since it's a binary format and the fact that versioning was an important factor from the beginning.
[1] google claims 20-100times compared to XML
[2] 3-10times according to the same source
Another technology similar to protobufs is Apache Thrift.
Apache Thrift is a software framework for scalable cross-language services development. Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages.
JSON for relatively straight forward object graphs
XML/REST for more complex object graphs (distinction between Arrays / Collections / nested arrays etc)
Sudzc. I am using it. It is pretty easy to invoke a Webservice from i-os app.
You dont have to write code to serialize object.
JSON is probably the best choice, because:
It is simple to use
It is human-readable
It is data-based rather than being tied to any more complex object model
You will be able to find decent libraries for import/export in most languages.
Serialisation of more complex objects is IMHO not a good idea from the perspective of portability since often one language/platform has no effective way of expressing a concept from another language / platform. e.g. as soon as you start declaring "types" or "classes" of serialised objects you run into the thorny issue of differing object models between languages.
On iOS there are couple of JSON frameworks and libraries with an Objective-C API:
JSONKit
SBJson
TouchJson
are probably the most prominent.
JSONKit is fast and simple, but can only parse a contiguous portion of JSON text. This means, you need to save downloaded data into a temporary file, or you need to save all downloaded JSON text into a NSMutableData object (kept in memory). Only after the JSON text has been downloaded completely you can start parsing.
SBJson is more flexible to use. It provides an additional "SAX style" interface, can parse partial input and can parse more than one JSON document per "input" (for example several JSON documents per network connection). This is very handy when you want to connect to a "streaming API" (e.g. Twitter Streaming API), where many JSON documents can arrive per connection. The drawback is, it is a much slower than JSONKit.
TouchJson is even somewhat slower than SBJson.
My personal preference is some other, though. It is faster than JSONKit (20% faster on arm), has an additional SAX style API, can handle "streaming APIs", can simultaneously download and parse, can handle very large JSON strings without severely impacting memory foot-print, while it is especially easy to use with NSURLConnection. (Well, I'm probably biased since I'm the author).
You can take a look at JPJson (Apache License v2):
JPJson - it's still in beta, though.
I am about to embark on a project to connect two programs, one in c#, and one in c++. I already have a working c# program, which is able to talk to other versions of itself. Before I start with the c++ version, I've thought of some issues:
1) I'm using protobuf-net v1. I take it the .proto files from the serializer are exactly what are required as templates for the c++ version? A google search mentioned something about pascal casing, but I have no idea if that's important.
2) What do I do if one of the .NET types does not have a direct counterpart in c++? What if I have a decimal or a Dictionary? Do I have to modify the .proto files somehow and squish the data into a different shape? (I shall examine the files and see if I can figure it out)
3) Are there any other gotchas that people can think of? Binary formats and things like that?
EDIT
I've had a look at one of the proto files now. It seems .NET specific stuff is tagged eg bcl.DateTime or bcl.Decimal. Subtypes are included in the proto definitions. I'm not sure what to do about bcl types, though. If my c++ prog sees a decimal, what will it do?
Yes, the proto files should be compatible. The casing is about conventions, which shouldn't affect actual functionality - just the generated code etc.
It's not whether or not there's a directly comparable type in .NET which is important - it's whether protocol buffers support the type which is important. Protocol buffers are mostly pretty primitive - if you want to build up anything bigger, you'll need to create your own messages.
The point of protocol buffers is to make it all binary compatible on the wire, so there really shouldn't be gotchas... read the documentation to find out about versioning policies etc. The only thing I can think of is that in the Java version at least, it's a good idea to make enum fields optional, and give the enum type itself a zero value of "unknown" which will be used if you try to deserialize a new value which isn't supported in deserializing code yet.
Some minor additions to Jon's points:
protobuf-net v1 does have a Getaproto which may help with a starting point, however, for interop purposes I would recommend starting from a .proto; protobuf-net can work this was around too, either via "protogen", or via the VS addin
other than that, you shouldn't have my issues as long as you remember to treat all files as binary; opening files in text mode will cause grief
I have a .NET application which serializes an object in binary format.
this object is a struct consisting of a few fields.
I must deserialize and use this object in a C++ application.
I have no idea if there are any serialization libraries for C++, a google search hasn't turned up much.
What is the quickest way to accomplish this?
Thanks in advance.
Roey.
Update :
I have serialized using Protobuf-net , in my .NET application, with relative ease.
I also get the .proto file that protobuf-net generated, using GetProto() command.
In the .proto file, my GUID fields get a type of "bcl.guid", but C++ protoc.exe compiler does not know how to interpret them!
What do I do with this?
If you are using BinaryFormatter, then it will be virtually impossible. Don't go there...
Protocol buffers is designed to be portable, cross platform and version-tolerant (so it won't explode when you add new fields etc). Google provide the C++ version, and there are several C# versions freely available (including my own) - see here for the full list.
Small, fast, easy.
Note that the v1 of protobuf-net won't handle structs directly (you'll need a DTO class), but v2 (very soon) does have tested struct support.
Can you edit the .NET app? If so why not use XML Serialization to output the data in a easy to import format?
Both boost and Google have libraries for serialization. However, if your struct is pretty trivial, you might consider managing the serialization yourself by writing bytes out from C# and then reading the data in C++ with fread.
Agree with others. You are making your app very vulnerable by doing this. Consider the situation if one of the classes you're serializing is changed in any way or built on a later version of the C# compiler: Your serialized classes could potentially change causing them to be unreadable.
An XML based solution might work well. Have you considered SOAP? A little out of fashion now but worth a look. The main issue is to decouple the implementation from the data. You can do this in binary if speed / efficiency is an issue, although in my experience, it rarely is.
Serializing in a binary format and expecting an application in another language to read the binary is a very brittle solution (ie it will tend to break on the smallest change to anything).
It would be more stable to serialize the data in a common standard format.
Do you have the option of changing the format? If so, consider choosing a non-binary format for greater interoperability. There are plenty of libraries for reading and writing XML. Json is popular as well.
Binary formats are efficient, but vulnerable to implementation details (does your C++ compiler pack data structures? how are ints and floats represented? what byte ordering is used?), and difficult to adjust if mangled. Text based formats are verbose, but tend to be much more robust. If you are uncertain about binary representations, text representations tend to be easier to understand (apart from challenges such as code pages and wide/narrow characters...).
For C++ XML libraries, the most capable (and perhaps also most complex) would still seem to be the Xerces library. But you should decide for yourself which library best fits your needs and skills.
Use XML Serialization its the best way to go, in fact is the cleanest way to go.
XmlSerializer s = new XmlSerializer( typeof( YourClassType ) );
TextWriter w = new StreamWriter( #"c:\list.xml" );
s.Serialize( w, yourClassListCollection );
w.Close();
Is there an easy way to serialize data in c++ (either to xml or binary), and then deserialize the data in C#?
I'm working with some remote WINNT machines that won't run .Net. My server app is written entirely in C#, so I want an easy way to share simple data (key value pairs mostly, and maybe some representation of a SQL result set). I figure the best way is going to be to write the data to xml in some predefined format on the client, transfer the xml file to my server, and have a C# wrapper read the xml into a usable c# object.
The client and server are communicating over a tcp connection, and what I really want is to serialize the data in memory on the client, transfer the binary data over the socket to a c# memory stream that I can deserialize into a c# object (eliminating file creation, transfer, etc), but I don't think anything like that exists. Feel free to enlighten me.
Edit
I know I can create a struct in the c++ app and define it in c# and transfer data that way, but in my head, that feels like I'm limiting what can be sent. I'd have to set predefined sizes for objects, etc
Protocol Buffers might be useful to you.
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.
.NET ports are available from Marc Gravell and Jon Skeet.
I checked out all mentioned projects like prottocol buffers, json, xml, etc. but after I have found BSON I use this because of the following reasons:
Easy to use API
Available in many languages (C, C++, Haskell, Go, Erlang, Perl, PHP, Python, Ruby, C#, ...)
Binary therefore very space efficient and fast (less bytes->less time)
constistent over platforms (no problems with endianess, etc)
hierarchical. The data model is comparable to json (what the name suggests) so most data modelling tasks should be solvable.
No precompiler necessary
wideley used (Mongodb, many languages)
C++ doesn't have structural introspection (you can't find out the fields of a class at runtime), so there aren't general mechanisms to write a C++ object. You either have to adopt a convention and use code generation, or (more typically) write the serialisation yourself.
There are some libraries for standard formats such as ASN.1, HDF5, and so on which are implementation language neutral. There are proprietary libraries which serve the same purpose (eg protocol buffers).
If you're targeting a particular architecture and compiler, then you can also just dump the C++ object as raw bytes, and create a parser on the C# side.
Quite what is better depends how tightly coupled you want your endpoints to be, and whether the data is mainly numerical (HDF5), tree and sequence structures (ASN.1), or simple plain data objects (directly writing the values in memory)
Other options would be:
creating a binary file that contains the data in the way you need it
( not a easy & portable solution )
XML
YAML
plain text files
There are a lot of options you can choose from. Named pipes, shared
memory, DDE, remoting... Depends on your particular need.
Quick googling gave the following:
Named pipes
Named Shared Memory
DDE
As mentioned already, Protocol Buffers are a good option.
If that option doesn't suit your needs, then I would look at sending the XML over to the client (you would have to prefix the message with the length so you know how much to read) and then using an implementation of IXmlSerializer or use the DataContract/DataMember attributes in conjunction with the DataContractSerializer to get your representation in .NET.
I would recommend against using the marshaling attributes, as they aren't supported on things like List<T> and a number of other standard .NET classes which you would use normally.