Is there an easy way to serialize data in C++ (either to XML or binary), and then deserialize the data in C#?
I'm working with some remote WINNT machines that won't run .NET. My server app is written entirely in C#, so I want an easy way to share simple data (key-value pairs mostly, and maybe some representation of a SQL result set). I figure the best way is going to be to write the data to XML in some predefined format on the client, transfer the XML file to my server, and have a C# wrapper read the XML into a usable C# object.
The client and server are communicating over a TCP connection, and what I really want is to serialize the data in memory on the client, then transfer the binary data over the socket to a C# memory stream that I can deserialize into a C# object (eliminating file creation, transfer, etc.), but I don't think anything like that exists. Feel free to enlighten me.
Edit
I know I can create a struct in the C++ app, define it in C#, and transfer data that way, but in my head that feels like I'm limiting what can be sent. I'd have to set predefined sizes for objects, etc.
Protocol Buffers might be useful to you.
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.
.NET ports are available from Marc Gravell and Jon Skeet.
I looked at all the projects mentioned (Protocol Buffers, JSON, XML, etc.), but after finding BSON I settled on it for the following reasons:
Easy to use API
Available in many languages (C, C++, Haskell, Go, Erlang, Perl, PHP, Python, Ruby, C#, ...)
Binary, and therefore very space-efficient and fast (fewer bytes -> less time)
Consistent across platforms (no problems with endianness, etc.)
Hierarchical. The data model is comparable to JSON (as the name suggests), so most data modelling tasks should be solvable.
No precompiler necessary
Widely used (MongoDB, many languages)
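To give a feel for how simple the BSON layout is, here is a minimal hand-rolled sketch in Python. It is not the API of any BSON library: `encode_bson` is a made-up helper that only handles flat documents with string and 32-bit integer values. In practice you would use one of the existing libraries.

```python
import struct

def encode_bson(doc):
    """Encode a flat dict of str -> str/int as BSON (strings and int32 only)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"           # element name is a C string
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"     # string payload ends with NUL
            # type 0x02, name, int32 length (including the NUL), payload
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # type 0x10, name, little-endian int32
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError("this sketch only supports str and int values")
    # The total length counts its own 4 bytes plus the trailing NUL terminator.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

encoded = encode_bson({"hello": "world"})
print(encoded.hex())
```

All integers are little-endian on the wire regardless of the host CPU, which is why endianness is not a concern for the reader.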
C++ doesn't have structural introspection (you can't find out the fields of a class at runtime), so there aren't general mechanisms to write a C++ object. You either have to adopt a convention and use code generation, or (more typically) write the serialisation yourself.
There are some libraries for standard formats such as ASN.1, HDF5, and so on which are implementation-language neutral. There are also libraries for vendor-defined formats which serve the same purpose (e.g. Protocol Buffers).
If you're targeting a particular architecture and compiler, then you can also just dump the C++ object as raw bytes, and create a parser on the C# side.
Which is best depends on how tightly coupled you want your endpoints to be, and whether the data is mainly numerical (HDF5), tree and sequence structures (ASN.1), or simple plain data objects (directly writing the values in memory).
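As a rough illustration of the "dump raw bytes and parse them on the other side" approach, here is a sketch of the reading side, written in Python for brevity. The `Sample` struct and its layout are invented for the example: it assumes a packed, little-endian C++ struct of `uint32_t id; double value; char name[16];`. A real reader has to match the padding and byte order your compiler actually produces.

```python
import struct
from typing import NamedTuple

# "<" = little-endian with no padding; I = uint32, d = double, 16s = char[16]
SAMPLE_LAYOUT = struct.Struct("<Id16s")

class Sample(NamedTuple):
    id: int
    value: float
    name: str

def parse_sample(raw: bytes) -> Sample:
    """Turn the raw bytes of one (hypothetical) Sample struct into a Python object."""
    ident, value, name = SAMPLE_LAYOUT.unpack(raw)
    # The char array is NUL-padded; keep only the text before the first NUL.
    return Sample(ident, value, name.split(b"\x00", 1)[0].decode("ascii"))

# Stand-in for the bytes the C++ side would have written.
raw = SAMPLE_LAYOUT.pack(7, 2.5, b"pump")
print(parse_sample(raw))
```

The fixed 16-byte name field is exactly the "predefined sizes" limitation mentioned in the question.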
Other options would be:
creating a binary file that contains the data in the way you need it
(not an easy or portable solution)
XML
YAML
plain text files
There are a lot of options you can choose from: named pipes, shared memory, DDE, remoting... It depends on your particular need.
Quick googling gave the following:
Named pipes
Named Shared Memory
DDE
As mentioned already, Protocol Buffers are a good option.
If that option doesn't suit your needs, then I would look at sending the XML over to the client (you would have to prefix the message with the length so you know how much to read) and then either implementing IXmlSerializable or using the DataContract/DataMember attributes in conjunction with the DataContractSerializer to get your representation in .NET.
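The length prefix mentioned above can be sketched like this (Python used for brevity; the same idea applies in C++ and C#). The 4-byte big-endian prefix and the helper names `frame`, `read_exact` and `read_message` are my own choices for illustration, not part of any standard.

```python
import io
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian unsigned int."""
    return struct.pack("!I", len(payload)) + payload

def read_exact(stream, count: int) -> bytes:
    """Read exactly `count` bytes; a single read may return fewer than requested."""
    chunks = []
    while count > 0:
        chunk = stream.read(count)
        if not chunk:
            raise EOFError("stream closed in the middle of a message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)

def read_message(stream) -> bytes:
    """Read one length-prefixed message from a file-like binary stream."""
    (length,) = struct.unpack("!I", read_exact(stream, 4))
    return read_exact(stream, length)

# BytesIO stands in for the socket; two messages arrive back to back.
stream = io.BytesIO(frame(b"<data><item key='a'>1</item></data>") + frame(b"<data/>"))
first = read_message(stream)
second = read_message(stream)
print(first, second)
```

With a real TCP connection in Python you could wrap the socket with `sock.makefile("rb")` to get a stream with the same `read` method.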
I would recommend against using the marshaling attributes, as they aren't supported on things like List<T> and a number of other standard .NET classes which you would use normally.
Related
I am receiving data via UDP from a C/C++ application. This application is doing a memcpy of the class into a buffer and throwing it our way. Our application is written in C# and I need to somehow make sense of the data. We have access to the header files of the structures - everything is basically a struct or an enum. We can't change the format the data comes in and the header files are likely to change fairly often.
I have considered re-writing our comms classes in C++ to receive the data and then I have more control of its serialisation, but that will take a long time and my C++ is rusty, not to mention I don't have a lot of experience with C++ threading which would be a requirement.
I have also created a few prototype C++ libraries with the provided header files to be accessed via C#, but I can't quite get my head around how I actually create and use an actual instance of the class in C# itself (every time I look into this, all I see are extern function calls, not the use of external types).
I have also looked into Marshalling. However, as the data is liable to change quite often, I think this is a last resort and feels quite manual.
Does anyone know of any options or have any more targeted reading or advice on this matter?
Why not use Google Protocol Buffers on each end, i.e. C++ and C#? You would take your C++ definition and let PB do all the serialisation for you.
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.
It works across different OSs, even where primitive type conversion would normally be a problem.
So here's the background:
We have a legacy program that writes data logs in C++. The data is contained in different structures. The program that reads the log files uses those same structures to display the data. I rewrote the program that reads the log files in C# and had to create C# copies of all those structures by hand.
Is there a better way to do this? I have considered setting up a lookup path to the structures and a sort of parser that would generate a C# structure at build time, but it seems excessively complicated to handle all the special cases. Are there any suggestions for doing this? It seems kind of ridiculous that C# doesn't have any backwards compatibility to handle C/C++ structures.
How many structures are there and how complicated are they?
It's a costs vs benefits question I'd say. I'll bet that, judging from your question, just quickly coding the structs in C# is the best way to go.
Just my 2 cents, before taxes...
Summary of your problem:
You have a large number of structs in an existing C++ program that you serialize to disk. You want to port the structs to C# so you can deserialize the data from disk into your C# program. You don't just want to do this once. You want to keep the two sets of structs in sync as both programs evolve.
What you need is an Interface Definition Language (IDL) in which you can describe your data in a language independent way. Something like Apache Thrift, Google Protocol Buffers or MessagePack.
The steps you'd have to take would be:
Convert your existing C++ structs into IDL. This is a one-off process. Use a custom script or look for an existing one. Someone must have solved this by now.
Set up your build system to generate both the C# and the C++ definitions at build time.
Use the Thrift/ProtoBuf/MessagePack C# and C++ APIs to serialize and deserialize your data as needed.
The disadvantages are:
You need build-time code generation. But you already considered this yourself and this way the work has already been done for you.
You will have to change both your C++ and C# to correctly use whatever data structures are generated for you.
The on-disk binary format will change, so legacy logs won't be readable. But you can write some C++ to convert them to the new format quite easily.
I think this is outweighed by the advantages:
The IDL will contain the canonical description of your data. Any changes to it will be reflected in both your C++ and C#. You won't have to manually update your C# version when the C++ one changes.
The binary data format will be machine-independent. You will be able to read/write your data from many different languages on a variety of platforms.
The third-party libraries I mentioned are robust and widely used. Better than hacking something yourself.
I have a system where a serialized file is created with a C# program and then deserialized in another C# program. I'm wondering if it's possible to do binary deserialization of a C# file in Java?
Thanks
You can try using a serializer that has implementations for both platforms and outputs data in a platform-independent format, like Protobuf.
Or if you need a full RPC over network between Java and C# application, you can go for Apache Thrift.
I assume you are speaking of an object serialized with BinaryFormatter. The answer then is a qualified "yes," since Java implements a Turing machine. However, this will not be straightforward.
In this case the data will be in a format most suitable for consumption by a .NET runtime, and will contain information about .NET types and assemblies. You would have to implement your own reader for this format, and then have some way to map between .NET and Java types. (The Mono project implements a BinaryFormatter compatible with .NET's, so you could use their reader implementation as a reference.)
As an alternative, consider using another format for data serialization, such as JSON. This will give you instant portability to a wide array of languages, as well as the possibility for easy human inspection of the data.
Deserializing an object in Java which was serialized with C#'s built-in binary serialization would require you to implement C#'s deserialization logic in Java. That's a pretty involved process, so let's compare some options:
Use a third party library for serialization which works for C# and Java.
Write a routine to serialize each object. One in C#, one in Java. This will be tedious, and hard to maintain.
Implement C#'s serialization logic in Java, or vice versa. This will be difficult, time consuming, and you likely won't get it right the first time.
I recommend option 1: use a third-party library. Here are two third-party libraries I've used and highly recommend.
Google ProtoBufs
Apache Thrift
You can use any cross-platform binary format. Your options include, among others:
Protobuf
BSON (Binary JSON)
GZIP
JSON and XML (herrrrp) are also options, albeit text-based ones.
One other option would be to base64-encode the data, and decode it on the other side; albeit you may get a huge payload because it's binary (probably not a good idea).
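To put a number on the base64 overhead: every 3 bytes of input become 4 characters of output, so the payload grows by about a third. A quick sketch (the sample bytes are arbitrary):

```python
import base64

raw = bytes(range(256)) * 3             # 768 bytes of arbitrary binary data
encoded = base64.b64encode(raw)         # ASCII-safe text, 4 characters per 3 input bytes
decoded = base64.b64decode(encoded)     # what the receiving side would do

print(len(raw), len(encoded))
```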
Server side - C# or java
Client side Objective C
I need a way to serialize an object in C#\java and de-serialize it in Objective C.
I'm new to Objective C and I was wondering where I can get information about this issue.
Thanks.
Apart from the obvious JSON/XML solutions, protobuf may also be interesting. There are Java/C++/Python backends for it, and third parties have created backends for C# and Objective-C (never used that one though) as well.
The main advantages are it being much, much faster to parse[1], much smaller[2] since it's a binary format and the fact that versioning was an important factor from the beginning.
[1] Google claims 20-100 times faster than XML
[2] 3-10 times smaller, according to the same source
Another technology similar to protobufs is Apache Thrift.
Apache Thrift is a software framework for scalable cross-language services development. Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages.
JSON for relatively straight forward object graphs
XML/REST for more complex object graphs (distinction between Arrays / Collections / nested arrays etc)
Sudzc. I am using it. It is pretty easy to invoke a web service from an iOS app.
You don't have to write code to serialize objects.
JSON is probably the best choice, because:
It is simple to use
It is human-readable
It is data-based rather than being tied to any more complex object model
You will be able to find decent libraries for import/export in most languages.
Serialisation of more complex objects is IMHO not a good idea from the perspective of portability since often one language/platform has no effective way of expressing a concept from another language / platform. e.g. as soon as you start declaring "types" or "classes" of serialised objects you run into the thorny issue of differing object models between languages.
On iOS there are a couple of JSON frameworks and libraries with an Objective-C API:
JSONKit
SBJson
TouchJson
are probably the most prominent.
JSONKit is fast and simple, but can only parse a contiguous portion of JSON text. This means you need to save downloaded data into a temporary file, or you need to save all downloaded JSON text into an NSMutableData object (kept in memory). Only after the JSON text has been downloaded completely can you start parsing.
SBJson is more flexible to use. It provides an additional "SAX style" interface, can parse partial input, and can parse more than one JSON document per "input" (for example several JSON documents per network connection). This is very handy when you want to connect to a "streaming API" (e.g. Twitter Streaming API), where many JSON documents can arrive per connection. The drawback is that it is much slower than JSONKit.
TouchJson is even somewhat slower than SBJson.
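To illustrate what "more than one JSON document per input" means, here is a small sketch in Python rather than Objective-C, using only the standard `json` module. The helper name `iter_json_documents` is made up. Note that it works on text that has already been received; it does not show truly incremental parsing of partial input.

```python
import json

def iter_json_documents(text: str):
    """Yield each JSON document found in a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # raw_decode returns the parsed value and the index where it stopped.
        document, pos = decoder.raw_decode(text, pos)
        yield document

received = '{"a": 1} {"b": [2, 3]}\n{"c": null}'
documents = list(iter_json_documents(received))
print(documents)
```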
My personal preference is a different library, though. It is faster than JSONKit (20% faster on ARM), has an additional SAX style API, can handle "streaming APIs", can simultaneously download and parse, and can handle very large JSON strings without severely impacting memory footprint, while being especially easy to use with NSURLConnection. (Well, I'm probably biased since I'm the author).
You can take a look at JPJson (Apache License v2):
JPJson - it's still in beta, though.
I am working on a system that has components written in the following languages:
C
C++
C#
PHP
Python
These components all use (infrequently changing) data that comes from the same source and can be cached and accessed from memcache for performance reasons.
Because different data types may be stored differently by different language APIs to memcache, I am wondering if it would be better to store ALL data as strings (objects would be stored as JSON strings).
However, this in itself may pose problems, as strings will almost surely have different internal representations across the different languages, so I'm wondering how wise that decision is.
As an aside, I am using the 1 writer, multiple readers 'pattern' so concurrency is not an issue.
Can anyone (preferably with ACTUAL experience of doing something similar) advise on the best format/way to store data in memcache so that it may be consumed by different programming languages?
I think memcached primarily only understands byte[], and the representation of a byte is the same in all languages. You can serialize your objects using Protocol Buffers or a similar library and consume them in any other language. I've done this in my projects.
Regardless of the back-end chosen, (memcached, mongodb, redis, mysql, carrier pigeon) the most speed-efficient way to store data in it would be a simple block of data (so the back-end has no knowledge of it.) Whether that's string, byte[], BLOB, is really all the same.
Each language will need an agreed mechanism to convert objects to a storable data format and back. You:
Shouldn't build your own mechanism, that's just reinventing the wheel.
Should think about whether 'invalid' objects might end up in the back-end. (either because of a bug in a writer, or because objects from a previous revision are still present)
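One way to guard against 'invalid' objects is to have every reader check what it decodes before using it. A minimal sketch, assuming the values are stored as UTF-8 JSON and using a made-up schema (`REQUIRED_FIELDS`) and helper name (`decode_record`):

```python
import json

# Hypothetical schema for the example: field name -> expected Python type.
REQUIRED_FIELDS = {"id": int, "name": str}

def decode_record(raw: bytes):
    """Decode a cached value; return None instead of a malformed or outdated record."""
    try:
        record = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None                      # not valid UTF-8 or not valid JSON
    if not isinstance(record, dict):
        return None                      # valid JSON, but not an object
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected_type):
            return None                  # missing field or wrong type
    return record

print(decode_record(b'{"id": 1, "name": "widget"}'))
print(decode_record(b'{"id": "1"}'))
```

A reader that gets None back can treat it as a cache miss and fall back to the original data source.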
When it comes to choosing a format, I'd recommend two: JSON or Protocol Buffers. This is because their encoded size and encode/decode speed are among the smallest/fastest of all the available encodings.
Comparison
JSON:
Libraries available for dozens of languages, sometimes part of the standard library.
Very simple format - Human-readable when stored, human-writable!
No coordination required between different systems, just agreement on object structure.
No set-up needed in many languages, eg PHP: $data = json_encode($object); $object = json_decode($data);
No inherent schema, so readers need to validate decoded messages manually.
Takes more space than Protocol Buffers.
Protocol Buffers:
Generating tools provided for several languages.
Minimal size - difficult to beat.
Defined schema (externally) through .proto files.
Auto-generated interface objects for encoding/decoding, eg C++: person.SerializeToOstream(&output);
Support for differing versions of object schemas to add new optional members, so that existing objects aren't necessarily invalidated.
Not human-readable or writable, so possibly harder to debug.
Defined schema introduces some configuration management overhead.
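To show where the size difference between the two comes from, here is a hand-rolled sketch of how Protocol Buffers lays out two fields on the wire. This is only for illustration and is not how you would normally use it (the generated classes do this for you). The helper names and the message with `id` as field 1 and `name` as field 2 are made up for the example.

```python
import json

def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)      # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    # Key = (field number << 3) | wire type; wire type 0 is varint.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    # Wire type 2 is length-delimited: key, length, then the raw bytes.
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data

message = encode_int_field(1, 150) + encode_string_field(2, "testing")
as_json = json.dumps({"id": 150, "name": "testing"}, separators=(",", ":"))

print(message.hex(), len(message))
print(as_json, len(as_json))
```

The field names never appear in the binary message, only the field numbers, which is why the schema has to be agreed on externally.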
Unicode
When it comes to Unicode support, both handle it without issues:
JSON: Will typically escape non-ascii characters inside the string as \uXXXX, so no compatibility problem there. Depending on the library, it may be also possible to force UTF-8 encoding.
Protocol Buffers: Seem to use UTF-8, though I haven't found info in Google's documentation in 3-foot-high letters to that effect.
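The two JSON behaviours mentioned above look like this with Python's standard `json` module (other libraries have their own switches for this; `ensure_ascii` is specific to Python):

```python
import json

text = "naïve café"

escaped = json.dumps(text)                        # default: non-ASCII becomes \uXXXX
utf8_bytes = json.dumps(text, ensure_ascii=False).encode("utf-8")   # raw UTF-8 instead

print(escaped)
print(utf8_bytes)

# Either form decodes back to the same string.
round_trip_escaped = json.loads(escaped)
round_trip_utf8 = json.loads(utf8_bytes.decode("utf-8"))
```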
Summary
Which one you go with will depend on how exactly your system will behave, how often changes to the data structure occur, and how all the above points will affect you.
Not going to lie, you could do it in Redis. Redis is a key-value database written to be high performance, and it allows data to be transferred between languages through a number of different client libraries. Here is an example in Python and Java.
Edit 1: Code is untested. If you spot an error please let me know :)
Edit 2: I know I didn't use the preferred Redis client for Java, but the point still stands.
Python
import redis
r = redis.Redis()
r.set('test','123')
Java
import org.jredis.RedisException;
import org.jredis.ri.alphazero.JRedisClient;
import static org.jredis.ri.alphazero.support.DefaultCodec.*;

class ExampleCode {
    // Static so that the static main() method can use it.
    private static final JRedisClient client = new JRedisClient();

    public static void main(String[] args) throws RedisException {
        // Read back the value that the Python snippet stored under "test".
        System.out.println(toStr(client.get("test")));
    }
}