I have a library (written in C#) for which I need to read/write representations of my objects to disk (or to any Stream) in a particular binary format, to ensure compatibility with C/Java implementations of the library. The format requires a fair amount of bit-packing and some DEFLATE'd bytestreams. I would nevertheless like my library to be as idiomatic .NET as possible, and so would like to provide an API as close as possible to the normal binary serialization process. I'm aware of the ability to implement the IFormatter interface, but given that I really can't reuse any part of the built-in serialization stack, is it worth doing this, or will it just bring unnecessary overhead? In other words:
Implement IFormatter and co.
OR
Just provide "Serialize"/"Deserialize" methods that act on a Stream?
A good point was brought up below about needing serialization semantics for any case involving remoting. In cases where using MarshalByRef objects is feasible, I'm fairly sure this won't be an issue, so leaving that aside: are there any benefits or drawbacks to using ISerializable/IFormatter versus a custom stack (or am I misunderstanding remoting)?
I have always gone with the latter. There isn't much use in reusing the serialization framework if all you're doing is writing a file in a specific format. The only place I've run into issues with a custom serialization stack is remoting, where you have to make your objects serializable.
This may not help you since you have to write to a specific format, but protobuf and sqlite are good tools for doing custom serialization.
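To make the "latter" option concrete, here is a minimal sketch of what Serialize/Deserialize methods acting on streams can look like. It's written in Java, since the thread spans both languages; the Person type and its length-prefixed UTF-8 layout are invented for illustration, but the nowrap DEFLATE mode shown is the raw RFC 1951 framing that .NET's DeflateStream reads and writes:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Hypothetical Person type with a serialize/deserialize pair acting on streams.
// nowrap=true selects raw RFC 1951 DEFLATE, the same framing .NET's
// DeflateStream uses; the length-prefixed UTF-8 record layout is invented.
class Person {
    final String name;

    Person(String name) { this.name = name; }

    void serialize(OutputStream out) {
        try {
            Deflater raw = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
            DeflaterOutputStream deflated = new DeflaterOutputStream(out, raw);
            DataOutputStream data = new DataOutputStream(deflated);
            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            data.writeInt(utf8.length);   // big-endian length prefix
            data.write(utf8);
            data.flush();
            deflated.finish();            // finish the DEFLATE block, leave `out` open
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Person deserialize(InputStream in) {
        try {
            DataInputStream data =
                new DataInputStream(new InflaterInputStream(in, new Inflater(true)));
            byte[] utf8 = new byte[data.readInt()];
            data.readFully(utf8);
            return new Person(new String(utf8, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A C# counterpart would wrap the same Stream in a DeflateStream; agreeing on the raw-DEFLATE framing and the field layout is what makes the bytes portable.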
I'd do the former. There's not much to the interface, so if you're mimicking the structure anyway, adding ": IFormatter" and the other code necessary for full compatibility won't take much.
Writing your own serialization code is error prone and time consuming.
As a thought - have you considered existing open-source portable formats, for example "protocol buffers"? This is a high density binary serialization format that underpins much of Google's data transfer etc. Versions are available in a wide range of languages - including Java/C++ etc (in the core Google distribution), and a vast range of others.
In particular, for .NET-idiomatic usage, protobuf-net looks a lot like XmlSerializer or DataContractSerializer (indeed, it can even work purely with xml/wcf attributes if they include an explicit order on each element) - or you can use the specific protobuf-net attributes:
[ProtoContract]
class Person {
    [ProtoMember(1)]
    public string Name { get; set; }
}
If you want to guarantee portability to other implementations, the recommendation is to start "contract first", with a ".proto" file - in this case, something like:
message person {
    required string name = 1;
}
This .proto file can then be used to generate any language-specific variant; so with protobuf-net you'd run it through "protogen" (included in protobuf-net; and a VS2008 add-on is in progress); or for Java/C++ etc you'd run it through "protoc" (included in Google's protobuf). "protogen" in protobuf-net can currently emit C# and VB, but it is pretty easy to add another language if you want to use F# etc - it just involves writing (or migrating) an xslt.
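For a feel of what any of these implementations actually emits, here is a hand-rolled Java sketch of the standard protobuf wire encoding for the person message above. This is illustrative only (not protogen or protoc output), and it only handles names shorter than 128 bytes:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled encoding of the `person` message above, to show what is on the
// wire. Field 1 (`name`) is length-delimited, so its tag byte is
// (field_number << 3) | wire_type = (1 << 3) | 2 = 0x0A, followed by a varint
// length and the UTF-8 bytes. (Works as-is only for names under 128 bytes.)
class ProtoSketch {
    static byte[] encodeName(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x0A);                 // tag: field 1, wire type 2
        out.write(utf8.length);          // varint length (single byte while < 128)
        out.write(utf8, 0, utf8.length); // payload
        return out.toByteArray();
    }
}
```

So encoding a person whose name is "Ann" yields five bytes: the tag 0x0A, the length 3, then the UTF-8 text; any conforming implementation, in any language, produces and accepts exactly these bytes.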
There is also another .NET version that is a more direct port of the Java version; as such it is less .NET idiomatic. This is dotnet-protobufs.
Related
In my C# application I saved class objects by serialization. Now I have a binary file on my disk. Is this file portable, in the sense that installations of the same application on other systems will recognize it? Are there disadvantages to consider?
If you have the exact same version of the app, then pretty much any serializer will be happy. When you start iterating versions, then assuming you are using BinaryFormatter, you can start hitting version compatibility issues, particularly if you have refactored (moved / renamed types, moved / renamed fields, etc). Which is why I strongly recommend not using BinaryFormatter for persistence. If you just want the data stored, there are a wide range of other serializers that will work great; XmlSerializer, Json.NET, protobuf-net, DataContractSerializer, etc - which all handle versioning more gracefully than BinaryFormatter. If you specifically want binary for performance reasons (large files, etc), then protobuf-net may be worth a look (but I'm biased). Otherwise, you could simply use compression - xml and json compress pretty well.
Additionally, note that BinaryFormatter will not work at all across other platforms, or generally even across other .NET frameworks (including the mobile frameworks, etc).
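As a sketch of why hand-rolled formats can also be made version-tolerant: prefix the payload with a format-version marker and branch on it when reading. This Java example is invented purely for illustration (the name/age layout is not from any real app):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// A leading format-version byte lets newer readers keep accepting files
// written by older builds. The record layout (name + age) is invented.
class VersionedRecord {
    static final int VERSION = 2;   // hypothetical: version 1 files lack the age field

    static byte[] write(String name, int age) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(VERSION);
            out.writeUTF(name);
            out.writeInt(age);          // field added in version 2
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int version = in.readByte();
            String name = in.readUTF();
            int age = version >= 2 ? in.readInt() : -1;  // default for old files
            return name + "/" + age;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Serializers like protobuf-net bake this idea in (unknown fields are skipped, missing fields get defaults), which is exactly what BinaryFormatter's type-coupled format lacks.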
I have a system where a serialized file is created with a C# program and then deserialized in another C# program. I'm wondering if it's possible to do binary deserialization of a C# file in Java?
Thanks
You can try using a serializer that has implementations for both platforms and outputs data in a platform-independent format, like Protobuf.
Or if you need a full RPC over network between Java and C# application, you can go for Apache Thrift.
I assume you are speaking of an object serialized with BinaryFormatter. The answer then is a qualified "yes," since Java implements a Turing machine. However, this will not be straightforward.
In this case the data will be in a format most suitable for consumption by a .NET runtime, and will contain information about .NET types and assemblies. You would have to implement your own reader for this format, and then have some way to map between .NET and Java types. (The Mono project implements a BinaryFormatter compatible with .NET's, so you could use their reader implementation as a reference.)
As an alternative, consider using another format for data serialization, such as JSON. This will give you instant portability to a wide array of languages, as well as the possibility for easy human inspection of the data.
Deserializing an object in Java that was serialized with C#'s built-in binary serialization would require you to implement C#'s deserialization logic in Java. That's a pretty involved process, so let's compare some options:
Use a third party library for serialization which works for C# and Java.
Write a routine to serialize each object. One in C#, one in Java. This will be tedious, and hard to maintain.
Implement C#'s serialization logic in Java, or vice versa. This will be difficult, time consuming, and you likely won't get it right the first time.
I recommend option 1, using a third-party library. Here are two third-party libraries I've used and highly recommend.
Google ProtoBufs
Apache Thrift
You can use any cross-platform binary format. Your options include, among others:
Protobuf
BSON (Binary JSON)
GZIP
JSON and XML (herrrrp) are also options, albeit text-based ones.
One other option would be to base64-encode the data and decode it on the other side, though since the data is binary you may end up with a much larger payload (probably not a good idea).
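For reference, the base64 round trip itself is trivial with java.util.Base64 (Java 8+); the size overhead is the real objection:

```java
import java.util.Base64;

// Round-tripping raw bytes through base64 so they can travel inside a text
// format. Output is 4 chars per 3 input bytes (~33% inflation), which is why
// this is usually a last resort for large binary payloads.
class Base64Demo {
    static String encode(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```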
Server side: C# or Java
Client side: Objective-C
I need a way to serialize an object in C#/Java and deserialize it in Objective-C.
I'm new to Objective-C and I was wondering where I can get information about this issue.
Thanks.
Apart from the obvious JSON/XML solutions, protobuf may also be interesting. There are Java/C++/Python backends for it in the core distribution, and third parties have created backends for C# and Objective-C (never used that one, though) as well.
The main advantages are that it is much, much faster to parse [1] and much smaller [2], since it's a binary format, and that versioning was an important factor from the beginning.
[1] Google claims 20-100 times faster, compared to XML
[2] 3-10 times smaller, according to the same source
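Much of that size saving comes from protobuf's base-128 varint encoding, which can be sketched in a few lines of Java (a simplified stand-alone version, not code from the Google libraries):

```java
import java.io.ByteArrayOutputStream;

// protobuf's base-128 varint: 7 payload bits per byte, high bit set on every
// byte except the last. Small values cost one byte regardless of the field's
// declared width, which is a large part of the size savings quoted above.
class Varint {
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // more bytes follow
            value >>>= 7;
        }
        out.write((int) value);                        // final byte, high bit clear
        return out.toByteArray();
    }

    // Assumes `bytes` holds exactly one varint.
    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }
}
```

So the value 1 takes one byte and 300 takes two, where a fixed-width int32 would always take four.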
Another technology similar to protobufs is Apache Thrift.
Apache Thrift is a software framework for scalable cross-language services development. Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages.
JSON for relatively straight forward object graphs
XML/REST for more complex object graphs (distinction between Arrays / Collections / nested arrays etc)
Sudzc. I am using it. It makes it pretty easy to invoke a web service from an iOS app; you don't have to write code to serialize objects.
JSON is probably the best choice, because:
It is simple to use
It is human-readable
It is data-based rather than being tied to any more complex object model
You will be able to find decent libraries for import/export in most languages.
Serialisation of more complex objects is IMHO not a good idea from the perspective of portability since often one language/platform has no effective way of expressing a concept from another language / platform. e.g. as soon as you start declaring "types" or "classes" of serialised objects you run into the thorny issue of differing object models between languages.
On iOS there are couple of JSON frameworks and libraries with an Objective-C API:
JSONKit
SBJson
TouchJson
are probably the most prominent.
JSONKit is fast and simple, but can only parse a contiguous portion of JSON text. This means you need to save downloaded data into a temporary file, or save all downloaded JSON text into an NSMutableData object (kept in memory). Only after the JSON text has been downloaded completely can you start parsing.
SBJson is more flexible to use. It provides an additional "SAX style" interface, can parse partial input and can parse more than one JSON document per "input" (for example, several JSON documents per network connection). This is very handy when you want to connect to a "streaming API" (e.g. the Twitter Streaming API), where many JSON documents can arrive per connection. The drawback is that it is much slower than JSONKit.
TouchJson is even somewhat slower than SBJson.
My personal preference is a different one, though. It is faster than JSONKit (20% faster on ARM), has an additional SAX style API, can handle "streaming APIs", can simultaneously download and parse, can handle very large JSON strings without severely impacting memory footprint, and is especially easy to use with NSURLConnection. (Well, I'm probably biased, since I'm the author.)
You can take a look at JPJson (Apache License v2):
JPJson - it's still in beta, though.
I have a .NET application which serializes an object in binary format.
this object is a struct consisting of a few fields.
I must deserialize and use this object in a C++ application.
I have no idea if there are any serialization libraries for C++, a google search hasn't turned up much.
What is the quickest way to accomplish this?
Thanks in advance.
Roey.
Update :
I have serialized using protobuf-net in my .NET application, with relative ease.
I also got the .proto file that protobuf-net generated, using the GetProto() command.
In the .proto file, my GUID fields get a type of "bcl.guid", but C++ protoc.exe compiler does not know how to interpret them!
What do I do with this?
If you are using BinaryFormatter, then it will be virtually impossible. Don't go there...
Protocol buffers is designed to be portable, cross platform and version-tolerant (so it won't explode when you add new fields etc). Google provide the C++ version, and there are several C# versions freely available (including my own) - see here for the full list.
Small, fast, easy.
Note that the v1 of protobuf-net won't handle structs directly (you'll need a DTO class), but v2 (very soon) does have tested struct support.
Can you edit the .NET app? If so, why not use XML serialization to output the data in an easy-to-import format?
Both boost and Google have libraries for serialization. However, if your struct is pretty trivial, you might consider managing the serialization yourself by writing bytes out from C# and then reading the data in C++ with fread.
Agree with others. You are making your app very vulnerable by doing this. Consider what happens if one of the classes you're serializing is changed in any way, or built with a later version of the C# compiler: your serialized classes could potentially change, making them unreadable.
An XML based solution might work well. Have you considered SOAP? A little out of fashion now but worth a look. The main issue is to decouple the implementation from the data. You can do this in binary if speed / efficiency is an issue, although in my experience, it rarely is.
Serializing in a binary format and expecting an application in another language to read the binary is a very brittle solution (ie it will tend to break on the smallest change to anything).
It would be more stable to serialize the data in a common standard format.
Do you have the option of changing the format? If so, consider choosing a non-binary format for greater interoperability. There are plenty of libraries for reading and writing XML. Json is popular as well.
Binary formats are efficient, but vulnerable to implementation details (does your C++ compiler pack data structures? how are ints and floats represented? what byte ordering is used?), and difficult to adjust if mangled. Text based formats are verbose, but tend to be much more robust. If you are uncertain about binary representations, text representations tend to be easier to understand (apart from challenges such as code pages and wide/narrow characters...).
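The byte-ordering point is easy to demonstrate. This small Java sketch lays out the same int32 under both orders; whichever side does the reading (e.g. C++ via fread) has to agree with the writer (.NET's BinaryWriter emits little-endian, while Java's DataOutputStream is big-endian):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// The same int32 under both byte orders. A reader consuming the raw bytes
// must agree with the writer on which order is in use.
class EndianDemo {
    static byte[] littleEndian(int v) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
    }

    static byte[] bigEndian(int v) {
        return ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(v).array();
    }
}
```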
For C++ XML libraries, the most capable (and perhaps also most complex) would still seem to be the Xerces library. But you should decide for yourself which library best fits your needs and skills.
Use XML serialization; it's the best way to go - in fact, it's the cleanest way to go.
XmlSerializer s = new XmlSerializer( typeof( YourClassType ) );
TextWriter w = new StreamWriter( @"c:\list.xml" );
s.Serialize( w, yourClassListCollection );
w.Close();
Is Protocol Buffer for .NET gonna be lightweight/faster than Remoting(the SerializationFormat.Binary)? Will there be a first class support for it in language/framework terms? i.e. is it handled transparently like with Remoting/WebServices?
I very much doubt that it will ever have direct language support or even framework support - it's the kind of thing which is handled perfectly well with 3rd party libraries.
My own port of the Java code is explicit - you have to call methods to serialize/deserialize. (There are RPC stubs which will automatically serialize/deserialize, but no RPC implementation yet.)
Marc Gravell's project fits in very nicely with WCF though - as far as I'm aware, you just need to tell it (once) to use protocol buffers for serialization, and the rest is transparent.
In terms of speed, you should look at Marc Gravell's benchmark page. My code tends to be slightly faster than his, but both are much, much faster than the other serialization/deserialization options in the framework. It should be pointed out that protocol buffers are much more limited as well - they don't try to serialize arbitrary types, only the supported ones. We're going to try to support more of the common data types (decimal, DateTime etc) in a portable way (as their own protocol buffer messages) in future.
Some performance and size metrics are on this page. I haven't got Jon's stats on there at the moment, just because the page is a little old (Jon: we must fix that!).
Re being transparent; protobuf-net can hook into WCF via the contract; note that it plays nicely with MTOM over basic-http too. This doesn't work with Silverlight, though, since Silverlight lacks the injection point. If you use svcutil, you also need to add an attribute to class (via a partial class).
Re BinaryFormatter (remoting); yes, this has full support; you can do this simply via a trivial ISerializable implementation (i.e. just call the Serializer method with the same args). If you use protogen to create your classes, it can do this for you: you can enable it at the command line via arguments (it isn't enabled by default, since BinaryFormatter doesn't work on all frameworks [CF, etc]).
Note that for very small objects (single instances, etc) on local remoting (IPC), the raw BinaryFormatter performance is actually better - but for non-trivial graphs or remote links (network remoting) protobuf-net can out-perform it pretty well.
I should also note that the protocol buffers wire format doesn't directly support inheritance; protobuf-net can spoof this (while retaining wire-compatibility), but like with XmlSerializer, you need to declare the sub-classes up-front.
Why are there two versions?
The joys of open source, I guess ;-p Jon and I have worked on joint projects before, and have discussed merging these two, but the fact is that they target two different scenarios:
dotnet-protobufs (Jon's) is a port of the existing Java version. This means it has a very familiar API for anybody already using the Java version, and it is built on typical Java constructs (builder classes, immutable data classes, etc) - with a few C# twists.
protobuf-net (Marc's) is a ground-up re-implementation following the same binary format (indeed, a critical requirement is that you can interchange data between the different implementations), but using typical .NET idioms:
mutable data classes (no builders)
the serialization member specifics are expressed in attributes (comparable to XmlSerializer, DataContractSerializer, etc)
If you are working with Java and .NET clients, Jon's is probably a good choice, for the familiar API on both sides. If you are pure .NET, protobuf-net has advantages - the familiar .NET style API, but also:
you aren't forced to be contract-first (although you can, and a code-generator is supplied)
you can re-use your existing objects (in fact, [DataContract] and [XmlType] classes can often be used without any changes at all)
it has full support for inheritance (which it achieves on the wire by spoofing encapsulation) (possibly unique for a protocol buffers implementation? note that sub-classes have to be declared in advance)
it goes out of its way to plug into and exploit core .NET tools (BinaryFormatter, XmlSerializer, WCF, DataContractSerializer) - allowing it to work directly as a remoting engine. This would presumably be quite a big split from the main java trunk for Jon's port.
Re merging them; I think we'd both be open to it, but it seems unlikely you'd want both feature sets, since they target such different requirements.