In my C# application I saved class objects by serialization. Now I have a binary file on my disk. Is this file portable, in the sense that installations of the same application on other systems will recognize it? Are there disadvantages to consider?
If you have the exact same version of the app on both systems, then pretty much any serializer will be happy. Once you start iterating versions, though, then (assuming you are using BinaryFormatter) you can start hitting version-compatibility issues, particularly if you have refactored (moved/renamed types, moved/renamed fields, etc.). That is why I strongly recommend not using BinaryFormatter for persistence. If you just want the data stored, there is a wide range of other serializers that will work great: XmlSerializer, Json.NET, protobuf-net, DataContractSerializer, etc. - all of which handle versioning more gracefully than BinaryFormatter. If you specifically want binary for performance reasons (large files, etc.), then protobuf-net may be worth a look (but I'm biased). Otherwise, you could simply use compression - xml and json compress pretty well.
Additionally, note that BinaryFormatter will not work at all across platforms, and generally not even between different .NET frameworks (including the mobile frameworks, etc.).
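To illustrate the "just store the data" route, here's a minimal sketch using XmlSerializer; the Settings type is hypothetical, but any public class with a parameterless constructor and public read/write properties works the same way. Unknown elements are ignored on read, which is exactly the versioning tolerance BinaryFormatter lacks:

using System.IO;
using System.Xml.Serialization;

public class Settings
{
    public string UserName { get; set; }
    public int Volume { get; set; }
}

class Demo
{
    static void Main()
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Settings));

        // Write; if a later version of the app adds properties,
        // older readers simply skip the elements they don't know.
        using (FileStream stream = File.Create("settings.xml"))
        {
            serializer.Serialize(stream, new Settings { UserName = "alice", Volume = 7 });
        }

        // Read it back.
        using (FileStream stream = File.OpenRead("settings.xml"))
        {
            Settings loaded = (Settings)serializer.Deserialize(stream);
        }
    }
}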
Related
I have a system where a serialized file is created with a C# program and then deserialized in another C# program. Is it possible to do binary deserialization of that C#-serialized file in Java?
Thanks
You can try using a serializer that has implementations for both platforms and outputs data in a platform-independent format, such as Protobuf.
Or, if you need full RPC over the network between the Java and C# applications, you can go for Apache Thrift.
I assume you are speaking of an object serialized with BinaryFormatter. The answer then is a qualified "yes," since Java implements a Turing machine. However, this will not be straightforward.
In this case the data will be in a format most suitable for consumption by a .NET runtime, and will contain information about .NET types and assemblies. You would have to implement your own reader for this format, and then have some way to map between .NET and Java types. (The Mono project implements a BinaryFormatter compatible with .NET's, so you could use their reader implementation as a reference.)
As an alternative, consider using another format for data serialization, such as JSON. This will give you instant portability to a wide array of languages, as well as the possibility for easy human inspection of the data.
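For example, a rough sketch with Json.NET on the C# side (the Order type is made up for illustration); any Java JSON library (Jackson, Gson, etc.) can then read the output:

using System.IO;
using Newtonsoft.Json;

public class Order
{
    public int Id { get; set; }
    public string Customer { get; set; }
}

class Demo
{
    static void Main()
    {
        // Serialize to plain JSON text - readable by other languages and by humans.
        string json = JsonConvert.SerializeObject(new Order { Id = 1, Customer = "ACME" });
        File.WriteAllText("order.json", json);

        // Round-trip on the C# side for good measure.
        Order roundTripped = JsonConvert.DeserializeObject<Order>(File.ReadAllText("order.json"));
    }
}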
Deserializing an object in Java which was serialized with C#'s built-in binary serialization would require you to implement C#'s deserialization logic in Java. That's a pretty involved process, so let's compare some options:
Use a third party library for serialization which works for C# and Java.
Write a routine to serialize each object. One in C#, one in Java. This will be tedious, and hard to maintain.
Implement C#'s serialization logic in Java, or vice versa. This will be difficult, time consuming, and you likely won't get it right the first time.
I recommend option 1: use a third-party library. Here are two third-party libraries I've used and highly recommend; a contract-first sketch follows the list.
Google ProtoBufs
Apache Thrift
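With either library the workflow is contract-first: you describe the data once in an interface definition file and generate matching code for both languages. A hypothetical Protocol Buffers contract might look like this:

// order.proto - a single contract, compiled for both platforms
message Order {
    required int32 id = 1;
    optional string customer = 2;
}

Running protoc --java_out=. order.proto produces the Java classes; on the .NET side you generate the C# equivalent with one of the .NET protobuf implementations (e.g. protobuf-net's protogen).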
You can use any cross-platform binary format. Your options include, among others:
Protobuf
BSON (Binary JSON)
GZIP (strictly a compression format rather than a serializer, but handy layered over one of the text formats)
JSON and XML are also options, albeit text-based ones.
One other option would be to base64-encode the data and decode it on the other side, though base64 inflates binary data by roughly a third, so you may get a huge payload (probably not a good idea).
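For completeness, the encode/decode pair is a one-liner on each side in C#:

using System;

class Demo
{
    static void Main()
    {
        byte[] payload = { 0xDE, 0xAD, 0xBE, 0xEF };

        // Encode on one side...
        string encoded = Convert.ToBase64String(payload);

        // ...decode on the other. Note the ~33% size overhead,
        // which is why this is rarely worth it for large blobs.
        byte[] decoded = Convert.FromBase64String(encoded);
    }
}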
I have a .NET application which serializes an object in binary format.
This object is a struct consisting of a few fields.
I must deserialize and use this object in a C++ application.
I have no idea if there are any serialization libraries for C++; a Google search hasn't turned up much.
What is the quickest way to accomplish this?
Thanks in advance.
Roey.
Update :
I have serialized using protobuf-net in my .NET application, with relative ease.
I also got the .proto file that protobuf-net generated, using the GetProto() command.
In the .proto file, my GUID fields get a type of "bcl.guid", but the C++ protoc.exe compiler does not know how to interpret them!
What do I do with this?
If you are using BinaryFormatter, then it will be virtually impossible. Don't go there...
Protocol Buffers is designed to be portable, cross-platform and version-tolerant (so it won't explode when you add new fields, etc.). Google provides the C++ version, and there are several C# versions freely available (including my own) - see here for the full list.
Small, fast, easy.
Note that the v1 of protobuf-net won't handle structs directly (you'll need a DTO class), but v2 (very soon) does have tested struct support.
Can you edit the .NET app? If so, why not use XML serialization to output the data in an easy-to-import format?
Both boost and Google have libraries for serialization. However, if your struct is pretty trivial, you might consider managing the serialization yourself by writing bytes out from C# and then reading the data in C++ with fread.
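As a rough sketch of the manual approach (the Point struct is hypothetical), the C# side might look like this; note that BinaryWriter writes little-endian integers, so the C++ fread side must match that layout:

using System.IO;

// Hypothetical trivial struct; the field order and sizes below ARE the
// file format, so document them for the C++ side.
struct Point
{
    public int X;
    public int Y;
}

class Writer
{
    static void Main()
    {
        Point p = new Point { X = 10, Y = 20 };

        // Write each field explicitly, so the byte layout is under your
        // control rather than the serializer's. The C++ reader can then
        // fread two 32-bit ints on any little-endian platform.
        using (BinaryWriter writer = new BinaryWriter(File.Create("point.bin")))
        {
            writer.Write(p.X); // 4 bytes
            writer.Write(p.Y); // 4 bytes
        }
    }
}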
Agree with others. You are making your app very vulnerable by doing this. Consider the situation if one of the classes you're serializing is changed in any way or built with a later version of the C# compiler: your serialized classes could potentially change, causing them to become unreadable.
An XML based solution might work well. Have you considered SOAP? A little out of fashion now but worth a look. The main issue is to decouple the implementation from the data. You can do this in binary if speed / efficiency is an issue, although in my experience, it rarely is.
Serializing in a binary format and expecting an application in another language to read the binary is a very brittle solution (i.e. it will tend to break on the smallest change to anything).
It would be more stable to serialize the data in a common standard format.
Do you have the option of changing the format? If so, consider choosing a non-binary format for greater interoperability. There are plenty of libraries for reading and writing XML. Json is popular as well.
Binary formats are efficient, but vulnerable to implementation details (does your C++ compiler pack data structures? how are ints and floats represented? what byte ordering is used?), and difficult to adjust if mangled. Text based formats are verbose, but tend to be much more robust. If you are uncertain about binary representations, text representations tend to be easier to understand (apart from challenges such as code pages and wide/narrow characters...).
For C++ XML libraries, the most capable (and perhaps also most complex) would still seem to be the Xerces library. But you should decide for yourself which library best fits your needs and skills.
Use XML serialization; it's the best way to go - in fact, it's the cleanest way to go.
XmlSerializer s = new XmlSerializer( typeof( List<YourClassType> ) );
using ( TextWriter w = new StreamWriter( @"c:\list.xml" ) )
{
    s.Serialize( w, yourClassListCollection ); // yourClassListCollection is a List<YourClassType>
}
Currently I'm doing XML serialization; however, it is very slow. I'm looking for a way to save/load information from a file very quickly. I'm not really interested in how it looks on disc (if anything, I want it to be obscured, as I don't want manual editing).
I'm thinking of a binary format; however, I am not sure if it would be able to serialize properties which may be of a custom type, etc.
Any ideas?
You can try using SQLite. It is very fast, and will give you a complete database implementation with SQL queries over a single file.
If you are thinking of trying binary formats, I suggest you try this first.
It can also be used with an ORM, and the file can be compressed and encrypted.
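As a minimal sketch (shown here with the Microsoft.Data.Sqlite package; other SQLite bindings such as System.Data.SQLite look almost identical):

using Microsoft.Data.Sqlite;

class Demo
{
    static void Main()
    {
        // The entire database lives in a single file on disk.
        using (SqliteConnection conn = new SqliteConnection("Data Source=app.db"))
        {
            conn.Open();

            SqliteCommand create = conn.CreateCommand();
            create.CommandText = "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)";
            create.ExecuteNonQuery();

            SqliteCommand insert = conn.CreateCommand();
            insert.CommandText = "INSERT INTO items (name) VALUES ($name)";
            insert.Parameters.AddWithValue("$name", "widget");
            insert.ExecuteNonQuery();
        }
    }
}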
What exactly is the data?
With xml, the obvious answer would be to use something like GZipStream to compress it - making it smaller and obscure. You could use BinaryFormatter, but it is brittle and IMO unsuitable for long-term storage. I would say "protocol buffers" (maybe protobuf-net), but it depends what the "custom data" is. But if you are using XmlSerializer at the moment, protobuf-net may work virtually without changes (maybe add a few attributes) - and it is (in every case I've seen to date) both smaller and faster than BinaryFormatter.
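For reference, the GZipStream option is only a few lines around your existing XmlSerializer code - a rough sketch:

using System.IO;
using System.IO.Compression;
using System.Xml.Serialization;

static class CompressedXml
{
    public static void Save<T>(T obj, string path)
    {
        // Gzipping the XML makes it both smaller and opaque to casual editing.
        using (FileStream file = File.Create(path))
        using (GZipStream gzip = new GZipStream(file, CompressionMode.Compress))
        {
            new XmlSerializer(typeof(T)).Serialize(gzip, obj);
        }
    }

    public static T Load<T>(string path)
    {
        using (FileStream file = File.OpenRead(path))
        using (GZipStream gzip = new GZipStream(file, CompressionMode.Decompress))
        {
            return (T)new XmlSerializer(typeof(T)).Deserialize(gzip);
        }
    }
}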
And here's protobuf-net's steep learning curve (see also: Getting Started):
[ProtoContract]
public class Person {
    [ProtoMember(1)]
    public int Id {get;set;}
    [ProtoMember(2)]
    public string Name {get;set;}
    //...
}
To be fair, it can get a little trickier if you are using inheritance - not much though. In many cases you can actually use your existing attributes - it'll work with xml / wcf attributes if an explicit element order is included.
Binary serialization certainly works with properties of custom types and typically produces smaller files than XML serialization. It's certainly an approach you should consider if file size is an important factor in your situation.
I agree with Am about using an embedded database like SQLite. It comes with significant benefits. The ability to layer an ORM on top of it is probably the most significant.
XML serialization is handy, particularly when you need to be able to edit the XML by hand or process it with other XML tools like XSLT, but it also has some unavoidable performance problems. One important technique when using XML serialization in .NET is to cache the XML serializers, or to have them created by sgen at build time.
The reason to cache the XML serializer is that the .NET runtime will automatically generate, compile and load an assembly containing a serializer if it can't find one in an already-loaded assembly. This process can be really slow. Constructing a new XmlSerializer instance can also be quite slow, which is why you should cache it. Be careful when caching the serializer, though, as the different XmlSerializer constructors can produce different serializer implementations which behave differently, particularly with respect to namespaces, etc.
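A minimal cache might look like this (names are illustrative); note that the runtime only reuses generated assemblies for the XmlSerializer(Type) and XmlSerializer(Type, string) constructors, so an explicit cache matters even more with the other overloads:

using System;
using System.Collections.Generic;
using System.Xml.Serialization;

static class SerializerCache
{
    // One serializer per type, created once and reused.
    static readonly Dictionary<Type, XmlSerializer> cache = new Dictionary<Type, XmlSerializer>();
    static readonly object sync = new object();

    public static XmlSerializer For(Type type)
    {
        lock (sync)
        {
            XmlSerializer serializer;
            if (!cache.TryGetValue(type, out serializer))
            {
                serializer = new XmlSerializer(type);
                cache.Add(type, serializer);
            }
            return serializer;
        }
    }
}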
Then of course there are the usual performance implications of parsing a lot of text. Unfortunately, that isn't easy to avoid with XML.
One of the reasons SQLite is a better choice than XML is that it is, at its core, a fixed-length record storage system. Any binary file with fixed-length records is going to be fast to read, index and scan. Fixed-block-size file formats are almost always screamingly fast to read and write. I would recommend implementing one at some point for your own education.
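A toy sketch of such a store (everything here is illustrative): with 8-byte records, record N lives at byte offset N * 8, so a single seek reaches it - no parsing, no scanning:

using System.IO;

// Toy fixed-length record store: every record is exactly 8 bytes
// (two 32-bit ints), addressable by index alone.
class RecordFile
{
    const int RecordSize = 8;

    public static void Write(string path, int index, int a, int b)
    {
        using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate))
        {
            fs.Seek((long)index * RecordSize, SeekOrigin.Begin);
            BinaryWriter w = new BinaryWriter(fs);
            w.Write(a);
            w.Write(b);
        }
    }

    public static void Read(string path, int index, out int a, out int b)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            fs.Seek((long)index * RecordSize, SeekOrigin.Begin);
            BinaryReader r = new BinaryReader(fs);
            a = r.ReadInt32();
            b = r.ReadInt32();
        }
    }
}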
If you still want a text based format (for ease of interoperability) and don't need the benefits of an ORM then consider using the FileHelpers library.
I have a library (written in C#) for which I need to read/write representations of my objects to disk (or to any Stream) in a particular binary format (to ensure compatibility with C/Java library implementations). The format requires a fair amount of bit-packing and some DEFLATE'd bytestreams. I would like my library to be as idiomatic .NET as possible, however, and so would like to provide an API as close as possible to the normal binary serialization process. I'm aware of the ability to implement the IFormatter interface, but given that I really am unable to reuse any part of the built-in serialization stack, is it worth doing this, or will it just bring unnecessary overhead? In other words:
Implement IFormatter and co.
OR
Just provide "Serialize"/"Deserialize" methods that act on a Stream?
A good point was brought up below about needing the serialization semantics for any case involving remoting. In a case where using MarshalByRef objects is feasible, I'm pretty sure this won't be an issue, so leaving that aside: are there any benefits or drawbacks to using ISerializable/IFormatter versus a custom stack (or am I understanding remoting incorrectly)?
I have always gone with the latter. There isn't much use in reusing the serialization framework if all you're doing is writing a file in a specific format. The only place I've run into any issues with a custom serialization framework is with remoting; there you have to make your objects serializable.
This may not help you since you have to write to a specific format, but protobuf and sqlite are good tools for doing custom serialization.
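For what it's worth, the "latter" shape is as simple as this (names are illustrative only):

using System.IO;

// Plain Serialize/Deserialize methods acting on a Stream,
// with no IFormatter plumbing.
public class MyObject
{
    public int Id;
    public string Name;

    public void Serialize(Stream output)
    {
        BinaryWriter w = new BinaryWriter(output);
        w.Write(Id);
        w.Write(Name ?? "");
        w.Flush();
    }

    public static MyObject Deserialize(Stream input)
    {
        BinaryReader r = new BinaryReader(input);
        return new MyObject { Id = r.ReadInt32(), Name = r.ReadString() };
    }
}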
I'd do the former. There's not much to the interface, so if you're mimicking the structure anyway, adding ": IFormatter" and the other code necessary to get full compatibility won't take much.
Writing your own serialization code is error prone and time consuming.
As a thought - have you considered existing open-source portable formats, for example "protocol buffers"? This is a high density binary serialization format that underpins much of Google's data transfer etc. Versions are available in a wide range of languages - including Java/C++ etc (in the core Google distribution), and a vast range of others.
In particular, for .NET-idiomatic usage, protobuf-net looks a lot like XmlSerializer or DataContractSerializer (indeed, it can even work purely with xml/wcf attributes if it includes an order on each element) - or can use the specific protobuf-net attributes:
[ProtoContract]
class Person {
    [ProtoMember(1)]
    public string Name {get;set;}
}
If you want to guarantee portability to other implementations, the recommendation is to start "contract first", with a ".proto" file - in this case, something like:
message person {
    required string name = 1;
}
This .proto file can then be used to generate any language-specific variant; so with protobuf-net you'd run it through "protogen" (included in protobuf-net; and a VS2008 add-on is in progress); or for Java/C++ etc you'd run it through "protoc" (included in Google's protobuf). "protogen" in protobuf-net can currently emit C# and VB, but it is pretty easy to add another language if you want to use F# etc - it just involves writing (or migrating) an xslt.
There is also another .NET version that is a more direct port of the Java version; as such it is less .NET idiomatic. This is dotnet-protobufs.
Is Protocol Buffers for .NET going to be more lightweight/faster than Remoting (SerializationFormat.Binary)? Will there be first-class support for it in language/framework terms? I.e., is it handled transparently, as with Remoting/web services?
I very much doubt that it will ever have direct language support or even framework support - it's the kind of thing which is handled perfectly well with 3rd party libraries.
My own port of the Java code is explicit - you have to call methods to serialize/deserialize. (There are RPC stubs which will automatically serialize/deserialize, but no RPC implementation yet.)
Marc Gravell's project fits in very nicely with WCF though - as far as I'm aware, you just need to tell it (once) to use protocol buffers for serialization, and the rest is transparent.
In terms of speed, you should look at Marc Gravell's benchmark page. My code tends to be slightly faster than his, but both are much, much faster than the other serialization/deserialization options in the framework. It should be pointed out that protocol buffers are much more limited as well - they don't try to serialize arbitrary types, only the supported ones. We're going to try to support more of the common data types (decimal, DateTime etc) in a portable way (as their own protocol buffer messages) in future.
Some performance and size metrics are on this page. I haven't got Jon's stats on there at the moment, just because the page is a little old (Jon: we must fix that!).
Re being transparent; protobuf-net can hook into WCF via the contract; note that it plays nicely with MTOM over basic-http too. This doesn't work with Silverlight, though, since Silverlight lacks the injection point. If you use svcutil, you also need to add an attribute to class (via a partial class).
Re BinaryFormatter (remoting); yes, this has full support; you can do this simply via a trivial ISerializable implementation (i.e. just call the Serializer method with the same args). If you use protogen to create your classes, then it can do it for you: you can enable this at the command line via arguments (it isn't enabled by default, as BinaryFormatter doesn't work on all frameworks [CF, etc.]).
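For anyone wanting the manual version, the trivial ISerializable implementation mentioned above looks something like this (assuming protobuf-net's SerializationInfo overloads of Serializer.Serialize/Merge; the Person type is illustrative):

using System;
using System.Runtime.Serialization;
using ProtoBuf;

[Serializable, ProtoContract]
public class Person : ISerializable
{
    [ProtoMember(1)]
    public string Name { get; set; }

    public Person() { }

    // BinaryFormatter (and hence remoting) calls this pair; both just
    // delegate to protobuf-net, so the remoted payload is a protobuf blob.
    protected Person(SerializationInfo info, StreamingContext context)
    {
        Serializer.Merge(info, this);
    }

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
        Serializer.Serialize(info, this);
    }
}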
Note that for very small objects (single instances, etc) on local remoting (IPC), the raw BinaryFormatter performance is actually better - but for non-trivial graphs or remote links (network remoting) protobuf-net can out-perform it pretty well.
I should also note that the protocol buffers wire format doesn't directly support inheritance; protobuf-net can spoof this (while retaining wire-compatibility), but like with XmlSerializer, you need to declare the sub-classes up-front.
Why are there two versions?
The joys of open source, I guess ;-p Jon and I have worked on joint projects before, and have discussed merging these two, but the fact is that they target two different scenarios:
dotnet-protobufs (Jon's) is a port of the existing java version. This means it has a very familiar API for anybody already using the java version, and it is built on typical java constructs (builder classes, immutable data classes, etc) - with a few C# twists.
protobuf-net (Marc's) is a ground-up re-implementation following the same binary format (indeed, a critical requirement is that you can interchange data between different formats), but using typical .NET idioms:
mutable data classes (no builders)
the serialization member specifics are expressed in attributes (comparable to XmlSerializer, DataContractSerializer, etc)
If you are working with Java and .NET clients, Jon's is probably a good choice, for the familiar API on both sides. If you are pure .NET, protobuf-net has advantages - the familiar .NET-style API, but also:
you aren't forced to be contract-first (although you can, and a code-generator is supplied)
you can re-use your existing objects (in fact, [DataContract] and [XmlType] classes can often be used without any changes at all)
it has full support for inheritance (which it achieves on the wire by spoofing encapsulation) (possibly unique for a protocol buffers implementation? note that sub-classes have to be declared in advance)
it goes out of its way to plug into and exploit core .NET tools (BinaryFormatter, XmlSerializer, WCF, DataContractSerializer) - allowing it to work directly as a remoting engine. This would presumably be quite a big split from the main java trunk for Jon's port.
Re merging them; I think we'd both be open to it, but it seems unlikely you'd want both feature sets, since they target such different requirements.