I spent a good portion of time last week working on serialization. During that time I found many examples utilizing either the BinaryFormatter or XmlSerializer. Unfortunately, what I did not find were any examples comprehensively detailing the differences between the two.
The genesis of my curiosity lies in why the BinaryFormatter is able to deserialize directly to an interface whilst the XmlSerializer is not. Jon Skeet in an answer to "casting to multiple (unknown types) at runtime" provides an example of direct binary serialization to an interface. Stan R. provided me with the means of accomplishing my goal using the XmlSerializer in his answer to "XML Object Deserialization to Interface."
Beyond the obvious of the BinaryFormatter utilizes binary serialization whilst the XmlSerializer uses XML I'd like to more fully understand the fundamental differences. When to use one or the other and the pros and cons of each.
The reason a binary formatter is able to deserialize directly to an interface type is because when an object is originally serialized to a binary stream metadata containing type and assembly information is stuck in with the object data. This means that when the binary formatter deserializes the object it knows its type, builds the correct object and you can then cast that to an interface type that object implements.
The XML serializer on the otherhand just serializes to a schema and only serializes the public fields and values of the object and no type information other then that (e.g. interfaces the type implements).
Here is a good post, .NET Serialization, comparing the BinaryFormatter, SoapFormatter, and XmlSerializer. I recommend you look at the following table which in addition to the previously mentioned serializers includes the DataContractSerializer, NetDataContractSerializer and protobuf-net.
Just to weigh in...
The obvious difference between the two is "binary vs xml", but it does go a lot deeper than that:
fields (BinaryFormatter=bf) vs public members (typically properties) (XmlSerializer=xs)
type-metadata based (bf) vs contract-based (xs)
version-brittle (bf) vs version-tolerant (xs)
"graph" (bf) vs "tree" (xs)
.NET specific (bf) vs portable (xs)
opaque (bf) vs human-readable (xs)
As a discussion of why BinaryFormatter can be brittle, see here.
It is impossible to discuss which is bigger; all the type metadata in BinaryFormatter can make it bigger. And XmlSerializer can work very with compression like gzip.
However, it is possible to take the strengths of each; for example, Google have open-sourced their own data serialization format, "protocol buffers". This is:
contract-based
portable (see list of implementations)
version-tolerant
tree-based
opaque (although there are tools to show data when combined with a .proto)
typically "contract first", but some implementations allow implicit contracts based on reflection
But importantly, it is very dense data (no type metadata, pure binary representation, short tags, tricks like variant-length base-7 encoding), and very efficient to process (no complex xml structure, no strings to match to members, etc).
I might be a little biased; I maintain one of the implementations (including several suitable for C#/.NET), but you'll note I haven't
linked to any specific implementation; the format stands under its own merits ;-p
The XML Serializer produces XML and also an XML Schema (implicitly). It will produce XML that conforms to this schema.
One implication is that it will not serialize anything which cannot be described in XML Schema. For instance, there is no way to distinguish between a list and an array in XML Schema, so the XML Schema produced by the serializer can be interpreted either way.
Runtime serialization (which the BinaryFormatter is part of) serializes the actual .NET types to the other side, so if you send a List<int>, the other side will get a List<int>.
That obviously works better if the other side is running .NET.
The XmlSerializer serialises the type by reading all the type's properties that have both a public getter and a public setter (and also any public fields). In this sense the XmlSerializer serializes/deserializes the "public view" of the instance.
The binary formatter, by contrast, serializes a type by serializing the instance's "internals", i.e. its fields. Any fields that are not marked as [NonSerialized] will be serialized to the binary stream. The type itself must be marked as [Serializable] as must any internal fields that are also to be serialized.
I guess one of the most important ones is that binary serialization can serialize both public and private members, whereas the other one works only with public ones.
In here, it provides a very helpful comparison between these two in terms of size. It's a very important issue, because you might send your serialized object to a remote machine.
http://www.nablasoft.com/alkampfer/index.php/2008/10/31/binary-versus-xml-serialization-size/
Related
Based on my understanding, SerializableAttribute provides no compile time checks, as it's all done at runtime. If that's the case, then why is it required for classes to be marked as serializable?
Couldn't the serializer just try to serialize an object and then fail? Isn't that what it does right now? When something is marked, it tries and fails. Wouldn't it be better if you had to mark things as unserializable rather than serializable? That way you wouldn't have the problem of libraries not marking things as serializable?
As I understand it, the idea behind the SerializableAttribute is to create an opt-in system for binary serialization.
Keep in mind that, unlike XML serialization, which uses public properties, binary serialization grabs all the private fields by default.
Not only this could include operating system structures and private data that is not supposed to be exposed, but deserializing it could result in corrupt state that can crash an application (silly example: a handle for a file open in a different computer).
This is only a requirement for BinaryFormatter (and the SOAP equivalent, but nobody uses that). Diego is right; there are good reasons for this in terms of what it does, but it is far from the only option - indeed, personally I only recommend BinaryFormatter for talking between AppDomains - it is not (IMO) a good way to persist data (to disk, in cache, to a database BLOB, etc).
If this behaviour causes you trouble, consider using any of the alternatives:
XmlSerializer, which works on public members (not just the fields), but demands a public parameterless constructor and public type
DataContractSerializer, which can work fully opt-in (using [DataContract]/[DataMember]), but which can also (in 3.5 and above) work against the fields instead
Also - for a 3rd-party option (me being the 3rd party); protobuf-net may have options here; "v2" (not fully released yet, but available as source) allows the model (which members to serialize, etc) to be described independently of the type, so that it can be applied to types that you don't control. And unlike BinaryFormatter the output is version-tolerant, known public format, etc.
I have a project with a few types that I binary-serialize (BinaryFormatter) to files. I'd now like to create a second project which allows admins to temporarily decode those files into a more readable XML format (e.g., using XmlSerializer).
I could deserialize them into an object of the original type, then reserialize them, but is it at all possible to
skip the deserialization (at least in my own code), and
better yet, not have to reference the type at all in my decoder tool?
If you are referring to .NET's native binary serialization (BinaryFormatter), the problem is that it saves your object (along with all the necessary metadata for deserialization) using an undocumented format (AFAIK).
If you really want to try doing it without deserialization, you can check this article, which appears to have analyzed its format (but the author himself states that it might be incomplete). But my opinion is that it's far too much trouble.
There's no way that you can deserialize to your type without specifying that type, at least with standard XML serialization, however, as long as the object was serialized using xml, you could use one of many XML reader classes to traverse the object without needing to deserialize it. Alternatively, if you happen to be serializing to JSON, there are some libraries out there that will deserialize to anonymous types that you can use.
If the type of conversion you are looking for is purely cosmetic (e.g. to make it more human readable), you could write some RegEx to replace the element tags.
I write a desktop application that can open / edit / save documents.
Those documents are described by several objects of different types that store references to each other. Of course there is a Document class that that serves as the root of this data structure.
The question is how to save this document model into a file.
What I need:
Support for recursive structures.
It must be able to open files even if they were produced from slightly different classes. My users don't want to recreate every document after every release just because I added a field somewhere.
It must deal with classes that are not known at compile time (for plug-in support).
What I tired so far:
XmlSerializer -> Fails the first and last criteria.
BinarySerializer -> Fails the second criteria.
DataContractSerializer: Similar to XmlSerializer but with support for cyclic (recursive) references. Also it was designed with (forward/backward) compatibility in mind: Data Contract Versioning. [edit]
NetDataContractSerializer: While the DataContractSerializer still requires to know all types in advance (i.e. it can't work very well with inheritance), NetDataContractSerializer stores type information in the output. Other than that the two seem to be equivalent. [edit]
protobuf-net: Didn't have time to experiment with it yet, but it seems similar in function to DataContractSerializer, but using a binary format. [edit]
Handling of unknown types [edit]
There seem two be two philosophies about what to do when the static and dynamic type differ (if you have a field of type object but a, lets say, Person-object in it). Basically the dynamic type must somehow get stored in the file.
Use different XML tags for different dynamic types. But since the XML tag to be used for a particular class might not be equal to the class name, its only possible to go this route if the deserializer knows all possible types in advance (so that he can scan them for attributes).
Store the CLR type (class name, assembly name & version) during serialization. Use this info during deserialization to instantiate the right class. The types must not be known prior to deserialization.
The second one is simpler to use, but the resulting file will be CLR dependent (and less sensitive to code modifications). Thats probably why XmlSerializer and DataContractSerializer choose the first way. NetDataContractSerializer is not recomended because its using the second approch (So does BinarySerializer by the way).
Any ideas?
The one you haven't tried is DataContractSerializer. There is a constructor that takes a parameter bool preserveObjectReferences that should handle the first criteria.
The WCF data contract serializer is probably closest to your needs, although not perfect.
There is only limited support for backwards compatibility (i.e. whether old versions of the program can read documents generated with a newer version). New fields are supported (via IExtensibleDataObject), but new classes or new enum values not.
I would think the XmlSerializer is your best bet. You won't be able to support everything on your requirements list without a bit of work in your Document classes - but the XmlSerializer architecture gives you extensibility points which should allow you to tap into its mechanism deep enough to do just about anything.
Using the IXmlSerializable interface - by implementing that on your classes you want to store - you should be able to do just about anything, really.
The interface exposes basically two methods - ReadXml And WriteXml
public void WriteXml (XmlWriter writer)
{
// do what you need to do to write out your XML for this object
}
public void ReadXml (XmlReader reader)
{
// do what you need to do to read your object from XML
}
Using these two methods, you should be able to capture the necessary state information from just about any object you might want to store, and turn it into XML that can be persisted to disk - and deserialized back into an object when the time comes!
XmlSerializer can work for your first criteria, however you must provide the recursion for objects like the TreeView control.
BinaryFormatter can work for all 3 criteria. If a class changes, you may have to create a conversion tool to convert old format documents to a new format. Or recognize an older format, deserialize to the old, and then save to the new - keeping your old class format around for a little while.
This will help cover version tolerance which is what I think you're after: MSDN - Version Tolerant Serialization
I've implemented a data access library that allows devs to mark up their derived classes with attributes to have them mapped directly to stored procedures. So far, so good. Now I'd like to provide a Serialize() method or override ToString(), and have the derived classes get free serialization into XML.
Where should I start? Will I have to use Reflection to do this?
XML Serialization using XmlSerializer
In the first instance, I would look at the XML Serialization in the .NET Framework that supports serialization of objects to and from XML using an XmlSerializer. There's also an article from Extreme XML on using this serialization framework.
The following links all provide examples of using this approach:
CodeProject article
Microsoft KB article
DotNetJohn - XML Serialization Using C#
ISerializable and SerializableAttribute
An alternative to this would be to use a formatter and the regular SerializableAttribute and ISerializable system of serialization. However, there is no built-in XML formatter for this framework other than the SoapFormatter, so you'd need to roll your own or find a third party/open source implementation.
Roll Your Own
Also, you could consider writing your own system using, for example, reflection to walk your object tree, serializing items according to their serialization visibility, which could be indicated by your own attributes or the existing DesignerSerializationVisibility attributes. The downside to this shown by most implementations is that it expects properties to be publicly read/write, so bear that in mind when evaluating existing custom solutions.
I would start by looking at XmlSerializer.
Hopefully you'll be able to end there too, since it already gives you this functionality :)
You should be able to use the XmlSerializer to perform the serialization of your class
XmlSerializer serializer = new XmlSerializer(this.GetType());
serializer.Serialize(stream, obj);
I am heavily using byte array to transfer objects, primitive data, over the network and back. I adapt java's approach, by having a type implement ISerializable, which contains two methods, as part of the interface, ReadObjectData and WriteObjectData. Any class using this interface, would write date into the byte array. Something Like that
class SerializationType:ISerializable
{
void ReadObjectData (/*Type that manages the write/reads into the byte array*/){}
void WriteObjectData(/*Type that manages the write/reads into the byte array*/){}
}
After write is complete for all object, I send an array of the network.
This is actually two-fold question. Is it a right way to send data over the network for the most efficiency (in terms of speed, size)?
Would you use this approach to write objects into the file, as opposed to use typically xml serialization?
Edit #1
Joel Coehoorn mentioned BinaryFormatter. I have never used this class. Would you elaborate, provide good example, references, recommendations, current practices -- in addition to what I currently see on msdn?
This should be fine, but you're doing work that is already done for you. Look at the System.Runtime.Serialization.Formatters.Binary.BinaryFormatter class.
Rather than needing to implement your own Read/WriteOjbectData() methods for each specific type you can just use this class that can already handle most any object. It basically takes an exact copy of the memory representation of almost any .Net object and writes it to or reads it from a stream:
BinaryFormatter bf = new BinaryFormatter();
bf.Serialize(outputStream, objectToSerialize);
objectToDeserialize = bf.Deserialize(inputStream) as DeserializedType;
Make sure you read through the linked documents: there can be issues with unicode strings, and an exact memory representation isn't always appropriate (things like open Sockets, for example).
If you are after simple, lightweight and efficient binary serialization, consider protobuf-net; based on google's protocol buffers format, but implemented from scratch for typical .NET usage. In particular, it can be used either standalone (via protobuf-net's Serializer), or via BinaryFormatter by implementing ISerializable (and delegating to Serializer).
Apart from being efficient, this format is designed to be extensible and portable (i.e. compatible with java/php/C++ "protocol buffers" implementations), unlike BinaryFormatter that is both implementation-specific and version-intolerant. And it means you don't have to mess around writing any serialization code...
Creating your own ISerializable interface when there's already one in the framework sounds like a bit of a recipe for disaster. At least give it a different name.
You'll have a bit of a problem when it comes to reading - you won't have an instance to call the method on. You might want to make it a sort of "factory" instead:
public interface ISerializationFactory<T>
{
T ReadObjectData(Stream input);
void WriteObjectData(Stream output);
}
As for XML vs binary... it entirely depends on the situation: how much data will there be, do you need backwards and forwards compatibility, does the XML serialization in .NET give you enough control already etc.
Yes this will be faster than sending XML as you will be sending less data over the wire. Even if you compressed the XML (which would drastically reduce its size) you would still have the overhead of compression and decompression. So I would say that between what you are currently doing and XML serialization you are currently using the most efficient solution.
However I am curious as to how much of a performance hit you would incur by using XML instead of a marshaled object. The reason that I would encourage you to look into XML serialization is because you will be storing the data in an application-neutral format that is also human readable. If you are able to serialize the data to XML in a way that does not incur performance penalties in your application I would recommend that you look into it.
Regarding writing to file, generally you want to serialize an object to XML if you want to be able to read the serialization or perhaps alter it. If you have no desire for the serialization to be human readable, you might as well reuse your binary serialization.
If you do want it to be human readable, then XML is something to consider, but it depends on the type of data you need to serialize. XML is inherently recursive and is therefore good for serializing likewise recursive data. It's less of a good fit on other types of data.
In other words, pick a persistent serialization that suits your needs. There's no one-way-fits-all solution here.
As for network, generally you'll want to keep size to a minimum, so XML is usually never a good choice due to its verbosity.
Serialization (in Java) is deceptively simple. As long as you do simple stuff (like never change the class) it is easy - but there are a number of "fun" things with it too.
Foe a good discussion on Java serialization look at Effective Java (specifically chapter 10).
For C#, not sure, but likely the core issues are the same.
There is an example here on C# serialization: http://www.codeproject.com/KB/cs/objserial.aspx.
XStream library provide an exceptionally good way of dealing with serialisation including support for XML, JSON and supporting custom converters. Specifically, the use of custom converters allowed us to reduce XML verbosity and to serialise strictly what is needed.
XStream has no requirement to declare everything as Serializable, which is very important when one utilises a third-party lib and needs to serialise an instance of a class from that lib, which is not declared as Serializable.
The answer is already accepted, but for the sake of completeness of this discussion here is a link to a good comparison between different serialisation approaches/libraries:
http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking
The kryo library looks very compelling for Java serialisation. Similarly to XStream is supports custom converters.