C# - XML Deserialization vs Binary Deserialization vs Binary+Deflate

C# - XML Deserialization vs Binary Deserialization vs Binary+Deflate - c#

I'm saving a huge list of objects to a file, and later deserializing them. The resulting xml file can be about 3 gigs in size.
I want the Deserialization to be super fast, so i tried all three approaches (xml,binary,compressed)
Obviously, deserialization a compressed file takes much longer than an XML one. But i saw binary deserialization also taking a lot more time vs the xml deserialisation. Is that normal? Shoudn't both xml and binary take pretty much the same time to deserialize the object?
Also what do you think will be the best option to use in terms of a good balance between file size and deserialization speeds?

In this performance comparison between all sorts of serialization methods that come with .NET (BinaryFormatter, XmlSerializer, DataContractSerializer, etc.) and protobuf, the protobuf serializer seems to way ahead of the serializers that come with .NET. The resulting size appaears to be smaller as well. If the protobuf format is an option for you, I strongly recommend you have a look at it. :-)
Another option: if deserializing is slow, only deserialize the parts you really need. Create an index file that tells you the offsets of the objects you write to the data file, so you can quickly deserialize the objects you need in a random-access fashion.

Customise your serialisation, either fully or by implementing ISerializable and then using binary (though custom XML may also be worth experimenting with). Don't serialise memoised fields, but just the key field their value is based upon. Look for other areas where you can reduce size by serialising enough information to build part of the graph, rather than the full representation of the graph.
Then use deflate with that.

Related

C# : Serialize objects to XML without reflection

In an application, we can save the current state of the application and it's configuration(which can be huge). We are using the XmlSerializer.
We now have only what we need in the XML(all XmlIgnore are in place), and it's VERY slow to store the whole configuration(file of ~50-100MB).
We NEED to keep storing this configuration as XML, but we would like to avoid :
The reflection, which is to slow
To implement the IXmlSerializable interface
The idea was to have a method to implement in each object, in which we can register which fields/property we want to serialize, then having a SerializationManager which is able to read what we want to serialize, and then write them.
Like this, objects doesn't know the language (XML) in which they will be rendered, and if one day we want a binary serialization(or if we want to have the possibility to serialize in different format), we can.
But we don't want to reinvent the wheel, and I don't know if some library exists or if something like Linq to xml can help, or if this is natively possible, ...
So how do you think I can achieve this?

"The reflection, which is to slow"
Except, it doesn't use reflection at runtime. It performs metaprogramming on the first run (assuming you are using new XmlSerializer(type)) to inspect the type and generate static code that will work on the given type. Therefore, any volume-related performance issue is not related to reflection. There is a chance that the metaprogramming itself can take a measurable time, but a: this is unlikely unless your model is really complicated, and b: it can be avoided by using the sgen.exe tool to pre-generate the serialization assembly.
Any performance issue, therefore, is most likely due to the size of the model and the overhead of xml.
If you want to try a different serializer, consider something like protobuf-net. You won't be able to read the data (it will not be xml), but the output will be much smaller and faster.

As you mentioned
In an application, we can save the current state of the application and it's configuration
State, especially when it is big (100Mb is ... huge!), required its own way to serialize data. Many of us knows and hates that slow saving/loading game saves from past. Even now, game developers distinguish quicksave from ordinal saves. It is optimized to occurs faster (to example, by caching part of recently performed quicksave) than ordinal save.
First question is why XML? BinarySerializer is faster, but for sizes like this you better use manual serialization (as Marc Gravell suggested, use protobuf, it's ultimate superior to anything).
Second question is, do you really need serialize data (change their format)? The fastest way of saving state is to dump memory. Imagine you have all your data saved in one block of memory, then dumping this block into a file is a very quick save. You may (I am not sure, but it should be doable) construct your data in a way, what overriding this memory will be kind of loading game. This much faster of any conversion.
If you go with dumping, then consider to pack it (into zip). Packing and saving 10 mb should be faster than saving unpacked 100 mb (assuming, you are not using too slow or too good packing algorithm), memory operations and cpu are much faster than SSD.
To save configuration, you can still serialize it as usual. If you want it to be a single file, then define own format of this file, to example:
config_stream, separator["<<<>>>>"], memory block [100 Mb]
Serialize with XmlSerializer into memory, create file, save config, separator, dump.

XML serialization or reading from XML Objects?

I have different XML files that I will need to read. I'm wondering if I should deserialize the files into custom objects or just read the data using XDocument objects and Linq-to-XML.
The files range in size from 1-2kb to 3mb+, and the different objects also range in complexity (some have attributes, some have children, some both, some none).
I figure it would be easier to work with the objects as opposed to Linq-to-XML, but creating those objects would require some time up front. Are there any rules of thumb or suggestions about when to deserialize as opposed to Linq?
Thanks for any help!

It really depends on what you are doing with the data. If you are not using all of the information that is provided by the XML document, then a LINQ based approach is probably easiest. Think of taking an RSS feed, and only keeping track of the article dates, and nothing else. In this case using a deserialization technique doesn't really do anything for you.
If you are using just about every last bit of data in the XML document, and its structure reflects that of your object model, then certainly deserialize it. This is something that I do all of the time for things like settings files, and even simple file formats.
In your case it sounds like it already exists, and was created by some external source, and you don't have an object representation of the data in your code already, so I would suggest using a LINQ based approach. Additionally, you mention a lot of variation in the files so the flexibility of LINQ would again come in handy. That is a wild guess based on your description though.

You could use the xsd.exe tool which could generate those classes from you given an XML file:
C:\work>xsd test.xml
C:\work>xsd /classes test.xsd
There is no really a rule of thumb. Personally I prefer working with strongly typed objects unless the file sizes become large in which case I switch to XmlReader.

Quickest C# Serializer

I need to serialize an OrderedDictionary, and I need it fast. I don't care about security or human-readability, I just need the fastest way to write the OrderedDictionary to file and read it back in again, so long as the serialization is consistent (same set of key-value pairs, same file contents). Is BinaryFormatter the best choice?

You might look into protobuf, Google's serialization format. There are several implementations for C#:
protosharp
protobuf-net
protobuf-csharp-port
There is a performance comparison online.

BinaryFormatter is almost certainly the fastest of the built-in serializers, but it wouldn't be very hard to measure the alternatives and check.

If you're looking for the fastest way, extra coding not being a problem, you could try this:
Serialize:
1. Write your dictionary data to an in-memory byte array. Assuming your dictionary isn't too huge, this should be relatively cheap, time-wise.
2. Write a small header containing the #records, and then write the array to file.
Deserialize:
1. Read the # items and instantiate your OrderedDictionary with that as the capacity.
2. Read the data back into an array, instantiate each object, and write back to the OrderedDictionary.
I think this won't be hugely faster than more traditional methods, but maybe worth a try if you're trying to eek out every last bit of performance.
A faster way would be to write your own IOrderedDictionary with intimate knowledge of your objects, and have it store them in a way optimized for fast serialization (e.g. a flat buffer that can be slammed to disk quickly). Perhaps even faster would be to use some indexing file format (maybe SQLite?) and write a caching IOrderedDictionary adapter to it. This would let you amortize the deserialization cost.

Storing large amounts of data in files. What is the most performant option?

Currently doing XML serialization however, it is very slow. Looking for a way to save/load information from file very quickly not really interested in how it looks on disc (if anything I want it to be obscured as I don't want manual editing).
Thinking of binary format however I am not sure if it would be able to serialize properties which may be of a custom type etc.
Any idea's?

You can try using Sqlite. It is very fast, and will give you complete database implementation with SQL queries on a file.
If you are thinking of trying binary formats, I suggest you try this first.
And can be used with ORM, and can be compressed and encrypted.

What exactly is the data?
With xml, the obvious answer would be to use smoething like GZipStream to compress it - making it smaller and obscure. You could use BinaryFormatter but it is brittle and IMO unsuitable for long-term storage. I would say "protocol buffers", (maybe protobuf-net), but it depends what the "custom data" is. But if you are using XmlSerializer at the moment protobuf-net may work virtually without changes (maybe add a few attributes) - and it is (in every case I've seen to date) both smaller and faster than BinaryFormatter.
Here's the steep learning curve (see also: Getting Started):
[ProtoContract]
public class Person {
[ProtoMember(1)]
public int Id {get;set;}
[ProtoMember(2)]
public string Name {get;set;}
//...
}
To be fair, it can get a little trickier if you are using inheritance - not much though. In many cases you can actually use your existing attributes - it'll work with xml / wcf attributes if an explicit element order is included.

Binary serialization certainly works with properties of Custom Types and typically produces smaller files than XML serialization. It's certainly an approach you should consider if file size is an important factor for your situation.

I agree with Am about using an embedded database like SQLite. It comes with significant benefits. The ability to layer an ORM on top of it is probably the most significant.
XML Serialization is handy, particularly when you need to be able to edit the XML by hand or process it with other XML tools like XSLT etc, but it also has some unavoidable performance problems. One important technique when using XML Serialization in .Net is to cache the XML Serializers. Or to have them created by sgen on build.
The reason to cache the XML Serializer is related to the fact that the .Net runtime will automatically generate, compile and load an assembly containing a serializer if it can't find one in an already loaded assembly. This process can be really slow. Also constructing a new XMLSerializer instance can be quite slow. Hence why you should cache it. Be careful when caching the serializer though as different XMLSerializer constructors can produce different serializer implementations which behave differently. Particular with respect to namespaces, etc.
Then of course there is the usual performance implications of parsing a lot of text. Unfortunately that isn't easy to avoid with XML.
One of the reasons SQLite is a better choice than XML is related to the fact that it is, at its core, a fixed length record storage system. Any binary file with fixed length records is going to be fast to read, index and scan. Fixed block size file formats are almost always screamingly fast to read and write. I would recommend implementing one at some point for your own education.
If you still want a text based format (for ease of interoperability) and don't need the benefits of an ORM then consider using the FileHelpers library.

Binary serialization of Silverlight XAML object

I'm working on Silverlight application that needs to display complex 2d vector graphics.
It downloads zipped XAML file from the server, parses it (XamlRead) and injects to the layout root on the page.
This works fine for fairly small xaml files. The problems is that I need to make it work with much bigger file (lots more content in it). For example one of my uncompressed xaml files is 20 MB large and XamlRead method takes tool long to parse it. My question is if is there a way to do all the parsing on the server side. It would best to just store serialized binary output of XamlRead method as BLOB in the database. However when I try to serialize it, I'm getting a message that "Canvas object is not marked as serializable". I will really appreciate any advices .

Silverlight doesn't have much binary serialization built in; however, protobuf-net works on Silverlight and may help plug this gap. In the current build you can only really serialize types you control (due to adding attributes) - however, I'm in the middle of a big refactor to (among other things) add support for serializing types without attributes.
I expect it to be about 2 more weeks before this is available as a (hopefully) stable build, but you're welcome to take a look at it then.
Note that you will still need to give it some help (telling it what you want it to serialize), but it may be useful.
In particular, the data format ("protocol buffers") is designed to be both dense and efficient to process, which should increase the parse speed. See here for more (numbers are from main .NET, not Silverlight)

I've found the SharpSerializer package very easy to use for fast binary serlization in Silverlight: http://www.sharpserializer.com/en/index.html. You do not need to use the Serializable attribute -- however it only serializes public members.

If parsing is really the problem, it might help to use pre-compiled XAMLs called 'BAML'. This is a binary representation of the XAML file. Since the binary format has a much much cheaper parser instead of the too generic XML, this helps a lot. BAML is also used internally by the .NET compiler to generate more compact files.
For more information, see also http://stuff.seans.com/2008/07/13/hello-wpf-world-part-2-why-xaml/

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.