Difference between XmlTextWriter and XmlSerializer? - C#

I was using XmlSerializer when I came across someone using XmlTextWriter.
What is the difference between the two?
To me, they serve the same function, which is to create XML files. The Microsoft documentation says that XmlTextWriter "provides a fast, non-cached, forward-only way of generating streams", but I don't really know what that means.

The XmlTextWriter class is an object that knows XML. You can use it to generate arbitrary XML documents. It doesn't matter where the data's coming from; you can pull data for XML elements, attributes, and contents along with the actual structure of the XML document from whatever source you see fit, and it doesn't need to match any particular object's structure or data.
On the other hand, XmlSerializer is an object that knows types. It has the features necessary to analyze a type, extract the important information, and write that information out. It happens to be able to use an XmlTextWriter object to perform the actual I/O; you can provide your own, or at some level it will always create a similar object to handle the actual I/O. In other words, the serializer object doesn't really know XML per se, nor does it need to. It delegates that work to another object.
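To make that concrete, here is a minimal sketch of both; the Person class and the file names are made up for illustration:

using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Person
{
    public string Name;
    public int Age;
}

class Demo
{
    static void Main()
    {
        // XmlTextWriter: you dictate the document structure element by element
        using (var writer = new XmlTextWriter("people.xml", null))  // null encoding = UTF-8
        {
            writer.WriteStartElement("person");
            writer.WriteAttributeString("age", "42");
            writer.WriteElementString("name", "Ada");
            writer.WriteEndElement();
        }

        // XmlSerializer: the document structure is derived from the type
        var serializer = new XmlSerializer(typeof(Person));
        using (var stream = File.Create("person.xml"))
            serializer.Serialize(stream, new Person { Name = "Ada", Age = 42 });
    }
}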
The Microsoft documentation says that XmlTextWriter provides a fast, non-cached, forward-only way of generating streams, but I don't really know what that means.
"fast": not slow
"non-cached": important pieces of information are not stored in memory longer than absolutely necessary
"forward-only": you cannot revisit parts of the XML document you've already created
That is in contrast to other methods for generating XML documents, in which the entire document structure is held in memory as it's constructed, and written to a file only once the entire document has been built. This is often described as the "document object model", or DOM.
The writer approach tends to be more efficient in performance because the XML data is being generated on the fly, as needed, directly from other in-memory data structures you already have. Because the DOM approach requires the entire file's data and structure to be represented in memory at once, it will usually use more memory, which in some cases can reduce performance (though, frankly, on modern computers and for typical XML documents, this is usually a complete non-issue).
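For contrast, a sketch of the DOM approach with XmlDocument; nothing reaches the file until the whole tree has been built in memory (the file name is made up):

using System.Xml;

var doc = new XmlDocument();
XmlElement root = doc.CreateElement("person");
root.SetAttribute("age", "42");
root.AppendChild(doc.CreateElement("name")).InnerText = "Ada";
doc.AppendChild(root);
doc.Save("person.xml");  // the entire document lived in memory up to this point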

Related

XML (de)serialization and schema upgrades

I have a complex graph of XML-serializable classes that I'm able to (de)serialize to hard-disk just fine. But how do I handle massive changes to the graph schema structure? Is there some mechanism to handle XML schema upgrades? Some classes that would allow me to migrate old data to the new format?
Sure I could just use XmlReader/XmlWriter, go through every node and attribute and write several thousand lines of code to convert data to the new format, but maybe there is a better way?
I have found Object graph serialization in .NET and code version upgrades, but I don't think the linked articles apply when there are major changes in the model.
Instead of writing several thousand lines of code to convert files using XmlReader / XmlWriter, you could use XSLT. We are still talking hundreds of lines of code, and perhaps slower execution speeds, but if you are good at XSLT you could get it done much faster.
The other approach would be to build a C# program that links both the old class and the new class (of course you'd need to rename the old class to avoid a naming collision). The program would load OldMyClass from disk, construct NewMyClass from the values of its attributes, and serialize NewMyClass to disk, as in the sketch below. Essentially, this approach moves the conversion task into C# territory, which may be a lot more familiar to you.
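A hedged sketch of that approach; the field names are invented, and OldMyClass/NewMyClass stand for the renamed old class and the current one:

using System.IO;
using System.Xml.Serialization;

[XmlRoot("MyClass")]  // keep the original root element name despite the class rename
public class OldMyClass { public string FullName; }

[XmlRoot("MyClass")]
public class NewMyClass { public string FirstName; public string LastName; }

class Migrator
{
    static void Main()
    {
        OldMyClass old;
        using (var input = File.OpenRead("document.xml"))
            old = (OldMyClass)new XmlSerializer(typeof(OldMyClass)).Deserialize(input);

        // map the old values onto the new structure by hand
        var parts = old.FullName.Split(' ');
        var converted = new NewMyClass { FirstName = parts[0], LastName = parts[1] };

        using (var output = File.Create("document-v2.xml"))
            new XmlSerializer(typeof(NewMyClass)).Serialize(output, converted);
    }
}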
In this case, I keep my changes in my objects and recreate the XML through the XmlSerializer: http://support.microsoft.com/kb/815813
With this I load and save the new XML schema based on my objects.

XmlTextReader vs. XDocument

I need to parse XML in .NET, and I have a choice between at least XmlTextReader and XDocument. Are there any comparisons between those two (or any other XML parsers contained in the framework)?
Maybe this could help me to decide without trying both of them in depth.
The XML files are expected to be rather small; speed and memory usage are a minor issue compared to ease of use. :-)
(I'm going to use them from C# and/or IronPython.)
Thanks!
If you're happy reading everything into memory, use XDocument. It'll make your life much easier. LINQ to XML is a lovely API.
Use an XmlReader (such as XmlTextReader) if you need to handle huge XML files in a streaming fashion, basically. It's a much more painful API, but it allows streaming (i.e. only dealing with data as you need it, so you can go through a huge document and only have a small amount in memory at a time).
There's a hybrid approach, however - if you have a huge document made up of small elements, you can create an XElement from an XmlReader positioned at the start of the element, deal with the element using LINQ to XML, then move the XmlReader onto the next element and start again.
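A sketch of that hybrid pattern, assuming a hypothetical huge.xml of the form <log><entry><message>…</message></entry>…</log>:

using System;
using System.Xml;
using System.Xml.Linq;

var settings = new XmlReaderSettings { IgnoreWhitespace = true };
using (XmlReader reader = XmlReader.Create("huge.xml", settings))
{
    reader.MoveToContent();                    // position on the <log> root
    if (reader.ReadToDescendant("entry"))
    {
        do
        {
            // materialise only this one element as an XElement
            var entry = (XElement)XNode.ReadFrom(reader);
            Console.WriteLine((string)entry.Element("message"));
        }
        // XNode.ReadFrom leaves the reader on the node after the element,
        // which is the next <entry> start tag if there is one
        while (reader.NodeType == XmlNodeType.Element && reader.Name == "entry");
    }
}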
XmlTextReader is kind of deprecated, do not use it.
From the MSDN blogs, by the XML Team:
Effective Xml Part 1: Choose the right API
Avoid using XmlTextReader. It contains quite a few bugs that could not be fixed without breaking existing applications already using it.
The world has moved on, have you? Xml APIs you should avoid using.
Obsolete APIs are easy since the compiler helps identify them, but there are two more APIs you should avoid using – namely XmlTextReader and XmlTextWriter. We found a number of bugs in these classes which we could not fix without breaking existing applications. The easy route would be to deprecate these classes and ask people to use replacement APIs instead. Unfortunately, these two classes cannot be marked as obsolete because they are part of the ECMA-335 (Common Language Infrastructure) standard (http://www.ecma-international.org/publications/standards/Ecma-335.htm – the companion CLILibrary.xml file, which is part of Partition IV).
The good news is that even though these classes are not deprecated, there are replacement APIs for them in the .NET Framework already, and moving to them is relatively easy. First it is necessary to find the places where XmlTextReader or XmlTextWriter is being used (unfortunately this is a manual step). Then all the occurrences of XmlTextReader should be replaced with XmlReader, and all the occurrences of XmlTextWriter with XmlWriter (note that XmlTextReader derives from XmlReader and XmlTextWriter derives from XmlWriter, so the app may already be using these, e.g. as formal parameters). The last step is to change the way the XmlReader/XmlWriter objects are instantiated – instead of creating the reader/writer directly, it is necessary to use the static factory method .Create() present on both the XmlReader and XmlWriter APIs.
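In other words, the mechanical part of the migration looks roughly like this (the file names are made up):

using System.Xml;

// before: var reader = new XmlTextReader("data.xml");
// before: var writer = new XmlTextWriter("out.xml", null);

// after: the factory methods pick an appropriate internal implementation
using (XmlReader reader = XmlReader.Create("data.xml"))
using (XmlWriter writer = XmlWriter.Create("out.xml"))
{
    writer.WriteNode(reader, true);  // e.g. copy the whole document across
}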
Furthermore, IntelliSense in Visual Studio doesn't list XmlTextReader under the System.Xml namespace. The class is defined as:
[EditorBrowsable(EditorBrowsableState.Never)]
public class XmlTextReader : XmlReader, IXmlLineInfo, IXmlNamespaceResolver
The XmlReader.Create factory methods return other internal implementations of the abstract class XmlReader depending on the settings passed.
For a forward-only streaming API (i.e. one that doesn't load the entire document into memory), use XmlReader via the XmlReader.Create method.
For an easier API to work with, go for XDocument, aka LINQ to XML. Find XDocument vs XmlDocument here and here.
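For the small files in the question, the XDocument route is only a couple of lines; settings.xml and its layout are invented here:

using System;
using System.Xml.Linq;

var doc = XDocument.Load("settings.xml");
foreach (XElement item in doc.Root.Elements("item"))
    Console.WriteLine((string)item.Attribute("name"));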

Why can't we serialize these objects?

Why can't we serialize objects into a random access file, when on the other hand we can serialize objects into a sequential access file?
"C# does not provide a means to obtain an object's size at runtime. This means that,
if we serialize the class, we cannot guarantee a fixed-length record size" (from the book I'm reading).
So we cannot read from the random access file, because we don't know each object's size in the file. How could we do seeking?
Any object marked with the SerializableAttribute attribute can be serialized (in most scenarios). The result from serialization is always directed to a stream, which may very well be a file output stream.
Are you asking why an object graph cannot be deserialized partially? .NET serialization only [de]serializes complete object graphs. Otherwise you'll have to turn to other serialization formatters, or write your own.
For direct random access to a file, you must open the file with a stream that supports seeking.
EDIT:
Seeking in the resulting stream from a serialization has no practical purpose – only the serialization formatter knows what's in there anyway, and it should always be fed the very start of the stream.
For persisting the data into other structures, do it in a two-stage process: first, target the serialization bytes at an intermediate (e.g. memory-backed) stream whose size you can read afterwards, then write the data to the actual backing store using that knowledge of the size.
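A sketch of that two-stage process using BinaryFormatter and a length prefix; the record layout is my assumption, not something the formatter gives you:

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static void AppendRecord(Stream backingStore, object record)
{
    var formatter = new BinaryFormatter();
    using (var buffer = new MemoryStream())
    {
        formatter.Serialize(buffer, record);       // stage 1: serialize into memory
        var writer = new BinaryWriter(backingStore);
        writer.Write((int)buffer.Length);          // now the size is known, prefix it
        writer.Flush();
        buffer.WriteTo(backingStore);              // stage 2: copy to the real store
    }
}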
You can't predict the size of a serialized object, because the serialized representation might differ a lot from the runtime representation.
It is still possible to achieve exact control over output size, if you use only primitive types and write using a BinaryWriter – but that is not serialization per se.
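For example, a hedged sketch of fixed-length records written that way, which is what makes seeking by record index possible (the record layout is invented):

using System.IO;

const int RecordSize = sizeof(int) + sizeof(double);  // id + value, always 12 bytes

using (var file = File.Open("records.dat", FileMode.Open))
{
    file.Seek(5 * RecordSize, SeekOrigin.Begin);  // jump straight to record #5
    var reader = new BinaryReader(file);
    int id = reader.ReadInt32();
    double value = reader.ReadDouble();
}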
The default binary serialization in .NET serializes a whole object graph, which, by its nature of being a graph, doesn't have a constant size, which means each serialization object (record) won't have a constant size, preventing random access.
To be able to randomly access any record in a file, write your own implementation of the binary serialization of your class, or use a database. If you need a simple, no-install single-threaded database engine, have a look at SQL Server Compact.

Serialization for document storage

I'm writing a desktop application that can open / edit / save documents.
Those documents are described by several objects of different types that store references to each other. Of course there is a Document class that serves as the root of this data structure.
The question is how to save this document model into a file.
What I need:
Support for recursive structures.
It must be able to open files even if they were produced from slightly different classes. My users don't want to recreate every document after every release just because I added a field somewhere.
It must deal with classes that are not known at compile time (for plug-in support).
What I tried so far:
XmlSerializer -> Fails the first and last criteria.
BinaryFormatter -> Fails the second criterion.
DataContractSerializer: Similar to XmlSerializer but with support for cyclic (recursive) references. Also it was designed with (forward/backward) compatibility in mind: Data Contract Versioning. [edit]
NetDataContractSerializer: While the DataContractSerializer still requires all types to be known in advance (i.e. it can't work very well with inheritance), NetDataContractSerializer stores type information in the output. Other than that, the two seem to be equivalent. [edit]
protobuf-net: Didn't have time to experiment with it yet, but it seems similar in function to DataContractSerializer, but using a binary format. [edit]
Handling of unknown types [edit]
There seem to be two philosophies about what to do when the static and dynamic type differ (if you have a field of type object but, let's say, a Person object in it). Basically the dynamic type must somehow get stored in the file.
Use different XML tags for different dynamic types. But since the XML tag to be used for a particular class might not be equal to the class name, it's only possible to go this route if the deserializer knows all possible types in advance (so that it can scan them for attributes).
Store the CLR type (class name, assembly name & version) during serialization. Use this info during deserialization to instantiate the right class. The types don't have to be known prior to deserialization.
The second one is simpler to use, but the resulting file will be CLR dependent (and more sensitive to code modifications such as renaming a class). That's probably why XmlSerializer and DataContractSerializer chose the first way. NetDataContractSerializer is not recommended because it uses the second approach (and so does BinaryFormatter, by the way).
Any ideas?
The one you haven't tried is DataContractSerializer. There is a constructor overload that takes a bool preserveObjectReferences parameter, which should handle the first criterion.
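That constructor is a bit of a mouthful; a minimal sketch, where Document stands in for the root class from the question:

using System.Runtime.Serialization;

var serializer = new DataContractSerializer(
    typeof(Document),   // root type of the object graph
    null,               // known types
    int.MaxValue,       // maxItemsInObjectGraph
    false,              // ignoreExtensionDataObject
    true,               // preserveObjectReferences: cycles become ref ids
    null);              // data contract surrogate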
The WCF data contract serializer is probably closest to your needs, although not perfect.
There is only limited support for backwards compatibility (i.e. whether old versions of the program can read documents generated with a newer version). New fields are supported (via IExtensibleDataObject), but new classes or new enum values are not.
I would think the XmlSerializer is your best bet. You won't be able to support everything on your requirements list without a bit of work in your Document classes - but the XmlSerializer architecture gives you extensibility points which should allow you to tap into its mechanism deep enough to do just about anything.
Using the IXmlSerializable interface - by implementing that on your classes you want to store - you should be able to do just about anything, really.
The interface exposes basically two interesting methods – ReadXml and WriteXml (plus GetSchema, which by convention just returns null):
public class MyDocumentClass : IXmlSerializable  // hypothetical class you want to store
{
    // required by the interface; returning null is the usual convention
    public System.Xml.Schema.XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        // do what you need to do to write out your XML for this object
    }

    public void ReadXml(XmlReader reader)
    {
        // do what you need to do to read your object from XML
    }
}
Using these two methods, you should be able to capture the necessary state information from just about any object you might want to store, and turn it into XML that can be persisted to disk - and deserialized back into an object when the time comes!
XmlSerializer can work for your first criterion; however, you must handle the recursion yourself for objects like the TreeView control.
BinaryFormatter can work for all 3 criteria. If a class changes, you may have to create a conversion tool to convert old format documents to a new format. Or recognize an older format, deserialize to the old, and then save to the new - keeping your old class format around for a little while.
This will help cover version tolerance which is what I think you're after: MSDN - Version Tolerant Serialization
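The key pieces of version-tolerant serialization are attributes like OptionalField; a minimal sketch, where the Document class and its fields are invented:

using System;
using System.Runtime.Serialization;

[Serializable]
public class Document
{
    public string Title;                 // present since version 1

    [OptionalField(VersionAdded = 2)]    // added later; old files leave this null
    public string Author;

    [OnDeserialized]
    private void OnDeserialized(StreamingContext context)
    {
        if (Author == null) Author = "unknown";  // supply a default for old files
    }
}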

Serialization byte array vs XML file

I am heavily using byte arrays to transfer objects and primitive data over the network and back. I adapted Java's approach by having a type implement ISerializable, which contains two methods as part of the interface, ReadObjectData and WriteObjectData. Any class using this interface writes its data into the byte array. Something like this:
class SerializationType : ISerializable
{
    void ReadObjectData(/* type that manages the reads/writes of the byte array */) { }
    void WriteObjectData(/* type that manages the reads/writes of the byte array */) { }
}
After writing is complete for all objects, I send the array over the network.
This is actually a two-fold question. Is this the right way to send data over the network for maximum efficiency (in terms of speed and size)?
Would you use this approach to write objects into a file, as opposed to the typical XML serialization?
Edit #1
Joel Coehoorn mentioned BinaryFormatter. I have never used this class. Would you elaborate, provide good example, references, recommendations, current practices -- in addition to what I currently see on msdn?
This should be fine, but you're doing work that is already done for you. Look at the System.Runtime.Serialization.Formatters.Binary.BinaryFormatter class.
Rather than needing to implement your own Read/WriteObjectData() methods for each specific type, you can just use this class, which can already handle almost any object. It basically takes an exact copy of the memory representation of almost any .NET object and writes it to or reads it from a stream:
BinaryFormatter bf = new BinaryFormatter();
bf.Serialize(outputStream, objectToSerialize);
objectToDeserialize = bf.Deserialize(inputStream) as DeserializedType;
Make sure you read through the linked documents: there can be issues with Unicode strings, and an exact memory representation isn't always appropriate (things like open Sockets, for example).
If you are after simple, lightweight and efficient binary serialization, consider protobuf-net; based on google's protocol buffers format, but implemented from scratch for typical .NET usage. In particular, it can be used either standalone (via protobuf-net's Serializer), or via BinaryFormatter by implementing ISerializable (and delegating to Serializer).
Apart from being efficient, this format is designed to be extensible and portable (i.e. compatible with java/php/C++ "protocol buffers" implementations), unlike BinaryFormatter that is both implementation-specific and version-intolerant. And it means you don't have to mess around writing any serialization code...
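A hedged sketch of standalone protobuf-net usage; the Ping class is invented, and Serializer lives in the ProtoBuf namespace:

using System.IO;
using ProtoBuf;

[ProtoContract]
public class Ping
{
    [ProtoMember(1)] public int Id;
    [ProtoMember(2)] public string Payload;
}

class Example
{
    static void Main()
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, new Ping { Id = 1, Payload = "hello" });
            stream.Position = 0;   // in real use this would be the receiving end
            Ping received = Serializer.Deserialize<Ping>(stream);
        }
    }
}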
Creating your own ISerializable interface when there's already one in the framework sounds like a bit of a recipe for disaster. At least give it a different name.
You'll have a bit of a problem when it comes to reading - you won't have an instance to call the method on. You might want to make it a sort of "factory" instead:
public interface ISerializationFactory<T>
{
    T ReadObjectData(Stream input);
    void WriteObjectData(T instance, Stream output);
}
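A hypothetical implementation for a simple record type (the Point class here is invented):

using System.IO;

public class Point
{
    public int X, Y;
}

public class PointFactory : ISerializationFactory<Point>
{
    public Point ReadObjectData(Stream input)
    {
        var reader = new BinaryReader(input);
        return new Point { X = reader.ReadInt32(), Y = reader.ReadInt32() };
    }

    public void WriteObjectData(Point instance, Stream output)
    {
        var writer = new BinaryWriter(output);
        writer.Write(instance.X);
        writer.Write(instance.Y);
        writer.Flush();
    }
}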
As for XML vs binary... it entirely depends on the situation: how much data will there be, do you need backwards and forwards compatibility, does the XML serialization in .NET give you enough control already etc.
Yes, this will be faster than sending XML, as you will be sending less data over the wire. Even if you compressed the XML (which would drastically reduce its size), you would still have the overhead of compression and decompression. So I would say that, between what you are doing now and XML serialization, you are already using the more efficient solution.
However, I am curious as to how much of a performance hit you would incur by using XML instead of a marshaled object. The reason I would encourage you to look into XML serialization is that you would be storing the data in an application-neutral format that is also human readable. If you are able to serialize the data to XML in a way that does not incur performance penalties in your application, I would recommend that you look into it.
Regarding writing to file, generally you want to serialize an object to XML if you want to be able to read the serialization or perhaps alter it. If you have no desire for the serialization to be human readable, you might as well reuse your binary serialization.
If you do want it to be human readable, then XML is something to consider, but it depends on the type of data you need to serialize. XML is inherently recursive and is therefore a good fit for likewise recursive data. It's less of a good fit for other types of data.
In other words, pick a serialization format that suits your needs. There's no one-size-fits-all solution here.
As for network, generally you'll want to keep size to a minimum, so XML is usually never a good choice due to its verbosity.
Serialization (in Java) is deceptively simple. As long as you do simple stuff (like never change the class) it is easy - but there are a number of "fun" things with it too.
For a good discussion of Java serialization, look at Effective Java (specifically chapter 10).
For C#, I'm not sure, but the core issues are likely the same.
There is an example here on C# serialization: http://www.codeproject.com/KB/cs/objserial.aspx.
The XStream library provides an exceptionally good way of dealing with serialisation, including support for XML, JSON and custom converters. Specifically, the use of custom converters allowed us to reduce XML verbosity and to serialise strictly what is needed.
XStream has no requirement to declare everything as Serializable, which is very important when one utilises a third-party lib and needs to serialise an instance of a class from that lib which is not declared as Serializable.
The answer is already accepted, but for the sake of completeness of this discussion here is a link to a good comparison between different serialisation approaches/libraries:
http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking
The Kryo library looks very compelling for Java serialisation. Similarly to XStream, it supports custom converters.
