Good Way To Handle XML Change

Our system stores XML strings in a database. I recently had to change the properties on a class, and now deserializing a stored XML string throws an exception. What is the best way to handle this change: look for the node in application code using XPath or LINQ, or change the XML strings in the SQL database (i.e. do a mass update)?

You might want to look at writing a custom XML deserializer (i.e. implementing IXmlSerializable, see here) to handle changes in your XML. If you've invested a lot of time into crafting your XML serialization attributes, you may want to look at another approach.
Consider batch-upgrading your XML, or deprecating (instead of removing) properties inside of your classes and mapping older behavior to newer behavior.
Longer term, you will want to come up with a strategy for dealing with this in the future, since you will most likely continue to make changes to your schema/object definitions as you add to or change the functionality of your system.
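As a rough illustration of the "deprecate instead of remove" idea, here is a minimal sketch using XmlSerializer. The Customer class and the rename from Name to FullName are made up for the example; they are not from the question. The old element is still accepted on load and mapped onto the new property, and the ShouldSerialize pattern keeps it out of newly written XML.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class: "Name" was renamed to "FullName" in a later release.
public class Customer
{
    public string FullName { get; set; }

    // Deprecated: still read from old XML (<Name>), mapped onto FullName.
    [XmlElement("Name")]
    public string LegacyName
    {
        get { return FullName; }
        set { FullName = value; }
    }

    // XmlSerializer honours ShouldSerialize<PropertyName>():
    // returning false means <Name> is never written again.
    public bool ShouldSerializeLegacyName() { return false; }
}

public static class CustomerXml
{
    public static Customer Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Customer));
        using (var reader = new StringReader(xml))
            return (Customer)serializer.Deserialize(reader);
    }

    public static string Save(Customer customer)
    {
        var serializer = new XmlSerializer(typeof(Customer));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, customer);
            return writer.ToString();
        }
    }
}
```

Loading an old string and saving it again upgrades that row in passing, which may be enough if a mass update of the database is not practical.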

If you serialize the objects to the database, you could try the approach I outlined here: load the old versions into a new version, and when you save, the new version is what gets stored. I'm not sure whether having different versions of your class will be appropriate, though...
Basically you create a factory to produce your objects from the XML. Every time you change your object you create a new factory and a new object class, which is given an instance of the old class in its constructor and builds itself from it. The new factory tries to create a new object from the XML. If it can, happy days. If it can't, it asks the next-oldest factory to create a next-oldest object from the XML and then constructs the new object from that. These factories can be chained together so that you can always load the newest version of the objects from whatever data is in the db.
This assumes that it's always possible to create a valid v2 object from a v1 object.
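A minimal sketch of the factory chain described above, using LINQ to XML. The Person classes, their fields, and the split-on-first-space upgrade rule are all hypothetical, chosen only to show the fallback from the newest factory to the next-oldest one.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical v1 shape: a single Name element.
public class PersonV1
{
    public string Name;
}

// Hypothetical v2 shape: the name is split in two.
public class PersonV2
{
    public string FirstName;
    public string LastName;

    public PersonV2() { }

    // Upgrade constructor: builds a v2 object from a v1 object.
    public PersonV2(PersonV1 old)
    {
        var parts = (old.Name ?? "").Split(new[] { ' ' }, 2);
        FirstName = parts[0];
        LastName = parts.Length > 1 ? parts[1] : "";
    }
}

public static class PersonV1Factory
{
    public static PersonV1 Create(XElement xml)
    {
        var name = xml.Element("Name");
        if (name == null)
            throw new FormatException("Not a recognised Person format.");
        return new PersonV1 { Name = name.Value };
    }
}

public static class PersonV2Factory
{
    public static PersonV2 Create(XElement xml)
    {
        var first = xml.Element("FirstName");
        var last = xml.Element("LastName");
        if (first != null && last != null)
            return new PersonV2 { FirstName = first.Value, LastName = last.Value };

        // Not v2 XML: let the next-oldest factory load it, then upgrade.
        return new PersonV2(PersonV1Factory.Create(xml));
    }
}
```

A later v3 would add its own class and factory and fall back to PersonV2Factory in the same way, so callers only ever talk to the newest factory.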

It's good practice to store a version alongside your XML strings, either at the database level or at the class level, so that your code knows which version of the class it has to deserialize.
You might also look at XSLT. It allows you to transform one version of XML into another.
In that case the logic to go from one version to another is not handled by code but by the XSLT. You can even store the XSLT in the database, which makes it reusable by other programs.
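To make the XSLT suggestion concrete, here is a small sketch using XslCompiledTransform. The stylesheet is a made-up example (an identity transform plus one rule renaming a Name element to FullName); in practice the stylesheet text could be read from the database, as suggested above, and picked according to the stored version.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XmlUpgrader
{
    // Hypothetical v1 -> v2 stylesheet: copy everything, rename <Name> to <FullName>.
    public const string V1ToV2 = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output omit-xml-declaration='yes'/>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='Name'>
    <FullName><xsl:apply-templates select='@*|node()'/></FullName>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the given stylesheet text to an XML string and returns the result.
    public static string Upgrade(string xml, string stylesheet)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(stylesheet)))
            transform.Load(xsltReader);

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
            transform.Transform(input, writer);
        return output.ToString();
    }
}
```

Calling XmlUpgrader.Upgrade(oldXml, XmlUpgrader.V1ToV2) before deserializing keeps the class itself free of legacy members.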

Related

Dynamic XSLT generation based on changes in XSD

Initially I had several XSD definitions, and for each XSD I had a set of stored XML files.
Over time the XSD definitions changed, so my stored XML no longer validates against the new XSD.
To support this I need to write XSLT that changes my stored XML so that it validates against the new XSD.
In this scenario I have to write the XSLT manually every time an XSD changes. How can I generate this XSLT dynamically?
Currently I am able to compare the old and new XSD and get the list of changes using the Microsoft.XmlDiffPatch DLL.
Based on these changes I need to generate XSLT using C#.

I don't know what your question is, but I think this is technically possible.
It might be easier to just write some C# code that reads the XML, augments it, and writes it back to the file/database/data store.

Not sure there is a magic bullet for this. Sounds like you're in for some work, and I'd advise that whatever you do be as reusable as possible in the face of future changes.
You might want to consider using xproc (via Calabash or some other engine) to create an XML pipeline whereby you detect and pass in changes of an XSD into an XSL (perhaps keeping to the convention of one XSL per XSD, to retain your sanity), and then said XSLs take those changes and handle them for all XML files bound by the XSD whose changes are being handled at the moment. Breaking all these into sub-transformations within the pipeline could be possible, and might make things more reusable in the future.
Inside the XSLs you're likely looking at doing something like:
for all changes to be made
    for each XML
        match/add/delete per element and/or attribute to implement change
One way to represent the changes to be made in some sort of standard format is as an incoming list of operations to perform and associated elements/attrs to act upon (maybe set it up as key/value pairs). Each operation could be a string (add, delete, convert) or a numeric code. You then traverse the list of ops and associated elements and trigger matches to accommodate.
This is all somewhat abstract because I have no idea of the scope or depth of changes you need to make. I'm really just thinking out loud here. You might just have to knuckle down and do some serious one-time work, then implement some sort of change control process to make sure things don't get out of hand in the future.
Hope this helps. Good luck!

XML (de)serialization and schema upgrades

I have a complex graph of XML-serializable classes that I'm able to (de)serialize to hard-disk just fine. But how do I handle massive changes to the graph schema structure? Is there some mechanism to handle XML schema upgrades? Some classes that would allow me to migrate old data to the new format?
Sure I could just use XmlReader/XmlWriter, go through every node and attribute and write several thousand lines of code to convert data to the new format, but maybe there is a better way?
I have found Object graph serialization in .NET and code version upgrades, but I don't think the linked articles apply when there are major changes in the model.

Instead of writing several thousand lines of code to convert files using XmlReader / XmlWriter, you could use XSLT. We are still talking hundreds of lines of code, and perhaps slower execution speeds, but if you are good at XSLT you could get it done much faster.
The other approach would be to build a C# program that links both the old class and the new class (of course you'd need to rename the old class to avoid naming collision). The program would load OldMyClass from disk, construct NewMyClass from the values of its attributes, and serialize NewMyClass to disk. Essentially, this approach moves the task of conversion into the C# territory, which may be a lot more familiar to you.

In this case, I keep my changes in my object and recreate my XML through the XmlSerializer: http://support.microsoft.com/kb/815813
That way I load the old XML and save it back in the new schema defined by my object.

Should I use a Namespace of an XML file to identify its version

I'm using DataContractSerializer to serialize a class with DataContract and DataMember attributes to an XML file. My class could potentially change later, and thus the format of the serialized files could also change. I'd like to tag the files I'm saving with a version number so I at least know what version each file is from. I'm still deciding how and if I want to add functionality that will migrate files in older formats to later formats. But right now I'd be happy with just identifying a version mismatch.
Is the namespace of the XML file the correct place to store the version of the file? I was thinking of attributing my class with a DataContract attribute as follows.
[DataContract(Name="MyClass",Namespace="http://www.mycompany.com/MyProject/1.0")]
public class MyClass
...
Then later if MyClass changes I would change the namespace...
[DataContract(Name="MyClass",Namespace="http://www.mycompany.com/MyProject/2.0")]
public class MyClass
...
Is this the correct usage of XML namespaces, or is there another, more preferred way to save the version of an XML file?

You can do it this way, but then the XML representation of your data becomes completely different from version to version from XML Infoset point of view (in which namespace is the part of the qualified name of the element), so you have neither backwards nor forwards compatibility.
Now, one advantage XML has is that it can be easily processed in a forward-compatible way with technologies such as XPath and XSLT - you just pick the elements you can interpret, and leave anything you don't recognize as is. But this requires elements with the same meaning to retain the same name (including namespace) between versions.
In general, it is best to make your schemas forward-compatible. If you can't achieve that, you might still want to provide as much compatibility as possible with existing tools (it is often easier to achieve compatibility against tools which only read data, rather than with those which also write it). Consequently, you avoid storing version number in such cases, and just try to parse whatever you're given, signalling an error if the input is definitely malformed.
If you come to the point where you absolutely must break compatibility in both directions and start from a clean slate, the suggested way of handling this for WCF data contracts is indeed by changing the namespace, as described in best practices on data contract versioning. There are a few minor variations there as well, such as using publication date instead of version number in the URL (W3C is quite fond of this for their schemas), but these are mostly stylistic.
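If the namespace does end up carrying the version, a mismatch can be detected before deserializing by looking at the root element's namespace. The sketch below assumes the URL scheme from the question (a fixed prefix followed by the version); the ContractVersion class and its Detect method are hypothetical helpers, not part of WCF.

```csharp
using System;
using System.Xml.Linq;

public static class ContractVersion
{
    // Assumed convention from the question: <prefix><version>, e.g. ".../MyProject/1.0".
    const string Prefix = "http://www.mycompany.com/MyProject/";

    // Returns the version part of the root element's namespace.
    public static string Detect(string xml)
    {
        string ns = XDocument.Parse(xml).Root.Name.NamespaceName;
        if (!ns.StartsWith(Prefix, StringComparison.Ordinal))
            throw new FormatException("Unexpected namespace: " + ns);
        return ns.Substring(Prefix.Length);
    }
}
```

The caller can then compare the returned string with the version it was built for and either report a mismatch or pick a migration path.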

Creating/compiling .NET data class within an application

Is there a pattern, Xml structure, architecture technique we can use to create simple data holder class code that we can deserialise to/from and do that at runtime?
We're in the early design stage of a .NET project and one of our goals is to make the resulting system extensible through configuration without needing a developer. We need to support new sources of data, typically delivered as XML messages. Currently we convert/deserialise the messages into simple classes and then use an already existing language which can manipulate those classes as we need.
That works well when you have a developer to map the XML to a simple class, create the class and then write the deserialisation, but it's not extensible for an administrator. Our target user audience is high-end DBAs and/or network admins: people who can handle XML but may not know C#.

You don't have to write any classes or deserialization routines. If you have a schema, you can use the XSD.exe tool from Visual Studio to automatically make Classes, and use built in .NET XML Serialization/Deserialization.
Now how to have that happen without a recompile each time...
It's not ideal, but this should work:
Assume your DBA can write a schema for the XML.
You could write a tool that takes the schema, runs it through XSD.exe, adds some wrapper code on top of it, and creates a DLL which can be used from within your application.
This could be a manual process (i.e. the admins email you the schema) or you can distribute the tool as part of your application.
Also, you can infer a schema from an existing XML document.
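For the last point, .NET ships a schema inference class, XmlSchemaInference, in System.Xml.Schema. A minimal sketch, where SchemaTools is a hypothetical helper name and the result is only a starting point that an admin would likely want to review:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaTools
{
    // Infers an XSD from one sample document and returns it as text.
    public static string InferSchema(string sampleXml)
    {
        var inference = new XmlSchemaInference();
        XmlSchemaSet schemas;
        using (var reader = XmlReader.Create(new StringReader(sampleXml)))
            schemas = inference.InferSchema(reader);

        var output = new StringWriter();
        foreach (XmlSchema schema in schemas.Schemas())
            schema.Write(output);
        return output.ToString();
    }
}
```

The inferred schema reflects only what the sample happens to contain, so optional elements that are missing from the sample will not appear in it.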

Perhaps DataTable? Or just use an XML DOM (XmlDocument or XDocument) as data storage? Neither is ideal, of course, but there is little point creating a type at runtime just for this if your real code will never see it. What purpose would the extra class type serve? Among other issues you'd have to use lots of reflection just to talk to it.
The other option is a custom property-bag and IXmlSerializable, but that is effort.

Serialization for document storage

I write a desktop application that can open / edit / save documents.
Those documents are described by several objects of different types that store references to each other. Of course there is a Document class that serves as the root of this data structure.
The question is how to save this document model into a file.
What I need:
Support for recursive structures.
It must be able to open files even if they were produced from slightly different classes. My users don't want to recreate every document after every release just because I added a field somewhere.
It must deal with classes that are not known at compile time (for plug-in support).
What I tried so far:
XmlSerializer -> Fails the first and last criteria.
BinarySerializer -> Fails the second criterion.
DataContractSerializer: Similar to XmlSerializer but with support for cyclic (recursive) references. Also it was designed with (forward/backward) compatibility in mind: Data Contract Versioning.
NetDataContractSerializer: While the DataContractSerializer still requires all types to be known in advance (i.e. it can't work very well with inheritance), NetDataContractSerializer stores type information in the output. Other than that the two seem to be equivalent.
protobuf-net: Didn't have time to experiment with it yet, but it seems similar in function to DataContractSerializer, but using a binary format.
Handling of unknown types
There seem to be two philosophies about what to do when the static and dynamic type differ (if you have a field of type object but, let's say, a Person object in it). Basically the dynamic type must somehow get stored in the file.
Use different XML tags for different dynamic types. But since the XML tag to be used for a particular class might not be equal to the class name, it's only possible to go this route if the deserializer knows all possible types in advance (so that it can scan them for attributes).
Store the CLR type (class name, assembly name & version) during serialization. Use this info during deserialization to instantiate the right class. The types need not be known prior to deserialization.
The second one is simpler to use, but the resulting file will be CLR dependent (and less tolerant of code modifications). That's probably why XmlSerializer and DataContractSerializer choose the first way. NetDataContractSerializer is not recommended because it uses the second approach (so does BinarySerializer, by the way).
Any ideas?

The one you haven't tried is DataContractSerializer. There is a constructor that takes a parameter bool preserveObjectReferences that should handle the first criterion.
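A small sketch of that option on a cyclic graph. It uses the DataContractSerializerSettings overload (available from .NET 4.5) rather than the long constructor, which is an equivalent way of setting the same flag; the Node class and GraphXml helper are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical document node that can point back at an earlier node.
[DataContract]
public class Node
{
    [DataMember] public string Name;
    [DataMember] public Node Next;
}

public static class GraphXml
{
    static DataContractSerializer NewSerializer()
    {
        // PreserveObjectReferences writes ids/refs instead of repeating objects,
        // so cycles and shared references survive a round trip.
        return new DataContractSerializer(typeof(Node),
            new DataContractSerializerSettings { PreserveObjectReferences = true });
    }

    public static byte[] Save(Node root)
    {
        using (var stream = new MemoryStream())
        {
            NewSerializer().WriteObject(stream, root);
            return stream.ToArray();
        }
    }

    public static Node Load(byte[] data)
    {
        using (var stream = new MemoryStream(data))
            return (Node)NewSerializer().ReadObject(stream);
    }
}
```

Without the flag, serializing a graph where a.Next is b and b.Next is a fails because the serializer detects the cycle.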

The WCF data contract serializer is probably closest to your needs, although not perfect.
There is only limited support for backwards compatibility (i.e. whether old versions of the program can read documents generated with a newer version). New fields are supported (via IExtensibleDataObject), but new classes or new enum values are not.

I would think the XmlSerializer is your best bet. You won't be able to support everything on your requirements list without a bit of work in your Document classes - but the XmlSerializer architecture gives you extensibility points which should allow you to tap into its mechanism deep enough to do just about anything.
Using the IXmlSerializable interface - by implementing that on your classes you want to store - you should be able to do just about anything, really.
The interface exposes basically two methods, ReadXml and WriteXml:
public void WriteXml (XmlWriter writer)
{
    // do what you need to do to write out your XML for this object
}
public void ReadXml (XmlReader reader)
{
    // do what you need to do to read your object from XML
}
Using these two methods, you should be able to capture the necessary state information from just about any object you might want to store, and turn it into XML that can be persisted to disk - and deserialized back into an object when the time comes!
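For illustration, here is one way those two methods might be filled in so that older files still load. The Point class and its attribute names are invented for the example; the idea is simply that ReadXml treats a missing attribute as a default instead of failing. The interface also has a third member, GetSchema, which is conventionally implemented to return null.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type: "y" is assumed to have been added in a later release.
public class Point : IXmlSerializable
{
    public int X;
    public int Y;

    public XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("x", XmlConvert.ToString(X));
        writer.WriteAttributeString("y", XmlConvert.ToString(Y));
    }

    public void ReadXml(XmlReader reader)
    {
        // Missing attributes (older files) fall back to 0.
        X = XmlConvert.ToInt32(reader.GetAttribute("x") ?? "0");
        Y = XmlConvert.ToInt32(reader.GetAttribute("y") ?? "0");
        // ReadXml must consume the whole element, including its end tag.
        reader.Skip();
    }
}

public static class PointXml
{
    public static string Save(Point point)
    {
        using (var writer = new StringWriter())
        {
            new XmlSerializer(typeof(Point)).Serialize(writer, point);
            return writer.ToString();
        }
    }

    public static Point Load(string xml)
    {
        using (var reader = new StringReader(xml))
            return (Point)new XmlSerializer(typeof(Point)).Deserialize(reader);
    }
}
```

The cost of this route is that every class hand-writes its own format, so the version tolerance is only as good as each ReadXml implementation.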

XmlSerializer can work for your first criterion; however, you must provide the recursion yourself for objects like the TreeView control.
BinaryFormatter can work for all 3 criteria. If a class changes, you may have to create a conversion tool to convert old format documents to a new format. Or recognize an older format, deserialize to the old, and then save to the new - keeping your old class format around for a little while.
This will help cover version tolerance which is what I think you're after: MSDN - Version Tolerant Serialization
