XML deserialization - generate warning or exception on unexpected element [duplicate]

XML deserialization - generate warning or exception on unexpected element [duplicate] - c#

When I deserialize an XML document with XmlTextReader, a textual element for which there is no corresponding class is simply ignored.
Note: this is not about elements missing from the XML, which one requires to be present, but rather being present in the XML text, while not having an equivalent property in code.
I would have expected to get an exception because if the respective element is missing from the runtime data and I serialize it later, the resulting XML document will be different from the original one. So it's not safe to ignore it (in my real-world case I have just forgotten to define one of the 99+ classes the given document contains, and I didn't notice at first).
So is this normal and if yes, why? Can I somehow request that I want to get exceptions if elements cannot be serialized?
In the following example-XML I have purposely misspelled "MyComandElement" to illustrate the core problem:
<MyRootElement>
<MyComandElement/>
</MyRootElement>
MyRootElement.cs:
public class CommandElement {};
public class MyRootElement
{
public CommandElement MyCommandElement {get; set;}
}
Deserialization:
XmlSerializer xmlSerializer = new XmlSerializer(typeof(MyRootElement));
XmlTextReader xmlReader = new XmlTextReader(#"pgtest.xml");
MyRootElement mbs2 = (MyRootElement)xmlSerializer.Deserialize(xmlReader);
xmlReader.Close();

As I have found out by accident during further research, this problem is actually ridiculously easy to solve because...
...XmlSerializer supports events! All one has to do is to define an event handler for missing elements
void Serializer_UnknownElement(object sender, XmlElementEventArgs e)
{
throw new Exception("Unknown element "+e.Element.Name+" found in "
+e.ObjectBeingDeserialized.ToString()+" in line "
+e.LineNumber+" at position "+e.LinePosition);
}
and register the event with XmlSerializer:
xmlSerializer.UnknownElement += Serializer_UnknownElement;
The topic is treated at MSDN, where one also learns that
By default, after calling the Deserialize method, the XmlSerializer ignores XML attributes of unknown types.
Not surprisingly, there are also events for missing attributes, nodes and objects.

So is this normal and if yes, why?
Because maybe you're consuming someone else's XML document and whilst they define 300 different elements within their XML, you only care about two. Should you be forced to create classes for all of their elements and deserialize all of them just to be able to access the two you care about?
Or perhaps you're working with a system that is going to be in flux over time. You're writing code that consumes today's XML and if new elements/attributes are introduced later, they shouldn't stop your tested and deployed code from being able to continue to consume those parts of the XML that they do understand (Insert caveat here that, hopefully, if you're in such a situation, you/the XML author don't introduce elements later which it is critical to understand to cope with the document correctly).
These are two sides of the same coin of why it can be desirable for the system not to blow up if it encounters unexpected parts within the XML document it's being asked to deserialize.

Related

Question on usage of XML with XPath vs XML as Class

I have few XML files which I would be using in my C# code.
So far I have been using XPATH for accessing the XML node / attributes
Question is what advantage would I get if i convert the XML to Class file (XSD.EXE) and use it in terms of maintainability and code readability.
In both the cases I know if I add or remove some nodes, code needs to be changed
In my case the DLL goes into GAC.
I am just trying to get your views
Cheers,
Karthik

The beauty of converting your XML to XSD and then to a C# class is the ease in which you can grab yet another file. Your code would be something like:
XmlSerializer ser = new XmlSerializer(typeof(MyClass));
FileStream fstm = new FileStream(#"C:\mysample.xml", FileMode.Open, FileAccess.Read);
MyClass result = ser.Deserialize(fstm) as MyClass;
if(result != null)
{
// do whatever you want with your new class instance!
}
With these few lines, you now have an object that represent exactly what your XML contained, and you can access its properties as properties on the object instance - much easier than doing lots of complicated XPath queries into your XML, in my opinion.
Also, thanks to the fact you now have a XSD, you can also easily validate incoming XML files to make sure they actually do correspond to the contract defined - which causes less constant error-checking in your code (you don't have to check after each XPath to see whether there's any node(s) that actually match that expression etc.).

XmlSerializer.Deserialize - ignore unnecessary elements?

I've got an XSD schema which I've generated a class for using xsd.exe, and I'm trying to use XmlSerializer.Deserialize to create an instance of that class from an XML file that is supposed to conform to the XSD schema. Unfortunately the XML file has some extra elements that the schema is not expecting, which causes a System.InvalidOperationException to be thrown from Deserialize.
I've tried adding <xs:any> elements to my schema but this doesn't seem to make any difference.
My question is: is there any way to get XmlSerializer.Deserialize to ignore these extra elements?

I usually add extra properties or fields to all entity classes to pick up extra elements and attributes, looking something like the code below:
[XmlAnyAttribute]
public XmlAttribute[] AnyAttributes;
[XmlAnyElement]
public XmlElement[] AnyElements;
Depending on the complexity of your generated code, you may not find hand-inserting this code on every entity appealing. Perhaps only-slightly-less-tedious is defining these attributes in a base class and ensuring all entities inherit the base.
To give fair attribution, I was first introduced to this pattern when reading the source code for DasBlog.

I don't think there is an option to do this. You either have to fix the schema or manually modify the code generated by xsd.exe to allow the XML to be deserialized. You can also try to open the XML document + schema in Visual Studio or any other XML editor with schema support to either fix the schema or the XML document.

Find number of serialized objects

My issue is trying to determine a number of objects created, the objects being serialized from an XML document. The XML document should be set up for simplicity, so any developer can add an additional object and need no further modification to the code. However each of these objects need to be handled/updated seperately, and specifically, some of the objects are of different sub-classes, which need to be handled differently. So what would be my simplest course of action, allowing other to add objects via the XML, but still ensuring the proper logic happenes for each?

This is totally a bad idea, but if you want something constructive...
Model your XML document objects and include some kind of known syntax for you to specify Lambda expressions in it. So if you enter a
<BinaryExpresion>
<NodeType>Add</NodeType>
<Left>3</Left>
<Right>4</Right>
</BinaryExpression>
Then when you read and compile the expression, you could run that code against the data if the XML object and do something (in this case, executing 3 + 4)

Is there any point Unit testing serialization?

I have a class that serializes a set of objects (using XML serialization) that I want to unit test.
My problem is it feels like I will be testing the .NET implementation of XML serialization, instead of anything useful. I also have a slight chicken and egg scenario where in order to test the Reader, I will need a file produced by the Writer to do so.
I think the questions (there's 3 but they all relate) I'm ultimately looking for feedback on are:
Is it possible to test the Writer, without using the Reader?
What is the best strategy for testing the reader (XML file? Mocking with record/playback)? Is it the case that all you will really be doing is testing property values of the objects that have been deserialized?
What is the best strategy for testing the writer!
Background info on Xml serialization
I'm not using a schema, so all XML elements and attributes match the objects' properties. As there is no schema, tags/attributes which do not match those found in properties of each object, are simply ignored by the XmlSerializer (so the property's value is null or default). Here is an example
<MyObject Height="300">
<Name>Bob</Name>
<Age>20</Age>
<MyObject>
would map to
public class MyObject
{
public string Name { get;set; }
public int Age { get;set; }
[XmlAttribute]
public int Height { get;set; }
}
and visa versa. If the object changed to the below the XML would still deserialize succesfully, but FirstName would be blank.
public class MyObject
{
public string FirstName { get;set; }
public int Age { get;set; }
[XmlAttribute]
public int Height { get;set; }
}
An invalid XML file would deserialize correctly, therefore the unit test would pass unless you ran assertions on the values of the MyObject.

Do you need to be able to do backward compatibility? If so, it may be worth building up unit tests of files produced by old versions which should still be able to be deserialized by new versions.
Other than that, if you ever introduce anything "interesting" it may be worth a unit test to just check you can serialize and deserialize just to make sure you're not doing something funky with a readonly property etc.

I would argue that it is essential to unit test serialization if it is vitally important that you can read data between versions. And you must test with "known good" data (i.e. it isn't sufficient to simply write data in the current version and then read it again).
You mention that you don't have a schema... why not generate one? Either by hand (it isn't very hard), or with xsd.exe. Then you have something to use as a template, and you can verify this just using XmlReader. I'm doing a lot of work with xml serialization at the moment, and it is a lot easier to update the schema than it is to worry about whether I'm getting the data right.
Even XmlSerializer can get complex; particularly if you involve subclasses ([XmlInclude]), custom serialization (IXmlSerializable), or non-default XmlSerializer construction (passing additional metadata at runtime to the ctor). Another possibility is creative use of [XmlIngore], [XmlAnyAttribute] or [XmlAnyElement]; for example you might support unexpected data for round-trip (only) in version X, but store it in a known property in version Y.
With serialization in general:
The reason is simple: you can break the data! How badly you do this depends on the serializer; for example, with BinaryFormatter (and I know the question is XmlSerializer), simply changing from:
public string Name {get;set;}
to
private string name;
public string Name {
get {return name;}
set {name = value; OnPropertyChanged("Name"); }
}
could be enough to break serialization, as the field name has changed (and BinaryFormatter loves fields).
There are other occasions when you might accidentally rename the data (even in contract-based serializers such as XmlSerializer / DataContractSerializer). In such cases you can usually override the wire identifiers (for example [XmlAttribute("name")] etc), but it is important to check this!
Ultimately, it comes down to: is it important that you can read old data? It usually is; so don't just ship it... prove that you can.

For me, this is absolutely in the Don't Bother category. I don't unit test my tools. However, if you wrote your own serialization class, then by all means unit test it.

If you want to ensure that the serialization of your objects doesn't break, then by all means unit test. If you read the MSDN docs for the XMLSerializer class:
The XmlSerializer cannot serialize or deserialize the following:Arrays of ArrayListArrays of List<T>
There is also a peculiar issue with enums declared as unsigned longs. Additionally, any objects marked as [Obsolete] do no get serialized from .Net 3.5 onwards.
If you have a set of objects that are being serialized, testing the serialization may seem odd, but it only takes someone to edit the objects being serialized to include one of the unsupported conditions for the serialisation to break.
In effect, you are not unit testing XML serialization, you are testing that your objects can be serialized. The same applies for deserialization.

Yes, as long as what needs to be tested is properly tested, through a bit of intervention.
The fact that you're serializing and deserializing in the first place means that you're probably exchanging data with the "outside world" -- the world outside the .NET serialization domain. Therefore, your tests should have an aspect that's outside this domain. It is not OK to test the Writer using the Reader, and vice versa.
It's not only about whether you would just end up testing the .NET serialization/deserialization; you have to test your interface with the outside world -- that you can output XML in the expected format and that you can properly consume XML in the anticipated format.
You should have static XML data that can be used to compare against serialization output and to use as input data for deserialization.
Assume you give the job of note taking and reading the notes back to the same guy:
You - Bob, I want you to jot down the following: "small yellow duck."
Bob - OK, got it.
You - Now, read it back to me.
Bob - "small yellow duck"
Now, what have we tested here? Can Bob really write? Did Bob even write anything or did he memorize the words? Can Bob actually read? -- his own handwriting? What about another person's handwriting? We don't have answers to any of these questions.
Now let's introduce Alice to the picture:
You - Bob, I want you to jot down the following: "small yellow duck."
Bob - OK, got it.
You - Alice, can you please check what Bob wrote?
Alice - OK, he's got it.
You - Alice, can you please jot down a few words?
Alice - Done.
You - Bob, can you please read them?
Bob - "red fox"
Alice - Yup, that sounds right.
We now know, with certainty, that Bob can write and read properly -- as long as we can completely trust Alice. Static XML data (ideally tested against a schema) should sufficiently be trustworthy.

In my experience it is definitely worth doing, especially if the XML is going to be used as an XML document by the consumer. For example, the consumer may need to have every element present in the document, either to avoid null checking of nodes when traversing or to pass schema validation.
By default the XML serializer will omit properties with a null value unless you add the [XmlElement(IsNullable = true)] attribute. Similarly, you may have to redirect generic list properties to standard arrays with an XMLArray attribute.
As another contributor said, if the object is changing over time, you need to continuously check that the output is consistent. It will also protect you against the serializer itself changing and not being backwards compatible, although you'd hope that this doesn't happen.
So for anything other than trivial uses, or where the above considerations are irrelevant, it is worth the effort of unit testing it.

There are a lot of types that serialization can not cope with etc. Also if you have your attributes wrong, it is common to get an exception when trying to read the xml back.
I tend to create an example tree of the objects that can be serialized with at least one example of each class (and subclass). Then at a minimum serialize the object tree to a stringstream and then read it back from the stringstream.
You will be amazed the number of time this catches a problem and save me having to wait for the application to start up to find the problem. This level of unit testing is more about speeding up development rather then increasing quality, so I would not do it for working serialization.
As other people have said, if you need to be able to read back data saved by old versions of your software, you had better keep a set of example data files for each shipped version and have tests to confirm you can still read them. This is harder then it seems at first, as the meaning of fields on a object may change between versions, so just being able to create the current object from a old serialized file is not enough, you have to check that the meaning is the same as it was it the version of the software that saved the file. (Put a version attribute in your root object now!)

I agree with you that you will be testing the .NET implementation more than you'll be testing your own code. But if that's what you want to do (perhaps you don't trust the .NET implementation :) ), I might approach your three questions as follows.
Yes, it's certainly possible to test the writer without the reader. Use the writer to serialize the example (20-year old Bob) you provided to a MemoryStream. Open the MemoryStream with an XmlDocument. Assert the root node is named "MyObject". Assert it has one attribute named "Height" with value "300". Assert there is a "Name" element containing a text node with value "Bob". Assert there is an "Age" element containing a text node with value "20".
Just do the reverse process of #1. Create an XmlDocument from the 20-year old Bob XML string. Deserialize the stream with the reader. Assert the Name property equals "Bob". Assert the Age property equals 20. You can do things like add test case with insignificant whitespace or single quotes instead of double-quotes to be more thorough.
See #1. You can extend it by adding what you consider to be tricky "edge" cases you think could break it. Names with various Unicode characters. Extra long names. Empty names. Negative ages. Etc.

I have done this in some cases... not testing the serialisation as such, but using some 'known good' XML serializations and then loading them into my classes, and checking that all the properties (as applicable) have the expected values.
This is not going to test anything for the first version... but if the classes ever evolve I know I will catch any breaking changes in the format.

We do acceptance testing of our serialization rather than unit testing.
What this means is that our acceptance testers take the XML schema, or as in your case some sample XML, and re-create their own serializable data-transfer class.
We then use NUnit to test our WCF service with this clean-room XML.
With this technique we've identified many, many errors. For example, where we have changed the name of the .NET member and forgotten to add an [XmlElement] tag with a Name = property.

If there's nothing you can do to change the way your class serializes, then you're testing .NET's implementation of XML serialization ;-)

If the format of the serialized XML matters, then you need to test the serialization. If it's important that you can deserialize it, then you need to test deserialization.

Seeing how you can't really fix serialization, you shouldn't be testing it - instead, you should be testing your own code and the way it interacts with the serialization mechanism. For example, you might need to unit-test the structure of the data you're serializing to make sure that no-one accidentally changes a field or something.
Speaking of which, I have recently adopted a practice where I check such things at compile-time rather than during execution of unit tests. It's a bit tedious, but I have a component that can traverse the AST, and then I can read it in a T4 template and write lots of #error messages if I meet something that shouldn't be there.

Problem deserializing validated XML, can't convert to/from array

I'm a bit out of my element. I've used xsd.exe to create an xsd schema from an xml file, and then to create a C# class from that xsd. That all seems to work fine.
At the moment I'm just trying to get the XML deserialized. The file I'm deserializing if the very same file I used to build the class originally. Here's my code:
String xsdPath=#"C:\Users\tol56881\Documents\dnd4e.xsd";
String xmlPath=#"C:\Users\tol56881\Documents\dnd4e.xml";
String xsdNamespace="";
//Validation stuff
XmlParserContext context = new XmlParserContext(null, null, "", XmlSpace.None);
XmlValidatingReader vr = new XmlValidatingReader(xmlPath, XmlNodeType.Element, context);
vr.ValidationType = ValidationType.Schema;
vr.Schemas.Add(xsdNamespace, xsdPath);
while (vr.Read()) ;
//Actually reading the file
TextReader tr = new StreamReader(xmlPath);
D20Character character = (D20Character)(new XmlSerializer(typeof(D20Character))).Deserialize(tr);
It compile fine, but when I try to run it I get the an error that's repeated for four different objects. I've given an example below, changing the names of the objects.
Unable to generate a temporary class (result=1).
error CS0030: Cannot convert type 'Namespace.ObjectName[]' to 'Namespace.ObjectName'
error CS0029: Cannot implicitly convert type 'Namespace.ObjectName' to 'Namespace.ObjectName[]'
So it seems like the program is trying to go from array to object and back to array, but I'm not really sure. The auto-generated class code is a huge mess that's difficult to wade through. I'm hoping that maybe there's something simple I'm missing here.
Thanks!

I managed to fix this. Each of the four objects in question were generated as doubly-indexed arrays, such as:
private loot[][] lootTallyField;
and
public loot[][] LootTally
{
get
{
return this.lootTallyField;
}
set
{
this.lootTallyField = value;
}
}
All I did was remove one set of brackets, and it all seems to be working fine. No problems with deserialization and a quick inspection of the deserialized object makes it look like the data was loaded correctly.
private loot[] lootTallyField;
and
public loot[] LootTally
{
get
{
return this.lootTallyField;
}
set
{
this.lootTallyField = value;
}
}
Still not sure why xsd.exe made these doubly-indexed if they're not supposed to be. I feel like I'm still missing something, hence why this question is still open.
Particularly, if I ever need to re-generate this code, then I'd need to reapply the fix, which kind of defeats the purpose of using a partial class in the first place...

There is a problem on xsd.exe tool, I will try to explain.
If you have a complexType with a sequence inside that has a child complexType with a sequence and the first one does not have any other elements / attributes, then the generated class will have only 1 generated type, instead of 2 and it will be a double array.
If you make the double array into a single array, you will be able to deserialize your xml just fine.
HOWEVER this will produce the following unexpected result.
If your xml looks like the below.
<root>
<loot>
<tally>value1</tally>
<tally>value2</tally>
</loot>
<loot>
<tally>value3</tally>
<tally>value4</tally>
</loot>
</root>
Then your deserialized object, in the lootTally array would only contain the value3 and value4 items instead of having all 4.
So you have 2 options to fix this correctly:
Alter the xsd file by adding a dummy in the first sequence, and run xsd.exe again, so that when it generates the class it will not create a double array, and then you can delete the dummy attribute from the class.
Alter the generated class, add a new class named loot which will contain an array of tally objects which you already have (and only need to alter the name).
Please note that in option 2 you may have to change some declarations if you have an XmlArrayItemAttribute to XmlElementAttribute.
Hope this helps

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

XML deserialization - generate warning or exception on unexpected element [duplicate] - c#

Related

Question on usage of XML with XPath vs XML as Class

XmlSerializer.Deserialize - ignore unnecessary elements?

Find number of serialized objects

Is there any point Unit testing serialization?

Problem deserializing validated XML, can't convert to/from array

Categories

Resources