I have a set of objects that contain fields & properties that need to be inspectable in the output of serialization but not read back in when deserialized.
This is purely for debugging/confirmation purposes. We are creating hundreds of files and I want to spot check that serialization is occurring correctly by adding supplementary information. I do not want this supplementary information to be read in during deserialization - it's impossible to do so in fact.
I also need to do this with equal facility across different serialization formats, so we can assess which one is working best. I have a generic serialization approach where the desired format is passed in as an argument, so don't want anything too messy or intricate for each different format.
I've hunted around and found various things on related topics - mostly to do with the opposite: not writing certain fields during serialization. What's out there seems to be quite complicated and at times hacky.
Is it possible to serialize an object differently to deserializing it using Json.Net?
JsonConvert .NET Serialize/Deserialize Read Only
Serialize Property, but Do Not Deserialize Property in Json.Net
Also it appears any approach is inconsistent between serialization formats. i.e. unlike the [*Ignore] attributes, there are no [*SerializeOnly] attributes (where * = JSON, XML, YAML).
Is there an easy way to do this across these serialization formats? Is there a single family of attributes that can help? Or is it idiosyncratic and hacky in each case?
I have tested and applied this only to XML serialization, but it works for me:
When I want a property to be serialized, but not read back, I just declare an empty setter.
public String VersionOfApplicationThatHasWrittenThisFile
{
    get
    {
        return "1.0";
    }
    set
    {
        // Leave empty
    }
}
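A minimal round-trip sketch of the empty-setter approach, for XmlSerializer only (other formats are untested here, as noted above). The SavedFile class, its Payload property, and the EmptySetterDemo helper are hypothetical names invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type used only for this illustration.
public class SavedFile
{
    public string Payload { get; set; }

    // Written during serialization; the empty setter discards whatever is read back.
    public string VersionOfApplicationThatHasWrittenThisFile
    {
        get { return "1.0"; }
        set { /* intentionally empty */ }
    }
}

public static class EmptySetterDemo
{
    public static string Serialize(SavedFile file)
    {
        var serializer = new XmlSerializer(typeof(SavedFile));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, file);
            return writer.ToString();
        }
    }

    public static SavedFile Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(SavedFile));
        using (var reader = new StringReader(xml))
        {
            return (SavedFile)serializer.Deserialize(reader);
        }
    }
}
```

The setter has to exist and be public, because XmlSerializer skips properties it cannot write back; leaving its body empty is what makes the value write-only in practice.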
I have to communicate with an IBM mainframe using IBM WebSphere.
The service on the mainframe side can only use flat files.
On my side I want to use CQRS (Command / Query).
In other words, I want to serialize commands/queries and deserialize query results.
I could do it with standard reflection of course, but is there a nicer way of doing it?
Can I make use of dynamics?
Flatfile > ParsedObjectStructured > Dynamic type > static type
This would depend an awful lot on what the format of the flat file is, and how the schema works - is it self-describing, for example? However, it sounds to me like most of the work here would be in understanding the flat-file format (and the schema-binding). From there, the choice of "deserialize into the static type" vs "deserialize into a dynamic type" is kinda moot, and I would say that there is very little point deserializing into a dynamic type just to have to map it all to the static type. Additionally, the static type can (again, depending on the file-format specifics) be a handy place to decorate the types to say "here's how to interpret this", if the file-format needs specification. For example (and I'm totally making this up as I go along - don't expect this to relate to your format):
[Frobber(Offset = 4, Format = DataFormat.LittleEndianInt32)]
public int Id {get;set;}
[Frobber(Offset = 0, Format = DataFormat.LittleEndianInt32)]
public int Index {get;set;}
[Frobber(Offset = 8, Format = DataFormat.FixedAscii, Size = 20)]
public string Name {get;set;}
[Frobber(Offset = 28, Format = DataFormat.Blob)] // implicit Size=16 as Guid
public Guid UniqueKey {get;set;}
where FrobberAttribute is just something you might invent to specify the file format. Of course, if the schema is defined internally to the file, this may not be necessary.
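To make the idea concrete, here is a minimal sketch of what such an invented attribute and a naive reflection-based reader might look like. Everything in it (FrobberAttribute, DataFormat, FlatRecord, FrobberReader) is hypothetical and follows the made-up layout above; it does not describe any real flat-file format, and the Blob case is deliberately left unimplemented.

```csharp
using System;
using System.Reflection;
using System.Text;

public enum DataFormat { LittleEndianInt32, FixedAscii, Blob }

// Hypothetical attribute describing where and how a property is stored.
[AttributeUsage(AttributeTargets.Property)]
public sealed class FrobberAttribute : Attribute
{
    public int Offset { get; set; }
    public DataFormat Format { get; set; }
    public int Size { get; set; }
}

public class FlatRecord
{
    [Frobber(Offset = 0, Format = DataFormat.LittleEndianInt32)]
    public int Index { get; set; }
    [Frobber(Offset = 4, Format = DataFormat.LittleEndianInt32)]
    public int Id { get; set; }
    [Frobber(Offset = 8, Format = DataFormat.FixedAscii, Size = 20)]
    public string Name { get; set; }
}

public static class FrobberReader
{
    // Walks the decorated properties and fills each one from the buffer.
    public static T Read<T>(byte[] buffer) where T : new()
    {
        var result = new T();
        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            var frobber = property.GetCustomAttribute<FrobberAttribute>();
            if (frobber == null) continue; // undecorated properties are skipped
            property.SetValue(result, Decode(buffer, frobber));
        }
        return result;
    }

    private static object Decode(byte[] buffer, FrobberAttribute f)
    {
        switch (f.Format)
        {
            case DataFormat.LittleEndianInt32:
                // Least significant byte first.
                return buffer[f.Offset]
                     | (buffer[f.Offset + 1] << 8)
                     | (buffer[f.Offset + 2] << 16)
                     | (buffer[f.Offset + 3] << 24);
            case DataFormat.FixedAscii:
                // Fixed-width field; trailing padding is trimmed.
                return Encoding.ASCII.GetString(buffer, f.Offset, f.Size).TrimEnd(' ', '\0');
            default:
                throw new NotSupportedException(f.Format.ToString());
        }
    }
}
```

With this in place, FrobberReader.Read<FlatRecord>(bytes) produces the static type directly, with no intermediate dynamic object.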
Re reflection: basic reflection will work fine if the data is fairly light usage; but overall, reflection can be quite expensive. If you need it to be optimal, you would probably want the implementation to consider strategy-caching (i.e. only doing the discovery work once) and meta-programming (turning the strategy into ready-baked IL, rather than incurring the overhead of reflection at runtime).
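As a rough illustration of strategy-caching plus light meta-programming, the sketch below compiles a property setter once with expression trees and caches the resulting delegate, so later calls avoid PropertyInfo.SetValue. SetterCache and Sample are hypothetical names; a real implementation would more likely cache a whole per-type strategy than individual setters.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Reflection;

public static class SetterCache
{
    private static readonly ConcurrentDictionary<PropertyInfo, Action<object, object>> Cache =
        new ConcurrentDictionary<PropertyInfo, Action<object, object>>();

    // Returns a compiled setter, building it only on the first request.
    public static Action<object, object> For(PropertyInfo property)
    {
        return Cache.GetOrAdd(property, Build);
    }

    private static Action<object, object> Build(PropertyInfo property)
    {
        ParameterExpression target = Expression.Parameter(typeof(object), "target");
        ParameterExpression value = Expression.Parameter(typeof(object), "value");

        // Equivalent to: ((DeclaringType)target).Property = (PropertyType)value;
        Expression assign = Expression.Assign(
            Expression.Property(Expression.Convert(target, property.DeclaringType), property),
            Expression.Convert(value, property.PropertyType));

        return Expression.Lambda<Action<object, object>>(assign, target, value).Compile();
    }
}

// Hypothetical type used only to exercise the cache.
public class Sample
{
    public int Id { get; set; }
}
```

Usage would be along the lines of SetterCache.For(typeof(Sample).GetProperty("Id"))(instance, 42); the discovery and compilation cost is paid once per property.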
If the file format is a common / popular one, you might find that there are existing tools for reading that format. If not, you can either roll your own, or find some crazy person who enjoys writing serialization and meta-programming tools. Such people do exist...
I'm wondering if there's a way in which I can create a tree/view of a serialised object graph, and whether anyone has any pointers? EDIT The aim being that, should we encounter a de-serialization problem for some reason, we can actually view/produce a report on the serialized data to help us identify the cause of the problem before having to debug the code. Additionally, I want to extend this in the future to take two streams (version 1, version 2) and highlight differences between the two of them, to help ensure that we don't accidentally remove interesting information during code changes. /EDIT
Traditionally we've used Soap or XML serialization, but these are becoming too restricted for our needs, and Binary serialization would generally do all that we need. The reason that this hasn't been adopted, is because it's much harder to view the serialized contents to help fix upgrade issues etc.
So I've started looking into trying to create a view on the serialized information. I can do this from an ISerializable constructor to a certain extent :
public A(SerializationInfo info, StreamingContext context)
{}
Given the serialization info I can reflect the m_data member and see the actual serialized contents. The problem with this approach is
It will only display a branch from the tree, I want to display the entire tree from the root and it's not really possible to do from this position.
It's not a convenient place to interrogate the information, I'd like to pass a stream to a class and do the work there.
I've seen the ObjectManager class but this works on an existing object graph, whereas I need to be able to work from the stream of data. I've looked through the BinaryFormatter, which uses an ObjectReader and a __BinaryParser, hooking into the ObjectManager (which I think will then have the entire contents, just maybe in a flat list), but to replicate this or invoke it all via reflection (2 of those 3 classes are internal) seems like quite a lot of work, so I'm wondering if there's a better approach.
You could put a List<Child class> in every parent class (even if they're the same type)
and, when you create a child, immediately place it in that list, or better yet construct it while adding it to the list.
For instance:
ListName.Add(new Child(Constructor args));
Using this you would serialize them as one file which contains both the hierarchy of the objects and the objects themselves.
If the parent and child classes are the same, there is no reason why you cannot have a dynamic, multi-level hierarchy.
To achieve what you describe, you would have to deserialize the whole object graph from the stream without knowing the type from which it was serialized. But this is not possible, because the serializer doesn't store such information.
AFAIK it works in the following way. Suppose you have a couple of types:
class A { public bool p1; }
class B { public string p1; public string p2; public A p3; }
// instantiate them:
var b = new B { p1 = "ppp1", p2 = "ppp2", p3 = new A { p1 = true} };
When the serializer writes this object, it walks the object graph in some particular order (I assume alphabetical order) and writes each object's type and then its contents. So your binary stream will look like this:
[B:[string:ppp1][string:ppp2][A:[bool:true]]]
You see, there are only values and their types here. The order is implicit: it is simply the order in which they were written.
So, if you change your class B to, say,
class B { A p1; string p2; string p3; }
the serializer will fail, because it will try to assign an instance of string (which was serialized first) to a reference to A. You may try to reverse engineer how binary serialization works; then you may be able to create a dynamic tree of serialized objects. But this will require considerable effort.
For this purpose I would create a class similar to this:
class Node
{
    public string NodeType;
    public List<Node> Children;
    public object NodeValue;
}
Then, while reading from the stream, you can create those nodes, recreate the whole serialized tree, and analyze it.
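A small sketch of how such a tree could be rendered once it has been built. It does not parse a BinaryFormatter stream; the nodes are assumed to have been created by whatever reader you write, and NodePrinter is a hypothetical helper. Children is initialized here so that leaf nodes can be enumerated safely.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public class Node
{
    public string NodeType;
    public List<Node> Children = new List<Node>();
    public object NodeValue;
}

public static class NodePrinter
{
    // Renders the tree as indented text, one node per line.
    public static string Dump(Node root)
    {
        var output = new StringBuilder();
        Append(root, 0, output);
        return output.ToString();
    }

    private static void Append(Node node, int depth, StringBuilder output)
    {
        output.Append(' ', depth * 2).Append(node.NodeType);
        if (node.NodeValue != null)
        {
            output.Append(" = ").Append(node.NodeValue);
        }
        output.Append('\n');
        foreach (Node child in node.Children)
        {
            Append(child, depth + 1, output);
        }
    }
}
```

Dumping two versions of a stream to text like this also gives you something a plain text diff tool can compare.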
I want to make a Configuration Data Manager. This would allow multiple services to store and access configuration data that is common to all of them.
For the purposes of the Manager, I've decided to create a configuration class object - basically what every configuration data entry would look like:
Name, type, and value.
In the object these would all be strings that describe the configuration data object itself. Once the Manager has retrieved this data from its database as strings, it would put it into this configuration object.
Then, I want it to send it through WCF to its destination. BUT, I don't want to send a serialized version of the configuration object, but rather a serialized version of the object described by the configuration object.
The reason I'd like to do this is so that
The Data Manager does not need to know anything about the configuration data.
So I can add configuration objects easily without changing the service. Of course, I should be able to do all of the CRUD operations, not just read.
Summary:
Input: string of name, type and value
Output: Serialized output of the object; the object itself is "type name = value"
Questions:
Is this a good method for storing and accessing the data?
How can I/can I serialize in this manner?
What would the function prototype of a getConfigurationData method look like?
I have decided to go in a different direction, thanks for the help.
Is this a good method for storing and accessing the data?
That is difficult to answer; the best I can give you is both a "yes" and a "no". Yes, it's not a bad idea to isolate the serialization/rehydration of this data... and no, I don't really care much for the way you describe doing it. I'm not sure I would want it stored in text unless I plan on editing it by hand, and if I'm editing it by hand, I'm not sure I'd want it in a database. It could be done; I'm just not sure you're really on the right track yet.
How can I/can I serialize in this manner?
Don't build your own, never that. Use a well-known format that already exists. Either XML or JSON will serve for hand-editable data, or there are several binary formats (BSON, Protocol Buffers) if you do not need to be able to edit it.
What would the function prototype of a getConfigurationData method look like?
I would first break down the 'general' (aka common) configuration into a separate call from the service-specific configuration. This enables getConfigurationData to simply return a rich type for common information. Then either add an extra parameter and property for service-specific data, or add another method. As an example:
[DataContract]
public class ConfigurationInfo
{
    [DataMember]
    public string Foo;
    ...
    // This string is a json/xml blob specific to the 'svcType' parameter
    [DataMember]
    public string ServiceConfig;
}
// A WCF service interface is marked with [ServiceContract], not [DataContract],
// and each exposed method needs [OperationContract].
[ServiceContract]
public interface IServiceHost
{
    [OperationContract]
    ConfigurationInfo GetConfigurationData(string svcType);
}
Obviously you place a little burden on the caller to parse the 'ServiceConfig'; however, your server can treat it as an opaque string value. Its only job is to associate it with the appropriate svcType and store/fetch the correct value.
I have a class that serializes a set of objects (using XML serialization) that I want to unit test.
My problem is it feels like I will be testing the .NET implementation of XML serialization, instead of anything useful. I also have a slight chicken and egg scenario where in order to test the Reader, I will need a file produced by the Writer to do so.
I think the questions (there are three, but they all relate) I'm ultimately looking for feedback on are:
Is it possible to test the Writer, without using the Reader?
What is the best strategy for testing the reader (XML file? Mocking with record/playback)? Is it the case that all you will really be doing is testing property values of the objects that have been deserialized?
What is the best strategy for testing the writer?
Background info on Xml serialization
I'm not using a schema, so all XML elements and attributes match the objects' properties. As there is no schema, tags/attributes which do not match those found in the properties of each object are simply ignored by the XmlSerializer (so the property's value is null or default). Here is an example:
<MyObject Height="300">
    <Name>Bob</Name>
    <Age>20</Age>
</MyObject>
would map to
public class MyObject
{
    public string Name { get; set; }
    public int Age { get; set; }
    [XmlAttribute]
    public int Height { get; set; }
}
and vice versa. If the object changed to the below, the XML would still deserialize successfully, but FirstName would be blank.
public class MyObject
{
    public string FirstName { get; set; }
    public int Age { get; set; }
    [XmlAttribute]
    public int Height { get; set; }
}
An invalid XML file would deserialize correctly, therefore the unit test would pass unless you ran assertions on the values of the MyObject.
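The silent mismatch can be reproduced with a short sketch like the one below. RenamedMyObject and SilentMismatchDemo are hypothetical names (the class is renamed only so that it can sit alongside the original MyObject), and [XmlRoot] keeps the root element name unchanged.

```csharp
using System.IO;
using System.Xml.Serialization;

// Same shape as the changed MyObject above: Name has become FirstName.
[XmlRoot("MyObject")]
public class RenamedMyObject
{
    public string FirstName { get; set; }
    public int Age { get; set; }
    [XmlAttribute]
    public int Height { get; set; }
}

public static class SilentMismatchDemo
{
    public const string Xml =
        "<MyObject Height=\"300\"><Name>Bob</Name><Age>20</Age></MyObject>";

    // Deserializes without throwing, even though <Name> has nowhere to go.
    public static RenamedMyObject Load()
    {
        var serializer = new XmlSerializer(typeof(RenamedMyObject));
        using (var reader = new StringReader(Xml))
        {
            return (RenamedMyObject)serializer.Deserialize(reader);
        }
    }
}
```

Load() returns an object whose Age and Height are populated while FirstName stays null, so only an assertion on FirstName would reveal the problem.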
Do you need to be able to do backward compatibility? If so, it may be worth building up unit tests of files produced by old versions which should still be able to be deserialized by new versions.
Other than that, if you ever introduce anything "interesting" it may be worth a unit test to just check you can serialize and deserialize just to make sure you're not doing something funky with a readonly property etc.
I would argue that it is essential to unit test serialization if it is vitally important that you can read data between versions. And you must test with "known good" data (i.e. it isn't sufficient to simply write data in the current version and then read it again).
You mention that you don't have a schema... why not generate one? Either by hand (it isn't very hard), or with xsd.exe. Then you have something to use as a template, and you can verify this just using XmlReader. I'm doing a lot of work with xml serialization at the moment, and it is a lot easier to update the schema than it is to worry about whether I'm getting the data right.
Even XmlSerializer can get complex; particularly if you involve subclasses ([XmlInclude]), custom serialization (IXmlSerializable), or non-default XmlSerializer construction (passing additional metadata at runtime to the ctor). Another possibility is creative use of [XmlIgnore], [XmlAnyAttribute] or [XmlAnyElement]; for example you might support unexpected data for round-trip (only) in version X, but store it in a known property in version Y.
With serialization in general:
The reason is simple: you can break the data! How badly you do this depends on the serializer; for example, with BinaryFormatter (and I know the question is XmlSerializer), simply changing from:
public string Name {get;set;}
to
private string name;
public string Name
{
    get { return name; }
    set { name = value; OnPropertyChanged("Name"); }
}
could be enough to break serialization, as the field name has changed (and BinaryFormatter loves fields).
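The field rename can be observed with reflection alone, without running BinaryFormatter. In the sketch below, AutoPerson, ManualPerson, and FieldNames are hypothetical names, and the compiler-generated backing-field name is an implementation detail of the C# compiler rather than a documented contract.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Auto-implemented property: the compiler generates the backing field.
public class AutoPerson
{
    public string Name { get; set; }
}

// Hand-written property: the field has whatever name you give it.
public class ManualPerson
{
    private string name;
    public string Name
    {
        get { return name; }
        set { name = value; }
    }
}

public static class FieldNames
{
    // Lists the instance fields a field-based serializer would see.
    public static string[] Of(Type type)
    {
        return type
            .GetFields(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public)
            .Select(field => field.Name)
            .ToArray();
    }
}
```

With current compilers, FieldNames.Of(typeof(AutoPerson)) yields a single field named <Name>k__BackingField, whereas ManualPerson exposes name, which is why data written by one shape may not load into the other.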
There are other occasions when you might accidentally rename the data (even in contract-based serializers such as XmlSerializer / DataContractSerializer). In such cases you can usually override the wire identifiers (for example [XmlAttribute("name")] etc), but it is important to check this!
Ultimately, it comes down to: is it important that you can read old data? It usually is; so don't just ship it... prove that you can.
For me, this is absolutely in the Don't Bother category. I don't unit test my tools. However, if you wrote your own serialization class, then by all means unit test it.
If you want to ensure that the serialization of your objects doesn't break, then by all means unit test. If you read the MSDN docs for the XmlSerializer class:
The XmlSerializer cannot serialize or deserialize the following: arrays of ArrayList; arrays of List<T>.
There is also a peculiar issue with enums declared as unsigned longs. Additionally, any objects marked as [Obsolete] do not get serialized from .NET 3.5 onwards.
If you have a set of objects that are being serialized, testing the serialization may seem odd, but it only takes someone to edit the objects being serialized to include one of the unsupported conditions for the serialisation to break.
In effect, you are not unit testing XML serialization, you are testing that your objects can be serialized. The same applies for deserialization.
Yes, as long as what needs to be tested is properly tested, through a bit of intervention.
The fact that you're serializing and deserializing in the first place means that you're probably exchanging data with the "outside world" -- the world outside the .NET serialization domain. Therefore, your tests should have an aspect that's outside this domain. It is not OK to test the Writer using the Reader, and vice versa.
It's not only about whether you would just end up testing the .NET serialization/deserialization; you have to test your interface with the outside world -- that you can output XML in the expected format and that you can properly consume XML in the anticipated format.
You should have static XML data that can be used to compare against serialization output and to use as input data for deserialization.
Assume you give the job of note taking and reading the notes back to the same guy:
You - Bob, I want you to jot down the following: "small yellow duck."
Bob - OK, got it.
You - Now, read it back to me.
Bob - "small yellow duck"
Now, what have we tested here? Can Bob really write? Did Bob even write anything or did he memorize the words? Can Bob actually read? -- his own handwriting? What about another person's handwriting? We don't have answers to any of these questions.
Now let's introduce Alice to the picture:
You - Bob, I want you to jot down the following: "small yellow duck."
Bob - OK, got it.
You - Alice, can you please check what Bob wrote?
Alice - OK, he's got it.
You - Alice, can you please jot down a few words?
Alice - Done.
You - Bob, can you please read them?
Bob - "red fox"
Alice - Yup, that sounds right.
We now know, with certainty, that Bob can write and read properly -- as long as we can completely trust Alice. Static XML data (ideally tested against a schema) should be sufficiently trustworthy.
In my experience it is definitely worth doing, especially if the XML is going to be used as an XML document by the consumer. For example, the consumer may need to have every element present in the document, either to avoid null checking of nodes when traversing or to pass schema validation.
By default the XML serializer will omit properties with a null value unless you add the [XmlElement(IsNullable = true)] attribute. Similarly, you may have to redirect generic list properties to standard arrays with an [XmlArray] attribute.
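A brief sketch of the null-handling difference, using a hypothetical Contact class and NullableDemo helper: one property is left at the default behaviour and one is marked IsNullable.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Contact
{
    // Default behaviour: no element is written when the value is null.
    public string Name { get; set; }

    // With IsNullable, a null value is written as an element carrying xsi:nil="true".
    [XmlElement(IsNullable = true)]
    public string Nickname { get; set; }
}

public static class NullableDemo
{
    public static string Serialize(Contact contact)
    {
        var serializer = new XmlSerializer(typeof(Contact));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, contact);
            return writer.ToString();
        }
    }
}
```

Serializing new Contact() produces a document with a Nickname element and no Name element, which is exactly the kind of difference a consumer's schema validation would notice.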
As another contributor said, if the object is changing over time, you need to continuously check that the output is consistent. It will also protect you against the serializer itself changing and not being backwards compatible, although you'd hope that this doesn't happen.
So for anything other than trivial uses, or where the above considerations are irrelevant, it is worth the effort of unit testing it.
There are a lot of types that serialization cannot cope with. Also, if you have your attributes wrong, it is common to get an exception when trying to read the XML back.
I tend to create an example tree of the objects that can be serialized, with at least one example of each class (and subclass). Then, at a minimum, I serialize the object tree to a string stream and read it back from that stream.
You will be amazed at the number of times this catches a problem and saves me having to wait for the application to start up to find it. This level of unit testing is more about speeding up development than increasing quality, so I would not do it for serialization that is already working.
As other people have said, if you need to be able to read back data saved by old versions of your software, you had better keep a set of example data files for each shipped version and have tests to confirm you can still read them. This is harder than it seems at first, as the meaning of fields on an object may change between versions, so just being able to create the current object from an old serialized file is not enough; you have to check that the meaning is the same as it was in the version of the software that saved the file. (Put a version attribute in your root object now!)
I agree with you that you will be testing the .NET implementation more than you'll be testing your own code. But if that's what you want to do (perhaps you don't trust the .NET implementation :) ), I might approach your three questions as follows.
1. Yes, it's certainly possible to test the writer without the reader. Use the writer to serialize the example (20-year-old Bob) you provided to a MemoryStream. Open the MemoryStream with an XmlDocument. Assert the root node is named "MyObject". Assert it has an attribute named "Height" with value "300". Assert there is a "Name" element containing a text node with value "Bob". Assert there is an "Age" element containing a text node with value "20".
2. Just do the reverse process of #1. Create an XmlDocument from the 20-year-old Bob XML string. Deserialize the stream with the reader. Assert the Name property equals "Bob". Assert the Age property equals 20. To be more thorough, you can add test cases with insignificant whitespace, or with single quotes instead of double quotes.
3. See #1. You can extend it by adding what you consider to be tricky "edge" cases you think could break it: names with various Unicode characters, extra-long names, empty names, negative ages, etc.
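The writer-only check from the first point might be sketched as follows. XmlSerializer is called directly here as a stand-in for your own writer class, and WriterInspection is a hypothetical helper; the assertions themselves would live in whichever test framework you use.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class MyObject
{
    public string Name { get; set; }
    public int Age { get; set; }
    [XmlAttribute]
    public int Height { get; set; }
}

public static class WriterInspection
{
    // Serializes to memory and loads the result into an XmlDocument,
    // so the test never touches the reader/deserializer.
    public static XmlDocument SerializeToDocument(MyObject value)
    {
        var serializer = new XmlSerializer(typeof(MyObject));
        using (var stream = new MemoryStream())
        {
            serializer.Serialize(stream, value);
            stream.Position = 0; // rewind before reading back
            var document = new XmlDocument();
            document.Load(stream);
            return document;
        }
    }
}
```

A test would then check document.DocumentElement.Name, DocumentElement.GetAttribute("Height"), and the InnerText of the Name and Age child elements against "MyObject", "300", "Bob" and "20". Note that the root element also carries the serializer's xmlns:xsi and xmlns:xsd declarations, so asserting on an exact attribute count would be brittle.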
I have done this in some cases... not testing the serialisation as such, but using some 'known good' XML serializations and then loading them into my classes, and checking that all the properties (as applicable) have the expected values.
This is not going to test anything for the first version... but if the classes ever evolve I know I will catch any breaking changes in the format.
We do acceptance testing of our serialization rather than unit testing.
What this means is that our acceptance testers take the XML schema, or as in your case some sample XML, and re-create their own serializable data-transfer class.
We then use NUnit to test our WCF service with this clean-room XML.
With this technique we've identified many, many errors. For example, where we have changed the name of the .NET member and forgotten to add an [XmlElement] tag with a Name = property.
If there's nothing you can do to change the way your class serializes, then you're testing .NET's implementation of XML serialization ;-)
If the format of the serialized XML matters, then you need to test the serialization. If it's important that you can deserialize it, then you need to test deserialization.
Seeing how you can't really fix serialization, you shouldn't be testing it - instead, you should be testing your own code and the way it interacts with the serialization mechanism. For example, you might need to unit-test the structure of the data you're serializing to make sure that no-one accidentally changes a field or something.
Speaking of which, I have recently adopted a practice where I check such things at compile-time rather than during execution of unit tests. It's a bit tedious, but I have a component that can traverse the AST, and then I can read it in a T4 template and write lots of #error messages if I meet something that shouldn't be there.