Storing Relational Data in XML - c#

I'm wondering what the best practices are for storing a relational data structure in XML. Particularly, I am wondering about best practices for enforcing node order. For example, say I have three objects: School, Course, and Student, which are defined as follows:
class School
{
List<Course> Courses;
List<Student> Students;
}
class Course
{
string Number;
string Description;
}
class Student
{
string Name;
List<Course> EnrolledIn;
}
I would store such a data structure in XML like so:
<School>
<Courses>
<Course Number="ENGL 101" Description="English I" />
<Course Number="CHEM 102" Description="General Inorganic Chemistry" />
<Course Number="MATH 103" Description="Trigonometry" />
</Courses>
<Students>
<Student Name="Jack">
<EnrolledIn>
<Course Number="CHEM 102" />
<Course Number="MATH 103" />
</EnrolledIn>
</Student>
<Student Name="Jill">
<EnrolledIn>
<Course Number="ENGL 101" />
<Course Number="MATH 103" />
</EnrolledIn>
</Student>
</Students>
</School>
With the XML ordered this way, I can parse Courses first. Then, when I parse Students, I can look up each Course listed in EnrolledIn (by its Number) in the School.Courses list. This will give me an object reference to add to the EnrolledIn list in Student. If Students comes before Courses, however, such a lookup to get an object reference is not possible, since School.Courses has not yet been populated.
So what are the best practices for storing relational data in XML?
- Should I enforce that Courses must always come before Students?
- Should I tolerate any ordering and create a stub Course object whenever I encounter one I have not yet seen? (To be expanded when the definition of the Course is eventually reached later.)
- Is there some other way I should be persisting/loading my objects to/from XML? (I am currently implementing Save and Load methods on all my business objects and doing all this manually using System.Xml.XmlDocument and its associated classes.)
I am used to working with relational data out of SQL, but this is my first experience trying to store a non-trivial relational data structure in XML. Any advice you can provide as to how I should proceed would be greatly appreciated.

Don't think in SQL or relational when working with XML, because there are no order constraints.
You can, however, use XPath to query any portion of the XML document at any time. If you want the courses first, select "//Courses/Course". If you want the student enrollments next, select "//Students/Student/EnrolledIn/Course".
The bottom line being... just because XML is stored in a file, don't get caught thinking all your accesses are serial.
I posted a separate question, "Can XPath do a foreign key lookup across two subtrees of an XML?", in order to clarify my position. The solution shows how you can use XPath to make relational queries against XML data.
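As a rough illustration of that kind of lookup (a minimal sketch, not the solution from the linked question), the following uses XmlDocument.SelectNodes and SelectSingleNode against the School document from the question. The class and method names are made up for the example, and it assumes course numbers contain no apostrophes, since the value is pasted into the XPath string.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XPathLookupDemo
{
    // Hypothetical helper: lists each enrollment with the description of the
    // course it refers to, regardless of where <Courses> sits in the document.
    public static List<string> DescribeEnrollments(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var lines = new List<string>();
        foreach (XmlElement enrolled in
                 doc.SelectNodes("/School/Students/Student/EnrolledIn/Course"))
        {
            string number = enrolled.GetAttribute("Number");
            // EnrolledIn is the parent, Student is the grandparent.
            var student = (XmlElement)enrolled.ParentNode.ParentNode;

            // "Foreign key" lookup into the Courses subtree.
            // Assumption: number contains no apostrophe.
            var course = (XmlElement)doc.SelectSingleNode(
                "/School/Courses/Course[@Number='" + number + "']");

            string description = course != null
                ? course.GetAttribute("Description")
                : "(unknown course)";
            lines.Add(student.GetAttribute("Name") + ": " + number +
                      " (" + description + ")");
        }
        return lines;
    }
}
```

Because the whole document is in memory, the lookup works the same whether Courses appears before or after Students.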

While you can specify order of child elements using a <xsd:sequence>, by requiring child objects to come in specific order you make your system less flexible (i.e., harder to update using notepad).
Best thing to do is to parse out all your data, then perform what actions you need to do. Don't act during the parse.
Obviously, the design of the XML and the data behind it precludes serializing a single POCO to XML. You need to control the serialization and deserialization logic in order to unhook and re-hook objects together.
I'd suggest creating a custom serializer that builds the xml representation of this object graph. It can thereby control not only the order of serialization, but also handle situations where nodes aren't in the expected order. You could do other things such as adding custom attributes to use for linking objects together which don't exist as public properties on the objects being serialized.
Creating the xml would be as simple as iterating over your objects a few times, building up collections of XElements with the expected representation of the objects as xml. When you're done you can stitch them together into an XDocument and grab the xml from it. You can make multiple passes over the xml on the reverse side to re-create your object graph and restore all references.
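The "multiple passes" idea on the loading side might look something like the sketch below. It is only an illustration built on the question's School/Course/Student shapes; the SchoolXml class and its Load method are names invented here, and it assumes every Number referenced under EnrolledIn is defined somewhere under Courses (otherwise the dictionary lookup throws).

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public class Course
{
    public string Number;
    public string Description;
}

public class Student
{
    public string Name;
    public List<Course> EnrolledIn = new List<Course>();
}

public class School
{
    public List<Course> Courses = new List<Course>();
    public List<Student> Students = new List<Student>();
}

public static class SchoolXml
{
    public static School Load(string xml)
    {
        XElement root = XElement.Parse(xml);
        var school = new School();

        // Pass 1: build every Course and index it by Number.
        var byNumber = new Dictionary<string, Course>();
        foreach (XElement e in root.Elements("Courses").Elements("Course"))
        {
            var course = new Course
            {
                Number = (string)e.Attribute("Number"),
                Description = (string)e.Attribute("Description")
            };
            school.Courses.Add(course);
            byNumber[course.Number] = course;
        }

        // Pass 2: build Students and re-hook each enrollment to the
        // Course object created in pass 1.
        foreach (XElement e in root.Elements("Students").Elements("Student"))
        {
            var student = new Student { Name = (string)e.Attribute("Name") };
            foreach (XElement r in e.Elements("EnrolledIn").Elements("Course"))
            {
                // Throws KeyNotFoundException for an undefined course number.
                student.EnrolledIn.Add(byNumber[(string)r.Attribute("Number")]);
            }
            school.Students.Add(student);
        }
        return school;
    }
}
```

Since each pass selects its elements by name, the document order of Courses and Students no longer matters.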

Node ordering is only important if you need to do forward-only processing of the data, e.g. using an XmlReader or a SAX parser. If you're going to read the XML into a DOM before processing it (which you are if you're using XmlDocument), node order doesn't really matter. What matters more is that the XML be structured so that you can query it with XPath efficiently, i.e. without having to use "//".
If you take a look at the schema that the DataSetGenerator produces, you'll see that there's no ordering associated with the DataTable-level elements. It may be that ADO processes elements in some sequence not represented in the schema (e.g. one DataTable at a time), or it may be that ADO does forward-only processing and doesn't enforce relational constraints until the DataSet is fully read. I don't know. But it's clear that ADO doesn't couple the processing order to the document order.
(And yes, you can specify the order of child elements in an XML schema; that's what xs:sequence does. If you don't want node order to be enforced, you use an unbounded xs:choice.)

The order is not usually important in XML. In this case the Courses could come after Students. You parse the XML and then you make your queries on the entire data.

From experience, XML isn't the best to store relational data. Have you investigated YAML? Do you have the option?
If you don't, a safe way would be to have a strict DTD for the XML and enforce that way. You could also, as you suggest, keep a hash of objects created. That way if a Student creates a Course you keep that Course around for future updating when the tag is hit.
Also remember you can use XPath queries to access specific nodes directly, so you can enforce parsing of courses first regardless of position in the XML document. (making a more complete answer, thanks to dacracot)
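The "hash of objects created" idea could be sketched roughly as follows. CourseRegistry, GetOrCreate and Define are hypothetical names for this example; the point is only that a reference seen before its definition yields a stub, and the later definition fills in that same object.

```csharp
using System.Collections.Generic;

public class Course
{
    public string Number;
    public string Description; // stays null while the Course is only a stub
}

public class CourseRegistry
{
    private readonly Dictionary<string, Course> byNumber =
        new Dictionary<string, Course>();

    // Called when a Student references a course: returns the existing
    // Course, or creates a stub carrying only the Number.
    public Course GetOrCreate(string number)
    {
        Course course;
        if (!byNumber.TryGetValue(number, out course))
        {
            course = new Course { Number = number };
            byNumber[number] = course;
        }
        return course;
    }

    // Called when the full <Course> definition is reached: expands the
    // stub in place, so references handed out earlier stay valid.
    public Course Define(string number, string description)
    {
        Course course = GetOrCreate(number);
        course.Description = description;
        return course;
    }
}
```

After parsing finishes, any Course whose Description is still null was referenced but never defined, which is worth reporting as an error.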

You could also use two XML files, one for courses and a second for students. Open and parse the first before you do the second.

It's been a while, but I seem to remember doing a base collection of 'things' in one part of an xml file, and referring to them in another using the schema features keyref and refer. I found a few examples here. My apologies if this is not what you're looking for.

XML is definitely not a friendly place for relational data.
If you absolutely need to do this, then I'd recommend a funky inverted kind of logic.
In your example, you've got a School, which offers many Courses, taken by many Students.
Your XML might follow as such:
<School>
<Students>
<Student Name="Jack">
<EnrolledIn>
<Course Number="CHEM 102" Description="General Inorganic Chemistry" />
<Course Number="MATH 103" Description="Trigonometry" />
</EnrolledIn>
</Student>
<Student Name="Jill">
<EnrolledIn>
<Course Number="ENGL 101" Description="English I" />
<Course Number="MATH 103" Description="Trigonometry" />
</EnrolledIn>
</Student>
</Students>
</School>
This obviously isn't the least repetitive way to do this (it's relational data!), but it's easily parse-able.

Related

Best way to compare XML in C#

I'm looking for the best way to compare two XML files for differences using C#. Say, for example, I have two XMLs A and B like this:
XML A
<data:TR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pd="http://www.ascentn.com/bpm/XMLSchema">
<data:processFields />
<data:formFields>
<data:TextBox1>111</data:TextBox1>
<data:TextBox2>222</data:TextBox2>
<data:TextBox3>3333</data:TextBox3>
<data:Repeat1_Repeat>
<data:Repeat1>
<data:TextBox4>444</data:TextBox4>
<data:TextBox5>555</data:TextBox5>
</data:Repeat1>
</data:Repeat1_Repeat>
</data:formFields>
</data:TR>
XML B
<data:TR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pd="http://www.ascentn.com/bpm/XMLSchema">
<data:processFields />
<data:formFields>
<data:TextBox1>111</data:TextBox1>
<data:TextBox2>aaa</data:TextBox2>
<data:TextBox3>3333</data:TextBox3>
<data:Repeat1_Repeat>
<data:Repeat1>
<data:TextBox4>bbb</data:TextBox4>
<data:TextBox5>555</data:TextBox5>
</data:Repeat1>
<data:Repeat1>
<data:TextBox4>ccc</data:TextBox4>
<data:TextBox5>ddd</data:TextBox5>
</data:Repeat1>
</data:Repeat1_Repeat>
</data:formFields>
</data:TR>
I'm looking to get only the differences between the two XML files; in this case that would be TextBox2 and TextBox4, plus the one full extra node under Repeat1_Repeat.
Is there an easy way to get this? Maybe use some framework? I'm using .NET 4.5.2 so anything recent would work too!
Thanks!
EDIT : Oh and also, I need it to work for n-level of nesting.
I think XMLDiff is the best way. No framework needed. As seen on MSDN:
By using the XMLDiff class, the programmer is able to determine if the
two files are in fact different based on the conditions that are
important to their application. The programmer is able to ignore
changes that are only superficial (for example, different prefixes for
same namespace). XMLPatch then provides the ability to update the
original XML by applying only the changes that matter to the original
XML.
You should check it out:
https://msdn.microsoft.com/en-us/library/aa302294.aspx
You can use Visual Studio's Paste Special (Edit > Paste Special > Paste XML As Classes) to generate classes for your XML.
Then you can deserialise your xml to create an instance of your code generated class.
So for two xml files; you can create two objects like xmlObject1 and xmlObject2.
Then you can use CompareObject to identify the difference between two objects.
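If neither a diff library nor generated classes are an option, a hand-rolled comparison with LINQ to XML is also possible. The sketch below is not XMLDiff and not the object-comparison approach described above; it is a deliberately simple recursive walk (SimpleXmlDiff is an invented name) that pairs child elements by position, so inserting an element in the middle shows up as a run of changes. It also ignores attributes and assumes both documents parse, which requires every namespace prefix to be declared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SimpleXmlDiff
{
    // Returns one human-readable line per difference found.
    public static List<string> Compare(XElement a, XElement b)
    {
        var result = new List<string>();
        if (a.Name != b.Name)
        {
            result.Add("root element differs: " + a.Name + " vs " + b.Name);
            return result;
        }
        Walk(a, b, "/" + a.Name.LocalName, result);
        return result;
    }

    private static void Walk(XElement a, XElement b, string path,
                             List<string> result)
    {
        List<XElement> ca = a.Elements().ToList();
        List<XElement> cb = b.Elements().ToList();

        // Leaf elements: compare their text content.
        if (ca.Count == 0 && cb.Count == 0)
        {
            if (a.Value != b.Value)
                result.Add(path + ": '" + a.Value + "' -> '" + b.Value + "'");
            return;
        }

        // Otherwise pair children by position; works to any nesting depth.
        int max = Math.Max(ca.Count, cb.Count);
        for (int i = 0; i < max; i++)
        {
            string index = "[" + (i + 1) + "]"; // child position, 1-based
            if (i >= ca.Count)
                result.Add(path + "/" + cb[i].Name.LocalName + index +
                           ": only in second document");
            else if (i >= cb.Count)
                result.Add(path + "/" + ca[i].Name.LocalName + index +
                           ": only in first document");
            else if (ca[i].Name != cb[i].Name)
                result.Add(path + index + ": element " + ca[i].Name +
                           " vs " + cb[i].Name);
            else
                Walk(ca[i], cb[i],
                     path + "/" + ca[i].Name.LocalName + index, result);
        }
    }
}
```

For anything beyond quick checks, a purpose-built diff tool will handle reordering, attributes and namespaces far better than this.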

Node-By-Node XML Serialization

I need to write out an XML where the order of elements is important (I realize that XML format might not be the right thing to use here, but...). I need something like:
<Author>
<Book>
<Author>
<Book>
The underlying class has elements that look like:
Author[] Author;
Book[] Book;
I am planning on having an index value on the Book and Author classes and using that to write out the XML.
What I am trying to find is if there is an easy way to serialize classes one by one into an XML. I looked at XmlWriter but it looks like it can only be used to write XML at a very basic level (i.e. no serialization support).
Thanks for your help!
If you don't want to use the built-in serialization routines (and sometimes you don't, because your structure may change and it gets harder and harder to use those routines as your structure evolves over time), then an easy way to get where you want to go is to create an XmlDocument and then use XmlDocument.CreateNode to create an XML node for the document, adding attributes to the node as needed, then add the node to the document using XmlDocument.AppendChild (or as you get deeper into the structure, use XmlNode.AppendChild). Finally, use XmlDocument.Save to write the structure out to a file (or memory or whatever you want to do with it at that point).
The Microsoft System.Xml namespace documentation is quite good, look to there for a lot of examples and then just write quick test programs to verify what you learn from the examples.
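A minimal sketch of that approach, interleaving Author and Book nodes in index order, might look like this. The root element name "Library" and the Name/Title attributes are assumptions made for the example, since the question does not show them, and plain string arrays stand in for the real Author and Book classes.

```csharp
using System;
using System.Xml;

public static class InterleavedWriter
{
    // Builds <Library><Author/><Book/><Author/><Book/>...</Library>,
    // alternating by index so document order follows the index value.
    public static XmlDocument Build(string[] authors, string[] books)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("Library"); // hypothetical root
        doc.AppendChild(root);

        int count = Math.Max(authors.Length, books.Length);
        for (int i = 0; i < count; i++)
        {
            if (i < authors.Length)
            {
                XmlElement author = doc.CreateElement("Author");
                author.SetAttribute("Name", authors[i]);
                root.AppendChild(author);
            }
            if (i < books.Length)
            {
                XmlElement book = doc.CreateElement("Book");
                book.SetAttribute("Title", books[i]);
                root.AppendChild(book);
            }
        }
        return doc;
    }
}
```

Calling Build(...).Save("library.xml") would then write the document to a file; AppendChild keeps the nodes in exactly the order they were added.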

How to use flattened ViewModel for a Web API method

I am creating a Web API service that acts as a facade for my clients to a more complex messaging API on the backend. The .XSD that represents the calls I need to make to the backend API is obviously not something I want them to understand. My goal is to flatten out the required elements in a ViewModel class that can be used by the client. My POST might be something like below:
public HttpResponseMessage Post(FlattenedViewModel flattenedViewModel)
{
}
The idea of the flattened view model is to prevent my clients from having to understand any complex structuring of data to call my API. It's a lot easier to submit this (could be JSON or XML):
<PersonFirstName>John</PersonFirstName>
<PersonLastName>Smith</PersonLastName>
<PersonPhone>123-456-7890</PersonPhone>
than this:
<Person>
<Name>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
</Name>
<Communication>
<Type>
<Phone>123-456-7890</Phone>
</Type>
</Communication>
</Person>
I understand creating the class structure to represent the 2nd example is not difficult and easy for all of us to understand. However, my real .XSD is about 50x this example. My goal is to provide an easier interface and ability to have a flattened view, so please use that as a constraint of this question. Imagine it like a user was entering data on a form and pressed submit; a form is like a flattened view of data to be entered.
The hurdles I am encountering are the following:
Having a node that can repeat a finite set of times is solvable. However, nodes with the following constraint on the .xsd: maxOccurs="unbounded" do not appear to be initially doable with a flattened view. Is there another way of doing this so I don't have to introduce a collection? Or can I introduce a collection but still allow the user to not have to understand a complex structure (like my 1st example)? Please provide an example of what that would look like if possible.
I have node names that are repeated among different parts of the .xsd but are unrelated. For example the node ID or Date. My solution is to append the parent node name to the value to create a property like SubmitDate or PersonID. The issue I now have is my ViewModel class property names don't match the ones of my entities that must be mapped to in the domain model. I'm using ValueInjecter, so is there any type of streamlined way I can still map properties to other classes that have different names (i.e. annotation or something)?
Any help is appreciated, thank you!
I believe the answer lies in creating custom injections for ValueInjecter to use and then simply making a call to 'InjectFrom' to invoke them...
_person.InjectFrom<CustomPersonInjection>(flattenedViewModel);
I had a quick look around for some specific examples that might help you but couldn't find anything within a reasonable time frame (they're out there though, google 'valueinjecter custom injections').
Here are some links to get you started:
Deep Cloning example: http://valueinjecter.codeplex.com/wikipage?title=Deep%20Cloning&referringTitle=Home
Custom Convention Injection: Using ValueInjecter to map between objects with different property names

Maintaining object relational mapping with serialization

I am trying to figure out a way to make some of my database objects serializable to and from XML files.
I am using an Entity Framework data model for my objects and making them available to my client using WCF RIA Services. I want to be able to take a given object from the database and serialize it to an XML file, and vice-versa.
In the past I have tried this and the problems I run into are as follows:
If I implement IXmlSerializable for each object, then at the time of deserialization each object knows nothing of the other objects being deserialized. It is in a kind of bubble and it has no way of resolving a foreign key ID to an object reference.
For the above problem, the only solution I found was to write one big serialization and deserialization method where a parent object keeps track of references and assign them as needed. This feels like a very bad way of doing it since I have to constantly maintain this large method anytime an object changes, instead of each object being responsible for its own serialization.
The standard XML design of nesting objects inside each other does not work well for ORM models. The reason is that some objects may have references to and be used by multiple other objects, so I can't create those objects as sub-elements of a parent object.
Consider the following XML:
<User Name="John Smith">
<FavoriteMovies>
<Movie Name="The Big Lebowski" Year="1998" ... />
</FavoriteMovies>
</User>
<User Name="Robert Jones">
<FavoriteMovies>
<Movie Name="The Big Lebowski" Year="1998" ... />
</FavoriteMovies>
</User>
Clearly I shouldn't have two instances of the same movie. Rather the serialization should look something like this:
<User Name="John Smith">
<FavoriteMovies>
<Id>5</Id>
</FavoriteMovies>
</User>
<User Name="Robert Jones">
<FavoriteMovies>
<Id>5</Id>
</FavoriteMovies>
</User>
<Movies>
<Movie Id="5" Name="The Big Lebowski" Year="1998" ... />
</Movies>
WCF already knows how to serialize and deserialize my objects into SOAP/JSON/etc. using Data Services. Is that something I can just re-use when serializing to an XML file?
It occurs to me that relying on a database foreign key ID probably won't work since in many cases the objects will have the default ID. WCF manages to serialize the objects without relying on these being set, and the IDs are only assigned once it gets saved to the SQL database.
I'm not familiar with EF and how its object model works, but for most objects that are successfully serialized/deserialized over WCF you can just use the DataContractSerializer directly. Refer to this article for a simple walkthrough.
Since you will not need the XML to be interoperable, you can probably also use the preserveObjectReferences setting to avoid redundant data.
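As a rough sketch of what that could look like, the following round-trips a list of users that share one Movie instance, using DataContractSerializerSettings.PreserveObjectReferences. The User and Movie classes here are simplified stand-ins for the question's example rather than real Entity Framework entities, and the ReferencePreservingXml helper is a name invented for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Movie
{
    [DataMember] public string Name;
    [DataMember] public int Year;
}

[DataContract]
public class User
{
    [DataMember] public string Name;
    [DataMember] public List<Movie> FavoriteMovies = new List<Movie>();
}

public static class ReferencePreservingXml
{
    private static DataContractSerializer CreateSerializer()
    {
        // With PreserveObjectReferences, an object that appears several
        // times in the graph is written once and referenced by id after.
        var settings = new DataContractSerializerSettings
        {
            PreserveObjectReferences = true
        };
        return new DataContractSerializer(typeof(List<User>), settings);
    }

    public static byte[] Save(List<User> users)
    {
        using (var stream = new MemoryStream())
        {
            CreateSerializer().WriteObject(stream, users);
            return stream.ToArray();
        }
    }

    public static List<User> Load(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        {
            return (List<User>)CreateSerializer().ReadObject(stream);
        }
    }
}
```

After Load, both users' FavoriteMovies lists point at the same Movie object again, without relying on database IDs. The resulting XML uses serializer-specific id/ref attributes, which is why this only suits files that need not be interoperable.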
I highly recommend the Json.NET serializer http://json.codeplex.com/ over DataContractSerializer, since the latter is a bit buggy. You can also look at this question, for example, for more on why the Json.NET serializer is better.

Class design to contain data from an XML file

I'm a student working on a lab that parses a pseudo-XML file (we basically coded our own parser) for data, stores the retrieved elements and data values, and displays them. (The next lab will add "add, change, delete" functionality.)
I was thinking about holding this read in information in some sort of multidimensional List due to it being dynamic by default. The other suggestion I've read over some other questions here at SO is to make a "parent node" class, and just store that in an array.
The problem I have is that at code time there is no way to know for sure how many child nodes a parent node will have. It could be --
<parent>
<child1>data value</child1>
<child2>data value</child2>
...etc
</parent>
or
<parent>
<child1>data value</child1>
</parent>
I can't really think how I could code a class to have an unknown amount of variables.
Why not just use a List<List<T>>? Or maybe a Dictionary<string, List<T>>, assuming your parent nodes have unique identifiers?
You can keep a list of nodes or a dictionary of nodes if the nodes are unique in some way.
.NET collections are inherently dynamic. You don't need to know in advance how many items your collection will hold. A collection of your own class "parent" would work. Each parent class itself would implement a collection of "children". You can define them to be anything you want.
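For instance, a node class along these lines would cope with any number of children (a minimal sketch; Node, Add and CountDescendants are illustrative names, not anything from the lab):

```csharp
using System.Collections.Generic;

public class Node
{
    public string Name;   // element name, e.g. "parent" or "child1"
    public string Value;  // text content; null for nodes that only hold children
    public List<Node> Children = new List<Node>();

    public Node(string name, string value = null)
    {
        Name = name;
        Value = value;
    }

    // Adds a child and returns this node, so calls can be chained.
    public Node Add(Node child)
    {
        Children.Add(child);
        return this;
    }

    // Counts all nodes below this one, at any depth.
    public int CountDescendants()
    {
        int total = 0;
        foreach (Node child in Children)
            total += 1 + child.CountDescendants();
        return total;
    }
}
```

The parser can then call Add each time it closes a child element, so the number of children never has to be known at code time.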
Even better though would be to make your class serializable from the get go, so that when you save it to an XML file it's already in a properly formatted XML structure. Reading in that data would require deserialization and it would populate everything for you. Check this out.
You should manage this yourself, but just think: you have a parent as the root, and each parent has a list of children. Where is the problem? :)
