I need to write out an XML where the order of elements is important (I realize that XML format might not be the right thing to use here, but...). I need something like:
<Author>
<Book>
<Author>
<Book>
The underlying class has elements that look like:
Author[] Author;
Book[] Book;
I am planning on having an index value on the Book and Author classes and using that to write out the XML.
What I am trying to find is if there is an easy way to serialize classes one by one into an XML. I looked at XmlWriter but it looks like it can only be used to write XML at a very basic level (i.e. no serialization support).
Thanks for your help!
If you don't want to use the built-in serialization routines (and sometimes you don't, because your structure may change and it gets harder and harder to use those routines as your structure evolves over time), then an easy way to get where you want to go is to create an XmlDocument and then use XmlDocument.CreateNode to create an XML node for the document, adding attributes to the node as needed, then add the node to the document using XmlDocument.AppendChild (or as you get deeper into the structure, use XmlNode.AppendChild). Finally, use XmlDocument.Save to write the structure out to a file (or memory or whatever you want to do with it at that point).
The Microsoft System.Xml namespace documentation is quite good, look to there for a lot of examples and then just write quick test programs to verify what you learn from the examples.
Related
I'm looking for the best way to compare two XML files for difference using C#. Like say for example if I have two XMLs A and B like this:
XML A
<data:TR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pd="http://www.ascentn.com/bpm/XMLSchema">
<data:processFields />
<data:formFields>
<data:TextBox1>111</data:TextBox1>
<data:TextBox2>222</data:TextBox2>
<data:TextBox3>3333</data:TextBox3>
<data:Repeat1_Repeat>
<data:Repeat1>
<data:TextBox4>444</data:TextBox4>
<data:TextBox5>555</data:TextBox5>
</data:Repeat1>
</data:Repeat1_Repeat>
</data:formFields>
</data:TR>
XML B
<data:TR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pd="http://www.ascentn.com/bpm/XMLSchema">
<data:processFields />
<data:formFields>
<data:TextBox1>111</data:TextBox1>
<data:TextBox2>aaa</data:TextBox2>
<data:TextBox3>3333</data:TextBox3>
<data:Repeat1_Repeat>
<data:Repeat1>
<data:TextBox4>bbb</data:TextBox4>
<data:TextBox5>555</data:TextBox5>
</data:Repeat1>
<data:Repeat1>
<data:TextBox4>ccc</data:TextBox4>
<data:TextBox5>ddd</data:TextBox5>
</data:Repeat1>
</data:Repeat1_Repeat>
</data:formFields>
</data:TR>
I'm looking to get only the different between the two XML files, like in this case it would be TextBox2 and TextBox4 and one full node for Repeat1_Repeat.
Is there a easy way to get this? Maybe use some framework? I'm using .NET 4.5.2 so anything recent would work too!
Thanks!
EDIT : Oh and also, I need it to work for n-level of nesting.
I think XMLDiff is the best way. No framework needed. As seen on MSDN:
By using the XMLDiff class, the programmer is able to determine if the
two files are in fact different based on the conditions that are
important to their application. The programmer is able to ignore
changes that are only superficial (for example, different prefixes for
same namespace). XMLPatch then provides the ability to update the
original XML by applying only the changes that matter to the original
XML.
You should check it out:
https://msdn.microsoft.com/en-us/library/aa302294.aspx
You can use Paste Special to generate classes for your XML.
Then you can deserialise your xml to create an instance of your code generated class.
So for two xml files; you can create two objects like xmlObject1 and xmlObject2.
Then you can use CompareObject to identify the difference between two objects.
I believe the following:
public List<Vector3> Vectors;
Will serialize out to:
<Vectors>
<Vector3>
<X>0</X>
<Y>0</Y>
<Z>0</Z>
</Vector3>
</Vectors>
I want to remove the encasing tag which I believe I can do like this:
[XmlElement("Vector3")]
public List<Vector3> Vectors;
Which should serialize to:
<Vector3>
<X>0</X>
<Y>0</Y>
<Z>0</Z>
</Vector3>
But I'm afraid that would break old XML files that are still using the "Vectors" tag around the list. Is there a common way to solve this?
EDIT: The list above would be part of a container object, so the full XML might begin with
<Container>
and end with
</Container>
I left that out originally to keep the question shorter.
I don't believe XML has any sort of built in mechanism for versioning. I think your best bet is going to be writing some external mechanism which can detect the "version" as defined by you and deserialize the old version into your new object manually. You probably will also want to define a new version member variable or property which will serialize with your object in case you run into the same problem again, because once you change the schema a 2nd time, you will have 3 versions to worry about.
You can either write a custom deserialize method by defining IXmlSerializable on your object and defining the readXml/writeXml functions, or you can use some external process to generate the new XML format based on the old version. Perhaps load the XML file into an XmlDocument first, fix it how you want (i.e. move the Vector3 nodes up a level and remove the Vectors node), then save the document's OuterXml value into a string and deserialize via a MemoryStream.
I'm new to .net and c#, so I want to make sure i'm using the right tool for the job.
The XML i'm receiving is a description of a directory tree on another machine, so it go many levels deep. What I need to do now is to take the XML and create a structure of objects (custom classes) and populate them with info from the XML input, like File, Folder, Tags, Property...
The Tree stucture of this XML input makes it, in my mind, a prime candidate for using recursion to walk the tree.
Is there a different way of doing this in .net 3.5?
I've looked at XmlReaders, but they seem to be walking the tree in a linear fashion, not really what i'm looking for...
The XML i'm receiving is part of a 3rd party api, so is outside my control, and may change in the futures.
I've looked into Deserialization, but it's shortcomings (black box implementation, need to declare members a public, slow, only works for simple objects...) takes it out of the list as well.
Thanks for your input on this.
I would use the XLINQ classes in System.Xml.Linq (this is the namespace and the assembly you will need to reference). Load the XML into and XDocument:
XDocument doc = XDocument.Parse(someString);
Next you can either use recursion or a pseudo-recursion loop to iterate over the child nodes. You can choose you child nodes like:
//if Directory is tag name of Directory XML
//Note: Root is just the root XElement of the document
var directoryElements = doc.Root.Elements("Directory");
//you get the idea
var fileElements = doc.Root.Elements("File");
The variables directoryElements and fileElements will be IEnumerable types, which means you can use something like a foreach to loop through all of the elements. One way to build up you elements would be something like this:
List<MyFileType> files = new List<MyFileType>();
foreach(XElelement fileElement in fileElements)
{
files.Add(new MyFileType()
{
Prop1 = fileElement.Element("Prop1"), //assumes properties are elements
Prop2 = fileElement.Element("Prop2"),
});
}
In the example, MyFileType is a type you created to represent files. This is a bit of a brute-force attack, but it will get the job done.
If you want to use XPath you will need to using System.Xml.XPath.
A Note on System.Xml vs System.Xml.Linq
There are a number of XML classes that have been in .Net since the 1.0 days. These live (mostly) in System.Xml. In .Net 3.5, a wonderful, new set of XML classes were released under System.Xml.Linq. I cannot over-emphasize how much nicer they are to work with than the old classes in System.Xml. I would highly recommend them to any .Net programmer and especially someone just getting into .Net/C#.
XmlReader isn't a particularly friendly API. If you can use .NET 3.5, then loading into LINQ to XML is likely to be your best bet. You could easily use recursion with that.
Otherwise, XmlDocument would still do the trick... just a bit less pleasantly.
This is a problem which is very suitable for recursion.
To elaborate a bit more on what another poster said, you'll want to start by loading the XML into a System.Xml.XmlDocument, (using LoadXml or Load).
You can access the root of the tree using the XmlDocument.DocumentElement property, and access the children of each node by using the ChildNodes property. Child nodes returns a collection, and when the Collection is of size 0, you know you'll have reached your base case.
Using LINQ is also a good option, but I'm unable to elaborate on this solution, cause I'm not really a LINQ expert.
As Jon mentioned, XmlReader isn't very friendly. If you end up having perf issues, you might want to look into it, but if you just want to get the job done, go with XmlDocument/ChildNodes using recursion.
Load your XML into an XMLDocument. You can then walk the XMLDocuments DOM using recursion.
You might want to also look into the factory method pattern to create your classes, would be very useful here.
I've got a little problem that's slightly frustrating. Is it possible to set a default value when deserializing xml in C# (.NET 3.5)? Basically I'm trying to deserialize some xml that is not under my control and one element looks like this:
<assignee-id type="integer">38628</assignee-id>
it can also look like this:
<assignee-id type="integer" nil="true"></assignee-id>
Now, in my class I have the following property that should receive the data:
[XmlElementAttribute("assignee-id")]
public int AssigneeId { get; set; }
This works fine for the first xml element example, but the second fails. I've tried changing the property type to be int? but this doesn't help. I'll need to serialize it back to that same xml format at some point too, but I'm trying to use the built in serialization support without having to resort to rolling my own.
Does anyone have experience with this kind of problem?
It looks like your source XML is using xsi:type and xsi:nil, but not prefixing them with a namespace.
What you could do is process these with XSLT to turn this:
<assignees>
<assignee>
<assignee-id type="integer">123456</assignee-id>
</assignee>
<assignee>
<assignee-id type="integer" nil="true"></assignee-id>
</assignee>
</assignees>
into this:
<assignees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<assignee>
<assignee-id xsi:type="integer">123456</assignee-id>
</assignee>
<assignee>
<assignee-id xsi:type="integer" xsi:nil="true" />
</assignee>
</assignees>
This would then be handled correctly by the XmlSerializer without needing any custom code. The XSLT for this is rather trivial, and a fun exercise. Start with one of the many "copy" XSLT samples and simply add a template for the "type" and "nil" attributes to ouput a namespaced attribute.
If you prefer you could load your XML document into memory and change the attributes but this is not a good idea as the XSLT engine is tuned for performance and can process quite large files without loading them entirely into memory.
You might want to take a look at the OnDeserializedAttribute,OnSerializingAttribute, OnSerializedAttribute, and OnDeserializingAttribute to add custom logic to the serialization process
XmlSerializer uses xsi:nil - so I expect you'd need to do custom IXmlSerializable serialization for this. Sorry.
I'm wondering what the best practices are for storing a relational data structure in XML. Particulary, I am wondering about best practices for enforcing node order. For example, say I have three objects: School, Course, and Student, which are defined as follows:
class School
{
List<Course> Courses;
List<Student> Students;
}
class Course
{
string Number;
string Description;
}
class Student
{
string Name;
List<Course> EnrolledIn;
}
I would store such a data structure in XML like so:
<School>
<Courses>
<Course Number="ENGL 101" Description="English I" />
<Course Number="CHEM 102" Description="General Inorganic Chemistry" />
<Course Number="MATH 103" Description="Trigonometry" />
</Courses>
<Students>
<Student Name="Jack">
<EnrolledIn>
<Course Number="CHEM 102" />
<Course Number="MATH 103" />
</EnrolledIn>
</Student>
<Student Name="Jill">
<EnrolledIn>
<Course Number="ENGL 101" />
<Course Number="MATH 103" />
</EnrolledIn>
</Student>
</Students>
</School>
With the XML ordered this way, I can parse Courses first. Then, when I parse Students, I can look up each Course listed in EnrolledIn (by its Number) in the School.Courses list. This will give me an object reference to add to the EnrolledIn list in Student. If Students, however, comes before Courses, such a lookup to get a object reference is not possible. (Since School.Courses has not yet been populated.)
So what are the best practices for storing relational data in XML?
- Should I enforce that Courses must always come before Students?
- Should I tolerate any ordering and create a stub Course object whenever I encounter one I have not yet seen? (To be expanded when the definition of the Course is eventually reached later.)
- Is there some other way I should be persisting/loading my objects to/from XML? (I am currently implementing Save and Load methods on all my business objects and doing all this manually using System.Xml.XmlDocument and its associated classes.)
I am used to working with relational data out of SQL, but this is my first experience trying to store a non-trivial relational data structure in XML. Any advice you can provide as to how I should proceed would be greatly appreciated.
Don't think in SQL or relational when working with XML, because there are no order constraints.
You can however query using XPath to any portion of the XML document at any time. You want the courses first, then "//Courses/Course". You want the students enrollments next, then "//Students/Student/EnrolledIn/Course".
The bottom line being... just because XML is stored in a file, don't get caught thinking all your accesses are serial.
I posted a separate question, "Can XPath do a foreign key lookup across two subtrees of an XML?", in order to clarify my position. The solution shows how you can use XPath to make relational queries against XML data.
While you can specify order of child elements using a <xsd:sequence>, by requiring child objects to come in specific order you make your system less flexible (i.e., harder to update using notepad).
Best thing to do is to parse out all your data, then perform what actions you need to do. Don't act during the parse.
Obviously, the design of the XML and the data behind it precludes serializing a single POCO to XML. You need to control the serialization and deserialization logic in order to unhook and re-hook objects together.
I'd suggest creating a custom serializer that builds the xml representation of this object graph. It can thereby control not only the order of serialization, but also handle situations where nodes aren't in the expected order. You could do other things such as adding custom attributes to use for linking objects together which don't exist as public properties on the objects being serialized.
Creating the xml would be as simple as iterating over your objects a few times, building up collections of XElements with the expected representation of the objects as xml. When you're done you can stitch them together into an XDocument and grab the xml from it. You can make multiple passes over the xml on the reverse side to re-create your object graph and restore all references.
Node ordering is only important if you need to do forward-only processing of the data, e.g. using an XmlReader or a SAX parser. If you're going to read the XML into a DOM before processing it (which you are if you're using XmlDocument), node order doesn't really matter. What matters more is that the XML be structured so that you can query it with XPath efficiently, i.e. without having to use "//".
If you take a look at the schema that the DataSetGenerator produces, you'll see that there's no ordering associated with the DataTable-level elements. It may be that ADO processes elements in some sequence not represented in the schema (e.g. one DataTable at a time), or it may be that ADO does forward-only processing and doesn't enforce relational constraints until the DataSet is fully read. I don't know. But it's clear that ADO doesn't couple the processing order to the document order.
(And yes, you can specify the order of child elements in an XML schema; that's what xs:sequence does. If you don't want node order to be enforced, you use an unbounded xs:choice.)
The order is not usually important in XML. In this case the Courses could come after Students. You parse the XML and then you make your queries on the entire data.
From experience, XML isn't the best to store relational data. Have you investigated YAML? Do you have the option?
If you don't, a safe way would be to have a strict DTD for the XML and enforce that way. You could also, as you suggest, keep a hash of objects created. That way if a Student creates a Course you keep that Course around for future updating when the tag is hit.
Also remember you can use XPath queries to access specific nodes directly, so you can enforce parsing of courses first regardless of position in the XML document. (making a more complete answer, thanks to dacracot)
You could also use two XML files, one for courses and a second for students. Open and parse the first before you do the second.
I's been a while, but I seem to remember doing a base collection of 'things' in one part of an xml file, and referring to them in another using the schema features keyref and refer. I found a few examples here. My apologies if this is not what you're looking for.
XML is definitely not a friendly place for relational data.
If you absolutely need to do this, then I'd recommend a funky inverted kind of logic.
In your example, you've got Schools, which offers many courses, taken by many students.
Your XML might follow as such:
<School>
<Students>
<Student Name="Jack">
<EnrolledIn>
<Course Number="CHEM 102" Description="General Inorganic Chemistry" />
<Course Number="MATH 103" Description="Trigonometry" />
</EnrolledIn>
</Student>
<Student Name="Jill">
<EnrolledIn>
<Course Number="ENGL 101" Description="English I" />
<Course Number="MATH 103" Description="Trigonometry" />
</EnrolledIn>
</Student>
</Students>
</School>
This obviously isn't the least repetitive way to do this (it's relational data!), but it's easily parse-able.