Best way to compare XML in C#

I'm looking for the best way to compare two XML files for differences using C#. Say, for example, I have two XML files A and B like this:
XML A
<data:TR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pd="http://www.ascentn.com/bpm/XMLSchema">
<data:processFields />
<data:formFields>
<data:TextBox1>111</data:TextBox1>
<data:TextBox2>222</data:TextBox2>
<data:TextBox3>3333</data:TextBox3>
<data:Repeat1_Repeat>
<data:Repeat1>
<data:TextBox4>444</data:TextBox4>
<data:TextBox5>555</data:TextBox5>
</data:Repeat1>
</data:Repeat1_Repeat>
</data:formFields>
</data:TR>
XML B
<data:TR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pd="http://www.ascentn.com/bpm/XMLSchema">
<data:processFields />
<data:formFields>
<data:TextBox1>111</data:TextBox1>
<data:TextBox2>aaa</data:TextBox2>
<data:TextBox3>3333</data:TextBox3>
<data:Repeat1_Repeat>
<data:Repeat1>
<data:TextBox4>bbb</data:TextBox4>
<data:TextBox5>555</data:TextBox5>
</data:Repeat1>
<data:Repeat1>
<data:TextBox4>ccc</data:TextBox4>
<data:TextBox5>ddd</data:TextBox5>
</data:Repeat1>
</data:Repeat1_Repeat>
</data:formFields>
</data:TR>
I'm looking to get only the differences between the two XML files. In this case that would be TextBox2 and TextBox4, plus one full node for Repeat1_Repeat.
Is there an easy way to get this? Maybe using some framework? I'm using .NET 4.5.2, so anything recent would work too!
Thanks!
EDIT: Oh, and I also need it to work for n levels of nesting.

I think XMLDiff is the best way. No framework needed. As seen on MSDN:
By using the XMLDiff class, the programmer is able to determine if the
two files are in fact different based on the conditions that are
important to their application. The programmer is able to ignore
changes that are only superficial (for example, different prefixes for
same namespace). XMLPatch then provides the ability to update the
original XML by applying only the changes that matter to the original
XML.
You should check it out:
https://msdn.microsoft.com/en-us/library/aa302294.aspx
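As a rough illustration of the XmlDiff approach described above, here is a minimal sketch. It assumes the Microsoft.XmlDiffPatch assembly is referenced (for example via the XMLDiffPatch NuGet package), and the file names a.xml, b.xml and diffgram.xml are hypothetical. The options shown are just one plausible choice of "superficial" changes to ignore.

```csharp
using System;
using System.Xml;
using Microsoft.XmlDiffPatch; // assumption: XmlDiffPatch library is referenced

class XmlDiffSketch
{
    static void Main()
    {
        // Ignore differences that usually do not matter for the data itself.
        var diff = new XmlDiff(XmlDiffOptions.IgnoreWhitespace |
                               XmlDiffOptions.IgnoreComments |
                               XmlDiffOptions.IgnorePrefixes);

        var settings = new XmlWriterSettings { Indent = true };
        using (XmlWriter diffgram = XmlWriter.Create("diffgram.xml", settings))
        {
            // "a.xml" and "b.xml" are hypothetical paths to the two documents.
            // The third argument says the inputs are documents, not fragments.
            bool identical = diff.Compare("a.xml", "b.xml", false, diffgram);

            Console.WriteLine(identical
                ? "No differences found."
                : "Differences were written to diffgram.xml.");
        }
    }
}
```

The resulting diffgram is itself an XML document describing the changes, so you would still need to read it to list the changed elements.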

You can use Visual Studio's Paste Special (Edit > Paste Special > Paste XML As Classes) to generate classes for your XML.
Then you can deserialise your XML to create an instance of the generated class.
So for two xml files; you can create two objects like xmlObject1 and xmlObject2.
Then you can use CompareObject to identify the difference between two objects.

Related

Finding all XPaths in a XQuery using Saxon-HE with C#

Situational Background: XSD with SCH
XML Schema (XSD)
I have an XML schema definition ("the schema") that includes several other XSDs, all in the same namespace. Some of those import other XSDs from foreign namespaces. All in all, the schema declares several global elements that can be instantiated as XML documents. Let's call them Global_1, Global_2 and Global_3.
Business Rules (SCH)
The schema is augmented by a Schematron file that defines the "business rules". It defines a number of abstract rules, and each abstract rule contains a number of assertions using the data model defined via XSD. For instance:
<sch:pattern>
<sch:rule id="rule_A" abstract="true">
<sch:assert test="if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true()" id="A-01">Error message</sch:assert>
<sch:assert test="not(abc:c = 'abcd' and abc:d = 'zz')" id="A-02">Some other error message</sch:assert>
</sch:rule>
<!-- (...) -->
</sch:pattern>
Each abstract rule is extended by one or more non-abstract (concrete) rule that defines a specific context in which the abstract rule's assertions are to be validated. For example:
<sch:pattern>
<!-- (...) -->
<sch:rule context="abc:Global_1/abc:x/abc:y">
<sch:extends rule="rule_A"/>
</sch:rule>
<sch:rule context="abc:Global_2/abc:j//abc:k/abc:l">
<sch:extends rule="rule_A"/>
</sch:rule>
<!-- (...) -->
</sch:pattern>
In other words, all the assertions defined within the abstract rule_A are being applied to their specific contexts.
Both "the schema" and "the business rules" are subject to change - my program gets them at run-time and I don't know their content at design-time. The only thing I can safely assume is that there are no endless recursive structures in the schema: There is always one definite leaf node for every type and no type contains itself. Put differently, there are no "infinite loops" possible in the instances.
The Problem I want To Solve
Basically, I want to evaluate programmatically if each of the defined rules is correct. Since correctness can be quite a problematic topic, here by correctness I simply mean: Each XPath used in a rule (i.e. its context and within the XQueries of its inherited assertions) is "possible", meaning it can exist according to the data model defined in the schema. If, for instance, a namespace prefix is forgotten (abc:a/b instead of abc:a/abc:b), this XPath will never return anything other than an empty node set. The same is true if one step in the XPath is accidentally omitted, or spelled wrong, etc. This is obviously not a very strong claim for "correctness" of such a rule, but it'll do for a first step.
My Approach Towards A Solution For This
At least to me, it doesn't seem like a trivial problem to evaluate an XPath (not to speak of the entire XQuery!) designed for the instance of a schema against the actual schema, given that it may contain axis steps like //, ancestor::, following-sibling::, etc. So I decided to construct something I would call a "maximum instance": By recursively iterating through all global elements and their children (and the structure of their respective complex types etc.), I build an XML instance at run-time that contains every possible element and attribute where it would be in a normal instance, but all at once. That includes every optional element/attribute, every element within a choice block and so on. Said maximum instance would look something like this:
<maximumInstance>
<Global_1>
<abc:a>
<abc:b additionalAttribute="some_fixed_value">
<abc:j/>
<abc:k/>
<abc:l/>
</abc:b>
</abc:a>
</Global_1>
<Global_2>
<abc:x>
<abc:y>
<abc:a/>
<abc:z>
<abc:l/>
</abc:z>
</abc:y>
</abc:x>
</Global_2>
<Global_3>
<!-- ... -->
</Global_3>
<!-- ... -->
</maximumInstance>
All it takes now is to iterate over all abstract rules: And for every assertion in each abstract rule it must be checked that for every context the respective abstract rule is extended by, every XPath within an assertion results in a non-empty node set when evaluated against the maximum instance.
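The check described above could be sketched roughly as follows, once the individual XPaths have been extracted. This is only an illustration under several assumptions: the namespace URI urn:example:abc, the tiny hand-built maximum instance, and the helper name IsReachable are all hypothetical, and the built-in .NET XPath engine used here only understands XPath 1.0.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

static class PathCheck
{
    // Returns true if 'path' selects at least one element when evaluated
    // from at least one node matched by 'context' in the maximum instance.
    public static bool IsReachable(XDocument maxInstance, string context,
                                   string path, IXmlNamespaceResolver ns)
    {
        // A Schematron context can match anywhere in the document,
        // hence the leading "//".
        return maxInstance.XPathSelectElements("//" + context, ns)
                          .Any(node => node.XPathSelectElements(path, ns).Any());
    }

    static void Main()
    {
        XNamespace abc = "urn:example:abc"; // hypothetical namespace URI
        var doc = new XDocument(
            new XElement("maximumInstance",
                new XElement(abc + "Global_1",
                    new XElement(abc + "x",
                        new XElement(abc + "y",
                            new XElement(abc + "a",
                                new XElement(abc + "b")))))));

        var ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("abc", abc.NamespaceName);

        // Correctly prefixed path: prints True.
        Console.WriteLine(IsReachable(doc, "abc:Global_1/abc:x/abc:y", "abc:a/abc:b", ns));
        // Forgotten prefix on the last step: prints False.
        Console.WriteLine(IsReachable(doc, "abc:Global_1/abc:x/abc:y", "abc:a/b", ns));
    }
}
```

Note that XPathSelectElements only works for paths that select elements; paths ending in an attribute step would need XPathEvaluate instead.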
Where I'm stuck
I have written a C# (.NET Framework 4.8) program that parses "the schema" into said "maximum instance" (which is an XDocument at run-time). It also parses the business rules into a structure that makes it easy to get each abstract rule, its assertions, and the contexts these assertions are to be validated against.
But currently, I only have each complete XQuery (just like they are in the Schematron file) which effectively creates an assertion. But I actually need to break the XQuery down into its components (I guess I'd need the abstract syntax tree) so that I would have all individual XPaths. For instance, when given the XQuery if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true(), I would need to retrieve abc:a/abc:b and abc:x/abc:y.
I assume that this could be done using Saxon-HE (or maybe another Parser/Compiler currently available for C# I don't know about). Unfortunately, I have yet to understand how to make use of Saxon well enough to even find at least a valid starting point for what I want to achieve. I've been trying to use the abstract syntax tree (so I can access the respective XPaths in the XQuery) seemingly accessible via XQueryExecutable:
Processor processor = new Processor();
XQueryCompiler xqueryCompiler = processor.NewXQueryCompiler();
XQueryExecutable exe = xqueryCompiler.Compile(xquery);
var AST = exe.getUnderlyingCompiledQuery();
var st = new XDocument();
st.Add(new XElement("root"));
XdmNode node = processor.NewDocumentBuilder().Build(st.CreateReader());
AST.explain((node); // <-- this is an error!
But that doesn't get me anywhere: I don't find any properties exposed I could work with? And while VS offers me to use AST.explain(...) (which seems promising), I'm unable to figure out what to parametrize here. I tried using a XdmNode which I thought would be a Destination? But also, I am using Saxon 10 (via NuGet), while Destination seems to be from Saxon 9: net.sf.saxon.s9api.Destination?!
Does anybody who was kind enough to read through all of this have any advice for me on how to tackle this? :-) Or, maybe there's a better way to solve my problem I haven't thought of - I'm also grateful for suggestions.
TL;DR
Sorry for the wall of text! In short: I have Schematron rules that augment an XML schema with business logic. To evaluate these rules (not: validate instances against the rules!) without actual XML instances, I need to break down the XQueries which make up the Schematron's assertions into their components so that I can handle all XPaths used in them. I think it can be done with Saxon-HE, but my knowledge is too limited to even understand what a good starting point would be for that. I'm also open to suggestions regarding a possibly better approach to solve my actual problem (as described in detail above).
Thank you for taking the time to read this.

If this were an XSD schema rather than a Schematron schema, then Saxon-EE would do the job for you automatically: this is very similar to what a schema-aware XQuery processor attempts to do. But another difference is that in schema-aware XQuery, you can't assume that every element named foo is a valid instance of the element declaration named foo in the schema; it's quite legitimate, for example, for a query to transform valid instances into invalid instances, or vice versa. The input and output, after all, might conform to different schemas.
Saxon uses path analysis to do this: it looks at path expressions to see "where they might lead". Path analysis is also used to assess streamability, and to support document projection (building a trimmed-down tree representation of the source document that leaves out the parts that the query cannot reach). The path analysis in Saxon is by no means complete; for example, it doesn't attempt to handle recursive functions. Although all these operations require Saxon-EE, the basic path analysis code is actually present in Saxon-HE, but I would offer no guarantee that it works for any purpose other than those described.
You're basically right that this is a tough problem you've set yourself, and I wish you luck with it.
Another approach you could adopt that wouldn't involve grovelling around the Saxon internals is to convert the XQuery to XQueryX, which is an XML representation of the parse tree, and then inspect the XQueryX (presumably using XQuery) to find the parts you need.

While XQueryX (as pointed out by Michael Kay) would theoretically have been exactly what I was looking for, unfortunately I could not find anything useful regarding an implementation for .NET during my research.
So I eventually solved the whole thing by creating my own parser using the XPath3.1 grammar for ANTLR4 as an ideal starting point. This way, I am now able to retrieve a syntax tree of any Schematron rule expression, allowing me to extract each contained XPath expression (and its sub expressions) separately.
Note that another stumbling block has been the fact that .NET still (!) only natively handles XPath 1.0: While my parser does everything it is supposed to, for some of the found expressions .NET gave me "illegal token" errors when trying to evaluate them. Installing the XPath2 NuGet package by Chertkov/Heyenrath was the solution.

Xml serialization and different languages (internationalization)

What is a good solution for XML serialization in different languages (internationalization), so that tags and attributes in different languages are deserialized to the same object?
I save configuration in XML and want to support German and English for the XML.
I don't really know what a good approach for this issue looks like.
The best solution would be the following, but it doesn't exist:
[XmlRoot("MyEnglishConfig,MeineDeuscheKonfiguration")]
public class Test
{
[XmlElement(ElementName="Setting1,Einstellung1")]
public String value1;
}
so that both XML versions are parsed correctly:
<MyEnglishConfig>
<Setting1>English value</Setting1>
</MyEnglishConfig>
<MeineDeuscheKonfiguration>
<Einstellung1>deutscher Wert</Einstellung1>
</MeineDeuscheKonfiguration>

What is a good solution to do XML serialization in different languages (internationalization) so that tags and attributes are serialized to the same object with different language?
Don't! That simply isn't a good thing to do, and virtually no libraries will help you do it. By all means localize and internationalize a maintenance UI; but leave the xml in a single culture. This is just asking for extreme pain. The type of people who are going to be manually editing an xml file probably aren't going to mind if they need to read tags in a different language.
If you needed to do that, XmlAttributeOverrides can be used to make a per-language serializer, but... yeuch.
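For what it's worth, a per-language serializer built with XmlAttributeOverrides might look roughly like the sketch below. It reuses the Test class and the German element names from the question; the LocalizedSerializers helper is a hypothetical name, and the English names are assumed to be the ones declared on the class itself.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("MyEnglishConfig")]
public class Test
{
    [XmlElement("Setting1")]
    public string value1;
}

public static class LocalizedSerializers
{
    // Build the serializer once and reuse it: serializers created with
    // overrides are not cached by the framework.
    public static readonly XmlSerializer German = CreateGerman();

    static XmlSerializer CreateGerman()
    {
        var overrides = new XmlAttributeOverrides();

        // Replace the root element name declared on the class.
        overrides.Add(typeof(Test), new XmlAttributes
        {
            XmlRoot = new XmlRootAttribute("MeineDeuscheKonfiguration")
        });

        // Replace the element name declared on the value1 field.
        var member = new XmlAttributes();
        member.XmlElements.Add(new XmlElementAttribute("Einstellung1"));
        overrides.Add(typeof(Test), "value1", member);

        return new XmlSerializer(typeof(Test), overrides);
    }

    public static Test ReadGerman(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (Test)German.Deserialize(reader);
        }
    }
}
```

You would need one such serializer per language and some way to decide which one to use for a given file, which is part of why this gets painful quickly.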

Node-By-Node XML Serialization

I need to write out an XML where the order of elements is important (I realize that XML format might not be the right thing to use here, but...). I need something like:
<Author>
<Book>
<Author>
<Book>
The underlying class has elements that look like:
Author[] Author;
Book[] Book;
I am planning on having an index value on the Book and Author classes and using that to write out the XML.
What I am trying to find is if there is an easy way to serialize classes one by one into an XML. I looked at XmlWriter but it looks like it can only be used to write XML at a very basic level (i.e. no serialization support).
Thanks for your help!

If you don't want to use the built-in serialization routines (and sometimes you don't, because your structure may change and it gets harder and harder to use those routines as your structure evolves over time), then an easy way to get where you want to go is to create an XmlDocument. Use XmlDocument.CreateNode to create an XML node for the document, adding attributes to the node as needed, then add the node to the document using XmlDocument.AppendChild (or, as you get deeper into the structure, XmlNode.AppendChild). Finally, use XmlDocument.Save to write the structure out to a file (or memory, or whatever you want to do with it at that point).
The Microsoft System.Xml namespace documentation is quite good. Look there for a lot of examples and then just write quick test programs to verify what you learn from the examples.
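A minimal sketch of that XmlDocument approach, applied to the interleaved Author/Book output from the question, might look like this. The root element name Library, the plain string arrays, and the helper names are assumptions made for illustration only.

```csharp
using System;
using System.Xml;

class InterleavedWriter
{
    // Builds <Library><Author/><Book/><Author/><Book/>...</Library>,
    // alternating between the two arrays by index.
    public static XmlDocument Build(string[] authors, string[] books)
    {
        var doc = new XmlDocument();
        XmlNode root = doc.AppendChild(
            doc.CreateNode(XmlNodeType.Element, "Library", null));

        int count = Math.Max(authors.Length, books.Length);
        for (int i = 0; i < count; i++)
        {
            if (i < authors.Length) AppendText(doc, root, "Author", authors[i]);
            if (i < books.Length) AppendText(doc, root, "Book", books[i]);
        }
        return doc;
    }

    static void AppendText(XmlDocument doc, XmlNode parent, string name, string text)
    {
        XmlNode node = doc.CreateNode(XmlNodeType.Element, name, null);
        node.InnerText = text;
        parent.AppendChild(node);
    }

    static void Main()
    {
        XmlDocument doc = Build(new[] { "A1", "A2" }, new[] { "B1", "B2" });
        doc.Save(Console.Out); // or doc.Save("output.xml")
    }
}
```

Because the nodes are appended in the order you choose, the element order in the output is entirely under your control.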

Walking an XML tree in C#

I'm new to .NET and C#, so I want to make sure I'm using the right tool for the job.
The XML I'm receiving is a description of a directory tree on another machine, so it can go many levels deep. What I need to do now is take the XML, create a structure of objects (custom classes) and populate them with info from the XML input, like File, Folder, Tags, Property...
The tree structure of this XML input makes it, in my mind, a prime candidate for using recursion to walk the tree.
Is there a different way of doing this in .NET 3.5?
I've looked at XmlReaders, but they seem to walk the tree in a linear fashion, which is not really what I'm looking for...
The XML I'm receiving is part of a 3rd party API, so it is outside my control and may change in the future.
I've looked into deserialization, but its shortcomings (black box implementation, need to declare members as public, slow, only works for simple objects...) take it off the list as well.
Thanks for your input on this.

I would use the XLINQ classes in System.Xml.Linq (this is the namespace and the assembly you will need to reference). Load the XML into an XDocument:
XDocument doc = XDocument.Parse(someString);
Next you can either use recursion or a pseudo-recursion loop to iterate over the child nodes. You can choose your child nodes like this:
//if Directory is tag name of Directory XML
//Note: Root is just the root XElement of the document
var directoryElements = doc.Root.Elements("Directory");
//you get the idea
var fileElements = doc.Root.Elements("File");
The variables directoryElements and fileElements will be IEnumerable types, which means you can use something like a foreach to loop through all of the elements. One way to build up your objects would be something like this:
List<MyFileType> files = new List<MyFileType>();
foreach (XElement fileElement in fileElements)
{
    files.Add(new MyFileType()
    {
        // assumes properties are child elements holding string values;
        // the (string) cast yields null if the element is missing
        Prop1 = (string)fileElement.Element("Prop1"),
        Prop2 = (string)fileElement.Element("Prop2"),
    });
}
In the example, MyFileType is a type you created to represent files. This is a bit of a brute-force attack, but it will get the job done.
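To cover arbitrary nesting depth, the same idea can be applied recursively. The following is a minimal sketch that assumes the input uses Directory and File elements with a name attribute; those names, the Folder class and the TreeBuilder helper are hypothetical and would need to match the actual third-party format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Folder
{
    public string Name;
    public List<string> Files = new List<string>();
    public List<Folder> Folders = new List<Folder>();
}

static class TreeBuilder
{
    // Converts one Directory element (and everything below it) into a Folder.
    public static Folder Build(XElement dir)
    {
        var folder = new Folder { Name = (string)dir.Attribute("name") };

        // Direct File children become file names.
        folder.Files.AddRange(
            dir.Elements("File").Select(f => (string)f.Attribute("name")));

        // Direct Directory children are handled by the same method.
        folder.Folders.AddRange(
            dir.Elements("Directory").Select(d => Build(d)));

        return folder;
    }

    static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<Directory name='root'>" +
            "<File name='a.txt'/>" +
            "<Directory name='sub'><File name='b.txt'/></Directory>" +
            "</Directory>");

        Folder root = Build(doc.Root);
        Console.WriteLine(root.Folders[0].Files[0]); // b.txt
    }
}
```

The recursion ends naturally when a Directory element has no Directory children.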
If you want to use XPath, you will need a using directive for System.Xml.XPath.
A Note on System.Xml vs System.Xml.Linq
There are a number of XML classes that have been in .Net since the 1.0 days. These live (mostly) in System.Xml. In .Net 3.5, a wonderful, new set of XML classes were released under System.Xml.Linq. I cannot over-emphasize how much nicer they are to work with than the old classes in System.Xml. I would highly recommend them to any .Net programmer and especially someone just getting into .Net/C#.

XmlReader isn't a particularly friendly API. If you can use .NET 3.5, then loading into LINQ to XML is likely to be your best bet. You could easily use recursion with that.
Otherwise, XmlDocument would still do the trick... just a bit less pleasantly.

This is a problem which is very suitable for recursion.
To elaborate a bit more on what another poster said, you'll want to start by loading the XML into a System.Xml.XmlDocument (using LoadXml or Load).
You can access the root of the tree using the XmlDocument.DocumentElement property, and access the children of each node using the ChildNodes property. ChildNodes returns a collection, and when that collection is empty, you know you've reached your base case.
Using LINQ is also a good option, but I'm unable to elaborate on that solution because I'm not really a LINQ expert.
As Jon mentioned, XmlReader isn't very friendly. If you end up having perf issues, you might want to look into it, but if you just want to get the job done, go with XmlDocument/ChildNodes using recursion.
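The XmlDocument/ChildNodes recursion described above can be sketched as follows. The DomWalker helper and the sample XML are hypothetical; the method just records the path of every element it visits, which is where you would create your own objects instead.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class DomWalker
{
    // Recursively visits 'node' and its descendants, recording a simple
    // slash-separated path for every element.
    public static void CollectPaths(XmlNode node, string parentPath, List<string> paths)
    {
        if (node.NodeType != XmlNodeType.Element)
            return; // skip text, comments, etc.

        string path = parentPath + "/" + node.Name;
        paths.Add(path);

        // Base case: an empty ChildNodes collection ends the recursion.
        foreach (XmlNode child in node.ChildNodes)
            CollectPaths(child, path, paths);
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<Directory><File/><Directory><File/></Directory></Directory>");

        var paths = new List<string>();
        CollectPaths(doc.DocumentElement, "", paths);

        foreach (string p in paths)
            Console.WriteLine(p);
    }
}
```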

Load your XML into an XmlDocument. You can then walk the XmlDocument's DOM using recursion.
You might also want to look into the factory method pattern to create your classes; it would be very useful here.

How to exclude certain class/packages/public members from javadoc

I have created a Java API (ported from C#, to be more specific) which, aside from the public interface, contains a lot of internal stuff that I don't want users to know about. In C#, I used doxygen to generate documentation. I presume javadoc has similar features to exclude certain public members, classes, even packages.
Would someone suggest how to do that, perhaps via Eclipse?
Thanks

I believe that in Eclipse, the only kind of exclusions you can specify are things like "exclude all protected members" and package-based exclusions (not class-based exclusions.)
If you're using Ant to generate them, you can use the nested "package" element, and you can use a fileset nested element and add exclusions to that fileset.
For example:
<javadoc >
<sourcefiles>
<fileset dir="${src}">
<include name="**/*.java"/>
<exclude name="**/ClassToExclude.java"/>
</fileset>
</sourcefiles>
<packageset dir="${src}">
<include name="com/mydomain/**"/>
<exclude name="com/mydomain/excludePackage/**"/>
</packageset>
</javadoc>
P.S. - I've used the <sourcefiles> element a lot, but never the <packageset> element. The latter might not be spot-on syntactically.

Sure, when you "Generate Javadoc", you can select the package concerned by this process and exclude the others
Screenshot: http://www.filigris.com/products/docflex_javadoc/images/eclipse_javadoc_1.png
(here the picture shows the javadoc generated with another tool, but that does not change the general idea)

You can use doxygen with Java. I am not aware of any tools that do what you want with Javadoc.
Superpackages should be coming in JDK 7, which I believe could address this: http://blogs.oracle.com/andreas/entry/superpackages_in_jsr_294
