How to convert xsd to cs with redundancies? (OTA XML Publication)

I'm trying to create a SOAP 1.2 based C# / WCF interface, that is supposed to handle HTNG / OTA messages. (a communication standard for hotels)
The publication of this OTA standard can be found here: Open Travel Alliance - Specifications
This publication contains a bunch of .xsd files that define all the types that can be passed through such an interface. For example for transferring new reservations to a hotel / system, you can use the OTA_HotelResNotifRQ message, that can contain HotelReservations. The SOAP XML would look something like this:
<soapenv:Body>
<OTA_HotelResNotifRQ EchoToken="1474033560.151702" TimeStamp="2016-09-16T06:46:00-08:00" Version="1.001" xmlns="http://www.opentravel.org/OTA/2003/05" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opentravel.org/OTA/2003/05 NeedToGetThisPathFromIdeas/OTA_HotelResNotifRQ.xsd" ResStatus="Modify">
<POS>
...
</POS>
<HotelReservations>
<HotelReservation CreateDateTime="2015-11-15T10:39:01-08:00" ResStatus="Reserved" LastModifyDateTime="2016-09-16T06:46:00-08:00">
<UniqueID Type="14" ID="133121274"/>
<RoomStays>
<RoomStay MarketCode="Other OTA" SourceOfBusiness="OTA">
...
</RoomStay>
</RoomStays>
</HotelReservation>
</HotelReservations>
</OTA_HotelResNotifRQ>
</soapenv:Body>
The problem is that there are multiple messages, and therefore multiple .xsd definitions, that use the same elements / classes. For example, the HotelReservations element mentioned above is used by many of the messages in the publication.
All these .xsd files define the same classes, like HotelReservation or RoomStay, etc. and there is an additional .xsd (the HotelReservation, that is not a RQ or an RS) that defines the types used in these messages. What I'm saying is that these schema definitions are very very redundant.
When I try to generate .cs classes from these files, either with xsd.exe from the .NET Framework or with WSCF.blue, all the types get repeated: for example, HotelReservationType is defined in OTA_HotelResRQ.cs, again in OTA_HotelResNotifRQ.cs, and so on. This of course produces useless code and has Visual Studio yelling "ambiguous reference" all over the place.
How can I convert these .xsd definitions to .cs classes without redundancy, having all types defined only once? Is there a tool that can do this or did Open Travel Alliance really mess up their publications and I'm pretty much screwed?

You need to create a schema file that includes/imports all the ones you need, then generate the code from that single file. Have a look at Working with multiple XML schemas.
Also have a look at Liquid XML Data Binder if xsd.exe doesn't produce the kind of output you want.
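As a rough sketch of the wrapper-schema idea, the following C# builds a schema that does nothing but xs:include a list of files. The class and method names are made up for illustration. It assumes that all listed schemas share one target namespace (xs:include requires that; schemas from other namespaces would need xs:import instead), and that the file names resolve relative to wherever the wrapper is saved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any OTA or Microsoft tooling.
public static class WrapperSchemaBuilder
{
    private static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Builds a schema whose only content is one xs:include per file.
    public static XDocument Build(string targetNamespace, IEnumerable<string> schemaFiles)
    {
        var root = new XElement(Xs + "schema",
            new XAttribute(XNamespace.Xmlns + "xs", Xs.NamespaceName),
            new XAttribute("targetNamespace", targetNamespace),
            new XAttribute("elementFormDefault", "qualified"),
            schemaFiles.Select(file => new XElement(Xs + "include",
                new XAttribute("schemaLocation", file))));
        return new XDocument(root);
    }

    public static void Main()
    {
        // File names taken from the question; adjust to the messages you need.
        XDocument wrapper = Build(
            "http://www.opentravel.org/OTA/2003/05",
            new[] { "OTA_HotelResNotifRQ.xsd", "OTA_HotelResRQ.xsd" });
        wrapper.Save("OTA_All.xsd"); // assumed output name
        Console.WriteLine(wrapper);
    }
}
```

The generated file can then be handed to the code generator as the single input, so each shared type should be emitted once. Whether a given tool follows the includes on its own is something to verify in your setup.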

Related

Finding all XPaths in a XQuery using Saxon-HE with C#

Situational Background: XSD with SCH
XML Schema (XSD)
I have an XML schema definition ("the schema") that includes several other XSDs, all in the same namespace. Some of those import other XSDs from foreign namespaces. All in all, the schema declares several global elements that can be instantiated as XML documents. Let's call them Global_1, Global_2 and Global_3.
Business Rules (SCH)
The schema is augmented by a Schematron file that defines the "business rules". It defines a number of abstract rules, and each abstract rule contains a number of assertions using the data model defined via XSD. For instance:
<sch:pattern>
<sch:rule id="rule_A" abstract="true">
<sch:assert test="if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true()" id="A-01">Error message</sch:assert>
<sch:assert test="not(abc:c = 'abcd' and abc:d = 'zz')" id="A-02">Some other error message</sch:assert>
</sch:rule>
<!-- (...) -->
</sch:pattern>
Each abstract rule is extended by one or more non-abstract (concrete) rule that defines a specific context in which the abstract rule's assertions are to be validated. For example:
<sch:pattern>
<!-- (...) -->
<sch:rule context="abc:Global_1/abc:x/abc:y">
<sch:extends rule="rule_A"/>
</sch:rule>
<sch:rule context="abc:Global_2/abc:j//abc:k/abc:l">
<sch:extends rule="rule_A"/>
</sch:rule>
<!-- (...) -->
</sch:pattern>
In other words, all the assertions defined within the abstract rule_A are being applied to their specific contexts.
Both "the schema" and "the business rules" are subject to change - my program gets them at run-time and I don't know their content at design-time. The only thing I can safely assume is that there are no endless recursive structures in the schema: There is always one definite leaf node for every type and no type contains itself. Put differently, there are no "infinite loops" possible in the instances.
The Problem I want To Solve
Basically, I want to evaluate programmatically if each of the defined rules is correct. Since correctness can be quite a problematic topic, here by correctness I simply mean: Each XPath used in a rule (i.e. its context and within the XQueries of its inherited assertions) is "possible", meaning it can exist according to the data model defined in the schema. If, for instance, a namespace prefix is forgotten (abc:a/b instead of abc:a/abc:b), this XPath will never return anything other than an empty node set. The same is true if one step in the XPath is accidentally omitted, or spelled wrong, etc. This is obviously not a very strong claim for "correctness" of such a rule, but it'll do for a first step.
My Approach Towards A Solution For This
At least to me it doesn't seem like a trivial problem to evaluate an XPath (not to speak of the entire XQuery!) designed for the instance of a schema against the actual schema, given how it may contain axis steps like //, ancestor::, following-sibling::, etc. So I decided to construct something I would call a "maximum instance": By recursively iterating through all global elements and their children (and the structure of their respective complex types etc.), I build an XML instance at run-time that contains every possible element and attribute where it would be in the normal instance, but all at once: every optional element/attribute, every element within a choice block and so on. Said maximum instance would look something like this:
<maximumInstance>
<Global_1>
<abc:a>
<abc:b additionalAttribute="some_fixed_value">
<abc:j/>
<abc:k/>
<abc:l/>
</abc:b>
</abc:a>
</Global_1>
<Global_2>
<abc:x>
<abc:y>
<abc:a/>
<abc:z>
<abc:l/>
</abc:z>
</abc:y>
</abc:x>
</Global_2>
<Global_3>
<!-- ... -->
</Global_3>
<!-- ... -->
</maximumInstance>
All it takes now is to iterate over all abstract rules: And for every assertion in each abstract rule it must be checked that for every context the respective abstract rule is extended by, every XPath within an assertion results in a non-empty node set when evaluated against the maximum instance.
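The per-XPath check described above could be sketched like this with LINQ to XML. The class name, method name and the namespace URI urn:example:abc are assumptions for illustration, and the built-in engine only understands XPath 1.0, so this covers plain path expressions rather than full XQuery.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper for testing a path against the "maximum instance".
public static class XPathReachability
{
    // True if the expression evaluates to a non-empty node set from the given context.
    public static bool SelectsSomething(XNode context, string xpath, IXmlNamespaceResolver ns)
    {
        object result = context.XPathEvaluate(xpath, ns);
        // Node sets come back as an enumerable; strings, numbers and booleans do not count.
        return !(result is string) && result is IEnumerable nodes && nodes.Cast<object>().Any();
    }

    public static void Main()
    {
        var ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("abc", "urn:example:abc"); // assumed namespace URI

        XDocument max = XDocument.Parse(
            "<abc:Global_1 xmlns:abc='urn:example:abc'><abc:a><abc:b/></abc:a></abc:Global_1>");

        Console.WriteLine(SelectsSomething(max.Root, "abc:a/abc:b", ns)); // prefix present
        Console.WriteLine(SelectsSomething(max.Root, "abc:a/b", ns));     // prefix forgotten
    }
}
```

For each concrete rule, the context path would be evaluated first, and every path found in the assertions would then be evaluated relative to each node the context selects.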
Where I'm stuck
I have written a C# (.NET Framework 4.8) program that parses "the schema" into said "maximum instance" (which is an XDocument at run-time). It also parses the business rules into a structure that makes it easy to get each abstract rule, its assertions, and the contexts these assertions are to be validated against.
But currently, I only have each complete XQuery (just like they are in the Schematron file) which effectively creates an assertion. But I actually need to break the XQuery down into its components (I guess I'd need the abstract syntax tree) so that I would have all individual XPaths. For instance, when given the XQuery if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true(), I would need to retrieve abc:a/abc:b and abc:x/abc:y.
I assume that this could be done using Saxon-HE (or maybe another Parser/Compiler currently available for C# I don't know about). Unfortunately, I have yet to understand how to make use of Saxon well enough to even find at least a valid starting point for what I want to achieve. I've been trying to use the abstract syntax tree (so I can access the respective XPaths in the XQuery) seemingly accessible via XQueryExecutable:
Processor processor = new Processor();
XQueryCompiler xqueryCompiler = processor.NewXQueryCompiler();
XQueryExecutable exe = xqueryCompiler.Compile(xquery);
var AST = exe.getUnderlyingCompiledQuery();
var st = new XDocument();
st.Add(new XElement("root"));
XdmNode node = processor.NewDocumentBuilder().Build(st.CreateReader());
AST.explain(node); // <-- this is an error!
But that doesn't get me anywhere: I don't find any properties exposed I could work with? And while VS offers me to use AST.explain(...) (which seems promising), I'm unable to figure out what to parametrize here. I tried using a XdmNode which I thought would be a Destination? But also, I am using Saxon 10 (via NuGet), while Destination seems to be from Saxon 9: net.sf.saxon.s9api.Destination?!
Does anybody who was kind enough to read through all of this have any advice for me on how to tackle this? :-) Or, maybe there's a better way to solve my problem I haven't thought of - I'm also grateful for suggestions.
TL;DR
Sorry for the wall of text! In short: I have Schematron rules that augment an XML schema with business logic. To evaluate these rules (not: validate instances against the rules!) without actual XML instances, I need to break down the XQueries which make up the Schematron's assertions into their components so that I can handle all XPaths used in them. I think it can be done with Saxon-HE, but my knowledge is too limited to even understand what a good starting point would be for that. I'm also open to suggestions regarding a possibly better approach to solve my actual problem (as described in detail above).
Thank you for taking the time to read this.

If this were an XSD schema rather than a Schematron schema, then Saxon-EE would do the job for you automatically: this is very similar to what a schema-aware XQuery processor attempts to do. But another difference is that in schema-aware XQuery, you can't assume that every element named foo is a valid instance of the element declaration named foo in the schema; it's quite legitimate, for example, for a query to transform valid instances into invalid instances, or vice versa. The input and output, after all, might conform to different schemas.
Saxon uses path analysis to do this: it looks at path expressions to see "where they might lead". Path analysis is also used to assess streamability, and to support document projection (building a trimmed-down tree representation of the source document that leaves out the parts that the query cannot reach). The path analysis in Saxon is by no means complete, for example it doesn't attempt to handle recursive functions. Although all these operations require Saxon-EE, the basic path analysis code is actually present in Saxon-HE, but I would offer no guarantee that it works for any purpose other than those described.
You're basically right that this is a tough problem you've set yourself, and I wish you luck with it.
Another approach you could adopt that wouldn't involve grovelling around the Saxon internals is to convert the XQuery to XQueryX, which is an XML representation of the parse tree, and then inspect the XQueryX (presumably using XQuery) to find the parts you need.

While XQueryX (as pointed out by Michael Kay) would theoretically have been exactly what I was looking for, unfortunately I could not find anything useful regarding an implementation for .NET during my research.
So I eventually solved the whole thing by creating my own parser using the XPath3.1 grammar for ANTLR4 as an ideal starting point. This way, I am now able to retrieve a syntax tree of any Schematron rule expression, allowing me to extract each contained XPath expression (and its sub expressions) separately.
Note that another stumbling block has been the fact that .NET still (!) natively handles only XPath 1.0: While my parser does everything as it is supposed to, for some of the found expressions .NET gave me "illegal token" errors when trying to evaluate them. Installing the XPath2 NuGet package by Chertkov/Heyenrath was the solution.

Best way to compare XML in C#

I'm looking for the best way to compare two XML files for differences using C#. Say, for example, I have two XMLs A and B like this:
XML A
<data:TR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pd="http://www.ascentn.com/bpm/XMLSchema">
<data:processFields />
<data:formFields>
<data:TextBox1>111</data:TextBox1>
<data:TextBox2>222</data:TextBox2>
<data:TextBox3>3333</data:TextBox3>
<data:Repeat1_Repeat>
<data:Repeat1>
<data:TextBox4>444</data:TextBox4>
<data:TextBox5>555</data:TextBox5>
</data:Repeat1>
</data:Repeat1_Repeat>
</data:formFields>
</data:TR>
XML B
<data:TR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pd="http://www.ascentn.com/bpm/XMLSchema">
<data:processFields />
<data:formFields>
<data:TextBox1>111</data:TextBox1>
<data:TextBox2>aaa</data:TextBox2>
<data:TextBox3>3333</data:TextBox3>
<data:Repeat1_Repeat>
<data:Repeat1>
<data:TextBox4>bbb</data:TextBox4>
<data:TextBox5>555</data:TextBox5>
</data:Repeat1>
<data:Repeat1>
<data:TextBox4>ccc</data:TextBox4>
<data:TextBox5>ddd</data:TextBox5>
</data:Repeat1>
</data:Repeat1_Repeat>
</data:formFields>
</data:TR>
I'm looking to get only the differences between the two XML files; in this case that would be TextBox2 and TextBox4 and one full node for Repeat1_Repeat.
Is there an easy way to get this? Maybe use some framework? I'm using .NET 4.5.2 so anything recent would work too!
Thanks!
EDIT : Oh and also, I need it to work for n-level of nesting.

I think XMLDiff is the best way. No framework needed. As seen on MSDN:
By using the XMLDiff class, the programmer is able to determine if the
two files are in fact different based on the conditions that are
important to their application. The programmer is able to ignore
changes that are only superficial (for example, different prefixes for
same namespace). XMLPatch then provides the ability to update the
original XML by applying only the changes that matter to the original
XML.
You should check it out:
https://msdn.microsoft.com/en-us/library/aa302294.aspx
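If pulling in XmlDiff is not an option, a much cruder comparison can be sketched with LINQ to XML alone. This is not XmlDiff and makes strong assumptions: children are paired by position, attributes and mixed content are ignored, and repeated elements are reported under the same path. All names here are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Naive positional diff; a sketch, not a replacement for a real diff library.
public static class NaiveXmlDiff
{
    public static List<string> Diff(XElement a, XElement b, string parentPath = "")
    {
        var diffs = new List<string>();
        if (a == null && b == null) return diffs;
        if (a == null) { diffs.Add("added: " + parentPath + "/" + b.Name.LocalName); return diffs; }
        if (b == null) { diffs.Add("removed: " + parentPath + "/" + a.Name.LocalName); return diffs; }
        if (a.Name != b.Name)
        {
            // Different elements at the same position: report both sides.
            diffs.Add("removed: " + parentPath + "/" + a.Name.LocalName);
            diffs.Add("added: " + parentPath + "/" + b.Name.LocalName);
            return diffs;
        }

        string here = parentPath + "/" + a.Name.LocalName;
        List<XElement> childrenA = a.Elements().ToList();
        List<XElement> childrenB = b.Elements().ToList();

        if (childrenA.Count == 0 && childrenB.Count == 0)
        {
            // Leaf elements: compare text content.
            if (a.Value != b.Value) diffs.Add("changed: " + here);
            return diffs;
        }

        // Recurse into children pairwise; works for any nesting depth.
        int count = Math.Max(childrenA.Count, childrenB.Count);
        for (int i = 0; i < count; i++)
        {
            diffs.AddRange(Diff(
                i < childrenA.Count ? childrenA[i] : null,
                i < childrenB.Count ? childrenB[i] : null,
                here));
        }
        return diffs;
    }

    public static void Main()
    {
        XElement a = XElement.Parse("<r><x>1</x><y>2</y></r>");
        XElement b = XElement.Parse("<r><x>1</x><y>3</y><z/></r>");
        foreach (string line in Diff(a, b)) Console.WriteLine(line);
    }
}
```

Positional pairing breaks down as soon as an element is inserted in the middle of a list, which is exactly the kind of case a dedicated diff tool handles better.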

You can use Paste Special to generate classes for your XML.
Then you can deserialise your xml to create an instance of your code generated class.
So for two xml files; you can create two objects like xmlObject1 and xmlObject2.
Then you can use CompareObject to identify the difference between two objects.

Generate a C# object based on an xml file?

This may be way out in left field, crazy, but I just need to ask before I go on implementing this massive set of classes.
Basically, I'm writing a binary message parser that decodes a certain military message format into an object. The problem is that there are literally hundreds of different message types and they share almost nothing in common with each other. So the way I'm planning to implement this is to create hundreds of different objects.
However, even though the message attributes share nothing in common, the method for decoding them is fairly straightforward and follows a pattern. So I'm planning to write a code generator to generate all the objects and the decode logic for each message type.
What would be really sweet is if there was some way to dynamically create an object based on some schema. It doesn't necessarily have to be XML, but XML is pretty easy to work with.
Is this possible in C#?
I would like the interface to look something like this:
var decodedMessage = MessageDecoder.Decode(byteArray);
Where the MessageDecoder figures out what type of message it is and then returns the appropriate object. It will probably return an interface which implements a MessageType Property or something like that.
Basically what I'm wondering is if there is a way to have one object called Message, which implements a MessageType Property. And then Depending on the MessageType, the Message object transforms into whatever type of message it is, so I don't have to spend the time creating all of these message types.

Use ExpandoObject, which lets you dynamically add fields to an object.
A good starting point is here.
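A minimal sketch of the ExpandoObject idea, assuming the decoder has already produced field names and values. The helper name ToMessage and the sample field names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

public static class ExpandoDemo
{
    // Hypothetical helper: turns decoded name/value pairs into a dynamic object.
    public static dynamic ToMessage(string messageType, IDictionary<string, object> fields)
    {
        var message = new ExpandoObject();
        // ExpandoObject exposes its members through IDictionary<string, object>.
        var bag = (IDictionary<string, object>)message;
        bag["MessageType"] = messageType;
        foreach (KeyValuePair<string, object> field in fields)
        {
            bag[field.Key] = field.Value;
        }
        return message;
    }

    public static void Main()
    {
        dynamic msg = ToMessage("PositionReport",
            new Dictionary<string, object> { ["Latitude"] = 48.1 });
        // Members are resolved at run time, so there is no compile-time checking here.
        Console.WriteLine(msg.MessageType);
        Console.WriteLine(msg.Latitude);
    }
}
```

The trade-off is that a misspelled member name only fails at run time, which matters when there are hundreds of message types.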

Is xsd.exe what you are looking for? It can take an XML file or a schema and generate the c# classes. One problem that you might encounter though is that some of the military message formats are VERY obtuse. You could end up with some very large code files.

Look at T4 templates. They let you write code to generate code, they are integrated into the IDE, and they are quite easy really.
EDIT: There is no way to do what you are after with var, because var requires the right-hand side of the assignment to be statically typed (at compile time). I suppose that you could dynamically generate that statement, then compile and run it, but that's a very painful approach.
If you have XSDs for all of the message types, then you can use xsd.exe as @jle suggests. If not, then I am curious about the following:
// Let's assume this works
var decodedMessage = MessageDecoder.Decode(byteArray);
// Now what? I don't know what properties there are on decodedMessage, so I can't do anything with it.

XmlSerializer.Deserialize - ignore unnecessary elements?

I've got an XSD schema which I've generated a class for using xsd.exe, and I'm trying to use XmlSerializer.Deserialize to create an instance of that class from an XML file that is supposed to conform to the XSD schema. Unfortunately the XML file has some extra elements that the schema is not expecting, which causes a System.InvalidOperationException to be thrown from Deserialize.
I've tried adding <xs:any> elements to my schema but this doesn't seem to make any difference.
My question is: is there any way to get XmlSerializer.Deserialize to ignore these extra elements?

I usually add extra properties or fields to all entity classes to pick up extra elements and attributes, looking something like the code below:
[XmlAnyAttribute]
public XmlAttribute[] AnyAttributes;
[XmlAnyElement]
public XmlElement[] AnyElements;
Depending on the complexity of your generated code, you may not find hand-inserting this code on every entity appealing. Perhaps only-slightly-less-tedious is defining these attributes in a base class and ensuring all entities inherit the base.
To give fair attribution, I was first introduced to this pattern when reading the source code for DasBlog.
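To show the pattern end to end, here is a small sketch with a made-up Person class (not generated by xsd.exe) and a document that carries one attribute and one element the class does not know about.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical entity class used only for this illustration.
public class Person
{
    public string Name;

    [XmlAnyAttribute]
    public XmlAttribute[] AnyAttributes; // collects attributes with no matching member

    [XmlAnyElement]
    public XmlElement[] AnyElements;     // collects elements with no matching member
}

public static class AnyElementDemo
{
    public static Person Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Person));
        using (var reader = new StringReader(xml))
        {
            return (Person)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Person p = Load("<Person nickname='Al'><Name>Ann</Name><Shoe>42</Shoe></Person>");
        Console.WriteLine(p.Name);
        Console.WriteLine(p.AnyElements[0].Name);   // the unexpected element
        Console.WriteLine(p.AnyAttributes[0].Name); // the unexpected attribute
    }
}
```

The extra content is kept rather than discarded, so it also survives a round trip if the object is serialized again.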

I don't think there is an option to do this. You either have to fix the schema or manually modify the code generated by xsd.exe to allow the XML to be deserialized. You can also try to open the XML document + schema in Visual Studio or any other XML editor with schema support to either fix the schema or the XML document.

How can I transform an object graph to an external XML format

I have to send information to a third party in an XML format they have specified, a very common task I'm sure.
I have set of XSD files and, using XSD.exe, I have created a set of types. To generate the XML I map the values from the types within my domain to the 3rd party types:
public ExternalBar Map(InternalFoo foo) {
var bar = new ExternalBar();
bar.GivenName = foo.FirstName;
bar.FamilyName = foo.LastName;
return bar;
}
I will then use the XmlSerializer to generate the files, probably validating them against the XSD before releasing them.
This method is very manual though and I wonder if there is a better way using the Framework or external tools to map the data and create the files.

LINQ to XML works quite well for this... e.g.
XElement results = new XElement("ExternalFoos",
from f in internalFoos
select new XElement("ExternalFoo", new XAttribute[] {
new XAttribute("GivenName", f.FirstName),
new XAttribute("FamilyName", f.LastName) } ));

Firstly, I'm assuming that the object properties in your existing domain map to the 3rd party types without much manipulation, except for the repetitive property assignments.
So I'd recommend just using standard XML serialization of your domain tree (generate an outbound schema for your classes using XSD), then post-processing the result via a set of XSLT stylesheets. Then after post-processing, validate the resulting XML documents against the 3rd party schemas.
It'll probably be more complicated than that, because it really depends on the complexity of the mapping between the object domains, but this is a method that I've used successfully in the past.
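The post-processing step could be sketched with XslCompiledTransform as below. The element names InternalFoo and ExternalBar follow the question's example; the stylesheet itself and the helper name are assumptions, and a real mapping would live in separate .xslt files.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: applies an XSLT 1.0 stylesheet to serialized domain XML.
public static class XsltPostProcessor
{
    public static string Transform(string inputXml, string stylesheetXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(stylesheet);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string stylesheet =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:output omit-xml-declaration='yes'/>" +
            "<xsl:template match='/InternalFoo'>" +
            "<ExternalBar>" +
            "<GivenName><xsl:value-of select='FirstName'/></GivenName>" +
            "<FamilyName><xsl:value-of select='LastName'/></FamilyName>" +
            "</ExternalBar>" +
            "</xsl:template>" +
            "</xsl:stylesheet>";

        string input = "<InternalFoo><FirstName>Ann</FirstName><LastName>Lee</LastName></InternalFoo>";
        Console.WriteLine(Transform(input, stylesheet));
    }
}
```

Keeping the mapping in stylesheets means a change in the third party's format can often be handled without recompiling, at the cost of maintaining XSLT alongside the C# code.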
As far as GUI tools are concerned I've heard (but not used myself) that Stylus Studio is pretty good for schema-to-schema mappings (screenshot here).
