Finding all XPaths in a XQuery using Saxon-HE with C#

Finding all XPaths in a XQuery using Saxon-HE with C# - c#

Situational Background: XSD with SCH
XML Schema (XSD)
I have an XML schema definition ("the schema") that includes several other XSDs, all in the same namespace. Some of those import other XSDs from foreign namespaces. All in all, the schema declares several global elements that can be instantiated as XML documents. Let's call them Global_1, Global_2 and Global_3.
Business Rules (SCH)
The schema is augmented by a Schematron file that defines the "business rules". It defines a number of abstract rules, and each abstract rule contains a number of assertions using the data model defined via XSD. For instance:
<sch:pattern>
<sch:rule id="rule_A" abstract="true">
<sch:assert test="if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true()" id="A-01">Error message</sch:assert>
<sch:assert test="not(abc:c = 'abcd' and abc:d = 'zz')" id="A-02">Some other error message</sch:assert>
</sch:rule>
<!-- (...) -->
</sch:pattern>
Each abstract rule is extended by one or more non-abstract (concrete) rule that defines a specific context in which the abstract rule's assertions are to be validated. For example:
<sch:pattern>
<!-- (...) -->
<sch:rule context="abc:Global_1/abc:x/abc:y">
<sch:extends rule="rule_A"/>
</sch:rule>
<sch:rule context="abc:Global_2/abc:j//abc:k/abc:l">
<sch:extends rule="rule_A"/>
</sch:rule>
<!-- (...) -->
</sch:pattern>
In other words, all the assertions defined within the abstract rule_A are being applied to their specific contexts.
Both "the schema" and "the business rules" are subject to change - my program gets them at run-time and I don't know their content at design-time. The only thing I can safely assume is that there are no endless recursive structures in the schema: There is always one definite leaf node for every type and no type contains itself. Put differently, there are no "infinite loops" possible in the instances.
The Problem I want To Solve
Basically, I want to evaluate programmatically if each of the defined rules is correct. Since correctness can be quite a problematic topic, here by correctness I simply mean: Each XPath used in a rule (i.e. its context and within the XQueries of its inherited assertions) is "possible", meaning it can exist according to the data model defined in the schema. If, for instance, a namespace prefix is forgotten (abc:a/b instead of abc:a/abc:b), this XPath will never return anything other than an empty node set. The same is true if one step in the XPath is accidentally omitted, or spelled wrong, etc. This is obviously not a very strong claim for "correctness" of such a rule, but it'll do for a first step.
My Approach Towards A Solution For This
At least to me it doesn't seem like a trivial problem to evaluate an XPath (not to speak of the entire XQuery!) designed for the instance of a schema against the actual schema, given how it may contain axis steps like //, ancestor::, sibling::, etc. So I decided to construct something I would call a "maximum instance": By recursively iterating through all global elements and their children (and the structure of their respective complex types etc.), I build an XML instance at run-time that contains every possible element and attribute where it would be in the normal instance, but all at once. So every optional element/attribute, every element within a choice block and so on. So, said maximum instance would look something like this:
<maximumInstance>
<Global_1>
<abc:a>
<abc:b additionalAttribute="some_fixed_value">
<abc:j/>
<abc:k/>
<abc:l/>
</abc:b>
</abc:a>
</Global_1>
<Global_2>
<abc:x>
<abc:y>
<abc:a/>
<abc:z>
<abc:l/>
</abc:z>
</abc:y>
</abc:x>
</Global_2>
<Global_3>
<!-- ... -->
</Global_3>
<!-- ... -->
</maximumInstance>
All it takes now is to iterate over all abstract rules: And for every assertion in each abstract rule it must be checked that for every context the respective abstract rule is extended by, every XPath within an assertion results in a non-empty node set when evaluated against the maximum instance.
Where I'm stuck
I have written a C# (.NET Framework 4.8) program that parses "the schema" into said "maximum instance" (which is an XDocument at run-time). It also parses the business rules into a structure that makes it easy to get each abstract rule, its assertions, and the contexts these assertions are to be validated against.
But currently, I only have each complete XQuery (just like they are in the Schematron file) which effectively creates an assertion. But I actually need to break the XQuery down into its components (I guess I'd need the abstract syntax tree) so that I would have all individual XPaths. For instance, when given the XQuery if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true(), I would need to retrieve abc:a/abc:b and abc:x/abc:y.
I assume that this could be done using Saxon-HE (or maybe another Parser/Compiler currently available for C# I don't know about). Unfortunately, I have yet to understand how to make use of Saxon well enough to even find at least a valid starting point for what I want to achieve. I've been trying to use the abstract syntax tree (so I can access the respective XPaths in the XQuery) seemingly accessible via XQueryExecutable:
Processor processor = new Processor();
XQueryCompiler xqueryCompiler = processor.NewXQueryCompiler();
XQueryExecutable exe = xqueryCompiler.Compile(xquery);
var AST = exe.getUnderlyingCompiledQuery();
var st = new XDocument();
st.Add(new XElement("root"));
XdmNode node = processor.NewDocumentBuilder().Build(st.CreateReader());
AST.explain((node); // <-- this is an error!
But that doesn't get me anywhere: I don't find any properties exposed I could work with? And while VS offers me to use AST.explain(...) (which seems promising), I'm unable to figure out what to parametrize here. I tried using a XdmNode which I thought would be a Destination? But also, I am using Saxon 10 (via NuGet), while Destination seems to be from Saxon 9: net.sf.saxon.s9api.Destination?!
Does anybody who was kind enough to read through all of this have any advice for me on how to tackle this? :-) Or, maybe there's a better way to solve my problem I haven't thought of - I'm also grateful for suggestions.
TL;DR
Sorry for the wall of text! In short: I have Schematron rules that augment an XML schema with business logic. To evaluate these rules (not: validate instances against the rules!) without actual XML instances, I need to break down the XQueries which make up the Schematron's assertions into their components so that I can handle all XPaths used in them. I think it can be done with Saxon-HE, but my knowledge is too limited to even understand what a good starting point what be for that. I'm also open for suggestions regarding a possibly better approach to solve my actual problem (as described in detail above).
Thank you for taking the time to read this.

If this were an XSD schema rather than a Schematron schema, then Saxon-EE would do the job for you automatically: this is very similar what a schema-aware XQuery processor attempts to do. But another difference is that in schema-aware XQuery, you can't assume that every element named foo is a valid instance of the element declaration named foo in the schema; it's quite legitimate, for example, for a query to transform valid instances into invalid instances, or vice versa. The input and output, after all, might conform to different schemas.
Saxon uses path analysis to do this: it looks at path expressions to see "where they might lead". Path analysis is also used to assess streamability, and to support document projection (building a trimmed-down tree representation of the source document that leaves out the parts that the query cannot reach). The path analysis in Saxon is by no means complete, for example it doesn't attempt to handle recursive functions. Although all these operations require Saxon-EE, the basic path analysis code is actually present in Saxon-HE, but I would offer no guarantee that it works for any purpose other than those described.
You're basically right that this is a tough problem you've set yourself, and I wish you luck with it.
Another approach you could adopt that wouldn't involve grovelling around the Saxon internals is to convert the XQuery to XQueryX, which is an XML representation of the parse tree, and then inspect the XQueryX (presumably using XQuery) to find the parts you need.

While XQueryX (as pointed out by Michael Kay) would theoretically have been exactly what I was looking for, unfortunately I could not find anything useful regarding an implementation for .NET during my research.
So I eventually solved the whole thing by creating my own parser using the XPath3.1 grammar for ANTLR4 as an ideal starting point. This way, I am now able to retrieve a syntax tree of any Schematron rule expression, allowing me to extract each contained XPath expression (and its sub expressions) separately.
Note that another stumbling block has been the fact that .NET still (!) only handles XPath 1.0 genuinely: While my parser does everything as supposed to, for some of the found expressions .NET gave me "illegal token" errors when trying to evaluate them. Installing the XPath2 NuGet package by Chertkov/Heyenrath was the solution.

Related

XML Serializer, Port from MS to Mono, Order

Circumstances made it necessary to run a .NET 3.5 application on Mono. The application does not use special libraries, but relies heavily on the XML Serializer to load a complex model.
Concrete I have the following situation, an XML statement like:
<compare-clause>
<first-element />
<second-element />
</compare-clause>
While first-element and second-element can be several different types, the order is important, as it defines which is the left and which the right operand. This works always fine on MS, but on Mono (with some elements) there is very strange behaviour. Mono ignores assigning the first property, assigns first-element to the second property on the target object, and leaves out the XML second-element completely.
The XML annotation on the class looks like this:
[XmlElement("first-element"), Type=typeof(FirstElementType), Order=1]
[XmlElement("second-element"), Type=typeof(SecondElementType), Order=1]
public ElementBase Left { ... }
[XmlElement("first-element"), Type=typeof(FirstElementType), Order=2]
[XmlElement("second-element"), Type=typeof(SecondElementType), Order=2]
public ElementBase Right { ... }
As I said, it does not work under some circumstances, sometimes swapping the order helps, although all types have the same structure and the annotation on both properties is the same, except for Order=1,2. This has always worked on MS .NET. Maybe it has something to do, if first-element comes before second-element on the annotation, although this should be completely unrelated according to general attribute specification (order does not matter).
Maybe this is still a bug with Mono XML Serialzer or I have misused it and it just works on MS. Nevertheless, I had to use the latest version of the XML Serializer (January 2013) in order to get the application working at all!
I would be grateful for hints on this.
Best Regards

Ok, I have carefully read the MS .NET Framework documentation again. Seems multiple notation of XMLElementAttribute is only considered an official option for collection-based properties. However, it is misleading that no proper error message is generated when creating the serializer then (both on MS and Mono).
I have made a workaround now for deserializing multiple types by adding an object[] property to the object, and setting XMLIgnore on the official properties of the model. That way, I can also distinguish between several types of elements, even if there is only one element expected.
To make things further clear, to those interested: I needed to deserialize a couple of object types, which are always identified through their element name, on different places within the document hierarchy. However, scalar polymorphic properties seem to serialize/deserialize only with the name specified on the property, and not on the type. That means type discrimination is only based on an attribute (xsi:type) and not on element name.
I have already researched for alternative XML serializer libraries. Although there are a few options with significant improvements, neither seems to be completely flexible or completed, especially in terms of class hierarchies.

C# Unit Testing: iterating through expected results list

How to parametrize a C# unit test so that instead of a series of similar assert statements the test would iterate through a list of parameters (incl. expected values) and compare result with the expected values?
Use case:
this particular unit test needs to check an XML document and go through a list of XML element names, verifying that the document contains these elements and their values match what is expected
Assert part of the test method consists of series of assertions like this:
var width = output.Element(namespace + "width");
Assert.IsNotNull(width);
Assert.AreEqual(width.Value, "600");
I would like to avoid redundant code and iterate through the same code with different values instead. How do I define a data structure to iterate through in the assertion checking?
The data structure needed is a list of tuples (containing elements of types (XName, string) in this case). How to express that in C#? Are there some standard unit-testing tools that can help here?
More information:
using the Visual Studio unit-testing framework (Microsoft.VisualStudio.TestTools.UnitTesting) and .Net 3.5
I do not need to run the use case itself with various parameter values, just the assert part of it (the code quoted above)

Nunit has something called TestCases which you access via an attribute. This sounds like something you are asking for:
http://nunit.org/?p=testCase&r=2.5
UPDATE:
This answer was provided prior to the question update specifying the framework being used
UPDATE
This question also looks like it has relevancy: MS Test Equivalent (or lack of)
Does MSTest have an equivalent to NUnit's TestCase?

It seems like you just need to extract a method to meet your literal requested need:
public void AssertElementExistsWithValue(XmlElement parent, string nameSpace, string childName, string value)
{
var child = parent.Element(namespace + childName);
Assert.IsNotNull(child);
Assert.AreEqual(child.Value, "600");
}
I usually use the Linq Xml classes, so I apologize if I have a compile error. You'll get the gist, I'm sure.
When I test xml formatting, I usually write two tests. The first is a round trip test: write the entity to xml, read it back, assert that they are the same. This is a nice value oriented test that doesn't break if you change the name of an element.
The second test I write is one that pins the format of XML exactly. I get the xml from a correctly formatted object, and use it as a constant in a test, and assert the correct object is created. This test fails for implementation detail reasons, but that's ok. It is there to force me to notice if I break backwards compatibility with data formats.

Since you say that you are using mstest, here are couple of resources from MSDN and independent blog that illustrate Data Driven Testing using Microsoft's unit testing framework.
In summary, you need to specify a DataSource attribute in your TestMethod and point it to the source of data. It can be a CSV or SQL Server CE.

Is it possible to use a LINQ Query in Windows Workflow rule condition?

I am in the process of prototyping an implementation of a rules engine to help us with our ordering portals. For example giving discounts on items or requiring approval if certain items are ordered. We would also like to be able to add rules for dollar amounts, user hierarchy positions, and be apply it one client or more.
I feel that WWF is a good answer to this need.
All of that said however I am having a little difficulty figuring out how best to set up some of the more complex rules. I have a "condition" that I feel is best described in a LINQ query, like so:
var y = from ol in currentOrder.OrderLines where ol.ItemCode == "MYITEMCODE" select ol;
I am not against using a different framework for a rules engine or adding additional properties/methods to our objects (ex: OrderHasItem(ItemCode), etc) to make these rules more simplified but I would like to avoid having to do that. It feels self-defeating in that it forces us down the road of potentially requiring code changes for new rules.

Yes, you can use Linq queries with Workflow. In WF what you are referring to as a rule is an expression that is evaluated at runtime. Your query is selecting a subset of the orderliness based on a criteria.
For example, if I have a collection of names and I want to see only names that begin with 'R'. I could write the following code.
private static void ShowQueryWithCode(IEnumerable<string> names)
{
Console.WriteLine("LINQ Query in Code - show names that start with 'R'");
// Assuming there are no null entries in the names collection
var query = from name in names where name.StartsWith("R") select name;
// This is the same thing as
// var query = names.Where(name => name.StartsWith("R"));
foreach (var name in query)
{
Console.WriteLine(name);
}
Console.WriteLine();
}
To do the same with Workflow
Step 1: Create a Workflow with an In Argument of type IEnumerable
Here you can see that I've added the in argument
Step 2: Add a Variable for the query of type IEnumerable
Before you can add a variable you need to include an activity which has variables. In this workflow I've added a sequence.
Step 3: Assign the query the LINQ expression you want to use
You can use a method chain or query syntax.
Step 4: Iterate over the collection
In the completed workflow I used a ForEach activity to iterate the list of names and write them to the console.
This example uses C# in .NET 4.5 but the same technique can be used with Visual Basic.
You can download the sample code here

WF is a workflow engine. It's used to run factories and banks. The rule engine is just a small part of it. To make sense, any normal WF project requires a dedicated team of professionals to build it. Seems like an overkill for your specific purpose. You are very likely to bury yourself and your project in a typical fight between requirements and real skills of your team.
The use of any available .NET engine would be more justified in your situation. Keep in mind that building a custom rule engine is not an easy tasks, no matter how simple it seems from the beginning. Setting property values of a class (typically called a "fact" or "source" object) or executing actions (invoking class' methods) is what rules engines do best. And it seems that it's exactly what you need. Check out some available .NET engines. They are inexpensive, if not free, and reliable.

Find number of serialized objects

My issue is trying to determine a number of objects created, the objects being serialized from an XML document. The XML document should be set up for simplicity, so any developer can add an additional object and need no further modification to the code. However each of these objects need to be handled/updated seperately, and specifically, some of the objects are of different sub-classes, which need to be handled differently. So what would be my simplest course of action, allowing other to add objects via the XML, but still ensuring the proper logic happenes for each?

This is totally a bad idea, but if you want something constructive...
Model your XML document objects and include some kind of known syntax for you to specify Lambda expressions in it. So if you enter a
<BinaryExpresion>
<NodeType>Add</NodeType>
<Left>3</Left>
<Right>4</Right>
</BinaryExpression>
Then when you read and compile the expression, you could run that code against the data if the XML object and do something (in this case, executing 3 + 4)

Walking an XML tree in C#

I'm new to .net and c#, so I want to make sure i'm using the right tool for the job.
The XML i'm receiving is a description of a directory tree on another machine, so it go many levels deep. What I need to do now is to take the XML and create a structure of objects (custom classes) and populate them with info from the XML input, like File, Folder, Tags, Property...
The Tree stucture of this XML input makes it, in my mind, a prime candidate for using recursion to walk the tree.
Is there a different way of doing this in .net 3.5?
I've looked at XmlReaders, but they seem to be walking the tree in a linear fashion, not really what i'm looking for...
The XML i'm receiving is part of a 3rd party api, so is outside my control, and may change in the futures.
I've looked into Deserialization, but it's shortcomings (black box implementation, need to declare members a public, slow, only works for simple objects...) takes it out of the list as well.
Thanks for your input on this.

I would use the XLINQ classes in System.Xml.Linq (this is the namespace and the assembly you will need to reference). Load the XML into and XDocument:
XDocument doc = XDocument.Parse(someString);
Next you can either use recursion or a pseudo-recursion loop to iterate over the child nodes. You can choose you child nodes like:
//if Directory is tag name of Directory XML
//Note: Root is just the root XElement of the document
var directoryElements = doc.Root.Elements("Directory");
//you get the idea
var fileElements = doc.Root.Elements("File");
The variables directoryElements and fileElements will be IEnumerable types, which means you can use something like a foreach to loop through all of the elements. One way to build up you elements would be something like this:
List<MyFileType> files = new List<MyFileType>();
foreach(XElelement fileElement in fileElements)
{
files.Add(new MyFileType()
{
Prop1 = fileElement.Element("Prop1"), //assumes properties are elements
Prop2 = fileElement.Element("Prop2"),
});
}
In the example, MyFileType is a type you created to represent files. This is a bit of a brute-force attack, but it will get the job done.
If you want to use XPath you will need to using System.Xml.XPath.
A Note on System.Xml vs System.Xml.Linq
There are a number of XML classes that have been in .Net since the 1.0 days. These live (mostly) in System.Xml. In .Net 3.5, a wonderful, new set of XML classes were released under System.Xml.Linq. I cannot over-emphasize how much nicer they are to work with than the old classes in System.Xml. I would highly recommend them to any .Net programmer and especially someone just getting into .Net/C#.

XmlReader isn't a particularly friendly API. If you can use .NET 3.5, then loading into LINQ to XML is likely to be your best bet. You could easily use recursion with that.
Otherwise, XmlDocument would still do the trick... just a bit less pleasantly.

This is a problem which is very suitable for recursion.
To elaborate a bit more on what another poster said, you'll want to start by loading the XML into a System.Xml.XmlDocument, (using LoadXml or Load).
You can access the root of the tree using the XmlDocument.DocumentElement property, and access the children of each node by using the ChildNodes property. Child nodes returns a collection, and when the Collection is of size 0, you know you'll have reached your base case.
Using LINQ is also a good option, but I'm unable to elaborate on this solution, cause I'm not really a LINQ expert.
As Jon mentioned, XmlReader isn't very friendly. If you end up having perf issues, you might want to look into it, but if you just want to get the job done, go with XmlDocument/ChildNodes using recursion.

Load your XML into an XMLDocument. You can then walk the XMLDocuments DOM using recursion.
You might want to also look into the factory method pattern to create your classes, would be very useful here.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.