How does XPath deal with XML namespaces?
If I use
/IntuitResponse/QueryResponse/Bill/Id
to parse the XML document below I get 0 nodes back.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
time="2016-10-14T10:48:39.109-07:00">
<QueryResponse startPosition="1" maxResults="79" totalCount="79">
<Bill domain="QBO" sparse="false">
<Id>=1</Id>
</Bill>
</QueryResponse>
</IntuitResponse>
However, I'm not specifying the namespace in the XPath (i.e. http://schema.intuit.com/finance/v3 is not a prefix of each token of the path). How can XPath know which Id I want if I don't tell it explicitly? I suppose in this case (since there is only one namespace) XPath could get away with ignoring the xmlns entirely. But if there are multiple namespaces, things could get ugly.
XPath 1.0/2.0
Defining namespaces in XPath (recommended)
XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.
It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.
Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.
(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)
C#:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
XmlNodeList nodes = el.SelectNodes(#"/i:IntuitResponse/i:QueryResponse", nsmgr);
Google Docs:
Unfortunately, IMPORTXML() does not provide a namespace prefix binding mechanism. See next section, Defeating namespaces in XPath, for how to use local-name() as a work-around.
Java (SAX):
NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("i", "http://schema.intuit.com/finance/v3");
Java (XPath):
xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "i": return "http://schema.intuit.com/finance/v3";
// ...
}
});
Remember to call
DocumentBuilderFactory.setNamespaceAware(true).
See also:
Java XPath: Queries with default namespace xmlns
JavaScript:
See Implementing a User Defined Namespace Resolver:
function nsResolver(prefix) {
var ns = {
'i' : 'http://schema.intuit.com/finance/v3'
};
return ns[prefix] || null;
}
document.evaluate( '/i:IntuitResponse/i:QueryResponse',
document, nsResolver, XPathResult.ANY_TYPE,
null );
Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a customer nsResolver().
Perl (LibXML):
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
my #nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');
Python (lxml):
from lxml import etree
f = StringIO('<IntuitResponse>...</IntuitResponse>')
doc = etree.parse(f)
r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
namespaces={'i':'http://schema.intuit.com/finance/v3'})
Python (ElementTree):
namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
root.findall('/i:IntuitResponse/i:QueryResponse', namespaces)
Python (Scrapy):
response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
response.xpath('/i:IntuitResponse/i:QueryResponse').getall()
PhP:
Adapted from #Tomalak's answer using DOMDocument:
$result = new DOMDocument();
$result->loadXML($xml);
$xpath = new DOMXpath($result);
$xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");
$result = $xpath->query("/i:IntuitResponse/i:QueryResponse");
See also #IMSoP's canonical Q/A on PHP SimpleXML namespaces.
Ruby (Nokogiri):
puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
'i' => "http://schema.intuit.com/finance/v3")
Note that Nokogiri supports removal of namespaces,
doc.remove_namespaces!
but see the below warnings discouraging the defeating of XML namespaces.
VBA:
xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
doc.setProperty "SelectionNamespaces", xmlNS
Set queryResponseElement =doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")
VB.NET:
xmlDoc = New XmlDocument()
xmlDoc.Load("file.xml")
nsmgr = New XmlNamespaceManager(New XmlNameTable())
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
nodes = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse",
nsmgr)
SoapUI (doc):
declare namespace i='http://schema.intuit.com/finance/v3';
/i:IntuitResponse/i:QueryResponse
xmlstarlet:
-N i="http://schema.intuit.com/finance/v3"
XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:i="http://schema.intuit.com/finance/v3">
...
Once you've declared a namespace prefix, your XPath can be written to use it:
/i:IntuitResponse/i:QueryResponse
Defeating namespaces in XPath (not recommended)
An alternative is to write predicates that test against local-name():
/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']
Or, in XPath 2.0:
/*:IntuitResponse/*:QueryResponse
Skirting namespaces in this manner works but is not recommended because it
Under-specifies the full element/attribute name.
Fails to differentiate between element/attribute names in different
namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly:
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='IntuitResponse']
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='QueryResponse']
Thanks to Daniel Haley for the namespace-uri() note.
Is excessively verbose.
XPath 3.0/3.1
Libraries and tools that support modern XPath 3.0/3.1 allow the specification of a namespace URI directly in an XPath expression:
/Q{http://schema.intuit.com/finance/v3}IntuitResponse/Q{http://schema.intuit.com/finance/v3}QueryResponse
While Q{http://schema.intuit.com/finance/v3} is much more verbose than using an XML namespace prefix, it has the advantage of being independent of the namespace prefix binding mechanism of the hosting library. The Q{} notation is known as Clark Notation after its originator, James Clark. The W3C XPath 3.1 EBNF grammar calls it a BracedURILiteral.
Thanks to Michael Kay for the suggestion to cover XPath 3.0/3.1's BracedURILiteral.
I use /*[name()='...'] in a google sheet to fetch some counts from Wikidata. I have a table like this
thes WD prop links items
NOM P7749 3925 3789
AAT P1014 21157 20224
and the formulas in cols links and items are
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(*)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(distinct?item)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
respectively. The SPARQL query happens not to have any spaces...
I saw name() used instead of local-name() in Xml Namespace breaking my xpath!, and for some reason //*:literal doesn't work.
Related
Is there something I need to configure in the XmlReaderSettings to encourage .net (4.8, 6, 7) to handle some cXML without throwing the following exception:
Unhandled exception. System.Xml.Schema.XmlSchemaException: The parameter entity replacement text must nest properly within markup declarations.
Sample cXML input
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.041/cXML.dtd">
<cXML payloadID="donkeys#example.com" timestamp="2023-02-13T01:01:01Z">
<Header>
</Header>
<Request deploymentMode="production">
</Request>
</cXML>
Sample Application
using System.Xml;
using System.Xml.Linq;
namespace Donkeys
{
internal class Program
{
static void Main()
{
XmlReaderSettings settings = new()
{
XmlResolver = new XmlUrlResolver(),
DtdProcessing = DtdProcessing.Parse,
ValidationType = ValidationType.DTD,
};
FileStream fs = File.OpenRead("test.xml"); // sample cXML from question
XmlReader reader = XmlReader.Create(fs, settings);
XDocument.Load(reader); // this blows up
}
}
}
I'm looking to use the XmlUrlResolver to cache the DTDs but without ignoring the validation I get the error above but i'm not really sure why?
So far I've tried different validation flags but they don't validate at all unless I use ValidationType.DTD which goes pop.
The actual resolver seems to work fine; if I subclass it, it is returning the DTD (as a MemoryStream) as expected.
I can add an event handler to ignore the issue but this feels lamer than I'd like.
using System.Xml;
using System.Xml.Linq;
namespace Donkeys
{
internal class Program
{
static void Main()
{
XmlReaderSettings settings = new()
{
XmlResolver = new XmlUrlResolver(),
DtdProcessing = DtdProcessing.Parse,
ValidationType = ValidationType.DTD,
IgnoreComments = true
};
settings.ValidationEventHandler += Settings_ValidationEventHandler;
FileStream fs = File.OpenRead("test.xml");
XmlReader reader = XmlReader.Create(fs, settings);
XDocument dogs = XDocument.Load(reader);
}
private static void Settings_ValidationEventHandler(object? sender, System.Xml.Schema.ValidationEventArgs e)
{
// this seems fragile
if (e.Message.ToLower() == "The parameter entity replacement text must nest properly within markup declarations.".ToLower()) // and this would be a const
return;
throw e.Exception;
}
}
}
I've spent some time over the last few days looking into this and trying to get my head around what's going on here.
As far as I can tell, the error The parameter entity replacement text must nest properly within markup declarations is being reported incorrectly. My understanding of the spec is that this message means that you have mismatched < and > elements in the replacement text of a parameter entity in a DTD.
The following example is taken from this O'Reilly book sample page and demonstrates something that genuinely should reproduce this error:
<!ENTITY % finish_it ">">
<!ENTITY % bad "won't work" %finish_it;
Indeed the .NET DTD parser reports the same error for these two lines of DTD.
This doesn't mean you can't have < and > characters in parameter entity replacement text at all: the following two lines will declare an empty element with name Z, albeit in a somewhat round-about way:
<!ENTITY % Nested "<!ELEMENT Z EMPTY>">
%Nested;
The .NET DTD parser parses this successfully.
However, the .NET DTD parser appears to be objecting to this line in the cXML DTD, which defines the Object.ANY parameter entity:
<!ENTITY % Object.ANY '|xades:QualifyingProperties|cXMLSignedInfo|Extrinsic'>
There are of course no < and > characters in the replacement text, so the error is baffling.
This is by no means a new problem. I found this unanswered Stack Overflow question which basically reports the same problem. Also, this MSDN Forum post basically has the same problem, and it was asked in 2007. So is this unclear but intentional behaviour, or a bug that has been in .NET for 15+ years? I don't know.
For those who do want to look into things further, the following is about the minimum necessary to reproduce the problem. The necessary C# code to read the XML file can be taken from the question and adapted, I don't see the need to repeat it here:
example.dtd:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT A EMPTY>
<!ENTITY % Rest '|A' >
<!ELEMENT example (#PCDATA %Rest;)*>
example.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example SYSTEM "example.dtd">
<example/>
There are various ways to tweak this to get rid of the error. One way is to move the | character from the parameter entity into the ELEMENT example declaration. Replacing #PCDATA with another element (which you would also have to define) is another way.
But enough of the theory behind the problem. How can you actually move forwards with this?
I would take a local copy of the cXML DTD and adjust it to work around this error. You can download the DTD from the URL in your sample cXML input. The %Object.ANY; parameter entity is only used once in the DTD: I would replace this one occurrence with the replacement text, |xades:QualifyingProperties|cXMLSignedInfo|Extrinsic.
You then need to adjust the .NET XML parser to use your modified copy of the cXML DTD instead of fetching the the one from the given URL. You create a custom URL resolver for this, for example:
using System.Xml;
namespace Donkeys
{
internal class CXmlUrlResolver : XmlResolver
{
private static readonly Uri CXml1_2_041 = new Uri("http://xml.cxml.org/schemas/cXML/1.2.041/cXML.dtd");
private readonly XmlResolver urlResolver;
public CXmlUrlResolver()
{
this.urlResolver = new XmlUrlResolver();
}
public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
{
if (absoluteUri == CXml1_2_041)
{
// Return a Stream that reads from your custom version of the DTD,
// for example:
return File.OpenRead(#"SomeFilePathHere\cXML-1.2.401.dtd");
}
return this.urlResolver.GetEntity(absoluteUri, role, ofObjectToReturn);
}
}
}
This checks to see what URI is being requested, and if it matches the cXML URI, returns a stream that reads from your customised copy of the DTD. If some other URI is given, it passes the request to the nested XMLResolver, which then deals with it. You will of course need to use an instance of CXmlUrlResolver instead of XmlUrlResolver() when creating your XmlReaderSettings.
I don't know how many versions of cXML you will have to deal with, but if you are dealing with multiple versions, you might have to create a custom copy of the DTD for each version, and have your resolver return the correct local copy for each different URI.
A similar approach is given at this MSDN Forums post from 2008, which also deals with difficulties parsing cXML with .NET. This features a custom URL resolver created by subclassing XmlUrlResolver. Those who prefer composition over inheritance may prefer my custom URL resolver instead.
I'm using XElement to build an XML document in C# and I'm trying to set
<myEelment xml:space="preserve">
Here's my current attempt:
myElement.SetAttributeValue(XName.Get("space", "xml"), "preserve");
but it comes out like this:
<myEelment p4:space="preserve" xmlns:p4="xml">
I understand how this is going wrong - my code is using "xml" as a namespace URI when I want to use as a namespace prefix. My problem is that AFAICT the "xml" namespace prefix is somehow implicit and doesn't actually have a namespace URI associated with it. So how can I generate attributes with the namespace prefix "xml"?
Standard namespaces are available as properties on the XNamespace class. Use that.
var myElement = doc.Descendants("myElement").Single();
myElement.SetAttributeValue(XNamespace.Xml + "space", "preserve");
I am trying to read the entitysets within the EDMX file from Entity Framework.
The EDMX file (XML format) has the following layout:
<edmx:Edmx Version="3.0" xmlns:edmx="http://schemas.microsoft.com/ado/2009/11/edmx">
<edmx:Runtime>
<edmx:ConceptualModels>
<Schema Namespace="Model" Alias="Self" p1:UseStrongSpatialTypes="false" xmlns:annotation="http://schemas.microsoft.com/ado/2009/02/edm/annotation" xmlns:p1="http://schemas.microsoft.com/ado/2009/02/edm/annotation" xmlns="http://schemas.microsoft.com/ado/2009/11/edm">
<EntityContainer Name="EntityModel" p1:LazyLoadingEnabled="true">
<EntitySet Name="TableName" EntityType="Model.TableName" />
I am using following XPath to get all EntitySet Nodes within the EntityContainer:
/edmx:Edmx/edmx:Runtime/edmx:ConceptualModels/Schema/EntityContainer/EntitySet
but I am getting no result with this C# code:
XmlDocument xdoc = new XmlDocument("pathtoedmx");
var ns = new XmlNamespaceManager(xdoc.NameTable);
ns.AddNamespace("edmx", "http://schemas.microsoft.com/ado/2009/11/edmx");
ns.AddNamespace("annotation", "http://schemas.microsoft.com/ado/2009/02/edm/annotation");
ns.AddNamespace("p1", "http://schemas.microsoft.com/ado/2009/02/edm/annotation");
ns.AddNamespace("", "http://schemas.microsoft.com/ado/2009/11/edm");
var entitySets = xdoc.SelectNodes("/edmx:Edmx/edmx:Runtime/edmx:ConceptualModels/Schema/EntityContainer/EntitySet", ns);
Already got the XPath from this tool (http://qutoric.com/xmlquire/), because I started not trusting my own XPath skills but it tells me the same XPath I was already using.
If I remove the "/Schema/EntityContainer/EntitySet" part its finding the "/edmx:Edmx/edmx:Runtime/edmx:ConceptualModels", but not further on already tried to specify the "edmx" namespace ("edmx:/Schema") but no difference.
Hope you can help me out, already banging my head against the table. :)
Namespaces are a convention on how to combine two different XML dialects into a single document. Those prefixes really doesn't matter as long you keep your URI component exactly the same. For instance, take something like this:
ns.AddNamespace("xxx", "http://schemas.microsoft.com/ado/2009/11/edmx");
Console.WriteLine(xdoc.SelectNodes("/xxx:Edmx", ns).Count); // 1
You'll get one node because your namespace URI matched, despite your "wrong" namespace prefix.
If you have an attribute named xmlns, current element and it's children will inherits that namespace URI.
In your case, your root element doesn't have a default namespace and that's ok. But your Schemas element does have a namespace and you need to inform it. I came with this code:
// change "" to "edm"
ns.AddNamespace("edm", "http://schemas.microsoft.com/ado/2009/11/edm");
var entitySets = xdoc.SelectNodes("/edmx:Edmx/edmx:Runtime/edmx:ConceptualModels/edm:Schema/edm:EntityContainer/edm:EntitySet", ns);
I usually search the web high and low for my answers but this time I am drawing a blank. I'm using VS2005 to write code to POST xml to an API. I have classes setup in C# that I serialize into an XML document. The classes are below:
[Serializable]
[XmlRoot(Namespace = "", IsNullable = false)]
public class Request
{
public RequestIdentify Identify;
public string Method;
public string Params;
}
[Serializable]
public class RequestIdentify
{
public string StoreId;
public string Password;
}
When I serialize this I get:
<?xml version="1.0" encoding="UTF-8"?>
<Request xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Identify>
<StoreId>00</StoreId>
<Password>removed for security</Password>
</Identify>
<Method>ProductExport</Method>
<Params />
</Request>
But the API returns a "No XML sent" error.
If I send the xml directly in a string as:
string xml = #"<Request><Identify><StoreId>00</StoreId><Password>Removed for security</Password></Identify><Method>ProductExport</Method><Params /></Request>";
effectively sending this xml (without the schema info in the "Request" tag):
<Request>
<Identify>
<StoreId>00</StoreId>
<Password>Removed for security</Password>
</Identify>
<Method>ProductExport</Method>
<Params />
</Request>
It seems to recognise the XML no problem.
So my question I guess is how can I change my current classes to Serialize into XML and get the XML as in the second case? I assume I need another "parent" class to wrap around the existing ones and use InnerXml property on this "parent" or something similar but I don't know how to do this.
Apologies for the question, I've only been using C# for 3 months and I'm a trainee developer who is having to teach himself on the job!
Oh and PS I don't know why but VS2005 really does not want to let me set these classes up with private variables and then use getters and setters on public equivalents so I have them written how they are for now.
Thanks in advance :-)
UPDATE:
As with most things it's very hard to find answers if you're not sure what you need to ask or how to word it but:
Once I knew what to look for I found the answers I needed:
Removing the XML declaration:
XmlWriterSettings writerSettings = new XmlWriterSettings();
writerSettings.OmitXmlDeclaration = true;
StringWriter stringWriter = new StringWriter();
using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter, writerSettings))
{
serializer.Serialize(xmlWriter, request);
}
string xmlText = stringWriter.ToString();
Removing/Setting the namespace (Thanks to above replies that helped find this one!):
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("", "");
Thanks for your help everyone who answered or pointed me in the right direction! And yes I did find articles to read once I knew what I was asking :-) It's the first time I have come unstuck in 3 months of teaching myself so I think I'm doing pretty well...
From Rydal's blog:
The XmlDocument object by default assigns a namespace to the XML string and also includes the declaration as the first line of the XML document. I definitely do not need or use those, therefore, I need to remove them. Here is how you go about doing just that.
Removing declaration and namespaces from XML serialization
I am importing some data from an XML file which contains a couple of namespaces. I am using nested classes to import data in different level. Unfortunately I cannot access the data in inner classes. I know the problem is regarding namespaces but I do not know how I can resolve them without modifying inner classes. Does anybody know how I can resolve namespaces in XDocument.
This is a part of that XML:
<TransXChange xmlns="http://www.transxchange.org.uk/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:apd="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
xsi:schemaLocation="http://www.transxchange.org.uk/ http://www.transxchange.org.uk/schema/2.1/TransXChange_registration.xsd"
xml:lang="en" CreationDateTime="2004-06-09T14:20:00-05:00"
>
<Operators>
<LicensedOperator id="O1" >
<NationalOperatorCode>ABC</NationalOperatorCode>
<OperatorAddresses>
<CorrespondenceAddress>
<apd:Line>45 City Road</apd:Line>
</CorrespondenceAddress>
</OperatorAddresses>
</LicensedOperator>
</Operators>
</TransXChange>
You have to use the namespace that matches the one used in your XML, with Linq to XML you can use XNamespace for that i.e.:
XDocument doc = XDocument.Load("test.xml");
XNamespace apd = "http://www.govtalk.gov.uk/people/AddressAndPersonalDetails";
var firstHit = doc.Descendants(apd + "Line").First();
You need to tell the parser how to reference the namespace:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xmlElem.OwnerDocument.NameTable);
nsmgr.AddNamespace("apd", "http://[PATH_TO_NAMESPACE_DOC]");
That should do it