C# Xpath can't get element by name - c#

Have document loaded to XmlDocument with next srtucture
<?xml version="1.0" encoding="UTF-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0" xmlns:l="http://www.w3.org/1999/xlink">
<stylesheet type="text/css"></stylesheet>
<description>...</description>
<body>...</body>
<binary id="19317.jpg" content-type="image/jpeg">...</binary>
</FictionBook>
Next metods return me null (or empty collection if i use SelectNodes):
doc.SelectSingleNode("body");
doc.SelectSingleNode("//body");
doc.LastChild.SelectSingleNode("body");
doc.LastChild.SelectSingleNode("//body");
But this one works correctly
doc.LastChild["body"];
Why XPath don't give me any results?

doc.SelectSingleNode("//body"); doesn't work because body is declared in a specific namespace "http://www.gribuser.ru/xml/fictionbook/2.0", so to query for it you could code it like this:
var mgr = new XmlNamespaceManager(new NameTable());
mgr.AddNamespace("whatever", "http://www.gribuser.ru/xml/fictionbook/2.0");
var node = doc.SelectSingleNode("//whatever:body", mgr);
doc.LastChild["body"]; works because the implementation supports it, but you could use it like this to avoid ambiguities:
doc.LastChild["body", "http://www.gribuser.ru/xml/fictionbook/2.0"]

Related

Access Child nodes with namespace using xpath

How can I read the content of the childnotes using Xpath?
I have already tried this:
var xml = new XmlDocument();
xml.Load("server-status.xml");
var ns = new XmlNamespaceManager(xml.NameTable);
ns.AddNamespace("ns", "namespace");
var node = xml.SelectSingleNode("descendant::ns:server[ns:ip-address]", ns)
Console.WriteLine(node.InnerXml)
But I only get a string like this:
<ip-address>127.0.0.50</ip-address><name>Server 1</name><group>DATA</group>
How can I get the values individually?
Xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<server-status xmlns="namespace">
<server>
<ip-address>127.0.0.50</ip-address>
<name>Server 1</name>
<group>DATA</group>
</server>
</server-status>
You're using XML namespaces in XPath correctly.
However, your original XPath,
descendant::ns:server[ns:ip-address]
says to select all ns:server elements with ns:ip-address children.
If you wish to select the ns:ip-address children themselves, instead use
descendant::ns:server/ns:ip-address
Similarly, you could select ns:name or ns:group elements.

Accessing the xml tag with C# and updating the content

I have an xml file that I converted from pdf to xml.
Example XML looks as follows
<?xml version="1.0" encoding="UTF-8"?>
<office:document xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:smil="urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" xmlns:anim="urn:oasis:names:tc:opendocument:xmlns:animation:1.0" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:officeooo="http://openoffice.org/2009/office" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:drawooo="http://openoffice.org/2010/draw" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2" office:mimetype="application/vnd.oasis.opendocument.graphics">
<office:body>
<office:drawing>
<draw:page draw:name="page1" draw:style-name="dp2" draw:master-page-name="master-page3">
<draw:frame draw:style-name="gr9" draw:text-style-name="P10" draw:layer="layout" svg:width="1.242cm" svg:height="0.357cm" svg:x="17.055cm" svg:y="11.787cm">
<draw:text-box>
<text:p text:style-name="P2"><text:span text:style-name="T6">Example</text:span></text:p>
</draw:text-box>
</draw:frame>
</draw:page>
</office:drawing>
</office:body>
</office:document>
The C# code I'm trying to use:
XmlDocument doc = new XmlDocument();
XmlNamespaceManager namespaces = new XmlNamespaceManager(doc.NameTable);
namespaces.AddNamespace("xmlns:draw", "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0");
doc.Load("invoiceto.xml");
doc.SelectSingleNode("/draw:frame/draw:text-box/text:p/text:span", namespaces).InnerText = "new value";
I get this error
'' text 'namespace prefix not defined.'
I want to replace the text of example with C# but how can I get to the <text: span text: style-name = "T6"> tag with C#?
First of all the prefix added to XmlNamespaceManager shouldn't include the xmlns part. Then you also need to add the prefix text besides draw because both will be used in the XPath expression for calling SelectSingleNode. Last, since the element <draw:frame> isn't the root element you need to either specify full path starting from the root or start the XPath using // (the descendant-or-self axis) instead:
XmlNamespaceManager namespaces = new XmlNamespaceManager(doc.NameTable);
namespaces.AddNamespace("draw", "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0");
namespaces.AddNamespace("text", "urn:oasis:names:tc:opendocument:xmlns:text:1.0");
doc.SelectSingleNode("//draw:frame/draw:text-box/text:p/text:span", namespaces).InnerText = "new value";
dotnetfiddle demo

C# Using XPath with XmlDocument - Can't select nodes in a namespace (returning null)

I'm trying to do something which ought to be quite simple but I'm having terrible trouble. I have tried code from multiple similar questions in StackOverflow but to no avail.
I'm trying to get various pieces of information from an ABN lookup with the Australian government. Here is anonymised return XML value:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<ABRSearchByABNResponse xmlns="http://abr.business.gov.au/ABRXMLSearch/">
<ABRPayloadSearchResults>
<request>
<identifierSearchRequest>
<authenticationGUID>00000000-0000-0000-0000-000000000000</authenticationGUID>
<identifierType>ABN</identifierType>
<identifierValue>00 000 000 000</identifierValue>
<history>N</history>
</identifierSearchRequest>
</request>
<response>
<usageStatement>The Registrar of the ABR monitors the quality of the information available on this website and updates the information regularly. However, neither the Registrar of the ABR nor the Commonwealth guarantee that the information available through this service (including search results) is accurate, up to date, complete or accept any liability arising from the use of or reliance upon this site.</usageStatement>
<dateRegisterLastUpdated>2017-01-01</dateRegisterLastUpdated>
<dateTimeRetrieved>2017-01-01T00:00:00.2016832+10:00</dateTimeRetrieved>
<businessEntity>
<recordLastUpdatedDate>2017-01-01</recordLastUpdatedDate>
<ABN>
<identifierValue>00000000000</identifierValue>
<isCurrentIndicator>Y</isCurrentIndicator>
<replacedFrom>0001-01-01</replacedFrom>
</ABN>
<entityStatus>
<entityStatusCode>Active</entityStatusCode>
<effectiveFrom>2017-01-01</effectiveFrom>
<effectiveTo>0001-01-01</effectiveTo>
</entityStatus>
<ASICNumber>000000000</ASICNumber>
<entityType>
<entityTypeCode>PRV</entityTypeCode>
<entityDescription>Australian Private Company</entityDescription>
</entityType>
<goodsAndServicesTax>
<effectiveFrom>2017-01-01</effectiveFrom>
<effectiveTo>0001-01-01</effectiveTo>
</goodsAndServicesTax>
<mainName>
<organisationName>COMPANY LTD</organisationName>
<effectiveFrom>2017-01-01</effectiveFrom>
</mainName>
<mainBusinessPhysicalAddress>
<stateCode>NSW</stateCode>
<postcode>0000</postcode>
<effectiveFrom>2017-01-01</effectiveFrom>
<effectiveTo>0001-01-01</effectiveTo>
</mainBusinessPhysicalAddress>
</businessEntity>
</response>
</ABRPayloadSearchResults>
</ABRSearchByABNResponse>
</soap:Body>
</soap:Envelope>
so I want to get for example the whole response using xpath="//response" then use various xpath statement within that node to get the <organisationName> ("//mainName/organisationName") and other values.
It should be simple right? Those xpath statements appear to work when testing in Notepad++but I use this code in Visual Studio:
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(ipxml);
XmlNode xnode = xdoc.SelectSingleNode("//response");
XmlNodeList xlist = xdoc.SelectNodes("//mainName/organisationName");
xlist = xdoc.GetElementsByTagName("mainName");
But it always returns null, whatever I put in the xpath I get a null return for the node and 0 count for the list whether I'm selecting something with child nodes, a value or not.
I can get the nodes using GetElementsByTagName() as in the example which returns the correct node, but I wanted to do it 'properly' selecting the proper field using xpath.
I also tried using XElement and Linq but still no luck. Is there something weird about the XML?
I'm sure it must something simple but I've been struggling for ages.
You aren't dealing with the namespaces present in the document. Specifically, the high level element:
<ABRSearchByABNResponse xmlns="http://abr.business.gov.au/ABRXMLSearch/">
places ABRSearchByABNResponse, and all its child elements (unless overridden by another xmlns) into the namespace http://abr.business.gov.au/ABRXMLSearch/. In order to navigate to these nodes (without hacks like GetElementsByTagName or using local-name()), you'll need to register the namespaces with an XmlNamespaceManager, like so. The xmlns aliases don't necessarily need to match those used in the original document, but it's a good convention to do so:
XmlDocument
var xdoc = new XmlDocument();
var ns = new XmlNamespaceManager(xdoc.NameTable);
ns.AddNamespace("soap", "http://schemas.xmlsoap.org/soap/envelope/");
ns.AddNamespace("abr", "http://abr.business.gov.au/ABRXMLSearch/");
xdoc.LoadXml(ipxml);
// NB need to use the overload accepting a namespace
var xresponse = xdoc.SelectSingleNode("//abr:response", ns);
var xlist = xdoc.SelectNodes("//abr:mainName/abr:organisationName", ns);
XDocument
More recently, the powers of LINQ can be harnessed with XDocument, which makes working with namespaces much easier (Descendants finds child nodes at any depth)
var xdoc = XDocument.Parse(ipxml);
XNamespace soap = "http://schemas.xmlsoap.org/soap/envelope/";
XNamespace abr = "http://abr.business.gov.au/ABRXMLSearch/";
var xresponse = xdoc.Descendants(abr + "response");
var xlist = xdoc.Descendants(abr + "organisationName");
XDocument + XPath
You can also resort to using XPath in Linq to Xml, especially for more complicated expressions:
var xdoc = XDocument.Parse(ipxml);
var ns = new XmlNamespaceManager(new NameTable());
ns.AddNamespace("soap", "http://schemas.xmlsoap.org/soap/envelope/");
ns.AddNamespace("abr", "http://abr.business.gov.au/ABRXMLSearch/");
var xresponse = xdoc.XPathSelectElement("//abr:response", ns);
var xlist = xdoc.XPathSelectElement("//abr:mainName/abr:organisationName", ns);
You need to call SelectSingleNode and SelectNodes on the DocumentElement. You are calling them on the document itself.
For example:
XmlNode xnode = xdoc.DocumentElement.SelectSingleNode("//response");

Unable to use XmlDocument.SelectSingleNode on XML with Two Namespaces

I'm trying to parse the following XML:
<?xml version="1.0" encoding="utf-8"?>
<A2AAnf:MPPPPPP xsi:schemaLocation="urn:A2AAnf:xsd:$MPPPPPP.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:A2AAnf="urn:A2AAnf:xsd:$MPPPPPP">
<A2AAnf:Num>0</A2AAnf:Num>
<A2AAnf:FIT xmlns="urn:iso:std:iso:20022:xsd:003.001">
<Hdr>
<Inf>
<Mtd>TEST</Mtd>
</Inf>
</Hdr>
</A2AAnf:FIT>
I want to access the <Mtd> tag.
XMLQuire shows the path to be /A2AAnf:MPPPPPP/A2AAnf:FIT/dft:Hdr/dft:Inf/dft:Mtd, but when I'm trying to parse it using the following code:
XmlDocument xmldocument = new XmlDocument();
var xmlNameSpaceManager = new XmlNamespaceManager(xmldocument.NameTable);
xmlNameSpaceManager.AddNamespace("A2AAnf", "urn:A2AAnf:xsd:$MPPPPPP");
try
{
xmldocument.LoadXml(m_XML);
var node = xmldocument.SelectSingleNode("/A2AAnf:MPPPPPP/A2AAnf:FIT/dft:Hdr/dft:Inf/dft:Mtd", xmlNameSpaceManager);
}
I receive the following error:
namespace prefix 'dft' is not defined
And since I can't find "dft" in my XML, I also tried to remove the "dft" prefix and search for the same string without "dft". This time, nothing was returned.
What am I missing?
You have to add dft to your XmlNamespaceManager:
var xmlNameSpaceManager = new XmlNamespaceManager(xmldocument.NameTable);
xmlNameSpaceManager.AddNamespace("A2AAnf", "urn:A2AAnf:xsd:$MPPPPPP");
xmlNameSpaceManager.AddNamespace("dft", "urn:iso:std:iso:20022:xsd:003.001");
The prefixes you use in your XPath query have nothing to do with the prefixes used in the XML document. They are instead the prefixes you define in your XmlNamespaceManager.

How to correctly parse an XML document with arbitrary namespaces

I am trying to parse somewhat standard XML documents that use a schema called MARCXML from various sources.
Here are the first few lines of an example XML file that needs to be handled...
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<marc:record>
<marc:leader>00925njm 22002777a 4500</marc:leader>
and one without namespace prefixes...
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<leader>01142cam 2200301 a 4500</leader>
Key point: in order to get the XPaths to resolve further along in the program I have to go through a regex routine to add the namespaces to the NameTable (which doesn't add them by default). This seems unnecessary to me.
Regex xmlNamespace = new Regex("xmlns:(?<PREFIX>[^=]+)=\"(?<URI>[^\"]+)\"", RegexOptions.Compiled);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlRecord);
XmlNamespaceManager nsMgr = new XmlNamespaceManager(xmlDoc.NameTable);
MatchCollection namespaces = xmlNamespace.Matches(xmlRecord);
foreach (Match n in namespaces)
{
nsMgr.AddNamespace(n.Groups["PREFIX"].ToString(), n.Groups["URI"].ToString());
}
The XPath call looks something like this...
XmlNode leaderNode = xmlDoc.SelectSingleNode(".//" + LeaderNode, nsMgr);
Where LeaderNode is a configurable value and would equal "marc:leader" in the first example and "leader" in the second example.
Is there a better, more efficient way to do this? Note: suggestions for solving this using LINQ are welcome, but I would mainly like to know how to solve this using XmlDocument.
EDIT: I took GrayWizardx's advice and now have the following code...
if (LeaderNode.Contains(":"))
{
string prefix = LeaderNode.Substring(0, LeaderNode.IndexOf(':'));
XmlNode root = xmlDoc.FirstChild;
string nameSpace = root.GetNamespaceOfPrefix(prefix);
nsMgr.AddNamespace(prefix, nameSpace);
}
Now there's no more dependency on Regex!
If you know there is going to be a given element in the document (for instance the root element) you could try using GetNamespaceOfPrefix.

Categories

Resources