Handle XML with and without a Default Namespace - c#

I am writing a piece of code in C# that will receive XML and pull out some values. When the XML I receive has a default namespace, I must use an XmlNamespaceManager.
XmlDocument requestXml = new XmlDocument();
requestXml.LoadXml(request);
XmlNamespaceManager xmlNamespaceManager = new XmlNamespaceManager(requestXml.NameTable);
xmlNamespaceManager.AddNamespace("ns0", requestXml.DocumentElement.NamespaceURI);
metadata.Identifier = requestXml.SelectSingleNode("//ns0:Identifier[1]", xmlNamespaceManager).InnerText;
But when the namespace has a prefix, this code breaks, while the following code without the namespace manager works.
XmlDocument requestXml = new XmlDocument();
requestXml.LoadXml(request);
metadata.Identifier = requestXml.SelectSingleNode("//Identifier[1]").InnerText;
Is it reasonable for me to expect to know whether the xml namespace will have a prefix? If not, how can I build more robust code that doesn't care if the namespace has a prefix? Any help is much appreciated.
Update:
It is my understanding that these three are the same.
Example1:
<Node1 xmlns="myNamespace">
<Node2>
<Node3></Node3>
</Node2>
</Node1>
Example2:
<ns0:Node1 xmlns:ns0="myNamespace">
<Node2>
<Node3></Node3>
</Node2>
</ns0:Node1>
Example3:
<ns0:Node1 xmlns:ns0="myNamespace">
<ns0:Node2>
<ns0:Node3></ns0:Node3>
</ns0:Node2>
</ns0:Node1>
Instead of asking how C# can treat these the same (I know C# has trouble with default namespaces), a better question is: what checks should I perform to ensure I am covering all scenarios?

One would assume that there is a strong relationship between namespace prefixes in XML and namespace prefixes in XPath, but there is none.
Namespace prefix of a node in XML: it can be empty or not, and an empty prefix maps to the current default namespace. That default namespace for the current element is either the empty namespace (if there is no xmlns="urn:somenamespace" on the current node or an ancestor) or some particular namespace if xmlns is present.
Sample:
<root>
<child1 xmlns="urn:ns1">
<inner xmlns="urn:reallyDeep" />
</child1>
<child2 />
</root>
Empty namespace prefix maps to:
root - empty namespace (default)
child1 - urn:ns1, full node name can be represented as {urn:ns1}child1
child2 - empty namespace coming from parent (root)
inner - urn:reallyDeep because it is redefined on that node with xmlns=.
Now XPath namespace prefixes are a bit simpler, because the empty prefix always maps to the empty namespace.
To select the inner node in the sample above, one needs to select nodes from 3 namespaces: empty, urn:ns1, urn:reallyDeep. So the namespace manager needs to be constructed with 3 prefixes. The default empty one is always there, so you need to add just two more. Note that prefixes in XPath have no relation to prefixes in XML - common practice is to match prefixes for some "well known" namespaces (like xsi), but it is strictly a personal choice.
p1 -> urn:ns1
p2 -> urn:reallyDeep
/root/p1:child1/p2:inner
When writing code it may be safer to define the mapping explicitly instead of trying to take it from the XML, unless you are building XPath samples based on the XML:
xmlNamespaceManager.AddNamespace("p1", "urn:ns1");
xmlNamespaceManager.AddNamespace("p2", "urn:reallyDeep");
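As a rough sketch of how that mapping can be used end to end (the class and method names here are made up for illustration), the following loads the sample document, registers the p1 and p2 prefixes, and selects the inner node:

```csharp
using System.Xml;

public static class InnerNodeLookup
{
    // The sample document from this answer, written as well-formed XML.
    public const string SampleXml =
        "<root>" +
        "<child1 xmlns=\"urn:ns1\">" +
        "<inner xmlns=\"urn:reallyDeep\" />" +
        "</child1>" +
        "<child2 />" +
        "</root>";

    // Returns the namespace URI of the selected inner node,
    // or null when the XPath matches nothing.
    public static string FindInnerNamespace(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // The prefixes p1 and p2 exist only in the XPath expression;
        // they do not have to appear anywhere in the XML.
        var manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("p1", "urn:ns1");
        manager.AddNamespace("p2", "urn:reallyDeep");

        XmlNode inner = doc.SelectSingleNode("/root/p1:child1/p2:inner", manager);
        return inner?.NamespaceURI;
    }
}
```

The same XPath should keep working if the document switches from default namespaces to explicit prefixes, because only the URIs are compared.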
Note that XML may use a single namespace prefix for all namespaces (the empty prefix or any other one). I.e. the following XML is identical, from an XML point of view, to the earlier sample even though it uses prefixes:
<root>
<a:child1 xmlns:a="urn:ns1">
<a:inner xmlns:a="urn:reallyDeep" />
</a:child1>
<child2 />
</root>
Note that if you don't care about namespaces and prefixes (in which case you are not really using XML properly) you can use the local-name() XPath function to compare just names.
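A minimal sketch of the local-name() approach, applied to the Identifier lookup from the question. The IdentifierReader class is hypothetical, and the lookup deliberately ignores which namespace the element is in:

```csharp
using System.Xml;

public static class IdentifierReader
{
    // Returns the text of the first element whose local name is "Identifier",
    // whatever its namespace or prefix, or null when there is no such element.
    public static string ReadIdentifier(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // local-name() strips the prefix and ignores the namespace URI,
        // so no XmlNamespaceManager is needed here.
        XmlNode node = doc.SelectSingleNode("//*[local-name()='Identifier']");
        return node?.InnerText;
    }
}
```

This trades correctness for convenience: an Identifier element from an unrelated namespace would match as well, so it is only reasonable when name collisions are not a concern.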

Related

XmlDocument Searching for namespace returns children

I am trying to convert nodes that have a namespace declaration over to use a prefix instead. My first stab at it was to just use xslt to transform the xml, but I started looking at doing it with the XmlDocument class and using the SelectNodes() method. The issue I am seeing is when I try to select nodes that have a namespace, it selects that node AND its children. I assume this is because it is selecting the node which contains children.
<foo xmlns="some url">
<child>child</child>
</foo>
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(xmlstring);
var query = xdoc.SelectNodes("//*[namespace-uri()='some url']");
The query variable will contain both the <foo> and <child> nodes, so when I loop through the nodes and change them to use the prefix, I get the following result.
<prefix:foo>
<prefix:child></prefix:child>
</prefix:foo>
Is there a way to return just the <foo> node in this case? Is it better to use xslt to transform it?
I didn't think you could change a namespace or prefix when using XDocument and XElement, so that's why I used XmlDocument.
Update
The result I'd want would be the prefix only on the node where the declaration was. Is this valid XML, or does the prefix need to be on the children as well to be valid?
<prefix:foo>
<child>child</child>
</prefix:foo>
In the XDM data model used by XPath and XSLT, there is no distinction between
<foo xmlns="some url">
<child>child</child>
</foo>
and
<foo xmlns="some url">
<child xmlns="some url">child</child>
</foo>
Logically the namespace is present on both element nodes, and its omission from the child in the lexical serialization is treated as a convenient abbreviation.
So yes, if you search for things having this namespace, you will get both elements.
Now, what are your requirements? I'm not convinced you fully understand them yourself, because the desired output you have shown is not actually well-formed (the namespace prefix is not declared). In your input, the two elements are in the same namespace; in the output, you seem to want them to be in different namespaces. If you want to process them differently, then you're going to have to use something other than the namespace to discriminate between them.
Remember that in XDM, it's the name of the node that matters, not the namespace declarations or prefixes; those are just ornamental. The name of the node is the combination of its local name and its namespace URI. You've described your requirement in terms of prefixes, but it's namespaces that actually matter.
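If the real requirement turns out to be "the outermost element in that namespace", one possible discriminator is the parent's namespace rather than the element's own. The sketch below is only an illustration under that assumption; the OutermostInNamespace class is made up, and it pastes the URI into the XPath, so it assumes the URI contains no apostrophes:

```csharp
using System.Xml;

public static class OutermostInNamespace
{
    // Selects elements that are in namespaceUri while their parent element
    // is not (or while they have no parent element at all).
    public static XmlNodeList Select(string xml, string namespaceUri)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Assumption: namespaceUri contains no apostrophes.
        string inNamespace = "namespace-uri()='" + namespaceUri + "'";
        string xpath = "//*[" + inNamespace + "][not(parent::*[" + inNamespace + "])]";
        return doc.SelectNodes(xpath);
    }
}
```

Both serializations shown above give the same result with this query, which is consistent with the point that the redundant declaration on the child carries no extra information.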

How can I update XML element or attribute which its name contains ":" special character

I have an XML file whose element and attribute names contain the ":" character. How can I update their values?
<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.3"?>
<my:ECR my:NoOfAutho="16" my:hideDetails="0" my:Type="ECR" my:NoOfVBUCMApprovales="10" >
<my:ECRNo>148</my:ECRNo>
<my:Stage>Approved</my:Stage>
<my:Details>
<my:ReasonForCR>Reason For CR</my:ReasonForCR>
<my:AreaAffected_Publications_VBUCM>false</my:AreaAffected_Publications_VBUCM>
<my:AreaAffected_Engineering>true</my:AreaAffected_Engineering>
<my:AreaAffected_Production>false</my:AreaAffected_Production>
<my:AreaAffected_CustomerSupport>true</my:AreaAffected_CustomerSupport>
<my:AreaAffected_VBUCMTest>false</my:AreaAffected_VBUCMTest>
</my:AreaAffectedVB_UCM>
Your XML sample is invalid as shown. The my prefix is not defined in the XML.
If your XML contained xmlns:my="schemas.microsoft.com/office/infopath/2003/myXSD/…" then the XML would at least have some hope of being valid.
For manipulating XML with namespaces in .NET code, consider using Linq XDocument instead of XmlDocument. I have found Linq's XNamespace and XName types to be much, much easier to use with the XDocument family of classes than the old style XmlDocument's rather clunky handling of namespaces.
Change your XML to add the xmlns:my attribute to the root element:
<my:MNO xmlns:my="schemas.microsoft.com/office/infopath/2003/myXSD/…" my:NoOfAutho="16" etc... >
In your C# code, add a reference to the Linq stuff to the top of your source file:
using System.Xml.Linq;
Then use code like this (not checked, may contain syntax typos) to load the xml and access the element:
XNamespace ns = "schemas.microsoft.com/office/infopath/2003/myXSD/…";
XName MNO_Name = ns + "MNO";
XDocument doc = XDocument.Load(path2);
XElement MNO_Element = doc.Root.Descendants(MNO_Name).Single();
You can then read or modify the properties, attributes, and children of the MNO element.
To read the value of <MNO>100</MNO>, use MNO_Element.Value.
To write a new value to the element, assign to the value property: MNO_Element.Value = "120";
.Single() asserts that there is exactly one node that matches the selection criteria, similar to the .SelectSingleNode() function of XmlDocument.
As you can see from this code, the name of the "my" namespace prefix in the XML document is immaterial to the code that processes the XML - what matters is the URI that the "my" prefix represents. The prefix is just shorthand so that the XML writer doesn't have to write long and laborious URIs everywhere.
Writing your XML processing code to be agnostic of the XML namespace prefix is very important because the prefix name can (and will) vary from one XML doc to the next, but the namespace URI will be the same.
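Here is a small self-contained sketch of the same idea that can be run against an in-memory string. The namespace URI urn:example:my is a placeholder chosen for illustration (the real InfoPath URI is elided above), and the EcrUpdater class and the choice of the Stage element are assumptions:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class EcrUpdater
{
    // Placeholder namespace URI for illustration only; substitute the URI
    // that the xmlns:my declaration in the real document uses.
    public static readonly XNamespace My = "urn:example:my";

    // Replaces the text of the single Stage element and returns the document text.
    public static string SetStage(string xml, string newStage)
    {
        XDocument doc = XDocument.Parse(xml);

        // The lookup is by namespace URI + local name; the prefix used in
        // the XML text plays no part in it.
        XElement stage = doc.Descendants(My + "Stage").Single();
        stage.Value = newStage;

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Because the match is made on the URI, the same call works whether the document writes the element as my:Stage, x:Stage, or with a default namespace.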
I don't understand what you mean by "how can I update its values", but it will likely help if you understand that these are XML namespaces.
I.E., my:ECRNo has a simple element name of ECRNo with a namespace prefix of my, which maps to a URN or a URL - which should be declared with a xmlns:my=... within the XML (either where it is declared, at a parent, or in the XML root element) - but isn't shown in the XML sample you provided here.
To update this using XmlNode, you need to use the overloaded SelectSingleNode method that accepts an XmlNamespaceManager as the 2nd argument. You then need to call the .AddNamespace method on the namespace manager to register the my prefix. This is detailed at http://msdn.microsoft.com/en-us/library/system.xml.xmlnode.selectsinglenode%28v=VS.90%29.aspx .
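A minimal sketch of that approach follows. The namespace URI urn:example:my is a stand-in (the real one is not shown in the question), and the EcrNodeUpdater class is hypothetical:

```csharp
using System.Xml;

public static class EcrNodeUpdater
{
    // Stand-in namespace URI for illustration only.
    public const string MyNamespace = "urn:example:my";

    // Updates the ECRNo element and the hideDetails attribute, then returns the XML.
    // Throws NullReferenceException if either node is missing.
    public static string Update(string xml, string ecrNo, string hideDetails)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Register the prefix used in the XPath expressions below.
        var manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("my", MyNamespace);

        doc.SelectSingleNode("/my:ECR/my:ECRNo", manager).InnerText = ecrNo;

        // Prefixed attributes are namespace-qualified too, so the prefix
        // is required in the attribute step as well.
        doc.SelectSingleNode("/my:ECR/@my:hideDetails", manager).Value = hideDetails;

        return doc.OuterXml;
    }
}
```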
Colons are not ordinary characters in XML element / attribute names. They separate a namespace prefix from the local name.
Your line
<my:ECR my:NoOfAutho="16" my:hideDetails="0" my:Type="ECR" my:NoOfVBUCMApprovales="10" >
Properly references the my namespace already, so you should just be able to do this:
<my:ECR NoOfAutho="16" hideDetails="0" Type="ECR" NoOfVBUCMApprovales="10" >
And you should be fine?
You will also have to remove the my: from other places in the file particularly closing tags
</ReasonForCR>

Parse XDocument without having to keep specifying the default namespace

I have some XML data (similar to the sample below) and I want to read the values in code.
Why am I forced to specify the default namespace to access each element? I would have expected the default namespace to be used for all elements.
Is there a more logical way to achieve my goal?
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<ReceiptsBatch xmlns="http://www.secretsonline.gov.uk/secrets">
<MessageHeader>
<MessageID>00000173</MessageID>
<Timestamp>2009-10-28T16:50:01</Timestamp>
<MessageCheck>BX4f+RmNCVCsT5g</MessageCheck>
</MessageHeader>
<Receipts>
<Receipt>
<Status>OK</Status>
</Receipt>
</Receipts>
</ReceiptsBatch>
Code to read xml elements I'm after:
XDocument xDoc = XDocument.Load( FileInPath );
XNamespace ns = "http://www.secretsonline.gov.uk/secrets";
XElement MessageCheck = xDoc.Element(ns+ "MessageHeader").Element(ns+"MessageCheck");
XElement MessageBody = xDoc.Element("Receipts");
As suggested by this answer, you can do this by removing all namespaces from the in-memory copy of the document. I suppose this should only be done if you know you won't have name collisions in the resulting document.
/// <summary>
/// Makes parsing easier by removing the need to specify namespaces for every element.
/// </summary>
private static void RemoveNamespaces(XDocument document)
{
    var elements = document.Descendants();
    elements.Attributes().Where(a => a.IsNamespaceDeclaration).Remove();
    foreach (var element in elements)
    {
        element.Name = element.Name.LocalName;
        var strippedAttributes =
            from originalAttribute in element.Attributes().ToArray()
            select (object)new XAttribute(originalAttribute.Name.LocalName, originalAttribute.Value);
        //Note that this also strips the attributes' line number information
        element.ReplaceAttributes(strippedAttributes.ToArray());
    }
}
You can use XmlTextReader.Namespaces property to disable namespaces while reading XML file.
string filePath;
XmlTextReader xReader = new XmlTextReader(filePath);
xReader.Namespaces = false;
XDocument xDoc = XDocument.Load(xReader);
This is how LINQ to XML works. You can't find an element by its local name alone if it is in a namespace (including a default namespace), and the same is true of its descendants. The fastest way to get rid of the namespace is to remove the namespace declaration from your initial XML.
The theory is that the meaning of the document is not affected by the user's choice of namespace prefixes. So long as the data is in the namespace http://www.secretsonline.gov.uk/secrets, it doesn't matter whether the author chooses to use the prefix "s", "secrets", "_x.cafe.babe", or the "null" prefix (that is, making it the default namespace). Your application shouldn't care: it's only the URI that matters. That's why your application has to specify the URI.
Note that the element Receipts is also in the namespace http://www.secretsonline.gov.uk/secrets, so the XNamespace is required to access that element as well. Receipts is a child of the root element rather than of the document itself, so go through xDoc.Root:
XElement MessageBody = xDoc.Root.Element(ns + "Receipts");
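To keep the repetition manageable, one option is to hold the XNamespace in a single field and walk down from the root. This is only a sketch; the ReceiptsReader class is made up, and it assumes the document shape shown in the question:

```csharp
using System.Xml.Linq;

public static class ReceiptsReader
{
    // The namespace URI from the sample document.
    private static readonly XNamespace Ns = "http://www.secretsonline.gov.uk/secrets";

    // Returns the MessageCheck text, or null if MessageHeader or MessageCheck is missing.
    public static string ReadMessageCheck(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;

        // Every step is qualified with Ns; the explicit cast to string
        // yields null when the element was not found.
        return (string)root.Element(Ns + "MessageHeader")?.Element(Ns + "MessageCheck");
    }
}
```

The same method returns the same value if the sender later switches from a default namespace to a prefix such as s:, since only the URI is compared.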
As an alternative to using namespaces note that you can use "namespace agnostic" xpath using local-name() and namespace-uri(), e.g.
/*[local-name()='SomeElement' and namespace-uri()='somexmlns']
If you omit the namespace-uri predicate:
/*[local-name()='SomeElement']
This would match ns1:SomeElement, ns2:SomeElement, etc. I would always prefer XNamespace where possible; the use cases for namespace-agnostic XPath are quite limited, e.g. parsing specific elements in documents with unknown schemas (e.g. within a service bus), or best-effort parsing of documents where the namespace can change (e.g. future-proofing, where the xmlns changes to match a new version of the document schema).
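For XDocument users, a sketch of both variants using the XPathSelectElement extension method from System.Xml.XPath. The NamespaceAgnosticXPath class is hypothetical, and the values are pasted straight into the XPath, so it assumes neither argument contains an apostrophe:

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

public static class NamespaceAgnosticXPath
{
    // Returns the value of the first element with the given local name.
    // When namespaceUri is supplied, the element must also be in that namespace.
    // Assumption: localName and namespaceUri contain no apostrophes.
    public static string FirstValue(string xml, string localName, string namespaceUri = null)
    {
        string predicate = "local-name()='" + localName + "'";
        if (namespaceUri != null)
        {
            predicate += " and namespace-uri()='" + namespaceUri + "'";
        }

        XElement match = XDocument.Parse(xml).XPathSelectElement("//*[" + predicate + "]");
        return match?.Value;
    }
}
```

Leaving namespaceUri out gives the loose, best-effort match; supplying it gives a prefix-independent match that still respects the namespace.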

Why can't I retrieve attribute values from my XPathNavigator?

I have XML something like this:
<?xml version="1.0"?>
<a xmlns="http://mynamespace">
<b>
<c val="test" />
</b>
</a>
And I am trying to find the value of the 'val' attribute on the 'c' tag with something like this:
const string NS = "http://mynamespace"; // NS is assumed to hold the document's namespace URI
XmlDocument doc = new XmlDocument();
doc.Load("myxml.xml");
XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
nsMgr.AddNamespace("mns", NS);
XPathNavigator root = doc.CreateNavigator();
foreach (XPathNavigator nav in root.Select("//mns:c", nsMgr))
{
    string val = nav.GetAttribute("val", NS);
    Console.WriteLine("val == " + val);
}
My problem is that GetAttribute always returns as an empty string. What am I missing?
Update:
It seems I can fix this by passing an empty string into GetAttribute, i.e.
string val = nav.GetAttribute("val", "");
My question is therefore now: why does this work? Why does 'val' not belong to my namespace despite the XML having been validated against a schema which requires the 'val' attribute (I accidentally omitted this step in my above sample code, but I am validating the XML).
Default namespace declarations do not apply to attributes so that attribute named 'val' is in no namespace and if you want to access it then you need to access it without using a namespace.
The only way to put an attribute in a namespace is by giving it a qualified name with a prefix and local name (e.g. pf:val) where the prefix is bound to a namespace (e.g. xmlns:pf="http://example.com/foo").
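This can be checked quickly with LINQ to XML by printing the namespace of the element and of its attribute. The sketch below uses a made-up AttributeNamespaceDemo class and assumes the document contains a c element with a val attribute:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AttributeNamespaceDemo
{
    // Returns the namespace of the first 'c' element and of its 'val' attribute.
    // An empty string means "in no namespace".
    public static (string ElementNs, string AttributeNs) Describe(string xml)
    {
        XElement c = XDocument.Parse(xml)
            .Descendants()
            .First(e => e.Name.LocalName == "c");
        XAttribute val = c.Attributes()
            .First(a => a.Name.LocalName == "val");

        return (c.Name.NamespaceName, val.Name.NamespaceName);
    }
}
```

For the document in the question, the element reports http://mynamespace while the attribute reports an empty string; only an attribute written with a prefix, such as pf:val, reports a namespace.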
Ok, I did some hunting and discovered that this is controlled in the XSD by the following attribute on the schema element:
attributeFormDefault="qualified"
or
attributeFormDefault="unqualified"
By default, it seems to be 'unqualified' which is to say that attributes never belong to any namespaces, only elements (Controlled by the elementFormDefault value, naturally).
Forcing qualified attributes by default forces any XML to require prefixes on all attributes belonging to that schema.
The answer to my question seems to be that an empty string is the correct namespace for all attributes defined in any schema that accepts the default value for this option.
Part of this is explained in Section 6.3 here which also states that the default namespace cannot apply to attributes - they must have prefixes, unlike elements.

Parsing XML document with XPath, C#

So I'm trying to parse the following XML document with C#, using System.XML:
<root xmlns:n="http://www.w3.org/TR/html4/">
<n:node>
<n:node>
data
</n:node>
</n:node>
<n:node>
<n:node>
data
</n:node>
</n:node>
</root>
Every treatise of XPath with namespaces tells me to do the following:
XmlNamespaceManager mgr = new XmlNamespaceManager(xmlDoc.NameTable);
mgr.AddNamespace("n", "http://www.w3.org/1999/XSL/Transform");
And after I add the code above, the query
xmlDoc.SelectNodes("/root/n:node", mgr);
Runs fine, but returns nothing. The following:
xmlDoc.SelectNodes("/root/node", mgr);
returns two nodes if I modify the XML file and remove the namespaces, so it seems everything else is set up correctly. Any idea why it doesn't work with namespaces?
Thanks a lot!
As stated, it's the URI of the namespace that's important, not the prefix.
Given your xml you could use the following:
mgr.AddNamespace( "someOtherPrefix", "http://www.w3.org/TR/html4/" );
var nodes = xmlDoc.SelectNodes( "/root/someOtherPrefix:node", mgr );
This will give you the data you want. Once you grasp this concept it becomes easier, especially when you get to default namespaces (no prefix in source xml), since you instantly know you can assign a prefix to each URI and strongly reference any part of the document you like.
The URI you specified in your AddNamespace method doesn't match the one in the xmlns declaration.
If you declare prefix "n" to represent the namespace "http://www.w3.org/1999/XSL/Transform", then the nodes won't match when you do your query. This is because, in your document, the prefix "n" refers to the namespace "http://www.w3.org/TR/html4/".
Try doing mgr.AddNamespace("n", "http://www.w3.org/TR/html4/"); instead.
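A short sketch that makes the difference visible by counting matches for a given prefix and URI pair. The PrefixMappingDemo class is hypothetical, and it assumes the prefix argument is a valid XML name:

```csharp
using System.Xml;

public static class PrefixMappingDemo
{
    // Counts the /root/<prefix>:node matches when <prefix> is bound to namespaceUri.
    public static int CountNodes(string xml, string prefix, string namespaceUri)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var mgr = new XmlNamespaceManager(doc.NameTable);
        mgr.AddNamespace(prefix, namespaceUri);

        // Matching is done on the URI the prefix is bound to here,
        // not on the prefix text used in the document.
        return doc.SelectNodes("/root/" + prefix + ":node", mgr).Count;
    }
}
```

With the document from the question, binding any prefix to http://www.w3.org/TR/html4/ finds the two top-level nodes, while binding n to http://www.w3.org/1999/XSL/Transform finds none.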
