Search XML element with xml namespace prefix C# - c#

Update
I want to have an expression (XPath, or Regex Expression, similar) that can match an XML element with a particular namespace. For example, I want to locate the value of the link element (e.g. I need the http://url within <b:link>http://url</b:link>) shown below. However, the namespace prefix varies depending on different xml files as shown in cases 1-3.
Considering the allowed character for namespace prefix (e.g. is any character allowed/valid) , could anyone provide the solution (XPath, Regex Expression or similar?
Please note that because the xml file is unknown, thus, the namespace and prefix are unknown until runtime. Does it mean I cannot use this XDocument/XmlDocument, because it requires namespace to be known in the code.
Update
Case 1
<A xmlns:b="link">
<b:link>http://url
</b:link>
</A>
Case 2
<A xmlns="link">
<link>http://url
</link>
</A>
Case 3
<A xmlns:a123="link">
<a123:link>http://url
</a123:link>
</A>
Please note that the url within the link element could be any http url, and unknown until runtime.
Update
Please mark up my question.

You need to know the namespaces you will be dealing with and register them with an XmlNamespaceManager. Here is an example:
XmlDocument doc = new XmlDocument();
doc.LoadXml("<A xmlns:b='link'><b:Books /></A>");
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("b", "link");
XmlNodeList books = doc.SelectNodes("//b:Books", nsmgr);
And if you want to do this using XDocument, which I would recommend for its brevity, here is how:
XDocument xDoc = XDocument.Parse("<A xmlns:b='link'><b:Books /></A>");
XNamespace ns = "link";
var books = xDoc.Descendants(ns + "Books");
If you do not know the namespace(s) ahead of time, see this post which shows how to query across an XDocument using only the local name. Here's an example:
XDocument xDoc = XDocument.Parse("<A xmlns:b='link'><b:Books /></A>");
var books = xDoc.Descendants().Where(e => e.Name.LocalName.ToLower() == "books");

Use an XML parser, not a regex.
That being said, you could use:
<(?:(.+?):)?Books />
And the namespace would be in captured group 1.
In fact, I'd more strongly recommend you use
<(?:([^<>]+?):)?Books />
To prevent mistakes like matching over another set of XML tags (who would use <> in a namespace anyway?!)

Related

How to select child node with XPath

I'm trying to get values from a XML document using the iXF format, but I'm having some issues with the XPath syntax.
I have the following XML document
<SOAP_ENV:Envelope xmlns:NS2="http://www.ixfstd.org/std/ns/core/classBehaviors/links/1.0" xmlns:NS1="CATIA/V5/Electrical/1.0" xmlns:tns="IXF_Schema.xsd" xmlns:ixf="http://www.ixfstd.org/std/ns/core/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP_ENV="http://schemas.xmlsoap.org/soap/envelope/" xsi:schemaLocation="IXF_Schema.xsd ElectricalSchema.xsd">
<SOAP_ENV:Body>
<ixf:object id="Electrical Physical System00000089.1" xsi:type="tns:Harness">
<tns:Name>Electrical Physical System00000089.1</tns:Name>
</ixf:object>
<ixf:object id="X10(1)//X11(1)" xsi:type="tns:Wire">
<tns:Name>X10(1)//X11(1)</tns:Name>
<NS1:Wire>
<NS1:Length>763,752mm</NS1:Length>
<NS1:Color>RD</NS1:Color>
<NS1:OuterDiameter>1,32mm</NS1:OuterDiameter>
</NS1:Wire>
</ixf:object>
</SOAP_ENV:Body>
</SOAP_ENV:Envelope>
And i'm trying to find all the Wire objects and get the Name and Length values with the following code.
XmlDocument xlDocument = new XmlDocument();
xlDocument.Load(importFile);
XmlNamespaceManager nsManager = new XmlNamespaceManager(xlDocument.NameTable);
nsManager.AddNamespace("tns", "IXF_Schema.xsd");
nsManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsManager.AddNamespace("ixf", "http://www.ixfstd.org/std/ns/core/1.0");
nsManager.AddNamespace("NS1", "CATIA/V6/Electrical/1.0");
nsManager.AddNamespace("NS2", "http://www.ixfstd.org/std/ns/core/classBehaviors/links/1.0");
//Get all wire objects
XmlNodeList wires = xlDocument.SelectNodes("descendant::ixf:object[#xsi:type = \"tns:Wire\"]", nsManager);
foreach (XmlNode wire in wires)
{
string wireName;
string wireLength;
XmlNode node = wire.SelectSingleNode("./tns:Name", nsManager);
wireName = node.InnerText;
XmlNode node1 = wire.SelectSingleNode("./NS1:Wire/NS1:Length", nsManager);
wireLength = node1.InnerText;
}
I can get the wireName value without any problems but the Length element selection always returns 0 matches and I can not figure out why. I also tried to only select the Wire element using the same syntax as the Name element ./NS1:Wire but that also returns 0 matches.
Your XML declares
xmlns:NS1="CATIA/V5/Electrical/1.0"
^^
Your C# declares a different namespacem
nsManager.AddNamespace("NS1", "CATIA/V6/Electrical/1.0")
^^
Make sure both namespaces match exactly.
Regarding your comment asking about the use of version numbers in namespaces...
It is an unfortunately common but certainly not widely accepted practice to include a version number in an XML namespace. Realize that by doing so, you're effectively saying that every namespaced XML component (element or attribute) should now be considered to differ from its counterpart in the old namespace. This is rarely what you want.
See also
Should I use a Namespace of an XML file to identify its version
What are the best practices for versioning XML schemas?

How can I update XML element or attribute which its name contains ":" special character

I have XML file and the elements/attributes names have ":" character, how I can update its vales?
<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.3"?>
<my:ECR my:NoOfAutho="16" my:hideDetails="0" my:Type="ECR" my:NoOfVBUCMApprovales="10" >
<my:ECRNo>148</my:ECRNo>
<my:Stage>Approved</my:Stage>
<my:Details>
<my:ReasonForCR>Reason For CR</my:ReasonForCR>
<my:AreaAffected_Publications_VBUCM>false</my:AreaAffected_Publications_VBUCM>
<my:AreaAffected_Engineering>true</my:AreaAffected_Engineering>
<my:AreaAffected_Production>false</my:AreaAffected_Production>
<my:AreaAffected_CustomerSupport>true</my:AreaAffected_CustomerSupport>
<my:AreaAffected_VBUCMTest>false</my:AreaAffected_VBUCMTest>
</my:AreaAffectedVB_UCM>
Your XML sample is invalid as shown. The my prefix is not defined in the XML.
If your XML contained xmlns:my="schemas.microsoft.com/office/infopath/2003/myXSD/…" then the XML would at least have some hope of being valid.
For manipulating XML with namespaces in .NET code, consider using Linq XDocument instead of XmlDocument. I have found Linq's XNamespace and XName types to be much, much easier to use with the XDocument family of classes than the old style XmlDocument's rather clunky handling of namespaces.
Change your XML to add the xmlns:my attribute to the root element:
<my:MNO xmlns:my="schemas.microsoft.com/office/infopath/2003/myXSD/…" my:NoOfAutho="16" etc... >
In your C# code, add a reference to the Linq stuff to the top of your source file:
using System.Xml.Linq;
Then use code like this (not checked, may contain syntax typos) to load the xml and access the element:
XNamespace ns = "schemas.microsoft.com/office/infopath/2003/myXSD/…";
XName MNO_Name = ns + "MNO";
XDocument doc = XDocument.Load(path2);
XElement MNO_Element = doc.Root.Descendants(MNO_Name).Single();
You can then read or modify the properties, attributes, and children of the MNO element.
To read the value of <MNO>100</MNO>, use MNO_Element.Value.
To write a new value to the element, assign to the value property: MNO_Element.Value = "120";
.Single() asserts that there is exactly one node that matches the selection criteria, similar to the .SelectSingleNode() function of XmlDocument.
As you can see from this code, the name of the "my" namespace prefix in the XML document is immaterial to the code that processes the XML - it's the URI that the "my" prefix represents that is what is important. The prefix is just shorthand so the the XML writer doesn't have to write long and laborious URIs everywhere.
Writing your XML processing code to be agnostic of the XML namespace prefix is very important because the prefix name can (and will) vary from one XML doc to the next, but the namespace URI will be the same.
I don't understand what you mean by "how can I update its values", but it will likely help if you understand that these are XML namespaces.
I.E., my:ECRNo has a simple element name of ECRNo with a namespace prefix of my, which maps to a URN or a URL - which should be declared with a xmlns:my=... within the XML (either where it is declared, at a parent, or in the XML root element) - but isn't shown in the XML sample you provided here.
To update this using XmlNode, you need to use the overloaded SelectSingleNode method that accepts a XmlNamespaceManager as the 2nd argument. You then need to all the .AddNamespace method on the namespace manager to register the my prefix. This is detailed at http://msdn.microsoft.com/en-us/library/system.xml.xmlnode.selectsinglenode%28v=VS.90%29.aspx .
Colons are not valid characters in xml elements / attributes. They are namespaces.
Your line
<my:ECR my:NoOfAutho="16" my:hideDetails="0" my:Type="ECR" my:NoOfVBUCMApprovales="10" >
Properly references the my namespace already, so you should just be able to do this:
<my:ECR NoOfAutho="16" hideDetails="0" Type="ECR" NoOfVBUCMApprovales="10" >
And you should be fine?
You will also have to remove the my: from other places in the file particularly closing tags
</ReasonForCR>

Parse XDocument without having to keep specifying the default namespace

I have some XML data (similar to the sample below) and I want to read the values in code.
Why am I forced to specify the default namespace to access each element? I would have expected the default namespace to be used for all elements.
Is there a more logical way to achieve my goal?
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<ReceiptsBatch xmlns="http://www.secretsonline.gov.uk/secrets">
<MessageHeader>
<MessageID>00000173</MessageID>
<Timestamp>2009-10-28T16:50:01</Timestamp>
<MessageCheck>BX4f+RmNCVCsT5g</MessageCheck>
</MessageHeader>
<Receipts>
<Receipt>
<Status>OK</Status>
</Receipt>
</Receipts>
</ReceiptsBatch>
Code to read xml elements I'm after:
XDocument xDoc = XDocument.Load( FileInPath );
XNamespace ns = "http://www.secretsonline.gov.uk/secrets";
XElement MessageCheck = xDoc.Element(ns+ "MessageHeader").Element(ns+"MessageCheck");
XElement MessageBody = xDoc.Element("Receipts");
As suggested by this answer, you can do this by removing all namespaces from the in-memory copy of the document. I suppose this should only be done if you know you won't have name collisions in the resulting document.
/// <summary>
/// Makes parsing easier by removing the need to specify namespaces for every element.
/// </summary>
private static void RemoveNamespaces(XDocument document)
{
var elements = document.Descendants();
elements.Attributes().Where(a => a.IsNamespaceDeclaration).Remove();
foreach (var element in elements)
{
element.Name = element.Name.LocalName;
var strippedAttributes =
from originalAttribute in element.Attributes().ToArray()
select (object)new XAttribute(originalAttribute.Name.LocalName, originalAttribute.Value);
//Note that this also strips the attributes' line number information
element.ReplaceAttributes(strippedAttributes.ToArray());
}
}
You can use XmlTextReader.Namespaces property to disable namespaces while reading XML file.
string filePath;
XmlTextReader xReader = new XmlTextReader(filePath);
xReader.Namespaces = false;
XDocument xDoc = XDocument.Load(xReader);
This is how the Linq-To-Xml works. You can't find any element, if it is not in default namespace, and the same is true about its descendants. The fastest way to get rid from namespace is to remove link to the namespace from your initial XML.
The theory is that the meaning of the document is not affected by the user's choice of namespace prefixes. So long as the data is in the namespace http://www.secretsonline.gov.uk/secrets, it doesn't matter whether the author chooses to use the prefix "s", "secrets", "_x.cafe.babe", or the "null" prefix (that is, making it the default namespace). Your application shouldn't care: it's only the URI that matters. That's why your application has to specify the URI.
Note that the element Receipts is also in namespace http://www.secretsonline.gov.uk/secrets, so the XNamespace would also be required for the access to the element:
XElement MessageBody = xDoc.Element(ns + "Receipts");
As an alternative to using namespaces note that you can use "namespace agnostic" xpath using local-name() and namespace-uri(), e.g.
/*[local-name()='SomeElement' and namespace-uri()='somexmlns']
If you omit the namespace-uri predicate:
/*[local-name()='SomeElement']
Would match ns1:SomeElement and ns2:SomeElement etc. IMO I would always prefer XNamespace where possible, and the use-cases for namespace-agnostic xpath are quite limited, e.g. for parsing of specific elements in documents with unknown schemas (e.g. within a service bus), or best-effort parsing of documents where the namespace can change (e.g. future proofing, where the xmlns changes to match a new version of the document schema)

How to find a XML tag with a specific attribute (in C#)

I need to get a list of tags that contain a specific attribute. I am using DITA xml and I need to find out all tags that has a href attribute.
The problem here is that the attribute may be inside any tag so XPath will not work in this case. For example, an image tag may contain a href, a topicref tag may contain a href, and so on.
So I need to get a XmlNodeList (as returned by the getElementByTagName method). Ideally I need a method getElementByAttributeName that should return XmlNodeList.
I might have misunderstood your problem here, but I think you could possibly use an XPath expression.
var nodes = doc.SelectNodes("//*[#href='pic1.jpg']");
The above should return all elements with href='pic1.jpg', where doc is the XmlDocument
If you're on C#, then the following approach might work for you:
XDocument document = XDocument.Load(xmlReader);
XAttribute xa = new XAttribute("href", "pic1.jpg");
var attrList = document.Descendants().Where (d => d.Attributes().Contains(xa));

Parsing XML document with XPath, C#

So I'm trying to parse the following XML document with C#, using System.XML:
<root xmlns:n="http://www.w3.org/TR/html4/">
<n:node>
<n:node>
data
</n:node>
</n:node>
<n:node>
<n:node>
data
</n:node>
</n:node>
</root>
Every treatise of XPath with namespaces tells me to do the following:
XmlNamespaceManager mgr = new XmlNamespaceManager(xmlDoc.NameTable);
mgr.AddNamespace("n", "http://www.w3.org/1999/XSL/Transform");
And after I add the code above, the query
xmlDoc.SelectNodes("/root/n:node", mgr);
Runs fine, but returns nothing. The following:
xmlDoc.SelectNodes("/root/node", mgr);
returns two nodes if I modify the XML file and remove the namespaces, so it seems everything else is set up correctly. Any idea why it work doesn't with namespaces?
Thanks alot!
As stated, it's the URI of the namespace that's important, not the prefix.
Given your xml you could use the following:
mgr.AddNamespace( "someOtherPrefix", "http://www.w3.org/TR/html4/" );
var nodes = xmlDoc.SelectNodes( "/root/someOtherPrefix:node", mgr );
This will give you the data you want. Once you grasp this concept it becomes easier, especially when you get to default namespaces (no prefix in source xml), since you instantly know you can assign a prefix to each URI and strongly reference any part of the document you like.
The URI you specified in your AddNamespace method doesn't match the one in the xmlns declaration.
If you declare prefix "n" to represent the namespace "http://www.w3.org/1999/XSL/Transform", then the nodes won't match when you do your query. This is because, in your document, the prefix "n" refers to the namespace "http://www.w3.org/TR/html4/".
Try doing mgr.AddNamespace("n", "http://www.w3.org/TR/html4/"); instead.

Categories

Resources