XML namespaces and XPath - c#

I have an application that has to load XML document and output nodes depending on XPath.
Suppose I start with a document like this:
<aaa>
...[many nodes here]...
<bbb>text</bbb>
...[many nodes here]...
<bbb>text</bbb>
...[many nodes here]...
</aaa>
With XPath //bbb
So far everything is nice.
And selection doc.SelectNodes("//bbb"); returns the list of required nodes.
Then someone uploads a document with one node like <myfancynamespace:foo/> and extra namespace in the root tag, and everything breaks.
Why? //bbb does not give a damn about myfancynamespace, theoretically it should even be good with //myfancynamespace:foo, as there is no ambiguity, but the expression returns 0 results and that's it.
Is there a workaround for this behavior?
I do have a namespace manager for the document, and I am passing it to the Xpath query. But the namespaces and the prefixes are unknown to me, so I can't add them before the query.
Do I have to pre-parse the document to fill the namespace manager before I do any selections? Why on earth such behavior, it just doesn't make sense.
EDIT:
I'm using:
XmlDocument and XmlNamespaceManager
EDIT2:
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null;
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
//I wish I could:
//nsmgr.AddNamespace("magic", "http://magicnamespaceuri/
//...
doc.LoadXML(usersuppliedxml);
XmlNodeList nodes = doc.SelectNodes(usersuppliedxpath, nsmgr);//usersuppliedxpath -> "//bbb"
//nodes.Count should be > 0, but with namespaced document they are 0
EDIT3:
Found an article which describes the actual scenario of the issue with one workaround, but not very pretty workaround: http://codeclimber.net.nz/archive/2008/01/09/How-to-query-a-XPath-doc-that-has-a-default.aspx
Almost seems that stripping the xmlns is the way to go...

You're missing the whole point of XML namespaces.
But if you really need to perform XPath on documents that will use an unknown namespace, and you really don't care about it, you will need to strip it out and reload the document. XPath will not work in a namespace-agnostic way, unless you want to use the local-name() function at every point in your selectors.
private XmlDocument StripNamespace(XmlDocument doc)
{
if (doc.DocumentElement.NamespaceURI.Length > 0)
{
doc.DocumentElement.SetAttribute("xmlns", "");
// must serialize and reload for this to take effect
XmlDocument newDoc = new XmlDocument();
newDoc.LoadXml(doc.OuterXml);
return newDoc;
}
else
{
return doc;
}
}

<myfancynamespace:foo/> is not necessarily the same as <foo/>.
Namespaces do matter. But I can understand your frustration as they usually tend to breaks codes as various implementation (C#, Java, ...) tend to output it differently.
I suggest you change your XPath to allow for accepting all namespaces. For example instead of
//bbb
Define it as
//*[local-name()='bbb']
That should take care of it.

You should describe a bit more detailed what you want to do. The way you ask your question it make no sense at all. The namespace is just a part of the name. Nothing more, nothing less. So your question is the same as asking for an XPath query to get all tags ending with "x". That's not the idea behind XML, but if you have strange reasons to do so: Feel free to iterate over all nodes and implement it yourself. The same applies to functionality you are requesting.

You could use the LINQ XML classes like XDocument. They greatly simplify working with namespaces.

Related

Xml with Namespace but without xmlns

I am working on a program in C# that edits open-document files on xml level. For example it adds rows to tables.
So I load the content.xml into an XmlDocument "doc" and traverse the xml structure.
Say I have the <table:table-row> node in an XmlNode "row" and now I want to add a <table:table-cell> node to it. So I call
XmlDocument doc = new XmlDocument();
doc.Load(filename);
...
XmlNode row = ...;
...
XmlNode cell = doc.CreateElement("table:table-cell");
row.Append(cell);
...
doc.Save(filename);
The problem is that, in the file, the new node only contains
<table-cell>...</table-cell>
C# just decides to ignore what I told it to and does something else without even telling me (at first I overlooked the problem and was wondering why it didn't work although the generated xml looked okay).
From what I gathered out so far, the problem has to do with the fact that "table:" is a namespace. When I also supply a NamespaceURI to CreateElement, I get
<table:table-cell table:xmlns="THE_URI" >... - but the original document did not have this xmlns, so I don't want it either...
I tried to use an XmlTextWriter and setting writer.Settings.Namespaces = false, because I thought, this should suppress the output of the xmlns, but it only caused an exception - the document has some namespaces, which are forbidden if Namespaces is set to false... (wtf!? suppressing the output of xmlns seems a billion times more logical than throwing an exception if an xmlns is present...)
In some similar discussions I read that you should set the cell.Name manually, but this property is read-only...
Others suggest to change it on text-file level (that's tinkering and it would be slow)
Can anyone give me a hint?
Every namespace should have at least one xmlns definition with a URI. This is the ultimate differentiation between two tags.
You can however have the xmlns attribute declared only once in the file (in the beginning).
See
Creating a specific XML document using namespaces in C#
The table: parts are not namespaces. They are "namespace prefixes". They are an alias for the actual namespace. They must be declared somewhere. If they are not declared at all in your source XML, then it is not valid XML, and you shouldn't expect to be able to process it.
Are you sure that what you have loaded is the entire XML document? They haven't left off parts to make it simpler? Those parts may be the ones that contain the definition of table:.

Better way to find xml node?

Can anyone tell me if there is a better way to search an XML file and replace a value?
The node could exist anywhere, so can't use xpath.
I can achieve what I want with the following, but just wondering if there is an easier way.
XmlDocument doc = new XmlDocument();
doc.Load("c:\\test.xml");
XmlNodeList elemList = doc.GetElementsByTagName("NameToChange");
for (int i=0; i < elemList.Count; i++)
{
elemList[i].InnerText = "replacedText";
}
TIA
Dave
Use LINQ to XML and the XElement etc classes. Personally, I prefer it to XPath because it's integrated directly into the language, instead of relying on strings.
Actually, I think there's no problem with unspecified location when it comes to XPath. Just use the descendant direction
http://www.tizag.com/xmlTutorial/xpathdescendant.php
The XPath would be:
//NameToChange
But your solution seems OK if you ask me. The way you did it seems easy.
The main difference that could result from using one solution or the other would be in the difficulty of implementing any changes in the way you want to select your nodes. I'm not familiar with C# libraries but XPath lets you do almost anything in a single line, when it comes to selection of elements.

C#, XML Query Question

I have tons of XML files all containing a the same XML Document, but with different values. But the structure is the same for each file.
Inside this file I have a datetime field.
What is the best, most efficient way to query these XML files? So I can retrieve for example... All files where the datetime field = today's date?
I'm using C# and .net v2. Should I be using XML objects to achieve this or text in file search routines?
Some code examples would be great... or just the general theory, anything would help, thanks...
This depends on the size of those files, and how complex the data actually is. As far as I understand the question, for this kind of XML data, using an XPath query and going through all the files might be the best approach, possibly caching the files in order to lessen the parsing overhead.
Have a look at:
XPathDocument, XmlDocument classes and XPath queries
http://support.microsoft.com/kb/317069
Something like this should do (not tested though):
XmlNamespaceManager nsmgr = new XmlNamespaceManager(new NameTable());
// if required, add your namespace prefixes here to nsmgr
XPathExpression expression = XPathExpression.Compile("//element[#date='20090101']", nsmgr); // your query as XPath
foreach (string fileName in Directory.GetFiles("PathToXmlFiles", "*.xml")) {
XPathDocument doc;
using (XmlTextReader reader = new XmlTextReader(fileName, nsmgr.NameTable)) {
doc = new XPathDocument(reader);
}
if (doc.CreateNavigator().SelectSingleNode(expression) != null) {
// matching document found
}
}
Note: while you can also load a XPathDocument directly from a URI/path, using the reader makes sure that the same nametable is being used as the one used to compile the XPath query. If a different nametable was being used, you'd not get results from the query.
You might look into running XSL queries. See also XSLT Tutorial, XML transformation using Xslt in C#, How to query XML with an XPath expression by using Visual C#.
This question also relates to another on Stack Overflow: Parse multiple XML files with ASP.NET (C#) and return those with particular element. The accepted answer there, though, suggests using Linq.
If it is at all possible to move to C# 3.0 / .NET 3.5, LINQ-to-XML would be by far the easiest option.
With .NET 2.0, you're stuck with either XML objects or XSL.

C# xml read/write/xpath without using XmlDocument

I am refactoring some code in an existing system. The goal is to remove all instances of the XmlDocument to reduce the memory footprint. However, we use XPath to manipulate the xml when certain rules apply. Is there a way to use XPath without using a class that loads the entire document into memory? We've replaced all other instances with XmlTextReader, but those only worked because there is no XPath and the reading is very simple.
Some of the XPath uses values of other nodes to base its decision on. For instance, the value of the message node may be based on the value of the amount node, so there is a need to access multiple nodes at one time.
If your XPATH expression is based on accessing multiple nodes, you're just going to have to read the XML into a DOM. Two things, though. First, you don't have to read all of it into a DOM, just the part you're querying. Second, which DOM you use makes a difference; XPathDocument is read-only and tuned for XPATH query speed, unlike the more general purpose but expensive XmlDocument.
I supose that using System.Xml.Linq.XDocument is also prohibited? Otherwise, it would be a good choice, as it is faster than XmlDocument (as I remember).
Supporting XPath means supporting queries like:
//address[/states/state[#code=current()/#code]='California']
or
//item[#id != preceding-sibling/item/#id]
which require the XPath processor to be able to look everywhere in the document. You're not going to find a forward-only XPath processor.
The way to do this is to use XPathDocument, which can take a stream - therefore you can use StringReader.
This returns the value in a forward read way without the overhead of loading the whole XML DOM into memory with XmlDocument.
Here is an example which returns the value of the first node that satisfies the XPath query:
public string extract(string input_xml)
{
XPathDocument document = new XPathDocument(new StringReader(input_xml));
XPathNavigator navigator = document.CreateNavigator();
XPathNodeIterator node_iterator = navigator.Select(SEARCH_EXPRESSION);
node_iterator.MoveNext();
return node_iterator.Current.Value;
}

Parsing XML document with XPath, C#

So I'm trying to parse the following XML document with C#, using System.XML:
<root xmlns:n="http://www.w3.org/TR/html4/">
<n:node>
<n:node>
data
</n:node>
</n:node>
<n:node>
<n:node>
data
</n:node>
</n:node>
</root>
Every treatise of XPath with namespaces tells me to do the following:
XmlNamespaceManager mgr = new XmlNamespaceManager(xmlDoc.NameTable);
mgr.AddNamespace("n", "http://www.w3.org/1999/XSL/Transform");
And after I add the code above, the query
xmlDoc.SelectNodes("/root/n:node", mgr);
Runs fine, but returns nothing. The following:
xmlDoc.SelectNodes("/root/node", mgr);
returns two nodes if I modify the XML file and remove the namespaces, so it seems everything else is set up correctly. Any idea why it work doesn't with namespaces?
Thanks alot!
As stated, it's the URI of the namespace that's important, not the prefix.
Given your xml you could use the following:
mgr.AddNamespace( "someOtherPrefix", "http://www.w3.org/TR/html4/" );
var nodes = xmlDoc.SelectNodes( "/root/someOtherPrefix:node", mgr );
This will give you the data you want. Once you grasp this concept it becomes easier, especially when you get to default namespaces (no prefix in source xml), since you instantly know you can assign a prefix to each URI and strongly reference any part of the document you like.
The URI you specified in your AddNamespace method doesn't match the one in the xmlns declaration.
If you declare prefix "n" to represent the namespace "http://www.w3.org/1999/XSL/Transform", then the nodes won't match when you do your query. This is because, in your document, the prefix "n" refers to the namespace "http://www.w3.org/TR/html4/".
Try doing mgr.AddNamespace("n", "http://www.w3.org/TR/html4/"); instead.

Categories

Resources