I have a rather large XML file from a computer diagnostics session, and my goal is to grab the test results data and pump it into a PDF for the customer. I've very little experience with XML and this is turning out to be a huge problem.
Here is a sample of the Document:
<pcd:DiagLog xmlns="http://www.pc-doctor.com/2004/8/diagLogger"
xmlns:pcd="http://www.pc-doctor.com/2004/8/diagLogger"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.pc-doctor.com/2004/8/diagLogger
http://www.pc-doctor.com/2004/8/diagLogger/diagLogger.xsd">
<Application>
<version>6.0.6818.10</version>
<Start>
<Time hour="04" minute="14" second="01" millisecond="34" month="10" day="15" year="2016" utcOffset="-480">2016-10-15T04:14:01.034-08:00</Time>
</Start>
<OS>Windows 10 Service Pack 0 PE x86 32-bit</OS>
</Application>
.......
<DiagInfo>
....
<TestResult EnglishResult="PASS">
....
</TestResult>
</DiagInfo>
There are thousands of lines between </Application> and <DiagInfo>, but I'm only concerned with the information found in <DiagInfo> and <TestResult>.
I thought I could grab the Nodes by simply:
XmlDocument doc = new XmlDocument();
doc.Load(xmlFilePath);
XmlNamespaceManager manager = new XmlNamespaceManager(doc.NameTable);
manager.AddNamespace("pcd", "http://www.pc-doctor.com/2004/8/diagLogger");
XmlNodeList xnList = doc.SelectNodes("/pcd:DiagLog/DiagInfo", manager);
But this is returning an empty list. When I refer to Namespace Manager or XsltContext needed, it appears I'm doing it right, but I don't think I'm understanding adding a namespace correctly. When I change the Root Element to just: <Diagnostics></Diagnostics> instead of the <pcd:DiagLog>, and try: doc.SelectNodes("/Diagnostics/DiagInfo", manager); my nodes list is populated.
Can anyone see where I'm screwing up the Namespace?
You need to use the namespace prefix for all nodes in that namespace.
This is incorrect: /pcd:DiagLog/DiagInfo.
This is correct: /pcd:DiagLog/pcd:DiagInfo.
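To see the difference in isolation, here is a minimal sketch. It assumes a trimmed-down stand-in for the log (the `SampleXml` constant and the `DiagLogQuery`/`CountMatches` names are made up for illustration); the namespace URI is the one from the question. Because the root declares that URI as the default namespace too, the unprefixed child elements belong to it, and XPath 1.0 never applies a default namespace to unprefixed names.

```csharp
using System.Xml;

static class DiagLogQuery
{
    // Hypothetical, trimmed-down stand-in for the real diagnostics log.
    public const string SampleXml =
        "<pcd:DiagLog xmlns=\"http://www.pc-doctor.com/2004/8/diagLogger\" " +
        "xmlns:pcd=\"http://www.pc-doctor.com/2004/8/diagLogger\">" +
        "<DiagInfo><TestResult EnglishResult=\"PASS\" /></DiagInfo>" +
        "</pcd:DiagLog>";

    // Runs the XPath against the sample with the "pcd" prefix registered
    // and returns how many nodes were selected.
    public static int CountMatches(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);

        var manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("pcd", "http://www.pc-doctor.com/2004/8/diagLogger");

        return doc.SelectNodes(xpath, manager).Count;
    }
}
```

With this sample, `CountMatches("/pcd:DiagLog/DiagInfo")` finds nothing, while `CountMatches("/pcd:DiagLog/pcd:DiagInfo")` and `CountMatches("/pcd:DiagLog/pcd:DiagInfo/pcd:TestResult")` each find one node. The attribute step `@EnglishResult` needs no prefix, since unprefixed attributes are in no namespace.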
I have following XML:
<?xml version="1.0" encoding="utf-16"?>
<cincinnati xmlns="http://www.sesame-street.com/abc/def/1">
<cincinnatiChild xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<ElementValue xmlns:a="http://schemas.data.org/2004/07/sesame-street.abc.def.ghi">
<a:someField>false</a:someField>
<a:data xmlns:b="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<b:KeyValueThing>
<b:Key>key1</b:Key>
<b:Value i:type="a:ArrayOfPeople">
<a:Person>
<a:firstField>
</a:firstField>
<a:dictionary>
<b:KeyValueThing>
<b:Key>ID</b:Key>
<b:Value i:type="c:long" xmlns:c="http://www.w3.org/2001/XMLSchema">000101</b:Value>
</b:KeyValueThing>
<b:KeyValueThing>
<b:Key>Name</b:Key>
<b:Value i:type="c:string" xmlns:c="http://www.w3.org/2001/XMLSchema">John</b:Value>
</b:KeyValueThing>
</a:dictionary>
</a:Person>
<a:Person>
...
<b:Value i:type="c:long" xmlns:c="http://www.w3.org/2001/XMLSchema">000102</b:Value>
...
</a:Person>
</b:Value>
</b:KeyValueThing>
</a:data>
</ElementValue>
</cincinnatiChild>
</cincinnati>
I need to get a list of ID values, e.g. 000101, 000102....
I think using XPath makes sense here but the multitude of namespaces makes it confusing (so a simple XmlNamespaceManager won't do).
Basically I need something like this (this syntax is of course not correct):
XmlDocument doc = // Load the xml
doc.DocumentElement.SelectSingleNode("/cincinati/cincinnatiChild/ElementValue/a:data/b:KeyValueThing/b:Value/a:Person/a:dictionary[b:KeyValueThing/b:Key='ID']");
also when I do doc.DocumentElement.SelectSingleNode("/cincinnati") or doc.DocumentElement.SelectSingleNode("/cincinnatiChild") I get null.
Since I'm unsure how to piece together all the helpful advice from the comments I would like to see a working C# code line; either XmlDocument or XDocument is OK.
I also tried this before the SelectSingleNode:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("a", "http://schemas.data.org/2004/07/sesame-street.abc.def.ghi");
nsmgr.AddNamespace("b", "http://schemas.microsoft.com/2003/10/Serialization/Arrays");
nsmgr.AddNamespace("c", "http://www.w3.org/2001/XMLSchema");
nsmgr.AddNamespace("i", "http://www.w3.org/2001/XMLSchema-instance");
nsmgr.AddNamespace("d", "http://www.sesame-street.com/abc/def/1");
You could bypass the namespace issues using the local-name() function.
That is, instead of b:KeyValueThing use *[local-name()='KeyValueThing'].
A simple global XPath could look like this:
//*[local-name()='KeyValueThing'][*[local-name()='Key' and text()='ID' ]]/*[local-name()='Value']/text()
If you want to be more precise and speed up the XPath it would look like this:
/*[local-name()='cincinnati']/*[local-name()='cincinnatiChild']/*[local-name()='ElementValue']/*[local-name()='data']/*[local-name()='KeyValueThing']/*[local-name()='Value']/*[local-name()='Person']/*[local-name()='dictionary']/*[local-name()='KeyValueThing'][*[local-name()='Key' and text()='ID' ]]/*[local-name()='Value']/text()
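Since the question asked for working C#, here is a rough sketch of both routes with XmlDocument. The class and method names are invented for illustration. The first method uses the local-name() query from above, only stopping at the Value element so InnerText can be read; the second registers the "b" prefix with the URI from the question and uses ordinary prefixed steps.

```csharp
using System.Collections.Generic;
using System.Xml;

static class IdExtractor
{
    // Namespace-agnostic: matches on local names only.
    public static List<string> GetIdsByLocalName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        const string xpath =
            "//*[local-name()='KeyValueThing']" +
            "[*[local-name()='Key' and text()='ID']]" +
            "/*[local-name()='Value']";

        var ids = new List<string>();
        foreach (XmlNode value in doc.SelectNodes(xpath))
        {
            ids.Add(value.InnerText);
        }
        return ids;
    }

    // Namespace-aware: the prefix only has to map to the same URI as in the
    // document; the prefix name itself is arbitrary.
    public static List<string> GetIdsByPrefix(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("b", "http://schemas.microsoft.com/2003/10/Serialization/Arrays");

        var ids = new List<string>();
        foreach (XmlNode value in doc.SelectNodes("//b:KeyValueThing[b:Key='ID']/b:Value", nsmgr))
        {
            ids.Add(value.InnerText);
        }
        return ids;
    }
}
```

Both methods return the values as strings, so leading zeros such as in 000101 are preserved. The local-name() version would also match a KeyValueThing from an unrelated namespace, which is the price of ignoring namespaces.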
I am trying to add a single line/node (provided below) into an XML:
<Import Project=".www\temp.proj" Condition="Exists('.www\temp.proj')" />
The line could be under the main/root node of the XML:
<Project Sdk="Microsoft.NET.Sdk">
The approach I used:
XmlDocument Proj = new XmlDocument();
Proj.LoadXml(file);
XmlElement root = Proj.DocumentElement;
// Not sure about the next steps
root.SetAttribute("not sure", "not sure", "not sure");
I don't know exactly how to add that line, since this is my first attempt at editing XML files directly, and an error got in the way before I could even try.
I get this error on my first attempt:
C# "loadxml" 'Data at the root level is invalid. Line 1, position 1.'
I know this is a well-known error, and a variety of approaches are suggested in this question:
xml.LoadData - Data at the root level is invalid. Line 1, position 1
Unfortunately, most of the solutions are outdated, the accepted answer didn't work in this case, and I don't know how to apply the others.
The accepted answer from that link:
string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
if (xml.StartsWith(_byteOrderMarkUtf8))
{
xml = xml.Remove(0, _byteOrderMarkUtf8.Length);
}
It didn't work, because xml.StartsWith does not seem to exist anymore, and neither does xml.Remove.
Can you please provide a piece of code that bypasses the error and adds the line to the XML?
Edit:
The sample XML file is provided in the comments section.
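For the XML posted in the comments, I used two approaches. Both call Load rather than LoadXml: Load reads from a file path, whereas LoadXml expects the XML text itself, so handing it a path produces the 'Data at the root level is invalid' error.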
1 - XmlDocument
XmlDocument Proj = new XmlDocument();
Proj.Load(file);
XmlElement root = Proj.DocumentElement;
//Create node
XmlNode node = Proj.CreateNode(XmlNodeType.Element, "Import", null);
//create attributes
XmlAttribute attrP = Proj.CreateAttribute("Project");
attrP.Value = ".www\\temp.proj";
XmlAttribute attrC = Proj.CreateAttribute("Condition");
attrC.Value = "Exists('.www\\temp.proj')";
node.Attributes.Append(attrP);
node.Attributes.Append(attrC);
//Get node PropertyGroup, the new node will be inserted before it
XmlNode pG = Proj.SelectSingleNode("/Project/PropertyGroup");
root.InsertBefore(node, pG);
Console.WriteLine(root.OuterXml);
2 - Linq To Xml, by using XDocument
XDocument xDocument = XDocument.Load(file);
xDocument.Root.AddFirst(new XElement("Import",
new XAttribute[]
{
new XAttribute("Project", ".www\\temp.proj"),
new XAttribute("Condition", "Exists('.www\\temp.proj')")
}));
Console.WriteLine(xDocument);
Namespaces to import for XDocument:
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
Both solutions give the same result, but the second one is simpler.
I hope you find this helpful.
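If the project XML is already in memory as a string rather than on disk, the same idea can be sketched as below. The class and method names are made up, and it assumes an SDK-style project without an XML namespace, as in the question. `LoadXmlAccepts` is only there to demonstrate that `LoadXml` parses XML text and rejects anything else, such as a file path.

```csharp
using System.Xml;
using System.Xml.Linq;

static class ImportAdder
{
    // Takes the project XML as text and returns it with the Import element
    // added as the first child of the root.
    public static string AddImport(string projectXml)
    {
        XDocument doc = XDocument.Parse(projectXml);
        doc.Root.AddFirst(new XElement("Import",
            new XAttribute("Project", @".www\temp.proj"),
            new XAttribute("Condition", @"Exists('.www\temp.proj')")));
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    // Returns true when LoadXml can parse the text, false when it throws
    // an XmlException (which is what happens for a file path).
    public static bool LoadXmlAccepts(string text)
    {
        try
        {
            new XmlDocument().LoadXml(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Here `LoadXmlAccepts(@"C:\proj\app.csproj")` is false, while `LoadXmlAccepts("<Project Sdk=\"Microsoft.NET.Sdk\" />")` is true. To work with a file instead, swap `XDocument.Parse` for `XDocument.Load` and write the result back with `Save`.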
Would it be possible for you to use the official MSBuild libraries? (https://www.nuget.org/packages/Microsoft.Build/)
I'm not sure which nuget package is actually required to read and edit project files only.
I've tried to programmatically edit MSBuild project files directly and cannot recommend it. It broke regularly due to unexpected changes...
The MSBuild library does a good job in editing project files and e.g. adding Properties, Items or Imports.
I am tasked with ripping and stripping pertinent data from XFDL files. I am attempting to use XmlDocument's SelectSingleNode method to do so. However, it has proven unsuccessful.
Representative XML:
<XFDL>
...
<page1>
<check3>true</check3>
</page1>
...
<page sid="PAGE1">
<check sid="CHECK9">
<value>true</value>
</check>
</page>
...
Code:
XmlDocument document = new XmlDocument();
document.Load(memoryStream);//decoded and unzipped xfdl file
//Doesn't work
XmlNode checkBox = document.SelectSingleNode("//check[@sid='CHECK9']/value");
//Doesn't work either
checkBox = document.SelectSingleNode("//page[@sid='PAGE1']/check[@sid='CHECK9']");
MsgBox(checkBox.InnerXml);
Yields me System.NullReferenceException as an XmlNode isn't selected.
I think I'm having an xpath issue but I can't seem to understand where. The earlier xml node is easily selected using:
XmlNode checkBox = document.SelectSingleNode("//page1/check3");
MsgBox(checkBox.InnerText);
Displays just fine. And just to head it off at the pass, there isn't a definition of <check9></check9> in the <page1> tag.
Anyone have some insight?
Thanks in advance.
Okay, so here's the deal. XFDL defines a default namespace that requires an arbitrary mapping for xpath querying. In my case:
XML:
<XFDL xmlns="http://www.ibm.com/xmlns/prod/xfdl/8.0" ... >
Code:
XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
manager.AddNamespace("a", "http://www.ibm.com/xmlns/prod/xfdl/8.0");
//Prefix the element names in the query with 'a:'
document.SelectSingleNode("//a:check[@sid='CHECK9']/a:value", manager);
The problem is compounded because <check> is nested in <page>, which is defined in another namespace, xfdl (that prefix has to be registered with the manager as well). My XPath query becomes:
document.SelectSingleNode("//xfdl:page[@sid='PAGE1']/a:check[@sid='CHECK9']/a:value", manager);
Now this is rather XFDL specific but can be applied to other issues where there are multiple namespaces defined within an XML document.
EDIT 1
Source: http://codeclimber.net.nz/archive/2008/01/09/How-to-query-a-XPath-doc-that-has-a-default.aspx
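Putting the pieces together, a sketch might look like the following. Note the assumption, which differs from my actual file: here every element, including page, sits in the default XFDL namespace, so a single arbitrary prefix is enough. The class and method names are hypothetical, and the sid values are assumed to contain no quote characters. Only element steps take the prefix; unprefixed attributes such as sid are in no namespace.

```csharp
using System.Xml;

static class XfdlQuery
{
    // Returns the text of <value> for the given page/check, or null when
    // nothing matches (instead of throwing a NullReferenceException).
    public static string ReadCheckValue(string xml, string pageSid, string checkSid)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);

        var manager = new XmlNamespaceManager(document.NameTable);
        // "a" is an arbitrary prefix mapped to the document's default namespace.
        manager.AddNamespace("a", "http://www.ibm.com/xmlns/prod/xfdl/8.0");

        // Assumes the sid values contain no single quotes.
        string xpath = "//a:page[@sid='" + pageSid + "']" +
                       "/a:check[@sid='" + checkSid + "']/a:value";

        XmlNode node = document.SelectSingleNode(xpath, manager);
        return node?.InnerText;
    }
}
```

Checking the result for null before reading InnerText avoids the exception from the question when a check box is missing from the form.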
I have the following XML:
<env:Envelope xmlns:env='http://schemas.xmlsoap.org/soap/envelope/'>
<env:Header>
<mm7:TransactionID xmlns:mm7='http://www.3gpp.org/ftp/Specs/archive/23_series/23.140/schema/REL-6-MM7-1-4' env:mustUnderstand='1'>6797324d</mm7:TransactionID>
</env:Header>
<env:Body>
<DeliveryReportReq xmlns='http://www.3gpp.org/ftp/Specs/archive/23_series/23.140/schema/REL-6-MM7-1-4'>
<MM7Version>6.8.0</MM7Version><MMSRelayServerID>TARAL</MMSRelayServerID>
<MessageID>T*3*T\*4\*855419761</MessageID>
<Recipient>
<RFC2822Address>+61438922562/TYPE=hidden</RFC2822Address>
</Recipient>
<Sender>
<RFC2822Address>61418225661/TYPE=hidden</RFC2822Address>
</Sender>
<Date>2011-08-15T12:57:27+10:00</Date>
<MMStatus>Retrieved</MMStatus>
<StatusText>The message was retrieved by the recipient</StatusText>
</DeliveryReportReq>
</env:Body>
</env:Envelope>
Then I have the following C# code:
XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml(file);
XmlNode xNode = xDoc.SelectSingleNode("env:Envelope");
and I get the error:
Namespace Manager or XsltContext needed. This query has a prefix, variable, or user-defined function.
Does anyone know how to fix this?
Personally I would use LINQ to XML instead - its namespace support is far easier to get a handle on. It's not clear why you want to use XPath here anyway, given that Envelope is simply the root node - why not just ask for the root node?
However, if you really want to use XPath, you can create a new XmlNamespaceManager from the name table in the XmlDocument, register a prefix and then pass in the namespace manager to the SelectSingleNode overload which takes one.
There's some sample code in this answer but again I'd strongly urge you to consider other approaches if you can... particularly using LINQ to XML, where a search for (say) all the "env:Body" elements (only one here, but...) would look like this:
XNamespace env = "http://schemas.xmlsoap.org/soap/envelope/";
var bodies = doc.Descendants(env + "Body");
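For completeness, here is a rough sketch of both routes against the message in the question, reading the MMStatus element as an example. The class and method names are invented; the two namespace URIs are copied from the question. Note that DeliveryReportReq and its children are in the MM7 namespace via a default declaration, so they need a prefix in XPath even though they have none in the document.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class SoapEnvelopeQuery
{
    const string EnvNs = "http://schemas.xmlsoap.org/soap/envelope/";
    const string Mm7Ns =
        "http://www.3gpp.org/ftp/Specs/archive/23_series/23.140/schema/REL-6-MM7-1-4";

    // XPath route: register the prefixes, then pass the manager to the query.
    public static string StatusViaXPath(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("env", EnvNs);
        manager.AddNamespace("mm7", Mm7Ns);

        XmlNode status = doc.SelectSingleNode(
            "/env:Envelope/env:Body/mm7:DeliveryReportReq/mm7:MMStatus", manager);
        return status?.InnerText;
    }

    // LINQ to XML route: the namespace is part of the element name.
    public static string StatusViaLinq(string xml)
    {
        XNamespace mm7 = Mm7Ns;
        XElement status = XDocument.Parse(xml).Descendants(mm7 + "MMStatus").FirstOrDefault();
        return (string)status; // the explicit conversion yields null for a missing element
    }
}
```

Both methods return null when the element is absent, so the caller can decide how to handle an unexpected message.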
I just ran into an issue where my code was parsing XML fine, but once I added a second node it started to load incorrect data. The real code spans a number of classes and projects, but for the sample I've put together the basics of what's causing the issue.
When the code runs I'd expect the output to be the contents of the second Task node, but instead the contents of the first node is output. It keeps pulling from the first occurrence of the EmailAddresses node despite how when you check the settings object its inner xml is that of the second Task node. The call to SelectSingleNode("//EmailAddresses") is where the issue happens.
I have two ways around this issue:
1. Remove the leading slashes from the EmailAddresses XPath expression
2. Call Clone() after getting the Task or Settings node
Solution 1 works in this case but I believe this will cause other code in my project to stop working.
Solution 2 looks more like a hack to me than a real solution.
My question is: am I in fact doing this correctly and there's a bug in .NET (all versions), or am I just pulling the XML wrong?
The C# code
var doc = new XmlDocument();
doc.Load(@"D:\temp\Sample.xml");
var tasks = doc.SelectSingleNode("Server/Tasks");
foreach (XmlNode threadNode in tasks.ChildNodes)
{
if (threadNode.Name.ToLower() != "thread")
{
continue;
}
foreach (XmlNode taskNode in threadNode.ChildNodes)
{
if (taskNode.Name.ToLower() != "task" || taskNode.Attributes["name"].Value != "task 1")
{
continue;
}
var settings = taskNode.SelectSingleNode("Settings");
var emails = settings.SelectSingleNode("//EmailAddresses");
Console.WriteLine(emails.InnerText);
}
}
The XML
<?xml version="1.0"?>
<Server>
<Tasks>
<Thread>
<Task name="task 2">
<Settings>
<EmailAddresses>task 2 data</EmailAddresses>
</Settings>
</Task>
</Thread>
<Thread>
<Task name="task 1">
<Settings>
<EmailAddresses>task 1 data</EmailAddresses>
</Settings>
</Task>
</Thread>
</Tasks>
</Server>
From http://www.w3.org/TR/xpath/#path-abbrev
// is short for /descendant-or-self::node()/. For example, //para is short for /descendant-or-self::node()/child::para and so will select any para element in the document (even a para element that is a document element will be selected by //para since the document element node is a child of the root node);
And also:
A location step of . is short for self::node(). This is particularly useful in conjunction with //. For example, the location path .//para is short for self::node()/descendant-or-self::node()/child::para and so will select all para descendant elements of the context node.
Instead of:
var settings = taskNode.SelectSingleNode("Settings");
var emails = settings.SelectSingleNode("//EmailAddresses");
Use:
var emails = taskNode.SelectSingleNode("Settings/EmailAddresses");
A leading // in an XPath expression does not do what you think it does. It selects matching nodes anywhere in the document, no matter which node you run the query on.
In other words, it is not limited to the current scope: the search starts from the document root, not from the context node.
To select the first <EmailAddresses> element in your current scope, you only need:
var emails = settings.SelectSingleNode("EmailAddresses");
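The three forms can be compared side by side with a small sketch. It embeds the XML from the question as a string and uses made-up class and method names; each call starts from the Settings node of "task 1" and evaluates the given expression there.

```csharp
using System.Xml;

static class ContextNodeDemo
{
    // The sample document from the question, inlined for convenience.
    public const string SampleXml =
        "<Server><Tasks>" +
        "<Thread><Task name=\"task 2\"><Settings>" +
        "<EmailAddresses>task 2 data</EmailAddresses>" +
        "</Settings></Task></Thread>" +
        "<Thread><Task name=\"task 1\"><Settings>" +
        "<EmailAddresses>task 1 data</EmailAddresses>" +
        "</Settings></Task></Thread>" +
        "</Tasks></Server>";

    // Evaluates the expression with the Settings node of "task 1" as context
    // and returns the text of the first match, or null if there is none.
    public static string SelectFromSettings(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);

        XmlNode settings = doc.SelectSingleNode(
            "/Server/Tasks/Thread/Task[@name='task 1']/Settings");

        return settings.SelectSingleNode(xpath)?.InnerText;
    }
}
```

`SelectFromSettings("//EmailAddresses")` returns "task 2 data" because the search starts at the document root, while `SelectFromSettings("EmailAddresses")` and `SelectFromSettings(".//EmailAddresses")` both return "task 1 data". The `.//` form is the one to reach for when the element may sit deeper below the context node.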