Parsing XFDL Contents - C#

I am tasked with ripping and stripping pertinent data from XFDL files. I am attempting to use XmlDocument's SelectSingleNode method to do so. However, it has proven unsuccessful.
Representative XML:
<XFDL>
...
<page1>
<check3>true</check3>
</page1>
...
<page sid="PAGE1">
<check sid="CHECK9">
<value>true</value>
</check>
</page>
...
Code:
XmlDocument document = new XmlDocument();
document.Load(memoryStream);//decoded and unzipped xfdl file
//Doesn't work
XmlNode checkBox = document.SelectSingleNode("//check[@sid='CHECK9']/value");
//Doesn't work
XmlNode checkBox = document.SelectSingleNode("//page[@sid='PAGE1']/check[@sid='CHECK9']");
MsgBox(checkBox.InnerXml);
This throws a System.NullReferenceException because no XmlNode is selected.
I think I'm having an XPath issue, but I can't work out where. The earlier XML node is easily selected using:
XmlNode checkBox = document.SelectSingleNode("//page1/check3");
MsgBox(checkBox.InnerText);
That displays just fine. And just to head it off at the pass: there is no <check9></check9> element inside the <page1> tag.
Anyone have some insight?
Thanks in advance.

Okay, so here's the deal. XFDL defines a default namespace, and XPath queries need an arbitrary prefix mapped to that namespace. In my case:
XML:
<XFDL xmlns="http://www.ibm.com/xmlns/prod/xfdl/8.0" ... >
Code:
XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
manager.AddNamespace("a", "http://www.ibm.com/xmlns/prod/xfdl/8.0");
//Prefix query elements with 'a:'
document.SelectSingleNode("//a:check[@sid='CHECK9']/a:value", manager);
The problem is compounded because <check> is buried in <page>, which is defined in another namespace: xfdl (that prefix has to be registered with the manager in the same way). My XPath query becomes:
document.SelectSingleNode("//xfdl:page[@sid='PAGE1']/a:check[@sid='CHECK9']/a:value", manager);
Now this is rather XFDL-specific, but it can be applied to other cases where multiple namespaces are defined within an XML document.
EDIT 1
Source: http://codeclimber.net.nz/archive/2008/01/09/How-to-query-a-XPath-doc-that-has-a-default.aspx
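To make the prefix mapping concrete, here is a minimal, self-contained sketch. It assumes a cut-down document in which every element sits in the default XFDL namespace, so a single prefix is enough. The class name XfdlNamespaceDemo, the method ReadCheck9 and the inline XML are made up for illustration and are not taken from a real form.

```csharp
using System;
using System.Xml;

public static class XfdlNamespaceDemo
{
    // Hypothetical, cut-down XFDL: every element is in the default namespace.
    public const string SampleXml =
        "<XFDL xmlns=\"http://www.ibm.com/xmlns/prod/xfdl/8.0\">" +
        "<page sid=\"PAGE1\"><check sid=\"CHECK9\"><value>true</value></check></page>" +
        "</XFDL>";

    // Returns the text of CHECK9's <value>, or null if nothing matches.
    public static string ReadCheck9(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);

        // "a" is an arbitrary prefix; only the URI has to match the document.
        var manager = new XmlNamespaceManager(document.NameTable);
        manager.AddNamespace("a", "http://www.ibm.com/xmlns/prod/xfdl/8.0");

        XmlNode node = document.SelectSingleNode(
            "//a:page[@sid='PAGE1']/a:check[@sid='CHECK9']/a:value", manager);
        return node?.InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(ReadCheck9(SampleXml)); // prints "true"
    }
}
```

If a real form declares further namespaces, each prefix used in the query would need its own AddNamespace call with the URI from that document.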

Related

How to get list of values when XML contains many namespaces

I have following XML:
<?xml version="1.0" encoding="utf-16"?>
<cincinnati xmlns="http://www.sesame-street.com/abc/def/1">
<cincinnatiChild xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<ElementValue xmlns:a="http://schemas.data.org/2004/07/sesame-street.abc.def.ghi">
<a:someField>false</a:someField>
<a:data xmlns:b="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<b:KeyValueThing>
<b:Key>key1</b:Key>
<b:Value i:type="a:ArrayOfPeople">
<a:Person>
<a:firstField>
</a:firstField>
<a:dictionary>
<b:KeyValueThing>
<b:Key>ID</b:Key>
<b:Value i:type="c:long" xmlns:c="http://www.w3.org/2001/XMLSchema">000101</b:Value>
</b:KeyValueThing>
<b:KeyValueThing>
<b:Key>Name</b:Key>
<b:Value i:type="c:string" xmlns:c="http://www.w3.org/2001/XMLSchema">John</b:Value>
</b:KeyValueThing>
</a:dictionary>
</a:Person>
<a:Person>
...
<b:Value i:type="c:long" xmlns:c="http://www.w3.org/2001/XMLSchema">000102</b:Value>
...
</a:Person>
</b:Value>
</b:KeyValueThing>
</a:data>
</ElementValue>
</cincinnatiChild>
</cincinnati>
I need to get a list of ID values, e.g. 000101, 000102, ...
I think using XPath makes sense here, but the multitude of namespaces makes it confusing (so a simple XmlNamespaceManager won't do).
Basically I need something like this (this syntax is of course not correct):
XmlDocument doc = // Load the xml
doc.DocumentElement.SelectSingleNode("/cincinnati/cincinnatiChild/ElementValue/a:data/b:KeyValueThing/b:Value/a:Person/a:dictionary[b:KeyValueThing/b:Key='ID']");
Also, when I call doc.DocumentElement.SelectSingleNode("/cincinnati") or doc.DocumentElement.SelectSingleNode("/cincinnatiChild") I get null.
Since I'm unsure how to piece together all the helpful advice from the comments, I would like to see a working line of C# code; either XmlDocument or XDocument is OK.
I also tried this before the SelectSingleNode call:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("a", "http://schemas.data.org/2004/07/sesame-street.abc.def.ghi");
nsmgr.AddNamespace("b", "http://schemas.microsoft.com/2003/10/Serialization/Arrays");
nsmgr.AddNamespace("c", "http://www.w3.org/2001/XMLSchema");
nsmgr.AddNamespace("i", "http://www.w3.org/2001/XMLSchema-instance");
nsmgr.AddNamespace("d", "http://www.sesame-street.com/abc/def/1");

You could bypass the namespace issues by using the local-name() function,
i.e. instead of b:KeyValueThing use *[local-name()='KeyValueThing'].
A simple global XPath could look like this:
//*[local-name()='KeyValueThing'][*[local-name()='Key' and text()='ID' ]]/*[local-name()='Value']/text()
If you want to be more precise and speed up the XPath, it would look like this:
/*[local-name()='cincinnati']/*[local-name()='cincinnatiChild']/*[local-name()='ElementValue']/*[local-name()='data']/*[local-name()='KeyValueThing']/*[local-name()='Value']/*[local-name()='Person']/*[local-name()='dictionary']/*[local-name()='KeyValueThing'][*[local-name()='Key' and text()='ID' ]]/*[local-name()='Value']/text()
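Since the question asks for working C#, here is a rough sketch of how the global local-name() query above could be run with XmlDocument. The XML is a trimmed-down stand-in for the document in the question, and the names LocalNameDemo and GetIds are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class LocalNameDemo
{
    // Trimmed-down stand-in for the XML in the question (two Person entries).
    public const string SampleXml = @"<cincinnati xmlns=""http://www.sesame-street.com/abc/def/1"">
  <cincinnatiChild xmlns:i=""http://www.w3.org/2001/XMLSchema-instance"">
    <ElementValue xmlns:a=""http://schemas.data.org/2004/07/sesame-street.abc.def.ghi"">
      <a:data xmlns:b=""http://schemas.microsoft.com/2003/10/Serialization/Arrays"">
        <b:KeyValueThing>
          <b:Key>key1</b:Key>
          <b:Value i:type=""a:ArrayOfPeople"">
            <a:Person><a:dictionary>
              <b:KeyValueThing><b:Key>ID</b:Key><b:Value>000101</b:Value></b:KeyValueThing>
              <b:KeyValueThing><b:Key>Name</b:Key><b:Value>John</b:Value></b:KeyValueThing>
            </a:dictionary></a:Person>
            <a:Person><a:dictionary>
              <b:KeyValueThing><b:Key>ID</b:Key><b:Value>000102</b:Value></b:KeyValueThing>
            </a:dictionary></a:Person>
          </b:Value>
        </b:KeyValueThing>
      </a:data>
    </ElementValue>
  </cincinnatiChild>
</cincinnati>";

    // Collects the text of every Value whose sibling Key is "ID",
    // matching on local names only so no namespace manager is needed.
    public static List<string> GetIds(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        const string xpath =
            "//*[local-name()='KeyValueThing'][*[local-name()='Key' and text()='ID']]" +
            "/*[local-name()='Value']/text()";

        var ids = new List<string>();
        foreach (XmlNode textNode in doc.SelectNodes(xpath))
        {
            ids.Add(textNode.Value);
        }
        return ids;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", GetIds(SampleXml))); // prints "000101, 000102"
    }
}
```

The trade-off is that local-name() ignores namespaces completely, so two elements with the same local name in different namespaces would both match.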

Insert a node into a XML file

I am trying to add a single line/node (provided below) into an XML:
<Import Project=".www\temp.proj" Condition="Exists('.www\temp.proj')" />
The line could be under the main/root node of the XML:
<Project Sdk="Microsoft.NET.Sdk">
The approach I used:
XmlDocument Proj = new XmlDocument();
Proj.LoadXml(file);
XmlElement root = Proj.DocumentElement;
// Not sure about the next steps
root.SetAttribute("not sure", "not sure", "not sure");
I don't know exactly how to add that line to the XML, since this is my first try at editing XML files directly, and an error adds a further problem on top of that.
I get this error on my first attempt:
C# "loadxml" 'Data at the root level is invalid. Line 1, position 1.'
I know this is a well-known error, and a variety of approaches are suggested under this question:
xml.LoadData - Data at the root level is invalid. Line 1, position 1
Unfortunately, most of the solutions are outdated, the accepted answer didn't work in my case, and I don't know how to apply the others.
The accepted answer at that link:
string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
if (xml.StartsWith(_byteOrderMarkUtf8))
{
xml = xml.Remove(0, _byteOrderMarkUtf8.Length);
}
Basically it didn't work, because xml.StartsWith doesn't seem to exist anymore, and neither does xml.Remove.
Can you please provide a piece of code that avoids the error and adds the line to the XML?
Edit:
The sample XML file is provided in the comments section.

For the XML posted in the comment, I have used two approaches:
1 - XmlDocument
XmlDocument Proj = new XmlDocument();
Proj.Load(file);
XmlElement root = Proj.DocumentElement;
//Create node
XmlNode node = Proj.CreateNode(XmlNodeType.Element, "Import", null);
//create attributes
XmlAttribute attrP = Proj.CreateAttribute("Project");
attrP.Value = ".www\\temp.proj";
XmlAttribute attrC = Proj.CreateAttribute("Condition");
attrC.Value = "Exists('.www\\temp.proj')";
node.Attributes.Append(attrP);
node.Attributes.Append(attrC);
//Get node PropertyGroup, the new node will be inserted before it
XmlNode pG = Proj.SelectSingleNode("/Project/PropertyGroup");
root.InsertBefore(node, pG);
Console.WriteLine(root.OuterXml);
2 - LINQ to XML, using XDocument
XDocument xDocument = XDocument.Load(file);
xDocument.Root.AddFirst(new XElement("Import",
new XAttribute[]
{
new XAttribute("Project", ".www\\temp.proj"),
new XAttribute("Condition", "Exists('.www\\temp.proj')")
}));
Console.WriteLine(xDocument);
Namespaces to add for XDocument:
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
Both solutions give the same result, but the second one is simpler.
I hope you find this helpful.
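A side note on the original error: both snippets above call Load(file) where the question called LoadXml(file). LoadXml expects the XML text itself, while Load(string) expects a path or URL. If file in the question held a path (an assumption, since the question doesn't show it), that alone would explain "Data at the root level is invalid. Line 1, position 1." The sketch below illustrates the difference; the class name, method name and path are hypothetical.

```csharp
using System;
using System.Xml;

public static class LoadVsLoadXmlDemo
{
    // Returns true if LoadXml accepts the given text as XML markup.
    public static bool TryLoadXml(string text)
    {
        try
        {
            new XmlDocument().LoadXml(text);
            return true;
        }
        catch (XmlException)
        {
            // Thrown when the text is not well-formed XML, e.g. a file path.
            return false;
        }
    }

    public static void Main()
    {
        // Hypothetical path: LoadXml tries to parse the path string as markup and fails.
        Console.WriteLine(TryLoadXml(@"C:\projects\app.csproj"));

        // Actual XML text parses fine.
        Console.WriteLine(TryLoadXml("<Project Sdk=\"Microsoft.NET.Sdk\" />"));
    }
}
```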

Would it be possible for you to use the official MSBuild libraries? (https://www.nuget.org/packages/Microsoft.Build/)
I'm not sure which NuGet package is actually required just to read and edit project files.
I've tried editing MSBuild project files directly in code and cannot recommend it. It broke regularly due to unexpected changes...
The MSBuild library does a good job of editing project files, e.g. adding Properties, Items or Imports.

Getting child nodes in XML with custom Namespace

I have a rather large XML file from a computer diagnostics session, and my goal is to grab the test results data and pump it into a PDF for the customer. I've very little experience with XML and this is turning out to be a huge problem.
Here is a sample of the Document:
<pcd:DiagLog xmlns="http://www.pc-doctor.com/2004/8/diagLogger"
xmlns:pcd="http://www.pc-doctor.com/2004/8/diagLogger"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.pc-doctor.com/2004/8/diagLogger
http://www.pc-doctor.com/2004/8/diagLogger/diagLogger.xsd">
<Application>
<version>6.0.6818.10</version>
<Start>
<Time hour="04" minute="14" second="01" millisecond="34" month="10" day="15" year="2016" utcOffset="-480">2016-10-15T04:14:01.034-08:00</Time>
</Start>
<OS>Windows 10 Service Pack 0 PE x86 32-bit</OS>
</Application>
.......
<DiagInfo>
....
<TestResult EnglishResult="PASS">
....
</TestResult>
</DiagInfo>
There are thousands of lines between </Application> and <DiagInfo>, but I'm only concerned with the information found in <DiagInfo> and <TestResult>.
I thought I could grab the Nodes by simply:
XmlDocument doc = new XmlDocument();
doc.Load(xmlFilePath);
XmlNamespaceManager manager = new XmlNamespaceManager(doc.NameTable);
manager.AddNamespace("pcd", "http://www.pc-doctor.com/2004/8/diagLogger");
XmlNodeList xnList = doc.SelectNodes("/pcd:DiagLog/DiagInfo", manager);
But this returns an empty list. Judging by the question "Namespace Manager or XsltContext needed", it appears I'm doing it right, but I don't think I understand how to add a namespace correctly. When I change the root element to just <Diagnostics></Diagnostics> instead of <pcd:DiagLog> and try doc.SelectNodes("/Diagnostics/DiagInfo", manager); my node list is populated.
Can anyone see where I'm screwing up the Namespace?
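
You need to use the namespace prefix for all nodes in that namespace. The default xmlns declaration on the root element puts the unprefixed child elements in that same namespace, whereas an unprefixed name in an XPath query only matches elements that are in no namespace.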
This is incorrect: /pcd:DiagLog/DiagInfo.
This is correct: /pcd:DiagLog/pcd:DiagInfo.
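A small sketch of the difference, using a cut-down version of the document from the question. DiagLogDemo and CountMatches are names invented for this example.

```csharp
using System;
using System.Xml;

public static class DiagLogDemo
{
    // Cut-down sample: default namespace and the pcd prefix share one URI.
    public const string SampleXml =
        "<pcd:DiagLog xmlns=\"http://www.pc-doctor.com/2004/8/diagLogger\" " +
        "xmlns:pcd=\"http://www.pc-doctor.com/2004/8/diagLogger\">" +
        "<DiagInfo><TestResult EnglishResult=\"PASS\" /></DiagInfo>" +
        "</pcd:DiagLog>";

    // Counts the nodes selected by the given XPath, with "pcd" registered.
    public static int CountMatches(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("pcd", "http://www.pc-doctor.com/2004/8/diagLogger");

        return doc.SelectNodes(xpath, manager).Count;
    }

    public static void Main()
    {
        // Unprefixed DiagInfo looks for an element in no namespace: no match.
        Console.WriteLine(CountMatches(SampleXml, "/pcd:DiagLog/DiagInfo"));     // prints 0
        // Prefixing every step matches the element in the pcd namespace.
        Console.WriteLine(CountMatches(SampleXml, "/pcd:DiagLog/pcd:DiagInfo")); // prints 1
    }
}
```

The same applies further down the path, e.g. /pcd:DiagLog/pcd:DiagInfo/pcd:TestResult.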

What is wrong with this file or code?

What is happening, and what is the difference?
I'm trying to return a specific node from an XML File.
XML File:
<?xml version="1.0" encoding="utf-8"?>
<JMF SenderID="InkZone-Controller" Version="1.2">
<Command ID="cmd.00695" Type="Resource">
<ResourceCMDParams ResourceName="InkZoneProfile" JobID="K_41">
<InkZoneProfile ID="r0013" Class="Parameter" Locked="false" Status="Available" PartIDKeys="SignatureName SheetName Side Separation" DescriptiveName="Schieberwerte von DI" ZoneWidth="32">
<InkZoneProfile SignatureName="SIG1">
<InkZoneProfile Locked="False" SheetName="S1">
<InkZoneProfile Side="Front" />
</InkZoneProfile>
</InkZoneProfile>
</InkZoneProfile>
</ResourceCMDParams>
</Command>
<InkZoneProfile Separation="Cyan" ZoneSettingsX="0 0,005 " />
</JMF>
Code:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("C:\\test\\test.xml");
XmlNode root = xmlDoc.DocumentElement;
var parent = root.SelectSingleNode("/JMF/Command/ResourceCmdParams/InkZoneProfile/InkZoneProfile/InkZoneProfile/InkZoneProfile");
XmlElement IZP = xmlDoc.CreateElement("InkZoneProfile");
IZP.SetAttribute("Separation", x.colorname);
IZP.SetAttribute("ZoneSettingsX", x.colorvalues);
xmlDoc.DocumentElement.AppendChild(IZP);
xmlDoc.Save("C:\\test\\test.xml");
The var parent comes back null. I've debugged, and both root and xmlDoc show the XML content in their inner text.
But a test made here (by user @har07, on the previous question "SelectSingleNode returns null even with namespace managing") worked without problems:
https://dotnetfiddle.net/vJ8h9S
What is the difference between the two? They basically follow the same code, but one works and the other doesn't.
When debugging I found that root.InnerXml has the contents loaded (same as xmlDoc.InnerXml). But InnerXml doesn't provide a SelectSingleNode method, and I believe that if I save it to a string I'll probably lose indentation etc.
Can someone tell me what the difference is or what is wrong? Thanks!
XML Sample: https://drive.google.com/file/d/0BwU9_GrFRYrTUFhMYWk5blhhZWM/view?usp=sharing

SetAttribute doesn't automatically escape the string for you, so it makes your XML file invalid.
From MSDN, about XmlElement.SetAttribute:
Any markup, such as syntax to be recognized as an entity reference, is treated as literal text and needs to be properly escaped by the implementation when it is written out
Find every line in your code that contains SetAttribute and use SecurityElement.Escape to escape the value.
For example, change these lines:
IZP.SetAttribute("Separation", x.colorname);
IZP.SetAttribute("ZoneSettingsX", x.colorvalues);
To:
using System.Security;
IZP.SetAttribute("Separation", SecurityElement.Escape(x.colorname));
IZP.SetAttribute("ZoneSettingsX", SecurityElement.Escape(x.colorvalues));
If an attribute name contains any of <>"'& you also have to escape it like the value.
Note:
You have to delete the XML files you created with the old code, because they are invalid and will cause an exception when loaded.
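One more thing that may be worth checking, independent of escaping: the query in the question uses ResourceCmdParams, while the sample XML has ResourceCMDParams, and XPath name tests are case-sensitive. The sketch below shows that effect on a cut-down document. As far as I know, XmlDocument also escapes attribute values itself when it serializes them, which the second method illustrates. XPathCaseDemo, Matches and SerializeAttribute are names invented here.

```csharp
using System;
using System.Xml;

public static class XPathCaseDemo
{
    // Cut-down version of the JMF sample from the question.
    public const string SampleXml =
        "<JMF><Command><ResourceCMDParams>" +
        "<InkZoneProfile ID=\"r0013\" />" +
        "</ResourceCMDParams></Command></JMF>";

    // True if the XPath selects at least one node in the document.
    public static bool Matches(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectSingleNode(xpath) != null;
    }

    // Sets an attribute with the raw value and returns the serialized element.
    public static string SerializeAttribute(string value)
    {
        var doc = new XmlDocument();
        XmlElement element = doc.CreateElement("InkZoneProfile");
        element.SetAttribute("Separation", value);
        return element.OuterXml;
    }

    public static void Main()
    {
        // "Cmd" vs "CMD": the first query finds nothing, the second finds the node.
        Console.WriteLine(Matches(SampleXml, "/JMF/Command/ResourceCmdParams/InkZoneProfile")); // prints False
        Console.WriteLine(Matches(SampleXml, "/JMF/Command/ResourceCMDParams/InkZoneProfile")); // prints True

        // The ampersand is written out as &amp; without any manual escaping.
        Console.WriteLine(SerializeAttribute("a & b"));
    }
}
```

If that holds for the real file too, passing values through SecurityElement.Escape first would escape them twice, so it seems worth confirming which of the two issues is actually in play.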

Pivotviewer's .cxml parsing

I'm trying to do very simple operations on a .cxml file. As you know it's basically an .xml file. This is a sample file I created to test the application:
<?xml version="1.0" encoding="utf-8"?>
<Collection xmlns:p="http://schemas.microsoft.com/livelabs/pivot/collection/2009" SchemaVersion="1.0" Name="Actresses" xmlns="http://schemas.microsoft.com/collection/metadata/2009">
<FacetCategories>
<FacetCategory Name="Nationality" Type="LongString" p:IsFilterVisible="true" p:IsWordWheelVisible="true" p:IsMetaDataVisible="true" />
</FacetCategories>
<!-- Other entries-->
<Items ImgBase="Actresses_files\go144bwo.0ao.xml" HrefBase="http://www.imdb.com/name/">
<Item Id="2" Img="#2" Name="Anna Karina" Href="nm0439344/">
<Description> She is a nice girl</Description>
<Facets>
<Facet Name="Nationality">
<LongString Value="Danish" />
</Facet>
</Facets>
</Item>
</Items>
<!-- Other entries-->
</Collection>
I can't get any functioning simple code like:
XDocument document = XDocument.Parse(e.Result);
foreach (XElement x in document.Descendants("Item"))
{
...
}
The same test on a generic XML file works, and the cxml file is loaded into document correctly.
While watching the expression:
document.Descendants("Item"), results
the answer is:
Empty "Enumeration yielded no results" string
Any hint on what the error could be? I've also had a quick look at the Descendants of Facet, Facets, etc., but the enumeration has no results. This doesn't happen with the generic XML file I used for testing; it's a problem I only have with .cxml.

Basically, your XML defines a default namespace with the xmlns="http://schemas.microsoft.com/collection/metadata/2009" attribute.
That means you need to fully qualify the names in your Descendants query, e.g.:
XDocument document = XDocument.Parse(e.Result);
foreach (XElement x in document.Descendants("{http://schemas.microsoft.com/collection/metadata/2009}Item"))
{
...
}
If you remove the default namespace from the XML your code actually works as-is, but that is not the aim of the exercise.
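An equivalent way to write the qualified name is with XNamespace, which some find easier to read than the {uri}Name string. A minimal sketch on a cut-down collection follows; CxmlDemo and ItemNames are names made up for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CxmlDemo
{
    // Cut-down CXML sample with the same default namespace as in the question.
    public const string SampleXml =
        "<Collection xmlns=\"http://schemas.microsoft.com/collection/metadata/2009\" Name=\"Actresses\">" +
        "<Items><Item Id=\"2\" Name=\"Anna Karina\" /></Items>" +
        "</Collection>";

    // Returns the Name attribute of every Item element in the collection.
    public static string[] ItemNames(string cxml)
    {
        XNamespace ns = "http://schemas.microsoft.com/collection/metadata/2009";
        return XDocument.Parse(cxml)
            .Descendants(ns + "Item")                  // element name qualified with the namespace
            .Select(x => (string)x.Attribute("Name"))  // unprefixed attributes are in no namespace
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ItemNames(SampleXml))); // prints "Anna Karina"
    }
}
```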

See the Metadata.CXML project at http://github.com/Zoomicon/Metadata.CXML for source code that does LINQ-based parsing of CXML files.
Also see the ClipFlair.Metadata project at http://github.com/Zoomicon/ClipFlair.Metadata for parsing one's own custom CXML facets too.
BTW, at http://ClipFlair.codeplex.com you can check out the ClipFlair.Gallery project to see how to author ASP.NET web forms that edit metadata fragments (parts of CXML files) and merge them into a single file (which you then periodically convert to DeepZoom CXML with the PAuthor tool from http://pauthor.codeplex.com).
If anyone is interested in nesting (hierarchy) of CXML collections, see
http://github.com/Zoomicon/Trafilm.Metadata
and
http://github.com/Zoomicon/Trafilm.Gallery
