Get XML from XPathDocument - c#

I am working on a stylesheet and have some initial XML. However, the XML is manipulated a bit before styling, and I would like to get the final XML that is sent into .Transform(). For instance, ...
XslCompiledTransform.Transform( xpd, xslArg, output )
...I would like to get the XML content of xpd (as a string), so I can work on the stylesheet in other tools.
Is there a quick-and-dirty way to get this? Either in the VS2010 immediate window or as a quick C# line or two before the call to .Transform()?
EDIT: The .Transform() I'm using is
public void Transform(IXPathNavigable input,
XsltArgumentList arguments, TextWriter results);
...and xpd is an XPathDocument.

Edit: I misunderstood the intent of your question. The simple answer is to get the XML for any IXPathNavigable (which includes XPathDocument), you can do this:
string xml = xpd.CreateNavigator().OuterXml;
Below is my original answer, which explains how you could modify the XML from an XPathDocument in code before feeding it into a transform:
If xpd is an XPathDocument, you might be able to just get an XPathNavigator from the XPathDocument:
XPathNavigator xpn = xpd.CreateNavigator();
and use that to modify the XML. When you're done modifying it, you can just pass either xpn or xpd into the Transform() method. On the other hand, MSDN says that XPathDocument's CreateNavigator() creates a read-only navigator, so that may be a bit of a hitch.
If it really is read-only, you should be able to do this:
XmlDocument doc = new XmlDocument();
doc.LoadXml(xpd.CreateNavigator().OuterXml);
then use doc to modify the XML and pass doc into the transform when you're done.
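A minimal sketch of both steps, assuming a small stand-in document in place of the asker's real xpd (the class name, helper name, and sample XML are made up for illustration):

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

static class XPathDocumentDump
{
    // Returns the markup of any IXPathNavigable (XPathDocument, XmlDocument, ...).
    public static string ToXmlString(IXPathNavigable source)
    {
        return source.CreateNavigator().OuterXml;
    }

    static void Main()
    {
        // Stand-in input; in the question, xpd already exists before Transform().
        var xpd = new XPathDocument(
            new StringReader("<root><item id=\"1\">text</item></root>"));

        // Step 1: dump the XML that would be handed to Transform().
        string xml = ToXmlString(xpd);
        Console.WriteLine(xml);

        // Step 2: load an editable copy, because the XPathDocument navigator is read-only.
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        doc.DocumentElement.SetAttribute("checked", "true");
        Console.WriteLine(doc.OuterXml);
    }
}
```

The expression xpd.CreateNavigator().OuterXml should also be usable as-is in the Immediate window while stopped on the Transform() call.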

Related

C# Xml Encoding

I'm freaking out with C# and XmlDocuments right now.
I need to parse XML data into another XML but I can't get special characters to work.
I'm working with XmlDocument and XmlNode.
What I tried so far:
- XmlDocument.CreateXmlDeclaration("1.0", "UTF-8", "yes");
- XmlTextWriter writer = new XmlTextWriter(outputDir + "systems.xml", Encoding.UTF8);
What I know for sure:
- The input XML is also UTF-8
- The "InnerText" value is encoded without replacing the characters
Here is some code (not all... way too much code):
XmlDocument newXml = new XmlDocument();
newXml = (XmlDocument)systemsTemplate.Clone();
newXml.CreateXmlDeclaration("1.0", "UTF-8", "yes");
newXml.SelectSingleNode("systems").RemoveAll();
foreach(XmlNode categories in exSystems.SelectNodes("root/Content/Systems/SystemLine"))
{
XmlNode categorieSystemNode = systemsTemplate.SelectSingleNode("systems/system").Clone();
categorieSystemNode.RemoveAll();
XmlNode importIdNode = systemsTemplate.SelectSingleNode("systems/system/import_id").Clone();
string import_id = categories.Attributes["nodeName"].Value;
importIdNode.InnerText = import_id;
categorieSystemNode.AppendChild(importIdNode);
[way more Nodes which I proceed like this]
}
newXml.SelectSingleNode("systems").AppendChild(newXml.ImportNode(categorieSystemNode, true));
XmlTextWriter writer = new XmlTextWriter(outputDir + "systems.xml", Encoding.UTF8);
writer.Formatting = Formatting.Indented;
newXml.Save(writer);
writer.Flush();
writer.Close();
But what I get is this as an example:
<intro>&lt;p&gt;Whether your project [...] &lt;/p&gt;</intro>
Instead of this:
<intro><p>Whether your project [...] </p></intro>
I do have other non-html tags in the XML so please don't provide HTML-parsing solutions :/
I know I could replace the characters with String.Replace() but that's dirty and unsafe (and slow with around 20K lines).
I hope there is a simpler way of doing this.
Kind regards,
Eriwas
The main purpose of XmlDocument is to provide an easy way to work with XML documents while making sure the outcome is a well-formed document.
So, when you use InnerText as in your example, you let the framework escape the string and insert it into the document properly. Whenever you read that same value back, it will be unescaped and returned to you exactly as your original string.
But if you really want to add an XML fragment, you should stick with InnerXml or ImportNode. Be aware that this can lead to a more complex document structure, which you would probably like to avoid.
As a third possibility, you can use CreateCDataSection to add a CDATA section and put your text there.
You should definitely stay away from treating that XML document as a string and running Replace on it; stick with the framework and you'll be OK.
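The three options can be compared side by side. This is only a sketch: the class name, method names, and the `<intro>` sample are assumptions made for illustration, not the asker's real template.

```csharp
using System;
using System.Xml;

static class TextVersusMarkup
{
    static XmlDocument NewIntro()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<intro/>");
        return doc;
    }

    // InnerText stores the string as character data; markup characters are escaped on output.
    public static string WithInnerText(string content)
    {
        XmlDocument doc = NewIntro();
        doc.DocumentElement.InnerText = content;
        return doc.OuterXml;
    }

    // InnerXml parses the string as an XML fragment, so <p> becomes a real child element.
    // It throws XmlException if the fragment is not well-formed.
    public static string WithInnerXml(string content)
    {
        XmlDocument doc = NewIntro();
        doc.DocumentElement.InnerXml = content;
        return doc.OuterXml;
    }

    // A CDATA section keeps the string verbatim but still treats it as text, not elements.
    public static string WithCData(string content)
    {
        XmlDocument doc = NewIntro();
        doc.DocumentElement.AppendChild(doc.CreateCDataSection(content));
        return doc.OuterXml;
    }

    static void Main()
    {
        const string fragment = "<p>Whether your project</p>";
        Console.WriteLine(WithInnerText(fragment)); // escaped text
        Console.WriteLine(WithInnerXml(fragment));  // nested <p> element
        Console.WriteLine(WithCData(fragment));     // <![CDATA[...]]> wrapper
    }
}
```

Reading InnerText back from the first and third variants should return the original string unchanged; only the second variant changes the element structure.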

how to change xml string value on fly

I have an xml response string and I want to change a value inside and log it.
<xml>
<ns2:abcd>
<password>sample</password>
</ns2:abcd>
I want to change the password value into encrypted version.
I have tried using XmlDocument.SelectSingleNode, but I was wondering whether there is a better way than this.
By the way, you need the ns2 namespace to be declared; otherwise your XML will not be valid. After adding the namespace declaration, you can parse and modify your XML with LINQ to XML:
XDocument xdoc = XDocument.Parse(xml);
var passwordElement = xdoc.XPathSelectElement("//password");
passwordElement.Value = Encrypt((string)passwordElement);
xdoc.Save(path_to_xml);
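A self-contained variant of the snippet above, as a sketch only. The `<response>` root, the `urn:example:ns2` namespace URI, and the `Mask` helper are assumptions; `Mask` merely stands in for whatever Encrypt really does and is not encryption.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath; // brings in the XPathSelectElement extension method

static class PasswordMasker
{
    // Placeholder for the real Encrypt(): replaces every character with '*'.
    static string Mask(string value)
    {
        return new string('*', value.Length);
    }

    public static string MaskPassword(string xml)
    {
        XDocument xdoc = XDocument.Parse(xml);

        // <password> carries no prefix and there is no default namespace,
        // so a plain, unprefixed XPath step matches it.
        XElement password = xdoc.XPathSelectElement("//password");
        if (password != null)
        {
            password.Value = Mask(password.Value);
        }
        return xdoc.ToString(SaveOptions.DisableFormatting);
    }

    static void Main()
    {
        string xml =
            "<response xmlns:ns2=\"urn:example:ns2\">" +
            "<ns2:abcd><password>sample</password></ns2:abcd>" +
            "</response>";
        Console.WriteLine(MaskPassword(xml)); // safe to log: the password is masked
    }
}
```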
No - there is no better way than using proper XML classes.
XmlDocument or XDocument would be perfectly fine for this task. If your XML is very large, you may want to look into streaming with XmlReader (unlikely to be necessary in your case).
You might also consider looking into xsd.exe. With xsd.exe, you can deserialize your xml into a type-safe object model. From there, it's easy to manipulate the data.

XML namespaces and XPath

I have an application that has to load XML document and output nodes depending on XPath.
Suppose I start with a document like this:
<aaa>
...[many nodes here]...
<bbb>text</bbb>
...[many nodes here]...
<bbb>text</bbb>
...[many nodes here]...
</aaa>
With XPath //bbb
So far everything is nice.
And selection doc.SelectNodes("//bbb"); returns the list of required nodes.
Then someone uploads a document with one node like <myfancynamespace:foo/> and extra namespace in the root tag, and everything breaks.
Why? //bbb does not give a damn about myfancynamespace, theoretically it should even be good with //myfancynamespace:foo, as there is no ambiguity, but the expression returns 0 results and that's it.
Is there a workaround for this behavior?
I do have a namespace manager for the document, and I am passing it to the Xpath query. But the namespaces and the prefixes are unknown to me, so I can't add them before the query.
Do I have to pre-parse the document to fill the namespace manager before I do any selections? Why on earth such behavior, it just doesn't make sense.
EDIT:
I'm using:
XmlDocument and XmlNamespaceManager
EDIT2:
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null;
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
//I wish I could:
//nsmgr.AddNamespace("magic", "http://magicnamespaceuri/");
//...
doc.LoadXml(usersuppliedxml);
XmlNodeList nodes = doc.SelectNodes(usersuppliedxpath, nsmgr);//usersuppliedxpath -> "//bbb"
//nodes.Count should be > 0, but with namespaced document they are 0
EDIT3:
Found an article which describes the actual scenario of the issue with one workaround, but not very pretty workaround: http://codeclimber.net.nz/archive/2008/01/09/How-to-query-a-XPath-doc-that-has-a-default.aspx
Almost seems that stripping the xmlns is the way to go...
You're missing the whole point of XML namespaces.
But if you really need to perform XPath on documents that will use an unknown namespace, and you really don't care about it, you will need to strip it out and reload the document. XPath will not work in a namespace-agnostic way, unless you want to use the local-name() function at every point in your selectors.
private XmlDocument StripNamespace(XmlDocument doc)
{
if (doc.DocumentElement.NamespaceURI.Length > 0)
{
doc.DocumentElement.SetAttribute("xmlns", "");
// must serialize and reload for this to take effect
XmlDocument newDoc = new XmlDocument();
newDoc.LoadXml(doc.OuterXml);
return newDoc;
}
else
{
return doc;
}
}
<myfancynamespace:foo/> is not necessarily the same as <foo/>.
Namespaces do matter. But I can understand your frustration, as they tend to break code because various implementations (C#, Java, ...) output them differently.
I suggest you change your XPath to allow for accepting all namespaces. For example instead of
//bbb
Define it as
//*[local-name()='bbb']
That should take care of it.
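A short sketch of the difference, assuming the uploaded document declares a default namespace on its root (a common reason for //bbb to stop matching). The sample XML, URIs, and class name are made up:

```csharp
using System;
using System.Xml;

static class NamespaceAgnosticQuery
{
    // Counts elements by local name, ignoring whatever namespace they are in.
    // Naive string concatenation: localName is assumed to contain no quotes.
    public static int CountByLocalName(string xml, string localName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes("//*[local-name()='" + localName + "']").Count;
    }

    // Counts elements with the plain, unprefixed XPath name test.
    public static int CountByPlainName(string xml, string name)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes("//" + name).Count;
    }

    static void Main()
    {
        string xml =
            "<aaa xmlns=\"urn:example:default\" xmlns:f=\"urn:example:fancy\">" +
            "<bbb>text</bbb><f:foo/><bbb>text</bbb></aaa>";

        // In XPath 1.0 an unprefixed name means "no namespace",
        // so it misses <bbb> once a default namespace is in scope.
        Console.WriteLine(CountByPlainName(xml, "bbb"));  // prints 0
        Console.WriteLine(CountByLocalName(xml, "bbb"));  // prints 2
        Console.WriteLine(CountByLocalName(xml, "foo"));  // prints 1
    }
}
```

The trade-off is that local-name() also matches a bbb from an unrelated namespace, which is exactly the ambiguity namespaces exist to prevent.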
You should describe what you want to do in a bit more detail. The way you ask your question makes no sense at all. The namespace is just a part of the name. Nothing more, nothing less. So your question is the same as asking for an XPath query to get all tags ending with "x". That's not the idea behind XML, but if you have strange reasons to do so, feel free to iterate over all nodes and implement it yourself. The same applies to the functionality you are requesting.
You could use the LINQ XML classes like XDocument. They greatly simplify working with namespaces.
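With LINQ to XML the namespace is part of each element's XName, so both styles of lookup are short. A sketch, with made-up sample data and helper names:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class LinqNamespaceDemo
{
    // Namespace-agnostic: compare only the local part of the name.
    public static int CountLocal(string xml, string localName)
    {
        return XDocument.Parse(xml)
            .Descendants()
            .Count(e => e.Name.LocalName == localName);
    }

    // Namespace-aware: match the fully qualified name when the URI is known.
    public static int CountQualified(string xml, string namespaceUri, string localName)
    {
        return XDocument.Parse(xml)
            .Descendants(XName.Get(localName, namespaceUri))
            .Count();
    }

    static void Main()
    {
        string xml =
            "<aaa xmlns=\"urn:example:default\" xmlns:f=\"urn:example:fancy\">" +
            "<bbb>text</bbb><f:foo/><bbb>text</bbb></aaa>";

        Console.WriteLine(CountLocal(xml, "bbb"));                           // prints 2
        Console.WriteLine(CountQualified(xml, "urn:example:fancy", "foo"));  // prints 1
    }
}
```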

C#: Line information when parsing XML with XmlDocument

What are my options for parsing an XML file with XmlDocument and still retain line information for error messages later on? (as an aside, is it possible to do the same thing with XML Deserialisation?)
Options seem to include:
Extending the DOM and using IXmlLineInfo
Using XPathDocument
The only other option I know of is XDocument.Load(), whose overloads accept LoadOptions.SetLineInfo. This would be consumed in much the same way as an XmlDocument.
Example
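A minimal sketch of the LoadOptions.SetLineInfo route. It uses XDocument.Parse on an inline string so the sample is self-contained; XDocument.Load takes the same option. The class and method names are made up:

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class LineInfoDemo
{
    // Returns the 1-based line number of the first element with the given name,
    // or -1 when the element is missing or carries no line information.
    public static int LineOf(string xml, string elementName)
    {
        XDocument doc = XDocument.Parse(xml, LoadOptions.SetLineInfo);
        XElement element = doc.Descendants(elementName).FirstOrDefault();
        if (element == null)
        {
            return -1;
        }

        // XObject implements IXmlLineInfo explicitly, so go through the interface.
        IXmlLineInfo info = element;
        return info.HasLineInfo() ? info.LineNumber : -1;
    }

    static void Main()
    {
        string xml = "<root>\n  <child />\n</root>";
        Console.WriteLine(LineOf(xml, "child")); // prints 2
    }
}
```

Without LoadOptions.SetLineInfo, HasLineInfo() returns false, which is why the helper checks it before reading LineNumber.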
(Expanding answer from @Andy's comment)
There is no built in way to do this using XmlDocument (if you are using XDocument, you can use the XDocument.Load() overload which accepts LoadOptions.SetLineInfo - see this question).
While there's no built-in way, you can use the PositionXmlDocument wrapper class from here (from the SharpDevelop project):
https://github.com/icsharpcode/WpfDesigner/blob/5a994b0ff55b9e8f5c41c4573a4e970406ed2fcd/WpfDesign.XamlDom/Project/PositionXmlDocument.cs
In order to use it, you will need to use the Load overload that accepts an XmlReader (the other Load overloads will go to the regular XmlDocument class, which will not give you line number information). If you are currently using the XmlDocument.Load overload that accepts a filename, you will need to change your code as follows:
using (var reader = new XmlTextReader(filename))
{
var doc = new PositionXmlDocument();
doc.Load(reader);
}
Now, you should be able to cast any XmlNode from this document to a PositionXmlElement to retrieve line number and column:
var node = doc.ChildNodes[1];
var elem = (PositionXmlElement) node;
Console.WriteLine("Line: {0}, Position: {1}", elem.LineNumber, elem.LinePosition);

Best way to read, modify, and write XML

My plan is to read in an XML document using my C# program, search for particular entries which I'd like to change, and then write out the modified document. However, I've become unstuck because it's hard to differentiate between elements, whether they start or end using XmlTextReader which I'm using to read in the file. I could do with a bit of advice to put me on the right track.
The document is a HTML document, so as you can imagine, it's quite complicated.
I'd like to search for an element id within the HTML document, so for example look for this and change the src;
<img border="0" src="bigpicture.png" width="248" height="36" alt="" id="lookforthis" />
If it's actually valid XML, and will easily fit in memory, I'd choose LINQ to XML (XDocument, XElement etc) every time. It's by far the nicest XML API I've used. It's easy to form queries, and easy to construct new elements too.
You can use XPath where that's appropriate, or the built-in axis methods (Elements(), Descendants(), Attributes() etc). If you could let us know what specific bits you're having a hard time with, I'd be happy to help work out how to express them in LINQ to XML.
If, on the other hand, this is HTML which isn't valid XML, you'll have a much harder time, because XML APIs generally expect to work with valid XML documents. You could use HTMLTidy first, of course, but that may have undesirable effects.
For your specific example:
XDocument doc = XDocument.Load("file.xml");
foreach (var img in doc.Descendants("img"))
{
// src will be null if the attribute is missing
string src = (string) img.Attribute("src");
img.SetAttributeValue("src", src + "with-changes");
}
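To target only the element from the question (id="lookforthis") rather than every img, the same approach can filter on the id attribute. A sketch, assuming the markup is well-formed XML with no default namespace; the helper name and sample markup are made up:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ImageSourceEditor
{
    // Sets src on the first <img> whose id matches; leaves the document alone otherwise.
    public static string ReplaceSrc(string xhtml, string id, string newSrc)
    {
        XDocument doc = XDocument.Parse(xhtml);

        // The (string) cast yields null when the attribute is missing,
        // so elements without an id are simply skipped.
        XElement img = doc.Descendants("img")
            .FirstOrDefault(e => (string)e.Attribute("id") == id);

        if (img != null)
        {
            img.SetAttributeValue("src", newSrc);
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    static void Main()
    {
        string page =
            "<html><body>" +
            "<img border=\"0\" src=\"bigpicture.png\" alt=\"\" id=\"lookforthis\" />" +
            "</body></html>";
        Console.WriteLine(ReplaceSrc(page, "lookforthis", "smallpicture.png"));
    }
}
```

If the page is XHTML with an xmlns declaration on the root, Descendants("img") would need the namespace as well, for example via an XNamespace added in front of the name.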
Are the documents you are processing relatively small? If so, you could load them into memory using an XmlDocument object, modify it, and write the changes back out.
XmlDocument doc = new XmlDocument();
doc.Load("path_to_input_file");
// Make changes to the document.
using(XmlTextWriter xtw = new XmlTextWriter("path_to_output_file", Encoding.UTF8)) {
xtw.Formatting = Formatting.Indented; // optional, if you want it to look nice
doc.WriteContentTo(xtw);
}
Depending on the structure of the input XML, this could make your parsing code a bit simpler.
Here's a tool I wrote to modify an IAR EWARM project (ewp) file, adding a linker define to the project. From the command line, you run it with 2 arguments, the input and output file names (*.ewp).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
namespace ewp_tool
{
class Program
{
static void Main(string[] args)
{
XmlDocument doc = new XmlDocument();
doc.Load(args[0]);
XmlNodeList list = doc.SelectNodes("/project/configuration[name='Debug']/settings[name='ILINK']/data/option[name='IlinkConfigDefines']/state");
foreach(XmlElement x in list) {
x.InnerText = "MAIN_APP=1";
}
using (XmlTextWriter xtw = new XmlTextWriter(args[1], Encoding.UTF8))
{
//xtw.Formatting = Formatting.Indented; // leave this out, it breaks EWP!
doc.WriteContentTo(xtw);
}
}
}
}
The structure of the XML looks like this
<?xml version="1.0" encoding="iso-8859-1"?>
<project>
<fileVersion>2</fileVersion>
<configuration>
<name>Debug</name>
<toolchain>
<name>ARM</name>
</toolchain>
<debug>1</debug>
...
<settings>
<name>ILINK</name>
<archiveVersion>0</archiveVersion>
<data>
...
<option>
<name>IlinkConfigDefines</name>
<state>MAIN_APP=0</state>
</option>
If you have smaller documents that fit in the computer's memory, you can use XmlDocument.
Otherwise you can use XmlReader to iterate through the document.
Using XmlReader, you can find out each node's type using:
while (xml.Read()) {
switch (xml.NodeType) {
case XmlNodeType.Element:
// Start tag (or an empty element such as <img />)
break;
case XmlNodeType.Text:
// Text content
break;
case XmlNodeType.EndElement:
// End tag
break;
}
}
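A small, self-contained sketch of that loop applied to the question's example: it streams through the input and returns the src of the element with a given id. The class name, method name, and sample markup are assumptions:

```csharp
using System;
using System.IO;
using System.Xml;

static class StreamingScan
{
    // Returns the src attribute of the first element whose id matches, or null.
    public static string FindSrcById(TextReader input, string id)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                // Start tags and empty elements (<img ... />) both arrive as Element;
                // only non-empty elements are later followed by an EndElement node.
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.GetAttribute("id") == id)
                {
                    return reader.GetAttribute("src");
                }
            }
        }
        return null;
    }

    static void Main()
    {
        string page =
            "<html><body><p>intro</p>" +
            "<img src=\"bigpicture.png\" id=\"lookforthis\" />" +
            "</body></html>";
        Console.WriteLine(FindSrcById(new StringReader(page), "lookforthis"));
        // prints bigpicture.png
    }
}
```

XmlReader is read-only and forward-only, so modifying the document this way means copying every node to an XmlWriter; for documents that fit in memory the DOM or LINQ approaches are usually less work.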
For the task in hand - (read existing doc, write, and modify in a formalised way) I'd go with XPathDocument run through an XslCompiledTransform.
Where you can't formalise, don't have pre-existing docs or generally need more adaptive logic, I'd go with LINQ and XDocument like Skeet says.
Basically if the task is transformation then XSLT, if the task is manipulation then LINQ.
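For the transformation route, a common pattern is an identity stylesheet plus one overriding template. The sketch below assumes the input is well-formed XML; the stylesheet, the `newSrc` parameter, and the class name are made up for illustration:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

static class XsltRewrite
{
    // Identity transform that copies everything, except the src attribute
    // of the img with id="lookforthis", which is rewritten from a parameter.
    const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:param name='newSrc'/>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match=""img[@id='lookforthis']/@src"">
    <xsl:attribute name='src'><xsl:value-of select='$newSrc'/></xsl:attribute>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string xml, string newSrc)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(sheet);
        }

        var input = new XPathDocument(new StringReader(xml));
        var args = new XsltArgumentList();
        args.AddParam("newSrc", "", newSrc);

        using (var output = new StringWriter())
        {
            xslt.Transform(input, args, output);
            return output.ToString();
        }
    }

    static void Main()
    {
        string page = "<p><img src=\"bigpicture.png\" id=\"lookforthis\" /></p>";
        Console.WriteLine(Transform(page, "smallpicture.png"));
    }
}
```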
My favorite tool for this kind of thing is HtmlAgilityPack. I use it to parse complex HTML documents into LINQ-queryable collections. It is an extremely useful tool for querying and parsing HTML (which is often not valid XML).
For your problem, the code would look like:
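using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(stringOfHtml);
// SelectNodes returns null when nothing matches
var images = htmlDoc.DocumentNode.SelectNodes("//img[@id='lookforthis']");
if (images != null)
{
foreach (HtmlNode node in images)
{
// SetAttributeValue updates an existing alt attribute instead of appending a second one
node.SetAttributeValue("alt", "added an alt to lookforthis images.");
}
}
htmlDoc.Save("output.html");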
One fairly easy approach would be to create a new XmlDocument, then use the Load() method to populate it. Once you've got the document, you can use CreateNavigator() to get an XPathNavigator object that you can use to find and alter elements in the document. Finally, you can use the Save() method on the XmlDocument to write the changed document back out.
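A sketch of that approach, using LoadXml on an inline string instead of Load() and returning OuterXml instead of calling Save() so the sample is self-contained. The class name, helper name, and sample markup are assumptions:

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

static class NavigatorEdit
{
    // Rewrites the src attribute of the img with the given id, if present.
    // Naive string concatenation: id is assumed to contain no quotes.
    public static string SetSrc(string xml, string id, string newSrc)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Unlike XPathDocument, a navigator created from an XmlDocument is editable.
        XPathNavigator nav = doc.CreateNavigator();
        XPathNavigator src = nav.SelectSingleNode("//img[@id='" + id + "']/@src");
        if (src != null)
        {
            src.SetValue(newSrc);
        }

        // With a real file, doc.Save(outputPath) would go here instead.
        return doc.OuterXml;
    }

    static void Main()
    {
        string page = "<p><img src=\"bigpicture.png\" id=\"lookforthis\" /></p>";
        Console.WriteLine(SetSrc(page, "lookforthis", "smallpicture.png"));
    }
}
```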
Just start by reading the documentation of the Xml namespace on the MSDN. Then if you have more specific questions, post them here...
