I'm posting this because I saw a lot of XML Q&A on StackOverflow while trying to solve my own problem, and figured that once I'd found the answer I'd write it up so it might help the next person who needs some XML help.
My goal: To create an XML document that contains the following XML Declaration, Schema & Namespace Information:
<?xml version="1.0" encoding="UTF-8"?>
<abc:abcXML xsi:schemaLocation="urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ase="urn:abcXML:v12">
I'd already done it in Python for a quick prototype using minidom, and it was very simple. I needed to do it in a .NET language though (C#), because that's what the business calls for. I'm quite familiar with C#, but I've always stayed away from processing XML with it because I honestly don't have an in-depth grasp of XML and its guidelines. Today, I had to face my demons.
Here's how I did it:
The first part is simple enough - create a document, and create a DocumentElement for the root (there's a catch here which I get to later):
XmlDocument xmlDoc = new XmlDocument();
XmlDeclaration xmlDeclaration = xmlDoc.CreateXmlDeclaration("1.0", "UTF-8", null);
XmlElement root = xmlDoc.DocumentElement;
xmlDoc.InsertBefore(xmlDeclaration, root);
The next part seems simple enough - create an element, give it a prefix, name and URI, then append it to the document. I thought this would work, but it doesn't (this is where the minimal understanding of XML comes into play):
XmlElement abcXML = xmlDoc.CreateElement("ase", "abcXML", "urn:abcXML:r38 http://www.w3.org/2001/XMLSchema-instance");
XmlAttribute xmlAttr = xmlDoc.CreateAttribute("xsi:schemaLocation", "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd");
abcXML.AppendChild(xmlAttr);
xmlDoc.AppendChild(abcXML);
I tried to use doc.LoadXml() and doc.CreateDocumentFragment() and write my own declarations. No - I would get "Unexpected end of file". For those interested in XmlDocumentFragment: https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmldocumentfragment.innerxml?view=netcore-3.1
This Microsoft article about XML Schemas and Namespaces didn't directly help me: https://learn.microsoft.com/en-us/dotnet/standard/data/xml/including-or-importing-xml-schemas
After doing more reading on XML, and going through the documentation for XmlDocument, XmlElement and XmlAttribute, this is the solution:
XmlElement abcXML = xmlDoc.CreateElement("ase", "abcXML", "urn:abcXML:v12");
XmlAttribute xmlAttr = xmlDoc.CreateAttribute("xsi:schemaLocation", "http://www.w3.org/2001/XMLSchema-instance");
xmlAttr.InnerXml = "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd";
abcXML.Attributes.Append(xmlAttr);
xmlDoc.AppendChild(abcXML);
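For anyone who wants the pieces above in one place, here is a minimal sketch of the same approach. The class and method names (RootElementSketch, Build) are made up for illustration, and I'm assuming the namespace URI is meant to be urn:abcXML:v12 as in the target XML. It uses the three-argument CreateAttribute overload (prefix, local name, namespace URI) and sets Value rather than InnerXml; both should give the same result for plain text.

```csharp
using System;
using System.Xml;

public static class RootElementSketch
{
    private const string XsiNs = "http://www.w3.org/2001/XMLSchema-instance";
    private const string AseNs = "urn:abcXML:v12"; // assumed to match the target XML

    public static XmlDocument Build()
    {
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.AppendChild(xmlDoc.CreateXmlDeclaration("1.0", "UTF-8", null));

        // prefix, local name, namespace URI: the third argument is only the
        // namespace, never the schemaLocation value.
        XmlElement root = xmlDoc.CreateElement("ase", "abcXML", AseNs);

        // xsi:schemaLocation belongs to the XMLSchema-instance namespace;
        // its value is the "namespace schema-url" pair.
        XmlAttribute schemaLocation = xmlDoc.CreateAttribute("xsi", "schemaLocation", XsiNs);
        schemaLocation.Value = AseNs + " http://www.test.com/XML/schemas/v12/abcXML_v12.xsd";
        root.Attributes.Append(schemaLocation);

        xmlDoc.AppendChild(root);
        return xmlDoc;
    }

    public static void Main()
    {
        // The serializer adds the xmlns:xsi and xmlns:ase declarations itself.
        Console.WriteLine(Build().OuterXml);
    }
}
```

The key point is that the namespace URI and the schemaLocation value are two separate things: one goes into CreateElement / CreateAttribute, the other into the attribute's value.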
Now you can add the elements to your document like so:
XmlElement header = xmlDoc.CreateElement(string.Empty, "Header", string.Empty);
abcXML.AppendChild(header);
To save the document, I used:
xmlDoc.Save(fileLocation);
I compared my output to the sample I had, and the file contents matched. I provided the output to the client, they uploaded it into the application they were using, and it failed: Row 1, Column 1 - Unexpected Character.
I had a suspicion it was encoding, and I was right. Using xmlDoc.Save(fileLocation) is correct, but it generates a UTF-8 file with the Byte Order Mark (BOM) at Row 1, Column 1. The XML parsing function in the application doesn't expect that, so the process failed. To fix that, I used the following method:
Encoding enc = new UTF8Encoding(false); /* This creates a UTF-8 encoding without the BOM */
using (System.IO.TextWriter tw = new System.IO.StreamWriter(filePath, false, enc))
{
xmlDoc.Save(tw);
}
return true;
I generated the file again, sent it to the client, and it worked first go.
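If you want to check for the BOM before sending a file anywhere, a small sketch like the following might help. It serializes to memory instead of a file and looks at the first three bytes; the names BomCheckSketch, Serialize and StartsWithUtf8Bom are hypothetical helpers, not part of any library.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomCheckSketch
{
    // Serializes the document to memory with the given encoding and returns the raw bytes.
    public static byte[] Serialize(XmlDocument doc, Encoding encoding)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (StreamWriter writer = new StreamWriter(ms, encoding))
            {
                doc.Save(writer);
            }
            // ToArray still works after the stream has been closed.
            return ms.ToArray();
        }
    }

    // True when the bytes begin with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root />");

        // UTF8Encoding(true) emits the BOM, UTF8Encoding(false) does not.
        Console.WriteLine(StartsWithUtf8Bom(Serialize(doc, new UTF8Encoding(true))));  // True
        Console.WriteLine(StartsWithUtf8Bom(Serialize(doc, new UTF8Encoding(false)))); // False
    }
}
```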
I hope someone finds this to be useful.
For complicated namespaces it is simpler to just parse the XML string. I like using LINQ to XML. Your sample XML is wrong: the namespace prefix is "ase" (not "abc").
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<ase:abcXML xsi:schemaLocation=\"urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd\"" +
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"" +
" xmlns:ase=\"urn:abcXML:v12\">" +
"</ase:abcXML>";
XDocument doc = XDocument.Parse(xml);
XElement root = doc.Root;
XNamespace nsAse = root.GetNamespaceOfPrefix("ase");
}
}
}
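If you would rather build the tree than parse a string, something like this sketch should give the same root element with LINQ to XML. The class name LinqRootSketch is made up, and the Header child is only there to mirror the question; the explicit XNamespace.Xmlns attributes are what pin the prefixes to "xsi" and "ase".

```csharp
using System;
using System.Xml.Linq;

public static class LinqRootSketch
{
    public static XDocument Build()
    {
        XNamespace ase = "urn:abcXML:v12";
        XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";

        return new XDocument(
            new XDeclaration("1.0", "UTF-8", null),
            new XElement(ase + "abcXML",
                new XAttribute(xsi + "schemaLocation",
                    "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd"),
                // Explicit declarations so the serializer uses these prefixes.
                new XAttribute(XNamespace.Xmlns + "xsi", xsi.NamespaceName),
                new XAttribute(XNamespace.Xmlns + "ase", ase.NamespaceName),
                // Child element in no namespace, as in the question.
                new XElement("Header")));
    }

    public static void Main()
    {
        // ToString() omits the XML declaration; Save() writes it.
        Console.WriteLine(Build());
    }
}
```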
Related
I'm working on a tool for validating XML files grabbed from a mainframe. For reasons beyond my control every XML file is encoded in ISO 8859-1.
<?xml version="1.0" encoding="ISO 8859-1"?>
My C# application uses the System.Xml library to parse the XML and eventually extract a message string contained within one of the child nodes.
If I manually remove the XML encoding line it works just fine. But I'd like to find a solution that doesn't require manual intervention. Are there any elegant approaches to solving this? Thanks in advance.
The exception that is thrown reads as:
'System.Xml.XmlException' occurred in System.Xml.dll. System does not support 'ISO 8859-1' encoding. Line 1, position 31
My code is
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("//fileLocation");
As Jeroen pointed out in a comment, the encoding should be:
<?xml version="1.0" encoding="ISO-8859-1"?>
not:
<?xml version="1.0" encoding="ISO 8859-1"?>
(missing dash -).
You can use a StreamReader with an explicit encoding to read the file anyway:
using (var reader = new StreamReader("//fileLocation", Encoding.GetEncoding("ISO-8859-1")))
{
var xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
// ...
}
(from answer by competent_tech in other thread I linked in an earlier comment).
If you do not want the using statement, I guess you can do:
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(File.ReadAllText("//fileLocation", Encoding.GetEncoding("ISO-8859-1")));
Instead of XmlDocument, you can use the XDocument class in the namespace System.Xml.Linq if you reference the assembly System.Xml.Linq.dll (since .NET 3.5). It has static methods like Load(Stream) and Parse(string) which you can use as above.
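A rough sketch of the XDocument variant, in case it is useful. The class name Latin1LoadSketch and the sample content are made up; the sample deliberately keeps the malformed "ISO 8859-1" name in its declaration. As far as I understand, when the parser is handed a TextReader it uses the reader's encoding and does not try to resolve the declared one, which is why this gets past the exception, but treat that as an assumption and test it against your own files.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class Latin1LoadSketch
{
    // Made-up sample with the malformed encoding name from the question.
    public const string Sample =
        "<?xml version=\"1.0\" encoding=\"ISO 8859-1\"?><msg>caf\u00E9</msg>";

    // Decodes the file as ISO-8859-1 ourselves and hands the text to the parser.
    public static XDocument LoadLatin1(string path)
    {
        using (var reader = new StreamReader(path, Encoding.GetEncoding("ISO-8859-1")))
        {
            return XDocument.Load(reader);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, Sample, Encoding.GetEncoding("ISO-8859-1"));
            Console.WriteLine(LoadLatin1(path).Root.Value);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```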
I am new to XDocument, and I have been looking around for a solution to this problem but couldn't find a fix.
I need to load a certain kind of XML file (PNML) that comes in this form:
<pnml xmlns="http://www.pnml.org/version-2009/grammar/pnml">
<net id="id" type ="http://www.pnml.org/version-2009/grammar/ptnet">
..........</net> </pnml>
And I can't load this kind of file unless I add "xmlns" as an attribute to the net node.
Meanwhile, the files I create myself have this xmlns attribute, and I can load them without problems.
However, files generated by some other software, which I need to be able to use from my software, don't have this "xmlns" attribute; if I add it to those files myself, I can load them.
Here's the code I am using to Load :
XDocument doc = XDocument.Load(file);
XNamespace ns = @"http://www.pnml.org/version-2009/grammar/pnml";
foreach (XElement element in doc.Element(ns + "pnml")
.Elements("net").Elements("page").Elements("place"))
{
// Do my loading to "place" nodes for example
}
But whenever I try to load a file, it just skips my "foreach" statement, and if I add some line before "foreach" like:
string id= (string) doc.Element(ns + "pnml")
.Element("net").Attribute("id");
it says:
Object reference not set to an instance of an object.
Here's an example of a file generated by my code that can also be read by my code:
<?xml version="1.0" encoding="utf-8"?>
<pnml xmlns="http://www.pnml.org/version-2009/grammar/pnml">
<net id="netid" type="http://www.pnml.org/version-2009/grammar/ptnet" xmlns="">
nodes and information </net> </pnml>
NOTE: I use this code to save my files:
XNamespace ns = @"http://www.pnml.org/version-2009/grammar/pnml";
XDocument doc = new XDocument
(
new XElement(ns+"pnml"
, new XElement("net",new XAttribute("id", net_id), ...));
I found a way to save my files without this "xmlns" attribute, but once I omit it, I can't load them from my code. The first example I wrote is the standard format, and I really need to get rid of the "xmlns" problem.
EDIT: I'm sorry if you got confused; what I want is to be able to load the standard PNML files that don't have this "xmlns" attribute within the "net" node.
What you're missing is that element namespaces are inherited from their parents.
So your XML:
<pnml xmlns="http://www.pnml.org/version-2009/grammar/pnml">
<net id="id" type ="http://www.pnml.org/version-2009/grammar/ptnet">
...
Contains two elements. One is pnml with the namespace http://www.pnml.org/version-2009/grammar/pnml, and the child is net which also has the namespace http://www.pnml.org/version-2009/grammar/pnml.
With this in mind, your query on the existing XML should be:
doc.Element(ns + "pnml").Elements(ns + "net")...
And your code to generate the XML should be:
new XElement(ns + "pnml",
new XElement(ns + "net", new XAttribute("id", net_id), ...));
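To see the inheritance in action, here is a small self-contained sketch. The PNML fragment and the ids in it are made up, and PnmlNamespaceSketch / PlaceIds / UnqualifiedNetCount are hypothetical names. It qualifies every step of the query with the namespace, and also shows that the unqualified name finds nothing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PnmlNamespaceSketch
{
    // Made-up PNML fragment: only the root declares xmlns, the children inherit it.
    public const string Pnml =
        "<pnml xmlns=\"http://www.pnml.org/version-2009/grammar/pnml\">" +
        "<net id=\"n1\" type=\"http://www.pnml.org/version-2009/grammar/ptnet\">" +
        "<page id=\"pg1\"><place id=\"p1\"/><place id=\"p2\"/></page>" +
        "</net></pnml>";

    // Every element name in the path carries the namespace.
    public static string[] PlaceIds(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        XNamespace ns = doc.Root.GetDefaultNamespace();
        return doc.Root
            .Elements(ns + "net")
            .Elements(ns + "page")
            .Elements(ns + "place")
            .Select(p => (string)p.Attribute("id")) // attributes are not in the default namespace
            .ToArray();
    }

    // The unqualified name "net" means "net in no namespace", which matches nothing here.
    public static int UnqualifiedNetCount(string xml)
    {
        return XDocument.Parse(xml).Root.Elements("net").Count();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", PlaceIds(Pnml))); // p1,p2
        Console.WriteLine(UnqualifiedNetCount(Pnml));        // 0
    }
}
```

Note that the id attribute is still looked up without a namespace: unprefixed attributes do not pick up the default namespace.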
Try something like this
var result = doc.Element(ns + "pnml").Descendants().Where(x=>x.Name.LocalName=="net");
I am trying to extract some information using XPath from an XBRL file (eXtensible Business Reporting Language), which is basically just an XML file.
Here is an example file
The file has multiple namespace declarations and these declarations change from file to file, sometimes.
Can you please help to write the XPath to extract the data in the node "dei:EntityRegistrantName", using C#?
I have tried multiple articles on the internet but can't figure this out.
Using this XML Library, I use a simple element get. The library figures out the namespace for me:
XElement root = XElement.Load(file); // or .Parse(string)
var a = root.XPathElement("//dei:EntityRegistrantName");
Console.WriteLine(a.ToString());
The output is (formatted for readability):
<dei:EntityRegistrantName
contextRef="eol_PE8528----1510-K0009_STD_365_20150630_0"
id="id_6568047_FBD9ABEE-63B9-43BD-B87B-EFE7CC59EFB0_1_400001"
xmlns:dei="http://xbrl.sec.gov/dei/2014-01-31">
MICROSOFT CORPORATION
</dei:EntityRegistrantName>
Simply use the query methods available to you with LINQ to XML:
var doc = XDocument.Load(file);
XNamespace dei = "http://xbrl.sec.gov/dei/2014-01-31";
var name = (string)doc.Descendants(dei + "EntityRegistrantName").Single();
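Since the question asked for XPath and mentioned that the namespace declarations change from file to file, here is a sketch that looks up whatever URI the file binds to the "dei" prefix and feeds it to an XmlNamespaceManager. The class name XbrlXPathSketch and the tiny sample document are made up, and it assumes the prefix is declared on the root element (or above the node you start from); if a file uses a different prefix this will return null.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XbrlXPathSketch
{
    // Made-up, heavily reduced stand-in for a real XBRL instance document.
    public const string Sample =
        "<xbrl xmlns=\"http://www.xbrl.org/2003/instance\"" +
        " xmlns:dei=\"http://xbrl.sec.gov/dei/2014-01-31\">" +
        "<dei:EntityRegistrantName contextRef=\"c1\"> MICROSOFT CORPORATION </dei:EntityRegistrantName>" +
        "</xbrl>";

    public static string RegistrantName(XDocument doc)
    {
        // Ask the document which URI it binds to "dei" instead of hard-coding it.
        XNamespace dei = doc.Root.GetNamespaceOfPrefix("dei");
        if (dei == null)
        {
            return null; // prefix not declared on the root
        }

        var manager = new XmlNamespaceManager(new NameTable());
        manager.AddNamespace("dei", dei.NamespaceName);

        XElement element = doc.XPathSelectElement("//dei:EntityRegistrantName", manager);
        return element == null ? null : element.Value.Trim();
    }

    public static void Main()
    {
        Console.WriteLine(RegistrantName(XDocument.Parse(Sample)));
    }
}
```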
Some other questions have asked how to use Xpath to query XML documents with a default namespace. The answer is to use a namespace manager to create an alias for the default namespace, and use that alias in your xpaths.
However, what if you don't know the URI of the default namespace in advance? How do you find it out from the XML document?
var doc = XDocument.Parse(myXml);
XNamespace ns = doc.Root.GetDefaultNamespace();
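Putting the two halves together (discover the default namespace, then alias it for XPath), a sketch could look like this. The alias "d", the class name DefaultNamespaceXPathSketch and the sample document are arbitrary choices of mine, and the code assumes the document actually has a default namespace.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class DefaultNamespaceXPathSketch
{
    public const string Sample =
        "<root xmlns='http://java.sun.com/xml/ns/j2ee'><item>hello</item></root>";

    // xpathWithAlias must use the "d" prefix for elements in the default namespace.
    public static string FirstValue(string xml, string xpathWithAlias)
    {
        XDocument doc = XDocument.Parse(xml);

        // Read the default namespace from the document rather than knowing it up front.
        XNamespace ns = doc.Root.GetDefaultNamespace();

        var manager = new XmlNamespaceManager(new NameTable());
        manager.AddNamespace("d", ns.NamespaceName); // "d" is an arbitrary alias

        XElement hit = doc.XPathSelectElement(xpathWithAlias, manager);
        return hit == null ? null : hit.Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstValue(Sample, "/d:root/d:item")); // hello
    }
}
```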
If you are using XmlDocument, you can get the default namespace by checking NamespaceURI of the root element:
var document = new XmlDocument();
document.LoadXml("<root xmlns='http://java.sun.com/xml/ns/j2ee'></root>");
var defaultNamespace = document.DocumentElement.NamespaceURI;
Assert.IsTrue(defaultNamespace == "http://java.sun.com/xml/ns/j2ee");
You could try using XmlNamespaceManager.DefaultNamespace to get it.
http://msdn.microsoft.com/en-us/library/system.xml.xmlnamespacemanager.defaultnamespace.aspx
I know this is an old topic, but I had the same problem, using the XmlDocument class, as I wanted to know the Default Namespace and a prefixed namespace.
I could get both namespaces using the same Method.
string prefixns = element.GetNamespaceOfPrefix("prefix");
string defaultns = element.GetNamespaceOfPrefix("");
This seems to work for me: it returns both namespaces on an XmlElement.
Edit: This is an XmlNode method, so it should also work on attributes.
The simplest way to do it
XmlDocument xDoc = new XmlDocument();
xDoc.Load(uriPath);
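// NamespaceURI on the document node itself is always empty; read it from the root element.
Console.WriteLine(xDoc.DocumentElement.NamespaceURI);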
I'm opening an existing XML file with C#, and I replace some nodes in there. All works fine. Just after I save it, I get the following characters at the beginning of the file:
 (EF BB BF in HEX)
The whole first line:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The rest of the file looks like a normal XML file.
The simplified code is here:
XmlDocument doc = new XmlDocument();
doc.Load(xmlSourceFile);
XmlNode translation = doc.SelectSingleNode("//trans-unit[@id='127']");
translation.InnerText = "testing";
doc.Save(xmlTranslatedFile);
I'm using a C# Windows Forms application with .NET 4.0.
Any ideas? Why would it do that? Can we disable that somehow? It's for Adobe InCopy, and it does not open it like this.
UPDATE:
Alternative Solution:
Saving it with the XmlTextWriter works too:
XmlTextWriter writer = new XmlTextWriter(inCopyFilename, null);
doc.Save(writer);
It is the UTF-8 BOM, which is actually discouraged by the Unicode standard:
http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature."
You may disable it using:
var sw = new System.IO.StreamWriter(path, false, new System.Text.UTF8Encoding(false));
doc.Save(sw);
sw.Close();
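Another way to get the same effect, sketched here under the assumption that you are free to choose how the document is written, is to go through XmlWriter with XmlWriterSettings.Encoding set to a BOM-less UTF8Encoding. The class and method names (XmlWriterNoBomSketch, SaveWithoutBom) are made up; it writes to any Stream, so you can pass a FileStream in real code.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlWriterNoBomSketch
{
    public static void SaveWithoutBom(XmlDocument doc, Stream output)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // false = do not emit EF BB BF
            Indent = true
        };

        // Disposing the writer flushes it; the underlying stream is left open by default.
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            doc.Save(writer);
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<book><title>Oberon's Legacy</title></book>");

        using (MemoryStream ms = new MemoryStream())
        {
            SaveWithoutBom(doc, ms);
            byte[] bytes = ms.ToArray();
            // The first byte should be '<' (0x3C) rather than 0xEF.
            Console.WriteLine(bytes[0].ToString("X2"));
        }
    }
}
```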
It's a UTF-8 Byte Order Mark (BOM) and is to be expected.
You can try to change the encoding of the XmlDocument. Below is the example copied from MSDN
using System; using System.IO; using System.Xml;
public class Sample {
public static void Main() {
// Create and load the XML document.
XmlDocument doc = new XmlDocument();
string xmlString = "<book><title>Oberon's Legacy</title></book>";
doc.Load(new StringReader(xmlString));
// Create an XML declaration.
XmlDeclaration xmldecl;
xmldecl = doc.CreateXmlDeclaration("1.0",null,null);
xmldecl.Encoding="UTF-16";
xmldecl.Standalone="yes";
// Add the new node to the document.
XmlElement root = doc.DocumentElement;
doc.InsertBefore(xmldecl, root);
// Display the modified XML document
Console.WriteLine(doc.OuterXml);
}
}
As everybody else mentioned, it's a Unicode issue.
I advise you to try LINQ To XML. Although not really related, I mention it as it's super easy compared to old ways and, more importantly, I assume it might have automatic resolutions to issues like these without extra coding from you.