I use .NET XML technologies quite extensively at work. One of the things that I like very much is the XSLT engine, more precisely its extensibility. However, there is one little piece that keeps being a source of annoyance. It is nothing major, nothing we can't live with, but it is preventing us from producing the beautiful XML we would like to produce.
One of the things we do is transform nodes inline and import nodes from one XML document into another.
Sadly, when you save nodes to an XmlTextWriter (actually whatever XmlWriter.Create(Stream) returns), the namespace definitions all get thrown in there, regardless of whether they are necessary (not previously defined) or not. You get something like the following XML:
<root xmlns:abx="http://bladibla">
<abx:child id="A">
<grandchild id="B">
<abx:grandgrandchild xmlns:abx="http://bladibla" />
</grandchild>
</abx:child>
</root>
Does anyone have a suggestion as to how to convince .NET to be efficient about its namespace definitions?
PS. As an added bonus I would like to override the default namespace, changing it as I write a node.
Use this code:
using (var writer = XmlWriter.Create("file.xml"))
{
const string Ns = "http://bladibla";
const string Prefix = "abx";
writer.WriteStartDocument();
writer.WriteStartElement("root");
// set root namespace
writer.WriteAttributeString("xmlns", Prefix, null, Ns);
writer.WriteStartElement(Prefix, "child", Ns);
writer.WriteAttributeString("id", "A");
writer.WriteStartElement("grandchild");
writer.WriteAttributeString("id", "B");
writer.WriteElementString(Prefix, "grandgrandchild", Ns, null);
// grandchild
writer.WriteEndElement();
// child
writer.WriteEndElement();
// root
writer.WriteEndElement();
writer.WriteEndDocument();
}
This code produces the desired output:
<?xml version="1.0" encoding="utf-8"?>
<root xmlns:abx="http://bladibla">
<abx:child id="A">
<grandchild id="B">
<abx:grandgrandchild />
</grandchild>
</abx:child>
</root>
Did you try this?
Dim settings = New XmlWriterSettings With {.Indent = True,
.NamespaceHandling = NamespaceHandling.OmitDuplicates,
.OmitXmlDeclaration = True}
Dim s As New MemoryStream
Using writer = XmlWriter.Create(s, settings)
...
End Using
The interesting part is NamespaceHandling.OmitDuplicates.
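For readers working in C#, here is a minimal sketch of the same setting, written as a top-level program. The input string and variable names are made up for illustration. It copies a document that carries a redundant xmlns:abx declaration through XmlWriter.WriteNode with NamespaceHandling.OmitDuplicates set:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Illustrative input: the inner element repeats a declaration already in scope.
string input =
    "<root xmlns:abx=\"http://bladibla\"><abx:child id=\"A\"><grandchild id=\"B\">" +
    "<abx:grandgrandchild xmlns:abx=\"http://bladibla\" /></grandchild></abx:child></root>";

var settings = new XmlWriterSettings
{
    Indent = true,
    OmitXmlDeclaration = true,
    // Drop declarations whose prefix and namespace match one already in scope.
    NamespaceHandling = NamespaceHandling.OmitDuplicates
};

var output = new StringBuilder();
using (XmlReader reader = XmlReader.Create(new StringReader(input)))
using (XmlWriter writer = XmlWriter.Create(output, settings))
{
    // Copy the whole document from the reader to the writer.
    writer.WriteNode(reader, false);
}

string result = output.ToString();
Console.WriteLine(result);
```

A declaration is only treated as a duplicate when both the prefix and the namespace URI match, so rebinding a prefix to a different URI is still written out.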
I'm not sure this is what you're looking for, but you can use this kind of code when you start writing to the Xml stream:
myWriter.WriteAttributeString("xmlns", "abx", null, "http://bladibla");
The XmlWriter should remember it and not rewrite it anymore. It may not be 100% bulletproof, but it works most of the time.
I am looking for a way to beautify incomplete XML documents. In the best case it should handle even large sizes (e.g. 10 MB or maybe 100 MB).
Incomplete means that the documents are truncated at a random position. Until this position the XML has a valid syntax. Beautify means to add line breaks and leading spaces between the tags.
In my case it's needed to analyse aborted streams. Without line breaks and indentation the XML is really hard for a human to read.
I know there are some editors which can beautify incomplete documents, but I want to integrate the beautifier into my own analysis tool.
Unfortunately I didn't find a discussion or solution for that case.
The NuGet package GuiLabs.Language.Xml by Kirill Osenkov (repository XmlParser) seems to be a useful candidate for building my own beautifier, because it's designed to be error tolerant. Unfortunately there is too little documentation to understand how to use this parser.
Example xml:
<?xml encoding="UTF-8"?><X><B><C>aa</C><B/><A.B><X>bb</X></A.B><A p="pp"/><nn:A>cc</nn:A><D><E>eee</
Expected result as string:
<?xml encoding="UTF-8"?>
<X>
<B>
<C>aa</C>
<B/>
<A.B>
<X>bb</X>
</A.B>
<A p="pp"/>
<nn:A>cc</nn:A>
<D>
<E>eee</
The error-ignoring "XML" parser of AngleSharp.Xml can parse your sample, although it adds the missing closing tags. You can then get an XML string representation of the built document and, with the help of the legacy XmlTextReader and XmlTextWriter, which allow you to ignore namespaces, at least indent the markup:
var xml = @"<?xml encoding=""UTF-8""?><X><B><C>aa</C><B/><A.B><X>bb</X></A.B><A p=""pp""/><nn:A>cc</nn:A><D><E>eee</";
var xmlParser = new XmlParser(new XmlParserOptions() { IsSuppressingErrors = true });
var doc = xmlParser.ParseDocument(xml);
Console.WriteLine(doc.ToMarkup());
using (StringReader sr = new StringReader(doc.ToXml()))
{
using (XmlTextReader xr = new XmlTextReader(sr))
{
xr.Namespaces = false;
using (XmlTextWriter xw = new XmlTextWriter(Console.Out))
{
xw.Namespaces = false;
xw.Formatting = Formatting.Indented;
xw.WriteNode(xr, false);
}
}
}
This gives, for example:
<X>
<B>
<C>aa</C>
<B />
<A.B>
<X>bb</X>
</A.B>
<A p="pp" />
<nn:A>cc</nn:A>
<D>
<E>eee</E>
</D>
</B>
</X>
As your text says "Until this position the XML has a valid syntax" and your comment suggests the errors in your sample are just due to sloppiness I think it might also be possible to use WriteNode of an XmlWriter with XmlWriterSettings.Indent set to true on a standard XmlReader, as long as you catch the exception the XmlReader throws:
var xml = @"<?xml version=""1.0""?><root><section><p>Paragraph 1.</p><p>Paragraph 2.";
try
{
using (StringReader sr = new StringReader(xml))
{
using (XmlReader xr = XmlReader.Create(sr))
{
using (XmlWriter xw = XmlWriter.Create(Console.Out, new XmlWriterSettings() { Indent = true }))
{
xw.WriteNode(xr, false);
}
}
}
}
catch (XmlException e)
{
Console.WriteLine();
Console.WriteLine("Malformed input XML: {0}", e.Message);
}
gives
<?xml version="1.0"?>
<root>
<section>
<p>Paragraph 1.</p>
<p>Paragraph 2.</p>
</section>
</root>
Malformed input XML: Unexpected end of file has occurred. The following elements are not closed: p, section, root. Line 1, position 71.
So with WriteNode there is no need to handle every possible Readxxx call and node type and call the corresponding Writexxx on the XmlWriter in your own code.
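Since the question asks for the result as a string, the same idea can be wrapped in a small helper. This is only a sketch under assumptions: IndentTruncatedXml and its out parameter are hypothetical names, and it relies on the behaviour seen in the output above, where disposing the XmlWriter closes any elements that are still open.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: indents as much of a truncated document as the reader
// can parse and reports the parser's error message, if any.
string IndentTruncatedXml(string xml, out string error)
{
    error = null;
    var output = new StringBuilder();
    var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
    try
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            // Throws XmlException when the reader hits the truncation point;
            // everything written before that stays in the StringBuilder.
            writer.WriteNode(reader, false);
        }
    }
    catch (XmlException e)
    {
        error = e.Message;
    }
    return output.ToString();
}

string text = IndentTruncatedXml(
    "<root><section><p>Paragraph 1.</p><p>Paragraph 2.", out string problem);
Console.WriteLine(text);
Console.WriteLine("Stopped with: " + problem);
```

Note that the closing tags at the end of the result are added by the writer, not present in the input, so the error message is the only reliable indicator of where the original stream stopped.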
Does it have to be C#?
In Java, you should be able to pipe the output of a SAX parser into an indenting serializer by connecting a SAXSource to a StreamResult using an identity transformer, and then just make sure that when the SAX parser aborts, you trap the exception and close the output stream tidily.
I think you can probably do the same thing in C# but not quite as conveniently: coupling the events read from an XmlReader and sending the corresponding events to an XmlWriter is a lot more tedious because you have to write code for each separate kind of event.
If you want a C# solution and you're prepared to install Saxon enterprise edition, you can write a simple streaming transformation:
<transform version="3.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="xml" indent="yes"/>
<mode streamable="yes" on-no-match="shallow-copy"/>
</transform>
invoke it from the Saxon API using XsltTransformer with a Serializer as the destination, and again, catch the exception and flush/close the output stream to which the Serializer is writing.
Using Saxon on Java would be overkill because the identity transformer does this "out of the box".
I'm putting this here because I saw a lot of Q&A for XML on StackOverflow while trying to solve my own problems, and figured that once I'd found it, I'd post what I found so when someone else needs some XML help, this might help them.
My goal: To create an XML document that contains the following XML Declaration, Schema & Namespace Information:
<?xml version="1.0" encoding="UTF-8"?>
<abc:abcXML xsi:schemaLocation="urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ase="urn:abcXML:v12">
I'd already done it in Python for a quick prototype using minidom, and it was very simple. I needed to do it in a .NET language though (C#), because that's what the business calls for. I'm quite familiar with C#, but I've always stayed away from processing XML with it because I honestly don't have an in-depth grasp of XML and its guidelines. Today, I had to face my demons.
Here's how I did it:
The first part is simple enough - create a document, and create a DocumentElement for the root (there's a catch here which I get to later):
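XmlDocument xmlDoc = new XmlDocument();
XmlDeclaration xmlDeclaration = xmlDoc.CreateXmlDeclaration("1.0", "UTF-8", null);
XmlElement root = xmlDoc.DocumentElement; // still null here, so the declaration is simply appended
xmlDoc.InsertBefore(xmlDeclaration, root);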
The next part seems simple enough - create an element, give it a prefix, name and URI, then append it to the document. I thought this would work, but it doesn't (this is where the minimal understanding of XML comes into play):
XmlElement abcXML = xmlDoc.CreateElement("ase", "abcXML", "urn:abcXML:r38 http://www.w3.org/2001/XMLSchema-instance");
XmlAttribute xmlAttr = xmlDoc.CreateAttribute("xsi:schemaLocation", "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd");
abcXML.AppendChild(xmlAttr);
xmlDoc.AppendChild(abcXML);
I tried to use doc.LoadXml() and doc.CreateDocumentFragment() and write my own declarations. No - I would get "Unexpected end of file". For those interested in XmlDocumentFragment: https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmldocumentfragment.innerxml?view=netcore-3.1
This Microsoft article about XML Schemas and Namespaces didn't directly help me: https://learn.microsoft.com/en-us/dotnet/standard/data/xml/including-or-importing-xml-schemas
After doing more reading on XML, and going through the documentation for XmlDocument, XmlElement and XmlAttribute, this is the solution:
XmlElement abcXML = xmlDoc.CreateElement("ase", "abcXML", "urn:abcXML:r38");
XmlAttribute xmlAttr = xmlDoc.CreateAttribute("xsi:schemaLocation", "http://www.w3.org/2001/XMLSchema-instance");
xmlAttr.InnerXml = "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd";
abcXML.Attributes.Append(xmlAttr);
xmlDoc.AppendChild(abcXML);
Now you can add the elements to your document like so:
XmlElement header = xmlDoc.CreateElement(string.Empty, "Header", string.Empty);
abcXML.AppendChild(header);
To save the document, I used:
xmlDoc.Save(fileLocation);
I compared my output to the sample I had, and the file contents matched. I provided the output to the client, they uploaded it into the application they were using, and it failed: Row 1, Column 1 - Unexpected Character.
I had a suspicion it was encoding, and I was right. Using xmlDoc.Save(fileLocation) is correct, but it generates a UTF-8 file with the Byte Order Mark (BOM) at Row 1, Column 1. The XML parsing function in the application doesn't expect that, so the process failed. To fix that, I used the following method:
Encoding enc = new UTF8Encoding(false); /* This creates a UTF-8 encoding without the BOM */
using (System.IO.TextWriter tw = new System.IO.StreamWriter(filePath, false, enc))
{
xmlDoc.Save(tw);
}
return true;
I generated the file again, sent it to the client, and it worked first go.
I hope someone finds this to be useful.
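Putting the steps together, here is a minimal self-contained sketch written as a top-level program. It is not the exact code from above: it assumes the root namespace is urn:abcXML:v12 (taken from the target XML), uses the three-argument CreateAttribute overload and the Value property, and saves to a MemoryStream instead of a file so the first bytes can be inspected.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Assumed namespace URIs, taken from the target document in the question.
const string AseNs = "urn:abcXML:v12";
const string XsiNs = "http://www.w3.org/2001/XMLSchema-instance";

var xmlDoc = new XmlDocument();
xmlDoc.AppendChild(xmlDoc.CreateXmlDeclaration("1.0", "UTF-8", null));

// Root element ase:abcXML in the ase namespace.
XmlElement root = xmlDoc.CreateElement("ase", "abcXML", AseNs);

// xsi:schemaLocation lives in the XMLSchema-instance namespace.
XmlAttribute schemaLocation = xmlDoc.CreateAttribute("xsi", "schemaLocation", XsiNs);
schemaLocation.Value = "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd";
root.Attributes.Append(schemaLocation);
xmlDoc.AppendChild(root);

byte[] bytes;
using (var stream = new MemoryStream())
{
    // UTF8Encoding(false) suppresses the byte order mark.
    using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
    {
        xmlDoc.Save(writer);
    }
    bytes = stream.ToArray();
}

string text = Encoding.UTF8.GetString(bytes);
Console.WriteLine(text);
Console.WriteLine("First byte: 0x" + bytes[0].ToString("X2"));
```

With the BOM suppressed, the first byte should be the '<' of the XML declaration rather than 0xEF.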
For complicated namespaces it is simpler to just parse the XML string. I like using LINQ to XML. Your sample XML is wrong: the root element's prefix should be "ase" (not "abc"), to match the declared xmlns:ase.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<ase:abcXML xsi:schemaLocation=\"urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd\"" +
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"" +
" xmlns:ase=\"urn:abcXML:v12\">" +
"</ase:abcXML>";
XDocument doc = XDocument.Parse(xml);
XElement root = doc.Root;
XNamespace nsAse = root.GetNamespaceOfPrefix("ase");
}
}
}
I'm fairly new to XML and very new to using the XmlWriter object. I've been successful in using it to write a well-formed XML file, but after many failed attempts to create the needed header, shown below, I've decided to come here for some insight.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE IDMS-XML SYSTEM "http://eclipseinc.com/dtd/IDMS-XML.dtd">
<IDMS-XML>
....
Here's the beginning of my code (very standard):
using (XmlWriter xmlWriter = XmlWriter.Create("SendXML.xml"))
{
xmlWriter.WriteStartDocument();
....
I've tried using things like xmlWriter.WriteString() to force it in, but that has been unsuccessful for me. Thanks for any and all insight.
You need to be more clear what the “it” you are trying to “force in” is. Do you mean the <!DOCTYPE ...? That is a doctype declaration, and XmlWriter has a built-in method for adding one. To create a SYSTEM doctype try:
xmlWriter.WriteDocType("IDMS-XML", null, "http://eclipseinc.com/dtd/IDMS-XML.dtd", null);
If that is not what you mean you must be more explicit.
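As a quick check, here is a minimal sketch that writes the header into a StringBuilder instead of SendXML.xml, so the result can be printed. The variable names are illustrative, and the empty IDMS-XML root element stands in for the real content.

```csharp
using System;
using System.Text;
using System.Xml;

var sb = new StringBuilder();
var settings = new XmlWriterSettings { Indent = true };

using (XmlWriter xmlWriter = XmlWriter.Create(sb, settings))
{
    xmlWriter.WriteStartDocument();
    // pubid = null and subset = null produce a plain SYSTEM doctype.
    xmlWriter.WriteDocType("IDMS-XML", null, "http://eclipseinc.com/dtd/IDMS-XML.dtd", null);
    xmlWriter.WriteStartElement("IDMS-XML");
    xmlWriter.WriteEndElement();
    xmlWriter.WriteEndDocument();
}

string header = sb.ToString();
Console.WriteLine(header);
```

Because the target here is a StringBuilder, the XML declaration reports utf-16; writing to a file or stream with the default settings gives utf-8 instead.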
To write to the XML file, create the XmlTextWriter and then write a node to make a header. Hope this helps you out a bit.
XmlTextWriter writer = new XmlTextWriter("filename", System.Text.Encoding.UTF8);
writer.WriteStartDocument(true);
// Placeholder names: element names must not contain spaces
writer.WriteStartElement("StartElementName");
CreateNode("NodeName", writer);
writer.WriteEndElement();
writer.WriteEndDocument();
writer.Close();
// Writes one child element whose text content is nodeName
public void CreateNode(string nodeName, XmlTextWriter writer)
{
writer.WriteStartElement("NameHere");
writer.WriteString(nodeName);
writer.WriteEndElement();
}
When writing out an xml document I need to write all self closing tags without any whitespace, for example:
<foo/>
instead of:
<foo />
The reason for this is that a vendor system that I'm interfacing with throws a fit otherwise. In an ideal world the vendor would fix their system, but I don't bet on that happening any time soon. What's the best way to get an XmlWriter to output the self closing tags without the space?
My current scheme is to do something like:
return xml.Replace(" />", "/>");
Obviously this is far from ideal. Is it possible to subclass the XmlWriter for that one operation? Is there a setting as part of the XmlWriterSettings that I've overlooked?
I don't think there is an option to avoid that one space in a self-closing tag. According to MSDN, for XmlTextWriter:
When writing an empty element, an
additional space is added between tag
name and the closing tag, for example
<item />. This provides compatibility
with older browsers.
However, you could write the <elementName></elementName> syntax instead of the unwanted <elementName />. To do that, use the XmlWriter.WriteFullEndElement method, e.g.:
using System.Xml;
..
static void Main(string[] args)
{
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Indent = true;
xmlWriterSettings.IndentChars = ("\t");
xmlWriterSettings.OmitXmlDeclaration = true;
XmlWriter writer = XmlWriter.Create("example.xml", xmlWriterSettings);
writer.WriteStartElement("root");
writer.WriteStartElement("element1");
writer.WriteEndElement();
writer.WriteStartElement("element2");
writer.WriteFullEndElement();
writer.WriteEndElement();
writer.WriteEndDocument();
writer.Close();
}
produces the following XML document:
<root>
<element1 />
<element2></element2>
</root>
Use a different serializer, for example the Saxon serializer, which also runs on .NET. It so happens that the Saxon serializer does what you want.
It's horrible, of course, to choose products based on accidental behaviour that no self-respecting system would require, but you have to accept reality - if you want to trade with idiots, you have to behave like an idiot.
Try this:
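x.WriteStartElement("my-tag");
// myTagValue is a placeholder for the string value of your tag
if (string.IsNullOrEmpty(myTagValue))
{
// The value is empty: write an empty whitespace node instead
x.WriteWhitespace("");
}
else
{
x.WriteString(myTagValue);
}
x.WriteEndElement();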
I'm using the following code to initialise an XmlDocument
XmlDocument moDocument = new XmlDocument();
moDocument.AppendChild(moDocument.CreateXmlDeclaration("1.0", "UTF-8", null));
moDocument.AppendChild(moDocument.CreateElement("kml", "http://www.opengis.net/kml/2.2"));
Later in the process I write some values to it using the following code
using (XmlWriter oWriter = oDocument.DocumentElement.CreateNavigator().AppendChild())
{
oWriter.WriteStartElement("Placemark");
//....
oWriter.WriteEndElement();
oWriter.Flush();
}
This ends up giving me the following xml when I save the document
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark xmlns="">
<!-- -->
</Placemark>
</kml>
How can I get rid of the empty xmlns on the Placemark element?
--EDITED TO SHOW CHANGE TO HOW PLACEMARK WAS BEING WRITTEN--
If I put the namespace in the write of Placemark then none of the elements are added to the document.
I have fixed the issue by creating the document with the following code (no namespace in the document element)
XmlDocument moDocument = new XmlDocument();
moDocument.AppendChild(moDocument.CreateXmlDeclaration("1.0", "UTF-8", null));
moDocument.AppendChild(moDocument.CreateElement("kml"));
And by saving it with the following code to set the namespace before the save
moDocument.DocumentElement.SetAttribute("xmlns", msNamespace);
moDocument.Save(msFilePath);
This is valid as the namespace is only required in the saved XML file.
This is an old post, but just to prevent future bad practice; you should never declare the xmlns namespace in an XML document, so this may be the cause why you get empty nodes since you are doing something the XmlDocument is not supposed to do.
The prefix xmlns is used only to declare namespace bindings and is by
definition bound to the namespace name http://www.w3.org/2000/xmlns/.
It MUST NOT be declared. Other prefixes MUST NOT be bound to this
namespace name, and it MUST NOT be declared as the default namespace.
Element names MUST NOT have the prefix xmlns.
Source: http://www.w3.org/TR/REC-xml-names/#ns-decl
The following code worked for me (source):
XmlSerializer s = new XmlSerializer(objectToSerialize.GetType());
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("","");
s.Serialize(xmlWriter, objectToSerialize, ns);
oWriter.WriteStartElement("Placemark"); should work, because the parent node already has the right namespace.
Did you try:
oWriter.WriteStartElement("kml", "Placemark", "kml");
You needed
oWriter.WriteStartElement("Placemark", "http://www.opengis.net/kml/2.2");
otherwise the Placemark element gets put in the null namespace, which is why the xmlns="" attribute is added when you serialize the XML.
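The same effect can be seen with plain DOM calls. This is a sketch that swaps in XmlDocument.CreateElement for the navigator writer used in the question. BuildKml is a hypothetical helper that creates the child element either in the empty namespace or in the KML namespace.

```csharp
using System;
using System.Xml;

const string KmlNs = "http://www.opengis.net/kml/2.2";

// Hypothetical helper: builds <kml> with one Placemark child and returns the markup.
string BuildKml(bool placemarkInKmlNamespace)
{
    var doc = new XmlDocument();
    XmlElement kml = doc.CreateElement("kml", KmlNs);
    doc.AppendChild(kml);

    // Without a namespace argument the element belongs to the empty namespace,
    // so the serializer has to undo the inherited default with xmlns="".
    XmlElement placemark = placemarkInKmlNamespace
        ? doc.CreateElement("Placemark", KmlNs)
        : doc.CreateElement("Placemark");
    kml.AppendChild(placemark);
    return doc.OuterXml;
}

string withoutNamespace = BuildKml(false);
string withNamespace = BuildKml(true);
Console.WriteLine(withoutNamespace);
Console.WriteLine(withNamespace);
```

Only the first line printed should contain xmlns="". An element that shares its parent's default namespace needs no declaration of its own.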