How to ignore leading whitespace in XML file? - c#

I need to load xml from a file into an XmlDocument. The problem is that the file contains some leading whitespace. (I have no control over the system that produces the file.)
Is there any clean/easy way to ignore or strip those characters?
string SamplelRequestFile = @"C:\example.xml";
XmlDocument docXML = new XmlDocument();
XmlTextReader xReader = new XmlTextReader(SamplelRequestFile);
XmlReaderSettings ReaderSettings = new XmlReaderSettings();
ReaderSettings.XmlResolver = null;
ReaderSettings.ProhibitDtd = false;
docXML.Load(xReader);
example.xml (note the leading spaces)
<?xml version="1.0" ?>
<myRoot>
<someElement />
</myRoot>

You'll just have to do something like
using (StreamReader sr = new StreamReader(@"C:\example.xml"))
{
XmlDocument docXML = new XmlDocument();
docXML.LoadXml(sr.ReadToEnd().Trim());
...
}

Here is a sample that works:
string file = @"C:\example.xml";
XmlDocument docXML = new XmlDocument();
using (TextReader x = new StreamReader(file))
{
while (x.Peek() == ' ')
x.Read();
docXML.Load(x);
}

This is not well-formed XML.
According to the XML specification, the XML declaration (<?xml ... ?>), if present, must be the very first thing in the document; nothing, not even whitespace, may precede it.
I suggest you pre-process the file by trimming the content before parsing it.
Workaround:
string content = File.ReadAllText(@"C:\example.xml");
XmlDocument doc = new XmlDocument();
doc.LoadXml(content.Trim());

Create a Stream and a StreamReader on the file yourself, then Peek() and consume characters from the stream as long as you see whitespace. Once you're sure that the next character is <, pass the stream to the XmlTextReader constructor.
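A minimal sketch of that idea, assuming any leading whitespace (not just spaces) should be skipped. The helper name LoadSkippingWhitespace is made up for illustration, XmlReader.Create is used here in place of the XmlTextReader constructor, and a StringReader stands in for the file:

```csharp
using System;
using System.IO;
using System.Xml;

static class LeadingWhitespaceSketch
{
    // Hypothetical helper: consumes leading whitespace, then parses the rest.
    public static XmlDocument LoadSkippingWhitespace(TextReader reader)
    {
        // Peek() returns -1 at end of input, so check that before casting.
        while (reader.Peek() >= 0 && char.IsWhiteSpace((char)reader.Peek()))
        {
            reader.Read();
        }

        var doc = new XmlDocument();
        // The XmlReader continues from the reader's current position.
        using (XmlReader xml = XmlReader.Create(reader))
        {
            doc.Load(xml);
        }
        return doc;
    }

    static void Main()
    {
        // A StringReader stands in for new StreamReader(@"C:\example.xml").
        string text = "  \r\n <?xml version=\"1.0\" ?><myRoot><someElement /></myRoot>";
        using (var reader = new StringReader(text))
        {
            XmlDocument doc = LoadSkippingWhitespace(reader);
            Console.WriteLine(doc.DocumentElement.Name);
        }
    }
}
```

Unlike ReadToEnd().Trim(), this does not copy the whole file into a string first, which may matter for large files.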

Have you tried adding this flag ?
ReaderSettings.IgnoreWhitespace = true;

string newXml = oldXml.TrimStart();

Related

Missing xml header when convert json to xml in C# [duplicate]

Consider the following simple code which creates an XML document and displays it.
XmlDocument xml = new XmlDocument();
XmlElement root = xml.CreateElement("root");
xml.AppendChild(root);
XmlComment comment = xml.CreateComment("Comment");
root.AppendChild(comment);
textBox1.Text = xml.OuterXml;
it displays, as expected:
<root><!--Comment--></root>
It doesn't, however, display the
<?xml version="1.0" encoding="UTF-8"?>
So how can I get that as well?
Create an XML-declaration using XmlDocument.CreateXmlDeclaration Method:
XmlNode docNode = xml.CreateXmlDeclaration("1.0", "UTF-8", null);
xml.AppendChild(docNode);
Note: please take a look at the documentation for the method, especially for encoding parameter: there are special requirements for values of this parameter.
You need to use an XmlWriter (which writes the XML declaration by default). Note that C# strings are UTF-16, while your XML declaration says that the document is UTF-8 encoded. That discrepancy can cause problems. Here's an example, writing to a file, that gives the result you expect:
XmlDocument xml = new XmlDocument();
XmlElement root = xml.CreateElement("root");
xml.AppendChild(root);
XmlComment comment = xml.CreateComment("Comment");
root.AppendChild(comment);
XmlWriterSettings settings = new XmlWriterSettings
{
Encoding = Encoding.UTF8,
ConformanceLevel = ConformanceLevel.Document,
OmitXmlDeclaration = false,
CloseOutput = true,
Indent = true,
IndentChars = " ",
NewLineHandling = NewLineHandling.Replace
};
using ( StreamWriter sw = File.CreateText("output.xml") )
using ( XmlWriter writer = XmlWriter.Create(sw,settings))
{
xml.WriteContentTo(writer);
writer.Close() ;
}
string document = File.ReadAllText( "output.xml") ;
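If you would rather not go through a file, one common workaround for the UTF-16/UTF-8 discrepancy is a StringWriter subclass that reports UTF-8 as its encoding, since an XmlWriter created over a TextWriter takes the declared encoding from it. This is only a sketch; Utf8StringWriter and ToXmlString are hypothetical names, not framework types:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

static class DeclarationSketch
{
    // Serializes the document to a string, including the XML declaration.
    public static string ToXmlString(XmlDocument xml)
    {
        var settings = new XmlWriterSettings { Indent = true };
        using (var sw = new Utf8StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(sw, settings))
            {
                xml.Save(writer);
            }
            // The XmlWriter is flushed and closed before the string is read.
            return sw.ToString();
        }
    }

    static void Main()
    {
        var xml = new XmlDocument();
        XmlElement root = xml.CreateElement("root");
        xml.AppendChild(root);
        root.AppendChild(xml.CreateComment("Comment"));

        Console.WriteLine(ToXmlString(xml));
    }
}
```

The resulting string is still UTF-16 in memory; only the declaration says UTF-8, so this makes sense when the string will later be written out as UTF-8 bytes.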
XmlDeclaration xmldecl;
xmldecl = xmlDocument.CreateXmlDeclaration("1.0", "UTF-8", null);
XmlElement root = xmlDocument.DocumentElement;
xmlDocument.InsertBefore(xmldecl, root);

Formatting string in XML format and remove invalid attribute characters

I have a string, say "<Node a="<b>">". I need to escape only the data and write this string as a node with XmlWriter. How can I escape only the "<" inside the attribute value and not the "<" that belongs to the XML structure?
using (var writer = XmlWriter.Create(Console.Out))
{
writer.WriteStartElement("Node");
writer.WriteAttributeString("a", "<b>");
}
Output <Node a="&lt;b&gt;" />
Firstly you should parse the string. Since this is not valid xml, you can't use an xml parser. You can try HtmlAgilityPack. Then you can write the values with xml writer.
string s = "<Node a=\"<b>\">";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(s);
var node = doc.DocumentNode.FirstChild;
var attr = node.Attributes[0];
using (var writer = XmlWriter.Create(Console.Out))
{
writer.WriteStartElement(node.Name);
writer.WriteAttributeString(attr.Name, attr.Value);
}

Xml within an Xml

I basically want to know how to insert an XmlDocument inside another XmlDocument.
The first XmlDocument will have the basic header and footer tags.
The second XmlDocument will be the body/data tag which must be inserted into the first XmlDocument.
string tableData = null;
using(StringWriter sw = new StringWriter())
{
rightsTable.WriteXml(sw);
tableData = sw.ToString();
}
XmlDocument xmlTable = new XmlDocument();
xmlTable.LoadXml(tableData);
StringBuilder build = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(build, new XmlWriterSettings { OmitXmlDeclaration = true }))
{
writer.WriteStartElement("dataheader");
//need to insert the xmlTable here somehow
writer.WriteEndElement();
}
Is there an easier solution to this?
Use importNode feature in your document parser.
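With XmlDocument that would be XmlDocument.ImportNode. A rough sketch, assuming the outer document already has a dataheader root and that the data should simply be appended to it (Embed is a hypothetical helper, and the sample XML is made up):

```csharp
using System;
using System.Xml;

static class ImportNodeSketch
{
    // Hypothetical helper: copies inner's root element under outer's root.
    public static string Embed(XmlDocument outer, XmlDocument inner)
    {
        // A node belongs to the document that created it, so it must be
        // imported first; true requests a deep copy (children included).
        XmlNode imported = outer.ImportNode(inner.DocumentElement, true);
        outer.DocumentElement.AppendChild(imported);
        return outer.OuterXml;
    }

    static void Main()
    {
        var outer = new XmlDocument();
        outer.LoadXml("<dataheader></dataheader>");

        // Stands in for xmlTable from the question.
        var xmlTable = new XmlDocument();
        xmlTable.LoadXml("<DocumentElement><row>1</row></DocumentElement>");

        Console.WriteLine(Embed(outer, xmlTable));
    }
}
```

The inner XML stays real markup here, so it can still be queried with XPath afterwards, which is not the case when it is wrapped in CDATA.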
You can use this code based on CreateCDataSection method
// Create an XmlCDataSection from your document
var cdata = xmlTable.CreateCDataSection("<test></test>");
XmlElement root = xmlTable.DocumentElement;
// Append the cdata section to your node
root.AppendChild(cdata);
Link : http://msdn.microsoft.com/fr-fr/library/system.xml.xmldocument.createcdatasection.aspx
I am not sure what you are really looking for but this can show how to merge two xml documents (using Linq2xml)
string xml1 =
@"<xml1>
<header>header1</header>
<footer>footer</footer>
</xml1>";
string xml2 =
@"<xml2>
<body>body</body>
<data>footer</data>
</xml2>";
var xdoc1 = XElement.Parse(xml1);
var xdoc2 = XElement.Parse(xml2);
xdoc1.Descendants().First(d => d.Name == "header").AddAfterSelf(xdoc2.Elements());
var newxml = xdoc1.ToString();
OUTPUT
<xml1>
<header>header1</header>
<body>body</body>
<data>footer</data>
<footer>footer</footer>
</xml1>
You will need to write the inner XML files in CDATA sections.
Use writer.WriteCData for such nodes, passing in the inner XML as text.
writer.WriteCData(xmlTable.OuterXml);
Another option (thanks DJQuimby) is to encode the XML in some XML-compatible format (say base64). Note that the encoding used must be XML compatible and that some encoding schemes will increase the size of the encoded document (base64 adds roughly 33%).
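For the base64 route, a small sketch assuming the inner XML is encoded as UTF-8 before being converted (Encode and Decode are hypothetical helper names, and the sample XML is made up):

```csharp
using System;
using System.Text;
using System.Xml;

static class Base64Sketch
{
    // Turns the inner XML into text that is safe inside an element.
    public static string Encode(string innerXml)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(innerXml));
    }

    // Reverses Encode on the receiving side.
    public static string Decode(string payload)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(payload));
    }

    static void Main()
    {
        // Stands in for xmlTable.OuterXml from the question.
        string inner = "<DocumentElement><row>1</row></DocumentElement>";

        var build = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(build, settings))
        {
            writer.WriteStartElement("dataheader");
            writer.WriteString(Encode(inner));
            writer.WriteEndElement();
        }

        Console.WriteLine(build.ToString());
    }
}
```

The consumer has to know that the element content is base64 and call the matching decode step, so this only fits when both sides agree on it.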

How to create in memory XML document and get string out of it

I would like to create an XML string with proper handling of special characters. Building it by hand turned out to be too complicated and kept producing invalid XML.
Now I am thinking of building the document with some object from System.Xml and then calling stringify() or otherwise getting a string from it. I guess this would take care of the special-character cases for me.
// Pseudocode
xmlDoc doc = new XMLDoc();
Element ele= new Element("xyz");
ele.value(Oob.property)
doc.appendNode(ele);
...
doc.getXMLString();
Can someone please let me know how to do this in C# (.NET 2.0+)?
I find XmlTextWriter more intuitive than XmlDocument for editing.
e.g.:
string xmlString = null;
using(StringWriter sw = new StringWriter())
{
XmlTextWriter writer = new XmlTextWriter(sw);
writer.Formatting = Formatting.Indented; // if you want it indented
writer.WriteStartDocument(); // <?xml version="1.0" encoding="utf-16"?>
writer.WriteStartElement("TAG"); //<TAG>
// <SUBTAG>value</SUBTAG>
writer.WriteStartElement("SUBTAG");
writer.WriteString("value");
writer.WriteEndElement();
// <SUBTAG attr="hello">world</SUBTAG>
writer.WriteStartElement("SUBTAG");
writer.WriteStartAttribute("attr");
writer.WriteString("hello");
writer.WriteEndAttribute();
writer.WriteString("world");
writer.WriteEndElement();
writer.WriteEndElement(); //</TAG>
writer.WriteEndDocument();
xmlString = sw.ToString();
}
after this code xmlString will contain:
<?xml version="1.0" encoding="utf-16"?>
<TAG>
<SUBTAG>value</SUBTAG>
<SUBTAG attr="hello">world</SUBTAG>
</TAG>
ADDITIONAL INFO:
using XmlDocument would be:
XmlDocument doc = new XmlDocument();
XmlNode tagNode = doc.CreateNode(XmlNodeType.Element, "TAG", null);
doc.AppendChild(tagNode);
XmlNode subTagNode1 = doc.CreateNode(XmlNodeType.Element, "SUBTAG", null);
tagNode.AppendChild(subTagNode1);
XmlText subTagNode1Value = doc.CreateTextNode("value");
subTagNode1.AppendChild(subTagNode1Value);
XmlNode subTagNode2 = doc.CreateNode(XmlNodeType.Element, "SUBTAG", null);
tagNode.AppendChild(subTagNode2);
XmlAttribute subTagNode2Attribute = doc.CreateAttribute("attr");
subTagNode2Attribute.Value = "hello";
subTagNode2.Attributes.SetNamedItem(subTagNode2Attribute);
XmlText subTagNode2Value = doc.CreateTextNode("world");
subTagNode2.AppendChild(subTagNode2Value);
string xmlString = null;
using(StringWriter wr = new StringWriter())
{
doc.Save(wr);
xmlString = wr.ToString();
}
You can also refer to this community wiki question, which leads to easier-to-read syntax when you need to build an XML stream programmatically.
You can then just call the .ToString() method to get a clean escaped representation of your XML stream.
var xmlString = new XElement("Foo",
new XAttribute("Bar", "some & value with special characters <>"),
new XElement("Nested", "data")).ToString();
And you would get in xmlString:
<Foo Bar="some &amp; value with special characters &lt;&gt;">
<Nested>data</Nested>
</Foo>

Save xml string or XmlNode to text file in indent format?

I have a very long XML string, all on one line. I would like to save it to a text file in a nicely indented format:
<root><app>myApp</app><logFile>myApp.log</logFile><SQLdb><connection>...</connection>...</root>
The format I prefer:
<root>
<app>myApp</app>
<logFile>myApp.log</logFile>
<SQLdb>
<connection>...</connection>
....
</SQLdb>
</root>
Which .NET libraries are available in C# to do this?
This will work for what you want to do ...
var samp = @"<root><app>myApp</app><logFile>myApp.log</logFile></root>";
var xdoc = XDocument.Load(new StringReader(samp), LoadOptions.None);
xdoc.Save(@"c:\temp\myxml.xml", SaveOptions.None);
Same result with System.Xml namespace ...
var xdoc = new XmlDocument();
xdoc.LoadXml(samp);
xdoc.Save(@"c:\temp\myxml.xml");
I'm going to assume you don't mean that you have a System.String instance with some XML in it, and I'm going to hope you don't create it via string manipulation.
That said, all you have to do is set the proper settings when you create your XmlWriter:
var sb = new StringBuilder();
var settings = new XmlWriterSettings {Indent = true};
using (var writer = XmlWriter.Create(sb, settings))
{
// write your XML using the writer
}
// Indented results available in sb.ToString()
Just another option:
using System.Xml.Linq;
public string IndentXmlString(string xml)
{
XDocument doc = XDocument.Parse(xml);
return doc.ToString();
}
