Formatting string in XML format and remove invalid attribute characters

Formatting string in XML format and remove invalid attribute characters - c#

I've a string say "<Node a="<b>">". I need to escape only the data and parse this string as a node in XMLWriter. Hence how to escape only the attribute value "<" and note the XML structure's "<".

using (var writer = XmlWriter.Create(Console.Out))
{
writer.WriteStartElement("Node");
writer.WriteAttributeString("a", "<b>");
}
Output <Node a="<b>" />
Firstly you should parse the string. Since this is not valid xml, you can't use an xml parser. You can try HtmlAgilityPack. Then you can write the values with xml writer.
string s = "<Node a=\"<b>\">";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(s);
var node = doc.DocumentNode.FirstChild;
var attr = node.Attributes[0];
using (var writer = XmlWriter.Create(Console.Out))
{
writer.WriteStartElement(node.Name);
writer.WriteAttributeString(attr.Name, attr.Value);
}

Related

Writing Escape Characters in Plain Text with XML

I'm having an issue parsing some XML with C#. To give an example, I'm taking some XML that contains the following tag:
<Message>Hello, my name is
Mr. Rogers</Message>
And parsing it with the following C# code:
var xmlData = XDocument.Load(filePath);
The problem that I'm having is that the above XML is being parsed as:
<Message>Hello, my name is
Mr. Rogers</Message>
when it needs to be parsed as:
<Message>Hello, my name is
Mr. Rogers</Message>
Is it possible to just have it parsed as the latter example in C#?

The only way is to do pre- and post-processing.
public XDocument LoadXDocument(string xmlString)
{
const string replacer = "|||==|||";
const string replacement = "
";
xmlString = xmlString.Replace(replacement, replacer);
XDocument doc;
using (var reader = new StringReader(xmlString))
{
doc = XDocument.Load(reader);
}
foreach (var element in doc.Elements())
{
if (element.Value.Contains(replacer))
{
element.SetValue(element.Value.Replace(replacer, replacement));
}
}
return doc;
}

Xml within an Xml

I basically want to know how to insert a XmlDocument inside another XmlDocument.
The first XmlDocument will have the basic header and footer tags.
The second XmlDocument will be the body/data tag which must be inserted into the first XmlDocument.
string tableData = null;
using(StringWriter sw = new StringWriter())
{
rightsTable.WriteXml(sw);
tableData = sw.ToString();
}
XmlDocument xmlTable = new XmlDocument();
xmlTable.LoadXml(tableData);
StringBuilder build = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(build, new XmlWriterSettings { OmitXmlDeclaration = true }))
{
writer.WriteStartElement("dataheader");
//need to insert the xmlTable here somehow
writer.WriteEndElement();
}
Is there an easier solution to this?

Use importNode feature in your document parser.

You can use this code based on CreateCDataSection method
// Create an XmlCDataSection from your document
var cdata = xmlTable.CreateCDataSection("<test></test>");
XmlElement root = xmlTable.DocumentElement;
// Append the cdata section to your node
root.AppendChild(cdata);
Link : http://msdn.microsoft.com/fr-fr/library/system.xml.xmldocument.createcdatasection.aspx

I am not sure what you are really looking for but this can show how to merge two xml documents (using Linq2xml)
string xml1 =
#"<xml1>
<header>header1</header>
<footer>footer</footer>
</xml1>";
string xml2 =
#"<xml2>
<body>body</body>
<data>footer</data>
</xml2>";
var xdoc1 = XElement.Parse(xml1);
var xdoc2 = XElement.Parse(xml2);
xdoc1.Descendants().First(d => d.Name == "header").AddAfterSelf(xdoc2.Elements());
var newxml = xdoc1.ToString();
OUTPUT
<xml1>
<header>header1</header>
<body>body</body>
<data>footer</data>
<footer>footer</footer>
</xml1>

You will need to write the inner XML files in CDATA sections.
Use writer.WriteCData for such nodes, passing in the inner XML as text.
writer.WriteCData(xmlTable.OuterXml);
Another option (thanks DJQuimby) is to encode the XML to some XML compatible format (say base64) - note that the encoding used must be XML compatible and that some encoding schemes will increase the size of the encoded document (base64 adds ~30%).

XML parser, multiple roots

This is part of the input string, i can't modify it, it will always come in this way(via shared memory), but i can modify after i have put it into a string of course:
<sys><id>SCPUCLK</id><label>CPU Clock</label><value>2930</value></sys><sys><id>SCPUMUL</id><label>CPU Multiplier</label><value>11.0</value></sys><sys><id>SCPUFSB</id><label>CPU FSB</label><value>266</value></sys>
i've read it with both:
String.Concat(
XElement.Parse(encoding.GetString(bytes))
.Descendants("value")
.Select(v => v.Value));
and:
XmlDocument document = new XmlDocument();
document.LoadXml(encoding.GetString(bytes));
XmlNode node = document.DocumentElement.SelectSingleNode("//value");
Console.WriteLine("node = " + node);
but they both have an error when run; that the input has multiple roots(There are multiple root elements quote), i don't want to have to split the string.
Is their any way to read the string take the value between <value> and </value> without spiting the string into multiple inputs?

That's not a well-formed XML document, so most XML tools won't be able to process it.
An exception is the XmlReader. Look up XmlReaderSettings.ConformanceLevel in MSDN. If you set it to ConformanceLevel.Fragment, you can create an XmlReader with those settings and use it to read elements from a stream that has no top-level element.
You have to write code that uses XmlReader.Read() to do this - you can't just feed it to an XmlDocument (which does require that there be a single top-level element).
e.g.,
var readerSettings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var reader = XmlReader.Create(stream, readerSettings))
{
while (reader.Read())
{
using (var fragmentReader = reader.ReadSubtree())
{
if (fragmentReader.Read())
{
var fragment = XNode.ReadFrom(fragmentReader) as XElement;
// do something with fragment
}
}
}
}

XML elements must have ONE root element, with whatever child structure you want.
Your xml string looks like:
<sys>
...
</sys>
<sys>
...
</sys>
The valid version would be:
<someRootElement>
<sys>
...
</sys>
<sys>
...
</sys>
</someElement>
Try:
XmlDocument document = new XmlDocument();
document.LoadXml("<root>"+encoding.GetString(bytes)+"</root>");
XmlNode node = document.DocumentElement.SelectSingleNode("//value");
Console.WriteLine("node = " + node);

This solution parses successfully all node types, including text nodes:
var settings = new XmlReaderSettings{ConformanceLevel = ConformanceLevel.Fragment};
var reader = XmlReader.Create(stream, settings);
while(reader.Read())
{
while(reader.NodeType != XmlNodeType.None)
{
if(reader.NodeType == XmlNodeType.XmlDeclaration)
{
reader.Skip();
continue;
}
XNode node = XNode.ReadFrom(reader);
}
}
Skips the XML declaration because it isn't a node.

How to ignore leading whitespace in XML file?

I need to load xml from a file into an XmlDocument. The problem is that the file contains some leading whitespace. (I have no control over the system that produces the file.)
Is there any clean/easy way to ignore or strip those characters?
string SamplelRequestFile = #"C:\example.xml";
XmlDocument docXML = new XmlDocument();
XmlTextReader xReader = new XmlTextReader(SamplelRequestFile);
XmlReaderSettings ReaderSettings = new XmlReaderSettings();
ReaderSettings.XmlResolver = null;
ReaderSettings.ProhibitDtd = false;
docXML.Load(xReader);
example.xml (note the leading spaces)
<?xml version="1.0" ?>
<myRoot>
<someElement />
</myRoot>

You'll just have to do something like
using (StreamReader sr = new StreamReader(#"C:\example.xml"))
{
XmlDocument docXML = new XmlDocument();
docXML.LoadXml(sr.ReadToEnd().Trim());
...
}

here is a sample that works:
string file = #"C:\example.xml";
XmlDocument docXML = new XmlDocument();
using (TextReader x = new StreamReader(file))
{
while (x.Peek() == ' ')
x.Read();
docXML.Load(x);
}

This is an invalid XML.
According to XML Specification, pi or processing-instructions must be the first characters if they are present.
I suggest you pre-process the XML by trimming the XML.
Workaround:
string content = File.ReadAllText(#"C:\example.xml");
XmlDocument doc = new XmlDocument();
doc.LoadXml(content.Trim());

Create a Stream and a StreamReader on the file yourself, then Peek() and consume characters from the stream as long as you see whitespace. Once you're sure that the next character is <, pass the stream to the XmlTextReader constructor.

Have you tried adding this flag ?
ReaderSettings.IgnoreWhitespace = true;

string newXml = string.TrimLeft(oldXml);

Save xml string or XmlNode to text file in indent format?

I have an xml string which is very long in one line. I would like to save the xml string to a text file in a nice indent format:
<root><app>myApp</app><logFile>myApp.log</logFile><SQLdb><connection>...</connection>...</root>
The format I prefer:
<root>
<app>myApp</app>
<logFile>myApp.log</logFile>
<SQLdb>
<connection>...</connection>
....
</SQLdb>
</root>
What are .Net libraries available for C# to do it?

This will work for what you want to do ...
var samp = #"<root><app>myApp</app><logFile>myApp.log</logFile></root>";
var xdoc = XDocument.Load(new StringReader(samp), LoadOptions.None);
xdoc.Save(#"c:\temp\myxml.xml", SaveOptions.None);
Same result with System.Xml namespace ...
var xdoc = new XmlDocument();
xdoc.LoadXml(samp);
xdoc.Save(#"c:\temp\myxml.xml");

I'm going to assume you don't mean that you have a System.String instance with some XML in it, and I'm going to hope you don't create it via string manipulation.
That said, all you have to do is set the proper settings when you create your XmlWriter:
var sb = new StringBuilder();
var settings = new XmlWriterSettings {Indent = true};
using (var writer = XmlWriter.Create(sb, settings))
{
// write your XML using the writer
}
// Indented results available in sb.ToString()

Just another option:
using System.Xml.Linq;
public string IndentXmlString(string xml)
{
XDocument doc = XDocument.Parse(xml);
return doc.ToString();
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Formatting string in XML format and remove invalid attribute characters - c#

I've a string say "<Node a="<b>">". I need to escape only the data and parse this string as a node in XMLWriter. Hence how to escape only the attribute value "<" and note the XML structure's "<".

Related

Writing Escape Characters in Plain Text with XML

Xml within an Xml

XML parser, multiple roots

How to ignore leading whitespace in XML file?

Save xml string or XmlNode to text file in indent format?

Categories

Resources