Load xml from string that contains HTML

I am doing the following -
// Create message
StringBuilder sbXML = new StringBuilder();
sbXML.Append("<root>");
sbXML.AppendFormat("<messageBody>{0}</messageBody>", JsonString);
sbXML.Append("</root>");
Where JsonString is a JSON string; however, some of the entries in the JSON are strings of HTML (which I think is why it is breaking).
When I do -
XmlDocument xmlDOC = new XmlDocument();
xmlDOC.LoadXml(sbXML.ToString());
I get the error -
'\' is an unexpected token. The expected token is '"' or '''.
My Json also contains urls so for instance -
{
"exampleJson": {
"url": "http://example.com/",
"html": "example text"
}
}
I believe it is these values that lead to the exception. Is there a way around this so that xmlDOC.LoadXml can load my JSON? I considered doing something like -
xmlDOC.LoadXml(sbXML.ToString().Replace("character to replace", "acceptable character"));
However this is obviously not ideal. I also tried just using
.Load
However this resulted in illegal characters in the path exception.

I think you want to be doing something like:
StringBuilder sbXML = new StringBuilder();
sbXML.Append("<root>");
sbXML.Append("<messageBody />");
sbXML.Append("</root>");
XmlDocument xmlDOC = new XmlDocument();
xmlDOC.LoadXml(sbXML.ToString());
xmlDOC.DocumentElement.SelectSingleNode("messageBody").InnerText = JsonString;
As pointed out by @Alexei Levenkov, creating XML by string concatenation is a really bad idea and will lead to more problems later.
Using the System.Xml.XmlDocument methods is much safer: assigning InnerText encodes whatever it needs to in order to make the value of JsonString XML-safe.
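To illustrate the point above, here is a minimal sketch that builds the same document entirely through the DOM and checks that the payload survives a round trip. The class name MessageBuilder, the method BuildMessage and the sample JSON are made up for this example; they are not from the question.

```csharp
using System;
using System.Xml;

public static class MessageBuilder
{
    // Hypothetical helper: wraps an arbitrary string in <root><messageBody>.
    public static XmlDocument BuildMessage(string jsonString)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("root");
        doc.AppendChild(root);

        XmlElement body = doc.CreateElement("messageBody");
        // InnerText stores the raw string; <, > and & are escaped on serialization.
        body.InnerText = jsonString;
        root.AppendChild(body);
        return doc;
    }

    public static void Main()
    {
        // Sample payload (an assumption): JSON whose value contains HTML.
        string json = "{\"html\": \"<p class=\\\"x\\\">Tom & Jerry</p>\"}";
        XmlDocument doc = BuildMessage(json);
        Console.WriteLine(doc.OuterXml);

        // Round trip: reload the serialized XML and compare the payload.
        var reloaded = new XmlDocument();
        reloaded.LoadXml(doc.OuterXml);
        string roundTripped = reloaded.SelectSingleNode("/root/messageBody").InnerText;
        Console.WriteLine(roundTripped == json);
    }
}
```

Because the JSON is never pasted into markup as text, there is nothing to escape by hand, and the consumer gets the original string back from InnerText.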

You need to use a CDATA section, which tells the parser to treat its content as plain text rather than markup.
Example:
<messageBody><![CDATA[ any json data ]]> </messageBody>
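If you go the CDATA route, it is probably still safer to let XmlDocument create the section than to concatenate it. The sketch below assumes the payload never contains the sequence "]]>" (which would end the section early) and rejects it otherwise; CDataExample and Wrap are hypothetical names used only for illustration.

```csharp
using System;
using System.Xml;

public static class CDataExample
{
    // Hypothetical helper: stores the payload in a CDATA section.
    // Assumption: a payload containing "]]>" is rejected rather than split.
    public static XmlDocument Wrap(string payload)
    {
        if (payload.Contains("]]>"))
            throw new ArgumentException("Payload contains \"]]>\", which would end the CDATA section early.");

        var doc = new XmlDocument();
        doc.LoadXml("<root><messageBody /></root>");
        XmlNode body = doc.DocumentElement.SelectSingleNode("messageBody");
        body.AppendChild(doc.CreateCDataSection(payload));
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = Wrap("{\"html\": \"<b>bold</b>\"}");
        Console.WriteLine(doc.OuterXml);
        // The consumer reads the payload back without any decoding step.
        Console.WriteLine(doc.SelectSingleNode("/root/messageBody").InnerText);
    }
}
```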

Related

{"'\u0004', hexadecimal value 0x04, is an invalid character

I am trying to convert a file that contains some special characters to XML, but the conversion fails because of those characters.
I already have this regex code, but it is still not working for me. Please help.
The code what I have tried:
string filedata = @"D:\readwrite\test11.txt";
string input = ReadForFile(filedata);
string re1 = @"[^\u0000-\u007F]+";
string re5 = @"\p{Cs}";
data = Regex.Replace(input, re1, "");
data = Regex.Replace(input, re5, "");
XmlDocument xmlDocument = new XmlDocument();
try
{
xmlDocument = (XmlDocument)JsonConvert.DeserializeXmlNode(data);
var Xdoc = XDocument.Parse(xmlDocument.OuterXml);
}
catch (Exception ex)
{
Console.WriteLine(ex);
}

0x04 is a transmission control character and is not a legal character in XML 1.0 text. XmlDocument is right to reject it if it really does appear in your data. This does suggest that the regex you have doesn't do what you think it does: [^\u0000-\u007F]+ removes only characters outside the ASCII range, and U+0004 is inside that range, so it is left untouched. Note also that the second Regex.Replace call starts from input again, so the result of the first call is thrown away. The real question for me is why this non-text 'character' appears in data intended as XML in the first place.
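If the control characters really cannot be fixed at the source, one option is to strip them before conversion. The following is only a sketch under that assumption: it removes the C0 control characters that XML 1.0 forbids and does not handle other invalid input such as unpaired surrogates. XmlTextCleaner and StripControlChars are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlTextCleaner
{
    // C0 control characters that XML 1.0 does not allow: everything below
    // U+0020 except tab (U+0009), line feed (U+000A) and carriage return (U+000D).
    private static readonly Regex InvalidControlChars =
        new Regex(@"[\u0000-\u0008\u000B\u000C\u000E-\u001F]");

    public static string StripControlChars(string input)
    {
        // Regex.Replace removes every match, not just the first one.
        return InvalidControlChars.Replace(input, string.Empty);
    }

    public static void Main()
    {
        // Sample input (an assumption): JSON text with an embedded U+0004.
        string dirty = "{\"name\": \"Foo\u0004Bar\"}";
        Console.WriteLine(StripControlChars(dirty));
    }
}
```

Run the cleaned string through JsonConvert.DeserializeXmlNode afterwards, and chain any further replacements on the cleaned result rather than on the original input.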
I have other questions. I've never seen JsonConvert.DeserializeXmlNode before - I had to look up what it does. Why are you using a JSON function against the root of a document which presumably therefore contains no JSON? Why are you then taking that document, converting it back to a string, and then creating an XDocument from it? Why not just create an XDocument to start with?

XmlNode.OwnerDocument.ChildNodes is empty

I get an XmlElement from a web service. The result is unexpected: xmlElement.OwnerDocument.ChildNodes is empty. How is that possible?
This is the xml:
<tns1:VideoSource xmlns:tns1="http://www.onvif.org/ver10/topics">
<MotionAlarm wstop:topic="true" xmlns:wstop="http://docs.oasis-open.org/wsn/t-1" xmlns="http://www.onvif.org/ver10/events/wsdl">
</MotionAlarm>
</tns1:VideoSource>

I tested your XML with the code below and the child nodes are there. I suspect some whitespace or other invisible characters are causing the problem. If you got the data from a website (probably as a stream), there may be null characters at the end of the stream that you cannot see. Make sure your stream class is using UTF-8 encoding. The default encoding in some streams is ASCII, which can change characters and add padding characters that may create issues.
string input =
"<tns1:VideoSource xmlns:tns1=\"http://www.onvif.org/ver10/topics\">" +
"<MotionAlarm wstop:topic=\"true\" xmlns:wstop=\"http://docs.oasis-open.org/wsn/t-1\" xmlns=\"http://www.onvif.org/ver10/events/wsdl\">" +
"</MotionAlarm>" +
"</tns1:VideoSource>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(input);
XmlNodeList videoSource = doc.ChildNodes;
XmlNodeList motionAlarm = videoSource[0].ChildNodes;

What would be the best way of checking whether a string contains XML tags?

I know that the following would find potential tags, but is there a better way to check if a string contains XML tags to prevent exceptions when reading/writing the string between XML files?
string testWord = "test<a>";
bool foundTag = Regex.IsMatch(testWord, @"^*<*>*$");

I'd use another Regex for that
Regex.IsMatch(testWord, @"<.+?>");
However, even if it does match, there is no guarantee that your file actually is an xml file, as the regex could also match strings like "<<a>" which is invalid, or "a <= b >= c" which is obviously not xml.
You should consider using the XmlDocument class instead.
XmlDocument xmlDoc = new XmlDocument();
try
{
xmlDoc.LoadXml(testWord); // LoadXml parses the string itself; Load would treat it as a file path or URL
}
catch
{
// not an xml
}

Why don't you HtmlEncode the string before sending it via XML? This way you can avoid difficulties with Regex parsing tags.

Escaping ONLY contents of Node in XML

I have a part of code mentioned like below.
//Reading from a file and assign to the variable named "s"
string s = "<item><name> Foo </name></item>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);
But it stops working if the content has characters such as "<" or ">".
string s = "<item><name> Foo > Bar </name></item>";
I know I have to escape those characters before loading, but if I do this:
doc.LoadXml(System.Security.SecurityElement.Escape(s));
the tags (< , >) are also escaped and, as a result, the error occurs.
How can I solve this problem?

a tricky solution:
string s = "<item><name> Foo > Bar </name></item>";
s = Regex.Replace(s, @"<[^>]+?>", m => HttpUtility.HtmlEncode(m.Value)).Replace("<","ojlovecd").Replace(">","cdloveoj");
s = HttpUtility.HtmlDecode(s).Replace("ojlovecd", "&lt;").Replace("cdloveoj", "&gt;");
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);

Assuming your content will never contain the characters "]]>", you can use CDATA.
string s = "<item><name><![CDATA[ Foo > Bar ]]></name></item>";
Otherwise, you'll need to HTML-encode your special characters, and decode them before you use/display them (unless it's in a browser).
string s = "<item><name> Foo &gt; Bar </name></item>";
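As a rough sketch of the encoding approach, and assuming you can get at the raw text before it is wrapped in tags, escape only the content and add the markup afterwards. ContentEscaper and BuildItem are hypothetical names; SecurityElement.Escape replaces <, >, &, " and ' with their XML entities.

```csharp
using System;
using System.Security;
using System.Xml;

public static class ContentEscaper
{
    // Hypothetical helper: escapes only the text, then adds the tags around it,
    // so the markup itself is never touched by the escaping step.
    public static string BuildItem(string rawName)
    {
        return "<item><name>" + SecurityElement.Escape(rawName) + "</name></item>";
    }

    public static void Main()
    {
        string s = BuildItem(" Foo < Bar & Baz ");
        var doc = new XmlDocument();
        doc.LoadXml(s);

        // The parser decodes the entities again, so InnerText is the raw text.
        Console.WriteLine(doc.SelectSingleNode("/item/name").InnerText);
    }
}
```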

Assign the content of string to the InnerXml property of node.
var node = doc.CreateElement("root");
node.InnerXml = s;
Take a look at - Different ways how to escape an XML string in C#

It looks like the strings you have generated are plain strings, not valid XML. You can either get the strings generated as valid XML or, if you know that the strings are always going to be the name, leave the XML <item> and <name> tags out of the data.
Then, when you create the XmlDocument, call CreateElement and assign your string before saving the result.
XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("item");
doc.AppendChild(root);
XmlElement name = doc.CreateElement("name");
name.InnerText = "the contents from your file";
root.AppendChild(name);

Convert XPathDocument to string

I have an XPathDocument and would like to export it in a string that contains the document as XML representation. What is easiest way of doing so?

You can do the following to get a string representation of the XML document:
XPathDocument xdoc = new XPathDocument(@"C:\samples\sampleDocument.xml");
string xml = xdoc.CreateNavigator().OuterXml;
If you want your string to contain a full representation of the XML document including an XML declaration you can use the following code:
XPathDocument xdoc = new XPathDocument(@"C:\samples\sampleDocument.xml");
StringBuilder sb = new StringBuilder();
using (XmlWriter xmlWriter = XmlWriter.Create(sb))
{
xdoc.CreateNavigator().WriteSubtree(xmlWriter);
}
string xml = sb.ToString();

An XPathDocument is a read-only representation of an XML document. That means that the internal representation will not change. To get the XML, you can get the original document.
Or use 0xA3's method, which will go through the whole document and write it again (output not necessarily the same as input, yet structurally and functionally equal, because some input is discarded with XDM in-memory representation)
