Escaping ONLY contents of Node in XML


I have a piece of code like the one below.
// The string is read from a file and assigned to the variable named "s"
string s = "<item><name> Foo </name></item>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);
But it stops working if the content contains characters such as "<" or ">".
string s = "<item><name> Foo > Bar </name></item>";
I know I have to escape those characters before loading, but if I escape the whole string like this:
doc.LoadXml(System.Security.SecurityElement.Escape(s));
the angle brackets of the tags (<, >) are escaped as well, and loading fails with an error.
How can I solve this problem?

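A tricky solution: hide the tags, swap the remaining angle brackets for placeholders, then restore the tags and turn the placeholders into entities.
string s = "<item><name> Foo > Bar </name></item>";
// Encode the tags so their brackets are hidden, then mark the "<" and ">" left in the content with placeholders.
s = Regex.Replace(s, @"<[^>]+?>", m => HttpUtility.HtmlEncode(m.Value)).Replace("<", "ojlovecd").Replace(">", "cdloveoj");
// Decode the tags again and replace the placeholders with XML entities.
s = HttpUtility.HtmlDecode(s).Replace("ojlovecd", "&lt;").Replace("cdloveoj", "&gt;");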
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);

Assuming your content will never contain the characters "]]>", you can use CDATA.
string s = "<item><name><![CDATA[ Foo > Bar ]]></name></item>";
Otherwise, you'll need to HTML-encode your special characters and decode them before you use or display them (unless they are displayed in a browser).
string s = "<item><name> Foo &gt; Bar </name></item>";
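If the raw content is available separately from the markup (an assumption; in the question both arrive in one string), escaping only the content before concatenating is one way to do this. The sketch below uses SecurityElement.Escape from the question; the class and method names are made up for illustration.

```csharp
using System;
using System.Security;
using System.Xml;

public static class EscapeContentDemo
{
    // Hypothetical helper: escapes only the text content, not the tags.
    public static XmlDocument LoadItem(string rawName)
    {
        // SecurityElement.Escape replaces <, >, &, " and ' with XML entities.
        string s = "<item><name>" + SecurityElement.Escape(rawName) + "</name></item>";
        var doc = new XmlDocument();
        doc.LoadXml(s);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = LoadItem(" Foo < Bar & Baz ");
        // The parser turns the entities back into the original characters.
        Console.WriteLine(doc.DocumentElement["name"].InnerText);
    }
}
```

Reading InnerText gives back the unescaped text, so no manual decoding step is needed after loading.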

Assign the content of the string to the InnerXml property of a node.
var node = doc.CreateElement("root");
node.InnerXml = s;
See also: "Different ways how to escape an XML string in C#".

It looks like the strings you are getting are plain strings, not valid XML. You can either have them generated as valid XML, or, if you know that the string is always going to be the name, leave the <item> and <name> tags out of the data.
Then, when you create the XmlDocument, call CreateElement and assign your string to the element before saving the result.
XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("item");
doc.AppendChild(root);
XmlElement name = doc.CreateElement("name");
name.InnerText = "the contents from your file";
root.AppendChild(name);
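To see that InnerText takes care of the escaping, here is a small round-trip sketch. The class and method names are invented for the example, and the sample text stands in for whatever is read from the file.

```csharp
using System;
using System.Xml;

public static class InnerTextDemo
{
    // Builds <item><name>...</name></item> and returns the serialized XML.
    public static string BuildItem(string nameText)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("item");
        doc.AppendChild(root);
        XmlElement name = doc.CreateElement("name");
        name.InnerText = nameText; // special characters are escaped on output
        root.AppendChild(name);
        return doc.OuterXml;
    }

    // Parses the XML again and returns the text of the name element.
    public static string ReadName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement["name"].InnerText;
    }

    public static void Main()
    {
        string xml = BuildItem(" Foo < Bar & Baz ");
        Console.WriteLine(xml);           // serialized form with entities
        Console.WriteLine(ReadName(xml)); // original text again
    }
}
```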

Related

Encode specific part of string and load it to xml

When I load an XML string into an XmlDocument object, it throws:
'<', hexadecimal value 0x3C, is an invalid attribute character.
string xml = Request.Form["webformfield"] + string.Empty;
// this reads the input from the web form in encoded format
For example:
<Models><Model ModelID="F2434" ModelName="FTest 1 & Income MP" />
What I tried:
// decode the whole string
StringWriter sw = new StringWriter();
Server.HtmlDecode(models, sw); // this is an internal method of the framework.
models = sw.ToString();
After decoding, the string looks like this:
//string xml = "<Models><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"Test1 <> & \"' characters \"/><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"Test2 <> & \"' characters \"/></Models>";
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml);
Console.WriteLine(xmlDocument.OuterXml);
When I manually changed the XML string at runtime so that the value of the ModelNameWithSpecialCHars attribute was encoded, it worked.
I posted the changed string as an image, because when I typed the encoded special characters they were displayed in decoded form. The code that loads the changed string:
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml);
Console.WriteLine(xmlDocument.OuterXml);
Console.ReadLine();
Is there any way I can encode only a specific part of the string? For example:
string xml = "<Models><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"Tes <> & \"' characters \"/></Models>";
In the string above, I need to encode only the value of the ModelNameWithSpecialCHars attribute ("Tes <> & "').

Sure - just put the part that needs encoding in a separate variable and use the WebUtility.HtmlEncode method (found in System.Net):
string bad_xml = "Tes <> & \"' characters ";
bad_xml = WebUtility.HtmlEncode(bad_xml);
string xml = "<Models><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"" + bad_xml + "\"/></Models>";
You might find it cleaner, though, to add elements and attributes through the XmlDocument class, since you want an XmlDocument in the end anyway. It encodes text into valid XML for you; XML has slightly different escaping requirements than HTML (although for your test string they are equivalent).
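A minimal sketch of the XmlDocument route, assuming the element and attribute names from the question; the class and method names are made up. SetAttribute takes the raw value and the escaping happens when the document is serialized.

```csharp
using System;
using System.Xml;

public static class ModelXmlDemo
{
    // Builds <Models><Model ModelId="..." ModelNameWithSpecialCHars="..."/></Models>.
    public static XmlDocument BuildModels(string modelId, string modelName)
    {
        var doc = new XmlDocument();
        XmlElement models = doc.CreateElement("Models");
        doc.AppendChild(models);

        XmlElement model = doc.CreateElement("Model");
        model.SetAttribute("ModelId", modelId);
        // The raw value goes in as-is; no manual encoding is needed.
        model.SetAttribute("ModelNameWithSpecialCHars", modelName);
        models.AppendChild(model);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = BuildModels("124", "Tes <> & \"' characters ");
        Console.WriteLine(doc.OuterXml); // attribute value is escaped here

        // Loading the serialized XML again yields the original value.
        var reloaded = new XmlDocument();
        reloaded.LoadXml(doc.OuterXml);
        Console.WriteLine(reloaded.DocumentElement["Model"].GetAttribute("ModelNameWithSpecialCHars"));
    }
}
```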

replace new lines with "" in C#

I want to convert this:
<translation>
1 Sənədlər
</translation>
to
<translation>1 Sənədlər</translation>
in XML using C#. Only the translation tags need to change.
I tried this:
XDocument xdoc = XDocument.Load(path);
xdoc.Save(path, SaveOptions.DisableFormatting);
But it does not remove the new lines inside the <translation> tags.

What you have should work. You can check by dumping the XDocument to a string variable to confirm whether SaveOptions is removing the formatting.
For example, I tried the code below, and content does not have any formatting, including newlines and whitespace.
XDocument xmlDoc = new XDocument(new XElement("Team", new XElement("Developer", "Sam")));
var content = xmlDoc.ToString(SaveOptions.DisableFormatting);

A new line is represented in code by "\n" and possibly also "\r". You can simply remove these:
string xmlString = "<translation>\r\n1 Sənədlər\r\n</translation>"; // With the 'new lines'
xmlString = xmlString.Replace("\r", "").Replace("\n", "");
This turns:
<translation>
1 Sənədlər
</translation>
into:
<translation>1 Sənədlər</translation>
I hope this helps.

You can strip out newlines manually in an environment-sensitive way (this only matches the current platform's line terminator) by using
var content = xmlString.Replace(Environment.NewLine, string.Empty);

XML defines two types of whitespace: significant and insignificant.
Insignificant whitespace is the whitespace between elements where text content doesn't occur, whereas significant whitespace is the whitespace within elements that contain text content. You might find the graphic in this article useful to show the difference.
What you have in your translation element is significant whitespace; the element contains text so it is assumed to be part of the element contents. Without a schema or DTD that says it can be collapsed, no amount of changing the whitespace handling on read or write is going to remove this. These options only relate to the insignificant whitespace.
What you can do is apply your own processing: using LINQ to XML, you can trim the whitespace of all elements that contain only text using something like this:
var textElements = doc.Descendants()
    .Where(element => element.Nodes().All(node => node is XText));
foreach (var element in textElements)
{
    element.Value = element.Value.Trim();
}
See this fiddle for a demo.

Keep special characters in XML

I have a requirement where I need to read an XML file that may contain special characters, and I need to keep those special characters "as-is". However, after calling XDocument.Load(), &apos; is turned into ' and &amp; into &.
Here is what the XML file may look like:
<root>
<child>This is a text with special character such as &apos; and &amp;</child>
</root>
XDocument xDocument = null;
xDocument = XDocument.Load("myFile.xml", LoadOptions.SetBaseUri | LoadOptions.SetLineInfo | LoadOptions.PreserveWhitespace);
I've tried with encoding, but with no success. For example:
using (StreamReader oReader = new StreamReader("myFile.xml", Encoding.GetEncoding("utf-8")))
{
    xDocument = XDocument.Load(oReader);
}
or
xDocument = XDocument.Parse(File.ReadAllText("myFile.xml", Encoding.UTF8));
Is there anything else that I can try?
Thanks.

Xml Document, escape this character

I have an XML document that has the paragraph separator character in some nodes, written as &#x2029;. When I load the XML into an XmlDocument object, I no longer see this character. Instead I see a space. How do I get it to show &#x2029;?
XmlDocument doc = new XmlDocument();
doc.Load(xmlFilePath);
XmlNodeList nodes = doc.SelectNodes("/catalog/classes");
foreach (XmlNode node in nodes) {
    string category = node["category"].InnerText;
    // This should return true but it returns false. This category has a paragraph separator.
    bool containerSeperator = category.Contains("&#x2029;");
}

Test #1:
var xmlText = @"<Test>&quot;</Test>";
var xml = XDocument.Parse(xmlText);
var result = xml.Element("Test").Value;
result will not be &quot;, result will be ". So Contains("&quot;") will never be true.
Test #2:
var xmlText = @"<Test>&#x2029;</Test>";
var xml = XDocument.Parse(xmlText);
var result = Encoding.Unicode.GetBytes(xml.Element("Test").Value);
result will be two bytes, 0x29 and 0x20, which is U+2029 in little-endian UTF-16 (the encoding Encoding.Unicode uses) and exactly what was read from the XML. So the character is there; you just don't see it, because this Unicode character has no visible glyph.
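Following from the two tests, the check in the question can be written against the character itself rather than the entity text. This is a sketch with an invented class name and a simplified sample document, not the asker's actual catalog file.

```csharp
using System;
using System.Xml.Linq;

public static class ParagraphSeparatorDemo
{
    // Returns the text content of the root element after parsing.
    public static string ReadValue(string xmlText)
    {
        return XDocument.Parse(xmlText).Root.Value;
    }

    public static void Main()
    {
        string value = ReadValue("<Test>a&#x2029;b</Test>");

        // The parser has replaced the character reference with U+2029 itself.
        Console.WriteLine(value.Contains("\u2029"));   // True
        // The literal entity text no longer exists in the parsed value.
        Console.WriteLine(value.Contains("&#x2029;")); // False
    }
}
```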

Convert XPathDocument to string

I have an XPathDocument and would like to export it in a string that contains the document as XML representation. What is easiest way of doing so?

You can do the following to get a string representation of the XML document:
XPathDocument xdoc = new XPathDocument(@"C:\samples\sampleDocument.xml");
string xml = xdoc.CreateNavigator().OuterXml;
If you want your string to contain a full representation of the XML document including an XML declaration you can use the following code:
XPathDocument xdoc = new XPathDocument(@"C:\samples\sampleDocument.xml");
StringBuilder sb = new StringBuilder();
using (XmlWriter xmlWriter = XmlWriter.Create(sb))
{
    xdoc.CreateNavigator().WriteSubtree(xmlWriter);
}
string xml = sb.ToString();

An XPathDocument is a read-only representation of an XML document, so its internal representation never changes. To get the XML, you can simply go back to the original document.
Or use 0xA3's method, which walks the whole document and writes it out again. The output is not necessarily identical to the input, but it is structurally and functionally equivalent, because the in-memory XDM representation discards some details of the input.
