Xml Document, escape this character - c#

I have an XML document that has the paragraph separator character in some nodes, written as &#x2029;.
When I load the XML into an XmlDocument object, I no longer see this character reference; instead I see what looks like a space. How do I get it to show &#x2029;?
XmlDocument doc = new XmlDocument();
doc.Load(xmlFilePath);
XmlNodeList nodes = doc.SelectNodes("/catalog/classes");
foreach(XmlNode node in nodes) {
string category = node["category"].InnerText;
bool containsSeparator = category.Contains("&#x2029;"); // this should return true but it returns false. This category has a paragraph separator
}

Test #1:
var xmlText = @"<Test>&#x2029;</Test>";
var xml = XDocument.Parse(xmlText);
var result = xml.Element("Test").Value;
result will not be the text &#x2029;; it will be the decoded character U+2029 itself. So Contains("&#x2029;") will never be true.
Test #2:
var xmlText = @"<Test>&#x2029;</Test>";
var xml = XDocument.Parse(xmlText);
var result = Encoding.Unicode.GetBytes(xml.Element("Test").Value);
result will be two bytes, 0x29 and 0x20 (U+2029 in little-endian UTF-16), which is exactly what is read from the XML. So the character is there; you just don't see it because this Unicode character has no visible glyph.
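Building on the two tests above, here is a minimal sketch of checking for the decoded character rather than for the entity text. The class and method names are hypothetical, and the check uses the `\u2029` escape because a literal U+2029 in C# source is treated as a line terminator.

```csharp
using System;
using System.Xml.Linq;

public static class ParagraphSeparatorCheck
{
    // Hypothetical helper: true when the value contains U+2029 (PARAGRAPH SEPARATOR).
    public static bool ContainsParagraphSeparator(string value)
    {
        return value.IndexOf('\u2029') >= 0;
    }

    public static void Main()
    {
        // The parser decodes the character reference into the real character.
        var xml = XDocument.Parse("<Test>a&#x2029;b</Test>");
        string value = xml.Root.Value;

        Console.WriteLine(ContainsParagraphSeparator(value)); // True
        Console.WriteLine(value.Contains("&#x2029;"));        // False: the entity text no longer exists
    }
}
```

The same comparison should work on `XmlNode.InnerText` when the document is loaded through XmlDocument, since both APIs hand back decoded text.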

Related

How to get the xml contents without using the while loop

I have an xml file which contains two start tags and end tags. And I need the contents within these two tags separately. Please check the below content.
<testing>
<test>
<text>test1</text>
</test>
<test>
<text>test2</text>
</test>
</testing>
As of now I am using a while loop and finding the start index and end index of the tags and then getting the contents using the substring method. Please check the below code.
string xml = File.ReadAllText(@"C:\testing_doc.txt");
int startindex = xml.IndexOf("<test>");
while (startindex > 0)
{
int endIndex = xml.IndexOf("</test>", startindex);
int length = endIndex - startindex;
string textValue = xml.Substring(startindex, length);
startindex = xml.IndexOf("<test>", endIndex); // getting the start index for the second test tag
}
Is there any other way to get the contents without using the while loop? Because using while seems to be kind of expensive and if text file is corrupted then it will cause other problems.
Thanks in advance,
Anish
You can use XPath, which is designed for querying XML, as follows:
var xml = @"<testing>
<test>
<text>test1</text>
</test>
<test>
<text>test2</text>
</test>
</testing>
";
var testing = XElement.Parse(xml);
var tests = testing.XPathEvaluate("test/text/text()") as IEnumerable;
foreach (var test in tests)
{
Console.WriteLine(test); // test1, test2
}
You could use the XmlDocument class, which is based on the W3C DOM (Document Object Model), together with XPath:
XmlDocument doc = new XmlDocument();
doc.Load(@"C:\testing_doc.txt");
XmlNodeList values = doc.SelectNodes("testing/test/text"); //Using XPath
string str = string.Empty;
foreach (XmlNode x in values)
{
str += x.InnerText + ",";
}
str = str.TrimEnd(','); // strings are immutable, so keep the trimmed result
Console.WriteLine(str); //test1,test2
If you want to do it manually, a regex can help:
string xml = File.ReadAllText(@"C:\testing_doc.txt");
string pattern = "<test>(.*?)</test>";
// RegexOptions.Singleline lets '.' match the newlines inside each <test> block
foreach (Match match in Regex.Matches(xml, pattern, RegexOptions.Singleline))
{
System.Console.WriteLine(match.Groups[1].Value);
}
But consider using an XML parsing library such as XmlDocument or LINQ to XML instead.
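For the LINQ to XML option mentioned above, a minimal sketch might look like the following. The class and method names are hypothetical, and the XML is inlined instead of being read from the file in the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TestTextReader
{
    // Hypothetical helper: returns the value of every <text> element under each <test>.
    public static string[] ReadTexts(string xml)
    {
        return XElement.Parse(xml)
            .Elements("test")
            .Elements("text")
            .Select(e => e.Value)
            .ToArray();
    }

    public static void Main()
    {
        string xml = "<testing><test><text>test1</text></test><test><text>test2</text></test></testing>";
        Console.WriteLine(string.Join(",", ReadTexts(xml))); // test1,test2
    }
}
```

Because the parser throws an XmlException on malformed input, a corrupted file surfaces as an error instead of producing wrong substrings.<test><text>test2</text></test></testing>");
if (texts.Length != 2 || texts[0] != "test1" || texts[1] != "test2") throw new Exception("unexpected texts");
</test>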

Parse the hexadecimal string to XDocument

My code
string workflowIdRef = "WorkflowUser_NEW B&#xEC;nh Thu&#x1EAD;n Copy";
string requestFileXml = HttpContext.Current.Server.MapPath("~/Admin/TS/requestXml/JobCreateReq.xml");
XmlDocument xmld = new XmlDocument();
xmld.Load(requestFileXml);
string requestXml = xmld.OuterXml;
requestXml = requestXml.Replace("WORKFLOW_ID", workflowIdRef);
//Parse string requestXml to xDocument Temp
XDocument xTemp = XDocument.Parse(requestXml);
When I debug, I see the results below.
Text Visualizer mode (picture 1):
And XML visualizer (picture 2):
The XDocument xTemp has a result string like the one in picture 2.
How can I get xTemp to have a result string like the one in picture 1?
& is a special character in XML: it marks the beginning of a character reference such as &#xEC;. If you want it treated as plain text rather than as an encoded character, you need to escape & as &amp;, for example &amp;#xEC;.
Demo:
//notice that & is escaped to &amp; in the following string:
string workflowIdRef = "WorkflowUser_NEW B&amp;#xEC;nh Thu&amp;#x1EAD;n Copy";
string xmlContent = @"<root workflowIdRef=""WORKFLOW_ID""/>";
xmlContent = xmlContent.Replace("WORKFLOW_ID", workflowIdRef);
XDocument xTemp = XDocument.Parse(xmlContent);
output :

Escaping ONLY contents of Node in XML

I have a part of code mentioned like below.
//Reading from a file and assign to the variable named "s"
string s = "<item><name> Foo </name></item>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);
But it stops working if the content has characters such as "<" or ">".
string s = "<item><name> Foo > Bar </name></item>";
I know I have to escape those characters before loading, but if I do this:
doc.LoadXml(System.Security.SecurityElement.Escape(s));
the tags (<, >) are also escaped and, as a result, an error occurs.
How can I solve this problem?
A tricky solution:
string s = "<item><name> Foo > Bar </name></item>";
s = Regex.Replace(s, @"<[^>]+?>", m => HttpUtility.HtmlEncode(m.Value)).Replace("<","ojlovecd").Replace(">","cdloveoj");
s = HttpUtility.HtmlDecode(s).Replace("ojlovecd", "&lt;").Replace("cdloveoj", "&gt;");
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);
Assuming your content will never contain the characters "]]>", you can use CDATA.
string s = "<item><name><![CDATA[ Foo > Bar ]]></name></item>";
Otherwise, you'll need to HTML-encode your special characters, and decode them before you use or display them (unless it's in a browser).
string s = "<item><name> Foo &gt; Bar </name></item>";
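One way to escape only the content is to escape the text before it is wrapped in tags. The sketch below assumes you control how the string is assembled; the class and method names are hypothetical, while `SecurityElement.Escape` is the same call used in the question.

```csharp
using System;
using System.Security;
using System.Xml;

public static class ContentEscaper
{
    // Hypothetical helper: escapes the text only, then adds the markup around it.
    public static string BuildItemXml(string name)
    {
        return "<item><name>" + SecurityElement.Escape(name) + "</name></item>";
    }

    public static void Main()
    {
        string s = BuildItemXml("Foo < Bar");
        Console.WriteLine(s); // <item><name>Foo &lt; Bar</name></item>

        var doc = new XmlDocument();
        doc.LoadXml(s); // loads without error because the tags were never escaped
        Console.WriteLine(doc.DocumentElement.InnerText); // Foo < Bar
    }
}
```

The parser decodes the entity again on load, so no separate decode step is needed when reading `InnerText`.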
Assign the content of string to the InnerXml property of node.
var node = doc.CreateElement("root");
node.InnerXml = s;
Take a look at - Different ways how to escape an XML string in C#
It looks like the strings you have generated are plain strings, not valid XML. You can either have the strings generated as valid XML or, if you know that the strings are always going to be the name, leave the XML <item> and <name> tags out of the data.
Then, when you create the XmlDocument, call CreateElement and assign your string before saving the results.
XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("item");
doc.AppendChild(root);
XmlElement name = doc.CreateElement("name");
name.InnerText = "the contents from your file";
root.AppendChild(name);

How to encapsulate text into tags?

Let's say we have a string variable like this:
string input = "First regular, <b>bold</b>,<i>italic</i>,<u>underline</u>,<b><i><u>bold+italic+underline</u></i></b>";
The string can contain some HTML tags.
The question is how I can wrap each untagged text part in a tag, to get something like this:
string output = "<plain>First regular, </plain><b>bold</b><plain>,</plain><i>italic</i><plain>,</plain><u>underline</u><plain>,</plain><b><i><u>bold+italic+underline</u></i></b>";
How do I do this in C#? With a regex? What should such a regular expression look like?
Maybe wrapping isn't a good start. What I need is to create an XML structure from:
string input = "First regular, <b>bold</b>,<i>italic</i>,<u>underline</u>,<b><i><u>bold+italic+underline</u></i></b>";
I need to create
XDocument xml = XDocument.Parse("<plain>First regular, </plain><b>bold</b><plain>,</plain><i>italic</i><plain>,</plain><u>underline</u><plain>,</plain><b><i><u>bold+italic+underline</u></i></b>");
This code is a bit of a hack, but it should get you on the right path:
string input = "First regular, <b>bold</b>,<i>italic</i>,<u>underline</u>,<b><i><u>bold+italic+underline</u></i></b>";
input = "<data>" + input + "</data>";
XmlDocument xml = new XmlDocument();
xml.InnerXml = input;
XmlNodeList nodes = xml.SelectNodes("//text()");
foreach (XmlNode node in nodes) {
if (node.ParentNode.Name != "b" && node.ParentNode.Name != "i" && node.ParentNode.Name != "u") {
node.InnerText = "^^^^^" + node.InnerText + "$$$$$";
}
}
input = xml.DocumentElement.InnerXml.Replace("^^^^^", "<plain>").Replace("$$$$$", "</plain>");
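The same idea can be sketched with LINQ to XML without placeholder strings, by replacing each top-level text node with a `<plain>` element. The class and method names are hypothetical, and the sketch assumes the input is well-formed once it is wrapped in a temporary `<data>` root.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PlainWrapper
{
    // Hypothetical helper: wraps every text node that sits directly under the root in <plain>.
    public static string WrapPlainText(string input)
    {
        // PreserveWhitespace keeps whitespace-only text between tags.
        XElement root = XElement.Parse("<data>" + input + "</data>", LoadOptions.PreserveWhitespace);

        // ToList() takes a snapshot so nodes can be replaced while iterating.
        foreach (XText text in root.Nodes().OfType<XText>().ToList())
        {
            text.ReplaceWith(new XElement("plain", text.Value));
        }

        // Serialize the children only, dropping the temporary <data> wrapper.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    public static void Main()
    {
        Console.WriteLine(WrapPlainText("First regular, <b>bold</b>,<i>italic</i>"));
        // <plain>First regular, </plain><b>bold</b><plain>,</plain><i>italic</i>
    }
}
```

Text nested inside `<b>`, `<i>`, or `<u>` is left alone because only the root's direct children are visited.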

Convert XPathDocument to string

I have an XPathDocument and would like to export it in a string that contains the document as XML representation. What is easiest way of doing so?
You can do the following to get a string representation of the XML document:
XPathDocument xdoc = new XPathDocument(@"C:\samples\sampleDocument.xml");
string xml = xdoc.CreateNavigator().OuterXml;
If you want your string to contain a full representation of the XML document including an XML declaration you can use the following code:
XPathDocument xdoc = new XPathDocument(@"C:\samples\sampleDocument.xml");
StringBuilder sb = new StringBuilder();
using (XmlWriter xmlWriter = XmlWriter.Create(sb))
{
xdoc.CreateNavigator().WriteSubtree(xmlWriter);
}
string xml = sb.ToString();
An XPathDocument is a read-only representation of an XML document, which means its internal representation will not change. To get the XML, you can simply go back to the original document.
Or use 0xA3's method, which walks the whole document and writes it out again. The output is not necessarily identical to the input, though it is structurally and functionally equal, because some of the input is discarded in the XDM in-memory representation.
