How to prevent XElement from decoding character entity references - c#

I have an XML string that contains an apostrophe. I replace the apostrophe with its ASCII equivalent, &#39;, and parse the revised string into an XElement. The XElement, however, is turning the &#39; back into an apostrophe.
How do I force XElement.Parse to preserve the encoded string?
string originalXML = @"<Description><data>Mark's Data</data></Description>"; //for illustration purposes only
string encodedApostrophe = originalXML.Replace("'", "&#39;");
XElement xe = XElement.Parse(encodedApostrophe);

This is correct behavior. In places where &#39; is allowed, it works the same as &apos;, &#x27; or '. If you want to include the literal string &#39; in the XML, you should encode the &:
originalXML.Replace("'", "&amp;#39;")
Or parse the original XML and modify that:
XElement xe = XElement.Parse(originalXML);
var data = xe.Element("data");
data.Value = data.Value.Replace("'", "&#39;");
But doing this seems really weird. Maybe there is a better solution to the problem you're trying to solve.
Also, this encoding is not an “ASCII equivalent”; these are called character entity references, and the numeric ones are based on the Unicode code point of the character.
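To see the behavior this answer describes, here is a minimal C# sketch using LINQ to XML (System.Xml.Linq); the EntityDemo class and its method names are hypothetical. It shows that parsing decodes &#39; to an apostrophe, while text assigned through the API is escaped on output, so the literal characters &#39; come back as &amp;#39; in the markup.

```csharp
using System.Xml.Linq;

public static class EntityDemo
{
    // Parsing decodes character references: "Mark&#39;s" becomes "Mark's" in Value.
    public static string ParsedValue(string xml) => XElement.Parse(xml).Value;

    // Text passed to the constructor is stored as-is; when the element is
    // serialized, the '&' is escaped, so "&#39;" is written as "&amp;#39;".
    public static string SerializeLiteral(string name, string text) =>
        new XElement(name, text).ToString();
}
```

Whatever form the apostrophe takes in the markup, Value only ever sees the decoded character; to keep the five characters &#39; you have to store them as text, which is what the second method does.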

Related

Encode specific part of string and load it to xml

When I load an XML string into an XmlDocument object, it throws:
'<', hexadecimal value 0x3C, is an invalid attribute character.
string xml = Request.Form["webformfield"] + string.Empty;
// it reads the input from the web form in encoded format
e.g.:
<Models><Model ModelID="F2434" ModelName="FTest 1 & Income MP" />
What I tried:
// decode the whole string
StringWriter sw = new StringWriter();
Server.HtmlDecode(xml, sw); // built-in ASP.NET method (HttpServerUtility.HtmlDecode)
xml = sw.ToString();
After decoding, the string looks like this:
//string xml = "<Models><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"Test1 <> & \"' characters \"/><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"Test2 <> & \"' characters \"/></Models>";
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml);
Console.WriteLine(xmlDocument.OuterXml);
When I changed the XML string manually at runtime, it worked: I edited the value of the ModelNameWithSpecialCHars attribute in the string.
I originally posted the changed string as an image, because when I typed the encoded special characters here they were displayed in decoded form. Here is the code:
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml);
Console.WriteLine(xmlDocument.OuterXml);
Console.ReadLine();
Is there any way to encode only a specific part of the string? For example:
string xml = "<Models><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"Tes <> & \"' characters \"/></Models>";
In the string above, I need to encode only the value of the ModelNameWithSpecialCHars attribute (Tes <> & "' characters).
Sure: just put the part that needs encoding in a separate variable and use the WebUtility.HtmlEncode method (in System.Net):
string bad_xml = "Tes <> & \"' characters ";
bad_xml = WebUtility.HtmlEncode(bad_xml);
string xml = "<Models><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"" + bad_xml + "\"/></Models>";
Although you might find it cleaner to add the elements and attributes with the XmlDocument class, since you want an XmlDocument in the end anyway. It will encode text into valid XML for you, and XML has slightly different escaping requirements than HTML (although for your test string they are equivalent).
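For the XmlDocument route, a minimal sketch might look like this; the ModelXml class and Build method are hypothetical names, and the element and attribute names are taken from the question. SetAttribute takes the raw, unescaped value, and XmlDocument escapes it (&lt;, &amp;, &quot; and so on) when the document is written out.

```csharp
using System.Xml;

public static class ModelXml
{
    // Builds <Models><Model ModelId=".." ModelNameWithSpecialCHars=".."/></Models>.
    // Attribute values are passed unescaped; XmlDocument escapes them on output.
    public static XmlDocument Build(string modelId, string modelName)
    {
        var doc = new XmlDocument();
        XmlElement models = doc.CreateElement("Models");
        XmlElement model = doc.CreateElement("Model");
        model.SetAttribute("ModelId", modelId);
        model.SetAttribute("ModelNameWithSpecialCHars", modelName);
        models.AppendChild(model);
        doc.AppendChild(models);
        return doc;
    }
}
```

Because the escaping happens at serialization time, no markup is concatenated by hand, and doc.OuterXml can be loaded back with LoadXml without errors.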

How to resolve special character exception issue in UTF8 byte code conversion in C#?

I am converting a string into UTF-8 bytes, but special characters are not being accepted or converted. Please help me convert these special characters as well in C#.
byte[] bytes = Encoding.UTF8.GetBytes("<Shipper>A & G VENLO BV</Shipper>");
Do not mislead people. Encoding.UTF8.GetBytes handles & without any problem; it is your XML parsing code that throws a System.Xml.XmlException.
The fact is that the string <Shipper>A & G VENLO BV</Shipper> is not well-formed XML: the & character must be escaped in XML.
You have to create the XML the right way:
XmlDocument xmlDoc = new XmlDocument();
XmlElement shipper = xmlDoc.CreateElement("Shipper");
shipper.InnerText = "A & G VENLO BV";
xmlDoc.AppendChild(shipper);
As a result, you will get well-formed XML:
<Shipper>A &amp; G VENLO BV</Shipper>
Now you can work with it:
byte[] bytes = Encoding.UTF8.GetBytes(shipper.OuterXml);
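The same idea with LINQ to XML, as a sketch (ShipperXml and its methods are hypothetical names): the text is escaped when the element is serialized, and encoding the resulting string as UTF-8 cannot fail on &, because to Encoding.UTF8 it is just an ordinary character.

```csharp
using System.Text;
using System.Xml.Linq;

public static class ShipperXml
{
    // Serializes the text inside <Shipper>; XElement escapes '&' as "&amp;".
    public static string ToXml(string shipperName) =>
        new XElement("Shipper", shipperName).ToString();

    // UTF-8 bytes of the well-formed markup, ready to send or save.
    public static byte[] ToUtf8(string shipperName) =>
        Encoding.UTF8.GetBytes(ToXml(shipperName));

    // Decodes the bytes and parses them back; Value holds the original, unescaped text.
    public static string RoundTrip(byte[] bytes) =>
        XElement.Parse(Encoding.UTF8.GetString(bytes)).Value;
}
```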

How to unescape special characters in c#

I have the following code:
XElement element = new XElement("test", "a&b");
where
element.LastNode contains the value "a&amp;b".
I want it to be "a&b".
How do I replace this?
Wait a moment,
<test>a&b</test>
is not valid XML. You cannot make XML that looks like this. This is clarified by the XML standard.
& has a special meaning: it introduces an escaped character that might otherwise be invalid. An '&' character is encoded as &amp; in XML.
For what it's worth, this is invalid HTML for the same reason.
<!DOCTYPE html> <html> <body> a&b </body> </html>
If I write this code:
const string Value = "a&b";
var element = new XElement("test", Value);
Debug.Assert(
string.CompareOrdinal(Value, element.Value) == 0,
"XElement is mad");
it runs without error: XElement encodes and decodes to and from XML as necessary.
To unescape (decode) the XML element, you simply read XElement.Value.
If you want to make a document that looks like
<test>a&b</test>
you can, but it is not XML or HTML, and tools for working with HTML or XML won't intentionally help you. You'll have to write your own readers, writers and parsers.
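A short sketch of the point made in this answer, assuming LINQ to XML; AmpersandDemo and its members are hypothetical names. The element stores a&b unchanged and escapes it only when written as markup; the text node (LastNode, an XText) reports the decoded text through its Value property.

```csharp
using System.Xml.Linq;

public static class AmpersandDemo
{
    // Creates <test> whose single text node holds the given string.
    public static XElement Create(string text) => new XElement("test", text);

    // Reads the decoded text of the last node. XText.ToString() instead returns
    // the node as XML markup, in which '&' is escaped.
    public static string LastText(XElement element) => ((XText)element.LastNode).Value;
}
```") throw new System.Exception("markup should be escaped");
</test>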
The & is a reserved character, so it will always be encoded. So you have to decode it.
Is this an option?
HttpUtility.HtmlDecode Method (String)
Usage:
string decoded = HttpUtility.HtmlDecode("a&amp;b");
// returns "a&b"
Try the following:
// requires: using System.Text.RegularExpressions;
public static string GetTextFromHTML(string htmlstring)
{
    // replace all tags with spaces...
    htmlstring = Regex.Replace(htmlstring, @"<(.|\n)*?>", " ");
    // ... then eliminate all double spaces
    while (htmlstring.Contains("  "))
    {
        htmlstring = htmlstring.Replace("  ", " ");
    }
    // replace the non-breaking space and ampersand entities
    htmlstring = htmlstring.Replace("&nbsp;", " ");
    htmlstring = htmlstring.Replace("&amp;", "&");
    return htmlstring;
}
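If the goal is plain text from HTML, a variant of the method above can decode every entity at once instead of replacing &nbsp; and &amp; one by one. This sketch swaps in System.Net.WebUtility.HtmlDecode, which handles named and numeric references; HtmlText is a hypothetical class name, and the regex-based tag stripping is still naive, not a real HTML parser.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    public static string GetTextFromHtml(string html)
    {
        // Replace every tag with a space (naive: assumes no '>' inside attribute values).
        string text = Regex.Replace(html, @"<[^>]*>", " ");
        // Decode all entities (&amp;, &nbsp;, &ccedil;, &#39;, ...) after tags are gone,
        // so decoded "&lt;b&gt;" stays as literal text.
        text = WebUtility.HtmlDecode(text);
        // &nbsp; decodes to U+00A0; turn it into an ordinary space.
        text = text.Replace('\u00A0', ' ');
        // Collapse runs of whitespace into a single space and trim the ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```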

XElement.Load Error reading ampersand symbols and special country characters

I'm having problems reading the ampersand symbol from an XML file:
XElement xmlElements = XElement.Load(Path_Xml_Data_File);
I get an error when I have:
<Name>Patrick & Phill</Name>
Error: System.Xml.XmlException: Name cannot begin with the ' ' character, hexadecimal value 0x20.
Or with special Portuguese characters:
<Extra>Direc&ccedil;&atilde;o Assistida</Extra> (= <Extra>Direcção Assistida</Extra>)
Error: Reference to undeclared entity 'ccedil'
Any idea how to solve this problem?
I'm afraid that you're dealing with malformed XML.
To represent the ampersand, the data that you're loading should use the "&amp;" entity.
The &ccedil; (ç) and &atilde; (ã) named entities are not part of the XML standard; they are more commonly found in HTML (although they can be added to XML through a DTD).
You could use HtmlTidy to tidy up the data first, or you could write something to convert the bare ampersands into entities in the incoming files.
For example:
public string CleanUpData(string data)
{
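// needs using System.Text.RegularExpressions; matches '&' followed by whitespace
var r = new Regex(@"&\s");
string output = r.Replace(data, "&amp; ");
output = output.Replace("&ccedil;", "&#231;");
output = output.Replace("&atilde;", "&#227;");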
return output;
}
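A somewhat more general version of CleanUpData, as a sketch: it escapes any & that does not already start a character or entity reference (so it also catches & followed directly by a letter or digit, not only & followed by whitespace) and maps a few HTML named entities to numeric references that XML understands. XmlInputCleaner is a hypothetical name, and the dictionary holds only the entities from the question plus &nbsp; as an illustration; you would extend it for your data.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class XmlInputCleaner
{
    // Illustrative subset of HTML named entities that XML does not predefine.
    private static readonly Dictionary<string, string> HtmlEntities = new Dictionary<string, string>
    {
        ["ccedil"] = "&#231;",
        ["atilde"] = "&#227;",
        ["nbsp"] = "&#160;",
    };

    // An '&' that does not begin &#123; / &#x7B; / &name; is treated as bare.
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!#[0-9]+;|#x[0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)");

    // A named entity reference such as &ccedil;
    private static readonly Regex NamedEntity = new Regex(@"&([A-Za-z][A-Za-z0-9]*);");

    public static string CleanUp(string data)
    {
        // Escape bare ampersands first ("Patrick & Phill" -> "Patrick &amp; Phill").
        string output = BareAmpersand.Replace(data, "&amp;");
        // Swap known HTML entities for numeric references; leave others (&amp;, &lt;, ...) alone.
        return NamedEntity.Replace(output, m =>
            HtmlEntities.TryGetValue(m.Groups[1].Value, out string numeric) ? numeric : m.Value);
    }
}
```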

c# parsing xml with an apostrophe throws exception

I am parsing an XML file and am running into an issue when trying to find a node that has an apostrophe in it. When the item name does not contain one, everything works fine. I have tried replacing the apostrophe with different escape characters but am not having much luck:
string s = "/itemDB/item[@name='" + itemName + "']";
// Things i have tried that did not work
// s.Replace("'", "''");
// .Replace("'", "\'");
XmlNode parent = root.SelectSingleNode(s);
I always receive an XPathException. What is the proper way to do this? Thanks.
For an apostrophe, replace it with &apos;
You can do it Like this:
XmlDocument root = new XmlDocument();
root.LoadXml(@"<itemDB><item name=""abc'def""/></itemDB>");
XmlNode node = root.SelectSingleNode(@"itemDB/item[@name=""abc'def""]");
Note the verbatim string literal prefix '@' and the doubled double quotes.
Your code would then look like this, and there is no need to replace anything:
var itemName = @"abc'def";
string s = @"/itemDB/item[@name=""" + itemName + @"""]";
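The doubled-quote approach works as long as the name never contains a double quote. XPath 1.0 string literals have no escape mechanism, so a common workaround for values that may contain both quote characters is to build the literal with concat(). A minimal sketch (XPathLiteral and Quote are hypothetical names):

```csharp
using System.Linq;

public static class XPathLiteral
{
    // Returns an XPath 1.0 expression that evaluates to exactly `value`.
    public static string Quote(string value)
    {
        // No apostrophe: a single-quoted literal is enough.
        if (!value.Contains("'")) return "'" + value + "'";
        // Apostrophe but no double quote: use a double-quoted literal.
        if (!value.Contains("\"")) return "\"" + value + "\"";
        // Both kinds: split on the apostrophe and rejoin the pieces with "'" via concat().
        var parts = value.Split('\'').Select(p => "'" + p + "'");
        return "concat(" + string.Join(", \"'\", ", parts) + ")";
    }
}
```

Usage would then be root.SelectSingleNode("/itemDB/item[@name=" + XPathLiteral.Quote(itemName) + "]"), with no manual escaping of itemName.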
