When I load an XML string into an XmlDocument object, it throws:
'<', hexadecimal value 0x3C, is an invalid attribute character.
string xml= Request.Form["webformfield"] + string.Empty;
// reads the input from the web form in HTML-encoded format
For example:
&lt;Models&gt;&lt;Model ModelID=&quot;F2434&quot; ModelName=&quot;FTest 1 &amp; Income MP&quot; /&gt;
What I tried:
// decode the whole string
StringWriter sw = new StringWriter();
Server.HtmlDecode(models, sw); // this is an internal method of the framework.
models = sw.ToString();
After decoding, the string looks like this:
//string xml = "<Models><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"Test1 <> & \"' characters \"/><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"Test2 <> & \"' characters \"/></Models>";
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml);
Console.WriteLine(xmlDocument.OuterXml);
I changed the XML string manually at runtime and it worked: I replaced the value of the ModelNameWithSpecialCHars attribute with its encoded form.
I added the string as an image, because when I typed the encoded special characters they were displayed in decoded form. The code is below.
Changed string:
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml);
Console.WriteLine(xmlDocument.OuterXml);
Console.ReadLine();
Is there any way I can encode only a specific part of the string? For example:
string xml = "<Models><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"Tes <> & \"' characters \"/></Models>";
In the string above, I need to encode only the value of the ModelNameWithSpecialCHars attribute ("Tes <> & "').
Sure - just put the part that needs encoding in a separate variable and use the WebUtility.HtmlEncode method (found in System.Net):
string bad_xml = "Tes <> & \"' characters ";
bad_xml = WebUtility.HtmlEncode(bad_xml);
string xml = "<Models><Model ModelId=\"124\" ModelNameWithSpecialCHars=\"" + bad_xml + "\"/></Models>";
You might find it cleaner, though, to add elements and attributes using the XmlDocument class, since you want an XmlDocument in the end anyway. It encodes text into valid XML for you; XML has slightly different escaping requirements from HTML (although for your test string they are equivalent).
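A minimal sketch of the XmlDocument approach mentioned above, assuming the element and attribute names from the question. The class name ModelXmlBuilder and the method Build are hypothetical and only serve the illustration.

```csharp
using System;
using System.Xml;

public static class ModelXmlBuilder
{
    // Hypothetical helper: builds <Models><Model .../></Models>.
    // Element and attribute names mirror the question.
    public static XmlDocument Build(string modelId, string modelName)
    {
        var doc = new XmlDocument();
        XmlElement models = doc.CreateElement("Models");
        doc.AppendChild(models);

        XmlElement model = doc.CreateElement("Model");
        model.SetAttribute("ModelId", modelId);
        // SetAttribute takes the raw, unescaped value;
        // escaping is applied when the document is serialized.
        model.SetAttribute("ModelNameWithSpecialCHars", modelName);
        models.AppendChild(model);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = Build("124", "Tes <> & \"' characters ");
        Console.WriteLine(doc.OuterXml);

        // Round trip: the serialized text loads back without an exception
        // and the attribute value comes back unescaped.
        var reloaded = new XmlDocument();
        reloaded.LoadXml(doc.OuterXml);
        var first = (XmlElement)reloaded.DocumentElement.FirstChild;
        Console.WriteLine(first.GetAttribute("ModelNameWithSpecialCHars"));
    }
}
```

With this approach no manual encoding step is needed, because the raw value never gets concatenated into markup.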
Related
I am trying to convert a file to XML format. The file contains some special characters, and because of them the conversion fails.
I already have this regex code, but it is still not working for me. Please help.
The code I have tried:
string filedata = @"D:\readwrite\test11.txt";
string input = ReadForFile(filedata);
string re1 = @"[^\u0000-\u007F]+";
string re5 = @"\p{Cs}";
data = Regex.Replace(input, re1, "");
data = Regex.Replace(input, re5, "");
XmlDocument xmlDocument = new XmlDocument();
try
{
xmlDocument = (XmlDocument)JsonConvert.DeserializeXmlNode(data);
var Xdoc = XDocument.Parse(xmlDocument.OuterXml);
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
0x04 is a transmission control character (EOT) and is not allowed in XML 1.0 text. XmlDocument is right to reject it if it really does appear in your data. This does suggest that the regex you have doesn't do what you think it does: [^\u0000-\u007F]+ removes only characters outside the ASCII range, and 0x04 lies inside that range, so it is never matched. In addition, the second Regex.Replace call starts again from input, so the result of the first call is thrown away. The real question for me is why this non-text 'character' appears in data intended as XML in the first place.
I have other questions. I've never seen JsonConvert.DeserializeXmlNode before - I had to look up what it does. Why are you using a JSON function against the root of a document which presumably therefore contains no JSON? Why are you then taking that document, converting it back to a string, and then creating an XDocument from it? Why not just create an XDocument to start with?
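If the stray control characters really cannot be removed at the source, one option is to filter the text before parsing and to create the XDocument directly. The sketch below assumes the input is already XML text; XmlSanitizer and RemoveInvalidXmlChars are hypothetical names, while XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair are the framework methods used for the check.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlSanitizer
{
    // Hypothetical helper: drops every character that XML 1.0 does not allow.
    public static string RemoveInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // Keep a valid high/low surrogate pair together.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (for example 0x04) is skipped.
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string input = "<root>A\u0004B</root>";
        string cleaned = RemoveInvalidXmlChars(input);
        XDocument xdoc = XDocument.Parse(cleaned);
        Console.WriteLine(xdoc.Root.Value); // prints AB
    }
}
```

Silently dropping characters can hide a data problem upstream, so this is a workaround rather than a fix for whatever produces the control characters.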
I am converting a string into UTF-8 bytes, but it does not accept special characters and does not convert them. Please help me convert these special characters as well in C#.
byte[] bytes = Encoding.UTF8.GetBytes("<Shipper>A & G VENLO BV</Shipper>");
Do not lead people astray. Your code throws a System.Xml.XmlException while parsing the XML.
The fact is that the string <Shipper>A & G VENLO BV</Shipper> is not well-formed XML. The & symbol in XML must be escaped as &amp;.
You have to create the XML using the right approach:
XmlDocument xmlDoc = new XmlDocument();
XmlElement shipper = xmlDoc.CreateElement("Shipper");
shipper.InnerText = "A & G VENLO BV";
xmlDoc.AppendChild(shipper);
As a result, you will get well-formed XML:
<Shipper>A &amp; G VENLO BV</Shipper>
Now you can work with it:
byte[] bytes = Encoding.UTF8.GetBytes(shipper.OuterXml);
I get an XmlElement from a web service. Something unexpected happens: xmlElement.OwnerDocument.ChildNodes is empty. How is that possible?
This is the XML:
<tns1:VideoSource xmlns:tns1="http://www.onvif.org/ver10/topics">
<MotionAlarm wstop:topic="true" xmlns:wstop="http://docs.oasis-open.org/wsn/t-1" xmlns="http://www.onvif.org/ver10/events/wsdl">
</MotionAlarm>
</tns1:VideoSource>
I tested your XML with the code below, and the document does have child nodes, as you would expect. I suspect some whitespace or other invisible characters are causing the problem. If you got the data from a website (probably through a stream), there may be invisible null characters at the end of the stream. Make sure your stream class is using UTF-8 encoding. The default encoding in some streams is ASCII, which can change characters and add padding characters, and that may create issues.
string input =
"<tns1:VideoSource xmlns:tns1=\"http://www.onvif.org/ver10/topics\">" +
"<MotionAlarm wstop:topic=\"true\" xmlns:wstop=\"http://docs.oasis-open.org/wsn/t-1\" xmlns=\"http://www.onvif.org/ver10/events/wsdl\">" +
"</MotionAlarm>" +
"</tns1:VideoSource>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(input);
XmlNodeList videoSource = doc.ChildNodes;
XmlNodeList motionAlarm = videoSource[0].ChildNodes;
I have an XML string that contains an apostrophe. I replace the apostrophe with its ASCII equivalent (&#39;) and parse the revised string into an XElement. The XElement, however, is turning the &#39; back into an apostrophe.
How do I force XElement.Parse to preserve the encoded string?
string originalXML = @"<Description><data>Mark's Data</data></Description>"; //for illustration purposes only
string encodedApostrophe = originalXML.Replace("'", "&#39;");
XElement xe = XElement.Parse(encodedApostrophe);
This is correct behavior. In places where ' is allowed, it works the same as &apos;, &#39; or &#x27;. If you want to include the literal string &#39; in the XML, you should encode the &:
originalXML.Replace("'", "&amp;#39;")
Or parse the original XML and modify that:
XElement xe = XElement.Parse(originalXML);
var data = xe.Element("data");
data.Value = data.Value.Replace("'", "&#39;");
But doing this seems really weird. Maybe there is a better solution to the problem you're trying to solve.
Also, this encoding is not “ASCII equivalent”; these are called character entity references. The numeric ones are based on the Unicode code point of the character.
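A short sketch of the behavior described in this answer, assuming the element names from the question. ApostropheDemo and ParseDataValue are hypothetical names used only for the illustration.

```csharp
using System;
using System.Xml.Linq;

public static class ApostropheDemo
{
    // Hypothetical helper: wraps the given markup in the question's
    // <Description><data> elements and returns the parsed text value.
    public static string ParseDataValue(string dataContent)
    {
        XElement xe = XElement.Parse(
            "<Description><data>" + dataContent + "</data></Description>");
        return xe.Element("data").Value;
    }

    public static void Main()
    {
        // A literal apostrophe and its three references parse to the same text.
        foreach (string form in new[] { "Mark's", "Mark&apos;s", "Mark&#39;s", "Mark&#x27;s" })
        {
            Console.WriteLine(ParseDataValue(form)); // each line prints Mark's
        }

        // Escaping the ampersand keeps the characters &#39; as literal text.
        Console.WriteLine(ParseDataValue("Mark&amp;#39;s")); // prints Mark&#39;s
    }
}
```

The parser resolves references while reading, so the in-memory value never remembers which spelling the source text used.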
I have the following piece of code.
//Reading from a file and assign to the variable named "s"
string s = "<item><name> Foo </name></item>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);
But it stops working if the content contains characters such as "<" or ">".
string s = "<item><name> Foo > Bar </name></item>";
I know I have to escape those characters before loading, but if I do this:
doc.LoadXml(System.Security.SecurityElement.Escape(s));
the tags (< and >) are escaped as well, and as a result an error occurs.
How can I solve this problem?
A tricky solution:
string s = "<item><name> Foo > Bar </name></item>";
s = Regex.Replace(s, @"<[^>]+?>", m => HttpUtility.HtmlEncode(m.Value)).Replace("<","ojlovecd").Replace(">","cdloveoj");
s = HttpUtility.HtmlDecode(s).Replace("ojlovecd", ">").Replace("cdloveoj", "<");
XmlDocument doc = new XmlDocument();
doc.LoadXml(s);
Assuming your content will never contain the characters "]]>", you can use CDATA.
string s = "<item><name><![CDATA[ Foo > Bar ]]></name></item>";
Otherwise, you'll need to HTML-encode your special characters, and decode them before you use or display them (unless it's in a browser).
string s = "<item><name> Foo &gt; Bar </name></item>";
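A minimal sketch of escaping only the content rather than the whole string, assuming the raw name is available separately from the surrounding tags. ItemXml and Load are hypothetical names; SecurityElement.Escape is the method already mentioned in the question.

```csharp
using System;
using System.Security;
using System.Xml;

public static class ItemXml
{
    // Hypothetical helper: escapes only the text content, then adds the tags.
    public static XmlDocument Load(string rawName)
    {
        string xml = "<item><name>" + SecurityElement.Escape(rawName) + "</name></item>";
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = Load(" Foo < Bar & Baz ");
        // The markup holds the escaped form of the name.
        Console.WriteLine(doc.OuterXml);
        // InnerText returns the decoded text, so no manual decode step is needed here.
        Console.WriteLine(doc.SelectSingleNode("/item/name").InnerText);
    }
}
```

This only works when the content and the markup can still be told apart; once they have been concatenated unescaped, there is no reliable way to separate them again.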
Assign the content of the string to the InnerXml property of a node.
var node = doc.CreateElement("root");
node.InnerXml = s;
Take a look at - Different ways how to escape an XML string in C#
It looks like the strings that you have generated are plain strings, not valid XML. You can either get the strings generated as valid XML or, if you know that the strings are always going to be the name, leave the XML <item> and <name> tags out of the data.
Then, when you create the XmlDocument, call CreateElement and assign your string before saving the results.
XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("item");
doc.AppendChild(root);
XmlElement name = doc.CreateElement("name");
name.InnerText = "the contents from your file";
root.AppendChild(name);