XDocument Parse Ignore Chinese Characters - c#

I have an XML string that contains some Chinese characters like �菅࿼Ჽ탽᫴. When I parse it with XDocument.Parse, it throws the following exception:
System.Xml.XmlException: '', hexadecimal value 0x01, is an invalid character
I tried converting the XML string to UTF-8, but I still get the same error.
Any ideas?
Update:
The XML contains many elements, but when I use the answer below, it ignores all the other elements and only converts the elements that contain special characters. Is there anything that can be done with XDocument instead of XElement?

Using an XmlReader with XmlReaderSettings.CheckCharacters set to false should solve your issue.
UPDATE
Here is what I used to load my Japanese XML file:
using System.IO;
using System.Xml;
using System.Xml.Linq;
string xmlText = "your xml data";
XElement node;
XmlReaderSettings xrs = new XmlReaderSettings();
xrs.CheckCharacters = false;
using (XmlReader rd = XmlReader.Create(new StringReader(xmlText), xrs))
{
node = XElement.Load(rd);
}
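For the update about XDocument: XDocument.Load also has an overload that accepts an XmlReader, so the same reader settings can be used to load the whole document instead of a single XElement. Below is a minimal sketch under two assumptions: the sample string is made up, and the problem characters arrive as character references such as &#x1;. Whether a raw, unescaped 0x01 in the text is accepted as well is not something this sketch checks.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical input: several elements, one of which contains the character reference &#x1;.
string xmlText = "<root><a>plain</a><b>bad&#x1;char</b><c>漢字</c></root>";

// CheckCharacters = false stops the reader from rejecting character references
// that point at characters outside the XML 1.0 character range.
var settings = new XmlReaderSettings { CheckCharacters = false };

XDocument doc;
using (XmlReader reader = XmlReader.Create(new StringReader(xmlText), settings))
{
    // XDocument.Load(XmlReader) reads the whole document, not just one element.
    doc = XDocument.Load(reader);
}

Console.WriteLine(doc.Root.Elements().Count()); // 3
```

All three child elements survive, and the value of <b> contains the character U+0001.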

Related

Invalid Unicode Character in XML string [duplicate]

The following code:
using System.IO;
using System.Xml.Serialization;
var c = (char) 1;
var serializer = new XmlSerializer(typeof (string));
var writer = new StringWriter();
serializer.Serialize(writer, c.ToString());
var serialized = writer.ToString();
var dc = serializer.Deserialize(new StringReader(serialized));
Throws this exception in .NET 4:
InvalidOperationException: There is an error in XML document (2, 12). '', hexadecimal value 0x01, is an invalid character. Line 2, position 12.
Am I doing something wrong, or is there a reasonable workaround?
Many thanks!
There is a workaround: you can use the XmlReaderSettings.CheckCharacters option to skip character validation:
XmlReader xr = XmlReader.Create(new StringReader(serialized),
new XmlReaderSettings { CheckCharacters = false });
var dc = (string)serializer.Deserialize(xr);
You're trying to serialize characters that can't be represented in XML, and unfortunately they break XML serialization. I don't know of any workaround other than writing your own escaping code.
On the other hand, actual uses for such characters (ASCII characters below space, other than tab, carriage return and line feed, IIRC) are relatively rare, so you may find that simply stripping them is fine. The alternatives are to come up with your own escaping, or to encode the whole string as binary and Base64 the result. Escaping will take a good deal less space than the re-encoding approach :)
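If stripping is acceptable, a minimal sketch of that approach filters the string against the XML 1.0 Char production before parsing. The helper name StripInvalidXmlChars is hypothetical, and besides the control characters mentioned above it also drops unpaired surrogates and U+FFFE/U+FFFF, which are likewise not legal XML characters.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper. Keeps only characters allowed by the XML 1.0 "Char" production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
static string StripInvalidXmlChars(string text)
{
    var sb = new StringBuilder(text.Length);
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
        {
            // A valid surrogate pair encodes a code point in [#x10000-#x10FFFF]; keep both halves.
            sb.Append(c).Append(text[i + 1]);
            i++;
        }
        else if (c == '\t' || c == '\n' || c == '\r' ||
                 (c >= '\u0020' && c <= '\uD7FF') ||
                 (c >= '\uE000' && c <= '\uFFFD'))
        {
            sb.Append(c);
        }
        // Everything else (C0 control characters, lone surrogates, U+FFFE, U+FFFF) is dropped.
    }
    return sb.ToString();
}

string raw = "<root><value>bad\u0001char</value></root>";
XDocument doc = XDocument.Parse(StripInvalidXmlChars(raw));
Console.WriteLine(doc.Root.Element("value").Value); // badchar
```

Stripping silently loses data, so it only fits when those characters carry no meaning; the escaping or Base64 routes keep them.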

Keep special characters in XML

I have a requirement to read an XML file that may contain special characters, and I need to keep those special characters "as-is". However, after calling XDocument.Load(), &apos; is turned into ' and &amp; into &.
Here is what the XML file may look like:
<root>
<child>This is a text with special character such as &apos; and &amp;</child>
</root>
XDocument xDocument = null;
xDocument = XDocument.Load("myFile.xml", LoadOptions.SetBaseUri | LoadOptions.SetLineInfo | LoadOptions.PreserveWhitespace);
I've tried with encoding, but with no success. For example:
using (StreamReader oReader = new StreamReader("myFile.xml", Encoding.GetEncoding("utf-8")))
{
xDocument = XDocument.Load(oReader);
}
or
xDocument = XDocument.Parse(File.ReadAllText("myFile.xml", Encoding.UTF8));
Is there anything else that I can try?
Thanks.
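Some context on what is happening: &apos; and &amp; are entity references, and a conforming XML parser replaces them with the characters they stand for when it exposes text. The information is not lost, though, because the writer escapes characters that need it (such as &) again on output. A small sketch illustrating this, with an inline string standing in for myFile.xml:

```csharp
using System;
using System.Xml.Linq;

// Stand-in for the contents of myFile.xml.
string xml = "<root><child>This is a text with special character such as &apos; and &amp;</child></root>";

XDocument xDocument = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);
XElement child = xDocument.Root.Element("child");

// The parser has already expanded the entity references in the text value.
Console.WriteLine(child.Value);
// -> This is a text with special character such as ' and &

// Serializing again re-escapes '&' as "&amp;", so the output is still well-formed XML.
Console.WriteLine(xDocument.ToString(SaveOptions.DisableFormatting));
```

So "keeping the characters as-is" usually means keeping them in the serialized output rather than in Value; this sketch does not attempt to preserve the exact original spelling of each entity.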

XmlNode.OwnerDocument.ChildNodes is empty

I get an XmlElement from a web service, and something unexpected happens: xmlElement.OwnerDocument.ChildNodes is empty. How is that possible?
This is the XML:
<tns1:VideoSource xmlns:tns1="http://www.onvif.org/ver10/topics">
<MotionAlarm wstop:topic="true" xmlns:wstop="http://docs.oasis-open.org/wsn/t-1" xmlns="http://www.onvif.org/ver10/events/wsdl">
</MotionAlarm>
</tns1:VideoSource>
I tested your XML with the code below, and the child nodes are there as expected. I suspect there may be some invisible characters causing an error. If you got the data from a website (probably as a stream), there may be invisible null characters at the end of the stream. Make sure your stream class is using UTF-8 encoding. The default encoding in some streams is ASCII, which can change characters and add padding characters that may cause issues.
string input =
"<tns1:VideoSource xmlns:tns1=\"http://www.onvif.org/ver10/topics\">" +
"<MotionAlarm wstop:topic=\"true\" xmlns:wstop=\"http://docs.oasis-open.org/wsn/t-1\" xmlns=\"http://www.onvif.org/ver10/events/wsdl\">" +
"</MotionAlarm>" +
"</tns1:VideoSource>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(input);
XmlNodeList videoSource = doc.ChildNodes;
XmlNodeList motionAlarm = videoSource[0].ChildNodes;
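If the data really does carry trailing NUL padding, as suspected above, one way to clean it up is to decode the bytes explicitly as UTF-8 and trim the NULs before parsing. A minimal sketch; the padded byte array is invented for illustration, and the XML is simplified (the wstop attribute is dropped).

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical payload: UTF-8 XML bytes followed by NUL padding, as might come off a stream.
byte[] payload = Encoding.UTF8.GetBytes(
    "<tns1:VideoSource xmlns:tns1=\"http://www.onvif.org/ver10/topics\">" +
    "<MotionAlarm xmlns=\"http://www.onvif.org/ver10/events/wsdl\"></MotionAlarm>" +
    "</tns1:VideoSource>\0\0\0");

// Decode explicitly as UTF-8 instead of relying on a stream's default encoding,
// then drop the trailing NUL characters, which are never legal in an XML document.
string input = Encoding.UTF8.GetString(payload).TrimEnd('\0');

XmlDocument doc = new XmlDocument();
doc.LoadXml(input);
Console.WriteLine(doc.ChildNodes.Count);                 // 1
Console.WriteLine(doc.DocumentElement.ChildNodes.Count); // 1
```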

Convert KOI8-R XML node into Unicode in C#

I have the following xml:
<root>
<text><![CDATA[ОПЕЛХМЮБЮ ОПЕГ БЗПРЪЫ ЯЕ АЮПЮАЮМ, Б ЙНИРН ЯЕ]]></text>
</root>
I know this text was generated using the KOI8-R encoding (it only displays correctly in my text editor when I select that encoding while opening the XML file as text), and I would like to convert the value of this node into a string usable in C#. I can read the node's InnerText value, but it's not what I'm expecting. Can someone show me the correct way to convert a string written in this encoding into a Unicode one?
Update
Following Jon Skeet's suggestions, the solution looks like this:
Encoding encoding = Encoding.GetEncoding("KOI8-R");
XmlDocument doc2 = new XmlDocument();
using (TextReader tr = new StreamReader(outputPath, encoding))
{
doc2.Load(tr);
}
How did you get that XML? It should have an XML declaration stating which encoding it uses; otherwise it isn't correct even in plain XML terms. You shouldn't have to worry about encodings after you've parsed the XML. So potentially something like:
Encoding encoding = Encoding.GetEncoding("KOI8-R");
XDocument doc;
using (var reader = new StreamReader("file.xml", encoding))
{
doc = XDocument.Load(reader);
}
... but as I say, the file itself should declare the encoding.
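A self-contained version of the StreamReader approach, as a sketch. It assumes .NET 5 or later, where KOI8-R is not available until CodePagesEncodingProvider is registered (on .NET Framework, KOI8-R is available without that call). The XML content is invented and encoded to KOI8-R in memory so that the example needs no file.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// On .NET 5+ only a handful of encodings are built in; registering the code-pages
// provider makes KOI8-R (code page 20866) available to Encoding.GetEncoding.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding encoding = Encoding.GetEncoding("KOI8-R");

// Hypothetical file content, with an XML declaration that names the encoding.
string xml = "<?xml version=\"1.0\" encoding=\"KOI8-R\"?><root><text><![CDATA[Привет]]></text></root>";
byte[] fileBytes = encoding.GetBytes(xml);

XDocument doc;
// Decode with the known encoding; once parsed, node values are ordinary .NET (UTF-16) strings.
using (var reader = new StreamReader(new MemoryStream(fileBytes), encoding))
{
    doc = XDocument.Load(reader);
}
string text = doc.Root.Element("text").Value; // "Привет"
```

No conversion is needed after parsing: the decoding happens once, in the StreamReader.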

XmlReader read document with unescaped &s

I am trying to parse an XML document that I received as a string from a web service call.
String content = ...;//long xml document
using(TextReader reader = new StringReader(content))
using(XmlReader xml_reader = XmlReader.Create(reader, settings))
{
XML = new XPathDocument(xml_reader);
}
However, I get an exception:
An error occurred while parsing EntityName. Line 1, position 1721.
I looked through the document around that character, and it was in the middle of a random tag. However, about 20-30 characters earlier I noticed some unescaped ampersands (& characters), so I'm thinking that is the problem.
Running:
content.Substring(1700, 100);//results in the following text
"alue>1 time per day& with^honey~&water\\\\</Value></Frequency></Direction> </Directions> "
^unescaped & char 1721 is the 'w'
How can I successfully read this document as XML?
Verify that your XML encoding matches theirs (declared at the top of the document, something like <?xml version="1.0" encoding="ISO-8859-9"?>). Substitute the value from the web service's XML document for webserviceEncoding below:
using(XmlReader r = XmlReader.Create(new StreamReader(fileName, Encoding.GetEncoding(webserviceEncoding)))) {
XML = new XPathDocument( r );
// ...
}
If that doesn't work, either replace the unescaped & characters in the string before loading it into an XML parser, or notify the web service vendor.
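For the "replace it in the string" option, one possible sketch uses a regular expression that escapes every & that does not already start a predefined entity or a numeric character reference. The helper name EscapeBareAmpersands is hypothetical, and the pattern is deliberately simple: it knows nothing about CDATA sections (where a bare & is legal and would be altered) or entities declared in a DTD.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: matches an '&' that is NOT followed by one of the five predefined
// entity names or a numeric character reference (decimal or hex) plus ';', and escapes it.
static string EscapeBareAmpersands(string xml) =>
    Regex.Replace(xml, @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;");

// Simplified from the fragment shown in the question.
string content = "<Frequency><Value>1 time per day& with^honey~&water</Value></Frequency>";
XDocument doc = XDocument.Parse(EscapeBareAmpersands(content));
Console.WriteLine(doc.Root.Element("Value").Value); // 1 time per day& with^honey~&water
```

Already-escaped references such as &amp; or &#38; are left untouched, so running the helper on a partly correct document is safe.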
