nbsp issue reading XML - c#

There are several posts on this issue but none of the solutions seem to work for me. Not sure what I'm doing wrong.
I'm trying to read an XML file that looks like this:
<?xml version="1.0"?>
<content languageCode="en" languageID="1">
<ftypeparser>c:\users\pdeoliveira\Documents\Xmlparsernew.xml</ftypeparser>
<xmlparser>c:\users\pdeoliveira\Documents\Xmlparser.xml</xmlparser>
<sourcelocation>c:\localization\2015</sourcelocation>
<rpkfile>RPK_DefaultSyllabi_2015.xml</rpkfile>
<loglocation>c:\users\pdeoliveira\Documents\Buildloc_Log.txt</loglocation>
<losefiles><locfile>c:\localization\2015\Strings.xml</locfile></losefiles>
</content>
Here's the code that reads it:
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
string filetoload = "config.xml";
try
{
    doc.Load(filetoload);
}
catch (Exception e)
{
    Console.WriteLine(e);
}
And here's the exception I get:
System.Xml.XmlException: Reference to undeclared entity 'nbsp'. Line 1960, position 12.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.HandleGeneralEntityReference(String name, Boolean isInAttributeValue, Boolean pushFakeEntityIfNullResolver, Int32 entityStartLinePos)
at System.Xml.XmlTextReaderImpl.ResolveEntity()
at System.Xml.XmlLoader.LoadEntityReferenceNode(Boolean direct)
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.Load(String filename)
at ConsoleApplication9.Program.getrestype(String fname) in c:\Users\pdeoliveira\Documents\Visual Studio 2013\Projects\BuildLoc\BuildLoc\Program.cs:line 1067
I tried declaring nbsp as suggested in other posts:
<!DOCTYPE doctypeName [
<!ENTITY nbsp "&#160;">
]>
That didn't work either. Interestingly, my file is only about 10 lines long, yet the error points to line 1960.
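For comparison, here is a minimal sketch of the entity declaration working on its own. It assumes the DOCTYPE sits between the <?xml?> declaration and the root element, uses the root element name content in place of the doctypeName placeholder, and writes the entity value as the character reference &#160;. Readers created with XmlReader.Create reject DTDs unless DtdProcessing is set to Parse. The class and method names are illustrative only.

```csharp
using System.IO;
using System.Xml;

public static class NbspDemo
{
    // Loads XML whose internal DTD subset declares the nbsp entity and
    // returns the text content of the root element.
    public static string RootText(string xml)
    {
        // XmlReader.Create uses DtdProcessing.Prohibit by default, so allow the DOCTYPE explicitly.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };
        var doc = new XmlDocument();
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc.DocumentElement.InnerText;
    }
}
```

If a snippet like this parses but the real program still fails, the document that fails is likely not the one being edited.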
Any help would be appreciated.
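The line number is itself a clue: the stack trace shows the exception coming from getrestype(String fname) (Program.cs, line 1067) rather than from the snippet above, so the file being parsed at that point may not be config.xml at all. A small diagnostic sketch, with a hypothetical helper name TryLoad, that reports the absolute path actually opened together with XmlException.LineNumber and LinePosition:

```csharp
using System.IO;
using System.Xml;

public static class XmlDiagnostics
{
    // Tries to load an XML file. Returns an empty string on success; otherwise
    // a message naming the absolute path that was opened plus the line and
    // position reported by the XmlException.
    public static string TryLoad(string fileName)
    {
        // A relative name such as "config.xml" is resolved against the current
        // working directory, which is not necessarily the project folder.
        string fullPath = Path.GetFullPath(fileName);
        try
        {
            new XmlDocument().Load(fullPath);
            return string.Empty;
        }
        catch (XmlException ex)
        {
            return $"{fullPath}: line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        }
    }
}
```

Calling something like this with the same fname that getrestype receives would show which file contains the offending &nbsp; at line 1960.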

Related

XML Deserialization C# gives error for valid document

I have a lot of XML files with the same structure. Many of them work fine, but for some of them XmlSerializer throws an error, even though an XML validator says the document is correct.
Deserialization code:
var document = serializer.Deserialize(File.OpenRead(file));
Error:
System.InvalidOperationException: There is an error in XML document (504, 8). ---> System.Xml.XmlException: Unexpected node type Element. ReadElementString method can only be called on elements with simple or empty content. Line 504, position 8.
at System.Xml.XmlReader.ReadElementString()
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderPatentdocument.Read33_Claimtext(Boolean isNullable, Boolean checkType)
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderPatentdocument.Read34_Claim(Boolean isNullable, Boolean checkType)
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderPatentdocument.Read35_Claims(Boolean isNullable, Boolean checkType)
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderPatentdocument.Read43_Patentdocument(Boolean isNullable, Boolean checkType)
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderPatentdocument.Read44_patentdocument()
--- End of inner exception stack trace ---
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
at System.Xml.Serialization.XmlSerializer.Deserialize(Stream stream)
The part of the document where it gives the error:
<text>12. Führungsschiene nach einem der Ansprüche 2 bis 11, dadurch gekennzeichnet, daß in den beiden Nutwänden (<b>11<i>a</i>, 11</b><i>a′)</i> einander gegenüberliegende Bohrungen (<b>14</b><i>a</i>, <b>14</b><i>a</i>′) vorgesehen sind, von denen die eine Bohrung (<b>14</b><i>a</i>′) durch das Einsatzteil (<b>15</b><i>a)</i> ver­schlossen ist.</text>
I suppose this is caused by the inline HTML tags, because the error position points at the <i> tag in this fragment:
<b>11<i>a</i>, 11</b>
Yet this element, for example, is accepted by XmlSerializer and deserializes without problems:
<text>9. Führungsschiene nach Anspruch 8, dadurch gekennzeichnet, daß der Ansatz (<b>20</b>) die Zuführfläche (<b>25</b>) aufweist.</text>
So my question is: why does the XML validator say the document is valid when XmlSerializer cannot deserialize it? Is there a workaround that doesn't require changing the document?
You're right when you point at the inner HTML tags.
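Your XML is well-formed, but it doesn't match what XmlSerializer expects: the element is mapped to a simple (string) property, yet it contains child elements (<b>, <i>). XmlSerializer reads such properties with ReadElementString, which only accepts simple or empty content, so it throws.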
If you generate the XML files yourself, you have to escape the data inside the simple elements beforehand, either:
by HTML/XML-encoding it (so that < becomes &lt;, and so on), or
by wrapping it in a CDATA section (<![CDATA[...]]>).
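Both options can be sketched as follows, assuming a hypothetical Claim class with a single string property standing in for the claim text. XmlSerializer already escapes markup characters in string properties when it writes them, and XmlWriter.WriteCData produces the CDATA form; either way the inline tags round-trip as plain text.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical class standing in for the claim text element.
public class Claim
{
    public string Text { get; set; }
}

public static class EscapingDemo
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Claim));

    // XmlSerializer escapes '<' in string properties, so "<b>11</b>" is
    // written as escaped text (&lt;b...) rather than as child elements.
    public static string Serialize(string text)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, new Claim { Text = text });
            return writer.ToString();
        }
    }

    public static string Deserialize(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return ((Claim)Serializer.Deserialize(reader)).Text;
        }
    }

    // Alternative: wrap the content in a CDATA section using XmlWriter.
    public static string WriteAsCData(string text)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("text");
            writer.WriteCData(text);
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

This only helps when you control how the files are produced; existing files with raw child elements still need a different approach on the reading side.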
Try serializing the instance that is causing you problems. Then you can compare the output of the serialization with the contents of the file you are attempting to deserialize. The difference between the two XML strings will show you where the problem is.
Here is a quick function to serialize an instance of a class to XML:
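using System;
using System.IO;
using System.Xml.Serialization;

public static string Serialize<T>(T entity)
{
    if (entity == null)
        return String.Empty;
    try
    {
        XmlSerializer xs = new XmlSerializer(typeof(T));
        // Dispose the writer once the XML string has been produced.
        using (StringWriter sw = new StringWriter())
        {
            xs.Serialize(sw, entity);
            return sw.ToString();
        }
    }
    catch (Exception e)
    {
        // Logging.Log and Severity are the answerer's own logging code, not part of .NET.
        Logging.Log(Severity.Error, "Unable to serialize entity", e);
        return String.Empty;
    }
}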
If you haven't tried it yet, I suggest Beyond Compare for seeing the differences between the two files at a glance.
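If no diff tool is at hand, a crude programmatic check can at least locate the first differing line between the serialized output and the file's contents. This is a hypothetical helper, not part of any library; it compares text line by line, ignoring leading and trailing whitespace, so it only helps when both sides are formatted similarly.

```csharp
using System;

public static class XmlTextDiff
{
    // Returns the 1-based number of the first line that differs between two
    // XML strings (ignoring leading/trailing whitespace on each line),
    // or 0 if they match line for line.
    public static int FirstDifferentLine(string expected, string actual)
    {
        string[] a = expected.Replace("\r\n", "\n").Split('\n');
        string[] b = actual.Replace("\r\n", "\n").Split('\n');
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (!string.Equals(a[i].Trim(), b[i].Trim(), StringComparison.Ordinal))
                return i + 1;
        }
        // Same prefix: they differ only if one side has extra lines.
        return a.Length == b.Length ? 0 : common + 1;
    }
}
```

For example, passing the output of Serialize(instance) and File.ReadAllText(file) would point at the first line where the two disagree.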
Suppose we have the following class:
public class Foo
{
//[XmlIgnore]
public string Text { get; set; }
}
And XML of the following form:
<Foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<text>12. Führungsschiene nach einem der Ansprüche 2 bis 11, dadurch gekennzeichnet, daß in den beiden Nutwänden (<b>11<i>a</i>, 11</b><i>a′)</i> einander gegenüberliegende Bohrungen (<b>14</b><i>a</i>, <b>14</b><i>a</i>′) vorgesehen sind, von denen die eine Bohrung (<b>14</b><i>a</i>′) durch das Einsatzteil (<b>15</b><i>a)</i> ver­schlossen ist.</text>
</Foo>
Then we can deserialize the data as follows.
var xs = new XmlSerializer(typeof(Foo));
xs.UnknownElement += Xs_UnknownElement;
Foo foo;
using (var fs = new FileStream("test.txt", FileMode.Open))
{
    foo = (Foo)xs.Deserialize(fs);
}
Subscribe to the XmlSerializer's UnknownElement event.
In the event handler, set the property manually from the element's data.
private static void Xs_UnknownElement(object sender, XmlElementEventArgs e)
{
    var foo = (Foo)e.ObjectBeingDeserialized;
    foo.Text = e.Element.InnerXml;
}
Note that the property name must not match the XML element name (the comparison is case-sensitive); the event is raised only for elements the serializer cannot map to a member. If the names do match, mark the property with the XmlIgnore attribute.
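Put together, a self-contained sketch of the same approach that reads from a string instead of test.txt and uses a lambda instead of the named handler; the class names are illustrative.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Foo
{
    // Named "Text" while the element is <text>: the case differs, so the
    // serializer does not map the element and raises UnknownElement instead.
    public string Text { get; set; }
}

public static class UnknownElementDemo
{
    public static Foo Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Foo));
        // Copy the raw inner markup of the unmapped element into Foo.Text.
        serializer.UnknownElement += (sender, e) =>
        {
            var foo = (Foo)e.ObjectBeingDeserialized;
            foo.Text = e.Element.InnerXml;
        };
        using (var reader = new StringReader(xml))
        {
            return (Foo)serializer.Deserialize(reader);
        }
    }
}
```

The resulting Text keeps the inline tags as markup (for example "(<b>11<i>a</i></b>)"), which suits content that will be rendered as HTML later.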

XDocument Invalid Characters On Load - '\v', hexadecimal value 0x0B, is an invalid character

I am downloading some XML content from the Adobe Connect API. I load the content into an XDocument and read through all of the sco elements to save them to the database. However, one of the calls to the API returns an invalid character that causes this exception:
System.Xml.XmlException: '', hexadecimal value 0x0B, is an invalid character. Line 2, position 6495.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
at System.Xml.XmlTextReaderImpl.ParseText()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Load(XmlReader reader)
at ACRS.DataRefresherApp.Program.GetFolderContents(Folder parentFolder, AcrsDbContext db) in xxx:line 164
Here is a sample of the XML coming from the Adobe Connect API. Note: this example does not contain an invalid character.
<?xml version="1.0"?>
<results>
<status code="ok"/>
<scos>
<sco is-folder="1" duration="" display-seq="0" icon="folder" type="folder" folder-id="xx" source-sco-id="" sco-id="xx">
<name>Shared Templates</name>
<url-path>/f1101964883/</url-path>
<date-created>2010-09-16T15:21:15.993+10:00</date-created>
<date-modified>2013-12-11T22:31:05.130+11:00</date-modified>
<is-seminar>false</is-seminar>
</sco>
.....
</scos>
</results>
Here is the code I am using to read/load the XML data.
Stream responseStream = response.GetResponseStream();
XmlReader xmlReader = XmlReader.Create(responseStream, new XmlReaderSettings() { CheckCharacters = false });
var xmlResponse = XDocument.Load(xmlReader);
var folders = xmlResponse.Elements("results").Elements("scos").Elements("sco").ToList();
The exception occurs when the XDocument attempts to load the data from the xmlReader.
var xmlResponse = XDocument.Load(xmlReader);
I realise that I do not need to use the XmlReader and can load the XDocument directly from the stream. However, I have included the XmlReader in response to this blog post by Paul Selles.
I have already read this thread:
How to prevent System.Xml.XmlException: Invalid character in the given encoding
However, this does not fix my problem. Apparently, under the XML standard the reader switches to the encoding declared in the document once it starts reading it. My document has no declaration, so it should default to UTF-8. See this answer.
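If the stray characters cannot be fixed at the source, one common workaround (not something the post itself tried) is to strip characters that are not legal in XML 1.0 before parsing. A sketch, assuming the whole response fits comfortably in memory; XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair are the .NET checks used, and the class name is illustrative.

```csharp
using System.Text;
using System.Xml;

public static class XmlCharFilter
{
    // Removes characters that are not allowed in XML 1.0 (such as 0x0B),
    // keeping valid surrogate pairs intact.
    public static string RemoveInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length
                     && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // IsXmlSurrogatePair takes the low surrogate first.
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            // Anything else (e.g. '\v' or a lone surrogate) is dropped.
        }
        return sb.ToString();
    }
}
```

In the question's code this would replace the XmlReader step: read responseStream with a StreamReader, clean the string, then call XDocument.Parse. Note that it silently discards data, which may or may not be acceptable.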

Exact way to receive an XML by C# Web service

I appreciate your help.
I have created a web service to receive an XML file. I followed the approach below, published it, and it worked fine for me:
....
XmlDocument xmldoc = new XmlDocument();
try
{
    if (HttpContext.Current.Request.InputStream != null)
    {
        StreamReader stream = new StreamReader(HttpContext.Current.Request.InputStream);
        string xmls = stream.ReadToEnd();
        xmldoc.LoadXml(xmls);
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
    }
}
catch (Exception ex)
{
    logger.Log(NLog.LogLevel.Error, ex.Message + ex.StackTrace);
}
...
My XML structure is:
<reports uis="5521452542">
<attribute1>val1</attribute1>
...
</reports>
However, after some friends tested it by calling my web service from a Linux platform, I got the errors below in the log file, even though their XML file is valid.
Note that their XML file did not contain the declaration:
<?xml version="1.0" encoding="UTF-8"?>
Could this cause the error?
2014-04-03 03:56:53.7408|Error|Root element is missing.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at WebService.Service1.GetInfoService() in
D:\yassine\Mobily\Log\WebService\WebService\WebService\Service1.asmx.cs:line 56
2014-04-03 03:56:53.8032|Error|Root element is missing. at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
at WebService.Service1.GetInfoService() in
D:\yassine\Mobily\Log\WebService\WebService\WebService\Service1.asmx.cs:line 71
Can you please help me find the exact cause of the error?
Thank you
The exception says exactly what's wrong: you are receiving invalid XML that has no root element. (An empty request body produces the same "Root element is missing." message.) Ask your friends to email you the raw XML so you can see what they're sending you.
You can use Altova XMLSpy to verify that the XML is valid.
A very basic but valid XML document looks like this:
<root>
<child></child>
</root>
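To check the question about the missing declaration in isolation, a small sketch: LoadXml accepts a document without the <?xml ...?> declaration (the declaration is optional), while an empty string reproduces "Root element is missing." The helper name is hypothetical.

```csharp
using System.Xml;

public static class RootElementCheck
{
    // Returns "OK" if the string loads into an XmlDocument,
    // otherwise the XmlException message.
    public static string TryLoadXml(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return "OK";
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```

If the empty-string case matches the log, the next thing to check is whether the request body actually reaches the service, for example by logging xmls.Length before calling LoadXml.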

XmlDocument file name with space

I have an XML file that contains the names of Windows 7 services. One of the services has a space in its name (e.g. "service name"), and I get an exception when I load the file:
fileName = file;
pathToFile = path;
XmlDocument ServerList = new XmlDocument();
ServerList.Load(pathToFile + fileName);
The XML:
<systems>
<Groups>
<Myervices>
<Dialogic/>
<BoardServer/>
<HmpElements/>
<Service-1 Agent/>
</Myervices>
</Groups>
</systems>
The file name has the white space. Is there a way to load it? I cannot change the service name.
The exception I get:
'/' is an unexpected token. The expected token is '='. Line 824, position 23.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
at System.Xml.XmlTextReaderImpl.ThrowUnexpectedToken(String expectedToken1, String expectedToken2)
at System.Xml.XmlTextReaderImpl.ParseAttributes()
at System.Xml.XmlTextReaderImpl.ParseElement()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.Load(String filename)
at Stop_Start_systems.Functions..ctor(String path, String file) in c:\Stop_Start_systems\Functions.cs:line 32
at Stop_Start_systems.Default.Page_Load(Object sender, EventArgs e) in c:\Stop_Start_systems\Default.aspx.cs:line 31
System.Collections.ListDictionaryInterna
Thanks
The problem has nothing to do with the name of the XML file, or the code you posted. It has everything to do with the XML being invalid. XML element names can't contain spaces, so this isn't well-formed:
<Service-1 Agent/>
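The parser reads this as an element named Service-1 followed by an attribute named Agent, so it expects '=' after Agent and finds '/' instead, which is exactly the error in your stack trace. Rather than putting the service name in the element name, use the same element name for all services and put the name into an attribute, e.g.: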
<Service Name="Service-1 Agent" />
<Service Name="Some other service" />
etc. I would strongly advise you to create the XML file automatically using an API instead of by hand; that way you're much more likely to end up with valid XML.
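Following that advice, a sketch with LINQ to XML (System.Xml.Linq) that writes each service as a Service element with a Name attribute and reads the names back. The element names systems, Groups and MyServices mirror the question's file but are assumptions, as is the class name.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ServiceListXml
{
    // Builds <systems><Groups><MyServices><Service Name="..."/>...</MyServices></Groups></systems>.
    // Names live in an attribute, so spaces are fine; LINQ to XML escapes values as needed.
    public static XDocument Build(params string[] serviceNames) =>
        new XDocument(
            new XElement("systems",
                new XElement("Groups",
                    new XElement("MyServices",
                        serviceNames.Select(n => new XElement("Service", new XAttribute("Name", n)))))));

    // Reads the Name attribute of every Service element, in document order.
    public static string[] ReadNames(XDocument doc) =>
        doc.Descendants("Service").Select(e => (string)e.Attribute("Name")).ToArray();
}
```

Calling ServiceListXml.Build("Dialogic", "BoardServer", "Service-1 Agent").Save(path) would write a file that XmlDocument.Load can read back without errors.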

Invalid character in the given encoding

XmlDocument oXmlDoc = new XmlDocument();
try
{
    oXmlDoc.Load(filePath);
}
catch (Exception ex)
{
    // Log Error Here
    try
    {
        Encoding enc = Encoding.GetEncoding("iso-8859-1");
        StreamReader sr = new StreamReader(filePath, enc);
        String response = sr.ReadToEnd();
        oXmlDoc.LoadXml(response);
    }
    catch (Exception innerException)
    {
        // Log Error Here
        return false;
    }
}
I get an XML file from a third party that also references a Document Type Definition file right after the XML declaration.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SoccerMatchPlus SYSTEM "SoccerMatchPlus.dtd">
<SoccerMatchPlus matchid="33226">
<Booking id="13642055" time="47">
<Player id="370927">
<Name firstName="Lasse" initials="L" lastName="Nielsen">L Nielsen</Name>
</Player>
<Team id="26415" name="AæB" homeOrAway="Home"/>
</Booking>
</SoccerMatchPlus>
If I parse the file, I get "Invalid character in the given encoding. Line 102, position 56." If I catch the exception and retry with ISO-8859-1 (the inner try block above), the file is read, but then
I get the error "Could not find file 'C:\Windows\system32\SoccerMatchPlus.dtd'."
The third party adds a reference to a Document Type Definition file named SoccerMatchPlus.dtd before the root element.
With the Load method, the parser loads the DTD from the folder where the XML file is located.
I have put SoccerMatchPlus.dtd in the folder where the XML file resides. Can I load SoccerMatchPlus.dtd from a specified location at runtime, or can you tell me a better way to load an XML file that contains invalid characters?
Set the XmlResolver property of the XmlDocument to null, so that the parser does not try to resolve external resources such as SoccerMatchPlus.dtd:
XmlDocument oXmlDoc = new XmlDocument();
oXmlDoc.XmlResolver = null;
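Combined with the question's ISO-8859-1 fallback, a minimal sketch might look like this. It assumes the file really is Latin-1 despite its UTF-8 declaration, and that the DTD declares nothing the document depends on (such as entities), since with a null resolver the external DTD is never read. The class and method names are illustrative.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class ThirdPartyXml
{
    // Decodes the file as ISO-8859-1 and parses it without fetching the external DTD.
    public static XmlDocument LoadLatin1WithoutDtd(string filePath)
    {
        string text;
        using (var sr = new StreamReader(filePath, Encoding.GetEncoding("iso-8859-1")))
        {
            text = sr.ReadToEnd();
        }

        var doc = new XmlDocument();
        doc.XmlResolver = null; // do not try to open SoccerMatchPlus.dtd
        // The string is already decoded, so the encoding="UTF-8" declaration
        // does not change how the characters are interpreted.
        doc.LoadXml(text);
        return doc;
    }
}
```

If the DTD is actually needed, the alternative is to keep a resolver and make sure the DTD can be found relative to the document's base location; the null resolver simply avoids the lookup altogether.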
