XmlException: Text node cannot appear in this state. Line 1, position 1 - c#

Before I get into the issue: I'm aware there is another question that sounds exactly like mine. However, I've tried its solution (using Notepad++ to encode the XML file as UTF-8 without a BOM), and it doesn't work.
XmlDocument namesDoc = new XmlDocument();
XmlDocument factionsDoc = new XmlDocument();
namesDoc.LoadXml(Application.persistentDataPath + "/names.xml");
factionsDoc.LoadXml(Application.persistentDataPath + "/factions.xml");
Above is the code I have problems with. I'm not sure what the problem is.
<?xml version="1.0" encoding="UTF-8"?>
<factions>
<major id="0">
...
Above is a section of the XML file (the start of it; names.xml is the same except it has no 'id' attribute). Both files are encoded in UTF-8. In the latest Notepad++ version there is no "Encode in UTF-8 without BOM" option; as far as I know, plain UTF-8 there is the same as UTF-8 without BOM.
Does anyone have any idea what the cause may be? Or am I doing something wrong/forgetting something? :/

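You are receiving this error because the .LoadXml() method expects a string argument that contains the XML data itself, not the location of an XML file. LoadXml() tries to parse the path as markup, and the very first character of the path is text outside any element, which is why the error points at line 1, position 1. If you want to load an XML file, use the .Load() method instead of .LoadXml().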
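To make the difference concrete, here is a minimal sketch for a plain .NET console project rather than Unity, so `Application.persistentDataPath` is replaced by the system temp folder. The `XmlLoading` class and its methods are hypothetical names for illustration. `LoadXml` parses the path text as markup and throws, while `Load` opens the file:

```csharp
using System.IO;
using System.Xml;

public static class XmlLoading
{
    // Hypothetical helper: writes a small sample file and returns its path.
    public static string WriteSampleFile()
    {
        string path = Path.Combine(Path.GetTempPath(), "factions.xml");
        File.WriteAllText(path,
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<factions><major id=\"0\"/></factions>");
        return path;
    }

    // Wrong: LoadXml treats its argument as XML text, so the path itself gets parsed.
    public static bool LoadXmlFailsOnPath(string path)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(path);
            return false;
        }
        catch (XmlException)
        {
            return true; // the path is not well-formed XML
        }
    }

    // Right: Load(string) treats its argument as a file path or URL.
    public static XmlDocument LoadFromFile(string path)
    {
        var doc = new XmlDocument();
        doc.Load(path);
        return doc;
    }
}
```

In the question's code, that means `namesDoc.Load(Application.persistentDataPath + "/names.xml");` and the same change for factions.xml.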

Related

How to place XML Processing Instruction on Line 1 using System.XML.Linq

I am writing a console application that generates an XML file that will be consumed by a server job processing application that was written a long time ago. The server app requires a processing instruction: <?JtJob jobname?>. I'm using C# XDocument to generate my xml:
XDocument xml = new XDocument(new XProcessingInstruction("JtJob", "FieldInspection3_Rejected"),
new XElement("Document",
new XElement("DataFile", tempFileName),
new XElement("FormType","Corrected Form Package"),
new XElement("BYOD_RejectComment",reasonForRejection),
new XElement("BYOD_FromTech",techEmail)
)
);
xml.Save(Path.Combine("C:\\Data", DateTime.Now.ToString("yyyyMMdd_HHmmssffff") + "_Rejected.xml"));
For some reason, the server app requires the processing instruction to be on the first line. If my xml file looks like this:
<?xml version="1.0" encoding="utf-8"?><?JtJob FieldInspection3_Rejected?>
<Document>
<DataFile>C:\Windows\TEMP\tmp387F.tmp</DataFile>
<FormType>Corrected Form Package</FormType>
<BYOD_RejectComment>you're ugly</BYOD_RejectComment>
<BYOD_FromTech>example#gmail.com</BYOD_FromTech>
</Document>
Everything works fine. But when it looks like this:
<?xml version="1.0" encoding="utf-8"?>
<?JtJob FieldInspection3_Rejected?>
<Document>
<DataFile>C:\Windows\TEMP\tmp387F.tmp</DataFile>
<FormType>Corrected Form Package</FormType>
<BYOD_RejectComment>you're ugly</BYOD_RejectComment>
<BYOD_FromTech>example#gmail.com</BYOD_FromTech>
</Document>
It errors. My problem is that the XDocument code above generates the second output.
Without loading my generated xml back in as a string and manipulating the string, is there a way for me to tell XDocument to create the processing instruction on the first line?
I know the blame definitely lies with the server app for not accepting valid XML syntax, but my goal is to get this working, not to fix a 20-year-old program.
Edit: Thanks! Using the Save() overload preserved the formatting. It didn't put everything on one line, but it allowed me to keep the PI on line 1.
Edit 2: Well, that didn't help me either. But I found out what did help! XDocument.Save() outputs UTF-8 with a BOM by default. I switched to UTF-8 without a BOM by using XmlTextWriter, and that worked.
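A minimal sketch of the "no BOM" fix from Edit 2. Instead of `XmlTextWriter`, it uses the `XmlWriter.Create` factory with `XmlWriterSettings`, which is a swapped-in (but equivalent) approach; the `NoBomSave` name is hypothetical. `new UTF8Encoding(false)` is the encoding that emits no byte-order mark, and `Indent = false` keeps the processing instruction on line 1 right after the declaration:

```csharp
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class NoBomSave
{
    // Saves the document as UTF-8 without a byte-order mark.
    public static void Save(XDocument doc, string path)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // false = do not emit a BOM
            Indent = false                      // no line breaks: the PI stays on line 1
        };
        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            doc.Save(writer);
        }
    }
}
```

With the question's document, this would be called as `NoBomSave.Save(xml, Path.Combine("C:\\Data", ...))` in place of `xml.Save(...)`.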
What if you used the XDocument.Save(String, SaveOptions) method to get an output all on a single line?
So do this instead:
xml.Save(fileName, SaveOptions.DisableFormatting);
This forces the processing instruction onto the first line, next to the XML declaration, with the downside that the entire document ends up on that one line; but if it works for that program, then so be it.
You'll want to use an XDocument.Save() overload that lets you specify formatting options:
xml.Save(Path.Combine("C:\\Data", DateTime.Now.ToString("yyyyMMdd_HHmmssffff") + "_Rejected.xml"),
SaveOptions.DisableFormatting);
https://msdn.microsoft.com/en-us/library/bb551426(v=vs.110).aspx
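A small sketch of what `SaveOptions.DisableFormatting` produces, using a cut-down version of the question's document (the `OneLineSave` class is a hypothetical name). The whole file comes out as a single line. Note that this overload still writes the default UTF-8 encoding with a BOM, which, per Edit 2 in the question, turned out to be the real problem:

```csharp
using System.Xml.Linq;

public static class OneLineSave
{
    // Builds a document shaped like the one in the question (values are placeholders).
    public static XDocument Build() =>
        new XDocument(
            new XProcessingInstruction("JtJob", "FieldInspection3_Rejected"),
            new XElement("Document",
                new XElement("FormType", "Corrected Form Package")));

    // No indentation or line breaks are added, so the PI shares line 1 with the declaration.
    public static void Save(XDocument doc, string path) =>
        doc.Save(path, SaveOptions.DisableFormatting);
}
```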

Xdocument, Xelement.Save incorrect encoding

I'm having a problem with this code:
string serializedLicence = SerializationHelper.ToXML(licenceInfo);
var licenceFileXml = new XElement("Licence", new XElement("LicenceData", serializedLicence));
XmlDocument signedLicence = SignXml(licenceFileXml.ToString(), Properties.Resources.PRIVATE_KEY);
signedLicence.Save(saveFileDialogXmlLicence.FileName);
The created file has incorrectly encoded strings (the ones passed to the XElement constructors), as well as the signature, which is added by a custom SignXml() method (it appends the signature with XmlDocument.DocumentElement.AppendChild(), but that's irrelevant right now). The output:
<?xml version="1.0" encoding="utf-16" standalone="yes"?>
<Licence>
<LicenceData>&lt;?xml version="1.0" encoding="utf-16"?&gt;
&lt;LicenceInfo
//stuff stuff stuff
&lt;/LicenceInfo&gt;</LicenceData>
<Signature><SignedInfo xmlns="h stuff stuff stuff</Signature>
</Licence>
So basically I'm taking a serialized object string and putting it between tags, and this part gets encoded wrong. The debugger shows that the text in the XElement object already holds &lt; and &gt; right after it is created. I could parse it manually, but that seems inappropriate.
Note: before that, I was signing the serialized XML directly and it worked fine, so I can't figure out why XDocument uses a different encoding than the XmlSerializer/XmlDocument objects.
Also: I think I could just use an XmlDocument object to build the file, but I'm curious what's wrong.
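You're adding serializedLicence as a string, so it's treated as text, not as XML. Text content has its < and > escaped as &lt; and &gt; when the document is written, and that's why it looks like that in your document. Parse it into an element first: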
var licenceFileXml = new XElement("Licence",
new XElement("LicenceData",
XDocument.Parse(serializedLicence).Root));
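To see the difference side by side, here is a small sketch; the `Serialized` constant is a hypothetical stand-in for the output of `SerializationHelper.ToXML(licenceInfo)`. Passing the string as content creates a text node (escaped on output), while parsing it first adds a real child element:

```csharp
using System.Xml.Linq;

public static class NestingDemo
{
    // Hypothetical stand-in for SerializationHelper.ToXML(licenceInfo).
    public const string Serialized =
        "<?xml version=\"1.0\" encoding=\"utf-16\"?><LicenceInfo><Owner>Example</Owner></LicenceInfo>";

    // String content becomes a text node, so '<' is written as "&lt;".
    public static XElement AsText() =>
        new XElement("Licence", new XElement("LicenceData", Serialized));

    // Parsing first nests a real <LicenceInfo> element (the declaration is dropped).
    public static XElement AsElement() =>
        new XElement("Licence", new XElement("LicenceData", XDocument.Parse(Serialized).Root));
}
```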

XmlDocument.Loadxml System.Xml.XmlException: Name cannot begin with the '8' character - xml from imageglue

I'm using ImageGlue to read Exif data from an image. ImageGlue gets the Exif data successfully and returns an XML string. When I try to use an XmlDocument to load the XML string, it throws the error:
Name cannot begin with the '8' character.
I know which part of the XML is causing the error, but I don't know whether it's an issue with the XML or with the XmlDocument object trying to load it. The XML is below; the tag causing the error is the last one.
It doesn't like the "8298"; if I remove it, it works fine. Is ImageGlue not generating the right XML from the Exif data, or is the XmlDocument object (C#) not reading it correctly?
<?xml version="1.0" encoding="ISO-8859-1"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:exif="http://ns.adobe.com/exif/1.0/#">
<rdf:Description>
<dc:date>2013-10-17T14-08-19Z</dc:date>
<dc:type>image</dc:type>
<dc:format>image/jpeg</dc:format>
<dc:source>Photo</dc:source>
<exif:ImageWidth>4368 pixels</exif:ImageWidth>
<exif:ImageLength>2912 pixels</exif:ImageLength>
<exif:BitsPerSample>8,8,8</exif:BitsPerSample>
<exif:Compression>6</exif:Compression>
<exif:PhotometricInterpretation>2</exif:PhotometricInterpretation>
<exif:Model>Canon EOS 5D</exif:Model>
<exif:Orientation>Normal</exif:Orientation>
<exif:SamplesPerPixel>3</exif:SamplesPerPixel>
<exif:XResolution>72 pixels per inch</exif:XResolution>
<exif:YResolution>72 pixels per inch</exif:YResolution>
<exif:ResolutionUnit>inch</exif:ResolutionUnit>
<exif:Software>Adobe Photoshop CS5 Windows</exif:Software>
<exif:DateTime>2013:10:16 10:42:48</exif:DateTime>
<exif:Artist>bobbi </exif:Artist>
<exif:ThumbnailOffset>838</exif:ThumbnailOffset>
<exif:ThumbnailLength>6049</exif:ThumbnailLength>
<exif:Tag 8298>Industries, Inc.</exif:Tag 8298>
That's invalid XML, so it's ImageGlue that is not generating the right XML. It is trying to use "Tag 8298" as an element name, but element names cannot contain spaces, so the parser reads 8298 as an attribute name, and XML names cannot begin with a digit.
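If you can't get a fixed ImageGlue build, one possible workaround is to repair the tag names before loading. This is a sketch under assumptions: the `Tag_8298` renaming is our own choice (not anything ImageGlue defines), `ExifXmlRepair` is a hypothetical class, and the regex only targets the `exif:Tag <number>` pattern seen above. For reference, 0x8298 is the Exif Copyright tag ID, which fits the "Industries, Inc." value.

```csharp
using System.Text.RegularExpressions;
using System.Xml;

public static class ExifXmlRepair
{
    // Rewrites "<exif:Tag 8298>" and "</exif:Tag 8298>" into the legal name "exif:Tag_8298".
    public static string FixTagNames(string xml) =>
        Regex.Replace(xml, @"<(/?)exif:Tag (\d+)>", "<$1exif:Tag_$2>");

    // Loads the repaired markup with XmlDocument.LoadXml (the input is XML text, not a path).
    public static XmlDocument LoadRepaired(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(FixTagNames(xml));
        return doc;
    }
}
```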

XML header missing after converting an XML file into a Binary Format File

I have a problem. I have an XML spreadsheet file that I'm trying to send via email, so I converted it into a binary file and attached it to an email. The problem is that when I try to open it in Excel, it doesn't show the data I saved. When I opened it as an XML file, I realized that the XML header hadn't been saved:
The way it should be:
<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
...
<Styles>
...
</Styles>
<Worksheet>
...
</Worksheet></Workbook>
after converting:
<Worksheet>
...
</Worksheet>
I've tried using an XmlDocument, but it wasn't working; I also tried using a string, and that still didn't work. This is how I convert the XML to binary:
UTF8Encoding encoding = new UTF8Encoding();
binaryFile = encoding.GetBytes(xmlFile);
How can I fix this problem?
Thanks.
I think we need more information on how you're converting the XML file.
From your description it sounds like you've saved an Excel Spreadsheet to XML and for whatever reason you cannot just attach this text document to an email. My guess is you're using a method to attach the XML file that requires a byte array and can't just be provided a file location. If you could provide more information on this, it would help us figure out where things are going wrong for you.
The part I'm really stuck on is:
I've tried to use an xmldocument but i wasn't working, I also tried
using a string, still not working.
How did you try string? Did you read the file from disk using FileStream? If so, you should have been able to retrieve the full contents of the file.
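Were you using XmlDocument the whole time and taking the OuterXml of a node inside the document (for example XmlDocument.DocumentElement.OuterXml)? That won't give you the header lines, since the XML declaration and processing instruction are not part of the XML body inside the root node; OuterXml on the XmlDocument itself does include them.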
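A quick way to check this, as a sketch: the `Sheet` string below is a cut-down, hypothetical version of the spreadsheet from the question, and `OuterXmlDemo` is an illustrative name. The document node's `OuterXml` keeps the declaration and the `mso-application` instruction; the root element's does not:

```csharp
using System.Xml;

public static class OuterXmlDemo
{
    public const string Sheet =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
        "<?mso-application progid=\"Excel.Sheet\"?>" +
        "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"><Worksheet/></Workbook>";

    // The document node serializes all of its children, including the declaration and the PI.
    public static string WholeDocument()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sheet);
        return doc.OuterXml;
    }

    // An element only serializes itself and its descendants.
    public static string RootOnly()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sheet);
        return doc.DocumentElement.OuterXml;
    }
}
```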
If I had an XML file on disk, needed to attach it to an email through code, and my only option was to provide a byte array, I'd do something like:
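using (FileStream fs = new FileStream("", FileMode.Open, FileAccess.Read)) // "" = path to your XML file
{
    byte[] binaryFile = new byte[fs.Length];
    int offset = 0;
    while (offset < binaryFile.Length) // Read may return fewer bytes than requested
    {
        int read = fs.Read(binaryFile, offset, binaryFile.Length - offset);
        if (read == 0) break; // end of file reached early
        offset += read;
    }
    //Copy the byte array to your email object.
}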
Now if this isn't what you're doing, you'll need to provide a lot more detail on what you are starting with (file on disk?), what you need to do (send automated email?), what constraints you have and any other information that would limit potential solutions.
I've found my mistake: I didn't serialize the XML file, which is why after the conversion it only shows the data without the XML header. So there are two ways to resolve this problem:
first, we can concatenate the header with the data string, or we can use the serialize function. This is where I've found how to do it.
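The first option (concatenating the header) could look roughly like this. It is a sketch that assumes `workbookXml` already holds the complete `<Workbook>...</Workbook>` element; the `SpreadsheetBytes` class and its method are hypothetical names, and the header lines are copied from the expected output in the question:

```csharp
using System.Text;

public static class SpreadsheetBytes
{
    // Declaration and Excel processing instruction, as shown in the question.
    const string Header =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n" +
        "<?mso-application progid=\"Excel.Sheet\"?>\r\n";

    // Prepends the header to the workbook markup and encodes it as UTF-8.
    // Encoding.GetBytes never adds a BOM, so the file starts with "<?xml".
    public static byte[] ToAttachmentBytes(string workbookXml) =>
        Encoding.UTF8.GetBytes(Header + workbookXml);
}
```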

Correcting Encoding in a large Xml File

I'm importing data from XML files containing this type of content:
<FirstName>™MšR</FirstName><MiddleName/><LastName>HšNER™Z</LastName>
The XML is loaded via:
XmlDocument doc = new XmlDocument();
try
{
doc.Load(fullFilePath);
}
When I execute this code with the data contained on top I get an exception about an illegal character. I understand that part just fine.
I'm not sure which encoding this is or how to solve this problem. Is there a way I can change the encoding of the XmlDocument or another method to make sure the above content is parsed correctly?
Update: I do not have any encoding declaration or <?xml in this document.
I've seen some links say to add it dynamically? Is this UTF-16 encoding?
It appears that:
The name was ÖMÜR HÜNERÖZ (or possibly ÔMÜR HÜNERÔZ or ÕMÜR HÜNERÕZ; I don't know what language that is).
The XML file was encoded using the DOS "OEM" code page, probably 437 or 850.
But it was decoded using windows-1252 (the "ANSI" code page).
If you look at the file with a hex editor (HXD or Visual Studio, for instance), what exactly do you see?
Is every character from the string you posted represented by a single byte? Does the file have a byte-order mark (a bunch of non-printable bytes at the start of the file)?
The ™ and š seem to indicate that something went pretty wrong with encoding/conversion along the way, but let's see... I guess they both correspond with a vowel (O-M-A-R H-A-NER-O-Z, maybe?), but I haven't figured out yet how they ended up looking like this...
Edit: dan04 hit the nail on the head. ™ in cp-1252 has hex value 99, and š is 9a. In cp-437 and cp-850, hex 99 represents Ö, and 9a Ü.
The fix is simple: just specify this encoding when opening your XML file:
XmlDocument doc = new XmlDocument();
using (var reader = new StreamReader(fileName, Encoding.GetEncoding(437)))
{
doc.Load(reader);
}
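The byte-level explanation above can be checked directly. This sketch decodes the four bytes behind "™MšR" (0x99, 'M', 0x9A, 'R') with both code pages; the `CodePageDemo` name is hypothetical. On .NET Core and .NET 5+, code pages such as 437 and 1252 are only available after registering `CodePagesEncodingProvider`, so the same registration would be needed before the `Encoding.GetEncoding(437)` call in the fix above:

```csharp
using System.Text;

public static class CodePageDemo
{
    // The raw bytes behind "™MšR" in the question.
    static readonly byte[] Raw = { 0x99, 0x4D, 0x9A, 0x52 };

    public static (string AsAnsi, string AsOem) Decode()
    {
        // Required on .NET Core / .NET 5+ to use legacy Windows/DOS code pages.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        string asAnsi = Encoding.GetEncoding(1252).GetString(Raw); // how the data was misread
        string asOem = Encoding.GetEncoding(437).GetString(Raw);   // how it was written
        return (asAnsi, asOem);
    }
}
```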
From here:
Encoding encoding;
using (var stream = new MemoryStream(bytes))
{
using (var xmlreader = new XmlTextReader(stream))
{
xmlreader.MoveToContent();
encoding = xmlreader.Encoding;
}
}
You might want to take a look at this: How to best detect encoding in XML file?
For the actual reading, you can use StreamReader to take care of the BOM (byte order mark):
string xml;
using (var reader = new StreamReader("FilePath", true)) // true = detectEncodingFromByteOrderMarks
{
    xml = reader.ReadToEnd();
}
Edit: Removed the encoding parameter. StreamReader will detect the encoding of a file if the file contains a BOM. If it does not, it defaults to UTF-8.
Edit 2: Detecting Text Encoding for StreamReader
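A small sketch of that behaviour, reading from in-memory bytes instead of a file (the `BomDetection` name is hypothetical). A UTF-16 BOM is picked up; without a BOM the reader falls back to UTF-8. That fallback also means a BOM-less file like the one in the question, written in code page 437, would still need its encoding specified explicitly:

```csharp
using System.IO;
using System.Text;

public static class BomDetection
{
    // Returns the decoded text and the encoding StreamReader settled on.
    public static (string Text, Encoding Detected) Read(byte[] fileBytes)
    {
        using (var reader = new StreamReader(new MemoryStream(fileBytes), true))
        {
            string text = reader.ReadToEnd();
            return (text, reader.CurrentEncoding);
        }
    }
}
```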
Obviously you provided a fragment of the XML document, since it's missing a root element, so I'll assume that was intentional. Is there an XML declaration at the top, like <?xml version="1.0" encoding="UTF-8" ?>?
