Getting "ï»¿" at the beginning of my XML File after save() [duplicate]


This question already has answers here:
How can I remove the BOM from XmlTextWriter using C#?
(2 answers)
Closed 7 years ago.
I'm opening an existing XML file with C#, and I replace some nodes in there. All works fine. Just after I save it, I get the following characters at the beginning of the file:
ï»¿ (EF BB BF in HEX)
The whole first line:
ï»¿<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The rest of the file looks like a normal XML file.
The simplified code is here:
XmlDocument doc = new XmlDocument();
doc.Load(xmlSourceFile);
XmlNode translation = doc.SelectSingleNode("//trans-unit[@id='127']");
translation.InnerText = "testing";
doc.Save(xmlTranslatedFile);
I'm using a C# Windows Forms application with .NET 4.0.
Any ideas? Why would it do that? Can we disable that somehow? It's for Adobe InCopy, and it does not open it like this.
UPDATE:
Alternative Solution:
Saving it with the XmlTextWriter works too:
XmlTextWriter writer = new XmlTextWriter(inCopyFilename, null);
doc.Save(writer);

It is the UTF-8 BOM, which is actually discouraged by the Unicode standard:
http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature."
You may disable it using:
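// UTF8Encoding(false) means "do not emit the BOM"; false for append overwrites the file.
using (var sw = new System.IO.StreamWriter(path, false, new System.Text.UTF8Encoding(false)))
{
doc.Save(sw);
}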
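A minimal sketch of the same idea using XmlWriterSettings instead of a StreamWriter, with a small check of the first three bytes so you can confirm the EF BB BF signature is gone. The class name, method names, sample XML and temp-file path are made up for illustration; only the XmlDocument, XmlWriter and UTF8Encoding calls come from the framework.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomFreeSave
{
    // Saves the document as UTF-8 without the EF BB BF signature.
    public static void SaveWithoutBom(XmlDocument doc, string path)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false); // false = do not emit a BOM
        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            doc.Save(writer);
        }
    }

    // Returns true when the file starts with the UTF-8 BOM bytes.
    public static bool StartsWithUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    public static void Main()
    {
        // Stand-in for the file from the question (hypothetical content).
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
                    "<root><trans-unit id=\"127\">old</trans-unit></root>");
        doc.SelectSingleNode("//trans-unit[@id='127']").InnerText = "testing";

        // Hypothetical output path.
        string path = Path.Combine(Path.GetTempPath(), "translated-no-bom.xml");
        SaveWithoutBom(doc, path);
        Console.WriteLine(StartsWithUtf8Bom(path));
    }
}
```

The byte check is only there to verify the result; in real code the SaveWithoutBom part is all you need.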

It's a UTF-8 Byte Order Mark (BOM) and is to be expected.

You can try to change the encoding of the XmlDocument. Below is the example copied from MSDN:
using System;
using System.IO;
using System.Xml;
public class Sample {
public static void Main() {
// Create and load the XML document.
XmlDocument doc = new XmlDocument();
string xmlString = "<book><title>Oberon's Legacy</title></book>";
doc.Load(new StringReader(xmlString));
// Create an XML declaration.
XmlDeclaration xmldecl;
xmldecl = doc.CreateXmlDeclaration("1.0",null,null);
xmldecl.Encoding="UTF-16";
xmldecl.Standalone="yes";
// Add the new node to the document.
XmlElement root = doc.DocumentElement;
doc.InsertBefore(xmldecl, root);
// Display the modified XML document
Console.WriteLine(doc.OuterXml);
}
}

As everybody else mentioned, it's a Unicode issue.
I advise you to try LINQ To XML. Although not really related, I mention it as it's super easy compared to old ways and, more importantly, I assume it might have automatic resolutions to issues like these without extra coding from you.
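For what it's worth, as far as I know LINQ to XML does not drop the BOM on its own: XDocument.Save(path) goes through the same XmlWriter machinery, so the encoding still has to be set explicitly. A rough sketch of the question's edit done with XDocument, assuming a made-up sample document and temp-file path (LinqToXmlSave, SetTranslation and SaveWithoutBom are illustrative names):

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class LinqToXmlSave
{
    // Replaces the text of the trans-unit element with the given id.
    public static void SetTranslation(XDocument doc, string id, string text)
    {
        XElement unit = doc.Descendants("trans-unit")
                           .First(e => (string)e.Attribute("id") == id);
        unit.Value = text;
    }

    // Writes the document as UTF-8 without a byte order mark.
    public static void SaveWithoutBom(XDocument doc, string path)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false); // false = no BOM
        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            doc.Save(writer);
        }
    }

    public static void Main()
    {
        // Hypothetical stand-in for the real file.
        XDocument doc = XDocument.Parse(
            "<root><trans-unit id=\"127\">old</trans-unit></root>");
        SetTranslation(doc, "127", "testing");

        string path = Path.Combine(Path.GetTempPath(), "translated-linq.xml");
        SaveWithoutBom(doc, path);

        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine(bytes[0] == 0xEF ? "BOM present" : "no BOM");
    }
}
```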

Related

Namespaces, Schemas, Elements and Attributes in an XmlDocument in .NET

I'm putting this here because I saw a lot of Q&A for XML on StackOverflow while trying to solve my own problems, and figured that once I'd found it, I'd post what I found so when someone else needs some XML help, this might help them.
My goal: To create an XML document that contains the following XML Declaration, Schema & Namespace Information:
<?xml version="1.0" encoding="UTF-8"?>
<abc:abcXML xsi:schemaLocation="urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ase="urn:abcXML:v12">
I'd already done it in Python for a quick prototype using minidom, and it was very simple. I needed to do it in a .NET language though (C#), because that's what the business calls for. I'm quite familiar with C#, but I've always stayed away from processing XML with it because I honestly don't have an in-depth grasp of XML and its guidelines. Today, I had to face my demons.

Here's how I did it:
The first part is simple enough - create a document, and create a DocumentElement for the root (there's a catch here which I get to later):
XmlDocument xmlDoc = new XmlDocument();
XmlDeclaration xmlDeclaration = xmlDoc.CreateXmlDeclaration("1.0", "UTF-8", null);
XmlElement root = xmlDoc.DocumentElement;
xmlDoc.InsertBefore(xmlDeclaration, root);
The next part seems simple enough - create an element, give it a prefix, name and URI, then append it to the document. I thought this would work, but it doesn't (this is where the minimal understanding of XML comes into play):
XmlElement abcXML = xmlDoc.CreateElement("ase", "abcXML", "urn:abcXML:r38 http://www.w3.org/2001/XMLSchema-instance");
XmlAttribute xmlAttr = xmlDoc.CreateAttribute("xsi:schemaLocation", "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd");
abcXML.AppendChild(xmlAttr);
xmlDoc.AppendChild(abcXML);
I tried to use doc.LoadXml() and doc.CreateDocumentFragment() and write my own declarations. No - I would get "Unexpected end of file". For those interested in XmlDocumentFragment: https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmldocumentfragment.innerxml?view=netcore-3.1
This Microsoft article about XML Schemas and Namespaces didn't directly help me: https://learn.microsoft.com/en-us/dotnet/standard/data/xml/including-or-importing-xml-schemas
After doing more reading on XML, and going through the documentation for XmlDocument, XmlElement and XmlAttribute, this is the solution:
XmlElement abcXML = xmlDoc.CreateElement("ase", "abcXML", "urn:abcXML:r38");
XmlAttribute xmlAttr = xmlDoc.CreateAttribute("xsi:schemaLocation", "http://www.w3.org/2001/XMLSchema-instance");
xmlAttr.InnerXml = "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd";
abcXML.Attributes.Append(xmlAttr);
xmlDoc.AppendChild(abcXML);
Now you can add the elements to your document like so:
XmlElement header = xmlDoc.CreateElement(string.Empty, "Header", string.Empty);
abcXML.AppendChild(header);
To save the document, I used:
xmlDoc.Save(fileLocation);
I compared my output to the sample I had, and the file contents matched. I provided the output to the client, they uploaded it into the application they were using, and it failed: Row 1, Column 1 - Unexpected Character.
I had a suspicion it was encoding, and I was right. Using xmlDoc.Save(fileLocation) is correct, but it generates a UTF-8 file with the Byte Order Mark (BOM) at Row 1, Column 1. The XML parsing function in the application doesn't expect that, so the process failed. To fix that, I used the following method:
Encoding enc = new UTF8Encoding(false); /* This creates a UTF-8 encoding without the BOM */
using (System.IO.TextWriter tw = new System.IO.StreamWriter(filePath, false, enc))
{
xmlDoc.Save(tw);
}
return true;
I generated the file again, sent it to the client, and it worked first go.
I hope someone finds this to be useful.
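Pulling the steps above into one place, here is a rough sketch of building the root element. It differs from the snippets above in two ways that are my assumptions, not the original code: it uses the three-argument CreateAttribute(prefix, localName, namespaceURI) overload so the xsi prefix is bound explicitly, and it uses "urn:abcXML:v12" for the ase namespace to match the stated goal. NamespaceRoot and Build are illustrative names.

```csharp
using System;
using System.Xml;

public static class NamespaceRoot
{
    // Assumed namespace URIs, taken from the target XML in the post.
    const string AseNs = "urn:abcXML:v12";
    const string XsiNs = "http://www.w3.org/2001/XMLSchema-instance";

    public static XmlDocument Build()
    {
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.AppendChild(xmlDoc.CreateXmlDeclaration("1.0", "UTF-8", null));

        // Root element: prefix "ase", local name "abcXML", bound to AseNs.
        XmlElement root = xmlDoc.CreateElement("ase", "abcXML", AseNs);

        // xsi:schemaLocation: the namespace URI belongs to the attribute,
        // the "namespace schema-URL" pair is its value.
        XmlAttribute schemaLocation =
            xmlDoc.CreateAttribute("xsi", "schemaLocation", XsiNs);
        schemaLocation.Value =
            AseNs + " http://www.test.com/XML/schemas/v12/abcXML_v12.xsd";
        root.Attributes.Append(schemaLocation);
        xmlDoc.AppendChild(root);

        // Child element without a namespace, as in the post.
        XmlElement header = xmlDoc.CreateElement(string.Empty, "Header", string.Empty);
        root.AppendChild(header);
        return xmlDoc;
    }

    public static void Main()
    {
        Console.WriteLine(Build().OuterXml);
    }
}
```

The writer emits the xmlns:xsi and xmlns:ase declarations itself when the document is serialized, so there is no need to add them by hand.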

For complicated namespaces it is simpler to just parse the XML string. I like using LINQ to XML. Your sample XML is wrong: the namespace prefix is "ase" (not abc).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<ase:abcXML xsi:schemaLocation=\"urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd\"" +
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"" +
" xmlns:ase=\"urn:abcXML:v12\">" +
"</ase:abcXML>";
XDocument doc = XDocument.Parse(xml);
XElement root = doc.Root;
XNamespace nsAse = root.GetNamespaceOfPrefix("ase");
}
}
}

C# Parsing XML in ISO-8859-1

I'm working on a tool for validating XML files grabbed from a mainframe. For reasons beyond my control every XML file is encoded in ISO 8859-1.
<?xml version="1.0" encoding="ISO 8859-1"?>
My C# application uses the System.Xml library to parse the XML and eventually extract the string of a message contained in one of the child nodes.
If I manually remove the XML encoding line it works just fine, but I'd like to find a solution that doesn't require manual intervention. Are there any elegant approaches to solving this? Thanks in advance.
The exception that is thrown reads as:
'System.Xml.XmlException' occurred in System.Xml.dll. System does not support 'ISO 8859-1' encoding. Line 1, position 31
My code is
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(fileLocation); // fileLocation: path to the XML file

As Jeroen pointed out in a comment, the encoding should be:
<?xml version="1.0" encoding="ISO-8859-1"?>
not:
<?xml version="1.0" encoding="ISO 8859-1"?>
(missing dash -).
You can use a StreamReader with an explicit encoding to read the file anyway:
using (var reader = new StreamReader("//fileLocation", Encoding.GetEncoding("ISO-8859-1")))
{
var xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
// ...
}
(from answer by competent_tech in other thread I linked in an earlier comment).
If you do not want the using statement, I guess you can do:
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(File.ReadAllText("//fileLocation", Encoding.GetEncoding("ISO-8859-1")));
Instead of XmlDocument, you can use the XDocument class in the System.Xml.Linq namespace if you reference the assembly System.Xml.Linq.dll (available since .NET 3.5). It has static methods like Load(Stream) and Parse(string) which you can use as above.
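A small sketch of the XDocument variant, reading through a StreamReader with an explicit ISO-8859-1 encoding. The sample file is written by the example itself with a correctly spelled declaration, and the class name, path and content are invented for the demo.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class Latin1Load
{
    // Reads the file as ISO-8859-1 regardless of any BOM detection defaults.
    public static XDocument Load(string path)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        using (StreamReader reader = new StreamReader(path, latin1))
        {
            return XDocument.Load(reader);
        }
    }

    public static void Main()
    {
        // Hypothetical sample file, written in ISO-8859-1 for the demo.
        string path = Path.Combine(Path.GetTempPath(), "latin1-sample.xml");
        File.WriteAllText(
            path,
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" +
            "<message>Gr\u00FC\u00DFe</message>",
            Encoding.GetEncoding("ISO-8859-1"));

        XDocument doc = Load(path);
        // The umlaut and sharp s survive the round trip.
        Console.WriteLine(doc.Root.Value == "Gr\u00FC\u00DFe");
    }
}
```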

How to transform XMLDocument using XSLT in C# 2.0

I am using C# 2.0 and I have got below code:
XmlDocument doc = new XmlDocument();
doc.LoadXml(GetListOfPagesInStructureGroup(m_Page.Id));
Above, I load my XmlDocument from a method that returns a string. After some processing on that XmlDocument, I want to apply an XSLT to it to render my desired result, and finally my function should return the whole rendered XML as a string.
Please suggest!!

Please suggest on below solution:
XslCompiledTransform xslTransform = new XslCompiledTransform();
StringWriter writer = new StringWriter();
xslTransform.Load("xslt/RenderDestinationTabXML.xslt");
xslTransform.Transform(doc.CreateNavigator(),null, writer);
return writer.ToString();
Thanks!!

Try the XslCompiledTransform class.

There are lots of examples on the web of transforming an XML file to a different format using an XSLT file, like the following:
XslCompiledTransform myXslTransform = new XslCompiledTransform();
XsltSettings myXsltSettings = new XsltSettings();
myXsltSettings.EnableDocumentFunction = true;
// Pass the settings to Load, otherwise they have no effect.
myXslTransform.Load("transform.xsl", myXsltSettings, new XmlUrlResolver());
myXslTransform.Transform("input.xml", "output.xml");
However, this is only a partial answer. I would like to get the XML input data from a web form and use that as the input instead of an '.xml' file, but I have not found any concrete examples. In Visual Studio I can see the available constructors and methods, and none of them appears to accept XML data as a string, so an example of that would be very helpful.

Re " I want my same XMlDocument updated " - it doesn't work like that; the output is separate to the input. If that is important, just use a StringWriter or MemoryStream as the destination, then reload the XmlDocument from the generated output.
Consider in particular: the output from an xslt transformation does not have to be xml, and also: the xslt is most likely using the node tree during the operation; changing the structure in-place would make that very hard.
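To make the "transform, then reload" suggestion concrete, here is a minimal sketch. It assumes the stylesheet produces well-formed XML (otherwise the reload step throws), and it loads the stylesheet from a string only so the example is self-contained. The input document, the stylesheet and the names TransformAndReload and Transform are invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TransformAndReload
{
    // Applies the stylesheet to the input and returns the result
    // as a new XmlDocument; the input document is left untouched.
    public static XmlDocument Transform(XmlDocument input, string xsltText)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(stylesheet);
        }

        StringWriter output = new StringWriter();
        xslt.Transform(input, null, output);

        // Reload the generated XML into a fresh document.
        XmlDocument result = new XmlDocument();
        result.LoadXml(output.ToString());
        return result;
    }

    public static void Main()
    {
        // Hypothetical input and stylesheet.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<pages><page title='Home'/><page title='About'/></pages>");

        string xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:template match='/pages'><list>" +
            "<xsl:for-each select='page'><item><xsl:value-of select='@title'/></item></xsl:for-each>" +
            "</list></xsl:template>" +
            "</xsl:stylesheet>";

        XmlDocument result = Transform(doc, xslt);
        Console.WriteLine(result.SelectNodes("/list/item").Count);
    }
}
```

If you only need the rendered text, return output.ToString() and skip the reload.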

How to transform an xml string using a XSLT in C#

I'd like to transform a string that contains XML using an XSLT. It's for a Colombian company, so I have the following code (don't try to understand it):
string xmlTFDNode = @"<tfd:TimbreFiscalDigital xmlns:tfd=""http://www.sat.gob.mx/TimbreFiscalDigital"" xsi:schemaLocation=""http://www.sat.gob.mx/TimbreFiscalDigital TimbreFiscalDigital.xsd"" selloCFD=""tOSe+Ex/wvn33YlGwtfmrJwQ31Crd7lI9VcH63TGjHfxk5vfb3q9uSbDUGk9TXvo70ydOpikRVw+9B2Six0m bu3PjoPpO909oAYITrRyomdeUGJ4vmA2/12L86EJLWpU7vIt4cL8HpkEw7TOFhSdpzb/890+jP+C1adBsHU1VHc="" FechaTimbrado=""2010-03-06T20:40:10"" UUID=""ad662d33-6934-459c-a128-bdf0393e0f44"" noCertificadoSAT=""30001000000100000801"" version=""1.0"" selloSAT=""j5bSpqM3w0+shGtImqOwqqy6+d659O78ckfstu5vTSFa+2CVMj6Awfr18x4yMLGBwk6ruYbjBlVURodEIl6n JIhTTUtYQV1cbRDG9kvvhaNAakxqaSOnOx79nHxqFPRVoqh10CsjocS9PZkSM2jz1uwLgaF0knf1g8pjDkLYwlk=""/>";
and I have an XSLT stored on the server named InvoiceTFD.xslt
This is the XSLT file
I want to create a method that returns a string with the transformed data. It should look like this (that's what the XSLT does):
||1.0|ad662d33-6934-459c-a128-bdf0393e0f44|2001-12-
17T09:30:47Z|iYyIk1MtEPzTxY3h57kYJnEXNae9lvLMgAq3jGMePsDtEOF6XLWbrV2GL/
2TX00vP2+YsPN+5UmyRdzMLZGEfESiNQF9fotNbtA487dWnCf5pUu0ikVpgHvpY7YoA4
lB1D/JWc+zntkgW+Ig49WnlKyXi0LOlBOVuxckDb7EAx4=|12345678901234 567890||
The problem is that the XslTransform.Transform method creates a new file, and I don't want to write a file.
To recap, I just want to take a string, transform it using an XSLT file I have, and return a string with the result without creating files on the server. That's it!
I believe it's not that hard, but I'm new to .NET so I really don't know how to do it :(
Thanks in advance and have a great day guys !!

You can write to a memory stream:
MemoryStream oStream = new MemoryStream();
oXslt.Transform(new XPathDocument(new XmlNodeReader(oXml)), null, oStream);
oStream.Position = 0;
StreamReader oReader = new StreamReader(oStream);
string output = oReader.ReadToEnd();
BTW, use XPathDocument and XslCompiledTransform. They are much faster than XslTransform and XmlDocument. Even if you use an XmlDocument to create the XML, convert it to an XPathDocument for the transform.

The Transform method has overloads that write to a Stream or to a TextWriter. You can create a StringWriter and pass it as the output.

Use XslCompiledTransform instead of XslTransform. Use the XslCompiledTransform.Transform method and write the result to an output stream.
As far as I can see, your XSLT is version 2.0. Neither class supports XSLT 2.0.
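A string-in, string-out sketch that combines the suggestions above: XPathDocument for the input, XslCompiledTransform for the stylesheet, and a StringWriter for the output, so nothing touches the disk. The real InvoiceTFD.xslt is not shown in the question, so the XSLT 1.0 stylesheet and the cut-down input below are hypothetical stand-ins, as are the names StringTransform and Transform.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class StringTransform
{
    // Transforms an XML string with an XSLT 1.0 stylesheet string
    // and returns the output as a string. No files are written.
    public static string Transform(string xml, string xsltText)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(stylesheet);
        }

        using (StringReader xmlText = new StringReader(xml))
        using (StringWriter output = new StringWriter())
        {
            XPathDocument doc = new XPathDocument(xmlText);
            xslt.Transform(doc, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // Cut-down, hypothetical input with two of the attributes from the question.
        string xml = "<stamp version='1.0' UUID='ad662d33-6934-459c-a128-bdf0393e0f44'/>";

        // Hypothetical stylesheet that emits pipe-delimited text.
        string xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:output method='text'/>" +
            "<xsl:template match='/stamp'>" +
            "||<xsl:value-of select='@version'/>|<xsl:value-of select='@UUID'/>||" +
            "</xsl:template>" +
            "</xsl:stylesheet>";

        Console.WriteLine(Transform(xml, xslt));
        // prints ||1.0|ad662d33-6934-459c-a128-bdf0393e0f44||
    }
}
```

In a web application you would normally load and cache the compiled stylesheet once rather than on every call.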

Correcting Encoding in a large Xml File

I'm importing data from XML files containing this type of content:
<FirstName>™MšR</FirstName><MiddleName/><LastName>HšNER™Z</LastName>
The XML is loaded via:
XmlDocument doc = new XmlDocument();
try
{
doc.Load(fullFilePath);
}
When I execute this code with the data shown above, I get an exception about an illegal character. I understand that part just fine.
I'm not sure which encoding this is or how to solve this problem. Is there a way I can change the encoding of the XmlDocument or another method to make sure the above content is parsed correctly?
Update: I do not have any encoding declaration or <?xml in this document.
I've seen some links say to add it dynamically? Is this UTF-16 encoding?

It appears that:
The name was ÖMÜR HÜNERÖZ (or possibly ÔMÜR HÜNERÔZ or ÕMÜR HÜNERÕZ; I don't know what language that is).
The XML file was encoded using the DOS "OEM" code page, probably 437 or 850.
But it was decoded using windows-1252 (the "ANSI" code page).

If you look at the file with a hex editor (HXD or Visual Studio, for instance), what exactly do you see?
Is every character from the string you posted represented by a single byte? Does the file have a byte-order mark (a bunch of non-printable bytes at the start of the file)?
The ™ and š seem to indicate that something went pretty wrong with encoding/conversion along the way, but let's see... I guess they both correspond with a vowel (O-M-A-R H-A-NER-O-Z, maybe?), but I haven't figured out yet how they ended up looking like this...
Edit: dan04 hit the nail on the head. ™ in cp-1252 has hex value 99, and š is 9a. In cp-437 and cp-850, hex 99 represents Ö, and 9a Ü.
The fix is simple: just specify this encoding when opening your XML file:
XmlDocument doc = new XmlDocument();
using (var reader = new StreamReader(fileName, Encoding.GetEncoding(437)))
{
doc.Load(reader);
}
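To see the mix-up for yourself, here is a tiny sketch that decodes the same four bytes with both code pages. It assumes .NET Core 3.0 or later (or .NET 5+), where the legacy code pages have to be registered through CodePagesEncodingProvider first; on the classic .NET Framework Encoding.GetEncoding(437) works without that step. CodePageDemo and Decode are illustrative names.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    static CodePageDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where code pages
        // 437 and 1252 are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes the bytes with the given Windows/DOS code page.
    public static string Decode(byte[] bytes, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }

    public static void Main()
    {
        // The bytes behind the first name in the question: 99 4D 9A 52.
        byte[] raw = { 0x99, 0x4D, 0x9A, 0x52 };

        // Read as windows-1252 they come out as the garbled text ...
        Console.WriteLine(Decode(raw, 1252) == "\u2122M\u0161R"); // True
        // ... read as code page 437 they give the intended name.
        Console.WriteLine(Decode(raw, 437) == "\u00D6M\u00DCR");  // True
    }
}
```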

From here:
Encoding encoding;
using (var stream = new MemoryStream(bytes))
{
using (var xmlreader = new XmlTextReader(stream))
{
xmlreader.MoveToContent();
encoding = xmlreader.Encoding;
}
}
You might want to take a look at this: How to best detect encoding in XML file?
For actual reading you can use StreamReader to take care of BOM(Byte order mark):
string xml;
// The second argument is detectEncodingFromByteOrderMarks.
using (var reader = new StreamReader("FilePath", true))
{
xml = reader.ReadToEnd();
}
Edit: Removed the encoding parameter. StreamReader will detect the encoding of a file if the file contains a BOM. If it does not it will default to UTF8.
Edit 2: Detecting Text Encoding for StreamReader
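A short sketch of what BOM detection amounts to for UTF-8: compare the first bytes against Encoding.UTF8.GetPreamble(), and let StreamReader strip the mark when reading. It works on an in-memory byte array so it runs without a file; BomCheck, HasUtf8Bom and ReadText are made-up names.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomCheck
{
    // True when the data starts with the UTF-8 preamble (EF BB BF).
    public static bool HasUtf8Bom(byte[] bytes)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        if (bytes.Length < bom.Length) return false;
        for (int i = 0; i < bom.Length; i++)
        {
            if (bytes[i] != bom[i]) return false;
        }
        return true;
    }

    // Decodes the bytes; a leading BOM, if any, is consumed by StreamReader.
    public static string ReadText(byte[] bytes)
    {
        using (StreamReader reader =
            new StreamReader(new MemoryStream(bytes), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // EF BB BF followed by "<a/>".
        byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x3C, 0x61, 0x2F, 0x3E };

        Console.WriteLine(HasUtf8Bom(withBom));      // True
        Console.WriteLine(ReadText(withBom).Length); // 4, the BOM is not part of the text
    }
}
```

Note that a BOM only tells you about the Unicode encodings; it cannot identify a legacy code page such as 437 or 1252.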

Obviously you provided a fragment of the XML document since it's missing a root element, so I'll assume that was your intention. Is there an xml processing instruction at the top like <?xml version="1.0" encoding="UTF-8" ?>?
