C# Parsing XML in ISO-8859-1 - c#

I'm working on a tool for validating XML files grabbed from a mainframe. For reasons beyond my control every XML file is encoded in ISO 8859-1.
<?xml version="1.0" encoding="ISO 8859-1"?>
My C# application uses the System.Xml library to parse the XML and eventually extract a message string from one of the child nodes.
If I manually remove the XML encoding line it works just fine. But I'd like to find a solution that doesn't require manual intervention. Are there any elegant approaches to solving this? Thanks in advance.
The exception that is thrown reads as:
'System.Xml.XmlException' occurred in System.Xml.dll. System does not support 'ISO 8859-1' encoding. Line 1, position 31
My code is
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(fileLocation); // fileLocation: path to the XML file

As Jeroen pointed out in a comment, the encoding should be:
<?xml version="1.0" encoding="ISO-8859-1"?>
not:
<?xml version="1.0" encoding="ISO 8859-1"?>
(missing dash -).
You can use a StreamReader with an explicit encoding to read the file anyway:
using (var reader = new StreamReader("//fileLocation", Encoding.GetEncoding("ISO-8859-1")))
{
var xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
// ...
}
(from answer by competent_tech in other thread I linked in an earlier comment).
If you do not want the using statement, I guess you can do:
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(File.ReadAllText("//fileLocation", Encoding.GetEncoding("ISO-8859-1")));
Instead of XmlDocument, you can use the XDocument class in the namespace System.Xml.Linq if you reference the assembly System.Xml.Linq.dll (since .NET 3.5). It has static methods like Load(Stream), Load(TextReader) and Parse(string) which you can use as above.
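As a rough sketch of the XDocument variant, the helper below decodes the file as ISO-8859-1 itself and hands the parser a TextReader, mirroring the StreamReader approach above. The class name Latin1XmlLoader and its Load method are made up for illustration; only StreamReader, Encoding.GetEncoding and XDocument.Load come from the framework.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class Latin1XmlLoader
{
    // Reads the file with an explicit ISO-8859-1 decoder and parses the
    // resulting text, so the file's bytes are decoded by the StreamReader.
    public static XDocument Load(string path)
    {
        using (var reader = new StreamReader(path, Encoding.GetEncoding("ISO-8859-1")))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Usage would look like `XDocument doc = Latin1XmlLoader.Load(fileLocation);`, after which `doc.Root` can be queried with LINQ to XML.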

Related

How do I edit Node Values in an Xml File with C#

I am trying to change the values in a Farming Simulator 22 savegame XML file from C# in Visual Studio. There are a lot of nodes, so I have reduced them to make things easier. I want to know how to replace the value in a node using C# without having to create and rebuild the XML file from scratch.
the path to the xml file is: (C:\Users\Name\Documents\My Games\FarmingSimulator2022\savegame1\careerSavegame.xml)
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<careerSavegame revision="2" valid="true">
<settings>
<savegameName>My game save</savegameName>
<creationDate>2022-05-03</creationDate>
<mapId>MapFR</mapId>
<mapTitle>Haut-Beyleron</mapTitle>
<saveDateFormatted>2022-08-22</saveDateFormatted>
<saveDate>2022-08-22</saveDate>
<resetVehicles>false</resetVehicles>
</settings>
</careerSavegame>
You can use the System.Xml.Linq namespace to access the xml file. This will load the file in the memory.
There is one class inside it, XDocument, that represents the xml document.
string filePath = @"C:\Users\Name\Documents\My Games\FarmingSimulator2022\savegame1\careerSavegame.xml";
XDocument xdoc = XDocument.Load(filePath);
// Descendants searches the whole tree; Single() needs "using System.Linq;"
var element = xdoc.Descendants("MyXmlElement").Single();
element.Value = "foo";
xdoc.Save("file.xml");
Point the element variable at whichever element you need to change.
Through some research I found the solution to editing the values within the nodes. In this example I only change the value of savegameName, but it will be the same for the rest.
// Load the XML file
XmlDocument xmlsettings = new XmlDocument();
xmlsettings.Load(@"D:\careerSavegame.xml");
// Set the node's value through InnerText
string farmNameSetting = "Martek Farm";
XmlNode savegameNameNode = xmlsettings.SelectSingleNode("careerSavegame/settings/savegameName");
savegameNameNode.InnerText = farmNameSetting;
// Write the change back to disk
xmlsettings.Save(@"D:\careerSavegame.xml");

Namespaces, Schemas, Elements and Attributes in an XmlDocument in .NET

I'm putting this here because I saw a lot of Q&A for XML on StackOverflow while trying to solve my own problems, and figured that once I'd found it, I'd post what I found so when someone else needs some XML help, this might help them.
My goal: To create an XML document that contains the following XML Declaration, Schema & Namespace Information:
<?xml version="1.0" encoding="UTF-8"?>
<abc:abcXML xsi:schemaLocation="urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ase="urn:abcXML:v12">
I'd already done it in Python for a quick prototype using minidom, and it was very simple. I needed to do it in a .NET language though (C#), because that's what the business calls for. I'm quite familiar with C#, but I've always stayed away from processing XML with it because I honestly don't have an in-depth grasp of XML and its guidelines. Today, I had to face my demons.
Here's how I did it:
The first part is simple enough - create a document, and create a DocumentElement for the root (there's a catch here which I get to later):
XmlDocument xmlDoc = new XmlDocument();
XmlDeclaration xmlDeclaration = xmlDoc.CreateXmlDeclaration("1.0", "UTF-8", null);
XmlElement root = xmlDoc.DocumentElement;
xmlDoc.InsertBefore(xmlDeclaration, root);
The next part seems simple enough - create an element, give it a prefix, name and URI, then append it to the document. I thought this would work, but it doesn't (this is where the minimal understanding of XML comes into play):
XmlElement abcXML = xmlDoc.CreateElement("ase", "abcXML", "urn:abcXML:r38 http://www.w3.org/2001/XMLSchema-instance");
XmlAttribute xmlAttr = xmlDoc.CreateAttribute("xsi:schemaLocation", "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd");
abcXML.AppendChild(xmlAttr);
xmlDoc.AppendChild(abcXML);
I tried to use doc.LoadXml() and doc.CreateDocumentFragment() and write my own declarations. No - I would get "Unexpected end of file". For those interested in XmlDocumentFragment: https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmldocumentfragment.innerxml?view=netcore-3.1
This Microsoft article about XML Schemas and Namespaces didn't directly help me: https://learn.microsoft.com/en-us/dotnet/standard/data/xml/including-or-importing-xml-schemas
After doing more reading on XML, and going through the documentation for XmlDocument, XmlElement and XmlAttribute, this is the solution:
XmlElement abcXML = xmlDoc.CreateElement("ase", "abcXML", "urn:abcXML:v12");
XmlAttribute xmlAttr = xmlDoc.CreateAttribute("xsi:schemaLocation", "http://www.w3.org/2001/XMLSchema-instance");
xmlAttr.InnerXml = "urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd";
abcXML.Attributes.Append(xmlAttr);
xmlDoc.AppendChild(abcXML);
Now you can add the elements to your document like so:
XmlElement header = xmlDoc.CreateElement(string.Empty, "Header", string.Empty);
abcXML.AppendChild(header);
To save the document, I used:
xmlDoc.Save(fileLocation);
I compared my output to the sample I had, and the file contents matched. I provided the output to the client, they uploaded it into the application they were using, and it failed: Row 1, Column 1 - Unexpected Character.
I had a suspicion it was encoding, and I was right. Using xmlDoc.Save(fileLocation) is correct, but it generates a UTF-8 file with the Byte Order Mark (BOM) at Row 1, Column 1. The XML parsing function in the application doesn't expect that, so the process failed. To fix that, I used the following method:
Encoding enc = new UTF8Encoding(false); /* This creates a UTF-8 encoding without the BOM */
using (System.IO.TextWriter tw = new System.IO.StreamWriter(filePath, false, enc))
{
xmlDoc.Save(tw);
}
return true;
I generated the file again, sent it to the client, and it worked first go.
I hope someone finds this to be useful.
For complicated namespaces it is simpler to just parse the XML string. I like using LINQ to XML. Your sample XML is wrong: the namespace prefix is "ase" (not "abc").
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<ase:abcXML xsi:schemaLocation=\"urn:abcXML:v12 http://www.test.com/XML/schemas/v12/abcXML_v12.xsd\"" +
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"" +
" xmlns:ase=\"urn:abcXML:v12\">" +
"</ase:abcXML>";
XDocument doc = XDocument.Parse(xml);
XElement root = doc.Root;
XNamespace nsAse = root.GetNamespaceOfPrefix("ase");
}
}
}

Xdocument, Xelement.Save incorrect encoding

I'm having a problem with the following code:
string serializedLicence = SerializationHelper.ToXML(licenceInfo);
var licenceFileXml = new XElement("Licence", new XElement("LicenceData", serializedLicence));
XmlDocument signedLicence = SignXml(licenceFileXml.ToString(), Properties.Resources.PRIVATE_KEY);
signedLicence.Save(saveFileDialogXmlLicence.FileName);
In the created file, the strings passed to the XElement constructors are encoded incorrectly, as is the signature, which is added by the custom SignXml() method (it creates the signature with XmlDocument.DocumentElement.AppendChild(), but that's irrelevant right now). The output:
<?xml version="1.0" encoding="utf-16" standalone="yes"?>
<Licence>
<LicenceData>&lt;?xml version="1.0" encoding="utf-16"?&gt;
&lt;LicenceInfo
//stuff stuff stuff
&lt;/LicenceInfo&gt;</LicenceData>
<Signature><SignedInfo xmlns="h stuff stuff stuff</Signature>
</Licence>
So basically I'm taking the serialized object string and putting it between tags, and this part gets encoded wrong. The debugger shows me that the text in the XElement object is holding &lt; and &gt; right after it is created. I could parse it manually, but that's inappropriate.
Note: before that, I was signing the serialized XML directly and it worked fine, so I can't figure out why XDocument uses a different encoding than the XmlSerializer/XmlDocument objects.
Also: I think I could just use XmlDocument object to build the file, but I'm curious what's wrong.
You're adding serializedLicence as a string, so it's treated as text, not as XML, and that's why it looks like that in your document. Parse it first and add the resulting element instead:
var licenceFileXml = new XElement("Licence",
new XElement("LicenceData",
XDocument.Parse(serializedLicence).Root));
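To make the difference concrete, here is a small sketch that builds the same element both ways. The class EmbedDemo, its two methods, and the sample licence string in the usage note are placeholders invented for this illustration; the XElement and XDocument calls are the standard LINQ to XML API.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class, not from the question's code base.
public static class EmbedDemo
{
    // A string argument becomes a text node, so markup characters in it
    // are escaped when the element is written out.
    public static XElement AsText(string serialized)
    {
        return new XElement("LicenceData", serialized);
    }

    // Parsing first yields a real child element instead of escaped text.
    public static XElement AsNode(string serialized)
    {
        return new XElement("LicenceData", XDocument.Parse(serialized).Root);
    }
}
```

With an input such as `"<LicenceInfo><Owner>Test</Owner></LicenceInfo>"`, `AsText(...).HasElements` is false while `AsNode(...).Element("LicenceInfo")` returns the nested element.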

Convert Encoded XML Tags To Nodes

I have the following XML obtained via a SOAP call.
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetValueListForFieldResponse xmlns="http://URLHere/webservices/">
<GetValueListForFieldResult>
&lt;SelectDef&gt; &lt;Id&gt;1736&lt;/Id&gt; &lt;SelectName&gt;Values List&lt;/SelectName&gt; &lt;GlobalFlag&gt;False&lt;/GlobalFlag&gt; &lt;Sort&gt;1&lt;/Sort&gt; &lt;SelectDefValues&gt; &lt;SelectDefValue&gt; &lt;guid&gt;aaf6f3a7-6a74-4187-b4e7-3a9355b16796&lt;/guid&gt; &lt;Id&gt;14692&lt;/Id&gt; &lt;Name&gt;Open&lt;/Name&gt; &lt;Description&gt;&lt;/Description&gt; &lt;Color&gt;#000000&lt;/Color&gt; &lt;DefaultFlag&gt;False&lt;/DefaultFlag&gt; &lt;SortOrder&gt;1&lt;/SortOrder&gt; &lt;select_id&gt;1736&lt;/select_id&gt; &lt;/SelectDefValue&gt; &lt;SelectDefValue&gt; &lt;guid&gt;f5082b54-d799-4fdc-80c1-0e232b360057&lt;/guid&gt; &lt;Id&gt;14693&lt;/Id&gt; &lt;Name&gt;Closed&lt;/Name&gt; &lt;Description&gt;&lt;/Description&gt; &lt;Color&gt;#000000&lt;/Color&gt; &lt;DefaultFlag&gt;False&lt;/DefaultFlag&gt; &lt;SortOrder&gt;0&lt;/SortOrder&gt; &lt;select_id&gt;1736&lt;/select_id&gt; &lt;/SelectDefValue&gt; &lt;SelectDefValue&gt; &lt;guid&gt;94e29e78-2ab3-463f-bbb6-ab7f36003c7f&lt;/guid&gt; &lt;Id&gt;14780&lt;/Id&gt; &lt;Name&gt;Past Due&lt;/Name&gt; &lt;Description&gt;&lt;/Description&gt; &lt;Color&gt;#000000&lt;/Color&gt; &lt;DefaultFlag&gt;False&lt;/DefaultFlag&gt; &lt;SortOrder&gt;2&lt;/SortOrder&gt; &lt;select_id&gt;1736&lt;/select_id&gt; &lt;/SelectDefValue&gt; &lt;/SelectDefValues&gt; &lt;/SelectDef&gt;
</GetValueListForFieldResult>
</GetValueListForFieldResponse>
</soap:Body>
</soap:Envelope>
Is there a way to convert the data in the GetValueListForFieldResult node to actual XML so I can parse the data?
Below is how I'm making the SOAP call and storing the XML. I'm learning C#, so if the code below is a complete mess, my apologies.
HttpWebRequest reqVl = (HttpWebRequest)WebRequest.Create(serverURL + "/ws/Field.asmx");
reqVl.Headers.Add("SOAPAction", "http://URL/webservices/GetValueListForField");
reqVl.ContentType = "text/xml;charset=\"utf-8\"";
reqVl.Accept = "text/xml";
reqVl.Method = "POST";
using (Stream stm = reqVl.GetRequestStream())
{
using (StreamWriter stmw = new StreamWriter(stm))
{
stmw.Write(VLsoap);
}
}
WebResponse responseVL = reqVl.GetResponse();
Stream responseStreamVL = responseVL.GetResponseStream();
XmlReader rdrVL = XmlReader.Create(responseStreamVL);
XmlDocument vls = new XmlDocument();
vls.Load(rdrVL);
Here is some code to achieve what you want - however, please read the text afterwards for an explanation of why this may not be the best way to get what you want.
Tested as working with your message and .Net 4.
Assuming vls contains your SOAP message as XML, we split the problem into two halves: parsing the SOAP message, and extracting and decoding the contents of the GetValueListForFieldResult node into a string that can be loaded into another XmlDocument.
Part I - getting the contents of the GetValueListForFieldResult node
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(vls.NameTable);
namespaceManager.AddNamespace("soap", "http://schemas.xmlsoap.org/soap/envelope/");
namespaceManager.AddNamespace("default", "http://URLHere/webservices/");
XmlNode payLoadNode =
vls.SelectSingleNode("/soap:Envelope/soap:Body/default:GetValueListForFieldResponse/default:GetValueListForFieldResult", namespaceManager);
string encodedXml = payLoadNode.InnerText;
Part II - getting the encoded string into an Xml Document
It is at this point that we have the encoded string. We have a couple of choices for decoding this HTML; as I'm using .Net 4 I've gone for the simplest:
string decodedXml = WebUtility.HtmlDecode(encodedXml);
XmlDocument payloadDocument = new XmlDocument();
payloadDocument.LoadXml(decodedXml);
If you are using .Net 3.5 then you'll have to consider adding a reference to System.Web and using HttpUtility.HtmlDecode instead to decode the string.
Parsing your message above gives me the result:
<SelectDef>
<Id>1736</Id>
<SelectName>Values List</SelectName>
<GlobalFlag>False</GlobalFlag>
<Sort>1</Sort>
<SelectDefValues>
<SelectDefValue>
<guid>aaf6f3a7-6a74-4187-b4e7-3a9355b16796</guid>
<Id>14692</Id>
<Name>Open</Name>
<Description></Description>
<Color>#000000</Color>
<DefaultFlag>False</DefaultFlag>
<SortOrder>1</SortOrder>
<select_id>1736</select_id>
</SelectDefValue>
<SelectDefValue>
<guid>f5082b54-d799-4fdc-80c1-0e232b360057</guid>
<Id>14693</Id>
<Name>Closed</Name>
<Description></Description>
<Color>#000000</Color>
<DefaultFlag>False</DefaultFlag>
<SortOrder>0</SortOrder>
<select_id>1736</select_id>
</SelectDefValue>
<SelectDefValue>
<guid>94e29e78-2ab3-463f-bbb6-ab7f36003c7f</guid>
<Id>14780</Id>
<Name>Past Due</Name>
<Description></Description>
<Color>#000000</Color>
<DefaultFlag>False</DefaultFlag>
<SortOrder>2</SortOrder>
<select_id>1736</select_id>
</SelectDefValue>
</SelectDefValues>
</SelectDef>
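One observation that may simplify Part II: when XmlDocument parses the outer message it already resolves entity references such as &lt; and &gt; in text nodes, so InnerText should come back as plain markup. Under that assumption the extra HtmlDecode call is not strictly needed, although it does no harm as long as the payload's own text contains nothing that looks like an entity. The sketch below uses a made-up PayloadDemo class and a single wrapper element rather than the full SOAP envelope.

```csharp
using System;
using System.Xml;

// Hypothetical demo class; the wrapper element stands in for
// GetValueListForFieldResult.
public static class PayloadDemo
{
    public static XmlDocument ExtractPayload(string outerXml)
    {
        var outer = new XmlDocument();
        outer.LoadXml(outerXml);

        // InnerText returns the text content with entity references
        // already resolved by the parser.
        string inner = outer.DocumentElement.InnerText;

        var payload = new XmlDocument();
        payload.LoadXml(inner);
        return payload;
    }
}
```

For example, passing `"<Result>&lt;SelectDef&gt;&lt;Id&gt;1736&lt;/Id&gt;&lt;/SelectDef&gt;</Result>"` gives a document in which `SelectSingleNode("/SelectDef/Id")` can be queried directly.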
The Alternative
The reason you may not wish to do it this way is because the response you are receiving has been wrapped using SOAP; you may therefore wish to try and consume the service you are connecting to as a web service and generate a proxy library; this will encapsulate all of the code you see above, as well as the code you have written, in a proxy that may allow you to retrieve the values you want in a more type safe and less fragile manner. Support for this is built into .Net.
Further, as the URL you are connecting to terminates in ASMX, it tells you that this is most likely a native .Net web service so wiring your client up to it should be simple.
The MSDN topic "How to add a Reference to a Web Service" should help you in generating the proxy and avoiding all of the code above.

Getting "ï»¿" at the beginning of my XML File after save() [duplicate]

This question already has answers here:
How can I remove the BOM from XmlTextWriter using C#?
(2 answers)
Closed 7 years ago.
I'm opening an existing XML file with C#, and I replace some nodes in there. All works fine. Just after I save it, I get the following characters at the beginning of the file:
ï»¿ (EF BB BF in HEX)
The whole first line:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The rest of the file looks like a normal XML file.
The simplified code is here:
XmlDocument doc = new XmlDocument();
doc.Load(xmlSourceFile);
XmlNode translation = doc.SelectSingleNode("//trans-unit[@id='127']");
translation.InnerText = "testing";
doc.Save(xmlTranslatedFile);
I'm using a C# Windows Forms application with .NET 4.0.
Any ideas? Why would it do that? Can we disable that somehow? It's for Adobe InCopy, and it does not open it like this.
UPDATE:
Alternative Solution:
Saving it with the XmlTextWriter works too:
XmlTextWriter writer = new XmlTextWriter(inCopyFilename, null);
doc.Save(writer);
It is the UTF-8 BOM, which is actually discouraged by the Unicode standard:
http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
Use of a BOM is neither required nor
recommended for UTF-8, but may be
encountered in contexts where UTF-8
data is converted from other encoding
forms that use a BOM or where the BOM
is used as a UTF-8 signature
You may disable it using:
// false = overwrite the file; UTF8Encoding(false) = do not emit a BOM
using (var sw = new System.IO.StreamWriter(path, false, new System.Text.UTF8Encoding(false)))
{
doc.Save(sw);
}
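If you want to confirm what actually gets written, a sketch along these lines saves the document to memory and inspects the first three bytes. BomCheck, SaveToBytes and StartsWithUtf8Bom are names invented for this example; the constructor argument of UTF8Encoding is the switch that controls the BOM.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper for checking the saved bytes.
public static class BomCheck
{
    // Saves the document through a StreamWriter whose UTF-8 encoding
    // either emits the byte order mark (true) or omits it (false).
    public static byte[] SaveToBytes(XmlDocument doc, bool emitBom)
    {
        using (var ms = new MemoryStream())
        {
            using (var sw = new StreamWriter(ms, new UTF8Encoding(emitBom)))
            {
                doc.Save(sw);
            }
            // ToArray still works after the stream has been closed.
            return ms.ToArray();
        }
    }

    // True when the data starts with EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF
            && bytes[1] == 0xBB
            && bytes[2] == 0xBF;
    }
}
```

Calling `BomCheck.StartsWithUtf8Bom(BomCheck.SaveToBytes(doc, false))` on a loaded document is a quick way to verify the output before handing it to a strict consumer such as InCopy.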
It's a UTF-8 Byte Order Mark (BOM) and is to be expected.
You can try to change the encoding of the XmlDocument. Below is the example copied from MSDN
using System; using System.IO; using System.Xml;
public class Sample {
public static void Main() {
// Create and load the XML document.
XmlDocument doc = new XmlDocument();
string xmlString = "<book><title>Oberon's Legacy</title></book>";
doc.Load(new StringReader(xmlString));
// Create an XML declaration.
XmlDeclaration xmldecl;
xmldecl = doc.CreateXmlDeclaration("1.0",null,null);
xmldecl.Encoding="UTF-16";
xmldecl.Standalone="yes";
// Add the new node to the document.
XmlElement root = doc.DocumentElement;
doc.InsertBefore(xmldecl, root);
// Display the modified XML document
Console.WriteLine(doc.OuterXml);
}
}
As everybody else mentioned, it's a Unicode issue.
I advise you to try LINQ To XML. Although not really related, I mention it as it's super easy compared to old ways and, more importantly, I assume it might have automatic resolutions to issues like these without extra coding from you.
