Getting garbled characters - c#

My aim is to transform the original XML file with an XSL stylesheet into a destination file with UTF-8 encoding. The original XML file has the following header:
<?xml version='1.0' encoding='ISO-8859-1'?>
I'm transforming this using xsl to another xml file. The xsl file has the following header:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
xmlns:xalan="http://xml.apache.org/xalan"
exclude-result-prefixes="xalan">
<xsl:output method="xml"
encoding="UTF-8"
indent="yes"
xalan:indent-amount="4"/>
Here is the C# code:
XPathDocument myXPathDoc = new XPathDocument(FileName);
XslCompiledTransform myXslTrans = new XslCompiledTransform();
myXslTrans.Load("C:/test/test.xsl");
XmlTextWriter myWriter = new XmlTextWriter(destinationFile, Encoding.UTF8);
myWriter.Formatting = Formatting.Indented;
myWriter.Indentation = 4;
myXslTrans.Transform(myXPathDoc, null, myWriter);
myWriter.Close();
The result is garbled Arabic text in the destination file. How do I get it to contain proper Arabic text?
EDIT, Question 2:
The original XML file is missing the closing tags for the root and a child element. How do I edit the XML to add them?
For example, the original XML file below is missing the closing tags for aaaa and nnnn. How can I add them using C#?
<aaaa>
<nnnn>
<rrrr>
</rrrr>

If your original XML file contains Arabic characters then its XML declaration is lying - the file is not encoding="ISO-8859-1" as that encoding cannot represent Arabic.
If you can determine what encoding the original file really uses you can force the file to be read in that encoding by using the XPathDocument constructor that takes a TextReader instead of the one that takes a file name. For Arabic, the encoding is probably Windows-1256.
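A minimal sketch of that approach, assuming the source file really is Windows-1256 (that is a guess to be verified against the actual file). The class and method names are hypothetical. Because the XPathDocument is built from a TextReader, the characters are already decoded and the misleading encoding in the XML declaration is not used.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

static class ForcedEncodingTransform
{
    // Hypothetical helper: reads sourceFile with an explicitly chosen encoding,
    // applies the stylesheet, and writes the result using the stylesheet's xsl:output settings.
    public static void Run(string sourceFile, Encoding sourceEncoding,
                           string xslFile, string destinationFile)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xslFile);

        // The StreamReader decodes the bytes, so the declared ISO-8859-1 is not applied.
        using (var reader = new StreamReader(sourceFile, sourceEncoding))
        // OutputSettings carries the encoding declared in xsl:output (UTF-8 in the question).
        using (var writer = XmlWriter.Create(destinationFile, xslt.OutputSettings))
        {
            var doc = new XPathDocument(reader);
            xslt.Transform(doc, null, writer);
        }
    }
}
```

It could be called as ForcedEncodingTransform.Run(FileName, Encoding.GetEncoding(1256), "C:/test/test.xsl", destinationFile). On .NET Core and .NET 5+, code page 1256 is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).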

Related

XmlDocument, preformatted xml: Add new line after XML declaration only

I need to add a line break after the XML declaration, because the external system that processes the file rejects it if the line break is missing.
I formatted the document myself when creating and signing it, so I don't want to change the formatting, as that would corrupt the signature (the document also contains externally signed third-party documents).
Yes, I could open it as text, replace "?><" with "?>\r\n<", and save it, or do everything manually, but I want to do it "the XmlDocument way".
What I have:
<?xml version="1.0" encoding="ISO-8859-1"?><LceEnvioOblig xmlns="http://www.sii.cl/SiiLce" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sii.cl/SiiLce LceEnvioOblig_v10.xsd" version="1.0">
<DocumentoEnvioOblig ID="EnvioOblig_12868626-6_76876251-1">
<Caratula>
<RutEnvia>12868626-6</RutEnvia>
<RutContribuyente>76876251-1</RutContribuyente>
<TmstFirmaEnv>2019-01-15T12:00:14-03:00</TmstFirmaEnv>
</Caratula>
What I need:
<?xml version="1.0" encoding="ISO-8859-1"?>
<LceEnvioOblig xmlns="http://www.sii.cl/SiiLce" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sii.cl/SiiLce LceEnvioOblig_v10.xsd" version="1.0">
<DocumentoEnvioOblig ID="EnvioOblig_12868626-6_76876251-1">
<Caratula>
<RutEnvia>12868626-6</RutEnvia>
<RutContribuyente>76876251-1</RutContribuyente>
<TmstFirmaEnv>2019-01-15T12:00:14-03:00</TmstFirmaEnv>
</Caratula>
Relevant code:
signedXml.ComputeSignature();
XmlElement xmlDigitalSignature = signedXml.GetXml();
xmlDoc.DocumentElement.AppendChild(xmlDoc.ImportNode(xmlDigitalSignature, true));
var dec = xmlDoc.CreateXmlDeclaration("1.0", Constantes.SaveEncoding.EncodingName,"no");
using (var sw = new StreamWriter(salida, false, Constantes.SaveEncoding))
{
xmlDoc.Save(sw);
}
Note that I am not using indentation, and PreserveWhitespace has not worked well for me.
Any suggestions on the best way to do this?

Read XML with Arabic data embedded c#

I am trying to load an XML file that contains a mix of ASCII text and Arabic characters. Here is the top snippet:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE TS>
<TS version="2.1" language="ar_EG">
<context>
<message>
<location filename="ui/aboutdialog.cpp" line="90"/>
<source>You have </source>
<translation type="unfinished">يوجد لديك</translation>
</message>
<message>
<location filename="ui/aboutdialog.cpp" line="90"/>
<source> launches left</source>
<translation type="unfinished">عدد التشغيلات المتبقية</translation>
</message>
</context>
I want to load this into a C# TreeView object, but I am having trouble loading it into an XDocument or XmlDocument.
Using this:
XDocument xd = XDocument.Load(File.ReadAllText(tbxTSFileName.Text));
or
XDocument xd = XDocument.Load(File.ReadAllText(tbxTSFileName.Text, Encoding.GetEncoding(874)));
gives me an "Invalid URI: Uri string is too long" error.
Using this:
XmlDocument xd = new XmlDocument();
xd.Load(tbxTSFileName.Text);
Gives the error "Invalid character in the given encoding. Line 9 position 40".
Read the documentation for the method you're calling.
XDocument.Load takes a URI (a file path or URL), not a string of XML.
You want XDocument.Parse.
Your reader needs to use utf-8, as indicated in the document itself. Ideally, you would use an XML reader and it would take care of using the indicated encoding itself.
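The difference between the two calls can be sketched as follows. The class and method names are hypothetical. Passing the path straight to XDocument.Load lets the XML reader detect the encoding from the file itself, while XDocument.Parse is for text that is already in memory.

```csharp
using System.Xml.Linq;

static class TsFileLoader
{
    // Load expects a file path or URI; the reader picks up the encoding
    // from the byte order mark or the XML declaration.
    public static XDocument FromPath(string path)
    {
        return XDocument.Load(path);
    }

    // Parse expects the XML content itself, already decoded into a string.
    public static XDocument FromText(string xml)
    {
        return XDocument.Parse(xml);
    }
}
```

In the question's code this would be TsFileLoader.FromPath(tbxTSFileName.Text), with no File.ReadAllText call in between. If the file is not really saved as UTF-8, the "Invalid character in the given encoding" error would still be expected until the file or its declaration is corrected.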

Append XMLDocument to other

I have following two xml strings
<?xml version="1.0"?>
<AccessRequest xml:lang='en-US'>
<AccessLicenseNumber>YOURACCESSLICENSENUMBER</AccessLicenseNumber>
<UserId>YOURUSERID</UserId>
<Password>YOURPASSWORD</Password>
</AccessRequest>
and
<?xml version="1.0" ?>
<RatingServiceSelectionRequest>
<PickupType>
<Code>01</Code>
</PickupType>
<Shipment>
<Description>Rate </Description>
<Shipper>
<Address>
<PostalCode>originzip</PostalCode>
</Address>
</Shipper>
<ShipTo>
<Address>
<PostalCode>destinationzip</PostalCode>
<CountryCode>countrycode</CountryCode>
</Address>
</ShipTo>
<Service>
<Code>11</Code>
</Service>
<Package>
<PackagingType>
<Code>02</Code>
<Description>Package</Description>
</PackagingType>
<Description>Rate Shopping</Description>
<PackageWeight>
<Weight>weight</Weight>
</PackageWeight>
</Package>
<ShipmentServiceOptions/>
</Shipment>
</RatingServiceSelectionRequest>
I want to append second xml string to first one. I tried writing both XmlDocuments to a XmlWriter. But it throws exception "Cannot write XML declaration. XML declaration can be only at the beginning of the document."
Stream stm = req.GetRequestStream();
XmlDocument doc1 = new XmlDocument();
XmlDocument doc2 = new XmlDocument();
doc1.LoadXml(xmlData1);
doc2.LoadXml(xmlData2);
XmlWriterSettings xws = new XmlWriterSettings();
xws.ConformanceLevel = ConformanceLevel.Fragment;
using (XmlWriter xw = XmlWriter.Create(stm, xws))
{
doc1.WriteTo(xw);
doc2.WriteTo(xw);
}
How can I append it as is? Please help
Remove <?xml version="1.0" ?> from second xml string before appending it to first xml string.
I had this problem in the past. The two lines of code below did the job:
var MyDoc = XDocument.Load("File1.xml");
MyDoc.Root.Add(XDocument.Load("File2.xml").Root.Elements());
If you already have the XML as strings, use the Parse method instead of Load.
Note that I am using XDocument from System.Xml.Linq rather than the XmlDocument class.
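For the string case, a sketch along the same lines might look like this. The class and method names are hypothetical. Note that it copies the child elements of the second document's root into the first document's root, so the second root element itself and its XML declaration do not appear in the result.

```csharp
using System.Xml.Linq;

static class XmlMerge
{
    // Parses both strings and appends the children of the second root
    // to the first root. The result has a single root and a single declaration.
    public static XDocument MergeChildren(string xmlData1, string xmlData2)
    {
        XDocument first = XDocument.Parse(xmlData1);
        XDocument second = XDocument.Parse(xmlData2);
        first.Root.Add(second.Root.Elements());
        return first;
    }
}
```

To keep the second root element as a child of the first instead, first.Root.Add(second.Root) could be used. Neither variant produces two sibling root elements, since that would not be a well-formed document.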
EDIT
As I understand it, you need both documents to be concatenated as is. The problem is that this will produce an invalid XML document, for two reasons:
The document will contain two root nodes: AccessRequest and RatingServiceSelectionRequest. A well-formed XML document contains only one root node.
There must be only one <?xml version="1.0" ?> XML declaration, at the beginning of the document.
If the UPS API you are using must be fed invalid XML, you unfortunately cannot use XML objects. You will have to use simple string concatenation to achieve what you want:
var xml = xmlData1 + xmlData2;

Insert image (Byte[]) into word doc using XSLT & C#

I have an xml document that looks like this.
<?xml version="1.0" encoding="UTF-8"?>
<Job>
<ID>1</ID>
<Name>Front Window</Name>
<Image>/9j/4AAQSkZJRgABAQ..(etc)</Image>
</Job>
<Job>
<ID>2</ID>
<Name>BackWindow</Name>
<Image>/9j/4BAQSkZJRgABAQ..(etc)</Image>
</Job>
</xml>
I also have an XSLT file that loops through each job. My problem is: how can I insert the images, which are stored as byte[]? They are only saved in the database, and no file is created for them because they are generated in the application. My C# code looks like this.
Job jobClass = new ReportSQL().createXMLclass(_jobID);
new ReportSQL().createXMLFile(jobClass);
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(@"Code\quoteTemplate.xslt");
xslt.Transform("xmlfile.xml", "doc.doc");
I think the XML you want to produce stores the images as text. This can be accomplished by encoding the byte array as Base64:
string s = Convert.ToBase64String(bytes); // bytes is a byte[]
and
byte[] b = Convert.FromBase64String(s);
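A small round-trip sketch of that idea. The class and method names are hypothetical, and how the resulting text is then embedded in the Word output by the XSLT is not covered here.

```csharp
using System;

static class ImageBase64
{
    // Turns raw image bytes into text that can be placed inside an XML element.
    public static string Encode(byte[] imageBytes)
    {
        return Convert.ToBase64String(imageBytes);
    }

    // Recovers the original bytes from the Base64 text.
    public static byte[] Decode(string base64Text)
    {
        return Convert.FromBase64String(base64Text);
    }
}
```

The values in the question's Image elements start with /9j/4, which is what the first bytes of a JPEG file (FF D8 FF E0) look like after Base64 encoding, so the XML appears to hold Base64 text already.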

Reading contents of XML file without having to remove the XML declaration

I want to read all XML contents from a file. The code below only works when the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) is removed. What is the best way to read the file without removing the XML declaration?
XmlTextReader reader = new XmlTextReader(@"c:\my path\a.xml");
reader.Read();
string rs = reader.ReadOuterXml();
Without removing the XML declaration, reader.ReadOuterXml() returns an empty string.
<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://www.w3.org/2005/08/addressing">
<s:Header>
<a:Action s:mustUnderstand="1">http://www.as.com/ver/ver.IClaimver/Car</a:Action>
<a:MessageID>urn:uuid:b22149b6-2e70-46aa-8b01-c2841c70c1c7</a:MessageID>
<ActivityId CorrelationId="16b385f3-34bd-45ff-ad13-8652baeaeb8a" xmlns="http://schemas.microsoft.com/2004/09/ServiceModel/Diagnostics">04eb5b59-cd42-47c6-a946-d840a6cde42b</ActivityId>
<a:ReplyTo>
<a:Address>http://www.w3.org/2005/08/addressing/anonymous</a:Address>
</a:ReplyTo>
<a:To s:mustUnderstand="1">http://localhost/ver.Web/ver2011.svc</a:To>
</s:Header>
<s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Car xmlns="http://www.as.com/ver">
<carApplication>
<HB_Base xsi:type="HB" xmlns="urn:core">
<Header>
<Advisor>
<AdvisorLocalAuthorityCode>11</AdvisorLocalAuthorityCode>
<AdvisorType>1</AdvisorType>
</Advisor>
</Header>
<General>
<ApplyForHB>yes</ApplyForHB>
<ApplyForCTB>yes</ApplyForCTB>
<ApplyForFSL>yes</ApplyForFSL>
<ConsentSupplied>no</ConsentSupplied>
<SupportingDocumentsSupplied>no</SupportingDocumentsSupplied>
</General>
</HB_Base>
</carApplication>
</Car>
</s:Body>
</s:Envelope>
Update
I know other methods that use NON-xml reader (e.g. by using File.ReadAllText()). But I need to know a way that uses an xml method.
Nothing, not even whitespace, may precede the <?xml ?> declaration other than a BOM, and only whitespace, comments, processing instructions, and a DOCTYPE declaration may appear between the declaration and the root element.
Anything else is an invalid document.
UPDATE:
I think your expectation of XmlTextReader.Read() is incorrect.
Each call to XmlTextReader.Read() advances to the next node ("token") in the XML document, one at a time. A "token" can be an XML element, whitespace, text, or the XML declaration.
Your call to reader.ReadOuterXml() returns an empty string because the first token in your XML file is the XML declaration, and ReadOuterXml() only returns content when the reader is positioned on an element or attribute.
Consider this code:
XmlTextReader reader = new XmlTextReader("test.xml");
reader.Read();
Console.WriteLine(reader.NodeType); // XmlDeclaration
reader.Read();
Console.WriteLine(reader.NodeType); // Whitespace
reader.Read();
Console.WriteLine(reader.NodeType); // Element
string rs = reader.ReadOuterXml();
The code above produces this output:
XmlDeclaration
Whitespace
Element
The first "token" is the XML declaration.
The second "token" encountered is the line break after the XML declaration.
The third "token" encountered is the <s:Envelope> element. From here, a call to reader.ReadOuterXml() will return what I think you're expecting to see: the text of the <s:Envelope> element, which is the entire SOAP packet.
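Rather than counting Read() calls, the reader can be asked to skip ahead to the first content node. The following is a minimal sketch with hypothetical class and method names. It relies on XmlReader.MoveToContent(), which skips the XML declaration, whitespace, comments, and processing instructions until it reaches content such as the root element.

```csharp
using System.IO;
using System.Xml;

static class RootOuterXml
{
    // Returns the markup of the root element, leaving out the XML declaration
    // and anything else that precedes the root.
    public static string Read(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();        // now positioned on the root element
            return reader.ReadOuterXml();  // the root element and everything inside it
        }
    }
}
```

For a file, XmlReader.Create also accepts a path directly, for example XmlReader.Create(@"c:\my path\a.xml"), which additionally lets the reader honor the encoding named in the declaration.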
If what you really want is to load the XML file into memory as objects, just call
var doc = XDocument.Load("test.xml")
and be done with the parsing in one fell swoop.
Unless you're working with an XML doc that is so monstrously huge that it won't fit in system memory, there's really not a lot of reason to go poking through the XML document one token at a time.
What about
XmlDocument doc=new XmlDocument();
doc.Load(@"c:\my path\a.xml");
//Now we have the XML document - convert it to a String
//There are many ways to do this, one should be:
StringWriter sw=new StringWriter();
doc.Save(sw);
String finalresult=sw.ToString();
EDIT: I'm assuming you mean you actually have text between the document declaration and the root element. If that's not the case, please clarify.
Without removing the extra text, it's simply an invalid XML file. I wouldn't expect it to work. You don't have an XML file - you have something a bit like an XML file, but with extraneous stuff before the root element.
IMHO you can't read this file, because there is plain text before the root element <s:Envelope>, which makes the whole document invalid.
You're parsing an XML document as XML just to obtain the source text? Why?
If you really want to do that then:
string rs;
using(var rdr = new StreamReader(@"c:\my path\a.xml"))
rs = rdr.ReadToEnd();
Will work, but I'm really not sure that is what you actually want. This pretty much ignores that it's XML and just reads the text. Useful for some things, but not a lot.
