How Can I Encode My New Xml Document? - c#

I have a string with a XML text and i want to save it like XML. I encoded string (to "utf-8") but when i want to make XML from that - my cyrillic symbols in Value don't displayed right . What i need to do to encode my XML document ?
part of my xml :
<rev:Code>Мои данные</rev:Code>
my code:
string send = Encoding.GetEncoding("utf8").GetString(Encoding.GetEncoding("utf-8").GetBytes(send));
XmlDocument docsec = new XmlDocument();
docsec.LoadXml(send);
docsec.Save("C:\\XmlNEW.xml");
Original text :Мои данные
I see it after creating XML :Мои данные

I worked with russian textfiles before to convert them to rtf and used "Encoding.GetEncoding(1251)" for that purpose.

Problem was in Save Method because it use xml encoding, i take my answer from this : Answer
XmlDocument docsec = new XmlDocument();
docsec.LoadXml(send);
using (TextWriter writer = new StreamWriter("C:\XmlNEW.xml", false, Encoding.UTF8))
docsec.Save(writer);

Related

XML Illegal Characters in path using XPathDocument to do XSL Transform in C#

I have an XML in a string that I need to actually transform to html using an xsl.
I do the transform with XslCompiledTransform. In order for this to work, I am parsing the string that contains the XML to XML using XPathDocument.
However if I try to parse the string straight to the XPathDocument, then I get the error:
Illegal Characters in path.
So I had to include a StringReader in order to be able to parse the string to the XPathDocument. (Using the solutions in the posts I linked below.)
Here is my step by step procedure:
The string is retrieved from SDL Trados Studio and it depends on the XML that is being worked on (how it was originally created and loaded for translations) the string sometimes has a BOM sometimes not. The 'xml' is actually parsed from the segments of the source and target text and the structure element. The textual elements are escaped for xml and the markup and text is joined in one string. (My separate post on the removal of the BOM is C# XPathDocument parsing string to XML with BOM.)
The the string is then parsed into an XPathDocument using a StringReader.
The transform is done with the XslCompiledTransform, using a StringBuilder and a StringWriter.
Transformed xml (now html) is saved to a file.
Here is my code:
//Recreate XML file using an extractor returns a string array
string strSourceXML = String.Join("", extractor.TextSrc);
//strip BOM
strSourceXML = strSourceXML.Substring(strSourceXML.IndexOf("<?"));
//Transform XML with the preview XSL
var xSourceDoc = new XPathDocument(strSourceXML);
//Load XSL
var xTr = new XslCompiledTransform();
var xslt = Settings.GetValue("WordPreview", "XSLTpath", "");
xTr.Load(xslt);
//Parse XML string
dynamic xSourceDoc;
using (StringReader s = new StringReader(strSourceXML))
{
xSourceDoc = new XPathDocument(s);
}
//Transform the XML
StringBuilder sb1 = new StringBuilder();
StringWriter swSource = new StringWriter(sb1);
xTr.Transform(xSourceDoc, null, swSource);
//Transformed file saved to the disk
string tmpSourceDoc = Path.GetTempFileName();
System.IO.StreamWriter writer1 = new System.IO.StreamWriter(tmpSourceDoc, false, Encoding.Unicode);
writer1.Write(sb1.ToString());
writer1.Close();
My question is: Is there a simpler way to solve it? Any suggestions to transform the string straight using the XSLT? Or if not, is there a direct way to parse a string to the XPathDocument?
I have searched over many posts on Stack Overflow such as these:
XML Illegal Characters in path
Illegal characters in path error while parsing XML in C#
Illegal characters in path when loading a string with XDocument
But none of them give me the solution to do this simpler. Any suggestion is welcome. Thanks.
Don't need the intermediate StringBuilder and StringWriter.
XsltCompiledTransform instance can immediately writes to the stream on disk.
string strSourceXML = string.Concat(extractor.TextSrc);
strSourceXML = strSourceXML.Substring(strSourceXML.IndexOf("<?"));
var xTr = new XslCompiledTransform();
var xslt = Settings.GetValue("WordPreview", "XSLTpath", "");
xTr.Load(xslt);
string tmpSourceDoc = Path.GetTempFileName();
using (var reader = new StringReader(strSourceXML))
using (var writer = new StreamWriter(tmpSourceDoc, false, Encoding.Unicode))
{
var xSourceDoc = new XPathDocument(reader);
xTr.Transform(xSourceDoc, null, writer);
}

convert KOI8-R xml node into unicode in c#

I have the following xml:
<root>
<text><![CDATA[ОПЕЛХМЮБЮ ОПЕГ БЗПРЪЫ ЯЕ АЮПЮАЮМ, Б ЙНИРН ЯЕ]]></text>
</root>
I know this text is generated using encoding KOI8-R (this text is displayed in my text editor only when I select this encoding when I open the xml file as text) and I would like to convert the value of this node into a string usable in c#. I can read the InnerText value of this node, but it's not what I'm expecting. Can someone show me the correct way to convert a string written with this encoding into a Unicode one?
Update
Following Jon Skeet suggestions, the solution would look like this:
Encoding encoding = Encoding.GetEncoding("KOI8-R");
XmlDocument doc2 = new XmlDocument();
using (TextReader tr = new StreamReader(outputPath, encoding))
{
doc2.Load(tr);
}
How do you have that XML? It should have an XML declaration stating which encoding it's using; otherwise it's not correct simply in XML terms. You shouldn't be worrying about encodings after you've parsed the XML. So potentially something like:
Encoding encoding = Encoding.GetEncoding("KOI8-R");
XDocument doc;
using (var reader = File.OpenText("file.xml", encoding))
{
doc = XDocument.Load(reader);
}
... but as I say, the file itself should declare the encoding.

How change file coding from windows-1251 to utf-8

I have xml file and I need to convert the text that I get from it:
I just start to write code, but I don't know how to realize this:
string text = File.ReadAllText(path);
XDocument documentcode = XDocument.Load(text);
You will have to specify the correct encoding when reading:
string text = File.ReadAllText(path, Encoding.GetEncoding("windows-1251"));
XDocument documentcode = XDocument.Parse(text); // not load.
You probably don't have to do anything special when writing.

XDocument: saving XML to file without BOM

I'm generating an utf-8 XML file using XDocument.
XDocument xml_document = new XDocument(
new XDeclaration("1.0", "utf-8", null),
new XElement(ROOT_NAME,
new XAttribute("note", note)
)
);
...
xml_document.Save(#file_path);
The file is generated correctly and validated with an xsd file with success.
When I try to upload the XML file to an online service, the service says that my file is wrong at line 1; I have discovered that the problem is caused by the BOM on the first bytes of the file.
Do you know why the BOM is appended to the file and how can I save the file without it?
As stated in Byte order mark Wikipedia article:
While Unicode standard allows BOM in
UTF-8 it does not require or
recommend it. Byte order has no
meaning in UTF-8 so a BOM only
serves to identify a text stream or
file as UTF-8 or that it was converted
from another format that has a BOM
Is it an XDocument problem or should I contact the guys of the online service provider to ask for a parser upgrade?
Use an XmlTextWriter and pass that to the XDocument's Save() method, that way you can have more control over the type of encoding used:
var doc = new XDocument(
new XDeclaration("1.0", "utf-8", null),
new XElement("root", new XAttribute("note", "boogers"))
);
using (var writer = new XmlTextWriter(".\\boogers.xml", new UTF8Encoding(false)))
{
doc.Save(writer);
}
The UTF8Encoding class constructor has an overload that specifies whether or not to use the BOM (Byte Order Mark) with a boolean value, in your case false.
The result of this code was verified using Notepad++ to inspect the file's encoding.
First of all: the service provider MUST handle it, according to XML spec, which states that BOM may be present in case of UTF-8 representation.
You can force to save your XML without BOM like this:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = new UTF8Encoding(false); // The false means, do not emit the BOM.
using (XmlWriter w = XmlWriter.Create("my.xml", settings))
{
doc.Save(w);
}
(Googled from here: http://social.msdn.microsoft.com/Forums/en/xmlandnetfx/thread/ccc08c65-01d7-43c6-adf3-1fc70fdb026a)
The most expedient way to get rid of the BOM character when using XDocument is to just save the document, then do a straight File read as a file, then write it back out. The File routines will strip the character out for you:
XDocument xTasks = new XDocument();
XElement xRoot = new XElement("tasklist",
new XAttribute("timestamp",lastUpdated),
new XElement("lasttask",lastTask)
);
...
xTasks.Add(xRoot);
xTasks.Save("tasks.xml");
// read it straight in, write it straight back out. Done.
string[] lines = File.ReadAllLines("tasks.xml");
File.WriteAllLines("tasks.xml",lines);
(it's hoky, but it works for the sake of expediency - at least you'll have a well-formed file to upload to your online provider) ;)
By UTF-8 Documents
String XMLDec = xDoc.Declaration.ToString();
StringBuilder sb = new StringBuilder(XMLDec);
sb.Append(xDoc.ToString());
Encoding encoding = new UTF8Encoding(false); // false = without BOM
File.WriteAllText(outPath, sb.ToString(), encoding);

Convert XPathDocument to string

I have an XPathDocument and would like to export it in a string that contains the document as XML representation. What is easiest way of doing so?
You can do the following to get a string representation of the XML document:
XPathDocument xdoc = new XPathDocument(#"C:\samples\sampleDocument.xml");
string xml = xdoc.CreateNavigator().OuterXml;
If you want your string to contain a full representation of the XML document including an XML declaration you can use the following code:
XPathDocument xdoc = new XPathDocument(#"C:\samples\sampleDocument.xml");
StringBuilder sb = new StringBuilder();
using (XmlWriter xmlWriter = XmlWriter.Create(sb))
{
xdoc.CreateNavigator().WriteSubtree(xmlWriter);
}
string xml = sb.ToString();
An XPathDocument is a read-only representation of an XML document. That means that the internal representation will not change. To get the XML, you can get the original document.
Or use 0xA3's method, which will go through the whole document and write it again (output not necessarily the same as input, yet structurally and functionally equal, because some input is discarded with XDM in-memory representation)

Categories

Resources