I am using this code to store my class:
FileStream stream = new FileStream(myPath, FileMode.Create);
XmlSerializer serializer = new XmlSerializer(typeof(myClass));
serializer.Serialize(stream, myClass);
stream.Close();
This writes a file that I can read back fine with XmlSerializer.Deserialize. The generated file, however, is not a proper text file. XmlSerializer.Serialize doesn't write a BOM, but still emits multibyte characters. The file is therefore implicitly treated as ANSI (because we expect an XML file to be a text file, and a text file without a BOM is considered ANSI by Windows), so some editors show ö as Ã¶.
Is this a known bug? Or some setting that I'm missing?
Here is what the generated file starts with:
<?xml version="1.0"?>
<SvnProjects xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
The first byte in the file is hex 3C, i.e. the <.
Having or not having a BOM is not a definition of a "proper text file". In fact, I'd say that the most typical format these days is UTF-8 without BOM; I don't think I've ever seen anyone actually use the UTF-8 BOM in real systems! But: if you want a BOM, that's fine: just pass the correct Encoding in; if you want UTF-8 with BOM:
using (var writer = XmlWriter.Create(myPath, s_settings))
{
XmlSerializer serializer = new XmlSerializer(typeof(MyClass));
serializer.Serialize(writer, obj);
}
with:
static readonly XmlWriterSettings s_settings =
new XmlWriterSettings { Encoding = new UTF8Encoding(true) };
The result of this is a file that starts EF-BB-BF, the UTF-8 BOM.
If you want a different encoding, then just replace new UTF8Encoding with whatever you did want, remembering to enable the BOM.
(note: the static Encoding.UTF8 instance has the BOM enabled, but IMO it is better to be very explicit here if you specifically intend to use a BOM, just like you should be very explicit about what Encoding you intended to use)
Edit: the key difference here is that Serialize(Stream, object) ends up using:
XmlTextWriter xmlWriter = new XmlTextWriter(stream, encoding: null) {
Formatting = Formatting.Indented,
Indentation = 2
};
which then ends up using:
public StreamWriter(Stream stream) : this(stream,
encoding: UTF8NoBOM, // <==== THIS IS THE PROBLEM
bufferSize: 1024, leaveOpen: false)
{
}
so: UTF-8 without BOM is the default if you use that API.
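A quick way to check which variant you are getting is to serialize into a MemoryStream and look at the first three bytes. This is only a minimal sketch: the Sample class and the BomCheck helper are hypothetical names made up for illustration, not part of the original code.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the class being stored.
public class Sample
{
    public string Name = "ö";
}

public static class BomCheck
{
    // Serializes a Sample as UTF-8, with or without a BOM, and returns the raw bytes.
    public static byte[] SerializeToBytes(bool withBom)
    {
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(withBom) };
        using (var buffer = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(buffer, settings))
            {
                new XmlSerializer(typeof(Sample)).Serialize(writer, new Sample());
            }
            return buffer.ToArray();
        }
    }

    // True if the bytes begin with EF BB BF, the UTF-8 BOM.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }
}
```

Calling BomCheck.StartsWithUtf8Bom(BomCheck.SerializeToBytes(true)) should report a BOM, while passing false should give a buffer whose first byte is 3C, the <.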
You must serialize an instance, not a class definition.
To get Unicode (UTF-16) output you must declare an XmlWriter or TextWriter with that encoding:
FileStream stream = new FileStream(myPath, FileMode.Create);
XmlSerializer serializer = new XmlSerializer(typeof(MyClass));
XmlWriter writer = new XmlTextWriter(stream, Encoding.Unicode);
serializer.Serialize(writer, myInstance); // myInstance: the MyClass object to store
writer.Close(); // flushes the writer and closes the underlying stream
Related
I have the following bit of code in C# to convert an XML file to another using XSLT:
string xmlInput = @"<?xml version='1.0' encoding='UTF-8'?><catalog><cd><title> Empire Burlesque </title ><artist> Bob Dylan </artist><country> USA </country><company> Columbia </company><price> 10.90 </price><year> 1985 </year></cd></catalog>";
///////////////////////////////////////////////////////////////
string xmlOutput = String.Empty;
using (StringReader sri = new StringReader(xmlInput))
{
using (XmlReader xri = XmlReader.Create(sri))
{
XslCompiledTransform xslt = new XslCompiledTransform();
//xslt.Load(xrt);
xslt.Load(@"XSLT/slide2.xslt");
using (StringWriter sw = new StringWriter())
using (XmlWriter xwo = XmlWriter.Create(sw, new XmlWriterSettings { Encoding = Encoding.UTF8 }))
{
xslt.Transform(xri, xwo);
xmlOutput = sw.ToString();
}
}
}
xmlOutput gives me "<?xml version=\"1.0\" encoding=\"utf-16\"?><root> Empire Burlesque </root>"
How can I get utf-8 and no slashes?
.NET strings are sequences of UTF-16 encoded characters and StringWriter/StringBuilder default to that encoding. (source https://forums.asp.net/post/3240311.aspx)
So you need to make a class that inherits from the default StringWriter and overrides its Encoding property:
public class StringWriterWithEncoding : StringWriter
{
    // The encoding this writer reports; XmlWriter uses it for the XML declaration.
    private readonly Encoding myEncoding;
    public override Encoding Encoding
    {
        get
        {
            return myEncoding;
        }
    }
    public StringWriterWithEncoding(Encoding encoding) : base(CultureInfo.CurrentCulture)
    {
        myEncoding = encoding;
    }
    public StringWriterWithEncoding(StringBuilder sb, Encoding encoding) : base(sb, CultureInfo.CurrentCulture)
    {
        myEncoding = encoding;
    }
}
and create an instance of that e.g. StringWriterWithEncoding utf8Writer = new StringWriterWithEncoding(Encoding.UTF8); and pass that as the third argument to the Transform method of your XslCompiledTransform.
Use it like this:
StringBuilder sb = new StringBuilder();
using (StringWriterWithEncoding sw = new StringWriterWithEncoding(sb, Encoding.UTF8))
{
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load(@"XSLT/slide2.xslt");
    // The TextWriter overload takes an XsltArgumentList as its second argument.
    xslt.Transform(xri, null, sw);
}
xmlOutput = sb.ToString();
The first issue is caused by the StringWriter:
using (StringWriter sw = new StringWriter())
using (XmlWriter xwo = XmlWriter.Create(sw, new XmlWriterSettings { Encoding = Encoding.UTF8 }))
Even though you explicitly set XmlWriterSettings.Encoding to UTF-8, the output target is a StringWriter, and since .NET strings are UTF-16, the XmlWriter is forced to declare UTF-16.
If you write to, for instance, a FileStream instead of a StringWriter, the output will be in UTF-8 or whatever encoding you specify.
The slashes are just your IDE escaping the quotes when it displays the string. If you print xmlOutput to the Console you will see it contains no extra slashes.
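The effect of the target writer on the XML declaration can be seen in isolation with a small sketch like the one below. Utf8StringWriter and DeclarationDemo are hypothetical names introduced only for this illustration; the settings object is the same kind used in the question.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class DeclarationDemo
{
    // Writes an empty <root /> element to the given TextWriter and returns the text.
    // XmlWriterSettings.Encoding is set to UTF-8 on purpose, to show that the
    // TextWriter's own Encoding property decides what the declaration says.
    public static string WriteEmptyRoot(TextWriter target)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.UTF8 };
        using (XmlWriter writer = XmlWriter.Create(target, settings))
        {
            writer.WriteStartElement("root");
            writer.WriteEndElement();
        }
        return target.ToString();
    }
}
```

Passing a plain new StringWriter() should yield a declaration with encoding="utf-16", while passing new Utf8StringWriter() should yield encoding="utf-8". In both cases the string held in memory is still UTF-16; only the declared encoding changes.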
You can include this line in your XSLT stylesheet:
<xsl:output encoding="utf-8"/>
(or of course whichever encoding you prefer), and it will automatically set the output settings to the utf-8 encoding.
I believe that using a MemoryStream is a better way to handle this. .NET strings are UTF-16 internally, and that's how they are encoded when you write to a StringWriter or StringBuilder object. Using a memory stream, you avoid that pitfall.
string xmlDoc = "";
// Use a memory stream to avoid the .net internal string utf-16 encoding pitfall.
using (MemoryStream xmlStream = new MemoryStream())
using (XmlReader xmlReader = XmlReader.Create(new StringReader(xmlAsText)))
using (XmlReader xsltReader = XmlReader.Create(new StringReader(xsltAsText)))
{
// Transform XML string to new XML based on the XSLT
// Load the XSLT and transform source XML to target XML
XslCompiledTransform myXslTrans = new XslCompiledTransform();
myXslTrans.Load(xsltReader);
myXslTrans.Transform(xmlReader, null, xmlStream);
// Using the encoding from the xslt, transform the xml stream bytes to the xml string.
// If no encoding in xslt, defaults to UTF-8.
xmlDoc = myXslTrans.OutputSettings.Encoding.GetString(xmlStream.ToArray());
// Remove the BOM if it exists
string byteOrderMark = myXslTrans.OutputSettings.Encoding.GetString(myXslTrans.OutputSettings.Encoding.GetPreamble());
if (xmlDoc.StartsWith(byteOrderMark, StringComparison.Ordinal))
{
xmlDoc = xmlDoc.Remove(0, byteOrderMark.Length);
}
}
If there is no encoding attribute in the stylesheet, then you get UTF-8 by default, not UTF-16. This is the same as if you were to write it directly to file. I can't say how it works for different cultures, sorry.
I'm suggesting this method over the other methods where they are hard coding UTF-8. This works with whatever (valid) encoding is in the style sheet, such as ISO-8859-1.
I have a method that saves an instance of my custom class to a file. One time I noticed that my application fails to start, because this file is filled with 0-value bytes (null characters). This has never happened before, it seemed to work just fine. Does anyone see something odd with this code? Something that can cause the serializer or the memory stream to return an array of zero values? Or should I suspect it's the work of another application?
private readonly XmlSerializer _serializer = new XmlSerializer(typeof(MySettings));
public void Save(MySettings config)
{
using (var stream = new MemoryStream())
{
_serializer.Serialize(stream, config);
byte[] binaryConfig = stream.ToArray();
File.WriteAllBytes(_configFilePath, binaryConfig);
}
}
Wouldn't it be simpler to use something like this?
XmlSerializer x = new XmlSerializer(typeof(MySettings));
using (FileStream stream = new FileStream(_configFilePath, FileMode.Create, FileAccess.Write))
{
x.Serialize(stream, config);
stream.Close();
}
The XML file should not contain any 0-bytes or nul characters, as your object is translated to XML text during serialization. You can simply open the XML file using a text editor to have a look at the file contents.
This should hopefully be a simple one.
I am serializing a List<> of C# objects to an XML document. Everything is going great; however, my XML document has ASCII encoding (spaces are represented as X0020, for example) and the client is complaining, so I want to change the encoding to UTF8 like so:
private void SerializeToXML(List<ResponseData> finalXML)
{
XmlSerializer serializer = new XmlSerializer(typeof(List<ResponseData>));
TextWriter textWriter = new StreamWriter(txtFileLocation.Text, Encoding.UTF8);
serializer.Serialize(textWriter, finalXML);
textWriter.Close();
}
Intellisense is telling me this should work...
...but is complaining when I try it...
What am I doing wrong?
Thanks
There is no (string, Encoding) method signature for the StreamWriter constructor.
There is a (Stream, Encoding) signature for the constructor.
Here is a snippet that works like a charm:
using (Stream stream = File.Open(SerializeXmlFileName, FileMode.Create))
{
using (TextWriter writer = new StreamWriter(stream, Encoding.UTF8))
{
XmlSerializer xmlFormatter = new XmlSerializer(this.Member.GetType());
xmlFormatter.Serialize(writer, this.Member);
writer.Close();
}
stream.Close();
}
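If you would rather pass a file path than open the Stream yourself, StreamWriter also has a (string, bool, Encoding) constructor, where the bool chooses between appending and overwriting. A minimal sketch, with Utf8FileWriter and WriteText as hypothetical names:

```csharp
using System.IO;
using System.Text;

public static class Utf8FileWriter
{
    // Writes text to the given path as UTF-8, replacing any existing file.
    public static void WriteText(string path, string text)
    {
        // Second argument is "append": false means overwrite.
        using (var writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            writer.Write(text);
        }
    }
}
```

The same writer could be handed to XmlSerializer.Serialize in place of the one created from a Stream in the snippet above.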
How can I see the XML contents of fully populated XmlWriter object while debugging. My silverlight application doesn't permit to actually write to a file and check the contents.
Have it write to a MemoryStream or StringBuilder instead of a file. That will allow you to check the output.
You can create the XmlWriter based on a MemoryStream, then decode the bytes from the memory stream and display the result in a text box, for example.
MemoryStream ms = new MemoryStream();
XmlWriterSettings ws = new XmlWriterSettings();
ws.Encoding = Encoding.UTF8;
XmlWriter w = XmlWriter.Create(ms, ws);
// populate the writer
w.Flush();
textBox1.Text = Encoding.UTF8.GetString(ms.GetBuffer(), 0, (int)ms.Position);
An XmlReader is not "populated". It represents the state of an XML parsing operation, as that operation is in progress. This state will change as the XML is read.
I'm having a problem writing Norwegian characters into an XML file using C#. I have a string variable containing some Norwegian text (with letters like æøå).
I'm writing the XML using an XmlTextWriter, writing the contents to a MemoryStream like this:
MemoryStream stream = new MemoryStream();
XmlTextWriter xmlTextWriter = new XmlTextWriter(stream, Encoding.GetEncoding("ISO-8859-1"));
xmlTextWriter.Formatting = Formatting.Indented;
xmlTextWriter.WriteStartDocument(); //Start doc
Then I add my Norwegian text like this:
xmlTextWriter.WriteCData(myNorwegianText);
Then I write the file to disk like this:
FileStream myFile = new FileStream(myPath, FileMode.Create);
StreamWriter sw = new StreamWriter(myFile);
stream.Position = 0;
StreamReader sr = new StreamReader(stream);
string content = sr.ReadToEnd();
sw.Write(content);
sw.Flush();
myFile.Flush();
myFile.Close();
Now the problem is that in the file on disk, all the Norwegian characters look funny.
I'm probably doing the above in some stupid way. Any suggestions on how to fix it?
Why are you writing the XML first to a MemoryStream and then writing that to the actual file stream? That's pretty inefficient. If you write directly to the FileStream it should work.
If you still want to do the double write, for whatever reason, do one of two things. Either
Make sure that the StreamReader and StreamWriter objects you use all use the same encoding as the one you used with the XmlWriter (not just the StreamWriter, like someone else suggested), or
Don't use StreamReader/StreamWriter. Instead just copy the stream at the byte level using a simple byte[] and Stream.Read/Write. This is going to be, btw, a lot more efficient anyway.
Both your StreamWriter and your StreamReader are using UTF-8, because you're not specifying the encoding. That's why things are getting corrupted.
As tomasr said, using a FileStream to start with would be simpler - but also MemoryStream has the handy "WriteTo" method which lets you copy it to a FileStream very easily.
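A rough sketch of the WriteTo approach, assuming you do want to keep the MemoryStream: the bytes are copied as-is, so no StreamReader or StreamWriter ever re-decodes them. Latin1XmlFile, Save and the "root" element are hypothetical placeholders, not taken from the question.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class Latin1XmlFile
{
    // Builds the XML in memory as ISO-8859-1, then copies the raw bytes to disk.
    public static void Save(string path, string text)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        using (var buffer = new MemoryStream())
        {
            var writer = new XmlTextWriter(buffer, latin1);
            writer.Formatting = Formatting.Indented;
            writer.WriteStartDocument();
            writer.WriteStartElement("root"); // placeholder element for this sketch
            writer.WriteCData(text);
            writer.WriteEndElement();
            writer.WriteEndDocument();
            writer.Flush(); // make sure everything has reached the MemoryStream

            using (var file = new FileStream(path, FileMode.Create))
            {
                buffer.WriteTo(file); // byte-level copy, no text decoding involved
            }
        }
    }
}
```

Reading the file back with the same ISO-8859-1 encoding should return the original characters unchanged.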
I hope you've got a using statement in your real code, by the way - you don't want to leave your file handle open if something goes wrong while you're writing to it.
Jon
You need to set the encoding every time you write a string or read binary data as a string.
Encoding encoding = Encoding.GetEncoding("ISO-8859-1");
FileStream myFile = new FileStream(myPath, FileMode.Create);
StreamWriter sw = new StreamWriter(myFile, encoding);
stream.Position = 0;
StreamReader sr = new StreamReader(stream, encoding);
string content = sr.ReadToEnd();
sw.Write(content);
sw.Flush();
myFile.Flush();
myFile.Close();
As mentioned in above answers, the biggest issue here is the Encoding, which is being defaulted due to being unspecified.
When you do not specify an Encoding, StreamReader and StreamWriter default to UTF-8, which may or may not match your scenario. You are also converting the data needlessly by pushing it into a MemoryStream and then out into a FileStream.
The XmlTextWriter puts ISO-8859-1 bytes into the MemoryStream. The StreamReader then decodes those bytes as UTF-8 by default and corrupts the non-ASCII characters as a result. When you then write out to the FileStream, which is also using UTF-8 by default, you simply persist that corruption into the file.
To fix the issue, you need to specify the same Encoding on every reader and writer that touches the data.
You can actually skip the MemoryStream process entirely, also - which will be faster and more efficient. Your updated code might look something more like:
using (FileStream fs = new FileStream(myPath, FileMode.Create))
using (XmlTextWriter xmlTextWriter =
    new XmlTextWriter(fs, Encoding.GetEncoding("ISO-8859-1")))
{
    xmlTextWriter.Formatting = Formatting.Indented;
    xmlTextWriter.WriteStartDocument(); //Start doc
    // ... write the surrounding elements as in your existing code ...
    xmlTextWriter.WriteCData(myNorwegianText);
    // Disposing the writer flushes it and closes the FileStream;
    // no StreamReader/StreamWriter round trip is needed.
}
Which encoding do you use for displaying the result file? If it is not in ISO-8859-1, it will not display correctly.
Is there a reason to use this specific encoding, instead of for example UTF8?
After investigating, this is what worked best for me:
var doc = new XDocument(new XDeclaration("1.0", "ISO-8859-1", ""));
using (XmlWriter writer = doc.CreateWriter()){
writer.WriteStartDocument();
writer.WriteStartElement("Root");
writer.WriteElementString("Foo", "value");
writer.WriteEndElement();
writer.WriteEndDocument();
}
doc.Save("dte.xml");