XML serialization, encoding - c#

using System;
public class clsPerson
{
public string FirstName;
public string MI;
public string LastName;
}
class class1
{
static void Main(string[] args)
{
clsPerson p=new clsPerson();
p.FirstName = "Jeff";
p.MI = "A";
p.LastName = "Price";
System.Xml.Serialization.XmlSerializer x = new System.Xml.Serialization.XmlSerializer(p.GetType());
x.Serialize(Console.Out, p);
Console.WriteLine();
Console.ReadLine();
}
}
taken from http://support.microsoft.com/kb/815813
1)
System.Xml.Serialization.XmlSerializer x = new System.Xml.Serialization.XmlSerializer(p.GetType());
What does this line do? What is GetType()?
2) How do I get the encoding to be
<?xml version="1.0" encoding="utf-8"?>
<clsPerson xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
instead of
<?xml version="1.0" encoding="IBM437"?>
<clsPerson xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
or not include the encoding type at all?

If you pass the serializer an XmlWriter, you can control some parameters such as the encoding, whether to omit the declaration (e.g. for a fragment), etc.
This is not meant to be a definitive guide, but an alternative so you can see what's going on, and something that isn't just going to console first.
Note also, if you create your XmlWriter with a StringBuilder instead of a MemoryStream, your xml will ignore your Encoding and come out as utf-16 encoded. See the blog post writing xml with utf8 encoding for more information.
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings
{
Indent = true,
OmitXmlDeclaration = false,
Encoding = Encoding.UTF8
};
using (MemoryStream memoryStream = new MemoryStream() )
using (XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings))
{
var x = new System.Xml.Serialization.XmlSerializer(p.GetType());
x.Serialize(xmlWriter, p);
// we just output back to the console for this demo.
memoryStream.Position = 0; // rewind the stream before reading back.
using( StreamReader sr = new StreamReader(memoryStream))
{
Console.WriteLine(sr.ReadToEnd());
} // note: the memory stream is disposed by the StreamReader's Dispose()
}
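The note above about StringBuilder versus MemoryStream can be checked with a minimal sketch like the following. The class name and the tiny `<root>` document are made up for illustration; the same XmlWriterSettings are used for both targets, and only the stream target should produce a declaration that names utf-8.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class DeclarationDemo
{
    public static void Main()
    {
        // UTF-8 without a byte order mark, so the decoded string starts at "<?xml".
        var utf8NoBom = new UTF8Encoding(false);
        var settings = new XmlWriterSettings { Encoding = utf8NoBom };

        // Target 1: a StringBuilder. The requested encoding is ignored,
        // because the result is a .NET string (UTF-16).
        var sb = new StringBuilder();
        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            w.WriteElementString("root", "text");
        }
        Console.WriteLine(sb.ToString());

        // Target 2: a MemoryStream. The writer encodes bytes itself,
        // so the requested encoding is honoured.
        using (var ms = new MemoryStream())
        {
            using (XmlWriter w = XmlWriter.Create(ms, settings))
            {
                w.WriteElementString("root", "text");
            } // disposing the writer flushes everything into the stream
            Console.WriteLine(utf8NoBom.GetString(ms.ToArray()));
        }
    }
}
```

Comparing the two printed declarations shows which target respected the setting.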

1) The GetType() function returns a Type object representing the type of your object, in this case the class clsPerson. You could also use typeof(clsPerson) and get the same result. That line creates an XmlSerializer object for your particular class.
2) If you want to change the encoding, I believe there is an overload of the Serialize() function that lets you specify that. See MSDN for details. You may have to create an XmlWriter object to use it though; details for that are also on MSDN:
XmlWriter writer = XmlWriter.Create(Console.Out, settings);
You can also set the encoding in the XmlWriter, the XmlWriterSettings object has an Encoding property.
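As a small illustration of point 1, here is a sketch that compares GetType() with typeof and passes the result to the serializer. It reuses the clsPerson class from the question; the demo class name is made up.

```csharp
using System;
using System.Xml.Serialization;

public class clsPerson
{
    public string FirstName;
    public string MI;
    public string LastName;
}

public static class GetTypeDemo
{
    public static void Main()
    {
        var p = new clsPerson { FirstName = "Jeff", MI = "A", LastName = "Price" };

        // GetType() asks the instance for its runtime type;
        // typeof(...) names the type at compile time.
        Type fromInstance = p.GetType();
        Type fromKeyword = typeof(clsPerson);
        Console.WriteLine(fromInstance == fromKeyword); // True

        // Either value can be handed to the XmlSerializer constructor.
        var serializer = new XmlSerializer(fromKeyword);
        serializer.Serialize(Console.Out, p);
        Console.WriteLine();
    }
}
```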

I used the solution offered by @robert-paulson here for a similar task: getting an XmlSchema as a string. By default it came back as utf-16. However, as mentioned, the solution here suffers from a Stream Closed read error. So I took the liberty of posting a refactor as an extension method, with the tweak mentioned by @Liam of moving the using block.
public static string ToXmlString(this XmlSchema xsd)
{
var xmlWriterSettings = new XmlWriterSettings
{
Indent = true,
OmitXmlDeclaration = false,
Encoding = Encoding.UTF8
};
using (var memoryStream = new MemoryStream())
{
using (var xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings))
{
xsd.Write(xmlWriter);
}
memoryStream.Position = 0;
using (var sr = new StreamReader(memoryStream))
{
return sr.ReadToEnd();
}
}
}

1) This creates an XmlSerializer for the class clsPerson.
2) The encoding is IBM437 because that is the encoding of the Console.Out stream.
PS: Hungarian notation is not preferred in C#; just name your class Person.
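To see where the declared encoding comes from, a minimal sketch like this prints the encoding that Console.Out reports before and after switching the console to UTF-8. The first value depends on the machine's console code page, so no particular output is assumed here.

```csharp
using System;
using System.Text;

public static class ConsoleEncodingDemo
{
    public static void Main()
    {
        // The writer behind Console.Out reports the console's current encoding;
        // XmlSerializer copies this name into the XML declaration.
        Console.WriteLine(Console.Out.Encoding.WebName);

        // Switching the console output encoding changes what Console.Out reports.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Console.Out.Encoding.WebName);
    }
}
```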

Related

xslt always gives me UTF-16 with slashes

I have the following bit of code in C# to convert an XML file to another using XSLT:
string xmlInput = @"<?xml version='1.0' encoding='UTF-8'?><catalog><cd><title> Empire Burlesque </title ><artist> Bob Dylan </artist><country> USA </country><company> Columbia </company><price> 10.90 </price><year> 1985 </year></cd></catalog>";
///////////////////////////////////////////////////////////////
string xmlOutput = String.Empty;
using (StringReader sri = new StringReader(xmlInput))
{
using (XmlReader xri = XmlReader.Create(sri))
{
XslCompiledTransform xslt = new XslCompiledTransform();
//xslt.Load(xrt);
xslt.Load(@"XSLT/slide2.xslt");
using (StringWriter sw = new StringWriter())
using (XmlWriter xwo = XmlWriter.Create(sw, new XmlWriterSettings { Encoding = Encoding.UTF8 }))
{
xslt.Transform(xri, xwo);
xmlOutput = sw.ToString();
}
}
}
xmlOutput gives me "<?xml version=\"1.0\" encoding=\"utf-16\"?><root> Empire Burlesque </root>"
How can I get utf-8 and no slashes?
.NET strings are sequences of UTF-16 encoded characters and StringWriter/StringBuilder default to that encoding. (source https://forums.asp.net/post/3240311.aspx)
So you need to create a class that inherits from the default StringWriter:
public class StringWriterWithEncoding : StringWriter
{
Encoding myEncoding;
public override Encoding Encoding
{
get
{
return myEncoding;
}
}
public StringWriterWithEncoding(Encoding encoding) : base(CultureInfo.CurrentCulture)
{
myEncoding = encoding;
}
public StringWriterWithEncoding(StringBuilder sb, Encoding encoding) : base(sb, CultureInfo.CurrentCulture)
{
myEncoding = encoding;
}
}
and create an instance of that e.g. StringWriterWithEncoding utf8Writer = new StringWriterWithEncoding(Encoding.UTF8); and pass that as the third argument to the Transform method of your XslCompiledTransform.
use like this:
StringBuilder sb = new StringBuilder();
using (StringWriterWithEncoding sw = new StringWriterWithEncoding(sb, Encoding.UTF8))
{
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(@"XSLT/slide2.xslt");
xslt.Transform(xri, null, sw); // writer goes in as the third argument; null means no XSLT arguments
}
xmlOutput = sb.ToString();
First issue is caused by StringWriter
using (StringWriter sw = new StringWriter())
using (XmlWriter xwo = XmlWriter.Create(sw, new XmlWriterSettings { Encoding = Encoding.UTF8 }))
Even though you specifically set XmlWriterSettings.Encoding to UTF-8, you specify output stream to be StringWriter and since .NET strings are UTF-16, XmlWriter is forced to use UTF-16.
If you use for instance FileStream instead of StringWriter, output will be in UTF-8 or whatever encoding you specify.
Slashes issue is just your IDE escaping it. If you print xmlOutput to Console you will see it contains no extra slashes.
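The point about the slashes can be confirmed with a short sketch. The string literal below is a hypothetical stand-in for xmlOutput; the backslashes exist only in the source code (and in the debugger's display), not in the string itself.

```csharp
using System;

public static class EscapingDemo
{
    public static void Main()
    {
        // In source code the quotes must be escaped with backslashes...
        string xmlOutput = "<?xml version=\"1.0\" encoding=\"utf-16\"?><root/>";

        // ...but the string itself contains no backslash characters.
        Console.WriteLine(xmlOutput);                // <?xml version="1.0" encoding="utf-16"?><root/>
        Console.WriteLine(xmlOutput.Contains("\\")); // False
    }
}
```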
You can include this line in your XSLT stylesheet:
<xsl:output encoding="utf-8"/>
(or of course whichever encoding you prefer), and it will automatically set the output settings to the utf-8 encoding.
I believe that using a MemoryStream is a better way to handle this. .NET strings are UTF-16 internally, and that's how they are encoded when you write to a StringWriter or StringBuilder object. Using a memory stream, you avoid that pitfall.
string xmlDoc = "";
// Use a memory stream to avoid the .net internal string utf-16 encoding pitfall.
using (MemoryStream xmlStream = new MemoryStream())
using (XmlReader xmlReader = XmlReader.Create(new StringReader(xmlAsText)))
using (XmlReader xsltReader = XmlReader.Create(new StringReader(xsltAsText)))
{
// Transform XML string to new XML based on the XSLT
// Load the XSLT and transform source XML to target XML
XslCompiledTransform myXslTrans = new XslCompiledTransform();
myXslTrans.Load(xsltReader);
myXslTrans.Transform(xmlReader, null, xmlStream);
// Using the encoding from the xslt, transform the xml stream bytes to the xml string.
// If no encoding in xslt, defaults to UTF-8.
xmlDoc = myXslTrans.OutputSettings.Encoding.GetString(xmlStream.ToArray());
// Remove the BOM if it exists
string byteOrderMark = myXslTrans.OutputSettings.Encoding.GetString(myXslTrans.OutputSettings.Encoding.GetPreamble());
if (xmlDoc.StartsWith(byteOrderMark, StringComparison.Ordinal))
{
xmlDoc = xmlDoc.Remove(0, byteOrderMark.Length);
}
}
If there is no encoding attribute in the stylesheet, then you get UTF-8 by default, not UTF-16. This is the same as if you were to write it directly to file. I can't say how it works for different cultures, sorry.
I'm suggesting this method over the other methods where they are hard coding UTF-8. This works with whatever (valid) encoding is in the style sheet, such as ISO-8859-1.

C# XmlSerializer Force Encoding Type to ISO-8859-1

I am trying to set the encoding type to ISO-8859-1 while serializing an object. The code runs without throwing any exception, but the returned encoding is always "UTF-16". I have searched through many examples, and there are hundreds of samples, but I am simply unable to force the desired encoding.
My question is: how can I force the encoding to be ISO-8859-1?
Thanks in advance.
The code is:
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings
{
Indent = true,
OmitXmlDeclaration = false,
Encoding = Encoding.GetEncoding("ISO-8859-1")
};
using (var stringWriter = new StringWriter())
{
using (var xmlWriter = XmlWriter.Create(stringWriter, xmlWriterSettings))
{
serializer.Serialize(xmlWriter, obj, ns);
}
return stringWriter.ToString();
}
When you write XML to a TextWriter (e.g. StringWriter, StreamWriter), it always uses the encoding specified by the TextWriter, and ignores the one specified in XmlWriterSettings. For StringWriter, the encoding is always UTF-16, because that's the encoding used by .NET strings in memory.
The workaround is to write to a MemoryStream and read the string from there:
var encoding = Encoding.GetEncoding("ISO-8859-1");
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings
{
Indent = true,
OmitXmlDeclaration = false,
Encoding = encoding
};
using (var stream = new MemoryStream())
{
using (var xmlWriter = XmlWriter.Create(stream, xmlWriterSettings))
{
serializer.Serialize(xmlWriter, obj, ns);
}
return encoding.GetString(stream.ToArray());
}
Note that if you have the XML in a string, when you write it to a file you need to make sure to write it with the same encoding.
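A minimal sketch of that last note, assuming the XML is already in a string whose declaration names ISO-8859-1. The file name and the sample XML are hypothetical; the point is only that the same Encoding object is passed when writing the file.

```csharp
using System;
using System.IO;
using System.Text;

public static class SaveWithDeclaredEncoding
{
    public static void Main()
    {
        Encoding encoding = Encoding.GetEncoding("ISO-8859-1");

        // Hypothetical XML string whose declaration names ISO-8859-1.
        string xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><name>Jos\u00E9</name>";

        // Hypothetical target path in the temp folder.
        string path = Path.Combine(Path.GetTempPath(), "person.xml");

        // Pass the matching encoding; the default would write UTF-8 bytes
        // that contradict the declaration.
        File.WriteAllText(path, xml, encoding);

        // ISO-8859-1 uses exactly one byte per character and no byte order mark.
        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine(bytes.Length == xml.Length); // True
    }
}
```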

How to return xml as UTF-8 instead of UTF-16

I am using a routine that serializes <T>. It works, but when downloaded to the browser I see a blank page. I can view the page source or open the download in a text editor and I see the xml, but it is in UTF-16 which I think is why browser pages show blank?
How do I modify my serializer routine to return UTF-8 instead of UTF-16?
The XML source returned:
<?xml version="1.0" encoding="utf-16"?>
<ArrayOfString xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<string>January</string>
<string>February</string>
<string>March</string>
<string>April</string>
<string>May</string>
<string>June</string>
<string>July</string>
<string>August</string>
<string>September</string>
<string>October</string>
<string>November</string>
<string>December</string>
<string />
</ArrayOfString>
An example call to the serializer:
DateTimeFormatInfo dateTimeFormatInfo = new DateTimeFormatInfo();
var months = dateTimeFormatInfo.MonthNames.ToList();
string SelectionId = "1234567890";
return new XmlResult<List<string>>(SelectionId)
{
Data = months
};
The Serializer:
public class XmlResult<T> : ActionResult
{
private string filename = DateTime.Now.ToString("ddMMyyyyHHmmss"); // MM = month, HH = 24-hour clock, mm = minutes
public T Data { private get; set; }
public XmlResult(string selectionId = "")
{
if (selectionId != "")
{
filename = selectionId;
}
}
public override void ExecuteResult(ControllerContext context)
{
HttpContextBase httpContextBase = context.HttpContext;
httpContextBase.Response.Buffer = true;
httpContextBase.Response.Clear();
httpContextBase.Response.AddHeader("content-disposition", "attachment; filename=" + filename + ".xml");
httpContextBase.Response.ContentType = "text/xml";
using (StringWriter writer = new StringWriter())
{
XmlSerializer xml = new XmlSerializer(typeof(T));
xml.Serialize(writer, Data);
httpContextBase.Response.Write(writer);
}
}
}
You can use a StringWriter that will force UTF8. Here is one way to do it:
public class Utf8StringWriter : StringWriter
{
// Use UTF8 encoding but write no BOM to the wire
public override Encoding Encoding
{
get { return new UTF8Encoding(false); } // in real code I'll cache this encoding.
}
}
and then use the Utf8StringWriter writer in your code.
using (StringWriter writer = new Utf8StringWriter())
{
XmlSerializer xml = new XmlSerializer(typeof(T));
xml.Serialize(writer, Data);
httpContextBase.Response.Write(writer);
}
This answer is inspired by Serializing an object as UTF-8 XML in .NET
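The comment in Utf8StringWriter mentions caching the encoding. One way that might look is sketched below; the class name CachedUtf8StringWriter and the two-month list are made up for the demo, which prints only the XML declaration so the advertised encoding is easy to inspect.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public sealed class CachedUtf8StringWriter : StringWriter
{
    // One shared instance: UTF-8 without a byte order mark.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    public override Encoding Encoding
    {
        get { return Utf8NoBom; }
    }
}

public static class Utf8WriterDemo
{
    public static void Main()
    {
        var months = new List<string> { "January", "February" };
        var serializer = new XmlSerializer(typeof(List<string>));

        using (var writer = new CachedUtf8StringWriter())
        {
            serializer.Serialize(writer, months);
            string xml = writer.ToString();

            // Print just the declaration to see which encoding was advertised.
            int end = xml.IndexOf("?>", StringComparison.Ordinal) + 2;
            Console.WriteLine(xml.Substring(0, end));
        }
    }
}
```

Note that the string itself is still UTF-16 in memory; the writer only changes what the declaration says, so the response must actually be sent as UTF-8.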
Encoding of the Response
I am not very familiar with this part of the framework, but according to MSDN you can set the content encoding of an HttpResponse like this:
httpContextBase.Response.ContentEncoding = Encoding.UTF8;
Encoding as seen by the XmlSerializer
After reading your question again I see that this is the tough part. The problem lies within the use of the StringWriter. Because .NET Strings are always stored as UTF-16 (citation needed ^^) the StringWriter returns this as its encoding. Thus the XmlSerializer writes the XML-Declaration as
<?xml version="1.0" encoding="utf-16"?>
To work around that, you can write into a MemoryStream like this:
using (MemoryStream stream = new MemoryStream())
using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8))
{
XmlSerializer xml = new XmlSerializer(typeof(T));
xml.Serialize(writer, Data);
// I am not 100% sure if this can be optimized
httpContextBase.Response.BinaryWrite(stream.ToArray());
}
Other approaches
Another edit: I just noticed this SO answer linked by jtm001. Condensed, the solution there is to provide the XmlSerializer with a custom XmlWriter that is configured to use UTF8 as its encoding.
Athari proposes to derive from the StringWriter and advertise the encoding as UTF8.
To my understanding both solutions should work as well. I think the take-away here is that you will need one kind of boilerplate code or another...
To serialize as UTF8 string:
private string Serialize(MyData data)
{
XmlSerializer ser = new XmlSerializer(typeof(MyData));
// Using a MemoryStream to store the serialized string as a byte array,
// which is "encoding-agnostic"
using (MemoryStream ms = new MemoryStream())
// Few options here, but remember to use a signature that allows you to
// specify the encoding
using (XmlTextWriter tw = new XmlTextWriter(ms, Encoding.UTF8))
{
tw.Formatting = Formatting.Indented;
ser.Serialize(tw, data);
// Now we get the serialized data as a string in the desired encoding
return Encoding.UTF8.GetString(ms.ToArray());
}
}
To return it as XML on a web response, don't forget to set the response encoding:
string xml = Serialize(data);
Response.ContentType = "application/xml";
Response.ContentEncoding = System.Text.Encoding.UTF8;
Response.Output.Write(xml);

Live SDK - Uploading XML file through Memory Stream

I have a bit of a problem with the client.UploadAsync method of Live SDK (SkyDrive SDK). My code for some reason doesn't work or more specifically it uploads an empty file. It doesn't throw any error and the serialization to stream works (I know that for sure).
It even seems that the Memory Stream is OK. (since I have no tool to really see the data in it I just guess it is OK by looking at its 'Length' property).
The UploadAsync method is fine as well, or at least it worked when I first serialized the data into an .xml file in IsolatedStorage, read it back with an IsolatedStorageFileStream, and eventually sent that stream (then it uploaded the data).
Any advice on why this may be happening?
public void UploadFile<T>(string skyDriveFolderID, T data, string fileNameInSkyDrive)
{
this.fileNameInSkyDrive = fileNameInSkyDrive;
{
try
{
memoryStream = new MemoryStream();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Indent = true;
XmlSerializer serializer = new XmlSerializer(typeof(T));
using (XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings))
{
serializer.Serialize(xmlWriter, data);
}
client.UploadAsync(skyDriveFolderID, fileNameInSkyDrive, true, memoryStream, null);
}
catch (Exception ex)
{
if (memoryStream != null) { memoryStream.Dispose(); }
}
}
}
You have to "rewind" the memorystream to the start before calling the UploadAsync method. Imagine the memorystream being like a tape which you record things on. The "read/write-head" is always floating over some point of the tape, which is the end in your case because you just wrote all serialized data onto it. The uploading method tries to read from it by moving forward on the tape, realizing it is already at its end. Thus you get an empty file uploaded.
The method you need for rewinding is:
memoryStream.Seek(0, SeekOrigin.Begin);
Also, it is good practice to use the using statement for IDisposable objects, which the MemoryStream is. This way you don't need a try {...} finally { ...Dispose(); } (the using statement does this for you).
Your method could then look like:
public void UploadFile<T>(string skyDriveFolderID, T data, string fileNameInSkyDrive)
{
this.fileNameInSkyDrive = fileNameInSkyDrive;
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Indent = true;
XmlSerializer serializer = new XmlSerializer(typeof(T));
using (var memoryStream = new MemoryStream())
{
using (var xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings))
{
serializer.Serialize(xmlWriter, data);
}
memoryStream.Seek(0, SeekOrigin.Begin);
client.UploadAsync(skyDriveFolderID, fileNameInSkyDrive, true, memoryStream, null);
}
}
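The tape analogy above can be reproduced with a few lines. This is a standalone sketch with a made-up payload, independent of the Live SDK: reading right after writing returns nothing, and reading after Seek returns the data.

```csharp
using System;
using System.IO;
using System.Text;

public static class RewindDemo
{
    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            byte[] payload = Encoding.UTF8.GetBytes("<data/>");
            ms.Write(payload, 0, payload.Length);

            var buffer = new byte[16];

            // The position is at the end, so nothing is left to read.
            Console.WriteLine(ms.Read(buffer, 0, buffer.Length)); // 0

            // Rewind to the start; now the 7 bytes written above come back.
            ms.Seek(0, SeekOrigin.Begin);
            Console.WriteLine(ms.Read(buffer, 0, buffer.Length)); // 7
        }
    }
}
```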

Serializing a simple array with XmlSerializer

It's late and entirely possible I'm missing something obvious, but what is it?
I'm trying to create a backing property that exposes an int array in serialized form (which is then used to build up a Queue).
I'm pretty sure this is right, but the getter always returns a blank string, even when there are values in there (not that it should ever return a blank string).
Here is my code:
readonly Lazy<XmlSerializer> _queueSerializer = new Lazy<XmlSerializer>(() => new XmlSerializer(typeof(int[])));
[StringLength(1000)]
public string _MostRecentPlayers
{
get
{
var stream = new MemoryStream();
_queueSerializer.Value.Serialize(stream, _mostRecentPlayers.ToArray());
return new StreamReader(stream).ReadToEnd();
}
set
{
if (value.IsEmpty())
{
_mostRecentPlayers.Clear();
return;
}
MemoryStream stream = new MemoryStream(Encoding.ASCII.GetBytes(value));
var tempQueue = _queueSerializer.Value.Deserialize(stream) as int[];
_mostRecentPlayers.Clear();
tempQueue.ForEach(_mostRecentPlayers.Enqueue);
}
}
readonly Queue<int> _mostRecentPlayers = new Queue<int>(_mostRecentAmountTracked);
You haven't rewound the stream; it is positioned at the end. Set .Position = 0 before reading it. Or easier, just serialize to a StringWriter, or if you really want to use a MemoryStream, pass the (oversized) backing array from GetBuffer() along with the .Length to an Encoding and call GetString().
using(var sw = new StringWriter()) {
_queueSerializer.Value.Serialize(sw, _mostRecentPlayers.ToArray());
xml = sw.ToString();
}
or for ASCII (see comments):
using(var ms = new MemoryStream()) {
var settings = new XmlWriterSettings {
Encoding = Encoding.ASCII
};
using(var xw = XmlWriter.Create(ms, settings)) {
_queueSerializer.Value.Serialize(xw, _mostRecentPlayers.ToArray());
}
xml = Encoding.ASCII.GetString(ms.GetBuffer(), 0, (int)ms.Length);
}
Also, unless it is unlikely that you will serialize in the exe, I would suggest simplifying to just:
static readonly XmlSerializer _queueSerializer =new XmlSerializer(typeof(int[]));
Finally, note that xml is quite verbose as a mechanism to throw some ints around. CSV would seem a lot simpler (assuming you want text).
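For comparison, a CSV round trip for a queue of ints might look like the sketch below. The helper names ToCsv and FromCsv are made up, and the invariant culture is an assumption so the stored text does not depend on the machine's settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CsvQueueDemo
{
    // Join the values in queue order, e.g. "3,1,4".
    static string ToCsv(IEnumerable<int> values)
    {
        return string.Join(",", values.Select(v => v.ToString(CultureInfo.InvariantCulture)));
    }

    // Parse the text back; an empty or null string yields an empty queue.
    static Queue<int> FromCsv(string csv)
    {
        if (string.IsNullOrEmpty(csv))
        {
            return new Queue<int>();
        }
        return new Queue<int>(csv.Split(',').Select(s => int.Parse(s, CultureInfo.InvariantCulture)));
    }

    public static void Main()
    {
        var queue = new Queue<int>(new[] { 3, 1, 4 });

        string csv = ToCsv(queue);
        Console.WriteLine(csv); // 3,1,4

        Queue<int> restored = FromCsv(csv);
        Console.WriteLine(restored.SequenceEqual(queue)); // True
    }
}
```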
