XML serialization national chars error - c#

EDIT:
I was doing too much. This works for me with national chars:
var xs = new XmlSerializer(typeof(ToDoItem));
var stringWriter = new StringWriter();
xs.Serialize(stringWriter, item);
var test = XDocument.Parse(stringWriter.ToString());
...where item is the object containing strings with national chars.
/EDIT
I did a project with serialization of some objects.
I copied some code from examples on this site and everything worked great, until I changed the ASP.NET framework from 3.5 to 4.0 (and changed the IIS7 .NET setting from v2.0 to v4.0).
I am 99% sure this is the cause of the following error.
Before this change, something like this:
var test = XDocument.Parse(SerializeObject("æøåAØÅ", typeof(string)));
test.Save(HttpContext.Current.Server.MapPath("test.xml"));
Would save the xml with the exact chars used.
Now it saves this:
���A��
I would like: Information on settings I might have to make in IIS7
OR
A comment on how to change the serializing methods to handle the national chars better.
This is the serialization code used.
private static String UTF8ByteArrayToString(Byte[] characters)
{
var encoding = new UTF8Encoding();
String constructedString = encoding.GetString(characters);
return (constructedString);
}
public static String SerializeObject(Object pObject, Type type)
{
try
{
String XmlizedString = null;
var memoryStream = new MemoryStream();
var xs = new XmlSerializer(type);
var xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.ASCII);
xs.Serialize(xmlTextWriter, pObject);
memoryStream = (MemoryStream)xmlTextWriter.BaseStream;
XmlizedString = UTF8ByteArrayToString(memoryStream.ToArray());
return XmlizedString.Trim();
}
catch (Exception e)
{
//Console.WriteLine(e);
return null;
}
}

You encode the text using ASCII, then decode it using UTF-8, and expect that to work? It won't. This code could never work properly, regardless of any updates or settings.
There is no need to write the XML to a MemoryStream and then decode that. Just use StringWriter:
var xs = new XmlSerializer(type);
var stringWriter = new StringWriter();
xs.Serialize(stringWriter, pObject);
return stringWriter.ToString();
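As a minimal sketch of the StringWriter approach, the following round trip serializes a string with national characters and parses it back, so no byte encoding is involved at any point. The class and method names (StringWriterRoundTrip, SerializeToString) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.IO;
using System.Xml.Linq;
using System.Xml.Serialization;

public static class StringWriterRoundTrip
{
    // Hypothetical helper: serializes straight into a .NET string,
    // so there is no byte encoding step that could lose characters.
    public static string SerializeToString(object value, Type type)
    {
        var serializer = new XmlSerializer(type);
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string xml = SerializeToString("æøåAØÅ", typeof(string));

        // Parse the string again and compare the element text with the input.
        XDocument doc = XDocument.Parse(xml);
        Console.WriteLine(doc.Root.Value == "æøåAØÅ"); // True
    }
}
```

Because the XML never leaves string form, the only encoding decision left is the one made when the document is finally saved to disk.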

Related

.pkpass create fail because of manifest pass.json string format?

This is a very strange question.
I am using C# to create a pass.json and save it to a MemoryStream, and that works normally. After that I create the manifest.json with the SHA1 hashes, including the one for pass.json. The manifest.json string looks like this and is totally correct:
{"icon.png": "9423bd00e2b01c59a3265c38b5062fac7da0752d",
"icon@2x.png": "4d1db55bdaca70b685c013529a1c0dcbd7046524",
"logo.png": "ee5b053e63dbfe3b78378c15d163331d68a0ede8",
"logo@2x.png": "2f9e3a55bded1163620719a4d6c1ad496ed40c17",
"pass.json": "fd68bf77757d3057263a9aca0e5110ddd933934a"}
After generating the pkpass, it cannot be opened on my phone. If I hard-code the pass.json SHA1 as "fd68bf77757d3057263a9aca0e5110ddd933934a" instead of using the computed value, it works.
The code looks like this:
// This version run success
var strPass = JavascriptSerialize(details);
var sw = new StreamWriter(assetsFolder + @"pass.json");
sw.Write(strPass);
sw.Close();
manifest.passjson = GetSha1Hash(assetsFolder + manifest.GetAssetBoardingPass(libPkPass_object_boardingPass.JsonObjects.AssetTypes.passjson));
//manifest.passjson = "2f9e3a55bded1163620719a4d6c1ad496ed40c17"
// end
// This version run fail
var strPass = JavascriptSerialize(details);
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(strPass);
writer.Write(s);
writer.Flush();
stream.Position = 0;
var a = GetSha1HashMemory(passStream);
private static string GetSha1HashMemory(Stream passStream)
{
//var bs = new BufferedStream(passStream);
using (SHA1Managed sha = new SHA1Managed())
{
byte[] checksum = sha.ComputeHash(passStream);
string sendCheckSum = BitConverter.ToString(checksum)
.Replace("-", string.Empty);
return sendCheckSum.ToString().ToLower();
}
}
manifest.passjson = a;
//manifest.passjson = "2f9e3a55bded1163620719a4d6c1ad496ed40c17" (same data )
//end
What is going on? I cannot find anything wrong with the string.
The pkpass is provided here (sendspace).
Can anybody tell me what is wrong?
Big thanks!
Two mistakes:
ComputeHash(Stream) and how the stream was used.
ComputeHash(Stream): hashing the MemoryStream through the ComputeHash(Stream) overload did not give the right result for me; switching to ComputeHash(byte[]) handles it.
Using the stream: passing the stream to another function is not a good approach; a new stream has to be created there, and some bytes of the stream may get changed along the way. In this case I just compute the hash directly without opening a new one, which fixes it:
StringBuilder formatted;
using (var sha1 = new SHA1Managed())
{
//var bytePass = ReadFully(passStream);
var bytePass = passStream.ToArray();
var hash = sha1.ComputeHash(bytePass);
formatted = new StringBuilder(2 * hash.Length);
foreach (var b in hash)
{
formatted.AppendFormat("{0:X2}", b);
}
}
manifest.passjson = formatted.ToString().ToLower();
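One possible explanation for the difference, offered here as an assumption rather than a confirmed diagnosis: ComputeHash(Stream) reads from the stream's current position, so a stream that has already been written to or read from can yield the hash of fewer bytes than expected. MemoryStream.ToArray() returns the whole buffer regardless of position. A minimal sketch, with the illustrative names ManifestHash and Sha1Hex:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class ManifestHash
{
    // Hypothetical helper: hashes the full contents of the stream.
    // ToArray() ignores the Position property, unlike ComputeHash(Stream).
    public static string Sha1Hex(MemoryStream stream)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(stream.ToArray());
            return BitConverter.ToString(hash)
                .Replace("-", string.Empty)
                .ToLowerInvariant();
        }
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        byte[] payload = Encoding.UTF8.GetBytes("abc");
        stream.Write(payload, 0, payload.Length);

        // Position now sits at the end of the stream, yet the hash
        // still covers all three bytes.
        Console.WriteLine(Sha1Hex(stream));
        // a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```

The value printed is the standard SHA-1 test vector for "abc", which makes it easy to check the helper independently of any pass files.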

XML Serialization changes in upgraded project (.NET 3.5 to 4.5)

Recently upgraded a 3.5 project to 4.5. There is a chunk of data that we are serializing and storing in the database, but every time a save occurs in the upgraded project, the XML formatting has changed, throwing errors, and I can't seem to figure out the core issue. There are 2 SO questions in particular that mention encoding changes, but I've tried switching to UTF8 (in a few different ways specified in the answers on those questions), without any success - with UTF8 I just got a mess of strange characters throughout the entire file.
The main issues that I can see occurring are:
A leading ? character is added to the XML (which I've come to find out is a valid character, but we aren't handling apparently)
Child nodes aren't being included with some of the nodes.
Here is our serialization method:
public static string SerializeXml<T>(T instance)
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
MemoryStream memStream = new MemoryStream();
XmlTextWriter xmlWriter = new XmlTextWriter(memStream, Encoding.Unicode);
serializer.Serialize(xmlWriter, instance);
memStream = (MemoryStream)xmlWriter.BaseStream;
return UnicodeEncoding.Unicode.GetString(memStream.ToArray()).Replace("<?xml version=\"1.0\" encoding=\"utf-16\"?>", "");
}
and our deserialization method:
public static T DeserializeXml<T>(string xml)
{
XmlSerializer xs = new XmlSerializer(typeof(T));
StringReader reader = new StringReader(xml);
return (T)xs.Deserialize(reader);
}
Any help would be appreciated, I am not too familiar with serialization or encoding. Just curious what may have changed with the upgrade to 4.5, or if there is something I need to take a closer look at.
If you want to serialize to a string you need to use UTF-16. If you want to serialize with UTF-8 you need to serialize to a byte[]. Strings in C# are UTF-16, so in the code you posted I believe all your data is encoded as UTF-16, but because you are omitting the XML declaration, the code reading it assumes it is UTF-8.
I would recommend using functions like this and not omitting the XmlDeclaration:
public static string SerializeXmlToString<T>(T instance)
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.Unicode;
StringBuilder builder = new StringBuilder();
using (StringWriter writer = new StringWriter(builder))
using (XmlWriter xmlWriter = XmlWriter.Create(writer, settings))
{
serializer.Serialize(xmlWriter, instance);
}
return builder.ToString();
}
public static byte[] SerializeXml<T>(T instance)
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8;
using (MemoryStream memStream = new MemoryStream())
{
using (XmlWriter xmlWriter = XmlWriter.Create(memStream, settings))
{
serializer.Serialize(xmlWriter, instance);
}
return memStream.ToArray();
}
}
public static T DeserializeXml<T>(string data)
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
using (StringReader reader = new StringReader(data))
{
return (T)serializer.Deserialize(reader);
}
}
public static T DeserializeXml<T>(byte[] bytes)
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
using(MemoryStream stream = new MemoryStream(bytes))
{
return (T)serializer.Deserialize(stream);
}
}
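To illustrate why the declaration matters, here is a small sketch that serializes the same value both ways and inspects the encoding each declaration announces. The names DeclarationDemo, ToXmlString and ToUtf8Bytes are invented for this example; the two methods mirror the string and byte[] variants above in simplified form.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static class DeclarationDemo
{
    // String output: the declaration reports the StringWriter's encoding (UTF-16).
    public static string ToXmlString<T>(T instance)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, instance);
            return writer.ToString();
        }
    }

    // Byte output: the declaration reports the encoding set on the writer settings.
    public static byte[] ToUtf8Bytes<T>(T instance)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings { Encoding = Encoding.UTF8 };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(xmlWriter, instance);
            }
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        string asString = ToXmlString(5);
        byte[] asBytes = ToUtf8Bytes(5);

        Console.WriteLine(asString.Contains("encoding=\"utf-16\""));
        Console.WriteLine(Encoding.UTF8.GetString(asBytes).Contains("encoding=\"utf-8\""));
    }
}
```

Keeping the declaration means a reader of the stored XML can tell which of the two forms it is looking at instead of guessing.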

Convert XDocument to byte array (and byte array to XDocument)

I've taken over a system that stores large XML documents in SQL Server in binary format.
Currently the data is saved by converting it to a string, then converting that string to a byte array. But recently, with some large XML documents, I'm getting out-of-memory exceptions when attempting to convert to a string, so I want to bypass this process and go straight from the XDocument to a byte array.
The Entity Framework class holding the XML has been extended so that the binary data is accessible as a string like this:
partial class XmlData
{
public string XmlString { get { return Encoding.UTF8.GetString(XmlBinary); } set { XmlBinary = Encoding.UTF8.GetBytes(value); } }
}
I want to further extend the class to look something like this:
partial class XmlData
{
public string XmlString{ get { return Encoding.UTF8.GetString(XmlBinary); } set { XmlBinary = Encoding.UTF8.GetBytes(value); } }
public XDocument XDoc
{
get
{
// Convert XmlBinary to XDocument
}
set
{
// Convert XDocument to XmlBinary
}
}
}
I think I've nearly figured out the conversion, but when I use the partial class's XmlString property to get the XML back from the DB, the XML is always cut off near the end, each time at a different character count:
var memoryStream = new MemoryStream();
var xmlWriter = XmlWriter.Create(memoryStream);
myXDocument.WriteTo(xmlWriter);
XmlData.XmlBinary = memoryStream.ToArray();
SOLUTION
Here's the basic conversion:
var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = Encoding.UTF8 };
using (var memoryStream = new MemoryStream())
using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
{
myXDocument.WriteTo(xmlWriter);
xmlWriter.Flush();
XmlData.XmlBinary = memoryStream.ToArray();
}
But for some reason, this process adds some weird non-ASCII characters to the XML, so my previous XmlString method would load those characters and XDocument.Parse() would break. My new partial class therefore looks like this:
partial class XmlData
{
public string XmlString
{
get
{
var xml = Encoding.UTF8.GetString(XmlBinary);
xml = Regex.Replace(xml, @"[^\u0000-\u007F]", string.Empty); // Removes non-ASCII characters
return xml;
}
set
{
value = Regex.Replace(value, @"[^\u0000-\u007F]", string.Empty); // Removes non-ASCII characters
XmlBinary = Encoding.UTF8.GetBytes(value);
}
}
public XDocument XDoc
{
get
{
using (var memoryStream = new MemoryStream(XmlBinary))
using (var xmlReader = XmlReader.Create(memoryStream))
{
var xml = XDocument.Load(xmlReader);
return xml;
}
}
set
{
var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = Encoding.UTF8 };
using (var memoryStream = new MemoryStream())
using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
{
value.WriteTo(xmlWriter);
xmlWriter.Flush();
XmlBinary = memoryStream.ToArray();
}
}
}
}
That sounds like the buffer of one of the streams or writers was not flushed during the read or write. Wrap them in using (...) so they are closed, flushed, and disposed automatically, and also check that you call .Flush() everywhere you finish reading or writing.
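The "weird" characters mentioned in the solution are quite possibly the UTF-8 byte order mark that Encoding.UTF8 emits; that is an assumption, since the original bytes are not shown. If so, a sketch like the following avoids both the truncation and the BOM without stripping legitimate non-ASCII text. XDocumentBytes, ToBytes and FromBytes are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XDocumentBytes
{
    public static byte[] ToBytes(XDocument doc)
    {
        // UTF8Encoding(false) asks for UTF-8 without a byte order mark.
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        using (var stream = new MemoryStream())
        {
            // Disposing the writer flushes it before the bytes are copied out.
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.WriteTo(writer);
            }
            return stream.ToArray();
        }
    }

    public static XDocument FromBytes(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        {
            return XDocument.Load(stream);
        }
    }

    public static void Main()
    {
        var doc = new XDocument(new XElement("note", "æøå"));
        byte[] bytes = ToBytes(doc);

        Console.WriteLine(bytes[0] == (byte)'<');                // no BOM in front
        Console.WriteLine(FromBytes(bytes).Root.Value == "æøå"); // text survives
    }
}
```

Note that the regular expression in the solution above removes every non-ASCII character, so it would also delete real content such as "æøå"; leaving the BOM out at write time sidesteps that.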

C# XML Serialization - Leading Question Marks

Problem
By leveraging some samples I found online here, I've written some XML serialization methods.
Method1: Serialize an Object and return: (a) the type, (b) the xml string
Method2: Takes (a) and (b) above and gives you back the Object.
I noticed that the xml string from the Method1 contains a leading '?'. This seems to be fine when using Method2 to reconstruct the Object.
But when doing some testing in the application, sometimes we got leading '???' instead. This caused the Method2 to throw an exception while trying to reconstruct the Object.
The 'Object' in this case was just a simple int.
System.InvalidOperationException was unhandled
Message="There is an error in XML document (1, 1)."
Source="System.Xml"
StackTrace:
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
at System.Xml.Serialization.XmlSerializer.Deserialize(Stream stream)
at XMLSerialization.Program.DeserializeXmlStringToObject(String xmlString, String objectType) in C:\Documents and Settings\...Projects\XMLSerialization\Program.cs:line 96
at XMLSerialization.Program.Main(String[] args) in C:\Documents and Settings\...Projects\XMLSerialization\Program.cs:line 49
Would anyone be able to shed some light on what might be causing this?
Sample Code
Here's sample code from the mini-tester I wrote while coding this up which runs as a VS console app. It'll show you the XML string. You can also uncomment the regions to append the extra leading '??' to reproduce the exception.
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;
namespace XMLSerialization
{
class Program
{
static void Main(string[] args)
{
// serialize to string
#region int
object inObj = 5;
#endregion
#region string
//object inObj = "Testing123";
#endregion
#region list
//List<string> inObj = new List<string>();
//inObj.Add("0:25");
//inObj.Add("1:26");
#endregion
string[] stringArray = SerializeObjectToXmlString(inObj);
#region include leading ???
//int indexOfBracket = stringArray[0].IndexOf('<');
//stringArray[0] = "??" + stringArray[0];
#endregion
#region strip out leading ???
//int indexOfBracket = stringArray[0].IndexOf('<');
//string trimmedString = stringArray[0].Substring(indexOfBracket);
//stringArray[0] = trimmedString;
#endregion
Console.WriteLine("Input");
Console.WriteLine("-----");
Console.WriteLine("Object Type: " + stringArray[1]);
Console.WriteLine();
Console.WriteLine("XML String: " + Environment.NewLine + stringArray[0]);
Console.WriteLine(String.Empty);
// deserialize back to object
object outObj = DeserializeXmlStringToObject(stringArray[0], stringArray[1]);
Console.WriteLine("Output");
Console.WriteLine("------");
#region int
Console.WriteLine("Object: " + (int)outObj);
#endregion
#region string
//Console.WriteLine("Object: " + (string)outObj);
#endregion
#region list
//string[] tempArray;
//List<string> list = (List<string>)outObj;
//foreach (string pair in list)
//{
// tempArray = pair.Split(':');
// Console.WriteLine(String.Format("Key:{0} Value:{1}", tempArray[0], tempArray[1]));
//}
#endregion
Console.Read();
}
private static string[] SerializeObjectToXmlString(object obj)
{
XmlTextWriter writer = new XmlTextWriter(new MemoryStream(), Encoding.UTF8);
writer.Formatting = Formatting.Indented;
XmlSerializer serializer = new XmlSerializer(obj.GetType());
serializer.Serialize(writer, obj);
MemoryStream stream = (MemoryStream)writer.BaseStream;
string xmlString = UTF8ByteArrayToString(stream.ToArray());
string objectType = obj.GetType().FullName;
return new string[]{xmlString, objectType};
}
private static object DeserializeXmlStringToObject(string xmlString, string objectType)
{
MemoryStream stream = new MemoryStream(StringToUTF8ByteArray(xmlString));
XmlSerializer serializer = new XmlSerializer(Type.GetType(objectType));
object obj = serializer.Deserialize(stream);
return obj;
}
private static string UTF8ByteArrayToString(Byte[] characters)
{
UTF8Encoding encoding = new UTF8Encoding();
return encoding.GetString(characters);
}
private static byte[] StringToUTF8ByteArray(String pXmlString)
{
UTF8Encoding encoding = new UTF8Encoding();
return encoding.GetBytes(pXmlString);
}
}
}
When I've come across this before, it usually had to do with encoding. I'd try specifying the encoding when you serialize your object. Try using the following code. Also, is there any specific reason why you need to return a string[] array? I've changed your methods to use generics so you don't have to specify a type.
private static string SerializeObjectToXmlString<T>(T obj)
{
XmlSerializer xmls = new XmlSerializer(typeof(T));
using (MemoryStream ms = new MemoryStream())
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8;
settings.Indent = true;
settings.IndentChars = "\t";
settings.NewLineChars = Environment.NewLine;
settings.ConformanceLevel = ConformanceLevel.Document;
using (XmlWriter writer = XmlTextWriter.Create(ms, settings))
{
xmls.Serialize(writer, obj);
}
string xml = Encoding.UTF8.GetString(ms.ToArray());
return xml;
}
}
private static T DeserializeXmlStringToObject <T>(string xmlString)
{
XmlSerializer xmls = new XmlSerializer(typeof(T));
using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(xmlString)))
{
return (T)xmls.Deserialize(ms);
}
}
If you still have problems, try using Encoding.ASCII in your code anywhere you see Encoding.UTF8, unless you have a specific reason for using UTF8. I'm not sure of the cause, but I've seen UTF8 encoding cause this exact problem in certain cases when serializing.
This is the BOM (byte order mark). You can either remove it:
if (xmlString.Length > 0 && xmlString[0] != '<')
{
xmlString = xmlString.Substring(1, xmlString.Length - 1);
}
Or serialize to a string with a StringWriter, which produces UTF-16 and adds no BOM:
using (StringWriter writer = new StringWriter(CultureInfo.InvariantCulture))
{
serializer.Serialize(writer, instance);
result = writer.ToString();
}
And deserialize:
object result;
using (StringReader reader = new StringReader(instance))
{
result = serializer.Deserialize(reader);
}
If you are using this code only inside .NET applications, UTF-16 won't create problems, as it is the native encoding of strings in .NET.
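To make the BOM explanation concrete, the sketch below builds UTF-8 bytes with a preamble and decodes them two ways. Encoding.UTF8.GetString keeps the mark as the character U+FEFF at the front of the string, while a StreamReader detects and skips it. The names BomDemo, WithBom and DecodeSkippingBom are invented for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Builds UTF-8 bytes with the three-byte preamble (EF BB BF) in front.
    public static byte[] WithBom(string xml)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes(xml);
        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    // StreamReader detects the preamble and leaves it out of the text.
    public static string DecodeSkippingBom(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] bytes = WithBom("<int>5</int>");

        string raw = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(raw[0] == '\uFEFF');               // True: BOM kept as a character
        Console.WriteLine(DecodeSkippingBom(bytes)[0] == '<'); // True: BOM skipped
    }
}
```

A console that cannot display U+FEFF will typically show it as '?', which would match the single leading question mark described in the question.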

Serialization in C# without using file system

I have a simple 2D array of strings and I would like to stuff it into an SPFieldMultiLineText in MOSS. This maps to an ntext database field.
I know I can serialize to XML and store to the file system, but I would like to serialize without touching the filesystem.
public override void ItemAdding(SPItemEventProperties properties)
{
// build the array
List<List<string>> matrix = new List<List<string>>();
/*
* populating the array is snipped, works fine
*/
// now stick this matrix into the field in my list item
properties.AfterProperties["myNoteField"] = matrix; // throws an error
}
Looks like I should be able to do something like this:
XmlSerializer s = new XmlSerializer(typeof(List<List<string>>));
properties.AfterProperties["myNoteField"] = s.Serialize.ToString();
but that doesn't work. All the examples I've found demonstrate writing to a text file.
StringWriter outStream = new StringWriter();
XmlSerializer s = new XmlSerializer(typeof(List<List<string>>));
s.Serialize(outStream, myObj);
properties.AfterProperties["myNoteField"] = outStream.ToString();
Here's a Generic serializer (C#):
public string SerializeObject<T>(T objectToSerialize)
{
BinaryFormatter bf = new BinaryFormatter();
MemoryStream memStr = new MemoryStream();
try
{
bf.Serialize(memStr, objectToSerialize);
memStr.Position = 0;
return Convert.ToBase64String(memStr.ToArray());
}
finally
{
memStr.Close();
}
}
In your case you could call with:
SerializeObject<List<List<string>>>(matrix);
Use the TextWriter and TextReader classes with the StringWriter.
To Wit:
XmlSerializer s = new XmlSerializer(typeof(whatever));
TextWriter w = new StringWriter();
s.Serialize(w, whatever);
yourstring = w.ToString();
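Applied to the matrix from the question, a round trip through StringWriter and StringReader might look like the sketch below. MatrixXml and its two methods are hypothetical names; the resulting string is what would be assigned to the list field.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public static class MatrixXml
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(List<List<string>>));

    // Serializes entirely in memory; nothing touches the file system.
    public static string Serialize(List<List<string>> matrix)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, matrix);
            return writer.ToString();
        }
    }

    public static List<List<string>> Deserialize(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (List<List<string>>)Serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        var matrix = new List<List<string>>
        {
            new List<string> { "a", "b" },
            new List<string> { "c", "d" }
        };

        string xml = Serialize(matrix);
        List<List<string>> restored = Deserialize(xml);
        Console.WriteLine(restored[1][0]); // c
    }
}
```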
IN VB.NET
Public Shared Function SerializeToByteArray(ByVal object2Serialize As Object) As Byte()
Using stream As New MemoryStream
Dim xmlSerializer As New XmlSerializer(object2Serialize.GetType())
xmlSerializer.Serialize(stream, object2Serialize)
Return stream.ToArray()
End Using
End Function
Public Shared Function SerializeToString(ByVal object2Serialize As Object) As String
Dim bytes As Byte() = SerializeToByteArray(object2Serialize)
Return Text.Encoding.UTF8.GetString(bytes)
End Function
IN C#
public byte[] SerializeToByteArray(object object2Serialize) {
using(MemoryStream stream = new MemoryStream()) {
XmlSerializer xmlSerializer = new XmlSerializer(object2Serialize.GetType());
xmlSerializer.Serialize(stream, object2Serialize);
return stream.ToArray();
}
}
public string SerializeToString(object object2Serialize) {
byte[] bytes = SerializeToByteArray(object2Serialize);
return Encoding.UTF8.GetString(bytes);
}
