I have an abstract class whose concrete implementations I'm trying to serialize and deserialize. In my abstract base class I have this:
[DataContract]
public abstract class MyAbstractBase
{
[DataMember]
public string Foo { get; set; }
// some other abstract methods that derived classes have to implement
}
And to that class I add a method to serialize:
public string SerializeBase64()
{
// Serialize to a base 64 string
byte[] bytes;
long length = 0;
MemoryStream ws = new MemoryStream();
DataContractSerializer serializer = new DataContractSerializer(this.GetType());
XmlDictionaryWriter binaryDictionaryWriter = XmlDictionaryWriter.CreateBinaryWriter(ws);
serializer.WriteObject(binaryDictionaryWriter, this);
binaryDictionaryWriter.Flush();
length = ws.Length;
bytes = ws.GetBuffer();
string encodedData = bytes.Length + ":" + Convert.ToBase64String(bytes, 0, bytes.Length, Base64FormattingOptions.None);
return encodedData;
}
This seems to work fine, in that it produces "something" and doesn't actually throw any errors.
Of course, the problem comes with deserialization. I added this:
public static MyAbstractBase DeserializeBase64(string s)
{
int p = s.IndexOf(':');
int length = Convert.ToInt32(s.Substring(0, p));
// Extract data from the base 64 string!
byte[] memorydata = Convert.FromBase64String(s.Substring(p + 1));
MemoryStream rs = new MemoryStream(memorydata, 0, length);
DataContractSerializer serializer = new DataContractSerializer(typeof(MyAbstractBase ), new List<Type>() { typeof(SomeOtherClass.MyDerivedClass) });
XmlDictionaryReader binaryDictionaryReader = XmlDictionaryReader.CreateBinaryReader(rs, XmlDictionaryReaderQuotas.Max);
return (MyAbstractBase)serializer.ReadObject(binaryDictionaryReader);
}
I thought by adding the "known types" to my DataContractSerializer, it would be able to figure out how to deserialize the derived class, but it appears that it doesn't. It complains with the error:
Expecting element 'MyAbstractBase' from namespace 'http://schemas.datacontract.org/2004/07/MyApp.Foo'.. Encountered 'Element' with name 'SomeOtherClass.MyDerivedClass', namespace 'http://schemas.datacontract.org/2004/07/MyApp.Foo.Bar'.
So any idea what I'm missing here?
I put together a simple demonstration of the problem on a dot net fiddle here:
http://dotnetfiddle.net/W7GCOw
Unfortunately, it won't run directly there because it doesn't include the System.Runtime.Serialization assemblies. But if you drop it into a Visual Studio project, it will serialize fine, but balks at deserialization.
When serializing your data, use the same overloaded method for serialization as you use for deserialization:
DataContractSerializer serializer = new DataContractSerializer(typeof(MyAbstractBase ), new List<Type>() { typeof(SomeOtherClass.MyDerivedClass) });
Also, declare a KnownType attribute on your base class so the serializer knows which derived classes it may have to deserialize:
[DataContract]
[KnownType(typeof(SomeOtherClass.MyDerivedClass))]
public abstract class MyAbstractBase
{
[DataMember]
public string Foo { get; set; }
// some other abstract methods that derived classes have to implement
}
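To make the "same serializer on both sides" point concrete, here is a minimal round-trip sketch. BaseSketch, DerivedSketch and KnownTypeRoundTrip are hypothetical names standing in for the classes in the question; the known type is declared with the attribute here, and passing a known-types list to the constructor on both sides should be equivalent.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical stand-ins for MyAbstractBase / MyDerivedClass.
[DataContract]
[KnownType(typeof(DerivedSketch))]
public abstract class BaseSketch
{
    [DataMember]
    public string Foo { get; set; }
}

[DataContract]
public class DerivedSketch : BaseSketch
{
    [DataMember]
    public int Bar { get; set; }
}

public static class KnownTypeRoundTrip
{
    // A single factory, so writing and reading always agree on the root type.
    private static DataContractSerializer CreateSerializer() =>
        new DataContractSerializer(typeof(BaseSketch));

    public static byte[] Serialize(BaseSketch value)
    {
        using (var ms = new MemoryStream())
        {
            CreateSerializer().WriteObject(ms, value);
            return ms.ToArray(); // only the bytes actually written
        }
    }

    public static BaseSketch Deserialize(byte[] data)
    {
        using (var ms = new MemoryStream(data))
        {
            return (BaseSketch)CreateSerializer().ReadObject(ms);
        }
    }
}
```

Because the root type is the base class in both directions, the derived type is written as type information inside the base element rather than as a differently named root element.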
So I identified a couple of problems. The first is that the CreateBinaryWriter just doesn't seem to work at all. So I dropped it and just directly serialized with serializer.WriteObject(ws,this);
The second problem is on serialization I did this:
DataContractSerializer serializer = new DataContractSerializer(this.GetType(), new List<Type>() { typeof(SomeOtherClass.MyDerivedClass) });
The problem is that the type I'm passing there isn't the base type, it's whatever type I'm calling this function from. But in deserialization I have this:
DataContractSerializer serializer = new DataContractSerializer(typeof(MyAbstractBase ), new List<Type>() { typeof(SomeOtherClass.MyDerivedClass) });
This isn't the same serializer. The type is different. So changing both to typeof(MyAbstractBase) fixed the problem in my simple example.
Of course, in my real project I'm still getting errors with it complaining on deserialization with The data at the root level is invalid. Line 1, position 1. which is odd because I've compared both the data coming out of serialization and the data going into deserialization and they are absolutely identical. No rogue BOM or anything.
EDIT: I've solved the data at the root level is invalid problem. It seems that decorating my [DataContract] attributes with explicit Name and Namespace properties solved the problem and, as an added bonus, reduced the size of my data because I greatly shortened the namespaces. Quite why the serializer couldn't cope with the real namespaces I don't know.
Another Edit: One last wrinkle. I think the namespace thing was a red herring and I got it working only by pure coincidence. The root of the problem (along with the solution) is explained here:
https://msmvps.com/blogs/peterritchie/archive/2009/04/29/datacontractserializer-readobject-is-easily-confused.aspx
When you call GetBuffer() on the MemoryStream you can get excess null bytes, because the underlying array is resized in chunks as needed. This puts a run of nulls on the end of your serialized data (they show up as a run of As after base-64 encoding the array), and these are what break deserialization with the very confusing error The data at the root level is invalid. Line 1, position 1. It's confusing because the problem isn't at the beginning at all, it's at the END!!!
In case anybody is interested, my serialization now looks like this:
public string SerializeBase64()
{
// Serialize to a base 64 string
byte[] bytes;
long length = 0;
using (MemoryStream ws = new MemoryStream())
{
XmlDictionaryWriter writer = XmlDictionaryWriter.CreateTextWriter(ws);
DataContractSerializer serializer = new DataContractSerializer(typeof(MyAbstractBase ), new List<Type>() { typeof(SomeOtherClass.MyDerivedClass) });
serializer.WriteObject(writer, this);
writer.Flush();
length = ws.Length;
// Note: https://msmvps.com/blogs/peterritchie/archive/2009/04/29/datacontractserializer-readobject-is-easily-confused.aspx
// We need to trim nulls from the buffer produced by the serializer because it'll barf on them when it tries to deserialize.
bytes = new byte[ws.Length];
Array.Copy(ws.GetBuffer(), bytes, bytes.Length);
}
string encodedData = bytes.Length + ":" + Convert.ToBase64String(bytes, 0, bytes.Length, Base64FormattingOptions.None);
return encodedData;
}
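The GetBuffer() pitfall can be reproduced in isolation. This is a small sketch (BufferDemo is a made-up name) that compares the internal buffer with what ToArray() returns; ToArray() copies only the bytes that were written, so it can replace the manual Array.Copy above.

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    // Returns (bytes written, length of the internal buffer, length of ToArray()).
    public static (int written, int bufferLength, int arrayLength) Measure()
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[] { 1, 2, 3 }, 0, 3);
            byte[] raw = ms.GetBuffer();   // whole internal buffer, unused capacity included
            byte[] exact = ms.ToArray();   // copy of just the bytes written so far
            return ((int)ms.Length, raw.Length, exact.Length);
        }
    }
}
```

The buffer length is normally larger than the number of bytes written, and the unused tail is zero-filled, which is where the trailing nulls come from.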
I am trying to consume a web service that claims to return JSON, but actually always returns JSONP. I don't see a way to change that service's behavior.
I would like to use NewtonSoft Json.Net to parse the result. I have declared a class, let's call it MyType that I want to deserialize the inner JSON result into.
JSONP:
parseResponse({
"total" : "13,769",
"lower" : "1",
"upper" : "20"})
As you can see this is not correct JSON as it has parseResponse( prefix and ) suffix. While this example is very simple, the actual response can be quite long, on the order of 100Ks.
MyType:
public class MyType
{
public Decimal total;
public int lower;
public int upper;
}
After I get my web service response into a stream and JsonTextReader I try to deserialize like this:
(MyType)serializer.Deserialize(jsonTextReader, typeof(MyType));
Of course I get null for a result because there is that pesky parseResponse with round brackets.
I've taken a look at this question, which unfortunately does not help. I'm actually using a JsonTextReader to feed in the JSON, rather than a string (and I'd prefer that, to avoid the performance hit of creating a huge string). Even if I used the suggestion from that question, it looks dangerous because it uses a global replace. If there is no good way to use a stream, an answer with safe parsing of strings would be okay.
If I interpret your question as follows:
I am trying to deserialize some JSON from a Stream. The "JSON" is actually in JSONP format and so contains some prefix and postfix text I would like to ignore. How can I skip the prefix and postfix text while still reading and deserializing directly from stream rather than loading the entire stream into a string?
Then you can deserialize your JSON from a JSONP stream using the following extension method:
public static class JsonExtensions
{
public static T DeserializeEmbeddedJsonP<T>(Stream stream)
{
using (var textReader = new StreamReader(stream))
return DeserializeEmbeddedJsonP<T>(textReader);
}
public static T DeserializeEmbeddedJsonP<T>(TextReader textReader)
{
using (var jsonReader = new JsonTextReader(textReader.SkipPast('(')))
{
var settings = new JsonSerializerSettings
{
CheckAdditionalContent = false,
};
return JsonSerializer.CreateDefault(settings).Deserialize<T>(jsonReader);
}
}
}
public static class TextReaderExtensions
{
public static TTextReader SkipPast<TTextReader>(this TTextReader reader, char ch) where TTextReader : TextReader
{
while (true)
{
var c = reader.Read();
if (c == -1 || c == ch)
return reader;
}
}
}
Notes:
Prior to constructing the JsonTextReader I construct a StreamReader and skip past the first '(' character in the stream. This positions the StreamReader at the beginning of the actual JSON.
Before deserialization I set JsonSerializerSettings.CheckAdditionalContent = false to tell the serializer to ignore any characters after the end of the JSON content. Oddly enough it is necessary to do this explicitly despite the fact that the default value seems to be false already, since the underlying field is nullable.
The same code can be used to deserialize embedded JSONP from a string by passing a StringReader to DeserializeEmbeddedJsonP<T>(TextReader reader). Doing so avoids the need to create a new string by trimming the prefix and postfix text and so may improve performance and memory use even for smaller strings.
Sample working .Net fiddle.
It looks like it's returning JSONP. Kind of weird that a webservice would do that by default, without you including "?callback". In any case, if that's just the way it is, you can easily use a RegEx to just strip off the method call:
var x = WebServiceCall();
x = Regex.Replace(x, @"^.+?\(|\)$", "");
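If a string-based approach is acceptable, the wrapper can also be removed without a regex by taking everything between the first '(' and the last ')'. This is only a sketch: JsonpText.ExtractJson is a made-up helper, and it assumes the response is a single callback(...) call with no '(' before the callback's own parenthesis.

```csharp
using System;

public static class JsonpText
{
    // Returns the text between the first '(' and the last ')'.
    // Parentheses inside the JSON payload are left untouched.
    public static string ExtractJson(string jsonp)
    {
        int start = jsonp.IndexOf('(');
        int end = jsonp.LastIndexOf(')');
        if (start < 0 || end <= start)
            throw new FormatException("Input does not look like JSONP.");
        return jsonp.Substring(start + 1, end - start - 1);
    }
}
```

Unlike a global replace, this cannot touch parentheses that occur inside string values. It does still allocate a second string, so it does not address the memory concern for very large responses.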
I have a class containing an array of serializable objects (call it elements). I want to serialize this class and then deserialize it, but deserialization is proving difficult.
Because I don't know the element types until runtime, before serialization I create two arrays that parallel the elements array: one (TypeOfElements) holds each element's type name, and the other (SerilizedElements) holds each element's serialized string. After deserialization, though, I don't know how to rebuild the main elements array. How can I go from a Type to the class I need in order to recreate each element?
[ProtoContract]
class MyClass
{
.
.
public MyClass()
{
}
object[] elements;
[ProtoMember(1)]
string[] SerilizedElements;
[ProtoMember(2)]
string[] TypeOfElements;
[ProtoBeforeSerialization]
void initBeforeSerilize()
{
TypeOfElements = new string[elements.Length];
SerilizedElements = new string[elements.Length];
for (int i = 0; i < elements.Length; i++)
{
TypeOfElements[i] = elements[i].GetType().ToString();
using (MemoryStream ms = new MemoryStream())
{
Serializer.Serialize(ms, elements[i]);
SerilizedElements[i] = Encoding.UTF8.GetString(ms.ToArray());
}
}
}
[ProtoAfterDeserialization]
void initAfterSerilize()
{
for (int i = 0; i < SerilizedElements.Length; i++)
{
Type t = Type.GetType(TypeOfElements[i]);
using(MemoryStream ms=new MemoryStream(Encoding.ASCII.GetBytes(SerilizedElements[i])))
{
//I don't know how to write this line
elements[i]=Serializer.Deserialize<t>(ms);
}
}
}
}
For the "I only know the type at runtime" issue, look at Serializer.NonGeneric, which has all the methods you would want for working with a Type. The non-generic API is also the primary API on the v2 API, aka TypeModel. The string encoding issue has already been noted; if you need strings, base-64 should be used, but personally I'd use a byte[]. I would also suggest thinking about whether inheritance can be used instead of unknown types - this is certainly possible if the number of candidate types is finite and known.
(this is a re-post of a question that I saw in my RSS, but which was deleted by the OP. I've re-added it because I've seen this question asked several times in different places; wiki for "good form")
Suddenly, I receive a ProtoException when deserializing and the message is: unknown wire-type 6
What is a wire-type?
What are the different wire-type values and their description?
I suspect a field is causing the problem, how to debug this?
First thing to check:
IS THE INPUT DATA PROTOBUF DATA? If you try and parse another format (json, xml, csv, binary-formatter), or simply broken data (an "internal server error" html placeholder text page, for example), then it won't work.
What is a wire-type?
It is a 3-bit flag that tells it (in broad terms; it is only 3 bits after all) what the next data looks like.
Each field in protocol buffers is prefixed by a header that tells it which field (number) it represents,
and what type of data is coming next; this "what type of data" is essential to support the case where
unanticipated data is in the stream (for example, you've added fields to the data-type at one end), as
it lets the serializer know how to read past that data (or store it for round-trip if required).
What are the different wire-type values and their description?
0: variant-length integer (up to 64 bits) - base-128 encoded with the MSB indicating continuation (used as the default for integer types, including enums)
1: 64-bit - 8 bytes of data (used for double, or electively for long/ulong)
2: length-prefixed - first read an integer using variant-length encoding; this tells you how many bytes of data follow (used for strings, byte[], "packed" arrays, and as the default for child objects properties / lists)
3: "start group" - an alternative mechanism for encoding child objects that uses start/end tags - largely deprecated by Google, it is more expensive to skip an entire child-object field since you can't just "seek" past an unexpected object
4: "end group" - twinned with 3
5: 32-bit - 4 bytes of data (used for float, or electively for int/uint and other small integer types)
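To see where a value like "wire-type 6" can come from, it helps to decode a field header by hand. In the protobuf encoding the header is a varint whose low 3 bits are the wire-type and whose remaining bits are the field number. The sketch below (FieldHeader is a made-up helper) handles only the single-byte case, i.e. field numbers below 16.

```csharp
using System;

public static class FieldHeader
{
    // Splits a one-byte field header into (field number, wire-type).
    // Assumption: the header fits in one varint byte (high bit clear).
    public static (int fieldNumber, int wireType) Decode(byte header)
    {
        if ((header & 0x80) != 0)
            throw new ArgumentException("Multi-byte headers are not handled in this sketch.");
        return (header >> 3, header & 0x07);
    }
}
```

For example, 0x08 decodes to field 1 with wire-type 0, and 0x12 to field 2 with wire-type 2. The ASCII letter 'F' (0x46) decodes to field 8 with wire-type 6, which is not in the list above; that is why feeding text or other non-protobuf data to the deserializer tends to fail with this kind of message.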
I suspect a field is causing the problem, how to debug this?
Are you serializing to a file? The most likely cause (in my experience) is that you have overwritten an existing file, but have not truncated it; i.e. it was 200 bytes; you've re-written it, but with only 182 bytes. There are now 18 bytes of garbage on the end of your stream that is tripping it up. Files must be truncated when re-writing protocol buffers. You can do this with FileMode:
using(var file = new FileStream(path, FileMode.Truncate)) {
// write
}
or alternatively by SetLength after writing your data:
file.SetLength(file.Position);
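The overwrite-without-truncate scenario is easy to reproduce with plain byte arrays. This sketch (TruncateDemo is a made-up name; the 200 and 182 byte sizes are the ones from the description above) rewrites a file through FileMode.OpenOrCreate and reports the resulting length with and without SetLength.

```csharp
using System;
using System.IO;

public static class TruncateDemo
{
    // Overwrites a 200-byte file with 182 bytes and returns the final file length.
    public static long Rewrite(string path, bool truncate)
    {
        File.WriteAllBytes(path, new byte[200]);
        using (var file = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            file.Write(new byte[182], 0, 182);
            if (truncate)
                file.SetLength(file.Position); // drop the stale tail
        }
        return new FileInfo(path).Length;
    }
}
```

Without the SetLength call the file keeps its old 200-byte length, so the last 18 bytes are leftovers from the previous contents.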
Other possible cause
You are (accidentally) deserializing a stream into a different type than what was serialized. It's worth double-checking both sides of the conversation to ensure this is not happening.
Since the stack trace references this StackOverflow question, I thought I'd point out that you can also receive this exception if you (accidentally) deserialize a stream into a different type than what was serialized. So it's worth double-checking both sides of the conversation to ensure this is not happening.
This can also be caused by an attempt to write more than one protobuf message to a single stream. The solution is to use SerializeWithLengthPrefix and DeserializeWithLengthPrefix.
Why this happens:
The protobuf specification supports a fairly small number of wire-types (the binary storage formats) and data-types (the .NET etc data-types). Additionally, this is not 1:1, nor is it 1:many or many:1 - a single wire-type can be used for multiple data-types, and a single data-type can be encoded via any of multiple wire-types. As a consequence, you cannot fully understand a protobuf fragment unless you already know the schema, so you know how to interpret each value. When you are, say, reading an Int32 data-type, the supported wire-types might be "varint", "fixed32" and "fixed64", whereas when reading a String data-type, the only supported wire-type is "string".
If there is no compatible map between the data-type and wire-type, then the data cannot be read, and this error is raised.
Now let's look at why this occurs in the scenario here:
[ProtoContract]
public class Data1
{
[ProtoMember(1, IsRequired=true)]
public int A { get; set; }
}
[ProtoContract]
public class Data2
{
[ProtoMember(1, IsRequired = true)]
public string B { get; set; }
}
class Program
{
static void Main(string[] args)
{
var d1 = new Data1 { A = 1};
var d2 = new Data2 { B = "Hello" };
var ms = new MemoryStream();
Serializer.Serialize(ms, d1);
Serializer.Serialize(ms, d2);
ms.Position = 0;
var d3 = Serializer.Deserialize<Data1>(ms); // This will fail
var d4 = Serializer.Deserialize<Data2>(ms);
Console.WriteLine("{0} {1}", d3, d4);
}
}
In the above, two messages are written directly after each-other. The complication is: protobuf is an appendable format, with append meaning "merge". A protobuf message does not know its own length, so the default way of reading a message is: read until EOF. However, here we have appended two different types. If we read this back, it does not know when we have finished reading the first message, so it keeps reading. When it gets to data from the second message, we find ourselves reading a "string" wire-type, but we are still trying to populate a Data1 instance, for which member 1 is an Int32. There is no map between "string" and Int32, so it explodes.
The *WithLengthPrefix methods allow the serializer to know where each message finishes; so, if we serialize a Data1 and Data2 using the *WithLengthPrefix, then deserialize a Data1 and a Data2 using the *WithLengthPrefix methods, then it correctly splits the incoming data between the two instances, only reading the right value into the right object.
Additionally, when storing heterogeneous data like this, you might want to additionally assign (via *WithLengthPrefix) a different field-number to each class; this provides greater visibility of which type is being deserialized. There is also a method in Serializer.NonGeneric which can then be used to deserialize the data without needing to know in advance what we are deserializing:
// Data1 is "1", Data2 is "2"
Serializer.SerializeWithLengthPrefix(ms, d1, PrefixStyle.Base128, 1);
Serializer.SerializeWithLengthPrefix(ms, d2, PrefixStyle.Base128, 2);
ms.Position = 0;
var lookup = new Dictionary<int,Type> { {1, typeof(Data1)}, {2,typeof(Data2)}};
object obj;
while (Serializer.NonGeneric.TryDeserializeWithLengthPrefix(ms,
PrefixStyle.Base128, fieldNum => lookup[fieldNum], out obj))
{
Console.WriteLine(obj); // writes Data1 on the first iteration,
// and Data2 on the second iteration
}
Previous answers already explain the problem better than I can. I just want to add an even simpler way to reproduce the exception.
This error will also occur simply if the type of a serialized ProtoMember is different from the expected type during deserialization.
For instance if the client sends the following message:
public class DummyRequest
{
[ProtoMember(1)]
public int Foo{ get; set; }
}
But what the server deserializes the message into is the following class:
public class DummyRequest
{
[ProtoMember(1)]
public string Foo{ get; set; }
}
This results in the following error message, which is slightly misleading in this case:
ProtoBuf.ProtoException: Invalid wire-type; this usually means you have over-written a file without truncating or setting the length
It will even occur if the property name changed. Let's say the client sent the following instead:
public class DummyRequest
{
[ProtoMember(1)]
public int Bar{ get; set; }
}
This will still cause the server to deserialize the int Bar to string Foo which causes the same ProtoBuf.ProtoException.
I hope this helps somebody debugging their application.
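Also check the obvious: make sure all your subclasses have the [ProtoContract] attribute. It is easy to miss one when you have rich DTOs.
I've seen this issue when using the wrong Encoding to convert the bytes to and from strings.
Encoding.UTF8 corrupts arbitrary binary data. Encoding.Default worked for me, but it is platform-dependent (on .NET Core and later it is UTF-8), so base-64 is the safer choice if you really need a string.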
using (var ms = new MemoryStream())
{
Serializer.Serialize(ms, obj);
var bytes = ms.ToArray();
str = Encoding.Default.GetString(bytes);
}
If you are using SerializeWithLengthPrefix, be aware that casting the instance to object breaks the deserialization code and causes ProtoBuf.ProtoException : Invalid wire-type.
using (var ms = new MemoryStream())
{
var msg = new Message();
Serializer.SerializeWithLengthPrefix(ms, (object)msg, PrefixStyle.Base128); // Casting msg to object breaks the deserialization code.
ms.Position = 0;
Serializer.DeserializeWithLengthPrefix<Message>(ms, PrefixStyle.Base128);
}
This happened in my case because I had something like this:
var ms = new MemoryStream();
Serializer.Serialize(ms, batch);
_queue.Add(Convert.ToBase64String(ms.ToArray()));
So basically I was putting a base64 into a queue and then, on the consumer side I had:
var stream = new MemoryStream(Encoding.UTF8.GetBytes(myQueueItem));
var batch = Serializer.Deserialize<List<EventData>>(stream);
So although the type of each myQueueItem was correct, I had forgotten that the string was base-64 encoded. The solution was to decode it first:
var bytes = Convert.FromBase64String(myQueueItem);
var stream = new MemoryStream(bytes);
var batch = Serializer.Deserialize<List<EventData>>(stream);
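The difference between the two conversions can be checked without protobuf at all. This is a small sketch (TextRoundTrip is a made-up name) that pushes a byte array through UTF-8 text and through base-64 and reports whether the bytes come back unchanged.

```csharp
using System;
using System.Linq;
using System.Text;

public static class TextRoundTrip
{
    // True if the bytes survive bytes -> UTF-8 string -> bytes.
    public static bool SurvivesUtf8(byte[] data) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data)).SequenceEqual(data);

    // True if the bytes survive bytes -> base-64 string -> bytes.
    public static bool SurvivesBase64(byte[] data) =>
        Convert.FromBase64String(Convert.ToBase64String(data)).SequenceEqual(data);
}
```

With the bytes 0x08 0x96 0x01 (field 1 holding the varint 150), the UTF-8 round trip fails because 0x96 on its own is not valid UTF-8 and gets replaced, while the base-64 round trip returns the original bytes.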
I have a collection of objects that I need to write to a binary file.
I need the bytes in the file to be compact, so I can't use BinaryFormatter. BinaryFormatter throws in all sorts of info for deserialization needs.
If I try
byte[] myBytes = (byte[]) myObject
I get a runtime exception.
I need this to be fast so I'd rather not be copying arrays of bytes around. I'd just like the cast byte[] myBytes = (byte[]) myObject to work!
OK just to be clear, I cannot have any metadata in the output file. Just the object bytes. Packed object-to-object. Based on answers received, it looks like I'll be writing low-level Buffer.BlockCopy code. Perhaps using unsafe code.
To convert an object to a byte array:
// Convert an object to a byte array
public static byte[] ObjectToByteArray(Object obj)
{
BinaryFormatter bf = new BinaryFormatter();
using (var ms = new MemoryStream())
{
bf.Serialize(ms, obj);
return ms.ToArray();
}
}
You just need to copy this function into your code and pass it the object that you want to convert to a byte array. If you need to convert the byte array back to an object, you can use the function below:
// Convert a byte array to an Object
public static Object ByteArrayToObject(byte[] arrBytes)
{
using (var memStream = new MemoryStream())
{
var binForm = new BinaryFormatter();
memStream.Write(arrBytes, 0, arrBytes.Length);
memStream.Seek(0, SeekOrigin.Begin);
var obj = binForm.Deserialize(memStream);
return obj;
}
}
You can use these functions with custom classes. You just need to add the [Serializable] attribute to your class to enable serialization.
If you want the serialized data to be really compact, you can write serialization methods yourself. That way you will have a minimum of overhead.
Example:
public class MyClass {
public int Id { get; set; }
public string Name { get; set; }
public byte[] Serialize() {
using (MemoryStream m = new MemoryStream()) {
using (BinaryWriter writer = new BinaryWriter(m)) {
writer.Write(Id);
writer.Write(Name);
}
return m.ToArray();
}
}
public static MyClass Deserialize(byte[] data) {
MyClass result = new MyClass();
using (MemoryStream m = new MemoryStream(data)) {
using (BinaryReader reader = new BinaryReader(m)) {
result.Id = reader.ReadInt32();
result.Name = reader.ReadString();
}
}
return result;
}
}
Well a cast from myObject to byte[] is never going to work unless you've got an explicit conversion or if myObject is a byte[]. You need a serialization framework of some kind. There are plenty out there, including Protocol Buffers which is near and dear to me. It's pretty "lean and mean" in terms of both space and time.
You'll find that almost all serialization frameworks have significant restrictions on what you can serialize, however - Protocol Buffers more than some, due to being cross-platform.
If you can give more requirements, we can help you out more - but it's never going to be as simple as casting...
EDIT: Just to respond to this:
I need my binary file to contain the
object's bytes. Only the bytes, no
metadata whatsoever. Packed
object-to-object. So I'll be
implementing custom serialization.
Please bear in mind that the bytes in your objects are quite often references... so you'll need to work out what to do with them.
I suspect you'll find that designing and implementing your own custom serialization framework is harder than you imagine.
I would personally recommend that if you only need to do this for a few specific types, you don't bother trying to come up with a general serialization framework. Just implement an instance method and a static method in all the types you need:
public void WriteTo(Stream stream)
public static WhateverType ReadFrom(Stream stream)
One thing to bear in mind: everything becomes more tricky if you've got inheritance involved. Without inheritance, if you know what type you're starting with, you don't need to include any type information. Of course, there's also the matter of versioning - do you need to worry about backward and forward compatibility with different versions of your types?
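As a rough illustration of the WriteTo/ReadFrom pattern, here is a sketch for a made-up Waypoint type with two fields. The field order is the whole "format": no type names or other metadata are written, which is also why inheritance and versioning need extra thought.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical type used only to illustrate the pattern.
public sealed class Waypoint
{
    public int Id { get; set; }
    public string Name { get; set; }

    public void WriteTo(Stream stream)
    {
        // leaveOpen: true, because the caller owns the stream and may write more objects.
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(Id);
            writer.Write(Name); // length-prefixed string
        }
    }

    public static Waypoint ReadFrom(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            // Fields must be read in exactly the order they were written.
            int id = reader.ReadInt32();
            string name = reader.ReadString();
            return new Waypoint { Id = id, Name = name };
        }
    }
}
```

Several objects can be written back to back to the same stream and read back in the same order, since each call consumes only its own bytes.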
I took Crystalonics' answer and turned them into extension methods. I hope someone else will find them useful:
public static byte[] SerializeToByteArray(this object obj)
{
if (obj == null)
{
return null;
}
var bf = new BinaryFormatter();
using (var ms = new MemoryStream())
{
bf.Serialize(ms, obj);
return ms.ToArray();
}
}
public static T Deserialize<T>(this byte[] byteArray) where T : class
{
if (byteArray == null)
{
return null;
}
using (var memStream = new MemoryStream())
{
var binForm = new BinaryFormatter();
memStream.Write(byteArray, 0, byteArray.Length);
memStream.Seek(0, SeekOrigin.Begin);
var obj = (T)binForm.Deserialize(memStream);
return obj;
}
}
Use of BinaryFormatter is now considered unsafe; see the Microsoft docs.
Just use System.Text.Json:
To serialize to bytes:
JsonSerializer.SerializeToUtf8Bytes(obj);
To deserialize to your type:
JsonSerializer.Deserialize<YourType>(byteArray);
You are really talking about serialization, which can take many forms. Since you want small and binary, protocol buffers may be a viable option - giving version tolerance and portability as well. Unlike BinaryFormatter, the protocol buffers wire format doesn't include all the type metadata; just very terse markers to identify data.
In .NET there are a few implementations; in particular
protobuf-net
dotnet-protobufs
I'd humbly argue that protobuf-net (which I wrote) allows more .NET-idiomatic usage with typical C# classes ("regular" protocol-buffers tends to demand code-generation); for example:
[ProtoContract]
public class Person {
[ProtoMember(1)]
public int Id {get;set;}
[ProtoMember(2)]
public string Name {get;set;}
}
....
Person person = new Person { Id = 123, Name = "abc" };
Serializer.Serialize(destStream, person);
...
Person anotherPerson = Serializer.Deserialize<Person>(sourceStream);
This worked for me:
byte[] bfoo = (byte[])foo;
foo is an Object that I'm 100% certain is a byte array.
The approach that worked best for me was to use Newtonsoft.Json:
public TData ByteToObj<TData>(byte[] arr){
return JsonConvert.DeserializeObject<TData>(Encoding.UTF8.GetString(arr));
}
public byte[] ObjToByte<TData>(TData data){
var json = JsonConvert.SerializeObject(data);
return Encoding.UTF8.GetBytes(json);
}
Take a look at Serialization, a technique to "convert" an entire object to a byte stream. You may send it to the network or write it into a file and then restore it back to an object later.
To access the memory of an object directly (to do a "core dump") you'll need to head into unsafe code.
If you want something more compact than BinaryWriter or a raw memory dump will give you, then you need to write some custom serialisation code that extracts the critical information from the object and packs it in an optimal way.
edit
P.S. It's very easy to wrap the BinaryWriter approach into a DeflateStream to compress the data, which will usually roughly halve the size of the data.
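For reference, wrapping BinaryWriter in a DeflateStream might look like the sketch below. CompressedWriter and its two fields are made up for illustration; how much the output shrinks depends entirely on the data, and very small payloads can even grow.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressedWriter
{
    public static byte[] Write(int id, string name)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true))
            using (var writer = new BinaryWriter(deflate))
            {
                writer.Write(id);
                writer.Write(name);
            } // disposing the writer and DeflateStream flushes the compressed data into output
            return output.ToArray();
        }
    }

    public static (int id, string name) Read(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var reader = new BinaryReader(deflate))
        {
            int id = reader.ReadInt32();
            string name = reader.ReadString();
            return (id, name);
        }
    }
}
```

The DeflateStream has to be disposed (or at least flushed and closed) before the bytes are taken from the MemoryStream, otherwise the tail of the compressed data may be missing.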
I believe what you're trying to do is impossible.
The junk that BinaryFormatter creates is necessary to recover the object from the file after your program has stopped.
However, it is possible to get the raw object data; you just need to know its exact size (which is more difficult than it sounds):
public static unsafe byte[] Binarize(object obj, int size)
{
var r = new byte[size];
var rf = __makeref(obj);
var a = **(IntPtr**)(&rf);
Marshal.Copy(a, r, 0, size);
return r;
}
this can be recovered via:
public unsafe static dynamic ToObject(byte[] bytes)
{
var rf = __makeref(bytes);
**(int**)(&rf) += 8;
return GCHandle.Alloc(bytes).Target;
}
The reason the above methods don't work for serialization is that the first four bytes of the returned data correspond to a RuntimeTypeHandle. The RuntimeTypeHandle describes the layout/type of the object, but its value changes every time the program is run.
EDIT: that is stupid don't do that -->
If you already know for certain the type of the object to be deserialized, you can swap those bytes for BitConverter.GetBytes((int)typeof(yourtype).TypeHandle.Value) at deserialization time.
I found another way to convert an object to a byte[], here is my solution:
IEnumerable en = (IEnumerable) myObject;
byte[] myBytes = en.OfType<byte>().ToArray();
This method returns an array of bytes from an object.
private byte[] ConvertBody(object model)
{
return Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(model));
}
Spans are very useful for something like this. To put it simply, they are very fast ref structs that have a pointer to the first element and a length. They guarantee a contiguous region of memory and the JIT compiler is able to optimize based on these guarantees. They work just like pointer arrays you can see all the time in the C and C++ languages.
Since spans were added, you can use two MemoryMarshal functions to get all the bytes of an object without the overhead of streams. Under the hood, it is just a little bit of casting. As you asked, there are no extra allocations on the way down to the bytes unless you copy them to an array or another span. Here is an example of the two functions used together to get the bytes of a single struct:
public static Span<byte> GetBytes<T>(ref T o)
where T : struct
{
if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
throw new Exception($"Type {typeof(T).Name} is or contains a reference"); // nameof(T) would only print "T"
var singletonSpan = MemoryMarshal.CreateSpan(ref o, 1);
var bytes = MemoryMarshal.AsBytes(singletonSpan);
return bytes;
}
The first function, MemoryMarshal.CreateSpan, takes a reference to an object with a length for how many adjacent objects of the same type come immediately after it. They must be adjacent because spans guarantee contiguous regions of memory. In this case, the length is 1 because we are only working with the single object. Under the hood, it is done by creating a span beginning at the first element.
The second function, MemoryMarshal.AsBytes, takes a span and turns it into a span of bytes. Under the hood, it creates a span over bytes instead of T and adjusts the length accordingly. This span still covers the argument object, so any changes to the bytes will be reflected in the object. If you need an independent copy, spans have a ToArray method that copies the contents into a new array, and a CopyTo method for copying into an existing span.
The if statement guards against taking the bytes of a type that is, or contains, a reference. Without it, you could end up copying a raw reference whose target the garbage collector has since moved or collected.
The type T must be a struct because MemoryMarshal.AsBytes is constrained to value types (where T : struct).
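To make the round trip concrete, here is a minimal sketch under a few assumptions: Point3 is a hypothetical example struct, SpanBytes is a hypothetical wrapper class, and MemoryMarshal.Read is used for the reverse direction (the answer above only covers struct-to-bytes). The byte order of the result is whatever the current machine uses.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical example type: three ints, no references, 12 bytes in total.
public struct Point3
{
    public int X;
    public int Y;
    public int Z;
}

public static class SpanBytes
{
    // Returns a view over the struct's own memory; no copy is made here.
    public static Span<byte> GetBytes<T>(ref T o) where T : struct
    {
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            throw new InvalidOperationException($"Type {typeof(T).Name} is or contains a reference");
        return MemoryMarshal.AsBytes(MemoryMarshal.CreateSpan(ref o, 1));
    }

    // Reverse direction (an addition for illustration): reinterprets the bytes as a T.
    public static T FromBytes<T>(ReadOnlySpan<byte> bytes) where T : struct
    {
        return MemoryMarshal.Read<T>(bytes);
    }

    public static Point3 RoundTrip(Point3 p)
    {
        byte[] copy = GetBytes(ref p).ToArray(); // ToArray makes an independent copy
        return FromBytes<Point3>(copy);
    }
}
```

Because the span returned by GetBytes aliases the struct, call ToArray (or CopyTo) before the struct goes out of scope if the bytes need to outlive it.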
You can use the method below to convert a list of objects into a byte array using System.Text.Json serialization.
private static byte[] ConvertToByteArray(List<object> mergedResponse)
{
var options = new JsonSerializerOptions
{
PropertyNameCaseInsensitive = true,
};
if (mergedResponse != null && mergedResponse.Any())
{
return JsonSerializer.SerializeToUtf8Bytes(mergedResponse, options);
}
return new byte[] { };
}
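For completeness, a sketch of the reverse direction with System.Text.Json (JsonBytes is a hypothetical helper name). It uses a concrete element type rather than List<object>, on the assumption that you know the type up front; with object, System.Text.Json hands back JsonElement values rather than the original types.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class JsonBytes
{
    // Serializes any value straight to UTF-8 bytes, skipping the intermediate string.
    public static byte[] ToBytes<T>(T value)
    {
        return JsonSerializer.SerializeToUtf8Bytes(value);
    }

    // Reads the UTF-8 bytes back into the requested type.
    public static T FromBytes<T>(byte[] bytes)
    {
        return JsonSerializer.Deserialize<T>(new ReadOnlySpan<byte>(bytes));
    }
}
```

For example, JsonBytes.FromBytes<List<int>>(JsonBytes.ToBytes(new List<int> { 1, 2, 3 })) gives back a list with the same three elements.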