Why is binary file vary large compared to text?

Why is binary file vary large compared to text? - c#

I have been keeping a large set of data as TEXT records in a TEXT file:
yyyyMMddTHHmmssfff doube1 double2
However when I read it I need to parse each DateTime. This is quite slow for millions of records.
So, now I am trying it as a binary file which I created by serlializing my class.
That way I do not need to parse the DateTime.
class MyRecord
{
DateTime DT;
double Price1;
double Price2;
}
public byte[] SerializeToByteArray()
{
var bf = new BinaryFormatter();
using (var ms = new MemoryStream())
{
bf.Serialize(ms, this);
return ms.ToArray();
}
}
MyRecord mr = new MyRecord();
outBin = new BinaryWriter(File.Create(binFileName, 2048, FileOptions.None));
for (AllRecords) //Pseudo
{
mr = new MyRecord(); //Pseudo
outBin.Write(mr.SerializeToByteArray());
}
The resulting binary is on average 3 times the size of the TEXT file.
Is that to be expected?
EDIT 1
I am exploring using Protbuf to help me:
I want to do this with using USING to fit my existing structure.
private void DisplayBtn_Click(object sender, EventArgs e)
{
string fileName = dbDirectory + #"\nAD20120101.dat";
FileStream fs = File.OpenRead(fileName);
MyRecord tr;
while (fs.CanRead)
{
tr = Serializer.Deserialize<MyRecord>(fs);
Console.WriteLine("> "+ tr.ToString());
}
}
BUT after first record tr - full of zeroes.

Your archive likely has considerable overhead serializing type information with each record.
Instead, make the whole collection serializable (if it isn't already) and serialize that in one go.

You are not storing a simple binary version of your DateTime, but an object representing those. That is much larger then simply storing your Date as Text.
If you create a class
class MyRecords
{
DateTime[] DT;
double[] Price1;
double[] Price2;
}
And serialize that, it should be much smaller.
Also I guess DateTime still needs lots of space, so you can convert your DateTime to a Integer Unix Timestamp and store that.

As Requested by the OP.
the output is not a binary file it's binary serialization of instances plus an overhead of BinaryFormatter to allow deserialization later for this reason you get 3 times the file large than expected
if you need a smart serialization solution you can take a look at ProtoBuf-net https://code.google.com/p/protobuf-net/
here you can find a link explaining how you can achieve this
[ProtoContract]
Public class MyRecord
{ [ProtoMember(1)]
DateTime DT;
[ProtoMember(2)]
double Price1;
[ProtoMember(3)]
double Price2;
}

Related

C# serialization without struct metadata itself

Right now I'm working on a game engine. To be more efficient and keep data from the end user, I'm trying to use serialization on a modified form of the Wavefront's *.OBJ format. I have multiple structs set up to represent data, and the serialization of the objects works fine except it takes up a significant amount of file space (at least x5 that of the original OBJ file).
To be specific, here's a quick example of what the final object would be (in a JSON-esque format):
{
[{float 5.0, float 2.0, float 1.0}, {float 7.0, float 2.0, float 1.0}, ...]
// ^^^ vertex positions
// other similar structures for colors, normals, texture coordinates
// ...
[[{int 1, int 1, int 1}, {int 2, int 2, int 1}, {int 3, int 3, int 2}], ...]
//represents one face; represents the following
//face[vertex{position index, text coords index, normal index}, vertex{}...]
}
Basically, my main issue with the method of serializing data (binary format) is it saves the names of the structs, not the values. I'd love to keep the data in the format I have already, just without saving the struct itself in my data. I want to save something similar to the above, yet it'll still let me recompile with a different struct name later.
Here's the main object I'm serializing and saving to a file:
[Serializable()] //the included structs have this applied
public struct InstantGameworksObjectData
{
public Position[] Positions;
public TextureCoordinates[] TextureCoordinates;
public Position[] Normals;
public Face[] Faces;
}
Here's the method in which I serialize and save the data:
IFormatter formatter = new BinaryFormatter();
long Beginning = DateTime.Now.Ticks / 10000000;
foreach (string file in fileNames)
{
Console.WriteLine("Begin " + Path.GetFileName(file));
var output = InstantGameworksObject.ConvertOBJToIGWO(File.ReadAllLines(file));
Console.WriteLine("Writing file");
Stream fileOutputStream = new FileStream(outputPath + #"\" + Path.GetFileNameWithoutExtension(file) + ".igwo", FileMode.Create, FileAccess.Write, FileShare.None);
formatter.Serialize(fileOutputStream, output);
Console.WriteLine(outputPath + #"\" + Path.GetFileNameWithoutExtension(file) + ".igwo");
}
The output, of course, is in binary/hex (based on what program you use to view the file), and that's great:
But putting it into a hex-to-text converter online yields specific name data:
In the long run, this could mean gigabytes worth of useless data. How can I save my C# object with the data in the correct format, just without the extra meta-clutter?

As you correctly note, the standard framework binary formatters include a host of metadata about the structure of the data. This is to try to keep the serialised data self-describing. If they were to separate the data from all that metadata, then the smallest change to the structure of classes would render the previously serialised data useless. By that token, I doubt you'd find any standard framework method of serialising binary data that didn't include all the metadata.
Even ProtoBuf includes the semantics of the data in the file data, albeit with less overhead.
Given that the structure of your data follows the reasonably common and well established form of 3D object data, you could roll your own format for your assets which strips the semantics and only stores the raw data. You can implement read and write methods easily using the BinaryReader/BinaryWriter classes (which would be my preference). If you're looking to obfuscate data from the end user, there are a variety of different ways that you could achieve that with this approach.
For example:
public static InstantGameworksObjectData ReadIgoObjct(BinaryReader pReader)
{
var lOutput = new InstantGameworksObjectData();
int lVersion = pReader.ReadInt32(); // Useful in case you ever want to change the format
int lPositionCount = pReader.ReadInt32(); // Store the length of the Position array before the data so you can pre-allocate the array.
lOutput.Positions = new Position[lPositionCount];
for ( int lPositionIndex = 0 ; lPositionIndex < lPositionCount ; ++ lPositionIndex )
{
lOutput.Positions[lPositionIndex] = new Position();
lOutput.Positions[lPositionIndex].X = pReader.ReadSingle();
lOutput.Positions[lPositionIndex].Y = pReader.ReadSingle();
lOutput.Positions[lPositionIndex].Z = pReader.ReadSingle();
// or if you prefer... lOutput.Positions[lPositionIndex] = Position.ReadPosition(pReader);
}
int lTextureCoordinateCount = pReader.ReadInt32();
lOutput.TextureCoordinates = new TextureCoordinate[lPositionCount];
for ( int lTextureCoordinateIndex = 0 ; lTextureCoordinateIndex < lTextureCoordinateCount ; ++ lTextureCoordinateIndex )
{
lOutput.TextureCoordinates[lTextureCoordinateIndex] = new TextureCoordinate();
lOutput.TextureCoordinates[lTextureCoordinateIndex].X = pReader.ReadSingle();
lOutput.TextureCoordinates[lTextureCoordinateIndex].Y = pReader.ReadSingle();
lOutput.TextureCoordinates[lTextureCoordinateIndex].Z = pReader.ReadSingle();
// or if you prefer... lOutput.TextureCoordinates[lTextureCoordinateIndex] = TextureCoordinate.ReadTextureCoordinate(pReader);
}
// ...
}
As far as space efficiency and speed goes, this approach is hard to beat. However, this works well for the 3D objects as they're fairly well-defined and the format is not likely to change, but this approach may not extend well to the other assets that you want to store.
If you find you are needing to change class structures frequently, you may find you have to write lots of if-blocks based on version to correctly read a file, and have to regularly debug issues where the data in the file is not quite in the format you expect. A happy medium might be to use something such as ProtoBuf for the bulk of your development until you're happy with the structure of your data object classes, and then writing raw binary Read/Write methods for each of them before you release.
I'd also recommend some Unit Tests to ensure that your Read and Write methods are correctly persisting the object to avoid pulling your hair out later.
Hope this helps

Fast way to get the length of serialized data using ProtoBuf-net?

Let's say I have a Person class. It has an editable Notes property.
I want to serialize the Person instance to a fixed size buffer, thus the Notes cannot be infinite long.
In the UI, I use a TextBox to let user edit the notes. I want to dynamically update a label saying how many more characters you can write.
This is my current implementation, is there any faster method? (I'm using rs282)
public Int32 GetSerializedLength()
{
Byte[] data;
using (MemoryStream ms = new MemoryStream())
{
Serializer.SerializeWithLengthPrefix<Person>(ms, this, PrefixStyle.Base128);
data = ms.ToArray();
}
using (MemoryStream ms = new MemoryStream(data))
{
Int32 length = 0;
if (Serializer.TryReadLengthPrefix(ms, PrefixStyle.Base128, out length))
return length;
else
return -1;
}
}
EDIT: I'm confused about the internal lengh of the serialized data and the total length of the serialized data.
Here is my final version:
private static MemoryStream _ms = new MemoryStream();
public static Int64 GetSerializedLength(Person person)
{
if(null == person) return 0;
_ms.SetLength(0);
Serializer.Serialize<Person>(_ms, person);
return _ms.Length;
}

With the edit, it sounds like you want the length without serializing it (since if you want the length with serializing it, you'd just serialize it and check the .Length).
Basically, no - this isn't available. I know that some of the other implementations build this data eagerly, that is in part because they are constructing the buffered data all the time, where-as protobuf-net works from an object graph.
protobuf-net does not do this - it builds the data by discovery during a single pass over the object graph. Is there a specific purpose you have in mind? Things can always be added (with effort, though).
Re the issue of a notes (string) field that you don't want to be over-sized; as a sanity check, note that protubuf uses UTF8 or string data, so personally I would just do:
if(theText.Length > MAX || Encoding.UTF8.GetByteCount(theText) > MAX
|| GetSerializedLength(obj) > MAX)
{
//
}
note we've checked this a bit more cheaply in the obvious cases

Converting an ArrayList of Points to a byte array to store in a SQL database

I have part of an application which is used to save a users signature on a hand-held device.
I have managed to do this by having an arraylist of "Lines" which contains an arraylist of "points".
I need to store this information in a SQL database. The only info I have seen to do this, is to use the image data type and pass it a byte array. There being the question. How would I do this, or is there a better way?

If you convert this data into an image, some information (which point belongs to which line) is lost. If you want to keep this information, you should serialize your data.
Decide on the format: You could use something terribly verbose like XML...
<Line>
<Point X="3" Y="4" />
<Point X="3" Y="5" />
...
</Line>
<Line>
<Point X="10" Y="10" />
...
</Line>
...
...or some custom text-based format:
3,4-3,5-...;10,10-...;...
Of course, you'll be most efficient storing the data in some kind of binary encoding.
If you use a text- (or XML-)based format, your serialization will yield a string which you can store in a varchar(MAX) or text field in SQL Server. If you decide on a binary encoding, the serialization will result in a byte array. For this, varbinary(MAX) or image is the data type of choice. Note that the name image might be misleading: This does not necessarily mean that your data is encoded as an "image" (jpeg, png, etc.).
Implementation: For XML and a standard binary encoding, .NET provides built-in functions. Here's an introduction and links to walkthroughs on MSDN:
Serialization (C# and Visual Basic)
If you decide on some custom text-based or custom binary encoding, you will need to code the serialization and deserialization parts yourself. If you want help with that, you need to provide some more information: How does your data structure look exactly? What data types are your X and Y coordinates?

It sounds like you want some form of binary serialization; here is a working version using protobuf-net (note I changed ArrayList to List<T>):
using System.Collections.Generic;
using System.IO;
using ProtoBuf;
[ProtoContract]
class Line
{
private readonly List<Point> points = new List<Point>();
[ProtoMember(1, DataFormat = DataFormat.Group)]
public List<Point> Points { get { return points; } }
}
[ProtoContract]
class Point
{
[ProtoMember(1)]
public int X { get; set; }
[ProtoMember(2)]
public int Y { get; set; }
}
static class Program
{
static void Main()
{
var lines = new List<Line> {
new Line { Points = {
new Point { X = 1, Y = 2}
}},
new Line { Points = {
new Point { X = 3, Y = 4},
new Point { X = 5, Y = 6}
}},
};
byte[] raw;
// serialize
using (var ms = new MemoryStream())
{
Serializer.Serialize(ms, lines);
raw = ms.ToArray();
}
List<Line> clone;
// deserialize
using (var ms = new MemoryStream(raw))
{
clone = Serializer.Deserialize<List<Line>>(ms);
}
}
}

So you let the user create a bitmap-graphic by signing with a stylus?
Sounds perfectly reasonable to me to store this as an image.
create bitmap from array

Just create a black and white bitmap from your data and then convert your bitmap to PNG image using .NET library:
MemoryStream pngbuffer = new MemoryStream();
Bitmap.Save(pngbuffer,ImageFormat.Png)
And store it as a binary data. Most databases have a format for storing binary data: PostgreSQL bytea, MySQL and Oracle blob, MSSQL image etc.

Using Protobuf-net, I suddenly got an exception about an unknown wire-type

(this is a re-post of a question that I saw in my RSS, but which was deleted by the OP. I've re-added it because I've seen this question asked several times in different places; wiki for "good form")
Suddenly, I receive a ProtoException when deserializing and the message is: unknown wire-type 6
What is a wire-type?
What are the different wire-type values and their description?
I suspect a field is causing the problem, how to debug this?

First thing to check:
IS THE INPUT DATA PROTOBUF DATA? If you try and parse another format (json, xml, csv, binary-formatter), or simply broken data (an "internal server error" html placeholder text page, for example), then it won't work.
What is a wire-type?
It is a 3-bit flag that tells it (in broad terms; it is only 3 bits after all) what the next data looks like.
Each field in protocol buffers is prefixed by a header that tells it which field (number) it represents,
and what type of data is coming next; this "what type of data" is essential to support the case where
unanticipated data is in the stream (for example, you've added fields to the data-type at one end), as
it lets the serializer know how to read past that data (or store it for round-trip if required).
What are the different wire-type values and their description?
0: variant-length integer (up to 64 bits) - base-128 encoded with the MSB indicating continuation (used as the default for integer types, including enums)
1: 64-bit - 8 bytes of data (used for double, or electively for long/ulong)
2: length-prefixed - first read an integer using variant-length encoding; this tells you how many bytes of data follow (used for strings, byte[], "packed" arrays, and as the default for child objects properties / lists)
3: "start group" - an alternative mechanism for encoding child objects that uses start/end tags - largely deprecated by Google, it is more expensive to skip an entire child-object field since you can't just "seek" past an unexpected object
4: "end group" - twinned with 3
5: 32-bit - 4 bytes of data (used for float, or electively for int/uint and other small integer types)
I suspect a field is causing the problem, how to debug this?
Are you serializing to a file? The most likely cause (in my experience) is that you have overwritten an existing file, but have not truncated it; i.e. it was 200 bytes; you've re-written it, but with only 182 bytes. There are now 18 bytes of garbage on the end of your stream that is tripping it up. Files must be truncated when re-writing protocol buffers. You can do this with FileMode:
using(var file = new FileStream(path, FileMode.Truncate)) {
// write
}
or alternatively by SetLength after writing your data:
file.SetLength(file.Position);
Other possible cause
You are (accidentally) deserializing a stream into a different type than what was serialized. It's worth double-checking both sides of the conversation to ensure this is not happening.

Since the stack trace references this StackOverflow question, I thought I'd point out that you can also receive this exception if you (accidentally) deserialize a stream into a different type than what was serialized. So it's worth double-checking both sides of the conversation to ensure this is not happening.

This can also be caused by an attempt to write more than one protobuf message to a single stream. The solution is to use SerializeWithLengthPrefix and DeserializeWithLengthPrefix.
Why this happens:
The protobuf specification supports a fairly small number of wire-types (the binary storage formats) and data-types (the .NET etc data-types). Additionally, this is not 1:1, nor is is 1:many or many:1 - a single wire-type can be used for multiple data-types, and a single data-type can be encoded via any of multiple wire-types. As a consequence, you cannot fully understand a protobuf fragment unless you already know the scema, so you know how to interpret each value. When you are, say, reading an Int32 data-type, the supported wire-types might be "varint", "fixed32" and "fixed64", where-as when reading a String data-type, the only supported wire-type is "string".
If there is no compatible map between the data-type and wire-type, then the data cannot be read, and this error is raised.
Now let's look at why this occurs in the scenario here:
[ProtoContract]
public class Data1
{
[ProtoMember(1, IsRequired=true)]
public int A { get; set; }
}
[ProtoContract]
public class Data2
{
[ProtoMember(1, IsRequired = true)]
public string B { get; set; }
}
class Program
{
static void Main(string[] args)
{
var d1 = new Data1 { A = 1};
var d2 = new Data2 { B = "Hello" };
var ms = new MemoryStream();
Serializer.Serialize(ms, d1);
Serializer.Serialize(ms, d2);
ms.Position = 0;
var d3 = Serializer.Deserialize<Data1>(ms); // This will fail
var d4 = Serializer.Deserialize<Data2>(ms);
Console.WriteLine("{0} {1}", d3, d4);
}
}
In the above, two messages are written directly after each-other. The complication is: protobuf is an appendable format, with append meaning "merge". A protobuf message does not know its own length, so the default way of reading a message is: read until EOF. However, here we have appended two different types. If we read this back, it does not know when we have finished reading the first message, so it keeps reading. When it gets to data from the second message, we find ourselves reading a "string" wire-type, but we are still trying to populate a Data1 instance, for which member 1 is an Int32. There is no map between "string" and Int32, so it explodes.
The *WithLengthPrefix methods allow the serializer to know where each message finishes; so, if we serialize a Data1 and Data2 using the *WithLengthPrefix, then deserialize a Data1 and a Data2 using the *WithLengthPrefix methods, then it correctly splits the incoming data between the two instances, only reading the right value into the right object.
Additionally, when storing heterogeneous data like this, you might want to additionally assign (via *WithLengthPrefix) a different field-number to each class; this provides greater visibility of which type is being deserialized. There is also a method in Serializer.NonGeneric which can then be used to deserialize the data without needing to know in advance what we are deserializing:
// Data1 is "1", Data2 is "2"
Serializer.SerializeWithLengthPrefix(ms, d1, PrefixStyle.Base128, 1);
Serializer.SerializeWithLengthPrefix(ms, d2, PrefixStyle.Base128, 2);
ms.Position = 0;
var lookup = new Dictionary<int,Type> { {1, typeof(Data1)}, {2,typeof(Data2)}};
object obj;
while (Serializer.NonGeneric.TryDeserializeWithLengthPrefix(ms,
PrefixStyle.Base128, fieldNum => lookup[fieldNum], out obj))
{
Console.WriteLine(obj); // writes Data1 on the first iteration,
// and Data2 on the second iteration
}

Previous answers already explain the problem better than I can. I just want to add an even simpler way to reproduce the exception.
This error will also occur simply if the type of a serialized ProtoMember is different from the expected type during deserialization.
For instance if the client sends the following message:
public class DummyRequest
{
[ProtoMember(1)]
public int Foo{ get; set; }
}
But what the server deserializes the message into is the following class:
public class DummyRequest
{
[ProtoMember(1)]
public string Foo{ get; set; }
}
Then this will result in the for this case slightly misleading error message
ProtoBuf.ProtoException: Invalid wire-type; this usually means you have over-written a file without truncating or setting the length
It will even occur if the property name changed. Let's say the client sent the following instead:
public class DummyRequest
{
[ProtoMember(1)]
public int Bar{ get; set; }
}
This will still cause the server to deserialize the int Bar to string Foo which causes the same ProtoBuf.ProtoException.
I hope this helps somebody debugging their application.

Also check the obvious that all your subclasses have [ProtoContract] attribute. Sometimes you can miss it when you have rich DTO.

I've seen this issue when using the improper Encoding type to convert the bytes in and out of strings.
Need to use Encoding.Default and not Encoding.UTF8.
using (var ms = new MemoryStream())
{
Serializer.Serialize(ms, obj);
var bytes = ms.ToArray();
str = Encoding.Default.GetString(bytes);
}

If you are using SerializeWithLengthPrefix, please mind that casting instance to object type breaks the deserialization code and causes ProtoBuf.ProtoException : Invalid wire-type.
using (var ms = new MemoryStream())
{
var msg = new Message();
Serializer.SerializeWithLengthPrefix(ms, (object)msg, PrefixStyle.Base128); // Casting msg to object breaks the deserialization code.
ms.Position = 0;
Serializer.DeserializeWithLengthPrefix<Message>(ms, PrefixStyle.Base128)
}

This happened in my case because I had something like this:
var ms = new MemoryStream();
Serializer.Serialize(ms, batch);
_queue.Add(Convert.ToBase64String(ms.ToArray()));
So basically I was putting a base64 into a queue and then, on the consumer side I had:
var stream = new MemoryStream(Encoding.UTF8.GetBytes(myQueueItem));
var batch = Serializer.Deserialize<List<EventData>>(stream);
So though the type of each myQueueItem was correct, I forgot that I converted a string. The solution was to convert it once more:
var bytes = Convert.FromBase64String(myQueueItem);
var stream = new MemoryStream(bytes);
var batch = Serializer.Deserialize<List<EventData>>(stream);

How to convert an object to a byte array in C#

I have a collection of objects that I need to write to a binary file.
I need the bytes in the file to be compact, so I can't use BinaryFormatter. BinaryFormatter throws in all sorts of info for deserialization needs.
If I try
byte[] myBytes = (byte[]) myObject
I get a runtime exception.
I need this to be fast so I'd rather not be copying arrays of bytes around. I'd just like the cast byte[] myBytes = (byte[]) myObject to work!
OK just to be clear, I cannot have any metadata in the output file. Just the object bytes. Packed object-to-object. Based on answers received, it looks like I'll be writing low-level Buffer.BlockCopy code. Perhaps using unsafe code.

To convert an object to a byte array:
// Convert an object to a byte array
public static byte[] ObjectToByteArray(Object obj)
{
BinaryFormatter bf = new BinaryFormatter();
using (var ms = new MemoryStream())
{
bf.Serialize(ms, obj);
return ms.ToArray();
}
}
You just need copy this function to your code and send to it the object that you need to convert to a byte array. If you need convert the byte array to an object again you can use the function below:
// Convert a byte array to an Object
public static Object ByteArrayToObject(byte[] arrBytes)
{
using (var memStream = new MemoryStream())
{
var binForm = new BinaryFormatter();
memStream.Write(arrBytes, 0, arrBytes.Length);
memStream.Seek(0, SeekOrigin.Begin);
var obj = binForm.Deserialize(memStream);
return obj;
}
}
You can use these functions with custom classes. You just need add the [Serializable] attribute in your class to enable serialization

If you want the serialized data to be really compact, you can write serialization methods yourself. That way you will have a minimum of overhead.
Example:
public class MyClass {
public int Id { get; set; }
public string Name { get; set; }
public byte[] Serialize() {
using (MemoryStream m = new MemoryStream()) {
using (BinaryWriter writer = new BinaryWriter(m)) {
writer.Write(Id);
writer.Write(Name);
}
return m.ToArray();
}
}
public static MyClass Desserialize(byte[] data) {
MyClass result = new MyClass();
using (MemoryStream m = new MemoryStream(data)) {
using (BinaryReader reader = new BinaryReader(m)) {
result.Id = reader.ReadInt32();
result.Name = reader.ReadString();
}
}
return result;
}
}

Well a cast from myObject to byte[] is never going to work unless you've got an explicit conversion or if myObject is a byte[]. You need a serialization framework of some kind. There are plenty out there, including Protocol Buffers which is near and dear to me. It's pretty "lean and mean" in terms of both space and time.
You'll find that almost all serialization frameworks have significant restrictions on what you can serialize, however - Protocol Buffers more than some, due to being cross-platform.
If you can give more requirements, we can help you out more - but it's never going to be as simple as casting...
EDIT: Just to respond to this:
I need my binary file to contain the
object's bytes. Only the bytes, no
metadata whatsoever. Packed
object-to-object. So I'll be
implementing custom serialization.
Please bear in mind that the bytes in your objects are quite often references... so you'll need to work out what to do with them.
I suspect you'll find that designing and implementing your own custom serialization framework is harder than you imagine.
I would personally recommend that if you only need to do this for a few specific types, you don't bother trying to come up with a general serialization framework. Just implement an instance method and a static method in all the types you need:
public void WriteTo(Stream stream)
public static WhateverType ReadFrom(Stream stream)
One thing to bear in mind: everything becomes more tricky if you've got inheritance involved. Without inheritance, if you know what type you're starting with, you don't need to include any type information. Of course, there's also the matter of versioning - do you need to worry about backward and forward compatibility with different versions of your types?

I took Crystalonics' answer and turned them into extension methods. I hope someone else will find them useful:
public static byte[] SerializeToByteArray(this object obj)
{
if (obj == null)
{
return null;
}
var bf = new BinaryFormatter();
using (var ms = new MemoryStream())
{
bf.Serialize(ms, obj);
return ms.ToArray();
}
}
public static T Deserialize<T>(this byte[] byteArray) where T : class
{
if (byteArray == null)
{
return null;
}
using (var memStream = new MemoryStream())
{
var binForm = new BinaryFormatter();
memStream.Write(byteArray, 0, byteArray.Length);
memStream.Seek(0, SeekOrigin.Begin);
var obj = (T)binForm.Deserialize(memStream);
return obj;
}
}

Use of binary formatter is now considered unsafe. see --> Docs Microsoft
Just use System.Text.Json:
To serialize to bytes:
JsonSerializer.SerializeToUtf8Bytes(obj);
To deserialize to your type:
JsonSerializer.Deserialize(byteArray);

You are really talking about serialization, which can take many forms. Since you want small and binary, protocol buffers may be a viable option - giving version tolerance and portability as well. Unlike BinaryFormatter, the protocol buffers wire format doesn't include all the type metadata; just very terse markers to identify data.
In .NET there are a few implementations; in particular
protobuf-net
dotnet-protobufs
I'd humbly argue that protobuf-net (which I wrote) allows more .NET-idiomatic usage with typical C# classes ("regular" protocol-buffers tends to demand code-generation); for example:
[ProtoContract]
public class Person {
[ProtoMember(1)]
public int Id {get;set;}
[ProtoMember(2)]
public string Name {get;set;}
}
....
Person person = new Person { Id = 123, Name = "abc" };
Serializer.Serialize(destStream, person);
...
Person anotherPerson = Serializer.Deserialize<Person>(sourceStream);

This worked for me:
byte[] bfoo = (byte[])foo;
foo is an Object that I'm 100% certain that is a byte array.

I found Best Way this method worked correcly for me
Use Newtonsoft.Json
public TData ByteToObj<TData>(byte[] arr){
return JsonConvert.DeserializeObject<TData>(Encoding.UTF8.GetString(arr));
}
public byte[] ObjToByte<TData>(TData data){
var json = JsonConvert.SerializeObject(data);
return Encoding.UTF8.GetBytes(json);
}

Take a look at Serialization, a technique to "convert" an entire object to a byte stream. You may send it to the network or write it into a file and then restore it back to an object later.

To access the memory of an object directly (to do a "core dump") you'll need to head into unsafe code.
If you want something more compact than BinaryWriter or a raw memory dump will give you, then you need to write some custom serialisation code that extracts the critical information from the object and packs it in an optimal way.
edit
P.S. It's very easy to wrap the BinaryWriter approach into a DeflateStream to compress the data, which will usually roughly halve the size of the data.

I believe what you're trying to do is impossible.
The junk that BinaryFormatter creates is necessary to recover the object from the file after your program stopped.
However it is possible to get the object data, you just need to know the exact size of it (more difficult than it sounds) :
public static unsafe byte[] Binarize(object obj, int size)
{
var r = new byte[size];
var rf = __makeref(obj);
var a = **(IntPtr**)(&rf);
Marshal.Copy(a, r, 0, size);
return res;
}
this can be recovered via:
public unsafe static dynamic ToObject(byte[] bytes)
{
var rf = __makeref(bytes);
**(int**)(&rf) += 8;
return GCHandle.Alloc(bytes).Target;
}
The reason why the above methods don't work for serialization is that the first four bytes in the returned data correspond to a RuntimeTypeHandle. The RuntimeTypeHandle describes the layout/type of the object but the value of it changes every time the program is ran.
EDIT: that is stupid don't do that -->
If you already know the type of the object to be deserialized for certain you can switch those bytes for BitConvertes.GetBytes((int)typeof(yourtype).TypeHandle.Value) at the time of deserialization.

I found another way to convert an object to a byte[], here is my solution:
IEnumerable en = (IEnumerable) myObject;
byte[] myBytes = en.OfType<byte>().ToArray();
Regards

This method returns an array of bytes from an object.
private byte[] ConvertBody(object model)
{
return Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(model));
}

Spans are very useful for something like this. To put it simply, they are very fast ref structs that have a pointer to the first element and a length. They guarantee a contiguous region of memory and the JIT compiler is able to optimize based on these guarantees. They work just like pointer arrays you can see all the time in the C and C++ languages.
Ever since spans have been added, you are able to use two MemoryMarshal functions that can get all bytes of an object without the overhead of streams. Under the hood, it is just a little bit of casting. Just like you asked, there are no extra allocations going down to the bytes unless you copy them to an array or another span. Here is an example of the two functions in use to get the bytes of one:
public static Span<byte> GetBytes<T>(ref T o)
where T : struct
{
if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
throw new Exception($"Type {nameof(T)} is or contains a reference");
var singletonSpan = MemoryMarshal.CreateSpan(ref o, 1);
var bytes = MemoryMarshal.AsBytes(singletonSpan);
return bytes;
}
The first function, MemoryMarshal.CreateSpan, takes a reference to an object with a length for how many adjacent objects of the same type come immediately after it. They must be adjacent because spans guarantee contiguous regions of memory. In this case, the length is 1 because we are only working with the single object. Under the hood, it is done by creating a span beginning at the first element.
The second function, MemoryMarshal.AsBytes, takes a span and turns it into a span of bytes. This span still covers the argument object so any changes to the bytes will be reflected within the object. Fortunately, spans have a method called ToArray which copies all of the contents from the span into a new array. Under the hood, it creates a span over bytes instead of T and adjusts the length accordingly. If there's a span you want to copy into instead, there's the CopyTo method.
The if statement is there to ensure that you are not copying the bytes of a type that is or contains a reference for safety reasons. If it is not there, you may be copying a reference to an object that doesn't exist.
The type T must be a struct because MemoryMarshal.AsBytes requires a non-nullable type.

You can use below method to convert list of objects into byte array using System.Text.Json serialization.
private static byte[] CovertToByteArray(List<object> mergedResponse)
{
var options = new JsonSerializerOptions
{
PropertyNameCaseInsensitive = true,
};
if (mergedResponse != null && mergedResponse.Any())
{
return JsonSerializer.SerializeToUtf8Bytes(mergedResponse, options);
}
return new byte[] { };
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Why is binary file vary large compared to text? - c#

Your archive likely has considerable overhead serializing type information with each record. Instead, make the whole collection serializable (if it isn't already) and serialize that in one go.

Related

C# serialization without struct metadata itself

Fast way to get the length of serialized data using ProtoBuf-net?

Converting an ArrayList of Points to a byte array to store in a SQL database

Using Protobuf-net, I suddenly got an exception about an unknown wire-type

How to convert an object to a byte array in C#

Categories

Resources