In the example below, how do I get the length number using PrefixStyle with ProtoBuf-net?
And what's the difference between PrefixStyle.Base128 and PrefixStyle.Fixed32?
Thanks!
PerfTest clone;
using (MemoryStream ms = new MemoryStream())
{
    Serializer.SerializeWithLengthPrefix(ms, obj, PrefixStyle.Base128);
    byte[] raw = ms.ToArray();
    ms.Position = 0;
    clone = Serializer.DeserializeWithLengthPrefix<PerfTest>(ms, PrefixStyle.Base128);
}
Edit: Using the code below, the byte array has a length of 22. Why does TryReadLengthPrefix return 21? Surely it should return 22?
PerfTest clone;
using (MemoryStream ms = new MemoryStream())
{
    Serializer.SerializeWithLengthPrefix(ms, obj, PrefixStyle.Base128);
    byte[] raw = ms.ToArray();
    ms.Position = 0;
    int bArrayLen = ms.ToArray().Length; // returns 22
    int len; // set to 21. Why not 22?
    Serializer.TryReadLengthPrefix(ms, PrefixStyle.Base128, out len);
    clone = Serializer.DeserializeWithLengthPrefix<PerfTest>(ms, PrefixStyle.Fixed32);
}
Fixed32 always uses 4 bytes for the length prefix - this is not uncommon for people packing messages manually (indeed, I even had to add different-endian versions at some point, due to repeat requests).
The preference, though, should be Base128, which uses "varint" encoding, so small numbers take less space to encode. Additionally, this prefix style can be used to make a sequence of objects wire-compatible with a single object with a repeated field, which has some useful applications.
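To make the size difference concrete, here is a minimal sketch of base-128 varint encoding (an illustration only, not protobuf-net's internal code): 7 bits of payload per byte, with the high bit flagging "more bytes follow".

// Illustration only: encode an unsigned value as a base-128 varint.
static byte[] EncodeVarint(ulong value)
{
    var bytes = new System.Collections.Generic.List<byte>();
    do
    {
        byte b = (byte)(value & 0x7F);   // low 7 bits of the value
        value >>= 7;
        if (value != 0) b |= 0x80;       // set the continuation bit: more bytes follow
        bytes.Add(b);
    } while (value != 0);
    return bytes.ToArray();
}

// EncodeVarint(20).Length      == 1 byte  (Fixed32 would always spend 4)
// EncodeVarint(300).Length     == 2 bytes
// EncodeVarint(1000000).Length == 3 bytes

Fixed32, by contrast, always spends exactly four bytes on the prefix regardless of the value.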
Re getting the length: if you are using DeserializeWithLengthPrefix or DeserializeItems you shouldn't need to - it is handled internally. But if you need it, look at Serializer.TryReadLengthPrefix.
To clarify: the purpose of the *WithLengthPrefix methods is to allow separation of different objects/messages in a single stream - most commonly: network sockets. This is necessary because the protocol itself otherwise assumes an entire stream is a single message, so would keep trying to read until the stream/socket was closed.
Re the 22 vs 21 bytes: that is because the length prefix itself takes some length :) Actually, if the array is 22 I'm a little surprised the payload isn't 20 - 1 byte field header (to make the composed stream itself a valid protobuf stream), 1 byte length, and 20 bytes payload. I'm on a mobile at the moment, so I can't run the sample to investigate; it is possible to omit the field header (optionally), though - so it could be 1 byte length, 21 bytes payload. I'll look later.
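For example, here is a rough sketch of that multi-message scenario (assuming the PerfTest type from the question is a [ProtoContract] type; the overloads that take a field number are also what make the composed stream wire-compatible with a repeated field):

using (var ms = new MemoryStream())
{
    // Write several messages back-to-back on one stream, each preceded by a
    // Base128 length prefix under field number 1.
    for (int i = 0; i < 3; i++)
    {
        Serializer.SerializeWithLengthPrefix(ms, new PerfTest(), PrefixStyle.Base128, 1);
    }

    ms.Position = 0;

    // Read them back one at a time; each call stops at its own length prefix
    // instead of trying to consume the entire stream.
    foreach (PerfTest item in Serializer.DeserializeItems<PerfTest>(ms, PrefixStyle.Base128, 1))
    {
        // process item
    }
}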
Related
I was reading this popular Stack Overflow question, Creating a byte array from a stream, and wanted to get some clarification on how byte arrays work.
In this chunk of code here:
byte[] buffer = new byte[16 * 1024];
using (MemoryStream ms = new MemoryStream())
{
    int read;
    while ((read = PictureStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        ms.Write(buffer, 0, read);
    }
    return ms.ToArray();
}
Here's what I'm not understanding:
I'm getting lost on the size that this array is set to. For example, I use that code chunk to convert an image stream to a byte array, but I'm usually reading images that are larger than 2 megabytes, which is far larger than the size of the array that's reading in the picture - 16*1024 bytes. However, the above code converts the image from a stream to a byte array totally fine, with no "out of bounds index" errors to be had.
So how is my array a smaller size than the photo I'm reading in, yet it still manages to read it totally fine?
The array you pass is just a buffer. When you read from the stream it returns the number of bytes read and populates the buffer array with that many elements (it is not always fully filled). Then you write that many bytes to the memory stream. This process is repeated until there are no more bytes to read from the file.
You will notice that the array produced by ToArray is much larger than your buffer size.
As already mentioned in the comments, the Read function of the picture stream only reads a chunk of data - at most the amount that fits in the transport buffer. Once we have read that amount, we write it from the transport buffer to the output stream.
I tried to write a code snippet to demonstrate what is going on:
int inputBufferSizeInByte = 1024 * 1000 * 5; // 5,120,000 bytes = 5,000 KiB (~4.9 MiB)
Byte[] inputBuffer = new Byte[inputBufferSizeInByte];

// we fill our inputBuffer with random numbers
Random rnd = new Random();
rnd.NextBytes(inputBuffer);

// we define our streams
MemoryStream inputMemoryStream = new MemoryStream(inputBuffer);
MemoryStream outPutMemoryStream = new MemoryStream();

// we define a smaller buffer for reading
int transportBufferSizeInByte = 1024 * 16; // 16 KiB
byte[] transportBuffer = new byte[transportBufferSizeInByte];

int amountTotalWeReadInByte = 0;
int tempReadAmountInByte = 0;
int callWriteCounter = 0;

do
{
    tempReadAmountInByte = inputMemoryStream.Read(transportBuffer, 0, transportBufferSizeInByte);
    // we write what we got to the output
    if (tempReadAmountInByte > 0)
    {
        outPutMemoryStream.Write(transportBuffer, 0, tempReadAmountInByte);
        callWriteCounter++;
    }
    // we keep a running total of the amount read
    amountTotalWeReadInByte += tempReadAmountInByte;
} while (tempReadAmountInByte > 0);

// we sum up
Console.WriteLine("input buffer size: \t" + inputBufferSizeInByte + " \t in Byte");
Console.WriteLine("total amount read \t" + amountTotalWeReadInByte + " \t in Byte");
Console.WriteLine("output stream size: \t" + outPutMemoryStream.Length + " \t in Byte");
Console.WriteLine("called stream write \t" + callWriteCounter + "\t\t times");
output:
input buffer size: 5120000 in Byte
total amount read 5120000 in Byte
output stream size: 5120000 in Byte
called stream write 313 times
So we call the stream write function 313 times, and everything behaves as it should.
That brings me to the key question:
Why is there a size difference between the picture in memory and on the hard disk?
I think the picture encoding is the reason.
The difference between the size of a picture on the hard disk and its in-memory representation is often due to the picture encoding. I know this from working with the C++ library OpenCV, and I would guess the C# implementation behaves similarly.
See this related Q&A about the topic:
JPEG image memory byte size from OpenCV imread doesn't seem right
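As a rough C# illustration of the same idea (hypothetical file name; assumes System.Drawing is available): the file holds compressed, encoded image data, while the decoded bitmap in memory is roughly width * height * bytes-per-pixel.

using System;
using System.Drawing;
using System.IO;

long onDisk = new FileInfo("photo.jpg").Length;   // compressed JPEG bytes on disk

using (var bmp = new Bitmap("photo.jpg"))
{
    // Decoded in memory: roughly 4 bytes per pixel for 32bpp ARGB.
    long decoded = (long)bmp.Width * bmp.Height * 4;
    Console.WriteLine(onDisk + " bytes on disk vs ~" + decoded + " bytes decoded");
}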
Q: Is there any benefit of storing the length of a large array within the array itself?
Explanation:
Let's say we compress some large binary serialized object by using the GZipStream class of the System.IO.Compression namespace.
The output will be a Base64 string of some compressed byte array.
At some later point the Base64 string gets converted back to a byte array and the data needs to be decompressed.
While compressing the data we create a new byte array with the size of the compressed byte array + 4.
In the first 4 bytes we store the length/size of the original, uncompressed byte array, and we then BlockCopy the length and the compressed data into the new array. This new array gets converted into a Base64 string.
While decompressing we convert the Base64 string into a byte array.
Now we can extract the length of the original data by using the BitConverter class, which will extract an Int32 from the first 4 bytes.
We then allocate a byte array with the length we got from the first 4 bytes and let the stream write the decompressed bytes into it.
I can't imagine that something like this actually has any benefit at all.
It adds more complexity to the code, and more operations need to be executed.
Readability is reduced too.
The BlockCopy operations alone should consume so many resources that this just cannot have a benefit, right?
Compression example code:
byte[] buffer = new byte[0xffff]; // Some large binary serialized object.

// Compress in-memory.
using (var mem = new MemoryStream())
{
    // The actual compression takes place here.
    using (var zipStream = new GZipStream(mem, CompressionMode.Compress, true))
    {
        zipStream.Write(buffer, 0, buffer.Length);
    }

    // Store the compressed byte data here.
    var compressedData = new byte[mem.Length];
    mem.Position = 0;
    mem.Read(compressedData, 0, compressedData.Length);

    /* Increase the size by 4 to accommodate an Int32 that
    ** will store the length of the original, uncompressed data. */
    var zipBuffer = new byte[compressedData.Length + 4];

    // Store the compressedData array after the first 4 bytes, which hold the length.
    Buffer.BlockCopy(compressedData, 0, zipBuffer, 4, compressedData.Length);
    // Store the length of the uncompressed data in the first 4 bytes.
    Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, zipBuffer, 0, 4);

    return Convert.ToBase64String(zipBuffer);
}
Decompression example code:
byte[] zipBuffer = Convert.FromBase64String("some base64 string");

using (var inStream = new MemoryStream())
{
    // The length of the original data that was stored in the first 4 bytes.
    int dataLength = BitConverter.ToInt32(zipBuffer, 0);

    // Allocate an array with that specific size.
    byte[] buffer = new byte[dataLength];

    // Write the compressed data (everything after the first 4 bytes) to the stream.
    inStream.Write(zipBuffer, 4, zipBuffer.Length - 4);
    inStream.Position = 0;

    // Decompress the data into the buffer.
    using (var zipStream = new GZipStream(inStream, CompressionMode.Decompress))
    {
        zipStream.Read(buffer, 0, buffer.Length);
    }

    ... code
    ... code
    ... code
}
You tagged the question as C#, which means .NET, so the question is irrelevant:
The framework already stores the length with the array. That is how the array classes do the sanity checks on indexers, and how managed code prevents overflow attacks. That help alone is worth any minor inefficiency (note that the JIT is actually able to prune most of the checks - with a loop, for example, it will simply look at the running variable once per loop).
You would have to go all the way into unmanaged code and handle naked pointers to have any hope of getting rid of it. But why would you? The difference is so small, it falls under the speed rant. If it matters, you probably have a real-time programming case - and starting one of those with .NET was a bad idea.
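To illustrate that point with a trivial sketch (unrelated to the compression code above): the CLR keeps the length with every array and enforces it on each access.

byte[] data = new byte[16];

// The length is stored with the array object itself.
Console.WriteLine(data.Length);   // 16

try
{
    byte b = data[16];            // one past the end
}
catch (IndexOutOfRangeException)
{
    // The runtime's bounds check (backed by that stored length) catches it.
    Console.WriteLine("bounds check rejected the bad index");
}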
Good day. For a current project I need to know how data types are represented as bytes. For example, if I use:
long three = 500;
var bytes = BitConverter.GetBytes(three);
I get the values 244, 1, 0, 0, 0, 0, 0, 0. I get that it is a 64-bit value, and 8 bits go into a byte, thus there are 8 bytes. But how do 244 and 1 make up 500? I tried Googling it, but all I get is "use BitConverter". I need to know how BitConverter works under the hood. If anybody can point me to an article or explain how this stuff works, it would be appreciated.
It's quite simple.
BitConverter.GetBytes((long)1); // {1,0,0,0,0,0,0,0};
BitConverter.GetBytes((long)10); // {10,0,0,0,0,0,0,0};
BitConverter.GetBytes((long)100); // {100,0,0,0,0,0,0,0};
BitConverter.GetBytes((long)255); // {255,0,0,0,0,0,0,0};
BitConverter.GetBytes((long)256); // {0,1,0,0,0,0,0,0}; this 1 is 256
BitConverter.GetBytes((long)500); // {244,1,0,0,0,0,0,0}; this is your 500: 244 + 1 * 256
If you need the source code, you should check Microsoft's GitHub, since the implementation is open source :)
https://github.com/dotnet
From the source code:
// Converts a long into an array of bytes with length
// eight.
[System.Security.SecuritySafeCritical]  // auto-generated
public unsafe static byte[] GetBytes(long value)
{
    Contract.Ensures(Contract.Result<byte[]>() != null);
    Contract.Ensures(Contract.Result<byte[]>().Length == 8);

    byte[] bytes = new byte[8];
    fixed (byte* b = bytes)
        *((long*)b) = value;
    return bytes;
}
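A small sketch of how those bytes recombine into the original value (assuming a little-endian machine, which is what BitConverter produces on typical x86/x64 hardware; BitConverter.IsLittleEndian tells you which you are on): byte i is worth 256^i.

byte[] bytes = BitConverter.GetBytes(500L);   // {244, 1, 0, 0, 0, 0, 0, 0} on little-endian hardware

long value = 0;
for (int i = 0; i < bytes.Length; i++)
{
    // byte 0 is worth 1, byte 1 is worth 256, byte 2 is worth 65536, ...
    value |= (long)bytes[i] << (8 * i);
}

Console.WriteLine(value);                     // 500 = 244 * 1 + 1 * 256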
I have a string that only contains 1 and 0 and I need to save this to a .txt-File.
I also want it to be as small as possible. Since I have binary code, I can turn it into pretty much anything. Saving the string as-is is not an option, since apparently every character will take a whole byte, even though it's only ever a 1 or a 0.
I thought about turning my string into an array of Byte, but trying to convert "11111111" to Byte gave me a System.OverflowException.
My next thought was using an ASCII code page or something, but I don't know how reliable that is. Alternatively, I could turn each 8-bit piece of my string into the corresponding number: 8 characters would turn into at most 3 (255), which seems pretty nice to me. And since I know the highest individual number will be 255, I don't even need any delimiter for decoding.
But I'm sure there's a better way.
So:
What exactly is the best/most efficient way to store a string that only contains 1 and 0?
You could represent all your data as 64 bit integers and then write them to a binary file:
// The string we are working with.
string str = "1010101010010100010101101";

// The number of bits in a 64-bit integer.
int size = 64;

// Pad the end of the string with zeros so the length of the string is divisible by 64.
str += new string('0', (size - str.Length % size) % size);

// Convert each 64-character segment into a 64-bit integer.
// (Select requires "using System.Linq".)
long[] binary = new long[str.Length / size]
    .Select((x, idx) => Convert.ToInt64(str.Substring(idx * size, size), 2)).ToArray();

// Copy the result to a byte array.
byte[] bytes = new byte[binary.Length * sizeof(long)];
Buffer.BlockCopy(binary, 0, bytes, 0, bytes.Length);

// Write the result to file.
File.WriteAllBytes("MyFile.bin", bytes);
EDIT:
If you're only writing 64 bits then it's a one-liner:
File.WriteAllBytes("MyFile.bin", BitConverter.GetBytes(Convert.ToUInt64(str, 2)));
I would suggest using BinaryWriter. Like this:
BinaryWriter writer = new BinaryWriter(File.Open(fileName, FileMode.Create));
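For instance, a sketch along those lines (hypothetical file name): pack the bit string 8 characters per byte and write the bytes with the BinaryWriter. Note that Convert.ToByte(s, 2) parses base 2, which avoids the OverflowException from the question (that came from parsing "11111111" as a decimal number).

string str = "1010101010010100010101101";           // the example bit string from above
str = str.PadRight((str.Length + 7) / 8 * 8, '0');  // pad to a multiple of 8 bits

// Pack 8 characters into one byte each.
byte[] packed = new byte[str.Length / 8];
for (int i = 0; i < packed.Length; i++)
{
    packed[i] = Convert.ToByte(str.Substring(i * 8, 8), 2);
}

using (var writer = new BinaryWriter(File.Open("MyFile.bin", FileMode.Create)))
{
    writer.Write(packed.Length);   // prefix with the byte count so a reader knows how much to expect
    writer.Write(packed);
}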
So I'm curious, what exactly is going on here?
static void SetUInt16(byte[] bytes, int offset, ushort val)
{
    bytes[offset] = (byte)((val & 0xff00) >> 8);
    bytes[offset + 1] = (byte)(val & 0x00ff);
}
Basically, the idea in this code is to write a 16-bit int into a byte buffer at a specific location. The problem is that when I emulate it using:
using (var ms = new MemoryStream())
using (var w = new BinaryWriter(ms))
{
    w.Write((ushort)1);
}
I'm expecting to read 1 but instead I'm getting 256. Is this an endianness issue?
The code writes a 16-bit integer in big-endian order: the upper byte is written first. That is not what BinaryWriter does; it writes in little-endian order.
When you decode the data, are you getting 256 when you expect 1? BinaryWriter.Write uses little-endian encoding; your SetUInt16 method is using big-endian.
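A quick sketch of the mismatch (it reuses the SetUInt16 method from the question; BitConverter reads little-endian on typical x86/x64 machines):

// Big-endian: SetUInt16 puts the high byte first.
var bigEndian = new byte[2];
SetUInt16(bigEndian, 0, 1);                    // {0, 1}

// Little-endian: BinaryWriter puts the low byte first.
byte[] littleEndian;
using (var ms = new MemoryStream())
using (var w = new BinaryWriter(ms))
{
    w.Write((ushort)1);
    w.Flush();
    littleEndian = ms.ToArray();               // {1, 0}
}

// Reading {0, 1} as little-endian gives 0 + 1 * 256 = 256 - the value observed in the question.
Console.WriteLine(BitConverter.ToUInt16(bigEndian, 0));    // 256
Console.WriteLine(BitConverter.ToUInt16(littleEndian, 0)); // 1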