I have a custom protocol to send and receive messages over TCP like the following:
The first 4 bytes are the message type, the following 4 bytes are the length of the message, and the rest of the buffer contains the message itself.
private byte[] CreateMessage(int mtype,string data)
{
byte[] buffer = new byte[4 + 4 + data.Length];
//write mtype, data.Length, and data to buffer
return buffer;
}
I want to write mtype to the first 4 bytes of the buffer, then data.Length to the next 4 bytes, and then the data itself. I am coming from the golang world, where we do it like the following:
buf := make([]byte, 4+4+len(data))
binary.LittleEndian.PutUint32(buf[0:], uint32(mtype))
binary.LittleEndian.PutUint32(buf[4:], uint32(len(data)))
The C# equivalent uses Span<byte> and BinaryPrimitives (from System.Buffers.Binary):
Span<byte> span = buffer;
BinaryPrimitives.WriteUInt32LittleEndian(span, (uint)mtype);
BinaryPrimitives.WriteUInt32LittleEndian(span.Slice(4), (uint)data.Length);
// etc
A span is sort of like an array, and you can create a span from an array, but a span can be sliced without copying. Not all APIs work with spans, but those that do... sweet.
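Putting the pieces together, here is a minimal sketch of a complete CreateMessage built on BinaryPrimitives. It assumes the payload should be UTF-8 encoded (note the question's data.Length counts chars, not bytes, so the sketch measures the encoded byte length instead):

using System;
using System.Buffers.Binary;
using System.Text;

private static byte[] CreateMessage(int mtype, string data)
{
    // Assumption: the payload is UTF-8; swap in whatever encoding your protocol uses.
    byte[] payload = Encoding.UTF8.GetBytes(data);
    byte[] buffer = new byte[4 + 4 + payload.Length];
    Span<byte> span = buffer;
    // First 4 bytes: message type, little-endian, like the Go version.
    BinaryPrimitives.WriteUInt32LittleEndian(span, (uint)mtype);
    // Next 4 bytes: payload length in bytes, little-endian.
    BinaryPrimitives.WriteUInt32LittleEndian(span.Slice(4), (uint)payload.Length);
    // Rest of the buffer: the payload itself.
    payload.AsSpan().CopyTo(span.Slice(8));
    return buffer;
}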
I'm writing an RLE algorithm in C# that can work on any file as input. The approach to encoding I'm taking is as follows:
An RLE packet contains 1 byte for the length and 1 byte for the value. For example, if the byte 0xFF appeared 3 times in a row, 0x03 0xFF would be written to the file.
If representing the data as raw data would be more efficient, I use 0x00 as a terminator. This works because the length of a packet can never be zero. If I wanted to add the bytes 0x53 0x2C 0x01 to my compressed file it would look like this:
0x03 0xFF 0x00 0x53 0x2C 0x01
However a problem arises when trying to switch back to RLE packets. I can't use a byte as a terminator like I did for switching onto raw data because any byte value from 0x00 to 0xFF can be in the input data, and when decoding the bytes the decoder would misinterpret the byte as a terminator and ruin everything.
What can I do to indicate that I have to switch back to RLE packets when it can't be written as data in the file?
Here is my code if it helps:
private static void RunLengthEncode(ref byte[] bytes)
{
// Create a list to store the bytes
List<byte> output = new List<byte>();
byte runLengthByte;
int runLengthCounter = 0;
// Start the run with the first byte in the array (the counter is incremented on the first loop pass)
runLengthByte = bytes[0];
// For each byte in the input array...
for (int i = 0; i < bytes.Length; i++)
{
// Extend the run while the byte repeats, capping the count so it fits in one byte
if (runLengthByte == bytes[i] && runLengthCounter < 255)
{
runLengthCounter++;
}
else
{
// RLE packets under 3 should be written as raw data to avoid increasing the file size
if (runLengthCounter < 3)
{
// Add a 0x00 to indicate raw data
output.Add(0x00);
// Add the bytes that were skipped while counting the run length
for (int j = i - runLengthCounter; j < i; j++)
{
output.Add(bytes[j]);
}
}
else
{
// Add 2 bytes, one for the number of bytes and one for the value
output.Add((byte)runLengthCounter);
output.Add(runLengthByte);
}
runLengthCounter = 1;
runLengthByte = bytes[i];
}
// Add the last bytes to the list when finishing
if (i == bytes.Length - 1)
{
// Add 2 bytes, one for the number of bytes and one for the value
output.Add((byte)runLengthCounter);
output.Add(runLengthByte);
}
}
// Set the bytes to the RLE encoded data
bytes = output.ToArray();
}
Also if you want to comment and say that RLE isn't very efficient for binary data, I know it isn't. This is a project I'm doing to implement many kinds of compression to learn about them, not for an actual product.
Any help would be appreciated! Thanks!
There are many ways to unambiguously encode run-lengths. One simple way, when decoding: if you see two equal bytes in a row, then the next byte is a count of additional repeats of that byte after those first two, i.e. 0..255 additional repeats, so encoding runs of 2..257. (There's no point in encoding runs of 0 or 1.)
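A minimal sketch of that scheme (my own illustrative encoder/decoder with hypothetical names, not the question's code):

using System.Collections.Generic;

// Encode: emit every byte; after two equal bytes in a row, emit one extra
// byte holding the number of additional repeats (0..255), covering runs of 2..257.
static byte[] EncodePairScheme(byte[] input)
{
    var output = new List<byte>();
    int i = 0;
    while (i < input.Length)
    {
        byte value = input[i];
        int run = 1;
        while (i + run < input.Length && input[i + run] == value && run < 257)
            run++;
        output.Add(value);
        if (run >= 2)
        {
            output.Add(value);            // the second copy marks a run
            output.Add((byte)(run - 2));  // additional repeats after the pair
        }
        i += run;
    }
    return output.ToArray();
}

// Decode: a lone byte is literal; two equal bytes mean the next byte
// is the count of additional repeats.
static byte[] DecodePairScheme(byte[] input)
{
    var output = new List<byte>();
    int i = 0;
    while (i < input.Length)
    {
        byte value = input[i++];
        output.Add(value);
        if (i < input.Length && input[i] == value)
        {
            i++;                          // consume the run marker
            output.Add(value);
            int extra = input[i++];       // 0..255 additional repeats
            for (int k = 0; k < extra; k++)
                output.Add(value);
        }
    }
    return output.ToArray();
}

Since every byte is emitted at least once, no value has to be reserved as a terminator, which sidesteps the ambiguity described in the question.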
I was reading this popular stack overflow question Creating a byte array from a stream and wanted to get some clarification on how byte arrays work.
In this chunk of code:
byte[] buffer = new byte[16 * 1024];
using (MemoryStream ms = new MemoryStream())
{
int read;
while ((read = PictureStream.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
Here's what I'm not understanding:
I'm getting lost on the size that this array is set to. For example, I use that code chunk to convert an image stream to a byte array, but I'm usually reading images that are larger than 2 megabytes, which is far larger than the size of the array that's reading in the picture (16*1024 bytes). However, the above code converts the image from a stream to a byte array totally fine, with no "out of bounds index" errors to be had.
So how is my array a smaller size than the photo I'm reading in, yet still manages to read it totally fine?
The array you pass is just a buffer. When you read from the stream it returns the number of bytes read and populates the buffer array with that many elements (it is not always fully filled). Then you write that many bytes to the memory stream. This process is repeated until there are no more bytes to read from the file.
You will notice that the array produced by ToArray is much larger than your buffer size.
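As an aside, this read/write loop is exactly what the built-in Stream.CopyTo does internally (with its own buffer, 81920 bytes by default in current .NET), so the snippet could also be written as:

using (MemoryStream ms = new MemoryStream())
{
    PictureStream.CopyTo(ms); // loops Read/Write internally until the source is drained
    return ms.ToArray();
}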
As already mentioned in the comments, the Read function of the picture stream only reads a chunk of data at a time, at most the amount that fits into the transport buffer. Once we have read such a chunk, we write it from the transport buffer to the output stream. I wrote a code snippet to demonstrate what is going on:
int inputBufferSizeInByte = 1024 * 1000 * 5; // 5,120,000 bytes = 5000 KiB
Byte[] inputBuffer = new Byte[inputBufferSizeInByte];
//we fill our inputBuffer with random numbers
Random rnd = new Random();
rnd.NextBytes(inputBuffer);
//we define our streams
MemoryStream inputMemoryStream = new MemoryStream(inputBuffer);
MemoryStream outPutMemoryStream = new MemoryStream();
//we define a smaller buffer for reading
int transportBufferSizeInByte = 1024 * 16; // 16 KiB
byte[] transportBuffer = new byte[transportBufferSizeInByte];
int amountTotalWeReadInByte = 0;
int tempReadAmountInByte = 0;
int callWriteCounter = 0;
do
{
tempReadAmountInByte = inputMemoryStream.Read(transportBuffer, 0, transportBufferSizeInByte);
// we write what we got to the output
if (tempReadAmountInByte > 0)
{
outPutMemoryStream.Write(transportBuffer, 0, tempReadAmountInByte);
callWriteCounter++;
}
// we keep a running total of the bytes read
amountTotalWeReadInByte += tempReadAmountInByte;
} while (tempReadAmountInByte > 0);
//we sum up
Console.WriteLine("input buffer size: \t" + inputBufferSizeInByte + " \t in Byte");
Console.WriteLine("total amount read \t" + amountTotalWeReadInByte + " \t in Byte");
Console.WriteLine("output stream size: \t" + outPutMemoryStream.Length + " \t in Byte");
Console.WriteLine("called strean write \t" + callWriteCounter + "\t\t times");
output:
input buffer size: 5120000 in Byte
total amount read 5120000 in Byte
output stream size: 5120000 in Byte
called stream write 313 times
So we call the stream's Write function 313 times, and everything behaves as it should.
That brings me to the key question:
Why is there a size difference between the picture in memory and on the hard disk?
I think the picture encoding is the reason.
The difference between the size of a picture on the hard disk and its in-memory representation is usually due to the picture encoding: on disk the image is stored compressed (e.g. as JPEG), while in memory it is decoded into raw pixel data. I know this from working with the C++ library OpenCV, and I would guess the C# implementation behaves similarly.
See this related Q&A: JPEG image memory byte size from OpenCV imread doesn't seem right
Q: Is there any benefit to storing the length of a large array within the array itself?
Explanation:
Let's say we compress some large binary serialized object by using the GZipStream class of the System.IO.Compression namespace.
The output will be a Base64 string of some compressed byte array.
At some later point the Base64 string gets converted back to a byte array and the data needs to be decompressed.
While compressing the data we create a new byte array with the size of the compressed byte array + 4.
In the first 4 bytes we store the length of the original (uncompressed) data, and we then BlockCopy the length and the compressed data to the new array. This new array gets converted into a Base64 string.
While decompressing we convert the Base64 string into a byte array.
Now we can extract the length of the original data by using the BitConverter class, which reads an Int32 from the first 4 bytes.
We then allocate a byte array with the length that we got from the first 4 bytes and let the Stream write the decompressed bytes to the byte array.
I can't imagine that something like this actually has any benefit at all.
It adds more complexity to the code, and more operations need to be executed.
Readability is reduced too.
The BlockCopy operations alone should consume enough resources that this just cannot have a benefit, right?
Compression example code:
byte[] buffer = new byte[0xffff]; // Some large binary serialized object
// Compress in-memory.
using (var mem = new MemoryStream())
{
// The actual compression takes place here.
using (var zipStream = new GZipStream(mem, CompressionMode.Compress, true)) {
zipStream.Write(buffer, 0, buffer.Length);
}
// Store compressed byte data here.
var compressedData = new byte[mem.Length];
mem.Position = 0;
mem.Read(compressedData, 0, compressedData.Length);
/* Increase the size by 4 to accommodate an Int32 that
** will store the length of the original (uncompressed) data. */
var zipBuffer = new byte[compressedData.Length + 4];
// Store the compressedData array after the first 4 bytes, which hold the length.
Buffer.BlockCopy(compressedData, 0, zipBuffer, 4, compressedData.Length);
// Store the length of the original data in the first 4 bytes.
Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, zipBuffer, 0, 4);
return Convert.ToBase64String(zipBuffer);
}
Decompression example code:
byte[] zipBuffer = Convert.FromBase64String("some base64 string");
using (var inStream = new MemoryStream())
{
// The length of the decompressed data, stored in the first 4 bytes.
int dataLength = BitConverter.ToInt32(zipBuffer, 0);
// Allocate array with specific size.
byte[] buffer = new byte[dataLength];
// Write the compressed data (skipping the 4 length bytes) to the memory stream.
inStream.Write(zipBuffer, 4, zipBuffer.Length - 4);
inStream.Position = 0;
// Decompress data.
using (var zipStream = new GZipStream(inStream, CompressionMode.Decompress)) {
zipStream.Read(buffer, 0, buffer.Length); // note: Read may return fewer bytes than requested; a read loop would be more robust
}
... code
... code
... code
}
You tagged the question as C#, which means .NET, so the question is irrelevant:
The framework already stores the length with the array. That is how the array classes do the sanity checks on indexers, and how overflow attacks are prevented in managed code. That help alone is worth any minor inefficiency (note that the JIT is actually able to prune most of the checks; with a loop, for example, it will simply look at the loop variable once per iteration).
You would have to go all the way into unmanaged code and handle naked pointers to have any hope of getting rid of it. But why would you? The difference is so small, it falls under the speed rant. If it matters, you probably have a real-time programming case, and starting one of those with .NET was a bad idea.
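For the question's concrete scenario, the prefix is not required for correctness either: a MemoryStream grows as the decompressed data comes in. A minimal sketch (my own helper name), trading the prefix's single exact-size allocation for a few internal reallocations:

using System.IO;
using System.IO.Compression;

static byte[] DecompressWithoutPrefix(byte[] compressedData)
{
    using (var inStream = new MemoryStream(compressedData))
    using (var zipStream = new GZipStream(inStream, CompressionMode.Decompress))
    using (var outStream = new MemoryStream())
    {
        // CopyTo drains the GZipStream and grows the backing array as needed.
        zipStream.CopyTo(outStream);
        return outStream.ToArray();
    }
}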
I am trying to write an encoded file. The file has 9- to 12-bit symbols. I suspect the 9-bit symbols are not being written correctly, because I am unable to decode the file afterwards; when the file contains only 8-bit symbols, everything works fine. This is the way I am writing the file:
File.AppendAllText(outputFileName, WriteBackContent, ASCIIEncoding.Default);
The same goes for reading, with the ReadAllText function call.
What is the way to go here?
I am using the ZXing library to encode my file using its RS encoder.
ReedSolomonEncoder enc = new ReedSolomonEncoder(GenericGF.AZTEC_DATA_12); // if I use AZTEC_DATA_8 it works fine because the symbol size is 8 bits
int[] bytesAsInts = Array.ConvertAll(toBytes.ToArray(), c => (int)c);
enc.encode(bytesAsInts, parity);
byte[] bytes = bytesAsInts.Select(x => (byte)x).ToArray();
string contentWithParity = (ASCIIEncoding.Default.GetString(bytes.ToArray()));
WriteBackContent += contentWithParity;
File.AppendAllText(outputFileName, WriteBackContent, ASCIIEncoding.Default);
As shown in the code, I am initializing my encoder with AZTEC_DATA_12, which means 12-bit symbols. Because the RS encoder requires an int array, I convert the data to an int array and write it to the file as shown above. It works well with AZTEC_DATA_8 because of the 8-bit symbol size, but not with AZTEC_DATA_12.
The main problem is here:
byte[] bytes = bytesAsInts.Select(x => (byte)x).ToArray();
You are basically throwing away part of the result when converting the single integers to single bytes.
If you look at the array after the call to encode(), you can see that some of the array elements have a value higher than 255, so they cannot be represented as bytes. However, in your code quoted above, you cast every single element in the integer array to byte, changing the element when it has a value greater than 255.
So to store the result of encode(), you have to convert the integer array to a byte array in a way that the values are not lost or modified.
In order to make this kind of conversion between byte arrays and integer arrays, you can use the function Buffer.BlockCopy(). An example on how to use this function is in this answer.
Use the samples from the answer and the one from the comment to the answer for both conversions: Turning a byte array to an integer array to pass to the encode() function and to turn the integer array returned from the encode() function back into a byte array.
Here are the sample codes from the linked answer:
// Convert integer array to byte array
byte[] result = new byte[intArray.Length * sizeof(int)];
Buffer.BlockCopy(intArray, 0, result, 0, result.Length);
// Convert byte array to integer array (with bugs fixed)
int bytesCount = byteArray.Length;
int intsCount = bytesCount / sizeof(int);
if (bytesCount % sizeof(int) != 0) intsCount++;
int[] result = new int[intsCount];
Buffer.BlockCopy(byteArray, 0, result, 0, byteArray.Length);
Now about storing the data into files: Do not turn the data into a string directly via Encoding.GetString(). Not all bit sequences are valid representations of characters in any given character set, so converting an arbitrary sequence of bytes into a string will sometimes fail or silently alter the data.
Instead, either store/read the byte array directly into a file via File.WriteAllBytes() / File.ReadAllBytes(), or use Convert.ToBase64String() and Convert.FromBase64String() to work with a base64-encoded string representation of the byte array.
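For the base64 route, the round trip is just a couple of calls (a small sketch, reusing outputFileName from the question):

// Write: bytes -> base64 text
File.WriteAllText(outputFileName, Convert.ToBase64String(bytes));
// Read it back later: base64 text -> bytes
byte[] restored = Convert.FromBase64String(File.ReadAllText(outputFileName));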
Combined here is some sample code:
ReedSolomonEncoder enc = new ReedSolomonEncoder(GenericGF.AZTEC_DATA_12); // if I use AZTEC_DATA_8 it works fine because the symbol size is 8 bits
int[] bytesAsInts = Array.ConvertAll(toBytes.ToArray(), c => (int)c);
enc.encode(bytesAsInts, parity);
// Turn the int array into a byte array without losing values
byte[] bytes = new byte[bytesAsInts.Length * sizeof(int)];
Buffer.BlockCopy(bytesAsInts, 0, bytes, 0, bytes.Length);
// Write to file
File.WriteAllBytes(outputFileName, bytes);
// Read from file
bytes = File.ReadAllBytes(outputFileName);
// Turn byte array to int array
int bytesCount = bytes.Length;
int intsCount = bytesCount / sizeof(int);
if (bytesCount % sizeof(int) != 0) intsCount++;
int[] dataAsInts = new int[intsCount];
Buffer.BlockCopy(bytes, 0, dataAsInts, 0, bytes.Length);
// Decoding
ReedSolomonDecoder dec = new ReedSolomonDecoder(GenericGF.AZTEC_DATA_12);
dec.decode(dataAsInts, parity);
I have designed a 2-pass assembler for my project. The output is in hexadecimal form, i.e. 15 is 0F.
I am working with a COM port, and to send "0F" over the line it is being sent as a string.
But the problem is that I can only receive 1 byte on the other end, and sizeOf("0F") > 1 byte.
There is no way of decompressing data on the other end; I need to do all the work on my end and still receive "0F" on the other end.
Can I do this? If yes, then how?
I did this to get the hexadecimal string :
string.Format("{0:X2}", 15);
In addition,
using System.IO.Ports;
private SerialPort comPort = new SerialPort();
comPort.Write("0F");
On the receiving end I have an 8-bit processor which has 256 blocks of 1 byte each, i.e. 256 bytes. "0F", when received, arrives as 2 bytes and cannot be stored in a single 1-byte block. So I want "0F" to be sent as 1 byte.
Looks like you need something like this:
// create buffer
byte[] buffer = new byte[256];
// put values you need to send to buffer
buffer[0] = 0x0f;
// ... add more bytes if you need ...
// send them
var comPort = new SerialPort();
comPort.Write(buffer, 0, 1); // 0 is buffer offset, 1 is number of bytes to write
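If the assembler output is already a hex string, one way to get single bytes on the wire is to parse each pair of hex digits back into a byte before writing. A minimal sketch (hypothetical helper; assumes an even-length string of hex digits like "0F1A"):

using System;
using System.IO.Ports;

static void SendHex(SerialPort port, string hex)
{
    byte[] buffer = new byte[hex.Length / 2];
    for (int i = 0; i < buffer.Length; i++)
    {
        // Convert.ToByte with base 16 parses "0F" into 0x0F
        buffer[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
    }
    port.Write(buffer, 0, buffer.Length);
}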