Receive raw binary data stream in controller and combine every 2 bytes - C#

I plan to send raw binary data from a GSM module to a Web API controller. I'm simulating sending the data with Fiddler. The data format is 8 bytes, e.g.
0x18 0x01 0x1B 0x02 0x12 0x10 0x2D 0x0A
I receive the data at the controller in a 16-byte array, and the data looks correct:
byte 0 = 49 (Ascii char 1) (binary 0011 0001)
byte 1 = 56 (Ascii char 8) (binary 0011 1000)
I need to combine both these bytes to create a single byte of 0x18 (binary 0001 1000).
Looking at the binary values, it looks like I need to shift byte 0 left 4 places, then use the AND operator with byte 1?
I'm a bit stuck, if anyone could please help.
Thank you

Using bit operators:
byte a = 49; // ASCII '1' (0x31)
byte b = 56; // ASCII '8' (0x38)
a <<= 4;     // the low nibble 0x1 moves into the high half; the 0x3 prefix is shifted out of the byte
b <<= 4;     // shift left then right to clear the high nibble...
b >>= 4;     // ...leaving only the low nibble 0x8
byte result = (byte)(b + a); // 0x10 + 0x08 = 0x18
Console.WriteLine("{0}", result); // prints 24, i.e. 0x18
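Since the sample payload also contains hex letters ('B', 'D', 'A' in 0x1B, 0x2D, 0x0A), here is a more general sketch that converts any pair of ASCII hex characters into one raw byte. The helper names are purely illustrative, not part of the original answer:

// Hypothetical helpers for illustration only.
static byte HexNibble(byte ascii) =>
    (byte)(ascii <= (byte)'9' ? ascii - (byte)'0'           // '0'..'9' -> 0..9
                              : (ascii & ~0x20) - 'A' + 10); // 'A'..'F' or 'a'..'f' -> 10..15

static byte[] CombineAsciiHexPairs(byte[] ascii)
{
    byte[] raw = new byte[ascii.Length / 2];
    for (int i = 0; i < raw.Length; i++)
        raw[i] = (byte)((HexNibble(ascii[2 * i]) << 4) | HexNibble(ascii[2 * i + 1]));
    return raw;
}

For example, the 16 ASCII bytes "18011B0212102D0A" would become the 8 raw bytes 0x18 ... 0x0A.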

Related

Indicating the end of a raw data chunk in an RLE algorithm that can contain all byte values

I'm writing an RLE algorithm in C# that can work on any file as input. The approach to encoding I'm taking is as follows:
An RLE packet contains 1 byte for the length and 1 byte for the value. For example, if the byte 0xFF appeared 3 times in a row, 0x03 0xFF would be written to the file.
If representing the data as raw data would be more efficient, I use 0x00 as a terminator. This works because the length of a packet can never be zero. If I wanted to add the bytes 0x53 0x2C 0x01 to my compressed file it would look like this:
0x03 0xFF 0x00 0x53 0x2C 0x01
However, a problem arises when trying to switch back to RLE packets. I can't use a byte as a terminator like I did for switching to raw data, because any byte value from 0x00 to 0xFF can appear in the input data, and when decoding, the decoder would misinterpret that byte as a terminator and ruin everything.
What can I do to indicate that I have to switch back to RLE packets when it can't be written as data in the file?
Here is my code if it helps:
private static void RunLengthEncode(ref byte[] bytes)
{
    // Create a list to store the bytes
    List<byte> output = new List<byte>();
    byte runLengthByte;
    int runLengthCounter = 0;
    // Set the RLE byte to the first byte in the array and increment the RLE counter
    runLengthByte = bytes[0];
    // For each byte in the input array...
    for (int i = 0; i < bytes.Length; i++)
    {
        if (runLengthByte == bytes[i] || runLengthCounter == 255)
        {
            runLengthCounter++;
        }
        else
        {
            // RLE packets under 3 should be written as raw data to avoid increasing the file size
            if (runLengthCounter < 3)
            {
                // Add a 0x00 to indicate raw data
                output.Add(0x00);
                // Add the bytes that were skipped while counting the run length
                for (int j = i - runLengthCounter; j < i; j++)
                {
                    output.Add(bytes[j]);
                }
            }
            else
            {
                // Add 2 bytes, one for the number of bytes and one for the value
                output.Add((byte)runLengthCounter);
                output.Add(runLengthByte);
            }
            runLengthCounter = 1;
            runLengthByte = bytes[i];
        }
        // Add the last bytes to the list when finishing
        if (i == bytes.Length - 1)
        {
            // Add 2 bytes, one for the number of bytes and one for the value
            output.Add((byte)runLengthCounter);
            output.Add(runLengthByte);
        }
    }
    // Set the bytes to the RLE encoded data
    bytes = output.ToArray();
}
Also if you want to comment and say that RLE isn't very efficient for binary data, I know it isn't. This is a project I'm doing to implement many kinds of compression to learn about them, not for an actual product.
Any help would be appreciated! Thanks!
There are many ways to unambiguously encode run lengths. One simple way, when decoding: if you see two equal bytes in a row, then the next byte is a count of additional repeats of that byte after those first two, i.e. 0..255 additional repeats, encoding runs of 2..257. (There's no point in encoding runs of 0 or 1.)
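A minimal decoder sketch for that scheme (assuming well-formed input; names are illustrative):

static List<byte> RunLengthDecode(byte[] data)
{
    var output = new List<byte>();
    int i = 0;
    while (i < data.Length)
    {
        byte value = data[i++];
        output.Add(value);
        // Two equal bytes in a row mean a repeat count follows.
        if (i < data.Length && data[i] == value)
        {
            i++;                    // consume the second copy of the byte
            output.Add(value);
            int extra = data[i++];  // 0..255 additional repeats beyond the first two
            for (int j = 0; j < extra; j++)
                output.Add(value);
        }
    }
    return output;
}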

C# - how to put 4 bits and 12 bits into 2 bytes

I have 2 bytes that should be populated this way:
first number (4b) second number (12b)
So the 4-bit number can be between 1-15,
and the 12-bit number can be between 1-50.
So I have this byte array:
byte[] packetArrayBytes = new byte[Length];
The way I've understood the question is that you've got these two (presumably) unsigned integers a and b:
(I'll be writing them in hexadecimal to make it easier to read)
a: 0x0000000X
b: 0x00000XXX
Where a is the 4-bit number and b is the 12-bit one, with the Xs marking the bits containing the relevant values.
You want to store them in two separate 8-bit chunks: c: 0x00 and d: 0x00
So you need to shift the bits into position, like this:
byte[] packetArrayBytes = new byte[2];
uint intA = 0xF; // Allowed range is 0-15 (0x0-0xF)
uint intB = 0xABC; // Allowed range is 0-4095 (0x0-0xFFF)
// Need to convert from uint to bytes:
byte[] bytesA = BitConverter.GetBytes(intA);
byte[] bytesB = BitConverter.GetBytes(intB);
byte a = bytesA[0]; // a is 0x0F
byte b = bytesB[1]; // b is 0x0A
int c = 0x00; // c is 0x00
int d = bytesB[0]; // d is 0xBC
// Mask out 4 least significant bits of a,
// then shift 4 bits left to place them in the most significant bits (of the byte),
// then OR them into c.
c |= (a & 0x0F) << 4; // c is now 0xF0
// Mask out 4 least significant bits of b,
// then OR them into c.
c |= b & 0x0F; // c is now 0xFA
packetArrayBytes[0] = (Byte)c;
packetArrayBytes[1] = (Byte)d;
Console.WriteLine(BitConverter.ToString(packetArrayBytes)); // Prints "FA-BC"
After doing these operations, the values of a and b should be placed in the bytes c and d like this:
c: 0xFA d: 0xBC. Which you can then place into your array.
To get the values back you just do these same operations in reverse.
If a and b are signed values, I believe the same operations work, but you'll have to make sure you're not interpreting them as unsigned when reading the data back into numbers.
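For completeness, a sketch of the reverse direction, reading a and b back out of the two packed bytes produced above (continuing the same example values):

byte[] packetArrayBytes = { 0xFA, 0xBC };
uint a = (uint)(packetArrayBytes[0] >> 4);                                   // high nibble -> 0xF
uint b = (uint)(((packetArrayBytes[0] & 0x0F) << 8) | packetArrayBytes[1]);  // low nibble + second byte -> 0xABC
Console.WriteLine("{0:X} {1:X}", a, b); // Prints "F ABC"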

Reading Two Int4 from a Byte into two Separate Bytes and Vice Versa

Okay so this may sound ridiculous, but as a personal project, I am trying to re-create a TCP networking protocol in C#.
Every TCP packet received has a header that must start with two Int4 (0 - 15) forming a single Byte. I think using bitwise operators I have extracted the two Int4 from the byte:
Byte firstInt4 = headerByte << 4;
Byte secondInt4 = headerByte >> 4;
The issue is that I now need to be able to write two Int4 to a single Byte, but I have no idea how to do this.
Yes, bitwise operations will do:
Split:
byte header = ...
byte firstInt4 = (byte) (header & 0xF); // 4 low bits
byte secondInt4 = (byte) (header >> 4); // 4 high bits
Combine:
byte header = (byte) ((secondInt4 << 4) | firstInt4);
An int4 is called a "nibble": half a byte is a nibble. :)
Something like:
combinedByte = hiNibble;
combinedByte <<= 4; // Make space for second nibble.
combinedByte += loNibble;
should do what you want.
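A quick round-trip check of the split/combine shown above, using an arbitrary example header value:

byte header = 0xAB;                        // arbitrary example
byte firstInt4 = (byte)(header & 0xF);     // low nibble: 0x0B
byte secondInt4 = (byte)(header >> 4);     // high nibble: 0x0A
byte rebuilt = (byte)((secondInt4 << 4) | firstInt4);
Console.WriteLine(rebuilt == header);      // True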

Fastest way to XOR two specific bit indexes in two bytes

This is in C#. I was hoping I could do something like the following.
byte byte1 = 100;
byte byte2 = 100;
byte1[1] = byte1[1] ^ byte2[6]; // XOR bit at index 1 against bit at index 6
However, I am currently stuck at:
if ((byte2 ^ (byte)Math.Pow(2, index2)) < byte2)
byte1 = (byte)(byte1 ^ (byte)Math.Pow(2, index1));
Is there a faster way, possibly something similar to what I typed at the top?
Edit:
I had never heard of any of the bitwise operators other than XOR. That's why the original solution had the bizarre Math.Pow() calls. I've already improved my solution considerably according to my benchmarking of millions of loop iterations. I'm sure I'll get it faster with more reading. Thanks to everybody that responded.
byte2 = (byte)(byte2 << (7 - index2));
if (byte2 > 127)
{
byte buffer = (byte)(1 << index1);
byte1 = (byte)(byte1 ^ buffer);
}
Bytes are immutable; you can't change a bit of a byte as if it were an array. You'd need to access the bits through masks (&) and shifts (<< >>), then create a new byte containing the result.
// result bit is the LSB of r
byte r = (byte)((byte1 >> 1 & 1) ^ (byte2 >> 6 & 1));
The specific mask 1 will erase any bit except the rightmost (the LSB).
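If the result then needs to be written back into bit index 1 of byte1 (as in the original attempt), a sketch along the same lines:

byte byte1 = 100;
byte byte2 = 100;
// XOR bit 1 of byte1 with bit 6 of byte2...
int bit = ((byte1 >> 1) & 1) ^ ((byte2 >> 6) & 1);
// ...then clear bit 1 of byte1 and store the result there.
byte1 = (byte)((byte1 & ~(1 << 1)) | (bit << 1));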

Better algorithm for converting SHA1 to ASCII that uses fewer than 40 characters?

All conversions of the 160-bit SHA1 hash that I have been able to find use 40 ASCII characters (320 bits) to represent 160 bits of data. I have a need to optimize this and use as few ASCII characters as possible to represent a SHA1 hash.
For instance, the string "The quick brown fox jumps over the lazy dog" hashes to "2FD4E1C67A2D28FCED849EE1BB76E7391B93EB12" when converted by typical algorithms.
I have created an algorithm that uses 5 bits for each ASCII character, so I go from needing 40 ASCII characters to 32: "F0K1032QD08C1M44U11B0R77P3R31L2I".
Does anybody have a better way to get fewer characters without losing information (i.e. not something like a lossy compression technique or a smaller hash like MD5)?
I may need to represent this hash as a folder name on Windows, so using upper and lower case to get 6 bits per character can't be done (Windows paths are case-insensitive).
class Program
{
    static byte[] GetBytesForTypical(byte[] hash)
    {
        List<byte> newHash = new List<byte>();
        foreach (byte b in hash)
        {
            int first4Bits = (b & 0xF0) >> 4;
            int last4bits = b & 0x0F;
            newHash.Add((byte)first4Bits);
            newHash.Add((byte)last4bits);
        }
        return newHash.ToArray();
    }
    public static string ConvertHashToFileSystemFriendlyStringTypical(byte[] str)
    {
        StringBuilder strToConvert = new StringBuilder();
        foreach (byte b in str)
        {
            strToConvert.Append(b.ToString("X"));
        }
        return strToConvert.ToString();
    }
    static byte[] GetBytesForCompressedAttempt(byte[] hash)
    {
        byte[] newHash = new byte[32];
        // Walk the hash 5 bits at a time;
        // at 8 bits per byte that is 40 bits per loop, 4 times
        int byteCounter = 0;
        int k = 0;
        for (int i = 0; i < 4; ++i)
        {
            // Get 5 bits worth
            newHash[k] = (byte)(hash[byteCounter] & 0x1F);
            hash[byteCounter] >>= 5;
            ++k;
            // Get 3 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x7);
            newHash[k] <<= 2;
            ++byteCounter;
            // get 2 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x3);
            ++k;
            // get 5 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x1F);
            hash[byteCounter] >>= 5;
            ++k;
            // get 1 bit
            newHash[k] = (byte)(hash[byteCounter] & 0x1);
            newHash[k] <<= 7;
            ++byteCounter;
            // get 4 bits
            newHash[k] = (byte)(hash[byteCounter] & 0xF);
            ++k;
            hash[byteCounter] >>= 4;
            // get 4 bits
            newHash[k] = (byte)(hash[byteCounter] & 0xF);
            ++byteCounter;
            // get 1 bit
            newHash[k] = (byte)(hash[byteCounter] & 0x1);
            hash[byteCounter] >>= 1;
            ++k;
            // get 5 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x1F);
            ++k;
            hash[byteCounter] >>= 5;
            // get 2 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x3);
            ++byteCounter;
            // get 3 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x7);
            ++k;
            // get 5 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x1F);
            ++byteCounter;
            ++k;
        }
        return newHash;
    }
    public static string ConvertHashToFileSystemFriendlyStringCompressedl(byte[] str)
    {
        StringBuilder strToConvert = new StringBuilder();
        foreach (byte b in str)
        {
            System.Diagnostics.Debug.Assert(b < 32);
            if (b >= 10 && b < 32)
            {
                strToConvert.Append((char)(b - 10 + 'A'));
            }
            else
            {
                strToConvert.Append((char)(b + '0'));
            }
        }
        return strToConvert.ToString();
    }
    static void Main(string[] args)
    {
        System.Security.Cryptography.SHA1 hasher = System.Security.Cryptography.SHA1.Create();
        byte[] data = hasher.ComputeHash(Encoding.Default.GetBytes("The quick brown fox jumps over the lazy dog"));
        byte[] stringBytesTypical = GetBytesForTypical(data);
        string typicalFriendlyHashString = ConvertHashToFileSystemFriendlyStringTypical(stringBytesTypical);
        //2FD4E1C67A2D28FCED849EE1BB76E7391B93EB12 == typicalFriendlyHashString
        byte[] stringBytesCompressedAttempt = GetBytesForCompressedAttempt(data);
        string compressedFriendlyHashString = ConvertHashToFileSystemFriendlyStringCompressedl(stringBytesCompressedAttempt);
        //F0K1032QD08C1M44U11B0R77P3R31L2I == compressedFriendlyHashString
    }
}
EDIT:
The need to reduce to fewer than 40 characters has nothing to do with Windows folder names (although it could, since Windows paths have a length limit). I need to conserve as much space as possible for human-readable strings, and then create a folder for anything that needs to be reviewed. The problem with the 40-character ASCII string is that half of the bits are set to 0 and are in essence wasted. So when storing millions and millions of hashes, space and lookup speed start to become intertwined. I can't redesign the user workflow, but I can make the system more snappy and consume less memory.
EDIT:
Also, this would improve the user experience. Currently a user has to use a partial hash to look something up. Worst case (in practice), the first 8 characters of the hash need to be used to ensure there are no duplicates. These 8 characters represent 32 bits of real hash data. Going down to 5 bits per character, users will only need 6 characters to ensure no dups. If I can get to 6 bits per character, then users should only need around 5 characters. This gets into the realm of what most people are able to memorize.
EDIT: I've made some progress from the original code I posted above. Once I converted the hash into hexatridecimal (base 36), I was able to remove one of the characters from the original 5-bit implementation above, so I am currently at 31 characters. This means that where the typical implementation requires 8 characters for retrieval (in practice), users should be able to use 6 characters to retrieve the same data.
public static string ConvertHashToFileSystemFriendlyStringCompressed2(byte[] hashData)
{
    string mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    BigInteger base10 = new BigInteger(hashData);
    string base36;
    var result = new Stack<char>();
    do
    {
        result.Push(mapping[(int)(base10 % 36)]);
        base10 /= 36;
    } while (base10 != 0);
    base36 = new string(result.ToArray());
    return base36;
}
EDIT: I've been doing more research, and I have a graph that I wanted to post showing the diminishing returns you get as you increase the number of ASCII characters you have to choose from. You wind up needing more and more characters for smaller and smaller gains. I seem to be at the tail end of where you get the biggest bang for your buck (at 36 characters). So even if I were able to jump to 64 characters (which I can't at the present time), I would only remove 4 characters from the final string. However, if I slim the original hash down to 18 bytes, those same 36 characters now only create a 27-character string (the same length as converting to base 64). Now the problem is how I can reliably compress a 20-byte hash into 18 bytes. Truncation won't work, since users would still have to memorize 6 characters if I use truncation. Since a SHA1 hash is random bytes, I am not sure I can losslessly compress 2 bytes away (a 10% space savings).
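For reference, the character counts behind that trade-off for the full 160-bit hash are just the smallest n with (alphabet size)^n >= 2^160. A quick sanity check, illustrative only (exact integer arithmetic to avoid floating-point rounding):

using System;
using System.Numerics;

class AlphabetSizeCheck
{
    static void Main()
    {
        BigInteger hashSpace = BigInteger.Pow(2, 160); // number of distinct SHA1 values
        foreach (int symbols in new[] { 16, 32, 36, 64 })
        {
            // Smallest n with symbols^n >= 2^160, i.e. ceil(160 / log2(symbols)).
            int n = 0;
            for (BigInteger reach = 1; reach < hashSpace; reach *= symbols)
                n++;
            Console.WriteLine("{0} symbols -> {1} characters", symbols, n);
        }
        // Prints: 16 -> 40, 32 -> 32, 36 -> 31, 64 -> 27
    }
}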
EDIT: So my attempts to compress the hash bytes have not met with success. I expected this but had to try in order to prove this to myself. Basically what I did was attempt to use a Huffman Code to compress the original hash.
Since each value in the hash is equally likely (the definition of a good hash), using a common Huffman tree for all compression is out of the question (since that would yield the same number of bits I am trying to compress, for no net gain). However, once you create a Huffman tree for a specific hash you do get compression of the original hash (20 bytes to 16 bytes, for example), only to have the saved 4 bytes subsequently lost because you have to store the Huffman tree as well. This approach may work for longer hash values (512 bits, etc.) but does not appear to work well enough for all SHA1 hash values to warrant implementation (only a very small subset of SHA1 hash outputs will benefit from this type of compression).
