I've spent quite a bit of time trying to confirm the type of CRC-8 algorithm used in ASCII data communications between two devices. I have confirmed that the CRC is calculated on the 0x02 Start of text byte + the next byte of data. An Interface Design Document that I have describing one device specifies the use of a 0xEA polynomial with an initial value of 0xFF. An example of one captured message is below:
Input Bytes: 0x02 0x41
CRC Result: b10011011 or 0x9B
Going into the problem, I had little to no knowledge of the inner workings of a typical CRC algorithm. Initially, I tried hand calculation against the input bytes to confirm my understanding of the algorithm before attempting a code solution. This involved XORing the 1st input byte with my 0xFF initial value, then moving on to the second input byte to continue the XOR operations.
Having tried multiple times to confirm the CRC through typical XOR operations while shifting the MSB left out of the register during each step, I could never get the result I wanted. Today, I realized that the 0xEA polynomial is also considered to be a reversed reciprocal of the 0xD5 poly with an implied 1+x^8 that is commonly used in CRC-8 algorithms. How does this fact change how I would go about manually calculating the CRC? I've read that in some instances a reversed polynomial leads to the algorithm right-shifting bits instead of left-shifting?
The polynomial is x^8+x^7+x^5+x^3+x^2+x+1 => 0x1AF, bit-reversed to x^8+x^7+x^6+x^5+x^3+x+1 => 0x1EB. Below is example code where the conditional XOR is done after the shift, so the XOR value is 0x1EB>>1 = 0xF5. A 256-byte lookup table could be used to replace the inner loop; a sketch of that variant follows the code.
using System;

namespace crc8r
{
    class Program
    {
        private static byte crc8r(byte[] bfr, int bfrlen)
        {
            byte crc = 0xff;
            for (int j = 0; j < bfrlen; j++)
            {
                crc ^= bfr[j];
                for (int i = 0; i < 8; i++)
                    // assumes twos complement math
                    crc = (byte)((crc >> 1) ^ ((0 - (crc & 1)) & 0xf5));
            }
            return crc;
        }

        static void Main(string[] args)
        {
            byte[] data = new byte[3] { 0x02, 0x41, 0x00 };
            byte crc;
            crc = crc8r(data, 2);             // crc == 0x9b
            Console.WriteLine("{0:X2}", crc);
            data[2] = crc;                    // append the CRC; a full message then verifies to 0x00
            crc = crc8r(data, 3);             // crc == 0x00
            Console.WriteLine("{0:X2}", crc);
            return;
        }
    }
}
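For reference, here is a minimal sketch of the table-driven variant mentioned above; the names (Crc8rTable, BuildCrc8rTable, crc8rUsingTable) are mine, not from any spec. Each table entry precomputes the eight shift/XOR steps for one byte value, so the message loop does a single lookup per byte.

private static readonly byte[] Crc8rTable = BuildCrc8rTable();

private static byte[] BuildCrc8rTable()
{
    var table = new byte[256];
    for (int n = 0; n < 256; n++)
    {
        byte crc = (byte)n;
        for (int i = 0; i < 8; i++)
            crc = (byte)((crc >> 1) ^ ((0 - (crc & 1)) & 0xf5));
        table[n] = crc;
    }
    return table;
}

private static byte crc8rUsingTable(byte[] bfr, int bfrlen)
{
    byte crc = 0xff;
    for (int j = 0; j < bfrlen; j++)
        crc = Crc8rTable[crc ^ bfr[j]];  // one lookup replaces the 8-step inner loop
    return crc;
}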
Regarding "EA": if the polynomial is XOR'ed before the shift, 0x1EB (or 0x1EA, since bit 0 will be shifted off and doesn't matter) is used. XOR'ing before the shift requires 9 bits of working register, or a post-shift OR or XOR of 0x80, while XOR'ing after the shift only requires 8 bits.
Example line of code using 0x1eb before the shift:
crc = (byte)((crc^((0-(crc&1))&0x1eb))>>1);
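To tie this back to the hand calculation (my own worked trace, using the right-shift form above): XOR the byte into the CRC, then for each of the 8 steps shift right, XORing with 0xF5 whenever the bit shifted out was a 1. For the captured message, 0xFF ^ 0x02 = 0xFD, and the eight steps give 0xFD -> 0x8B -> 0xB0 -> 0x58 -> 0x2C -> 0x16 -> 0x0B -> 0xF0 -> 0x78. Then 0x78 ^ 0x41 = 0x39, and eight more steps give 0x39 -> 0xE9 -> 0x81 -> 0xB5 -> 0xAF -> 0xA2 -> 0x51 -> 0xDD -> 0x9B, matching the captured CRC.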
This is in C#. I was hoping I could do something like the following.
byte byte1 = 100;
byte byte2 = 100;
byte1[1] = byte1[1] ^ byte2[6]; // XOR bit at index 1 against bit at index 6
However, I am currently stuck at:
if ((byte2 ^ (byte)Math.Pow(2, index2)) < byte2)
byte1 = (byte)(byte1 ^ (byte)Math.Pow(2, index1));
Is there a faster way, possibly something similar to what I typed at the top?
Edit:
I had never heard of any of the bitwise operators other than XOR. That's why the original solution had the bizarre Math.Pow() calls. I've already improved my solution considerably according to my benchmarking of millions of loop iterations. I'm sure I'll get it faster with more reading. Thanks to everybody that responded.
byte2 = (byte)(byte2 << (7 - index2)); // move the bit at index2 into the MSB
if (byte2 > 127)                       // true only if that bit was set
{
    byte buffer = (byte)(1 << index1); // mask for the target bit in byte1
    byte1 = (byte)(byte1 ^ buffer);    // flip it
}
Bytes are immutable; you can't change a single bit of a byte as if it were an array. You need to access the bits through masks (&) and shifts (<< >>), then create a new byte containing the result.
// result bit is the LSB of r
byte r = (byte)((byte1 >> 1 & 1) ^ (byte2 >> 6 & 1));
The mask 1 erases every bit except the rightmost one (the LSB).
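If you want the one-liner feel of the pseudocode at the top, a small helper (my own sketch, not a framework method) can wrap the mask-and-shift work:

// Hypothetical helper: XOR the bit of b2 at index i2 into the bit of b1 at
// index i1. Bytes are value types, so the updated byte is returned rather
// than mutated in place.
static byte XorBit(byte b1, int i1, byte b2, int i2)
{
    int bit = (b2 >> i2) & 1;        // isolate the source bit
    return (byte)(b1 ^ (bit << i1)); // flip the target bit iff the source bit is 1
}

// usage, matching the original example:
// byte1 = XorBit(byte1, 1, byte2, 6);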
I have a single byte which contains two values. Here's the documentation:
The authority byte is split into two fields. The three least significant bits carry the user’s authority level (0-5). The five most significant bits carry an override reject threshold. If these bits are set to zero, the system reject threshold is used to determine whether a score for this user is considered an accept or reject. If they are not zero, then the value of these bits multiplied by ten will be the threshold score for this user.

Authority Byte:
7 6 5 4 3 ......... 2 1 0
Reject Threshold .. Authority
I don't have any experience of working with bits in C#.
Can someone please help me convert a Byte and get the values as mentioned above?
I've tried the following code:
BitArray BA = new BitArray(mybyte);
But the length comes back as 29 and I would have expected 8, being each bit in the byte.
-- Thanks for everyone's quick help. Got it working now! Awesome internet.
Instead of BitArray, you can more easily use the built-in bitwise AND and right-shift operator as follows:
byte authorityByte = ...
int authorityLevel = authorityByte & 7;
int rejectThreshold = authorityByte >> 3;
To get the single byte back, you can use the bitwise OR and left-shift operator:
int authorityLevel = ...
int rejectThreshold = ...
Debug.Assert(authorityLevel >= 0 && authorityLevel <= 7);
Debug.Assert(rejectThreshold >= 0 && rejectThreshold <= 31);
byte authorityByte = (byte)((rejectThreshold << 3) | authorityLevel);
Your use of the BitArray is incorrect. This:
BitArray BA = new BitArray(mybyte);
..will be implicitly converted to an int. When that happens, you're triggering this constructor:
BitArray(int length);
..therefore, it's creating a BitArray whose Length is the value of mybyte (hence the 29 you saw), not one containing its bits.
Looking at MSDN (http://msdn.microsoft.com/en-us/library/x1xda43a.aspx), you want this:
BitArray BA = new BitArray(new byte[] { myByte });
Length will then be 8 (as expected).
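If you do stick with BitArray, here is a quick sketch of pulling the two fields out of it (my own example, relying on the fact that index 0 is the least significant bit of the byte):

// using System.Collections;
BitArray ba = new BitArray(new byte[] { myByte });
// bits 0-2: authority level
int authorityLevel = (ba[0] ? 1 : 0) | (ba[1] ? 2 : 0) | (ba[2] ? 4 : 0);
// bits 3-7: reject threshold
int rejectThreshold = (ba[3] ? 1 : 0) | (ba[4] ? 2 : 0) | (ba[5] ? 4 : 0)
                    | (ba[6] ? 8 : 0) | (ba[7] ? 16 : 0);

The plain shift-and-mask version in the other answers is simpler; this is only worth it if you already need a BitArray for something else.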
To get the value of the five most significant bits in a byte as an integer, shift the byte to the right by 3 (i.e. by 8-5), and set the three upper bits to zero using a bitwise AND operation, like this:
byte orig = ...
int rejThreshold = (orig >> 3) & 0x1F;
>> is the "shift right" operator. It moves bits 7..3 into positions 4..0, dropping the three lower bits.
0x1F is the binary number 00011111, which has the upper three bits set to zero, and the lower five bits set to one. AND-ing with this number zeroes out three upper bits.
This technique can be generalized to get other bit patterns and other integral data types. You shift the bits that you want into the least-significant position, and apply a mask that "cuts out" the number of bits that you want. In some cases, shifting would not be necessary (e.g. when you get the least significant group of bits). In other cases, such as above, the masking would not be necessary, because you get the most significant group of bits in an unsigned type (if the type is signed, ANDing would be required).
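As a sketch of that generalization (my own helper, not a library call): shift the field down, then mask off its width.

// Extract `count` bits (0 < count < 32) starting at bit `start`
// (counting from the LSB) of an unsigned value.
static uint GetBits(uint value, int start, int count)
{
    return (value >> start) & ((1u << count) - 1);
}

// e.g. the reject threshold above: GetBits(orig, 3, 5)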
You're using the wrong constructor (probably).
The one that you're hitting is BitArray(int length), while you need the one taking a byte array, BitArray(byte[] bytes):
var bitArray = new BitArray(new [] { myByte } );
All conversions of the 160-bit SHA1 that I have been able to find use 40 ASCII characters (320 bits) to represent 160 bits of data. I need to optimize this and use as few ASCII characters as possible to represent a SHA1 hash.
For instance, the SHA1 of the string "The quick brown fox jumps over the lazy dog" comes out as "2FD4E1C67A2D28FCED849EE1BB76E7391B93EB12" when converted by the typical hex algorithms.
I have created an algorithm that uses 5 bits per ASCII character, so I go from needing 40 ASCII characters down to 32: "F0K1032QD08C1M44U11B0R77P3R31L2I".
Does anybody have a better way to get fewer characters without losing information (so not a lossy compression technique or a smaller hash like MD5)?
I potentially need to represent this hash as a folder name on Windows, and since those are case-insensitive, using upper and lower case to get 6 bits per character can't be done.
using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static byte[] GetBytesForTypical(byte[] hash)
    {
        // split each byte into its high and low nibble (two 4-bit values)
        List<byte> newHash = new List<byte>();
        foreach (byte b in hash)
        {
            int first4Bits = (b & 0xF0) >> 4;
            int last4bits = b & 0x0F;
            newHash.Add((byte)first4Bits);
            newHash.Add((byte)last4bits);
        }
        return newHash.ToArray();
    }

    public static string ConvertHashToFileSystemFriendlyStringTypical(byte[] str)
    {
        StringBuilder strToConvert = new StringBuilder();
        foreach (byte b in str)
        {
            strToConvert.Append(b.ToString("X"));
        }
        return strToConvert.ToString();
    }
    static byte[] GetBytesForCompressedAttempt(byte[] hash)
    {
        // Walk the 160-bit hash 5 bits at a time: each loop pass consumes
        // 5 bytes (40 bits) and emits 8 five-bit values, 4 passes in all.
        // Note: this destructively shifts the caller's hash array.
        byte[] newHash = new byte[32];
        int byteCounter = 0;
        int k = 0;
        for (int i = 0; i < 4; ++i)
        {
            // get 5 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x1F);
            hash[byteCounter] >>= 5;
            ++k;
            // get 3 bits, then 2 bits from the next byte
            newHash[k] = (byte)((hash[byteCounter] & 0x7) << 2);
            ++byteCounter;
            newHash[k] |= (byte)(hash[byteCounter] & 0x3);
            hash[byteCounter] >>= 2;
            ++k;
            // get 5 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x1F);
            hash[byteCounter] >>= 5;
            ++k;
            // get 1 bit, then 4 bits from the next byte
            newHash[k] = (byte)((hash[byteCounter] & 0x1) << 4);
            ++byteCounter;
            newHash[k] |= (byte)(hash[byteCounter] & 0xF);
            hash[byteCounter] >>= 4;
            ++k;
            // get 4 bits, then 1 bit from the next byte
            newHash[k] = (byte)((hash[byteCounter] & 0xF) << 1);
            ++byteCounter;
            newHash[k] |= (byte)(hash[byteCounter] & 0x1);
            hash[byteCounter] >>= 1;
            ++k;
            // get 5 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x1F);
            hash[byteCounter] >>= 5;
            ++k;
            // get 2 bits, then 3 bits from the next byte
            newHash[k] = (byte)((hash[byteCounter] & 0x3) << 3);
            ++byteCounter;
            newHash[k] |= (byte)(hash[byteCounter] & 0x7);
            hash[byteCounter] >>= 3;
            ++k;
            // get 5 bits
            newHash[k] = (byte)(hash[byteCounter] & 0x1F);
            ++byteCounter;
            ++k;
        }
        return newHash;
    }
    public static string ConvertHashToFileSystemFriendlyStringCompressed(byte[] str)
    {
        StringBuilder strToConvert = new StringBuilder();
        foreach (byte b in str)
        {
            System.Diagnostics.Debug.Assert(b < 32);
            if (b >= 10 && b < 32)
            {
                strToConvert.Append((char)(b - 10 + 'A'));
            }
            else
            {
                strToConvert.Append((char)(b + '0'));
            }
        }
        return strToConvert.ToString();
    }
    static void Main(string[] args)
    {
        System.Security.Cryptography.SHA1 hasher = System.Security.Cryptography.SHA1.Create();
        byte[] data = hasher.ComputeHash(Encoding.Default.GetBytes("The quick brown fox jumps over the lazy dog"));
        byte[] stringBytesTypical = GetBytesForTypical(data);
        string typicalFriendlyHashString = ConvertHashToFileSystemFriendlyStringTypical(stringBytesTypical);
        // 2FD4E1C67A2D28FCED849EE1BB76E7391B93EB12 == typicalFriendlyHashString
        byte[] stringBytesCompressedAttempt = GetBytesForCompressedAttempt(data);
        string compressedFriendlyHashString = ConvertHashToFileSystemFriendlyStringCompressed(stringBytesCompressedAttempt);
        // e.g. F0K1032QD08C1M44U11B0R77P3R31L2I (from the original post; the
        // exact characters depend on the bit-packing order)
    }
}
EDIT:
The need to reduce to fewer than 40 characters has nothing to do with Windows folder names (although it could, since Windows paths have a length limit). I need to conserve as much space as possible for human-readable strings and then create a folder for anything that needs to be reviewed. The problem with the 40-character ASCII string is that half of the bits are set to 0 and are in essence wasted. So when storing millions and millions of hashes, space and lookup speed start to become intertwined. I can't redesign the user workflow, but I can make the system more snappy and consume less memory.
EDIT:
Also, this would improve the user experience. Currently a user has to use a partial hash to look something up. In the worst case (in practice) the first 8 characters of the hash are needed to ensure there are no duplicates. Those 8 characters represent 32 bits of real hash data. Going down to 5 bits per character, users would only need 6 characters to ensure no duplicates. If I can get to 6 bits per character, users should only need around 5 characters. This gets into the realm of what most people are able to memorize.
EDIT: I've made some progress from the original code I posted above. Once I converted the hash into base 36, I was able to remove one character compared to the original 5-bit implementation above, so I am currently at 31 characters. This means that where the typical implementation requires 8 characters for retrieval (in practice), users should be able to use 6 characters to retrieve the same data.
public static string ConvertHashToFileSystemFriendlyStringCompressed2(byte[] hashData)
{
    // Requires System.Numerics (BigInteger) and System.Linq (Concat).
    string mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    // Append a zero byte so the little-endian BigInteger is always treated
    // as positive; otherwise a high bit in the last hash byte would make
    // base10 negative and the modulo below would index out of range.
    BigInteger base10 = new BigInteger(hashData.Concat(new byte[] { 0 }).ToArray());
    string base36;
    var result = new Stack<char>();
    do
    {
        result.Push(mapping[(int)(base10 % 36)]);
        base10 /= 36;
    } while (base10 != 0);
    base36 = new string(result.ToArray());
    return base36;
}
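To confirm the conversion is lossless, here is a hypothetical inverse (my own sketch, not from the original post) that rebuilds the BigInteger from the base 36 string; ToByteArray() then recovers the hash bytes.

public static BigInteger ConvertBase36ToBigInteger(string base36)
{
    string mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    BigInteger value = 0;
    foreach (char c in base36)
        value = value * 36 + mapping.IndexOf(c); // accumulate digit by digit
    return value;
}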
EDIT: Been doing more research, and I have a graph showing the diminishing returns you get as you increase the number of ASCII characters you have to choose from: you wind up needing more and more characters for smaller and smaller gains. I seem to be at the tail end of where you get the biggest bang for your buck (at 36 characters). So even if I were able to jump to 64 characters (which I can't at the present time), I would only remove 4 characters from the final string. However, if I slim the original hash down to 18 bytes, those same 36 characters produce only a 27-character string (the same length as converting the full hash to base 64). Now the problem is how to reliably compress a 20-byte hash into 18 bytes. Truncation won't work, since users would still have to memorize 6 characters. And since a SHA1 hash is effectively random bytes, I am not sure I can losslessly compress 2 bytes away (a 10% space savings).
EDIT: So my attempts to compress the hash bytes have not met with success. I expected this, but had to try in order to prove it to myself. Basically, I attempted to use a Huffman code to compress the original hash.
Since each value in the hash is equally likely (the definition of a good hash), using a common Huffman tree for all hashes is out of the question (it would yield the same number of bits I am trying to compress, for no net gain). However, once you create a Huffman tree for a specific hash, you do get compression of the original hash (20 bytes to 16 bytes, for example), only to have the saved 4 bytes lost again because you have to store the Huffman tree as well. This approach may work for longer hash values (512 bits, etc.) but does not appear to work well enough for SHA1 to warrant implementation (only a very small subset of SHA1 outputs would benefit from this type of compression).
Code (written in C):
unsigned long crc_tab[256]; /* global table, filled in by chksum_crc32gentab() below */

unsigned long chksum_crc32 (unsigned char *block, unsigned int length)
{
   register unsigned long crc;
   unsigned long i;

   crc = 0xFFFFFFFF;
   for (i = 0; i < length; i++)
   {
      crc = ((crc >> 8) & 0x00FFFFFF) ^ crc_tab[(crc ^ *block++) & 0xFF];
   }
   return (crc ^ 0xFFFFFFFF);
}
/* chksum_crc32gentab() -- fills the global crc_tab[256] with the
 * lookup table for crc32 checksums. It is generated from the
 * polynomial [..]
 */
void chksum_crc32gentab ()
{
   unsigned long crc, poly;
   int i, j;

   poly = 0xEDB88320L;
   for (i = 0; i < 256; i++)
   {
      crc = i;
      for (j = 8; j > 0; j--)
      {
         if (crc & 1)
         {
            crc = (crc >> 1) ^ poly;
         }
         else
         {
            crc >>= 1;
         }
      }
      crc_tab[i] = crc;
   }
}
For starters: I know how CRC works. First the FCS (frame check sequence) is calculated from the data using a specified polynomial as the divisor; this FCS is appended to the data set and sent to the other system. Once the transfer is finished, the FCS is checked using the same polynomial, and if the remainder of the data divided by that polynomial is zero, you know the data is correct.
What I do not understand is the implementation of these two functions. From what I have learned, chksum_crc32gentab() generates all the possible values the checksum could take for the 32-bit CRC polynomial, one per byte value. One thing I don't get is how poly = 0xEDB88320L; is equivalent to a polynomial. I don't understand the logic at the bottom of this function either. For example, the conditional if (crc & 1): does this mean that whenever the low bit of crc is 1, XOR in the polynomial, and otherwise just shift right one bit?
I also do not understand chksum_crc32(unsigned char *block, unsigned int length). Does this function just take in a string of bytes and compute the proper CRC value using the table? I am confused about the logic it uses within the for loop.
If anyone understands this code, an explanation would be great; it does match the crc32 produced by the .NET class. An example of how data is converted and then used with these functions would be something like:
(C# source)
MemoryStream ms = new MemoryStream(System.Text.Encoding.Default.GetBytes(input));
foreach (byte b in crc32.ComputeHash(ms))
hash += b.ToString("x2").ToLower();
Here is the original site and project the C code was taken from. http://www.codeproject.com/Articles/35134/How-to-calculate-CRC-in-C
Any explanation would help
Or just google it... The second hit is: http://www.opensource.apple.com/source/xnu/xnu-1456.1.26/bsd/libkern/crc32.c
Backporting it from C# is the hard way to do it; most of these algorithms already exist in C.
In CRC calculations, binary polynomials, which are sums of x^n with either a 0 or 1 coefficient, are represented simply as binary words where the position of the 0 or 1 indicates which power of x it is a coefficient of.
0xEDB88320L represents the coefficients of the CRC32 polynomial as 1's where there is an x^n term (except for the x^32 term, which is left out). The CRC32 polynomial is:
x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
Because of how this CRC is defined with respect to bit-ordering, the lowest coefficients are in the highest bits. So the first E in the hex constant above is 1110 representing (in order from left to right in the bits), 1 + x + x^2.
You can find the construction in the crc32.c source file of zlib, from which a snippet is shown here:
static const unsigned char p[] = {0,1,2,4,5,7,8,10,11,12,16,22,23,26};

/* make exclusive-or pattern from polynomial (0xedb88320UL) */
poly = 0;
for (n = 0; n < (int)(sizeof(p)/sizeof(unsigned char)); n++)
    poly |= (z_crc_t)1 << (31 - p[n]);

/* generate a crc for every 8-bit value */
for (n = 0; n < 256; n++) {
    c = (z_crc_t)n;
    for (k = 0; k < 8; k++)
        c = c & 1 ? poly ^ (c >> 1) : c >> 1;
    crc_table[0][n] = c;
}
The if (crc & 1) or c & 1 ? above looks at the low bit of the CRC at each step before it is shifted away. That is effectively a carry bit for the polynomial subtraction operation, so if it is a one, the polynomial is subtracted (exclusive-ored) from the shifted down polynomial in the CRC (multiplied by x). The CRC is shifted down whether the low bit is 1 or not.
The chksum_crc32() function that you show indeed computes the CRC on the provided block of data. It is the standard table-based approach for CRC calculations on strings of bytes, which indexes the table by the exclusive-or of the data byte and the low byte of the CRC. This does the same thing as shifting in a bit at a time and applying the polynomial for 1 bits, but does it in one step instead of eight. The CRC is effectively multiplied by x^8 (the >> 8), and is exclusive-ored with the effect of exclusive-oring with the polynomial 0 to 8 times at various shifted locations depending on the index value. It is simply a speed trick using a pre-computed table.
You can find even more extreme speed tricks in zlib's crc32.c, which uses larger tables and processes more data at a time.
I have a byte array in c#. I need to pull out a certain number of bytes starting at a bit position that may not lie on a byte boundary.
Write a little helper method which uses the shift operators to get a byte out, e.g.:
byte[] x = { 0x0F, 0xF0 };
int result = x[0] << 4 | x[1] >> 4; // == 0xFF
This returns 8 bits starting from bit position 4 (the 5th bit): 0xFF. You could easily vary the position using the modulo operator %; a sketch of a general helper follows.
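Here is that generalization as a sketch (my own helper, assuming MSB-first bit numbering and that the read doesn't run past the end of the array):

// Read 8 bits starting at an arbitrary bit position in a byte array.
static byte GetByteAt(byte[] data, int bitPos)
{
    int byteIndex = bitPos / 8;
    int offset = bitPos % 8; // how far into the byte the read starts
    if (offset == 0)
        return data[byteIndex];
    return (byte)((data[byteIndex] << offset) | (data[byteIndex + 1] >> (8 - offset)));
}

// GetByteAt(new byte[] { 0x0F, 0xF0 }, 4) == 0xFF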
A byte is the minimum alignment you can read with the standard stream readers in .NET.
If you want to read individual bits, you need to use bitwise operators and masks to determine whether a bit is on (1) or off (0).
This means you can represent the contents of a byte as boolean true/false values. One way is to read the bits into an enumeration of booleans; something like this extension method could work:
public static IEnumerable<bool> BitsToBools(this IEnumerable<byte> input)
{
    foreach (byte readByte in input)
    {
        for (int i = 7; i >= 0; i--) // read left to right (MSB first)
            yield return ((readByte >> i) & 1) == 1;
    }
}
You could add a startIndex and a count to the extension method if you want, or pass in the range from the calling method.
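For example, assuming the method above lives in a static class (required for extension methods):

byte[] data = { 0xA5 };             // 10100101
foreach (bool bit in data.BitsToBools())
    Console.Write(bit ? '1' : '0'); // prints 10100101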