What is the endianness of a double on Windows 10 64bit? - c#

I am using C# and writing a program to send numbers over UDP. I am on the Windows 10 64-bit platform and I am using BitConverter in order to get the bytes from integers, doubles, etc.
As an example:
If I use:
Byte[] data = BitConverter.GetBytes((int)1);
I get 01000000 in hex, which is little-endian as expected.
If I use:
Byte[] data = BitConverter.GetBytes((double)1);
I get 000000000000f03f in hex, which looks like a big-endian number, but I am just not so sure.
My guess is I don't have a good understanding of endianness or of the double format. I suppose it is also possible that Windows stores doubles differently from ints?

The binary representation of a double is different from that of an integer. It follows the IEEE 754 standard for storing floating-point values. Work out the IEEE 754 representation of 1.0 and then check for endianness.

An interesting note: as you might already know, C# doesn't define the endianness; it depends on the CPU architecture. If you are writing cross-platform/cross-architecture applications, you can check the BitConverter.IsLittleEndian field:
Indicates the byte order ("endianness") in which data is stored in this computer architecture.
Remarks
Different computer architectures store data using different byte orders. "Big-endian" means the most significant byte is on the left end of a word. "Little-endian" means the most significant byte is on the right end of a word.
Note
You can convert from network byte order to the byte order of the host computer without retrieving the value of the BitConverter.IsLittleEndian field by passing a 16-bit, 32-bit, or 64-bit integer to the IPAddress.HostToNetworkOrder method.
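For example, a hedged illustration of that note (IPAddress lives in the System.Net namespace; the variable names are made up):
// using System.Net;
int host = 16777214;
int network = IPAddress.HostToNetworkOrder(host);       // big-endian (network) byte order
int backToHost = IPAddress.NetworkToHostOrder(network); // round-trips back to 16777214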
If you need a different endianness, you can convert easily enough with Array.Reverse:
byte[] bytes = BitConverter.GetBytes(num);
Array.Reverse(bytes, 0, bytes.Length);
Or you can swap the bytes bitwise for types like int and long; you could take it a step further with unsafe code and pointers for other types:
public uint SwapBytes(uint x)
{
    // Swap the two 16-bit halves, then the bytes within each half
    x = (x >> 16) | (x << 16);
    return ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
}
public ulong SwapBytes(ulong x)
{
    // Swap the two 32-bit halves, then the 16-bit halves, then the bytes
    x = (x >> 32) | (x << 32);
    x = ((x & 0xFFFF0000FFFF0000) >> 16) | ((x & 0x0000FFFF0000FFFF) << 16);
    return ((x & 0xFF00FF00FF00FF00) >> 8) | ((x & 0x00FF00FF00FF00FF) << 8);
}
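Tying this back to doubles: a hedged sketch that combines BitConverter.DoubleToInt64Bits with the 64-bit swap above to get the big-endian bytes of a double on a little-endian machine (the method name is just illustrative):
public byte[] GetBytesBigEndian(double value)
{
    // Grab the raw 64-bit pattern of the double, swap it if the host is
    // little-endian, then let GetBytes write it out; the result is the
    // big-endian (network order) byte sequence of the original double.
    ulong bits = (ulong)BitConverter.DoubleToInt64Bits(value);
    if (BitConverter.IsLittleEndian)
        bits = SwapBytes(bits);
    return BitConverter.GetBytes(bits);
}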

Certainly little-endian.
Remember that IEEE floating-point is a bitfield, with sign having higher significance than exponent, which in turn has higher significance than mantissa.
Your integer example has only one field, and its low bits are set.
Your double example has all zero bits in the mantissa field, and the more significant field of exponent bits is non-zero. (Both of these are affected by the biasing used by IEEE-754)
The more significant bits are at the higher memory addresses, just like with the little-endian integer.
For reference, IEEE-754 for 1.0 is { sign: 0, exponent: 0x3ff, mantissa: 0x0000000000000 }
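As a quick check of that layout (a small illustrative snippet, nothing more), you can pull the fields apart yourself:
long bits = BitConverter.DoubleToInt64Bits(1.0);   // 0x3FF0000000000000
long sign = (bits >> 63) & 0x1;                    // 0
long exponent = (bits >> 52) & 0x7FF;              // 0x3FF
long mantissa = bits & ((1L << 52) - 1);           // 0
// BitConverter.GetBytes(1.0) yields 00 00 00 00 00 00 f0 3f on a little-endian
// machine: the most significant bytes land at the highest array indices.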

Related

What does "int &= 0xFF" in a checksum do?

I implemented this checksum algorithm I found, and it works fine but I can't figure out what this "&= 0xFF" line is actually doing.
I looked up the bitwise & operator, and Wikipedia says it's a logical AND of each pair of bits in A and B. I also read that 0xFF is equivalent to 255 -- which should mean that all of the bits are 1. If you take any number & 0xFF, wouldn't that be the identity of the number? So A & 0xFF produces A, right?
So then I thought, wait a minute, checksum in the code below is a 32 bit Int, but 0xFF is 8bit. Does that mean that the result of checksum &= 0xFF is that 24 bits end up as zeros and only the remaining 8 bits are kept? In which case, checksum is truncated to 8 bits. Is that what's going on here?
private int CalculateChecksum(byte[] dataToCalculate)
{
    int checksum = 0;
    for (int i = 0; i < dataToCalculate.Length; i++)
    {
        checksum += dataToCalculate[i];
    }
    // What does this line actually do?
    checksum &= 0xff;
    return checksum;
}
Also, if the result is getting truncated to 8 bits, is that because 32 bits is pointless in a checksum? Is it possible to have a situation where a 32 bit checksum catches corrupt data when 8 bit checksum doesn't?
It is masking off the higher bytes, leaving only the lower byte.
checksum &= 0xFF;
Is syntactically short for:
checksum = checksum & 0xFF;
Since the operation is done on int values, the 0xFF is promoted to an int:
checksum = checksum & 0x000000FF;
This masks off the upper 3 bytes, leaving only the lower byte, still as an integer (not a byte).
To answer your other question: Since a 32-bit checksum is much wider than an 8-bit checksum, it can catch errors that an 8-bit checksum would not, but both sides need to use the same checksum calculations for that to work.
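As a quick illustration (the running total here is just a made-up example value):
int checksum = 0x1234;   // example running total after summing the bytes
checksum &= 0xFF;        // same as checksum = checksum & 0x000000FF
// checksum is now 0x34: the upper 24 bits are cleared, only the low byte remains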
Seems like you have a good understanding of the situation.
Does that mean that the result of checksum &= 0xFF is that 24 bits end up as zeros and only the remaining 8 bits are kept?
Yes.
Is it possible to have a situation where a 32 bit checksum catches corrupt data when 8 bit checksum doesn't?
Yes.
This is performing a simple checksum on the bytes (8-bit values) by adding them and ignoring any overflow into the higher-order bits. The final &= 0xFF, as you suspected, just truncates the value to the 8 least significant bits of the 32-bit int, resulting in a value between 0 and 255.
Truncating to 8 bits and throwing away the higher-order bits is simply how this checksum implementation is defined. Historically this sort of check value was used to provide some confidence that a block of bytes had been transferred correctly over a simple serial interface.
To answer your last question: yes, a 32-bit check value will be able to detect errors that would not be detected with an 8-bit check value.
Yes, the checksum is truncated to 8 bits by the &= 0xFF: the lowest 8 bits are kept and all higher bits are set to 0.
Narrowing the checksum to 8 bits does decrease its reliability: think of two 32-bit checksums that are different but whose lowest 8 bits are equal. After truncating to 8 bits they compare equal, whereas as 32-bit values they do not.
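A small illustration of such a collision (the byte values are arbitrary):
// Two data blocks whose sums differ by a multiple of 256, so the truncated
// 8-bit checksums collide even though the 32-bit sums do not.
byte[] a = { 10, 20, 30 };       // sum = 60
byte[] b = { 100, 116, 100 };    // sum = 316 = 60 + 256
// 32-bit sums: 60 vs 316                     -> different, corruption detected
// 8-bit sums:  60 & 0xFF == 316 & 0xFF == 60 -> identical, corruption missed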

Find minimum float greater than a double value

I was having an issue with the CUDNN_BN_MIN_EPSILON value being used in the cudnnBatchNormalizationForwardTraining function (see the docs here). It turns out it was because I was passing the float value 1e-5f instead of a double (I'm working with float values to save memory and speed up computation), and that value, once converted to float, was slightly less than 1e-5, which is the actual value of that constant.
After some trial and error, I found a decent approximation I'm now using:
const float CUDNN_BN_MIN_EPSILON = 1e-5f + 5e-13f;
I'm sure there's a better way to approach problems like this, so the question is:
Given a positive double value, what is the best (as in "reliable") way to find the minimum possible float value which (on its own and if/when converted to double) is strictly greater than the initial double value?
Another way to formulate this problem: given a double value d1 and a float value f1, the difference d1 - f1 should be negative and as close to zero as possible (otherwise f1 would be less than d1, which is not what we're looking for).
I did some basic trial and error (using 1e-5 as my target value):
// Check the initial difference
> 1e-5 - 1e-5f
2,5262124918247909E-13 // We'd like a small negative value here
// Try to add the difference to the float value
> 1e-5 - (1e-5f + (float)(1e-5 - 1e-5f))
2,5262124918247909E-13 // Same, probably due to approximation
// Double the difference (as a test)
> 1e-5 - (1e-5f + (float)((1e-5 - 1e-5f) * 2))
-6,5687345259044915E-13 // OK
With this approximation, the final float value is 1,00000007E-05, which looks fine.
But that * 2 multiplication was completely arbitrary on my end, and I'm not sure it will be reliable or the optimal thing to do there.
Is there a better way to achieve this?
Thanks!
EDIT: this is the (bad) solution I'm using now, will be happy to replace it with a better one!
/// <summary>
/// Returns the minimum possible upper <see cref="float"/> approximation of the given <see cref="double"/> value
/// </summary>
/// <param name="value">The value to approximate</param>
public static float ToApproximatedFloat(this double value)
    => (float)value + (float)((value - (float)value) * 2);
SOLUTION: this is the final, correct implementation (thanks to John Bollinger):
public static unsafe float ToApproximatedFloat(this double value)
{
    // Obtain the bit representation of the double value
    ulong bits = *((ulong*)&value);
    // Extract and re-bias the exponent field
    ulong exponent = ((bits >> 52) & 0x7FF) - 1023 + 127;
    // Extract the significand bits and truncate the excess
    ulong significand = (bits >> 29) & 0x7FFFFF;
    // Assemble the result in 32-bit unsigned integer format, then add 1
    ulong converted = (((bits >> 32) & 0x80000000u)
                      | (exponent << 23)
                      | significand) + 1;
    // Reinterpret the bit pattern as a float
    return *((float*)&converted);
}
In C:
#include <math.h>

float NextFloatGreaterThan(double x)
{
    float y = x;
    if (y <= x) y = nexttowardf(y, INFINITY);
    return y;
}
If you do not want to use library routines, then replace nexttowardf(y, INFINITY) above with -NextBefore(-y), where NextBefore is taken from this answer and modified:
Change double to float and DBL_ to FLT_.
Change .625 to .625f.
Replace fmax(SmallestPositive, fabs(q)*Scale) with SmallestPositive < fabs(q)*Scale ? fabs(q)*Scale : SmallestPositive.
Replace fabs(q) with (q < 0 ? -q : q).
(Obviously, the routine could be converted from -NextBefore(-y) to NextAfter(y). That is left as an exercise for the reader.)
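If you prefer to stay in C#, here is a managed sketch of the same idea (round to float, then step upward until the result compares strictly greater). It assumes a runtime that exposes BitConverter.SingleToInt32Bits/Int32BitsToSingle (e.g. .NET Core); on .NET Core 3.0+ you could also use MathF.BitIncrement instead of the manual bit bump.
public static float NextFloatGreaterThan(double value)
{
    if (double.IsNaN(value)) return float.NaN;
    if (double.IsPositiveInfinity(value)) return float.PositiveInfinity;

    float f = (float)value;        // rounds to the nearest float
    while (f <= value)             // the comparison happens in double precision
        f = NextUp(f);
    return f;
}

// Smallest float strictly greater than f; NaN and +infinity are passed through.
private static float NextUp(float f)
{
    if (float.IsNaN(f) || float.IsPositiveInfinity(f)) return f;
    if (f == 0f) return float.Epsilon;       // handles both +0 and -0
    int bits = BitConverter.SingleToInt32Bits(f);
    bits += f > 0f ? 1 : -1;                 // for negative floats, smaller magnitude means larger value
    return BitConverter.Int32BitsToSingle(bits);
}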
Inasmuch as you seem interested in the representation-level details, you'll be dependent on the representations of types float and double. In practice, however, it is very likely that that comes down to the basic "binary32" and "binary64" formats of IEEE-754. These have the general form of one sign bit, several bits of biased exponent, and a bunch of bits of significand, including, for normalized values, one implicit bit of significand.
Simple case
Given a double in IEEE-754 binary64 format whose value is no less than +2^-126, what you want to do is
obtain the bit pattern of the original double value in a form that can be directly examined and manipulated. For example, as an unsigned 64-bit integer.
double d = 1e-5;
uint64_t bits;
memcpy(&bits, &d, 8);
extract and re-bias the exponent field
uint64_t exponent = ((bits >> 52) & 0x7FF) - 1023 + 127;
extract the significand bits and truncate the excess
uint64_t significand = (bits >> 29) & 0x7fffff;
assemble the result in 32-bit unsigned integer format
uint32_t float_bits = ((bits >> 32) & 0x80000000u)
| (exponent << 23)
| significand;
add one. Since you want a result strictly greater than the original double, this is correct regardless of whether all of the truncated significand bits were 0. It will correctly increment the exponent field if the addition overflows the significand bits. It may, however, produce the bit pattern of an infinity.
float_bits += 1;
store / copy / reinterpret the bit pattern as that of a float
float f;
memcpy(&f, &float_bits, 4);
Negative numbers
Given a negative double in binary64 format whose magnitude is no less than 2^-126, follow the above procedure except subtract 1 from float_bits instead of adding one. Note that for exactly -2^-126, this produces a subnormal binary32 (see below), which is the correct result.
Zeroes and very small numbers, including subnormals
IEEE 754 provides reduced-precision representations of non-zero numbers of very small magnitude. Such representations are called subnormal. Under some circumstances the minimum binary32 exceeding a given input binary64 is a subnormal, including for some inputs that are not binary64 subnormals.
Also, IEEE 754 provides signed zeroes, and -0 is a special case: the minimum binary32 strictly greater than -0 (either format) is the smallest positive subnormal number. Note: not +0, because according to IEEE 754, +0 and -0 compare equal via the normal comparison operators. The minimum positive, nonzero, subnormal binary32 value has bit pattern 0x00000001.
The binary64 values subject to these considerations have biased binary64 exponent fields with values less than or equal to the difference between the binary64 exponent bias and the binary32 exponent bias (896). This includes those with biased exponents of exactly 0, which characterize binary64 zeroes and subnormals. Examination of the rebiasing step in the simple-case procedure should lead you to conclude, correctly, that that procedure will produce the wrong result for such inputs.
Code for these cases is left as an exercise.
Infinities and NaNs
Inputs with all bits of the biased binary64 exponent field set represent either positive or negative infinity (when the binary64 significand has no bits set) or a not-a-number (NaN) value. Binary64 NaNs and positive infinity should convert to their binary32 equivalents. Negative infinity should perhaps convert to the negative binary32 value of greatest magnitude. These need to be handled as special cases.
Code for these cases is left as an exercise.

CRC-16 0x8005 polynomial, from C to C#. SOS

I have this block of C code that I cannot for the life of me understand. I need to calculate the CRC-16 for a certain byte array I send to the method, and it should give me the MSB (most significant byte) and the LSB (least significant byte). I was also given a C-written app to test some functionality, and that app also gives me a log of what is sent and what is received via the COM port.
What is weird is that I entered the hex string that I found in the log into this online calculator, but it gives me a different result.
I took a stab at translating the method to C#, but I don't understand certain aspects:
What is pucPtr doing there (it's not being used anywhere else)?
What do the 2 lines of code mean, under the first for?
Why in the second for the short "i" is <=7, shouldn't it be <=8?
Last line in if statement means that usCRC is in fact ushort 8005?
Here is the block of code:
unsigned short CalculateCRC(unsigned char* a_szBufuer, short a_sBufferLen)
{
    unsigned short usCRC = 0;
    for (short j = 0; j < a_sBufferLen; j++)
    {
        unsigned char* pucPtr = (unsigned char*)&usCRC;
        *(pucPtr + 1) = *(pucPtr + 1) ^ *a_szBufuer++;
        for (short i = 0; i <= 7; i++)
        {
            if (usCRC & ((short)0x8000))
            {
                usCRC = usCRC << 1;
                usCRC = usCRC ^ ((ushort)0x8005);
            }
            else
                usCRC = usCRC << 1;
        }
    }
    return (usCRC);
}
This is the hex string that I convert to byte array and send to the method:
02 00 04 a0 00 01 01 03
This is the result that should be given from the CRC calculus:
06 35
The document I have been given says that this is a CRC16 IBM (msb, lsb) of the entire data.
Can anyone please help? I've been stuck on it for a while now.
Any code guru out there capable of translating that C method to C#? Apparently I'm not capable of such sourcery.
First of all, please note that in C, the ^ operator means bitwise XOR.
What is pucPtr doing there (it's not being used anywhere else)?
What do the 2 lines of code mean, under the first for?
Causing bugs, by the looks of it. It is only used to grab one of the two bytes of the FCS, but the code is written in an endianness-dependent way.
Endianness is very important when dealing with checksum algorithms, since they were originally designed for hardware shift registers, which require MSB first, aka big-endian. In addition, CRC often means data communication, and data communication means possibly different endianness between the sender, the protocol and the receiver.
I would guess that this code was written for little-endian machines only and the intent is to XOR with the most significant byte. The code points to the first byte, then uses +1 pointer arithmetic to get to the second byte. Corrected code should be something like:
uint8_t puc = (unsigned int)usCRC >> 8;
puc ^= *a_szBufuer;
usCRC = (usCRC & 0xFF) | ((unsigned int)puc << 8);
a_szBufuer++;
The casts to unsigned int are there to portably prevent mishaps with implicit integer promotion.
Why in the second for the short "i" is <=7, shouldn't it be <=8?
I think it is correct, but more readably it could have been written as i < 8.
Last line in if statement means that usCRC is in fact ushort 8005?
No, it means to XOR your FCS with the polynomial 0x8005. See this.
The document I have been given says that this is a CRC16 IBM
Yeah it is sometimes called that. Though from what I recall, "CRC16 IBM" also involves some bit inversion of the final result(?). I'd double check that.
Overall, be careful with this code. Whoever wrote it didn't have much of a clue about endianness, integer signedness and implicit type promotions. It is amateur-level code. You should be able to find safer, portable, professional versions of the same CRC algorithm on the net.
Very good reading about the topic is A Painless Guide To CRC.
What is pucPtr doing there (it's not being used anywhere else)?
pucPtr is used to treat an unsigned short as an array of 2 unsigned char (an ugly trick). Depending on the endianness of the platform, pucPtr will point to the first byte of the unsigned short and pucPtr+1 will point to the second byte (or vice versa). You have to know whether this algorithm was designed for little- or big-endian machines.
Equivalent (and portable) code, assuming the intended behaviour is what the original produces on a little-endian machine (XOR the data into the most significant byte of the CRC):
unsigned char rawCrc[2];
rawCrc[0] = (unsigned char)(usCRC & 0x00FF);
rawCrc[1] = (unsigned char)((usCRC >> 8) & 0x00FF);
rawCrc[1] = rawCrc[1] ^ *a_szBufuer++;
usCRC = (unsigned short)rawCrc[0]
        | (unsigned short)((unsigned int)rawCrc[1] << 8);
If the intended behaviour is instead to XOR into the least significant byte (what the original would do on a big-endian machine), swap rawCrc[0] and rawCrc[1].
What do the 2 lines of code mean, under the first for?
The first line does the ugly transformation described in 1.
The second line retrieves the value pointed to by a_szBufuer and then increments the pointer. It XORs that value with the second (or first, depending on endianness) byte of the CRC (note that *(pucPtr + 1) is equivalent to pucPtr[1]) and stores the result back into that same byte.
*(pucPtr + 1) = *(pucPtr + 1) ^ *a_szBufuer++;
is equivalent to
pucPtr[1] = pucPtr[1] ^ *a_szBufuer++;
Why in the second for the short "i" is <=7, shouldn't it be <=8?
You have to do 8 iterations, from 0 to 7. You could equally write the loop condition as i = 0; i < 8 or i = 1; i <= 8.
Last line in if statement means that usCRC is in fact ushort 8005?
No, it doesn't. It means that usCRC is now equal to usCRC XOR 0x8005. ^ is the bitwise XOR operation (also called exclusive-or). Example:
0b1100110
^0b1001011
----------
0b0101101
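To address the request for a translation: here is a hedged C# sketch that mirrors what the C routine does on a little-endian host (XOR each data byte into the most significant byte of the CRC, then run 8 shift/XOR rounds with polynomial 0x8005, initial value 0, no reflection, no final XOR). Whether it reproduces the 06 35 value from the log should still be verified against the device.
public static ushort CalculateCrc(byte[] buffer)
{
    ushort crc = 0;
    foreach (byte b in buffer)
    {
        crc ^= (ushort)(b << 8);            // XOR the data byte into the high byte of the CRC
        for (int i = 0; i < 8; i++)         // 8 rounds, matching i <= 7 in the original
        {
            if ((crc & 0x8000) != 0)
                crc = (ushort)((crc << 1) ^ 0x8005);
            else
                crc = (ushort)(crc << 1);
        }
    }
    return crc;
}
If the protocol expects (MSB, LSB) order on the wire, send (byte)(crc >> 8) followed by (byte)(crc & 0xFF).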

What byte order when BitConverter.IsLittleEndian = false

I'm storing numbers in their byte equivalent format, using the least number of bytes possible. With the range 65535 through 16777215, BitConverter gives me a 4 byte array, but I want to only store 3 bytes.
For the code below, my array is [0]254, [1]255, [2]255, [3]0, so I can chop out byte [3]. This is on a Core i7 proc. In my production code, before the array copy, I am checking BitConverter.IsLittleEndian to determine that I can chop the last byte.
int i = 16777214;
byte[] bytesTemp = BitConverter.GetBytes(i);
byte[] value = new byte[3];
if (BitConverter.IsLittleEndian)
    Array.Copy(bytesTemp, 0, value, 0, 3);
My question is - do I need to concern myself with the endianness of the system, or does the CLR just use this little-endian format regardless? I don't have a big-endian system (nor even know how I'd get one) to test whether my byte array comes out in the reverse order.
Yes, according to the documentation, you need to be concerned. They have an example where they reverse the bytes if the architecture is not the desired endianness.
As far as where to get a big-endian system, I think that ARM-based processors are big-endian, although I haven't tested this. So if you're running on a Windows RT device or a phone, for example, you might get different behavior.
It entirely depends on what you are doing with the data. If you are going to be writing it to disk for portable persistence, then yes... I would probably care about endianness. If you are just going to use it to recreate an int later in the same process (or on the same machine), it probably doesn't matter as much.
However, when I do need to worry about endianness, I usually don't achieve that with BitConverter at all - personally, I'd be tempted to use byte masking and shifting; then you don't even need to know the endianness - it'll work the same on any system. It also avoids the annoyingly bad design decision of BitConverter returning a byte array rather than accepting an array and offset.
For example:
byte[] buffer = ...
// write little-endian
buffer[offset++] = (byte)(i & 0xFF);
buffer[offset++] = (byte)((i >> 8) & 0xFF);
buffer[offset++] = (byte)((i >> 16) & 0xFF);
buffer[offset++] = (byte)((i >> 24) & 0xFF);
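Following that approach, a hedged sketch of the 3-byte case described in the question (the helper names are made up), including the matching read:
// Write the low 3 bytes of an int (values 0..16777215) in little-endian order.
static void WriteUInt24(byte[] buffer, ref int offset, int value)
{
    buffer[offset++] = (byte)(value & 0xFF);
    buffer[offset++] = (byte)((value >> 8) & 0xFF);
    buffer[offset++] = (byte)((value >> 16) & 0xFF);
}

// Read the 3 bytes back into an int; works the same on any host endianness.
static int ReadUInt24(byte[] buffer, ref int offset)
{
    return buffer[offset++]
         | (buffer[offset++] << 8)
         | (buffer[offset++] << 16);
}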

How to perform a Bitwise Operator on a byte array in C#

I am using C# and Microsoft.Xna.Framework.Audio;
I have managed to record some audio into a byte[] array and I am able to play it back.
The audio comes in as 8-bit unsigned data, and I would like to convert it into 16-bit signed mono audio so I can read the frequency and whatnot.
I have read in a few places that for sound sampling you perform a bitwise OR and shift the bits 8 places.
I have written the code as follows:
soundArray[i] = (short)(buffer[i] | (buffer[i + 1] << 8));
What I end up with is a lot of negative data.
From my understanding it would mostly need to be in the positive and would represent a wave length of data.
Any suggestions or help greatly appreciated,
Cheers.
MonkeyGuy.
This combines two 8-bit unsigned integers into one 16-bit signed integer:
soundArray[i] = (short)(buffer[i] | (buffer[i + 1] << 8));
I think what you might want is to simply scale each 8-bit unsigned integer to a 16-bit signed integer:
soundArray[i] = (short)((buffer[i] - 128) << 8);
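A minimal sketch of applying that scaling across the whole buffer (assuming buffer holds 8-bit unsigned mono samples):
short[] soundArray = new short[buffer.Length];
for (int i = 0; i < buffer.Length; i++)
{
    // Re-center the unsigned 8-bit sample around zero, then scale it up to the 16-bit range.
    soundArray[i] = (short)((buffer[i] - 128) << 8);
}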
Have you tried converting the byte to short before shifting?
