I'm reading some data over a socket. The integral data types are no trouble; the System.BitConverter methods are correctly handling the conversion. (So there are no endian issues to worry about, I think?)
However, BitConverter.ToDouble isn't working for the floating point parts of the data...the source specification is a bit low level for me, but talks about a binary fixed point representation with a positive byte offset in the more significant direction and negative byte offset in the less significant direction.
Most of the research I've done has been aimed at C++ or a full fixed-point library handling sines and cosines, which sounds like overkill for this problem. Could someone please help me with a C# function to produce a float from 8 bytes of a byte array with, say, a -3 byte offset?
Further details of format as requested:
The signed numerical value of fixed point data shall be represented using binary, two's-complement notation. For fixed point data, the value of each data parameter shall be defined in relation to the reference byte. The reference byte defines an eight-bit field, with the unit of measure in the LSB position. The value of the LSB of the reference byte is ONE.
Byte offset shall be defined by a signed integer indicating the position of the least significant byte of a data element relative to the reference byte.
The MSB of the data element represents the sign bit. Bit positions between the MSB of the parameter absolute value and the MSB of the most significant byte shall be equal in value to the sign bit.
Floating point data shall be represented as a binary floating point number in conformance with the IEEE ANSI/IEEE Std 754-2008. (This sentence is from a different section which may be a red herring).
Ok, after asking some questions of a local expert on the source material, it turns out CodeInChaos was on the right track... if the value is 8 bytes with a -3 byte offset, then I can use BitConverter.ToInt64 / 256^3; if it is 4 bytes with a -1 byte offset, then BitConverter.ToInt32 / 256 will produce the correct answer. I guess that means BitConverter.ToXXX (where XXX is signed) is smart enough to handle the two's-complement calculations!
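For anyone following along, here's a rough sketch of what that looks like in code (assuming the bytes arrive in little-endian order so BitConverter can read them directly; ReceiveBytes is just a stand-in for however you read from the socket):
byte[] buffer = ReceiveBytes(8); // hypothetical helper that reads 8 bytes from the socket
// 8-byte value with a -3 byte offset: the 3 least significant bytes are fractional,
// so divide the two's-complement integer by 256^3 (= 2^24)
double value = BitConverter.ToInt64(buffer, 0) / (double)(1L << 24);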
Thanks to those who tried to help out; I thought it couldn't be too complicated, but getting that 256 offset from the reference document wording was very confusing :-)
System.BitConverter is quite slow, so if performance is significant to you, I'd recommend converting the bytes to an integer yourself (via logical shifts).
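If you do go that route, a shift-based version might look roughly like this (just a sketch, assuming the 8 bytes are in little-endian order; the method name is my own):
static long ToInt64LittleEndian(byte[] buf, int offset)
{
    long result = 0;
    // accumulate from the most significant byte down to the least significant
    for (int i = 7; i >= 0; i--)
        result = (result << 8) | buf[offset + i];
    return result;
}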
Also, please specify in what exact format floats are sent in your protocol.
I have a single-precision float value, and no information about the distribution of the samples from which this value was generated, so I can't apply a sigmoid or perform some kind of normalization. Also, I know the value will always be non-negative. What is the best way to represent this float as a byte?
I've thought of the following:
Interpret the float as a UInt32 (I expect this to maintain relative ordering between numbers, please correct me if I'm wrong) and then scale it to the range of a byte.
UInt32 uVal = BitConverter.ToUInt32(BitConverter.GetBytes(fVal), 0);
// widen to ulong so the multiplication can't overflow before the division
byte bVal = Convert.ToByte((ulong)uVal * Byte.MaxValue / UInt32.MaxValue);
I'd appreciate your comments and any other suggestions. Thanks!
You have to assume a distribution. You have no choice. Somehow you have to partition the float values and assign them to byte values.
If the distribution is assumed to be linear arithmetic then the space is roughly 0 to 3.4e38. Each increment in the byte value would have a weight of about +1.3e36.
If the distribution is assumed to be linear geometric then the space is roughly 2.3e83. Each increment in the byte value would have a weight of about x2.1.
You can derive these values by simple arithmetic. The first is maxfloat/256. The second is the 256th root of (maxfloat/minfloat).
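For what it's worth, a quick way to reproduce those figures (a back-of-the-envelope check, taking float.MaxValue and float.Epsilon as my assumed "maxfloat" and "minfloat"):
double arithmeticStep = float.MaxValue / 256.0;                                        // ~1.3e36
double geometricStep = Math.Pow(float.MaxValue / (double)float.Epsilon, 1.0 / 256.0); // ~2.1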
Your proposal to use and scale the raw bit pattern will produce a lumpy distribution in which numbers with different exponents are grouped together while numbers with the same exponent and different mantissa are separated. I would not recommend it for most purposes.
--
A really simple way that might suit some purposes is to simply use the 8-bit exponent field (mask 0x7F800000, i.e. bits 23-30), ignoring the sign bit and mantissa. The values 0x00 and 0xFF would have to be handled specially. See http://en.wikipedia.org/wiki/Single-precision_floating-point_format.
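A minimal sketch of that exponent-only idea (my own illustration; fVal is the float from your question):
uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(fVal), 0);
byte bVal = (byte)((bits & 0x7F800000) >> 23); // the 8-bit exponent field
// bVal == 0x00 (zero/denormals) and bVal == 0xFF (infinity/NaN) still need special handling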
I really hope someone can help me.
I have a single byte[] that has to show the amount of bytes in the byte[] to follow. Now my value is above 255. Is there a way to display/enter a larger number?
A byte holds a value from 0 to 255. To represent 299, you either have to use 2 bytes, or use a scheme (which the receiver will have to use as well) where the value in the byte is interpreted as more than its nominal value in order to expand the possible range of values. For instance, the value could be the length / 2. This would allow lengths of 0 - 510, but would allow only even lengths (odd length arrays would need a pad byte).
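A tiny sketch of that "length / 2" convention, just to make it concrete (payload here is a stand-in for your array; both sender and receiver have to agree on the scheme):
byte lengthByte = (byte)(payload.Length / 2); // sender: encodes an even length up to 510
int length = lengthByte * 2;                  // receiver: decodes it back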
You can use two (or more) bytes to represent a number larger than 255. Is that what you want?
short value = 2451;
byte[] data = BitConverter.GetBytes(value);
If this is needed in order to exchange data with some external system, remember to read about Endianness.
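On the receiving end, something along these lines would get the value back (a sketch, assuming both sides use the same byte order):
short value = BitConverter.ToInt16(data, 0);
// if the sender used the opposite byte order, reverse the two bytes first:
// Array.Reverse(data, 0, 2);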
That depends on what you consider a good approach. You can perform some form of encoding to allow you to store larger values than a single byte can hold. I.e. perhaps a byte of 0xFF means you will consider the next byte as part of its value.
[0x01,0x0A,0xFF,0x0A]
Would be interpreted as 3 values of [1,10,265]
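A quick sketch of decoding that scheme (0xFF as the escape marker, exactly as in the example above):
byte[] data = { 0x01, 0x0A, 0xFF, 0x0A };
var values = new List<int>(); // requires using System.Collections.Generic;
for (int i = 0; i < data.Length; i++)
{
    if (data[i] == 0xFF && i + 1 < data.Length)
        values.Add(0xFF + data[++i]); // 0xFF,0x0A -> 255 + 10 = 265
    else
        values.Add(data[i]);
}
// values now holds [1, 10, 265]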
Just wondering if someone could explain why the two following lines of code return "different" results? What causes the reversed values? Is this something to do with endianness?
int.MaxValue.ToString("X") //Result: 7FFFFFFF
BitConverter.ToString(BitConverter.GetBytes(int.MaxValue)) //Result: FF-FF-FF-7F
int.MaxValue.ToString("X") outputs 7FFFFFFF, that is, the number 2147483647 as a whole.
On the other hand, BitConverter.GetBytes returns an array of bytes representing 2147483647 in memory. On your machine, this number is stored in little-endian order (least significant byte first). BitConverter.ToString operates on each byte separately, so it does not reorder the output to match the string above; it preserves the memory order.
However the two values are the same: 7F-FF-FF-FF for int.MaxValue, in big-endian, and FF-FF-FF-7F for BitConverter, in little-endian. Same number.
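If you want the two to line up, one option is to reverse the array before formatting it (a quick sketch):
byte[] bytes = BitConverter.GetBytes(int.MaxValue);
if (BitConverter.IsLittleEndian)
    Array.Reverse(bytes);
Console.WriteLine(BitConverter.ToString(bytes)); // 7F-FF-FF-FF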
I would guess it's because GetBytes returns an array of bytes, which BitConverter.ToString then formats - in my opinion - rather nicely.
Also keep in mind that the bitwise representation may be different from the value! This depends on where the most significant byte sits!
hth
What actually happens when a Byte overflows?
Say we have
byte byte1 = 150; // 10010110
byte byte2 = 199; // 11000111
If we now do this addition
byte byte3 = byte1 + byte2;
I think we'll end up with byte3 = 93, but what actually happens? Did I overwrite some other memory somehow, or is this totally harmless?
It's quite simple. It just does the addition and ends up with a number that needs more than 8 bits. The ninth bit (being one) just 'falls off' and you are left with the remaining 8 bits, which form the number 93.
(yes it's harmless)
The top bits will be truncated. It is not harmful to any other memory, it is only harmful in terms of unintended results.
In C# if you have
checked { byte byte3 = (byte)(byte1 + byte2); }
It will throw an overflow exception. Code is compiled unchecked by default. As the other answers are saying, the value will 'wrap around', i.e. byte3 = (byte)((byte1 + byte2) & 0xFF);
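To make the contrast concrete, here's a small sketch of both behaviours:
byte byte1 = 150, byte2 = 199;
byte wrapped = unchecked((byte)(byte1 + byte2));    // 93: the extra bits are simply discarded
byte overflowed = checked((byte)(byte1 + byte2));   // throws OverflowException at run time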
The carry flag gets set... but besides the result not being what you expect, there should be no ill effects.
Typically (and the exact behaviour will depend on the language and platform), the result will be taken modulo 256, i.e. 150 + 199 = 349, and 349 mod 256 = 93.
This shouldn't affect any other storage.
Since you have tagged your question C#, C++ and C, I'll answer about C and C++. In C++, overflow on signed types, including sbyte (which, I believe, corresponds to signed char in C/C++), results in undefined behavior. However, for unsigned types, such as byte (which is unsigned char in C++), the result is taken modulo 2^n, where n is the number of bits in the unsigned type. In C# the second rule holds, and the signed types generate an exception if they are in a checked block. I may be wrong in the C# part.
Overflow is harmless in C# - you won't overflow memory - you simply only get the last 8 bits of the result. If you want this to throw an exception, use the 'checked' keyword. Note also that you may find byte + byte gives an int, so you may need to cast back to byte.
The behavior depends on the language.
In C and C++, signed overflow is undefined and unsigned overflow has the behavior you mentioned (although there is no byte type).
In C#, you can use the checked keyword to explicitly say you want to receive an exception if there is overflow and the unchecked keyword to explicitly say you want to ignore it.
The leading bit just gets dropped off.
An arithmetic overflow occurs: 150 + 199 = 349, which is binary 1 0101 1101. The upper 1 bit is dropped and the byte becomes 0101 1101 (93); i.e. the result overflowed the number of bits a byte can hold.
No damage was done - e.g. the memory didn't overflow into another location.
Let's look at what actually happens (in C (assuming you've got the appropriate datatype, as some have pointed out that C doesn't have a "byte" datatype; nonetheless, there are 8-bit datatypes which can be added)). If these bytes are declared on the stack, they exist in main memory; at some point, the bytes will get copied to the processor for operation (I'm skipping over several important steps, such as processor caching...). Once in the processor, they will be stored in registers; the processor will execute an add operation upon those two registers to add the data together.
Here's where the cause of confusion occurs. The CPU will perform the add operation in the native (or sometimes, specified) datatype. Let's say the native type of the CPU is a 32-bit word (and that that datatype is what is used for the add operation); that means these bytes will be stored in 32-bit words with the upper 24 bits unset; the add operation will indeed do the overflow in the target 32-bit word. But (and here's the important bit) when the data is copied back from the register to the stack, only the lowest 8 bits (the byte) will be copied back to the target variable's location on the stack. (Note that there's some complexity involved with byte packing and the stack here as well.)
So, here's the upshot: the add causes an overflow (depending on the specific processor instruction chosen); the data, however, is copied out of the processor into a datatype of the appropriate size, so the overflow is unseen (and harmless, assuming a properly written compiler).
As far as C# goes, adding two values of type byte together results in a value of type int which must then be cast back to byte.
Therefore your code sample will result in a compiler error without a cast back to byte as in the following.
byte byte1 = 150; // 10010110
byte byte2 = 199; // 11000111
byte byte3 = (byte)(byte1 + byte2);
See MSDN for more details on this. Also, see the C# language specification, section 7.3.6 Numeric promotions.
Looking at the C# numeric data types, I noticed that most of the types have a signed and an unsigned version. I noticed that whereas the "default" integer, short and long are signed and have their unsigned counterparts in uint, ushort and ulong, the "default" byte is instead unsigned and has its signed counterpart in sbyte.
Just out of curiosity, why is byte so different from the rest? Was there a specific reason behind this or it is "just the way things are"?
Hope the question isn't too confusing due to my phrasing and excessive use of quotes. Heh..
I would say a byte is not considered a numeric type but defines a structure 8 bits in size. Besides, there is no signed byte notion; it is unsigned. Numbers, on the other hand, are firstly considered to be signed, so stating that they are unsigned, which is less common, warrants the prefix.
[EDIT]
Forgot there is a signed byte (sbyte). I suppose it is rather historical and practical application. Ints are more common than UInts and byte is more common than sbyte.
Historically the terms byte, nibble and bit indicate a unit of storage, a mnemonic or code... not a numeric value. Having negative megabytes of memory or adding ASCII codes 1 and 2 expecting code 3 is kinda silly. In many ways there is no such thing as a signed "byte". Sometimes the line between "thing" and "value" is very blurry... as with most languages that treat byte as a thing and a value.
It's more so a degree of corruption of the terms. A byte is not inherently numeric in any form, it's simply a unit of storage.
However, bytes, characters, and 8-bit signed/unsigned integers have had their names used interchangeably where they probably should not have:
Byte denotes 8 bits of data, and says nothing about the format of the data.
Character denotes some data that stores a representation of a single text character.
"UINT8"/"INT8" denotes 8 bits of data, in signed or unsigned format, storing numeric integer values.
It really just comes down to being intuitive versus being consistent. It probably would have been cleaner if the .NET Framework used System.UInt8 and System.Int8 for consistency with the other integer types. But yeah it does seem a bit arbitrary.
For what it's worth, MSIL (which all .NET languages compile to anyhow) is more consistent in that an sbyte is called an int8 and a byte is called an unsigned int8, short is called int16, etc.
But the term byte is typically not used to describe a numeric type but rather a set of 8 bits such as when dealing with files, serialization, sockets, etc. For example if Stream.Read worked with a System.Int8[] array, that would be a very unusual looking API.