I have a single-precision float value, and no information about the distribution of the samples from which this value was generated, so I can't apply a sigmoid or perform some kind of normalization. Also, I know the value will always be non-negative. What is the best way to represent this float as a byte?
I've thought of the following:
Interpret the float as a UInt32 (I expect this to maintain relative ordering between numbers, please correct me if I'm wrong) and then scale it to the range of a byte.
UInt32 uVal = BitConverter.ToUInt32(BitConverter.GetBytes(fVal), 0);
byte bVal = Convert.ToByte((ulong)uVal * Byte.MaxValue / UInt32.MaxValue); // widen to ulong so the multiply can't overflow
I'd appreciate your comments and any other suggestions. Thanks!
You have to assume a distribution. You have no choice. Somehow you have to partition the float values and assign them to byte values.
If the distribution is assumed to be uniform on a linear (arithmetic) scale, then the space is roughly 0 to 3.4e38. Each increment in the byte value would have a weight of about +1.3e36.
If the distribution is assumed to be uniform on a geometric (logarithmic) scale, then the space spans a ratio of roughly 2.3e83. Each increment in the byte value would have a weight of about x2.1.
You can derive these values by simple arithmetic. The first is maxfloat/256. The second is the 256th root of (maxfloat/minfloat).
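To illustrate, here is a quick sketch of both step sizes, assuming float.MaxValue for maxfloat and float.Epsilon (the smallest positive subnormal) for minfloat:
// Sketch: per-step weights for a 256-level quantization of float's non-negative range.
double arithmeticStep = float.MaxValue / 256.0;                                         // ~1.3e36 per byte increment
double geometricRatio = Math.Pow((double)float.MaxValue / float.Epsilon, 1.0 / 256.0);  // ~2.1x per byte increment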
Your proposal to use and scale the raw bit pattern will produce a lumpy distribution in which numbers with different exponents are grouped together while numbers with the same exponent and different mantissa are separated. I would not recommend it for most purposes.
--
A really simple way that might suit some purposes is to simply use the 8-bit exponent field (bits 23-30, mask 0x7F800000), ignoring the sign bit and mantissa. The exponent values 0x00 and 0xFF would have to be handled specially. See http://en.wikipedia.org/wiki/Single-precision_floating-point_format.
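As a sketch of that idea (using fVal from the question's snippet; not part of the original answer), extracting the exponent field looks like this:
// Sketch: use only the 8-bit exponent field (bits 23..30) of the float as the byte value.
uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(fVal), 0);
byte exponent = (byte)((bits >> 23) & 0xFF);
// exponent == 0x00 (zero/denormals) and 0xFF (infinity/NaN) still need special handling.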
I want to change a value of, say, type int to be of type short, with the value itself "normalized" to the maximum value short can store - that is, int.MaxValue would convert into short.MaxValue, and vice versa.
Here's an example using floating-point math to demonstrate:
public static short Rescale(int value)
{
    float normalized = (float)value / int.MaxValue; // normalize the value to the range -1.0 to 1.0
    float rescaled = normalized * (float)short.MaxValue;
    return (short)rescaled;
}
While this works, it seems like using floating-point math is really inefficient and can be improved, as we're dealing with binary data here. I tried using bit-shifting, but to no avail.
Both signed and unsigned values are going to be processed - that isn't really an issue with the floating-point solution, but it makes things much more difficult when bit-shifting and doing other bit manipulation.
This code will be used in quite a performance heavy context - it will be called 512 times every ~20 milliseconds, so performance is pretty important here.
How can I do this with bit-manipulation (or plain old integer algebra, if bit manipulation isn't necessary) and avoid floating-point math when we're operating on integer values?
You should use the shift operator. It is very fast.
int is 32bits, short is 16, so shift 16 bits right to scale your int to a short:
int x = 208908324;
// 32 bits vs 16 bits
short k = (short)(x >> 16);
Just reverse the process for scaling up. Obviously the lower bits will be filled with zeros.
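For example, going back the other way is just a left shift (a trivial sketch; the reconstructed low bits are simply zero):
short k = (short)(x >> 16);   // scale down: 32 bits -> 16 bits
int y = ((int)k) << 16;       // scale back up: the low 16 bits come out as zeros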
Currently, I'm developing some fuzzy logic stuff in C# and want to achieve this in a generic way. For simplicity, I can use float, double and decimal to process an interval [0, 1], but for performance, it would be better to use integers. Some thoughts about symmetry also led to the decision to omit the highest value in unsigned and the lowest value in signed integers. The lowest, non-omitted value maps to 0 and the highest, non-omitted value maps to 1. The omitted value is normalized to the next non-omitted value.
Now, I want to implement some compound calculations of the form:
byte f(byte p1, byte p2, byte p3, byte p4)
{
    return (p1 * p2) / (p3 * p4);
}
where the byte values are interpreted as the [0, 1] interval mentioned above. This means p1 * p2 < p1 and p1 * p2 < p2, as opposed to numbers greater than 1, where this does not hold, e.g. 2 * 3 = 6, but 0.1 * 0.2 = 0.02.
Additionally, there is a problem: p1 * p2 and p3 * p4 may exceed the range of the type byte. The result of the whole formula may not exceed this range, but the overflow would still occur in one or both parts. Of course, I can just cast to ushort and at the end back to byte, but for a ulong I wouldn't have this possibility without further effort, and I don't want to stick to 32 bits. On the other hand, if I return (p1 / p3) * (p2 / p4), I reduce the type escalation, but might run into a result of 0 where the actual result is non-zero.
So I thought of somehow simultaneously "shrinking" both products step by step until I have the result in the [0, 1] interpretation. I don't need an exact value; a heuristic with an error of less than 3 integer values off the correct value would be sufficient, and for a ulong an even higher error would certainly be OK.
So far, I have tried to convert the input to a decimal/float/double in the interval [0, 1] and calculated with that. But this is completely counterproductive regarding performance. I read up on division algorithms, but I couldn't find the one I saw once in class. It was about calculating quotient and remainder simultaneously, with an accumulator. I tried to reconstruct and extend it for factorized parts of the division with corrections, but it breaks where indivisibility occurs and I get too big an error. I also made some notes and calculated some integer examples manually, trying to factor out, cancel out, split sums and such fancy derivation stuff, but nothing led to a satisfying result or steps for an algorithm.
Is there a performant way to multiply/divide signed (and unsigned) integers as above, interpreted as the interval [0, 1], without type promotion?
To answer your question as summarised: No.
You need to state (and rank) your overall goals explicitly (e.g., is symmetry more or less important than performance?). Your chances of getting a helpful answer improve if you state them succinctly in the question.
While I think Phil1970's "you can ignore scaling for … division" is overly optimistic, multiplication is enough of a problem: if you don't generate partial results bigger (twice as wide) than your "base type", you are stuck with multiplying parts of your operands and piecing the result together.
For ideas about piecing together "larger" results: AVR's Fractional Multiply.
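To make the "multiply parts of your operands" idea concrete, here is a hedged sketch (mine, not from the linked AVR material) of a 32x32 multiply pieced together from 16-bit halves, so no 64-bit intermediate is needed; the same pattern scales up to 64-bit operands:
// Sketch: 32x32 -> 64-bit product using only 32-bit arithmetic, by splitting into 16-bit halves.
static void MulWide(uint a, uint b, out uint hi, out uint lo)
{
    // split each operand into 16-bit halves
    uint aLo = a & 0xFFFF, aHi = a >> 16;
    uint bLo = b & 0xFFFF, bHi = b >> 16;

    // four partial products, each fits in 32 bits
    uint p0 = aLo * bLo;
    uint p1 = aLo * bHi;
    uint p2 = aHi * bLo;
    uint p3 = aHi * bHi;

    // combine: the middle column collects the carries out of the low 32 bits
    uint mid = (p0 >> 16) + (p1 & 0xFFFF) + (p2 & 0xFFFF);
    lo = (p0 & 0xFFFF) | (mid << 16);
    hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
}
For bytes, as in the question, C# promotes the operands to int anyway, so this matters mostly for the ulong case mentioned above.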
Regarding "…in signed integers. The lowest, non-omitted value maps to 0…", I expect that you will find, e.g., excess-32767/32768-coded fractions even harder to handle than two's-complement ones.
If you are not careful, you will lose more time doing conversions than it would have taken with regular operations.
That being said, an alternative that might make some sense would be to map values to the range 0 to 128 inclusive (or 0 to 32768 if you want more precision), so that every value is essentially stored multiplied by 128.
So if you have (0.5 * 0.75) / (0.125 * 0.25), the stored values for each of those numbers would be 64, 96, 16 and 32 respectively. If you do those computations using ushort you would have (64 * 96) / (16 * 32) = 6144 / 512 = 12. Note that the scale factors cancel in that final division, so 12 is already the true value of the expression (0.375 / 0.03125 = 12); since it is greater than 1, it cannot be stored back in the byte representation.
By the way, you can ignore scaling for addition, subtraction and division. For multiplication, you do the multiplication as usual and then divide by 128. So for 0.5 * 0.75 you would have 64 * 96 / 128 = 48, which corresponds to 48 / 128 = 0.375, as expected.
The code can be optimized for the platform, particularly if the platform is more efficient with narrow numbers. And if necessary, rounding could be added to the operations.
By the way, since the scaling is a power of 2, you can use bit shifting for it. You might prefer to use 256 instead of 128, particularly if you don't have single-cycle bit shifting, but then you need a larger width to handle some operations.
But you might be able to do some optimization, for example when the most significant bit is not set, so that you only use the larger width when necessary.
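As a rough sketch of the scale-by-128 scheme above (my own code, assuming a stored byte B represents B / 128, inputs of at most 128, and results that stay within [0, 1]):
// Sketch: fixed-point multiply/divide for byte-coded fractions with scale 128.
static byte MulQ7(byte a, byte b)
{
    // (a/128) * (b/128) -> stored result is (a*b)/128; the product needs 16 bits
    return (byte)((a * b) >> 7);
}

static byte DivQ7(byte a, byte b)
{
    // (a/128) / (b/128) -> stored result is (a*128)/b; b must be non-zero
    // and the true quotient must not exceed 1, or the byte overflows
    return (byte)((a << 7) / b);
}
Note that C# promotes byte operands to int in these expressions anyway, so a wider intermediate is still used under the hood, which is exactly the point of the "No" answer above.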
I have a use case where I need to scramble an input in such a way that:
Each specific input always maps to a specific pseudo-random output.
The output must shuffle the input sufficiently so that an incrementing input maps to a pseudo-random output.
For example, if the input is 64 bits, there must be exactly 2^64 unique outputs, and these must break incrementing inputs as much as possible (arbitrary requirement).
I will code this in C#, but can translate from Java or C, so long as there are no SIMD intrinsics. What I am looking for is some already existing code, rather than reinventing the wheel.
I have looked on Google, but haven't found anything that does a 1:1 mapping.
This seems to work fairly well:
const long multiplier = 6364136223846793005;
const long mulinv_multiplier = -4568919932995229531;
const long offset = 1442695040888963407;
static long Forward(long x)
{
    return x * multiplier + offset;
}

static long Reverse(long x)
{
    return (x - offset) * mulinv_multiplier;
}
You can change the constants to whatever you like, as long as multiplier is odd and mulinv_multiplier is the modular multiplicative inverse (see wiki: modular multiplicative inverse, or Hacker's Delight 10-15, Exact Division by Constants) of multiplier (modulo 2^64, obviously - and that's why multiplier has to be odd, otherwise it has no inverse).
The offset can be anything, but make it relatively prime with 2^64 just to be on the safe side.
These specific constants come from Knuth's linear congruential generator.
There's one small thing: it puts the complement of the LSB of the input in the LSB of the result. If that's a problem, you could just rotate it by any nonzero amount.
For 32 bits, the constants can be multiplier = 0x4c957f2d, offset = 0xf767814f, mulinv_multiplier = 0x329e28a5.
For 64 bits, multiplier = 12790229573962758597, mulinv_multiplier = 16500474117902441741 may work better.
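If you want to derive mulinv_multiplier for an odd multiplier of your own, here is a short sketch using Newton's iteration; each step doubles the number of correct low bits, which is the same trick Hacker's Delight uses for exact division by constants:
// Sketch: modular multiplicative inverse of an odd constant, modulo 2^64.
static ulong ModInverse(ulong a)
{
    ulong inv = a;                             // for odd a, a*a == 1 (mod 8), so 3 low bits start out correct
    for (int i = 0; i < 5; i++)
        inv = unchecked(inv * (2 - a * inv));  // Newton step: 3 -> 6 -> 12 -> 24 -> 48 -> 96 correct bits
    return inv;                                // a * inv == 1 (mod 2^64)
}
// e.g. ModInverse(6364136223846793005UL) == unchecked((ulong)(-4568919932995229531L))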
Or you could use a CRC, which is reversible for this use (i.e. when the input is the same size as the CRC); for CRC64 it requires some modifications, of course.
Just off the top of my head:
Shift the input: make sure you keep every bit, i.e. use two shift operations in opposite directions and OR the results together (in other words, a bit rotation).
Apply a static XOR.
Everything else that comes to my mind won't be bijective. However, a search for "bijective" might bring up something useful ;D
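A minimal sketch of those two steps (the rotation amount and the XOR mask are arbitrary choices of mine; both operations are bijective, though the mixing is much weaker than the multiplicative answer above):
// Sketch: invertible scramble from a bit rotation plus a constant XOR.
static ulong Scramble(ulong x)
{
    ulong rotated = (x << 13) | (x >> (64 - 13));    // rotate left by 13, keeping every bit
    return rotated ^ 0xA5A5A5A5A5A5A5A5UL;           // fixed XOR mask
}

static ulong Unscramble(ulong y)
{
    ulong rotated = y ^ 0xA5A5A5A5A5A5A5A5UL;        // undo the XOR
    return (rotated >> 13) | (rotated << (64 - 13)); // rotate right by 13
}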
There is a similar question here. Sometimes that solution throws exceptions because the numbers might be too large.
I think that if there is a way of looking at the bytes of a decimal number, it would be more efficient. A decimal number has to be represented by some number of bytes; an Int32, for example, is represented by 32 bits, and all the numbers whose first bit is 1 are negative. Maybe there is some similar relationship for decimal numbers. How can you look at the bytes of a decimal number, or the bytes of an integer number?
If you are really talking about decimal numbers (as opposed to floating-point numbers), then Decimal.GetBits will let you look at the individual bits of a decimal. The MSDN page also contains a description of the meaning of the bits.
On the other hand, if you just want to check whether a number has a fractional part or not, doing a simple
var hasFractionalPart = (myValue - Math.Round(myValue)) != 0;
is much easier than decoding the binary structure. This should work for decimals as well as classic floating-point data types such as float or double. In the latter case, due to floating-point rounding error, it might make sense to check for Math.Abs(myValue - Math.Round(myValue)) < someThreshold instead of comparing to 0.
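For example (the threshold value here is an arbitrary tolerance, not a library constant):
const double someThreshold = 1e-9;
var hasFractionalPart = Math.Abs(myValue - Math.Round(myValue)) >= someThreshold;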
If you want a reasonably efficient way of getting the 'decimal' value of a decimal type you can just mod it by one.
decimal number = 4.75M;
decimal fractionalPart = number % 1;
Console.WriteLine(fractionalPart); //will print 0.75
While it may not be the theoretically optimal solution, it'll be quite fast, and almost certainly fast enough for your purposes (far better than string manipulation and parsing, which is a common naive approach).
You can use Decimal.GetBits in order to retrieve the bits from a decimal structure.
The MSDN page linked above details how they are laid out in memory:
The binary representation of a Decimal number consists of a 1-bit sign, a 96-bit integer number, and a scaling factor used to divide the integer number and specify what portion of it is a decimal fraction. The scaling factor is implicitly the number 10, raised to an exponent ranging from 0 to 28.
The return value is a four-element array of 32-bit signed integers.
The first, second, and third elements of the returned array contain the low, middle, and high 32 bits of the 96-bit integer number.
The fourth element of the returned array contains the scale factor and sign. It consists of the following parts:
Bits 0 to 15, the lower word, are unused and must be zero.
Bits 16 to 23 must contain an exponent between 0 and 28, which indicates the power of 10 to divide the integer number.
Bits 24 to 30 are unused and must be zero.
Bit 31 contains the sign; 0 meaning positive, and 1 meaning negative.
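As an illustration of that layout (my own sketch, not part of the quoted documentation, with value assumed to be a decimal variable):
// Sketch: pull the sign, scale, and 96-bit integer halves back out of Decimal.GetBits.
int[] parts = Decimal.GetBits(value);
int lo = parts[0];                    // low 32 bits of the 96-bit integer
int mid = parts[1];                   // middle 32 bits
int hi = parts[2];                    // high 32 bits
bool isNegative = parts[3] < 0;       // bit 31 is the sign
int scale = (parts[3] >> 16) & 0xFF;  // bits 16..23: power of 10 to divide by (0..28)
// value == (isNegative ? -1 : 1) * (96-bit integer from lo/mid/hi) / 10^scale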
Going with Oded's detailed info to use GetBits, I came up with this
const int EXP_MASK = 0x00FF0000;
bool hasDecimal = (Decimal.GetBits(value)[3] & EXP_MASK) != 0x0;
I'm reading some data over a socket. The integral data types are no trouble, the System.BitConverter methods are correctly handling the conversion. (So there are no Endian issues to worry about, I think?)
However, BitConverter.ToDouble isn't working for the floating-point parts of the data... the source specification is a bit low-level for me, but it talks about a binary fixed-point representation with a positive byte offset in the more significant direction and a negative byte offset in the less significant direction.
Most of the research I've done has been aimed at C++ or a full fixed-point library handling sines and cosines, which sounds like overkill for this problem. Could someone please help me with a C# function to produce a float from 8 bytes of a byte array with, say, a -3 byte offset?
Further details of format as requested:
The signed numerical value of fixed point data shall be represented using binary, two's-complement notation. For fixed point data, the value of each data parameter shall be defined in relation to the reference byte. The reference byte defines an eight-bit field, with the unit of measure in the LSB position. The value of the LSB of the reference byte is ONE.
Byte offset shall be defined by a signed integer indicating the position of the least significant byte of a data element relative to the reference byte.
The MSB of the data element represents the sign bit. Bit positions between the MSB of the parameter absolute value and the MSB of the most significant byte shall be equal in value to the sign bit.
Floating point data shall be represented as a binary floating point number in conformance with the IEEE ANSI/IEEE Std 754-2008. (This sentence is from a different section which may be a red herring).
Ok, after asking some questions of a local expert on the source material, it turns out CodeInChaos was on the right track: if the value is 8 bytes with a -3 byte offset, then I can use BitConverter.ToInt64 / 256^3; if it is 4 bytes with a -1 byte offset, then BitConverter.ToInt32 / 256 produces the correct answer. I guess that means BitConverter.ToXXX, where XXX is signed, is smart enough to handle the two's-complement calculations!
Thanks to those who tried to help out. I thought it couldn't be too complicated, but getting that 256 offset from the reference document wording was very confusing :-)
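In code, the resolution looks roughly like this (buffer and index are my placeholder names for the received bytes and the field's position):
// Sketch: 8-byte two's-complement fixed-point field with a -3 byte offset.
long raw = BitConverter.ToInt64(buffer, index);   // signed read handles the two's complement
double value = raw / Math.Pow(256, 3);            // the binary point sits 3 bytes above the LSB

// 4-byte field with a -1 byte offset:
double value32 = BitConverter.ToInt32(buffer, index) / 256.0;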
System.BitConverter works very slowly, so if performance is significant to you, I'd recommend converting bytes to int yourself (via logical shifts).
Also, please specify in what exact format floats are sent in your protocol.