Short version: how can I detect overflow using the fixed-point multiplication described here but for a signed type?
Long version:
I still have some overflow issues with my Q31.32 fixed point type. To make it easier to work out examples on paper, I've made a much smaller type using the same algorithm, a Q3.4 based on sbyte. I figure that if I can work out all the kinks for a Q3.4 type, the same logic should apply for a Q31.32 one.
Note that I could very easily implement Q3.4 multiplication by performing it on a 16-bit integer, but I'm pretending that option doesn't exist, because for the Q31.32 type I'd need a 128-bit integer, which doesn't exist (and BigInteger is too slow).
I want my multiplication to handle overflow by saturation: when overflow happens, the result is the largest or smallest representable value, depending on the signs of the operands.
This is basically how the type is represented:
struct Fix8 {
sbyte m_rawValue;
public static readonly Fix8 One = new Fix8(1 << 4);
public static readonly Fix8 MinValue = new Fix8(sbyte.MinValue);
public static readonly Fix8 MaxValue = new Fix8(sbyte.MaxValue);
Fix8(sbyte value) {
m_rawValue = value;
}
public static explicit operator decimal(Fix8 value) {
return (decimal)value.m_rawValue / One.m_rawValue;
}
public static explicit operator Fix8(decimal value) {
var nearestExact = Math.Round(value * 16m) * 0.0625m;
return new Fix8((sbyte)(nearestExact * One.m_rawValue));
}
}
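For reference, here is how a few values map onto this Q3.4 representation, using the conversion operators above (just an illustration; these lines are not part of the type):
Fix8 a = (Fix8)3.5m;                        // stored as raw value 56 (0x38), i.e. 3.5 * 16
Console.WriteLine((decimal)a);              // 3.5
Console.WriteLine((decimal)Fix8.MaxValue);  // 7.9375  (127 / 16)
Console.WriteLine((decimal)Fix8.MinValue);  // -8      (-128 / 16)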
And this is how I currently handle multiplication:
public static Fix8 operator *(Fix8 x, Fix8 y) {
sbyte xl = x.m_rawValue;
sbyte yl = y.m_rawValue;
// split x and y into their highest and lowest 4 bits
byte xlo = (byte)(xl & 0x0F);
sbyte xhi = (sbyte)(xl >> 4);
byte ylo = (byte)(yl & 0x0F);
sbyte yhi = (sbyte)(yl >> 4);
// perform cross-multiplications
byte lolo = (byte)(xlo * ylo);
sbyte lohi = (sbyte)((sbyte)xlo * yhi);
sbyte hilo = (sbyte)(xhi * (sbyte)ylo);
sbyte hihi = (sbyte)(xhi * yhi);
// shift results as appropriate
byte loResult = (byte)(lolo >> 4);
sbyte midResult1 = lohi;
sbyte midResult2 = hilo;
sbyte hiResult = (sbyte)(hihi << 4);
// add everything
sbyte sum = (sbyte)((sbyte)loResult + midResult1 + midResult2 + hiResult);
// if the top 4 bits of hihi (unused in the result) are neither all 0s nor all 1s,
// then this means the result overflowed.
sbyte topCarry = (sbyte)(hihi >> 4);
bool opSignsEqual = ((xl ^ yl) & sbyte.MinValue) == 0;
if (topCarry != 0 && topCarry != -1) {
return opSignsEqual ? MaxValue : MinValue;
}
// if signs of operands are equal and sign of result is negative,
// then multiplication overflowed upwards
// the reverse is also true
if (opSignsEqual) {
if (sum < 0) {
return MaxValue;
}
}
else {
if (sum > 0) {
return MinValue;
}
}
return new Fix8(sum);
}
This gives results accurate to within the precision of the type and handles most overflow cases. It doesn't, however, handle cases like these:
Failed -8 * 2 : expected -8 but got 0
Failed 3.5 * 5 : expected 7,9375 but got 1,5
Let's work out how the multiplication happens for the first one.
-8 and 2 are represented as x = 0x80 and y = 0x20.
xlo = 0x80 & 0x0F = 0x00
xhi = 0x80 >> 4 = 0xf8
ylo = 0x20 & 0x0F = 0x00
yhi = 0x20 >> 4 = 0x02
lolo = xlo * ylo = 0x00
lohi = xlo * yhi = 0x00
hilo = xhi * ylo = 0x00
hihi = xhi * yhi = 0xf0
The sum is obviously 0 as all terms are 0 save for hihi, but only the lowest 4 bits of hihi are used in the final sum.
My usual overflow detection magic doesn't work here: the result is zero so the sign of the result is meaningless (e.g. 0.0625 * -0.0625 == 0 (by rounding down), 0 is positive yet signs of operands differ); also the high bits of hihi are 1111 which often happens even when there's no overflow.
Basically I don't know how to detect that overflow happened here. Is there a more general method?
You should examine hihi to see whether it contains any relevant bits outside the range of the result. You can also compare the highest bit of the result with the corresponding bit in hihi to see whether a carry propagated that far, and if it did (i.e. the bit changed), whether that indicates an overflow (i.e. the bit changed in the wrong direction). All of this would probably be easier to formulate if you were using one's complement notation and treated the sign bits separately. But in that case, your example of −8 would be pointless.
Looking at your example, you have hihi = 0xf0.
hihi 11110000
result ±###.####
So in this case, if there were no overflow in hihi alone, then the first 5 bits would all be the same, and the sign of the result would match the sign of hihi. This is not the case here. You can check this using
if ((hihi & 0x08) * 0x1f != (hihi & 0xf8))
handle_overflow();
The carry into hihi can probably be detected most easily by adding the result one summand at a time and performing common overflow detection after each step. Haven't got a nice piece of code for that ready.
This took me a long time, but I eventually figured everything out. This code is tested to work for every possible combination of x and y in the range allowed by sbyte. Here is the commented code:
static sbyte AddOverflowHelper(sbyte x, sbyte y, ref bool overflow) {
var sum = (sbyte)(x + y);
// x + y overflows if sign(x) ^ sign(y) != sign(sum)
overflow |= ((x ^ y ^ sum) & sbyte.MinValue) != 0;
return sum;
}
/// <summary>
/// Multiplies two Fix8 numbers.
/// Deals with overflow by saturation.
/// </summary>
public static Fix8 operator *(Fix8 x, Fix8 y) {
// Using the cross-multiplication algorithm, for learning purposes.
// It would be both trivial and much faster to use an Int16, but this technique
// won't work for a Fix64, since there's no Int128 or equivalent (and BigInteger is too slow).
sbyte xl = x.m_rawValue;
sbyte yl = y.m_rawValue;
byte xlo = (byte)(xl & 0x0F);
sbyte xhi = (sbyte)(xl >> 4);
byte ylo = (byte)(yl & 0x0F);
sbyte yhi = (sbyte)(yl >> 4);
byte lolo = (byte)(xlo * ylo);
sbyte lohi = (sbyte)((sbyte)xlo * yhi);
sbyte hilo = (sbyte)(xhi * (sbyte)ylo);
sbyte hihi = (sbyte)(xhi * yhi);
byte loResult = (byte)(lolo >> 4);
sbyte midResult1 = lohi;
sbyte midResult2 = hilo;
sbyte hiResult = (sbyte)(hihi << 4);
bool overflow = false;
// Check for overflow at each step of the sum, if it happens overflow will be true
sbyte sum = AddOverflowHelper((sbyte)loResult, midResult1, ref overflow);
sum = AddOverflowHelper(sum, midResult2, ref overflow);
sum = AddOverflowHelper(sum, hiResult, ref overflow);
bool opSignsEqual = ((xl ^ yl) & sbyte.MinValue) == 0;
// if signs of operands are equal and sign of result is negative,
// then multiplication overflowed positively
// the reverse is also true
if (opSignsEqual) {
if (sum < 0 || (overflow && xl > 0)) {
return MaxValue;
}
}
else {
if (sum > 0) {
return MinValue;
}
// If signs differ, both operands' magnitudes are greater than 1,
// and the result is greater than the negative operand, then there was negative overflow.
sbyte posOp, negOp;
if (xl > yl) {
posOp = xl;
negOp = yl;
}
else {
posOp = yl;
negOp = xl;
}
if (sum > negOp && negOp < -(1 << 4) && posOp > (1 << 4)) {
return MinValue;
}
}
// if the top 4 bits of hihi (unused in the result) are neither all 0s nor 1s,
// then this means the result overflowed.
sbyte topCarry = (sbyte)(hihi >> 4);
// -17 (-1.0625) is a problematic value which never causes overflow but messes up the carry bits
if (topCarry != 0 && topCarry != -1 && xl != -17 && yl != -17) {
return opSignsEqual ? MaxValue : MinValue;
}
// Round up if necessary, but don't overflow
var lowCarry = (byte)(lolo << 4);
if (lowCarry >= 0x80 && sum < sbyte.MaxValue) {
++sum;
}
return new Fix8(sum);
}
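For reference, here is a minimal sketch of the kind of exhaustive check described above. It is not the actual test suite, and it assumes the raw Fix8(sbyte) constructor is accessible to the test code; it compares every product against the exact decimal product clamped to the representable range, allowing one Q3.4 LSB (0.0625) of rounding difference:
static void CheckAllProducts() {
    for (int i = sbyte.MinValue; i <= sbyte.MaxValue; i++) {
        for (int j = sbyte.MinValue; j <= sbyte.MaxValue; j++) {
            var x = new Fix8((sbyte)i);
            var y = new Fix8((sbyte)j);
            // reference result: exact decimal product, saturated to the Fix8 range
            decimal expected = (decimal)x * (decimal)y;
            expected = Math.Max((decimal)Fix8.MinValue, Math.Min((decimal)Fix8.MaxValue, expected));
            decimal actual = (decimal)(x * y);
            if (Math.Abs(expected - actual) > 0.0625m)
                Console.WriteLine("Failed {0} * {1} : expected {2} but got {3}", (decimal)x, (decimal)y, expected, actual);
        }
    }
}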
I'm putting all this together into a properly unit tested fixed-point math library for .NET, which will be available here: https://github.com/asik/FixedMath.Net
Related
I've found a method which implements the Adler32 algorithm in C# and I would like to use it, but I do not understand part of the code:
Can someone explain to me:
1) why bit operators are used when sum1 and sum2 are initialized
2) why sum2 is shifted?
Adler32 on wiki https://en.wikipedia.org/wiki/Adler-32
& operator explanation:
(Binary AND Operator copies a bit to the result if it exists in both operands)
private bool MakeForBuffer(byte[] bytesBuff, uint adlerCheckSum)
{
if (Object.Equals(bytesBuff, null))
{
checksumValue = 0;
return false;
}
int nSize = bytesBuff.GetLength(0);
if (nSize == 0)
{
checksumValue = 0;
return false;
}
uint sum1 = adlerCheckSum & 0xFFFF; // 1) why bit operator is used?
uint sum2 = (adlerCheckSum >> 16) & 0xFFFF; // 2) why bit operator is used? , why is it shifted?
for (int i = 0; i < nSize; i++)
{
sum1 = (sum1 + bytesBuff[i]) % adlerBase;
sum2 = (sum1 + sum2) % adlerBase;
}
checksumValue = (sum2 << 16) + sum1;
return true;
}
1) why bit operator is used?
& 0xFFFF sets the two high bytes of the checksum to 0, so sum1 is simply the lower 16 bits of the checksum.
2) why bit operator is used? , why is it shifted?
adlerCheckSum >> 16 shifts the 16 high bits down into the low 16 bits; & 0xFFFF does the same as in the first step - it sets the 16 high bits to 0.
Example
adlerChecksum = 0x12345678
adlerChecksum & 0xFFFF = 0x00005678
adlerChecksum >> 16 = 0x????1234
(it should be 0x00001234 in C# but other languages / compilers "wrap the bits around" and you would get 0x56781234)
(adlerChecksum >> 16) & 0xFFFF = 0x00001234 now you can be sure it's 0x1234, this step is just a precaution that's probably unnecessary in C#.
adlerChecksum = 0x12345678
sum1 = 0x00005678
sum2 = 0x00001234
Those two operations combined simply split the UInt32 checksum into two UInt16.
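In code, the split and the later recombination (checksumValue = (sum2 << 16) + sum1) look like this, using the values from the example above:
uint adlerChecksum = 0x12345678;
uint sum1 = adlerChecksum & 0xFFFF;          // 0x5678 - low 16 bits
uint sum2 = (adlerChecksum >> 16) & 0xFFFF;  // 0x1234 - high 16 bits
uint recombined = (sum2 << 16) + sum1;       // 0x12345678 again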
From the adler32 Tag-Wiki:
Adler-32 is a fast checksum algorithm used in zlib to verify the results of decompression. It is composed of two sums modulo 65521. Start with s1 = 1 and s2 = 0, then for each byte x, s1 = s1 + x, s2 = s2 + s1. The two sums are combined into a 32-bit value with s1 in the low 16 bits and s2 in the high 16 bits.
Community,
Assume we have a random integer which is in the range Int32.MinValue - Int32.MaxValue.
I'd like to find two numbers which result in this integer when calculated together using the right shift operator.
An example :
If the input value is 123456 two possible output values could be 2022703104 and 14, because 2022703104 >> 14 == 123456
Here is my attempt:
private static int[] DetermineShr(int input)
{
int[] arr = new int[2];
if (input == 0)
{
arr[0] = 0;
arr[1] = 0;
return arr;
}
int a = (int)Math.Log(int.MaxValue / Math.Abs(input), 2);
int b = (int)(input * Math.Pow(2, a));
arr[0] = a;
arr[1] = b;
return arr;
}
However, for some negative values it doesn't work; the output won't result in a correct calculation.
And for very small input values such as -2147483648 it's throwing an exception:
How can I modify my function so it will produce a valid output for all input values between Int32.MinValue and Int32.MaxValue ?
Well, let's compare
123456 == 11110001001000000
2022703104 == 1111000100100000000000000000000
can you see the pattern? If you're given shift (14 in your case) the answer is
(123456 << shift) + any number in the [0 .. 2**shift - 1] range
however, on large values left shift can result in integer overflow; if shift is small (less than 32) I suggest using long:
private static long Factor(int source, int shift) {
unchecked {
// (uint): we want bits, not two complement
long value = (uint) source;
return value << shift;
}
}
Test:
int a = -1;
long b = Factor(-1, 3);
Console.WriteLine(a);
Console.WriteLine(Convert.ToString(a, 2));
Console.WriteLine(b);
Console.WriteLine(Convert.ToString(b, 2));
will return
-1
11111111111111111111111111111111
34359738360
11111111111111111111111111111111000
Please notice that negative integers, being stored as two's complement (https://en.wikipedia.org/wiki/Two%27s_complement), are in fact quite large when treated as unsigned integers.
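For completeness (this is not part of the answer above), here is a minimal sketch of a DetermineShr that covers the whole Int32 range: do the left shift in 64 bits and keep the largest shift whose result still fits in an int. As in the original attempt, arr[0] is the shift count and arr[1] the shifted value:
private static int[] DetermineShr(int input)
{
    for (int shift = 30; shift >= 0; shift--)
    {
        long shifted = (long)input << shift;
        if (shifted >= int.MinValue && shifted <= int.MaxValue)
            return new[] { shift, (int)shifted };  // ((int)shifted) >> shift == input
    }
    return new[] { 0, input };                     // always valid: input >> 0 == input
}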
I want to compare a stream of bits of arbitrary length to a mask in c# and return a ratio of how many bits were the same.
The mask to check against is anywhere between 2 bits long to 8k (with 90% of the masks being 5 bits long), the input can be anywhere between 2 bits up to ~ 500k, with an average input string of 12k (but yeah, most of the time it will be comparing 5 bits with the first 5 bits of that 12k)
Now my naive implementation would be something like this:
bool[] mask = new[] { true, true, false, true };
float dendrite(bool[] input) {
int correct = 0;
for ( int i = 0; i<mask.Length; i++ ) {
if ( input[i] == mask[i] )
correct++;
}
return (float)correct/(float)mask.Length;
}
but I expect this is better handled (more efficient) with some kind of binary operator magic?
Anyone got any pointers?
EDIT: the datatype is not fixed at this point in my design, so if ints or bytearrays work better, I'd also be a happy camper, trying to optimize for efficiency here, the faster the computation, the better.
eg if you can make it work like this:
int[] mask = new[] { 1, 1, 0, 1 };
float dendrite(int[] input) {
int correct = 0;
for ( int i = 0; i<mask.Length; i++ ) {
if ( input[i] == mask[i] )
correct++;
}
return (float)correct/(float)mask.Length;
}
or this:
int mask = 13; //1101
float dendrite(int input) {
return // your magic here;
} // would return 0.75 for an input
// of 101 given ( 1100101 in binary,
// matches 3 bits of the 4 bit mask == .75
ANSWER:
I ran each proposed answer against the others; Fredou's and Marten's solutions ran neck and neck, but Fredou submitted the fastest, leanest implementation in the end. Of course, since the average result varies quite wildly between implementations, I might have to revisit this post later on. :) But that's probably just me messing up in my test script. (I hope; too late now, going to bed =)
sparse1.Cyclone
1317ms 3467107ticks 10000iterations
result: 0,7851563
sparse1.Marten
288ms 759362ticks 10000iterations
result: 0,05066964
sparse1.Fredou
216ms 568747ticks 10000iterations
result: 0,8925781
sparse1.Marten
296ms 778862ticks 10000iterations
result: 0,05066964
sparse1.Fredou
216ms 568601ticks 10000iterations
result: 0,8925781
sparse1.Marten
300ms 789901ticks 10000iterations
result: 0,05066964
sparse1.Cyclone
1314ms 3457988ticks 10000iterations
result: 0,7851563
sparse1.Fredou
207ms 546606ticks 10000iterations
result: 0,8925781
sparse1.Marten
298ms 786352ticks 10000iterations
result: 0,05066964
sparse1.Cyclone
1301ms 3422611ticks 10000iterations
result: 0,7851563
sparse1.Marten
292ms 769850ticks 10000iterations
result: 0,05066964
sparse1.Cyclone
1305ms 3433320ticks 10000iterations
result: 0,7851563
sparse1.Fredou
209ms 551178ticks 10000iterations
result: 0,8925781
( testscript copied here, if i destroyed yours modifying it lemme know. https://dotnetfiddle.net/h9nFSa )
how about this one - dotnetfiddle example
using System;
namespace ConsoleApplication1
{
public class Program
{
public static void Main(string[] args)
{
int a = Convert.ToInt32("0001101", 2);
int b = Convert.ToInt32("1100101", 2);
Console.WriteLine(dendrite(a, 4, b));
}
private static float dendrite(int mask, int len, int input)
{
// mask off the 'len' low bits of the input (uint.MaxValue so the top bit of that range is kept)
return 1 - getBitCount(mask ^ (input & (int)(uint.MaxValue >> (32 - len)))) / (float)len;
}
private static int getBitCount(int bits)
{
// standard SWAR popcount
bits = bits - ((bits >> 1) & 0x55555555);
bits = (bits & 0x33333333) + ((bits >> 2) & 0x33333333);
return ((bits + (bits >> 4) & 0xf0f0f0f) * 0x1010101) >> 24;
}
}
}
64-bit one here - dotnetfiddle
using System;
namespace ConsoleApplication1
{
public class Program
{
public static void Main(string[] args)
{
// 1
ulong a = Convert.ToUInt64("0000000000000000000000000000000000000000000000000000000000001101", 2);
ulong b = Convert.ToUInt64("1110010101100101011001010110110101100101011001010110010101100101", 2);
Console.WriteLine(dendrite(a, 4, b));
}
private static float dendrite(ulong mask, int len, ulong input)
{
return 1 - getBitCount(mask ^ (input & (ulong.MaxValue >> (64 - len)))) / (float)len;
}
private static ulong getBitCount(ulong bits)
{
bits = bits - ((bits >> 1) & 0x5555555555555555UL);
bits = (bits & 0x3333333333333333UL) + ((bits >> 2) & 0x3333333333333333UL);
return unchecked(((bits + (bits >> 4)) & 0xF0F0F0F0F0F0F0FUL) * 0x101010101010101UL) >> 56;
}
}
}
I came up with this code:
static float dendrite(ulong input, ulong mask)
{
// get bits that are same (0 or 1) in input and mask
ulong samebits = mask & ~(input ^ mask);
// count number of same bits
int correct = cardinality(samebits);
// count number of bits in mask
int inmask = cardinality(mask);
// compute fraction (0.0 to 1.0)
return inmask == 0 ? 0f : correct / (float)inmask;
}
// this is a little hack to count the number of bits set to one in a 64-bit word
static int cardinality(ulong word)
{
const ulong mult = 0x0101010101010101;
const ulong mask1h = (~0UL) / 3 << 1;
const ulong mask2l = (~0UL) / 5;
const ulong mask4l = (~0UL) / 17;
word -= (mask1h & word) >> 1;
word = (word & mask2l) + ((word >> 2) & mask2l);
word += word >> 4;
word &= mask4l;
return (int)((word * mult) >> 56);
}
This will check 64-bits at a time. If you need more than that you can just split the input data into 64-bit words and compare them one by one and compute the average result.
Here's a .NET fiddle with the code and a working test case:
https://dotnetfiddle.net/5hYFtE
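For inputs longer than 64 bits, here is a rough sketch (not part of the fiddle above) of the word-by-word approach just described, reusing the cardinality helper; it totals the matching bits and mask bits across all words, which gives the overall ratio directly:
// assumes input and mask have the same length in 64-bit words
static float dendriteMany(ulong[] input, ulong[] mask)
{
    int correct = 0;
    int inmask = 0;
    for (int i = 0; i < mask.Length; i++)
    {
        // same bit trick as above, applied per word
        ulong samebits = mask[i] & ~(input[i] ^ mask[i]);
        correct += cardinality(samebits);
        inmask += cardinality(mask[i]);
    }
    return inmask == 0 ? 0f : correct / (float)inmask;
}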
I would change the code to something along these lines:
// hardcoded bitmask
byte mask = 255;
float dendrite(byte input) {
int correct = 0;
// store the xor:ed result
byte xored = (byte)(input ^ mask);
// loop through each bit
for(int i = 0; i < 8; i++) {
// if the bit is 0 then it was correct
if((xored & (1 << i)) == 0)
correct++;
}
return (float)correct/8f; // 8 bits checked
}
The above uses a mask and input of 8 bits, but of course you could modify this to use a 4 byte integer and so on.
Not sure if this will work as expected, but it might give you some clues on how to proceed.
For example if you only would like to check the first 4 bits you could change the code to something like:
float dendrite(byte input) {
// hardcoded bitmask i.e 1101
byte mask = 13;
// number of bits to check
byte bits = 4;
int correct = 0;
// store the xor:ed result
byte xored = (byte)(input ^ mask);
// loop through each bit, notice that we are only checking the first 4 bits
for(int i = 0; i < bits; i++) {
// if the bit is 0 then it was correct
if((xored & (1 << i)) == 0)
correct++;
}
return (float)correct/(float)bits;
}
Of course it might be faster to actually use an int instead of a byte.
I need to be able to convert from a Delphi Real48 to C# double.
I've got the bytes I need to convert but am looking for an elegant solution to the problem.
Anybody out there had to do this before?
I'm needing to do the conversion in C#
Thanks in advance
I've done some hunting around and found some C++ code to do the job; I converted it and it seems to be giving the right answer... damned if I understand it all, though :S
private static double Real48ToDouble(byte[] real48)
{
if (real48[0] == 0)
return 0.0; // Null exponent = 0
double exponent = real48[0] - 129.0;
double mantissa = 0.0;
for (int i = 1; i < 5; i++) // loop through bytes 1-4
{
mantissa += real48[i];
mantissa *= 0.00390625; // mantissa /= 256
}
mantissa += (real48[5] & 0x7F);
mantissa *= 0.0078125; // mantissa /= 128
mantissa += 1.0;
if ((real48[5] & 0x80) == 0x80) // Sign bit check
mantissa = -mantissa;
return mantissa * Math.Pow(2.0, exponent);
}
If somebody can explain it that would be great :D
static double GetDoubleFromBytes(byte[] bytes)
{
var real48 = new long[6];
real48[0] = bytes[0];
real48[1] = bytes[1];
real48[2] = bytes[2];
real48[3] = bytes[3];
real48[4] = bytes[4];
real48[5] = bytes[5];
long sign = (real48[0] & 0x80) >> 7;
long significand =
((real48[0] % 0x80) << 32) +
(real48[1] << 24) +
(real48[2] << 16) +
(real48[3] << 8) +
(real48[4]);
long exponent = bytes[5];
if (exponent == 0)
{
return 0.0;
}
exponent += 894;
long bits = (sign << 63) + (exponent << 52) + (significand << 13);
return BitConverter.Int64BitsToDouble(bits);
}
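As a quick sanity check of both methods: 0x7E is the exponent byte for 0.125 (126 - 129 = -3, with an all-zero mantissa and sign bit; compare the confirmation values further down). Note that Real48ToDouble above reads the exponent from the first byte while GetDoubleFromBytes reads it from the last, so the two expect opposite byte orders:
Console.WriteLine(Real48ToDouble(new byte[] { 0x7E, 0x00, 0x00, 0x00, 0x00, 0x00 }));     // 0.125
Console.WriteLine(GetDoubleFromBytes(new byte[] { 0x00, 0x00, 0x00, 0x00, 0x00, 0x7E }));  // 0.125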
I appreciate this is an old post, but the following may also be useful for those looking to do this in T-SQL (which I was).
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ifn_HexReal48ToFloat]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
drop function [dbo].[ifn_HexReal48ToFloat]
go
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create function [dbo].[ifn_HexReal48ToFloat]
(
@strRawHexBinary char(12), -- NOTE. Do not include the leading 0x
@bitReverseBytes bit
)
RETURNS FLOAT
AS
BEGIN
-- Reverse bytes if required
-- e.g. 3FF4 0000 0000 is stored as
-- 0000 0000 F43F
declare @strNewValue varchar(12)
if @bitReverseBytes = 1
begin
set @strNewValue=''
declare @intCounter int
set @intCounter = 6
while @intCounter>=0
begin
set @strNewValue = @strNewValue + substring(@strRawHexBinary, (@intCounter * 2) + 1,2)
set @intCounter = @intCounter - 1
end
end
-- Convert the raw string into a binary
declare @binBinaryFloat binary(6)
set @binBinaryFloat = convert(binary(6),'0x' + isnull(@strNewValue, @strRawHexBinary),1)
-- Based on original hex to float conversion at http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=81849
-- and storage format documented at
-- http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/devcommon/internaldataformats_xml.html
-- Where, counting from the left
-- Sign = bit 1
-- Exponent = bits 41 - 48 with a bias of 129
-- Fraction = bits 2 - 40
return
SIGN
(
CAST(@binBinaryFloat AS BIGINT)
)
*
-- Fraction part. 39 bits. From left 2 - 40.
(
1.0 +
(CAST(@binBinaryFloat AS BIGINT) & 0x7FFFFFFFFF00) * POWER(CAST(2 AS FLOAT), -47)
)
*
-- Exponent part. 8 bits. From left bits 41 -48
POWER
(
CAST(2 AS FLOAT),
(
CAST(@binBinaryFloat AS BIGINT) & 0xff
- 129
)
)
end
Confirmation
0.125 is 0x 0000 0000 007E (or 0x 7E00 0000 0000 reversed)
select dbo.ifn_HexReal48ToFloat('00000000007E', 0)
select dbo.ifn_HexReal48ToFloat('7E0000000000', 1)
The input is a char(12) as I had to extract the binary from the middle of 2 other larger binary fields and shunt them together, so I had it as char(12) already. It is easy enough to change to a binary(6) input if you don't need to do any manipulation beforehand.
As an aside, in the scenario I'm implementing this in, the T-SQL variant is outperformed by C# CLR code, so the C# code above may be better. Not every environment allows CLR code in SQL Server, but if yours does then maybe you should use it. For more background, an article at http://www.simple-talk.com/sql/t-sql-programming/clr-performance-testing/ does some in-depth measurement which shows some dramatic differences between T-SQL and CLR.
I have been testing this and have found an error (as others have noticed) with negative values. Here is my tested version of the code. I tested this with 120,530 different random values ranging from 11,400,000.00 to -2,000,000.00
//This seems to be the layout of the Real48 bits where
//E = Exponent
//S = Sign bit
//F = Fraction
//EEEEEEEE FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF SFFFFFFF
//12345678 12345678 12345678 12345678 12345678 12345678
Double exponentbase = 129d; // The exponent is offest by 129
Double exponent = real48[0] - exponentbase; // deduct the offest.
// Calculate the mantissa
Double mantissa = 0.0;
Double value = 1.0;
// For Each Byte.
for (int iByte = 5; iByte >= 1; iByte--)
{
int startbit = 7;
if (iByte == 5)
{ startbit = 6; } //skip the sign bit.
//For Each Bit
for (int iBit = startbit; iBit >= 0; iBit--)
{
value = value / 2;// Each bit is worth half the next bit but we're going backwards.
if (((real48[iByte] >> iBit) & 1) == 1) //if this bit is set.
{
mantissa += value; // add the value.
}
}
}
if (mantissa == 1.0 && real48[0] == 0) // Test for null value
return 0.0;
double result;
result = (1 + mantissa) * Math.Pow(2.0, exponent);
if ((real48[5] & 0x80) == 0x80) // Sign bit check
result = -result;
return result;
I've changed the code you've posted into a more readable format so you can see how it works:
Double exponentbase = 129d;
Double exponent = real48[0] - exponentbase; // The exponent is offest so deduct the base.
// Now Calculate the mantissa
Double mantissa = 0.0;
Double value = 1.0;
// For Each Byte.
for (int i = 5; i >= 1; i--)
{
int startbit = 7;
if (i == 5)
{ startbit = 6; } //skip the sign bit.
//For Each Bit
for (int j = startbit; j >= 0; j--)
{
value = value / 2;// Each bit is worth half the next bit but we're going backwards.
if (((real48[i] >> j) & 1) == 1) //if this bit is set.
{
mantissa += value; // add the value.
}
}
}
if (mantissa == 1.0 && real48[0] == 0) // Test for null value
return 0.0;
if ((real48[5] & 0x80) == 1) // Sign bit check
mantissa = -mantissa;
return (1 + mantissa) * Math.Pow(2.0, exponent);
I want to convert an int to a byte[2] array using BCD.
The int in question will come from DateTime representing the Year and must be converted to two bytes.
Is there any pre-made function that does this or can you give me a simple way of doing this?
example:
int year = 2010
would output:
byte[2]{0x20, 0x10};
static byte[] Year2Bcd(int year) {
if (year < 0 || year > 9999) throw new ArgumentException();
int bcd = 0;
for (int digit = 0; digit < 4; ++digit) {
int nibble = year % 10;
bcd |= nibble << (digit * 4);
year /= 10;
}
return new byte[] { (byte)((bcd >> 8) & 0xff), (byte)(bcd & 0xff) };
}
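(As a quick check, Year2Bcd(2010) returns { 0x20, 0x10 }, matching the example in the question.)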
Beware that you asked for a big-endian result, that's a bit unusual.
Use this method.
public static byte[] ToBcd(int value){
if(value<0 || value>99999999)
throw new ArgumentOutOfRangeException("value");
byte[] ret=new byte[4];
for(int i=0;i<4;i++){
ret[i]=(byte)(value%10);
value/=10;
ret[i]|=(byte)((value%10)<<4);
value/=10;
}
return ret;
}
This is essentially how it works.
If the value is less than 0 or greater than 99999999, the value won't fit in four bytes. More formally, if the value is less than 0 or is 10^(n*2) or greater, where n is the number of bytes, the value won't fit in n bytes.
For each byte:
Set that byte to the remainder of the value divided by 10. (This will place the last digit in the low nibble [half-byte] of the current byte.)
Divide the value by 10.
Add 16 times the remainder of the value-divided-by-10 to the byte. (This will place the now-last digit in the high nibble of the current byte.)
Divide the value by 10.
(One optimization is to set every byte to 0 beforehand -- which is implicitly done by .NET when it allocates a new array -- and to stop iterating when the value reaches 0. This latter optimization is not done in the code above, for simplicity. Also, if available, some compilers or assemblers offer a divide/remainder routine that allows retrieving the quotient and remainder in one division step, an optimization which is not usually necessary though.)
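For example, ToBcd(2010) returns { 0x10, 0x20, 0x00, 0x00 }: ones and tens in the first byte, hundreds and thousands in the second, i.e. the opposite byte order from the { 0x20, 0x10 } asked for in the question.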
Here's a terrible brute-force version. I'm sure there's a better way than this, but it ought to work anyway.
int digitOne = year / 1000;
int digitTwo = (year - digitOne * 1000) / 100;
int digitThree = (year - digitOne * 1000 - digitTwo * 100) / 10;
int digitFour = year - digitOne * 1000 - digitTwo * 100 - digitThree * 10;
byte[] bcdYear = new byte[] { (byte)(digitOne << 4 | digitTwo), (byte)(digitThree << 4 | digitFour) };
The sad part about it is that fast binary to BCD conversions are built into the x86 microprocessor architecture, if you could get at them!
Here is a slightly cleaner version than Jeffrey's:
static byte[] IntToBCD(int input)
{
if (input > 9999 || input < 0)
throw new ArgumentOutOfRangeException("input");
int thousands = input / 1000;
int hundreds = (input -= thousands * 1000) / 100;
int tens = (input -= hundreds * 100) / 10;
int ones = (input -= tens * 10);
byte[] bcd = new byte[] {
(byte)(thousands << 4 | hundreds),
(byte)(tens << 4 | ones)
};
return bcd;
}
maybe a simple parse function containing this loop
i=0;
while (id>0)
{
twodigits=id%100; //need 2 digits per byte
arr[i]=twodigits%10 + twodigits/10*16; //first digit on first 4 bits second digit shifted with 4 bits
id/=100;
i++;
}
More common solution
private IEnumerable<Byte> GetBytes(Decimal value)
{
Byte currentByte = 0;
Boolean odd = true;
while (value > 0)
{
if (odd)
currentByte = 0;
Decimal rest = value % 10;
value = (value-rest)/10;
currentByte |= (Byte)(odd ? (Byte)rest : (Byte)((Byte)rest << 4));
if(!odd)
yield return currentByte;
odd = !odd;
}
if(!odd)
yield return currentByte;
}
Same version as Peter O. but in VB.NET
Public Shared Function ToBcd(ByVal pValue As Integer) As Byte()
If pValue < 0 OrElse pValue > 99999999 Then Throw New ArgumentOutOfRangeException("value")
Dim ret As Byte() = New Byte(3) {} 'All bytes are init with 0's
For i As Integer = 0 To 3
ret(i) = CByte(pValue Mod 10)
pValue = Math.Floor(pValue / 10.0)
ret(i) = ret(i) Or CByte((pValue Mod 10) << 4)
pValue = Math.Floor(pValue / 10.0)
If pValue = 0 Then Exit For
Next
Return ret
End Function
The trick here is to be aware that simply using pValue /= 10 will round the value so if for instance the argument is "16", the first part of the byte will be correct, but the result of the division will be 2 (as 1.6 will be rounded up). Therefore I use the Math.Floor method.
I made a generic routine posted at IntToByteArray that you could use like:
var yearInBytes = ConvertBigIntToBcd(2010, 2);
static byte[] IntToBCD(int input) {
byte[] bcd = new byte[] {
(byte)(input >> 8),
(byte)(input & 0x00FF)
};
return bcd;
}