When I initialize a ulong with the value 18446744073709551615 and then add 1 to it and display it to the console, it displays 0, which is totally expected.
I know this question sounds stupid, but I have to ask it: if my computer has a 64-bit architecture CPU, how is my calculator able to work with numbers larger than 18446744073709551615?
I suppose floating point has a lot to do with it.
I would like to know exactly how this happens.
Thank you.
"working with larger numbers than 18446744073709551615"
"if my computer has a 64-bit architecture CPU" --> The architecture bit size is largely irrelevant.
Consider how you are able to add 2 decimal digits whose sum is more than 9. There is a carry generated and then used when adding the next most significant decimal place.
The CPU can do the same but with base 18446744073709551616 instead of base 10. It uses a carry bit as well as a sign and overflow bit to perform extended math.
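As a rough illustration of that carrying idea (a sketch, not how the CPU or any particular library actually does it), here is a 128-bit addition built from two ulongs, where the carry out of the low word feeds the high word:

// Sketch: add two 128-bit unsigned values, each held as a (high, low) pair of ulongs.
// The "digit" here is base 2^64; a wrap-around in the low word means a carry was generated.
static (ulong high, ulong low) Add128((ulong high, ulong low) a, (ulong high, ulong low) b)
{
    ulong low = unchecked(a.low + b.low);   // wraps on overflow, like your ulong example
    ulong carry = low < a.low ? 1UL : 0UL;  // detect the wrap-around
    ulong high = unchecked(a.high + b.high + carry);
    return (high, low);
}

For example, Add128((0UL, ulong.MaxValue), (0UL, 1UL)) yields (1, 0), i.e. 18446744073709551616.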
"I suppose floating point has a lot to do with it."
This has nothing to do with floating point; you say you're using ulong, which means you're using unsigned 64-bit arithmetic. The largest value you can store is therefore "all ones" for 64 bits - aka UInt64.MaxValue, which is the value you've discovered: https://learn.microsoft.com/en-us/dotnet/api/system.uint64.maxvalue
If you want to store arbitrarily large numbers, there are APIs for that - for example, BigInteger. However, arbitrary size comes at a cost, so it isn't the default, and it certainly isn't what you get when you use ulong (or double, or decimal, etc. - all the compiler-level numeric types have a fixed size).
So: consider using BigInteger
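For example, a minimal sketch (in a console app or LINQPad) contrasting the two:

using System;
using System.Numerics; // BigInteger lives here

ulong u = ulong.MaxValue;  // 18446744073709551615
u += 1;                    // wraps around to 0

BigInteger big = ulong.MaxValue;
big += 1;                  // 18446744073709551616 - no wraparound
Console.WriteLine(big);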
Either way, you have a 64-bit processor that is limited to 64-bit math in its registers - your question is a bit hard to explain without taking an explicit example of how this is solved, for instance with BigInteger in the System.Numerics namespace (available in .NET Framework 4.8, for example). The basic idea is to 'decompose' the number into an array representation.
'Decompose' is meant here in the mathematical sense:
"express (a number or function) as a combination of simpler components."
Internally, BigInteger uses an internal array (actually multiple internal constructs) and a helper class called BigIntegerBuilder. It can implicitly convert a UInt64 integer without problem; for even bigger numbers you can use the + operator, for example.
BigInteger bignum = new BigInteger(18446744073709551615);
bignum += 1;
You can read about the implicit operator here:
https://referencesource.microsoft.com/#System.Numerics/System/Numerics/BigInteger.cs
public static BigInteger operator +(BigInteger left, BigInteger right)
{
    left.AssertValid();
    right.AssertValid();

    if (right.IsZero) return left;
    if (left.IsZero) return right;

    int sign1 = +1;
    int sign2 = +1;

    BigIntegerBuilder reg1 = new BigIntegerBuilder(left, ref sign1);
    BigIntegerBuilder reg2 = new BigIntegerBuilder(right, ref sign2);

    if (sign1 == sign2)
        reg1.Add(ref reg2);
    else
        reg1.Sub(ref sign1, ref reg2);

    return reg1.GetInteger(sign1);
}
In the code above from Reference Source you can see that the BigIntegerBuilder is used to add the left and right parts, which are themselves BigInteger constructs.
Interestingly, it keeps its internal state in a private array called "_bits", and that is the answer to your question: BigInteger keeps track of an array of 32-bit integers and is therefore able to handle big integers, even beyond 64 bits.
You can drop this code into a console application or LINQPad (which has the .Dump() method I use here) and inspect it:
// Needs: using System.Numerics; using System.Reflection;
BigInteger bignum = new BigInteger(18446744073709551615);
bignum.GetType()
      .GetField("_bits", BindingFlags.NonPublic | BindingFlags.Instance)
      .GetValue(bignum)
      .Dump(); // .Dump() is LINQPad-specific
A detail about BigInteger is revealed in a comment in its source code on Reference Source:

// For values int.MinValue < n <= int.MaxValue, the value is stored in sign
// and _bits is null. For all other values, sign is +1 or -1 and the bits are in _bits

So for small integer values BigInteger stores the value in the _sign field, while for all other values the _bits field is used.
Obviously, the internal array needs to be converted into a base-10 representation so humans can read it; the ToString() method converts the BigInteger to such a string.
For a deeper understanding, consider using .NET source stepping to step into the code and see how the mathematics is carried out. But for a basic understanding: BigInteger uses an internal representation composed of an array of 32-bit values, which can be transformed into a readable format, and that is what allows numbers bigger than even Int64.
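If you are curious, you can see that split for yourself with a bit of reflection. Note that "_sign" and "_bits" are private implementation details of Microsoft's BigInteger (they could differ in other runtimes), so treat this purely as an exploratory sketch:

using System;
using System.Numerics;
using System.Reflection;

// Reads a private field of a BigInteger by name -- exploratory use only.
static object GetPrivateField(BigInteger value, string name) =>
    typeof(BigInteger)
        .GetField(name, BindingFlags.NonPublic | BindingFlags.Instance)
        .GetValue(value);

BigInteger small = 42;                              // fits in the int range: stored in _sign
BigInteger large = (BigInteger)ulong.MaxValue + 1;  // too big: stored in _bits

Console.WriteLine(GetPrivateField(small, "_bits") == null);  // True
Console.WriteLine(GetPrivateField(large, "_bits") == null);  // False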
I want to change a value of, let's say, type int to type short, rescaling the value itself so it is "normalized" to the range short can store - that is, int.MaxValue would convert into short.MaxValue, and vice versa.
Here's an example using floating-point math to demonstrate:
public static short Rescale(int value)
{
    float normalized = (float)value / int.MaxValue;    // normalize the value to -1.0 .. 1.0
    float rescaled = normalized * (float)short.MaxValue;
    return (short)rescaled;
}
While this works, it seems like using floating-point math is really inefficient and can be improved, since we're dealing with binary data here. I tried using bit-shifting, but to no avail.
Both signed and unsigned values are going to be processed - that isn't really an issue with the floating-point solution, but it makes bit-shifting and other bit manipulation much more difficult.
This code will be used in quite a performance heavy context - it will be called 512 times every ~20 milliseconds, so performance is pretty important here.
How can I do this with bit-manipulation (or plain old integer algebra, if bit manipulation isn't necessary) and avoid floating-point math when we're operating on integer values?
You should use the shift operator. It is very fast.
int is 32 bits, short is 16, so shift right by 16 bits to scale your int down to a short:

int x = 208908324;          // 32 bits vs. 16 bits
short k = (short)(x >> 16);
Just reverse the process for scaling up. Obviously the lower bits will be filled with zeros.
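A minimal sketch of that reverse direction (based on the same idea, nothing more):

short k = 16383;    // some value in the short range
int x = k << 16;    // scale back up to the int range
// The low 16 bits are zero, so the precision lost on the way down is not recovered.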
I have tried BigInteger, decimal, float, and long, but no luck.
[Screenshot of required output example]
It is a fairly easy task to write your own rational class; remember, rationals are just pairs of integers, and you already have BigInteger.
In this series of articles I show how to devise your own big integer and big rational classes starting from absolutely nothing, not even integers. Note that this is not fast and not intended to be fast; it is intended to be educational. You can use the techniques I describe in this series to help you when designing your arithmetic class.
https://ericlippert.com/2013/09/16/math-from-scratch-part-one/
Or, if you don't want to write it yourself, you can always use the one from Microsoft:
http://bcl.codeplex.com/wikipage?title=BigRational&referringTitle=Home
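If you do roll your own, a very rough sketch of the "pair of integers" idea could look like the following (the type and member names here are made up for illustration; a real implementation would reduce by the GCD, normalize signs, and guard against a zero denominator):

using System.Numerics;

// Minimal illustrative rational number: numerator/denominator as BigIntegers.
public readonly struct SimpleRational
{
    public BigInteger Numerator { get; }
    public BigInteger Denominator { get; }

    public SimpleRational(BigInteger numerator, BigInteger denominator)
    {
        Numerator = numerator;
        Denominator = denominator;
    }

    // a/b + c/d = (a*d + c*b) / (b*d)
    public static SimpleRational operator +(SimpleRational x, SimpleRational y) =>
        new SimpleRational(
            x.Numerator * y.Denominator + y.Numerator * x.Denominator,
            x.Denominator * y.Denominator);

    public override string ToString() => $"{Numerator}/{Denominator}";
}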
But that said...
I need a minimum of 128 decimal places to calculate precise probabilities of events between different time steps
Do you need 128 decimal places to represent 128 digits of precision, or of magnitude? Because if it is just magnitude, then simply do a transformation of your probability math into logarithms and do the math in doubles.
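For instance, a sketch of that logarithm transformation: instead of multiplying many tiny per-step probabilities directly (which eventually underflows a double to 0), add their logarithms, which stay at perfectly ordinary magnitudes:

using System;

double stepProbability = 1e-10;
int steps = 100;

double direct = 1.0;
double logSum = 0.0;
for (int i = 0; i < steps; i++)
{
    direct *= stepProbability;             // underflows to 0.0 after enough steps
    logSum += Math.Log10(stepProbability); // stays well within double's range
}

Console.WriteLine(direct);  // 0 -- the double has underflowed
Console.WriteLine(logSum);  // -1000, i.e. the combined probability is 10^-1000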
The easiest way to achieve arbitrary-precision numbers is to combine the BigInteger class from System.Numerics with an int exponent. You could use a BigInteger for the exponent as well, but that is likely overkill, as such numbers would be well beyond meaningful in scale.
So if you create a class along these lines:
// Needs: using System.Numerics; using System.Text;
public class ArbDecimal
{
    BigInteger value;
    int exponent;

    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        string digits = value.ToString();
        int place = 0;

        foreach (char digit in digits)
        {
            if (place++ == digits.Length - exponent)
            {
                sb.Append('.');
            }
            sb.Append(digit);
        }
        return sb.ToString();
    }
}
You should then be able to define your mathematical operations using the laws of indices with the value and exponent fields.
For instance, to achieve addition, you would scale the larger value to have the same exponent as the smaller one by multiplying it by 10^(largerExp-smallerExp) then adding the two values and rescaling.
In your class, the number 0.01 would be represented like:
value = 1
exponent = -2
Due to the fact that 1*10^-2 = 0.01.
Utilising this method, you can store arbitrarily precise (and large) numbers, limited only by the available RAM and the .NET Framework's object size limit.
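As a hedged sketch of the addition rule described above (this method is illustrative only and would live inside the ArbDecimal class, using the value * 10^exponent convention):

// Sketch: addition for ArbDecimal, assuming value * 10^exponent
// (so 0.01 is value = 1, exponent = -2). Not production-ready.
public static ArbDecimal Add(ArbDecimal a, ArbDecimal b)
{
    // Put the operand with the larger exponent in 'hi'.
    ArbDecimal hi = a.exponent >= b.exponent ? a : b;
    ArbDecimal lo = a.exponent >= b.exponent ? b : a;

    // Rescale hi to lo's exponent:
    // hi.value * 10^hi.exponent == (hi.value * 10^(hi.exponent - lo.exponent)) * 10^lo.exponent
    BigInteger rescaled = hi.value * BigInteger.Pow(10, hi.exponent - lo.exponent);

    return new ArbDecimal { value = rescaled + lo.value, exponent = lo.exponent };
}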
I am maintaining a C# desktop application on Windows 7, using Visual Studio 2013. Somewhere in the code there is the following line, which tries to create a 0.01 decimal value using the Decimal(Int32[]) constructor:
decimal d = new decimal(new int[] { 1, 0, 0, 131072 });
First question: is it different from the following?
decimal d = 0.01M;
If it is not different, why did the developer go through the trouble of coding it like that?
I need to change this line in order to create dynamic values. Something like:
decimal d = (decimal) (1 / Math.Pow(10, digitNumber));
Am I going to cause some unwanted behavior this way?
It seems useful to me when the source of the decimal consists of raw bits.
The decimal used in .NET is based on a sequence of bit fields (not just one stream of bits like an int), so it can be useful to construct a decimal from bits when you communicate with other systems that return a decimal as a blob of bytes (over a socket, from a piece of memory, etc.).
It makes it easy to convert such a set of bits to a decimal - no need for fancy conversion code. Also, you can construct a decimal from the inputs defined in the standard, which makes it convenient for testing the .NET Framework too.
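For example, a sketch of rebuilding a decimal from a 16-byte blob. The byte layout here (four little-endian ints in lo/mid/hi/flags order) is an assumed convention you would have to agree on with the other system:

using System;

static decimal DecimalFromBytes(byte[] blob)
{
    int[] bits =
    {
        BitConverter.ToInt32(blob, 0),   // low 32 bits of the 96-bit integer
        BitConverter.ToInt32(blob, 4),   // middle 32 bits
        BitConverter.ToInt32(blob, 8),   // high 32 bits
        BitConverter.ToInt32(blob, 12),  // sign and scale
    };
    return new decimal(bits);
}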
The decimal(int[] bits) constructor allows you to give a bitwise definition of the decimal you're creating; bits must be a 4-element int array where:
bits[0], bits[1], and bits[2] make up the 96-bit integer number, and
bits[3] contains the scale factor and sign.
It just allows you to get really precise with the definition of the decimal; judging from your example, I don't think you need that level of precision.
See the documentation for more detail on using that constructor, or for the other constructors that may be more appropriate for you.
To answer your question more specifically: if digitNumber is the exponent (scale), then decimal d = new decimal(new int[] { 1, 0, 0, digitNumber << 16 }); does what you want, since the exponent goes in bits 16-23 of the last int in the array.
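You can verify that layout by going the other way with decimal.GetBits, which returns the same four-int representation:

int[] bits = decimal.GetBits(0.01m);
Console.WriteLine(string.Join(", ", bits));
// 1, 0, 0, 131072  -- and 131072 == 2 << 16, i.e. a scale of 2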
The definition in the XML documentation is:
//
// Summary:
// Initializes a new instance of System.Decimal to a decimal value represented
// in binary and contained in a specified array.
//
// Parameters:
// bits:
// An array of 32-bit signed integers containing a representation of a decimal
// value.
//
// Exceptions:
// System.ArgumentNullException:
// bits is null.
//
// System.ArgumentException:
// The length of the bits is not 4.-or- The representation of the decimal value
// in bits is not valid.
So for some unknown reason the original developer wanted to initialize his decimal this way. Maybe he just wanted to confuse someone in the future.
It can't possibly affect your code if you change this to
decimal d = 0.01m;
because
(new decimal(new int[] { 1, 0, 0, 131072})) == 0.01m
You should know exactly how a decimal is stored in memory.
You can use this method to generate the desired value:
public static decimal Base10FractionGenerator(int digits)
{
    if (digits < 0 || digits > 28)
        throw new ArgumentException($"'{nameof(digits)}' must be between 0 and 28");

    return new decimal(new[] { 1, 0, 0, digits << 16 });
}
Use it like
Console.WriteLine(Base10FractionGenerator(0));
Console.WriteLine(Base10FractionGenerator(2));
Console.WriteLine(Base10FractionGenerator(5));
Here is the result
1
0.01
0.00001
The particular constructor you're talking about generates a decimal from four 32-bit values. Unfortunately, newer versions of the Common Language Infrastructure (CLI) leave its exact format unspecified (presumably to allow implementations to support different decimal formats) and now merely guarantee a minimum precision and range for decimal numbers. Earlier versions of the CLI did define that format exactly as Microsoft's implementation does, so it has probably been kept that way in Microsoft's implementation for backward compatibility. Still, it is not ruled out that other implementations of the CLI will interpret the four 32-bit values of the Decimal constructor differently.
Decimals are exact numerics; you can use == or != to test for equality.
Perhaps this line of code comes from some other place where it made sense at some particular point in time.
I'd clean it up.
I recently came across the definition of denormalized numbers, and I understand that there are some numbers that cannot be represented in normalized form because they are too small to fit into the corresponding type, according to IEEE 754.
So what I am trying to do is catch when a denormalized number is passed as a parameter, to avoid calculations with such numbers. If I understand correctly, I just need to look for numbers within the denormalized range:
private bool IsDenormalizedNumber(float number)
{
    return Math.Pow(2, -149) <= number && number <= ((2 - Math.Pow(2, -23)) * Math.Pow(2, -127)) ||
           Math.Pow(-2, -149) <= number && number <= -((2 - Math.Pow(2, -23)) * Math.Pow(2, -127));
}
Is my interpretation correct?
I think a better approach would be to inspect the bits. Normalized or denormalized is a characteristic of the binary representation, not of the value itself. Therefore, you will be able to detect it more reliably this way, and you can do so without any potentially dangerous floating-point comparisons.
I put together some runnable code for you, so that you can see it work. I adapted this code from a similar question regarding doubles. Detecting the denormal is much simpler than fully excising the exponent and significand, so I was able to simplify the code greatly.
As for why it works: the exponent is stored in offset notation. The 8 bits of the exponent can take the values 1 to 254 (0 and 255 are reserved for special cases); they are then offset-adjusted by -127, yielding the normalized range of -126 (1-127) to 127 (254-127). The exponent is set to 0 in the denormal case. I think this is only required because .NET does not store the leading bit of the significand. According to IEEE 754, it can be stored either way. It appears that C# has opted to drop it in favor of a sign bit, though I don't have any concrete details to back that observation.
In any case, the actual code is quite simple. All that is required is to excise the 8 bits storing the exponent and test for 0. There is a special case around 0, which is handled below.
NOTE: Per the comment discussion, this code relies on platform specific implementation details (x86_64 in this test case). As #ChiuneSugihara pointed out, the CLI does not ensure this behavior and it may differ on other platforms, such as ARM.
using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("-120, denormal? " + IsDenormal((float)Math.Pow(2, -120)));
            Console.WriteLine("-126, denormal? " + IsDenormal((float)Math.Pow(2, -126)));
            Console.WriteLine("-127, denormal? " + IsDenormal((float)Math.Pow(2, -127)));
            Console.WriteLine("-149, denormal? " + IsDenormal((float)Math.Pow(2, -149)));
            Console.ReadKey();
        }

        public static bool IsDenormal(float f)
        {
            // When f is 0, the exponent will also be 0 and would break
            // the rest of this algorithm, so we check for that first.
            if (f == 0f)
            {
                return false;
            }

            // Get the bits
            byte[] buffer = BitConverter.GetBytes(f);
            int bits = BitConverter.ToInt32(buffer, 0);

            // Extract the exponent: 8 bits in the upper registers,
            // above the 23-bit significand.
            int exponent = (bits >> 23) & 0xff;

            // Check and see if anything is there!
            return exponent == 0;
        }
    }
}
The output is:
-120, denormal? False
-126, denormal? False
-127, denormal? True
-149, denormal? True
Sources:
extracting mantissa and exponent from double in c#
https://en.wikipedia.org/wiki/IEEE_floating_point
https://en.wikipedia.org/wiki/Denormal_number
http://csharpindepth.com/Articles/General/FloatingPoint.aspx
Code adapted from:
extracting mantissa and exponent from double in c#
From my understanding, denormalized numbers are there to help with underflow in some cases (see the answer to Denormalized Numbers - IEEE 754 Floating Point).
So to get a denormalized number you would need to explicitly create one or else cause an underflow. In the first case, it seems unlikely that a literal denormalized number would be specified in code, and even if someone tried, I am not sure that .NET would allow it. In the second case, as long as you are in a checked context you should get an OverflowException thrown for any overflow or underflow in an arithmetic computation, so that would guard against the possibility of getting a denormalized number. In an unchecked context I am not sure whether an underflow will take you to a denormalized number, but you can try it and see if you want to run your calculations unchecked.
Long story short: you need not worry about it if you are running in a checked context, and you can try an underflow and see what happens if you want to run unchecked.
EDIT
I wanted to update my answer since a comment didn't feel substantial enough. First off, I have struck out the comment I made about the checked context, since that only applies to non-floating-point calculations (like int) and not to float or double. That was my mistake.
The issue with denormalized numbers is that their handling is not consistent across the CLI. Notice how I am saying "CLI" and not "C#", because we need to go lower-level than just C# to understand the issue. From The Common Language Infrastructure Annotated Standard, Partition I, Section 12.1.3, the second note (page 125 of the book) states:
This standard does not specify the behavior of arithmetic operations on denormalized floating point numbers, nor does it specify when or whether such representations should be created. This is in keeping with IEC 60559:1989. In addition, this standard does not specify how to access the exact bit pattern of NaNs that are created, nor the behavior when converting a NaN between 32-bit and 64-bit representation. All of this behavior is deliberately left implementation specific.
So at the CLI level the handling of denormalized numbers is deliberately left implementation-specific. Furthermore, if you look at the documentation for float.Epsilon, which is the smallest positive number representable by a float, you will see that on most machines it is a denormalized number matching what is listed in the documentation (approximately 1.4e-45). This is most likely what #Kevin Burdett was seeing in his answer. That said, if you scroll down farther on the page you will see the following quote under "Platform Notes":
On ARM systems, the value of the Epsilon constant is too small to be detected, so it equates to zero. You can define an alternative epsilon value that equals 1.175494351E-38 instead.
So there are portability issues that can come into play when you manually handle denormalized numbers, even just for the .NET CLR (which is one implementation of the CLI). In fact, this ARM-specific value is kind of interesting, since it appears to be a normalized number (I used the function from #Kevin Burdett with IsDenormal(1.175494351E-38f) and it returned false). In the CLI proper the concerns are more severe, since there is no standardization of their handling, by design, according to the annotation on the CLI standard. That leaves open questions about what would happen with the same code on Mono or Xamarin, for instance, which are different implementations of the CLI from the .NET CLR.
In the end I am right back to my previous advice: just don't worry about denormalized numbers. They are there to silently help you, and it is hard to imagine why you would need to specifically single them out. Also, as #HansPassant mentioned, you most likely won't even encounter one anyway. It is just hard to imagine how you would end up below the smallest positive normalized number in double, which is absurdly small.
How can I convert a double value to a binary value?
I have a value like 125252525235558554452221545332224587265 that I want to convert to binary format (1s and 0s), so I am keeping it in a double and then trying to convert it to binary. I am using C#.NET.
Well, you haven't specified a platform or what sort of binary value you're interested in, but in .NET there's BitConverter.DoubleToInt64Bits which lets you get at the IEEE 754 bits making up the value very easily.
In Java there's Double.doubleToLongBits which does the same thing.
Note that if you have a value such as "125252525235558554452221545332224587265" then you've got more information than a double can store accurately in the first place.
In C, you can do it for instance this way, which is a classic use of the union construct:
int i;
union {
    double x;
    unsigned char byte[sizeof (double)];
} converter;

converter.x = 5.5555555555556e18;
for (i = 0; i < sizeof converter.byte; i++)
    printf("%02x ", converter.byte[i]);
If you stick this in a main() and run it, it might print something like this:
~/src> gcc -o floatbits floatbits.c
~/src> ./floatbits
ba b5 f6 15 53 46 d3 43
Note though that this, of course, is platform-dependent in its endianness. The above is from a Linux system running on a Sempron CPU, i.e. it's little endian.
A decade late but hopefully this will help someone:
// Converts a double value to a string in base 2 for display.
// Example: 123.5 --> "0:10000000101:1110111000000000000000000000000000000000000000000000"
// Created by Ryan S. White in 2020, Released under the MIT license.
string DoubleToBinaryString(double val)
{
    long v = BitConverter.DoubleToInt64Bits(val);
    string binary = Convert.ToString(v, 2);
    return binary.PadLeft(64, '0').Insert(12, ":").Insert(1, ":");
}
If you mean you want to do it yourself, then this is not a programming question.
If you want to make a computer do it, the easiest way is to use a floating point input routine and then display the result in its hex form. In C++:
double f = atof("5.5555555555556E18");
unsigned char *b = (unsigned char *) &f;

for (int j = 0; j < 8; ++j)
    printf(" %02x", b[j]);
A double value already IS a binary value. It is just a matter of the representation that you wish it to have. In a programming language, when you call it a double, the language will interpret it one way; if you happen to call the same chunk of memory an int, then it is not the same number.
So it depends on what you really want. If you need to write it to disk or send it over the network, then you need to think about big-endian/little-endian byte order.
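For instance, a sketch of getting at the raw bytes and checking this machine's byte order before writing them out (how you normalize the order is up to your protocol):

using System;

double value = 5.5555555555556e18;

// Raw IEEE 754 bytes of the double, in this machine's native byte order.
byte[] bytes = BitConverter.GetBytes(value);

Console.WriteLine(BitConverter.IsLittleEndian);   // byte order of this machine
Console.WriteLine(BitConverter.ToString(bytes));  // e.g. "BA-B5-F6-15-53-46-D3-43" on little-endian

// If the other side expects big-endian, reverse before sending.
if (BitConverter.IsLittleEndian)
    Array.Reverse(bytes);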
For these huge numbers (which cannot be represented accurately using a double) you need to use some specialized class to hold the information.
C# provides the Decimal class:
The Decimal value type represents decimal numbers ranging from positive 79,228,162,514,264,337,593,543,950,335 to negative 79,228,162,514,264,337,593,543,950,335. The Decimal value type is appropriate for financial calculations requiring large numbers of significant integral and fractional digits and no round-off errors. The Decimal type does not eliminate the need for rounding. Rather, it minimizes errors due to rounding. For example, the following code produces a result of 0.9999999999999999999999999999 rather than 1.
If you need bigger precision than this, you will need to make your own class, I guess. There is one here for big integers: http://sourceforge.net/projects/cpp-bigint/ although it seems to be for C++.