I have a large set of numbers, probably in the multiple gigabytes range. First issue is that I can't store all of these in memory. Second is that any attempt at addition of these will result in an overflow. I was thinking of using more of a rolling average, but it needs to be accurate. Any ideas?
These are all floating point numbers.
This is not read from a database; it is a CSV file collected from multiple sources. It has to be accurate, as the values are stored as fractions of a second (e.g., 0.293482888929), and a rolling average can be the difference between .2 and .3.
It is a set of numbers representing how long users took to respond to certain form actions. For example, when showing a message box, how long did it take them to press OK or Cancel? The data was sent to me stored as seconds.fraction-of-a-second; 1.2347 seconds, for example. Converting it to milliseconds, I overflow int, long, etc. rather quickly. Even if I don't convert it, I still overflow rather quickly. I guess the one answer below is correct: maybe I don't have to be 100% accurate, just look within a certain range inside a specific StdDev, and I would be close enough.
You can sample randomly from your set ("population") to get an average ("mean"). The accuracy will be determined by how much your samples vary (as determined by "standard deviation" or variance).
The advantage is that you have billions of observations, and you only have to sample a fraction of them to get a decent accuracy or the "confidence range" of your choice. If the conditions are right, this cuts down the amount of work you will be doing.
Here's a numerical library for C# that includes a random sequence generator. Just make a random sequence of numbers that reference indices in your array of elements (from 1 to x, the number of elements in your array). Dereference to get the values, and then calculate your mean and standard deviation.
If you want to test the distribution of your data, consider using the Chi-Squared Fit test or the K-S test, which you'll find in many spreadsheet and statistical packages (e.g., R). That will help confirm whether this approach is usable or not.
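For illustration only (my own sketch, not the linked library's API): reservoir sampling keeps a bounded random sample while streaming the file once, assuming one value per line.

using System;
using System.Collections.Generic;
using System.IO;

static class SampleMean
{
    // Reservoir-sample k values from a file of one number per line,
    // then report the sample mean and standard deviation.
    public static void Run(string path, int k = 10000)
    {
        var reservoir = new List<double>(k);
        var rng = new Random();
        long seen = 0;
        foreach (var line in File.ReadLines(path))
        {
            double v = double.Parse(line);
            seen++;
            if (reservoir.Count < k)
                reservoir.Add(v);
            else
            {
                long j = (long)(rng.NextDouble() * seen);
                if (j < k) reservoir[(int)j] = v;  // kept with probability k/seen
            }
        }
        double mean = 0;
        foreach (var v in reservoir) mean += v / reservoir.Count;
        double variance = 0;
        foreach (var v in reservoir) variance += (v - mean) * (v - mean) / (reservoir.Count - 1);
        Console.WriteLine("mean = {0}, stddev = {1}", mean, Math.Sqrt(variance));
    }
}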
Integers or floats?
If they're integers, you need to accumulate a frequency distribution by reading the numbers and recording how many of each value you see. That can be averaged easily.
For floating point, this is a bit of a problem. Given the overall range of the floats, and the actual distribution, you have to work out a bin-size that preserves the accuracy you want without preserving all of the numbers.
Edit
First, you need to sample your data to get a mean and a standard deviation. A few thousand points should be good enough.
Then, you need to determine a respectable range. Folks pick things like ±6σ (standard deviations) around the mean. You'll divide this range into as many buckets as you can stand.
In effect, the number of buckets determines the number of significant digits in your average. So, pick 10,000 or 100,000 buckets to get 4 or 5 digits of precision. Since it's a measurement, odds are good that your measurements only have two or three digits.
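A minimal sketch of the bucket idea (my illustration; lo and hi are assumed to be the sampled mean ± 6σ just described):

using System.Collections.Generic;

static double BucketedMean(IEnumerable<double> values, double lo, double hi, int buckets)
{
    var counts = new long[buckets];
    double width = (hi - lo) / buckets;
    long total = 0;
    foreach (double v in values)
    {
        int i = (int)((v - lo) / width);
        if (i < 0) i = 0;                  // clamp outliers into the edge buckets
        if (i >= buckets) i = buckets - 1;
        counts[i]++;
        total++;
    }
    double sum = 0;
    for (int i = 0; i < buckets; i++)
        sum += counts[i] * (lo + (i + 0.5) * width);  // weight each bucket's midpoint by its count
    return sum / total;
}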
Edit
What you'll discover is that the mean of your initial sample is very close to the mean of any other sample. And any sample mean is close to the population mean. You'll note that most (but not all) of your means are within 1 standard deviation of each other.
You should find that your measurement errors and inaccuracies are larger than your standard deviation.
This means that a sample mean is as useful as a population mean.
Wouldn't a rolling average be as accurate as anything else (discounting rounding errors, I mean)? It might be kind of slow because of all the dividing.
You could group batches of numbers and average them recursively. Like average 100 numbers 100 times, then average the result. This would be less thrashing and mostly addition.
In fact, if you added 256 or 512 at once you might be able to bit-shift the result by either 8 or 9 (I believe you could do this in a double by simply adjusting the floating-point exponent). This would make your program extremely quick, and it could be written recursively in just a few lines of code (not counting the unsafe exponent manipulation).
Perhaps dividing by 256 would already use this optimization? I may have to speed test dividing by 255 vs 256 and see if there is some massive improvement. I'm guessing not.
You mean a mean of 32-bit and 64-bit numbers? But why not just use a proper rational bignum library? If you have so much data and you want an exact mean, then just code it.
// Bignum and RationalBignum are the arbitrary-precision types you would
// write yourself (or pull from a library); only the skeleton is shown here.
class RationalBignum
{
    public Bignum Numerator { get; set; }
    public Bignum Denominator { get; set; }
    // constructors, +, /, etc. omitted
}

class BigMeanr
{
    public static void Main(string[] args)
    {
        var sum = new RationalBignum(0);
        var n = new Bignum(0);
        using (var s = new FileStream(args[0], FileMode.Open))
        using (var r = new BinaryReader(s))
        {
            try
            {
                while (true)
                {
                    var flt = r.ReadSingle();
                    var rat = new RationalBignum(flt);
                    sum += rat;
                    n++;
                }
            }
            catch (EndOfStreamException)
            {
                // reached the end of the file
            }
        }
        Console.WriteLine("The mean is: {0}", sum / n);
    }
}
Just remember, there are more numeric types out there than the ones your compiler offers you.
You could break the data into sets of, say, 1000 numbers, average these, and then average the averages.
This is a classic divide-and-conquer type problem.
The key observation is that the average of a large set of numbers is the same as the average of the first half of the set combined with the average of the second half, weighted by each half's count.
In other words:
AVG(A[1..N]) == AVG( AVG(A[1..N/2]), AVG(A[N/2+1..N]) )
Here is a simple, recursive C# solution.
It's passed my tests, and should be correct.
public struct SubAverage
{
    public float Average;
    public int Count;
};

static SubAverage AverageMegaList(List<float> aList)
{
    if (aList.Count <= 500) // Brute-force average 500 numbers or less.
    {
        SubAverage avg;
        avg.Average = 0;
        avg.Count = aList.Count;
        foreach (float f in aList)
        {
            avg.Average += f;
        }
        avg.Average /= avg.Count;
        return avg;
    }
    // For more than 500 numbers, break the list into two sub-lists.
    SubAverage subAvg_A = AverageMegaList(aList.GetRange(0, aList.Count / 2));
    SubAverage subAvg_B = AverageMegaList(aList.GetRange(aList.Count / 2, aList.Count - aList.Count / 2));
    SubAverage finalAnswer;
    finalAnswer.Average = subAvg_A.Average * subAvg_A.Count / aList.Count +
                          subAvg_B.Average * subAvg_B.Count / aList.Count;
    finalAnswer.Count = aList.Count;
    Console.WriteLine("The average of {0} numbers is {1}",
                      finalAnswer.Count, finalAnswer.Average);
    return finalAnswer;
}
The trick is that you're worried about an overflow. In that case, it all comes down to order of execution. The basic formula is like this:
Given:
A = current avg
C = count of items
V = next value in the sequence
The next average (A1) is:
A1 = ((C * A) + V) / (C + 1)
The danger is that over the course of evaluating the sequence, while A should stay relatively manageable, C will become very large.
Eventually C * A will overflow the integer or double types.
One thing we can try is to re-write it like this, to reduce the chance of an overflow:
A1 = C/(C+1) * A/(C+1) + V/(C+1)
In this way, we never multiply C * A and only deal with smaller numbers. But the concern now is the result of the division operations. If C is very large, C/(C+1) (for example) may not be meaningful when constrained to normal floating point representations. The best I can suggest is to use the largest type possible for C here.
Here's one way to do it in pseudocode:
average = first
count = 1
while more:
    count += 1
    diff = next - average
    average += diff / count
return average
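The same pseudocode rendered as C# (a sketch, assuming the values arrive as an IEnumerable<double>):

static double IncrementalMean(IEnumerable<double> values)
{
    double average = 0;
    long count = 0;
    foreach (double next in values)
    {
        count++;
        average += (next - average) / count;  // no large running sum is ever formed
    }
    return average;
}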
Sorry for the late comment, but isn't the formula above, provided by Joel Coehoorn, rewritten wrongly?
I mean, the basic formula is right:
Given:
A = current avg
C = count of items
V = next value in the sequence
The next average (A1) is:
A1 = ( (C * A) + V ) / ( C + 1 )
But instead of:
A1 = C/(C+1) * A/(C+1) + V/(C+1)
shouldn't we have:
A1 = C/(C+1) * A + V/(C+1)
That would explain kastermester's post:
"My math ticks off here - You have C, which you say "go towards infinity" or at least, a really big number, then: C/(C+1) goes towards 1. A /(C+1) goes towards 0. V/(C+1) goes towards 0. All in all: A1 = 1 * 0 + 0 So put shortly A1 goes towards 0 - seems a bit off. – kastermester"
Because we would have A1 = 1 * A + 0, i.e., A1 goes towards A, which is right.
I've been using this method for calculating averages for a long time, and the aforementioned precision problems have never been an issue for me.
With floating point numbers the problem is not overflow, but loss of precision when the accumulated value gets large. Adding a small number to a huge accumulated value will result in losing most of the bits of the small number.
There is a clever solution by the author of the IEEE floating point standard himself, the Kahan summation algorithm, which deals exactly with this kind of problem by checking the error at each step and keeping a running compensation term that prevents losing the small values.
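A sketch of the algorithm in C# (assuming a stream of doubles called values):

static double KahanMean(IEnumerable<double> values)
{
    double sum = 0.0, c = 0.0;  // c carries the compensation for lost low-order bits
    long n = 0;
    foreach (double x in values)
    {
        double y = x - c;       // apply the compensation from the previous step
        double t = sum + y;     // big + small: low-order bits of y may be lost here...
        c = (t - sum) - y;      // ...but this recovers them algebraically
        sum = t;
        n++;
    }
    return sum / n;
}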
If the numbers are ints, accumulate the total in a long. If the numbers are longs... what language are you using? In Java you could accumulate the total in a BigInteger, which is an integer that will grow as large as it needs to be. You could always write your own class to reproduce this functionality. The gist of it is just to make an array of integers to hold each "big number". When you add two numbers, loop through starting with the low-order value. If the result of the addition sets the high order bit, clear this bit and carry the one to the next column.
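In C#, System.Numerics.BigInteger already provides this; a minimal sketch:

using System.Collections.Generic;
using System.Numerics;

static double MeanOfLongs(IEnumerable<long> values)
{
    BigInteger total = BigInteger.Zero;  // grows as large as it needs to; cannot overflow
    long count = 0;
    foreach (long v in values)
    {
        total += v;
        count++;
    }
    // Split into quotient and remainder so the fractional part survives.
    BigInteger q = BigInteger.DivRem(total, count, out BigInteger rem);
    return (double)q + (double)rem / count;
}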
Another option would be to find the average of, say, 1000 numbers at a time. Hold these intermediate results, then when you're done average them all together.
Why is a sum of floating point numbers overflowing? In order for that to happen, you would need to have values near the max float value, which sounds odd.
If you were dealing with integers I'd suggest using a BigInteger, or breaking the set into multiple subsets, recursively averaging the subsets, then averaging the averages.
If you're dealing with floats, it gets a bit weird. A rolling average could become very inaccurate. I suggest using a rolling average which is only updated when you hit an overflow exception or the end of the set. So effectively dividing the set into non-overflowing sets.
Two ideas from me:
If the numbers are ints, use an arbitrary precision library like IntX - this could be too slow, though
If the numbers are floats and you know the total count, you can divide each entry by that number and add up the results. If you use double, the precision should be sufficient.
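A sketch of that second idea (assuming values streams the data and n is the count, known before the pass begins):

static double MeanKnownCount(IEnumerable<double> values, long n)
{
    double mean = 0.0;
    foreach (double x in values)
        mean += x / n;  // divide first: the running total stays near the mean's own magnitude
    return mean;
}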
Why not just scale the numbers (down) before computing the average?
If I were to find the mean of billions of doubles as accurately as possible, I would take the following approach (NOT TESTED):
Find out 'M', an upper bound for log2(nb_of_input_data). If there are billions of data points, 50 is a good candidate (2^50 is more than a million billion). Create an L1 array of M double elements. If you're not sure about M, an extensible list will solve the issue, but it is slower.
Also create an associated L2 boolean array (all cells set to false by default).
For each incoming data D:
int i = 0;
double localMean = D;
while (L2[i])
{
    L2[i] = false;
    localMean = (localMean + L1[i]) / 2;  // merge two equal-weight partial means
    i++;
}
L1[i] = localMean;
L2[i] = true;
And your final mean will be:
double sum = 0;
double totalWeight = 0;
for (int i = 0; i < 50; i++)
{
    if (L2[i])
    {
        long weight = 1L << i;  // 1L: a plain 1 << i overflows int once i reaches 31
        sum += L1[i] * weight;
        totalWeight += weight;
    }
}
return sum / totalWeight;
Notes:
Many proposed solutions in this thread miss the point of lost precision.
Using binary instead of 100-group-or-whatever provides better precision, and doubles can be safely doubled or halved without losing precision!
Try this
Iterate through the numbers incrementing a counter, and adding each number to a total, until adding the next number would result in an overflow, or you run out of numbers.
( It makes no difference if the inputs are integers or floats - use the largest precision float you can and convert each input to that type)
Divide the total by the counter to get a mean (a floating point number), and add it to a temp array.
If you have run out of numbers and there is only one element in temp, that's your result.
Start over using the temp array as input, i.e. iterate recursively until you reach the end condition described earlier.
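A sketch of that reduction (my illustration, substituting a fixed chunk size for the "until it would overflow" test; note the final short chunk is weighted slightly wrong, which a counted variant like the SubAverage answer above fixes):

static double IterativeMean(List<double> data, int chunk = 1000)
{
    while (data.Count > 1)
    {
        var next = new List<double>();
        for (int i = 0; i < data.Count; i += chunk)
        {
            int len = Math.Min(chunk, data.Count - i);
            double sum = 0;
            for (int j = i; j < i + len; j++)
                sum += data[j];
            next.Add(sum / len);  // one partial mean per chunk
        }
        data = next;              // repeat on the means until one value remains
    }
    return data[0];
}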
Depending on the range of numbers, it might be a good idea to have an array where the subscript is your number and the value is the quantity of that number; you could then do your calculation from this.
Related
Let's say an event has probability P of succeeding (0 < P < 1),
and I have to make N tests to see if this happens, and I want the total number of successes:
I could go
int countSuccesses = 0;
while (N-- > 0)
{
    if (Random.NextDouble() < P) countSuccesses++; // NextDouble is from 0.0 to 1.0
}
But is there not a more efficient way to do this? I want a single formula so I can use just ONE drawn random number to determine the total number of successes. (EDIT: The idea of using only one draw was to get below O(n).)
I want to be able to call a method
GetSuccesses( n, P)
and it to be O(1)
UPDATE
I will try to go with the
MathNet.Numerics.Distributions.Binomial.Sample(P, n)
even if it might be using more than one random number, I guess it will be faster than O(n) even if it's not O(1). I'll benchmark that. Big thanks to David and Rici.
UPDATE
The binomial sample above was O(n), so it did not help me. But thanks to a comment by Fred I just switched to
MathNet.Numerics.Distributions.Normal.Sample(mean, stddev)
where
mean = n * P
stddev = Math.Sqrt(n * P * (1 - P));
and now it is O(1) !
Per @rici: for small N you can use the CDF or PMF of the binomial distribution, and simply compare the random input with the probabilities for 0, 1, 2, ..., N successes.
Something like:
static void Main(string[] args)
{
    var trials = 10;
    var trialProbability = 0.25;
    for (double p = 0; p <= 1; p += 0.01)
    {
        var i = GetSuccesses(trials, trialProbability, p);
        Console.WriteLine($"{i} Successes out of {trials} with P={trialProbability} at {p}");
    }
    Console.ReadKey();
}

static int GetSuccesses(int N, double P, double rand)
{
    for (int i = 0; i <= N; i++)
    {
        var p_of_i_successes = MathNet.Numerics.Distributions.Binomial.PMF(P, N, i);
        if (p_of_i_successes >= rand)
            return i;
        rand -= p_of_i_successes;
    }
    return N;
}
I'm not going to write the formula here, as it's already on the wiki, and I don't really know good formatting here for such things.
The probability for each outcome can be determined by the Bernoulli formula:
https://en.wikipedia.org/wiki/Bernoulli_trial
What you need to do is calculate the binomial coefficient; then the probability calculation becomes quite easy: multiply the binomial coefficient by p and q in the appropriate powers. Fill in an array P[0..n] that contains the probability for each outcome - the number of exactly i successes.
After setup, go from 0 up to n and calculate the rolling sum of probabilities.
Check the lower/upper bounds against the random value, and once it's inside the current interval, return the outcome.
So, the deciding part will look like this:
sum = 0;
for (int i = 0; i <= n; i++)
    if (sum - eps < R && sum + P[i] + eps > R)
        return i;
    else
        sum += P[i];
Here eps is a small floating point value to overcome floating point rounding issues, R is the saved random value, and P is the array of probabilities I mentioned before.
Unfortunately, this method is not practical for big N (20 or 100+):
you'll get quite a big impact from rounding errors
the random number generator may not be fine-grained enough to cover every possible outcome with the proper probability distribution
Based on how you've phrased the question, this is impossible to do.
You are essentially asking how to ensure that a single coin flip (i.e. one randomized outcome) is exactly 50% heads and 50% tails, which is impossible.
Even if you were to use two random numbers, where you expect one heads and one tails, this test would fail in 50% of all cases (because you might get two heads or two tails).
Probability is founded on the law of large numbers. This explicitly states that a small sampling cannot be expected to accurately reflect the expected outcome.
The LLN is important because it guarantees stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. It is important to remember that the law only applies (as the name indicates) when a large number of observations is considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others (see the gambler's fallacy).
When I asked this as a comment, you replied:
@Flater No, I am making N actual draws but with only one random number.
But this doesn't make sense. If you only use one random value, and keep using that same value, then every draw is obviously going to give you the exact same outcome (that same number).
The closest I can interpret your question in a way that is not impossible would be that you were mistakenly referring to a single random seed as a single random number.
A random seed (or seed state, or just seed) is a number (or vector) used to initialize a pseudorandom number generator.
For a seed to be used in a pseudorandom number generator, it does not need to be random. Because of the nature of number generating algorithms, so long as the original seed is ignored, the rest of the values that the algorithm generates will follow probability distribution in a pseudorandom manner.
However, your explicitly mentioned expectations seem to disprove that assumption. You want to do something like:
GetSuccesses( n, P, Random.NextDouble())
and you're also expecting to get an O(1) operation, which flies in the face of the law of large numbers.
If you're actually talking about having a single random seed, then your expectations are not correct.
If you make N draws, the operation is still of O(N) complexity. Whether you randomize the seed after every draw or not is irrelevant, it's always O(N).
GetSuccesses( n, P, Random.NextDouble()) is going to give you one draw, not one seed. Regardless of terminology used, your expectation of the code is not related to using the same seed for several draws.
As the question is currently phrased; what you want is impossible. Repeated comments for clarification by several commenter have not yet yielded a clearer picture.
As a sidenote, I find it very weird that you've answered every comment except when directly asked if you are talking about a seed instead of a number (twice now).
@David and @rici pointed me to the
MathNet.Numerics.Distributions.Binomial.Sample(P, n)
A benchmark told me that it is also O(n) and on par with my original
int countSuccesses = 0;
while (N-- > 0)
{
    if (Random.NextDouble() < P) countSuccesses++; // NextDouble is from 0.0 to 1.0
}
But thanks to a comment by Fred:
You could turn that random number into a gaussian sample with mean N*P, which would have the same distribution as your initial function
I just switched to
MathNet.Numerics.Distributions.Normal.Sample(mean, stddev)
where
mean = n * P
stddev = Math.Sqrt(n * P * (1 - P));
and now it is O(1) !
and the function I wanted got to be:
private int GetSuccesses(double p, int n)
{
    double mean = n * p;
    double stddev = Math.Sqrt(n * p * (1 - p));
    double hits = MathNet.Numerics.Distributions.Normal.Sample(Random, mean, stddev);
    return (int)Math.Round(hits, 0);
}
As Paul pointed out, this is an approximation, but one that I gladly accept.
I need to calculate PI with predefined precision using this formula: PI = sqrt(6 * (1/1^2 + 1/2^2 + 1/3^2 + ... + 1/n^2)).
So I ended up with this solution.
private static double CalculatePIWithPrecision(int precision)
{
    if (precision == 0)
    {
        return PI_ZERO_PRECISION;
    }
    double sum = 0;
    double numberOfSumElements = Math.Pow(10, precision + 2);
    for (double i = 1; i < numberOfSumElements; i++)
    {
        sum += 1 / (i * i);
    }
    double pi = Math.Sqrt(sum * 6);
    return pi;
}
So this works correctly, but I faced a problem with efficiency. It's very slow with precision values of 8 and higher.
Is there a better (and faster!) way to calculate PI using that formula?
double numberOfSumElements = Math.Pow(10, precision + 2);
I'm going to talk about this strictly in practical software engineering terms, avoiding getting lost in the formal math. Just practical tips that any software engineer should know.
First observe the complexity of your code; how long it takes to execute is strictly determined by this expression. You've written an exponential algorithm: the value you calculate goes up very rapidly as precision increases. You quote the uncomfortable number yourself: 8 produces 10^10, a loop that makes ten billion calculations. Yes, you noticed this; that's when computers start to take seconds to produce a result, no matter how fast they are.
Exponential algorithms are bad; they perform very poorly. You can only do worse with one that has factorial complexity, O(n!), which grows even faster and is otherwise the complexity of many intractable real-world problems.
Now, is that expression actually accurate? You can do this with an "elbow test", using a practical back-of-the-envelope example. Let's pick a precision of 5 digits as a target and write it out:
1.0000 + 0.2500 + 0.1111 + 0.0625 + 0.0400 + 0.0278 + ... = 1.6433
You can tell that the additions rapidly get smaller, it converges quickly. You can reason out that, once the next number you add gets small enough then it does very little to make the result more accurate. Let's say that when the next number is less than 0.00001 then it's time to stop trying to improve the result.
So you'll stop at 1 / (n * n) = 0.00001 => n * n = 100000 => n = sqrt(100000) => n ~= 316
Your expression says to stop at 10^(5+2) = 10,000,000
You can tell that you are way off, looping entirely too often and not improving the accuracy of the result with the last 9.999 million iterations.
Time to talk about the real problem; too bad you didn't explain how you got to such a drastically wrong algorithm. But surely you discovered when testing your code that it just was not very good at calculating a more precise value for pi, so you figured that by iterating more often, you'd get a better result.
Do note that in this elbow test, it is also very important that you are able to calculate the additions with sufficient precision. I intentionally rounded the numbers, as though they were calculated on a machine capable of performing additions with 5 digits of precision. Whatever you do, the result can never be more precise than 5 digits.
You are using the double type in your code. Directly supported by the processor, it does not have infinite precision. The one and only rule you ever need to keep in mind is that calculations with double are never more precise than 15 digits. Also memorize the rule for float, it is never more precise than 7 digits.
So no matter what value you pass for precision, the result can never be more precise than 15 digits. That is not useful at all; you already have the value of pi accurate to 15 digits: it is Math.PI.
The one thing you need to do to fix this is to use a type that has more precision than double. In fact, it needs to be a type with arbitrary precision, at least as accurate as the precision value you pass. Such a type does not exist in the .NET Framework. Finding a library that provides one is a common question at SO.
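A sketch of the elbow-test stopping rule described above (my illustration, not the poster's code). Note that the tail of the series from term n onward sums to roughly 1/n, so the tolerance must be chosen against that tail, not against the size of a single term:

static double CalculatePi(double tolerance)
{
    double sum = 0;
    // Stop once the next term falls below the tolerance, instead of
    // running a fixed (and wildly oversized) number of iterations.
    for (double i = 1; 1 / (i * i) >= tolerance; i++)
        sum += 1 / (i * i);
    return Math.Sqrt(sum * 6);
}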
I'm working on a simple game and I have the requirement of taking a word or phrase such as "hello world" and converting it to a series of numbers.
The criteria is:
Numbers need to be distinct
Need ability to configure maximum sequence of numbers, i.e. 10 numbers total.
Need ability to configure max range for each number in sequence.
Must be deterministic; that is, we should get the same sequence every time for the same input phrase.
I've tried breaking down the problem like so:
Convert characters to ASCII number code: "hello world" = 104 101 108 108 111 32 119 111 114 108 100
Remove every other number until we satisfy the total count (10 in this case)
For each number, if number > max number then divide by 2 until number <= max number
If any numbers are duplicated, increase or decrease the first occurrence until satisfied. (This could cause a problem, as you could create a duplicate by solving another duplicate.)
Is there a better way of doing this, or am I on the right track? As stated above, I think I may run into issues with preserving distinctness.
If you want to limit the size of the output series, then this is impossible.
Proof:
Assume your output is a series of size k, each element from a range of at most M values; then there are at most M^k possible outputs.
However, there are infinitely many inputs; in particular, there are M^k + 1 different inputs.
From the pigeonhole principle (where the inputs are the pigeons and the outputs are the pigeonholes), there must be 2 pigeons (inputs) in one pigeonhole (output), so the requirement cannot be achieved.
Original answer, provides workaround without limiting the size of the output series:
You can use prime numbers, let p1,p2,... be the series of prime numbers.
Then, convert the string into series of numbers using number[i] = ascii(char[i]) * p_i
The range of each character is obviously then [0,255 * p_i]
Since for each i, j such that i != j we have p_i * x != p_j * y (for each x, y), you get uniqueness. However, this is mainly nice theoretically, as the generated numbers grow quickly, and for a practical implementation you are going to need a big-number library such as Java's BigInteger (the C# equivalent is System.Numerics.BigInteger).
Another possible solution (with the same relaxation of no series limitation) is:
number[i] = ascii(char[i]) + 256*(i-1)
In here the range for number[i] is [256*(i-1),256*i), and elements are still distinct.
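A quick sketch of that second scheme (zero-based here, so position i contributes 256*i):

static int[] Encode(string s)
{
    var result = new int[s.Length];
    for (int i = 0; i < s.Length; i++)
        result[i] = s[i] + 256 * i;  // character i lands in [256*i, 256*(i+1))
    return result;
}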
Mathematically, it is theoretically possible to do what you want, but you won't be able to do it in C#:
If your outputs are required to be distinct, then you cannot lose any information after encoding the string using ASCII values. This means that if you limit your output size to n numbers then the numbers will have to include all information from the encoding.
So for your example
"Hello World" -> 104 101 108 108 111 32 119 111 114 108 100
you would have to preserve the meaning of each of those numbers. The simplest way to do this would be to zero-pad your numbers to three digits and concatenate them together into one large number, making your result 104101108108111032119111114108100 for max numbers = 1.
(You can see where the issue arises: for arbitrary-length input you need very large numbers.) So certainly it is possible to encode any arbitrary-length string input to n numbers, but the numbers will become exceedingly large.
If by "numbers" you meant digits, then no, you cannot have distinct outputs, as @amit explained in his example with the pigeonhole principle.
Let's eliminate your criteria as easily as possible.
For distinct, deterministic, just use a hash code. (Hash actually isn't guaranteed to be distinct, but is highly likely to be):
string s = "hello world";
uint hash = unchecked((uint)s.GetHashCode());
Note that I converted the signed int returned from GetHashCode to unsigned, to avoid the chance of having a '-' appear.
Then, for your max range per number, just convert the base.
That leaves you with the maximum sequence criteria. Without understanding your requirements better, all I can propose is truncate if necessary:
hash.ToString().Substring(0, size)
Truncating leaves a chance that you'll no longer be distinct, but that must be built in as acceptable to your requirements? As amit explains in another answer, you can't have infinite input and non-infinite output.
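Putting the hash and base-conversion steps together (a sketch; the names are mine, and as noted elsewhere in this thread, GetHashCode can differ between runtime versions):

static List<int> PhraseToNumbers(string s, int count, int maxRange)
{
    uint hash = unchecked((uint)s.GetHashCode());
    var digits = new List<int>();
    while (hash > 0 && digits.Count < count)
    {
        digits.Add((int)(hash % (uint)maxRange));  // next base-maxRange "digit"
        hash /= (uint)maxRange;
    }
    return digits;
}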
Ok, so in one comment you've said that this is just to pick lottery numbers. In that case, you could do something like this:
public static List<int> GenNumbers(String input, int count, int maxNum)
{
    List<int> ret = new List<int>();
    Random r = new Random(input.GetHashCode());
    for (int i = 0; i < count; ++i)
    {
        int next = r.Next(maxNum - i);
        foreach (int picked in ret.OrderBy(x => x))
        {
            if (picked <= next)
                ++next;
            else
                break;
        }
        ret.Add(next);
    }
    return ret;
}
The idea is to seed a random number generator with the hash code of the String. The rest of that is just picking numbers without replacement. I'm sure it could be written more efficiently - an alternative is to generate all maxNum numbers and shuffle the first count. Warning, untested.
I know newer versions of the .Net runtime use a random String hash code algorithm (so results will differ between runs), but I believe this is opt-in. Writing your own hash algorithm is an option.
I'm messing around with Fourier transformations. Now I've created a class that does an implementation of the DFT (not doing anything like FFT atm). This is the implementation I've used:
public static Complex[] Dft(double[] data)
{
    int length = data.Length;
    Complex[] result = new Complex[length];
    for (int k = 1; k <= length; k++)
    {
        Complex c = Complex.Zero;
        for (int n = 1; n <= length; n++)
        {
            c += Complex.FromPolarCoordinates(data[n - 1], (-2 * Math.PI * n * k) / length);
        }
        result[k - 1] = 1 / Math.Sqrt(length) * c;
    }
    return result;
}
And these are the results I get from Dft({2,3,4})
Well, it seems pretty okay, since those are the values I expect. There is only one thing I find confusing, and it all has to do with the rounding of doubles.
First of all, why are the first two numbers not exactly the same (0,8660...4438 vs 0,8660...443)? And why can't it calculate a zero where you'd expect one? I know 2.8E-15 is pretty close to zero, but it's not zero.
Does anyone know how these marginal errors occur, and whether I can (and should want to) do something about them?
It might seem that there's no real problem, because the errors are small. However, how do you deal with these rounding errors if you are, for example, comparing two values?
5,2 + 0i != 5,1961524 + i2.828107*10^-15
Cheers
I think you've already explained it to yourself - limited precision means limited precision. End of story.
If you want to clean up the results, you can do some rounding of your own to a more reasonable number of significant digits; then your zeros will show up where you want them.
To answer the question raised by your comment, don't try to compare floating point numbers directly - use a range:
if (Math.Abs(float1 - float2) < 0.001)
{
    // they're the same!
}
The comp.lang.c FAQ has a lot of questions & answers about floating point, which you might be interested in reading.
From http://support.microsoft.com/kb/125056
Emphasis mine.
There are many situations in which precision, rounding, and accuracy in floating-point calculations can work to generate results that are surprising to the programmer. There are four general rules that should be followed:
In a calculation involving both single and double precision, the result will not usually be any more accurate than single precision. If double precision is required, be certain all terms in the calculation, including constants, are specified in double precision.
Never assume that a simple numeric value is accurately represented in the computer. Most floating-point values can't be precisely represented as a finite binary value. For example .1 is .0001100110011... in binary (it repeats forever), so it can't be represented with complete accuracy on a computer using binary arithmetic, which includes all PCs.
Never assume that the result is accurate to the last decimal place. There are always small differences between the "true" answer and what can be calculated with the finite precision of any floating point processing unit.
Never compare two floating-point values to see if they are equal or not equal. This is a corollary to rule 3. There are almost always going to be small differences between numbers that "should" be equal. Instead, always check to see if the numbers are nearly equal; in other words, check to see if the difference between them is very small or insignificant.
Note that although I referenced a Microsoft document, this is not a Windows problem. It's a problem with using binary, and it lies in the CPU itself.
And, as a second side note, I tend to use the Decimal datatype instead of double: See this related SO question: decimal vs double! - Which one should I use and when?
In C# you'll want to use the 'decimal' type, not double for accuracy with decimal points.
As to the 'why': representing fractions in different base systems gives different answers. For example, 1/3 in a base-10 system is 0.33333 recurring, but in a base-3 system is 0.1.
The double is a binary value, in base 2. When converting to base-10 decimal you can expect to have these rounding errors.
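A two-line illustration of the base problem (a sketch):

// 0.1 and 0.2 have no finite base-2 representation, so the stored
// doubles are only close, and the error shows up on comparison:
Console.WriteLine(0.1 + 0.2 == 0.3);           // False
Console.WriteLine((0.1 + 0.2).ToString("R"));  // 0.30000000000000004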
The following code in C# (.Net 3.5 SP1) is an infinite loop on my machine:
for (float i = 0; i < float.MaxValue; i++) ;
It reached the number 16777216.0, and 16777216.0 + 1 evaluates to 16777216.0. Yet at this point: i + 1 != i.
This is some craziness.
I realize there is some inaccuracy in how floating point numbers are stored. And I've read that whole numbers greater than 2^24 cannot be properly stored as a float.
Still, the code above should be valid in C# even if the number cannot be properly represented.
Why does it not work?
You can get the same to happen for double but it takes a very long time. 9007199254740992.0 is the limit for double.
Right, so the issue is that in order to add one to the float, it would have to become
16777217.0
It just so happens that this is at a boundary for the radix and cannot be represented exactly as a float. (The next highest value available is 16777218.0)
So, it rounds to the nearest representable float
16777216.0
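A quick check of the boundary (a sketch using this answer's values):

float f = 16777216.0f;             // 2^24
Console.WriteLine(f + 1.0f == f);  // True: 16777217 is not representable and rounds back down
Console.WriteLine(f + 2.0f == f);  // False: 16777218 is representable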
Let me put it this way:
Since you have a floating amount of precision, you have to increment by a larger and larger step as the numbers grow.
EDIT:
Ok, this is a little bit difficult to explain, but try this:
float f = float.MaxValue;
f -= 1.0f;
Debug.Assert(f == float.MaxValue);
This will run just fine, because at that value, in order to represent a difference of 1.0f, you would need over 128 bits of precision. A float has only 32 bits.
EDIT2
By my calculations, at least 128 binary digits unsigned would be necessary.
log(3.40282347E+38) / log(2) ~= 128
As a solution to your problem, you could loop using 128-bit numbers. However, this will take at least a decade to complete.
Imagine for example that a floating point number is represented by up to 2 significant decimal digits, plus an exponent: in that case, you could count from 0 to 99 exactly. The next would be 100, but because you can only have 2 significant digits that would be stored as "1.0 times 10 to the power of 2". Adding one to that would be ... what?
At best, it would be 101 as an intermediate result, which would actually be stored (via a rounding error which discards the insignificant 3rd digit) as "1.0 times 10 to the power of 2" again.
To understand what's going wrong, you're going to have to read the IEEE standard on floating point.
Let's examine the structure of a floating point number for a second:
A floating point number is broken into two parts (ok, 3, but ignore the sign bit for a second).
You have an exponent and a mantissa, like so:
seeeeeeeemmmmmmmm
Note: that is not accurate to the number of bits or their exact layout, but it gives you a general idea of what's happening.
To figure out what number you have we do the following calculation:
mmmmmm * 2^(eeeeee) * (-1)^s
So what is float.MaxValue going to be? Well you're going to have the largest possible mantissa and the largest possible exponent. Let's pretend this looks something like:
01111111111111111
in actuality we define NAN and +-INF and a couple other conventions, but ignore them for a second because they're not relevant to your question.
So, what happens when you have 9.9999 * 2^99 + 1? Well, you do not have enough significant figures to add 1. As a result, it gets rounded down to the same number. In the case of single floating point precision, the point at which +1 starts to get rounded down happens to be 16777216.0.
It has nothing to do with overflow, or being near the max value. The float value for 16777216.0 has a binary representation of 16777216. You then increment it by 1, so it should be 16777217.0, except that the closest float representation of 16777217.0 is 16777216! So it doesn't actually get incremented, or at least the increment doesn't do what you expect.
Here is a class written by Jon Skeet that illustrates this:
DoubleConverter.cs
Try this code with it:
double d1 = 16777217.0;
Console.WriteLine(DoubleConverter.ToExactString(d1));
float f1 = 16777216.0f;
Console.WriteLine(DoubleConverter.ToExactString(f1));
float f2 = 16777217.0f;
Console.WriteLine(DoubleConverter.ToExactString(f2));
Notice how the internal representation of 16777217.0f is the same as 16777216.0f!
The iteration when i approaches float.MaxValue has i just below this value. The next iteration adds to i, but it can't hold a number bigger than float.MaxValue. Thus it holds a value much smaller, and begins the loop again.