Dividing two ulong integers outputs wrong result - C#

I am writing a program for Catalan numbers. Here is the formula for that:

Catalan(n) = (2n)! / ((n + 1)! * n!)

I decided to use the middle part of the formula, because the other parts are too abstract for my knowledge (maybe I slept too much in math classes).
My program works fine for n = 0, n = 5, and n = 10. But if I enter n = 15, here comes the boom: the output is 2 when it should be 9694845.
So here is my child:
using System;

namespace _8_Numbers_of_Catalan
{
    class CatalanNumbers
    {
        static void Main()
        {
            Console.Write("n: ");
            int n = int.Parse(Console.ReadLine());
            Console.WriteLine("Catalan({0})", n);

            // Calculating the Catalan number from the formula
            // Catalan(n) = [(2*n)!] / [(n+1)! * n!]
            Console.WriteLine((factorial(2 * n)) / (factorial(n + 1) * factorial(n)));
        }

        // Finding the factorial
        private static ulong factorial(int n)
        {
            ulong fact = 1;
            for (int i = 1; i <= n; i++)
            {
                fact *= (ulong)i;
            }
            return fact;
        }
    }
}
Thank you in advance for understanding me if there is something obviously wrong. I am new to programming.

That is because you are performing the calculation using integer variables that can hold at most 64 bits.
Your call to factorial(15 * 2) computes 30!, which would result in a value of
265,252,859,812,191,058,636,308,480,000,000
far more than fits in a 64-bit integer variable, whose maximum is
18,446,744,073,709,551,615 (0xFFFFFFFFFFFFFFFF).
The options you have are to use a System.Numerics.BigInteger type (slow) or a double (up to a maximum value of 1.7976931348623157E+308), which means you will lose some precision; that may or may not be relevant.
Another option you have is to approximate the value of large factorials with an asymptotic formula such as Stirling's approximation, or to speed up the exact computation with a fast multiplication algorithm such as Schönhage-Strassen, which Mathematica uses.
You may also want to check out some existing online resources for calculation of big factorials in .NET
As a last but not least option (and I have not thoroughly checked), it seems likely to me that specific algorithms exist that allow you to calculate (or approximate to sufficient accuracy and precision) a Catalan number directly.
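One such approach (a sketch of mine, not from the answer above) uses the recurrence C(0) = 1 and C(n+1) = C(n) * 2(2n + 1) / (n + 2), which never computes a factorial; the product 2(2n + 1)C(n) is always divisible by n + 2, so plain ulong arithmetic stays exact well beyond n = 15:

using System;

class CatalanByRecurrence
{
    static void Main()
    {
        ulong c = 1;                            // C(0)
        for (ulong n = 0; n < 15; n++)
        {
            c = c * 2 * (2 * n + 1) / (n + 2);  // exact: the numerator is divisible by n + 2
        }
        Console.WriteLine(c);                   // 9694845, i.e. C(15)
    }
}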

You should use a System.Numerics.BigInteger for this (add a reference to System.Numerics in your project).
private static BigInteger factorial(int n)
{
    BigInteger fact = 1;
    for (int i = 1; i <= n; i++)
    {
        fact *= i;
    }
    return fact;
}
// output: 9694845
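With factorial returning BigInteger (and a using System.Numerics; directive), the rest of the program is unchanged; the division in Main is now exact BigInteger division, since a Catalan number always divides evenly:

Console.WriteLine(factorial(2 * n) / (factorial(n + 1) * factorial(n))); // 9694845 for n = 15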

Related

Creating a power function in C# without using Math.Pow(x, y) [duplicate]

System.Numerics.BigInteger lets you multiply large integers together, but is there anything of the same type for floating point numbers? If not, is there a free library I can use?
//this but with floats
System.Numerics.BigInteger maxint = new BigInteger(int.MaxValue);
System.Numerics.BigInteger big = maxint * maxint * maxint;
System.Console.WriteLine(big);
Perhaps you're looking for BigRational? Microsoft released it under their BCL project on CodePlex. I'm not actually sure how, or if, it will fit your needs.
It keeps the value as a rational number. You can get a string with the decimal value either by casting or some multiplication.
var r = new BigRational(5000, 3768);
Console.WriteLine((decimal)r);
Console.WriteLine((double)r);
Or with a simple(ish) extension method like this:
// Needs the BigRational type (from the BCL BigRational library), plus
// System.Numerics (BigInteger) and System.Text (StringBuilder).
public static class BigRationalExtensions
{
    public static string ToDecimalString(this BigRational r, int precision)
    {
        var fraction = r.GetFractionPart();

        // Case where the rational number is a whole number
        if (fraction.Numerator == 0 && fraction.Denominator == 1)
        {
            return r.GetWholePart() + ".0";
        }

        var adjustedNumerator = (fraction.Numerator * BigInteger.Pow(10, precision));
        var decimalPlaces = adjustedNumerator / fraction.Denominator;

        // Case where precision wasn't large enough.
        if (decimalPlaces == 0)
        {
            return "0.0";
        }

        // Give it the capacity for around what we should need for
        // the whole part and total precision
        // (this is kinda sloppy, but does the trick)
        var sb = new StringBuilder(precision + r.ToString().Length);

        bool noMoreTrailingZeros = false;
        for (int i = precision; i > 0; i--)
        {
            if (!noMoreTrailingZeros)
            {
                if ((decimalPlaces % 10) == 0)
                {
                    decimalPlaces = decimalPlaces / 10;
                    continue;
                }
                noMoreTrailingZeros = true;
            }

            // Add the right-most decimal digit to the string
            sb.Insert(0, decimalPlaces % 10);
            decimalPlaces = decimalPlaces / 10;
        }

        // Insert the whole part and the decimal point
        sb.Insert(0, ".");
        sb.Insert(0, r.GetWholePart());

        return sb.ToString();
    }
}
If it's out of the precision range of a decimal or double, they will be cast to their respective types with a value of 0.0. Also, casting to decimal, when the result is outside of its range, will cause an OverflowException to be thrown.
The extension method I wrote (which may not be the best way to calculate a fraction's decimal representation) will accurately convert it to a string, with unlimited precision. However, if the number is smaller than the precision requested, it will return 0.0, just like decimal or double would.
Consider what the implications would be if there were a BigFloat type.
BigFloat x = 1.0;
BigFloat y = 3.0;
BigFloat z = x / y;
The answer would be 0.333333333333333333333333333333333333333333333333333333 recurring. Forever. Infinite. Out of Memory Error.
It is easy to construct an infinite BigFloat.
However, if you are happy to stick to rational numbers (those expressed by dividing one integer by another), then you can use BigInteger to build a BigRational type that can represent any rational number with arbitrary precision.
BigRational x = 1;
BigRational y = 3;
BigRational z = x / y;
This works and keeps the result as an exact rational.
You can just search NuGet for BigRational and you'll find many implementations, including one from Microsoft.

'Grokkable' algorithm to understand exponentiation where the exponent is floating point

To clarify first:
2^3 = 8. That's equivalent to 2*2*2. Easy.
2^4 = 16. That's equivalent to 2*2*2*2. Also easy.
2^3.5 = 11.313708... Er, that's not so easy to grok.
What I want is a simple algorithm which most clearly shows how 2^3.5 = 11.313708. It should preferably not use any functions apart from the basic addition, subtraction, multiplication, and division operators.
The code certainly doesn't have to be fast, nor does it necessarily need to be short (though that would help). Don't worry, it can be approximate to a given user-specified accuracy (which should also be part of the algorithm). I'm hoping there will be a binary chop/search type thing going on, as that's pretty simple to grok.
So far I've found this, but the top answer is far from simple to understand on a conceptual level.
The more answers the merrier, so I can try to understand different ways of attacking the problem.
My language preference for the answer would be C#/C/C++/Java, or pseudocode for all I care.
Ok, let's implement pow(x, y) using only binary searches, addition and multiplication.
Driving y below 1
First, take this out of the way:
pow(x, y) == pow(x*x, y/2)
pow(x, y) == 1/pow(x, -y)
This is important to handle negative exponents and drive y below 1, where things start getting interesting. This reduces the problem to finding pow(x, y) where 0<y<1.
Implementing sqrt
In this answer I assume you know how to perform sqrt. I know sqrt(x) = x^(1/2), but it is easy to implement using just a binary search for the y satisfying y*y = x, e.g.:
#include <cmath>   // std::fabs (plain abs would truncate doubles in C)

#define EPS 1e-8

double sqrt2(double x) {
    double a = 0, b = x > 1 ? x : 1;
    while (std::fabs(a - b) > EPS) {
        double y = (a + b) / 2;
        if (y * y > x) b = y; else a = y;
    }
    return a;
}
Finding the answer
The rationale is that every number below 1 can be approximated as a sum of fractions of the form 1/2^k:
0.875 = 1/2 + 1/4 + 1/8
0.333333... = 1/4 + 1/16 + 1/64 + 1/256 + ...
If you find those fractions, you actually find that:
x^0.875 = x^(1/2+1/4+1/8) = x^(1/2) * x^(1/4) * x^(1/8)
That ultimately leads to
sqrt(x) * sqrt(sqrt(x)) * sqrt(sqrt(sqrt(x)))
So, the implementation (in C++):

#include <cmath>

#define EPS 1e-8

double pow2(double x, double y) {
    if (x < 0 && std::fabs(std::round(y) - y) < EPS) {
        // Negative base with a (near-)integer exponent: factor out the sign.
        // Testing % 2 != 0 also catches negative odd exponents, where the
        // original % 2 == 1 test picked the wrong sign.
        return pow2(-x, y) * ((long long)std::round(y) % 2 != 0 ? -1 : 1);
    } else if (y < 0) {
        return 1 / pow2(x, -y);
    } else if (y > 1) {
        return pow2(x * x, y / 2);
    } else {
        double fraction = 1;
        double result = 1;
        while (y > EPS) {
            if (y >= fraction) {
                y -= fraction;
                result *= x;
            }
            fraction /= 2;
            x = sqrt2(x);
        }
        return result;
    }
}
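As a quick check of the sketch above: pow2(2, 3.5) reduces to pow2(4, 1.75), then to pow2(16, 0.875), and the fraction loop then multiplies 16^(1/2) * 16^(1/4) * 16^(1/8) ≈ 11.3137085, which is indeed 2^3.5.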
Deriving ideas from the other excellent posts, I came up with my own implementation. The answer is based on the idea that base^(exponent*accuracy) = answer^accuracy. Given that we know the base, exponent and accuracy variables beforehand, we can perform a search (binary chop or whatever) so that the equation can be balanced by finding the answer. We want the exponent on both sides of the equation to be an integer (otherwise we're back to square one), so we can make accuracy any size we like, and then round it to the nearest integer afterwards.
I've given two ways of doing it. The first is very slow, and will often produce extremely high numbers which won't work with most languages. On the other hand, it doesn't use log, and is simpler conceptually.
public double powSimple(double a, double b)
{
    int accuracy = 10;
    bool negExponent = b < 0;
    b = Math.Abs(b);
    bool ansMoreThanA = (a > 1 && b > 1) || (a < 1 && b < 1); // Example: 0.5^2 = 0.25, so the answer is lower than a.
    double accuracy2 = 1.0 + 1.0 / accuracy;

    double total = a;
    for (int i = 1; i < accuracy * b; i++) total = total * a;

    double t = a;
    while (true) {
        double t2 = t;
        for (int i = 1; i < accuracy; i++) t2 = t2 * t; // Not even a binary search; we just hunt forwards by a certain increment
        if ((ansMoreThanA && t2 > total) || (!ansMoreThanA && t2 < total)) break;
        if (ansMoreThanA) t *= accuracy2; else t /= accuracy2;
    }
    if (negExponent) t = 1 / t;
    return t;
}
This one below is a little more involved as it uses Log(). But it is much quicker and doesn't suffer from the super-high-number problems above.
public double powSimple2(double a, double b)
{
    int accuracy = 1000000;
    bool negExponent = b < 0;
    b = Math.Abs(b);
    double accuracy2 = 1.0 + 1.0 / accuracy;
    bool ansMoreThanA = (a > 1 && b > 1) || (a < 1 && b < 1); // Example: 0.5^2 = 0.25, so the answer is lower than a.

    double total = Math.Log(a) * accuracy * b;
    double t = a;
    while (true) {
        double t2 = Math.Log(t) * accuracy;
        if ((ansMoreThanA && t2 > total) || (!ansMoreThanA && t2 < total)) break;
        if (ansMoreThanA) t *= accuracy2; else t /= accuracy2;
    }
    if (negExponent) t = 1 / t;
    return t;
}
You can verify that 2^3.5 = 11.313708 very easily: check that 11.313708^2 = (2^3.5)^2 = 2^7 = 128
I think the easiest way to understand the computation you would actually do for this would be to refresh your understanding of logarithms - one starting point would be http://en.wikipedia.org/wiki/Logarithm#Exponentiation.
If you really want to compute non-integer powers with minimal technology one way to do that would be to express them as fractions with denominator a power of two and then take lots of square roots. E.g. x^3.75 = x^3 * x^(1/2) * x^(1/4) then x^(1/2) = sqrt(x), x^(1/4) = sqrt(sqrt(x)) and so on.
Here is another approach, based on the idea of verifying a guess. Given y, you want to find x such that x^(a/b) = y, where a and b are integers. This equation implies that x^a = y^b. You can calculate y^b, since you know both numbers. You know a, so you can - as you originally suspected - use binary chop or perhaps some numerically more efficient algorithm to solve x^a = y^b for x by simply guessing x, computing x^a for this guess, comparing it with y^b, and then iteratively improving the guess.
Example: suppose we wish to find 2^0.878 by this method. Then set a = 439, b = 500, so we wish to find 2^(439/500). If we set x=2^(439/500) we have x^500 = 2^439, so compute 2^439 and (by binary chop or otherwise) find x such that x^500 = 2^439.
Most of it comes down to being able to invert the power operation.
In other words, the basic idea is that (for example) N^2 should be basically the "opposite" of N^(1/2), so that if you do something like:
M = N^2
L = M^(1/2)
Then the result you get in L should be the same as the original value in N (ignoring any rounding and such).
Mathematically, that means that N^(1/2) is the same as sqrt(N), N^(1/3) is the cube root of N, and so on.
The next step after that would be something like N^(3/2). This is pretty much the same idea: the denominator is a root, and the numerator is a power, so N^(3/2) is the square root of the cube of N (or the cube of the square root of N; it works out the same).
With decimals, we're just expressing a fraction in a slightly different form, so something like N^3.14 can be viewed as N^(314/100), the hundredth root of N raised to the power 314.
As far as how you compute these: there are quite a few different ways, depending heavily on the compromise you prefer between complexity (chip area, if you're implementing it in hardware) and speed. The obvious way is to use logarithms: A^B = Log^-1(Log(A) * B).
For a more restricted set of inputs, such as just finding the square root of N, you can often do better than that extremely general method though. For example, the binary reducing method is quite fast--implemented in software, it's still about the same speed as Intel's FSQRT instruction.
As stated in the comments, it's not clear whether you want a mathematical description of how fractional powers work, or an algorithm to calculate fractional powers.
I will assume the latter.
For almost all functions (like y = 2^x) there is a means of approximating the function using a thing called the Taylor Series http://en.wikipedia.org/wiki/Taylor_series. This approximates any reasonably behaved function as a polynomial, and polynomials can be calculated using only multiplication, division, addition and subtraction (all of which the CPU can do directly). If you calculate the Taylor series for y = 2^x and plug in x = 3.5 you will get 11.313...
This is almost certainly not how exponentiation is actually done on your computer. There are many algorithms which run faster for different inputs. For example, if you calculate 2^3.5 using the Taylor series, then you would have to look at many terms to calculate it with any accuracy. However, the Taylor series converges much faster for x = 0.5 than for x = 3.5, so one obvious improvement is to calculate 2^3.5 as 2^3 * 2^0.5, since 2^3 is easy to calculate directly. Modern exponentiation algorithms use many, many tricks to speed up processing, but the principle is still much the same: approximate the exponentiation function as some infinite sum, and calculate as many terms as you need to get the accuracy that is required.
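To make the idea concrete, here is a rough sketch of mine (not from the answer above) that evaluates 2^x as e^(x ln 2) via the truncated Taylor series for e^z, treating ln 2 ≈ 0.6931471805599453 as a known constant:

using System;

class TaylorPow
{
    // e^z = 1 + z + z^2/2! + z^3/3! + ... truncated after enough terms.
    static double Exp(double z)
    {
        double term = 1.0, sum = 1.0;
        for (int n = 1; n < 60; n++)
        {
            term *= z / n; // next term: z^n / n!
            sum += term;
        }
        return sum;
    }

    static void Main()
    {
        const double Ln2 = 0.6931471805599453; // assumed known
        Console.WriteLine(Exp(3.5 * Ln2));     // ~11.313708..., i.e. 2^3.5
    }
}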

Lucas Lehmer optimization

I've been working to optimize the Lucas-Lehmer primality test using C# code (yes, I'm doing something with Mersenne primes to calculate perfect numbers). I was wondering if it is possible to make further improvements in speed with the current code. I use the System.Numerics.BigInteger class to hold the numbers; perhaps it is not the wisest choice, but we'll see.
This code is actually based on the information found at http://en.wikipedia.org/wiki/Lucas%E2%80%93Lehmer_primality_test
On that page, a proof is given for optimizing the division away: the reduction modulo 2^p - 1 can be done with shifts, masks and additions.
The code for the LucasTest is:
public bool LucasLehmerTest(int num)
{
    if (num % 2 == 0)
        return num == 2;
    else
    {
        BigInteger ss = new BigInteger(4);
        for (int i = 3; i <= num; i++)
        {
            ss = KaratsubaSquare(ss) - 2;
            ss = LucasLehmerMod(ss, num);
        }
        return ss == BigInteger.Zero;
    }
}
Edit:
This is faster than using ModPow from the BigInteger class, as suggested by Mare Infinitus below. That implementation is:
public bool LucasLehmerTest(int num)
{
    if (num % 2 == 0)
        return num == 2;
    else
    {
        BigInteger m = (BigInteger.One << num) - 1;
        BigInteger ss = new BigInteger(4);
        for (int i = 3; i <= num; i++)
            ss = (BigInteger.ModPow(ss, 2, m) - 2) % m;
        return ss == BigInteger.Zero;
    }
}
The LucasLehmerMod method is implemented as follows:
public BigInteger LucasLehmerMod(BigInteger dividend, int divisor)
{
    BigInteger mask = (BigInteger.One << divisor) - 1; // 2^divisor - 1
    BigInteger remainder = BigInteger.Zero;
    BigInteger temporaryResult = dividend;
    do
    {
        remainder = temporaryResult & mask;
        temporaryResult >>= divisor;
        temporaryResult += remainder;
    } while ((temporaryResult >> divisor) != 0);
    return (temporaryResult == mask ? BigInteger.Zero : temporaryResult);
}
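The trick works because 2^p ≡ 1 (mod 2^p - 1), so the high bits can simply be folded back onto the low bits. A quick sanity check of mine (not from the question) against a plain %:

int p = 7;
BigInteger m = (BigInteger.One << p) - 1;         // 2^7 - 1 = 127
BigInteger x = 123456789;
Console.WriteLine(LucasLehmerMod(x, p) == x % m); // True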
What I am afraid of is that when using the BigInteger class from the .NET Framework, I am bound to their calculations. Would it mean I have to create my own BigInteger class to improve it? Or can I get by with a KaratsubaSquare (derived from the Karatsuba algorithm) like this, which I found on Optimizing Karatsuba Implementation:
public BigInteger KaratsubaSquare(BigInteger x)
{
    int n = BitLength(x);                             // BitLength, LOW_DIGITS and Karatsuba
    if (n <= LOW_DIGITS) return BigInteger.Pow(x, 2); // come from the linked implementation
    n >>= 1;                                          // split at half the bit length (as
                                                      // transcribed, the code split at the full
                                                      // bit length, which never terminates)
    BigInteger b = x >> n;                            // Higher half
    BigInteger a = x - (b << n);                      // Lower half
    BigInteger ac = KaratsubaSquare(a);               // lower half * lower half
    BigInteger bd = KaratsubaSquare(b);               // higher half * higher half
    BigInteger c = Karatsuba(a, b);                   // lower half * higher half
    return ac + (c << (n + 1)) + (bd << (2 * n));
}
So basically, I want to see whether it is possible to improve the Lucas-Lehmer test method by optimizing the for loop. However, I am a bit stuck there... Is it even possible?
Any thoughts are welcome of course.
Some extra thoughts:
I could use several threads to speed up the calculation on finding Perfect numbers. However, I have no experience (yet) with good partitioning.
I'll try to explain my thoughts (no code yet):
First I'll generate a prime table using the sieve of Eratosthenes. It takes about 25 ms to find the primes in the range 2 - 1 million, single-threaded.
What C# offers is quite astonishing. Using PLINQ with the Parallel.For method, I could run several calculations almost simultaneously; however, it chunks the primeTable array into parts that do not reflect the work each prime actually needs.
I already figured out that the automatic load balancing of the threads is not sufficient for this task. Hence I need to try a different approach, dividing the load depending on the Mersenne numbers to find and use to calculate a perfect number. Has anyone some experience with this? This page seems to be a bit helpful: http://www.drdobbs.com/windows/custom-parallel-partitioning-with-net-4/224600406
I'll be looking into it further.
As for now, my results are as following.
My current algorithm (using the standard BigInteger class from C#) can find the first 17 perfect numbers (see http://en.wikipedia.org/wiki/List_of_perfect_numbers) within 5 seconds on my laptop (an Intel i5 with 4 cores and 8 GB of RAM). However, then it gets stuck and finds nothing within 10 minutes.
This is something I cannot match yet... My gut feeling (and common sense) tells me that I should look into the Lucas-Lehmer test, since the for-loop calculating the 18th perfect number (using Mersenne prime exponent 3217) would run 3215 times. There is room for improvement, I guess...
What Dinony posted below is a suggestion to rewrite it completely in C. I agree that would boost my performance, but I chose C# to find out its limitations and benefits. Since it's widely used, and given its ability for rapid application development, it seemed worth trying.
Could unsafe code provide benefits here as well?
One possible optimization is to use BigInteger ModPow
It really increases performance significantly.
Just a note for info...
In Python, this
ss = KaratsubaSquare(ss) - 2
has worse performance than this:
ss = ss*ss - 2
What about adapting the code to C? I have no idea about the algorithm, but it is not that much code, so the biggest run-time improvement could come from porting it to C.

how to always round up to the next integer [duplicate]

This question already has answers here:
How can I ensure that a division of integers is always rounded up?
(10 answers)
Closed 6 years ago.
I am trying to find the total number of pages for a pager I am building on a website (so I want the result to be an integer). I get a list of records and I want to split it into 10 per page (the page count).
When I do this:
list.Count() / 10
or
list.Count() / (decimal)10
with list.Count() = 12, I get a result of 1.
How would I code it so I get 2 in this case (any remainder should add 1)?
Math.Ceiling((double)list.Count() / 10);
(list.Count() + 9) / 10
Everything else here is either overkill or simply wrong (except for bestsss' answer, which is awesome). We do not want the overhead of a function call (Math.Truncate(), Math.Ceiling(), etc.) when simple math is enough.
OP's question generalizes (pigeonhole principle) to:
How many boxes do I need to store x objects if only y objects fit into each box?
The solution:
derives from the realization that the last box might be partially empty, and
is (x + y - 1) ÷ y using integer division.
You'll recall from 3rd grade math that integer division is what we're doing when we say 5 ÷ 2 = 2.
Floating-point division is when we say 5 ÷ 2 = 2.5, but we don't want that here.
Many programming languages support integer division. In languages derived from C, you get it automatically when you divide int types (short, int, long, etc.). The remainder/fractional part of any division operation is simply dropped, thus:
5 / 2 == 2
Replacing our original question with x = 5 and y = 2 we have:
How many boxes do I need to store 5 objects if only 2 objects fit into each box?
The answer should now be obvious: 3 boxes -- the first two boxes hold two objects each and the last box holds one.
(x + y - 1) ÷ y =
(5 + 2 - 1) ÷ 2 =
6 ÷ 2 =
3
So for the original question, x = list.Count(), y = 10, which gives the solution using no additional function calls:
(list.Count() + 9) / 10
A proper benchmark or how the number may lie
Following the argument about Math.ceil(value/10d) vs (value+9)/10, I ended up coding a proper benchmark: no dead code, no interpreted mode.
I've been saying that writing a micro-benchmark is not an easy task. The code below illustrates this:
00:21:40.109 starting up....
00:21:40.140 doubleCeil: 19444599
00:21:40.140 integerCeil: 19444599
00:21:40.140 warming up...
00:21:44.375 warmup doubleCeil: 194445990000
00:21:44.625 warmup integerCeil: 194445990000
00:22:27.437 exec doubleCeil: 1944459900000, elapsed: 42.806s
00:22:29.796 exec integerCeil: 1944459900000, elapsed: 2.363s
The benchmark is in Java, since I know well how HotSpot optimizes, and that ensures a fair result. With such results, no statistics, noise or anything can taint it.
Integer ceil is insanely faster.
The code
package t1;

import java.math.BigDecimal;
import java.util.Random;

public class Div {
    static int[] vals;

    static long doubleCeil() {
        int[] v = vals;
        long sum = 0;
        for (int i = 0; i < v.length; i++) {
            int value = v[i];
            sum += Math.ceil(value / 10d);
        }
        return sum;
    }

    static long integerCeil() {
        int[] v = vals;
        long sum = 0;
        for (int i = 0; i < v.length; i++) {
            int value = v[i];
            sum += (value + 9) / 10;
        }
        return sum;
    }

    public static void main(String[] args) {
        vals = new int[7000];
        Random r = new Random(77);
        for (int i = 0; i < vals.length; i++) {
            vals[i] = r.nextInt(55555);
        }

        log("starting up....");
        log("doubleCeil: %d", doubleCeil());
        log("integerCeil: %d", integerCeil());
        log("warming up...");

        final int warmupCount = (int) 1e4;
        log("warmup doubleCeil: %d", execDoubleCeil(warmupCount));
        log("warmup integerCeil: %d", execIntegerCeil(warmupCount));

        final int execCount = (int) 1e5;
        {
            long time = System.nanoTime();
            long s = execDoubleCeil(execCount);
            long elapsed = System.nanoTime() - time;
            log("exec doubleCeil: %d, elapsed: %.3fs", s, BigDecimal.valueOf(elapsed, 9));
        }
        {
            long time = System.nanoTime();
            long s = execIntegerCeil(execCount);
            long elapsed = System.nanoTime() - time;
            log("exec integerCeil: %d, elapsed: %.3fs", s, BigDecimal.valueOf(elapsed, 9));
        }
    }

    static long execDoubleCeil(int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += doubleCeil();
        }
        return sum;
    }

    static long execIntegerCeil(int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += integerCeil();
        }
        return sum;
    }

    static void log(String msg, Object... params) {
        String s = params.length > 0 ? String.format(msg, params) : msg;
        System.out.printf("%tH:%<tM:%<tS.%<tL %s%n", new Long(System.currentTimeMillis()), s);
    }
}
This will also work:
c = (count - 1) / 10 + 1;
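One subtlety worth noting (my observation, not from the answer): C# integer division truncates toward zero, so this variant and (count + 9) / 10 disagree for count == 0:

int count = 0;
Console.WriteLine((count - 1) / 10 + 1); // 1, since -1 / 10 == 0
Console.WriteLine((count + 9) / 10);     // 0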
I think the easiest way is to divide the two integers and increase the result by one if there is a remainder:
int r = list.Count() / 10;
r += (list.Count() % 10 == 0 ? 0 : 1);
No need of libraries or functions.
edited with the right code.
You can use Math.Ceiling
http://msdn.microsoft.com/en-us/library/system.math.ceiling%28v=VS.100%29.aspx
Xform to double (and back) for a simple ceil?
list.Count()/10 + (list.Count()%10 > 0 ? 1 : 0) - this is bad: a div plus a mod
Edit 1st:
On 2nd thought, this is probably faster (depends on the optimization): div + mul (mul is faster than div and mod)
int c = list.Count() / 10;
if (c * 10 < list.Count()) c++;
Edit 2: scrap all that. I forgot the most natural one (adding 9 ensures rounding up for integers):
(list.Count() + 9) / 10
Check by using mod - if there is a remainder, simply increment the value by one.

Average function without overflow exception

.NET Framework 3.5.
I'm trying to calculate the average of some pretty large numbers.
For instance:
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var items = new long[]
        {
            long.MaxValue - 100,
            long.MaxValue - 200,
            long.MaxValue - 300
        };

        try
        {
            var avg = items.Average();
            Console.WriteLine(avg);
        }
        catch (OverflowException ex)
        {
            Console.WriteLine("can't calculate that!");
        }

        Console.ReadLine();
    }
}
Obviously, the mathematical result is 9223372036854775607 (long.MaxValue - 200), but I get an exception there. This is because the implementation (on my machine) to the Average extension method, as inspected by .NET Reflector is:
public static double Average(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    long num = 0L;
    long num2 = 0L;
    foreach (long num3 in source)
    {
        num += num3;
        num2 += 1L;
    }
    if (num2 <= 0L)
    {
        throw Error.NoElements();
    }
    return (((double) num) / ((double) num2));
}
I know I can use a BigInt library (yes, I know that it is included in .NET Framework 4.0, but I'm tied to 3.5).
But I still wonder if there's a pretty straightforward implementation of calculating the average of integers without an external library. Do you happen to know of such an implementation?
Thanks!!
UPDATE:
The previous example, of three large integers, was just an example to illustrate the overflow issue. The question is about calculating an average of any set of numbers which might sum to a large number that exceeds the type's max value. Sorry about this confusion. I also changed the question's title to avoid additional confusion.
Thanks all!!
This answer used to suggest storing the quotient and remainder (mod count) separately. That solution is less space-efficient and more code-complex.
In order to accurately compute the average, you must keep track of the total. There is no way around this, unless you're willing to sacrifice accuracy. You can try to store the total in fancy ways, but ultimately you must be tracking it if the algorithm is correct.
For single-pass algorithms, this is easy to prove. Suppose the total of all preceding items could not be reconstructed from the algorithm's entire state after processing those items. Then simulate the algorithm receiving a series of 0 items until the sequence is finished; multiplying the result by the count would recover the total. Contradiction. Therefore a single-pass algorithm must be tracking the total in some sense.
Therefore the simplest correct algorithm will just sum up the items and divide by the count. All you have to do is pick an integer type with enough space to store the total. Using a BigInteger guarantees no issues, so I suggest using that.
var total = BigInteger.Zero;
var count = 0;
foreach (var i in values)
{
    count += 1;
    total += i;
}
return (double)total / count; // warning: possible loss of accuracy, maybe return a Rational instead?
If you're just looking for an arithmetic mean, you can perform the calculation like this:
public static double Mean(this IEnumerable<long> source)
{
    if (source == null)
    {
        // (the framework-internal Error.ArgumentNull isn't callable from user code)
        throw new ArgumentNullException("source");
    }
    double count = (double)source.Count();
    double mean = 0D;
    foreach (long x in source)
    {
        mean += (double)x / count;
    }
    return mean;
}
Edit:
In response to comments, there definitely is a loss of precision this way, due to performing numerous divisions and additions. For the values indicated by the question, this should not be a problem, but it should be a consideration.
You may try the following approach:
Let the number of elements be N, and the numbers arr[0], ..., arr[N-1].
You need to define two variables:
mean and remainder.
Initially mean = 0, remainder = 0.
At step i you need to change mean and remainder in the following way:
mean += arr[i] / N;
remainder += arr[i] % N;
mean += remainder / N;
remainder %= N;
After N steps you will have the correct answer in the mean variable, and remainder / N will be the fractional part of the answer (I am not sure you need it, but anyway).
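A minimal C# rendering of this approach (my sketch; it assumes non-negative values so that % behaves as expected):

static long ExactMean(long[] arr)
{
    int N = arr.Length;
    long mean = 0, remainder = 0;
    foreach (long v in arr)
    {
        mean += v / N;         // whole part of v / N
        remainder += v % N;    // accumulate the fractional parts
        mean += remainder / N; // carry whenever the remainder reaches N
        remainder %= N;
    }
    return mean;               // remainder / N would be the fractional part
}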
If you know approximately what the average will be (or, at least, that all pairs of numbers will have a max difference < long.MaxValue), you can calculate the average difference from that value instead. I take an example with low numbers, but it works equally well with large ones.
// Let's say numbers cannot exceed 40.
List<int> numbers = new List<int>() { 31, 28, 24, 32, 36, 29 }; // Average: 30

List<int> diffs = new List<int>();

// This can probably be done more effectively in linq, but to show the idea
// (note: every element is diffed against the first, including the first itself):
foreach (int number in numbers)
{
    diffs.Add(number - numbers.First());
}

// diffs now contains { 0, -3, -7, 1, 5, -2 }
var avgDiff = diffs.Sum() / diffs.Count(); // the average diff is -1

// To get the average value, just add the average diff to the first value:
var totalAverage = numbers.First() + avgDiff;
You can of course implement this in some way that makes it easier to reuse, for example as an extension method to IEnumerable<long>.
Here is how I would do it, given this problem. First let's define a very simple RationalNumber class, which contains two properties, Dividend and Divisor, and an operator for adding two rational numbers. Here is how it looks:
public sealed class RationalNumber
{
    public RationalNumber()
    {
        this.Divisor = 1;
    }

    public static RationalNumber operator +(RationalNumber c1, RationalNumber c2)
    {
        RationalNumber result = new RationalNumber();

        Int64 nDividend = (c1.Dividend * c2.Divisor) + (c2.Dividend * c1.Divisor);
        Int64 nDivisor = c1.Divisor * c2.Divisor;
        Int64 nRemainder = nDividend % nDivisor;

        if (nRemainder == 0)
        {
            // The number is whole
            result.Dividend = nDividend / nDivisor;
        }
        else
        {
            Int64 nGreatestCommonDivisor = FindGreatestCommonDivisor(nDividend, nDivisor);
            if (nGreatestCommonDivisor != 0)
            {
                nDividend = nDividend / nGreatestCommonDivisor;
                nDivisor = nDivisor / nGreatestCommonDivisor;
            }
            result.Dividend = nDividend;
            result.Divisor = nDivisor;
        }

        return result;
    }

    private static Int64 FindGreatestCommonDivisor(Int64 a, Int64 b)
    {
        Int64 nRemainder;
        while (b != 0)
        {
            nRemainder = a % b;
            a = b;
            b = nRemainder;
        }
        return a;
    }

    // a / b : a is the dividend, b is the divisor
    public Int64 Dividend { get; set; }
    public Int64 Divisor { get; set; }
}
The second part is really easy. Let's say we have an array of numbers. Their average is estimated by Sum(Numbers)/Length(Numbers), which is the same as Number[0]/Length + Number[1]/Length + ... + Number[n]/Length. To be able to calculate this, we represent each Number[i]/Length as a whole number plus a rational part (remainder). Here is how it looks:
Int64[] aValues = new Int64[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };

List<RationalNumber> list = new List<RationalNumber>();
Int64 nAverage = 0;
for (Int32 i = 0; i < aValues.Length; ++i)
{
    Int64 nRemainder = aValues[i] % aValues.Length;
    Int64 nWhole = aValues[i] / aValues.Length;

    nAverage += nWhole;
    if (nRemainder != 0)
    {
        list.Add(new RationalNumber() { Dividend = nRemainder, Divisor = aValues.Length });
    }
}

RationalNumber rationalTotal = new RationalNumber();
foreach (var rational in list)
{
    rationalTotal += rational;
}

nAverage = nAverage + (rationalTotal.Dividend / rationalTotal.Divisor);
At the end we have a list of rational numbers and a whole number, which we sum together to get the average of the sequence without an overflow. The same approach can be taken for any type without overflow, and there is no loss of precision.
EDIT:
Why this works:
Define A to be a set of numbers.
Average(A) = Sum(A) / Len(A), so
Average(A) = A[0]/Len(A) + A[1]/Len(A) + A[2]/Len(A) + ... + A[N]/Len(A).
Define An to be the number satisfying An = X + (Y / Len(A)); this holds because dividing A[n] by B gives a whole part X plus a remainder expressed as a rational number (Y / B). So
Average(A) = A1 + A2 + A3 + ... + AN = X1 + X2 + X3 + X4 + ... + Remainder1 + Remainder2 + ...
Sum the whole parts, and sum the remainders keeping them in rational number form. In the end we get one whole number and one rational, which summed together give Average(A). Depending on the precision you'd like, you apply this only to the rational number at the end.
Simple answer with LINQ...
var data = new[] { int.MaxValue, int.MaxValue, int.MaxValue };
var mean = (int)data.Select(d => (double)d / data.Count()).Sum();
Depending on the size of the data set, you may want to force data with .ToList() or .ToArray() before processing this method so it can't requery Count on each pass. (Or you can call it before the .Select(..).Sum().)
If you know in advance that all your numbers are going to be 'big' (in the sense of 'much nearer long.MaxValue than zero'), you can calculate the average of their distances from long.MaxValue; the average of the numbers is then long.MaxValue minus that average distance.
However, this approach will fail if (m)any of the numbers are far from long.MaxValue, so it's horses for courses...
I guess there has to be a compromise somewhere. If the numbers are really getting so large, then the lowest-order digits (say the lower 5 digits) might not affect the result much.
Another issue is when you don't really know the size of the dataset coming in, especially in stream/real-time cases. Here I don't see any solution other than
(previousAverage * oldCount + newValue) / (oldCount + 1)
Here's a suggestion:

*LargestDataTypePossible* currentAverage;
*SomeSuitableDatatypeSupportingRationalValues* newValue;
*int* count;

addToCurrentAverage(value) {
    newValue = value / 100000;
    count = count + 1;
    currentAverage = (currentAverage * (count - 1) + newValue) / count;
}

getCurrentAverage() {
    return currentAverage * 100000;
}
Averaging numbers of a specific numeric type in a safe way, while also only using that numeric type, is actually possible, although I would advise using the help of BigInteger in a practical implementation. I created a project for Safe Numeric Calculations that has a small structure (Int32WithBoundedRollover) which can sum up to 2^32 int32s without any overflow (the structure internally uses two int32 fields to do this, so no larger data types are used).
Once you have this sum, you then need to divide it by the count to get the average, which you can do (although I wouldn't recommend it) by creating another instance of Int32WithBoundedRollover and repeatedly incrementing it by the count; after each increment you compare it to the sum, and the number of increments gives you the integer part of the average. From there you can peel off the remainder and calculate the fractional part. There are likely some clever tricks to make this more efficient, but this basic strategy would certainly work without needing to resort to a bigger data type.
That being said, the current implementation isn't built for this (for instance there is no comparison operator on Int32WithBoundedRollover, although it wouldn't be too hard to add). The reason is that it is just much simpler to use BigInteger at the end to do the calculation. Performance-wise this doesn't matter too much for large averages since it will only be done once, and it is just too clean and easy to understand to worry about coming up with something clever (at least so far...).
As far as your original question, which was concerned with the long data type, the Int32WithBoundedRollover could be converted to a LongWithBoundedRollover by just swapping int32 references for long references, and it should work just the same. For Int32s I did notice a pretty big difference in performance (in case that is of interest). Compared to the BigInteger-only method, the method I produced is around 80% faster for the large (as in total number of data points) samples I was testing (the code for this is included in the unit tests for the Int32WithBoundedRollover class). This is likely mostly because the int32 operations are done in hardware instead of software, as the BigInteger operations are.
How about BigInteger in Visual J#?
If you're willing to sacrifice precision, you could do something like:
long num2 = 0L;
foreach (long num3 in source)
{
num2 += 1L;
}
if (num2 <= 0L)
{
throw Error.NoElements();
}
double average = 0;
foreach (long num3 in source)
{
average += (double)num3 / (double)num2;
}
return average;
Perhaps you can reduce every item: calculate the average of the adjusted values and then multiply it by the number of elements in the collection. However, you'll get a slightly different result because of the floating point operations involved.
var items = new long[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };
var avg = items.Average(i => i / items.Count()) * items.Count();
You could keep a rolling average which you update once for each large number.
Use the IntX library on CodePlex.
NextAverage = CurrentAverage + (NewValue - CurrentAverage) / (CurrentObservations + 1)
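For instance (a sketch of mine, assuming items is the long[] from the question), keeping a running average this way never materializes the large sum:

double average = 0;
long n = 0;
foreach (long value in items)
{
    n++;
    average += (value - average) / n; // CurrentAverage + (NewValue - CurrentAverage) / n
}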
Here is my version of an extension method that can help with this.
public static long Average(this IEnumerable<long> longs)
{
    long mean = 0;
    long count = longs.Count();
    foreach (var val in longs)
    {
        mean += val / count; // integer division: each element's remainder is dropped
    }
    return mean;
}
Let Avg(n) be the average of the first n numbers, and data[n] the nth number.
Avg(n) = (double)(n-1)/(double)n * Avg(n-1) + (double)data[n]/(double)n
This avoids value overflow, but loses precision when n is very large.
For two positive numbers (or two negative numbers), I found a very elegant solution from here, where the average computation (a + b) / 2 can be replaced with a + ((b - a) / 2).
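A quick illustration (my example) with values near long.MaxValue, where (a + b) / 2 would overflow:

long a = long.MaxValue - 300, b = long.MaxValue - 100;
long mid = a + (b - a) / 2; // long.MaxValue - 200; (a + b) / 2 would have overflowed
Console.WriteLine(mid);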
