Binary Search versus LINQ select statement - c#

I have a list of floating point data in which I want to find the index just below a passed value. A simplified example:
double[] x= {1.0, 1.4, 2.3, 5.6, 7.8};
double[] y= {3.4, 8.2, 5.3, 8.1, 0.5};
int lowerIndex = BinaryIndexSearch(x, 2.0); // should return 1
The intent is that an interpolation will then be performed with x and y using lowerIndex and lowerIndex+1.
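The binary index search algorithm looks like this:
int BinaryIndexSearch(double[] x, double value)
{
    // Assumes x is sorted ascending and x[0] <= value < x[x.Length - 1].
    // Outside that range the loop never ends (value < x[0]) or reads past
    // the end of the array (value >= x[x.Length - 1]).
    int upper = x.Length - 1;
    int lower = 0;
    int pivot;
    do
    {
        pivot = (upper + lower) / 2;
        if (value >= x[pivot])
        {
            lower = pivot + 1;
        }
        else
        {
            upper = pivot - 1;
        }
    }
    while (value < x[pivot] || value >= x[pivot + 1]);
    return pivot;
}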
Is there a more efficient way to do this with LINQ? Would it typically be faster? The comparison operation at the end of the do..while loop is the "hottest" line of code in my program.

LINQ will not be more efficient than a binary search.
However, you are re-inventing the existing Array.BinarySearch method.
If the element is not found, Array.BinarySearch returns a negative number: the bitwise complement (~ operator) of the index where the value ought to be inserted, which is the index of the first element larger than the value.
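A minimal sketch of that idea for the interpolation use case, assuming x is sorted ascending with no duplicates; IntervalSearch and LowerIndex are illustrative names, not part of the framework. The negative result from Array.BinarySearch is turned back into the index just below value:

```csharp
using System;

static class IntervalSearch
{
    // Returns i with x[i] <= value < x[i + 1] for a sorted, duplicate-free array x.
    // Returns -1 if value < x[0] and x.Length - 1 if value >= x[x.Length - 1].
    public static int LowerIndex(double[] x, double value)
    {
        int i = Array.BinarySearch(x, value);
        if (i >= 0)
            return i;      // exact match: value == x[i]
        return ~i - 1;     // ~i is the index of the first element greater than value
    }
}
```

For the interpolation, clamp the result to the range 0..x.Length - 2 before reading x[i] and x[i + 1].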

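LINQ is written over IEnumerable<T>, so a query can only walk the sequence element by element; it is not meant for performance. As a general rule of thumb, an algorithm that has intimate knowledge of the data structure (here, a sorted array that allows an O(log n) binary search) will be faster than a generic solution such as LINQ.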
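For comparison, here is a hypothetical LINQ version of the same lookup (LinqIntervalSearch is an illustrative name). It relies on x being sorted: TakeWhile stops at the first element greater than value, but it still has to visit every element up to that point, which is O(n) where Array.BinarySearch needs only O(log n) comparisons.

```csharp
using System.Linq;

static class LinqIntervalSearch
{
    // Counts the leading elements <= value; on a sorted array that count minus one
    // is the lower index (-1 if value < x[0]). TakeWhile visits elements one by one.
    public static int LowerIndex(double[] x, double value) =>
        x.TakeWhile(v => v <= value).Count() - 1;
}
```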


calculations for a very large number

I have a function which calculates the factorial and combinations as follows.
int faktorial(int n)
{
    if ((n == 0) || (n == 1))
    {
        return (1);
    }
    else
    {
        return (n * faktorial(n - 1));
    }
}
int Kombinasi(int x, int y)
{
    int n = faktorial(x);
    int k = (faktorial(x - y)) * (faktorial(y));
    int hasil = n / k;
    return (hasil);
}
But there is a problem when calculating the factorial.
Suppose I want to compute the combination with x = 1000 and y = 4. The combination function calls the factorial function, but the factorial function cannot handle numbers that large (1000! does not fit in an int). How can I solve this problem? Sorry, my English is very bad. Thanks.
BigInteger (System.Numerics, .NET Framework 4.0 and later) works and is pretty fast at 1000!.
using System.Numerics;

BigInteger faktorial(BigInteger n)
{
    if ((n == 0) || (n == 1))
    {
        return (1);
    }
    else
    {
        return (n * faktorial(n - 1));
    }
}
BigInteger Kombinasi(BigInteger x, BigInteger y)
{
    BigInteger n = faktorial(x);
    BigInteger k = (faktorial(x - y)) * (faktorial(y));
    BigInteger hasil = n / k;
    return (hasil);
}
Output of faktorial(1000):
40238726007709377354370243392300398571937486421071463254379991042993851239862902059204420848696940480047998861019719605863166687299480855890132382966994459099742450408707375991882362772718873251977950595099527612087497546249704360141827809464649629105639388743788648733711918104582578364784997701247663288983595573543251318532395846307555740911426241747434934755342864657661166779739666882029120737914385371958824980812686783837455973174613608537953452422158659320192809087829730843139284440328123155861103697680135730421616874760967587134831202547858932076716913244842623613141250878020800026168315102734182797770478463586817016436502415369139828126481021309276124489635992870511496497541990934222156683257208082133318611681155361583654698404670897560290095053761647584772842188967964624494516076535340819890138544248798495995331910172335555660213945039973628075013783761530712776192684903435262520001588853514733161170210396817592151090778801939317811419454525722386554146106289218796022383897147608850627686296714667469756291123408243920816015378088989396451826324367161676217916890977991190375403127462228998800519544441428201218736174599264295658174662830295557029902432415318161721046583203678690611726015878352075151628422554026517048330422614397428693306169089796848259012545832716822645806652676995865268227280707578139185817888965220816434834482599326604336766017699961283186078838615027946595513115655203609398818061213855860030143569452722420634463179746059468257310379008402443243846565724501440282188525247093519062092902313649327349756551395872055965422874977401141334696271542284586237738753823048386568897646192738381490014076731044664025989949022222176590433990188601856652648506179970235619389701786004081188972991831102117122984590164192106888438712185564612496079872290851929681937238864261483965738229112312502418664935314397013742853192664987533721894069428143411852015801412334482801505139969429015348307764456909907315243327828826986460278986432113908350621709500259738986355
4277196742822248757586765752344220207573630569498825087968928162753848863396909959826280956121450994871701244516461260379029309120889086942028510640182154399457156805941872748998094254742173582401063677404595741785160829230135358081840096996372524230560855903700624271243416909004153690105933983835777939410970027753472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Note, however, that this recursive version appears to overflow the stack somewhere above 8889!, because every multiplication adds another level of recursion.
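If the recursion depth is the problem, a loop gives the same result without growing the call stack. A sketch, assuming .NET 4.0 or later for System.Numerics; IterativeFactorial is an illustrative name:

```csharp
using System.Numerics;

static class IterativeFactorial
{
    // n! computed with a loop instead of recursion, so the stack depth stays constant.
    public static BigInteger Factorial(int n)
    {
        BigInteger result = BigInteger.One;
        for (int i = 2; i <= n; i++)
            result *= i;    // int converts implicitly to BigInteger
        return result;
    }
}
```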
First, to answer your question: you can handle bigger values (up to 2^64 - 1) if you use
ulong c;
Second, a little help: that alone won't solve the exercise. Even unsigned long can't hold a value as large as 1000!. However, to get (n choose k) you can instead calculate (n * (n - 1) * ... * (n - k + 1)) / k!, which deals with much smaller values.
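A sketch of that product formula with ulong, under the assumption that C(n, k) itself fits in 64 bits; BinomialCoefficient.Choose is an illustrative name. Dividing after each multiplication keeps the intermediate values small, and checked turns any remaining overflow into an exception:

```csharp
static class BinomialCoefficient
{
    // C(n, k) = n * (n - 1) * ... * (n - k + 1) / k!, built up one factor at a time.
    // After step i the running value equals C(n - k + i, i), an integer,
    // so every division is exact.
    public static ulong Choose(ulong n, ulong k)
    {
        if (k > n) return 0;
        if (k > n - k) k = n - k;   // C(n, k) == C(n, n - k); use the smaller k
        ulong result = 1;
        for (ulong i = 1; i <= k; i++)
            result = checked(result * (n - k + i)) / i;   // throws OverflowException instead of wrapping
        return result;
    }
}
```

With this, Choose(1000, 4) stays far below the ulong limit, whereas 1000! does not fit in any built-in integer type.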
Since it looks like what you really want to do is compute a binomial coefficient, an alternative to using BigInteger is to take advantage of some of the numerical properties of factorials. So rather than computing factorials directly (which can be large), you can instead do this:
long Kombinasi(long x, long y)
{
    if (y == 0) return 1;
    return (x * Kombinasi(x - 1, y - 1)) / y;
}
You could also use this algorithm in combination with BigInteger if you need even larger values:
BigInteger Binomial(BigInteger n, BigInteger k)
{
    if (k <= 0) return 1;
    return (n * Binomial(n - 1, k - 1)) / k;
}
This will be much more efficient than computing the factorials and dividing since it takes advantage of the fact that most of the factorial terms cancel out. It will also perform fewer multiplications, especially if k is small.
As other members suggested, we can use BigInteger for big numbers.
I don't know whether it is useful or not, but I want to explain one point here.
Say we have a signed int holding a big value (int.MaxValue) and we add a positive integer (10) to it. That does not throw System.OverflowException; by default the result simply wraps around to a negative value. If you want an exception in such cases, use the checked keyword. When an expression contains one or more non-constant values, the compiler does not detect at compile time that the result falls outside the range of the destination type; with overflow checking enabled through checked, the operation throws System.OverflowException at run time instead, and you can handle it accordingly.
checked in C#
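A small sketch of the difference (OverflowDemo is an illustrative name). The explicit unchecked(...) and checked(...) expressions make the behavior independent of the project's default overflow-checking setting:

```csharp
using System;

static class OverflowDemo
{
    // Explicitly unchecked: int.MaxValue + 10 silently wraps around to a negative number.
    public static int AddUnchecked(int a, int b) => unchecked(a + b);

    // Explicitly checked: the same addition throws OverflowException instead.
    public static int AddChecked(int a, int b) => checked(a + b);

    public static string Describe(int a, int b)
    {
        try
        {
            return "checked result: " + AddChecked(a, b);
        }
        catch (OverflowException)
        {
            return "overflow detected, unchecked result would be " + AddUnchecked(a, b);
        }
    }
}
```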

'Grokkable' algorithm to understand exponentiation where the exponent is floating point

To clarify first:
2^3 = 8. That's equivalent to 2*2*2. Easy.
2^4 = 16. That's equivalent to 2*2*2*2. Also easy.
2^3.5 = 11.313708... Er, that's not so easy to grok.
What I want is a simple algorithm which most clearly shows how 2^3.5 = 11.313708. It should preferably not use any functions apart from the basic addition, subtraction, multiplication and division operators.
The code certainly doesn't have to be fast, nor does it necessarily need to be short (though that would help). Don't worry, it can be approximate to a given user-specified accuracy (which should also be part of the algorithm). I'm hoping there will be a binary chop/search type thing going on, as that's pretty simple to grok.
So far I've found this, but the top answer is far from simple to understand on a conceptual level.
The more answers the merrier, so I can try to understand different ways of attacking the problem.
My language preference for the answer would be C#/C/C++/Java, or pseudocode for all I care.
Ok, let's implement pow(x, y) using only binary searches, addition and multiplication.
Driving y below 1
First, take this out of the way:
pow(x, y) == pow(x*x, y/2)
pow(x, y) == 1/pow(x, -y)
This is important to handle negative exponents and to drive y down to at most 1, where things start getting interesting. This reduces the problem to finding pow(x, y) where 0 <= y <= 1.
Implementing sqrt
In this answer I assume you know how to perform sqrt. I know sqrt(x) = x^(1/2), but sqrt itself is easy to implement with a binary search for the y that satisfies y*y = x, e.g.:
#include <cmath>

#define EPS 1e-8
double sqrt2(double x) {
    // The interval [0, max(x, 1)] always contains sqrt(x) for x >= 0.
    double a = 0, b = x > 1 ? x : 1;
    while (std::fabs(a - b) > EPS) {
        double y = (a + b) / 2;
        if (y * y > x) b = y; else a = y;
    }
    return a;
}
Finding the answer
The rationale is that every number below 1 can be approximated as a sum of fractions of the form 1/2^k (its binary expansion):
0.875 = 1/2 + 1/4 + 1/8
0.333333... = 1/4 + 1/16 + 1/64 + 1/256 + ...
If you find those fractions, you actually find that:
x^0.875 = x^(1/2+1/4+1/8) = x^(1/2) * x^(1/4) * x^(1/8)
That ultimately leads to
sqrt(x) * sqrt(sqrt(x)) * sqrt(sqrt(sqrt(x)))
So, implementation (in C++)
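#include <cmath>

#define EPS 1e-8
double pow2(double x, double y) {
    if (x < 0 and std::fabs(std::round(y) - y) < EPS) {
        // Negative base, integer exponent: odd exponents flip the sign.
        // (!= 0 rather than == 1, because % of a negative odd number is -1 in C++.)
        return pow2(-x, y) * ((int)std::round(y) % 2 != 0 ? -1 : 1);
    } else if (y < 0) {
        return 1 / pow2(x, -y);
    } else if (y > 1) {
        return pow2(x * x, y / 2);
    } else {
        // Here 0 <= y <= 1: multiply in x^(1/2^k) for each binary digit of y.
        double fraction = 1;
        double result = 1;
        while (y > EPS) {
            if (y >= fraction) {
                y -= fraction;
                result *= x;
            }
            fraction /= 2;
            x = sqrt2(x);
        }
        return result;
    }
}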
Deriving ideas from the other excellent posts, I came up with my own implementation. The answer is based on the idea that base^(exponent*accuracy) = answer^accuracy. Given that we know the base, exponent and accuracy variables beforehand, we can perform a search (binary chop or whatever) so that the equation can be balanced by finding answer. We want the exponent in both sides of the equation to be an integer (otherwise we're back to square one), so we can make accuracy any size we like, and then round it to the nearest integer afterwards.
I've given two ways of doing it. The first is very slow, and will often produce numbers too large for the numeric types of most languages. On the other hand, it doesn't use log, and is conceptually simpler.
public double powSimple(double a, double b)
{
    int accuracy = 10;
    bool negExponent = b < 0;
    b = Math.Abs(b);
    bool ansMoreThanA = (a > 1 && b > 1) || (a < 1 && b < 1); // Example 0.5^2=0.25 so answer is lower than A.
    double accuracy2 = 1.0 + 1.0 / accuracy;
    double total = a;
    for (int i = 1; i < accuracy * b; i++) total = total * a;
    double t = a;
    while (true) {
        double t2 = t;
        for (int i = 1; i < accuracy; i++) t2 = t2 * t; // Not even a binary search. We just hunt forwards by a certain increment
        if ((ansMoreThanA && t2 > total) || (!ansMoreThanA && t2 < total)) break;
        if (ansMoreThanA) t *= accuracy2; else t /= accuracy2;
    }
    if (negExponent) t = 1 / t;
    return t;
}
This one below is a little more involved as it uses log(). But it is much quicker and doesn't suffer from the super-high number problems as above.
public double powSimple2(double a, double b)
{
    int accuracy = 1000000;
    bool negExponent = b < 0;
    b = Math.Abs(b);
    double accuracy2 = 1.0 + 1.0 / accuracy;
    bool ansMoreThanA = (a > 1 && b > 1) || (a < 1 && b < 1); // Example 0.5^2=0.25 so answer is lower than A.
    double total = Math.Log(a) * accuracy * b;
    double t = a;
    while (true) {
        double t2 = Math.Log(t) * accuracy;
        if ((ansMoreThanA && t2 > total) || (!ansMoreThanA && t2 < total)) break;
        if (ansMoreThanA) t *= accuracy2; else t /= accuracy2;
    }
    if (negExponent) t = 1 / t;
    return t;
}
You can verify that 2^3.5 = 11.313708 very easily: check that 11.313708^2 = (2^3.5)^2 = 2^7 = 128 (up to rounding).
I think the easiest way to understand the computation you would actually do for this would be to refresh your understanding of logarithms - one starting point would be http://en.wikipedia.org/wiki/Logarithm#Exponentiation.
If you really want to compute non-integer powers with minimal technology one way to do that would be to express them as fractions with denominator a power of two and then take lots of square roots. E.g. x^3.75 = x^3 * x^(1/2) * x^(1/4) then x^(1/2) = sqrt(x), x^(1/4) = sqrt(sqrt(x)) and so on.
Here is another approach, based on the idea of verifying a guess. Given y, you want to find x = y^(a/b), where a and b are integers. This equation implies that x^b = y^a. You can calculate y^a, since you know both numbers. You know b, so you can, as you originally suspected, use binary chop (or perhaps some numerically more efficient algorithm) to solve x^b = y^a for x: guess x, compute x^b for this guess, compare it with y^a, and then iteratively improve the guess.
Example: suppose we wish to find 2^0.878 by this method. Then set a = 439, b = 500, so we wish to find 2^(439/500). If we set x=2^(439/500) we have x^500 = 2^439, so compute 2^439 and (by binary chop or otherwise) find x such that x^500 = 2^439.
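A sketch of this guess-and-check idea using only multiplication, comparison and halving; GuessAndCheck.Pow is an illustrative name, and it assumes y >= 1, positive integers a and b, and that y^a fits in a double (2^439 does). Because x >= 1 and x^b = y^a, the answer lies between 1 and y^a, which gives the starting interval for the binary chop:

```csharp
static class GuessAndCheck
{
    // x^n using only multiplication (n >= 0).
    static double IntPow(double x, int n)
    {
        double r = 1.0;
        for (int i = 0; i < n; i++) r *= x;
        return r;
    }

    // Approximates y^(a/b) by solving x^b = y^a with a binary chop on x.
    // Assumes y >= 1, a >= 1, b >= 1, and that y^a fits in a double.
    public static double Pow(double y, int a, int b, double eps = 1e-12)
    {
        double target = IntPow(y, a);   // y^a
        double lo = 1.0, hi = target;   // 1 <= x <= x^b == y^a
        while (hi - lo > eps)
        {
            double mid = (lo + hi) / 2;
            if (IntPow(mid, b) > target) hi = mid;   // guess too big
            else lo = mid;                           // guess too small (or exact)
        }
        return lo;
    }
}
```

For the example above, GuessAndCheck.Pow(2, 439, 500) approximates 2^0.878, and GuessAndCheck.Pow(2, 7, 2) approximates 2^3.5.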
Most of it comes down to being able to invert the power operation.
In other words, the basic idea is that (for example) N^2 should be the "opposite" of N^(1/2), so that if you do something like:
M = N^2
L = M^(1/2)
Then the result you get in L should be the same as the original value in N (ignoring any rounding and such).
Mathematically, that means that N^(1/2) is the same as sqrt(N), N^(1/3) is the cube root of N, and so on.
The next step after that would be something like N^(3/2). This is pretty much the same idea: the denominator is a root, and the numerator is a power, so N^(3/2) is the square root of the cube of N (or the cube of the square root of N; it works out the same).
With decimals, we're just expressing a fraction in a slightly different form, so something like N^3.14 can be viewed as N^(314/100): the hundredth root of N raised to the power 314.
As far as how you compute these: there are quite a few different ways, depending heavily on the compromise you prefer between complexity (chip area, if you're implementing it in hardware) and speed. The obvious way is to use a logarithm: A^B = Log^-1(Log(A) * B), where Log^-1 is the inverse of the logarithm (e.g. exp for the natural log).
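A short sketch of the logarithm route and of the two readings of N^(3/2), using System.Math; LogPow and its methods are illustrative names, and the log identity is only valid for A > 0:

```csharp
using System;

static class LogPow
{
    // A^B = Log^-1(Log(A) * B); with natural logs that is e^(B * ln A). Requires A > 0.
    public static double PowViaLog(double a, double b) => Math.Exp(b * Math.Log(a));

    // N^(3/2) read as "the square root of the cube of N" ...
    public static double RootOfCube(double n) => Math.Sqrt(n * n * n);

    // ... and as "the cube of the square root of N"; both agree for n >= 0.
    public static double CubeOfRoot(double n)
    {
        double r = Math.Sqrt(n);
        return r * r * r;
    }
}
```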
For a more restricted set of inputs, such as just finding the square root of N, you can often do better than that extremely general method though. For example, the binary reducing method is quite fast: implemented in software, it's still about the same speed as Intel's FSQRT instruction.
As stated in the comments, it's not clear whether you want a mathematical description of how fractional powers work, or an algorithm to calculate fractional powers.
I will assume the latter.
For almost all functions (like y = 2^x) there is a means of approximating the function using a thing called the Taylor Series http://en.wikipedia.org/wiki/Taylor_series. This approximates any reasonably behaved function as a polynomial, and polynomials can be calculated using only multiplication, division, addition and subtraction (all of which the CPU can do directly). If you calculate the Taylor series for y = 2^x and plug in x = 3.5 you will get 11.313...
This is almost certainly not how exponentiation is actually done on your computer. There are many algorithms which run faster for different inputs. For example, if you calculate 2^3.5 using the Taylor series, then you would have to look at many terms to calculate it with any accuracy. However, the Taylor series will converge much faster for x = 0.5 than for x = 3.5. So one obvious improvement is to calculate 2^3.5 as 2^3 * 2^0.5, as 2^3 is easy to calculate directly. Modern exponentiation algorithms will use many, many tricks to speed up processing, but the principle is still much the same: approximate the exponentiation function as some infinite sum, and calculate as many terms as you need to get the accuracy that is required.
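A minimal sketch of the Taylor-series approach with the 2^3 * 2^0.5 split; TaylorPow2 is an illustrative name. It assumes 2^f = e^(f * ln 2) and gets ln 2 from its own series, so the series work uses only +, -, * and / (Math.Floor and Math.Abs only split the exponent and test convergence):

```csharp
using System;

static class TaylorPow2
{
    // ln 2 from the series ln 2 = sum over k >= 1 of 1 / (k * 2^k).
    static double Ln2(int terms = 60)
    {
        double sum = 0, pow = 1;
        for (int k = 1; k <= terms; k++)
        {
            pow *= 2;                // pow == 2^k
            sum += 1.0 / (k * pow);
        }
        return sum;
    }

    // e^t from its Taylor series: sum over n >= 0 of t^n / n!.
    static double Exp(double t, double eps = 1e-17)
    {
        double term = 1, sum = 1;
        for (int n = 1; Math.Abs(term) > eps; n++)
        {
            term *= t / n;           // term == t^n / n!
            sum += term;
        }
        return sum;
    }

    // 2^x = 2^w * 2^f with w = floor(x) and 0 <= f < 1; the series is only used
    // for 2^f, where it converges quickly.
    public static double Pow2(double x)
    {
        int w = (int)Math.Floor(x);
        double f = x - w;
        double whole = 1;
        for (int i = 0; i < Math.Abs(w); i++)
            whole *= 2;
        if (w < 0)
            whole = 1 / whole;
        return whole * Exp(f * Ln2());
    }
}
```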

Is there general method to solve for a single unknown if the unknown variable changes?

I have a simple algebraic relationship that uses three variables. I can guarantee that I know two of the three and need to solve for the third, but I don't necessarily know which two of the variables I will know. I'm looking for a single method or algorithm that can handle any of the cases without a huge batch of conditionals. This may not be possible, but I would like to implement it in a more general sense rather than code in every relationship in terms of the other variables.
For example, if this were the relationship:
3x - 5y + z = 5
I don't want to code this:
double SolveForZ(double x, double y)
{
    return 5 - 3 * x + 5 * y;
}
double SolveForY(double x, double z)
{
    return (5 - z - 3 * x) / (-5);
}
And so on. Is there a standard sort of way to handle programming problems like this? Maybe using matrices, parameterization, etc?
If you restrict yourself to the kind of linear functions shown above, you could generalize the function like this
3x - 5y + z = 5
would become
a[0]*x[0] + a[1]*x[1] + a[2]*x[2] = c
with a = { 3, -5, 1 } and c = 5.
That is, you need a list (or array) of constant factors, List<double> a; a list of variables, List<double?> x, in which the unknown is null; plus the constant on the right side, double c.
using System;
using System.Collections.Generic;

public double Solve(IList<double> a, IList<double?> x, double c)
{
    int unknowns = 0;
    int unknownIndex = 0; // Initialization required because the compiler is not smart
                          // enough to infer that unknownIndex will be initialized when
                          // our code reaches the return statement.
    double sum = 0.0;
    if (a.Count != x.Count) {
        throw new ArgumentException("a[] and x[] must have same length");
    }
    for (int i = 0; i < a.Count; i++) {
        if (x[i].HasValue) {
            sum += a[i] * x[i].Value;
        } else {
            unknowns++;
            unknownIndex = i;
        }
    }
    if (unknowns != 1) {
        throw new ArgumentException("Exactly one unknown expected");
    }
    return (c - sum) / a[unknownIndex];
}
Example:
3x - 5y + z = 5
x = (5 - (-5y + z)) / 3
As seen in the example, the solution consists of subtracting the sum of all terms except the unknown term from the constant and then dividing by the factor of the unknown. Therefore my solution remembers the index of the unknown.
You can generalize with powers like this, assuming that you have the equation
a[0]*x[0]^p[0] + a[1]*x[1]^p[1] + a[2]*x[2]^p[2] = c
you need an additional parameter IList<int> p and the result becomes
return Math.Pow((c - sum) / a[unknownIndex], 1.0 / p[unknownIndex]);
as x ^ (1/n) is equal to nth-root(x).
If you use doubles for the powers, you will even be able to represent functions like
7*x^3 + 5/y^2 + 4*sqrt(z) = 11
with a = { 7, 5, 4 }, p = { 3, -2, 0.5 }, c = 11,
because
x^(-n) = 1 / x^n
and
nth-root(x) = x^(1/n)
However, you will not be able to find the roots of true non-linear polynomials like x^2 - 5x = 7. The algorithm shown above works only if the unknown appears exactly once in the equation.
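A sketch of the power-generalized version, with the exponents passed as doubles; PowerEquationSolver is an illustrative name. It assumes the unknown term can be isolated and does not check signs: for fractional or even powers, (c - sum) / a must be non-negative for the answer to be meaningful.

```csharp
using System;
using System.Collections.Generic;

static class PowerEquationSolver
{
    // Solves a[0]*x[0]^p[0] + a[1]*x[1]^p[1] + ... = c for the single entry of x that is null.
    public static double Solve(IList<double> a, IList<double> p, IList<double?> x, double c)
    {
        if (a.Count != x.Count || p.Count != x.Count)
            throw new ArgumentException("a, p and x must have the same length");
        int unknowns = 0;
        int unknownIndex = -1;
        double sum = 0.0;
        for (int i = 0; i < a.Count; i++)
        {
            if (x[i].HasValue)
                sum += a[i] * Math.Pow(x[i].Value, p[i]);   // known term
            else
            {
                unknowns++;
                unknownIndex = i;
            }
        }
        if (unknowns != 1)
            throw new ArgumentException("Exactly one unknown expected");
        // a * u^p = c - sum  =>  u = ((c - sum) / a)^(1 / p)
        return Math.Pow((c - sum) / a[unknownIndex], 1.0 / p[unknownIndex]);
    }
}
```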
Yes, here is one function:
private double? ValueSolved(int? x, int? y, int? z)
{
    // 3x - 5y + z = 5; floating-point division keeps results such as 28/5 from being truncated.
    if (y.HasValue && z.HasValue && !x.HasValue)
        return (5 + (5 * y.Value) - z.Value) / 3.0;
    if (x.HasValue && z.HasValue && !y.HasValue)
        return (5 - z.Value - (3 * x.Value)) / -5.0;
    if (x.HasValue && y.HasValue && !z.HasValue)
        return 5 - (3 * x.Value) + (5 * y.Value);
    return null;
}
There is no standard way of solving such a problem.
In the general case, symbolic math is a problem solved by purpose-built libraries; Math.NET has a symbolic library you might be interested in: http://symbolics.mathdotnet.com/
Ironically, a much tougher problem, a system of linear equations, can easily be solved by a computer by calculating an inverse matrix. You can set up the provided equation in this manner, but there are no built-in general-purpose matrix classes in .NET.
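To illustrate the linear-system remark, here is a sketch that swaps the explicit inverse for Gaussian elimination with partial pivoting, which solves A x = b with the same result but without forming A^-1; LinearSystem is an illustrative name and the matrix is assumed square and non-singular:

```csharp
using System;

static class LinearSystem
{
    // Solves A x = b for a square, non-singular A by Gaussian elimination with
    // partial pivoting, followed by back substitution.
    public static double[] Solve(double[,] A, double[] b)
    {
        int n = b.Length;
        var m = (double[,])A.Clone();
        var rhs = (double[])b.Clone();
        for (int col = 0; col < n; col++)
        {
            // Use the row with the largest entry in this column as the pivot row.
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.Abs(m[r, col]) > Math.Abs(m[pivot, col])) pivot = r;
            if (m[pivot, col] == 0)
                throw new ArgumentException("Matrix is singular");
            for (int k = 0; k < n; k++)
            {
                double tmp = m[col, k]; m[col, k] = m[pivot, k]; m[pivot, k] = tmp;
            }
            double tmpRhs = rhs[col]; rhs[col] = rhs[pivot]; rhs[pivot] = tmpRhs;
            // Eliminate this column from all rows below the pivot.
            for (int r = col + 1; r < n; r++)
            {
                double f = m[r, col] / m[col, col];
                for (int k = col; k < n; k++) m[r, k] -= f * m[col, k];
                rhs[r] -= f * rhs[col];
            }
        }
        // Back substitution on the upper-triangular system.
        var x = new double[n];
        for (int r = n - 1; r >= 0; r--)
        {
            double s = rhs[r];
            for (int k = r + 1; k < n; k++) s -= m[r, k] * x[k];
            x[r] = s / m[r, r];
        }
        return x;
    }
}
```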
In your specific case, you could use something like this:
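public double SolveForVar(int? x, int? y, int? z)
{
    // 3x - 5y + z = 5: sum the known terms, remember the coefficient of the unknown.
    int unknownCount = 0;
    int unknownCoefficient = 0;
    int currentSum = 0;
    if (x.HasValue)
        currentSum += 3 * x.Value;
    else
    { unknownCount++; unknownCoefficient = 3; }
    if (y.HasValue)
        currentSum += -5 * y.Value;
    else
    { unknownCount++; unknownCoefficient = -5; }
    if (z.HasValue)
        currentSum += z.Value;
    else
    { unknownCount++; unknownCoefficient = 1; }
    if (unknownCount != 1)
        throw new ArgumentException("Exactly one unknown expected");
    return (5.0 - currentSum) / unknownCoefficient;
}
double correctY = SolveForVar(10, null, 3); // 5.6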
Obviously that approach gets unwieldy for large variable counts, and doesn't work if you need lots of dynamic numbers or complex operations, but it could be generalized to a certain extent.
I'm not sure what you are looking for, since the question is tagged symbolic-math but the sample code you have is producing numerical solutions, not symbolic ones.
If you want to find a numerical solution for a more general case, then define a function
f(x, y, z) = 3x - 5y + z - 5
and feed it to a general root-finding algorithm to find the value of the unknown parameter(s) that will produce a root. Most root-finding implementations allow you to lock particular function parameters to fixed values before searching for a root along the unlocked dimensions of the problem.
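As a sketch of that idea, here is the secant method, one of several general root-finding algorithms (the choice is illustrative, not implied by the answer). "Locking" parameters is just a lambda that fixes them; RootFinder.Secant is an illustrative name, and convergence from the two starting guesses is assumed, not guaranteed:

```csharp
using System;

static class RootFinder
{
    // Secant method: starting from two guesses t0 and t1, repeatedly replaces the
    // older guess with the zero of the line through (t0, f(t0)) and (t1, f(t1)).
    public static double Secant(Func<double, double> f, double t0, double t1,
                                double tol = 1e-12, int maxIter = 100)
    {
        double f0 = f(t0), f1 = f(t1);
        for (int i = 0; i < maxIter; i++)
        {
            if (f1 == f0) break;                       // flat secant line: cannot continue
            double t2 = t1 - f1 * (t1 - t0) / (f1 - f0);
            t0 = t1; f0 = f1;
            t1 = t2; f1 = f(t1);
            if (Math.Abs(t1 - t0) < tol) break;        // steps have become negligible
        }
        return t1;
    }
}
```

For the example above, locking x = 10 and z = 3 and solving for y, RootFinder.Secant(t => 3 * 10 - 5 * t + 3 - 5, 0, 1) returns approximately 5.6; the same call also handles non-linear cases such as x^2 - 5x - 7 = 0.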

n-th Root Algorithm

What is the fastest way to calculate the n-th root of a number?
I'm aware of the Try and Fail method, but I need a faster algorithm.
The canonical way to do this is Newton's Method. In case you don't know, the derivative of x^n is n*x^(n-1). This will come in handy. 1 is a good first guess. You want to apply it to the function a - x^n.
IIRC, it's superconvergent on functions of the form a - x^n, but either way, it's quite fast. Also, IIRC, the warning in the wiki about it failing to converge would apply to more complex functions that have properties that the 'nice' functions you are interested in lack.
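A sketch of that Newton iteration in C#, assuming a >= 0 and an integer n >= 1; NewtonRoot is an illustrative name. Applying Newton's method to a - x^n or to x^n - a gives the same update, x := x - (x^n - a) / (n * x^(n-1)):

```csharp
using System;

static class NewtonRoot
{
    // x^n by repeated multiplication (n >= 0).
    static double IntPow(double x, int n)
    {
        double r = 1;
        for (int i = 0; i < n; i++) r *= x;
        return r;
    }

    // n-th root of a (a >= 0, n >= 1) with Newton's method, starting from the guess 1.
    public static double NthRoot(double a, int n, int maxIter = 200)
    {
        if (n < 1 || a < 0) throw new ArgumentOutOfRangeException(n < 1 ? "n" : "a");
        if (a == 0) return 0;
        double x = 1.0;
        for (int i = 0; i < maxIter; i++)
        {
            double dx = (IntPow(x, n) - a) / (n * IntPow(x, n - 1));
            x -= dx;
            if (Math.Abs(dx) <= 1e-15 * x) break;  // relative step is at rounding level
        }
        return x;
    }
}
```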
Not the fastest, but it works. Substitute your chosen type:
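private static decimal NthRoot(decimal baseValue, int N)
{
    if (N == 1)
        return baseValue;
    decimal deltaX;
    decimal x = 0.1M;
    int iterations = 0;
    do
    {
        // Newton step for f(x) = x^N - baseValue
        deltaX = (baseValue / Pow(x, N - 1) - x) / N;
        x = x + deltaX;
    } while (Math.Abs(deltaX) > 0 && ++iterations < 1000); // cap guards against last-digit oscillation
    return x;
}
// x^N by repeated multiplication (N >= 0)
private static decimal Pow(decimal baseValue, int N)
{
    decimal result = 1;
    for (int i = 0; i < N; i++)
        result *= baseValue;
    return result;
}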
Are you referring to the nth root algorithm? This is not a try-and-fail method, but an iterative algorithm which is repeated until the required precision is reached.

Average function without overflow exception

.NET Framework 3.5.
I'm trying to calculate the average of some pretty large numbers.
For instance:
using System;
using System.Linq;
class Program
{
    static void Main(string[] args)
    {
        var items = new long[]
        {
            long.MaxValue - 100,
            long.MaxValue - 200,
            long.MaxValue - 300
        };
        try
        {
            var avg = items.Average();
            Console.WriteLine(avg);
        }
        catch (OverflowException ex)
        {
            Console.WriteLine("can't calculate that!");
        }
        Console.ReadLine();
    }
}
Obviously, the mathematical result is 9223372036854775607 (long.MaxValue - 200), but I get an exception there. This is because the implementation of the Average extension method (on my machine), as inspected by .NET Reflector, is:
public static double Average(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    long num = 0L;
    long num2 = 0L;
    foreach (long num3 in source)
    {
        num += num3;   // OverflowException is thrown here, so this sum runs in a checked context
        num2 += 1L;
    }
    if (num2 <= 0L)
    {
        throw Error.NoElements();
    }
    return (((double) num) / ((double) num2));
}
I know I can use a BigInt library (yes, I know that it is included in .NET Framework 4.0, but I'm tied to 3.5).
But I still wonder if there's a pretty straightforward implementation of calculating the average of integers without an external library. Do you happen to know about such an implementation?
Thanks!!
UPDATE:
The previous example, of three large integers, was just an example to illustrate the overflow issue. The question is about calculating an average of any set of numbers which might sum to a large number that exceeds the type's max value. Sorry about this confusion. I also changed the question's title to avoid additional confusion.
Thanks all!!
This answer used to suggest storing the quotient and remainder (mod count) separately. That solution is less space-efficient and more code-complex.
In order to accurately compute the average, you must keep track of the total. There is no way around this, unless you're willing to sacrifice accuracy. You can try to store the total in fancy ways, but ultimately you must be tracking it if the algorithm is correct.
For single-pass algorithms, this is easy to prove. Suppose you can't reconstruct the total of all preceding items, given the algorithm's entire state after processing those items. But wait: we can simulate the algorithm receiving a series of 0 items until we finish off the sequence. Then we can multiply the result by the count and get the total. Contradiction. Therefore a single-pass algorithm must be tracking the total in some sense.
Therefore the simplest correct algorithm will just sum up the items and divide by the count. All you have to do is pick an integer type with enough space to store the total. Using a BigInteger guarantees no issues, so I suggest using that.
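using System.Collections.Generic;
using System.Numerics; // .NET 4.0 and later

static double Mean(IEnumerable<long> values)
{
    var total = BigInteger.Zero;
    long count = 0;
    foreach (var i in values) { count += 1; total += i; }
    return (double)total / count; // warning: possible loss of accuracy, maybe return a Rational instead?
}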
If you're just looking for an arithmetic mean, you can perform the calculation like this:
public static double Mean(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    double count = (double)source.Count();
    double mean = 0D;
    foreach (long x in source)
    {
        mean += (double)x / count;
    }
    return mean;
}
Edit:
In response to comments, there definitely is a loss of precision this way, due to performing numerous divisions and additions. For the values indicated by the question, this should not be a problem, but it should be a consideration.
You may try the following approach:
Let the number of elements be N, and the numbers be arr[0], ..., arr[N-1].
You need to define 2 variables, mean and remainder, both initially 0.
At step i, update mean and remainder as follows:
mean += arr[i] / N;
remainder += arr[i] % N;
mean += remainder / N;
remainder %= N;
After N steps you will have the whole-number part of the answer in mean, and remainder / N will be the fractional part (I am not sure you need it, but anyway).
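A sketch of those steps, assuming the values are in an IList<long> so N is known up front; RemainderMean is an illustrative name. After every step, mean * N + remainder equals the sum of the values processed so far, so neither variable ever has to hold the full sum:

```csharp
using System.Collections.Generic;

static class RemainderMean
{
    // Returns the whole-number part of the average; the exact average is
    // mean + (double)remainder / values.Count. Requires a non-empty list.
    public static long Mean(IList<long> values, out long remainder)
    {
        long n = values.Count;
        long mean = 0;
        remainder = 0;
        foreach (long v in values)
        {
            mean += v / n;
            remainder += v % n;
            mean += remainder / n;   // carry whole multiples of n into mean
            remainder %= n;          // keeps |remainder| < n
        }
        return mean;
    }
}
```

For the three values in the question it returns long.MaxValue - 200 with a remainder of 0.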
If you know approximately what the average will be (or, at least, that all pairs of numbers will have a max difference < long.MaxValue), you can calculate the average difference from that value instead. I use an example with small numbers, but it works equally well with large ones.
using System.Collections.Generic;
using System.Linq;

// Let's say numbers cannot exceed 40.
List<int> numbers = new List<int>() { 31, 28, 24, 32, 36, 29 }; // Average: 30
List<int> diffs = new List<int>();
// This can probably be done more effectively in LINQ, but to show the idea:
foreach (int number in numbers)
{
    diffs.Add(number - numbers.First());
}
// diffs now contains { 0, -3, -7, 1, 5, -2 }
var avgDiff = diffs.Sum() / diffs.Count; // -6 / 6 == -1 (integer division truncates)
// To get the average value, just add the average diff to the first value:
var totalAverage = numbers.First() + avgDiff; // 30
You can of course implement this in some way that makes it easier to reuse, for example as an extension method to IEnumerable<long>.
Here is how I would do it, given this problem. First, let's define a very simple RationalNumber class, which contains two properties, Dividend and Divisor, and an operator for adding two rational numbers. Here is how it looks:
using System;

public sealed class RationalNumber
{
    public RationalNumber()
    {
        this.Divisor = 1;
    }
    public static RationalNumber operator +(RationalNumber c1, RationalNumber c2)
    {
        RationalNumber result = new RationalNumber();
        Int64 nDividend = (c1.Dividend * c2.Divisor) + (c2.Dividend * c1.Divisor);
        Int64 nDivisor = c1.Divisor * c2.Divisor;
        Int64 nRemainder = nDividend % nDivisor;
        if (nRemainder == 0)
        {
            // The number is whole
            result.Dividend = nDividend / nDivisor;
        }
        else
        {
            Int64 nGreatestCommonDivisor = FindGreatestCommonDivisor(nDividend, nDivisor);
            if (nGreatestCommonDivisor != 0)
            {
                nDividend = nDividend / nGreatestCommonDivisor;
                nDivisor = nDivisor / nGreatestCommonDivisor;
            }
            result.Dividend = nDividend;
            result.Divisor = nDivisor;
        }
        return result;
    }
    private static Int64 FindGreatestCommonDivisor(Int64 a, Int64 b)
    {
        Int64 nRemainder;
        while (b != 0)
        {
            nRemainder = a % b;
            a = b;
            b = nRemainder;
        }
        return a;
    }
    // a / b: a is the dividend, b is the divisor
    public Int64 Dividend { get; set; }
    public Int64 Divisor { get; set; }
}
The second part is really easy. Let's say we have an array of numbers. Their average is Sum(Numbers) / Length(Numbers), which is the same as Number[0] / Length + Number[1] / Length + ... + Number[n - 1] / Length. To be able to calculate this, we represent each Number[i] / Length as a whole number plus a rational part (the remainder). Here is how it looks:
Int64[] aValues = new Int64[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };
List<RationalNumber> list = new List<RationalNumber>();
Int64 nAverage = 0;
for (Int32 i = 0; i < aValues.Length; ++i)
{
    Int64 nRemainder = aValues[i] % aValues.Length;
    Int64 nWhole = aValues[i] / aValues.Length;
    nAverage += nWhole;
    if (nRemainder != 0)
    {
        list.Add(new RationalNumber() { Dividend = nRemainder, Divisor = aValues.Length });
    }
}
RationalNumber rationalTotal = new RationalNumber();
foreach (var rational in list)
{
    rationalTotal += rational;
}
nAverage = nAverage + (rationalTotal.Dividend / rationalTotal.Divisor);
At the end we have a list of rational numbers and a whole number, which we sum together to get the average of the sequence without an overflow. The same approach can be taken for any type without overflowing it. The whole part is exact; note that the last line truncates the rational part, so keep rationalTotal if you need the fraction without loss of precision.
EDIT:
Why this works:
Define A as a set of numbers.
Since Average(A) = SUM(A) / LEN(A), it follows that
Average(A) = A[0] / LEN(A) + A[1] / LEN(A) + A[2] / LEN(A) + ... + A[N] / LEN(A)
Dividing each A[i] by LEN(A) gives a whole quotient X_i and a remainder Y_i, so A[i] / LEN(A) = X_i + (Y_i / LEN(A)), where Y_i / LEN(A) is a rational number.
So
Average(A) = (X_0 + X_1 + ... + X_N) + (Y_0 / LEN(A) + Y_1 / LEN(A) + ... + Y_N / LEN(A))
Sum the whole parts, and sum the remainders by keeping them in rational-number form. In the end we get one whole number and one rational number, which summed together give Average(A). Depending on the precision you'd like, you apply the division only to the rational number at the end.
Simple answer with LINQ...
var data = new[] { int.MaxValue, int.MaxValue, int.MaxValue };
var mean = (int)data.Select(d => (double)d / data.Count()).Sum();
Depending on the size of the set of data, you may want to force data.ToList() or .ToArray() before you run this, so it can't requery the count on each pass. (Or you can compute the count once, before the .Select(..).Sum().)
If you know in advance that all your numbers are going to be 'big' (in the sense of 'much nearer long.MaxValue than zero'), you can calculate the average of their distance from long.MaxValue; the average of the numbers is then long.MaxValue minus that.
However, this approach will fail if (m)any of the numbers are far from long.MaxValue, so it's horses for courses...
I guess there has to be a compromise somewhere or other. If the numbers really get that large, then a few of the lowest-order digits (say, the lowest 5 digits) might not affect the result much.
Another issue is when you don't know the size of the incoming dataset, especially in streaming/real-time cases. Here I don't see any solution other than
(previousAverage*oldCount + newValue) / (oldCount <- oldCount+1)
Here's a suggestion:
// Sketch assuming decimal as both the "largest data type possible" and the type supporting fractional values.
static decimal currentAverage;
static long count;

static void AddToCurrentAverage(long value)
{
    decimal newValue = value / 100000m; // scale down so the running products stay small
    count = count + 1;
    currentAverage = (currentAverage * (count - 1) + newValue) / count;
}
static decimal GetCurrentAverage()
{
    return currentAverage * 100000;
}
Averaging numbers of a specific numeric type in a safe way while also only using that numeric type is actually possible, although I would advise using the help of BigInteger in a practical implementation. I created a project for Safe Numeric Calculations that has a small structure (Int32WithBoundedRollover) which can sum up to 2^32 int32s without any overflow (the structure internally uses two int32 fields to do this, so no larger data types are used).
Once you have this sum you then need to calculate sum/total to get the average, which you can do (although I wouldn't recommend it) by creating and then incrementing by total another instance of Int32WithBoundedRollover. After each increment you can compare it to the sum until you find out the integer part of the average. From there you can peel off the remainder and calculate the fractional part. There are likely some clever tricks to make this more efficient, but this basic strategy would certainly work without needing to resort to a bigger data type.
That being said, the current implementation isn't built for this (for instance there is no comparison operator on Int32WithBoundedRollover, although it wouldn't be too hard to add). The reason is that it is just much simpler to use BigInteger at the end to do the calculation. Performance-wise this doesn't matter too much for large averages since it will only be done once, and it is just too clean and easy to understand to worry about coming up with something clever (at least so far...).
As far as your original question which was concerned with the long data type, the Int32WithBoundedRollover could be converted to a LongWithBoundedRollover by just swapping int32 references for long references and it should work just the same. For Int32s I did notice a pretty big difference in performance (in case that is of interest). Compared to the BigInteger only method the method that I produced is around 80% faster for the large (as in total number of data points) samples that I was testing (the code for this is included in the unit tests for the Int32WithBoundedRollover class). This is likely mostly due to the difference between the int32 operations being done in hardware instead of software as the BigInteger operations are.
How about BigInteger in Visual J#?
If you're willing to sacrifice precision, you could do something like:
long num2 = 0L;
foreach (long num3 in source)
{
    num2 += 1L;
}
if (num2 <= 0L)
{
    throw new InvalidOperationException("Sequence contains no elements");
}
double average = 0;
foreach (long num3 in source)
{
    average += (double)num3 / (double)num2;
}
return average;
Perhaps you can shrink every item first (divide it by the number of elements), average the adjusted values, and then multiply the result by the number of elements. However, the result can differ slightly from the true average: i / items.Count() is integer division, so each item's remainder is discarded, and the final multiplication happens in floating point.
var items = new long[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };
var avg = items.Average(i => i / items.Count()) * items.Count();
Use the IntX library on CodePlex.
You could keep a rolling average which you update once for each large number:
NextAverage = CurrentAverage + (NewValue - CurrentAverage) / (CurrentObservations + 1)
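A sketch of that rolling update in double precision; RollingMean is an illustrative name. Doing the arithmetic in double means values near long.MaxValue cannot overflow, at the cost of double's roughly 15-16 significant digits:

```csharp
using System.Collections.Generic;

static class RollingMean
{
    // avg_k = avg_(k-1) + (x_k - avg_(k-1)) / k, updated once per value.
    public static double Mean(IEnumerable<long> values)
    {
        double avg = 0;
        long k = 0;
        foreach (long v in values)
        {
            k++;
            avg += ((double)v - avg) / k;
        }
        return avg;   // 0 for an empty sequence
    }
}
```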
Here is my version of an extension method that can help with this.
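public static long Average(this IEnumerable<long> longs)
{
    // Each val / count is integer division, so remainders are dropped and the result
    // can differ from the true average (e.g. { 1, 2 } gives 1 rather than 1.5).
    long mean = 0;
    long count = longs.Count();
    foreach (var val in longs)
    {
        mean += val / count;
    }
    return mean;
}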
Let Avg(n) be the average of the first n numbers, and data[n] the nth number.
Avg(n)=(double)(n-1)/(double)n*Avg(n-1)+(double)data[n]/(double)n
This avoids overflow, but loses precision when n is very large.
For two positive numbers (or two negative numbers), I found a very elegant solution from here, where an average computation of (a+b)/2 can be replaced with a + ((b - a) / 2).
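A sketch of that formula for long; SafeMidpoint is an illustrative name. When a and b have the same sign, b - a cannot overflow. Integer division truncates toward zero, so for an odd difference the result rounds toward a:

```csharp
static class SafeMidpoint
{
    // (a + b) / 2 rewritten as a + (b - a) / 2; safe from overflow when a and b share a sign.
    public static long Average(long a, long b) => a + (b - a) / 2;
}
```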
