Merging approximately equal points in dataset - C#

I'm looking for an algorithm that can quickly run through a short (<30 element) array and merge points that are approximately equal. It'll probably end up being some sort of segmentation algorithm.
The context is as follows: I'm looking for the tallest peaks in a dataset. I've already separated the tallest maximums from the dross using a one-dimensional implementation of J-SEG, but anywhere where the dataset is "flat," I get back a point for every element along the plateau. I need to be able to adaptively merge these points to a single point at the center of the plateau. (This also means I don't know how many clusters there will be.)
Sample dataset 1 (Sample/Artificial input)
Input:
97 54686024814922.8
118 406406320535.935
148 24095826539423.7
152 1625624905272.95
160 1625625128029.81
166 1625625152145.47
176 1625625104745.48
179 1625625127365.09
183 1625625152208.44
190 1625624974205.81
194 21068100428092.9
247 54686024895222.1
Ideal Output:
97 54686024814922.8
118 406406320535.935
148 24095826539423.7
159 1625625061816.08
182 1625625089631.21
194 21068100428092.9
247 54686024895222.1
Sample dataset 2 (Real input):
Input:
2 196412376940671
123 206108518197124
135 194488685387149
148 178463949513298
154 192912098976702
156 195042451997727
161 195221254214493
168 204760073508681
172 189240741651297
182 191554457423846
187 215014126955355
201 202294866774063
Ideal output:
2 196412376940671
123 206108518197124
135 194488685387149
148 178463949513298
157 194391935062974
168 204760073508681
172 189240741651297
182 191554457423846
187 215014126955355
201 202294866774063
Sample Dataset 3 (Real input)
Input:
2 299777367852602
26 263467434856928
35 293412234811901
83 242768805551742
104 226333969841383
107 227548774800053
178 229173574175201
181 229224441416751
204 244334414017228
206 245258151638118
239 198782930497571
Ideal output:
2 299777367852602
26 263467434856928 (May be merged
35 293412234811901 depending on parameters)
83 242768805551742
105.5 226941372320718
179.5 229199007795976
205 244796282827673
239 198782930497571
(Will edit in further information as needed.)

I'm not sure if this is exactly what you want, but there aren't any other answers posted yet so here we go.
I looked at it from the perspective of a graph. If I were looking at a graph and wanted to determine which points were horizontally similar, that would end up being relative to the graph's scale. So I made a function that accepts the percentage of the scale within which points should be considered the same. It multiplies that percentage by the maximum difference in your dataset to get an epsilon.
Additionally, each value is always compared against the average of the plateau located so far. Once a plateau is detected to end, the function averages the first and last X to get the middle, takes the average Y value, and adds that as a single data point.
Without access to good sample data, all I have to go on is the very poor data generator I wrote. But in my testing, values within 1% generally eliminated about half of my data points.
It's important to note that this is one-dimensional: X distance is completely ignored. You could easily expand it to two dimensions as well. Another option I considered is that, instead of outputting a single data point to represent a plateau, you could output its start and end points along with the average.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

namespace PointCondenser
{
    public static class Extensions
    {
        public static bool AlmostEqual<T>(this T value, T value2, T epsilon)
        {
            return Math.Abs((dynamic)value - value2) < epsilon;
        }
    }

    public struct Point
    {
        public Point(double x, double y)
        {
            X = x;
            Y = y;
        }

        public override string ToString()
        {
            return string.Format("{0}\t{1}", X, Y);
        }

        public double X;
        public double Y;
    }

    class Program
    {
        public static Point RandomYPoint(int i)
        {
            var r = new Random();
            var r2 = new Random(i);
            var variance = r2.NextDouble() / 100;
            return new Point(i, Math.Abs(r.NextDouble() - variance) * 100);
        }

        public static IEnumerable<Point> SmoothPoints(IEnumerable<Point> points, double percent)
        {
            if (percent <= 0 || percent >= 1)
                throw new ArgumentOutOfRangeException("percent", "Percentage outside of logical bounds");

            var final = new List<Point>();
            var apoints = points.ToArray();

            // Epsilon is the given percentage of the dataset's full Y range.
            var largestDifference = apoints.Max(x => x.Y) - apoints.Min(x => x.Y);
            var epsilon = largestDifference * percent;

            var currentPlateau = new List<Point> { apoints[0] };
            for (var i = 1; i < apoints.Length; ++i)
            {
                var point = apoints[i];

                // Compare against the running average of the current plateau.
                if (point.Y.AlmostEqual(currentPlateau.Average(x => x.Y), epsilon))
                    currentPlateau.Add(point);
                else
                {
                    // Plateau ended: emit its midpoint X and average Y.
                    var x = (currentPlateau[0].X + currentPlateau[currentPlateau.Count - 1].X) / 2.0;
                    var y = currentPlateau.Average(z => z.Y);

                    currentPlateau.Clear();
                    currentPlateau.Add(point);
                    final.Add(new Point(x, y));
                }
            }

            // Flush the final plateau so the last run of points is not lost.
            if (currentPlateau.Count > 0)
            {
                var lastX = (currentPlateau[0].X + currentPlateau[currentPlateau.Count - 1].X) / 2.0;
                final.Add(new Point(lastX, currentPlateau.Average(z => z.Y)));
            }

            return final;
        }

        static void Main(string[] args)
        {
            var r = new Random();
            var points = new List<Point>();

            for (var i = 0; i < 100; ++i)
            {
                for (var n = 0; n < r.Next(1, 5); ++n)
                {
                    var p = RandomYPoint(points.Count);
                    points.Add(p);
                    Console.WriteLine(p);
                }
                Thread.Sleep(r.Next(10, 250));
            }

            Console.Write("\n\n Condensed \n\n");
            var newPoints = SmoothPoints(points, .01);
            foreach (var p in newPoints)
                Console.WriteLine(p);
        }
    }
}

Another approach to clustering without parameters is to merge the closest data points: in each pass you find the smallest gap between two data points and merge the pair separated by that gap.
With each pass the granularity decreases. Finding the smallest gap can be expensive, however, unless the data points are kept sorted by the attribute you compare.
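For illustration, a single merge pass might look like the sketch below (it reuses the Point struct from the earlier answer; the stopGap threshold is an addition for this sketch, since the description above leaves the stopping rule open):

// Minimal sketch of one merge pass, assuming the list is kept sorted by Y
// (the attribute being compared). stopGap is an invented threshold: when
// even the closest pair is further apart than this, merging stops.
static List<Point> MergeClosestPass(List<Point> points, double stopGap)
{
    if (points.Count < 2)
        return points;

    // Find the adjacent pair with the smallest Y gap.
    var best = 1;
    for (var i = 2; i < points.Count; ++i)
        if (points[i].Y - points[i - 1].Y < points[best].Y - points[best - 1].Y)
            best = i;

    // Even the closest pair is too far apart: nothing left to merge.
    if (points[best].Y - points[best - 1].Y > stopGap)
        return points;

    // Replace the pair with a single midpoint.
    var merged = new Point(
        (points[best - 1].X + points[best].X) / 2.0,
        (points[best - 1].Y + points[best].Y) / 2.0);
    points[best - 1] = merged;
    points.RemoveAt(best);
    return points;
}

Calling this in a loop until it stops shrinking the list gives the pass-by-pass coarsening described above.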

In retrospect, I could also have done this with linear regression: if the slope is close to zero, and the slope to the next point is similar to the average slope of the previous points on the plateau, then register the next point for merging and continue.
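A rough sketch of that test (the flatEps and slopeEps thresholds are invented for illustration, not tuned values):

// Rough sketch of the slope-based membership test described above.
static bool BelongsToPlateau(List<Point> plateau, Point next, double flatEps, double slopeEps)
{
    var first = plateau[0];
    var last = plateau[plateau.Count - 1];
    var slopeToNext = (next.Y - last.Y) / (next.X - last.X);

    // Average slope across the plateau collected so far (0 for a single point).
    var avgSlope = plateau.Count > 1
        ? (last.Y - first.Y) / (last.X - first.X)
        : 0.0;

    // The step must be nearly flat and consistent with the plateau so far.
    return Math.Abs(slopeToNext) < flatEps
        && Math.Abs(slopeToNext - avgSlope) < slopeEps;
}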

Related

Check if root of cubic equation is complex or not?

I use this Cubic root implementation.
I have equation #1:
x³ -2 x² -5 x + 6 = 0
It gives me 3 complex roots ({real, imaginary}):
{-2, 7.4014868308343765E-17}
{1 , -2.9605947323337506E-16}
{3 , 2.9605947323337506E-16}
But in fact, the right result should be 3 non-complex roots: -2, 1, 3.
In this case I can test it: applying the 3 complex roots to the equation returns a non-zero result (failed); applying the 3 real roots returns zero (passed).
But there is a case where applying both the 3 complex roots and the 3 real roots to the equation (e.g. 47 x³ + 7 x² - 52 x + 0 = 0) returns non-zero (failed).
I think what causes this issue is this code:
/// <summary>
/// Evaluate all cubic roots of this <c>Complex</c>.
/// </summary>
public static (Complex, Complex, Complex) CubicRoots(this Complex complex)
{
    var r = Math.Pow(complex.Magnitude, 1d/3d);
    var theta = complex.Phase/3;
    const double shift = Constants.Pi2/3;
    return (Complex.FromPolarCoordinates(r, theta),
            Complex.FromPolarCoordinates(r, theta + shift),
            Complex.FromPolarCoordinates(r, theta - shift));
}
I know that a floating point value can lose precision during calculation (~1E-15), but the problem is that the imaginary part needs to be decided as zero or non-zero to tell whether the root is a complex number or not.
I can't tell the user of my app: "hey user, if you see the imaginary part is close enough to 0, you can decide for yourself that the root's not a complex number".
Currently, I use this method to check:
const int TOLERATE = 15;
bool isRemoveImaginary = System.Math.Round(root.Imaginary, TOLERATE) == 0; //Remove imaginary if it's too close to zero
But I don't know if this method is appropriate; what if TOLERATE = 15 is not enough? Or is it the right method to solve this problem?
So I want to ask, is there any better way to tell the root is complex or not?
Thank you Mark Dickinson.
So according to Wikipedia:
delta > 0: the cubic has three distinct real roots
delta < 0: the cubic has one real root and two non-real complex
conjugate roots.
The delta is D = (B*B - 4*A*A*A) / (-27*a*a), where A = b*b - 3*a*c and B = 2*b*b*b - 9*a*b*c + 27*a*a*d.
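For reference, a small helper for that check might look like this (a sketch using the equivalent expanded form of the discriminant for a x³ + b x² + c x + d):

// Sketch: expanded discriminant of a*x^3 + b*x^2 + c*x + d.
// delta > 0: three distinct real roots;
// delta < 0: one real root and two complex conjugate roots.
static double CubicDiscriminant(double a, double b, double c, double d)
{
    return 18 * a * b * c * d
         - 4 * b * b * b * d
         + b * b * c * c
         - 4 * a * c * c * c
         - 27 * a * a * d * d;
}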
My idea is:
delta > 0: remove the imaginary parts of all 3 roots.
delta < 0: find the real root, then remove its imaginary part if any (to make sure it's real), and leave the other 2 roots untouched.
Now I have 2 ideas to find the real root:
Idea #1
In theory, the real root should have imaginary = 0, but due to floating point precision, imaginary can deviate from 0 a little (e.g. imaginary = 1E-15 instead of 0). So the idea is: the 1 real root among 3 roots should have the imaginary whose value is closest to 0.
Code:
NumComplex[] arrRoot = { x1, x2, x3 };
if (delta > 0)
{
    for (var idxRoot = 0; idxRoot < arrRoot.Length; ++idxRoot)
        arrRoot[idxRoot] = arrRoot[idxRoot].RemoveImaginary();
}
else
{
    // The root with imaginary closest to 0 should be the real root,
    // the other two should be non-real.
    var realRootIdx = 0;
    var absClosest = double.MaxValue;
    double abs;
    for (var idxRoot = 0; idxRoot < arrRoot.Length; ++idxRoot)
    {
        abs = System.Math.Abs(arrRoot[idxRoot].GetImaginary());
        if (abs < absClosest)
        {
            absClosest = abs;
            realRootIdx = idxRoot;
        }
    }
    arrRoot[realRootIdx] = arrRoot[realRootIdx].RemoveImaginary();
}
The code above can be wrong if there are 3 roots ({real, imaginary}) like this:
{7, -1E-99}
{3, 1E-15}//1E-15 caused by floating point precision, 1E-15 should be 0
{7, 1E-99}//My code will mistake this because this is closer to 0 than 1E-15.
Maybe if that case does happen in real life, I will come up with a better way to pick the real root.
Idea #2
Take a look at how the 3 roots calculated:
x1 = FromPolarCoordinates(r, theta);
x2 = FromPolarCoordinates(r, theta + shift);
x3 = FromPolarCoordinates(r, theta - shift);
The 3 roots have the form (known from tests, not proven mathematically):
x1 = { A }
x2 = { B, C }
x3 = { B, -C }
Use math knowledge to prove which one among the 3 roots is the real one.
Trial #1: Maybe the root x1 = FromPolarCoordinates(r, theta) is always the real one? (failed) Untrue, because the following case disproves that guess: -53 x³ + 6 x² + 14 x - 54 = 0 (thanks again, Mark Dickinson).
I don't know if math can prove something like: while delta < 0, if B < 0 then x3 is real, else x1 is real.
So until I get a better idea, I'll just use Idea #1.

How to interpolate through 3 points/numbers with a defined number of samples? (in c#)

So, for example, if we have 1, 5, and 10 and we want to interpolate between these with 12 points, we should get:
1.0000
1.7273
2.4545
3.1818
3.9091
4.6364
5.4545
6.3636
7.2727
8.1818
9.0909
10.0000
Say we have 5, 10, and 4, and again 12 points; we should get:
5.0000
5.9091
6.8182
7.7273
8.6364
9.5455
9.4545
8.3636
7.2727
6.1818
5.0909
4.0000
This is a generalized solution that works by these principles:
Performs linear interpolation
It calculates a "floating point index" into the input array
This index is used to select 1 (if the fractional part is very close to 0) or 2 numbers from the input array
The integer part of this index is the base input array index
The fractional part says how far towards the next array element we should move
This should work with whatever size input arrays and output collections you would need.
public IEnumerable<double> Interpolate(double[] inputs, int count)
{
    double maxCountForIndexCalculation = count - 1;
    for (int index = 0; index < count; index++)
    {
        double floatingIndex = (index / maxCountForIndexCalculation) * (inputs.Length - 1);
        int baseIndex = (int)floatingIndex;
        double fraction = floatingIndex - baseIndex;
        if (Math.Abs(fraction) < 1e-5)
            yield return inputs[baseIndex];
        else
        {
            double delta = inputs[baseIndex + 1] - inputs[baseIndex];
            yield return inputs[baseIndex] + fraction * delta;
        }
    }
}
It produces the two collections of outputs you showed in your question, but beyond that I have not tested it. Little error checking is performed, so you should add the necessary bits.
The problem is an interpolation of two straight lines with different slopes given the end points and the intersection.
Interpolation is defined as following : In the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points.
I'm tired of people giving negative points for solutions to hard problems. This is not a simple problem, but one that requires thinking outside the box. Let's look at the solution for the following input: 1 12 34
I picked these numbers because the results are all integers
The step size L (Lower) = distance of elements from 1 to 12 = 2
The step size H (Higher) = distance of elements from 12 to 34 = 4
So the answer is : 1 3 5 7 9 11 [12] 14 18 22 26 30 34
Notice the distance between the 6th point 11 and center is 1 (half of L)
Notice the distance between the center point 12 and the 7th point is 2 (half of H)
Finally notice the distance between the 6th and 7th points is 3.
My results are scaled exactly the same as the OP's first example.
It is hard to see the sequence with the fractional inputs the OP posted. If you look at the OP's first example and calculate the step distance of the first 6 points, you get 0.72; for the last 6 points the distance is 0.91. Then the distance from the 6th point to the center is 0.36 (half of 0.72), and from the center to the 7th point 0.45 (half of 0.91). Excuse me for rounding the numbers a little bit.
It is a sequence problem just like in junior high school, where you learned arithmetic and geometric sequences. Then as a bonus question you got the sequence 23, 28, 33, 42, 51, 59, 68, 77, 86, which turns out to be the train stations on the NYC 3rd Ave subway system. Solving problems like this you need to think "outside the box," which comes from the tests IBM gives to job applicants. These are the people who can solve the Nine Point Problem: http://www.brainstorming.co.uk/puzzles/ninedotsnj.html
I handled the case where the number of points is EVEN, which in your case is 12. You will need to complete the code if the number of points is ODD.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        const int NUMBER_POINTS = 12;

        static void Main(string[] args)
        {
            List<List<float>> tests = new List<List<float>>() {
                new List<float>() { 1, 5, 10 },
                new List<float>() { 5, 10, 4 }
            };
            foreach (List<float> test in tests)
            {
                List<float> output = new List<float>();
                float midPoint = test[1];
                if (NUMBER_POINTS % 2 == 0)
                {
                    // Even number of points.
                    // Add the lower-half numbers.
                    float lowerDelta = (test[1] - test[0]) / ((NUMBER_POINTS / 2) - .5F);
                    for (int i = 0; i < NUMBER_POINTS / 2; i++)
                    {
                        output.Add(test[0] + (i * lowerDelta));
                    }
                    float upperDelta = (test[2] - test[1]) / ((NUMBER_POINTS / 2) - .5F);
                    for (int i = 0; i < NUMBER_POINTS / 2; i++)
                    {
                        output.Add(test[1] + (i * upperDelta) + (upperDelta / 2F));
                    }
                }
                else
                {
                }
                Console.WriteLine("Numbers = {0}", string.Join(" ", output.Select(x => x.ToString())));
            }
            Console.ReadLine();
        }
    }
}

Fast Exp calculation: possible to improve accuracy without losing too much performance?

I am trying out the fast Exp(x) function that previously was described in this answer to an SO question on improving calculation speed in C#:
public static double Exp(double x)
{
    var tmp = (long)(1512775 * x + 1072632447);
    return BitConverter.Int64BitsToDouble(tmp << 32);
}
The expression uses some IEEE floating point "tricks" and is primarily intended for use in neural nets. The function is approximately 5 times faster than the regular Math.Exp(x) function.
Unfortunately, the numeric accuracy is only -4% to +2% relative to the regular Math.Exp(x) function; ideally I would like accuracy within at least the sub-percent range.
I have plotted the quotient between the approximate and the regular Exp functions, and as can be seen in the graph the relative difference appears to be repeated with practically constant frequency.
Is it possible to take advantage of this regularity to improve the accuracy of the "fast exp" function further without substantially reducing the calculation speed, or would the computational overhead of an accuracy improvement outweigh the computational gain of the original expression?
(As a side note, I have also tried one of the alternative approaches proposed in the same SO question, but this approach does not seem to be computationally efficient in C#, at least not for the general case.)
UPDATE MAY 14
Upon request from Adriano, I have now performed a very simple benchmark. I performed 10 million computations using each of the alternative exp functions for floating point values in the range [-100, 100]. Since the range of values I am interested in spans from -20 to 0, I have also explicitly listed the function value at x = -5. Here are the results:
Math.Exp: 62.525 ms, exp(-5) = 0.00673794699908547
Empty function: 13.769 ms
ExpNeural: 14.867 ms, exp(-5) = 0.00675211846828461
ExpSeries8: 15.121 ms, exp(-5) = 0.00641270968867667
ExpSeries16: 32.046 ms, exp(-5) = 0.00673666189488182
exp1: 15.062 ms, exp(-5) = -12.3333325982094
exp2: 15.090 ms, exp(-5) = 13.708332516253
exp3: 16.251 ms, exp(-5) = -12.3333325982094
exp4: 17.924 ms, exp(-5) = 728.368055056781
exp5: 20.972 ms, exp(-5) = -6.13293614238501
exp6: 24.212 ms, exp(-5) = 3.55518353166184
exp7: 29.092 ms, exp(-5) = -1.8271053775984
exp7 +/-: 38.482 ms, exp(-5) = 0.00695945286970704
ExpNeural is equivalent to the Exp function specified at the beginning of this text. ExpSeries8 is the formulation that I originally claimed was not very efficient on .NET; when implementing it exactly like Neil did, it was actually very fast. ExpSeries16 is the analogous formula but with 16 multiplications instead of 8. exp1 through exp7 are the different functions from Adriano's answer below. The final variant of exp7 checks the sign of x; if negative, the function returns 1/exp(-x) instead.
Unfortunately, none of the expN functions listed by Adriano are sufficient in the broader negative value range I am considering. The series expansion approach by Neil Coffey seems more suitable for "my" value range, although it diverges too rapidly for larger negative x, especially when using "only" 8 multiplications.
Taylor series approximations (such as the expX() functions in Adriano's answer) are most accurate near zero and can have huge errors at -20 or even -5. If the input has a known range, such as -20 to 0 like the original question, you can use a small look up table and one additional multiply to greatly improve accuracy.
The trick is to recognize that exp() can be separated into integer and fractional parts. For example:
exp(-2.345) = exp(-2.0) * exp(-0.345)
The fractional part will always be between -1 and 1, so a Taylor series approximation will be pretty accurate. The integer part has only 21 possible values for exp(-20) to exp(0), so these can be stored in a small look up table.
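A sketch of that split for inputs in [-20, 0] might look like the following (the table size and the degree-4 Taylor polynomial are illustrative choices; a minimax polynomial would do better):

// Sketch of the integer/fraction split for x in [-20, 0].
// The table holds exp(-k) for k = 0..20; the fraction uses a short
// Taylor polynomial, accurate because |frac| <= 1.
static readonly double[] ExpIntTable = BuildExpIntTable();

static double[] BuildExpIntTable()
{
    var table = new double[21];
    for (int k = 0; k <= 20; ++k)
        table[k] = Math.Exp(-k);
    return table;
}

static double ExpSplit(double x) // assumes -20 <= x <= 0
{
    int intPart = (int)Math.Ceiling(x);   // e.g. -2 for x = -2.345
    double frac = x - intPart;            // in (-1, 0], e.g. -0.345
    double f = 1 + frac * (1 + frac * (0.5 + frac * (1.0 / 6 + frac / 24)));
    return ExpIntTable[-intPart] * f;     // exp(intPart) * exp(frac)
}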
Try the following alternatives (exp1 is faster, exp7 is more precise).
Code
public static double exp1(double x) {
return (6+x*(6+x*(3+x)))*0.16666666f;
}
public static double exp2(double x) {
return (24+x*(24+x*(12+x*(4+x))))*0.041666666f;
}
public static double exp3(double x) {
return (120+x*(120+x*(60+x*(20+x*(5+x)))))*0.0083333333f;
}
public static double exp4(double x) {
return (720+x*(720+x*(360+x*(120+x*(30+x*(6+x))))))*0.0013888888f;
}
public static double exp5(double x) {
return (5040+x*(5040+x*(2520+x*(840+x*(210+x*(42+x*(7+x)))))))*0.00019841269f;
}
public static double exp6(double x) {
return (40320+x*(40320+x*(20160+x*(6720+x*(1680+x*(336+x*(56+x*(8+x))))))))*2.4801587301e-5;
}
public static double exp7(double x) {
return (362880+x*(362880+x*(181440+x*(60480+x*(15120+x*(3024+x*(504+x*(72+x*(9+x)))))))))*2.75573192e-6;
}
Precision
Function   Error in [-1...1]        Error in [-3.14...3.14]
exp1       0.05        1.8%         8.8742    38.40%
exp2       0.01        0.36%        4.8237    20.80%
exp3       0.0016152   0.59%        2.28      9.80%
exp4       0.0002263   0.0083%      0.9488    4.10%
exp5       0.0000279   0.001%       0.3516    1.50%
exp6       0.0000031   0.00011%     0.1172    0.50%
exp7       0.0000003   0.000011%    0.0355    0.15%
Credits
These implementations of exp() have been calculated by "scoofy" using Taylor series from a tanh() implementation by "fuzzpilz" (whoever they are, I just had these references in my code).
In case anyone wants to replicate the relative error function shown in the question, here's a way using Matlab (the "fast" exponent is not very fast in Matlab, but it is accurate):
t = 1072632447+[0:ceil(1512775*pi)];
x = (t - 1072632447)/1512775;
ex = exp(x);
t = uint64(t);
import java.lang.Double;
et = arrayfun( @(n) java.lang.Double.longBitsToDouble(bitshift(n,32)), t );
plot(x, et./ex);
Now, the period of the error exactly coincides with when the binary value of tmp overflows from the mantissa into the exponent. Let's break our data into bins by discarding the bits that become the exponent (making it periodic), and keeping only the high eight remaining bits (to make our lookup table a reasonable size):
index = bitshift(bitand(t,uint64(2^20-2^12)),-12) + 1;
Now we calculate the mean required adjustment:
relerrfix = ex./et;
adjust = NaN(1,256);
for i=1:256; adjust(i) = mean(relerrfix(index == i)); end;
et2 = et .* adjust(index);
The relative error is decreased to +/- .0006. Of course, other table sizes are possible as well (for example, a 6-bit table with 64 entries gives +/- .0025), and the error is almost linear in table size. Linear interpolation between table entries would improve the error yet further, but at the expense of performance. Since we've already met the accuracy goal, let's avoid any further performance hits.
At this point it's some trivial editor skills to take the values computed by MatLab and create a lookup table in C#. For each computation, we add a bitmask, table lookup, and double-precision multiply.
static double FastExp(double x)
{
    var tmp = (long)(1512775 * x + 1072632447);
    int index = (int)(tmp >> 12) & 0xFF;
    return BitConverter.Int64BitsToDouble(tmp << 32) * ExpAdjustment[index];
}
The speedup is very similar to the original code -- for my computer, this is about 30% faster compiled as x86 and about 3x as fast for x64. With mono on ideone, it's a substantial net loss (but so is the original).
Complete source code and testcase: http://ideone.com/UwNgx
using System;
using System.Diagnostics;
namespace fastexponent
{
class Program
{
static double[] ExpAdjustment = new double[256] {
1.040389835,
1.039159306,
1.037945888,
1.036749401,
1.035569671,
1.034406528,
1.033259801,
1.032129324,
1.031014933,
1.029916467,
1.028833767,
1.027766676,
1.02671504,
1.025678708,
1.02465753,
1.023651359,
1.022660049,
1.021683458,
1.020721446,
1.019773873,
1.018840604,
1.017921503,
1.017016438,
1.016125279,
1.015247897,
1.014384165,
1.013533958,
1.012697153,
1.011873629,
1.011063266,
1.010265947,
1.009481555,
1.008709975,
1.007951096,
1.007204805,
1.006470993,
1.005749552,
1.005040376,
1.004343358,
1.003658397,
1.002985389,
1.002324233,
1.001674831,
1.001037085,
1.000410897,
0.999796173,
0.999192819,
0.998600742,
0.998019851,
0.997450055,
0.996891266,
0.996343396,
0.995806358,
0.995280068,
0.99476444,
0.994259393,
0.993764844,
0.993280711,
0.992806917,
0.992343381,
0.991890026,
0.991446776,
0.991013555,
0.990590289,
0.990176903,
0.989773325,
0.989379484,
0.988995309,
0.988620729,
0.988255677,
0.987900083,
0.987553882,
0.987217006,
0.98688939,
0.98657097,
0.986261682,
0.985961463,
0.985670251,
0.985387985,
0.985114604,
0.984850048,
0.984594259,
0.984347178,
0.984108748,
0.983878911,
0.983657613,
0.983444797,
0.983240409,
0.983044394,
0.982856701,
0.982677276,
0.982506066,
0.982343022,
0.982188091,
0.982041225,
0.981902373,
0.981771487,
0.981648519,
0.981533421,
0.981426146,
0.981326648,
0.98123488,
0.981150798,
0.981074356,
0.981005511,
0.980944219,
0.980890437,
0.980844122,
0.980805232,
0.980773726,
0.980749562,
0.9807327,
0.9807231,
0.980720722,
0.980725528,
0.980737478,
0.980756534,
0.98078266,
0.980815817,
0.980855968,
0.980903079,
0.980955475,
0.981017942,
0.981085714,
0.981160303,
0.981241675,
0.981329796,
0.981424634,
0.981526154,
0.981634325,
0.981749114,
0.981870489,
0.981998419,
0.982132873,
0.98227382,
0.982421229,
0.982575072,
0.982735318,
0.982901937,
0.983074902,
0.983254183,
0.983439752,
0.983631582,
0.983829644,
0.984033912,
0.984244358,
0.984460956,
0.984683681,
0.984912505,
0.985147403,
0.985388349,
0.98563532,
0.98588829,
0.986147234,
0.986412128,
0.986682949,
0.986959673,
0.987242277,
0.987530737,
0.987825031,
0.988125136,
0.98843103,
0.988742691,
0.989060098,
0.989383229,
0.989712063,
0.990046579,
0.990386756,
0.990732574,
0.991084012,
0.991441052,
0.991803672,
0.992171854,
0.992545578,
0.992924825,
0.993309578,
0.993699816,
0.994095522,
0.994496677,
0.994903265,
0.995315266,
0.995732665,
0.996155442,
0.996583582,
0.997017068,
0.997455883,
0.99790001,
0.998349434,
0.998804138,
0.999264107,
0.999729325,
1.000199776,
1.000675446,
1.001156319,
1.001642381,
1.002133617,
1.002630011,
1.003131551,
1.003638222,
1.00415001,
1.004666901,
1.005188881,
1.005715938,
1.006248058,
1.006785227,
1.007327434,
1.007874665,
1.008426907,
1.008984149,
1.009546377,
1.010113581,
1.010685747,
1.011262865,
1.011844922,
1.012431907,
1.013023808,
1.013620615,
1.014222317,
1.014828902,
1.01544036,
1.016056681,
1.016677853,
1.017303866,
1.017934711,
1.018570378,
1.019210855,
1.019856135,
1.020506206,
1.02116106,
1.021820687,
1.022485078,
1.023154224,
1.023828116,
1.024506745,
1.025190103,
1.02587818,
1.026570969,
1.027268461,
1.027970647,
1.02867752,
1.029389072,
1.030114973,
1.030826088,
1.03155163,
1.032281819,
1.03301665,
1.033756114,
1.034500204,
1.035248913,
1.036002235,
1.036760162,
1.037522688,
1.038289806,
1.039061509,
1.039837792,
1.040618648
};
static double FastExp(double x)
{
    var tmp = (long)(1512775 * x + 1072632447);
    int index = (int)(tmp >> 12) & 0xFF;
    return BitConverter.Int64BitsToDouble(tmp << 32) * ExpAdjustment[index];
}

static void Main(string[] args)
{
    double[] x = new double[1000000];
    double[] ex = new double[x.Length];
    double[] fx = new double[x.Length];
    Random r = new Random();
    for (int i = 0; i < x.Length; ++i)
        x[i] = r.NextDouble() * 40;

    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int j = 0; j < x.Length; ++j)
        ex[j] = Math.Exp(x[j]);
    sw.Stop();
    double builtin = sw.Elapsed.TotalMilliseconds;

    sw.Reset();
    sw.Start();
    for (int k = 0; k < x.Length; ++k)
        fx[k] = FastExp(x[k]);
    sw.Stop();
    double custom = sw.Elapsed.TotalMilliseconds;

    double min = 1, max = 1;
    for (int m = 0; m < x.Length; ++m)
    {
        double ratio = fx[m] / ex[m];
        if (min > ratio) min = ratio;
        if (max < ratio) max = ratio;
    }
    Console.WriteLine("minimum ratio = " + min.ToString() + ", maximum ratio = " + max.ToString() + ", speedup = " + (builtin / custom).ToString());
}
}
}
The following code should address the accuracy requirements, as for inputs in [-87,88] the results have relative error <= 1.73e-3. I do not know C#, so this is C code, but I assume conversion should be fairly straightforward.
I assume that since the accuracy requirement is low, the use of single-precision computation is fine. A classic algorithm is used in which the computation of exp() is mapped to a computation of exp2(). After argument conversion via multiplication by log2(e), exponentiation by the fractional part is handled using a minimax polynomial of degree 2, while exponentiation by the integral part of the argument is performed by direct manipulation of the exponent field of the IEEE-754 single-precision number.
The volatile union facilitates re-interpretation of a bit pattern as either an integer or a single-precision floating-point number, needed for the exponent manipulation. It looks like C# offers dedicated re-interpretation functions for this, which is much cleaner.
The two potential performance problems are the floor() function and the float->int conversion. Traditionally both were slow on x86 due to the need to handle dynamic processor state. But SSE (in particular SSE 4.1) provides instructions that allow these operations to be fast. I do not know whether C# can make use of those instructions.
/* max. rel. error <= 1.73e-3 on [-87,88] */
float fast_exp (float x)
{
    volatile union {
        float f;
        unsigned int i;
    } cvt;

    /* exp(x) = 2^i * 2^f; i = floor (log2(e) * x), 0 <= f <= 1 */
    float t = x * 1.442695041f;
    float fi = floorf (t);
    float f = t - fi;
    int i = (int)fi;
    cvt.f = (0.3371894346f * f + 0.657636276f) * f + 1.00172476f; /* compute 2^f */
    cvt.i += (i << 23);                                           /* scale by 2^i */
    return cvt.f;
}
I have now studied the paper by Nicol Schraudolph in which the original C implementation of the above function is defined in more detail. It does seem that it is probably not possible to substantially improve the accuracy of the exp computation without severely impacting the performance. On the other hand, the approximation is valid also for large magnitudes of x, up to +/- 700, which is of course advantageous.
The function implementation above is tuned to obtain the minimum root mean square error. Schraudolph describes how the additive term in the tmp expression can be altered to achieve alternative approximation properties:
"exp" >= exp for all x 1072693248 - (-1) = 1072693249
"exp" <= exp for all x - 90253 = 1072602995
"exp" symmetric around exp - 45799 = 1072647449
Mimimum possible mean deviation - 68243 = 1072625005
Minimum possible root-mean-square deviation - 60801 = 1072632447
He also points out that at a "microscopic" level the approximate "exp" function exhibits staircase behavior, since 32 bits are discarded in the conversion from long to double. This means that the function is piecewise constant on a very small scale, but at least it is never decreasing with increasing x.
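For instance, re-tuning the function above as an upper bound is a one-line change (a sketch; note that the ExpAdjustment table earlier was fitted to the RMS constant, so this variant is shown without the table correction):

// Sketch: Schraudolph's "upper bound" variant ("exp" >= exp for all x),
// using 1072693249 from the table above in place of 1072632447.
static double FastExpUpper(double x)
{
    var tmp = (long)(1512775 * x + 1072693249);
    return BitConverter.Int64BitsToDouble(tmp << 32);
}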
For my purposes I have developed the following function, which quickly and accurately calculates the natural exponent with single precision. The function works over the entire range of float values. The code is written under Visual Studio (x86).
_declspec(naked) float _vectorcall fexp(float x)
{
static const float ct[8] = // Constants table
{
1.44269502f, // lb(e)
1.92596299E-8f, // Correction to the value lb(e)
-9.21120925E-4f, // 16*b2
0.115524396f, // 4*b1
2.88539004f, // b0
4.29496730E9f, // 2^32
0.5f, // 0.5
2.32830644E-10f // 2^-32
};
_asm
{
mov ecx,offset ct // ecx contains the address of constants tables
vmulss xmm1,xmm0,[ecx] // xmm1 = x*lb(e)
vcvtss2si eax,xmm1 // eax = round(x*lb(e)) = k
vcvtsi2ss xmm1,xmm1,eax // xmm1 = k
dec eax // of=1, if eax=80000000h, i.e. overflow
jno exp_cont // Jump to form 2^k, if k is normal
vucomiss xmm0,xmm0 // Compare xmm0 with itself to identify NaN
jp exp_end // Complete with NaN result, if x=NaN
vmovd eax,xmm0 // eax contains the bits of x
test eax,eax // sf=1, if x<0; of=0 always
exp_break: // Get 0 for x<0 or Inf for x>0
vxorps xmm0,xmm0,xmm0 // xmm0 = 0
jle exp_end // Ready, if x<<0
vrcpss xmm0,xmm0,xmm0 // xmm0 = Inf at case x>>0
exp_end: // The result at xmm0 is ready
ret // Return
exp_cont: // Continue if |x| is not too big
vfmsub231ss xmm1,xmm0,[ecx] // xmm1 = x*lb(e)-k = t/2 in the range from -0.5 to 0.5
cdq // edx=-1, if x<0, otherwise edx=0
vfmadd132ss xmm0,xmm1,[ecx+4] // xmm0 = t/2 (corrected value)
and edx,8 // edx=8, if x<0, otherwise edx=0
vmulss xmm2,xmm0,xmm0 // xmm2 = t^2/4 - the argument of polynomial
vmovss xmm1,[ecx+8] // Initialize the sum with highest coefficient 16*b2
lea eax,[eax+8*edx+97] // The exponent of 2^(k-31), if x>0, otherwise 2^(k+33)
vfmadd213ss xmm1,xmm2,[ecx+12] // xmm1 = 4*b1+4*b2*t^2
test eax,eax // Test the sign of x
jle exp_break // Result is 0 if the exponent is too small
vfmsub213ss xmm1,xmm2,xmm0 // xmm1 = -t/2+b1*t^2+b2*t^4
cmp eax,254 // Check that the exponent is not too large
ja exp_break // Jump to set Inf if overflow
vaddss xmm1,xmm1,[ecx+16] // xmm1 = b0-t/2+b1*t^2+b2*t^4 = f(t)-t/2
shl eax,23 // eax contains the bits of 2^(k-31) or 2^(k+33)
vdivss xmm0,xmm0,xmm1 // xmm0 = t/(2*f(t)-t)
vmovd xmm2,eax // xmm2 = 2^(k-31), if x>0; otherwise 2^(k+33)
vaddss xmm0,xmm0,[ecx+24] // xmm0 = t/(2*f(t)-t)+0.5
vmulss xmm0,xmm0,xmm2 // xmm0 = e^x with shifted exponent of +-32
vmulss xmm0,xmm0,[ecx+edx+20] // xmm0 = e^x with corrected exponent
ret // Return
}
}

triggering an event with a certain probability with C#

I'm trying to simulate a realistic key press event. For that reason I'm using the SendInput() method, but for better results I need to specify the delay between the keyDOWN and keyUP events. The numbers below show the elapsed time in milliseconds between DOWN and UP events (these are real/valid):
96
95
112
111
119
104
143
96
95
104
120
112
111
88
104
119
111
103
95
104
95
127
112
143
144
142
143
128
144
112
111
112
120
128
111
135
118
147
96
135
103
64
64
87
79
112
88
111
111
112
111
104
87
95
We can simplify the output:
delay 64 - 88 ms -> 20% of the time
delay 89 - 135 ms -> 60% of the time
delay 136 - 150 ms -> 20% of the time
How do I trigger an event according to probabilities from above? Here is the code I'm using right now:
private void button2_Click(object sender, EventArgs e)
{
    textBox2.Focus();
    Random r = new Random(); // one Random instance; re-creating it gives correlated values
    int rez = r.Next(0, 5); // 0,1,2,3,4 - five numbers total
    if (rez == 0) // 20% (1/5)
    {
        textBox2.AppendText(" " + rez + " " + r.Next(64, 88) + Environment.NewLine);
        // do stuff
    }
    else if (rez == 4) // 20% (1/5)
    {
        textBox2.AppendText(" " + rez + " " + r.Next(136, 150) + Environment.NewLine);
        // do stuff
    }
    else // 1, 2 or 3 (3/5) -> 60%
    {
        textBox2.AppendText(" " + rez + " " + r.Next(89, 135) + Environment.NewLine);
        // do stuff
    }
}
There is a huge problem with this code. In theory, after millions of iterations, the resulting graph will look similar to this:
How to deal with this problem?
EDIT: the solution was to use a distribution, as people suggested.
Here is a Java implementation of such code:
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Random.html#nextGaussian%28%29
and here is a C# implementation:
How to generate normally distributed random from an integer range?
although I'd suggest decreasing the value of "deviations" a little.
Here is an interesting MSDN article:
http://blogs.msdn.com/b/ericlippert/archive/2012/02/21/generating-random-non-uniform-data-in-c.aspx
Thanks everyone for the help!
Sounds like you need to generate a normal distribution. The built-in .NET class generates a Uniform Distribution.
Gaussian or Normal distribution random numbers are possible using the built-in Random class by using the Box-Muller transform.
You should end up with a nice probability curve like this
(taken from http://en.wikipedia.org/wiki/Normal_distribution)
To transform a Normally Distributed random number into an integer range, the Box-Muller transform can help with this again. See this previous question and answer which describes the process and links to the mathematical proof.
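For reference, a minimal Box-Muller sketch on top of Random might look like this (the mean and standard deviation passed in the usage line are illustrative guesses for the delay data, not fitted values):

// Minimal Box-Muller sketch: two uniform samples become one normally
// distributed sample.
static double NextGaussian(Random random, double mean, double stdDev)
{
    double u1 = 1.0 - random.NextDouble(); // shift away from 0 so Log is safe
    double u2 = random.NextDouble();
    double standardNormal = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);
    return mean + stdDev * standardNormal;
}

// Usage (illustrative parameters): var delay = (int)NextGaussian(random, 110, 18);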
This is the right idea; I just think you need to use doubles instead of ints so you can partition the probability space between 0 and 1. This will allow you to get a finer grain, as follows:
Normalise the real values by dividing all the values by the largest value
Divide the values into buckets - the more buckets, the closer the graph will be to the continuous case
Now, the larger the bucket the more chance of the event being raised. So, partition the interval [0,1] according to how many elements are in each bucket. So, if you have 20 real values, and a bucket has 5 values in it, it takes up a quarter of the interval.
On each test, generate a random number between 0-1 using Random.NextDouble(), and whichever bucket the random number falls into, raise an event with that parameter. So for the numbers you provided, here are the values for 5 buckets:
This is a bit much to put in a code example, but hopefully this gives the right idea
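Still, a compact sketch of the bucketing idea might look like this (the helper name, parameter layout, and bucket contents are invented for the sketch; requires System.Linq for Sum()):

// Sketch of the bucketing idea: each bucket covers a delay range and holds
// the count of sample values that fell in it; a bucket's share of [0,1] is
// proportional to its count.
static int SampleDelay(Random random, int[] mins, int[] maxs, int[] counts)
{
    double roll = random.NextDouble() * counts.Sum();
    for (int i = 0; i < counts.Length; i++)
    {
        roll -= counts[i];
        if (roll < 0)
            return random.Next(mins[i], maxs[i] + 1); // uniform within the bucket
    }
    return maxs[maxs.Length - 1]; // guard: roll landed exactly on the total
}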
One possible approach would be to model the delays as an Exponential Distribution. The exponential distribution models the time between events that occur continuously and independently at a constant average rate - which sounds like a fair assumption given your problem.
You can estimate the parameter lambda by taking the inverse of the average of your real observed delays, and simulate the distribution using this approach, i.e.
delay = -Math.Log(random.NextDouble()) / lambda
However, looking at your sample, the data looks too "concentrated" around the mean to be a pure Exponential, so simulating that way would result in delays with the proper mean, but too spread out to match your sample.
One way to address that is to model the process as a shifted Exponential; essentially, the process is shifted by a value which represents the minimum the value can take, instead of 0 for an exponential. In code, taking the shift as the minimum observed value from your sample, this could look like this:
var sample = new List<double>()
{
    96, 95, 112, 111, 119, 104, 143, 96, 95, 104, 120, 112
};

var min = sample.Min();
sample = sample.Select(it => it - min).ToList();
var lambda = 1d / sample.Average();

var random = new Random();
var result = new List<double>();
for (var i = 0; i < 100; i++)
{
    var simulated = min - Math.Log(random.NextDouble()) / lambda;
    result.Add(simulated);
    Console.WriteLine(simulated);
}
A trivial alternative, which is in essence similar to Aidan's approach, is to re-sample: pick random elements from your original sample, and the result will have exactly the desired distribution:
var sample = new List<double>()
{
    96, 95, 112, 111, 119, 104, 143, 96, 95, 104, 120, 112
};

var random = new Random();
var size = sample.Count();
for (var i = 0; i < 100; i++)
{
    Console.WriteLine(sample[random.Next(0, size)]);
}

Average function without overflow exception

.NET Framework 3.5.
I'm trying to calculate the average of some pretty large numbers.
For instance:
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var items = new long[]
        {
            long.MaxValue - 100,
            long.MaxValue - 200,
            long.MaxValue - 300
        };
        try
        {
            var avg = items.Average();
            Console.WriteLine(avg);
        }
        catch (OverflowException ex)
        {
            Console.WriteLine("can't calculate that!");
        }
        Console.ReadLine();
    }
}
Obviously, the mathematical result is 9223372036854775607 (long.MaxValue - 200), but I get an exception there. This is because the implementation (on my machine) of the Average extension method, as inspected with .NET Reflector, is:
public static double Average(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    long num = 0L;
    long num2 = 0L;
    foreach (long num3 in source)
    {
        num += num3;
        num2 += 1L;
    }
    if (num2 <= 0L)
    {
        throw Error.NoElements();
    }
    return (((double) num) / ((double) num2));
}
I know I can use a BigInt library (yes, I know that it is included in .NET Framework 4.0, but I'm tied to 3.5).
But I still wonder if there's a straightforward implementation of calculating the average of integers without an external library. Do you happen to know of such an implementation?
Thanks!!
UPDATE:
The previous example, of three large integers, was just an example to illustrate the overflow issue. The question is about calculating an average of any set of numbers which might sum to a large number that exceeds the type's max value. Sorry about this confusion. I also changed the question's title to avoid additional confusion.
Thanks all!!
This answer used to suggest storing the quotient and remainder (mod count) separately. That solution is less space-efficient and more code-complex.
In order to accurately compute the average, you must keep track of the total. There is no way around this, unless you're willing to sacrifice accuracy. You can try to store the total in fancy ways, but ultimately you must be tracking it if the algorithm is correct.
For single-pass algorithms, this is easy to prove. Suppose the algorithm's entire state after processing some items did not let you reconstruct the total of those items. But then we could feed the algorithm a series of 0 items until the sequence ends, multiply the result by the count, and recover the total after all. Contradiction. Therefore a single-pass algorithm must be tracking the total in some sense.
Therefore the simplest correct algorithm will just sum up the items and divide by the count. All you have to do is pick an integer type with enough space to store the total. Using a BigInteger guarantees no issues, so I suggest using that.
// Requires using System.Numerics; BigInteger cannot overflow.
var total = BigInteger.Zero;
var count = 0;
foreach (var i in values)
{
    count += 1;
    total += i;
}
// Warning: possible loss of accuracy in the final division;
// maybe return a Rational instead.
return (double)total / count;
If you're just looking for an arithmetic mean, you can perform the calculation like this:
public static double Mean(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }

    double count = (double)source.Count();
    double mean = 0D;
    foreach (long x in source)
    {
        mean += (double)x / count;
    }
    return mean;
}
Edit:
In response to comments, there definitely is a loss of precision this way, due to performing numerous divisions and additions. For the values indicated by the question, this should not be a problem, but it should be a consideration.
You may try the following approach:
Let the number of elements be N, and the numbers be arr[0], ..., arr[N-1].
You need to define 2 variables, mean and remainder; initially mean = 0, remainder = 0.
At step i you change mean and remainder in the following way:
mean += arr[i] / N;
remainder += arr[i] % N;
mean += remainder / N;
remainder %= N;
After N steps you will have the correct answer in the mean variable, and remainder / N will be the fractional part of the answer (I am not sure you need it, but anyway).
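In C#, a sketch of this scheme could look like the following (it works because the invariant mean * N + remainder == running sum holds under C#'s truncated division; for mixed-sign inputs the final mean is the truncated quotient):

// Sketch of the mean/remainder scheme above.
static long MeanWithRemainder(long[] arr, out long remainder)
{
    long n = arr.Length;
    long mean = 0;
    remainder = 0;
    foreach (long value in arr)
    {
        mean += value / n;       // whole part
        remainder += value % n;  // what integer division threw away
        mean += remainder / n;   // fold accumulated remainder back in
        remainder %= n;
    }
    return mean; // remainder / (double)n is the fractional part
}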
If you know approximately what the average will be (or, at least, that all pairs of numbers will have a max difference < long.MaxValue), you can calculate the average difference from that value instead. I take an example with low numbers, but it works equally well with large ones.
// Let's say numbers cannot exceed 40.
List<int> numbers = new List<int>() { 31, 28, 24, 32, 36, 29 }; // Average: 30
List<int> diffs = new List<int>();

// This can probably be done more effectively in linq, but to show the idea:
foreach (int number in numbers)
{
    diffs.Add(number - numbers.First());
}
// diffs now contains { 0, -3, -7, 1, 5, -2 }
var avgDiff = diffs.Sum() / diffs.Count(); // -6 / 6 = -1

// To get the average value, just add the average diff to the first value:
var totalAverage = numbers.First() + avgDiff;
You can of course implement this in some way that makes it easier to reuse, for example as an extension method to IEnumerable<long>.
Here is how I would do it if given this problem. First let's define a very simple RationalNumber class, which contains two properties, Dividend and Divisor, and an operator for adding two rational numbers. Here is how it looks:
public sealed class RationalNumber
{
    public RationalNumber()
    {
        this.Divisor = 1;
    }

    public static RationalNumber operator +( RationalNumber c1, RationalNumber c2 )
    {
        RationalNumber result = new RationalNumber();

        Int64 nDividend = ( c1.Dividend * c2.Divisor ) + ( c2.Dividend * c1.Divisor );
        Int64 nDivisor = c1.Divisor * c2.Divisor;
        Int64 nRemainder = nDividend % nDivisor;

        if ( nRemainder == 0 )
        {
            // The number is whole
            result.Dividend = nDividend / nDivisor;
        }
        else
        {
            Int64 nGreatestCommonDivisor = FindGreatestCommonDivisor( nDividend, nDivisor );
            if ( nGreatestCommonDivisor != 0 )
            {
                nDividend = nDividend / nGreatestCommonDivisor;
                nDivisor = nDivisor / nGreatestCommonDivisor;
            }

            result.Dividend = nDividend;
            result.Divisor = nDivisor;
        }

        return result;
    }

    private static Int64 FindGreatestCommonDivisor( Int64 a, Int64 b )
    {
        Int64 nRemainder;
        while ( b != 0 )
        {
            nRemainder = a % b;
            a = b;
            b = nRemainder;
        }
        return a;
    }

    // a / b : a is the dividend, b is the divisor
    public Int64 Dividend { get; set; }
    public Int64 Divisor { get; set; }
}
The second part is really easy. Let's say we have an array of numbers. Their average is Sum(Numbers)/Length(Numbers), which is the same as Number[ 0 ] / Length + Number[ 1 ] / Length + ... + Number[ n ] / Length. To be able to calculate this, we represent each Number[ i ] / Length as a whole number plus a rational part (remainder). Here is how it looks:
Int64[] aValues = new Int64[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };

List<RationalNumber> list = new List<RationalNumber>();
Int64 nAverage = 0;

for ( Int32 i = 0; i < aValues.Length; ++i )
{
    Int64 nRemainder = aValues[ i ] % aValues.Length;
    Int64 nWhole = aValues[ i ] / aValues.Length;

    nAverage += nWhole;

    if ( nRemainder != 0 )
    {
        list.Add( new RationalNumber() { Dividend = nRemainder, Divisor = aValues.Length } );
    }
}

RationalNumber rationalTotal = new RationalNumber();
foreach ( var rational in list )
{
    rationalTotal += rational;
}

nAverage = nAverage + ( rationalTotal.Dividend / rationalTotal.Divisor );
At the end we have a list of rational numbers and a whole number, which we sum together to get the average of the sequence without an overflow. The same approach can be taken for any type without overflow, and there is no loss of precision.
EDIT:
Why this works:
Define A: a set of numbers.
Since Average( A ) = SUM( A ) / LEN( A ), we have
Average( A ) = A[ 0 ] / LEN( A ) + A[ 1 ] / LEN( A ) + A[ 2 ] / LEN( A ) + ... + A[ N ] / LEN( A ).
Define An to be a number satisfying An = X + ( Y / LEN( A ) ); this is always possible, because dividing a number by B gives a whole part X with a remainder, the rational number ( Y / B ).
So
Average( A ) = A1 + A2 + A3 + ... + AN = X1 + X2 + X3 + ... + Remainder1 + Remainder2 + ...
Sum the whole parts, and sum the remainders while keeping them in rational-number form. In the end we get one whole number and one rational number, which summed together give Average( A ). Depending on the precision you'd like, you apply this only to the rational number at the end.
Simple answer with LINQ...
var data = new[] { int.MaxValue, int.MaxValue, int.MaxValue };
var mean = (int)data.Select(d => (double)d / data.Count()).Sum();
Depending on the size of the data set, you may want to force the data with .ToList() or .ToArray() before processing it this way so it can't requery Count on each pass. (Or you can call it before the .Select(..).Sum().)
If you know in advance that all your numbers are going to be 'big' (in the sense of 'much nearer long.MaxValue than zero'), you can calculate the average of their distance from long.MaxValue; then the average of the numbers is long.MaxValue less that average.
However, this approach will fail if (m)any of the numbers are far from long.MaxValue, so it's horses for courses...
I guess there has to be a compromise somewhere. If the numbers are really getting so large, then the few digits of lower orders (say the lower 5 digits) might not affect the result much.
Another issue is when you don't really know the size of the dataset coming in, especially in stream/real-time cases. Here I don't see any solution other than
newAverage = (previousAverage * oldCount + newValue) / (oldCount + 1)
Here's a suggestion:
LargestDataTypePossible currentAverage;
SomeSuitableDatatypeSupportingRationalValues newValue;
int count;

addToCurrentAverage(value) {
    newValue = value / 100000;
    count = count + 1;
    currentAverage = (currentAverage * (count - 1) + newValue) / count;
}

getCurrentAverage() {
    return currentAverage * 100000;
}
Averaging numbers of a specific numeric type in a safe way while also only using that numeric type is actually possible, although I would advise using the help of BigInteger in a practical implementation. I created a project for Safe Numeric Calculations that has a small structure (Int32WithBoundedRollover) which can sum up to 2^32 int32s without any overflow (the structure internally uses two int32 fields to do this, so no larger data types are used).
Once you have this sum, you then need to calculate sum/total to get the average. You can do this (although I wouldn't recommend it) by creating another instance of Int32WithBoundedRollover and repeatedly incrementing it by total, comparing it to the sum after each increment until you find the integer part of the average. From there you can peel off the remainder and calculate the fractional part. There are likely some clever tricks to make this more efficient, but this basic strategy would certainly work without needing to resort to a bigger data type.
That being said, the current implementation isn't built for this (for instance there is no comparison operator on Int32WithBoundedRollover, although it wouldn't be too hard to add). The reason is that it is just much simpler to use BigInteger at the end to do the calculation. Performance-wise this doesn't matter too much for large averages since it will only be done once, and it is just too clean and easy to understand to worry about coming up with something clever (at least so far...).
As far as your original question which was concerned with the long data type, the Int32WithBoundedRollover could be converted to a LongWithBoundedRollover by just swapping int32 references for long references and it should work just the same. For Int32s I did notice a pretty big difference in performance (in case that is of interest). Compared to the BigInteger only method the method that I produced is around 80% faster for the large (as in total number of data points) samples that I was testing (the code for this is included in the unit tests for the Int32WithBoundedRollover class). This is likely mostly due to the difference between the int32 operations being done in hardware instead of software as the BigInteger operations are.
How about BigInteger in Visual J#?
If you're willing to sacrifice precision, you could do something like:
long num2 = 0L;
foreach (long num3 in source)
{
    num2 += 1L;
}
if (num2 <= 0L)
{
    throw Error.NoElements();
}

double average = 0;
foreach (long num3 in source)
{
    average += (double)num3 / (double)num2;
}
return average;
Perhaps you can reduce every item, calculating the average of the adjusted values and then multiplying it by the number of elements in the collection. However, you'll end up with a slightly different number of floating point operations.
var items = new long[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };
var avg = items.Average(i => i / items.Count()) * items.Count();
You could keep a rolling average which you update once for each large number.
Use the IntX library on CodePlex.
NextAverage = CurrentAverage + (NewValue - CurrentAverage) / (CurrentObservations + 1)
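As a C# sketch of that update:

// Incremental form of the update above: the full sum is never formed,
// so it cannot overflow, at the cost of some floating point error.
static double RollingAverage(IEnumerable<long> source)
{
    double average = 0;
    long count = 0;
    foreach (long value in source)
    {
        count++;
        average += (value - average) / count;
    }
    return average;
}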
Here is my version of an extension method that can help with this.
public static long Average(this IEnumerable<long> longs)
{
    long mean = 0;
    long count = longs.Count();
    foreach (var val in longs)
    {
        mean += val / count;
    }
    return mean;
}
Let Avg(n) be the average of the first n numbers, where data[n] is the nth number.
Avg(n) = (double)(n - 1) / (double)n * Avg(n - 1) + (double)data[n] / (double)n
This avoids value overflow, but loses precision when n is very large.
For two positive numbers (or two negative numbers), I found a very elegant solution here,
where the average computation (a + b) / 2 can be replaced with a + (b - a) / 2.
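As a quick C# illustration (the same-sign requirement keeps b - a from overflowing as well):

// Overflow-safe midpoint: the sum a + b is never formed.
static long Midpoint(long a, long b)
{
    return a + (b - a) / 2;
}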
