Proportionately distribute (prorate) a value across a set of values - C#

I have a need to write code that will prorate a value across a list, based on the relative weights of "basis" values in the list. Simply dividing the "basis" values by the sum of the "basis" values and then multiplying the factor by the original value to prorate works to a certain degree:
proratedValue = (basis / basisTotal) * prorationAmount;
However, the result of this calculation must then be rounded to integer values. The effect of the rounding means that the sum of proratedValue for all items in the list may differ from the original prorationAmount.
Can anyone explain how to apply a "lossless" proration algorithm that proportionately distributes a value across a list as accurately as possible, without suffering from rounding errors?

Simple algorithm sketch here...
1. Have a running total which starts at zero.
2. Do your standard "divide basis by total basis, then multiply by proration amount" for the first item.
3. Store the original value of the running total elsewhere, then add the amount you just calculated in #2.
4. Round both the old value and the new value of the running total to integers (don't modify the existing values, round them into separate variables), and take the difference.
5. The number calculated in step 4 is the value assigned to the current basis.
6. Repeat steps #2-5 for each basis.
This is guaranteed to have the total amount prorated equal to the input prorate amount, because you never actually modify the running total itself (you only take rounded values of it for other calculations, you don't write them back). What would have been an issue with integer rounding before is now dealt with, since the rounding error will add up over time in the running total and eventually push a value across the rounding threshold in the other direction.
Basic example:
Input basis: [0.2, 0.3, 0.3, 0.2]
Total prorate: 47
----
R is used to indicate the running total here; int(x) denotes rounding to the nearest integer (halves round down in this example):
R = 0
First basis:
oldR = R [0]
R += (0.2 / 1.0 * 47) [= 9.4]
results[0] = int(R) - int(oldR) [= 9]
Second basis:
oldR = R [9.4]
R += (0.3 / 1.0 * 47) [+ 14.1, = 23.5 total]
results[1] = int(R) - int(oldR) [23-9, = 14]
Third basis:
oldR = R [23.5]
R += (0.3 / 1.0 * 47) [+ 14.1, = 37.6 total]
results[2] = int(R) - int(oldR) [38-23, = 15]
Fourth basis:
oldR = R [37.6]
R += (0.2 / 1.0 * 47) [+ 9.4, = 47 total]
results[3] = int(R) - int(oldR) [47-38, = 9]
9+14+15+9 = 47
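A minimal C# sketch of this running-total approach (the method name is mine, and Math.Round with MidpointRounding.AwayFromZero is just one reasonable rounding choice; any consistent rounding works):

public static int[] ProrateByRunningTotal(double[] basis, int prorationAmount)
{
    double basisTotal = basis.Sum();   // requires using System.Linq
    var results = new int[basis.Length];
    double runningTotal = 0;
    for (int i = 0; i < basis.Length; i++)
    {
        double oldRunningTotal = runningTotal;
        runningTotal += (basis[i] / basisTotal) * prorationAmount;
        // Difference of the rounded totals; the rounding error carries forward
        // in runningTotal instead of being lost per item.
        results[i] = (int)Math.Round(runningTotal, MidpointRounding.AwayFromZero)
                   - (int)Math.Round(oldRunningTotal, MidpointRounding.AwayFromZero);
    }
    // The sum telescopes to Round(final) - Round(0) == prorationAmount
    // (up to floating-point drift in the final running total).
    return results;
}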

TL;DR: the algorithm below gives the best accuracy of the candidates tested (~20% better), at the cost of ~70% slower execution.
Evaluated the algorithms presented in the accepted answer here, as well as the answer to a Python question of a similar nature.
Distribute 1 - based on Amber's algorithm
Distribute 2 - based on John Machin's algorithm
Distribute 3 - see below
Distribute 4 - optimized version of Distribute 3 (e.g. removed LINQ, used arrays)
Testing results (10,000 iterations)
Algorithm | Avg Abs Diff (x lowest) | Time (x lowest)
------------------------------------------------------------------
Distribute 1 | 0.5282 (1.1992) | 00:00:00.0906921 (1.0000)
Distribute 2 | 0.4526 (1.0275) | 00:00:00.0963136 (1.0620)
Distribute 3 | 0.4405 (1.0000) | 00:00:01.1689239 (12.8889)
Distribute 4 | 0.4405 (1.0000) | 00:00:00.1548484 (1.7074)
Distribute 3 has 19.9% better accuracy, for 70.7% slower execution time (via its optimized form, Distribute 4), as expected.
Distribute 3
Makes best effort to be as accurate as possible in distributing amount.
Distribute weights as normal
Increment weights with highest error until actual distributed amount equals expected amount
Sacrifices speed for accuracy by making more than one pass through the loop.
public static IEnumerable<int> Distribute3(IEnumerable<double> weights, int amount)
{
    var totalWeight = weights.Sum();
    var query = from w in weights
                let fraction = amount * (w / totalWeight)
                let integral = (int)Math.Floor(fraction)
                select Tuple.Create(integral, fraction);
    var result = query.ToList();
    var added = result.Sum(x => x.Item1);
    while (added < amount)
    {
        var maxError = result.Max(x => x.Item2 - x.Item1);
        var index = result.FindIndex(x => (x.Item2 - x.Item1) == maxError);
        result[index] = Tuple.Create(result[index].Item1 + 1, result[index].Item2);
        added += 1;
    }
    return result.Select(x => x.Item1);
}
Distribute 4
public static IEnumerable<int> Distribute4(IEnumerable<double> weights, int amount)
{
    var totalWeight = weights.Sum();
    var length = weights.Count();
    var actual = new double[length];
    var error = new double[length];
    var rounded = new int[length];
    var added = 0;
    var i = 0;
    foreach (var w in weights)
    {
        actual[i] = amount * (w / totalWeight);
        rounded[i] = (int)Math.Floor(actual[i]);
        error[i] = actual[i] - rounded[i];
        added += rounded[i];
        i += 1;
    }
    while (added < amount)
    {
        var maxError = 0.0;
        var maxErrorIndex = -1;
        for (var e = 0; e < length; ++e)
        {
            if (error[e] > maxError)
            {
                maxError = error[e];
                maxErrorIndex = e;
            }
        }
        rounded[maxErrorIndex] += 1;
        error[maxErrorIndex] -= 1;
        added += 1;
    }
    return rounded;
}
Test Harness
(The delegate type and the tuning constants were not included in the original post; illustrative definitions are added below so the harness compiles.)

delegate IEnumerable<int> DistributeDelegate(IEnumerable<double> weights, int amount);

const int Iterations = 10000;   // matches the "10,000 iterations" above
const int MinimumWeights = 2, MaximumWeights = 100;        // assumed values
const double MinimumWeight = 1.0, MaximumWeight = 100.0;   // assumed values
const int MinimumAmount = 1, MaximumAmount = 100000;       // assumed values

static void Main(string[] args)
{
    Random r = new Random();
    Stopwatch[] time = new[] { new Stopwatch(), new Stopwatch(), new Stopwatch(), new Stopwatch() };
    double[][] results = new[] { new double[Iterations], new double[Iterations], new double[Iterations], new double[Iterations] };
    for (var i = 0; i < Iterations; ++i)
    {
        double[] weights = new double[r.Next(MinimumWeights, MaximumWeights)];
        for (var w = 0; w < weights.Length; ++w)
        {
            weights[w] = (r.NextDouble() * (MaximumWeight - MinimumWeight)) + MinimumWeight;
        }
        var amount = r.Next(MinimumAmount, MaximumAmount);
        var totalWeight = weights.Sum();
        var expected = weights.Select(w => (w / totalWeight) * amount).ToArray();
        Action<int, DistributeDelegate> runTest = (resultIndex, func) =>
        {
            time[resultIndex].Start();
            var result = func(weights, amount).ToArray();
            time[resultIndex].Stop();
            var total = result.Sum();
            if (total != amount)
                throw new Exception("Invalid total");
            var diff = expected.Zip(result, (e, a) => Math.Abs(e - a)).Sum() / amount;
            results[resultIndex][i] = diff;
        };
        runTest(0, Distribute1);
        runTest(1, Distribute2);
        runTest(2, Distribute3);
        runTest(3, Distribute4);
    }
}

The problem you have is to define what an "acceptable" rounding policy is, or in other words, what it is you are trying to minimize. Consider first this situation: you have only 2 identical items in your list, and are trying to allocate 3 units. Ideally, you would want to allocate the same amount to each item (1.5), but that is clearly not going to happen. The "best" you could do is likely to allocate 1 and 2, or 2 and 1. So
there might be multiple solutions to each allocation
identical items may not receive an identical allocation
Then, I chose 1 and 2 over 0 and 3 because I assume that what you want is to minimize the difference between the perfect allocation, and the integer allocation. This might not be what you consider "a good allocation", and this is a question you need to think about: what would make an allocation better than another one?
One possible value function could be to minimize the "total error", i.e. the sum of the absolute values of the differences between your allocation and the "perfect", unconstrained allocation.
It sounds to me that something inspired by Branch and Bound could work, but it's non-trivial.
Assuming that Dav's solution always produces an allocation that satisfies the constraint (which I'll trust is the case), I assume that it is not guaranteed to give you the "best" solution, "best" being defined by whatever distance/fit metric you end up adopting. My reason for this is that it is a greedy algorithm, which in integer programming problems can lead you to solutions that are really far off the optimal one. But if you can live with a "somewhat correct" allocation, then I say, go for it! Doing it "optimally" doesn't sound trivial.
Best of luck!

Ok. I'm pretty certain that the original algorithm (as written) and the code posted (as written) don't quite answer the mail for the test case outlined by @Mathias.
My intended use of this algorithm is a slightly more specific application. Rather than calculating the % using (@amt / @SumAmt) as shown in the original question, I have a fixed $ amount that needs to be split or spread across multiple items based on a % split defined for each of those items. The split % sums to 100%; however, straight multiplication often results in decimals that (when forced to round to whole $) don't add up to the total amount that I'm splitting apart. This is the core of the problem.
I'm fairly certain that the original answer from @Dav doesn't work in cases where (as @Mathias described) the rounded values are equal across multiple slices. This problem with the original algorithm and code can be summed up with one test case:
Take $100 and split it 3 ways using 33.333333% as your percentage.
Using the code posted by @jtw (assuming this is an accurate implementation of the original algorithm) yields the incorrect answer of allocating $33 to each item (resulting in an overall sum of $99), so it fails the test.
I think a more accurate algorithm might be:
Have a running total which starts at 0.
For each item in the group:
Calculate the un-rounded allocation amount as ( [Amount to be Split] * [% to Split] ).
Calculate the rounded allocation amount as Round( [UnRounded Amount], 0 ).
Calculate the cumulative Remainder as [Remainder] + ( [UnRounded Amount] - [Rounded Amount] ).
If Round( [Remainder], 0 ) >= 1 OR the current item is the LAST ITEM in the list, then set the item's allocation = [Rounded Amount] + Round( [Remainder], 0 ) and reduce [Remainder] by Round( [Remainder], 0 );
else set the item's allocation = [Rounded Amount].
Repeat for the next item.
Implemented in T-SQL, it looks like this:

-- Start of Code --
Drop Table #SplitList
Create Table #SplitList ( idno int, pctsplit decimal(5, 4), amt int, roundedAmt int )

-- Test Case #1
--Insert Into #SplitList Values (1, 0.3333, 100, 0)
--Insert Into #SplitList Values (2, 0.3333, 100, 0)
--Insert Into #SplitList Values (3, 0.3333, 100, 0)

-- Test Case #2
--Insert Into #SplitList Values (1, 0.20, 57, 0)
--Insert Into #SplitList Values (2, 0.20, 57, 0)
--Insert Into #SplitList Values (3, 0.20, 57, 0)
--Insert Into #SplitList Values (4, 0.20, 57, 0)
--Insert Into #SplitList Values (5, 0.20, 57, 0)

-- Test Case #3
--Insert Into #SplitList Values (1, 0.43, 10, 0)
--Insert Into #SplitList Values (2, 0.22, 10, 0)
--Insert Into #SplitList Values (3, 0.11, 10, 0)
--Insert Into #SplitList Values (4, 0.24, 10, 0)

-- Test Case #4
Insert Into #SplitList Values (1, 0.50, 75, 0)
Insert Into #SplitList Values (2, 0.50, 75, 0)

Declare @R Float
Declare @Results Float
Declare @unroundedAmt Float
Declare @idno Int
Declare @roundedAmt Int
Declare @amt Float
Declare @pctsplit Float
Declare @rowCnt int

Select @R = 0
Select @rowCnt = 0

-- Define the cursor
Declare SplitList Cursor For
    Select idno, pctsplit, amt, roundedAmt From #SplitList Order By amt Desc

-- Open the cursor
Open SplitList

-- Assign the values of the first record
Fetch Next From SplitList Into @idno, @pctsplit, @amt, @roundedAmt

-- Loop through the records
While @@FETCH_STATUS = 0
Begin
    -- Get derived Amounts from cursor
    Select @unroundedAmt = ( @amt * @pctsplit )
    Select @roundedAmt = Round( @unroundedAmt, 0 )

    -- Remainder
    Select @R = @R + @unroundedAmt - @roundedAmt
    Select @rowCnt = @rowCnt + 1

    -- Magic Happens! (aka Secret Sauce)
    If ( Round( @R, 0 ) >= 1 ) Or ( @@CURSOR_ROWS = @rowCnt )
    Begin
        Select @Results = @roundedAmt + Round( @R, 0 )
        Select @R = @R - Round( @R, 0 )
    End
    Else
    Begin
        Select @Results = @roundedAmt
    End

    If Round( @Results, 0 ) <> 0
    Begin
        Update #SplitList Set roundedAmt = @Results Where idno = @idno
    End

    -- Assign the values of the next record
    Fetch Next From SplitList Into @idno, @pctsplit, @amt, @roundedAmt
End

-- Close the cursor
Close SplitList
Deallocate SplitList

-- Now do the check
Select * From #SplitList

Select Sum(roundedAmt), Max(amt),
    Case When Max(amt) <> Sum(roundedAmt) Then 'ERROR' Else 'OK' End As Test
From #SplitList
-- End of Code --
Which yields a final result set for the test case of:
idno pctsplit amt roundedAmt
1 0.3333 100 33
2 0.3333 100 34
3 0.3333 100 33
As near as I can tell (and I've got several test cases in the code), this handles all of these situations pretty gracefully.

This is an apportionment problem, for which there are many known methods. All have certain pathologies: the Alabama paradox, the population paradox, or a failure of the quota rule. (Balinski and Young proved that no method can avoid all three.) You'll probably want one that follows the quota rule and avoids the Alabama paradox; the population paradox isn't as much of a concern, since there's not much difference in the number of days per month between different years.
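For concreteness, here is a minimal C# sketch of one classic quota-rule method, the largest-remainder (Hamilton) method, which is essentially what Distribute 3 above implements (note that it can still exhibit the Alabama paradox; the method name below is mine):

// Largest-remainder apportionment: floor every share, then hand the
// leftover units to the items with the largest fractional parts.
public static int[] LargestRemainder(double[] weights, int amount)
{
    double total = weights.Sum();   // using System.Linq
    var shares = weights.Select(w => amount * w / total).ToArray();
    var result = shares.Select(s => (int)Math.Floor(s)).ToArray();
    int leftover = amount - result.Sum();
    foreach (var i in Enumerable.Range(0, weights.Length)
                                .OrderByDescending(i => shares[i] - Math.Floor(shares[i]))
                                .Take(leftover))
        result[i] += 1;
    return result;
}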

I think proportional distribution is the answer:
http://www.sangakoo.com/en/unit/proportional-distributions-direct-and-inverse

Related

Linear interpolation between two numbers with steps

I'm having a little trouble figuring out how to linearly interpolate between two numbers with a defined number of intermediate steps.
Let's say I want to interpolate between 4 and 22 with 8 intermediate steps, like so: [example image]
It's easy to figure out that it's x+2 here. But what if the starting value was 5 and the final value 532 with 12 intermediate steps? (In my special case I would need starting and ending value with 16 steps in between)
If you have two fence posts and you put k fence posts between them, you create k + 1 spaces. For instance:
|                 |
post1             post2

Adding one post creates two spaces:

|        |        |
post1             post2
If you want those k + 1 spaces to be equal you can divide the total distance by k + 1 to get the distance between adjacent posts.
d = 22 - 4 = 18
k = 8
e = d / (k + 1) = 18 / 9 = 2
In your other example, the answer is
d = 532 - 5 = 527
k = 12
e = d / (k + 1) = 527 / 13 ~ 40.5
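A small C# sketch of this (the method name is mine; assumes using System.Collections.Generic): given start, end, and k intermediate steps, the i-th value is start + i * (end - start) / (k + 1).

// Evenly spaced values from start to end with k intermediate steps.
static IEnumerable<double> Interpolate(double start, double end, int k)
{
    double step = (end - start) / (k + 1);
    for (int i = 0; i <= k + 1; i++)
        yield return start + i * step;
}
// Interpolate(4, 22, 8) yields 4, 6, 8, ..., 22.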
I hesitate to produce two separate answers, but I feel this methodology is sufficiently distinct from the other one. There's a useful function in Unity which may be exactly what you need, appropriately called Mathf.Lerp().
var start = 5;
var end = 532;
var steps = 13;
for (int i = 0; i <= steps; i++) {
    // The type conversion is necessary because both i and steps are integers
    var value = Mathf.Lerp(start, end, i / (float)steps);
    Debug.Log(value);
}
For actually doing the linear interpolation, use Mathf.MoveTowards().
For figuring out your maximum delta (i.e. the amount you want it to move each step), take the difference, and then divide it by the number of desired steps.
var start = 4;
var end = 22;
var distance = end - start;
var steps = 9; // Your example technically has 9 steps, not 8
var delta = distance / steps;
Note that this conveniently assumes your distance is a clean multiple of steps. If you don't know this is the case and it's important that you never exceed that number of steps, you may want to explicitly check for it. Here's a crude example for an integer. Floating point methods may be more complicated:
if (distance % delta > 0) { delta += 1; }

Converting from Gödel code to text

I am writing Gödel encoding (encryption) software, but I am facing a problem.
How do I get the code back after encoding it? For example, the Gödel number for the symbol 0 is 6 and the Gödel number for the symbol = is 5. Thus, in their system, the Gödel number of the formula 0 = 0 is 2^6 × 3^5 × 5^6 = 243,000,000.
Note: 2^6 means 2 to the power 6.
//need to make 5+6
int equalsign = 5;
int n5 = 2;
int n6 = 59;
int final;
//do calculations
final = (Math.Pow(2, 2) * Math.Pow (3, 5) * Math.Pow(5, 59));
Console.WriteLine(final);
How do I get the 0 = 0 back from 243,000,000? I am having problems with this; any suggestions on how to do it, or for the code?
final = (Math.Pow(2, 2) * Math.Pow (3, 5) * Math.Pow(5, 59));
This does not work either. Any idea how to fix this as well?
This is the Gödel formula: [image]
The general idea is to keep dividing the number final by the current prime p as long as it is evenly divisible by p. The number of times you can divide evenly by each prime gives you the powers.
Roughly speaking, you're looking for something like the following pseudocode:
code = []
for each prime p:
    c = 0
    while final % p == 0:
        c++
        final = final / p
    code.append(c)
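A hedged C# version of that decoding (helper names are illustrative; assumes using System.Collections.Generic, and uses long, so it only works while the encoded number fits in 64 bits):

// Decode a Gödel number into its sequence of prime exponents (the symbol codes).
// DecodeGodelNumber(243000000) returns [6, 5, 6], i.e. the symbols 0 = 0.
static List<int> DecodeGodelNumber(long encoded)
{
    var codes = new List<int>();
    for (long p = 2; encoded > 1; p++)
    {
        if (!IsPrime(p)) continue;  // trial division by successive primes 2, 3, 5, 7, ...
        int c = 0;
        while (encoded % p == 0) { c++; encoded /= p; }
        codes.Add(c);               // exponent of p (0 if p was skipped in the encoding)
    }
    return codes;
}

static bool IsPrime(long n)
{
    if (n < 2) return false;
    for (long d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}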

generate seemingly random unique numeric ID in SQL Server

I need to use SQL Server to generate seemingly random unique 8-digit numeric ID (can pad zeros at front). Is there a built-in functionality for this? I saw this Identity property, but it is sequential, not random.
If this is not possible, is it good practice to directly write a randomly generated ID to db then check for exception? (note that my app is multi-threaded, so checking before writing doesn't guarantee uniqueness unless done in atomic action.)
Thanks!
UPDATE: Added "numeric" to clarify.
Edited to show that the randomness doesn't need to be cryptographically strong or anything near. Just seemingly random is good enough. Oliver suggested an elegant solution, and I've posted an answer using that approach. Thanks, Oliver!
Randomness clashes with uniqueness, but there is an elegant solution suggested by @Oliver when the numbers only need to appear random, while an underlying order exists. From Eric's http://ericlippert.com/2013/11/14/a-practical-use-of-multiplicative-inverses/, the main idea is that given a pair of coprime, positive integers x and m, we can find a multiplicative inverse y where (x*y) % m == 1. This is very useful because, given a database row ID z, we can map z to another integer by doing encoded = (z*x) % m. Now given this encoded, how can we get z back? Simple: z = (encoded * y) % m, since (x*y*z) % m == z given z < m. This one-to-one correspondence guarantees uniqueness of the "encoded" values while providing an appearance of randomness.
Note that Eric showed how to calculate this multiplicative inverse. But if you are lazy, there is this.
In my implementation, I just store the sequential ID of each row as it is. Then, each ID is mapped to another number, something similar to the "InvoiceNumber" in the article. When the customer hands you back this "InvoiceNumber", you can map it back to its original database ID by using the multiplicative inverse.
Below is a C# example of encoding and decoding sequence from 0 to 9.
public static void SeeminglyRandomSequence()
{
    // use long to prevent overflow
    long m = 10; // modulo; choose m to be much larger than the number of rows
    long x = 7;  // anything coprime to m
    long y = 3;  // multiplicative inverse of x, where (y*x) % m == 1
    List<long> encodedSequence = new List<long>();
    List<long> decodedSequence = new List<long>();
    for (long i = 0; i < m; i++)
    {
        long encoded = (i * x) % m;
        encodedSequence.Add(encoded);
    }
    foreach (long encoded in encodedSequence)
    {
        long decoded = (encoded * y) % m;
        decodedSequence.Add(decoded);
    }
    Debug.WriteLine("just encoded sequence from 0 to {0}. Result shown below:", (m - 1));
    Debug.WriteLine("encoded sequence: " + string.Join(" ", encodedSequence));
    Debug.WriteLine("decoded sequence: " + string.Join(" ", decodedSequence));
}
The printed result is:
just encoded sequence from 0 to 9. Result shown below:
encoded sequence: 0 7 4 1 8 5 2 9 6 3
decoded sequence: 0 1 2 3 4 5 6 7 8 9
As you can see, each input is mapped to a unique output, and it's easy to reverse this mapping. In your application, you might want to start with 1 since 0 always maps to itself.
Just to show the "apparent randomness" for larger m, below are the first 10 mappings when m=100,000,000:
just encoded sequence from 1 to 10. Result shown below:
encoded sequence: 81654327 63308654 44962981 26617308 8271635 89925962 71580289 53234616 34888943 16543270
decoded sequence: 1 2 3 4 5 6 7 8 9 10
Use the query below to create an 8-digit random number.
SELECT CAST(RAND() * 100000000 AS INT) AS [RandomNumber]
To avoid an exception when inserting an existing number into the DB, use the query below.
IF NOT EXISTS (SELECT UniqueColumnID FROM TABLENAME WHERE UniqueColumnID = @RandomNumber)
BEGIN
    --Insert query using @RandomNumber.
END
You can use NEWID() to generate uniqueidentifier data, which is random and unique each time.
To get 8 characters you can use the SUBSTRING, LEFT, etc. functions.
select substring(cast(NEWID() as varchar(100)), 1, 8)
Or new logic for uniqueness: http://forums.asp.net/t/1474299.aspx?How+to+generate+unique+key+of+fixed+length+20+digit+in+sql+server+2005+
select Left(cast(NewID() as varchar(36)), 4) + Right(cast(NewID() as varchar(36)), 4)
You can use the RAND() function for this too.
Check these links:
How do I generate random number for each row in a TSQL Select?
How to get numeric random uniqueid in SQL Server
Updated
If you want a unique value of int data-type that is 8 digits long, it may seem good to make an identity column like below, which works for 8-digit data (up to 99,999,999). But after that it gives you an exception, so be careful which logic you want. (I still say it's a bad idea.) Better to store a random value as above, with more length, so uniqueness comes.
create table temp (id numeric(8,0) IDENTITY(1,1) NOT NULL, value1 varchar ) -- if you do not want to stop after 8 digits, use the int datatype
insert into temp values( 'a'), ('b'), ('c')
select * from temp
drop table temp
Finally
It's not guaranteed unique, but it's hard to get duplicates with NEWID() (see the forums.asp.net link above).
Create a SQL function or procedure as follows:

ALTER FUNCTION [dbo].[GenerateRandomNo]
(
    @Lower INT = 111111111,
    @Upper INT = 999999999
)
RETURNS NVARCHAR(128)
AS
BEGIN
    DECLARE @TempRandom FLOAT
    DECLARE @Random NVARCHAR(128);
    -- Add the T-SQL statements to compute the return value here
    SELECT @TempRandom = RandomNo FROM RandomNo
    SELECT @Random = CONVERT(NVARCHAR(128), CONVERT(INT, ROUND(((@Upper - @Lower - 1) * @TempRandom + @Lower), 0)))
    WHILE EXISTS (SELECT * FROM Table WHERE Column = @Random)
    BEGIN
        SELECT @TempRandom = RandomNo FROM RandomNo
        SELECT @Random = CONVERT(NVARCHAR(128), CONVERT(INT, ROUND(((@Upper - @Lower - 1) * @TempRandom + @Lower), 0)))
    END
    -- Return the result of the function
    RETURN @Random
END

And then call that function, passing parameters if you want to generate a random number with a specific length or range.
--create
-- table Tbl( idx int )
DECLARE @unique_id int
SET @unique_id = ( SELECT ROUND( 89999999 * RAND() + 10000000, 0 ) )
IF NOT EXISTS ( SELECT idx FROM tbl WHERE idx = @unique_id )
BEGIN
    INSERT INTO tbl ( idx ) VALUES ( @unique_id )
    SELECT @unique_id, * FROM tbl
END;
--TRUNCATE TABLE tbl

Google Code Jam 2013 R1B - Falling Diamonds

Yesterday's Code Jam had a question titled Falling Diamonds. The full text can be found here, but in summary:
Diamonds fall down the Y axis.
If a diamond hits point to point with another diamond, there is a 50/50 chance it will slide to the right or left, provided it is not blocked from doing so.
If a diamond is blocked from sliding one direction, it will always slide the other way.
If a diamond is blocked in both directions, it will stop and rest on the blocking diamonds.
If a diamond hits the ground, it will bury itself half way, then stop.
The orientation of the diamond never changes, i.e. it will slide or sink, but not tumble.
The objective is to find the probability that a diamond will rest at a given coordinate, assuming N diamonds fall.
The above requirements basically boil down to the diamonds building successively larger pyramids, one layer at a time.
Suffice it to say, I have not been able to solve this problem to Google's satisfaction. I get the sample from the problem description correct, but fail on the actual input files. Ideally I would like to see a matched input and correct output file that I can play with to try and find my error. Barring that, I would also welcome comments on my code.
In general, my approach is to find how many layers are needed to have one which contains the coordinate. Once I know which layer I am looking at, I can determine a number of values relevant to the layer and point we are trying to reach. Such as how many diamonds are in the pyramid when this layer is empty, how many diamonds can stack up on a side before the rest are forced the other way, how many have to slide in the same direction to reach the desired point, etc.
I then check to see if the number of diamonds dropping either makes it impossible to reach the point (probability 0), or guarantees we will cover the point (probability 1). The challenge is in the middle ground where it is possible but not guaranteed.
For the middle ground, I first check to see if we are dropping enough to potentially fill a side and force remaining drops to slide in the opposite direction. Reason being that in this condition we can guarantee that a certain number of diamonds will slide to each side, which reduces the number of drops we have to worry about, and resolves the problem of the probability changing when a side gets full. Example: if 12 diamonds drop it is guaranteed that each side of the outer layer will have 2 or more diamonds in it, whether a given side has 2, 3, or 4 depends on the outcome of just 2 drops, not of all 6 that fall in this layer.
Once I know how many drops are relevant to success, and the number that have to break the same way in order to cover the point, I sum the probabilities that the requisite number, or more, will go the same way.
As I said, I can solve the sample in the problem description, but I am not getting the correct output for the input files. Unfortunately I have not been able to find anything telling me what the correct output is so that I can compare it to what I am getting. Here is my code (I have spent a fair amount of time since the contest ended trying to tune this for success and adding comments to keep from getting myself lost):
protected string Solve(string Line)
{
    string[] Inputs = Line.Split();
    int N = int.Parse(Inputs[0]);
    int X = int.Parse(Inputs[1]);
    int Y = int.Parse(Inputs[2]);
    int AbsX = X >= 0 ? X : -X;
    int SlideCount = AbsX + Y; // number that have to stack up on one side of desired layer in order to force the remaining drops to slide the other way.
    int LayerCount = (SlideCount << 1) | 1; // Layer is full when both sides have reached slidecount, and one more drops
    int Layer = SlideCount >> 1; // Zero based Index of the layer is 1/2 the slide count
    int TotalLayerEmpty = ((Layer * Layer) << 1) - Layer; // Total number of drops required to fill the layer below the desired layer
    int LayerDrops = N - TotalLayerEmpty; // how many will drop in this layer
    int MinForTarget; // Min number that have to be in the layer to hit the target location, i.e. all fall to correct side
    int TargetCovered; // Min number that have to be in the layer to guarantee the target is covered
    if (AbsX == 0)
    { // if target X is 0 we need the layer to be full for coverage (top one would slide off until both sides were full)
        MinForTarget = TargetCovered = LayerCount;
    }
    else
    {
        MinForTarget = Y + 1; // Need Y + 1 to hit an altitude of Y
        TargetCovered = MinForTarget + SlideCount; // Min number that have to be in the layer to guarantee the target is covered
    }
    if (LayerDrops >= TargetCovered)
    { // if we have enough dropping to guarantee the target is covered, probability is 1
        return "1.0";
    }
    else if (LayerDrops < MinForTarget)
    { // if we do not have enough dropping to reach the target under any scenario, probability is 0
        return "0.0";
    }
    else
    { // We have enough dropping that reaching the target is possible, but not guaranteed
        int BalancedDrops = LayerDrops > SlideCount ? LayerDrops - SlideCount : 0; // guaranteed to have this many on each side
        int CriticalDrops = LayerDrops - (BalancedDrops << 1); // the number of drops relevant to the probability of success
        int NumToSucceed = MinForTarget - BalancedDrops; // How many must break our way for success
        double SuccessProb = 0; // Probability that the number of diamonds sliding the same way is between NumToSucceed and CriticalDrops
        double ProbI;
        for (int I = NumToSucceed; I <= CriticalDrops; I++)
        {
            ProbI = Math.Pow(0.5, I); // Probability that I diamonds will slide the same way
            SuccessProb += ProbI;
        }
        return SuccessProb.ToString();
    }
}
Your general approach seems to fit the problem, though the calculation of the last probability is not completely correct.
Let me describe how I solved this. We are looking at pyramids. These pyramids can be assigned a layer, based on how many diamonds the pyramid has. A pyramid of layer 1 has only 1 diamond. A pyramid of layer 2 has 1 + 2 + 3 diamonds. A pyramid of layer 3 has 1 + 2 + 3 + 4 + 5 diamonds. A pyramid of layer n has 1 + 2 + 3 + ... + 2*n-1 diamonds, which equals (2 * n - 1) * n.
Given this, we can calculate the layer of the biggest pyramid we are able to build with a given number of diamonds:
layer = floor( ( sqrt( 1 + 8 * diamonds ) + 1 ) / 4 )
and the number of diamonds which are not needed in order to build this pyramid. These diamonds will start to fill the next bigger pyramid:
overflow = diamonds - layer * ( 2 * layer - 1 )
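For illustration, those two formulas in C# (a small sketch; the method name is mine):

// Largest complete pyramid that `diamonds` diamonds can build, and the leftover.
static void LayerAndOverflow(long diamonds, out long layer, out long overflow)
{
    layer = (long)Math.Floor((Math.Sqrt(1 + 8.0 * diamonds) + 1) / 4);
    overflow = diamonds - layer * (2 * layer - 1);
}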
We can now see the following things:
If the point is within layer layer, it will be covered, so p = 1.0.
If the point is not within layer layer + 1 (i.e. the next bigger pyramid), it will not be covered, so p = 0.0.
If the point is within layer layer + 1, it might be covered, so 0 <= p <= 1.
Since we only need to solve the last problem, we can simplify the problem statement a little bit: we are given the two sides of the triangle, r and l. Each side has a fixed capacity, the maximum number of diamonds it can take. What is the probability of one configuration (nr, nl), where nr denotes the diamonds on the right side, nl the diamonds on the left side, and nr + nl = overflow?
This probability can be calculated using Bernoulli trials:
P( nr ) = binomial_coefficient( overflow, nr ) * pow( 0.5, overflow )
However, this will fail in one case: if one side is completely filled with diamonds, the probabilities change. The probability that the diamond falls on the completely filled side is now 0, while the probability for the other side is 1.
Assume the following case: Each side can take up to 4 diamonds, while 6 diamonds are still left. The interesting case is now P( 2 ), because in this case, the left side will take 4 diamonds.
Some examples how the 6 diamonds could fall down. r stands for the decision go right, while l stands for go left:
l r l r l l => For every diamond, the probability for each side was 0.5; no choice was ever forced. The probability for exactly this case is pow( 0.5, 6 ). There are 10 different cases like this: the number of ways the two r's can be placed among the first five positions, binomial_coefficient( 5, 2 ) = 10.
l r l l l r => The last diamond was going to fall on the right side, because the left side was full. The last probability was 1 for the right side and 0 for the left side. The probability for exactly this case is pow( 0.5, 5 ). There are 4 different cases like this (rllllr, lrlllr, llrllr, lllrlr): binomial_coefficient( 4, 1 ) = 4.
l l l l r r => The last two diamonds were going to fall on the right side, because the left side was full. The last two probabilities were 1 for the right side and 0 for the left side. The probability for exactly this case is pow( 0.5, 4 ). There is exactly one case like this, because binomial_coefficient( 3, 0 ) = 1.
The general algorithm is to assume that the last 0, 1, 2, 3, ..., nr elements will go to the right side inevitably, then to calculate the probability for each of these cases (the last 0, 1, 2, 3, ..., nr probabilities will be 1) and multiply each probability by the number of different cases where the last 0, 1, 2, 3, ..., nr probabilities are 1.
See the following code. p will be the probability for the case that nr diamonds will go on the right side and the left side is full:
p = 0.0
for i in range( nr + 1 ):
    p += pow( 0.5, overflow - i ) * binomial_coefficient( overflow - i - 1, nr - i )
Now that we can calculate the probabilities for the individual configurations (nr, nl), we can simply add up all cases where nr >= k, with k being the minimal number of diamonds on one side for which the required point is still covered.
See the complete python code I used for this problem: https://github.com/frececroka/codejam-2013-falling-diamonds/blob/master/app.py
Your assumptions are oversimplified. You can download the correct answers for the large dataset, calculated with my solution:
http://pastebin.com/b6xVhp9U
You have to calculate all the possible combinations of diamonds that will occupy your point of interest. To do that I used this formula:
https://math.stackexchange.com/a/382123/32707
You basically have to:
Calculate the height of the pyramid (i.e. the FIXED diamonds)
Calculate the number of diamonds that can freely move to the left or to the right
Calculate the probability (with sums of binomial coefficients)
With the latter and the point Y you can apply that formula to calculate the probability.
Also, don't worry if you were not able to solve this problem, because it was pretty tough. If you want my solution in PHP, here it is.
Note that you have to determine whether the point is inside the fixed pyramid or outside the fixed pyramid, and you also have to do other minor checks.
<?php
set_time_limit(0);

$data = file('2bl.in', FILE_IGNORE_NEW_LINES);
$number = array_shift($data);

for ($i = 0; $i < $number; $i++) {
    $firstLine = array_shift($data);
    $firstLine = explode(' ', $firstLine);
    $s = $firstLine[0];
    $x = $firstLine[1];
    $y = $firstLine[2];
    $s = calcCase($s, $x, $y);
    appendResult($i + 1, $s);
}

function calcCase($s, $x, $y) {
    echo "S: [$s] P($x,$y)\n<br>";
    $realH = round(calcH($s), 1);
    echo "RealHeight [$realH] ";
    $h = floor($realH);
    if (isEven($h))
        $h--;
    $exactDiamonds = progression($h);
    movableDiamonds($s, $h, $exactDiamonds, $movableDiamonds, $unfullyLevel);
    $widthLevelPoint = $h - $y;
    $spacesX = abs($x) - $widthLevelPoint;
    $isFull = (int)isFull($s, $exactDiamonds);
    echo "Diamonds: [$s], isFull [$isFull], Height: [$h], exactDiamonds [$exactDiamonds], movableDiamonds [$movableDiamonds], unfullyLevel [$unfullyLevel] <br>
        widthlevel [$widthLevelPoint],
        distance from pyramid (horizontal) [$spacesX]<br> ";
    if ($spacesX > 1)
        return '0.0';
    $pointHeight = $y + 1;
    if ($x == 0 && $pointHeight > $h) {
        return '0.0';
    }
    if ($movableDiamonds == 0) {
        echo 'Fixed pyramid';
        if ($y <= $h && abs($x) <= $widthLevelPoint)
            return '1.0';
        else
            return '0.0';
    }
    if (!$isFull) {
        echo "Pyramid Not Full ";
        if ($spacesX > 0)
            return '0.0';
        if ($unfullyLevel == $widthLevelPoint)
            return '0.5';
        else if ($unfullyLevel > $widthLevelPoint)
            return '0.0';
        else
            return '1.0';
    }
    echo "Pyramid full";
    if ($spacesX <= 0)
        return '1.0';
    if ($movableDiamonds == 0)
        return '0.0';
    if ($movableDiamonds > ($h + 1)) {
        $otherDiamonds = $movableDiamonds - ($h + 1);
        if ($otherDiamonds - $pointHeight >= 0) {
            return '1.0';
        }
    }
    $totalWays = totalWays($movableDiamonds);
    $goodWays = goodWays($pointHeight, $movableDiamonds, $totalWays);
    echo "<br>GoodWays: [$goodWays], totalWays: [$totalWays]<br>";
    return sprintf("%1.7f", $goodWays / $totalWays);
}

function goodWays($pointHeight, $movableDiamonds, $totalWays) {
    echo "<br>Point height [$pointHeight] ";
    if ($pointHeight > $movableDiamonds)
        return 0;
    if ($pointHeight == $movableDiamonds)
        return 1;
    $good = sumsOfBinomial($movableDiamonds, $pointHeight);
    return $good;
}

function totalWays($diamonds) {
    return pow(2, $diamonds);
}

function sumsOfBinomial($n, $k) {
    $sum = 1; //> Last element (n;n)
    for ($i = $k; $i < ($n); $i++) {
        $bc = binomial_coeff($n, $i);
        //echo "<br>Binomial Coeff ($n;$i): [$bc] ";
        $sum += $bc;
    }
    return $sum;
}

// calculate binomial coefficient
function binomial_coeff($n, $k) {
    $j = $res = 1;
    if ($k < 0 || $k > $n)
        return 0;
    if (($n - $k) < $k)
        $k = $n - $k;
    while ($j <= $k) {
        $res = bcmul($res, $n--);
        $res = bcdiv($res, $j++);
    }
    return $res;
}

function isEven($n) {
    return !($n & 1);
}

function isFull($s, $exact) {
    return ($exact <= $s);
}

function movableDiamonds($s, $h, $exact, &$movableDiamonds, &$level) {
    $baseWidth = $h;
    $level = $baseWidth;
    //> Full pyramid
    if (isFull($s, $exact)) {
        $movableDiamonds = ($s - $exact);
        return;
    }
    $movableDiamonds = $s;
    while ($level) {
        //echo "<br> movable [$movableDiamonds] removing [$level] <br>";
        if ($level > $movableDiamonds)
            break;
        $movableDiamonds = $movableDiamonds - $level;
        $level--;
        if ($movableDiamonds <= 0)
            break;
    }
    return $movableDiamonds;
}

function progression($n) {
    return (1/2 * $n * (1 + $n));
}

function calcH($s) {
    if ($s <= 3)
        return 1;
    $sqrt = sqrt(1 + (4 * 2 * $s));
    //echo "Sqrt: [$sqrt] ";
    return ($sqrt - 1) / 2;
}

function appendResult($caseNumber, $string) {
    static $first = true;
    //> Cleaning file
    if ($first) {
        file_put_contents('result.out', '');
        $first = false;
    }
    $to = "Case #{$caseNumber}: {$string}";
    file_put_contents('result.out', $to . "\n", FILE_APPEND);
    echo $to . '<br>';
}

Average function without overflow exception

.NET Framework 3.5.
I'm trying to calculate the average of some pretty large numbers.
For instance:
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var items = new long[]
        {
            long.MaxValue - 100,
            long.MaxValue - 200,
            long.MaxValue - 300
        };
        try
        {
            var avg = items.Average();
            Console.WriteLine(avg);
        }
        catch (OverflowException ex)
        {
            Console.WriteLine("can't calculate that!");
        }
        Console.ReadLine();
    }
}
Obviously, the mathematical result is 9223372036854775607 (long.MaxValue - 200), but I get an exception there. This is because the implementation (on my machine) to the Average extension method, as inspected by .NET Reflector is:
public static double Average(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    long num = 0L;
    long num2 = 0L;
    foreach (long num3 in source)
    {
        num += num3;
        num2 += 1L;
    }
    if (num2 <= 0L)
    {
        throw Error.NoElements();
    }
    return (((double) num) / ((double) num2));
}
I know I can use a BigInt library (yes, I know that it is included in .NET Framework 4.0, but I'm tied to 3.5).
But I still wonder if there's a pretty straightforward implementation of calculating the average of integers without an external library. Do you happen to know of such an implementation?
Thanks!!
UPDATE:
The previous example, of three large integers, was just an example to illustrate the overflow issue. The question is about calculating an average of any set of numbers which might sum to a large number that exceeds the type's max value. Sorry about this confusion. I also changed the question's title to avoid additional confusion.
Thanks all!!
This answer used to suggest storing the quotient and remainder (mod count) separately. That solution is less space-efficient and more code-complex.
In order to accurately compute the average, you must keep track of the total. There is no way around this, unless you're willing to sacrifice accuracy. You can try to store the total in fancy ways, but ultimately you must be tracking it if the algorithm is correct.
For single-pass algorithms, this is easy to prove. Suppose you can't reconstruct the total of all preceding items, given the algorithm's entire state after processing those items. But wait: we could then simulate the algorithm receiving a series of 0 items until we finish off the sequence, multiply the result by the count, and get the total. Contradiction. Therefore a single-pass algorithm must be tracking the total in some sense.
Therefore the simplest correct algorithm will just sum up the items and divide by the count. All you have to do is pick an integer type with enough space to store the total. Using a BigInteger guarantees no issues, so I suggest using that.
var total = System.Numerics.BigInteger.Zero; // .NET 4+; on 3.5, substitute any BigInteger library
long count = 0;
foreach (long i in values)
{
    count += 1;
    total += i;
}
return (double)total / count; // warning: possible loss of accuracy, maybe return a Rational instead?
If you're just looking for an arithmetic mean, you can perform the calculation like this:
public static double Mean(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    double count = (double)source.Count();
    double mean = 0D;
    foreach (long x in source)
    {
        mean += (double)x / count;
    }
    return mean;
}
Edit:
In response to comments, there definitely is a loss of precision this way, due to performing numerous divisions and additions. For the values indicated by the question, this should not be a problem, but it should be a consideration.
You may try the following approach:
Let the number of elements be N, and the numbers arr[0], ..., arr[N-1].
You need to define 2 variables:
mean and remainder.
Initially mean = 0, remainder = 0.
At step i you need to change mean and remainder in the following way:
mean += arr[i] / N;
remainder += arr[i] % N;
mean += remainder / N;
remainder %= N;
After N steps you will get the correct answer in the mean variable, and remainder / N will be the fractional part of the answer (I am not sure you need it, but anyway).
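A minimal C# sketch of this approach (the method name is mine; it assumes non-negative values for simplicity, since C#'s truncated division and modulo need extra care with negatives):

// Overflow-free mean via per-element quotient and remainder (assumes values >= 0).
public static long MeanWithoutOverflow(long[] arr)
{
    long n = arr.Length;
    long mean = 0, remainder = 0;
    foreach (long v in arr)
    {
        mean += v / n;
        remainder += v % n;   // stays below 2 * n, so it never overflows
        mean += remainder / n;
        remainder %= n;
    }
    return mean; // the fractional part of the true mean is remainder / (double)n
}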
If you know approximately what the average will be (or, at least, that all pairs of numbers will have a max difference < long.MaxValue), you can calculate the average difference from that value instead. I take an example with low numbers, but it works equally well with large ones.
// Let's say numbers cannot exceed 40.
List<int> numbers = new List<int>() { 31, 28, 24, 32, 36, 29 }; // Average: 30
List<int> diffs = new List<int>();
// This can probably be done more effectively in LINQ, but to show the idea:
foreach (int number in numbers.Skip(1))
{
    diffs.Add(number - numbers.First());
}
// diffs now contains { -3, -7, 1, 5, -2 }
var avgDiff = diffs.Sum() / diffs.Count(); // the average is -1
// To get the average value, just add the average diff to the first value:
var totalAverage = numbers.First() + avgDiff;
You can of course implement this in some way that makes it easier to reuse, for example as an extension method to IEnumerable<long>.
Here is how I would do it if given this problem. First, let's define a very simple RationalNumber class, which contains two properties, Dividend and Divisor, and an operator for adding two rational numbers. Here is how it looks:
public sealed class RationalNumber
{
    public RationalNumber()
    {
        this.Divisor = 1;
    }

    public static RationalNumber operator +( RationalNumber c1, RationalNumber c2 )
    {
        RationalNumber result = new RationalNumber();
        Int64 nDividend = ( c1.Dividend * c2.Divisor ) + ( c2.Dividend * c1.Divisor );
        Int64 nDivisor = c1.Divisor * c2.Divisor;
        Int64 nRemainder = nDividend % nDivisor;
        if ( nRemainder == 0 )
        {
            // The number is whole
            result.Dividend = nDividend / nDivisor;
        }
        else
        {
            Int64 nGreatestCommonDivisor = FindGreatestCommonDivisor( nDividend, nDivisor );
            if ( nGreatestCommonDivisor != 0 )
            {
                nDividend = nDividend / nGreatestCommonDivisor;
                nDivisor = nDivisor / nGreatestCommonDivisor;
            }
            result.Dividend = nDividend;
            result.Divisor = nDivisor;
        }
        return result;
    }

    private static Int64 FindGreatestCommonDivisor( Int64 a, Int64 b )
    {
        Int64 nRemainder;
        while ( b != 0 )
        {
            nRemainder = a % b;
            a = b;
            b = nRemainder;
        }
        return a;
    }

    // a / b: a is the dividend, b is the divisor
    public Int64 Dividend { get; set; }
    public Int64 Divisor { get; set; }
}
The second part is really easy. Let's say we have an array of numbers. Their average is estimated by Sum(Numbers)/Length(Numbers), which is the same as Number[ 0 ] / Length + Number[ 1 ] / Length + ... + Number[ n ] / Length. To be able to calculate this, we will represent each Number[ i ] / Length as a whole number plus a rational part ( remainder ). Here is how it looks:
Int64[] aValues = new Int64[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };
List<RationalNumber> list = new List<RationalNumber>();
Int64 nAverage = 0;
for ( Int32 i = 0; i < aValues.Length; ++i )
{
    Int64 nRemainder = aValues[ i ] % aValues.Length;
    Int64 nWhole = aValues[ i ] / aValues.Length;
    nAverage += nWhole;
    if ( nRemainder != 0 )
    {
        list.Add( new RationalNumber() { Dividend = nRemainder, Divisor = aValues.Length } );
    }
}
RationalNumber rationalTotal = new RationalNumber();
foreach ( var rational in list )
{
    rationalTotal += rational;
}
nAverage = nAverage + ( rationalTotal.Dividend / rationalTotal.Divisor );
At the end we have a list of rational numbers, and a whole number, which we sum together to get the average of the sequence without an overflow. The same approach can be taken for any type without an overflow for it, and there is no loss of precision.
EDIT:
Why this works:
Define: A, a set of numbers.
If Average( A ) = SUM( A ) / LEN( A ) =>
Average( A ) = A[ 0 ] / LEN( A ) + A[ 1 ] / LEN( A ) + A[ 2 ] / LEN( A ) + ..... + A[ N ] / LEN( A ) =>
Define An to be a number that satisfies An = X + ( Y / LEN( A ) ); this holds because if you divide A[ n ] by B you get X with a remainder, a rational number ( Y / B ).
=> so
Average( A ) = A1 + A2 + A3 + ... + AN = X1 + X2 + X3 + X4 + ... + Remainder1 + Remainder2 + ...;
Sum the whole parts, and sum the remainders by keeping them in rational-number form. In the end we get one whole number and one rational, which summed together give Average( A ). Depending on what precision you'd like, you apply this only to the rational number at the end.
Simple answer with LINQ...
var data = new[] { int.MaxValue, int.MaxValue, int.MaxValue };
var mean = (int)data.Select(d => (double)d / data.Count()).Sum();
Depending on the size of the set of data, you may want to force the data with .ToList() or .ToArray() before you process this method so it can't re-query Count on each pass. (Or you can call it before the .Select(..).Sum().)
If you know in advance that all your numbers are going to be 'big' (in the sense of 'much nearer long.MaxValue than zero'), you can calculate the average of their distance from long.MaxValue; the average of the numbers is then long.MaxValue less that.
However, this approach will fail if (m)any of the numbers are far from long.MaxValue, so it's horses for courses...
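A tiny sketch of that idea (the method name is mine; it assumes every value is close enough to long.MaxValue that the summed distances don't overflow):

// Average of large values via their distances from long.MaxValue.
static long AverageNearMax(long[] values)
{
    long distanceSum = 0;
    foreach (long v in values)
        distanceSum += long.MaxValue - v; // small if v is near MaxValue
    return long.MaxValue - distanceSum / values.Length;
}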
I guess there has to be a compromise somewhere or other. If the numbers are really getting so large, then a few digits of the lowest orders (say the lower 5 digits) might not affect the result much.
Another issue is when you don't really know the size of the dataset coming in, especially in stream/real-time cases. Here I don't see any solution other than
(previousAverage * oldCount + newValue) / (oldCount + 1), where oldCount is then incremented.
Here's a suggestion:
*LargestDataTypePossible* currentAverage;
*SomeSuitableDatatypeSupportingRationalValues* newValue;
*int* count;

addToCurrentAverage(value) {
    newValue = value / 100000;
    count = count + 1;
    currentAverage = (currentAverage * (count - 1) + newValue) / count;
}

getCurrentAverage() {
    return currentAverage * 100000;
}
Averaging numbers of a specific numeric type in a safe way while also only using that numeric type is actually possible, although I would advise using the help of BigInteger in a practical implementation. I created a project for Safe Numeric Calculations that has a small structure (Int32WithBoundedRollover) which can sum up to 2^32 int32s without any overflow (the structure internally uses two int32 fields to do this, so no larger data types are used).
Once you have this sum you then need to calculate sum/total to get the average, which you can do (although I wouldn't recommend it) by creating and then incrementing by total another instance of Int32WithBoundedRollover. After each increment you can compare it to the sum until you find out the integer part of the average. From there you can peel off the remainder and calculate the fractional part. There are likely some clever tricks to make this more efficient, but this basic strategy would certainly work without needing to resort to a bigger data type.
That being said, the current implementation isn't built for this (for instance, there is no comparison operator on Int32WithBoundedRollover, although it wouldn't be too hard to add). The reason is that it is just much simpler to use BigInteger at the end to do the calculation. Performance-wise this doesn't matter too much for large averages, since it will only be done once, and it is just too clean and easy to understand to worry about coming up with something clever (at least so far...).
As far as your original question which was concerned with the long data type, the Int32WithBoundedRollover could be converted to a LongWithBoundedRollover by just swapping int32 references for long references and it should work just the same. For Int32s I did notice a pretty big difference in performance (in case that is of interest). Compared to the BigInteger only method the method that I produced is around 80% faster for the large (as in total number of data points) samples that I was testing (the code for this is included in the unit tests for the Int32WithBoundedRollover class). This is likely mostly due to the difference between the int32 operations being done in hardware instead of software as the BigInteger operations are.
How about BigInteger in Visual J#?
If you're willing to sacrifice precision, you could do something like:
long num2 = 0L;
foreach (long num3 in source)
{
    num2 += 1L;
}
if (num2 <= 0L)
{
    throw Error.NoElements();
}
double average = 0;
foreach (long num3 in source)
{
    average += (double)num3 / (double)num2;
}
return average;
Perhaps you can reduce every item by the collection's count, calculate the average of the adjusted values, and then multiply by the number of elements in the collection. However, you'll get a slightly different result, because of the per-item integer truncation and the order of floating-point operations.
var items = new long[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };
var avg = items.Average(i => i / items.Count()) * items.Count();
You could keep a rolling average which you update once for each large number.
Use the IntX library on CodePlex.
NextAverage = CurrentAverage + (NewValue - CurrentAverage) / (CurrentObservations + 1)
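A short C# sketch of that rolling update (the method name is mine; the division is done in floating point, so some precision is traded for overflow safety):

// Incremental (rolling) mean: never sums the raw values, so it cannot overflow.
public static double RollingAverage(IEnumerable<long> values)
{
    double currentAverage = 0;
    long observations = 0;
    foreach (long v in values)
    {
        currentAverage += (v - currentAverage) / (observations + 1);
        observations += 1;
    }
    return currentAverage;
}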
Here is my version of an extension method that can help with this.
public static long Average(this IEnumerable<long> longs)
{
    long mean = 0;
    long count = longs.Count();
    foreach (var val in longs)
    {
        mean += val / count;
    }
    return mean;
}
Let Avg(n) be the average of the first n numbers, and data[n] the nth number.
Avg(n) = (double)(n-1)/(double)n * Avg(n-1) + (double)data[n]/(double)n
This avoids value overflow; however, it loses precision when n is very large.
For two positive numbers (or two negative numbers), I found a very elegant solution here,
where the average computation (a + b) / 2 can be replaced with a + ((b - a) / 2).
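In C#, that looks like the following (a sketch valid whenever b - a does not overflow, e.g. when a and b have the same sign):

// Overflow-safe midpoint of two values of the same sign.
static long Midpoint(long a, long b)
{
    return a + (b - a) / 2;
}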
