I'm looking for a non-recursive algorithm (preferably in C#) which will generate a list of all sums possible from a set of positive numbers.
E.g. for a set of three numbers "1,2,3", the following seven sums are possible:
1
2
3
1+2=3
1+3=4
2+3=5
1+2+3=6
The maximum set size would be around 50. I know how to approach this problem recursively, but have been limited by the call stack in the past when tackling a similar problem, so I want to avoid it this time.
If you just need all possible sums then you can use this function.
public static IEnumerable<int> GetSums(List<int> list)
{
    // Each bitmask m encodes one subset of the list. Start at 1 to skip
    // the empty subset (m = 0), which would yield an extra sum of 0.
    // Note: the int bitmask limits this to list.Count < 31; 2^50
    // iterations would be infeasible anyway.
    return from m in Enumerable.Range(1, (1 << list.Count) - 1)
           select
               (from i in Enumerable.Range(0, list.Count)
                where (m & (1 << i)) != 0
                select list[i]).Sum();
}
And then just call it like this:
var result = GetSums(myList).ToList();
Additional information:
You can also use this method for generating combinations (the power set):
public static IEnumerable<IEnumerable<T>> GetPowerSet<T>(List<T> list)
{
    return from m in Enumerable.Range(0, 1 << list.Count)
           select
               from i in Enumerable.Range(0, list.Count)
               where (m & (1 << i)) != 0
               select list[i];
}
And find the sums of all combinations with the help of the Sum() method from the System.Linq namespace:
var result = GetPowerSet(myList).Select(x => x.Sum()).ToList();
Each sum comes from a subset of your set, and subsets are in direct correspondence with binary sequences. If you have five items in your set, you want to iterate over all bit sequences from 00000 to 11111; equivalently, you want to iterate from 0 to 2^5 - 1. If a bit is set to one, you include the corresponding value in the sum. So, something like this:
for i = 0 to 2^n - 1
    sum = 0
    for j = 0 to n - 1
        if i & (1 << j) then
            sum += items[j]
    yield return sum
Obviously, this is pseudocode and doesn't deal with values of n larger than the number of bits used by i, but that is going to be a long iteration. This should at least get you started.
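Translating the pseudocode into C#, a minimal sketch might look like this (names are mine; a long mask handles up to 62 items, though of course 2^50 iterations is impractical, and the mask starts at 1 to skip the empty subset):
static IEnumerable<int> AllSubsetSums(int[] items)
{
    int n = items.Length; // assumes n < 63 so the masks fit in a long
    for (long mask = 1; mask < (1L << n); mask++)
    {
        int sum = 0;
        for (int j = 0; j < n; j++)
        {
            // Bit j of the mask decides whether items[j] joins the sum.
            if ((mask & (1L << j)) != 0)
                sum += items[j];
        }
        yield return sum;
    }
}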
If the sum of all numbers is bounded by a reasonably small value MaxSum, then a DP solution exists with complexity O(N * MaxSum); otherwise there are up to O(2^N) possible sums.
DP solution (Delphi):
procedure GenerateAllSums(const A: array of Integer);
var
  ASums: array of Boolean;
  S, i, j: Integer;
begin
  // find maximal possible sum
  S := 0;
  for i := 0 to High(A) do
    S := S + A[i];
  // make array for possible sums
  SetLength(ASums, S + 1);
  ASums[0] := True; // all others - False
  for i := 0 to High(A) do
    for j := S - A[i] downto 0 do
      if ASums[j] then
        ASums[j + A[i]] := True;
  // now the True elements of ASums denote the possible sum values
end;
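For C# readers, a minimal sketch of the same DP (method name is mine; assumes the values are positive, as in the question):
static bool[] GetPossibleSums(int[] a)
{
    int s = 0; // maximal possible sum
    foreach (var v in a)
        s += v;

    var sums = new bool[s + 1];
    sums[0] = true; // the empty subset sums to 0
    foreach (var v in a)
    {
        // Walk downwards so each value is used at most once.
        for (int j = s - v; j >= 0; j--)
            if (sums[j])
                sums[j + v] = true;
    }
    return sums; // sums[k] == true means sum k is achievable
}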
I have a maths issue within my program. I think the problem is simple but I'm not sure what terms to use, hence my own searches returned nothing useful.
I receive some values in a method; the only thing I know (in terms of logic) is that the numbers will be ones which can be duplicated.
In other words, the numbers I could receive are predictable and would be one of the following
1
2
4
16
256
65536
etc
I need to know at what index they appear. In other words, 1 is always at index 0, 2 at index 1, 4 at index 2, 16 at index 3, 256 at index 4, etc.
I know I could write a big switch statement, but I was hoping a formula would be tidier. Do you know if one exists, or any clues as to the names of the math formulas I'd be using?
The numbers you listed are powers of two. The inverse function of raising a number to a power is the logarithm, so that's what you use to go backwards from (using your terminology here) a number to an index.
var num = 256;
var ind = Math.Log(num, 2);
Above, ind is the base-2 logarithm of num. This code will work for any base; just substitute that base for 2. If you are only going to be working with powers of 2, you can use a faster special-case solution based on the bitwise representation of your input; see What's the quickest way to compute log2 of an integer in C#?
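As a minimal integer-only sketch of that special case (not the fastest method from the linked question, just a simple loop; helper name is mine):
static int Log2(ulong value)
{
    // Returns floor(log2(value)) for value >= 1; exact when value is a power of two.
    int log = 0;
    while ((value >>= 1) != 0)
        log++;
    return log;
}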
Try
Math.Log(num, base)
where base is 2
MSDN: http://msdn.microsoft.com/en-us/library/hd50b6h5.aspx
The logarithm gives you back the power of the base for your number.
But that's only the case if your number really is a power of 2;
otherwise you have to understand exactly what you have and what you need.
It also looks like the numbers were raised to a power of two twice (num = 2^(2^(index - 1)) for index >= 1), so try this:
private static int getIndexOfSeries(UInt64 aNum)
{
    if (aNum == 1)
        return 0;
    else if (aNum == 2)
        return 1;
    else
    {
        // aNum = 2^(2^(index - 1))  =>  index = 1 + log2(log2(aNum))
        int lNum = (int)Math.Log(aNum, 2);
        return 1 + (int)Math.Log(lNum, 2);
    }
}
Result for UInt64[] Arr = new UInt64[] { 1, 2, 4, 16, 256, 65536, 4294967296 } is:
Num[0] = 1
Num[1] = 2
Num[2] = 4
Num[3] = 16
Num[4] = 256
Num[5] = 65536
Num[6] = 4294967296 //65536*65536
where [i] is the index
You should calculate the base 2 logarithm of the number
Hint: for these pairs (x on the left, the number on the right):
x   num
0   2
1   4
2   16
3   256
4   65536
5   4294967296
etc.
The formula is, for a given integer x:
Math.Pow(2, Math.Pow(2, x));
that is
2 to the power (2 to the power (x) )
Once the formula is known, one could solve it for x (I won't go through that since you already got an answer).
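Solving for x just means taking the base-2 logarithm twice. A small sketch (method name is mine; assumes num really has the form 2^(2^x), so num >= 2):
static int IndexOf(ulong num)
{
    // num = 2^(2^x)  =>  log2(num) = 2^x  =>  x = log2(log2(num))
    return (int)Math.Round(Math.Log(Math.Log(num, 2), 2));
}
// IndexOf(2) == 0, IndexOf(4) == 1, IndexOf(16) == 2, IndexOf(65536) == 4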
I would like to make a frequency table from random numbers.
So I have created an array and filled it with 11 random values between 0 and 9999.
public void FillArrayRandom(int[] T)
{
    Random Rndint = new Random();
    for (int i = 0; i < T.Length; i++)
    {
        // Next's upper bound is exclusive, so use 10000 to include 9999.
        T[i] = Rndint.Next(0, 10000);
    }
}/*FillArrayRandom*/
The result I want is something like this (bar height up to 21, which will be a constant):
*
* *
* * * (the highest value will have the largest row/bar)
* * * *
0 1 2 3 .....(index values)
931 6669 10 8899 .... (up to 11 random values)
My question is: how exactly do I calculate the frequency of those 11 random values?
The bars should be in relative proportion to each other, depending on their frequency.
I would only like to use one single array in my program (for the generated values).
F = (F * 21?) / ...? I really have no clue how to obtain the proper results.
If a frequency is >= 21, write *; if a frequency is >= 20, write *; if a frequency is >= 19, write *; and so on until I reach 1 (and the full table is displayed).
Basically I would like to print the table line by line with Console.WriteLine().
etc...
Regards.
To calculate frequency you could use a Dictionary, defined something like:
Dictionary<int, int> freqDict = new Dictionary<int, int>();
Here the first integer (K) is the key, which corresponds to your random value or its index in the values array; either way it has to be unique and able to reference a particular value. The second integer is the value (V), which is your count for each key.
Next, walk your array of randomly generated values: if a value is not yet represented in the dictionary, add it as a new key with a value of 1; if the dictionary already contains the key, simply increment its value by 1. Do this for each value in your rand array and you will have a dictionary with a frequency distribution.
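A rough sketch of both steps (counting, then scaling the bars so the tallest is 21 rows; the names and the scaling rule are mine, not fixed):
using System;
using System.Collections.Generic;
using System.Linq;

static void PrintFrequencyBars(int[] values)
{
    const int MaxBarHeight = 21;

    // Step 1: count occurrences of each value.
    var freq = new Dictionary<int, int>();
    foreach (var v in values)
        freq[v] = freq.TryGetValue(v, out var c) ? c + 1 : 1;

    var keys = freq.Keys.OrderBy(k => k).ToList();
    int maxCount = freq.Values.Max();

    // Step 2: print row by row, tallest row first.
    for (int row = MaxBarHeight; row >= 1; row--)
    {
        foreach (var k in keys)
        {
            // Scale each count so the most frequent value fills all 21 rows.
            int height = freq[k] * MaxBarHeight / maxCount;
            Console.Write(height >= row ? "* " : "  ");
        }
        Console.WriteLine();
    }
    Console.WriteLine(string.Join(" ", keys)); // value labels underneath
}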
I'm looking to use a rolling hash function so I can take hashes of n-grams of a very large string.
For example:
"stackoverflow", broken up into 5 grams would be:
"stack", "tacko", "ackov", "ckove",
"kover", "overf", "verfl", "erflo", "rflow"
This is ideal for a rolling hash function because after I calculate the first n-gram's hash, the following ones are relatively cheap to calculate: I simply drop the contribution of the first letter and add in the new last letter.
I know that in general this hash function is generated as:
H = c_1*a^(k-1) + c_2*a^(k-2) + c_3*a^(k-3) + ... + c_k*a^0, where a is a constant and c_1, ..., c_k are the input characters.
If you follow this link on the Rabin-Karp string search algorithm, it states that "a" is usually some large prime.
I want my hashes to be stored in 32 bit integers, so how large of a prime should "a" be, such that I don't overflow my integer?
Does there exist an existing implementation of this hash function somewhere that I could already use?
Here is an implementation I created:
public class hash2
{
    public int prime = 101;

    // Polynomial hash: sum of c * prime^(position from the right).
    public int hash(String text)
    {
        int hash = 0;
        for (int i = 0; i < text.length(); i++)
        {
            char c = text.charAt(i);
            hash += c * (int) (Math.pow(prime, text.length() - 1 - i));
        }
        return hash;
    }

    // Drop the leading character's contribution, shift, then add the new trailing character.
    public int rollHash(int previousHash, String previousText, String currentText)
    {
        char firstChar = previousText.charAt(0);
        char lastChar = currentText.charAt(currentText.length() - 1);
        int firstCharHash = firstChar * (int) (Math.pow(prime, previousText.length() - 1));
        int hash = (previousHash - firstCharHash) * prime + lastChar;
        return hash;
    }

    public static void main(String[] args)
    {
        hash2 hashify = new hash2();
        int firstHash = hashify.hash("mydog");
        System.out.println(firstHash);
        System.out.println(hashify.hash("ydogr"));
        System.out.println(hashify.rollHash(firstHash, "mydog", "ydogr"));
    }
}
I'm using 101 as my prime. Does it matter if my hashes will overflow? I think this is desirable but I'm not sure.
Does this seem like the right way to go about this?
I remember a slightly different implementation which seems to be from one of Sedgewick's algorithms books (it also contains example code; try to look it up). Here's a summary adjusted to 32-bit integers:
you use modulo arithmetic to prevent your integer from overflowing after each operation.
initially set:
c = text ("stackoverflow")
M = length of the "n-grams"
d = size of your alphabet (256)
q = a large prime so that (d+1)*q doesn't overflow (8355967 might be a good choice)
dM = d^(M-1) mod q
first calculate the hash value of the first n-gram:
h = 0
for i from 1 to M:
    h = (h*d + c[i]) mod q
and for every following n-gram:
for i from 1 to length(c) - M:
    // first subtract the oldest character
    h = (h + d*q - c[i]*dM) mod q
    // then add the next character
    h = (h*d + c[i+M]) mod q
The reason you have to add d*q before subtracting the oldest character is that the previous modulo operation can leave small values, and you might otherwise run into negative numbers.
Errors included, but I think you should get the idea. Try to find one of Sedgewick's algorithms books for details, fewer errors, and a better description. :)
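Transcribed into C#, that summary might look like this sketch (method name is mine; constants follow the summary above, and (d+1)*q = 257 * 8355967 stays just below 2^31):
static IEnumerable<long> RollingHashes(string c, int M)
{
    const long d = 256;      // alphabet size
    const long q = 8355967;  // large prime so that (d + 1) * q doesn't overflow 32 bits

    long dM = 1;             // d^(M-1) mod q
    for (int i = 0; i < M - 1; i++)
        dM = (dM * d) % q;

    long h = 0;              // hash of the first M-gram
    for (int i = 0; i < M; i++)
        h = (h * d + c[i]) % q;
    yield return h;

    for (int i = 0; i + M < c.Length; i++)
    {
        // Add d*q before subtracting so the intermediate value can't go negative.
        h = (h + d * q - (c[i] * dM) % q) % q;
        h = (h * d + c[i + M]) % q;
        yield return h;
    }
}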
As I understand it, it's a maximization problem: find the largest A for which the biggest possible hash still fits in 32 bits, i.e.
2^31 - sum_{x=1..k} (maxchar * A^(k-x)) > 0
where maxchar = 62 (for A-Za-z0-9). I just calculated it in Excel (OO Calc, actually) :) and the max A it found is 76, or 73 for a prime.
Not sure what your aim is here, but if you are trying to improve performance, using Math.pow will cost you far more than you save by calculating a rolling hash value.
I suggest you start by keeping it simple and efficient; you will very likely find it is fast enough.
I have a need to write code that will prorate a value across a list, based on the relative weights of "basis" values in the list. Simply dividing the "basis" values by the sum of the "basis" values and then multiplying the factor by the original value to prorate works to a certain degree:
proratedValue = (basis / basisTotal) * prorationAmount;
However, the result of this calculation must then be rounded to integer values. The effect of the rounding means that the sum of proratedValue for all items in the list may differ from the original prorationAmount.
Can anyone explain how to apply a "lossless" proration algorithm that proportionately distributes a value across a list as accurately as possible, without suffering from rounding errors?
Simple algorithm sketch here...
1. Have a running total which starts at zero.
2. Do your standard "divide basis by total basis, then multiply by proration amount" for the first item.
3. Store the original value of the running total elsewhere, then add the amount you just calculated in #2.
4. Round (or truncate; just use the same rule for both) the old value and the new value of the running total to integers (don't modify the existing values, round them into separate variables), and take the difference.
5. The number calculated in step #4 is the value assigned to the current basis.
6. Repeat steps #2-#5 for each basis.
This is guaranteed to have the total amount prorated equal to the input prorate amount, because you never actually modify the running total itself (you only take rounded values of it for other calculations, you don't write them back). What would have been an issue with integer rounding before is now dealt with, since the rounding error will add up over time in the running total and eventually push a value across the rounding threshold in the other direction.
Basic example:
Input basis: [0.2, 0.3, 0.3, 0.2]
Total prorate: 47
----
R used to indicate running total here:
R = 0
First basis:
oldR = R [0]
R += (0.2 / 1.0 * 47) [= 9.4]
results[0] = int(R) - int(oldR) [= 9]
Second basis:
oldR = R [9.4]
R += (0.3 / 1.0 * 47) [+ 14.1, = 23.5 total]
results[1] = int(R) - int(oldR) [23-9, = 14]
Third basis:
oldR = R [23.5]
R += (0.3 / 1.0 * 47) [+ 14.1, = 37.6 total]
results[2] = int(R) - int(oldR) [37-23, = 14]
Fourth basis:
oldR = R [37.6]
R += (0.2 / 1.0 * 47) [+ 9.4, = 47 total]
results[3] = int(R) - int(oldR) [47-37, = 10]
9+14+14+10 = 47
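A compact C# version of the sketch (assuming System.Linq for Sum; int() is truncation here, as in the example, and the final element is computed against the exact total to guard against floating-point drift):
public static int[] Prorate(double[] basis, int prorationAmount)
{
    double basisTotal = basis.Sum();
    var result = new int[basis.Length];
    double running = 0.0;
    for (int i = 0; i < basis.Length; i++)
    {
        double old = running;
        running += basis[i] / basisTotal * prorationAmount;
        // On the last item, use the exact total in case of accumulated FP error.
        double current = (i == basis.Length - 1) ? prorationAmount : running;
        result[i] = (int)current - (int)old;
    }
    return result;
}
// Prorate(new[] { 0.2, 0.3, 0.3, 0.2 }, 47) => { 9, 14, 14, 10 }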
TL;DR: the algorithm with the best accuracy of those tested (+20%), at the cost of being 70% slower.
I evaluated the algorithms presented in the accepted answer here, as well as the answer to a Python question of a similar nature.
Distribute 1 - based on Amber's algorithm
Distribute 2 - based on John Machin's algorithm
Distribute 3 - see below
Distribute 4 - optimized version of Distribute 3 (e.g. removed LINQ, used arrays)
Testing results (10,000 iterations)
Algorithm | Avg Abs Diff (x lowest) | Time (x lowest)
------------------------------------------------------------------
Distribute 1 | 0.5282 (1.1992) | 00:00:00.0906921 (1.0000)
Distribute 2 | 0.4526 (1.0275) | 00:00:00.0963136 (1.0620)
Distribute 3 | 0.4405 (1.0000) | 00:00:01.1689239 (12.8889)
Distribute 4 | 0.4405 (1.0000) | 00:00:00.1548484 (1.7074)
Method 3 (and its optimized form, Distribute 4) has 19.9% better accuracy, for 70.7% slower execution time, as expected.
Distribute 3
Makes a best effort to be as accurate as possible when distributing the amount.
1. Distribute the weights as normal.
2. Increment the weights with the highest error until the actual distributed amount equals the expected amount.
Sacrifices speed for accuracy by making more than one pass through the loop.
public static IEnumerable<int> Distribute3(IEnumerable<double> weights, int amount)
{
    var totalWeight = weights.Sum();
    var query = from w in weights
                let fraction = amount * (w / totalWeight)
                let integral = (int)Math.Floor(fraction)
                select Tuple.Create(integral, fraction);
    var result = query.ToList();
    var added = result.Sum(x => x.Item1);
    while (added < amount)
    {
        var maxError = result.Max(x => x.Item2 - x.Item1);
        var index = result.FindIndex(x => (x.Item2 - x.Item1) == maxError);
        result[index] = Tuple.Create(result[index].Item1 + 1, result[index].Item2);
        added += 1;
    }
    return result.Select(x => x.Item1);
}
Distribute 4
public static IEnumerable<int> Distribute4(IEnumerable<double> weights, int amount)
{
    var totalWeight = weights.Sum();
    var length = weights.Count();
    var actual = new double[length];
    var error = new double[length];
    var rounded = new int[length];
    var added = 0;
    var i = 0;
    foreach (var w in weights)
    {
        actual[i] = amount * (w / totalWeight);
        rounded[i] = (int)Math.Floor(actual[i]);
        error[i] = actual[i] - rounded[i];
        added += rounded[i];
        i += 1;
    }
    while (added < amount)
    {
        var maxError = 0.0;
        var maxErrorIndex = -1;
        for (var e = 0; e < length; ++e)
        {
            if (error[e] > maxError)
            {
                maxError = error[e];
                maxErrorIndex = e;
            }
        }
        rounded[maxErrorIndex] += 1;
        error[maxErrorIndex] -= 1;
        added += 1;
    }
    return rounded;
}
Test Harness
// Supporting declarations (needed to compile; only Iterations = 10000 is stated
// above, the other constant values here are assumed). Requires using System,
// System.Linq, System.Diagnostics, and System.Collections.Generic.
delegate IEnumerable<int> DistributeDelgate(IEnumerable<double> weights, int amount);
const int Iterations = 10000;
const int MinimumWeights = 2, MaximumWeights = 100;
const double MinimumWeight = 0.01, MaximumWeight = 100.0;
const int MinimumAmount = 1, MaximumAmount = 100000;

static void Main(string[] args)
{
    Random r = new Random();
    Stopwatch[] time = new[] { new Stopwatch(), new Stopwatch(), new Stopwatch(), new Stopwatch() };
    double[][] results = new[] { new double[Iterations], new double[Iterations], new double[Iterations], new double[Iterations] };
    for (var i = 0; i < Iterations; ++i)
    {
        double[] weights = new double[r.Next(MinimumWeights, MaximumWeights)];
        for (var w = 0; w < weights.Length; ++w)
        {
            weights[w] = (r.NextDouble() * (MaximumWeight - MinimumWeight)) + MinimumWeight;
        }
        var amount = r.Next(MinimumAmount, MaximumAmount);
        var totalWeight = weights.Sum();
        var expected = weights.Select(w => (w / totalWeight) * amount).ToArray();

        Action<int, DistributeDelgate> runTest = (resultIndex, func) =>
        {
            time[resultIndex].Start();
            var result = func(weights, amount).ToArray();
            time[resultIndex].Stop();

            var total = result.Sum();
            if (total != amount)
                throw new Exception("Invalid total");

            var diff = expected.Zip(result, (e, a) => Math.Abs(e - a)).Sum() / amount;
            results[resultIndex][i] = diff;
        };

        runTest(0, Distribute1);
        runTest(1, Distribute2);
        runTest(2, Distribute3);
        runTest(3, Distribute4);
    }
}
The problem you have is to define what an "acceptable" rounding policy is, or in other words, what it is you are trying to minimize. Consider first this situation: you have only 2 identical items in your list, and are trying to allocate 3 units. Ideally, you would want to allocate the same amount to each item (1.5), but that is clearly not going to happen. The "best" you could do is likely to allocate 1 and 2, or 2 and 1. So
there might be multiple solutions to each allocation
identical items may not receive an identical allocation
Then, I chose 1 and 2 over 0 and 3 because I assume that what you want is to minimize the difference between the perfect allocation, and the integer allocation. This might not be what you consider "a good allocation", and this is a question you need to think about: what would make an allocation better than another one?
One possible value function could be to minimize the "total error", i.e. the sum of the absolute values of the differences between your allocation and the "perfect", unconstrained allocation.
It sounds to me like something inspired by Branch and Bound could work, but it's non-trivial.
Assuming that Dav's solution always produces an allocation that satisfies the constraint (which I'll trust is the case), I assume it is not guaranteed to give you the "best" solution, with "best" defined by whatever distance/fit metric you end up adopting. My reason for this is that it is a greedy algorithm, which in integer programming problems can lead to solutions that are really far off the optimal one. But if you can live with a "somewhat correct" allocation, then I say, go for it! Doing it "optimally" doesn't sound trivial.
Best of luck!
OK. I'm pretty certain that the original algorithm (as written) and the code posted (as written) don't quite answer the mail for the test case outlined by @Mathias.
My intended use of this algorithm is a slightly more specific application. Rather than calculating the % using (@amt / @SumAmt) as shown in the original question, I have a fixed $ amount that needs to be split or spread across multiple items based on a % split defined for each of those items. The split % sums to 100%; however, straight multiplication often results in decimals that (when forced to round to whole $) don't add up to the total amount being split apart. This is the core of the problem.
I'm fairly certain that the original answer from @Dav doesn't work in cases where (as @Mathias described) the rounded values are equal across multiple slices. This problem with the original algorithm and code can be summed up with one test case:
Take $100 and split it 3 ways using 33.333333% as your percentage.
Using the code posted by @jtw (assuming this is an accurate implementation of the original algorithm) yields the incorrect answer of allocating $33 to each item (resulting in an overall sum of $99), so it fails the test.
I think a more accurate algorithm might be:
1. Have a running total which starts at 0.
2. For each item in the group:
   a. Calculate the un-rounded allocation amount as ( [Amount to be Split] * [% to Split] ).
   b. Calculate the cumulative Remainder as [Remainder] + ( [UnRounded Amount] - [Rounded Amount] ).
   c. If Round( [Remainder], 0 ) >= 1 OR the current item is the LAST ITEM in the list, then set the item's allocation = [Rounded Amount] + Round( [Remainder], 0 );
      otherwise set the item's allocation = [Rounded Amount].
   d. Repeat for the next item.
Implemented in T-SQL, it looks like this:
-- Start of Code --
Drop Table #SplitList
Create Table #SplitList ( idno int, pctsplit decimal(5, 4), amt int, roundedAmt int )

-- Test Case #1
--Insert Into #SplitList Values (1, 0.3333, 100, 0)
--Insert Into #SplitList Values (2, 0.3333, 100, 0)
--Insert Into #SplitList Values (3, 0.3333, 100, 0)

-- Test Case #2
--Insert Into #SplitList Values (1, 0.20, 57, 0)
--Insert Into #SplitList Values (2, 0.20, 57, 0)
--Insert Into #SplitList Values (3, 0.20, 57, 0)
--Insert Into #SplitList Values (4, 0.20, 57, 0)
--Insert Into #SplitList Values (5, 0.20, 57, 0)

-- Test Case #3
--Insert Into #SplitList Values (1, 0.43, 10, 0)
--Insert Into #SplitList Values (2, 0.22, 10, 0)
--Insert Into #SplitList Values (3, 0.11, 10, 0)
--Insert Into #SplitList Values (4, 0.24, 10, 0)

-- Test Case #4
Insert Into #SplitList Values (1, 0.50, 75, 0)
Insert Into #SplitList Values (2, 0.50, 75, 0)

Declare @R Float
Declare @Results Float
Declare @unroundedAmt Float
Declare @idno Int
Declare @roundedAmt Int
Declare @amt Float
Declare @pctsplit Float
Declare @rowCnt int

Select @R = 0
Select @rowCnt = 0

-- Define the cursor
Declare SplitList Cursor For
    Select idno, pctsplit, amt, roundedAmt From #SplitList Order By amt Desc

-- Open the cursor
Open SplitList

-- Assign the values of the first record
Fetch Next From SplitList Into @idno, @pctsplit, @amt, @roundedAmt

-- Loop through the records
While @@FETCH_STATUS = 0
Begin
    -- Get derived amounts from cursor
    Select @unroundedAmt = ( @amt * @pctsplit )
    Select @roundedAmt = Round( @unroundedAmt, 0 )

    -- Remainder
    Select @R = @R + @unroundedAmt - @roundedAmt
    Select @rowCnt = @rowCnt + 1

    -- Magic happens! (aka secret sauce)
    If ( Round(@R, 0) >= 1 ) Or ( @@CURSOR_ROWS = @rowCnt )
    Begin
        Select @Results = @roundedAmt + Round( @R, 0 )
        Select @R = @R - Round( @R, 0 )
    End
    Else
    Begin
        Select @Results = @roundedAmt
    End

    If Round(@Results, 0) <> 0
    Begin
        Update #SplitList Set roundedAmt = @Results Where idno = @idno
    End

    -- Assign the values of the next record
    Fetch Next From SplitList Into @idno, @pctsplit, @amt, @roundedAmt
End

-- Close the cursor
Close SplitList
Deallocate SplitList

-- Now do the check
Select * From #SplitList

Select Sum(roundedAmt), Max(amt),
    Case When Max(amt) <> Sum(roundedAmt) Then 'ERROR' Else 'OK' End As Test
From #SplitList
-- End of Code --
Which yields a final result set for the test case of:
idno pctsplit amt roundedAmt
1 0.3333 100 33
2 0.3333 100 34
3 0.3333 100 33
As near as I can tell (and I've got several test cases in the code), this handles all of these situations pretty gracefully.
This is an apportionment problem, for which there are many known methods. All have certain pathologies: the Alabama paradox, the population paradox, or a failure of the quota rule. (Balinski and Young proved that no method can avoid all three.) You'll probably want one that follows the quota rule and avoids the Alabama paradox; the population paradox isn't as much of a concern, since there's not much difference in the number of days per month between different years.
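For illustration, a sketch of one such method, the largest-remainder (Hamilton) method, which satisfies the quota rule but can exhibit the Alabama paradox (assuming System.Linq; names are mine):
public static int[] LargestRemainder(double[] weights, int total)
{
    double weightSum = weights.Sum();
    var quotas = weights.Select(w => w / weightSum * total).ToArray();
    var result = quotas.Select(q => (int)Math.Floor(q)).ToArray();

    // Hand the leftover units to the items with the largest fractional parts.
    int leftover = total - result.Sum();
    var byFraction = Enumerable.Range(0, weights.Length)
                               .OrderByDescending(i => quotas[i] - Math.Floor(quotas[i]));
    foreach (var i in byFraction.Take(leftover))
        result[i]++;

    return result;
}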
I think proportional distribution is the answer:
http://www.sangakoo.com/en/unit/proportional-distributions-direct-and-inverse