Generating Combinations - C#

Recently I have been reading about lotto wheeling and combination generating. I thought I'd give it a whirl and looked about for example code. I managed to cobble together a number wheel based on some VB code, but I've introduced an interesting bug while porting it.
http://www.xtremevbtalk.com/showthread.php?t=168296
It basically allows you to identify any combination by index: you feed it N numbers, K picks, and an index, and it returns that combination in lexicographic order.
It works well at low values, but as the number of balls (N) rises I get extra numbers appearing. For example: 40 balls, 2 picks, combination no. 780 returns 40 and 41! The more picks and numbers I add, the higher this goes; it seems to happen at the end of a run, when the preceding number is due to cycle.
I found the method for generating the number of possible combinations on the VB forum hard to follow, so I found a simpler one:
http://www.dreamincode.net/code/snippet2334.htm
Then I discovered that using doubles causes a loss of resolution. Using long works, but then I can't use higher values of N because the multiplication goes out of range for a long! I then tried ulong and decimal; neither could go much past N = 26-28.
So I reverted to the version on the VB site.
http://www.xtremevbtalk.com/showthread.php?s=6548354125cb4f312fc555dd0864853e&t=129902
The code uses a method to avoid hitting the 96-bit ceiling and claims to be able to calculate as high as N = 98, K = 49.
For some reason I cannot get this to behave; it spits out some very strange numbers.
After giving up for a while I decided to re-read the suggested wiki article. While most of it was over my head, I was able to discover that certain ways of calculating a binomial coefficient are inaccurate. That wouldn't be appropriate for a system where you are essentially dialing up (wheeling) a game. After a bit of searching and reading I came across this:
http://dmitrybrant.com/2008/04/29/binomial-coefficients-stirling-numbers-csharp
Turns out this is exactly the information I was looking for! The first method is accurate and plenty fast for anything I'm doing. Many thanks to psYchotic for going to the trouble of joining just to post here!

There are exactly 780 combinations of 2 numbers drawn from a set of 40. If your combination generator uses a zero-based index, any index >= the maximum number of combinations is invalid.
You can use the binomial coefficient to determine the number of combinations that can be formed.
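For illustration, here is a minimal C# sketch (my code, not necessarily the linked article's) of the multiplicative binomial coefficient plus an unranking routine of the kind the question describes. Every division is exact at each step, and BigInteger avoids the long/ulong overflow seen around N = 26-28:

using System.Numerics;

static BigInteger Binomial(int n, int k)
{
    if (k < 0 || k > n) return BigInteger.Zero;
    if (k > n - k) k = n - k;                  // C(n, k) == C(n, n - k)
    BigInteger result = BigInteger.One;
    for (int i = 1; i <= k; i++)
        result = result * (n - k + i) / i;     // exact integer division at every step
    return result;
}

// Returns the rank-th (zero-based) K-combination of {1..n} in lexicographic order.
static int[] Unrank(BigInteger rank, int n, int k)
{
    var combo = new int[k];
    int next = 1;
    for (int i = 0; i < k; i++)
    {
        while (true)
        {
            // Number of combinations that continue from here if we pick `next` now:
            BigInteger below = Binomial(n - next, k - i - 1);
            if (rank < below) break;
            rank -= below;                     // skip every combination starting with `next`
            next++;
        }
        combo[i] = next++;
    }
    return combo;
}

// Binomial(40, 2) == 780, so the valid ranks run 0..779:
// Unrank(0, 40, 2)   -> { 1, 2 }
// Unrank(779, 40, 2) -> { 39, 40 }   (rank 780 is out of range, hence the 40/41 bug)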

Related

How does double hashing work in the case of the .NET Dictionary?

The other day I was reading that article on CodeProject, and I had a hard time understanding a few points about the implementation of the .NET Dictionary (considering the implementation here, without all the optimizations in .NET Core):
Note: If you add more items than the maximum number in the table (i.e. 7199369), the resize method will manually search for the next prime number that is larger than twice the old size.
Note: The reason that the sizes are being doubled while resizing the
array is to make the inner-hash table operations to have asymptotic
complexity. The prime numbers are being used to support
double-hashing.
So I tried to remember my old CS classes from a decade ago with my good friend Wikipedia:
Open Addressing
Separate Chaining
Double Hashing
But I still don't really see how this relates to double hashing (which is a collision resolution technique for open-addressed hash tables), except for the fact that the Resize() method doubles the number of entries, based on the smallest prime number larger than twice the current/old size. And, to be honest, I don't really see the benefits of "doubling" the size or of "asymptotic complexity" (I guess the article meant O(n) when the underlying entries array is full and subject to resize).
First, if you double the size with or without using a prime, isn't it really the same?
Second, it seems to me that the .NET hash table uses a separate chaining technique when it comes to collision resolution.
I guess I must have missed a few things, and I would like someone to shed some light on those two points.
I got my answer on Reddit, so I am going to try to summarize it here:
Collision Resolution Technique
First off, it turns out that the collision resolution uses the separate chaining technique, not open addressing, and therefore there is no double hashing strategy.
The code goes as follows:
private struct Entry
{
    public int hashCode;    // Lower 31 bits of hash code, -1 if unused
    public int next;        // Index of next entry, -1 if last
    public TKey key;        // Key of entry
    public TValue value;    // Value of entry
}
It's just that instead of each bucket having its own dedicated storage (a list or whatnot) for the entries sharing the same hash code/index, everything is stored in the same entries array.
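To make that concrete, here is a hedged, stripped-down sketch (not the actual BCL source) of a map that keeps every chain inside one shared entries array; resizing is omitted for brevity:

using System.Collections.Generic;

// buckets[b] holds the index of the first entry for bucket b;
// Entry.next links the rest of the chain; -1 terminates it.
class FlatChainedMap<TKey, TValue>
{
    struct Entry
    {
        public int hashCode;   // lower 31 bits of the key's hash code
        public int next;       // index of next entry in the chain, -1 if last
        public TKey key;
        public TValue value;
    }

    private readonly int[] buckets;
    private readonly Entry[] entries;
    private int count;

    public FlatChainedMap(int capacity)
    {
        buckets = new int[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = -1;
        entries = new Entry[capacity];   // resizing omitted in this sketch
    }

    public void Add(TKey key, TValue value)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF;
        int bucket = hash % buckets.Length;
        // The new entry goes into the shared array and becomes the chain's head.
        entries[count] = new Entry { hashCode = hash, next = buckets[bucket], key = key, value = value };
        buckets[bucket] = count++;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF;
        // Walking the chain: this is separate chaining, just flattened
        // into one array instead of a linked list per bucket.
        for (int i = buckets[hash % buckets.Length]; i >= 0; i = entries[i].next)
        {
            if (entries[i].hashCode == hash && EqualityComparer<TKey>.Default.Equals(entries[i].key, key))
            {
                value = entries[i].value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}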
Prime Number
About the prime number, the answer lies here: https://cs.stackexchange.com/a/64191/42745 — it's all about common factors:
Therefore, to minimize collisions, it is important to reduce the number of common factors between m and the elements of K. How can this be achieved? By choosing m to be a number that has very few factors: a prime number.
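A tiny illustration of that point (my own numbers, not from the linked answer): keys that share a factor with the table size m collapse into a few buckets, while a prime m spreads them out.

// Count how many keys land in each bucket for a table of size m.
static int[] BucketCounts(int[] keys, int m)
{
    var counts = new int[m];
    foreach (int key in keys) counts[key % m]++;
    return counts;
}

// Twenty keys 0, 6, 12, ..., 114 (all multiples of 6):
// BucketCounts(keys, 12) fills only buckets 0 and 6 (gcd(6, 12) = 6 > 1).
// BucketCounts(keys, 11) spreads the same keys across all 11 buckets.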
Doubling the underlying entries array size
Doubling helps avoid calling too many resize operations (i.e., copies) by growing the array by a large enough number of slots each time.
See that answer: https://stackoverflow.com/a/2369504/4636721
Hash-tables could not claim "amortized constant time insertion" if,
for instance, the resizing was by a constant increment. In that case
the cost of resizing (which grows with the size of the hash-table)
would make the cost of one insertion linear in the total number of
elements to insert. Because resizing becomes more and more expensive
with the size of the table, it has to happen "less and less often" to
keep the amortized cost of insertion constant.
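To put rough numbers on that, here is a back-of-the-envelope demo (my own, not from the linked answer) counting entry copies when inserting n items:

// Copies performed when the capacity doubles on overflow.
static long CopiesWhenDoubling(int n)
{
    long copies = 0;
    int capacity = 1;
    for (int size = 0; size < n; size++)
        if (size == capacity) { copies += capacity; capacity *= 2; }  // copy everything on resize
    return copies;   // < 2n total, i.e. O(1) amortized per insert
}

// Copies performed when the capacity grows by a fixed step instead.
static long CopiesWithFixedStep(int n, int step)
{
    long copies = 0;
    int capacity = step;
    for (int size = 0; size < n; size++)
        if (size == capacity) { copies += capacity; capacity += step; }
    return copies;   // ~ n^2 / (2 * step), i.e. O(n) amortized per insert
}

// For n = 1,000,000:
//   CopiesWhenDoubling(1000000)       ->     1,048,575 (roughly 1 copy per item)
//   CopiesWithFixedStep(1000000, 100) -> 4,999,500,000 (roughly 5,000 copies per item)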

Genetic algorithms: how to find the optimal size of the population

How do I find the optimal size of the population? In my task, each gene is a value of type int lying in a given range.
For example:
The chromosome consists of 2 genes.
The first gene may contain an int value in the range from 5 to 15.
The second gene may contain an int value in the range from 15 to 25.
The question: how do I find the size of the initial population?
Usually the optimal size is found iteratively, through trial and error. You can write a simple algorithm to optimize population size: start, for example, with a population size of 100 and iteratively increase it by, say, 50. For each step you run the GA and calculate some measure that assesses the population size; you can use one of these: maximum fitness, average fitness, or time until the convergence criterion is met. To increase accuracy you should repeat each step at least a few times, then calculate the average for each step and draw a chart from which you can choose the optimal population size. If that's not precise enough, you can refine around the peak by doing the same thing near that population size.
Depending on your problem the chart will look different. If it's just a positive-slope curve, then you will have to choose a reasonable population size on your own. With too small a population your GA will most likely lose diversity and perhaps fall into some local optimum. When it's too big, your GA becomes a simple random-search algorithm.
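A minimal sketch of that sweep (runGa here is a hypothetical stand-in for your GA, not code from any real library):

using System;

// runGa: given a population size, returns the measure you chose
// (max fitness, average fitness, or generations to convergence).
static double AverageScore(Func<int, double> runGa, int popSize, int repeats)
{
    double total = 0;
    for (int r = 0; r < repeats; r++)
        total += runGa(popSize);     // repeat to smooth out GA randomness
    return total / repeats;
}

static void SweepPopulationSizes(Func<int, double> runGa)
{
    for (int popSize = 100; popSize <= 1000; popSize += 50)
        Console.WriteLine($"{popSize}: {AverageScore(runGa, popSize, 5):F3}");
    // Plot these points and pick the size where the curve flattens out;
    // then repeat with a finer step around that region if needed.
}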
By the way, I hope this example is far from your real problem, because genetic algorithms are not the best choice for such small chromosomes.

'Beautify' number by rounding erroneous digits appropriately

I want to have my cake and eat it too. I want to beautify (round) numbers to the largest extent possible without compromising accuracy for other calculations. I'm using doubles in C# (with some string-conversion manipulation too).
Here's the issue. I understand the inherent limitations in double number representation (so please don't explain that). HOWEVER, I want to round the number in some way to appear aesthetically pleasing to the end user (I am making a calculator). The problem is rounding by X significant digits works in one case, but not in the other, whilst rounding by decimal place works in the other, but not the first case.
Observe:
CASE A: Math.Sin(Math.PI) = 0.000000000000000122460635382238
CASE B: 0.000000000000001/3 = 0.000000000000000333333333333333
For the first case, I want to round by DECIMAL PLACES. That would give me the nice neat zero I'm looking for. Rounding by significant digits would mean I keep the erroneous digits too.
However for the second case, I want to round by SIGNIFICANT DIGITS, as I would lose tons of accuracy if I rounded merely by decimal places.
Is there a general way I can cater to both types of calculation?
I don't think it's feasible to do that to the result itself, and precision has nothing to do with it.
Consider this input: (1+3)/2^3. You can "beautify" it by showing the result as sin(30) or cos(60) or 1/2 and a whole lot of other interpretations. Choosing the wrong "beautification" can mislead your user, making them think their function has something to do with sin(x).
If your calculator keeps all the initial input as variables, you could postpone all the operations until you need the result, and then simplify the result until it matches your needs. You'll also need to consider using rational numbers; e, pi, and other irrational numbers may not be as easy to deal with.
The best solution to this is to keep every bit you can get during calculations, and leave the display format up to the end user. The user should have some idea how many significant digits make sense in their situation, given both the nature of the calculations and the use of the result.
Default to a reasonable number of significant digits for the floating-point format you are using internally - about 12 if you are using double. If the user changes the format, immediately redisplay in the new format.
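As a minimal sketch of that advice (my example): keep the double untouched and round only when converting to text, with the significant-digit count as a user-adjustable setting.

using System;

// Format for display only; the stored double keeps every bit.
static string Display(double value, int sigDigits)
{
    return value.ToString("G" + sigDigits);
}

// double a = Math.Sin(Math.PI);   // stored in full, roughly 1.2246E-16
// Display(a, 12)       -> "1.22464679915E-16"
// Display(a, 4)        -> "1.225E-16"
// Display(1.0 / 3, 12) -> "0.333333333333"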
The best solution is to use arbitrary-precision and/or symbolic arithmetic, although these result in much more complex code and slower speed. But since performance isn't important for a calculator (at least for a button calculator, as opposed to one that evaluates whole typed-in expressions), you can use them without issue.
Anyway, there's a good trade-off, which is to use decimal floating point. You'll need to limit the input/output precision but use a higher precision for the internal representation, so that you can discard values very close to zero, like the sin case above. For better results you could detect some edge cases, such as sine/cosine of multiples of 45 degrees, and directly return the exact result.
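Here is a hedged sketch of that edge-case idea, simplified to handle only integer multiples of pi; the tolerance is an assumption you would tune to your input precision:

using System;

static double NiceSin(double x)
{
    // Is x (numerically) an integer multiple of pi?
    double multiple = x / Math.PI;
    double nearest = Math.Round(multiple);
    // The tolerance below is an assumption, not a universal constant.
    if (Math.Abs(multiple - nearest) < 1e-12)
        return 0.0;              // sin(k * pi) is exactly 0
    return Math.Sin(x);
}

// NiceSin(Math.PI)     -> 0 (instead of 1.2246...E-16)
// NiceSin(Math.PI / 2) -> 1 (unchanged; not a multiple of pi)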
Edit: just found a good solution but haven't had an opportunity to try.
Here’s something I bet you never think about, and for good reason: how are floating-point numbers rendered as text strings? This is a surprisingly tough problem, but it’s been regarded as essentially solved since about 1990.
Prior to Steele and White’s "How to print floating-point numbers accurately", implementations of printf and similar rendering functions did their best to render floating point numbers, but there was wide variation in how well they behaved. A number such as 1.3 might be rendered as 1.29999999, for instance, or if a number was put through a feedback loop of being written out and its written representation read back, each successive result could drift further and further away from the original.
...
In 2010, Florian Loitsch published a wonderful paper in PLDI, "Printing floating-point numbers quickly and accurately with integers", which represents the biggest step in this field in 20 years: he mostly figured out how to use machine integers to perform accurate rendering! Why do I say "mostly"? Because although Loitsch's "Grisu3" algorithm is very fast, it gives up on about 0.5% of numbers, in which case you have to fall back to Dragon4 or a derivative.
Here be dragons: advances in problems you didn’t even know you had

C#/XNA - Multiplication faster than Division?

I saw a tweet recently that confused me (this was posted by an XNA coder, in the context of writing an XNA game):
Microoptimization tip of the day: when possible, use multiplication instead of division in high frequency areas. It's a few cycles faster.
I was quite surprised, because I always thought compilers were pretty smart (for example, using bit-shifting), and I recently read a post by Shawn Hargreaves saying much the same thing. I wondered how much truth there was in this, since there are lots of calculations in my game.
I inquired, hoping for a sample; however, the original poster was unable to give one. He did, however, say this:
Not necessarily when it's something like "center = width / 2". And I've already determined "yes, it's worth it". :)
So, I'm curious...
Can anyone give an example of some code where changing a division to a multiplication yields a performance gain, and where the C# compiler wasn't able to do the same thing itself?
Most compilers can do a reasonable job of optimizing when you give them a chance. For example, if you're dividing by a constant, chances are pretty good that the compiler can/will optimize that so it's done about as quickly as anything you can reasonably substitute for it.
When, however, you have two values that aren't known ahead of time and you need to divide one by the other to get the answer, if there were any way for the compiler to do much with it, it would -- and for that matter, if there were much room to optimize it, the CPU would do it so the compiler didn't have to.
Edit: Your best bet for something like that (that's reasonably realistic) would probably be something like:
double scaleFactor = GetInput();
for (int i = 0; i < values.Length; i++)
    values[i] /= scaleFactor;
This is relatively easy to convert to something like:
scaleFactor = 1.0 / scaleFactor;
for (int i = 0; i < values.Length; i++)
    values[i] *= scaleFactor;
I can't really guarantee much one way or the other about any particular compiler doing that. It's basically a combination of strength reduction and loop hoisting. There are certainly optimizers that know how to do both, but what I've seen of the C# compiler suggests that it may not (but I never tested anything exactly like this, and the testing I did was a few versions back...).
Although the compiler can optimize out divisions and multiplications by powers of 2, other numbers can be difficult or impossible to optimize. Try optimizing a division by 17 and you'll see why. This is of course assuming the compiler doesn't know that you are dividing by 17 ahead of time (i.e., it is a run-time variable, not a constant).
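To illustrate what the compiler can do when the divisor is a compile-time constant (a hedged sketch; the JIT's actual constants may differ): division by 17 turns into a multiply by a precomputed "magic number" plus a shift.

// 17 * 0xF0F0F0F1 == 2^36 + 1, so for any 32-bit unsigned x,
// (x * 0xF0F0F0F1) >> 36 equals x / 17 exactly.
static uint DivideBy17(uint x)
{
    return (uint)((x * 0xF0F0F0F1UL) >> 36);   // 64-bit product, then shift
}

// DivideBy17(170)           -> 10
// DivideBy17(uint.MaxValue) -> 252645135
// The magic constant depends on the divisor's value, which is why the
// compiler can only do this when the divisor is known at compile time.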
Bit late but never mind.
The answer to your question is yes.
Have a look at my article here, http://www.codeproject.com/KB/cs/UniqueStringList2.aspx, which uses information based on the article mentioned in the first comment to your question.
I have a QuickDivideInfo struct which stores the magic number and the shift for a given divisor, thus allowing division and modulo to be calculated using faster multiplication. I pre-computed (and tested!) QuickDivideInfos for a list of golden prime numbers. For x64 at least, the .Divide method on QuickDivideInfo is inlined and is 3x quicker than using the divide operator (on an i5); it works for all numerators except int.MinValue and cannot overflow, since the multiplication is stored in 64 bits before shifting. (I've not tried on x86, but if it doesn't inline for some reason then the neatness of the Divide method would be lost and you would have to manually inline it.)
So the above will work in all scenarios (except int.MinValue) if you can precalculate. If you trust the code that generates the magic number/shift, then you can deal with any divisor at runtime.
Other well-known small divisors with a very limited range of numerators could be written inline and may well be faster if they don't need an intermediate long.
Division by a power of two: I would expect the compiler to deal with this (as in your width / 2 example) since it is constant. If it doesn't, then changing it to width >> 1 should be fine (assuming width is non-negative).
To give some numbers, this PDF on the Pentium lists instruction timings, and they aren't good:
http://cs.smith.edu/dftwiki/index.php/CSC231_Pentium_Instructions_and_Flags
IMUL: 10 or 11 cycles
FMUL: 3+1 cycles
IDIV: 46 cycles (32-bit operand)
FDIV: 39 cycles
We are talking about BIG differences.
while (start <= end)
{
    int mid = (start + end) / 2;
    // Note: mid * mid overflows int once mid > 46340, which happens
    // when A is near int.MaxValue.
    if (mid * mid == A)
        return mid;
    if (mid * mid < A)
    {
        start = mid + 1;
        ans = mid;
    }
    else
        end = mid - 1;
}
If I do it this way, the outcome is TIME LIMIT EXCEEDED for the square root of 2147483647.
But if I do it the following way, then it seems that the compiler responds faster for division than for multiplication:
while (start <= end)
{
    int mid = (start + end) / 2;
    if (mid == A / mid)
        return mid;
    if (mid < A / mid)
    {
        start = mid + 1;
        ans = mid;
    }
    else
        end = mid - 1;
}

C# Algorithm to assign matches between teams in a sport

Possible Duplicate: Help Me Figure Out A Random Scheduling Algorithm using Python and PostgreSQL
Let's say you have a division with 9 teams, and you want them to play 16 games each. Usually you would want to have 8 games (Home), and 8 games (Visitor). Is there a known algorithm to go in and assign the matches, randomly?
Note: it can sometimes not work out evenly, so you can have uneven numbers.
Any help is appreciated.
See these permutation algorithms
Does this one work for you: Fisher–Yates shuffle?
There's a nice easy way to generate a round robin here. For the second half of the season, you can repeat the round robin with home and away swapped.
If you have an odd number of teams, you just use a dummy team that gives its opponent a bye in a particular round, which results in an extra round. You can distribute that extra round among the other rounds if you'd rather give double-headers than byes.
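For illustration, here is a hedged C# sketch of the circle method (my code, not from the linked page):

using System.Collections.Generic;

// Circle method: fix one team, rotate the rest each round. With an odd
// team count a dummy (-1) is added, and drawing the dummy means a bye.
static List<List<(int home, int away)>> RoundRobin(int teams)
{
    var ids = new List<int>();
    for (int t = 0; t < teams; t++) ids.Add(t);
    if (ids.Count % 2 == 1) ids.Add(-1);              // dummy team = bye

    int n = ids.Count;
    var rounds = new List<List<(int home, int away)>>();
    for (int r = 0; r < n - 1; r++)
    {
        var round = new List<(int home, int away)>();
        for (int i = 0; i < n / 2; i++)
        {
            int a = ids[i], b = ids[n - 1 - i];
            if (a != -1 && b != -1)
                round.Add(r % 2 == 0 ? (a, b) : (b, a)); // alternate home/away
        }
        rounds.Add(round);
        ids.Insert(1, ids[n - 1]);                    // rotate everyone except ids[0]
        ids.RemoveAt(n);
    }
    return rounds;
}

// For 9 teams this yields 9 rounds with one bye per team (8 games each);
// playing the schedule twice with home and away swapped gives every team
// 16 games, 8 at home and 8 away.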
I think you can use a maximum-matching-in-a-bipartite-graph algorithm for this (see, e.g., here), which runs in polynomial time.
We represent your problem by assigning each team, T, 8 vertices (Th1, ..., Th8) in the "home" subset of vertices and 8 vertices (Ta1, ..., Ta8) in the "away" subset of the vertices.
We now look for a maximum matching between the "home" and "away" subsets such that each edge (H, A) in the matching satisfies the property that H is in the "home" subset, A is in the "away" subset, and H and A belong to different teams.
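As a sketch of how that matching could be computed (my illustration; the linked algorithm may be a faster one, such as Hopcroft–Karp), here is Kuhn's augmenting-path algorithm for maximum bipartite matching:

using System.Collections.Generic;

// adj[u] lists the "away" vertices compatible with "home" vertex u --
// here, the away slots belonging to a different team than u's.
// matchRight[v] receives the home vertex matched to away vertex v, or -1.
static int MaxMatching(List<int>[] adj, int rightCount, int[] matchRight)
{
    bool TryAugment(int u, bool[] visited)
    {
        foreach (int v in adj[u])
        {
            if (visited[v]) continue;
            visited[v] = true;
            // v is free, or v's current partner can be re-matched elsewhere.
            if (matchRight[v] == -1 || TryAugment(matchRight[v], visited))
            {
                matchRight[v] = u;
                return true;
            }
        }
        return false;
    }

    for (int v = 0; v < rightCount; v++) matchRight[v] = -1;
    int matched = 0;
    for (int u = 0; u < adj.Length; u++)
        if (TryAugment(u, new bool[rightCount]))
            matched++;
    return matched;
}

// With 9 teams and 8 home/8 away slots per team, a perfect matching of
// all 72 home slots to 72 away slots yields one full assignment of games.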
