I'm looking for a function that maps two (positive) integers into a single new integer, which can be reversed to the original combination.
The question has been asked before, for example Mapping two integers to one, in a unique and deterministic way. The difference here is that one of the integers has a small upper bound, for example 50. The other integer is unbounded.
What I'm trying to solve: I have between 1 and 50 arrays containing numbers from 1 to max int (but mostly < 10,000,000).
array1 {1, 2, 3, 4, 5, 6, 7, ..., N}
array2 {1, 2, 3, 4, 5, 6, 7, ..., N}
...
array50 {1, 2, 3, 4, 5, 6, 7, ..., N}
Now I want to combine these arrays into a single new array, where each number can be traced back to its original array. So I thought about creating pairs: one number pointing to the array and one to the actual number in that array.
If I use the standard functions, like the Cantor pairing function, the numbers get huge very fast, and I'm trying to keep them as small as possible.
Preferably, most of the results would fit in an Int32 instead of a long. I think this should be possible because one of the numbers in my pair is bounded by 50, but I can't figure out how.
If you have two numbers
a from 0 to a_max - 1
b from 0 to 2^32/a_max - 1
you can combine them as
x = a + a_max*b;
and the combined number x will fit into a 32 bit unsigned integer.
To decode them, use
a = x%a_max;
b = x/a_max;
It is not possible to find a more efficient packing, because every possible output value is used. (There are no 'gaps' in the output.) If the bounds for b are too narrow, a larger output type must be used.
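A minimal C# sketch of this packing, assuming a_max = 50 as in the question and an unsigned 32-bit result. `Pack`, `Unpack` and `AMax` are illustrative names, not from any library.

```csharp
using System;

// Illustrative packing: a in 0..AMax-1 (the array number), b bounded so the result fits in a uint.
const uint AMax = 50;

uint Pack(uint a, uint b)
{
    if (a >= AMax) throw new ArgumentOutOfRangeException(nameof(a));
    // a + AMax * b <= uint.MaxValue  <=>  b <= (uint.MaxValue - a) / AMax
    if (b > (uint.MaxValue - a) / AMax) throw new ArgumentOutOfRangeException(nameof(b));
    return a + AMax * b;
}

// Decoding is just remainder and quotient by AMax.
(uint a, uint b) Unpack(uint x) => (x % AMax, x / AMax);

uint packed = Pack(7, 9_999_999);
Console.WriteLine(packed);          // 499999957
Console.WriteLine(Unpack(packed));  // (7, 9999999)
```

With the question's bounds (at most 50 arrays, values below 10,000,000) the largest result is 49 + 50 * 9,999,999 = 499,999,999, which also fits in a signed Int32.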
I have 3 arrays that hold integer values: a 4-dimensional array, a 2-dimensional array and a single-dimensional array. All three have the same total number of elements. I'm going to print all the elements of these arrays to the console. Which one prints the fastest? Or do they all take the same time?
int[,,,] Q = new int[4, 4, 4, 4];
int[,] W = new int[16,16];
int[] X = new int[256];
Unless I'm missing something, there are two main ways you could be iterating over the multi-dimensional arrays.
The first is:
int[,] W = new int[16,16];
for(int i = 0; i < 16; i++)
{
    for(int j = 0; j < 16; j++)
        Console.WriteLine(W[i, j]); // rectangular arrays are indexed as W[i, j], not W[i][j]
}
This method is slower than iterating over the single-dimensional array; the only difference is that, for every 16 elements, you need to start a new iteration of the outer loop and re-initialize the inner loop.
The second is:
for(int i = 0; i < 256; i++)
{
    Console.WriteLine(W[i / 16, i % 16]);
}
This method is slower because on every iteration you need to calculate both (i / 16) and (i % 16).
Ignoring the iteration factor, each multi-dimensional element access also costs more: for a rectangular array such as int[,] the runtime has to bounds-check every index and combine them into a single offset, and for a jagged array (int[][]) it additionally has to follow a reference to the inner array.
As far as I know about boolean functions*, given two pairs of integers that have the same size in memory (as all values of type int do in C#), adding one pair takes exactly the same time as adding the other, even if one pair holds bigger numbers (the same number of clock ticks, though I wouldn't expect everyone who stumbles upon this question to be familiar with that). That being the case, the time to calculate the address of an array element does not depend on how big its index is.
So, to summarize (unless I'm missing something or I'm way rustier than I think): there is one factor that is guaranteed to lengthen the time it takes to iterate over multi-dimensional arrays (the extra work on every element access); another factor that is guaranteed to do the same, although you can choose between two forms of it (nested loops, or additional calculations on every iteration of the loop); and no factor that slows down the single-dimensional array approach (there is no "tax" for a larger index).
CONCLUSIONS:
That makes it two factors working for a single-dimensional array, and none for a multi-dimensional one.
Thus, I would assume the single-dimensional array would be faster.
That being said, you're using C#, so you're probably not really looking for such an insignificant edge, or you'd use a low-level language. And if you are, you should probably either switch to a low-level language or really consider whether you are doing whatever you're trying to do in the best way possible (the only case I can think of where this could make an actual difference is loading a whole database of a million-plus records into your code, and that's really bad practice).
However, if you're just starting out in C# then you're probably just overthinking it.
Whichever it is, this was a fun hypothetical, so thanks for asking it!
*by boolean functions, I mean functions at the binary level, not C# functions returning a bool value
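A rough C# sketch for checking this on your own machine. It sums the elements instead of printing them, because Console.WriteLine would dominate any timing. Stopwatch numbers like these are noisy, and a dedicated harness such as BenchmarkDotNet gives more trustworthy results; the array names follow the question, while the repetition count is an arbitrary assumption.

```csharp
using System;
using System.Diagnostics;

// The same 256 values stored three ways, as in the question.
int[,,,] q = new int[4, 4, 4, 4];
int[,] w = new int[16, 16];
int[] x = new int[256];
for (int n = 0; n < 256; n++)
{
    x[n] = n;
    w[n / 16, n % 16] = n;
    q[n / 64, n / 16 % 4, n / 4 % 4, n % 4] = n;
}

long SumFlat() { long s = 0; for (int i = 0; i < 256; i++) s += x[i]; return s; }

long Sum2D()
{
    long s = 0;
    for (int i = 0; i < 16; i++)
        for (int j = 0; j < 16; j++)
            s += w[i, j];
    return s;
}

long Sum4D()
{
    long s = 0;
    for (int a = 0; a < 4; a++)
        for (int b = 0; b < 4; b++)
            for (int c = 0; c < 4; c++)
                for (int d = 0; d < 4; d++)
                    s += q[a, b, c, d];
    return s;
}

void Time(string name, Func<long> sum, int reps = 100_000)
{
    sum(); // warm up so JIT compilation is not included in the timing
    var sw = Stopwatch.StartNew();
    long total = 0;
    for (int r = 0; r < reps; r++) total += sum();
    sw.Stop();
    Console.WriteLine($"{name}: {sw.ElapsedMilliseconds} ms (checksum {total})");
}

Time("int[]", SumFlat);
Time("int[,]", Sum2D);
Time("int[,,,]", Sum4D);
```

Expect the timings to vary between runs and machines; the matching checksums only confirm that all three layouts hold the same values.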
I want to generate random numbers within a range (1 - 100000), but instead of purely random I want the results to follow a kind of distribution. What I mean is that, in general, I want the numbers "clustered" around the minimum value of the range (1).
I've read about the Box–Muller transform and normal distributions, but I'm not quite sure how to use them to build the number generator.
How can I implement such an algorithm in C#?
There are a lot of ways of doing this (using a uniform-distribution PRNG); here are a few I know of:
1. Combine multiple uniform random variables to obtain the desired distribution.
I am not a math guy, but there sure are equations for this. This kind of solution usually has the best properties from the randomness and statistical points of view. For more info see the famous:
Understanding “randomness”.
But there is only a limited number of distributions for which we know the combinations.
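As one concrete combination (chosen here for illustration, not named in the answer): the minimum of k independent uniform values in [0, 1) satisfies P(min <= t) = 1 - (1 - t)^k, so it clusters near 0, and after rescaling near 1. A C# sketch, with k = 5 and the fixed seed as arbitrary assumptions:

```csharp
using System;

// Minimum of k uniform values: density k * (1 - t)^(k - 1), piled up near 0.
// k = 5 is an illustrative choice; larger k clusters harder near 1.
var rng = new Random(12345);

int ClusteredNearOne(int k = 5, int max = 100_000)
{
    double m = 1.0;
    for (int i = 0; i < k; i++)
        m = Math.Min(m, rng.NextDouble()); // NextDouble is in [0, 1)
    return 1 + (int)(m * max);             // map [0, 1) onto 1..max
}

for (int i = 0; i < 10; i++)
    Console.Write(ClusteredNearOne() + " ");
Console.WriteLine();
```

The mean of the minimum of k uniforms is 1/(k + 1), so with k = 5 the results average roughly 100000/6, about 16,700.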
2. Apply a non-linear function to a uniform random variable
This is the simplest to implement. You simply take floating-point randoms in the <0..1> range, apply your non-linear function to them (one that changes the distribution toward the shape you want while keeping the result in the <0..1> range), and rescale the result into your integer range, for example (in C++):
floor( pow( random(),5 ) * 100000 )
The problem is that this is just blind fitting of the distribution, so you usually need to tweak the constants a bit. It's a good idea to render a histogram and randomness graphs to see the quality of the result directly, like here:
How to seed to generate random numbers?
You can also avoid such blind fitting by using Bézier curves, like here:
Random but most likely 1 float
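The pow() idea from approach 2, sketched in C#. Math.Pow and Random.NextDouble are standard .NET calls; the exponent 5 comes from the C++ example, the seed is arbitrary, and the crude text histogram stands in for the graphs mentioned above.

```csharp
using System;

// u^5 for uniform u in [0, 1) stays in [0, 1) but is denser near 0.
// The exponent is a tuning knob, not a derived constant.
var rng = new Random(42);

int PowSkewed(double exponent = 5.0, int max = 100_000)
{
    double u = rng.NextDouble();              // uniform in [0, 1)
    double skewed = Math.Pow(u, exponent);    // still in [0, 1), denser near 0
    return 1 + (int)Math.Floor(skewed * max); // shift into 1..max
}

// Crude text histogram: 10 buckets of 10,000 values each.
var buckets = new int[10];
for (int i = 0; i < 100_000; i++)
    buckets[(PowSkewed() - 1) / 10_000]++;
for (int b = 0; b < buckets.Length; b++)
    Console.WriteLine($"{b * 10_000 + 1,6}-{(b + 1) * 10_000,6}: {buckets[b]}");
```

Changing the exponent and re-running the histogram is the kind of constant tweaking the answer describes.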
3. Distribution-following pseudo-random generator
There are two approaches I know of for this. The simpler one is:
Create a big enough array of size n.
Fill it with all the values, following the distribution.
So simply loop through all the values you want to output, compute how many of each will be in the n-sized array (from your distribution), and add that many copies of the number to the array. Beware: the filled size of the array might be slightly less than n due to rounding. If n is too small, you will be missing some of the less frequent numbers, so the product of n and the probability of the least probable number should be at least 1. After filling, change n to the real array size (the number of values actually filled in).
Shuffle the array.
Now use the array as a linear list of random numbers.
So instead of calling random(), you just pick a number from the array and move to the next one. Once you reach the n-th value, shuffle the array and start from the first one again.
This solution has very good statistical properties (it follows the distribution exactly), but its randomness properties are not good, and it requires an array and occasional shuffling. For more info see:
How to efficiently generate a set of unique random numbers with a predefined distribution?
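A C# sketch of this table approach. The weights p(x) proportional to 0.8^(x-1) over 1..20 and the table size n = 1000 are hypothetical, chosen so that n times the smallest probability stays above 1; the shuffle is a standard Fisher-Yates shuffle.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical distribution: p(x) proportional to 0.8^(x-1) for x = 1..20.
const int MaxValue = 20;
const int N = 1000;
var rng = new Random(7);

double Weight(int x) => Math.Pow(0.8, x - 1);
double totalWeight = 0;
for (int x = 1; x <= MaxValue; x++) totalWeight += Weight(x);

// Fill: each x gets floor(n * p(x)) copies, so the real size can be slightly below n.
var table = new List<int>(N);
for (int x = 1; x <= MaxValue; x++)
{
    int copies = (int)(N * Weight(x) / totalWeight);
    for (int c = 0; c < copies; c++) table.Add(x);
}

void Shuffle(List<int> list) // Fisher-Yates
{
    for (int i = list.Count - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);
        (list[i], list[j]) = (list[j], list[i]);
    }
}

Shuffle(table);
int position = 0;

// Walk the table linearly; reshuffle after the last entry has been used.
int NextFromTable()
{
    if (position == table.Count) { Shuffle(table); position = 0; }
    return table[position++];
}

for (int i = 0; i < 15; i++) Console.Write(NextFromTable() + " ");
Console.WriteLine();
```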
The other variation avoids the array and the shuffling. It goes like this:
Get a random value in the range <0..1>.
Apply the inverse cumulative distribution function to convert it to the target range.
As you can see, it's like approach #2 (applying a non-linear function), but instead of "some" non-linear function you use the distribution directly. So if p(x) is the probability of x, in the range <0..1> where 1 means 100%, then we need a function that accumulates all the probabilities up to x (the cumulative distribution function). For integers:
f(x) = p(0)+p(1)+...+p(x)
Now we need its inverse function g(), so that:
y = f(x)
x = g(y)
Now, if my memory serves me well, the generation should look like this:
y = random(); // <0..1>
x = g(y); // probability -> value
Many distributions have a known g() function, but for those that do not (or if we are too lazy to derive it), you can use a binary search on f(x). I'm too lazy to code it, so here is the slower linear-search version:
for (x=0;x<max;x++) if (f(x)>=y) break;
Putting it all together (and using only p(x)), I got this (C++):
// random(), max and p(x) are assumed to be defined elsewhere
double y = random(); // uniform distribution pseudo random value in range <0..1>
double f = 0.0;      // running value of f(x)
int x;
for (x = 0; x < max; x++) // loop x through all values
{
    f += p(x); // f(x) cumulative distribution function
    if (f >= y) break;
}
// here x is your pseudo random value following p(x) distribution
This kind of solution usually has very good statistical and randomness properties, and it does not require the distribution to be a continuous function (it can even be just an array of values).
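The binary-search variant mentioned above, as a C# sketch: precompute f(x) once, then for each draw find the smallest x with f(x) >= y. The weights p(x) proportional to 1/x over 1..100000 are a hypothetical choice that clusters results near 1, as the question asks; any other non-negative weights work the same way.

```csharp
using System;

// Hypothetical weights: p(x) proportional to 1/x for x = 1..Max.
const int Max = 100_000;
var rng = new Random(2024);

// cdf[i] = f(i + 1) = p(1) + ... + p(i + 1), normalized so the last entry is 1.
var cdf = new double[Max];
double running = 0;
for (int i = 0; i < Max; i++)
{
    running += 1.0 / (i + 1);
    cdf[i] = running;
}
for (int i = 0; i < Max; i++) cdf[i] /= running;
cdf[Max - 1] = 1.0; // guard against rounding leaving the last entry just below 1

// g(y): binary search for the smallest index whose cumulative probability reaches y.
int Sample()
{
    double y = rng.NextDouble(); // uniform in [0, 1)
    int lo = 0, hi = Max - 1;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (cdf[mid] >= y) hi = mid; else lo = mid + 1;
    }
    return lo + 1; // index 0 corresponds to the value 1
}

for (int i = 0; i < 10; i++) Console.Write(Sample() + " ");
Console.WriteLine();
```

Precomputing the table costs O(max) memory once; each draw then takes O(log max) comparisons instead of the O(max) linear scan.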
In C#, I created a list array containing a list of various indexes. I'd like to display combinations of 2 different indexes, and no pair of indexes may be repeated.
I am trying to code a tennis tournament with 14 players who play in pairs. A player must never be paired with the same opponent twice.
Your problem falls under the domain of the binomial coefficient. The binomial coefficient handles problems of choosing unique combinations in groups of K with a total of N items.
I have written a class in C# to handle common functions for working with the binomial coefficient. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle and is very efficient compared to iterating over the set.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it is also faster than older iterative solutions.
Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that, when true, will create a generic list to hold the objects to be managed. If this value is false, it will not create the table. The table does not need to be created in order to use the 4 methods above. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
There are 2 different ways to interpret your problem. In tennis, tournaments are usually arranged as single elimination, where the winner of each match advances. However, some local clubs also use round robins, where each player plays every other player just once, which appears to be the problem you are looking at.
So the question is how to calculate the total number of unique matches that can be played with 14 players (N = 14), where each match involves exactly 2 players (so K = 2). The binomial coefficient calculation is as follows:
Total number of unique combinations = N! / (K! * (N - K)!). The ! character denotes a factorial: N! means N * (N-1) * (N-2) * ... * 1. When K is 2, the binomial coefficient reduces to N * (N - 1) / 2. So, plugging in 14 for N and 2 for K, we find that the total number of combinations is 14 * 13 / 2 = 91.
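The N * (N - 1) / 2 arithmetic generalizes to any K with the multiplicative formula. This is a minimal C# sketch, not the BinCoeff class's code; `Choose` is an illustrative name.

```csharp
using System;

// N choose K by the multiplicative formula. After step d the running value
// equals C(n - k + d, d), so multiplying before dividing keeps every
// intermediate result an exact integer.
long Choose(int n, int k)
{
    if (k < 0 || k > n) return 0;
    k = Math.Min(k, n - k); // C(n, k) == C(n, n - k), fewer steps
    long result = 1;
    for (int d = 1; d <= k; d++)
        result = result * (n - k + d) / d;
    return result;
}

Console.WriteLine(Choose(14, 2)); // 91
```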
The following code iterates through each unique combination:
int N = 14; // Total number of elements in the set.
int K = 2; // Total number of elements in each group.
// Create the bin coeff object required to get all
// the combos for this N choose K combination.
BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
// The KIndexes array specifies the 2 players, starting with index 0.
int[] KIndexes = new int[K];
// Loop thru all the combinations for this N choose K case.
for (int Combo = 0; Combo < NumCombos; Combo++)
{
    // Get the k-indexes for this combination.
    BC.GetKIndexes(Combo, KIndexes);
    // KIndexes[0] is the 1st player & KIndexes[1] is the 2nd player.
    // Print out the indexes for both players.
    String S = "Player1 = " + KIndexes[0].ToString() + ", " +
        "Player2 = " + KIndexes[1].ToString();
    Console.WriteLine(S);
}
You should be able to port this class over fairly easily to the language of your choice. You probably will not have to port over the generic part of the class to accomplish your goals. Depending on the number of combinations you are working with, you might need to use a bigger word size than 4-byte ints.
I should also mention that, since this is a class project, your teacher might not accept the above answer, because he might be looking for more original work. In that case, you might want to consider using loops. You should check with him before submitting a solution.
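If loops are acceptable, here is a C# sketch that needs no external class: two nested loops with j starting at i + 1 produce each unordered pair exactly once, so no two players are ever paired twice. Player numbers start at 0, matching the KIndexes convention above; the names are illustrative.

```csharp
using System;
using System.Collections.Generic;

// Every unordered pair (i, j) with i < j appears exactly once.
const int Players = 14;
var matches = new List<(int First, int Second)>();
for (int i = 0; i < Players; i++)
    for (int j = i + 1; j < Players; j++)
        matches.Add((i, j));

foreach (var (first, second) in matches)
    Console.WriteLine($"Player1 = {first}, Player2 = {second}");
Console.WriteLine($"{matches.Count} matches"); // 91 matches
```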
This question already has answers here:
Mapping two integers to one, in a unique and deterministic way
I need a way for the user to input 2 numbers (int) such that every different pair of values returns a single output (preferably an int).
Say the user enters 6, 8 and it returns k. When the user enters anything else, like 6, 7 or 9, 8 or any other input m, n except 6, 8 (even if only one input is changed), a completely different output should be produced. The output must be unique to that m, n, so I can't use something like m*n: 6 X 4 = 24, but 12 X 2 = 24 as well, so the output is not unique. I need a way where every different input gives a different output that is never repeated for any other input.
EDIT: In response to Nicolas: the input range can be anything, but it will be less than 1000 (and more than 1, of course!)
EDIT 2: In response to Rawling: I can use long (Int64), but preferably not float or double, because this output will be used in a for loop, and float and double are terrible for for loops; you can check it here
Since your two numbers are less than 1000, you can do k = (1000 * x1) + x2 to get a unique answer. The maximum value would be 999999, which is well within the range of a 32-bit int.
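A C# sketch of this encoding with its inverse, assuming both inputs are in 0..999 (the question states they stay below 1000); `Encode` and `Decode` are illustrative names.

```csharp
using System;

// k = 1000 * x1 + x2: x1 becomes the "thousands" part, x2 the last three digits.
int Encode(int x1, int x2)
{
    if (x1 is < 0 or > 999) throw new ArgumentOutOfRangeException(nameof(x1));
    if (x2 is < 0 or > 999) throw new ArgumentOutOfRangeException(nameof(x2));
    return 1000 * x1 + x2;
}

(int x1, int x2) Decode(int k) => (k / 1000, k % 1000);

int packed = Encode(6, 8);
Console.WriteLine(packed);         // 6008
Console.WriteLine(Decode(packed)); // (6, 8)
```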
You can always return a long: from two integers a and b, return 2^|INT_SIZE|*a + b (where |INT_SIZE| is the number of bits in an int, i.e. 32).
It is easy to see from the pigeonhole principle that, given two ints, one cannot return a unique int for every different input. Explanation: if you have 2 numbers, each containing n bits, then there are 2^n possibilities for each number, and thus (2^n)^2 = 2^(2n) possible pairs. So, by the pigeonhole principle, you need at least log_2((2^n)^2) = 2n bits to represent them.
EDIT: Your edit mentions that the range of your numbers is [1,1000], so the same idea can be applied: 1000*a + b will generate a unique int for each pair.
Note that, for the same reason, the resulting integer needs a range of at least 1,000,000 values (such as [1,1000000]), or you will get clashes.
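For 32-bit ints, 2^32 * a + b is a left shift by 32 plus b. A C# sketch with an inverse (`Combine` and `Split` are illustrative names); unchecked arithmetic makes the behaviour at the extreme negative values explicit, and the mapping stays one-to-one because it is exact modulo 2^64.

```csharp
using System;

// 2^32 * a + b, computed in a 64-bit long.
long Combine(int a, int b) => unchecked(((long)a << 32) + b);

(int a, int b) Split(long x)
{
    int low = unchecked((int)x);                // low 32 bits hold b
    int high = (int)(unchecked(x - low) >> 32); // remove b; the high 32 bits hold a
    return (high, low);
}

long combined = Combine(6, 8);
Console.WriteLine(combined);        // 25769803784
Console.WriteLine(Split(combined)); // (6, 8)
```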
Because I don't have 50 posts to comment, I must say it here: there are functions called pairing functions.
Pairing functions such as Cantor's pairing function (shown at the previous link) and Szudzik's pairing function allow the inputs to be unbounded and still provide a unique and deterministic output.
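Szudzik's pairing function can be sketched in C# as below, using its well-known formula (x >= y ? x*x + x + y : y*y + x). The function names are illustrative, inputs are assumed non-negative, and outputs are kept in long.

```csharp
using System;

long SzudzikPair(int x, int y)
{
    if (x < 0 || y < 0) throw new ArgumentOutOfRangeException(x < 0 ? nameof(x) : nameof(y));
    long a = x, b = y;
    return a >= b ? a * a + a + b : b * b + a;
}

(int x, int y) SzudzikUnpair(long z)
{
    long s = (long)Math.Sqrt(z);
    while (s * s > z) s--;              // correct possible rounding from double
    while ((s + 1) * (s + 1) <= z) s++;
    long r = z - s * s;                 // r < s means the y*y + x branch was used
    return r < s ? ((int)r, (int)s) : ((int)s, (int)(r - s));
}

Console.WriteLine(SzudzikPair(6, 8));                // 70
Console.WriteLine(SzudzikUnpair(SzudzikPair(6, 8))); // (6, 8)
```

For inputs below 1000 the largest result is SzudzikPair(999, 999) = 999,999, the same maximum as 1000*a + b, so it also fits easily in an int.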
Here is another similar question on Stack Overflow. (Great, I need 10 reputation to post more than two links...)
http://stackoverflow.com/questions/919612/mapping-two-integers-to-one-in-a-unique-and-deterministic-way
EDIT: I'm late.
If you didn't have a hard upper bound, you could do the following:
int Unique (int x, int y)
{
    int n = x + y;
    int t = (n%2==0) ? ((n/2) * (n+1)) : (n * ((n+1)/2));
    return t + x;
}
Mathematically speaking, this will return a unique non-negative integer for each pair of non-negative integers, with no upper bound.
Programmatically speaking, it will run into overflow problems, which can be overcome by using long instead of int for everything except the input variables.
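A sketch of that long-based version in C#, with an inverse added (the inverse is not part of the original answer). It keeps the divide-before-multiply trick from Unique(), which keeps the result inside long even for two int.MaxValue inputs; the names are illustrative.

```csharp
using System;

// n * (n + 1) / 2 without overflowing: halve whichever factor is even first.
long Triangle(long n) => n % 2 == 0 ? (n / 2) * (n + 1) : n * ((n + 1) / 2);

long CantorPair(int x, int y)
{
    if (x < 0 || y < 0) throw new ArgumentOutOfRangeException(x < 0 ? nameof(x) : nameof(y));
    long n = (long)x + y;
    return Triangle(n) + x;
}

(int x, int y) CantorUnpair(long z)
{
    // w = x + y is the largest value with Triangle(w) <= z; start from the
    // closed-form estimate and correct any floating-point rounding.
    long w = (long)((Math.Sqrt(8.0 * z + 1) - 1) / 2);
    while (Triangle(w) > z) w--;
    while (Triangle(w + 1) <= z) w++;
    long x = z - Triangle(w);
    return ((int)x, (int)(w - x));
}

Console.WriteLine(CantorPair(6, 8));               // 111
Console.WriteLine(CantorUnpair(CantorPair(6, 8))); // (6, 8)
```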
The canonical mathematical solution is to use prime powers. As every number can be decomposed uniquely into its prime factors, returning 2^n * 3^m will give you different results for every n and m.
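This can be extended to 2^n * 3^m * 5^a * 7^b * 11^c and so on; you only need to check that you do not run out of 32-bit integers. That happens quickly: 2^n alone no longer fits in a signed 32-bit int once n reaches 31, and taking the remainder after dividing by a prime does not keep the results unique (there are more input pairs than possible remainders, so some must collide), so for larger inputs you need a big-integer type instead.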
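A sketch of the prime-power idea with System.Numerics.BigInteger, since 2^n * 3^m outgrows fixed-size integers quickly. Decoding counts how many times 2 and 3 divide the result; the function names are illustrative.

```csharp
using System;
using System.Numerics;

BigInteger PrimePowerPair(int n, int m) =>
    BigInteger.Pow(2, n) * BigInteger.Pow(3, m);

(int n, int m) PrimePowerUnpair(BigInteger z)
{
    if (z.Sign <= 0) throw new ArgumentOutOfRangeException(nameof(z));
    int n = 0, m = 0;
    while (z.IsEven) { z /= 2; n++; }   // count factors of 2
    while (z % 3 == 0) { z /= 3; m++; } // count factors of 3
    if (!z.IsOne) throw new ArgumentException("not of the form 2^n * 3^m");
    return (n, m);
}

Console.WriteLine(PrimePowerPair(6, 8));         // 419904
Console.WriteLine(PrimePowerUnpair(419_904));    // (6, 8)
```

Uniqueness comes from unique prime factorization, but for inputs near 1000 the result has hundreds of decimal digits, which is why the fixed-width encodings above are usually more practical.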
I am using C# and have a list of int numbers that contains different numbers, such as {34,36,40,35,37,38,39,4,5,3}. Now I need a script to find the different ranges in the list and write them to a file. For this example it would be: (34-40) and (3-5). What is the quickest way to do it?
Thanks in advance for the help.
The easiest way would be to sort the array and then do a single sequential pass to capture the ranges. That will most likely be fast enough for your purposes.
Two techniques come to mind: histogramming and sorting. Histogramming will be good for dense number sets (where you have most of the numbers between min and max) and sorting will be good if you have sparse number sets (very few of the numbers between min and max are actually used).
For histogramming, simply walk the array and set a Boolean flag to True at the corresponding position of the histogram (all flags should default to False), then walk the histogram looking for runs of True.
For sorting, simply sort the array using the best applicable sorting technique, then walk the sorted array looking for contiguous runs.
EDIT: some examples.
Let's say you have an array with the first 1,000,000 positive integers, but all even multiples of 191 are removed (you don't know this ahead of time). Histogramming will be a better approach here.
Let's say you have an array containing powers of 2 (2, 4, 8, 16, ...) and 3 (3, 9, 27, 81, ...). For large lists, the list will be fairly sparse and sorting should be expected to do better.
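A C# sketch of the histogramming approach on the question's list. It allocates one flag per value between min and max, so it suits the dense case and would be a poor fit for the sparse powers-of-2-and-3 example; `RangesByHistogram` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<(int Start, int End)> RangesByHistogram(IReadOnlyList<int> values)
{
    var ranges = new List<(int Start, int End)>();
    if (values.Count == 0) return ranges;
    int min = values.Min(), max = values.Max();
    var seen = new bool[max - min + 1];          // one flag per value, default false
    foreach (int v in values) seen[v - min] = true;

    // Walk the histogram and emit each run of true flags as a range.
    for (int i = 0; i < seen.Length; i++)
    {
        if (!seen[i]) continue;
        int start = i;
        while (i + 1 < seen.Length && seen[i + 1]) i++;
        ranges.Add((start + min, i + min));
    }
    return ranges;
}

var input = new List<int> { 34, 36, 40, 35, 37, 38, 39, 4, 5, 3 };
foreach (var (start, end) in RangesByHistogram(input))
    Console.WriteLine($"({start}-{end})");
```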
As Mike said, first sort the list. Now, starting with the first element, remember that element, then compare it with the next one. If the next element is 1 greater than the current one, you have a contiguous series. Continue until the next number is NOT contiguous. At that point, you have a range from the first remembered value to the current value. Remember/output that range, then start again with the next value as the first element of a new series. After sorting, this walk executes in roughly 2N steps (linear time); the sort itself is typically O(N log N).
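A C# sketch of this sort-then-walk pass, using the list from the question. It treats a repeated value as part of the current run (an assumption; the question's list has no duplicates), prints a single value as (n-n), and leaves writing to a file as a commented-out File.WriteAllLines call with a hypothetical path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> RangesBySorting(IEnumerable<int> values)
{
    var sorted = values.OrderBy(n => n).ToList();
    var lines = new List<string>();
    if (sorted.Count == 0) return lines;

    int start = sorted[0], prev = sorted[0];
    foreach (int v in sorted.Skip(1))
    {
        if (v - prev > 1)             // gap: close the current range
        {
            lines.Add($"({start}-{prev})");
            start = v;
        }
        prev = v;
    }
    lines.Add($"({start}-{prev})");   // close the last range
    return lines;
}

var result = RangesBySorting(new[] { 34, 36, 40, 35, 37, 38, 39, 4, 5, 3 });
Console.WriteLine(string.Join(" and ", result)); // (3-5) and (34-40)
// File.WriteAllLines("ranges.txt", result);     // to write to a file instead (hypothetical path)
```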
I would sort them and then check for consecutive numbers. If the difference > 1 you have a new range.