I am looking for an algorithm to compute the 'distance' between two strings of the same length. The allowed operations are:
replace a char
swap two chars, whatever their positions are
Hamming distance fits the first operation, but I am struggling to find a way to compute the minimum number of swaps for the second one. It looks like the minimum number of swaps required to move from one permutation to another, but I cannot get any further. Is there a known result for this?
NB: I am implementing this in C#, but a generic algorithm is fine.
I want to generate random numbers within a range (1 - 100000), but instead of purely random I want the results to follow a kind of distribution. What I mean is that in general I want the numbers "clustered" around the minimum value of the range (1).
I've read about Box–Muller transform and normal distributions but I'm not quite sure how to use them to achieve the number generator.
How can I achieve such an algorithm using C#?
There are many ways of doing this with a uniformly distributed PRNG. Here are a few I know of:
Combine several uniform random variables to obtain the desired distribution.
I am not a math guy, but there surely are equations for this. This kind of solution usually has the best properties from both the randomness and the statistical point of view. For more info see the famous:
Understanding “randomness”.
However, the combinations are known for only a limited number of distributions.
Apply a nonlinear function to a uniform random variable
This is the simplest to implement. You take floating-point randoms in the <0..1> range, apply your nonlinear function to them (it bends the distribution towards the shape you want while the result stays in the <0..1> range), and rescale the result into your integer range, for example (in C++):
floor( pow( random(),5 ) * 100000 )
The problem is that this is just blind fitting of the distribution, so you usually need to tweak the constants a bit. It is a good idea to render histogram and randomness graphs to see the quality of the result directly, like in here:
How to seed to generate random numbers?
You can also make the fitting less blind by using Bezier curves, like in here:
Random but most likely 1 float
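As a rough illustration of the nonlinear-function approach, here is a minimal C++ sketch. The names skew_towards_min and next_skewed are made up for this example, the exponent 5 is just the constant from the one-liner above, and the +1 shift is an assumption added so the results land in 1..100000 as the question asks.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Map a uniform value u in [0, 1) to an integer in [1, max_value].
// A larger exponent pushes more of the results towards 1.
// (Hypothetical helper, not part of any library.)
int skew_towards_min(double u, double exponent, int max_value) {
    const int value = 1 + static_cast<int>(std::floor(std::pow(u, exponent) * max_value));
    return std::min(value, max_value);  // guard in case u is exactly 1.0
}

// Draw one skewed value using a caller-supplied generator.
int next_skewed(std::mt19937& rng, double exponent, int max_value) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return skew_towards_min(uniform(rng), exponent, max_value);
}
```

With exponent 5, half of all draws fall at or below 3126, so the values cluster strongly near the minimum. Lowering the exponent flattens the distribution.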
Distribution-following pseudo random generator
There are two approaches I know of for this. The simpler one is:
create a big enough array of size n
fill it with values following the distribution
So simply loop through all the values you want to output, compute how many copies of each belong in an array of size n (from your distribution), and add that many copies to the array. Beware that the filled size of the array might be slightly less than n due to rounding. If n is too small you will miss some of the less frequent numbers, so the probability of the least probable number multiplied by n should be at least 1. After filling, set n to the real array size (the number of values actually stored in it).
shuffle the array
now use the array as a linear list of random numbers
So instead of random() you just pick a number from the array and move to the next one. Once you reach the n-th value, shuffle the array and start from the first one again.
This solution has very good statistical properties (it follows the distribution exactly), but its randomness properties are not good and it requires an array and occasional shuffling. For more info see:
How to efficiently generate a set of unique random numbers with a predefined distribution?
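The fill-and-shuffle steps could look roughly like this in C++. This is only a sketch: build_pool and PoolSampler are hypothetical names, the probabilities are assumed to be given as a vector indexed by value, and rounding down is an assumption chosen to match the "slightly less than n" remark.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Build the pool: value x appears floor(p[x] * n) times.
// The result may be slightly shorter than n because of rounding down.
std::vector<int> build_pool(const std::vector<double>& p, std::size_t n) {
    std::vector<int> pool;
    for (std::size_t x = 0; x < p.size(); ++x) {
        const std::size_t copies =
            static_cast<std::size_t>(std::floor(p[x] * static_cast<double>(n)));
        pool.insert(pool.end(), copies, static_cast<int>(x));
    }
    return pool;
}

// Hands out the pool values one by one and reshuffles after each full pass.
class PoolSampler {
public:
    PoolSampler(std::vector<int> pool, unsigned seed)
        : pool_(std::move(pool)), rng_(seed), next_(0) {
        std::shuffle(pool_.begin(), pool_.end(), rng_);
    }

    // Precondition: the pool is not empty.
    int next() {
        if (next_ == pool_.size()) {  // pool exhausted: shuffle and restart
            std::shuffle(pool_.begin(), pool_.end(), rng_);
            next_ = 0;
        }
        return pool_[next_++];
    }

private:
    std::vector<int> pool_;
    std::mt19937 rng_;
    std::size_t next_;
};
```

Every full pass over the pool reproduces the distribution exactly, which is where both the good statistics and the weaker randomness come from.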
The other variation avoids the array and the shuffling. It goes like this:
get a random value in range <0..1>
apply the inverse cumulative distribution function to convert it to the target range
As you can see, it is like the #2 "Apply non linear function" approach, but instead of "some" nonlinear function you use the distribution directly. So if p(x) is the probability of x in range <0..1>, where 1 means 100%, then we need a function that accumulates all the probabilities up to x (the cumulative distribution function). For integers:
f(x) = p(0)+p(1)+...+p(x)
Now we need inverse function g() to it so:
y = f(x)
x = g(y)
Now if my memory serves me well then the generation should look like this:
y = random(); // <0..1>
x = g(y); // probability -> value
Many distributions have a known g() function, but for those that do not (or when we are too lazy to derive it) you can use binary search on f(x). I am too lazy to code it, so here is the slower linear search version:
for (x=0;x<max;x++) if (f(x)>=y) break;
So when put all together (and using only p(x)) I got this (C++):
y=random();                  // uniform distribution pseudo random value in range <0..1>
for (f=0.0,x=0;x<max;x++)    // loop x through all values
    {
    f+=p(x);                 // f(x) cumulative distribution function
    if (f>=y) break;
    }
// here x is your pseudo random value following p(x) distribution
This kind of solution usually has very good statistical and randomness properties, and it does not require the distribution to be a continuous function (it can even be just an array of values).
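For completeness, the binary search variant mentioned above could be sketched like this. It assumes the probabilities are available as an array p[0..max-1], precomputes the cumulative sums once, and then uses std::lower_bound; build_cdf and sample_from_cdf are names invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// cdf[x] = p(0) + p(1) + ... + p(x)
std::vector<double> build_cdf(const std::vector<double>& p) {
    std::vector<double> cdf(p.size());
    double sum = 0.0;
    for (std::size_t x = 0; x < p.size(); ++x) {
        sum += p[x];
        cdf[x] = sum;
    }
    return cdf;
}

// Returns the smallest x with cdf[x] >= y, found by binary search.
// Precondition: cdf is not empty and y is a uniform value in <0..1>.
std::size_t sample_from_cdf(const std::vector<double>& cdf, double y) {
    const auto it = std::lower_bound(cdf.begin(), cdf.end(), y);
    if (it == cdf.end()) {
        return cdf.size() - 1;  // rounding left the last sum slightly below y
    }
    return static_cast<std::size_t>(it - cdf.begin());
}
```

Building the cumulative array costs O(max) once, after which each sample needs only O(log max) comparisons instead of a linear scan.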
I have a function that takes in X as an argument and randomly picks an element from a 2D array.
The 2D array has thousands of elements, each of them has a different requirement on X, stored in arr[Y][1].
For example,
arr[0] should only be chosen when X is larger than 4. (arr[0][1] = 4+)
Then arr[33] should only be chosen when X is between 37 and 59. (arr[33][1] = 37!59)
And arr[490] should only be chosen when X is less than 79. (arr[490][1] = 79-)
And there are many more, most with a different X requirement.
What is the best way to tackle this problem that takes the least space, and least repetition of elements?
The worst way would be storing possible choices for each X in a 2D array. But that would cause a lot of repetition, costing too much memory.
Then, I have thought about using three arrays, separating X+ requirements, X- and X range. But it still sounds too basic to me, is there a better way?
One option here would be what's called "accept/reject sampling": you pick a random index i and check if the condition on X is satisfied for that index. If so, you return arr[i]. If not, you pick another index at random and repeat until you find something.
Performance will be good so long as most conditions are satisfied for most values of i. If this isn't the case -- if there are a lot of values of X for which only a tiny number of conditions are satisfied -- then it might make sense to try and precompute something that lets you find (or narrow down) the indices that are allowable for a given X.
How to do this depends on what you allow as a condition on each index. For instance, if every condition is given by an interval like in the examples you give, you could sort the list twice, first by left endpoints and then by right endpoints. Then determining the valid indices for a particular value of X comes down to intersecting the intervals whose left endpoint is less than or equal to X with those whose right endpoint is greater than or equal to X.
Of course if you allow conditions other than "X is in this interval" then you'd need a different algorithm.
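A minimal C++ sketch of the accept/reject idea, assuming every condition can be written as an inclusive interval [lo, hi] with sentinel values for the open-ended cases. The Requirement struct, the pick_index function and the attempt cap are all assumptions made for this illustration, and whether "4+" includes 4 itself is not stated in the question.

```cpp
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Sentinels for requirements like "4+" (no upper bound) or "79-" (no lower bound).
const int kNoLower = std::numeric_limits<int>::min();
const int kNoUpper = std::numeric_limits<int>::max();

// Inclusive interval [lo, hi] that X must fall into (assumed representation).
struct Requirement {
    int lo;
    int hi;
};

// Picks random indices until one accepts x.
// Returns -1 if nothing is accepted within max_attempts tries.
int pick_index(const std::vector<Requirement>& reqs, int x,
               std::mt19937& rng, int max_attempts) {
    if (reqs.empty()) {
        return -1;
    }
    std::uniform_int_distribution<std::size_t> pick(0, reqs.size() - 1);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        const std::size_t i = pick(rng);
        if (reqs[i].lo <= x && x <= reqs[i].hi) {
            return static_cast<int>(i);  // condition satisfied: accept
        }
    }
    return -1;  // gave up: few or no elements allow this x
}
```

The attempt cap keeps the loop from spinning forever when no element accepts the given X. It needs no extra memory beyond the requirements themselves.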
While I believe that re-sampling will be the optimal solution in your case (a few dozen resamplings is a very cheap price to pay), here is an algorithm I would never implement in practice (since it uses very complicated data structures and is less efficient than resampling), but which has provable bounds. It requires O(n log n) preprocessing time, O(n log n) memory and O(log n) time for each query, where n is the number of elements you can potentially sample.
You store all ends of all ranges in one array (call it ends). E.g. in your case you have the array [-infty, 4, 37, 59, 79, +infty] (it may require some tuning, like adding +1 to the right ends of ranges; not important now). The idea is that for any X we only have to determine between which two ends it is located. E.g. X=62 lies between 59 and 79 (I'll call such a pair of neighbouring ends an interval). Then for each interval you store the set of all ranges that cover it. For your input X you just find the interval (using binary search) and then output a random range from the set corresponding to this interval.
How do you compute the corresponding set of ranges for each interval? We go from left to right through the ends array. Let's assume we have computed the set for the current interval and move on to the next one. There is some end between these two intervals. If it's the left end of some range, we add that range to the new set (since we enter this range). If it's a right end, we remove the range. How do we do this in O(log n) time instead of O(n)? Immutable balanced tree sets can do this (essentially, they create new trees, sharing most nodes with the old one, instead of modifying the old one).
How do you return a uniformly random range from a set? You should augment the tree sets: each node should know how many nodes its subtree contains. First you sample an integer in range [0; size(tree)). Then you look at your root node and its children. For example, assume that you sampled the integer 15, and your left child's subtree has size 10, while the right child's has size 20. Then you go to the right child (since 15 >= 10) and process it with the integer 5 (since 15 - 10 = 5). You will eventually visit a leaf, corresponding to a single range. Return this range.
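The descent described in the last paragraph can be sketched in C++ as follows. This covers only the sampling step, not the immutable balanced tree itself. It assumes a tree that keeps the ranges in its leaves and stores in each node the number of leaves below it. The Node layout and the function names are invented for the example.

```cpp
#include <random>

// Leaf-oriented tree node (assumed layout): internal nodes have two
// children, leaves have none and carry the id of one range.
struct Node {
    int size;           // number of leaves in this subtree
    int range_id;       // meaningful for leaves only
    const Node* left;
    const Node* right;
};

// Returns the range stored in the k-th leaf (0-based) below node.
// Precondition: 0 <= k < node->size.
int select_leaf(const Node* node, int k) {
    while (node->left != nullptr) {
        if (k < node->left->size) {
            node = node->left;        // the k-th leaf is in the left subtree
        } else {
            k -= node->left->size;    // skip the left subtree's leaves
            node = node->right;
        }
    }
    return node->range_id;
}

// Picks one range uniformly at random from a non-empty tree.
int sample_range(const Node* root, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, root->size - 1);
    return select_leaf(root, pick(rng));
}
```

Each step of the loop moves one level down, so on a balanced tree a sample takes O(log n) time.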
Sorry if it's hard to understand. Like I said, it's a non-trivial approach that you would only need for guaranteed upper bounds in the worst case (the other approaches discussed before require linear time in the worst case; resampling may run indefinitely if no element satisfies the restrictions). It also requires some careful handling (e.g. when some ranges have coinciding endpoints).
What would be the optimal solution to the following problem :
Given a list of values (e.g. numbers ranging from 0 to 14), how would you sort them using only swap operations (e.g. swapping the 0-th and the 9-th element in the list)? Your goal is to find the solution with the fewest swaps.
Thank you in advance
Assuming the values are 0 to n-1 for an array of size n, here is an algorithm with O(n) time complexity, and it should be the optimal algorithm for minimizing swaps. Every swap will place at least one value (sometimes both) in its proper location.
#include <utility>   // std::swap

// the values of A[] range from 0 to n-1
void sort(int A[], int n)
{
    for(int i = 0; i < n; i++)
        while(A[i] != i)
            std::swap(A[i], A[A[i]]);
}
For a more generic solution and assuming that only the swaps used to sort the original array are counted, generate an array of indices to the array to be sorted, sort the array of indices according to the array to be sorted (using any sort algorithm), then use the above algorithm to sort the original array and the array of indices at the same time. Using C++ to describe this, and using a lambda compare for this example:
#include <algorithm>   // std::sort
#include <utility>     // std::swap

void sort(int A[], int n)
{
    // generate indices I[]
    int *I = new int[n];
    for(int i = 0; i < n; i++)
        I[i] = i;
    // sort I according to A
    std::sort(I, I+n,
        [&A](int i, int j)
        {return A[i] < A[j];});
    // sort A and I according to I using swaps
    for(int i = 0; i < n; i++){
        while(I[i] != i){
            // puts the correct value into A[I[i]]; index with I, not with A's values
            std::swap(A[I[i]], A[I[I[i]]]);   // only this swap is counted
            std::swap(I[i], I[I[i]]);
        }
    }
    delete[] I;
}
For languages without the equivalent of a lambda compare, a custom sort function can be used. Sorting is accomplished by undoing the "cycles" in the array with O(n) time complexity. Every permutation of an array can be considered as a series of cycles. Value is really the order for the element, but in this case the ordering and value are the same:
index 0 1 2 3 4 5 6 7
value 6 3 1 2 4 0 7 5
The cycles are the "paths" to follow a chain of values, start with index 0, which has a value of 6, then go to index 6 which has a value of 7 and repeat the process until the cycle completes back at index 0. Repeat for the rest of the array. For this example, the cycles are:
0->6 6->7 7->5 5->0
1->3 3->2 2->1
4->4
Following the algorithm shown above the swaps are:
swap(a[0],a[6]) // puts 6 into place
swap(a[0],a[7]) // puts 7 into place
swap(a[0],a[5]) // puts 0 and 5 into place
swap(a[1],a[3]) // puts 3 into place
swap(a[1],a[2]) // puts 1 and 2 into place
// done
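To check the swap count against the cycle structure, here is a small C++ sketch for the 0 to n-1 case. The helper names are made up for this example. A cycle of length k needs k-1 swaps, so the loop uses n minus the number of cycles (fixed points counted as cycles of length 1), which is the minimum possible for a permutation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts a permutation of 0..n-1 in place with the swap loop shown earlier
// and returns how many swaps were performed.
int sort_counting_swaps(std::vector<int>& a) {
    int swaps = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        while (a[i] != static_cast<int>(i)) {
            std::swap(a[i], a[a[i]]);  // moves the value a[i] to its own index
            ++swaps;
        }
    }
    return swaps;
}

// Counts the cycles of a permutation of 0..n-1, fixed points included.
int count_cycles(const std::vector<int>& a) {
    std::vector<bool> seen(a.size(), false);
    int cycles = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (seen[i]) {
            continue;
        }
        ++cycles;
        // follow the chain of values until it returns to a visited index
        for (std::size_t j = i; !seen[j]; j = static_cast<std::size_t>(a[j])) {
            seen[j] = true;
        }
    }
    return cycles;
}
```

For the example array 6 3 1 2 4 0 7 5 there are 3 cycles, so 8 - 3 = 5 swaps, matching the five swaps listed above.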
Link to a more practical example of sorting multiple arrays according to one of them. In this example, the cycles are done using a series of moves instead of swaps:
Sorting two arrays based on one with standard library (copy steps avoided)
What you're searching for is a sorting algorithm.
https://brilliant.org/wiki/sorting-algorithms/
A good one is "QuickSort" combined with a simpler sorting algorithm like "BubbleSort"
Ted-Ed also have a good video on the topic:
https://www.youtube.com/watch?v=WaNLJf8xzC4
Probably the best way to find the answer to this question is to open your favorite search engine and put the title to your question there. You will find many results, including:
Sorting algorithm - Wikipedia (which includes a section on Popular sorting algorithms)
10.4. Sorting Algorithms - Introductory Programming in C#
Read through these and find the algorithms that only use the swapping of elements to do the sorting (since that is your requirement). You can also read about the performance of the algorithms as well (since that was another part of the requirement).
Note that some will perform faster than others depending on how large and how sorted the array is.
Another way to figure this out is to ask yourself, "What do the professionals do?". Which will likely lead you to reading the documentation for the Array.Sort Method, which is the built-in mechanism that most of us use if we need to quickly sort an array. Here you will find the following information:
Remarks
This method uses the introspective sort (introsort) algorithm as follows:
If the partition size is fewer than 16 elements, it uses an insertion sort algorithm.
If the number of partitions exceeds 2 * LogN, where N is the range of the input array, it uses a Heapsort algorithm.
Otherwise, it uses a Quicksort algorithm.
So now we see that, for small partitions (like your example with 15 elements), the pros use insertion sort.
I need ideas on how to make this happen. I just started to learn programming. I need to make a program which generates random spots (marked by * chars) in a matrix filled with . chars. The matrix size is entered in the console (int n and int m). I managed to do this part. But the hard part is that I have to find the number of spots (a * and every * near it combine into one big spot) and the size of the biggest spot. How could I do this?
Thank you very much...
Here's what the matrix looks like in this scenario: the number of spots should be 6 and the biggest spot size is 21
You need to generate three random numbers. The first is the first index into the matrix, the second is the second index into the matrix, and the third is the number of times to repeat the first two steps. Use the System.Random class and its Next() method, called on your object (yourobject.Next()). The constructor of this class has two overloads. One takes no arguments and generates a seed depending on the time (be careful with the time! Don't initialize Random objects inside a loop). The other takes a seed given by you. The instance method Next() is also overloaded. Without arguments it simply returns a non-negative random number. With one argument you specify the exclusive upper bound, so the generated numbers satisfy
0 <= n < yournumber. In your case, to generate an index you pass the length of the corresponding matrix dimension (n or m) as the parameter and use the result directly as the index. Good luck!
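For the counting part of the question, one common technique is a flood fill over the matrix. The C++ sketch below assumes that "near" means directly above, below, left or right (4-connectivity); if diagonal neighbours should count too, the offset tables would need four more entries. SpotStats and count_spots are names invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct SpotStats {
    int count;    // number of separate spots
    int largest;  // size of the biggest spot
};

// Takes the grid by value because visited '*' cells are overwritten with '.'.
SpotStats count_spots(std::vector<std::string> grid) {
    SpotStats stats{0, 0};
    const int dr[4] = {-1, 1, 0, 0};
    const int dc[4] = {0, 0, -1, 1};
    const int rows = static_cast<int>(grid.size());
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < static_cast<int>(grid[r].size()); ++c) {
            if (grid[r][c] != '*') {
                continue;
            }
            // new spot found: flood fill it with an explicit stack
            int size = 0;
            std::vector<std::pair<int, int> > stack;
            stack.push_back(std::make_pair(r, c));
            grid[r][c] = '.';
            while (!stack.empty()) {
                const std::pair<int, int> cell = stack.back();
                stack.pop_back();
                ++size;
                for (int k = 0; k < 4; ++k) {
                    const int nr = cell.first + dr[k];
                    const int nc = cell.second + dc[k];
                    if (nr >= 0 && nr < rows && nc >= 0 &&
                        nc < static_cast<int>(grid[nr].size()) &&
                        grid[nr][nc] == '*') {
                        grid[nr][nc] = '.';  // mark as visited
                        stack.push_back(std::make_pair(nr, nc));
                    }
                }
            }
            ++stats.count;
            if (size > stats.largest) {
                stats.largest = size;
            }
        }
    }
    return stats;
}
```

Each cell is pushed at most once, so the whole scan is linear in the number of cells.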
Possible Dup: Help Me Figure Out A Random Scheduling Algorithm using Python and PostgreSQL
Let's say you have a division with 9 teams, and you want them to play 16 games each. Usually you would want to have 8 games (Home), and 8 games (Visitor). Is there a known algorithm to go in and assign the matches, randomly?
Note: it may not always work out exactly, so uneven numbers are acceptable.
Any help is appreciated.
See these permutation algorithms
Does this one work for you : Fisher–Yates shuffle
There's a nice easy way to generate a round robin here. In the second round, you can repeat the round-robin and swap home and away.
If you have an odd number of teams, you just use a dummy team that gives its opponent a bye in a particular round, which results in an extra round. You can distribute that extra round among the other rounds if you'd rather give double-headers than byes.
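A possible C++ sketch of this idea, using the classic "circle method" for the round robin and a dummy slot (-1) for the bye. The circle method is my choice of technique here, not necessarily what the linked page uses, and the Game struct and double_round_robin name are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct Game {
    int home;
    int away;
};

// Double round robin: every pair of teams meets twice, once at each venue.
std::vector<Game> double_round_robin(int team_count) {
    std::vector<int> slots(static_cast<std::size_t>(team_count));
    std::iota(slots.begin(), slots.end(), 0);
    if (team_count % 2 != 0) {
        slots.push_back(-1);  // dummy team: its opponent gets a bye
    }
    const std::size_t m = slots.size();
    std::vector<Game> first_pass;
    for (std::size_t round = 0; round + 1 < m; ++round) {
        for (std::size_t i = 0; i < m / 2; ++i) {
            const int a = slots[i];
            const int b = slots[m - 1 - i];
            if (a != -1 && b != -1) {
                first_pass.push_back(Game{a, b});
            }
        }
        // keep slots[0] fixed and rotate all the other slots by one position
        std::rotate(slots.begin() + 1, slots.end() - 1, slots.end());
    }
    std::vector<Game> schedule = first_pass;
    for (const Game& g : first_pass) {
        schedule.push_back(Game{g.away, g.home});  // second pass: venues swapped
    }
    return schedule;
}
```

For 9 teams this gives 72 games in which every team plays 16 times, 8 at home and 8 away. To randomize the result, the team labels could be shuffled before calling the function.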
I think you can use the maximal matching in a bipartite graph algorithm for this (see, e.g., here), which runs in polynomial time.
We represent your problem by assigning each team, T, 8 vertices (Th1, ..., Th8) in the "home" subset of vertices and 8 vertices (Ta1, ..., Ta8) in the "away" subset of the vertices.
We now look for a maximal matching between the "home" and "away" subsets such that each edge (H, A) in the matching satisfies the property that H is in the "home" subset, "A" is in the "away" subset, and H and A belong to different teams.