Exercise Background
The exercise consists of generating a 2D map of a user-given x,y size, then placing random items from a table on each cell of the map.
Each cell is an [x, y] coordinate of an Items matrix, and I have to pick items randomly for every cell of this matrix.
My Problem
I have to select random items from a table of 4 items whose probabilities are given as cumulative probabilities, and a cell can contain more than one of those items in different combinations.
I don't really know how to go about this problem, taking into account that 2 of the items have the same probability in the table given with the homework.
This is the probability table given:
Food - 1
Weapons - 0.5
Enemy - 0.5
Trap - 0.3
My Items enumeration:
[Flags]
enum Items
{
    Food   = 1 << 0,
    Weapon = 1 << 1,
    Enemy  = 1 << 2,
    Trap   = 1 << 3
}
Again, the expected output is to pick randomly, using these probabilities, which items one cell contains. As an answer I'd just like a starting point or a way to approach this problem, please; I still want to try to do it myself, so avoid complete code solutions if you can.
I find it easier to work with integers in this type of problem, so I'll work with:
Food - 10
Weapons - 5
Enemy - 5
Trap - 3
That gives 10 + 5 + 5 + 3 = 23 possible options in total.
Most computer RNGs start from 0, so split the 23 options (as in 0..22) like this:
Food - 0..9 giving 10 options.
Weapons - 10..14 giving 5 options.
Enemy - 15..19 giving 5 options.
Trap - 20..22 giving 3 options.
Work through the possibilities in order, stopping when you reach the selected option. I will use pseudocode as my C++ is very rusty:
function pickFWET()
    pick <- randomInRange(0 to 22);
    if (pick < 10) return FOOD;
    if (pick < 15) return WEAPONS;
    if (pick < 20) return ENEMY;
    if (pick < 23) return TRAP;
    // If we reach here then there was an error.
    throwError("Wrong pick in pickFWET");
end function pickFWET
If two items have the same cumulative probability, then the probability of getting the latter item is 0. Double-check the probability table, but if it is correct, then 'Weapons' can never be picked.
In general, though: if you could somehow generate a random number between 0 and 1, the problem would be easy, right? With a few if conditions you can choose one of the options from this random number.
With a little bit of searching you can easily find how to generate a random number in whatever language you desire.
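One possible reading of the table (an assumption on my part, since the question says a cell can hold several items and models them with a [Flags] enum) is that each entry is an independent per-item probability. Under that reading, the few-if-conditions idea looks roughly like this in C#:

    // Sketch only: assumes the table lists independent per-item probabilities
    // (Food = 1.0, Weapon = 0.5, Enemy = 0.5, Trap = 0.3).
    // Items is the [Flags] enum from the question.
    static readonly Random Rng = new Random();

    static Items PickCellItems()
    {
        Items cell = 0;
        if (Rng.NextDouble() < 1.0) cell |= Items.Food;   // always present
        if (Rng.NextDouble() < 0.5) cell |= Items.Weapon;
        if (Rng.NextDouble() < 0.5) cell |= Items.Enemy;
        if (Rng.NextDouble() < 0.3) cell |= Items.Trap;
        return cell;
    }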
What would be the optimal solution to the following problem:
Given a list of values (e.g. numbers ranging from 0-14), how would you sort them using only swap operations (e.g. swapping the 0th and the 9th element in the list)? The goal is to find the solution with the fewest swaps.
Thank you in advance
Assuming the values are 0 to n-1 for an array of size n, here is an algorithm with O(n) time complexity, and it should be optimal for minimizing swaps: every swap places at least one value (sometimes both) in its proper location.
#include <utility> // std::swap

// the values of A[] range from 0 to n-1
void sort(int A[], int n)
{
    for(int i = 0; i < n; i++)
        while(A[i] != i)
            std::swap(A[i], A[A[i]]); // place the value A[i] at index A[i]
}
For a more generic solution and assuming that only the swaps used to sort the original array are counted, generate an array of indices to the array to be sorted, sort the array of indices according to the array to be sorted (using any sort algorithm), then use the above algorithm to sort the original array and the array of indices at the same time. Using C++ to describe this, and using a lambda compare for this example:
#include <algorithm> // std::sort
#include <utility>   // std::swap

void sort(int A[], int n)
{
    // generate indices I[]
    int *I = new int[n];
    for(int i = 0; i < n; i++)
        I[i] = i;
    // sort I according to A
    std::sort(I, I+n,
        [&A](int i, int j)
            {return A[i] < A[j];});
    // sort A and I according to I using swaps
    for(int i = 0; i < n; i++){
        while(I[i] != i){
            std::swap(A[I[i]], A[I[I[i]]]); // only this swap is counted
            std::swap(I[i], I[I[i]]);
        }
    }
    delete[] I;
}
For languages without the equivalent of a lambda compare, a custom sort function can be used. Sorting is accomplished by undoing the "cycles" in the array, with O(n) time complexity. Every permutation of an array can be considered as a series of cycles. The value of each element here is really its ordering, but in this case the ordering and the value are the same:
index 0 1 2 3 4 5 6 7
value 6 3 1 2 4 0 7 5
The cycles are the "paths" you get by following a chain of values: start with index 0, which has a value of 6, then go to index 6, which has a value of 7, and repeat the process until the cycle completes back at index 0. Then repeat for the rest of the array. For this example, the cycles are:
0->6 6->7 7->5 5->0
1->3 3->2 2->1
4->4
Following the algorithm shown above, the swaps are:
swap(a[0],a[6]) // puts 6 into place
swap(a[0],a[7]) // puts 7 into place
swap(a[0],a[5]) // puts 0 and 5 into place
swap(a[1],a[3]) // puts 3 into place
swap(a[1],a[2]) // puts 1 and 2 into place
// done
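Since the question itself appears to be in a C# context, here is a rough C# sketch of the same sort-the-indices-then-undo-cycles idea (the class, method name, and swap counting are mine, not part of the answer above):

    using System;

    static class MinSwapSort
    {
        static void Swap(ref int x, ref int y)
        {
            int t = x; x = y; y = t;
        }

        // Sorts A with the minimum number of element swaps (for distinct
        // values) and returns how many swaps of A were made.
        public static int SortWithMinimumSwaps(int[] A)
        {
            int n = A.Length;
            int[] I = new int[n];
            for (int i = 0; i < n; i++)
                I[i] = i;
            // sort the indices according to A
            Array.Sort(I, (i, j) => A[i].CompareTo(A[j]));
            // undo the cycles; each swap of A places at least one element
            int swaps = 0;
            for (int i = 0; i < n; i++)
            {
                while (I[i] != i)
                {
                    Swap(ref A[I[i]], ref A[I[I[i]]]); // the counted swap
                    Swap(ref I[i], ref I[I[i]]);
                    swaps++;
                }
            }
            return swaps;
        }
    }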
Here is a link to a more practical example of sorting multiple arrays according to one of them. In that example, the cycles are undone using a series of moves instead of swaps:
Sorting two arrays based on one with standard library (copy steps avoided)
What you're searching for is a sorting algorithm.
https://brilliant.org/wiki/sorting-algorithms/
A good one is "QuickSort" combined with a simpler sorting algorithm like "BubbleSort"
Ted-Ed also has a good video on the topic:
https://www.youtube.com/watch?v=WaNLJf8xzC4
Probably the best way to find the answer to this question is to open your favorite search engine and put the title to your question there. You will find many results, including:
Sorting algorithm - Wikipedia (which includes a section on Popular sorting algorithms)
10.4. Sorting Algorithms - Introductory Programming in C#
Read through these and find the algorithms that only use the swapping of elements to do the sorting (since that is your requirement). You can also read about the performance of the algorithms as well (since that was another part of the requirement).
Note that some will perform faster than others depending on how large and how sorted the array is.
Another way to figure this out is to ask yourself, "What do the professionals do?" That will likely lead you to the documentation for the Array.Sort method, which is the built-in mechanism that most of us use if we need to quickly sort an array. There you will find the following information:
Remarks
This method uses the introspective sort (introsort) algorithm as follows:
If the partition size is fewer than 16 elements, it uses an insertion sort algorithm.
If the number of partitions exceeds 2 * LogN, where N is the range of the input array, it uses a Heapsort algorithm.
Otherwise, it uses a Quicksort algorithm.
So now we see that, for small partitions (like your example with 15 elements), the pros use insertion sort.
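For reference, here is a minimal textbook C# sketch of insertion sort (a generic version, not the actual internals of Array.Sort). Note that it shifts elements rather than swapping them, so it illustrates the small-partition strategy rather than a minimum-swap sort:

    // Textbook insertion sort; fast for small or nearly-sorted arrays.
    static void InsertionSort(int[] a)
    {
        for (int i = 1; i < a.Length; i++)
        {
            int key = a[i];
            int j = i - 1;
            // shift larger elements one slot to the right
            while (j >= 0 && a[j] > key)
            {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key; // drop the key into the gap
        }
    }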
I have the following 2 strings:
String A: Manchester United
String B: Manchester Utd
Both strings mean the same thing, but contain different values.
How can I compare these strings to get a "matching score"? In this case, the first word is the same ("Manchester") and the second words contain similar letters, but not in the right places.
Is there any simple algorithm that returns the "matching score" after I supply 2 strings?
You could calculate the Levenshtein distance between the two strings and if it is smaller than some value (that you must define) you may consider them to be pretty close.
I've needed to do something like this and used Levenshtein distance.
I used it for a SQL Server UDF which is used in queries over more than a million rows (with texts of up to 6 or 7 words).
I found that the algorithm runs faster and the "similarity index" is more precise if you compare each word separately. That is, you split each input string into words and compare each word of one input string to each word of the other input string.
Remember that Levenshtein gives the difference, and you have to convert it into a "similarity index". I used something like the distance divided by the length of the longest word, with some variations; a minimal sketch follows.
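As a sketch of that conversion (the exact weighting of the original UDF isn't given here, and LevenshteinDistance stands in for any standard implementation, such as the Compute method shown later in this thread):

    // Convert a Levenshtein distance into a similarity index in [0, 1].
    // LevenshteinDistance is assumed to be any standard implementation
    // (for example, the Compute method shown later in this thread).
    static double WordSimilarity(string a, string b)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 1.0; // two empty words are identical
        return 1.0 - (double)LevenshteinDistance(a, b) / maxLen;
    }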
First rule: order and number of words
You must also consider:
whether there must be the same number of words in both inputs, or whether it can change
and whether the order must be the same in both inputs, or whether it can change.
Depending on this, the algorithm changes. For example, applying the first rule is really fast if the number of words differs. And the second rule reduces the number of comparisons, especially if there are many words in the compared texts. That's explained with examples later.
Second rule: weighting the similarity of each compared pair
I also weighted the longer words higher than the shorter words in the global similarity index. My algorithm takes the longer of the two words in each compared pair and gives a higher weight to pairs with longer words than to pairs with shorter ones, although not exactly in proportion to the pair length.
Sample comparison: same order
With this example, which uses a different number of words in each input:
compare "Manchester United" to "Manchester Utd FC"
If the same order of the words in both inputs is guaranteed, you should compare these pairs:
Manchester United
Manchester Utd    FC
(Manchester,Manchester) (Utd,United) (FC: not compared)

Manchester        United
Manchester Utd    FC
(Manchester,Manchester) (Utd: not compared) (United,FC)

           Manchester United
Manchester Utd        FC
(Manchester: not compared) (Manchester,Utd) (United,FC)
Obviously, the highest score would be for the first set of pairs.
Implementation
To compare words in the same order.
The string with the higher number of words is treated as a fixed vector, shown as A,B,C,D,E in this example, where v[0] is the word A, v[1] the word B, and so on.
For the string with the lower number of words we need to create all the possible combinations of indices that can be compared with the first set. In this case, the string with the lower number of words is represented by a,b,c.
You can use a simple loop to create all the vectors that represent the pairs to be compared, like so:
A,B,C,D,E   A,B,C,D,E   A,B,C,D,E   A,B,C,D,E   A,B,C,D,E   A,B,C,D,E
a,b,c       a,b,  c     a,b,    c   a,  b,c     a,  b,  c   a,    b,c
0 1 2       0 1 3       0 1 4       0 2 3       0 2 4       0 3 4

A,B,C,D,E   A,B,C,D,E   A,B,C,D,E   A,B,C,D,E
  a,b,c       a,b,  c     a,  b,c       a,b,c
1 2 3       1 2 4       1 3 4       2 3 4
The numbers in the sample are vectors holding the indices of the long set that each word of the short set must be compared with, i.e. v[0]=0 means compare index 0 of the short set (a) to index 0 of the long set (A), v[1]=2 means compare index 1 of the short set (b) to index 2 of the long set (C), and so on.
To calculate these vectors, start with 0,1,2 and repeatedly move to the right the last index that can still be moved.
Start by moving the last one:
0,1,2 -> 0,1,3 -> 0,1,4
When the last index can't move any further, move the one before it and reset the last to the nearest possible position (1 moves to 2, 4 resets to 3):
0,2,3 -> 0,2,4
Move the one before the last again:
0,3,4
When neither of those can move, move the first index and reset the others to the lowest possible values:
1,2,3 -> 1,2,4
And so on, through 1,3,4, until you end at 2,3,4 (the full listing is shown above).
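A minimal C# sketch of that index-vector generation (the class and method names are mine):

    using System.Collections.Generic;

    static class IndexCombinations
    {
        // Enumerates all increasing index vectors of length k over 0..n-1
        // using the "move the rightmost movable index" scheme described above.
        public static IEnumerable<int[]> Combinations(int k, int n)
        {
            int[] v = new int[k];
            for (int i = 0; i < k; i++)
                v[i] = i;                        // start with 0,1,2,...
            while (true)
            {
                yield return (int[])v.Clone();
                int p = k - 1;                   // rightmost index that can move
                while (p >= 0 && v[p] == n - k + p)
                    p--;
                if (p < 0)
                    yield break;                 // the last vector was reached
                v[p]++;
                for (int i = p + 1; i < k; i++)  // reset the ones after it
                    v[i] = v[i - 1] + 1;
            }
        }
    }

Calling Combinations(3, 5) produces exactly the ten vectors listed above, from 0,1,2 to 2,3,4.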
When you have all the possible combinations you can compare the defined pairs.
Third rule: minimum similarity to stop comparison
Stop the comparison when a minimum similarity is reached: depending on what you want to do, you may be able to set a threshold that, once reached, stops the comparison of the pairs.
If you can't set a threshold, you can at least always stop when you get a 100% similarity for each pair of words. This saves a lot of time.
On some occasions you can simply decide to stop the comparison when the similarity is at least, say, 75%. This can be used when you want to show the user all the strings similar to the one they provided.
Sample: comparison with changes in word order
If the order can change, you need to compare each word of the first set with each word of the second set, then take the highest total score over the combinations that match every word of the shorter input, in all possible orders, against different words of the longer input. For this you can populate the upper or lower triangle of an n x m matrix of word-pair similarities and then take the required elements from the matrix.
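A minimal sketch of that matrix step, filling the full n x m matrix for simplicity and reusing the WordSimilarity helper sketched earlier:

    // Similarity of every word of one input against every word of the other.
    static double[,] SimilarityMatrix(string[] wordsA, string[] wordsB)
    {
        var m = new double[wordsA.Length, wordsB.Length];
        for (int i = 0; i < wordsA.Length; i++)
            for (int j = 0; j < wordsB.Length; j++)
                m[i, j] = WordSimilarity(wordsA[i], wordsB[j]);
        return m;
    }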
Fourth rule: normalization
You must also normalize the words before comparison (a small C# sketch follows this list):
if not case-sensitive, convert all the words to upper or lower case
if not accent-sensitive, remove the accents from all the words
if you know that there are usual abbreviations, you can also normalize them; normalize to the abbreviation to speed things up (i.e. convert united to utd, not utd to united)
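A minimal C# sketch of such normalization (the abbreviation map here is illustrative, not from the original answer):

    using System.Collections.Generic;
    using System.Globalization;
    using System.Text;

    static class WordNormalizer
    {
        // Illustrative abbreviation map; extend it for your own data.
        static readonly Dictionary<string, string> Abbrev =
            new Dictionary<string, string> { { "united", "utd" }, { "road", "rd" } };

        public static string Normalize(string word)
        {
            // case-insensitive: lower-case everything
            word = word.ToLowerInvariant();
            // accent-insensitive: decompose, then drop the combining marks
            var sb = new StringBuilder();
            foreach (char c in word.Normalize(NormalizationForm.FormD))
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                    sb.Append(c);
            word = sb.ToString().Normalize(NormalizationForm.FormC);
            // usual abbreviations: normalize toward the shorter form
            return Abbrev.TryGetValue(word, out var abbr) ? abbr : word;
        }
    }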
Caching for optimization
To optimize the procedure, I cached whatever I could, e.g. the comparison vectors for the different sizes, like the vectors 0,1,2 - 0,1,3 - 0,1,4 - 0,2,3 in the A,B,C,D,E to a,b,c comparison example: all the combinations for lengths (3,5) are calculated on first use and recycled for every later 3-word to 5-word comparison.
Other algorithms
I tried the Hamming distance and the results were less accurate.
You can do much more complex things like semantic comparisons, phonetic comparisons, or treating certain letters as the same (like b and v, which are not distinguished in several languages, such as Spanish). Some of these things are very easy to implement and others are really difficult.
NOTE: I didn't include an implementation of the Levenshtein distance, because you can easily find it implemented in different languages.
Take a look at this article, which explains how to do it and gives sample code too :)
Fuzzy Matching (Levenshtein Distance)
Update:
Here is a method that takes two strings as parameters and calculates the Levenshtein distance between them:
public static int Compute(string s, string t)
{
    int n = s.Length;
    int m = t.Length;
    int[,] d = new int[n + 1, m + 1];
    // Step 1
    if (n == 0)
    {
        return m;
    }
    if (m == 0)
    {
        return n;
    }
    // Step 2
    for (int i = 0; i <= n; d[i, 0] = i++)
    {
    }
    for (int j = 0; j <= m; d[0, j] = j++)
    {
    }
    // Step 3
    for (int i = 1; i <= n; i++)
    {
        // Step 4
        for (int j = 1; j <= m; j++)
        {
            // Step 5
            int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
            // Step 6
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                d[i - 1, j - 1] + cost);
        }
    }
    // Step 7
    return d[n, m];
}
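For the two strings from the question this returns 3, since turning "United" into "Utd" takes three deletions (n, i, e):

    int distance = Compute("Manchester United", "Manchester Utd");
    // distance == 3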
Detecting duplicates can sometimes be a "little" more complicated than computing a Levenshtein distance.
Consider the following example:
1. Jeff, Lynch, Maverick, Road, 181, Woodstock
2. Jeff, Alf., Lynch, Maverick, Rd, Woodstock, NY
These duplicates can be matched by sophisticated clustering algorithms.
For further information you might want to check some research papers like
"Effective Incremental Clustering for Duplicate Detection in Large Databases".
(Example is from the paper)
What you are looking for is a string similarity measure. There are multiple ways of doing this:
Edit Distances between two strings (as in Answer #1)
Converting the strings into sets of tokens (generally bigrams or words) and then calculating the Jaccard Coefficient or Dice Coefficient on the two sets.
Projecting the strings into term vectors (either on words or bigrams) and calculating the Cosine Distance between the two vectors.
I generally find option #2 the easiest to implement, and if your strings are phrases you can simply tokenize them on word boundaries.
In all the above cases, you might want to remove the stop words (common words like and, a, the, etc.) before tokenizing.
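For reference, the set-overlap measures in option #2 are usually defined as Dice(A, B) = 2|A ∩ B| / (|A| + |B|) and Jaccard(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the two sets of tokens.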
Update: Links
Dice Coefficient
Cosine Similarity
Implementing Naive Similarity engine in C# *Warning: shameless Self Promotion
Here is an alternative to using the Levenshtein distance algorithm. This compares strings based on Dice's coefficient, which uses the letter pairs (bigrams) common to both strings to generate a value between 0 and 1, with 0 being no similarity and 1 being complete similarity:
using System;
using System.Collections.Generic;
using System.Linq;

public static double CompareStrings(string strA, string strB)
{
    // collect every pair of adjacent letters (bigram) in each string
    List<string> setA = new List<string>();
    List<string> setB = new List<string>();
    for (int i = 0; i < strA.Length - 1; ++i)
        setA.Add(strA.Substring(i, 2));
    for (int i = 0; i < strB.Length - 1; ++i)
        setB.Add(strB.Substring(i, 2));
    // bigrams common to both strings, compared case-insensitively
    var intersection = setA.Intersect(setB, StringComparer.InvariantCultureIgnoreCase);
    // Dice's coefficient: 2 * |A intersect B| / (|A| + |B|)
    return (2.0 * intersection.Count()) / (setA.Count + setB.Count);
}
Call the method like this:
CompareStrings("Manchester United", "Manchester Utd");
Output is: 0.75862068965517238