Randomly select a specific quantity of indices from an array? - c#

I have an array of boolean values and need to randomly select a specific quantity of indices for values which are true.
What is the most efficient way to generate the array of indices?
For instance,
BitArray mask = GenerateSomeMask(length: 100000);
int[] randomIndices = RandomIndicesForTrue(mask, quantity: 10);
In this case the length of randomIndices would be 10.

There's a faster way to do this that requires only a single scan of the list.
Consider picking a line at random from a text file when you don't know how many lines are in the file, and the file is too large to fit in memory. The obvious solution is to read the file once to count the lines, pick a random number in the range of 0 to Count-1, and then read the file again up to the chosen line number. That works, but requires you to read the file twice.
A faster solution is to read the first line and save it as the selected line. You replace the selected line with the next line with probability 1/2. When you read the third line, you replace with probability 1/3, etc. When you've read the entire file, you have selected a line at random, and every line had equal probability of being selected. The code looks something like this:
string selectedLine = null;
int numLines = 0;
Random rnd = new Random();
foreach (var line in File.ReadLines(filename))
{
++numLines;
double prob = 1.0/numLines;
if (rnd.Next() >= prob)
selectedLine = line;
}
Now, what if you want to select 2 lines? You select the first two. Then, as each line is read the probability that it will replace one of the two lines is 2/n, where n is the number of lines already read. If you determine that you need to replace a line, you randomly select the line to be replaced. You can follow that same basic idea to select any number of lines at random. For example:
string[] selectedLines = new int[M];
int numLines = 0;
Random rnd = new Random();
foreach (var line in File.ReadLines(filename))
{
++numLines;
if (numLines <= M)
{
selectedLines[numLines-1] = line;
}
else
{
double prob = (double)M/numLines;
if (rnd.Next() >= prob)
{
int ix = rnd.Next(M);
selectedLines[ix] = line;
}
}
}
You can apply that to your BitArray quite easily:
int[] selected = new int[quantity];
int num = 0; // number of True items seen
Random rnd = new Random();
for (int i = 0; i < items.Length; ++i)
{
if (items[i])
{
++num;
if (num <= quantity)
{
selected[num-1] = i;
}
else
{
double prob = (double)quantity/num;
if (rnd.Next() > prob)
{
int ix = rnd.Next(quantity);
selected[ix] = i;
}
}
}
}
You'll need some special case code at the end to handle the case where there aren't quantity set bits in the array, but you'll need that with any solution.
This makes a single pass over the BitArray, and the only extra memory it uses is for the list of selected indexes. I'd be surprised if it wasn't significantly faster than the LINQ version.
Note that I used the probability calculation to illustrate the math. You can change the inner loop code in the first example to:
if (rnd.Next(numLines+1) == numLines)
{
selectedLine = line;
}
++numLines;
You can make a similar change to the other examples. That does the same thing as the probability calculation, and should execute a little faster because it eliminates a floating point divide for each item.

There are two families of approaches you can use: deterministic and non-deterministic. The first one involves finding all the eligible elements in the collection and then picking N at random; the second involves randomly reaching into the collection until you have found N eligible items.
Since the size of your collection is not negligible at 100K and you only want to pick a few out of those, at first sight non-deterministic sounds like it should be considered because it can give very good results in practice. However, since there is no guarantee that N true values even exist in the collection, going non-deterministic could put your program into an infinite loop (less catastrophically, it could just take a very long time to produce results).
Therefore I am going to suggest going for a deterministic approach, even though you are going to pay for the guarantees you need through the nose with resource usage. In particular, the operation will involve in-place sorting of an auxiliary collection; this will practically undo the nice space savings you got by using BitArray.
Theory aside, let's get to work. The standard way to handle this is:
Filter all eligible indices into an auxiliary collection.
Randomly shuffle the collection with Fisher-Yates (there's a convenient implementation on StackOverflow).
Pick the N first items of the shuffled collection. If there are less than N then your input cannot satisfy your requirements.
Translated into LINQ:
var results = mask
.Select((i, f) => Tuple.Create) // project into index/bool pairs
.Where(t => t.Item2) // keep only those where bool == true
.Select(t => t.Item1) // extract indices
.ToList() // prerequisite for next step
.Shuffle() // Fisher-Yates
.Take(quantity) // pick N
.ToArray(); // into an int[]
if (results.Length < quantity)
{
// not enough true values in input
}

If you have 10 indices to choose from, you could generate a random number from 0 to 2^10 - 1, and use that as you mask.

Related

What is stopping this program from having an output

I made this code to brute force anagrams by printing all possible permutables of the characters in a string, however there is no output. I do not need simplifications as I am a student still learning the basics, I just need help getting it to work this way so I can better understand it.
using System;
using System.Collections.Generic;
namespace anagramSolver
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Enter anagram:");
string anagram = Console.ReadLine();
string temp = "";
List<string> storeStr = new List<string>(0);
List<int> storeInt = new List<int>(0);
//creates factorial number for anagram
int factorial = 1;
for (int i = anagram.Length; i > 0; i--)
{
factorial *= i;
}
while (storeStr.Count != factorial)
{
Random rnd = new Random();
while(temp.Length != anagram.Length)
{
int num = rnd.Next(anagram.Length - 1);
if (storeInt.Contains(num) == true)
{
}
else
{
storeInt.Add(num);
temp += anagram[num];
}
}
if (storeStr.Contains(temp) == true)
{
temp = "";
}
else
{
storeStr.Add(temp);
Console.WriteLine(temp, storeStr.Count);
temp = "";
}
}
}
}
}
edit: added temp reset after it is deemed contained by storeStr
Two main issues causing infinite loop:
1)
As per Random.Next documentation, the parameter is the "exclusive" upper bound. This means, if you want a random number between 0 and anagram.Length - 1 included, you should use rnd.Next(anagram.Length);.
With rnd.Next(anagram.Length - 1), you'll never hit anagram.Length - 1.
2)
Even if you solve 1, only the first main iteration goes well.
storeInt is never reset. Meaning, after the first main iteration, it will have already all the numbers in it.
So, during the second iteration, you will always hit the case if (storeInt.Contains(num) == true), which does nothing and the inner loop will go on forever.
Several issues here...
The expression rnd.Next(anagram.Length - 1) generates a value between 0 and anagram.Length - 2. For an input with two characters it will always return 0, so you'll never actually generate a full string. Remove the - 1 from it and you'll be fine.
Next, you're using a list to keep track of the character indices you've used already, but you never clear the list. You'll get one output (eventually, when the random number generator covers all the values) and then enter an infinite loop on the next generation pass. Clear storeInt after the generation loop to fix this.
While not a true infinite loop, creating a new instance of the random number generator each time will give you a lot of duplication. new Random() uses the current time as a seed value and you could potentially get through a ton of loops with each seed, generating exactly the same values until the time changes enough to change your random sequence. Create the random number generator once before you start your main loop.
And finally, your code doesn't handle repeated input letters. If the input is "aa" then there is only a single distinct output, but your code won't stop until it gets two. If the input was "aab" there are three distinct permutations ("aab", "aba", "baa") which is half of the expected results. You'll never reach your exit condition in this case. You could do it by keeping track of the indices you've used instead of the generated strings, it just makes it a bit more complex.
There are a few ways to generate permutations that are less error-prone. In general you should try to avoid "keep generating random garbage until I find the result I'm looking for" solutions. Think about how you personally would go about writing down the full list of permutations for 3, 4 or 5 inputs. What steps would you take? Think about how that would work in a computer program.
this loop
while(temp.Length != anagram.Length)
{
int num = rnd.Next(anagram.Length - 1);
if (storeInt.Contains(num) == true)
{
}
else
{
storeInt.Add(num);
temp += anagram[num];
}
}
gets stuck.
once the number is in storeInt you never change storeStr, yet thats what you are testing for loop exit

Efficient way to pair a limited number of random elements from two separate collections

I have two lists (lista and listb), each containing an unknown number of points (two ints in a struct).
I want to create a new list containing unique random pairings from lista and listb. So an example entry might be [12,14] where 12 is an index for lista and 14 is an index for listb.
I also want to set a maximum number of pairings when calling this function. So instead of pairing every element in lista with every element in listb, I could limit it to 200 random pairings as an example.
My first attempt at this was to simply generate every possible pairing. Shuffle that list and knock off any elements past my max. This solution isn't nearly efficient enough.
My next attempt was to make an array per original list of every possible index, shuffle those separately, and then just iterate over them both until I had the max number of pairings (or all of them). This has several problems I'm not certain how to solve. One of which, lista could have 10 million elements for all I know. Creating a new array of 10 million elements (the indices list) and shuffling that when my max pairs might only be 200? Seems silly to go that far.
I've considered just choosing random elements from both lista/listb and seeing if I've already paired them before adding them to the new list. This is also quite a silly option as a lot of time can be spent picking duplicate pairings over and over.
So, what's a good option here or is there one? I don't want to iterate over every possible combination, pairings need to be unique, removing options from a list is quite slow due to the array re-sizing when they are quite large, distribution needs to be pretty uniform in the selection process for each list, etc.
Thanks for any and all help.
Edit - I meant the unique aspect regarding the pairs themselves. So element 10 in lista could be used over and over as long as the element in listb is different each time. The only catch there is I don't want to limit lista and listb right off as I need fairly even distribution across both lists for every pairing.
To avoid duplicates completely, you could try doing a sparse Fisher-Yates shuffle.
Create a Dictionary<int, int> dict that will map "indices in the Fisher-Yates array that do not hold their own index" to "the value at that index".
For the nth item, pick a random number x from n (inclusive) to "size of ListA * size of ListB" (exclusive)
dict[x] ?? x is your selected item.
Store dict[n] ?? n in dict[x].
Map the selected item back to a pair of indices (divide by size of ListA for the ListB index, modulus by the size of ListA for the ListA index).
A math or statistics buff might give you a formula for evaluating this but I just wrote some test code.
The code simply picks random pairs, and every time it sees a duplicate it tries again. Then for each such "pick a random pair until unique" cycle it counts how many retries it did and tracks this. Then finally this is summed up into a global array to track the relative frequency of these things.
Here's the results after about 1 minute of execution:
84382319 81 0 0 0 0 0 0 0 0
The numbers mean this:
Out of 421912 cycles [(84382319+81)/200]:
81 duplicates were found but retrying did not find a duplicate (3rd number and up is 0)
84382319 unique pairs could be found on the first try without duplicates
So, obviously this will start to rise if you increase the number of pairs you want generated or lower the numbers to choose wrong, but I'm not sure this will pose a problem in practice.
Here's the LINQPad program I used:
static Random R = new Random();
void Main()
{
var a = 10000;
var b = 10000;
var n = 200;
int[] counts = new int[10];
var dc = new DumpContainer().Dump();
while (true)
{
var once = Test(a, b, n);
for (int i = 0; i < once.Length; i++)
counts[i] += once[i];
dc.Content = Util.HorizontalRun(true, counts);
}
}
public static int[] Test(int a, int b, int n)
{
var seen = new HashSet<Tuple<int, int>>();
var result = new int[10];
for (int index = 0; index < n; index++)
{
int tries = 0;
while (true)
{
var av = R.Next(a);
var bv = R.Next(a);
var t = Tuple.Create(av, bv);
if (seen.Contains(t))
tries++;
else
{
seen.Add(t);
break;
}
}
result[tries]++;
}
return result;
}

Shuffle an array without creating any runs

I have an array of repeating letters:
AABCCD
and I would like to put them into pseudo-random order. Simple right, just use Fisher-Yates => done. However there is a restriction on the output - I don't want any runs of the same letter. I want at least two other characters to appear before the same character reappears. For example:
ACCABD
is not valid because there are two Cs next to each other.
ABCACD
is also not valid because there are two C's next to each other (CAC) with only one other character (A) between them, I require at least two other characters.
Every valid sequence for this simple example:
ABCADC ABCDAC ACBACD ACBADC ACBDAC ACBDCA ACDABC ACDACB ACDBAC ACDBCA
ADCABC ADCBAC BACDAC BCADCA CABCAD CABCDA CABDAC CABDCA CADBAC CADBCA
CADCAB CADCBA CBACDA CBADCA CDABCA CDACBA DACBAC DCABCA
I used a brute force approach for this small array but my actual problem is arrays with hundreds of elements. I've tried using Fisher-Yates with some suppression - do normal Fisher-Yates and then if you don't like the character that comes up, try X more times for a better one. Generates valid sequences about 87% of the time only and is very slow. Wondering if there's a better approach. Obviously this isn't possible for all arrays. An array of just "AAB" has no valid order, so I'd like to fail down to the best available order of "ABA" for something like this.
Here is a modified Fisher-Yates approach. As I mentioned, it is very difficult to generate a valid sequence 100% of the time, because you have to check that you haven't trapped yourself by leaving only AAA at the end of your sequence.
It is possible to create a recursive CanBeSorted method, which tells you whether or not a sequence can be sorted according to your rules. That will be your basis for a full solution, but this function, which returns a boolean value indicating success or failure, should be a starting point.
public static bool Shuffle(char[] array)
{
var random = new Random();
var groups = array.ToDictionary(e => e, e => array.Count(v => v == e));
char last = '\0';
char lastButOne = '\0';
for (int i = array.Length; i > 1; i--)
{
var candidates = groups.Keys.Where(c => groups[c] > 0)
.Except(new[] { last, lastButOne }).ToList();
if (!candidates.Any())
return false;
var #char = candidates[random.Next(candidates.Count)];
var j = Array.IndexOf(array.Take(i).ToArray(), #char);
// Swap.
var tmp = array[j];
array[j] = array[i - 1];
array[i - 1] = tmp;
lastButOne = last;
last = #char;
groups[#char] = groups[#char] - 1;
}
return true;
}
Maintain a link list that will keep track of the letter and it's position in the result.
After getting the random number,Pick it's corresponding character from the input(same as Fisher-Yates) but now search in the list whether it has already occurred or not.
If not, insert the letter in the result and also in the link list with its position in the result.
If yes, then check it's position in the result(that you have stored in the link list when you have written that letter in result). Now compare this location with the current inserting location, If mod(currentlocation-previouslocation) is 3 or greater, you can insert that letter in the result otherwise not, if not choose the random number again.

Non repeating random number in c# for Mine sweeper Game

i am developing a mine sweeper game in c# of dimension (8 x 8).The difficulty levels increase/decrease the number of mines on the grid.
I use a random class (with min,max set;) to generate a random cell number.The problem i am facing is ,the random object keeps repeating the same number.I tried to resolve this issue by maintaining a local list where i store the generated unique random numbers.The next time i call Next(), i would check it against the local list ,to see if its already present.If the number is already present i would keep calling Next() until i get a new number which is unique and not present in the list.But this doesnt look in itself a good solution as sometimes it takes painful amount of time to generate a new list.
Any suggestions on this please
Even if you use the same random number generator, repeating values are possible.
One way to avoid this would be to generate a list of possible values and using the random number generated to access a value in this list (using as indice) and reducing this list, as you find places to put mines to.
For 8 X 8 example, you have 64 possible places
List<int> possibleValues = new List<int>();
for (int i = 0; i < 64; i++)
{
possibleValues[i] = i;
}
List<int> result = new List<int>();
Random r = new Random();
int numberOfMines = 50; //say you want to put 50 mines there
for (int i = 0; i < numberOfMines; i++)
{
int indice = r.Next(possibleValues.Count);
int value = possibleValues[indice];
possibleValues.Remove(value);
result.Add(value);
}
It looks like you want a shuffle based on a fixed number of cells (8,8), e.g. a Fisher-Yates shuffle. This would guarantee that any coordinate only appears exactly once (as opposed to repeatedly using Random.Next() where you can draw the same number many times), and the order of appearance of coordinates is randomized.
Initialize an array that contains all the coordinates of your cells, shuffle the array and maintain an index of the next "random" cell, return the cell at the offset of the index and increase the index.
First calculate the number of mines, and empty fields.
Random rand=new Random();
int mines=GetMinesFromDifficulty(...);
int empty=TotalFields-mines;
Then for each field:
for(int y=0;y<height;y++)
for(int x=0;y<height;y++)
{
if(random.Next(mines+empty) < mines))
{
field[x,y]=Mine;
mines--;
}
else
{
field[x,y]=Empty;
empty--;
}
}
Instead of picking slots where the mines should be, loop through the slots and calculate the probability that there should be a mine there. The implementation for this becomes very simple, as you just need a single loop:
bool[] mines = new bool[64];
int cnt = 12;
Random rnd = new Random();
for (int i = 0; i < mines.Length; i++) {
if (rnd.Next(mines.Length - i) < cnt) {
mines[i] = true;
cnt--;
}
}
(Room for improvement: You can exit out of the loop when cnt reaches zero, if you don't need to initialise all slots.)
If your grid is 8x8, and you want to randomly choose an unused cell instead of pulling random numbers until you hit an unused one, then keep track of the number of unused cells. Say 8 have been used, leaving 55 unused. Then generate a random number between 0 and 54. You would then have to count through, and find the nth empty cell.
It would probably be easier to think of the problem in a more linear way. Instead of say a 2D array... Squares[8][8] think of it as a single dimension array Squares[64].
At this point you generate a number between 0-63 for your random mine placement. If say the value is 10 you could store for later to offset subsequent numbers. You can reduce your range now from 0-62, if you pulled out the value 16 you would want to add +1 for each value you'd already pulled out underneath it (so actually use 17 in this case, but square 10 has been excluded from our set).
Without seeing any code it's kind of hard to get the gist of things, but from what I can tell you have the following:
A multi-dimensional array [8][8] for the grid layout of the game, you're now trying to randomly place mines?
You'll need to keep one instance of Random for generating the numbers, else you will get the same number over and over again. Something like this
private readonly Random randomNumber = new Random();
for(int i = 0; i < 10; i++)
{
Console.WriteLine(this.randomNumber.Next(1, 10));
}
This will then generate 10 random numbers, each one different.

Is using Random and OrderBy a good shuffle algorithm? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
The community reviewed whether to reopen this question 3 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
I have read an article about various shuffle algorithms over at Coding Horror. I have seen that somewhere people have done this to shuffle a list:
var r = new Random();
var shuffled = ordered.OrderBy(x => r.Next());
Is this a good shuffle algorithm? How does it work exactly? Is it an acceptable way of doing this?
It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
I prefer Durstenfeld's variant of the Fisher-Yates shuffle which swaps elements.
Implementing a simple Shuffle extension method would basically consist of calling ToList or ToArray on the input then using an existing implementation of Fisher-Yates. (Pass in the Random as a parameter to make life generally nicer.) There are plenty of implementations around... I've probably got one in an answer somewhere.
The nice thing about such an extension method is that it would then be very clear to the reader what you're actually trying to do.
EDIT: Here's a simple implementation (no error checking!):
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
// Note i > 0 to avoid final pointless iteration
for (int i = elements.Length-1; i > 0; i--)
{
// Swap element "i" with a random earlier element it (or itself)
int swapIndex = rng.Next(i + 1);
T tmp = elements[i];
elements[i] = elements[swapIndex];
elements[swapIndex] = tmp;
}
// Lazily yield (avoiding aliasing issues etc)
foreach (T element in elements)
{
yield return element;
}
}
EDIT: Comments on performance below reminded me that we can actually return the elements as we shuffle them:
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    for (int i = elements.Length - 1; i >= 0; i--)
    {
        // Swap element "i" with a random earlier element it (or itself)
// ... except we don't really need to swap it fully, as we can
// return it immediately, and afterwards it's irrelevant.
        int swapIndex = rng.Next(i + 1);
yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
    }
}
This will now only do as much work as it needs to.
Note that in both cases, you need to be careful about the instance of Random you use as:
Creating two instances of Random at roughly the same time will yield the same sequence of random numbers (when used in the same way)
Random isn't thread-safe.
I have an article on Random which goes into more detail on these issues and provides solutions.
This is based on Jon Skeet's answer.
In that answer, the array is shuffled, then returned using yield. The net result is that the array is kept in memory for the duration of foreach, as well as objects necessary for iteration, and yet the cost is all at the beginning - the yield is basically an empty loop.
This algorithm is used a lot in games, where the first three items are picked, and the others will only be needed later if at all. My suggestion is to yield the numbers as soon as they are swapped. This will reduce the start-up cost, while keeping the iteration cost at O(1) (basically 5 operations per iteration). The total cost would remain the same, but the shuffling itself would be quicker. In cases where this is called as collection.Shuffle().ToArray() it will theoretically make no difference, but in the aforementioned use cases it will speed start-up. Also, this would make the algorithm useful for cases where you only need a few unique items. For example, if you need to pull out three cards from a deck of 52, you can call deck.Shuffle().Take(3) and only three swaps will take place (although the entire array would have to be copied first).
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
// Note i > 0 to avoid final pointless iteration
for (int i = elements.Length - 1; i > 0; i--)
{
// Swap element "i" with a random earlier element it (or itself)
int swapIndex = rng.Next(i + 1);
yield return elements[swapIndex];
elements[swapIndex] = elements[i];
// we don't actually perform the swap, we can forget about the
// swapped element because we already returned it.
}
// there is one item remaining that was not returned - we return it now
yield return elements[0];
}
Starting from this quote of Skeet:
It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
I'll go on a little explaining the reason for the hopefully unique!
Now, from the Enumerable.OrderBy:
This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved
This is very important! What happens if two elements "receive" the same random number? It happens that they remain in the same order they are in the array. Now, what is the possibility for this to happen? It is difficult to calculate exactly, but there is the Birthday Problem that is exactly this problem.
Now, is it real? Is it true?
As always, when in doubt, write some lines of program: http://pastebin.com/5CDnUxPG
This little block of code shuffles an array of 3 elements a certain number of times using the Fisher-Yates algorithm done backward, the Fisher-Yates algorithm done forward (in the wiki page there are two pseudo-code algorithms... They produce equivalent results, but one is done from first to last element, while the other is done from last to first element), the naive wrong algorithm of http://blog.codinghorror.com/the-danger-of-naivete/ and using the .OrderBy(x => r.Next()) and the .OrderBy(x => r.Next(someValue)).
Now, Random.Next is
A 32-bit signed integer that is greater than or equal to 0 and less than MaxValue.
so it's equivalent to
OrderBy(x => r.Next(int.MaxValue))
To test if this problem exists, we could enlarge the array (something very slow) or simply reduce the maximum value of the random number generator (int.MaxValue isn't a "special" number... It is simply a very big number). In the end, if the algorithm isn't biased by the stableness of the OrderBy, then any range of values should give the same result.
The program then tests some values, in the range 1...4096. Looking at the result, it's quite clear that for low values (< 128), the algorithm is very biased (4-8%). With 3 values you need at least r.Next(1024). If you make the array bigger (4 or 5), then even r.Next(1024) isn't enough. I'm not an expert in shuffling and in math, but I think that for each extra bit of length of the array, you need 2 extra bits of maximum value (because the birthday paradox is connected to the sqrt(numvalues)), so that if the maximum value is 2^31, I'll say that you should be able to sort arrays up to 2^12/2^13 bits (4096-8192 elements)
It's probablly ok for most purposes, and almost always it generates a truly random distribution (except when Random.Next() produces two identical random integers).
It works by assigning each element of the series a random integer, then ordering the sequence by these integers.
It's totally acceptable for 99.9% of the applications (unless you absolutely need to handle the edge case above). Also, skeet's objection to its runtime is valid, so if you're shuffling a long list you might not want to use it.
This has come up many times before. Search for Fisher-Yates on StackOverflow.
Here is a C# code sample I wrote for this algorithm. You can parameterize it on some other type, if you prefer.
static public class FisherYates
{
// Based on Java code from wikipedia:
// http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
static public void Shuffle(int[] deck)
{
Random r = new Random();
for (int n = deck.Length - 1; n > 0; --n)
{
int k = r.Next(n+1);
int temp = deck[n];
deck[n] = deck[k];
deck[k] = temp;
}
}
}
Seems like a good shuffling algorithm, if you're not too worried on the performance. The only problem I'd point out is that its behavior is not controllable, so you may have a hard time testing it.
One possible option is having a seed to be passed as a parameter to the random number generator (or the random generator as a parameter), so you can have more control and test it more easily.
I found Jon Skeet's answer to be entirely satisfactory, but my client's robo-scanner will report any instance of Random as a security flaw. So I swapped it out for System.Security.Cryptography.RNGCryptoServiceProvider. As a bonus, it fixes that thread-safety issue that was mentioned. On the other hand, RNGCryptoServiceProvider has been measured as 300x slower than using Random.
Usage:
using (var rng = new RNGCryptoServiceProvider())
{
var data = new byte[4];
yourCollection = yourCollection.Shuffle(rng, data);
}
Method:
/// <summary>
/// Shuffles the elements of a sequence randomly.
/// </summary>
/// <param name="source">A sequence of values to shuffle.</param>
/// <param name="rng">An instance of a random number generator.</param>
/// <param name="data">A placeholder to generate random bytes into.</param>
/// <returns>A sequence whose elements are shuffled randomly.</returns>
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, RNGCryptoServiceProvider rng, byte[] data)
{
var elements = source.ToArray();
for (int i = elements.Length - 1; i >= 0; i--)
{
rng.GetBytes(data);
var swapIndex = BitConverter.ToUInt32(data, 0) % (i + 1);
yield return elements[swapIndex];
elements[swapIndex] = elements[i];
}
}
Looking for an algorithm? You can use my ShuffleList class:
class ShuffleList<T> : List<T>
{
public void Shuffle()
{
Random random = new Random();
for (int count = Count; count > 0; count--)
{
int i = random.Next(count);
Add(this[i]);
RemoveAt(i);
}
}
}
Then, use it like this:
ShuffleList<int> list = new ShuffleList<int>();
// Add elements to your list.
list.Shuffle();
How does it work?
Let's take an initial sorted list of the 5 first integers: { 0, 1, 2, 3, 4 }.
The method starts by counting the nubmer of elements and calls it count. Then, with count decreasing on each step, it takes a random number between 0 and count and moves it to the end of the list.
In the following step-by-step example, the items that could be moved are italic, the selected item is bold:
0 1 2 3 4
0 1 2 3 4
0 1 2 4 3
0 1 2 4 3
1 2 4 3 0
1 2 4 3 0
1 2 3 0 4
1 2 3 0 4
2 3 0 4 1
2 3 0 4 1
3 0 4 1 2
This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values. Think of it as adding a new column to an in-memory table, then filling it with GUIDs, then sorting by that column. Looks like an efficient way to me (especially with the lambda sugar!)
Slightly unrelated, but here is an interesting method (that even though it is really excessibe, has REALLY been implemented) for truly random generation of dice rolls!
Dice-O-Matic
The reason I'm posting this here, is that he makes some interesting points about how his users reacted to the idea of using algorithms to shuffle, over actual dice. Of course, in the real world, such a solution is only for the really extreme ends of the spectrum where randomness has such an big impact and perhaps the impact affects money ;).
I would say that many answers here like "This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values" might be very wrong!
I'd think that this DOES NOT assign a random value to each element of the source collection. Instead there might be a sort algorithm running like Quicksort which would call a compare-function approximately n log n times. Some sort algortihm really expect this compare-function to be stable and always return the same result!
Couldn't it be that the IEnumerableSorter calls a compare function for each algorithm step of e.g. quicksort and each time calls the function x => r.Next() for both parameters without caching these!
In that case you might really mess up the sort algorithm and make it much worse than the expectations the algorithm is build up on. Of course, it eventually will become stable and return something.
I might check it later by putting debugging output inside a new "Next" function so see what happens.
In Reflector I could not immediately find out how it works.
It is worth noting that due to the deferred execution of LINQ, using a random number generator instance with OrderBy() can result in a possibly unexpected behavior: The sorting does not happen until the collection is read. This means each time you read or enumerate the collection, the order changes. One would possibly expect the elements to be shuffled once and then to retain the order each time it is accessed thereafter.
Random random = new();
var shuffled = ordered.OrderBy(x => random.Next())
The code above passes a lambda function x => random.Next() as a parameter to OrderBy(). This will capture the instance referred to by random and save it with the lambda by so that it can call Next() on this instance to perform the ordering later which happens right before it is enumerated(when the first element is requested from the collection).
The problem here, is since this execution is saved for later, the ordering happens each time just before the collection is enumerated using new numbers obtained by calling Next() on the same random instance.
Example
To demonstrate this behavior, I have used Visual Studio's C# Interactive Shell:
> List<int> list = new() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
> Random random = new();
> var shuffled = list.OrderBy(element => random.Next());
> shuffled.ToList()
List<int>(10) { 5, 9, 10, 4, 6, 2, 8, 3, 1, 7 }
> shuffled.ToList()
List<int>(10) { 8, 2, 9, 1, 3, 6, 5, 10, 4, 7 } // Different order
> shuffled.ElementAt(0)
9 // First element is 9
> shuffled.ElementAt(0)
7 // First element is now 7
>
This behavior can even be seen in action by placing a breakpoint just after where the IOrderedEnumerable is created when using Visual Studio's debugger: each time you hover on the variable, the elements show up in a different order.
This, of course, does not apply if you immediately enumerate the elements by calling ToList() or an equivalent. However, this behavior can lead to bugs in many cases, one of them being when the shuffled collection is expected to contain a unique element at each index.
Startup time to run on code with clear all threads and cache every new test,
First unsuccessful code. It runs on LINQPad. If you follow to test this code.
Stopwatch st = new Stopwatch();
st.Start();
var r = new Random();
List<string[]> list = new List<string[]>();
list.Add(new String[] {"1","X"});
list.Add(new String[] {"2","A"});
list.Add(new String[] {"3","B"});
list.Add(new String[] {"4","C"});
list.Add(new String[] {"5","D"});
list.Add(new String[] {"6","E"});
//list.OrderBy (l => r.Next()).Dump();
list.OrderBy (l => Guid.NewGuid()).Dump();
st.Stop();
Console.WriteLine(st.Elapsed.TotalMilliseconds);
list.OrderBy(x => r.Next()) uses 38.6528 ms
list.OrderBy(x => Guid.NewGuid()) uses 36.7634 ms (It's recommended from MSDN.)
the after second time both of them use in the same time.
EDIT:
TEST CODE on Intel Core i7 4#2.1GHz, Ram 8 GB DDR3 #1600, HDD SATA 5200 rpm with [Data: www.dropbox.com/s/pbtmh5s9lw285kp/data]
using System;
using System.Runtime;
using System.Diagnostics;
using System.IO;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Threading;
namespace Algorithm
{
class Program
{
public static void Main(string[] args)
{
try {
int i = 0;
int limit = 10;
var result = GetTestRandomSort(limit);
foreach (var element in result) {
Console.WriteLine();
Console.WriteLine("time {0}: {1} ms", ++i, element);
}
} catch (Exception e) {
Console.WriteLine(e.Message);
} finally {
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
}
public static IEnumerable<double> GetTestRandomSort(int limit)
{
for (int i = 0; i < 5; i++) {
string path = null, temp = null;
Stopwatch st = null;
StreamReader sr = null;
int? count = null;
List<string> list = null;
Random r = null;
GC.Collect();
GC.WaitForPendingFinalizers();
Thread.Sleep(5000);
st = Stopwatch.StartNew();
#region Import Input Data
path = Environment.CurrentDirectory + "\\data";
list = new List<string>();
sr = new StreamReader(path);
count = 0;
while (count < limit && (temp = sr.ReadLine()) != null) {
// Console.WriteLine(temp);
list.Add(temp);
count++;
}
sr.Close();
#endregion
// Console.WriteLine("--------------Random--------------");
// #region Sort by Random with OrderBy(random.Next())
// r = new Random();
// list = list.OrderBy(l => r.Next()).ToList();
// #endregion
// #region Sort by Random with OrderBy(Guid)
// list = list.OrderBy(l => Guid.NewGuid()).ToList();
// #endregion
// #region Sort by Random with Parallel and OrderBy(random.Next())
// r = new Random();
// list = list.AsParallel().OrderBy(l => r.Next()).ToList();
// #endregion
// #region Sort by Random with Parallel OrderBy(Guid)
// list = list.AsParallel().OrderBy(l => Guid.NewGuid()).ToList();
// #endregion
// #region Sort by Random with User-Defined Shuffle Method
// r = new Random();
// list = list.Shuffle(r).ToList();
// #endregion
// #region Sort by Random with Parallel User-Defined Shuffle Method
// r = new Random();
// list = list.AsParallel().Shuffle(r).ToList();
// #endregion
// Result
//
st.Stop();
yield return st.Elapsed.TotalMilliseconds;
foreach (var element in list) {
Console.WriteLine(element);
}
}
}
}
}
Result Description: https://www.dropbox.com/s/9dw9wl259dfs04g/ResultDescription.PNG
Result Stat: https://www.dropbox.com/s/ewq5ybtsvesme4d/ResultStat.PNG
Conclusion:
Assume: LINQ OrderBy(r.Next()) and OrderBy(Guid.NewGuid()) are not worse than User-Defined Shuffle Method in First Solution.
Answer: They are contradiction.

Categories

Resources