Building a Matrix of Combinations - c#

I'm sure this has been asked a million times, but when I searched all the examples didn't quite fit, so I thought I should ask it anyway.
I have two arrays which will always contain 6 items each. For example:
string[] Colors=
new string[] { "red", "orange", "yellow", "green", "blue", "purple" };
string[] Foods=
new string[] { "fruit", "grain", "dairy", "meat", "sweet", "vegetable" };
Between these two arrays, there are 36 possible combinations (e.g. "red fruit", "red grain").
Now I need to further group these into sets of six unique values.
For example:
meal[0]=
new Pair[] {
new Pair { One="red", Two="fruit" },
new Pair { One="orange", Two="grain" },
new Pair { One="yellow", Two="dairy" },
new Pair { One="green", Two="meat" },
new Pair { One="blue", Two="sweet" },
new Pair { One="purple", Two="vegetable" }
};
where meal is
Pair[][] meal;
No element can be repeated in my list of "meals". So there is only ever a single "Red" item, and a single "meat" item, etc.
I can easily create the pairs based on the first two arrays, but I am drawing a blank on how best to then group them into unique combinations.

OK, you want a sequence containing all 720 possible sequences. This is a bit trickier but it can be done.
The basic idea is the same as in my previous answer. In that answer we:
generated a permutation at random
zipped the permuted second array with the unpermuted first array
produced an array from the query
Now we'll do the same thing except instead of producing a permutation at random, we'll produce all the permutations.
Start by getting this library:
http://www.codeproject.com/Articles/26050/Permutations-Combinations-and-Variations-using-C-G
OK, we need to make all the permutations of six items:
Permutations<string> permutations = new Permutations<string>(foods);
What do we want to do with each permutation? We already know that. We want to first zip it with the colors array, turning it into a sequence of pairs, which we then turn into an array. Instead, let's turn it into a List<Pair> because, well, trust me, it will be easier.
IEnumerable<List<Pair>> query =
from permutation in permutations
select colors.Zip(permutation, (color, food)=>new Pair(color, food)).ToList();
And now we can turn that query into a list of results:
List<List<Pair>> results = query.ToList();
And we're done. We have a list with 720 items in it. Each item is a list with 6 pairs in it.
The heavy lifting is done by the library code, obviously; the query laid on top of it is straightforward.
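If pulling in the library is not an option, the same result can be sketched with a small recursive generator. This is only an illustration, not the library's implementation: MealBuilder, Permute and AllMeals are hypothetical names, and Pair is assumed to be a simple two-field class like the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Pair type from the question.
public class Pair
{
    public string One;
    public string Two;
    public Pair(string one, string two) { One = one; Two = two; }
}

public static class MealBuilder
{
    // Hypothetical helper: lazily yields every ordering of the items.
    public static IEnumerable<T[]> Permute<T>(T[] items)
    {
        if (items.Length <= 1)
        {
            yield return (T[])items.Clone();
            yield break;
        }
        for (int i = 0; i < items.Length; i++)
        {
            // Fix items[i] as the first element and permute the rest.
            T[] rest = items.Where((item, index) => index != i).ToArray();
            foreach (T[] tail in Permute(rest))
            {
                yield return new[] { items[i] }.Concat(tail).ToArray();
            }
        }
    }

    // Zips the unpermuted colors with every permutation of the foods.
    public static List<List<Pair>> AllMeals(string[] colors, string[] foods)
    {
        return Permute(foods)
            .Select(p => colors.Zip(p, (color, food) => new Pair(color, food)).ToList())
            .ToList();
    }
}
```

Calling MealBuilder.AllMeals(colors, foods) with the two six-item arrays produces one list of six pairs per permutation of foods. The recursion allocates a lot of small arrays, which is fine for six items but would be a poor choice for large inputs.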
(I've been meaning to write a blog article for some time on ways to generate permutations in LINQ; I might use this as an example!)

There are 720 possible combinations that meet your needs. It is not clear from your question whether you want to enumerate all 720 or choose one at random or what. I'm going to assume the latter.
UPDATE: Based on comments, this assumption was incorrect. I'll start a new answer.
First, produce a permutation of the second array. You can do it in-place with the Fisher-Yates-Knuth shuffle; there are many examples of how to do so on StackOverflow. Alternatively, you could produce a permutation with LINQ by sorting with a random key.
The former technique is fast even if the number of items is large, but mutates an existing array. The second technique is slower, particularly if the number of items is extremely large, which it isn't.
The most common mistake people make with the second technique is sorting on a guid. Guids are guaranteed to be unique, not guaranteed to be random.
Anyway, produce a query which, when executed, permutes the second array:
Random random = new Random();
IEnumerable<string> shuffled = from food in foods
orderby random.NextDouble()
select food;
A few other caveats:
Remember, the result of a query expression is a query, not a set of results. The permutation doesn't happen until you actually turn the thing into an array at the other end.
If you make two instances of Random within the same millisecond, you get the same sequence out of them both.
Random is pseudo-random, not truly random.
Random is not threadsafe.
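The first caveat is easy to see if you count how often the random key is requested. The sketch below is only a demonstration; DeferredShuffleDemo and CountKeyCalls are made-up names.

```csharp
using System;
using System.Linq;

public static class DeferredShuffleDemo
{
    // Returns how many times the random key had been requested:
    // [0] after building the query, [1] after one ToArray, [2] after a second ToArray.
    public static int[] CountKeyCalls(string[] foods, Random random)
    {
        int calls = 0;
        var shuffled = foods.OrderBy(food => { calls++; return random.NextDouble(); });

        int afterBuild = calls;   // nothing has executed yet
        shuffled.ToArray();
        int afterFirst = calls;   // one key per element
        shuffled.ToArray();
        int afterSecond = calls;  // the query ran again with fresh keys

        return new[] { afterBuild, afterFirst, afterSecond };
    }
}
```

Because every enumeration draws fresh keys, enumerating the query twice can give two different permutations. If you need one stable result, materialize it once with ToArray and reuse the array.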
Now you can zip-join your permuted sequence to the first array:
IEnumerable<Pair> results = colors.Zip(shuffled, (color, food)=>new Pair(color, food));
Again, this is still a query representing the action of zipping the two sequences together. Nothing has happened yet except building some queries.
Finally, turn it into an array. This actually executes the queries.
Pair[] finalResults = results.ToArray();
Easy peasy.

Upon request, I will be specific about how I view the problem in regards to sorting. I know that since C# is a higher level language there are tons of quick and easy libraries and objects that can be used to reduce this to minimal code. This answer is actually attempting to solve the question by implementing the shuffling logic by hand.
When initially reading this question I was reminded of sorting a deck of cards. The two arrays are very similar to an array for suit and an array for face value. Since one way to solve a shuffle is to randomize the arrays and then pick a card combined of both, you could apply the same logic here.
Sorting as a possible solution
The Fisher-Yates shuffle loops through the indices of the array, swapping the element at the current index with the element at a random index at or before it. This gives a fairly efficient shuffling method. So then how does this apply to the problem at hand? One possible implementation could be...
static Random rdm = new Random();
public string[] Shuffle(string[] c)
{
// Walk backwards, swapping each element with a random one at or before it.
for (int i = c.Length; i > 1; i--)
{
int iRdm = rdm.Next(i);
string cTemp = c[iRdm];
c[iRdm] = c[i - 1];
c[i - 1] = cTemp;
}
return c;
}
Source: Fisher-Yates Shuffle
The code above randomizes the positions of values within the string array. If you passed the Colors and Food arrays into this function, you would get unique pairings for your Pairs by referencing a specific index of both.
Since the array is shuffled, the pairing of the two arrays at index 0,1,2,etc are unique. The problem however asks for Pairs to be created. A Pair class should then be created that takes in a value at a specific index for both Colors and Foods. ie...Colors[3] and Foods[3]
public class Pair
{
public string One;
public string Two;
public Pair(string m1, string m2)
{
One = m1;
Two = m2;
}
}
Since we have shuffled arrays and a class to contain the unique pairings, we simply create the meal array and populate it with Pairs.
If we wanted to create a new pair we would have...
Pair temp = new Pair(Colors[0],Foods[0]);
With this information we can finally populate the meal array.
Pair[] meal = new Pair[Colors.Length];
for (int i = 0; i < Colors.Length; i++)
{
meal[i] = new Pair(Colors[i],Foods[i]);
}
This section of code creates the meal array, sized by the length of Colors. It then loops over every Colors index, creating a new Pair for each and storing it in meal. This method assumes the two arrays have the same length; a check for the shorter array could easily be added.
Full Code
private void Form1_Load(object sender, EventArgs e)
{
string[] Colors = new string[] { "red", "orange", "yellow", "green", "blue", "purple" };
string[] Foods = new string[] { "fruit", "grain", "dairy", "meat", "sweet", "vegetable" };
Colors = Shuffle(Colors);
Foods = Shuffle(Foods);
Pair[] meal = new Pair[Colors.Length];
for (int i = 0; i < Colors.Length; i++)
{
meal[i] = new Pair(Colors[i],Foods[i]);
}
}
static Random rdm = new Random();
public string[] Shuffle(string[] c)
{
// Walk backwards, swapping each element with a random one at or before it.
for (int i = c.Length; i > 1; i--)
{
int iRdm = rdm.Next(i);
string cTemp = c[iRdm];
c[iRdm] = c[i - 1];
c[i - 1] = cTemp;
}
return c;
}
}
public class Pair
{
public string One;
public string Two;
public Pair(string m1, string m2)
{
One = m1;
Two = m2;
}
}
-Original Post-
You can simply shuffle the array. This will allow for the same method to populate meal, but with different results. There is a post on Fisher-Yates shuffle Here

Related

Randomly selects an item from one of the two lists

I am working with generating Random items from either one or another list. I am kind of struggling how to do that.
Basically I have two lists:
List<string> names = new List<string>();
List<string> surnames = new List<string>();
I know how to get an item from one list randomly, but I am struggling how to do so there will be a possibility of taking an item from either names or surnames.
I know there is possibly an easy solution for that but couldn't find it.
Any help would be appreciated.
I know how to get an item from one list randomly
Leverage the technique for taking a random item from a single list to build a simple approach that works with two lists.
Imagine that you have a list of length N = names.Count + surnames.Count
Pick a random position p between 0, inclusive, and N, exclusive
If the position p is less than names.Count, use names[p]
Otherwise, use surnames[p - names.Count]
Effectively, the above approach picks an item from a merged list without performing an actual merge.
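The steps above can be sketched as follows. RandomPick and FromEither are hypothetical names, and the Random instance is passed in so the caller can reuse one generator.

```csharp
using System;
using System.Collections.Generic;

public static class RandomPick
{
    // Picks one item from the "virtual" concatenation of both lists without copying them.
    public static string FromEither(IList<string> names, IList<string> surnames, Random random)
    {
        int total = names.Count + surnames.Count;
        if (total == 0)
            throw new InvalidOperationException("Both lists are empty.");

        int p = random.Next(total); // 0 <= p < total
        return p < names.Count ? names[p] : surnames[p - names.Count];
    }
}
```

Every item has the same chance of being picked, so the longer list is chosen proportionally more often. If each list should instead be chosen with equal probability, pick the list first and then the item.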
Edit: It turns out that you wanted a random combination of names[] and surnames[]. This is a simpler task, which is achieved by picking a random element from an array twice - once from names[], and then separately from surnames[].
This should do the job:
Random r = new Random();
Int32 nameIdx = r.Next(names.Count);
Int32 surnameIdx = r.Next(surnames.Count);
String randFullname = names[nameIdx] + " " + surnames[surnameIdx];
This is just an example to show you how to work with random array accesses. If you need to select only one name or one surname (the question was not really clear on that point: "but I am struggling how to do so there will be a possibility of taking an item from either names or surnames"), just draw another random number, 0 or 1, and pick the first or the second list based on its value:
List<String> currentList;
Random r = new Random();
// r.Next(0, 2) returns 0 or 1; the upper bound is exclusive.
if (r.Next(0, 2) == 0)
currentList = names;
else
currentList = surnames;
Int32 idx = r.Next(currentList.Count);
String result = currentList[idx];
Otherwise, just pick a single random entry from a concatenation:
List<String> con = names.Concat(surnames).ToList();
You can merge two lists and access random element as follows,
var newList = names.Concat(surnames).ToList();
Random r = new Random();
string rand = newList[r.Next(newList.Count)];
If you want to do it in a one-liner you could try the following:
var r = new Random();
var randomName = names.Concat(surnames).OrderBy(n => r.Next()).First();
It's not very efficient memory wise, but it should work.

c# - How to create 2D array with unknown number of rows, but known number of columns?

I am trying to create 2-dimensional array of ints with two columns, but an unknown amount of rows. I know that in order to create the 2D array itself I do the following:
List<List<int>> myList = new List<List<int>>();
But how do I modify this to specify the number of columns? And how would I add a row to this array?
You could use a List of int arrays. The List would represent the rows, and each int array in the List would be one row. The length of each array would equal the amount of columns.
List<int[]> rows = new List<int[]>();
int[] row = new int[2];
row[0] = 100;
row[1] = 200;
rows.Add(row);
As long as each int array is length two (int[] row = new int[2]), all rows will have two columns. The List can have any number of int arrays added to it in this way.
There is no way to create a 2D array (or any other sort of array) with an unknown number of elements. You have to provide the number of elements when you initialize it.
The syntax for multidimensional array is the following:
var arr = new int[k, l, n,...]
You can create a so-called jagged array, i.e. an array of arrays, and initialize it in a loop. You will still need to initialize it with a number of subarrays and then fill it with subarrays of the given lengths:
var arr = new int[n][];
for (int i = 0; i < arr.Length; i++)
{
arr[i] = new int[subArrayLength];
}
What you actually have is a List of Lists, which can hold any number of "rows" of any length. You can add a new list of a specific length to the outer list with the List method Add(), and in the same way you can add an element to any inner List.
So basically to add a "row" you would need the following:
List<List<int>> table = new List<List<int>>();
table.Add(Enumerable.Repeat(defaultValue, numberOfColumns).ToList());
To add a column you would need something like this:
foreach (var row in table)
{
row.Add(defaultValue);
}
It seems that you want to simulate a table structure. For that I would suggest creating a class that encapsulates the table logic above, so that adding a row appends an inner List of the current numberOfColumns size to the outer list, and adding a column appends one element to every inner list.
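A minimal sketch of such a wrapper might look like this. IntTable and its members are names invented for the example; a real class would probably want validation and enumeration support as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that keeps every row at the same width.
public class IntTable
{
    private readonly List<List<int>> rows = new List<List<int>>();

    public int ColumnCount { get; private set; }
    public int RowCount { get { return rows.Count; } }

    public IntTable(int columnCount)
    {
        if (columnCount < 0) throw new ArgumentOutOfRangeException(nameof(columnCount));
        ColumnCount = columnCount;
    }

    // Appends a row filled with defaultValue, sized to the current column count.
    public void AddRow(int defaultValue = 0)
    {
        rows.Add(Enumerable.Repeat(defaultValue, ColumnCount).ToList());
    }

    // Appends one cell to every existing row.
    public void AddColumn(int defaultValue = 0)
    {
        foreach (var row in rows) row.Add(defaultValue);
        ColumnCount++;
    }

    public int this[int row, int column]
    {
        get { return rows[row][column]; }
        set { rows[row][column] = value; }
    }
}
```

Because the inner lists are private, callers cannot add a cell to a single row and leave the table ragged.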
However, if you need a fixed number of columns, the best alternative is to declare a new class with your columns mapped to its properties and then simply declare a List of that class, as described in the following answer, as @shash678 pointed out.
Original answer: Consider using a List<Tuple<int, int>>.
Explanation: To begin with, an int[] is not a List<>. From the spirit of your question you seem to have the need to group several units of data together with a finite size, like a records in a table. You could decide to create a POCO with a more descriptive name (which would increase code readability and help with semantics), use an array of fixed size, or use a Tuple. Since there is no mention as to the need for mutability, and the size of the inner "array" will be fixed, I would suggest a Tuple. Through type safety you will ensure that the size and shape of each new object added to the list of Tuples is correct.
Your first and second questions would be taken care of; as for the 3rd, see: Tuple Pair
e.g. list.Add(Tuple.Create<int,int>(1,1));
I feel like these answers are leading you down a bad path. When learning how to program, you should also consider best practices. In this case, the code you already have is what you should use. If you can avoid setting an explicit array size, you should (in my opinion). Lists were created for this purpose. This sample app shows how to use the list correctly in this scenario. My personal opinion is to avoid arrays if you can.
int someNumberOfRow = 10;//This is just for testing purposes.
Random random = new Random();//This is just for testing purposes.
List<List<int>> myList = new List<List<int>>();
//Add two elements to an inner List<int>, then add that inner list to myList.
for(int i = 0; i < someNumberOfRow; i++)
{
//Create inner list and add two ints to it.
List<int> innerList = new List<int>();
innerList.Add(random.Next());
innerList.Add(random.Next());
//Add the inner list to myList;
myList.Add(innerList);
}
//This prints myList
for(int i = 0; i < myList.Count; i++)
{
Console.WriteLine((i + 1) + ": " + myList[i][0] + " - " + myList[i][1]);
}
Console.WriteLine("\n\n");
//If the app may scale in the future, I suggest you use an approach similar to this
foreach(List<int> sublist in myList)
{
foreach(int columns in sublist)
{
Console.Write(columns + " ");
}
Console.WriteLine();
}
Console.ReadLine();

Efficient way to pair a limited number of random elements from two separate collections

I have two lists (lista and listb), each containing an unknown number of points (two ints in a struct).
I want to create a new list containing unique random pairings from lista and listb. So an example entry might be [12,14] where 12 is an index for lista and 14 is an index for listb.
I also want to set a maximum number of pairings when calling this function. So instead of pairing every element in lista with every element in listb, I could limit it to 200 random pairings as an example.
My first attempt at this was to simply generate every possible pairing. Shuffle that list and knock off any elements past my max. This solution isn't nearly efficient enough.
My next attempt was to make an array per original list of every possible index, shuffle those separately, and then just iterate over them both until I had the max number of pairings (or all of them). This has several problems I'm not certain how to solve. One of which, lista could have 10 million elements for all I know. Creating a new array of 10 million elements (the indices list) and shuffling that when my max pairs might only be 200? Seems silly to go that far.
I've considered just choosing random elements from both lista/listb and seeing if I've already paired them before adding them to the new list. This is also quite a silly option as a lot of time can be spent picking duplicate pairings over and over.
So, what's a good option here or is there one? I don't want to iterate over every possible combination, pairings need to be unique, removing options from a list is quite slow due to the array re-sizing when they are quite large, distribution needs to be pretty uniform in the selection process for each list, etc.
Thanks for any and all help.
Edit - I meant the unique aspect regarding the pairs themselves. So element 10 in lista could be used over and over as long as the element in listb is different each time. The only catch there is I don't want to limit lista and listb right off as I need fairly even distribution across both lists for every pairing.
To avoid duplicates completely, you could try doing a sparse Fisher-Yates shuffle.
Create a Dictionary<int, int> dict that will map "indices in the Fisher-Yates array that do not hold their own index" to "the value at that index".
For the nth item, pick a random number x from n (inclusive) to "size of ListA * size of ListB" (exclusive)
dict[x] ?? x is your selected item.
Store dict[n] ?? n in dict[x].
Map the selected item back to a pair of indices (divide by size of ListA for the ListB index, modulus by the size of ListA for the ListA index).
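Here is a rough sketch of those steps. SparsePairs and Pick are hypothetical names. One assumption to be aware of: the random position is derived from Random.NextDouble, which is convenient for ranges larger than an int but does not have enough resolution to be perfectly uniform over very large products, so treat it as an illustration rather than a tuned implementation.

```csharp
using System;
using System.Collections.Generic;

public static class SparsePairs
{
    // Returns up to maxPairs distinct (indexA, indexB) pairs using memory
    // proportional to the number of pairs rather than countA * countB.
    public static List<Tuple<int, int>> Pick(int countA, int countB, int maxPairs, Random random)
    {
        long total = (long)countA * countB;
        int n = (int)Math.Max(0, Math.Min(total, maxPairs));

        // Only positions that no longer hold their own index are stored.
        var moved = new Dictionary<long, long>();
        var result = new List<Tuple<int, int>>(n);

        for (long i = 0; i < n; i++)
        {
            // Random position x with i <= x < total.
            long x = i + (long)(random.NextDouble() * (total - i));
            if (x >= total) x = total - 1; // guard against floating-point rounding

            long selected = moved.TryGetValue(x, out long atX) ? atX : x;
            long current = moved.TryGetValue(i, out long atI) ? atI : i;

            moved[x] = current; // position x now holds what was at position i
            moved.Remove(i);    // position i is never read again

            // Map the flat index back to a pair of list indices.
            result.Add(Tuple.Create((int)(selected % countA), (int)(selected / countA)));
        }
        return result;
    }
}
```

Each returned tuple holds the ListA index first and the ListB index second. No pair can repeat, so there is no retry loop, and asking for more pairs than exist simply returns all of them in random order.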
A math or statistics buff might give you a formula for evaluating this but I just wrote some test code.
The code simply picks random pairs, and every time it sees a duplicate it tries again. Then for each such "pick a random pair until unique" cycle it counts how many retries it did and tracks this. Then finally this is summed up into a global array to track the relative frequency of these things.
Here's the results after about 1 minute of execution:
84382319 81 0 0 0 0 0 0 0 0
The numbers mean this:
Out of 421912 cycles [(84382319+81)/200]:
81 duplicates were found but retrying did not find a duplicate (3rd number and up is 0)
84382319 unique pairs could be found on the first try without duplicates
So, obviously this will start to rise if you increase the number of pairs you want generated or lower the numbers to choose from, but I'm not sure this will pose a problem in practice.
Here's the LINQPad program I used:
static Random R = new Random();
void Main()
{
var a = 10000;
var b = 10000;
var n = 200;
int[] counts = new int[10];
var dc = new DumpContainer().Dump();
while (true)
{
var once = Test(a, b, n);
for (int i = 0; i < once.Length; i++)
counts[i] += once[i];
dc.Content = Util.HorizontalRun(true, counts);
}
}
public static int[] Test(int a, int b, int n)
{
var seen = new HashSet<Tuple<int, int>>();
var result = new int[10];
for (int index = 0; index < n; index++)
{
int tries = 0;
while (true)
{
var av = R.Next(a);
var bv = R.Next(b);
var t = Tuple.Create(av, bv);
if (seen.Contains(t))
tries++;
else
{
seen.Add(t);
break;
}
}
result[tries]++;
}
return result;
}

getting a random number as long as its not in a list

var list = new List<int>(){1,17,18,21,30};
Random rnd = new Random(DateTime.Now.Second);
int r;
do
{
r = rnd.Next(1, 30);
}
while (list.Contains(r));
but i think that's a stupid solution, can anyone give me a more optimized approach?
even better if there is a way to prevent the Random instance from returning a number that it has already returned.
in case anyone wonders why do i need this its the first step in shuffling 3 byte arrays and combining them into one byte array and producing 3 byte arrays that hold the indices original order as it was in the original arrays.
Yes, one thing that makes it much more efficient is to use a HashSet<int> instead of a List<int>. Lookups in a HashSet are much faster than in a List (however, constructing a HashSet costs slightly more).
Also if the input list is always the same numbers move it out of the function to help reduce the cost overhead of generating the HashSet the first time.
Since order does not matter here, in my personal experience (please test and profile for your own situation), after about 14 items in the list it is faster to convert the list to a HashSet and do the lookup there than to do the lookup in the list itself.
var list = new List<int>(){1,17,18,21,30};
Random rnd = new Random(DateTime.Now.Second);
int r;
//In this example with 5 items in the list the HashSet will be slower due to the cost
// of creating it, but if we knew that the 5 items were fixed I would make this
// outside of the function so I would only have to pay the cost once per program
// start-up and it would be considered faster again due to amortized start-up cost.
var checkHashSet = new HashSet<int>(list);
do
{
r = rnd.Next(1, 30);
}
while (checkHashSet.Contains(r)); // Test the number that was just drawn, not a fresh one.
You're right that looping isn't particularly efficient. You can use some handy extensions to select a number if you consider the constraint of the list of valid numbers, as opposed to the list of invalid ones.
So you have your list of invalid numbers:
var list = new List<int>(){1,17,18,21,30};
Which means that your list of valid numbers is the range from 1-30 except for these. Something like:
var validList = Enumerable.Range(1, 30).Except(list);
So we can use these extensions from the linked answer:
public static T RandomElement<T>(this IEnumerable<T> enumerable)
{
return enumerable.RandomElementUsing(new Random());
}
public static T RandomElementUsing<T>(this IEnumerable<T> enumerable, Random rand)
{
int index = rand.Next(0, enumerable.Count());
return enumerable.ElementAt(index);
}
And select a random element from the list of known valid numbers:
var kindOfRandomNumber = Enumerable.Range(1, 30).Except(list).RandomElement();

Is using Random and OrderBy a good shuffle algorithm? [closed]

I have read an article about various shuffle algorithms over at Coding Horror. I have seen that somewhere people have done this to shuffle a list:
var r = new Random();
var shuffled = ordered.OrderBy(x => r.Next());
Is this a good shuffle algorithm? How does it work exactly? Is it an acceptable way of doing this?
It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
I prefer Durstenfeld's variant of the Fisher-Yates shuffle which swaps elements.
Implementing a simple Shuffle extension method would basically consist of calling ToList or ToArray on the input then using an existing implementation of Fisher-Yates. (Pass in the Random as a parameter to make life generally nicer.) There are plenty of implementations around... I've probably got one in an answer somewhere.
The nice thing about such an extension method is that it would then be very clear to the reader what you're actually trying to do.
EDIT: Here's a simple implementation (no error checking!):
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
// Note i > 0 to avoid final pointless iteration
for (int i = elements.Length-1; i > 0; i--)
{
// Swap element "i" with a random earlier element (or itself)
int swapIndex = rng.Next(i + 1);
T tmp = elements[i];
elements[i] = elements[swapIndex];
elements[swapIndex] = tmp;
}
// Lazily yield (avoiding aliasing issues etc)
foreach (T element in elements)
{
yield return element;
}
}
EDIT: Comments on performance below reminded me that we can actually return the elements as we shuffle them:
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    for (int i = elements.Length - 1; i >= 0; i--)
    {
        // Swap element "i" with a random earlier element (or itself)
        // ... except we don't really need to swap it fully, as we can
        // return it immediately, and afterwards it's irrelevant.
        int swapIndex = rng.Next(i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
    }
}
This will now only do as much work as it needs to.
Note that in both cases, you need to be careful about the instance of Random you use as:
Creating two instances of Random at roughly the same time will yield the same sequence of random numbers (when used in the same way)
Random isn't thread-safe.
I have an article on Random which goes into more detail on these issues and provides solutions.
This is based on Jon Skeet's answer.
In that answer, the array is shuffled, then returned using yield. The net result is that the array is kept in memory for the duration of foreach, as well as objects necessary for iteration, and yet the cost is all at the beginning - the yield is basically an empty loop.
This algorithm is used a lot in games, where the first three items are picked, and the others will only be needed later if at all. My suggestion is to yield the numbers as soon as they are swapped. This will reduce the start-up cost, while keeping the iteration cost at O(1) (basically 5 operations per iteration). The total cost would remain the same, but the shuffling itself would be quicker. In cases where this is called as collection.Shuffle().ToArray() it will theoretically make no difference, but in the aforementioned use cases it will speed start-up. Also, this would make the algorithm useful for cases where you only need a few unique items. For example, if you need to pull out three cards from a deck of 52, you can call deck.Shuffle().Take(3) and only three swaps will take place (although the entire array would have to be copied first).
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
// Note i > 0 to avoid final pointless iteration
for (int i = elements.Length - 1; i > 0; i--)
{
// Swap element "i" with a random earlier element (or itself)
int swapIndex = rng.Next(i + 1);
yield return elements[swapIndex];
elements[swapIndex] = elements[i];
// we don't actually perform the swap, we can forget about the
// swapped element because we already returned it.
}
// there is one item remaining that was not returned - we return it now
yield return elements[0];
}
Starting from this quote of Skeet:
It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
I'll go on a little to explain the reason for the "hopefully unique!"
Now, from the Enumerable.OrderBy:
This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved
This is very important! What happens if two elements "receive" the same random number? They remain in the same order they had in the array. Now, how likely is this? It is tedious to calculate by hand, but it is exactly the Birthday Problem.
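If you assume the keys are uniformly distributed and independent, the birthday-problem probability can be computed with a short loop. The sketch below is mine, not part of the linked test program; KeyCollisions and Probability are made-up names.

```csharp
using System;

public static class KeyCollisions
{
    // Probability that at least two of n keys, each drawn uniformly from
    // keySpace possible values, are equal (the birthday problem).
    public static double Probability(int n, long keySpace)
    {
        if (n > keySpace) return 1.0; // pigeonhole: a repeat is unavoidable

        double allDistinct = 1.0;
        for (int i = 1; i < n; i++)
        {
            // The (i+1)-th key must avoid the i values already taken.
            allDistinct *= 1.0 - (double)i / keySpace;
        }
        return 1.0 - allDistinct;
    }
}
```

For example, KeyCollisions.Probability(3, 128) is about 0.023, so with three elements and keys from r.Next(128) roughly one shuffle in 43 contains a tie that the stable sort resolves in favour of the original order.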
Now, is it real? Is it true?
As always, when in doubt, write some lines of program: http://pastebin.com/5CDnUxPG
This little block of code shuffles an array of 3 elements a certain number of times using the Fisher-Yates algorithm done backward, the Fisher-Yates algorithm done forward (in the wiki page there are two pseudo-code algorithms... They produce equivalent results, but one is done from first to last element, while the other is done from last to first element), the naive wrong algorithm of http://blog.codinghorror.com/the-danger-of-naivete/ and using the .OrderBy(x => r.Next()) and the .OrderBy(x => r.Next(someValue)).
Now, Random.Next returns
A 32-bit signed integer that is greater than or equal to 0 and less than MaxValue.
so it's equivalent to
OrderBy(x => r.Next(int.MaxValue))
To test if this problem exists, we could enlarge the array (something very slow) or simply reduce the maximum value of the random number generator (int.MaxValue isn't a "special" number... It is simply a very big number). In the end, if the algorithm isn't biased by the stableness of the OrderBy, then any range of values should give the same result.
The program then tests some values in the range 1...4096. Looking at the result, it's quite clear that for low values (< 128) the algorithm is very biased (4-8%). With 3 values you need at least r.Next(1024). If you make the array bigger (4 or 5), then even r.Next(1024) isn't enough. I'm not an expert in shuffling or in math, but I think that for each extra bit of array length you need 2 extra bits of maximum value (because the birthday paradox is connected to sqrt(numvalues)), so if the maximum value is 2^31, I'd say you should be able to shuffle arrays of up to 2^12 to 2^13 elements (4096-8192).
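In case the pastebin link is unavailable, here is a much smaller sketch of the same kind of experiment. It is not the original program: OrderByBias and Count are names I made up, and it only covers the OrderBy(x => r.Next(someValue)) variant on a three-element array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByBias
{
    // Shuffles { 0, 1, 2 } `trials` times with OrderBy(x => rng.Next(maxKey))
    // and counts how often each resulting permutation appears.
    public static Dictionary<string, int> Count(int maxKey, int trials, Random rng)
    {
        var counts = new Dictionary<string, int>();
        int[] source = { 0, 1, 2 };

        for (int t = 0; t < trials; t++)
        {
            // Encode the permutation as a string such as "201".
            string key = string.Join("", source.OrderBy(x => rng.Next(maxKey)));
            counts.TryGetValue(key, out int seen);
            counts[key] = seen + 1;
        }
        return counts;
    }
}
```

With an unbiased shuffle each of the six permutations should appear in about one sixth of the trials. As maxKey shrinks, ties become common and the stable sort pushes the counts toward "012"; in the extreme case maxKey = 1 every key is 0 and the original order is the only outcome.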
It's probably ok for most purposes, and almost always it generates a truly random distribution (except when Random.Next() produces two identical random integers).
It works by assigning each element of the series a random integer, then ordering the sequence by these integers.
It's totally acceptable for 99.9% of the applications (unless you absolutely need to handle the edge case above). Also, Skeet's objection to its runtime is valid, so if you're shuffling a long list you might not want to use it.
This has come up many times before. Search for Fisher-Yates on StackOverflow.
Here is a C# code sample I wrote for this algorithm. You can parameterize it on some other type, if you prefer.
static public class FisherYates
{
// Based on Java code from wikipedia:
// http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
static public void Shuffle(int[] deck)
{
Random r = new Random();
for (int n = deck.Length - 1; n > 0; --n)
{
int k = r.Next(n+1);
int temp = deck[n];
deck[n] = deck[k];
deck[k] = temp;
}
}
}
Seems like a good shuffling algorithm, if you're not too worried about performance. The only problem I'd point out is that its behavior is not controllable, so you may have a hard time testing it.
One possible option is having a seed to be passed as a parameter to the random number generator (or the random generator as a parameter), so you can have more control and test it more easily.
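As a rough illustration of that idea, the shuffle below takes the generator as a parameter, so a test can pass a Random built from a fixed seed. SeededShuffle is a hypothetical name, and I have used an in-place Fisher-Yates shuffle here instead of OrderBy to keep the example short.

```csharp
using System;

public static class SeededShuffle
{
    // In-place Fisher-Yates shuffle. Passing new Random(fixedSeed) makes the
    // resulting order repeatable, which is convenient in unit tests.
    public static void Shuffle<T>(T[] items, Random rng)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }
}
```

Two calls with generators created from the same seed produce the same order on the same runtime, while production code can keep passing a single shared, unseeded instance.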
I found Jon Skeet's answer to be entirely satisfactory, but my client's robo-scanner will report any instance of Random as a security flaw. So I swapped it out for System.Security.Cryptography.RNGCryptoServiceProvider. As a bonus, it fixes that thread-safety issue that was mentioned. On the other hand, RNGCryptoServiceProvider has been measured as 300x slower than using Random.
Usage:
using (var rng = new RNGCryptoServiceProvider())
{
    var data = new byte[4];
    yourCollection = yourCollection.Shuffle(rng, data);
}
Method:
/// <summary>
/// Shuffles the elements of a sequence randomly.
/// </summary>
/// <param name="source">A sequence of values to shuffle.</param>
/// <param name="rng">An instance of a random number generator.</param>
/// <param name="data">A placeholder to generate random bytes into.</param>
/// <returns>A sequence whose elements are shuffled randomly.</returns>
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, RNGCryptoServiceProvider rng, byte[] data)
{
    var elements = source.ToArray();
    for (int i = elements.Length - 1; i >= 0; i--)
    {
        rng.GetBytes(data);
        var swapIndex = BitConverter.ToUInt32(data, 0) % (i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
    }
}
Looking for an algorithm? You can use my ShuffleList class:
class ShuffleList<T> : List<T>
{
    public void Shuffle()
    {
        Random random = new Random();
        for (int count = Count; count > 0; count--)
        {
            int i = random.Next(count);
            Add(this[i]);
            RemoveAt(i);
        }
    }
}
Then, use it like this:
ShuffleList<int> list = new ShuffleList<int>();
// Add elements to your list.
list.Shuffle();
How does it work?
Let's take an initial sorted list of the 5 first integers: { 0, 1, 2, 3, 4 }.
The method starts by counting the number of elements and calls it count. Then, with count decreasing on each step, it picks a random index between 0 (inclusive) and count (exclusive) and moves the element at that index to the end of the list.
In the following step-by-step example, the items that could be moved are shown in brackets, and the selected item is marked with asterisks:
[0 1 2 3 4]
[0 1 2 *3* 4]
[0 1 2 4] 3
[*0* 1 2 4] 3
[1 2 4] 3 0
[1 2 *4*] 3 0
[1 2] 3 0 4
[*1* 2] 3 0 4
[2] 3 0 4 1
[*2*] 3 0 4 1
3 0 4 1 2
This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values. Think of it as adding a new column to an in-memory table, then filling it with GUIDs, then sorting by that column. Looks like an efficient way to me (especially with the lambda sugar!)
Slightly unrelated, but here is an interesting method (which, even though it is really excessive, has REALLY been implemented) for truly random generation of dice rolls!
Dice-O-Matic
The reason I'm posting this here is that he makes some interesting points about how his users reacted to the idea of using algorithms to shuffle instead of actual dice. Of course, in the real world, such a solution is only for the really extreme end of the spectrum, where randomness has such a big impact and perhaps the impact affects money ;).
I would say that many answers here like "This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values" might be very wrong!
I'd think that this DOES NOT assign a random value to each element of the source collection. Instead there might be a sort algorithm running, like Quicksort, which would call a compare function approximately n log n times. Some sort algorithms really expect this compare function to be consistent and always return the same result!
Couldn't it be that the IEnumerableSorter calls a compare function for each algorithm step of e.g. quicksort and each time calls the function x => r.Next() for both parameters without caching these?
In that case you might really mess up the sort algorithm and make it much worse than the expectations the algorithm is built on. Of course, it will eventually terminate and return something.
I might check it later by putting debugging output inside a new "Next" function to see what happens.
In Reflector I could not immediately find out how it works.
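One way to check this without Reflector is to count how often the key selector actually runs. The sketch below (the class and method names are made up for illustration) assumes LINQ to Objects, whose OrderBy, as far as I know, computes each element's key once up front and then sorts on those cached keys. If that holds, the counter equals the number of elements rather than something on the order of n log n:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeySelectorCounter
{
    // Returns how many times the key selector ran while sorting `count` items once.
    public static int CountSelectorCalls(int count)
    {
        var random = new Random();
        int calls = 0;
        Enumerable.Range(0, count)
            .OrderBy(x => { calls++; return random.Next(); }) // count every selector call
            .ToList();                                        // force the sort to run once
        return calls;
    }
}
```

With one selector call per element, the comparisons made by the sort see fixed keys, so the inconsistent-comparer worry would not apply to this particular pattern.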
It is worth noting that due to the deferred execution of LINQ, using a random number generator instance with OrderBy() can result in a possibly unexpected behavior: The sorting does not happen until the collection is read. This means each time you read or enumerate the collection, the order changes. One would possibly expect the elements to be shuffled once and then to retain the order each time it is accessed thereafter.
Random random = new();
var shuffled = ordered.OrderBy(x => random.Next());
The code above passes the lambda x => random.Next() as a parameter to OrderBy(). The lambda captures the instance referred to by random, so that Next() can be called on it later, when the ordering is actually performed: right before the collection is enumerated (when the first element is requested from it).
The problem is that, because execution is deferred, the ordering happens again each time the collection is enumerated, using new numbers obtained by calling Next() on the same random instance.
Example
To demonstrate this behavior, I have used Visual Studio's C# Interactive Shell:
> List<int> list = new() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
> Random random = new();
> var shuffled = list.OrderBy(element => random.Next());
> shuffled.ToList()
List<int>(10) { 5, 9, 10, 4, 6, 2, 8, 3, 1, 7 }
> shuffled.ToList()
List<int>(10) { 8, 2, 9, 1, 3, 6, 5, 10, 4, 7 } // Different order
> shuffled.ElementAt(0)
9 // First element is 9
> shuffled.ElementAt(0)
7 // First element is now 7
>
This behavior can even be seen in action by placing a breakpoint just after where the IOrderedEnumerable is created when using Visual Studio's debugger: each time you hover on the variable, the elements show up in a different order.
This, of course, does not apply if you immediately enumerate the elements by calling ToList() or an equivalent. However, this behavior can lead to bugs in many cases, one of them being when the shuffled collection is expected to contain a unique element at each index.
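A small sketch of that fix, with an illustrative helper name (MaterializedShuffle is not from the answer): calling ToList() inside the helper runs the sort exactly once, and the caller gets a plain list whose order no longer changes between reads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializedShuffle
{
    // Shuffles once and returns a snapshot; later reads do not re-run OrderBy.
    public static List<int> ShuffleOnce(IEnumerable<int> source, Random random)
    {
        return source.OrderBy(element => random.Next()).ToList();
    }
}
```

With this, shuffled[0] (or shuffled.ElementAt(0)) returns the same element on every access, unlike the deferred query in the interactive session above.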
This measures the startup time of the code, with all threads and caches cleared before every new test.
First, the unsuccessful code. It runs in LINQPad, if you want to follow along and test it.
Stopwatch st = new Stopwatch();
st.Start();
var r = new Random();
List<string[]> list = new List<string[]>();
list.Add(new String[] {"1","X"});
list.Add(new String[] {"2","A"});
list.Add(new String[] {"3","B"});
list.Add(new String[] {"4","C"});
list.Add(new String[] {"5","D"});
list.Add(new String[] {"6","E"});
//list.OrderBy (l => r.Next()).Dump();
list.OrderBy (l => Guid.NewGuid()).Dump();
st.Stop();
Console.WriteLine(st.Elapsed.TotalMilliseconds);
list.OrderBy(x => r.Next()) takes 38.6528 ms
list.OrderBy(x => Guid.NewGuid()) takes 36.7634 ms (It's recommended by MSDN.)
From the second run onward, both of them take about the same time.
EDIT:
TEST CODE on Intel Core i7 4@2.1GHz, RAM 8 GB DDR3 @1600, HDD SATA 5200 rpm with [Data: www.dropbox.com/s/pbtmh5s9lw285kp/data]
using System;
using System.Runtime;
using System.Diagnostics;
using System.IO;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Threading;
namespace Algorithm
{
    class Program
    {
        public static void Main(string[] args)
        {
            try {
                int i = 0;
                int limit = 10;
                var result = GetTestRandomSort(limit);
                foreach (var element in result) {
                    Console.WriteLine();
                    Console.WriteLine("time {0}: {1} ms", ++i, element);
                }
            } catch (Exception e) {
                Console.WriteLine(e.Message);
            } finally {
                Console.Write("Press any key to continue . . . ");
                Console.ReadKey(true);
            }
        }
        public static IEnumerable<double> GetTestRandomSort(int limit)
        {
            for (int i = 0; i < 5; i++) {
                string path = null, temp = null;
                Stopwatch st = null;
                StreamReader sr = null;
                int? count = null;
                List<string> list = null;
                Random r = null;
                GC.Collect();
                GC.WaitForPendingFinalizers();
                Thread.Sleep(5000);
                st = Stopwatch.StartNew();
                #region Import Input Data
                path = Environment.CurrentDirectory + "\\data";
                list = new List<string>();
                sr = new StreamReader(path);
                count = 0;
                while (count < limit && (temp = sr.ReadLine()) != null) {
                    // Console.WriteLine(temp);
                    list.Add(temp);
                    count++;
                }
                sr.Close();
                #endregion
                // Console.WriteLine("--------------Random--------------");
                // #region Sort by Random with OrderBy(random.Next())
                // r = new Random();
                // list = list.OrderBy(l => r.Next()).ToList();
                // #endregion
                // #region Sort by Random with OrderBy(Guid)
                // list = list.OrderBy(l => Guid.NewGuid()).ToList();
                // #endregion
                // #region Sort by Random with Parallel and OrderBy(random.Next())
                // r = new Random();
                // list = list.AsParallel().OrderBy(l => r.Next()).ToList();
                // #endregion
                // #region Sort by Random with Parallel OrderBy(Guid)
                // list = list.AsParallel().OrderBy(l => Guid.NewGuid()).ToList();
                // #endregion
                // #region Sort by Random with User-Defined Shuffle Method
                // r = new Random();
                // list = list.Shuffle(r).ToList();
                // #endregion
                // #region Sort by Random with Parallel User-Defined Shuffle Method
                // r = new Random();
                // list = list.AsParallel().Shuffle(r).ToList();
                // #endregion
                // Result
                //
                st.Stop();
                yield return st.Elapsed.TotalMilliseconds;
                foreach (var element in list) {
                    Console.WriteLine(element);
                }
            }
        }
    }
}
Result Description: https://www.dropbox.com/s/9dw9wl259dfs04g/ResultDescription.PNG
Result Stat: https://www.dropbox.com/s/ewq5ybtsvesme4d/ResultStat.PNG
Conclusion:
Assumption: LINQ OrderBy(r.Next()) and OrderBy(Guid.NewGuid()) are not worse than the User-Defined Shuffle Method in the First Solution.
Answer: The results contradict this assumption.
