I am doing an exercise from exercism.io, in which I have to generate random names for robots. I am able to get through the bulk of the tests until I hit this one:
[Fact]
public void Robot_names_are_unique()
{
    var names = new HashSet<string>();
    for (int i = 0; i < 10_000; i++) {
        var robot = new Robot();
        Assert.True(names.Add(robot.Name));
    }
}
After some googling around, I stumbled upon a couple of solutions and found out about the Fisher-Yates algorithm. I tried to implement it into my own solution but unfortunately, I haven't been able to pass the final test, and I'm stumped. If anyone could point me in the right direction with this, I'd greatly appreciate it. My code is below:
EDIT: I forgot to mention that the format of the string has to follow this: #"^[A-Z]{2}\d{3}$"
public class Robot
{
    string _name;
    Random r = new Random();
    string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    string nums = "0123456789";

    public Robot()
    {
        _name = letter() + num();
    }

    public string Name
    {
        get { return _name; }
    }

    private string letter() => GetString(2, alpha.ToCharArray(), r);
    private string num() => GetString(3, nums.ToCharArray(), r);

    public void Reset() => _name = letter() + num();

    public string GetString(int length, char[] chars, Random rnd)
    {
        Shuffle(chars, rnd);
        return new string(chars, 0, length);
    }

    public void Shuffle(char[] _alpha, Random r)
    {
        for (int i = _alpha.Length - 1; i > 1; i--)
        {
            int j = r.Next(i);
            char temp = _alpha[i];
            _alpha[i] = _alpha[j];
            _alpha[j] = temp;
        }
    }
}
The first rule of any ID is:
It does not matter how big it is or how many possible values it has - if you just create enough of them, you will eventually get a collision.
To quote Trillian from The Hitchhiker's Guide: "[A collision] is not impossible. Just really, really unlikely."
However, in this case I think the problem is that you are creating Random instances in a loop. This is a classic beginner's mistake when working with Random. You should not create a new Random instance for each Robot instance; you should have one for the whole application that you re-use. Like all pseudorandom number generators, Random is deterministic. Same inputs - same outputs.
As you did not specify a seed value, it will use the system time, which is going to be the same for at least the first 20+ loop iterations. So those instances get the same seed and the same inputs, and therefore produce the same outputs.
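A minimal sketch of that fix, keeping the structure of the Robot class from the question (the shuffle loop bounds are also corrected so every position can be swapped). Note that a shared Random greatly reduces collisions, but as the answers below point out it still cannot guarantee 10,000 unique names on its own:
public class Robot
{
    // One Random for the whole application, shared by every Robot instance.
    static readonly Random r = new Random();

    string _name;
    string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    string nums = "0123456789";

    public Robot() { _name = letter() + num(); }

    public string Name => _name;

    public void Reset() => _name = letter() + num();

    private string letter() => GetString(2, alpha.ToCharArray(), r);
    private string num() => GetString(3, nums.ToCharArray(), r);

    private string GetString(int length, char[] chars, Random rnd)
    {
        // Fisher-Yates shuffle of the candidate characters, then take a prefix.
        for (int i = chars.Length - 1; i > 0; i--)
        {
            int j = rnd.Next(i + 1);
            char temp = chars[i];
            chars[i] = chars[j];
            chars[j] = temp;
        }
        return new string(chars, 0, length);
    }
}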
The easiest solution for unique names is to use GUIDs. In theory it is possible to generate non-unique GUIDs, but the probability is practically zero.
Here is the sample code:
var newUniqueName = Guid.NewGuid().ToString();
Sure, GUIDs do not look pretty, but they are really easy to use.
EDIT: Since I missed the additional requirement for the format, I see that the GUID format is not acceptable.
Here is an easy way to do that too. Since the format is two letters (26^2 possible values) and 3 digits (10^3 possible values), the final number of possible values is 26^2 * 10^3 = 676 * 1000 = 676000. This number is quite small, so Random can be used to generate a random integer in the range 0-675999, and that number can then be converted to the name. Here is the sample code:
var random = new System.Random();
var value = random.Next(676000);
var name = ((char)('A' + (value % 26))).ToString();
value /= 26;
name += (char)('A' + (value % 26));
value /= 26;
name += (char)('0' + (value % 10));
value /= 10;
name += (char)('0' + (value % 10));
value /= 10;
name += (char)('0' + (value % 10));
The usual disclaimer about possible identical names applies here too since we have 676000 possible variants and 10000 required names.
EDIT2: I tried the code above: generating 10000 names using random numbers produced between 9915 and 9950 unique names. That is no good. I would use a simple static class member as a counter instead of the random number generator, as sketched below.
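A minimal sketch of that counter idea, matching the required #"^[A-Z]{2}\d{3}$" format (my own illustration, not code from the answer above):
public class Robot
{
    // Each robot takes the next value of a shared counter, so names cannot repeat
    // until all 676,000 combinations are exhausted.
    private static int counter;

    public string Name { get; }

    public Robot()
    {
        int value = counter++;   // use Interlocked.Increment(ref counter) if threads are involved

        Name = new string(new[]
        {
            (char)('A' + value % 26),
            (char)('A' + value / 26 % 26),
            (char)('0' + value / 676 % 10),
            (char)('0' + value / 6760 % 10),
            (char)('0' + value / 67600 % 10),
        });
    }
}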
First, let's review the test your code is failing against:
10,000 instances created
Must all have distinct names
So somehow, when creating 10000 "random" names, your code produces at least two names that are the same.
Now, let's have a look at the naming scheme you're using:
AB123
Since your shuffle picks the two letters and the three digits without repetition, the maximum number of unique names we could possibly create is 468000 (26 * 25 * 10 * 9 * 8).
This seems like it should not be a problem, because 10000 < 468000 - but this is where the birthday paradox comes in!
From wikipedia:
In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday.
Rewritten for the purposes of your problem, we end up asking:
What's the probability that, in a set of 10000 randomly chosen people, some pair of them will have the same name.
The wikipedia article also lists a function for approximating the number of people required to reach a 50% probability that two people will have the same name:
n ≈ √(2m ln 2)
where m is the total number of possible distinct values. Applying this with m = 468000 gives us ~806 - meaning that after creating only 806 randomly named Robots, there's already a 50% chance of two of them having the same name.
By the time you reach Robot #10000, the chances of not having generated two names that are the same is basically 0.
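A quick way to sanity-check that approximation (a snippet of my own, not part of the original answer):
double m = 468000;
double n = Math.Sqrt(2 * m * Math.Log(2));   // n ≈ 805.6, matching the ~806 above
Console.WriteLine(n);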
As others have noted, you can solve this by using a Guid as the robot name instead.
If you want to retain the naming convention you might also get around this by implementing an LCG with an appropriate period and use that as a less collision-prone "naming generator".
Here's one way you can do it:
Generate the list of all possible names
For each robot, select a name from the list at random
Remove the selected name from the list so it can't be selected again
With this, you don't even need to shuffle. Something like this (note: I stole Optional Option's method of generating names because it's quite clever and I couldn't be bothered to think of my own):
public class Robot
{
    private static List<string> names;
    private static Random rnd = new Random();

    public string Name { get; private set; }

    static Robot()
    {
        Console.WriteLine("Initializing");
        // Generate all 676,000 possible candidates
        names = Enumerable.Range(0, 676000).Select(i =>
        {
            var sb = new StringBuilder(5);
            sb.Append((char)('A' + i % 26));
            i /= 26;
            sb.Append((char)('A' + i % 26));
            i /= 26;
            sb.Append(i % 10);
            i /= 10;
            sb.Append(i % 10);
            i /= 10;
            sb.Append(i % 10);
            return sb.ToString();
        }).ToList();
    }

    public Robot()
    {
        // Note: if this needs to be multithreaded, then you'd need to do some work here
        // to avoid two threads trying to take a name at the same time
        // Also note: you should probably check that names.Count > 0
        // and throw an error if not
        var i = rnd.Next(0, names.Count);   // upper bound is exclusive
        Name = names[i];
        names.RemoveAt(i);
    }
}
Here's a fiddle that generates 20 random names. They are guaranteed to be unique because each name is removed from the list once it has been used.
The point about multithreading is very important, however. If you need to be able to generate robots in parallel, then you'd need to add some code (e.g. locking the critical section) to ensure that only one name is picked and removed from the list of candidates at a time, or else things will get really bad, really quickly. This is why, when people need a random id with a reasonable expectation that it'll be unique, without worrying that some other thread(s) are trying the same thing at the same time, they use GUIDs. The sheer number of possible GUIDs makes collisions very unlikely. But you don't have that luxury with only 676,000 possible values.
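A hedged sketch of that locking idea, applied to the constructor of the Robot class above (the lock object name is my own):
private static readonly object namesLock = new object();

public Robot()
{
    lock (namesLock)
    {
        if (names.Count == 0)
            throw new InvalidOperationException("No names left to assign.");

        var i = rnd.Next(names.Count);
        Name = names[i];
        names.RemoveAt(i);
    }
}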
I'm not good at stats, so I tried to solve a simple problem in C#. The problem: "A given team has a 65% chance to win a single game against another team. What is the probability that they will win a best-of-5 set?"
I wanted to look at the relationship between that probability and the number of games in the set. How does a Bo3 compare to a Bo5, and so on?
I did this by creating Set and Game objects and running iterations. The win decision is done with this code:
Won = rnd.Next(1, 100) <= winChance;
rnd is, as you might expect, a static System.Random object.
Here's my Set object code:
public class Set
{
    public int NumberOfGames { get; private set; }
    public List<Game> Games { get; private set; }

    public Set(int numberOfGames, int winChancePct)
    {
        NumberOfGames = numberOfGames;
        GamesNeededToWin = Convert.ToInt32(Math.Ceiling(NumberOfGames / 2m));
        Games = Enumerable.Range(1, numberOfGames)
            .Select(i => new Game(winChancePct))
            .ToList();
    }

    public int GamesNeededToWin { get; private set; }

    public bool WonSet => Games.Count(g => g.Won) >= GamesNeededToWin;
}
My issue is that the results I get aren't quite what they should be. Someone who sucks less at stats did the math for me, and it seems my code is always overestimating the chance of winning the set, and the number of iterations doesn't improve the accuracy.
The results I get (% set win by games per set) are below. The first column is the games per set, the next is the statistical win rate (which my results should be approaching), and the remaining columns are my results based on the number of iterations. As you can see, more iterations don't seem to be making the numbers more accurate.
Games Per Set | Expected Set Win Rate | 10K   | 100K  | 1M    | 10M
1             | 65.0%                 | 66.0% | 65.6% | 65.7% | 65.7%
3             | 71.8%                 | 72.5% | 72.7% | 72.7% | 72.7%
5             | 76.5%                 | 78.6% | 77.4% | 77.5% | 77.5%
7             | 80.0%                 | 80.7% | 81.2% | 81.0% | 81.1%
9             | 82.8%                 | 84.1% | 83.9% | 83.9% | 83.9%
The entire project is posted on github here if you want to look.
Any insight into why this isn't producing accurate results would be greatly appreciated.
The answer of Darren Sisson is correct; your computation is off by approximately 1%, and so all your results are as well.
My recommendation is that you solve the problem by encapsulating your desired semantics into an object which you can then test independently:
interface IDistribution<T>
{
    T Sample();
}

static class Extensions
{
    public static IEnumerable<T> Samples<T>(this IDistribution<T> d)
    {
        while (true) yield return d.Sample();
    }
}

class Bernoulli : IDistribution<bool>
{
    // Note that we could also make it IDistribution<int> and return
    // 0 and 1 instead of false and true; that would be the more
    // "classic" approach to a Bernoulli distribution. Your choice.
    private double d;
    private Random random = new Random();
    private Bernoulli(double d) { this.d = d; }
    public static Bernoulli Make(double d) => new Bernoulli(d);
    public bool Sample() => random.NextDouble() < d;
}
And now you have a biased coin flipper which you can test independently. You can now write code like:
int flips = 1000;
int heads = Bernoulli
    .Make(0.65)
    .Samples()
    .Take(flips)
    .Where(x => x)
    .Count();
to do 1000 coin flips with a 65% chance of heads.
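For instance, the same building blocks can estimate the set-win probability from the question. This is my own sketch using only the types defined above (plus System.Linq), not part of the original answer:
// Estimate the probability of winning a best-of-5 set with a 65% game win chance.
int sets = 1_000_000;
var game = Bernoulli.Make(0.65);
int setsWon = Enumerable.Range(0, sets)
    .Count(_ => game.Samples().Take(5).Count(won => won) >= 3);
Console.WriteLine((double)setsWon / sets);   // should come out near 0.765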
Notice that what we are doing here is constructing a probability distribution monad and then using the tools of LINQ to express a conditional probability. This is a powerful technique; your application barely scratches the surface of what we can do with it.
Exercise: construct extension methods Where, Select and SelectMany which take not IEnumerable<T> but rather IDistribution<T>; can you express the semantics of the distribution in terms of the distribution type itself, rather than making a transformation from the distribution monad to the sequence monad? Can you do the same for zip joins?
Exercise: construct other implementations of IDistribution<T>. Can you construct, say, a Cauchy distribution of doubles? What about a normal distribution? What about a dice-rolling distribution on a fair die of n sides? Now, can you put this all together? What is the distribution which is: flip a coin; if heads, roll four dice and add them together, otherwise roll two dice and discard all the doubles, and multiply the results.
Quick look: Random.Next's upper bound is exclusive, so it would need to be set to 101.
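Applied to the line from the question, the fix would look something like this (a sketch of the point above, not the answerer's code):
Won = rnd.Next(1, 101) <= winChance;   // Next(1, 101) returns 1..100, so a 65% chance really is 65%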
var list = new List<int>() { 1, 17, 18, 21, 30 };
Random rnd = new Random(DateTime.Now.Second);
int r;
do
{
    r = rnd.Next(1, 30);
}
while (list.Contains(r));
But I think that's a stupid solution; can anyone give me a more optimized approach?
Even better if there is a way to prevent the Random instance from returning a number that it has already returned.
In case anyone wonders why I need this: it's the first step in shuffling 3 byte arrays and combining them into one byte array, while producing 3 byte arrays that hold the indices' original order as it was in the original arrays.
Yes, one thing that makes it much more efficient is to use a HashSet<int> instead of a List<int>; lookups in a HashSet are MUCH faster than in a List (though the cost of the constructor will be slightly higher for a HashSet).
Also, if the input list is always the same numbers, move it out of the function to avoid paying the cost of building the HashSet every time.
Since ordering does not matter here, in my personal experience (please test and profile for your own situation), after about 14 items in the list it becomes faster to convert the list to a HashSet and do the lookup there than to do the lookup in the list itself.
var list = new List<int>() { 1, 17, 18, 21, 30 };
Random rnd = new Random(DateTime.Now.Second);
int r;
// In this example, with 5 items in the list, the HashSet will be slower due to the cost
// of creating it, but if we knew that the 5 items were fixed I would build it
// outside of the function so I would only have to pay the cost once per program
// start-up, and it would be considered faster again due to the amortized start-up cost.
var checkHashSet = new HashSet<int>(list);
do
{
    r = rnd.Next(1, 30);
}
while (checkHashSet.Contains(r));
You're right that looping isn't particularly efficient. You can use some handy extensions to select a number if you consider the constraint of the list of valid numbers, as opposed to the list of invalid ones.
So you have your list of invalid numbers:
var list = new List<int>(){1,17,18,21,30};
Which means that your list of valid numbers is the range from 1-30 except for these. Something like:
var validList = Enumerable.Range(1, 30).Except(list);
So we can use these extensions from the linked answer:
// Extension methods must live in a static class; the class name here is arbitrary.
public static class EnumerableExtensions
{
    public static T RandomElement<T>(this IEnumerable<T> enumerable)
    {
        return enumerable.RandomElementUsing(new Random());
    }

    public static T RandomElementUsing<T>(this IEnumerable<T> enumerable, Random rand)
    {
        int index = rand.Next(0, enumerable.Count());
        return enumerable.ElementAt(index);
    }
}
And select a random element from the list of known valid numbers:
var kindOfRandomNumber = Enumerable.Range(1, 30).Except(list).RandomElement();
I am writing a cards game, and need to draw random cards from pile. I am using Random and Random.Next(..) for that.
Now I want to debug my application, and want to be able to reproduce certain scenarios by using the same Random sequence.
Can anyone help...? I couldn't find an answer over several searches.
Thanks.
Use the overload of the Random constructor which accepts a seed value
static Random r = new Random(0);
This will produce the same sequence of pseudorandom numbers across every execution.
You will need to seed your random number generator. Assuming you're using System.Random, use
Random r = new Random(<some integer>);
which seeds the sequence with <some integer>.
But an important note here: you will need to research your random number generator carefully. Else it will be possible to decipher your sequence which will make playing your game unintentionally profitable for an astute user. I doubt you'll be using Random once you pass to production. (Technically it's possible to decipher a linear congruential sequence - which is what C# uses - in a little over three drawings.)
Create a System.Random instance with the same seed:
Random random = new System.Random(1337);
Use the same seed in the Random constructor. That guarantees that Next() will give the same results. For example:
Random randomGen1 = new Random(5);
Random randomGen2 = new Random(5);
int r1 = randomGen1.Next();
int r2 = randomGen2.Next();
if (r1 == r2)
{
    Console.WriteLine("Great success!!");
}
Possibly a bit of overkill for this case (using a seed will give you repeatability), but a good pattern for dealing with dependencies over which you do not have full control is to wrap the dependency and access it through an interface. This will allow you to swap in a version that gives specified behaviour for unit testing/debugging.
e.g.
public interface IRandom {
    int Get();
}

public class DefaultRandom : IRandom {
    private readonly Random random;

    public DefaultRandom() {
        random = new Random();
    }

    public int Get() {
        return random.Next();
    }
}

public class TestRandom : IRandom {
    private readonly List<int> numbers;
    private int top;

    public TestRandom() {
        numbers = new List<int> {1, 1, 2, 3, 5, 8, 13, 21};
        top = 0;
    }

    public int Get() {
        return (top >= numbers.Count)
            ? 0
            : numbers[top++];
    }
}
One of the nice aspects of this method is that you can use specific values (say, to test edge cases) which might be hard to generate using a fixed seed value.
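For example, a consumer class might take the interface in its constructor (the Dealer class below is my own illustration, not part of the answer):
public class Dealer
{
    private readonly IRandom random;

    public Dealer(IRandom random)
    {
        this.random = random;
    }

    // Reduce the raw value into the range of valid card indices.
    public int DrawIndex(int deckSize)
    {
        return random.Get() % deckSize;
    }
}

// In production:  new Dealer(new DefaultRandom())
// In a unit test: new Dealer(new TestRandom())  -- draws follow the fixed list 1, 1, 2, 3, 5, ...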
I have read an article about various shuffle algorithms over at Coding Horror. I have seen that somewhere people have done this to shuffle a list:
var r = new Random();
var shuffled = ordered.OrderBy(x => r.Next());
Is this a good shuffle algorithm? How does it work exactly? Is it an acceptable way of doing this?
It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
I prefer Durstenfeld's variant of the Fisher-Yates shuffle which swaps elements.
Implementing a simple Shuffle extension method would basically consist of calling ToList or ToArray on the input then using an existing implementation of Fisher-Yates. (Pass in the Random as a parameter to make life generally nicer.) There are plenty of implementations around... I've probably got one in an answer somewhere.
The nice thing about such an extension method is that it would then be very clear to the reader what you're actually trying to do.
EDIT: Here's a simple implementation (no error checking!):
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    // Note i > 0 to avoid final pointless iteration
    for (int i = elements.Length - 1; i > 0; i--)
    {
        // Swap element "i" with a random earlier element (or itself)
        int swapIndex = rng.Next(i + 1);
        T tmp = elements[i];
        elements[i] = elements[swapIndex];
        elements[swapIndex] = tmp;
    }
    // Lazily yield (avoiding aliasing issues etc)
    foreach (T element in elements)
    {
        yield return element;
    }
}
EDIT: Comments on performance below reminded me that we can actually return the elements as we shuffle them:
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    for (int i = elements.Length - 1; i >= 0; i--)
    {
        // Swap element "i" with a random earlier element (or itself)
        // ... except we don't really need to swap it fully, as we can
        // return it immediately, and afterwards it's irrelevant.
        int swapIndex = rng.Next(i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
    }
}
This will now only do as much work as it needs to.
Note that in both cases, you need to be careful about the instance of Random you use as:
Creating two instances of Random at roughly the same time will yield the same sequence of random numbers (when used in the same way)
Random isn't thread-safe.
I have an article on Random which goes into more detail on these issues and provides solutions.
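One common pattern for both issues is a per-thread Random seeded from a single, locked seed source. This is a general sketch of my own (assuming using System and System.Threading), not the code from that article:
public static class ThreadSafeRandom
{
    private static readonly Random seedSource = new Random();

    private static readonly ThreadLocal<Random> local = new ThreadLocal<Random>(() =>
    {
        int seed;
        lock (seedSource)
        {
            seed = seedSource.Next();
        }
        return new Random(seed);   // each thread gets its own, differently seeded instance
    });

    public static Random Instance => local.Value;
}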
This is based on Jon Skeet's answer.
In that answer, the array is shuffled, then returned using yield. The net result is that the array is kept in memory for the duration of foreach, as well as objects necessary for iteration, and yet the cost is all at the beginning - the yield is basically an empty loop.
This algorithm is used a lot in games, where the first three items are picked, and the others will only be needed later if at all. My suggestion is to yield the numbers as soon as they are swapped. This will reduce the start-up cost, while keeping the iteration cost at O(1) (basically 5 operations per iteration). The total cost would remain the same, but the shuffling itself would be quicker. In cases where this is called as collection.Shuffle().ToArray() it will theoretically make no difference, but in the aforementioned use cases it will speed start-up. Also, this would make the algorithm useful for cases where you only need a few unique items. For example, if you need to pull out three cards from a deck of 52, you can call deck.Shuffle().Take(3) and only three swaps will take place (although the entire array would have to be copied first).
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    // Note i > 0 to avoid final pointless iteration
    for (int i = elements.Length - 1; i > 0; i--)
    {
        // Swap element "i" with a random earlier element (or itself)
        int swapIndex = rng.Next(i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
        // we don't actually perform the full swap; we can forget about the
        // swapped element because we have already returned it.
    }
    // there is one item remaining that was not returned - we return it now
    yield return elements[0];
}
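A usage sketch of the early-exit benefit described above (deck and rng are assumed to exist; only three swaps are performed before the first three cards come out):
var rng = new Random();
var hand = deck.Shuffle(rng).Take(3).ToList();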
Starting from this quote of Skeet:
It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
I'll expand a little on the reason for the "hopefully unique!".
Now, from the Enumerable.OrderBy:
This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved
This is very important! What happens if two elements "receive" the same random number? They remain in the same order they were in in the array. Now, what is the probability of this happening? It is difficult to calculate exactly, but the Birthday Problem is exactly this problem.
Now, is it real? Is it true?
As always, when in doubt, write some lines of program: http://pastebin.com/5CDnUxPG
This little block of code shuffles an array of 3 elements a certain number of times using: the Fisher-Yates algorithm done backward; the Fisher-Yates algorithm done forward (the wiki page gives two pseudo-code algorithms that produce equivalent results, one going from first to last element and the other from last to first); the naive wrong algorithm from http://blog.codinghorror.com/the-danger-of-naivete/; the .OrderBy(x => r.Next()) approach; and the .OrderBy(x => r.Next(someValue)) approach.
Now, Random.Next is
A 32-bit signed integer that is greater than or equal to 0 and less than MaxValue.
so it's equivalent to
OrderBy(x => r.Next(int.MaxValue))
To test if this problem exists, we could enlarge the array (something very slow) or simply reduce the maximum value of the random number generator (int.MaxValue isn't a "special" number... It is simply a very big number). In the end, if the algorithm isn't biased by the stableness of the OrderBy, then any range of values should give the same result.
The program then tests some values, in the range 1...4096. Looking at the result, it's quite clear that for low values (< 128), the algorithm is very biased (4-8%). With 3 values you need at least r.Next(1024). If you make the array bigger (4 or 5), then even r.Next(1024) isn't enough. I'm not an expert in shuffling and in math, but I think that for each extra bit of length of the array, you need 2 extra bits of maximum value (because the birthday paradox is connected to sqrt(numvalues)), so that if the maximum value is 2^31, I'd say that you should be able to sort arrays of up to 2^12-2^13 (4096-8192) elements.
It's probably OK for most purposes, and almost always it generates a truly random distribution (except when Random.Next() produces two identical random integers).
It works by assigning each element of the series a random integer, then ordering the sequence by these integers.
It's totally acceptable for 99.9% of applications (unless you absolutely need to handle the edge case above). Also, Skeet's objection to its runtime is valid, so if you're shuffling a long list you might not want to use it.
This has come up many times before. Search for Fisher-Yates on StackOverflow.
Here is a C# code sample I wrote for this algorithm. You can parameterize it on some other type, if you prefer.
static public class FisherYates
{
    // Based on Java code from wikipedia:
    // http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
    static public void Shuffle(int[] deck)
    {
        Random r = new Random();
        for (int n = deck.Length - 1; n > 0; --n)
        {
            int k = r.Next(n + 1);
            int temp = deck[n];
            deck[n] = deck[k];
            deck[k] = temp;
        }
    }
}
It seems like a good shuffling algorithm if you're not too worried about performance. The only problem I'd point out is that its behavior is not controllable, so you may have a hard time testing it.
One possible option is having a seed to be passed as a parameter to the random number generator (or the random generator as a parameter), so you can have more control and test it more easily.
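For example, combined with the Shuffle extension method shown in other answers here, a fixed seed makes the result repeatable (a sketch with an illustrative seed and collection name):
var rng = new Random(12345);                  // fixed seed => same order on every run
var shuffled = items.Shuffle(rng).ToList();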
I found Jon Skeet's answer to be entirely satisfactory, but my client's robo-scanner will report any instance of Random as a security flaw. So I swapped it out for System.Security.Cryptography.RNGCryptoServiceProvider. As a bonus, it fixes that thread-safety issue that was mentioned. On the other hand, RNGCryptoServiceProvider has been measured as 300x slower than using Random.
Usage:
using (var rng = new RNGCryptoServiceProvider())
{
    var data = new byte[4];
    yourCollection = yourCollection.Shuffle(rng, data);
}
Method:
/// <summary>
/// Shuffles the elements of a sequence randomly.
/// </summary>
/// <param name="source">A sequence of values to shuffle.</param>
/// <param name="rng">An instance of a random number generator.</param>
/// <param name="data">A placeholder to generate random bytes into.</param>
/// <returns>A sequence whose elements are shuffled randomly.</returns>
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, RNGCryptoServiceProvider rng, byte[] data)
{
    var elements = source.ToArray();
    for (int i = elements.Length - 1; i >= 0; i--)
    {
        rng.GetBytes(data);
        var swapIndex = BitConverter.ToUInt32(data, 0) % (i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
    }
}
Looking for an algorithm? You can use my ShuffleList class:
class ShuffleList<T> : List<T>
{
    public void Shuffle()
    {
        Random random = new Random();
        for (int count = Count; count > 0; count--)
        {
            int i = random.Next(count);
            Add(this[i]);
            RemoveAt(i);
        }
    }
}
Then, use it like this:
ShuffleList<int> list = new ShuffleList<int>();
// Add elements to your list.
list.Shuffle();
How does it work?
Let's take an initial sorted list of the 5 first integers: { 0, 1, 2, 3, 4 }.
The method starts by counting the number of elements; call it count. Then, with count decreasing at each step, it picks a random index between 0 and count and moves the element at that index to the end of the list.
In the following step-by-step example, each line shows the current list, the items that could still be moved, and the item selected to be moved to the end:
{ 0 1 2 3 4 } - candidates: 0 1 2 3 4, selected: 3
{ 0 1 2 4 3 } - candidates: 0 1 2 4, selected: 0
{ 1 2 4 3 0 } - candidates: 1 2 4, selected: 4
{ 1 2 3 0 4 } - candidates: 1 2, selected: 1
{ 2 3 0 4 1 } - candidates: 2, selected: 2
{ 3 0 4 1 2 } - final shuffled list
This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values. Think of it as adding a new column to an in-memory table, then filling it with GUIDs, then sorting by that column. Looks like an efficient way to me (especially with the lambda sugar!)
Slightly unrelated, but here is an interesting method (that, even though it is really excessive, has REALLY been implemented) for truly random generation of dice rolls!
Dice-O-Matic
The reason I'm posting this here is that he makes some interesting points about how his users reacted to the idea of using algorithms to shuffle instead of actual dice. Of course, in the real world, such a solution is only for the really extreme ends of the spectrum where randomness has such a big impact, and perhaps the impact affects money ;).
I would say that many answers here, like "This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values", might be very wrong!
I'd think that this DOES NOT assign a random value to each element of the source collection. Instead, a sort algorithm like Quicksort might be running, which would call a compare function approximately n log n times. Some sort algorithms really expect this compare function to be stable and to always return the same result!
Couldn't it be that the IEnumerableSorter calls the compare function for each step of, e.g., quicksort, and each time calls the function x => r.Next() for both parameters without caching the results?
In that case you might really mess up the sort algorithm and make it much worse than the expectations the algorithm is built upon. Of course, it will eventually become stable and return something.
I might check it later by putting debugging output inside a new "Next" function to see what happens.
In Reflector I could not immediately find out how it works.
It is worth noting that due to the deferred execution of LINQ, using a random number generator instance with OrderBy() can result in a possibly unexpected behavior: The sorting does not happen until the collection is read. This means each time you read or enumerate the collection, the order changes. One would possibly expect the elements to be shuffled once and then to retain the order each time it is accessed thereafter.
Random random = new();
var shuffled = ordered.OrderBy(x => random.Next());
The code above passes a lambda function x => random.Next() as a parameter to OrderBy(). This will capture the instance referred to by random and save it with the lambda so that it can call Next() on this instance to perform the ordering later, which happens right before the collection is enumerated (when the first element is requested from it).
The problem here is that, since this execution is saved for later, the ordering happens each time just before the collection is enumerated, using new numbers obtained by calling Next() on the same random instance.
Example
To demonstrate this behavior, I have used Visual Studio's C# Interactive Shell:
> List<int> list = new() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
> Random random = new();
> var shuffled = list.OrderBy(element => random.Next());
> shuffled.ToList()
List<int>(10) { 5, 9, 10, 4, 6, 2, 8, 3, 1, 7 }
> shuffled.ToList()
List<int>(10) { 8, 2, 9, 1, 3, 6, 5, 10, 4, 7 } // Different order
> shuffled.ElementAt(0)
9 // First element is 9
> shuffled.ElementAt(0)
7 // First element is now 7
>
This behavior can even be seen in action by placing a breakpoint just after where the IOrderedEnumerable is created when using Visual Studio's debugger: each time you hover on the variable, the elements show up in a different order.
This, of course, does not apply if you immediately enumerate the elements by calling ToList() or an equivalent. However, this behavior can lead to bugs in many cases, one of them being when the shuffled collection is expected to contain a unique element at each index.
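A sketch of the simplest way around this, following the example above: materialise the shuffled sequence once and reuse it.
List<int> shuffledOnce = list.OrderBy(element => random.Next()).ToList();
Console.WriteLine(shuffledOnce[0]);   // same element every time it is read
Console.WriteLine(shuffledOnce[0]);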
Start-up time, measured with all threads and caches cleared before every new test.
First, the unsuccessful code. It runs in LINQPad if you want to follow along and test this code.
Stopwatch st = new Stopwatch();
st.Start();
var r = new Random();
List<string[]> list = new List<string[]>();
list.Add(new String[] {"1","X"});
list.Add(new String[] {"2","A"});
list.Add(new String[] {"3","B"});
list.Add(new String[] {"4","C"});
list.Add(new String[] {"5","D"});
list.Add(new String[] {"6","E"});
//list.OrderBy (l => r.Next()).Dump();
list.OrderBy (l => Guid.NewGuid()).Dump();
st.Stop();
Console.WriteLine(st.Elapsed.TotalMilliseconds);
list.OrderBy(x => r.Next()) uses 38.6528 ms
list.OrderBy(x => Guid.NewGuid()) uses 36.7634 ms (It's recommended from MSDN.)
the after second time both of them use in the same time.
EDIT:
TEST CODE on Intel Core i7 4#2.1GHz, Ram 8 GB DDR3 #1600, HDD SATA 5200 rpm with [Data: www.dropbox.com/s/pbtmh5s9lw285kp/data]
using System;
using System.Runtime;
using System.Diagnostics;
using System.IO;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Threading;

namespace Algorithm
{
    class Program
    {
        public static void Main(string[] args)
        {
            try {
                int i = 0;
                int limit = 10;
                var result = GetTestRandomSort(limit);
                foreach (var element in result) {
                    Console.WriteLine();
                    Console.WriteLine("time {0}: {1} ms", ++i, element);
                }
            } catch (Exception e) {
                Console.WriteLine(e.Message);
            } finally {
                Console.Write("Press any key to continue . . . ");
                Console.ReadKey(true);
            }
        }

        public static IEnumerable<double> GetTestRandomSort(int limit)
        {
            for (int i = 0; i < 5; i++) {
                string path = null, temp = null;
                Stopwatch st = null;
                StreamReader sr = null;
                int? count = null;
                List<string> list = null;
                Random r = null;

                GC.Collect();
                GC.WaitForPendingFinalizers();
                Thread.Sleep(5000);

                st = Stopwatch.StartNew();

                #region Import Input Data
                path = Environment.CurrentDirectory + "\\data";
                list = new List<string>();
                sr = new StreamReader(path);
                count = 0;
                while (count < limit && (temp = sr.ReadLine()) != null) {
                    // Console.WriteLine(temp);
                    list.Add(temp);
                    count++;
                }
                sr.Close();
                #endregion

                // Console.WriteLine("--------------Random--------------");

                // #region Sort by Random with OrderBy(random.Next())
                // r = new Random();
                // list = list.OrderBy(l => r.Next()).ToList();
                // #endregion

                // #region Sort by Random with OrderBy(Guid)
                // list = list.OrderBy(l => Guid.NewGuid()).ToList();
                // #endregion

                // #region Sort by Random with Parallel and OrderBy(random.Next())
                // r = new Random();
                // list = list.AsParallel().OrderBy(l => r.Next()).ToList();
                // #endregion

                // #region Sort by Random with Parallel OrderBy(Guid)
                // list = list.AsParallel().OrderBy(l => Guid.NewGuid()).ToList();
                // #endregion

                // #region Sort by Random with User-Defined Shuffle Method
                // r = new Random();
                // list = list.Shuffle(r).ToList();
                // #endregion

                // #region Sort by Random with Parallel User-Defined Shuffle Method
                // r = new Random();
                // list = list.AsParallel().Shuffle(r).ToList();
                // #endregion

                // Result
                //
                st.Stop();
                yield return st.Elapsed.TotalMilliseconds;
                foreach (var element in list) {
                    Console.WriteLine(element);
                }
            }
        }
    }
}
Result Description: https://www.dropbox.com/s/9dw9wl259dfs04g/ResultDescription.PNG
Result Stat: https://www.dropbox.com/s/ewq5ybtsvesme4d/ResultStat.PNG
Conclusion:
Assumption: LINQ OrderBy(r.Next()) and OrderBy(Guid.NewGuid()) are no worse than the user-defined Shuffle method in the first solution.
Answer: the results contradict that assumption.