Random number distribution is uneven / non uniform - c#

I have noticed a strange issue with the random number generation in c#, it looks like sets (patterns) are repeated a lot more often than you would expect.
I'm writing a mechanism that generates activation codes, a series of 7 numbers (range 0-29).
Doing the math, there should be 30^7 (22billion) possible combinations of activation codes. Based on this it should be extremely unlikely to get a duplicate activation code before the 1 billionth code is generated. However running my test, I start getting duplicate codes after about 60,000 iteration, which is very surprising. I have also tried using RNGCryptoServiceProvider with similar results, I get duplicates at about 100,000 iterations.
I would really like to know if this is a bug/limitation of the random number generation in .Net or if I'm doing something wrong.
The following code is a test to validate the uniqueness of the generated codes:
static void Main(string[] args)
{
Random rand = new Random();
RandomActivationCode(rand, true);
Console.Out.WriteLine("Press enter");
Console.ReadLine();
}
static void RandomActivationCode(Random randomGenerator)
{
var maxItems = 11000000;
var list = new List<string>(maxItems);
var activationCodes = new HashSet<string>(list);
activationCodes.Clear();
DateTime start = DateTime.Now;
for (int i = 0; i < maxItems; ++i)
{
string activationCode = "";
for (int j = 0; j < 7; ++j)
{
activationCode += randomGenerator.Next(0,30) + "-";
}
if (activationCodes.Contains(activationCode))
{
Console.Out.WriteLine("Code: " + activationCode);
Console.Out.WriteLine("Duplicate at iteration: " + i.ToString("##,#"));
Console.Out.WriteLine("Press enter");
Console.ReadLine();
Console.Out.WriteLine();
Console.Out.WriteLine();
}
else
{
activationCodes.Add(activationCode);
}
if (i % 100000 == 0)
{
Console.Out.WriteLine("Iteration: " + i.ToString("##,#"));
Console.Out.WriteLine("Time elapsed: " + (DateTime.Now - start));
}
}
}
My workaround is to use 10 number activation codes, which means that the test runs without any duplicate values being generated. The test runs up to 11 million iterations (after which point it runs out of memory).

This is not at all surprising; this is exactly what you should expect. Your belief that it should take a long time to generate duplicates when the space of possibilities is large is simply false, so stop believing that. Start believing the truth: that if there are n possible codes then you should start getting duplicates at about the square root of n codes generated, which is about 150 thousand if n is 22 billion.
Think about it this way: by the time you have generated root-n codes, most of them have had roughly a root-n-in-n chance to have a collision. Multiply root-n by roughly root-n-in-n, and you get... roughly 100% chance of collision.
That is of course not a rigorous argument, but it should give you the right intution, to replace your faulty belief. If that argument is unconvincing then you might want to read my article on the subject:
http://blogs.msdn.com/b/ericlippert/archive/2010/03/22/socks-birthdays-and-hash-collisions.aspx
If you want to generate a unique code then generate a GUID; that's what they're for. Note that a GUID is not guaranteed to be random, it is only guaranteed to be unique.
Another choice for generating random seeming codes that are not actually random at all, but are unique, is to generate the numbers 1, 2, 3, 4, ... as many as you want, and then use the multiplicative inverse technique to make a random-looking unique encoding of those numbers. See http://ericlippert.com/2013/11/14/a-practical-use-of-multiplicative-inverses/ for details.

Related

What is stopping this program from having an output

I made this code to brute force anagrams by printing all possible permutables of the characters in a string, however there is no output. I do not need simplifications as I am a student still learning the basics, I just need help getting it to work this way so I can better understand it.
using System;
using System.Collections.Generic;
namespace anagramSolver
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Enter anagram:");
string anagram = Console.ReadLine();
string temp = "";
List<string> storeStr = new List<string>(0);
List<int> storeInt = new List<int>(0);
//creates factorial number for anagram
int factorial = 1;
for (int i = anagram.Length; i > 0; i--)
{
factorial *= i;
}
while (storeStr.Count != factorial)
{
Random rnd = new Random();
while(temp.Length != anagram.Length)
{
int num = rnd.Next(anagram.Length - 1);
if (storeInt.Contains(num) == true)
{
}
else
{
storeInt.Add(num);
temp += anagram[num];
}
}
if (storeStr.Contains(temp) == true)
{
temp = "";
}
else
{
storeStr.Add(temp);
Console.WriteLine(temp, storeStr.Count);
temp = "";
}
}
}
}
}
edit: added temp reset after it is deemed contained by storeStr
Two main issues causing infinite loop:
1)
As per Random.Next documentation, the parameter is the "exclusive" upper bound. This means, if you want a random number between 0 and anagram.Length - 1 included, you should use rnd.Next(anagram.Length);.
With rnd.Next(anagram.Length - 1), you'll never hit anagram.Length - 1.
2)
Even if you solve 1, only the first main iteration goes well.
storeInt is never reset. Meaning, after the first main iteration, it will have already all the numbers in it.
So, during the second iteration, you will always hit the case if (storeInt.Contains(num) == true), which does nothing and the inner loop will go on forever.
Several issues here...
The expression rnd.Next(anagram.Length - 1) generates a value between 0 and anagram.Length - 2. For an input with two characters it will always return 0, so you'll never actually generate a full string. Remove the - 1 from it and you'll be fine.
Next, you're using a list to keep track of the character indices you've used already, but you never clear the list. You'll get one output (eventually, when the random number generator covers all the values) and then enter an infinite loop on the next generation pass. Clear storeInt after the generation loop to fix this.
While not a true infinite loop, creating a new instance of the random number generator each time will give you a lot of duplication. new Random() uses the current time as a seed value and you could potentially get through a ton of loops with each seed, generating exactly the same values until the time changes enough to change your random sequence. Create the random number generator once before you start your main loop.
And finally, your code doesn't handle repeated input letters. If the input is "aa" then there is only a single distinct output, but your code won't stop until it gets two. If the input was "aab" there are three distinct permutations ("aab", "aba", "baa") which is half of the expected results. You'll never reach your exit condition in this case. You could do it by keeping track of the indices you've used instead of the generated strings, it just makes it a bit more complex.
There are a few ways to generate permutations that are less error-prone. In general you should try to avoid "keep generating random garbage until I find the result I'm looking for" solutions. Think about how you personally would go about writing down the full list of permutations for 3, 4 or 5 inputs. What steps would you take? Think about how that would work in a computer program.
this loop
while(temp.Length != anagram.Length)
{
int num = rnd.Next(anagram.Length - 1);
if (storeInt.Contains(num) == true)
{
}
else
{
storeInt.Add(num);
temp += anagram[num];
}
}
gets stuck.
once the number is in storeInt you never change storeStr, yet thats what you are testing for loop exit

Why is the pseudo-random number generator less likely to generate 54 big numbers in a row?

Consider an event that occurs with probability p. This program checks how many failed trials it takes before the event occurs and keeps a histogram of the totals. e.g.: If p were 0.5, then this would be like asking how many times in a row a coin comes up tails before it comes up heads? With smaller values of p, we would expect many failures before we get a success.
The implementation being tested is essentially: while (!(rand.NextDouble() < p)) count++;
Here is a plot of the histogram of outcomes for count.
Immediately obvious is the irregularity at x=54. For some reason, it is approximately half as likely as it should be for a series of exactly 54 random numbers greater than or equal to p to be generated in a row.
The actual p I'm checking in this test is 1/32. (Doesn't really matter, as long as it's small enough to get some measurable number of 54's as an outcome.) And I'm counting 10000000 total successes. (Also doesn't seem to matter.) It also doesn't matter what random seed I use.
Obviously this is a quirk of the pseudo-random number generator being used by the Random.NextDouble function in .NET. But I'd like to know why does this otherwise uniform data have such a striking single spike in such an oddly specific and consistent place? What is it about this particular pseudo-random number generator that makes generating exactly 54 large numbers in a row followed by a small number half as likely as any other sequence length?
I would have expected many more non-uniform anomalies as it degenerates, not just this one spike.
Here is the code that generated this data set:
using System;
namespace RandomTest
{
class Program
{
static void Main(string[] args)
{
Random rand = new Random(1);
int numTrials = 10000000;
int[] hist = new int[512];
double p = 1.0 / 32.0;
for (int i = 0; i < numTrials; ++i) {
int count = 0;
while (!(rand.NextDouble() < p)) {
count++;
}
if (count > hist.Length - 1) {
count = hist.Length - 1;
}
hist[count]++;
}
for (int i = 0; i < hist.Length; ++i) {
Console.WriteLine("{0},{1}", i, hist[i]);
}
}
}
}
In case it's relevant, this is .Net Framework 4.7.2 on Windows x86.
I ran your code on framework 4.8 and I got point 28 to be the outlier
plot of generated data
Then I ran it again without any changes, and 58 was the outlier
2nd plot of data
My Guess as to the cause of the issue you perceive is that the random generator is random.
Running the code yields different results each time, and it appears to be random where the outlier is.
Since we know that the outlier is random, we can conclude that it is not a fault in a specific line of code. Because of this we can assume that the random outlier could be caused by mere chance that the generator picked a number significantly less than others. The randomness of randomness.

Random string collision after using Fisher-Yates algorithm (C#)

I am doing an exercise from exercism.io, in which I have to generate random names for robots. I am able to get through a bulk of the tests until I hit this test:
[Fact]
public void Robot_names_are_unique()
{
var names = new HashSet<string>();
for (int i = 0; i < 10_000; i++) {
var robot = new Robot();
Assert.True(names.Add(robot.Name));
}
}
After some googling around, I stumbled upon a couple of solutions and found out about the Fisher-Yates algorithm. I tried to implement it into my own solution but unfortunately, I haven't been able to pass the final test, and I'm stumped. If anyone could point me in the right direction with this, I'd greatly appreciate it. My code is below:
EDIT: I forgot to mention that the format of the string has to follow this: #"^[A-Z]{2}\d{3}$"
public class Robot
{
string _name;
Random r = new Random();
string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
string nums = "0123456789";
public Robot()
{
_name = letter() + num();
}
public string Name
{
get { return _name; }
}
private string letter() => GetString(2 ,alpha.ToCharArray(), r);
private string num() => GetString(3, nums.ToCharArray(), r);
public void Reset() => _name = letter() + num();
public string GetString(int length,char[] chars, Random rnd)
{
Shuffle(chars, rnd);
return new string(chars, 0, length);
}
public void Shuffle(char[] _alpha, Random r)
{
for(int i = _alpha.Length - 1; i > 1; i--)
{
int j = r.Next(i);
char temp = _alpha[i];
_alpha[i] = _alpha[j];
_alpha[j] = temp;
}
}
}
The first rule of any ID is:
It does not mater how big it is, how many possible value it has - if you just create enough of them, you will get a colission eventually.
To Quote Trillian from the Hithchikers Guide: "[A colission] is not impossible. Just realy, really unlikely."
However in this case, I think it is you creating Random Instances in a Loop. This is a classical beginners mistake when workign with Random. You should not create a new random isntance for each Robot Instance, you should have one for the application that you re-use. Like all Pseudorandom Number Generators, Random is deterministic. Same inputs - same outputs.
As you did not specify a seed value, it will use the time in milliseconds. Wich is going to the same between the first 20+ loop itterations at last. So it is going to have the same seed and the same inputs, so the same outputs.
The easiest solution for unique names is to use GUIDs. In theory, it is possible to generate non-unique GUIDs but it is pretty close to zero.
Here is the sample code:
var newUniqueName = Guid.NewGuid().ToString();
Sure GUIDs do not look pretty but they are really easy to use.
EDIT: Since the I missed the additional requirement for the format I see that GUID format is not acceptable.
Here is an easy way to do that too. Since format is two letters (26^2 possibile values) and 3 digits (10^3 possible values) the final number of possible values is 26^2 * 10^3 = 676 * 1000 = 676000. This number is quite small so Random can be used to generate the random integer in the range 0-675999 and then that number can be converted to the name. Here is the sample code:
var random = new System.Random();
var value = random.Next(676000);
var name = ((char)('A' + (value % 26))).ToString();
value /= 26;
name += (char)('A' + (value % 26));
value /= 26;
name += (char)('0' + (value % 10));
value /= 10;
name += (char)('0' + (value % 10));
value /= 10;
name += (char)('0' + (value % 10));
The usual disclaimer about possible identical names applies here too since we have 676000 possible variants and 10000 required names.
EDIT2: Tried the code above and generating 10000 names using random numbers produced between 9915 and 9950 unique names. That is no good. I would use a simple static in class member as a counter instead of random number generator.
First, let's review the test you're code is failing against:
10.000 instances created
Must all have distinct names
So somehow, when creating 10000 "random" names, your code produces at least two names that are the same.
Now, let's have a look at the naming scheme you're using:
AB123
The maximum number of unique names we could possibly create is 468000 (26 * 25 * 10 * 9 * 8).
This seems like it should not be a problem, because 10000 < 468000 - but this is where the birthday paradox comes in!
From wikipedia:
In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday.
Rewritten for the purposes of your problem, we end up asking:
What's the probability that, in a set of 10000 randomly chosen people, some pair of them will have the same name.
The wikipedia article also lists a function for approximating the number of people required to reach a 50% propbability that two people will have the same name:
where m is the total number of possible distinct values. Applying this with m=468000 gives us ~806 - meaning that after creating only 806 randomly named Robots, there's already a 50% chance of two of them having the same name.
By the time you reach Robot #10000, the chances of not having generated two names that are the same is basically 0.
As others have noted, you can solve this by using a Guid as the robot name instead.
If you want to retain the naming convention you might also get around this by implementing an LCG with an appropriate period and use that as a less collision-prone "naming generator".
Here's one way you can do it:
Generate the list of all possible names
For each robot, select a name from the list at random
Remove the selected name from the list so it can't be selected again
With this, you don't even need to shuffle. Something like this (note, I stole Optional Option's method of generating names because it's quite clever and I couldn't be bother thinking of my own):
public class Robot
{
private static List<string> names;
private static Random rnd = new Random();
public string Name { get; private set; }
static Robot()
{
Console.WriteLine("Initializing");
// Generate possible candidates
names = Enumerable.Range(0, 675999).Select(i =>
{
var sb = new StringBuilder(5);
sb.Append((char)('A' + i % 26));
i /= 26;
sb.Append((char)('A' + i % 26));
i /= 26;
sb.Append(i % 10);
i /= 10;
sb.Append(i % 10);
i /= 10;
sb.Append(i % 10);
return sb.ToString();
}).ToList();
}
public Robot()
{
// Note: if this needs to be multithreaded, then you'd need to do some work here
// to avoid two threads trying to take a name at the same time
// Also note: you should probably check that names.Count > 0
// and throw an error if not
var i = rnd.Next(0, names.Count - 1);
Name = names[i];
names.RemoveAt(i);
}
}
Here's a fiddle that generates 20 random names. They can only be unique because they are removed once they are used.
The point about multitheading is very important however. If you needed to be able to generate robots in parallel, then you'd need to add some code (e.g. locking the critical section of code) to ensure that only one name is being picked and removed from the list of candidates at a time or else things will get really bad, really quickly. This is why, when people need a random id with a reasonable expectation that it'll be unique, without worrying that some other thread(s) are trying the same thing at the same time, they use GUIDs. The sheer number of possible GUIDs makes collisions very unlikely. But you don't have that luxury with only 676,000 possible values

True random is not true random I am really confused

Alright I tested the way below.
Generated x times random numbers between 0~x and then checked the ones that were not generated.
I would assume that it would be very close to 100%. What I mean is all numbers between 0~x are generated.
But results are shocking. About 36% of the numbers are missing.
Is my random function not really random?
Here below my random class:
private static Random seedGenerator = new Random();
private static ThreadLocal<Random> random =
new ThreadLocal<Random>(SeededRandomFactory);
private static Random SeededRandomFactory()
{
lock (seedGenerator)
return new Random(seedGenerator.Next());
}
public static int GenerateRandomValueMin(int irRandValRange, int irMinValue)
{
return random.Value.Next(irMinValue, irMinValue + irRandValRange);
}
Here the below results:
Between 0-10, missing numbers count: 4, percent: 40%
Between 0-100, missing numbers count: 36, percent: 36%
Between 0-1000, missing numbers count: 369, percent: 36,9%
Between 0-10000, missing numbers count: 3674, percent: 36,74%
Between 0-100000, missing numbers count: 36583, percent: 36,58%
Between 0-1000000, missing numbers count: 367900, percent: 36,79%
Between 0-10000000, missing numbers count: 3678122, percent: 36,78%
Between 0-100000000, missing numbers count: 36797477, percent: 36,8%
Here the code how I check:
File.WriteAllText("results.txt", "");
int irFirst = 10;
for (int i = 0; i < 8; i++)
{
HashSet<int> hsGenerated = new HashSet<int>();
for (int k = 0; k < irFirst; k++)
{
hsGenerated.Add(GenerateRandomValue.GenerateRandomValueMin(irFirst, 0));
}
int irNotFound = 0;
for (int k = 0; k < irFirst; k++)
{
if (hsGenerated.Contains(k) == false)
irNotFound++;
}
string srSonuc =
string.Format(
"Between 0-{0}, missing numbers count: {1}, percent: {2}%",
irFirst, irNotFound,
Math.Round((Convert.ToDouble(irNotFound)/Convert.ToDouble(irFirst))*100.0, 2).ToString()
);
using (StreamWriter w = File.AppendText("sonuclar.txt"))
{
w.WriteLine(srSonuc);
}
irFirst = irFirst * 10;
}
As mentioned in the comments, your testing method is off.
You draw x times a number between 0 and x. The probability that a specific number is not drawn is:
As x approaches infinity, p will go towards 1/e (or approx. 36.7879441%) And this is the number you are seeing in your results.
Also, as x approaches infinity, you will observe this probability as outcome of your sample (Law of large numbers)
This has to do with probability. When you have a bowl with a red and a white marble. And you take one, put it back an take another one you cannot guarantee that you see both. You could take the red one twice. You are doing the same thing with more objects.
To elaborate on true true randomness:
I would expect close to 99% percent instead of 64%. Or at least 90%+ percent. So you say this isn't possible with current technology
That is simple. Thanks to modern math, technology and my super powers I can tell you how to do that: You need more draws than numbers to choose from. The formula becomes:
where n is you desired percentage of missing numbers. For example if you are willing to accept 5% numbers missing, you must draw three times as many random numbers. For a 1% chance, you need to iterate 4.6 times the maximum number.
This math assumes a perfectly uniform random number generation.
Your results are exactly what is to be expected from a uniform distribution where you sample with replacement.
Consider the simplest possible example. You have a coin and toss it twice. So we assume that we are sampling from a uniform discrete distribution.
The possible outcomes, which occur with equal probability of 0.25 are:
TT
TH
HT
HH
As you can see, only two of the four outcomes have both heads and tails.
This is known as sampling with replacement. So, once we have sampled a tails, then we "put it back in the bag", and it could come out again on the next sample.
Now suppose we sample without replacement. In that case there are two possible outcomes:
TH
HT
And as you see, each possible value appears exactly once.
Essentially your expectation for the results is not correct. As another example, suppose you toss a coin and it comes down tails. What do you expect will happen on the next toss. You are arguing that the coin must now come down heads. But that is clearly nonsense.
If you did want to sample without replacement, and it's not clear that's really what you want, then you do so with the Fisher-Yates shuffle.
If you want 100 million unique random number you could do something like this:
Now using Fisher-Yates suffle algorithm:
List<int> numbers = new List<int>(100000000);
for (int i = 0; i < numbers.Capacity; i++)
{
int rnd = random.Next(numbers.Count + 1);
if (rnd == numbers.Count)
numbers.Add(i);
else
{
numbers.Add(numbers[rnd]);
numbers[rnd] = i;
}
}
By the way you could calculate irNotFound much faster:
int irNotFound = irFirst - hsGenerated.Count;
Good luck with your quest.

What's wrong in terms of performance with this code? List.Contains, random usage, threading?

I have a local class with a method used to build a list of strings and I'm finding that when I hit this method (in a for loop of 1000 times) often it's not returning the amount I request.
I have a global variable:
string[] cachedKeys
A parameter passed to the method:
int requestedNumberToGet
The method looks similar to this:
List<string> keysToReturn = new List<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ?
cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
//Do we have enough to fill the request within the time? otherwise give
//however many we currently have
while (DateTime.Now < breakoutTime
&& keysToReturn.Count < numberPossibleToGet
&& cachedKeys.Length >= numberPossibleToGet)
{
string randomKey = cachedKeys[rand.Next(0, cachedKeys.Length)];
if (!keysToReturn.Contains(randomKey))
keysToReturn.Add(randomKey);
}
if (keysToReturn.Count != numberPossibleToGet)
Debugger.Break();
I have approximately 40 strings in cachedKeys none exceeding 15 characters in length.
I'm no expert with threading so I'm literally just calling this method 1000 times in a loop and consistently hitting that debug there.
The machine this is running on is a fairly beefy desktop so I would expect the breakout time to be realistic, in fact it randomly breaks at any point of the loop (I've seen 20s, 100s, 200s, 300s).
Any one have any ideas where I'm going wrong with this?
Edit: Limited to .NET 2.0
Edit: The purpose of the breakout is so that if the method is taking too long to execute, the client (several web servers using the data for XML feeds) won't have to wait while the other project dependencies initialise, they'll just be given 0 results.
Edit: Thought I'd post the performance stats
Original
'0.0042477465711424217323710136' - 10
'0.0479597267250446634977350473' - 100
'0.4721072091564710039963179678' - 1000
Skeet
'0.0007076318358897569383818334' - 10
'0.007256508857969378789762386' - 100
'0.0749829936486341141122684587' - 1000
Freddy Rios
'0.0003765841748043396576939248' - 10
'0.0046003053460705201359390649' - 100
'0.0417058592642360970458535931' - 1000
Why not just take a copy of the list - O(n) - shuffle it, also O(n) - and then return the number of keys that have been requested. In fact, the shuffle only needs to be O(nRequested). Keep swapping a random member of the unshuffled bit of the list with the very start of the unshuffled bit, then expand the shuffled bit by 1 (just a notional counter).
EDIT: Here's some code which yields the results as an IEnumerable<T>. Note that it uses deferred execution, so if you change the source that's passed in before you first start iterating through the results, you'll see those changes. After the first result is fetched, the elements will have been cached.
static IEnumerable<T> TakeRandom<T>(IEnumerable<T> source,
int sizeRequired,
Random rng)
{
List<T> list = new List<T>(source);
sizeRequired = Math.Min(sizeRequired, list.Count);
for (int i=0; i < sizeRequired; i++)
{
int index = rng.Next(list.Count-i);
T selected = list[i + index];
list[i + index] = list[i];
list[i] = selected;
yield return selected;
}
}
The idea is that at any point after you've fetched n elements, the first n elements of the list will be those elements - so we make sure that we don't pick those again. When then pick a random element from "the rest", swap it to the right position and yield it.
Hope this helps. If you're using C# 3 you might want to make this an extension method by putting "this" in front of the first parameter.
The main issue are the using retries in a random scenario to ensure you get unique values. This quickly gets out of control, specially if the amount of items requested is near to the amount of items to get i.e. if you increase the amount of keys, you will see the issue less often but that can be avoided.
The following method does it by keeping a list of the keys remaining.
List<string> GetSomeKeys(string[] cachedKeys, int requestedNumberToGet)
{
int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
List<string> keysRemaining = new List<string>(cachedKeys);
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Random rand = new Random();
for (int i = 0; i < numberPossibleToGet; i++)
{
int randomIndex = rand.Next(keysRemaining.Count);
keysToReturn.Add(keysRemaining[randomIndex]);
keysRemaining.RemoveAt(randomIndex);
}
return keysToReturn;
}
The timeout was necessary on your version as you could potentially keep retrying to get a value for a long time. Specially when you wanted to retrieve the whole list, in which case you would almost certainly get a fail with the version that relies on retries.
Update: The above performs better than these variations:
List<string> GetSomeKeysSwapping(string[] cachedKeys, int requestedNumberToGet)
{
int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
List<string> keys = new List<string>(cachedKeys);
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Random rand = new Random();
for (int i = 0; i < numberPossibleToGet; i++)
{
int index = rand.Next(numberPossibleToGet - i) + i;
keysToReturn.Add(keys[index]);
keys[index] = keys[i];
}
return keysToReturn;
}
List<string> GetSomeKeysEnumerable(string[] cachedKeys, int requestedNumberToGet)
{
Random rand = new Random();
return TakeRandom(cachedKeys, requestedNumberToGet, rand).ToList();
}
Some numbers with 10.000 iterations:
Function Name Elapsed Inclusive Time Number of Calls
GetSomeKeys 6,190.66 10,000
GetSomeKeysEnumerable 15,617.04 10,000
GetSomeKeysSwapping 8,293.64 10,000
A few thoughts.
First, your keysToReturn list is potentially being added to each time through the loop, right? You're creating an empty list and then adding each new key to the list. Since the list was not pre-sized, each add becomes an O(n) operation (see MSDN documentation). To fix this, try pre-sizing your list like this.
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ? cachedKeys.Length : requestedNumberToGet;
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Second, your breakout time is unrealistic (ok, ok, impossible) on Windows. All of the information I've ever read on Windows timing suggests that the best you can possibly hope for is 10 millisecond resolution, but in practice it's more like 15-18 milliseconds. In fact, try this code:
for (int iv = 0; iv < 10000; iv++) {
Console.WriteLine( DateTime.Now.Millisecond.ToString() );
}
What you'll see in the output are discrete jumps. Here is a sample output that I just ran on my machine.
13
...
13
28
...
28
44
...
44
59
...
59
75
...
The millisecond value jumps from 13 to 28 to 44 to 59 to 75. That's roughly a 15-16 millisecond resolution in the DateTime.Now function for my machine. This behavior is consistent with what you'd see in the C runtime ftime() call. In other words, it's a systemic trait of the Windows timing mechanism. The point is, you should not rely on a consistent 5 millisecond breakout time because you won't get it.
Third, am I right to assume that the breakout time is prevent the main thread from locking up? If so, then it'd be pretty easy to spawn off your function to a ThreadPool thread and let it run to completion regardless of how long it takes. Your main thread can then operate on the data.
Use HashSet instead, HashSet is much faster for lookup than List
HashSet<string> keysToReturn = new HashSet<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ? cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
int length = cachedKeys.Length;
while (DateTime.Now < breakoutTime && keysToReturn.Count < numberPossibleToGet) {
int i = rand.Next(0, length);
while (!keysToReturn.Add(cachedKeys[i])) {
i++;
if (i == length)
i = 0;
}
}
Consider using Stopwatch instead of DateTime.Now. It may simply be down to the inaccuracy of DateTime.Now when you're talking about milliseconds.
The problem could quite possibly be here:
if (!keysToReturn.Contains(randomKey))
keysToReturn.Add(randomKey);
This will require iterating over the list to determine if the key is in the return list. However, to be sure, you should try profiling this using a tool. Also, 5ms is pretty fast at .005 seconds, you may want to increase that.

Categories

Resources