Ensure uniform(ish) distribution with random number generation - C#

I have a list of objects and I would like to access the objects in a random order continuously.
I was wondering if there was a way of ensuring that the random values were not always similar.
Example.
My list is a list of Queues, and I am trying to interleave the values to produce a real-world scenario for testing.
I don't particularly want all of the items in Queues 1 and 2 before any other item.
Is there a guaranteed way to do this?
Thanks
EDIT:
The List of Queues I have is basically a list of files that I am transmitting to a web service. Files need to be in a certain order, hence the Queues.
So I have
Queue1 = "set1_1.xml", "set1_2.xml", ... "set1_n.xml"
Queue2 ...
...
QueueN
While each file needs to be transmitted in order relative to the other files in its queue, I would like to simulate a real-world scenario where files are received from different sources at different times, and so have them interleaved.
At the moment I am just using a simple rand from 0 to (number of Queues) to determine which file to dequeue next. This works, but I was asking if there might have been a way to get some more uniformity, rather than having 50 files from Queues 1 and 2 and then 5 files from Queue 3.
I do realise though that altering the randomness no longer makes it random.
Thank you for all your answers.

Well, it isn't entirely clear what the scenario is, but the thing with random is you never can tell ;-p. Anything you try to do to "guarantee" things will probably reduce the randomness.
How are you doing it? Personally I'd do something like:
static IEnumerable<T> GetItems<T>(IEnumerable<Queue<T>> queues)
{
    int remaining = queues.Sum(q => q.Count);
    Random rand = new Random();
    while (remaining > 0)
    {
        int index = rand.Next(remaining);
        foreach (Queue<T> q in queues)
        {
            if (index < q.Count)
            {
                yield return q.Dequeue();
                remaining--;
                break;
            }
            else
            {
                index -= q.Count;
            }
        }
    }
}
This should be fairly uniform over the entire set. The trick here is that by treating the queues as a single large queue, the tendency is that the queues with lots of items will get dequeued more quickly (since there is more chance of getting an index in their range). This means that it should automatically balance consumption between the queues so that they all run dry at (roughly) the same time. If you don't have LINQ, just change the first line:
int remaining = 0;
foreach (Queue<T> q in queues) { remaining += q.Count; }
Example usage:
static void Main()
{
    List<Queue<int>> queues = new List<Queue<int>> {
        Build(1,2,3,4,5), Build(6,7,8), Build(9,10,11,12,13)
    };
    foreach (int i in GetItems(queues))
    {
        Console.WriteLine(i);
    }
}
static Queue<T> Build<T>(params T[] items)
{
    Queue<T> queue = new Queue<T>();
    foreach (T item in items)
    {
        queue.Enqueue(item);
    }
    return queue;
}

It depends on what you really want...
If the "random" values are truly random then you will get uniform distribution with enough iterations.
If you're talking about controlling or manipulating the distribution then the values will no longer be truly random!
So, you can either have:
Truly random values with uniform distribution, or
Controlled distribution, but no longer truly random

Are you trying to shuffle your list?
If so you can do it by sorting it on a random value.
Try something like this:
private Random random = new Random();

public int RandomSort(Queue q1, Queue q2)
{
    if (q1 == q2) { return 0; }
    return random.Next().CompareTo(random.Next());
}
And then use RandomSort as the argument in a call to List.Sort();
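Note that List<T>.Sort accepts this method directly through its Comparison<T> overload (e.g. queues.Sort(RandomSort);). One caveat: a comparer that answers differently for the same pair on every call technically violates the comparison contract, so Sort can throw or produce a biased shuffle. A safer variant of the same idea, sketched here on the assumption that the list holds Queue objects, assigns each element one random key up front and sorts on the keys:
private Random random = new Random();

public void Shuffle(List<Queue> queues)
{
    // One fixed random key per queue keeps every comparison consistent.
    Dictionary<Queue, int> keys = new Dictionary<Queue, int>();
    foreach (Queue q in queues)
        keys[q] = random.Next();
    queues.Sort(delegate(Queue q1, Queue q2)
    {
        return keys[q1].CompareTo(keys[q2]);
    });
}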

If the items to be queued have a GetHashCode() algorithm which distributes values evenly across all integers, you can apply the modulus operator to the hash value to decide which queue to add the item to. This is basically the same principle hash tables use to ensure an even distribution of values.
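A minimal sketch of that assignment (the method and parameter names are illustrative, not from the original post):
static void AddByHash<T>(T item, List<Queue<T>> queues)
{
    // Mask off the sign bit rather than calling Math.Abs, which throws
    // for int.MinValue; then take the modulus to pick a queue.
    int index = (item.GetHashCode() & 0x7FFFFFFF) % queues.Count;
    queues[index].Enqueue(item);
}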

Confused with the Parallel.ForEach loop in C#

I'm new to C# and am basically trying to generate large integers using the BigInteger class in C#, by feeding it a byte[] array of randomly filled values. I have another method, CheckForPrime(BigInteger b), that checks for a prime number and simply returns true if the number is prime. To run everything, I'm using a Parallel.ForEach loop which loops infinitely until a condition is false. Here is my code:
private static IEnumerable<bool> IterateUntilFalse(bool condition)
{
    while (condition) yield return true;
}
and here is my attempt at a Parallel.ForEach loop that runs till the condition is false:
public void PrintOutPrimes(int numPrimes)
{
    byte[] makeBytes = new byte[512];
    BigInteger makePrime;
    RandomNumberGenerator r = RandomNumberGenerator.Create();
    ConcurrentBag<BigInteger> bbag = new ConcurrentBag<BigInteger>();
    int trackNum = 0;
    bool condition = (trackNum <= numPrimes);
    Parallel.ForEach(IterateUntilFalse(condition), (ignored, state) =>
    {
        if (trackNum > numPrimes) state.Stop();
        r.GetNonZeroBytes(makeBytes);
        makePrime = BigInteger.Abs(new BigInteger(makeBytes));
        if (CheckForPrime(makePrime))
        {
            bbag.Add(makePrime);
            if (bbag.TryPeek(out makePrime))
            {
                Console.WriteLine(makePrime);
            }
            Interlocked.Increment(ref trackNum);
        }
    });
}
I want to generate large prime numbers using the performance of Parallel.ForEach, and then print them out as it finds them. The problem is that the code seems to be "bypassing" the if statement in the loop and adding non-prime numbers to the concurrent bag too. Is there a way to avoid this and guarantee that only prime numbers are added to the bag and then printed sequentially?
bool condition is not a condition, it's the result of evaluating the condition once. Basically it is always true and will never change. Therefore, your iterator will yield true forever.
To solve that, you need a real condition that can be evaluated several times, such as a lambda expression. But even with a lambda this won't work well, due to race conditions on your local variables.
Actually, I don't really see why you use Parallel.ForEach at all. If you want 500 primes, then use Parallel.For, not Parallel.ForEach.
Instead of generating a huge amount of random numbers and checking them for primality, choose a random number and increment it until you find a prime number. That way it's guaranteed that you get one prime number for every random number input.
Concept:
Parallel.For(0, numPrimes, (i, state) =>
{
    byte[] makeBytes = new byte[512];
    r.GetNonZeroBytes(makeBytes);
    BigInteger makePrime = BigInteger.Abs(new BigInteger(makeBytes));
    while (!IsPrime(makePrime))
        makePrime += 1; // at some point, it will become prime
    bbag.Add(makePrime);
});
You probably want to start with an odd number and increment by 2.
You definitely want one RNG per thread instead of accessing a single global RNG, unless you have a thread-safe RNG.
You also want the byte[] to be part of the per-thread state, so that it does not get recreated on every iteration and put pressure on the garbage collector.
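Putting those last two points together might look like this sketch (assuming the same IsPrime helper and ConcurrentBag<BigInteger> bbag as above); the localInit overload of Parallel.For hands each worker thread its own RNG and buffer:
Parallel.For(0, numPrimes,
    // localInit: runs once per worker thread
    () => new { Rng = RandomNumberGenerator.Create(), Buffer = new byte[512] },
    (i, state, local) =>
    {
        local.Rng.GetNonZeroBytes(local.Buffer);
        BigInteger candidate = BigInteger.Abs(new BigInteger(local.Buffer));
        if (candidate.IsEven)
            candidate += 1;            // start from an odd number
        while (!IsPrime(candidate))
            candidate += 2;            // step over the even numbers
        bbag.Add(candidate);
        return local;
    },
    // localFinally: dispose the per-thread RNG
    local => local.Rng.Dispose());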

How can I implement odd-even sorting in C# using threads?

I am practicing threads and concurrency in C# and tried to implement the basic odd-even sort algorithm using one thread for the even pass and another for the odd pass.
static bool Sort(int startPosition, List<int> list)
{
    bool result = true;
    do
    {
        for (int i = startPosition; i <= list.Count - 2; i = i + 2)
        {
            if (list[i] > list[i + 1])
            {
                int temp = list[i];
                list[i] = list[i + 1];
                list[i + 1] = temp;
                result = false;
            }
        }
    } while (!result);
    return result;
}
While the main method is like this:
static void Main(string[] args)
{
    bool isOddSorted = false;
    bool isEvenSorted = false;
    List<int> list = new List<int>();
    while (list.Count < 15)
    {
        list.Add(new Random().Next(0, 20));
    }
    var evenThread = new Thread(() =>
    {
        isEvenSorted = Sort(0, list);
    });
    evenThread.Start();
    var oddThread = new Thread(() =>
    {
        isOddSorted = Sort(1, list);
    });
    oddThread.Start();
    while (true)
    {
        if (isEvenSorted && isOddSorted)
        {
            foreach (int i in list)
            {
                Console.WriteLine(i);
            }
            break;
        }
    }
}
Understandably, the loop in the Sort method runs forever because the result variable is never set to true. However, the way it works manages to sort the list; it just never breaks out.
Yet the moment I add result = true to the first line of the do block in the Sort function, the sorting breaks.
I couldn't figure out how to fix this.
You cannot do odd-even sort easily in a multi-threaded manner. Why?
Because the odd-even sort is in essence the repetition of two sorting passes (the odd and the even pass), with any subsequent pass depending on the result of the preceding pass. In practical terms you cannot run the two passes in parallel or concurrently, as each pass has to follow the other.
There are of course ways to employ multi-threading even with odd-even sort, although it probably wouldn't make much practical sense. For example, you could divide the list into several partitions, with each partition being odd-even-sorted independently. The sorting of the partitions could be done in parallel, with a final step that merges the sorted partitions into the fully sorted list; a rough sketch follows below.
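For illustration, here is that partition-and-merge idea in outline; OddEvenSort stands in for a sequential, correctly terminating version of the Sort method from the question:
static List<int> ParallelPartitionSort(List<int> list)
{
    // Split the list in two and odd-even sort both halves concurrently...
    int mid = list.Count / 2;
    List<int> left = list.GetRange(0, mid);
    List<int> right = list.GetRange(mid, list.Count - mid);
    Parallel.Invoke(
        () => OddEvenSort(left),
        () => OddEvenSort(right));
    // ...then do a standard two-way merge of the sorted halves.
    List<int> merged = new List<int>(list.Count);
    int i = 0, j = 0;
    while (i < left.Count && j < right.Count)
        merged.Add(left[i] <= right[j] ? left[i++] : right[j++]);
    while (i < left.Count) merged.Add(left[i++]);
    while (j < right.Count) merged.Add(right[j++]);
    return merged;
}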
(By the way, the fact that you eventually get a sorted list if you let the do-while loops in your Sort method run many, many times just means that, given enough time, even "overlapping" concurrent passes eventually compare the elements with each other and shuffle them to the right positions. The result may not contain all the same numbers as the original list, though: since you have not synchronized list access, you might lose some numbers from the list and have them replaced by duplicates of other numbers, depending on the runtime behavior and the timing of list accesses between the two threads.)
You are trying to modify a non-thread-safe collection across threads.
Even if the basic swap in your Sort method were implemented entirely correctly (it is not), you would have to take into account that while one thread is doing a swap, the other could swap in a value that is sitting in the temp variable at that exact moment.
You would need to familiarize yourself with locks and/or thread-safe collections.
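For example, a minimal sketch of guarding the swap with a lock; lockObject must be the one object shared by both threads:
private static readonly object lockObject = new object();

static void SwapIfGreater(List<int> list, int i)
{
    lock (lockObject) // only one thread may touch the pair at a time
    {
        if (list[i] > list[i + 1])
        {
            int temp = list[i];
            list[i] = list[i + 1];
            list[i + 1] = temp;
        }
    }
}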
Look at your result variable and the logic you have implemented with regard to result.
The outer do ... while (!result) loop will only exit when result is true.
Now imagine your inner for loop finds two numbers that need swapping. So it does and swaps the numbers. And sets result to false. And here is my question to you: After result has been set to false when two numbers have been swapped, when and where is result ever being set to true?
Also, while you sort the numbers on even list positions, and the numbers on odd positions, your code does not do a final pass across the entire list. So if, after the even and odd sorting, a larger number at an even position n is followed by a smaller number at odd position n+1, your code leaves it at that, leaving the list essentially still (partially) unsorted...

Algorithm for preventing burst from a producer

I have the following:
One producer that produces random integers (around one every minute), e.g. 154609722148751.
One consumer that consumes these integers one by one. Consumption takes around 3 seconds per item.
From time to time the producer goes crazy and produces only one "kind" of figure very quickly before getting back to normal,
e.g. 6666666666666666666666666666666666666666666666675444696 in 1 second.
My goal is to keep the number of distinct kinds of figures left unconsumed as low as possible.
Say, in the previous sample I have:
a lot of '6' not consumed
one '7' not consumed
one '5' not consumed
three '4' not consumed
one '9' not consumed
If I use a simple FIFO algorithm, I am going to wait a long time before all the '6's are consumed. I would prefer to "prioritize" the other figures and THEN consume the '6's.
Does such an algorithm already exist? (A C# implementation is a plus.)
Currently, I was thinking about this algorithm:
have a queue for each figure (Q0, Q1, Q2, ..., Q9)
sequentially dequeue one item from each queue:
private int _currentQueueId;
private Queue<T>[] _allQueues; // the enclosing class is assumed generic in T

public T GetNextItemToConsume()
{
    // We assume that there is at least one item to consume, and no lock is needed.
    while (true)
    {
        var nextQueue = _allQueues[_currentQueueId++ % 10];
        if (nextQueue.Count > 0)
            return nextQueue.Dequeue();
    }
}
Do you have a better algorithm than this one? (Or any idea?)
NB1: I don't have control over the consumption process (that is to say, I can't increase the consumption speed nor the number of consumer threads; indeed, an infinite consumption speed would solve my issue).
NB2: exact timing figures are not relevant, but we can assume that consumption is ten times quicker than production.
NB3: I don't have control over the "crazy" behaviour of the producer; in fact it is a normal (but not so frequent) production behaviour.
Here is a more complete example than my comment above:
sealed class Item {
    public Item(int group) {
        ItemGroup = group;
    }
    public int ItemGroup { get; private set; }
}

Item TakeNextItem(IList<Item> items) {
    const int LowerBurstLimit = 1;
    if (items == null)
        throw new ArgumentNullException("items");
    if (items.Count == 0)
        return null;
    var item = items.GroupBy(x => x.ItemGroup)
                    .OrderBy(x => Math.Max(LowerBurstLimit, x.Count()))
                    .First()
                    .First();
    items.Remove(item);
    return item;
}
My idea here is to group the items by their frequency and then just take from the group with the lowest frequency. It is the same idea as yours with the multiple queues, but it is calculated on the fly.
If there are multiple groups with the same frequency, it will take the oldest item (assuming GroupBy and OrderBy are stable; they are in practice, but I am not sure that is stated in the documentation).
Increase LowerBurstLimit if you want to process the items in chronological order, except for the ones with more than LowerBurstLimit items in the queue.
To measure the time I just created this quick code in LINQPad.
(Eric Lippert: please ignore this part :-))
void Main()
{
    var rand = new Random();
    var items = Enumerable.Range(1, 1000)
        .Select(x => { // simulate burst
            int group = rand.Next(100);
            return new Item(group < 90 ? 1 : (group % 10));
        })
        .ToList();
    var first = TakeNextItem(items); // JIT
    var sw = new Stopwatch();
    sw.Start();
    while (TakeNextItem(items) != null) {
    }
    sw.Stop();
    Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
}
When I run this code with 1000 items it takes around 80 ms on my 3-year-old laptop, i.e. on average 80 µs per "Take".
GroupBy should be O(N*M), OrderBy O(M*log M) and Remove O(N) (N = average queue length, M = number of groups), so the performance should scale linearly with N, i.e. a 10000-item queue takes ~800 µs per "Take" (I got ~700 µs with 10000 items in the test).
I would use the 10 queues as you mentioned and select the one from which to dequeue statistically, based on the number of elements present in each particular queue. The queue with the highest number of elements is more likely to be selected for dequeuing.
For better performance, keep track of the total count of elements across all queues. For each dequeue operation, draw a random int X between 0 and total count - 1; this tells you from which queue to dequeue (loop through the queues and subtract each queue's element count from X until you would go below zero, then pick that queue).
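A minimal sketch of that weighted draw (the names are illustrative; a real implementation would keep the total count updated incrementally rather than recomputing it):
static Queue<int> PickQueueWeighted(Queue<int>[] queues, Random rand)
{
    int total = 0;
    foreach (Queue<int> q in queues)
        total += q.Count;
    int x = rand.Next(total); // 0 .. total count - 1
    foreach (Queue<int> q in queues)
    {
        if (x < q.Count)
            return q;         // x fell inside this queue's range
        x -= q.Count;
    }
    throw new InvalidOperationException("All queues are empty.");
}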

Why is processing a sorted array slower than an unsorted array?

I have a list of 500000 randomly generated Tuple<long,long,string> objects on which I am performing a simple "between" search:
var data = new List<Tuple<long,long,string>>(500000);
...
var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);
When I generate my random array and run my search for 100 randomly generated values of x, the searches complete in about four seconds. Knowing of the great wonders that sorting does to searching, however, I decided to sort my data - first by Item1, then by Item2, and finally by Item3 - before running my 100 searches. I expected the sorted version to perform a little faster because of branch prediction: my thinking has been that once we get to the point where Item1 == x, all further checks of t.Item1 <= x would predict the branch correctly as "no take", speeding up the tail portion of the search. Much to my surprise, the searches took twice as long on a sorted array!
I tried switching the order in which I ran my experiments and used a different seed for the random number generator, but the effect was the same: searches in an unsorted array ran nearly twice as fast as searches in the same array once it was sorted!
Does anyone have a good explanation of this strange effect? The source code of my tests follows; I am using .NET 4.0.
private const int TotalCount = 500000;
private const int TotalQueries = 100;

private static long NextLong(Random r) {
    var data = new byte[8];
    r.NextBytes(data);
    return BitConverter.ToInt64(data, 0);
}

private class TupleComparer : IComparer<Tuple<long, long, string>> {
    public int Compare(Tuple<long, long, string> x, Tuple<long, long, string> y) {
        var res = x.Item1.CompareTo(y.Item1);
        if (res != 0) return res;
        res = x.Item2.CompareTo(y.Item2);
        return (res != 0) ? res : String.CompareOrdinal(x.Item3, y.Item3);
    }
}

static void Test(bool doSort) {
    var data = new List<Tuple<long, long, string>>(TotalCount);
    var random = new Random(1000000007);
    var sw = new Stopwatch();
    sw.Start();
    for (var i = 0; i != TotalCount; i++) {
        var a = NextLong(random);
        var b = NextLong(random);
        if (a > b) {
            var tmp = a;
            a = b;
            b = tmp;
        }
        var s = string.Format("{0}-{1}", a, b);
        data.Add(Tuple.Create(a, b, s));
    }
    sw.Stop();
    if (doSort) {
        data.Sort(new TupleComparer());
    }
    Console.WriteLine("Populated in {0}", sw.Elapsed);
    sw.Reset();
    var total = 0L;
    sw.Start();
    for (var i = 0; i != TotalQueries; i++) {
        var x = NextLong(random);
        var cnt = data.Count(t => t.Item1 <= x && t.Item2 >= x);
        total += cnt;
    }
    sw.Stop();
    Console.WriteLine("Found {0} matches in {1} ({2})", total, sw.Elapsed,
        doSort ? "Sorted" : "Unsorted");
}

static void Main() {
    Test(false);
    Test(true);
    Test(false);
    Test(true);
}
Populated in 00:00:01.3176257
Found 15614281 matches in 00:00:04.2463478 (Unsorted)
Populated in 00:00:01.3345087
Found 15614281 matches in 00:00:08.5393730 (Sorted)
Populated in 00:00:01.3665681
Found 15614281 matches in 00:00:04.1796578 (Unsorted)
Populated in 00:00:01.3326378
Found 15614281 matches in 00:00:08.6027886 (Sorted)
When you are using the unsorted list all tuples are accessed in memory-order. They have been allocated consecutively in RAM. CPUs love accessing memory sequentially because they can speculatively request the next cache line so it will always be present when needed.
When you are sorting the list you put it into random order because your sort keys are randomly generated. This means that the memory accesses to tuple members are unpredictable. The CPU cannot prefetch memory and almost every access to a tuple is a cache miss.
This is a nice example for a specific advantage of GC memory management: data structures which have been allocated together and are used together perform very nicely. They have great locality of reference.
The penalty from cache misses outweighs the saved branch prediction penalty in this case.
Try switching to a struct-tuple. This will restore performance because no pointer-dereference needs to occur at runtime to access tuple members.
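For instance, a hypothetical struct replacement for the element type (not from the original post):
struct Span
{
    public long Item1;   // lower bound, stored inline in the array
    public long Item2;   // upper bound, stored inline in the array
    public string Item3; // still a reference, but the predicate never reads it

    public Span(long item1, long item2, string item3)
    {
        Item1 = item1;
        Item2 = item2;
        Item3 = item3;
    }
}
With a List<Span>, the query data.Count(s => s.Item1 <= x && s.Item2 >= x) reads both longs without chasing a pointer per element.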
Chris Sinclair notes in the comments that "for TotalCount around 10,000 or less, the sorted version does perform faster". This is because a small list fits entirely into the CPU cache. The memory accesses might be unpredictable but the target is always in cache. I believe there is still a small penalty because even a load from cache takes some cycles. But that seems not to be a problem because the CPU can juggle multiple outstanding loads, thereby increasing throughput. Whenever the CPU hits a wait for memory it will still speed ahead in the instruction stream to queue as many memory operations as it can. This technique is used to hide latency.
This kind of behavior shows how hard it is to predict performance on modern CPUs. The fact that we are only 2x slower when going from sequential to random memory access tells me how much is going on under the covers to hide memory latency. A memory access can stall the CPU for 50-200 cycles. Given that, one could expect the program to become >10x slower when introducing random memory accesses.
LINQ doesn't know whether your list is sorted or not.
Since Count with a predicate parameter is an extension method for all IEnumerables, I think it doesn't even know whether it's running over a collection with efficient random access. So it simply checks every element, and usr explained why performance got lower.
To exploit the performance benefits of a sorted array (such as binary search), you'll have to do a bit more coding.
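For example, once the list is sorted by Item1 you can binary-search for the first element with Item1 > x and skip everything from there on; this is only a sketch of the pruning idea, and the surviving prefix still has to be scanned for the Item2 condition:
static int CountMatches(List<Tuple<long, long, string>> sorted, long x)
{
    // Binary search for the first index whose Item1 exceeds x; every
    // element at or beyond it fails Item1 <= x and can be skipped.
    int lo = 0, hi = sorted.Count;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (sorted[mid].Item1 <= x) lo = mid + 1;
        else hi = mid;
    }
    int count = 0;
    for (int i = 0; i < lo; i++) // scan only the surviving prefix
        if (sorted[i].Item2 >= x) count++;
    return count;
}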

What's wrong in terms of performance with this code? List.Contains, random usage, threading?

I have a local class with a method used to build a list of strings, and I'm finding that when I hit this method (in a for loop running 1000 times) it often doesn't return the number of items I request.
I have a global variable:
string[] cachedKeys
A parameter passed to the method:
int requestedNumberToGet
The method looks similar to this:
List<string> keysToReturn = new List<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ?
    cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
// Do we have enough to fill the request within the time? Otherwise give
// however many we currently have.
while (DateTime.Now < breakoutTime
       && keysToReturn.Count < numberPossibleToGet
       && cachedKeys.Length >= numberPossibleToGet)
{
    string randomKey = cachedKeys[rand.Next(0, cachedKeys.Length)];
    if (!keysToReturn.Contains(randomKey))
        keysToReturn.Add(randomKey);
}
if (keysToReturn.Count != numberPossibleToGet)
    Debugger.Break();
I have approximately 40 strings in cachedKeys none exceeding 15 characters in length.
I'm no expert in threading, so I'm literally just calling this method 1000 times in a loop and consistently hitting that Debugger.Break().
The machine this is running on is a fairly beefy desktop, so I would expect the breakout time to be realistic; in fact it randomly breaks at any point in the loop (I've seen it break in the 20s, 100s, 200s, 300s).
Anyone have any ideas where I'm going wrong with this?
Edit: Limited to .NET 2.0
Edit: The purpose of the breakout is so that if the method is taking too long to execute, the client (several web servers using the data for XML feeds) won't have to wait while the other project dependencies initialise, they'll just be given 0 results.
Edit: Thought I'd post the performance stats
Original
'0.0042477465711424217323710136' - 10
'0.0479597267250446634977350473' - 100
'0.4721072091564710039963179678' - 1000
Skeet
'0.0007076318358897569383818334' - 10
'0.007256508857969378789762386' - 100
'0.0749829936486341141122684587' - 1000
Freddy Rios
'0.0003765841748043396576939248' - 10
'0.0046003053460705201359390649' - 100
'0.0417058592642360970458535931' - 1000
Why not just take a copy of the list - O(n) - shuffle it, also O(n) - and then return the number of keys that have been requested. In fact, the shuffle only needs to be O(nRequested). Keep swapping a random member of the unshuffled bit of the list with the very start of the unshuffled bit, then expand the shuffled bit by 1 (just a notional counter).
EDIT: Here's some code which yields the results as an IEnumerable<T>. Note that it uses deferred execution, so if you change the source that's passed in before you first start iterating through the results, you'll see those changes. After the first result is fetched, the elements will have been cached.
static IEnumerable<T> TakeRandom<T>(IEnumerable<T> source,
                                    int sizeRequired,
                                    Random rng)
{
    List<T> list = new List<T>(source);
    sizeRequired = Math.Min(sizeRequired, list.Count);
    for (int i = 0; i < sizeRequired; i++)
    {
        int index = rng.Next(list.Count - i);
        T selected = list[i + index];
        list[i + index] = list[i];
        list[i] = selected;
        yield return selected;
    }
}
The idea is that at any point after you've fetched n elements, the first n elements of the list will be those elements - so we make sure that we don't pick those again. We then pick a random element from "the rest", swap it into the right position and yield it.
Hope this helps. If you're using C# 3 you might want to make this an extension method by putting "this" in front of the first parameter.
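Hypothetical usage; the List<T> constructor materializes the deferred sequence, and this works on .NET 2.0:
Random rng = new Random();
List<string> keys = new List<string>(
    TakeRandom(cachedKeys, requestedNumberToGet, rng));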
The main issue is the use of retries in a random scenario to ensure you get unique values. This quickly gets out of control, especially if the number of items requested is close to the number of items available; i.e. if you increase the number of keys, you will see the issue less often, but it can be avoided entirely.
The following method does it by keeping a list of the keys remaining.
List<string> GetSomeKeys(string[] cachedKeys, int requestedNumberToGet)
{
    int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
    List<string> keysRemaining = new List<string>(cachedKeys);
    List<string> keysToReturn = new List<string>(numberPossibleToGet);
    Random rand = new Random();
    for (int i = 0; i < numberPossibleToGet; i++)
    {
        int randomIndex = rand.Next(keysRemaining.Count);
        keysToReturn.Add(keysRemaining[randomIndex]);
        keysRemaining.RemoveAt(randomIndex);
    }
    return keysToReturn;
}
The timeout was necessary in your version, as you could potentially keep retrying to get a value for a long time; especially when you wanted to retrieve the whole list, in which case you would almost certainly get a failure with the version that relies on retries.
Update: The above performs better than these variations:
List<string> GetSomeKeysSwapping(string[] cachedKeys, int requestedNumberToGet)
{
    int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
    List<string> keys = new List<string>(cachedKeys);
    List<string> keysToReturn = new List<string>(numberPossibleToGet);
    Random rand = new Random();
    for (int i = 0; i < numberPossibleToGet; i++)
    {
        // Note: this only ever samples from the first numberPossibleToGet
        // keys; rand.Next(keys.Count - i) + i would sample from all of them.
        int index = rand.Next(numberPossibleToGet - i) + i;
        keysToReturn.Add(keys[index]);
        keys[index] = keys[i];
    }
    return keysToReturn;
}

List<string> GetSomeKeysEnumerable(string[] cachedKeys, int requestedNumberToGet)
{
    Random rand = new Random();
    return TakeRandom(cachedKeys, requestedNumberToGet, rand).ToList();
}
Some numbers with 10,000 iterations:
Function Name            Elapsed Inclusive Time    Number of Calls
GetSomeKeys                            6,190.66             10,000
GetSomeKeysEnumerable                 15,617.04             10,000
GetSomeKeysSwapping                    8,293.64             10,000
A few thoughts.
First, your keysToReturn list is potentially being added to each time through the loop, right? You're creating an empty list and then adding each new key to it. Since the list was not pre-sized, any add that exceeds the current capacity forces a reallocation and copy of the backing array, an O(n) operation (see the MSDN documentation for List<T>). To fix this, try pre-sizing your list like this:
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ?
    cachedKeys.Length : requestedNumberToGet;
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Second, your breakout time is unrealistic (ok, ok, impossible) on Windows. All of the information I've ever read on Windows timing suggests that the best you can possibly hope for is 10 millisecond resolution, but in practice it's more like 15-18 milliseconds. In fact, try this code:
for (int iv = 0; iv < 10000; iv++)
{
    Console.WriteLine(DateTime.Now.Millisecond.ToString());
}
What you'll see in the output are discrete jumps. Here is a sample output that I just ran on my machine.
13
...
13
28
...
28
44
...
44
59
...
59
75
...
The millisecond value jumps from 13 to 28 to 44 to 59 to 75. That's roughly a 15-16 millisecond resolution in the DateTime.Now function for my machine. This behavior is consistent with what you'd see in the C runtime ftime() call. In other words, it's a systemic trait of the Windows timing mechanism. The point is, you should not rely on a consistent 5 millisecond breakout time because you won't get it.
Third, am I right to assume that the breakout time is to prevent the main thread from locking up? If so, it'd be pretty easy to spawn your function off to a ThreadPool thread and let it run to completion regardless of how long it takes. Your main thread can then operate on the data.
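That hand-off might look like this minimal sketch (GetKeys and UseKeys are placeholder names; ThreadPool.QueueUserWorkItem is available on .NET 2.0):
ThreadPool.QueueUserWorkItem(delegate(object state)
{
    // Runs on a pool thread; it may take as long as it needs.
    List<string> keys = GetKeys(requestedNumberToGet);
    UseKeys(keys); // hand the result back, e.g. via a callback or event
});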
Use a HashSet instead; HashSet is much faster for lookups than List:
HashSet<string> keysToReturn = new HashSet<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ?
    cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
int length = cachedKeys.Length;
while (DateTime.Now < breakoutTime && keysToReturn.Count < numberPossibleToGet)
{
    int i = rand.Next(0, length);
    while (!keysToReturn.Add(cachedKeys[i]))
    {
        i++;
        if (i == length)
            i = 0;
    }
}
Consider using Stopwatch instead of DateTime.Now. It may simply be down to the inaccuracy of DateTime.Now when you're talking about milliseconds.
The problem could quite possibly be here:
if (!keysToReturn.Contains(randomKey))
    keysToReturn.Add(randomKey);
This will require iterating over the list to determine whether the key is in the return list. To be sure, though, you should try profiling this with a tool. Also, 5 ms (.005 seconds) is pretty fast; you may want to increase that.
