How to use Interlocked.CompareExchange in C#

My goal is the following:
There is a certain range of integers, and I have to test every integer in that range for something random. I want to use multiple threads for this and divide the work equally among the threads using a shared counter. I set the counter to the starting value and let every thread take a number, increment the counter, do some calculations, and return a result. This shared counter has to be incremented with locks, because otherwise there will be gaps / overlaps in the range of integers to test.
I have no idea where to start. Let's say I want 12 threads to do the work; I do:
for (int t = 0; t < threads; t++)
{
Thread thr = new Thread(new ThreadStart(startThread));
thr.Start(); // without this, the thread never runs
}
startThread() is the method I use for the calculations.
Can you help me on my way? I know I have to use the Interlocked class, but that's all…

Say you have an int field somewhere (initialized to -1), then:
int newVal = Interlocked.Increment(ref theField);
is a thread-safe increment; assuming you don't mind the (very small) risk of overflowing the upper int limit, then:
int next;
while((next = Interlocked.Increment(ref theField)) <= upperInclusive) {
// do item with index "next"
}
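Putting the pieces together with the 12 threads from the question, a minimal sketch might look like this (the bounds and the DoWork placeholder are hypothetical, not from the original post):

using System;
using System.Threading;

class RangeTester
{
    // Shared counter; starts one below the first value so the first
    // Interlocked.Increment hands out lowerInclusive itself.
    private static int theField;

    static void Main()
    {
        const int lowerInclusive = 0;         // assumed bounds, for illustration only
        const int upperInclusive = 1_000_000;
        theField = lowerInclusive - 1;

        var workers = new Thread[12];
        for (int t = 0; t < workers.Length; t++)
        {
            workers[t] = new Thread(() =>
            {
                int next;
                while ((next = Interlocked.Increment(ref theField)) <= upperInclusive)
                {
                    DoWork(next); // each value is claimed by exactly one thread
                }
            });
            workers[t].Start();
        }
        foreach (var worker in workers) worker.Join();
    }

    static void DoWork(int value)
    {
        // placeholder for the per-integer test
    }
}

No locks are needed: Interlocked.Increment is the atomic operation, so no value is handed out twice and none is skipped.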
However, Parallel.For will do all of this a lot more conveniently:
Parallel.For(lowerInclusive, upperExclusive, i => DoWork(i));
or (to constrain to 12 threads):
var options = new ParallelOptions { MaxDegreeOfParallelism = 12 };
Parallel.For(lowerInclusive, upperExclusive, options, i => DoWork(i));

Related

Is parallel code supposed to run slower than sequential code, after a certain dataset size?

I'm fairly new to C# and programming in general, and I was trying out parallel programming.
I have written this example code that computes the sum of an array, first using multiple threads and then using one thread (the main thread).
I've timed both cases.
static long Sum(int[] numbers, int start, int end)
{
long sum = 0;
for (int i = start; i < end; i++)
{
sum += numbers[i];
}
return sum;
}
static async Task Main()
{
// Arrange data.
const int COUNT = 100_000_000;
int[] numbers = new int[COUNT];
Random random = new();
for (int i = 0; i < numbers.Length; i++)
{
numbers[i] = random.Next(100);
}
// Split task into multiple parts.
int threadCount = Environment.ProcessorCount;
int taskCount = threadCount - 1;
int taskSize = numbers.Length / taskCount;
var start = DateTime.Now;
// Run individual parts in separate threads.
List<Task<long>> tasks = new();
for (int i = 0; i < taskCount; i++)
{
int begin = i * taskSize;
int end = (i == taskCount - 1) ? numbers.Length : (i + 1) * taskSize;
tasks.Add(Task.Run(() => Sum(numbers, begin, end)));
}
// Wait for all threads to finish, as we need the result.
var partialSums = await Task.WhenAll(tasks);
long sumAsync = partialSums.Sum();
var durationAsync = (DateTime.Now - start).TotalMilliseconds;
Console.WriteLine($"Async sum: {sumAsync}");
Console.WriteLine($"Async duration: {durationAsync} milliseconds");
// Sequential
start = DateTime.Now;
long sumSync = Sum(numbers, 0, numbers.Length);
var durationSync = (DateTime.Now - start).TotalMilliseconds;
Console.WriteLine($"Sync sum: {sumSync}");
Console.WriteLine($"Sync duration: {durationSync} milliseconds");
var factor = durationSync / durationAsync;
Console.WriteLine($"Factor: {factor:0.00}x");
}
When the array size is 100 million, the parallel sum is computed about 2x faster on average.
But when the array size is 1 billion, it's significantly slower than the sequential sum.
Why is it running slower?
Hardware Information
Environment.ProcessorCount = 4
GC.GetGCMemoryInfo().TotalAvailableMemoryBytes = 8468377600
Timing: (screenshots for array sizes 100,000,000 and 1,000,000,000 omitted)
New Test:
This time, instead of separate threads (3 in my case) working on different parts of a single array of 1,000,000,000 integers, I physically divided the dataset into 3 separate arrays of 333,333,333 elements (one third the size each). This time, although I'm still adding up a billion integers on the same machine, my parallel code runs faster (as expected).
private static void InitArray(int[] numbers)
{
Random random = new();
for (int i = 0; i < numbers.Length; i++)
{
numbers[i] = random.Next(100);
}
}
public static async Task Main()
{
Stopwatch stopwatch = new();
const int SIZE = 333_333_333; // one third of a billion
List<int[]> listOfArrays = new();
for (int i = 0; i < Environment.ProcessorCount - 1; i++)
{
int[] numbers = new int[SIZE];
InitArray(numbers);
listOfArrays.Add(numbers);
}
// Sequential.
stopwatch.Start();
long syncSum = 0;
foreach (var array in listOfArrays)
{
syncSum += Sum(array);
}
stopwatch.Stop();
var sequentialDuration = stopwatch.Elapsed.TotalMilliseconds;
Console.WriteLine($"Sequential sum: {syncSum}");
Console.WriteLine($"Sequential duration: {sequentialDuration} ms");
// Parallel.
stopwatch.Restart();
List<Task<long>> tasks = new();
foreach (var array in listOfArrays)
{
tasks.Add(Task.Run(() => Sum(array)));
}
var partialSums = await Task.WhenAll(tasks);
long parallelSum = partialSums.Sum();
stopwatch.Stop();
var parallelDuration = stopwatch.Elapsed.TotalMilliseconds;
Console.WriteLine($"Parallel sum: {parallelSum}");
Console.WriteLine($"Parallel duration: {parallelDuration} ms");
Console.WriteLine($"Factor: {sequentialDuration / parallelDuration:0.00}x");
}
Timing: (screenshot omitted)
I don't know if it helps figure out what went wrong in the first approach.
The asynchronous pattern is not the same as running code in parallel. The main reason for asynchronous code is better resource utilization while the computer is waiting for some kind of IO device. Your code would be better described as parallel computing or concurrent computing.
While your example should work fine, it may not be the easiest nor the optimal way to do it. The easiest option would probably be Parallel LINQ: numbers.AsParallel().Sum();. There is also a Parallel.For method that should be better suited, including an overload that maintains a thread-local state. Note that while Parallel.For will attempt to optimize its partitioning, you probably want to process chunks of data in each iteration to reduce overhead; I would try around 1-10k values or so.
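As a minimal sketch of that thread-local-state idea (assuming the numbers array from the question, and using a range partitioner so each iteration handles a chunk rather than a single element):

using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

long total = 0;
Parallel.ForEach(
    Partitioner.Create(0, numbers.Length), // hand out index ranges, not single items
    () => 0L,                              // one running sum per thread
    (range, state, local) =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
            local += numbers[i];           // no synchronization in the hot loop
        return local;
    },
    local => Interlocked.Add(ref total, local)); // merge each thread's sum once

Each thread accumulates privately, and synchronization happens only once per thread at the end; the localInit/localFinally overloads of Parallel.For work on the same principle.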
We can only guess at the reason your parallel method is slower. Summing numbers is a really fast operation, so the computation may be limited by memory bandwidth or cache usage. And while you want your work partitions to be fairly large, using too-large partitions may result in less overall parallelism if a thread gets suspended for any reason. You may also want partitions of certain sizes to work well with the caching system; see cache associativity. It is also possible you are including things you did not intend to measure, like JIT compilation times or GCs; see BenchmarkDotNet, which takes care of many of the edge cases when measuring performance.
Also, never use DateTime for measuring performance; Stopwatch is both much easier to use and much more accurate.
My machine has 4GB RAM, so initializing an int[1_000_000_000] results in memory paging. Going from int[100_000_000] to int[1_000_000_000] results in non-linear performance degradation (100x instead of 10x). Essentially a CPU-bound operation becomes I/O-bound. Instead of adding numbers, the program spends most of its time reading segments of the array from the disk. In these conditions using multiple threads can be detrimental for the overall performance, because the pattern of accessing the storage device becomes more erratic and less streamlined.
Maybe something similar happens on your 8GB RAM machine too, but I can't say for sure.

Parallel.ForEach search doesn't find the correct value

This is my first attempt at parallel programming.
I'm writing a test console app before using this in my real app and I can't seem to get it right. When I run this, the parallel search is always faster than the sequential one, but the parallel search never finds the correct value. What am I doing wrong?
I tried it without using a partitioner (just Parallel.For); it was slower than the sequential loop and gave the wrong number. I saw a Microsoft doc that said for simple computations, using Partitioner.Create can speed things up. So I tried that but still got the wrong values. Then I saw Interlocked, but I think I'm using it wrong.
Any help would be greatly appreciated
Random r = new Random();
Stopwatch timer = new Stopwatch();
do {
// Make and populate a list
List<short> test = new List<short>();
for (int x = 0; x <= 10000000; x++)
{
test.Add((short)(r.Next(short.MaxValue) * r.NextDouble()));
}
// Initialize result variables
short rMin = short.MaxValue;
short rMax = 0;
// Do min/max normal search
timer.Start();
foreach (var amp in test)
{
rMin = Math.Min(rMin, amp);
rMax = Math.Max(rMax, amp);
}
timer.Stop();
// Display results
Console.WriteLine($"rMin: {rMin} rMax: {rMax} Time: {timer.ElapsedMilliseconds}");
// Initialize parallel result variables
short pMin = short.MaxValue;
short pMax = 0;
// Create list partitioner
var rangePartitioner = Partitioner.Create(0, test.Count);
// Do min/max parallel search
timer.Restart();
Parallel.ForEach(rangePartitioner, (range, loop) =>
{
short min = short.MaxValue;
short max = 0;
for (int i = range.Item1; i < range.Item2; i++)
{
min = Math.Min(min, test[i]);
max = Math.Max(max, test[i]);
}
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMin), Math.Min(pMin, min));
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMax), Math.Max(pMax, max));
});
timer.Stop();
// Display results
Console.WriteLine($"pMin: {pMin} pMax: {pMax} Time: {timer.ElapsedMilliseconds}");
Console.WriteLine("Press enter to run again; any other key to quit");
} while (Console.ReadKey().Key == ConsoleKey.Enter);
Sample output:
rMin: 0 rMax: 32746 Time: 106
pMin: 0 pMax: 32679 Time: 66
Press enter to run again; any other key to quit
The correct way to do a parallel search like this is to compute local values for each thread used, and then merge the values at the end. This ensures that synchronization is only needed at the final phase:
var items = Enumerable.Range(0, 10000).ToList();
int globalMin = int.MaxValue;
int globalMax = int.MinValue;
Parallel.ForEach<int, (int Min, int Max)>(
items,
() => (int.MaxValue, int.MinValue), // Create new min/max values for each thread used
(item, state, localMinMax) =>
{
var localMin = Math.Min(item, localMinMax.Min);
var localMax = Math.Max(item, localMinMax.Max);
return (localMin, localMax); // return the new min/max values for this thread
},
localMinMax => // called one last time for each thread used
{
lock(items) // Since this may run concurrently, synchronization is needed
{
globalMin = Math.Min(globalMin, localMinMax.Min);
globalMax = Math.Max(globalMax, localMinMax.Max);
}
});
As you can see, this is quite a bit more complex than a regular loop, and it is not even doing anything fancy like partitioning. An optimized solution would work over larger blocks to reduce overhead, but that is omitted for simplicity, and it looks like the OP is aware of such issues already.
Be aware that multithreaded programming is difficult. While it is a great idea to try out such techniques in a playground rather than a real program, I would still suggest that you start by studying the potential dangers of thread safety; good resources about this are fairly easy to find.
Not all problems will be as obviously wrong as this one, and it is quite easy to cause issues that break once in a million runs, or only when the CPU load is high, or only on single-CPU systems, or issues that are only detected long after the code is put into production. It is good practice to be paranoid whenever multiple threads may read and write the same memory concurrently.
I would also recommend learning about immutable data types, and pure functions, since these are much safer and easier to reason about once multiple threads are involved.
Interlocked.Exchange is thread-safe only for the exchange itself; each surrounding Math.Min and Math.Max call can still race. You should compute min/max for every batch separately and then join the results.
Using low-lock techniques like the Interlocked class is tricky and advanced. Taking into consideration that your experience in multithreading is not extensive, I would say go with a simple and trusty lock:
object locker = new object();
//...
lock (locker)
{
pMin = Math.Min(pMin, min);
pMax = Math.Max(pMax, max);
}
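If you do later want the low-lock version, the canonical pattern is a CompareExchange retry loop. A sketch (not from the original answers; note it needs int fields, since Interlocked has no overloads for short):

using System.Threading;

static class AtomicMath
{
    // Lock-free "store if smaller": retry until our value is written,
    // or until another thread has already stored something smaller.
    public static void Min(ref int target, int value)
    {
        int current = Volatile.Read(ref target);
        while (value < current)
        {
            int witnessed = Interlocked.CompareExchange(ref target, value, current);
            if (witnessed == current)
                break;           // we won the race; value is now stored
            current = witnessed; // lost the race; re-check against the new value
        }
    }
}

A Max variant just flips the comparison. Each partition would call AtomicMath.Min once with its local minimum (after changing pMin/pMax to int), so contention stays low.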

Why is parallel processing much slower on the first call in C#?

I am trying to process numbers as fast as possible with a C# app. I use Thread.Sleep() to simulate processing and random numbers as the data. I use 3 different techniques.
This is the test code I used:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Test
{
internal class Program
{
private static void Main()
{
var data = new int[500000];
var random = new Random();
for (int i = 0; i < 500000; i++)
{
data[i] = random.Next();
}
var partialTimes = new Dictionary<int, double>();
var iterations = 5;
for (int i = 1; i < iterations + 1; i++)
{
Console.Write($"ProcessData3 {i}\t");
StartProcessing(data, partialTimes, ProcessData3);
GC.Collect();
}
Console.WriteLine();
Console.WriteLine("Press Enter to Exit");
Console.ReadLine();
}
private static void StartProcessing(int[] data, Dictionary<int, double> partialTimes, Action<int[], Dictionary<int, double>> processData)
{
var stopwatch = Stopwatch.StartNew();
try
{
processData?.Invoke(data, partialTimes);
stopwatch.Stop();
Console.WriteLine($"{stopwatch.Elapsed.ToString(@"mm\:ss\:fffffff")} total = {partialTimes.Sum(s => s.Value)} max = {partialTimes.Values.Max()}");
}
finally
{
partialTimes.Clear();
}
}
private static void ProcessData1(int[] data, Dictionary<int, double> partialTimes)
{
Parallel.ForEach(data, number =>
{
var partialStopwatch = Stopwatch.StartNew();
Thread.Sleep(1);
partialStopwatch.Stop();
lock (partialTimes)
{
partialTimes[number] = partialStopwatch.Elapsed.TotalMilliseconds;
}
});
}
private static void ProcessData3(int[] data, Dictionary<int, double> partialTimes)
{
// Partition the entire source array.
var rangePartitioner = Partitioner.Create(0, data.Length);
// Loop over the partitions in parallel.
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
// Loop over each range element without a delegate invocation.
for (int i = range.Item1; i < range.Item2; i++)
{
var number = data[i];
var partialStopwatch = Stopwatch.StartNew();
Thread.Sleep(1);
partialStopwatch.Stop();
lock (partialTimes)
{
partialTimes[number] = partialStopwatch.Elapsed.TotalMilliseconds;
}
}
});
}
private static void ProcessData2(int[] data, Dictionary<int, double> partialTimes)
{
var tasks = new Task[data.Length];
for (int i = 0; i < data.Length; i++)
{
var number = data[i];
tasks[i] = Task.Factory.StartNew(() =>
{
var partialStopwatch = Stopwatch.StartNew();
Thread.Sleep(1);
partialStopwatch.Stop();
lock (partialTimes)
{
partialTimes[number] = partialStopwatch.Elapsed.TotalMilliseconds;
}
});
}
Task.WaitAll(tasks);
}
}
}
For each technique I restart the program, and I get these results
with Thread.Sleep(1):
ProcessData1 1 00:56:1796688 total = 801335,282599955 max = 16,8783
ProcessData1 2 00:23:5390014 total = 816167,642100022 max = 14,5913
ProcessData1 3 00:14:7090566 total = 827589,675899998 max = 13,2617
ProcessData1 4 00:10:8929177 total = 829296,528300007 max = 15,0175
ProcessData1 5 00:10:6333310 total = 839282,123200008 max = 29,2738
ProcessData2 1 00:37:8084153 total = 824507,174200022 max = 112,071
ProcessData2 2 00:16:3762096 total = 849272,47810001 max = 77,1514
ProcessData2 3 00:12:9177717 total = 854012,353100029 max = 67,5684
ProcessData2 4 00:10:4798701 total = 857396,642899983 max = 92,9408
ProcessData2 5 00:09:2206146 total = 870966,655499989 max = 51,8945
ProcessData3 1 01:13:6814541 total = 803581,718699918 max = 25,6815
ProcessData3 2 01:07:9809277 total = 814069,532899922 max = 26,0671
ProcessData3 3 01:07:9857984 total = 814148,329399928 max = 21,3116
ProcessData3 4 01:07:4812183 total = 808042,695499966 max = 16,8601
ProcessData3 5 01:07:2954614 total = 805895,325499903 max = 23,8517
where
total is the total time spent inside the Parallel.ForEach() body, summed across all iterations, and
max is the maximum time of a single iteration.
Why is the first loop so slow? How is it possible that the other attempts are processed so quickly? How can I achieve faster parallel processing on the first attempt?
EDIT:
So I also tried it with Thread.Sleep(10).
The results are:
ProcessData1 1 02:50:2845698 total = 5109831,95429994 max = 12,0612
ProcessData1 2 00:56:3361645 total = 5125884,05919954 max = 12,7666
ProcessData1 3 00:53:4911541 total = 5131105,15209993 max = 12,7486
ProcessData1 4 00:49:5665628 total = 5144654,75829992 max = 13,2678
ProcessData1 5 00:46:0218194 total = 5152955,19509996 max = 13,702
ProcessData2 1 01:21:7207557 total = 5121889,31579983 max = 73,8152
ProcessData2 2 00:39:6660074 total = 5175557,68889969 max = 59,369
ProcessData2 3 00:31:9036416 total = 5193819,89889973 max = 56,2895
ProcessData2 4 00:27:4616803 total = 5207168,56969977 max = 65,5495
ProcessData2 5 00:24:4270755 total = 5222567,9044998 max = 65,368
ProcessData3 1 02:44:9985645 total = 5110117,19019997 max = 11,7172
ProcessData3 2 02:25:6533128 total = 5237779,27010012 max = 26,3171
ProcessData3 3 02:22:2771259 total = 5116123,45259975 max = 12,0581
ProcessData3 4 02:22:1678911 total = 5112574,93779995 max = 11,5334
ProcessData3 5 02:21:9418178 total = 5104980,07120004 max = 11,5583
So the first loop still takes many more seconds than the others…
The behavior you're seeing is entirely explained by the fact that the ThreadPool class delays creating new threads until some small amount of time has passed (on the order of 1 second…it's changed over the years).
It can be informative to add instrumentation to one's program. In your example, a very useful tool is to count the number of concurrent threads as managed by the thread pool, determine the "high water mark" (i.e. the maximum number of threads it eventually settles on), and then use that number to override the thread pool's behavior.
When I did that, I discovered that on the first run of the first method, you get up to about 25 threads. But since the default for the thread pool is to only create a number of threads equal to the number of cores on your computer (eight, in my case), creating the additional threads can take a fair amount of time. And of course, during that time you get significantly less throughput than you would otherwise, so the total penalty is larger than just the 20 seconds or so it takes to ramp up to that thread count.
On the subsequent runs of that test, the max number of threads gradually rises (since each new run starts with the threads already left in the pool by the previous run) and gets as high as around 53.
If you know in advance how many threads the thread pool is going to require in order to perform your work efficiently, you can use the SetMinThreads() method to increase the number of threads it will create immediately on demand before switching to the throttled thread-creation algorithm. For example, having that 53 thread high water mark in hand, you can set the number of minimum threads to that number (or a nice round one, like 50).
When I do that, all five runs of your first test, which previously took between 25 seconds and 1 minute (with the longer runs being earlier, of course), take around 19 seconds to complete.
I'd like to emphasize that you should use SetMinThreads() very carefully. The thread pool is, in general, very good about managing work-loads. The scenario you present above is obviously just for the sake of example and not realistic, but it does have the problem that you're not really doing that much work in each Parallel.ForEach() iteration in the first place. It doesn't seem like a good fit for concurrency, since so much of the time spent will be on overhead. Using SetMinThreads() in any similar scenario just papers over a more insidious underlying issue.
You'll find that if you tailor your workloads to better match available resources, and to minimize transitions between tasks and threads, you can get good throughput without overriding the default thread pool numbers.
Some other notes on this particular test…
Note that if you change the program to run all three tests in the same session (five runs each), the "first run is longer" happens only for the first test. For future reference, you should always approach this sort of "first time is slower" question with an eye to testing different combinations and ordering, to verify whether it's a particular implementation that suffers from the effect, or if you see the effect for the first test, regardless of which implementation is run first. There are a number of implementation and platform details, including JIT, thread pool, disk cache that can affect the initial run of any algorithm, and you'll want to make sure that you quickly narrow down your search to knowing whether you're dealing with one of those or some genuine issue in your own algorithm.
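One low-tech way to separate those startup effects from the algorithm itself is an explicit warm-up pass before the timed runs. A sketch (not in the original test program), reusing the question's ProcessData1, data, and partialTimes:

// Run once untimed, so JIT compilation and thread pool ramp-up
// are paid for before the measurement starts.
ProcessData1(data, partialTimes);
partialTimes.Clear();

var stopwatch = Stopwatch.StartNew();
ProcessData1(data, partialTimes);
stopwatch.Stop();
Console.WriteLine($"warm run: {stopwatch.Elapsed}");

If the "first run is slower" effect disappears after a warm-up, you know you were measuring startup costs rather than the algorithm.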
By the way, not that it really matters for your question, but I find your choice to use the random number in the data array as the key for your timings dictionary odd. This IMHO renders those timing values useless, due to collisions in the random numbers. You won't count every time (when there's a collision, only the last instance of that number gets stored), which means the "total" time displayed is less than the true total time spent, and even the max values won't necessarily be correct (if the true max value gets overwritten by a later value using the same key, you'll miss it).
Here's my modified version of your first test, which shows both the diagnostic code I added, and (commented out) the statements to set the thread pool counts to produce faster, more consistent behavior:
private static int _threadCount1;
private static int _maxThreadCount1;
private static void ProcessData1(int[] data, Dictionary<int, double> partialTimes)
{
const int minOverride = 50;
int minMain, minIOCP, maxMain, maxIOCP;
ThreadPool.GetMinThreads(out minMain, out minIOCP);
ThreadPool.GetMaxThreads(out maxMain, out maxIOCP);
Console.WriteLine($"cores: {Environment.ProcessorCount}");
Console.WriteLine($"threads: {minMain} min, {maxMain} max");
// Uncomment two lines below to see uniform behavior across test runs:
//ThreadPool.SetMinThreads(minOverride, minIOCP);
//ThreadPool.SetMaxThreads(minOverride, maxIOCP);
_threadCount1 = _maxThreadCount1 = 0;
Parallel.ForEach(data, number =>
{
int threadCount = Interlocked.Increment(ref _threadCount1);
var partialStopwatch = Stopwatch.StartNew();
Thread.Sleep(1);
partialStopwatch.Stop();
lock (partialTimes)
{
partialTimes[number] = partialStopwatch.Elapsed.TotalMilliseconds;
if (_maxThreadCount1 < threadCount)
{
_maxThreadCount1 = threadCount;
}
}
Interlocked.Decrement(ref _threadCount1);
});
ThreadPool.SetMinThreads(minMain, minIOCP);
ThreadPool.SetMaxThreads(maxMain, maxIOCP);
Console.WriteLine($"max thread count: {_maxThreadCount1}");
}

What do these methods do?

I've found two different methods to get a max value from an array, but I'm not really familiar with parallel programming, so I really don't understand it.
I was wondering: do these methods do the same thing, or am I missing something?
I really don't have much information about them. Not even comments...
The first method:
int[] vec = ... (I guess the content doesn't matter)
static int naiveMax()
{
int max = vec[0];
object obj = new object();
Parallel.For(0, vec.Length, i =>
{
lock (obj) {
if (vec[i] > max) max = vec[i];
}
});
return max;
}
And the second one:
static int Max()
{
int max = vec[0];
object obj = new object();
Parallel.For(0, vec.Length, //could be Parallel.For<int>
() => vec[0],
(i, loopState, partial) =>
{
if(vec[i]>partial) partial = vec[i];
return partial;
},
partial => {
lock (obj) {
if( partial > max) max = partial;
}
});
return max;
}
Do these do the same thing or something different, and what? Thanks ;)
Both find the maximum value in an array of integers. In an attempt to find the maximum value faster, they do it "in parallel" using the Parallel.For method. Both methods fail at this, though.
To see this, we first need a sufficiently large array of integers. For small arrays, parallel processing doesn't give us a speed-up anyway.
int[] values = new int[100000000];
Random random = new Random();
for (int i = 0; i < values.Length; i++)
{
values[i] = random.Next();
}
Now we can run the two methods and see how long they take. Using an appropriate performance measurement setup (Stopwatch, array of 100,000,000 integers, 100 iterations, Release build, no debugger attached, JIT warm-up) I get the following results on my machine:
naiveMax 00:06:03.3737078
Max 00:00:15.2453303
So Max is much much better than naiveMax (6 minutes! cough).
But how does it compare to, say, PLINQ?
static int MaxPlinq(int[] values)
{
return values.AsParallel().Max();
}
MaxPlinq 00:00:11.2335842
Not bad, saved a few seconds. Now, what about a plain, old, sequential for loop for comparison?
static int Simple(int[] values)
{
int result = values[0];
for (int i = 0; i < values.Length; i++)
{
if (result < values[i]) result = values[i];
}
return result;
}
Simple 00:00:05.7837002
I think we have a winner.
Lesson learned: Parallel.For is not pixie dust that you can sprinkle over your code to make it magically run faster. If performance matters, use the right tools and measure, measure, measure…
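For the measuring part, a minimal harness using BenchmarkDotNet (mentioned in an earlier answer) might look like this sketch; the class and method names are my own illustration:

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class MaxBenchmarks
{
    private int[] values;

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random(42); // fixed seed for repeatable runs
        values = new int[100_000_000];
        for (int i = 0; i < values.Length; i++)
            values[i] = random.Next();
    }

    [Benchmark(Baseline = true)]
    public int Simple()
    {
        int result = values[0];
        for (int i = 1; i < values.Length; i++)
            if (result < values[i]) result = values[i];
        return result;
    }

    [Benchmark]
    public int Plinq() => values.AsParallel().Max();
}

// In Main: BenchmarkRunner.Run<MaxBenchmarks>();

BenchmarkDotNet handles warm-up, multiple iterations, and statistical outliers, which a hand-rolled Stopwatch loop does not.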
They appear to do the same thing; however, they are very inefficient. The point of parallelization is to improve the speed of code that can be executed independently. Due to race conditions, discovering the maximum (as implemented here) requires an atomic semaphore/lock around the actual logic, which means you're spinning up many threads and related resources simply to run the code sequentially anyway, defeating the purpose of parallelization entirely.

What's wrong in terms of performance with this code? List.Contains, random usage, threading?

I have a local class with a method used to build a list of strings, and I'm finding that when I hit this method (in a for loop of 1000 iterations) it often doesn't return the number of items I request.
I have a global variable:
string[] cachedKeys
A parameter passed to the method:
int requestedNumberToGet
The method looks similar to this:
List<string> keysToReturn = new List<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ?
cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
//Do we have enough to fill the request within the time? otherwise give
//however many we currently have
while (DateTime.Now < breakoutTime
&& keysToReturn.Count < numberPossibleToGet
&& cachedKeys.Length >= numberPossibleToGet)
{
string randomKey = cachedKeys[rand.Next(0, cachedKeys.Length)];
if (!keysToReturn.Contains(randomKey))
keysToReturn.Add(randomKey);
}
if (keysToReturn.Count != numberPossibleToGet)
Debugger.Break();
I have approximately 40 strings in cachedKeys none exceeding 15 characters in length.
I'm no expert with threading, so I'm literally just calling this method 1000 times in a loop and consistently hitting that Debugger.Break().
The machine this is running on is a fairly beefy desktop so I would expect the breakout time to be realistic, in fact it randomly breaks at any point of the loop (I've seen 20s, 100s, 200s, 300s).
Any one have any ideas where I'm going wrong with this?
Edit: Limited to .NET 2.0
Edit: The purpose of the breakout is so that if the method is taking too long to execute, the client (several web servers using the data for XML feeds) won't have to wait while the other project dependencies initialise, they'll just be given 0 results.
Edit: Thought I'd post the performance stats
Original
'0.0042477465711424217323710136' - 10
'0.0479597267250446634977350473' - 100
'0.4721072091564710039963179678' - 1000
Skeet
'0.0007076318358897569383818334' - 10
'0.007256508857969378789762386' - 100
'0.0749829936486341141122684587' - 1000
Freddy Rios
'0.0003765841748043396576939248' - 10
'0.0046003053460705201359390649' - 100
'0.0417058592642360970458535931' - 1000
Why not just take a copy of the list - O(n) - shuffle it, also O(n) - and then return the number of keys that have been requested. In fact, the shuffle only needs to be O(nRequested). Keep swapping a random member of the unshuffled bit of the list with the very start of the unshuffled bit, then expand the shuffled bit by 1 (just a notional counter).
EDIT: Here's some code which yields the results as an IEnumerable<T>. Note that it uses deferred execution, so if you change the source that's passed in before you first start iterating through the results, you'll see those changes. After the first result is fetched, the elements will have been cached.
static IEnumerable<T> TakeRandom<T>(IEnumerable<T> source,
int sizeRequired,
Random rng)
{
List<T> list = new List<T>(source);
sizeRequired = Math.Min(sizeRequired, list.Count);
for (int i=0; i < sizeRequired; i++)
{
int index = rng.Next(list.Count-i);
T selected = list[i + index];
list[i + index] = list[i];
list[i] = selected;
yield return selected;
}
}
The idea is that at any point after you've fetched n elements, the first n elements of the list will be those elements - so we make sure that we don't pick those again. We then pick a random element from "the rest", swap it into the right position, and yield it.
Hope this helps. If you're using C# 3 you might want to make this an extension method by putting "this" in front of the first parameter.
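A hypothetical call site, assuming the question's cachedKeys and requestedNumberToGet (ToList needs System.Linq):

Random rng = new Random();
List<string> keys = TakeRandom(cachedKeys, requestedNumberToGet, rng).ToList();

Materializing with ToList forces the deferred execution immediately, so later changes to cachedKeys won't affect the result.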
The main issue is using retries in a random scenario to ensure you get unique values. This quickly gets out of control, especially if the number of items requested is close to the number of items available; i.e. if you increase the number of keys, you will see the issue less often, but it can be avoided altogether.
The following method does it by keeping a list of the keys remaining.
List<string> GetSomeKeys(string[] cachedKeys, int requestedNumberToGet)
{
int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
List<string> keysRemaining = new List<string>(cachedKeys);
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Random rand = new Random();
for (int i = 0; i < numberPossibleToGet; i++)
{
int randomIndex = rand.Next(keysRemaining.Count);
keysToReturn.Add(keysRemaining[randomIndex]);
keysRemaining.RemoveAt(randomIndex);
}
return keysToReturn;
}
The timeout was necessary in your version, as you could potentially keep retrying to get a value for a long time, especially when you wanted to retrieve the whole list, in which case you would almost certainly get a failure with the version that relies on retries.
Update: The above performs better than these variations:
List<string> GetSomeKeysSwapping(string[] cachedKeys, int requestedNumberToGet)
{
int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
List<string> keys = new List<string>(cachedKeys);
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Random rand = new Random();
for (int i = 0; i < numberPossibleToGet; i++)
{
int index = rand.Next(numberPossibleToGet - i) + i;
keysToReturn.Add(keys[index]);
keys[index] = keys[i];
}
return keysToReturn;
}
List<string> GetSomeKeysEnumerable(string[] cachedKeys, int requestedNumberToGet)
{
Random rand = new Random();
return TakeRandom(cachedKeys, requestedNumberToGet, rand).ToList();
}
Some numbers with 10,000 iterations:
Function Name Elapsed Inclusive Time Number of Calls
GetSomeKeys 6,190.66 10,000
GetSomeKeysEnumerable 15,617.04 10,000
GetSomeKeysSwapping 8,293.64 10,000
A few thoughts.
First, your keysToReturn list is potentially being added to each time through the loop, right? You're creating an empty list and then adding each new key to it. Since the list was not pre-sized, each add can trigger an O(n) resize operation (see the MSDN documentation). To fix this, try pre-sizing your list like this.
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ? cachedKeys.Length : requestedNumberToGet;
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Second, your breakout time is unrealistic (ok, ok, impossible) on Windows. All of the information I've ever read on Windows timing suggests that the best you can possibly hope for is 10 millisecond resolution, but in practice it's more like 15-18 milliseconds. In fact, try this code:
for (int iv = 0; iv < 10000; iv++) {
Console.WriteLine( DateTime.Now.Millisecond.ToString() );
}
What you'll see in the output are discrete jumps. Here is a sample output that I just ran on my machine.
13
...
13
28
...
28
44
...
44
59
...
59
75
...
The millisecond value jumps from 13 to 28 to 44 to 59 to 75. That's roughly a 15-16 millisecond resolution in the DateTime.Now function for my machine. This behavior is consistent with what you'd see in the C runtime ftime() call. In other words, it's a systemic trait of the Windows timing mechanism. The point is, you should not rely on a consistent 5 millisecond breakout time because you won't get it.
Third, am I right to assume that the breakout time is there to prevent the main thread from locking up? If so, then it'd be pretty easy to spawn your function off to a ThreadPool thread and let it run to completion regardless of how long it takes. Your main thread could then operate on the data.
Use a HashSet instead; HashSet is much faster for lookups than List:
HashSet<string> keysToReturn = new HashSet<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ? cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
int length = cachedKeys.Length;
while (DateTime.Now < breakoutTime && keysToReturn.Count < numberPossibleToGet) {
int i = rand.Next(0, length);
while (!keysToReturn.Add(cachedKeys[i])) {
i++;
if (i == length)
i = 0;
}
}
Consider using Stopwatch instead of DateTime.Now. It may simply be down to the inaccuracy of DateTime.Now when you're talking about milliseconds.
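A sketch of the question's loop with the breakout rewritten around Stopwatch (same variables as the original method; Stopwatch lives in System.Diagnostics):

Stopwatch breakout = Stopwatch.StartNew();
while (breakout.ElapsedMilliseconds < 5
       && keysToReturn.Count < numberPossibleToGet
       && cachedKeys.Length >= numberPossibleToGet)
{
    string randomKey = cachedKeys[rand.Next(0, cachedKeys.Length)];
    if (!keysToReturn.Contains(randomKey))
        keysToReturn.Add(randomKey);
}

Stopwatch is backed by the high-resolution performance counter, so the 5 ms budget is actually honored instead of being rounded up to the ~15 ms timer tick.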
The problem could quite possibly be here:
if (!keysToReturn.Contains(randomKey))
keysToReturn.Add(randomKey);
This requires iterating over the list to determine whether the key is already in the return list. To be sure, though, you should try profiling this with a tool. Also, 5 ms (0.005 seconds) is pretty fast; you may want to increase that.
