Deleting from array, mirrored (strange) behavior - c#

The title may seem a little odd, because I have no idea how to describe this in one sentence.
For the course Algorithms we have to micro-optimize some things; one of them is finding out how deleting from an array works. The assignment is to delete something from an array and re-align the contents so that there are no gaps. I think it is quite similar to how std::vector::erase works in C++.
Because I like the idea of understanding everything low-level, I went a little further and tried to benchmark my solutions. This produced some weird results.
At first, here is a little code that I used:
class Test {
    Stopwatch sw;
    Obj[] objs;

    public Test() {
        this.sw = new Stopwatch();
        this.objs = new Obj[1000000];
        // Fill objs
        for (int i = 0; i < objs.Length; i++) {
            objs[i] = new Obj(i);
        }
    }

    public void test() {
        // Time deletion
        sw.Restart();
        deleteValue(400000, objs);
        sw.Stop();
        // Show timings
        Console.WriteLine(sw.Elapsed);
    }

    // Delete function
    // value is the to-search-for item in the list of objects
    private static void deleteValue(int value, Obj[] list) {
        for (int i = 0; i < list.Length; i++) {
            if (list[i].Value == value) {
                for (int j = i; j < list.Length - 1; j++) {
                    list[j] = list[j + 1];
                    //if (list[j + 1] == null) {
                    //    break;
                    //}
                }
                list[list.Length - 1] = null;
                break;
            }
        }
    }
}
I would just create this class and call the test() method. I did this in a loop 25 times.
My findings:
The first round takes a lot longer than the other 24; I think this is because of caching, but I am not sure.
When I use a value near the start of the list, more items have to be moved in memory than with a value near the end, yet it still seems to take less time.
Bench times differ quite a bit.
When I enable the commented-out if, performance goes up (10-20%), even if the value I search for is almost at the end of the list (which means the if fires many times without actually being useful).
I have no idea why these things happen; can someone explain (some of) them? And if someone who is a pro at this sees this: where can I find more info on doing this in the most efficient way?
Edit after testing:
I did some testing and found some interesting results. I ran the test on an array of a million items, filled with a million objects. I ran that 25 times and recorded the cumulative time in milliseconds. I did that 10 times and took the average as the final value.
When I run the test with my function described just above here I get a score of:
362,1
When I run it with the answer of dbc I get a score of:
846,4
So mine was faster, but then I started to experiment with a half-empty array and things started to get weird. To get rid of the inevitable NullReferenceExceptions I added an extra check to the if (thinking it would hurt the performance a bit more), like so:
if (fromItem != null && fromItem.Value != value)
    list[to++] = fromItem;
This seemed to not only work, but improve performance dramatically! Now I get a score of:
247,9
The weird thing is, the scores seem too low to be true, but they sometimes spike; this is the set I took the average from:
94, 26, 966, 36, 632, 95, 47, 35, 109, 439
So the extra evaluation seems to improve my performance, despite doing an extra check. How is this possible?

You are using Stopwatch to time your method. This calculates the total clock time taken during your method call, which could include the time required for .Net to initially JIT your method, interruptions for garbage collection, or slowdowns caused by system loads from other processes. Noise from these sources will likely dominate noise due to cache misses.
This answer gives some suggestions as to how you can minimize some of the noise from garbage collection or other processes. To eliminate JIT noise, you should call your method once without timing it -- or show the time taken by the first call in a separate column in your results table since it will be so different. You might also consider using a proper profiler which will report exactly how much time your code used exclusive of "noise" from other threads or processes.
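As a sketch of that advice (the Work method here is only a placeholder for the code under test), a warm-up call plus a forced collection before each timed run looks like this:

```csharp
using System;
using System.Diagnostics;

class Benchmark
{
    // Placeholder for the code under test - substitute the deleteValue
    // call from the question here.
    static void Work()
    {
        for (int i = 0; i < 1000; i++) { /* ... */ }
    }

    static void Main()
    {
        Work(); // warm-up call, so JIT compilation is excluded from the timings

        var sw = new Stopwatch();
        for (int run = 0; run < 25; run++)
        {
            // Reduce GC noise: collect before the timed region, not inside it.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            sw.Restart();
            Work();
            sw.Stop();
            Console.WriteLine(sw.Elapsed);
        }
    }
}
```

With this structure, the first timed run should no longer be an outlier, since the JIT cost was paid by the warm-up call.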
Finally, I'll note that your algorithm to remove matching items from an array and shift everything else down uses a nested loop, which is not necessary and will access items in the array after the matching index twice. The standard algorithm looks like this:
public static void RemoveFromArray(this Obj[] array, int value)
{
    int to = 0;
    for (int from = 0; from < array.Length; from++)
    {
        var fromItem = array[from];
        if (fromItem.Value != value)
            array[to++] = fromItem;
    }
    for (; to < array.Length; to++)
    {
        array[to] = default(Obj);
    }
}
However, instead of using the standard algorithm, you might experiment with Array.Copy() to do the shift in your version, since (I believe) internally it performs the block move in native code.
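For comparison, here is a sketch of the question's single-element removal that delegates the shift to Array.Copy, which performs the block move in native code (the Obj class mirrors the one in the question):

```csharp
using System;

class Obj
{
    public int Value;
    public Obj(int value) { Value = value; }
}

static class ArrayRemove
{
    // Removes the first element whose Value matches, then shifts the tail
    // down with a single Array.Copy (a native block move) rather than an
    // element-by-element loop. Assumes the array has no null entries
    // before the removal point.
    public static void RemoveFromArray(Obj[] array, int value)
    {
        int i = Array.FindIndex(array, o => o.Value == value);
        if (i < 0)
            return;
        Array.Copy(array, i + 1, array, i, array.Length - i - 1);
        array[array.Length - 1] = null;
    }
}
```

This keeps the same semantics as the question's deleteValue (remove the first match, null the last slot) while avoiding the hand-written inner loop.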

Related

What is the best way to use Parallel.For in a recursive algorithm?

I am building software to evaluate many possible solutions and am trying to introduce parallel processing to speed up the calculations. My first attempt was to build a DataTable with each row being a solution to evaluate, but building the DataTable takes quite some time, and I run into memory issues when the number of possible solutions goes into the millions.
The problem which warrants these solutions is structured as follows:
There is a range of dates for x number of events which must be done in order. The solutions to evaluate could look as follows, with each solution being a row, the events being the columns and the day numbers being the values.
Given 3 days (0 to 2) and three events:
0 0 0
0 0 1
0 0 2
0 1 1
0 1 2
0 2 2
1 1 1
1 1 2
1 2 2
2 2 2
My new plan was to use recursion and evaluate the solutions as I go rather than build a solution set to then evaluate.
for (int day = 0; day < maxdays; day++)
{
    List<int> mydays = new List<int>();
    mydays.Add(day);
    EvalEvent(0, day, mydays);
}

private void EvalEvent(int eventnum, int day, List<int> mydays)
{
    // events must be on same day or after previous events
    Parallel.For(day, maxdays, day2 =>
    {
        List<int> mydays2 = new List<int>();
        for (int a = 0; a < mydays.Count; a++)
        {
            mydays2.Add(mydays[a]);
        }
        mydays2.Add(day2);
        if (eventnum < eventcount - 1) // proceed to next event
        {
            EvalEvent(eventnum + 1, day2, mydays2);
        }
        else
        {
            EvalSolution(mydays2);
        }
    });
}
My question is whether this is actually an efficient use of parallel processing, or whether too many threads will be spawned and slow it down. Should the parallel loop only be done on the last (or last few) values of eventnum, or is there a better way to approach the problem?
The requested old code is pretty much as follows:
private int daterange;
private int events;

private void ScheduleIt()
{
    daterange = 10;
    events = 6;
    CreateSolutions();
    int best = GetBest();
}

private DataTable Options;

private bool CreateSolutions()
{
    Options = new DataTable();
    Options.Columns.Add();
    for (int day1 = 0; day1 <= daterange; day1++)
    {
        Options.Rows.Add(day1);
    }
    for (int ev = 1; ev < events; ev++)
    {
        Options.Columns.Add();
        foreach (DataRow dr in Options.Rows)
        {
            dr[Options.Columns.Count - 1] = dr[Options.Columns.Count - 2];
        }
        int rows = Options.Rows.Count;
        for (int day1 = 1; day1 <= daterange; day1++)
        {
            for (int i = 0; i < rows; i++)
            {
                if (day1 > Convert.ToInt32(Options.Rows[i][Options.Columns.Count - 2]))
                {
                    try
                    {
                        Options.Rows.Add();
                        for (int col = 0; col < Options.Columns.Count - 1; col++)
                        {
                            Options.Rows[Options.Rows.Count - 1][col] = Options.Rows[i][col];
                        }
                        Options.Rows[Options.Rows.Count - 1][Options.Columns.Count - 1] = day1;
                    }
                    catch (Exception ex)
                    {
                        return false;
                    }
                }
            }
        }
    }
    return true;
}

private int GetBest()
{
    int bestopt = 0;
    double bestscore = 999999999;
    Parallel.For(0, Options.Rows.Count, opt =>
    {
        double score = 0;
        for (int i = 0; i < Options.Columns.Count; i++)
        {
            score += Convert.ToInt32(Options.Rows[opt][i]); // just a stand-in calc for a score
        }
        if (score < bestscore) // note: not thread-safe as written
        {
            bestscore = score;
            bestopt = opt;
        }
    });
    return bestopt;
}
Even if done without errors, it cannot significantly speed up your solution.
It looks like each level of recursion starts multiple (say up to k) next-level calls, across n levels. This essentially means the code is O(k^n), which grows very fast. Non-algorithmic speedup of such an O(k^n) solution is essentially useless (unless both k and n are very small). In particular, running the code in parallel only gives you a constant factor of speedup (roughly the number of threads supported by your CPUs), which does not change the complexity at all.
Indeed, creating an exponentially large number of requests for new threads will likely cause more problems by itself, just in managing the threads.
In addition to not significantly improving performance, parallel code is harder to write, as it needs proper synchronization or clever data partitioning - neither of which seems to be present in your case.
Parallelization works best when the workload is bulky and balanced. Ideally you would like your work split into as many independent partitions as there are logical processors on your machine, ensuring that all partitions have approximately the same size. This way all available processors will work with maximum efficiency for approximately the same duration, and you'll get the results after the shortest time possible.
Of course you should start with a working and bug-free serial implementation, and then think about ways to partition your work. The easiest way is usually not optimal. For example, an easy path is to convert your work to a LINQ query and then parallelize it with AsParallel() (making it PLINQ). This usually results in too granular a partitioning, which introduces too much overhead. If you can't find ways to improve it, you can then go the way of Parallel.For or Parallel.ForEach, which is a bit more complex.
A LINQ implementation should probably start by creating an iterator that produces all your units of work (Events or Solutions, it's not very clear to me).
public static IEnumerable<Solution> GetAllSolutions()
{
    for (int day = 0; day < 3; day++)
    {
        for (int ev = 0; ev < 3; ev++)
        {
            yield return new Solution(); // ???
        }
    }
}
It will certainly be helpful if you have created concrete classes to represent the entities you are dealing with.
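Putting that together, here is a hedged sketch (the EvalSolution scoring and the exact enumeration are stand-ins invented here, not the question's real types) of lazily generating the non-decreasing day assignments and letting PLINQ partition the evaluation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ParallelEval
{
    // Stand-in for the question's scoring routine.
    public static int EvalSolution(int[] days)
    {
        return days.Sum();
    }

    // Lazily yields every non-decreasing assignment of 'events' day
    // numbers in [0, maxDay] - i.e. the rows listed in the question.
    public static IEnumerable<int[]> GetAllSolutions(int maxDay, int events)
    {
        return Recurse(new int[events], 0, 0, maxDay, events);
    }

    static IEnumerable<int[]> Recurse(int[] current, int pos, int minDay,
                                      int maxDay, int events)
    {
        if (pos == events)
        {
            yield return (int[])current.Clone();
            yield break;
        }
        for (int day = minDay; day <= maxDay; day++)
        {
            current[pos] = day; // later events may not be earlier than this one
            foreach (var s in Recurse(current, pos + 1, day, maxDay, events))
                yield return s;
        }
    }

    static void Main()
    {
        // PLINQ partitions the stream over the available cores; Min
        // aggregates without any manual locking.
        int best = GetAllSolutions(2, 3).AsParallel()
                                        .Min(s => EvalSolution(s));
        Console.WriteLine(best); // prints 0, the score of solution 0 0 0
    }
}
```

With 3 days and 3 events this enumerates exactly the 10 rows shown in the question, and no solution set is ever materialized in memory.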

Should try-catch be avoided for known cases

I have a case which I know will happen, but very rarely - say once in every 10 thousand times the code runs.
I can check for this case with a simple if, but that if will run many times to no effect.
On the other hand, I can place the code in a try-catch block, and when that special case happens I do what is needed to recover.
The question is: which one is better? I know that generally speaking try-catch should not be used for known cases because of the overhead, and also because application logic should not rely on catch code, but running an if many times has its own performance cost. I have tested this with this small test code:
static void Main(string[] args)
{
    Stopwatch sc = new Stopwatch();
    var list = new List<int>();
    var rnd = new Random();
    for (int i = 0; i < 100000000; i++)
    {
        list.Add(rnd.Next());
    }
    sc.Start();
    DoWithIf(list);
    sc.Stop();
    Console.WriteLine($"Done with IFs in {sc.ElapsedMilliseconds} milliseconds");
    sc.Restart();
    DoWithTryCatch(list);
    sc.Stop();
    Console.WriteLine($"Done with TRY-CATCH in {sc.ElapsedMilliseconds} milliseconds");
    Console.ReadKey();
}

private static int[] DoWithTryCatch(List<int> list)
{
    var res = new int[list.Count - 1];
    try
    {
        for (int i = 0; i < list.Count; i++)
        {
            res[i] = list[i];
        }
        return res;
    }
    catch
    {
        return res;
    }
}

private static int[] DoWithIf(List<int> list)
{
    var res = new int[list.Count - 1];
    for (int i = 0; i < list.Count; i++)
    {
        if (i < res.Length)
            res[i] = list[i];
    }
    return res;
}
This code simply copies a lot of numbers into an array that is not big enough. On my machine, checking the array bounds each time takes around 210 milliseconds, while the try-catch version (which will hit the catch once) runs in around 190 milliseconds.
Also, if you think it depends on the case: my case is that I get push notifications in an app and check whether I already have the topic of the message. If not, I fetch and store the topic information for subsequent messages. There are many messages in a few topics.
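For the concrete push-notification case described, the usual pattern is a dictionary lookup with TryGetValue, which makes the "known case" check essentially free (the names and the fetch delegate here are invented for illustration):

```csharp
using System;
using System.Collections.Generic;

class TopicCache
{
    readonly Dictionary<string, string> topics = new Dictionary<string, string>();

    // Stand-in for the real call that fetches topic info from a server.
    readonly Func<string, string> fetchTopic;

    public TopicCache(Func<string, string> fetchTopic)
    {
        this.fetchTopic = fetchTopic;
    }

    // Returns cached topic info, fetching and storing it on first sight.
    public string GetTopicInfo(string topicId)
    {
        string info;
        if (!topics.TryGetValue(topicId, out info))
        {
            info = fetchTopic(topicId); // rare path: runs once per topic
            topics[topicId] = info;
        }
        return info;                    // common path: one cheap hash lookup
    }
}
```

The rare "topic not yet known" case is handled with an if on the lookup result, so neither exceptions nor repeated scans are needed.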
So, it would be accurate to say that in your test, the if option was slower than the try...catch option by 20 milliseconds, for a loop of 100,000,000 iterations.
That translates to 20 / 100,000,000 - that's 0.0000002 milliseconds per iteration.
Do you really think that kind of nano-optimization is worth writing code that goes against proper design standards?
Exceptions are for exceptional cases, the things that you can't control or can't test in advance - for instance, when you are reading data from a database and the connection terminates in the middle - stuff like that.
Using exceptions for things that can be easily tested with simple code - well, that's just plain wrong.
If, for instance, you would have demonstrated a meaningful performance difference between these two options then perhaps you could justify using try...catch instead of if - but that's clearly not the case here.
So, to summarize - use if, not try...catch.
You should design your code for clarity, not for performance.
Write code that conveys the algorithm it is implementing in the clearest way possible.
Set performance goals and measure your code's performance against them.
If your code doesn't measure to your performance goals, Find the bottle necks and treat them.
Don't go wasting your time on nano-optimizations when you design the code.
In your case, you have somehow missed the obvious optimization: if you worry that evaluating an if 100,000,000 times is too much... don't?
private static int[] DoWithIf(List<int> list)
{
    var res = new int[list.Count - 1];
    var bounds = Math.Min(res.Length, list.Count);
    for (int i = 0; i < bounds; i++)
    {
        res[i] = list[i];
    }
    return res;
}
So I know this is only a test case, but the answer is: optimize if you need it and for what you need it. If you have something in a loop that's supposedly costly, then try to move it out of the loop. Optimize based on logic, not based on compiler constructs. If you are down to optimizing compiler constructs, you should not be coding in a managed and/or high level language anyway.

Computing the most frequent element

I have recently come across this piece of code; what it does is go through an array and return the value that is seen most often. For example, for 1,1,2,1,3 it will return 1, because it appears more often than 2 and 3. What I am trying to do is understand how it works, so I stepped through it in Visual Studio, but it is not ringing any bells.
Can anyone help me understand what is going on here? It would be a total plus if someone could tell me what c does and what the logic behind the arguments in the if statements is.
int[] arr = a;
int c = 1, maxcount = 1, maxvalue = 0;
int result = 0;
for (int i = 0; i < arr.Length; i++)
{
    maxvalue = arr[i];
    for (int j = 0; j < arr.Length; j++)
    {
        if (maxvalue == arr[j] && j != i)
        {
            c++;
            if (c > maxcount)
            {
                maxcount = c;
                result = arr[i];
            }
        }
        else
        {
            c = 1;
        }
    }
}
return result;
EDIT: On closer examination, the code snippet has a nested loop and is conventionally counting occurrences, simply keeping track of the highest count seen so far and the element it belongs to, and keeping the two in sync.
That looks like an implementation of the Boyer-Moore majority vote counting algorithm. They have a nice illustration here.
The logic is simple, and is to compute the majority in a single pass, taking O(n) time. Note that majority here means that more than 50% of the array must be filled with that element. If there is no majority element, you get an "incorrect" result.
Verifying if the element is actually forming a majority is done in a separate pass usually.
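For reference, here is a sketch of the actual Boyer-Moore majority vote, including that separate verification pass - note how different it is from the question's nested loops:

```csharp
using System;

static class MajorityVote
{
    // Boyer-Moore majority vote: O(n) time, O(1) extra space.
    // Returns the majority element, or null if no element occupies
    // more than half of the array.
    public static int? FindMajority(int[] arr)
    {
        int candidate = 0, count = 0;
        foreach (int x in arr)
        {
            if (count == 0) { candidate = x; count = 1; }
            else if (x == candidate) count++;
            else count--;
        }

        // Verification pass: the surviving candidate is only valid if it
        // really appears more than n/2 times.
        int occurrences = 0;
        foreach (int x in arr)
            if (x == candidate) occurrences++;

        return occurrences > arr.Length / 2 ? candidate : (int?)null;
    }
}
```

For the question's input 1,1,2,1,3 this returns 1 (three occurrences out of five is a true majority); for an array with no majority it returns null rather than an arbitrary "incorrect" result.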
It is not computing the most frequent element - what it is computing is the longest run of elements.
Also, it is not doing it very efficiently: the inner loop only needs to run up to i - 1, not up to arr.Length.
c keeps track of the current run length. The first if checks whether this is a "continuous run". The second if (after reaching the last element of the run) checks whether this run is longer than any run seen so far.
In the above input sample, you get 1 as the answer because it is the longest run. Try an input where the element with the longest run is not the most frequent element (e.g. 2,1,1,1,3,2,3,2,3,2,3,2 - here 2 is the most frequent element, but 1,1,1 is the longest run).
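If the most frequent element (rather than the longest run) is what's actually wanted, a dictionary of counts computes it in a single pass - a sketch:

```csharp
using System;
using System.Collections.Generic;

static class MostFrequent
{
    // Counts every element once: O(n) time instead of the
    // question's O(n^2) nested loops.
    public static int Find(int[] arr)
    {
        var counts = new Dictionary<int, int>();
        int best = arr[0], bestCount = 0;
        foreach (int x in arr)
        {
            int c;
            counts.TryGetValue(x, out c);
            c++;
            counts[x] = c;
            if (c > bestCount) { bestCount = c; best = x; }
        }
        return best;
    }
}
```

For the run-vs-frequency example above, Find(new[] { 2, 1, 1, 1, 3, 2, 3, 2, 3, 2, 3, 2 }) returns 2, the genuinely most frequent element.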

Dynamic compilation for performance

I have an idea of how I can improve the performance with dynamic code generation, but I'm not sure which is the best way to approach this problem.
Suppose I have a class
class Calculator
{
    int Value1;
    int Value2;
    //..........
    int ValueN;

    void DoCalc()
    {
        if (Value1 > 0)
        {
            DoValue1RelatedStuff();
        }
        if (Value2 > 0)
        {
            DoValue2RelatedStuff();
        }
        //....
        //....
        //....
        if (ValueN > 0)
        {
            DoValueNRelatedStuff();
        }
    }
}
The DoCalc method is at the lowest level and is called many times during the calculation. Another important aspect is that the ValueN fields are only set at the beginning and do not change during the calculation. So many of the ifs in the DoCalc method are unnecessary, as many of the ValueN fields are 0. So I was hoping that dynamic code generation could help to improve performance.
For instance if I create a method
void DoCalc_Specific()
{
    const int Value1 = 0;
    const int Value2 = 0;
    const int ValueN = 1;
    if (Value1 > 0)
    {
        DoValue1RelatedStuff();
    }
    if (Value2 > 0)
    {
        DoValue2RelatedStuff();
    }
    //....
    //....
    //....
    if (ValueN > 0)
    {
        DoValueNRelatedStuff();
    }
}
and compile it with optimizations switched on, the C# compiler is smart enough to keep only the necessary code. So I would like to create such a method at run time, based on the values of the ValueN fields, and use the generated method during the calculations.
I guess that I could use expression trees for that, but expression trees work only with simple lambda functions, so I cannot use things like if, while, etc. inside the function body. So in this case I would need to restructure the method appropriately.
Another possibility is to create the necessary code as a string and compile it dynamically. But it would be much better for me if I could take the existing method and modify it accordingly.
There's also Reflection.Emit, but I don't want to stick with it as it would be very difficult to maintain.
BTW. I'm not restricted to C#. So I'm open to suggestions of programming languages that are best suited for this kind of problem. Except for LISP for a couple of reasons.
One important clarification. DoValue1RelatedStuff() is not a method call in my algorithm. It's just some formula-based calculation and it's pretty fast. I should have written it like this
if (Value1 > 0)
{
    // Do Value1 Related Stuff
}
I have run some performance tests and I can see that with two ifs when one is disabled the optimized method is about 2 times faster than with the redundant if.
Here's the code I used for testing:
public class Program
{
    static void Main(string[] args)
    {
        int x = 0, y = 2;
        var if_st = DateTime.Now.Ticks;
        for (var i = 0; i < 10000000; i++)
        {
            WithIf(x, y);
        }
        var if_et = DateTime.Now.Ticks - if_st;
        Console.WriteLine(if_et.ToString());
        var noif_st = DateTime.Now.Ticks;
        for (var i = 0; i < 10000000; i++)
        {
            Without(x, y);
        }
        var noif_et = DateTime.Now.Ticks - noif_st;
        Console.WriteLine(noif_et.ToString());
        Console.ReadLine();
    }

    static double WithIf(int x, int y)
    {
        var result = 0.0;
        for (var i = 0; i < 100; i++)
        {
            if (x > 0)
            {
                result += x * 0.01;
            }
            if (y > 0)
            {
                result += y * 0.01;
            }
        }
        return result;
    }

    static double Without(int x, int y)
    {
        var result = 0.0;
        for (var i = 0; i < 100; i++)
        {
            result += y * 0.01;
        }
        return result;
    }
}
I would usually not even think about such an optimization. How much work does DoValueXRelatedStuff() do? More than 10 to 50 processor cycles? Yes? That means you are going to build quite a complex system to save less than 10% execution time (and this seems quite optimistic to me). This can easily go down to less than 1%.
Is there no room for other optimizations? Better algorithms? And do you really need to eliminate single branches taking only a single processor cycle (if the branch prediction is correct)? Yes? Shouldn't you think about writing your code in assembler or something else more machine-specific instead of using .NET?
Could you give the order of N, the complexity of a typical method, and the ratio of expressions usually evaluating to true?
It would surprise me to find a scenario where the overhead of evaluating the if statements is worth the effort to dynamically emit code.
Modern CPUs support branch prediction and branch predication, which makes the overhead of branches in small segments of code approach zero.
Have you tried to benchmark two hand-coded versions of the code, one that has all the if-statements in place but provides zero values for most, and one that removes all of those same if branches?
If you are really into code optimisation - before you do anything - run the profiler! It will show you where the bottleneck is and which areas are worth optimising.
Also - if the language choice is not limited (except for LISP) then nothing will beat assembler in terms of performance ;)
I remember achieving some performance magic by rewriting some inner functions (like the one you have) using assembler.
Before you do anything, do you actually have a problem?
i.e. does it run long enough to bother you?
If so, find out what is actually taking time, not what you guess. This is the quick, dirty, and highly effective method I use to see where time goes.
Now, you are talking about interpreting versus compiling. Interpreted code is typically 1-2 orders of magnitude slower than compiled code. The reason is that interpreters are continually figuring out what to do next, and then forgetting, while compiled code just knows.
If you are in this situation, then it may make sense to pay the price of translating so as to get the speed of compiled code.
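For what it's worth, on .NET 4 and later, expression trees are no longer limited to simple lambdas - they can contain blocks, so a specialized delegate can be compiled at run time. This is only a hedged sketch of the idea: the action array stands in for the DoValueNRelatedStuff bodies, which are not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

class Specializer
{
    // Builds a delegate equivalent to DoCalc, but the (ValueN > 0) tests
    // are decided once, at generation time, so the compiled delegate
    // contains only the live branches.
    public static Action Specialize(int[] values, Action[] actions)
    {
        var body = new List<Expression>();
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] > 0) // constant-folded away when false
            {
                body.Add(Expression.Call(
                    Expression.Constant(actions[i]),
                    typeof(Action).GetMethod("Invoke")));
            }
        }
        if (body.Count == 0)
            body.Add(Expression.Empty());
        return Expression.Lambda<Action>(Expression.Block(body)).Compile();
    }

    static void Main()
    {
        int hits = 0;
        var actions = new Action[]
        {
            () => hits += 1,
            () => hits += 10,
            () => hits += 100
        };
        Action doCalc = Specialize(new[] { 1, 0, 5 }, actions);
        doCalc();
        Console.WriteLine(hits); // prints 101: only branches 1 and 3 survived
    }
}
```

Whether this pays off still depends on the earlier caveat: the generation and JIT cost is only recovered if the specialized delegate is invoked a very large number of times.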

What's wrong in terms of performance with this code? List.Contains, random usage, threading?

I have a local class with a method used to build a list of strings and I'm finding that when I hit this method (in a for loop of 1000 times) often it's not returning the amount I request.
I have a global variable:
string[] cachedKeys
A parameter passed to the method:
int requestedNumberToGet
The method looks similar to this:
List<string> keysToReturn = new List<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ?
    cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
//Do we have enough to fill the request within the time? otherwise give
//however many we currently have
while (DateTime.Now < breakoutTime
    && keysToReturn.Count < numberPossibleToGet
    && cachedKeys.Length >= numberPossibleToGet)
{
    string randomKey = cachedKeys[rand.Next(0, cachedKeys.Length)];
    if (!keysToReturn.Contains(randomKey))
        keysToReturn.Add(randomKey);
}
if (keysToReturn.Count != numberPossibleToGet)
    Debugger.Break();
I have approximately 40 strings in cachedKeys none exceeding 15 characters in length.
I'm no expert with threading, so I'm literally just calling this method 1000 times in a loop and consistently hitting that Debugger.Break().
The machine this is running on is a fairly beefy desktop, so I would expect the breakout time to be realistic; in fact it randomly breaks at any point in the loop (I've seen it at iterations 20, 100, 200, 300).
Any one have any ideas where I'm going wrong with this?
Edit: Limited to .NET 2.0
Edit: The purpose of the breakout is so that if the method is taking too long to execute, the client (several web servers using the data for XML feeds) won't have to wait while the other project dependencies initialise, they'll just be given 0 results.
Edit: Thought I'd post the performance stats
Original
'0.0042477465711424217323710136' - 10
'0.0479597267250446634977350473' - 100
'0.4721072091564710039963179678' - 1000
Skeet
'0.0007076318358897569383818334' - 10
'0.007256508857969378789762386' - 100
'0.0749829936486341141122684587' - 1000
Freddy Rios
'0.0003765841748043396576939248' - 10
'0.0046003053460705201359390649' - 100
'0.0417058592642360970458535931' - 1000
Why not just take a copy of the list - O(n) - shuffle it, also O(n), and then return the number of keys that have been requested? In fact, the shuffle only needs to be O(nRequested): keep swapping a random member of the unshuffled part of the list with the very start of the unshuffled part, then expand the shuffled part by 1 (just a notional counter).
EDIT: Here's some code which yields the results as an IEnumerable<T>. Note that it uses deferred execution, so if you change the source that's passed in before you first start iterating through the results, you'll see those changes. After the first result is fetched, the elements will have been cached.
static IEnumerable<T> TakeRandom<T>(IEnumerable<T> source,
                                    int sizeRequired,
                                    Random rng)
{
    List<T> list = new List<T>(source);
    sizeRequired = Math.Min(sizeRequired, list.Count);
    for (int i = 0; i < sizeRequired; i++)
    {
        int index = rng.Next(list.Count - i);
        T selected = list[i + index];
        list[i + index] = list[i];
        list[i] = selected;
        yield return selected;
    }
}
The idea is that at any point after you've fetched n elements, the first n elements of the list will be those elements - so we make sure that we don't pick those again. We then pick a random element from "the rest", swap it into the right position and yield it.
Hope this helps. If you're using C# 3 you might want to make this an extension method by putting "this" in front of the first parameter.
The main issue is the use of retries in a random scenario to ensure you get unique values. This quickly gets out of control, especially if the number of items requested is close to the number of items available - i.e. if you increase the number of keys you will see the issue less often, but it can be avoided entirely.
The following method does it by keeping a list of the keys remaining.
List<string> GetSomeKeys(string[] cachedKeys, int requestedNumberToGet)
{
    int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
    List<string> keysRemaining = new List<string>(cachedKeys);
    List<string> keysToReturn = new List<string>(numberPossibleToGet);
    Random rand = new Random();
    for (int i = 0; i < numberPossibleToGet; i++)
    {
        int randomIndex = rand.Next(keysRemaining.Count);
        keysToReturn.Add(keysRemaining[randomIndex]);
        keysRemaining.RemoveAt(randomIndex);
    }
    return keysToReturn;
}
The timeout was necessary in your version, as you could potentially keep retrying for a long time - especially when you wanted to retrieve the whole list, in which case the retry-based version would almost certainly fail.
Update: The above performs better than these variations:
List<string> GetSomeKeysSwapping(string[] cachedKeys, int requestedNumberToGet)
{
    int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
    List<string> keys = new List<string>(cachedKeys);
    List<string> keysToReturn = new List<string>(numberPossibleToGet);
    Random rand = new Random();
    for (int i = 0; i < numberPossibleToGet; i++)
    {
        int index = rand.Next(numberPossibleToGet - i) + i;
        keysToReturn.Add(keys[index]);
        keys[index] = keys[i];
    }
    return keysToReturn;
}

List<string> GetSomeKeysEnumerable(string[] cachedKeys, int requestedNumberToGet)
{
    Random rand = new Random();
    return TakeRandom(cachedKeys, requestedNumberToGet, rand).ToList();
}
Some numbers with 10,000 iterations:
Function Name            Elapsed Inclusive Time    Number of Calls
GetSomeKeys                            6,190.66             10,000
GetSomeKeysEnumerable                 15,617.04             10,000
GetSomeKeysSwapping                    8,293.64             10,000
A few thoughts.
First, your keysToReturn list is potentially being added to each time through the loop, right? You're creating an empty list and then adding each new key to it. Since the list was not pre-sized, an add can trigger an O(n) reallocation when the internal buffer grows (see the MSDN documentation). To fix this, try pre-sizing your list like this.
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ?
    cachedKeys.Length : requestedNumberToGet;
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Second, your breakout time is unrealistic (ok, ok, impossible) on Windows. All of the information I've ever read on Windows timing suggests that the best you can possibly hope for is 10 millisecond resolution, but in practice it's more like 15-18 milliseconds. In fact, try this code:
for (int iv = 0; iv < 10000; iv++) {
    Console.WriteLine(DateTime.Now.Millisecond.ToString());
}
What you'll see in the output are discrete jumps. Here is a sample output that I just ran on my machine.
13
...
13
28
...
28
44
...
44
59
...
59
75
...
The millisecond value jumps from 13 to 28 to 44 to 59 to 75. That's roughly a 15-16 millisecond resolution in the DateTime.Now function for my machine. This behavior is consistent with what you'd see in the C runtime ftime() call. In other words, it's a systemic trait of the Windows timing mechanism. The point is, you should not rely on a consistent 5 millisecond breakout time because you won't get it.
Third, am I right to assume that the breakout time is there to prevent the main thread from locking up? If so, it'd be pretty easy to spawn your function off to a ThreadPool thread and let it run to completion regardless of how long it takes. Your main thread can then operate on the data.
Use a HashSet instead; a HashSet is much faster for lookups than a List (note, though, that HashSet<string> requires .NET 3.5, and the question is limited to .NET 2.0):
HashSet<string> keysToReturn = new HashSet<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ?
    cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
int length = cachedKeys.Length;
while (DateTime.Now < breakoutTime && keysToReturn.Count < numberPossibleToGet) {
    int i = rand.Next(0, length);
    while (!keysToReturn.Add(cachedKeys[i])) {
        i++;
        if (i == length)
            i = 0;
    }
}
Consider using Stopwatch instead of DateTime.Now. It may simply be down to the inaccuracy of DateTime.Now when you're talking about milliseconds.
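A quick sketch of that swap - Stopwatch reads the high-resolution performance counter rather than the ~15 ms system-clock tick, so a 5 ms budget is actually meaningful (the loop body is just a stand-in for the key-picking work):

```csharp
using System;
using System.Diagnostics;

class Breakout
{
    static void Main()
    {
        // Convert the 5 ms budget into Stopwatch ticks, which run at
        // Stopwatch.Frequency ticks per second.
        long budgetTicks = 5 * Stopwatch.Frequency / 1000;

        var sw = Stopwatch.StartNew();
        int iterations = 0;
        while (sw.ElapsedTicks < budgetTicks)
        {
            iterations++; // stand-in for one key-picking attempt
        }
        Console.WriteLine("High resolution: " + Stopwatch.IsHighResolution);
        Console.WriteLine("Iterations within 5 ms: " + iterations);
    }
}
```

On most hardware Stopwatch.IsHighResolution is true, giving sub-microsecond granularity instead of the 15-16 ms jumps shown in the DateTime.Now output above.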
The problem could quite possibly be here:
if (!keysToReturn.Contains(randomKey))
    keysToReturn.Add(randomKey);
This requires iterating over the list to determine whether the key is already in the return list. However, to be sure, you should try profiling this with a tool. Also, 5 ms (.005 seconds) is pretty fast; you may want to increase that.
