Calculating the approximate run time of a for loop - c#

I have a piece of code in my C# Windows Form Application that looks like below:
List<string> RESULT_LIST = new List<string>();
int[] arr = My_LIST.ToArray();
string s = "";
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < arr.Length; i++)
{
int counter = i;
for (int j = 1; j <= arr.Length; j++)
{
counter++;
if (counter == arr.Length)
{
counter = 0;
}
s += arr[counter].ToString();
RESULT_LIST.Add(s);
}
s = "";
}
sw.Stop();
TimeSpan ts = sw.Elapsed;
string elapsedTime = String.Format("{0:00}", ts.TotalMilliseconds * 1000);
MessageBox.Show(elapsedTime);
I use this code to get any combination of the numbers of My list. I have behaved with My_LIST like a recursive one. The image below demonstrates my purpose very clearly:
All I need to do is:
Making a formula to calculate the approximate run time of these two
nested for loops to guess the run time for any length and help the
user know the approximate time that he/she must wait.
I have used a C# Stopwatch like this: Stopwatch sw = new Stopwatch(); to show the run time and below are the results(Note that in order to reduce the chance of error I've repeated the calculation three times for each length and the numbers show the time in nano seconds for the first, second and third attempt respectively.):
arr.Length = 400; 127838 - 107251 - 100898
arr.Length = 800; 751282 - 750574 - 739869
arr.Length = 1200; 2320517 - 2136107 - 2146099
arr.Length = 2000; 8502631 - 7554743 - 7635173
Note that there are only one-digit numbers in My_LIST to make the time
of adding numbers to the list approximately equal.
How can I find out the relation between arr.Length and run time?

First, let's suppose you have examined the algorithm and noticed that it appears to be quadratic in the array length. This suggests to us that the time taken to run should be a function of the form
t = A + B n + C n2
You've gathered some observations by running the code multiple times with different values for n and measuring t. That's a good approach.
The question now is: what are the best values for A, B and C such that they match your observations closely?
This problem can be solved in a variety of ways; I would suggest to you that the least-squares method of regression would be the place to start, and see if you get good results. There's a page on it here:
www.efunda.com/math/leastsquares/lstsqr2dcurve.cfm
UPDATE: I just looked at your algorithm again and realized it is cubic because you have a quadratic string concat in the inner loop. So this technique might not work so well. I suggest you use StringBuilder to make your algorithm quadratic.
Now, suppose you did not know ahead of time that the problem was quadratic. How would you determine the formula then? A good start would be to graph your points on log scale paper; if they roughly form a straight line then the slope of the line gives you a clue as to the power of the polynomial. If they don't form a straight line -- well, cross that bridge when you come to it.

Well you gonna do some math here.
Since the total number of runs is exactly n^2, not O(n^2) but exactly n^2 times.
Then what you could do is to keep a counter variable for the number of items processed and use math to find out an estimate
int numItemProcessed;
int timeElapsed;//read from stop watch
int totalItems = n * n;
int remainingEstimate = ((float) totalItems - numItemProcessed) / numItemProcessed) * timeElapsed

Don't assume the algorithm is necessarily N^2 in time complexity.
Take the averages of your numbers, and plot the best fit on a log-log plot, then measure the gradient. This will give you an idea as to the largest term in the polynomial. (see wikipedia log-log plot)
Once you have that, you can do a least-squares regression to work out the coefficients of the polynomial of the correct order. This will allow an estimate from the data, of the time taken for an unseen problem.
Note: As Eric Lippert said, it depends on what you want to measure - averaging may not be appropriate depending on your use case - the first run time might be more correct.
This method will work for any polynomial algorithm. It will also tell you if the algorithm is polynomial (non-polynomial running times will not give straight lines on the log-log plot).

Related

c# Normalized power for Polar cycle

List<int> NPower = new List<int>();
List<double> list = new List<double>();
try
{
for (int i = 1; i < dataGridView1.Rows.Count; i++)
{
for (int n = 0; n < i + 30; n++)
{
NPower.Add(Convert.ToInt32(dataGridView1.Rows[i + n].Cells[6].Value));
}
}
average = NPower.Average();
total = Math.Pow(average, 4);
NPower.Clear();
}
catch (Exception)
{
average = NPower.Average();
NP = Convert.ToInt32(Math.Pow(average, (1.0 / 3.0)));
label19.Text = "Normalised Power: " + NP.ToString();
NPower.Clear();
}
Hi so i'm trying to calculate the normalized power for a cycling polar cycle. I know that for the normalized power you need to:
1) starting at the 30 s mark, calculate a rolling 30 s average (of the preceeding time points, obviously).
2) raise all the values obtained in step #1 to the 4th power.
3) take the average of all of the values obtained in step #2.
4) take the 4th root of the value obtained in step #3.
I think i have done that but the normalized power comes up with 16 which isnt correct. Could anyone look at my code to see if they could figure out a solution. Thankyou, sorry for my code i'm still quite new to this so my code might be in the incorrect format.
I'm not sure that I understand your requirements or code completely, but a few things I noticed:
Since you're supposed to start taking the rolling average after 30 seconds, shouldn't i be initialized to 30 instead of 1?
Since it's a rolling average, shouldn't n be initialized to the value of i instead of 0?
Why is the final result calculated inside a catch block?
Shouldn't it be Math.Pow(average, (1.0 / 4.0)) since you want the fourth root, not the third?

Computing the most frequent element

I have recently come across this line of code and what it does is that it goes through an array and returns the value that is seen most often. For example 1,1,2,1,3 so it will return 1 because it appears more than 2 and 3. What I am trying to do is understand how it works so what I did was I went through it with visual studio step by step but it is not ringing any bells.
Can anyone help me understand what is going on here? It would be a total plus if someone can tell me what does c do and what is the logic behind the arguments in the if statements.
int[] arr = a;
int c = 1, maxcount = 1, maxvalue = 0;
int result = 0;
for (int i = 0; i < arr.Length; i++)
{
maxvalue = arr[i];
for (int j = 0; j <arr.Length; j++)
{
if (maxvalue == arr[j] && j != i)
{
c++;
if (c > maxcount)
{
maxcount = c;
result = arr[i];
}
}
else
{
c=1;
}
}
}
return result;
EDIT: On closer examination, the code snippet has a nested loop and is conventionally counting the maximum seen element by simply keeping track of the maximum seen times and the element that was seen and keeping them in sync.
That looks like an implementation of the Boyer-Moore majority vote counting algorithm. They have a nice illustration here.
The logic is simple, and is to compute the majority in a single pass, taking O(n) time. Note that majority here means that more than 50% of the array must be filled with that element. If there is no majority element, you get an "incorrect" result.
Verifying if the element is actually forming a majority is done in a separate pass usually.
It is not computing the most frequent element - what it is computing is the longest run of elements.
Also, it is not doing it very efficiently, the inner loop only needs to compute upto i-1, not upto arr.Length.
c is keeping track of the current run length. The first "if" is to check if this is a "continouous run". The second "if" (after reaching the last element in the run) will check if this run is longer than any run you have seen so far.
In the above input sample, you are getting 1 as answer because it is the longest run. Try with an input where the element with the longest run is not the same as the most frequent element. (e.g., 2,1,1,1,3,2,3,2,3,2,3,2 - here 2 is the most frequent element, but 1,1,1 is the longest run).

Shuffling an array bottlenecks on Random.Nex(int)

I've been working on a small piece of code that sorts the provided array. The array should be sorted as fast as possible. Randomization is not that important. After profiling the method I found out that the biggest hog is Random.Next. Which takes up about 70% of the method execution time. After searching online for faster random generators I found no plug and play libraries that offer any improved performance.
So I was wondering whether there are any ways to improve the performance of this code any more.
[MethodImpl(MethodImplOptions.NoInlining)]
private static void Shuffle(byte[] chars)
{
for (var i = 0; i < chars.Length; i++)
{
var index = rnd.Next(chars.Length);
byte tmpStore = chars[index];
chars[index] = chars[i];
chars[i] = tmpStore;
}
}
Alright, this is getting into micro-optimization territory.
Random.Next(int) actually performs some ops internally that we can optimize out:
int index = (int)(rnd.Next() * (1.0 / int.Max) * chars.Length);
Since you're using the same maxValue over and over in a loop, a trivial optimization would be to precalculate your denominator outside of the loop. This way we get rid of an int->double conversion and a multiply:
double d = chars.Length / (double)int.Max;
And then:
int index = (int)(rnd.Next() * d);
On a separate note: your shuffle isn't going to have a uniform distribution. See Jeff Atwood's post The Danger of Naïveté which deals specifically with this subject and shows how to perform a uniform Fisher-Yates shuffle.
If n^n isn't too big for the double range, you could create one random double number, multiply it by n^n, then use modulo(n) each iteration as the current random number prior to dividing the random result by n as preparation for the next iteration.

parallel.foreach works, but why?

Can anyone explain, why this program is returning the correct value for sqrt_min?
int n = 1000000;
double[] myArr = new double[n];
for(int i = n-1 ; i>= 0; i--){ myArr[i] = (double)i;}
// sqrt_min contains minimal sqrt-value
double sqrt_min = double.MaxValue;
Parallel.ForEach(myArr, num =>
{
double sqrt = Math.Sqrt(num); // some time consuming calculation that should be parallized
if(sqrt < sqrt_min){ sqrt_min = sqrt;}
});
Console.WriteLine("minimum: "+sqrt_min);
It works by sheer luck. Sometimes when you run it you are lucky that the non-atomic reads and writes to the double are not resulting in "torn" values. Sometimes you are lucky that the non-atomic tests and sets just happen to be setting the correct value when that race happens. There is no guarantee that this program produces any particular result.
Your code is not safe; it only works by coincidence.
If two threads run the if simultaneously, one of the minimums will be overwritten:
sqrt_min = 6
Thread A: sqrt = 5
Thread B: sqrt = 4
Thread A enters the if
Thread B enters the if
Thread B assigns sqrt_min = 4
Thread A assigns sqrt_min = 5
On 32-bit systems, you're also vulnerable to read/write tearing.
It would be possible to make this safe using Interlocked.CompareExchange in a loop.
For why your original code is broken check the other answers, I won't repeat that.
Multithreading is easiest when there is no write access to shared state. Luckily your code can be written that way. Parallel linq can be nice in such situations, but sometimes the the overhead is too large.
You can rewrite your code to:
double sqrt_min = myArr.AsParallel().Select(x=>Math.Sqrt(x)).Min();
In your specific problem it's faster to swap around the Min and the Sqrt operation, which is possible because Sqrt is monotonically increasing.
double sqrt_min = Math.Sqrt(myArr.AsParallel().Min())
Your code does not really work: I ran it in a loop 100,000 times, and it failed once on my 8-core computer, producing this output:
minimum: 1
I shortened the runs to make the error appear faster.
Here are my modifications:
static void Run() {
int n = 10;
double[] myArr = new double[n];
for (int i = n - 1; i >= 0; i--) { myArr[i] = (double)i*i; }
// sqrt_min contains minimal sqrt-value
double sqrt_min = double.MaxValue;
Parallel.ForEach(myArr, num => {
double sqrt = Math.Sqrt(num); // some time consuming calculation that should be parallized
if (sqrt < sqrt_min) { sqrt_min = sqrt; }
});
if (sqrt_min > 0) {
Console.WriteLine("minimum: " + sqrt_min);
}
}
static void Main() {
for (int i = 0; i != 100000; i++ ) {
Run();
}
}
This is not a coincidence, considering the lack of synchronization around the reading and writing of a shared variable.
As others have said, this only works based on shear luck. Both the OP and other posters have had trouble actually creating the race condition though. That is fairly easily explained. The code generates lots of race conditions, but the vast majority of them (99.9999% to be exact) are irrelevant. All that matters at the end of the day is the fact that 0 should be the min result. If your code thinks that root 5 is greater than root 6, or that root 234 is greater than root 235 it still won't break. There needs to be a race condition specifically with the iteration generating 0. The odds that one of the iterations has a race condition with another is very, very high. The odds that the iteration processing the last item has a race condition is really quite low.

What's wrong in terms of performance with this code? List.Contains, random usage, threading?

I have a local class with a method used to build a list of strings and I'm finding that when I hit this method (in a for loop of 1000 times) often it's not returning the amount I request.
I have a global variable:
string[] cachedKeys
A parameter passed to the method:
int requestedNumberToGet
The method looks similar to this:
List<string> keysToReturn = new List<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ?
cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
//Do we have enough to fill the request within the time? otherwise give
//however many we currently have
while (DateTime.Now < breakoutTime
&& keysToReturn.Count < numberPossibleToGet
&& cachedKeys.Length >= numberPossibleToGet)
{
string randomKey = cachedKeys[rand.Next(0, cachedKeys.Length)];
if (!keysToReturn.Contains(randomKey))
keysToReturn.Add(randomKey);
}
if (keysToReturn.Count != numberPossibleToGet)
Debugger.Break();
I have approximately 40 strings in cachedKeys none exceeding 15 characters in length.
I'm no expert with threading so I'm literally just calling this method 1000 times in a loop and consistently hitting that debug there.
The machine this is running on is a fairly beefy desktop so I would expect the breakout time to be realistic, in fact it randomly breaks at any point of the loop (I've seen 20s, 100s, 200s, 300s).
Any one have any ideas where I'm going wrong with this?
Edit: Limited to .NET 2.0
Edit: The purpose of the breakout is so that if the method is taking too long to execute, the client (several web servers using the data for XML feeds) won't have to wait while the other project dependencies initialise, they'll just be given 0 results.
Edit: Thought I'd post the performance stats
Original
'0.0042477465711424217323710136' - 10
'0.0479597267250446634977350473' - 100
'0.4721072091564710039963179678' - 1000
Skeet
'0.0007076318358897569383818334' - 10
'0.007256508857969378789762386' - 100
'0.0749829936486341141122684587' - 1000
Freddy Rios
'0.0003765841748043396576939248' - 10
'0.0046003053460705201359390649' - 100
'0.0417058592642360970458535931' - 1000
Why not just take a copy of the list - O(n) - shuffle it, also O(n) - and then return the number of keys that have been requested. In fact, the shuffle only needs to be O(nRequested). Keep swapping a random member of the unshuffled bit of the list with the very start of the unshuffled bit, then expand the shuffled bit by 1 (just a notional counter).
EDIT: Here's some code which yields the results as an IEnumerable<T>. Note that it uses deferred execution, so if you change the source that's passed in before you first start iterating through the results, you'll see those changes. After the first result is fetched, the elements will have been cached.
static IEnumerable<T> TakeRandom<T>(IEnumerable<T> source,
int sizeRequired,
Random rng)
{
List<T> list = new List<T>(source);
sizeRequired = Math.Min(sizeRequired, list.Count);
for (int i=0; i < sizeRequired; i++)
{
int index = rng.Next(list.Count-i);
T selected = list[i + index];
list[i + index] = list[i];
list[i] = selected;
yield return selected;
}
}
The idea is that at any point after you've fetched n elements, the first n elements of the list will be those elements - so we make sure that we don't pick those again. When then pick a random element from "the rest", swap it to the right position and yield it.
Hope this helps. If you're using C# 3 you might want to make this an extension method by putting "this" in front of the first parameter.
The main issue are the using retries in a random scenario to ensure you get unique values. This quickly gets out of control, specially if the amount of items requested is near to the amount of items to get i.e. if you increase the amount of keys, you will see the issue less often but that can be avoided.
The following method does it by keeping a list of the keys remaining.
List<string> GetSomeKeys(string[] cachedKeys, int requestedNumberToGet)
{
int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
List<string> keysRemaining = new List<string>(cachedKeys);
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Random rand = new Random();
for (int i = 0; i < numberPossibleToGet; i++)
{
int randomIndex = rand.Next(keysRemaining.Count);
keysToReturn.Add(keysRemaining[randomIndex]);
keysRemaining.RemoveAt(randomIndex);
}
return keysToReturn;
}
The timeout was necessary on your version as you could potentially keep retrying to get a value for a long time. Specially when you wanted to retrieve the whole list, in which case you would almost certainly get a fail with the version that relies on retries.
Update: The above performs better than these variations:
List<string> GetSomeKeysSwapping(string[] cachedKeys, int requestedNumberToGet)
{
int numberPossibleToGet = Math.Min(cachedKeys.Length, requestedNumberToGet);
List<string> keys = new List<string>(cachedKeys);
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Random rand = new Random();
for (int i = 0; i < numberPossibleToGet; i++)
{
int index = rand.Next(numberPossibleToGet - i) + i;
keysToReturn.Add(keys[index]);
keys[index] = keys[i];
}
return keysToReturn;
}
List<string> GetSomeKeysEnumerable(string[] cachedKeys, int requestedNumberToGet)
{
Random rand = new Random();
return TakeRandom(cachedKeys, requestedNumberToGet, rand).ToList();
}
Some numbers with 10.000 iterations:
Function Name Elapsed Inclusive Time Number of Calls
GetSomeKeys 6,190.66 10,000
GetSomeKeysEnumerable 15,617.04 10,000
GetSomeKeysSwapping 8,293.64 10,000
A few thoughts.
First, your keysToReturn list is potentially being added to each time through the loop, right? You're creating an empty list and then adding each new key to the list. Since the list was not pre-sized, each add becomes an O(n) operation (see MSDN documentation). To fix this, try pre-sizing your list like this.
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ? cachedKeys.Length : requestedNumberToGet;
List<string> keysToReturn = new List<string>(numberPossibleToGet);
Second, your breakout time is unrealistic (ok, ok, impossible) on Windows. All of the information I've ever read on Windows timing suggests that the best you can possibly hope for is 10 millisecond resolution, but in practice it's more like 15-18 milliseconds. In fact, try this code:
for (int iv = 0; iv < 10000; iv++) {
Console.WriteLine( DateTime.Now.Millisecond.ToString() );
}
What you'll see in the output are discrete jumps. Here is a sample output that I just ran on my machine.
13
...
13
28
...
28
44
...
44
59
...
59
75
...
The millisecond value jumps from 13 to 28 to 44 to 59 to 75. That's roughly a 15-16 millisecond resolution in the DateTime.Now function for my machine. This behavior is consistent with what you'd see in the C runtime ftime() call. In other words, it's a systemic trait of the Windows timing mechanism. The point is, you should not rely on a consistent 5 millisecond breakout time because you won't get it.
Third, am I right to assume that the breakout time is prevent the main thread from locking up? If so, then it'd be pretty easy to spawn off your function to a ThreadPool thread and let it run to completion regardless of how long it takes. Your main thread can then operate on the data.
Use HashSet instead, HashSet is much faster for lookup than List
HashSet<string> keysToReturn = new HashSet<string>();
int numberPossibleToGet = (cachedKeys.Length <= requestedNumberToGet) ? cachedKeys.Length : requestedNumberToGet;
Random rand = new Random();
DateTime breakoutTime = DateTime.Now.AddMilliseconds(5);
int length = cachedKeys.Length;
while (DateTime.Now < breakoutTime && keysToReturn.Count < numberPossibleToGet) {
int i = rand.Next(0, length);
while (!keysToReturn.Add(cachedKeys[i])) {
i++;
if (i == length)
i = 0;
}
}
Consider using Stopwatch instead of DateTime.Now. It may simply be down to the inaccuracy of DateTime.Now when you're talking about milliseconds.
The problem could quite possibly be here:
if (!keysToReturn.Contains(randomKey))
keysToReturn.Add(randomKey);
This will require iterating over the list to determine if the key is in the return list. However, to be sure, you should try profiling this using a tool. Also, 5ms is pretty fast at .005 seconds, you may want to increase that.

Categories

Resources