Garbage Collector generations not incrementing - c#

I have a lot of questions about the garbage collection procedure: mainly, when does it run, when are objects promoted to an older generation, and so on.
static void Main(string[] args)
{
    int i = 0, j = 0;
    int a = 0;
    Holder prev = new Holder(null);
    while (GC.CollectionCount(1) == 0)
    {
        int aux = GC.CollectionCount(0);
        if (aux > a)
        {
            a = aux;
            ++j;
            Console.WriteLine((i + 1));
        }
        ++i;
        Holder h = new Holder(prev);
        Console.WriteLine(GC.GetGeneration(prev));
        prev = h;
    }
}
I'm trying to get the number of objects in gen 1.
Why is j == 1? Did the GC only run once on gen 0? To leave the while loop, shouldn't it have run at least twice?
[EDIT]
By adding this after the while loop breaks, I got very confused:
Console.WriteLine("#gc0 = "+GC.CollectionCount(0)); --> 2
Console.WriteLine("#gc1 = "+GC.CollectionCount(1)); --> 1
Console.WriteLine("#objs = "+ i);
Console.ReadLine();
How can GC.CollectionCount(0) be only 2? I've been reading Richter's CLR via C#, and he says this:
the objects in generation 1 are examined only when generation 1 reaches its budget, which usually requires several garbage collections of generation 0.
[EDIT]
But at the same time, if the GC sees that all the objects survived, it grows the gen 0 budget; maybe that's the reason for only 2 collections of gen 0?

How about we spice things up with a bit of randomness like this:
class Program
{
    static void Main(string[] args)
    {
        Random random = new Random();
        int i = 0, j = 0;
        int a = 0;
        Holder prev = new Holder(null);
        Holder prev2 = new Holder(null);
        while (GC.CollectionCount(1) == 0)
        {
            int aux = GC.CollectionCount(0);
            if (aux > a)
            {
                a = aux;
                ++j;
                Console.WriteLine((i + 1));
            }
            ++i;
            var flag = random.Next(2) == 1; // Next(1) always returns 0, so use Next(2) for a coin flip
            Holder h = new Holder(flag ? prev : prev2);
            Console.WriteLine("Prev: " + GC.GetGeneration(prev));
            Console.WriteLine("Prev2: " + GC.GetGeneration(prev2));
            if (flag)
            {
                prev = h;
            }
            else
            {
                prev2 = h;
            }
        }
    }
}
internal class Holder
{
    private Holder holder;

    public Holder(Holder o)
    {
        holder = o;
    }
}
The code sample you've provided was so simple that the CLR knew there was no point in moving your prev item to another generation.
Its usage was simple, and I think the runtime optimized it to live in gen 0 only.
Adding more complex logic breaks the runtime's optimizations, and now one of prev or prev2 will move to gen 1, depending on which of the objects was used less frequently (I don't know the exact mechanics here).
Instead of prev and prev2, you can try keeping an array of holders and picking a random index into it; then you can see more clearly how the array elements advance through the generations.
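A minimal sketch of that array variant (the array size and the printing are my own choices, not from the original post):
static void Main(string[] args)
{
    Random random = new Random();
    // Keep several chains alive at once; which one grows is random.
    Holder[] prevs = new Holder[8];
    for (int k = 0; k < prevs.Length; k++)
        prevs[k] = new Holder(null);

    while (GC.CollectionCount(1) == 0)
    {
        int idx = random.Next(prevs.Length);
        prevs[idx] = new Holder(prevs[idx]);

        for (int k = 0; k < prevs.Length; k++)
            Console.Write(GC.GetGeneration(prevs[k]) + " ");
        Console.WriteLine();
    }
}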

Related

Why might System.Random.Sample throw IndexOutOfRangeException?

I'm calling Next() on a .NET System.Random instance inside Unity (2017.1.1f1). It's throwing an IndexOutOfRangeException from inside the protected function Sample(), which accepts no arguments and returns a double between 0 and 1. What might cause this behavior?
Here is the exception in detail
System.IndexOutOfRangeException: Array index is out of range.
at System.Random.Sample () [0x0003e] in
/Users/builduser/buildslave/mono/build/mcs/class/corlib/System/Random.cs:91
at System.Random.Next (Int32 maxValue) [0x00017] in
/Users/builduser/buildslave/mono/build/mcs/class/corlib/System/Random.cs:112
at Quicksilver.SysIRand.RandInt (Int32 max_exclusive) [0x00008] in
F:\SVNHome\gemrush\GemRush\Assets\Source\Shared\Utility\IRand.cs:38
at Quicksilver.IEnumerableExt.SelectRandom[Skill] (IEnumerable`1
collection, IRand rand, Int32 count) [0x00070] in
F:\SVNHome\gemrush\GemRush\Assets\Source\Shared\Utility\IEnumerableExt.cs:61
(there's another 13 layers of callstack above this)
This is a multi-threaded environment, but each thread has its own dedicated instance of System.Random. As you can see from the code below, the parameter passed to Next() must have been 1 or higher.
This error was thrown about 1 hour into a complex automated test script, so running multiple times to test theories is expensive, and any modifications that change how the RNG gets invoked will prevent a reproduction. (If the error somehow involves unintended interaction between threads, then it probably won't be reproducible at all.)
Since it made it an hour into the test script, the overwhelming majority of invocations of this method must NOT be throwing an error.
The function making direct use of the random numbers is here:
// Chooses count items at random from the enumeration and returns them in an array.
// The order of selected items within the array is also random.
// If the collection is smaller than count, the entire collection is returned (in random order).
public static T[] SelectRandom<T>(this IEnumerable<T> collection, IRand rand, int count = 1)
{
    if (count <= 0) return new T[0]; // Optimization for trivial case
    T[] keep = new T[count];
    int found = 0;
    foreach (T item in collection)
    {
        if (found < count)
        {
            // Take the first #count items, in case that's all there are.
            // Move a random item of those found so far (including the new one)
            // to the end of the array, and insert the new one in its place.
            int r = rand.RandInt(found + 1);
            keep[found++] = keep[r];
            keep[r] = item;
        }
        else
        {
            // Random chance to replace one of our previously-selected elements
            ++found;
            if (rand.RandInt(found) < count) // probability desired/found
            {
                // Replace a random previously-selected element
                int r = rand.RandInt(count);
                keep[r] = item;
            }
        }
    }
    if (found < count)
    {
        // The collection was too small to get everything requested;
        // make a new, smaller array containing everything in the collection.
        T[] all = new T[found];
        Array.Copy(keep, all, found);
        return all;
    }
    return keep;
}
The error is being thrown from this line:
if (rand.RandInt(found) < count) // probability desired/found
IRand is the interface for a very thin wrapper around System.Random; IRand.RandInt() simply returns Random.Next().
EDIT
Here's how the Random instances are created and distributed:
private void Start()
{
    SysIRand[] rngs = new SysIRand[parallelTesters];
    if (parallelTesters > 0) rngs[0] = new SysIRand(new System.Random(548913));
    if (parallelTesters > 1) rngs[1] = new SysIRand(new System.Random(138498));
    if (parallelTesters > 2) rngs[2] = new SysIRand(new System.Random(976336));
    if (parallelTesters > 3) rngs[3] = new SysIRand(new System.Random(793461));
    if (parallelTesters > 4) rngs[4] = new SysIRand(new System.Random(648791));
    if (parallelTesters > 5) rngs[5] = new SysIRand(new System.Random(348916));
    if (parallelTesters > 6) rngs[6] = new SysIRand(new System.Random(8467168));
    if (parallelTesters > 7) rngs[7] = new SysIRand(new System.Random(6183569));
    for (int i = 8; i < parallelTesters; ++i)
    {
        rngs[i] = new SysIRand(new System.Random(7 * i * i + 8961 * i + 129786));
    }
    for (int t = 0; t < parallelTesters; ++t)
    {
        SysIRand rand = rngs[t];
        IBot bot = BotFactory.DrawBot(rand);
        BotTester tester = new BotTester(rand, bot);
        tester.testerID = t + 1;
        tester.OnMessage += (str) => UponMessage(tester.testerID + " ~ " + str);
        tester.OnError += (str) => UponError(tester.testerID + " ~ " + str);
        tester.OnGameAborted += UponGameAborted;
        tester.OnDebugMoment += UponDebugMoment;
        testers.Add(tester);
    }
}
(in this run, parallelTesters had a value of 3)
Each BotTester exclusively uses the Random instance passed to its constructor. Each thread is private to a single BotTester, started from BotTester.RunGame():
public bool RunGame(GameMode mode, int maxGames = 1, long maxMilliSeconds = 100000000, int maxRetries = 5000)
{
    if (threadRunning) return false;
    thread = new Thread(() => ThreadedRunGame(mode, maxGames, maxMilliSeconds, maxRetries));
    thread.Start();
    return true;
}
The only explanation that makes sense is that you think you are accessing the Random instance in a thread-safe way (in your own words, each thread has its own Random instance), but it looks like the same instance is being accessed while it is still in the middle of a calculation. Here is the implementation in Unity; Sample() simply calls InternalSample():
private int InternalSample()
{
    int inext = this.inext;
    int inextp = this.inextp;
    int index1;
    if ((index1 = inext + 1) >= 56)
        index1 = 1;
    int index2;
    if ((index2 = inextp + 1) >= 56)
        index2 = 1;
    int num = this.SeedArray[index1] - this.SeedArray[index2];
    if (num < 0)
        num += int.MaxValue;
    this.SeedArray[index1] = num;
    this.inext = index1;
    this.inextp = index2;
    return num;
}
As you can see, the places where you can get an IndexOutOfRangeException are limited, i.e. when you access this.SeedArray. Here is the definition of SeedArray:
public class Random
{
    private int[] SeedArray = new int[56];
    ....
}
It is allocated with 56 elements, and in the implementation of InternalSample you can see that, on any single call, index1 and index2 are limited to at most 55. The only way to end up out of range is if the instance's internal state gets corrupted, which is what happens when the same instance is used from multiple threads at once.
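If there is any chance an instance does get shared, one defensive option is to serialize access inside the wrapper. This is just a sketch; the real Quicksilver.SysIRand source isn't shown in the question, so the shape of the class here is an assumption:
// Hypothetical wrapper; only IRand.RandInt(int) is known from the question.
public class SysIRand : IRand
{
    private readonly System.Random rng;
    private readonly object gate = new object();

    public SysIRand(System.Random rng)
    {
        this.rng = rng;
    }

    public int RandInt(int maxExclusive)
    {
        // Serializing calls rules out the state corruption described above
        // if an instance ever becomes visible to more than one thread.
        lock (gate)
        {
            return rng.Next(maxExclusive);
        }
    }
}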

Processing records speed

I realise this is a non-specific code question. But I suspect that people with answers are on this forum.
I am receiving a large number of records of < 100 bytes each via TCP, at a rate of 10 per millisecond.
I have to parse and process the data, and that takes me about 100 microseconds per record, so I am pretty much maxed out.
Does 100 microseconds seem large?
Here is an example of the kind of processing I do with LINQ. It is really convenient - but is it inherently slow?
public void Process()
{
    try
    {
        int ptr = PayloadOffset + 1;
        var cPair = MessageData.GetString(ref ptr, 7);
        var orderID = MessageData.GetString(ref ptr, 15);
        if (Book.CPairs.ContainsKey(cPair))
        {
            var cPairGroup = Book.CPairs[cPair];
            if (cPairGroup.BPrices != null)
            {
                cPairGroup.BPrices.ForEach(x => { x.BOrders.RemoveAll(y => y.OrderID.Equals(orderID)); });
                cPairGroup.BPrices.RemoveAll(x => x.BOrders.Count == 0);
            }
        }
    }
}
public class BOrderGroup
{
    public double Amount;
    public string OrderID;
}

public class BPriceGroup
{
    public double BPrice;
    public List<BOrderGroup> BOrders;
}

public class CPairGroup
{
    public List<BPriceGroup> BPrices;
}

public static Dictionary<string, CPairGroup> CPairs;
As others have mentioned, LINQ is not inherently slow, but it can be slower than equivalent non-LINQ code (this is why the Roslyn team has an "Avoid LINQ" guideline in its coding conventions).
If this is your hot path and you need every microsecond, then you should probably implement the logic along these lines:
public void Process()
{
    try
    {
        int ptr = PayloadOffset + 1;
        var cPair = MessageData.GetString(ref ptr, 7);
        var orderID = MessageData.GetString(ref ptr, 15);
        if (Book.CPairs.TryGetValue(cPair, out CPairGroup cPairGroup) && cPairGroup != null)
        {
            for (int i = cPairGroup.BPrices.Count - 1; i >= 0; i--)
            {
                var x = cPairGroup.BPrices[i];
                for (int j = x.BOrders.Count - 1; j >= 0; j--)
                {
                    var y = x.BOrders[j];
                    if (y.OrderID.Equals(orderID))
                    {
                        x.BOrders.RemoveAt(j);
                    }
                }
                if (x.BOrders.Count == 0)
                {
                    cPairGroup.BPrices.RemoveAt(i);
                }
            }
        }
    }
}
Main points:
Avoid the double dictionary lookup by using TryGetValue.
Single iteration over cPairGroup.BPrices.
In-place modification of the lists by iterating backwards.
This code should not incur any additional heap allocations.

Thread safety Parallel.For c#

I'm French, so first of all sorry for my English.
I get an error in Visual Studio (index out of range). I only have this problem with Parallel.For, not with a classic for loop.
I think one thread wants to access my array[i] while another thread does too.
It's code that computes K-means clustering to build links between documents (using cosine similarity).
More information:
The IndexOutOfRange is thrown on similarityMeasure[i]=.....
I have a computer with 2 processors (12 logical cores).
With the classic for, CPU usage is 9-14% and one iteration takes about 9 minutes.
With Parallel.For, CPU usage is 70-90% and one iteration takes about 1 min 30 s.
Sometimes it runs longer before generating an error.
My function is:
private static int FindClosestClusterCenter(List<Centroid> clustercenter, DocumentVector obj)
{
    float[] similarityMeasure = new float[clustercenter.Count()];
    float[] copy = similarityMeasure;
    object sync = new Object();
    Parallel.For(0, clustercenter.Count(), (i) => // was: for (int i = 0; i < clustercenter.Count(); i++)
    {
        similarityMeasure[i] = SimilarityMatrics.FindCosineSimilarity(clustercenter[i].GroupedDocument[0].VectorSpace, obj.VectorSpace);
    });
    int index = 0;
    float maxValue = similarityMeasure[0];
    for (int i = 0; i < similarityMeasure.Count(); i++)
    {
        if (similarityMeasure[i] > maxValue)
        {
            maxValue = similarityMeasure[i];
            index = i;
        }
    }
    return index;
}
My function is called here:
do
{
    prevClusterCenter = centroidCollection;
    DateTime starttime = DateTime.Now;
    foreach (DocumentVector obj in documentCollection) // was: Parallel.ForEach(documentCollection, parallelOptions, obj => ...
    {
        int ind = FindClosestClusterCenter(centroidCollection, obj);
        resultSet[ind].GroupedDocument.Add(obj);
    }
    TimeSpan tempsecoule = DateTime.Now.Subtract(starttime);
    Console.WriteLine(tempsecoule);
    //Console.ReadKey();
    InitializeClusterCentroid(out centroidCollection, centroidCollection.Count());
    centroidCollection = CalculMeanPoints(resultSet);
    stoppingCriteria = CheckStoppingCriteria(prevClusterCenter, centroidCollection);
    if (!stoppingCriteria)
    {
        // initialize the result set for the next iteration
        InitializeClusterCentroid(out resultSet, centroidCollection.Count);
    }
} while (stoppingCriteria == false);
_counter = counter;
return resultSet;
FindCosineSimilarity:
public static float FindCosineSimilarity(float[] vecA, float[] vecB)
{
    var dotProduct = DotProduct(vecA, vecB);
    var magnitudeOfA = Magnitude(vecA);
    var magnitudeOfB = Magnitude(vecB);
    float result = dotProduct / (float)Math.Pow((magnitudeOfA * magnitudeOfB), 2);
    // When 0 is divided by 0 the result is NaN, so return 0 in that case.
    if (float.IsNaN(result))
        return 0;
    else
        return (float)result;
}
CalculMeanPoints:
private static List<Centroid> CalculMeanPoints(List<Centroid> _clust)
{
    for (int i = 0; i < _clust.Count(); i++)
    {
        if (_clust[i].GroupedDocument.Count() > 0)
        {
            for (int j = 0; j < _clust[i].GroupedDocument[0].VectorSpace.Count(); j++)
            {
                float total = 0;
                foreach (DocumentVector vspace in _clust[i].GroupedDocument)
                {
                    total += vspace.VectorSpace[j];
                }
                _clust[i].GroupedDocument[0].VectorSpace[j] = total / _clust[i].GroupedDocument.Count();
            }
        }
    }
    return _clust;
}
You may have some side effects in FindCosineSimilarity; make sure it does not modify any field or input parameter. Example: resultSet[ind].GroupedDocument.Add(obj);. If resultSet is not a reference to a locally instantiated array, then that is a side effect.
That may fix it. But FYI, you could use AsParallel for this rather than Parallel.For:
similarityMeasure = clustercenter
    .AsParallel().AsOrdered()
    .Select(c => SimilarityMatrics.FindCosineSimilarity(c.GroupedDocument[0].VectorSpace, obj.VectorSpace))
    .ToArray();
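If the outer loop over documentCollection is ever switched back to the commented-out Parallel.ForEach, the writes to the shared resultSet also need to be synchronized, because List<T>.Add is not thread-safe. A minimal sketch, reusing the names from the question:
object resultLock = new object();
Parallel.ForEach(documentCollection, obj =>
{
    int ind = FindClosestClusterCenter(centroidCollection, obj);
    lock (resultLock)
    {
        // Serialize mutation of the shared result set.
        resultSet[ind].GroupedDocument.Add(obj);
    }
});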
You realize that if you synchronize the whole content of the Parallel.For, it's just the same as having a normal synchronous for loop, right? Meaning the code as it is doesn't do anything in parallel, so I don't think you'll have any problems with concurrency. My guess, from what I can tell, is that clustercenter[i].GroupedDocument is probably an empty array.

Duplicate math result in Bakery Algorithm (C# code)

Index out of bounds when creating a new thread with parameters? Continuing from my previous topic, I now have a new problem with my Bakery Algorithm code.
Here's my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace BakeryAlgorithm
{
    class Program
    {
        static int threads = 10;
        static string x = "";
        static int count = 0;
        static int[] ticket = new int[threads];
        static bool[] entering = new bool[threads];

        public static void doLock(int pid)
        {
            for (int i = 0; i < threads; i++)
            {
                ticket[i] = 0;
                entering[i] = false;
            }
            entering[pid] = true;
            int max = 0;
            for (int i = 0; i < threads; i++)
            {
                if (ticket[i] > ticket[max]) { max = i; }
            }
            ticket[pid] = 1 + max;
            entering[pid] = false;
            for (int i = 0; i < threads; ++i)
            {
                if (i != pid)
                {
                    while (entering[i])
                    {
                        Thread.Yield();
                    }
                    while (ticket[i] != 0 && (ticket[pid] > ticket[i] ||
                           (ticket[pid] == ticket[i] && pid > i)))
                    {
                        Thread.Yield();
                    }
                }
            }
            if (x == "C" || x == "c")
                Console.WriteLine("[System] PID " + pid.ToString() + " get into critical section");
        }

        public static void unlock(int pid)
        {
            ticket[pid] = 0;
            count++;
            Console.WriteLine("[Thread] PID " + pid.ToString() + " complete.");
        }

        public static void arrayInit()
        {
            for (int i = 0; i < threads; i++)
            {
                ticket[i] = 0;
                entering[i] = false;
            }
        }

        public static void simThread(int i)
        {
            doLock(i);
            if (x == "C" || x == "c")
                Console.WriteLine("[Thread] PID " + i.ToString() + " begin to process...");
            // Do some thing ????
            Random rnd = new Random((int)DateTime.Now.Ticks & 0x0000FFFF);
            int a = rnd.Next(1, 99);
            int b = rnd.Next(1, 99);
            int c = rnd.Next(1, 4);
            int d = 0;
            string o = "";
            if (c == 1)
            {
                d = a + b;
                o = "+";
            }
            else if (c == 2)
            {
                d = a * b;
                o = "*";
            }
            else if (c == 3)
            {
                d = a / b;
                o = "/";
            }
            else
            {
                d = a - b;
                o = "-";
            }
            if (x == "C" || x == "c")
                Console.WriteLine("Math Result : " + a.ToString() + o + b.ToString() + "=" + d.ToString());
            unlock(i);
        }

        [STAThread]
        static void Main(string[] args)
        {
            arrayInit();
            string choice = "C";
            while (choice == "C" || x == "c")
            {
                Console.WriteLine("Process log (C=Yes,K=No) : ");
                x = Console.ReadLine();
                if (x == "")
                    x = "C";
                Console.Clear();
                Console.WriteLine("----------------------------------");
                Console.WriteLine("Bakery Algorithm C#");
                Console.WriteLine("Number of threads : " + threads.ToString());
                Console.WriteLine("Process Log...");
                Console.WriteLine("----------------------------------");
                Thread[] threadArray = new Thread[threads];
                for (int i = 0; i < 10; i++)
                {
                    int copy = i;
                    threadArray[i] = new Thread(() => simThread(copy));
                    if (x == "C" || x == "c")
                        Console.WriteLine("[System] PID " + i.ToString() + " created");
                    threadArray[i].Start();
                }
                Console.ReadLine();
                Console.WriteLine("----------------------------------");
                Console.WriteLine("Complete processed " + count.ToString() + " threads !");
                count = 0;
                Console.WriteLine("----------------------------------");
                Console.WriteLine("You want to restart (Yes=C or No=K)");
                choice = Console.ReadLine();
                if (choice == "")
                    choice = "C";
            }
        }
    }
}
The results are here:
2*2=4
2*2=4 << duplicated
3*2=6
4*2=8
4*6=24
4*2=8 << duplicated
... and it continues with duplicate values (at random positions)!
Hope somebody here can help!
There are many things wrong with your code, but the most important part is that you didn't read the requirements that make Lamport's bakery algorithm work:
Lamport's bakery algorithm assumes a sequential consistency memory model.
You will be hard-pressed to find a modern computer that has sequential consistency.
So even if your implementation was correct with respect to those constraints, it would still be wrong on pretty much any computer that runs .NET. To make this work on a modern CPU and in .NET, you'll need to insert memory barriers to prevent instruction reordering and introduce cache refreshing to make sure each CPU core sees the same values... and by then you're probably better off using different synchronization primitives altogether.
Now, fixing these kinds of algorithms tends to be rather hard - multi-threading is hard on its own, doing lock-less multi-threading just pushes this to absurd territory. So let me just address some points:
1) You can't just use new Random() and expect statistically random numbers from that. Random has an internal state that's by default initialized to the current OS tick - that means that creating 10 Randoms in a row and then doing Next on each of those is pretty likely to produce exactly the same "random" numbers.
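To see that claim concretely, a quick check (my own snippet, not from the answer; the result assumes a runtime that seeds the parameterless constructor from the tick count, as described above):
// Two instances created back-to-back usually fall within the same OS tick,
// so they get the same seed and produce identical sequences.
var r1 = new Random();
var r2 = new Random();
Console.WriteLine(r1.Next() == r2.Next()); // typically prints True on such runtimes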
One way of handling that gracefully would be to have a thread-local field:
ThreadLocal<Random> rnd
= new ThreadLocal<Random>(() => new Random(Guid.NewGuid().GetHashCode()));
Each of your threads can then safely do rnd.Value.Next(...) and get reliable numbers without locking.
However, since the whole point of this exercise is to allow shared access to mutable state, a solution more in line with the task would be to use a single shared Random field instead (created only once, before starting the threads). Since the Bakery algorithm is supposed to make sure you can safely use shared stuff in the critical section, this should be safe, if implemented correctly :)
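A minimal sketch of that shared-field variant (my own sketch; it assumes doLock/unlock really do provide mutual exclusion, which is exactly what point 2 below questions):
// Created once, before any simThread starts; only touched inside the critical section.
static Random sharedRnd = new Random();

public static void simThread(int i)
{
    doLock(i);
    // Inside the critical section, so only one thread at a time uses the shared instance.
    int a = sharedRnd.Next(1, 99);
    int b = sharedRnd.Next(1, 99);
    // ... rest of the original math/printing work ...
    unlock(i);
}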
2) To actually make the Bakery part work, you need to enforce the only proper instruction ordering.
This is hard. Seriously.
I'm not actually sure how to do this safely.
The best way to start is to insert an explicit memory barrier before and after each read and write of shared state. Then you can go one by one and remove those that aren't necessary. Of course, you should only need this in the doLock and unlock methods - the rest of simThread should be single-threaded.
For a short sample:
Thread.MemoryBarrier();
entering[pid] = true;
Thread.MemoryBarrier();
int max = 0;
for (int i = 0; i < threads; i++)
{
    if (ticket[i] > ticket[max]) { max = i; }
}
Thread.MemoryBarrier();
ticket[pid] = 1 + max;
Thread.MemoryBarrier();
entering[pid] = false;
Thread.MemoryBarrier();
So, which one of those is it safe to remove? I have no idea. I'd have to use a lot of mental power to make sure this is safe. Heck, I'm not sure if it's safe as is - do I need to rewrite the for cycle too? Are ticket[i] and ticket[max] going to be fresh enough for the algorithm to work? I know some are definitely needed, but I'm not sure which can safely be left out.
I'm pretty sure this will be slower than using a simple lock, though. For any production code, steer clear of code like this - "smart" code usually gets you in trouble, even if everyone on your team understands it well. It's kind of hard finding those kinds of experts, and most of them wouldn't touch lock-less code like this with a meter-long stick :)
You must create a different random number generator for each thread (more details), so try this code in your Main method:
for (int i = 0; i < 10; i++)
{
    int temp = i;
    threadArray[i] = new Thread(() => simThread(temp));
    Console.WriteLine("[System] PID " + i.ToString() + " created");
    threadArray[i].Start();
    Thread.Sleep(20);
}
and the following code in your threads:
Random rand = new Random((int) DateTime.Now.Ticks & 0x0000FFFF);
Now you can ensure you produce different random numbers for each thread.
Try:
Random rnd = new Random(Environment.TickCount / (i + 1));
This will give different seeds to each RNG.

Code sample that shows casting to uint is more efficient than range check

So I am looking at this question, and the general consensus is that the uint cast version is more efficient than the range check against 0. Since the code is also in MS's implementation of List, I assume it is a real optimization. However, I have failed to produce a code sample that results in better performance for the uint version. I have tried different tests, and either something is missing or some other part of my code is dwarfing the time for the checks. My last attempt looks like this:
class TestType
{
    public TestType(int size)
    {
        MaxSize = size;
        Random rand = new Random(100);
        for (int i = 0; i < MaxIterations; i++)
        {
            indexes[i] = rand.Next(0, MaxSize);
        }
    }

    public const int MaxIterations = 10000000;
    private int MaxSize;
    private int[] indexes = new int[MaxIterations];

    public void Test()
    {
        var timer = new Stopwatch();
        int inRange = 0;
        int outOfRange = 0;
        timer.Start();
        for (int i = 0; i < MaxIterations; i++)
        {
            int x = indexes[i];
            if (x < 0 || x > MaxSize)
            {
                throw new Exception();
            }
            inRange += indexes[x];
        }
        timer.Stop();
        Console.WriteLine("Comparison 1: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
        inRange = 0;
        outOfRange = 0;
        timer.Reset();
        timer.Start();
        for (int i = 0; i < MaxIterations; i++)
        {
            int x = indexes[i];
            if ((uint)x > (uint)MaxSize)
            {
                throw new Exception();
            }
            inRange += indexes[x];
        }
        timer.Stop();
        Console.WriteLine("Comparison 2: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
    }
}

class Program
{
    static void Main()
    {
        TestType t = new TestType(TestType.MaxIterations);
        t.Test();
        TestType t2 = new TestType(TestType.MaxIterations);
        t2.Test();
        TestType t3 = new TestType(TestType.MaxIterations);
        t3.Test();
    }
}
The code is a bit of a mess because I tried many things to make the uint check perform faster, like moving the compared variable into a field of a class, generating random index access, and so on, but in every case the result seems to be the same for both versions. So is this change applicable on modern x86 processors, and can someone demonstrate it somehow?
Note that I am not asking for someone to fix my sample or explain what is wrong with it. I just want to see a case where the optimization does work.
if (x < 0 || x > MaxSize)
The comparison is performed by the CMP processor instruction (Compare). You'll want to take a look at Agner Fog's instruction tables document (PDF); it lists the cost of instructions. Find your processor in the list, then locate the CMP instruction.
For mine, Haswell, CMP takes 1 cycle of latency and 0.25 cycles of throughput.
A fractional cost like that could use an explanation. Haswell has 4 integer execution units that can execute instructions at the same time. When a program contains enough integer operations, like CMP, without interdependencies, they can all execute at the same time, in effect making the program 4 times faster. You don't always manage to keep all 4 of them busy at the same time with your code; it is actually pretty rare. But you do keep 2 of them busy in this case. Or in other words, two comparisons take just as long as a single one: 1 cycle.
There are other factors at play that make the execution times identical. One thing that helps is that the processor can predict the branch very well; it can speculatively execute x > MaxSize in spite of the short-circuit evaluation. And it will in fact end up using the result, since the branch is never taken.
And the true bottleneck in this code is the array indexing; accessing memory is one of the slowest things the processor can do. So the "fast" version of the code isn't faster, even though it gives the processor more opportunity to execute instructions concurrently. That isn't much of an opportunity today anyway; a processor has more execution units than a single thread can usually keep busy, which is otherwise the feature that makes HyperThreading work. In both cases the processor bogs down at the same rate.
On my machine, I have to write code that occupies more than 4 engines to make it slower. Silly code like this:
if (x < 0 || x > MaxSize || x > 10000000 || x > 20000000 || x > 3000000) {
    outOfRange++;
}
else {
    inRange++;
}
Using 5 compares, now I can see a difference: 61 vs 47 msec. Or in other words, this is a way to count the number of integer engines in the processor. Hehe :)
So this is a micro-optimization that probably used to pay off a decade ago. It doesn't anymore. Scratch it off your list of things to worry about :)
I would suggest attempting code which does not throw an exception when the index is out of range. Exceptions are incredibly expensive and can completely throw off your bench results.
The code below does a timed-average bench of 1,000 iterations of 1,000,000 sub-iterations each.
using System;
using System.Diagnostics;

namespace BenchTest
{
    class Program
    {
        const int LoopCount = 1000000;
        const int AverageCount = 1000;

        static void Main(string[] args)
        {
            Console.WriteLine("Starting Benchmark");
            RunTest();
            Console.WriteLine("Finished Benchmark");
            Console.Write("Press any key to exit...");
            Console.ReadKey();
        }

        static void RunTest()
        {
            int cursorRow = Console.CursorTop; int cursorCol = Console.CursorLeft;
            long totalTime1 = 0; long totalTime2 = 0;
            long invalidOperationCount1 = 0; long invalidOperationCount2 = 0;
            for (int i = 0; i < AverageCount; i++)
            {
                Console.SetCursorPosition(cursorCol, cursorRow);
                Console.WriteLine("Running iteration: {0}/{1}", i + 1, AverageCount);
                int[] indexArgs = RandomFill(LoopCount, int.MinValue, int.MaxValue);
                int[] sizeArgs = RandomFill(LoopCount, 0, int.MaxValue);
                totalTime1 += RunLoop(TestMethod1, indexArgs, sizeArgs, ref invalidOperationCount1);
                totalTime2 += RunLoop(TestMethod2, indexArgs, sizeArgs, ref invalidOperationCount2);
            }
            PrintResult("Test 1", TimeSpan.FromTicks(totalTime1 / AverageCount), invalidOperationCount1);
            PrintResult("Test 2", TimeSpan.FromTicks(totalTime2 / AverageCount), invalidOperationCount2);
        }

        static void PrintResult(string testName, TimeSpan averageTime, long invalidOperationCount)
        {
            Console.WriteLine(testName);
            Console.WriteLine(" Average Time: {0}", averageTime);
            Console.WriteLine(" Invalid Operations: {0} ({1})", invalidOperationCount, (invalidOperationCount / (double)(AverageCount * LoopCount)).ToString("P3"));
        }

        static long RunLoop(Func<int, int, int> testMethod, int[] indexArgs, int[] sizeArgs, ref long invalidOperationCount)
        {
            Stopwatch sw = new Stopwatch();
            Console.Write("Running {0} sub-iterations", LoopCount);
            sw.Start();
            long startTickCount = sw.ElapsedTicks;
            for (int i = 0; i < LoopCount; i++)
            {
                invalidOperationCount += testMethod(indexArgs[i], sizeArgs[i]);
            }
            sw.Stop();
            long stopTickCount = sw.ElapsedTicks;
            long elapsedTickCount = stopTickCount - startTickCount;
            Console.WriteLine(" - Time Taken: {0}", new TimeSpan(elapsedTickCount));
            return elapsedTickCount;
        }

        static int[] RandomFill(int size, int minValue, int maxValue)
        {
            int[] randomArray = new int[size];
            Random rng = new Random();
            for (int i = 0; i < size; i++)
            {
                randomArray[i] = rng.Next(minValue, maxValue);
            }
            return randomArray;
        }

        static int TestMethod1(int index, int size)
        {
            return (index < 0 || index >= size) ? 1 : 0;
        }

        static int TestMethod2(int index, int size)
        {
            return ((uint)(index) >= (uint)(size)) ? 1 : 0;
        }
    }
}
You aren't comparing like with like.
The code you were talking about not only saved one branch by using the optimisation, but also 4 bytes of CIL in a small method.
In a small method 4 bytes can be the difference in being inlined and not being inlined.
And if the method calling that method is also written to be small, then that can mean two (or more) method calls are jitted as one piece of inline code.
And because it is then inline and available for analysis by the jitter, maybe some of it is optimised even further.
The real difference is not between index < 0 || index >= _size and (uint)index >= (uint)_size, but between code that has repeated efforts to minimise the method body size and code that does not. Look for example at how another method is used to throw the exception if necessary, further shaving off a couple of bytes of CIL.
(And no, that's not to say that I think all methods should be written like that, but there certainly can be performance differences when one does).
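For illustration, here is a minimal sketch of the "throw helper" pattern that answer refers to. The class and member names are mine, not the actual List<T> internals:
// Keeping the hot accessor tiny (and moving the throw into a separate,
// rarely-taken method) improves its chances of being inlined by the JIT.
public class SmallList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public T GetItem(int index)
    {
        if ((uint)index >= (uint)_size)
            ThrowIndexOutOfRange();
        return _items[index];
    }

    private static void ThrowIndexOutOfRange()
    {
        throw new ArgumentOutOfRangeException("index");
    }
}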
