Duplicate math results in Bakery Algorithm (C# code)

Continuing from my previous topic (Index out of bounds when creating a new thread with parameters?), I now have a new problem with my Bakery Algorithm code!
Here's my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace BakeryAlgorithm
{
class Program
{
static int threads = 10;
static string x = "";
static int count = 0;
static int[] ticket = new int[threads];
static bool[] entering = new bool[threads];
public static void doLock(int pid)
{
for (int i = 0; i < threads; i++)
{
ticket[i] = 0;
entering[i] = false;
}
entering[pid] = true;
int max = 0;
for (int i = 0; i < threads; i++)
{
if (ticket[i] > ticket[max]) { max = i; }
}
ticket[pid] = 1+max;
entering[pid] = false;
for (int i = 0; i < threads; ++i)
{
if (i != pid)
{
while (entering[i])
{
Thread.Yield();
}
while (ticket[i] != 0 && (ticket[pid] > ticket[i] ||
(ticket[pid] == ticket[i] && pid > i)))
{
Thread.Yield();
}
}
}
if (x == "C" || x == "c")
Console.WriteLine("[System] PID " + pid.ToString() + " get into critical section");
}
public static void unlock(int pid)
{
ticket[pid] = 0;
count++;
Console.WriteLine("[Thread] PID " + pid.ToString() + " complete.");
}
public static void arrayInit()
{
for (int i = 0; i < threads; i++)
{
ticket[i] = 0;
entering[i] = false;
}
}
public static void simThread(int i)
{
doLock(i);
if (x == "C" || x=="c")
Console.WriteLine("[Thread] PID " + i.ToString() + " begin to process...");
//Do some thing ????
Random rnd = new Random((int)DateTime.Now.Ticks & 0x0000FFFF);
int a = rnd.Next(1,99);
int b = rnd.Next(1,99);
int c = rnd.Next(1,4);
int d = 0;
string o="";
if (c == 1)
{
d = a + b;
o="+";
}
else if (c == 2)
{
d = a * b;
o="*";
}
else if (c == 3)
{
d = a / b;
o="/";
}
else
{
d = a - b;
o="-";
}
if (x == "C" || x == "c")
Console.WriteLine("Math Result : " + a.ToString() + o + b.ToString() + "=" + d.ToString());
unlock(i);
}
[STAThread]
static void Main(string[] args)
{
arrayInit();
string choice="C";
while (choice == "C" || x == "c")
{
Console.WriteLine("Process log (C=Yes,K=No) : ");
x = Console.ReadLine();
if (x == "")
x = "C";
Console.Clear();
Console.WriteLine("----------------------------------");
Console.WriteLine("Bakery Algorithm C#");
Console.WriteLine("Number of threads : " + threads.ToString());
Console.WriteLine("Process Log...");
Console.WriteLine("----------------------------------");
Thread[] threadArray = new Thread[threads];
for (int i = 0; i < 10; i++)
{
int copy = i;
threadArray[i] = new Thread(() => simThread(copy));
if (x == "C" || x == "c")
Console.WriteLine("[System] PID " + i.ToString() + " created");
threadArray[i].Start();
}
Console.ReadLine();
Console.WriteLine("----------------------------------");
Console.WriteLine("Complete processed " + count.ToString() + " threads !");
count = 0;
Console.WriteLine("----------------------------------");
Console.WriteLine("You want to restart (Yes=C or No=K)");
choice = Console.ReadLine();
if (choice == "")
choice = "C";
}
}
}
}
The results are:
2*2=4
2*2=4 << duplicated
3*2=6
4*2=8
4*6=24
4*2=8 << duplicated
... and it continues with duplicate values (at random positions)!
Hope somebody here can help!

There are many things wrong with your code, but the most important part is that you didn't read the requirements that make Lamport's bakery algorithm work:
Lamport's bakery algorithm assumes a sequential consistency memory model.
You will be hard-pressed to find a modern computer that has sequential consistency.
So even if your implementation was correct with respect to those constraints, it would still be wrong on pretty much any computer that runs .NET. To make this work on a modern CPU and in .NET, you'll need to insert memory barriers to prevent instruction reordering and introduce cache refreshing to make sure each CPU core sees the same values... and by then you're probably better off using different synchronization primitives altogether.
Now, fixing these kinds of algorithms tends to be rather hard - multi-threading is hard on its own, doing lock-less multi-threading just pushes this to absurd territory. So let me just address some points:
1) You can't just use new Random() and expect statistically random numbers from that. Random has an internal state that's by default initialized to the current OS tick - that means that creating 10 Randoms in a row and then doing Next on each of those is pretty likely to produce exactly the same "random" numbers.
One way of handling that gracefully would be to have a thread-local field:
ThreadLocal<Random> rnd
= new ThreadLocal<Random>(() => new Random(Guid.NewGuid().GetHashCode()));
Each of your threads can then safely do rnd.Value.Next(...) and get reliable numbers without locking.
However, since the whole point of this exercise is to allow shared access to mutable state, a solution more in line with the task would be to use a single shared Random field instead (created only once, before starting the threads). Since the Bakery algorithm is supposed to make sure you can safely use shared state in the critical section, this should be safe, if implemented correctly :)
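For illustration, here is a minimal sketch of that shared-Random variant (it only works if doLock/unlock really do provide mutual exclusion, which is exactly what the rest of this answer is about):
// Sketch: one Random shared by all threads, created once before they start.
// Safe only inside the critical section protected by doLock/unlock.
static readonly Random sharedRnd = new Random();

// inside simThread, between doLock(i) and unlock(i):
int a = sharedRnd.Next(1, 99);
int b = sharedRnd.Next(1, 99);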
2) To actually make the Bakery part work, you need to enforce proper instruction ordering.
This is hard. Seriously.
I'm not actually sure how to do this safely.
The best way to start is to insert an explicit memory barrier before and after each read and write of shared state. Then you can go one by one and remove those that aren't necessary. Of course, you should only need this in the doLock and unlock methods - the rest of simThread should be single-threaded.
For a short sample:
Thread.MemoryBarrier();
entering[pid] = true;
Thread.MemoryBarrier();
int max = 0;
for (int i = 0; i < threads; i++)
{
if (ticket[i] > ticket[max]) { max = i; }
}
Thread.MemoryBarrier();
ticket[pid] = 1+max;
Thread.MemoryBarrier();
entering[pid] = false;
Thread.MemoryBarrier();
So, which one of those is it safe to remove? I have no idea. I'd have to use a lot of mental power to make sure this is safe. Heck, I'm not sure if it's safe as is - do I need to rewrite the for cycle too? Are ticket[i] and ticket[max] going to be fresh enough for the algorithm to work? I know some are definitely needed, but I'm not sure which can safely be left out.
I'm pretty sure this will be slower than using a simple lock, though. For any production code, steer clear of code like this - "smart" code usually gets you in trouble, even if everyone on your team understands it well. It's kind of hard finding those kinds of experts, and most of them wouldn't touch lock-less code like that with a meter-long stick :)
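For comparison, here is a minimal sketch of what the same critical section looks like with a plain lock statement instead of the hand-rolled bakery lock (simThread here is a simplified stand-in for the one in the question):
// Sketch: a built-in monitor replaces doLock/unlock entirely.
static readonly object gate = new object();
static readonly Random sharedRnd = new Random();

public static void simThread(int i)
{
    lock (gate)                       // enter the critical section
    {
        int a = sharedRnd.Next(1, 99);
        int b = sharedRnd.Next(1, 99);
        Console.WriteLine("[Thread] PID " + i + " result: " + (a + b));
    }                                 // leave the critical section
}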

You must create a differently seeded random number generator for each thread (more details)
so try this code in your main method:
for (int i = 0; i < 10; i++)
{
int temp = i;
threadArray[i] = new Thread(() => simThread(temp));
Console.WriteLine("[He Thong] PID " + i.ToString() + " duoc khoi tao");
threadArray[i].Start();
Thread.Sleep(20);
}
and the following code in your threads:
Random rand = new Random((int) DateTime.Now.Ticks & 0x0000FFFF);
Now you can ensure you produce different random numbers for each thread.

Try:
Random rnd = new Random(Environment.TickCount / (i + 1));
This will give different seeds to each RNG.


Multithreading is taking more time than sequential threading

I am new to C#.
I am generating random numbers and saving them into an integer array of size 1 million; then I search for a user-input number and count its occurrences in the array, first using a single thread and then using 5 threads. My processor has 4 cores.
THE PROBLEM is that multithreading is taking way more time than the sequential search, and I just cannot figure out why. Any help would be much appreciated.
Here is the code.
namespace LAB_2
{
class Program
{
static int[] arr = new int[1000000];
static int counter = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0,c5=0;
static int x = 0;
#if DEBUG
static void Main(string[] args)
{
try
{
//Take input
generate();
Console.WriteLine("Enter number to search for its occurances");
x = Console.Read();
//Multithreaded search
Stopwatch stopwatch2 = Stopwatch.StartNew();
multithreaded_search();
stopwatch2.Stop();
Console.WriteLine("Multithreaded search");
Console.WriteLine("Total milliseconds with multiple threads = " + stopwatch2.ElapsedMilliseconds);
//search without multithreading
Stopwatch stopwatch = Stopwatch.StartNew();
search();
stopwatch.Stop();
Console.WriteLine("Total milliseconds without multiple threads = " + stopwatch.ElapsedMilliseconds);
}
finally
{
Console.WriteLine("Press enter to close...");
Console.ReadLine();
}
#endif
}
public static void generate() //Populate the array
{
Random rnd = new Random();
for (int i = 0; i < 1000000; i++)
{
arr[i] = rnd.Next(1, 500000);
}
}
public static void search() //single threaded/Normal searching
{
int counter = 0;
for (int i = 0; i < 1000000; i++)
{
if (x == arr[i])
{
counter++;
}
}
Console.WriteLine("Number of occurances " + counter);
}
public static void multithreaded_search()
{
Task thr1 = Task.Factory.StartNew(() => doStuff(0, 200000, "c1"));
Task thr2 = Task.Factory.StartNew(() => doStuff(200001, 400000, "c2"));
Task thr3 = Task.Factory.StartNew(() => doStuff(400001, 600000, "c3"));
Task thr4 = Task.Factory.StartNew(() => doStuff(600001, 800000, "c4"));
Task thr5 = Task.Factory.StartNew(() => doStuff(800001, 1000000, "c5"));
//IF I don't use WaitAll then the search is
//faster than sequential, but gets compromised
Task.WaitAll(thr1, thr2, thr3, thr4, thr5);
counter = c1 + c2 + c3 + c4 + c5;
Console.WriteLine("Multithreaded search");
Console.WriteLine("Number of occurances " + counter);
}
static void doStuff(int stime, int etime, String c)
{
for (int i = stime; i < etime; i++)
{
if (x == arr[i])
{
switch (c)
{
case "c1":
c1++;
break;
case "c2":
c2++;
break;
case "c3":
c3++;
break;
case "c4":
c4++;
break;
case "c5":
c5++;
break;
};
}
Thread.Yield();
}
}
}
}
First, in your doStuff you do more work than in search. While it is not likely to have a tangible effect, you never know.
Second, Thread.Yield is a killer with tasks. This method is intended for very marginal situations, like spinning when you think a lock might be too expensive. Here, it is just a brake on your code, causing the OS scheduler to do more work, perhaps even perform a context switch on the current core, which in turn will invalidate the cache.
Finally, your data and computations are small. Modern CPUs will enumerate such an array in no time, and it is likely that a great part of it, or even all of it, fits in the cache. Concurrent processing has its overhead.
For measurements like this, I recommend BenchmarkDotNet.
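To illustrate, here is a rough sketch of the same partitioned search with a local counter per task and no Thread.Yield in the hot loop. It is a sketch, not a tuned benchmark, and it assumes System.Linq and System.Threading.Tasks are imported:
// Sketch: each task counts matches into a local variable and returns it,
// so there are no shared counters and no yielding inside the loop.
static int MultithreadedSearch(int[] arr, int x, int partitions)
{
    int chunk = arr.Length / partitions;
    var tasks = new Task<int>[partitions];
    for (int p = 0; p < partitions; p++)
    {
        int start = p * chunk;
        int end = (p == partitions - 1) ? arr.Length : start + chunk;
        tasks[p] = Task.Run(() =>
        {
            int local = 0;
            for (int i = start; i < end; i++)
                if (arr[i] == x) local++;
            return local;
        });
    }
    Task.WaitAll(tasks);
    return tasks.Sum(t => t.Result);   // combine the per-task counts once, at the end
}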

Checking if instance is null VS calling empty function (C#)

Introduction
So I was making a game, thinking about how to structure and update all my game objects. Do I (case 1) create a simple GameObj as a parent class, put some physics in a virtual Update method and some default drawing in a virtual Draw, etc., and make every other object (wall, enemy, player...) a child of it, OR do I (case 2) use components as described in this article? In short, the writer explains that we could make interfaces for user input, physics updates and drawing (let's stop at those 3) and describe our GameObj with preprogrammed instances of these interfaces.
Now, in both cases I will end up with a loop over the GameObj list.
In case 1 it would probably look something like this
// in Update function of the level class
for(int i = 0; i < gameObjList.Count; i++)
{
gameObjList[i].Update();
}
And in case 2, something like this
// in UpdatePhysics function of the level class
for(int i = 0; i < gameObjList.Count; i++)
{
gameObjList[i].PhysicComponent.Update();
}
And so on (in case 2) for other interfaces such as InputComponent.Update and DrawComponent.Draw (or CollisionComponent.Check(gameObj[x]), I dunno).
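For context, here is a minimal sketch of what the case-2 structure could look like; the interface and member names are my own guesses, not taken from the article:
// Sketch: one interface per concern, and a GameObj that just holds components.
public interface IPhysicsComponent { void Update(); }
public interface IInputComponent { void Update(); }
public interface IDrawComponent { void Draw(); }

public class GameObj
{
    public IPhysicsComponent PhysicComponent { get; set; }
    public IInputComponent InputComponent { get; set; }
    public IDrawComponent DrawComponent { get; set; }
    // position, sprite, etc. would live here
}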
The reasons listed below are meant to apply inside a level class that takes care of all of our game objects.
Reasons to consider if ( x != null )
In both cases we (could) have a situation where we need to call if ( x != null ). In case 1 we maybe don't want to delete and add to the gameObjList all the time, but recycle the instances, so we set them to null without doing something along the lines of gameObjList.Remove(x). In case 2 maybe we want to be able not to set some of the components, so we'd have to ask if (gameObjList[i].someComponent != null) to be able to call gameObjList[i].someComponent.Update().
Reasons to consider calling empty function
Also, in both cases, we could just call an empty function (e.g. public void myFunction(){}). Let's consider the self-explanatory Wall class. It exists just to be there. It doesn't update, but it does have a certain relation to other GameObjs. Also, some of its children in case 1, like, say, a MovingWall or Platform, would have some sort of update. As for case 2, we could always declare a default, empty someComponent class whose Update function is empty, so an instance of this class would be assigned to our GameObj component if none is set in the constructor. Maybe something like this:
public GameObj(IPhysicsComponent physicsComponent, ...){
if(physicsComponent == null)
physicsComponent = PhysicsComponent.Default;
this.physicsComponent = physicsComponent;
}
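For completeness, PhysicsComponent.Default in that sketch could be nothing more than an empty "null object" implementation, roughly like this (illustrative only):
// Sketch: an empty fallback component shared by every GameObj that has no physics.
public class PhysicsComponent : IPhysicsComponent
{
    public static readonly IPhysicsComponent Default = new PhysicsComponent();
    public void Update() { /* intentionally empty: a plain wall just sits there */ }
}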
Research
Now, I couldn't find which of these would be the most efficient in the kind of game engine we are building here. Here are the cases I tested (note that some of them are just for reference):
1. empty loop
2. empty function
3. if(x != null) x.emptyFunction(); x is always null
4. x?.emptyFunction(); x is always null
5. if(x != null) x.emptyFunction(); x is not null
6. x?.emptyFunction(); x is not null
7. myClass.staticEmptyFunction();
Each of these 7 cases runs a loop of 100,000 iterations, and the whole test is repeated 10,000 times. The code below is what I tested with. You can run it locally and change some of the static variables; the results will appear in "result.txt" in the folder where you ran the program. Here is the code:
public enum TimeType
{
emptyLoop = 1,
loopEmptyFunction = 2,
loopNullCheck = 3,
loopNullCheckShort = 4,
loopNullCheckInstanceNotNull = 5,
loopNullCheckInstanceNotNullShort = 6,
loopEmptyStaticFunction = 7
}
class myTime
{
public double miliseconds { get; set; }
public long ticks { get; set; }
public TimeType type { get; set; }
public myTime() { }
public myTime(Stopwatch stopwatch, TimeType type)
{
miliseconds = stopwatch.Elapsed.TotalMilliseconds;
ticks = stopwatch.ElapsedTicks;
this.type = type;
}
}
class myClass
{
public static void staticEmptyFunction() { }
public void emptyFunction() { }
}
class Program
{
static List<myTime> timesList = new List<myTime>();
static int testTimesCount = 10000;
static int oneTestDuration = 100000;
static void RunTest()
{
Stopwatch stopwatch = new Stopwatch();
Console.Write("TEST ");
for (int j = 0; j < testTimesCount; j++)
{
Console.Write("{0}, ", j + 1);
myClass myInstance = null;
// 1. EMPTY LOOP
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)1));
stopwatch.Reset();
// 3. LOOP WITH NULL CHECKING (INSTANCE IS NULL)
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
if (myInstance != null)
myInstance.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)3));
stopwatch.Reset();
// 4. LOOP WITH SHORT NULL CHECKING (INSTANCE IS NULL)
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
myInstance?.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)4));
stopwatch.Reset();
myInstance = new myClass();
// 2. LOOP WITH EMPTY FUNCTION
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
myInstance.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)2));
stopwatch.Reset();
// 5. LOOP WITH NULL CHECKING (INSTANCE IS NOT NULL)
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
if (myInstance != null)
myInstance.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)5));
stopwatch.Reset();
// 6. LOOP WITH SHORT NULL CHECKING (INSTANCE IS NOT NULL)
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
myInstance?.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)6));
stopwatch.Reset();
// 7. LOOP WITH STATIC FUNCTION
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
myClass.staticEmptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)7));
stopwatch.Reset();
}
Console.WriteLine("\nDONE TESTING");
}
static void GetResults()
{
// SUMS
double sum1t, sum2t, sum3t, sum4t, sum5t, sum6t, sum7t,
sum1m, sum2m, sum3m, sum4m, sum5m, sum6m, sum7m;
sum1t = sum2t = sum3t = sum4t = sum5t = sum6t = sum7t =
sum1m = sum2m = sum3m = sum4m = sum5m = sum6m = sum7m = 0;
foreach (myTime time in timesList)
{
switch (time.type)
{
case (TimeType)1: sum1t += time.ticks; sum1m += time.miliseconds; break;
case (TimeType)2: sum2t += time.ticks; sum2m += time.miliseconds; break;
case (TimeType)3: sum3t += time.ticks; sum3m += time.miliseconds; break;
case (TimeType)4: sum4t += time.ticks; sum4m += time.miliseconds; break;
case (TimeType)5: sum5t += time.ticks; sum5m += time.miliseconds; break;
case (TimeType)6: sum6t += time.ticks; sum6m += time.miliseconds; break;
case (TimeType)7: sum7t += time.ticks; sum7m += time.miliseconds; break;
}
}
// AVERAGES
double avg1t, avg2t, avg3t, avg4t, avg5t, avg6t, avg7t,
avg1m, avg2m, avg3m, avg4m, avg5m, avg6m, avg7m;
avg1t = sum1t / (double)testTimesCount;
avg2t = sum2t / (double)testTimesCount;
avg3t = sum3t / (double)testTimesCount;
avg4t = sum4t / (double)testTimesCount;
avg5t = sum5t / (double)testTimesCount;
avg6t = sum6t / (double)testTimesCount;
avg7t = sum7t / (double)testTimesCount;
avg1m = sum1m / (double)testTimesCount;
avg2m = sum2m / (double)testTimesCount;
avg3m = sum3m / (double)testTimesCount;
avg4m = sum4m / (double)testTimesCount;
avg5m = sum5m / (double)testTimesCount;
avg6m = sum6m / (double)testTimesCount;
avg7m = sum7m / (double)testTimesCount;
string fileName = "/result.txt";
using (StreamWriter tr = new StreamWriter(AppDomain.CurrentDomain.BaseDirectory + fileName))
{
tr.WriteLine(((TimeType)1).ToString() + "\t" + avg1t + "\t" + avg1m);
tr.WriteLine(((TimeType)2).ToString() + "\t" + avg2t + "\t" + avg2m);
tr.WriteLine(((TimeType)3).ToString() + "\t" + avg3t + "\t" + avg3m);
tr.WriteLine(((TimeType)4).ToString() + "\t" + avg4t + "\t" + avg4m);
tr.WriteLine(((TimeType)5).ToString() + "\t" + avg5t + "\t" + avg5m);
tr.WriteLine(((TimeType)6).ToString() + "\t" + avg6t + "\t" + avg6m);
tr.WriteLine(((TimeType)7).ToString() + "\t" + avg7t + "\t" + avg7m);
}
}
static void Main(string[] args)
{
RunTest();
GetResults();
Console.ReadLine();
}
}
When I put all the data in excel and made a chart, it looked like this (DEBUG):
EDIT - RELEASE version. I guess this answers my question.
The questions are
Q1. What approach to use to be more efficient?
Q2. In what case?
Q3. Is there official documentation on this?
Q4. Did anybody else test this, maybe more intensively?
Q5. Is there a better way to test this (is my code at fault)?
Q6. Is there a better way around the problem of huge lists of instances that need to be quickly and efficiently updated, as in - every frame?
EDIT Q7. Why does the static method take so much longer to execute in the release version?
As @grek40 suggested, I did another test where I called myClass.staticEmptyFunction(); 100 times before starting the test so that it can be cached. I also set testTimesCount to 10,000 and oneTestDuration to 1,000,000. Here are the results:
Now, it seems much more stable. Even the little differences you can spot I blame on my Google Chrome, Excel and Deluge running in the background. I asked these questions because I thought there would be a greater difference, but I guess the optimisation works much, much better than I expected. I also guess that nobody did this test because they probably knew that there's C behind it, and that people did amazing work on the optimisation.

Garbage Collector generations not incrementing

I have a lot of questions about the garbage collection procedure, mainly when it runs, when objects are promoted to an older generation, and so on.
static void Main(string[] args)
{
int i = 0, j = 0;
int a = 0;
Holder prev = new Holder(null);
while(GC.CollectionCount(1) == 0)
{
int aux = GC.CollectionCount(0);
if(aux > a){
a = aux;
++j;
Console.WriteLine((i+1));
}
++i;
Holder h = new Holder(prev);
Console.WriteLine(GC.GetGeneration(prev));
prev = h;
}
}
I'm trying to get the number of objects in gen1.
Why does j equal 1? Does the GC really run only once on gen0? To exit the while loop, shouldn't it run at least twice?
[EDIT]
By adding this after the while loop breaks, I got very confused:
Console.WriteLine("#gc0 = "+GC.CollectionCount(0)); --> 2
Console.WriteLine("#gc1 = "+GC.CollectionCount(1)); --> 1
Console.WriteLine("#objs = "+ i);
Console.ReadLine();
How come GC.CollectionCount(0) is only 2? I've been reading Richter's CLR via C#, and he says this:
the objects in generation 1 are examined
only when generation 1 reaches its budget, which usually requires several garbage collections of
generation 0.
[EDIT]
But at the same time, if the GC sees that all the objects survived, it grows the gen0 budget. Maybe that is the reason for only 2 collections on gen0?
How about we spice things up with a bit of randomness like this:
class Program
{
static void Main(string[] args)
{
Random random = new Random();
int i = 0, j = 0;
int a = 0;
Holder prev = new Holder(null);
Holder prev2 = new Holder(null);
while (GC.CollectionCount(1) == 0)
{
int aux = GC.CollectionCount(0);
if (aux > a)
{
a = aux;
++j;
Console.WriteLine((i + 1));
}
++i;
var flag = random.Next(2) == 1; // Next(2) returns 0 or 1, so this is a 50/50 coin flip
Holder h = new Holder(flag ? prev : prev2);
Console.WriteLine("Prev: " + GC.GetGeneration(prev));
Console.WriteLine("Prev2: " + GC.GetGeneration(prev2));
if (flag)
{
prev = h;
}
else
{
prev2 = h;
}
}
}
}
internal class Holder
{
private Holder holder;
public Holder(Holder o)
{
holder = o;
}
}
The code sample you've provided was so simple that the CLR knew there was no point in moving your prev item to another generation.
Its usage was simple, and I think the runtime optimized it to live in gen0 only.
Adding more complex logic breaks the runtime's optimizations, and now one of prev or prev2 will go to gen1, depending on which of the objects was used less frequently (I don't know the exact mechanics here).
Instead of prev and prev2, you can try using a prevs array and picking a random index into it; then you can better see how the array elements advance through the generations, as in the sketch below.
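Something along these lines, reusing the Holder class from above (a rough sketch of that suggestion, not tuned for any particular GC configuration):
static void Main()
{
    Random random = new Random();
    var prevs = new Holder[16];
    for (int k = 0; k < prevs.Length; k++)
        prevs[k] = new Holder(null);

    while (GC.CollectionCount(1) == 0)
    {
        int idx = random.Next(prevs.Length);       // pick a random slot
        Console.WriteLine("slot {0} is in gen {1}", idx, GC.GetGeneration(prevs[idx]));
        prevs[idx] = new Holder(prevs[idx]);       // grow the chain kept alive in that slot
    }
}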

Code sample that shows casting to uint is more efficient than range check

So I am looking at this question, and the general consensus is that the uint cast version is more efficient than the range check against 0. Since the code is also in MS's implementation of List<T>, I assume it is a real optimization. However, I have failed to produce a code sample that shows better performance for the uint version. I have tried different tests, and either something is missing or some other part of my code is dwarfing the time for the checks. My last attempt looks like this:
class TestType
{
public TestType(int size)
{
MaxSize = size;
Random rand = new Random(100);
for (int i = 0; i < MaxIterations; i++)
{
indexes[i] = rand.Next(0, MaxSize);
}
}
public const int MaxIterations = 10000000;
private int MaxSize;
private int[] indexes = new int[MaxIterations];
public void Test()
{
var timer = new Stopwatch();
int inRange = 0;
int outOfRange = 0;
timer.Start();
for (int i = 0; i < MaxIterations; i++)
{
int x = indexes[i];
if (x < 0 || x > MaxSize)
{
throw new Exception();
}
inRange += indexes[x];
}
timer.Stop();
Console.WriteLine("Comparision 1: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
inRange = 0;
outOfRange = 0;
timer.Reset();
timer.Start();
for (int i = 0; i < MaxIterations; i++)
{
int x = indexes[i];
if ((uint)x > (uint)MaxSize)
{
throw new Exception();
}
inRange += indexes[x];
}
timer.Stop();
Console.WriteLine("Comparision 2: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
}
}
class Program
{
static void Main()
{
TestType t = new TestType(TestType.MaxIterations);
t.Test();
TestType t2 = new TestType(TestType.MaxIterations);
t2.Test();
TestType t3 = new TestType(TestType.MaxIterations);
t3.Test();
}
}
The code is a bit of a mess because I tried many things to make the uint check perform faster, like moving the compared variable into a field of a class, generating random index accesses, and so on, but in every case the result seems to be the same for both versions. So is this change applicable on modern x86 processors, and can someone demonstrate it somehow?
Note that I am not asking for someone to fix my sample or explain what is wrong with it. I just want to see a case where the optimization does work.
if (x < 0 || x > MaxSize)
The comparison is performed by the CMP processor instruction (Compare). You'll want to take a look at Agner Fog's instruction tables document (PDF); it lists the cost of instructions. Find your processor in the list, then locate the CMP instruction.
For mine, Haswell, CMP takes 1 cycle of latency and 0.25 cycles of throughput.
A fractional cost like that needs an explanation: Haswell has 4 integer execution units that can execute instructions at the same time. When a program contains enough integer operations, like CMP, without interdependencies, they can all execute at the same time, in effect making the program 4 times faster. You don't always manage to keep all 4 of them busy at the same time with your code; it is actually pretty rare. But you do keep 2 of them busy in this case. In other words, two comparisons take just as long as a single one: 1 cycle.
There are other factors at play that make the execution time identical. One thing that helps is that the processor can predict the branch very well; it can speculatively evaluate x > MaxSize in spite of the short-circuit evaluation. And it will in fact end up using the result, since the branch is never taken.
And the true bottleneck in this code is the array indexing; accessing memory is one of the slowest things the processor can do. So the "fast" version of the code isn't faster, even though it gives the processor more opportunity to execute instructions concurrently. That isn't much of an opportunity today anyway; a processor has more execution units than most code can keep busy, which is otherwise the headroom that makes HyperThreading work. In both cases the processor bogs down at the same rate.
On my machine, I have to write code that occupies more than 4 engines to make it slower. Silly code like this:
if (x < 0 || x > MaxSize || x > 10000000 || x > 20000000 || x > 3000000) {
outOfRange++;
}
else {
inRange++;
}
Using 5 compares, now I can see a difference: 61 vs 47 msec. In other words, this is a way to count the number of integer execution units in the processor. Hehe :)
So this is a micro-optimization that probably used to pay off a decade ago. It doesn't anymore. Scratch it off your list of things to worry about :)
I would suggest writing code that does not throw an exception when the index is out of range. Exceptions are incredibly expensive and can completely throw off your benchmark results.
The code below runs a timed benchmark, averaging 1,000 iterations of 1,000,000 operations each.
using System;
using System.Diagnostics;
namespace BenchTest
{
class Program
{
const int LoopCount = 1000000;
const int AverageCount = 1000;
static void Main(string[] args)
{
Console.WriteLine("Starting Benchmark");
RunTest();
Console.WriteLine("Finished Benchmark");
Console.Write("Press any key to exit...");
Console.ReadKey();
}
static void RunTest()
{
int cursorRow = Console.CursorTop; int cursorCol = Console.CursorLeft;
long totalTime1 = 0; long totalTime2 = 0;
long invalidOperationCount1 = 0; long invalidOperationCount2 = 0;
for (int i = 0; i < AverageCount; i++)
{
Console.SetCursorPosition(cursorCol, cursorRow);
Console.WriteLine("Running iteration: {0}/{1}", i + 1, AverageCount);
int[] indexArgs = RandomFill(LoopCount, int.MinValue, int.MaxValue);
int[] sizeArgs = RandomFill(LoopCount, 0, int.MaxValue);
totalTime1 += RunLoop(TestMethod1, indexArgs, sizeArgs, ref invalidOperationCount1);
totalTime2 += RunLoop(TestMethod2, indexArgs, sizeArgs, ref invalidOperationCount2);
}
PrintResult("Test 1", TimeSpan.FromTicks(totalTime1 / AverageCount), invalidOperationCount1);
PrintResult("Test 2", TimeSpan.FromTicks(totalTime2 / AverageCount), invalidOperationCount2);
}
static void PrintResult(string testName, TimeSpan averageTime, long invalidOperationCount)
{
Console.WriteLine(testName);
Console.WriteLine(" Average Time: {0}", averageTime);
Console.WriteLine(" Invalid Operations: {0} ({1})", invalidOperationCount, (invalidOperationCount / (double)(AverageCount * LoopCount)).ToString("P3"));
}
static long RunLoop(Func<int, int, int> testMethod, int[] indexArgs, int[] sizeArgs, ref long invalidOperationCount)
{
Stopwatch sw = new Stopwatch();
Console.Write("Running {0} sub-iterations", LoopCount);
sw.Start();
long startTickCount = sw.ElapsedTicks;
for (int i = 0; i < LoopCount; i++)
{
invalidOperationCount += testMethod(indexArgs[i], sizeArgs[i]);
}
sw.Stop();
long stopTickCount = sw.ElapsedTicks;
long elapsedTickCount = stopTickCount - startTickCount;
Console.WriteLine(" - Time Taken: {0}", new TimeSpan(elapsedTickCount));
return elapsedTickCount;
}
static int[] RandomFill(int size, int minValue, int maxValue)
{
int[] randomArray = new int[size];
Random rng = new Random();
for (int i = 0; i < size; i++)
{
randomArray[i] = rng.Next(minValue, maxValue);
}
return randomArray;
}
static int TestMethod1(int index, int size)
{
return (index < 0 || index >= size) ? 1 : 0;
}
static int TestMethod2(int index, int size)
{
return ((uint)(index) >= (uint)(size)) ? 1 : 0;
}
}
}
You aren't comparing like with like.
The code you were talking about not only saved one branch by using the optimisation, but also 4 bytes of CIL in a small method.
In a small method 4 bytes can be the difference in being inlined and not being inlined.
And if the method calling that method is also written to be small, then that can mean two (or more) method calls are jitted as one piece of inline code.
And maybe some of it is then, because it is inline and available for analysis by the jitter, optimised further again.
The real difference is not between index < 0 || index >= _size and (uint)index >= (uint)_size, but between code that has repeated efforts to minimise the method body size and code that does not. Look for example at how another method is used to throw the exception if necessary, further shaving off a couple of bytes of CIL.
(And no, that's not to say that I think all methods should be written like that, but there certainly can be performance differences when one does).
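To make that concrete, here is a rough sketch of the kind of shape being described: the unsigned compare keeps the indexer body tiny, and the throw is pushed into a separate helper method. The names are illustrative; this is not the actual List<T> source:
using System;

public class MiniList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public T this[int index]
    {
        get
        {
            // One unsigned compare covers both "negative" and "too large".
            if ((uint)index >= (uint)_size)
                ThrowIndexOutOfRange();   // the throw lives elsewhere, keeping this body small
            return _items[index];
        }
    }

    private static void ThrowIndexOutOfRange()
    {
        throw new ArgumentOutOfRangeException("index");
    }
}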

What's wrong with my implementation of the KMP algorithm?

static void Main(string[] args)
{
string str = "ABC ABCDAB ABCDABCDABDE";//We should add some text here for
//the performance tests.
string pattern = "ABCDABD";
List<int> shifts = new List<int>();
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
NaiveStringMatcher(shifts, str, pattern);
stopWatch.Stop();
Trace.WriteLine(String.Format("Naive string matcher {0}", stopWatch.Elapsed));
foreach (int s in shifts)
{
Trace.WriteLine(s);
}
shifts.Clear();
stopWatch.Restart();
int[] pi = new int[pattern.Length];
Knuth_Morris_Pratt(shifts, str, pattern, pi);
stopWatch.Stop();
Trace.WriteLine(String.Format("Knuth_Morris_Pratt {0}", stopWatch.Elapsed));
foreach (int s in shifts)
{
Trace.WriteLine(s);
}
Console.ReadKey();
}
static IList<int> NaiveStringMatcher(List<int> shifts, string text, string pattern)
{
int lengthText = text.Length;
int lengthPattern = pattern.Length;
for (int s = 0; s < lengthText - lengthPattern + 1; s++ )
{
if (text[s] == pattern[0])
{
int i = 0;
while (i < lengthPattern)
{
if (text[s + i] == pattern[i])
i++;
else break;
}
if (i == lengthPattern)
{
shifts.Add(s);
}
}
}
return shifts;
}
static IList<int> Knuth_Morris_Pratt(List<int> shifts, string text, string pattern, int[] pi)
{
int patternLength = pattern.Length;
int textLength = text.Length;
//ComputePrefixFunction(pattern, pi);
int j;
for (int i = 1; i < pi.Length; i++)
{
j = 0;
while ((i < pi.Length) && (pattern[i] == pattern[j]))
{
j++;
pi[i++] = j;
}
}
int matchedSymNum = 0;
for (int i = 0; i < textLength; i++)
{
while (matchedSymNum > 0 && pattern[matchedSymNum] != text[i])
matchedSymNum = pi[matchedSymNum - 1];
if (pattern[matchedSymNum] == text[i])
matchedSymNum++;
if (matchedSymNum == patternLength)
{
shifts.Add(i - patternLength + 1);
matchedSymNum = pi[matchedSymNum - 1];
}
}
return shifts;
}
Why does my implementation of the KMP algorithm run slower than the naive string-matching algorithm?
The KMP algorithm has two phases: first it builds a table, and then it does a search, directed by the contents of the table.
The naive algorithm has one phase: it does a search. It does that search much less efficiently in the worst case than the KMP search phase.
If the KMP is slower than the naive algorithm then that is probably because building the table is taking you longer than it takes to simply search the string naively in the first place. Naive string matching is usually very fast on short strings. There is a reason why we don't use fancy-pants algorithms like KMP inside the BCL implementations of string searching. By the time you set up the table, you could have done half a dozen searches of short strings with the naive algorithm.
KMP is only a win if you have enormous strings and you are doing lots of searches that allow you to re-use an already-built table. You need to amortize away the huge cost of building the table by doing lots of searches using that table.
And also, the naive algorithm only has bad performance in bizarre and unlikely scenarios. Most people are searching for words like "London" in strings like "Buckingham Palace, London, England", and not searching for strings like "BANANANANANANA" in strings like "BANAN BANBAN BANBANANA BANAN BANAN BANANAN BANANANANANANANANAN...". The naive search algorithm is optimal for the first problem and highly sub-optimal for the latter problem; but it makes sense to optimize for the former, not the latter.
Another way to put it: if the searched-for string is of length w and the searched-in string is of length n, then KMP is O(n) + O(w). The Naive algorithm is worst case O(nw), best case O(n + w). But that says nothing about the "constant factor"! The constant factor of the KMP algorithm is much larger than the constant factor of the naive algorithm. The value of n has to be awfully big, and the number of sub-optimal partial matches has to be awfully large, for the KMP algorithm to win over the blazingly fast naive algorithm.
That deals with the algorithmic complexity issues. Your methodology is also not very good, and that might explain your results. Remember, the first time you run code, the jitter has to jit the IL into assembly code. That can take longer than running the method in some cases. You really should be running the code a few hundred thousand times in a loop, discarding the first result, and taking an average of the timings of the rest.
If you really want to know what is going on you should be using a profiler to determine what the hot spot is. Again, make sure you are measuring the post-jit run, not the run where the code is jitted, if you want to have results that are not skewed by the jit time.
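As a rough sketch of that kind of measurement loop (reusing NaiveStringMatcher, str and pattern from the question, and assuming System.Diagnostics and System.Linq are imported; a real profiler or benchmarking library is still the better tool):
// Sketch: run the search many times, discard the first (JIT-affected) run,
// and average the rest.
var shifts = new List<int>();
var timings = new List<double>();
for (int run = 0; run < 1000; run++)
{
    shifts.Clear();
    var sw = Stopwatch.StartNew();
    NaiveStringMatcher(shifts, str, pattern);
    sw.Stop();
    if (run > 0)
        timings.Add(sw.Elapsed.TotalMilliseconds);
}
Console.WriteLine("Average: {0:F4} ms", timings.Average());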
Your example is too small and it does not have enough repetitions of the pattern where KMP avoids backtracking.
KMP can be slower than the normal search in some cases.
A simple KMPSubstringSearch implementation:
https://github.com/bharathkumarms/AlgorithmsMadeEasy/blob/master/AlgorithmsMadeEasy/KMPSubstringSearch.cs
using System;
using System.Collections.Generic;
using System.Linq;
namespace AlgorithmsMadeEasy
{
class KMPSubstringSearch
{
public void KMPSubstringSearchMethod()
{
string text = System.Console.ReadLine();
char[] sText = text.ToCharArray();
string pattern = System.Console.ReadLine();
char[] sPattern = pattern.ToCharArray();
int forwardPointer = 1;
int backwardPointer = 0;
int[] tempStorage = new int[sPattern.Length];
tempStorage[0] = 0;
while (forwardPointer < sPattern.Length)
{
if (sPattern[forwardPointer].Equals(sPattern[backwardPointer]))
{
tempStorage[forwardPointer] = backwardPointer + 1;
forwardPointer++;
backwardPointer++;
}
else
{
if (backwardPointer == 0)
{
tempStorage[forwardPointer] = 0;
forwardPointer++;
}
else
{
int temp = tempStorage[backwardPointer];
backwardPointer = temp;
}
}
}
int pointer = 0;
int successPoints = sPattern.Length;
bool success = false;
for (int i = 0; i < sText.Length; i++)
{
if (sText[i].Equals(sPattern[pointer]))
{
pointer++;
}
else
{
if (pointer != 0)
{
int tempPointer = pointer - 1;
pointer = tempStorage[tempPointer];
i--;
}
}
if (successPoints == pointer)
{
success = true;
}
}
if (success)
{
System.Console.WriteLine("TRUE");
}
else
{
System.Console.WriteLine("FALSE");
}
System.Console.Read();
}
}
}
/*
* Sample Input
abxabcabcaby
abcaby
*/
