Optimizing the search for prime numbers - c#

I created a function for finding Prime Numbers, but the process takes a long time and uses a lot of memory. I need to optimize my code by making it more time and memory efficient.
The function is split into two parts:
The 1st part calculates odd numbers, the 2nd part is the isSimple method which searches for odd numbers that are prime.
I made some progress by moving the Math.Sqrt(N) outside of the for loop, but I'm not sure what to do next.
Any suggestions are welcome.
Program:
class Program
{
static void Main(string[] args)
{
//consider odd numbers
for (int i = 10001; i <=90000; i+=2)
{
if (isSimple(i))
{
Console.Write(i.ToString() + "\n");
}
}
}
//The method of finding primes
private static bool isSimple(int N)
{
double koren = Math.Sqrt(N);
// to verify a prime number or not enough to check whether it is //divisible number on numbers before its root
for (int i = 2; i <= koren; i++)
{
if (N % i == 0)
return false;
}
return true;
}
}

You are checking all possible divisors with your for (int i = 2; i <= koren; i++) That wastes time. You can halve the time taken by checking only odd divisors. You know that the given number N is odd, so no even number can be a divisor. Try for (int i = 3; i <= koren; i+=2) instead.

Related

Scheduler algorithm that fill remaning hours to work

I need to generate all possible values to a scheduler who works like this:
Some hours of the week can be already chosen.
The week of work is defined by the following pattern "???????" question marks can be replaced.
Given a maximum of hours, I need to replace the question marks with digits so that the sum of the scheduled hours match the hours need to work in a week returning a string array with all possible schedules, ordered lexicographically.
Example:
pattern = "08??840",
required_week_hours= 24
In this example, there are only 4 hours left to work.
calling this:
function List<String> GenerateScheduler(int workHours, int dayHours, string pattern){}
public static void Main(){
GenerateScheduler(24, 4, "08??840");
}
This would return the following list of strings:
0804840
0813840
.......
.......
0840840
I'm not very familiar with algorithms, which one I could use to solve this problem?
This sounds like a problem where you have to generate all permutations of a list of a certain amount of numbers that sum up to a certain number. First, you need to sum up the hours you already know. Then you need to count up the number of ? aka the number of shifts/days you do not know about. Using these parameters, this is what the solution will look like,
public List<string> GenerateScheduler(int workHours, int dayHours, string pattern){
int remainingSum = workHours;
int unknownCount = 0;
// first iterate through the pattern to know how many ? characters there are
// as well as the number of hours remaining
for (int i = 0; i < pattern.Length; i++) {
if (pattern[i] == '?') {
unknownCount++;
}
else {
remainingSum -= pattern[i] - '0';
}
}
List<List<int>> permutations = new List<List<int>>();
// get all the lists of work shifts that sum to the remaining number of hours
// the number of work shifts in each list is the number of ? characters in pattern
GeneratePermutations(permutations, workHours, unknownCount);
// after getting all the permutations, we need to iterate through the pattern
// for each permutation to construct a list of schedules to return
List<string> schedules = new List<string>();
foreach (List<int> permutation in permutation) {
StringBuilder newSchedule = new StringBuilder();
int permCount = 0;
for (int i = 0; i < pattern.Length(); i++) {
if (pattern[i] == '?') {
newSchedule.Append(permutation[permCount]);
permCount++;
}
else {
newSchedule.Append(pattern[i]);
}
}
schedules.Add(newSchedule.ToString());
}
return schedules;
}
public void GeneratePermutations(List<List<int>> permutations, int workHours, int unknownCount) {
for (int i = 0; i <= workHours; i++) {
List<int> permutation = new List<int>();
permutation.Add(i);
GeneratePermuationsHelper(permutations, permutation, workHours - i, unknownCount - 1);
}
}
public void GeneratePermutationsHelper(List<List<int>> permutations, List<int> permutation, int remainingHours, int remainingShifts){
if (remainingShifts == 0 && remainingHours == 0) {
permutations.Add(permutation);
return;
}
if (remainingHours <= 0 || remainingShifts <= 0) {
return;
}
for (int i = 0; i <= remainingHours; i++) {
List<int> newPermutation = new List<int>(permutation);
newPermutation.Add(i);
GeneratePermutationsHelper(permutations, newPermutation, remainingHours - i, remainingShifts - 1);
}
}
This can be a lot to digest so I will briefly go over how the permutation recursive helper function works. The parameters go as follows:
a list containing all the permutations
the current permutation being examined
the remaining number of hours needed to reach the total work hour count
the number of remaining shifts (basically number of '?' - permutation.Count)
First, we check to see if the current permutation meets the criteria that the total of its work hours equals the amount of hours remaining needed to complete the pattern and the number of shifts in the permutation equals the number of question marks in the pattern. If it does, then we add this permutation to the list of permutations. If it doesn't, we check to see if the total amount of work hours surpasses the amount of hours remaining or if the number of shifts has reached the number of question marks in the pattern. If so, then the permutation is not added. However, if we can still add more shifts, we will run a loop from i = 0 to remainingHours and make a copy of the permutation while adding i to this copied list in each iteration of the loop. Then, we will adjust the remaining hours and remaining shifts accordingly before calling the helper function recursively with the copied permutation.
Lastly, we can use these permutations to create a list of new schedules, replacing the ? characters in the pattern with the numbers from each permutation.
As per OP, you already know the remaining hours, which I assume is given by the parameter dayHours. So, if you were to break down the problem further, you would need to replace '?' characters with numbers so that, sum of new character(number) is equal to remaining hours(dayHours).
You can do the following.
public IEnumerable<string> GenerateScheduler(int totalHours,int remainingHours,string replacementString)
{
var numberOfPlaces = replacementString.Count(x => x == '?');
var minValue = remainingHours;
var maxValue = remainingHours * Math.Pow(10,numberOfPlaces-1);
var combinations = Enumerable.Range(remainingHours,(int)maxValue)
.Where(x=> SumOfDigit(x) == remainingHours).Select(x=>x.ToString().PadLeft(numberOfPlaces,'0').ToCharArray());
foreach(var item in combinations)
{
var i = 0;
yield return Regex.Replace(replacementString, "[?]", (m) => {return item[i++].ToString(); });
}
}
double SumOfDigit(int value)
{
int sum = 0;
while (value != 0)
{
int remainder;
value = Math.DivRem(value, 10, out remainder);
sum += remainder;
}
return sum;
}

Code sample that shows casting to uint is more efficient than range check

So I am looking at this question and the general consensus is that uint cast version is more efficient than range check with 0. Since the code is also in MS's implementation of List I assume it is a real optimization. However I have failed to produce a code sample that results in better performance for the uint version. I have tried different tests and there is something missing or some other part of my code is dwarfing the time for the checks. My last attempt looks like this:
class TestType
{
public TestType(int size)
{
MaxSize = size;
Random rand = new Random(100);
for (int i = 0; i < MaxIterations; i++)
{
indexes[i] = rand.Next(0, MaxSize);
}
}
public const int MaxIterations = 10000000;
private int MaxSize;
private int[] indexes = new int[MaxIterations];
public void Test()
{
var timer = new Stopwatch();
int inRange = 0;
int outOfRange = 0;
timer.Start();
for (int i = 0; i < MaxIterations; i++)
{
int x = indexes[i];
if (x < 0 || x > MaxSize)
{
throw new Exception();
}
inRange += indexes[x];
}
timer.Stop();
Console.WriteLine("Comparision 1: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
inRange = 0;
outOfRange = 0;
timer.Reset();
timer.Start();
for (int i = 0; i < MaxIterations; i++)
{
int x = indexes[i];
if ((uint)x > (uint)MaxSize)
{
throw new Exception();
}
inRange += indexes[x];
}
timer.Stop();
Console.WriteLine("Comparision 2: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
}
}
class Program
{
static void Main()
{
TestType t = new TestType(TestType.MaxIterations);
t.Test();
TestType t2 = new TestType(TestType.MaxIterations);
t2.Test();
TestType t3 = new TestType(TestType.MaxIterations);
t3.Test();
}
}
The code is a bit of a mess because I tried many things to make uint check perform faster like moving the compared variable into a field of a class, generating random index access and so on but in every case the result seems to be the same for both versions. So is this change applicable on modern x86 processors and can someone demonstrate it somehow?
Note that I am not asking for someone to fix my sample or explain what is wrong with it. I just want to see the case where the optimization does work.
if (x < 0 || x > MaxSize)
The comparison is performed by the CMP processor instruction (Compare). You'll want to take a look at Agner Fog's instruction tables document (PDF), it list the cost of instructions. Find your processor back in the list, then locate the CMP instruction.
For mine, Haswell, CMP takes 1 cycle of latency and 0.25 cycles of throughput.
A fractional cost like that could use an explanation, Haswell has 4 integer execution units that can execute instructions at the same time. When a program contains enough integer operations, like CMP, without an interdependency then they can all execute at the same time. In effect making the program 4 times faster. You don't always manage to keep all 4 of them busy at the same time with your code, it is actually pretty rare. But you do keep 2 of them busy in this case. Or in other words, two comparisons take just as long as single one, 1 cycle.
There are other factors at play that make the execution time identical. One thing helps is that the processor can predict the branch very well, it can speculatively execute x > MaxSize in spite of the short-circuit evaluation. And it will in fact end up using the result since the branch is never taken.
And the true bottleneck in this code is the array indexing, accessing memory is one of the slowest thing the processor can do. So the "fast" version of the code isn't faster even though it provides more opportunity to allow the processor to concurrently execute instructions. It isn't much of an opportunity today anyway, a processor has too many execution units to keep busy. Otherwise the feature that makes HyperThreading work. In both cases the processor bogs down at the same rate.
On my machine, I have to write code that occupies more than 4 engines to make it slower. Silly code like this:
if (x < 0 || x > MaxSize || x > 10000000 || x > 20000000 || x > 3000000) {
outOfRange++;
}
else {
inRange++;
}
Using 5 compares, now I can a difference, 61 vs 47 msec. Or in other words, this is a way to count the number of integer engines in the processor. Hehe :)
So this is a micro-optimization that probably used to pay off a decade ago. It doesn't anymore. Scratch it off your list of things to worry about :)
I would suggest attempting code which does not throw an exception when the index is out of range. Exceptions are incredibly expensive and can completely throw off your bench results.
The code below does a timed-average bench for 1,000 iterations of 1,000,000 results.
using System;
using System.Diagnostics;
namespace BenchTest
{
class Program
{
const int LoopCount = 1000000;
const int AverageCount = 1000;
static void Main(string[] args)
{
Console.WriteLine("Starting Benchmark");
RunTest();
Console.WriteLine("Finished Benchmark");
Console.Write("Press any key to exit...");
Console.ReadKey();
}
static void RunTest()
{
int cursorRow = Console.CursorTop; int cursorCol = Console.CursorLeft;
long totalTime1 = 0; long totalTime2 = 0;
long invalidOperationCount1 = 0; long invalidOperationCount2 = 0;
for (int i = 0; i < AverageCount; i++)
{
Console.SetCursorPosition(cursorCol, cursorRow);
Console.WriteLine("Running iteration: {0}/{1}", i + 1, AverageCount);
int[] indexArgs = RandomFill(LoopCount, int.MinValue, int.MaxValue);
int[] sizeArgs = RandomFill(LoopCount, 0, int.MaxValue);
totalTime1 += RunLoop(TestMethod1, indexArgs, sizeArgs, ref invalidOperationCount1);
totalTime2 += RunLoop(TestMethod2, indexArgs, sizeArgs, ref invalidOperationCount2);
}
PrintResult("Test 1", TimeSpan.FromTicks(totalTime1 / AverageCount), invalidOperationCount1);
PrintResult("Test 2", TimeSpan.FromTicks(totalTime2 / AverageCount), invalidOperationCount2);
}
static void PrintResult(string testName, TimeSpan averageTime, long invalidOperationCount)
{
Console.WriteLine(testName);
Console.WriteLine(" Average Time: {0}", averageTime);
Console.WriteLine(" Invalid Operations: {0} ({1})", invalidOperationCount, (invalidOperationCount / (double)(AverageCount * LoopCount)).ToString("P3"));
}
static long RunLoop(Func<int, int, int> testMethod, int[] indexArgs, int[] sizeArgs, ref long invalidOperationCount)
{
Stopwatch sw = new Stopwatch();
Console.Write("Running {0} sub-iterations", LoopCount);
sw.Start();
long startTickCount = sw.ElapsedTicks;
for (int i = 0; i < LoopCount; i++)
{
invalidOperationCount += testMethod(indexArgs[i], sizeArgs[i]);
}
sw.Stop();
long stopTickCount = sw.ElapsedTicks;
long elapsedTickCount = stopTickCount - startTickCount;
Console.WriteLine(" - Time Taken: {0}", new TimeSpan(elapsedTickCount));
return elapsedTickCount;
}
static int[] RandomFill(int size, int minValue, int maxValue)
{
int[] randomArray = new int[size];
Random rng = new Random();
for (int i = 0; i < size; i++)
{
randomArray[i] = rng.Next(minValue, maxValue);
}
return randomArray;
}
static int TestMethod1(int index, int size)
{
return (index < 0 || index >= size) ? 1 : 0;
}
static int TestMethod2(int index, int size)
{
return ((uint)(index) >= (uint)(size)) ? 1 : 0;
}
}
}
You aren't comparing like with like.
The code you were talking about not only saved one branch by using the optimisation, but also 4 bytes of CIL in a small method.
In a small method 4 bytes can be the difference in being inlined and not being inlined.
And if the method calling that method is also written to be small, then that can mean two (or more) method calls are jitted as one piece of inline code.
And maybe some of it is then, because it is inline and available for analysis by the jitter, optimised further again.
The real difference is not between index < 0 || index >= _size and (uint)index >= (uint)_size, but between code that has repeated efforts to minimise the method body size and code that does not. Look for example at how another method is used to throw the exception if necessary, further shaving off a couple of bytes of CIL.
(And no, that's not to say that I think all methods should be written like that, but there certainly can be performance differences when one does).

Difference in declaring variable as a array index and "0"?

I'm new to c# so be ready for some dumb questions.
My current task is to find a score from an array where the highest/lowest scores have been taken away, and if the highest/lowest occur more than once (ONLY if they occur more than once), one of them can be added:
eg. int[] scores = [4, 8, 6, 4, 8, 5] therefore the final addition will be 4+8+6+5 = 23.
Another condition of the task is that LINQ cannot be used, as well as any of the System.Array methods. (you can see by my previously ask questions that has been a bit of a pain for me, since I solved this with LINQ in less than 5 minutes).
So here is the problem: I have working code the solves the problem but the task requires multiple methods/functions, so I cannot receive full marks if I have only 3 methods (including main). I have been trying to restructure the program but with all sorts of issues. Here is my code (just so I can explain it better):
using System;
using System.Collections.Generic;
//using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Scoring {
class Program {
static int highOccurrence = 0;
static int lowOccurrence = 0;
//static int high; <------
//static int low; <------
static void Main(string[] args) {
int[] scores = { 4, 8, 6, 4, 8, 5 };
findScore(scores);
ExitProgram();
}
static int findOccurrence(int[] scores, int low, int high) { //find the number of times a high/low occurs
for (int i = 0; i < scores.Length; i++) {
if (low == scores[i]) {
lowOccurrence++;
//record number of time slow occurs
}
if (high == scores[i]) {
highOccurrence++;
//record number of times high occurs }
}
return highOccurrence;
}
static int findScore(int[] scores) { //calculates score, needs to be restructured
int[] arrofNormal = new int[scores.Length];
//int low = scores[0]; <----This is where the issue is
//int high = scores[0]; <----- ^^^^^
int total = 0;
for (int i = 0; i < scores.Length; i++) {
if (low > scores[i]) {
low = scores[i];
} //record lowest value
if (high < scores[i]) {
high = scores[i];
//record highest value
}
}
for (int x = 0; x < scores.Length; x++) {
if (scores[x] != low && scores[x] != high) {
arrofNormal[x] = scores[x];
//provides the total of the scores (not including the high and the low)
}
total += arrofNormal[x];
}
findOccurrence(scores, low, high);
if (highOccurrence > 1) { //if there is more than 1 high (or 1 low) it is added once into the total
total += high;
if (lowOccurrence > 1) {
total += low;
}
}
Console.WriteLine("Sum = " + total);
return total; //remove not all code paths return.. error
}
static void ExitProgram() {
Console.Write("\n\nPress any key to exit program: ");
Console.ReadKey();
}//end ExitProgram
}
}
I have placed arrows in the code above to show where my issue is. If I try to declare "high" and "low" as global variables, my final answer is always a few numbers off, buy if I leave the variables declared as "high = scores[0]" etc, I will get the right answer.
What I want ideally is to have separate methods for each step of the calculation, so right now I have method for finding the number of times a specific value shows up in the array. The next I would like to do is finding the highest/lowest value in the array, one method would do the final calculation, and the final one would write the results into the console window. The last two parts (finding the high/low and final calculation) are currently in the find score method.
Any help would be greatly appreciated. Thanks.

How to: Assign a unique number to every entry in a list?

What I want is to make tiles. These tiles (about 30 of them) should have a fixed position in the game, but each time I load the game they should have random numbers that should affect their graphical appearance.
I know how to use the Random method to give a single tile a number to change its appearance, but I'm clueless on how I would use the Random method if I were to make a list storing the position of multiple tiles. How can you assign each entry in a list a unique random number?
I need this for my game where you're in a flat 2D map, generated with random types of rooms (treasure rooms, arena rooms etc.) that you are to explore.
Take a look at the Fisher-Yates shuffle. It's super easy to use and should work well for you, if I read your question right.
Make an array of 30 consecutive numbers, mirroring your array of tiles. Then pick an array-shuffling solution you like from, say, here for instance:
http://forums.asp.net/t/1778021.aspx/1
Then tile[23]'s number will be numberArray[23].
if you have something like this:
public class Tile
{
public int Number {get;set;}
...
}
you can do it like this:
var numbers = Enumerable
.Range(1, tilesList.Count) // generates list of sequential numbers
.OrderBy(x => Guid.NewGuid()) // shuffles the list
.ToList();
for (int i = 0; i < tiles.Count; i++)
{
tile[i].Number = numbers[i];
}
I know, that Guid is not a Random alternative, but it should fit this scenario.
Update: As long as answer was downvoted, I've wrote simple test, to check if Guids are not usable for shuffling an array:
var larger = 0;
var smaller = 0;
var start = DateTime.Now;
var guid = Guid.NewGuid();
for (int i = 0; i < 10000000; i++)
{
var nextGuid = Guid.NewGuid();
if (nextGuid.CompareTo(guid) < 0)
{
larger++;
}
else
{
smaller++;
}
guid = nextGuid;
}
Console.WriteLine("larger: {0}", larger);
Console.WriteLine("smaller: {0}", smaller);
Console.WriteLine("took seconds: {0}", DateTime.Now - start);
Console.ReadKey();
What it does, it counts how many times next guid is smaller than current and how many times is larger. In perfect case, there should be equal number of larger and smaller next guids, which would indicate, that those two events (current guid and next guid) are independent. Also measured time, just to make sure, that it is not too slow.
And got following result (with 10 million guids):
larger: 5000168
smaller: 4999832
took seconds: 00:00:01.1980686
Another test is direct compare of Fisher-Yates and Guid shuffling:
static void Main(string[] args)
{
var numbers = Enumerable.Range(1, 7).ToArray();
var originalNumbers = numbers.OrderBy(x => Guid.NewGuid()).ToList();
var foundAfterListUsingGuid = new List<int>();
var foundAfterListUsingShuffle = new List<int>();
for (int i = 0; i < 100; i++)
{
var foundAfter = 0;
while (!originalNumbers.SequenceEqual(numbers.OrderBy(x => Guid.NewGuid())))
{
foundAfter++;
}
foundAfterListUsingGuid.Add(foundAfter);
foundAfter = 0;
var shuffledNumbers = Enumerable.Range(1, 7).ToArray();
while (!originalNumbers.SequenceEqual(shuffledNumbers))
{
foundAfter++;
Shuffle(shuffledNumbers);
}
foundAfterListUsingShuffle.Add(foundAfter);
}
Console.WriteLine("Average matching order (Guid): {0}", foundAfterListUsingGuid.Average());
Console.WriteLine("Average matching order (Shuffle): {0}", foundAfterListUsingShuffle.Average());
Console.ReadKey();
}
static Random _random = new Random();
public static void Shuffle<T>(T[] array)
{
var random = _random;
for (int i = array.Length; i > 1; i--)
{
// Pick random element to swap.
int j = random.Next(i); // 0 <= j <= i-1
// Swap.
T tmp = array[j];
array[j] = array[i - 1];
array[i - 1] = tmp;
}
}
By "direct compare" I mean, that I'm producing shuffled sequence and try to shuffle again to get same sequence, and assume, that the more tries I need to produce same sequence, the better random is (which is not necessary mathematically correct assumption, I think it is oversimplification).
So results for small set with 1000 iterations to reduce error, was:
Average matching order (Guid): 5015.097
Average matching order (Shuffle): 4969.424
So, Guid performed event better, if my metric is correct :)
with 10000 iterations they came closer:
Average matching order (Guid): 5079.9283
Average matching order (Shuffle): 4940.749
So in my opinion, for current usage (shuffle room number in game), guids are suitable solution.

What's wrong with my implementation of the KMP algorithm?

static void Main(string[] args)
{
string str = "ABC ABCDAB ABCDABCDABDE";//We should add some text here for
//the performance tests.
string pattern = "ABCDABD";
List<int> shifts = new List<int>();
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
NaiveStringMatcher(shifts, str, pattern);
stopWatch.Stop();
Trace.WriteLine(String.Format("Naive string matcher {0}", stopWatch.Elapsed));
foreach (int s in shifts)
{
Trace.WriteLine(s);
}
shifts.Clear();
stopWatch.Restart();
int[] pi = new int[pattern.Length];
Knuth_Morris_Pratt(shifts, str, pattern, pi);
stopWatch.Stop();
Trace.WriteLine(String.Format("Knuth_Morris_Pratt {0}", stopWatch.Elapsed));
foreach (int s in shifts)
{
Trace.WriteLine(s);
}
Console.ReadKey();
}
static IList<int> NaiveStringMatcher(List<int> shifts, string text, string pattern)
{
int lengthText = text.Length;
int lengthPattern = pattern.Length;
for (int s = 0; s < lengthText - lengthPattern + 1; s++ )
{
if (text[s] == pattern[0])
{
int i = 0;
while (i < lengthPattern)
{
if (text[s + i] == pattern[i])
i++;
else break;
}
if (i == lengthPattern)
{
shifts.Add(s);
}
}
}
return shifts;
}
static IList<int> Knuth_Morris_Pratt(List<int> shifts, string text, string pattern, int[] pi)
{
int patternLength = pattern.Length;
int textLength = text.Length;
//ComputePrefixFunction(pattern, pi);
int j;
for (int i = 1; i < pi.Length; i++)
{
j = 0;
while ((i < pi.Length) && (pattern[i] == pattern[j]))
{
j++;
pi[i++] = j;
}
}
int matchedSymNum = 0;
for (int i = 0; i < textLength; i++)
{
while (matchedSymNum > 0 && pattern[matchedSymNum] != text[i])
matchedSymNum = pi[matchedSymNum - 1];
if (pattern[matchedSymNum] == text[i])
matchedSymNum++;
if (matchedSymNum == patternLength)
{
shifts.Add(i - patternLength + 1);
matchedSymNum = pi[matchedSymNum - 1];
}
}
return shifts;
}
Why does my implemention of the KMP algorithm work slower than the Naive String Matching algorithm?
The KMP algorithm has two phases: first it builds a table, and then it does a search, directed by the contents of the table.
The naive algorithm has one phase: it does a search. It does that search much less efficiently in the worst case than the KMP search phase.
If the KMP is slower than the naive algorithm then that is probably because building the table is taking you longer than it takes to simply search the string naively in the first place. Naive string matching is usually very fast on short strings. There is a reason why we don't use fancy-pants algorithms like KMP inside the BCL implementations of string searching. By the time you set up the table, you could have done half a dozen searches of short strings with the naive algorithm.
KMP is only a win if you have enormous strings and you are doing lots of searches that allow you to re-use an already-built table. You need to amortize away the huge cost of building the table by doing lots of searches using that table.
And also, the naive algorithm only has bad performance in bizarre and unlikely scenarios. Most people are searching for words like "London" in strings like "Buckingham Palace, London, England", and not searching for strings like "BANANANANANANA" in strings like "BANAN BANBAN BANBANANA BANAN BANAN BANANAN BANANANANANANANANAN...". The naive search algorithm is optimal for the first problem and highly sub-optimal for the latter problem; but it makes sense to optimize for the former, not the latter.
Another way to put it: if the searched-for string is of length w and the searched-in string is of length n, then KMP is O(n) + O(w). The Naive algorithm is worst case O(nw), best case O(n + w). But that says nothing about the "constant factor"! The constant factor of the KMP algorithm is much larger than the constant factor of the naive algorithm. The value of n has to be awfully big, and the number of sub-optimal partial matches has to be awfully large, for the KMP algorithm to win over the blazingly fast naive algorithm.
That deals with the algorithmic complexity issues. Your methodology is also not very good, and that might explain your results. Remember, the first time you run code, the jitter has to jit the IL into assembly code. That can take longer than running the method in some cases. You really should be running the code a few hundred thousand times in a loop, discarding the first result, and taking an average of the timings of the rest.
If you really want to know what is going on you should be using a profiler to determine what the hot spot is. Again, make sure you are measuring the post-jit run, not the run where the code is jitted, if you want to have results that are not skewed by the jit time.
Your example is too small and it does not have enough repetitions of the pattern where KMP avoids backtracking.
KMP can be slower than the normal search in some cases.
A Simple KMPSubstringSearch Implementation.
https://github.com/bharathkumarms/AlgorithmsMadeEasy/blob/master/AlgorithmsMadeEasy/KMPSubstringSearch.cs
using System;
using System.Collections.Generic;
using System.Linq;
namespace AlgorithmsMadeEasy
{
class KMPSubstringSearch
{
public void KMPSubstringSearchMethod()
{
string text = System.Console.ReadLine();
char[] sText = text.ToCharArray();
string pattern = System.Console.ReadLine();
char[] sPattern = pattern.ToCharArray();
int forwardPointer = 1;
int backwardPointer = 0;
int[] tempStorage = new int[sPattern.Length];
tempStorage[0] = 0;
while (forwardPointer < sPattern.Length)
{
if (sPattern[forwardPointer].Equals(sPattern[backwardPointer]))
{
tempStorage[forwardPointer] = backwardPointer + 1;
forwardPointer++;
backwardPointer++;
}
else
{
if (backwardPointer == 0)
{
tempStorage[forwardPointer] = 0;
forwardPointer++;
}
else
{
int temp = tempStorage[backwardPointer];
backwardPointer = temp;
}
}
}
int pointer = 0;
int successPoints = sPattern.Length;
bool success = false;
for (int i = 0; i < sText.Length; i++)
{
if (sText[i].Equals(sPattern[pointer]))
{
pointer++;
}
else
{
if (pointer != 0)
{
int tempPointer = pointer - 1;
pointer = tempStorage[tempPointer];
i--;
}
}
if (successPoints == pointer)
{
success = true;
}
}
if (success)
{
System.Console.WriteLine("TRUE");
}
else
{
System.Console.WriteLine("FALSE");
}
System.Console.Read();
}
}
}
/*
* Sample Input
abxabcabcaby
abcaby
*/

Categories

Resources