Help fix my KMP search algorithm

Help fix my KMP search algorithm - c#

Hello I am trying to write a C# version of KMP search from Algorithms in C book.
Having trouble finding the flaw in my algorithm. Would someone help?
static int KMP(string p, string str) {
int m = p.Length;
int n = str.Length;
int i;
int j;
int[] next = new int[m];
next[0] = -1;
for (i = 0, j = -1; i < m; i++, j++, next[i] = j) {
//Getting index out of bounds
while (j > 0 && p[i] != p[j]) j = next[j];
}
for (i = 0, j = 0; i < n && j < m; i++, j++) {
while (j >= 0 && p[j] != str[i]) j = next[j];
if (j == m) return i - m;
}
return -1;
}

The simple answer is in the first loop i++ is settling before next[i] = j so on the last character of the search string its trying to set next[m+1] to j - which causes an index out of bounds exception. Try changing the order:
for (i = 0, j = -1; i < m; next[i] = j, i++, j++)
More fundamentally, try breaking the implementation into testable parts. For example, you can extract a testable method for the first loop as it is building the computed table for the search word. Start with:
public int[] BuildTable(string word)
{
// todo
}
and some NUnit tests based on the wiki description
[Test]
public void Should_get_computed_table_0_0_0_0_1_2_given_ABCDABD()
{
const string input = "ABCDABD";
var result = BuildTable(input);
result.Length.ShouldBeEqualTo(input.Length);
result[0].ShouldBeEqualTo(-1);
result[1].ShouldBeEqualTo(0);
result[2].ShouldBeEqualTo(0);
result[3].ShouldBeEqualTo(0);
result[4].ShouldBeEqualTo(0);
result[5].ShouldBeEqualTo(1);
result[6].ShouldBeEqualTo(2);
}
[Test]
public void Should_get_computed_table_0_1_2_3_4_5_given_AAAAAAA()
{
const string input = "AAAAAAA";
var result = BuildTable(input);
result.Length.ShouldBeEqualTo(input.Length);
result[0].ShouldBeEqualTo(-1);
result[1].ShouldBeEqualTo(0);
result[2].ShouldBeEqualTo(1);
result[3].ShouldBeEqualTo(2);
result[4].ShouldBeEqualTo(3);
result[5].ShouldBeEqualTo(4);
result[6].ShouldBeEqualTo(5);
}
Next write one or more tests for the KMP method.
[Test]
public void Should_get_15_given_text_ABC_ABCDAB_ABCDABCDABDE_and_word_ABCDABD()
{
const string text = "ABC ABCDAB ABCDABCDABDE";
const string word = "ABCDABD";
int location = KMP(word, text);
location.ShouldBeEqualTo(15);
}
Then implement using the structure used on the wiki description of the algorithm and it should come together for you.
public int KMP(string word, string textToBeSearched)
{
var table = BuildTable(word);
// rest of algorithm
}

Related

I'm generating a list of 100 random "names", now I need to follow it up with 100 more names and 100 random numbers. C#

I'm making a program that generates the "names" (random lines of text from the ASCII) that are the names of movies in this instance. I should follow them up with a "name" of a director for each (can also be generated from the ASCII), and after that the random year that is the year the "movie" was made (from 1896 to 2021).
I have two separate functions that randomize the names of the movies and directors, but I'm confused with the supposed placement of the Console.Writeline which the intelligence only allows inside their own loops. Otherwise it doesn't seem to be able to use the values "directorname" and "moviename".
I need it to write the names in a single line, ai. (KHGTJ, KGHTJF).
Also I need a way to generate a random year from 1896 to 2021 that is printed after the names of the movie, and director, ai. (KFJU, MDDOS, 1922).
private static void GenerateRandomNames()
{
Random random = new Random();
char y = (char)65;
for (int p = 0; p < 100; p++)
{
string directorname = "";
for (int m = 0; m < 5; m++)
{
int b = random.Next(65, 90);
y = (char)b;
directorname += y;
}
Console.WriteLine(directorname);
}
Random rnd = new Random();
char x = (char)65;
for (int j = 0; j < 100; j++)
{
string moviename = "";
for (int i = 0; i < 5; i++)
{
int a = rnd.Next(65, 90);
x = (char)a;
moviename += x;
}
Console.WriteLine(moviename);
}
Console.WriteLine();
I need to fix the plecement of the Console.Writeline() so it can print both names in the same line, and be able to print the year after them.
I've tried placing the Console.Writeline() outside the loops, but of course it can't then use the name. But this way it prints them the wrong way.

If you want to have minimal changes in your code, you can use the following code:
private static void GenerateRandomNames()
{
//a separate thing for the names of the directors (ASCII)
// then for the years they were made (1896-2021)
//they should all be printed in the end ie. (KGMFK, JDBDJ, 1922)
Random rnd = new Random();
char x = (char)65;
for (int j = 0; j < 100; j++)
{
string directors = "";
string moviename = "";
for (int i = 0; i < 5; i++)
{
int a = rnd.Next(65, 90);
x = (char)a;
moviename += x;
}
for (int i = 0; i < 5; i++)
{
int a = rnd.Next(65, 90);
x = (char)a;
directors += x;
}
Console.WriteLine("( "+directors +", "+ moviename + ", " +rnd.Next(1896, 2021).ToString()+" )");
}
Console.WriteLine();
}
and result:

Not sure if it is good to answer this type of question, but answering it anyway.
Since you only want other 5-letter words and 4-digit numbers ranging from 1896 - 2021,
Just get another variable 'b' and do the same as you did for 'a', like :
int b = rnd.Next(65,90) ;
y = char(b) ;
director name += y ;
and to get the year value, you can use this :
year = rnd.Next(1896,2021)
So, by combining all of the above, you have the code like this :
internal class Program
{
private static void GenerateRandomNames()
{
Random rnd = new Random();
char x = (char)65;
char y = (char) 65 ;
for (int j = 0; j < 100; j++)
{
string moviename = "";
string directorName = "";
int year = rnd.Next(1896,2021);
for (int i = 0; i < 5; i++)
{
int a = rnd.Next(65, 90);
int b = rnd.Next(65, 90);
x = (char)a;
moviename += x;
y = (char)a;
directorName += x;
}
Console.WriteLine(moviename);
Console.WriteLine(directorName);
Console.WriteLine(year);
}
Console.WriteLine();
}
static void Main(string[] args)
{
GenerateRandomNames();
}
}

The task becomes easier if you extract the creation of a random name to a new method. This allows you to call it twice easily. I moved the random object to the class (making it a class field), so that it can be reused in different places.
internal class Program
{
private static readonly Random _rnd = new Random();
private static string CreateRandomName(int minLength, int maxLength)
{
string name = "";
for (int i = 0; i < _rnd.Next(minLength, maxLength + 1); i++)
{
char c = (char)_rnd.Next((int)'A', (int)'Z' + 1);
name += c;
}
return name;
}
private static void WriteRandomNames()
{
for (int i = 0; i < 100; i++)
{
string movie = CreateRandomName(4, 40);
string director = CreateRandomName(3, 30);
int year = _rnd.Next(1896, 2022);
Console.WriteLine($"{movie}, {director}, {year}");
}
Console.WriteLine();
}
static void Main(string[] args)
{
WriteRandomNames();
}
}
Note that the second parameter of the Next(Int32, Int32) method is the exclusive upper bound. Therefore I added 1.
output:
HATRHKYAHQTGS, NCPQ, 1999
QVJAYOTTISN, LJTGJDMB, 2018
JEXJDICLRMZFRV, GJPZHFBHOTR, 1932
SKFINIGVYUIIVBD, DIZSKOS, 1958
LWWGSEIZT, AMDW, 1950
OAVZVQVFPPBY, SPEZZE, 2008
YLNTZZIXOCNENGYUL, URNJMK, 1962
ONIN, WUITIL, 1987
RJUXGORWDVQRILDWWKSDWF, MOEYPZQPV, 1946
YUQSSOPZTCTRM, UEPPXIVGERG, 1994
KILWEYC, QJZOTLKFMVPHUE, 1915

Wow, in the time it took me to write an answer, three or more others appeared. They all seem like pretty good answers to me, but since I went to the trouble of writing this code, here you go. :)
I focused on using the same Random in different ways, because I think that's what you were asking about.
using System;
using System.Collections.Generic;
using System.Linq;
Random rnd = new Random(1950);
GenerateRandomNames();
void GenerateRandomNames()
{
for (int j = 0; j < 100; j++)
{
// here's one way to get a random string
string name = Guid.NewGuid().ToString().Substring(0, 5);
string description = new string(GetRandomCharacters(rnd.Next(5,16)).ToArray());
string cleaner = new string(GetCleanerCharacters(rnd.Next(5, 16)).ToArray());
string preferred = new string(GetPreferredRandomCharacters(rnd.Next(5, 16)).ToArray());
int year = rnd.Next(1896, DateTime.Now.Year + 1);
Console.WriteLine($"{year}\t{preferred}");
Console.WriteLine($"{year}\t{cleaner}");
Console.WriteLine($"{year}\t{name}\t{description}");
Console.WriteLine();
}
Console.WriteLine();
}
// Not readable
IEnumerable<char> GetRandomCharacters(int length = 5)
{
for (int i = 0; i < length; i++)
{
yield return Convert.ToChar(rnd.Next(0, 255));
}
}
// gives you lots of spaces
IEnumerable<char> GetCleanerCharacters(int length = 5)
{
for (int i = 0; i < length; i++)
{
char c = Convert.ToChar(rnd.Next(0, 255));
if (char.IsLetter(c))
{
yield return c;
}
else
{
yield return ' ';
}
}
}
// Most readable (in my opinion), but still nonsense.
IEnumerable<char> GetPreferredRandomCharacters(int length = 5)
{
for (int i = 0; i < length; i++)
{
bool randomSpace = rnd.Next(0, 6) == 3;
if (i > 0 && randomSpace) // prevent it from starting with a space
{
yield return ' ';
continue;
}
var c = Convert.ToChar(rnd.Next(65, 91)); // uppercase letters
if (rnd.Next(0, 2) == 1)
{
c = char.ToLower(c);
}
yield return c;
}
}

Count Matching Subsequences

This is what I need to do:
Create a function that receives a text string, and a search string, and returns how many times the search string appears in the string, as a subsequence of its letters in order.
For example, if you receive the word "Hhoola" and the substring "hola", the answer would be 4, because you could take the first H with the first O (and with the L and with the A), the first H with the second O, the second H with the first O, or the second H with the second O. If you receive "hobla", the answer would be 1. If you receive "ohla", the answer would be 0, because after the H there is no O to complete the sequence in order.
This is what i got so far:
int count = 0;
void Function(string text, string subText)
{
for (int i = 0; i < text.Length; i++)
{
if (text[i] == subText[0])
{
for (int j = 0; j < subText.Length; j++)
{
if (text[i + j] != subText[j])
{
break;
}
if (j == subText.Length - 1)
{
count++;
}
}
}
}
}
string text = Console.ReadLine().ToLower();
string subText = Console.ReadLine().ToLower();
ReceibeText(text, subText);

The code should look like this. Code doesn't work but is close.
public class SubSequences
{
string input = "";
string word = "";
int count = 0;
public void FindMatches(string input, string word)
{
this.input = input;
this.word = word;
FindMatchesRecursive(0, 0);
}
public void FindMatchesRecursive(int inputIndex, int wordIndex)
{
for (int i = inputIndex; i < input.Length - word.Length; i++ )
{
for (int j = wordIndex; j < input.Length - word.Length; j++)
{
if (word.Substring(i) == input.Substring(j))
{
if (j == word.Length)
{
FindMatchesRecursive(i + 1, j + 1);
}
else
{
Console.WriteLine("Word Matches");
}
}
}
}
}

How I can check if substring contain another substring c#

//the word skill it's a substring for two string i want to compare based it
string first = "skill.Name";
string second = "jobskillRelation";
first.Contains(second);

You can use Longest Common Substring code provided here, the C# version is like this:
public static string lcs(string a, string b)
{
var lengths = new int[a.Length, b.Length];
int greatestLength = 0;
string output = "";
for (int i = 0; i < a.Length; i++)
{
for (int j = 0; j < b.Length; j++)
{
if (a[i] == b[j])
{
lengths[i, j] = i == 0 || j == 0 ? 1 : lengths[i - 1, j - 1] + 1;
if (lengths[i, j] > greatestLength)
{
greatestLength = lengths[i, j];
output = a.Substring(i - greatestLength + 1, greatestLength);
}
}
else
{
lengths[i, j] = 0;
}
}
}
return output;
}
so the usage will be:
var LCS = lcs(first,second)

If you want to compare two string to see if both contain a certain keyword, this may help.
Boolean compare(string first, string second, string keyword)
{
if (first.Contains(keyword) && second.Contains(keyword))
return true;
return false;
}

Latest Codility Time Problems

Just got done with the latest Codility, passed it, but didnt get 100% on it
Here is the spec
A prefix of a string S is any leading contiguous part of S. For example, "c" and "cod" are prefixes of the string "codility". For simplicity, we require prefixes to be non-empty.
The product of prefix P of string S is the number of occurrences of P multiplied by the length of P. More precisely, if prefix P consists of K characters and P occurs exactly T times in S, then the product equals K * T.
For example, S = "abababa" has the following prefixes:
"a", whose product equals 1 * 4 = 4,
"ab", whose product equals 2 * 3 = 6,
"aba", whose product equals 3 * 3 = 9,
"abab", whose product equals 4 * 2 = 8,
"ababa", whose product equals 5 * 2 = 10,
"ababab", whose product equals 6 * 1 = 6,
"abababa", whose product equals 7 * 1 = 7.
The longest prefix is identical to the original string. The goal is to choose such a prefix as maximizes the value of the product. In above example the maximal product is 10.
In this problem we consider only strings that consist of lower-case English letters (a−z).
So basically, it is a string traverse problem. I was able to pass all the validation parts, but I lost on the time. Here is what I wrote
int Solution(string S)
{
int finalCount = 0;
for (int i = 0; i <= S.Length - 1; i++)
{
string prefix = S.Substring(0, i + 1);
int count = 0;
for (int j = 0; j <= S.Length - 1; j++)
{
if (prefix.Length + j <= S.Length)
{
string newStr = S.Substring(j, prefix.Length);
if (newStr == prefix)
{
count++;
}
}
if (j == S.Length - 1)
{
int product = count * prefix.Length;
if (product > finalCount)
{
finalCount = product;
}
}
}
}
return finalCount;
}
I know that the nested loop is killing me, but I cannot think of a way to traverse the "sections" of the string without adding the other loop.
Any help would be appreciated.

Naive brute-force solution takes O(N ** 3) time. Choose length from 1 to N, get a prefix of its length and count occurences by brute-force searching. Сhoosing length takes O(N) time and brute force takes O(N ** 2) time, totally O(N ** 3).
If you use KMP or Z-algo, you can find occurences in O(N) time, so the whole solution will be O(N ** 2) time.
And you can precalc occurences, so it will take O(N) + O(N) = O(N) time solution.
vector<int> z_function(string &S); //z-function, z[0] = S.length()
vector<int> z = z_function(S);
//cnt[i] - count of i-length prefix occurrences of S
for (int i = 0; i < n; ++i)
++cnt[z[i]];
//if cnt[i] is in S, cnt[i - 1] will be in S
int previous = 0;
for (int i = n; i > 0; --i) {
cnt[i] += previous;
previous = cnt[i];
}
Here's the blog post, explaining all O(N ** 3), O(N ** 2), O(N) solutions.

my effort was as follows trying to eliminate unnecessary string compares, i read isaacs blog but it is in c how would this translate to c#, i even went as far as using arrays everywhere to avoid the string immutability factor but there was no improvement
int Rtn = 0;
int len = S.Length;
if (len > 1000000000)
return 1000000000;
for (int i = 1; i <= len; i++)
{
string tofind = S.Substring(0, i);
int tofindlen = tofind.Length;
int occurencies = 0;
for (int ii = 0; ii < len; ii++)
{
int found = FastIndexOf(S, tofind.ToCharArray(), ii);
if (found != -1)
{
if ((found == 0 && tofindlen != 1) || (found >= 0))
{
ii = found;
}
++occurencies ;
}
}
if (occurencies > 0)
{
int total = occurencies * tofindlen;
if (total > Rtn)
{
Rtn = total;
}
}
}
return Rtn;
}
static int FastIndexOf(string source, char[] pattern, int start)
{
if (pattern.Length == 0) return 0;
if (pattern.Length == 1) return source.IndexOf(pattern[0], start);
bool found;
int limit = source.Length - pattern.Length + 1 - start;
if (limit < 1) return -1;
char c0 = pattern[0];
char c1 = pattern[1];
// Find the first occurrence of the first character
int first = source.IndexOf(c0, start, limit);
while ((first != -1) && (first + pattern.Length <= source.Length))
{
if (source[first + 1] != c1)
{
first = source.IndexOf(c0, ++first);
continue;
}
found = true;
for (int j = 2; j < pattern.Length; j++)
if (source[first + j] != pattern[j])
{
found = false;
break;
}
if (found) return first;
first = source.IndexOf(c0, ++first);
}
return -1;
}

I only got a 43... I like my code though! Same script in javascript got 56 for what it's worth.
using System;
class Solution
{
public int solution(string S)
{
int highestCount = 0;
for (var i = S.Length; i > 0; i--)
{
int occurs = 0;
string prefix = S.Substring(0, i);
int tempIndex = S.IndexOf(prefix) + 1;
string tempString = S;
while (tempIndex > 0)
{
tempString = tempString.Substring(tempIndex);
tempIndex = tempString.IndexOf(prefix);
tempIndex++;
occurs++;
}
int product = occurs * prefix.Length;
if ((product) > highestCount)
{
if (highestCount > 1000000000)
return 1000000000;
highestCount = product;
}
}
return highestCount;
}
}

This works better
private int MaxiumValueOfProduct(string input)
{
var positions = new List<int>();
int max = 0;
for (int i = 1; i <= input.Length; i++)
{
var subString = input.Substring(0, i);
int position = 0;
while ((position < input.Length) &&
(position = input.IndexOf(subString, position, StringComparison.OrdinalIgnoreCase)) != -1)
{
positions.Add(position);
++position;
}
var currentMax = subString.Length * positions.Count;
if (currentMax > max) max = currentMax;
positions.Clear();
}
return max;
}

Longest Common Subsequence

Hi this is my code for longest common subsequence for 2 strings in c# . I need help in backtracking . I need to find out the subsequence : GTCGT
String str1 = "GTCGTTCG";
String str2 = "ACCGGTCGAGTG";
int[,] l = new int[str1.Length, str2.Length]; // String 1 length and string 2 length storing it in a 2-dimensional array
int lcs = -1;
string substr = string.Empty;
int end = -1;
for (int i = 0; i <str1.Length ; i++) // Looping based on string1 length
{
for (int j = 0; j < str2.Length; j++) // Looping based on string2 Length
{
if (str1[i] == str2[j]) // if match found
{
if (i == 0 || j == 0) // i is first element or j is first elemnt then array [i,j] = 1
{
l[i, j] = 1;
}
else
{
l[i, j] = l[i - 1, j - 1] + 1; // fetch the upper value and increment by 1
}
if (l[i, j] > lcs)
{
lcs = l[i, j]; // store lcs value - how many time lcs is found
end = i; // index on longest continuous string
}
}
else // if match not found store zero initialze the array value by zero
{
l[i, j] = 0;
}
}

Your function needs to return a collection of strings. There might be several longest common sub-sequence with same length.
public List<string> LCS(string firstString, string secondString)
{
// to create the lcs table easier which has first row and column empty.
string firstStringTemp = " " + firstString;
string secondStringTemp = " " + secondString;
// create the table
List<string>[,] temp = new List<string>[firstStringTemp.Length, secondStringTemp.Length];
// loop over all items in the table.
for (int i = 0; i < firstStringTemp.Length; i++)
{
for (int j = 0; j < secondStringTemp.Length; j++)
{
temp[i, j] = new List<string>();
if (i == 0 || j == 0) continue;
if (firstStringTemp[i] == secondStringTemp[j])
{
var a = firstStringTemp[i].ToString();
if (temp[i - 1, j - 1].Count == 0)
{
temp[i, j].Add(a);
}
else
{
foreach (string s in temp[i - 1, j - 1])
{
temp[i, j].Add(s + a);
}
}
}
else
{
List<string> b = temp[i - 1, j].Concat(temp[i, j - 1]).Distinct().ToList();
if (b.Count == 0) continue;
int max = b.Max(p => p.Length);
b = b.Where(p => p.Length == max).ToList();
temp[i, j] = b;
}
}
}
return temp[firstStringTemp.Length - 1, secondStringTemp.Length - 1];
}
You need to have a collection set in each entry of table. So you can still keep different strings with the same length in each cell of table.

As far as I've understood your question, I think you want to know the subsequence value i.e. that string. So, to get the subsequence, I've learnt a little bit differently. First, I calculate the table the one we do in standard Longest Common Subsequence (LCS) problem. Then I traverse the table to get the subsequence value. Sorry, I'm not familiar with C#, so, I will give you CPP code. Please have a look and let me know if you face any problem.
#include<iostream>
#include<vector>
#include<string>
using namespace std;
string printLongestCommonSubsequence(vector<vector<int> >& dp, int m, int n, string text1, string text2){
int i = m, j = n;
string lcs = "";
while(i > 0 && j > 0){
if(text1[i-1] == text2[j-1]){
lcs.push_back(text1[i-1]);
i--; j--;
}
else{
if(dp[i][j-1] > dp[i-1][j]) j--;
else i--;
}
}
reverse(lcs.begin(), lcs.end());
return lcs;
}
string longestCommonSubsequence(string text1, string text2){
int m = text1.size();
int n = text2.size();
vector<vector<int> > dp(m+1, vector<int>(n+1));
//initialization
for(int i=0; i<m+1; i++){
for(int j=0; j<n+1; j++){
if(i == 0 || j == 0) dp[i][j] = 0;
}
}
//solving the subproblems to solve the bigger problems
for(int i=1; i<m+1; i++){
for(int j=1; j<n+1; j++){
if(text1[i-1] == text2[j-1])
dp[i][j] = 1 + dp[i-1][j-1];
else
dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
}
}
return printLongestCommonSubsequence(dp, m, n, text1, text2);
}
int main(){
string text1, text2;
cout<<"Enter the first string: ";
cin>>text1;
cout<<"\nEnter the second string: ";
cin>>text2;
string lcs = longestCommonSubsequence(text1, text2);
cout<<"Longest Common Subsequence is: "<<lcs<<endl;
return(0);
}
Please have a look at the diagram.
With respect to printing the LCS,
The basic idea is:
When the characters are equal of both the strings then move towards diagonal.
When the characters are not equal of both the strings then move towards the maximum of both the directions.
I hope this helps 🙂
Happy Learning
Thanks

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Help fix my KMP search algorithm - c#

Related

I'm generating a list of 100 random "names", now I need to follow it up with 100 more names and 100 random numbers. C#

Count Matching Subsequences

How I can check if substring contain another substring c#

Latest Codility Time Problems

Longest Common Subsequence

Categories

Resources