// The word "skill" is a substring of both strings; I want to compare them based on it.
string first = "skill.Name";
string second = "jobskillRelation";
first.Contains(second);
You can use the Longest Common Substring code provided here; the C# version looks like this:
public static string lcs(string a, string b)
{
var lengths = new int[a.Length, b.Length];
int greatestLength = 0;
string output = "";
for (int i = 0; i < a.Length; i++)
{
for (int j = 0; j < b.Length; j++)
{
if (a[i] == b[j])
{
lengths[i, j] = i == 0 || j == 0 ? 1 : lengths[i - 1, j - 1] + 1;
if (lengths[i, j] > greatestLength)
{
greatestLength = lengths[i, j];
output = a.Substring(i - greatestLength + 1, greatestLength);
}
}
else
{
lengths[i, j] = 0;
}
}
}
return output;
}
So the usage will be:
var LCS = lcs(first, second); // "skill" for the two strings above
If you want to compare two strings to see whether both contain a certain keyword, this may help:
Boolean compare(string first, string second, string keyword)
{
if (first.Contains(keyword) && second.Contains(keyword))
return true;
return false;
}
This is what I need to do:
Create a function that receives a text string, and a search string, and returns how many times the search string appears in the string, as a subsequence of its letters in order.
For example, if you receive the word "Hhoola" and the substring "hola", the answer would be 4, because you could take the first H with the first O (and with the L and with the A), the first H with the second O, the second H with the first O, or the second H with the second O. If you receive "hobla", the answer would be 1. If you receive "ohla", the answer would be 0, because after the H there is no O to complete the sequence in order.
This is what I have so far:
int count = 0;
void Function(string text, string subText)
{
for (int i = 0; i < text.Length; i++)
{
if (text[i] == subText[0])
{
for (int j = 0; j < subText.Length; j++)
{
if (text[i + j] != subText[j])
{
break;
}
if (j == subText.Length - 1)
{
count++;
}
}
}
}
}
string text = Console.ReadLine().ToLower();
string subText = Console.ReadLine().ToLower();
Function(text, subText);
The code should look something like this: a recursive search that tries every position where the current letter of the word matches, and counts each complete match.
public class SubSequences
{
string input = "";
string word = "";
int count = 0;
public int FindMatches(string input, string word)
{
this.input = input;
this.word = word;
count = 0;
FindMatchesRecursive(0, 0);
return count;
}
// Tries to match word[wordIndex] against every remaining position of input.
void FindMatchesRecursive(int inputIndex, int wordIndex)
{
if (wordIndex == word.Length)
{
// Every letter of the word has been matched in order.
count++;
return;
}
for (int i = inputIndex; i < input.Length; i++)
{
if (input[i] == word[wordIndex])
{
FindMatchesRecursive(i + 1, wordIndex + 1);
}
}
}
}
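The counting can also be done without recursion. Below is a minimal sketch, assuming the goal is exactly the one stated in the question (count the ways the search string appears in the text as an in-order subsequence). The class and method names are hypothetical, and `long` is used on the assumption that the count can grow quickly for long inputs.

```csharp
using System;

public static class SubsequenceCounter
{
    // ways[j] = number of ways to form the first j letters of subText
    // from the part of text scanned so far.
    public static long CountSubsequences(string text, string subText)
    {
        var ways = new long[subText.Length + 1];
        ways[0] = 1; // the empty prefix can always be formed in exactly one way

        foreach (char c in text)
        {
            // Walk backwards so the current character extends each partial
            // match at most once.
            for (int j = subText.Length; j >= 1; j--)
            {
                if (subText[j - 1] == c)
                {
                    ways[j] += ways[j - 1];
                }
            }
        }
        return ways[subText.Length];
    }
}
```

For the examples in the question (after lowercasing), `CountSubsequences("hhoola", "hola")` gives 4, `"hobla"` gives 1 and `"ohla"` gives 0. The running time is proportional to `text.Length * subText.Length`, whereas the recursive approach visits every match individually.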
I have to find a subtext in a text without using the built-in string functions.
public static void Main(string[] args)
{
string subtext = "polly";
string text = "polly put the katle on,polly put the katle on,polly put the katle on,we all have tea";
int i, j, found;
int strLen, wordLen;
strLen = text.Length;
wordLen = subtext.Length;
for (i = 0; i < strLen - wordLen; i++)
{
found = 1;
for (j = 0; j < wordLen; j++)
{
if (text[i + j] != subtext[j])
{
found = 0;
break;
}
}
if (found == 1)
{
Console.WriteLine(" found at index:", subtext, i);
Console.ReadLine();
}
}
}
I am not sure how far you want to search; your current code seems to find all indexes (or at least that seems to be the intent).
One thing you could change: instead of always starting the inner loop, first check whether the char at position i matches the first char of the subtext, and continue with the next position if it does not.
When you write the data to the console, don't forget to add the placeholders for your arguments, like:
Console.WriteLine("found {0} at index: {1}", subtext, i);
For the rest, I guess your current implementation is okay, but you could add some validation, such as ensuring that both texts are available and, if the subtext is longer than the text, simply returning -1 directly.
For a simple find of the first index, I wrote this one up; it still looks pretty similar to yours:
private static int FindIn( string text, string sub ) {
if (string.IsNullOrWhiteSpace( text ) || string.IsNullOrWhiteSpace( sub ) ) {
return string.IsNullOrWhiteSpace( sub ) ? 0 : -1;
}
if (text.Length < sub.Length) {
return -1;
}
// Use <= so that a match ending at the very end of text is still found.
for (int i = 0; i <= text.Length - sub.Length; i++) {
if (text[i] != sub[0]) {
continue;
}
var matched = true;
for (int j = 1; j < sub.Length && i + j < text.Length; j++) {
if (text[i+j] != sub[j]) {
matched = false;
break;
}
}
if (matched) {
return i;
}
}
return -1;
}
Which you can play around with here
There are a lot of pattern-matching algorithms in this book; I will leave a C# implementation of the Knuth-Morris-Pratt algorithm here.
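static int[] GetPrefix(string s)
{
// result[i] = length of the longest proper prefix of s[0..i] that is also its suffix
int[] result = new int[s.Length];
int index = 0;
for (int i = 1; i < s.Length; i++)
{
// On a mismatch, fall back to the next shorter border instead of stepping back by one.
while (index > 0 && s[index] != s[i]) { index = result[index - 1]; }
if (s[index] == s[i]) { index++; }
result[i] = index;
}
return result;
}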
static int FindSubstring(string pattern, string text)
{
int res = -1;
int[] pf = GetPrefix(pattern);
int index = 0;
for (int i = 0; i < text.Length; i++)
{
while (index > 0 && pattern[index] != text[i]) { index = pf[index - 1]; }
if (pattern[index] == text[i]) index++;
if (index == pattern.Length)
{
return res = i - index + 1;
}
}
return res;
}
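If every occurrence is needed rather than only the first, the same idea extends naturally. The following is a sketch under assumptions, not part of the original answer: the names `KmpAll`, `BuildPrefix` and `FindAll` are hypothetical, overlapping matches are reported, and an empty pattern is treated as having no occurrences.

```csharp
using System;
using System.Collections.Generic;

public static class KmpAll
{
    // prefix[i] = length of the longest proper prefix of s[0..i]
    // that is also a suffix of it.
    static int[] BuildPrefix(string s)
    {
        var prefix = new int[s.Length];
        int k = 0;
        for (int i = 1; i < s.Length; i++)
        {
            while (k > 0 && s[k] != s[i]) { k = prefix[k - 1]; }
            if (s[k] == s[i]) { k++; }
            prefix[i] = k;
        }
        return prefix;
    }

    // Returns the start index of every (possibly overlapping) occurrence.
    public static List<int> FindAll(string pattern, string text)
    {
        var hits = new List<int>();
        if (string.IsNullOrEmpty(pattern)) { return hits; }

        int[] prefix = BuildPrefix(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.Length; i++)
        {
            while (k > 0 && pattern[k] != text[i]) { k = prefix[k - 1]; }
            if (pattern[k] == text[i]) { k++; }
            if (k == pattern.Length)
            {
                hits.Add(i - k + 1);
                k = prefix[k - 1]; // keep scanning, allowing overlaps
            }
        }
        return hits;
    }
}
```

Resetting `k` to `prefix[k - 1]` after a hit, rather than to 0, is what allows overlapping matches such as "aa" in "aaaa" to be found.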
If you are looking for all occurrences of the subtext in the text, you can use the following code:
public static void Main(string[] args)
{
string subtext = "polly";
string text = "polly put the katle on,polly put the katle on,polly put the katle on,we all have tea";
// Try every start position at which the whole subtext still fits.
for (int start = 0; start <= text.Length - subtext.Length; start++)
{
bool found = true;
for (int j = 0; j < subtext.Length; j++)
{
if (subtext[j] != text[start + j])
{
found = false;
break;
}
}
if (found)
{
Console.WriteLine("{0} found at index: {1}", subtext, start);
}
}
Console.ReadLine();
}
If you are looking only for the first occurrence, add a break inside the "if (found)" block.
I'm facing a problem that I don't even know how to search for on Google/Stack Overflow.
So comment if you need further explanation or have questions.
Basically, I want to intersect two lists and return what they have in common, preserving the order of the first string.
Example:
I have two strings that I convert to char arrays.
I want to intersect these two arrays and return the values that are common to both, in the order of the first string (s1).
As you can see, the first string contains E15 (in that specific order), and so does the second one.
So these two strings will return: { 'E', '1', '5' }
string s1 = "E15QD(A)";
string s2 = "NHE15H";
The problem I am facing is that if I replace s2 with:
string s2 = "NQE18H"; // Will return {'Q', 'E', '1' }
my operation will return: {'Q', 'E', '1' }
The result should be: {'E', '1' }, because Q does not follow the 1.
My current attempt is not great, because I don't know which .NET methods to use for this.
Current code:
List<char> cA1 = s1.ToList();
List<char> cA2 = s2.ToList();
var result = cA1.Where(x => cA2.Contains(x)).ToList();
Feel free to help me out; pointers in the right direction are as welcome as a full solution.
This is a "longest common substring" problem.
You can use this extension to get all substrings lazily:
public static class StringExtensions
{
public static IEnumerable<string> GetSubstrings(this string str)
{
if (string.IsNullOrEmpty(str))
throw new ArgumentException("str must not be null or empty", "str");
for (int c = 0; c < str.Length; c++)
{
for (int cc = 1; c + cc <= str.Length; cc++)
{
yield return str.Substring(c, cc);
}
}
}
}
Then it's easy and readable with this LINQ query:
string longestIntersection = "E15QD(A)".GetSubstrings()
.Intersect("NQE18H".GetSubstrings())
.OrderByDescending(s => s.Length)
.FirstOrDefault(); // E1
Enumerable.Intersect is also quite efficient since it uses a set. One note: if one of the strings is longer than the other, it is more efficient (in terms of memory) to use the longer one first:
longString.GetSubstrings().Intersect(shortString.GetSubstrings())
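I think this should do it, provided the common part starts at the beginning of s1:
string similar = string.Empty;
for (int i = 0; i < s1.Length; i++)
{
string s = s1.Substring(0, i + 1);
if (s2.Contains(s))
{
similar = s;
}
}
// similar stays empty when s2 does not even contain the first char of s1
char[] result = similar.ToCharArray();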
@TimSchmelter provided the link to this answer in the comments of the original post.
public int LongestCommonSubstring(string str1, string str2, out string sequence)
{
sequence = string.Empty;
if (String.IsNullOrEmpty(str1) || String.IsNullOrEmpty(str2))
return 0;
int[,] num = new int[str1.Length, str2.Length];
int maxlen = 0;
int lastSubsBegin = 0;
StringBuilder sequenceBuilder = new StringBuilder();
for (int i = 0; i < str1.Length; i++)
{
for (int j = 0; j < str2.Length; j++)
{
if (str1[i] != str2[j])
num[i, j] = 0;
else
{
if ((i == 0) || (j == 0))
num[i, j] = 1;
else
num[i, j] = 1 + num[i - 1, j - 1];
if (num[i, j] > maxlen)
{
maxlen = num[i, j];
int thisSubsBegin = i - num[i, j] + 1;
if (lastSubsBegin == thisSubsBegin)
{//if the current LCS is the same as the last time this block ran
sequenceBuilder.Append(str1[i]);
}
else //this block resets the string builder if a different LCS is found
{
lastSubsBegin = thisSubsBegin;
sequenceBuilder.Length = 0; //clear it
sequenceBuilder.Append(str1.Substring(lastSubsBegin, (i + 1) - lastSubsBegin));
}
}
}
}
}
sequence = sequenceBuilder.ToString();
return maxlen;
}
Hi, this is my code for the longest common subsequence of two strings in C#. I need help with backtracking: I need to find the subsequence GTCGT.
String str1 = "GTCGTTCG";
String str2 = "ACCGGTCGAGTG";
int[,] l = new int[str1.Length, str2.Length]; // String 1 length and string 2 length storing it in a 2-dimensional array
int lcs = -1;
string substr = string.Empty;
int end = -1;
for (int i = 0; i <str1.Length ; i++) // Looping based on string1 length
{
for (int j = 0; j < str2.Length; j++) // Looping based on string2 Length
{
if (str1[i] == str2[j]) // if match found
{
if (i == 0 || j == 0) // i is first element or j is first elemnt then array [i,j] = 1
{
l[i, j] = 1;
}
else
{
l[i, j] = l[i - 1, j - 1] + 1; // fetch the upper value and increment by 1
}
if (l[i, j] > lcs)
{
lcs = l[i, j]; // store lcs value - how many time lcs is found
end = i; // index on longest continuous string
}
}
else // if match not found store zero initialze the array value by zero
{
l[i, j] = 0;
}
}
}
Your function needs to return a collection of strings. There might be several longest common subsequences with the same length.
public List<string> LCS(string firstString, string secondString)
{
// to create the lcs table easier which has first row and column empty.
string firstStringTemp = " " + firstString;
string secondStringTemp = " " + secondString;
// create the table
List<string>[,] temp = new List<string>[firstStringTemp.Length, secondStringTemp.Length];
// loop over all items in the table.
for (int i = 0; i < firstStringTemp.Length; i++)
{
for (int j = 0; j < secondStringTemp.Length; j++)
{
temp[i, j] = new List<string>();
if (i == 0 || j == 0) continue;
if (firstStringTemp[i] == secondStringTemp[j])
{
var a = firstStringTemp[i].ToString();
if (temp[i - 1, j - 1].Count == 0)
{
temp[i, j].Add(a);
}
else
{
foreach (string s in temp[i - 1, j - 1])
{
temp[i, j].Add(s + a);
}
}
}
else
{
List<string> b = temp[i - 1, j].Concat(temp[i, j - 1]).Distinct().ToList();
if (b.Count == 0) continue;
int max = b.Max(p => p.Length);
b = b.Where(p => p.Length == max).ToList();
temp[i, j] = b;
}
}
}
return temp[firstStringTemp.Length - 1, secondStringTemp.Length - 1];
}
You need to keep a collection in each entry of the table, so that you can still keep different strings of the same length in each cell.
As far as I understand your question, you want to know the subsequence itself, i.e. the actual string. I learned to get it a little differently: first, I compute the table as in the standard Longest Common Subsequence (LCS) problem; then I traverse the table to recover the subsequence. Sorry, I'm not familiar with C#, so I will give you C++ code. Please have a look and let me know if you run into any problems.
#include<iostream>
#include<vector>
#include<string>
#include<algorithm> // std::reverse, std::max
using namespace std;
string printLongestCommonSubsequence(vector<vector<int> >& dp, int m, int n, string text1, string text2){
int i = m, j = n;
string lcs = "";
while(i > 0 && j > 0){
if(text1[i-1] == text2[j-1]){
lcs.push_back(text1[i-1]);
i--; j--;
}
else{
if(dp[i][j-1] > dp[i-1][j]) j--;
else i--;
}
}
reverse(lcs.begin(), lcs.end());
return lcs;
}
string longestCommonSubsequence(string text1, string text2){
int m = text1.size();
int n = text2.size();
vector<vector<int> > dp(m+1, vector<int>(n+1));
//initialization
for(int i=0; i<m+1; i++){
for(int j=0; j<n+1; j++){
if(i == 0 || j == 0) dp[i][j] = 0;
}
}
//solving the subproblems to solve the bigger problems
for(int i=1; i<m+1; i++){
for(int j=1; j<n+1; j++){
if(text1[i-1] == text2[j-1])
dp[i][j] = 1 + dp[i-1][j-1];
else
dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
}
}
return printLongestCommonSubsequence(dp, m, n, text1, text2);
}
int main(){
string text1, text2;
cout<<"Enter the first string: ";
cin>>text1;
cout<<"\nEnter the second string: ";
cin>>text2;
string lcs = longestCommonSubsequence(text1, text2);
cout<<"Longest Common Subsequence is: "<<lcs<<endl;
return(0);
}
Please have a look at the diagram.
With respect to printing the LCS, the basic idea is:
When the current characters of both strings are equal, move diagonally.
When they are not equal, move in whichever of the two directions has the larger value.
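Since the question was asked in C#, here is a hedged C# sketch of the same table-then-backtrack idea. It is a straightforward port of the C++ above under assumptions: the class and method names are hypothetical, and when several subsequences share the maximum length it returns just one of them.

```csharp
using System;
using System.Text;

public static class LcsBacktrack
{
    public static string LongestCommonSubsequence(string a, string b)
    {
        int m = a.Length, n = b.Length;
        // dp[i, j] = LCS length of the first i chars of a and the first j chars of b.
        var dp = new int[m + 1, n + 1];
        for (int i = 1; i <= m; i++)
        {
            for (int j = 1; j <= n; j++)
            {
                dp[i, j] = a[i - 1] == b[j - 1]
                    ? dp[i - 1, j - 1] + 1
                    : Math.Max(dp[i - 1, j], dp[i, j - 1]);
            }
        }

        // Walk back from the bottom-right corner, collecting matched characters.
        var sb = new StringBuilder();
        int x = m, y = n;
        while (x > 0 && y > 0)
        {
            if (a[x - 1] == b[y - 1])
            {
                sb.Insert(0, a[x - 1]); // equal characters: take it, move diagonally
                x--;
                y--;
            }
            else if (dp[x, y - 1] > dp[x - 1, y])
            {
                y--; // the larger value is to the left
            }
            else
            {
                x--; // the larger (or equal) value is above
            }
        }
        return sb.ToString();
    }
}
```

Inserting at the front of the `StringBuilder` replaces the final `reverse` call of the C++ version. Which of several equally long subsequences comes back depends on how ties are broken in the last two branches.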
I hope this helps 🙂
Happy Learning
Thanks
Hello, I am trying to write a C# version of the KMP search from the Algorithms in C book.
I'm having trouble finding the flaw in my algorithm. Would someone help?
static int KMP(string p, string str) {
int m = p.Length;
int n = str.Length;
int i;
int j;
int[] next = new int[m];
next[0] = -1;
for (i = 0, j = -1; i < m; i++, j++, next[i] = j) {
//Getting index out of bounds
while (j > 0 && p[i] != p[j]) j = next[j];
}
for (i = 0, j = 0; i < n && j < m; i++, j++) {
while (j >= 0 && p[j] != str[i]) j = next[j];
if (j == m) return i - m;
}
return -1;
}
The simple answer is that in the first loop i++ executes before next[i] = j, so on the last character of the search string it tries to set next[m] to j, which causes an index-out-of-bounds exception. Try changing the order:
for (i = 0, j = -1; i < m; next[i] = j, i++, j++)
More fundamentally, try breaking the implementation into testable parts. For example, you can extract a testable method for the first loop as it is building the computed table for the search word. Start with:
public int[] BuildTable(string word)
{
// todo
}
and some NUnit tests based on the wiki description
[Test]
public void Should_get_computed_table_0_0_0_0_1_2_given_ABCDABD()
{
const string input = "ABCDABD";
var result = BuildTable(input);
result.Length.ShouldBeEqualTo(input.Length);
result[0].ShouldBeEqualTo(-1);
result[1].ShouldBeEqualTo(0);
result[2].ShouldBeEqualTo(0);
result[3].ShouldBeEqualTo(0);
result[4].ShouldBeEqualTo(0);
result[5].ShouldBeEqualTo(1);
result[6].ShouldBeEqualTo(2);
}
[Test]
public void Should_get_computed_table_0_1_2_3_4_5_given_AAAAAAA()
{
const string input = "AAAAAAA";
var result = BuildTable(input);
result.Length.ShouldBeEqualTo(input.Length);
result[0].ShouldBeEqualTo(-1);
result[1].ShouldBeEqualTo(0);
result[2].ShouldBeEqualTo(1);
result[3].ShouldBeEqualTo(2);
result[4].ShouldBeEqualTo(3);
result[5].ShouldBeEqualTo(4);
result[6].ShouldBeEqualTo(5);
}
Next write one or more tests for the KMP method.
[Test]
public void Should_get_15_given_text_ABC_ABCDAB_ABCDABCDABDE_and_word_ABCDABD()
{
const string text = "ABC ABCDAB ABCDABCDABDE";
const string word = "ABCDABD";
int location = KMP(word, text);
location.ShouldBeEqualTo(15);
}
Then implement using the structure used on the wiki description of the algorithm and it should come together for you.
public int KMP(string word, string textToBeSearched)
{
var table = BuildTable(word);
// rest of algorithm
}
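For reference, one way the two pieces might come together is sketched below. It assumes the table convention used by the tests above (first entry -1, second entry 0), follows the classic table-based formulation of the algorithm, and wraps both methods in a hypothetical static class `KmpSearch`. Treat it as a starting point to run the tests against, not as the book's implementation.

```csharp
using System;

public static class KmpSearch
{
    // Builds the partial-match table; table[0] is -1 by convention.
    public static int[] BuildTable(string word)
    {
        var table = new int[word.Length];
        if (word.Length == 0) { return table; }
        table[0] = -1;

        int pos = 2; // next table entry to fill (table[1] stays 0)
        int cnd = 0; // length of the current candidate prefix
        while (pos < word.Length)
        {
            if (word[pos - 1] == word[cnd])
            {
                cnd++;
                table[pos] = cnd;
                pos++;
            }
            else if (cnd > 0)
            {
                cnd = table[cnd]; // fall back to a shorter candidate
            }
            else
            {
                table[pos] = 0;
                pos++;
            }
        }
        return table;
    }

    // Returns the index of the first occurrence of word in text, or -1.
    public static int KMP(string word, string textToBeSearched)
    {
        if (word.Length == 0) { return 0; }
        int[] table = BuildTable(word);

        int m = 0; // start of the current candidate match in the text
        int i = 0; // position of the current character in word
        while (m + i < textToBeSearched.Length)
        {
            if (word[i] == textToBeSearched[m + i])
            {
                if (i == word.Length - 1) { return m; }
                i++;
            }
            else if (table[i] > -1)
            {
                m = m + i - table[i]; // skip ahead, reusing what already matched
                i = table[i];
            }
            else
            {
                m++;
                i = 0;
            }
        }
        return -1;
    }
}
```

Keeping `BuildTable` public is a deliberate choice here so that the table tests can call it directly.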