Longest Common Subsequence

Longest Common Subsequence - c#

Hi this is my code for longest common subsequence for 2 strings in c# . I need help in backtracking . I need to find out the subsequence : GTCGT
String str1 = "GTCGTTCG";
String str2 = "ACCGGTCGAGTG";
int[,] l = new int[str1.Length, str2.Length]; // String 1 length and string 2 length storing it in a 2-dimensional array
int lcs = -1;
string substr = string.Empty;
int end = -1;
for (int i = 0; i <str1.Length ; i++) // Looping based on string1 length
{
for (int j = 0; j < str2.Length; j++) // Looping based on string2 Length
{
if (str1[i] == str2[j]) // if match found
{
if (i == 0 || j == 0) // i is first element or j is first elemnt then array [i,j] = 1
{
l[i, j] = 1;
}
else
{
l[i, j] = l[i - 1, j - 1] + 1; // fetch the upper value and increment by 1
}
if (l[i, j] > lcs)
{
lcs = l[i, j]; // store lcs value - how many time lcs is found
end = i; // index on longest continuous string
}
}
else // if match not found store zero initialze the array value by zero
{
l[i, j] = 0;
}
}

Your function needs to return a collection of strings. There might be several longest common sub-sequence with same length.
public List<string> LCS(string firstString, string secondString)
{
// to create the lcs table easier which has first row and column empty.
string firstStringTemp = " " + firstString;
string secondStringTemp = " " + secondString;
// create the table
List<string>[,] temp = new List<string>[firstStringTemp.Length, secondStringTemp.Length];
// loop over all items in the table.
for (int i = 0; i < firstStringTemp.Length; i++)
{
for (int j = 0; j < secondStringTemp.Length; j++)
{
temp[i, j] = new List<string>();
if (i == 0 || j == 0) continue;
if (firstStringTemp[i] == secondStringTemp[j])
{
var a = firstStringTemp[i].ToString();
if (temp[i - 1, j - 1].Count == 0)
{
temp[i, j].Add(a);
}
else
{
foreach (string s in temp[i - 1, j - 1])
{
temp[i, j].Add(s + a);
}
}
}
else
{
List<string> b = temp[i - 1, j].Concat(temp[i, j - 1]).Distinct().ToList();
if (b.Count == 0) continue;
int max = b.Max(p => p.Length);
b = b.Where(p => p.Length == max).ToList();
temp[i, j] = b;
}
}
}
return temp[firstStringTemp.Length - 1, secondStringTemp.Length - 1];
}
You need to have a collection set in each entry of table. So you can still keep different strings with the same length in each cell of table.

As far as I've understood your question, I think you want to know the subsequence value i.e. that string. So, to get the subsequence, I've learnt a little bit differently. First, I calculate the table the one we do in standard Longest Common Subsequence (LCS) problem. Then I traverse the table to get the subsequence value. Sorry, I'm not familiar with C#, so, I will give you CPP code. Please have a look and let me know if you face any problem.
#include<iostream>
#include<vector>
#include<string>
using namespace std;
string printLongestCommonSubsequence(vector<vector<int> >& dp, int m, int n, string text1, string text2){
int i = m, j = n;
string lcs = "";
while(i > 0 && j > 0){
if(text1[i-1] == text2[j-1]){
lcs.push_back(text1[i-1]);
i--; j--;
}
else{
if(dp[i][j-1] > dp[i-1][j]) j--;
else i--;
}
}
reverse(lcs.begin(), lcs.end());
return lcs;
}
string longestCommonSubsequence(string text1, string text2){
int m = text1.size();
int n = text2.size();
vector<vector<int> > dp(m+1, vector<int>(n+1));
//initialization
for(int i=0; i<m+1; i++){
for(int j=0; j<n+1; j++){
if(i == 0 || j == 0) dp[i][j] = 0;
}
}
//solving the subproblems to solve the bigger problems
for(int i=1; i<m+1; i++){
for(int j=1; j<n+1; j++){
if(text1[i-1] == text2[j-1])
dp[i][j] = 1 + dp[i-1][j-1];
else
dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
}
}
return printLongestCommonSubsequence(dp, m, n, text1, text2);
}
int main(){
string text1, text2;
cout<<"Enter the first string: ";
cin>>text1;
cout<<"\nEnter the second string: ";
cin>>text2;
string lcs = longestCommonSubsequence(text1, text2);
cout<<"Longest Common Subsequence is: "<<lcs<<endl;
return(0);
}
Please have a look at the diagram.
With respect to printing the LCS,
The basic idea is:
When the characters are equal of both the strings then move towards diagonal.
When the characters are not equal of both the strings then move towards the maximum of both the directions.
I hope this helps 🙂
Happy Learning
Thanks

Related

Compare chars in two strings of different length in C#

In C#, Im trying to compare two strings and find out how many chars are different.
I Tried this:
static void Main(String[] args)
{
var strOne = "abcd";
var strTwo = "bcd";
var arrayOne = strOne.ToCharArray();
var arrayTwo = strTwo.ToCharArray();
var differentChars = arrayOne.Except(arrayTwo);
foreach (var character in differentChars)
Console.WriteLine(character); //Will print a
}
but there are problems with this:
if the strings contain same chars but on different positions within the string
it removes the duplicate occurences
If the strings would be the same length I would compare the chars one by one but if one is bigger than the other the positions are different

You may want to have a look at the Leventshtein Distance Algorithm.
As the article says
the Levenshtein distance between two words is the minimum number of
single-character edits (insertions, deletions or substitutions)
required to change one word into the other
You can find many reference implementations in the internet, like here or here.
public static int LevenshteinDistance(string s, string t)
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
return d[n, m];
}
}

How about this? Append two strings into one and group them, you will have the count of the chars, > 1 will give the repeatings and = 0 will give the unique ones.
var strOne = "abcd";
var strTwo = "bcd";
var arrayOne = strOne.Concat(strTwo).GroupBy(x => x).Select(x => new { Key = x.Key, Count = x.Count() });
foreach (var character in arrayOne) {
if (character.Count > 1)
{
Console.WriteLine(character.Key); // the repeating chars
}
}
If the same character appears twice in the same string,
var strOne = "abbcdd";
var strTwo = "cd";
var arrayTwo = strOne.Select(x => new { Key = x, IsExists = strTwo.Any(y => y == x) });
foreach (var character in arrayTwo) {
if (character.IsExists)
{
Console.WriteLine(character.Key);
}
}

How I can check if substring contain another substring c#

//the word skill it's a substring for two string i want to compare based it
string first = "skill.Name";
string second = "jobskillRelation";
first.Contains(second);

You can use Longest Common Substring code provided here, the C# version is like this:
public static string lcs(string a, string b)
{
var lengths = new int[a.Length, b.Length];
int greatestLength = 0;
string output = "";
for (int i = 0; i < a.Length; i++)
{
for (int j = 0; j < b.Length; j++)
{
if (a[i] == b[j])
{
lengths[i, j] = i == 0 || j == 0 ? 1 : lengths[i - 1, j - 1] + 1;
if (lengths[i, j] > greatestLength)
{
greatestLength = lengths[i, j];
output = a.Substring(i - greatestLength + 1, greatestLength);
}
}
else
{
lengths[i, j] = 0;
}
}
}
return output;
}
so the usage will be:
var LCS = lcs(first,second)

If you want to compare two string to see if both contain a certain keyword, this may help.
Boolean compare(string first, string second, string keyword)
{
if (first.Contains(keyword) && second.Contains(keyword))
return true;
return false;
}

Intersect two lists and return the similarity with the preserved order of the original first string value

I'm facing a problem I don't even know what to search in Google/Stack Overflow.
So comment if you feel the need for further explanation, questions.
Basically I want to intersect two lists and return the similarity with the preserved order of the original first string value.
Example:
I have two strings, that I convert to a CharArray.
I want to Intersect these two arrays and return the values that are similar, including/with the order of the first string (s1).
As you can see the first string contains E15 (in that specific order), and so does the seconds one.
So these two strings will return : { 'E', '1', '5' }
string s1 = "E15QD(A)";
string s2 = "NHE15H";
The problem I am facing is that if i replace "s2" with:
string s2 = "NQE18H" // Will return {'Q', 'E', '1' }
My operation will return : {'Q', 'E', '1' }
The result should be : {'E', '1' } because Q don't follow the letter 1
Currently my operation is not the greatest effort, because i don't know which methods to use in .NET to be able to do this.
Current code:
List<char> cA1 = s1.ToList();
List<char> cA2 = s2.ToList();
var result = cA1.Where(x => cA2.Contains(x)).ToList();
Feel free to help me out, pointers in the right direction is acceptable as well as a full solution.

This is a "longest common substring" problem.
You can use this extension to get all substrings lazily:
public static class StringExtensions
{
public static IEnumerable<string> GetSubstrings(this string str)
{
if (string.IsNullOrEmpty(str))
throw new ArgumentException("str must not be null or empty", "str");
for (int c = 0; c < str.Length - 1; c++)
{
for (int cc = 1; c + cc <= str.Length; cc++)
{
yield return str.Substring(c, cc);
}
}
}
}
Then it's easy and readable with this LINQ query:
string longestIntersection = "E15QD(A)".GetSubstrings()
.Intersect("NQE18H".GetSubstrings())
.OrderByDescending(s => s.Length)
.FirstOrDefault(); // E1
Enumerable.Intersect is also quite efficient since it's using a set. One note: if one both strings is larger than the other then it's more efficient(in terms of memory) to use it first:
longString.GetSubstrings().Intersect(shortString.GetSubstrings())

I think this should do it:
string similar = null;
for (int i = 0; i < s1.Length; i++)
{
string s = s1.Substring(0, i + 1);
if (s2.Contains(s))
{
similar = s;
}
}
char[] result = similar.ToCharArray();

#TimSchmelter provided the link to this answer in the comments of the original post.
public int LongestCommonSubstring(string str1, string str2, out string sequence)
{
sequence = string.Empty;
if (String.IsNullOrEmpty(str1) || String.IsNullOrEmpty(str2))
return 0;
int[,] num = new int[str1.Length, str2.Length];
int maxlen = 0;
int lastSubsBegin = 0;
StringBuilder sequenceBuilder = new StringBuilder();
for (int i = 0; i < str1.Length; i++)
{
for (int j = 0; j < str2.Length; j++)
{
if (str1[i] != str2[j])
num[i, j] = 0;
else
{
if ((i == 0) || (j == 0))
num[i, j] = 1;
else
num[i, j] = 1 + num[i - 1, j - 1];
if (num[i, j] > maxlen)
{
maxlen = num[i, j];
int thisSubsBegin = i - num[i, j] + 1;
if (lastSubsBegin == thisSubsBegin)
{//if the current LCS is the same as the last time this block ran
sequenceBuilder.Append(str1[i]);
}
else //this block resets the string builder if a different LCS is found
{
lastSubsBegin = thisSubsBegin;
sequenceBuilder.Length = 0; //clear it
sequenceBuilder.Append(str1.Substring(lastSubsBegin, (i + 1) - lastSubsBegin));
}
}
}
}
}
sequence = sequenceBuilder.ToString();
return maxlen;
}

How do I find intersections of consecutive characters in strings?

Suppose I had the following strings:
blahFOOblahblah
blahblahBARblah
FIZZblahblahblah
Now, I want to interrogate each of these to discover which of them contain any substring of the following:
FIZZbuzz
Obviously, this string shares the word "FIZZ" in common with #3.
I have already taken a look at this post, which doesn't quite accomplish what I am after since it simply focuses on characters (in any order) rather than substrings.

Are you looking for something like longest common substring?
There are fast, but quite complicated algorithms which solve the task by building and using suffix trees. They have O(n) time for constant size alphabet, and O(n log(n)) time in the worst case, where n is the maximal length of strings.
Below is a possible C# implementation (from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring). It is not optimal, but could be sufficient in our case.
public int LongestCommonSubstring(string str1, string str2, out string sequence)
{
sequence = string.Empty;
if (String.IsNullOrEmpty(str1) || String.IsNullOrEmpty(str2))
return 0;
int[,] num = new int[str1.Length, str2.Length];
int maxlen = 0;
int lastSubsBegin = 0;
StringBuilder sequenceBuilder = new StringBuilder();
for (int i = 0; i < str1.Length; i++)
{
for (int j = 0; j < str2.Length; j++)
{
if (str1[i] != str2[j])
num[i, j] = 0;
else
{
if ((i == 0) || (j == 0))
num[i, j] = 1;
else
num[i, j] = 1 + num[i - 1, j - 1];
if (num[i, j] > maxlen)
{
maxlen = num[i, j];
int thisSubsBegin = i - num[i, j] + 1;
if (lastSubsBegin == thisSubsBegin)
{//if the current LCS is the same as the last time this block ran
sequenceBuilder.Append(str1[i]);
}
else //this block resets the string builder if a different LCS is found
{
lastSubsBegin = thisSubsBegin;
sequenceBuilder.Length = 0; //clear it
sequenceBuilder.Append(str1.Substring(lastSubsBegin, (i + 1) - lastSubsBegin));
}
}
}
}
}
sequence = sequenceBuilder.ToString();
return maxlen;
}

Latest Codility Time Problems

Just got done with the latest Codility, passed it, but didnt get 100% on it
Here is the spec
A prefix of a string S is any leading contiguous part of S. For example, "c" and "cod" are prefixes of the string "codility". For simplicity, we require prefixes to be non-empty.
The product of prefix P of string S is the number of occurrences of P multiplied by the length of P. More precisely, if prefix P consists of K characters and P occurs exactly T times in S, then the product equals K * T.
For example, S = "abababa" has the following prefixes:
"a", whose product equals 1 * 4 = 4,
"ab", whose product equals 2 * 3 = 6,
"aba", whose product equals 3 * 3 = 9,
"abab", whose product equals 4 * 2 = 8,
"ababa", whose product equals 5 * 2 = 10,
"ababab", whose product equals 6 * 1 = 6,
"abababa", whose product equals 7 * 1 = 7.
The longest prefix is identical to the original string. The goal is to choose such a prefix as maximizes the value of the product. In above example the maximal product is 10.
In this problem we consider only strings that consist of lower-case English letters (a−z).
So basically, it is a string traverse problem. I was able to pass all the validation parts, but I lost on the time. Here is what I wrote
int Solution(string S)
{
int finalCount = 0;
for (int i = 0; i <= S.Length - 1; i++)
{
string prefix = S.Substring(0, i + 1);
int count = 0;
for (int j = 0; j <= S.Length - 1; j++)
{
if (prefix.Length + j <= S.Length)
{
string newStr = S.Substring(j, prefix.Length);
if (newStr == prefix)
{
count++;
}
}
if (j == S.Length - 1)
{
int product = count * prefix.Length;
if (product > finalCount)
{
finalCount = product;
}
}
}
}
return finalCount;
}
I know that the nested loop is killing me, but I cannot think of a way to traverse the "sections" of the string without adding the other loop.
Any help would be appreciated.

Naive brute-force solution takes O(N ** 3) time. Choose length from 1 to N, get a prefix of its length and count occurences by brute-force searching. Сhoosing length takes O(N) time and brute force takes O(N ** 2) time, totally O(N ** 3).
If you use KMP or Z-algo, you can find occurences in O(N) time, so the whole solution will be O(N ** 2) time.
And you can precalc occurences, so it will take O(N) + O(N) = O(N) time solution.
vector<int> z_function(string &S); //z-function, z[0] = S.length()
vector<int> z = z_function(S);
//cnt[i] - count of i-length prefix occurrences of S
for (int i = 0; i < n; ++i)
++cnt[z[i]];
//if cnt[i] is in S, cnt[i - 1] will be in S
int previous = 0;
for (int i = n; i > 0; --i) {
cnt[i] += previous;
previous = cnt[i];
}
Here's the blog post, explaining all O(N ** 3), O(N ** 2), O(N) solutions.

my effort was as follows trying to eliminate unnecessary string compares, i read isaacs blog but it is in c how would this translate to c#, i even went as far as using arrays everywhere to avoid the string immutability factor but there was no improvement
int Rtn = 0;
int len = S.Length;
if (len > 1000000000)
return 1000000000;
for (int i = 1; i <= len; i++)
{
string tofind = S.Substring(0, i);
int tofindlen = tofind.Length;
int occurencies = 0;
for (int ii = 0; ii < len; ii++)
{
int found = FastIndexOf(S, tofind.ToCharArray(), ii);
if (found != -1)
{
if ((found == 0 && tofindlen != 1) || (found >= 0))
{
ii = found;
}
++occurencies ;
}
}
if (occurencies > 0)
{
int total = occurencies * tofindlen;
if (total > Rtn)
{
Rtn = total;
}
}
}
return Rtn;
}
static int FastIndexOf(string source, char[] pattern, int start)
{
if (pattern.Length == 0) return 0;
if (pattern.Length == 1) return source.IndexOf(pattern[0], start);
bool found;
int limit = source.Length - pattern.Length + 1 - start;
if (limit < 1) return -1;
char c0 = pattern[0];
char c1 = pattern[1];
// Find the first occurrence of the first character
int first = source.IndexOf(c0, start, limit);
while ((first != -1) && (first + pattern.Length <= source.Length))
{
if (source[first + 1] != c1)
{
first = source.IndexOf(c0, ++first);
continue;
}
found = true;
for (int j = 2; j < pattern.Length; j++)
if (source[first + j] != pattern[j])
{
found = false;
break;
}
if (found) return first;
first = source.IndexOf(c0, ++first);
}
return -1;
}

I only got a 43... I like my code though! Same script in javascript got 56 for what it's worth.
using System;
class Solution
{
public int solution(string S)
{
int highestCount = 0;
for (var i = S.Length; i > 0; i--)
{
int occurs = 0;
string prefix = S.Substring(0, i);
int tempIndex = S.IndexOf(prefix) + 1;
string tempString = S;
while (tempIndex > 0)
{
tempString = tempString.Substring(tempIndex);
tempIndex = tempString.IndexOf(prefix);
tempIndex++;
occurs++;
}
int product = occurs * prefix.Length;
if ((product) > highestCount)
{
if (highestCount > 1000000000)
return 1000000000;
highestCount = product;
}
}
return highestCount;
}
}

This works better
private int MaxiumValueOfProduct(string input)
{
var positions = new List<int>();
int max = 0;
for (int i = 1; i <= input.Length; i++)
{
var subString = input.Substring(0, i);
int position = 0;
while ((position < input.Length) &&
(position = input.IndexOf(subString, position, StringComparison.OrdinalIgnoreCase)) != -1)
{
positions.Add(position);
++position;
}
var currentMax = subString.Length * positions.Count;
if (currentMax > max) max = currentMax;
positions.Clear();
}
return max;
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Longest Common Subsequence - c#

Related

Compare chars in two strings of different length in C#

How I can check if substring contain another substring c#

Intersect two lists and return the similarity with the preserved order of the original first string value

How do I find intersections of consecutive characters in strings?

Latest Codility Time Problems

Categories

Resources