How do I find intersections of consecutive characters in strings?

How do I find intersections of consecutive characters in strings? - c#

Suppose I had the following strings:
blahFOOblahblah
blahblahBARblah
FIZZblahblahblah
Now, I want to interrogate each of these to discover which of them contain any substring of the following:
FIZZbuzz
Obviously, this string shares the word "FIZZ" in common with #3.
I have already taken a look at this post, which doesn't quite accomplish what I am after since it simply focuses on characters (in any order) rather than substrings.

Are you looking for something like longest common substring?
There are fast, but quite complicated algorithms which solve the task by building and using suffix trees. They have O(n) time for constant size alphabet, and O(n log(n)) time in the worst case, where n is the maximal length of strings.
Below is a possible C# implementation (from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring). It is not optimal, but could be sufficient in our case.
public int LongestCommonSubstring(string str1, string str2, out string sequence)
{
sequence = string.Empty;
if (String.IsNullOrEmpty(str1) || String.IsNullOrEmpty(str2))
return 0;
int[,] num = new int[str1.Length, str2.Length];
int maxlen = 0;
int lastSubsBegin = 0;
StringBuilder sequenceBuilder = new StringBuilder();
for (int i = 0; i < str1.Length; i++)
{
for (int j = 0; j < str2.Length; j++)
{
if (str1[i] != str2[j])
num[i, j] = 0;
else
{
if ((i == 0) || (j == 0))
num[i, j] = 1;
else
num[i, j] = 1 + num[i - 1, j - 1];
if (num[i, j] > maxlen)
{
maxlen = num[i, j];
int thisSubsBegin = i - num[i, j] + 1;
if (lastSubsBegin == thisSubsBegin)
{//if the current LCS is the same as the last time this block ran
sequenceBuilder.Append(str1[i]);
}
else //this block resets the string builder if a different LCS is found
{
lastSubsBegin = thisSubsBegin;
sequenceBuilder.Length = 0; //clear it
sequenceBuilder.Append(str1.Substring(lastSubsBegin, (i + 1) - lastSubsBegin));
}
}
}
}
}
sequence = sequenceBuilder.ToString();
return maxlen;
}

Related

How I can check if substring contain another substring c#

//the word skill it's a substring for two string i want to compare based it
string first = "skill.Name";
string second = "jobskillRelation";
first.Contains(second);

You can use Longest Common Substring code provided here, the C# version is like this:
public static string lcs(string a, string b)
{
var lengths = new int[a.Length, b.Length];
int greatestLength = 0;
string output = "";
for (int i = 0; i < a.Length; i++)
{
for (int j = 0; j < b.Length; j++)
{
if (a[i] == b[j])
{
lengths[i, j] = i == 0 || j == 0 ? 1 : lengths[i - 1, j - 1] + 1;
if (lengths[i, j] > greatestLength)
{
greatestLength = lengths[i, j];
output = a.Substring(i - greatestLength + 1, greatestLength);
}
}
else
{
lengths[i, j] = 0;
}
}
}
return output;
}
so the usage will be:
var LCS = lcs(first,second)

If you want to compare two string to see if both contain a certain keyword, this may help.
Boolean compare(string first, string second, string keyword)
{
if (first.Contains(keyword) && second.Contains(keyword))
return true;
return false;
}

Intersect two lists and return the similarity with the preserved order of the original first string value

I'm facing a problem I don't even know what to search in Google/Stack Overflow.
So comment if you feel the need for further explanation, questions.
Basically I want to intersect two lists and return the similarity with the preserved order of the original first string value.
Example:
I have two strings, that I convert to a CharArray.
I want to Intersect these two arrays and return the values that are similar, including/with the order of the first string (s1).
As you can see the first string contains E15 (in that specific order), and so does the seconds one.
So these two strings will return : { 'E', '1', '5' }
string s1 = "E15QD(A)";
string s2 = "NHE15H";
The problem I am facing is that if i replace "s2" with:
string s2 = "NQE18H" // Will return {'Q', 'E', '1' }
My operation will return : {'Q', 'E', '1' }
The result should be : {'E', '1' } because Q don't follow the letter 1
Currently my operation is not the greatest effort, because i don't know which methods to use in .NET to be able to do this.
Current code:
List<char> cA1 = s1.ToList();
List<char> cA2 = s2.ToList();
var result = cA1.Where(x => cA2.Contains(x)).ToList();
Feel free to help me out, pointers in the right direction is acceptable as well as a full solution.

This is a "longest common substring" problem.
You can use this extension to get all substrings lazily:
public static class StringExtensions
{
public static IEnumerable<string> GetSubstrings(this string str)
{
if (string.IsNullOrEmpty(str))
throw new ArgumentException("str must not be null or empty", "str");
for (int c = 0; c < str.Length - 1; c++)
{
for (int cc = 1; c + cc <= str.Length; cc++)
{
yield return str.Substring(c, cc);
}
}
}
}
Then it's easy and readable with this LINQ query:
string longestIntersection = "E15QD(A)".GetSubstrings()
.Intersect("NQE18H".GetSubstrings())
.OrderByDescending(s => s.Length)
.FirstOrDefault(); // E1
Enumerable.Intersect is also quite efficient since it's using a set. One note: if one both strings is larger than the other then it's more efficient(in terms of memory) to use it first:
longString.GetSubstrings().Intersect(shortString.GetSubstrings())

I think this should do it:
string similar = null;
for (int i = 0; i < s1.Length; i++)
{
string s = s1.Substring(0, i + 1);
if (s2.Contains(s))
{
similar = s;
}
}
char[] result = similar.ToCharArray();

#TimSchmelter provided the link to this answer in the comments of the original post.
public int LongestCommonSubstring(string str1, string str2, out string sequence)
{
sequence = string.Empty;
if (String.IsNullOrEmpty(str1) || String.IsNullOrEmpty(str2))
return 0;
int[,] num = new int[str1.Length, str2.Length];
int maxlen = 0;
int lastSubsBegin = 0;
StringBuilder sequenceBuilder = new StringBuilder();
for (int i = 0; i < str1.Length; i++)
{
for (int j = 0; j < str2.Length; j++)
{
if (str1[i] != str2[j])
num[i, j] = 0;
else
{
if ((i == 0) || (j == 0))
num[i, j] = 1;
else
num[i, j] = 1 + num[i - 1, j - 1];
if (num[i, j] > maxlen)
{
maxlen = num[i, j];
int thisSubsBegin = i - num[i, j] + 1;
if (lastSubsBegin == thisSubsBegin)
{//if the current LCS is the same as the last time this block ran
sequenceBuilder.Append(str1[i]);
}
else //this block resets the string builder if a different LCS is found
{
lastSubsBegin = thisSubsBegin;
sequenceBuilder.Length = 0; //clear it
sequenceBuilder.Append(str1.Substring(lastSubsBegin, (i + 1) - lastSubsBegin));
}
}
}
}
}
sequence = sequenceBuilder.ToString();
return maxlen;
}

indexOutofRange BubbleSort when using inputbox

Its been bugging me for hours because it is always returning 0 at numbers[i] and I cant figure out the problem. code worked for a different program but I had to change it so it could have a custom array size and that's when everything went wrong.
any help would be great.
Thanks in advance.
int[] numbers = new int[Convert.ToInt16(TxtArray.Text)];
int j = 0;
for (j = numbers.Length; j >= 0; j--)
{
int i = 0;
for (i = 0; i <= j - 1; i++)
{
string NumbersInput = Microsoft.VisualBasic.Interaction.InputBox("Enter Numbers to be sorted",
"Numbers Input", "", -1, -1);
numbers[i] = Convert.ToInt16(NumbersInput);
//returns 0 in if statement
if (numbers[i] < numbers[i + 1])
{
int intTemp = 0;
intTemp = numbers[i];
numbers[i] = numbers[i + 1];
numbers[i + 1] = intTemp;
}
}
}
for (int i = 0; i < numbers.Length; i++)
{
LstNumbers.Items.Add(numbers[i]);
}

private void button1_Click(object sender, EventArgs e)
{
int sizeOfArrayInt = Convert.ToInt32(arraySize.Text);
int[] array = new int[sizeOfArrayInt];
string numbers = arrayValues.Text;
string[] numbersSplit = numbers.Split(',');
int count = 0;
foreach (string character in numbersSplit)
{
int value;
bool parse = Int32.TryParse(character, out value);
if (value != null)
{
array[count] = value;
}
count++;
}
array = this.SortArray(array);
foreach (int item in array)
{
this.listBox.Items.Add(item);
}
}
private int[] SortArray(int[] arrayToSort)
{
//int[] sortedArray = new int[arrayToSort.Length];
int count = arrayToSort.Length;
for (int j = count; j >= 0; j--)
{
int i = 0;
for (i = 0; i <= j - 2; i++)
{
if (arrayToSort[i] < arrayToSort[i + 1])
{
int intTemp = 0;
intTemp = arrayToSort[i];
arrayToSort[i] = arrayToSort[i + 1];
arrayToSort[i + 1] = intTemp;
}
}
}
return arrayToSort;
}
strong text
This I got to work as a Windows Form and the output displays in the list box as each array item or individual i iteration over the array. Of course there is no error checking. Hope that helps.

Setting aside the strangeness of how you are working with text boxes, your problem with throwing an exception would happen even without them because it lies here, in your inner loop:
for (i = 0; i <= j - 1; i++)
Suppose that numbers.Length == 2. This means that j == 2. So on the first time through the outer loop, you hit the inner loop with these conditions. The first time through, i == 0. You get to the if statement:
if (numbers[i] < numbers[i + 1])
numbers[0] exists, and numbers[1] exists, so this iteration goes through fine and i is incremented.
Now i == 1. Now the loop checks its boundary condition. i <= j - 1 == true, so the loop continues. Now when you hit that if statement, it tries to access numbers[i + 1], i.e., numbers[2], which does not exist, throwing an IndexOutOfRangeException.
Edit: Came back and realized that I left out the solution (to the exception, anyway). For the bubble sort to work, your inner loop's boundary condition should be i <= j - 2, because j's initial value is == numbers.Length, which is not zero-based, while array indexes are.
Second Edit: Note that just using a List won't actually solve this problem. You have to use the right boundary condition. Trying to access list[list.Count()] will throw an ArgumentOutOfRangeException. Just because a List will dynamically resize doesn't mean that it will somehow let you access items that don't exist. You should always take time to check your boundary conditions, no matter what data structure you use.

Latest Codility Time Problems

Just got done with the latest Codility, passed it, but didnt get 100% on it
Here is the spec
A prefix of a string S is any leading contiguous part of S. For example, "c" and "cod" are prefixes of the string "codility". For simplicity, we require prefixes to be non-empty.
The product of prefix P of string S is the number of occurrences of P multiplied by the length of P. More precisely, if prefix P consists of K characters and P occurs exactly T times in S, then the product equals K * T.
For example, S = "abababa" has the following prefixes:
"a", whose product equals 1 * 4 = 4,
"ab", whose product equals 2 * 3 = 6,
"aba", whose product equals 3 * 3 = 9,
"abab", whose product equals 4 * 2 = 8,
"ababa", whose product equals 5 * 2 = 10,
"ababab", whose product equals 6 * 1 = 6,
"abababa", whose product equals 7 * 1 = 7.
The longest prefix is identical to the original string. The goal is to choose such a prefix as maximizes the value of the product. In above example the maximal product is 10.
In this problem we consider only strings that consist of lower-case English letters (a−z).
So basically, it is a string traverse problem. I was able to pass all the validation parts, but I lost on the time. Here is what I wrote
int Solution(string S)
{
int finalCount = 0;
for (int i = 0; i <= S.Length - 1; i++)
{
string prefix = S.Substring(0, i + 1);
int count = 0;
for (int j = 0; j <= S.Length - 1; j++)
{
if (prefix.Length + j <= S.Length)
{
string newStr = S.Substring(j, prefix.Length);
if (newStr == prefix)
{
count++;
}
}
if (j == S.Length - 1)
{
int product = count * prefix.Length;
if (product > finalCount)
{
finalCount = product;
}
}
}
}
return finalCount;
}
I know that the nested loop is killing me, but I cannot think of a way to traverse the "sections" of the string without adding the other loop.
Any help would be appreciated.

Naive brute-force solution takes O(N ** 3) time. Choose length from 1 to N, get a prefix of its length and count occurences by brute-force searching. Сhoosing length takes O(N) time and brute force takes O(N ** 2) time, totally O(N ** 3).
If you use KMP or Z-algo, you can find occurences in O(N) time, so the whole solution will be O(N ** 2) time.
And you can precalc occurences, so it will take O(N) + O(N) = O(N) time solution.
vector<int> z_function(string &S); //z-function, z[0] = S.length()
vector<int> z = z_function(S);
//cnt[i] - count of i-length prefix occurrences of S
for (int i = 0; i < n; ++i)
++cnt[z[i]];
//if cnt[i] is in S, cnt[i - 1] will be in S
int previous = 0;
for (int i = n; i > 0; --i) {
cnt[i] += previous;
previous = cnt[i];
}
Here's the blog post, explaining all O(N ** 3), O(N ** 2), O(N) solutions.

my effort was as follows trying to eliminate unnecessary string compares, i read isaacs blog but it is in c how would this translate to c#, i even went as far as using arrays everywhere to avoid the string immutability factor but there was no improvement
int Rtn = 0;
int len = S.Length;
if (len > 1000000000)
return 1000000000;
for (int i = 1; i <= len; i++)
{
string tofind = S.Substring(0, i);
int tofindlen = tofind.Length;
int occurencies = 0;
for (int ii = 0; ii < len; ii++)
{
int found = FastIndexOf(S, tofind.ToCharArray(), ii);
if (found != -1)
{
if ((found == 0 && tofindlen != 1) || (found >= 0))
{
ii = found;
}
++occurencies ;
}
}
if (occurencies > 0)
{
int total = occurencies * tofindlen;
if (total > Rtn)
{
Rtn = total;
}
}
}
return Rtn;
}
static int FastIndexOf(string source, char[] pattern, int start)
{
if (pattern.Length == 0) return 0;
if (pattern.Length == 1) return source.IndexOf(pattern[0], start);
bool found;
int limit = source.Length - pattern.Length + 1 - start;
if (limit < 1) return -1;
char c0 = pattern[0];
char c1 = pattern[1];
// Find the first occurrence of the first character
int first = source.IndexOf(c0, start, limit);
while ((first != -1) && (first + pattern.Length <= source.Length))
{
if (source[first + 1] != c1)
{
first = source.IndexOf(c0, ++first);
continue;
}
found = true;
for (int j = 2; j < pattern.Length; j++)
if (source[first + j] != pattern[j])
{
found = false;
break;
}
if (found) return first;
first = source.IndexOf(c0, ++first);
}
return -1;
}

I only got a 43... I like my code though! Same script in javascript got 56 for what it's worth.
using System;
class Solution
{
public int solution(string S)
{
int highestCount = 0;
for (var i = S.Length; i > 0; i--)
{
int occurs = 0;
string prefix = S.Substring(0, i);
int tempIndex = S.IndexOf(prefix) + 1;
string tempString = S;
while (tempIndex > 0)
{
tempString = tempString.Substring(tempIndex);
tempIndex = tempString.IndexOf(prefix);
tempIndex++;
occurs++;
}
int product = occurs * prefix.Length;
if ((product) > highestCount)
{
if (highestCount > 1000000000)
return 1000000000;
highestCount = product;
}
}
return highestCount;
}
}

This works better
private int MaxiumValueOfProduct(string input)
{
var positions = new List<int>();
int max = 0;
for (int i = 1; i <= input.Length; i++)
{
var subString = input.Substring(0, i);
int position = 0;
while ((position < input.Length) &&
(position = input.IndexOf(subString, position, StringComparison.OrdinalIgnoreCase)) != -1)
{
positions.Add(position);
++position;
}
var currentMax = subString.Length * positions.Count;
if (currentMax > max) max = currentMax;
positions.Clear();
}
return max;
}

Longest Common Subsequence

Hi this is my code for longest common subsequence for 2 strings in c# . I need help in backtracking . I need to find out the subsequence : GTCGT
String str1 = "GTCGTTCG";
String str2 = "ACCGGTCGAGTG";
int[,] l = new int[str1.Length, str2.Length]; // String 1 length and string 2 length storing it in a 2-dimensional array
int lcs = -1;
string substr = string.Empty;
int end = -1;
for (int i = 0; i <str1.Length ; i++) // Looping based on string1 length
{
for (int j = 0; j < str2.Length; j++) // Looping based on string2 Length
{
if (str1[i] == str2[j]) // if match found
{
if (i == 0 || j == 0) // i is first element or j is first elemnt then array [i,j] = 1
{
l[i, j] = 1;
}
else
{
l[i, j] = l[i - 1, j - 1] + 1; // fetch the upper value and increment by 1
}
if (l[i, j] > lcs)
{
lcs = l[i, j]; // store lcs value - how many time lcs is found
end = i; // index on longest continuous string
}
}
else // if match not found store zero initialze the array value by zero
{
l[i, j] = 0;
}
}

Your function needs to return a collection of strings. There might be several longest common sub-sequence with same length.
public List<string> LCS(string firstString, string secondString)
{
// to create the lcs table easier which has first row and column empty.
string firstStringTemp = " " + firstString;
string secondStringTemp = " " + secondString;
// create the table
List<string>[,] temp = new List<string>[firstStringTemp.Length, secondStringTemp.Length];
// loop over all items in the table.
for (int i = 0; i < firstStringTemp.Length; i++)
{
for (int j = 0; j < secondStringTemp.Length; j++)
{
temp[i, j] = new List<string>();
if (i == 0 || j == 0) continue;
if (firstStringTemp[i] == secondStringTemp[j])
{
var a = firstStringTemp[i].ToString();
if (temp[i - 1, j - 1].Count == 0)
{
temp[i, j].Add(a);
}
else
{
foreach (string s in temp[i - 1, j - 1])
{
temp[i, j].Add(s + a);
}
}
}
else
{
List<string> b = temp[i - 1, j].Concat(temp[i, j - 1]).Distinct().ToList();
if (b.Count == 0) continue;
int max = b.Max(p => p.Length);
b = b.Where(p => p.Length == max).ToList();
temp[i, j] = b;
}
}
}
return temp[firstStringTemp.Length - 1, secondStringTemp.Length - 1];
}
You need to have a collection set in each entry of table. So you can still keep different strings with the same length in each cell of table.

As far as I've understood your question, I think you want to know the subsequence value i.e. that string. So, to get the subsequence, I've learnt a little bit differently. First, I calculate the table the one we do in standard Longest Common Subsequence (LCS) problem. Then I traverse the table to get the subsequence value. Sorry, I'm not familiar with C#, so, I will give you CPP code. Please have a look and let me know if you face any problem.
#include<iostream>
#include<vector>
#include<string>
using namespace std;
string printLongestCommonSubsequence(vector<vector<int> >& dp, int m, int n, string text1, string text2){
int i = m, j = n;
string lcs = "";
while(i > 0 && j > 0){
if(text1[i-1] == text2[j-1]){
lcs.push_back(text1[i-1]);
i--; j--;
}
else{
if(dp[i][j-1] > dp[i-1][j]) j--;
else i--;
}
}
reverse(lcs.begin(), lcs.end());
return lcs;
}
string longestCommonSubsequence(string text1, string text2){
int m = text1.size();
int n = text2.size();
vector<vector<int> > dp(m+1, vector<int>(n+1));
//initialization
for(int i=0; i<m+1; i++){
for(int j=0; j<n+1; j++){
if(i == 0 || j == 0) dp[i][j] = 0;
}
}
//solving the subproblems to solve the bigger problems
for(int i=1; i<m+1; i++){
for(int j=1; j<n+1; j++){
if(text1[i-1] == text2[j-1])
dp[i][j] = 1 + dp[i-1][j-1];
else
dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
}
}
return printLongestCommonSubsequence(dp, m, n, text1, text2);
}
int main(){
string text1, text2;
cout<<"Enter the first string: ";
cin>>text1;
cout<<"\nEnter the second string: ";
cin>>text2;
string lcs = longestCommonSubsequence(text1, text2);
cout<<"Longest Common Subsequence is: "<<lcs<<endl;
return(0);
}
Please have a look at the diagram.
With respect to printing the LCS,
The basic idea is:
When the characters are equal of both the strings then move towards diagonal.
When the characters are not equal of both the strings then move towards the maximum of both the directions.
I hope this helps 🙂
Happy Learning
Thanks

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How do I find intersections of consecutive characters in strings? - c#

Related

How I can check if substring contain another substring c#

Intersect two lists and return the similarity with the preserved order of the original first string value

indexOutofRange BubbleSort when using inputbox

Latest Codility Time Problems

Longest Common Subsequence

Categories

Resources