I'm trying to find a long string inside another string. For this I've been using G[i].Contains(P[arr]), but for some reason the code never enters that branch. In my case G[i] is 1000 characters long and P[arr] is 475. When I debug I can see that the strings are not trimmed, and I have verified in Notepad++ that P[arr] is part of G[i], so the condition should definitely be satisfied.
for (int arr = 0; arr < P.Length; arr++)
{
for (int i = a; i < G.Length; i++)
{
if (G[i].Contains(P[arr]))
{
if (!(b == 0))
{
a = i + 1;
continue;
}
primary_1 = (a == 0) ? G[i].IndexOf(P[arr]) : primary;
++count;
a = i + 1;
Console.WriteLine("Counter: " + i);
break;
}
}
}
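One possible cause, which cannot be confirmed from the question alone, is an invisible difference between the two strings: string.Contains(string) compares ordinally, so a stray '\r', a tab, or a non-breaking space is enough to make the match fail even though the text looks identical in an editor. A minimal sketch for locating the first differing character, assuming a hypothetical helper named LongestPartialMatch (it is not part of the question's code):

```csharp
using System;

static class ContainsDiagnostics
{
    // Hypothetical helper: returns the start index in text and the length of the
    // longest run that matches a prefix of pattern, compared char by char
    // (the same ordinal comparison that string.Contains(string) uses).
    public static (int Start, int Length) LongestPartialMatch(string text, string pattern)
    {
        int bestStart = -1, bestLength = 0;
        for (int start = 0; start < text.Length; start++)
        {
            int length = 0;
            while (length < pattern.Length
                   && start + length < text.Length
                   && text[start + length] == pattern[length])
            {
                length++;
            }
            if (length > bestLength)
            {
                bestStart = start;
                bestLength = length;
            }
        }
        return (bestStart, bestLength);
    }
}
```

If the returned Length is smaller than P[arr].Length, then P[arr][Length] is the first character that fails to match; printing it as (int)P[arr][Length] shows its code point even when it is not visible.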
I have to find a subtext in a text without using the built-in string functions.
public static void Main(string[] args)
{
string subtext = "polly";
string text = "polly put the katle on,polly put the katle on,polly put the katle on,we all have tea";
int i, j, found;
int strLen, wordLen;
strLen = text.Length;
wordLen = subtext.Length;
for (i = 0; i < strLen - wordLen; i++)
{
found = 1;
for (j = 0; j < wordLen; j++)
{
if (text[i + j] != subtext[j])
{
found = 0;
break;
}
}
if (found == 1)
{
Console.WriteLine(" found at index:", subtext, i);
Console.ReadLine();
}
}
}
I am not sure how far you want to search; your current code seems to find all indexes (or at least that seems to be the intent).
One thing you could change: instead of always starting the inner loop, first check whether the char at position i matches the first char of the subtext, and if it does not, continue.
When you write the data to the console, don't forget to add the placeholders for your arguments, like:
Console.WriteLine("found {0} at index: {1}", subtext, i);
Apart from that, I guess your current implementation is okay, but you could add some validations, like ensuring that both texts are available, and returning -1 directly if the subtext is longer than the text.
For a simple find of the first index, I wrote this one up; it still looks pretty similar to yours:
private static int FindIn( string text, string sub ) {
if (string.IsNullOrWhiteSpace( text ) || string.IsNullOrWhiteSpace( sub ) ) {
return string.IsNullOrWhiteSpace( sub ) ? 0 : -1;
}
if (text.Length < sub.Length) {
return -1;
}
// Use <= so that a match ending at the last character of text is not skipped.
for (int i = 0; i <= text.Length - sub.Length; i++) {
if (text[i] != sub[0]) {
continue;
}
var matched = true;
for (int j = 1; j < sub.Length && i + j < text.Length; j++) {
if (text[i+j] != sub[j]) {
matched = false;
break;
}
}
if (matched) {
return i;
}
}
return -1;
}
Which you can play around with here
There are a lot of pattern-matching algorithms in this book; I will leave a C# implementation of the Knuth-Morris-Pratt algorithm here.
static int[] GetPrefix(string s)
{
int[] result = new int[s.Length];
result[0] = 0;
int index = 0;
for (int i = 1; i < s.Length; i++)
{
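// Fall back along the prefix table until the current character can extend a border.
while (index > 0 && s[index] != s[i]) { index = result[index - 1]; }
if (s[index] == s[i]) { index++; }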
result[i] = index;
}
return result;
}
static int FindSubstring(string pattern, string text)
{
int res = -1;
int[] pf = GetPrefix(pattern);
int index = 0;
for (int i = 0; i < text.Length; i++)
{
while (index > 0 && pattern[index] != text[i]) { index = pf[index - 1]; }
if (pattern[index] == text[i]) index++;
if (index == pattern.Length)
{
return res = i - index + 1;
}
}
return res;
}
If you are looking for all occurrences of the subtext in the text, you can use the following code:
public static void Main(string[] args)
{
string subtext = "polly";
string text = "polly put the katle on,polly put the katle on,polly put the katle on,we all have tea";
// Try every start position at which subtext still fits inside text.
for (int start = 0; start <= text.Length - subtext.Length; start++)
{
bool found = true;
for (int j = 0; j < subtext.Length; j++)
{
if (subtext[j] != text[start + j])
{
// Mismatch: this start position cannot be an occurrence.
found = false;
break;
}
}
if (found)
{
Console.WriteLine("{0} found at index: {1}", subtext, start);
}
}
Console.ReadLine();
}
If you are looking only for the first occurrence, add a break inside the "if (found)" block.
I'm facing a problem that I don't even know how to search for on Google/Stack Overflow.
So please comment if you need further explanation or have questions.
Basically I want to intersect two lists and return the common part, preserving the order of the first string.
Example:
I have two strings that I convert to a char array.
I want to intersect these two arrays and return the values they share, in the order in which they appear in the first string (s1).
As you can see, the first string contains E15 (in that specific order), and so does the second one.
So these two strings will return: { 'E', '1', '5' }
string s1 = "E15QD(A)";
string s2 = "NHE15H";
The problem I am facing is that if I replace "s2" with:
string s2 = "NQE18H"; // Will return {'Q', 'E', '1' }
my operation will return: {'Q', 'E', '1' }
The result should be {'E', '1' }, because Q doesn't follow the character 1.
My current approach is not great, because I don't know which .NET methods to use for this.
Current code:
List<char> cA1 = s1.ToList();
List<char> cA2 = s2.ToList();
var result = cA1.Where(x => cA2.Contains(x)).ToList();
Feel free to help me out; pointers in the right direction are welcome, as is a full solution.
This is a "longest common substring" problem.
You can use this extension to get all substrings lazily:
public static class StringExtensions
{
public static IEnumerable<string> GetSubstrings(this string str)
{
if (string.IsNullOrEmpty(str))
throw new ArgumentException("str must not be null or empty", "str");
// c must reach the last index, otherwise substrings starting at the final character are skipped.
for (int c = 0; c < str.Length; c++)
{
for (int cc = 1; c + cc <= str.Length; cc++)
{
yield return str.Substring(c, cc);
}
}
}
}
Then it's easy and readable with this LINQ query:
string longestIntersection = "E15QD(A)".GetSubstrings()
.Intersect("NQE18H".GetSubstrings())
.OrderByDescending(s => s.Length)
.FirstOrDefault(); // E1
Enumerable.Intersect is also quite efficient, since it uses a set internally. One note: if one of the strings is longer than the other, it is more efficient (in terms of memory) to put the longer one first:
longString.GetSubstrings().Intersect(shortString.GetSubstrings())
I think this should do it:
string similar = null;
for (int i = 0; i < s1.Length; i++)
{
string s = s1.Substring(0, i + 1);
if (s2.Contains(s))
{
similar = s;
}
}
char[] result = similar.ToCharArray();
@TimSchmelter provided the link to this answer in the comments of the original post.
public int LongestCommonSubstring(string str1, string str2, out string sequence)
{
sequence = string.Empty;
if (String.IsNullOrEmpty(str1) || String.IsNullOrEmpty(str2))
return 0;
int[,] num = new int[str1.Length, str2.Length];
int maxlen = 0;
int lastSubsBegin = 0;
StringBuilder sequenceBuilder = new StringBuilder();
for (int i = 0; i < str1.Length; i++)
{
for (int j = 0; j < str2.Length; j++)
{
if (str1[i] != str2[j])
num[i, j] = 0;
else
{
if ((i == 0) || (j == 0))
num[i, j] = 1;
else
num[i, j] = 1 + num[i - 1, j - 1];
if (num[i, j] > maxlen)
{
maxlen = num[i, j];
int thisSubsBegin = i - num[i, j] + 1;
if (lastSubsBegin == thisSubsBegin)
{//if the current LCS is the same as the last time this block ran
sequenceBuilder.Append(str1[i]);
}
else //this block resets the string builder if a different LCS is found
{
lastSubsBegin = thisSubsBegin;
sequenceBuilder.Length = 0; //clear it
sequenceBuilder.Append(str1.Substring(lastSubsBegin, (i + 1) - lastSubsBegin));
}
}
}
}
}
sequence = sequenceBuilder.ToString();
return maxlen;
}
I've written a recursive method in C# that should indent strings. For example, this string:
for (int i = 0; i < sb.Length; i++)
{
if (sb[i] == '{')
{
startIndex = i;
break;
}
}
should be converted to:
for (int i = 0; i < sb.Length; i++)
{
    if (sb[i] == '{')
    {
        startIndex = i;
        break;
    }
}
My method is (updated):
private static string IndentText(string t,bool first = true)
{
if (first == false)
{
t = t.PadLeft(2);
}
int startIndex = t.IndexOf('{') + 1;
int stopIndex = t.LastIndexOf('}') - 1;
int blockLength = stopIndex - startIndex + 1;
if (blockLength <= 1 )
{
return "";
}
string start = t.Substring(0, startIndex);
string end = t.Substring(stopIndex + 1);
string indentBlock = t.Substring(startIndex, blockLength);
if (!CheckNestedBlocks(indentBlock))
{
return indentBlock;
}
return start + IndentText(indentBlock,false) + end;
}
private static bool CheckNestedBlocks(string t)
{
for (int i = 0; i < t.Length; i++)
{
if (t[i] == '{') // { and } always come in pairs, so I can check of only one of then
{
return true;
}
}
return false;
}
But I'm getting a StackOverflowException in mscorlib.dll.
What is my mistake? Thanks in advance.
By the way, since I think I'm overcomplicating this problem, is there a better (and working) way to indent strings like this?
You should not include the braces in the "block" that is passed in the recursive call:
if (t[i] == '{')
{
startIndex = i + 1; // Start one character beyond {
break;
}
// ...
if (t[i] == '}')
{
stopIndex = i - 1; // Stop one character prior to }
break;
}
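As for the question of whether there is a simpler way: one option is to drop the recursion and track the nesting depth while walking the text line by line. The following is only a sketch under stated assumptions: the class name Indenter and the indentSize parameter are hypothetical, and it assumes that every '{' ends a line and every '}' starts one, as in the example from the question.

```csharp
using System;
using System.Text;

static class Indenter
{
    // Re-indents brace-delimited text by tracking the current nesting depth.
    // Assumption: a '{' is the last character of its line and a '}' is the first.
    public static string IndentText(string text, int indentSize = 4)
    {
        var result = new StringBuilder();
        int depth = 0;
        foreach (string rawLine in text.Split('\n'))
        {
            string line = rawLine.Trim();               // also drops a trailing '\r'
            if (line.Length == 0) continue;             // skip blank lines
            if (line[0] == '}') depth = Math.Max(0, depth - 1);
            result.Append(' ', depth * indentSize).Append(line).Append('\n');
            if (line[line.Length - 1] == '{') depth++;  // following lines are one level deeper
        }
        return result.ToString();
    }
}
```

Because the depth is a plain counter, there is no recursion and therefore no risk of a stack overflow, however deeply the blocks are nested.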
Hi, this is my code for the longest common subsequence of 2 strings in C#. I need help with backtracking: I need to find the subsequence GTCGT.
String str1 = "GTCGTTCG";
String str2 = "ACCGGTCGAGTG";
int[,] l = new int[str1.Length, str2.Length]; // String 1 length and string 2 length storing it in a 2-dimensional array
int lcs = -1;
string substr = string.Empty;
int end = -1;
for (int i = 0; i <str1.Length ; i++) // Looping based on string1 length
{
for (int j = 0; j < str2.Length; j++) // Looping based on string2 Length
{
if (str1[i] == str2[j]) // if match found
{
if (i == 0 || j == 0) // i is first element or j is first elemnt then array [i,j] = 1
{
l[i, j] = 1;
}
else
{
l[i, j] = l[i - 1, j - 1] + 1; // fetch the upper value and increment by 1
}
if (l[i, j] > lcs)
{
lcs = l[i, j]; // store lcs value - how many time lcs is found
end = i; // index on longest continuous string
}
}
else // if match not found store zero initialze the array value by zero
{
l[i, j] = 0;
}
}
}
Your function needs to return a collection of strings, because there might be several longest common subsequences with the same length.
public List<string> LCS(string firstString, string secondString)
{
// to create the lcs table easier which has first row and column empty.
string firstStringTemp = " " + firstString;
string secondStringTemp = " " + secondString;
// create the table
List<string>[,] temp = new List<string>[firstStringTemp.Length, secondStringTemp.Length];
// loop over all items in the table.
for (int i = 0; i < firstStringTemp.Length; i++)
{
for (int j = 0; j < secondStringTemp.Length; j++)
{
temp[i, j] = new List<string>();
if (i == 0 || j == 0) continue;
if (firstStringTemp[i] == secondStringTemp[j])
{
var a = firstStringTemp[i].ToString();
if (temp[i - 1, j - 1].Count == 0)
{
temp[i, j].Add(a);
}
else
{
foreach (string s in temp[i - 1, j - 1])
{
temp[i, j].Add(s + a);
}
}
}
else
{
List<string> b = temp[i - 1, j].Concat(temp[i, j - 1]).Distinct().ToList();
if (b.Count == 0) continue;
int max = b.Max(p => p.Length);
b = b.Where(p => p.Length == max).ToList();
temp[i, j] = b;
}
}
}
return temp[firstStringTemp.Length - 1, secondStringTemp.Length - 1];
}
Each entry of the table needs to hold a collection, so that different strings with the same length can be kept in the same cell.
As far as I understand your question, you want the subsequence itself, i.e. the string. I learnt to do this a little differently. First, I compute the table as in the standard Longest Common Subsequence (LCS) problem. Then I traverse the table to recover the subsequence. Sorry, I'm not familiar with C#, so I will give you C++ code. Please have a look and let me know if you face any problem.
#include<iostream>
#include<vector>
#include<string>
#include<algorithm> // for reverse and max
using namespace std;
string printLongestCommonSubsequence(vector<vector<int> >& dp, int m, int n, string text1, string text2){
int i = m, j = n;
string lcs = "";
while(i > 0 && j > 0){
if(text1[i-1] == text2[j-1]){
lcs.push_back(text1[i-1]);
i--; j--;
}
else{
if(dp[i][j-1] > dp[i-1][j]) j--;
else i--;
}
}
reverse(lcs.begin(), lcs.end());
return lcs;
}
string longestCommonSubsequence(string text1, string text2){
int m = text1.size();
int n = text2.size();
vector<vector<int> > dp(m+1, vector<int>(n+1));
//initialization
for(int i=0; i<m+1; i++){
for(int j=0; j<n+1; j++){
if(i == 0 || j == 0) dp[i][j] = 0;
}
}
//solving the subproblems to solve the bigger problems
for(int i=1; i<m+1; i++){
for(int j=1; j<n+1; j++){
if(text1[i-1] == text2[j-1])
dp[i][j] = 1 + dp[i-1][j-1];
else
dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
}
}
return printLongestCommonSubsequence(dp, m, n, text1, text2);
}
int main(){
string text1, text2;
cout<<"Enter the first string: ";
cin>>text1;
cout<<"\nEnter the second string: ";
cin>>text2;
string lcs = longestCommonSubsequence(text1, text2);
cout<<"Longest Common Subsequence is: "<<lcs<<endl;
return(0);
}
Please have a look at the diagram.
With respect to printing the LCS, the basic idea is:
When the current characters of both strings are equal, move diagonally.
When the characters are not equal, move in the direction of the larger of the two neighbouring values.
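Since the question was asked for C#, the same two steps (fill the table, then walk back from the bottom-right corner) can be sketched in C# as follows. The class and method names here are hypothetical, and like the C++ version this returns only one longest common subsequence even when several exist.

```csharp
using System;
using System.Text;

static class Lcs
{
    public static string LongestCommonSubsequence(string a, string b)
    {
        // dp[i, j] = LCS length of the first i chars of a and the first j chars of b.
        int[,] dp = new int[a.Length + 1, b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                dp[i, j] = a[i - 1] == b[j - 1]
                    ? dp[i - 1, j - 1] + 1
                    : Math.Max(dp[i - 1, j], dp[i, j - 1]);
            }
        }

        // Backtrack: equal characters move diagonally, otherwise follow the larger neighbour.
        var reversed = new StringBuilder();
        int x = a.Length, y = b.Length;
        while (x > 0 && y > 0)
        {
            if (a[x - 1] == b[y - 1]) { reversed.Append(a[x - 1]); x--; y--; }
            else if (dp[x, y - 1] > dp[x - 1, y]) { y--; }
            else { x--; }
        }

        // Characters were collected back to front, so reverse them.
        char[] chars = reversed.ToString().ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```

For the strings in the question, Lcs.LongestCommonSubsequence("GTCGTTCG", "ACCGGTCGAGTG") returns "GTCGTG", which has six characters: a subsequence, unlike a substring, may skip characters.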
I hope this helps 🙂
Happy Learning
Thanks
I am doing my homework and I have to write a program that expands simple letters from a file, like E and F, into successive productions, also given in the folder, such as E+T, E-F, etc. Anyway, the code shown below gives me an argument out of range exception. I wrote the same code in Java and it all works fine. I don't know why C# gives me this exception. Please give me some advice!
I forgot to include the file that I'm reading from:
EFT
a+()
E
E+T|E-T|T
T*F|T/F|F
a|(E)
public void generare(){
String N = null;
String T = null;
String S = null;
String[] P = null;
TextReader tr = new StreamReader("dateIntrare.txt");
try
{
N = tr.ReadLine();
T = tr.ReadLine();
S = tr.ReadLine();
P = new String[N.Length];
for (int i = 0; i < N.Length; i++)
{
P[i] = tr.ReadLine();
}
tr.Close();
Console.WriteLine("Neterminale: N = " + N);
Console.WriteLine("Terminale: T = " + T);
Console.WriteLine("Productii ");
for (int i = 0; i < P.Length; i++)
Console.WriteLine("\t" + P[i]);
Console.WriteLine("Start: S = " + S);
Boolean gata = false;
String iesire = S.Substring(0, S.Length);
Console.WriteLine("\nRezultat");
Console.Write("\t");
while ((gata == false) && (iesire.Length < 50))
{
Console.Write(iesire);
Boolean ok = false;
for (int i = iesire.Length - 1; i >= 0 && ok == false; i--)
{
for (int j = 0; j < N.Length && ok == false; j++)
if (N[j] == iesire[i])
{
String s1 = iesire.Substring(0, i);
String s2 = iesire.Substring(i + 1, iesire.Length); // HERE IS THE EXCEPTION TAKING PLACE
String inlocuire = P[N.IndexOf(iesire[i])];
String[] optiuni = null;
String[] st = inlocuire.Split('|');
int k = 0;
foreach (String now in st)
{
k++;
}
optiuni = new String[k];
st = inlocuire.Split('|');
k = 0;
foreach (string next in st)
{
optiuni[k++] = next;
}
Random rand = new Random();
int randNr = rand.Next(optiuni.Length);
String inlocuireRandom = optiuni[randNr];
iesire = s1 + inlocuireRandom + s2;
ok = true;
}
}
if (ok == false)
{
gata = true;
}
else
{
if (iesire.Length < 50)
Console.Write(" => ");
}
}
}
catch (FileNotFoundException)
{
Console.WriteLine("Eroare, fisierul nu exista!");
}
Console.WriteLine();
}
But why does it work in Java and not here? I'm confused.
When in doubt, read the documentation. In Java, the 2-parameter overload of substring takes a start index and an end index. In .NET, the second parameter is the number of characters to take, not an end index.
So you probably want
String s2 = iesire.Substring(i + 1, iesire.Length - i - 1);
Or to be simpler about it, just use the 1-parameter version, which takes all the characters from the specified index onwards:
String s2 = iesire.Substring(i + 1);
(I'd use that in Java too...)
Fundamentally though, it's worth taking a step back and working out why you couldn't work this out for yourself... even if you missed it before:
Look at the line that threw the exception in your code
Look at which method actually threw the exception (String.Substring in this case)
Look at the exception message carefully (it's a really good hint!) and also any nested exceptions
Read the documentation for the relevant method carefully, especially the sections describing the parameters and exceptions
This is a common mistake when porting code from Java to C#.
Substring in Java takes start and end parameters, but in C# they are start and length.
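To make the difference concrete, here is a small sketch of a wrapper that accepts Java-style arguments and translates them for .NET. The helper name JavaStyleSubstring is hypothetical and exists only for illustration.

```csharp
using System;

static class SubstringDemo
{
    // Mimics Java's substring(beginIndex, endIndex) on top of
    // .NET's Substring(startIndex, length). Hypothetical helper.
    public static string JavaStyleSubstring(string s, int beginIndex, int endIndex)
    {
        return s.Substring(beginIndex, endIndex - beginIndex);
    }
}
```

With s = "E+T", both s.Substring(1) and SubstringDemo.JavaStyleSubstring(s, 1, 3) give "+T", whereas s.Substring(1, 3) asks for three characters starting at index 1 and throws an ArgumentOutOfRangeException because only two are left.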