Count Number of Changes Comparing two string in c# [closed] - c#

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I want to find out Number of changes comparing two string.
Example:
String 1:
I hope to do something good from this chance.I think my Test form will
help me in ODI. Scoring runs in international cricket, regardless of
the format, gives a player confidence.
string 2:
"I hope to do something good this chance. I think my Testing form will
help me in ODI Format. Scoring runs in latest international cricket,
regardless of the format, gives a playing confidence."
Expected Output: 5.(ignoring space & newline)
Changes are:
from(delete from string 2),
Testing(modified in string 2),
Format(extra addition in string 2),
latest(extra addition in string 2),
playing(modified in string 2).
Have any algorithm for count the number of changes?

You problem is quite similar to comparing two files.. Difference is In file comparison files are compared by comparing text in a single line. In your case it white space would be a separator instead of Newline.
Typically this is accomplished by finding the Longest Common Subsequence. Here is a related document which explains it: https://nanohub.org/infrastructure/rappture/export/2719/trunk/gui/src/diff.pdf
For finding LCS:
public static int GetLCS(string str1, string str2)
{
int[,] table;
return GetLCSInternal(str1, str2, out table);
}
private static int GetLCSInternal(string str1, string str2, out int[,] matrix)
{
matrix = null;
if (string.IsNullOrEmpty(str1) || string.IsNullOrEmpty(str2))
{
return 0;
}
int[,] table = new int[str1.Length + 1, str2.Length + 1];
for (int i = 0; i < table.GetLength(0); i++)
{
table[i, 0] = 0;
}
for(int j= 0;j<table.GetLength(1); j++)
{
table[0,j] = 0;
}
for (int i = 1; i < table.GetLength(0); i++)
{
for (int j = 1; j < table.GetLength(1); j++)
{
if (str1[i-1] == str2[j-1])
table[i, j] = table[i - 1, j - 1] + 1;
else
{
if (table[i, j - 1] > table[i - 1, j])
table[i, j] = table[i, j - 1];
else
table[i, j] = table[i - 1, j];
}
}
}
matrix = table;
return table[str1.Length, str2.Length];
}
//Reading Out All LCS sorted in lexicographic order
using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Text;
namespace LambdaPractice
{
class Program
{
static int[,] c;
static int max(int a, int b)
{
return (a > b) ? a : b;
}
static int LCS(string s1, string s2)
{
for (int i = 1; i <= s1.Length; i++)
c[i,0] = 0;
for (int i = 1; i <= s2.Length; i++)
c[0, i] = 0;
for (int i=1;i<=s1.Length;i++)
for (int j = 1; j <= s2.Length; j++)
{
if (s1[i-1] == s2[j-1])
c[i, j] = c[i - 1, j - 1] + 1;
else
{
c[i, j] = max(c[i - 1, j], c[i, j - 1]);
}
}
return c[s1.Length, s2.Length];
}
/* Prints one LCS
static string BackTrack(string s1, string s2, int i, int j)
{
if (i == 0 || j == 0)
return "";
if (s1[i - 1] == s2[j - 1])
return BackTrack(s1, s2, i - 1, j - 1) + s1[i - 1];
else if (c[i - 1, j] > c[i, j - 1])
return BackTrack(s1, s2, i - 1, j);
else
return BackTrack(s1, s2, i, j - 1);
}*/
static SortedSet<string> backtrack(string s1, string s2, int i, int j)
{
if (i == 0 || j == 0)
return new SortedSet<string>(){""} ;
else if (s1[i - 1] == s2[j - 1])
{
SortedSet<string> temp = new SortedSet<string>();
SortedSet<string> holder = backtrack(s1, s2, i - 1, j - 1);
if (holder.Count == 0)
{
temp.Add(s1[i - 1]);
}
foreach (string str in holder)
temp.Add(str + s1[i - 1]);
return temp;
}
else
{
SortedSet<string> Result = new SortedSet<string>() ;
if (c[i - 1, j] >= c[i, j - 1])
{
SortedSet<string> holder = backtrack(s1, s2, i - 1, j);
foreach (string s in holder)
Result.Add(s);
}
if (c[i, j - 1] >= c[i - 1, j])
{
SortedSet<string> holder = backtrack(s1, s2, i, j - 1);
foreach (string s in holder)
Result.Add(s);
}
return Result;
}
}
static void Main(string[] args)
{
string s1, s2;
s1 = Console.ReadLine();
s2 = Console.ReadLine();
c = new int[s1.Length+1, s2.Length+1];
LCS(s1, s2);
// Console.WriteLine(BackTrack(s1, s2, s1.Length, s2.Length));
// Console.WriteLine(s1.Length);
SortedSet<string> st = backtrack(s1, s2, s1.Length, s2.Length);
foreach (string str in st)
Console.WriteLine(str);
GC.Collect();
Console.ReadLine();
}
}
}

could you check this might be useful somehow, I would say you'll use a lot of control statements (if/else or switch) for result sets.

You can do it relatively easy, if you you consider a string as modified if it is starting with another string:
void Main()
{
var a = "I hope to do something good from this chance.I think my Test form will help me in ODI.Scoring runs in international cricket, regardless of the format, gives a player confidence.";
var b = "I hope to do something good this chance. I think my Testing form will help me in ODI Format. Scoring runs in latest international cricket, regardless of the format, gives a playing confidence.";
var d = new Difference(a,b);
Console.WriteLine("Number of differences: {0}", d.Count);
foreach (var diff in d.Differences)
{
Console.WriteLine("Different: {0}", diff);
}
}
class Difference
{
string a;
string b;
List<string> notInA;
List<string> notInB;
public int Count
{
get { return notInA.Count + notInB.Count; }
}
public IEnumerable<string> Differences
{
get { return notInA.Concat(notInB); }
}
public Difference(string a, string b)
{
this.a = a;
this.b = b;
var itemsA = Split(a);
var itemsB = Split(b);
var changedPairs =
from x in itemsA
from y in itemsB
where (x.StartsWith(y) || y.StartsWith(x)) && y != x
select new { x, y };
var softChanged = changedPairs.SelectMany(p => new[] {p.x, p.y}).Distinct().ToList();
notInA = itemsA.Except(itemsB).Except(softChanged).ToList();
notInB = itemsB.Except(itemsA).Except(softChanged).ToList();
}
IEnumerable<string> Split(string x)
{
return x.Split(new[] { " ", ".", ","}, StringSplitOptions.RemoveEmptyEntries);
}
}
Output:
Number of differences: 5
Different: from
Different: player
Different: Format
Different: latest
Different: playing

Related

Simple spell checker using Levenshtein distance

I have to implement a simple spell checker. Basically I have to user input an incorrect sentence or a word, then a number N, and then N correct words each on new line. The program has to output "incorrect word: suggestion". If there is no suggestions available it should output "incorrect word: no suggestions" and if all the words from sentence are correct it should display "Correct text!". The typos can be:
Misspeled word.
Swapped letters.
Extra letter.
Missing letter.
To do this I implemented Levensthein minumim distance algorithm, which calculates the minimum number of modifications that a string has to take to be transformed into another string. All the test cases are fine but I want to reduce the cyclomatic complexity of the main method from 27 to below 26. Any suggestion would be helpful. For the example:
Thsi is an texzt fr tet
5
This
an
text
for
test
It displays:
Thsi: This
an: no suggestions
texzt: text
fr: for
tet: text test
using System;
using System.Collections.Generic;
namespace MisspelledWords
{
class Program
{
static void Main(string[] args)
{
string sentence = Console.ReadLine();
sentence = sentence.ToLower();
int numWords = int.Parse(Console.ReadLine());
const int doi = 2;
const int doi2 = 3;
int index = 0;
int index1 = 0;
string[] correctWords = new string[numWords];
for (int i = 0; i < numWords; i++)
{
correctWords[i] = Console.ReadLine();
}
foreach (string word in sentence.Split(' '))
{
index++;
int minDistance = int.MaxValue;
string closestWord = "";
foreach (string correctWord in correctWords)
{
int distance = GetLevenshteinDistance(word, correctWord);
if (distance < minDistance)
{
minDistance = distance;
closestWord = correctWord;
}
}
Message(minDistance, closestWord, word, index, ref index1, correctWords);
if (index1 != 0)
{
return;
}
}
static void Message(int minDistance, string closestWord, string word, int index, ref int index1, string[] correctWords)
{
if (minDistance >= doi)
{
// Print the misspelled word followed by "no suggestions"
Console.WriteLine(word + ": (no suggestion)");
}
else if (minDistance < doi && closestWord != word || minDistance >= doi2)
{
// Find all correct words that have the same minimum distance
List<string> suggestions = new List<string>();
foreach (string correctWord in correctWords)
{
int distance = GetLevenshteinDistance(word, correctWord);
if (distance == minDistance)
{
suggestions.Add(correctWord);
}
}
// Print the misspelled word followed by the suggestions
Console.Write(word + ": ");
Console.WriteLine(string.Join(" ", suggestions));
}
else if (minDistance == 0 && index > 1)
{
Console.WriteLine("Text corect!");
index1++;
}
}
static int Min(int value1, int value2, int value3)
{
return Math.Min(Math.Min(value1, value2), value3);
}
static int GetLevenshteinDistance(string word1, string word2)
{
int[,] distance = new int[word1.Length + 1, word2.Length + 1];
InitializeDistanceMatrix(distance, word1, word2);
CalculateLevenshteinDistance(distance, word1, word2);
return distance[word1.Length, word2.Length];
}
static void InitializeDistanceMatrix(int[,] distance, string word1, string word2)
{
for (int i = 0; i <= word1.Length; i++)
{
for (int j = 0; j <= word2.Length; j++)
{
if (i == 0)
{
distance[i, j] = j;
}
else if (j == 0)
{
distance[i, j] = i;
}
}
}
}
static void CalculateLevenshteinDistance(int[,] distance, string word1, string word2)
{
for (int i = 0; i <= word1.Length; i++)
{
for (int j = 0; j <= word2.Length; j++)
{
CLD(i, j, distance, word1, word2);
}
}
}
static void CLD(int i, int j, int[,] distance, string word1, string word2)
{
const int v = 2;
if (i <= 0 || j <= 0)
{
return;
}
distance[i, j] = Min(
distance[i - 1, j] + 1,
distance[i, j - 1] + 1,
distance[i - 1, j - 1] + (word1[i - 1] == word2[j - 1] ? 0 : 1));
// Check if swapping the characters at positions i and j results in a new minimum distance
if (i <= 1 || j <= 1 || word1[i - 1] != word2[j - v] || word1[i - v] != word2[j - 1])
{
return;
}
distance[i, j] = Math.Min(distance[i, j], distance[i - v, j - v] + 1);
}
Console.ReadLine();
}
}
}

Compare chars in two strings of different length in C#

In C#, Im trying to compare two strings and find out how many chars are different.
I Tried this:
static void Main(String[] args)
{
var strOne = "abcd";
var strTwo = "bcd";
var arrayOne = strOne.ToCharArray();
var arrayTwo = strTwo.ToCharArray();
var differentChars = arrayOne.Except(arrayTwo);
foreach (var character in differentChars)
Console.WriteLine(character); //Will print a
}
but there are problems with this:
if the strings contain same chars but on different positions within the string
it removes the duplicate occurences
If the strings would be the same length I would compare the chars one by one but if one is bigger than the other the positions are different
You may want to have a look at the Leventshtein Distance Algorithm.
As the article says
the Levenshtein distance between two words is the minimum number of
single-character edits (insertions, deletions or substitutions)
required to change one word into the other
You can find many reference implementations in the internet, like here or here.
public static int LevenshteinDistance(string s, string t)
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
return d[n, m];
}
}
How about this? Append two strings into one and group them, you will have the count of the chars, > 1 will give the repeatings and = 0 will give the unique ones.
var strOne = "abcd";
var strTwo = "bcd";
var arrayOne = strOne.Concat(strTwo).GroupBy(x => x).Select(x => new { Key = x.Key, Count = x.Count() });
foreach (var character in arrayOne) {
if (character.Count > 1)
{
Console.WriteLine(character.Key); // the repeating chars
}
}
If the same character appears twice in the same string,
var strOne = "abbcdd";
var strTwo = "cd";
var arrayTwo = strOne.Select(x => new { Key = x, IsExists = strTwo.Any(y => y == x) });
foreach (var character in arrayTwo) {
if (character.IsExists)
{
Console.WriteLine(character.Key);
}
}

How I can check if substring contain another substring c#

//the word skill it's a substring for two string i want to compare based it
string first = "skill.Name";
string second = "jobskillRelation";
first.Contains(second);
You can use Longest Common Substring code provided here, the C# version is like this:
public static string lcs(string a, string b)
{
var lengths = new int[a.Length, b.Length];
int greatestLength = 0;
string output = "";
for (int i = 0; i < a.Length; i++)
{
for (int j = 0; j < b.Length; j++)
{
if (a[i] == b[j])
{
lengths[i, j] = i == 0 || j == 0 ? 1 : lengths[i - 1, j - 1] + 1;
if (lengths[i, j] > greatestLength)
{
greatestLength = lengths[i, j];
output = a.Substring(i - greatestLength + 1, greatestLength);
}
}
else
{
lengths[i, j] = 0;
}
}
}
return output;
}
so the usage will be:
var LCS = lcs(first,second)
If you want to compare two string to see if both contain a certain keyword, this may help.
Boolean compare(string first, string second, string keyword)
{
if (first.Contains(keyword) && second.Contains(keyword))
return true;
return false;
}

Sequence of marching chars in 2 strings

I have to write searching algorithm, For example I have to compare str="giorgi" to str2="grigol". I'm trying to find longest matching sequence of chars, so that the order of chars is the same and string which I should get is "grg"... with this c# code I'm getting "grig".
int k=0;
string s="";
string str = "giorgi";
string str2 = "grigol";
for(int i=0;i<str.Length;i++)
{
for (int j = k; j < str2.Length; j++)
{
if (str[i] == str2[j])
{
s += str2[k];
k++;
goto endofloop;
}
}
endofloop:;
}
Console.WriteLine(s);
The solution:
using System;
class GFG
{
/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
static int lcs( char[] X, char[] Y, int m, int n )
{
int [,]L = new int[m+1,n+1];
/* Following steps build L[m+1][n+1]
in bottom up fashion. Note
that L[i][j] contains length of
LCS of X[0..i-1] and Y[0..j-1] */
for (int i = 0; i <= m; i++)
{
for (int j = 0; j <= n; j++)
{
if (i == 0 || j == 0)
L[i, j] = 0;
else if (X[i - 1] == Y[j - 1])
L[i, j] = L[i - 1, j - 1] + 1;
else
L[i, j] = GFG.max(L[i - 1, j], L[i, j - 1]);
}
}
return L[m, n];
}
static int max(int a, int b)
{
return (a > b)? a : b;
}
}
And now the program to test it:
public static void Main()
{
String s1 = "giorgi";
String s2 = "grigol";
char[] X=s1.ToCharArray();
char[] Y=s2.ToCharArray();
int m = X.Length;
int n = Y.Length;
Console.Write("Length of LCS is" + " " +lcs( X, Y, m, n ) );
}
}

Longest Common Subsequence

Hi this is my code for longest common subsequence for 2 strings in c# . I need help in backtracking . I need to find out the subsequence : GTCGT
String str1 = "GTCGTTCG";
String str2 = "ACCGGTCGAGTG";
int[,] l = new int[str1.Length, str2.Length]; // String 1 length and string 2 length storing it in a 2-dimensional array
int lcs = -1;
string substr = string.Empty;
int end = -1;
for (int i = 0; i <str1.Length ; i++) // Looping based on string1 length
{
for (int j = 0; j < str2.Length; j++) // Looping based on string2 Length
{
if (str1[i] == str2[j]) // if match found
{
if (i == 0 || j == 0) // i is first element or j is first elemnt then array [i,j] = 1
{
l[i, j] = 1;
}
else
{
l[i, j] = l[i - 1, j - 1] + 1; // fetch the upper value and increment by 1
}
if (l[i, j] > lcs)
{
lcs = l[i, j]; // store lcs value - how many time lcs is found
end = i; // index on longest continuous string
}
}
else // if match not found store zero initialze the array value by zero
{
l[i, j] = 0;
}
}
Your function needs to return a collection of strings. There might be several longest common sub-sequence with same length.
public List<string> LCS(string firstString, string secondString)
{
// to create the lcs table easier which has first row and column empty.
string firstStringTemp = " " + firstString;
string secondStringTemp = " " + secondString;
// create the table
List<string>[,] temp = new List<string>[firstStringTemp.Length, secondStringTemp.Length];
// loop over all items in the table.
for (int i = 0; i < firstStringTemp.Length; i++)
{
for (int j = 0; j < secondStringTemp.Length; j++)
{
temp[i, j] = new List<string>();
if (i == 0 || j == 0) continue;
if (firstStringTemp[i] == secondStringTemp[j])
{
var a = firstStringTemp[i].ToString();
if (temp[i - 1, j - 1].Count == 0)
{
temp[i, j].Add(a);
}
else
{
foreach (string s in temp[i - 1, j - 1])
{
temp[i, j].Add(s + a);
}
}
}
else
{
List<string> b = temp[i - 1, j].Concat(temp[i, j - 1]).Distinct().ToList();
if (b.Count == 0) continue;
int max = b.Max(p => p.Length);
b = b.Where(p => p.Length == max).ToList();
temp[i, j] = b;
}
}
}
return temp[firstStringTemp.Length - 1, secondStringTemp.Length - 1];
}
You need to have a collection set in each entry of table. So you can still keep different strings with the same length in each cell of table.
As far as I've understood your question, I think you want to know the subsequence value i.e. that string. So, to get the subsequence, I've learnt a little bit differently. First, I calculate the table the one we do in standard Longest Common Subsequence (LCS) problem. Then I traverse the table to get the subsequence value. Sorry, I'm not familiar with C#, so, I will give you CPP code. Please have a look and let me know if you face any problem.
#include<iostream>
#include<vector>
#include<string>
using namespace std;
string printLongestCommonSubsequence(vector<vector<int> >& dp, int m, int n, string text1, string text2){
int i = m, j = n;
string lcs = "";
while(i > 0 && j > 0){
if(text1[i-1] == text2[j-1]){
lcs.push_back(text1[i-1]);
i--; j--;
}
else{
if(dp[i][j-1] > dp[i-1][j]) j--;
else i--;
}
}
reverse(lcs.begin(), lcs.end());
return lcs;
}
string longestCommonSubsequence(string text1, string text2){
int m = text1.size();
int n = text2.size();
vector<vector<int> > dp(m+1, vector<int>(n+1));
//initialization
for(int i=0; i<m+1; i++){
for(int j=0; j<n+1; j++){
if(i == 0 || j == 0) dp[i][j] = 0;
}
}
//solving the subproblems to solve the bigger problems
for(int i=1; i<m+1; i++){
for(int j=1; j<n+1; j++){
if(text1[i-1] == text2[j-1])
dp[i][j] = 1 + dp[i-1][j-1];
else
dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
}
}
return printLongestCommonSubsequence(dp, m, n, text1, text2);
}
int main(){
string text1, text2;
cout<<"Enter the first string: ";
cin>>text1;
cout<<"\nEnter the second string: ";
cin>>text2;
string lcs = longestCommonSubsequence(text1, text2);
cout<<"Longest Common Subsequence is: "<<lcs<<endl;
return(0);
}
Please have a look at the diagram.
With respect to printing the LCS,
The basic idea is:
When the characters are equal of both the strings then move towards diagonal.
When the characters are not equal of both the strings then move towards the maximum of both the directions.
I hope this helps 🙂
Happy Learning
Thanks

Categories

Resources