C# LevenshteinDistance algorithm for spellchecker - c#

Hi i'm using the levenshtein algorithm to calculate the difference between two strings, using the below code. It currently provides the total number of changes which need to be made to get from 'answer' to 'target', but i'd like to split these up into the types of errors being made. So classifying an error as a deletion, substitution or insertion.
I've tried adding a simple count but i'm new at this and don't really understand how the code works so not sure how to go about it.
static class LevenshteinDistance
{
/// <summary>
/// Compute the distance between two strings.
/// </summary>
public static int Compute(string s, string t)
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
// Step 1
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
// Step 2
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
// Step 3
for (int i = 1; i <= n; i++)
{
//Step 4
for (int j = 1; j <= m; j++)
{
// Step 5
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
// Step 6
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
Thanks in advance.

Related

How to sort array by first item in string in descending order on C#

I don't know what to do, my program is not working
I need to sort a two-dimensional array by the first elements of a row in descending order by rearranging the rows
(need to sort string without sort item)
Let's say I'm given an array:
1 2 3
4 5 6
7 8 9
As a result of the program, I need to get:
7 8 9
4 5 6
1 2 3
`
using System;
namespace BubbleSort
{
internal class Program
{
static void Main(string[] args)
{
//Объявление массива и его размерности
const int n = 3;
int bubble;
int[,] A =
{
{ 1, 2, 3 },
{ 4, 5, 6 },
{ 7, 8, 9 },
};
//Алгоритм пузырьковой сортировки
for (int i = 0; i < n; i++)
{
for(int j = 0; j < n - 1; j++)
{
if (A[i, 0] < A[i++, 0])
{
for(j = 0; j < n-1; j++)
{
bubble = A[i, j];
A[i, j] = A[i, j+1];
A[i, j+1] = bubble;
}
}
}
}
//Вывод массива
for (int y = 0; y < n; y++)
{
for (int x = 0; x < n; x++)
{
Console.Write(A[y, x] + " ");
}
Console.WriteLine();
}
}
}
}
`
From the code in your comments, I'm not sure why you have a nested loop for swapping the values since a single loop will do. I've removed that here.
If you refer to the pseudocode implementation on Wikipedia, you'll see that you need to keep running multiple passes across the array until everything is sorted. You can do this like so:
bool swapped;
do
{
swapped = false; // reset swapped
for (int i = 0; i < n - 1; i++) // loop through all but the last row
{
if (A[i, 0] < A[i + 1, 0]) // determine if this row needs to be swapped with the next row
{
swapped = true; // mark swapped
for (int j = 0; j < n; j++) // swap each item in row i with each item in row i+1
{
int tmp = A[i, j];
A[i, j] = A[i + 1, j];
A[i + 1, j] = tmp;
}
}
}
}
while (swapped); // if we swapped anything, we need to make another pass to ensure the array is sorted
We can also do away with the need for n by using .GetUpperBound(dimension) which returns a value between 0 and n - 1 (where n is the count of items in the array in that dimension). Because the result is effectively n - 1, I've modified the loop conditions slightly:
bool swapped;
do
{
swapped = false;
for (int i = 0; i < A.GetUpperBound(0); i++)
{
if (A[i, 0] < A[i + 1, 0])
{
swapped = true;
for (int j = 0; j <= A.GetUpperBound(1); j++)
{
int tmp = A[i, j];
A[i, j] = A[i + 1, j];
A[i + 1, j] = tmp;
}
}
}
}
while (swapped);
We can also refer to the "Optimizing bubble sort" section of the Wikipedia page and implement that instead, which will make our code run more optimally:
int n = A.GetUpperBound(0); // get the initial value of n
do
{
int newn = 0; // default newn to 0, so if no items are visited, it will remain 0 and the loop will exit
for (int i = 0; i < n; i++)
{
if (A[i, 0] < A[i + 1, 0])
{
for (int j = 0; j <= A.GetUpperBound(1); j++)
{
int tmp = A[i, j];
A[i, j] = A[i + 1, j];
A[i + 1, j] = tmp;
}
newn = i; // store the current (highest) value of i swapped
}
}
n = newn; // set the value of n to the highest value of i swapped
}
while (n > 0); // loop until n == 0
The logic here (as explained on Wikipedia) is that by the end of the first pass, the last item is in the correct position. By the end of the second pass, the second-to-last and last items are in the correct position, and so on. So each time, we can visit one less item. When we have 0 items to visit, we have 0 items to swap, and the sort is complete.
You can see this optimised version in this YouTube visualisation.
I think the line
if (A[i, 0] < A[i++, 0])
should read
if (A[i, 0] < A[i+ 1, 0])

longest common subsequence in c#

I want to allow users to input their own characters to find the lowest common subsequence, this code is mostly a skeleton of somebody else's work- and the characters used are AGGTAB and GXTXYAB which is GTAB however I want users to implement their own characters
code for longest common subsequence:
class GFG
{
public static void Main()
{
int m = X.Length;
int n = Y.Length;
Console.WriteLine("Write down characters for string A");
string A = Console.ReadLine();
Console.WriteLine("Write down characters for string B");
string B = Console.ReadLine();
int[,] L = new int[m + 1, n + 1];
for (int i = 0; i <= m; i++)
{
for (int j = 0; j <= n; j++)
{
if (i == 0 || j == 0)
L[i, j] = 0;
else if (X[i - 1] == Y[j - 1])
L[i, j] = L[i - 1, j - 1] + 1;
else
L[i, j] = Math.Max(L[i - 1, j], L[i, j - 1]);
}
}
}
}

Neural network [ocr]

I come looking for general tips about the program I'm writing now.
The goal is:
Use neural network program to recognize 3 letters [D,O,M] (or display "nothing is recognized" if i input anything other than those 3).
Here's what I have so far:
A class for my single neuron
public class neuron
{
double[] weights;
public neuron()
{
weights = null;
}
public neuron(int size)
{
weights = new double[size + 1];
Random r = new Random();
for (int i = 0; i <= size; i++)
{
weights[i] = r.NextDouble() / 5 - 0.1;
}
}
public double output(double[] wej)
{
double s = 0.0;
for (int i = 0; i < weights.Length; i++) s += weights[i] * wej[i];
s = 1 / (1 + Math.Exp(s));
return s;
}
}
A class for a layer:
public class layer
{
neuron[] tab;
public layer()
{
tab = null;
}
public layer(int numNeurons, int numInputs)
{
tab = new neuron[numNeurons];
for (int i = 0; i < numNeurons; i++)
{
tab[i] = new neuron(numInputs);
}
}
public double[] compute(double[] wejscia)
{
double[] output = new double[tab.Length + 1];
output[0] = 1;
for (int i = 1; i <= tab.Length; i++)
{
output[i] = tab[i - 1].output(wejscia);
}
return output;
}
}
And finally a class for a network
public class network
{
layer[] layers = null;
public network(int numLayers, int numInputs, int[] npl)
{
layers = new layer[numLayers];
for (int i = 0; i < numLayers; i++)
{
layers[i] = new layer(npl[i], (i == 0) ? numInputs : (npl[i - 1]));
}
}
double[] compute(double[] inputs)
{
double[] output = layers[0].compute(inputs);
for (int i = 1; i < layers.Length; i++)
{
output = layers[i].compute(output);
}
return output;
}
}
Now for the algorythm I chose:
I have a picture box, size 200x200, where you can draw a letter (or read one from jpg file).
I then convert it to my first array(get the whole picture) and 2nd one(cut the non relevant background around it) like so:
Bitmap bmp2 = new Bitmap(this.pictureBox1.Image);
int[,] binaryfrom = new int[bmp2.Width, bmp2.Height];
int minrow=0, maxrow=0, mincol=0, maxcol=0;
for (int i = 0; i < bmp2.Height; i++)
{
for (int j = 0; j < bmp2.Width; j++)
{
if (bmp2.GetPixel(j, i).R == 0)
{
binaryfrom[i, j] = 1;
if (minrow == 0) minrow = i;
if (maxrow < i) maxrow = i;
if (mincol == 0) mincol = j;
else if (mincol > j) mincol = j;
if (maxcol < j) maxcol = j;
}
else
{
binaryfrom[i, j] = 0;
}
}
}
int[,] boundaries = new int[binaryfrom.GetLength(0)-minrow-(binaryfrom.GetLength(0)-(maxrow+1)),binaryfrom.GetLength(1)-mincol-(binaryfrom.GetLength(1)-(maxcol+1))];
for(int i = 0; i < boundaries.GetLength(0); i++)
{
for(int j = 0; j < boundaries.GetLength(1); j++)
{
boundaries[i, j] = binaryfrom[i + minrow, j + mincol];
}
}
And convert it to my final array of 12x8 like so (i know I could shorten this a fair bit, but wanted to have every step in different loop so I can see what went wrong easier[if anything did]):
int[,] finalnet = new int[12, 8];
int k = 1;
int l = 1;
for (int i = 0; i < finalnet.GetLength(0); i++)
{
for (int j = 0; j < finalnet.GetLength(1); j++)
{
finalnet[i, j] = 0;
}
}
while (k <= finalnet.GetLength(0))
{
while (l <= finalnet.GetLength(1))
{
for (int i = (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * (k - 1); i < (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * k; i++)
{
for (int j = (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * (l - 1); j < (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * l; j++)
{
if (boundaries[i, j] == 1) finalnet[k-1, l-1] = 1;
}
}
l++;
}
l = 1;
k++;
}
int a = boundaries.GetLength(0);
int b = finalnet.GetLength(1);
if((a%b) != 0){
k = 1;
while (k <= finalnet.GetLength(1))
{
for (int i = (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * finalnet.GetLength(0); i < boundaries.GetLength(0); i++)
{
for (int j = (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * (k - 1); j < (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * k; j++)
{
if (boundaries[i, j] == 1) finalnet[finalnet.GetLength(0) - 1, k - 1] = 1;
}
}
k++;
}
}
if (boundaries.GetLength(1) % finalnet.GetLength(1) != 0)
{
k = 1;
while (k <= finalnet.GetLength(0))
{
for (int i = (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * (k - 1); i < (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * k; i++)
{
for (int j = (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * finalnet.GetLength(1); j < boundaries.GetLength(1); j++)
{
if (boundaries[i, j] == 1) finalnet[k - 1, finalnet.GetLength(1) - 1] = 1;
}
}
k++;
}
for (int i = (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * finalnet.GetLength(0); i < boundaries.GetLength(0); i++)
{
for (int j = (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * finalnet.GetLength(1); j < boundaries.GetLength(1); j++)
{
if (boundaries[i, j] == 1) finalnet[finalnet.GetLength(0) - 1, finalnet.GetLength(1) - 1] = 1;
}
}
}
The result is a 12x8 (I can change it in the code to get it from some form controls) array of 0 and 1, where 1 form the rough shape of a letter you drawn.
Now my questions are:
Is this a correct algorythm?
Is my function
1/(1+Math.Exp(x))
good one to use here?
What should be the topology? 2 or 3 layers, and if 3, how many neurons in hidden layer? I have 96 inputs (every field of the finalnet array), so should I also take 96 neurons in the first layer? Should I have 3 neurons in the final layer or 4(to take into account the "not recognized" case), or is it not necessary?
Thank you for your help.
EDIT: Oh, and I forgot to add, I'm gonna train my network using Backpropagation algorythm.
You may need 4 layers at least to get accurate results using back propagation method. 1 input, 2 middle layers, and an output layer.
12 * 8 matrix is too small(and you may end up in data loss which will result in total failure) - try some thing 16 * 16. If you want to reduce the size then you have to peel out the outer layers of black pixels further.
Think about training the network with your reference characters.
Remember that you have to feed back the output back to the input layer again and iterate it multiple times.
A while back I created a neural net to recognize digits 0-9 (python, sorry), so based on my (short) experience, 3 layers are ok and 96/50/3 topology will probably do a good job. As for the output layer, it's your choice; you can either backpropagate all 0s when the input image is not a D, O or M or use the fourth output neuron to indicate that the letter was not recognized. I think that the first option would be the best one because it's simpler (shorter training time, less problems debugging the net...), you just need to apply a threshold under which you classify the image as 'not recognized'.
I also used the sigmoid as activation function, I didn't try others but it worked :)

Computing Array Determinant for NxN Recursive C#

Well, this is giving me a real headache. I'm building a matrix determinant function to compute NxN determinant, and I'm using recursion.
The logic is working right but I'm not able to get the final value computed correctly.
Here is my code for Matrix Determinant:
public static double determinant(double[,]array){
double det=0;
double total = 0;
double[,] tempArr = new double[array.GetLength(0) - 1, array.GetLength(1) - 1];
if(array.GetLength(0)==2)
{
det = array[0, 0] * array[1, 1] - array[0, 1] * array[1, 0];
}
else {
for (int i = 0; i <1; i++)
{
for (int j = 0; j < array.GetLength(1); j++)
{
if (j % 2 != 0) array[i, j] = array[i, j] * -1;
tempArr= fillNewArr(array, i, j);
det+=determinant(tempArr);
total =total + (det * array[i, j]);
}
}
}
return det;
}
and about fillNewArr method it's just a method to trim the array, method is as follow:
p
ublic static double[,] fillNewArr(double[,] originalArr, int row, int col)
{
double[,] tempArray = new double[originalArr.GetLength(0) - 1, originalArr.GetLength(1) - 1];
for (int i = 0, newRow = 0; i < originalArr.GetLength(0); i++)
{
if (i == row)
continue;
for (int j = 0, newCol=0; j < originalArr.GetLength(1); j++)
{
if ( j == col) continue;
tempArray[newRow, newCol] = originalArr[i, j];
newCol++;
}
newRow++;
}
return tempArray;
}
The method is working as it supposed to "I assume" but the final result is not computed in the right way, why would that be?!
4x4 Array Example:
{2 6 6 2}
{2 7 3 6}
{1 5 0 1}
{3 7 0 7}
Final result should be -168, while mine is 104!
This bit
if (j % 2 != 0) array[i, j] = array[i, j] * -1;
tempArr= fillNewArr(array, i, j);
det+=determinant(tempArr);
total =total + (det * array[i, j]);
uses a variable total that is never used again. It should probably be something like
double subdet = determinant(fillNewArr(array, i, j));
if (j % 2 != 0) subdet *= -1;
det += array[i, j] * subdet;

Prefer jagged arrays over multidimensional [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I get this performance issue from visual studio (Prefer jagged arrays over multidimensional).
The code to be replaced is "//matrix".
How can i do this with my code?
public static int LevenshteinDistance(string s, string t)
{
int n = s.Length; //length of s
int m = t.Length; //length of t
int[,] d = new int[n + 1, m + 1]; // matrix
int cost; // cost
// Step 1
if (n == 0) return m;
if (m == 0) return n;
// Step 2
for (int i = 0; i <= n; d[i, 0] = i++) ;
for (int j = 0; j <= m; d[0, j] = j++) ;
// Step 3
for (int i = 1; i <= n; i++)
{
//Step 4
for (int j = 1; j <= m; j++)
{
// Step 5
cost = (t.Substring(j - 1, 1) == s.Substring(i - 1, 1) ? 0 : 1);
// Step 6
d[i, j] = System.Math.Min(System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
Here's a version which uses only a single dimensional array.
public static int LevenshteinDistance(string s, string t)
{
int n = s.Length; //length of s
int m = t.Length; //length of t
int stride = m+1;
int[] d = new int[(n + 1)*stride];
// note, d[i*m + i + j] holds (i,j)
int cost;
// Step 1
if (n == 0) return m;
if (m == 0) return n;
// Step 2, adjusted to skip (0,0)
for (int i = 0, k = stride; k < d.Length; k += stride) d[k] = ++i;
for (int j = 1; j < stride; ++j) d[j] = j;
// Step 3
int newrow = stride * 2;
char si = s[0];
for (int i=0, j=0, k = stride + 1; k < d.Length; ++k)
{
// don't overwrite d[i,0]
if (k == newrow) {
newrow += stride;
j=0;
si=s[++i];
continue;
}
// Step 5
cost = (t[j] == si ? 0 : 1);
// Step 6
d[k] = System.Math.Min(System.Math.Min(
d[k-stride] + 1, /* up one row */
d[k-1] + 1 /* left one */ ),
d[k-stride-1] + cost /* diagonal */ );
}
// Step 7
return d[d.Length-1];
}
This should improve performance 3 ways:
No string comparison and no one-character string garbage for the GC to clean up
Changed memory layout to match iteration order, improving cache behavior
Used single dimensional array and optimizer-friendly idioms, which should reduce bounds-checking
However, I'm pretty sure that applying mike z's suggestion of using two vectors will make for even clearer code.

Categories

Resources