Print LevenshteinDistance matrix in C# [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet.
Closed 9 years ago.
I have implemented the Levenshtein distance algorithm in C# as shown below. The code runs correctly, but for debugging purposes I want to print the matrix, and I can't decide where to place the Print statement.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace _Levenshtein_
{
class Program
{
public static void Print(int[,] data)
{
// GetUpperBound returns the last valid index, so the loops must use <=
for (int i = 0; i <= data.GetUpperBound(0); i++)
{
for (int j = 0; j <= data.GetUpperBound(1); j++)
{
Console.Write(data[i, j] + " ");
}
Console.WriteLine(); // one line per row
}
}
public static int LevenshteinDistance(string source, string target)
{
if (String.IsNullOrEmpty(source))
{
return String.IsNullOrEmpty(target) ? 0 : target.Length;
}
if (String.IsNullOrEmpty(target)) return source.Length;
if (source.Length > target.Length)
{
var temp = target;
target = source;
source = temp;
}
var m = target.Length;
var n = source.Length;
var distance = new int[2, m + 1];
// Initialize the distance 'matrix'
for (var j = 1; j <= m; j++) distance[0, j] = j;
Console.Write(target + " ");
var currentRow = 0;
for (var i = 1; i <= n; ++i)
{
currentRow = i & 1;
distance[currentRow, 0] = i;
var previousRow = currentRow ^ 1;
Console.WriteLine(source[i-1] + " " );
for (var j = 1; j <= m; j++)
{
var cost = (target[j - 1] == source[i - 1] ? 0 : 1);
distance[currentRow, j] = Math.Min(Math.Min(distance[previousRow, j] + 1,distance[currentRow, j - 1] + 1),distance[previousRow, j - 1] + cost);
Print(distance);
}
Console.WriteLine();
}
return distance[currentRow, m];
}
static void Main(string[] args)
{
LevenshteinDistance("Sunday", "Saturday");
}
}
}

I added comments showing where to print each part of the distance matrix:
// Print the `target` string as the first row (header), one char per column:
// Console.Write("  ");
// for (var j = 0; j < target.Length; j++)
//     Console.Write(target[j] + " ");
// Console.WriteLine();
for (var i = 1; i <= n; ++i)
{
currentRow = i & 1;
distance[currentRow, 0] = i;
var previousRow = currentRow ^ 1;
// Print the row label once per row. i starts at 1, so the char is source[i - 1]:
// Console.Write(source[i - 1] + " ");
for (var j = 1; j <= m; j++)
{
var cost = (target[j - 1] == source[i - 1] ? 0 : 1);
distance[currentRow, j] = Math.M........);
// Print each cell right after it is computed:
// Console.Write(distance[currentRow, j] + " ");
}
// Console.WriteLine(); // end of this row
}
Edit:
length of "Sunday" = 6
length of "Saturday" = 8
if (source.Length > target.Length) { // make target the longer string
// swap source and target
}
So target is "Saturday", the horizontal string,
and source is "Sunday", the vertical string.
You will get output like the following. (Here on Codepad is my working C code that prints the edit-distance matrix.)
  S a t u r d a y
S 0 1 2 3 4 5 6 7
u 1 1 2 2 3 4 5 6
n 2 2 2 3 3 4 5 6
d 3 3 3 3 4 3 4 5
a 4 3 4 4 4 4 3 4
y 5 4 4 5 5 5 4 3
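For comparison, here is a minimal C# sketch that keeps the full (n + 1) x (m + 1) matrix instead of the two rolling rows, so every cell can be printed as soon as it is final. The class name `LevenshteinMatrix` and the `TextWriter` parameter are illustrative assumptions, not part of the question; pass `Console.Out` to print to the console. The column layout assumes single-digit distances, as in the figure.

```csharp
using System;
using System.IO;

// Sketch: keeps the whole (n + 1) x (m + 1) matrix so each cell can be printed
// right after it is computed. LevenshteinMatrix is a hypothetical name.
public static class LevenshteinMatrix
{
    public static int Compute(string source, string target, TextWriter output)
    {
        int n = source.Length;
        int m = target.Length;
        var d = new int[n + 1, m + 1];

        // Row 0 and column 0: distance to or from the empty prefix.
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 0; j <= m; j++) d[0, j] = j;

        // Header: target across the top, shifted right past the row-label column.
        output.Write(" ");
        foreach (char c in target) output.Write(" " + c);
        output.WriteLine();

        for (int i = 1; i <= n; i++)
        {
            output.Write(source[i - 1]); // row label: one char of source
            for (int j = 1; j <= m; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1,   // deletion
                                            d[i, j - 1] + 1),  // insertion
                                   d[i - 1, j - 1] + cost);    // substitution or match
                output.Write(" " + d[i, j]);                   // print the cell once it is final
            }
            output.WriteLine(); // end of row i
        }
        return d[n, m];
    }
}
```

Calling `LevenshteinMatrix.Compute("Sunday", "Saturday", Console.Out)` returns 3 and prints the same rows as the figure above. The full matrix costs O(n * m) memory instead of O(m), which is fine for debugging-sized inputs.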

Related

How to sort array by first item in string in descending order on C#

I don't know what to do; my program is not working.
I need to sort a two-dimensional array in descending order by the first element of each row, by rearranging whole rows.
(I need to move whole rows, without sorting the items inside each row.)
Let's say I'm given an array:
1 2 3
4 5 6
7 8 9
As a result of the program, I need to get:
7 8 9
4 5 6
1 2 3
using System;
namespace BubbleSort
{
internal class Program
{
static void Main(string[] args)
{
// Declare the array and its dimensions
const int n = 3;
int bubble;
int[,] A =
{
{ 1, 2, 3 },
{ 4, 5, 6 },
{ 7, 8, 9 },
};
// Bubble sort algorithm
for (int i = 0; i < n; i++)
{
for(int j = 0; j < n - 1; j++)
{
if (A[i, 0] < A[i++, 0])
{
for(j = 0; j < n-1; j++)
{
bubble = A[i, j];
A[i, j] = A[i, j+1];
A[i, j+1] = bubble;
}
}
}
}
// Print the array
for (int y = 0; y < n; y++)
{
for (int x = 0; x < n; x++)
{
Console.Write(A[y, x] + " ");
}
Console.WriteLine();
}
}
}
}
Looking at the code in your comments, I'm not sure why you have a nested loop for swapping the values, since a single loop over the columns will do. I've removed it here.
If you refer to the pseudocode implementation on Wikipedia, you'll see that you need to keep running multiple passes across the array until everything is sorted. You can do this like so:
bool swapped;
do
{
swapped = false; // reset swapped
for (int i = 0; i < n - 1; i++) // loop through all but the last row
{
if (A[i, 0] < A[i + 1, 0]) // determine if this row needs to be swapped with the next row
{
swapped = true; // mark swapped
for (int j = 0; j < n; j++) // swap each item in row i with each item in row i+1
{
int tmp = A[i, j];
A[i, j] = A[i + 1, j];
A[i + 1, j] = tmp;
}
}
}
}
while (swapped); // if we swapped anything, we need to make another pass to ensure the array is sorted
We can also do away with the need for n by using .GetUpperBound(dimension), which returns the index of the last element in that dimension, i.e. n - 1 for a zero-based array with n items in that dimension. Because the result is effectively n - 1, I've modified the loop conditions slightly:
bool swapped;
do
{
swapped = false;
for (int i = 0; i < A.GetUpperBound(0); i++)
{
if (A[i, 0] < A[i + 1, 0])
{
swapped = true;
for (int j = 0; j <= A.GetUpperBound(1); j++)
{
int tmp = A[i, j];
A[i, j] = A[i + 1, j];
A[i + 1, j] = tmp;
}
}
}
}
while (swapped);
We can also refer to the "Optimizing bubble sort" section of the Wikipedia page and implement that instead, which avoids re-checking the already-sorted tail of the array on every pass:
int n = A.GetUpperBound(0); // get the initial value of n
do
{
int newn = 0; // default newn to 0, so if no rows are swapped it stays 0 and the loop exits
for (int i = 0; i < n; i++)
{
if (A[i, 0] < A[i + 1, 0])
{
for (int j = 0; j <= A.GetUpperBound(1); j++)
{
int tmp = A[i, j];
A[i, j] = A[i + 1, j];
A[i + 1, j] = tmp;
}
newn = i; // store the current (highest) value of i swapped
}
}
n = newn; // set the value of n to the highest value of i swapped
}
while (n > 0); // loop until n == 0
The logic here (as explained on Wikipedia) is that by the end of the first pass, the last item is in the correct position. By the end of the second pass, the last two items are in the correct position, and so on. More generally, everything after the last swap of a pass is already in place, so the next pass can stop there and visits at least one fewer item. When there are no items left to visit, there is nothing left to swap, and the sort is complete.
You can see this optimised version in this YouTube visualisation.
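Packaged as a self-contained method, the optimized pass above might look like the following sketch. The name `RowSorter.SortRowsDescending` is hypothetical; it uses `GetLength` (the number of items in a dimension) instead of `GetUpperBound` (the last index), which is only a stylistic choice.

```csharp
// Sketch: sorts the rows of a 2-D array in descending order of their first
// column with the "optimized" bubble sort. RowSorter is a hypothetical name.
public static class RowSorter
{
    public static void SortRowsDescending(int[,] a)
    {
        int cols = a.GetLength(1);
        int n = a.GetLength(0) - 1;   // last row index that may still move
        while (n > 0)
        {
            int lastSwap = 0;         // stays 0 if nothing is swapped, ending the loop
            for (int i = 0; i < n; i++)
            {
                if (a[i, 0] < a[i + 1, 0])
                {
                    // Swap the whole row i with row i + 1, column by column.
                    for (int j = 0; j < cols; j++)
                    {
                        int tmp = a[i, j];
                        a[i, j] = a[i + 1, j];
                        a[i + 1, j] = tmp;
                    }
                    lastSwap = i;
                }
            }
            n = lastSwap;             // rows after the last swap are already in place
        }
    }
}
```

Running it on the question's array {1, 2, 3}, {4, 5, 6}, {7, 8, 9} leaves the rows as {7, 8, 9}, {4, 5, 6}, {1, 2, 3}.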
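I think the line
if (A[i, 0] < A[i++, 0])
should read
if (A[i, 0] < A[i + 1, 0])
because i++ evaluates to the old value of i: the original condition compares A[i, 0] with itself, so it is always false, and it also increments the loop variable i as a side effect.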
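A small sketch that isolates the two conditions makes the difference visible; `PostIncrementDemo` and its methods are hypothetical names used only for illustration.

```csharp
// Sketch contrasting the question's condition with the corrected one.
// PostIncrementDemo is a hypothetical name.
public static class PostIncrementDemo
{
    // Same shape as the question's condition. i++ yields the OLD value of i,
    // so both sides read a[i, 0]: the result is always false, and i is
    // incremented as a side effect.
    public static bool BuggyCompare(int[,] a, ref int i)
    {
        return a[i, 0] < a[i++, 0];
    }

    // The intended comparison: this row against the next one, with no side effects.
    public static bool FixedCompare(int[,] a, int i)
    {
        return a[i, 0] < a[i + 1, 0];
    }
}
```

With the question's array and i = 0, `BuggyCompare` returns false and leaves i at 1, while `FixedCompare(A, 0)` returns true because 1 < 4.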

For loop Hash character pattern

Hi, I am new to programming and I currently have this code:
namespace Patterns
{
class Program
{
static void Main(string[] args)
{
for (int i = 1; i <= 4; i++)//'rows'
{
for (int h = 1; h <= 9 - (i*2)+1; h++)
{
Console.Write("#");
}
Console.WriteLine("\n" );
}
}
}
}
This produces this output:
########
######
####
##
The number of hashes is correct, as I am going from 8, 6, 4, 2, but I need to add an extra leading space every time I go onto a new line. How do I make the output look like this?
########
 ######
  ####
   ##
Thanks,
Umer
Starting from your code, you could add a loop that writes the leading spaces, just before the loop that writes the hashes:
for (int j = 0; j < i - 1; j++) {
Console.Write(" ");
}
for (int h = 1; h <= 9 - (i*2)+1; h++) {
Console.Write("#");
}
Console.WriteLine("\n" );
As a side note, you should probably build the output in a StringBuilder and write it once, as I believe calling Console.Write for every single character is quite inefficient.
The code could be modified further:
// requires: using System.Text;
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= 4; i++) {
for (int j = 0; j < i - 1; j++) {
sb.Append(" ");
}
for (int h = 1; h <= 9 - (i*2)+1; h++) {
sb.Append("#");
}
sb.Append("\n");
}
Console.WriteLine(sb.ToString());
Introduce variables, start your rows at 0, and use new String(' ', i) to create a string of i spaces for each row.
The same constructor can also create the string of hashes:
static void Main(string[] args)
{
int rows = 4;
int columns = 9;
for (int i = 0; i < rows; i++)
{
// Print a string with `i` spaces.
Console.Write(new String(' ', i));
int hashes = columns - ((i + 1) * 2) + 1;
Console.Write(new String('#', hashes));
Console.WriteLine();
}
}
Basically, just add spaces in front of your hash characters:
########   Row 1 (i=1), 0 spaces
 ######    Row 2 (i=2), 1 space
  ####     Row 3 (i=3), 2 spaces
   ##      Row 4 (i=4), 3 spaces
In this case, you need i - 1 spaces for each row. (In general it is (8 - character count) / 2, and the character count was 9 - (i*2) + 1, so (8 - 9 + i*2 - 1) / 2 = (i*2 - 2) / 2 = i - 1.)
So just add a loop that writes the spaces before printing the hash characters.
namespace Patterns
{
class Program
{
static void Main(string[] args)
{
for (int i = 1; i <= 4; i++)//'rows'
{
for (int j = 0; j < i -1; j++) {
Console.Write(" ");
}
for (int h = 1; h <= 9 - (i*2)+1; h++)
{
Console.Write("#");
}
Console.WriteLine("\n" );
}
}
}
}
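The same rule generalizes to any number of rows: row r (counting from 0) gets r leading spaces and 2 * (rows - r) hashes, which for rows = 4 gives 8, 6, 4, 2 as in the question. A minimal sketch, assuming a hypothetical `HashPattern.Build` helper that returns the whole pattern as a string (using `StringBuilder.Append(char, int)` to repeat a character) so it can be written with a single `Console.Write`:

```csharp
using System.Text;

// Sketch: builds the whole pattern in memory. HashPattern is a hypothetical name.
public static class HashPattern
{
    public static string Build(int rows)
    {
        var sb = new StringBuilder();
        for (int r = 0; r < rows; r++)
        {
            sb.Append(' ', r);               // r leading spaces
            sb.Append('#', 2 * (rows - r));  // 8, 6, 4, 2 when rows == 4
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

`Console.Write(HashPattern.Build(4));` prints the four rows shown in the question.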
You could do something like this, which writes the spaces for the next row right after each line break:
for (int i = 1; i <= 4; i++)//'rows'
{
for (int h = 1; h <= 9 - (i*2)+1; h++)
{
Console.Write("#");
}
Console.WriteLine("\n" );
for (int y = i; y > 0; y--)
{
Console.Write(" ");
}
}

Prefer jagged arrays over multidimensional [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Visual Studio's code analysis reports this performance warning: "Prefer jagged arrays over multidimensional" (CA1814).
The line it flags is the one commented // matrix.
How can I do this in my code?
public static int LevenshteinDistance(string s, string t)
{
int n = s.Length; //length of s
int m = t.Length; //length of t
int[,] d = new int[n + 1, m + 1]; // matrix
int cost; // cost
// Step 1
if (n == 0) return m;
if (m == 0) return n;
// Step 2
for (int i = 0; i <= n; d[i, 0] = i++) ;
for (int j = 0; j <= m; d[0, j] = j++) ;
// Step 3
for (int i = 1; i <= n; i++)
{
//Step 4
for (int j = 1; j <= m; j++)
{
// Step 5
cost = (t.Substring(j - 1, 1) == s.Substring(i - 1, 1) ? 0 : 1);
// Step 6
d[i, j] = System.Math.Min(System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
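Since the warning asks specifically for a jagged array, a direct translation of the code above to `int[][]` could look like the following sketch. The class name `JaggedLevenshtein` is hypothetical; apart from the array type, it compares characters directly instead of one-character `Substring` results.

```csharp
using System;

// Sketch: the same algorithm with a jagged array (an array of row arrays).
// JaggedLevenshtein is a hypothetical name.
public static class JaggedLevenshtein
{
    public static int Distance(string s, string t)
    {
        int n = s.Length, m = t.Length;
        if (n == 0) return m;
        if (m == 0) return n;

        var d = new int[n + 1][];
        for (int i = 0; i <= n; i++)
        {
            d[i] = new int[m + 1];   // allocate each row separately
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) d[0][j] = j;

        for (int i = 1; i <= n; i++)
        {
            int[] row = d[i], above = d[i - 1];   // hoist the row lookups out of the inner loop
            for (int j = 1; j <= m; j++)
            {
                int cost = t[j - 1] == s[i - 1] ? 0 : 1;   // char comparison, no Substring
                row[j] = Math.Min(Math.Min(above[j] + 1, row[j - 1] + 1), above[j - 1] + cost);
            }
        }
        return d[n][m];
    }
}
```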
Here's a version that uses only a single-dimensional array.
public static int LevenshteinDistance(string s, string t)
{
int n = s.Length; //length of s
int m = t.Length; //length of t
int stride = m+1;
int[] d = new int[(n + 1)*stride];
// note, d[i*m + i + j] holds (i,j)
int cost;
// Step 1
if (n == 0) return m;
if (m == 0) return n;
// Step 2, adjusted to skip (0,0)
for (int i = 0, k = stride; k < d.Length; k += stride) d[k] = ++i;
for (int j = 1; j < stride; ++j) d[j] = j;
// Step 3
int newrow = stride * 2;
char si = s[0];
for (int i=0, j=0, k = stride + 1; k < d.Length; ++k)
{
// don't overwrite d[i,0]
if (k == newrow) {
newrow += stride;
j=0;
si=s[++i];
continue;
}
// Step 5
cost = (t[j++] == si ? 0 : 1); // read this column's char of t, then advance j to the next column
// Step 6
d[k] = System.Math.Min(System.Math.Min(
d[k-stride] + 1, /* up one row */
d[k-1] + 1 /* left one */ ),
d[k-stride-1] + cost /* diagonal */ );
}
// Step 7
return d[d.Length-1];
}
This should improve performance in three ways:
No string comparison and no one-character string garbage for the GC to clean up
Changed memory layout to match iteration order, improving cache behavior
Used a single-dimensional array and optimizer-friendly idioms, which should reduce bounds checking
However, I'm pretty sure that applying mike z's suggestion of using two vectors will make for even clearer code.
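A minimal sketch of the two-vector idea, assuming it means keeping only the previous and current rows of the matrix (the suggestion itself is not reproduced here); `TwoRowLevenshtein` is a hypothetical name.

```csharp
using System;

// Sketch: only two rows of the matrix are kept, so memory is O(m).
// TwoRowLevenshtein is a hypothetical name.
public static class TwoRowLevenshtein
{
    public static int Distance(string s, string t)
    {
        var prev = new int[t.Length + 1];
        var curr = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) prev[j] = j;   // row 0

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i;                                     // column 0
            char si = s[i - 1];
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = t[j - 1] == si ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1,     // up: deletion
                                            curr[j - 1] + 1), // left: insertion
                                   prev[j - 1] + cost);       // diagonal: substitution or match
            }
            // Swap the references; the old prev row is overwritten on the next pass.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.Length];   // after the final swap, prev holds the last row
    }
}
```

Memory drops from O(n * m) to O(m), and no multidimensional array is needed at all.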

C# LevenshteinDistance algorithm for spellchecker

Hi, I'm using the Levenshtein algorithm to calculate the difference between two strings, using the code below. It currently provides the total number of changes needed to get from 'answer' to 'target', but I'd like to split these up into the types of errors being made, classifying each error as a deletion, substitution, or insertion.
I've tried adding a simple count, but I'm new at this and don't really understand how the code works, so I'm not sure how to go about it.
static class LevenshteinDistance
{
/// <summary>
/// Compute the distance between two strings.
/// </summary>
public static int Compute(string s, string t)
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
// Step 1
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
// Step 2
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
// Step 3
for (int i = 1; i <= n; i++)
{
//Step 4
for (int j = 1; j <= m; j++)
{
// Step 5
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
// Step 6
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
Thanks in advance.
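One possible approach (not from the original thread) is to keep the full matrix and then walk back from d[n, m] to d[0, 0], at each step moving to a neighbour that could have produced the current value: a diagonal step is a match or a substitution, a step up deletes a character of s, and a step left inserts a character of t. The sketch below assumes a hypothetical `EditBreakdown.Classify` method that returns a C# 7 value tuple. When several optimal alignments exist it prefers the diagonal step, so the breakdown it reports is one valid choice among possibly several.

```csharp
using System;

// Sketch: counts each edit type along one optimal alignment.
// EditBreakdown is a hypothetical name.
public static class EditBreakdown
{
    public static (int Substitutions, int Insertions, int Deletions) Classify(string s, string t)
    {
        int n = s.Length, m = t.Length;
        var d = new int[n + 1, m + 1];
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 0; j <= m; j++) d[0, j] = j;
        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
            }
        }

        // Walk back from the bottom-right corner to the top-left corner.
        int subs = 0, ins = 0, dels = 0;
        int x = n, y = m;
        while (x > 0 || y > 0)
        {
            if (x > 0 && y > 0 && d[x, y] == d[x - 1, y - 1] + (s[x - 1] == t[y - 1] ? 0 : 1))
            {
                if (s[x - 1] != t[y - 1]) subs++; // diagonal with cost 1: substitution
                x--; y--;                          // (cost 0 is a plain match)
            }
            else if (x > 0 && d[x, y] == d[x - 1, y] + 1)
            {
                dels++; // step up: s[x - 1] is deleted
                x--;
            }
            else
            {
                ins++;  // step left: t[y - 1] is inserted
                y--;
            }
        }
        return (subs, ins, dels);
    }
}
```

For example, `EditBreakdown.Classify("kitten", "sitting")` reports 2 substitutions, 1 insertion, and 0 deletions, which add up to the distance of 3.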

Damerau - Levenshtein Distance, adding a threshold

I have the following implementation, but I want to add a threshold, so if the result is going to be greater than it, just stop calculating and return.
How would I go about that?
EDIT: Here is my current code. The threshold parameter is not used yet; the goal is to use it.
public static int DamerauLevenshteinDistance(string string1, string string2, int threshold)
{
// Return trivial case - where they are equal
if (String.Equals(string1, string2)) // static Equals does not throw when string1 is null
return 0;
// Return trivial case - where one is empty
if (String.IsNullOrEmpty(string1) || String.IsNullOrEmpty(string2))
return (string1 ?? "").Length + (string2 ?? "").Length;
// Ensure string2 (inner cycle) is longer
if (string1.Length > string2.Length)
{
var tmp = string1;
string1 = string2;
string2 = tmp;
}
// Return trivial case - where string1 is contained within string2
if (string2.Contains(string1))
return string2.Length - string1.Length;
var length1 = string1.Length;
var length2 = string2.Length;
var d = new int[length1 + 1, length2 + 1];
for (var i = 0; i <= d.GetUpperBound(0); i++)
d[i, 0] = i;
for (var i = 0; i <= d.GetUpperBound(1); i++)
d[0, i] = i;
for (var i = 1; i <= d.GetUpperBound(0); i++)
{
for (var j = 1; j <= d.GetUpperBound(1); j++)
{
var cost = string1[i - 1] == string2[j - 1] ? 0 : 1;
var del = d[i - 1, j] + 1;
var ins = d[i, j - 1] + 1;
var sub = d[i - 1, j - 1] + cost;
d[i, j] = Math.Min(del, Math.Min(ins, sub));
if (i > 1 && j > 1 && string1[i - 1] == string2[j - 2] && string1[i - 2] == string2[j - 1])
d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + cost);
}
}
return d[d.GetUpperBound(0), d.GetUpperBound(1)];
}
This is regarding your answer to "Damerau - Levenshtein Distance, adding a threshold".
(Sorry, I can't comment because I don't have 50 reputation yet.)
I think you have made an error here. You initialized:
var minDistance = threshold;
And your update rule is:
if (d[i, j] < minDistance)
minDistance = d[i, j];
Also, your early-exit criterion is:
if (minDistance > threshold)
return int.MaxValue;
Now observe that the if condition above can never be true, because minDistance starts at threshold and only ever decreases. You should initialize minDistance to int.MaxValue instead.
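The row minimum is the right quantity to check because, in plain Levenshtein, each cell in row i either copies a value from row i - 1 (a diagonal match) or adds 1 to a neighbour, so by induction along the row no cell is smaller than the minimum of row i - 1. Row minima therefore never decrease, and once a whole row exceeds the threshold the final distance must too. Checking a single cell is not enough: a cell far from the diagonal can be large even when the strings are close; for "abcdefg" vs "abcdefh", d[7, 1] is 6 but the distance is 1. A minimal sketch of the row-minimum check, with transpositions omitted for brevity and the hypothetical name `BoundedLevenshtein`:

```csharp
using System;

// Sketch: plain Levenshtein (no transpositions) with a row-minimum early exit.
// BoundedLevenshtein is a hypothetical name. Like the final answer in this
// thread, it returns int.MaxValue when the distance exceeds threshold.
public static class BoundedLevenshtein
{
    public static int Distance(string a, string b, int threshold)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            int rowMin = curr[0]; // start from a real cell, never from threshold
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
                if (curr[j] < rowMin) rowMin = curr[j];
            }
            // No cell in a later row can be smaller than rowMin, so the
            // final distance cannot drop back under the threshold.
            if (rowMin > threshold) return int.MaxValue;

            var tmp = prev; prev = curr; curr = tmp; // current row becomes the previous row
        }
        int result = prev[b.Length];
        return result > threshold ? int.MaxValue : result;
    }
}
```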
Here's the most elegant way I can think of. After setting each element of d, check whether it exceeds your threshold. The check is constant-time, so it's a drop in the bucket compared to the O(N*M) complexity of the overall algorithm:
public static int DamerauLevenshteinDistance(string string1, string string2, int threshold)
{
...
for (var i = 1; i <= d.GetUpperBound(0); i++)
{
for (var j = 1; j <= d.GetUpperBound(1); j++)
{
...
var temp = d[i,j] = Math.Min(del, Math.Min(ins, sub));
if (i > 1 && j > 1 && string1[i - 1] == string2[j - 2] && string1[i - 2] == string2[j - 1])
temp = d[i,j] = Math.Min(temp, d[i - 2, j - 2] + cost);
//Does this value exceed your threshold? if so, get out now
if(temp > threshold)
return temp;
}
}
return d[d.GetUpperBound(0), d.GetUpperBound(1)];
}
You also asked this as a SQL CLR UDF question, so I'll answer in that specific context: your best optimization won't come from optimizing the Levenshtein distance, but from reducing the number of pairs you compare. Yes, a faster Levenshtein algorithm will improve things, but not nearly as much as reducing the number of comparisons from N squared (with N in the millions of rows) to N times some factor. My proposal is to compare only elements whose length difference is within a tolerable delta. On your big table, add a persisted computed column on LEN(Data) and then create an index on it that includes Data:
ALTER TABLE Table ADD LenData AS LEN(Data) PERSISTED;
CREATE INDEX ndxTableLenData on Table(LenData) INCLUDE (Data);
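Now you can restrict the problem space by joining only rows within a maximum length difference (say 5), which pays off if your data's LEN(Data) varies significantly. No pair within that distance is lost, because the Levenshtein distance between two strings is at least the difference in their lengths: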
SELECT a.Data, b.Data, dbo.Levenshtein(a.Data, b.Data)
FROM Table A
JOIN Table B ON B.LenData BETWEEN A.LenData - 5 AND A.LenData + 5
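The same pruning can be sketched on the client side in C#. This is an illustrative assumption, not part of the SQL answer: sort the strings by length and pair each one only with the following strings that are at most `maxDelta` characters longer. Unlike the self-join, it also skips comparing a row with itself and emits each unordered pair only once. `LengthPrefilter.CandidatePairs` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch: candidate pairs whose lengths differ by at most maxDelta.
// LengthPrefilter is a hypothetical name.
public static class LengthPrefilter
{
    public static List<(string, string)> CandidatePairs(IEnumerable<string> items, int maxDelta)
    {
        // Sort once by length; OrderBy is stable, so equal lengths keep input order.
        List<string> sorted = items.OrderBy(s => s.Length).ToList();
        var pairs = new List<(string, string)>();
        for (int i = 0; i < sorted.Count; i++)
        {
            // Later strings are never shorter, so stop at the first one that is too long.
            for (int j = i + 1; j < sorted.Count && sorted[j].Length - sorted[i].Length <= maxDelta; j++)
            {
                pairs.Add((sorted[i], sorted[j]));
            }
        }
        return pairs;
    }
}
```

Only the returned pairs then need a Levenshtein call.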
Finally got it, though it's not as beneficial as I had hoped:
public static int DamerauLevenshteinDistance(string string1, string string2, int threshold)
{
// Return trivial case - where they are equal
if (String.Equals(string1, string2)) // static Equals does not throw when string1 is null
return 0;
// Return trivial case - where one is empty
if (String.IsNullOrEmpty(string1) || String.IsNullOrEmpty(string2))
return (string1 ?? "").Length + (string2 ?? "").Length;
// Ensure string2 (inner cycle) is longer
if (string1.Length > string2.Length)
{
var tmp = string1;
string1 = string2;
string2 = tmp;
}
// Return trivial case - where string1 is contained within string2
if (string2.Contains(string1))
return string2.Length - string1.Length;
var length1 = string1.Length;
var length2 = string2.Length;
var d = new int[length1 + 1, length2 + 1];
for (var i = 0; i <= d.GetUpperBound(0); i++)
d[i, 0] = i;
for (var i = 0; i <= d.GetUpperBound(1); i++)
d[0, i] = i;
for (var i = 1; i <= d.GetUpperBound(0); i++)
{
var im1 = i - 1;
var im2 = i - 2;
var minDistance = threshold;
for (var j = 1; j <= d.GetUpperBound(1); j++)
{
var jm1 = j - 1;
var jm2 = j - 2;
var cost = string1[im1] == string2[jm1] ? 0 : 1;
var del = d[im1, j] + 1;
var ins = d[i, jm1] + 1;
var sub = d[im1, jm1] + cost;
//Inline comparison used instead of nested Math.Min calls, which the author considered slower
//d[i, j] = Math.Min(del, Math.Min(ins, sub));
d[i, j] = del <= ins && del <= sub ? del : ins <= sub ? ins : sub;
if (i > 1 && j > 1 && string1[im1] == string2[jm2] && string1[im2] == string2[jm1])
d[i, j] = Math.Min(d[i, j], d[im2, jm2] + cost);
if (d[i, j] < minDistance)
minDistance = d[i, j];
}
if (minDistance > threshold)
return int.MaxValue;
}
return d[d.GetUpperBound(0), d.GetUpperBound(1)] > threshold
? int.MaxValue
: d[d.GetUpperBound(0), d.GetUpperBound(1)];
}
