Is this Levenshtein Distance algorithm correct?

I have written the algorithm below to compute the Levenshtein distance, and it seems to return the correct results based on my tests. The time complexity is O(n+m), and the space complexity is O(1).
All the existing algorithms I've seen for this have space complexity O(n*m), as they create a matrix. Is there something wrong with my algorithm?
public static int ComputeLevenshteinDistance(string word1, string word2)
{
    var index1 = 0;
    var index2 = 0;
    var numDeletions = 0;
    var numInsertions = 0;
    var numSubs = 0;
    while (index1 < word1.Length || index2 < word2.Length)
    {
        if (index1 == word1.Length)
        {
            // Insert word2[index2]
            numInsertions++;
            index2++;
        }
        else if (index2 == word2.Length)
        {
            // Delete word1[index1]
            numDeletions++;
            index1++;
        }
        else if (word1[index1] == word2[index2])
        {
            // No change as word1[index1] == word2[index2]
            index1++;
            index2++;
        }
        else if (index1 < word1.Length - 1 && word1[index1 + 1] == word2[index2])
        {
            // Delete word1[index1]
            numDeletions++;
            index1++;
        }
        else if (index2 < word2.Length - 1 && word1[index1] == word2[index2 + 1])
        {
            // Insert word2[index2]
            numInsertions++;
            index2++;
        }
        else
        {
            // Substitute word1[index1] for word2[index2]
            numSubs++;
            index1++;
            index2++;
        }
    }
    return numDeletions + numInsertions + numSubs;
}

This was a comment, but I feel it is probably suitable as an answer:
The short answer is "no", if you want the true shortest distance for any given inputs.
The reason your code appears more efficient (and the reason that other implementations create a matrix instead of doing what you're doing) is that your stepwise implementation ignores a lot of potential solutions: at each position it commits to one edit after looking only one character ahead, and it never revisits that choice.
The examples @BenVoigt gave illustrate this. Another, perhaps clearer, illustration is ("aaaardvark", "aardvark"), which returns 8 but should be 2: the code gets tripped up because it matches the first a and thinks it can move on, when in fact the optimal solution is to treat the first two characters as deletions.
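
For comparison, here is a minimal sketch of the usual dynamic-programming approach, which considers every alignment instead of committing greedily. It is not the asker's code; the class and method names (Levenshtein, Distance) are made up for illustration. It keeps only two rows of the matrix, so the space is O(m) rather than O(n*m), while the time stays O(n*m).

```csharp
using System;

public static class Levenshtein
{
    // Edit distance via dynamic programming, keeping only two rows
    // of the (n+1) x (m+1) matrix in memory at any time.
    public static int Distance(string source, string target)
    {
        var previous = new int[target.Length + 1];
        var current = new int[target.Length + 1];

        // Turning an empty prefix of source into target[0..j) takes j insertions.
        for (int j = 0; j <= target.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= source.Length; i++)
        {
            // Turning source[0..i) into an empty string takes i deletions.
            current[0] = i;
            for (int j = 1; j <= target.Length; j++)
            {
                int substitution = previous[j - 1] + (source[i - 1] == target[j - 1] ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }

            // The row just computed becomes the "previous" row for the next i.
            var swap = previous;
            previous = current;
            current = swap;
        }

        return previous[target.Length];
    }
}
```

With this sketch, Levenshtein.Distance("aaaardvark", "aardvark") evaluates to 2, the value the answer above says is correct.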

Related

Ask someone's help to explain the performance issue of C# code

I am a C# developer and I am practicing my coding and algorithm skills on LeetCode.
I am currently working on problem 5: longest palindromic substring.
I would like someone to explain a performance difference I ran into.
Version 1 of my solution to this problem was:
public string LongestPalindrome(string s)
{
    // Step 0: Handles invalid or special cases.
    if (string.IsNullOrWhiteSpace(s) ||
        s.Length == 1 ||
        s.Distinct().Count() == 1 ||
        s.Reverse().SequenceEqual(s))
    {
        return s;
    }
    if (s.Length == 2)
    {
        return s.First().Equals(s.Last()) ? s : s.First().ToString();
    }
    if (s.Distinct().Count() == s.Length)
    {
        return s.First().ToString();
    }
    // Step 1: Handles normal cases.
    var longestPalindromeSubstring = string.Empty;
    for (var index = 0; index < s.Length && s.Length - index > longestPalindromeSubstring.Length; index++)
    {
        var currentChar = s[index];
        var currentCharLastIndex = s.LastIndexOf(currentChar);
        if (index == currentCharLastIndex)
        {
            if (!string.IsNullOrWhiteSpace(longestPalindromeSubstring) ||
                longestPalindromeSubstring.Length > 1)
            {
                continue;
            }
            longestPalindromeSubstring = currentChar.ToString();
        }
        var currentCharIndexes = new Stack<int>();
        for (var nextIndex = index + 1; nextIndex <= currentCharLastIndex; nextIndex++)
        {
            if (s[nextIndex] == currentChar)
            {
                currentCharIndexes.Push(nextIndex);
            }
        }
        while (currentCharIndexes.Any())
        {
            var relatedIndex = currentCharIndexes.Pop();
            var possibleStr = s.Substring(index, relatedIndex - index + 1);
            var reversedPossibleStr = new string(possibleStr.Reverse().ToArray());
            if (!possibleStr.Equals(reversedPossibleStr) ||
                possibleStr.Length < longestPalindromeSubstring.Length ||
                possibleStr.Equals(longestPalindromeSubstring))
            {
                continue;
            }
            longestPalindromeSubstring = possibleStr;
        }
    }
    return longestPalindromeSubstring;
}
However, the solution above failed LeetCode validation with the error: Time Limit Exceeded.
Then I made one small update, and version 2 passed. The only change was adding a ToCharArray() call before invoking Reverse():
var reversedPossibleStr = new string(possibleStr.ToCharArray().Reverse().ToArray());
if (!possibleStr.Equals(reversedPossibleStr) ||
possibleStr.Length < longestPalindromeSubstring.Length ||
possibleStr.Equals(longestPalindromeSubstring))
{
continue;
}
…………
But I am not sure why this works. My guess is that the data in an array is laid out in contiguous memory, which may help performance. Could someone explain in more detail?
Thank you in advance.

The Reverse method uses EnumerableHelpers.ToArray to fetch the count of the input enumerable. If the enumerable doesn't implement the ICollection<T> interface, it falls back to a list-like approach that builds the array incrementally, growing and copying it many times along the way. Unfortunately, string doesn't implement ICollection<char>, even though it knows how many characters it contains, so string.Reverse() is slower than string.ToCharArray().Reverse().
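
As a small sketch of how to sidestep the LINQ enumeration entirely (the class and method names here are hypothetical, not from the question): Array.Reverse works in place on the char[] returned by ToCharArray(), and a palindrome check does not need a reversed copy at all.

```csharp
using System;

public static class StringReversal
{
    // Reverses a string by copying it to a char[] once and reversing in place.
    public static string ReverseString(string text)
    {
        char[] chars = text.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Checks for a palindrome by comparing characters from both ends,
    // without allocating a reversed copy.
    public static bool IsPalindrome(string text)
    {
        for (int left = 0, right = text.Length - 1; left < right; left++, right--)
        {
            if (text[left] != text[right])
                return false;
        }
        return true;
    }
}
```

In the question's inner loop, a check like IsPalindrome(possibleStr) could stand in for building reversedPossibleStr and comparing it, assuming only the yes/no result is needed there.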

HackerRank Climbing the Leaderboard

This question has to do with this challenge on HackerRank. It seems to be failing some cases, but I'm not clear what's wrong with the algorithm (many people seem to have problem with timeouts, that's not an issue here, everything runs plenty fast, and all the cases that are visible to me pass, so I don't have a specific case that's failing).
The essential outline of how the algorithm works is as follows:
First be sure that Alice isn't already winning over the existing highest score (degenerate case), if she is just tell the world she's #1 from start to finish. Otherwise, at least one score on the leaderboard beats Alice's first try.
Start by walking down the scores list from the highest until we find a place where Alice fits in and record the scores that beat Alice's initial score along the way.
If we reach the end of the scores list before finding a place for Alice's bottom score, pretend there is a score at the bottom of the list which matches Alice's first score (this is just convenient for the main loop and reduces the problem to one where Alice's first score is on the list somewhere)
At this point we have a (sorted) array of scores with their associated ranks, rankAry[r - 1] is the minimum score needed for Alice to attain rank r as of the end of the if clause following the first while loop.
From there, the main algorithm takes over where we walk through Alice's scores and note her rank as we go by comparing against the benchmarks from the scores array that we setup as rankAry earlier. curRank is our candidate rank at each stage which we've definitely achieved by the time this loop starts (by construction).
If we're at rank 1 we will be forever more, so just populate the current rank as 1 and move on.
If we're currently tied with or beating the current benchmark and that's not the end of the line, keep peeking at the next one and if we're also beating that next one, decrease the current benchmark location and iterate
Once this terminates, we've found the one we're going to supplant and we cannot supplant anything further, so assign this rank to this score and repeat until done
As far as I can tell this handles all cases correctly, even if Alice has repeated values or increases between the benchmarks from scores, we should stay at the same rank until we hit the new benchmarks, but the site feedback indicates there must be a bug somewhere.
All the other approaches I've been able to find seem to be some variation on doing a binary search to find the score each time, but I prefer not having to constantly search each time and just use the auxiliary space, so I'm a little stumped on what could be off.
static int[] climbingLeaderboard(int[] scores, int[] alice) {
    int[] res = new int[alice.Length];
    if (scores.Length == 0 || alice[0] >= scores[0]) { //degenerate cases
        for (int i = 0; i < alice.Length; ++i) {
            res[i] = 1;
        }
        return res;
    }
    int[] rankAry = new int[scores.Length + 1];
    rankAry[0] = scores[0]; //top score rank
    int curPos = 1; //start at the front and move down
    int curRank = 1; //initialize
    //initialize from the front. This way we can figure out ranks as we go
    while (curPos < scores.Length && scores[curPos] > alice[0]) {
        if (scores[curPos] < scores[curPos-1]) {
            rankAry[curRank] = scores[curPos]; //update the rank break point
            curRank++; //moved down in rank
        }
        curPos++; //move down the array
    }
    if (curPos == scores.Length) { //smallest score still bigger than Alice's first
        rankAry[curRank] = alice[0]; //pretend there was a virtual value at the end
        curRank++; //give rank Alice will have for first score when we get there
    }
    for (int i = 0; i < alice.Length; ++i) {
        if (curRank == 1) { //if we're at the top, we're going to stay there
            res[i] = 1;
            continue;
        }
        //Non-degenerate cases
        while (alice[i] >= rankAry[curRank - 1]) {
            if (curRank == 1 || alice[i] < rankAry[curRank - 2]) {
                break;
            }
            curRank--;
        }
        res[i] = curRank;
    }
    return res;
}

You have a couple of bugs in your algorithm.
Wrong mapping
Your rankAry must map a rank (your index) to a score. However, with this line rankAry[0] = scores[0];, the highest score is mapped to 0, but the highest possible rank is 1 and not 0. So, change that to:
rankAry[1] = scores[0];
Wrong initial rank
For some reason, your curRank is set to 1 as below:
int curRank = 1; //initialize
However, that is wrong: by this point alice[0] must be less than scores[0], because the following block at the beginning of your method has already handled the other case:
if (scores.Length == 0 || alice[0] >= scores[0]) { //degenerate cases
for (int i = 0; i < alice.Length; ++i) {
res[i] = 1;
}
return res;
}
So, at best your curRank is 2. Hence, change it to:
int curRank = 2;
Then you can also remove the curRank++ from the following block, since curRank already has the correct value at that point:
if (curPos == scores.Length) { //smallest score still bigger than Alice's first
    rankAry[curRank] = alice[0]; //pretend there was a virtual value at the end
    curRank++; // no longer needed, so remove it
}
Improve "Non-degenerate cases" handling
Your break condition should consider rankAry at curRank - 1 and not curRank - 2 as it's enough to check the adjacent rank value. Also, a value at curRank - 2 will produce wrong results for some input but I won't explain for which cases specifically - I'll leave it up to you to find out.
Fixed Code
So, I fixed your method according to my comments above, and it passed all the tests. Here it is:
static int[] climbingLeaderboard(int[] scores, int[] alice) {
    int[] res = new int[alice.Length];
    if (scores.Length == 0 || alice[0] >= scores[0]) { //degenerate cases
        for (int i = 0; i < alice.Length; ++i) {
            res[i] = 1;
        }
        return res;
    }
    //+2: index 0 is unused (ranks start at 1), and when all scores are distinct
    //the virtual bottom entry below lands at index scores.Length + 1
    int[] rankAry = new int[scores.Length + 2];
    rankAry[1] = scores[0]; //top score rank
    int curPos = 1; //start at the front and move down
    int curRank = 2; //initialize
    //initialize from the front. This way we can figure out ranks as we go
    while (curPos < scores.Length && scores[curPos] > alice[0]) {
        if (scores[curPos] < scores[curPos-1]) {
            rankAry[curRank] = scores[curPos]; //update the rank break point
            curRank++; //moved down in rank
        }
        curPos++; //move down the array
    }
    if (curPos == scores.Length) { //smallest score still bigger than Alice's first
        rankAry[curRank] = alice[0]; //pretend there was a virtual value at the end
    }
    for (int i = 0; i < alice.Length; ++i) {
        if (curRank == 1) { //if we're at the top, we're going to stay there
            res[i] = 1;
            continue;
        }
        //Non-degenerate cases
        while (alice[i] >= rankAry[curRank - 1]) {
            if (curRank == 1 || alice[i] < rankAry[curRank - 1]) {
                break;
            }
            curRank--;
        }
        res[i] = curRank;
    }
    return res;
}
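
For reference, the same single-pass idea can be written more compactly. The following is only a sketch, not the code that was submitted: the names Leaderboard and Climb are made up, and it assumes (as I understand the challenge statement) that scores is sorted in descending order and alice in ascending order. It first collapses scores into distinct values, then moves one pointer up that list as Alice's scores grow.

```csharp
using System;
using System.Collections.Generic;

public static class Leaderboard
{
    // Assumes scores is sorted descending and alice ascending.
    public static int[] Climb(int[] scores, int[] alice)
    {
        // Dense ranking: equal scores share a rank, so keep each value once.
        var distinct = new List<int>();
        foreach (int score in scores)
        {
            if (distinct.Count == 0 || distinct[distinct.Count - 1] != score)
                distinct.Add(score);
        }

        var result = new int[alice.Length];
        // pos is the index of the lowest distinct score Alice has not yet tied or beaten.
        int pos = distinct.Count - 1;
        for (int i = 0; i < alice.Length; i++)
        {
            while (pos >= 0 && alice[i] >= distinct[pos])
                pos--;
            // pos + 1 distinct scores are strictly above Alice, so her rank is pos + 2.
            result[i] = pos + 2;
        }
        return result;
    }
}
```

Because pos only ever moves toward the top of the list, the whole run is linear in scores.Length + alice.Length, with no per-score search.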

Find lowest period of a string (given array of string) to create a substring and return an array of integers

So I was looking through interview questions that unfortunately didn't have any solutions, and one of them seemed interesting (mostly because I had no idea how to do it). However, I have now hit a roadblock and am not sure how to proceed. The objective of the program is to take any number of strings and return an integer array containing, for each string, the lowest period of repeated characters that forms the substring.
Ex: "abbabbabba","abcdabcddabcdabc","aeiouuaeioouaei" would be 5, 8, 5.
I kind of looked at it as doing Indian runs. The first character runs to the front; if it equals the next character, then continue running, etc.
This is the code I have compiled, and I also tried to take a look at the algorithm for finding periods (http://monge.univ-mlv.fr/~mac/REC/text-algorithms.pdf#page=345). I just can't seem to pinpoint the problem in the code:
namespace Interview36{
    public class PeriodicStrings
    {
        public static int[] process(string[] input)
        {
            if (input == null || input.Length == 0 || input[0].Length == 0)
            {
                Console.WriteLine("Result is null");
            }
            int length = 0;
            int[] list = new int[input.Length];
            for (int i = 0; i < input.Length; i++) {
                while (i < input[i].Length)
                {
                    if (input[i] == input[length])
                    {
                        length++;
                        list[i] = length;
                        i++;
                    }
                    else
                    {
                        if (length != 0)
                        {
                            length = list[length - 1];
                        }
                        else
                        {
                            list[i] = 0;
                            i++;
                        }
                    }
                }
            }
            return list;
        }
    }
}
My best guess is that my logic is WAY off, or the loop isn't running through properly and I'm just not catching it in debug. Thank you in advance for any insight!

Which corner case unit test would this fail?

I tried the Fish problem on Codility and I secured 75% marks for correctness because the results reported that my code failed one simple test case. The results do not report what input was provided for the test case.
Could you please help me find out what is wrong with my code and what corner case it would fail?
using System;
public class Solution
{
    // Time complexity: O(N)
    // Space complexity: O(N)
    public int solution(int[] sizes, int[] direction)
    {
        if (sizes == null || direction == null)
            throw new ArgumentNullException();
        var sizesLen = sizes.Length;
        var directionLen = direction.Length;
        if (sizesLen != direction.Length)
            throw new ArgumentException();
        var len = sizesLen;
        if (len <= 1) return len;
        var survivors = new Fish[len];
        survivors[0] = new Fish(sizes[0], direction[0]);
        var curr = 0;
        for (int i = 1; i < len; i++)
        {
            var fish = new Fish(sizes[i], direction[i]);
            if (survivors[curr].Direction == 1 && fish.Direction == 0)
            {
                if (fish.Size < survivors[curr].Size) continue;
                while(curr >= 0 &&
                      fish.Size > survivors[curr].Size &&
                      survivors[curr].Direction == 1)
                {
                    curr--;
                }
            }
            survivors[++curr] = fish;
        }
        return ++curr;
    }
}
public class Fish
{
    public Fish(int size, int direction)
    {
        Size = size;
        Direction = direction;
    }
    public int Size { get; set; }
    public int Direction { get; set; }
}

As mentioned in your code, your Solution is O(M*N). As stated in the problem link, the code should run in linear time. Hence, I will not correct your solution as it will eventually fail on bigger test cases. I will provide you a linear algorithm that you can easily implement.
Keep a stack S, empty initially.
Iterate over the array A, with i from 0 to n-1.
When you encounter an element, say A[i], do the following:
If the stack S is empty, push the pair (A[i], B[i]).
Otherwise, look at the top pair of the stack S and compare B[top] with B[i].
While B[top] is 1 and B[i] is 0, one of the fish will eat the other. Pop the top element from S and compare A[top] with A[i]; whichever is bigger stays alive. If the popped fish is bigger, push it back and stop, because the new fish has been eaten. Otherwise, continue the while loop against the new top until the condition fails, then push (A[i], B[i]).
If B[top] is not 1 or B[i] is not 0, simply push the new pair (A[i], B[i]).
The size of the stack S at the end is your answer.
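
The steps above can be sketched in C# roughly as follows. This is an illustrative sketch rather than tested submission code: the names FishSketch and CountSurvivors are made up, the stack holds indices instead of (A[i], B[i]) pairs, and it peeks at the top instead of popping and pushing it back. It assumes all sizes are distinct, as in the Codility task.

```csharp
using System;
using System.Collections.Generic;

public static class FishSketch
{
    // A[i] is the size of fish i; B[i] is its direction (0 = upstream, 1 = downstream).
    public static int CountSurvivors(int[] A, int[] B)
    {
        // Indices of the fish that are still alive, in river order.
        var alive = new Stack<int>();
        for (int i = 0; i < A.Length; i++)
        {
            bool survives = true;
            // Fish meet only when the one on top swims downstream
            // and the new one swims upstream.
            while (alive.Count > 0 && B[alive.Peek()] == 1 && B[i] == 0)
            {
                if (A[alive.Peek()] > A[i])
                {
                    survives = false; // the new fish is eaten
                    break;
                }
                alive.Pop(); // the fish on top is eaten; check the next one
            }
            if (survives)
                alive.Push(i);
        }
        return alive.Count;
    }
}
```

Each index is pushed at most once and popped at most once, which is where the linear bound comes from.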
Note: You might not be passing that test case, for which your solution times out. For example, for N=100000, your solution will time out.
In my solution, the worst-case time complexity is O(N+N) = O(2N) = O(N): N steps for the iteration over array A, plus at most N more in total for the while loop, since each fish can be popped from the stack at most once.
Hope it helps!!!
Edit: Suppose A = [99, 98, 92, 91, 93] and B = [1, 1, 1, 1, 0]. Your code gives 3 as the answer; the expected answer is 2.
Edit 2: This is your modified code, which will pass every test case:
using System;
public class Solution
{
    public int solution(int[] sizes, int[] direction)
    {
        if (sizes == null || direction == null)
            throw new ArgumentNullException();
        var sizesLen = sizes.Length;
        var directionLen = direction.Length;
        if (sizesLen != direction.Length)
            throw new ArgumentException();
        var len = sizesLen;
        if (len <= 1) return len;
        var survivors = new Fish[len];
        survivors[0] = new Fish(sizes[0], direction[0]);
        var curr = 0;
        for (int i = 1; i < len; i++)
        {
            var fish = new Fish(sizes[i], direction[i]);
            if (survivors[curr].Direction == 1 && fish.Direction == 0)
            {
                if (fish.Size < survivors[curr].Size) continue;
                while(curr >= 0 &&
                      fish.Size > survivors[curr].Size &&
                      survivors[curr].Direction == 1)
                {
                    curr--;
                }
                // New check: the upstream fish may have been stopped by a bigger
                // downstream fish further back, in which case it is eaten.
                if (curr >= 0)
                {
                    if (fish.Size < survivors[curr].Size &&
                        survivors[curr].Direction == 1)
                        continue;
                }
            }
            survivors[++curr] = fish;
        }
        return ++curr;
    }
}
public class Fish
{
    public Fish(int size, int direction)
    {
        Size = size;
        Direction = direction;
    }
    public int Size { get; set; }
    public int Direction { get; set; }
}

I think the intention here is to use a Stack or a Queue. Here is a solution with two stacks.
public static int Fish(int[] A, int[] B)
{
    var downStreamFish = new Stack<int>(B.Length);
    var upStreamFish = new Stack<int>(B.Length);
    var result = B.Length;
    for (var i = 0; i < B.Length; i++)
    {
        // push the fish into up/down stream stack.
        if (B[i] == 1)
            downStreamFish.Push(i);
        else
            upStreamFish.Push(i);
        // check to see whether it's possible to eat a fish
        while (downStreamFish.Count > 0 && upStreamFish.Count > 0)
        {
            var dfIndex = downStreamFish.Peek();
            var ufIndex = upStreamFish.Peek();
            // NOTE: downstream fish index must be less than upstream fish index in order for 'eat' to happen
            if (dfIndex < ufIndex)
            {
                if (A[dfIndex] > A[ufIndex])
                    upStreamFish.Pop();
                else
                    downStreamFish.Pop();
                result--; // one fish is eaten
            }
            else
                break; // eat condition is not met
        }
    }
    return result;
}

change the size of Stack in C#

I wrote a DFS (depth-first search) function in my C# project A, and it runs OK. Then I built a new C# project B, which has more lines of code than A. When I run the same function in project B with the same input data, VS2008 reports a stack overflow error.
Can I change the size of the stack in C#?
The function's name is FindBlocksFounction().
The code that causes the stack overflow:
int tempx = nowx + dir[i, 0];
int tempy = nowy + dir[i, 1];
if (tempx < 0 || tempy < 0 || tempx >= m_Bitmap.Height || tempy >= m_Bitmap.Width)
    continue;
int next;
next = PointList.FindIndex(t =>
{
    if (t.x == tempx && t.y == tempy)
        return true;
    else
        return false;
}); // It seems that FindIndex() on List<> costs some stack space.
if (next == -1)
    continue;
if (color[next] == 0)
{
    FindBlocksFounction(next);
}

I think the best way is to convert your depth-first search from recursion to an iterative version that keeps its pending nodes in an explicit Stack<T> collection (a Queue<T> also works if breadth-first order is acceptable). It is not a complex task. Just google it or take a look here:
Non recursive Depth first search algorithm
It will protect your code from call-stack size issues for any amount of data.
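
As a rough sketch of what that conversion can look like: the code below counts connected blocks in a grid with an explicit Stack<int> instead of recursive calls. Everything in it is an assumption made for illustration (the BlockFinder and CountBlocks names, and a bool[,] grid standing in for the question's m_Bitmap and PointList), so it would need adapting to the real data structures.

```csharp
using System;
using System.Collections.Generic;

public static class BlockFinder
{
    // Counts 4-connected blocks of true cells. The bool[,] grid is a
    // hypothetical stand-in for the bitmap used in the question.
    public static int CountBlocks(bool[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        var visited = new bool[rows, cols];
        int[,] dir = { { 1, 0 }, { -1, 0 }, { 0, 1 }, { 0, -1 } };
        var pending = new Stack<int>(); // lives on the heap, not the call stack
        int blocks = 0;

        for (int startX = 0; startX < rows; startX++)
        {
            for (int startY = 0; startY < cols; startY++)
            {
                if (!grid[startX, startY] || visited[startX, startY])
                    continue;

                blocks++;
                visited[startX, startY] = true;
                pending.Push(startX * cols + startY); // encode (x, y) as one int

                while (pending.Count > 0)
                {
                    int cell = pending.Pop();
                    int x = cell / cols;
                    int y = cell % cols;
                    for (int i = 0; i < 4; i++)
                    {
                        int nextX = x + dir[i, 0];
                        int nextY = y + dir[i, 1];
                        if (nextX < 0 || nextY < 0 || nextX >= rows || nextY >= cols)
                            continue;
                        if (!grid[nextX, nextY] || visited[nextX, nextY])
                            continue;
                        visited[nextX, nextY] = true;
                        pending.Push(nextX * cols + nextY);
                    }
                }
            }
        }
        return blocks;
    }
}
```

Marking a cell as visited when it is pushed, rather than when it is popped, keeps each cell on the stack at most once.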
