Fastest way to check if an array is sorted - C#

Consider an array returned from a function that is of very large size.
What will be the fastest approach to test if the array is sorted?
The simplest approach would be:
/// <summary>
/// Determines if int array is sorted from 0 -> Max
/// </summary>
public static bool IsSorted(int[] arr)
{
    for (int i = 1; i < arr.Length; i++)
    {
        if (arr[i - 1] > arr[i])
        {
            return false;
        }
    }
    return true;
}

You will have to visit each element of the array to see if anything is unsorted.
Your O(n) approach is about as fast as it gets, without any special knowledge about the likely state of the array.
Your code specifically tests if the array is sorted with smaller values at lower indices. If that is not what you intend, your if becomes slightly more complex. Your code comment does suggest that is what you're after.
If you were to have special knowledge of the probable state (say, you know it's generally sorted but new data might be added to the end), you can optimize the order in which you visit array elements to allow the test to fail faster when the array is unsorted.
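For instance, a minimal sketch of that idea, assuming ascending order and that out-of-order data is most likely to show up near the end of the array (the method name is just for illustration):
public static bool IsSortedScanFromEnd(int[] arr)
{
    // Scan backwards so the check fails fast when the tail is the part
    // most likely to be out of order.
    for (int i = arr.Length - 1; i > 0; i--)
    {
        if (arr[i - 1] > arr[i])
            return false;
    }
    return true;
}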
You can leverage knowledge of the hardware architecture to check multiple parts of the array in parallel by partitioning the array, first comparing the boundaries of the partitions (fail-fast check) and then running one array partition per core on a separate thread (no more than 1 thread per CPU core). Note, though, that if an array partition is much smaller than the size of a cache line, the threads will tend to compete with each other for access to the memory containing the array. Multithreading will only be very efficient for fairly large arrays.
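A rough sketch of that partitioning scheme, assuming ascending order; the partition count, the clamping for small arrays, and the Parallel.For usage are illustrative choices, not the only way to do it:
using System;
using System.Threading.Tasks;

public static class SortedCheck
{
    public static bool IsSortedParallel(int[] arr, int partitionCount = 4)
    {
        if (arr.Length < 2) return true;

        // Keep partitions at least two elements wide so the boundary check stays in bounds.
        int partitions = Math.Max(1, Math.Min(partitionCount, arr.Length / 2));
        int size = arr.Length / partitions;

        // Fail-fast check: compare the values on either side of each partition boundary first.
        for (int p = 1; p < partitions; p++)
            if (arr[p * size - 1] > arr[p * size]) return false;

        bool sorted = true;
        Parallel.For(0, partitions, (p, state) =>
        {
            int start = p * size + 1;
            int end = (p == partitions - 1) ? arr.Length : (p + 1) * size;
            for (int i = start; i < end; i++)
            {
                if (arr[i - 1] > arr[i])
                {
                    sorted = false;   // only ever flips to false, so the race is benign
                    state.Stop();     // ask the other partitions to bail out early
                    return;
                }
            }
        });
        return sorted;
    }
}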

Faster approach, platform target: Any CPU, Prefer 32-bit.
A sorted array with 512 elements: ~25% faster.
static bool isSorted(int[] a)
{
    int j = a.Length - 1;
    if (j < 1) return true;
    int ai = a[0], i = 1;
    while (i <= j && ai <= (ai = a[i])) i++;
    return i > j;
}
Target: x64, same array: ~40% faster.
static bool isSorted(int[] a)
{
    int i = a.Length - 1;
    if (i <= 0) return true;
    if ((i & 1) > 0) { if (a[i] < a[i - 1]) return false; i--; }
    for (int ai = a[i]; i > 0; i -= 2)
        if (ai < (ai = a[i - 1]) || ai < (ai = a[i - 2])) return false;
    return a[0] <= a[1];
}
Forgot one, marginally slower than my first code block.
static bool isSorted(int[] a)
{
    int i = a.Length - 1; if (i < 1) return true;
    int ai = a[i--]; while (i >= 0 && ai >= (ai = a[i])) i--;
    return i < 0;
}
Measuring it (see greybeard's comment).
using System;                                   // --------- DEBUG ---------
using sw = System.Diagnostics.Stopwatch;        // static bool abc()
class Program                                   // {   // a <= b <= c ?
{                                               //     int a = 4, b = 7, c = 9;
    static void Main()                          //     int i = 1;
    {                                           //     if (a <= (a = b))
        //abc();                                //     {
        int i = 512;                            //         i++;
        int[] a = new int[i--];                 //         if (a <= (a = c))
        while (i > 0) a[i] = i--;               //         {
        sw sw = sw.StartNew();                  //             i++;
        for (i = 10000000; i > 0; i--)          //         }
            isSorted(a);                        //     }
        sw.Stop();                              //     return i > 2;
        Console.Write(sw.ElapsedMilliseconds);  // }
        Console.Read();                         // static bool ABC()
    }                                           // {
                                                //     int[] a = { 4, 7, 9 };
    static bool isSorted(int[] a) // OP Cannon  //     int i = 1, j = 2, ai = a[0];
    {                                           // L0: if (i <= j)
        for (int i = 1; i < a.Length; i++)      //         if (ai <= (ai = a[i]))
            if (a[i - 1] > a[i]) return false;  //         { i++; goto L0; }
        return true;                            //     return i > j;
    }                                           // }
}
Target: x64. Four cores/threads.
A sorted array with 100,000 elements: ~55%.
static readonly object _locker = new object();

static bool isSorted(int[] a) // a.Length > 3
{
    bool b = true;
    Parallel.For(0, 4, k =>
    {
        int i = 0, j = a.Length, ai = 0;
        if (k == 0) { j /= 4; ai = a[0]; }                       // 0 1
        if (k == 1) { j /= 2; i = j / 2; ai = a[i]; }            // 1 2
        if (k == 2) { i = j - 1; ai = a[i]; j = j / 2 + j / 4; } // 4 3
        if (k == 3) { i = j - j / 4; ai = a[i]; j = j / 2; }     // 3 2
        if (k < 2)
            while (b && i <= j)
            {
                if (ai <= (ai = a[i + 1]) && ai <= (ai = a[i + 2])) i += 2;
                else lock (_locker) b = false;
            }
        else
            while (b && i >= j)
            {
                if (ai >= (ai = a[i - 1]) && ai >= (ai = a[i - 2])) i -= 2;
                else lock (_locker) b = false;
            }
    });
    return b;
}
1,000,000 items?
if (k < 2)
    while (b && i < j)
        if (ai <= (ai = a[i + 1]) && ai <= (ai = a[i + 2]) &&
            ai <= (ai = a[i + 3]) && ai <= (ai = a[i + 4])) i += 4;
        else lock (_locker) b = false;
else
    while (b && i > j)
        if (ai >= (ai = a[i - 1]) && ai >= (ai = a[i - 2]) &&
            ai >= (ai = a[i - 3]) && ai >= (ai = a[i - 4])) i -= 4;
        else lock (_locker) b = false;
Let's forget percentages.
Original: 0.77 ns/item, now: 0.22 ns/item.
2,000,000 items? Four cores: 4 times faster.

LINQ solution.
public static bool IsSorted<T>(IEnumerable<T> list) where T : IComparable<T>
{
    var y = list.First();
    return list.Skip(1).All(x =>
    {
        // <= 0 allows equal neighbours, matching the check in the question
        bool b = y.CompareTo(x) <= 0;
        y = x;
        return b;
    });
}
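Example usage, assuming the method above is in scope. Note that it enumerates the sequence twice (First() and Skip(1)), so pass a materialized collection such as an array or List<T> rather than a one-shot IEnumerable:
int[] ascending = { 1, 2, 2, 5, 9 };
int[] jumbled = { 1, 3, 2 };

Console.WriteLine(IsSorted(ascending)); // True  (duplicates allowed by <= 0)
Console.WriteLine(IsSorted(jumbled));   // False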

Here is my version of the function IsSorted
public static bool IsSorted(int[] arr)
{
    int last = arr.Length - 1;
    if (last < 1) return true;
    int i = 0;
    while (i < last && arr[i] <= arr[i + 1])
        i++;
    return i == last;
}
While this function is only a bit faster than the one in the question, it does fewer assignments and comparisons than anything posted so far. In the worst case, it does 2n+1 comparisons. It could still be improved if you can make a reasonable assumption about the nature of the data, such as a minimum data size or the array containing an even number of elements.

The only improvement I can think of is to check both ends of the array at the same time; this little change halves the number of loop iterations...
public static bool IsSorted(int[] arr)
{
    int l = arr.Length;
    for (int i = 1; i < l / 2 + 1; i++)
    {
        if (arr[i - 1] > arr[i] || arr[l - i] < arr[l - i - 1])
        {
            return false;
        }
    }
    return true;
}

This is what I came up with, and I find it works better, particularly with larger arrays. The function is recursive and is called for the very first time, say in a while loop, like this:
while (IsSorted(yourArray, 0))
The if statement checks whether the bounds of the array have been reached.
The else if statement calls the function recursively and bails out as soon as the condition becomes false.
public static bool IsSorted(int[] arr, int index)
{
    if (index >= arr.Length - 1)
    {
        return true;
    }
    else if ((arr[index] <= arr[index + 1]) && IsSorted(arr, index + 1))
    {
        return true;
    }
    else
    {
        return false;
    }
}

If the order doesn't matter (descending or ascending):
private bool IsSorted<T>(T[] values) where T : IComparable<T>
{
    if (values == null || values.Length == 0) return true;
    int sortOrder = 0;
    for (int i = 0; i < values.Length - 1; i++)
    {
        // Math.Sign normalizes the result, since CompareTo only guarantees the sign
        int newSortOrder = Math.Sign(values[i].CompareTo(values[i + 1]));
        if (sortOrder == 0) sortOrder = newSortOrder;
        if (newSortOrder != 0 && sortOrder != newSortOrder) return false;
    }
    return true;
}

The question that comes to my mind is "why"?
Is it to avoid re-sorting an already-sorted list? If yes, just use Timsort (standard in Python and Java). It's quite good at taking advantage of an array/list being already sorted, or almost sorted. Despite how good Timsort is at this, it's better to not sort inside a loop.
Another alternative is to use a data structure that is innately sorted, like a treap, red-black tree or AVL tree. These are good alternatives to sorting inside a loop.
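For illustration, .NET's SortedSet<T> is backed by a red-black tree, so elements stay in order as they are inserted and there is never anything to re-sort. Note that a set discards duplicates; a List<T> kept ordered via BinarySearch, or SortedList<TKey, TValue>, are options when duplicates matter.
using System;
using System.Collections.Generic;

class SortedSetDemo
{
    static void Main()
    {
        var set = new SortedSet<int> { 5, 1, 9 };
        set.Add(3);                                  // O(log n) insert, order maintained
        Console.WriteLine(string.Join(", ", set));   // 1, 3, 5, 9
    }
}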

This might not be the fastest, but it's a complete solution: every value with an index lower than i is checked against the current value at i. This is written in PHP but can easily be translated into C# or JavaScript.
for ($i = 1; $i < $tot; $i++) {
    for ($j = 0; $j <= $i; $j++) {
        // Check all previous values with indexes lower than $i
        if ($chekASCSort[$i - $j] > $chekASCSort[$i]) {
            return false;
        }
    }
}

Related

returns the smallest positive integer (greater than 0) that does not occur in Array

I have the following question:
Write a function:
class Solution { public int solution(int[] A); }
that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
Given A = [1, 2, 3], the function should return 4.
Given A = [−1, −3], the function should return 1.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [1..100,000];
each element of array A is an integer within the range [−1,000,000..1,000,000].
Now I tried this code:
using System;
// you can also use other imports, for example:
// using System.Collections.Generic;
// you can write to stdout for debugging purposes, e.g.
// Console.WriteLine("this is a debug message");
class Solution
{
    public int solution(int[] A)
    {
        // write your code in C# 6.0 with .NET 4.5 (Mono)
        int n = 1;
        Array.Sort(A);
        for (int i = 1; i <= 100000; i++)
        {
            for (int i2 = 0; i2 <= A.Length - 1; i2++)
            {
                if (A[i2] == i)
                {
                    n = A[i2] + 1;
                    break;
                }
            }
        }
        return n;
    }
}
My code worked well for these test data:
A = [1, 2, 3]
A = [−1, −3]
while it failed for this one:
A = [1, 3, 6, 4, 1, 2], where it returns 7 instead of 5.
Any advice on why my code fails on the 3rd test?
Thanks
using System.Linq;
int smallestNumber = Enumerable.Range(1, 100000).Except(A).Min();
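Wrapped in a method for reference (the method name is illustrative; extending the range to 100,001 covers the edge case where A already contains every value from 1 to 100,000):
using System.Linq;

public static int SmallestMissingPositive(int[] A)
{
    // Range(1, 100001) yields 1..100001, so a result exists even if A
    // contains all of 1..100000.
    return Enumerable.Range(1, 100001).Except(A).Min();
}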
I would use the following approach, which uses a HashSet<int> to check whether a given integer is missing:
public static int? SmallestMissing(int[] A, int rangeStart = 1, int rangeEnd = 100_000)
{
    HashSet<int> hs = new HashSet<int>(A);
    for (int i = rangeStart; i <= rangeEnd; i++)
        if (!hs.Contains(i)) return i;
    return null;
}
A HashSet is a collection of unique values that is very efficient at looking up items (the lookup complexity is O(1)). So you get a very readable and efficient algorithm at the cost of some memory.
Maybe you could optimize it by providing another algorithm in case the array is very large and you don't want to risk an OutOfMemoryException:
public static int? SmallestMissing(int[] A, int rangeStart = 1, int rangeEnd = 100_000)
{
    if (A.Length > 1_000_000)
    {
        Array.Sort(A);
        for (int i = rangeStart; i <= rangeEnd; i++)
        {
            int index = Array.BinarySearch(A, i);
            if (index < 0) return i;
        }
        return null;
    }
    HashSet<int> hs = new HashSet<int>(A);
    for (int i = rangeStart; i <= rangeEnd; i++)
        if (!hs.Contains(i)) return i;
    return null;
}
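For reference, the SmallestMissing method above gives the expected answers on the question's examples:
Console.WriteLine(SmallestMissing(new[] { 1, 3, 6, 4, 1, 2 })); // 5
Console.WriteLine(SmallestMissing(new[] { 1, 2, 3 }));          // 4
Console.WriteLine(SmallestMissing(new[] { -1, -3 }));           // 1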
If you're allowed to sort the array in-place, which means modifying the input parameter value, here's a simple linear probe for the missing value (on top of the sort of course).
Here's the pseudo-code:
Sort the array
Skip all negatives and 0's at the start
Loopify the following:
Expect 1, if not found at current location return 1
Skip all 1's
Expect 2, if not found at current location return 2
Skip all 2's
Expect 3, if not found at current location return 3
Skip all 3's
... and so on for 4, 5, 6, etc. until end of array
If we get here, return currently expected value which should've been at the end
Here's the code:
public static int FirstMissingValue(int[] input)
{
Array.Sort(input);
int index = 0;
// Skip negatives
while (index < input.Length && input[index] < 1)
index++;
int expected = 1;
while (index < input.Length)
{
if (input[index] > expected)
return expected;
// Skip number and all duplicates
while (index < input.Length && input[index] == expected)
index++;
expected++;
}
return expected;
}
Test-cases:
Console.WriteLine(FirstMissingValue(new[] { 1, 3, 6, 4, 1, 2 }));
Console.WriteLine(FirstMissingValue(new[] { 1, 2, 3 }));
Console.WriteLine(FirstMissingValue(new[] { -1, -3 }));
output:
5
4
1
Your algorithm won't work if the input array becomes something like this: [1, 2, -1, 1, 3, 5]. I did this based on your algorithm. Give it a try:
int[] a = new int[] { -1, -2 };
IEnumerable<int> uniqueItems = a.Distinct<int>().Where(x => x > 0);
if (uniqueItems.Count() == 0)
{
    Console.WriteLine("result: 1");
}
else
{
    int[] asList = uniqueItems.ToArray();
    Array.Sort(asList);
    for (int i = 1; i <= 100000; i++)
    {
        // Either we ran past the end (all of 1..Length are present) or we hit a gap.
        if (i - 1 >= asList.Length || asList[i - 1] != i)
        {
            Console.WriteLine("result: " + i);
            break;
        }
    }
}
You can try it like this:
public static int solution(int[] A)
{
    int smallest = -1;
    Array.Sort(A);
    if (A[0] > 1)
        return 1;
    for (int i = 0; i < A.Length; i++)
    {
        if (A.Length != i + 1 && A[i] + 1 != A[i + 1] && A[i + 1] > 0)
        {
            smallest = A[i] + 1;
            break;
        }
        else if (A[i] > 0 && A.Length == i + 1)
        {
            smallest = A[i] + 1;
        }
    }
    return smallest > 0 ? smallest : 1;
}
Here's the approach that uses O(N) partitioning followed by an O(N) search. This approach does not use any additional storage, but it DOES change the contents of the array.
This code was converted from here. Also see this article.
I've added comments to try to explain how the second stage findSmallestMissing() works. I've not commented the partitioning method, since that's just a variant of a standard partition as might be used in a QuickSort algorithm.
using System;

static class Program
{
    public static void Main()
    {
        Console.WriteLine(FindSmallestMissing(1, 3, 6, 4, 1, 2));
        Console.WriteLine(FindSmallestMissing(1, 2, 3));
        Console.WriteLine(FindSmallestMissing(-1, -3));
    }

    public static int FindSmallestMissing(params int[] array)
    {
        return findSmallestMissing(array, partition(array));
    }

    // Places all the values > 0 before any values <= 0,
    // and returns the index of the first value <= 0.
    // The values are unordered.
    static int partition(int[] arr)
    {
        void swap(int x, int y)
        {
            var temp = arr[x];
            arr[x] = arr[y];
            arr[y] = temp;
        }

        int pIndex = 0; // Index of pivot.
        for (int i = 0; i < arr.Length; i++)
        {
            if (arr[i] > 0) // pivot is 0, hence "> 0"
                swap(i, pIndex++);
        }
        return pIndex;
    }

    // This is the clever bit.
    // We will use the +ve values in the array as flags to indicate that the number equal to that index is
    // present in the array, by making the value negative if it is found in the array.
    // This way we can store both the original number AND whether or not the number equal to that index is present
    // in a single value.
    //
    // Given n numbers that are all > 0, find the smallest missing number as follows:
    //
    // For each array index i in (0..n):
    //     val = |arr[i]| - 1;           // Subtract 1 so val will be between 0 and max +ve value in original array.
    //     if (val is in range)          // If val is beyond the end of the array we can ignore it
    //     and arr[val] is non-negative  // If already negative, no need to make it negative.
    //         make arr[val] negative
    //
    // After that stage, we just need to find the first positive number in the array, which indicates that
    // the number equal to that index + 1 is missing.
    // n = number of values at the start of the array that are > 0
    static int findSmallestMissing(int[] arr, int n)
    {
        for (int i = 0; i < n; i++)
        {
            int val = Math.Abs(arr[i]) - 1;
            if (val < n && arr[val] >= 0)
                arr[val] = -arr[val];
        }
        for (int i = 0; i < n; i++)
        {
            if (arr[i] > 0) // Missing number found.
                return i + 1;
        }
        return n + 1; // No missing number found.
    }
}
using System;

class Program
{
    static void Main(string[] args)
    {
        int[] A = new int[] { 1, 2, 3 };
        int n = 0;
        bool found = false;
        Array.Sort(A);
        for (int i = 1; i <= 100000; i++) {
            for (int x = 0; x <= A.Length - 1; x++) {
                int next = (x + 1) < A.Length ? (x + 1) : x;
                if (A[x] > 0 && (A[next] - A[x]) > 0) {
                    n = A[x] + 1;
                    found = true;
                    break;
                }
            }
            if (found) {
                break;
            }
        }
        Console.WriteLine("Smallest number: " + n);
    }
}
int smallestNumber = Enumerable.Range(1, A.Length + 1).Except(A).Min();
// Or the same result without LINQ: sort, then walk upwards from 1,
// stepping past every value that matches the current candidate.
Array.Sort(A);
smallestNumber = 1;
for (int num = 0; num <= A.Length - 1; num++)
{
    if (A[num] == smallestNumber)
        smallestNumber++;
}
return smallestNumber;
The easiest one :)
class Solution
{
    public int solution(int[] array)
    {
        int[] onlyPositiveArray = array.Where(a => a > 0).OrderBy(a => a).Distinct().ToArray();
        int smallestNumber = 1;
        foreach (var number in onlyPositiveArray)
        {
            if (smallestNumber != number)
            {
                break;
            }
            smallestNumber++;
        }
        if (!onlyPositiveArray.Contains(smallestNumber))
        {
            return smallestNumber;
        }
        else
        {
            return smallestNumber + 1;
        }
    }
}
PHP Solution:
function solution($A) {
    // write your code in PHP7.0
    // sort array
    sort($A);
    // get the first
    $smallest = $A[0];
    // write while
    while (in_array($smallest, $A) || ($smallest < 1)) {
        $smallest++;
    }
    return $smallest;
}
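A rough C# translation of the same idea, in case it helps; the method name is just for illustration, it assumes a non-empty array like the PHP version, and each Contains call is O(N), so the whole thing is O(N^2):
using System;
using System.Linq;

public static int SmallestMissingLikePhp(int[] A)
{
    Array.Sort(A);
    int smallest = A[0];
    // Keep counting up while the candidate is present or still below 1.
    while (A.Contains(smallest) || smallest < 1)
        smallest++;
    return smallest;
}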
My solution; could someone also test how performant it is?
public int solution(int[] N) {
    if (N.Length == 0)
        return 1;
    else if (N.Length == 1)
        return N[0] >= 0 ? N[0] + 1 : 1;

    Array.Sort(N);
    int min = Array.Find(N, IsUnderZero);
    if (min == default)
        return 1;

    HashSet<int> hashSet = new HashSet<int>(N);
    int max = N[N.Length - 1];
    for (int i = min + 1; i <= max + 1; i++) {
        if (!hashSet.Contains(i) && i > 0)
            return i;
    }
    return max + 1;

    bool IsUnderZero(int i) => i <= 0;
}
Try the below:
public static int MinIntegerGreaterThanZeroInArray(int[] A)
{
    int minInt;
    if (A.Length > 0)
    {
        Array.Sort(A);
        for (minInt = 1; minInt <= A.Length; minInt++)
        {
            int index = Array.BinarySearch(A, minInt);
            if (index < 0) return minInt;
        }
        return minInt;
    }
    // Array is empty.
    throw new InvalidOperationException();
}
public static int Smallest(int[] A)
{
    int maxPositiveInt = 1;
    HashSet<int> NumDic = new HashSet<int>();
    for (int i = 0; i < A.Length; i++)
    {
        if (A[i] <= 0)
        {
            continue;
        }
        if (!NumDic.Contains(A[i]))
        {
            NumDic.Add(A[i]);
        }
        maxPositiveInt = Math.Max(A[i], maxPositiveInt);
    }
    // All numbers are negative
    if (NumDic.Count == 0)
    {
        return 1;
    }
    int smallestinteger = 1;
    for (int i = 0; i < A.Length; i++)
    {
        if (A[i] <= 0)
        {
            continue;
        }
        if (!NumDic.Contains(smallestinteger))
        {
            return smallestinteger;
        }
        else
        {
            smallestinteger++;
        }
    }
    return maxPositiveInt + 1;
}
static void Main(string[] args)
{
    Console.WriteLine(solution(new int[] { 1, 3, 6, 4, 1, 2 }));
}

public static int solution(int[] A)
{
    Array.Sort(A);
    int smallest = A[0];
    while (A.Contains(smallest + 1) || (smallest + 1) < 1)
    {
        smallest++;
    }
    return smallest + 1;
}

Appending '0' using concat does not seem to work

I have a program that compares two integer values' lengths with this extension method:
public static int NumDigits(this int n)
{
    if (n < 0)
    {
        n = (n == int.MinValue) ? int.MaxValue : -n;
    }
    if (n < 10) return 1;
    if (n < 100) return 2;
    if (n < 1000) return 3;
    if (n < 10000) return 4;
    if (n < 100000) return 5;
    if (n < 1000000) return 6;
    if (n < 10000000) return 7;
    if (n < 100000000) return 8;
    return n < 1000000000 ? 9 : 10;
}
And it works perfectly. When I print the value of num1.NumDigits(), it returns 4 (the value is 1111), and my other integer, num2.NumDigits(), returns 2 (it is 11). This is great, but when I actually compare them:
int[] rawNum2 = Arrays.DigitArr(num2);
if (num1.NumDigits() > num2.NumDigits())
{
    int diff = num1.NumDigits() - num2.NumDigits();
    for (int i = 1; i < diff; i++)
    {
        rawNum2.Append(0);
    }
    reversedNum2 = rawNum2.Reverse();
}
reversedNum2 is still '11' when it should be '0011'.
This is the class I compiled and used.
public static int[] Append(this int[] source, int value)
{
    int[] newValue = source;
    newValue = newValue.Concat(new[] { value }).ToArray();
    return newValue;
}

public static int[] Reverse(this int[] array)
{
    int[] arr = array;
    for (int i = 0; i < arr.Length / 2; i++)
    {
        int tmp = arr[i];
        arr[i] = arr[arr.Length - i - 1];
        arr[arr.Length - i - 1] = tmp;
    }
    return arr;
}

public static int[] DigitArr(int n)
{
    if (n == 0) return new int[1] { 0 };
    var digits = new List<int>();
    for (; n != 0; n /= 10)
        digits.Add(n % 10);
    var arr = digits.ToArray();
    Array.Reverse(arr);
    return arr;
}
Why is this happening?
You are discarding the return value of the Append method.
Change
rawNum2.Append(0);
to
rawNum2 = rawNum2.Append(0);
inside the for loop.
If rawNum2 were a string rather than an int array, the whole loop could be simplified to:
rawNum2 = rawNum2.PadRight(num1.NumDigits(), '0');
To get reversedNum2 as 0011, change your loop as below:
for (int i = 1; i <= diff; i++)
{
    rawNum2 = rawNum2.Append(0);
}
I made two changes: the for loop uses i <= diff instead of i < diff, and the return value from the Append() method is assigned back into rawNum2.
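Putting both fixes together, here is a minimal self-contained demo; the array contents and the diff value are illustrative:
using System;
using System.Linq;

static class AppendDemo
{
    // Same shape as the question's extension: Concat always builds a new array.
    static int[] Append(this int[] source, int value)
        => source.Concat(new[] { value }).ToArray();

    static void Main()
    {
        int[] rawNum2 = { 1, 1 };          // digits of 11
        int diff = 2;                      // e.g. 4 digits - 2 digits

        for (int i = 1; i <= diff; i++)
            rawNum2 = rawNum2.Append(0);   // assign the result back

        Array.Reverse(rawNum2);
        Console.WriteLine(string.Join("", rawNum2)); // 0011
    }
}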

Why is my simple cache not working?

I have a chunk of code I wrote
private int CountChangeOptions(int[] coins, int j, int N)
{
    if (j == 0)
        return 0;
    if (N == 0)
        return 1;
    if (j == 1)
        return N % coins[0] == 0 ? 1 : 0;
    int sum = 0;
    int k = coins[j - 1];
    for (int i = N; i >= 0; i -= k)
    {
        sum += CountChangeOptions(coins, j - 1, i);
    }
    return sum;
}
that evaluates correctly (though not fast enough). Since, in the course of the algorithm, CountChangeOptions(coins, x, y) will be called several times with the same x, y for most coins, I decided to implement a caching strategy:
private int CountChangeOptions(int[] coins, int j, int N)
{
    if (j == 0)
        return 0;
    if (N == 0)
        return 1;
    if (j == 1)
        return N % coins[0] == 0 ? 1 : 0;
    int sum = 0;
    int k = coins[j - 1];
    for (int i = N; i >= 0; i -= k)
    {
        sum += CountOrGetFromCache(coins, j - 1, i);
    }
    return sum;
}

private int CountOrGetFromCache(int[] coins, int j, int i)
{
    if (_cache[j, i] == null)
    {
        _cache[j, i] = CountChangeOptions(coins, j, i);
    }
    return (int)_cache[j, i];
}
where _cache is an int?[,] that has previously been set up with _cache = new int?[N+1, N+1]. However, this is now failing some of the test cases. Any idea why?
The only issue that I see is that the _cache variable is incorrectly initialized to
_cache = new int?[N+1, N+1]
instead of
_cache = new int?[coins.Length + 1, N + 1]
Other than that, both solutions are functionally the same. I don't believe there is a functional test case that works for the first (slow) implementation and fails for the second.
I am guessing that you are using some website like HackerRank or the like, and the first solution probably times out, giving you the feeling that it works, while the second, being faster, does not time out but produces an incorrect result.
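For reference, a minimal sketch of the memoized version with the corrected cache dimensions; the ChangeCounter wrapper and the CountWays entry point are assumptions, while the two inner methods mirror the question's code:
class ChangeCounter
{
    private int?[,] _cache;

    public int CountWays(int[] coins, int N)
    {
        // One row per "number of coin kinds considered", one column per amount.
        _cache = new int?[coins.Length + 1, N + 1];
        return CountChangeOptions(coins, coins.Length, N);
    }

    private int CountChangeOptions(int[] coins, int j, int N)
    {
        if (j == 0) return 0;
        if (N == 0) return 1;
        if (j == 1) return N % coins[0] == 0 ? 1 : 0;

        int sum = 0;
        int k = coins[j - 1];
        for (int i = N; i >= 0; i -= k)
            sum += CountOrGetFromCache(coins, j - 1, i);
        return sum;
    }

    private int CountOrGetFromCache(int[] coins, int j, int i)
    {
        if (_cache[j, i] == null)
            _cache[j, i] = CountChangeOptions(coins, j, i);
        return (int)_cache[j, i];
    }
}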

Latest Codility Time Problems

Just got done with the latest Codility. I passed it, but didn't get 100% on it.
Here is the spec
A prefix of a string S is any leading contiguous part of S. For example, "c" and "cod" are prefixes of the string "codility". For simplicity, we require prefixes to be non-empty.
The product of prefix P of string S is the number of occurrences of P multiplied by the length of P. More precisely, if prefix P consists of K characters and P occurs exactly T times in S, then the product equals K * T.
For example, S = "abababa" has the following prefixes:
"a", whose product equals 1 * 4 = 4,
"ab", whose product equals 2 * 3 = 6,
"aba", whose product equals 3 * 3 = 9,
"abab", whose product equals 4 * 2 = 8,
"ababa", whose product equals 5 * 2 = 10,
"ababab", whose product equals 6 * 1 = 6,
"abababa", whose product equals 7 * 1 = 7.
The longest prefix is identical to the original string. The goal is to choose a prefix that maximizes the value of the product. In the above example the maximal product is 10.
In this problem we consider only strings that consist of lower-case English letters (a−z).
So basically, it is a string traversal problem. I was able to pass all the validation parts, but I lost points on time. Here is what I wrote:
int Solution(string S)
{
    int finalCount = 0;
    for (int i = 0; i <= S.Length - 1; i++)
    {
        string prefix = S.Substring(0, i + 1);
        int count = 0;
        for (int j = 0; j <= S.Length - 1; j++)
        {
            if (prefix.Length + j <= S.Length)
            {
                string newStr = S.Substring(j, prefix.Length);
                if (newStr == prefix)
                {
                    count++;
                }
            }
            if (j == S.Length - 1)
            {
                int product = count * prefix.Length;
                if (product > finalCount)
                {
                    finalCount = product;
                }
            }
        }
    }
    return finalCount;
}
I know that the nested loop is killing me, but I cannot think of a way to traverse the "sections" of the string without adding the other loop.
Any help would be appreciated.
The naive brute-force solution takes O(N ** 3) time: choose each length from 1 to N, take the prefix of that length and count its occurrences by brute-force searching. Choosing the length takes O(N) time and the brute-force search takes O(N ** 2) time, for O(N ** 3) in total.
If you use KMP or the Z-algorithm, you can find the occurrences in O(N) time, so the whole solution will be O(N ** 2) time.
And you can precalculate the occurrences, so it becomes an O(N) + O(N) = O(N) time solution.
vector<int> z_function(string &S); // z-function, z[0] = S.length()
vector<int> z = z_function(S);

// cnt[i] - count of occurrences of the i-length prefix of S
for (int i = 0; i < n; ++i)
    ++cnt[z[i]];

// every occurrence of an i-length prefix also contains the (i - 1)-length prefix,
// so accumulate the counts from longest to shortest
int previous = 0;
for (int i = n; i > 0; --i) {
    cnt[i] += previous;
    previous = cnt[i];
}
Here's the blog post, explaining all O(N ** 3), O(N ** 2), O(N) solutions.
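For a C# version of the O(N) idea, here is a sketch using the Z-function; the class and method names are illustrative, and the 1,000,000,000 cap mirrors the other answers rather than anything from the spec quoted above:
using System;

static class PrefixProduct
{
    // z[i] is the length of the longest common prefix of S and S.Substring(i),
    // so a prefix of length k occurs once per position i with z[i] >= k.
    public static int Solution(string S)
    {
        int n = S.Length;
        if (n == 0) return 0;

        // Standard Z-function.
        var z = new int[n];
        z[0] = n;
        for (int i = 1, l = 0, r = 0; i < n; i++)
        {
            if (i < r) z[i] = Math.Min(r - i, z[i - l]);
            while (i + z[i] < n && S[z[i]] == S[i + z[i]]) z[i]++;
            if (i + z[i] > r) { l = i; r = i + z[i]; }
        }

        // cnt[k] = number of positions whose z-value is exactly k.
        var cnt = new long[n + 1];
        for (int i = 0; i < n; i++) cnt[z[i]]++;

        // Accumulate from longest to shortest: the k-length prefix occurs at
        // every position whose z-value is >= k.
        long best = 0, occurrences = 0;
        for (int k = n; k >= 1; k--)
        {
            occurrences += cnt[k];
            best = Math.Max(best, (long)k * occurrences);
        }
        return (int)Math.Min(best, 1000000000);
    }
}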
My effort was as follows, trying to eliminate unnecessary string compares. I read Isaac's blog, but it is in C; how would this translate to C#? I even went as far as using arrays everywhere to avoid the string-immutability factor, but there was no improvement.
int Solution(string S)
{
    int Rtn = 0;
    int len = S.Length;
    if (len > 1000000000)
        return 1000000000;
    for (int i = 1; i <= len; i++)
    {
        string tofind = S.Substring(0, i);
        int tofindlen = tofind.Length;
        int occurencies = 0;
        for (int ii = 0; ii < len; ii++)
        {
            int found = FastIndexOf(S, tofind.ToCharArray(), ii);
            if (found != -1)
            {
                if ((found == 0 && tofindlen != 1) || (found >= 0))
                {
                    ii = found;
                }
                ++occurencies;
            }
        }
        if (occurencies > 0)
        {
            int total = occurencies * tofindlen;
            if (total > Rtn)
            {
                Rtn = total;
            }
        }
    }
    return Rtn;
}
static int FastIndexOf(string source, char[] pattern, int start)
{
    if (pattern.Length == 0) return 0;
    if (pattern.Length == 1) return source.IndexOf(pattern[0], start);
    bool found;
    int limit = source.Length - pattern.Length + 1 - start;
    if (limit < 1) return -1;
    char c0 = pattern[0];
    char c1 = pattern[1];
    // Find the first occurrence of the first character
    int first = source.IndexOf(c0, start, limit);
    while ((first != -1) && (first + pattern.Length <= source.Length))
    {
        if (source[first + 1] != c1)
        {
            first = source.IndexOf(c0, ++first);
            continue;
        }
        found = true;
        for (int j = 2; j < pattern.Length; j++)
            if (source[first + j] != pattern[j])
            {
                found = false;
                break;
            }
        if (found) return first;
        first = source.IndexOf(c0, ++first);
    }
    return -1;
}
I only got a 43... I like my code though! The same script in JavaScript got 56, for what it's worth.
using System;

class Solution
{
    public int solution(string S)
    {
        int highestCount = 0;
        for (var i = S.Length; i > 0; i--)
        {
            int occurs = 0;
            string prefix = S.Substring(0, i);
            int tempIndex = S.IndexOf(prefix) + 1;
            string tempString = S;
            while (tempIndex > 0)
            {
                tempString = tempString.Substring(tempIndex);
                tempIndex = tempString.IndexOf(prefix);
                tempIndex++;
                occurs++;
            }
            int product = occurs * prefix.Length;
            if ((product) > highestCount)
            {
                if (highestCount > 1000000000)
                    return 1000000000;
                highestCount = product;
            }
        }
        return highestCount;
    }
}
This works better
private int MaxiumValueOfProduct(string input)
{
    var positions = new List<int>();
    int max = 0;
    for (int i = 1; i <= input.Length; i++)
    {
        var subString = input.Substring(0, i);
        int position = 0;
        while ((position < input.Length) &&
               (position = input.IndexOf(subString, position, StringComparison.OrdinalIgnoreCase)) != -1)
        {
            positions.Add(position);
            ++position;
        }
        var currentMax = subString.Length * positions.Count;
        if (currentMax > max) max = currentMax;
        positions.Clear();
    }
    return max;
}

Damerau - Levenshtein Distance, adding a threshold

I have the following implementation, but I want to add a threshold, so if the result is going to be greater than it, just stop calculating and return.
How would I go about that?
EDIT: Here is my current code; the threshold is not yet used... the goal is that it will be.
public static int DamerauLevenshteinDistance(string string1, string string2, int threshold)
{
    // Return trivial case - where they are equal
    if (string1.Equals(string2))
        return 0;

    // Return trivial case - where one is empty
    if (String.IsNullOrEmpty(string1) || String.IsNullOrEmpty(string2))
        return (string1 ?? "").Length + (string2 ?? "").Length;

    // Ensure string2 (inner cycle) is longer
    if (string1.Length > string2.Length)
    {
        var tmp = string1;
        string1 = string2;
        string2 = tmp;
    }

    // Return trivial case - where string1 is contained within string2
    if (string2.Contains(string1))
        return string2.Length - string1.Length;

    var length1 = string1.Length;
    var length2 = string2.Length;
    var d = new int[length1 + 1, length2 + 1];

    for (var i = 0; i <= d.GetUpperBound(0); i++)
        d[i, 0] = i;
    for (var i = 0; i <= d.GetUpperBound(1); i++)
        d[0, i] = i;

    for (var i = 1; i <= d.GetUpperBound(0); i++)
    {
        for (var j = 1; j <= d.GetUpperBound(1); j++)
        {
            var cost = string1[i - 1] == string2[j - 1] ? 0 : 1;
            var del = d[i - 1, j] + 1;
            var ins = d[i, j - 1] + 1;
            var sub = d[i - 1, j - 1] + cost;
            d[i, j] = Math.Min(del, Math.Min(ins, sub));
            if (i > 1 && j > 1 && string1[i - 1] == string2[j - 2] && string1[i - 2] == string2[j - 1])
                d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + cost);
        }
    }
    return d[d.GetUpperBound(0), d.GetUpperBound(1)];
}
This is regarding your answer to "Damerau - Levenshtein Distance, adding a threshold"
(sorry, I can't comment as I don't have 50 rep yet).
I think you have made an error here. You initialized:
var minDistance = threshold;
And your update rule is:
if (d[i, j] < minDistance)
    minDistance = d[i, j];
Also, your early-exit criterion is:
if (minDistance > threshold)
    return int.MaxValue;
Now, observe that the if condition above will never hold true! You should rather initialize minDistance to int.MaxValue.
Here's the most elegant way I can think of. After setting each index of d, see if it exceeds your threshold. The evaluation is constant-time, so it's a drop in the bucket compared to the theoretical N^2 complexity of the overall algorithm:
public static int DamerauLevenshteinDistance(string string1, string string2, int threshold)
{
    ...
    for (var i = 1; i <= d.GetUpperBound(0); i++)
    {
        for (var j = 1; j <= d.GetUpperBound(1); j++)
        {
            ...
            var temp = d[i, j] = Math.Min(del, Math.Min(ins, sub));
            if (i > 1 && j > 1 && string1[i - 1] == string2[j - 2] && string1[i - 2] == string2[j - 1])
                temp = d[i, j] = Math.Min(temp, d[i - 2, j - 2] + cost);

            // Does this value exceed your threshold? If so, get out now
            if (temp > threshold)
                return temp;
        }
    }
    return d[d.GetUpperBound(0), d.GetUpperBound(1)];
}
You also asked this as a SQL CLR UDF question, so I'll answer in that specific context: your best optimization won't come from optimizing the Levenshtein distance, but from reducing the number of pairs you compare. Yes, a faster Levenshtein algorithm will improve things, but not nearly as much as reducing the number of comparisons from N squared (with N in the millions of rows) to N times some smaller factor. My proposal is to compare only elements whose length difference is within a tolerable delta. On your big table, you add a persisted computed column on LEN(Data) and then create an index on it that includes Data:
ALTER TABLE Table ADD LenData AS LEN(Data) PERSISTED;
CREATE INDEX ndxTableLenData on Table(LenData) INCLUDE (Data);
Now you can restrict the sheer problem space by joining within a max difference on length (e.g. 5), if your data's LEN(Data) varies significantly:
SELECT a.Data, b.Data, dbo.Levenshtein(a.Data, b.Data)
FROM Table A
JOIN Table B ON B.LenData BETWEEN A.LenData - 5 AND A.LenData + 5
Finally got it...though it's not as beneficial as I had hoped
public static int DamerauLevenshteinDistance(string string1, string string2, int threshold)
{
    // Return trivial case - where they are equal
    if (string1.Equals(string2))
        return 0;

    // Return trivial case - where one is empty
    if (String.IsNullOrEmpty(string1) || String.IsNullOrEmpty(string2))
        return (string1 ?? "").Length + (string2 ?? "").Length;

    // Ensure string2 (inner cycle) is longer
    if (string1.Length > string2.Length)
    {
        var tmp = string1;
        string1 = string2;
        string2 = tmp;
    }

    // Return trivial case - where string1 is contained within string2
    if (string2.Contains(string1))
        return string2.Length - string1.Length;

    var length1 = string1.Length;
    var length2 = string2.Length;
    var d = new int[length1 + 1, length2 + 1];

    for (var i = 0; i <= d.GetUpperBound(0); i++)
        d[i, 0] = i;
    for (var i = 0; i <= d.GetUpperBound(1); i++)
        d[0, i] = i;

    for (var i = 1; i <= d.GetUpperBound(0); i++)
    {
        var im1 = i - 1;
        var im2 = i - 2;
        var minDistance = threshold;
        for (var j = 1; j <= d.GetUpperBound(1); j++)
        {
            var jm1 = j - 1;
            var jm2 = j - 2;
            var cost = string1[im1] == string2[jm1] ? 0 : 1;
            var del = d[im1, j] + 1;
            var ins = d[i, jm1] + 1;
            var sub = d[im1, jm1] + cost;

            // Math.Min is slower than native code
            // d[i, j] = Math.Min(del, Math.Min(ins, sub));
            d[i, j] = del <= ins && del <= sub ? del : ins <= sub ? ins : sub;

            if (i > 1 && j > 1 && string1[im1] == string2[jm2] && string1[im2] == string2[jm1])
                d[i, j] = Math.Min(d[i, j], d[im2, jm2] + cost);

            if (d[i, j] < minDistance)
                minDistance = d[i, j];
        }
        if (minDistance > threshold)
            return int.MaxValue;
    }
    return d[d.GetUpperBound(0), d.GetUpperBound(1)] > threshold
        ? int.MaxValue
        : d[d.GetUpperBound(0), d.GetUpperBound(1)];
}
