I've been trying to solve this interview problem which asks to shuffle a string so that no two adjacent letters are identical
For example,
ABCC -> ACBC
The approach I'm thinking of is to
1) Iterate over the input string and store the (letter, frequency)
pairs in some collection
2) Now build a result string by pulling the highest frequency (that is > 0) letter that we didn't just pull
3) Update (decrement) the frequency whenever we pull a letter
4) return the result string if all letters have zero frequency
5) return error if we're left with only one letter with frequency greater than 1
With this approach we can save the more precious (less frequent) letters for last. But for this to work, we need a collection that lets us efficiently query a key and at the same time efficiently sort it by values. Something like this would work except we need to keep the collection sorted after every letter retrieval.
I'm assuming Unicode characters.
Any ideas on what collection to use? Or an alternative approach?
You can sort the letters by frequency, split the sorted list in half, and construct the output by taking letters from the two halves in turn. This takes a single sort.
Example:
Initial string: ACABBACAB
Sort: AAAABBBCC
Split: AAAA+BBBCC
Combine: ABABABCAC
If the number of letters of highest frequency exceeds half the length of the string, the problem has no solution.
Why not use two Data Structures: One for sorting (Like a Heap) and one for key retrieval, like a Dictionary?
The accepted answer may produce a correct result, but is likely not the 'correct' answer to this interview brain teaser, nor the most efficient algorithm.
The simple answer is to take the premise of a basic sorting algorithm and alter the looping predicate to check for adjacency rather than magnitude. This ensures that the 'sorting' operation is the only step required, and (like all good sorting algorithms) does the least amount of work possible.
Below is a c# example akin to insertion sort for simplicity (though many sorting algorithm could be similarly adjusted):
string NonAdjacencySort(string stringInput)
{
var input = stringInput.ToCharArray();
for(var i = 0; i < input.Length; i++)
{
var j = i;
while(j > 0 && j < input.Length - 1 &&
(input[j+1] == input[j] || input[j-1] == input[j]))
{
var tmp = input[j];
input[j] = input[j-1];
input[j-1] = tmp;
j--;
}
if(input[1] == input[0])
{
var tmp = input[0];
input[0] = input[input.Length-1];
input[input.Length-1] = tmp;
}
}
return new string(input);
}
The major change to standard insertion sort is that the function has to both look ahead and behind, and therefore needs to wrap around to the last index.
A final point is that this type of algorithm fails gracefully, providing a result with the fewest consecutive characters (grouped at the front).
Since I somehow got convinced to expand an off-hand comment into a full algorithm, I'll write it out as an answer, which must be more readable than a series of uneditable comments.
The algorithm is pretty simple, actually. It's based on the observation that if we sort the string and then divide it into two equal-length halves, plus the middle character if the string has odd length, then corresponding positions in the two halves must differ from each other, unless there is no solution. That's easy to see: if the two characters are the same, then so are all the characters between them, which totals ⌈n/2⌉+1 characters. But a solution is only possible if there are no more than ⌈n/2⌉ instances of any single character.
So we can proceed as follows:
Sort the string.
If the string's length is odd, output the middle character.
Divide the string (minus its middle character if the length is odd) into two equal-length halves, and interleave the two halves.
At each point in the interleaving, since the pair of characters differ from each other (see above), at least one of them must differ from the last character output. So we first output that character and then the corresponding one from the other half.
The sample code below is in C++, since I don't have a C# environment handy to test with. It's also simplified in two ways, both of which would be easy enough to fix at the cost of obscuring the algorithm:
If at some point in the interleaving, the algorithm encounters a pair of identical characters, it should stop and report failure. But in the sample implementation below, which has an overly simple interface, there's no way to report failure. If there is no solution, the function below returns an incorrect solution.
The OP suggests that the algorithm should work with Unicode characters, but the complexity of correctly handling multibyte encodings didn't seem to add anything useful to explain the algorithm. So I just used single-byte characters. (In C# and certain implementations of C++, there is no character type wide enough to hold a Unicode code point, so astral plane characters must be represented with a surrogate pair.)
#include <algorithm>
#include <iostream>
#include <string>
// If possible, rearranges 'in' so that there are no two consecutive
// instances of the same character.
std::string rearrange(std::string in) {
// Sort the input. The function is call-by-value,
// so the argument itself isn't changed.
std::string out;
size_t len = in.size();
if (in.size()) {
out.reserve(len);
std::sort(in.begin(), in.end());
size_t mid = len / 2;
size_t tail = len - mid;
char prev = in[mid];
// For odd-length strings, start with the middle character.
if (len & 1) out.push_back(prev);
for (size_t head = 0; head < mid; ++head, ++tail)
// See explanatory text
if (in[tail] != prev) {
out.push_back(in[tail]);
out.push_back(prev = in[head]);
}
else {
out.push_back(in[head]);
out.push_back(prev = in[tail]);
}
}
}
return out;
}
you can do that by using a priority queue.
Please find the below explanation.
https://iq.opengenus.org/rearrange-string-no-same-adjacent-characters/
Here is a probabilistic approach. The algorithm is:
10) Select a random char from the input string.
20) Try to insert the selected char in a random position in the output string.
30) If it can't be inserted because of proximity with the same char, go to 10.
40) Remove the selected char from the input string and go to 10.
50) Continue until there are no more chars in the input string, or the failed attempts are too many.
public static string ShuffleNoSameAdjacent(string input, Random random = null)
{
if (input == null) return null;
if (random == null) random = new Random();
string output = "";
int maxAttempts = input.Length * input.Length * 2;
int attempts = 0;
while (input.Length > 0)
{
while (attempts < maxAttempts)
{
int inputPos = random.Next(0, input.Length);
var outputPos = random.Next(0, output.Length + 1);
var c = input[inputPos];
if (outputPos > 0 && output[outputPos - 1] == c)
{
attempts++; continue;
}
if (outputPos < output.Length && output[outputPos] == c)
{
attempts++; continue;
}
input = input.Remove(inputPos, 1);
output = output.Insert(outputPos, c.ToString());
break;
}
if (attempts >= maxAttempts) throw new InvalidOperationException(
$"Shuffle failed to complete after {attempts} attempts.");
}
return output;
}
Not suitable for strings longer than 1,000 chars!
Update: And here is a more complicated deterministic approach. The algorithm is:
Group the elements and sort the groups by length.
Create three empty piles of elements.
Insert each group to a separate pile, inserting always the largest group to the smallest pile, so that the piles differ in length as little as possible.
Check that there is no pile with more than half the total elements, in which case satisfying the condition of not having same adjacent elements is impossible.
Shuffle the piles.
Start yielding elements from the piles, selecting a different pile each time.
When the piles that are eligible for selection are more than one, select randomly, weighting by the size of each pile. Piles containing near half of the remaining elements should be much preferred. For example if the remaining elements are 100 and the two eligible piles have 49 and 40 elements respectively, then the first pile should be 10 times more preferable than the second (because 50 - 49 = 1 and 50 - 40 = 10).
public static IEnumerable<T> ShuffleNoSameAdjacent<T>(IEnumerable<T> source,
Random random = null, IEqualityComparer<T> comparer = null)
{
if (source == null) yield break;
if (random == null) random = new Random();
if (comparer == null) comparer = EqualityComparer<T>.Default;
var grouped = source
.GroupBy(i => i, comparer)
.OrderByDescending(g => g.Count());
var piles = Enumerable.Range(0, 3).Select(i => new Pile<T>()).ToArray();
foreach (var group in grouped)
{
GetSmallestPile().AddRange(group);
}
int totalCount = piles.Select(e => e.Count).Sum();
if (piles.Any(pile => pile.Count > (totalCount + 1) / 2))
{
throw new InvalidOperationException("Shuffle is impossible.");
}
piles.ForEach(pile => Shuffle(pile));
Pile<T> previouslySelectedPile = null;
while (totalCount > 0)
{
var selectedPile = GetRandomPile_WeightedByLength();
yield return selectedPile[selectedPile.Count - 1];
selectedPile.RemoveAt(selectedPile.Count - 1);
totalCount--;
previouslySelectedPile = selectedPile;
}
List<T> GetSmallestPile()
{
List<T> smallestPile = null;
int smallestCount = Int32.MaxValue;
foreach (var pile in piles)
{
if (pile.Count < smallestCount)
{
smallestPile = pile;
smallestCount = pile.Count;
}
}
return smallestPile;
}
void Shuffle(List<T> pile)
{
for (int i = 0; i < pile.Count; i++)
{
int j = random.Next(i, pile.Count);
if (i == j) continue;
var temp = pile[i];
pile[i] = pile[j];
pile[j] = temp;
}
}
Pile<T> GetRandomPile_WeightedByLength()
{
var eligiblePiles = piles
.Where(pile => pile.Count > 0 && pile != previouslySelectedPile)
.ToArray();
Debug.Assert(eligiblePiles.Length > 0, "No eligible pile.");
eligiblePiles.ForEach(pile =>
{
pile.Proximity = ((totalCount + 1) / 2) - pile.Count;
pile.Score = 1;
});
Debug.Assert(eligiblePiles.All(pile => pile.Proximity >= 0),
"A pile has negative proximity.");
foreach (var pile in eligiblePiles)
{
foreach (var otherPile in eligiblePiles)
{
if (otherPile == pile) continue;
pile.Score *= otherPile.Proximity;
}
}
var sumScore = eligiblePiles.Select(p => p.Score).Sum();
while (sumScore > Int32.MaxValue)
{
eligiblePiles.ForEach(pile => pile.Score /= 100);
sumScore = eligiblePiles.Select(p => p.Score).Sum();
}
if (sumScore == 0)
{
return eligiblePiles[random.Next(0, eligiblePiles.Length)];
}
var randomScore = random.Next(0, (int)sumScore);
int accumulatedScore = 0;
foreach (var pile in eligiblePiles)
{
accumulatedScore += (int)pile.Score;
if (randomScore < accumulatedScore) return pile;
}
Debug.Fail("Could not select a pile randomly by weight.");
return null;
}
}
private class Pile<T> : List<T>
{
public int Proximity { get; set; }
public long Score { get; set; }
}
This implementation can suffle millions of elements. I am not completely convinced that the quality of the suffling is as perfect as the previous probabilistic implementation, but should be close.
func shuffle(str:String)-> String{
var shuffleArray = [Character](str)
//Sorting
shuffleArray.sort()
var shuffle1 = [Character]()
var shuffle2 = [Character]()
var adjacentStr = ""
//Split
for i in 0..<shuffleArray.count{
if i > shuffleArray.count/2 {
shuffle2.append(shuffleArray[i])
}else{
shuffle1.append(shuffleArray[i])
}
}
let count = shuffle1.count > shuffle2.count ? shuffle1.count:shuffle2.count
//Merge with adjacent element
for i in 0..<count {
if i < shuffle1.count{
adjacentStr.append(shuffle1[i])
}
if i < shuffle2.count{
adjacentStr.append(shuffle2[i])
}
}
return adjacentStr
}
let s = shuffle(str: "AABC")
print(s)
Related
I have been programming an object to calculate the DiceSorensen Distance between two strings. The logic of the operation is not so difficult. You calculate how many two letter pairs exist in a string, compare it with a second string and then perform this equation
2(x intersect y)/ (|x| . |y|)
where |x| and |y| is the number of bigram elements in x & y. Reference can be found here for further clarity https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
So I have tried looking up how to do the code online in various spots but every method I have come across uses the 'Intersect' method between two lists and as far as I am aware this won't work because if you have a string where the bigram already exists it won't add another one. For example if I had a string
'aaaa'
I would like there to be 3 'aa' bigrams but the Intersect method will only produce one, if i am incorrect on this assumption please tell me cause i wondered why so many people used the intersect method. My assumption is based on the MSDN website https://msdn.microsoft.com/en-us/library/bb460136(v=vs.90).aspx
So here is the code I have made
public static double SorensenDiceDistance(this string source, string target)
{
// formula 2|X intersection Y|
// --------------------
// |X| + |Y|
//create variables needed
List<string> bigrams_source = new List<string>();
List<string> bigrams_target = new List<string>();
int source_length;
int target_length;
double intersect_count = 0;
double result = 0;
Console.WriteLine("DEBUG: string length source is " + source.Length);
//base case
if (source.Length == 0 || target.Length == 0)
{
return 0;
}
//extract bigrams from string 1
bigrams_source = source.ListBiGrams();
//extract bigrams from string 2
bigrams_target = target.ListBiGrams();
source_length = bigrams_source.Count();
target_length = bigrams_target.Count();
Console.WriteLine("DEBUG: bigram counts are source: " + source_length + " . target length : " + target_length);
//now we have two sets of bigrams compare them in a non distinct loop
for (int i = 0; i < bigrams_source.Count(); i++)
{
for (int y = 0; y < bigrams_target.Count(); y++)
{
if (bigrams_source.ElementAt(i) == bigrams_target.ElementAt(y))
{
intersect_count++;
//Console.WriteLine("intersect count is :" + intersect_count);
}
}
}
Console.WriteLine("intersect line value : " + intersect_count);
result = (2 * intersect_count) / (source_length + target_length);
if (result < 0)
{
result = Math.Abs(result);
}
return result;
}
In the code somewhere you can see I call a method called listBiGrams and this is how it looks
public static List<string> ListBiGrams(this string source)
{
return ListNGrams(source, 2);
}
public static List<string> ListTriGrams(this string source)
{
return ListNGrams(source, 3);
}
public static List<string> ListNGrams(this string source, int n)
{
List<string> nGrams = new List<string>();
if (n > source.Length)
{
return null;
}
else if (n == source.Length)
{
nGrams.Add(source);
return nGrams;
}
else
{
for (int i = 0; i < source.Length - n; i++)
{
nGrams.Add(source.Substring(i, n));
}
return nGrams;
}
}
So my understanding of the code step by step is
1) pass in strings
2) 0 length check
3) create list and pass up bigrams into them
4) get the lengths of each bigram list
5) nested loop to check in source position[i] against every bigram in target string and then increment i until no more source list to check against
6) perform equation mentioned above taken from wikipedia
7) if result is negative Math.Abs it to return a positive result (however i know the result should be between 0 and 1 already this is what keyed me into knowing i was doing something wrong)
the source string i used is source = "this is not a correct string" and the target string was, target = "this is a correct string"
the result I got was -0.090909090908
I'm SURE (99%) that what I'm missing is something small like a mis-calculated length somewhere or a count mis-count. If anyone could point out what i'm doing wrong I'd be really grateful. Thank you for your time!
This looks like homework, yet this similarity metric on strings is new to me so I took a look.
Algorith implementation in various languages
As you may notice the C# version uses HashSet and takes advantage of the IntersectWith method.
A set is a collection that contains no duplicate elements, and whose
elements are in no particular order.
This solves your string 'aaaa' puzzle. Only one bigram there.
My naive implementation on Rextester
If you prefer Linq then I'd suggest Enumerable.Distinct, Enumerable.Union and Enumerable.Intersect. These should mimic very well the duplicate removal capabilities of the HashSet.
Also found this nice StringMetric framework written in Scala.
First of all, there is actually more restrictions than stated in the title. Plz readon.
say, i have a dictionary<char,int> where key acts as the item, and value means the number of occurrence in the output. (somewhat like weighting but without replacement)
e.g. ('a',2) ('b',3) ('c',1)
a possible output would be 'babcab'
I am thinking of the following way to implement it.
build a new list containing (accumulated weightings,char) as its entry.
randomly select an item from the list,
recalculate the accumulated weightings, also set the recent drawn item weighing as 0.
repeat.
to some extent there might be a situation like such: 'bacab' is generated, but can do no further (as only 'b' left, but the weighting is set to 0 as no immediate repetition allowed). in this case i discard all the results and start over from the very beginning.
Is there any other good approach?
Also, what if i skip the "set the corresponding weighting to 0" process, instead I reject any infeasible solution. e.g. already i got 'bab'. In the next rng selection i get 'b', then i redo the draw process, until i get something that is not 'b', and then continue. Does this perform better?
How about this recursive algorithm.
Create a list of all characters (candidate list), repeating them according to their weight.
Create an empty list of characters (your solution list)
Pick a random entry from the candidate list
If the selected item (character) is the same as the last in solution list then start scanning for another character in the candidate list (wrapping around if needed).
If no such character in step 4 can be found and candidate list is not empty then backtrack
Append the selected character to the solution list.
If the candidate list is empty print out the solution and 'backtrack', else go to step 3.
I'm not quite sure about the 'backtrack' step yet, but you should get the general idea.
Try this out, it should generate a (pseudo) random ordering of the elements in your enumerable. I'd recommend flattening from your dictionary to a list:
AKA Dictionary of
{b, 2}, {a, 3} becomes {b} {b} {a} {a} {a}
public static IEnumerable<T> RandomPermutation<T>(this IEnumerable<T> enumerable)
{
if (enumerable.Count() < 1)
throw new InvalidOperationException("Must have some elements yo");
Random random = new Random(DateTime.Now.Millisecond);
while (enumerable.Any())
{
int currentCount = enumerable.Count();
int randomIndex = random.Next(0, currentCount);
yield return enumerable.ElementAt(randomIndex);
if (randomIndex == 0)
enumerable = enumerable.Skip(1);
else if (randomIndex + 1 == currentCount)
enumerable = enumerable.Take(currentCount - 1);
else
{
T removeditem = enumerable.ElementAt(randomIndex);
enumerable = enumerable.Where(item => !item.Equals(removeditem));
}
}
}
If you need additional permutations, simply call it again for another random ordering. While this wont get you every permutation, you should find something useful. You can also use this as a base line to get a solution going.
This should be split into some seperate methods and could use some refactoring but the idea is to implement it in such a way that it does not depend on randomly moving things around till you get a valid result. That way you can't predict how long it would take
Concatenate all chars to a string and randomize that string
Loop through the randomized string and find any char that violates the rule
Remove that char from the string
Pick a random number. Use this number as "put the removed char at the nth valid position")
Loop around the remaining string to find the Nth valid postion to put the char back.
If there is no valid position left drop the char
Repeat from step 2 until no more violations are found
using System;
using System.Collections.Generic;
namespace RandomString
{
class Program
{
static void Main(string[] args)
{
Random rnd = new Random(DateTime.Now.Millisecond);
Dictionary<char, int> chars = new Dictionary<char, int> { { 'a', 2 }, { 'b', 3 }, { 'c', 1 } };
// Convert to a string with all chars
string basestring = "";
foreach (var pair in chars)
{
basestring += new String(pair.Key, pair.Value);
}
// Randomize the string
string randomstring = "";
while (basestring.Length > 0)
{
int randomIndex = rnd.Next(basestring.Length);
randomstring += basestring.Substring(randomIndex, 1);
basestring = basestring.Remove(randomIndex, 1);
}
// Now fix 'violations of the rule
// this can be optimized by not starting over each time but this is easier to read
bool done;
do
{
Console.WriteLine("Current string: " + randomstring);
done = true;
char lastchar = randomstring[0];
for (int i = 1; i < randomstring.Length; i++)
{
if (randomstring[i] == lastchar)
{
// uhoh violation of the rule. We pick a random position to move it to
// this means it gets placed at the nth location where it doesn't violate the rule
Console.WriteLine("Violation at position {0} ({1})", i, randomstring[i]);
done = false;
char tomove = randomstring[i];
randomstring = randomstring.Remove(i, 1);
int putinposition = rnd.Next(randomstring.Length);
Console.WriteLine("Moving to {0}th valid position", putinposition);
bool anyplacefound;
do
{
anyplacefound = false;
for (int replace = 0; replace < randomstring.Length; replace++)
{
if (replace == 0 || randomstring[replace - 1] != tomove)
{
// then no problem on the left side
if (randomstring[replace] != tomove)
{
// no problem right either. We can put it here
anyplacefound = true;
if (putinposition == 0)
{
randomstring = randomstring.Insert(replace, tomove.ToString());
break;
}
putinposition--;
}
}
}
} while (putinposition > 0 && anyplacefound);
break;
}
lastchar = randomstring[i];
}
} while (!done);
Console.WriteLine("Final string: " + randomstring);
Console.ReadKey();
}
}
}
I'm writing a C# application in which I need to search a file (could be very big) for a sequence of bytes, and I can't use any libraries to do so. So, I need a function that takes a byte array as an argument and returns the position of the byte following the given sequence. The function doesn't have to be fast, it simply has to work. Any help would be greatly appreciated :)
If it doesn't have to be fast you could use this:
int GetPositionAfterMatch(byte[] data, byte[]pattern)
{
for (int i = 0; i < data.Length - pattern.Length; i++)
{
bool match = true;
for (int k = 0; k < pattern.Length; k++)
{
if (data[i + k] != pattern[k])
{
match = false;
break;
}
}
if (match)
{
return i + pattern.Length;
}
}
}
But I really would recommend you to use Knuth-Morris-Pratt algorithm, it's the algorithm mostly used as a base of IndexOf methods for strings. The algorithm above will perform really slow, exept for small arrays and small patterns.
The straight-forward approach as pointed out by Turrau works, and for your purposes is probably good enough, since you say it doesn't have to be fast - especially since for most practical purposes the algorithm is much faster than the worst case O(n*m). (Depending on your pattern I guess).
For an optimal solution you can also check out the Knuth-Morris-Pratt algorithm, which makes use of partial matches which in the end is O(n+m).
Here's an extract of some code I used to do a boyer-moore type search. It's mean to work on pcap files, so it operates record by record, but should be easy enough to modify to suit just searching a long binary file. It's sort of extracted from some test code, so I hope I got everything for you to follow along. Also look up boyer-moore searching on wikipedia, since that is what it's based off of.
int[] badMatch = new int[256];
byte[] pattern; //the pattern we are searching for
//badMath is an array of every possible byte value (defined as static later).
//we use this as a jump table to know how many characters we can skip comparison on
//so first, we prefill every possibility with the length of our search string
for (int i = 0; i < badMatch.Length; i++)
{
badMatch[i] = pattern.Length;
}
//Now we need to calculate the individual maximum jump length for each byte that appears in my search string
for (int i = 0; i < pattern.Length - 1; i++)
{
badMatch[pattern[i] & 0xff] = pattern.Length - i - 1;
}
// Place the bytes you want to run the search against in the payload variable
byte[] payload = <bytes>
// search the packet starting at offset, and try to match the last character
// if we loop, we increment by whatever our jump value is
for (i = offset + pattern.Length - 1; i < end && cont; i += badMatch[payload[i] & 0xff])
{
// if our payload character equals our search string character, continue matching counting backwards
for (j = pattern.Length - 1, k = i; (j >= 0) && (payload[k] == pattern[j]) && cont; j--)
{
k--;
}
// if we matched every character, then we have a match, add it to the packet list, and exit the search (cont = false)
if (j == -1)
{
//we MATCHED!!!
//i = end;
cont = false;
}
}
I would like to automatically parse a range of numbered sequences from an already sorted List<FileData> of filenames by checking which part of the filename changes.
Here is an example (file extension has already been removed):
First filename: IMG_0000
Last filename: IMG_1000
Numbered Range I need: 0000 and 1000
Except I need to deal with every possible type of file naming convention such as:
0000 ... 9999
20080312_0000 ... 20080312_9999
IMG_0000 - Copy ... IMG_9999 - Copy
8er_green3_00001 .. 8er_green3_09999
etc.
I would like the entire 0-padded range e.g. 0001 not just 1
The sequence number is 0-padded e.g. 0001
The sequence number can be located anywhere e.g. IMG_0000 - Copy
The range can start and end with anything i.e. doesn't have to start with 1 and end with 9999
Numbers may appear multiple times in the filename of the sequence e.g. 20080312_0000
Whenever I get something working for 8 random test cases, the 9th test breaks everything and I end up re-starting from scratch.
I've currently been comparing only the first and last filenames (as opposed to iterating through all filenames):
void FindRange(List<FileData> files, out string startRange, out string endRange)
{
string firstFile = files.First().ShortName;
string lastFile = files.Last().ShortName;
...
}
Does anyone have any clever ideas? Perhaps something with Regex?
If you're guaranteed to know the files end with the number (eg. _\d+), and are sorted, just grab the first and last elements and that's your range. If the filenames are all the same, you can sort the list to get them in order numerically. Unless I'm missing something obvious here -- where's the problem?
Use a regex to parse out the numbers from the filenames:
^.+\w(\d+)[^\d]*$
From these parsed strings, find the maximum length, and left-pad any that are less than the maximum length with zeros.
Sort these padded strings alphabetically. Take the first and last from this sorted list to give you your min and max numbers.
Firstly, I will assume that the numbers are always zero-padded so that they are the same length. If not then bigger headaches lie ahead.
Secondly, assume that the file names are exactly the same apart from the increment number component.
If these assumptions are true then the algorithm should be to look at each character in the first and last filenames to determine which same-positioned characters do not match.
var start = String.Empty;
var end = String.Empty;
for (var index = 0; index < firstFile.Length; index++)
{
char c = firstFile[index];
if (filenames.Any(filename => filename[index] != c))
{
start += firstFile[index];
end += lastFile[index];
}
}
// convert to int if required
edit: Changed to check every filename until a difference is found. Not as efficient as it could be but very simple and straightforward.
Here is my solution. It works with all of the examples that you have provided and it assumes the input array to be sorted.
Note that it doesn't look exclusively for numbers; it looks for a consistent sequence of characters that might differ across all of the strings. So if you provide it with {"0000", "0001", "0002"} it will hand back "0" and "2" as the start and end strings, since that's the only part of the strings that differ. If you give it {"0000", "0010", "0100"}, it will give you back "00" and "10".
But if you give it {"0000", "0101"}, it will whine since the differing parts of the string are not contiguous. If you would like this behavior modified so it will return everything from the first differing character to the last, that's fine; I can make that change. But if you are feeding it a ton of filenames that will have sequential changes to the number region, this should not be a problem.
public static class RangeFinder
{
public static void FindRange(IEnumerable<string> strings,
out string startRange, out string endRange)
{
using (var e = strings.GetEnumerator()) {
if (!e.MoveNext())
throw new ArgumentException("strings", "No elements.");
if (e.Current == null)
throw new ArgumentException("strings",
"Null element encountered at index 0.");
var template = e.Current;
// If an element in here is true, it means that index differs.
var matchMatrix = new bool[template.Length];
int index = 1;
string last = null;
while (e.MoveNext()) {
if (e.Current == null)
throw new ArgumentException("strings",
"Null element encountered at index " + index + ".");
last = e.Current;
if (last.Length != template.Length)
throw new ArgumentException("strings",
"Element at index " + index + " has incorrect length.");
for (int i = 0; i < template.Length; i++)
if (last[i] != template[i])
matchMatrix[i] = true;
}
// Verify the matrix:
// * There must be at least one true value.
// * All true values must be consecutive.
int start = -1;
int end = -1;
for (int i = 0; i < matchMatrix.Length; i++) {
if (matchMatrix[i]) {
if (end != -1)
throw new ArgumentException("strings",
"Inconsistent match matrix; no usable pattern discovered.");
if (start == -1)
start = i;
} else {
if (start != -1 && end == -1)
end = i;
}
}
if (start == -1)
throw new ArgumentException("strings",
"Strings did not vary; no usable pattern discovered.");
if (end == -1)
end = matchMatrix.Length;
startRange = template.Substring(start, end - start);
endRange = last.Substring(start, end - start);
}
}
}
I have a list of constant numbers. I need to find the closest number to x in the list of the numbers. Any ideas on how to implement this algorithm?
Well, you cannot do this faster than O(N) because you have to check all numbers to be sure you have the closest one. That said, why not use a simple variation on finding the minimum, looking for the one with the minimum absolute difference with x?
If you can say the list is ordered from the beginning (and it allows random-access, like an array), then a better approach is to use a binary search. When you end the search at index i (without finding x), just pick the best out of that element and its neighbors.
I suppose that the array is unordered. In ordered it can be faster
I think that the simpliest and the fastest method is using linear algorithm for finding minimum or maximum but instead of comparing values you will compare absolute value of difference between this and needle.
In the C++ ( I can't C# but it will be similar ) can code look like this:
// array of numbers is haystack
// length is length of array
// needle is number which you are looking for ( or compare with )
int closest = haystack[0];
for ( int i = 0; i < length; ++i ) {
if ( abs( haystack[ i ] - needle ) < abs( closest - needle ) ) closest = haystack[i];
}
return closest;
In general people on this site won't do your homework for you. Since you didn't post code I won't post code either. However, here's one possible approach.
Loop through the list, subtracting the number in the list from x. Take the absolute value of this difference and compare it to the best previous result you've gotten and, if the current difference is less than the best previous result, save the current number from the list. At the end of the loop you'll have your answer.
private int? FindClosest(IEnumerable<int> numbers, int x)
{
return
(from number in numbers
let difference = Math.Abs(number - x)
orderby difference, Math.Abs(number), number descending
select (int?) number)
.FirstOrDefault();
}
Null means there was no closest number. If there are two numbers with the same difference, it will choose the one closest to zero. If two numbers are the same distance from zero, the positive number will be chosen.
Edit in response to Eric's comment:
Here is a version which has the same semantics, but uses the Min operator. It requires an implementation of IComparable<> so we can use Min while preserving the number that goes with each distance. I also made it an extension method for ease-of-use:
public static int? FindClosestTo(this IEnumerable<int> numbers, int targetNumber)
{
var minimumDistance = numbers
.Select(number => new NumberDistance(targetNumber, number))
.Min();
return minimumDistance == null ? (int?) null : minimumDistance.Number;
}
private class NumberDistance : IComparable<NumberDistance>
{
internal NumberDistance(int targetNumber, int number)
{
this.Number = number;
this.Distance = Math.Abs(targetNumber - number);
}
internal int Number { get; private set; }
internal int Distance { get; private set; }
public int CompareTo(NumberDistance other)
{
var comparison = this.Distance.CompareTo(other.Distance);
if(comparison == 0)
{
// When they have the same distance, pick the number closest to zero
comparison = Math.Abs(this.Number).CompareTo(Math.Abs(other.Number));
if(comparison == 0)
{
// When they are the same distance from zero, pick the positive number
comparison = this.Number.CompareTo(other.Number);
}
}
return comparison;
}
}
It can be done using SortedList:
Blog post on finding closest number
If the complexity you're looking for counts only the searching the complexity is O(log(n)). The list building will cost O(n*log(n))
If you're going to insert item to the list much more times than you're going to query it for the closest number then the best choice is to use List and use naive algorithm to query it for the closest number. Each search will cost O(n) but time to insert will be reduced to O(n).
General complexity: If the collection has n numbers and searched q times -
List: O(n+q*n)
Sorted List: O(n*log(n)+q*log(n))
Meaning, from some q the sorted list will provide better complexity.
Being lazy I have not check this but shouldn't this work
private int FindClosest(IEnumerable<int> numbers, int x)
{
return
numbers.Aggregate((r,n) => Math.Abs(r-x) > Math.Abs(n-x) ? n
: Math.Abs(r-x) < Math.Abs(n-x) ? r
: r < x ? n : r);
}
Haskell:
import Data.List (minimumBy)
import Data.Ord (comparing)
findClosest :: (Num a, Ord a) => a -> [a] -> Maybe a
findClosest _ [] = Nothing
findClosest n xs = Just $ minimumBy (comparing $ abs . (+ n)) xs
Performance wise custom code will be more use full.
List<int> results;
int targetNumber = 0;
int nearestValue=0;
if (results.Any(ab => ab == targetNumber ))
{
nearestValue= results.FirstOrDefault<int>(i => i == targetNumber );
}
else
{
int greaterThanTarget = 0;
int lessThanTarget = 0;
if (results.Any(ab => ab > targetNumber ))
{
greaterThanTarget = results.Where<int>(i => i > targetNumber ).Min();
}
if (results.Any(ab => ab < targetNumber ))
{
lessThanTarget = results.Where<int>(i => i < targetNumber ).Max();
}
if (lessThanTarget == 0 )
{
nearestValue= greaterThanTarget;
}
else if (greaterThanTarget == 0)
{
nearestValue= lessThanTarget;
}
else if (targetNumber - lessThanTarget < greaterThanTarget - targetNumber )
{
nearestValue= lessThanTarget;
}
else
{
nearestValue= greaterThanTarget;
}
}