Find the closest number in a list of numbers - c#

I have a list of constant numbers. I need to find the closest number to x in the list of the numbers. Any ideas on how to implement this algorithm?

Well, you cannot do this faster than O(N) because you have to check all numbers to be sure you have the closest one. That said, why not use a simple variation on finding the minimum, looking for the one with the minimum absolute difference with x?
If you can say the list is ordered from the beginning (and it allows random-access, like an array), then a better approach is to use a binary search. When you end the search at index i (without finding x), just pick the best out of that element and its neighbors.
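A minimal sketch of that binary-search approach, assuming the list is already sorted ascending (the method name FindClosestSorted is illustrative, not from the question). List<T>.BinarySearch returns the bitwise complement of the insertion point when the value is not present:
using System.Collections.Generic;

static int FindClosestSorted(List<int> sorted, int x)
{
    int i = sorted.BinarySearch(x);
    if (i >= 0) return sorted[i];                                 // exact match

    int insert = ~i;                                              // index of the first element greater than x
    if (insert == 0) return sorted[0];                            // x is below the smallest element
    if (insert == sorted.Count) return sorted[sorted.Count - 1];  // x is above the largest element

    int below = sorted[insert - 1];
    int above = sorted[insert];
    return (x - below) <= (above - x) ? below : above;
}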

I suppose that the array is unordered. If it is ordered, it can be faster.
I think that the simplest and fastest method is to use the linear algorithm for finding a minimum or maximum, but instead of comparing values you compare the absolute value of the difference between each element and the needle.
In C++ (I don't know C#, but it will be similar) the code can look like this:
#include <cstdlib> // for abs

// haystack is the array of numbers, length is its length,
// needle is the number you are looking for (or compare with)
int findClosest(const int haystack[], int length, int needle) {
    int closest = haystack[0];
    for (int i = 1; i < length; ++i) {
        if (abs(haystack[i] - needle) < abs(closest - needle))
            closest = haystack[i];
    }
    return closest;
}

In general people on this site won't do your homework for you. Since you didn't post code I won't post code either. However, here's one possible approach.
Loop through the list, subtracting the number in the list from x. Take the absolute value of this difference and compare it to the best previous result you've gotten and, if the current difference is less than the best previous result, save the current number from the list. At the end of the loop you'll have your answer.

private int? FindClosest(IEnumerable<int> numbers, int x)
{
    return
        (from number in numbers
         let difference = Math.Abs(number - x)
         orderby difference, Math.Abs(number), number descending
         select (int?) number)
        .FirstOrDefault();
}
Null means there was no closest number. If there are two numbers with the same difference, it will choose the one closest to zero. If two numbers are the same distance from zero, the positive number will be chosen.
Edit in response to Eric's comment:
Here is a version which has the same semantics, but uses the Min operator. It requires an implementation of IComparable<> so we can use Min while preserving the number that goes with each distance. I also made it an extension method for ease-of-use:
public static int? FindClosestTo(this IEnumerable<int> numbers, int targetNumber)
{
    var minimumDistance = numbers
        .Select(number => new NumberDistance(targetNumber, number))
        .Min();
    return minimumDistance == null ? (int?) null : minimumDistance.Number;
}

private class NumberDistance : IComparable<NumberDistance>
{
    internal NumberDistance(int targetNumber, int number)
    {
        this.Number = number;
        this.Distance = Math.Abs(targetNumber - number);
    }

    internal int Number { get; private set; }
    internal int Distance { get; private set; }

    public int CompareTo(NumberDistance other)
    {
        var comparison = this.Distance.CompareTo(other.Distance);
        if (comparison == 0)
        {
            // When they have the same distance, pick the number closest to zero
            comparison = Math.Abs(this.Number).CompareTo(Math.Abs(other.Number));
            if (comparison == 0)
            {
                // When they are the same distance from zero, pick the positive number
                comparison = this.Number.CompareTo(other.Number);
            }
        }
        return comparison;
    }
}

It can be done using SortedList:
Blog post on finding closest number
If the complexity you're measuring counts only the search, the complexity is O(log(n)). Building the list will cost O(n*log(n)).
If you're going to insert items into the collection far more often than you query it for the closest number, then the best choice is to use a plain List and the naive algorithm for the closest-number query. Each search will cost O(n), but the total time to insert is reduced to O(n).
General complexity: if the collection has n numbers and is searched q times -
List: O(n + q*n)
Sorted list: O(n*log(n) + q*log(n))
Meaning, beyond some q the sorted list will provide better complexity.
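A hedged sketch of the sorted-collection idea, using SortedSet<int> and GetViewBetween rather than the SortedList from the (elided) blog post; the method name is illustrative:
using System.Collections.Generic;

static int? FindClosest(SortedSet<int> numbers, int x)
{
    if (numbers.Count == 0) return null;
    if (numbers.Contains(x)) return x;

    // Nearest candidate on each side of x.
    var below = numbers.GetViewBetween(int.MinValue, x);
    var above = numbers.GetViewBetween(x, int.MaxValue);

    if (below.Count == 0) return above.Min;
    if (above.Count == 0) return below.Max;
    return (x - below.Max) <= (above.Min - x) ? below.Max : above.Min;
}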

Being lazy, I have not checked this, but shouldn't this work?
private int FindClosest(IEnumerable<int> numbers, int x)
{
return
numbers.Aggregate((r,n) => Math.Abs(r-x) > Math.Abs(n-x) ? n
: Math.Abs(r-x) < Math.Abs(n-x) ? r
: r < x ? n : r);
}

Haskell:
import Data.List (minimumBy)
import Data.Ord (comparing)
findClosest :: (Num a, Ord a) => a -> [a] -> Maybe a
findClosest _ [] = Nothing
findClosest n xs = Just $ minimumBy (comparing $ abs . subtract n) xs

Performance-wise, custom code will be more useful.
List<int> results; // assumed to be populated with the candidate numbers
int targetNumber = 0;
int nearestValue = 0;
if (results.Any(ab => ab == targetNumber))
{
    nearestValue = results.FirstOrDefault<int>(i => i == targetNumber);
}
else
{
    int greaterThanTarget = 0;
    int lessThanTarget = 0;
    if (results.Any(ab => ab > targetNumber))
    {
        greaterThanTarget = results.Where<int>(i => i > targetNumber).Min();
    }
    if (results.Any(ab => ab < targetNumber))
    {
        lessThanTarget = results.Where<int>(i => i < targetNumber).Max();
    }
    if (lessThanTarget == 0)
    {
        nearestValue = greaterThanTarget;
    }
    else if (greaterThanTarget == 0)
    {
        nearestValue = lessThanTarget;
    }
    else if (targetNumber - lessThanTarget < greaterThanTarget - targetNumber)
    {
        nearestValue = lessThanTarget;
    }
    else
    {
        nearestValue = greaterThanTarget;
    }
}

Related

Is there a way to find the closest number in an Array to another inputed number?

So I have a Visual Studio Forms app with a NumericUpDown control that allows users to input a 5-digit number such as 09456. And I need to be able to compare that number to an already existing array of similar 5-digit numbers, so essentially I need to take the inputted number and find the closest number to it in the array.
var numbers = new List<float> {89456f, 23467f, 86453f, };
// the list is way longer but you get the idea
var target = numericUpDown.3 ;
var closest = numbers.Select(n => new { n, (n - target) })
.OrderBy(p => p.distance)
.First().n;
But the first problem I encounter is that I cannot use a "-" operation on a float. Is there any way I can avoid that error and be able to still find the closest input?
Anonymous type members need names, and you need to use the absolute value of the difference, e.g.:
var numbers = new List<float> { 89456f, 23467f, 86453f, };
var target = 3;
var closest = numbers.Select(n => new { n, distance = Math.Abs(n - target) })
.OrderBy(p => p.distance)
.First().n;
Well, apart from some issues in your sample (like no distance property on float) it should work:
int target = 55555;
float closest = numbers.OrderBy(f => Math.Abs(f - target)).First();
Demo: https://dotnetfiddle.net/gqS50L
The answers that use OrderBy are correct, but have less than optimal performance. OrderBy is an O(N log N) operation, but why sort the whole collection when you only need the top element? By contrast, MinBy will give you the result in O(N) time:
var closest = numbers.MinBy(n => Math.Abs(n - target));
Apart from the compilation errors, using LINQ for this is very slow and time consuming. The entire list has to be scanned once to find the distance, then it needs to be sorted, which scans it all over again and caches the results before returning them in order.
Before .NET 6
A faster way would be to iterate only once, calculating the distance of the current item from the target, and keep track of which number is closest. That's how eg Min and Max work.
public static float? Closest(this IEnumerable<float> list, float target)
{
    float? closest = null;
    float bestDist = float.MaxValue;
    foreach (var n in list)
    {
        var dist = Math.Abs(n - target);
        if (dist < bestDist)
        {
            bestDist = dist;
            closest = n;
        }
    }
    return closest;
}
This will return the closest number in a single pass.
var numbers = new List<float> { 89456f, 23467f, 86453f, };
var closest=numbers.Closest(20000);
Console.WriteLine($"Closest is {closest}");
------------------
Closest is 23467
Using MoreLINQ and MinBy
The same can be done in a single line using the MinBy extension method from the MoreLINQ library:
var closest=numbers.MinBy(n=>Math.Abs(n-target));
Using MinBy
In .NET 6 and later, Enumerable.MinBy was added to the BCL:
var closest=numbers.MinBy(n=>Math.Abs(n-target));
The code is similar to the explicit loop once you look past the generic key selectors and comparers:
while (e.MoveNext())
{
    TSource nextValue = e.Current;
    TKey nextKey = keySelector(nextValue);
    if (nextKey != null && comparer.Compare(nextKey, key) < 0)
    {
        key = nextKey;
        value = nextValue;
    }
}

Massive amount number comparison using c#

Comparison of number sets is too slow. What is a more efficient way to solve this problem?
I have two groups of sets. Each group has about 5 million sets, each set has 6 numbers, and each number is between 1 and 100. The sets and groups are neither sorted nor deduplicated.
Following is Example.
No. Group A Group B
1 {1,2,3,4,5,6} {6,2,4,87,53,12}
2 {2,3,4,5,6,8} {43,6,78,23,96,24}
3 {45,23,57,79,23,76} {12,1,90,3,2,23}
4 {3,5,85,24,78,90} {12,65,78,9,23,13}
... ...
My goal is to compare the two groups and classify Group A by maximum common element count, within 5 hours on my laptop.
In the example, No. 1 of Group A and No. 3 of Group B have 3 common elements (1, 2, 3).
Also, No. 2 of Group A and No. 3 of Group B have 2 common elements (2, 3). Therefore I will classify Group A as follows.
No. Group A Maximum Common Element Count
1 {1,2,3,4,5,6} 3
2 {2,3,4,5,6,8} 3
3 {45,23,57,79,23,76} 1
4 {3,5,85,24,78,90} 2
...
My approach is to compare every set and every number, so the complexity is (Group A count) * (Group B count) * 6 * 6. Therefore it takes too much time.
Dictionary<int, List<int>> Classified = new Dictionary<int, List<int>>();
foreach (List<int> setA in GroupA)
{
    int maxcount = 0;
    foreach (List<int> setB in GroupB)
    {
        int count = 0;
        foreach (int elementA in setA)
        {
            foreach (int elementB in setB)
            {
                if (elementA == elementB) count++;
            }
        }
        if (count > maxcount) maxcount = count;
    }
    Classified.Add(maxcount, setA);
}
Here is my attempt - using a HashSet<int> and precalculating the range of each set to avoid set-to-set comparisons like {1,2,3,4,5,6} and {7,8,9,10,11,12} (as pointed out by Matt's answer).
For me (running with random sets) it resulted in a 130x speed improvement on the original code. You mentioned in a comment that
Now execution time is over 3 days, so as others said I need parallelization.
and in the question itself that
My goal is compare two groups and classify Group A by maximum common element count in 5hrs on my laptop.
so assuming that the comment means that the execution time for your data exceeded 3 days (72 hours), but you want it to complete in 5 hours, you'd only need something like a 14x speed increase.
Framework
I've created some classes to run these benchmarks:
Range - takes some int values, and keeps track of the minimum and maximum values.
public class Range
{
    private readonly int _min;
    private readonly int _max;

    public Range(IReadOnlyCollection<int> values)
    {
        _min = values.Min();
        _max = values.Max();
    }

    public int Min { get { return _min; } }
    public int Max { get { return _max; } }

    public bool Intersects(Range other)
    {
        // The ranges are disjoint when one lies entirely above the other.
        if ( _min > other._max )
            return false;
        if ( _max < other._min )
            return false;
        return true;
    }
}
SetWithRange - wraps a HashSet<int> and a Range of the values.
public class SetWithRange : IEnumerable<int>
{
    private readonly HashSet<int> _values;
    private readonly Range _range;

    public SetWithRange(IReadOnlyCollection<int> values)
    {
        _values = new HashSet<int>(values);
        _range = new Range(values);
    }

    public static SetWithRange Random(Random random, int size, Range range)
    {
        var values = new HashSet<int>();
        // Random.Next(int, int) generates numbers in the range [min, max)
        // so we need to add one here to be able to generate numbers in [min, max].
        // See https://learn.microsoft.com/en-us/dotnet/api/system.random.next
        var min = range.Min;
        var max = range.Max + 1;
        while ( values.Count() < size )
            values.Add(random.Next(min, max));
        return new SetWithRange(values);
    }

    public int CommonValuesWith(SetWithRange other)
    {
        // No need to call Intersect on the sets if the ranges don't intersect
        if ( !_range.Intersects(other._range) )
            return 0;
        return _values.Intersect(other._values).Count();
    }

    public IEnumerator<int> GetEnumerator()
    {
        return _values.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
The results were generated using SetWithRange.Random as follows:
const int groupCount = 10000;
const int setSize = 6;
var range = new Range(new[] { 1, 100 });
var generator = new Random();
var groupA = Enumerable.Range(0, groupCount)
    .Select(i => SetWithRange.Random(generator, setSize, range))
    .ToList();
var groupB = Enumerable.Range(0, groupCount)
    .Select(i => SetWithRange.Random(generator, setSize, range))
    .ToList();
The timings given below are for an average of three x64 release build runs on my machine.
For all cases I generated groups with 10000 random sets then scaled up to approximate the execution time for 5 million sets by using
timeFor5Million = timeFor10000 / 10000 / 10000 * 5000000 * 5000000
= timeFor10000 * 250000
Results
Four foreach blocks:
Average time = 48628ms; estimated time for 5 million sets = 3377 hours
var result = new Dictionary<SetWithRange, int>();
foreach ( var setA in groupA )
{
    int maxcount = 0;
    foreach ( var setB in groupB )
    {
        int count = 0;
        foreach ( var elementA in setA )
        {
            foreach ( int elementB in setB )
            {
                if ( elementA == elementB )
                    count++;
            }
        }
        if ( count > maxcount ) maxcount = count;
    }
    result.Add(setA, maxcount);
}
Three foreach blocks with parallelisation on the outer foreach:
Average time = 10305ms; estimated time for 5 million sets = 716 hours (4.7 times faster than original):
var result = new Dictionary<SetWithRange, int>();
Parallel.ForEach(groupA, setA =>
{
    int maxcount = 0;
    foreach ( var setB in groupB )
    {
        int count = 0;
        foreach ( var elementA in setA )
        {
            foreach ( int elementB in setB )
            {
                if ( elementA == elementB )
                    count++;
            }
        }
        if ( count > maxcount ) maxcount = count;
    }
    lock ( result )
        result.Add(setA, maxcount);
});
Using HashSet<int> and adding a Range to only check sets which intersect:
Average time = 375ms; estimated time for 5 million sets = 24 hours (130 times faster than original):
var result = new Dictionary<SetWithRange, int>();
Parallel.ForEach(groupA, setA =>
{
    var commonValues = groupB.Max(setB => setA.CommonValuesWith(setB));
    lock ( result )
        result.Add(setA, commonValues);
});
Link to a working online demo here: https://dotnetfiddle.net/Kxpagh (note that .NET Fiddle limits execution times to 10 seconds, and that for obvious reasons its results are slower than running in a normal environment).
Fastest I can think of is this:
As all your numbers come from a limited range (1-100), you can express each of your sets as a 100-digit binary number <d1,d2,...,d100> where dn equals 1 iff n is in the set.
Then comparing two sets means a binary AND on the two binary representations and counting the set bits (which can be done efficiently)
In addition to that, this task can be parallelized (your input is immutable, so it's quite straightforward).
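A rough sketch of that bit-trick (assuming .NET Core 3.0+ for System.Numerics.BitOperations.PopCount; the type and method names are illustrative): since all values are between 1 and 100, each set fits in two 64-bit words, and counting common elements becomes two ANDs plus two popcounts.
using System.Collections.Generic;
using System.Numerics;

readonly struct BitSet100
{
    private readonly ulong _lo; // bits for values 1..64
    private readonly ulong _hi; // bits for values 65..100

    public BitSet100(IEnumerable<int> values)
    {
        ulong lo = 0, hi = 0;
        foreach (var v in values)
        {
            if (v <= 64) lo |= 1UL << (v - 1);
            else         hi |= 1UL << (v - 65);
        }
        _lo = lo;
        _hi = hi;
    }

    // Number of distinct elements common to both sets: AND, then count the set bits.
    public int CommonCount(BitSet100 other) =>
        BitOperations.PopCount(_lo & other._lo) +
        BitOperations.PopCount(_hi & other._hi);
}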
You would have to benchmark it with smaller sets, but since you're going to have to do 5E6 * 5E6 = 25E12 comparisons, you might as well sort the contents of the 5E6 + 5E6 = 10E6 sets first.
Then the set-to-set comparisons become much faster, since you can stop each comparison as soon as you pass the highest number on the first side of the comparison. Minuscule savings per set comparison, but trillions of times over, it adds up.
You could also go further and index the two groups of five million by lowest entry and highest entry. You would further cut down the number of comparisons significantly. In the end, that's only 100 * 100 = 10,000 = 1E4 distinct buckets. You would never have to compare sets whose highest number is, for instance, 12 with any sets that start at 13 or more, effectively avoiding a ton of work.
In my mind, this means sorting a lot of data, but it pales in comparison to the number of actual set-to-set comparisons you would have to do raw. Here, you are eliminating a ton of work up front and are able to abort early, if the conditions are right, when you do a compare.
And as others have said, parallelization...
PS: 5E6 = 5 * 10^6 = 5,000,000 and 25E12 = 25 * 10^12 = 25,000,000,000,000
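A small sketch of the presorting idea (assuming each set's contents have been sorted ascending once up front; the method name is illustrative): with both sides sorted, a merge-style scan counts common elements and stops as soon as either side is exhausted.
static int CommonCountSorted(int[] a, int[] b) // both sorted ascending
{
    int i = 0, j = 0, common = 0;
    while (i < a.Length && j < b.Length)
    {
        if (a[i] == b[j]) { common++; i++; j++; }
        else if (a[i] < b[j]) i++;
        else j++;
    }
    return common;
}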
The time complexity of any algorithm you come up with is going to be of the same order. HashSets might be a bit faster, but if they are it won't be by much - the overhead of 36 direct list comparisons vs 12 hashset lookups isn't going to be significantly higher, if at all, but you'll have to benchmark. Presorting might help a bit considering each set will be compared millions of times. Just FYI, for loops are faster than foreach loops on a List and arrays are faster than Lists (for and foreach on array is same performance), which for something like this might make a decent performance difference. If the No. column is sequential then I would use an array for that instead of a dictionary as well. Array lookups are an order of magnitude faster than dictionary lookups.
I think you are generally doing this as quickly as possible aside from parallelization though, with some small gains possible through the above micro-optimizations.
How far off from your target execution time is the current algorithm?
I would use the following:
foreach (List<int> setA in GroupA)
{
    int maxcount = GroupB.Max(x => x.Sum(y => setA.Contains(y) ? 1 : 0));
    Classified.Add(maxcount, setA);
}

How to Order By or Sort an integer List and select the Nth element

I have a list, and I want to select the fifth highest element from it:
List<int> list = new List<int>();
list.Add(2);
list.Add(18);
list.Add(21);
list.Add(10);
list.Add(20);
list.Add(80);
list.Add(23);
list.Add(81);
list.Add(27);
list.Add(85);
But OrderByDescending is not working for this int list...
int fifth = list.OrderByDescending(x => x).Skip(4).First();
Depending on how severe it is for the list to have fewer than 5 elements, you have two options.
If the list should never have fewer than 5 elements, I would catch it as an exception:
int fifth;
try
{
    fifth = list.OrderByDescending(x => x).ElementAt(4);
}
catch (ArgumentOutOfRangeException)
{
    // Handle the exception
}
If you expect that it will be less than 5 elements then you could leave it as default and check it for that.
int fifth = list.OrderByDescending(x => x).ElementAtOrDefault(4);
if (fifth == 0)
{
//handle default
}
This is still somewhat flawed, because you could end up having the fifth element be 0. This can be solved by casting the list into a list of nullable ints before the LINQ query:
var newList = list.Select(i => (int?)i).ToList();
int? fifth = newList.OrderByDescending(x => x).ElementAtOrDefault(4);
if (fifth == null)
{
//handle default
}
Without LINQ expressions:
int result;
if (list != null && list.Count >= 5)
{
    list.Sort();
    result = list[list.Count - 5];
}
else // define behavior when list is null OR has less than 5 elements
This has a better performance compared to LINQ expressions, although the LINQ solutions presented in my second answer are comfortable and reliable.
In case you need extreme performance for a huge List of integers, I'd recommend a more specialized algorithm, like in Matthew Watson's answer.
Attention: The List gets modified when the Sort() method is called. If you don't want that, you must work with a copy of your list, like this:
List<int> copy = new List<int>(original);
// or
List<int> copy = original.ToList();
The easiest way to do this is to just sort the data and take N items from the front. This is the recommended way for small data sets - anything more complicated is just not worth it otherwise.
However, for large data sets it can be a lot quicker to do what's known as a Partial Sort.
There are two main ways to do this: Use a heap, or use a specialised quicksort.
The article I linked describes how to use a heap. I shall present a partial sort below:
public static IList<T> PartialSort<T>(IList<T> data, int k) where T : IComparable<T>
{
    int start = 0;
    int end = data.Count - 1;
    while (end > start)
    {
        var index = partition(data, start, end);
        var rank = index + 1;
        if (rank >= k)
        {
            end = index - 1;
        }
        else if ((index - start) > (end - index))
        {
            quickSort(data, index + 1, end);
            end = index - 1;
        }
        else
        {
            quickSort(data, start, index - 1);
            start = index + 1;
        }
    }
    return data;
}

static int partition<T>(IList<T> lst, int start, int end) where T : IComparable<T>
{
    T x = lst[start];
    int i = start;
    for (int j = start + 1; j <= end; j++)
    {
        if (lst[j].CompareTo(x) < 0) // Or "> 0" to reverse sort order.
        {
            i = i + 1;
            swap(lst, i, j);
        }
    }
    swap(lst, start, i);
    return i;
}

static void swap<T>(IList<T> lst, int p, int q)
{
    T temp = lst[p];
    lst[p] = lst[q];
    lst[q] = temp;
}

static void quickSort<T>(IList<T> lst, int start, int end) where T : IComparable<T>
{
    if (start >= end)
        return;
    int index = partition(lst, start, end);
    quickSort(lst, start, index - 1);
    quickSort(lst, index + 1, end);
}
Then to access the 5th largest element in a list you could do this:
PartialSort(list, 5);
Console.WriteLine(list[4]);
For large data sets, a partial sort can be significantly faster than a full sort.
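For reference, the heap approach mentioned above could look something like this sketch (assuming .NET 6+ for PriorityQueue<TElement, TPriority>; the method name is illustrative): keep a min-heap of the k largest values seen so far, so its root is the k-th largest, giving roughly O(N log k) work instead of a full sort.
using System;
using System.Collections.Generic;

static int KthLargest(IEnumerable<int> data, int k)
{
    var heap = new PriorityQueue<int, int>(); // min-heap ordered by priority
    foreach (var value in data)
    {
        heap.Enqueue(value, value);
        if (heap.Count > k) heap.Dequeue();   // drop the smallest of the k+1 kept so far
    }
    if (heap.Count < k) throw new ArgumentException("Fewer than k elements.", nameof(data));
    return heap.Peek();                       // smallest of the k largest = k-th largest
}
For the question's example, KthLargest(list, 5) would return the fifth highest element.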
Addendum
See here for another (probably better) solution that uses a QuickSelect algorithm.
This LINQ approach retrieves the 5th biggest element OR throws an exception WHEN the list is null or contains less than 5 elements:
int fifth = list?.Count >= 5 ?
list.OrderByDescending(x => x).Take(5).Last() :
throw new Exception("list is null OR has not enough elements");
This one retrieves the 5th biggest element OR null WHEN the list is null or contains less than 5 elements:
int? fifth = list?.Count >= 5 ?
list.OrderByDescending(x => x).Take(5).Last() :
default(int?);
if(fifth == null) // define behavior
This one retrieves the 5th biggest element OR the smallest element WHEN the list contains less than 5 elements:
if(list == null || list.Count <= 0)
throw new Exception("Unable to retrieve Nth biggest element");
int fifth = list.OrderByDescending(x => x).Take(5).Last();
All these solutions are reliable, they should NEVER throw "unexpected" exceptions.
PS: I'm using .NET 4.7 in this answer.
Here there is a C# implementation of the QuickSelect algorithm to select the nth element in an unordered IList<>.
You have to put all the code contained in that page in a static class, like:
public static class QuickHelpers
{
// Put the code here
}
Given that "library" (in truth a big fat block of code), then you can:
int resA = list.QuickSelect(2, (x, y) => Comparer<int>.Default.Compare(y, x));
int resB = list.QuickSelect(list.Count - 1 - 2);
Now... Normally the QuickSelect would select the nth lowest element. We reverse it in two ways:
For resA we create a reverse comparer based on the default int comparer. We do this by reversing the parameters of the Compare method. Note that the index is 0-based, so there is a 0th, 1st, 2nd element and so on.
For resB we use the fact that the 0th element is the (Count - 1)th element in reverse order, so we count from the back. The highest element would be at list.Count - 1 in an ordered list, the next one at list.Count - 1 - 1, then list.Count - 1 - 2 and so on.
Theoretically, using QuickSelect should be better than ordering the list and then picking the nth element, because ordering a list is on average an O(N log N) operation and picking the nth element is then an O(1) operation, so the composite is an O(N log N) operation, while QuickSelect is on average an O(N) operation. Clearly there is a but: the O notation doesn't show the k factor... So an O(k1 * N log N) with a small k1 could be better than an O(k2 * N) with a big k2. Only multiple real-life benchmarks can tell us (you) what is better, and it depends on the size of the collection.
A small note about the algorithm:
As with quicksort, quickselect is generally implemented as an in-place algorithm, and beyond selecting the k'th element, it also partially sorts the data. See selection algorithm for further discussion of the connection with sorting.
So it modifies the ordering of the original list.

Shuffling a string so that no two adjacent letters are the same

I've been trying to solve this interview problem which asks to shuffle a string so that no two adjacent letters are identical
For example,
ABCC -> ACBC
The approach I'm thinking of is to
1) Iterate over the input string and store the (letter, frequency)
pairs in some collection
2) Now build a result string by pulling the highest frequency (that is > 0) letter that we didn't just pull
3) Update (decrement) the frequency whenever we pull a letter
4) return the result string if all letters have zero frequency
5) return error if we're left with only one letter with frequency greater than 1
With this approach we can save the more precious (less frequent) letters for last. But for this to work, we need a collection that lets us efficiently query a key and at the same time efficiently sort it by values. Something like this would work except we need to keep the collection sorted after every letter retrieval.
I'm assuming Unicode characters.
Any ideas on what collection to use? Or an alternative approach?
You can sort the letters by frequency, split the sorted list in half, and construct the output by taking letters from the two halves in turn. This takes a single sort.
Example:
Initial string: ACABBACAB
Sort: AAAABBBCC
Split: AAAA+BBBCC
Combine: ABABABCAC
If the number of letters of highest frequency exceeds half the length of the string, the problem has no solution.
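To make that concrete, here is a hedged C# sketch of the sort-by-frequency-and-interleave idea (the names are illustrative, and this is just one way to realise the answer above): group the characters by descending frequency, then write them into the even positions (0, 2, 4, ...) first and the odd positions afterwards, which amounts to splitting the frequency-sorted string into two halves and taking from them in turn. It returns null when the most frequent letter occurs more than ⌈n/2⌉ times.
using System.Linq;

static string Rearrange(string s)
{
    if (s.Length == 0) return s;

    var groups = s.GroupBy(c => c).ToList();
    if (groups.Max(g => g.Count()) > (s.Length + 1) / 2)
        return null; // no valid arrangement exists

    // Most frequent letters first, then fill even slots, then odd slots.
    var sorted = groups.OrderByDescending(g => g.Count()).SelectMany(g => g).ToArray();
    var result = new char[s.Length];
    int pos = 0;
    foreach (var c in sorted)
    {
        result[pos] = c;
        pos += 2;
        if (pos >= s.Length) pos = 1; // switch from even to odd positions
    }
    return new string(result);
}
For the question's example, Rearrange("ABCC") produces a string such as "CACB", which also satisfies the constraint.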
Why not use two Data Structures: One for sorting (Like a Heap) and one for key retrieval, like a Dictionary?
The accepted answer may produce a correct result, but is likely not the 'correct' answer to this interview brain teaser, nor the most efficient algorithm.
The simple answer is to take the premise of a basic sorting algorithm and alter the looping predicate to check for adjacency rather than magnitude. This ensures that the 'sorting' operation is the only step required, and (like all good sorting algorithms) does the least amount of work possible.
Below is a C# example akin to insertion sort for simplicity (though many sorting algorithms could be similarly adjusted):
string NonAdjacencySort(string stringInput)
{
    var input = stringInput.ToCharArray();
    for (var i = 0; i < input.Length; i++)
    {
        var j = i;
        while (j > 0 && j < input.Length - 1 &&
               (input[j+1] == input[j] || input[j-1] == input[j]))
        {
            var tmp = input[j];
            input[j] = input[j-1];
            input[j-1] = tmp;
            j--;
        }
        if (input[1] == input[0])
        {
            var tmp = input[0];
            input[0] = input[input.Length-1];
            input[input.Length-1] = tmp;
        }
    }
    return new string(input);
}
The major change to standard insertion sort is that the function has to both look ahead and behind, and therefore needs to wrap around to the last index.
A final point is that this type of algorithm fails gracefully, providing a result with the fewest consecutive characters (grouped at the front).
Since I somehow got convinced to expand an off-hand comment into a full algorithm, I'll write it out as an answer, which must be more readable than a series of uneditable comments.
The algorithm is pretty simple, actually. It's based on the observation that if we sort the string and then divide it into two equal-length halves, plus the middle character if the string has odd length, then corresponding positions in the two halves must differ from each other, unless there is no solution. That's easy to see: if the two characters are the same, then so are all the characters between them, which totals ⌈n/2⌉+1 characters. But a solution is only possible if there are no more than ⌈n/2⌉ instances of any single character.
So we can proceed as follows:
Sort the string.
If the string's length is odd, output the middle character.
Divide the string (minus its middle character if the length is odd) into two equal-length halves, and interleave the two halves.
At each point in the interleaving, since the pair of characters differ from each other (see above), at least one of them must differ from the last character output. So we first output that character and then the corresponding one from the other half.
The sample code below is in C++, since I don't have a C# environment handy to test with. It's also simplified in two ways, both of which would be easy enough to fix at the cost of obscuring the algorithm:
If at some point in the interleaving, the algorithm encounters a pair of identical characters, it should stop and report failure. But in the sample implementation below, which has an overly simple interface, there's no way to report failure. If there is no solution, the function below returns an incorrect solution.
The OP suggests that the algorithm should work with Unicode characters, but the complexity of correctly handling multibyte encodings didn't seem to add anything useful to explain the algorithm. So I just used single-byte characters. (In C# and certain implementations of C++, there is no character type wide enough to hold a Unicode code point, so astral plane characters must be represented with a surrogate pair.)
#include <algorithm>
#include <iostream>
#include <string>

// If possible, rearranges 'in' so that there are no two consecutive
// instances of the same character.
std::string rearrange(std::string in) {
    // Sort the input. The function is call-by-value,
    // so the argument itself isn't changed.
    std::string out;
    size_t len = in.size();
    if (in.size()) {
        out.reserve(len);
        std::sort(in.begin(), in.end());
        size_t mid = len / 2;
        size_t tail = len - mid;
        char prev = in[mid];
        // For odd-length strings, start with the middle character.
        if (len & 1) out.push_back(prev);
        for (size_t head = 0; head < mid; ++head, ++tail)
            // See explanatory text
            if (in[tail] != prev) {
                out.push_back(in[tail]);
                out.push_back(prev = in[head]);
            }
            else {
                out.push_back(in[head]);
                out.push_back(prev = in[tail]);
            }
    }
    return out;
}
You can do that by using a priority queue.
Please find the explanation below:
https://iq.opengenus.org/rearrange-string-no-same-adjacent-characters/
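For completeness, here is a hedged sketch of that greedy priority-queue approach (assuming .NET 6+ for PriorityQueue<TElement, TPriority>; the names are illustrative): always append the letter with the highest remaining count that differs from the letter just written.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static string RearrangeGreedy(string s)
{
    // Max-heap behaviour via negated counts (PriorityQueue pops the smallest priority first).
    var heap = new PriorityQueue<char, int>();
    foreach (var group in s.GroupBy(c => c))
        heap.Enqueue(group.Key, -group.Count());

    var sb = new StringBuilder(s.Length);
    while (heap.Count > 0)
    {
        heap.TryDequeue(out char first, out int firstPriority);
        if (sb.Length == 0 || sb[sb.Length - 1] != first)
        {
            sb.Append(first);
            if (firstPriority + 1 < 0) heap.Enqueue(first, firstPriority + 1);
        }
        else
        {
            // The most frequent remaining letter equals the previous one; use the runner-up.
            if (!heap.TryDequeue(out char second, out int secondPriority))
                throw new InvalidOperationException("No valid rearrangement exists.");
            sb.Append(second);
            if (secondPriority + 1 < 0) heap.Enqueue(second, secondPriority + 1);
            heap.Enqueue(first, firstPriority); // put the first candidate back
        }
    }
    return sb.ToString();
}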
Here is a probabilistic approach. The algorithm is:
10) Select a random char from the input string.
20) Try to insert the selected char in a random position in the output string.
30) If it can't be inserted because of proximity with the same char, go to 10.
40) Remove the selected char from the input string and go to 10.
50) Continue until there are no more chars in the input string, or the failed attempts are too many.
public static string ShuffleNoSameAdjacent(string input, Random random = null)
{
    if (input == null) return null;
    if (random == null) random = new Random();
    string output = "";
    int maxAttempts = input.Length * input.Length * 2;
    int attempts = 0;
    while (input.Length > 0)
    {
        while (attempts < maxAttempts)
        {
            int inputPos = random.Next(0, input.Length);
            var outputPos = random.Next(0, output.Length + 1);
            var c = input[inputPos];
            if (outputPos > 0 && output[outputPos - 1] == c)
            {
                attempts++; continue;
            }
            if (outputPos < output.Length && output[outputPos] == c)
            {
                attempts++; continue;
            }
            input = input.Remove(inputPos, 1);
            output = output.Insert(outputPos, c.ToString());
            break;
        }
        if (attempts >= maxAttempts) throw new InvalidOperationException(
            $"Shuffle failed to complete after {attempts} attempts.");
    }
    return output;
}
Not suitable for strings longer than 1,000 chars!
Update: And here is a more complicated deterministic approach. The algorithm is:
Group the elements and sort the groups by length.
Create three empty piles of elements.
Insert each group to a separate pile, inserting always the largest group to the smallest pile, so that the piles differ in length as little as possible.
Check that there is no pile with more than half the total elements, in which case satisfying the condition of not having same adjacent elements is impossible.
Shuffle the piles.
Start yielding elements from the piles, selecting a different pile each time.
When the piles that are eligible for selection are more than one, select randomly, weighting by the size of each pile. Piles containing near half of the remaining elements should be much preferred. For example if the remaining elements are 100 and the two eligible piles have 49 and 40 elements respectively, then the first pile should be 10 times more preferable than the second (because 50 - 49 = 1 and 50 - 40 = 10).
public static IEnumerable<T> ShuffleNoSameAdjacent<T>(IEnumerable<T> source,
Random random = null, IEqualityComparer<T> comparer = null)
{
if (source == null) yield break;
if (random == null) random = new Random();
if (comparer == null) comparer = EqualityComparer<T>.Default;
var grouped = source
.GroupBy(i => i, comparer)
.OrderByDescending(g => g.Count());
var piles = Enumerable.Range(0, 3).Select(i => new Pile<T>()).ToArray();
foreach (var group in grouped)
{
GetSmallestPile().AddRange(group);
}
int totalCount = piles.Select(e => e.Count).Sum();
if (piles.Any(pile => pile.Count > (totalCount + 1) / 2))
{
throw new InvalidOperationException("Shuffle is impossible.");
}
piles.ForEach(pile => Shuffle(pile));
Pile<T> previouslySelectedPile = null;
while (totalCount > 0)
{
var selectedPile = GetRandomPile_WeightedByLength();
yield return selectedPile[selectedPile.Count - 1];
selectedPile.RemoveAt(selectedPile.Count - 1);
totalCount--;
previouslySelectedPile = selectedPile;
}
List<T> GetSmallestPile()
{
List<T> smallestPile = null;
int smallestCount = Int32.MaxValue;
foreach (var pile in piles)
{
if (pile.Count < smallestCount)
{
smallestPile = pile;
smallestCount = pile.Count;
}
}
return smallestPile;
}
void Shuffle(List<T> pile)
{
for (int i = 0; i < pile.Count; i++)
{
int j = random.Next(i, pile.Count);
if (i == j) continue;
var temp = pile[i];
pile[i] = pile[j];
pile[j] = temp;
}
}
Pile<T> GetRandomPile_WeightedByLength()
{
var eligiblePiles = piles
.Where(pile => pile.Count > 0 && pile != previouslySelectedPile)
.ToArray();
Debug.Assert(eligiblePiles.Length > 0, "No eligible pile.");
eligiblePiles.ForEach(pile =>
{
pile.Proximity = ((totalCount + 1) / 2) - pile.Count;
pile.Score = 1;
});
Debug.Assert(eligiblePiles.All(pile => pile.Proximity >= 0),
"A pile has negative proximity.");
foreach (var pile in eligiblePiles)
{
foreach (var otherPile in eligiblePiles)
{
if (otherPile == pile) continue;
pile.Score *= otherPile.Proximity;
}
}
var sumScore = eligiblePiles.Select(p => p.Score).Sum();
while (sumScore > Int32.MaxValue)
{
eligiblePiles.ForEach(pile => pile.Score /= 100);
sumScore = eligiblePiles.Select(p => p.Score).Sum();
}
if (sumScore == 0)
{
return eligiblePiles[random.Next(0, eligiblePiles.Length)];
}
var randomScore = random.Next(0, (int)sumScore);
int accumulatedScore = 0;
foreach (var pile in eligiblePiles)
{
accumulatedScore += (int)pile.Score;
if (randomScore < accumulatedScore) return pile;
}
Debug.Fail("Could not select a pile randomly by weight.");
return null;
}
}
private class Pile<T> : List<T>
{
public int Proximity { get; set; }
public long Score { get; set; }
}
This implementation can shuffle millions of elements. I am not completely convinced that the quality of the shuffling is as perfect as the previous probabilistic implementation, but it should be close.
func shuffle(str:String)-> String{
var shuffleArray = [Character](str)
//Sorting
shuffleArray.sort()
var shuffle1 = [Character]()
var shuffle2 = [Character]()
var adjacentStr = ""
//Split
for i in 0..<shuffleArray.count{
if i > shuffleArray.count/2 {
shuffle2.append(shuffleArray[i])
}else{
shuffle1.append(shuffleArray[i])
}
}
let count = shuffle1.count > shuffle2.count ? shuffle1.count:shuffle2.count
//Merge with adjacent element
for i in 0..<count {
if i < shuffle1.count{
adjacentStr.append(shuffle1[i])
}
if i < shuffle2.count{
adjacentStr.append(shuffle2[i])
}
}
return adjacentStr
}
let s = shuffle(str: "AABC")
print(s)

C# Array of Increments

If I want to generate an array that goes from 1 to 6 and increments by .01, what is the most efficient way to do this?
What I want is an array, with mins and maxs subject to change later...like this: x[1,1.01,1.02,1.03...]
Assuming a start, end and an increment value, you can abstract this further:
Enumerable
    .Repeat(start, (int)((end - start) / increment) + 1)
    .Select((tr, ti) => tr + (increment * ti))
    .ToList()
Let's break it down:
Enumerable.Repeat takes a starting number, repeats for a given number of elements, and returns an enumerable (a collection). In this case, we start with the start element, find the difference between start and end and divide it by the increment (this gives us the number of increments between start and end) and add one to include the original number. This should give us the number of elements to use. Just be warned that since the increment is a decimal/double, there might be rounding errors when you cast to an int.
Select transforms all elements of an enumerable given a specific selector function. In this case, we're taking the number that was generated and the index, and adding the original number with the index multiplied by the increment.
Finally, the call to ToList will save the collection into memory.
If you find yourself using this often, then you can create a method to do this for you:
public static List<decimal> RangeIncrement(decimal start, decimal end, decimal increment)
{
    return Enumerable
        .Repeat(start, (int)((end - start) / increment) + 1)
        .Select((tr, ti) => tr + (increment * ti))
        .ToList();
}
Edit: Changed to using Repeat, so that non-whole number values will still be maintained. Also, there's no error checking being done here, so you should make sure to check that increment is not 0 and that start < end * sign(increment). The reason for multiplying end by the sign of increment is that if you're incrementing by a negative number, end should be before start.
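A guarded variant sketching those checks (the name RangeIncrementChecked is illustrative and not part of the original answer):
public static List<decimal> RangeIncrementChecked(decimal start, decimal end, decimal increment)
{
    if (increment == 0)
        throw new ArgumentException("Increment must not be zero.", nameof(increment));
    if (Math.Sign(end - start) * Math.Sign(increment) < 0)
        throw new ArgumentException("End cannot be reached with this increment.");

    return Enumerable
        .Repeat(start, (int)((end - start) / increment) + 1)
        .Select((value, index) => value + (increment * index))
        .ToList();
}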
The easiest way is to use Enumerable.Range:
double[] result = Enumerable.Range(100, 500)
.Select(i => (double)i/100)
.ToArray();
(hence efficient in terms of readability and lines of code)
I would just make a simple function.
public IEnumerable<decimal> GetValues(decimal start, decimal end, decimal increment)
{
    for (decimal i = start; i <= end; i += increment)
        yield return i;
}
Then you can turn that into an array, query it, or do whatever you want with it.
decimal[] result1 = GetValues(1.0m, 6.0m, .01m).ToArray();
List<decimal> result2 = GetValues(1.0m, 6.0m, .01m).ToList();
List<decimal> result3 = GetValues(1.0m, 6.0m, .01m).Where(d => d > 3 && d < 4).ToList();
Use a for loop with 0.01 increments:
List<decimal> myList = new List<decimal>();
for (decimal i = 1; i <= 6; i += 0.01m)
{
    myList.Add(i);
}
Elegant
double[] v = Enumerable.Range(100, 501).Select(x => x * 0.01).ToArray();
Efficient
Use for loop
Whatever you do, don't use a floating point datatype (like double); they don't work for things like this because of rounding behaviour. Go for either a decimal, or integers with a factor. For the latter:
Decimal[] decs = new Decimal[500];
for (int i = 0; i < 500; i++)
{
    decs[i] = (new Decimal(i) / 100) + 1;
}
You could solve it like this. The solution method returns a double array
double[] Solution(double min, int length, double increment)
{
    double[] arr = new double[length];
    double value = min;
    arr[0] = value;
    for (int i = 1; i < length; i++)
    {
        value += increment;
        arr[i] = value;
    }
    return arr;
}
var ia = new float[502]; // guesstimate
var x = 0;
for (float i = 1; i < 6.01; i += 0.01f)
{
    ia[x] = i;
    x++;
}
You could multi-thread this for speed, but it's probably not worth the overhead unless you plan on running this on a really really slow processor.
