Why is C# Array.BinarySearch so fast?

I have implemented a very simple binarySearch implementation in C# for finding integers in an integer array:
Binary Search
static int binarySearch(int[] arr, int i)
{
int low = 0, high = arr.Length - 1, mid;
while (low <= high)
{
mid = (low + high) / 2;
if (i < arr[mid])
high = mid - 1;
else if (i > arr[mid])
low = mid + 1;
else
return mid;
}
return -1;
}
When comparing it to C#'s native Array.BinarySearch() I can see that Array.BinarySearch() is more than twice as fast as my function, every single time.
MSDN on Array.BinarySearch:
Searches an entire one-dimensional sorted array for a specific element, using the IComparable generic interface implemented by each element of the Array and by the specified object.
What makes this approach so fast?
Test code
using System;
using System.Diagnostics;
class Program
{
static void Main()
{
Random rnd = new Random();
Stopwatch sw = new Stopwatch();
const int ELEMENTS = 10000000;
int temp;
int[] arr = new int[ELEMENTS];
for (int i = 0; i < ELEMENTS; i++)
arr[i] = rnd.Next(int.MinValue,int.MaxValue);
Array.Sort(arr);
// Custom binarySearch
sw.Restart();
for (int i = 0; i < ELEMENTS; i++)
temp = binarySearch(arr, i);
sw.Stop();
Console.WriteLine($"Elapsed time for custom binarySearch: {sw.ElapsedMilliseconds}ms");
// C# Array.BinarySearch
sw.Restart();
for (int i = 0; i < ELEMENTS; i++)
temp = Array.BinarySearch(arr,i);
sw.Stop();
Console.WriteLine($"Elapsed time for C# BinarySearch: {sw.ElapsedMilliseconds}ms");
}
static int binarySearch(int[] arr, int i)
{
int low = 0, high = arr.Length - 1, mid;
while (low <= high)
{
mid = (low+high) / 2;
if (i < arr[mid])
high = mid - 1;
else if (i > arr[mid])
low = mid + 1;
else
return mid;
}
return -1;
}
}
Test results
+------------+--------------+--------------------+
| Attempt No | binarySearch | Array.BinarySearch |
+------------+--------------+--------------------+
| 1 | 2700ms | 1099ms |
| 2 | 2696ms | 1083ms |
| 3 | 2675ms | 1077ms |
| 4 | 2690ms | 1093ms |
| 5 | 2700ms | 1086ms |
+------------+--------------+--------------------+

Your code is faster when run outside Visual Studio:
Yours vs Array's:
From VS - Debug mode: 3248 vs 1113
From VS - Release mode: 2932 vs 1100
Running exe - Debug mode: 3152 vs 1104
Running exe - Release mode: 559 vs 1104
Array's code may already be optimized in the framework, but it also does more checking than your version (for instance, your version's low + high can overflow when arr.Length is greater than int.MaxValue / 2), and, as already said, it is designed for a wide range of types, not just int[].
So, basically, your code is slower only while you are debugging it, because the framework's Array code always runs as release code, with no debugger overhead behind the scenes.
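As an aside, the overflow mentioned above is cheap to avoid: compute the midpoint as low + (high - low) / 2 instead of (low + high) / 2. A sketch of the same search with that one change (the method name is mine):

```csharp
// Overflow-safe midpoint: low + (high - low) / 2 never exceeds int.MaxValue,
// unlike (low + high) / 2, which can wrap around for very large arrays.
static int BinarySearchSafe(int[] arr, int value)
{
    int low = 0, high = arr.Length - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;
        if (value < arr[mid]) high = mid - 1;
        else if (value > arr[mid]) low = mid + 1;
        else return mid;
    }
    return -1;
}
```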


Optimal solution for "Bitwise AND" problem in C#

Problem statement:
Given an array of non-negative integers, count the number of unordered pairs of array elements, such that their bitwise AND is a power of 2.
Example:
arr = [10, 7, 2, 8, 3]
Answer: 6 (10&7, 10&2, 10&8, 10&3, 7&2, 2&3)
Constraints:
1 <= arr.Count <= 2*10^5
0 <= arr[i] <= 2^12
Here's my brute-force solution that I've come up with:
private static Dictionary<int, bool> _dictionary = new Dictionary<int, bool>();
public static long CountPairs(List<int> arr)
{
long result = 0;
for (var i = 0; i < arr.Count - 1; ++i)
{
for (var j = i + 1; j < arr.Count; ++j)
{
if (IsPowerOfTwo(arr[i] & arr[j])) ++result;
}
}
return result;
}
public static bool IsPowerOfTwo(int number)
{
if (_dictionary.TryGetValue(number, out bool value)) return value;
var result = (number != 0) && ((number & (number - 1)) == 0);
_dictionary[number] = result;
return result;
}
For small inputs this works fine, but for large inputs it is slow.
My question is: what is the optimal (or at least more optimal) solution for the problem? Please provide a graceful solution in C#. 😊
One way to accelerate your approach is to compute the histogram of your data values before counting.
This will reduce the number of computations for long arrays, because there are fewer possible values (4096) than elements in your array (200,000).
Be careful when counting bins that are powers of 2 to make sure you do not overcount the number of pairs by including cases when you are comparing a number with itself.
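A minimal C# sketch of that idea (the names are mine; it assumes the stated constraints, i.e. values in [0, 4095]). The same-value case is handled with count*(count-1)/2, so a number is never paired with itself:

```csharp
using System;
using System.Collections.Generic;

static class PairCounter
{
    // Histogram-based counting: one O(n) pass to build the histogram,
    // then O(4096^2) pair checks over distinct values, independent of n.
    public static long CountPairs(IReadOnlyList<int> arr)
    {
        const int Range = 1 << 12;          // values are 0..4095 per the constraints
        var counts = new long[Range];
        foreach (var v in arr) counts[v]++;

        long result = 0;
        for (int a = 0; a < Range; a++)
        {
            if (counts[a] == 0) continue;

            // pairs of two equal values: a & a == a, choose 2 out of counts[a]
            if (IsPowerOfTwo(a))
                result += counts[a] * (counts[a] - 1) / 2;

            for (int b = a + 1; b < Range; b++)
                if (counts[b] != 0 && IsPowerOfTwo(a & b))
                    result += counts[a] * counts[b];
        }
        return result;
    }

    static bool IsPowerOfTwo(int n) => n != 0 && (n & (n - 1)) == 0;
}
```

For the example array [10, 7, 2, 8, 3] this returns 6, matching the brute force.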
We can adapt the bit-subset dynamic programming idea to have a solution with O(2^N * N^2 + n * N) complexity, where N is the number of bits in the range, and n is the number of elements in the list. (So if the integers were restricted to [1, 4096] or 2^12, with n at 100,000, we would have on the order of 2^12 * 12^2 + 100000*12 = 1,789,824 iterations.)
The idea is that we want to count instances for which we have overlapping bit subsets, with the twist of adding a fixed set bit. Given Ai -- for simplicity, take 6 = b110 -- if we were to find all partners that AND to zero, we'd take Ai's negation,
110 -> ~110 -> 001
Now we can build a dynamic program that takes a diminishing mask, starting with the full number and diminishing the mask towards the left
001
^^^
001
^^
001
^
Each set bit on the negation of Ai represents a zero, which can be ANDed with either 1 or 0 to the same effect. Each unset bit on the negation of Ai represents a set bit in Ai, which we'd like to pair only with zeros, except for a single set bit.
We construct this set bit by examining each possibility separately. So whereas to count pairs that AND with Ai to zero we'd do something like
001 ->
001
000
we now want to enumerate
011 ->
011
010
101 ->
101
100
fixing a single bit each time.
We can achieve this by adding a dimension to the inner iteration. When the mask does have a set bit at the end, we "fix" the relevant bit by counting only the result for the previous DP cell that would have the bit set, and not the usual union of subsets that could either have that bit set or not.
Here is some JavaScript code (sorry, I do not know C#) to demonstrate with testing at the end comparing to the brute-force solution.
var debug = 0;
function bruteForce(a){
let answer = 0;
for (let i = 0; i < a.length; i++) {
for (let j = i + 1; j < a.length; j++) {
let and = a[i] & a[j];
if ((and & (and - 1)) == 0 && and != 0){
answer++;
if (debug)
console.log(a[i], a[j], a[i].toString(2), a[j].toString(2))
}
}
}
return answer;
}
function f(A, N){
const n = A.length;
const hash = {};
const dp = new Array(1 << N);
for (let i=0; i<1<<N; i++){
dp[i] = new Array(N + 1);
for (let j=0; j<N+1; j++)
dp[i][j] = new Array(N + 1).fill(0);
}
for (let i=0; i<n; i++){
if (hash.hasOwnProperty(A[i]))
hash[A[i]] = hash[A[i]] + 1;
else
hash[A[i]] = 1;
}
for (let mask=0; mask<1<<N; mask++){
// j is an index where we fix a 1
for (let j=0; j<=N; j++){
if (mask & 1){
if (j == 0)
dp[mask][j][0] = hash[mask] || 0;
else
dp[mask][j][0] = (hash[mask] || 0) + (hash[mask ^ 1] || 0);
} else {
dp[mask][j][0] = hash[mask] || 0;
}
for (let i=1; i<=N; i++){
if (mask & (1 << i)){
if (j == i)
dp[mask][j][i] = dp[mask][j][i-1];
else
dp[mask][j][i] = dp[mask][j][i-1] + dp[mask ^ (1 << i)][j][i - 1];
} else {
dp[mask][j][i] = dp[mask][j][i-1];
}
}
}
}
let answer = 0;
for (let i=0; i<n; i++){
for (let j=0; j<N; j++)
if (A[i] & (1 << j))
answer += dp[((1 << N) - 1) ^ A[i] | (1 << j)][j][N];
}
for (let i=0; i<N + 1; i++)
if (hash[1 << i])
answer = answer - hash[1 << i];
return answer / 2;
}
var As = [
[10, 7, 2, 8, 3] // 6
];
for (let A of As){
console.log(JSON.stringify(A));
console.log(`DP, brute force: ${ f(A, 4) }, ${ bruteForce(A) }`);
console.log('');
}
var numTests = 1000;
for (let i=0; i<numTests; i++){
const N = 6;
const A = [];
const n = 10;
for (let j=0; j<n; j++){
const num = Math.floor(Math.random() * (1 << N));
A.push(num);
}
const fA = f(A, N);
const brute = bruteForce(A);
if (fA != brute){
console.log('Mismatch:');
console.log(A);
console.log(fA, brute);
console.log('');
}
}
console.log("Done testing.");
int[] numbers = new[] { 10, 7, 2, 8, 3 };
static bool IsPowerOfTwo(int n) => (n != 0) && ((n & (n - 1)) == 0);
long result = numbers.AsParallel()
.Select((a, i) => numbers
.Skip(i + 1)
.Select(b => a & b)
.Count(IsPowerOfTwo))
.Sum();
If I understand the problem correctly, this should work and should be faster.
First, for each number in the array we take all elements after it to get the collection of numbers to pair it with.
Then we combine each pair with a bitwise AND and count the results that satisfy the IsPowerOfTwo predicate (implementation above).
Finally we simply sum all the counts; the output for this case is 6.
I think this should be more performant than your dictionary-based solution, as it avoids a lookup each time you check for a power of 2.
Also, given the numerical constraints of your inputs, I think it is fine to use the int data type.

How would I print the martini glass pattern in C# in an optimal way

I am trying to print the martini glass pattern using c#.
The pattern is like following:
for input = 4;
0000000
00000
000
0
|
|
|
|
=======
for input = 5;
000000000
0000000
00000
000
0
|
|
|
|
|
=========
I am able to get the triangle of 0's; however, I am failing to get the neck (|) and the bottom (=).
My code looks as follows:
const int height = 4;
for (int row = 0; row < height; row++)
{
//left padding
for (int col = 0; col < row; col++)
{
Console.Write(' ');
}
for (int col = 0; col < (height - row) * 2 - 1; col++)
{
Console.Write('0');
}
//right padding
for (int col = 0; col < row; col++)
{
Console.Write(' ');
}
Console.WriteLine();
}
for(int i = 1; i < height; i++)
{
Console.Write('|');
}
Console.ReadKey();
And it prints like this:
0000000
00000
000
0
|||
Can somebody help me in finishing the neck and the bottom?
And also is my code optimal? You are free to edit the complete code for optimization.
Thanks in advance.
Edited:
Code added for neck and bottom:
for (int i = 1; i <= height; i++)
{
// Left padding
for (int j = 1; j < height; j++)
{
Console.Write(' ');
}
Console.WriteLine('|');
}
for (int row = 0; row < height; row++)
{
for (int col = 0; col < row; col++)
{
Console.Write('=');
}
}
Console.ReadKey();
The string constructor is helpful to avoid writing excessive loops:
int count = 5;
for(int i = count - 1; i >= 0; i--)
{
Console.WriteLine(new string('0', 2*i + 1).PadLeft(i+count));
}
Console.Write(new string('|', count).Replace("|","|\n".PadLeft(count+1)));
Console.WriteLine(new string('=', count* 2-1));
Use the string class constructor to repeat a pattern instead of looping over it.
class HelloWorld {
static void Main() {
const int height = 1;
for (int row = 0; row < height; row++)
{
var spaces = new String(' ', row);
var zeroes = new String('0', ((height - row) * 2 ) -1 );
Console.WriteLine(spaces + zeroes);
}
for(int i = 1; i <= height; i++)
{
var spaces = new String(' ', height -1);
Console.WriteLine(spaces + '|');
}
Console.WriteLine(new String('=', (height *2) -1));
Console.ReadKey();
}
}
Edit
By "optimal" I'm assuming you mean faster execution time. For relatively small values I do not see how the two could make a significant difference, but I still ran them through BenchmarkDotNet.
First refers to my code and Second to kazem's.
I am not sure what to make of this output, but you can read more about it in their documentation.
// * Detailed results *
MartiniBenchMark.First: DefaultJob
Runtime = .NET Framework 4.6.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1586.0; GC = Concurrent Workstation
Mean = 1.7365 ms, StdErr = 0.0081 ms (0.47%); N = 15, StdDev = 0.0315 ms
Min = 1.6916 ms, Q1 = 1.7099 ms, Median = 1.7309 ms, Q3 = 1.7626 ms, Max = 1.8087 ms
IQR = 0.0527 ms, LowerFence = 1.6309 ms, UpperFence = 1.8417 ms
ConfidenceInterval = [1.7028 ms; 1.7702 ms] (CI 99.9%), Margin = 0.0337 ms (1.94% of Mean)
Skewness = 0.45, Kurtosis = 2.58
MartiniBenchMark.Second: DefaultJob
Runtime = .NET Framework 4.6.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1586.0; GC = Concurrent Workstation
Mean = 1.8580 ms, StdErr = 0.0147 ms (0.79%); N = 96, StdDev = 0.1440 ms
Min = 1.6291 ms, Q1 = 1.7440 ms, Median = 1.8311 ms, Q3 = 1.9782 ms, Max = 2.2573 ms
IQR = 0.2342 ms, LowerFence = 1.3927 ms, UpperFence = 2.3295 ms
ConfidenceInterval = [1.8081 ms; 1.9079 ms] (CI 99.9%), Margin = 0.0499 ms (2.69% of Mean)
Skewness = 0.42, Kurtosis = 2.22
Total time: 00:12:04 (724.8 sec)
// * Summary *
BenchmarkDotNet=v0.10.9, OS=Windows 10 Redstone 1 (10.0.14393)
Processor=Intel Core i3-3110M CPU 2.40GHz (Ivy Bridge), ProcessorCount=4
Frequency=2338445 Hz, Resolution=427.6346 ns, Timer=TSC
[Host] : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1586.0
DefaultJob : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1586.0
Method | Mean | Error | StdDev |
------- |---------:|----------:|----------:|
First | 1.737 ms | 0.0337 ms | 0.0315 ms |
Second | 1.858 ms | 0.0499 ms | 0.1440 ms |
// * Hints *
Outliers
MartiniBenchMark.First: Default -> 3 outliers were removed
MartiniBenchMark.Second: Default -> 4 outliers were removed
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
1 ms : 1 Millisecond (0.001 sec)
You need to print the spaces on the left first, just as you did for the '0' rows (the left-padding part).
for(int i = 1; i <= height; i++)
{
// Left padding
for(int j = 1; j < height; j++)
{
Console.Write(' ');
}
Console.WriteLine('|');
}
And your neck loop should run until i <= height.
Now, I think you can complete the bottom part (it will be the same width as the first line of '0's, without any padding). Please let me know if you face any difficulty.
Also, I don't think you need the right-padding part.
Hope it helps.
EDIT:
Bottom Part:
for(int i = 1; i <= height * 2 - 1; i++)
{
Console.Write("=");
}
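Putting the pieces together, the whole glass can be printed like this (a sketch that uses the string constructor instead of inner loops; the method name is mine):

```csharp
using System;

// Prints the whole martini glass for a given height (e.g. 4 or 5).
static void PrintMartiniGlass(int height)
{
    // bowl: each row is indented one more space and two '0's narrower
    for (int row = 0; row < height; row++)
        Console.WriteLine(new string(' ', row) + new string('0', (height - row) * 2 - 1));

    // neck: 'height' bars, centered under the tip of the bowl
    for (int i = 0; i < height; i++)
        Console.WriteLine(new string(' ', height - 1) + '|');

    // bottom: as wide as the top row of the bowl
    Console.WriteLine(new string('=', height * 2 - 1));
}
```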

Fill a multidimensional array with same values C#

Is there a faster way of doing this using C#?
double[,] myArray = new double[length1, length2];
for(int i=0;i<length1;i++)
for(int j=0;j<length2;j++)
myArray[i,j] = double.PositiveInfinity;
I remember using C++, there was something called memset() for doing these kind of things...
A multi-dimensional array is just a large block of memory, so we can treat it like one, similar to how memset() works. This requires unsafe code. I wouldn't say it's worth doing unless it's really performance critical. This is a fun exercise, though, so here are some benchmarks using BenchmarkDotNet:
public class ArrayFillBenchmark
{
const int length1 = 1000;
const int length2 = 1000;
readonly double[,] _myArray = new double[length1, length2];
[Benchmark]
public void MultidimensionalArrayLoop()
{
for (int i = 0; i < length1; i++)
for (int j = 0; j < length2; j++)
_myArray[i, j] = double.PositiveInfinity;
}
[Benchmark]
public unsafe void MultidimensionalArrayNaiveUnsafeLoop()
{
fixed (double* a = &_myArray[0, 0])
{
double* b = a;
for (int i = 0; i < length1; i++)
for (int j = 0; j < length2; j++)
*b++ = double.PositiveInfinity;
}
}
[Benchmark]
public unsafe void MultidimensionalSpanFill()
{
fixed (double* a = &_myArray[0, 0])
{
double* b = a;
var span = new Span<double>(b, length1 * length2);
span.Fill(double.PositiveInfinity);
}
}
[Benchmark]
public unsafe void MultidimensionalSseFill()
{
var vectorPositiveInfinity = Vector128.Create(double.PositiveInfinity);
fixed (double* a = &_myArray[0, 0])
{
double* b = a;
ulong i = 0;
int size = Vector128<double>.Count;
ulong length = length1 * length2;
for (; i < (length & ~(ulong)15); i += 16)
{
Sse2.Store(b+size*0, vectorPositiveInfinity);
Sse2.Store(b+size*1, vectorPositiveInfinity);
Sse2.Store(b+size*2, vectorPositiveInfinity);
Sse2.Store(b+size*3, vectorPositiveInfinity);
Sse2.Store(b+size*4, vectorPositiveInfinity);
Sse2.Store(b+size*5, vectorPositiveInfinity);
Sse2.Store(b+size*6, vectorPositiveInfinity);
Sse2.Store(b+size*7, vectorPositiveInfinity);
b += size*8;
}
for (; i < (length & ~(ulong)7); i += 8)
{
Sse2.Store(b+size*0, vectorPositiveInfinity);
Sse2.Store(b+size*1, vectorPositiveInfinity);
Sse2.Store(b+size*2, vectorPositiveInfinity);
Sse2.Store(b+size*3, vectorPositiveInfinity);
b += size*4;
}
for (; i < (length & ~(ulong)3); i += 4)
{
Sse2.Store(b+size*0, vectorPositiveInfinity);
Sse2.Store(b+size*1, vectorPositiveInfinity);
b += size*2;
}
for (; i < length; i++)
{
*b++ = double.PositiveInfinity;
}
}
}
}
Results:
| Method | Mean | Error | StdDev | Ratio |
|------------------------------------- |-----------:|----------:|----------:|------:|
| MultidimensionalArrayLoop | 1,083.1 us | 11.797 us | 11.035 us | 1.00 |
| MultidimensionalArrayNaiveUnsafeLoop | 436.2 us | 8.567 us | 8.414 us | 0.40 |
| MultidimensionalSpanFill | 321.2 us | 6.404 us | 10.875 us | 0.30 |
| MultidimensionalSseFill | 231.9 us | 4.616 us | 11.323 us | 0.22 |
MultidimensionalArrayLoop is slow because of bounds checking. The JIT emits code on each iteration to make sure that [i, j] is inside the bounds of the array. The JIT can sometimes elide bounds checking; I know it does so for single-dimensional arrays, but I'm not sure it does for multi-dimensional ones.
MultidimensionalArrayNaiveUnsafeLoop is essentially the same code as MultidimensionalArrayLoop but without bounds checking. It's considerably faster, taking 40% of the time. It's considered 'naive', though, because the loop could still be improved by unrolling.
MultidimensionalSpanFill also has no bounds check and is more-or-less the same as MultidimensionalArrayNaiveUnsafeLoop; however, Span<T>.Fill internally does loop unrolling, which is why it's a bit faster than our naive unsafe loop. It takes only 30% of the time of the original.
MultidimensionalSseFill improves on our first unsafe loop by doing two things: loop unrolling and vectorizing. This requires a CPU with Sse2 support, but it allows us to write 128-bits (16 bytes) in a single instruction. This gives us an additional speed boost, taking it down to 22% of the original. Interestingly, this same loop with Avx (256-bits) was consistently slower than the Sse2 version, so that benchmark is not included here.
But these numbers only apply to an array that is 1000x1000. As you change the size of the array, the results differ. For example, with a 10000x10000 array the results of all the unsafe benchmarks are very close: the larger array incurs far more memory traffic, which tends to wash out the smaller iterative improvements seen in the last three benchmarks.
There's a lesson in there somewhere, but I mostly just wanted to share these results, since it was a pretty fun experiment to do.
Here is a method that is not faster, but it works with arrays of any rank, not only 2D:
public static class ArrayExtensions
{
public static void Fill(this Array array, object value)
{
var indices = new int[array.Rank];
Fill(array, 0, indices, value);
}
public static void Fill(Array array, int dimension, int[] indices, object value)
{
if (dimension < array.Rank)
{
for (int i = array.GetLowerBound(dimension); i <= array.GetUpperBound(dimension); i++)
{
indices[dimension] = i;
Fill(array, dimension + 1, indices, value);
}
}
else
array.SetValue(value, indices);
}
}
double[,] myArray = new double[x, y];
if( parallel == true )
{
stopWatch.Start();
System.Threading.Tasks.Parallel.For( 0, x, i =>
{
for( int j = 0; j < y; ++j )
myArray[i, j] = double.PositiveInfinity;
});
stopWatch.Stop();
Print( "Elapsed milliseconds: {0}", stopWatch.ElapsedMilliseconds );
}
else
{
stopWatch.Start();
for( int i = 0; i < x; ++i )
for( int j = 0; j < y; ++j )
myArray[i, j] = double.PositiveInfinity;
stopWatch.Stop();
Print("Elapsed milliseconds: {0}", stopWatch.ElapsedMilliseconds);
}
When setting x and y to 10000 I get 553 milliseconds for the single-threaded approach and 170 for the multi-threaded one.
There is also a way to quickly fill a multidimensional array that does not use the unsafe keyword (see the answers to this question).
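One such approach (a sketch; it relies on MemoryMarshal.CreateSpan, available since .NET Core 2.1) views the whole array as a single Span&lt;double&gt; and lets Span&lt;T&gt;.Fill do the work, with no unsafe block:

```csharp
using System;
using System.Runtime.InteropServices;

double[,] myArray = new double[1000, 1000];

// A double[,] is one contiguous block of memory, so we can wrap all of it
// in a Span<double> and fill it in one call (Fill is internally vectorized).
Span<double> span = MemoryMarshal.CreateSpan(ref myArray[0, 0], myArray.Length);
span.Fill(double.PositiveInfinity);
```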

Calculate square root of a BigInteger (System.Numerics.BigInteger)

.NET 4.0 provides the System.Numerics.BigInteger type for arbitrarily-large integers. I need to compute the square root (or a reasonable approximation -- e.g., integer square root) of a BigInteger. So that I don't have to reinvent the wheel, does anyone have a nice extension method for this?
Check if BigInteger is not a perfect square has code to compute the integer square root of a Java BigInteger. Here it is translated into C#, as an extension method.
public static BigInteger Sqrt(this BigInteger n)
{
if (n == 0) return 0;
if (n > 0)
{
int bitLength = Convert.ToInt32(Math.Ceiling(BigInteger.Log(n, 2)));
BigInteger root = BigInteger.One << (bitLength / 2);
while (!isSqrt(n, root))
{
root += n / root;
root /= 2;
}
return root;
}
throw new ArithmeticException("NaN");
}
private static Boolean isSqrt(BigInteger n, BigInteger root)
{
BigInteger lowerBound = root*root;
BigInteger upperBound = (root + 1)*(root + 1);
return (n >= lowerBound && n < upperBound);
}
Informal testing indicates that this is about 75X slower than Math.Sqrt, for small integers. The VS profiler points to the multiplications in isSqrt as the hotspots.
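For example, with the extension method above in scope (the 39-digit value below is an arbitrary perfect square I chose for illustration), the integer root comes back exact:

```csharp
using System;
using System.Numerics;

// Assumes the BigInteger Sqrt() extension method defined above.
var n = BigInteger.Parse("152415787532388367501905199875019052100"); // 12345678901234567890^2
BigInteger root = n.Sqrt();

Console.WriteLine(root);             // 12345678901234567890
Console.WriteLine(root * root == n); // True
```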
I am not sure that Newton's method is the best way to compute bignum square roots, because it involves divisions, which are slow for bignums. An alternative is a CORDIC-style method, which uses only additions and shifts (shown here for unsigned ints):
static uint isqrt(uint x)
{
int b=15; // this is the next bit we try
uint r=0; // r will contain the result
uint r2=0; // here we maintain r squared
while(b>=0)
{
uint sr2=r2;
uint sr=r;
// compute (r+(1<<b))**2, we have r**2 already.
r2+=(uint)((r<<(1+b))+(1<<(b+b)));
r+=(uint)(1<<b);
if (r2>x)
{
r=sr;
r2=sr2;
}
b--;
}
return r;
}
There's a similar method that uses only additions and shifts, called 'Dijkstra's square root', explained for example here:
http://lib.tkk.fi/Diss/2005/isbn9512275279/article3.pdf
Ok, first a few speed tests of some variants posted here. (I only considered methods which give exact results and are at least suitable for BigInteger):
+------------------------------+-------+------+------+-------+-------+--------+--------+--------+
| variant - 1000x times | 2e5 | 2e10 | 2e15 | 2e25 | 2e50 | 2e100 | 2e250 | 2e500 |
+------------------------------+-------+------+------+-------+-------+--------+--------+--------+
| my version | 0.03 | 0.04 | 0.04 | 0.76 | 1.44 | 2.23 | 4.84 | 23.05 |
| RedGreenCode (bound opti.) | 0.56 | 1.20 | 1.80 | 2.21 | 3.71 | 6.10 | 14.53 | 51.48 |
| RedGreenCode (newton method) | 0.80 | 1.21 | 2.12 | 2.79 | 5.23 | 8.09 | 19.90 | 65.36 |
| Nordic Mainframe (CORDIC) | 2.38 | 5.52 | 9.65 | 19.80 | 46.69 | 90.16 | 262.76 | 637.82 |
| Sunsetquest (without divs) | 2.37 | 5.48 | 9.11 | 24.50 | 56.83 | 145.52 | 839.08 | 4.62 s |
| Jeremy Kahan (js-port) | 46.53 | #.## | #.## | #.## | #.## | #.## | #.## | #.## |
+------------------------------+-------+------+------+-------+-------+--------+--------+--------+
+------------------------------+--------+--------+--------+---------+---------+--------+--------+
| variant - single | 2e1000 | 2e2500 | 2e5000 | 2e10000 | 2e25000 | 2e50k | 2e100k |
+------------------------------+--------+--------+--------+---------+---------+--------+--------+
| my version | 0.10 | 0.77 | 3.46 | 14.97 | 105.19 | 455.68 | 1.98 s |
| RedGreenCode (bound opti.) | 0.26 | 1.41 | 6.53 | 25.36 | 182.68 | 777.39 | 3.30 s |
| RedGreenCode (newton method) | 0.33 | 1.73 | 8.08 | 32.07 | 228.50 | 974.40 | 4.15 s |
| Nordic Mainframe (CORDIC) | 1.83 | 7.73 | 26.86 | 94.55 | 561.03 | 2.25 s | 10.3 s |
| Sunsetquest (without divs) | 31.84 | 450.80 | 3.48 s | 27.5 s | #.## | #.## | #.## |
| Jeremy Kahan (js-port) | #.## | #.## | #.## | #.## | #.## | #.## | #.## |
+------------------------------+--------+--------+--------+---------+---------+--------+--------+
- value example: 2e10 = 20000000000 (result: 141421)
- times in milliseconds or with "s" in seconds
- #.##: need more than 5 minutes (timeout)
Descriptions:
Jeremy Kahan (js-port)
Jeremy's simple algorithm works, but its computational effort grows with the size of the root itself (i.e. exponentially in the number of digits) because of the simple adding/subtracting... :)
Sunsetquest (without divs)
The approach without division is good, but due to the divide-and-conquer (bisection) variant the result converges relatively slowly (especially with large numbers).
Nordic Mainframe (CORDIC)
The CORDIC algorithm is already quite powerful, although the bit-by-bit operations on immutable BigIntegers generate a lot of overhead.
I have calculated the required bits this way: int b = Convert.ToInt32(Math.Ceiling(BigInteger.Log(x, 2))) / 2 + 1;
RedGreenCode (newton method)
The proven Newton method shows that something old does not have to be slow. In particular, its fast convergence for large numbers can hardly be beaten.
RedGreenCode (bound opti.)
The proposal of Jesan Fafon to save a multiplication helps a lot here.
my version
First, small numbers are calculated directly with Math.Sqrt(); as soon as the accuracy of double is no longer sufficient, the Newton algorithm takes over. Even then, I pre-calculate as good a starting value as possible with Math.Sqrt(), which makes the Newton algorithm converge much faster.
Here the source:
static readonly BigInteger FastSqrtSmallNumber = 4503599761588223UL; // as static readonly = reduce compare overhead
static BigInteger SqrtFast(BigInteger value)
{
if (value <= FastSqrtSmallNumber) // small enough for Math.Sqrt() or negative?
{
if (value.Sign < 0) throw new ArgumentException("Negative argument.");
return (ulong)Math.Sqrt((ulong)value);
}
BigInteger root; // now filled with an approximate value
int byteLen = value.ToByteArray().Length;
if (byteLen < 128) // small enough for direct double conversion?
{
root = (BigInteger)Math.Sqrt((double)value);
}
else // large: reduce with bitshifting, then convert to double (and back)
{
root = (BigInteger)Math.Sqrt((double)(value >> (byteLen - 127) * 8)) << (byteLen - 127) * 4;
}
for (; ; )
{
var root2 = value / root + root >> 1;
if ((root2 == root || root2 == root + 1) && IsSqrt(value, root)) return root;
root = value / root2 + root2 >> 1;
if ((root == root2 || root == root2 + 1) && IsSqrt(value, root2)) return root2;
}
}
static bool IsSqrt(BigInteger value, BigInteger root)
{
var lowerBound = root * root;
return value >= lowerBound && value <= lowerBound + (root << 1);
}
full Benchmark-Source:
using System;
using System.Numerics;
using System.Diagnostics;
namespace MathTest
{
class Program
{
static readonly BigInteger FastSqrtSmallNumber = 4503599761588223UL; // as static readonly = reduce compare overhead
static BigInteger SqrtMax(BigInteger value)
{
if (value <= FastSqrtSmallNumber) // small enough for Math.Sqrt() or negative?
{
if (value.Sign < 0) throw new ArgumentException("Negative argument.");
return (ulong)Math.Sqrt((ulong)value);
}
BigInteger root; // now filled with an approximate value
int byteLen = value.ToByteArray().Length;
if (byteLen < 128) // small enough for direct double conversion?
{
root = (BigInteger)Math.Sqrt((double)value);
}
else // large: reduce with bitshifting, then convert to double (and back)
{
root = (BigInteger)Math.Sqrt((double)(value >> (byteLen - 127) * 8)) << (byteLen - 127) * 4;
}
for (; ; )
{
var root2 = value / root + root >> 1;
if ((root2 == root || root2 == root + 1) && IsSqrt(value, root)) return root;
root = value / root2 + root2 >> 1;
if ((root == root2 || root == root2 + 1) && IsSqrt(value, root2)) return root2;
}
}
static bool IsSqrt(BigInteger value, BigInteger root)
{
var lowerBound = root * root;
return value >= lowerBound && value <= lowerBound + (root << 1);
}
// newton method
public static BigInteger SqrtRedGreenCode(BigInteger n)
{
if (n == 0) return 0;
if (n > 0)
{
int bitLength = Convert.ToInt32(Math.Ceiling(BigInteger.Log(n, 2)));
BigInteger root = BigInteger.One << (bitLength / 2);
while (!isSqrtRedGreenCode(n, root))
{
root += n / root;
root /= 2;
}
return root;
}
throw new ArithmeticException("NaN");
}
private static bool isSqrtRedGreenCode(BigInteger n, BigInteger root)
{
BigInteger lowerBound = root * root;
//BigInteger upperBound = (root + 1) * (root + 1);
return n >= lowerBound && n <= lowerBound + root + root;
//return (n >= lowerBound && n < upperBound);
}
// without divisions
public static BigInteger SqrtSunsetquest(BigInteger number)
{
if (number < 9)
{
if (number == 0)
return 0;
if (number < 4)
return 1;
else
return 2;
}
BigInteger n = 0, p = 0;
var high = number >> 1;
var low = BigInteger.Zero;
while (high > low + 1)
{
n = (high + low) >> 1;
p = n * n;
if (number < p)
{
high = n;
}
else if (number > p)
{
low = n;
}
else
{
break;
}
}
return number == p ? n : low;
}
// javascript port
public static BigInteger SqrtJeremyKahan(BigInteger n)
{
var oddNumber = BigInteger.One;
var result = BigInteger.Zero;
while (n >= oddNumber)
{
n -= oddNumber;
oddNumber += 2;
result++;
}
return result;
}
// CORDIC
public static BigInteger SqrtNordicMainframe(BigInteger x)
{
int b = Convert.ToInt32(Math.Ceiling(BigInteger.Log(x, 2))) / 2 + 1;
BigInteger r = 0; // r will contain the result
BigInteger r2 = 0; // here we maintain r squared
while (b >= 0)
{
var sr2 = r2;
var sr = r;
// compute (r+(1<<b))**2, we have r**2 already.
r2 += (r << 1 + b) + (BigInteger.One << b + b);
r += BigInteger.One << b;
if (r2 > x)
{
r = sr;
r2 = sr2;
}
b--;
}
return r;
}
static void Main(string[] args)
{
var t2 = BigInteger.Parse("2" + new string('0', 10000));
//var q1 = SqrtRedGreenCode(t2);
var q2 = SqrtSunsetquest(t2);
//var q3 = SqrtJeremyKahan(t2);
//var q4 = SqrtNordicMainframe(t2);
var q5 = SqrtMax(t2);
//if (q5 != q1) throw new Exception();
if (q5 != q2) throw new Exception();
//if (q5 != q3) throw new Exception();
//if (q5 != q4) throw new Exception();
for (int r = 0; r < 5; r++)
{
var mess = Stopwatch.StartNew();
//for (int i = 0; i < 1000; i++)
{
//var q = SqrtRedGreenCode(t2);
var q = SqrtSunsetquest(t2);
//var q = SqrtJeremyKahan(t2);
//var q = SqrtNordicMainframe(t2);
//var q = SqrtMax(t2);
}
mess.Stop();
Console.WriteLine((mess.ElapsedTicks * 1000.0 / Stopwatch.Frequency).ToString("N2") + " ms");
}
}
}
}
Short answer: (but beware, see below for more details)
Math.Pow(Math.E, BigInteger.Log(pd) / 2)
Where pd represents the BigInteger on which you want to perform the square root operation.
Long answer and explanation:
Another way to understanding this problem is understanding how square roots and logs work.
If you have the equation 5^x = 25, to solve for x we must use logs. In this example, I will use natural logs (logs in other bases are also possible, but the natural log is the easy way).
5^x = 25
Rewriting, we have:
x(ln 5) = ln 25
To isolate x, we have
x = ln 25 / ln 5
We see this results in x = 2. But since we already know x (x = 2, in 5^2), let's change what we don't know and write a new equation and solve for the new unknown. Let x be the result of the square root operation. This gives us
2 = ln 25 / ln x
Rewriting to isolate x, we have
ln x = (ln 25) / 2
To remove the log, we now use a special identity of the natural log and the special number e. Specifically, e^ln x = x. Rewriting the equation now gives us
e^ln x = e^((ln 25) / 2)
Simplifying the left hand side, we have
x = e^((ln 25) / 2)
where x will be the square root of 25. You could also extend this idea to any root or number, and the general formula for the yth root of x becomes e^((ln x) / y).
Now to apply this specifically to C#, BigIntegers, and this question specifically, we simply implement the formula. WARNING: Although the math is correct, there are finite limits. This method will only get you in the neighborhood, with a large unknown range (depending on how big of a number you operate on). Perhaps this is why Microsoft did not implement such a method.
// A sample generated public key modulus
var pd = BigInteger.Parse("101017638707436133903821306341466727228541580658758890103412581005475252078199915929932968020619524277851873319243238741901729414629681623307196829081607677830881341203504364437688722228526603134919021724454060938836833023076773093013126674662502999661052433082827512395099052335602854935571690613335742455727");
var sqrt = Math.Pow(Math.E, BigInteger.Log(pd) / 2);
Console.WriteLine(sqrt);
NOTE: The BigInteger.Log() method returns a double, so two concerns arise: 1) the number is imprecise, and 2) there is an upper limit on what Log() can handle for BigInteger inputs. To examine the upper limit, we can look at the normal form of the natural log, that is, ln x = y. In other words, e^y = x. Since double is the return type of BigInteger.Log(), it stands to reason that the largest BigInteger would be e raised to double.MaxValue. On my computer, that would be e^1.79769313486232E+308. The imprecision is unhandled. Anyone want to implement BigDecimal and update BigInteger.Log()?
Consumer beware, but it will get you in the neighborhood, and squaring the result does produce a number similar to the original input, up to so many digits and not as precise as RedGreenCode's answer. Happy (square) rooting! ;)
You can convert this to the language and variable types of your choice. Here is a truncated square root in JavaScript (freshest for me) that takes advantage of the identity 1 + 3 + 5 + ... + (2n - 1) = n^2. All the variables are integers, and it only adds and subtracts.
var truncSqrt = function(n) {
    var oddNumber = 1;
    var result = 0;
    while (n >= oddNumber) {
        n -= oddNumber;
        oddNumber += 2;
        result++;
    }
    return result;
};
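The same idea ports directly to C# (my translation, not part of the original answer). Note that it performs about √n loop iterations, so it is only practical for small inputs:

```csharp
using System;

class TruncSqrtDemo
{
    // Integer square root by subtracting successive odd numbers:
    // since 1 + 3 + 5 + ... + (2n - 1) = n^2, count how many odds fit.
    static int TruncSqrt(int n)
    {
        int oddNumber = 1;
        int result = 0;
        while (n >= oddNumber)
        {
            n -= oddNumber;
            oddNumber += 2;
            result++;
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(TruncSqrt(25)); // 5
        Console.WriteLine(TruncSqrt(26)); // 5 (truncated)
    }
}
```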
Update: For best performance, use the Newton Plus version.
That one is hundreds of times faster. I am leaving this one for reference, however, as an alternative way.
// Source: http://mjs5.com/2016/01/20/c-biginteger-square-root-function/ Michael Steiner, Jan 2016
// Slightly modified to correct error below 6. (thank you M Ktsis D)
public static BigInteger SteinerSqrt(BigInteger number)
{
    if (number < 9)
    {
        if (number == 0)
            return 0;
        if (number < 4)
            return 1;
        else
            return 2;
    }
    BigInteger n = 0, p = 0;
    var high = number >> 1;
    var low = BigInteger.Zero;
    while (high > low + 1)
    {
        n = (high + low) >> 1;
        p = n * n;
        if (number < p)
        {
            high = n;
        }
        else if (number > p)
        {
            low = n;
        }
        else
        {
            break;
        }
    }
    return number == p ? n : low;
}
Update: Thank you to M Ktsis D for finding a bug in this. It has been corrected with a guard clause.
The two methods below use the Babylonian method to calculate the square root of the provided number. The Sqrt method returns the BigInteger type and therefore only provides the answer to the last whole number (no decimal places).
The method uses 15 iterations. After a few tests I found that 12-13 iterations are enough for 80+ digit numbers, but I decided to keep it at 15 just in case.
As the Babylonian square root approximation method requires us to pick a starting number that is half the length of the number whose square root we want, the RandomBigIntegerOfLength() method provides that number.
RandomBigIntegerOfLength() takes the integer length of a number as an argument and returns a randomly generated number of that length. The number is generated using the Next() method from the Random class; Next() is called with a different range for the first digit so the number never starts with 0 (something like 041657180501613764193159871), as that causes a DivideByZeroException. Note that the number is generated digit by digit, concatenated as a string, and only then converted to the BigInteger type.
The Sqrt method uses RandomBigIntegerOfLength to obtain a random number of half the length of the provided argument "number", then calculates the square root using the Babylonian method with 15 iterations. The number of iterations may be changed to smaller or bigger as you like. As the Babylonian method cannot provide the square root of 0 (it would require dividing by 0), the method returns 0 when 0 is provided as an argument.
// Copy the two methods
public static BigInteger Sqrt(BigInteger number)
{
    BigInteger _x = RandomBigIntegerOfLength(number.ToString().Length / 2);
    try
    {
        for (int i = 0; i < 15; i++)
        {
            _x = (_x + number / _x) / 2;
        }
        return _x;
    }
    catch (DivideByZeroException)
    {
        return 0;
    }
}
// Copy this method as well
private static BigInteger RandomBigIntegerOfLength(int length)
{
    Random rand = new Random();
    string _randomNumber = "";
    // first digit is 1-9 so the number never starts with 0
    _randomNumber = String.Concat(_randomNumber, rand.Next(1, 10));
    for (int i = 0; i < length - 1; i++)
    {
        _randomNumber = String.Concat(_randomNumber, rand.Next(10).ToString());
    }
    if (String.IsNullOrEmpty(_randomNumber) == false) return BigInteger.Parse(_randomNumber);
    else return 0;
}
*** World's fastest BigInteger Sqrt for Java/C# !!!!***
Write-Up: https://www.codeproject.com/Articles/5321399/NewtonPlus-A-Fast-Big-Number-Square-Root-Function
Github: https://github.com/SunsetQuest/NewtonPlus-Fast-BigInteger-and-BigFloat-Square-Root
public static BigInteger NewtonPlusSqrt(BigInteger x)
{
    if (x < 144838757784765629)   // 1.448e17 = ~1<<57
    {
        uint vInt = (uint)Math.Sqrt((ulong)x);
        if ((x >= 4503599761588224) && ((ulong)vInt * vInt > (ulong)x))   // 4.5e15 = ~1<<52
        {
            vInt--;
        }
        return vInt;
    }

    double xAsDub = (double)x;
    if (xAsDub < 8.5e37)   // long.MaxValue * long.MaxValue
    {
        ulong vInt = (ulong)Math.Sqrt(xAsDub);
        BigInteger v = (vInt + ((ulong)(x / vInt))) >> 1;
        return (v * v <= x) ? v : v - 1;
    }

    if (xAsDub < 4.3322e127)
    {
        BigInteger v = (BigInteger)Math.Sqrt(xAsDub);
        v = (v + (x / v)) >> 1;
        if (xAsDub > 2e63)
        {
            v = (v + (x / v)) >> 1;
        }
        return (v * v <= x) ? v : v - 1;
    }

    int xLen = (int)x.GetBitLength();
    int wantedPrecision = (xLen + 1) / 2;
    int xLenMod = xLen + (xLen & 1) + 1;

    //////// Do the first Sqrt on hardware ////////
    long tempX = (long)(x >> (xLenMod - 63));
    double tempSqrt1 = Math.Sqrt(tempX);
    ulong valLong = (ulong)BitConverter.DoubleToInt64Bits(tempSqrt1) & 0x1fffffffffffffL;
    if (valLong == 0)
    {
        valLong = 1UL << 53;
    }

    //////// Classic Newton Iterations ////////
    BigInteger val = ((BigInteger)valLong << 52) + (x >> xLenMod - (3 * 53)) / valLong;
    int size = 106;
    for (; size < 256; size <<= 1)
    {
        val = (val << (size - 1)) + (x >> xLenMod - (3 * size)) / val;
    }

    if (xAsDub > 4e254)   // 4e254 = 1<<845.76973610139
    {
        int numOfNewtonSteps = BitOperations.Log2((uint)(wantedPrecision / size)) + 2;

        //////// Apply Starting Size ////////
        int wantedSize = (wantedPrecision >> numOfNewtonSteps) + 2;
        int needToShiftBy = size - wantedSize;
        val >>= needToShiftBy;
        size = wantedSize;
        do
        {
            //////// Newton Plus Iterations ////////
            int shiftX = xLenMod - (3 * size);
            BigInteger valSqrd = (val * val) << (size - 1);
            BigInteger valSU = (x >> shiftX) - valSqrd;
            val = (val << size) + (valSU / val);
            size *= 2;
        } while (size < wantedPrecision);
    }

    //////// There are a few extra digits here; let's save them ////////
    int oversidedBy = size - wantedPrecision;
    BigInteger saveDroppedDigitsBI = val & ((BigInteger.One << oversidedBy) - 1);
    int downby = (oversidedBy < 64) ? (oversidedBy >> 2) + 1 : (oversidedBy - 32);
    ulong saveDroppedDigits = (ulong)(saveDroppedDigitsBI >> downby);

    //////// Shrink result to wanted precision ////////
    val >>= oversidedBy;

    //////// Detect round-ups ////////
    if ((saveDroppedDigits == 0) && (val * val > x))
    {
        val--;
    }

    //////// Error Detection ////////
    // I believe the above has no errors, but to guarantee it the following can be added.
    // If an error is found, please report it.
    //BigInteger tmp = val * val;
    //if (tmp > x)
    //{
    //    Console.WriteLine($"Missed , {ToolsForOther.ToBinaryString(saveDroppedDigitsBI, oversidedBy)}, {oversidedBy}, {size}, {wantedPrecision}, {saveDroppedDigitsBI.GetBitLength()}");
    //    if (saveDroppedDigitsBI.GetBitLength() >= 6)
    //        Console.WriteLine($"val^2 ({tmp}) < x({x}) off%:{((double)(tmp)) / (double)x}");
    //    //throw new Exception("Sqrt function had internal error - value too high");
    //}
    //if ((tmp + 2 * val + 1) <= x)
    //{
    //    Console.WriteLine($"(val+1)^2({((val + 1) * (val + 1))}) >= x({x})");
    //    //throw new Exception("Sqrt function had internal error - value too low");
    //}

    return val;
}
Below is a log-based chart; note that a small difference on the chart is a huge difference in performance. All implementations are in C# except GMP (C++/asm), which was added for comparison. Java's version (ported to C#) has also been added.

Is there a good radixsort-implementation for floats in C#

I have a data structure with a field of type float. A collection of these structures needs to be sorted by the value of the float. Is there a radix-sort implementation for this?
If there isn't, is there a fast way to access the exponent, the sign and the mantissa?
Because if you sort the floats first on mantissa, then on exponent, and on sign last, you sort floats in O(n).
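For reference, the three fields can be pulled apart with a few shifts and masks once the bits are reinterpreted as an int (a sketch; layout per the IEEE 754 single-precision format):

```csharp
using System;

class FloatBits
{
    static void Main()
    {
        float f = -6.25f;  // -1.5625 * 2^2
        // Reinterpret the float's bits as an int (IEEE 754 single precision).
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
        int sign = (bits >> 31) & 1;          // 1 bit
        int exponent = (bits >> 23) & 0xFF;   // 8 bits, biased by 127
        int mantissa = bits & 0x7FFFFF;       // 23 bits, implicit leading 1
        // For -6.25f: sign=1, unbiased exponent=2, mantissa=0x480000
        Console.WriteLine($"sign={sign} exponent={exponent - 127} mantissa=0x{mantissa:X}");
    }
}
```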
Update:
I was quite interested in this topic, so I sat down and implemented it (using this very fast and memory-conservative implementation). I also read this one (thanks celion) and found out that you don't even have to split the floats into mantissa and exponent to sort them. You just have to take the bits as-is and perform an int sort. You just have to take care of the negative values, which have to be put inversely in front of the positive ones at the end of the algorithm (I did that in one step with the last iteration of the algorithm to save some CPU time).
So here's my float radix sort:
public static float[] RadixSort(this float[] array)
{
    // temporary array and the array of converted floats to ints
    int[] t = new int[array.Length];
    int[] a = new int[array.Length];
    for (int i = 0; i < array.Length; i++)
        a[i] = BitConverter.ToInt32(BitConverter.GetBytes(array[i]), 0);

    // set the group length to 1, 2, 4, 8 or 16
    // and see which one is quicker
    int groupLength = 4;
    int bitLength = 32;

    // counting and prefix arrays
    // (dimension is 2^r, the number of possible values of an r-bit number)
    int[] count = new int[1 << groupLength];
    int[] pref = new int[1 << groupLength];
    int groups = bitLength / groupLength;
    int mask = (1 << groupLength) - 1;
    int negatives = 0, positives = 0;

    for (int c = 0, shift = 0; c < groups; c++, shift += groupLength)
    {
        // reset count array
        for (int j = 0; j < count.Length; j++)
            count[j] = 0;

        // counting elements of the c-th group
        for (int i = 0; i < a.Length; i++)
        {
            count[(a[i] >> shift) & mask]++;
            // additionally count all negative
            // values in first round
            if (c == 0 && a[i] < 0)
                negatives++;
        }
        if (c == 0) positives = a.Length - negatives;

        // calculating prefixes
        pref[0] = 0;
        for (int i = 1; i < count.Length; i++)
            pref[i] = pref[i - 1] + count[i - 1];

        // from a[] to t[] elements ordered by c-th group
        for (int i = 0; i < a.Length; i++)
        {
            // Get the right index to sort the number in
            int index = pref[(a[i] >> shift) & mask]++;
            if (c == groups - 1)
            {
                // We're in the last (most significant) group; if the
                // number is negative, order them inversely in front
                // of the array, pushing positive ones back.
                if (a[i] < 0)
                    index = positives - (index - negatives) - 1;
                else
                    index += negatives;
            }
            t[index] = a[i];
        }

        // a[] = t[] and start again until the last group
        t.CopyTo(a, 0);
    }

    // Convert the ints back to the float array
    float[] ret = new float[a.Length];
    for (int i = 0; i < a.Length; i++)
        ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);

    return ret;
}
It is slightly slower than an int radix sort because of the array copying at the beginning and end of the function, where the floats are bitwise copied to ints and back. The whole function is nevertheless still O(n). In any case, it is much faster than sorting 3 times in a row as you proposed. I don't see much room for optimization anymore, but if anyone does: feel free to tell me.
To sort descending change this line at the very end:
ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
to this:
ret[a.Length - i - 1] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
Measuring:
I set up a short test containing all the special cases of floats (NaN, +/-Inf, Min/Max value, 0) and random numbers. It sorts in exactly the same order as Linq or Array.Sort sort floats:
NaN -> -Inf -> Min -> Negative Nums -> 0 -> Positive Nums -> Max -> +Inf
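That ordering can be checked against Array.Sort directly (a quick sanity check I am adding here, assuming the default comparer, which orders NaN before every other value):

```csharp
using System;

class SortOrderCheck
{
    static void Main()
    {
        float[] vals = { 1f, float.NaN, -1f, float.PositiveInfinity,
                         0f, float.NegativeInfinity };
        // The default float comparer places NaN before all other values.
        Array.Sort(vals);
        // Order: NaN, -Inf, -1, 0, 1, +Inf
        Console.WriteLine(string.Join(", ", vals));
    }
}
```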
So I ran a test with a huge array of 10M numbers:
float[] test = new float[10000000];
Random rnd = new Random();
for (int i = 0; i < test.Length; i++)
{
    byte[] buffer = new byte[4];
    rnd.NextBytes(buffer);
    float rndfloat = BitConverter.ToSingle(buffer, 0);
    switch (i)
    {
        case 0: { test[i] = float.MaxValue; break; }
        case 1: { test[i] = float.MinValue; break; }
        case 2: { test[i] = float.NaN; break; }
        case 3: { test[i] = float.NegativeInfinity; break; }
        case 4: { test[i] = float.PositiveInfinity; break; }
        case 5: { test[i] = 0f; break; }
        default: { test[i] = rndfloat; break; }
    }
}
And stopped the time of the different sorting algorithms:
Stopwatch sw = new Stopwatch();
sw.Start();
float[] sorted1 = test.RadixSort();
sw.Stop();
Console.WriteLine(string.Format("RadixSort: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
float[] sorted2 = test.OrderBy(x => x).ToArray();
sw.Stop();
Console.WriteLine(string.Format("Linq OrderBy: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
Array.Sort(test);
float[] sorted3 = test;
sw.Stop();
Console.WriteLine(string.Format("Array.Sort: {0}", sw.Elapsed));
And the output was (update: now ran with release build, not debug):
RadixSort: 00:00:03.9902332
Linq OrderBy: 00:00:17.4983272
Array.Sort: 00:00:03.1536785
Roughly four times as fast as Linq. That is not bad. It is still not as fast as Array.Sort, but also not that much worse. I was really surprised by this one, though: I expected it to be slightly slower than Linq on very small arrays. So I ran a test with just 20 elements:
RadixSort: 00:00:00.0012944
Linq OrderBy: 00:00:00.0072271
Array.Sort: 00:00:00.0002979
and even this time my radix sort is quicker than Linq, but way slower than Array.Sort. :)
Update 2:
I made some more measurements and found out some interesting things: longer group lengths mean fewer iterations but more memory usage. If you use a group length of 16 bits (only 2 iterations), you have a huge memory overhead when sorting small arrays, but you can beat Array.Sort on arrays larger than about 100k elements, even if only by a little. The chart's axes are both logarithmic:
(source: daubmeier.de)
There's a nice explanation of how to perform radix sort on floats here:
http://www.codercorner.com/RadixSortRevisited.htm
If all your values are positive, you can get away with using the binary representation; the link explains how to handle negative values.
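For reference, the key transformation that article describes (flip every bit of a negative float, flip only the sign bit of a non-negative one) can be sketched like this; radix-sorting these keys as unsigned integers yields float order:

```csharp
using System;

class FloatKey
{
    // Map a float's bit pattern to an unsigned key whose natural
    // integer ordering matches the float ordering (NaNs aside).
    static uint SortKey(float f)
    {
        uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
        // Negative: flip every bit. Non-negative: flip only the sign bit.
        return (bits & 0x80000000u) != 0 ? ~bits : bits ^ 0x80000000u;
    }

    static void Main()
    {
        Console.WriteLine(SortKey(-1f) < SortKey(0f)); // True
        Console.WriteLine(SortKey(0f) < SortKey(1f));  // True
    }
}
```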
By doing some fancy casting and swapping arrays instead of copying, this version is 2x faster for 10M numbers than Philip Daubmeier's original with the group length set to 8. It is 3x faster than Array.Sort for that array size.
static public void RadixSortFloat(this float[] array, int arrayLen = -1)
{
    // Some use cases have an array that is longer than the filled part which we want to sort
    if (arrayLen < 0) arrayLen = array.Length;

    // Reinterpret our original float array as ints
    Span<float> asFloat = array;
    Span<int> a = MemoryMarshal.Cast<float, int>(asFloat);

    // Create a temp array
    Span<int> t = new Span<int>(new int[arrayLen]);

    // set the group length to 1, 2, 4, 8 or 16 and see which one is quicker
    int groupLength = 8;
    int bitLength = 32;

    // counting and prefix arrays
    // (dimension is 2^r, the number of possible values of an r-bit number)
    var dim = 1 << groupLength;
    int groups = bitLength / groupLength;
    if (groups % 2 != 0) throw new Exception("groups must be even so data is in original array at end");
    var count = new int[dim];
    var pref = new int[dim];
    int mask = dim - 1;
    int negatives = 0, positives = 0;

    // counting elements of the 1st group, including negative/positive
    for (int i = 0; i < arrayLen; i++)
    {
        if (a[i] < 0) negatives++;
        count[(a[i] >> 0) & mask]++;
    }
    positives = arrayLen - negatives;

    int c;
    int shift;
    for (c = 0, shift = 0; c < groups - 1; c++, shift += groupLength)
    {
        CalcPrefixes();
        var nextShift = shift + groupLength;

        for (var i = 0; i < arrayLen; i++)
        {
            var ai = a[i];
            // Get the right index to sort the number in
            int index = pref[(ai >> shift) & mask]++;
            count[(ai >> nextShift) & mask]++;
            t[index] = ai;
        }

        // swap the arrays and start again until the last group
        var temp = a;
        a = t;
        t = temp;
    }

    // Last round
    CalcPrefixes();
    for (var i = 0; i < arrayLen; i++)
    {
        var ai = a[i];
        // Get the right index to sort the number in
        int index = pref[(ai >> shift) & mask]++;
        // We're in the last (most significant) group; if the
        // number is negative, order them inversely in front
        // of the array, pushing positive ones back.
        if (ai < 0) index = positives - (index - negatives) - 1; else index += negatives;
        t[index] = ai;
    }

    void CalcPrefixes()
    {
        pref[0] = 0;
        for (int i = 1; i < dim; i++)
        {
            pref[i] = pref[i - 1] + count[i - 1];
            count[i - 1] = 0;
        }
    }
}
You can use an unsafe block to memcpy or alias a float * to a uint * to extract the bits.
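For example, a minimal sketch of the pointer-aliasing approach (requires compiling with /unsafe):

```csharp
using System;

class UnsafeBits
{
    static unsafe void Main()
    {
        float f = 3.5f;
        // Alias the float's storage as a uint to read its raw bits.
        uint bits = *(uint*)&f;
        Console.WriteLine($"0x{bits:X8}"); // 0x40600000
    }
}
```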
I think your best bet, if the values aren't too close together and there's a reasonable precision requirement, is to just use the actual float digits before and after the decimal point to do the sorting.
For example, you can just use the first 4 decimals (whether they are 0 or not) to do the sorting.
