How can the following simple implementation of sum be faster?
private long sum( int [] a, int begin, int end ) {
if( a == null ) {
return 0;
}
long r = 0;
for( int i = begin ; i < end ; i++ ) {
r+= a[i];
}
return r;
}
EDIT
Background is in order.
Reading latest entry on coding horror, I came to this site: http://codility.com which has this interesting programming test.
Anyway, I got 60 out of 100 in my submission, and basically ( I think ) is because this implementation of sum, because those parts where I failed are the performance parts. I'm getting TIME_OUT_ERROR's
So, I was wondering if an optimization in the algorithm is possible.
So, no built in functions or assembly would be allowed. This my be done in C, C++, C#, Java or pretty much in any other.
EDIT
As usual, mmyers was right. I did profile the code and I saw most of the time was spent on that function, but I didn't understand why. So what I did was to throw away my implementation and start with a new one.
This time I've got an optimal solution [ according to San Jacinto O(n) -see comments to MSN below - ]
This time I've got 81% on Codility which I think is good enough. The problem is that I didn't take the 30 mins. but around 2 hrs. but I guess that leaves me still as a good programmer, for I could work on the problem until I found an optimal solution:
Here's my result.
I never understood what is those "combinations of..." nor how to test "extreme_first"
I don't think your problem is with the function that's summing the array, it's probably that you're summing the array WAY to frequently. If you simply sum the WHOLE array once, and then step through the array until you find the first equilibrium point you should decrease the execution time sufficiently.
int equi ( int[] A ) {
int equi = -1;
long lower = 0;
long upper = 0;
foreach (int i in A)
upper += i;
for (int i = 0; i < A.Length; i++)
{
upper -= A[i];
if (upper == lower)
{
equi = i;
break;
}
else
lower += A[i];
}
return equi;
}
Here is my solution and I scored 100%
public static int solution(int[] A)
{
double sum = A.Sum(d => (double)d);
double leftSum=0;
for (int i = 0; i < A.Length; i++){
if (leftSum == (sum-leftSum-A[i])) {
return i;
}
else {
leftSum = leftSum + A[i];
}
}
return -1;
}
If this is based on the actual sample problem, your issue isn't the sum. Your issue is how you calculate the equilibrium index. A naive implementation is O(n^2). An optimal solution is much much better.
This code is simple enough that unless a is quite small, it's probably going to be limited primarily by memory bandwidth. As such, you probably can't hope for any significant gain by working on the summing part itself (e.g., unrolling the loop, counting down instead of up, executing sums in parallel -- unless they're on separate CPUs, each with its own access to memory). The biggest gain will probably come from issuing some preload instructions so most of the data will already be in the cache by the time you need it. The rest will just (at best) get the CPU to hurry up more, so it waits longer.
Edit: It appears that most of what's above has little to do with the real question. It's kind of small, so it may be difficult to read, but, I tried just using std::accumulate() for the initial addition, and it seemed to think that was all right:
Some tips:
Use a profiler to identify where you're spending a lot of time.
Write good performance tests so that you can tell the exact effect of every single change you make. Keep careful notes.
If it turns out that the bottleneck is the checks to ensure that you're dereferencing a legal address inside the array, and you can guarantee that begin and end are in fact both inside the array, then consider fixing the array, making a pointer to the array, and doing the algorithm in pointers rather than arrays. Pointers are unsafe; they do not spend any time checking to make sure you're still inside the array, so therefore they can be somewhat faster. But you take responsibility then for ensuring that you do not corrupt every byte of memory in the address space.
I don't believe the problem is in the code you provided, but somehow the bigger solution must be suboptimal. This code looks good for calculating the sum of one slice of the array, but maybe it's not what you need to solve the whole problem.
Probably the fastest you could get would be to have your int array 16-byte aligned, stream 32 bytes into two __m128i variables (VC++) and call _mm_add_epi32 (again, a VC++ intrinsic) on the chunks. Reuse one of the chunks to keep adding into it and on the final chunk extract your four ints and add them the old fashioned way.
The bigger question is why simple addition is a worthy candidate for optimization.
Edit: I see it's mostly an academic exercise. Perhaps I'll give it a go tomorrow and post some results...
In C# 3.0, my computer and my OS this is faster as long as you can guarantee that 4 consecutive numbers won't overflow the range of an int, probably because most additions are done using 32-bit math.
However using a better algorithm usually provides higher speed up than any micro-optimization.
Time for a 100 millon elements array:
4999912596452418 -> 233ms (sum)
4999912596452418 -> 126ms (sum2)
private static long sum2(int[] a, int begin, int end)
{
if (a == null) { return 0; }
long r = 0;
int i = begin;
for (; i < end - 3; i+=4)
{
//int t = ;
r += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
}
for (; i < end; i++) { r += a[i]; }
return r;
}
This won't help you with an O(n^2) algorithm, but you can optimize your sum.
At a previous company, we had Intel come by and give us optimization tips. They had one non-obvious and somewhat cool trick. Replace:
long r = 0;
for( int i = begin ; i < end ; i++ ) {
r+= a[i];
}
with
long r1 = 0, r2 = 0, r3 = 0, r4 = 0;
for( int i = begin ; i < end ; i+=4 ) {
r1+= a[i];
r2+= a[i + 1];
r3+= a[i + 2];
r4+= a[i + 3];
}
long r = r1 + r2 + r3 + r4;
// Note: need to be clever if array isn't divisible by 4
Why this is faster:
In the original implementation, your variable r is a bottleneck. Every time through the loop, you have to pull data from memory array a (which takes a couple cycles), but you can't do multiple pulls in parallel, because the value of r in the next iteration of the loop depends on the value of r in this iteration of the loop. In the second version, r1, r2, r3, and r4 are independent, so the processor can hyperthread their execution. Only at the very end do they come together.
Here is a thought:
private static ArrayList equi(int[] A)
{
ArrayList answer = new ArrayList();
//if(A == null) return -1;
if ((answer.Count == null))
{
answer.Add(-1);
return answer;
}
long sum0 = 0, sum1 = 0;
for (int i = 0; i < A.Length; i++) sum0 += A[i];
for (int i = 0; i < A.Length; i++)
{
sum0 -= A[i];
if (i > 0) { sum1 += A[i - 1]; }
if (sum1 == sum0) answer.Add(i);
//return i;
}
//return -1;
return answer;
}
If you are using C or C++ and develop for modern desktop systems and are willing to learn some assembler or learn about GCC intrinsics, you could use SIMD instructions.
This library is an example of what is possible for float and double arrays, similar results should be possible for integer arithmetic since SSE has integer instructions as well.
In C++, the following:
int* a1 = a + begin;
for( int i = end - begin - 1; i >= 0 ; i-- )
{
r+= a1[i];
}
might be faster.
The advantage is that we compare against zero in the loop.
Of course, with a really good optimizer there should be no difference at all.
Another possibility would be
int* a2 = a + end - 1;
for( int i = -(end - begin - 1); i <= 0 ; i++ )
{
r+= a2[i];
}
here we traversing the items in the same order, just not comparing to end.
I did the same naive implementation and here's my O(n) solution. I did not use the IEnumerable Sum method because it was not available at Codility. My solution still doesn't check for overflow in case the input has large numbers so it's failing that particular test on Codility.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
var list = new[] {-7, 1, 5, 2, -4, 3, 0};
Console.WriteLine(equi(list));
Console.ReadLine();
}
static int equi(int[] A)
{
if (A == null || A.Length == 0)
return -1;
if (A.Length == 1)
return 0;
var upperBoundSum = GetTotal(A);
var lowerBoundSum = 0;
for (var i = 0; i < A.Length; i++)
{
lowerBoundSum += (i - 1) >= 0 ? A[i - 1] : 0;
upperBoundSum -= A[i];
if (lowerBoundSum == upperBoundSum)
return i;
}
return -1;
}
private static int GetTotal(int[] ints)
{
var sum = 0;
for(var i=0; i < ints.Length; i++)
sum += ints[i];
return sum;
}
}
}
100% O(n) solution in C
int equi ( int A[], int n ) {
long long sumLeft = 0;
long long sumRight = 0;
int i;
if (n <= 0) return -1;
for (i = 1; i < n; i++)
sumRight += A[i];
i = 0;
do {
if (sumLeft == sumRight)
return i;
sumLeft += A[i];
if ((i+1) < n)
sumRight -= A[i+1];
i++;
} while (i < n);
return -1;
}
Probably not perfect but it passes their tests anyway :)
Can't say I'm a big fan of Codility though - it is an interesting idea, but I found the requirements a little too vague. I think I'd be more impressed if they gave you requirements + a suite of unit tests that test those requirements and then asked you to write code. That's how most TDD happens anyway. I don't think doing it blind really gains anything other than allowing them to throw in some corner cases.
private static int equi ( int[] A ) {
if (A == null || A.length == 0)
return -1;
long tot = 0;
int len = A.length;
for(int i=0;i<len;i++)
tot += A[i];
if(tot == 0)
return (len-1);
long partTot = 0;
for(int i=0;i<len-1;i++)
{
partTot += A[i];
if(partTot*2+A[i+1] == tot)
return i+1;
}
return -1;
}
I considered the array as a bilance so if the equilibrium index exist then half of the weight is on the left. So I only compare the partTot (partial total) x 2 with the total weight of the array.
the Alg takes O(n) + O(n)
100% correctness and performance of this code is tested
Private Function equi(ByVal A() As Integer) As Integer
Dim index As Integer = -1
If A.Length > 0 And Not IsDBNull(A) Then
Dim sumLeft As Long = 0
Dim sumRight As Long = ArraySum(A)
For i As Integer = 0 To A.Length - 1
Dim val As Integer = A(i)
sumRight -= val
If sumLeft = sumRight Then
index = i
End If
sumLeft += val
Next
End If
Return index
End Function
Just some thought, not sure if accessing the pointer directly be faster
int* pStart = a + begin;
int* pEnd = a + end;
while (pStart != pEnd)
{
r += *pStart++;
}
{In Pascal + Assembly}
{$ASMMODE INTEL}
function equi (A : Array of longint; n : longint ) : longint;
var c:Longint;
label noOverflow1;
label noOverflow2;
label ciclo;
label fine;
label Over;
label tot;
Begin
Asm
DEC n
JS over
XOR ECX, ECX {Somma1}
XOR EDI, EDI {Somma2}
XOR EAX, EAX
MOV c, EDI
MOV ESI, n
tot:
MOV EDX, A
MOV EDX, [EDX+ESI*4]
PUSH EDX
ADD ECX, EDX
JNO nooverflow1
ADD c, ECX
nooverflow1:
DEC ESI
JNS tot;
SUB ECX, c
SUB EDI, c
ciclo:
POP EDX
SUB ECX, EDX
CMP ECX, EDI
JE fine
ADD EDI, EDX
JNO nooverflow2
DEC EDI
nooverflow2:
CMP EAX, n
JA over
INC EAX
JMP ciclo
over:
MOV EAX, -1
fine:
end;
End;
This got me 100% in Javascript:
function solution(A) {
if (!(A) || !(Array.isArray(A)) || A.length < 1) {
return -1;
}
if (A.length === 1) {
return 0;
}
var sum = A.reduce(function (a, b) { return a + b; }),
lower = 0,
i,
val;
for (i = 0; i < A.length; i++) {
val = A[i];
if (((sum - lower) - val) === (lower)) {
return i;
}
lower += val;
}
return -1;
}
Here is my answer with with explanations on how to go about it. It will get you 100%
class Solution
{
public int solution(int[] A)
{
long sumLeft = 0; //Variable to hold sum of elements to the left of the current index
long sumRight = 0; //Variable to hold sum of elements to the right of the current index
long sum = 0; //Variable to hold sum of all elements in the array
long leftHolder = 0; //Variable that holds the sum of all elements to the left of the current index, including the element accessed by the current index
//Calculate the total sum of all elements in the array and store it in the sum variable
for (int i = 0; i < A.Length; i++)
{
//sum = A.Sum();
sum += A[i];
}
for (int i = 0; i < A.Length; i++)
{
//Calculate the sum of all elements before the current element plus the current element
leftHolder += A[i];
//Get the sum of all elements to the right of the current element
sumRight = sum - leftHolder;
//Get the sum of all elements of elements to the left of the current element.We don't include the current element in this sum
sumLeft = sum - sumRight - A[i];
//if the sum of the left elements is equal to the sum of the right elements. Return the index of the current element
if (sumLeft == sumRight)
return i;
}
//Otherwise return -1
return -1;
}
}
This may be old, but here is solution in Golang with 100% pass rate:
package solution
func Solution(A []int) int {
// write your code in Go 1.4
var left int64
var right int64
var equi int
equi = -1
if len(A) == 0 {
return equi
}
left = 0
for _, el := range A {
right += int64(el)
}
for i, el := range A {
right -= int64(el)
if left == right {
equi = i
}
left += int64(el)
}
return equi
}
Related
I'm doing brainteasers and exercises on hackerrank to prepare for an interview test and came across some weird behaviour.
The taks in question is Minimum Swaps 2 from the Interview Preparation Kit's Arrays section.
Depending on which of the two elements I put into my temporary variable, it either passes or fails on timeout on most test cases.
Version A:
static int minimumSwaps(int[] arr) {
int swaps=0;
int tmp=0;
int n = arr.Length;
for(int i = 0; i < n; i++){
if(arr[i]==i+1) continue;
tmp = arr[arr[i]-1];
arr[arr[i]-1] = arr[i];
arr[i] = tmp;
i--;
swaps++;
}
return swaps;
}
Version B
static int minimumSwaps(int[] arr) {
int swaps=0;
int tmp=0;
int n = arr.Length;
for(int i = 0; i < n; i++){
if(arr[i]==i+1) continue;
tmp = arr[i];
arr[i] = arr[arr[i]-1];
arr[arr[i]-1]=tmp;
i--;
swaps++;
}
return swaps;
}
A passes, B times out on all but one test.
What it actually does is find the minimum amount of swaps to sort an array of consecutive integers. That's why i can be used to compare values and indices.
If arr[i] is on the corresponding index, we continue, if it's wrong, we swap it with the index it's supposed to be at, decrement i and repeat.
I'm honestly at a loss as to why storing arr[arr[i]-1] in tmp is less complicated than storing arr[i] first. It must have something to do with the intermediate machine code it get's compiled to, right?
I'd be grateful for any explanation so I don't run into similar issues in the future.
I've made a small C# program which calculates prime numbers using the Sieve of Eratosthenes.
long n = 100000;
bool[] p = new bool[n+1];
for(long i=2; i<=n; i++)
{
p[i]=true;
}
for(long i=2; i<=X; i++)
{
for(long j=Y; j<=Z; j++)
{
p[i*j]=false;
}
}
for(long i=0; i<=n; i++)
{
if(p[i])
{
Console.Write(" "+i);
}
}
Console.ReadKey(true);
My question is: which X, Y and Z should I choose to make my program as efficient and economical as possible?
Of course we can just take:
X = n
Y = 2
Z = n
But then the program won't be very efficient.
It seems we can take:
X = Math.Sqrt(n)
Y = i
Z = n/i
And apparently the first 100 primes that the program gives are all correct.
There are several optimisations that can be applied without making the program overly complicated.
you can start the crossing out at j = i (effectively i * i instead of 2 * i) since all lower multiples of i have already been crossed out
you can save some work by leaving all even numbers out of the array (remembering to produce the prime 2 out of thin air when needed); hence array cell k represents the odd integer 2 * k + 1
you can make things faster by turning repeated multiplication (i * j) into iterated addition (k += i); instead of looping over j in the inner loop you loop (k = i * i; k <= N; k += i)
in some cases it can be advantageous to initialise the array with 0 (false) and set cells to 1 (true) for composites; its meaning is thus 'is_composite' instead of 'is_prime'
Harvesting all the low-hanging fruit, the loops thus become (in C++, but C# should be sort of similar):
uint32_t max_factor_bit = uint32_t(sqrt(double(n))) >> 1;
uint32_t max_bit = n >> 1;
for (uint32_t i = 3 >> 1; i <= max_factor_bit; ++i)
{
if (composite[i]) continue;
uint32_t n = (i << 1) + 1;
uint32_t k = (n * n) >> 1;
for ( ; k <= max_bit; k += n)
{
composite[k] = true;
}
}
Regarding the computation of max_factor there are some caveats where the compiler can bite you, for larger values of n. There's a topic for that on Code Review.
A further, easy optimisation is to represent the bitmap as an array of bytes, with each byte standing for eight odd integers. For setting bit k in byte array a you would do a[k / CHAR_BIT] |= (1 << (k % CHAR_BIT)) where CHAR_BIT is the number of bits in a byte. However, such bit trickery is normally wrapped into an inline function to keep the code clean. E.g. in C++ I tell the compiler how to generate such functions using a template like this:
template<typename word_t>
inline
void set_bit (word_t *p, uint32_t index)
{
enum { BITS_PER_WORD = sizeof(word_t) * CHAR_BIT };
// we can trust the compiler to use masking and shifting instead of division; we cannot do that
// ourselves without having the log2 which cannot easily be computed as a constexpr
p[index / BITS_PER_WORD] |= word_t(1) << (index % BITS_PER_WORD);
}
This allows me to say set_bit(a, k) for any type of array - byte, integer, whatever - without having to write special code or use invocations; it's basically a type-safe equivalent to the old C-style macros. I'm not certain whether something similar is possible in C#. There is, however, the C# type BitArray where all that stuff is already done for you under the hood.
On pastebin there's a small demo .cpp for the segmented Sieve of Eratosthenes, where two further optimisations are applied: presieving by small integers, and sieving in small, cache friendly blocks so that the full range of 32-bit integers can be sieved in 2 seconds flat. This could give you some inspiration...
When doing the Sieve of Eratosthenes, memory savings easily translate to speed gains because the algorithm is memory-intensive and it tends to stride all over the memory instead of accessing it locally. That's why space savings due to compact representation (only odd integers, packed bits - i.e. BitArray) and localisation of access (by sieving in small blocks instead of the whole array in one go) can speed up the code by one or more orders of magnitude, without making the code significantly more complicated.
It is possible to go far beyond the easy optimisations mentioned here, but that tends to make the code increasingly complicated. One word that often occurs in this context is the 'wheel', which can save a further 50% of memory space. The wiki has an explanation of wheels here, and in a sense the odds-only sieve is already using a 'modulo 2 wheel'. Conversely, a wheel is the extension of the odds-only idea to dropping further small primes from the array, like 3 and 5 in the famous 'mod 30' wheel with modulus 2 * 3 * 5. That wheel effectively stuffs 30 integers into one 8-bit byte.
Here's a runnable rendition of the above code in C#:
static uint max_factor32 (double n)
{
double r = System.Math.Sqrt(n);
if (r < uint.MaxValue)
{
uint r32 = (uint)r;
return r32 - ((ulong)r32 * r32 > n ? 1u : 0u);
}
return uint.MaxValue;
}
static void sieve32 (System.Collections.BitArray odd_composites)
{
uint max_bit = (uint)odd_composites.Length - 1;
uint max_factor_bit = max_factor32((max_bit << 1) + 1) >> 1;
for (uint i = 3 >> 1; i <= max_factor_bit; ++i)
{
if (odd_composites[(int)i]) continue;
uint p = (i << 1) + 1; // the prime represented by bit i
uint k = (p * p) >> 1; // starting point for striding through the array
for ( ; k <= max_bit; k += p)
{
odd_composites[(int)k] = true;
}
}
}
static int Main (string[] args)
{
int n = 100000000;
System.Console.WriteLine("Hello, Eratosthenes! Sieving up to {0}...", n);
System.Collections.BitArray odd_composites = new System.Collections.BitArray(n >> 1);
sieve32(odd_composites);
uint cnt = 1;
ulong sum = 2;
for (int i = 1; i < odd_composites.Length; ++i)
{
if (odd_composites[i]) continue;
uint prime = ((uint)i << 1) + 1;
cnt += 1;
sum += prime;
}
System.Console.WriteLine("\n{0} primes, sum {1}", cnt, sum);
return 0;
}
This does 10^8 in about a second, but for higher values of n it gets slow. If you want to do faster then you have to employ sieving in small, cache-sized blocks.
I'm quite new to programming in C#, and thought that attempting the Euler Problems would be a good idea as a starting foundation. However, I have gotten to a point where I can't seem to get the correct answer for Problem 2.
"Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms."
My code is this:
int i = 1;
int j = 2;
int sum = 0;
while (i < 4000000)
{
if (i < j)
{
i += j;
if (i % 2 == 0)
{
sum += i;
}
}
else
{
j += i;
if (j % 2 == 0)
{
sum += j;
}
}
}
MessageBox.Show("The answer is " + sum);
Basically, I think that I am only getting the last two even numbers of the sequence and adding them - but I don't know how to get all of the even numbers of the sequence and add them. Could someone please help me, whilst trying to progress from my starting point?
P.S. - If there are any really bad layout choices, do say as eliminating these now will help me to become a better programmer in the future :)
Thanks a lot in advance.
I just logged in into my Project Euler account, to see the correct answer. As others say, you forgot to add the initial term 2, but otherwise your code is OK (the correct answer is what your code outputs + 2), so well done!
It is pretty confusing though, I think it would look way clearer if you'd use 3 variables, something like:
int first = 1;
int second = 1;
int newTerm = 0;
int sum = 0;
while (newTerm <= 4000000)
{
newTerm = first + second;
if (newTerm % 2 == 0)
{
sum += newTerm;
}
first = second;
second = newTerm;
}
MessageBox.Show("The answer is " + sum);
You need to set the initial value of sum to 2, as you are not including that in your sum with the current code.
Also, although it may be less efficient memory-usage-wise, I would probably write the code something like this because IMO it's much more readable:
var fibonacci = new List<int>();
fibonacci.Add(1);
fibonacci.Add(2);
var curIndex = 1;
while(fibonacci[curIndex] + fibonacci[curIndex - 1] <= 4000000) {
fibonacci.Add(fibonacci[curIndex] + fibonacci[curIndex - 1]);
curIndex++;
}
var sum = fibonacci.Where(x => x % 2 == 0).Sum();
Use an array fib to store the sequence. Its easier to code and debug. At every iteration, you just need to check if the value is even.
fib[i] = fib[i - 1] + fib[i - 2];
if (fib[i] > 4000000) break;
if (fib[i] % 2 == 0) sum += fib[i];
I've solved this a few years ago, so I don't remember how I did it exactly, but I do have access to the forums talking about it. A few hints and an outright solution. The numbers repeat in a pattern. two odds followed by an even. So you could skip numbers without necessarily doing the modulus operation.
A proposed C# solution is
long sum = 0, i0, i1 = 1, i2 = 2;
do
{
sum += i2;
for (int i = 0; i < 3; i++)
{
i0 = i1;
i1 = i2;
i2 = i1 + i0;
}
} while (i2 < 4000000);
Your code is a bit ugly, since you're alternating between i and j. There is a much easier way to calculate fibonacci numbers by using three variables and keeping their meaning the same all the time.
One related bug is that you only check the end condition on every second iteration and in the wrong place. But you are lucky that the cutoff fit your bug(the number that went over the limit was odd), so this is not your problem.
Another bug is that you check with < instead of <=, but since there is no fibonacci number equal to the cutoff this doesn't cause your problem.
It's not an int overflow either.
What remains is that you forgot to look at the first two elements of the sequence. Only one of which is even, so you need to add 2 to your result.
int sum = 0; => int sum = 2;
Personally I'd write one function that returns the infinite fibonacci sequence and then filter and sum with Linq. Fibonacci().TakeWhile(i=> i<=4000000).Where(i=>i%2==0).Sum()
I used a class to represent a FibonacciNumber, which i think makes the code more readable.
public class FibonacciNumber
{
private readonly int first;
private readonly int second;
public FibonacciNumber()
{
this.first = 0;
this.second = 1;
}
private FibonacciNumber(int first, int second)
{
this.first = first;
this.second = second;
}
public int Number
{
get { return first + second; }
}
public FibonacciNumber Next
{
get
{
return new FibonacciNumber(this.second, this.Number);
}
}
public bool IsMultipleOf2
{
get { return (this.Number % 2 == 0); }
}
}
Perhaps it's a step too far but the end result is a function which reads quite nicely IMHO:
var current = new FibonacciNumber();
var result = 0;
while (current.Number <= max)
{
if (current.IsMultipleOf2)
result += current.Number;
current = current.Next;
}
return result;
However it's not going to be as efficient as the other solutions which aren't newing up classes in a while loop. Depends on your requirements I guess, for me I just wanted to solve the problem and move on to the next one.
Hi i have solved this question, check it if its right.
My code is,
#include<iostream.h>
#include<conio.h>
class euler2
{
unsigned long long int a;
public:
void evensum();
};
void euler2::evensum()
{
a=4000000;
unsigned long long int i;
unsigned long long int u;
unsigned long long int initial=0;
unsigned long long int initial1=1;
unsigned long long int sum=0;
for(i=1;i<=a;i++)
{
u=initial+initial1;
initial=initial1;
initial1=u;
if(u%2==0)
{
sum=sum+u;
}
}
cout<<"sum of even fibonacci numbers upto 400000 is"<<sum;
}
void main()
{
euler2 a;
clrscr();
a.evensum();
getch();
}
Here's my implementation:
public static int evenFibonachi(int n)
{
int EvenSum = 2, firstElem = 1, SecondElem = 2, SumElem=0;
while (SumElem <= n)
{
swich(ref firstElem, ref SecondElem, ref SumElem);
if (SumElem % 2 == 0)
EvenSum += SumElem;
}
return EvenSum;
}
private static void swich(ref int firstElem, ref int secondElem, ref int SumElem)
{
int temp = firstElem;
firstElem = secondElem;
secondElem += temp;
SumElem = firstElem + secondElem;
}
I heard about Counting Sort and wrote my version of it based on what I understood.
public void my_counting_sort(int[] arr)
{
int range = 100;
int[] count = new int[range];
for (int i = 0; i < arr.Length; i++) count[arr[i]]++;
int index = 0;
for (int i = 0; i < count.Length; i++)
{
while (count[i] != 0)
{
arr[index++] = i;
count[i]--;
}
}
}
The above code works perfectly.
However, the algorithm given in CLRS is different. Below is my implementation
public int[] counting_sort(int[] arr)
{
int k = 100;
int[] count = new int[k + 1];
for (int i = 0; i < arr.Length; i++)
count[arr[i]]++;
for (int i = 1; i <= k; i++)
count[i] = count[i] + count[i - 1];
int[] b = new int[arr.Length];
for (int i = arr.Length - 1; i >= 0; i--)
{
b[count[arr[i]]] = arr[i];
count[arr[i]]--;
}
return b;
}
I've directly translated this from pseudocode to C#. The code doesn't work and I get an IndexOutOfRange Exception.
So my questions are:
What's wrong with the second piece of code ?
What's the difference algorithm wise between my naive implementation and the one given in the book ?
The problem with your version is that it won't work if the elements have satellite data.
CLRS version would work and it's stable.
EDIT:
Here's an implementation of the CLRS version in Python, which sorts pairs (key, value) by key:
def sort(a):
B = 101
count = [0] * B
for (k, v) in a:
count[k] += 1
for i in range(1, B):
count[i] += count[i-1]
b = [None] * len(a)
for i in range(len(a) - 1, -1, -1):
(k, v) = a[i]
count[k] -= 1
b[count[k]] = a[i]
return b
>>> print sort([(3,'b'),(2,'a'),(3,'l'),(1,'s'),(1,'t'),(3,'e')])
[(1, 's'), (1, 't'), (2, 'a'), (3, 'b'), (3, 'l'), (3, 'e')]
It should be
b[count[arr[i]]-1] = arr[i];
I'll leave it to you to track down why ;-).
I don't think they perform any differently. The second just pushes the correlation of counts out of the loop so that it's simplified a bit within the final loop. That's not necessary as far as I'm concerned. Your way is just as straightforward and probably more readable. In fact (I don't know about C# since I'm a Java guy) I would expect that you could replace that inner while-loop with a library array fill; something like this:
for (int i = 0; i < count.Length; i++)
{
arrayFill(arr, index, count[i], i);
index += count[i];
}
In Java the method is java.util.Arrays.fill(...).
The problem is that you have hard-coded the length of the array that you are using to 100. The length of the array should be m + 1 where m is the maximum element on the original array. This is the first reason that you would think using counting-sort, if you have information about the elements of the array are all minor that some constant and it would work great.
I have a datastructure with a field of the float-type. A collection of these structures needs to be sorted by the value of the float. Is there a radix-sort implementation for this.
If there isn't, is there a fast way to access the exponent, the sign and the mantissa.
Because if you sort the floats first on mantissa, exponent, and on exponent the last time. You sort floats in O(n).
Update:
I was quite interested in this topic, so I sat down and implemented it (using this very fast and memory conservative implementation). I also read this one (thanks celion) and found out that you even dont have to split the floats into mantissa and exponent to sort it. You just have to take the bits one-to-one and perform an int sort. You just have to care about the negative values, that have to be inversely put in front of the positive ones at the end of the algorithm (I made that in one step with the last iteration of the algorithm to save some cpu time).
So heres my float radixsort:
public static float[] RadixSort(this float[] array)
{
// temporary array and the array of converted floats to ints
int[] t = new int[array.Length];
int[] a = new int[array.Length];
for (int i = 0; i < array.Length; i++)
a[i] = BitConverter.ToInt32(BitConverter.GetBytes(array[i]), 0);
// set the group length to 1, 2, 4, 8 or 16
// and see which one is quicker
int groupLength = 4;
int bitLength = 32;
// counting and prefix arrays
// (dimension is 2^r, the number of possible values of a r-bit number)
int[] count = new int[1 << groupLength];
int[] pref = new int[1 << groupLength];
int groups = bitLength / groupLength;
int mask = (1 << groupLength) - 1;
int negatives = 0, positives = 0;
for (int c = 0, shift = 0; c < groups; c++, shift += groupLength)
{
// reset count array
for (int j = 0; j < count.Length; j++)
count[j] = 0;
// counting elements of the c-th group
for (int i = 0; i < a.Length; i++)
{
count[(a[i] >> shift) & mask]++;
// additionally count all negative
// values in first round
if (c == 0 && a[i] < 0)
negatives++;
}
if (c == 0) positives = a.Length - negatives;
// calculating prefixes
pref[0] = 0;
for (int i = 1; i < count.Length; i++)
pref[i] = pref[i - 1] + count[i - 1];
// from a[] to t[] elements ordered by c-th group
for (int i = 0; i < a.Length; i++){
// Get the right index to sort the number in
int index = pref[(a[i] >> shift) & mask]++;
if (c == groups - 1)
{
// We're in the last (most significant) group, if the
// number is negative, order them inversely in front
// of the array, pushing positive ones back.
if (a[i] < 0)
index = positives - (index - negatives) - 1;
else
index += negatives;
}
t[index] = a[i];
}
// a[]=t[] and start again until the last group
t.CopyTo(a, 0);
}
// Convert back the ints to the float array
float[] ret = new float[a.Length];
for (int i = 0; i < a.Length; i++)
ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
return ret;
}
It is slightly slower than an int radix sort, because of the array copying at the beginning and end of the function, where the floats are bitwise copied to ints and back. The whole function nevertheless is again O(n). In any case much faster than sorting 3 times in a row like you proposed. I dont see much room for optimizations anymore, but if anyone does: feel free to tell me.
To sort descending change this line at the very end:
ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
to this:
ret[a.Length - i - 1] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
Measuring:
I set up some short test, containing all special cases of floats (NaN, +/-Inf, Min/Max value, 0) and random numbers. It sorts exactly the same order as Linq or Array.Sort sorts floats:
NaN -> -Inf -> Min -> Negative Nums -> 0 -> Positive Nums -> Max -> +Inf
So i ran a test with a huge array of 10M numbers:
float[] test = new float[10000000];
Random rnd = new Random();
for (int i = 0; i < test.Length; i++)
{
byte[] buffer = new byte[4];
rnd.NextBytes(buffer);
float rndfloat = BitConverter.ToSingle(buffer, 0);
switch(i){
case 0: { test[i] = float.MaxValue; break; }
case 1: { test[i] = float.MinValue; break; }
case 2: { test[i] = float.NaN; break; }
case 3: { test[i] = float.NegativeInfinity; break; }
case 4: { test[i] = float.PositiveInfinity; break; }
case 5: { test[i] = 0f; break; }
default: { test[i] = test[i] = rndfloat; break; }
}
}
And stopped the time of the different sorting algorithms:
Stopwatch sw = new Stopwatch();
sw.Start();
float[] sorted1 = test.RadixSort();
sw.Stop();
Console.WriteLine(string.Format("RadixSort: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
float[] sorted2 = test.OrderBy(x => x).ToArray();
sw.Stop();
Console.WriteLine(string.Format("Linq OrderBy: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
Array.Sort(test);
float[] sorted3 = test;
sw.Stop();
Console.WriteLine(string.Format("Array.Sort: {0}", sw.Elapsed));
And the output was (update: now ran with release build, not debug):
RadixSort: 00:00:03.9902332
Linq OrderBy: 00:00:17.4983272
Array.Sort: 00:00:03.1536785
roughly more than four times as fast as Linq. That is not bad. But still not yet that fast as Array.Sort, but also not that much worse. But i was really surprised by this one: I expected it to be slightly slower than Linq on very small arrays. But then I ran a test with just 20 elements:
RadixSort: 00:00:00.0012944
Linq OrderBy: 00:00:00.0072271
Array.Sort: 00:00:00.0002979
and even this time my Radixsort is quicker than Linq, but way slower than array sort. :)
Update 2:
I made some more measurements and found out some interesting things: longer group length constants mean less iterations and more memory usage. If you use a group length of 16 bits (only 2 iterations), you have a huge memory overhead when sorting small arrays, but you can beat Array.Sort if it comes to arrays larger than about 100k elements, even if not very much. The charts axes are both logarithmized:
(source: daubmeier.de)
There's a nice explanation of how to perform radix sort on floats here:
http://www.codercorner.com/RadixSortRevisited.htm
If all your values are positive, you can get away with using the binary representation; the link explains how to handle negative values.
By doing some fancy casting and swapping arrays instead of copying this version is 2x faster for 10M numbers as Philip Daubmeiers original with grouplength set to 8. It is 3x faster as Array.Sort for that arraysize.
static public void RadixSortFloat(this float[] array, int arrayLen = -1)
{
// Some use cases have an array that is longer as the filled part which we want to sort
if (arrayLen < 0) arrayLen = array.Length;
// Cast our original array as long
Span<float> asFloat = array;
Span<int> a = MemoryMarshal.Cast<float, int>(asFloat);
// Create a temp array
Span<int> t = new Span<int>(new int[arrayLen]);
// set the group length to 1, 2, 4, 8 or 16 and see which one is quicker
int groupLength = 8;
int bitLength = 32;
// counting and prefix arrays
// (dimension is 2^r, the number of possible values of a r-bit number)
var dim = 1 << groupLength;
int groups = bitLength / groupLength;
if (groups % 2 != 0) throw new Exception("groups must be even so data is in original array at end");
var count = new int[dim];
var pref = new int[dim];
int mask = (dim) - 1;
int negatives = 0, positives = 0;
// counting elements of the 1st group incuding negative/positive
for (int i = 0; i < arrayLen; i++)
{
if (a[i] < 0) negatives++;
count[(a[i] >> 0) & mask]++;
}
positives = arrayLen - negatives;
int c;
int shift;
for (c = 0, shift = 0; c < groups - 1; c++, shift += groupLength)
{
CalcPrefixes();
var nextShift = shift + groupLength;
//
for (var i = 0; i < arrayLen; i++)
{
var ai = a[i];
// Get the right index to sort the number in
int index = pref[( ai >> shift) & mask]++;
count[( ai>> nextShift) & mask]++;
t[index] = ai;
}
// swap the arrays and start again until the last group
var temp = a;
a = t;
t = temp;
}
// Last round
CalcPrefixes();
for (var i = 0; i < arrayLen; i++)
{
var ai = a[i];
// Get the right index to sort the number in
int index = pref[( ai >> shift) & mask]++;
// We're in the last (most significant) group, if the
// number is negative, order them inversely in front
// of the array, pushing positive ones back.
if ( ai < 0) index = positives - (index - negatives) - 1; else index += negatives;
//
t[index] = ai;
}
void CalcPrefixes()
{
pref[0] = 0;
for (int i = 1; i < dim; i++)
{
pref[i] = pref[i - 1] + count[i - 1];
count[i - 1] = 0;
}
}
}
You can use an unsafe block to memcpy or alias a float * to a uint * to extract the bits.
I think your best bet if the values aren't too close and there's a reasonable precision requirement, you can just use the actual float digits before and after the decimal point to do the sorting.
For example, you can just use the first 4 decimals (be they 0 or not) to do the sorting.