Efficient way of counting specific occurrences during a for loop - c#

I have a loop over about one million values. I want to store just the exponent of each value, to display them e.g. in a histogram.
At the moment I'm doing it this way:
int[] histogram = new int[51]; // all elements start at 0
for (int i = 0; i < 1000000; i++)
{
    int exponent = getExponent(getValue(i));
    // getExponent(double value) gives the exponent (base 10)
    // getValue(int i) gives the value for loop iteration i
    if (exponent > 25)
        exponent = 25;
    if (exponent < -25)
        exponent = -25;
    histogram[exponent + 25]++;
}
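(getExponent itself isn't shown; a plausible stand-in, shown purely for context, floors the base-10 logarithm of the value:)

static int getExponent(double value)
{
    if (value == 0.0)
        return 0; // zero has no base-10 exponent; pick a convention
    return (int)Math.Floor(Math.Log10(Math.Abs(value)));
}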
Is there a more efficient and elegant way of doing this?
Perhaps without using an array?

Is there a more efficient and elegant way of doing this?
The only more elegant way would be to use Math.Min and Math.Max, but it wouldn't be any more efficient. histogram[Math.Max(0, Math.Min(50, exponent + 25))]++ is fewer characters, but no more performant and no more elegant in my opinion.
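On newer runtimes there is also Math.Clamp, which expresses the same bounds check in a single call; a minimal sketch (Math.Clamp assumes .NET Core 2.0 or later):

histogram[Math.Clamp(exponent + 25, 0, 50)]++;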
Perhaps without using an array?
An array is a raw set of values so is the most straightforward way of storing them.

Assuming that getExponent and getValue are already optimized, the only way to optimize this further is to use Parallel.For. I don't think the difference will be significant, but I see it as the only way.
As for the array, it is the best indexed low-level data structure that you can use for storing data.
Use the Task Parallel Library [using System.Threading.Tasks; Interlocked lives in System.Threading]:
int[] histogram = new int[51]; // all elements start at 0
Action<int> work = (i) =>
{
    int exponent = getExponent(getValue(i));
    // getExponent(double value) gives the exponent (base 10)
    // getValue(int i) gives the value for loop iteration i
    if (exponent > 25)
        exponent = 25;
    if (exponent < -25)
        exponent = -25;
    // a plain histogram[exponent + 25]++ would race between threads,
    // so increment atomically instead
    Interlocked.Increment(ref histogram[exponent + 25]);
};
Parallel.For(0, 1000000, work);
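The Interlocked call above keeps the counts correct, but every iteration still contends on the shared array. A variant that avoids contention entirely is the Parallel.For overload with per-thread state; a sketch, with getValue and getExponent as the question's placeholders:

int[] histogram = new int[51];
object gate = new object();
Parallel.For(0, 1000000,
    () => new int[51],          // localInit: each worker gets its own histogram
    (i, state, local) =>
    {
        int exponent = getExponent(getValue(i));
        if (exponent > 25) exponent = 25;
        if (exponent < -25) exponent = -25;
        local[exponent + 25]++; // no contention: local belongs to one thread
        return local;
    },
    local =>
    {
        lock (gate)             // localFinally: merge each worker's counts once
        {
            for (int k = 0; k < local.Length; k++)
                histogram[k] += local[k];
        }
    });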

There's not much to optimize in the conditionals and the array access. However, if the getExponent function is a time-consuming operation, you can cache its results. A lot really depends on what typical data looks like; profile to see if it helps. Also, someone else mentioned using Parallel. That's worth profiling too, as it could make a difference if getValue or getExponent is slow enough to overcome the parallelization and locking overhead.
Important note: using a double as a dictionary key is probably a bad idea, but I'm confused by the math going on and the conversions from doubles to ints.
Either way, the idea here is to cache a calculation, so perhaps you can find a better variation if this test proves useful.
int[] histogram = Enumerable.Repeat(0, 51).ToArray();
...
Dictionary<double, int> cache = new Dictionary<double, int>(histogram.Length);
...
for (int i = 0; i < 1000000; i++)
{
    double value = getValue(i);
    int exponent;
    if (!cache.TryGetValue(value, out exponent))
    {
        exponent = getExponent(value);
        cache[value] = exponent;
    }
    if (exponent > 25)
        exponent = 25;
    else if (exponent < -25)
        exponent = -25;
    histogram[exponent + 25]++;
}

Try using a public property to store the values for your exponent, so you can test and check your results.
As far as elegance goes, I would simplify the if conditions into actual functions or subroutines, and create additional abstraction with clear and concise naming.

Related

System.DivideByZeroException on high numbers in C#

So I've been trying to write a program in C# that returns all factors for a given number (I will implement user input later). The program looks as follows:
// Number to divide
long num = 600851475143;
// Initializes list
List<long> list = new List<long>();
// Defines combined variable for later output
var combined = string.Join(", ", list);
for (int i = 1; i < num; i++)
{
    if (num % i == 0)
    {
        list.Add(i);
        Console.WriteLine(i);
    }
}
However, after some time the program also starts trying to divide by negative numbers, which eventually ends in a System.DivideByZeroException. It's not clear to me why it does this. It only starts doing this once the num variable contains a number with 11 digits or more. But since I need such a high number, a fix or similar would be highly appreciated. I am still a beginner.
Thank you!
I strongly suspect the problem is integer overflow. num is a 64-bit integer, whereas i is a 32-bit integer. If num is more than int.MaxValue, then as you increment i it will end up overflowing back to negative values and then eventually 0... at which point num % i will throw.
The simplest option is just to change i to be a long instead:
for (long i = 1; i < num; i++)
It's unfortunate that there'd be no warning in your original code - i is promoted to long where it needs to be, because there's an implicit conversion from int to long. It's not obvious to me what would need to change for this to be spotted in the language itself. It would be simpler for a Roslyn analyzer to notice this sort of problem.
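As an aside, C# can turn this silent wraparound into a runtime error; a minimal illustration using a checked block, which makes integer arithmetic throw instead of wrapping:

int i = int.MaxValue;
checked
{
    i++; // throws System.OverflowException instead of wrapping to int.MinValue
}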

C# Double.ToString() performance issue

I have the following method to convert a double array to a List<string>:
static Dest Test(Source s)
{
    Dest d = new Dest();
    if (s.A24 != null)
    {
        double[] dd = s.A24;
        int cnt = dd.Length;
        List<string> lst = new List<string>();
        for (int i = 0; i < cnt; i++)
            lst.Add(((double)dd[i]).ToString());
        d.A24 = lst;
    }
    else
    {
        d.A24 = null;
    }
    return d;
}
Doing a List.Add() in a loop seems like the fastest way according to my benchmarks, beating all the various LINQ and Convert tricks.
This is really slow: 2400 ms for a million calls (Any CPU, prefer 64-bit). So I was experimenting with various ways to make it faster. Let's assume I cannot cache the source or dest lists, etc., obviously.
So anyway, I stumbled across something weird here... if I change the lst.Add() line to cast to a decimal instead of a double, it is much, MUCH faster: 900 ms vs 2400 ms.
Here are my questions:
1) decimal has greater accuracy than double, so I shouldn't lose anything in the type cast, correct?
2) Why is Decimal.ToString() so much faster than Double.ToString()?
3) Is this a reasonable optimization, or am I missing some key detail where this will come back to bite me?
I'm not concerned about using up a little bit more memory, I am only concerned about performance.
Nothing sophisticated for the test data at this point, just using:
s.A24 = new double[] { 1.2, 3.4, 5.6 };
For what it's worth, I ran the following and got different results, with decimal usually taking slightly longer (but with the calls to lst.Add() and number.ToString() being roughly equivalent in both cases).
What type of collection is A24 in your code? I wouldn't be surprised if the additional overhead you're seeing is actually in casting or something you're not currently looking at.
var iterations = 1000000;
var lst = new List<string>();
var rnd = new Random();
var dblArray = new double[iterations];
for (var i = 0; i < iterations; i++)
    // INTERESTING FINDING FROM THE COMMENTS:
    // double.ToString() is faster if this line is rnd.NextDouble(),
    // but decimal.ToString() is faster if hard-coding the value 3.5
    // (despite the overhead of casting to decimal)
    dblArray[i] = rnd.NextDouble();
var sw = new Stopwatch();
sw.Start();
for (var i = 0; i < iterations; i++)
    lst.Add(dblArray[i].ToString());
sw.Stop();
// takes 280-300 ms
Debug.WriteLine("Double loop MS: " + sw.ElapsedMilliseconds);
// reset list
lst = new List<string>();
sw.Restart();
for (var i = 0; i < iterations; i++)
    lst.Add(((decimal)dblArray[i]).ToString());
sw.Stop();
// takes 280-320 ms
Debug.WriteLine("Decimal loop MS: " + sw.ElapsedMilliseconds);
A Decimal and a Double are often confused and interchanged, but they are completely different animals at the processor level. If I had to imagine writing the code for Double.ToString(), I can see the problem... it's hard. Comparatively, writing the code for Decimal.ToString() shouldn't be much more difficult than Int32.ToString(). I'm sure if you compare Int32.ToString() to Decimal.ToString() you will find the results are very close.
FYI: Double and Float (Single) are not exact, and many numbers can't be expressed in a Double. In your example you give 1.2, which is really 1 + 1/5. That can't exist as a true double (even if the VS IDE covers for it); you would get something like 1.1999999999999998. If you want performance, use a Decimal.
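A quick way to see that inexactness for yourself, as a minimal illustration (using the classic 0.1 + 0.2 example rather than the question's data):

Console.WriteLine(0.1 + 0.2 == 0.3);          // False: neither side is exact in binary
Console.WriteLine((0.1 + 0.2).ToString("R")); // 0.30000000000000004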
2) Why is Decimal.ToString() so much faster than Double.ToString()?
Because Double.ToString() is actually much more complex. Compare the core implementations of Decimal.ToString() and Double.ToString(): Decimal.ToString() has fixed precision, whereas Double.ToString() produces precision on demand. Double also has its IEEE floating-point definition, which is much more complex than Decimal's.
The current Double.ToString() implementation relies on _ecvt on Windows and snprintf on Linux. Both are inefficient (especially the Linux implementation). There's an in-progress PR to rewrite Double.ToString() efficiently, removing the dependency on _ecvt and snprintf.

What would be the shortest way to sum up the digits in odd and even places separately

I've always loved reducing the number of code lines by using simple but smart math approaches, and this situation seems to call for one. What I basically need is to sum up the digits in the odd and even places separately, with minimum code. So far this is the best way I have been able to think of:
string number = "123456789";
int sumOfDigitsInOddPlaces = 0;
int sumOfDigitsInEvenPlaces = 0;
for (int i = 0; i < number.Length; i++)
{
    if (i % 2 == 0) // means the odd (1-based) places
        sumOfDigitsInOddPlaces += number[i] - '0';
    else
        sumOfDigitsInEvenPlaces += number[i] - '0';
}
// The rest is not important
Do you have a better idea? Something without needing to use if/else?
int *sum[2] = { &sumOfDigitsInOddPlaces, &sumOfDigitsInEvenPlaces };
for (int i = 0; i < number.Length; i++)
{
    *(sum[i & 1]) += number[i] - '0';
}
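Since the question is tagged C#, here is the same trick as a sketch in plain C# (an indexed array instead of pointers; the - '0' converts digit characters to their values):

string number = "123456789";
int[] sums = new int[2]; // sums[0]: places 1, 3, 5, ...; sums[1]: places 2, 4, 6, ...
for (int i = 0; i < number.Length; i++)
    sums[i & 1] += number[i] - '0';
int sumOfDigitsInOddPlaces = sums[0];
int sumOfDigitsInEvenPlaces = sums[1];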
You could use two separate loops, one for the odd indexed digits and one for the even indexed digits.
Also, your modulus conditional may be wrong: you're placing the even-indexed digits (0, 2, 4...) in the odd accumulator. It could just be that you're treating the number as 1-based while the character array is 0-based (maybe what you intended), but for the algorithm's sake I will consider the number to be 0-based.
Here's my proposition:
string number = "123456789";
int sumOfDigitsInOddPlaces = 0;
int sumOfDigitsInEvenPlaces = 0;
// even-indexed digits
for (int i = 0; i < number.Length; i += 2)
{
    sumOfDigitsInEvenPlaces += number[i] - '0';
}
// odd-indexed digits, note the start at j = 1
for (int j = 1; j < number.Length; j += 2)
{
    sumOfDigitsInOddPlaces += number[j] - '0';
}
On the large scale this doesn't improve efficiency (it's still an O(N) algorithm), but it eliminates the branching.
Since you added C# to the question:
var numString = "123456789";
// Where over the string's chars selects by index; Split() would not split into digits
var odds = numString.Where((c, i) => i % 2 == 1);
var evens = numString.Where((c, i) => i % 2 == 0);
var sumOfOdds = odds.Sum(c => c - '0');
var sumOfEvens = evens.Sum(c => c - '0');
Do you like Python?
num_string = "123456789"
odds = sum(map(int, num_string[::2]))
evens = sum(map(int, num_string[1::2]))
This Java solution requires no if/else, has no code duplication and is O(N):
String number = "123456789";
int[] sums = new int[2]; // sums[0] == sum of even-indexed digits, sums[1] == sum of odd-indexed
for (int arrayIndex = 0; arrayIndex < 2; ++arrayIndex)
{
    for (int i = 0; i < number.length() - arrayIndex; i += 2)
    {
        sums[arrayIndex] += Character.getNumericValue(number.charAt(i + arrayIndex));
    }
}
Assuming number.length is even, it is quite simple; the corner case is handling the last element when the length is odd.
int i = 0;
while (i < number.length - 1)
{
    sumOfDigitsInEvenPlaces += number[i++];
    sumOfDigitsInOddPlaces += number[i++];
}
if (i < number.length)
    sumOfDigitsInEvenPlaces += number[i];
Because the loop steps i by 2, subtracting 1 from the bound changes nothing when number.length is even. When it's odd, it leaves out the last item, and on exiting the loop i is exactly the index of that not-yet-visited element. By modulo-2 reasoning, that last item belongs to sumOfDigitsInEvenPlaces.
This seems slightly more verbose, but also more readable, to me than Anonymous' (accepted) answer. However, benchmarks to come.
Well, the compiler seems to find my code more understandable as well, since it removes it all if I don't print the results (which explains why I kept getting a time of 0 all along...). The other code, though, is obfuscated enough to defeat even the compiler.
In the end, even with huge arrays, it's pretty hard for clock_t to tell the difference between the two. You get about a third fewer instructions in the second case, but since everything's in cache (and your running sums even in registers) it doesn't matter much.
For the curious, I've put the disassembly of both versions (compiled from C) here : http://pastebin.com/2fciLEMw

Shuffling an array bottlenecks on Random.Next(int)

I've been working on a small piece of code that shuffles the provided array. The array should be shuffled as fast as possible; the quality of the randomization is not that important. After profiling the method I found that the biggest hog is Random.Next, which takes up about 70% of the method's execution time. Searching online for faster random generators, I found no plug-and-play libraries that offer any improved performance.
So I was wondering whether there are any ways to improve the performance of this code any more.
private static readonly Random rnd = new Random(); // the field the method below uses

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Shuffle(byte[] chars)
{
    for (var i = 0; i < chars.Length; i++)
    {
        var index = rnd.Next(chars.Length);
        byte tmpStore = chars[index];
        chars[index] = chars[i];
        chars[i] = tmpStore;
    }
}
Alright, this is getting into micro-optimization territory.
Random.Next(int) actually performs some ops internally that we can optimize out:
int index = (int)(rnd.Next() * (1.0 / int.MaxValue) * chars.Length);
Since you're using the same maxValue over and over in a loop, a trivial optimization would be to precalculate your denominator outside of the loop. This way we get rid of an int->double conversion and a multiply:
double d = chars.Length / (double)int.MaxValue;
And then:
int index = (int)(rnd.Next() * d);
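Putting both pieces back into the loop, the sketch looks like this (rnd.Next() returns a non-negative int below int.MaxValue, so index stays within the array):

double d = chars.Length / (double)int.MaxValue;
for (var i = 0; i < chars.Length; i++)
{
    int index = (int)(rnd.Next() * d);
    byte tmpStore = chars[index];
    chars[index] = chars[i];
    chars[i] = tmpStore;
}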
On a separate note: your shuffle isn't going to have a uniform distribution. See Jeff Atwood's post The Danger of Naïveté which deals specifically with this subject and shows how to perform a uniform Fisher-Yates shuffle.
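For reference, a minimal Fisher-Yates sketch of the question's Shuffle (reusing the rnd field from above); the key difference is that each element is swapped only with a random position at or below its own, which is what makes the resulting permutation uniform:

private static void Shuffle(byte[] chars)
{
    for (int i = chars.Length - 1; i > 0; i--)
    {
        int j = rnd.Next(i + 1); // 0 <= j <= i: the invariant Fisher-Yates needs
        byte tmp = chars[j];
        chars[j] = chars[i];
        chars[i] = tmp;
    }
}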
If n^n isn't too big for the double range, you could create one random double, multiply it by n^n, then take it modulo n each iteration as the current random number, dividing the running value by n to prepare for the next iteration.

C# Prime Generator, Maxxing out Bit Array

Here's some code a friend and I were poking around on:
public List<int> GetListToTop(int top)
{
    top++;
    List<int> result = new List<int>();
    BitArray primes = new BitArray(top / 2);
    int root = (int)Math.Sqrt(top);
    for (int i = 3, count = 3; i <= root; i += 2, count++)
    {
        int n = i - count;
        if (!primes[n])
            for (int j = n + i; j < top / 2; j += i)
            {
                primes[j] = true;
            }
    }
    if (top >= 2)
        result.Add(2);
    for (int i = 0, count = 3; i < primes.Length; i++, count++)
    {
        if (!primes[i])
        {
            int n = i + count;
            result.Add(n);
        }
    }
    return result;
}
On my dorky AMD x64 1800+ (dual core), it finds all primes below 1 billion in 34546.875 ms. The problem seems to be storing more in the bit array: trying to crank it past ~2 billion is more than the BitArray wants to store. Any ideas on how to get around that?
I would "swap" parts of the array out to disk. By that I mean: divide your bit array into half-billion-bit chunks and store them on disk.
Then have only a few chunks in memory at any one time. With C# (or any other OO language), it should be easy to encapsulate the huge array inside this chunking class.
You'll pay for it with slower generation time, but I don't see any way around that until we get larger address spaces and 128-bit compilers.
Or as an alternative approach to the one suggested by Pax, make use of the new Memory-Mapped File classes in .NET 4.0 and let the OS decide which chunks need to be in memory at any given time.
Note however that you'll want to try and optimise the algorithm to increase locality so that you do not needlessly end up swapping pages in and out of memory (trickier than this one sentence makes it sound).
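A sketch of that idea, with an illustrative file name and size (not from the answer): back the bit storage with a memory-mapped file and address it through a view accessor, letting the OS handle the paging:

using System.IO;
using System.IO.MemoryMappedFiles;

// 1 GB of backing store = 8 billion bits, one bit per candidate number.
const long totalBytes = 1L * 1024 * 1024 * 1024;
using (var mmf = MemoryMappedFile.CreateFromFile("sieve.bin", FileMode.Create, null, totalBytes))
using (var accessor = mmf.CreateViewAccessor())
{
    long bitIndex = 3000000000L; // a bit position well beyond int.MaxValue
    long byteIndex = bitIndex / 8;
    byte b = accessor.ReadByte(byteIndex);
    accessor.Write(byteIndex, (byte)(b | (1 << (int)(bitIndex % 8)))); // set the bit
}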
Use multiple BitArrays to increase the maximum size. If a number is too great, bit-shift it and store the result in a bit array covering bits 33-64.
BitArray second = new BitArray(int.MaxValue);
long num = 23958923589;
if (num > int.MaxValue)
{
    int shifted = (int)(num >> 32); // shift first, then cast; (int)num >> 32 would truncate before shifting
    second[shifted] = true;
}
long request = 0902305023;
if (request > int.MaxValue)
{
    int shifted = (int)(request >> 32);
    return second[shifted];
}
else
    return first[(int)request];
Of course, it would be nice if BitArray supported sizes up to System.Numerics.BigInteger.
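If the goal is simply a bit array indexable by long, the chunking can be wrapped up directly. A sketch (the class and its members are illustrative, not a library API):

using System;
using System.Collections;

// A long-indexable bit set built from multiple BitArrays.
class BigBitArray
{
    private const int ChunkBits = 1 << 30; // bits per underlying BitArray
    private readonly BitArray[] chunks;

    public BigBitArray(long totalBits)
    {
        chunks = new BitArray[(int)((totalBits + ChunkBits - 1) / ChunkBits)];
        for (int i = 0; i < chunks.Length; i++)
        {
            long remaining = totalBits - (long)i * ChunkBits;
            chunks[i] = new BitArray((int)Math.Min(ChunkBits, remaining));
        }
    }

    public bool this[long index]
    {
        get { return chunks[(int)(index / ChunkBits)][(int)(index % ChunkBits)]; }
        set { chunks[(int)(index / ChunkBits)][(int)(index % ChunkBits)] = value; }
    }
}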
Swapping to disk will make your code really slow.
I have a 64-bit OS, and my BitArray is also limited to 32 bits.
PS: your prime number calculation looks weird; mine looks like this:
for (int i = 2; i <= number; i++)
    if (primes[i])
    {
        yield return i; // i has survived the sieve, so it's prime
        for (int scalar = i + i; scalar <= number; scalar += i)
            primes[scalar] = false; // cross off its multiples
    }
The Sieve algorithm performs better. I could determine all the 32-bit primes (about 105 million in total) for the int range in less than 4 minutes with it. Of course, returning the list of primes is a different matter, as the memory requirement there would be a little over 400 MB (1 int = 4 bytes). Using a for loop, the numbers were printed to a file and then imported into a DB for more fun :) For the 64-bit primes, however, the program would need several modifications and perhaps distributed execution over multiple nodes. Also refer to the following links:
http://www.troubleshooters.com/codecorn/primenumbers/primenumbers.htm
http://en.wikipedia.org/wiki/Prime-counting_function
