I'm trying to solve a simple question on leetcode.com (https://leetcode.com/problems/number-of-1-bits/) and I'm encountering some strange behavior, which is probably down to a gap in my understanding...
My solution to the question in the link is the following:
public int HammingWeight(uint n) {
    int sum = 0;
    while (n > 0) {
        uint t = n % 10;
        sum += t == 0 ? 0 : 1;
        n /= 10;
    }
    return sum;
}
My approach was to isolate each digit and, if it's a one, increase the sum. When I ran this on my PC it worked (yes, I know it's not the optimal solution and there are more elegant solutions that use the binary representation).
But when I tried running it in the LeetCode editor, it returned a wrong answer for the following input (00000000000000000000000000001011).
There's no real easy way to debug other than printing to the console, so I printed the value of n when entering the method and got 11 instead of 1011 (on my PC I got 1011). If I take a different solution, one that uses a bitwise right shift or calculates mod 2, then it works even though the printed n is still 11. I would have expected those solutions to fail as well, considering that n is "wrong" (different between my PC and the site, as described).
Am I missing some knowledge regarding the representation of a uint? Or of a binary number in a uint variable?
Your code is processing n as base 10 (decimal), but Hamming weight is about base 2 (binary). So instead of doing % 10 and /= 10, you should be using % 2 and /= 2.
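For example, a minimal sketch of that fix, keeping your digit-by-digit loop but switching the base to 2:

public int HammingWeight(uint n) {
    int sum = 0;
    while (n > 0) {
        sum += (int)(n % 2); // the low bit contributes 1 or 0
        n /= 2;
    }
    return sum;
}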
As for what a uint looks like as binary: essentially like this, but... the CPU is allowed to lie about where each of the octets actually is (aka "endianness"). The good news is: it doesn't usually expose that lie to you unless you cheat and look under the covers at raw memory. As long as you use regular operators (including bitwise operators), the lie will remain undiscovered.
Side note: for binary work that is about checking a bit and shuffling the data down, & 1 and >> 1 would usually be preferable to % 2 and / 2. But as canton7 notes, there are also inbuilt operations for this specific scenario which use the CPU intrinsic instruction when possible (however, using the built-in function doesn't help you increase your understanding!).
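For reference, assuming .NET Core 3.0 or later (where System.Numerics.BitOperations is available), the built-in looks like this; it maps to the hardware POPCNT instruction where supported:

using System.Numerics;

int count = BitOperations.PopCount(n); // counts the set bits of the uint directly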
This Kata is poorly written: in the examples, the inputs are printed in binary representation while the outputs are printed in decimal representation, and there is nothing to help you understand that.
00000000000000000000000000001011b is 11 in decimal (8 + 2 + 1). That is why you get 11 as input for the first test case.
There are no numbers made of 0s and 1s in base 10 that you have to decode as base 2 here.
To solve the Kata, you just need to work in base 2, as you were trying to do and as #MarcGravell explained.
Please check the code below; it will work for you. It's a very simple way to solve it:
public int HammingWeight(uint n) {
    var result = 0;
    for (var i = 0; i < 32; i++)
    {
        if ((n & 1) == 1) result++;
        n = n >> 1;
    }
    return result;
}
In counting the number of bits in a word, a brute-force approach would be something like this:
int CountNumSetBits(unsigned long n)
{
    unsigned short num_setbits = 0;
    while (n)
    {
        num_setbits += n & 1;
        n >>= 1;
    }
    return num_setbits;
}
The big-O cost would be O(n), where n is the number of bits in the word.
I thought of another way of writing the algorithm, taking advantage of the fact that we can obtain the first occurrence of a set bit using y = x & ~(x - 1):
int CountNumSetBitsMethod2(unsigned long n)
{
    unsigned short num_setbits = 0;
    unsigned long y = 0; // must be as wide as n, or the XOR below erases the wrong bits
    while (n)
    {
        y = n & ~(n - 1); // get first occurrence of '1'
        if (y)            // if we have a set bit, inc our counter
            ++num_setbits;
        n ^= y;           // erase the first occurrence of '1'
    }
    return num_setbits;
}
If we assume that our inputs are 50% 1s and 50% 0s, it appears that the second algorithm could be twice as fast. However, the actual amount of work per iteration is greater:
In method one we do the following for each bit:
1 add
1 and
1 shift
In method two we do the following for each set bit:
1 and
1 complement
1 subtraction (the result of the subtraction has to be copied to another reg)
1 compare
1 increment (if compare is true)
1 XOR
Now, in practice one can determine which algorithm is faster by profiling: that is, using a stopwatch mechanism and some test data, calling each algorithm say a million times.
What I want to do first, however, is see how well I can estimate the speed difference by eyeballing the code (given the same number of set and unset bits).
If we assume that the subtraction takes approximately the same number of cycles as the add, and all the other operations are equal cycle-wise, can one conclude that each algorithm takes about the same amount of time?
Note: I am assuming here we cannot use lookup tables.
The second algorithm can be greatly simplified: n &= n - 1 clears the lowest set bit, so the loop runs exactly once per set bit:
int CountNumSetBitsMethod2(unsigned long n) {
    unsigned short num_setbits = 0;
    while (n) {
        num_setbits++;
        n &= n - 1;
    }
    return num_setbits;
}
There are many more ways to compute the number of bits set in a word:
Using lookup tables for multiple bits at a time
Using 64-bit multiplications
Using parallel addition
Using extra tricks to shave a few cycles.
Trying to determine empirically which is faster by counting cycles is not so easy because even looking at the assembly output, it is difficult to assess the impact of instruction parallelisation, pipelining, branch prediction, register renaming and contention... Modern CPUs are very sophisticated! Furthermore, the actual code generated depends on the compiler version and configuration and the timings depend on the CPU type and release... Not to mention the variability linked to the particular sets of values used (for algorithms with variable numbers of instructions).
Benchmarking is a necessary tool, but even careful benchmarking may fail to model the actual usage correctly.
Here is a great site for this kind of bit twiddling games:
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
I suggest you implement the different versions and perform comparative benchmarks on your system. There is no definite answer, only local optima for specific sets of conditions.
Some amazing finds:
// option 3, for at most 32-bit values in v:
c  = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
A more classic one, usually considered the best method for counting bits in a 32-bit integer v:
v = v - ((v >> 1) & 0x55555555); // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count
First, the only way to know how fast things are is to measure them.
Second, to find the number of set bits in some bytes, build a lookup table for the number of set bits in a byte:
0->0
1->1
2->1
3->2
4->1
etc.
This is a common method and very fast. You can code the table by hand or create it at startup.
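A minimal C# sketch of that approach, building the 256-entry table at startup and looking up one byte at a time:

static readonly byte[] BitsSetTable = BuildTable();

static byte[] BuildTable()
{
    var table = new byte[256];
    for (int i = 1; i < 256; i++)
        table[i] = (byte)((i & 1) + table[i / 2]); // i has one more set bit than i/2 iff its low bit is set
    return table;
}

static int CountSetBits(uint n)
{
    return BitsSetTable[n & 0xff]
         + BitsSetTable[(n >> 8) & 0xff]
         + BitsSetTable[(n >> 16) & 0xff]
         + BitsSetTable[(n >> 24) & 0xff];
}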
I need to calculate PI with a predefined precision using this formula (the Basel series): pi = sqrt(6 * (1/1^2 + 1/2^2 + 1/3^2 + ...)).
So I ended up with this solution.
private static double CalculatePIWithPrecision(int precision)
{
    if (precision == 0)
    {
        return PI_ZERO_PRECISION;
    }

    double sum = 0;
    double numberOfSumElements = Math.Pow(10, precision + 2);
    for (double i = 1; i < numberOfSumElements; i++)
    {
        sum += 1 / (i * i);
    }
    double pi = Math.Sqrt(sum * 6);
    return pi;
}
So this works correctly, but I've run into an efficiency problem: it's very slow with precision values of 8 and higher.
Is there a better (and faster!) way to calculate PI using that formula?
double numberOfSumElements = Math.Pow(10, precision + 2);
I'm going to talk about this strictly in practical software engineering terms, avoiding getting lost in the formal math. Just practical tips that any software engineer should know.
First observe the complexity of your code: how long it takes to execute is strictly determined by this expression. You've written an exponential algorithm; the value you calculate goes up very rapidly as precision increases. You quote the uncomfortable number yourself: 8 produces 10^10, a loop that makes ten billion calculations. That's when computers start to take seconds to produce a result, no matter how fast they are.
Exponential algorithms are bad; they perform very poorly. You can only do worse with one that has factorial complexity, O(n!), which grows even faster. Exponential is otherwise the complexity of many hard real-world problems.
Now, is that expression actually accurate? You can check with an "elbow test", using a practical back-of-the-envelope example. Let's pick a precision of 5 digits as a target and write the series out:
1.0000 + 0.2500 + 0.1111 + 0.0625 + 0.0400 + 0.0278 + ... = 1.6433
You can tell that the terms rapidly get smaller; the series converges quickly. Once the next number you add is small enough, it does very little to make the result more accurate. Let's say that when the next term is less than 0.00001, it's time to stop trying to improve the result.
So you'll stop at 1 / (n * n) = 0.00001 => n * n = 100000 => n = sqrt(100000) => n ~= 316
Your expression says to stop at 10^(5+2) = 10,000,000
You can tell that you are way off, looping entirely too often and not improving the accuracy of the result with the last 9.999 million iterations.
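A sketch of that elbow test in code; note the hedge that it bounds the size of the next term, not the total truncation error of the series, so the cutoff is a heuristic rather than a guarantee of correct digits:

static double CalculatePi(double cutoff) // e.g. 0.00001 for the 5-digit example above
{
    double sum = 0;
    for (double i = 1; ; i++)
    {
        double term = 1 / (i * i);
        if (term < cutoff)
            break; // the elbow: further terms barely move the result
        sum += term;
    }
    return Math.Sqrt(sum * 6);
}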
Time to talk about the real problem. Too bad you didn't explain how you got to such a drastically wrong algorithm, but surely you discovered when testing that your code just was not very good at calculating a more precise value for pi, so you figured that by iterating more often you'd get a better result.
Do note that in this elbow test it is also very important that you can calculate the additions with sufficient precision. I intentionally rounded the numbers, as though they were calculated on a machine capable of performing additions with 5 digits of precision. Whatever you do, the result can never be more precise than 5 digits.
You are using the double type in your code. Directly supported by the processor, it does not have infinite precision. The one and only rule you ever need to keep in mind is that calculations with double are never more precise than 15 digits. Also memorize the rule for float, it is never more precise than 7 digits.
So no matter what value you pass for precision, the result can never be more precise than 15 digits. That is not useful at all: you already have the value of pi accurate to 15 digits, it is Math.PI.
The one thing you need to do to fix this is to use a type that has more precision than double. In fact, it needs to be a type with arbitrary precision, at least as precise as the precision value you pass. Such a type does not exist in the .NET Framework. Finding a library that can provide one is a common question at SO.
I'm working on a data structure which subdivides items into quadrants, and one of the bottlenecks I've identified is my method to select the quadrant of the point. Admittedly, it's fairly simple, but it's called so many times that it adds up. I imagine there's got to be an efficient way to bit twiddle this into what I want, but I can't think of it.
private int Quadrant(Point p)
{
    if (p.X >= Center.X)
        return p.Y >= Center.Y ? 0 : 3;
    return p.Y >= Center.Y ? 1 : 2;
}
Center is of type Point, coordinates are ints. Yes, I've run a code profile, and no, this isn't premature optimization.
Because this is only used internally, I suppose my quadrants don't have to be in Cartesian order, as long as they range from 0-3.
In C/C++ the fastest way would be:

(((unsigned int)x >> 30) & 2) | ((unsigned int)y >> 31)

(shift by 30/31, or 62/63, depending on the size of int). It extracts the two sign bits, so it gives the quadrants in the order 0, 2, 3, 1.
Edit for LBushkin:
(((unsigned int)(x - center.x) >> 30) & 2) | ((unsigned int)(y-center.y) >> 31)
I don't know that you can make this code dramatically faster in C#. What you may be able to do, however, is look at how you're processing points and see if you can avoid making unnecessary calls to this method. Perhaps you could create a QuadPoint structure that stores which quadrant a point is in (after you compute it once), so that you don't have to do so again.
But, admittedly, this depends on what your algorithm is doing, and whether it's possible to store/memoize the quadrant information. If every point is completely unique, this obviously won't help.
I've just been told about a solution which produces the 0, 1, 2, 3 quadrant results ordered correctly:
#define LONG_LONG_SIGN (sizeof(long long) * 8 - 1)
double dx = point.x - center.x;
double dy = point.y - center.y;
long long *pdx = (void *)&dx;
long long *pdy = (void *)&dy;
int quadrant = ((*pdy >> LONG_LONG_SIGN) & 3) ^ ((*pdx >> LONG_LONG_SIGN) & 1);
This solution is for x,y coordinates of double type.
I've done some performance testing of this method and of the branching method from the original question: the branching method is always a bit faster (currently I am seeing a stable 160/180 ratio), so I prefer the branching method over the one with bitwise operations.
UPDATE
If someone is interested, all three algorithms were merged into EKAlgorithms C/Objective-C repository as "Cartesian quadrant selection" algorithms:
Original branching algorithm
Bitwise algorithm by #ruslik from the accepted answer.
An alternative bitwise version promoted by one of my colleagues, which is a bit slower than the second algorithm but returns quadrants in the correct order.
All algorithms there are optimized to work with double-typed points.
Performance testing showed us that in general the first, branching algorithm is the winner on Mac OS X, though on a Linux machine we did see the second algorithm performing slightly faster than the branching one.
So the general conclusion is to stick with the branching algorithm, because the bitwise versions do not give any performance gain.
My first try would be to get rid of the nested conditional.
int xi = p.X >= Center.X ? 1 : 0;
int yi = p.Y >= Center.Y ? 2 : 0;
int quadrants[4] = { ... };
return quadrants[xi+yi];
The array lookup in quadrants is optional if the quadrants are allowed to be renumbered. My code still needs two comparisons but they can be done in parallel.
I apologise in advance for any C# errors as I usually code C++.
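For what it's worth, a hedged C# rendering of the same idea; the mapping array below is one possible choice, picked here to reproduce the quadrant numbering of the original method:

private static readonly int[] QuadrantMap = { 2, 3, 1, 0 };

private int Quadrant(Point p)
{
    int xi = p.X >= Center.X ? 1 : 0;
    int yi = p.Y >= Center.Y ? 2 : 0;
    return QuadrantMap[xi + yi]; // xi + yi is 0..3; the map restores Cartesian order
}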
Perhaps something more efficient is possible when two unsigned 31-bit coordinates are stored in a single 64-bit unsigned long variable:
// The following two lines are unnecessary
// if you store your coordinates in unsigned longs right away.
unsigned long Pxy = (((unsigned long)P.x) << 32) + P.y;
unsigned long Centerxy = (((unsigned long)Center.x) << 32) + Center.y;

// This is the actual calculation; only 1 subtraction is needed.
// The or-ing with ones has only to be done once for repeated use of Centerxy.
unsigned long diff = (Centerxy | (1ULL << 63) | (1ULL << 31)) - Pxy;
int quadrant = ((diff >> 62) & 2) | ((diff >> 31) & 1);
Taking a step back, a different solution is possible: do not arrange your data structure split into quadrants right away, but alternately in the two directions. This is also done in the related kd-tree.
I have a large set of numbers, probably in the multiple gigabytes range. First issue is that I can't store all of these in memory. Second is that any attempt at addition of these will result in an overflow. I was thinking of using more of a rolling average, but it needs to be accurate. Any ideas?
These are all floating point numbers.
This is not read from a database; it is a CSV file collected from multiple sources. It has to be accurate, as it is stored as parts of a second (e.g. 0.293482888929), and a rolling average can be the difference between .2 and .3.
It is a set of numbers representing how long users took to respond to certain form actions. For example, when showing a message box, how long did it take them to press OK or Cancel? The data was sent to me stored as seconds.portions-of-a-second, 1.2347 seconds for example. Converting it to milliseconds, I overflow int, long, etc. rather quickly; even if I don't convert it, I still overflow rather quickly. I guess the one answer below is correct: maybe I don't have to be 100% accurate, just look within a certain range inside a specific StdDev and I would be close enough.
You can sample randomly from your set ("population") to get an average ("mean"). The accuracy will be determined by how much your samples vary (as determined by "standard deviation" or variance).
The advantage is that you have billions of observations, and you only have to sample a fraction of them to get a decent accuracy or the "confidence range" of your choice. If the conditions are right, this cuts down the amount of work you will be doing.
Here's a numerical library for C# that includes a random sequence generator. Just make a random sequence of numbers that reference indices in your array of elements (from 1 to x, the number of elements in your array). Dereference to get the values, and then calculate your mean and standard deviation.
If you want to test the distribution of your data, consider using the Chi-Squared Fit test or the K-S test, which you'll find in many spreadsheet and statistical packages (e.g., R). That will help confirm whether this approach is usable or not.
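A minimal sketch of the sampling idea, using System.Random in place of the library's sequence generator and assuming the values can be fetched by index:

static (double Mean, double StdError) SampleMean(IReadOnlyList<double> data, int sampleSize, Random rng)
{
    double sum = 0, sumSq = 0;
    for (int i = 0; i < sampleSize; i++)
    {
        double v = data[rng.Next(data.Count)]; // sample with replacement
        sum += v;
        sumSq += v * v;
    }
    double mean = sum / sampleSize;
    double variance = (sumSq - sampleSize * mean * mean) / (sampleSize - 1);
    return (mean, Math.Sqrt(variance / sampleSize)); // standard error of the mean
}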
Integers or floats?
If they're integers, you need to accumulate a frequency distribution by reading the numbers and recording how many of each value you see. That can be averaged easily.
For floating point, this is a bit of a problem. Given the overall range of the floats, and the actual distribution, you have to work out a bin-size that preserves the accuracy you want without preserving all of the numbers.
Edit
First, you need to sample your data to get a mean and a standard deviation. A few thousand points should be good enough.
Then, you need to determine a respectable range. Folks pick things like ±6σ (standard deviations) around the mean. You'll divide this range into as many buckets as you can stand.
In effect, the number of buckets determines the number of significant digits in your average. So, pick 10,000 or 100,000 buckets to get 4 or 5 digits of precision. Since it's a measurement, odds are good that your measurements only have two or three digits.
Edit
What you'll discover is that the mean of your initial sample is very close to the mean of any other sample, and any sample mean is close to the population mean. You'll note that most (but not all) of your means are within 1 standard deviation of each other.
You should find that your measurement errors and inaccuracies are larger than your standard deviation.
This means that a sample mean is as useful as a population mean.
Wouldn't a rolling average be as accurate as anything else (discounting rounding errors, I mean)? It might be kind of slow because of all the dividing.
You could group batches of numbers and average them recursively: average 100 numbers, 100 times, then average those results. This would be less thrashing and mostly addition.
In fact, if you added 256 or 512 values at once, you could divide the result by shifting off 8 or 9 bits; I believe you could do this in a double by simply decrementing the floating point exponent. This would make your program extremely quick, and it could be written recursively in just a few lines of code (not counting the unsafe manipulation of the exponent).
Perhaps dividing by 256 would already use this optimization? I may have to speed test dividing by 255 vs 256 and see if there is some massive improvement. I'm guessing not.
You mention 32-bit and 64-bit numbers. But why not just use a proper rational bignum library? If you have so much data and you want an exact mean, then just code it:
// Bignum and RationalBignum are the types you'd write yourself (or take from a library).
class RationalBignum {
    public Bignum Numerator { get; set; }
    public Bignum Denominator { get; set; }
}

class BigMeanr {
    public static int Main(string[] argv) {
        var sum = new RationalBignum(0);
        var n = new Bignum(0);
        using (var s = new FileStream(argv[0], FileMode.Open))
        using (var r = new BinaryReader(s)) {
            try {
                while (true) {
                    var flt = r.ReadSingle();
                    var rat = new RationalBignum(flt);
                    sum += rat;
                    n++;
                }
            }
            catch (EndOfStreamException) {
                // reached the end of the input
            }
        }
        Console.WriteLine("The mean is: {0}", sum / n);
        return 0;
    }
}
Just remember, there are more numeric types out there than the ones your compiler offers you.
You could break the data into sets of, say, 1000 numbers, average these, and then average the averages.
This is a classic divide-and-conquer type problem.
The key fact is that the average of a large set of numbers is the same as the average of the first half of the set combined with the average of the second half (weighted by their counts).
In other words:
AVG(A[1..N]) == AVG( AVG(A[1..N/2]), AVG(A[N/2..N]) )
Here is a simple, recursive C# solution. It's passed my tests and should be completely correct.
public struct SubAverage
{
    public float Average;
    public int Count;
};

static SubAverage AverageMegaList(List<float> aList)
{
    if (aList.Count <= 500) // Brute-force average 500 numbers or less.
    {
        SubAverage avg;
        avg.Average = 0;
        avg.Count = aList.Count;
        foreach (float f in aList)
        {
            avg.Average += f;
        }
        avg.Average /= avg.Count;
        return avg;
    }

    // For more than 500 numbers, break the list into two sub-lists.
    SubAverage subAvg_A = AverageMegaList(aList.GetRange(0, aList.Count / 2));
    SubAverage subAvg_B = AverageMegaList(aList.GetRange(aList.Count / 2, aList.Count - aList.Count / 2));

    SubAverage finalAnswer;
    finalAnswer.Average = subAvg_A.Average * subAvg_A.Count / aList.Count +
                          subAvg_B.Average * subAvg_B.Count / aList.Count;
    finalAnswer.Count = aList.Count;

    Console.WriteLine("The average of {0} numbers is {1}",
        finalAnswer.Count, finalAnswer.Average);
    return finalAnswer;
}
The trick is that you're worried about an overflow. In that case, it all comes down to order of execution. The basic formula is like this:
Given:
A = current avg
C = count of items
V = next value in the sequence
The next average (A1) is:

A1 = ((C * A) + V) / (C + 1)
The danger is that over the course of evaluating the sequence, while A should stay relatively manageable, C will become very large.
Eventually C * A will overflow the integer or double types.
One thing we can try is to re-write it like this, to reduce the chance of an overflow:
A1 = C/(C+1) * A/(C+1) + V/(C+1)
In this way, we never multiply C * A and only deal with smaller numbers. But the concern now is the result of the division operations. If C is very large, C/(C+1) (for example) may not be meaningful when constrained to normal floating point representations. The best I can suggest is to use the largest type possible for C here.
Here's one way to do it in pseudocode:
average = first
count = 1
while more:
    count += 1
    diff = next - average
    average += diff / count
return average
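The same idea as a C# sketch: a streaming mean that never builds a large running sum, so it cannot overflow the way a plain total would:

static double StreamingMean(IEnumerable<double> values)
{
    double average = 0;
    long count = 0;
    foreach (double v in values)
    {
        count++;
        average += (v - average) / count; // incremental update toward the new value
    }
    return average;
}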
Sorry for the late comment, but isn't the formula above, provided by Joel Coehoorn, rewritten wrongly?
I mean, the basic formula is right:
Given:
A = current avg
C = count of items
V = next value in the sequence
The next average (A1) is:
A1 = ( (C * A) + V ) / ( C + 1 )
But instead of:
A1 = C/(C+1) * A/(C+1) + V/(C+1)
shouldn't we have:
A1 = C/(C+1) * A + V/(C+1)
That would explain kastermester's post:
"My math ticks off here - You have C, which you say "go towards infinity" or at least, a really big number, then: C/(C+1) goes towards 1. A /(C+1) goes towards 0. V/(C+1) goes towards 0. All in all: A1 = 1 * 0 + 0 So put shortly A1 goes towards 0 - seems a bit off. – kastermester"
Because we would have A1 = 1 * A + 0, i.e. A1 goes towards A, which is right.
I've been using this method for calculating averages for a long time, and the aforementioned precision problems have never been an issue for me.
With floating point numbers the problem is not overflow, but loss of precision when the accumulated value gets large. Adding a small number to a huge accumulated value will result in losing most of the bits of the small number.
There is a clever solution by the author of the IEEE floating point standard himself: the Kahan summation algorithm, which deals exactly with this kind of problem by checking the error at each step and keeping a running compensation term that prevents losing the small values.
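A minimal sketch of Kahan (compensated) summation; the compensation term c recovers the low-order bits that a plain running sum would drop:

static double KahanSum(IEnumerable<double> values)
{
    double sum = 0.0;
    double c = 0.0; // running compensation for lost low-order bits
    foreach (double v in values)
    {
        double y = v - c;   // apply the correction to the incoming value
        double t = sum + y; // low-order bits of y are lost in this addition...
        c = (t - sum) - y;  // ...but recovered here, algebraically
        sum = t;
    }
    return sum;
}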
If the numbers are ints, accumulate the total in a long. If the numbers are longs... what language are you using? In Java you could accumulate the total in a BigInteger, an integer type that grows as large as it needs to be. You could always write your own class to reproduce this functionality: the gist is to make an array of integers to hold each "big number". When you add two numbers, loop through starting with the low-order value; if the result of an addition sets the high-order bit, clear that bit and carry the one to the next column.
Another option would be to find the average of, say, 1000 numbers at a time. Hold these intermediate results, then when you're done average them all together.
Why is a sum of floating point numbers overflowing? In order for that to happen, you would need to have values near the max float value, which sounds odd.
If you were dealing with integers I'd suggest using a BigInteger, or breaking the set into multiple subsets, recursively averaging the subsets, then averaging the averages.
If you're dealing with floats, it gets a bit weird. A rolling average could become very inaccurate. I suggest using a rolling average which is only updated when you hit an overflow exception or the end of the set. So effectively dividing the set into non-overflowing sets.
Two ideas from me:
If the numbers are ints, use an arbitrary precision library like IntX - this could be too slow, though
If the numbers are floats and you know the total count, you can divide each entry by that count and add up the results. If you use double, the precision should be sufficient (a sketch follows below).
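A sketch of that second idea; pre-dividing keeps the running sum on the order of the mean itself, so it cannot overflow:

static double MeanByPreDivision(IEnumerable<double> values, long totalCount)
{
    double mean = 0;
    foreach (double v in values)
        mean += v / totalCount; // each term is tiny, so the sum stays small
    return mean;
}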
Why not just scale the numbers (down) before computing the average?
If I were to find the mean of billions of doubles as accurately as possible, I would take the following approach (NOT TESTED):
Find 'M', an upper bound for log2(number_of_input_data). If there are billions of data points, 50 may be a good candidate (2^50 is more than a million billions of capacity). Create an L1 array of M double elements. If you're not sure about M, creating an extensible list will solve the issue, but it is slower.
Also create an associated L2 boolean array (all cells set to false by default).
For each incoming data D:
int i = 0;
double localMean = D;
while (L2[i]) {
    L2[i] = false;
    localMean = (localMean + L1[i]) / 2;
    i++;
}
L1[i] = localMean;
L2[i] = true;
And your final mean will be:
double sum = 0;
double totalWeight = 0;
for (int i = 0; i < 50; i++) {
    if (L2[i]) {
        long weight = 1L << i; // 1L: the shift overflows a 32-bit int for large i
        sum += L1[i] * weight;
        totalWeight += weight;
    }
}
return sum / totalWeight;
Notes:
Many proposed solutions in this thread miss the point of lost precision.
Using binary groups instead of groups of 100 (or whatever) provides better precision, and doubles can be safely doubled or halved without losing precision!
Try this
Iterate through the numbers, incrementing a counter and adding each number to a total, until adding the next number would result in an overflow or you run out of numbers.
(It makes no difference whether the inputs are integers or floats: use the largest-precision float you can and convert each input to that type.)
Divide the total by the counter to get a mean (a floating point number), and add it to a temp array.
If you have run out of numbers and there is only one element in temp, that's your result.
Otherwise, start over using the temp array as input, i.e. recurse iteratively until you reach the end condition described earlier.
Depending on the range of the numbers, it might be a good idea to have an array where the subscript is your number and the value is the quantity of that number; you could then do your calculation from this (see the sketch below).
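A sketch of that frequency-distribution idea, assuming the values are non-negative integers within a known range:

static double MeanFromCounts(IEnumerable<int> values, int maxValue)
{
    var counts = new long[maxValue + 1];
    long total = 0;
    foreach (int v in values) // assumes 0 <= v <= maxValue
    {
        counts[v]++;
        total++;
    }
    double sum = 0;
    for (int i = 0; i <= maxValue; i++)
        sum += (double)i * counts[i]; // weight each value by its frequency
    return sum / total;
}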