C#/XNA - Multiplication faster than Division?

C#/XNA - Multiplication faster than Division? - c#

I saw a tweet recently that confused me (this was posted by an XNA coder, in the context of writing an XNA game):
Microoptimization tip of the day: when possible, use multiplication instead of division in high frequency areas. It's a few cycles faster.
I was quite surprised, because I always thought compilers where pretty smart (for example, using bit-shifting), and recently read a post by Shawn Hargreaves saying much the same thing. I wondered how much truth there was in this, since there are lots of calculations in my game.
I inquired, hoping for a sample, however the original poster was unable to give one. He did, however, say this:
Not necessarily when it's something like "center = width / 2". And I've already determined "yes, it's worth it". :)
So, I'm curious...
Can anyone give an example of some code where you can change a division to a multiplication and get a performance gain, where the C# compiler wasn't able to do the same thing itself.

Most compilers can do a reasonable job of optimizing when you give them a chance. For example, if you're dividing by a constant, chances are pretty good that the compiler can/will optimize that so it's done about as quickly as anything you can reasonably substitute for it.
When, however, you have two values that aren't known ahead of time, and you need to divide one by the other to get the answer, if there was much way for the compiler to do much with it, it would -- and for that matter, if there was much room for the compiler to optimize it much, the CPU would do it so the compiler didn't have to.
Edit: Your best bet for something like that (that's reasonably realistic) would probably be something like:
double scale_factor = get_input();
for (i=0; i<values.size(); i++)
values[i] /= scale_factor;
This is relatively easy to convert to something like:
scale_factor = 1.0 / scale_factor;
for (i=0; i<values.size(); i++)
values[i] *= scale_factor;
I can't really guarantee much one way or the other about a particular compiler doing that. It's basically a combination of strength reduction and loop hoisting. There are certainly optimizers that know how to do both, but what I've seen of the C# compiler suggests that it may not (but I never tested anything exactly like this, and the testing I did was a few versions back...)

Although the compiler can optimize out divisions and multiplications by powers of 2, other numbers can be difficult or impossible to optimize. Try optimizing a division by 17 and you'll see why. This is of course assuming the compiler doesn't know that you are dividing by 17 ahead of time (it is a run-time variable, not a constant).

Bit late but never mind.
The answer to your question is yes.
Have a look at my article here, http://www.codeproject.com/KB/cs/UniqueStringList2.aspx, which uses information based on the article mentioned in the first comment to your question.
I have a QuickDivideInfo struct which stores the magic number and the shift for a given divisor thus allowing division and modulo to be calculated using faster multiplication. I pre-computed (and tested!) QuickDivideInfos for a list of Golden Prime Numbers. For x64 at least, the .Divide method on QuickDivideInfo is inlined and is 3x quicker than using the divide operator (on an i5); it works for all numerators except int.MinValue and cannot overflow since the multiplication is stored in 64 bits before shifting. (I've not tried on x86 but if it doesn't inline for some reasons then the neatness of the Divide method would be lost and you would have to manually inline it).
So the above will work in all scenarios (except int.MinValue) if you can precalculate. If you trust the code that generates the magic number/shift, then you can deal with any divisor at runtime.
Other well-known small divisors with a very limited range of numerators could be written inline and may well be faster if they don't need an intermediate long.
Division by multiple of two: I would expect the compiler to deal with this (as in your width / 2) example since it is constant. If it doesn't then changing it to width >> 1 should be fine

To give some numbers, on this pdf
http://cs.smith.edu/dftwiki/index.php/CSC231_Pentium_Instructions_and_Flags
of the Pentium we get some numbers, and they aren't good:
IMUL 10 or 11
FMUL 3+1
IDIV 46 (32 bits operand)
FDIV 39
We are speaking of BIG differences

while(start<=end)
{
int mid=(start+end)/2;
if(mid*mid==A)
return mid;
if(mid*mid<A)
{
start=mid+1;
ans=mid;
}
If i am doing this way the outcome is the TIME LIMIT EXCEEDED for square root of 2147483647
But if i am doing the following way then the thing is clear that for Division compiler responds faster than for multiplication.
while(start<=end)
{
int mid=(start+end)/2;
if(mid==A/mid)
return mid;
if(mid<A/mid)
{
start=mid+1;
ans=mid;
}
else
end=mid-1;
}

Related

Why is Random.NextBytes() "surprisingly slow"?

From Fastest way to generate a random boolean, in the comment, CodesInChaos said:
MS messed up the implementation of NextBytes, so it's surprisingly slow.
[...] the performance is about as bad as calling Next for each byte, instead of taking advantage of all 31 bits. But since System.Random has bad design and implementation at pretty much every level, this is one of my smaller gripes.
Why did he said MS has made a design and implementation at pretty much every level?
How is the Random class wrongly implemented?

Of course I can't look inside his head for his reasons, but System.Random is pretty weird.
InternalSample() returns a non-negative int that cannot be int.MaxValue. That doesn't sound so bad on the outset, but that means it has almost (but not quite) 31 usable bits of randomness. That complicates things such as efficiently implementing NextBytes(byte[] buffer).. which it doesn't even try! It does this:
for (int index = 0; index < buffer.Length; ++index)
buffer[index] = (byte) (this.InternalSample() % 256);
Making approximately 4 times more calls to InternalSample than necessary. Also the % 256 is useless, casting to a byte truncates anyway. It's also biased, 255 is just slightly less probable than any other result, since the internal sample cannot be int.MaxValue.
But it gets worse. For example, NextDouble uses this.InternalSample() * 4.6566128752458E-10. It is probably not immediately obvious, but 4.6566128752458E-10 is 1.0 / int.MaxValue. What's annoying about that is that it's not a power of two, so it's a "messy" number that causes the gaps between adjacent possible results to be nonuniform.
Worse yet, the algorithms for Next(int) and Next(int, int) are inherently biased, since they simply scale a random double and reject nothing. It's also not especially fast, which would normally be a reason to avoid rejection sampling.
It's also fairly slow in general. It's a subtractive generator, a relatively unknown PRNG that apparently isn't too bad, but it has a big state (which is slow to seed and has an annoying cache footprint) and a bunch of annoying operations in the sampling algorithm. Certainly it has better quality than a basic LCG, but it's significantly marred by biased scaling methods and bad performance.
The interface design is also annoying. Since the upper bounds everywhere are exclusive, there is no easy way to generate a sample in [0 .. int.MaxValue] or [int.MinValue .. int.MaxValue], both of which are fairly commonly useful. Exclusive upper bounds are often nice to avoid weird -1's, offering no way to get a full-range sample is just annoying. Of course it can be done through NextDouble, but since the input of NextDouble already isn't a full-range sample, the result is necessarily biased.
There are probably some deficiencies that I've missed.

Avoid float rounding error when calculating average value

I'm writing a simple extension that calculates an average for an array. It works fine except when values are very big. So here is an example
const int div = 100;
double num = 0;
for (int i = 0; i < div; i++)
{
num += double.MaxValue/div;
}
Console.WriteLine(num);
Console.WriteLine(double.MaxValue);
I expect to get double.MaxValue but i get Infinity because of rounding error. Is it possible to change an algorithm or handle this situation? I know that there is some techniques to work with floats (rounding to even, for example), but I'm looking for something that could be helpful in this very case.
I hope an answer isn't No, you cannot, just humble yourself, you have no chance when you work with floats

One possibility would be to not divide each term by number of items, but divide the sum once afterward, in order to accumulate less error...
Of course MaxValue+MaxValue would overflow in this case. Yes, but your library would cleverly mitigate this problem by detecting overflow (arrange to trap the exception), and scale the operands by 1/2 in this case.
At the end after the division, you would apply inverse scale by appropriate power of 2.
Yes but the sum of MaxValue,3,-MaxValue might be exhibit very bad accuracy (like answering zero instead of 1).
Ah, no problem, you can have a perfectly exact sum with code like this Precise sum of floating point numbers, that's easy, modulo the scale protection you'll have to mix with...
A small collateral effect is that your average is no more O(n) just a bit more exensive (hem...). Oh bad luck, some real-time application might expect O(n), and since you're writing a general purpose library...
So, in order to balance both expectations, you might choose to sum with kind of Kahan sum, and sacrifice some accuracy for speed...
Oh but why? What are the expectations exactly? This is the main question you'd have to answer... Do you prefer a library that guaranty best accuracy possible at the price of speed (think of crlibm vs libm), best speed at the price of a few corner cases, exception free behaviour whatever the illness of inputs (your original question), or a mix of above?
Unfortunately, you can't have them all together...
In all case, as Patricia said, document it.

In loop you are dividing it with zero and that's why the value of num is Infinite.

C# and strange exponentiation of complex numbers

Have been playing around with complex numbers in C#, and I found something interesting. Not sure if its a bug or if I have just missed something, but when I run the following code:
var num = new Complex(0, 1);
var complexPow = Complex.Pow(num, 2);
var numTimesNum = num * num;
Console.WriteLine("Complex.Pow(num, 2) = {0} num*num = {1}", complexPow.ToString(), numTimesNum.ToString());
I get the following output:
Complex.Pow(num, 2) = (-1, 1.22460635382238E-16) num*num = (-1, 0)
If memory serves, a complex number times itself should be just -1 with no imaginary part (or rather an imaginary part of 0). So why doesnt Complex.Pow(num, 2) give -1? Where does the 1.22460635382238E-16 come from?
If it matters, I am using mono since Im not in Windows atm. Think it might be 64-bit, since I am running a 64-bit OS, but I am not sure where to check.
Take care,
Kerr.
EDIT:
Ok, I explained poorly. I of course mean the square of i is -1, not the square of any complex number. Thanks for pointing it out. A bit tired right now, so my brain doesnt work too well, lol.
EDIT 2:
To clarify something, I have been reading a little math lately and decided to make a small scripting language for fun. Ok, "scripting language" is an over statement, it just evaluates equations and nothing more.

You're seeing floating-point imprecision.
1.22460635382238E-16 is actually 0.00000000000000012....
Complex.Pow() is probably implemented through De Moivre's formula, using trigonometry to compute arbitrary powers.
It is therefore subject to inaccuracy from both floating-point arithmetic and trig.
It apparently does not have any special-case code for integral powers, which can be simpler.
Ordinary complex multiplication only involves simple arithmetic, so it is not subject to floating-point inaccuracies when the numbers are integral.

So why doesnt Complex.Pow(num, 2) give -1? Where does the 1.22460635382238E-16 come from?
I suspect it's a rounding error, basically. It's a very small number, after all. I don't know the details of Complex.Pow, but I wouldn't be at all surprised to find it used some trigonometry somewhere - you may well be observing the fact that pi/2 isn't exactly representable as a double.
The * operation is able to avoid this by being more simply defined - Complex.Pow could be special-cased to just use x * x where the power is 2, but I expect that hasn't been done. Instead, a general algorithm is used which gives an answer very close to the hypothetical one, but which can result in small errors.

So why doesnt Complex.Pow(num, 2) give -1? Where does the 1.22460635382238E-16 come from?
Standard issues with floating point representations, and the algorithms that are used to compute Complex.Pow (it's not as simple as you think). Note that 1.22460635382238E-16 is extremely small, close to machine epsilon. Additionally, a key fact here is that (0, 1) is really (1, pi / 2) in polar coordinates, and pi / 2 doesn't have an exact representation in floating point.
If this is at all uncomfortable to you, I recommend reading What Every Computer Scientist Should Know About Floating-Point Arithmetic. It should be required reading for college CS curriculum.

Multiply or divide in C#/.NET [duplicate]

Here's a silly fun question:
Let's say we have to perform a simple operation where we need half of the value of a variable. There are typically two ways of doing this:
y = x / 2.0;
// or...
y = x * 0.5;
Assuming we're using the standard operators provided with the language, which one has better performance?
I'm guessing multiplication is typically better so I try to stick to that when I code, but I would like to confirm this.
Although personally I'm interested in the answer for Python 2.4-2.5, feel free to also post an answer for other languages! And if you'd like, feel free to post other fancier ways (like using bitwise shift operators) as well.

Python:
time python -c 'for i in xrange(int(1e8)): t=12341234234.234 / 2.0'
real 0m26.676s
user 0m25.154s
sys 0m0.076s
time python -c 'for i in xrange(int(1e8)): t=12341234234.234 * 0.5'
real 0m17.932s
user 0m16.481s
sys 0m0.048s
multiplication is 33% faster
Lua:
time lua -e 'for i=1,1e8 do t=12341234234.234 / 2.0 end'
real 0m7.956s
user 0m7.332s
sys 0m0.032s
time lua -e 'for i=1,1e8 do t=12341234234.234 * 0.5 end'
real 0m7.997s
user 0m7.516s
sys 0m0.036s
=> no real difference
LuaJIT:
time luajit -O -e 'for i=1,1e8 do t=12341234234.234 / 2.0 end'
real 0m1.921s
user 0m1.668s
sys 0m0.004s
time luajit -O -e 'for i=1,1e8 do t=12341234234.234 * 0.5 end'
real 0m1.843s
user 0m1.676s
sys 0m0.000s
=>it's only 5% faster
conclusions: in Python it's faster to multiply than to divide, but as you get closer to the CPU using more advanced VMs or JITs, the advantage disappears. It's quite possible that a future Python VM would make it irrelevant

Always use whatever is the clearest. Anything else you do is trying to outsmart the compiler. If the compiler is at all intelligent, it will do the best to optimize the result, but nothing can make the next guy not hate you for your crappy bitshifting solution (I love bit manipulation by the way, it's fun. But fun != readable)
Premature optimization is the root of all evil. Always remember the three rules of optimization!
Don't optimize.
If you are an expert, see rule #1
If you are an expert and can justify the need, then use the following procedure:
Code it unoptimized
determine how fast is "Fast enough"--Note which user requirement/story requires that metric.
Write a speed test
Test existing code--If it's fast enough, you're done.
Recode it optimized
Test optimized code. IF it doesn't meet the metric, throw it away and keep the original.
If it meets the test, keep the original code in as comments
Also, doing things like removing inner loops when they aren't required or choosing a linked list over an array for an insertion sort are not optimizations, just programming.

I think this is getting so nitpicky that you would be better off doing whatever makes the code more readable. Unless you perform the operations thousands, if not millions, of times, I doubt anyone will ever notice the difference.
If you really have to make the choice, benchmarking is the only way to go. Find what function(s) are giving you problems, then find out where in the function the problems occur, and fix those sections. However, I still doubt that a single mathematical operation (even one repeated many, many times) would be a cause of any bottleneck.

Multiplication is faster, division is more accurate. You'll lose some precision if your number isn't a power of 2:
y = x / 3.0;
y = x * 0.333333; // how many 3's should there be, and how will the compiler round?
Even if you let the compiler figure out the inverted constant to perfect precision, the answer can still be different.
x = 100.0;
x / 3.0 == x * (1.0/3.0) // is false in the test I just performed
The speed issue is only likely to matter in C/C++ or JIT languages, and even then only if the operation is in a loop at a bottleneck.

If you want to optimize your code but still be clear, try this:
y = x * (1.0 / 2.0);
The compiler should be able to do the divide at compile-time, so you get a multiply at run-time. I would expect the precision to be the same as in the y = x / 2.0 case.
Where this may matter a LOT is in embedded processors where floating-point emulation is required to compute floating-point arithmetic.

Just going to add something for the "other languages" option.
C: Since this is just an academic exercise that really makes no difference, I thought I would contribute something different.
I compiled to assembly with no optimizations and looked at the result.
The code:
int main() {
volatile int a;
volatile int b;
asm("## 5/2\n");
a = 5;
a = a / 2;
asm("## 5*0.5");
b = 5;
b = b * 0.5;
asm("## done");
return a + b;
}
compiled with gcc tdiv.c -O1 -o tdiv.s -S
the division by 2:
movl $5, -4(%ebp)
movl -4(%ebp), %eax
movl %eax, %edx
shrl $31, %edx
addl %edx, %eax
sarl %eax
movl %eax, -4(%ebp)
and the multiplication by 0.5:
movl $5, -8(%ebp)
movl -8(%ebp), %eax
pushl %eax
fildl (%esp)
leal 4(%esp), %esp
fmuls LC0
fnstcw -10(%ebp)
movzwl -10(%ebp), %eax
orw $3072, %ax
movw %ax, -12(%ebp)
fldcw -12(%ebp)
fistpl -16(%ebp)
fldcw -10(%ebp)
movl -16(%ebp), %eax
movl %eax, -8(%ebp)
However, when I changed those ints to doubles (which is what python would probably do), I got this:
division:
flds LC0
fstl -8(%ebp)
fldl -8(%ebp)
flds LC1
fmul %st, %st(1)
fxch %st(1)
fstpl -8(%ebp)
fxch %st(1)
multiplication:
fstpl -16(%ebp)
fldl -16(%ebp)
fmulp %st, %st(1)
fstpl -16(%ebp)
I haven't benchmarked any of this code, but just by examining the code you can see that using integers, division by 2 is shorter than multiplication by 2. Using doubles, multiplication is shorter because the compiler uses the processor's floating point opcodes, which probably run faster (but actually I don't know) than not using them for the same operation. So ultimately this answer has shown that the performance of multiplaction by 0.5 vs. division by 2 depends on the implementation of the language and the platform it runs on. Ultimately the difference is negligible and is something you should virtually never ever worry about, except in terms of readability.
As a side note, you can see that in my program main() returns a + b. When I take the volatile keyword away, you'll never guess what the assembly looks like (excluding the program setup):
## 5/2
## 5*0.5
## done
movl $5, %eax
leave
ret
it did both the division, multiplication, AND addition in a single instruction! Clearly you don't have to worry about this if the optimizer is any kind of respectable.
Sorry for the overly long answer.

Firstly, unless you are working in C or ASSEMBLY, you're probably in a higher level language where memory stalls and general call overheads will absolutely dwarf the difference between multiply and divide to the point of irrelevance. So, just pick what reads better in that case.
If you're talking from a very high level it won't be measurably slower for anything you're likely to use it for. You'll see in other answers, people need to do a million multiply/divides just to measure some sub-millisecond difference between the two.
If you're still curious, from a low level optimisation point of view:
Divide tends to have a significantly longer pipeline than multiply. This means it takes longer to get the result, but if you can keep the processor busy with non-dependent tasks, then it doesn't end up costing you any more than a multiply.
How long the pipeline difference is is completely hardware dependant. Last hardware I used was something like 9 cycles for a FPU multiply and 50 cycles for a FPU divide. Sounds a lot, but then you'd lose 1000 cycles for a memory miss, so that can put things in perspective.
An analogy is putting a pie in a microwave while you watch a TV show. The total time it took you away from the TV show is how long it was to put it in the microwave, and take it out of the microwave. The rest of your time you still watched the TV show. So if the pie took 10 minutes to cook instead of 1 minute, it didn't actually use up any more of your tv watching time.
In practice, if you're going to get to the level of caring about the difference between Multiply and Divide, you need to understand pipelines, cache, branch stalls, out-of-order prediction, and pipeline dependencies. If this doesn't sound like where you were intending to go with this question, then the correct answer is to ignore the difference between the two.
Many (many) years ago it was absolutely critical to avoid divides and always use multiplies, but back then memory hits were less relevant, and divides were much worse. These days I rate readability higher, but if there's no readability difference, I think its a good habit to opt for multiplies.

Write whichever is more clearly states your intent.
After your program works, figure out what's slow, and make that faster.
Don't do it the other way around.

Do whatever you need. Think of your reader first, do not worry about performance until you are sure you have a performance problem.
Let compiler do the performance for you.

Actually there is a good reason that as a general rule of thumb multiplication will be faster than division. Floating point division in hardware is done either with shift and conditional subtract algorithms ("long division" with binary numbers) or - more likely these days - with iterations like Goldschmidt's algorithm. Shift and subtract needs at least one cycle per bit of precision (the iterations are nearly impossible to parallelize as are the shift-and-add of multiplication), and iterative algorithms do at least one multiplication per iteration. In either case, it's highly likely that the division will take more cycles. Of course this does not account for quirks in compilers, data movement, or precision. By and large, though, if you are coding an inner loop in a time sensitive part of a program, writing 0.5 * x or 1.0/2.0 * x rather than x / 2.0 is a reasonable thing to do. The pedantry of "code what's clearest" is absolutely true, but all three of these are so close in readability that the pedantry is in this case just pedantic.

If you are working with integers or non floating point types don't forget your bitshifting operators: << >>
int y = 10;
y = y >> 1;
Console.WriteLine("value halved: " + y);
y = y << 1;
Console.WriteLine("now value doubled: " + y);

Multiplication is usually faster - certainly never slower.
However, if it is not speed critical, write whichever is clearest.

I have always learned that multiplication is more efficient.

Floating-point division is (generally) especially slow, so while floating-point multiplication is also relatively slow, it's probably faster than floating-point division.
But I'm more inclined to answer "it doesn't really matter", unless profiling has shown that division is a bit bottleneck vs. multiplication. I'm guessing, though, that the choice of multiplication vs. division isn't going to have a big performance impact in your application.

This becomes more of a question when you are programming in assembly or perhaps C. I figure that with most modern languages that optimization such as this is being done for me.

Be wary of "guessing multiplication is typically better so I try to stick to that when I code,"
In context of this specific question, better here means "faster". Which is not very useful.
Thinking about speed can be a serious mistake. There are profound error implications in the specific algebraic form of the calculation.
See Floating Point arithmetic with error analysis. See Basic Issues in Floating Point Arithmetic and Error Analysis.
While some floating-point values are exact, most floating point values are an approximation; they are some ideal value plus some error. Every operation applies to the ideal value and the error value.
The biggest problems come from trying to manipulate two nearly-equal numbers. The right-most bits (the error bits) come to dominate the results.
>>> for i in range(7):
... a=1/(10.0**i)
... b=(1/10.0)**i
... print i, a, b, a-b
...
0 1.0 1.0 0.0
1 0.1 0.1 0.0
2 0.01 0.01 -1.73472347598e-18
3 0.001 0.001 -2.16840434497e-19
4 0.0001 0.0001 -1.35525271561e-20
5 1e-05 1e-05 -1.69406589451e-21
6 1e-06 1e-06 -4.23516473627e-22
In this example, you can see that as the values get smaller, the difference between nearly equal numbers create non-zero results where the correct answer is zero.

I've read somewhere that multiplication is more efficient in C/C++; No idea regarding interpreted languages - the difference is probably negligible due to all the other overhead.
Unless it becomes an issue stick with what is more maintainable/readable - I hate it when people tell me this but its so true.

I would suggest multiplication in general, because you don't have to spend the cycles ensuring that your divisor is not 0. This doesn't apply, of course, if your divisor is a constant.

As with posts #24 (multiplication is faster) and #30 - but sometimes they are both just as easy to understand:
1*1e-6F;
1/1e6F;
~ I find them both just as easy to read, and have to repeat them billions of times. So it is useful to know that multiplication is usually faster.

There is a difference, but it is compiler dependent. At first on vs2003 (c++) I got no significant difference for double types (64 bit floating point). However running the tests again on vs2010, I detected a huge difference, up to factor 4 faster for multiplications. Tracking this down, it seems that vs2003 and vs2010 generates different fpu code.
On a Pentium 4, 2.8 GHz, vs2003:
Multiplication: 8.09
Division: 7.97
On a Xeon W3530, vs2003:
Multiplication: 4.68
Division: 4.64
On a Xeon W3530, vs2010:
Multiplication: 5.33
Division: 21.05
It seems that on vs2003 a division in a loop (so the divisor was used multiple times) was translated to a multiplication with the inverse. On vs2010 this optimization is not applied any more (I suppose because there is slightly different result between the two methods). Note also that the cpu performs divisions faster as soon as your numerator is 0.0. I do not know the precise algorithm hardwired in the chip, but maybe it is number dependent.
Edit 18-03-2013: the observation for vs2010

Java android, profiled on Samsung GT-S5830
public void Mutiplication()
{
float a = 1.0f;
for(int i=0; i<1000000; i++)
{
a *= 0.5f;
}
}
public void Division()
{
float a = 1.0f;
for(int i=0; i<1000000; i++)
{
a /= 2.0f;
}
}
Results?
Multiplications(): time/call: 1524.375 ms
Division(): time/call: 1220.003 ms
Division is about 20% faster than multiplication (!)

After such a long and interesting discussion here is my take on this: There is no final answer to this question. As some people pointed out it depends on both, the hardware (cf piotrk and gast128) and the compiler (cf #Javier's tests). If speed is not critical, if your application does not need to process in real-time huge amount of data, you may opt for clarity using a division whereas if processing speed or processor load are an issue, multiplication might be the safest.
Finally, unless you know exactly on what platform your application will be deployed, benchmark is meaningless. And for code clarity, a single comment would do the job!

Here's a silly fun answer:
x / 2.0 is not equivalent to x * 0.5
Let's say you wrote this method on Oct 22, 2008.
double half(double x) => x / 2.0;
Now, 10 years later you learn that you can optimize this piece of code. The method is referenced in hundreds of formulas throughout your application. So you change it, and experience a remarkable 5% performance improvement.
double half(double x) => x * 0.5;
Was it the right decision to change the code? In maths, the two expressions are indeed equivalent. In computer science, that does not always hold true. Please read Minimizing the effect of accuracy problems for more details. If your calculated values are - at some point - compared with other values, you will change the outcome of edge cases. E.g.:
double quantize(double x)
{
if (half(x) > threshold))
return 1;
else
return -1;
}
Bottom line is; once you settle for either of the two, then stick to it!

Well, if we assume that an add/subtrack operation costs 1, then multiply costs 5, and divide costs about 20.

Technically there is no such thing as division, there is just multiplication by inverse elements. For example You never divide by 2, you in fact multiply by 0.5.
'Division' - let's kid ourselves that it exists for a second - is always harder that multiplication because to 'divide' x by y one first needs to compute the value y^{-1} such that y*y^{-1} = 1 and then do the multiplication x*y^{-1}. If you already know y^{-1} then not calculating it from y must be an optimization.

Is shifting bits faster than multiplying and dividing in Java? .NET? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
Shifting bits left and right is apparently faster than multiplication and division operations on most, maybe even all, CPUs if you happen to be using a power of 2. However, it can reduce the clarity of code for some readers and some algorithms. Is bit-shifting really necessary for performance, or can I expect the compiler or VM to notice the case and optimize it (in particular, when the power-of-2 is a literal)? I am mainly interested in the Java and .NET behavior but welcome insights into other language implementations as well.

Almost any environment worth its salt will optimize this away for you. And if it doesn't, you've got bigger fish to fry. Seriously, do not waste one more second thinking about this. You will know when you have performance problems. And after you run a profiler, you will know what is causing it, and it should be fairly clear how to fix it.
You will never hear anyone say "my application was too slow, then I started randomly replacing x * 2 with x << 1 and everything was fixed!" Performance problems are generally solved by finding a way to do an order of magnitude less work, not by finding a way to do the same work 1% faster.

Most compilers today will do more than convert multiply or divide by a power-of-two to shift operations. When optimizing, many compilers can optimize a multiply or divide with a compile time constant even if it's not a power of 2. Often a multiply or divide can be decomposed to a series of shifts and adds, and if that series of operations will be faster than the multiply or divide, the compiler will use it.
For division by a constant, the compiler can often convert the operation to a multiply by a 'magic number' followed by a shift. This can be a major clock-cycle saver since multiplication is often much faster than a division operation.
Henry Warren's book, Hacker's Delight, has a wealth of information on this topic, which is also covered quite well on the companion website:
http://www.hackersdelight.org/
See also a discussion (with a link or two ) in:
Reading assembly code
Anyway, all this boils down to allowing the compiler to take care of the tedious details of micro-optimizations. It's been years since doing your own shifts outsmarted the compiler.

Humans are wrong in these cases.
99% when they try to second guess a modern (and all future) compilers.
99.9% when they try to second guess modern (and all future) JITs at the same time.
99.999% when they try to second guess modern (and all future) CPU optimizations.
Program in a way that accurately describes what you want to accomplish, not how to do it. Future versions of the JIT, VM, compiler, and CPU can all be independantly improved and optimized. If you specify something so tiny and specific, you lose the benefit of all future optimizations.

You can almost certainly depend on the literal-power-of-two multiplication optimisation to a shift operation. This is one of the first optimisations that students of compiler construction will learn. :)
However, I don't think there's any guarantee for this. Your source code should reflect your intent, rather than trying to tell the optimiser what to do. If you're making a quantity larger, use multiplication. If you're moving a bit field from one place to another (think RGB colour manipulation), use a shift operation. Either way, your source code will reflect what you are actually doing.

Note that shifting down and division will (in Java, certainly) give different results for negative, odd numbers.
int a = -7;
System.out.println("Shift: "+(a >> 1));
System.out.println("Div: "+(a / 2));
Prints:
Shift: -4
Div: -3
Since Java doesn't have any unsigned numbers it's not really possible for a Java compiler to optimise this.

On computers I tested, integer divisions are 4 to 10 times slower than other operations.
When compilers may replace divisions by multiples of 2 and make you see no difference, divisions by not multiples of 2 are significantly slower.
For example, I have a (graphics) program with many many many divisions by 255.
Actually my computation is :
r = (((top.R - bottom.R) * alpha + (bottom.R * 255)) * 0x8081) >> 23;
I can ensure that it is a lot faster than my previous computation :
r = ((top.R - bottom.R) * alpha + (bottom.R * 255)) / 255;
so no, compilers cannot do all the tricks of optimization.

I would ask "what are you doing that it would matter?". First design your code for readability and maintainability. The likelyhood that doing bit shifting verses standard multiplication will make a performance difference is EXTREMELY small.

It is hardware dependent. If we are talking micro-controller or i386, then shifting might be faster but, as several answers state, your compiler will usually do the optimization for you.
On modern (Pentium Pro and beyond) hardware the pipelining makes this totally irrelevant and straying from the beaten path usually means you loose a lot more optimizations than you can gain.
Micro optimizations are not only a waste of your time, they are also extremely difficult to get right.

If the compiler (compile-time constant) or JIT (runtime constant) knows that the divisor or multiplicand is a power of two and integer arithmetic is being performed, it will convert it to a shift for you.

According to the results of this microbenchmark, shifting is twice as fast as dividing (Oracle Java 1.7.0_72).

Most compilers will turn multiplication and division into bit shifts when appropriate. It is one of the easiest optimizations to do. So, you should do what is more easily readable and appropriate for the given task.

I am stunned as I just wrote this code and realized that shifting by one is actually slower than multiplying by 2!
(EDIT: changed the code to stop overflowing after Michael Myers' suggestion, but the results are the same! What is wrong here?)
import java.util.Date;
public class Test {
public static void main(String[] args) {
Date before = new Date();
for (int j = 1; j < 50000000; j++) {
int a = 1 ;
for (int i = 0; i< 10; i++){
a *=2;
}
}
Date after = new Date();
System.out.println("Multiplying " + (after.getTime()-before.getTime()) + " milliseconds");
before = new Date();
for (int j = 1; j < 50000000; j++) {
int a = 1 ;
for (int i = 0; i< 10; i++){
a = a << 1;
}
}
after = new Date();
System.out.println("Shifting " + (after.getTime()-before.getTime()) + " milliseconds");
}
}
The results are:
Multiplying 639 milliseconds
Shifting 718 milliseconds

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.