I just read about denormalized floating-point numbers: should I replace all zero literals with an almost-zero literal to get better performance?
I am afraid that the evil zero constants in my code could pollute my performance.
Example:
Program 1:
float a = 0.0f;
Console.WriteLine(a);
Program 2:
float b = 1.401298E-45f;
Console.WriteLine(b);
Shouldn't program 2 be 1,000,000 times faster than program 1, since b can be represented by the IEEE floating-point representation in canonical form, whereas program 1 has to deal with "zero", which is not directly representable?
If so, the whole software development industry is flawed. A simple field declaration:
float c;
would automatically initialize it to zero, which would cause the dreaded performance hit.
Spare me the hassle of mentioning "Premature Optimization is the..., blah blah".
Delayed knowledge of how compiler optimizations work could result in the explosion of a nuclear plant. So I would like to know ahead of time what I am paying, so that I know when I am safe to ignore optimizing it.
P.S. I don't care if a float becomes denormalized as the result of a mathematical operation; I have no control over that, so I don't care.
Proof: x + 0.1f is 10 times faster than x + 0
Why does changing 0.1f to 0 slow down performance by 10x?
Question synopsis: is 0.0f evil? And are all who have used it as a constant also evil?
There's nothing special about denormals that makes them inherently slower than normalized floating point numbers. In fact, a FP system which only supported denormals would be plenty fast, because it would essentially only be doing integer operations.
The slowness comes from the relative difficulty of certain operations when performed on a mix of normals and denormals. Adding a normal to a denormal is much trickier than adding a normal to a normal, or adding a denormal to a denormal. The machinery of computation is simply more involved and requires more steps. Because most of the time you're only operating on normals, it makes sense to optimize for that common case, and drop into the slower and more generalized normal/denormal implementation only when that doesn't work.
The exception to denormals being unusual, of course, is 0.0, which is a denormal with a zero mantissa. Because 0 is the sort of thing one often finds and does operations on, and because an operation involving a 0 is trivial, those are handled as part of the fast common case.
I think you've misunderstood what's going on in the answer to the question you linked. The 0 isn't by itself making things slow: despite being technically a denormal, operations on it are fast. The denormals in question are the ones stored in the y array after a sufficient number of loop iterations. The advantage of the 0.1 over the 0 is that, in that particular code snippet, it prevents numbers from becoming nonzero denormals, not that it's faster to add 0.1 than 0.0 (it isn't).
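If you want to see the effect the linked answer is actually about, here is a minimal C# sketch (iteration counts and constants are just illustrative): repeated halving drives a value into the subnormal range, while mixing in a constant like the 0.1f from the linked question keeps it normal.

using System;

class DenormalDemo
{
    static void Main()
    {
        // Repeatedly halving a float eventually lands in the denormal range
        // (below roughly 1.18e-38 for a single-precision float).
        float y = 1.0f;
        for (int i = 0; i < 140; i++)
        {
            y *= 0.5f;
        }
        Console.WriteLine(y);   // a tiny nonzero denormal (subnormal) value

        // Adding a constant like 0.1f each iteration (as in the linked
        // question) keeps the value well inside the normal range, so the
        // fast path is used throughout.
        float z = 1.0f;
        for (int i = 0; i < 140; i++)
        {
            z = z * 0.5f + 0.1f;
        }
        Console.WriteLine(z);   // stays near 0.2, a normal value
    }
}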
I'm writing a simple extension that calculates an average for an array. It works fine except when values are very big. So here is an example
const int div = 100;
double num = 0;
for (int i = 0; i < div; i++)
{
num += double.MaxValue/div;
}
Console.WriteLine(num);
Console.WriteLine(double.MaxValue);
I expect to get double.MaxValue, but I get Infinity because of rounding error. Is it possible to change the algorithm or handle this situation? I know there are some techniques for working with floats (rounding to even, for example), but I'm looking for something that could be helpful in this very case.
I hope the answer isn't "No, you cannot; just humble yourself, you have no chance when you work with floats".
One possibility would be to not divide each term by number of items, but divide the sum once afterward, in order to accumulate less error...
Of course MaxValue+MaxValue would overflow in this case. Yes, but your library would cleverly mitigate this problem by detecting overflow (arrange to trap the exception), and scale the operands by 1/2 in this case.
At the end after the division, you would apply inverse scale by appropriate power of 2.
Yes, but the sum of MaxValue, 3, -MaxValue might exhibit very bad accuracy (like answering zero instead of 1).
Ah, no problem, you can have a perfectly exact sum with code like this: Precise sum of floating point numbers. That's easy, modulo the scale protection you'll have to mix in...
A small collateral effect is that your average is no longer O(n), just a bit more expensive (ahem...). Oh, bad luck: some real-time application might expect O(n), and since you're writing a general-purpose library...
So, in order to balance both expectations, you might choose to sum with a kind of Kahan sum, and sacrifice some accuracy for speed...
Oh, but why? What are the expectations exactly? This is the main question you'd have to answer... Do you prefer a library that guarantees the best accuracy possible at the price of speed (think of crlibm vs. libm), the best speed at the price of a few corner cases, exception-free behaviour whatever the illness of the inputs (your original question), or a mix of the above?
Unfortunately, you can't have them all together...
In all cases, as Patricia said, document it.
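For the "kind of Kahan sum" compromise mentioned above, a minimal C# sketch might look like this; the method name is mine, and note that compensated summation only helps accuracy, it does not by itself prevent the overflow in the original question (the scaling-by-powers-of-two protection would still have to be layered on top):

using System;

static class AverageExtensions
{
    // Kahan (compensated) summation: carry a running correction term so that
    // small addends are not swallowed by a large running sum.
    public static double KahanAverage(this double[] values)
    {
        double sum = 0.0;
        double compensation = 0.0;
        foreach (double x in values)
        {
            double y = x - compensation;
            double t = sum + y;            // big + small: low-order bits of y are lost here...
            compensation = (t - sum) - y;  // ...and captured here for the next iteration
            sum = t;
        }
        return sum / values.Length;        // divide once, at the end
    }
}

Usage would be something like new[] { 1.0, 2.0, 3.0 }.KahanAverage().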
In the loop you are dividing by zero, and that's why the value of num is Infinity.
I need to calculate PI with predefined precision using this formula:
pi = sqrt(6 * (1/1^2 + 1/2^2 + 1/3^2 + ... + 1/n^2))
So I ended up with this solution.
private static double CalculatePIWithPrecision(int presicion)
{
    if (presicion == 0)
    {
        return PI_ZERO_PRECISION;
    }

    double sum = 0;
    double numberOfSumElements = Math.Pow(10, presicion + 2);
    for (double i = 1; i < numberOfSumElements; i++)
    {
        sum += 1 / (i * i);
    }
    double pi = Math.Sqrt(sum * 6);
    return pi;
}
So this works correctly, but I ran into a problem with efficiency. It's very slow with precision values of 8 and higher.
Is there a better (and faster!) way to calculate PI using that formula?
double numberOfSumElements = Math.Pow(10, presicion + 2);
I'm going to talk about this strictly in practical software engineering terms, avoiding getting lost in the formal math. Just practical tips that any software engineer should know.
First, observe the complexity of your code. How long it takes to execute is strictly determined by this expression. You've written an exponential algorithm: the value you calculate goes up very rapidly as presicion increases. You quote the uncomfortable number yourself: 8 produces 10^10, a loop that makes ten billion calculations. Yes, you noticed this; that's when computers start to take seconds to produce a result, no matter how fast they are.
Exponential algorithms are bad, they perform very poorly. You can only do worse with one that has factorial complexity, O(n!), which goes up even faster; that is otherwise the complexity of many real-world problems.
Now, is that expression actually accurate? You can check this with an "elbow test", using a practical back-of-the-envelope example. Let's pick a precision of 5 digits as a target and write it out:
1.0000 + 0.2500 + 0.1111 + 0.0625 + 0.0400 + 0.0278 + ... = 1.6433
You can tell that the additions rapidly get smaller, it converges quickly. You can reason out that, once the next number you add gets small enough then it does very little to make the result more accurate. Let's say that when the next number is less than 0.00001 then it's time to stop trying to improve the result.
So you'll stop at 1 / (n * n) = 0.00001 => n * n = 100000 => n = sqrt(100000) => n ~= 316
Your expression says to stop at 10^(5+2) = 10,000,000
You can tell that you are way off, looping entirely too often and not improving the accuracy of the result with the last 9.999 million iterations.
Time to talk about the real problem; it's too bad you didn't explain how you arrived at such a drastically wrong algorithm. But surely you discovered when testing your code that it just was not very good at calculating a more precise value of pi. So you figured that by iterating more often, you'd get a better result.
Do note that in this elbow-test, it is also very important that you are able to calculate the additions with sufficient precision. I intentionally rounded the numbers, as though it was calculated on a machine capable of performing additions with 5 digits of precision. Whatever you do, the result can never be more precise than 5 digits.
You are using the double type in your code. Directly supported by the processor, it does not have infinite precision. The one and only rule you ever need to keep in mind is that calculations with double are never more precise than 15 digits. Also memorize the rule for float, it is never more precise than 7 digits.
So no matter what value you pass for presicion, the result can never be more precise than 15 digits. That is not useful at all; you already have the value of pi accurate to 15 digits. It is Math.PI.
The one thing you need to do to fix this is to use a type that has more precision than double. In fact, it needs to be a type with arbitrary precision; it needs to be at least as accurate as the presicion value you pass. Such a type does not exist in the .NET Framework. Finding a library that can provide you with one is a common question on SO.
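To make the elbow test concrete, here is a hedged sketch of the same method with the stopping rule changed from a fixed 10^(presicion + 2) iterations to "stop once the next term drops below the tolerance". Keep in mind that this series converges slowly (the discarded tail behaves like 1/n) and double still caps you at roughly 15 digits, so this only removes wasted iterations; it does not deliver arbitrary precision:

using System;

class BaselPi
{
    // Sum 1/(i*i) only while the next term can still move the result,
    // instead of looping a fixed 10^(precision + 2) times.
    static double CalculatePi(int precision)
    {
        double tolerance = Math.Pow(10, -precision);
        double sum = 0.0;
        for (double i = 1; 1.0 / (i * i) >= tolerance; i++)
        {
            sum += 1.0 / (i * i);
        }
        return Math.Sqrt(sum * 6);
    }

    static void Main()
    {
        Console.WriteLine(CalculatePi(8));
        Console.WriteLine(Math.PI);   // double cannot do better than this anyway
    }
}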
I had to debug some code that was exhibiting transient and sporadic behavior, which ultimately could be attributed to an uninitialized float in a line of initializations, i.e.:
float a = number, b, c = other_number;
This section of code was rapidly sampling a device over a serial connection and averaging the output over some interval. Every once in a while, the number 2.7916085e+035 would get reported, but otherwise the code worked as intended and the bug was not reproducible.
Since the number was always 2.7916085e+035, I thought there might have been some issues with the communications handling, or the device itself, but these were ruled out. I was almost ready to blame it on external interference until I finally caught a faulty sample in the debugger.
So, to the question. Can someone postulate the significance of 2.7916085e+035? I'm not sure it has any meaning outside of my context, but what bothers me is that this number was essentially unreproducibly reproducible. That is to say, I couldn't replicate the problem reliably, but when it arose, it was always the same thing. From my understanding, uninitialized variables are supposed to be indeterminate. It's worth noting that the issue happened in all different places of program execution, phase, time of day, etc... but always on the same system.
Is there something in the .NET framework, runtime, or operating system that was causing the behavior? This was particularly troublesome to track down because the uninitialized variable always had the same value, when it didn't luckily get set to 0.
Edit: Some context. The code is within a timer with a variable tick rate, so the variables are local non-static members of a class:
if(//some box checked)
{
    switch(//some output index)
    {
        case problem_variable:
        {
            if(ready_to_sample)
            {
                float average;
                for each(float num in readings)
                {
                    average += num;
                }
                average /= readings.Count;
            }
        }
    }
}
The variable in question here is average. readings is a list of outputs that I want to average. average gets redeclared once per... average, which can happen in seconds, minutes, hours, or whenever the condition is met to take an average. More often than not the variable would get 0, but occasionally it would get the number above.
In the common floating-point encodings, 2.7916085e+035 is 0x7a570ec5 as a float and 0x474ae1d8a58be975 as a double, modulo endianness. These do not look like a typical text character string, a simple integer, or a common address. (The low bits of the double encoding are uncertain, as you did not capture enough decimal digits to determine them, but the high bits do not look meaningful.)
I expect there is little information to be deduced from this value by itself.
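If you want to reproduce the mapping from the mystery value to its raw encoding yourself, a small C# sketch like the following should do it (the float goes through GetBytes to stay compatible with older framework versions):

using System;

class BitPattern
{
    static void Main()
    {
        float f = 2.7916085e+035f;
        double d = 2.7916085e+035;

        // Raw IEEE-754 encodings of the mystery value.
        uint floatBits = BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
        long doubleBits = BitConverter.DoubleToInt64Bits(d);

        Console.WriteLine("float : 0x{0:x8}", floatBits);
        Console.WriteLine("double: 0x{0:x16}", doubleBits);
    }
}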
That double in 64-bit binary translates to
0100011101001010111000011101100010100101100010111110100000000000
or
01111010010101110000111011000101
as a 32-bit float. Nearly all modern processors keep instructions and data separate--especially R/W data. The exception, of course, is the old x86, which is a CISC processor, based on the 4004 from the days when every byte was at a premium, and even minicomputers did not have caches to work with. With modern OS's, however, it is much more likely that, while 4 or 8KB pages were being moved around, a page of instructions was changed without zeroing out the old page.
The double version might be the equivalent to
Increment by 1, where r7 (EDI - extended destination index) is selected
The second, viewed as a float, looks like it would translate to either x86 or x86-64:
How do I interpret this x86_64 assembly opcode?
The value that you see for an uninitialized variable is whatever happens to be in that memory location. It's not random; it's a value that was stored in memory by a previous function call. For example:
#include <iostream>

void f() {
    int i = 3;
}

void g() {
    int i;
    std::cout << i << std::endl;
}

int main() {
    f();
    g();
    return 0;
}
Chances are, this program (assuming the compiler doesn't optimize out the initialization in f()) will write 3 to the console.
Floats are a base-2 number system. Because of this, there are specific values that cannot be stored exactly and evaluate to an approximation.
Your output is probably giving you a value that happens to produce that same approximation. Try running through some common values that you get from the serial connection and see if you can find the value that is causing you grief. I personally would use a double for something like this instead of a float, especially if you are going to be doing any kind of calculations against those numbers.
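As a quick illustration of "evaluates to an approximation": printing a decimal constant back at round-trip precision shows the value that is actually stored. A small C# sketch (the digits in the comments are what I would expect, not guaranteed output):

using System;

class Representation
{
    static void Main()
    {
        float f = 0.1f;
        double d = 0.1;

        // Neither value is exactly 1/10; printing with round-trip precision
        // shows the stored approximations.
        Console.WriteLine(f.ToString("G9"));    // e.g. 0.100000001
        Console.WriteLine(d.ToString("G17"));   // e.g. 0.10000000000000001
    }
}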
Here's a silly fun question:
Let's say we have to perform a simple operation where we need half of the value of a variable. There are typically two ways of doing this:
y = x / 2.0;
// or...
y = x * 0.5;
Assuming we're using the standard operators provided with the language, which one has better performance?
I'm guessing multiplication is typically better so I try to stick to that when I code, but I would like to confirm this.
Although personally I'm interested in the answer for Python 2.4-2.5, feel free to also post an answer for other languages! And if you'd like, feel free to post other fancier ways (like using bitwise shift operators) as well.
Python:
time python -c 'for i in xrange(int(1e8)): t=12341234234.234 / 2.0'
real 0m26.676s
user 0m25.154s
sys 0m0.076s
time python -c 'for i in xrange(int(1e8)): t=12341234234.234 * 0.5'
real 0m17.932s
user 0m16.481s
sys 0m0.048s
multiplication is 33% faster
Lua:
time lua -e 'for i=1,1e8 do t=12341234234.234 / 2.0 end'
real 0m7.956s
user 0m7.332s
sys 0m0.032s
time lua -e 'for i=1,1e8 do t=12341234234.234 * 0.5 end'
real 0m7.997s
user 0m7.516s
sys 0m0.036s
=> no real difference
LuaJIT:
time luajit -O -e 'for i=1,1e8 do t=12341234234.234 / 2.0 end'
real 0m1.921s
user 0m1.668s
sys 0m0.004s
time luajit -O -e 'for i=1,1e8 do t=12341234234.234 * 0.5 end'
real 0m1.843s
user 0m1.676s
sys 0m0.000s
=> it's only 5% faster
Conclusions: in Python it's faster to multiply than to divide, but as you get closer to the CPU by using more advanced VMs or JITs, the advantage disappears. It's quite possible that a future Python VM would make it irrelevant.
Always use whatever is the clearest. Anything else you do is trying to outsmart the compiler. If the compiler is at all intelligent, it will do its best to optimize the result, but nothing can make the next guy not hate you for your crappy bit-shifting solution (I love bit manipulation, by the way; it's fun. But fun != readable).
Premature optimization is the root of all evil. Always remember the three rules of optimization!
Don't optimize.
If you are an expert, see rule #1
If you are an expert and can justify the need, then use the following procedure:
Code it unoptimized.
Determine how fast is "fast enough"--note which user requirement/story requires that metric.
Write a speed test.
Test the existing code--if it's fast enough, you're done.
Recode it optimized.
Test the optimized code. If it doesn't meet the metric, throw it away and keep the original.
If it meets the test, keep the original code in as comments.
Also, doing things like removing inner loops when they aren't required or choosing a linked list over an array for an insertion sort are not optimizations, just programming.
I think this is getting so nitpicky that you would be better off doing whatever makes the code more readable. Unless you perform the operations thousands, if not millions, of times, I doubt anyone will ever notice the difference.
If you really have to make the choice, benchmarking is the only way to go. Find what function(s) are giving you problems, then find out where in the function the problems occur, and fix those sections. However, I still doubt that a single mathematical operation (even one repeated many, many times) would be a cause of any bottleneck.
Multiplication is faster, division is more accurate. You'll lose some precision if your number isn't a power of 2:
y = x / 3.0;
y = x * 0.333333; // how many 3's should there be, and how will the compiler round?
Even if you let the compiler figure out the inverted constant to perfect precision, the answer can still be different.
x = 100.0;
x / 3.0 == x * (1.0/3.0) // is false in the test I just performed
The speed issue is only likely to matter in C/C++ or JIT languages, and even then only if the operation is in a loop at a bottleneck.
If you want to optimize your code but still be clear, try this:
y = x * (1.0 / 2.0);
The compiler should be able to do the divide at compile-time, so you get a multiply at run-time. I would expect the precision to be the same as in the y = x / 2.0 case.
Where this may matter a LOT is in embedded processors where floating-point emulation is required to compute floating-point arithmetic.
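The precision point is easy to check for yourself. Here is a quick C# sketch using the same x = 100.0 as above; whether the equality comes out False depends on how the two roundings happen to fall, as the comment in the original test notes:

using System;

class PrecisionCheck
{
    static void Main()
    {
        double x = 100.0;

        double byDivision = x / 3.0;             // one correctly rounded divide
        double byReciprocal = x * (1.0 / 3.0);   // reciprocal rounded once, product rounded again

        Console.WriteLine(byDivision == byReciprocal);
        Console.WriteLine(byDivision.ToString("R"));
        Console.WriteLine(byReciprocal.ToString("R"));
    }
}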
Just going to add something for the "other languages" option.
C: Since this is just an academic exercise that really makes no difference, I thought I would contribute something different.
I compiled to assembly with no optimizations and looked at the result.
The code:
int main() {
volatile int a;
volatile int b;
asm("## 5/2\n");
a = 5;
a = a / 2;
asm("## 5*0.5");
b = 5;
b = b * 0.5;
asm("## done");
return a + b;
}
compiled with gcc tdiv.c -O1 -o tdiv.s -S
the division by 2:
movl $5, -4(%ebp)
movl -4(%ebp), %eax
movl %eax, %edx
shrl $31, %edx
addl %edx, %eax
sarl %eax
movl %eax, -4(%ebp)
and the multiplication by 0.5:
movl $5, -8(%ebp)
movl -8(%ebp), %eax
pushl %eax
fildl (%esp)
leal 4(%esp), %esp
fmuls LC0
fnstcw -10(%ebp)
movzwl -10(%ebp), %eax
orw $3072, %ax
movw %ax, -12(%ebp)
fldcw -12(%ebp)
fistpl -16(%ebp)
fldcw -10(%ebp)
movl -16(%ebp), %eax
movl %eax, -8(%ebp)
However, when I changed those ints to doubles (which is what python would probably do), I got this:
division:
flds LC0
fstl -8(%ebp)
fldl -8(%ebp)
flds LC1
fmul %st, %st(1)
fxch %st(1)
fstpl -8(%ebp)
fxch %st(1)
multiplication:
fstpl -16(%ebp)
fldl -16(%ebp)
fmulp %st, %st(1)
fstpl -16(%ebp)
I haven't benchmarked any of this code, but just by examining it you can see that, using integers, division by 2 is shorter than multiplication by 0.5. Using doubles, multiplication is shorter because the compiler uses the processor's floating-point opcodes, which probably run faster (but actually I don't know) than not using them for the same operation. So ultimately this answer has shown that the performance of multiplication by 0.5 vs. division by 2 depends on the implementation of the language and the platform it runs on. Ultimately the difference is negligible and is something you should virtually never worry about, except in terms of readability.
As a side note, you can see that in my program main() returns a + b. When I take the volatile keyword away, you'll never guess what the assembly looks like (excluding the program setup):
## 5/2
## 5*0.5
## done
movl $5, %eax
leave
ret
it did both the division, multiplication, AND addition in a single instruction! Clearly you don't have to worry about this if the optimizer is any kind of respectable.
Sorry for the overly long answer.
Firstly, unless you are working in C or ASSEMBLY, you're probably in a higher level language where memory stalls and general call overheads will absolutely dwarf the difference between multiply and divide to the point of irrelevance. So, just pick what reads better in that case.
If you're talking from a very high level it won't be measurably slower for anything you're likely to use it for. You'll see in other answers, people need to do a million multiply/divides just to measure some sub-millisecond difference between the two.
If you're still curious, from a low level optimisation point of view:
Divide tends to have a significantly longer pipeline than multiply. This means it takes longer to get the result, but if you can keep the processor busy with non-dependent tasks, then it doesn't end up costing you any more than a multiply.
The size of the pipeline difference is completely hardware dependent. The last hardware I used was something like 9 cycles for an FPU multiply and 50 cycles for an FPU divide. Sounds like a lot, but then you'd lose 1000 cycles for a memory miss, so that puts things in perspective.
An analogy is putting a pie in a microwave while you watch a TV show. The total time it takes you away from the TV show is how long it takes to put the pie in the microwave and take it out again. The rest of the time you still watched the TV show. So if the pie took 10 minutes to cook instead of 1 minute, it didn't actually use up any more of your TV watching time.
In practice, if you're going to get to the level of caring about the difference between Multiply and Divide, you need to understand pipelines, cache, branch stalls, out-of-order prediction, and pipeline dependencies. If this doesn't sound like where you were intending to go with this question, then the correct answer is to ignore the difference between the two.
Many (many) years ago it was absolutely critical to avoid divides and always use multiplies, but back then memory hits were less relevant and divides were much worse. These days I rate readability higher, but if there's no readability difference, I think it's a good habit to opt for multiplies.
Write whichever more clearly states your intent.
After your program works, figure out what's slow, and make that faster.
Don't do it the other way around.
Do whatever you need. Think of your reader first; do not worry about performance until you are sure you have a performance problem.
Let the compiler take care of the performance for you.
Actually there is a good reason that as a general rule of thumb multiplication will be faster than division. Floating point division in hardware is done either with shift and conditional subtract algorithms ("long division" with binary numbers) or - more likely these days - with iterations like Goldschmidt's algorithm. Shift and subtract needs at least one cycle per bit of precision (the iterations are nearly impossible to parallelize as are the shift-and-add of multiplication), and iterative algorithms do at least one multiplication per iteration. In either case, it's highly likely that the division will take more cycles. Of course this does not account for quirks in compilers, data movement, or precision. By and large, though, if you are coding an inner loop in a time sensitive part of a program, writing 0.5 * x or 1.0/2.0 * x rather than x / 2.0 is a reasonable thing to do. The pedantry of "code what's clearest" is absolutely true, but all three of these are so close in readability that the pedantry is in this case just pedantic.
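If you do decide that a divide inside a hot loop is worth removing, the usual pattern is to compute the reciprocal once outside the loop and multiply inside it. A rough C# sketch (the method names and the divisor parameter are made up for illustration; the two variants can differ in the last bit of each result, which is exactly why compilers are reluctant to do this for you):

class ScaleExample
{
    // One floating-point divide per element.
    static void ScaleByDivision(double[] data, double divisor)
    {
        for (int i = 0; i < data.Length; i++)
            data[i] = data[i] / divisor;
    }

    // One divide up front, then a (typically cheaper) multiply per element.
    static void ScaleByReciprocal(double[] data, double divisor)
    {
        double inverse = 1.0 / divisor;   // hoisted out of the loop
        for (int i = 0; i < data.Length; i++)
            data[i] = data[i] * inverse;
    }
}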
If you are working with integers or other non-floating-point types, don't forget your bit-shifting operators: << >>
int y = 10;
y = y >> 1;
Console.WriteLine("value halved: " + y);
y = y << 1;
Console.WriteLine("now value doubled: " + y);
Multiplication is usually faster - certainly never slower.
However, if it is not speed critical, write whichever is clearest.
I have always learned that multiplication is more efficient.
Floating-point division is (generally) especially slow, so while floating-point multiplication is also relatively slow, it's probably faster than floating-point division.
But I'm more inclined to answer "it doesn't really matter", unless profiling has shown that division is a big bottleneck vs. multiplication. I'm guessing, though, that the choice of multiplication vs. division isn't going to have a big performance impact in your application.
This becomes more of a question when you are programming in assembly or perhaps C. I figure that with most modern languages that optimization such as this is being done for me.
Be wary of "guessing multiplication is typically better so I try to stick to that when I code,"
In the context of this specific question, "better" here means "faster", which is not very useful.
Thinking about speed can be a serious mistake. There are profound error implications in the specific algebraic form of the calculation.
See Floating Point arithmetic with error analysis. See Basic Issues in Floating Point Arithmetic and Error Analysis.
While some floating-point values are exact, most floating point values are an approximation; they are some ideal value plus some error. Every operation applies to the ideal value and the error value.
The biggest problems come from trying to manipulate two nearly-equal numbers. The right-most bits (the error bits) come to dominate the results.
>>> for i in range(7):
... a=1/(10.0**i)
... b=(1/10.0)**i
... print i, a, b, a-b
...
0 1.0 1.0 0.0
1 0.1 0.1 0.0
2 0.01 0.01 -1.73472347598e-18
3 0.001 0.001 -2.16840434497e-19
4 0.0001 0.0001 -1.35525271561e-20
5 1e-05 1e-05 -1.69406589451e-21
6 1e-06 1e-06 -4.23516473627e-22
In this example, you can see that as the values get smaller, the difference between nearly equal numbers creates non-zero results where the correct answer is zero.
I've read somewhere that multiplication is more efficient in C/C++; no idea regarding interpreted languages - the difference is probably negligible due to all the other overhead.
Unless it becomes an issue, stick with what is more maintainable/readable - I hate it when people tell me this, but it's so true.
I would suggest multiplication in general, because you don't have to spend the cycles ensuring that your divisor is not 0. This doesn't apply, of course, if your divisor is a constant.
As with posts #24 (multiplication is faster) and #30 - but sometimes they are both just as easy to understand:
1*1e-6F;
1/1e6F;
I find them both just as easy to read, and I have to repeat them billions of times. So it is useful to know that multiplication is usually faster.
There is a difference, but it is compiler dependent. At first, on vs2003 (C++), I got no significant difference for double types (64-bit floating point). However, running the tests again on vs2010, I detected a huge difference, up to a factor of 4 faster for multiplication. Tracking this down, it seems that vs2003 and vs2010 generate different FPU code.
On a Pentium 4, 2.8 GHz, vs2003:
Multiplication: 8.09
Division: 7.97
On a Xeon W3530, vs2003:
Multiplication: 4.68
Division: 4.64
On a Xeon W3530, vs2010:
Multiplication: 5.33
Division: 21.05
It seems that on vs2003 a division in a loop (so the divisor was used multiple times) was translated to a multiplication with the inverse. On vs2010 this optimization is not applied any more (I suppose because the two methods give slightly different results). Note also that the CPU performs divisions faster as soon as your numerator is 0.0. I do not know the precise algorithm hardwired in the chip, but maybe it is number dependent.
Edit 18-03-2013: the observation for vs2010
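If you want to repeat this kind of measurement in C#, a rough Stopwatch sketch along the following lines works. The absolute numbers depend heavily on the hardware, the JIT, and whether the compiler turns the divide into a multiply, exactly as observed above:

using System;
using System.Diagnostics;

class MulDivBenchmark
{
    const int Iterations = 100000000;

    static void Main()
    {
        double x = 12341234234.234;
        double sinkMul = 0.0, sinkDiv = 0.0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
            sinkMul += (x + i) * 0.5;      // operand varies so the work isn't hoisted
        sw.Stop();
        Console.WriteLine("multiply: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < Iterations; i++)
            sinkDiv += (x + i) / 2.0;
        sw.Stop();
        Console.WriteLine("divide:   {0} ms", sw.ElapsedMilliseconds);

        // Use the sinks so the JIT cannot discard the loops entirely.
        Console.WriteLine(sinkMul == sinkDiv);
    }
}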
Java android, profiled on Samsung GT-S5830
public void Multiplication()
{
    float a = 1.0f;
    for (int i = 0; i < 1000000; i++)
    {
        a *= 0.5f;
    }
}

public void Division()
{
    float a = 1.0f;
    for (int i = 0; i < 1000000; i++)
    {
        a /= 2.0f;
    }
}
Results?
Multiplication(): time/call: 1524.375 ms
Division(): time/call: 1220.003 ms
Division is about 20% faster than multiplication (!)
After such a long and interesting discussion, here is my take on this: there is no final answer to this question. As some people pointed out, it depends on both the hardware (cf. piotrk and gast128) and the compiler (cf. #Javier's tests). If speed is not critical, and if your application does not need to process huge amounts of data in real time, you may opt for clarity and use a division, whereas if processing speed or processor load is an issue, multiplication might be the safest.
Finally, unless you know exactly what platform your application will be deployed on, a benchmark is meaningless. And for code clarity, a single comment would do the job!
Here's a silly fun answer:
x / 2.0 is not equivalent to x * 0.5
Let's say you wrote this method on Oct 22, 2008.
double half(double x) => x / 2.0;
Now, 10 years later you learn that you can optimize this piece of code. The method is referenced in hundreds of formulas throughout your application. So you change it, and experience a remarkable 5% performance improvement.
double half(double x) => x * 0.5;
Was it the right decision to change the code? In maths, the two expressions are indeed equivalent. In computer science, that does not always hold true. Please read Minimizing the effect of accuracy problems for more details. If your calculated values are - at some point - compared with other values, you will change the outcome of edge cases. E.g.:
double quantize(double x)
{
    if (half(x) > threshold)
        return 1;
    else
        return -1;
}
Bottom line: once you settle on either of the two, stick to it!
Well, if we assume that an add/subtract operation costs 1, then a multiply costs about 5, and a divide costs about 20.
Technically there is no such thing as division; there is just multiplication by inverse elements. For example, you never divide by 2, you in fact multiply by 0.5.
'Division' - let's kid ourselves that it exists for a second - is always harder than multiplication, because to 'divide' x by y one first needs to compute the value y^{-1} such that y*y^{-1} = 1, and then do the multiplication x*y^{-1}. If you already know y^{-1}, then not calculating it from y must be an optimization.