Most efficient way of multiplying and dividing fixed scale decimal numbers - c#

Background
I work in the field of financial trading and am currently optimizing a real-time C# trading application.
Through extensive profiling I have identified that the performance of System.Decimal is now a bottleneck. As a result I am currently coding up a couple of more efficient fixed-scale 64-bit 'decimal' structures (one signed, one unsigned) to perform base-10 arithmetic. Using a fixed scale of 9 (i.e. 9 digits after the decimal point) means the underlying 64-bit integer can be used to represent the values:
-9,223,372,036.854775808 to 9,223,372,036.854775807
and
0 to 18,446,744,073.709551615
respectively.
This makes most operations trivial (i.e. comparisons, addition, subtraction). However, for multiplication and division I am currently falling back on the implementation provided by System.Decimal. I assume the external FCallMultiply method it invokes for multiplication uses either the Karatsuba or Toom–Cook algorithm under the covers. For division, I'm not sure which particular algorithm it would use.
Question
Does anyone know if, due to the fixed scale of my decimal values, there are any faster multiplication and division algorithms I can employ which are likely to out-perform System.Decimal?
I would appreciate your thoughts...

I have done something similar using the Schönhage–Strassen algorithm.
I cannot find any sources now, but you can try to convert this code to the C# language.
P.S. I cannot say for sure about System.Decimal, but the Karatsuba algorithm is used by System.Numerics.BigInteger.

My take on fixed-point arithmetic in general, not knowing about C# or .NET in particular (VS Express is acting up); see also "Fixed point math in c#?" and "Why no fixed point type in C#?":
The main point is the fixed scale, and that this is conceptual first and foremost: the hardware couldn't care less about the meaning or interpretation of numbers (or much of anything), unless it supports something directly, if only for marketing reasons.
the easy part: addition/subtraction - just ignore the scaling
multiplication: compute the double-wide product, then divide by the scale (see the sketch after this list)
division: multiply the (widened) dividend by the scale, then divide
the ugly part: transcendental functions beyond exponentiation (exponentiate, then multiply by the scale to half that power)
in choosing a scale, don't forget conversion to and from digits, which may vastly outnumber multiplications and divisions (and give using a square a thought, see above …)
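A minimal sketch of those two steps, assuming a scale of 10^9 on a signed 64-bit raw value and using Int128 (.NET 7 and later) for the double-wide intermediate; a production version would hand-roll the 128-bit arithmetic (e.g. via Math.BigMul) and choose explicit rounding and overflow policies:

    // Illustrative only: the value is stored as raw = value * 10^9.
    public readonly struct Fixed9
    {
        private const long Scale = 1_000_000_000; // 10^9
        public readonly long Raw;

        public Fixed9(long raw) => Raw = raw;

        public static Fixed9 Multiply(Fixed9 a, Fixed9 b)
        {
            // The raw product carries the scale twice (10^18), so divide once by the scale.
            Int128 wide = (Int128)a.Raw * b.Raw;
            return new Fixed9((long)(wide / Scale));
        }

        public static Fixed9 Divide(Fixed9 a, Fixed9 b)
        {
            // Widen and pre-multiply the dividend by the scale so the quotient keeps 9 fractional digits.
            Int128 wide = (Int128)a.Raw * Scale;
            return new Fixed9((long)(wide / b.Raw));
        }
    }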
That said, "multiples of word size" and powers of two have been popular choices for scale due to speed in multiplying and dividing by such a scale. This still may make a difference with contemporary processors, if not for main ALUs of PCs - think SIMD extensions, GPUs, embedded …
Given what little I was able to discern of your application and requirements (consider disclosing more), three generic choices of scale to consider are 10^9, 2^30 and 2^32. The latter two representations may be called 34.30 and 32.32 for the bit lengths of their integral and fractional parts, respectively.
With a language that allows you to create types (especially ones supporting operators in addition to invokable procedures), I deem it important to design and implement the new type according to the principle of least surprise.

Related

C#: Natural Log needed with decimal values for financial purpose [duplicate]

I need to be able to use the standard math functions on decimal numbers. Accuracy is very important. double is not an acceptable substitution. How can math operations be implemented with decimal numbers in C#?
edit
I am using the System.Decimal. My issue is that System.Math does not work with System.Decimal. For example, the following functions do not work with System.Decimal:
System.Math.Pow
System.Math.Log
System.Math.Sqrt
Well, Double uses floating point math which isn't what you're after unless you're doing trigonometry for 3D graphics or something.
If you need to do simple math operations like division, you should use System.Decimal.
From MSDN: The decimal keyword denotes a 128-bit data type. Compared to floating-point types, the decimal type has a greater precision and a smaller range, which makes it suitable for financial and monetary calculations.
Update: After some discussion, the problem is that you want to work with Decimals, but System.Math only takes Doubles for several key pieces of functionality. Sadly, you are working with high precision numbers, and since Decimal is 128 bit and Double is only 64, the conversion results in a loss of precision.
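A tiny illustration of that loss (the literal below is chosen only to show the effect):

    decimal precise = 1.0000000000000000000000000001m; // fits exactly in decimal
    double narrowed = (double)precise;                 // double keeps only ~15-17 significant digits
    Console.WriteLine(narrowed == 1.0);                // True - the trailing 1 was rounded away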
Apparently there are some possible plans to make most of System.Math handle Decimal, but we aren't there yet.
I googled around a bit for math libraries and compiled this list:
Mathdotnet (Math.NET), an open-source mathematical library (MIT/X11, LGPL & GPL) written in C#/.NET, aiming to provide a self-contained, clean framework for symbolic algebraic and numerical/scientific computations.
Extreme Optimization Mathematics Library for .NET (paid)
DecimalMath: a relative newcomer, this one advertises itself as "Portable math support for Decimal that Microsoft forgot and more". Sounds promising.
DecimalMath contains analogues of all the functions in the System.Math class, taking decimal arguments.
Note: it is my library, and it also contains some examples.
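If only a function or two is needed and a dependency feels like overkill, a hand-rolled Newton-Raphson iteration is a common stop-gap. A minimal sketch for a decimal square root (the double seed and fixed iteration count are illustrative choices, not a library API):

    // Newton-Raphson square root for decimal: seed from double, then refine.
    public static decimal Sqrt(decimal x)
    {
        if (x < 0) throw new ArgumentOutOfRangeException(nameof(x));
        if (x == 0) return 0;

        decimal guess = (decimal)Math.Sqrt((double)x); // roughly 15-16 digits correct
        for (int i = 0; i < 4; i++)                    // each step roughly doubles the correct digits
            guess = (guess + x / guess) / 2;
        return guess;
    }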
You haven't given us nearly enough information to answer the question.
decimal and double are both inaccurate. The representation error of decimals is zero when the quantity being represented is exactly equal to a fraction of the form x/10^n for suitable choices of x and n. The representation error of doubles is zero when the quantity is exactly equal to a fraction of the form x/2^n, again for suitable choices of x and n.
If the quantities you are dealing with are not fractions of that form then you will get some representation error, period. In particular, you mention taking square roots. Many square roots are irrational numbers; they have no fractional form, so any representation format that uses fractions is going to give small errors.
Can you explain what you are doing in hugely more detail?

'Beautify' number by rounding erroneous digits appropriately

I want my cake and to eat it. I want to beautify (round) numbers to the largest extent possible without compromising accuracy for other calculations. I'm using doubles in C# (with some string conversion manipulation too).
Here's the issue. I understand the inherent limitations in double number representation (so please don't explain that). HOWEVER, I want to round the number in some way to appear aesthetically pleasing to the end user (I am making a calculator). The problem is rounding by X significant digits works in one case, but not in the other, whilst rounding by decimal place works in the other, but not the first case.
Observe:
CASE A: Math.Sin(Math.PI) = 0.000000000000000122460635382238
CASE B: 0.000000000000001/3 = 0.000000000000000333333333333333
For the first case, I want to round by DECIMAL PLACES. That would give me the nice neat zero I'm looking for. Rounding by Sig digits would mean I would keep the erroneous digits too.
However for the second case, I want to round by SIGNIFICANT DIGITS, as I would lose tons of accuracy if I rounded merely by decimal places.
Is there a general way I can cater to both types of calculation?
I don't think it's feasible to do that to the result itself, and precision has nothing to do with it.
Consider this input: (1+3)/2^3 . You can "beautify" it by showing the result as sin(30) or cos(60) or 1/2 and a whole lot of other interpretations. Choosing the wrong "beautification" can mislead your user, making them think their function has something to do with sin(x).
If your calculator keeps all the initial input as variables, you could keep all the operations postponed until you need the result and then make sure you simplify the result until it matches your needs. You'll also need to consider using rational numbers; e, Pi and other irrational numbers may not be as easy to deal with.
The best solution to this is to keep every bit you can get during calculations, and leave the display format up to the end user. The user should have some idea how many significant digits make sense in their situation, given both the nature of the calculations and the use of the result.
Default to a reasonable number of significant digits for a few calculations in the floating point format you are using internally - about 12 if you are using double. If the user changes the format, immediately redisplay in the new format.
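A small display-side helper along those lines, rounding to a chosen number of significant digits rather than decimal places (the 12-digit default is just the suggestion above):

    // Round for display only; keep the full-precision value for further calculations.
    static double RoundToSignificantDigits(double value, int digits = 12)
    {
        if (value == 0.0 || double.IsNaN(value) || double.IsInfinity(value))
            return value;

        // Scale the value into [0.1, 1), round there, then scale back.
        double scale = Math.Pow(10, Math.Floor(Math.Log10(Math.Abs(value))) + 1);
        return scale * Math.Round(value / scale, digits);
    }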
The best solution is to use arbitrary-precision and/or symbolic arithmetic, although these result in much more complex code and slower execution. But since performance isn't important for a calculator (at least for a button calculator, as opposed to one that evaluates whole entered expressions), you can use them without issue.
Anyway, there's a good trade-off, which is to use decimal floating point. You'll need to limit the input/output precision but use a higher precision for the internal representation, so that you can discard values very close to zero like the sin case above. For better results you could detect some edge cases, such as sine/cosine of multiples of 45 degrees, and directly return the exact result.
Edit: I just found a good solution but haven't had an opportunity to try it.
Here’s something I bet you never think about, and for good reason: how are floating-point numbers rendered as text strings? This is a surprisingly tough problem, but it’s been regarded as essentially solved since about 1990.
Prior to Steele and White’s "How to print floating-point numbers accurately", implementations of printf and similar rendering functions did their best to render floating point numbers, but there was wide variation in how well they behaved. A number such as 1.3 might be rendered as 1.29999999, for instance, or if a number was put through a feedback loop of being written out and its written representation read back, each successive result could drift further and further away from the original.
...
In 2010, Florian Loitsch published a wonderful paper in PLDI, "Printing floating-point numbers quickly and accurately with integers", which represents the biggest step in this field in 20 years: he mostly figured out how to use machine integers to perform accurate rendering! Why do I say "mostly"? Because although Loitsch's "Grisu3" algorithm is very fast, it gives up on about 0.5% of numbers, in which case you have to fall back to Dragon4 or a derivative.
Here be dragons: advances in problems you didn’t even know you had

Why is the division result between two integers truncated?

All experienced programmers in C# (I think this comes from C) are used to casting one of the integers in a division to get the decimal/double/float result instead of the int (the real result, truncated).
I'd like to know why is this implemented like this? Is there ANY good reason to truncate the result if both numbers are integer?
C# traces its heritage to C, so the answer to "why is it like this in C#?" is a combination of "why is it like this in C?" and "was there no good reason to change?"
The approach of C is to have a fairly close correspondence between the high-level language and low-level operations. Processors generally implement integer division as returning a quotient and a remainder, both of which are of the same type as the operands.
(So my question would be, "why doesn't integer division in C-like languages return two integers", not "why doesn't it return a floating point value?")
C's solution was to provide separate operators for division and remainder, each of which returns an integer. In the context of C, it's not surprising that the result of each of these operations is an integer. This is frequently more accurate than floating-point arithmetic. Consider the example from your comment of 7 / 3. This value cannot be represented by a finite binary number nor by a finite decimal number. In other words, on today's computers, we cannot accurately represent 7 / 3 unless we use integers! The most accurate representation of this fraction is "quotient 2, remainder 1".
So, was there no good reason to change? I can't think of any, and I can think of a few good reasons not to change. None of the other answers has mentioned Visual Basic which (at least through version 6) has two operators for dividing integers: / converts the integers to double, and returns a double, while \ performs normal integer arithmetic.
I learned about the \ operator after struggling to implement a binary search algorithm using floating-point division. It was really painful, and integer division came in like a breath of fresh air. Without it, there was lots of special handling to cover edge cases and off-by-one errors in the first draft of the procedure.
From that experience, I draw the conclusion that having different operators for dividing integers is confusing.
Another alternative would be to have only one division operation for integers, one which always returns a double, and require programmers to truncate it themselves. That means you have to perform two int->double conversions, a truncation and a double->int conversion every time you want integer division. And how many programmers would mistakenly round or floor the result instead of truncating it? It's a more complicated system, at least as prone to programmer error, and slower.
Finally, in addition to binary search, there are many standard algorithms that employ integer arithmetic. One example is dividing collections of objects into sub-collections of similar size. Another is converting between indices in a 1-d array and coordinates in a 2-d matrix.
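For instance, the index/coordinate conversion relies directly on truncating division and its companion remainder operator (row-major layout assumed here):

    // Row-major mapping between a flat array index and 2-D matrix coordinates.
    static (int row, int col) ToCoordinates(int index, int width)
        => (index / width, index % width);   // truncating / and % do all the work

    static int ToIndex(int row, int col, int width)
        => row * width + col;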
As far as I can see, no alternative to "int / int yields int" survives a cost-benefit analysis in terms of language usability, so there's no reason to change the behavior inherited from C.
In conclusion:
Integer division is frequently useful in many standard algorithms.
When the floating-point division of integers is needed, it may be invoked explicitly with a simple, short, and clear cast: (double)a / b rather than a / b
Other alternatives introduce more complication for the programmer and more clock cycles for the processor.
Is there ANY good reason to truncate the result if both numbers are integer?
Of course; I can think of a dozen such scenarios easily. For example: you have a large image, and a thumbnail version of the image which is 10 times smaller in both dimensions. When the user clicks on a point in the large image, you wish to identify the corresponding pixel in the scaled-down image. Clearly to do so, you divide both the x and y coordinates by 10. Why would you want to get a result in decimal? The corresponding coordinates are going to be integer coordinates in the thumbnail bitmap.
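A sketch of that mapping (the names and the factor of 10 are from the example above):

    // Map a click in the full-size image to the corresponding thumbnail pixel.
    // Truncating integer division gives exactly the whole-pixel coordinates wanted here.
    static (int x, int y) ToThumbnailPixel(int fullX, int fullY, int scaleFactor = 10)
        => (fullX / scaleFactor, fullY / scaleFactor);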
Doubles are great for physics calculations and decimals are great for financial calculations, but almost all the work I do with computers that does any math at all does it entirely in integers. I don't want to be constantly having to convert doubles or decimals back to integers just because I did some division. If you are solving physics or financial problems then why are you using integers in the first place? Use nothing but doubles or decimals. Use integers to solve finite mathematics problems.
Calculating on integers is faster (usually) than on floating point values. Besides, all other integer/integer operations (+, -, *) return an integer.
EDIT:
As per the request of the OP, here's some addition:
The OP's problem is that they think of / as division in the mathematical sense, while the / operator in the language performs some other operation (which is not mathematical division). By this logic they should question the validity of all the other operations (+, -, *) as well, since those have special overflow rules, which are not the same as would be expected from their mathematical counterparts. If this is bothersome for someone, they should find another language where the operations behave as they expect.
As for the claim of a performance difference in favor of integer values: when I wrote the answer I only had "folk" knowledge and "intuition" to back up the claim (hence my "usually" disclaimer). Indeed, as Gabe pointed out, there are platforms where this does not hold. On the other hand, I found this link (point 12) that shows mixed performance on an Intel platform (the language used is Java, though).
The takeaway should be that with performance many claims and intuition are unsubstantiated until measured and found true.
Yes, if the end result needs to be a whole number. It would depend on the requirements.
If these are indeed your requirements, then you would not want to store a decimal and then truncate it. You would be wasting memory and processing time to accomplish something that is already built-in functionality.
The operator is designed to return the same type as its inputs.
Edit (comment response):
Why? I don't design languages, but I would assume that most of the time you will be sticking with the data types you started with, and in the remaining instances, what criteria would you use to decide automatically which type the user wants? Would you automatically expect a string when you need it? (sincerity intended)
If you add an int to an int, you expect to get an int. If you subtract an int from an int, you expect to get an int. If you multiple an int by an int, you expect to get an int. So why would you not expect an int result if you divide an int by an int? And if you expect an int, then you will have to truncate.
If you don't want that, then you need to cast your ints to something else first.
Edit: I'd also note that if you really want to understand why this is, then you should start looking into how binary math works and how it is implemented in an electronic circuit. It's certainly not necessary to understand it in detail, but having a quick overview of it would really help you understand how the low-level details of the hardware filter through to the details of high-level languages.

What is the recommended data type for scientific calculation in .Net?

What is the most recommended data type to use in scientific calculation in .Net? Is it float, double or something else?
Scientific values tend to be "natural" values (length, mass, time etc) where there's a natural degree of imprecision to start with - but where you may well want very, very large or very, very small numbers. For these values, double is generally a good idea. It's fast (with hardware support almost everywhere), scales up and down to huge/tiny values, and generally works fine if you're not concerned with exact decimal values.
decimal is a good type for "artificial" numbers where there's an exact value, almost always represented naturally as a decimal - the canonical example for this is currency. However, it's twice as expensive as double in terms of storage (16 bytes per value instead of 8), has a smaller range (due to a more limited exponent range) and is significantly slower due to a lack of hardware support.
I'd personally only use float if storage was an issue - it's amazing how quickly the inaccuracies can build up when you only have around 7 significant decimal places.
Ultimately, as the comment from "bears will eat you" suggests, it depends on what values you're talking about - and of course what you plan to do with them. Without any further information I suspect that double is a good starting point - but you should really make the decision based on the individual situation.
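A quick way to see the "natural vs. artificial" split in practice: 0.1 is exact in decimal but not in binary floating point, so repeated addition drifts in double while decimal stays exact:

    double d = 0.0;
    decimal m = 0.0m;
    for (int i = 0; i < 10; i++)
    {
        d += 0.1;   // 0.1 has no exact binary representation
        m += 0.1m;  // 0.1 is exact in base 10
    }
    Console.WriteLine(d == 1.0);  // False
    Console.WriteLine(m == 1.0m); // True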
Well, of course the term “scientific calculation” is a bit vague, but in general, it’s double.
float is largely for compatibility with libraries that expect 32-bit floating-point numbers. The performance of float and double operations (like addition) is exactly the same, so new code should always use double because it has greater precision.
However, the x86 JITter will never inline functions that take or return a float, so using float in methods could actually be slower. Once again, this is for compatibility: if it inlined such functions, the execution engine would skip a conversion step that reduces precision, and the JITter could thus inadvertently change the result of some calculations.
Finally, there's also decimal. Use this whenever it is important to have a certain number of decimal places. The stereotypical use case is currency operations, but of course it supports more than 2 decimal places - it's actually a 128-bit piece of data.
If even the accuracy of 64-bit double is not enough, consider using an external library for arbitrary-precision numbers, but of course you will only need that if your specific scientific use-case specifically calls for it.
Double seems to be the most reliable data type for such operations. Even WPF uses it extensively.
Be aware that decimals are much more expensive to use than floats/doubles (in addition to what Jon Skeet and Timwi wrote).
I'd recommend double unless you need the value to be exact; decimal is for financial calculations that need this exactitude. Scientific calculations tolerate small errors because you can't exactly measure 1 meter anyways. Float only helps if storage is a problem (ie. huge matrices).

Why is System.Math and for example MathNet.Numerics based on double?

All the methods in System.Math takes double as parameters and returns parameters. The constants are also of type double. I checked out MathNet.Numerics, and the same seems to be the case there.
Why is this? Especially for constants. Isn't decimal supposed to be more exact? Wouldn't that often be kind of useful when doing calculations?
This is a classic speed-versus-accuracy trade off.
However, keep in mind that for PI, for example, the most digits you will ever need is 41.
The largest number of digits of pi that you will ever need is 41. To compute the circumference of the universe with an error less than the diameter of a proton, you need 41 digits of pi. It seems safe to conclude that 41 digits is sufficient accuracy in pi for any circle measurement problem you're likely to encounter. Thus, in the over one trillion digits of pi computed in 2002, all digits beyond the 41st have no practical value.
In addition, decimal and double have slightly different internal storage structures. Decimals are designed to store base-10 data, whereas doubles (and floats) are made to hold binary data. On a binary machine (like every computer in existence) a double will have fewer wasted bits when storing any number within its range.
Also consider:
System.Double: 8 bytes, approximately ±5.0e-324 to ±1.7e308, with 15 or 16 significant figures
System.Decimal: 16 bytes, approximately ±1.0e-28 to ±7.9e28, with 28 or 29 significant figures
As you can see, decimal has a smaller range, but a higher precision.
No - decimals are no more "exact" than doubles, or for that matter, than any other type. The concept of "exactness" (when speaking about numerical representations in a computer) is what is wrong. Any type is absolutely 100% exact at representing some numbers. Unsigned bytes are 100% exact at representing the whole numbers from 0 to 255, but they're no good for fractions, for negatives, or for integers outside that range.
Decimals are 100% exact at representing a certain set of base-10 values. Doubles (since they store their value using binary IEEE exponential representation) are exact at representing a set of binary numbers.
Neither is any more exact than the other in general; they are simply for different purposes.
To elaborate a bit further, since I seem to not be clear enough for some readers...
If you take every number which is representable as a decimal, and mark every one of them on a number line, between every adjacent pair of them there is an additional infinity of real numbers which are not representable as a decimal. The exact same statement can be made about the numbers which can be represented as a double. If you marked every decimal on the number line in blue, and every double in red, except for the integers, there would be very few places where the same value was marked in both colors.
In general, for 99.99999 % of the marks, (please don't nitpick my percentage) the blue set (decimals) is a completely different set of numbers from the red set (the doubles).
This is because our very definition of the blue set is that it is a base-10 mantissa/exponent representation, and a double is a base-2 mantissa/exponent representation. A value represented as a base-2 mantissa and exponent, e.g. 1.00110101001 x 2^(-11101001101001), means: take the mantissa value (1.00110101001) and multiply it by 2 raised to the power of the exponent (when the exponent is negative, this is equivalent to dividing by 2 to the power of the absolute value of the exponent). This means that where the exponent is negative (or where any portion of the mantissa is a fractional binary), the number generally cannot be represented as a decimal mantissa and exponent, and vice versa.
For any arbitrary real number that falls randomly on the real number line, it will either be closer to one of the blue decimals or to one of the red doubles.
Decimal is more precise but has less of a range. You would generally use Double for physics and mathematical calculations but you would use Decimal for financial and monetary calculations.
See the following articles on msdn for details.
Double
http://msdn.microsoft.com/en-us/library/678hzkk9.aspx
Decimal
http://msdn.microsoft.com/en-us/library/364x0z75.aspx
It seems like most of the responses here to "it does not do what I want" are "but it's faster"; well, so is ANSI C with the GMP library, but nobody is advocating that, right?
If you particularly want to control accuracy, then there are other languages which have taken the time to implement exact precision, in a user controllable way:
http://www.doughellmann.com/PyMOTW/decimal/
If precision is really important to you, then you are probably better off using languages that mathematicians would use. If you do not like Fortran then Python is a modern alternative.
Whatever language you are working in, remember the golden rule:
Avoid mixing types...
So do convert a and b to the same type before you attempt a operator b.
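A classic example of the pitfall that rule guards against:

    double wrong = 1 / 2;    // 0.0 - both operands are int, so integer division runs first
    double right = 1.0 / 2;  // 0.5 - convert one operand first, then divide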
If I were to hazard a guess, I'd say those functions leverage low-level math functionality (perhaps in C) that does not use decimals internally, and so returning a decimal would require a cast from double to decimal anyway. Besides, the purpose of the decimal value type is to ensure accuracy; these functions do not and cannot return 100% accurate results without infinite precision (e.g., irrational numbers).
Neither Decimal nor float nor double is good enough if you require something to be precise. Furthermore, Decimal is so expensive and overused out there that it is becoming a regular joke.
If you work in fractions and require ultimate precision, use fractions. It's the same old rule: convert once, and only when necessary. Your rounding rules will vary per app, domain and so on, but surely you can find an odd example or two where it is suitable. But again, if you want fractions and ultimate precision, the answer is not to use anything but fractions. Consider that you might want a feature of arbitrary precision as well.
The actual problem with the CLR in general is that it is awkward, and plain broken, to implement a library that deals with numerics in a generic fashion, largely due to bad primitive design and the shortcomings of the most popular compiler for the platform. It's almost the same as the Java fiasco.
double just turns out to be the best compromise covering most domains, and it works well, despite the fact that the MS JIT is still incapable of utilising CPU technology that is about 15 years old now.
Double is a built-in type. It is supported by the FPU/SSE core (formerly known as the "math coprocessor"), which is why it is blazingly fast, especially at multiplication and scientific functions.
Decimal is actually a complex structure, consisting of several integers.
