I am using Google's protocol buffer library within my persistent storage system and want to persist currency values, but I am not sure that the floating-point types provided by protobuf (float/double) are good enough. Are there any downsides to storing all of my currency values as strings (e.g. storing "0.10" instead of 0.1), then using the Convert.ToDecimal function when I retrieve my data and need to do arithmetic?
You are correct in anticipating that float/double data types are not suitable for "currency"!
Consider how SQL databases (and, uhh, COBOL programs ...) commonly store "currency" values: they use a decimal representation of some sort. For instance, a true COBOL program might use a "binary-coded decimal (BCD)" data type. A Microsoft Access database uses a "scaled integer": the dollars-and-cents value multiplied by 10,000, giving a fixed(!) "4 digits to the right of the decimal."
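The scaled-integer idea is easy to sketch. Here's a minimal illustration (in Python, with the `decimal` module standing in for C#'s `decimal`; the ×10,000 factor mirrors the Access convention described above, and the helper names are made up):

```python
from decimal import Decimal

SCALE = 10_000  # fixed 4 digits to the right of the decimal point

def to_scaled(amount: str) -> int:
    # "0.10" -> 1000; going through Decimal, never float, keeps it exact
    return int(Decimal(amount) * SCALE)

def from_scaled(raw: int) -> Decimal:
    # the stored integer back to a dollars-and-cents value
    return Decimal(raw) / SCALE

assert to_scaled("0.10") == 1000
assert from_scaled(123456) == Decimal("12.3456")
```

Because the stored value is a plain integer, sums and comparisons in the database are exact; only display needs the decimal point put back.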
For the immediate purposes of this question, I would definitely store the values as strings, and then give very serious thought to the number of digits to be stored and just how to handle "rounding" to that number of digits. (For instance, there are algorithms such as “banker’s rounding.”)
“Storage size?” You don’t care about that. What you do care about is, that if a particular customer (or, auditor ...) actually adds-up all the numbers on a printed statement, the bottom-line on that piece of paper will agree ... at the very(!) least, within a single penny.
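Here's a sketch of the string-storage approach with banker's rounding applied on the way out (Python's `decimal` module standing in for .NET's `Convert.ToDecimal`; the two-decimal-place quantum is an assumption for dollars-and-cents):

```python
from decimal import Decimal, ROUND_HALF_EVEN

def parse_money(s: str) -> Decimal:
    # the stored string, e.g. "0.10", converts to an exact decimal value
    return Decimal(s)

def round_money(d: Decimal) -> Decimal:
    # banker's rounding: ties go to the even neighbour, so errors don't
    # accumulate in one direction across a whole printed statement
    return d.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

# exact decimal arithmetic: the sum a customer adds up by hand agrees
assert parse_money("0.10") + parse_money("0.20") == Decimal("0.30")
# the same sum in binary floating point is already off
assert 0.1 + 0.2 != 0.3

assert round_money(Decimal("2.675")) == Decimal("2.68")  # tie -> even
assert round_money(Decimal("2.665")) == Decimal("2.66")  # tie -> even
```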
Related
I have a column defined as decimal(10,6). When I try to save the model with the value 10.12345, in the database I saved as 10.123400. The last digit ("5") is truncated.
Why does the number default to only 4 decimal digits in LINQ (for decimal), and how can I avoid this for all columns in my models? The solution I found was to use DbType="Decimal(10,6)", but I have a lot of columns in this situation and the change would have to be applied to all of them, which doesn't seem like a good idea.
Is there a way to change this behavior without changing all the decimal columns?
Thanks
You need to use the proper DbType, decimal(10, 6) in this case.
The reason for this is simple - while .NET's decimal is actually a (decimal) floating point (the decimal point can move), MS SQL's isn't. It's a fixed "four left of decimal point, six right of decimal point". When LINQ passes the decimal to MS SQL, it has to use a specific SQL decimal - and the default simply happens to use four for the scale. You could always use a decimal big enough for whatever value you're trying to pass, but that's very impractical - for one, it will pretty much eliminate execution plan caching, because each different decimal(p, s) required will be its own separate query. If you're passing multiple decimals, this means you'll pretty much never get a cached plan; ouch.
In effect, the command doesn't send the value 10.12345 - it sends 10123450 (not entirely true, but just bear with me). Thus, when you're passing the parameter, you must know the scale - you need to send 10 as 10000000, for example. The same applies when you're not using LINQ - using SqlCommand manually has the same "issue", and you have to use a specific precision and scale.
If you're wary of modifying all those columns manually, just write a script to do it for you. But you do need to maintain the proper data types manually, there's no way around it.
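To see why the declared scale matters on the wire, here's a rough model of the scaled-integer view (Python; as the answer notes this is a simplification of what's actually sent, and the helper name is made up):

```python
from decimal import Decimal

def encode(value: str, scale: int) -> int:
    # a decimal(p, s) parameter travels (roughly) as an integer scaled
    # by 10**s; int() truncates whatever the declared scale can't hold
    return int(Decimal(value).scaleb(scale))

assert encode("10.12345", 6) == 10123450  # scale 6 keeps every digit
assert encode("10.12345", 4) == 101234    # default scale 4 drops the trailing 5
```

Decoding 101234 at scale 4 gives 10.1234, which a decimal(10,6) column then pads to 10.123400 -- exactly the truncation described in the question.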
We have reference values created from a Sequence in a database, which means that they are all integers. (It's not inconceivable - although massively unlikely - that they could change in the future to include letters, e.g. R12345.)
In our [C#] code, should these be typed as strings or integers?
Does the fact that it wouldn't make sense to perform any arithmetic on these values (e.g. adding them together) mean that they should be treated as string literals? If not, and they should be typed as integers (/longs), then what is the underlying principle/reason behind this?
I've searched for an answer to this, but not managed to find anything, either on Google or StackOverflow, so your input is very much appreciated.
There are a couple of other differences:
Leading Zeroes:
Do you need to allow for these? If your IDs can have leading zeroes, a string is required to preserve them.
Sorting:
Sort order will vary between the types:
Integer:
1
2
3
10
100
String:
1
10
100
2
3
So will you have a requirement to put the sequence in order (either way around)?
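The two sort orders above can be reproduced directly (Python for brevity; zero-padding is the usual workaround if you must keep strings but want numeric order):

```python
ids = [1, 2, 3, 10, 100]

# numeric order
assert sorted(ids) == [1, 2, 3, 10, 100]
# lexicographic order: "10" sorts before "2" because '1' < '2'
assert sorted(map(str, ids)) == ["1", "10", "100", "2", "3"]
# fixed-width zero padding restores numeric order for strings
assert sorted(f"{n:03d}" for n in ids) == ["001", "002", "003", "010", "100"]
```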
The same arguments apply to the types in the DB itself, as the requirements there are likely to be the same. Ideally, as Chris says, they should be consistent.
Here are a few things to consider:
Are leading zeros important, i.e. is 010 different to 10. If so, use string.
Is the sort order important? i.e. should 200 be sorted before or after 30?
Is the speed of sorting and/or equality checking important? If so, use int.
Are you at all limited in memory or disk space? If so, ints are 4 bytes, strings at minimum 1 byte per character.
Will int provide enough unique values? A string can support potentially unlimited unique values.
Is there any sort of link in the system that isn't guaranteed reliable (networking, user input, etc.)? If it's a text medium, int values are safer (any non-digit character is erroneous); if it's binary, strings make for easier visual inspection (R13_55 is clearly an error if your ids are just alphanumeric, but is 12372?)
From the sounds of your description, these are values that currently happen to be represented by a series of digits; they are not actually numbers in themselves. This, incidentally, is just like my phone number: it is not a single number, it is a set of digits.
And, like my phone number, I would suggest storing it as a string. Leading zeros don't appear to be an issue here but considering you are treating them as strings, you may as well store them as such and give yourself the future flexibility.
They should be typed as integers and the reason is simply this: retain the same type definition wherever possible to avoid overhead or unexpected side-effects of type conversion.
There are good reasons not to use bare types like int, string, or long all over your code. Among other problems, this allows for stupid errors like
using a key for one table in a query pertaining another table
doing arithmetic on a key and winding up with a nonsense result
confusing an index or other integral quantity with a key
and communicates very little information: Given int id, what table does this refer to, what kind of entity does it signify? You need to encode this in parameter/variable/field/method names and the compiler won't help you with that.
Since it's likely those values will always be integers, using an integral type should be more efficient and put less load on the GC. But to prevent the aforementioned errors, you could use an (immutable, of course) struct containing a single field. It doesn't need to support anything but a constructor and a getter for the id, that's enough to solve the above problems except in the few pieces of code that need the actual value of the key (to build a query, for example).
That said, using a proper ORM also solves these problems, with less work on your side. They have their own share of downsides, but they're really not that bad.
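The single-field wrapper idea can be sketched like this (the answer describes a C# readonly struct; this is a Python analogue with made-up `CustomerId`/`OrderId` names, but the point is the same: the type itself keeps keys apart, not naming conventions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerId:
    value: int  # the raw sequence value, exposed only where a query needs it

@dataclass(frozen=True)
class OrderId:
    value: int

# keys for different tables never compare equal, even with the same raw value
assert CustomerId(42) != OrderId(42)
assert CustomerId(42) == CustomerId(42)

# arithmetic on a key is now a TypeError instead of a silent nonsense result
try:
    CustomerId(42) + 1
    raised = False
except TypeError:
    raised = True
assert raised
```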
If you don't need to perform some mathematical calculations on the sequences, you can easily choose strings.
But think about sorting: the resulting order differs between integers and strings, e.g. 1, 2, 10 for integers vs. 1, 10, 2 for strings.
I'm trying to store metric data (meters, kilometers, square-meters) in SQL Server 2012.
What is the best datatype to use? float (C#: double), decimal (C#: decimal) or even geometry? Or something different?
Either a decimal with an appropriate amount of precision for your data, or an int type, if appropriate
It completely depends on the application and what precision you need for it.
If we are talking about architecture, then precision needs are relatively limited and a C# 32-bit float will take you a long way. In SQL this translates to float(24), also referred to as the database type real. This SQL DB type requires 4 bytes of storage per entry.
If we instead want to address points on the surface of the earth you need a lot higher precision. Here a C# double is suitable, which corresponds to a SQL float(53) or just float. This SQL DB type requires 8 bytes of storage and should be used only if needed or if the application is small and disk/memory usage is not a concern.
The SQL Decimal could be a good alternative for the actual SQL DB, but it has 2 drawbacks:
It corresponds to a C# Decimal, which is a type designed for financial usage and to prevent round-off errors. This design renders the C# Decimal type slower than a float/double when used in trigonometric methods etc. You could of course cast back and forth in your code, but that is not the most straightforward approach IMO.
"The Decimal value type is appropriate for financial calculations requiring large numbers of significant integral and fractional digits and no round-off errors." - MSDN : Decimal Structure
The SQL DB type Decimal requires 5-9 bytes of storage per entry (depending on the precision used), which is larger than the float(x) alternatives.
So, use it according to your needs. In your comment you state that it's about real estate, so I'd go for float(24) (aka real), which is exactly 4 bytes and directly translates to a C# float. See: float and real (Transact-SQL)
Lastly, here is a helpful resource for converting different types between .Net and SQL: SqlDbType Enumeration
Depends what you want to do
float or double are non-exact datatypes (so two computed values you'd expect to be equal, such as 0.1 + 0.2 and 0.3, may compare unequal due to rounding issues)
Decimal is an exact datatype (decimal fractions are represented exactly, so such comparisons behave as expected)
and Geometry/Geography (easy said) are for locations on a map.
Float calculations should be fastest among the three, since geography is binary data with some information about projection (it's all about maps here), and decimal is technically not as easy to handle as float.
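A concrete instance of the non-exact vs. exact distinction (Python's decimal standing in for the SQL/C# decimal types):

```python
from decimal import Decimal

# float: repeated addition drifts, so an "obvious" equality fails
assert sum([0.1] * 10) != 1.0

# decimal: the same computation is exact
assert sum([Decimal("0.1")] * 10) == Decimal("1.0")
```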
I noticed that when I store a double value such as e.g. x = 0.56657011973046234 in an sqlite database, and then retrieve it later, I get y = 0.56657011973046201. According to the sqlite spec and the .NET spec (neither of which I originally bothered to read :) this is expected and normal.
My problem is that while high precision is not important, my app deals with users inputting/selecting doubles that represent basic 3D info, and then running simulations on them to find a result. And this input can be saved to an sqlite database to be reloaded and re-run later.
The confusion occurs because a freshly created series of inputs will obviously simulate in a slightly different way to those same inputs once stored and reloaded (as the double values have changed). This is logical, but not desirable.
I haven't quite come to terms of how to deal with this, but in the meantime I'd like to limit/clamp the user inputs to values which can be exactly stored in an sqlite database. So if a user inputs 0.56657011973046234, it is actually transformed into 0.56657011973046201.
However I haven't been able to figure out, given a number, what value would be stored in the database, short of actually storing and retrieving it from the database, which seems clunky. Is there an established way of doing this?
The answer may be to store the double values as 17 significant digit strings. Look at the difference between how SQLite handles real numbers vs. text (I'll illustrate with the command line interface, for simplicity):
sqlite> create table t1(dr real, dt varchar(25));
sqlite> insert into t1 values(0.56657011973046234,'0.56657011973046234');
sqlite> select * from t1;
0.566570119730462|0.56657011973046234
Storing it with real affinity is the cause of your problem -- SQLite only gives you back a 15 digit approximation. If instead you store it as text, you can retrieve the original string with your C# program and convert it back to the original double.
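The text-column advice can be checked end to end (Python's sqlite3 for illustration; note that a binary binding of the double also survives here -- the 15-digit loss occurs when SQLite formats the REAL as decimal text, as the command-line session above does):

```python
import sqlite3

x = 0.56657011973046234  # parses to the nearest IEEE double

con = sqlite3.connect(":memory:")
con.execute("create table t1(dr real, dt text)")
con.execute("insert into t1 values(?, ?)", (x, repr(x)))
dr, dt = con.execute("select dr, dt from t1").fetchone()

# the text column round-trips the exact double by construction
assert float(dt) == x
# bound and fetched in binary, the REAL column also matches here; the
# approximation appears only once the value goes through decimal text
assert dr == x
```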
Math.Round has an overload with a parameter that specifies the number of digits. Use this to round to 14 digits (say) with: rval = Math.Round(val, 14)
Then round both when receiving the value from the database and at the beginning of simulations, i.e. so that the values match.
For details:
http://msdn.microsoft.com/en-us/library/75ks3aby.aspx
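The round-on-both-sides idea, sketched (Python's round in place of Math.Round, with the 14 digits this answer suggests; the second constant is the value SQLite hands back per the question):

```python
DIGITS = 14

def canonical(x: float) -> float:
    # apply before saving and again after loading, so both code paths
    # land on the same representable double
    return round(x, DIGITS)

fresh = canonical(0.56657011973046234)     # value as entered by the user
reloaded = canonical(0.56657011973046201)  # value as it comes back from SQLite
assert fresh == reloaded
```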
Another thought if you are not comparing values in the database, just storing them : Why not simply store them as binary data? Then all the bits would be stored and recovered verbatim?
Assuming that both SQLite and .NET correctly implement the IEEE specification, you should be able to get the same numeric results if you use the same floating-point type on both sides (because the value shouldn't be altered when passed from database to C# and vice versa).
Currently you're using an 8-byte IEEE floating point in SQLite (*) and the matching 8-byte type in C# (double), so the storage types already agree; the alteration you see comes from the value being converted through a 15-digit decimal string on the way out, not from the storage format itself. (C#'s float is the 4-byte IEEE type and would lose even more precision.)
(*) The SQLite documentation says that REAL is a floating point value, stored as an 8-byte IEEE floating point number.
You can use a string to store the # in the db. Personally I've done what winwaed suggested of rounding before storing and after fetching from the db (which used numeric()).
I recall being burned by banker's rounding, but it could just be that it didn't meet spec.
You can store the double as a string, and by using the round-trip format specifier when converting the double to a string, it's guaranteed to parse back to the same value:
string formatted = theDouble.ToString("R", CultureInfo.InvariantCulture);
If you want the decimal input values to round-trip, then you'll have to limit them to 15 significant digits. If you want the SQLite internal double-precision floating-point values to round-trip, then you might be out of luck; that requires printing to a minimum of 17 significant digits, but from what I can tell, SQLite prints them to a maximum of 15 (EDIT: maybe an SQLite expert can confirm this? I just read the source code and traced it -- I was correct, the precision is limited to 15 digits.)
I tested your example in the SQLite command interface on Windows. I inserted 0.56657011973046234, and select returned 0.566570119730462. In C, when I assigned 0.566570119730462 to a double and printed it to 17 digits, I got 0.56657011973046201; that's the same value you get from C#. 0.56657011973046234 and 0.56657011973046201 map to different floating-point numbers, so in other words, the SQLite double does not round-trip.
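The 15-versus-17-digit behaviour is easy to verify (Python, using printf-style formatting; the constant comes straight from the discussion above):

```python
x = 0.56657011973046234  # stored as the nearest IEEE double

# 15 significant digits -- what SQLite prints -- loses this value
assert float("%.15g" % x) != x
# 17 significant digits always round-trip a double
assert float("%.17g" % x) == x
```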
I need to store a couple of money related fields in the database but I'm not sure which data type to use between money and decimal.
Decimal and money ought to be pretty reliable. What I can assure you (from painful personal experience with inherited applications) is: DO NOT use float!
I always use Decimal; never used MONEY before.
Recently, I found an article regarding decimal versus money data type in Sql server that you might find interesting:
Money vs Decimal
It also seems that the money datatype does not always result in accurate results when you perform calculations with it : click
What I've also done in the past is use an INT field and store the amount in cents (eurocent/dollarcent).
I guess it comes down to precision and scale. IIRC, money is 4dp. If that is fine, money expresses your intent. If you want more control, use decimal with a specific precision and scale.
It depends on your application!!!
I work in financial services where we normally consider price to be significant to 5 decimal places after the point, which of course when you buy a couple of million at 3.12345pence/cents is a significant amount.
Some applications will supply their own sql type to handle this.
On the other hand, this might not be necessary.
<Humour>
Contractor rates always seemed to be rounded to the nearest £100, but currently seem to be rounded to the nearest £25 in the current credit crunch.
</Humour>
Don't align your thoughts based on available datatypes. Rather, analyze your requirement and then see which datatype fits best.
Float is always the worst choice, considering the architecture's limitations in storing a binary representation of floating-point numbers.
Money is a standard unit and will surely have more support for handling money related operations.
In the case of decimal, you'll have to handle everything yourself, but you know it's only you handling the value, so there are no surprises of the kind you may get with the other two data types.
Use decimal, and use more decimal places than you think you will need so that calculations will be correct. Money does not always return correct results in calculations. Under no circumstances use float or real, as these are inexact datatypes and can cause calculations to be wrong (especially as they get more complex).
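One way to see why the extra decimal places matter, sketched with Python's decimal (quantizing to 4 places mimics money's fixed scale; this is an illustration, not SQL Server's exact arithmetic):

```python
from decimal import Decimal

FOUR_DP = Decimal("0.0001")       # money-style fixed scale
NINE_DP = Decimal("0.000000001")  # a more generous intermediate scale

amount = Decimal("100")

# rounding an intermediate result to 4 places loses a visible amount
part = (amount / 3).quantize(FOUR_DP)   # 33.3333
assert part * 3 == Decimal("99.9999")

# with more places, the error stays far below a cent
part = (amount / 3).quantize(NINE_DP)   # 33.333333333
assert (part * 3).quantize(Decimal("0.01")) == Decimal("100.00")
```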
For some data (like money) where you want no approximation or drift due to floating-point representation, you must be sure that the data never 'floats': it must be rigid on both sides of the decimal point.
One easy way to be safe is to convert the value into an INTEGER, and make sure that when you retrieve the value, the decimal point is placed back at the proper location.
e.g.
1. To save $240.10 into the database.
2. Convert it to a pure integral form: 24010 (it's just a shift of the decimal point).
3. On retrieval, revert it to its proper decimal state: place the decimal point 2 positions from the right, giving $240.10.
So, while in the database, the value stays in a rigid integer form.
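The steps above can be sketched as follows (Python; positive dollars-and-cents amounts only -- sign handling and input validation are left out of this illustration):

```python
def dollars_to_cents(s: str) -> int:
    # "240.10" -> 24010: shift the decimal point by parsing the two
    # halves as integers, so no float ever touches the value
    dollars, _, cents = s.partition(".")
    return int(dollars) * 100 + int((cents + "00")[:2])

def cents_to_dollars(c: int) -> str:
    # place the decimal point 2 positions from the right
    return f"{c // 100}.{c % 100:02d}"

assert dollars_to_cents("240.10") == 24010
assert cents_to_dollars(24010) == "240.10"
```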