How does Int32.Parse() actually parse the string? - C#

I am a novice C# learner. I know the basic concepts of the language. While revising them, I stumbled upon one problem: how does Int32.Parse() actually work?
I know what it does, its output, and its overloads. What I need is the exact way in which the parsing is accomplished.
I searched the MSDN site. It gives only a very general definition of this method ("Converts the string representation of a number to its 32-bit signed integer equivalent."). So my question is: how does it convert the string into a 32-bit signed integer?
On reading more, I found out two things:
The string parameter is interpreted using the "NumberStyles" enumeration
The string parameter is formatted and parsed using the "NumberFormatInfo" class
I need the theory behind this concept. Also, I did not understand the term "culture-specific information" in the definition of the NumberFormatInfo class.

Here is the relevant code, which you can view under the terms of the MS-RSL.

"Culture-specific information" refers to the ways numbers can be written in different cultures. For example, in the US, you might write 1 million as:
1,000,000
But other cultures use the comma as the decimal separator and group digits with a different character, so you might see
1'000'000
or:
1 000 000
or, of course (in any culture):
1000000
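To make this concrete, here is a small sketch (not the BCL implementation itself) showing how NumberStyles and the culture's NumberFormatInfo determine what Int32.Parse will accept:

```csharp
using System;
using System.Globalization;

class ParseDemo
{
    static void Main()
    {
        // en-US: comma is the group separator, period the decimal separator.
        var enUS = CultureInfo.GetCultureInfo("en-US");
        int a = int.Parse("1,000,000",
            NumberStyles.Integer | NumberStyles.AllowThousands, enUS);
        Console.WriteLine(a); // 1000000

        // de-DE: period is the group separator instead.
        var deDE = CultureInfo.GetCultureInfo("de-DE");
        int b = int.Parse("1.000.000",
            NumberStyles.Integer | NumberStyles.AllowThousands, deDE);
        Console.WriteLine(b); // 1000000

        // Without AllowThousands, a group separator makes the parse fail.
        bool ok = int.TryParse("1,000,000", NumberStyles.Integer, enUS, out _);
        Console.WriteLine(ok); // False
    }
}
```

The same digit string can succeed or fail depending on which culture's separators are in effect, which is exactly what "culture-specific information" refers to.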

Related

Reference values: string or integer?

We have reference values created from a Sequence in a database, which means that they are all integers. (It's not inconceivable - although massively unlikely - that they could change in the future to include letters, e.g. R12345.)
In our [C#] code, should these be typed as strings or integers?
Does the fact that it wouldn't make sense to perform any arithmetic on these values (e.g. adding them together) mean that they should be treated as string literals? If not, and they should be typed as integers (/longs), then what is the underlying principle/reason behind this?
I've searched for an answer to this, but not managed to find anything, either on Google or StackOverflow, so your input is very much appreciated.
There are a couple of other differences:
Leading zeros:
Do you need to allow for these? If you have an ID string, then leading zeros would need to be preserved.
Sorting:
The sort order will vary between the types:
Integer:
1
2
3
10
100
String
1
10
100
2
3
So will you have a requirement to put the sequence in order (either way around)?
The same arguments apply to your typing as applied in the DB itself too, as the requirements there are likely to be the same. Ideally as Chris says, they should be consistent.
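To illustrate the sorting difference concretely, here is a small sketch comparing numeric and ordinal string ordering of the same values:

```csharp
using System;
using System.Linq;

class SortDemo
{
    static void Main()
    {
        var values = new[] { 1, 2, 3, 10, 100 };

        // Numeric sort: 1, 2, 3, 10, 100
        var asInts = values.OrderBy(v => v).ToArray();

        // Lexicographic (ordinal) sort: "1", "10", "100", "2", "3"
        var asStrings = values.Select(v => v.ToString())
                              .OrderBy(s => s, StringComparer.Ordinal)
                              .ToArray();

        Console.WriteLine(string.Join(", ", asInts));    // 1, 2, 3, 10, 100
        Console.WriteLine(string.Join(", ", asStrings)); // 1, 10, 100, 2, 3
    }
}
```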
Here are a few things to consider:
Are leading zeros important, i.e. is 010 different to 10. If so, use string.
Is the sort order important? i.e. should 200 be sorted before or after 30?
Is the speed of sorting and/or equality checking important? If so, use int.
Are you at all limited in memory or disk space? If so, ints are 4 bytes, strings at minimum 1 byte per character.
Will int provide enough unique values? A string can support potentially unlimited unique values.
Is there any sort of link in the system that isn't guaranteed reliable (networking, user input, etc.)? If it's a text medium, int values are safer (any non-digit character is erroneous); if it's binary, strings make for easier visual inspection (R13_55 is clearly an error if your IDs are just alphanumeric, but is 12372?)
From the sounds of your description, these are values that currently happen to be represented by a series of digits; they are not actually numbers in themselves. This, incidentally, is just like my phone number: it is not a single number, it is a set of digits.
And, like my phone number, I would suggest storing it as a string. Leading zeros don't appear to be an issue here but considering you are treating them as strings, you may as well store them as such and give yourself the future flexibility.
They should be typed as integers and the reason is simply this: retain the same type definition wherever possible to avoid overhead or unexpected side-effects of type conversion.
There are good reasons not to use raw types like int, string, or long all over your code. Among other problems, this allows for silly errors like
using a key for one table in a query pertaining to another table
doing arithmetic on a key and winding up with a nonsense result
confusing an index or other integral quantity with a key
and communicates very little information: Given int id, what table does this refer to, what kind of entity does it signify? You need to encode this in parameter/variable/field/method names and the compiler won't help you with that.
Since it's likely those values will always be integers, using an integral type should be more efficient and put less load on the GC. But to prevent the aforementioned errors, you could use an (immutable, of course) struct containing a single field. It doesn't need to support anything but a constructor and a getter for the id, that's enough to solve the above problems except in the few pieces of code that need the actual value of the key (to build a query, for example).
That said, using a proper ORM also solves these problems, with less work on your side. They have their own share of downsides, but they're really not that bad.
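As a minimal sketch of the single-field struct idea (the type name CustomerId is hypothetical):

```csharp
using System;

// Hypothetical strongly-typed key: wraps the raw integer so it cannot be
// confused with other ints or used in arithmetic by accident.
public readonly struct CustomerId : IEquatable<CustomerId>
{
    public int Value { get; }

    public CustomerId(int value) => Value = value;

    public bool Equals(CustomerId other) => Value == other.Value;
    public override bool Equals(object obj) => obj is CustomerId id && Equals(id);
    public override int GetHashCode() => Value;
    public override string ToString() => Value.ToString();
}
```

A method declared as taking a CustomerId cannot accidentally be handed an order ID or a loop index, and the compiler enforces it for free.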
If you don't need to perform mathematical calculations on the sequences, you can safely choose strings.
But think about sorting: the orders produced for integers and strings will differ, e.g. 1, 2, 10 for integers versus 1, 10, 2 for strings.

How to get the best number format string?

I have read a lot of articles about number format strings, e.g.: http://msdn.microsoft.com/en-us/library/0c899ak8.aspx
I still don't really understand how to write the best format string. To get the expected result, I can write it several ways. Example: print the number 1234567890 as the text "1,234,567,890". These ways give the same result:
1234567890.ToString("#,#")
1234567890.ToString("#,##")
"#,##" is the popular one on the internet - but why? Please give me some information on how to write a good format string. Thanks.
As far as I can see, there is no difference between "#,#" and "#,##": both mean 'format a number with group separators and without the fractional part'. Refer to SSCLI source for general number formatting for the gory details.
In your case it does not matter, because you are using an int. Note that the ',' in #,## and #,# is a group separator; digits after the decimal point are controlled by a '.' in a custom format string (e.g. #,#.##), which only comes into play for values with a fractional part.
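A small sketch comparing the custom format strings (pinned to the invariant culture so the output is predictable):

```csharp
using System;
using System.Globalization;

class FormatDemo
{
    static void Main()
    {
        var inv = CultureInfo.InvariantCulture;

        // Both custom format strings produce the same grouped output:
        Console.WriteLine(1234567890.ToString("#,#", inv));  // 1,234,567,890
        Console.WriteLine(1234567890.ToString("#,##", inv)); // 1,234,567,890

        // The standard "N0" format is an equivalent, more readable choice.
        Console.WriteLine(1234567890.ToString("N0", inv));   // 1,234,567,890

        // Fractional digits are controlled by '.' in a custom format, not ','.
        Console.WriteLine(1234.5678.ToString("#,#.##", inv)); // 1,234.57
    }
}
```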

Math with very large numbers using strings

I'm trying to do math with very large numbers using strings, and without external libraries.
I have tried looking online with no success, and I need functions for addition, subtraction, multiplication, and division (if possible, and limited to a specified number of decimal places.)
example: add 9,900,000,000
and 100,000,020
should be 10,000,000,020.
EDIT: I'm sorry I wasn't specific enough, but I can only use strings - no long, BigInteger, anything like that.
Just the basic string and, if necessary, Int32.
This is NOT a homework question!
Have you looked at BigInteger ?
If you're using .NET Framework 4, you can make use of the new System.Numerics.BigInteger class, which is an integer that can hold any whole number at all, until you run out of memory.
(The examples you provide, by the way, can be computed using long or System.UInt64.)
You have to work with the individual digits: convert each character to its numeric value first, apply the operation you want digit by digit (carrying or borrowing as needed), and then convert the resulting digits back into a string.
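As a minimal sketch of that digit-by-digit approach using only strings (the StringMath and Add names are hypothetical), grade-school addition with a carry looks like this:

```csharp
using System;
using System.Text;

class StringMath
{
    // Adds two non-negative integers given as digit strings, using only
    // string/char operations: grade-school addition from the rightmost digit.
    public static string Add(string a, string b)
    {
        var result = new StringBuilder();
        int i = a.Length - 1, j = b.Length - 1, carry = 0;

        while (i >= 0 || j >= 0 || carry > 0)
        {
            int da = i >= 0 ? a[i--] - '0' : 0; // digit of a, or 0 if exhausted
            int db = j >= 0 ? b[j--] - '0' : 0; // digit of b, or 0 if exhausted
            int sum = da + db + carry;
            carry = sum / 10;
            result.Insert(0, (char)('0' + sum % 10));
        }
        return result.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Add("9900000000", "100000020")); // 10000000020
    }
}
```

Subtraction, multiplication, and long division can be built the same way; multiplication, for example, is repeated shifted addition of partial products.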

.NET some values multipled by large factor on some machines not others

I have a rather strange problem. I have a very simple application that reads some data from a CSV-formatted file and draws a polar 'butterfly' on a form. However, a few people in European countries get a very weird-looking curve instead, and when I modified the program to output some sample values to try to work out what is going on, it only gave me more questions!
Here is a sample of expected values, and what one particular user gets instead:
EXPECTED -> SEEN
0.00 0.00 -> 0,00 0,00
5.00 1.35 -> 5,00 1346431626488,41
10.00 2.69 -> 10,00 2690532522738,65
So all the values on the right (which are computed in my program) are multiplied by a factor of 10^12! How on earth can that happen in the CLR? The first numbers - 0, 5, 10 - are just produced by the simple loop that writes the output, using value += 5.
The code producing these computations does make use of interpolation using the alglib.net library, but the problem does also occur with 2 other values that are extracted from xml returned from a http get, and then converted from radians to degrees.
Also, not exactly a problem, but why would decimal values print with commas instead of decimal points? The output code is a simple string.Format("{0:F}", value), where value is a double.
So why on earth would some values be shifted by 12 decimal places, but not others, and only in some countries? Yes others have run the app with no problems... Not sure if there is any relevance but this output came from Netherlands.
Different cultures use different thousands and decimal separators. en-US (US English) uses "," and "." but de-DE (German German) uses "." and ",". This means that when reading from or writing to strings you need to use the proper culture. When persisting information for later retrieval that generally means CultureInfo.InvariantCulture. When displaying information to the user that generally means CultureInfo.CurrentCulture.
You haven't provided the code that reads from the CSV file, but I imagine you're doing something like double.Parse(field) for each field. If the field has the value "5.0" and you parse it when the current culture is de-DE "." will be considered a thousands separator and the value gets read as 50.0 in en-US terms. What you should be doing is double.Parse(field, CultureInfo.InvariantCulture).
All of the Parse, TryParse, Format, and many ToString methods accept an IFormatProvider. Get in the habit of always providing the appropriate format provider and you wont get bitten by internationalization issues.
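A small sketch of the failure mode and the fix described above:

```csharp
using System;
using System.Globalization;

class CultureParseDemo
{
    static void Main()
    {
        var deDE = CultureInfo.GetCultureInfo("de-DE");

        // Under de-DE, '.' is a group separator: "5.00" reads as 500.
        Console.WriteLine(double.Parse("5.00", deDE)); // 500

        // With the invariant culture, '.' is the decimal point, as intended.
        Console.WriteLine(double.Parse("5.00", CultureInfo.InvariantCulture)); // 5

        // Writing the value back out should also pin the culture.
        Console.WriteLine(5.0.ToString("F", CultureInfo.InvariantCulture)); // 5.00
    }
}
```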
My personal guess would be that you have a string -> Number conversion somewhere that is not culture aware at all.
Why? Well, simply run this code:
var nl = System.Globalization.CultureInfo.GetCultureInfo("nl-NL");
var numberString = "1.000000000000000";
Console.WriteLine(float.Parse(numberString, nl));
The result is 1E+15. Now you just have to find the places where you need to pass CultureInfo.InvariantCulture (culture-neutral, English-like formatting, equivalent to the "C" locale in C) to Parse along with the string.
In some languages a decimal comma is used instead of the decimal point. This depends on the culture. You can force your own culture if it's important to you that only points are used.
One interesting thing of note: if 1346431626488,41 were divided by 1,000,000,000,000, then you would get 1.35 rounded to two decimal places. And if 2690532522738,65 were divided by 1,000,000,000,000, then you would get 2.69 rounded to two decimal places. Just an observation.

Rational numbers in C# and XML

I'm working with an XML file that subscribes to an industry standard. The standards document for the schema defines one of the fields as a rational number and its data is represented as two integers, typically with the second value being a 1 (e.g. <foo>20 1</foo>). I've been hunting around without a great deal of success to see if there's an XML-defined standard for rational numbers. I did find this (8 year old) exchange on the mailing list for XML-SCHEMA:
http://markmail.org/message/znvfw2r3a2pfeykl
I'm not clear that there is a standard "XML way" for representing rational numbers and whether the standard applying to this document is subscribing to it, or whether they've cooked up their own way of doing it for their documents and are relying on people to read the standard. The document is not specific beyond saying the field is a rational number.
Assuming there is a standard way of representing rational numbers and this document is correctly implementing it, does the functionality in System.Xml recognize it? Again, my searches have not been particularly fruitful.
Thanks for any feedback anyone has.
This isn't exactly an answer to the XML side of things, but if you want a C# class for representing rational numbers, I wrote a very flexible one a while back as part of my ExifUtils library (since most EXIF values are represented as rational numbers).
Rational<T> https://github.com/mckamey/exif-utils.net/blob/master/ExifUtils/ExifUtils/Rational.cs
The class itself is generic, accepting a numerator/denominator of any type implementing IConvertible (which includes all BCL number types), and it will serialize (ToString) and deserialize (Parse/TryParse), which may give you exactly what you need for your XML representation.
If you absolutely must represent a rational number with a space, you could adapt it to use space ' ' as the delimiter with literally a single character change in the source.
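As a minimal sketch (not the ExifUtils class itself) of parsing the space-delimited <foo>20 1</foo> form into a rational type - the Rational name and its members here are hypothetical:

```csharp
using System;
using System.Globalization;

// Hypothetical minimal rational type for the "<foo>20 1</foo>" style, where
// numerator and denominator are space-delimited integers.
public readonly struct Rational
{
    public long Numerator { get; }
    public long Denominator { get; }

    public Rational(long numerator, long denominator)
    {
        if (denominator == 0)
            throw new ArgumentException("Denominator must be non-zero.");
        Numerator = numerator;
        Denominator = denominator;
    }

    // Parses "N M", e.g. "20 1", using the invariant culture.
    public static Rational Parse(string text)
    {
        var parts = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2)
            throw new FormatException("Expected \"N M\".");
        return new Rational(
            long.Parse(parts[0], CultureInfo.InvariantCulture),
            long.Parse(parts[1], CultureInfo.InvariantCulture));
    }

    public double ToDouble() => (double)Numerator / Denominator;

    public override string ToString() => Numerator + " " + Denominator;
}
```

Reading the element text with System.Xml and handing it to a Parse like this keeps the XML layer and the number representation cleanly separated.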
As a slightly off-topic aside in response to Steven Lowe's comments, the use of rational numbers, while seemingly unintuitive, has some advantages. Numbers such as Pi cannot be represented as a decimal/floating-point number either. The approximation of Pi (e.g. the value in Math.PI) can be represented just as precisely as a rational number:
314159265358979323846 / 100000000000000000000
Whereas the very simple rational number 2/3 is impossible to represent to the same precision as any sort of floating point / fixed precision decimal number:
0.66666666666666666667
I'm glad they didn't accept this proposal as a standard! The guy proposing to base all other numbers on a 'rational number' primitive has never heard of transcendental numbers (like Pi, for example), which cannot be represented in this manner.
But back to your question - I've only run across rational numbers in XML as part of an RDF specification for certain engineering values related to the power industry. I think it was just a pair of numbers separated by a comma.
This document defines the format as N/M, while another reference has it as N,M.
You can express fractions in MathML. That is the industry standard AFAIK.
