... but returns 12345?
The doc for Single.Parse says:
Exceptions
...
FormatException
s does not represent a numeric value.
...
To my understanding, "123,45" is not a proper numeric value (in countries that use the comma as the thousands separator).
The system's CultureInfo has:
CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator == "."
CultureInfo.CurrentCulture.NumberFormat.NumberGroupSeparator == ","
CultureInfo.CurrentCulture.NumberFormat.NumberGroupSizes == [3]
Apparently the comma is simply ignored, which leads to even more irritating results: "123,45.67" and "1,23,45.67", which look utterly wrong, both become 12345.67.
Supplementary question
I don't get what this sentence in the doc is supposed to mean and whether this is relevant for this case:
If a separator is encountered in the s parameter during a parse operation, and the applicable currency or number decimal and group separators are the same, the parse operation assumes that the separator is a decimal separator rather than a group separator.
In the default and US culture, the comma (,) is legal as a separator between groups. Think of larger numbers like this:
987,654,321
That it's in the wrong place for a group doesn't really matter; the parser isn't that smart. It just ignores the separator.
For the supplemental question, some cultures use commas as the decimal separator, rather than a group separator. This part of the documentation clarifies what will happen if the group separator and decimal separator are somehow set to the same character.
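A small sketch of this behaviour, with the culture pinned to en-US so the result does not depend on the machine's regional settings. The class and method names are made up for illustration only:

```csharp
using System;
using System.Globalization;

public static class GroupSeparatorDemo
{
    // Uses the same styles Single.Parse(string) applies by default,
    // but with an explicit en-US culture instead of the current culture.
    public static float ParseEnUs(string s)
    {
        return float.Parse(
            s,
            NumberStyles.Float | NumberStyles.AllowThousands,
            CultureInfo.GetCultureInfo("en-US"));
    }

    public static void Main()
    {
        // The comma is accepted wherever it appears before the decimal point.
        foreach (string s in new[] { "1,234", "123,45", "123,45.67", "1,23,45.67" })
        {
            Console.WriteLine("{0} -> {1}", s, ParseEnUs(s).ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

None of these inputs throws; the misplaced commas are dropped and the remaining digits are parsed.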
As Joel said, "the parser isn't that smart". The source code is available, so here's the proof.
The code for Single.Parse ends up calling Number.ParseNumber.
Interestingly, Number.ParseNumber is given a NumberFormatInfo object, which does have a NumberGroupSizes property, which defines "the number of digits in each group to the left of the decimal".
However, you'll notice that on line 851, where it checks for the group separator, it doesn't bother to reference the NumberGroupSizes property to check if the group separator is in an expected position. In fact Number.ParseNumber never uses the NumberGroupSizes property.
NumberFormatInfo.NumberGroupSizes is only ever used when converting a number to a string.
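Since the parser never consults NumberGroupSizes, a caller who wants to reject input such as "1,23,45.67" has to check the group positions separately. The following is only a sketch of one way to do that: StrictGroupParser is a hypothetical helper, not part of .NET, and it assumes a single uniform group size (NumberGroupSizes[0]) and input without an exponent.

```csharp
using System;
using System.Globalization;

public static class StrictGroupParser
{
    // Hypothetical helper: parses like Single.TryParse, but additionally rejects
    // input whose group separators are not where NumberGroupSizes[0] expects them.
    public static bool TryParse(string s, CultureInfo culture, out float value)
    {
        NumberFormatInfo nfi = culture.NumberFormat;
        if (!float.TryParse(s, NumberStyles.Float | NumberStyles.AllowThousands, nfi, out value))
            return false;

        // Only the part before the decimal separator can contain group separators.
        int decimalIndex = s.IndexOf(nfi.NumberDecimalSeparator, StringComparison.Ordinal);
        string integerPart = decimalIndex >= 0 ? s.Substring(0, decimalIndex) : s;

        string[] groups = integerPart.Split(new[] { nfi.NumberGroupSeparator }, StringSplitOptions.None);
        if (groups.Length == 1)
            return true; // no group separator present, nothing to check

        // First group: 1..size digits. Every later group: exactly size digits.
        int size = nfi.NumberGroupSizes.Length > 0 ? nfi.NumberGroupSizes[0] : 0;
        bool ok = size > 0 && CountDigits(groups[0]) >= 1 && CountDigits(groups[0]) <= size;
        for (int i = 1; ok && i < groups.Length; i++)
            ok = CountDigits(groups[i]) == size;

        if (!ok) value = 0f;
        return ok;
    }

    // Counts digits only, so a leading sign or surrounding white space is ignored.
    static int CountDigits(string text)
    {
        int count = 0;
        foreach (char c in text)
            if (char.IsDigit(c)) count++;
        return count;
    }
}
```

With en-US, StrictGroupParser.TryParse accepts "1,234.5" and "12345.67" but rejects "123,45.67" and "1,23,45.67". Cultures with mixed group sizes would need a more careful check than this.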
Related
In C#:
This throws a FormatException, which seems like it shouldn't:
Int32.Parse("1,234");
This does not, which seems normal:
Single.Parse("1,234");
And surprisingly, this parses just fine:
Single.Parse("1,2,3,4"); //Returns 1234
My local culture is EN-US, so , is the default thousands separator char.
Main question: Why the inconsistency?
Also: Why does Parse("1,2,3,4") work? It appears to just be removing all instances of the local separator char before parsing. I know there would be extra runtime overhead in a regex check or something like that, but when would the numeric literal "1,2,3,4" not be a typo?
Related:
C# Decimal.Parse issue with commas
According to MSDN:
The s parameter contains a number of the form:
[ws][sign]digits[ws]
The s parameter is interpreted using the NumberStyles.Integer style. In addition to decimal digits, only leading and trailing spaces together with a leading sign are allowed.
That's it: NumberStyles.Integer does not allow the Parse method to accept a thousands separator, whereas Single.Parse uses NumberStyles.Float | NumberStyles.AllowThousands by default. You can change this behaviour by specifying NumberStyles as the second argument:
Int32.Parse("1,234", NumberStyles.AllowThousands); //works
Single.Parse ignores the grouping and doesn't use culture-specific NumberGroupSizes at all, and only determines if the character is a group or decimal separator. The group sizes are used only when formatting numbers.
For the first case: according to the Microsoft Reference Source, Int32.Parse uses NumberStyles.Integer by default, without NumberStyles.AllowThousands:
public static int Parse(String s) {
return Number.ParseInt32(s, NumberStyles.Integer, NumberFormatInfo.CurrentInfo);
}
Thus no group separator is allowed. This:
Int32.Parse("1,234");
or
Int32.Parse("1.234");
will throw a FormatException in either case, in any culture.
To fix it, NumberStyles.AllowThousands must be added to the NumberStyles which will allow "1,234" to be parsed in EN-US culture:
Int32.Parse("1,234", NumberStyles.Integer | NumberStyles.AllowThousands);
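But
Int32.Parse("1.234", NumberStyles.Integer | NumberStyles.AllowThousands);
will still throw a FormatException, because in EN-US "." is the decimal separator, not the group separator, and these styles do not allow a decimal point.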
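The cases above can be tried side by side with a small sketch. It uses Int32.TryParse so that a failure shows up as false instead of a FormatException. The helper name is hypothetical, and the de-DE line is an extra illustration that is not in the original answer:

```csharp
using System;
using System.Globalization;

public static class Int32StylesDemo
{
    // Thin wrapper so each case names its culture explicitly.
    public static bool TryParse(string s, NumberStyles styles, string cultureName, out int value)
    {
        return int.TryParse(s, styles, CultureInfo.GetCultureInfo(cultureName), out value);
    }

    public static void Main()
    {
        int value;
        NumberStyles withThousands = NumberStyles.Integer | NumberStyles.AllowThousands;

        // Default style of Int32.Parse(string): the group separator is rejected.
        Console.WriteLine(TryParse("1,234", NumberStyles.Integer, "en-US", out value)); // False

        // AllowThousands added: the en-US group separator "," is accepted.
        Console.WriteLine(TryParse("1,234", withThousands, "en-US", out value));        // True

        // "." is the en-US decimal separator, which these styles do not allow.
        Console.WriteLine(TryParse("1.234", withThousands, "en-US", out value));        // False

        // In de-DE, "." is the group separator, so the same text is accepted.
        Console.WriteLine(TryParse("1.234", withThousands, "de-DE", out value));        // True
    }
}
```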
For the second case: according to the Microsoft Reference Source, the default style for Single.Parse is:
public static float Parse(String s) {
return Parse(s, NumberStyles.Float | NumberStyles.AllowThousands, NumberFormatInfo.CurrentInfo);
}
This allows the thousands separator. Since "," is recognized as the thousands separator in the EN-US culture, Single.Parse parses the second case correctly:
Single.Parse("1,234"); //OK
And obviously "1.234" will also parse, except that "." is recognized as the decimal separator rather than as a thousands separator.
As for the third case: internally, Single.Parse calls TryStringToNumber and Number.ParseNumber, which simply ignore the thousands separators. Thus you get:
Single.Parse("1,2,3,4"); //Returns 1234
because it is equivalent to
Single.Parse("1234"); //Returns 1234
My culture is es-ES.
Here "." is the default thousands separator and "," is the decimal separator.
So:
Parsing something like "1.2.3,4" gives me "123,40" (123.40 in US notation).
If I put a "." after the "," as in "123,4.3", it throws an error.
But, just as the question says, "1.2.3.4" gives me "1234".
So maybe this is simply how .NET itself works.
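These three observations can be reproduced with the culture set explicitly. This is a minimal sketch: the answer does not say which Parse method was used, so Double is an assumption here, and the class name is made up.

```csharp
using System;
using System.Globalization;

public static class SpanishParseDemo
{
    static readonly CultureInfo EsEs = CultureInfo.GetCultureInfo("es-ES");

    // Same default styles as Double.Parse(string), but with es-ES fixed.
    public static bool TryParse(string s, out double value)
    {
        return double.TryParse(s, NumberStyles.Float | NumberStyles.AllowThousands, EsEs, out value);
    }

    public static void Main()
    {
        double value;

        // "." before the "," is treated as a group separator and ignored.
        Console.WriteLine(TryParse("1.2.3,4", out value) ? value.ToString(EsEs) : "error"); // 123,4

        // A "." after the decimal separator is not accepted.
        Console.WriteLine(TryParse("123,4.3", out value) ? value.ToString(EsEs) : "error"); // error

        // Only group separators: all of them are ignored.
        Console.WriteLine(TryParse("1.2.3.4", out value) ? value.ToString(EsEs) : "error"); // 1234
    }
}
```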
I have a little problem.
I use [0-9\,.]*
to find a decimal number in a string,
and ([^\s]+) to find the text after the first number.
The string normally looks like this: a number, some text, and then a date:
1.023,45 stück
24.05.10
But sometimes I only have the date, and then I get 240510 as the decimal.
And sometimes I only have the decimal.
How should I modify the regex to find the date, if present, and remove it,
and then look for a decimal and select it, if present?
Thanks in advance.
Divide and conquer
Check for the date first and remove the match from the string
([0-9]{1,2}\.){2}[0-9]{1,2}
Find the number using your original regex
[0-9\,.]*
If you need it, find the unit of quantity (assuming it only ever appears in lower case, possibly with a u-umlaut)
([a-zü]+)
See http://regexe.de/ (German) and http://www.regexr.com/ (English) for some useful information and tools for dealing with regex.
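The three steps could be wired together roughly like this. It is only a sketch under some assumptions: QuantityLine and QuantityLineParser are hypothetical names, the number pattern is a slightly stricter variant of [0-9\,.]* so that it cannot match an empty string, and the number is parsed with the de-DE culture because of the "." group separator and "," decimal separator in the sample.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public sealed class QuantityLine
{
    public decimal? Number; // null when no number was found
    public string Unit;     // null when no unit was found
    public string Date;     // null when no date was found
}

public static class QuantityLineParser
{
    // Date pattern from the answer: three parts of one or two digits, e.g. 24.05.10.
    static readonly Regex DatePattern = new Regex(@"([0-9]{1,2}\.){2}[0-9]{1,2}");

    // Digits, optional ".ddd" groups, optional ",dd" decimals (assumed variant).
    static readonly Regex NumberPattern = new Regex(@"[0-9]+(\.[0-9]{3})*(,[0-9]+)?");

    // Unit pattern from the answer: lower-case letters plus u-umlaut.
    static readonly Regex UnitPattern = new Regex("[a-zü]+");

    public static QuantityLine Parse(string input)
    {
        var result = new QuantityLine();

        // Step 1: find the date and cut it out, so its digits are not read as a number.
        string rest = input;
        Match date = DatePattern.Match(input);
        if (date.Success)
        {
            result.Date = date.Value;
            rest = input.Remove(date.Index, date.Length);
        }

        // Step 2: look for the number in what is left.
        Match number = NumberPattern.Match(rest);
        if (!number.Success)
            return result;
        result.Number = decimal.Parse(number.Value, NumberStyles.Number, CultureInfo.GetCultureInfo("de-DE"));

        // Step 3: the unit is the first run of letters after the number.
        Match unit = UnitPattern.Match(rest, number.Index + number.Length);
        if (unit.Success)
            result.Unit = unit.Value;
        return result;
    }
}
```

For "1.023,45 stück" followed by "24.05.10" on the next line, this yields the number 1023.45, the unit "stück" and the date "24.05.10". For an input that contains only "24.05.10", Number stays null instead of becoming 240510.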
I suggest matching the number in a more restricted way (1-3 digits, then . + 3 digits groups if any, and a decimal separator with digits, optional).
(?s)(?<number>\d{1,3}(?:\.\d{3})*(?:,\d+)?)\s+(.*?)(?:$|\n|(?<date>\d{2}\.?\d{2}\.?(?:\d{4}|\d{2})))
See demo
The number will be held in ${number}, and the date in ${date}. If the string starts with something very similar to a date (6 or 8 digits with optional periods), it won't be captured. If the date format is known (say, the periods are always present), remove the ?s from \.?s.
(?s) at the beginning will force the period . to match a new line (maybe it is not necessary).
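A short sketch of how the named groups can be read in C#, using the pattern with the stray backtick removed. The class name is made up, and the sample input puts the date on the same line as the number:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleRegexDemo
{
    public static readonly Regex LinePattern = new Regex(
        @"(?s)(?<number>\d{1,3}(?:\.\d{3})*(?:,\d+)?)\s+(.*?)(?:$|\n|(?<date>\d{2}\.?\d{2}\.?(?:\d{4}|\d{2})))");

    public static void Main()
    {
        Match m = LinePattern.Match("1.023,45 stück 24.05.10");

        Console.WriteLine(m.Groups["number"].Value);  // 1.023,45
        Console.WriteLine(m.Groups[1].Value.Trim());  // stück (the unnamed group keeps a trailing space)
        Console.WriteLine(m.Groups["date"].Value);    // 24.05.10
    }
}
```

One thing to watch, as far as I can tell from tracing the pattern: when the date sits on its own line, as in the question's sample, the \n alternative ends the match before the date is reached, so ${date} stays empty. In that case the date has to be matched separately.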
I have a line like the following in my code:
string buffer = string.Format(CultureInfo.InvariantCulture, "{0:N4}", 1008.0);
Why does buffer contain 1,008.0000 (note the comma) after executing this line?
Yes, I do guess that it's caused by my regional settings. The question is why they affect the result in this case?
EDIT:
Ok, I understand that it's completely my fault. It seems like I should have used F format specifier.
The InvariantCulture is loosely based on en-US which uses , as a thousands (group) separator.
Your result is what I would expect.
I also point you to the details of the N numeric format specifier:
The numeric ("N") format specifier converts a number to a string of the form "-d,ddd,ddd.ddd…", where "-" indicates a negative number symbol if required, "d" indicates a digit (0-9), "," indicates a group separator, and "." indicates a decimal point symbol.
You're using the invariant culture; your culture is irrelevant to this. For this, the N4 format means
-d,ddd,ddd,ddd...
That is, possible leading negative sign indicator and commas between thousands groups. For details see: http://msdn.microsoft.com/en-us/library/dwhawy9k#NFormatString
You can look at
NegativeSign
NumberNegativePattern
NumberGroupSizes
NumberGroupSeparator
NumberDecimalSeparator
NumberDecimalDigits
for the invariant culture. If you do, you'll see:
-
1
{ 3 }
,
.
2
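To see the difference between the N and F specifiers that the question's edit mentions, here is a minimal sketch with the invariant culture passed explicitly (the class and method names are made up):

```csharp
using System;
using System.Globalization;

public static class NumericFormatDemo
{
    // Formats with the invariant culture, so regional settings play no part.
    public static string FormatInvariant(string specifier, double value)
    {
        return value.ToString(specifier, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // "N" inserts the group separator of the culture in use.
        Console.WriteLine(FormatInvariant("N4", 1008.0)); // 1,008.0000

        // "F" is fixed-point and never groups digits.
        Console.WriteLine(FormatInvariant("F4", 1008.0)); // 1008.0000

        // The separator comes from the invariant culture itself.
        Console.WriteLine(CultureInfo.InvariantCulture.NumberFormat.NumberGroupSeparator); // ,
    }
}
```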
You are getting the comma because of "{0:N4}"
n ----- Number with commas for thousands ----- {0:n}
Source:
You will get a group separator even without specifying InvariantCulture, as long as the current culture defines one (a comma in en-US):
Console.WriteLine(string.Format("{0:n4}", 1008.0));
Why is decimal.Parse("10 10") valid?
I need an exception to be thrown in this case.
Please advise.
decimal c;
try
{
c = decimal.Parse("10 10");
Console.Write(c);
Console.ReadLine();
}
catch (Exception)
{
throw;
}
This throws an exception when I run it - which leads me to suspect that it's culture-sensitive.
My guess is that you're in a culture which uses space as a "thousands" separator. For example, if I try to parse "10,10" that works because comma is the thousands separator in my default culture.
To prevent this, use
decimal value = decimal.Parse(text, NumberStyles.None);
... or some other appropriate combination of NumberStyles which excludes AllowThousands.
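To make this reproducible on any machine, the sketch below does not rely on the current culture. It builds a hypothetical NumberFormatInfo, copied from the invariant culture, with a plain space as the group separator, and then parses "10 10" with and without AllowThousands. The class and method names are made up.

```csharp
using System;
using System.Globalization;

public static class SpaceSeparatorDemo
{
    // Assumed stand-in for a culture that groups digits with a space.
    public static NumberFormatInfo SpaceGrouped()
    {
        var nfi = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        nfi.NumberGroupSeparator = " ";
        return nfi;
    }

    public static bool TryParse(string s, NumberStyles styles, out decimal value)
    {
        return decimal.TryParse(s, styles, SpaceGrouped(), out value);
    }

    public static void Main()
    {
        decimal value;

        // NumberStyles.Number is what decimal.Parse(string) uses; it includes AllowThousands.
        Console.WriteLine(TryParse("10 10", NumberStyles.Number, out value)); // True, value is 1010

        // Same styles minus AllowThousands: the space is no longer accepted inside the number.
        Console.WriteLine(TryParse("10 10", NumberStyles.Number & ~NumberStyles.AllowThousands, out value)); // False

        // Digits only.
        Console.WriteLine(TryParse("10 10", NumberStyles.None, out value)); // False
    }
}
```

With decimal.Parse instead of TryParse, the two False cases are the ones that throw a FormatException.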
From MSDN: "Parameter s is interpreted using the NumberStyles.Number style. This means that white space and thousands separators are allowed but currency symbols are not. To explicitly define the elements (such as currency symbols, thousands separators, and white space) that can be present in s, use either the Decimal.Parse(String, NumberStyles) or the Decimal.Parse(String, NumberStyles, IFormatProvider) method."
http://msdn.microsoft.com/en-us/library/cafs243z.aspx
edit: To further clarify, you need to either
explicitly set the culture of your application to one which does NOT allow whitespace in numbers, or
explicitly provide a NumberStyles parameter which specifies that whitespace is NOT allowed
edit 2: Jon Skeet's answer is correct. For example, the following does NOT throw an exception, because whitespace is used as thousands separators in sv-SE:
Decimal.Parse(" 10 10 ", CultureInfo.GetCultureInfo("sv-SE").NumberFormat)
The following, however, DOES throw an exception:
Decimal.Parse(" 10 10 ", CultureInfo.GetCultureInfo("en-US").NumberFormat)
I just ran this code on Visual Studio 2010/C# 4.0, and got a FormatException, as expected. What regional settings is your computer configured to use? Is it possible that you have " " (space) as a thousands separator or decimal separator?
For my answer in this question I have to compare two characters. I thought that the normal char.CompareTo() method would allow me to specify a CultureInfo, but that's not the case.
So my question is: How can I compare two characters and specify a CultureInfo for the comparison?
There is no culture-aware comparison for characters; you have to convert the characters to strings so that you can use, for example, the String.Compare(string, string, CultureInfo, CompareOptions) method.
Example:
char a = 'å';
char b = 'ä';
// outputs -1:
Console.WriteLine(String.Compare(
a.ToString(),
b.ToString(),
CultureInfo.GetCultureInfo("sv-SE"),
CompareOptions.IgnoreCase
));
// outputs 1:
Console.WriteLine(String.Compare(
a.ToString(),
b.ToString(),
CultureInfo.GetCultureInfo("en-GB"),
CompareOptions.IgnoreCase
));
There is indeed a difference between comparing characters and strings. Let me try to explain the basic issue, which is quite simple: a char always represents a single UTF-16 code unit. Comparing chars always compares those numeric values, without any regard for whether the characters mean the same thing.
If you want to compare characters for equal meaning, you need to create a string and use the comparison methods provided there. These include support for different cultures. See Guffa's answer on how to do that.
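The difference can be seen by putting both comparisons next to each other. This is a sketch only: CharComparison.Compare is a hypothetical wrapper around String.Compare, not an existing framework method.

```csharp
using System;
using System.Globalization;

public static class CharComparison
{
    // Converts both chars to strings so that the culture-aware overload can be used.
    public static int Compare(char a, char b, CultureInfo culture, CompareOptions options)
    {
        return string.Compare(a.ToString(), b.ToString(), culture, options);
    }

    public static void Main()
    {
        // char.CompareTo looks only at the numeric values: 'a' (97) sorts after 'B' (66).
        Console.WriteLine('a'.CompareTo('B') > 0); // True

        // The culture-aware comparison puts "a" before "B" when case is ignored.
        Console.WriteLine(Compare('a', 'B', CultureInfo.InvariantCulture, CompareOptions.IgnoreCase) < 0); // True

        // Ordinal values again: 'å' (U+00E5) is greater than 'ä' (U+00E4).
        Console.WriteLine('å'.CompareTo('ä') > 0); // True

        // With a specific culture the order can differ; the result depends on its sorting rules.
        Console.WriteLine(Compare('å', 'ä', CultureInfo.GetCultureInfo("sv-SE"), CompareOptions.None));
    }
}
```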
Did you try String.Compare Method?
The comparison uses the current culture to obtain culture-specific information such as casing rules and the alphabetic order of individual characters. For example, a culture could specify that certain combinations of characters be treated as a single character, or uppercase and lowercase characters be compared in a particular way, or that the sorting order of a character depends on the characters that precede or follow it.
String.Compare(str1, str2, false, new CultureInfo("en-US"))
I don't think CultureInfo matters when comparing chars in C#. A char is already a Unicode character, so two characters can easily be compared without a CultureInfo.