C# Decimal.Parse issue with commas - c#

Here's my problem (for en-US):
Decimal.Parse("1,2,3,4") returns 1234, instead of throwing a FormatException.
Most Windows applications (Excel en-US) do not drop the thousand separators and do not consider that value a decimal number. The same issue happens for other languages (although with different characters).
Are there any other decimal parsing libraries out there that solve this issue?
Thanks!

It allows thousands separators because the default NumberStyles value used by Decimal.Parse (NumberStyles.Number) includes NumberStyles.AllowThousands.
If you want to disallow thousands separators, clear that flag, like this:
Decimal.Parse("1,2,3,4", NumberStyles.Number & ~NumberStyles.AllowThousands)
(The above code throws a FormatException, which is what you want, right?)
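As a rough sketch of the same idea with TryParse and an explicit culture, so that bad input yields false rather than an exception (the StrictDecimal class and its method name are made up for illustration):

```csharp
using System;
using System.Globalization;

public static class StrictDecimal
{
    // NumberStyles.Number with the AllowThousands bit cleared:
    // group separators are rejected instead of silently dropped.
    private const NumberStyles NoThousands =
        NumberStyles.Number & ~NumberStyles.AllowThousands;

    public static bool TryParse(string text, CultureInfo culture, out decimal value)
    {
        return decimal.TryParse(text, NoThousands, culture, out value);
    }
}
```

Note that this rejects every group separator, including correctly placed ones such as "1,234".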

I ended up having to write the code to verify the currency manually. Personally, for a framework that prides itself on having all the globalization stuff built in, it's amazing .NET doesn't have anything to handle this.
My solution is below. It works for all the locales in the framework. It doesn't support negative numbers, as Orion pointed out below, though. What do you guys think?
public static bool TryParseCurrency(string value, out decimal result)
{
    result = 0;
    const int maxCount = 100;
    if (String.IsNullOrEmpty(value))
        return false;
    // {0} = group separator, {1} = decimal separator, {2}/{3} = group sizes, {4} = max leading digits
    const string decimalNumberPattern = @"^\-?[0-9]{{1,{4}}}(\{0}[0-9]{{{2}}})*(\{0}[0-9]{{{3}}})*(\{1}[0-9]+)*$";
    NumberFormatInfo format = CultureInfo.CurrentCulture.NumberFormat;
    int secondaryGroupSize = format.CurrencyGroupSizes.Length > 1
        ? format.CurrencyGroupSizes[1]
        : format.CurrencyGroupSizes[0];
    var r = new Regex(String.Format(decimalNumberPattern
        , format.CurrencyGroupSeparator == " " ? "s" : format.CurrencyGroupSeparator
        , format.CurrencyDecimalSeparator
        , secondaryGroupSize
        , format.CurrencyGroupSizes[0]
        , maxCount), RegexOptions.Compiled | RegexOptions.CultureInvariant);
    return r.IsMatch(value.Trim()) && Decimal.TryParse(value, NumberStyles.Any, CultureInfo.CurrentCulture, out result);
}
And here's one test to show it working (nUnit):
[Test]
public void TestCurrencyStrictParsingInAllLocales()
{
    var originalCulture = CultureInfo.CurrentCulture;
    var cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
    const decimal originalNumber = 12345678.98m;
    foreach (var culture in cultures)
    {
        // Switch the thread culture so formatting and parsing both use this locale
        System.Threading.Thread.CurrentThread.CurrentCulture = culture;
        var stringValue = originalNumber.ToCurrencyWithoutSymbolFormat();
        decimal resultNumber = 0;
        Assert.IsTrue(DecimalUtils.TryParseCurrency(stringValue, out resultNumber));
        Assert.AreEqual(originalNumber, resultNumber);
    }
    System.Threading.Thread.CurrentThread.CurrentCulture = originalCulture;
}

You might be able to do this in a two-phase process. First you could verify the thousands separator using the information in the CultureInfo.CurrentCulture.NumberFormat.NumberGroupSeparator and CultureInfo.CurrentCulture.NumberFormat.NumberGroupSizes throwing an exception if it doesn't pass and then pass the number into the Decimal.Parse();
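A minimal sketch of that two-phase approach follows. The GroupedDecimal class and its method names are hypothetical, and the sketch assumes the input is a bare number (optional sign, digits, separators) with no currency symbol. It returns false instead of throwing, but the structure is the same: check the grouping first, then hand the string to the normal parser.

```csharp
using System;
using System.Globalization;

public static class GroupedDecimal
{
    // Phase 1: check that every group separator sits where NumberGroupSizes says it should.
    public static bool HasValidGrouping(string text, NumberFormatInfo format)
    {
        int decimalIndex = text.IndexOf(format.NumberDecimalSeparator, StringComparison.Ordinal);
        string integerPart = decimalIndex >= 0 ? text.Substring(0, decimalIndex) : text;
        integerPart = integerPart.Trim().TrimStart('-', '+');

        string[] groups = integerPart.Split(new[] { format.NumberGroupSeparator }, StringSplitOptions.None);
        if (groups.Length == 1) return true; // no group separator present

        int[] sizes = format.NumberGroupSizes;
        if (sizes.Length == 0) return false; // culture defines no grouping at all

        // Walk the groups right to left; the last size repeats, and 0 means "no further grouping".
        for (int i = groups.Length - 1, s = 0; i >= 1; i--, s++)
        {
            int expected = sizes[Math.Min(s, sizes.Length - 1)];
            if (expected == 0 || groups[i].Length != expected) return false;
        }

        // The leftmost group may be shorter than a full group, but not empty or longer.
        int max = sizes[Math.Min(groups.Length - 1, sizes.Length - 1)];
        return groups[0].Length >= 1 && (max == 0 || groups[0].Length <= max);
    }

    // Phase 2: only strings with valid grouping reach the regular parser,
    // which then checks that everything else is a well-formed number.
    public static bool TryParseStrict(string text, CultureInfo culture, out decimal value)
    {
        value = 0;
        return HasValidGrouping(text, culture.NumberFormat)
            && decimal.TryParse(text, NumberStyles.Number, culture, out value);
    }
}
```

With en-US, "1,234,567.89" passes both phases, while "1,2,3,4" and "12,34" are rejected in phase one.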

This is a common issue that Microsoft never solved.
I don't understand why 1,2,3.00 is valid (in an English culture, for example)!
You need to build an algorithm that examines the group sizes and returns false or throws an exception (like a failed double.Parse) if the test fails.
I had a similar problem in an MVC application, whose built-in validator doesn't accept thousands separators, so I replaced it with a custom one that uses double/decimal/float.Parse but adds logic to validate the group size.
If you want to read my solution (it is used for my custom MVC validator, but you can also use it as a better generic double/decimal/float.Parse validator), go here:
https://stackoverflow.com/a/41916721/3930528

Related

Parsing double with dot to comma

I am working with doubles. In the Netherlands we make use of 51,3 instead of 51.3. I did write a piece of code that works with dots instead of commas. But the result of the previously written code returns a double the English way, with a dot. I am encountering some strange errors.
Here is what I have:
var calResult = 15.2d;
var calResultString = calResult.ToString(CultureInfo.GetCultureInfo("nl-NL"));
var result = double.Parse(calResultString);
calResult == "15.2" -> as expected
calResultString == "15,2" -> as expected
result == "152" -> here I expect a comma.
I also tried passing the CultureInfo to double.Parse. This resulted in "15.2".
TLDR: I need to convert an English/American double to a Dutch(or similar rules) one.
Thanks in advance! :)
P.S
I hope this is not a duplicate question, but I didn't find anything this specific.
You, probably, should either provide "nl-NL" whenever you work with Netherlands' culture
var calResult = 15.2d;
var calResultString = calResult.ToString(CultureInfo.GetCultureInfo("nl-NL"));
// We should parse with "nl-NL", not with CurrentCulture which seems to be "en-US"
var result = double.Parse(calResultString, CultureInfo.GetCultureInfo("nl-NL"));
Or specify CurrentCulture (default culture)
CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo("nl-NL");
var calResult = 15.2d;
// now CultureInfo.GetCultureInfo("nl-NL") is redundant
var calResultString = calResult.ToString();
var result = double.Parse(calResultString);
Finally, if you have a string which represents some floating point value in en-US culture, and you want the same value but be a string in nl-NL format:
string source = "123.456";
string result = double
.Parse(source, CultureInfo.GetCultureInfo("en-US"))
.ToString(CultureInfo.GetCultureInfo("nl-NL"));
Numbers and strings don't contain any culture information, instead you specify the culture when you convert between numbers and strings.
result == "152" -> here I expect a comma
What happened is that you asked .NET to parse "15,2" into a double without specifying a culture. It fell back to the current culture (apparently en-US), where the comma is a group separator, so the comma was ignored.
If you'd specified a culture:
var result = double.Parse(calResultString, CultureInfo.GetCultureInfo("nl-NL"));
it would have given you the right value (15.2), and that might even have been displayed as 15,2 if your computer was configured to the right number format (and the debugger used your preference).
Ideally you don't hard-code the culture, but use the culture that the user has chosen.
I've written a simple method that checks for the comma character in your input and replaces it with a dot. I believe the best way is to take the input as a string value. This way you can manipulate it and then parse it and return a double or a string if you wish:
var input = Console.ReadLine();
double parsedDouble;
if (input.Contains(","))
{
    input = input.Replace(",", ".");
}
// Parse with InvariantCulture so the dot is always read as the decimal separator
if (!Double.TryParse(input, NumberStyles.Float, CultureInfo.InvariantCulture, out parsedDouble))
{
    Console.WriteLine("Error parsing input");
}
else
{
    Console.WriteLine(parsedDouble);
}
Console.ReadLine();
edit: the answers from Robin Bennett/Dmitry Bychenko are much better than mine, as mine is just more manual. I wasn't aware of the overload of parse that he had provided.
I'll leave my solution, cause it does solve this issue, even if it's a bit more... brute ;)
var calResult = 15.2d;
var calResultString = calResult.ToString();
string result = double.Parse(calResultString).ToString(CultureInfo.GetCultureInfo("nl-NL"));

C# string to Decimal On All style or Culture

Hi I want to find if there is any better way to parse the string to Decimal which covers various format
$1.30
£1.50
€2,50
2,50  €
2.500,00  €
I see a lot of examples using culture to convert . & ,. But in my case, I don't have anything to identify the culture.
This display field I get from the client and I need to extract the value.
I tried following (which didn't work for all scenario) but would like to know if we have any best way to handle this.
Decimal.Parse(value,NumberStyles.Currency |
NumberStyles.Number|NumberStyles.AllowThousands |
NumberStyles.AllowTrailingSign | NumberStyles.AllowCurrencySymbol)
I also tried to use Regex to remove the currency sign but unable to convert both 1.8 or 1,8 in one logic.
Well, assuming you always get a valid currency format, and it's only the culture that changes, you could guess which character is used as a decimal point and which is used as a thousands separator by checking which appears the last in the number. Then remove all the thousand separators and parse it like its culture was invariant.
The code would look like the following:
// Replace with your input
var numberString = "2.500,00 €";
// Regex to extract the number part from the string (supports thousands and decimal separators)
// Simple replace of all non numeric and non ',' '.' characters with nothing might suffice as well
// Depends on the input you receive
var regex = new Regex("^[^\\d-]*(-?(?:\\d|(?<=\\d)\\.(?=\\d{3}))+(?:,\\d+)?|-?(?:\\d|(?<=\\d),(?=\\d{3}))+(?:\\.\\d+)?)[^\\d]*$");
char decimalChar;
char thousandsChar;
// Get the numeric part from the string
var numberPart = regex.Match(numberString).Groups[1].Value;
// Try to guess which character is used for decimals and which is used for thousands
if (numberPart.LastIndexOf(',') > numberPart.LastIndexOf('.'))
{
decimalChar = ',';
thousandsChar = '.';
}
else
{
decimalChar = '.';
thousandsChar = ',';
}
// Remove thousands separators as they are not needed for parsing
numberPart = numberPart.Replace(thousandsChar.ToString(), string.Empty);
// Replace decimal separator with the one from InvariantCulture
// This makes sure the decimal parses successfully using InvariantCulture
numberPart = numberPart.Replace(decimalChar.ToString(),
CultureInfo.InvariantCulture.NumberFormat.CurrencyDecimalSeparator);
// Voilà
var result = decimal.Parse(numberPart, NumberStyles.AllowDecimalPoint | NumberStyles.Number, CultureInfo.InvariantCulture);
It does look a bit complicated for simple decimal parsing, but I think it should do the job for all the input numbers you get, or at least most of them.
If you do this in some sort of loop, you might want to use a compiled regex.
The problem here is that in one case . means a decimal point, but in another it is a thousands separator. And then you have , as a decimal separator. Clearly, it is impossible for the parser to "guess" what is meant, so the only thing you can do is decide on some rules for how to handle each case.
If you have control over the UI the best approach would be to validate user input and just reject any value that can't be parsed with an explanation on which format is expected.
If you have no control over the UI, the second best option would be to check for some "rules" and then devise which culture is appropriate for that given input and try to run it through decimal.TryParse for that given culture.
For the given input you have, you could have the following rules:
input.StartsWith("$") -> en-US
input.StartsWith("£") -> en-GB
input.StartsWith("€") || input.EndsWith("€") -> de-DE
These could reasonably handle all cases.
In code:
static void Main(string[] args)
{
string[] inputs =
{
"$1.30",
"£1.50",
"€2,50",
"2,50 €",
"2.500,00 €"
};
for (int i = 0; i < inputs.Length; i++)
{
Console.Write((i + 1).ToString() + ". ");
if (decimal.TryParse(inputs[i], NumberStyles.Currency,
GetAppropriateCulture(inputs[i]), out var parsed))
{
Console.WriteLine(parsed);
}
else
{
Console.WriteLine("Can't parse");
}
}
}
private static CultureInfo GetAppropriateCulture(string input)
{
if (input.StartsWith("$"))
return CultureInfo.CreateSpecificCulture("en-US");
if (input.StartsWith("£"))
return CultureInfo.CreateSpecificCulture("en-GB");
if (input.StartsWith("€") || input.EndsWith("€"))
return CultureInfo.CreateSpecificCulture("de-DE");
return CultureInfo.InvariantCulture;
}
Output:
1.30
1.50
2.50
2.50
2500.00
The only way you could do that is to strip the symbols from the string and change . and , to the decimal separator. Something like:
public decimal UniversalConvertDecimal(string str)
{
char currentDecimalSeparator = Convert.ToChar(Thread.CurrentThread.CurrentCulture.NumberFormat.NumberDecimalSeparator);
str = str.Replace('.', currentDecimalSeparator);
str = str.Replace(',', currentDecimalSeparator);
StringBuilder builder = new StringBuilder(str.Length);
foreach(var ch in str)
{
if(Char.IsDigit(ch) || ch == currentDecimalSeparator)
builder.Append(ch);
}
string s = builder.ToString();
return Convert.ToDecimal(s);
}
First you have to get current decimal separator from your system.
Then you have to replace . and , with current decimal separator.
Next, you will have to strip the string from any other char than a digit or decimal separator. At the end you can be sure that Convert.ToDecimal is going to work. But I don't know if it is something you want to achieve.
If you need some mechanism to save currency to a database, there is a far simpler solution. Just convert the amount to the smallest currency unit. For example, instead of $1, save 100 cents.
So if you have $1.99, just multiply it by 100 and you will get: 199 cents. And this integer can be saved to db.
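A small sketch of that conversion, assuming a currency with two decimal places (the MinorUnits class and its method names are hypothetical):

```csharp
using System;

public static class MinorUnits
{
    // Converts an amount such as 1.99 to 199 cents, rounding any sub-cent remainder.
    public static long ToCents(decimal amount)
    {
        return decimal.ToInt64(decimal.Round(amount * 100m, 0, MidpointRounding.AwayFromZero));
    }

    // Converts 199 cents back to 1.99.
    public static decimal FromCents(long cents)
    {
        return cents / 100m;
    }
}
```

Currencies with a different number of decimal places would need a different factor than 100.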

Convert string to decimal with format

I need to convert a string to a decimal in C#, but the string can come in different formats.
For example:
"50085"
"500,85"
"500.85"
These should all be converted to the decimal 500,85. Is there a simple way to do this conversion using a format?
Some cultures use a comma as the decimal separator. You can test this with the following code on an aspx page:
var x = decimal.Parse("500,85");
Response.Write(x + (decimal)0.15);
This gives the answer 501 when the thread culture has been set to a culture that uses the comma as the decimal separator. You can force this like so:
var x = decimal.Parse("500,85", new NumberFormatInfo() { NumberDecimalSeparator = "," });
While decimal.Parse() is the method you are looking for, you will have to provide a bit more information to it. It will not automatically pick between the 3 formats you give, you will have to tell it which format you are expecting (in the form of an IFormatProvider). Note that even with an IFormatProvider, I don't think "50085" will be properly pulled in.
The only consistent thing I see is that it appears from your examples that you always expect two decimal places of precision. If that is the case, you could strip out all periods and commas and then divide by 100.
Maybe something like:
public decimal? CustomParse(string incomingValue)
{
decimal val;
if (!decimal.TryParse(incomingValue.Replace(",", "").Replace(".", ""), NumberStyles.Number, CultureInfo.InvariantCulture, out val))
return null;
return val / 100;
}
This will work, depending on your culture settings:
string s = "500.85";
decimal d = decimal.Parse(s);
If your culture does not by default allow , instead of . as a decimal point, you will probably need to:
s = s.Replace(',','.');
But will need to check for multiple .'s... this seems to boil down to more of an issue of input sanitization. If you are able to validate and sanitize the input to all conform to a set of rules, the conversion to decimal will be a lot easier.
Try this code below:
string numValue = "500,85";
System.Globalization.CultureInfo culInfo = new System.Globalization.CultureInfo("fr-FR");
decimal decValue;
bool decValid = decimal.TryParse(numValue, System.Globalization.NumberStyles.Number, culInfo.NumberFormat, out decValue);
if (decValid)
{
lblDecNum.Text = Convert.ToString(decValue, culInfo.NumberFormat);
}
Since I am giving a value of 500,85 I will assume that the culture is French and hence the decimal separator is ",". Then decimal.TryParse(numValue, System.Globalization.NumberStyles.Number, culInfo.NumberFormat,out decValue);
will return the value as 500.85 in decValue. Similarly if the user is English US then change the culInfo constructor.
There are numerous ways:
System.Convert.ToDecimal("232.23")
Double.Parse("232.23")
double test;
Double.TryParse("232.23", out test)
Make sure you try and catch...
This is a new feature called Digit Grouping Symbol.
Steps:
Open Region and Language in control panel
Click on Additional setting
On Numbers tab
Set Digit Grouping Symbol as custom setting.
Change comma; replace with (any character as A to Z or {/,}).
Digit Grouping Symbol=e;
Example:
string checkFormate = "123e123";
decimal outPut = 0.0M;
decimal.TryParse(checkFormate, out outPut);
Ans: outPut=123123;
Try This
public decimal AutoParse(string value)
{
if (Convert.ToDecimal("3.3") == ((decimal)3.3))
{
return Convert.ToDecimal(value.Replace(",", "."));
}
else
{
return Convert.ToDecimal(value.Replace(".", ","));
}
}

universal parsing of strings into float

In the environment that my program is going to run, people use ',' and '.' as decimal separators randomly on PCs with ',' and '.' separators.
How would you implement such a floatparse(string) function?
I tried this one:
try
{
d = float.Parse(s);
}
catch
{
try
{
d = float.Parse(s.Replace(".", ","));
}
catch
{
d = float.Parse(s.Replace(",", "."));
}
}
It doesn't work. When I debug, it turns out that it parses the string wrongly the first time, thinking that "." is a thousands separator (like 100.000.000,0).
I'm a noob at C#, so hopefully there is a less overcomplicated solution than that :-)
NB: People are going to use both '.' and ',' on PCs with different separator settings.
If you are sure nobody uses thousand-separators, you can Replace first:
string s = "1,23"; // or s = "1.23";
s = s.Replace(',', '.');
double d = double.Parse(s, CultureInfo.InvariantCulture);
The general pattern would be:
first try to sanitize and normalize. Like Replace(otherChar, myChar).
try to detect a format, maybe using RegEx, and use a targeted conversion. Like counting . and , to guess whether thousand separators are used.
try several formats in order, with TryParse. Do not use exceptions for this.
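The last item could look roughly like this. It is only a sketch: the MultiFormatParser class is hypothetical, and the choice of the invariant culture plus de-DE as the two candidate formats is an assumption, not something from the question.

```csharp
using System;
using System.Globalization;

public static class MultiFormatParser
{
    // Candidate formats, tried in order of preference.
    private static readonly CultureInfo[] Candidates =
    {
        CultureInfo.InvariantCulture,         // "1234.5"
        CultureInfo.GetCultureInfo("de-DE"),  // "1234,5"
    };

    public static bool TryParse(string text, out double value)
    {
        foreach (var culture in Candidates)
        {
            // NumberStyles.Float does not include AllowThousands, so "1,5"
            // cannot be misread as 15 by the invariant culture.
            if (double.TryParse(text, NumberStyles.Float, culture, out value))
                return true;
        }
        value = 0;
        return false;
    }
}
```

Because no exceptions are involved, a string that matches none of the formats simply returns false.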
Parsing uses the settings of the CultureInfo.CurrentCulture, which reflects the language and the Number format selected by the user through Regional Settings. If your users type the decimals that correspond to the language they chose for their computers, you should have no problem using plain old double.Parse(). If a user sets his locale to Greek and types "8,5", double.Parse("8,5") will return 8.5. If he types "8.5" parse will return 85.
If a user sets his locale to one setting and then starts using the wrong decimal separator, you face a problem. There is no clean way to distinguish such wrong entries from entries where the user really meant to type the grouping character. What you can do is warn the user when a number is too short to include a grouping character, and use masked or numeric text boxes to prevent wrong entries.
Another, somewhat stricter option, is to disallow the grouping character for number entry in your application by cancelling it in the KeyDown event of your textboxes. You can get the numeric and grouping characters from CultureInfo.CurrentCulture.NumberFormat property.
Trying to replace the decimal and grouping characters is doomed to fail as it depends on knowing during compile time what kind of separator the user is going to use. If you knew that, you could just parse the number using the CultureInfo in the user's mind. Unfortunately, User.Brain.CultureInfo is not yet part of the .NET framework :P
I would do something like this
float ConvertToFloat(string value)
{
    float result;
    var converted = float.TryParse(value, out result);
    if (converted) return result;
    converted = float.TryParse(value.Replace(".", ","), out result);
    if (converted) return result;
    return float.NaN;
}
In this case the following would return correct data
Console.WriteLine(ConvertToFloat("10.10").ToString());
Console.WriteLine(ConvertToFloat("11,0").ToString());
Console.WriteLine(ConvertToFloat("12").ToString());
Console.WriteLine(ConvertToFloat("1 . 10").ToString());
Returns
10,1
11
12
NaN
In this case if it is not possible to convert it, you will at least know that it is not a number. It's a safe way to convert.
You can also use the following overload
float.TryParse(value,
NumberStyles.Currency,
CultureInfo.CurrentCulture,
out result)
On this test-code:
Console.WriteLine(ConvertToFloat("10,10").ToString());
Console.WriteLine(ConvertToFloat("11,0").ToString());
Console.WriteLine(ConvertToFloat("12").ToString());
Console.WriteLine(ConvertToFloat("1 . 10").ToString());
Console.WriteLine(ConvertToFloat("100.000,1").ToString());
It returns the following
10,1
11
12
110
100000,1
So depending on how "nice" you want to be to the user, you can always add a last step: if the value is still not a number, try converting it this way as well; otherwise it really isn't a number.
It would then look like this
float ConvertToFloat(string value)
{
float result;
var converted = float.TryParse(value,
out result);
if (converted) return result;
converted = float.TryParse(value.Replace(".", ","),
out result);
if (converted) return result;
converted = float.TryParse(value,
NumberStyles.Currency,
CultureInfo.CurrentCulture,
out result);
return converted ? result : float.NaN;
}
Where the following
Console.WriteLine(ConvertToFloat("10,10").ToString());
Console.WriteLine(ConvertToFloat("11,0").ToString());
Console.WriteLine(ConvertToFloat("12").ToString());
Console.WriteLine(ConvertToFloat("1 . 10").ToString());
Console.WriteLine(ConvertToFloat("100.000,1").ToString());
Console.WriteLine(ConvertToFloat("asdf").ToString());
Returns
10,1
11
12
110
100000,1
NaN

Parsing amount strings into numbers

I am working on a system that recognizes paper documents using OCR engines. These documents are invoices containing amounts such as total, VAT and net amounts. I need to parse these amount strings into numbers, but they come in many formats and flavors, with each invoice using different symbols for the decimal and thousands separators. If I try to use the normal double.TryParse and double.Parse methods in .NET, they normally fail for some of the amounts.
These are some of the examples I receive as amount
"3.533,65" => 3533.65
"-133.696" => -133696
"-33.017" => -33017
"-166.713" => -166713
"-5088,8" => -5088.8
"0.423" => 0.423
"9,215,200" => 9215200
"1,443,840.00" => 1443840
I need some way to guess what the decimal separator and the thousand separator is in the number and then present the value to the user to decide if this is correct or not.
I am wondering how to solve this problem in an elegant way.
I'm not sure you'll be able to get an elegant way of figuring this out, because it's always going to be ambiguous if you can't tell it where the data is from.
For example, the numbers 1.234 and 1,234 are both valid numbers, but without establishing what the symbols mean you won't be able to tell which is which.
Personally, I would write a function which attempted to do a "best guess" based on some rules...
If the number contains , BEFORE ., then , must be for thousands and . must be for decimals
If the number contains . BEFORE ,, then . must be for thousands and , must be for decimals
If there are >1 , symbols, the thousand separator must be ,
If there are >1 . symbols, the thousand separator must be .
If there is only 1 , how many digits follow it? If it's NOT 3, then it must be the decimal separator (same rule for .)
If there are 3 digits after it (e.g. 1,234 and 1.234), perhaps you could put this number aside and parse other numbers on the same page to try and figure out if they use different separators, then come back to it?
Once you've figured out the decimal separator, remove any thousand separators (not needed for parsing the number) and ensure the decimal separator is . in the string you are parsing. Then you can pass this into Double.TryParse
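The rules above can be sketched roughly as follows. The SeparatorGuesser class is hypothetical, the sketch assumes the input is a bare number with no currency symbol or spaces, and it returns null for the ambiguous three-digit case instead of looking at other numbers on the page.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class SeparatorGuesser
{
    // Returns the parsed value, or null when the input is ambiguous or not a number.
    public static decimal? Parse(string text)
    {
        int commas = text.Count(c => c == ',');
        int dots = text.Count(c => c == '.');
        char decimalSep = '\0'; // '\0' means "no decimal separator"

        if (commas > 0 && dots > 0)
        {
            // Both present: whichever comes last is the decimal separator.
            decimalSep = text.LastIndexOf(',') > text.LastIndexOf('.') ? ',' : '.';
        }
        else if (commas + dots == 1)
        {
            char only = commas == 1 ? ',' : '.';
            int digitsAfter = text.Length - text.IndexOf(only) - 1;
            if (digitsAfter == 3) return null; // 1,234 vs 1.234: cannot tell
            decimalSep = only;
        }
        // Otherwise: no separator, or one symbol repeated, which must be for thousands.

        // Drop thousands separators and normalize the decimal separator to '.'.
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c == ',' || c == '.')
            {
                if (c == decimalSep) sb.Append('.');
            }
            else
            {
                sb.Append(c);
            }
        }

        decimal value;
        return decimal.TryParse(sb.ToString(), NumberStyles.Float, CultureInfo.InvariantCulture, out value)
            ? value
            : (decimal?)null;
    }
}
```

For the sample amounts in the question, "3.533,65", "-5088,8", "9,215,200" and "1,443,840.00" come out as expected, while "-133.696" and "0.423" are reported as ambiguous.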
I would probably set up a list of rules that are specified in order of preference, this way you can plug rules in by precedence. You can then parse the list based on regex matches returning the correct rule.
A quick prototype would be very easy to set up similar to:
public class FormatRule
{
public string Pattern { get; set; }
public CultureInfo Culture { get; set; }
public FormatRule(string pattern, CultureInfo culture)
{
Pattern = pattern;
Culture = culture;
}
}
Now a list of FormatRule used to store your rules in order of precedence:
List<FormatRule> Rules = new List<FormatRule>()
{
    /* Add rules in order of precedence specifying a culture
     * that can handle the pattern, I've chosen en-US and fr-FR
     * for this example, but equally any culture could be swapped
     * in for various formats you may need to use */
    new FormatRule(@"^0\.\d+$", CultureInfo.GetCultureInfo("en-US")),
    new FormatRule(@"^0,\d+$", CultureInfo.GetCultureInfo("fr-FR")),
    new FormatRule(@"^[1-9]+\.\d{4,}$", CultureInfo.GetCultureInfo("en-US")),
    new FormatRule(@"^[1-9]+,\d{4,}$", CultureInfo.GetCultureInfo("fr-FR")),
    new FormatRule(@"^-?[1-9]{1,3}(,\d{3,})*(\.\d*)?$", CultureInfo.GetCultureInfo("en-US")),
    new FormatRule(@"^-?[1-9]{1,3}(\.\d{3,})*(,\d*)?$", CultureInfo.GetCultureInfo("fr-FR")),
    /* The default rule */
    new FormatRule(string.Empty, CultureInfo.CurrentCulture)
};
You should then be able to iterate your list looking for the correct rule to apply:
public CultureInfo FindProvider(string numberString)
{
foreach(FormatRule rule in Rules)
{
if (Regex.IsMatch(numberString, rule.Pattern))
return rule.Culture;
}
return Rules[Rules.Count - 1].Culture;
}
This setup allows you to easily manage rules and set precedence on when something should be handled one way or another. It also allows you to be able to specify different cultures to handle one format one way and a different format another.
public float ParseValue(string valueString)
{
float value = 0;
NumberStyles style = NumberStyles.Any;
IFormatProvider provider = FindProvider(valueString).NumberFormat;
if (float.TryParse(valueString, style, provider, out value))
return value;
else
throw new InvalidCastException(string.Format("Value '{0}' cannot be parsed with any of the providers in the rule set.", valueString));
}
Finally, call your ParseValue() method to convert the string value you have to a float:
string numberString = "-123,456.78"; //Or "23.457.234,87"
float value = ParseValue(numberString);
You may decide to use a dictionary to save on the extra FormatRule class; the concept is the same... I used a list in the example because it makes it easier to query using LINQ. Also, you could easily replace the float type I've used with double or decimal if needed.
You will have to create your own function to guess what is the decimal separator and the thousand separator. Then you will be able to double.Parse but with the corresponding CultureInfo.
I recommend doing something like this (just an example; this is not a production-tested function):
private CultureInfo GetNumbreCultureInfo(string number)
{
    CultureInfo dotDecimalSeparator = new CultureInfo("en-US");
    CultureInfo commaDecimalSeparator = new CultureInfo("es-AR");
    string[] splitByDot = number.Split('.');
    if (splitByDot.Count() > 2) // more than one '.', so '.' is the thousands separator
        return commaDecimalSeparator; // a culture where '.' is the thousands separator
    // the same for ','
    string[] splitByComma = number.Split(',');
    if (splitByComma.Count() > 2)
        return dotDecimalSeparator;
    // if there is no ',' or '.', return the invariant culture
    if (splitByComma.Count() == 1 && splitByDot.Count() == 1)
        return CultureInfo.InvariantCulture;
    // at most one '.' and one ',' remain; check which one comes last
    if (splitByComma.Count() == 2)
    {
        if (splitByDot.Count() == 1)
        {
            if (splitByComma.Last().Length != 3) // ',' is a decimal separator
                return commaDecimalSeparator;
            else // ambiguous, e.g. 100,001 could use ',' for thousands or decimals
                return dotDecimalSeparator;
        }
        else // something like 100.010,00 or 100.010,111 or 100,000.111
        {
            if (splitByDot.Last().Length > splitByComma.Last().Length) // ',' is the decimal separator
                return commaDecimalSeparator;
            else
                return dotDecimalSeparator;
        }
    }
    else
    {
        if (splitByDot.Last().Length != 3) // '.' is a decimal separator
            return dotDecimalSeparator;
        else
            return commaDecimalSeparator; // ambiguous again, e.g. 100.101
    }
}
you can do a quick test like this:
string[] numbers = { "100.101", "1.000.000,00", "100.100,10", "100,100.10", "100,100.100", "1,00" };
decimal n;
foreach (string number in numbers)
{
if (decimal.TryParse(number, NumberStyles.Any, GetNumbreCultureInfo(number), out n))
MessageBox.Show(n.ToString());//the decimal was parsed
else
MessageBox.Show("there was problems parsing");
}
Also look at the branches where you don't really know which separator is meant (like 100,010 or 100.001), where it can be a decimal or a thousands separator.
You can work around this by looking in the document for a number that carries enough information to determine the culture of the document, saving that culture, and always using it (if you can assume that the whole document uses the same culture...)
Hope this helps
You should be able to do that with Double.TryParse. Your biggest problem as I see it is that you have inconsistencies in the way you interpret the numbers.
For example, how can
"-133.696" => -133696
When
"-166.713" => -166.713
?
If the rules for converting the numbers aren't consistent then you won't be able to solve this in code. As klausbyskov pointed out, why does the period in "-133.696" have a different meaning than the one in "-166.713"? How would you know what to do with a number containing a decimal point given these 2 examples where one is using it as expected but the other is using it as a thousand separator?
You'll need to define the various cases you're likely to encounter, create some logic to match each incoming string to one of your cases, and then parse it specifying an appropriate FormatProvider. For example - if your string contains a decimal point BEFORE a comma, then you can assume that for this particular string, they're using the decimal point as the thousands separator and the comma as the decimal separator, so you can construct a format provider to cope with this scenario.
Try something along these lines:
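public IFormatProvider GetParseFormatProvider(string s)
{
    // A freshly constructed CultureInfo is writable, so its NumberFormat can be customized
    var nfi = new CultureInfo("en-US", false).NumberFormat;
    int period = s.IndexOf('.');
    int comma = s.IndexOf(',');
    if (period >= 0 && comma > period)
    {
        // Period before comma: the period groups thousands and the comma marks decimals
        nfi.NumberDecimalSeparator = ",";
        nfi.NumberGroupSeparator = ".";
    }
    // Add further conditions here as needed; everything else keeps the en-US defaults
    return nfi;
}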
and then use Double.Parse(myString, GetParseFormatProvider(myString)) to perform the actual parsing.
"and then present the value to the user to decide if this is correct or not."
If there are multiple possibilities, why not show the user both of them?
You can have multiple methods calling TryParse with the different cultures you want to be able to handle, and collect the parse results for those methods that succeed in a list (removing duplicates).
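A minimal sketch of collecting the distinct interpretations. The CandidateParser class is hypothetical, and en-US and de-DE are just two example cultures, not a complete list.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CandidateParser
{
    // Cultures to try; extend this list with whatever formats the documents may use.
    private static readonly CultureInfo[] Cultures =
    {
        CultureInfo.GetCultureInfo("en-US"),  // ',' groups thousands, '.' marks decimals
        CultureInfo.GetCultureInfo("de-DE"),  // '.' groups thousands, ',' marks decimals
    };

    // Returns every distinct value the text could mean, in culture order.
    public static List<decimal> Interpretations(string text)
    {
        var results = new List<decimal>();
        foreach (var culture in Cultures)
        {
            decimal value;
            if (decimal.TryParse(text, NumberStyles.Number, culture, out value)
                && !results.Contains(value))
            {
                results.Add(value);
            }
        }
        return results;
    }
}
```

For "4,675" this yields two candidates (4675 and 4.675) to show to the user, while an unambiguous input such as "1234" yields a single one.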
You could even estimate the likelihood of the different possibilities being correct based on how frequently the various formats are used elsewhere in the document, and present the alternatives in a list sorted by likelihood of being correct. For example, if you have seen a lot of numbers like 3,456,231.4 already then you can guess that comma is probably the thousands separator when you see 4,675 later in the same document, and present "4675" first in the list, and "4.675" second.
If you have a dot or comma followed by no more than two digits, it's the decimal point. Otherwise, ignore it.
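As a sketch of this rule (the TwoDigitRule class is hypothetical, and the input is assumed to be a bare number): the last dot or comma counts as the decimal point only when one or two digits follow it, and every other separator is dropped.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TwoDigitRule
{
    public static decimal Parse(string text)
    {
        int last = text.LastIndexOfAny(new[] { '.', ',' });
        int digitsAfter = last >= 0 ? text.Length - last - 1 : 0;
        // Decimal point only if one or two characters follow the last separator.
        bool hasDecimal = last >= 0 && digitsAfter >= 1 && digitsAfter <= 2;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '.' || c == ',')
            {
                if (hasDecimal && i == last) sb.Append('.'); // keep only the decimal point
            }
            else
            {
                sb.Append(c);
            }
        }
        return decimal.Parse(sb.ToString(), NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

This handles most of the sample amounts in the question, but it reads "0.423" as 423, which differs from the 0.423 the question expects.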
