Problems parsing through string in C#

I am trying to parse through the first three characters of a string.
public List<string> sortModes(List<string> allModesNonSorted)
{
foreach (string s in allModesNonSorted)
{
char firstNumber = s[0];
char secondNumber = s[1];
char thirdNumber = s[2];
char.IsDigit(firstNumber);
char.IsDigit(secondNumber);
char.IsDigit(thirdNumber);
combinedNumbers = Convert.ToInt16(firstNumber) + Convert.ToInt16(secondNumber) + Convert.ToInt16(thirdNumber);
}
return allModesNonSorted;
}
It recognizes each character correctly, but shows an extra value of 53 or 55 alongside it. When I add the numbers, the 53 and 55 are included. Why is it doing this?

53 is the Unicode value of '5', and 55 is the Unicode value of '7'. It's showing you both the numeric and character versions of the data.
You'll notice with secondNumber you see the binary value 0 and the character value '\0' as well.
If you want to interpret a string as an integer, you can use
int myInteger = int.Parse(myString);
Specifically if you know you always have the format
input = "999 Hz Bla bla"
you can do something like:
int firstSeparator = input.IndexOf(' ');
string frequency = input.Substring(0, firstSeparator);
int numericFrequency = int.Parse(frequency);
That will work no matter how many digits are in the frequency as long as the digits are followed by a space character.
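As a rough sketch of the same idea with some error handling added, assuming the input looks like "999 Hz Bla bla": the class and method names here (FrequencyParser, ParseLeadingNumber) are made up for illustration and are not part of the original code. It uses int.TryParse so that malformed input yields null instead of an exception.

```csharp
using System;

static class FrequencyParser
{
    // Hypothetical helper: returns the integer that precedes the first space,
    // or null if there is no space or the leading text is not a number.
    public static int? ParseLeadingNumber(string input)
    {
        int firstSeparator = input.IndexOf(' ');
        if (firstSeparator <= 0)
        {
            return null; // no space found, or the string starts with a space
        }

        // Take only the characters before the space.
        string frequency = input.Substring(0, firstSeparator);
        return int.TryParse(frequency, out int value) ? value : (int?)null;
    }

    static void Main()
    {
        Console.WriteLine(ParseLeadingNumber("999 Hz Bla bla"));   // 999
        Console.WriteLine(ParseLeadingNumber("57 Hz"));            // 57
        Console.WriteLine(ParseLeadingNumber("Hz only") == null);  // True
    }
}
```

Returning a nullable int is only one possible design; throwing a FormatException would work just as well if bad input should never occur.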

53 is the ASCII value for the character '5'.
55 is the ASCII value for the character '7'.
This is just Visual Studio showing you extra details about the actual values.
You can proceed with your code.

Because you're treating them as characters.
The character '5' has the character code 53 in ASCII.
The simplest solution is to subtract the character '0' from each of them, which gives you the numeric value of a single digit character.
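A minimal sketch of the subtraction trick, applied to the first three characters as in the question. DigitSum and SumFirstThreeDigits are hypothetical names, and the check against '0'..'9' assumes the input only ever uses ASCII digits.

```csharp
using System;

static class DigitSum
{
    // Hypothetical helper: adds the numeric values of the first three characters.
    public static int SumFirstThreeDigits(string s)
    {
        if (s.Length < 3)
        {
            throw new ArgumentException("Need at least three characters.", nameof(s));
        }

        int sum = 0;
        for (int i = 0; i < 3; i++)
        {
            char c = s[i];
            if (c < '0' || c > '9')
            {
                throw new FormatException("Expected an ASCII digit at position " + i + ".");
            }

            sum += c - '0'; // '5' - '0' is 53 - 48, which is 5
        }

        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumFirstThreeDigits("507 Hz")); // 12
        Console.WriteLine(Convert.ToInt16('5'));          // 53, the character code
    }
}
```

The last line shows why the original sum came out wrong: Convert.ToInt16 on a char returns its code, not the digit it depicts.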

53 and 55 are the ASCII values of the '5' and '7' characters (the way the characters are stored in memory).
If you need to convert them to Integers, take a look at this SO post.

Related

remove 4 byte UTF8 characters

I'd like to remove 4 byte UTF8 characters that start with \xF0 (the char with the ASCII code 0xF0) from a string and tried
sText = Regex.Replace (sText, "\xF0...", "");
This doesn't work. Using two backslashes did not work either.
The exact input is the content of https://de.wikipedia.org/w/index.php?title=Spezial:Exportieren&action=submit&pages=Unicode The 4 byte character is the one after the text "[[Violinschlüssel]] ", in hex notation: .. 0x65 0x6c 0x5d 0x5d 0x20 0xf0 0x9d 0x84 0x9e 0x20 .. The expected output is 0x65 0x6c 0x5d 0x5d 0x20 0x20 ..
What's wrong?

Such characters will be surrogate pairs in .NET which uses UTF-16. Each of them will be two UTF-16 code units, that is two char values.
To just remove them, you can do (using System.Linq;):
sText = string.Concat(sText.Where(x => !char.IsSurrogate(x)));
(uses an overload of Concat introduced in .NET 4.0 (Visual Studio 2010)).
Late addition: It may give better performance to use:
sText = new string(sText.Where(x => !char.IsSurrogate(x)).ToArray());
even if it looks worse. (Works in .NET 3.5 (Visual Studio 2008).)

You are trying to search for byte values but C# strings are made from char values. The C# language spec at section "2.4.4.4 Character literals" states:
A character literal represents a single character, and usually consists of a character in quotes, as in 'a'.
...
A hexadecimal escape sequence represents a single Unicode character, with the value formed by the hexadecimal number following \x.
Hence the search for "\xF0..." is searching for the character U+F0 which would be represented by the bytes C3 B0.
If you want to find and replace all Unicode characters whose first byte is 0xF0, then I believe you need to search for the character values whose UTF-8 encoding starts with 0xF0.
The character U+10000 is represented as F0 90 80 80 (the preceding code is U+FFFF which is EF BF BF). The first code with F1 .... .. is U+40000 which is F1 80 80 80 and the value before it is U+3FFFF which is F0 BF BF BF.
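Hence you need to remove characters in the range U+10000 to U+3FFFF. Because .NET strings are UTF-16 and regular expressions match char values, the range has to be written in terms of surrogate pairs: for these characters the high surrogate lies between U+D800 and U+D8BF. This should be possible with a regular expression such as
sText = Regex.Replace (sText, "[\uD800-\uD8BF][\uDC00-\uDFFF]", "");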
The relevant characters from the source quoted in the question have been extracted into the code below. The code then tries to understand how the characters are held in strings.
static void Main(string[] args)
{
string input = "] 𝄞 (";
Console.Write("Input length {0} : '{1}' : ", input.Length, input);
foreach (char cc in input)
{
Console.Write(" {0,2:X02}", (int)cc);
}
Console.WriteLine();
}
The output from the program is shown below. This supports the surrogate pair explanation given by @Jeppe in his answer.
Input length 6 : '] ?? (' : 5D 20 D834 DD1E 20 28
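For comparison, here is a sketch that avoids regular expressions and removes only the code points U+10000 to U+3FFFF, which are the ones whose UTF-8 encoding starts with 0xF0. The names FourByteFilter and RemoveF0Led are invented for this example; the char methods it calls (IsHighSurrogate, IsLowSurrogate, ConvertToUtf32) are standard .NET.

```csharp
using System;
using System.Text;

static class FourByteFilter
{
    // Hypothetical helper: drops code points U+10000..U+3FFFF and keeps everything else.
    public static string RemoveF0Led(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            bool isPair = char.IsHighSurrogate(text[i])
                          && i + 1 < text.Length
                          && char.IsLowSurrogate(text[i + 1]);
            if (!isPair)
            {
                sb.Append(text[i]); // BMP character (or a lone surrogate): keep
                continue;
            }

            int codePoint = char.ConvertToUtf32(text[i], text[i + 1]);
            if (codePoint > 0x3FFFF)
            {
                // UTF-8 lead byte would be F1..F4, not F0: keep the pair
                sb.Append(text[i]).Append(text[i + 1]);
            }

            i++; // step over the low surrogate in either case
        }

        return sb.ToString();
    }

    static void Main()
    {
        // "\uD834\uDD1E" is U+1D11E, the musical G clef from the question.
        string input = "el]] \uD834\uDD1E ";
        string result = RemoveF0Led(input);
        Console.WriteLine(input.Length);  // 8
        Console.WriteLine(result.Length); // 6
    }
}
```

If every character outside the Basic Multilingual Plane should go, filtering with char.IsSurrogate as in the other answer is shorter.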

How to get the ASCII character in c#?

How can I get this ƒ character from the ASCII table? I have tried this:
txt2 = (char)131;
I can only get values up to 127. If I give a value greater than 127, it returns a NULL value. So how can I get values up to 255?

ƒ isn't ASCII... it is Unicode, Unicode Character 'LATIN SMALL LETTER F WITH HOOK' (U+0192).
char ch = 'ƒ';
or
char ch = (char)0x0192;
or
char ch = '\x0192';
There are only 128 characters in the ASCII set (0-127), and the only letters it contains are the unaccented A-Z and a-z.
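A small sketch of why the cast from 131 does not give ƒ. The value 131 (0x83) is where ƒ sits in the Windows-1252 code page, but a C# char holds a UTF-16 code unit, so (char)131 is U+0083, a control character. HookedF is just a placeholder class name.

```csharp
using System;

static class HookedF
{
    static void Main()
    {
        char fromCast = (char)131; // U+0083, a C1 control character, not the hooked f
        char hooked = '\u0192';    // LATIN SMALL LETTER F WITH HOOK

        Console.WriteLine(char.IsControl(fromCast)); // True
        Console.WriteLine((int)hooked);              // 402
        Console.WriteLine(char.IsLetter(hooked));    // True
    }
}
```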

Char array is returning incorrect values

I have a char array, chars[] with values {'#', '$', '1'} contained within it. I want to remove the 1 and place it into another variable, val, but when I do it gives me a 49 (idk why). I tried debugging it and the info shows that the elements of chars are as follows:
chars[0] = 35 '#'
chars[1] = 36 '$'
chars[2] = 49 '1'
Which in turn makes
int val = chars[2];
become
val = 49
I'm not sure why this is, but it's throwing my plans off. Does anyone know what the problem is and what I can do to fix it?

You should use
char val = chars[2];
With int, you are getting the ASCII representation of the char as an integer.
See also http://hu.wikipedia.org/wiki/ASCII

49 is the ASCII representation for the char '1'
link to ASCII table

Just go for: charArray[x].ToString();
This will convert the ASCII representation to an actual Character.

adding 48 to string number

I have a string in C# like this:
string only_number;
I assigned it the value 40.
When I check only_number[0], I get 52
When I check only_number[1], I get 48
Why is it adding 48 to the character at the current position? Please suggest.

A string is basically a char[], so what you are seeing is the ASCII value of the chars '4' and '0'.
Proof: the difference between 4 and 0 equals the difference between 52 and 48.
Since it is a string, you didn't assign it 40. Instead, you assigned it "40".

What you see is the ASCII code of '4' and '0'.

It's not adding 48 to the character. What you see is the character code, and the characters for digits start at 48 in Unicode:
'0' = 48
'1' = 49
'2' = 50
'3' = 51
'4' = 52
'5' = 53
'6' = 54
'7' = 55
'8' = 56
'9' = 57
A string is a sequence of char values, and each char value is a 16-bit integer holding a UTF-16 code unit, which for characters like these digits is the same as the Unicode code point.
When you read from only_number[0] you get a char value that is '4', and the character code for that is 52. So what you have done is read a character from the string and then convert it to an integer before displaying it.
So:
char c = only_number[0];
Console.WriteLine(c); // displays 4
int n = (int)only_number[0]; // cast to integer
Console.WriteLine(n); // displays 52
int m = only_number[0]; // no explicit cast needed; char converts to int implicitly
Console.WriteLine(m); // displays 52

You are accessing this string and it is outputting the ASCII character codes for each of your two characters, '4' and '0' - please see here:
http://www.theasciicode.com.ar/ascii-control-characters/null-character-ascii-code-0.html

A string is an array of chars, so that's why you received these results; it basically displays the ASCII codes of '4' and '0'.

Conversion of a Char variable to an integer

Why is the integer equivalent of '8' 56 in C#? I want to convert it to the integer 8 and not any other number.

You'll need to subtract the offset from '0'.
int zero = (int)'0'; // 48
int eight = (int)'8'; // 56
int value = eight - zero; // 8

56 is the Unicode value for the character '8'. Use:
Int32.Parse(myChar.ToString());
Or this:
char myChar = '8';
int value = (int)char.GetNumericValue(myChar); // 8, whereas Convert.ToInt32(myChar) returns 56

The right way of converting Unicode characters in C#/.NET is to use the corresponding Char methods IsDigit and GetNumericValue (http://msdn.microsoft.com/en-us/library/e7k33ktz.aspx).
If you are absolutely sure that there will be no non-ASCII digits in your input, then ChaosPandion's suggestion is fine too.
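A short sketch of the IsDigit plus GetNumericValue combination, for illustration only. CharToInt and DigitValue are hypothetical names, and returning -1 for non-digits is an arbitrary choice made for this example.

```csharp
using System;

static class CharToInt
{
    // Hypothetical helper: returns the digit value of c, or -1 if c is not a decimal digit.
    public static int DigitValue(char c)
    {
        return char.IsDigit(c) ? (int)char.GetNumericValue(c) : -1;
    }

    static void Main()
    {
        Console.WriteLine(DigitValue('8'));      // 8
        Console.WriteLine(DigitValue('\u0668')); // 8 (ARABIC-INDIC DIGIT EIGHT)
        Console.WriteLine(DigitValue('x'));      // -1
        Console.WriteLine((int)'8');             // 56, the character code
    }
}
```

Unlike subtracting '0', this also handles decimal digits from other scripts, which is the case the answer above warns about.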
