How to get the ASCII character in c#? - c#

How can I get this ƒ character from the ASCII table..? I have tried like this.
txt2 = (char)131;
I can able to get till 127 values only.. If I give more than 127 its returning NULL value. So how can I get till 255 ?

ƒ isn't ASCII... it is Unicode, Unicode Character 'LATIN SMALL LETTER F WITH HOOK' (U+0192).
char ch = 'ƒ';
or
char ch = (char)0x0192;
or
char ch = '\x0192';
There are only 128 characters in the ASCII set (0-127), and there are no non-american letters (there are only A-Z and a-z)

Related

remove 4 byte UTF8 characters

I'd like to remove 4 byte UTF8 characters which starts with \xF0 (the char with the ASCII code 0xF0) from a string and tried
sText = Regex.Replace (sText, "\xF0...", "");
This doesn't work. Using two backslashes did not work neither.
The exact input is the content of https://de.wikipedia.org/w/index.php?title=Spezial:Exportieren&action=submit&pages=Unicode The 4 byte character ist the one after the text "[[Violinschlüssel]] ", in hex notation: .. 0x65 0x6c 0x5d 0x5d 0x20 0xf0 0x9d 0x84 0x9e 0x20 .. The expected output is 0x65 0x6c 0x5d 0x5d 0x20 0x20 ..
What's wrong?
Such characters will be surrogate pairs in .NET which uses UTF-16. Each of them will be two UTF-16 code units, that is two char values.
To just remove them, you can do (using System.Linq;):
sText = string.Concat(sText.Where(x => !char.IsSurrogate(x)));
(uses an overload of Concat introduced in .NET 4.0 (Visual Studio 2010)).
Late addition: It may give better performance to use:
sText = new string(sText.Where(x => !char.IsSurrogate(x)).ToArray());
even if it looks worse. (Works in .NET 3.5 (Visual Studio 2008).)
You are trying to search for byte values but C# strings are made from char values. The C# language spec at section "2.4.4.4 Character literals" states:
A character literal represents a single character, and usually consists of a character in quotes, as in 'a'.
...
A hexadecimal escape sequence represents a single Unicode character, with the value formed by the hexadecimal number following \x.
Hence the search for "\xF0..." is searching for the character U+F0 which would be represented by the bytes C3 B0.
If you want find replace all Unicode characters whose first byte is 0xF0 then I believe you need to search for the character values whose first byte if 0xFO.
The character U+10000 is represented as F0 90 80 80 (the preceding code is U+FFFF which is EF BF BF). The first code with F1 .... .. is U+40000 which is F1 80 80 80 and the value before it is U+3FFFF which is F0 BF BF BF.
Hence you need to remove characters in the range U+10000 to U+3FFFF. This should be possible with a regular expression such as
sText = Regex.Replace (sText, "[\\x10000-\\x3FFFF]", "");
The relevant characters from the source quoted in the question have been extracted into the code below. The code then tries to understand how the characters are held in strings.
static void Main(string[] args)
{
string input = "] 𝄞 (";
Console.Write("Input length {0} : '{1}' : ", input.Length, input);
foreach (char cc in input)
{
Console.Write(" {0,2:X02}", (int)cc);
}
Console.WriteLine();
}
The output from the program is as below. This supports the surrogate pair explanation given by #Jeppe in his answer.
Input length 6 : '] ?? (' : 5D 20 D834 DD1E 20 28

How to convert latin character into HTML Entity (Decimal) in c#?

i want to convert latin character into html entity code in c#
for example
Th‚rŠse Ramdally should convert into
Th‚rŠse Ramdally
Thanks
vela
Possible solution is to encode every character that's beyond ASCII character table (i.e. character >= 128 or character < 32):
String source = #"Th‚rŠse Ramdally";
String result = String.Concat(source
.Select(c => (c < 128 && c > 31)
? c.ToString()
: String.Format("&#{0};", (int) c)));

Char array is returning incorrect values

I have a char array, chars[] with values {'#', '$', '1'} contained within it. I want to remove the 1 and place it into another variable, val, but when I do it gives me a 49 (idk why). I tried debugging it and the info shows that the elements of chars are as follows:
char[0] = 35 '#'
char[1] = 36 '$'
char[2] = 49 '1'
Which in turn makes
int val = char[2];
become
val = 49
I'm not sure why this is, but it's throwing my plans off. Does anyone know what the problem is and what I can do to fix it?
You should use
char val = char[2];
With int, you are getting the ASCII representation of the char as an integer.
see also http://hu.wikipedia.org/wiki/ASCII
49 is the ASCII representation for the char '1'
link to ASCII table
Just go for: charArray[x].ToString();
This will convert the ASCII representation to an actual Character.

Problems parsing through string in C#

I am trying to parse through the first three characters of a string.
public List<string> sortModes(List<string> allModesNonSorted)
{
foreach (string s in allModesNonSorted)
{
char firstNumber = s[0];
char secondNumber = s[1];
char thirdNumber = s[2];
char.IsDigit(firstNumber);
char.IsDigit(secondNumber);
char.IsDigit(thirdNumber);
combinedNumbers = Convert.ToInt16(firstNumber) + Convert.ToInt16(secondNumber) + Convert.ToInt16(thirdNumber);
}
return allModesNonSorted;
}
It recognizes each character correctly, but adds on an extra value 53 or 55. Below when I add the numbers, the 53 and 55 are included. Why is it doing this??
53 is the Unicode value of '5', and 55 is the Unicode value of '7'. It's showing you both the numeric and character versions of the data.
You'll notice with secondNumber you see the binary value 0 and the character value '\0' as well.
If you want to interpret a string as an integer, you can use
int myInteger = int.Parse(myString);
Specifically if you know you always have the format
input = "999 Hz Bla bla"
you can do something like:
int firstSeparator = input.IndexOf(' ');
string frequency = input.Substring(firstSeparator);
int numericFrequency = int.Parse(frequency);
That will work no matter how many digits are in the frequency as long as the digits are followed by a space character.
53 is the ASCII value for the character '5'
57 is the ASCII value for the character '7'
this is just Visual Studio showing you extra details about the actual values.
You can proceed with your code.
Because you're treating them as Characters.
the character '5' is sequentially the 53rd character in ASCII.
the simplest solution is to just subtract the character '0' from all of them, that will give you the numeric value of a single character.
53 and 55 are the ASCII values of the '5' and '7' characters (the way the characters are stored in memory).
If you need to convert them to Integers, take a look at this SO post.

Conversion of a Char variable to an integer

why is the integer equivalent of '8' is 56 in C sharp? I want to convert it to an integer 8 and not any other number.
You'll need to subtract the offset from '0'.
int zero = (int)'0'; // 48
int eight = (int)'8'; // 56
int value = eight - zero; // 8
56 is the (EDIT) Unicode value for the character 8 use:
Int32.Parse(myChar.ToString());
EDIT:
OR this:
char myChar = '8';
Convert.ToInt32(myChar);
The right way of converting unicode characters in C#/.Net is to use corresponding Char methods IsDigit and GetNumericValue (http://msdn.microsoft.com/en-us/library/e7k33ktz.aspx).
If you are absolutely sure that there will be no non-ASCII numbers in your input than ChaosPandion's suggestion is fine too.

Categories

Resources