Convert single-character string to char - C#

I need to convert a single-character string to its ASCII code, similar to Visual Basic's Asc("a").
I need to do it in C#, with something similar to ToCharArray():
("a").ToCharArray()
returns
{char[1]}
[0]: 97 'a'
I need to have 97 alone.

A string can be indexed like an array of char, so you can get the first character using array indexing syntax; and a char, when used as an int (an implicit conversion), gives you its ASCII value.
Try:
int num = "a"[0]; // num will be 97
// Which is the same as using a char directly to get the int value:
int num = 'a'; // num will be 97
What seems to be causing the confusion is how the char type is represented in the debugger: both the character and its int value are shown.
Here's an example of an int and a char in the debugger as well as in the console window (which is their ToString() representation):
int num = "a"[0];
char chr = "a"[0];
Console.WriteLine($"num = {num}");
Console.WriteLine($"chr = {chr}");

If you want to convert a single-character string to a char, do this:
char.Parse("a");
If you want to get the character code, do this:
char.ConvertToUtf32("a", 0); // returns 97

char chrReadLetter;
chrReadLetter = (char)char.ConvertToUtf32(txtTextBox1.Text.Substring(0, 1), 0);
Reads the first letter of the textbox into a character variable.
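For characters in the Basic Multilingual Plane (anything that fits in a single char), indexing the string directly is an equivalent way to do this:
char chrReadLetter = txtTextBox1.Text[0];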

Related

Split Chinese character problem using ToCharArray()

I am writing a C# program to split Chinese character input like this:
textbox input ="大大大"
expected output =
大
大
大
And the code is like this
string aa="大大大";
foreach (char c in aa.ToCharArray()){
Console.WriteLine(c);
}
It works fine for most characters.
However, for some characters such as "𧜏", I get a result like this:
textbox input = 𧜏大
output =
口
口
大
It looks like the program fails to handle this character.
Is there any way to solve this?
TL;DR:
Don't use ToCharArray() when working with non-"ASCII" (or at least, non-Latin-style) text.
Instead, use TextElementEnumerator.
Here's why.
Explanation
In Unicode, the 𧜏 character has a code-point of U+2770F which is outside the range supported by a single 16-bit UTF-16 value (i.e. 2 bytes, a single .NET Char value), so UTF-16 uses a pair of separate 16-bit values known as a surrogate pair to represent it:
using Shouldly;
using System.Globalization;
String input = "𧜏";
Char[] chars = input.ToCharArray();
chars.Length.ShouldBe( 2 ); // 2*Char == 2*16-bits == 32 bits
Char.GetUnicodeCategory( chars[0] ).ShouldBe( UnicodeCategory.Surrogate );
Char.GetUnicodeCategory( chars[1] ).ShouldBe( UnicodeCategory.Surrogate );
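(For 𧜏, those two values are the high surrogate 0xD85D and the low surrogate 0xDF0F.)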
Therefore, to meaningfully "split" a string like this, your program needs to be aware of surrogate pairs and not split a pair up.
The code below is a simple program that extracts each Unicode code-point from a string and adds it to a list.
String input = "大𧜏大";
// Don't forget to Normalize!
input = input.Normalize();
List<UInt32> codepoints = new List<UInt32>( capacity: 3 );
for( Int32 i = 0; i < input.Length; i++ )
{
Char c = input[i];
if( Char.GetUnicodeCategory( c ) == UnicodeCategory.Surrogate )
{
Char former = c;
Char latter = input[i+1];
// The former (the high surrogate) carries the upper 10 bits of (code-point - 0x10000).
// The latter (the low surrogate) carries the lower 10 bits.
UInt32 hi = former;
UInt32 lo = latter;
UInt32 codepoint = ((hi - 0xD800) * 0x400) + (lo - 0xDC00) + 0x10000;
codepoints.Add( codepoint );
i += 1; // Skip the next char
}
else
{
codepoints.Add( c );
}
}
codepoints.Dump();
// [0] = 22823 == '大'
// [1] = 161551 == '𧜏'
// [2] = 22823 == '大'
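For comparison, the same extraction can be written with the framework's built-in surrogate helpers; a minimal sketch, assuming the same input string:
String input = "大𧜏大".Normalize();
List<Int32> codepoints = new List<Int32>();
for( Int32 i = 0; i < input.Length; i++ )
{
    // ConvertToUtf32 reads either a single Char or a surrogate pair starting at index i.
    codepoints.Add( Char.ConvertToUtf32( input, i ) );
    if( Char.IsHighSurrogate( input[i] ) ) i++; // skip the low surrogate we just consumed
}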
Note that when it comes to non-Latin-style alphabets, the concept of splitting a string up into discrete characters, glyphs, or graphemes is... complicated. But in general, you don't want to split a string up into discrete Char values (Q.E.D.), and you shouldn't split it up into code-points either; instead you'll want to split it into grapheme clusters (a visual grouping of related glyphs, each represented by their own code-points, which in turn may be a single .NET 16-bit Char value or a surrogate pair of Char values).
Fortunately .NET has this functionality built-in into System.Globalization.TextElementEnumerator.
using System.Globalization;
String input = "大𧜏大".Normalize();
TextElementEnumerator iter = StringInfo.GetTextElementEnumerator( input );
while( iter.MoveNext() )
{
String graphemeCluster = iter.GetTextElement();
Console.WriteLine( graphemeCluster );
}
Gives me the expected output:
大
𧜏
大
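On .NET Core 3.0 and later there is also the Rune type, which enumerates whole code points (though not grapheme clusters); a minimal sketch, assuming that runtime is available:
foreach( System.Text.Rune rune in "大𧜏大".EnumerateRunes() )
{
    Console.WriteLine( rune.ToString() );
}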

String with index conversion or array of numbers

Why can't I convert this string to a number? Or how can I make an array of numbers from this string?
string str = "110101010";
int c = Int32.Parse(str[0]);
str is a string, so str[0] returns a char, and the Parse method doesn't take a char as input but rather a string.
If you want to convert the string into an int, then you would need to do:
int c = Int32.Parse(str); // or Int32.Parse(str[0].ToString()); for a single digit
Or, if you're looking for a way to convert all the individual digits into an array, that can be done as:
var result = str.Select(x => int.Parse(x.ToString()))
.ToArray();
I assume you are trying to convert a binary string into its decimal representation.
For this you could make use of System.Convert:
int c = Convert.ToInt32(str, 2);
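(For the sample string "110101010" above, c would be 426.)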
For the case that you want to sum up all the 1s and 0s from the string you could make use of System.Linq's Select() and Sum():
int c = str.Select(i => int.Parse(i.ToString())).Sum();
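(For the sample string, this sums to 5.)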
Alternatively if you just want to have an array of 1s and 0s from the string you could omit the Sum() and instead enumerate to an array using ToArray():
int[] c = str.Select(i => int.Parse(i.ToString())).ToArray();
Disclaimer: The two snippets above using int.Parse() would throw an exception if str were to contain a non-numeric character.
Int32.Parse accepts a string argument, not the char which str[0] returns.
To get the first number, try:
string str = "110101010";
int c = Int32.Parse(str.Substring(0, 1));
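(With the sample string, c would be 1.)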

Converting a string getting Wrong value

I wrote some simple code and need to get each number in a string as an integer.
But when I try to use Convert.ToInt32() it gives me another value.
Example:
string x="4567";
Console.WriteLine(x[0]);
The result will be 4, but if I try to use Convert:
Console.WriteLine(Convert.ToInt32(x[0]));
it gives me 52!
I tried using int.TryParse() and it's the same.
According to the docs:
The ToInt32(Char) method returns a 32-bit signed integer that represents the UTF-16 encoded code unit of the value argument. If value is not a low surrogate or a high surrogate, this return value also represents the Unicode code point of value.
In order to get 4, you'd have to convert it to a string before converting to int32:
Console.WriteLine(Convert.ToInt32(x[0].ToString()));
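Another option, assuming the character really is a decimal digit, is char.GetNumericValue, which returns the digit's value as a double:
Console.WriteLine((int)char.GetNumericValue(x[0])); // prints 4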
You can alternatively loop through all the characters in the string this way:
foreach (char c in x)
{
Console.WriteLine(c);
}
This will print 4, 5, 6, 7, each on its own line.
And, as suggested in the earlier answer, convert the char to a string before converting to an integer so that you get the digit rather than its character code.
A char is internally a short integer holding its ASCII (character) code.
If you cast/convert (explicitly or implicitly) a char to int, you will get its ASCII value. Example: Convert.ToInt32('4') == 52.
But when you print it to the console, its ToString() method is used implicitly, so you are actually printing the character '4'. Example: Console.WriteLine(x[0]) is equivalent to Console.WriteLine("4").
Try using an ASCII letter so you will notice the difference clearly:
Console.WriteLine((char)('a')); // a
Console.WriteLine((int)('a')); // 97
Now play with char math:
Console.WriteLine((char)('a'+5)); // f
Console.WriteLine((int)('a'+5)); // 102
Bottom line, just use int digit = int.Parse(x[0].ToString());
Or the funny (but less safe) way: int digit = x[0] - '0';
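As a sketch of that subtraction trick applied to the whole string x from the question (assuming every character really is a digit, and using System.Linq):
int[] digits = x.Select(c => c - '0').ToArray(); // { 4, 5, 6, 7 }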
x isn't x[0]; x[0] is the first char, '4', which is 52.

Use a Decimal Value as a Hexadecimal Value

I have the int 15 (or the string "15", that's just as easy), and I need to use it to create the value:
"\u0015"
Is there some conversion which would accomplish this? I can't do this:
"\u00" + myInt.ToString()
Because the first literal is invalid. Is there a simple way to get this result?
(If you're curious, this is for integrating with a hardware device where the vendor sometimes expresses integer values as hexadecimal. For example, I'd need to send today's date to the device as "\u0015\u0010\u0002".)
Given that you want a Unicode code point of 21, not 15, you should definitely start with the string "15". If you try to start with 15 as an int, you'll find you can't express anything with a hex representation involving A-F...
So, given "15" the simplest way of parsing that as hex is probably:
string text = "15";
int codePoint = Convert.ToInt32(text, 16);
After that, you just need to cast to char:
string text = "15";
int codePoint = Convert.ToInt32(text, 16);
char character = (char) codePoint;
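(Here codePoint is 21 and character is the control character U+0015.)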
Note that this will only work for code points in the Basic Multilingual Plane (BMP) - i.e. U+0000 to U+FFFF. If you need to handle values beyond that (e.g. U+1F601) then you should use char.ConvertFromUtf32 instead:
string text = "15";
int codePoint = Convert.ToInt32(text, 16);
string character = char.ConvertFromUtf32(codePoint);
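Back to the question's date example, a minimal sketch (assuming the three components are 15, 10 and 2, and using System.Linq):
int[] parts = { 15, 10, 2 };
string payload = string.Concat(parts.Select(p => (char) Convert.ToInt32(p.ToString(), 16)));
// payload == "\u0015\u0010\u0002"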
Unicode escapes in string literals are resolved at compile time; that's why "\u00" + myInt.ToString() doesn't work (the ToString() call and concatenation are evaluated at runtime).
You could cast the int to char:
int unicodeCodePoint = 15; // or rather 21, since "\u0015" is code point 0x15 == 21
char c = (char)unicodeCodePoint;

Six digit unicode escaped value comparison

I have a six-digit Unicode character, for example U+100000, which I wish to compare with another char in my C# code.
My reading of the MSDN documentation is that this character cannot be represented by a char, and must instead be represented by a string.
a Unicode character in the range U+10000 to U+10FFFF is not permitted in a character literal and is represented using a Unicode surrogate pair in a string literal
I feel that I'm missing something obvious, but how can you get the following comparison to work correctly:
public bool IsCharLessThan(char myChar, string upperBound)
{
return myChar < upperBound; // will not compile as a char is not comparable to a string
}
Assert.IsTrue(AnExample('\u0066', "\u100000"));
Assert.IsFalse(AnExample("\u100000", "\u100000")); // again won't compile as this is a string and not a char
Edit
OK, I think I need two methods, one to accept chars and another to accept 'big chars', i.e. strings. So:
public bool IsCharLessThan(char myChar, string upperBound)
{
return true; // every char is less than a BigChar
}
public bool IsCharLessThan(string myBigChar, string upperBound)
{
return string.Compare(myBigChar, upperBound) < 0;
}
Assert.IsTrue(AnExample('\u0066', "\u100000"));
Assert.IsFalse(AnExample("\u100022", "\u100000"));
To construct a string with the Unicode code point U+10FFFF using a string literal, you need to work out the surrogate pair involved.
In this case, you need:
string bigCharacter = "\uDBFF\uDFFF";
Or you can use char.ConvertFromUtf32:
string bigCharacter = char.ConvertFromUtf32(0x10FFFF);
It's not clear what you want your method to achieve, but if you need it to work with characters not in the BMP, you'll need to make it accept int instead of char, or a string.
As per the documentation for string, if you want to iterate over characters in a string as full Unicode values, use TextElementEnumerator or StringInfo.
Note that you do need to do this explicitly. If you just use ordinal values, it will check UTF-16 code units, not the UTF-32 code points. For example:
string text = "\uF000";
string upperBound = "\uDBFF\uDFFF";
Console.WriteLine(string.Compare(text, upperBound, StringComparison.Ordinal));
This prints out a value greater than zero, suggesting that text is greater than upperBound here. Instead, you should use char.ConvertToUtf32:
string text = "\uF000";
string upperBound = "\uDBFF\uDFFF";
int textUtf32 = char.ConvertToUtf32(text, 0);
int upperBoundUtf32 = char.ConvertToUtf32(upperBound, 0);
Console.WriteLine(textUtf32 < upperBoundUtf32); // True
So that's probably what you need to do in your method. You might want to use StringInfo.LengthInTextElements to check that the strings really are single UTF-32 code points first.
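Putting that together, one possible shape for the comparison method (a sketch, assuming each argument is a single code point, i.e. either one char or one surrogate pair):
public static bool IsLessThan(string value, string upperBound)
{
    int valueCodePoint = char.ConvertToUtf32(value, 0);
    int upperBoundCodePoint = char.ConvertToUtf32(upperBound, 0);
    return valueCodePoint < upperBoundCodePoint;
}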
From https://msdn.microsoft.com/library/aa664669.aspx, you have to use \U with full 8 hex digits. So for example:
string str1 = "\U0001F300";
string str2 = "\uD83C\uDF00";
bool eq = str1 == str2;
using the 🌀 (cyclone, U+1F300) emoji.
