C# Weird Backslash on Convert.ToChar()

I'm trying to convert an XML character entity to a C# char...
string charString = "&#x2081;".Replace("&#", "\\").Replace(";", "");
char c = Convert.ToChar(charString);
I have no idea why it is failing on the Convert.ToChar line. Even though the debugger shows charString as "\\x2081", it really is "\x2081", which is a valid Unicode character. The exception says there are too many characters.

The documentation for ToChar(string) is quite readable:
Converts the first character of a specified string to a Unicode character.
Also:
FormatException – The length of value is not 1.
It will not convert a hex representation of your character into said character. It will take a one-character string and give you that character back. The same as doing s[0].
What you want is:
string hex = "&#x2081;".Replace("&#x", "").Replace(";", "");
char c = (char)Convert.ToInt32(hex, 16);

Convert.ToChar(value) requires value to be a string of length 1, but charString is "\x2081", whose length is greater than 1.
It seems "&#x2081;" is a hex numeric character reference (the entity for ₁, U+2081). So you must do this instead:
string charString = "&#x2081;".Replace("&#x", "").Replace(";", "");
char c = (char)int.Parse(charString, NumberStyles.HexNumber);
Note: it's the HTML entity (hex) of SUBSCRIPT ONE (see the link above ^_^)
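Putting the two answers together, here is a minimal runnable sketch (assuming the input really is the literal entity "&#x2081;"; NumberStyles comes from System.Globalization):
using System;
using System.Globalization;

class Program
{
    static void Main()
    {
        // Strip the entity markers, leaving just the hex digits "2081"
        string hex = "&#x2081;".Replace("&#x", "").Replace(";", "");
        // Parse the hex code point and cast it to a char
        char c = (char)int.Parse(hex, NumberStyles.HexNumber);
        Console.WriteLine(c); // prints ₁ (SUBSCRIPT ONE, U+2081)
    }
}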

Related

Why/how does string.Substring treat "\u0002" as one character? Unicode chars in C#

Why/how does string.Substring treat "\u0002" as one character?
I mean, OK, "\u0002" is the "character" STX.
\u says that it is a Unicode escape.
Character and string processing in C# uses Unicode encoding. The char type represents a UTF-16 code unit, and the string type represents a sequence of UTF-16 code units.
The code checks whether the prefix and suffix are correct. The data length does not matter.
The prefix STX and the suffix ETX are added to the data string.
How can I do this (the code below) explicitly, without a doubt?
string stx = "\u0002";
string etx = "\u0003";
string ReceivedData= stx + "1122334455" + etx;
string prefix = ReceivedData.Substring(0, 1);
string suffix = ReceivedData.Substring(ReceivedData.Length - 1, 1);
Do you wonder how UTF-16 and Unicode work? This topic may help:
What is Unicode, UTF-8, UTF-16?
The code snippet looks reasonable, as the variables are explicitly named and '\u' marks a Unicode escape.
string stx = "\u0002";
string etx = "\u0003";
string prefix = ReceivedData.Substring(0, 1);
string suffix = ReceivedData.Substring(ReceivedData.Length - 1, 1);
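If you want the prefix/suffix check to be explicit beyond doubt, here is a minimal sketch (reusing the ReceivedData string from the question) that compares the first and last char against the STX/ETX code units directly:
using System;

class FrameCheck
{
    static void Main()
    {
        string stx = "\u0002";
        string etx = "\u0003";
        string ReceivedData = stx + "1122334455" + etx;

        // Each \uXXXX escape is a single UTF-16 code unit, so indexing chars is enough
        bool hasStx = ReceivedData.Length > 1 && ReceivedData[0] == '\u0002';
        bool hasEtx = ReceivedData.Length > 1 && ReceivedData[ReceivedData.Length - 1] == '\u0003';
        Console.WriteLine(hasStx && hasEtx); // True
    }
}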

convert a hex string to corresponding emoji string

I'm trying to create a string with the emoji "👱" starting from the string "D83DDC71". To do that, I'm trying to convert the string above into this string: "\uD83D\uDC71".
If I use this code, it works (the textbox shows 👱 as expected):
textbox.Text += "\uD83D\uDC71";
but if I use this, it doesn't work (the textbox shows the literal text "\uD83D\uDC71" instead of a single character):
textbox.Text += sender.Code.ToString("X").Insert(4, @"\u").Insert(0, @"\u");
What is the right way to convert hex representation of an emoji to a corresponding C# string (UTF-16)?
Okay. It seems you have a string which gives the hexadecimal of each of the UTF-16 code units of the character U+1F471 (👱).
Since char represents a UTF-16 code unit, split the string into two 4-character chunks, parse each chunk into an int as hexadecimal, cast each to char, and then combine them into a string:
var personWithBlondHair = ""
+ (char)int.Parse("D83DDC71".Substring(0, 4), NumberStyles.HexNumber)
+ (char)int.Parse("D83DDC71".Substring(4, 4), NumberStyles.HexNumber);
As per https://dotnetfiddle.net/oTgXfG
You have a string containing two shorts in hexadecimal form, so you need to parse them first. My example uses an overload of Convert.ToInt16 which also accepts an integer specifying the base of the integers in the string which, in our case, is 16 (hexadecimal).
string ParseUnicodeHex(string hex)
{
    var sb = new StringBuilder();
    for (int i = 0; i < hex.Length; i += 4)
    {
        // Take the next four hex digits (one UTF-16 code unit)
        string temp = hex.Substring(i, 4);
        // Parse as base 16; the cast reinterprets the bits as a UTF-16 code unit
        char character = (char)Convert.ToInt16(temp, 16);
        sb.Append(character);
    }
    return sb.ToString();
}
Please note that this method will fail if the string's length isn't divisible by 4.
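For example, calling the helper above (assuming it is defined as shown, with using System.Text for StringBuilder):
string emoji = ParseUnicodeHex("D83DDC71");
Console.WriteLine(emoji); // 👱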
The reason this works:
textbox.Text += "\uD83D\uDC71";
is because you've got a string literal containing Unicode character escape sequences. When you compile your program, the compiler replaces these escape sequences with the corresponding UTF-16 code units. This is why you cannot just add \u in front of the characters at run time to make it work.
Try this one
string str = "D83DDC71";
string emoji = string.Join("", (from Match m in Regex.Matches(str, @"\S{4}")
                                select (char)int.Parse(m.Value, NumberStyles.HexNumber)).ToArray());
This splits your string into chunks of 4 characters, converts each chunk into a char, and finally joins all the chars into one string as the emoji, all in one line.
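As a side note, if what you have is the actual code point (0x1F471) rather than the surrogate-pair hex, a shorter sketch is char.ConvertFromUtf32, which builds the surrogate pair for you (the codePoint variable here is hypothetical):
int codePoint = 0x1F471;
string emoji = char.ConvertFromUtf32(codePoint);
textbox.Text += emoji; // 👱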

How to concatenate string with a backslash with another string

How can I concatenate the string "\u" with "a string" to get "\u0000"?
My code creates two backslashes:
string a = @"\u" + "0000"; // ends up being "\\u0000"
The escape sequence \uXXXX is part of the language's syntax and represents a single Unicode character. By contrast, @"\u" and "0000" are two different strings, with a total of six characters. Concatenating them won't magically turn them into a single Unicode escape.
If you're trying to convert a Unicode code point into a single-character string, use char.ConvertFromUtf32. Note that it takes an int code point and already returns a string, so parse the text first (assuming strUnicodeOfMiddleChar holds the code point as hex digits):
char.ConvertFromUtf32(int.Parse(strUnicodeOfMiddleChar, NumberStyles.HexNumber))
BTW, don't use == true; it's redundant.
If I understand you correctly, I think you want to build a single-char string from an arbitrary Unicode value (4 hex digits). So given the string "0000", you want to convert that into the string "\u0000", i.e., a string containing a single character.
I think this is what you want:
string f = "0000"; // Or whatever
int n = int.Parse(f, NumberStyles.AllowHexSpecifier);
string s = ((char) n).ToString();
The resulting string s is "\u0000", which you can then use for your search.
(With corrections suggested by Thomas Levesque.)
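As a quick usage sketch of the search step (the haystack string here is made up for illustration):
string haystack = "abc\u0000def";  // some text containing U+0000
bool found = haystack.Contains(s); // true: s is the one-character string "\u0000"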
The line below creates two backslashes:
string a = @"\u" + "0000"; //a ends up being "\\u0000";
No, it doesn't; the debugger shows "\" as "\\", because that's how you write a backslash in C# (when you don't prefix the string with @). If you print that string, you will see \u0000, not \\u0000.
Nope, that string really has a single backslash in it. Print it out to the console and you'll see that.
Escape your characters correctly!!
Both:
// I am an escaped '\'.
string a = "\\u" + "0000";
And:
// I am a literal string.
string a = @"\u" + "0000";
Will work just fine. But, and I am going out on a limb here, I am guessing that you are trying to escape a Unicode Character and Hex value so, to do that, you need:
// I am an escaped Unicode Sequence with a Hex value.
char a = '\uxxxx'; // where xxxx is the four-digit hex value of the character
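To make the compile-time vs. run-time distinction concrete, a small sketch (variable names are mine; NumberStyles lives in System.Globalization):
char fromLiteral = '\u0041';                                        // escape processed by the compiler
char fromRuntime = (char)int.Parse("0041", NumberStyles.HexNumber); // built at run time from the hex digits
Console.WriteLine(fromLiteral == fromRuntime);                      // True: both are 'A'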

Adding whitespaces to a string in C#

I'm getting a string as a parameter.
Every string should take up 30 characters, and after checking its length I want to add whitespace to the end of the string.
E.g. if the passed string is 25 characters long, I want to add 5 more spaces.
The question is: how do I add whitespace to a string?
You can use String.PadRight for this.
Returns a new string that left-aligns the characters in this string by padding them with spaces on the right, for a specified total length.
For example:
string paddedParam = param.PadRight(30);
You can use the String.PadRight method for this:
Returns a new string of a specified length in which the end of the
current string is padded with spaces or with a specified Unicode
character.
string s = "cat".PadRight(10);
string s2 = "poodle".PadRight(10);
Console.Write(s);
Console.WriteLine("feline");
Console.Write(s2);
Console.WriteLine("canine");
The output will be:
cat       feline
poodle    canine
Here is a DEMO.
PadRight adds spaces to the right of strings. It makes text easier to
read or store in databases. Padding a string adds whitespace or other
characters to the beginning or end. PadRight supports any character
for padding, not just a space.
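Since the quote mentions that PadRight supports any padding character, a one-line illustration (output shown as a comment):
Console.WriteLine("cat".PadRight(10, '.') + "feline"); // cat.......feline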
Use String.PadRight which will space out a string so it is as long as the int provided.
var str = "hello world";
var padded = str.PadRight(30);
// padded == "hello world" followed by 19 spaces (total length 30)
You can use padding in C#, e.g.:
string s = "Example";
s = s.PadRight(30);
I hope it will resolve your problem.

Convert hexadecimal unicode character into its visual representation

I am trying to make a C# program that translates a Unicode character from its hexadecimal format to a single character, and I have a problem. This is my code:
This works:
char e = Convert.ToChar("\u0066");
However, this doesn't work:
Console.WriteLine("enter unicode format character (for example \\u0066)");
string s = Console.ReadLine();
Console.WriteLine("you entered (for example f)");
char c = Convert.ToChar(s);
Because (Convert.ToChar("\\u0066")) gives the error:
String must be exactly one character long
Anyone have an idea how to do this?
int.Parse doesn't like the "\u" prefix, but if you validate first to ensure that it's there, you can use
char c = (char)int.Parse(s.Substring(2), NumberStyles.HexNumber);
This strips the first two characters from the input string and parses the remaining text.
In order to ensure that the sequence is a valid one, try this:
Regex reg = new Regex(@"^\\u([0-9A-Fa-f]{4})$");
if (reg.IsMatch(s))
{
    char c = (char)int.Parse(s.Substring(2), NumberStyles.HexNumber);
}
else
{
    // Error
}
Convert.ToChar("\u0066");
This is a one-character string at run-time, because the compiler processed the backslash sequence.
The rest of your code is dealing with six character strings { '\\', 'u', '0', '0', '6', '6' }, which Convert.ToChar cannot handle.
Try char.Parse (or possibly Int16.Parse(s, NumberStyles.AllowHexSpecifier) followed by a cast to char).
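Putting it together, a minimal console sketch of the whole round trip (using the regex validation from the earlier answer; the error message text is mine):
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Console.WriteLine("enter unicode format character (for example \\u0066)");
        string s = Console.ReadLine();

        if (Regex.IsMatch(s, @"^\\u([0-9A-Fa-f]{4})$"))
        {
            // Skip the leading "\u" and parse the four hex digits
            char c = (char)int.Parse(s.Substring(2), NumberStyles.HexNumber);
            Console.WriteLine("you entered " + c); // e.g. f
        }
        else
        {
            Console.WriteLine("Not a valid \\uXXXX sequence.");
        }
    }
}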
