convert a hex string to corresponding emoji string - c#

I'm trying to create a string containing the emoji "πŸ‘±" starting from the string "D83DDC71". To do that, I'm trying to convert the string above into the string "\uD83D\uDC71".
If I use this code, it works (the textbox shows πŸ‘± as expected):
textbox.Text += "\uD83D\uDC71";
but if I use this, it doesn't work (the textbox shows the literal text "\uD83D\uDC71" instead of a single character):
textbox.Text += sender.Code.ToString("X").Insert(4, @"\u").Insert(0, @"\u");
What is the right way to convert the hex representation of an emoji into the corresponding C# (UTF-16) string?

Okay. It seems you have a string which gives the hexadecimal of each of the UTF-16 code units of the character U+1F471 (πŸ‘±).
Since char represents a UTF-16 code unit, split the string into two 4-character chunks, parse each one into an int as hexadecimal, cast each to char, and then combine them into a string:
var personWithBlondHair = ""
+ (char)int.Parse("D83DDC71".Substring(0, 4), NumberStyles.HexNumber)
+ (char)int.Parse("D83DDC71".Substring(4, 4), NumberStyles.HexNumber);
As per https://dotnetfiddle.net/oTgXfG
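Equivalently, if you start from the single code point 1F471 rather than its two UTF-16 code units, char.ConvertFromUtf32 can build the string directly; a small sketch:
// Requires: using System.Globalization;
string blond = char.ConvertFromUtf32(int.Parse("1F471", NumberStyles.HexNumber));
// "πŸ‘±" – the same two code units, D83D DC71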

You have a string containing two shorts in hexadecimal form, so you need to parse them first. My example uses an overload of Convert.ToInt16 which also accepts an integer specifying the base of the integers in the string, which, in our case, is 16 (hexadecimal).
string ParseUnicodeHex(string hex)
{
    var sb = new StringBuilder();
    for (int i = 0; i < hex.Length; i += 4)
    {
        // Each UTF-16 code unit is four hex digits.
        string temp = hex.Substring(i, 4);
        // Base-16 Convert.ToInt16 treats the high bit as a sign bit, so values
        // >= 0x8000 come back negative, but the cast to char restores them.
        char character = (char)Convert.ToInt16(temp, 16);
        sb.Append(character);
    }
    return sb.ToString();
}
Please note that this method will fail if the string's length isn't divisible by 4.
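For example, feeding it the string from the question:
string emoji = ParseUnicodeHex("D83DDC71");
// emoji is now "πŸ‘±" (built from the code units D83D and DC71)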
The reason this works:
textbox.Text += "\uD83D\uDC71";
is because you've got a string literal containing Unicode character escape sequences. When you compile your program, the compiler replaces these escape sequences with the corresponding UTF-16 code units. This is why you cannot just prepend \u to the characters at run time to make it work.
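To see the difference concretely, a small sketch contrasting the compile-time escape with run-time concatenation:
string compiled = "\uD83D\uDC71"; // escapes processed by the compiler: 2 chars (one surrogate pair)
string runtime = "\\u" + "D83D";  // six ordinary characters: \ u D 8 3 D
Console.WriteLine(compiled.Length); // 2
Console.WriteLine(runtime.Length);  // 6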

Try this one:
// Requires: using System.Text.RegularExpressions; using System.Globalization;
string str = "D83DDC71";
string emoji = string.Join("", (from Match m in Regex.Matches(str, @"\S{4}")
                                select (char)int.Parse(m.Value, NumberStyles.HexNumber)).ToArray());
This splits your string into 4-character chunks, converts each chunk into a char, and finally joins all the chars into one emoji string, all in one line.

Related

convert non alphanumeric glyphs to unicode while preserving alphanumeric

I need to convert the non-alphanumeric glyphs in a string to their Unicode values, while preserving the alphanumeric characters. Is there a method to do this in C#?
As an example, I need to convert this string:
"hello world!"
To this:
"hello_x0020_world_x0021_"
To get a string that is safe for an XML node name, you should use XmlConvert.EncodeName.
Note that if you need to encode all non-alphanumeric characters, you'd have to write it yourself, as "_" is not encoded by that method.
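For illustration, a quick sketch of that call (XmlConvert lives in System.Xml):
using System.Xml;

string encoded = XmlConvert.EncodeName("hello world!");
// "hello_x0020_world_x0021_" – the space and '!' are escaped, letters pass through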
You could start with this code using LINQ Select extension method:
string str = "hello world!";
string a = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
a += a.ToLower();
char[] alphabet = a.ToCharArray();
str = string.Join("",
str.Select(ch => alphabet.Contains(ch) ?
ch.ToString() : String.Format("_x{0:x4}_", ch)).ToArray()
);
Now clearly it has some problems:
- it does a linear search in the list of characters
- it misses digits...
- if we add digits, we need to decide whether the first character may be a digit (assuming yes)
- the code creates a large number of strings that are immediately discarded (one per character)
- "alphanumeric" is limited to ASCII (assuming that's OK; if not, Char.IsLetterOrDigit can help)
- it does too much work for purely alphanumeric strings
The first two are easy: we can use a HashSet (O(1) Contains) initialized with the full list of characters (if any alphanumeric characters are acceptable, it is more readable to use the existing method Char.IsLetterOrDigit):
public static HashSet<char> asciiAlphaNum = new HashSet<char>
("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");
To avoid ch.ToString(), which pointlessly produces strings destined for immediate GC, we need to figure out how to construct a string from a mix of char and string. String.Join does not work because it wants strings to start with, and regular new string(...) has no option for a mix of char and string. So we are left with StringBuilder, which happily takes both in Append. Consider starting with an initial capacity of str.Length if most strings contain no other characters.
So for each character we just need either builder.Append(ch) or builder.AppendFormat("_x{0:x4}_", (int)ch). For the iteration it is easier to just use a regular foreach, but if one really wants LINQ, Enumerable.Aggregate is the way to go.
string ReplaceNonAlphaNum(string str)
{
    var builder = new StringBuilder();
    foreach (var ch in str)
    {
        if (asciiAlphaNum.Contains(ch))
            builder.Append(ch);
        else
            builder.AppendFormat("_x{0:x4}_", (int)ch);
    }
    return builder.ToString();
}
string ReplaceNonAlphaNumLinq(string str)
{
    return str.Aggregate(new StringBuilder(), (builder, ch) =>
        asciiAlphaNum.Contains(ch) ?
            builder.Append(ch) : builder.AppendFormat("_x{0:x4}_", (int)ch)
    ).ToString();
}
To the last point: we don't really need to do anything if there is nothing to convert, so a check like the one in "check alphanumeric characters in string in c#" would help to avoid creating extra strings.
Thus the final version (LINQ, as it is a bit shorter and fancier):
private static readonly Regex asciiAlphaNumRx = new Regex(@"^[a-zA-Z0-9]*$");
public static HashSet<char> asciiAlphaNum = new HashSet<char>
    ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789");
string ReplaceNonAlphaNumLinq(string str)
{
    return asciiAlphaNumRx.IsMatch(str) ? str :
        str.Aggregate(new StringBuilder(), (builder, ch) =>
            asciiAlphaNum.Contains(ch) ?
                builder.Append(ch) : builder.AppendFormat("_x{0:x4}_", (int)ch)
        ).ToString();
}
Alternatively, the whole thing could be done with Regex; see "Regex replace: Transform pattern with a custom function" for a starting point.
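A minimal sketch of that Regex route, using a MatchEvaluator to format each offending character:
using System.Text.RegularExpressions;

static string ReplaceNonAlphaNumRegex(string str) =>
    Regex.Replace(str, "[^a-zA-Z0-9]",
        m => string.Format("_x{0:x4}_", (int)m.Value[0]));

// ReplaceNonAlphaNumRegex("hello world!") -> "hello_x0020_world_x0021_"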

C# subscript string including variable

I know that in C#, to write subscript, I should use Unicode.
For example, if I want to write Hβ‚‚O, I should write:
String str = "H" + "\x2082" + "O";
But I want to put a variable of type int in the formula instead of the 2.
How can I create a string from a variable so that it is written in subscript?
In Unicode, the subscript digits are assigned consecutive codepoints, ranging from U+2080 to U+2089. Thus, if you take the Unicode character for subscript 0 – namely, 'β‚€' – then you can obtain the subscript character for any other digit by adding its numeric value to the former's codepoint.
If your integer will only consist of a single digit:
int num = 3;
char subscript = (char)('β‚€' + num); // '₃'
If your integer may consist of any number of digits, then you can apply the same addition to each of its digits individually. The easiest way of enumerating an integer's digits is by converting it to a string, and using the LINQ Select operator to get the individual characters. Then, you subtract '0' from each character to get the digit's numeric value, and add it to 'β‚€' as described above. The code below assumes that num is non-negative.
int num = 351;
var chars = num.ToString().Select(c => (char)('β‚€' + c - '0'));
string subscript = new string(chars.ToArray()); // "₃₅₁"
This Wikipedia article shows all the Unicode code points for superscript and subscript symbols.
You can simply create a method which maps them:
public string GetSubScriptNumber(int i)
{
    // get the code as needed
}
I will give a few hints to help you along (a sketch follows below):
Unfortunately you can't just do return "\x208" + i, so you'll need either a switch over the digits 0-9 or to add the integer to the character '\x2080'.
If you only need 0-9, then do some error checking that the input is in that range and throw an ArgumentOutOfRangeException otherwise.
If you need all ints, you may find it easier to split the number into its digits and get a char for each of those - watch out for negative numbers, but there is a character for the subscript minus sign too!
To include the number in your string, you can use something like String.Format:
String.Format("H{0}O", GetSubScriptNumber(i))
Try escaping it:
String str = "H" + "\\x2082" + "O";
or use verbatim strings:
String str = "H" + @"\x2082" + "O";
Maybe we don't understand your question. Several of the answers above seem correct. Does this work?
static string GetSubscript(int value)
{
    StringBuilder returnValue = new StringBuilder();
    foreach (char digit in value.ToString())
        returnValue.Append((char)(digit - '0' + 'β‚€'));
    return returnValue.ToString();
}
string waterFormula = "H" + GetSubscript(2) + "O";  // Hβ‚‚O
string methaneFormula = "CH" + GetSubscript(4);     // CHβ‚„

How to concatenate string with a backslash with another string

How can I concatenate the string "\u" with "a string" to get "\u0000"?
My code creates two backslashes:
string a = @"\u" + "0000"; // ends up being "\\u0000"
The escape sequence \uXXXX is part of the language's syntax and represents a single Unicode character. By contrast, @"\u" and "0000" are two different strings, with a total of six characters. Concatenating them won't magically turn them into a single Unicode escape.
If you're trying to convert a Unicode code point into a single-character string, use char.ConvertFromUtf32; note that it takes an int code point and already returns a string:
char.ConvertFromUtf32(codePoint)
BTW, don't use == true; it's redundant.
If I understand you correctly, I think you want to build a single-char string from an arbitrary Unicode value (4 hex digits). So given the string "0000", you want to convert that into the string "\u0000", i.e., a string containing a single character.
I think this is what you want:
string f = "0000"; // Or whatever
int n = int.Parse(f, NumberStyles.AllowHexSpecifier);
string s = ((char) n).ToString();
The resulting string s is "\u0000", which you can then use for your search.
(With corrections suggested by Thomas Levesque.)
the line below creates two backslashes:
string a = @"\u" + "0000"; // a ends up being "\\u0000"
No, it doesn't; the debugger shows "\" as "\\", because that's how you write a backslash in C# (when you don't prefix the string with @). If you print that string, you will see \u0000, not \\u0000.
Nope, that string really has a single backslash in it. Print it out to the console and you'll see that.
Escape your characters correctly!!
Both:
// I am an escaped '\'.
string a = "\\u" + "0000";
And:
// I am a literal string.
string a = #"\u" + "0000";
Will work just fine. But, and I am going out on a limb here, I am guessing that you are trying to escape a Unicode character with a hex value, so, to do that, you need:
// I am an escaped Unicode sequence with a hex value.
char a = '\uxxxx'; // where xxxx stands for the four hex digits

string and 4-byte Unicode characters

I have one question about strings and chars in C#. I found that a string in C# is a Unicode string, and a char takes 2 bytes. So every char is in UTF-16 encoding. That's great, but I also read on Wikipedia that there are some characters that in UTF-16 take 4 bytes.
I'm doing a program that lets you draw characters for alphanumerical displays. In program there is also a tester, where you can write some string, and it draws it for you to see how it looks.
So how should I work with strings where the user writes a character which takes 4 bytes, i.e. 2 chars? I need to go char by char through the string, find each character in the list, and draw it on the panel.
You could do:
for (int i = 0; i < str.Length; ++i)
{
    int codePoint = Char.ConvertToUtf32(str, i);
    if (codePoint > 0xffff)
    {
        i++; // a non-BMP code point occupies two chars (a surrogate pair), so skip the low surrogate
    }
}
Then codePoint represents any possible code point as a 32-bit integer.
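If you also need the character back as a string (say, to look it up in your glyph list), Char.ConvertFromUtf32 goes the other way; a sketch with a hypothetical DrawGlyph helper:
for (int i = 0; i < str.Length; ++i)
{
    int codePoint = Char.ConvertToUtf32(str, i);
    string glyph = Char.ConvertFromUtf32(codePoint); // one or two chars
    DrawGlyph(glyph); // hypothetical: look up this glyph and draw it
    if (codePoint > 0xffff)
        i++; // skip the low surrogate
}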
Work entirely with String objects; don't use Char at all. Example using IndexOf:
var needle = "ℬ"; // U+1D49D (I think)
var hayStack = "a code point outside basic multi lingual plane: ℬ";
var index = heyStack.IndexOf(needle);
Most methods on the String class have overloads which accept Char or String, and most Char methods have String-based equivalents as well. Just don't use Char.
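If you need to walk a string one user-visible character at a time while staying in string land, System.Globalization.StringInfo can do the splitting; a minimal sketch:
using System.Globalization;

var it = StringInfo.GetTextElementEnumerator("aπŸ‘±b");
while (it.MoveNext())
{
    string element = it.GetTextElement(); // each element is a full code point (or grapheme)
    Console.WriteLine(element);
}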

C# Weird Backslash on Convert.ToChar()

I'm trying to convert an XML character entity to a C# char...
string charString = "&#x2081;".Replace("&#", "\\").Replace(";", "");
char c = Convert.ToChar(charString);
I have no idea why it is failing on the Convert.ToChar line. Even though the debugger shows charString as "\\x2081", it really is "\x2081", which is a valid Unicode character. The exception says there are too many characters.
The documentation for ToChar(string) is quite readable:
Converts the first character of a specified string to a Unicode character.
Also:
FormatException – The length of value is not 1.
It will not convert a hex representation of your character into said character. It will take a one-character string and give you that character back. The same as doing s[0].
What you want is:
string hex = "₁".Replace("&#x", "").Replace(";", "");
char c = (char)Convert.ToInt32(hex, 16);
Convert.ToChar(value) requires value to be a string of length 1, but charString is "\x2081", whose length is greater than 1.
It seems "&#x2081;" is an HTML entity in hex form (the character ₁). So you must do this:
string charString = "&#x2081;".Replace("&#x", "").Replace(";", "");
char c = (char)Convert.ToInt32(charString, 16);
Note: it's the HTML entity (hex) for SUBSCRIPT ONE (see the link above ^_^)
