Six digit unicode escaped value comparison - c#

I have a six-digit Unicode character, for example U+100000, which I wish to compare with another char in my C# code.
My reading of the MSDN documentation is that this character cannot be represented by a char, and must instead be represented by a string.
a Unicode character in the range U+10000 to U+10FFFF is not permitted in a character literal and is represented using a Unicode surrogate pair in a string literal
I feel that I'm missing something obvious, but how can you get the following comparison to work correctly:
public bool IsCharLessThan(char myChar, string upperBound)
{
return myChar < upperBound; // will not compile as a char is not comparable to a string
}
Assert.IsTrue(IsCharLessThan('\u0066', "\u100000"));
Assert.IsFalse(IsCharLessThan("\u100000", "\u100000")); // again won't compile, as this is a string and not a char
Edit:
OK, I think I need two methods: one that accepts chars and another that accepts 'big chars', i.e. strings. So:
public bool IsCharLessThan(char myChar, string upperBound)
{
return true; // every char is less than a BigChar
}
public bool IsCharLessThan(string myBigChar, string upperBound)
{
return string.Compare(myBigChar, upperBound) < 0;
}
Assert.IsTrue(IsCharLessThan('\u0066', "\u100000"));
Assert.IsFalse(IsCharLessThan("\u100022", "\u100000"));

To construct a string with the Unicode code point U+10FFFF using a string literal, you need to work out the surrogate pair involved.
In this case, you need:
string bigCharacter = "\uDBFF\uDFFF";
Or you can use char.ConvertFromUtf32:
string bigCharacter = char.ConvertFromUtf32(0x10FFFF);
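To see where \uDBFF\uDFFF comes from, here is a minimal sketch of the standard UTF-16 surrogate arithmetic: subtract 0x10000, then split the remaining 20 bits into a high and a low 10-bit half. The helper name ToSurrogatePair is hypothetical; in real code char.ConvertFromUtf32 does this for you.

```csharp
using System;

// Hypothetical helper: splits a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogates.
static (char High, char Low) ToSurrogatePair(int codePoint)
{
    if (codePoint < 0x10000 || codePoint > 0x10FFFF)
        throw new ArgumentOutOfRangeException(nameof(codePoint));

    int offset = codePoint - 0x10000;             // a 20-bit value
    char high = (char)(0xD800 + (offset >> 10));  // top 10 bits
    char low = (char)(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
    return (high, low);
}

var (hi, lo) = ToSurrogatePair(0x10FFFF);
Console.WriteLine($"\\u{(int)hi:X4}\\u{(int)lo:X4}");                            // \uDBFF\uDFFF
Console.WriteLine(new string(new[] { hi, lo }) == char.ConvertFromUtf32(0x10FFFF)); // True
```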
It's not clear what you want your method to achieve, but if you need it to work with characters outside the Basic Multilingual Plane (BMP), you'll need to make it accept an int or a string instead of a char.
As per the documentation for string, if you want to iterate over characters in a string as full Unicode values, use TextElementEnumerator or StringInfo.
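As a minimal sketch of that, the loop below walks a string with StringInfo.GetTextElementEnumerator and prints the code point at the start of each element. Be aware that a text element can contain more than one code point (here, a letter followed by a combining acute accent), so this is not a pure code-point iterator; the sample string is arbitrary.

```csharp
using System;
using System.Globalization;

// A BMP letter, a supplementary character (U+1F300) and a letter + combining acute accent.
string text = "A\U0001F300e\u0301";

TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(text);
while (elements.MoveNext())
{
    string element = elements.GetTextElement();
    // Decodes a surrogate pair if the element starts with one.
    int firstCodePoint = char.ConvertToUtf32(element, 0);
    Console.WriteLine($"U+{firstCodePoint:X4} ({element.Length} UTF-16 code units)");
}
```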
Note that you do need to do this explicitly. If you just use an ordinal comparison, it compares UTF-16 code units, not UTF-32 code points. For example:
string text = "\uF000";
string upperBound = "\uDBFF\uDFFF";
Console.WriteLine(string.Compare(text, upperBound, StringComparison.Ordinal));
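This prints a value greater than zero, suggesting that text is greater than upperBound. That happens because the ordinal comparison looks at upperBound's first code unit, the high surrogate 0xDBFF, which is less than 0xF000. Instead, you should use char.ConvertToUtf32: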
string text = "\uF000";
string upperBound = "\uDBFF\uDFFF";
int textUtf32 = char.ConvertToUtf32(text, 0);
int upperBoundUtf32 = char.ConvertToUtf32(upperBound, 0);
Console.WriteLine(textUtf32 < upperBoundUtf32); // True
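So that's probably what you need to do in your method. You might want to check that each string really holds a single code point first. StringInfo.LengthInTextElements helps here, but bear in mind that a text element can consist of more than one code point (for example, a letter followed by a combining accent).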
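Putting that together, here is a minimal sketch of the question's method rewritten to take strings and compare code points rather than UTF-16 code units. The helper SingleCodePoint is hypothetical; it checks for exactly one code point (one non-surrogate char, or one valid surrogate pair) rather than one text element, and a char argument can be passed via ToString().

```csharp
using System;

// Hypothetical helper: returns the code point of a string that holds exactly one code point.
static int SingleCodePoint(string s)
{
    if (s is null)
        throw new ArgumentNullException(nameof(s));
    if (s.Length == 1 && !char.IsSurrogate(s[0]))
        return s[0];
    if (s.Length == 2 && char.IsSurrogatePair(s[0], s[1]))
        return char.ConvertToUtf32(s[0], s[1]);
    throw new ArgumentException("Expected exactly one code point.", nameof(s));
}

// Compares UTF-32 code points, so U+F000 < U+10FFFF as expected.
static bool IsCharLessThan(string myChar, string upperBound) =>
    SingleCodePoint(myChar) < SingleCodePoint(upperBound);

Console.WriteLine(IsCharLessThan('\u0066'.ToString(), "\U00100000")); // True
Console.WriteLine(IsCharLessThan("\U00100000", "\U00100000"));        // False
Console.WriteLine(IsCharLessThan("\uF000", "\uDBFF\uDFFF"));          // True
```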

From https://msdn.microsoft.com/library/aa664669.aspx, you have to use \U with full 8 hex digits. So for example:
string str1 = "\U0001F300";
string str2 = "\uD83C\uDF00";
bool eq = str1 == str2;
using the cyclone emoji (🌀, U+1F300) as the example; eq is true.
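Applied to the question's U+100000, that means writing \U00100000. It also explains why "\u100000" behaves oddly: a lowercase \u escape always takes exactly four hex digits, so that literal is really U+1000 followed by the two characters "00". A quick check:

```csharp
using System;

string wrong = "\u100000";   // parsed as "\u1000" followed by "00"
string right = "\U00100000"; // U+100000, stored as a surrogate pair

Console.WriteLine(wrong.Length);                             // 3
Console.WriteLine(right.Length);                             // 2
Console.WriteLine(right == "\uDBC0\uDC00");                  // True
Console.WriteLine(right == char.ConvertFromUtf32(0x100000)); // True
```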

Related

Convert single-character string to char

I need to convert a single-character string to its ASCII code, similar to Visual Basic's Asc("a").
I need to do it in C#, with something similar to ToCharArray():
("a").ToCharArray()
returns
{char[1]}
[0]: 97 'a'
I need to have 97 alone.
A string is an indexable sequence of chars, so you can get the first character using array-style indexing. A char converted to int (an implicit conversion) gives its UTF-16 code unit value, which for ASCII characters is the ASCII code.
Try:
int num = "a"[0]; // num will be 97
// Which is the same as using a char directly to get the int value:
int num2 = 'a'; // num2 will be 97
What seems to be causing the confusion is how the debugger displays a char: it shows both the int value and the character.
Here's an int and a char written to the console window, which uses their ToString() representation:
int num = "a"[0];
char chr = "a"[0];
Console.WriteLine($"num = {num}"); // num = 97
Console.WriteLine($"chr = {chr}"); // chr = a
If you want to convert a single-character string to a char, do this:
char.Parse("a");
If you want to get the character code, do this:
char.ConvertToUtf32("a", 0); // returns 97
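For ASCII input these approaches all give the same number. They differ once the string holds a character outside the BMP: indexing returns only the first UTF-16 code unit (a high surrogate), char.ConvertToUtf32 decodes the whole surrogate pair, and char.Parse throws because the string is two chars long. A small sketch, using U+1F300 as an arbitrary example:

```csharp
using System;

string ascii = "a";
int fromIndex = ascii[0];                        // 97
int fromConvert = char.ConvertToUtf32(ascii, 0); // 97
char parsed = char.Parse(ascii);                 // 'a'
Console.WriteLine($"{fromIndex} {fromConvert} {(int)parsed}"); // 97 97 97

string cyclone = "\U0001F300";                             // two UTF-16 code units
Console.WriteLine($"{(int)cyclone[0]:X4}");                // D83C (high surrogate only)
Console.WriteLine($"{char.ConvertToUtf32(cyclone, 0):X}"); // 1F300

try
{
    char.Parse(cyclone);
}
catch (FormatException)
{
    Console.WriteLine("char.Parse rejects a two-char string");
}
```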
char chrReadLetter;
chrReadLetter = (char)char.ConvertToUtf32(txtTextBox1.Text.Substring(0, 1), 0);
This reads the first character of the text box into a char variable.

How to convert unicode set from db to characters?

I need to convert Unicode characters that I read from a database field into a string value. In the database field the characters are stored in the format U+0024; I turn that into the \u0024 format, but then I cannot convert it.
string a = "U+0024";
string b = a.Remove(0, 2);
string c = @"\u" + b;
string d = System.Uri.UnescapeDataString(c);
Console.WriteLine(d);
// There is \u0024 in the output
string e = System.Uri.UnescapeDataString("\u0024");
Console.WriteLine(e);
// There is $ in the output, which is what I would like
The strings you got from your DB seem to be Unicode code points, as they are in the format U+XXXX.
There is a very useful method called char.ConvertFromUtf32 that converts a Unicode code point to a string containing a single char, or a surrogate pair of chars.
This method accepts an int as parameter, so you would need to convert your b string (which is in hexadecimal) into an int.
int codepoint = Convert.ToInt32(b, 16);
Then, pass it to ConvertFromUtf32:
string result = char.ConvertFromUtf32(codepoint);
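Combined into one place, a minimal sketch might look like this. The helper name FromUnicodeNotation is hypothetical, and it assumes each database value holds a single code point in the U+XXXX form; anything else is reported as an error rather than passed through.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: turns "U+0024" into "$" and "U+1F300" into its surrogate pair.
static string FromUnicodeNotation(string value)
{
    if (value is null || !value.StartsWith("U+", StringComparison.OrdinalIgnoreCase))
        throw new FormatException($"Expected U+XXXX, got '{value}'.");

    string hex = value.Substring(2);
    if (!int.TryParse(hex, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out int codePoint))
        throw new FormatException($"'{hex}' is not a hexadecimal number.");

    // Throws ArgumentOutOfRangeException for values that aren't valid code points
    // (surrogates U+D800..U+DFFF, negatives, or anything above U+10FFFF).
    return char.ConvertFromUtf32(codePoint);
}

Console.WriteLine(FromUnicodeNotation("U+0024"));         // $
Console.WriteLine(FromUnicodeNotation("U+1F300").Length); // 2
```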

String with index conversion or array of numbers

Why can't I convert this string to a number? Or, how can I make an array of numbers from this string?
string str = "110101010";
int c = Int32.Parse(str[0]);
str is a string, so str[0] returns a char, and the Parse method doesn't take a char as input but rather a string.
If you want to convert the string into an int, you would need to do:
int c = Int32.Parse(str); // or Int32.Parse(str[0].ToString()); for a single digit
Or, more likely, you're looking for a way to convert all the individual digits into an array, which can be done as follows (this needs using System.Linq;):
var result = str.Select(x => int.Parse(x.ToString()))
.ToArray();
I assume you are trying to convert a binary string into its decimal representation.
For this you could make use of System.Convert:
int c = Convert.ToInt32(str, 2);
If you want to sum up all the 1s and 0s in the string, you could use System.Linq's Select() and Sum():
int c = str.Select(i => int.Parse(i.ToString())).Sum();
Alternatively, if you just want an array of the 1s and 0s in the string, you could omit Sum() and instead enumerate into an array using ToArray():
int[] c = str.Select(i => int.Parse(i.ToString())).ToArray();
Disclaimer: the two snippets above that use int.Parse() would throw an exception if str contained a non-numeric character.
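If the input might contain non-digit characters, one alternative (a sketch, not the only option) is to check each char against '0'..'9' and subtract '0', which also avoids creating a string per digit. The helper name TryToDigits is hypothetical. The explicit range check is deliberate: char.IsDigit also accepts non-ASCII decimal digits, for which subtracting '0' would give the wrong value.

```csharp
using System;
using System.Linq;

// Hypothetical helper: "110101010" -> {1,1,0,1,0,1,0,1,0}; returns false on any non-ASCII-digit char.
static bool TryToDigits(string s, out int[] digits)
{
    digits = new int[s.Length];
    for (int i = 0; i < s.Length; i++)
    {
        char ch = s[i];
        if (ch < '0' || ch > '9')
        {
            digits = Array.Empty<int>();
            return false;
        }
        digits[i] = ch - '0'; // '0' -> 0, '1' -> 1, ...
    }
    return true;
}

string str = "110101010";
if (TryToDigits(str, out int[] result))
{
    Console.WriteLine(string.Join(",", result)); // 1,1,0,1,0,1,0,1,0
    Console.WriteLine(result.Sum());             // 5
}
Console.WriteLine(Convert.ToInt32(str, 2));      // 426
Console.WriteLine(TryToDigits("12a", out _));    // False
```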
Int32.Parse accepts a string argument, not the char that str[0] returns.
To get the first number, try:
string str = "110101010";
int c = Int32.Parse(str.Substring(0, 1));

Use a Decimal Value as a Hexadecimal Value

I have the int 15 (or the string "15", that's just as easy), and I need to use it to create the value:
"\u0015"
Is there some conversion which would accomplish this? I can't do this:
"\u00" + myInt.ToString()
Because the first literal is invalid. Is there a simple way to get this result?
(If you're curious, this is for integrating with a hardware device where the vendor sometimes expresses integer values as hexadecimal. For example, I'd need to send today's date to the device as "\u0015\u0010\u0002".)
Given that you want a Unicode code point of 21, not 15, you should definitely start with the string "15". If you try to start with 15 as an int, you'll find you can't express anything with a hex representation involving A-F...
So, given "15" the simplest way of parsing that as hex is probably:
string text = "15";
int codePoint = Convert.ToInt32(text, 16);
After that, you just need to cast to char:
string text = "15";
int codePoint = Convert.ToInt32(text, 16);
char character = (char) codePoint;
Note that this will only work for code points in the Basic Multilingual Plane (BMP) - i.e. U+0000 to U+FFFF. If you need to handle values beyond that (e.g. U+1F601) then you should use char.ConvertFromUtf32 instead:
string text = "15";
int codePoint = Convert.ToInt32(text, 16);
string character = char.ConvertFromUtf32(codePoint);
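For the date example in the question, the same parsing can be applied field by field. This is the same idea as packed binary-coded decimal (BCD), where each decimal digit occupies one hex digit. The sketch below assumes each field is a non-negative number of at most four decimal digits, so the result (at most 0x9999) fits in one char; the helper name is hypothetical, and the field order is whatever the device expects.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: reads each value's decimal digits as hex digits,
// e.g. 15 -> "15" -> 0x15 -> U+0015.
static string DecimalDigitsAsHexChars(params int[] values)
{
    var chars = new char[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        // Assumption: 0..9999, so the hex result fits in a single char.
        if (values[i] < 0 || values[i] > 9999)
            throw new ArgumentOutOfRangeException(nameof(values));

        string decimalText = values[i].ToString(CultureInfo.InvariantCulture);
        chars[i] = (char)Convert.ToInt32(decimalText, 16);
    }
    return new string(chars);
}

string encoded = DecimalDigitsAsHexChars(15, 10, 2);
Console.WriteLine(encoded == "\u0015\u0010\u0002"); // True
```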
Unicode escape sequences in string literals are resolved at compile time, which is why "\u00" + myInt.ToString() doesn't work: the ToString() call and the concatenation only happen at runtime, but the compiler has already rejected the incomplete "\u00" escape.
You could cast the int to char:
int unicodeCodePoint = 21; // 21 == 0x15, i.e. U+0015; (char)15 would give U+000F instead
char c = (char)unicodeCodePoint;

Casting HexNumber as character to string

I need to process a numeral as a string.
My value is 0x28 and this is the ascii code for '('.
I need to assign this to a string.
The following lines do this.
char c = (char)0x28;
string s = c.ToString();
string s2 = ((char)0x28).ToString();
My use case is a function that only accepts strings.
My call ends up looking cluttered:
someCall( ((char)0x28).ToString() );
Is there a way of simplifying this and making it more readable without writing '('?
The hex number in the code is always paired with a variable that has that hex value in its name, so "translating" it would destroy that visible connection.
Edit:
A List of tuples is initialised with this: the first item has the character in its name, and the second item is the result of a call with that character.
One of the answers below is exactly what I was looking for, so I have incorporated it here:
{ existingStaticVar0x28, someCall("\u0028") }
The reader can now instinctively see the connection between item1 and item2 and is less likely to run into a trap when this gets refactored.
You can use a Unicode character escape sequence in place of the hex value to avoid casting. Note that \u always takes exactly four hex digits:
string s2 = '\u0028'.ToString();
or
someCall("\u0028");
Well, supposing that your input is not fixed, you could write an extension method:
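using System;

namespace MyExtensions
{
    public static class MyStringExtensions
    {
        // Converts a hex string such as "0x28" or "28" to the matching character, e.g. "(".
        public static string ConvertFromHex(this string hexData)
        {
            // Convert.ToInt32 with base 16 accepts an optional "0x" prefix.
            int c = Convert.ToInt32(hexData, 16);
            return new string(new char[] { (char)c });
        }
    }
}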
Now you can call it in your code (after adding using MyExtensions;) with:
string hexNumber = "0x28"; // or whatever hexcode you need to convert
string result = hexNumber.ConvertFromHex();
A bit of error handling should be added to the above conversion.
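One possible shape for that error handling is sketched below. TryConvertFromHex is a hypothetical name; it strips an optional "0x" prefix (NumberStyles.AllowHexSpecifier does not accept one), rejects values that are not valid code points, and uses char.ConvertFromUtf32 so values above U+FFFF become a surrogate pair instead of being truncated by a char cast.

```csharp
using System;
using System.Globalization;

// Hypothetical error-handling variant: accepts "28" or "0x28" and returns false
// instead of throwing when the text is not a valid Unicode code point.
static bool TryConvertFromHex(string hexData, out string result)
{
    result = string.Empty;
    if (string.IsNullOrWhiteSpace(hexData))
        return false;

    string digits = hexData.StartsWith("0x", StringComparison.OrdinalIgnoreCase)
        ? hexData.Substring(2)
        : hexData;

    if (!int.TryParse(digits, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out int codePoint))
        return false;

    // Reject negatives, surrogate code points and anything above U+10FFFF.
    if (codePoint < 0 || codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        return false;

    result = char.ConvertFromUtf32(codePoint); // one char, or a surrogate pair above U+FFFF
    return true;
}

Console.WriteLine(TryConvertFromHex("0x28", out string paren) ? paren : "invalid"); // (
Console.WriteLine(TryConvertFromHex("zz", out _));                                 // False
```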
