How to convert unicode set from db to characters?

How to convert unicode set from db to characters? - c#

I need to convert unicode characters that I take from the database field to a string value. In the database field unicode characters are in format U+0024 and next I get \u0024 format but I cannot convert it.
string a = "U+0024";
string b = a.Remove(0, 2);
string c = #"\u" + b;
string d = System.Uri.UnescapeDataString(c);
Console.WriteLine(d);
// There is \u0024 in output
string e =System.Uri.UnescapeDataString(\u0024);
Console.WriteLine(e);
//There is $ in output that I would like to

The strings you got from your DB seems to be Unicode codepoints, as they are in the format U+XXXX.
There is a very useful method called char.ConvertFromUtf32 that converts a Unicode code point to a string containing a single char, or a surrogate pair of chars.
This method accepts an int as parameter, so you would need to convert your b string (which is in hexadecimal) into an int.
int codepoint = Convert.ToInt32(b, 16);
Then, pass it to ConvertFromUtf32:
string result = char.ConvertFromUtf32(codepoint);

Related

Convert string with special characters to hex - C#

Hi I'm trying to transform a string containing special characters like û and ….
In my research and tests I almost succeeded using the following function:
public static string ToHex(this string input)
{
char[] values = input.ToCharArray();
string hex = "0x";
string add = "";
foreach (char c in values)
{
int value = Convert.ToInt32(c);
add = String.Format("{0:X}", value).Length == 1 ?
"0" + String.Format("{0:X}", value) + "00"
: String.Format("{0:X}", value) + "00";
hex += add;
}
return hex;
}
If I try to decode ´o¸sçPQ^ûË\u000f±d it does it correctly and turns it into this 0xB4006F00B8007300E700500051005E00FB00CB000F00B1006400,
instead when I try to decode ´o¸sçPQ](ÂF\u0012…a it fails and turns it into 0xB4006F00B8007300E700500051005D002800C200460012002026006100 instead of this
0xB4006F00B8007300E700500051005D002800C2004600120026206100.
Making a minimum of debug I saw that the string is transformed from
´o¸sçPQ](ÂF\u0012…a to ´o¸sçPQ](ÂF.a, I wouldn't want that to be the problem but I'm not sure.
EDIT
0xB4006F00B8007300E700500051005D002800C2004600120026206100 ´o¸sçPQ](ÂF…a CORRECT
0xB4006F00B8007300E700500051005D002800C200460012002026006100 ´o¸sçPQ](ÂF.a MY OUTPUT
0xB4006F00B8007300E700500051005D003D00CB0042000C00A50061006000AD004500BB00 ´o¸sçPQ]=ËB¥a`E» CORRECT
0xB4006F00B8007300E700500051005D003D00CB0042000C00A50061006000AD004500BB00 ´o¸sçPQ]=ËB¥a`E» MY OUTPUT
0xB4006F00B8007300E700500051005D002F00D30042001900B7006E006100 ´o¸sçPQ]/ÓB·na CORRECT
0xB4006F00B8007300E700500051005D002F00D30042001900B7006E006100 ´o¸sçPQ]/ÓB·na MY OUTPUT
0xB4006F00B8007300E700500051005F001A20BC006B0021003500DD00 ´o¸sçPQ_‚¼k!5Ý CORRECT
0xB4006F00B8007300E700500051005F00201A00BC006B0021003500DD00 ´o¸sçPQ_'¼k!5Ý MY OUTPUT
0xB4006F00B8007300E700500051005D002F00EE006B00290014204E004100 ´o¸sçPQ]/îk)—NA CORRECT
0xB4006F00B8007300E700500051005D002F00EE006B0029002014004E004100 ´o¸sçPQ]/îk)-NA MY OUTPUT
0xB4006F00B8007300E700500051005D003800E600690036001C204C004F00 ´o¸sçPQ]8æi6“LO CORRECT
0xB4006F00B8007300E700500051005D003800E60069003600201C004C004F00 ´o¸sçPQ]8æi6"LO MY OUTPUT
0xB4006F00B8007300E700500051005D002F00F3006200390014204E004700C602 ´o¸sçPQ]/ób9—NGˆ CORRECT
0xB4006F00B8007300E700500051005D002F00F300620039002014004E0047002C600 ´o¸sçPQ]/ób9-NG^ MY OUTPUT
0xB4006F00B8007300E700500051005D003B00EE007200330078014100 ´o¸sçPQ];îr3ŸA CORRECT
0xB4006F00B8007300E700500051005D003B00EE0072003300178004100 ´o¸sçPQ];îr3YA MY OUTPUT
0xB4006F00B8007300E700500051005D003000F20064003E009D004B00 ´o¸sçPQ]0òd>K CORRECT
0xB4006F00B8007300E700500051005D003000F20064003E009D004B00 ´o¸sçPQ]0òd>?K MY OUTPUT
0xB4006F00B8007300E700500051005D002F00E60075003E00 ´o¸sçPQ]/æu> CORRECT
0xB4006F00B8007300E700500051005D002F00E60075003E00 ´o¸sçPQ]/æu> MY OUTPUT
0xB4006F00B8007300E700500051005D002F00EE006A003000DC024500 ´o¸sçPQ]/îj0˜E CORRECT
0xB4006F00B8007300E700500051005D002F00EE006A0030002DC004500 ´o¸sçPQ]/îj0~E MY OUTPUT
I thank you in advance for every reply or comment,
greetings.

This is due to endianness, and different integer and string encodings.
char cc = '…';
Console.WriteLine(cc);
// 2026 <-- note, hex value differs from byte representation shown below
Console.WriteLine(((int)cc).ToString("x"));
// 26200000
Console.WriteLine(BytesToHex(BitConverter.GetBytes((int)cc)));
// 2620
Console.WriteLine(BytesToHex(Encoding.GetEncoding("utf-16").GetBytes(new[] { cc })));
You should not treat chars as integers. There are plenty of different ways to encode strings, .net internally uses UTF-16. And all encodings works with bytes, not with integers. Explicit conversion chars to integer can lead to unexpected results, like yours. Why don't you get encoding you need and work with bytes via Encoding.GetBytes?
void Main()
{
// output you expect 0xB4006F00B8007300E700500051005D002800C2004600120026206100
Console.WriteLine(BytesToHex(Encoding.GetEncoding("utf-16").GetBytes("´o¸sçPQ](ÂF\u0012…a")));
}
public static string BytesToHex(byte[] bytes)
{
// whatever way to convert bytes to hex
return "0x" + BitConverter.ToString(bytes).Replace("-", "");
}

c# padding to string

Am reading from an excel sheet column and need to save into an sql table.
this is what the fiedl looks like in excel;'33349836', but I need it to be saved this way in the database '0033349836', because that field needs to be 10characters.

You can use String.PadLeft Method (Int32, Char) overload.
Returns a new string that right-aligns the characters in this instance
by padding them on the left with a specified Unicode character, for a
specified total length.
string s = "33349836";
string newstring = s.PadLeft(10, '0');
Remember, 0033349836 will be a string representation of your numeric values. Don't keep this kind of data in a numeric column type. Keep it in a some character type of column like nvarchar.

There are a couple of ways to do this using C#.
One such way is this:
static void Main(string[] args)
{
string s = "33349836";
int width = 10;
char padding = '0';
string s1 = s.PadLeft(width, padding);
Console.WriteLine(s);
Console.WriteLine(s1);
}
That code will output these values:
33349836
0033349836

You can use PadLeft() function to add zeros to your string.
Try This:
var str = "33349836";
str = str.PadLeft(10,'0');

C# Convert Int64 to String using a cast

How do you convert an integer to a string? It works the other way around but not this way.
string message
Int64 message2;
message2 = (Int64)message[0];
If the message is "hello", the output is 104 as a number;
If I do
string message3 = (string)message2;
I get an error saying that you cant convert a long to a string. Why is this. The method .ToString() does not work because it converts just the number to a string so it will still show as "104". Same with Convert.ToString(). How do I make it say "hello" again from 104? In C++ it allows you to cast such methods but not in C#

message[0] gives first letter from string as char, so you're casting char to long, not string to long.
Try casting it back to char again and then concatenate all chars to get entire string.

ToString() is working exactly correctly. Your error is in the conversion to integer.
How exactly do you expect to store a string composed of non-numeric digits in a long? You might be interested in BitConverter, if you want to treat numbers as byte arrays.
If you want to convert a numeric ASCII code to a string, try
((char)value).ToString()

Another alternative approach is using ASCII.GetBytes method as below
string msg1 ="hello";
byte[] ba = System.Text.Encoding.ASCII.GetBytes(msg1);
//ba[0] = 104
//ba[1] = 101
//ba[2] = 108
//ba[3] = 108
//ba[4] = 111
string msg2 =System.Text.Encoding.ASCII.GetString(ba);
//msg2 = "hello"

Try this method:
string message3 = char.ConvertFromUtf32(message2);
104 is the value of "h" not "hello".

There is no integer representation of a string, only of a char. Therefore, as stated by others, 104 is not the value of "hello" (a string) but of 'h' (a char) (see the ASCII chart here).
I can't entirely think of why you'd want to convert a string to an int-array and then back into a string, but the way to do it would be to run through the string and get the int-value of each character and then reconvert the int-values into char-values and concatenate each of them. So something like
string str = "hello"
List<int> N = new List<int>();
//this creates the list of int-values
for(int i=0;i<str.Count;i++)
N.Add((int)str[i]);
//and this joins it all back into a string
string newString = "";
for(int i=0;i<str.Count;i++)
newString += (char)N[i];

convert a given char varialble to string, representing it's code in my encoding

I want to write a txt file. Some of the chars need to be escaped in a way: \'c1, where c1 is the code of a char in encoding 1251.
How can I convert a given char varialble to string, representing it's code in my encoding?
I found a way to do this for utf, but no way for other ecnodings. For utf variant there is Char.ConvertToUtf32() method.

// get the encoding
Encoding encoding = Encoding.GetEncoding(1251);
// for each character you want to encode
byte b = encoding.GetBytes("" + c)[0];
string hex = b.ToString("x");
string output = #"\'" + hex;

How can I convert a given char varialble to string, representing it's code in my encoding?
Try something like this:
var enc = Encoding.GetEncoding("Windows-1251");
char myCharacter = 'д'; // Cyrillic 'd'
byte code = enc.GetBytes(new[] { myCharacter, })[0];
Console.WriteLine(code.ToString()); // "228" (decimal)
Console.WriteLine(code.ToString("X2")); // "E4" (hex)

Six digit unicode escaped value comparison

I have a six digit unicode character, for example U+100000 which I wish to make a comparison with a another char in my C# code.
My reading of the MSDN documentation is that this character cannot be represented by a char, and must instead be represented by a string.
a Unicode character in the range U+10000 to U+10FFFF is not permitted in a character literal and is represented using a Unicode surrogate pair in a string literal
I feel that I'm missing something obvious, but how can you get the follow comparison to work correctly:
public bool IsCharLessThan(char myChar, string upperBound)
{
return myChar < upperBound; // will not compile as a char is not comparable to a string
}
Assert.IsTrue(AnExample('\u0066', "\u100000"));
Assert.IsFalse(AnExample("\u100000", "\u100000")); // again won't compile as this is a string and not a char
edit
k, I think I need two methods, one to accept chars and another to accept 'big chars' i.e. strings. So:
public bool IsCharLessThan(char myChar, string upperBound)
{
return true; // every char is less than a BigChar
}
public bool IsCharLessThan(string myBigChar, string upperBound)
{
return string.Compare(myBigChar, upperBound) < 0;
}
Assert.IsTrue(AnExample('\u0066', "\u100000));
Assert.IsFalse(AnExample("\u100022", "\u100000"));

To construct a string with the Unicode code point U+10FFFF using a string literal, you need to work out the surrogate pair involved.
In this case, you need:
string bigCharacter = "\uDBFF\uDFFF";
Or you can use char.ConvertFromUtf32:
string bigCharacter = char.ConvertFromUtf32(0x10FFFF);
It's not clear what you want your method to achieve, but if you need it to work with characters not in the BMP, you'll need to make it accept int instead of char, or a string.
As per the documentation for string, if you want to iterate over characters in a string as full Unicode values, use TextElementEnumerator or StringInfo.
Note that you do need to do this explicitly. If you just use ordinal values, it will check UTF-16 code units, not the UTF-32 code points. For example:
string text = "\uF000";
string upperBound = "\uDBFF\uDFFF";
Console.WriteLine(string.Compare(text, upperBound, StringComparison.Ordinal));
This prints out a value greater than zero, suggesting that text is greater than upperBound here. Instead, you should use char.ConvertToUtf32:
string text = "\uF000";
string upperBound = "\uDBFF\uDFFF";
int textUtf32 = char.ConvertToUtf32(text, 0);
int upperBoundUtf32 = char.ConvertToUtf32(upperBound, 0);
Console.WriteLine(textUtf32 < upperBoundUtf32); // True
So that's probably what you need to do in your method. You might want to use StringInfo.LengthInTextElements to check that the strings really are single UTF-32 code points first.

From https://msdn.microsoft.com/library/aa664669.aspx, you have to use \U with full 8 hex digits. So for example:
string str1 = "\U0001F300";
string str2 = "\uD83C\uDF00";
bool eq = str1 == str2;
using the :cyclone: emoji.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How to convert unicode set from db to characters? - c#

Related

Convert string with special characters to hex - C#

c# padding to string

C# Convert Int64 to String using a cast

convert a given char varialble to string, representing it's code in my encoding

Six digit unicode escaped value comparison

Categories

Resources