C# unicode characters not getting displayed - c#

i'm trying the msdn example,
using windows xp , .Net 4.0,
using System;
using System.Text;
class Example
{
static void Main()
{
string unicodeString = "This string contains the unicode character Pi (\u03a0)";
// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte array.
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
}
}
Expected:
// The example displays the following output:
// Original string: This string contains the unicode character Pi (Π)
// Ascii converted string: This string contains the unicode character Pi (?)
But I'm getting
// The example displays the following output:
// Original string: This string contains the unicode character Pi (?)
// Ascii converted string: This string contains the unicode character Pi (?)

Related

C# SHA256 Hashing

Below is the code I use for hashing and I am trying to store the hash in a string(strSHA256) with UTF-8 encoding.
using (SHA256 sha256Hash = SHA256.Create())
{
bytes = sha256Hash.ComputeHash(Encoding.UTF8.GetBytes(utf8EncodedString));
strSHA256 = Encoding.UTF8.GetString(bytes);
}
When I inspect the string variable I see "�u��Ou\u0004����O��zA�\u0002��\0�\a�}\u0012�L�Dr�"
type of value. My question is regarding to ? character marked in bold. They should be UTF-8 chars.
Why cant I view the UTF-8 special chars when I inspect from visual studio. The string o/p contains this weird char as well.
Note: I tried SHA256 with Node Js and I tend to see perfect UTF-8 special chars.
As #user2864740 mentioned bytes are not UTF8. Use base64 for bytes string representation.
var hello = "Hello world!";
using (var sha256Hash = SHA256.Create())
{
var bytes = sha256Hash.ComputeHash(Encoding.UTF8.GetBytes(hello));
// String representation
var stringForBytes = Convert.ToBase64String(bytes);
// HEX string representation
var str = new StringBuilder(bytes.Length * 2);
foreach (var b in bytes)
{
str.Append(b.ToString("X2"));
}
var hexString = str.ToString();
}

How to convert emoticons to its UTF-32/escaped unicode?

I am working on a chatting application in WPF and I want to use emoticons in it. I am working on WPF app. I want to read emoticons which are coming from Android/iOS devices and show respective images.
On WPF, I am getting a black Emoticon looking like . I somehow got a library of emoji icons which are saved with respective hex/escaped unicode values.
So, I want to convert these symbols of emoticons into UTF-32/escaped unicode so that I can directly replace related emoji icons with them.
I had tried to convert an emoticon to its unicode but end up getting a different string with couple of symbols, which are having different unicode.
string unicodeString = "\u1F642"; // represents 🙂
Encoding unicode = Encoding.Unicode;
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
char[] unicodeChars = new char[unicode.GetCharCount(unicodeBytes, 0, unicodeBytes.Length)];
unicode.GetChars(unicodeBytes, 0, unicodeBytes.Length, unicodeChars, 0);
string asciiString = new string(unicodeChars);
Any help is appreciated!!
Your escaped Unicode String is invalid in C#.
string unicodeString = "\u1F642"; // represents 🙂
This piece of code doesnt represent the "slightly smiling face" since C# only respects the first 4 characters - representing an UTF-16 (with 2 Bytes).
So what you actually get is the letter representing 1F64 followed by a simple 2.
http://www.fileformat.info/info/unicode/char/1f64/index.htm
So this: ὤ2
If you want to type hex with 4 Bytes and get the corresponding string you have to use:
var unicodeString = char.ConvertFromUtf32(0x1F642);
https://msdn.microsoft.com/en-us/library/system.char.convertfromutf32(v=vs.110).aspx
or you could write it like this:
\uD83D\uDE42
This string can than be parsed like this, to get your desired result which is again is the hex value that we started with:
var x = char.ConvertFromUtf32(0x1F642);
var enc = new UTF32Encoding(true, false);
var bytes = enc.GetBytes(x);
var hex = new StringBuilder();
for (int i = 0; i < bytes.Length; i++)
{
hex.AppendFormat("{0:x2}", bytes[i]);
}
var o = hex.ToString();
//result is 0001F642
(The result has the leading Zeros, since an UTF-32 is always 4 Bytes)
Instead of the for Loop you can also use BitConverter.ToString(byte[]) https://msdn.microsoft.com/en-us/library/3a733s97(v=vs.110).aspx the result than will look like:
var x = char.ConvertFromUtf32(0x1F642);
var enc = new UTF32Encoding(true, false);
var bytes = enc.GetBytes(x);
var o = BitConverter.ToString(bytes);
//result is 00-01-F6-42
Please be aware that Encoding.Unicode is UTF-16 in C#. To read 32 bits Unicode, there is this Encoding.UTF32. Link on MSDN for Encoding.​UT​F32
Since C# source files can contain UTF-32 string literals, there is no need to use any encodings for this task.
Example 1.
var rgch = "\U0001F642".ToCharArray();
var str = $"\\u{(ushort)rgch[0]:X4}\\u{(ushort)rgch[1]:X4}";
Result: "\uD83D\uDE42"         Length of string str is 12 UTF-16 code points (24 bytes)
Example 2.
var rgch = "\U0001F642".ToCharArray();
var str = rgch[0] + "" + rgch[1];
Result: "🙂"             Length of string str is 2 UTF-16 code points (4 bytes)

convert a given char varialble to string, representing it's code in my encoding

I want to write a txt file. Some of the chars need to be escaped in a way: \'c1, where c1 is the code of a char in encoding 1251.
How can I convert a given char varialble to string, representing it's code in my encoding?
I found a way to do this for utf, but no way for other ecnodings. For utf variant there is Char.ConvertToUtf32() method.
// get the encoding
Encoding encoding = Encoding.GetEncoding(1251);
// for each character you want to encode
byte b = encoding.GetBytes("" + c)[0];
string hex = b.ToString("x");
string output = #"\'" + hex;
How can I convert a given char varialble to string, representing it's code in my encoding?
Try something like this:
var enc = Encoding.GetEncoding("Windows-1251");
char myCharacter = 'д'; // Cyrillic 'd'
byte code = enc.GetBytes(new[] { myCharacter, })[0];
Console.WriteLine(code.ToString()); // "228" (decimal)
Console.WriteLine(code.ToString("X2")); // "E4" (hex)

How to convert a string with character codes above 127 to a byte array properly?

I am retrieving ASCII strings encoded with code page 437 from another system which I need to transform to Unicode so they can be mixed with other Unicode strings.
This is what I am working with:
var asciiString = "\u0094"; // 94 corresponds represents 'ö' in code page 437.
var asciiEncoding = Encoding.GetEncoding(437);
var unicodeEncoding = Encoding.Unicode;
// This is what I attempted to do but it seems not to be able to support the eight bit. Characters using the eight bit are replaced with '?' (0x3F)
var asciiBytes = asciiEncoding.GetBytes(asciiString);
// This work-around does the job, but there must be built in functionality to do this?
//var asciiBytes = asciiString.Select(c => (byte)c).ToArray();
// This piece of code happliy converts the character correctly to unicode { 0x94 } => { 0xF6, 0x0 } .
var unicodeBytes = Encoding.Convert(asciiEncoding, unicodeEncoding, asciiBytes);
var unicodeString = unicodeEncoding.GetString(unicodeBytes); // I want this to be 'ö'.
What I am struggling with is that I cannot find a suitable method in the .NET framework to transform a string with character codes above 127 to a byte array. This seems strange since there are support there to transform a byte array with characters above 127 to Unicode strings.
So my question is, is there any built in method to do this conversion properly or is my work-around the proper way to do it?
var asciiString = "\u0094";
Whatever you name it, this will always be a Unicode string. .NET only has Unicode strings.
I am retrieving ASCII strings encoded with code page 437 from another system
Treat the incoming data as byte[], not as string.
var asciiBytes = new byte[] { 0x94 }; // 94 corresponds represents 'ö' in code page 437.
var asciiEncoding = Encoding.GetEncoding(437);
var unicodeString = asciiEncoding.GetString(asciiBytes);
\u0094 is Unicode code-point 0094, which is a control character; it is not ö. If you wanted ö, the correct string is
string s = "ö";
which is LATIN SMALL LETTER O WITH DIAERESIS, aka code-point 00F6.
So:
var s = "\u00F6"; // Identical to "ö"
Now we get our encoding:
var enc = Encoding.GetEncoding(437);
var bytes = enc.GetBytes(s);
And we find that it is a single-byte decimal 148, which is hex 94 - i.e. what you were after.
The significance here is that in C# when you use the "\uXXXX" syntax, the XXXX is always referring to Unicode code-points, not the encoded value in some particular encoding.
You have to look earlier in the code. Once you have the data as a string, it has already been decoded. Any characters lost in that decoding is impossible to get back.
You need the input as bytes, so that you can use your encoding object for code page 437 to decode it into a string.
byte[] asciiData = new byte[] { 0x94 }; // character ö in codepage 437
Encoding asciiEncoding = Encoding.GetEncoding(437);
string unicodeString = asciiEncoding.GetString(asciiData);
Console.WriteLine(unicodeString);
Output:
ö

Conversion of text to unicode strings

I have to process JSON files that looks like this:
\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c<\/b> \u043d\u0430\u0447
Unfortunately, I'm not sure how this encoding is called.
I would like to convert it to .NET Unicode strings. What's the easies way to do it?
This is Unicode characters for Russian alphabet.
try simply put this line in VisualStudio and it will parse it.
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c";
Or if you want to convert this string to another encoding, for example utf8, try this code:
static void Main()
{
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c<\b> \u043d\u0430\u0447";
// Create two different encodings.
Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte[].
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] utf8Bytes = Encoding.Convert(unicode, utf8, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
// This is a slightly different approach to converting to illustrate
// the use of GetCharCount/GetChars.
char[] asciiChars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
Console.ReadKey();
}
code taken from Convert

Categories

Resources