I have a Unicode string, let's say "U+660E", and I want to display the corresponding character, which in this case is 明. See this page (ctrl-F to find 明).
My code so far:
string unicodeString = reader.GetString(0);
unicodeString.Trim();
Encoding codepage = Encoding.GetEncoding(950);
Encoding unicode = Encoding.Unicode;
byte[] encodedBytes = codepage.GetBytes(unicodeString);
//unicodeString = Encoding.Convert(codepage, unicode, encodedBytes).ToString();
unicodeString = unicode.GetString(encodedBytes);
richTextBox1.Text = unicodeString;
My output is "⭕㘶䔰�".
Any idea where I went wrong?
.NET works directly with Unicode, so you do not have to play encoding games. Just tell the reader whether the input is UTF-8 or UTF-16, then treat the result as a normal string.
richTextBox1.Text = reader.GetString(0);
There's no need to convert to CP-950; C# is Unicode through-and-through. Just input and print as Unicode unless you're outputting to a file that you know has to be CP-950.
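If the value really is the literal text "U+660E" (the code point written out, as the question describes), re-encoding will not help; the hex digits have to be parsed into a code point. A minimal sketch, assuming the input is a single "U+" prefix followed by hex digits (`CodePointNotation` and `ToText` are hypothetical names used for illustration):

```csharp
using System;
using System.Globalization;

public static class CodePointNotation
{
    // Converts text such as "U+660E" into the character it names.
    // Assumption: the input is one "U+" prefix followed by hex digits.
    public static string ToText(string notation)
    {
        // Trim returns a new string; the result must be kept.
        string trimmed = notation.Trim();
        if (!trimmed.StartsWith("U+", StringComparison.OrdinalIgnoreCase))
        {
            throw new FormatException("Expected a value like U+660E.");
        }

        int codePoint = int.Parse(
            trimmed.Substring(2),
            NumberStyles.HexNumber,
            CultureInfo.InvariantCulture);

        // ConvertFromUtf32 also handles code points above U+FFFF,
        // which need a surrogate pair in a UTF-16 string.
        return char.ConvertFromUtf32(codePoint);
    }
}
```

With this in place, `richTextBox1.Text = CodePointNotation.ToText(reader.GetString(0));` would show 明 for "U+660E", provided the control's font contains that glyph.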
Related
I need C# code equivalent to Java's Base64.getDecoder().decode.
I have tried something like the following in C#:
byte[] decodedBytes = Convert.FromBase64String(embedCode);
string decodedText = Encoding.UTF8.GetString(decodedBytes);
byte[] bytes = Encoding.ASCII.GetBytes(decodedText);
But the string has some special characters like 0��\u0002B\0�*-\u0017���c\u001e�aֺ]���qr����`. How can I achieve this in C#?
'\u0002' is the "start of text" control character (U+0002) in Unicode.
So use:
byte[] decodedBytes = Convert.FromBase64String(embedCode);
string decodedText = Encoding.Unicode.GetString(decodedBytes);
And please don't try to encode Unicode text as ASCII: Unicode covers a far wider range of characters, and ASCII cannot represent most of them. Use the Unicode encoding again when you write the text back to bytes.
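The point about ASCII can be checked with a small round trip. This is only a sketch; `AsciiLossDemo` and `RoundTrip` are hypothetical names, and the sample character is an assumption chosen for illustration:

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Encodes text to bytes with the given encoding, then decodes it again.
    // If the encoding cannot represent a character, the result differs
    // from the input.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

`AsciiLossDemo.RoundTrip("Á", Encoding.ASCII)` comes back as "?", because Encoding.ASCII substitutes a question mark for anything outside the 7-bit range, while `AsciiLossDemo.RoundTrip("Á", Encoding.Unicode)` returns the original text.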
I use the code below to convert from Unicode to UTF-8:
Encoding unicode = Encoding.Unicode;
Encoding UTF8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(stringResp);
byte[] UTF8Bytes = Encoding.Convert(unicode, UTF8, unicodeBytes);
string stringResp = UTF8.GetString(UTF8Bytes, 0, UTF8Bytes.Length);
But the special characters don't show, only their Unicode escape codes (\u00c1 for Á, for example). If I manually find the string "\\u00c1" and replace it with Á, it works. Does anyone know why, and how I can make the conversion automatic?
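The symptom suggests the response contains literal escape sequences (a backslash, the letter u, and four hex digits) rather than the characters themselves, in which case converting between encodings cannot change anything. One possible approach, assuming the text contains escapes only in that exact form, is to replace each one with the character it names. `UnicodeEscapes` and `Decode` are hypothetical names; if the response is actually JSON, a JSON parser would be the more robust choice.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Matches a literal backslash, 'u', and exactly four hex digits.
    private static readonly Regex EscapePattern =
        new Regex(@"\\u([0-9A-Fa-f]{4})");

    // Replaces every \uXXXX sequence with the UTF-16 code unit it names.
    // Caveat: this sketch does not treat an escaped backslash specially.
    public static string Decode(string text)
    {
        return EscapePattern.Replace(text, match =>
        {
            int codeUnit = int.Parse(
                match.Groups[1].Value,
                NumberStyles.HexNumber,
                CultureInfo.InvariantCulture);
            return ((char)codeUnit).ToString();
        });
    }
}
```

For example, `UnicodeEscapes.Decode(@"\u00c1rbol")` yields "Árbol", and text without escapes is returned unchanged.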
How do I create/encode a string with a specific encoding in C#/.NET Framework? For example, I would like to make a string which uses the Western European ISO 8859-1 encoding.
C# and .NET (both .NET Framework and .NET Core) strings use UTF-16 internally, so any string you create has that encoding. The matching Encoding object is Encoding.Unicode, which is UTF-16 little-endian.
You therefore need to convert the string to the desired encoding. Note that the code below applies only to a string you created yourself; text that comes from somewhere else, such as a file, needs a different approach.
Encoding westernEuropeanIso8859 = Encoding.GetEncoding("ISO-8859-1");
Encoding utf16CSharpDefault = Encoding.Unicode;
byte[] utfBytes = utf16CSharpDefault.GetBytes(vExp);
byte[] isoBytes = Encoding.Convert(utf16CSharpDefault, westernEuropeanIso8859, utfBytes);
string stringWithDesiredEncoding = westernEuropeanIso8859.GetString(isoBytes);
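One detail worth spelling out: GetString always produces an ordinary UTF-16 .NET string again, so the encoding only ever exists at the byte level. If the goal is ISO 8859-1 data to write or send somewhere, the byte array is the result to keep. A minimal sketch, with `Latin1Bytes` and `Encode` as hypothetical names:

```csharp
using System;
using System.Text;

public static class Latin1Bytes
{
    // Encodes a .NET (UTF-16) string straight to ISO 8859-1 bytes.
    // Characters that ISO 8859-1 cannot represent are substituted by the
    // encoding's fallback (typically '?').
    public static byte[] Encode(string text)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        return latin1.GetBytes(text);
    }
}
```

`Latin1Bytes.Encode("é")` gives the single byte 0xE9, whereas the same character takes two bytes in UTF-8 or UTF-16.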
I get question marks in the output of my program: ?????? ??????
string str = "Привет медвед";
Encoding srcEncodingFormat = Encoding.GetEncoding("utf-16");
Encoding dstEncodingFormat = Encoding.ASCII;
byte [] originalByteString = srcEncodingFormat.GetBytes(str);
byte [] convertedByteString = Encoding.Convert(srcEncodingFormat,
dstEncodingFormat, originalByteString);
string finalString = dstEncodingFormat.GetString(convertedByteString);
Console.WriteLine (finalString);
There is no text but encoded text. As you know, .NET's char and string use Unicode/UTF-16, so you can simplify your code by calling GetBytes on the destination encoding and passing in the string directly, instead of encoding twice as your code does.
As for your question, you have a choice between a lossy conversion and no conversion at all. Below is code that prevents a lossy conversion.
Now, how do you see the result? As with all text, it is a sequence of bytes. Your best bet is to write the bytes to a file and open that file in an editor that lets you specify the encoding and that uses a font supporting the characters you want to see.
string str = "Привет медвед";
Encoding dstEncodingFormat = Encoding.GetEncoding("US-ASCII",
new EncoderExceptionFallback(),
new DecoderReplacementFallback());
byte[] output = dstEncodingFormat.GetBytes(str);
File.WriteAllBytes("Test Привет медвед.txt", output);
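To make the effect of EncoderExceptionFallback concrete: with it, encoding text that does not fit in ASCII throws an EncoderFallbackException instead of quietly producing question marks. The wrapper below is a sketch; `StrictAscii` and `TryEncode` are hypothetical names.

```csharp
using System;
using System.Text;

public static class StrictAscii
{
    // Returns true and the ASCII bytes when every character fits in ASCII.
    // Returns false (and an empty array) when one does not, rather than
    // silently replacing it with '?'.
    public static bool TryEncode(string text, out byte[] bytes)
    {
        Encoding strict = Encoding.GetEncoding(
            "US-ASCII",
            new EncoderExceptionFallback(),
            new DecoderReplacementFallback());
        try
        {
            bytes = strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            bytes = new byte[0];
            return false;
        }
    }
}
```

`StrictAscii.TryEncode("Привет медвед", out var b)` returns false, which tells the caller that ASCII is the wrong target for this text; `StrictAscii.TryEncode("hello", out var b)` returns true with five bytes.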
A few days ago I asked a question about German special characters.
I can encode and decode characters like ö, ä or ü now, but some characters are left and I need to encode/decode them too.
For example, characters that fail: ² ³ € µ Ü Ö Ä ~ ´ §
Here is code:
private static byte[] MyGetBytesArray(string data)
{
Encoding enc = new UTF8Encoding(true, true);
return enc.GetBytes(data);
}
private static string MyGetString(byte[] data)
{
Encoding enc = new UTF8Encoding(true, true);
return enc.GetString(data);
}
I'm looking for a solution to encode/decode all characters. I'm writing an encrypt/decrypt algorithm, and I don't know what the user will paste into the program. I need to give back exactly the same text.
Thanks for help, again..
EDIT:
OK, UnicodeEncoding works (I think), so the remaining problem is my encrypt/decrypt algorithm. I'm still not sure what is going on (I think it has something to do with zeros: when encoding with Unicode, a zero follows every character), but encoding special characters works. At least this test was successful:
string text = File.ReadAllText(opd.FileName, Encoding.Default);
byte[] byt = getBytesArray(text);
string text2 = getString(byt);
if (text2 == text)
{
MessageBox.Show("OK");
}
else
{
MessageBox.Show("FAIL");
}
BTW, Encoding.Default is correct, right?
Try UnicodeEncoding instead.
var encoding = new UnicodeEncoding();
return Write(encoding.GetBytes(s));
Unfortunately those characters are Unicode so you won't be able to use the UTF8Encoding class.
Try using the UnicodeEncoding class instead.
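The zeros mentioned in the question's edit are expected with UnicodeEncoding: it is UTF-16 little-endian, so each character below U+0100 becomes its code value followed by a zero byte. A small sketch of the round trip, with `Utf16RoundTrip`, `ToBytes` and `FromBytes` as hypothetical names:

```csharp
using System;
using System.Text;

public static class Utf16RoundTrip
{
    // Encodes a string as UTF-16 little-endian bytes (no byte order mark
    // is emitted by GetBytes).
    public static byte[] ToBytes(string text)
    {
        return new UnicodeEncoding().GetBytes(text);
    }

    // Decodes UTF-16 little-endian bytes back into a string.
    public static string FromBytes(byte[] bytes)
    {
        return new UnicodeEncoding().GetString(bytes);
    }
}
```

`Utf16RoundTrip.ToBytes("A")` is the two bytes 0x41 0x00, and `Utf16RoundTrip.FromBytes(Utf16RoundTrip.ToBytes("² ³ € µ Ü Ö Ä ~ ´ §"))` returns the original text. Any encryption step in between has to preserve every byte, including the zeros, for the round trip to hold.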