C# Encoding from utf-16 to ascii - c#

I get question marks in output of my program: ?????? ??????
string str = "Привет медвед";
Encoding srcEncodingFormat = Encoding.GetEncoding("utf-16");
Encoding dstEncodingFormat = Encoding.ASCII;
byte [] originalByteString = srcEncodingFormat.GetBytes(str);
byte [] convertedByteString = Encoding.Convert(srcEncodingFormat,
dstEncodingFormat, originalByteString);
string finalString = dstEncodingFormat.GetString(convertedByteString);
Console.WriteLine (finalString);

There is no text but encoded text. But, .NET's char and string use Unicode/UTF-16, as you know. So, you can simplify your code by calling GetBytes and passing in the string instead of doing it twice as your code does.
As for your question, you have a choice of a lossy conversion or no conversion at all. Below is code that prevents a lossy conversion.
Now, how to see the result? As with all text, it is a sequence of bytes. Your best bet is to write them to a file and open the file in an editor that you can indicate the encoding to and that can use a font that supports the characters you want to see.
string str = "Привет медвед";
Encoding dstEncodingFormat = Encoding.GetEncoding("US-ASCII",
new EncoderExceptionFallback(),
new DecoderReplacementFallback());
byte[] output = dstEncodingFormat.GetBytes(str);
File.WriteAllBytes("Test Привет медвед.txt", output);

Related

Base64.getDecoder().decode in C#

I need a equivalent C# code for Base64.getDecoder().decode Java code.
I have tried something like the following in C#
byte[] decodedBytes = Convert.FromBase64String(embedCode);
string decodedText = Encoding.UTF8.GetString(decodedBytes);
byte[] bytes = Encoding.ASCII.GetBytes(decodedText);
But the string has some special characters like 0��\u0002B\0�*-\u0017���c\u001e�aֺ]���qr����`. How can I achieve this in C#
'\u0002' is the 'start of text' character for unicode encoding.
So use
byte[] decodedBytes = Convert.FromBase64String(embedCode);
string decodedText = Encoding.Unicode.GetString(decodedBytes);
And please don't try to encode Unicode text as ASCII, Unicode has a wider character range, which ASCII will not be able to recognize. So use unicode encoding again to write to bytes.

Unicode string to Chinese characters

I have a unicode string, let's say "U+660E", and I want to display the corresponding character, which in this case is 明. See this page (ctrl-F to find 明).
My code so far:
string unicodeString = reader.GetString(0);
unicodeString.Trim();
Encoding codepage = Encoding.GetEncoding(950);
Encoding unicode = Encoding.Unicode;
byte[] encodedBytes = codepage.GetBytes(unicodeString);
//unicodeString = Encoding.Convert(codepage, unicode, encodedBytes).ToString();
unicodeString = unicode.GetString(encodedBytes);
richTextBox1.Text = unicodeString;
My output is "⭕㘶䔰�".
Any idea where I went wrong?
.net deals directly with unicode. You do not have to play the encoding games. Just tell the reader if the input is UTF-8 or UTF-16 and then deal with it as a normal string.
richTextBox1.Text = reader.GetString(0)
There's no need to convert to CP-950; C# is Unicode through-and-through. Just input and print as Unicode unless you're outputting to a file that you know has to be CP-950.

how to convert string of unicodes to the char?

i have a text file in which sets of Unicodes are written as
"'\u0641'","'\u064A','\u0649','\u0642','\u0625','\u0644','\u0627','\u0647','\u0631','\u062A','\u0643','\u0645','\u0639','\u0648','\u0623','\u0646','\u0636','\u0635','\u0633','\u0641','\u062D','\u0628','\u0650','\u064E','\u062C','\u0626"
"'\u0622'","'\u062E','\u0644','\u064A','\u0645".
I opened the file and started reading of file by using readline method. I got the above line shown as a line now i want to convert all Unicode to char so that i could get a readable string. i tried some logic but that doesn't worked i stuck with converting string "'\u00641'" to char.
You can extract strings containing individual numbers (using Regex for example), apply Int16.Parse to each and then convert it to a char.
string num = "0641"; // replace it with extracting logic of your preference
char c = (char)Int16.Parse(num, System.Globalization.NumberStyles.HexNumber);
You could parse the line to get each unicode char. To convert unicode to readable character you could do
char MyChar = '\u0058';
Hope this help
What if you do something like this:
string codePoints = "\u0641 \u064A \u0649 \u0642 \u0625";
UnicodeEncoding uEnc = new UnicodeEncoding();
byte[] bytesToWrite = uEnc.GetBytes(codePoints);
System.IO.File.WriteAllBytes(#"yadda.txt", bytesToWrite);
byte[] readBytes = System.IO.File.ReadAllBytes(#"yadda.txt");
string val = uEnc.GetString(readBytes);
//daniel

Convert UTF-8 to Chinese Simplified (GB2312)

Is there a way to convert UTF-8 string to Chinese Simplified (GB2312) in C#. Any help is greatly appreciated.
Regards
Jyothish George
The first thing to be aware of is that there's no such thing as a "UTF-8 string" in .NET. All strings in .NET are effectively UTF-16. However, .NET provides the Encoding class to allow you to decode binary data into strings, and re-encode it later.
Encoding.Convert can convert a byte array representing text encoded with one encoding into a byte array with the same text encoded with a different encoding. Is that what you want?
Alternatively, if you already have a string, you can use:
byte[] bytes = Encoding.GetEncoding("gb2312").GetBytes(text);
If you can provide more information, that would be helpful.
Try this;
public string GB2312ToUtf8(string gb2312String)
{
Encoding fromEncoding = Encoding.GetEncoding("gb2312");
Encoding toEncoding = Encoding.UTF8;
return EncodingConvert(gb2312String, fromEncoding, toEncoding);
}
public string Utf8ToGB2312(string utf8String)
{
Encoding fromEncoding = Encoding.UTF8;
Encoding toEncoding = Encoding.GetEncoding("gb2312");
return EncodingConvert(utf8String, fromEncoding, toEncoding);
}
public string EncodingConvert(string fromString, Encoding fromEncoding, Encoding toEncoding)
{
byte[] fromBytes = fromEncoding.GetBytes(fromString);
byte[] toBytes = Encoding.Convert(fromEncoding, toEncoding, fromBytes);
string toString = toEncoding.GetString(toBytes);
return toString;
}
source here

Conversion of text to unicode strings

I have to process JSON files that looks like this:
\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c<\/b> \u043d\u0430\u0447
Unfortunately, I'm not sure how this encoding is called.
I would like to convert it to .NET Unicode strings. What's the easies way to do it?
This is Unicode characters for Russian alphabet.
try simply put this line in VisualStudio and it will parse it.
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c";
Or if you want to convert this string to another encoding, for example utf8, try this code:
static void Main()
{
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c<\b> \u043d\u0430\u0447";
// Create two different encodings.
Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte[].
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] utf8Bytes = Encoding.Convert(unicode, utf8, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
// This is a slightly different approach to converting to illustrate
// the use of GetCharCount/GetChars.
char[] asciiChars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
Console.ReadKey();
}
code taken from Convert

Categories

Resources