I have a text file in which sets of Unicode escapes are written like this:
"'\u0641'","'\u064A','\u0649','\u0642','\u0625','\u0644','\u0627','\u0647','\u0631','\u062A','\u0643','\u0645','\u0639','\u0648','\u0623','\u0646','\u0636','\u0635','\u0633','\u0641','\u062D','\u0628','\u0650','\u064E','\u062C','\u0626"
"'\u0622'","'\u062E','\u0644','\u064A','\u0645".
I opened the file and read it with the ReadLine method, which gave me the line shown above. Now I want to convert all the Unicode escapes to chars so that I get a readable string. I tried some logic, but it didn't work. I'm stuck on converting the string "'\u0641'" to a char.
You can extract the strings that contain the individual hex numbers (using a Regex, for example), apply Int16.Parse with NumberStyles.HexNumber to each one, and then convert the result to a char.
string num = "0641"; // replace it with extracting logic of your preference
char c = (char)Int16.Parse(num, System.Globalization.NumberStyles.HexNumber);
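The extract-then-parse approach above can be applied to a whole line with a single Regex.Replace. This is a minimal sketch. It assumes every escape is a backslash, a 'u', and exactly four hex digits, as in the question's file. The helper name DecodeEscapes is hypothetical. The sketch uses ushort.Parse rather than Int16.Parse because char is an unsigned 16-bit type. For these Arabic code points, both give the same result.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: replace every \uXXXX escape (backslash, 'u', four hex digits)
// with the char it names. Other text, such as the quotes and commas, is left untouched.
string DecodeEscapes(string s) =>
    Regex.Replace(s, @"\\u([0-9A-Fa-f]{4})",
        m => ((char)ushort.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());

// A shortened line in the question's format; in the real program it would come from ReadLine().
string line = @"""'\u0641'"",""'\u064A','\u0649'""";
Console.WriteLine(DecodeEscapes(line));
```

Regex.Unescape also decodes \uXXXX escapes. However, it interprets other regex escapes as well, so the explicit pattern is the narrower choice.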
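You could parse the line to get each Unicode escape. To turn a code you already know into a readable character, you can write it as a char literal:
char MyChar = '\u0058'; // 'X'
The compiler decodes this escape, so it only works for codes written in your source code, not for escapes read from a file at run time. Hope this helps.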
What if you do something like this:
// The C# compiler decodes these \u escapes, so codePoints already holds the Arabic letters.
string codePoints = "\u0641 \u064A \u0649 \u0642 \u0625";
UnicodeEncoding uEnc = new UnicodeEncoding(); // UTF-16, little-endian
byte[] bytesToWrite = uEnc.GetBytes(codePoints);
System.IO.File.WriteAllBytes(@"yadda.txt", bytesToWrite);
byte[] readBytes = System.IO.File.ReadAllBytes(@"yadda.txt");
string val = uEnc.GetString(readBytes); // the same text as codePoints
How do I turn a string like 1F600 into an emoji? 1F600 => 😀, or 1F600 => \U0001F600, or 1F600 => 0x1F600
I've spent a few days on this, but I still don't understand how to convert a string like 1F600 into an emoji.
You simply need to convert the value to the code point then get the character at that code point:
var emoji = Char.ConvertFromUtf32(Convert.ToInt32("1F600", 16));
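A self-contained version of the one-liner above is shown below. It also shows that code points above U+FFFF come back as a two-char surrogate pair. The helper name FromHexCodePoint is hypothetical.

```csharp
using System;

// Hypothetical helper: parse the hex digits as a code point, then build the string for it.
// Code points above U+FFFF come back as a UTF-16 surrogate pair (two chars).
string FromHexCodePoint(string hex) => char.ConvertFromUtf32(Convert.ToInt32(hex, 16));

string emoji = FromHexCodePoint("1F600");
Console.WriteLine(emoji.Length);           // 2
Console.WriteLine(emoji == "\U0001F600");  // True
// Convert.ToInt32 with base 16 also accepts a "0x" prefix.
Console.WriteLine(FromHexCodePoint("0x1F600") == emoji); // True
```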
The string "1F600" is the hexadecimal representation of a Unicode code point. Because it lies outside the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), you need either UTF-32 or a UTF-16 surrogate pair to represent it.
Here is code that performs the requested conversion using the UTF-32 representation.
Parse it as a 32-bit unsigned integer (NumberStyles is in System.Globalization):
var utf32Char = uint.Parse("1F600", NumberStyles.AllowHexSpecifier);
Convert this to a 4-element byte array in little-endian byte order:
var utf32Bytes = BitConverter.GetBytes(utf32Char);
if (!BitConverter.IsLittleEndian)
Array.Reverse(utf32Bytes);
Finally, use Encoding.UTF32 to make a string from it. Encoding.UTF32 decodes little-endian bytes, which is why the byte order above matters.
var str = Encoding.UTF32.GetString(utf32Bytes);
Console.WriteLine(str);
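The UTF-32 steps above can be collapsed into one function. This sketch swaps the GetBytes/Array.Reverse pair for BinaryPrimitives.WriteUInt32LittleEndian, which always writes little-endian bytes regardless of the machine's byte order. That choice is ours, not the original answer's. The helper name FromHexViaUtf32 is hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Globalization;
using System.Text;

// Hypothetical helper: hex code point -> UTF-32 little-endian bytes -> .NET string.
string FromHexViaUtf32(string hex)
{
    uint codePoint = uint.Parse(hex, NumberStyles.AllowHexSpecifier);
    // Encoding.UTF32 expects little-endian bytes; this writes them on any machine.
    byte[] utf32Bytes = new byte[4];
    BinaryPrimitives.WriteUInt32LittleEndian(utf32Bytes, codePoint);
    return Encoding.UTF32.GetString(utf32Bytes);
}

// The UTF-32 route and char.ConvertFromUtf32 should yield the same surrogate pair.
Console.WriteLine(FromHexViaUtf32("1F600") == char.ConvertFromUtf32(0x1F600)); // True
```

One difference worth knowing: by default, Encoding.UTF32 substitutes U+FFFD for invalid values such as lone surrogates, whereas char.ConvertFromUtf32 throws ArgumentOutOfRangeException.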
I need the C# equivalent of Java's Base64.getDecoder().decode().
I have tried something like the following in C#:
byte[] decodedBytes = Convert.FromBase64String(embedCode);
string decodedText = Encoding.UTF8.GetString(decodedBytes);
byte[] bytes = Encoding.ASCII.GetBytes(decodedText);
But the resulting string contains garbage characters such as 0��\u0002B\0�*-\u0017���c\u001e�aֺ]���qr����`. How can I achieve this in C#?
'\u0002' is the "start of text" control character (U+0002) in Unicode.
So use
byte[] decodedBytes = Convert.FromBase64String(embedCode);
string decodedText = Encoding.Unicode.GetString(decodedBytes);
And please don't try to encode Unicode text as ASCII. Unicode covers a much wider range of characters than ASCII, and anything outside ASCII cannot be represented. Use the Unicode encoding again when writing the text back to bytes.
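Java's Base64.getDecoder().decode(String) returns a byte[], and Convert.FromBase64String returns the same bytes, so those two calls are direct equivalents. Turning the bytes into a string is a separate step. It only makes sense if the bytes are text and you know which encoding produced them. This is a sketch that assumes the payload is UTF-8 text. The sample payload is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical payload: UTF-8 text that was Base64-encoded elsewhere (for example, by Java).
string embedCode = Convert.ToBase64String(Encoding.UTF8.GetBytes("Привет"));

// Equivalent of Java's Base64.getDecoder().decode(embedCode): raw bytes, no text decoding yet.
byte[] decodedBytes = Convert.FromBase64String(embedCode);

// Decode to a string only with the encoding that produced the bytes.
string decodedText = Encoding.UTF8.GetString(decodedBytes);
Console.WriteLine(decodedText);

// If the payload is binary (compressed, encrypted, an image), keep working with decodedBytes;
// decoding arbitrary bytes as text typically yields replacement characters (U+FFFD).
```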
I get question marks in the output of my program: ?????? ??????
string str = "Привет медвед";
Encoding srcEncodingFormat = Encoding.GetEncoding("utf-16");
Encoding dstEncodingFormat = Encoding.ASCII;
byte [] originalByteString = srcEncodingFormat.GetBytes(str);
byte [] convertedByteString = Encoding.Convert(srcEncodingFormat,
dstEncodingFormat, originalByteString);
string finalString = dstEncodingFormat.GetString(convertedByteString);
Console.WriteLine (finalString);
There is no text but encoded text. .NET's char and string already use Unicode (UTF-16), so you can simplify your code. Pass the string straight to the destination encoding's GetBytes instead of encoding it to UTF-16 first and then calling Encoding.Convert, as your code does.
As for your question, ASCII has no Cyrillic letters, so you have a choice between a lossy conversion and no conversion at all. In the lossy case, the default fallback replaces each Cyrillic letter with '?', which is what you see. The code below refuses the lossy conversion.
Now, how do you see the result? As with all text, it is a sequence of bytes. Your best bet is to write the bytes to a file and open it in an editor that lets you specify the encoding and can use a font that supports the characters you want to see.
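string str = "Привет медвед";
// Strict ASCII: throw on characters it cannot encode instead of substituting '?'
Encoding dstEncodingFormat = Encoding.GetEncoding("US-ASCII",
new EncoderExceptionFallback(),
new DecoderReplacementFallback());
// Throws EncoderFallbackException here, because ASCII has no Cyrillic letters
byte[] output = dstEncodingFormat.GetBytes(str);
File.WriteAllBytes("Test Привет медвед.txt", output);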
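The sketch below shows both outcomes side by side. It prints the default lossy ASCII result that produced the question marks, then catches the exception from the strict encoder and falls back to UTF-8, which can represent every Unicode character. The UTF-8 fallback and the output file name are our own assumptions.

```csharp
using System;
using System.IO;
using System.Text;

string str = "Привет медвед";

// Default Encoding.ASCII replaces each unencodable char with '?': the question's symptom.
Console.WriteLine(Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(str))); // ?????? ??????

// Strict ASCII: throws instead of silently substituting '?'.
Encoding strictAscii = Encoding.GetEncoding("US-ASCII",
    new EncoderExceptionFallback(), new DecoderExceptionFallback());

byte[] output;
try
{
    output = strictAscii.GetBytes(str);
}
catch (EncoderFallbackException ex)
{
    Console.WriteLine($"Cannot encode U+{(int)ex.CharUnknown:X4} as ASCII; using UTF-8 instead.");
    output = Encoding.UTF8.GetBytes(str); // lossless: UTF-8 covers all of Unicode
}

File.WriteAllBytes("privet-utf8.txt", output); // hypothetical file name
```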
I want to write a txt file. Some of the chars need to be escaped as \'c1, where c1 is the code of the char in encoding 1251.
How can I convert a given char variable to a string representing its code in my encoding?
I found a way to do this for UTF, but not for other encodings. For UTF there is the Char.ConvertToUtf32() method.
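// get the encoding (on .NET Core and .NET 5+, first call
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) so that 1251 is available)
Encoding encoding = Encoding.GetEncoding(1251);
// for each character c you want to encode
byte b = encoding.GetBytes(c.ToString())[0];
string hex = b.ToString("x2"); // always two hex digits, e.g. "e4"
string output = @"\'" + hex;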
Try something like this:
var enc = Encoding.GetEncoding("Windows-1251");
char myCharacter = 'д'; // Cyrillic 'd'
byte code = enc.GetBytes(new[] { myCharacter, })[0];
Console.WriteLine(code.ToString()); // "228" (decimal)
Console.WriteLine(code.ToString("X2")); // "E4" (hex)
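This sketch applies the same lookup to a whole string. It makes two assumptions. First, it targets .NET Core 3.0 or later, where Windows code pages must be registered through CodePagesEncodingProvider before GetEncoding(1251) works. Second, it assumes only non-ASCII characters need the \'hh escape. The helper name EscapeCp1251 is hypothetical.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ ship only a few encodings by default; this makes code page 1251 available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp1251 = Encoding.GetEncoding(1251);

// Hypothetical helper: escape every non-ASCII char as \'hh using its Windows-1251 byte.
string EscapeCp1251(string text)
{
    var sb = new StringBuilder();
    foreach (char c in text)
    {
        if (c < 0x80)
        {
            sb.Append(c); // ASCII passes through unchanged (assumption)
            continue;
        }
        byte code = cp1251.GetBytes(new[] { c })[0];
        sb.Append(@"\'").Append(code.ToString("x2"));
    }
    return sb.ToString();
}

Console.WriteLine(EscapeCp1251("дом")); // \'e4\'ee\'ec
```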
I have a Unicode string, let's say "U+660E", and I want to display the corresponding character, which in this case is 明.
My code so far:
string unicodeString = reader.GetString(0);
unicodeString.Trim();
Encoding codepage = Encoding.GetEncoding(950);
Encoding unicode = Encoding.Unicode;
byte[] encodedBytes = codepage.GetBytes(unicodeString);
//unicodeString = Encoding.Convert(codepage, unicode, encodedBytes).ToString();
unicodeString = unicode.GetString(encodedBytes);
richTextBox1.Text = unicodeString;
My output is "â•ã˜¶ä”°ï¿½".
Any idea where I went wrong?
.NET deals directly with Unicode, so you do not have to play encoding games. Just tell the reader whether the input is UTF-8 or UTF-16, then treat the value as a normal string.
richTextBox1.Text = reader.GetString(0);
There's no need to convert to CP-950; C# is Unicode through and through. Just read and display the text as Unicode, unless you are writing to a file that you know has to be CP-950.
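No encoding conversion will help if the column really holds the notation "U+660E" (the six characters U, +, 6, 6, 0, E) rather than the character 明 itself. In that case the hex digits have to be parsed as a code point, as in the 1F600 emoji case. This is a sketch that assumes the value is "U+" followed by hex digits. The helper name FromUPlusNotation is hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: turns "U+660E" (or "u+1F600") into the character it names.
string FromUPlusNotation(string value)
{
    string hex = value.Trim();
    if (hex.StartsWith("U+", StringComparison.OrdinalIgnoreCase))
        hex = hex.Substring(2);
    int codePoint = int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    return char.ConvertFromUtf32(codePoint); // surrogate pair for code points above U+FFFF
}

string shown = FromUPlusNotation("U+660E");
Console.WriteLine(shown == "\u660E"); // True
```

In the question's code, this would become something like richTextBox1.Text = FromUPlusNotation(reader.GetString(0));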