I need the C# equivalent of Java's Base64.getDecoder().decode().
I have tried something like the following in C#:
byte[] decodedBytes = Convert.FromBase64String(embedCode);
string decodedText = Encoding.UTF8.GetString(decodedBytes);
byte[] bytes = Encoding.ASCII.GetBytes(decodedText);
But the decoded string contains special characters like `0��\u0002B\0�*-\u0017���c\u001e�aֺ]���qr����`. How can I achieve this in C#?
'\u0002' is the 'Start of Text' (STX) control character in Unicode.
So use:
byte[] decodedBytes = Convert.FromBase64String(embedCode);
string decodedText = Encoding.Unicode.GetString(decodedBytes);
And please don't try to encode Unicode text as ASCII: Unicode has a much wider character range than ASCII can represent. Use the Unicode encoding again when converting the string back to bytes.
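A minimal round trip, assuming the Base64 payload really contains UTF-16 ("Unicode") text as suggested above; if the Java side encoded UTF-8, substitute Encoding.UTF8:

using System;
using System.Text;

class Base64Demo
{
    static void Main()
    {
        // Stand-in for the embedCode produced by Java's Base64.getEncoder()
        string embedCode = Convert.ToBase64String(Encoding.Unicode.GetBytes("hello"));

        byte[] decodedBytes = Convert.FromBase64String(embedCode); // Base64.getDecoder().decode(...)
        string decodedText = Encoding.Unicode.GetString(decodedBytes);

        // Going back to bytes: reuse the same encoding rather than Encoding.ASCII
        byte[] roundTrip = Encoding.Unicode.GetBytes(decodedText);

        Console.WriteLine(decodedText);      // hello
        Console.WriteLine(roundTrip.Length); // 10 (two bytes per char in UTF-16)
    }
}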
I use the code below to convert from Unicode to UTF-8:
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(stringResp);
byte[] utf8Bytes = Encoding.Convert(unicode, utf8, unicodeBytes);
string convertedResp = utf8.GetString(utf8Bytes, 0, utf8Bytes.Length);
But the special characters don't show; I only get their Unicode escape (\u00c1 for Á, for example). If I manually find the string "\\u00c1" and replace it with Á, it works. Does anyone know why, and how I can make the conversion automatic?
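If the response actually contains literal \uXXXX escape sequences (six characters of text rather than the encoded character), the problem isn't an encoding conversion at all; the escapes just need to be unescaped. A minimal sketch using Regex.Unescape, which understands \uXXXX sequences (the sample input is an assumption):

using System;
using System.Text.RegularExpressions;

class UnescapeDemo
{
    static void Main()
    {
        // stringResp is assumed to contain literal escapes such as "\u00c1"
        string stringResp = @"Se\u00f1or \u00c1lvarez";

        string readable = Regex.Unescape(stringResp); // turns \u00c1 into Á, etc.
        Console.WriteLine(readable);                  // Señor Álvarez
    }
}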
I get question marks in the output of my program: ?????? ??????
string str = "Привет медвед";
Encoding srcEncodingFormat = Encoding.GetEncoding("utf-16");
Encoding dstEncodingFormat = Encoding.ASCII;
byte [] originalByteString = srcEncodingFormat.GetBytes(str);
byte [] convertedByteString = Encoding.Convert(srcEncodingFormat,
    dstEncodingFormat, originalByteString);
string finalString = dstEncodingFormat.GetString(convertedByteString);
Console.WriteLine (finalString);
There is no text but encoded text. .NET's char and string already use Unicode/UTF-16, as you know, so you can simplify your code by calling GetBytes on the string directly instead of converting twice as your code does.
As for your question, you have a choice of a lossy conversion or no conversion at all. Below is code that prevents a lossy conversion.
Now, how to see the result? As with all text, it is a sequence of bytes. Your best bet is to write them to a file and open that file in an editor where you can specify the encoding and which uses a font that supports the characters you want to see.
string str = "Привет медвед";
Encoding dstEncodingFormat = Encoding.GetEncoding("US-ASCII",
    new EncoderExceptionFallback(),
    new DecoderReplacementFallback());
// GetBytes throws an EncoderFallbackException instead of silently
// replacing characters that US-ASCII cannot represent.
byte[] output = dstEncodingFormat.GetBytes(str);
File.WriteAllBytes("Test Привет медвед.txt", output);
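The lossy alternative, for comparison, is a replacement fallback (or plain Encoding.ASCII), which substitutes '?' for anything outside ASCII; that is exactly the ?????? output in the question. A small sketch:

Encoding lossy = Encoding.GetEncoding("US-ASCII",
    new EncoderReplacementFallback("?"),
    new DecoderReplacementFallback());
byte[] lossyOutput = lossy.GetBytes("Привет медвед"); // every character becomes '?'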
I have a unicode string, let's say "U+660E", and I want to display the corresponding character, which in this case is 明. See this page (ctrl-F to find 明).
My code so far:
string unicodeString = reader.GetString(0);
unicodeString = unicodeString.Trim();
Encoding codepage = Encoding.GetEncoding(950);
Encoding unicode = Encoding.Unicode;
byte[] encodedBytes = codepage.GetBytes(unicodeString);
//unicodeString = Encoding.Convert(codepage, unicode, encodedBytes).ToString();
unicodeString = unicode.GetString(encodedBytes);
richTextBox1.Text = unicodeString;
My output is "⭕㘶䔰�".
Any idea where I went wrong?
.NET deals directly with Unicode. You do not have to play encoding games. Just tell the reader whether the input is UTF-8 or UTF-16 and then deal with it as a normal string.
richTextBox1.Text = reader.GetString(0);
There's no need to convert to CP-950; C# is Unicode through-and-through. Just input and print as Unicode unless you're outputting to a file that you know has to be CP-950.
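If, on the other hand, the column literally stores the text "U+660E" rather than the character itself, the hex digits need to be parsed and turned into a character. A rough sketch, assuming that column format:

string unicodeString = reader.GetString(0).Trim();            // e.g. "U+660E"
int codePoint = int.Parse(unicodeString.TrimStart('U', '+'),
    System.Globalization.NumberStyles.HexNumber);
richTextBox1.Text = char.ConvertFromUtf32(codePoint);         // 明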
I am retrieving ASCII strings encoded with code page 437 from another system which I need to transform to Unicode so they can be mixed with other Unicode strings.
This is what I am working with:
var asciiString = "\u0094"; // 0x94 represents 'ö' in code page 437.
var asciiEncoding = Encoding.GetEncoding(437);
var unicodeEncoding = Encoding.Unicode;
// This is what I attempted to do, but it doesn't seem to support the eighth bit. Characters using the eighth bit are replaced with '?' (0x3F).
var asciiBytes = asciiEncoding.GetBytes(asciiString);
// This work-around does the job, but there must be built in functionality to do this?
//var asciiBytes = asciiString.Select(c => (byte)c).ToArray();
// This piece of code happily converts the character correctly to Unicode: { 0x94 } => { 0xF6, 0x0 }.
var unicodeBytes = Encoding.Convert(asciiEncoding, unicodeEncoding, asciiBytes);
var unicodeString = unicodeEncoding.GetString(unicodeBytes); // I want this to be 'ö'.
What I am struggling with is that I cannot find a suitable method in the .NET Framework to transform a string with character codes above 127 into a byte array. That seems strange, since there is support for transforming a byte array with values above 127 into a Unicode string.
So my question is: is there any built-in method to do this conversion properly, or is my work-around the proper way to do it?
var asciiString = "\u0094";
Whatever you name it, this will always be a Unicode string. .NET only has Unicode strings.
I am retrieving ASCII strings encoded with code page 437 from another system
Treat the incoming data as byte[], not as string.
var asciiBytes = new byte[] { 0x94 }; // 0x94 represents 'ö' in code page 437.
var asciiEncoding = Encoding.GetEncoding(437);
var unicodeString = asciiEncoding.GetString(asciiBytes);
\u0094 is Unicode code-point 0094, which is a control character; it is not ö. If you wanted ö, the correct string is
string s = "ö";
which is LATIN SMALL LETTER O WITH DIAERESIS, aka code-point 00F6.
So:
var s = "\u00F6"; // Identical to "ö"
Now we get our encoding:
var enc = Encoding.GetEncoding(437);
var bytes = enc.GetBytes(s);
And we find that it is a single byte with decimal value 148, which is hex 0x94, i.e. what you were after.
The significance here is that in C#, when you use the "\uXXXX" syntax, the XXXX always refers to a Unicode code point, not the encoded value in some particular encoding.
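A compact way to see the distinction (a small sketch; on .NET Core/.NET 5+ you may first need to call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) from the System.Text.Encoding.CodePages package):

var enc = Encoding.GetEncoding(437);
Console.WriteLine((int)'ö');                            // 246 (0x00F6, the Unicode code point)
Console.WriteLine(enc.GetBytes("ö")[0]);                // 148 (0x94, the CP437 encoded value)
Console.WriteLine(enc.GetString(new byte[] { 0x94 }));  // ö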
You have to look earlier in the code. Once you have the data as a string, it has already been decoded. Any characters lost in that decoding are impossible to get back.
You need the input as bytes, so that you can use your encoding object for code page 437 to decode it into a string.
byte[] asciiData = new byte[] { 0x94 }; // character ö in codepage 437
Encoding asciiEncoding = Encoding.GetEncoding(437);
string unicodeString = asciiEncoding.GetString(asciiData);
Console.WriteLine(unicodeString);
Output:
ö
I have a text file in which sets of Unicode escapes are written as
"'\u0641'","'\u064A','\u0649','\u0642','\u0625','\u0644','\u0627','\u0647','\u0631','\u062A','\u0643','\u0645','\u0639','\u0648','\u0623','\u0646','\u0636','\u0635','\u0633','\u0641','\u062D','\u0628','\u0650','\u064E','\u062C','\u0626"
"'\u0622'","'\u062E','\u0644','\u064A','\u0645".
I opened the file and started reading it line by line with the ReadLine method. I get the content above as a line; now I want to convert all the Unicode escapes to chars so that I get a readable string. I tried some logic but it didn't work; I am stuck converting the string "'\u0641'" to a char.
You can extract strings containing individual numbers (using Regex for example), apply Int16.Parse to each and then convert it to a char.
string num = "0641"; // replace it with extracting logic of your preference
char c = (char)Int16.Parse(num, System.Globalization.NumberStyles.HexNumber);
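Putting the pieces together, a rough sketch that pulls every \uXXXX escape out of each line of the file and builds a readable string (the file name is a placeholder):

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

class EscapeDemo
{
    static void Main()
    {
        foreach (string line in File.ReadLines("unicodes.txt")) // hypothetical file name
        {
            // Take the four hex digits of every \uXXXX escape in the line
            // and turn each one into the corresponding char.
            string readable = string.Concat(
                Regex.Matches(line, @"\\u([0-9A-Fa-f]{4})")
                     .Cast<Match>()
                     .Select(m => (char)Convert.ToInt32(m.Groups[1].Value, 16)));

            Console.WriteLine(readable);
        }
    }
}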
You could parse the line to get each Unicode escape. To convert a code point to a readable character you could do
char MyChar = '\u0058';
Hope this helps.
What if you do something like this:
string codePoints = "\u0641 \u064A \u0649 \u0642 \u0625";
UnicodeEncoding uEnc = new UnicodeEncoding();
byte[] bytesToWrite = uEnc.GetBytes(codePoints);
System.IO.File.WriteAllBytes(@"yadda.txt", bytesToWrite);
byte[] readBytes = System.IO.File.ReadAllBytes(@"yadda.txt");
string val = uEnc.GetString(readBytes);