Writing to file a UNICODE character from HEX input - c#

I have a hex input (e.g. 394A) and I need to encode it as Unicode, then save the resulting character(s) to a file. How do I go about that?
I've tried this, but it doesn't seem to work:
fsDest.Write(StrToUni(uni.ToString()), 0, 2);
private static byte[] StrToUni(string str)
{
Encoding unicode = Encoding.Unicode;
byte[] unicodeBytes = unicode.GetBytes(str);
return unicodeBytes;
}
I should see this in my file: 9J
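Given the expected output 9J, one plausible reading is that each pair of hex digits stands for one byte (0x39 is '9', 0x4A is 'J'). The following is only a sketch under that assumption; the helper name HexToBytes and the file name output.txt are hypothetical and do not come from the question.

```csharp
using System;
using System.IO;
using System.Text;

static class HexToFile
{
    // Hypothetical helper: turns "394A" into the bytes { 0x39, 0x4A }.
    public static byte[] HexToBytes(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex input must have an even number of digits.", nameof(hex));

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            // Parse two hex digits at a time as one byte.
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return bytes;
    }

    static void Main()
    {
        byte[] raw = HexToBytes("394A");
        File.WriteAllBytes("output.txt", raw);             // assumed file name; the file holds the bytes 39 4A
        Console.WriteLine(Encoding.ASCII.GetString(raw));  // prints 9J
    }
}
```

If 394A is instead meant as the single code point U+394A, casting the parsed number to char and writing it with Encoding.Unicode would produce one CJK character rather than 9J, so the two readings give different files.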

Related

c# - How to convert a converted UTF8 string to UTF16?

I'm trying to convert a converted UTF-8 string to UTF-16, because I'm going to read a file and it comes like the var strUTF8 below.
For example, the entry would be the string "NÃ£o Ã© possÃ­vel equipar" and the return I need is "Não é possível equipar".
static void Main(string[] args)
{
test3();
Console.ReadKey();
}
static void test3()
{
string str = "NÃ£o Ã© possÃ\u00ADvel equipar"; // UTF-8 bytes misread as ISO-8859-1; \u00AD is the invisible soft hyphen
string strUTF16 = Utf8ToUtf16(str);
Console.WriteLine(str);
Console.WriteLine(strUTF16);
}
static string Utf8ToUtf16(string utf8String)
{
byte[] utf8Bytes = Encoding.UTF8.GetBytes(utf8String);
byte[] unicodeBytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
return Encoding.Unicode.GetString(unicodeBytes);
}
I really don't know how to solve this. Any tips?
If the text comes from a file, read the file and specify its encoding when you do. If I'm not mistaken, UTF-8 is the default for File.ReadAllText, so reading a UTF-8 file doesn't strictly require the encoding argument. If you then want to save that text with a specific encoding, specify that encoding when writing the file:
var text = File.ReadAllText(filePath, Encoding.UTF8);
File.WriteAllText(filePath, text, Encoding.Unicode);
That effectively converts the file from UTF-8 to UTF-16. A more verbose version that works on the raw bytes (note that WriteAllBytes, unlike WriteAllText, does not add a byte-order mark) would be:
var data = File.ReadAllBytes(filePath);
var text = Encoding.UTF8.GetString(data);
data = Encoding.Unicode.GetBytes(text);
File.WriteAllBytes(filePath, data);
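The snippets above overwrite the file in place. A minimal sketch of the same conversion that writes to a separate destination, so the source survives a failed write, could look like this; the helper name Utf8FileToUtf16 and the use of temporary files are assumptions made only for the demonstration.

```csharp
using System;
using System.IO;
using System.Text;

static class FileTranscoder
{
    // Hypothetical helper: reads sourcePath as UTF-8 and writes destPath as UTF-16 (little-endian).
    public static void Utf8FileToUtf16(string sourcePath, string destPath)
    {
        string text = File.ReadAllText(sourcePath, Encoding.UTF8);
        File.WriteAllText(destPath, text, Encoding.Unicode);
    }

    static void Main()
    {
        string source = Path.GetTempFileName();
        string dest = Path.GetTempFileName();
        try
        {
            // Create a UTF-8 input file without a byte-order mark.
            File.WriteAllText(source, "Não é possível equipar", new UTF8Encoding(false));
            Utf8FileToUtf16(source, dest);

            // Reading the result back as UTF-16 should give the same text.
            Console.WriteLine(File.ReadAllText(dest, Encoding.Unicode));
        }
        finally
        {
            File.Delete(source);
            File.Delete(dest);
        }
    }
}
```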
Your Utf8ToUtf16() function is effectively a no-op. You are taking an arbitrary UTF-16 string as input, encoding it into UTF-8 bytes, then decoding those bytes as UTF-8 back into UTF-16. So, you effectively end up with the same string value you started with. You may as well have just written the following, the result would be the same:
static string Utf8ToUtf16(string utf8String)
{
return utf8String;
}
That being said, "NÃ£o Ã© possÃ­vel equipar" is what you get when the UTF-8 encoded form of "Não é possível equipar" is misinterpreted as Latin-1 (ISO-8859-1) or one of the Windows-125x code pages, instead of being properly interpreted as UTF-8 to begin with.
If you have a C# string that contains such UTF-8 bytes which were up-scaled as-is to UTF-16 (why???), then you need to down-scale those characters as-is back into 8-bit bytes, and then you can decode those bytes as UTF-8, eg:
static void test3()
{
string str = "NÃ£o Ã© possÃ\u00ADvel equipar"; // UTF-8 bytes misread as ISO-8859-1; \u00AD is the invisible soft hyphen
string strUTF16 = Utf8ToUtf16(str);
Console.WriteLine(str);
Console.WriteLine(strUTF16);
}
static string Utf8ToUtf16(string utf8String)
{
byte[] utf8Bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(utf8String); // or: GetEncoding(28591)
return Encoding.UTF8.GetString(utf8Bytes);
}
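To see why the ISO-8859-1 round trip works, it can help to produce the damage deliberately and then undo it. This is only an illustrative sketch; the names Garble and Repair are made up here, and it assumes the text was misread as ISO-8859-1 specifically.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Simulates the damage: UTF-8 bytes wrongly decoded as ISO-8859-1.
    public static string Garble(string text) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(text));

    // Reverses it: map each char back to one byte, then decode those bytes as UTF-8.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(garbled));

    static void Main()
    {
        string original = "Não é possível equipar";
        string garbled = Garble(original);
        string repaired = Repair(garbled);

        Console.WriteLine(garbled);
        Console.WriteLine(repaired == original); // True
    }
}
```

ISO-8859-1 maps every byte value 0 to 255 onto the Unicode code point with the same number, which is why no information is lost in either direction. If the text was garbled through Windows-1252 instead, that encoding would have to be used in both helpers, and on .NET Core it is, as far as I know, only available after registering the code pages encoding provider.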

Modify encoding of unicode string that comes from a Web API. Encoding.Convert doesn't work

I use the below code to convert from unicode to utf-8:
Encoding unicode = Encoding.Unicode;
Encoding UTF8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(stringResp);
byte[] UTF8Bytes = Encoding.Convert(unicode, UTF8, unicodeBytes);
string stringResp = UTF8.GetString(UTF8Bytes, 0, UTF8Bytes.Length);
But the special characters don't show, only their Unicode escape codes (\u00c1 for Á, for example). If I manually find the string ("\\u00c1") and replace it with Á, it works. Does anyone know why, and how I can make the conversion automatic?
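The symptom suggests the response contains the six literal characters \u00c1 rather than the character Á itself, and no change of byte encoding will touch that. A sketch of replacing such escapes with a regular expression follows; DecodeUnicodeEscapes is a hypothetical helper name, and the approach assumes the only escapes that matter are of the \uXXXX form.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class EscapeDecoder
{
    // Replaces each literal \uXXXX sequence with the UTF-16 code unit it names.
    public static string DecodeUnicodeEscapes(string input)
    {
        return Regex.Replace(
            input,
            @"\\u([0-9a-fA-F]{4})",
            m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }

    static void Main()
    {
        // Verbatim literal: a backslash, 'u', and four hex digits, as in the raw response.
        string stringResp = @"\u00c1gua";
        Console.WriteLine(DecodeUnicodeEscapes(stringResp) == "\u00c1gua"); // True
    }
}
```

If the response is JSON, parsing it with a JSON library would normally decode these escapes as part of reading the string values, which is usually preferable to post-processing the raw text.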

German special characters put into Byte array

I'm writing an encryption algorithm right now and I need to encrypt German words as well. So I have to handle characters like ü, ä or ö.
Inside I've got a function:
private static byte[] getBytesArray(string data)
{
byte[] array;
System.Text.ASCIIEncoding asciiEncoding = new System.Text.ASCIIEncoding();
array = asciiEncoding.GetBytes(data);
return array;
}
But when data is "ü", the byte returned in the array is 63 (i.e. "?"). How can I get the byte for ü?
I also tried:
private static byte[] MyGetBytesArray(string data)
{
byte[] array;
System.Text.ASCIIEncoding asciiEncoding = new System.Text.ASCIIEncoding();
Encoding enc = new UTF8Encoding(true, true);
array = enc.GetBytes(data);
return array;
}
but in this case I get 2 bytes in array: 195 and 188.
In your first example, replace System.Text.ASCIIEncoding with System.Text.UTF8Encoding and rename the encoding object accordingly. ASCII only covers code points 0 to 127, so it cannot represent German umlauts and substitutes "?" (63) for them; you have to use some other encoding, and UTF-8 seems to be the best choice here. The two bytes 195 and 188 from your second attempt are the correct UTF-8 encoding of ü.
See the documentation for ASCIIEncoding and UTF8Encoding.
You can use this:
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// This is our Unicode string:
string s_unicode = "abcéabc";
// Convert a string to utf-8 bytes.
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);
// Convert utf-8 bytes to a string.
string s_unicode2 = System.Text.Encoding.UTF8.GetString(utf8Bytes);
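To make the difference between the encodings concrete, the sketch below prints the bytes for "ü" under ASCII, UTF-8 and ISO-8859-1. The helper name Dump is hypothetical; ISO-8859-1 is included only as an example of a single-byte encoding that does contain ü, not as a recommendation from the answers above.

```csharp
using System;
using System.Linq;
using System.Text;

static class UmlautBytes
{
    // Returns the encoded bytes as space-separated decimal numbers.
    public static string Dump(Encoding encoding, string text) =>
        string.Join(" ", encoding.GetBytes(text).Select(b => b.ToString()));

    static void Main()
    {
        string u = "\u00FC"; // ü

        Console.WriteLine(Dump(Encoding.ASCII, u));                     // 63
        Console.WriteLine(Dump(Encoding.UTF8, u));                      // 195 188
        Console.WriteLine(Dump(Encoding.GetEncoding("ISO-8859-1"), u)); // 252
    }
}
```

For encryption, the number of bytes per character does not matter as long as the same encoding is used to turn the decrypted bytes back into a string.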

Conversion of text to unicode strings

I have to process JSON files that look like this:
\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c<\/b> \u043d\u0430\u0447
Unfortunately, I'm not sure what this encoding is called.
I would like to convert it to .NET Unicode strings. What's the easiest way to do it?
These are Unicode escape sequences for Russian (Cyrillic) letters.
Try simply putting this line in Visual Studio and the compiler will parse it:
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c";
Or, if you want to convert this string to another encoding, for example UTF-8, try this code:
static void Main()
{
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c</b> \u043d\u0430\u0447"; // "\/" is a JSON escape, not a C# one, and "\b" would be a backspace
// Create two different encodings.
Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte[].
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] utf8Bytes = Encoding.Convert(unicode, utf8, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
// This is a slightly different approach to converting to illustrate
// the use of GetCharCount/GetChars.
char[] utf8Chars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);
string utf8String = new string(utf8Chars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("UTF-8 round-tripped string: {0}", utf8String);
Console.ReadKey();
}
Code adapted from the Encoding.Convert documentation.
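When the escapes arrive at run time inside a JSON file, rather than being typed into source code, a JSON parser can decode them. The following is a sketch assuming System.Text.Json is available (.NET Core 3.0 or later, or the NuGet package); DecodeJsonString is a hypothetical helper, and it assumes the fragment contains no unescaped double quotes.

```csharp
using System;
using System.Text.Json;

static class JsonEscapeDemo
{
    // Wraps the raw fragment in quotes so that it parses as a single JSON string value.
    public static string DecodeJsonString(string rawFragment) =>
        JsonSerializer.Deserialize<string>("\"" + rawFragment + "\"");

    static void Main()
    {
        // Verbatim literal: the backslashes are kept exactly as they appear in the file.
        string raw = @"\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c<\/b> \u043d\u0430\u0447";
        string decoded = DecodeJsonString(raw);

        Console.WriteLine(decoded == "владом <b>путим</b> нач"); // True
    }
}
```

In practice the whole file would normally be deserialized in one go, and every string value in it would come back already decoded, including the \/ escape.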

C# and utf8_decode

Is there a C# utf8_decode equivalent?
Use the Encoding class.
For example:
byte[] bytes = something;
string str = Encoding.UTF8.GetString(bytes);
Yes. You can use the System.Text.Encoding class to convert the encoding.
string source = "Déjà vu";
Encoding unicode = Encoding.Unicode;
// iso-8859-1 <- codepage 28591
Encoding latin1 = Encoding.GetEncoding(28591);
Byte[] result = Encoding.Convert(unicode, latin1, unicode.GetBytes(source));
// result contains the byte sequence for the latin1 encoded string
edit: or simply
string source = "Déjà vu";
Byte[] latin1 = Encoding.GetEncoding(28591).GetBytes(source);
A string (System.String) is always Unicode: if you convert the byte sequence back to a string (Encoding.GetString()), your data will again be stored as UTF-16 code units.
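As a worked illustration of the two directions discussed above, the sketch below decodes a hard-coded UTF-8 byte sequence into a string and then re-encodes that string as ISO-8859-1. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Decodes UTF-8 bytes into a .NET string (held internally as UTF-16).
    public static string FromUtf8(byte[] utf8Bytes) => Encoding.UTF8.GetString(utf8Bytes);

    // Re-encodes a string as ISO-8859-1 (code page 28591), one byte per character.
    public static byte[] ToLatin1(string text) => Encoding.GetEncoding(28591).GetBytes(text);

    static void Main()
    {
        // "Déjà vu" encoded as UTF-8: é is C3 A9, à is C3 A0.
        byte[] utf8Bytes = { 0x44, 0xC3, 0xA9, 0x6A, 0xC3, 0xA0, 0x20, 0x76, 0x75 };

        string text = FromUtf8(utf8Bytes);
        byte[] latin1Bytes = ToLatin1(text);

        Console.WriteLine(text.Length);                        // 7
        Console.WriteLine(BitConverter.ToString(latin1Bytes)); // 44-E9-6A-E0-20-76-75
    }
}
```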
If your input is a string, here is a method that would probably work (assuming you're from Western Europe :)
public string Utf8Decode(string inputData)
{
// Encode the string as UTF-8, then read each resulting byte as one ISO-8859-1 character.
return Encoding.GetEncoding("iso-8859-1").GetString(Encoding.UTF8.GetBytes(inputData));
}
Of course, if the current encoding of inputData is not Latin-1, change "iso-8859-1" to the correct encoding.
I tried this implementation in Xamarin (C#).
The code below worked for me:
public static string Utf8Encode(string inputData)
{
// Encode as UTF-8, then expose each byte as one ISO-8859-1 character.
byte[] bytes = Encoding.UTF8.GetBytes(inputData);
return Encoding.GetEncoding("iso-8859-1").GetString(bytes, 0, bytes.Length);
}
public static string Utf8Decode(string inputData)
{
// Map each character back to a single byte, then decode those bytes as UTF-8.
byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(inputData);
return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
}
