German special characters put into Byte array - c#

I'm writing an encryption algorithm right now and I also need to encrypt German words, so I have to handle characters such as ü, ä, and ö.
Inside I've got a function:
private static byte[] getBytesArray(string data)
{
byte[] array;
System.Text.ASCIIEncoding asciiEncoding = new System.Text.ASCIIEncoding();
array = asciiEncoding.GetBytes(data);
return array;
}
But when data is "ü", the byte returned in the array is 63 (that is, "?"). How can I get the byte for ü instead?
I also tried:
private static byte[] MyGetBytesArray(string data)
{
byte[] array;
System.Text.ASCIIEncoding asciiEncoding = new System.Text.ASCIIEncoding();
Encoding enc = new UTF8Encoding(true, true);
array = enc.GetBytes(data);
return array;
}
but in this case I get 2 bytes in array: 195 and 188.

In your first example, replace System.Text.ASCIIEncoding with System.Text.UTF8Encoding and rename the encoding object accordingly. ASCII only defines code points 0 to 127, so it cannot represent German characters such as ü, ä, or ö and substitutes "?" (63) for them. You will have to use another encoding, and UTF-8 seems to be the best choice here.
Please take a look here: ASCII Encoding and here: UTF-8 Encoding
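To make the difference concrete, here is a minimal sketch comparing what three encodings return for "ü". The class and method names are made up for illustration. Note that the two bytes from the second attempt in the question, 195 and 188, are the correct UTF-8 encoding of ü. ISO-8859-1 is shown only on the assumption that exactly one byte per character is required; it covers Western European characters and nothing beyond them.

```csharp
using System.Text;

// Hypothetical helper class, not part of the original posts.
public static class UmlautBytes
{
    // ASCII covers only code points 0-127, so "ü" is replaced with '?' (63).
    public static byte[] AsAscii(string text) => Encoding.ASCII.GetBytes(text);

    // UTF-8 encodes "ü" (U+00FC) as two bytes: 0xC3 0xBC (195, 188).
    public static byte[] AsUtf8(string text) => Encoding.UTF8.GetBytes(text);

    // ISO-8859-1 (Latin-1) encodes "ü" as the single byte 0xFC (252).
    public static byte[] AsLatin1(string text) =>
        Encoding.GetEncoding("iso-8859-1").GetBytes(text);
}
```

Whichever encoding is chosen for encryption, the same one has to be used to turn the decrypted bytes back into a string.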

You can use this
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// This is our Unicode string:
string s_unicode = "abcéabc";
// Convert a string to utf-8 bytes.
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);
// Convert utf-8 bytes to a string.
string s_unicode2 = System.Text.Encoding.UTF8.GetString(utf8Bytes);

Related

Writing to file a UNICODE character from HEX input

I have a HEX input (eg. 394A) and I need to encode it as UNICODE, then save the resulting character(s) to a file. How do I go about that?
I've tried this, but it doesn't seem to work.
fsDest.Write(StrToUni(uni.ToString()), 0, 2);
private static byte[] StrToUni(string str)
{
Encoding unicode = Encoding.Unicode;
byte[] unicodeBytes = unicode.GetBytes(str);
return unicodeBytes;
}
I should see this in my file: 9J
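The expected output "9J" suggests that each pair of hex digits is meant as one raw byte (0x39 is "9" and 0x4A is "J" in ASCII), not as a UTF-16 code unit. Under that assumption, a minimal sketch could parse the hex string into bytes and write them unchanged. The class and method names are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper class, not part of the original question.
public static class HexInput
{
    // Parses an even-length hex string such as "394A" into raw bytes (0x39, 0x4A).
    public static byte[] ParseHex(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex string must have an even number of digits.", nameof(hex));

        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            // Each pair of hex digits becomes one byte.
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return bytes;
    }

    // Writes the parsed bytes to the given file without any text encoding step.
    public static void WriteHexToFile(string hex, string path)
    {
        File.WriteAllBytes(path, ParseHex(hex));
    }
}
```

If the input is instead meant as a Unicode code point, the bytes written depend on the file encoding chosen, and the result would not be "9J".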

Modify encoding of unicode string that comes from a Web API. Encoding.Convert doesn't work

I use the below code to convert from unicode to utf-8:
Encoding unicode = Encoding.Unicode;
Encoding UTF8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(stringResp);
byte[] UTF8Bytes = Encoding.Convert(unicode, UTF8, unicodeBytes);
string stringResp = UTF8.GetString(UTF8Bytes, 0, UTF8Bytes.Length);
But the special characters don't show, only their Unicode escape codes (\u00c1 for Á, for example). If I manually search for the string ("\\u00c1") and replace it with Á, it works. Does anyone know why, and how I can make the conversion automatic?
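The symptom described suggests that the response contains literal backslash-u sequences (six ordinary characters each), which Encoding.Convert passes through untouched because they are plain ASCII text. Assuming that is the case, a minimal sketch of an automatic replacement could look like this. The class and method names are hypothetical, and if the response is JSON, a JSON parser is probably the better tool.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original question.
public static class UnicodeEscapes
{
    // Replaces each literal \uXXXX sequence with the UTF-16 code unit it names.
    public static string Decode(string text)
    {
        return Regex.Replace(
            text,
            @"\\u([0-9A-Fa-f]{4})",
            m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

Text without escape sequences is returned unchanged, so the method can be applied to the whole response.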

Convert UTF-8 to Chinese Simplified (GB2312)

Is there a way to convert a UTF-8 string to Chinese Simplified (GB2312) in C#? Any help is greatly appreciated.
Regards
Jyothish George
The first thing to be aware of is that there's no such thing as a "UTF-8 string" in .NET. All strings in .NET are effectively UTF-16. However, .NET provides the Encoding class to allow you to decode binary data into strings, and re-encode it later.
Encoding.Convert can convert a byte array representing text encoded with one encoding into a byte array with the same text encoded with a different encoding. Is that what you want?
Alternatively, if you already have a string, you can use:
byte[] bytes = Encoding.GetEncoding("gb2312").GetBytes(text);
If you can provide more information, that would be helpful.
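As an illustration of the Encoding.Convert route, here is a minimal sketch that re-encodes UTF-8 bytes as GB2312 bytes. It assumes .NET Core 3.0 or later (or .NET 5+), where code-page encodings such as gb2312 have to be registered through CodePagesEncodingProvider before use. The class and method names are hypothetical.

```csharp
using System.Text;

// Hypothetical helper class, not part of the original answer.
public static class Gb2312Demo
{
    // Re-encodes text stored as UTF-8 bytes into GB2312 bytes.
    public static byte[] Utf8ToGb2312(byte[] utf8Bytes)
    {
        // Assumption: running on .NET Core / .NET 5+, where gb2312 is only
        // available after registering the code-pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb2312 = Encoding.GetEncoding("gb2312");
        return Encoding.Convert(Encoding.UTF8, gb2312, utf8Bytes);
    }
}
```

The result is a byte array, not a string: the conversion only makes sense for data that is about to be written to a file, a socket, or another byte-oriented destination.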
Try this:
public string GB2312ToUtf8(string gb2312String)
{
Encoding fromEncoding = Encoding.GetEncoding("gb2312");
Encoding toEncoding = Encoding.UTF8;
return EncodingConvert(gb2312String, fromEncoding, toEncoding);
}
public string Utf8ToGB2312(string utf8String)
{
Encoding fromEncoding = Encoding.UTF8;
Encoding toEncoding = Encoding.GetEncoding("gb2312");
return EncodingConvert(utf8String, fromEncoding, toEncoding);
}
public string EncodingConvert(string fromString, Encoding fromEncoding, Encoding toEncoding)
{
byte[] fromBytes = fromEncoding.GetBytes(fromString);
byte[] toBytes = Encoding.Convert(fromEncoding, toEncoding, fromBytes);
string toString = toEncoding.GetString(toBytes);
return toString;
}
source here

Conversion of text to unicode strings

I have to process JSON files that look like this:
\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c<\/b> \u043d\u0430\u0447
Unfortunately, I'm not sure what this encoding is called.
I would like to convert it to .NET Unicode strings. What's the easiest way to do it?
These are Unicode escape sequences for characters of the Russian alphabet.
Simply put this line in Visual Studio and the compiler will parse it:
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c";
Or if you want to convert this string to another encoding, for example utf8, try this code:
static void Main()
{
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c</b> \u043d\u0430\u0447";
// Create two different encodings.
Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte[].
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] utf8Bytes = Encoding.Convert(unicode, utf8, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
// This is a slightly different approach to converting to illustrate
// the use of GetCharCount/GetChars.
char[] utf8Chars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);
string decodedString = new string(utf8Chars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Round-tripped string: {0}", decodedString);
Console.ReadKey();
}
code taken from Convert
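Since the input is JSON, and both \uXXXX and \/ are standard JSON string escapes, another option is to let a JSON parser do the decoding at run time. The sketch below assumes System.Text.Json is available (.NET Core 3.0 or later) and that the escaped text is wrapped in double quotes so it forms a valid JSON string literal. The class and method names are hypothetical.

```csharp
using System.Text.Json;

// Hypothetical helper class, not part of the original answer.
public static class JsonEscapes
{
    // Parses a JSON string literal (including its surrounding quotes) and
    // returns the decoded .NET string.
    public static string DecodeJsonString(string jsonLiteral)
    {
        using (JsonDocument doc = JsonDocument.Parse(jsonLiteral))
        {
            return doc.RootElement.GetString();
        }
    }
}
```

When the whole file is parsed as JSON, every string value comes back already decoded, so no separate unescaping step is needed.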

C# and utf8_decode

Is there a C# utf8_decode equivalent?
Use the Encoding class.
For example:
byte[] bytes = something;
string str = Encoding.UTF8.GetString(bytes);
Yes. You can use the System.Text.Encoding class to convert the encoding.
string source = "Déjà vu";
Encoding unicode = Encoding.Unicode;
// iso-8859-1 <- codepage 28591
Encoding latin1 = Encoding.GetEncoding(28591);
Byte[] result = Encoding.Convert(unicode, latin1, unicode.GetBytes(source));
// result contains the byte sequence for the latin1 encoded string
edit: or simply
string source = "Déjà vu";
Byte[] latin1 = Encoding.GetEncoding(28591).GetBytes(source);
string (System.String) is always Unicode encoded, i.e. if you convert the byte sequence back to a string (Encoding.GetString()), your data will again be stored as UTF-16 code units.
If your input is a string, here is a method that would probably work (assuming you're from Western Europe :)
public string Utf8Decode(string inputData)
{
return Encoding.GetEncoding("iso-8859-1").GetString(Encoding.UTF8.GetBytes(inputData));
}
Of course, if the current encoding of the inputData is not latin1, change the "iso-8859-1" to the correct encoding.
I tried to implement this in Xamarin (C#).
The code below worked for me:
public static string Utf8Encode(string inputDate)
{
byte[] bytes = Encoding.UTF8.GetBytes(inputDate);
return Encoding.GetEncoding("iso-8859-1").GetString(bytes,0, bytes.Length);
}
public static string Utf8Decode(string inputDate)
{
byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(inputDate);
return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
}
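To show what such a pair of methods does to actual text, here is a minimal self-contained sketch of the same idea. The class name is hypothetical, and the behaviour shown assumes the input only contains characters that survive the Latin-1 step.

```csharp
using System.Text;

// Hypothetical helper class illustrating the encode/decode pair above.
public static class Utf8RoundTrip
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Reinterprets the UTF-8 bytes of the text as Latin-1 characters,
    // so "ü" (bytes 0xC3 0xBC) becomes the two characters "Ã¼".
    public static string Encode(string text) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(text));

    // Reverses Encode: maps each character back to one byte and decodes
    // the resulting bytes as UTF-8.
    public static string Decode(string text) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(text));
}
```

Decode only restores text that went through Encode (or an equivalent step); applying it to an ordinary string with non-ASCII characters generally produces replacement characters.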
