How to convert from Hex to Unicode?

My application receives hex values from a client and converts them back to characters, usually Chinese characters, but I can't get this to work properly. My current program can convert "e5a682e4bd95313233" to "如何123", but the client actually sends "59824F55003100320033" for the same string "如何123", and my program is unable to convert that back into a string. Please help me with this.
Here is my current code:
byte[] uniMsg = null;
string msg = "59824F55003100320033";
uniMsg = StringToByteArray(msg.ToUpper());
msg = System.Text.Encoding.UTF8.GetString(uniMsg);
public static byte[] StringToByteArray(String hex)
{
    hex = hex.Replace("-", "");
    byte[] raw = new byte[hex.Length / 2];
    for (int i = 0; i < raw.Length; i++)
    {
        raw[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
    }
    return raw;
}
Appreciate any help on this. Thanks.
Solution:
I changed
msg = System.Text.Encoding.UTF8.GetString(uniMsg);
to
msg = System.Text.Encoding.BigEndianUnicode.GetString(uniMsg);
Thanks to @CodesInChaos for suggesting the encoding type.

It doesn't seem to be encoded in UTF-8. Note the 00 31 00 32 00 33 part: in UTF-8 it would be just 31 32 33. I think the hex string is UTF-16 BE, because every character takes exactly 2 bytes and the ASCII digits are padded with a leading 00. Decode your byte array as UTF-16 BE and you will get the string. You can then use it as a string, or re-encode it in any other encoding you need.
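A minimal sketch of that suggestion, assuming the hex string has an even number of digits and no separators. The names HexUtf16Demo, HexToBytes and DecodeUtf16BeHex are made up for illustration; Encoding.BigEndianUnicode is the built-in UTF-16 BE encoding.

```csharp
using System;
using System.Text;

static class HexUtf16Demo
{
    // Turns "5982..." into { 0x59, 0x82, ... }, two hex digits per byte.
    public static byte[] HexToBytes(string hex)
    {
        byte[] raw = new byte[hex.Length / 2];
        for (int i = 0; i < raw.Length; i++)
            raw[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return raw;
    }

    // Interprets the bytes as UTF-16 big endian (high byte first).
    public static string DecodeUtf16BeHex(string hex)
    {
        return Encoding.BigEndianUnicode.GetString(HexToBytes(hex));
    }

    static void Main()
    {
        // "\u5982\u4F55123" is the string 如何123 written with escapes.
        string expected = "\u5982\u4F55123";

        string fromUtf16 = DecodeUtf16BeHex("59824F55003100320033");
        Console.WriteLine(fromUtf16 == expected); // True

        // The other hex string from the question is the same text in UTF-8.
        string fromUtf8 = Encoding.UTF8.GetString(HexToBytes("e5a682e4bd95313233"));
        Console.WriteLine(fromUtf8 == expected); // True
    }
}
```

The same bytes-from-hex step is used in both cases; only the encoding passed to GetString changes.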

Related

Why do I get a different value after turning an integer into ASCII and then back to an integer?

Why do I get a different value when I turn an int value into bytes, then into ASCII, and back again?
Example:
var asciiStr = new string(Encoding.ASCII.GetChars(BitConverter.GetBytes(2000)));
var intVal = BitConverter.ToInt32(Encoding.ASCII.GetBytes(asciiStr), 0);
Console.WriteLine(intVal);
// Result: 1855

ASCII is only 7-bit - code points above 127 are unsupported. Unsupported characters are converted to ? per the docs on Encoding.ASCII:
The ASCIIEncoding object that is returned by this property might not have the appropriate behavior for your app. It uses replacement fallback to replace each string that it cannot encode and each byte that it cannot decode with a question mark ("?") character.
So 2000 decimal = D0 07 00 00 hexadecimal (little endian) = [unsupported character] [BEL character] [NUL character] [NUL character] = ? [BEL character] [NUL character] [NUL character] = 3F 07 00 00 hexadecimal (little endian) = 1855 decimal.

TL;DR: Everything's fine. But you're a victim of character replacement.
We start with 2000. Let's acknowledge, first, that this number can be represented in hexadecimal as 0x000007d0.
BitConverter.GetBytes
BitConverter.GetBytes(2000) is an array of 4 bytes, because 2000 is a 32-bit integer literal. So the 32-bit integer representation, in little endian (least significant byte first), is given by the byte sequence { 0xd0, 0x07, 0x00, 0x00 }. In decimal, those same bytes are { 208, 7, 0, 0 }.
Encoding.ASCII.GetChars
Uh oh! Problem. Here's where things likely took an unexpected turn for you.
You're asking the system to interpret those bytes as ASCII-encoded data. The problem is that ASCII uses codes from 0-127. The byte with value 208 (0xd0) doesn't correspond to any character encodable by ASCII. So what actually happens?
When decoding ASCII, if it encounters a byte that is out of the range 0-127 then it decodes that byte to a replacement character and moves to the next byte. This replacement character is a question mark ?. So the 4 chars you get back from Encoding.ASCII.GetChars are ?, BEL (bell), NUL (null) and NUL (null).
BEL is the ASCII name of the character with code 7, which traditionally elicits a beep when presented on a capable terminal. NUL (code 0) is a null character traditionally used for representing the end of a string.
new string
Now you create a string from that array of chars. In C# a string is perfectly capable of representing a NUL character within the body of a string, so your string will have two NUL chars in it. They can be represented in C# string literals with "\0", in case you want to try that yourself. A C# string literal that represents the string you have would be "?\a\0\0". Did you know that the BEL character can be represented with the escape sequence \a? Many people don't.
Encoding.ASCII.GetBytes
Now you begin the reverse journey. Your string is composed entirely of characters in the ASCII range. The encoding of a question mark is code 63 (0x3F), the BEL is 7, and the NUL is 0, so the bytes are { 0x3f, 0x07, 0x00, 0x00 }. Surprised? Well, you're encoding a question mark now where before you provided a 208 (0xd0) byte that was not representable with ASCII encoding.
BitConverter.ToInt32
Converting these four bytes back to a 32-bit integer gives the integer 0x0000073f, which, in decimal, is 1855.
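The whole journey can be reproduced in a few lines. This is only an illustrative sketch: the class and method names are made up, and the byte dumps in the comments assume a little-endian machine.

```csharp
using System;
using System.Text;

static class AsciiRoundTripDemo
{
    // int -> bytes -> ASCII string -> bytes -> int, printing both byte arrays.
    public static int RoundTripThroughAscii(int value)
    {
        byte[] original = BitConverter.GetBytes(value);
        string asciiStr = Encoding.ASCII.GetString(original);   // bytes above 127 become '?'
        byte[] reencoded = Encoding.ASCII.GetBytes(asciiStr);   // '?' is encoded as 0x3F

        Console.WriteLine("{0} -> {1}",
            BitConverter.ToString(original),
            BitConverter.ToString(reencoded));

        return BitConverter.ToInt32(reencoded, 0);
    }

    static void Main()
    {
        // Little endian: D0-07-00-00 -> 3F-07-00-00
        Console.WriteLine(RoundTripThroughAscii(2000)); // 1855

        // Every byte of 100 is in the ASCII range, so nothing is replaced.
        Console.WriteLine(RoundTripThroughAscii(100));  // 100
    }
}
```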

String encoding (ASCII, UTF8, SHIFT_JIS, etc.) is designed to pigeonhole human language into a binary (byte) form. It isn't designed to store arbitrary binary data, such as the binary form of an integer.
While your binary data will be interpreted as a string, some of the information will be lost, meaning that storing binary data in this way will fail in the general case. You can see the point where this fails using the following code:
for (int i = 0; i <= 255; ++i) // cover every possible byte value, including 255
{
    var byteData = new byte[] { (byte)i };
    var stringData = System.Text.Encoding.ASCII.GetString(byteData);
    var encodedAsBytes = System.Text.Encoding.ASCII.GetBytes(stringData);
    Console.WriteLine("{0} vs {1}", i, (int)encodedAsBytes[0]);
}
As you can see, it starts off well because all of the character codes correspond to ASCII characters, but once we get up in the numbers (i.e. 128 and beyond), we start to require more than 7 bits to store the binary value. At this point it ceases to be decoded correctly, and we start seeing 63 come back instead of the input value.
Ultimately you will have this problem encoding binary data using any string encoding. You need to choose an encoding method specifically meant for storing binary data as a string.
Two popular methods are:
Hexadecimal
Base64 using ToBase64String and FromBase64String
Hexadecimal example (using the hex methods here):
int initialValue = 2000;
Console.WriteLine(initialValue);
// Convert from int to bytes and then to hex
byte[] bytesValue = BitConverter.GetBytes(initialValue);
string stringValue = ByteArrayToString(bytesValue);
Console.WriteLine("As hex: {0}", stringValue); // outputs D0070000
// Convert from hex to bytes and then to int
byte[] decodedBytesValue = StringToByteArray(stringValue);
int intValue = BitConverter.ToInt32(decodedBytesValue, 0);
Console.WriteLine(intValue);
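The hexadecimal example relies on two helpers, ByteArrayToString and StringToByteArray, that are not shown in this answer. One possible way to write them is sketched below; the bodies are an assumption for illustration, not necessarily the linked implementation, and the wrapping class name HexHelpers is made up.

```csharp
using System;

static class HexHelpers
{
    // { 0xD0, 0x07, 0x00, 0x00 } -> "D0070000"
    public static string ByteArrayToString(byte[] bytes)
    {
        // BitConverter.ToString produces "D0-07-00-00"; drop the dashes.
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    // "D0070000" -> { 0xD0, 0x07, 0x00, 0x00 }
    public static byte[] StringToByteArray(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex string must have an even number of digits.", "hex");

        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    static void Main()
    {
        byte[] sample = { 0xD0, 0x07, 0x00, 0x00 };
        string hex = ByteArrayToString(sample);
        Console.WriteLine(hex);                                        // D0070000
        Console.WriteLine(ByteArrayToString(StringToByteArray(hex)));  // D0070000
    }
}
```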
Base64 example:
int initialValue = 2000;
Console.WriteLine(initialValue);
// Convert from int to bytes and then to base64
byte[] bytesValue = BitConverter.GetBytes(initialValue);
string stringValue = Convert.ToBase64String(bytesValue);
Console.WriteLine("As base64: {0}", stringValue); // outputs 0AcAAA==
// Convert from base64 to bytes and then to int
byte[] decodedBytesValue = Convert.FromBase64String(stringValue);
int intValue = BitConverter.ToInt32(decodedBytesValue, 0);
Console.WriteLine(intValue);
P.S. If you simply wanted to convert your integer to a string (e.g. "2000") then you can simply use .ToString():
int initialValue = 2000;
string stringValue = initialValue.ToString();

How to convert From Hex To Dump in C#

I convert my hex to a dump to get special characters such as symbols, but when I try to convert "0x18" I get "\u0018". Can anyone give me a solution for this?
Here is my code:
public static string FromHexDump(string sText)
{
    Int32 lIdx;
    string prValue ="" ;
    for (lIdx = 1; lIdx < sText.Length; lIdx += 2)
    {
        string prString = "0x" + Mid(sText, lIdx, 2);
        string prUniCode = Convert.ToChar(Convert.ToInt64(prString,16)).ToString();
        prValue = prValue + prUniCode;
    }
    return prValue;
}
I used the VB language. I have a database that stores my password as encrypted text, and the value looks like BAA37D40186D. I loop over it in steps of 2, which gives 0xBA,0xA3,0x7D,0x40,0x18,0x6D, and the VB result looks like this: º£}@m

You can use this code:
var myHex = '\x0633';
string formattedString = string.Format(@"\x{0:x4}", (int)myHex);
Or you can use this code from MSDN (https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/types/how-to-convert-between-hexadecimal-strings-and-numeric-types):
string hexValues = "48 65 6C 6C 6F 20 57 6F 72 6C 64 21";
string[] hexValuesSplit = hexValues.Split(' ');
foreach (string hex in hexValuesSplit)
{
    // Convert the number expressed in base-16 to an integer.
    int value = Convert.ToInt32(hex, 16);
    // Get the character corresponding to the integral value.
    string stringValue = Char.ConvertFromUtf32(value);
    char charValue = (char)value;
    Console.WriteLine("hexadecimal value = {0}, int value = {1}, char value = {2} or {3}",
        hex, value, stringValue, charValue);
}

The question is unclear - what is the database column's type? Does it contain 6 bytes, or 12 characters with the hex encoding of the bytes? In any case, this has nothing to do with special characters or encodings.
First, 0x18 is the byte value of the Cancel Character in the Latin 1 codepage, not the pound sign. That's 0xA3. It seems that the byte values in the question are just the Latin 1 bytes for the string in hex.
.NET strings are Unicode (UTF-16LE specifically). There is no such thing as a UTF-8 string or a Latin 1 string. Encodings and code pages only apply when converting bytes to strings or vice versa. This is done with the Encoding class, e.g. Encoding.GetBytes and Encoding.GetString.
In this case, this code will convert the bytes to the expected string form, including the unprintable character:
var dbBytes=new byte[] {0xBA,0xA3,0x7D,0x40,0x18,0x6D};
var latinEncoding=Encoding.GetEncoding(1252);
var result=latinEncoding.GetString(dbBytes);
The result is:
º£}@m
With the Cancel character between @ and m.
If the database column contains the byte values as strings :
it takes double the required space and
the hex values have to be converted back to bytes before converting to strings
The x format specifier converts numbers or bytes to their hex form. For each byte value, ToString("x2") returns a two-digit hex string; a plain "x" would drop the leading zero for values below 0x10 and break the two-characters-per-byte layout.
The hex string can be produced from the original buffer with :
var dbBytes=new byte[] {0xBA,0xA3,0x7D,0x40,0x18,0x6D};
var hexString=String.Join("",dbBytes.Select(c=>c.ToString("x2")));
There are many questions that show how to parse a hex string into a byte array. I'll just steal Jared Parsons' LINQ answer :
public static byte[] StringToByteArray(string hex) {
    return Enumerable.Range(0, hex.Length)
        .Where(x => x % 2 == 0)
        .Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
        .ToArray();
}
With that, we can parse the hex string into a byte array and convert it to the original string :
var bytes=StringToByteArray(hexString);
var latinEncoding=Encoding.GetEncoding(1252);
var result=latinEncoding.GetString(bytes);
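Putting the pieces together, a self-contained sketch of the hex round trip might look like this. Note one deliberate substitution: it uses ISO-8859-1 (Latin 1) via Encoding.GetEncoding("iso-8859-1") instead of code page 1252, on the assumption that 1252 may need an extra encoding provider on newer .NET runtimes. For the six bytes in the question the two code pages decode identically. The class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

static class Latin1HexDemo
{
    // Bytes -> lower-case hex, always two digits per byte.
    public static string ToHex(byte[] bytes)
    {
        return string.Concat(bytes.Select(b => b.ToString("x2")));
    }

    // Hex -> bytes, two digits at a time (upper or lower case).
    public static byte[] FromHex(string hex)
    {
        return Enumerable.Range(0, hex.Length / 2)
            .Select(i => Convert.ToByte(hex.Substring(i * 2, 2), 16))
            .ToArray();
    }

    // Hex -> bytes -> string, reading each byte as one Latin 1 character.
    public static string DecodeLatin1Hex(string hex)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetString(FromHex(hex));
    }

    static void Main()
    {
        string text = DecodeLatin1Hex("BAA37D40186D");

        // Six characters: º £ } @ CANCEL (U+0018) m
        Console.WriteLine(text.Length);                         // 6
        Console.WriteLine(text == "\u00BA\u00A3}@\u0018m");     // True
        Console.WriteLine(ToHex(FromHex("BAA37D40186D")));      // baa37d40186d
    }
}
```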

First of all, you don't need a dump but Unicode. I would recommend reading about Unicode and encodings, and why this is a problem with strings.
PS: solution : StackOverflow

Convert from Hexadecimal to Text

With respect to this tool, I need to convert hexadecimal data, irrespective of their combination to equivalent text. For example:
"HelloWorld" = 48656c6c6f576f726c64;
The solution needs to take into account that hexadecimal can be grouped in different lengths:
48656c6c 6f576f72 6c64
or
48 65 6c 6c 6f 57 6f 72 6c 64
All of the hexadecimal values supplied above read as HelloWorld when converted to text.

First, I would like to point out that this question has been asked many times on the web (here is one example). However, I am going to break this down step by step for you to hopefully teach you how to not only utilize your resources available on the web, but also how to solve your problem.
Overview: Converting from hexadecimal data to text that is able to be read by human beings is a straight-forward process in modern development languages; you clean the data (ensuring no illegal characters remain), then you convert down to the byte level so that you can work with the raw data. Finally, you'll convert that raw data into readable text utilizing a method that has already been created by Microsoft.
Important: Remember, for the conversion to work, you have to ensure you're converting in the same format that you started with:
ASCII -> ASCII: Works Great!
ASCII -> UTF7: Not so much...
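To see why the formats have to match, here is a small sketch that encodes text as ASCII and then decodes the same bytes twice: once as ASCII and once as UTF-16LE (Encoding.Unicode). UTF-16LE is used here instead of UTF7 purely as an illustration of a mismatched decoder, and the helper name DecodeAsciiBytesAs is made up.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    // Encodes the text as ASCII bytes, then decodes those bytes with the given encoding.
    public static string DecodeAsciiBytesAs(string text, Encoding decoder)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return decoder.GetString(bytes);
    }

    static void Main()
    {
        // "Hi" is the two bytes 48 69.
        string matched = DecodeAsciiBytesAs("Hi", Encoding.ASCII);
        string mismatched = DecodeAsciiBytesAs("Hi", Encoding.Unicode);

        Console.WriteLine(matched == "Hi");          // True
        // UTF-16LE reads 48 69 as the single code unit U+6948.
        Console.WriteLine(mismatched == "\u6948");   // True
        Console.WriteLine(mismatched.Length);        // 1
    }
}
```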
Removing Illegal Characters: One of the first things you'll need to do is ensure the hexadecimal value that you're supplying doesn't contain any illegal characters. The simplest way to do this is to create an array of acceptable characters and then remove anything but these in a loop:
private static string GetCleanHex(string hex) {
    string legalCharacters = "0123456789ABCDEF";
    string result = hex.ToUpper();
    // foreach walks the original upper-cased string, so reassigning result is safe.
    foreach (char c in result) {
        if (legalCharacters.IndexOf(c) < 0)
            result = result.Replace(c.ToString(), string.Empty);
    }
    return result;
}
Getting The Byte Array: Once you've cleaned out all illegal characters, you can now convert your hexadecimal string into a byte array. This is required to convert from hexadecimal to ASCII. This step was provided by the linked post above:
private static byte[] GetBytesFromHex(string hex) {
    // Two hex digits make one byte.
    byte[] bytes = new byte[hex.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
        bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
    return bytes;
}
Converting To Text: Now that you've cleaned your data, and converted it to a byte[], you can now convert that byte data into ASCII. This can be done using a method available in Encoding.ASCII called GetString:
string text = Encoding.ASCII.GetString(bytes);
The Final Result: Plug all of this into your application and you'll have successfully converted hexadecimal data into clean, readable text:
string hex = GetCleanHex("506c 65 61736520 72 656164 20686f77 2074 6f 2061 73 6b 2e");
byte[] bytes = GetBytesFromHex(hex);
string text = Encoding.ASCII.GetString(bytes);
Console.WriteLine(text);
Console.ReadKey();
The code above will print the following text to the console:
Please read how to ask.

char[] array to textfile, including non-printing control characters

I have a rather tricky question (at least for me). I am currently coding a small and simple encryption and decryption program that works with polyalphabetic substitution.
This is the Encryption function:
static string Encr(string plainText, string key)
{
    char[] chars = new char[plainText.Length];
    int h = 0;
    for (int i = 0; i < plainText.Length; i++)
    {
        if (h == key.Length)
            h = 0;
        int j = plainText[i] + key[h];
        chars[i] = (char)j;
        h++;
    }
    StreamWriter sw = new StreamWriter(FILE_NAME, false, Encoding.Unicode);
    for (int x = 0; x < plainText.Length; x++)
    {
        sw.Write(chars[x]);
    }
    sw.Close();
    return new string(chars);
}
It's working fine. My problem is that the output file created by StreamWriter contains additional unwanted 00s (due to the Unicode encoding) and two totally wrong values at the beginning, also due to the Unicode encoding.
http://abload.de/img/unbenannt-16ilx4.jpg
(Sorry, I can't post images directly because I have less than 10 rep.)
FF FE 8A 00 A6 00 CC 00 A4 00 B0 00
The non-zero bytes after FF FE (8A, A6, CC, A4, B0) are the correct ones. The FF FE at the beginning is completely useless for my encryption/decryption, and the 00s are unwanted. I know this is standard Unicode encoding; my question is how to avoid this encoding but still be able to display the corresponding Unicode characters.
I hope it is clear what I want to achieve: I would like to write out the characters only.
So in this particular case the hex view of the encrypted file would look like this:
8A A6 CC A4 B0, encoded as Unicode UTF-8 according to http://unicode-table.com/, so the corresponding letters would be ¦Ì¤°
I have failed so far in all of my attempts to solve this. The solution is probably really easy though...

The characters at the start are the 2 bytes of the encoding preamble, which identify the encoding that is used. I think you would be better off converting to bytes and writing the bytes to a general binary file without any encoding.
As in:
static string Encr(string plainText, string key)
{
    char[] chars = new char[plainText.Length];
    int h = 0;
    for (int i = 0; i < plainText.Length; i++)
    {
        if (h == key.Length)
            h = 0;
        int j = plainText[i] + key[h];
        chars[i] = (char)j;
        h++;
    }
    // WriteAllBytes writes the raw bytes only, with no preamble.
    File.WriteAllBytes(FILE_NAME, System.Text.Encoding.UTF8.GetBytes(chars));
    return new string(chars);
}

Is there a reason not to use Encoding.UTF8?

What you have appears to be working, but it's not really working. When you have encrypted the character codes, it doesn't make sense to look at them as character codes any more, they are just numerical values. Most values will correspond to some character, but not all.
The correct output from the encryption would be an array of 16 bit integers, not a string of characters. You can then turn the numbers into some textual representation if you like, but you can't use them as character codes and reliably get the same result back when decrypting.
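A rough sketch of that idea, treating the cipher output as 16-bit numbers rather than characters. Everything here is an illustrative assumption (the class and method names, wrapping the sum modulo 65536, and storing each value as two bytes in the machine's native byte order); it is not the asker's program.

```csharp
using System;
using System.IO;

static class SubstitutionSketch
{
    // Adds the key to the text, keeping the results as plain 16-bit numbers.
    // The key must not be empty.
    public static ushort[] Encrypt(string plainText, string key)
    {
        var cipher = new ushort[plainText.Length];
        for (int i = 0; i < plainText.Length; i++)
            cipher[i] = unchecked((ushort)(plainText[i] + key[i % key.Length]));
        return cipher;
    }

    // Subtracts the key again; the wrap-around undoes the one in Encrypt.
    public static string Decrypt(ushort[] cipher, string key)
    {
        var chars = new char[cipher.Length];
        for (int i = 0; i < cipher.Length; i++)
            chars[i] = unchecked((char)(cipher[i] - key[i % key.Length]));
        return new string(chars);
    }

    // Copies the numbers into a byte array (2 bytes each, native byte order).
    public static byte[] ToBytes(ushort[] values)
    {
        var bytes = new byte[values.Length * 2];
        Buffer.BlockCopy(values, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    public static ushort[] FromBytes(byte[] bytes)
    {
        var values = new ushort[bytes.Length / 2];
        Buffer.BlockCopy(bytes, 0, values, 0, values.Length * 2);
        return values;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // No text encoding is involved, so no preamble is written.
            File.WriteAllBytes(path, ToBytes(Encrypt("Hello", "key")));
            string back = Decrypt(FromBytes(File.ReadAllBytes(path)), "key");
            Console.WriteLine(back); // Hello
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Because the file holds numbers and not text, it should be inspected with a hex viewer rather than opened as a text document.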

Weird behavior when converting byte from text box to byte array to characters?

I have a textbox that I use to convert things like:
74 00 65 00 73 00 74 00
Back into a string. The above says "test", but for some reason when I click the convert button it displays only the first letter "t" (74 00). Other byte arrays work just as expected: the entire text is converted.
Here are the two snippets I have tried, both of which fail to convert the entire byte array back to a word:
byte[] bArray = ByteStrToByteArray(iSequence.Text);
ASCIIEncoding enc = new ASCIIEncoding();
string word = enc.GetString(bArray);
iResult.Text = word + Environment.NewLine;
which uses the function:
private byte[] ByteStrToByteArray(string byteString)
{
    byteString = byteString.Replace(" ", string.Empty);
    byte[] buffer = new byte[byteString.Length / 2];
    for (int i = 0; i < byteString.Length; i += 2)
        buffer[i / 2] = (byte)Convert.ToByte(byteString.Substring(i, 2), 16);
    return buffer;
}
another way I was using is:
string str = iSequence.Text.Replace(" ", "");
byte[] bArray = Enumerable.Range(0, str.Length)
    .Where(x => x % 2 == 0)
    .Select(x => Convert.ToByte(str.Substring(x, 2), 16))
    .ToArray();
ASCIIEncoding enc = new ASCIIEncoding();
string word = enc.GetString(bArray);
iResult.Text = word + Environment.NewLine;
I tried checking the lengths to see whether it was iterating through everything, and it was.
I don't really know how to debug why this happens with the above byte array. All the other byte arrays seem to work just fine; only this one outputs just its first letter.
Have I done something wrong that could produce this behavior somehow?
What could I try in order to find out what is wrong?

If you have the byte sequence
var bytes = new byte[] { 0x74, 0x00, 0x65, 0x00, 0x73, 0x00, 0x74, 0x00 };
and you decode it to a string using ASCII encoding (Encoding.ASCII), then you get
var result = Encoding.ASCII.GetString(bytes);
// result == "\x74\x00\x65\x00\x73\x00\x74\x00" == "t\0e\0s\0t\0"
Notice the Null \0 characters? When you display such a string in a textbox, only the part of the string until the first Null character is displayed.
Since you say the result should read "test", the input is actually not encoded in ASCII but in UTF-16LE (Encoding.Unicode).
var result = Encoding.Unicode.GetString(bytes);
// result == "\u0074\u0065\u0073\u0074" == "test"
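A small console sketch makes the difference visible without a textbox. The class and field names are made up for illustration; the byte values are the ones from the question.

```csharp
using System;
using System.Text;

static class NullBytesDemo
{
    // The bytes from the question: "test" in UTF-16LE.
    public static readonly byte[] Sample = { 0x74, 0x00, 0x65, 0x00, 0x73, 0x00, 0x74, 0x00 };

    static void Main()
    {
        string asAscii = Encoding.ASCII.GetString(Sample);
        string asUtf16 = Encoding.Unicode.GetString(Sample);

        // ASCII maps every byte to one char, so the 00 bytes become NUL chars.
        Console.WriteLine(asAscii.Length);        // 8
        Console.WriteLine(asAscii.IndexOf('\0')); // 1

        // UTF-16LE reads two bytes per char, so no NUL chars appear.
        Console.WriteLine(asUtf16.Length);        // 4
        Console.WriteLine(asUtf16);               // test
    }
}
```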

You're converting a Unicode string to ASCII, and you're not specifying which code page on your machine to convert from.
System.Text.Encoding.GetEncoding("codepage").GetString()
if my memory serves me correctly. Also note that every control in .NET is Unicode, so what you're trying to put in the text box (if the conversion isn't correct) could be an end-of-line character, an EOF, or any other kind of control character. It all depends on your code page.

I tried debugging the first program using breakpoints in VS2010. I found out that the line
string word = enc.GetString(bArray);
output word as "t\0e\0s\0t".
The last line
iResult.Text = word + Environment.NewLine;
gives iResult.Text as simply "t".
So I was thinking since \0 is not a valid escape sequence, the compiler ignored everything after it. Could be wrong though but try removing all occurrences of 00 in the input string.
I'm not really into C#. I'm only suggesting this because it looks like C++.

It works for me:
string outputText = "t\0e\0s\0t";
outputText = outputText.Replace("\0", " ");
