UTF-16 Encoding in Java versus C#

I am trying to read a String using the UTF-16 encoding scheme and perform MD5 hashing on it. But strangely, Java and C# return different results when I try to do it.
The following is the piece of code in Java:
public static void main(String[] args) {
    String str = "preparar mantecado con coca cola";
    try {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        digest.update(str.getBytes("UTF-16"));
        byte[] hash = digest.digest();
        String output = "";
        for (byte b : hash) {
            output += Integer.toString((b & 0xff) + 0x100, 16).substring(1);
        }
        System.out.println(output);
    } catch (Exception e) {
    }
}
The output for this is: 249ece65145dca34ed310445758e5504
The following is the piece of code in C#:
public static string GetMD5Hash()
{
    string input = "preparar mantecado con coca cola";
    System.Security.Cryptography.MD5CryptoServiceProvider x = new System.Security.Cryptography.MD5CryptoServiceProvider();
    byte[] bs = System.Text.Encoding.Unicode.GetBytes(input);
    bs = x.ComputeHash(bs);
    System.Text.StringBuilder s = new System.Text.StringBuilder();
    foreach (byte b in bs)
    {
        s.Append(b.ToString("x2").ToLower());
    }
    string output = s.ToString();
    Console.WriteLine(output);
    return output; // the method is declared as string, so return the hex digest
}
The output for this is: c04d0f518ba2555977fa1ed7f93ae2b3
I am not sure why the outputs are not the same. How do we change the code above so that both of them return the same output?

UTF-16 != UTF-16.
In Java, getBytes("UTF-16") returns a big-endian representation with an optional byte-order mark. C#'s System.Text.Encoding.Unicode.GetBytes returns a little-endian representation. I can't check your code from here, but I think you'll need to specify the conversion precisely.
Try getBytes("UTF-16LE") in the Java version.
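Alternatively, if you want C# to reproduce the Java output instead, you can build the same byte sequence by hand. A minimal C# sketch (assuming Java's "UTF-16" charset emits a big-endian byte-order mark, 0xFE 0xFF, before the big-endian payload):
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

class Utf16HashSketch
{
    static string Md5Hex(byte[] data)
    {
        using (var md5 = MD5.Create())
        {
            return string.Concat(md5.ComputeHash(data).Select(b => b.ToString("x2")));
        }
    }

    static void Main()
    {
        string input = "preparar mantecado con coca cola";

        // UTF-16LE without a BOM: what Encoding.Unicode produces,
        // and what Java's getBytes("UTF-16LE") should produce as well.
        byte[] utf16Le = Encoding.Unicode.GetBytes(input);

        // Big-endian UTF-16 with a leading BOM: an approximation of Java's getBytes("UTF-16").
        byte[] utf16BeWithBom = new byte[] { 0xFE, 0xFF }
            .Concat(Encoding.BigEndianUnicode.GetBytes(input))
            .ToArray();

        Console.WriteLine(Md5Hex(utf16Le));        // expected to match the C# output above
        Console.WriteLine(Md5Hex(utf16BeWithBom)); // expected to match the Java output above
    }
}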

The first thing I can find, and this might not be the only problem, is that C#'s Encoding.Unicode.GetBytes() is little-endian, while Java's default UTF-16 encoding is big-endian.

You could use System.Text.Encoding.Unicode.GetString(byte[]) to convert back from bytes to a string. That way you can be sure everything happens in the same Unicode encoding.

Related

Cannot get same hash in C# as in python

I have a string that I need to hash in order to access an API. The API creator has provided a code snippet in Python, which hashes the string like this:
hashed_string = hashlib.sha1(string_to_hash).hexdigest()
When using this hashed string to access the API, everything is fine. I have tried to get the same hashed string in C#, but without success. I have tried many ways, but nothing has worked so far. I am aware of the hexdigest part as well and have kept that in mind when trying to mimic the behaviour.
Does anyone know how to get the same result in C#?
EDIT:
This is one of the many ways I have tried to reproduce the same result in C#:
public string Hash(string input)
{
    using (SHA1Managed sha1 = new SHA1Managed())
    {
        var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(input));
        var sb = new StringBuilder(hash.Length * 2);
        foreach (byte b in hash)
        {
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString().ToLower();
    }
}
This code is taken from: Hashing with SHA1 Algorithm in C#
Another way
public string ToHexString(string myString)
{
    HMACSHA1 hmSha1 = new HMACSHA1();
    Byte[] hashMe = new ASCIIEncoding().GetBytes(myString);
    Byte[] hmBytes = hmSha1.ComputeHash(hashMe);
    StringBuilder hex = new StringBuilder(hmBytes.Length * 2);
    foreach (byte b in hmBytes)
    {
        hex.AppendFormat("{0:x2}", b);
    }
    return hex.ToString();
}
This code is taken from: Python hmac and C# hmac
EDIT 2
Some input/output:
C# (using second method provided in above description)
input: callerId1495610997apiKey3*_&E#N#B1)O)-1Y
output: 1ecded2b66e152f0965adb96727d96b8f5db588a
Python
input: callerId1495610997apiKey3*_&E#N#B1)O)-1Y
output: bf11a12bbac84737a39152048e299fa54710d24e
C# (using first method provided in above description)
input: callerId1495611935​apiKey{[B{+%P)s;WD5&5x
output: 7e81e0d40ff83faf1173394930443654a2b39cb3
Python
input: callerId1495611935​apiKey{[B{+%P)s;WD5&5x
output: 512158bbdbc78b1f25f67e963fefdc8b6cbcd741
C#:
public static string Hash(string input)
{
    using (SHA1Managed sha1 = new SHA1Managed())
    {
        var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(input));
        var sb = new StringBuilder(hash.Length * 2);
        foreach (byte b in hash)
        {
            sb.Append(b.ToString("x2")); // "x2" is lowercase
        }
        return sb.ToString().ToLower();
    }
}

public static void Main()
{
    var x = "callerId1495611935​apiKey{[B{+%P)s;WD5&5x";
    Console.WriteLine(Hash(x)); // prints 7e81e0d40ff83faf1173394930443654a2b39cb3
}
Python
import hashlib
s = u'callerId1495611935​apiKey{[B{+%P)s;WD5&5x'
enc = s.encode('utf-8')  # encode as UTF-8
hash = hashlib.sha1(enc)
formatted = hash.hexdigest()
print(formatted)  # prints 7e81e0d40ff83faf1173394930443654a2b39cb3
Your main problem is that you are using different encodings for the same string in C# and Python. Use UTF-8 in both languages and the same casing, and the output is the same.
Note that inside your input string (between callerId1495611935 and apiKey{[B{+%P)s;WD5&5x) there is a hidden \u200b (zero-width space) character. That's why encoding your string as UTF-8 gives a different result than encoding it as ASCII. Does that character have to be inside your string?
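If you want to confirm the hidden character is really there, a small C# sketch along these lines (the input is the question's string with the zero-width space written out as an escape) makes it visible:
using System;
using System.Linq;
using System.Text;

class HiddenCharCheck
{
    static void Main()
    {
        string input = "callerId1495611935\u200BapiKey{[B{+%P)s;WD5&5x";

        // UTF-8 encodes U+200B as three bytes; ASCII replaces it with a single '?',
        // which is why the two encodings (and therefore the hashes) differ.
        Console.WriteLine(Encoding.UTF8.GetBytes(input).Length);
        Console.WriteLine(Encoding.ASCII.GetBytes(input).Length);

        // List any characters outside the ASCII range so hidden ones are easy to spot.
        foreach (char c in input.Where(c => c > 127))
        {
            Console.WriteLine($"U+{(int)c:X4}");
        }
    }
}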

How to match output form MD5 hash string in C# and in Ruby?

Here is the C# code:
public static string GetMD5Hash(string input)
{
    System.Security.Cryptography.MD5CryptoServiceProvider x = new System.Security.Cryptography.MD5CryptoServiceProvider();
    byte[] bs = System.Text.Encoding.UTF8.GetBytes(input);
    bs = x.ComputeHash(bs);
    System.Text.StringBuilder s = new System.Text.StringBuilder();
    foreach (byte b in bs)
    {
        s.Append(b.ToString("x2").ToLower());
    }
    return s.ToString();
}
Here is the Ruby code:
def getMD5Hash(str)
  bs = Digest::MD5.digest(str.encode('UTF-8')).bytes.to_a
  bs = bs.map { |b| b.to_s(16).downcase }
  str_bs = bs.join
  return str_bs
end
When I run the Ruby code and the C# code to hash the same string, the Ruby result is not the same as the C# result.
How do I modify the Ruby code? Thanks a lot.
I'm not a Ruby programmer, but there is something wrong with how you are converting to hex. It looks like values such as '0a' are rendered as 'a', making the output incorrect. Ruby already has a method for this, Digest::MD5.hexdigest, so I'm not sure why anyone would roll their own.
I would write the Ruby function as:
def getMD5Hash(str)
  return Digest::MD5.hexdigest(str.encode('UTF-8'))
end
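Incidentally, the C# code above only avoids the same pitfall because it formats with "x2"; a tiny C# sketch of the difference:
byte b = 0x0a;
Console.WriteLine(b.ToString("x"));  // "a"  - the leading zero is dropped
Console.WriteLine(b.ToString("x2")); // "0a" - always two hex digits per byte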

C# ByteString to ASCII String

I am looking for a smart way to convert a string of hex byte values into a string of 'real text' (ASCII characters).
For example, I have the word "Hello" written in hexadecimal ASCII: 48 45 4C 4C 4F. Using some method, I want to get the ASCII text back (in this case "Hello").
// I have this hex string (it encodes "Hello") and want to convert it back to "Hello".
string strHexa = "48454C4C4F";
// I want to convert strHexa to an ASCII string.
string strResult = ConvertToASCII(strHexa);
I am sure there is a framework method. If this is not the case of course I could implement my own method.
Thanks!
var str = Encoding.UTF8.GetString(SoapHexBinary.Parse("48454C4C4F").Value); //HELLO
PS: SoapHexBinary is in the System.Runtime.Remoting.Metadata.W3cXsd2001 namespace.
I am sure there is a framework method.
As a single framework method: no.
However, the second part of this (converting a byte array containing ASCII-encoded text into a .NET string, which is UTF-16 encoded Unicode) does exist: System.Text.ASCIIEncoding, specifically its GetString method:
string result = Encoding.ASCII.GetString(byteArray); // Encoding.ASCII is a ready-made ASCIIEncoding instance
The first part is easy enough to do yourself: take two hex digits at a time, parse them as hex, and store the resulting byte in an array. Something like:
byte[] HexStringToByteArray(string input)
{
    Debug.Assert(input.Length % 2 == 0, "Must have two digits per byte");
    var res = new byte[input.Length / 2];
    for (var i = 0; i < input.Length / 2; i++)
    {
        var h = input.Substring(i * 2, 2);
        res[i] = Convert.ToByte(h, 16);
    }
    return res;
}
Edit: Note that L.B.'s answer identifies a method in .NET that does the first part more easily; it is a better approach than writing it yourself (while it lives in a somewhat obscure namespace, it is implemented in mscorlib, so no additional reference is needed).
StringBuilder sb = new StringBuilder();
for (int i = 0; i < hexStr.Length; i += 2)
{
    string hs = hexStr.Substring(i, 2);
    sb.Append((char)Convert.ToByte(hs, 16)); // cast to char, otherwise the byte's numeric value is appended
}
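On newer runtimes (.NET 5 and later), assuming that's an option, the hex-to-bytes step is built in, so the whole conversion reduces to a couple of lines:
string strHexa = "48454C4C4F";
byte[] bytes = Convert.FromHexString(strHexa);      // requires .NET 5 or later
string strResult = Encoding.ASCII.GetString(bytes); // "HELLO"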

Ignore Zero in Calculate Hash by HMACSHA256

I use the Crypto-JS v2.5.3 (hmac.min.js) library (http://code.google.com/p/crypto-js/) to calculate a client-side hash, and the script is:
$("#PasswordHash").val(Crypto.HMAC(Crypto.SHA256, $("#pwd").val(), $("#PasswordSalt").val(), { asByte: true }));
This returns something like this:
b3626b28c57ea7097b6107933c6e1f24f586cca63c00d9252d231c715d42e272
Then in Server side I use the following code to calculate hash:
private string CalcHash(string PlainText, string Salt)
{
    string result = "";
    ASCIIEncoding enc = new ASCIIEncoding();
    byte[]
        baText2BeHashed = enc.GetBytes(PlainText),
        baSalt = enc.GetBytes(Salt);
    System.Security.Cryptography.HMACSHA256 hasher = new HMACSHA256(baSalt);
    byte[] baHashedText = hasher.ComputeHash(baText2BeHashed);
    result = string.Join("", baHashedText.ToList().Select(b => b.ToString("x")).ToArray());
    return result;
}
and this method returned:
b3626b28c57ea797b617933c6e1f24f586cca63c0d9252d231c715d42e272
As you can see, some zero characters are missing from the server-side result. Where is the problem? Is there a fault in my server-side method? I just need these two values to be the same for the same string and salt.
As you can see, some zero characters are missing from the server-side result. Where is the problem?
Here - your conversion to hex in C#:
b => b.ToString("x")
If b is 10, that will just give "a" rather than "0a".
Personally I'd suggest a simpler hex conversion:
return BitConverter.ToString(baHashedText).Replace("-", "").ToLowerInvariant();
(You could just change "x" to "x2" instead, to specify a length of 2 characters, but it's still a somewhat roundabout way of performing a bytes-to-hex conversion.)
Everyone else keeps recommending things like BitConverter with the "-" separators trimmed, or ToString("x2"). There is a better solution: a class that has been in .NET since 1.1, SoapHexBinary.
using System.Runtime.Remoting.Metadata.W3cXsd2001;

public byte[] StringToBytes(string value)
{
    SoapHexBinary soapHexBinary = SoapHexBinary.Parse(value);
    return soapHexBinary.Value;
}

public string BytesToString(byte[] value)
{
    SoapHexBinary soapHexBinary = new SoapHexBinary(value);
    return soapHexBinary.ToString();
}
This will produce the hex format you want (note that SoapHexBinary.ToString() returns uppercase hex, so add ToLowerInvariant() if you need lowercase).
I believe the problem is here:
result = string.Join("", baHashedText.ToList().Select(b => b.ToString("x")).ToArray());
change it to:
result = string.Join("", baHashedText.ToList().Select(b => b.ToString("x2")).ToArray());
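Putting the fix together, here is a sketch of the corrected server-side method (same structure as the question's CalcHash, just with the two-digit format):
private string CalcHash(string plainText, string salt)
{
    ASCIIEncoding enc = new ASCIIEncoding();
    byte[] textBytes = enc.GetBytes(plainText);
    byte[] saltBytes = enc.GetBytes(salt);

    using (var hasher = new HMACSHA256(saltBytes))
    {
        byte[] hash = hasher.ComputeHash(textBytes);
        // "x2" pads every byte to two hex digits, so 0x0a becomes "0a" instead of "a".
        return string.Join("", hash.Select(b => b.ToString("x2")));
    }
}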

How can I convert a string of characters into a binary string and back again?

I need to convert a string into its binary equivalent and keep it in a string, then turn it back into its ASCII equivalent.
You can encode a string into a byte-wise representation by using an Encoding, e.g. UTF-8:
var str = "Out of cheese error";
var bytes = Encoding.UTF8.GetBytes(str);
To get back a .NET string object:
var strAgain = Encoding.UTF8.GetString(bytes);
// str == strAgain
You seem to want the representation as a series of '1' and '0' characters; I'm not sure why you do, but that's possible too:
var binStr = string.Join("", bytes.Select(b => Convert.ToString(b, 2).PadLeft(8, '0'))); // pad to 8 digits so the string can be split back into bytes
Encodings take an abstract string (in the sense that strings are an opaque representation of a series of Unicode code points) and map it into a concrete series of bytes. The bytes are meaningless (again, because they're opaque) without the encoding. But, with the encoding, they can be turned back into a string.
You seem to be mixing up "ASCII" with strings; ASCII is simply an encoding that deals only with code points below 128. If you have a string containing an 'é', for example, it has no ASCII representation, and so most definitely cannot be represented using a series of ASCII bytes, even though it can exist peacefully in a .NET string object.
See this article by Joel Spolsky for further reading.
You can use these functions to convert to binary and restore it back:
public static string BinaryToString(string data)
{
    List<Byte> byteList = new List<Byte>();
    for (int i = 0; i < data.Length; i += 8)
    {
        byteList.Add(Convert.ToByte(data.Substring(i, 8), 2));
    }
    return Encoding.ASCII.GetString(byteList.ToArray());
}
and for converting a string to binary:
public static string StringToBinary(string data)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in data.ToCharArray())
    {
        sb.Append(Convert.ToString(c, 2).PadLeft(8, '0'));
    }
    return sb.ToString();
}
Hope this helps.
First convert the string into bytes, as described in my comment and in Cameron's answer; then iterate, convert each byte into an 8-digit binary number (possibly with Convert.ToString, padding appropriately), then concatenate. For the reverse direction, split by 8 characters, run through Convert.ToInt16, build up a byte array, then convert back to a string with GetString.
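A sketch of that round trip, using UTF-8 instead of ASCII so characters outside the ASCII range survive (the helper names are only illustrative):
static string TextToBinary(string text)
{
    byte[] bytes = Encoding.UTF8.GetBytes(text);
    // Pad each byte to 8 digits so the result can be split back apart.
    return string.Concat(bytes.Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));
}

static string BinaryToText(string bits)
{
    byte[] bytes = new byte[bits.Length / 8];
    for (int i = 0; i < bytes.Length; i++)
    {
        bytes[i] = Convert.ToByte(bits.Substring(i * 8, 8), 2);
    }
    return Encoding.UTF8.GetString(bytes);
}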
