Below is the code I use for hashing and I am trying to store the hash in a string(strSHA256) with UTF-8 encoding.
using (SHA256 sha256Hash = SHA256.Create())
{
bytes = sha256Hash.ComputeHash(Encoding.UTF8.GetBytes(utf8EncodedString));
strSHA256 = Encoding.UTF8.GetString(bytes);
}
When I inspect the string variable I see "�u��Ou\u0004����O��zA�\u0002��\0�\a�}\u0012�L�Dr�"
type of value. My question is regarding to ? character marked in bold. They should be UTF-8 chars.
Why cant I view the UTF-8 special chars when I inspect from visual studio. The string o/p contains this weird char as well.
Note: I tried SHA256 with Node Js and I tend to see perfect UTF-8 special chars.
As #user2864740 mentioned bytes are not UTF8. Use base64 for bytes string representation.
var hello = "Hello world!";
using (var sha256Hash = SHA256.Create())
{
var bytes = sha256Hash.ComputeHash(Encoding.UTF8.GetBytes(hello));
// String representation
var stringForBytes = Convert.ToBase64String(bytes);
// HEX string representation
var str = new StringBuilder(bytes.Length * 2);
foreach (var b in bytes)
{
str.Append(b.ToString("X2"));
}
var hexString = str.ToString();
}
Related
I have a string that I need to hash in order to access an API. The API-creator has provided a codesnippet in Python, which hashes the code like this:
hashed_string = hashlib.sha1(string_to_hash).hexdigest()
When using this hashed string to access the API, everything is fine. I have tried to get the same hashed string result in C#, but without success. I have tried incredibly many ways but nothing has worked so far. I am aware about the hexdigest part aswell and I have kept that in mind when trying to mimic the behaviour.
Does anyone know how to get the same result in C#?
EDIT:
This is one of the many ways I have tried to reproduce the same result in C#:
public string Hash(string input)
{
using (SHA1Managed sha1 = new SHA1Managed())
{
var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(input));
var sb = new StringBuilder(hash.Length * 2);
foreach (byte b in hash)
{
sb.Append(b.ToString("X2"));
}
return sb.ToString().ToLower();
}
}
This code is taken from: Hashing with SHA1 Algorithm in C#
Another way
public string ToHexString(string myString)
{
HMACSHA1 hmSha1 = new HMACSHA1();
Byte[] hashMe = new ASCIIEncoding().GetBytes(myString);
Byte[] hmBytes = hmSha1.ComputeHash(hashMe);
StringBuilder hex = new StringBuilder(hmBytes.Length * 2);
foreach (byte b in hmBytes)
{
hex.AppendFormat("{0:x2}", b);
}
return hex.ToString();
}
This code is taken from: Python hmac and C# hmac
EDIT 2
Some input/output:
C# (using second method provided in above description)
input: callerId1495610997apiKey3*_&E#N#B1)O)-1Y
output: 1ecded2b66e152f0965adb96727d96b8f5db588a
Python
input: callerId1495610997apiKey3*_&E#N#B1)O)-1Y
output: bf11a12bbac84737a39152048e299fa54710d24e
C# (using first method provided in above description)
input: callerId1495611935apiKey{[B{+%P)s;WD5&5x
output: 7e81e0d40ff83faf1173394930443654a2b39cb3
Python
input: callerId1495611935apiKey{[B{+%P)s;WD5&5x
output: 512158bbdbc78b1f25f67e963fefdc8b6cbcd741
C#:
public static string Hash(string input)
{
using (SHA1Managed sha1 = new SHA1Managed())
{
var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(input));
var sb = new StringBuilder(hash.Length * 2);
foreach (byte b in hash)
{
sb.Append(b.ToString("x2")); // x2 is lowercase
}
return sb.ToString().ToLower();
}
}
public static void Main()
{
var x ="callerId1495611935apiKey{[B{+%P)s;WD5&5x";
Console.WriteLine(Hash(x)); // prints 7e81e0d40ff83faf1173394930443654a2b39cb3
}
Python
import hashlib
s = u'callerId1495611935apiKey{[B{+%P)s;WD5&5x'
enc = s.encode('utf-8') # encode in utf8
hash = hashlib.sha1(enc)
formatted = h.hexdigest()
print(formatted) # prints 7e81e0d40ff83faf1173394930443654a2b39cb3
Your main problem is that you are using different encodings for the same string in C# and Python. Use UTF8 in both languages and use the same casing. The output is the same.
Note that inside your input string (between callerId1495611935 and apiKey{[B{+%P)s;WD5&5x) there is an hidden \u200b character. That's why encoding your string in UTF-8 gives a different result than encoding it using ASCII. Does that character have to be inside your string?
my goal is to convert a .NET string (Unicode) into Windows-1252 and - if necessary - store the original UTF-8 string in a Base64 entity.
For example, the string "DJ Doena" converted to 1252 is still "DJ Doena".
However if you convert the Japanese kanjii for tree (木) into 1251 you end up with a question mark.
These are my test strings:
String doena = "DJ Doena";
String umlaut = "äöüßéèâ";
String allIn = "< ä ß á â & 木 >";
This is how I convert the string in the first place:
using (MemoryStream ms = new MemoryStream())
{
using (StreamWriter sw = new StreamWriter(ms, Encoding.UTF8))
{
sw.Write(decoded);
sw.Flush();
ms.Seek(0, SeekOrigin.Begin);
using (StreamReader sr = new StreamReader(ms, Encoding.GetEncoding(1252)))
{
encoded = sr.ReadToEnd();
}
}
}
Problem is, while debugging string comparison claims that both are indeed identical, so a simple == or .Equals() doesn't suffice.
This is how I try to find out if I need base64 and produce it:
private static String GetBase64Alternate(String utf8Text, String windows1252Text)
{
Byte[] utf8Bytes;
Byte[] windows1252Bytes;
String base64;
utf8Bytes = Encoding.UTF8.GetBytes(utf8Text);
windows1252Bytes = Encoding.GetEncoding(1252).GetBytes(windows1252Text);
base64 = null;
if (utf8Bytes.Length != windows1252Bytes.Length)
{
base64 = Convert.ToBase64String(utf8Bytes);
}
else
{
for(Int32 i = 0; i < utf8Bytes.Length; i++)
{
if(utf8Bytes[i] != windows1252Bytes[i])
{
base64 = Convert.ToBase64String(utf8Bytes);
break;
}
}
}
return (base64);
}
The first string doena is completely identical and doesn't produce a base64 result
Console.WriteLine(String.Format("{0} / {1}", windows1252Text, base64Text));
results in
DJ Doena /
But the second string umlauts already has twice the bytes in UTF-8 than in 1252 and thus produces an Base64 string even though it does not appear to be necessary:
äöüßéèâ / w6TDtsO8w5/DqcOow6I=
And the third one does what it's supposed to do (no more "木" but a "?", thus base64 needed):
< ä ß á â & ? > / PCDDpCDDnyDDoSDDoiAmIOacqCA+
Any clues how my Base64 getter could be enhanced a) for performance b) for better results?
Thank you in advance. :-)
I'm not sure I completely understood the question. But I tried. :) If I do understand correctly, this code does what you want:
static void Main(string[] args)
{
string[] testStrings = { "DJ Doena", "äöüßéèâ", "< ä ß á â & 木 >" };
foreach (string text in testStrings)
{
Console.WriteLine(ReencodeText(text));
}
}
private static string ReencodeText(string text)
{
Encoding encoding = Encoding.GetEncoding(1252);
string text1252 = encoding.GetString(encoding.GetBytes(text));
return text.Equals(text1252, StringComparison.Ordinal) ?
text : Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
}
I.e. it encodes the text to Windows-1252, then decodes back to a string object, which it then compares with the original. If the comparison succeeds, it returns the original string, otherwise it encodes it to UTF8, and then to base64.
It produces the following output:
DJ Doena
äöüßéèâ
PCDDpCDDnyDDoSDDoiAmIOacqCA+
In other words, the first two strings are left intact, while the third is encoded as base64.
In your first code you are encoding the string using one encoding, then decoding it using a different encoding. That doesn't give you any reliable result at all; it's the equivalent of writing out a number in octal, then reading it as if it was in decimal. It seems to work just fine for numbers up to 7, but after that you get useless results.
The problem with the GetBase64Alternate method is that it's encoding a string to two different encodings, and assumes that the first encoding doesn't support some of the characters if the second encoding resulted in a different set of bytes.
Comparing the byte sequences doesn't tell you whether any of the encodings failed. The sequences will be different if it failed, but it will also be different if there are any characters that are encoded differently between the encodings.
What you want to do is to determine if the encoding actually worked for all characters. You can do that by creating an Encoding instance with a fallback for unsupported characters. There is an EncoderExceptionFallback class that you can use for that, which throws an EncoderFallbackException if it's called.
This code will try use the Windows-1252 encoding on a string, and sets the ok variable to false if the encoding doesn't support all characters in the string:
Encoding e = Encoding.GetEncoding(1252, new EncoderExceptionFallback(), new DecoderExceptionFallback());
bool ok = true;
try {
e.GetByteCount(allIn);
} catch (EncoderFallbackException) {
ok = false;
}
As you are not actually going to used the encoded result for anything, you can use the GetByteCount method. It will check how all characters would be encoded without producing the encoded result.
Used in your method it would be:
private static String GetBase64Alternate(string text) {
Encoding e = Encoding.GetEncoding(1252, new EncoderExceptionFallback(), new DecoderExceptionFallback());
bool ok = true;
try {
e.GetByteCount(allIn);
} catch (EncoderFallbackException) {
ok = false;
}
return ok ? null : Convert.ToBase64(Encoding.UTF8.GetBytes(text));
}
I have simple encrypt function which takes string, convert it to bytes, xor it and apply base64.
JAVA:
String key = "1234";
public String encrypt(String plainText) throws Exception {
byte [] input = plainText.getBytes("UTF8");
byte [] output = new byte[input.length];
for(int i=0; i<input.length; i++)
output[i] = ((byte)(input[i] ^ key.charAt(i % key.length())));
String utf8 = new String(output);
return Utils.encode(utf8);
}
Then I save it to a file and open it in another application in C# using this decrypting method:
C#:
string key="1234";
public string Decrypt(string CipherText)
{
var decoded = System.Convert.FromBase64String(CipherText);
var dexored = xor(decoded, key);
return Encoding.UTF8.GetString(dexored);
}
byte[] xor(byte[] text, string key)
{
byte[] res = new byte[text.Length];
for (int c = 0; c < text.Length; c++)
{
res[c] = (byte)((uint)text[c] ^ (uint)key[c % key.Length]);
}
return res;
}
Problem is that accented characters like ěščřžýáí fail to decode.
Do you have any idea how to determine from which part the problem comes from or how to find it out? Looks to me that it has something to do with UTF-8.
I don't need suggestions for a better encryption. I have already working AES but I want to switch to xored base64 due to performance issues.
Thank you.
This is the problem:
String utf8 = new String(output);
return Utils.encode(utf8);
You should be using base64 here on output, rather than constructing a string out of the now-not-really-text data. It's possible that Utils.encode performs base64 encoding, having converted the input string back to byte for some reason - but fundamentally you shouldn't be constructing a string with your encrypted bytes using the String constructor.
If your Utils class has an encode(byte[]) method - and if that really does base64-encode the data (it's very frustrating to only have half of the code you're using) then you can just use:
return Utils.encode(output);
I have to process JSON files that looks like this:
\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c<\/b> \u043d\u0430\u0447
Unfortunately, I'm not sure how this encoding is called.
I would like to convert it to .NET Unicode strings. What's the easies way to do it?
This is Unicode characters for Russian alphabet.
try simply put this line in VisualStudio and it will parse it.
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c";
Or if you want to convert this string to another encoding, for example utf8, try this code:
static void Main()
{
string unicodeString = "\u0432\u043b\u0430\u0434\u043e\u043c <b>\u043f\u0443\u0442\u0438\u043c<\b> \u043d\u0430\u0447";
// Create two different encodings.
Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte[].
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] utf8Bytes = Encoding.Convert(unicode, utf8, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
// This is a slightly different approach to converting to illustrate
// the use of GetCharCount/GetChars.
char[] asciiChars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
Console.ReadKey();
}
code taken from Convert
I am taking the MD5 hash of an image file and I want to use the hash as a filename.
How do I convert the hash to a string that is valid filename?
EDIT: toString() just gives "System.Byte[]"
How about this:
string filename = BitConverter.ToString(yourMD5ByteArray);
If you prefer a shorter filename without hyphens then you can just use:
string filename =
BitConverter.ToString(yourMD5ByteArray).Replace("-", string.Empty);
System.Convert.ToBase64String
As a commenter pointed out -- normal base 64 encoding can contain a '/' character, which obivously will be a problem with filenames. However, there are other characters that are usable, such as an underscore - just replace all the '/' with an underscore.
string filename = Convert.ToBase64String(md5HashBytes).Replace("/","_");
Try this:
Guid guid = new Guid(md5HashBytes);
string hashString = guid.ToString("N");
// format: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
string hashString = guid.ToString("D");
// format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
string hashString = guid.ToString("B");
// format: {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
string hashString = guid.ToString("P");
// format: (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
This is probably the safest for file names. You always get a hex string and never worry about / or +, etc.
byte[] data = md5Hash.ComputeHash(Encoding.UTF8.GetBytes(inputString));
StringBuilder sBuilder = new StringBuilder();
for (int i = 0; i < data.Length; i++)
{
sBuilder.Append(data[i].ToString("x2"));
}
string hashed = sBuilder.ToString();
Try this:
string Hash = Convert.ToBase64String(MD5.Create().ComputeHash(Encoding.UTF8.GetBytes("sample")));
//input "sample" returns = Xo/5v1W6NQgZnSLphBKb5g==
or
string Hash = BitConverter.ToString(MD5.Create().ComputeHash(Encoding.UTF8.GetBytes("sample")));
//input "sample" returns = 5E-8F-F9-BF-55-BA-35-08-19-9D-22-E9-84-12-9B-E6
Technically using Base64 is bad if this is Windows, filenames are case insensitive (at least in explorers view).. but in base64, 'a' is different to 'A', this means that perhaps unlikely but you end up with even higher rate of collision..
A better alternative is hexadecimal like the bitconverter class, or if you can- use base32 encoding (which after removing the padding from both base64 and base32, and in the case of 128bit, will give you similar length filenames).
Try Base32 hash of MD5. It gives filename-safe case insensitive strings.
string Base32Hash(string input)
{
byte[] buf = MD5.Create().ComputeHash(Encoding.UTF8.GetBytes(input));
return String.Join("", buf.Select(b => "abcdefghijklmonpqrstuvwxyz234567"[b & 0x1F]));
}