Possible Duplicate:
Reversing an MD5 Hash
Given this method in C#:
public string CalculateFileHash(string filePaths) {
    var csp = new MD5CryptoServiceProvider();
    var pathBytes = csp.ComputeHash(Encoding.UTF8.GetBytes(filePaths));
    // Only the first 8 of the 16 digest bytes are kept here.
    return BitConverter.ToUInt64(pathBytes, 0).ToString();
}
How would one reverse this process with a "DecodeFileHash" method?
var fileQuery = "fileone.css,filetwo.css,file3.css";
var hashedQuery = CalculateFileHash(fileQuery); // e.g. "23948759234"
var decodedQuery = DecodeFileHash(hashedQuery); // "fileone.css,filetwo.css,file3.css"
where decodedQuery == fileQuery in the end.
Is this even possible? If it isn't possible, would there be any way to generate a hash that I could easily decode?
Edit: So just to be clear, I just want to compress the variable "fileQuery" and decompress fileQuery to determine what it originally was. Any suggestions for solving that problem since hashing/decoding is out?
Edit Again: just doing a base64 encode/decode sounds like the optimal solution then.
public string EncodeTo64(string toEncode) {
var toEncodeAsBytes = Encoding.ASCII.GetBytes(toEncode);
var returnValue = System.Convert.ToBase64String(toEncodeAsBytes);
return returnValue;
}
public string DecodeFrom64(string encodedData) {
var encodedDataAsBytes = System.Convert.FromBase64String(encodedData);
var returnValue = Encoding.ASCII.GetString(encodedDataAsBytes);
return returnValue;
}
Impossible. By definition and design hashes cannot be reverted to plain text or their original input.
It sounds like you are actually trying to compress the files. If that is the case, here is a simple method to do so using GZip:
public static byte[] Compress( byte[] data )
{
var output = new MemoryStream();
using ( var gzip = new GZipStream( output, CompressionMode.Compress, true ) )
{
gzip.Write( data, 0, data.Length );
gzip.Close();
}
return output.ToArray();
}
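To get the data back, the method above needs a matching decompression routine. The following is a minimal sketch under the same GZipStream approach; the class name GzipText and the Decompress helper are illustrative names, not part of the original answer.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipText
{
    // Compress a byte array with GZip (same idea as the method above).
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the final block
            return output.ToArray();
        }
    }

    // Reverse of Compress: inflate the GZip data back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Converting the string with Encoding.UTF8.GetBytes before Compress and with Encoding.UTF8.GetString after Decompress should round-trip the original text. For a string as short as the example fileQuery, the GZip header and trailer alone add 18 bytes, so the compressed form is likely to be longer than the input.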
A hash is derived from the original information, but it does not contain the original information. If you want a shorter value that hides the original information, but can be resolved to the original value your options are fairly limited:
Compress the original information. If you need a string, then your original information would have to be fairly large in order for the compressed, base-64-encoded version to not be even bigger than the original data.
Encrypt the original information - this is more secure than just compressing it, and can be combined with compression, but it's also probably going to be larger than the original information.
Store the original information somewhere and return a lookup key.
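The third option (store the value, return a lookup key) can be sketched as follows. This is only an illustration: QueryStore, Save and Lookup are hypothetical names, the store is an in-memory dictionary that is lost on restart, and the key is a truncated GUID chosen purely for brevity.

```csharp
using System;
using System.Collections.Concurrent;

public static class QueryStore
{
    private static readonly ConcurrentDictionary<string, string> Store =
        new ConcurrentDictionary<string, string>();

    // Save the original value and hand back a short opaque key.
    public static string Save(string original)
    {
        // Assumption: 12 hex characters are "short enough"; truncating a GUID
        // raises the (still small) chance of two keys colliding.
        string key = Guid.NewGuid().ToString("N").Substring(0, 12);
        Store[key] = original;
        return key;
    }

    // Resolve a key back to the original value, or null if the key is unknown.
    public static string Lookup(string key)
    {
        string value;
        return Store.TryGetValue(key, out value) ? value : null;
    }
}
```

Unlike a hash, the key carries no information about the original string; everything depends on the store still holding the entry.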
If you want to be able to get the data back, you want compression, not hashing.
What you want to do is Encrypt and Decrypt....
Not Hash and Unhash which, as @Thomas pointed out, is impossible.
Hashes are typically defeated using rainbow tables or some other data set which includes something which produces the same hash... not guaranteed to be the input value, just some value which produces the same output in the hashing algorithm.
Jeff Atwood has some good code for understanding encryption here:
http://www.codeproject.com/KB/security/SimpleEncryption.aspx
If that's useful to you
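For comparison with hashing, here is a minimal encrypt/decrypt round trip using the framework's Aes class. It is a sketch, not the code from the linked article: the class name AesSketch is made up, the caller is assumed to supply a valid key and IV (for example those generated by Aes.Create()), and there is no integrity check on the ciphertext.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class AesSketch
{
    // Encrypt UTF-8 text with AES under the given key and IV
    // (CBC mode with PKCS7 padding, the .NET defaults).
    public static byte[] Encrypt(string plainText, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var encryptor = aes.CreateEncryptor(key, iv))
        {
            byte[] input = Encoding.UTF8.GetBytes(plainText);
            return encryptor.TransformFinalBlock(input, 0, input.Length);
        }
    }

    // Decrypt with the same key and IV to recover the original text.
    public static string Decrypt(byte[] cipherText, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var decryptor = aes.CreateDecryptor(key, iv))
        {
            byte[] output = decryptor.TransformFinalBlock(cipherText, 0, cipherText.Length);
            return Encoding.UTF8.GetString(output);
        }
    }
}
```

The point of the contrast: Decrypt returns the original input given the key, whereas nothing comparable exists for a hash.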
A cryptographic hash is by definition not reversible with typical amounts of computational power. It is usually not even feasible to find any input that has the same hash as your original input.
Recovering the original input is mathematically impossible in general once there are more than 2^n possible inputs, where n is the bit length of the hash (128 for MD5): by the pigeonhole principle, at least two of those inputs must share a hash.
A hash is not a lossless compression function.
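The pigeonhole argument is easier to see with a deliberately tiny hash. The sketch below keeps only the first byte of an MD5 digest (256 possible values) and tries 257 distinct inputs, so two of them must collide. PigeonholeDemo and the file-name pattern are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class PigeonholeDemo
{
    // A deliberately tiny "hash": the first byte of the MD5 digest (256 possible values).
    public static byte TinyHash(string input)
    {
        using (var md5 = MD5.Create())
        {
            return md5.ComputeHash(Encoding.UTF8.GetBytes(input))[0];
        }
    }

    // Try "file0.css", "file1.css", ... until two inputs share a tiny hash.
    // With 257 distinct inputs and 256 outputs, a collision is guaranteed.
    public static KeyValuePair<string, string> FindCollision()
    {
        var seen = new Dictionary<byte, string>();
        for (int i = 0; i <= 256; i++)
        {
            string candidate = "file" + i + ".css";
            byte h = TinyHash(candidate);
            string earlier;
            if (seen.TryGetValue(h, out earlier))
                return new KeyValuePair<string, string>(earlier, candidate);
            seen[h] = candidate;
        }
        throw new System.InvalidOperationException("unreachable: 257 inputs, 256 outputs");
    }
}
```

Given only the shared one-byte value, there is no way to tell which of the two colliding inputs produced it. The same reasoning applies to a 128-bit digest, just with far more inputs per output.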
A cryptographic hash, like MD5, is designed to be a one-way function, that is, it is computationally infeasible to derive the source data from which a given hash was computed. MD5, though, hasn't been considered secure for some time, due to weaknesses that have been discovered:
Wikipedia on MD5 Security
MD5 Considered Harmful
Another weakness in MD5 is that, due to its relatively small size, large rainbow tables have been published that let you look up a given MD5 hash to get a source input that will collide with the specified hash value.
I'd like to know if there is a standard code to generate a SHA256 hash using a key. I've come across several types of code, however, they don't generate the same output.
Code found at JokeCamp
private string CreateToken(string message, string secret)
{
secret = secret ?? "";
var encoding = new System.Text.ASCIIEncoding();
byte[] keyByte = encoding.GetBytes(secret);
byte[] messageBytes = encoding.GetBytes(message);
using (var hmacsha256 = new HMACSHA256(keyByte))
{
byte[] hashmessage = hmacsha256.ComputeHash(messageBytes);
return Convert.ToBase64String(hashmessage);
}
}
Here's another one that I found
private static string ComputeHash(string apiKey, string message)
{
var key = Encoding.UTF8.GetBytes(apiKey);
string hashString;
using (var hmac = new HMACSHA256(key))
{
var hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(message));
hashString = Convert.ToBase64String(hash);
}
return hashString;
}
The output generated by both of these differs from what is generated by http://www.freeformatter.com/hmac-generator.html#ad-output
I'll be using the SHA256 for one of our external API's where consumers would hash the data and send it to us. So we just want to make sure we use a standard approach so that they send us the correct hash. Also, I would like to know if there are any well-known nugets for this. I've also tried to find a solution with Bouncy Castle, however, I couldn't find one that uses a key to hash.
The difference is because of the character encodings (ASCII vs UTF-8 in your examples). Note that the hashing algorithm takes an array of bytes, and you do the conversion from a string to that byte-array beforehand.
Your question "what's the standard code" probably doesn't have an answer. I'd say that if you expect the input to contain only characters from the ASCII range, go for ASCII; if not, go for UTF-8. Either way, communicate the choice to your users.
If you want to look at it from a usability perspective and make it optimal for your users, go for both: hash the content both ways and check each against the user's incoming hash. It all depends on how you weigh clock cycles vs security vs usability (you can have two).
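To see where the bytes come from, here is a small sketch that makes the encoding an explicit parameter. HmacSketch, Compute and ToHex are illustrative names. Note that for input consisting only of ASCII characters, ASCII and UTF-8 produce identical bytes and therefore identical HMACs; a mismatch against another tool may instead come from the output format (hex vs Base64), which is an assumption worth checking.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HmacSketch
{
    // Compute HMAC-SHA256, converting key and message to bytes with the chosen encoding.
    public static byte[] Compute(string key, string message, Encoding encoding)
    {
        using (var hmac = new HMACSHA256(encoding.GetBytes(key)))
        {
            return hmac.ComputeHash(encoding.GetBytes(message));
        }
    }

    // Lowercase hex rendering of the MAC, as many online tools display it.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```

Calling Compute with Encoding.ASCII and with Encoding.UTF8 on a message such as "café" gives two different results, because ASCII replaces the "é" with "?" before the MAC is computed. Convert.ToBase64String and ToHex applied to the same byte array are two renderings of the same value.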
They are almost equivalent.
The difference is how the encoding for the string is established.
In the first portion of code it assumes ASCII, whereas in the second portion it assumes UTF-8. It is possible that the string used another encoding which is none of those.
But regardless of that, the idea is to understand what is the goal of this operation. The truly relevant things in this context are:
Given equal input, output should be the same
There should be no way to retrieve the plaintext only by knowing the output (within a reasonable amount of time)
After hashing, you no longer require the original input in plaintext.
A secure cryptographic hashing function (meaning not older functions like MD5) achieves that.
Then, if the data store where the hashes are kept is compromised, the attacker would only have a hash, which cannot be used to retrieve the original plaintext. This is why hashing is used rather than encryption: encryption is a reversible operation (through decryption).
Then, within the system, if you've made the decision to use one encoding, you need to keep that decision consistent across all components in your system so they can interoperate.
So, I found a question just like this with an accepted answer, so I hopped off and tried to implement the necessary changes. For some reason though, I am still getting two different strings, and I don't know what it is I'm doing wrong. I tried to comment on the accepted answer to find help, but I lack the reputation to do so. So, I figured I'd ask the question again (that question was 2 years old, too).
Let me explain what I'm doing.
In php...
$intermediatesalt = md5(uniqid(rand(), true));
$salt = substr($intermediatesalt, 0, 8);
$hashpassword = base64_encode(
hash('sha256', $salt . hash('sha256', $password), true)
);
The line that says $hashpassword was taken from the accepted answer from this question. I didn't write any of this php, my friend did. I only know enough about programming to alter the code, but I couldn't create anything in php, let alone HTML.
After the hash has been created, both the hash and the salt are stored on a database.
The C# method I'm using is also from the answer I found here.
public static string ComputeHash(string plainText, string salt)
{
    // Convert the plain text and the salt into byte arrays.
    byte[] plainTextBytes = Encoding.UTF8.GetBytes(plainText);
    byte[] saltBytes = Encoding.UTF8.GetBytes(salt);
    SHA256Managed hash = new SHA256Managed();
    // Compute the hash of the plain text (raw bytes, not hex).
    byte[] plainHash = hash.ComputeHash(plainTextBytes);
    // Concatenate salt bytes followed by the plain-text hash bytes.
    byte[] concat = new byte[plainHash.Length + saltBytes.Length];
    System.Buffer.BlockCopy(saltBytes, 0, concat, 0, saltBytes.Length);
    System.Buffer.BlockCopy(plainHash, 0, concat, saltBytes.Length, plainHash.Length);
    byte[] tHashBytes = hash.ComputeHash(concat);
    // Convert result into a base64-encoded string.
    string hashValue = Convert.ToBase64String(tHashBytes);
    // Return the result.
    return hashValue;
}
But for some bizarre reason, even though the person who asked the question got what s/he wanted, I am still getting an undesired result.
This is the block of code that loads player data and then compares the PHP-generated hashed password with the C#-generated hashed password.
// load the player based on the given email
PlayerStructure.Player newPlayer = MySQL.loadPlayer(email);
// compute a hash based on the given password and the retrieved salt
// then, compare it to the hashed password on the database
string hPassword = Program.ComputeHash(password, newPlayer.salt);
if (newPlayer.password != hPassword)
{
sendStatusMsg(index, "Invalid password.");
sendStatusMsg(index, "1: " + hPassword);
sendStatusMsg(index, "2: " + newPlayer.password);
return;
}
MySQL.loadPlayer loads the hash string and the salt string from the database. I had to use those sendStatusMsg calls to print strings because this is a server application that takes up to 15 minutes to load data from the database in debug mode, so I run the debug exe instead, ergo no Console.WriteLine calls. newPlayer.password is the hashed password stored on the database (the password created with PHP). hPassword is the computed hash using the C# method I borrowed.
The salt is e0071fa9 and the plain-text password is 'test'.
This is the result I get with the sendStatusMsg methods:
Invalid password.
1: 3QQyVEfmBN4kJJHsRQ307TCDYxNMpc4k3r3udBaVz8Y=
2: moHRVv9C0JvpdTk28xGm3uvPPuhatK2rAHXd5he4ZJI=
Any ideas as to what I might be doing incorrectly? As I've stated before, I literally just used the answer on here (borrowing the code almost verbatim) and I'm still not getting my desired result. This is the question I referenced: Why isn't my PHP SHA256 hash equivalent to C# SHA256Managed hash
Because, as the answer to the question you are linking to says, hash returns a hex-encoded string instead of raw bytes by default. You are passing true as the third parameter to override this behavior for the outer call to hash, but you are not doing the same for the inner call.
In fact why are there two hashes in the first place? The inner hash doesn't seem to serve any purpose at all.
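The effect of the missing third argument can be reproduced on the C# side. The sketch below computes the salted hash both ways: with the inner hash kept as raw bytes (what the C# ComputeHash method does) and with the inner hash turned into a lowercase hex string first (what PHP's hash() returns by default). SaltedHashSketch and its method names are illustrative.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SaltedHashSketch
{
    private static byte[] Sha256(byte[] data)
    {
        using (var sha = SHA256.Create())
        {
            return sha.ComputeHash(data);
        }
    }

    private static byte[] Concat(byte[] a, byte[] b)
    {
        var result = new byte[a.Length + b.Length];
        Buffer.BlockCopy(a, 0, result, 0, a.Length);
        Buffer.BlockCopy(b, 0, result, a.Length, b.Length);
        return result;
    }

    // Inner hash kept as 32 raw bytes before salting and hashing again.
    public static string WithRawInner(string password, string salt)
    {
        byte[] inner = Sha256(Encoding.UTF8.GetBytes(password));
        return Convert.ToBase64String(Sha256(Concat(Encoding.UTF8.GetBytes(salt), inner)));
    }

    // Inner hash rendered as 64 lowercase hex characters before salting,
    // mirroring PHP's hash() when the third argument is omitted.
    public static string WithHexInner(string password, string salt)
    {
        byte[] inner = Sha256(Encoding.UTF8.GetBytes(password));
        string hex = BitConverter.ToString(inner).Replace("-", "").ToLowerInvariant();
        return Convert.ToBase64String(Sha256(Encoding.UTF8.GetBytes(salt + hex)));
    }
}
```

The two methods feed different bytes into the outer hash (salt plus 32 raw bytes versus salt plus 64 hex characters), so their results cannot be expected to match. Both sides have to agree on one form.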
As Jon stated earlier, the php was slightly flawed. If anyone else is attempting to do something like this, know that
$hashpassword = base64_encode(
hash('sha256', $salt . hash('sha256', $password), true)
);
as opposed to
$hashpassword = base64_encode(
hash('sha256', $salt . hash('sha256', $password, true), true)
);
makes a HUGE difference. The second line of php is what did the trick.
I hope this helps!
In PHP, please avoid rolling your own hashing mechanism unless you are a security/crypto expert and, more importantly, know what you are doing.
Have a good look at how password_hash works in PHP (if you are using a PHP version that doesn't support it, please upgrade); otherwise you can use Anthony Ferrara's compatibility library to good effect:
https://github.com/ircmaxell/password_compat
If you follow his blog, you will get some hints about the issues at stake:
http://blog.ircmaxell.com/search/label/Security
:)
I am trying to connect my system to a bank's payment system. The problem is that their documentation is mostly incorrect, where it isn't a complete disaster.
In the documentation for the 3D Secure system, the bank asks me to fill out an HTML form and submit it to their system. The form should include some data and a SHA1 hash of that data. I tried many times, but the bank's system returned a "Hash not correct" error every time.
After inspecting their example C# code, I found the function they use to compute the hash. The problem is that the function does more to the data than just hash it, and I cannot work out what this piece of code does to the string being hashed.
public static string CreateHash(string notHashedStr)
{
SHA1 sha1 = new SHA1CryptoServiceProvider();
byte[] notHashedBytes = System.Text.Encoding.ASCII.GetBytes(notHashedStr);
byte[] hashedByte = sha1.ComputeHash(notHashedBytes);
string hashedStr = System.Convert.ToBase64String(hashedByte);
return hashedStr;
}
I have almost no experience with the .NET Framework, and I am on a Mac, so I cannot test the code easily, and MSDN is definitely not for me (I am a Ruby developer most of the time, and I know enough C). If anyone can explain what these functions do to the string to be hashed, I'll be very glad.
It's very simple.
Get the ASCII encoded bytes from notHashedStr.
Create a SHA1 hash from those bytes.
Convert that hash into a Base64-encoded string.
Return that Base64-SHA1-ASCII string.
I have never written any Ruby, but it should look a bit like this:
require 'digest/sha1'
# notHashedStr holds the string to hash
returnValue = Digest::SHA1.base64digest(notHashedStr)
I know there are similar questions already on SO but none of them seem to address this problem. I have inherited the following c# code that has been used to create password hashes in a legacy .net app, for various reasons the C# implementation is now being migrated to php:
string input = "fred";
SHA256CryptoServiceProvider provider = new SHA256CryptoServiceProvider();
byte[] hashedValue = provider.ComputeHash(Encoding.ASCII.GetBytes(input));
string output = "";
string asciiString = ASCIIEncoding.ASCII.GetString(hashedValue);
foreach ( char c in asciiString ) {
int tmp = c;
output += String.Format("{0:x2}",
(uint)System.Convert.ToUInt32(tmp.ToString()));
}
return output;
My php code is very simple but for the same input "fred" doesn't produce the same result:
$output = hash('sha256', "fred");
I've traced the problem down to an encoding issue - if I change this line in the C# code:
string asciiString = ASCIIEncoding.ASCII.GetString(hashedValue);
to
string asciiString = ASCIIEncoding.UTF7.GetString(hashedValue);
Then the php and C# output match (it yields d0cfc2e5319b82cdc71a33873e826c93d7ee11363f8ac91c4fa3a2cfcd2286e5).
Since I'm not able to change the .net code I need to work out how to replicate the results in php.
Thanks in advance for any help,
I don’t know PHP well enough to answer your question; however, I must point out that your C# code is broken. Try generating the hash of these two inputs: "âèí" and "çñÿ". You will find that their hash collides:
3f3b221c6c6e3f71223f51695d456d52223f243f3f363949443f3f763b483615
The first bug lies in this operation:
Encoding.ASCII.GetBytes(input)
This assumes that all characters within your input are US-ASCII. Any non-ASCII characters would cause the encoder to fall back to the byte value for the ? character, thereby giving (unwanted) hash collisions, as demonstrated above. Notwithstanding, this will not be an issue if your input is constrained to only allow US-ASCII characters.
The other (more severe) bug lies in the following operation:
ASCIIEncoding.ASCII.GetString(hashedValue)
ASCII only defines mappings for values 0–127. Since the elements of your hashedValue byte array may contain any byte value (0–255), encoding them as ASCII would cause data to be lost whenever a value greater than 127 is encountered. This may lead to further “unwanted” (read: potentially maliciously generated) hash collisions, even when your original input was US-ASCII.
Given that, statistically, half of the bytes constituting your hashes would be greater than 127, then you are losing at least half the strength of your hash algorithm. If a hacker gains access to your stored hashes, it is quite likely that they will manage to devise an attack to generate hash collisions by exploiting this cryptographic weakness.
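Both losses described above can be observed directly. This is a small sketch (AsciiLossDemo is an illustrative name) showing that non-ASCII characters and byte values above 127 all collapse to the question mark, 0x3F, when pushed through Encoding.ASCII.

```csharp
using System.Text;

public static class AsciiLossDemo
{
    // Encode text as ASCII; .NET substitutes '?' (0x3F) for any character outside 0-127.
    public static byte[] ToAsciiBytes(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }

    // Decode arbitrary bytes as ASCII and re-encode them, showing which values survive.
    public static byte[] RoundTripThroughAscii(byte[] data)
    {
        return Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data));
    }
}
```

ToAsciiBytes yields the same three 0x3F bytes for "âèí" and for "çñÿ", which is why the two inputs hash identically. RoundTripThroughAscii applied to a digest keeps bytes below 0x80 and turns every other byte into 0x3F, discarding the information those bytes carried.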
Edit: Notwithstanding the considerations mentioned in my posts and Jon’s, here is the PHP code that succumbs to the same weakness – so to speak – as your C# code, and thereby gives the same hash:
$output = hash('sha256', $input, true);
for ($i = 0; $i < strlen($output); $i++) {
    if ($output[$i] > chr(127)) {
        $output[$i] = '?';
    }
}
$output = bin2hex($output);
Could you use mb_convert_encoding (see http://php.net/manual/en/function.mb-convert-encoding.php - the page also has a link to a list of supported encodings) to convert the PHP string to ASCII from UTF7?
"I've traced the problem down to an encoding issue"
Yes. You're trying to treat arbitrary binary data as if it's valid text-encoded data. It's not. You should not be using any Encoding here.
If you want the results in hex, the simplest approach is to use BitConverter.ToString:
string text = BitConverter.ToString(hashedValue).Replace("-", "").ToLower();
And yes, as pointed out elsewhere, you probably shouldn't be using ASCII to convert the text to binary at the start of the hashing process. I'd probably use UTF-8.
It's really important that you understand the problem here though, as otherwise you'll run into it in other places too. You should only use encodings such as ASCII, UTF-8 etc (on any platform) when you've genuinely got encoded text data. You shouldn't use them for images, the results of cryptography, the results of hashing, etc.
EDIT: Okay, you say you can't change the C# code... it's not clear whether that just means you've got legacy data, or whether you need to keep using the C# code regardless. You should absolutely not run this code for a second longer than you have to.
But in PHP, you may find you can get away with just replacing every byte with a value >= 0x80 in the hash with 0x3F, which is the ASCII for "question mark". If you look through your data you'll probably find there are a lot of 3F bytes in there.
If you can get this to work, I would strongly suggest that you migrate over to the true SHA-256 hash without losing information like this. Wherever you're storing the hashes, store two: the legacy one (which is all you have now) and the rehashed one. Whenever you're asked to validate that a password is correct, you should:
Check whether you have a "new" one; if so, only use that - ignore the legacy one.
If you only have a legacy one:
Hash the password in the broken way to check whether it's correct
If it is, hash it again properly and store the results in the "new" place.
Then when everyone's logged in correctly once, you'll be able to wipe out the legacy hashes.
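The validation steps above can be sketched as follows. Everything here is illustrative: StoredPassword is a hypothetical record with the two hash fields, LegacyHash reproduces the lossy scheme by replacing every digest byte of 0x80 or above with 0x3F, and ProperHash uses a plain unsalted SHA-256 only to mirror the thread, not as a recommendation for password storage.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical storage record: NewHash stays null until the user next logs in.
public class StoredPassword
{
    public string LegacyHash;
    public string NewHash;
}

public static class HashMigration
{
    private static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }

    // Lossless SHA-256 hex digest of the UTF-8 password.
    public static string ProperHash(string password)
    {
        using (var sha = SHA256.Create())
        {
            return ToHex(sha.ComputeHash(Encoding.UTF8.GetBytes(password)));
        }
    }

    // Reproduces the lossy legacy scheme: ASCII input, digest bytes >= 0x80 become 0x3F ('?').
    public static string LegacyHash(string password)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.ASCII.GetBytes(password));
            for (int i = 0; i < hash.Length; i++)
            {
                if (hash[i] >= 0x80) hash[i] = 0x3F;
            }
            return ToHex(hash);
        }
    }

    // Prefer the new hash; otherwise check the legacy one and upgrade on success.
    public static bool Verify(string password, StoredPassword stored)
    {
        if (stored.NewHash != null)
            return stored.NewHash == ProperHash(password);
        if (stored.LegacyHash != LegacyHash(password))
            return false;
        stored.NewHash = ProperHash(password);
        return true;
    }
}
```

After a successful Verify against a legacy-only record, the record carries a new hash, and later checks ignore the legacy value. Once every record has a new hash, the LegacyHash field can be dropped.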
I'm working on a program that searches entire drives for a given file. At the moment, I calculate an MD5 hash for the known file and then scan all files recursively, looking for a match.
The only problem is that MD5 is painfully slow on large files. Is there a faster alternative that I can use while retaining a very small probability of false positives?
All code is in C#.
Thank you.
Update
I've read that even MD5 can be pretty quick and that disk I/O should be the limiting factor. That leads me to believe that my code might not be optimal. Are there any problems with this approach?
MD5 md5 = MD5.Create();
StringBuilder sb = new StringBuilder();
try
{
using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read))
{
foreach (byte b in md5.ComputeHash(fs))
sb.Append(b.ToString("X2"));
}
return sb.ToString();
}
catch (Exception)
{
return "";
}
I hope you're checking for an MD5 match only if the file size already matches.
Another optimization is to do a quick checksum of the first 1K (or some other arbitrary, but reasonably small number) and make sure those match before working the whole file.
Of course, all this assumes that you're just looking for a match/nomatch decision for a particular file.
Regardless of cryptographic requirements, the possibility of a hash collision exists, so no hashing function can be used to guarantee that two files are identical.
I wrote similar code a while back which I got to run pretty fast by indexing all the files first, and discarding any with a different size. A fast hash comparison (on part of each file) was then performed on the remaining entries (comparing bytes for this step was proved to be less useful - many file types have common headers which have identical bytes at the start of the file). Any files that were left after this stage were then checked using MD5, and finally a byte comparison of the whole file if the MD5 matched, just to ensure that the contents were the same.
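The cheap early stages of such a pipeline (size check, then a hash over a small part of the file) might look like the sketch below. FileMatcher, PrefixHash and IsCandidate are illustrative names, and hashing the first 1 KB is an assumption; as noted above, files with common headers may need a different region to discriminate well.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileMatcher
{
    // Hash only the first prefixLength bytes of a file (fewer if the file is shorter).
    public static string PrefixHash(string path, int prefixLength)
    {
        var buffer = new byte[prefixLength];
        int total = 0;
        using (var stream = File.OpenRead(path))
        {
            int read;
            while (total < prefixLength &&
                   (read = stream.Read(buffer, total, prefixLength - total)) > 0)
            {
                total += read;
            }
        }
        using (var md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(buffer, 0, total));
        }
    }

    // Cheap checks first: file size, then a hash of the first 1 KB.
    // A true result means "worth a full comparison", not "identical".
    public static bool IsCandidate(string knownPath, string otherPath)
    {
        if (new FileInfo(knownPath).Length != new FileInfo(otherPath).Length)
            return false;
        return PrefixHash(knownPath, 1024) == PrefixHash(otherPath, 1024);
    }
}
```

Only files that pass IsCandidate need the expensive full-file hash or byte comparison. In a real scan the known file's size and prefix hash would be computed once rather than on every call.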
Why not just read the file linearly? It seems pretty pointless to read the entire file, compute an MD5 hash, and then compare the hash.
Reading the file sequentially, a few bytes at a time, would allow you to discard the vast majority of files after reading, say, 4 bytes. And you'd save all the processing overhead of computing a hashing function which doesn't give you anything in your case.
If you already had the hashes for all the files in the drive, it'd make sense to compare them, but if you have to compute them on the fly, there just doesn't seem to be any advantage to the hashing.
Am I missing something here? What does hashing buy you in this case?
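A direct comparison with early exit could be sketched like this. FileCompare and ContentsEqual are illustrative names, and the 64 KB chunk size is an arbitrary assumption; the idea is only that reading stops at the first differing byte, so most non-matching files cost a single small read.

```csharp
using System.IO;

public static class FileCompare
{
    // Fill the buffer as far as possible; returns the number of bytes actually read.
    private static int ReadFull(Stream stream, byte[] buffer)
    {
        int total = 0, read;
        while (total < buffer.Length &&
               (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += read;
        }
        return total;
    }

    // Compare two files chunk by chunk, stopping at the first difference.
    public static bool ContentsEqual(string pathA, string pathB)
    {
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        var bufA = new byte[64 * 1024];
        var bufB = new byte[64 * 1024];
        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
        {
            while (true)
            {
                int readA = ReadFull(a, bufA);
                int readB = ReadFull(b, bufB);
                if (readA != readB) return false;
                if (readA == 0) return true; // both streams exhausted with no difference
                for (int i = 0; i < readA; i++)
                {
                    if (bufA[i] != bufB[i]) return false;
                }
            }
        }
    }
}
```

Unlike a hash comparison, a true result here means the contents really are identical, since every byte was compared.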
First consider what your bottleneck really is: the hash function itself, or disk access speed? If you are bound by the disk, changing the hashing algorithm won't gain you much. From your description I infer that you always scan the whole disk to find a match. Consider building an index first and then matching a given hash against the index; this will be much faster.
There is one small problem with using MD5 to compare files: there are known pairs of files which are different but have the same MD5.
This means you can use MD5 to tell if the files are different (if the MD5 is different, the files must be different), but you cannot use MD5 to tell if the files are equal (if the files are equal, the MD5 must be the same, but if the MD5 is equal, the files might or might not be equal).
You should either use a hash function which has not been broken yet (like SHA-256), or (as @SoapBox mentioned) use MD5 only as a fast way to find candidates for a deeper comparison.
References:
http://www.win.tue.nl/hashclash/SoftIntCodeSign/
Use MD5CryptoServiceProvider and BufferedStream:
using (FileStream stream = File.OpenRead(filePath))
using (var bufferedStream = new BufferedStream(stream, 1024 * 32))
using (var md5 = new MD5CryptoServiceProvider())
{
    // Hash the stream through a 32 KB buffer rather than loading the whole file into memory.
    byte[] checksum = md5.ComputeHash(bufferedStream);
    return BitConverter.ToString(checksum).Replace("-", String.Empty);
}