I want to generate a hash code for a file. Using C# I would do something like this then store the value in a database.
byte[] b = File.ReadAllBytes(@"C:\image.jpg");
string hash = ComputeHash(b);
Now, if I use, say, a Java program that implements the same hashing algorithm (MD5), can I expect the hash values to be equal to the value generated in C#? What if I execute the Java program in different environments: Windows, Linux or Mac?
Hash values are not globally unique. But that is not what you are really asking.
What you really want to know is whether a hashing algorithm (such as MD5) will produce the same hash value for identical files on different operating system platforms. The answer to that is "yes" ... provided that files are byte-for-byte identical.
In the case of a binary format, that should be the case. In the case of text files, transcoding between character encodings or changing line-termination sequences will make the files different at the byte level and result in different MD5 hash values.
Hash values generated from the same input and with the same algorithm are defined to be equal. 1+1=2, regardless of the programming language I program this in.
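To make the byte-for-byte point concrete, here is a minimal C# sketch. The class and method names (HashDemo, Md5Hex) are made up for illustration; MD5.Create and ComputeHash are the standard System.Security.Cryptography calls. The hex formatting (lowercase, no separators) is just one convention I picked.

```csharp
using System;
using System.Security.Cryptography;

public static class HashDemo
{
    // Hash raw bytes with MD5 and return the digest as lowercase hex.
    public static string Md5Hex(byte[] data)
    {
        using (var md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(data);
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Feeding it the ASCII bytes of "abc" should give 900150983cd24fb0d6963f7d28e17f72 on any platform, and a Java MessageDigest run over the same bytes should agree. Feeding it the same text once with "\n" and once with "\r\n" line endings gives two different digests, which is the text-file caveat mentioned above.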
Otherwise the internet would not work at all, you know.
My suggestion would be to use a common/accepted hashing algorithm like MD5 to achieve the same hash values.
If the hashing algorithm and the input are the same, the hash value generated will be the same irrespective of language or environment.
The hashing algorithm takes all or part of the key and manipulates it deterministically to generate the value, which is why the result is the same in all languages.
I wish I could comment on this but I don't have enough reputation to do that.
While I don't know for what purpose you want to use a hash algorithm, I'd like to point out that collisions have been found for MD5, so it is no longer considered secure where collision resistance matters. The same remark applies to the SHA-1 algorithm.
More information here: http://www.mathstat.dal.ca/~selinger/md5collision/
So if you want to use a hash algorithm for security purposes, you might take a look at SHA-256 or SHA-512 which are stronger for now.
Otherwise you can probably keep going with MD5.
My two cents.
Possible Duplicate:
How do I encrypt a string and get a equal length encrypted string?
I am new to encryption and decryption. I have a string that is 24 characters long, and I need to encrypt and decrypt it. The encryption may be less secure, but the encrypted string must be the same length as the input (24 characters). I have searched the web and found some sample algorithms (AES, MD5), but the encrypted output is longer than the input string. This is a product key that we will share with customers, so strong encryption is not required. It would be useful if you could share sample code.
Use Vernam cipher. For a single string with a truly-randomly generated key it's theoretically unbreakable. If you start using the same key for multiple strings you reduce its security significantly but apparently you are not looking for utmost security. If you are, you must be able to come up with a different random key for each encrypted password.
Although you can find lots of sample code on the web, I think it would be good practice for you to implement it yourself. It's pretty straightforward.
What you're looking for seems to be format-preserving encryption. I don't think there are any implementations of this in .NET (I certainly haven't used any). You may need to think up a custom algorithm for this. You say strong encryption isn't required, but you'll obviously need the algorithm to not be obvious. Unfortunately there are literally hundreds of ways you could do this, so it depends on which one suits you.
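For what it's worth, a Vernam cipher boils down to XOR-ing each byte with a key byte. The sketch below is only an illustration under my own assumptions: the names VernamSketch, NewKey and Xor are hypothetical, and it works on raw bytes, so the 24-byte ciphertext will generally not be printable text.

```csharp
using System;
using System.Security.Cryptography;

public static class VernamSketch
{
    // Generate a random key exactly as long as the message (one fresh key per message).
    public static byte[] NewKey(int length)
    {
        var key = new byte[length];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(key);
        }
        return key;
    }

    // XOR each data byte with the matching key byte.
    // Encrypting and decrypting are the same operation.
    public static byte[] Xor(byte[] data, byte[] key)
    {
        if (data.Length != key.Length)
            throw new ArgumentException("Key must be exactly as long as the data.");
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ key[i]);
        return result;
    }
}
```

Calling Xor(plainBytes, key) gives ciphertext of the same length, and Xor(cipherBytes, key) gives the plaintext back. If the result has to be typed by a customer, you would still need some way to map the bytes onto printable characters, which this sketch does not attempt.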
This seems to be a great post for encryption algorithms
To make the cyphertext the same length as the plaintext, use a stream cypher. This can either be a block cypher in CTR mode, such as AES-CTR, or a dedicated stream cypher, like Rabbit or RC4.
Be aware that you cannot reuse a key for a stream cypher, otherwise an attacker will probably be able to break the encryption. Two cyphertexts that use the same key can be used to eliminate the key entirely, leaving just the two plaintexts.
If you only have the one 24 byte word to encrypt then this is not a problem. If you need to encrypt more than one piece of data, then key management becomes important.
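The key-reuse warning can be checked with a few lines of C#. This is a sketch, not a real stream cypher: the keystream is modelled as a plain byte array standing in for whatever AES-CTR or RC4 would generate from a key, and the names KeyReuseSketch and KeystreamCancels are made up for the example.

```csharp
using System;
using System.Linq;

public static class KeyReuseSketch
{
    // XOR two byte arrays over their common length.
    public static byte[] Xor(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        var result = new byte[n];
        for (int i = 0; i < n; i++)
            result[i] = (byte)(a[i] ^ b[i]);
        return result;
    }

    // Encrypt two plaintexts with the SAME keystream, then check that
    // (c1 XOR c2) equals (p1 XOR p2), i.e. the keystream has dropped out.
    public static bool KeystreamCancels(byte[] p1, byte[] p2, byte[] keystream)
    {
        byte[] c1 = Xor(p1, keystream);
        byte[] c2 = Xor(p2, keystream);
        return Xor(c1, c2).SequenceEqual(Xor(p1, p2));
    }
}
```

KeystreamCancels returns true for any keystream, which is exactly why an attacker holding two such cyphertexts no longer has to deal with the key at all.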
I want to know how PHP and C# have included MD5 hashing. How can PHP or C# include an MD5 hashing system without knowing the algorithm, and how have they managed to keep the algorithm secure? Does this mean that the people who made C# or PHP know the MD5 algorithm?
If a hash (or encryption) algorithm needs to be kept secret, then it is not really a good algorithm.
The MD5 algorithm is available for everyone.
Yes, md5 algorithm is public - http://en.wikipedia.org/wiki/MD5
Anyone can independently implement it in any language.
While I am not sure what you mean by secure, assuming you are referring to using MD5 as a password hash, MD5 is not secure. If that's what you are planning, read How To Safely Store A Password.
The nice thing about encryption algorithms is that even if you know how they work, you can't easily decrypt the output. That's why most encryption algorithms are public, at least the ones deployed by C# and PHP. But MD5 isn't really secure anymore. Note that it is a hash rather than an encryption algorithm, so a stronger hash such as SHA-256 is the like-for-like replacement, while AES is what you would use for actual encryption.
Firstly - It's down to the source code library included in the language build.
Secondly - MD5 is a one-way algorithm. It doesn't need to be kept secret; there is no 'key'.
Everyone has already said this, but I'll add this one thing... MD5 isn't meant to "obscure" things, really. It's really a way to "normalize" data. No matter what you start with, you end up with a 32-character hexadecimal string (128 bits) that represents the original string. It's a fabulous way to make sure that the original string is what you expect it to be. That's why it's frequently used to compare files, since no matter what the original content of a file, it can still be represented with a 32-character string. So, the answer to your question is really that the whole point is to simply make sure that it's the SAME algorithm in all these different languages so that you can count on the same result no matter where you convert your source string to its MD5 counterpart.
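As a rough illustration of the file-comparison use, here is a small C# sketch. FileCompareSketch, Md5OfFile and SameDigest are hypothetical names; the only library calls are MD5.Create, File.OpenRead and the ComputeHash(Stream) overload.

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class FileCompareSketch
{
    // Stream the file through MD5 so large files need not be loaded into memory.
    public static byte[] Md5OfFile(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return md5.ComputeHash(stream);
        }
    }

    // True when both files produce the same 16-byte digest.
    public static bool SameDigest(string pathA, string pathB)
    {
        return Md5OfFile(pathA).SequenceEqual(Md5OfFile(pathB));
    }
}
```

Whatever the file size, the digest is 16 bytes (32 hex characters), so two files can be compared by comparing two short values.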
I am doing a md5 hash, and just want to make sure the result of:
md5.ComputeHash(bytePassword);
Is consistent regardless of the server?
e.g. windows 2003/2008 and 32/64 bit etc.
Yes, it's consistent, the md5 algorithm specification defines it regardless of platform.
MD5 is independent of operating system and architecture. So it is "consistent".
However, MD5 takes as input an arbitrary sequence of bits, and outputs a sequence of 128 bits. In many situations, you want strings. For instance, you want to hash a password, and the password is initially a string. The conversion of that string into a sequence of bits is not part of MD5 itself, and several conventions exist. I do not know precisely about C#, but the Java equivalent String.getBytes() method will use the "platform default charset" which may vary with the operating system installation. Similarly, the output of MD5 is often converted to a string with hexadecimal notation, and it may be uppercase or lowercase or whatever.
So while MD5 itself is consistent, bugs often lurk in the parts that prepare the data for MD5 and post-process its output. Beware.
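On the C# side, one way to avoid those bugs is to pick the encoding and the hex format explicitly. The following is a sketch with made-up names (PasswordBytesSketch, Md5Hex); the choice of lowercase hex is an assumption, not something MD5 mandates.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class PasswordBytesSketch
{
    // Hash a string using an explicitly chosen encoding and
    // return the digest as lowercase hex.
    public static string Md5Hex(string text, Encoding encoding)
    {
        using (var md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(encoding.GetBytes(text));
            var sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }
}
```

Md5Hex("abc", Encoding.UTF8) gives 900150983cd24fb0d6963f7d28e17f72, while Md5Hex("abc", Encoding.Unicode) (UTF-16 in .NET) gives a different value because the input bytes differ. As long as every server uses the same encoding and the same output format, the results line up.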
The result of an md5 hash is a number. The number returned for a given input is always the same, no matter what server or even platform you use.
However, the expression of the number may vary. For example, 1 and 1.0 are the same number, but are expressed differently. Similarly, some platforms will return the hash formatted slightly differently than others. In this case, you have a byte array, and that should be fairly safe to pass around. Just be careful what you do after converting it to a string.
MD5 Hashing is [system/time/anything except the input] independent
Possible Duplicate:
How do I generate a hashcode from a byte array in c#
In C#, I need to create a Hash of an image to ensure it is unique in storage.
I can easily convert it to a byte array, but unsure how to proceed from there.
Are there any classes in the .NET framework that can assist me, or is anyone aware of some efficient algorithms to create such a unique hash?
There are plenty of hash providers in .NET that create cryptographic hashes, which satisfies your condition that they are unique (for most purposes collision-proof). They are all extremely fast, and the hashing definitely won't be the bottleneck in your app unless you're doing it a trillion times over.
Personally I like SHA1:
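using System.Linq;
using System.Security.Cryptography;

public static class HashExtensions
{
    // Extension method: returns the SHA-1 digest of the data as uppercase hex.
    public static string GetHashSHA1(this byte[] data)
    {
        using (var sha1 = new SHA1CryptoServiceProvider())
        {
            return string.Concat(sha1.ComputeHash(data).Select(x => x.ToString("X2")));
        }
    }
}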
Even when people say one method might be slower than another, it's all in relative terms. A program dealing with images definitely won't notice the microsecond process of generating a hashsum.
And regarding collisions, for most purposes this is also irrelevant. Even "obsolete" methods like MD5 are still highly useful in most situations. I'd only recommend against using it when the security of your system relies on preventing collisions.
The part of Rex M's answer about using SHA1 to generate a hash is a good one (MD5 is also a popular option). zvolkov's suggestion about not constantly creating new crypto providers is also a good one (as is the suggestion about using CRC if speed is more important than virtually-guaranteed uniqueness).
However, do not use Encoding.UTF8.GetString() to convert a byte[] into a string (unless of course you know from context that it is valid UTF8). For one, it will reject invalid surrogates. A method guaranteed to always give you a valid string from a byte[] is Convert.ToBase64String().
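A quick way to convince yourself is a round-trip check. The sketch below uses hypothetical names (BytesToStringSketch, SurvivesUtf8, SurvivesBase64); as far as I know, the default Encoding.UTF8 instance substitutes a replacement character for invalid sequences rather than throwing, so the original bytes are silently lost.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BytesToStringSketch
{
    // True if decoding as UTF-8 and re-encoding gives back the original bytes.
    public static bool SurvivesUtf8(byte[] data)
    {
        string s = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(s).SequenceEqual(data);
    }

    // True if Base64 encoding and decoding gives back the original bytes.
    public static bool SurvivesBase64(byte[] data)
    {
        string s = Convert.ToBase64String(data);
        return Convert.FromBase64String(s).SequenceEqual(data);
    }
}
```

For the first four bytes of a typical JPEG (0xFF 0xD8 0xFF 0xE0), SurvivesUtf8 returns false and SurvivesBase64 returns true, so a hash or any other binary value should go through Base64 (or hex) when it has to become a string.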
Creating new instance of SHA1CryptoServiceProvider every time you need to compute a hash is NOT fast at all. Using the same instance is pretty fast.
Still, I'd rather use one of the many CRC algorithms instead of a cryptographic hash, as hash functions designed for cryptography don't work too well for very small hash sizes (32 bit), which is what you want for your GetHashCode() override (assuming that's what you want).
Check this link out for one example of computing CRC in C#: http://sanity-free.org/134/standard_crc_16_in_csharp.html
P.S. the reason you want your hash to be small (16 or 32 bit) is so you can compare them FAST (that was the whole point of having hashes, remember?). Having hash represented by a 256-bit long value encoded as string is pretty insane in terms of performance.
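To give an idea of what a 32-bit checksum looks like in C#, here is a sketch of the common table-driven CRC-32 (reflected polynomial 0xEDB88320, the variant used by zip and PNG). Note that this is CRC-32, not the CRC-16 from the link, and the class name Crc32Sketch is made up.

```csharp
public static class Crc32Sketch
{
    private static readonly uint[] Table = BuildTable();

    // Precompute the 256-entry lookup table for the reflected polynomial.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int bit = 0; bit < 8; bit++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // Process one byte per step: initial value and final XOR are both 0xFFFFFFFF.
    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }
}
```

The standard check value for the ASCII string "123456789" is 0xCBF43926, which is a handy sanity test. Keep in mind that a CRC is easy to collide on purpose, so it only suits accidental-duplicate detection, not anything security related.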
You can use any of the standard hashing algorithms, but hashing can't technically guarantee uniqueness. Hashing is designed to be a relatively fast and/or small token to be able to see if one piece of data likely is the same as the other. It's fully possible for entirely different sets of data to produce the same hash, though being able to produce these algorithmically is very hard.
All of that aside, for checking likely identity, MD5 is fairly fast. SHA is more reliable (MD5 has been hacked, so it shouldn't be used for security), but it's also slower.
The company I work for has taken on a support contract for a large order processing system. As part of the initial system audit I noticed that the passwords stored in the database were actually the hashcode of the password.
Essentially:
string pwd = "some password";
int securePwd = pwd.GetHashCode(); // GetHashCode returns an int, not a string
My question is, how secure or otherwise is this?
I'm not comfortable with it, but I don't know enough about how GetHashCode works. I would prefer to use something like an MD5 hash, but if I'm wasting my time then I won't bother.
You should use a salted, cryptographically strong hash, such as SHA256Managed.
Jeff Atwood has a few good posts on this topic:
Rainbow Hash Cracking
You're Probably Storing Passwords Incorrectly
It's not just insecure, but also subject to change:
http://netrsc.blogspot.com/2008/08/gethashcode-differs-on-systems.html
The value returned by GetHashCode for a given input has changed in the past.
There's no guarantee it will even be the same between different executions of the app.
GetHashCode returns a 32-bit integer as the hash value. Considering the birthday paradox, it's not a long enough hash value due to the relatively high probability of collisions, even if it were explicitly designed to be collision resistant, which it is not.
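To put a rough number on the birthday argument, the usual approximation for the chance of at least one collision among n values drawn uniformly from 2^bits possibilities is 1 - exp(-n(n-1) / (2 * 2^bits)). The sketch below just evaluates that formula; the names are made up, and it assumes uniformly random hash values, which GetHashCode does not promise.

```csharp
using System;

public static class BirthdaySketch
{
    // Approximate probability of at least one collision among n random values
    // drawn uniformly from 2^bits possibilities.
    public static double CollisionProbability(double n, int bits)
    {
        double space = Math.Pow(2, bits);
        return 1.0 - Math.Exp(-n * (n - 1) / (2.0 * space));
    }
}
```

With a 32-bit hash, CollisionProbability(77163, 32) comes out at roughly one half, so only about 77,000 distinct passwords are enough for an even chance that two of them share a hash. With 256 bits, the same call is indistinguishable from zero in double precision.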
You should go for SHA256 or another cryptographically secure hash function designed to handle such a task.
To store passwords, just using a simple hash function is not enough. You should add some random "salt" per user and iterate enough times so that it would be computationally expensive to brute force. Therefore, you should use something like bcrypt, scrypt, PBKDF2, with a large number of iterations.
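As a sketch of the salt-and-iterate idea using PBKDF2 from the framework itself (Rfc2898DeriveBytes), something like the following could work. The class name, the "iterations.salt.hash" storage format, and the parameter values (16-byte salt, 100000 iterations, SHA-256) are my own illustrative choices, and the code assumes a reasonably recent .NET version that has the HashAlgorithmName constructor overload and CryptographicOperations.

```csharp
using System;
using System.Security.Cryptography;

public static class PasswordHashSketch
{
    private const int SaltSize = 16;        // bytes; illustrative choice
    private const int HashSize = 32;        // bytes
    private const int Iterations = 100000;  // illustrative; tune for your hardware

    // Returns "iterations.salt.hash" with salt and hash Base64-encoded.
    public static string Hash(string password)
    {
        var salt = new byte[SaltSize];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt); // fresh random salt per password
        }
        using (var kdf = new Rfc2898DeriveBytes(password, salt, Iterations, HashAlgorithmName.SHA256))
        {
            byte[] hash = kdf.GetBytes(HashSize);
            return Iterations + "." + Convert.ToBase64String(salt) + "." + Convert.ToBase64String(hash);
        }
    }

    // Recomputes the hash with the stored salt and iteration count,
    // then compares in constant time.
    public static bool Verify(string password, string stored)
    {
        string[] parts = stored.Split('.');
        if (parts.Length != 3) return false;
        int iterations = int.Parse(parts[0]);
        byte[] salt = Convert.FromBase64String(parts[1]);
        byte[] expected = Convert.FromBase64String(parts[2]);
        using (var kdf = new Rfc2898DeriveBytes(password, salt, iterations, HashAlgorithmName.SHA256))
        {
            byte[] actual = kdf.GetBytes(expected.Length);
            return CryptographicOperations.FixedTimeEquals(actual, expected);
        }
    }
}
```

Store the string returned by Hash in the password column and call Verify at login. Because the salt is random, hashing the same password twice gives two different stored values, which is what defeats precomputed rainbow tables.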
I'd recommend using BCrypt instead. As others have already said using GetHashCode for passwords isn't a good idea.
GetHashCode was definitely not designed to be used in this way as the implementation does not guarantee different hash returns for different objects. This means that potentially multiple passwords could produce the same hash. It also isn't guaranteed to return the same hash value on different versions of the .NET framework meaning that an upgrade could potentially produce a different hash for the same string, rendering your passwords unusable to you.
It is recommended that you use a salted hash or even MD5 at a push. You can easily switch it to something within the System.Security.Cryptography namespace.
As others have said, GetHashCode isn't designed for what you're trying to do. There is a really excellent article on how to handle user passwords securely.
To summarise the article, you need to use either a relatively slow adaptive hashing scheme such as bcrypt, or alternatively the Stanford Secure Remote Password Protocol. I would suggest the former. And of course you should also use a salt.