A really simple question:
I am doing a simple thing. I have a few strings like:
string A = "usd";
And I want to get the hashcode in C#:
int audusdReqId = Convert.ToInt32("usd".GetHashCode());
Now how could I convert the 32-bit integer audusdReqId back to the string "usd"?
You cannot. A hash code does not contain all of the necessary information to convert it back to a string. More importantly, you should be careful what you use GetHashCode for. This article explains what you should use it for. You can't even guarantee that GetHashCode will return the same thing in different environments. So you should not be using GetHashCode for any cryptographic purposes.
If you are trying to create a binary representation of a string, you can use
var bytes = System.Text.Encoding.UTF8.GetBytes(someString);
This will give you a byte[] for that particular string. You can use GetString on Encoding to convert it back to a string:
var originalString = System.Text.Encoding.UTF8.GetString(bytes);
If you are trying to cryptographically secure the string, you can use a symmetric encryption algorithm like AES as Robert Harvey pointed out in the comments.
A cryptographic hash function is designed to be one-way: its output cannot be converted back to the original value.
There are good reasons for this. A common use is password storage in a database: you should not be able to recover the original value (the plaintext password), which is why you would normally store a one-way hash of it instead.
Hashes have other uses too, such as placing values in hash sets.
Related
We're generating hashes to provide identifiers for documents being stored in RavenDB. We're doing this as there is a limit on the length of the DocumentID (127 characters - ESent limitation) if you want to use BulkInsert like so:
_documentStore.BulkInsert(options: new BulkInsertOptions { CheckForUpdates = true }))
In order for the BulkInsert to work, the DocumentID needs to match the row being upserted, so we want a DocumentID that can be regenerated consistently from the same source string.
An MD5 hash will provide us a fixed length value with a low probability of collision, with the code used to generate the hash below:
public static string GetMD5Hash(string inputString)
{
HashAlgorithm algorithm = MD5.Create();
var hashBytes = algorithm.ComputeHash(Encoding.UTF8.GetBytes(inputString));
return Encoding.UTF8.GetString(hashBytes);
}
However, RavenDB does not support "\" in a DocumentID, so I want to replace it with "/". My fear is that in doing so we are increasing the likelihood of a hash collision.
Code I want to change to:
public static string GetMD5Hash(string inputString)
{
HashAlgorithm algorithm = MD5.Create();
var hashBytes = algorithm.ComputeHash(Encoding.UTF8.GetBytes(inputString));
return Encoding.UTF8.GetString(hashBytes).Replace('\\', '/');
}
Will this increase the likelihood of hash conflicts and remove our ability to depend on the DocumentID as "unique"?
This is an X-Y problem: instead of converting the byte array into a form that is known to be handled correctly as a string, such as Base64, you are decoding it as UTF-8.
Reading a random byte array as a UTF-8 string will produce non-printable and NUL characters, as well as random failures or data loss due to invalid UTF-8 sequences.
Use Base64 (or Base32 if you need a case-insensitive string). If some characters are still not supported, replace them with other characters that are not already in the alphabet. For example, URL-friendly Base64 uses - and _ and omits padding to simplify encoding as a query parameter.
To original question:
A hash of any kind can't be considered a "unique ID" for a document, because collisions are possible.
Yes, replacing one character with another that could already appear in the string decreases the number of possible outputs and increases the probability of a collision. I can't estimate it properly; a question on a math or statistics site may be needed if you really need a precise answer.
You increase the probability of collision, but only slightly. All "/" in the output hash are like 'wildcards' which match either "/" or "\" in the raw hash. If you have zero of these in a hash, nothing changes. If you have one of these in a hash, there are now twice as many documents that can match that hash. If you have two in a hash, there are four times as many. Having many more is unlikely given the alphabet and the length of the MD5 hash.
The probability of a collision is still pretty small (unless you have a huge number of documents, etc).
However, you should do what was suggested in comments and use a Base64 or HEX string to store the MD5.
Bad things can happen in cryptography when you 'roll your own' and try to modify protocols that you don't understand inside out. You should always stick to standard approaches that have been tested in theory and in practice and found to be reasonable. Bruce Schneier puts across this principle at length in Practical Cryptography and elsewhere.
Use Base64 instead of UTF8 and you will solve your problem (no more \).
Have a look at Convert.ToBase64String.
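As a rough illustration of the advice above, here is a minimal Java sketch (the thread itself is about C#, where Convert.ToBase64String plays the same role). It encodes an MD5 digest as hex and as URL-safe Base64 instead of decoding the raw bytes as UTF-8. The class and method names are hypothetical, and whether a given document store accepts every character in these alphabets is an assumption you would need to check.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper class, for illustration only.
class DigestIdDemo {

    // Computes the 16-byte MD5 digest of the UTF-8 bytes of the input.
    static byte[] md5(String input) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard Java platforms.
            throw new IllegalStateException(e);
        }
    }

    // Lowercase hex: two characters per byte, alphabet 0-9 and a-f only.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // URL-safe Base64 without padding: alphabet A-Z, a-z, 0-9, '-' and '_'.
    static String toUrlSafeBase64(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(toHex(md5("")));  // d41d8cd98f00b204e9800998ecf8427e
        String id = toUrlSafeBase64(md5("some source string"));
        System.out.println(id.length());     // 22
    }
}
```

Both encodings are lossless, so they keep all 128 bits of the digest and do not add collisions the way a character replacement does.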
I have the following Java code being used on an Android device that encrypts and decrypts strings using the AES encryption algorithm and an SHA1PRNG hash. I want the Android device to call a .NET WCF service written in C#. I have been searching everywhere trying to find an equivalent in C# that could encrypt and decrypt in a similar way to the Java code, but could not find the exact same way to do it. Here is the Encrypt() method in both languages:
Java:
public static String encrypt(String seed, String cleartext) throws Exception
{
KeyGenerator kgen = KeyGenerator.getInstance("AES");
SecureRandom sr = SecureRandom.getInstance("SHA1PRNG");
sr.setSeed(seed.getBytes());
kgen.init(128, sr); // 192 and 256 bits may not be available
SecretKey skey = kgen.generateKey();
byte[] rawKey = skey.getEncoded();
SecretKeySpec skeySpec = new SecretKeySpec(rawKey, "AES");
Cipher cipher = Cipher.getInstance("AES");
cipher.init(Cipher.ENCRYPT_MODE, skeySpec);
byte[] encrypted = cipher.doFinal(cleartext.getBytes());
return toHex(encrypted);
}
I have created something similar to this in C#, which also uses AES and SHA1:
C#:
public static string Encrypt(string seed, string cleartext)
{
var objAesCrypto = new AesManaged();
var objHashSha1 = new SHA1Managed();
var byteHash = objHashSha1.ComputeHash(Encoding.ASCII.GetBytes(seed));
var truncatedHash = new byte[16];
Array.Copy(byteHash, truncatedHash, truncatedHash.Length);
objAesCrypto.Key = truncatedHash;
objAesCrypto.Mode = CipherMode.ECB;
var byteBuff = Encoding.ASCII.GetBytes(cleartext);
return Convert.ToBase64String(objAesCrypto.CreateEncryptor().TransformFinalBlock(byteBuff, 0, byteBuff.Length));
}
There are several problems with this, however. As you can see, using C#'s version of SHA1 (SHA1Managed), it returns a hash of 20 bytes, not 16. The only way to get it to pass into the AES algorithm is to truncate the hash to 16 bytes first.
The second problem is, although both work just fine in their respective environments, when I try to pass an encrypted string from Java, along with the seed, the C# code is never able to decrypt it properly. The encrypted strings in both cases look nothing alike and are even different lengths. A typical encrypted string from the Java side looks something like this: F7E8758A2E65518FB49C53BC707288FC (32 chars long). Whereas the same exact encrypted string with the same exact seed from the C# side looks like this: 3VysgnYgNi9OJBxL2FP+rQ== (24 chars long).
I'm sure it has something to do with the fact that I'm truncating the hash in C#, but that doesn't explain why the two encrypted strings look so vastly different. (Another interesting thing I noticed is that no matter what string and seed I use on the C# side, the result is always 24 chars long and ends with two equal signs. Why is that?)
So, my question is, how do I get both environments to be able to decrypt each other's encrypted strings using the same seed values? I don't care if I even need to use different algorithms on the C# side than the Java side, I just need the C# code to be able to read the Java-encrypted strings.
The second problem is, although both work just fine in their respective environments, when I try to pass an encrypted string from Java, along with the seed, the C# code is never able to decrypt it properly.
You shouldn't be trying to decrypt a hash. Hashes are one-way.
A typical encrypted string from the Java side looks something like this: F7E8758A2E65518FB49C53BC707288FC (32 chars long). Whereas the same exact encrypted string with the same exact seed from the C# side looks like this: 3VysgnYgNi9OJBxL2FP+rQ== (24 chars long).
That's because you're converting to hex in Java, but to Base64 in C#:
return toHex(encrypted);
vs
return Convert.ToBase64String(...);
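To make the difference concrete, here is a small Java sketch that takes the hex string quoted from the question, decodes it to its 16 raw bytes (one AES block), and re-encodes those bytes as Base64. The class and helper names are hypothetical. It only shows why the lengths differ (32 hex characters versus 24 Base64 characters ending in "=="); it does not make the two ciphertexts match, since the keys are derived differently.

```java
import java.util.Base64;

// Hypothetical demo class, for illustration only.
class EncodingLengthDemo {

    // Parses a hex string (two characters per byte) into raw bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Formats raw bytes as uppercase hex, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hex = "F7E8758A2E65518FB49C53BC707288FC"; // from the question
        byte[] block = fromHex(hex);
        String b64 = Base64.getEncoder().encodeToString(block);

        System.out.println(block.length);          // 16
        System.out.println(b64.length());          // 24
        System.out.println(b64.endsWith("=="));    // true
        // Decoding the Base64 text gives back the same 16 bytes.
        System.out.println(toHex(Base64.getDecoder().decode(b64)).equals(hex)); // true
    }
}
```

Base64 encodes 3 bytes as 4 characters, so 16 bytes become five full groups plus one leftover byte, which is padded with "==". That is why a single-block result is always 24 characters long.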
As for the seed length issue - again, you're doing different things in the Java vs the C#. It's not at all clear to me that using SecureRandom in that way is meant to generate the same secret key as using a straight hash from SHA1.
Rather than trying to fix this approach though, I'd suggest you should be rethinking it - it doesn't look secure to me at all. What you've called a seed is more than just a seed - it's basically a complete key. An attacker who knows the seed effectively knows the "password" to your system; you might as well just use raw bytes.
It appears that Android uses a fixed version of the SHA1PRNG. Also there seem to be many implementations for SHA1PRNG for .NET/Java/Android.
You may want to take a look at the below link for some similar problem and also a possible port of the SHA1PRNG present in Android to C#.
SHA1PRNG in Android - .NET
Your toHex(encrypted); is not the same thing as Convert.ToBase64String() as far as I know.
From this sample code from MSDN
http://msdn.microsoft.com/en-us/library/system.string.gethashcode.aspx
The hash code for "abc" is: 536991770
But how to convert back the "536991770" to "abc"?
There is no way to get the value back from the hash code. See the definition of a hash function.
Hash values are not used to uniquely identify the original value, because hash values are not unique for each input value.
A hash function may map two or more keys to the same hash value. In many applications, it is desirable to minimize the occurrence of such collisions, which means that the hash function must map the keys to the hash values as evenly as possible.
You cannot. Hashes are one way.
The thing with hashes is that you lose information. Regardless of the length of the string, the result is always an integer. This means, for example, that the hash of a 10,000-character string is also a single integer. It is of course impossible to get the original string back from this integer.
There is no way to "decrypt" the hash code. Amongst other reasons, because two different strings may very well produce the same hash code. That feature alone would make it impossible to reverse the process.
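A quick way to see such a collision is the following Java sketch. Java's String.hashCode() is the analogue of C#'s GetHashCode(), but the actual values differ between the two platforms, so the numbers below apply to Java only. The class and method names are hypothetical.

```java
// Hypothetical demo class, for illustration only.
class HashCollisionDemo {

    // Returns true when two distinct strings share the same 32-bit hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Java specifies String.hashCode() as s[0]*31^(n-1) + ... + s[n-1],
        // so "Aa" (65*31 + 97) and "BB" (66*31 + 66) both hash to 2112.
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println(collide("Aa", "BB")); // true
    }
}
```

Given only the value 2112, there is no way to tell whether the input was "Aa", "BB", or some other string with the same hash code.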
You cannot.
Even if you had a table of all the strings in the world and their hash codes, you wouldn't be able to do it: there are more strings than ints (~4 billion ints), so many different strings result in the same hash code.
Can someone reverse this handy hash code I'm using?
using System;
using System.Security.Cryptography;
using System.Text;
public static string EncodePasswordToBase64(string password)
{
byte[] bytes = Encoding.Unicode.GetBytes(password);
byte[] inArray = HashAlgorithm.Create("SHA1").ComputeHash(bytes);
return Convert.ToBase64String(inArray);
}
Everything I end up doing fails horribly :(.
No, you can't reverse a hash. The typical process is to compare other input to the hash you have, to check if they are the same.
Like (pseudo):
initial = Hash(password);
possible = Hash("test");
if( initial == possible ){
// we infer that password = "test"
}
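The pseudocode above could look roughly like this in Java. This is only a sketch: the choice of SHA-256 and the class and method names are assumptions, not something taken from the question. A real password store would normally also use a salt and a deliberately slow hashing scheme, which this sketch leaves out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, for illustration only.
class HashCompareDemo {

    // Hashes the UTF-8 bytes of the input with SHA-256 (an assumed choice).
    static byte[] sha256(String input) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on standard Java platforms.
            throw new IllegalStateException(e);
        }
    }

    // Hashes the candidate and compares it with the stored hash.
    // Nothing is ever reversed; only hashes are compared.
    static boolean matches(byte[] storedHash, String candidate) {
        return MessageDigest.isEqual(storedHash, sha256(candidate));
    }

    public static void main(String[] args) {
        byte[] stored = sha256("test");
        System.out.println(matches(stored, "test"));  // true
        System.out.println(matches(stored, "guess")); // false
    }
}
```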
But note that SHA-1, SHA-0, and MD5 should no longer be used, because each has been broken to some degree. You should use SHA-2 instead.
The only real way of "unhashing" is using a rainbow table, which is a big table of hashes precomputed for a large set of candidate inputs. You look up the hash and get what was probably the original input.
http://en.wikipedia.org/wiki/Rainbow_table
Unfortunately, you cannot un-hash SHA-1, MD5, or any other one-way hash, though it is possible to decode Base64.
SHA is an NSA acronym for "Secure Hash Algorithm".
Secure Hashes are hard to reverse by definition -- otherwise they would not be Secure.
You need to use a reversible hash function if you want to be able to easily compute sources that will generate the same hash (and even then you might not get the original source due to hash collisions where more than one source input can result in the same hash output).
I want to send BigInteger data in socket and my friend wants to retrieve the data.
I am using Java, but my friend uses C#.
String str = "Hello";
BigInteger big = new BigInteger(str.getBytes());
byte[] byteToBeSent = big.toByteArray();
I am sending this byte array (byteToBeSent[]) through socket. And my friend wants to retrieve "Hello".
How is this possible?
From Java, send your string using String.getBytes(encoding) and specify the encoding to match how your friend will read it (e.g. UTF-8).
This will translate your string into a byte stream that will be translatable at the C# end due to the fact that you're both agreeing on the encoding mechanism.
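A minimal Java sketch of that idea, assuming both sides agree on UTF-8 (the class and method names are hypothetical, and the socket code is left out):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class Utf8RoundTripDemo {

    // Sender side: turn the text into bytes using an explicit encoding.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver side: turn the bytes back into text using the same encoding.
    static String decode(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = encode("Hello");     // these bytes go over the socket
        System.out.println(wire.length);   // 5
        System.out.println(decode(wire));  // Hello
    }
}
```

On the C# end, System.Text.Encoding.UTF8.GetString applied to the received bytes performs the same decoding step.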
I'm not sure what your BigInteger mechanism is doing, but I don't believe it'll be portable, nor handle sizable strings.
Honestly, your best bet would be to use the built-in Encoding classes in C#.
string str = "Hello";
byte[] data = System.Text.Encoding.UTF8.GetBytes(str);
And then send that through the socket.
Why do you use BigInteger to send String data? Or is that just an example?
If you want to send String data, use String.getBytes(String encoding), send the result and decode it using System.Text.Encoding.
You'll need to get a custom BigInteger class for C#.
But parsing "Hello" as a BigInteger isn't going to work. If you want to send text, you're better off using Navaars method.
On the C# end, you can use System.Text.Encoding.ASCII.GetString(bytesToConvert) to convert the received byte array back to a string.
I had thought that was some sort of Java idiom to convert a string to a byte array. It does correctly convert the string into ASCII bytes, and since BigInteger length is arbitrary, the length of the string should not be an issue.