I am new to hashing and have a requirement to hash data to a length of 128.
I tried SHA256 and SHA512, which produce Base64 strings of 44 and 88 characters respectively. Is there any way to generate a hash of a specified length?
This is the code I am using:
var value = "Test";
using var sha256 = SHA256.Create();
byte[] privatekeyBytes = Encoding.UTF8.GetBytes(value);
var text = Convert.ToBase64String(sha256.ComputeHash(privatekeyBytes));
I know it's a pretty basic question, any links to understand more on this will also help me.
It really depends on what you want to achieve. In some contexts, I keep insisting that return 1 is a valid hash function. By and large, there are three applications for hashes:
Quickly differentiate between data that is certainly different for better runtime performance. For that purpose, depending on your data something dumb like return input.Length can be perfectly sufficient.
Evenly distribute data in a HashSet or similar, i.e., you primarily care that all available hashes are used roughly equally often. For that, and again depending on the type of data you are processing, it might suffice to take the first 128 bytes, XOR them with the next 128 bytes, XOR with the next 128 bytes and so on until the end of the input data.
Cryptographically sign something. For that you should use one of the hashing algorithms designed for that purpose; they all produce a fixed number of bytes. If you find one that produces more than 128 bytes, it is perfectly fine to just truncate it to that length (at the loss of the additional security the extra bytes would have brought). As far as I can see, SHA512 is available in C#, but it returns 64 bytes, not 128: that is 88 characters in Base64 and exactly 128 characters in hexadecimal. So if the requirement means 128 characters, hex-encoding SHA512.Create().ComputeHash(privatekeyBytes) already gives that; if you do truncate a digest, use .Take(n).ToArray(), since .First(n) does not take a count.
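As a rough illustration of the second and third cases, here is a minimal sketch written as a C# top-level program (C# 9 or later is assumed). The helper names XorFold and Sha512Hex are made up for this example; XorFold is not cryptographically secure and is only meant for spreading values across buckets.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: XOR-folds the input into a block of `size` bytes.
// Good enough for bucketing, useless for security.
byte[] XorFold(byte[] input, int size)
{
    var folded = new byte[size];
    for (int i = 0; i < input.Length; i++)
    {
        folded[i % size] ^= input[i];
    }
    return folded;
}

// Hypothetical helper: SHA-512 digest written as hexadecimal.
// The digest is always 64 bytes, so the string is always 128 characters.
string Sha512Hex(string value)
{
    using var sha512 = SHA512.Create();
    byte[] digest = sha512.ComputeHash(Encoding.UTF8.GetBytes(value));
    return BitConverter.ToString(digest).Replace("-", "");
}

Console.WriteLine(Sha512Hex("Test").Length);           // 128
Console.WriteLine(XorFold(new byte[300], 128).Length); // 128
```

Whether "length of 128" means characters, bytes or bits is something only the requirement's author can settle; the sketch assumes characters.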
For my employer I have to present customers of a web app with checksums for certain files they download.
I'd like to present the user with the hash their client tools are also likely to generate, hence I have been comparing online hashing tools. My question is regarding their form of hashing, since they differ, strangely enough.
After a quick search I tested with 5:
http://www.convertstring.com/Hash/SHA256
http://www.freeformatter.com/sha256-generator.html#ad-output
http://online-encoder.com/sha256-encoder-decoder.html
http://www.xorbin.com/tools/sha256-hash-calculator
http://www.everpassword.com/sha-256-generator
Entering the value 'test' (without 'enter' after it) all 5 give me the same SHA256 result. However, and here begins the peculiar thing, when I enter the value 'test[enter]test' (so two lines) online tool 1, 2 and 3 give me the same SHA256 hash, and site 4 and 5 give me a different one (so 1, 2 and 3 are equal, and 4 and 5 are equal). This most likely has to do with the way the tool, or underlying code handles \r\n, or at least I think so.
Coincidentally, site 1, 2 and 3 present me with the same hash as my C# code does:
var sha256Now = ComputeHash(Encoding.UTF8.GetBytes("test\r\ntest"), new SHA256CryptoServiceProvider());

private static string ComputeHash(byte[] inputBytes, HashAlgorithm algorithm)
{
    var hashedBytes = algorithm.ComputeHash(inputBytes);
    return BitConverter.ToString(hashedBytes);
}
The question is: which sites are 'right'?
Is there any way to know if a hash is compliant with the standard?
UPDATE1: Changed the encoding to UTF8. This has no influence on the output hash, though (probably because my Encoding.Default is Encoding.UTF8). Thanks @Hans.
UPDATE2: Maybe I should expand the question a bit, since it may have been under-explained, sorry. I guess what I am asking is more of a usability question than a technical one; Should I offer all the hashes with different line endings? Or should I stick to one? The client will probably call my company afraid that their file was changed somehow if they have a different way of calculating the hash. How is this usually solved?
All those sites return valid values.
Sites 4 and 5 use \n as line break.
EDIT
I see you edited your question to add Encoding.Default.GetBytes in the code example.
This is interesting, because it shows that the string has to be converted to a byte array before the hash is computed. The line-break convention (\n or \r\n) and the text encoding both affect that conversion, so the same visible text can yield different bytes.
Once you have the same bytes as input, all hash results will be identical.
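To make the point about bytes concrete, here is a small sketch written as a C# top-level program (C# 9 or later is assumed). Sha256Hex is a helper name made up for this example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: SHA-256 of the given bytes as lowercase hex.
string Sha256Hex(byte[] bytes)
{
    using var sha256 = SHA256.Create();
    return BitConverter.ToString(sha256.ComputeHash(bytes)).Replace("-", "").ToLowerInvariant();
}

byte[] crlf = Encoding.UTF8.GetBytes("test\r\ntest"); // 10 bytes
byte[] lf = Encoding.UTF8.GetBytes("test\ntest");     // 9 bytes

// Different bytes, different hash.
Console.WriteLine(Sha256Hex(crlf) == Sha256Hex(lf)); // False

// Same bytes (ASCII and UTF-8 agree for this text), same hash.
Console.WriteLine(Sha256Hex(crlf) == Sha256Hex(Encoding.ASCII.GetBytes("test\r\ntest"))); // True
```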
EDIT 2:
If you're dealing with bytes directly, then just compute the hash with those bytes. Don't try to provide different hash values; a hash must only return one value. If your clients have a different hash value than yours, then they are doing it wrong.
That being said, I'm pretty sure it won't ever happen because there isn't any way to misinterpret a byte array.
I need to hash a number (about 22 digits) and the result length must be less than 12 characters. It can be a number or a mix of characters, and must be unique. (The number entered will be unique too).
For example, if the number entered is 000000000000000000001, the result should be something like 2s5As5A62s.
I looked at the usual ones, like MD5, SHA-1, etc., but their results are too long.
The problem with your question is that the input is larger than the output and unique. If you're expecting a unique output as well, it won't happen. The reason is that if you have an input space of, say, 22 numeric digits (10^22 possibilities) and an output space of hexadecimal digits with a length of 11 digits (16^11 possibilities), you end up with more input possibilities than output possibilities.
The graph below shows that you would need an output space of 19 hexadecimal digits and a perfect one-to-one function; otherwise you will have collisions pretty often (more than 50% of the time). I assume this is something you do not want, but you did not specify.
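The counting argument can be checked with a few lines of arithmetic. This is a sketch written as a C# top-level program (C# 9 or later is assumed); DigitsNeeded is a helper name made up for this example.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: smallest number of digits in the given radix
// that can label `count` distinct values.
int DigitsNeeded(BigInteger count, int radix)
{
    int digits = 0;
    BigInteger capacity = BigInteger.One;
    while (capacity < count)
    {
        capacity *= radix;
        digits++;
    }
    return digits;
}

BigInteger inputs = BigInteger.Pow(10, 22);  // all 22-digit decimal numbers
BigInteger outputs = BigInteger.Pow(16, 11); // all 11-digit hexadecimal strings

Console.WriteLine(inputs > outputs);         // True: more inputs than outputs
Console.WriteLine(DigitsNeeded(inputs, 16)); // 19
Console.WriteLine(DigitsNeeded(inputs, 64)); // 13
```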
Since what you want cannot be done, I would suggest rethinking your design or using a checksum such as the cyclic redundancy check (CRC). CRC-64 will produce a 64 bit output and when encoded with any base64 algorithm, will give you something along the lines of what you want. This does not provide cryptographic strength like SHA-1, so it should never be used in anything related to information security.
However, if you were able to change your criteria to allow for long hash outputs, then I would strongly suggest you look at SHA-512, as it will provide high quality outputs with an extremely low chance of duplication. By a low chance I mean that no two inputs have yet been found to produce the same hash in the history of the algorithm.
If both of these suggestions still are not great for you, then your last alternative is probably just going with only base64 on the input data. It will essentially utilize the standard English alphabet in the best way possible to represent your data, thus reducing the number of characters as much as possible while retaining a complete representation of the input data. This is not a hash function, but simply a method for encoding binary data.
Why not take MD5 or SHA-N, convert the result to Base64 (or base-whatever), and keep only the first 12 characters?
NB: In all cases the hash will NEVER be unique (but it can offer a low collision probability).
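A minimal sketch of this truncation idea, written as a C# top-level program (C# 9 or later is assumed). ShortHash is a name made up for this example, SHA-256 is picked arbitrarily as the "SHA-N", and 11 characters are kept to stay under the question's limit of 12.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: Base64 of the SHA-256 digest, cut to `length` characters.
// The result is NOT guaranteed unique; 11 Base64 characters carry only 66 bits.
string ShortHash(string input, int length)
{
    using var sha256 = SHA256.Create();
    string base64 = Convert.ToBase64String(sha256.ComputeHash(Encoding.UTF8.GetBytes(input)));
    return base64.Substring(0, length);
}

Console.WriteLine(ShortHash("000000000000000000001", 11));
Console.WriteLine(ShortHash("000000000000000000001", 11).Length); // 11
```

Note that standard Base64 output can contain '+' and '/', which may matter if the result ends up in a URL or file name.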
You can't use a hash if it has to be unique.
You need about 74 bits to store such a number. If you convert it to base-64 it will be about 13 characters (74 / 6 ≈ 12.3, rounded up).
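Here is a sketch of that base-64 conversion, written as a C# top-level program (C# 9 or later is assumed). Encode, Decode and the URL-safe alphabet are choices made for this example, and negative numbers are not handled. Unlike a hash, this is reversible, so uniqueness is preserved.

```csharp
using System;
using System.Numerics;
using System.Text;

// Assumed 64-symbol alphabet; any 64 distinct characters would do.
const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Writes a non-negative number in radix 64, most significant digit first.
string Encode(BigInteger value)
{
    if (value.IsZero) return Alphabet[0].ToString();
    var sb = new StringBuilder();
    while (value > 0)
    {
        sb.Insert(0, Alphabet[(int)(value % 64)]);
        value /= 64;
    }
    return sb.ToString();
}

// Reverses Encode.
BigInteger Decode(string text)
{
    BigInteger value = BigInteger.Zero;
    foreach (char c in text)
    {
        value = value * 64 + Alphabet.IndexOf(c);
    }
    return value;
}

BigInteger largest = BigInteger.Parse("9999999999999999999999"); // largest 22-digit number
string encoded = Encode(largest);
Console.WriteLine(encoded.Length);             // 13
Console.WriteLine(Decode(encoded) == largest); // True
```

Leading zeros are lost in the numeric form, so a fixed-width input would need to be padded back to 22 digits after decoding.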
Can you elaborate on what your requirement is for the hashing? Do you need to make sure the result is diverse? (i.e. not 1 = a, 2 = b)
Just thinking out loud, and a little bit laterally, but could you not apply the principles of run-length encoding to your number, treating it as data you want to compress? You could then use the Base64 form of the compressed result.
On this blog post, there is a sentence as below:
This hash is unique for the given text. If you use the hash function
on the same text again, you'll get the same hash. But there is no way
to get the given text from the hash.
Forgive my ignorance on math but I cannot understand why it is not possible to get the given text from the hash.
I would understand if we use one key to encrypt the value and another to decrypt but I cannot figure it out in my mind. What is really going on here behind the scenes?
Anything that clears my mind will be appreciated.
Hashing is not encryption.
A hash produces a "digest" - a summary of the input. Whatever the input size, the hash size is always the same (see how MD5 returns the same size result for any input size).
With a hash, you can get the same hash from several different inputs (hash collisions) - how would you reverse this? Which is the correct input?
I suggest reading this blog post from Troy Hunt on the matter in order to gain better understanding of hashes, passwords and security.
Encryption is a different thing: the ciphertext depends on both the input and the key, and it tends to grow as the input grows. It is reversible if you have the right key.
Update (following the different comments):
Though collisions can happen, when using a cryptographically strong hash (like the ones you have posted about), they will be rare and difficult to produce.
When hashing passwords, always use a salt - this reduces the chances of the hash being reversed by rainbow tables to almost nothing (assuming a good salt has been used).
You need to decide about the tradeoffs of the cost of hashing (can be processor intensive) and the cost of what you are protecting.
As you are simply protecting the login details, using the .NET membership provider should provide enough security.
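For readers who want to see what salting looks like outside the membership provider, here is a minimal sketch written as a C# top-level program. It assumes .NET 6 or later and swaps in PBKDF2 (Rfc2898DeriveBytes.Pbkdf2) as the slow, salted hash; the iteration count, salt size and helper names are assumptions for this example, not recommendations from the answer.

```csharp
using System;
using System.Security.Cryptography;

const int Iterations = 100_000; // assumed work factor; tune for your hardware

// Hypothetical helper: generates a random salt and derives a 32-byte hash from it.
(byte[] Salt, byte[] Hash) HashPassword(string password)
{
    byte[] salt = RandomNumberGenerator.GetBytes(16);
    byte[] hash = Rfc2898DeriveBytes.Pbkdf2(password, salt, Iterations, HashAlgorithmName.SHA256, 32);
    return (salt, hash);
}

// Hypothetical helper: re-derives the hash with the stored salt and compares in constant time.
bool Verify(string password, byte[] salt, byte[] expected)
{
    byte[] actual = Rfc2898DeriveBytes.Pbkdf2(password, salt, Iterations, HashAlgorithmName.SHA256, 32);
    return CryptographicOperations.FixedTimeEquals(actual, expected);
}

var stored = HashPassword("correct horse");
Console.WriteLine(Verify("correct horse", stored.Salt, stored.Hash)); // True
Console.WriteLine(Verify("wrong guess", stored.Salt, stored.Hash));   // False
```

Because every password gets its own random salt, a table precomputed for unsalted hashes does not help an attacker, and two users with the same password end up with different stored values.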
Hash functions are many to one functions. This means that many inputs will give the same result but that for any given input you get one and only one result.
Why this is so can be intuitively seen by considering a hash function that takes a string input of any length and generates a 32 bit integer. There are obviously far more strings than 2^32 which means that your hash function cannot give each input string a unique output. (see http://en.wikipedia.org/wiki/Pigeonhole_principle for more discussion - the Uses and applications section specifically talks about hashes)
Given that any result from our hash function could have been generated from more than one input, and that we have no information other than the result, we have no way to determine which input was used, so it cannot be reversed.
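The pigeonhole argument is easy to watch in action with a deliberately tiny hash. The sketch below is a C# top-level program (C# 9 or later is assumed); TinyHash and FindCollision are names made up for this example. Keeping only the first byte of SHA-256 leaves 256 possible outputs, so among any 257 distinct inputs two must share a hash.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Toy hash: first byte of SHA-256, so only 256 outputs exist.
byte TinyHash(string input)
{
    using var sha256 = SHA256.Create();
    return sha256.ComputeHash(Encoding.UTF8.GetBytes(input))[0];
}

// Tries "input0", "input1", ... and returns the first two inputs that share a hash.
// By the pigeonhole principle this stops after at most 257 attempts.
(string First, string Second) FindCollision()
{
    var seen = new Dictionary<byte, string>();
    for (int i = 0; ; i++)
    {
        string candidate = "input" + i;
        byte h = TinyHash(candidate);
        if (seen.TryGetValue(h, out var earlier))
        {
            return (earlier, candidate);
        }
        seen[h] = candidate;
    }
}

var collision = FindCollision();
Console.WriteLine($"{collision.First} and {collision.Second} both hash to {TinyHash(collision.First)}");
```

Given only the shared hash value, nothing tells you which of the two inputs was used, which is the whole point.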
There are at least two reasons:
Hashing usually uses one-way functions for its calculations, meaning that reversing an operation is MUCH more difficult (in time/resources/effort) than performing it.
Hashes of the same algorithm are always of the same length, meaning there is a limited set of possible hashes. This means that for every hash there is an infinite number of collisions: different source data blocks that produce the same hash value.
It's not encrypt/decrypt. For example, simple hash function:
int hash(int data)
{
    return data % 2;
}
Problem?
Hashing is like using a checksum to verify data, not to encrypt or compress data.
This is essentially math: a hash function is a function that is NOT one-to-one. It takes inputs from the set of all binary data B* and maps them to the set B^n of binary strings of some fixed length n (such a function can still be onto).
You can try to compute a pre-image of a given hash via brute force, but without knowing the size of the input, the search space is infinite.
You can hash any length of data you want, from a single byte to a terabyte file. All possible data can be hashed to a 256 bit value (taking SHA-256 as an example). That means that there are 2^256 possible values output from the SHA-256 hash algorithm. However, there are a lot more than 2^256 possible values that can be input to SHA-256. You can input any combination of bytes for any length you want.
Because there are far more possible inputs than possible outputs, then some of the inputs must generate the same output. Since you don't know which of the many possible inputs generated the output, it is not possible to reliably go backwards.
A very simple hash algorithm would be to take the first character of each word within a text. If you take the same text you always get the same hash, but it is impossible to rebuild the original text from only the first character of each word.
Example hash from my answer above:
AvshawbtttfcoewwatIyttstycagotshbisitrtotfohtfcoew
And now try to find out the corresponding text from the given hash. ;-)
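A sketch of this first-letter hash, written as a C# top-level program (C# 9 or later is assumed). FirstLetterHash is a name made up for this example, and splitting on whitespace only is an assumption about how words are delimited.

```csharp
using System;
using System.Linq;

// Hypothetical helper: concatenates the first character of every whitespace-separated word.
string FirstLetterHash(string text)
{
    var words = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    return string.Concat(words.Select(word => word[0]));
}

Console.WriteLine(FirstLetterHash("Hashing is not encryption")); // Hine
Console.WriteLine(FirstLetterHash("He is no expert"));           // Hine
```

Two unrelated sentences give the same output, so the output alone cannot tell you which one it came from.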
Can someone reverse this handy hash code I'm using?
using System;
using System.Security.Cryptography;
using System.Text;

public static string EncodePasswordToBase64(string password)
{
    byte[] bytes = Encoding.Unicode.GetBytes(password);
    byte[] inArray = HashAlgorithm.Create("SHA1").ComputeHash(bytes);
    return Convert.ToBase64String(inArray);
}
Everything I end up doing fails horribly :(.
No, you can't reverse a hash. The typical process is to compare other input to the hash you have, to check if they are the same.
Like (pseudo):
initial = Hash(password);
possible = Hash("test");
if (initial == possible) {
    // we infer that password = "test"
}
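The pseudocode above could look like this in C#. It is a sketch written as a top-level program (C# 9 or later is assumed); the hashing steps mirror the question's method but call SHA1.Create() directly, and Matches plus the candidate list are made up for this example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Same steps as the question's method: UTF-16 bytes, SHA-1, Base64.
string EncodePasswordToBase64(string password)
{
    using var sha1 = SHA1.Create();
    byte[] bytes = Encoding.Unicode.GetBytes(password);
    return Convert.ToBase64String(sha1.ComputeHash(bytes));
}

// Hypothetical helper: hashes a guess and compares it with the stored hash.
bool Matches(string storedHash, string guess)
{
    return EncodePasswordToBase64(guess) == storedHash;
}

string stored = EncodePasswordToBase64("test");
foreach (string guess in new[] { "password", "123456", "test" })
{
    Console.WriteLine($"{guess}: {Matches(stored, guess)}");
}
// password: False
// 123456: False
// test: True
```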
But note that SHA1, SHA0, and MD5 should no longer be used (each has been broken to some degree). You should use SHA-2.
The only real way of "unhashing" is using a rainbow table, which is a big precomputed table of hashes for a large set of candidate inputs. You look up the hash and get what was probably the original input.
http://en.wikipedia.org/wiki/Rainbow_table
You cannot un-hash SHA1, MD5, or any other one-way hash method unfortunately. Though it is possible to undo BASE-64.
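To show which half is reversible, here is a short sketch written as a C# top-level program (C# 9 or later is assumed); the variable names are made up for this example. Decoding the Base64 text gives back the raw SHA-1 digest, not the password.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] digest;
using (var sha1 = SHA1.Create())
{
    digest = sha1.ComputeHash(Encoding.Unicode.GetBytes("test"));
}

string base64 = Convert.ToBase64String(digest);   // what the question's method returns
byte[] decoded = Convert.FromBase64String(base64); // Base64 undone

Console.WriteLine(base64.Length);                 // 28
Console.WriteLine(decoded.Length);                // 20
Console.WriteLine(decoded.SequenceEqual(digest)); // True: the digest is recovered, the password is not
```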
SHA is an NSA acronym for "Secure Hash Algorithm".
Secure Hashes are hard to reverse by definition -- otherwise they would not be Secure.
You need to use a reversible hash function if you want to be able to easily compute sources that will generate the same hash (and even then you might not get the original source due to hash collisions where more than one source input can result in the same hash output).
I was working on some encryption/decryption algorithms and I noticed that the encrypted byte[] arrays always had a length of 33, and the char[] arrays always had a length of 44. Does anyone know why this is?
(I'm using Rijndael encryption.)
Padding and text encoding. Most encryption algorithms have a block size, and input needs to be padded up to a multiple of that block size. Also, turning binary data into text usually involves the Base64 algorithm, which expands 3 bytes into 4 characters.
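The 3-to-4 expansion can be checked directly. This is a sketch written as a C# top-level program (C# 9 or later is assumed); Base64Length is a helper name made up for this example and assumes standard padded Base64.

```csharp
using System;

// Hypothetical helper: length of padded Base64 text for a given number of bytes.
// Every group of up to 3 bytes becomes 4 characters.
int Base64Length(int byteCount) => 4 * ((byteCount + 2) / 3);

Console.WriteLine(Base64Length(33)); // 44
Console.WriteLine(Base64Length(32)); // 44
Console.WriteLine(Base64Length(64)); // 88
Console.WriteLine(Convert.ToBase64String(new byte[33]).Length); // 44
```

So a 33-byte array and a 44-character array are consistent with the char[] simply being the Base64 form of the byte[].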
That's certainly not true for all encryption algorithms, it must just be a property of the particular one you're using. Without knowing what algorithm it is, I can only guess, but the ratio 33/44 suggests that the algorithm might be compressing each character into 6 bits in the output byte array. That probably means it's making the assumption that no more than 64 distinct characters are used, which is a good assumption for plain text (in fact, that's how base64 decoding works).
But again, without knowing what algorithm you're using, this is all guesswork.
Without knowing the encryption you're using, it's a little tough to determine the exact cause. To start, here's an article on How to Calculate the Size of Encrypted Data. It sounds like you might be using a hash of your plaintext, which is why the result is shorter.
Edit: Here's the source for a Rijndael implementation. It looks like the ciphertext output is initially the same length as the plaintext input, and then they Base64-encode it, which, as the previous poster mentioned, turns every 3 bytes into 4 characters, so the byte array is 3/4 the length of the encoded text (33 bytes, 44 characters).
No, no idea at all, but my first thought would be that your encryption algorithm is built such that it removes 1 bit per 10 from the output data.
Only you can know for sure, since we cannot see your code from out here :-)
It would be a pretty lousy encryption algorithm if it was just replacing bytes one-for-one. That was state of the art 50 years ago, and it didn't work very well even then. :)