crc32 decrypt short string - c#

I am retrieving lists of CRC32 hashes of file names, not of their contents.
I need to be able to decrypt the strings behind these hashes, i.e. names like "vacationplans_2010.txt",
which are less than 25 characters long.
Is this possible?

It is a one-way hash function. It can't be decrypted.

Despite what other users answered, CRC32 is not a cryptographic hash function; it is meant for integrity checks (data checksums).
Cryptographic hash functions are often described as "one-way hash functions"; CRC32 lacks the "one-way" part.
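To make this concrete, here is a minimal Java sketch (java.util.zip.CRC32 computes the same standard CRC-32 used by zip, so it serves as a reference check). It uses the well-known table-inversion trick: every entry in the CRC-32 lookup table has a distinct top byte, so four appended bytes can be chosen to force any CRC value you like. The names Crc32Forge and forgeSuffix are hypothetical, and the sketch assumes the common reflected polynomial 0xEDB88320.

```java
import java.util.zip.CRC32;

// Hypothetical helper: forces the standard CRC-32 (the variant java.util.zip.CRC32
// computes) of prefix + 4 appended bytes to any chosen target value.
final class Crc32Forge {
    private static final int POLY = 0xEDB88320;            // reflected CRC-32 polynomial
    private static final int[] TABLE = new int[256];        // usual byte-at-a-time table
    private static final int[] INDEX_BY_TOP_BYTE = new int[256];

    static {
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
            }
            TABLE[i] = c;
            // Top bytes of the table entries are all distinct, so they identify the index.
            INDEX_BY_TOP_BYTE[c >>> 24] = i;
        }
    }

    // Internal CRC register after processing data (initial value all ones, no final XOR).
    private static int register(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
        }
        return crc;
    }

    static byte[] forgeSuffix(byte[] prefix, long targetCrc) {
        int want = ~(int) targetCrc;                         // undo the final XOR with 0xFFFFFFFF
        // Work backwards from the target to find the table index each appended byte must select.
        int[] idx = new int[4];
        int r = want;
        for (int k = 3; k >= 0; k--) {
            idx[k] = INDEX_BY_TOP_BYTE[r >>> 24];
            r = (r ^ TABLE[idx[k]]) << 8;
        }
        // Run forwards from the prefix's register, choosing each byte so it selects that index.
        byte[] suffix = new byte[4];
        int crc = register(prefix);
        for (int k = 0; k < 4; k++) {
            suffix[k] = (byte) (crc ^ idx[k]);
            crc = (crc >>> 8) ^ TABLE[idx[k]];
        }
        return suffix;
    }

    // Reference CRC-32 of the concatenated parts, using the JDK implementation.
    static long crc32(byte[]... parts) {
        CRC32 crc = new CRC32();
        for (byte[] p : parts) {
            crc.update(p);
        }
        return crc.getValue();
    }
}
```

The four forced bytes are generally not printable, so this does not directly yield a plausible file name. It only shows that producing some input with a given CRC32 is cheap, which is exactly what a one-way function is supposed to prevent.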
That being said, you should consider the following: since the set of all possible file names of 25 characters or fewer is far larger than 2^32, some file names are bound to have the same hash value. Therefore, for some of the CRC32 values you get there may be several possible sources (file names). You will need a way to determine the "real" source (I assume that a human decision would be the best choice, since our brain is a great pattern-recognition device, but it really depends on your scenario).
Several methods can be used to partially achieve what you are asking for. Brute force is one of them (although, with file names up to 25 characters long, exhaustive brute force over every possible name is impractical). A modified dictionary attack is another option. Other options are based on analysis of the CRC32 algorithm and will require that you dive into the implementation details of the algorithm (otherwise you'll have a hard time understanding what you're implementing). For example, see this article, or this article.
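A minimal Java sketch of the "modified dictionary attack" idea, assuming (hypothetically) that names follow a word_year.extension pattern like "vacationplans_2010.txt". The class name, the pattern and the word/extension lists are illustrative assumptions; a real attack would need much broader candidate patterns. Every hit is only a candidate pre-image, for the reason given above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical dictionary attack: builds candidate file names from word, year and
// extension lists and keeps every candidate whose CRC-32 matches the target.
final class Crc32Dictionary {
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    static List<String> candidates(long target, List<String> words,
                                   int fromYear, int toYear, List<String> extensions) {
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            for (int year = fromYear; year <= toYear; year++) {
                for (String ext : extensions) {
                    String name = word + "_" + year + ext;
                    if (crc32(name) == target) {
                        matches.add(name);   // one of possibly several valid pre-images
                    }
                }
            }
        }
        return matches;
    }
}
```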
EDIT: definitions by Bruce Schneier (author of Applied Cryptography, among other things):
One-way functions are relatively easy to compute, but significantly harder to reverse. … In this context, "hard" is defined as something like: It would take millions of years to compute x from f(x), even if all the computers in the world were assigned to the problem.
A hash function is a function, mathematical or otherwise, that takes a variable-length input string (called a pre-image) and converts it to a fixed-length (generally smaller) output string (called a hash value).
The security of a one-way hash function is its one-wayness.

A hash function like CRC32 calculates a simple value from a variable-length input. The calculation is not reversible, i.e. you cannot reliably get the original value given only the hash.

Yes. The general method is to work out the rule by which the hash is computed, and then find an input whose hash result is the same as the one you have.

Related

How to obfuscate an integer?

From a list of integers in C#, I need to generate a list of unique values. I thought of MD5 or similar, but they generate too many bytes.
The integer size is 2 bytes.
I want to get a one-way correspondence, for example
0 -> ARY812Q3
1 -> S6321Q66
2 -> 13TZ79K2
So, given the hash, the user cannot learn the integer or infer a sequence behind a list of hashes.
For now, I tried to use MD5(my number) and then took the first 8 characters. However, I found the first collision at 51389. What other alternatives could I use?
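Assuming "the first 8 characters" means 8 hex digits of the MD5 digest (32 bits), and treating the truncated digest as a uniformly random function, the standard birthday approximation suggests an early collision is expected:

```latex
P(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{32}}\right), \qquad
P(51\,389) \approx 1 - e^{-0.307} \approx 0.26, \qquad
P(65\,536) \approx 1 - e^{-0.5} \approx 0.39
```

So over all 65,536 possible 2-byte integers, a 32-bit truncation has roughly a 39% chance of containing at least one collision, and hitting one around 51,389 is unremarkable. Only a bijection (i.e. encryption rather than hashing) rules collisions out.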
As I said, I only need one direction. It is not necessary to be able to calculate the integer from the hash. The system uses a dictionary to find them.
UPDATE:
Replying to some suggestions about using GetHashCode(): GetHashCode returns the same integer. My purpose is to hide the integer from the end user. In this case, the integer is the primary key of a database table. I do not want to give this information to users, because they could deduce the number of records in the database or how many records are added per week.
Hashes are not unique, so maybe I need to use encryption like TripleDES or similar, but I wanted to use something fast and simple. Also, TripleDES returns too many bytes too.
UPDATE 2:
Talking about hashes was a mistake. What I am really trying to do is obfuscate the integer. I tried doing it with a hash algorithm, which is not a good idea because hashes are not unique.
Update May 2017
Feel free to use (or modify) the library I developed, installable via Nuget with:
Install-Package Kent.Cryptography.Obfuscation
This converts a non-negative ID such as 127 to an 8-character string, e.g. xVrAndNb, and back (with some available options to randomize the sequence each time it's generated).
Example Usage
var obfuscator = new Obfuscator();
string maskedID = obfuscator.Obfuscate(15);
Full documentation at: GitHub.
Old Answer
I came across this problem way back and couldn't find what I wanted on Stack Overflow, so I made this obfuscation class and shared it on GitHub.
Obfuscation.cs - GitHub
You can use it by:
Obfuscation obfuscation = new Obfuscation();
string maskedValue = obfuscation.Obfuscate(5);
int? value = obfuscation.DeObfuscate(maskedValue);
Perhaps it can be of help to future visitors :)
Encrypt it with Skip32, which produces a 32-bit output. I found this C# implementation but can't vouch for its correctness. Skip32 is a relatively uncommon crypto choice and probably hasn't been analyzed much. Still, it should be sufficient for your obfuscation purposes.
The strong choice would be format preserving encryption using AES in FFX mode. But that's pretty complicated and probably overkill for your application.
When encoded with Base32 (case-insensitive, alphanumeric), a 32-bit value corresponds to 7 characters; when encoded in hex, it corresponds to 8 characters.
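Skip32 itself is not reproduced here. As a stand-in, this is a toy Java sketch of the same idea: a keyed 4-round Feistel network on 32-bit values, followed by 8-character hex encoding. Feistel networks are always invertible, so distinct IDs can never collide. The class name ToyFeistel32, the round function and the keys are made up for illustration; the round function has not been analyzed and should not be used in place of Skip32 or FFX.

```java
// Toy Feistel permutation on 32-bit values (NOT Skip32). The round function is an
// ad hoc mixer for illustration only. Any Feistel network is invertible, so distinct
// inputs always give distinct outputs, which a truncated hash cannot guarantee.
final class ToyFeistel32 {
    private final int[] roundKeys;

    ToyFeistel32(int... roundKeys) {
        this.roundKeys = roundKeys.clone();
    }

    // Round function on 16-bit halves: mix in the key, multiply, fold high bits down.
    private static int f(int half, int key) {
        int x = (half ^ key) & 0xFFFF;
        x = (x * 0x9E37) & 0xFFFF;
        x ^= x >>> 7;
        return x & 0xFFFF;
    }

    int encrypt(int value) {
        int left = value >>> 16, right = value & 0xFFFF;
        for (int k : roundKeys) {            // (L, R) -> (R, L ^ f(R, k))
            int next = left ^ f(right, k);
            left = right;
            right = next;
        }
        return (left << 16) | right;
    }

    int decrypt(int value) {
        int left = value >>> 16, right = value & 0xFFFF;
        for (int i = roundKeys.length - 1; i >= 0; i--) {   // undo rounds in reverse order
            int prev = right ^ f(left, roundKeys[i]);
            right = left;
            left = prev;
        }
        return (left << 16) | right;
    }

    // 32 bits -> exactly 8 hex characters (leading zeros kept).
    static String toHex(int value) {
        return String.format("%08x", value);
    }
}
```

Because the mapping is a permutation of all 32-bit values, the 65,536 possible 2-byte IDs map to 65,536 distinct 8-character strings, and whoever holds the keys can map them back.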
There is also the non-cryptographic alternative of generating a random value, storing it in the database and handling collisions.
XOR the integer, perhaps with a random key that is generated per user (stored in the session). While it's not strictly a hash (as it is reversible), the advantages are that you don't need to store it anywhere, and the size will be the same.
For what you want, I'd recommend using GUIDs (or other kind of unique identifier where the probability of collision is either minimal or none) and storing them in the database row, then just never show the ID to the user.
IMHO, it's kind of bad practice to ever show the primary key in the database to the user (much less to let users do any kind of operations on them).
If they need raw access to the database for some reason, then just don't use ints as primary keys; make them GUIDs (but then your requirement loses importance, since they can simply count the records).
Edit
Based on your requirements, if you don't mind that the algorithm is potentially computationally expensive, you can just generate a random 8-byte string every time a new row is added, and keep generating random strings until you find one that is not already in the database.
This is far from optimal and can be computationally expensive, but given that you use a 16-bit ID and the maximum number of rows is 65,536, I wouldn't worry too much about it: the probability that a random 8-byte string is already among at most 65,536 existing values is minimal, so if your pseudo-random generator is good you will almost always succeed on the first try, or at worst the second.
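A minimal Java sketch of the random-string-with-retry approach. The in-memory Set is a stand-in (an assumption) for a database column with a unique constraint, and the 32-symbol alphabet, which drops the easily confused I, O, 0 and 1, is an arbitrary choice. RandomCodes is a hypothetical name.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

// Random code + retry on collision. The Set stands in for a UNIQUE database column.
final class RandomCodes {
    private static final String ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"; // 32 symbols
    private final SecureRandom random = new SecureRandom();
    private final Set<String> issued = new HashSet<>();

    String next(int length) {
        while (true) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            String code = sb.toString();
            if (issued.add(code)) {   // add() returns false if the code was already issued
                return code;
            }
        }
    }
}
```

With 8 characters from 32 symbols there are 2^40 possible codes, so with at most 65,536 rows a retry is rare.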

How to get a reasonable CRC of CRCs

I have a tree structure where each node knows its CRC. What's a reasonable way to compute a CRC for each sub-tree that would give me a good value for the entire sub-tree to that point? In other words, a value to identify if any part of the sub-tree was changed.
My current thought is simply take each child node CRC, convert it to a string/byte[], concatenate all the nodes together, and take the CRC of that byte[]. But I'm not sure if this might lead to easy collisions as I suspect this removes quite a bit of information.
(I looked at crc32_combine but it seems inappropriate because I don't have any lengths. I could use a length of zero, but would that be any better or worse?)
Working in C# but I guess this is really language agnostic.
EDIT: Ended up going with this technique. Will switch to longer hashes if collisions turn out to be a problem. While I don't need leaf order to matter now, I'm not using XOR in case it matters later on.
Ideally you would combine the CRCs of the nodes to compute the CRC of a sub-tree, using something like crc32_combine(). The result would be the same as computing the CRC over all the nodes in whatever canonical ordering you have defined. This would only check the ordering though, not the structure of the tree. A different structure with the same ordering would give the same CRC. This will be true no matter how you combine the CRCs, unless you include additional information on the tree structure.
The use of crc32_combine() requires the length of the data for each of the CRCs being combined (except the first). This is apparently not saved and not available in this case. You can instead make a stream of bytes of the CRCs in the canonical order and take the CRC of that stream. (You will need to decide if the CRCs are to be stored big or little endian in the byte stream, and then stick to your convention.)
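A minimal Java sketch of that approach: write the child CRCs into a byte stream in canonical order, using little-endian as the chosen convention (an arbitrary but fixed choice), and take the CRC-32 of the stream. SubtreeCrc is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;
import java.util.zip.CRC32;

// Combines child CRCs by serializing them, in canonical order and little-endian,
// into one byte stream and taking the CRC-32 of that stream.
final class SubtreeCrc {
    static long combine(List<Long> childCrcsInCanonicalOrder) {
        ByteBuffer buf = ByteBuffer.allocate(4 * childCrcsInCanonicalOrder.size())
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (long c : childCrcsInCanonicalOrder) {
            buf.putInt((int) c);   // low 32 bits: the CRC value itself
        }
        CRC32 crc = new CRC32();
        crc.update(buf.array());
        return crc.getValue();
    }
}
```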
The use of cryptographic hashes such as SHA1 or MD5 is unnecessary, unless you are worried for some reason that a devious human is interfering with your computed check values and trying to trick you into thinking that the contents of the tree have not changed when they have. (The devious human can already do this at the level of the nodes anyway, since CRCs are easily spoofed.) Otherwise such hashes are just a waste of CPU time. If you simply want a longer hash, more than 32 bits, to reduce the probability of collisions, then you can use a fast hash function such as one from the CityHash family.
I'd probably use at least SHA1 for your checksums, since collisions aren't that infrequent for MD5. Your idea about combining the CRCs seems solid, though personally I'd XOR the hashes together to save on RAM and CPU cycles.
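For comparison, a Java sketch of XOR combining (XorCombine is a hypothetical name). It is cheap, but it has the trade-off the question's EDIT alludes to: the result does not depend on child order, and two identical child hashes cancel each other out.

```java
import java.util.List;

// XOR-combining child hashes: cheap, but order-insensitive, and identical pairs cancel.
final class XorCombine {
    static long combine(List<Long> childHashes) {
        long acc = 0;
        for (long h : childHashes) {
            acc ^= h;
        }
        return acc;
    }
}
```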
You should use something designed for this such as SHA-2. You may be able to get away with CRC32 depending on your particular requirements. There is a similar question posted here with more discussion:
Can CRC32 be used as a hash function?

Hashes (MD5, SHA1, SHA256, SHA384, SHA512) - why isn't it possible to get the value back from the hash?

In this blog post, there is the following sentence:
This hash is unique for the given text. If you use the hash function on the same text again, you'll get the same hash. But there is no way to get the given text from the hash.
Forgive my ignorance of math, but I cannot understand why it is not possible to get the given text from the hash.
I would understand if we use one key to encrypt the value and another to decrypt but I cannot figure it out in my mind. What is really going on here behind the scenes?
Anything that clears my mind will be appreciated.
Hashing is not encryption.
A hash produces a "digest" - a summary of the input. Whatever the input size, the hash size is always the same (see how MD5 returns the same size result for any input size).
With a hash, you can get the same hash from several different inputs (hash collisions) - how would you reverse this? Which is the correct input?
I suggest reading this blog post from Troy Hunt on the matter in order to gain better understanding of hashes, passwords and security.
Encryption is a different thing: you get a different ciphertext from the input and key, and the size of the ciphertext tends to grow as the input grows. This is reversible if you have the right key.
Update (following the different comments):
Though collisions can happen, when using a cryptographically significant hash (like the ones you have posted about), they will be rare and difficult to produce.
When hashing passwords, always use a salt - this reduces the chances of the hash being reversed by rainbow tables to almost nothing (assuming a good salt has been used).
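As an illustration of salted password hashing, a minimal Java sketch using PBKDF2 from javax.crypto (in .NET the corresponding class is Rfc2898DeriveBytes). The 16-byte salt, the 256-bit output and the caller-supplied iteration count are illustrative choices, not recommendations for any particular system. PasswordHashing is a hypothetical name.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Salted, deliberately slow password hashing with PBKDF2-HMAC-SHA256.
final class PasswordHashing {
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);   // a fresh random salt per password
        return salt;
    }

    static byte[] hash(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256); // 256-bit output
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                                   .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            spec.clearPassword();             // wipe the spec's internal copy of the password
        }
    }
}
```

The same password with the same salt always gives the same hash (so it can be verified), while a different salt gives an unrelated hash, which is what defeats precomputed rainbow tables.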
You need to decide about the tradeoffs of the cost of hashing (can be processor intensive) and the cost of what you are protecting.
As you are simply protecting the login details, using the .NET membership provider should provide enough security.
Hash functions are many to one functions. This means that many inputs will give the same result but that for any given input you get one and only one result.
Why this is so can be intuitively seen by considering a hash function that takes a string input of any length and generates a 32 bit integer. There are obviously far more strings than 2^32 which means that your hash function cannot give each input string a unique output. (see http://en.wikipedia.org/wiki/Pigeonhole_principle for more discussion - the Uses and applications section specifically talks about hashes)
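The pigeonhole argument can be checked directly with a 32-bit hash. This Java sketch (Crc32Collision is a hypothetical name) hashes pseudo-random 12-letter strings with CRC-32 until two different strings share a value; by the birthday bound this typically happens after something on the order of 2^16 to 2^17 strings, far fewer than 2^32.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.zip.CRC32;

// Pigeonhole/birthday demo: find two different strings with the same CRC-32.
final class Crc32Collision {
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.US_ASCII));
        return crc.getValue();
    }

    private static String randomName(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    /** Returns two distinct strings with equal CRC-32, or null if none is found within limit tries. */
    static String[] find(long seed, int limit) {
        Random random = new Random(seed);
        Map<Long, String> seen = new HashMap<>();
        for (int i = 0; i < limit; i++) {
            String s = randomName(random, 12);
            String previous = seen.putIfAbsent(crc32(s), s);
            if (previous != null && !previous.equals(s)) {
                return new String[] {previous, s};
            }
        }
        return null;
    }
}
```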
Given we now know that any result from our hash function could have been generated from one or more inputs and we have no information other than the result we have no way to determine which input was used so it cannot be reversed.
There are at least two reasons:
Hashing usually uses one-way ("asymmetric") functions for its calculations, meaning that computing the reverse of an operation is MUCH more difficult (in time/resources/effort) than the forward operation.
Hashes from the same algorithm are always the same length, meaning there is a limited set of possible hashes. This means that for every hash there is an infinite number of collisions: different source data blocks that produce the same hash value.
It's not encrypt/decrypt. For example, simple hash function:
int hash(int data)
{
    return data % 2;
}
Problem?
Hashing is like using a checksum to verify data, not to encrypt or compress data.
This is essentially math: a hash function is a function that is NOT one-to-one. It maps inputs from the set of all binary strings B* to the set Bn of binary strings of some fixed length n. (This definition is onto, however.)
You can try to calculate a pre-image of a given hash via brute force, but without knowing the input size, the search space is infinite.
You can hash any length of data you want, from a single byte to a terabyte file. All possible data can be hashed to a 256 bit value (taking SHA-256 as an example). That means that there are 2^256 possible values output from the SHA-256 hash algorithm. However, there are a lot more than 2^256 possible values that can be input to SHA-256. You can input any combination of bytes for any length you want.
Because there are far more possible inputs than possible outputs, then some of the inputs must generate the same output. Since you don't know which of the many possible inputs generated the output, it is not possible to reliably go backwards.
A very simple hash algorithm would be to take the first character of each word within a text. If you take the same text you always get the same hash, but it is impossible to rebuild the original text from only the first character of each word.
Example hash from my answer above:
AvshawbtttfcoewwatIyttstycagotshbisitrtotfohtfcoew
And now try to find out the corresponding text from the given hash. ;-)
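The toy hash above is easy to write down in Java (FirstLetterHash is a hypothetical name). Unlike real hashes its output length is not fixed, but it shows the many-to-one property: quite different texts produce the same value.

```java
// "First character of each word" toy hash: deterministic but many-to-one.
final class FirstLetterHash {
    static String hash(String text) {
        StringBuilder sb = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {        // skips the single empty token of an empty text
                sb.append(word.charAt(0));
            }
        }
        return sb.toString();
    }
}
```

For example, "A very simple hash" and "Another violet song here" both hash to "Avsh".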

Decrypt hash string from String.GetHashCode?

From this sample code from MSDN
http://msdn.microsoft.com/en-us/library/system.string.gethashcode.aspx
The hash code for "abc" is: 536991770
But how to convert back the "536991770" to "abc"?
There is no way to get the value back from the hash code. See the definition of a hash function.
Hash values are not used to uniquely identify the original value; hash values are not unique for each input value.
A hash function may map two or more keys to the same hash value. In many applications, it is desirable to minimize the occurrence of such collisions, which means that the hash function must map the keys to the hash values as evenly as possible.
You cannot. Hashes are one way.
The thing with hashes is that you lose information. Independent of the length of the string, the result is always an integer. This means e.g. that the hash of a string of 10,000 characters is also just an integer. It is of course impossible to get the original string back from this integer.
There is no way to "decrypt" the hash code. Amongst other reasons, because two different strings may very well produce the same hash code. That feature alone would make it impossible to reverse the process.
You cannot.
Even if you had a table of all the strings in the world and their hash codes, you wouldn't be able to do it, since there are more strings than ints (~4 billion ints), so several strings result in the same hash code.
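The same thing is easy to demonstrate with Java's String.hashCode(), whose formula is specified. (.NET's String.GetHashCode is not guaranteed to be stable across versions and, on .NET Core, is randomized per process, which makes it less convenient for a reproducible example.) Since "Aa" and "BB" have the same hash code, every string built from those two-character blocks collides with every other string of the same length built the same way. HashCodeCollisions is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// "Aa" and "BB" share a hashCode (2112), and for equal-length blocks
// h(XY) = h(X) * 31^2 + h(Y), so all block combinations of a given length collide.
final class HashCodeCollisions {
    /** All 2^blocks strings made of "Aa"/"BB" blocks; they share one hashCode(). */
    static List<String> colliding(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }
}
```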

What hash algorithm does .net utilise? What about java?

Regarding the HashTable (and subsequent derivatives of such) does anyone know what hashing algorithm .net and Java utilise?
Are List and Dictionary both direct descendants of Hashtable?
The hash function is not built into the hash table; the hash table invokes a method on the key object to compute the hash. So, the hash function varies depending on the type of key object.
In Java, a List is not a hash table (that is, it doesn't extend the Map interface). One could implement a List with a hash table internally (a sparse list, where the list index is the key into the hash table), but such an implementation is not part of the standard Java library.
I know nothing about .NET but I'll attempt to speak for Java.
In Java, the hash code is ultimately a combination of the code returned by a given object's hashCode() function, and a secondary hash function inside the HashMap/ConcurrentHashMap class (interestingly, the two use different functions). Note that Hashtable and Dictionary (the precursors to HashMap and AbstractMap) are obsolete classes. And a list is really just "something else".
As an example, the String class constructs a hash code by repeatedly multiplying the current code by 31 and adding in the next character. See my article on how the String hash function works for more information. Numbers generally use "themselves" as the hash code; other classes, e.g. Rectangle, that have a combination of fields often use a combination of the String technique of multiplying by a small prime number and adding in, but add in the various field values. (Choosing a prime number means you're unlikely to get "accidental interactions" between certain values and the hash code width, since they don't divide by anything.)
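The String formula described above, written out in Java (PolyHash is a hypothetical name). It reproduces String.hashCode(); the combine method applies the same multiply-by-31-and-add pattern to four int fields, starting from 1 so that it matches java.util.Arrays.hashCode(int[]).

```java
// Hand-written versions of the "multiply by 31 and add" hash.
final class PolyHash {
    // Same result as String.hashCode(): h = 31*h + c for each char, with int overflow.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // The same pattern applied to the fields of a hypothetical rectangle-like class.
    static int combine(int x, int y, int width, int height) {
        int h = 1;
        h = 31 * h + x;
        h = 31 * h + y;
        h = 31 * h + width;
        h = 31 * h + height;
        return h;
    }
}
```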
Since the hash table size-- i.e. the number of "buckets" it has-- is a power of two, a bucket number is derived from the hash code essentially by lopping off the top bits until the hash code is in range. The secondary hash function protects against hash functions where all or most of the randomness is in those top bits, by "spreading the bits around" so that some of the randomness ends up in the bottom bits and doesn't get lopped off. The String hash code would actually work fairly well without this mixing, but user-created hash codes may not work quite so well. Note that if two different hash codes resolve to the same bucket number, Java's HashMap implementations use the "chaining" technique-- i.e. they create a linked list of entries in each bucket. It's thus important for hash codes to have a good degree of randomness so that items don't cluster into a particular range of buckets. (However, even with a perfect hash function, you will still by law of averages expect some chaining to occur.)
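A small Java sketch of the bucket selection just described. It follows the Java 8+ HashMap, whose spreading step is simply h ^ (h >>> 16); older versions used a more elaborate supplemental hash. BucketIndex is a hypothetical name.

```java
// Spread the hash code so high bits influence the low bits, then mask to a bucket.
final class BucketIndex {
    static int spread(int h) {
        return h ^ (h >>> 16);                 // XOR the high 16 bits into the low 16
    }

    static int bucket(int hash, int tableSize) {   // tableSize must be a power of two
        return hash & (tableSize - 1);             // keep only the low bits
    }
}
```

For a 16-bucket table, the hash codes 0x10000 and 0x20000 both land in bucket 0 without spreading, but in buckets 1 and 2 with it.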
Hash code implementations shouldn't be a mystery. You can look at the hashCode() source for any class you choose.
The HASHING algorithm is the algorithm used to determine the hash code of an item within the HashTable.
The HASHTABLE algorithm (which I think is what this person is asking) is the algorithm the HashTable uses to organize its elements given their hash code.
Java happens to use a chained hash table algorithm.
While looking for the same answer myself, I found this in .NET's reference source (http://referencesource.microsoft.com):
/*
Implementation Notes:
The generic Dictionary was copied from Hashtable's source - any bug
fixes here probably need to be made to the generic Dictionary as well.
This Hashtable uses double hashing. There are hashsize buckets in the
table, and each bucket can contain 0 or 1 element. We a bit to mark
whether there's been a collision when we inserted multiple elements
(ie, an inserted item was hashed at least a second time and we probed
this bucket, but it was already in use). Using the collision bit, we
can terminate lookups & removes for elements that aren't in the hash
table more quickly. We steal the most significant bit from the hash code
to store the collision bit.
Our hash function is of the following form:
h(key, n) = h1(key) + n*h2(key)
where n is the number of times we've hit a collided bucket and rehashed
(on this particular lookup). Here are our hash functions:
h1(key) = GetHash(key); // default implementation calls key.GetHashCode();
h2(key) = 1 + (((h1(key) >> 5) + 1) % (hashsize - 1));
The h1 can return any number. h2 must return a number between 1 and
hashsize - 1 that is relatively prime to hashsize (not a problem if
hashsize is prime). (Knuth's Art of Computer Programming, Vol. 3, p. 528-9)
If this is true, then we are guaranteed to visit every bucket in exactly
hashsize probes, since the least common multiple of hashsize and h2(key)
will be hashsize * h2(key). (This is the first number where adding h2 to
h1 mod hashsize will be 0 and we will search the same bucket twice).
We previously used a different h2(key, n) that was not constant. That is a
horrifically bad idea, unless you can prove that series will never produce
any identical numbers that overlap when you mod them by hashsize, for all
subranges from i to i+hashsize, for all i. It's not worth investigating,
since there was no clear benefit from using that hash function, and it was
broken.
For efficiency reasons, we've implemented this by storing h1 and h2 in a
temporary, and setting a variable called seed equal to h1. We do a probe,
and if we collided, we simply add h2 to seed each time through the loop.
A good test for h2() is to subclass Hashtable, provide your own implementation
of GetHash() that returns a constant, then add many items to the hash table.
Make sure Count equals the number of items you inserted.
Note that when we remove an item from the hash table, we set the key
equal to buckets, if there was a collision in this bucket. Otherwise
we'd either wipe out the collision bit, or we'd still have an item in
the hash table.
--
*/
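A Java sketch of the probe sequence from the comment above, h(key, n) = h1(key) + n*h2(key) taken modulo hashsize. It follows the comment's formula for h2 and masks h1 to a non-negative value (an assumption); the shipped .NET code may differ in detail. DoubleHashingProbe is a hypothetical name.

```java
// Double hashing: probe buckets h1, h1 + h2, h1 + 2*h2, ... (mod hashsize).
final class DoubleHashingProbe {
    static int[] probeSequence(int hashCode, int hashsize) {
        long h1 = hashCode & 0x7FFFFFFF;                   // assumption: h1 made non-negative
        long h2 = 1 + (((h1 >> 5) + 1) % (hashsize - 1));  // always in 1 .. hashsize - 1
        int[] buckets = new int[hashsize];
        for (int n = 0; n < hashsize; n++) {
            buckets[n] = (int) ((h1 + n * h2) % hashsize);
        }
        return buckets;
    }
}
```

When hashsize is prime, h2 is coprime to it, so the first hashsize probes visit every bucket exactly once.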
Anything purporting to be a HashTable or something like it in .NET does not implement its own hashing algorithm: they always call the object-being-hashed's GetHashCode() method.
There is a lot of confusion though as to what this method does or is supposed to do, especially when concerning user-defined or otherwise custom classes that override the base Object implementation.
For .NET, you can use Reflector to see the various algorithms. There is a different one for the generic and non-generic hash table, plus of course each class defines its own hash code formula.
The .NET Dictionary<TKey, TValue> class uses an IEqualityComparer<TKey> to compute hash codes for keys and to compare keys during hash lookups.
If you don't provide an IEqualityComparer<TKey> when constructing the Dictionary<TKey, TValue> instance (it's an optional constructor argument), it will create a default one for you, which uses the object.GetHashCode and object.Equals methods by default.
As for how the standard GetHashCode implementation works, I'm not sure it's documented. For specific types you can read the source code for the method in Reflector or try checking the Rotor source code to see if it's there.
