I would like to generate a long UUID - something like the session key used by gmail. It should be at least 256 chars and no more than 512. It can contain all alpha-numeric chars and a few special chars (the ones below the function keys on the keyboard). Has this been done already or is there a sample out there?
C++ or C#
Update: A GUID is not enough. We already have been seeing collisions and need to remedy this. 512 is the max as of now because it will prevent us from changing stuff that was already shipped.
Update 2: For the guys who are insisting about how unique the GUID is: if someone wants to guess your next session ID, they don't have to compute the combinations for the next 1 trillion years. All they have to do is constrain the time factor and they will be done in hours.
If your GUIDs are colliding, may I ask how you're generating them?
It is astronomically improbable that GUIDs would collide as they are based on:
60 bits - timestamp during generation
48 bits - computer identifier
14 bits - unique ID
6 bits are fixed
You would have to run the GUID generation on the same machine about 50 times in the exact same instant in time in order to have a 50% chance of collision. Note that the timestamp is measured in 100-nanosecond intervals.
Update:
As per your comment "putting GUIDs into a hashtable"... the GetHashCode() method is what is causing the collision, not the GUIDs:
public override int GetHashCode()
{
return ((this._a ^ ((this._b << 0x10) | ((ushort) this._c))) ^ ((this._f << 0x18) | this._k));
}
You can see it returns an int, so if you have more than 2^32 "GUIDs" in the hashtable, you are 100% going to have a collision.
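The 2^32 figure is where a collision becomes certain; by the birthday bound, 32-bit hash codes are likely to collide much earlier. Here is a rough sketch of the usual approximation p = 1 - exp(-n(n-1)/(2N)), written in C++ since the question allows it. The function name is made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two of n items share a value
// when each item is drawn uniformly from `space` possible values.
// Uses the standard birthday approximation 1 - exp(-n(n-1) / (2 * space)).
double collision_probability(double n, double space) {
    assert(n >= 0.0 && space > 0.0);
    // expm1 keeps precision when the probability is tiny.
    return -std::expm1(-n * (n - 1.0) / (2.0 * space));
}
```

With a 32-bit hash code, collision_probability(77163, 4294967296.0) comes out at roughly one half, so a hashtable with under a hundred thousand GUIDs can already be expected to see colliding hash codes while the GUIDs themselves remain distinct.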
As per your Update 2, you are correct that GUIDs are predictable; even MSDN notes that. Here is a method that uses a cryptographically strong random number generator to create the ID.
static long counter; //store and load the counter from persistent storage every time the program loads or closes.
public static string CreateRandomString(int length)
{
long count = System.Threading.Interlocked.Increment(ref counter);
int PasswordLength = length;
String _allowedChars = "abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNOPQRSTUVWXYZ23456789";
Byte[] randomBytes = new Byte[PasswordLength];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);
char[] chars = new char[PasswordLength];
int allowedCharCount = _allowedChars.Length;
for (int i = 0; i < PasswordLength; i++)
{
while (randomBytes[i] >= 256 - (256 % allowedCharCount)) // redraw bytes from the uneven tail so the modulo below stays unbiased
{
byte[] tmp = new byte[1];
rng.GetBytes(tmp);
randomBytes[i] = tmp[0];
}
chars[i] = _allowedChars[(int)randomBytes[i] % allowedCharCount];
}
byte[] buf = new byte[8];
buf[0] = (byte) count;
buf[1] = (byte) (count >> 8);
buf[2] = (byte) (count >> 16);
buf[3] = (byte) (count >> 24);
buf[4] = (byte) (count >> 32);
buf[5] = (byte) (count >> 40);
buf[6] = (byte) (count >> 48);
buf[7] = (byte) (count >> 56);
return Convert.ToBase64String(buf) + new string(chars);
}
EDIT: I know there is some biasing because 256 is not evenly divisible by allowedCharCount; you can get rid of the bias by throwing the value away and drawing a new random number whenever it lands in the no-man's-land of the remainder.
EDIT 2: This is not guaranteed to be unique. You could hold a static 64-bit (or wider, if necessary) monotonic counter, encode it to Base64, and have that be the first 4-5 characters of the ID.
UPDATE - Now guaranteed to be unique
UPDATE 2: Algorithm is now slower but removed biasing.
EDIT: I just ran a test and wanted to let you know that ToBase64String can return non-alphanumeric characters (for example, 1 encodes to "AQAAAAAAAAA="), just so you are aware.
New Version:
Taking from Matt Dotson's answer on this page, if you are not so worried about the keyspace you can do it this way and it will run a LOT faster.
public static string CreateRandomString(int length)
{
length -= 12; //12 digits are the counter
if (length <= 0)
throw new ArgumentOutOfRangeException("length");
long count = System.Threading.Interlocked.Increment(ref counter);
Byte[] randomBytes = new Byte[length * 3 / 4];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);
byte[] buf = new byte[8];
buf[0] = (byte)count;
buf[1] = (byte)(count >> 8);
buf[2] = (byte)(count >> 16);
buf[3] = (byte)(count >> 24);
buf[4] = (byte)(count >> 32);
buf[5] = (byte)(count >> 40);
buf[6] = (byte)(count >> 48);
buf[7] = (byte)(count >> 56);
return Convert.ToBase64String(buf) + Convert.ToBase64String(randomBytes);
}
StringBuilder sb = new StringBuilder();
for (int i = 0; i < HOW_MUCH_YOU_WANT / 32; i++)
sb.Append(Guid.NewGuid().ToString("N"));
return sb.ToString();
but what for?
The problem here is why, not how. A session ID bigger than a GUID is useless, because it's already big enough to thwart brute force attacks.
If you're concerned about predicting GUID's, don't be. Unlike the earlier, sequential GUID's, V4 GUID's are cryptographically secure, based on RC4. The only exploit I know about depends on having full access to the internal state of the process that's generating the values, so it can't get you anywhere if all you have is a partial sequence of GUID's.
If you're paranoid, generate a GUID, hash it with something like SHA-1, and use that value. However, this is a waste of time. If you're concerned about session hijacking, you should be looking at SSL, not this.
byte[] random = new Byte[384];
//RNGCryptoServiceProvider is an implementation of a random number generator.
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(random);
var sessionId = Convert.ToBase64String(random);
You can replace the "+", "/" and "=" characters from the Base64 encoding with whatever special characters are acceptable to you.
Base64 encoding creates a string that is 4/3 larger than the byte array (hence the 384 bytes should give you 512 characters).
This should give you orders of magnitude more values than a base16 (hex) encoded GUID: 16^32 (2^128) vs 64^512 (2^3072).
Also if you are putting these in sql server, make sure to turn OFF case insensitivity.
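For the C++ side of the question, the same idea can be sketched with the standard library alone. This is only a sketch under one loud assumption: std::random_device must be backed by the operating system's entropy source on your platform, which the C++ standard does not promise. The names kIdAlphabet and make_session_id are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// 64 symbols, so every character carries 6 bits.
const std::string kIdAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_";

// Builds an identifier of `length` characters drawn uniformly from kIdAlphabet.
// ASSUMPTION: std::random_device is non-deterministic on the target platform;
// check your standard library's documentation before relying on it for security.
std::string make_session_id(std::size_t length) {
    assert(kIdAlphabet.size() == 64);
    std::random_device entropy;
    // uniform_int_distribution avoids the modulo bias discussed in another answer.
    std::uniform_int_distribution<std::size_t> pick(0, kIdAlphabet.size() - 1);
    std::string id;
    id.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        id.push_back(kIdAlphabet[pick(entropy)]);
    }
    return id;
}
```

Calling make_session_id(512) gives 512 characters, i.e. 3072 bits if the entropy source is as good as assumed. The "-" and "_" symbols are just one choice of URL-friendly special characters; swap them for whatever your shipped format accepts.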
There are two really easy ways (C#):
1) Generate a bunch of Guids using Guid.NewGuid().ToString("N"). each GUID will be 32 characters long, so just generate 8 of them and concatenate them to get 256 chars.
2) Create a constant string (const string sChars = "abcdef") of acceptable characters you'd like in your UID. Then in a loop, randomly pick characters from that string by randomly generating a number from 0 to the length of the string of acceptable characters (sChars), and concatenate them in a new string (use stringbuilder to make it more performant, but string will work too).
You may want to check out boost's Uuid Library. It supports a variety of generators, including a random generator that might suit your needs.
I would use some kind of hash of std::time() probably sha512.
ex (using crypto++ for the sha hash + base64 encoding).
#include <iostream>
#include <sstream>
#include <ctime>
#include <crypto++/sha.h>
#include <crypto++/base64.h>
int main() {
std::string digest;
std::stringstream ss("");
ss << std::time(NULL);
// borrowed from http://www.cryptopp.com/fom-serve/cache/50.html
CryptoPP::SHA512 hash;
CryptoPP::StringSource foo(ss.str(), true,
new CryptoPP::HashFilter(hash,
new CryptoPP::Base64Encoder(
new CryptoPP::StringSink(digest))));
std::cout << digest << std::endl;
return 0;
}
https://github.com/bigfatsea/SUID Simple Unique Identifier
Although it is written in Java, it can easily be ported to any other language. You may expect duplicated IDs on the same instance 136 years later, which is good enough for small to medium projects.
Example:
long id = SUID.id().get();
All conversions of the 160 bit SHA1 use 40 ascii characters (320 bits) to represent 160 bits of data (that I have been able to find). I have a need to optimize this and use as few ascii characters as possible to represent a SHA1 hash.
For instance this string "The quick brown fox jumps over the lazy dog" equals this in ASCII "2FD4E1C67A2D28FCED849EE1BB76E7391B93EB12" when converted by typical algorithms.
I have created an algorithm that uses 5 bits for each ASCII character, so I go from needing 40 ASCII characters to 32: "F0K1032QD08C1M44U11B0R77P3R31L2I".
Does anybody have a better way to get fewer characters, but not lose information (by something like a lossy compression technique or using a smaller hash like MD5)?
I potentially need to represent this hash as a folder name on Windows, so using upper and lower case to get 6 bits per character can't be done.
class Program
{
static byte[] GetBytesForTypical(byte[] hash)
{
List<byte> newHash = new List<byte>();
foreach (byte b in hash)
{
int first4Bits = (b & 0xF0) >> 4;
int last4bits = b & 0x0F;
newHash.Add((byte)first4Bits);
newHash.Add((byte)last4bits);
}
return newHash.ToArray();
}
public static string ConvertHashToFileSystemFriendlyStringTypical(byte[] str)
{
StringBuilder strToConvert = new StringBuilder();
foreach (byte b in str)
{
strToConvert.Append(b.ToString("X"));
}
return strToConvert.ToString();
}
static byte[] GetBytesForCompressedAttempt(byte[] hash)
{
byte[] newHash = new byte[32];
// the bit array 5 bits at a time
// at 8 bits per bytes that is 40 bits per loop 4 times
int byteCounter =0;
int k = 0;
for(int i=0; i < 4 ;++i)
{
//Get 5 bits worth
newHash[k] = (byte)(hash[byteCounter] & 0x1F);
hash[byteCounter] >>= 5;
++k;
//Get 3 bits
newHash[k] = (byte)(hash[byteCounter] & 0x7);
newHash[k] <<= 2;
++byteCounter;
// get 2 bits
newHash[k] = (byte)(hash[byteCounter] & 0x3);
++k;
// get 5 bits
newHash[k] = (byte)(hash[byteCounter] & 0x1F);
hash[byteCounter] >>= 5;
++k;
// get 1 bit
newHash[k] = (byte)(hash[byteCounter] & 0x1);
newHash[k] <<= 7;
++byteCounter;
// get 4 bits
newHash[k] = (byte)(hash[byteCounter] & 0xF);
++k;
hash[byteCounter] >>= 4;
// get 4 bits
newHash[k] = (byte)(hash[byteCounter] & 0xF);
++byteCounter;
// get 1 bits
newHash[k] = (byte)(hash[byteCounter] & 0x1);
hash[byteCounter] >>=1;
++k;
// get 5 bits
newHash[k] = (byte)(hash[byteCounter] & 0x1F);
++k;
hash[byteCounter] >>= 5;
// get 2 bits
newHash[k] = (byte)(hash[byteCounter] & 0x3);
++byteCounter;
// get 3 bits
newHash[k] = (byte)(hash[byteCounter] & 0x7);
++k;
// get 5 bits
newHash[k] = (byte)(hash[byteCounter] & 0x1F);
++byteCounter;
++k;
}
return newHash;
}
public static string ConvertHashToFileSystemFriendlyStringCompressedl(byte[] str)
{
StringBuilder strToConvert = new StringBuilder();
foreach (byte b in str)
{
System.Diagnostics.Debug.Assert(b < 32);
if (b >= 10 && b < 32)
{
strToConvert.Append((char)(b - 10 + 'A'));
}
else
{
strToConvert.Append((char)(b + '0'));
}
}
return strToConvert.ToString();
}
static void Main(string[] args)
{
System.Security.Cryptography.SHA1 hasher = System.Security.Cryptography.SHA1.Create();
byte[] data = hasher.ComputeHash(Encoding.Default.GetBytes("The quick brown fox jumps over the lazy dog"));
byte[] stringBytesTypical = GetBytesForTypical(data);
string typicalFriendlyHashString = ConvertHashToFileSystemFriendlyStringTypical(stringBytesTypical);
//2FD4E1C67A2D28FCED849EE1BB76E7391B93EB12 == typicalFriendlyHashString
byte[] stringBytesCompressedAttempt = GetBytesForCompressedAttempt(data);
string compressedFriendlyHashString = ConvertHashToFileSystemFriendlyStringCompressedl(stringBytesCompressedAttempt);
//F0K1032QD08C1M44U11B0R77P3R31L2I == compressedFriendlyHashString
}
}
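For comparison, the 5-bits-per-character idea described above can be sketched in C++ as a plain Base32 encoder. Assumptions: it uses the RFC 4648 alphabet (A-Z, 2-7) without "=" padding, which is not the 0-9/A-V mapping used in the code above, and the name base32_encode is made up for illustration. All symbols are upper case, so the result is safe on a case-insensitive file system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes bytes 5 bits at a time, most significant bit first,
// using the RFC 4648 Base32 alphabet and no '=' padding.
std::string base32_encode(const std::vector<std::uint8_t>& data) {
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    std::string out;
    std::uint32_t buffer = 0; // holds bits that have not been emitted yet
    int bits = 0;             // number of valid low-order bits in buffer
    for (std::uint8_t byte : data) {
        buffer = (buffer << 8) | byte;
        bits += 8;
        while (bits >= 5) {
            out.push_back(alphabet[(buffer >> (bits - 5)) & 0x1F]);
            bits -= 5;
        }
    }
    if (bits > 0) {
        // Left-align the leftover bits and pad the final symbol with zeros.
        out.push_back(alphabet[(buffer << (5 - bits)) & 0x1F]);
    }
    assert(out.size() == (data.size() * 8 + 4) / 5); // ceil(8n / 5) symbols
    return out;
}
```

A 20-byte SHA1 digest is exactly 160 bits, so it always encodes to 32 characters with no leftover bits.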
EDIT:
The need to reduce to fewer than 40 characters has nothing to do with Windows folder names (although it could, since Windows paths have a limit). I need to conserve as much space as possible for human-readable strings and then create a folder for anything that needs to be reviewed. The problem with the 40-character ASCII string is that half of the bits are set to 0 and are in essence wasted. So when storing millions and millions of hashes, space and lookup speed start to become intertwined. I can't redesign the user workflow, but I can make the system more snappy and consume less memory.
EDIT:
Also, this would improve the user experience. Currently a user has to use a partial hash to look something up. In the worst case (in practice), the first 8 characters of the hash currently need to be used to usually ensure there are no duplicates. These 8 characters represent 32 bits of real hash data. Going down to 5 bits per character, users will only need 6 characters to ensure no dups. If I can get it to 6 bits, then users should only need around 5 characters. This gets into the realm of what most people are able to memorize.
EDIT: I've made some progress from the original code I posted above. Once I converted the hash into hexatrigesimal (base 36), I was able to remove one of the characters from the original 5-bit implementation above. So I am currently at 31 characters. This means that, compared with the typical implementation where 8 characters are required for retrieval (in practice), users should be able to use 6 characters to retrieve the same data.
public static string ConvertHashToFileSystemFriendlyStringCompressed2(byte[] hashData)
{
string mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
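// Assumes System.Numerics.BigInteger, which reads the array as a signed little-endian value;
// the extra zero byte keeps the number non-negative so the modulo below is a valid index.
byte[] unsignedData = new byte[hashData.Length + 1];
Array.Copy(hashData, unsignedData, hashData.Length);
BigInteger base10 = new BigInteger(unsignedData);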
string base36;
var result = new Stack<char>();
do
{
result.Push(mapping[(int)(base10 % 36)]);
base10 /= 36;
} while (base10 != 0);
base36 = new string(result.ToArray());
return base36;
}
EDIT: I have been doing more research and have a graph that I wanted to post showing the diminishing returns you get as you increase the number of ASCII characters you have to choose from. You wind up needing more and more characters for smaller and smaller gains. I seem to be at the tail end of where you get the biggest bang for your buck (at 36 characters). So even if I were able to jump to 64 characters (which I can't at the present time), I would only remove 4 characters from the final string. However, if I slim down the original hash to 18 bytes, those same 36 characters now only create a 27-character string (the same length as converting to base 64). Now the problem is how I can reliably compress a 20-byte hash into 18 bytes. Truncation won't work, since users will still have to memorize 6 characters if I use truncation. Since a SHA1 hash is made of random bytes, I am not sure I can losslessly compress 2 bytes away (10% space savings).
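The diminishing returns follow from each character carrying log2(alphabet size) bits, so the string length is roughly bits / log2(alphabet size), rounded up. A small C++ sketch of that arithmetic follows; the function name is made up for illustration, and the small epsilon is only there to absorb floating-point rounding.

```cpp
#include <cassert>
#include <cmath>

// Number of characters needed to represent any value of `bits` bits
// when every character is drawn from an alphabet of `alphabet_size` symbols.
int chars_needed(int bits, int alphabet_size) {
    assert(bits > 0 && alphabet_size >= 2);
    double exact = bits / std::log2(static_cast<double>(alphabet_size));
    // Subtract a tiny epsilon so exact quotients (e.g. 160 / 4) are not rounded up by noise.
    return static_cast<int>(std::ceil(exact - 1e-9));
}
```

For a 160-bit hash this gives 40 characters in base 16, 32 in base 32, 31 in base 36 and 27 in base 64, which matches the shrinking gains described above: doubling the alphabet from 32 to 64 symbols adds only one bit per character.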
EDIT: So my attempts to compress the hash bytes have not met with success. I expected this but had to try in order to prove this to myself. Basically what I did was attempt to use a Huffman Code to compress the original hash.
Since each value in the hash is equally likely (the definition of a good hash), using a common Huffman tree for all compression is out of the question (since that would yield the same number of bits I am trying to compress, for no net gain). However, once you create a Huffman tree for a specific hash, you do get compression of the original hash (20 bytes to 16 bytes, for example), only to have the saved 4 bytes subsequently lost because you have to store the Huffman tree as well. This approach may work for longer hash values (512 bits, etc.) but does not appear to work well enough for all SHA1 hash values to warrant implementation (only a very small subset of SHA1 hash outputs will benefit from this type of compression).
I need help trying to verify CRC-16 values (also need help with CRC-32 values). I tried to sit down and understand how CRC works but I am drawing a blank.
My first problem is when trying to use an online calculator for calculating the message "BD001325E032091B94C412AC" into CRC16 = 12AC. The documentation states that the last two octets are the CRC16 value, so I am inputting "BD001325E032091B94C4" into the site http://www.lammertbies.nl/comm/info/crc-calculation.html and receive 5A90 as the result instead of 12AC.
Does anybody know why these values are different and where I can find code for how to calculate CRC16 and CRC32 values (I plan to learn how to do this later, but time doesn't allow right now)?
Some more messages are as follows:
16000040FFFFFFFF00015FCB
3C00003144010405E57022C7
BA00001144010101B970F0ED
3900010101390401B3049FF1
09900C800000000000008CF3
8590000000000000000035F7
00900259025902590259EBC9
0200002B00080191014BF5A2
BB0000BEE0014401B970E51E
3D000322D0320A2510A263A0
2C0001440000D60000D65E54
--Edit--
I have included more information. The documentation I was referencing is TIA-102.BAAA-A (from the TIA standard). The following is what the documentation states (trying to avoid copyright infringement as much as possible):
The Last Block in a packet comprises several octets of user information and / or
pad octets, followed by a 4-octet CRC parity check. This is referred to as the
packet CRC.
The packet CRC is a 4-octet cyclic redundancy check coded over all of the data
octets included in the Intermediate Blocks and the octets of user information of
the Last Block. The specific calculation is as follows.
Let k be the total number of user information and pad bits over which the packet
CRC is to be calculated. Consider the k message bits as the coefficients of a
polynomial M(x) of degree k–1, associating the MSB of the zero-th message
octet with x^(k–1) and the LSB of the last message octet with x^0. Define the
generator polynomial, GM(x), and the inversion polynomial, IM(x).
GM(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 +
x^4 + x^2 + x + 1
IM(x) = x^31 + x^30 + x^29 + ... + x^2 + x +1
The packet CRC polynomial, FM(x), is then computed from the following formula.
FM(x) = ( x^32 M(x) mod GM(x) ) + IM(x) modulo 2, i.e., in GF(2)
The coefficients of FM(x) are placed in the CRC field with the MSB of the zero-th
octet of the CRC corresponding to x^31 and the LSB of the third octet of the CRC
corresponding to x^0.
In the above quote, I have put ^ to show powers as the formatting didn't stay the same when quoted. I'm not sure what goes to what but does this help?
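Taken at face value, the quoted formula can be sketched in C++ as a bit-by-bit, MSB-first division. This is only a literal reading of the excerpt, not a verified implementation of the standard: it assumes no register preset (the formula shows none) and treats adding IM(x) as inverting all 32 bits. The function name packet_crc is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Literal reading of FM(x) = (x^32 * M(x) mod GM(x)) + IM(x) over GF(2).
std::uint32_t packet_crc(const std::vector<std::uint8_t>& octets) {
    // ASSUMPTION: at least one message octet, as the quoted text implies k >= 1.
    assert(!octets.empty());
    const std::uint32_t generator = 0x04C11DB7u; // GM(x) without its x^32 term
    std::uint32_t remainder = 0;                 // no preset appears in the formula
    for (std::uint8_t octet : octets) {
        // The MSB of each octet is the highest-order coefficient, so feed it in at the top.
        remainder ^= static_cast<std::uint32_t>(octet) << 24;
        for (int bit = 0; bit < 8; ++bit) {
            if (remainder & 0x80000000u) {
                remainder = (remainder << 1) ^ generator;
            } else {
                remainder <<= 1;
            }
        }
    }
    // IM(x) has every coefficient from x^31 down to x^0 set, so adding it flips all bits.
    return remainder ^ 0xFFFFFFFFu;
}
```

The result would then go into the CRC field most significant octet first, as the last quoted paragraph describes. Note that the 12-octet sample messages above carry a 2-octet CRC, so this 32-bit calculation is not expected to match them.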
I have a class that I converted from a C++ implementation I found on the internet; it uses a long to calculate a CRC32. It adheres to the standard and is the one used by PKZIP, WinZip and Ethernet. To test it, use WinZip to compress a file, then calculate the CRC of the same file with this class; it should return the same CRC. It does for me.
public class CRC32
{
private int[] iTable;
public CRC32() {
this.iTable = new int[256];
Init();
}
/**
* Initialize the iTable by applying the polynomial used by PKZIP, WinZip and Ethernet.
*/
private void Init()
{
// 0x04C11DB7 is the official polynomial used by PKZip, WinZip and Ethernet.
int iPolynomial = 0x04C11DB7;
// 256 values representing ASCII character codes.
for (int iAscii = 0; iAscii <= 0xFF; iAscii++)
{
this.iTable[iAscii] = this.Reflect(iAscii, (byte) 8) << 24;
for (int i = 0; i <= 7; i++)
{
if ((this.iTable[iAscii] & 0x80000000L) == 0) this.iTable[iAscii] = (this.iTable[iAscii] << 1) ^ 0;
else this.iTable[iAscii] = (this.iTable[iAscii] << 1) ^ iPolynomial;
}
this.iTable[iAscii] = this.Reflect(this.iTable[iAscii], (byte) 32);
}
}
/**
* Reflection is a requirement for the official CRC-32 standard. Note that you can create CRC without it,
* but it won't conform to the standard.
*
* @param iReflect
* value to apply the reflection to
* @param iValue
* number of low-order bits to reflect
* @return the calculated value
*/
private int Reflect(int iReflect, int iValue)
{
int iReturned = 0;
// Swap bit 0 for bit 7, bit 1 For bit 6, etc....
for (int i = 1; i < (iValue + 1); i++)
{
if ((iReflect & 1) != 0)
{
iReturned |= (1 << (iValue - i));
}
iReflect >>= 1;
}
return iReturned;
}
/**
* CalculateCRC calculates the CRC32 by looping through each byte in sData
*
* @param lCRC
* the variable to hold the CRC. It must have been initialized.
* <p>
* See FullCRC for an example
* </p>
* @param sData
* array of bytes to calculate the CRC
* @param iDataLength
* the length of the data
* @return the new calculated CRC
*/
public long CalculateCRC(long lCRC, byte[] sData, int iDataLength)
{
for (int i = 0; i < iDataLength; i++)
{
lCRC = (lCRC >> 8) ^ (long) (this.iTable[(int) (lCRC & 0xFF) ^ (int) (sData[i] & 0xff)] & 0xffffffffL);
}
return lCRC;
}
/**
* Calculates the CRC32 for the given data
*
* @param sData
* the data to calculate the CRC
* @param iDataLength
* the length of the data
* @return the calculated CRC32
*/
public long FullCRC(byte[] sData, int iDataLength)
{
long lCRC = 0xffffffffL;
lCRC = this.CalculateCRC(lCRC, sData, iDataLength);
return (lCRC /*& 0xffffffffL)*/^ 0xffffffffL);
}
/**
* Calculates the CRC32 of a file
*
* @param sFileName
* The complete file path
* @param context
* The context to open the files.
* @return the calculated CRC32 or -1 if an error occurs (file not found).
*/
long FileCRC(String sFileName, Context context)
{
long iOutCRC = 0xffffffffL; // Initialize the CRC.
int iBytesRead = 0;
int buffSize = 32 * 1024;
FileInputStream isFile = null;
try
{
byte[] data = new byte[buffSize]; // 32 KB buffer
isFile = context.openFileInput(sFileName);
try
{
while ((iBytesRead = isFile.read(data, 0, buffSize)) > 0)
{
iOutCRC = this.CalculateCRC(iOutCRC, data, iBytesRead);
}
return (iOutCRC ^ 0xffffffffL); // Finalize the CRC.
}
catch (Exception e)
{
// Error reading file
}
finally
{
isFile.close();
}
}
catch (Exception e)
{
// file not found
}
return -1l;
}
}
Read Ross Williams' tutorial on CRCs to get a better understanding of CRCs, what defines a particular CRC, and their implementations.
The reveng website has an excellent catalog of known CRCs, and for each the CRC of a test string (nine bytes: "123456789" in ASCII/UTF-8). Note that there are 22 different 16-bit CRCs defined there.
The reveng software on that same site can be used to reverse engineer the polynomial, initialization, post-processing, and bit reversal given several examples as you have for the 16-bit CRC. (Hence the name "reveng".) I ran your data through and got:
./reveng -w 16 -s 16000040FFFFFFFF00015FCB 3C00003144010405E57022C7 BA00001144010101B970F0ED 3900010101390401B3049FF1 09900C800000000000008CF3 8590000000000000000035F7 00900259025902590259EBC9 0200002B00080191014BF5A2 BB0000BEE0014401B970E51E 3D000322D0320A2510A263A0 2C0001440000D60000D65E54
width=16 poly=0x1021 init=0xc921 refin=false refout=false xorout=0x0000 check=0x2fcf name=(none)
As indicated by the "(none)", that 16-bit CRC is not any of the 22 listed on reveng, though it is similar to several of them, differing only in the initialization.
The additional information you provided is for a 32-bit CRC, either CRC-32 or CRC-32/BZIP in the reveng catalog, depending on whether the bits are reversed or not.
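To try the 16-bit parameters reported above, a non-reflected CRC-16 with polynomial 0x1021 and a caller-supplied initial value can be sketched in C++ as below. It assumes exactly the shape reveng printed (refin=false, refout=false, xorout=0x0000) and a CRC stored most significant octet first in the last two octets; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bit-by-bit, MSB-first CRC-16 with polynomial 0x1021, no reflection, no final XOR.
std::uint16_t crc16_msb_first(const std::vector<std::uint8_t>& data, std::uint16_t init) {
    const std::uint16_t poly = 0x1021;
    std::uint16_t crc = init;
    for (std::uint8_t octet : data) {
        crc ^= static_cast<std::uint16_t>(octet << 8);
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ poly);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// Splits off the last two octets and compares them with the CRC of the rest.
bool frame_checks_out(const std::vector<std::uint8_t>& frame, std::uint16_t init) {
    assert(frame.size() >= 2); // need room for the 2-octet CRC
    std::vector<std::uint8_t> body(frame.begin(), frame.end() - 2);
    std::uint16_t stored = static_cast<std::uint16_t>(
        (frame[frame.size() - 2] << 8) | frame.back());
    return crc16_msb_first(body, init) == stored;
}
```

With init = 0xFFFF or 0x0000 this reproduces the catalogued check values for "123456789" (0x29B1 and 0x31C3). For the messages in the question you would pass init = 0xC921; if the reveng model is right, frame_checks_out should then accept each 12-octet sample, though that has not been run here.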
There are quite a few parameters to CRC calculations: Polynomial, initial value, final XOR... see Wikipedia for details. Your CRC does not seem to fit the ones on the site you used, but you can try to find the right parameters from your documentation and use a different calculator, e.g. this one (though I'm afraid it doesn't support HEX input).
One thing to keep in mind is that CRC-16 is usually calculated over the data that is supposed to be checksummed plus two zero-bytes, e.g. you are probably looking for a CRC16 function where CRC16(BD001325E032091B94C40000) == 12AC. With checksums calculated in this way, the CRC of the data with checksum appended will work out to 0, which makes checking easier, e.g. CRC16(BD001325E032091B94C412AC) == 0000
I want to get a 64-bit hash code of a given string. How can I do that in the fastest way?
There is a ready-made method to get a 32-bit hash code, but I need 64 bits.
I am looking only for integer hashing, not MD5.
Thank you very much.
C# 4.0
Simple solution:
public static long GetHashCodeInt64(string input)
{
var s1 = input.Substring(0, input.Length / 2);
var s2 = input.Substring(input.Length / 2);
var x = ((long)s1.GetHashCode()) << 0x20 | (uint)s2.GetHashCode(); // cast to uint so a negative low half is not sign-extended over the high half
return x;
}
Since the question was about making a URL, I presume you always need the same 64-bit hash for the same input. GetHashCode is not reliable in this way. To make a hash with few collisions, I use this one.
public static ulong GetUInt64Hash(HashAlgorithm hasher, string text)
{
using (hasher)
{
var bytes = hasher.ComputeHash(Encoding.Default.GetBytes(text));
Array.Resize(ref bytes, bytes.Length + (8 - bytes.Length % 8) % 8); // pad with zeros to a multiple of 8 if the hash is not one already; for example, SHA1 creates 20 bytes
return Enumerable.Range(0, bytes.Length / 8) // create a counter for the number of 8-byte groups in the byte array
.Select(i => BitConverter.ToUInt64(bytes, i * 8)) // combine 8 bytes at a time into an integer
.Aggregate((x, y) => x ^ y); // XOR the groups together so you end up with a ulong (64-bit int)
}
}
To use it, just pass whatever hash algorithm you prefer:
ulong result = GetUInt64Hash(SHA256.Create(), "foodiloodiloo");
//result: 259973318283508806
or
ulong result = GetUInt64Hash(SHA1.Create(), "foodiloodiloo");
//result: 6574081600879152103
The difference between this one and the accepted answer is that this one XORs all the bits, and you can use whatever algorithm you want.
This code is from Code Project Article - Convert String to 64bit Integer
static Int64 GetInt64HashCode(string strText)
{
Int64 hashCode = 0;
if (!string.IsNullOrEmpty(strText))
{
//Unicode Encode Covering all characterset
byte[] byteContents = Encoding.Unicode.GetBytes(strText);
System.Security.Cryptography.SHA256 hash =
new System.Security.Cryptography.SHA256CryptoServiceProvider();
byte[] hashText = hash.ComputeHash(byteContents);
//Split the 32-byte hashText into 8-byte parts
//hashCodeStart = bytes 0~7
//hashCodeMedium = bytes 8~15
//hashCodeEnd = bytes 24~31
//(bytes 16~23 are not used) and fold
Int64 hashCodeStart = BitConverter.ToInt64(hashText, 0);
Int64 hashCodeMedium = BitConverter.ToInt64(hashText, 8);
Int64 hashCodeEnd = BitConverter.ToInt64(hashText, 24);
hashCode = hashCodeStart ^ hashCodeMedium ^ hashCodeEnd;
}
return (hashCode);
}
I'll introduce a new possible answer. xxHash is very fast. Check out the benchmarks here:
https://cyan4973.github.io/xxHash/
It has a NuGet package:
https://www.nuget.org/packages/System.Data.HashFunction.xxHash
Or open sources:
https://github.com/brandondahler/Data.HashFunction/blob/master/src/System.Data.HashFunction.xxHash/xxHash_Implementation.cs
The other answers here are either 1. questionable as to their real prevention of collision or 2. just wrappers around the large and slow existing HashAlgorithm implementations.
xxHash is not cryptographic strength, but it would seem to fit the bill better for what you need. It:
is 64 bits all the way,
is benchmarked faster than others,
has good distribution for maximized collision avoidance.
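To show how little code a non-cryptographic 64-bit string hash needs, here is a C++ sketch of FNV-1a. To be clear, this is a swapped-in, simpler algorithm and not xxHash: it is deterministic across runs and machines, but it makes no claim to xxHash's speed or distribution. The function name is made up for illustration.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
// The arithmetic wraps modulo 2^64, which is what the algorithm expects.
std::uint64_t fnv1a_64(const std::string& text) {
    std::uint64_t hash = 14695981039346656037ull; // FNV-1a 64-bit offset basis
    for (unsigned char byte : text) {
        hash ^= byte;
        hash *= 1099511628211ull;                 // FNV 64-bit prime
    }
    return hash;
}
```

The input is hashed byte by byte, so the same string in a different encoding (UTF-8 vs UTF-16) gives a different value; pick one encoding and stick to it.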
I assume you are referring to the MD5 hashing algorithm for your current use?
You can do a SHA 256 for twice the length....
http://msdn.microsoft.com/en-us/library/system.security.cryptography.sha256.aspx
Extract...
byte[] data = new byte[DATA_SIZE];
byte[] result;
SHA256 shaM = new SHA256Managed();
result = shaM.ComputeHash(data);
I have used @Kirill's solution. I'm a little bit weird and I don't like "var" (I guess it's because I come from C++), so I made a variant:
string s1 = text.Substring(0, text.Length / 2);
string s2 = text.Substring(text.Length / 2);
Byte[] MS4B = BitConverter.GetBytes(s1.GetHashCode());
Byte[] LS4B = BitConverter.GetBytes(s2.GetHashCode());
UInt64 hash = (UInt64)MS4B[0] << 56 | (UInt64)MS4B[1] << 48 |
(UInt64)MS4B[2] << 40 | (UInt64)MS4B[3] << 32 |
(UInt64)LS4B[0] << 24 | (UInt64)LS4B[1] << 16 |
(UInt64)LS4B[2] << 8 | (UInt64)LS4B[3] ;
I'm not very sure about the order of the bytes; it depends on the machine (whether it is little-endian or big-endian). But who cares? It's just a number (a hash). Thank you @Kirill, it was very useful to me!
I am attempting to wrap my brain around generating a 6 digit/character non case sensitive expiring one-time password.
My source is https://www.rfc-editor.org/rfc/rfc4226#section-5
First the definition of the parameters
C 8-byte counter value, the moving factor. This counter
MUST be synchronized between the HOTP generator (client)
and the HOTP validator (server).
K shared secret between client and server; each HOTP
generator has a different and unique secret K.
T throttling parameter: the server will refuse connections
from a user after T unsuccessful authentication attempts.
Then we have the algorithm to generate the HOTP
As the output of the HMAC-SHA-1 calculation is 160 bits, we must
truncate this value to something that can be easily entered by a
user.
HOTP(K,C) = Truncate(HMAC-SHA-1(K,C))
Then, we have Truncate defined as
String = String[0]...String[19]
Let OffsetBits be the low-order 4 bits of String[19]
Offset = StToNum(OffsetBits) // 0 <= OffSet <= 15
Let P = String[OffSet]...String[OffSet+3]
Return the Last 31 bits of P
And then an example is offered for a 6 digit HOTP
The following code example describes the extraction of a dynamic
binary code given that hmac_result is a byte array with the HMAC-
SHA-1 result:
int offset = hmac_result[19] & 0xf ;
int bin_code = (hmac_result[offset] & 0x7f) << 24
| (hmac_result[offset+1] & 0xff) << 16
| (hmac_result[offset+2] & 0xff) << 8
| (hmac_result[offset+3] & 0xff) ;
I am rather at a loss in attempting to convert this into useful C# code for generating one time passwords. I already have code for creating an expiring HMAC as follows:
byte[] hashBytes = alg.ComputeHash(Encoding.UTF8.GetBytes(input));
byte[] result = new byte[8 + hashBytes.Length];
hashBytes.CopyTo(result, 8);
BitConverter.GetBytes(expireDate.Ticks).CopyTo(result, 0);
I'm just not sure how to go from that, to 6 digits as proposed in the above algorithms.
You have two issues here:
If you are generating alpha-numeric, you are not conforming to the RFC - at this point, you can simply take any N bytes and turn them to a hex string and get alpha-numeric. Or, convert them to base 36 if you want a-z and 0-9. Section 5.4 of the RFC is giving you the standard HOTP calc for a set Digit parameter (notice that Digit is a parameter along with C, K, and T). If you are choosing to ignore this section, then you don't need to convert the code - just use what you want.
Your "result" byte array has the expiration time simply stuffed in the first 8 bytes after hashing. If your truncation to 6-digit alphanumeric does not collect these along with parts of the hash, it may as well not be calculated at all. It is also very easy to "fake" or replay - hash the secret once, then put whatever ticks you want in front of it - not really a one time password. Note that parameter C in the RFC is meant to fulfill the expiring window and should be added to the input prior to computing the hash code.
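For the numeric case, the Section 5.4 step from a 20-byte HMAC-SHA-1 result to an N-digit code can be sketched in C++ as below. It covers only the truncation; computing the HMAC itself is left to whatever crypto library you use, and the function name is made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Dynamic truncation from RFC 4226 Section 5.4:
// pick 4 bytes at an offset given by the low nibble of the last byte,
// clear the top bit, and reduce modulo 10^digits.
std::uint32_t hotp_truncate(const std::array<std::uint8_t, 20>& hmac, int digits) {
    assert(digits >= 1 && digits <= 9); // 10^9 still fits in 32 bits
    int offset = hmac[19] & 0x0F;       // 0..15, so offset + 3 is at most 18
    std::uint32_t binary =
        (static_cast<std::uint32_t>(hmac[offset] & 0x7F) << 24) |
        (static_cast<std::uint32_t>(hmac[offset + 1]) << 16) |
        (static_cast<std::uint32_t>(hmac[offset + 2]) << 8) |
        static_cast<std::uint32_t>(hmac[offset + 3]);
    std::uint32_t modulus = 1;
    for (int i = 0; i < digits; ++i) {
        modulus *= 10;
    }
    return binary % modulus;
}
```

With the example bytes from the RFC (1f 86 98 69 0e 02 ca 16 61 85 50 ef 7f 19 da 8e 94 5b 55 5a) the offset is 10, the 31-bit value is 0x50ef7f19, and the 6-digit result is 872921. When displaying the code, left-pad it with zeros to the chosen number of digits.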
For anyone interested, I did figure out a way to build expiration into my one time password. The approach is to use the created time down to the minute (ignoring seconds, milliseconds, etc). Once you have that value, use the ticks of the DateTime as your counter, or variable C.
otpLifespan is my HOTP lifespan in minutes.
DateTime now = DateTime.Now; // read the clock once so the fields cannot straddle a minute boundary
DateTime current = new DateTime(now.Year, now.Month, now.Day, now.Hour, now.Minute, 0);
for (int x = 0; x <= otpLifespan; x++)
{
var result = NumericHOTP.Validate(hotp, key,
current.AddMinutes(-1 * x).Ticks);
//return valid state if validation succeeded
//return invalid state if the passed in value is invalid
// (length, non-numeric, checksum invalid)
}
//return expired state
My expiring HOTP extends from my numeric HOTP which has a static validation method that checks the length, ensures it is numeric, validates the checksum if it is used, and finally compares the hotp passed in with a generated one.
The only downside to this is that each time you validate an expiring HOTP, your worst-case scenario is to check n + 1 HOTP values, where n is the lifespan in minutes.
The Java code example in the document outlining RFC 4226 was a very straightforward move into C#. The only piece I really had to put any effort into rewriting was the hashing method.
private static byte[] HashHMACSHA1(byte[] keyBytes, byte[] text)
{
HMAC alg = new HMACSHA1(keyBytes);
return alg.ComputeHash(text);
}
I hope this helps anyone else attempting to generate one time passwords.
This snippet should do what you are asking for:
public class UniqueId
{
public static string GetUniqueKey()
{
int maxSize = 6; // whatever length you want
char[] chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".ToCharArray();
byte[] data = new byte[maxSize];
RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
crypto.GetBytes(data);
StringBuilder result = new StringBuilder(maxSize);
foreach (byte b in data)
{ result.Append(chars[b % chars.Length]); } // 256 is not a multiple of 36, so the first few characters are slightly favored
return result.ToString();
}
}
I have the following hash function, and I'm trying to find a way to reverse it, so that I can find the key from a hashed value.
uint Hash(string s)
{
uint result = 0;
for (int i = 0; i < s.Length; i++)
{
result = ((result << 5) + result) + s[i];
}
return result;
}
The code is in C# but I assume it is clear.
I am aware that for one hashed value, there can be more than one key, but my intent is not to find them all, just one that satisfies the hash function suffices.
EDIT :
The string that the function accepts is formed only from the digits 0 to 9 and the chars '*' and '#', hence the Unhash function must respect these criteria too.
Any ideas? Thank you.
This should reverse the operations:
string Unhash(uint hash)
{
List<char> s = new List<char>();
while (hash != 0)
{
s.Add((char)(hash % 33));
hash /= 33;
}
s.Reverse();
return new string(s.ToArray());
}
This should return a string that gives the same hash as the original string, but it is very unlikely to be the exact same string.
Characters 0-9,*,# have ASCII values 48-57,42,35, or binary: 00110000 ... 00111001, 00101010, 00100011
The lowest 5 bits of those values are all different, and the 6th bit (0x20) is always 1. This means that you can deduce your last character in a loop by taking the current hash:
uint lastChar = hash & 0x1F - ((hash >> 5) - 1) & 0x1F + 0x20;
(if this doesn't work, I don't know who wrote it)
Now roll back hash,
hash = (hash - lastChar) / 33;
and repeat the loop until hash becomes zero. I don't have C# on me, but I'm 70% confident that this should work with only minor changes.
Brute force should work if uint is 32 bits. Try at least 2^32 strings and one of them is likely to hash to the same value. Should only take a few minutes on a modern pc.
You have 12 possible characters, and 12^9 is about 2^32, so if you try 9 character strings you're likely to find your target hash. I'll do 10 character strings just to be safe.
(simple recursive implementation in C++, don't know C# that well)
#define NUM_VALID_CHARS 12
#define STRING_LENGTH 10
const char valid_chars[NUM_VALID_CHARS] = {'0', ..., '#' ,'*'};
void unhash(uint hash_value, char *string, int nchars) {
if (nchars == STRING_LENGTH) {
string[STRING_LENGTH] = 0;
if (Hash(string) == hash_value) { printf("%s\n", string); }
} else {
for (int i = 0; i < NUM_VALID_CHARS; i++) {
string[nchars] = valid_chars[i];
unhash(hash_value, string, nchars + 1);
}
}
}
Then call it with:
char string[STRING_LENGTH + 1];
unhash(hash_value, string, 0);
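The brute-force search above can be shortened with a meet-in-the-middle split, sketched here in C++. This is a different technique from the recursion above: it relies on the fact that for this hash, Hash(prefix + suffix) = Hash(prefix) * 33^len(suffix) + Hash(suffix) modulo 2^32. All 5-character prefixes go into a table, then every 5-character suffix is checked against it, so about 2 * 12^5 hash evaluations replace 12^10. The names hash33, nth_string and find_preimage are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

const std::string kAlphabet = "0123456789*#"; // the 12 characters allowed by the question

// Same recurrence as the C# Hash above: result = result * 33 + s[i], modulo 2^32.
std::uint32_t hash33(const std::string& s) {
    std::uint32_t h = 0;
    for (unsigned char c : s) h = h * 33u + c;
    return h;
}

// Writes n in base 12 as a string of exactly `len` allowed characters.
std::string nth_string(std::uint32_t n, int len) {
    assert(len > 0);
    std::string s(len, '0');
    for (int i = len - 1; i >= 0; --i) {
        s[i] = kAlphabet[n % 12];
        n /= 12;
    }
    return s;
}

// Returns some 10-character string over kAlphabet whose hash equals target,
// or an empty string if no such 10-character string exists.
std::string find_preimage(std::uint32_t target) {
    const int half = 5;
    const std::uint32_t count = 248832; // 12^5 strings per half
    std::uint32_t pow33 = 1;
    for (int i = 0; i < half; ++i) pow33 *= 33u; // 33^5, wrapping modulo 2^32
    std::unordered_map<std::uint32_t, std::uint32_t> prefixes;
    prefixes.reserve(count);
    for (std::uint32_t n = 0; n < count; ++n) {
        prefixes.emplace(hash33(nth_string(n, half)) * pow33, n);
    }
    for (std::uint32_t n = 0; n < count; ++n) {
        std::string suffix = nth_string(n, half);
        auto it = prefixes.find(target - hash33(suffix)); // unsigned wrap-around is intended
        if (it != prefixes.end()) return nth_string(it->second, half) + suffix;
    }
    return std::string();
}
```

As with the brute-force version, the string found is usually not the original key, only one that hashes to the same value. With about 12^10 candidate strings for 2^32 hash values, a 10-character match is very likely for any target but not guaranteed, which is why the empty-string case is kept.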
Hash functions are designed to be difficult or impossible to reverse, hence the name (visualize meat + potatoes being ground up)
I would start out by writing each step that result = ((result << 5) + result) + s[i]; does on a separate line. This will make solving a lot easier. Then all you have to do is the opposite of each line (in the opposite order too).