What's the best way to convert (to hash) a string like 3800290030, which represents an id for a classification, into a four-character one like 3450? (I need to support at most 9999 classes.) We will have fewer than 1000 classes in the 10-character space, and it will never grow beyond 10k.
The hash needs to be unique and always the same for the same input.
The resulting string should be numeric (but it will be saved as char(4) in SQL Server).
I removed the requirement for reversibility.
This is my solution, please comment:
string classTIC = "3254002092";
MD5 md5Hasher = MD5.Create();
byte[] classHash = md5Hasher.ComputeHash(Encoding.Default.GetBytes(classTIC));
StringBuilder sBuilder = new StringBuilder();
foreach (byte b in classHash)
{
    sBuilder.Append(b.ToString());
}
string newClass = (double.Parse(sBuilder.ToString())%9999 + 1).ToString();
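You can do something like
(str.GetHashCode() & 0x7FFFFFFF) % 9999 + 1;
(The mask clears the sign bit, because GetHashCode() can return a negative number. Also note that on .NET Core and later, string.GetHashCode() is randomized per process, so the result is not stable across runs.)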
The hash can't be unique, since there are more than 9,999 possible strings.
It is not unique, so it cannot be reversible.
And of course my answer is wrong if you don't have more than 9999 different 10-character classes.
If you don't have more than 9999 classes, you need a mapping from each string id to its 4-char representation: for example, save the strings in a list and use each string's index in the list as its key.
When you want to reverse the process, and have no knowledge about the ids apart from the fact that there are at most 9999 of them, I think you need to use a translation dictionary to map each id to its short version.
Even without the need to reverse the process, I don't think there is a way to guarantee unique ids without such a dictionary.
This short version could then simply be incremented by one with each new id.
You do not want a hash. Hashing by design allows for collisions. There is no possible hashing function for the kind of strings you work with that won't have collisions.
You need to build a persistent mapping table to convert the string to a number, logically similar to a Dictionary<string, int>. The first string you add gets number 0. When you need to map, look up the string and return its associated number. If it is not present, add the string and simply assign it a number equal to the current count.
Making this mapping table persistent is what you'll need to think about. That's trivially done with a database, of course.
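A minimal in-memory sketch of that mapping table, assuming a `Dictionary<string, int>` stands in for the persistent store (in practice, a database table with a unique constraint on the long id). The class name `ClassCodeMap` and the zero-padded four-digit output for the char(4) column are illustrative choices, not part of the answer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: gives each distinct long id the next free number,
// formatted as a four-digit string for a char(4) column.
public sealed class ClassCodeMap
{
    private readonly Dictionary<string, int> _codes = new Dictionary<string, int>();

    public string GetOrAdd(string longId)
    {
        if (!_codes.TryGetValue(longId, out int number))
        {
            number = _codes.Count;                 // first string gets 0, the next 1, ...
            if (number > 9999)
                throw new InvalidOperationException("All 10,000 four-digit codes are in use.");
            _codes.Add(longId, number);
        }
        return number.ToString("D4");              // 0 -> "0000", 42 -> "0042"
    }
}
```

Codes are unique by construction because each new id simply gets the next number; the hard part, as noted, is persisting the table so the same id keeps its code across restarts.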
Not sure there's a clean answer here.
Uniqueness is difficult: your request allows 4 characters, which is a maximum of 9999 values, so collisions will occur.
A hash is not reversible. Data is lost (obviously).
I think you might need to create and store a lookup table to be able to support your requirements. And in that case you don't even need a hash; you could just increment the last used 4-digit lookup code.
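Use MD5 or SHA, like:
string code = BitConverter.ToString(MD5.Create().ComputeHash(Encoding.UTF8.GetBytes("05910395410"))).Replace("-", "").Substring(0, 4); // 4 hex characters, not digits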
or write your own simple method, for example:
int sum = 0;
foreach (char c in input)
{
    sum += c;
}
sum %= 9999;
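A quick way to see why the character-sum approach can't be unique for these ids: any reordering of the same digits gives the same sum, and a 10-digit numeric string can only sum to a value between 480 (ten '0's, code 48 each) and 570 (ten '9's, code 57 each), which is at most 91 distinct results. A sketch wrapping the answer's snippet in a hypothetical `CharSumCode` method:

```csharp
using System;

public static class CharSumDemo
{
    // The "sum the characters, then mod 9999" idea as a reusable function.
    public static int CharSumCode(string input)
    {
        int sum = 0;
        foreach (char c in input)
        {
            sum += c;      // adds the UTF-16 code unit value of each character
        }
        return sum % 9999;
    }
}
```

For example, "3800290030" and "0030029083" contain the same digits in a different order, so they get the same code.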
Convert the number to base 35 or base 36, e.g.:
3800290030 decimal = 22CGHK5 in base 35 // length: 7
Or maybe convert to base 60 (omitting capital O and lowercase o so they aren't confused with 0), e.g.:
3800290030 decimal = 4tDw7A in base 60 // length: 6
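A short sketch of the base conversion described here. The base-35 alphabet (0-9, then A-Y) and the base-60 alphabet (0-9, A-Z without O, a-z without o) are assumptions about the digit orderings used, but with them the code reproduces both examples above.

```csharp
using System;
using System.Text;

public static class BaseN
{
    public const string Base35 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXY";
    public const string Base60 = "0123456789ABCDEFGHIJKLMNPQRSTUVWXYZabcdefghijklmnpqrstuvwxyz";

    // Standard positional conversion: repeatedly divide by the base and
    // prepend the symbol for each remainder.
    public static string Encode(long value, string alphabet)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (value == 0) return alphabet[0].ToString();

        var sb = new StringBuilder();
        long radix = alphabet.Length;
        while (value > 0)
        {
            sb.Insert(0, alphabet[(int)(value % radix)]);
            value /= radix;
        }
        return sb.ToString();
    }
}
```

Neither result comes close to four characters: fitting all 10^10 ten-digit ids into 4 characters would need an alphabet of at least 317 symbols (316^4 is just under 10^10).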
Convert your int to binary and then Base64-encode it. It won't be numeric then, but it will be reversible (strictly speaking an encoding rather than a hash).
Edit:
As far as I can tell, you are asking for the impossible.
You cannot take totally random data and reduce the amount of space it takes to encode it (some encodings might be shorter, but others will be longer), so your requirement that the number be unique is not achievable: there has to be some data loss somewhere, and no matter how you do it, uniqueness can't be ensured.
Second, because of the above, it is also not possible to make it reversible, so that is out of the question.
Therefore, the only possible way I can see is if you have an enumerable data source, i.e. you know all the values before calculating the codes. In that case you can simply assign them sequential ids.
Related
I would like to encrypt a string and later decrypt it back. The input string could be of varying length, but the encrypted string must be at most 15 characters and alphanumeric. This is for an intranet application, so security is not a big concern. I should be able to decrypt it to match it on another page. I am using VS2012, C#, and ASP.NET. Please advise. I tried Rijndael, but it gives a long string. The encrypted string must be user-friendly, since the user will need to remember and enter it.
Thanks,
DotNet
The input string could be of varying length but the encrypted string must be a max of 15 characters and alphanumeric.
This is clearly impossible. If you solved this, you would solve all possible storage concerns - after all, you'd be able to store the whole internet (viewed as one long string) in 15 alphanumeric characters (after decryption).
You haven't told us what the input string might consist of, either. Assuming that by "alphanumeric" you mean A-Z, a-z, 0-9, that's only 62 possible characters, so there are 62^15 possible 15-character encrypted strings. If your input is in UTF-16 code units, then just 6 code units have more possibilities (65536^6 is greater than 62^15).
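The counting argument can be checked directly with arbitrary-precision integers. This sketch uses `System.Numerics.BigInteger` and counts every alphanumeric string of length 1 to 15, which is slightly more generous than 62^15 alone. The `PigeonholeCheck` name is illustrative.

```csharp
using System;
using System.Numerics;

public static class PigeonholeCheck
{
    // Number of distinct strings of length 1..maxLength over an alphabet of the given size.
    public static BigInteger CountStrings(int alphabetSize, int maxLength)
    {
        BigInteger total = BigInteger.Zero;
        for (int len = 1; len <= maxLength; len++)
            total += BigInteger.Pow(alphabetSize, len);
        return total;
    }
}
```

`CountStrings(62, 15)` comes to roughly 7.8 × 10^26. Six UTF-16 code units already allow 65536^6 = 2^96, about 7.9 × 10^28 inputs, so some distinct inputs must share an output.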
Basically, you're onto a losing proposition here - you should rethink your design.
Perhaps you should store the original value in a server, return some token, and then be able to fetch the value when you want? That isn't encryption, but it may satisfy your real requirements.
Further reading: the pigeonhole principle
I think you are not looking for encryption but for a hash algorithm. A hash algorithm generates a hash of a fixed length (not encrypted text).
Jon Skeet is almost right.
If you convert the encrypted output to base 10, then convert it back while adding the characters:
!##$%^&*()_+-=;:'"<>,.?/\ to the 0-9, A-Z, a-z
then you will sometimes reduce the size, and it won't ever be recognizable.
Basically: convert to base 10 from a large base, then convert that to a larger base.
Thanks for your suggestions! I had to generate the password using the same technique that my peer had used which is SHA.
Convert.ToBase64String(new SHA1CryptoServiceProvider().ComputeHash(Encoding.UTF8.GetBytes(username)));
As Jon Skeet brings up, what you probably need is a token. You'll need to store your data on the server (database?) and use a token value as a key to retrieve the data.
If you don't need any security at all, you can use an Identity field (autoincrementing). If you want to make it harder to guess, you'll need something like this:
public static string GetRandomToken()
{
    // create a guid and convert to a byte array
    var guid = Guid.NewGuid();
    var bytes = guid.ToByteArray();
    // xor the first 8 bytes with the last
    for (int i = 0; i < 8; i++)
    {
        bytes[i] = (byte)(bytes[i] ^ bytes[i + 8]);
    }
    // resize the array down to eight bytes
    Array.Resize<byte>(ref bytes, 8);
    // return the hexadecimal representation (16 characters), with the last character lopped off
    return BitConverter.ToString(bytes).Replace("-", string.Empty).Substring(0, 15);
}
This only handles generating a token with length of 15. You still have to handle the database access yourself.
I have a table with a column (AbsoluteUrl NVARCHAR(2048)) that I want to query on, but comparing each record with my own string takes a long time. The table has at least 1,000,000 records.
Now I think a better solution is to make a checksum for each AbsoluteUrl and compare checksums instead of the AbsoluteUrl column, so I use the method below to generate a checksum. But I'd like another class that produces checksums shorter than 128 bytes.
public static byte[] GenerateChecksumAsByte(string content)
{
    var buffer = Encoding.UTF8.GetBytes(content);
    return new SHA1Managed().ComputeHash(buffer); // SHA-1 digest: 20 bytes
}
Is this a good approach for my purposes?
UPDATE
Based on the answers, I want to explain in more depth. I'm actually working on a very simple web search engine. Briefly: when all of the URLs on a web page have been extracted (the collection of found URLs), I index them in the Urls table:
UrlId uniqueidentifier NotNull Primary Key (Clustered Index)
AbsoluteUrl nvarchar(2048) NotNull
Checksum varbinary(128) NotNull
So I first search the table to see whether the same URL has been indexed before. If not, I create a new record.
public Url Get(byte[] checksum)
{
    return _dataContext.Urls.SingleOrDefault(url => url.Checksum == checksum);
    //Or querying by AbsoluteUrl field
}
And the Save method:
public void Save(Url url)
{
    if (url == null)
        throw new ArgumentNullException("url");
    var origin = _dataContext.Urls.GetOriginalEntityState(url);
    if (origin == null)
    {
        _dataContext.Urls.Attach(url);
        _dataContext.Refresh(RefreshMode.KeepCurrentValues, url);
    }
    else
        _dataContext.Urls.InsertOnSubmit(url);
    _dataContext.SubmitChanges();
}
For example, if I find 2000 URLs on one page, I must search 2000 times.
You want to use a hash with p possible values as a key, expecting at most 1m records (u). To answer this question, you first have to do the math...
Evaluate the following (the birthday-bound approximation of the chance of at least one collision) for each hash size you're considering: 1 - e^(-u^2 / (2p))
32-bit: 100% chance of collision
64-bit: 0.00000271% chance of collision
128-bit: 0% (too small to calculate with a double precision)
Now you should have enough information to make an informed decision. Here is the code to produce the above calculation on the 64-bit key:
double keySize = 64;
double possibleKeys = Math.Pow(2, keySize);   // p
double universeSize = 1000000;                // u
double v1, v2;
v1 = -Math.Pow(universeSize, 2);
v2 = 2.0 * possibleKeys;
v1 = v1 / v2;
v1 = Math.Exp(v1);                            // e^(-u^2 / (2p))
v1 = 1.0 - v1;
Console.WriteLine("The resulting percentage is {0:n40}%", v1 * 100.0);
Personally, I'd stick with at least a 128-bit hash. Moreover, if collisions can cause any kind of security hole, you need to use at least a SHA-2 hash (SHA-256/SHA-512).
Now, if this is just an optimization for the database, consider the following:
add a 32-bit hash code to the table.
create a composite key containing both the 32-bit hash AND the original string.
ALWAYS seek on both the hash and the original string.
Assume the hash is only an optimization and never unique.
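Here is an in-memory analogue of that advice, as a sketch. A 32-bit hash narrows the search to a bucket, and the original string is always compared before declaring a match. FNV-1a is an assumption here; the answer doesn't name an algorithm. In the database, the same idea is an index on the hash column with the query filtering on both the hash and the URL.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class UrlIndex
{
    // hash -> all URLs that share that hash (normally one, occasionally more)
    private readonly Dictionary<uint, List<string>> _buckets = new Dictionary<uint, List<string>>();

    // 32-bit FNV-1a over the UTF-8 bytes: stable across processes, unlike string.GetHashCode().
    public static uint Fnv1a32(string s)
    {
        uint hash = 2166136261u;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            hash ^= b;
            hash = unchecked(hash * 16777619u);
        }
        return hash;
    }

    public bool Contains(string url)
    {
        // Seek on the hash first, then ALWAYS confirm with a full string comparison.
        return _buckets.TryGetValue(Fnv1a32(url), out List<string> bucket)
            && bucket.Contains(url);
    }

    // Returns false if the URL was already indexed.
    public bool Add(string url)
    {
        uint h = Fnv1a32(url);
        if (!_buckets.TryGetValue(h, out List<string> bucket))
        {
            bucket = new List<string>();
            _buckets.Add(h, bucket);
        }
        if (bucket.Contains(url)) return false;
        bucket.Add(url);
        return true;
    }
}
```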
I agree with Steven that you should first try an index on the field to see if it really is "comparing each records" that is the bottleneck.
However, depending on your database, indexing an NVARCHAR(2048) may not be possible, and really could be the bottleneck. In that case generating checksums actually could improve your search performance if:
You do many more comparisons than inserts.
Comparing the checksum is faster than comparing NVARCHARs.
Most of your checksums are different.
You have not shown us any queries or sample data, so I have no way of knowing if these are true. If they are true, you can indeed improve performance by generating a checksum for each AbsoluteUrl and assuming values are different where these checksums are different. If the checksums are the same, you will have to do a string comparison to see if values match, but if checksums are different you can be sure the strings are different.
In this case a cryptographic checksum is not necessary, you can use a smaller, faster checksum algorithm like CRC64.
As Steven points out, if your checksums are the same you cannot assume your values are the same. However, if most of your values are different and you have a good checksum, most of your checksums will be different and will not require string comparisons.
No, this is not a good approach.
A million records is no big deal for an indexed field. On the other hand, any checksum/hash/whatever you generate can produce false positives, due to the pigeonhole principle (and, at this scale, the birthday paradox). Making it bigger reduces this chance but does not eliminate it, and it slows things down to the point where there is no speed increase.
Just slap an index on the field and see what happens.
I have an id in the URL. Normally it will be an auto-number, so it will be 1, 2, 3, 4, 5, ...
I don't want visitors to figure out the sequence, so I want the number to look kind of random: 1 converted to 174891, 2 to 817482, and so on. But I want this to stay in a specific range, like 1 to 1,000,000.
I figured out I can do this by XORing and shifting the bits of the integer, but I was wondering whether this is already implemented somewhere.
Thanks
You could pass your integer as the seed to a random number generator (just make sure the results are unique).
You could also generate the SHA-512 hash of the integer and use that instead.
However, the best thing to do here is to use a GUID instead of an integer.
EDIT: If it needs to be reversible, the correct way to do it is to encrypt the number using AES or a different encryption algorithm. However, this won't result in a number between one and a million.
Don't rely on obscurity -- i.e., non-sequential ids -- for security. Build your app so that even if someone does guess the next id, it's still secure.
If you do need non-sequential ids, though, generate a new random id each time. Store it in your table in a uniquely indexed column alongside your autogenerated primary key. Then all you need to do is look up that column to get back the real id.
EDIT: In general, I prefer tvanfosson's approach on both scores. However, here's an answer to the question as stated...
These are fairly strange design constraints, to be honest - but they're reasonably easy to deal with:
Pick an arbitrary RNG seed which you will use on every execution of your program
Create an instance of Random using that seed
Create an array of integers 1..1000000
Shuffle the array using the Random instance
Create a "reverse mapping" array by going through the original array like this:
int[] reverseMapping = new int[mapping.Length];
for (int i = 0; i < mapping.Length; i++)
{
    // mapping holds the values 1..1000000, so subtract 1 to index the array
    reverseMapping[mapping[i] - 1] = i + 1;
}
Then you can map both ways. This does rely on the algorithm used by Random not changing, admittedly... if that's a concern, you could always generate this mapping once and save it somewhere.
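A sketch of the steps above in one class, with the reverse-mapping index adjusted for the 1-based values. The class name, the seed, and the hand-written Fisher-Yates shuffle are assumptions; the answer doesn't specify how to shuffle. As the answer notes, this relies on `new Random(seed)` producing the same sequence on every run, so generating the table once and storing it is the safer option.

```csharp
using System;

public sealed class ShuffledIds
{
    private readonly int[] _mapping;        // _mapping[id - 1] = obfuscated id
    private readonly int[] _reverseMapping; // _reverseMapping[obfuscated - 1] = id

    public ShuffledIds(int max, int seed)
    {
        var random = new Random(seed);      // fixed seed => same permutation each run
        _mapping = new int[max];
        for (int i = 0; i < max; i++) _mapping[i] = i + 1;

        // Fisher-Yates shuffle
        for (int i = max - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);     // 0 <= j <= i
            (_mapping[i], _mapping[j]) = (_mapping[j], _mapping[i]);
        }

        _reverseMapping = new int[max];
        for (int i = 0; i < max; i++)
            _reverseMapping[_mapping[i] - 1] = i + 1; // values are 1-based, arrays 0-based
    }

    public int Obfuscate(int id) => _mapping[id - 1];
    public int Reveal(int obfuscated) => _reverseMapping[obfuscated - 1];
}
```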
If you're looking for a fairly simple pseudo-random integer sequence, the linear congruential method is pretty good:
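$n_{i+1} = (a \cdot n_i + k) \bmod m$
Use prime numbers for a and k. (Primes alone don't guarantee a full period, though: for the sequence to visit every value in 0..m-1 before repeating, k must be coprime to m, a - 1 must be divisible by every prime factor of m, and a - 1 must be divisible by 4 if m is.)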
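For the original question (a fixed range of 1 to 1,000,000), a single step of that recurrence can serve as a reversible scramble instead of a sequence. The map x -> (a·x + k) mod m is a bijection on 0..m-1 whenever a and m are coprime, and it is undone with the modular inverse of a. This is a swapped-in variant of the answer's idea, and the constants are arbitrary, hypothetical choices. Like the other schemes here, it is obfuscation, not security: a few observed pairs reveal a and k.

```csharp
using System;

public static class AffineIdMap
{
    private const long M = 1_000_000;   // ids 1..1,000,000 map to codes 1..1,000,000
    private const long A = 738_463;     // must be coprime to M (odd and not a multiple of 5)
    private const long K = 123_457;
    private static readonly long AInverse = ModInverse(A, M);

    public static long Encode(long id) => (A * (id - 1) + K) % M + 1;

    public static long Decode(long code)
    {
        long x = ((code - 1 - K) % M + M) % M;   // undo the +K, keeping x non-negative
        return AInverse * x % M + 1;             // undo the *A
    }

    // Extended Euclid: returns s with a*s ≡ 1 (mod m).
    private static long ModInverse(long a, long m)
    {
        long oldR = a, r = m, oldS = 1, s = 0;
        while (r != 0)
        {
            long q = oldR / r;
            (oldR, r) = (r, oldR - q * r);
            (oldS, s) = (s, oldS - q * s);
        }
        if (oldR != 1) throw new ArgumentException("a and m must be coprime");
        return (oldS % m + m) % m;
    }
}
```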
A Ticket has an integer ID.
We decided users should not interact with the integer ID (see it, or enter it as a search parameter), because it is sequential and predictable.
So users should work with an encrypted ID. It should have eight characters, mixing letters and numbers, avoid characters that look alike (0, o, 1, l, 5, s, u, v), and not be sequential.
Which algorithm do you think is the best for generating this encrypted id, this two-way convertible string? (from id to encrypted, from encrypted to id)
thanks!
edit: hashed => encrypted
A hash is by definition not convertible back to the original.
What you are looking for is encryption, if you want the both-way-conversion to be done programmatically.
Alternatively, you can use the database-based approach. You generate a hash (or even better a unique identifier) for an integer and store them both in a mapping table. Then you can easily find the original based on its hash (identifier).
One simplistic approach:
Declare a string containing all the characters you do want to use.
To take a hash, create a new instance of Random with the ticket ID as the seed
Take the first 8 random numbers from this Random instance, and use those numbers to index into the string to determine the 8 random characters.
However, this really will create a hash in that it may well not be unique (or reversible). Are you sure that's okay for your purposes?
Why not generate a random "visible ticket ID" when you create a new ticket, repeatedly generating random strings of 8 (or more?) characters until you avoid a collision - then store that visible ticket ID along with the ticket data (so you can search for it later, when the user presents it to you).
The larger your alphabet or the more characters you use, the smaller the chance is of a collision.
Note that one advantage of generating a random "visible ID" which isn't based on the sequential ticket ID is that you're not relying on security through obscurity... if you use something which predictably creates the same string from the same ID, then if anyone works out that algorithm, you're effectively back to where you started (they can work out the "current" sequence number and generate the next visible ticket ID).
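A sketch of the "generate random, retry on collision" idea. It uses the question's look-alike-free alphabet and the cryptographic `RandomNumberGenerator.GetInt32`, available from .NET Core 3.0. The `ISet<string>` parameter is a hypothetical stand-in for the uniquely indexed column. Against a real database, you would insert and retry on a unique-constraint violation rather than checking first, so two concurrent requests can't claim the same id.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class VisibleTicketIds
{
    // Lowercase letters and digits with the question's look-alikes removed (0, o, 1, l, 5, s, u, v).
    private const string Alphabet = "abcdefghijkmnpqrtwxyz2346789";

    public static string NewCandidate(int length = 8)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)];
        return new string(chars);
    }

    public static string Allocate(ISet<string> existingIds, int length = 8)
    {
        while (true)
        {
            string candidate = NewCandidate(length);
            if (existingIds.Add(candidate))   // Add returns false on a collision: try again
                return candidate;
        }
    }
}
```

With 28 symbols and 8 characters there are 28^8, about 3.8 × 10^11, possible ids, so retries stay rare until the table is very large.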
I think you don't want to hash the data, but to encode it. (Hashing is a ONE-WAY algorithm.)
To create an encoder/decoder, you just do the following:
Encode:
- Create a set of valid symbols.
- Take the ID and divide it by the number of symbols you have.
--> The remainder becomes the index into your symbols array.
--> Prepend this symbol to the ones you already have.
- Take the floored quotient and go back to step 2 until the quotient is 0.
Decode:
Take the 1st symbol and get its index in your array.
Multiply the running total by the number of symbols in your array and add the index.
Take the next symbol and continue at step 2.
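The encode/decode steps above as a C# sketch. The symbol set is a hypothetical example (the ticket question's letters and digits minus look-alikes); any string of distinct characters works. This only re-encodes the number: consecutive ids give consecutive-looking codes, so on its own it doesn't hide the sequence.

```csharp
using System;
using System.Text;

public static class SymbolCodec
{
    private const string Symbols = "abcdefghijkmnpqrtwxyz2346789";

    public static string Encode(long id)
    {
        if (id < 0) throw new ArgumentOutOfRangeException(nameof(id));
        var sb = new StringBuilder();
        do
        {
            int remainder = (int)(id % Symbols.Length);
            sb.Insert(0, Symbols[remainder]);   // prepend: most significant symbol ends up first
            id /= Symbols.Length;               // floored quotient
        } while (id > 0);
        return sb.ToString();
    }

    public static long Decode(string code)
    {
        long total = 0;
        foreach (char c in code)
        {
            int index = Symbols.IndexOf(c);
            if (index < 0) throw new FormatException($"'{c}' is not a valid symbol.");
            total = checked(total * Symbols.Length + index);   // running total * symbol count + index
        }
        return total;
    }
}
```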
If all you are worried about is the predictability of the id in the URL, you could simply add another column to your user table that contains a hash of the row id. You wouldn't necessarily have to use a reversible algorithm and you could ensure the hash was unique by checking the other rows in the table.
You would probably want to add an index on this column if performance is an issue because you would be selecting on it frequently.
You could create an encrypted value for the user to see and work with, and then decrypt it when working with it. I think a better solution would be to create a hashed ID value as part of the object, as others have suggested, but if you want to go the encrypt/decrypt route, below is what we did for encrypting ids on the fly. You could encrypt the id to a string, show that string to the user, and decrypt the string when it is passed back in.
We implemented a wrapper around the DESCryptoServiceProvider class so we could encrypt a string and decrypt it back to a string. We use this to encrypt IDs in query strings: we call ToString() on the ids to make them strings we can then encrypt. The wrapper's Encrypt method converts the string input into a byte array, encrypts it, and returns the encrypted bytes as a Base64-encoded string. Its Decrypt method takes that Base64-encoded string and decrypts it to a clear-text string, which you then have to parse back into an int. One note for URLs: you have to UrlEncode the Base64 string before putting it in a URL.
We make the keys configurable so we can share them across machines in a web farm; ids encrypted on one web server can then be decrypted on the others.
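A sketch of a similar wrapper. It swaps in AES (`Aes.Create()`) for `DESCryptoServiceProvider`, since DES's 56-bit key is considered too weak today. It also emits URL-safe Base64 (`-` and `_`, no padding), so the token doesn't need UrlEncode. `IdProtector` and the key handling are hypothetical; in a web farm, the key would come from shared configuration. It provides confidentiality only and does not authenticate the token. A tampered value may fail to decrypt, or, because the IV is not protected, may be edited to decrypt to a different number, so add an HMAC if that matters.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public sealed class IdProtector
{
    private readonly byte[] _key;   // 16, 24 or 32 bytes, shared by every server

    public IdProtector(byte[] key) => _key = key;

    public string Protect(int id)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Key = _key;
            aes.GenerateIV();                              // fresh random IV per token
            byte[] plain = Encoding.UTF8.GetBytes(id.ToString());
            byte[] cipher;
            using (ICryptoTransform enc = aes.CreateEncryptor())
                cipher = enc.TransformFinalBlock(plain, 0, plain.Length);

            byte[] payload = new byte[aes.IV.Length + cipher.Length];   // IV || ciphertext
            Buffer.BlockCopy(aes.IV, 0, payload, 0, aes.IV.Length);
            Buffer.BlockCopy(cipher, 0, payload, aes.IV.Length, cipher.Length);
            return Convert.ToBase64String(payload).TrimEnd('=').Replace('+', '-').Replace('/', '_');
        }
    }

    public int Unprotect(string token)
    {
        string b64 = token.Replace('-', '+').Replace('_', '/');
        b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');
        byte[] payload = Convert.FromBase64String(b64);

        using (Aes aes = Aes.Create())
        {
            aes.Key = _key;
            byte[] iv = new byte[16];                      // AES block size
            Buffer.BlockCopy(payload, 0, iv, 0, iv.Length);
            aes.IV = iv;
            using (ICryptoTransform dec = aes.CreateDecryptor())
            {
                byte[] plain = dec.TransformFinalBlock(payload, iv.Length, payload.Length - iv.Length);
                return int.Parse(Encoding.UTF8.GetString(plain));
            }
        }
    }
}
```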
I have looked all over the place for this and can't seem to find a complete answer, so if the answer already exists on Stack Overflow, I apologize in advance.
I want a unique and random ID so that users of my website can't guess the next number and just hop to someone else's information. I plan to stick to an incrementing ID for the primary key, but also store a random and unique ID (sort of a hash) for that row in the DB and put an index on it.
From my searching I realize that I would like to avoid collisions and I have read some mentions of SHA1.
My basic requirements are
Something smaller than a GUID. (Looks horrible in URL)
Must be unique
Avoid collisions
Not a long list of strange characters that are unreadable.
An example of what I am looking for would be www.somesite.com/page.aspx?id=AF78FEB
I am not sure whether I should be implementing this in the database (I am using SQL Server 2005) or in the code (I am using C# ASP.Net)
EDIT:
From all the reading I have done, I realize that this is security through obscurity. I do intend to have proper authorization and authentication for access to the pages, using .NET's authentication and authorization framework. But suppose a legitimate user has logged in and is accessing a legitimate (but dynamically created) page filled with links to items that belong to him. For example, a link might be www.site.com/page.aspx?item_id=123. What stops him from clicking on that link, then altering the URL to www.site.com/page.aspx?item_id=456, which does NOT belong to him? I know some Java technologies like Struts (I stand to be corrected) store everything in the session and somehow work it out from that, but I have no idea how this is done.
Raymond Chen has a good article on why you shouldn't use "half a guid", and offers a suitable solution to generating your own "not quite guid but good enough" type value here:
GUIDs are globally unique, but substrings of GUIDs aren't
His strategy (without a specific implementation) was based on:
Four bits to encode the computer number,
56 bits for the timestamp, and
four bits as a uniquifier.
We can reduce the number of bits to make the computer unique since the number of computers in the cluster is bounded, and we can reduce the number of bits in the timestamp by assuming that the program won’t be in service 200 years from now.
You can get away with a four-bit uniquifier by assuming that the clock won’t drift more than an hour out of skew (say) and that the clock won’t reset more than sixteen times per hour.
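A sketch of the 4 + 56 + 4 bit layout packed into a `ulong`. The field order (computer number in the top bits), the use of 100-nanosecond ticks for the timestamp, and the `CompactId` name are assumptions, since the article gives no implementation. With 100 ns ticks, 56 bits cover about 228 years, which fits the "won't be in service 200 years from now" remark. Reading the clock and managing the uniquifier are left out.

```csharp
using System;

public static class CompactId
{
    private const int UniquifierBits = 4;
    private const int TimestampBits = 56;
    private const ulong UniquifierMask = (1UL << UniquifierBits) - 1;   // 0xF
    private const ulong TimestampMask = (1UL << TimestampBits) - 1;

    public static ulong Pack(int computer, long timestampTicks, int uniquifier)
    {
        if (computer < 0 || computer > 15) throw new ArgumentOutOfRangeException(nameof(computer));
        if (uniquifier < 0 || uniquifier > 15) throw new ArgumentOutOfRangeException(nameof(uniquifier));
        if (timestampTicks < 0 || (ulong)timestampTicks > TimestampMask)
            throw new ArgumentOutOfRangeException(nameof(timestampTicks));

        return ((ulong)computer << (TimestampBits + UniquifierBits))   // top 4 bits
             | ((ulong)timestampTicks << UniquifierBits)                // middle 56 bits
             | (ulong)uniquifier;                                       // low 4 bits
    }

    public static (int Computer, long TimestampTicks, int Uniquifier) Unpack(ulong id) =>
        ((int)(id >> (TimestampBits + UniquifierBits)),
         (long)((id >> UniquifierBits) & TimestampMask),
         (int)(id & UniquifierMask));
}
```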
UPDATE (4 Feb 2017):
Walter Stabosz discovered a bug in the original code. Upon investigation, further bugs were discovered; however, after extensive testing and reworking of the code, I, the original author (CraigTP), have now fixed all of these issues. I've updated the code here with the correct working version, and you can also download a Visual Studio 2015 solution here, which contains the "shortcode" generation code and a fairly comprehensive test suite to prove correctness.
One interesting mechanism I've used in the past is to internally just use an incrementing integer/long, but to "map" that integer to an alphanumeric "code".
Example
Console.WriteLine($"1371 as a shortcode is: {ShortCodes.LongToShortCode(1371)}");
Console.WriteLine($"12345 as a shortcode is: {ShortCodes.LongToShortCode(12345)}");
Console.WriteLine($"7422822196733609484 as a shortcode is: {ShortCodes.LongToShortCode(7422822196733609484)}");
Console.WriteLine($"abc as a long is: {ShortCodes.ShortCodeToLong("abc")}");
Console.WriteLine($"ir6 as a long is: {ShortCodes.ShortCodeToLong("ir6")}");
Console.WriteLine($"atnhb4evqqcyx as a long is: {ShortCodes.ShortCodeToLong("atnhb4evqqcyx")}");
// PLh7lX5fsEKqLgMrI9zCIA
Console.WriteLine(GuidToShortGuid( Guid.Parse("957bb83c-5f7e-42b0-aa2e-032b23dcc220") ) );
Code
The following code shows a simple class that will change a long to a "code" (and back again!):
using System;

public static class ShortCodes
{
    // You may change the "shortcodeKeyspace" string to contain as many or as few characters as you
    // please. The more characters it contains, the shorter the codes you can produce for a given long.
    private static string shortcodeKeyspace = "abcdefghijklmnopqrstuvwxyz0123456789";

    public static string LongToShortCode(long number)
    {
        // Guard clause. If passed 0 as input
        // we always return empty string.
        if (number == 0)
        {
            return string.Empty;
        }
        // Negative numbers would produce a negative index below.
        if (number < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(number), "Only positive numbers can be encoded.");
        }

        var keyspaceLength = shortcodeKeyspace.Length;
        var shortcodeResult = "";
        var numberToEncode = number;
        do
        {
            // Bijective numeration: each "digit" has a value from 1 to keyspaceLength (never 0).
            var characterValue = numberToEncode % keyspaceLength == 0 ? keyspaceLength : numberToEncode % keyspaceLength;
            var indexer = (int) characterValue - 1;
            shortcodeResult = shortcodeKeyspace[indexer] + shortcodeResult;
            numberToEncode = ((numberToEncode - characterValue) / keyspaceLength);
        }
        while (numberToEncode != 0);

        return shortcodeResult;
    }

    public static long ShortCodeToLong(string shortcode)
    {
        var keyspaceLength = shortcodeKeyspace.Length;
        long shortcodeResult = 0;
        var shortcodeLength = shortcode.Length;
        foreach (var character in shortcode)
        {
            shortcodeLength--;
            var codeCharIndex = shortcodeKeyspace.IndexOf(character);
            if (codeCharIndex < 0)
            {
                // The character is not part of the keyspace and so entire shortcode is invalid.
                return 0;
            }
            try
            {
                checked
                {
                    shortcodeResult += (codeCharIndex + 1) * (long) (Math.Pow(keyspaceLength, shortcodeLength));
                }
            }
            catch (OverflowException)
            {
                // We've overflowed the maximum size for a long (possibly the shortcode is invalid or too long).
                return 0;
            }
        }
        return shortcodeResult;
    }
}
This is essentially your own base-X numbering system, where X is the number of unique characters in the shortcodeKeyspace string. (Strictly, it is bijective base-X: digit values run from 1 to X rather than 0 to X-1, which is why 0 encodes to the empty string.)
To make things less predictable, start your internal incrementing numbering at something other than 1 or 0 (e.g. start at 184723), and also change the order of the characters in the shortcodeKeyspace string (e.g. use the letters A-Z and the numbers 0-9, but scramble their order within the string). This will help make each code somewhat unpredictable.
If you're using this to "protect" anything, this is still security through obscurity, and if a given user can observe enough of these generated codes, they can predict the relevant code for a given long. The "security" (if you can call it that) comes from the shortcodeKeyspace string being scrambled and kept secret.
EDIT:
If you just want to generate a GUID and transform it into something that is still unique but contains fewer characters, this little function will do the trick:
public static string GuidToShortGuid(Guid gooid)
{
    string encoded = Convert.ToBase64String(gooid.ToByteArray());
    encoded = encoded.Replace("/", "_").Replace("+", "-");
    return encoded.Substring(0, 22); // drop the trailing "==" padding
}
If you don't want other users to see people's information, why don't you secure the page that uses the id?
If you do that, it won't matter if you use an incrementing id.
[In response to the edit]
You should consider query strings as "evil input". You need to programmatically check that the authenticated user is allowed to view the requested item.
if( !item456.BelongsTo(user123) )
{
    // Either show them one of their items or show an error message.
}
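A minimal sketch of that check done in the data layer: the lookup only returns an item if it belongs to the signed-in user. `Item` and `ItemRepository` are hypothetical stand-ins, with an in-memory dictionary in place of a database query that filters on both the item id and the owner id.

```csharp
using System;
using System.Collections.Generic;

public sealed class Item
{
    public Item(int id, int ownerUserId, string name)
    {
        Id = id;
        OwnerUserId = ownerUserId;
        Name = name;
    }

    public int Id { get; }
    public int OwnerUserId { get; }
    public string Name { get; }
}

public sealed class ItemRepository
{
    private readonly Dictionary<int, Item> _items = new Dictionary<int, Item>();

    public void Add(Item item) => _items[item.Id] = item;

    // Equivalent of "WHERE ItemId = @itemId AND OwnerUserId = @userId":
    // returns null both when the item doesn't exist and when it isn't the user's.
    public Item GetForUser(int itemId, int currentUserId) =>
        _items.TryGetValue(itemId, out Item item) && item.OwnerUserId == currentUserId
            ? item
            : null;
}
```

Treating "not found" and "not yours" the same way (for example, both return a 404) means a guessed id reveals nothing.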
You could randomly generate a number. Check that this number is not already in the DB and use it. If you want it to appear as a random string you could just convert it to hexadecimal, so you get A-F in there just like in your example.
A GUID is 128 bits. If you take these bits and, instead of a character set with just 16 characters (16=2^4 and 128/4 = 32 characters), use a character set with, let's say, 64 characters (like Base 64), you would end up at only 22 characters (64=2^6 and 128/6 = 21.333, so 22 characters).
Take your auto-increment ID and HMAC-SHA1 it with a secret known only to you. This will generate 160 random-looking bits that hide the real incremental ID. Then take a prefix long enough to make collisions sufficiently unlikely for your application (say 64 bits, which you can encode in 11 characters of URL-safe Base64). Use this as your string.
Without the secret key, HMAC makes it infeasible for anyone to map from the bits shown back to the underlying number. Because you are hashing an auto-increment ID, every input is distinct, so your risk of collisions comes only from the likelihood of a 64-bit partial collision in SHA-1. With this method, you can determine in advance whether you will have any collisions by pre-generating all the strings this method would produce (e.g. up to the number of rows you expect) and checking them.
Of course, if you are willing to specify a unique condition on your database column, then simply generating a totally random number will work just as well. You just have to be careful about the source of randomness.
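A sketch of the HMAC approach in C# using `HMACSHA1` from System.Security.Cryptography. The class and method names are hypothetical. `BitConverter.GetBytes` uses the machine's byte order, so fix one order if ids may be generated on different architectures. The public id isn't reversible: store it in a uniquely indexed column and look rows up by it.

```csharp
using System;
using System.Security.Cryptography;

public static class HmacIds
{
    // Derives an 11-character public id from an auto-increment id and a secret key.
    public static string PublicId(long id, byte[] secretKey)
    {
        byte[] mac;
        using (var hmac = new HMACSHA1(secretKey))
            mac = hmac.ComputeHash(BitConverter.GetBytes(id));   // 160 bits

        byte[] prefix = new byte[8];                              // keep the first 64 bits
        Array.Copy(mac, prefix, prefix.Length);

        // 8 bytes -> 12 Base64 characters ending in one '='; drop it and use the URL-safe alphabet.
        return Convert.ToBase64String(prefix).TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }
}
```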
How long is too long? You could convert the GUID to Base 64, which ends up making it quite a bit shorter.
What you could do is something I do when I want exactly what you are wanting.
1. Create your GUID.
2. Remove the dashes, and take a substring of whatever length you want your ID to be.
3. Check the db for that ID; if it exists, go to step 1.
4. Insert the record.
This is the simplest way to ensure it is obscured and unique.
I have just had an idea and I see Greg also pointed it out. I have the user stored in the session with a user ID. When I create my query I will join on the Users table with that User ID, if the result set is empty then we know he was hacking the URL and I can redirect to an error page.
A GUID is just a number
Version 4 GUIDs (the kind Guid.NewGuid() generates) are basically a big random number*
Because it's a big random number, the chances of a collision are REALLY small.
The number of distinct version 4 GUIDs is over:
5,000,000,000,000,000,000,000,000,000,000,000,000
So if you generate two GUIDs, the chance the second GUID is the same as the first is:
1 in 5,000,000,000,000,000,000,000,000,000,000,000,000
If you generate 100 BILLION GUIDs.
The chance your 100 billionth GUID collides with the other 99,999,999,999 GUIDs is:
1 in 50,000,000,000,000,000,000,000,000
Why 128 bits?
One reason is that computers like working with multiples of 8 bits.
8, 16, 32, 64, 128, etc
The other reason is that the guy who came up with the GUID felt 64 wasn't enough, and 256 was way too much.
Do you need 128 bits?
No, how many bits you need depends on how many numbers you expect to generate and how sure you want to be that they don't collide.
64 bit example
If you used a 64-bit random number instead, the chance that your second number would collide with the first would be:
1 in 18,000,000,000,000,000,000 (64 bit)
Instead of:
1 in 5,000,000,000,000,000,000,000,000,000,000,000,000 (128 bit)
What about the 100 billionth number?
The chance your 100 billionth number collides with the other 99,999,999,999 would be:
1 in 180,000,000 (64 bit)
Instead of:
1 in 50,000,000,000,000,000,000,000,000 (128 bit)
So should you use 64 bits?
It depends: are you generating 100 billion numbers? And even if you were, do odds of 1 in 180,000,000 make you uncomfortable?
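The "1 in N" figures above come from a single division: the n-th random value has roughly a (n - 1) / 2^bits chance of matching one of the earlier ones. This sketch (names hypothetical) reproduces them. Note that this is the risk for one new value, not the much larger birthday-style chance that any two of the n values collide.

```csharp
using System;

public static class CollisionOdds
{
    // "1 in N" odds that the n-th value, drawn uniformly from 2^bits possibilities,
    // equals one of the n - 1 values before it: N ≈ 2^bits / (n - 1).
    public static double OneIn(int bits, double n) => Math.Pow(2, bits) / (n - 1);
}
```

`OneIn(122, 2)` is about 5.3 × 10^36 and `OneIn(122, 1e11)` about 5.3 × 10^25 (the version 4 GUID figures), while `OneIn(64, 2)` is about 1.8 × 10^19 and `OneIn(64, 1e11)` about 1.8 × 10^8.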
A little more details about GUIDs
I'm specifically talking about version 4.
Version 4 doesn't actually use all 128 bits for the random portion; it uses 122 bits. The other 6 bits (4 for the version, 2 for the variant) indicate that this is a version 4 GUID.
The numbers in this answer are based on 122 bits.
And yes since it's just a random number you can just take the number of bits you want from it. (Just make sure you don't take any of the 6 versioning bits that never change - see above).
Instead of taking bits from the GUID, though, you could use the same random number generator the GUID got its bits from.
It probably used the random number generator that comes with the operating system.
Late to the party, but this is a simple way to generate Base62 random strings in C#. It draws from the cryptographic RandomNumberGenerator rather than a shared System.Random, because System.Random output can be predicted and a shared instance isn't thread-safe.
using System;
using System.Linq;
using System.Security.Cryptography;

void Main()
{
    var s = RandomString(7);
    Console.WriteLine(s);
}
public static string RandomString(int length)
{
    const string chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[RandomNumberGenerator.GetInt32(s.Length)]).ToArray());
}