HMAC-based one time password in C# (RFC 4226 - HOTP) - c#

I am attempting to wrap my brain around generating a 6 digit/character non case sensitive expiring one-time password.
My source is https://www.rfc-editor.org/rfc/rfc4226#section-5
First the definition of the parameters
C 8-byte counter value, the moving factor. This counter
MUST be synchronized between the HOTP generator (client)
and the HOTP validator (server).
K shared secret between client and server; each HOTP
generator has a different and unique secret K.
T throttling parameter: the server will refuse connections
from a user after T unsuccessful authentication attempts.
Then we have the algorithm to generate the HOTP
As the output of the HMAC-SHA-1 calculation is 160 bits, we must
truncate this value to something that can be easily entered by a
user.
HOTP(K,C) = Truncate(HMAC-SHA-1(K,C))
Then, we have Truncate defined as
String = String[0]...String[19]
Let OffsetBits be the low-order 4 bits of String[19]
Offset = StToNum(OffsetBits) // 0 <= OffSet <= 15
Let P = String[OffSet]...String[OffSet+3]
Return the Last 31 bits of P
And then an example is offered for a 6 digit HOTP
The following code example describes the extraction of a dynamic
binary code given that hmac_result is a byte array with the HMAC-
SHA-1 result:
int offset = hmac_result[19] & 0xf ;
int bin_code = (hmac_result[offset] & 0x7f) << 24
| (hmac_result[offset+1] & 0xff) << 16
| (hmac_result[offset+2] & 0xff) << 8
| (hmac_result[offset+3] & 0xff) ;
I am rather at a loss in attempting to convert this into useful C# code for generating one time passwords. I already have code for creating an expiring HMAC as follows:
byte[] hashBytes = alg.ComputeHash(Encoding.UTF8.GetBytes(input));
byte[] result = new byte[8 + hashBytes.Length];
hashBytes.CopyTo(result, 8);
BitConverter.GetBytes(expireDate.Ticks).CopyTo(result, 0);
I'm just not sure how to go from that, to 6 digits as proposed in the above algorithms.

You have two issues here:
If you are generating alpha-numeric, you are not conforming to the RFC - at this point, you can simply take any N bytes and turn them to a hex string and get alpha-numeric. Or, convert them to base 36 if you want a-z and 0-9. Section 5.4 of the RFC is giving you the standard HOTP calc for a set Digit parameter (notice that Digit is a parameter along with C, K, and T). If you are choosing to ignore this section, then you don't need to convert the code - just use what you want.
Your "result" byte array has the expiration time simply stuffed in the first 8 bytes after hashing. If your truncation to 6-digit alphanumeric does not collect these along with parts of the hash, it may as well not be calculated at all. It is also very easy to "fake" or replay - hash the secret once, then put whatever ticks you want in front of it - not really a one time password. Note that parameter C in the RFC is meant to fulfill the expiring window and should be added to the input prior to computing the hash code.

For anyone interested, I did figure out a way to build expiration into my one time password. The approach is to use the created time down to the minute (ignoring seconds, milliseconds, etc). Once you have that value, use the ticks of the DateTime as your counter, or variable C.
otpLifespan is my HOTP lifespan in minutes.
DateTime current = new DateTime(DateTime.Now.Year, DateTime.Now.Month,
DateTime.Now.Day, DateTime.Now.Hour, DateTime.Now.Minute, 0);
for (int x = 0; x <= otpLifespan; x++)
{
var result = NumericHOTP.Validate(hotp, key,
current.AddMinutes(-1 * x).Ticks);
//return valid state if validation succeeded
//return invalid state if the passed in value is invalid
// (length, non-numeric, checksum invalid)
}
//return expired state
My expiring HOTP extends from my numeric HOTP which has a static validation method that checks the length, ensures it is numeric, validates the checksum if it is used, and finally compares the hotp passed in with a generated one.
The only downside to this is that each time you validate an expiring HOTP, your worst-case scenario is to check n + 1 HOTP values, where n is the lifespan in minutes.
The Java code example in the RFC 4226 document was very straightforward to port to C#. The only piece I really had to put any effort into rewriting was the hashing method.
private static byte[] HashHMACSHA1(byte[] keyBytes, byte[] text)
{
HMAC alg = new HMACSHA1(keyBytes);
return alg.ComputeHash(text);
}
I hope this helps anyone else attempting to generate one time passwords.
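For reference, the truncation described in the question can be sketched in C# as below. This is a minimal sketch, not the poster's NumericHOTP class: the name HotpSketch and its Generate method are made up for illustration, and it assumes the counter is encoded as 8 big-endian bytes, as RFC 4226 specifies.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, not the NumericHOTP class mentioned above.
public static class HotpSketch
{
    public static string Generate(byte[] key, long counter, int digits = 6)
    {
        if (digits < 1 || digits > 9)
            throw new ArgumentOutOfRangeException(nameof(digits));

        // RFC 4226 treats the counter C as an 8-byte big-endian value.
        byte[] counterBytes = BitConverter.GetBytes(counter);
        if (BitConverter.IsLittleEndian)
            Array.Reverse(counterBytes);

        byte[] hash;
        using (var hmac = new HMACSHA1(key))
        {
            hash = hmac.ComputeHash(counterBytes);
        }

        // Dynamic truncation: the low 4 bits of the last byte select an offset,
        // then 31 bits are read starting at that offset.
        int offset = hash[hash.Length - 1] & 0x0f;
        int binCode = ((hash[offset] & 0x7f) << 24)
                    | (hash[offset + 1] << 16)
                    | (hash[offset + 2] << 8)
                    | hash[offset + 3];

        // Reduce modulo 10^digits and left-pad with zeros.
        int modulus = 1;
        for (int i = 0; i < digits; i++)
            modulus *= 10;

        return (binCode % modulus).ToString().PadLeft(digits, '0');
    }
}
```

With the RFC's test secret (the ASCII bytes of "12345678901234567890"), counter 0 should give 755224 and counter 1 should give 287082. For the expiring variant described above, the counter argument would be the minute-truncated tick value.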

This snippet should do what you are asking for:
using System.Security.Cryptography;
using System.Text;

public class UniqueId
{
    public static string GetUniqueKey()
    {
        int maxSize = 6; // whatever length you want
        char[] chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".ToCharArray();
        byte[] data = new byte[maxSize];
        // RNGCryptoServiceProvider is marked obsolete in recent .NET versions;
        // RandomNumberGenerator.Create() is the usual replacement there.
        using (RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider())
        {
            crypto.GetBytes(data);
        }
        StringBuilder result = new StringBuilder(maxSize);
        foreach (byte b in data)
        {
            // Index modulo the full alphabet length so every character is reachable.
            // 256 is not a multiple of 36, so a small bias toward the first characters remains.
            result.Append(chars[b % chars.Length]);
        }
        return result.ToString();
    }
}

Related

Is it possible to reduce the length of DateTime.Now.Ticks.ToString("X") and still maintain uniqueness?

I have a limitation with some hardware with which I am working wherein I can only broadcast ( wirelessly ) 26 characters.
To overcome this limitation, the first broadcast transmits a timestamp converted into hexadecimal ( DateTime.Now.Ticks.ToString( "X" ) ), along with the length of the message being transmitted ( also as a hexadecimal string ).
The receiving software tests for header messages, and when it confirms that it receives one, stores the time stamp ( reconverted into a long ) in a dictionary :
/*************************************************************************
* _pendingMessages.Add( DateTime.Now.Ticks, Tuple.Create( MessageLength, string.Empty ) );
* T.Item1 = Message Length
* T.Item2 = Message ( when Message.Length == Length, Pop Message )
*************************************************************************/
private static Dictionary<long, Tuple<long, string>> _pendingMessages;
Unfortunately, the time stamp has to be passed each time, and it's... over half the allotted character length ( at 15 characters right now ).
So I was thinking that, rather than pass the entire time stamp, that I might be able to reduce it by summing the value of the characters of the hex string :
For Example :
DateTime.Now.Ticks.ToString("X").Sum( C => C ).ToString("X");
Unfortunately, a quick test blew that idea away rather unceremoniously
( duplicate keys rather quickly ) :
Dictionary<string, long> _dctTest = new Dictionary<string, long>( );
while ( true ){
long dtNow = DateTime.Now.Ticks;
string strKey = dtNow.ToString("X").Sum( C => C ).ToString("X");
_dctTest.Add( strKey, dtNow ); //<=====Explodes after less than a second.
}
So my question is - is there any way for me to reliably reduce the length of my "Key" while still ( reasonably ) guaranteeing uniqueness?
Here's something to kick-start some answers. I'm not claiming this is an optimal solution but I can get you millisecond precision with only 7 characters of encoded data.
Assuming millisecond accuracy is good enough, we can reduce the precision of our algorithm from the get-go. A tick represents 100 nanoseconds. There are 10,000 ticks in a millisecond. Here's the algorithm:
Start with a known, large number of ticks that occurred in the past. This example uses the beginning of the century.
long centuryBegin = new DateTime(2001, 1, 1).Ticks;
// 631139040000000000
Now take a snapshot of the current timestamp:
long currentDate = DateTime.Now.Ticks;
// 636083231371736598
Take the difference, and reduce the precision to millisecond-level:
long shortTicks = (currentDate - centuryBegin) / 10000L;
// 494419137173
Now we just base64-encode the bytes of that number:
string base64Ticks = Convert.ToBase64String(BitConverter.GetBytes(shortTicks));
// lVKtHXMAAAA=
However, as long as the value fits in five bytes (fewer than 2^40 milliseconds, roughly 34.8 years after the chosen start date), the upper three bytes are zero and the encoding always ends in "AAAA=", so we can remove it!
base64Ticks = base64Ticks.Substring(0, 7);
// lVKtHXM
You now have a 7-character string lVKtHXM for transmission. On the other side:
// Decode the base64-encoded string back into bytes
// Note we need to add on the "AAAA=" that we stripped off
byte[] data = new byte[8];
Convert.FromBase64String(base64Ticks + "AAAA=").CopyTo(data, 0);
// From the bytes, convert back to a long, multiply by 10,000, and then
// add on the known century begin number of ticks
long originalTicks = (BitConverter.ToInt64(data, 0) * 10000L) + centuryBegin;
// 636083231371730000
Let's check the difference between the two:
636083231371736598 (original ticks)
-636083231371730000 (decoded ticks)
===================
6598 (difference)
And you can see this gets you to within 6,598 ticks, or 0.6598 milliseconds, of the original timestamp. The difference will always be <= 1 ms.
In terms of uniqueness, I tried this on 100,000 fake transmissions, sleeping for 1 millisecond in between each attempt, and there were no collisions.
To round out this post, here are some helper methods you might use:
public static string EncodeTransmissionTimestamp(DateTime date)
{
long shortTicks = (date.Ticks - 631139040000000000L) / 10000L;
return Convert.ToBase64String(BitConverter.GetBytes(shortTicks)).Substring(0, 7);
}
public static DateTime DecodeTransmissionTimestamp(string encodedTimestamp)
{
byte[] data = new byte[8];
Convert.FromBase64String(encodedTimestamp + "AAAA=").CopyTo(data, 0);
return new DateTime((BitConverter.ToInt64(data, 0) * 10000L) + 631139040000000000L);
}
Some of this work was inspired by this post: Compress large Integers into smallest possible string
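The 7-character trick only holds while the millisecond count fits in 40 bits, so a guarded variant of the helpers might look like the sketch below. The class name TimestampCodec and the range check are additions for illustration, not part of the original helpers; the epoch and the "AAAA=" suffix are the ones used above.

```csharp
using System;

// Hypothetical wrapper around the two helper methods above, with a range check.
public static class TimestampCodec
{
    // Same reference point as above: 2001-01-01, i.e. 631139040000000000 ticks.
    private static readonly long Epoch = new DateTime(2001, 1, 1).Ticks;

    // Seven base64 characters only carry the value safely while it is below 2^40.
    private const long MaxMilliseconds = 1L << 40;

    public static string Encode(DateTime date)
    {
        long ms = (date.Ticks - Epoch) / TimeSpan.TicksPerMillisecond;
        if (ms < 0 || ms >= MaxMilliseconds)
            throw new ArgumentOutOfRangeException(nameof(date));

        byte[] bytes = BitConverter.GetBytes(ms);
        if (!BitConverter.IsLittleEndian)
            Array.Reverse(bytes); // keep the wire format little-endian on any platform

        return Convert.ToBase64String(bytes).Substring(0, 7);
    }

    public static DateTime Decode(string encoded)
    {
        // Restore the stripped zero bytes and padding before decoding.
        byte[] bytes = Convert.FromBase64String(encoded + "AAAA=");
        if (!BitConverter.IsLittleEndian)
            Array.Reverse(bytes);

        long ms = BitConverter.ToInt64(bytes, 0);
        return new DateTime(ms * TimeSpan.TicksPerMillisecond + Epoch);
    }
}
```

Throwing on out-of-range dates is one possible design choice; silently truncating would produce timestamps that decode to the wrong value.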

RADIUS AND EAP calculating the Message-Authenticator

I have been reading through RFC 3579 as I am implementing a RADIUS service that supports EAP-MD5 authentication. Unfortunately I am a little unsure how to interpret the RFC, particularly when trying to calculate the Message-Authenticator.
I basically create an HMAC-MD5 object (I am using C#) use the shared secret of the NAS for the key and concatenate Type (one byte) + Identifier (one byte) + Length (two bytes) + Request Authenticator (16 bytes) + All Attributes (Except the Message-Authenticator in the Access-Request) but the calculated value does not match the value in the packet.
Following the RFC this seems correct. Am I interpreting the RFC correctly?
Here is the code:
RadiusPacket packet = Objects.Packet;
byte[] toHMAC;
toHMAC = new byte[1] { (byte)packet.Code };
toHMAC = ByteArray.Combine(toHMAC, new byte[1] { packet.Identifier });
// reversed to match endian of packet
toHMAC = ByteArray.Combine(toHMAC, ByteArray.Reverse(packet.LengthAsBytes));
toHMAC = ByteArray.Combine(toHMAC, packet.Authenticator);
for (int i = 0; i < packet.Attributes.Length; i++)
{
if (packet.Attributes[i].Type != RadiusAttributeType.MessageAuthenticator)
{
toHMAC = ByteArray.Combine(toHMAC, packet.Attributes[i].RawData);
}
}
HMACMD5 md5 = new HMACMD5(Encoding.ASCII.GetBytes(Objects.NAS.SharedSecret));
// this DOES NOT match what is in the received packet...
byte[] hmac = md5.ComputeHash(toHMAC);
Any help would be much appreciated.
I found the answer by a combination of re-reading the RFC and looking at some source code in JQuery. Here is what I found for anybody else that has the same issue.
The RFC (3579) reads:
'When the message integrity check is calculated the signature string should be considered to be sixteen octets of zero.'
Upon receiving the Access-Request packet, I replace the existing Message-Authenticator with 16 zero bytes, HMAC-MD5 the entire packet, and compare the calculated value with the Message-Authenticator in the packet.
The code is much simpler (I created a test packet from a wireshark capture):
// a radius-eap packet captured from wireshark
RadiusPacket packet = new RadiusPacket(ByteArray.FromHex("017600ad375be8f596e90bcffc5e32929d14275b04060a3e01ee05060000c3513d060000000f011f686f73742f64727377696e377472616379702e6472736c2e636f2e756b1e1330302d31322d30302d45332d34312d43311f1342342d39392d42412d46322d38412d44360606000000020c06000005dc4f240200002201686f73742f64727377696e377472616379702e6472736c2e636f2e756b5012c93ef628690a578b31709b0bbccade41"));
// identical packet that I can zero out MA for testing
RadiusPacket radiusPacketCopy = new RadiusPacket(ByteArray.FromHex("017600ad375be8f596e90bcffc5e32929d14275b04060a3e01ee05060000c3513d060000000f011f686f73742f64727377696e377472616379702e6472736c2e636f2e756b1e1330302d31322d30302d45332d34312d43311f1342342d39392d42412d46322d38412d44360606000000020c06000005dc4f240200002201686f73742f64727377696e377472616379702e6472736c2e636f2e756b5012c93ef628690a578b31709b0bbccade41"));
// zero out MA
radiusPacketCopy.ZeroMessageAuthenticator();
// hash it up
HMACMD5 md5 = new HMACMD5(Encoding.ASCII.GetBytes("mykey"));
byte[] hmac = md5.ComputeHash(radiusPacketCopy.RawPacket);
// the message authenticator MUST be correct
if (!ByteArray.AreEqual(hmac, packet.MessageAuthenticator))
{
// etc
Your code is close, but not quite there. You are stripping out the Message-Authenticator attribute completely.
Instead, it should remain in its original position within the packet, but the 16-byte value field of that attribute should be over-written with zeros.
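Working directly on the raw bytes, the check described above could be sketched as follows. This is only an illustration for an Access-Request: the class MessageAuthenticatorSketch is hypothetical and does not use the RadiusPacket or ByteArray helpers from the question. It assumes the standard layout of a 20-byte header followed by type/length/value attributes, with Message-Authenticator being attribute type 80 and length 18.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; not part of the RadiusPacket/ByteArray classes used above.
public static class MessageAuthenticatorSketch
{
    private const int HeaderLength = 20;               // Code, Identifier, Length, Authenticator
    private const byte MessageAuthenticatorType = 80;  // attribute type from RFC 3579

    public static bool Verify(byte[] rawPacket, byte[] sharedSecret)
    {
        if (rawPacket.Length < HeaderLength) return false;

        // The Length field is big-endian and covers the whole packet.
        int declaredLength = (rawPacket[2] << 8) | rawPacket[3];
        if (declaredLength < HeaderLength || declaredLength > rawPacket.Length) return false;

        // Work on a copy so the caller's packet is left untouched.
        byte[] copy = new byte[declaredLength];
        Array.Copy(rawPacket, copy, declaredLength);

        byte[] received = null;
        int pos = HeaderLength;
        while (pos + 2 <= copy.Length)
        {
            int attrLength = copy[pos + 1];
            if (attrLength < 2 || pos + attrLength > copy.Length) return false;

            if (copy[pos] == MessageAuthenticatorType && attrLength == 18)
            {
                // Remember the received value, then zero it in place.
                received = new byte[16];
                Array.Copy(copy, pos + 2, received, 0, 16);
                Array.Clear(copy, pos + 2, 16);
                break;
            }
            pos += attrLength;
        }
        if (received == null) return false;

        byte[] computed;
        using (var hmac = new HMACMD5(sharedSecret))
        {
            computed = hmac.ComputeHash(copy);
        }

        // Compare without exiting early on the first mismatch.
        int diff = 0;
        for (int i = 0; i < 16; i++)
            diff |= computed[i] ^ received[i];
        return diff == 0;
    }
}
```

The attribute stays in its original position, so the HMAC input has exactly the same length and layout as the packet on the wire.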

Calculating the number of bits in a Subnet Mask in C#

I have a task to complete in C#. I have a Subnet Mask: 255.255.128.0.
I need to find the number of bits in the Subnet Mask, which would be, in this case, 17.
However, I need to be able to do this in C# WITHOUT the use of the System.Net library (the system I am programming in does not have access to this library).
It seems like the process should be something like:
1) Split the Subnet Mask into Octets.
2) Convert the Octets to be binary.
3) Count the number of Ones in each Octet.
4) Output the total number of found Ones.
However, my C# is pretty poor. Does anyone have the C# knowledge to help?
Bit counting algorithm taken from:
http://www.necessaryandsufficient.net/2009/04/optimising-bit-counting-using-iterative-data-driven-development/
string mask = "255.255.128.0";
int totalBits = 0;
foreach (string octet in mask.Split('.'))
{
byte octetByte = byte.Parse(octet);
while (octetByte != 0)
{
totalBits += octetByte & 1; // logical AND on the LSB
octetByte >>= 1; // do a bitwise shift to the right to create a new LSB
}
}
Console.WriteLine(totalBits);
The most simple algorithm from the article was used. If performance is critical, you might want to read the article and use a more optimized solution from it.
string ip = "255.255.128.0";
string a = "";
// Parse each octet as decimal, then render it in base 2
ip.Split('.').ToList().ForEach(x => a += Convert.ToString(int.Parse(x), 2));
int ones_found = a.Replace("0", "").Length;
A complete sample:
public int CountBit(string mask)
{
    int ones = 0;
    // For each octet, render it in binary and count the '1' characters
    Array.ForEach(mask.Split('.'), s => ones += Convert.ToString(int.Parse(s), 2).Count(c => c == '1'));
    return ones;
}
You can convert a number to binary like this:
string ip = "255.255.128.0";
string[] tokens = ip.Split('.');
string result = "";
foreach (string token in tokens)
{
int tokenNum = int.Parse(token);
string octet = Convert.ToString(tokenNum, 2);
octet = octet.PadLeft(8, '0'); // pad on the left so each octet is exactly 8 bits
result += octet;
}
int mask = result.LastIndexOf('1') + 1;
The solution is to use a bitwise operation like
int total = 0;
foreach(string octet in ipAddress.Split('.'))
{
int oct = int.Parse(octet);
while(oct !=0)
{
total += oct & 1; // {1}
oct >>=1; //{2}
}
}
The trick is that on line {1} the bitwise AND acts like a per-bit multiplication: 1 x 0 = 0 and 1 x 1 = 1. So if we take some hypothetical number
0000101001 and AND it with 1, which in binary is nothing but 0000000001, we get
0000101001
0000000001
The rightmost digit is 1 in both numbers, so the bitwise AND returns 1; if the lowest bit of either number were 0, the result would be 0.
So on the line total += oct & 1 we add either 1 or 0 to total, depending on that lowest bit.
On line {2} we shift the number one bit to the right, which effectively divides it by 2, and repeat until it becomes 0.
Easy.
EDIT
This works for integer and byte types, but do not use this technique on floating-point numbers. Either way, it is a good fit for this question.
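The bit-counting answers above give 17 for 255.255.128.0, but they would also happily count the bits of a string that is not a real mask, such as 255.0.255.0. A sketch that counts the bits and also rejects non-contiguous masks, still without System.Net, might look like this. The class and method names are made up for illustration.

```csharp
using System;

// Hypothetical helper; uses only System, not System.Net.
public static class SubnetMask
{
    public static int PrefixLength(string mask)
    {
        string[] parts = mask.Split('.');
        if (parts.Length != 4)
            throw new FormatException("Expected four octets.");

        // Pack the four octets into one 32-bit value, most significant octet first.
        uint value = 0;
        foreach (string part in parts)
            value = (value << 8) | byte.Parse(part);

        // A valid mask is a run of ones followed by a run of zeros, so its
        // complement is of the form 2^k - 1 and shares no bits with (complement + 1).
        uint inverted = ~value;
        if ((inverted & (inverted + 1)) != 0)
            throw new FormatException("Mask bits are not contiguous.");

        // Count the set bits with the same shift-and-test loop used above.
        int count = 0;
        while (value != 0)
        {
            count += (int)(value & 1);
            value >>= 1;
        }
        return count;
    }
}
```

Whether to throw or return a sentinel for an invalid mask is a design choice; throwing is used here only to keep the sketch short.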

How to generate a LONG guid?

I would like to generate a long UUID - something like the session key used by gmail. It should be at least 256 chars and no more than 512. It can contain all alpha-numeric chars and a few special chars (the ones below the function keys on the keyboard). Has this been done already or is there a sample out there?
C++ or C#
Update: A GUID is not enough. We already have been seeing collisions and need to remedy this. 512 is the max as of now because it will prevent us from changing stuff that was already shipped.
Update 2: For the guys who are insisting about how unique the GUID is: if someone wants to guess your next session ID, they don't have to compute the combinations for the next 1 trillion years. All they have to do is constrain the time factor and they will be done in hours.
If your GUIDs are colliding, may I ask how you're generating them?
It is astronomically improbable that GUIDs would collide as they are based on:
60 bits - timestamp during generation
48 bits - computer identifier
14 bits - unique ID
6 bits are fixed
You would have to run the GUID generation on the same machine about 50 times in the exact same instant in time in order to have a 50% chance of collision. Note that instant is measured down to nanoseconds.
Update:
As per your comment "putting GUIDs into a hashtable"... the GetHashCode() method is what is causing the collision, not the GUIDs:
public override int GetHashCode()
{
return ((this._a ^ ((this._b << 0x10) | ((ushort) this._c))) ^ ((this._f << 0x18) | this._k));
}
You can see it returns an int, so if you have more than 2^32 "GUIDs" in the hashtable, you are 100% going to have a collision.
As per your Update 2, you are correct that GUIDs are predictable; even MSDN mentions that. Here is a method that uses a cryptographically strong random number generator to create the ID.
static long counter; //store and load the counter from persistent storage every time the program loads or closes.
public static string CreateRandomString(int length)
{
long count = System.Threading.Interlocked.Increment(ref counter);
int PasswordLength = length;
String _allowedChars = "abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNOPQRSTUVWXYZ23456789";
Byte[] randomBytes = new Byte[PasswordLength];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);
char[] chars = new char[PasswordLength];
int allowedCharCount = _allowedChars.Length;
for (int i = 0; i < PasswordLength; i++)
{
while (randomBytes[i] >= 256 - (256 % allowedCharCount)) // reject bytes in the uneven tail to avoid modulo bias
{
byte[] tmp = new byte[1];
rng.GetBytes(tmp);
randomBytes[i] = tmp[0];
}
chars[i] = _allowedChars[(int)randomBytes[i] % allowedCharCount];
}
byte[] buf = new byte[8];
buf[0] = (byte) count;
buf[1] = (byte) (count >> 8);
buf[2] = (byte) (count >> 16);
buf[3] = (byte) (count >> 24);
buf[4] = (byte) (count >> 32);
buf[5] = (byte) (count >> 40);
buf[6] = (byte) (count >> 48);
buf[7] = (byte) (count >> 56);
return Convert.ToBase64String(buf) + new string(chars);
}
EDIT I know there is some biasing because 256 is not evenly divisible by allowedCharCount; you can get rid of the bias by throwing the value away and getting a new random number if it lands in the no-man's-land of the remainder.
EDIT2 - This is not guaranteed to be unique; you could hold a static 64-bit (or larger if necessary) monotonic counter, encode it to base64, and have that be the first 4-5 characters of the ID.
UPDATE - Now guaranteed to be unique
UPDATE 2: Algorithm is now slower but removed biasing.
EDIT: I just ran a test, and I wanted to let you know that ToBase64String can return non-alphanumeric characters (for example, 1 encodes to "AQAAAAAAAAA="), just so you are aware.
New Version:
Taking from Matt Dotson's answer on this page, if you are not so worried about the keyspace you can do it this way and it will run a LOT faster.
public static string CreateRandomString(int length)
{
length -= 12; //12 digits are the counter
if (length <= 0)
throw new ArgumentOutOfRangeException("length");
long count = System.Threading.Interlocked.Increment(ref counter);
Byte[] randomBytes = new Byte[length * 3 / 4];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);
byte[] buf = new byte[8];
buf[0] = (byte)count;
buf[1] = (byte)(count >> 8);
buf[2] = (byte)(count >> 16);
buf[3] = (byte)(count >> 24);
buf[4] = (byte)(count >> 32);
buf[5] = (byte)(count >> 40);
buf[6] = (byte)(count >> 48);
buf[7] = (byte)(count >> 56);
return Convert.ToBase64String(buf) + Convert.ToBase64String(randomBytes);
}
StringBuilder sb = new StringBuilder();
for (int i = 0; i < HOW_MUCH_YOU_WANT / 32; i++)
sb.Append(Guid.NewGuid().ToString("N"));
return sb.ToString();
but what for?
The problem here is why, not how. A session ID bigger than a GUID is useless, because it's already big enough to thwart brute force attacks.
If you're concerned about predicting GUID's, don't be. Unlike the earlier, sequential GUID's, V4 GUID's are cryptographically secure, based on RC4. The only exploit I know about depends on having full access to the internal state of the process that's generating the values, so it can't get you anywhere if all you have is a partial sequence of GUID's.
If you're paranoid, generate a GUID, hash it with something like SHA-1, and use that value. However, this is a waste of time. If you're concerned about session hijacking, you should be looking at SSL, not this.
byte[] random = new Byte[384];
//RNGCryptoServiceProvider is an implementation of a random number generator.
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(random);
var sessionId = Convert.ToBase64String(random);
You can replace the "+", "/" and "=" from the base64 encoding with whatever special characters are acceptable to you.
Base64 encoding creates a string that is 4/3 the size of the byte array (hence the 384 bytes should give you 512 characters).
This should give you orders of magnitude more values than a base16 (hex) encoded GUID: 64^512 vs 16^32.
Also if you are putting these in sql server, make sure to turn OFF case insensitivity.
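Putting the pieces of this answer together, a sketch that produces a random ID of a requested length using the URL-safe base64 alphabet could look like the following. The names SessionIdSketch and Create are made up for illustration, and the choice of '-' and '_' as the two replacement characters is an assumption; swap in whatever special characters are acceptable to you.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper illustrating the base64 approach described above.
public static class SessionIdSketch
{
    public static string Create(int length)
    {
        if (length <= 0)
            throw new ArgumentOutOfRangeException(nameof(length));

        // n bytes give ceil(4n/3) base64 characters, so ceil(3 * length / 4)
        // bytes are enough for the requested number of characters.
        int byteCount = (3 * length + 3) / 4;
        byte[] bytes = new byte[byteCount];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }

        string id = Convert.ToBase64String(bytes)
            .Replace('+', '-')   // assumed replacements, see the note above
            .Replace('/', '_')
            .TrimEnd('=');

        return id.Substring(0, length);
    }
}
```

For example, Create(512) draws 384 random bytes and returns exactly 512 characters.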
There are two really easy ways (C#):
1) Generate a bunch of Guids using Guid.NewGuid().ToString("N"). each GUID will be 32 characters long, so just generate 8 of them and concatenate them to get 256 chars.
2) Create a constant string (const string sChars = "abcdef") of acceptable characters you'd like in your UID. Then in a loop, randomly pick characters from that string by randomly generating a number from 0 to the length of the string of acceptable characters (sChars), and concatenate them in a new string (use stringbuilder to make it more performant, but string will work too).
You may want to check out boost's Uuid Library. It supports a variety of generators, including a random generator that might suit your needs.
I would use some kind of hash of std::time() probably sha512.
ex (using crypto++ for the sha hash + base64 encoding).
#include <iostream>
#include <sstream>
#include <ctime>
#include <crypto++/sha.h>
#include <crypto++/base64.h>
int main() {
std::string digest;
std::stringstream ss("");
ss << std::time(NULL);
// borrowed from http://www.cryptopp.com/fom-serve/cache/50.html
CryptoPP::SHA512 hash;
CryptoPP::StringSource foo(ss.str(), true,
new CryptoPP::HashFilter(hash,
new CryptoPP::Base64Encoder(
new CryptoPP::StringSink(digest))));
std::cout << digest << std::endl;
return 0;
}
https://github.com/bigfatsea/SUID Simple Unique Identifier
Although it's in Java, it can be easily ported to any other language. You may expect duplicate IDs on the same instance after 136 years, which is good enough for small to medium projects.
Example:
long id = SUID.id().get();

YouTube-like GUID

Is it possible to generate short GUID like in YouTube (N7Et6c9nL9w)?
How can it be done? I want to use it in web app.
You could use Base64:
string base64Guid = Convert.ToBase64String(Guid.NewGuid().ToByteArray());
That generates a string like E1HKfn68Pkms5zsZsvKONw==. Since a GUID is always 128 bits, you can omit the == that you know will always be present at the end and that will give you a 22 character string. This isn't as short as YouTube though.
URL Friendly Solution
As mentioned in the accepted answer, base64 is a good solution but it can cause issues if you want to use the GUID in a URL. This is because + and / are valid base64 characters, but have special meaning in URLs.
Luckily, there are unused characters in base64 that are URL friendly. Here is a more complete answer:
public string ToShortString(Guid guid)
{
var base64Guid = Convert.ToBase64String(guid.ToByteArray());
// Replace URL unfriendly characters
base64Guid = base64Guid.Replace('+', '-').Replace('/', '_');
// Remove the trailing ==
return base64Guid.Substring(0, base64Guid.Length - 2);
}
public Guid FromShortString(string str)
{
str = str.Replace('_', '/').Replace('-', '+');
var byteArray = Convert.FromBase64String(str + "==");
return new Guid(byteArray);
}
Usage:
Guid guid = Guid.NewGuid();
string shortStr = ToShortString(guid);
// shortStr will look something like 2LP8GcHr-EC4D__QTizUWw
Guid guid2 = FromShortString(shortStr);
Assert.AreEqual(guid, guid2);
EDIT:
Can we do better? (Theoretical limit)
The above yields a 22 character, URL friendly GUID.
This is because a GUID uses 128 bits, so representing it in base64 requires $128 / \log_2 64 = 128 / 6 \approx 21.33$ characters, which rounds up to 22.
There are actually 66 URL friendly characters (we aren't using . and ~). So theoretically, we could use base66 to get $128 / \log_2 66 \approx 21.18$ characters, which also rounds up to 22.
So this is optimal for a full, valid GUID.
However, a GUID uses 6 bits to indicate the version and variant, which in our case are constant. So we technically only need 122 bits, which in both bases rounds up to 21 ($122 / \log_2 64 \approx 20.33$). So with more manipulation, we could remove another character. This requires wrangling the bits out, however, so I leave this as an exercise to the reader.
How does youtube do it?
YouTube IDs use 11 characters. How do they do it?
A GUID uses 122 bits, which guarantees collisions are virtually impossible. This means you can generate a random GUID and be certain it is unique without checking. However, we don't need so many bits for just a regular ID.
We could use a smaller ID. If we use 66 bits or less, we have a higher risk of collision, but can represent this ID with 11 characters (even in base64). One could either accept the risk of collision, or test for a collision and regenerate.
With 122 bits (regular GUID), you would have to generate ~$3.3 \times 10^{17}$ GUIDs to have a 1% chance of collision.
With 66 bits, you would have to generate ~$1.2 \times 10^{9}$, or about 1 billion, IDs to have a 1% chance of collision. That is not that many IDs.
My guess is youtube uses 64 bits (which is more memory friendly than 66 bits), and checks for collisions to regenerate the ID if necessary.
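The 1% figures can be estimated with the usual birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N = 2^bits is the size of the ID space and p is the target collision probability. The sketch below just evaluates that formula; the class and method names are made up for illustration.

```csharp
using System;

// Hypothetical helper that evaluates the birthday approximation.
public static class CollisionEstimate
{
    // Approximate number of random IDs that can be generated before the
    // chance of at least one collision reaches the given probability.
    public static double IdsForCollisionProbability(int bits, double probability)
    {
        double space = Math.Pow(2, bits);
        return Math.Sqrt(2 * space * Math.Log(1 / (1 - probability)));
    }
}
```

For 66 bits and p = 0.01 this comes out at roughly 1.2 billion, in line with the "1 billion" figure above; for 122 bits it is on the order of 10^17.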
If you want to abandon GUIDs in favor of smaller IDs, here is code for that:
class IdFactory
{
private Random random = new Random();
public int CharacterCount { get; }
public IdFactory(int characterCount)
{
CharacterCount = characterCount;
}
public string Generate()
{
// bitCount = characterCount * log (targetBase) / log(2)
var bitCount = 6 * CharacterCount;
var byteCount = (int)Math.Ceiling(bitCount / 8f);
byte[] buffer = new byte[byteCount];
random.NextBytes(buffer);
string guid = Convert.ToBase64String(buffer);
// Replace URL unfriendly characters
guid = guid.Replace('+', '-').Replace('/', '_');
// Trim characters to fit the count
return guid.Substring(0, CharacterCount);
}
}
Usage:
var factory = new IdFactory(characterCount: 11);
string guid = factory.Generate();
// guid will look like Mh3darwiZhp
This uses 64 characters which is not optimal, but requires much less code (since we can reuse Convert.ToBase64String).
You should be a lot more careful of collisions if you use this.
9 chars is not a GUID. Given that, you could use the hexadecimal representation of an int, which gives you an 8-character string.
You can use an id you might already have. Also you can use .GetHashCode against different simple types and there you have a different int. You can also xor different fields. And if you are into it, you might even use a Random number - hey, you have well above 2.000.000.000+ possible values if you stick to the positives ;)
It's not a GUID but rather an auto-incremented unique alphanumeric string
Please see the following code, where I am trying to do the same. It uses the total milliseconds since the Unix epoch and a valid set of characters to generate a unique string that is incremented with each passing millisecond.
The other way is to use numeric counters, but that is expensive to maintain and creates a series where you can add or subtract values to guess the previous or next unique string in the system, and we don't want that to happen.
Do remember:
This will not be globally unique, only unique to the instance where it's defined
It uses Thread.Sleep() to handle the multithreading issue
public string YoutubeLikeId()
{
Thread.Sleep(1);//make everything unique while looping
long ticks = (long)(DateTime.UtcNow
.Subtract(new DateTime(1970, 1, 1,0,0,0,0))).TotalMilliseconds;//EPOCH
char[] baseChars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
.ToCharArray();
int i = 32;
char[] buffer = new char[i];
int targetBase= baseChars.Length;
do{
buffer[--i] = baseChars[ticks % targetBase];
ticks = ticks / targetBase;
}
while (ticks > 0);
char[] result = new char[32 - i];
Array.Copy(buffer, i, result, 0, 32 - i);
return new string(result);
}
The output will come something like
XOTgBsu
XOTgBtB
XOTgBtR
XOTgBtg
XOTgBtw
XOTgBuE
Update: The same can be achieved from Guid as
var guid = Guid.NewGuid();
guid.ToString("N");
guid.ToString("N").Substring(0,8);
guid.ToString("N").Substring(8,4);
guid.ToString("N").Substring(12,4);
guid.ToString("N").Substring(16,4);
guid.ToString("N").Substring(20,12);
For a Guid ecd65132-ab5a-4587-87b8-b875e2fe0f35 it will break it down in chunks as ecd65132 ,ab5a , 4587,87b8,b875e2fe0f35
but it's not guaranteed to always be unique.
Update 2: There is also a project called ShortGuid to get a url friendly GUID it can be converted from/to a regular Guid
When I went under the hood I found it works by encoding the Guid to Base64 as the code below:
public static string Encode(Guid guid)
{
string encoded = Convert.ToBase64String(guid.ToByteArray());
encoded = encoded
.Replace("/", "_")
.Replace("+", "-");
return encoded.Substring(0, 22);
}
The good thing about it it can be decoded again to get the Guid back with
public static Guid Decode(string value)
{
// avoid parsing larger strings/blobs
if (value.Length != 22)
{
throw new ArgumentException("A ShortGuid must be exactly 22 characters long. Receive a character string.");
}
string base64 = value
.Replace("_", "/")
.Replace("-", "+") + "==";
byte[] blob = Convert.FromBase64String(base64);
var guid = new Guid(blob);
var sanityCheck = Encode(guid);
if (sanityCheck != value)
{
throw new FormatException(
$"Invalid strict ShortGuid encoded string. The string '{value}' is valid URL-safe Base64, " +
$"but failed a round-trip test expecting '{sanityCheck}'."
);
}
return guid;
}
So a Guid 4039124b-6153-4721-84dc-f56f5b057ac2 will be encoded as SxI5QFNhIUeE3PVvWwV6wg and the Output will look something like.
ANf-MxRHHky2TptaXBxcwA
zpjp-stmVE6ZCbOjbeyzew
jk7P-XYFokmqgGguk_530A
81t6YZtkikGfLglibYkDhQ
qiM2GmqCK0e8wQvOSn-zLA
As others have mentioned, YouTube's VideoId is not technically a GUID since it's not inherently unique.
As per Wikipedia:
The total number of unique keys is 2^128 or 3.4×10^38. This number is so
large that the probability of the same number being generated randomly
twice is negligible.
The uniqueness of YouTube's VideoId is maintained by their generator algorithm.
You can either write your own algorithm, or you can use some sort of random string generator and utilize a UNIQUE constraint in SQL to enforce its uniqueness.
First, create a UNIQUE CONSTRAINT in your database:
ALTER TABLE MyTable
ADD CONSTRAINT UniqueUrlId
UNIQUE (UrlId);
Then, for example, generate a random string (from philipproplesch's answer):
string shortUrl = System.Web.Security.Membership.GeneratePassword(11, 0);
If the generated UrlId is sufficiently random and sufficiently long you should rarely encounter the exception that is thrown when SQL encounters a duplicate UrlId. In such an event, you can easily handle the exception in your web app.
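The generate, insert, retry-on-duplicate flow can be sketched without a database as below. This is only an illustration: the in-memory HashSet stands in for the table with the UNIQUE constraint, and the class name UniqueIdStore, its constructor, and the maxAttempts parameter are all made up. In a real application the Add call would be the INSERT, and a failed Add would correspond to catching the duplicate-key exception.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for "insert into a table with a UNIQUE constraint".
public class UniqueIdStore
{
    private readonly HashSet<string> _issued = new HashSet<string>();
    private readonly Func<string> _generate;

    // generate is any random string generator, for example one of those shown above.
    public UniqueIdStore(Func<string> generate)
    {
        _generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public string Next(int maxAttempts = 10)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            string candidate = _generate();

            // HashSet.Add returns false for a duplicate, which plays the role
            // of the database rejecting the insert.
            if (_issued.Add(candidate))
                return candidate;
        }
        throw new InvalidOperationException("Could not generate a unique ID.");
    }
}
```

Capping the number of attempts keeps a badly chosen generator (too short, or not random enough) from looping forever.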
Technically it's not a Guid. Youtube has a simple randomized string generator that you can probably whip up in a few minutes using an array of allowed characters and a random number generator.
It might be not the best solution, but you can do something like that:
string shortUrl = System.Web.Security.Membership.GeneratePassword(11, 0);
This id is probably not globally unique. GUID's should be globally unique as they include elements which should not occur elsewhere (the MAC address of the machine generating the ID, the time the ID was generated, etc.)
If what you need is an ID that is unique within your application, use a number fountain - perhaps encoding the value as a hexadecimal number. Every time you need an id, grab it from the number fountain.
If you have multiple servers allocating id's, you could grab a range of numbers (a few tens or thousands depending on how quickly you're allocating ids) and that should do the job. an 8 digit hex number will give you 4 billion ids - but your first id's will be a lot shorter.
Maybe using NanoId will save you from a lot of headaches:
https://github.com/codeyu/nanoid-net
You can do something like:
var id = Nanoid.Generate("1234567890abcdef", 10); //=> "4f90d13a42"
And you can check the collision probability here:
https://alex7kom.github.io/nano-nanoid-cc/
