For an application I am working on, I need to generate a session token which must have the following properties:
it should be unique for the application at least, globally unique would be even better
it should be unpredictable
it should not be too long, as it will have to be passed around in a http request header, so smaller is better
I was thinking about adapting a Guid structure with a cryptographic random number, but that may be overkill. Does anyone know of, or has anyone created, a data structure that would fit all those properties?
Let me be very clear on this point. All of the answers saying to use a GUID are deeply wrong and you should not under any circumstances do so.
There is nothing whatsoever in the specification for GUIDs that requires that they be unpredictable by attackers. GUIDs are allowed to be sequential if allocated in blocks, GUIDs are allowed to be created with non-crypto-strength randomness, and GUIDs are allowed to be created from known facts about the world, like your MAC address and the current time. Nothing whatsoever requires that GUIDs be unpredictable.
If you need an unpredictable session key then generate one using a crypto strength random number generator to make sufficiently many bits.
More generally: use the right tool for the job particularly when it comes to security-impacting code. GUIDs were designed to make sure that two companies did not accidentally give the same identifier to two different interfaces. If that's your application, use a GUID for it. GUIDs were invented to prevent accidents by benign entities, not to protect innocent users against determined attackers. GUIDs are like stop signs -- there to prevent accidental collisions, not to protect against attacking tanks.
GUIDs were not designed to solve any problem other than accident prevention, so do not under any circumstances use them to solve crypto problems. Use a crypto library specifically designed to solve your problem, and implemented by world-class experts.
You could use the cryptographic RandomNumberGenerator and get however many bytes you want to generate an identifier of suitable length.
RandomNumberGenerator rng = RandomNumberGenerator.Create();
byte[] bytes = new byte[8];
rng.GetBytes(bytes);
// produces a string like "4jpKc52ArXU="
var s = Convert.ToBase64String(bytes);
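A slightly fuller sketch of the same idea, for a token that travels in an HTTP header. The class name `SessionTokens`, the 16-byte default and the URL-safe character substitution are assumptions made for illustration and are not part of the answer above; pick the byte count to suit your own threat model.

```csharp
using System;
using System.Security.Cryptography;

public static class SessionTokens
{
    // Returns a token built from byteCount bytes of cryptographic randomness,
    // encoded as base64 with the padding removed and '+' and '/' replaced
    // so that the result is safe in URLs and headers.
    // The 16-byte default (128 bits) is an assumption, not a requirement.
    public static string Create(int byteCount = 16)
    {
        byte[] bytes = new byte[byteCount];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return Convert.ToBase64String(bytes)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }
}
```

With the default of 16 bytes, `SessionTokens.Create()` returns a 22-character string.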
How about using what ASP.NET uses to generate unique, unpredictable and secure session ids:
using System;
using System.IO;
using System.Web;
using System.Web.SessionState;
class Program
{
static void Main()
{
var request = new HttpRequest("", "http://localhost", "");
var response = new HttpResponse(TextWriter.Null);
var context = new HttpContext(request, response);
var id = new SessionIDManager().CreateSessionID(context);
Console.WriteLine(id);
}
}
If you really and truly must have unpredictability, then generate a GUID and use a 128-bit bit mixer to rearrange the bits. Probably best to use a 64-bit bit mixer to mix the high and low portions. And, yes, there are bit mixers that guarantee unique output for every unique input.
Note that this isn't completely unpredictable. If somebody watching knows that the value is an obfuscated GUID, then he can probably examine successive values and, with some effort, potentially reverse-engineer your bit mixer.
Is there a way/algorithm/method to generate a new Guid (x) using our old GUid (y) and then get y back whenever we want from x?
Something similar to the answer below, except that it only shows how to convert the old Guid (which I can treat as a string) into a new Guid, not how to get back.
https://stackoverflow.com/a/9386095/5887074
How can I generate a GUID for a string?
P.S.: This is not related to anything security. The two Guids will just be used to find records from the table. We can convert Guid to string in this conversion if required.
There are thousands of ways: a guid is 128 bits, so you could flip one bit which would make it simple to translate back and forth. Or you could do modulo 42 and make it look as if you made something unpredictable. Or you could reverse the order of the bits, do a NOT operation on all of them or rearrange the bits by some predefined pattern.
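As one illustration of the reversible transformations listed above, here is a minimal sketch that XORs the 16 bytes of a Guid with a fixed mask. The class name `GuidMapping` and the mask values are hypothetical. Because XOR is its own inverse, the same method maps x back to y. This is obfuscation only and offers no security.

```csharp
using System;

public static class GuidMapping
{
    // Hypothetical fixed mask; any 16 bytes that are not all zero will do.
    private static readonly byte[] Mask =
    {
        0x5A, 0xC3, 0x19, 0x7E, 0x84, 0x2D, 0xF0, 0x6B,
        0x31, 0x9C, 0xE7, 0x48, 0xB5, 0x02, 0xDA, 0x73
    };

    // XOR every byte with the mask. Applying the method twice
    // returns the original Guid, so it serves as both directions.
    public static Guid Transform(Guid input)
    {
        byte[] bytes = input.ToByteArray();
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] ^= Mask[i];
        }
        return new Guid(bytes);
    }
}
```

Usage: `Guid x = GuidMapping.Transform(y);` and later `Guid y2 = GuidMapping.Transform(x);` gives back `y`.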
But I suspect that you have a use case which you have not described. Please tell us a bit more about the problem you want to solve. Your request sounds a little bit dangerous, as it sounds as if you want to enable some kind of tracking between seemingly unrelated entities. If there are security issues involved, you are very likely to get it wrong if both the cleartext (guid before translation) and the ciphertext (guid after translation) are public. Perhaps simple AES encryption would suffice as a translation function, but I think you need to specify your problem in much more detail to get a useful answer.
I'm looking at using a Guid as a random anonymous visitor identifier for a website (stored both as a cookie client-side, and in a db server-side), and I wanted a cryptographically strong way of generating Guids (so as to minimize the chance of collisions).
For the record, there are 16 bytes (or 128 bits) in a Guid.
This is what I have in mind:
/// <summary>
/// Generate a cryptographically strong Guid
/// </summary>
/// <returns>a random Guid</returns>
private Guid GenerateNewGuid()
{
byte[] guidBytes = new byte[16]; // Guids are 16 bytes long
RNGCryptoServiceProvider random = new RNGCryptoServiceProvider();
random.GetBytes(guidBytes);
return new Guid(guidBytes);
}
Is there a better way to do this?
Edit:
This will be used for two purposes, a unique Id for a visitor, and a transaction Id for purchases (which will briefly be the token needed for viewing/updating sensitive information).
In answer to the OP's actual question whether this is cryptographically strong, the answer is yes since it is created directly from RNGCryptoServiceProvider. However the currently accepted answer provides a solution that is most definitely not cryptographically secure as per this SO answer:
Is Microsoft's GUID generator cryptographically secure.
Whether this is the correct approach architecturally due to theoretical lack of uniqueness (easily checked with a db lookup) is another concern.
So, what you're building is not technically a GUID. A GUID is a Globally Unique Identifier. You're building a random string of 128 bits. I suggest, like the previous answerer, that you use the built-in GUID generation methods. This method has a (albeit tremendously small) chance of generating duplicate GUID's.
There are a few advantages to using the built-in functionality, including cross-machine uniqueness (partially due to the MAC address being referenced in some versions of the guid, see here: http://en.wikipedia.org/wiki/Globally_Unique_Identifier).
Regardless of whether you use the built in methods, I suggest that you not expose the Purchase GUID to the customer. The standard method used by Microsoft code is to expose a Session GUID that identifies the customer and expires comparatively quickly. Cookies track customer username and saved passwords for session creation. Thus your 'short term purchase ID' is never actually passed to (or, more importantly, received from) the client and there is a more durable wall between your customers' personal information and the Interwebs at large.
Collisions are practically impossible (it's not Globally Unique for nothing), but predictability is a whole other question. As Christopher Stevenson correctly points out, given a few previously generated GUIDs it actually becomes possible to start predicting a pattern within a much smaller keyspace than you'd think. GUIDs are designed for uniqueness, not unpredictability. Most algorithms take unpredictability into account, but you should never count on it, especially not for a transaction Id for purchases, however briefly. You're creating an open door for brute-force session hijacking attacks.
To create a proper unique ID, take some random stuff from your system, append some visitor-specific information, and append a string only you know on the server, and then put a good hash algorithm over the whole thing. Hashes are meant to be unpredictable and irreversible, unlike GUIDs.
To simplify: if uniqueness is all you care about, why not just give all your visitors sequential integers, from 1 to infinity? They are guaranteed to be unique, just terribly predictable: when you have just purchased item 684, you can start hacking away at 685 until it appears.
To avoid collisions:
If you can't keep a global count, then use Guid.NewGuid().
Otherwise, increment some integer and use 1, 2, 3, 4...
"But isn't that ridiculously easy to guess?"
Yes, but accidental and deliberate collisions are different problems with different solutions, best solved separately, not least because predictability helps prevent accidental collision while simultaneously making deliberate collision easier.
If you can increment globally, then number 2 guarantees no collisions. UUIDs were invented as a means to approximate that without the ability to globally track.
Let's say we use incrementing integers. Let's say the ID we have in a given case is 123.
We can then do something like:
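private static string GetProtectedID(int id)
{
    // Hash the id together with a secret seed known only to the server.
    string hashString = id.ToString() + "this is my secret seed kjٵتשڪᴻᴌḶḇᶄ™∞ﮟﻑfasdfj90213";
    using(var sha = System.Security.Cryptography.SHA1.Create())
    {
        return string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(hashString)).Select(b => b.ToString("X2"))) + id.ToString();
    }
}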
Which produces 09C495910319E4BED2A64EA16149521C51791D8E123. To decode it back to the id we do:
private static int GetIDFromProtectedID(string str)
{
    int chkID;
    if(int.TryParse(str.Substring(40), out chkID))
    {
        // Recompute the hash from the claimed id and the same secret seed.
        string hashString = chkID.ToString() + "this is my secret seed kjٵتשڪᴻᴌḶḇᶄ™∞ﮟﻑfasdfj90213";
        using(var sha = System.Security.Cryptography.SHA1.Create())
        {
            // The first 40 characters are the hex-encoded SHA-1 hash.
            if(string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(hashString)).Select(b => b.ToString("X2"))) == str.Substring(0, 40))
                return chkID;
        }
    }
    return 0;//or perhaps raise an exception here.
}
Even if someone guessed from that they were given number 123, it wouldn't let them deduce that the id for 122 was B96594E536C9F10ED964EEB4E3D407F183FDA043122.
Alternatively, the two could be given as separate tokens, and so on.
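The hash-plus-id scheme above could also be built on a keyed hash. The sketch below swaps in HMAC-SHA256, which is not what the answer uses. The class name `ProtectedId` and the key string are hypothetical; a real key would be long, random and stored outside the source code. The string comparison is not constant-time, which a production version would probably want to address.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class ProtectedId
{
    // Hypothetical key, for illustration only.
    private static readonly byte[] Key =
        Encoding.UTF8.GetBytes("replace with a long random secret");

    // Hex-encoded HMAC-SHA256 of the id's decimal text (64 characters).
    private static string Tag(int id)
    {
        using (var hmac = new HMACSHA256(Key))
        {
            byte[] text = Encoding.UTF8.GetBytes(id.ToString(CultureInfo.InvariantCulture));
            return BitConverter.ToString(hmac.ComputeHash(text)).Replace("-", "");
        }
    }

    // Token layout: 64 hex characters of tag followed by the id itself.
    public static string Protect(int id)
    {
        return Tag(id) + id.ToString(CultureInfo.InvariantCulture);
    }

    // Returns true and the id only if the tag matches the claimed id.
    public static bool TryUnprotect(string token, out int id)
    {
        id = 0;
        if (token == null || token.Length <= 64) return false;

        int candidate;
        if (!int.TryParse(token.Substring(64), NumberStyles.AllowLeadingSign,
                          CultureInfo.InvariantCulture, out candidate))
        {
            return false;
        }
        if (!string.Equals(Tag(candidate), token.Substring(0, 64), StringComparison.Ordinal))
        {
            return false;
        }
        id = candidate;
        return true;
    }
}
```

Usage: `ProtectedId.Protect(123)` yields a 67-character token, and `ProtectedId.TryUnprotect(token, out id)` recovers 123 only when the tag was produced with the same key.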
I generally just use Guid.NewGuid();
http://msdn.microsoft.com/en-us/library/system.guid.newguid(v=vs.110).aspx
I'm looking to generate unique ids for identifying some data in my system. I'm using an elaborate system which concatenates some (non unique, relevant) meta-data with System.Guid.NewGuid()s. Are there any drawbacks to this approach, or am I in the clear?
I'm looking to generate unique ids for identifying some data in my system.
I'd recommend a GUID then, since they are by definition globally unique identifiers.
I'm using an elaborate system which concatenates some (non unique, relevant) meta-data with System.Guid.NewGuid(). Are there any drawbacks to this approach, or am I in the clear?
Well, since we do not know what you would consider a drawback, it is hard to say. A number of possible drawbacks come to mind:
GUIDs are big: 128 bits is a lot of bits.
GUIDs are not guaranteed to have any particular distribution; it is perfectly legal for GUIDs to be generated sequentially, and it is perfectly legal for them to be distributed uniformly over their 122-bit space (128 bits minus the four version bits and the two variant bits). This can have serious impacts on database performance if the GUID is being used as a primary key on a database that is indexed into sorted order by the GUID; insertions are much more efficient if the new row always goes at the end. A uniformly distributed GUID will almost never be at the end.
Version 4 GUIDs are not necessarily cryptographically random; if GUIDs are generated by a non-crypto-random generator, an attacker could in theory predict what your GUIDs are when given a representative sample of them. An attacker could in theory determine the probability that two GUIDs were generated in the same session. Version one GUIDs are of course barely random at all, and can tell the sophisticated reader when and where they were generated.
And so on.
I am planning a series of articles about these and other characteristics of GUIDs in the next couple of weeks; watch my blog for details.
UPDATE: https://ericlippert.com/2012/04/24/guid-guide-part-one/
When you use System.Guid.NewGuid(), you may still want to check that the guid doesn't already exist in your system.
While a guid is so complex as to be virtually unique, there is nothing to guarantee that it doesn't already exist except probability. It's just incredibly statistically unlikely, to the point that in almost any case it's the same as being unique.
Generating two identical guids is like winning the lottery twice - there's nothing to actually prevent it, it's just so unlikely it might as well be impossible.
Most of the time you could probably get away with not checking for existing matches, but in a very extreme case with lots of generation going on, or where the system absolutely must not fail, it could be worth checking.
EDIT
Let me clarify a little more. It is highly, highly unlikely that you would ever see a duplicate guid. That's the point. It's "globally unique", meaning there's such an infinitesimally small chance of a duplicate that you can assume it will be unique. However, if we are talking about code that keeps an aircraft in the sky, monitors a nuclear reactor, or handles life support on the International Space Station, I, personally, would still check for a duplicate, just because it would really be terrible to hit that edge case. If you're just writing a blog engine, on the other hand, go ahead, use it without checking.
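To put a rough number on "highly unlikely", the birthday approximation p ≈ n² / (2 · 2^bits) can be evaluated directly. The sketch below assumes version 4 GUIDs with 122 uniformly random bits, which `Guid.NewGuid()` is not guaranteed to provide; the class name `GuidCollisionEstimate` is made up for the example.

```csharp
using System;

public static class GuidCollisionEstimate
{
    // Birthday approximation of the chance that at least two of `count`
    // uniformly random values collide. Only meaningful while the result is small.
    // randomBits = 122 assumes a version 4 GUID (an assumption, see above).
    public static double Probability(double count, int randomBits = 122)
    {
        return count * count / (2.0 * Math.Pow(2.0, randomBits));
    }
}
```

Under those assumptions, a billion GUIDs (`Probability(1e9)`) gives a collision probability on the order of 1e-19.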
Feel free to use NewGuid(). There is no problem with its uniqueness.
The probability that it will generate the same guid twice is extremely low; a nice example can be found here: Simple proof that GUID is not unique
var bigHeapOGuids = new Dictionary<Guid, Guid>();
try
{
do
{
Guid guid = Guid.NewGuid();
bigHeapOGuids.Add(guid, guid);
} while (true);
}
catch (OutOfMemoryException)
{
}
At some point it just crashed on OutOfMemory and not on duplicated key conflict.
In our application we are creating Xml files with an attribute that has a Guid value. This value needed to be consistent between file upgrades. So even if everything else in the file changes, the guid value for the attribute should remain the same.
One obvious solution was to create a static dictionary with the filenames and the Guids to be used for them. Then whenever we generate the file, we look up the dictionary for the filename and use the corresponding guid. But this is not feasible because we might scale to hundreds of files and didn't want to maintain a big list of guids.
So another approach was to make the Guid the same based on the path of the file. Since our file paths and application directory structure are unique, the Guid should be unique for that path. So each time we run an upgrade, the file gets the same guid based on its path. I found one cool way to generate such 'Deterministic Guids' (Thanks Elton Stoneman). It basically does this:
private Guid GetDeterministicGuid(string input)
{
    // use an MD5 hash to get a 16-byte hash of the string:
    using (MD5 provider = MD5.Create())
    {
        // Encoding.UTF8 instead of Encoding.Default: the default code page can
        // differ between machines, which would change the Guid for non-ASCII paths.
        byte[] inputBytes = Encoding.UTF8.GetBytes(input);
        byte[] hashBytes = provider.ComputeHash(inputBytes);
        // generate a guid from the hash:
        return new Guid(hashBytes);
    }
}
So given a string, the Guid will always be the same.
Are there any other approaches or recommended ways to doing this? What are the pros or cons of that method?
As mentioned by @bacar, RFC 4122 §4.3 defines a way to create a name-based UUID. The advantage of doing this (over just using an MD5 hash) is that these are guaranteed not to collide with non-name-based UUIDs, and have a very (very) small possibility of collision with other name-based UUIDs.
There's no native support in the .NET Framework for creating these, but I posted code on GitHub that implements the algorithm. It can be used as follows:
Guid guid = GuidUtility.Create(GuidUtility.UrlNamespace, filePath);
To reduce the risk of collisions with other GUIDs even further, you could create a private GUID to use as the namespace ID (instead of using the URL namespace ID defined in the RFC).
This will convert any string into a Guid without having to import an outside assembly.
public static Guid ToGuid(string src)
{
byte[] stringbytes = Encoding.UTF8.GetBytes(src);
byte[] hashedBytes = new System.Security.Cryptography
.SHA1CryptoServiceProvider()
.ComputeHash(stringbytes);
Array.Resize(ref hashedBytes, 16);
return new Guid(hashedBytes);
}
There are much better ways to generate a unique Guid, but this is a way to consistently upgrade a string data key to a Guid data key.
As Rob mentions, your method doesn't generate a UUID, it generates a hash that looks like a UUID.
RFC 4122 on UUIDs specifically allows for deterministic (name-based) UUIDs: versions 3 and 5 use MD5 and SHA-1 respectively. Most people are probably familiar with version 4, which is random. Wikipedia gives a good overview of the versions. (Note that the use of the word 'version' here seems to describe a 'type' of UUID - version 5 doesn't supersede version 4).
There seem to be a few libraries out there for generating version 3/5 UUIDs, including the python uuid module, boost.uuid (C++) and OSSP UUID. (I haven't looked for any .net ones)
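For .NET, a version 5 (SHA-1, name-based) UUID can be sketched in a few lines. This is an illustration of the RFC 4122 §4.3 steps rather than a vetted library; the class and method names are made up for the example. The byte swapping is needed because `Guid.ToByteArray()` stores the first three fields in little-endian order, while the RFC hashes them in network order.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class NameBasedGuid
{
    // The DNS namespace ID listed in RFC 4122, Appendix C.
    public static readonly Guid DnsNamespace =
        new Guid("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    public static Guid CreateV5(Guid namespaceId, string name)
    {
        // Namespace bytes in network order, followed by the UTF-8 name.
        byte[] nsBytes = namespaceId.ToByteArray();
        SwapByteOrder(nsBytes);
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        byte[] data = new byte[nsBytes.Length + nameBytes.Length];
        Buffer.BlockCopy(nsBytes, 0, data, 0, nsBytes.Length);
        Buffer.BlockCopy(nameBytes, 0, data, nsBytes.Length, nameBytes.Length);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
        {
            hash = sha1.ComputeHash(data);
        }

        // Keep the first 16 bytes, then stamp the version (5) and variant bits.
        byte[] result = new byte[16];
        Array.Copy(hash, result, 16);
        result[6] = (byte)((result[6] & 0x0F) | 0x50);
        result[8] = (byte)((result[8] & 0x3F) | 0x80);

        // Back to the layout the Guid constructor expects.
        SwapByteOrder(result);
        return new Guid(result);
    }

    // Converts between Guid's mixed-endian layout and network byte order.
    private static void SwapByteOrder(byte[] g)
    {
        Swap(g, 0, 3);
        Swap(g, 1, 2);
        Swap(g, 4, 5);
        Swap(g, 6, 7);
    }

    private static void Swap(byte[] g, int a, int b)
    {
        byte t = g[a];
        g[a] = g[b];
        g[b] = t;
    }
}
```

As a sanity check, `NameBasedGuid.CreateV5(NameBasedGuid.DnsNamespace, "python.org")` should agree with the example in the Python `uuid` documentation, `886313e1-3b8a-5372-9b90-0c9aee199e5d`.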
You need to make a distinction between instances of the class Guid, and identifiers that are globally unique. A "deterministic guid" is actually a hash (as evidenced by your call to provider.ComputeHash). Hashes have a much higher chance of collisions (two different strings happening to produce the same hash) than Guid created via Guid.NewGuid.
So the problem with your approach is that you will have to be ok with the possibility that two different paths will produce the same GUID. If you need an identifier that's unique for any given path string, then the easiest thing to do is just use the string. If you need the string to be obscured from your users, encrypt it - you can use ROT13 or something more powerful...
Attempting to shoehorn something that isn't a pure GUID into the GUID datatype could lead to maintenance problems in future...
MD5 is weak, I believe you can do the same thing with SHA-1 and get better results.
BTW, just a personal opinion: dressing an MD5 hash up as a GUID does not make it a good GUID. GUIDs by their very nature are non-deterministic, so this feels like a cheat. Why not call a spade a spade and say it's a string-rendered hash of the input? You could do that by using this line rather than the new Guid line:
string stringHash = BitConverter.ToString(hashBytes);
Here's a very simple solution that should be good enough for things like unit/integration tests:
var rnd = new Random(1234); // Seeded random number (deterministic).
// The upper bound of Next is exclusive, so 256 is needed to cover the full byte range 00-ff.
Console.WriteLine($"{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}-{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}-{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}-{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}-{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}{rnd.Next(0, 256):x2}");
I have an object with the following properties
GID
ID
Code
Name
Some of the clients don't want to enter the Code, so the initial plan was to put the ID in the Code, but the base object of the ORM is different, so I'm stuck.
My plan now is to put totally random values of the form ####-#### in Code. How can I generate something like that, say in the style of a Windows 7 serial generator? Would that not add overhead? What would you do in this case?
Do you want a random value, or a unique value?
random != unique.
Remember, random merely states a probability of not generating the same value, or a probability of generating the same value again. As time increases, likelihood of generating a previous value increases - becoming a near certainty. Which do you require?
Personally, I recommend just using a Guid with some context [refer to easiest section below]. I also provided some other suggestions so you have options, depending on your situation.
easiest
If Code is an unbounded string [ie can be of any length], easiest semi-legible means of generating a unique code would be
OrmObject ormObject= new OrmObject ();
string code = string.
Format ("{0} [{1}]", ormObject.Name, Guid.NewGuid ()).
Trim ();
// generates something like
// "My Product [DA9190E1-7FC6-49d6-9EA5-589BBE6E005E]"
You can substitute any distinguishable string for ormObject.Name. I would typically use objectInstance.GetType ().Name, but that will only work if OrmObject is a base class; if it's a concrete class used for everything, they will all end up with similar tags. The point is to add some user context, such that - as in @Yuriy Faktorovich's referenced wtf article - users have something to read.
random
I responded a day or two ago about random number generation. Not so much generating numbers as building a simple flexible framework around a generator to improve quality of code and data, this should help streamline your source.
If you read that, you could easily write an extension method, say
public static class IRandomExtensions
{
public static CodeType GetCode (this IRandom random)
{
// 1. get as many random bytes as required
// 2. transform bytes into a 'Code'
// 3. bob's your uncle
...
}
}
// elsewhere in code
...
OrmObject ormObject = new OrmObject ();
ormObject.Code = random.GetCode ();
...
To actually generate a value, I would suggest implementing an IRandom interface with a System.Security.Cryptography.RNGCryptoServiceProvider implementation. Said implementation would generate a buffer of X random bytes, and dole out as many as required, regenerating a stream when exhausted.
Furthermore - I don't know why I keep writing, I guess this problem is really quite fascinating! - if CodeType is string and you want something readable, you could just take said random bytes and turn them into a "seemingly" readable string via Base64 conversion
public static class IRandomExtensions
{
// assuming 'CodeType' is in fact a string
public static string GetCode (this IRandom random)
{
// 1. get as many random bytes as required
byte[] randomBytes; // fill from random
// 2. transform bytes into a 'Code'
string randomBase64String =
System.Convert.ToBase64String (randomBytes).TrimEnd ('=');
// 3. bob's your uncle
...
}
}
Remember
random != unique.
Your values will repeat. Eventually.
unique
There are a number of questions you need to ask yourself about your problem.
Must all Code values be unique? [if not, you're trying too hard]
What Type is Code? [if any-length string, use a full Guid]
Is this a distributed application? [if not, use a DB value as suggested by @LBushkin above]
If it is a distributed application, can client applications generate and submit instances of these objects? [if so, then you want a globally unique identifier, and again Guids are a sure bet]
I'm sure you have more constraints, but this is an example of the kind of line of inquiry you need to perform when you encounter a problem like your own. From these questions, you will come up with a series of constraints. These constraints will inform your design.
Hope this helps :)
Btw, you will receive better quality solutions if you post more details [ie constraints] about your problem. Again, what Type is Code, are there length constraints? Format constraints? Character constraints?
Arg, last edit, I swear. If you do end up using Guids, you may wish to obfuscate this, or even "compress" their representation by encoding them in base64 - similar to base64 conversion above for random numbers.
public static class GuidExtensions
{
public static string ToBase64String (this Guid id)
{
return System.Convert.
ToBase64String (id.ToByteArray ()).
TrimEnd ('=');
}
}
Unlike truncating, base64 conversion is not a lossy transformation. Of course, the trim above is lossy in the context of full base64 expansion - but = is just padding, extra information introduced by the conversion, and not part of the original Guid data. If you want to go back to a Guid from this base64 converted value, then you will have to re-pad your base64 string until its length is a multiple of 4 - don't ask, just look up base64 if you are interested :)
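A minimal sketch of that round trip, including the re-padding step. The class name `GuidBase64` and its method names are made up for the example.

```csharp
using System;

public static class GuidBase64
{
    // 16 bytes encode to 24 base64 characters ending in "==";
    // dropping the padding leaves 22 characters.
    public static string Encode(Guid id)
    {
        return Convert.ToBase64String(id.ToByteArray()).TrimEnd('=');
    }

    // Re-pad with '=' until the length is a multiple of 4, then decode.
    public static Guid Decode(string text)
    {
        int padding = (4 - text.Length % 4) % 4;
        byte[] bytes = Convert.FromBase64String(text + new string('=', padding));
        return new Guid(bytes);
    }
}
```

Usage: `GuidBase64.Decode(GuidBase64.Encode(id))` returns the original `id`. Note that standard base64 can contain `+` and `/`, so the result is not URL-safe as written.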
You could generate a Guid using :
Guid.NewGuid().ToString();
It would give you something like :
788E94A0-C492-11DE-BFD4-FCE355D89593
Use an Autonumber column or Sequencer from your database to generate a unique code number. Almost all modern databases support automatically generated numbers in one form or another. Look into what your database supports.
Autonumber/Sequencer values from the DB are guaranteed to be unique and are relatively inexpensive to acquire. If you want to avoid completely sequential numbers assigned to codes, you can pad and concatenate several sequencer values together.