How to Create Deterministic Guids


In our application we are creating Xml files with an attribute that has a Guid value. This value needs to be consistent between file upgrades: even if everything else in the file changes, the Guid value for the attribute should remain the same.
One obvious solution was to create a static dictionary mapping each filename to the Guid to be used for it. Then whenever we generate the file, we look up the filename in the dictionary and use the corresponding Guid. But this is not feasible, because we might scale to hundreds of files and didn't want to maintain a big list of Guids.
So another approach was to derive the Guid from the path of the file. Since our file paths and application directory structure are unique, the Guid should be unique for that path, so each time we run an upgrade the file gets the same Guid. I found one cool way to generate such 'Deterministic Guids' (thanks, Elton Stoneman). It basically does this:
private Guid GetDeterministicGuid(string input)
{
    //use MD5 hash to get a 16-byte hash of the string
    //(needs System, System.Security.Cryptography and System.Text)
    using (MD5 provider = MD5.Create())
    {
        //UTF-8 yields the same bytes on every machine; Encoding.Default depends on the OS code page
        byte[] inputBytes = Encoding.UTF8.GetBytes(input);
        byte[] hashBytes = provider.ComputeHash(inputBytes);
        //generate a guid from the 16 hash bytes:
        return new Guid(hashBytes);
    }
}
So given a string, the Guid will always be the same.
Are there any other approaches or recommended ways of doing this? What are the pros and cons of that method?

As mentioned by @bacar, RFC 4122 §4.3 defines a way to create a name-based UUID. The advantage of doing this (over just using an MD5 hash) is that these are guaranteed not to collide with non-name-based UUIDs, and have a very (very) small possibility of collision with other name-based UUIDs.
There's no native support in the .NET Framework for creating these, but I posted code on GitHub that implements the algorithm. It can be used as follows:
Guid guid = GuidUtility.Create(GuidUtility.UrlNamespace, filePath);
To reduce the risk of collisions with other GUIDs even further, you could create a private GUID to use as the namespace ID (instead of using the URL namespace ID defined in the RFC).
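For readers who want to see what the RFC 4122 §4.3 algorithm involves, here is a minimal sketch of the version-5 (SHA-1) variant in C#. It is not the GitHub code mentioned in this answer; the class and method names (`NameBasedGuid`, `Create`) are hypothetical. The namespace IDs are the ones listed in RFC 4122 Appendix C. The byte swaps are there because `Guid.ToByteArray()` stores the first three fields little-endian, while the RFC hashes the namespace ID in network (big-endian) order.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, a sketch of RFC 4122 section 4.3 (version 5, SHA-1).
public static class NameBasedGuid
{
    // Namespace IDs from RFC 4122 Appendix C.
    public static readonly Guid DnsNamespace = new Guid("6ba7b810-9dad-11d1-80b4-00c04fd430c8");
    public static readonly Guid UrlNamespace = new Guid("6ba7b811-9dad-11d1-80b4-00c04fd430c8");

    public static Guid Create(Guid namespaceId, string name)
    {
        // Namespace ID in network byte order, followed by the UTF-8 bytes of the name.
        byte[] nsBytes = namespaceId.ToByteArray();
        SwapByteOrder(nsBytes);
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        byte[] input = new byte[nsBytes.Length + nameBytes.Length];
        Buffer.BlockCopy(nsBytes, 0, input, 0, nsBytes.Length);
        Buffer.BlockCopy(nameBytes, 0, input, nsBytes.Length, nameBytes.Length);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
        {
            hash = sha1.ComputeHash(input);
        }

        // Keep the first 16 of the 20 hash bytes (still in network order).
        byte[] uuid = new byte[16];
        Array.Copy(hash, uuid, 16);
        uuid[6] = (byte)((uuid[6] & 0x0F) | 0x50); // version 5 in the high nibble of time_hi_and_version
        uuid[8] = (byte)((uuid[8] & 0x3F) | 0x80); // RFC 4122 variant: top bits 10

        // Back to the little-endian field layout expected by the Guid(byte[]) constructor.
        SwapByteOrder(uuid);
        return new Guid(uuid);
    }

    // Reverses the first three fields (4, 2 and 2 bytes). Applying it twice is a no-op,
    // so the same method converts in both directions.
    private static void SwapByteOrder(byte[] g)
    {
        Swap(g, 0, 3);
        Swap(g, 1, 2);
        Swap(g, 4, 5);
        Swap(g, 6, 7);
    }

    private static void Swap(byte[] b, int i, int j)
    {
        byte t = b[i];
        b[i] = b[j];
        b[j] = t;
    }
}
```

Usage would look like `Guid guid = NameBasedGuid.Create(NameBasedGuid.UrlNamespace, filePath);`, and passing your own fixed Guid as `namespaceId` gives the private namespace suggested in this answer. For the same namespace and name, the result is intended to agree with other RFC 4122 implementations, such as Python's `uuid.uuid5`.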

This will convert any string into a Guid without having to import an outside assembly.
public static Guid ToGuid(string src)
{
    byte[] stringbytes = Encoding.UTF8.GetBytes(src);
    using (var sha1 = System.Security.Cryptography.SHA1.Create())
    {
        byte[] hashedBytes = sha1.ComputeHash(stringbytes);
        // SHA-1 yields 20 bytes; keep only the first 16 for the Guid
        Array.Resize(ref hashedBytes, 16);
        return new Guid(hashedBytes);
    }
}
There are much better ways to generate a unique Guid, but this is a way to consistently upgrade a string data key to a Guid data key.

As Rob mentions, your method doesn't generate a UUID, it generates a hash that looks like a UUID.
RFC 4122 on UUIDs specifically allows for deterministic (name-based) UUIDs: versions 3 and 5 use MD5 and SHA-1, respectively. Most people are probably familiar with version 4, which is random. Wikipedia gives a good overview of the versions. (Note that the word 'version' here seems to describe a 'type' of UUID; version 5 doesn't supersede version 4.)
There seem to be a few libraries out there for generating version 3/5 UUIDs, including the Python uuid module, boost.uuid (C++) and OSSP UUID. (I haven't looked for any .NET ones.)
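To see the 'version' field this answer talks about, you can read it straight out of a `Guid`. This small sketch (the class and method names are hypothetical) returns the top four bits of the third group. `Guid.ToByteArray()` stores that group little-endian, so its high byte, which holds the version, is at index 7. A hash copied directly into a `Guid`, as in the question, carries no meaningful version number.

```csharp
using System;

// Hypothetical helper for inspecting the RFC 4122 version nibble of a Guid.
public static class UuidInfo
{
    // 4 = random, 3 = MD5 name-based, 5 = SHA-1 name-based.
    public static int GetVersion(Guid guid)
    {
        byte[] bytes = guid.ToByteArray();
        // bytes[6..7] hold the third group (time_hi_and_version) little-endian,
        // so the byte carrying the version in its top four bits is bytes[7].
        return bytes[7] >> 4;
    }
}
```

For example, `UuidInfo.GetVersion(Guid.NewGuid())` gives 4, since .NET's `Guid.NewGuid()` produces random (version 4) GUIDs.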

You need to make a distinction between instances of the class Guid, and identifiers that are globally unique. A "deterministic guid" is actually a hash (as evidenced by your call to provider.ComputeHash). Hashes have a much higher chance of collisions (two different strings happening to produce the same hash) than Guids created via Guid.NewGuid.
So the problem with your approach is that you will have to be ok with the possibility that two different paths will produce the same GUID. If you need an identifier that's unique for any given path string, then the easiest thing to do is just use the string. If you need the string to be obscured from your users, encrypt it - you can use ROT13 or something more powerful...
Attempting to shoehorn something that isn't a pure GUID into the GUID datatype could lead to maintenance problems in future...

MD5 is weak; I believe you can do the same thing with SHA-1 and get better results.
BTW, just a personal opinion: dressing an MD5 hash up as a GUID does not make it a good GUID. GUIDs by their very nature are non-deterministic, so this feels like a cheat. Why not just call a spade a spade and say it's a string-rendered hash of the input? You could do that by using this line rather than the new Guid line:
string stringHash = BitConverter.ToString(hashBytes);
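A sketch of that "call it a hash" approach: compute a digest of the path and keep it as a hex string instead of forcing it into a `Guid`. It uses SHA-256, in the spirit of the suggestion to prefer a stronger hash than MD5 (SHA-256 rather than SHA-1 is my assumption), and `BitConverter.ToString` as in the line above, with the dashes removed. The class name `PathKey` is hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: a stable string key derived from a file path.
public static class PathKey
{
    // Returns the 64-character uppercase hex SHA-256 digest of the UTF-8 path.
    public static string For(string path)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hashBytes = sha.ComputeHash(Encoding.UTF8.GetBytes(path));
            // BitConverter.ToString gives "AB-CD-..."; strip the separators.
            return BitConverter.ToString(hashBytes).Replace("-", "");
        }
    }
}
```

The key is longer than a Guid (64 characters versus 32 hex digits), but it is honest about being a hash and keeps all 256 bits.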

Here's a very simple solution that should be good enough for things like unit/integration tests:
var rnd = new Random(1234); // Seeded random number generator: same seed, same sequence (not guaranteed across .NET versions).
var bytes = new byte[16];
rnd.NextBytes(bytes); // fills all 16 bytes with values 0-255 (Next(0, 255) would never produce 255)
Console.WriteLine(new Guid(bytes));

Related

What's the standard code to generate HMAC SHA256 with key using C#

I'd like to know if there is a standard code to generate a SHA256 hash using a key. I've come across several types of code, however, they don't generate the same output.
Code found at JokeCamp
private string CreateToken(string message, string secret)
{
secret = secret ?? "";
var encoding = new System.Text.ASCIIEncoding();
byte[] keyByte = encoding.GetBytes(secret);
byte[] messageBytes = encoding.GetBytes(message);
using (var hmacsha256 = new HMACSHA256(keyByte))
{
byte[] hashmessage = hmacsha256.ComputeHash(messageBytes);
return Convert.ToBase64String(hashmessage);
}
}
Here's another one that I found
private static string ComputeHash(string apiKey, string message)
{
var key = Encoding.UTF8.GetBytes(apiKey);
string hashString;
using (var hmac = new HMACSHA256(key))
{
var hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(message));
hashString = Convert.ToBase64String(hash);
}
return hashString;
}
The code generated by both of these are different to what is generated by http://www.freeformatter.com/hmac-generator.html#ad-output
I'll be using the SHA256 for one of our external APIs, where consumers would hash the data and send it to us. So we just want to make sure we use a standard approach so that they send us the correct hash. Also, I would like to know if there are any well-known NuGet packages for this. I've also tried to find a solution with Bouncy Castle; however, I couldn't find one that uses a key to hash.

The difference is because of the character encodings (ASCII vs UTF-8 in your examples). Note that the hashing algorithm takes an array of bytes, and you do the conversion from a string to that byte-array beforehand.
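To make the encoding point concrete, here is a sketch (class and method names are hypothetical) that computes HMAC-SHA256 with the string-to-bytes encoding passed in explicitly. For pure-ASCII input, ASCII and UTF-8 produce the same bytes and therefore the same MAC; for a character like 'é', `Encoding.ASCII` substitutes '?', so the results diverge. Separately, both snippets in the question return Base64, whereas online HMAC tools often display hex, so it is worth comparing like with like.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper showing how the chosen encoding changes the HMAC input bytes.
public static class HmacEncodingDemo
{
    // HMAC-SHA256 where both key and message are converted with the given encoding.
    public static byte[] Compute(string key, string message, Encoding encoding)
    {
        using (var hmac = new HMACSHA256(encoding.GetBytes(key)))
        {
            return hmac.ComputeHash(encoding.GetBytes(message));
        }
    }

    // Lowercase hex rendering of the MAC.
    public static string ToHex(byte[] mac) =>
        BitConverter.ToString(mac).Replace("-", "").ToLowerInvariant();

    // Base64 rendering, as returned by the question's two snippets.
    public static string ToBase64(byte[] mac) => Convert.ToBase64String(mac);
}
```

For example, `HmacEncodingDemo.ToHex(HmacEncodingDemo.Compute("Jefe", "what do ya want for nothing?", Encoding.UTF8))` should reproduce the HMAC-SHA256 value of test case 2 in RFC 4231, which is a convenient way to check any implementation against a published vector.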
Your question "what's the standard code" probably hasn't got an answer. I'd say that if you expect the input to contain content from just the ASCII character space, go for that; if not, go for UTF-8. Either way, communicate it to your users.
If you want to look at it from a usability perspective and make it optimal for your users, go for both: hash the content both ways and check against the user's incoming hash. But it all depends on your evaluation of clock cycles vs security vs usability (you can have two).
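A sketch of that "hash it both ways" check, assuming clients send the MAC as Base64 and the shared key is already agreed on as bytes (the names here are hypothetical). It compares with `CryptographicOperations.FixedTimeEquals` (available from .NET Core 2.1) so that the comparison time does not reveal how many leading bytes matched.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical server-side check that accepts a MAC computed over UTF-8 or ASCII bytes.
public static class HmacVerifier
{
    public static bool Verify(byte[] key, string message, string receivedBase64)
    {
        byte[] received;
        try
        {
            received = Convert.FromBase64String(receivedBase64);
        }
        catch (FormatException)
        {
            return false; // not valid Base64, so it cannot match
        }

        using (var hmac = new HMACSHA256(key))
        {
            // Try each encoding the API documents as acceptable.
            foreach (Encoding encoding in new[] { Encoding.UTF8, Encoding.ASCII })
            {
                byte[] expected = hmac.ComputeHash(encoding.GetBytes(message));
                if (CryptographicOperations.FixedTimeEquals(expected, received))
                    return true;
            }
        }
        return false;
    }
}
```

Accepting two encodings doubles the work per request, which is the clock-cycles trade-off mentioned above.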

They are almost equivalent.
The difference is how the encoding for the string is established.
In the first portion of code it assumes ASCII, whereas in the second portion it assumes UTF-8. It is possible that the string used another encoding which is none of those.
But regardless of that, the idea is to understand what the goal of this operation is. The truly relevant things in this context are:
Given equal input, output should be the same
There should be no way to retrieve the plaintext only by knowing the output (within a reasonable amount of time)
After hashing, you no longer require the original input in plaintext.
A secure cryptographic hashing function (meaning not older functions like MD5) achieves that.
Then, if your data store where hashes are stored is compromised, the attacker would only have a hash, which cannot be used to retrieve the original plaintext. This is why hashing is used rather than encryption: encryption is a reversible operation (through decryption).
Then, within the system, if you've made the decision to use one encoding, you need to keep that decision consistent across all components in your system so they can interoperate.

Is this a cryptographically strong Guid?

I'm looking at using a Guid as a random anonymous visitor identifier for a website (stored both as a cookie client-side, and in a db server-side), and I wanted a cryptographically strong way of generating Guids (so as to minimize the chance of collisions).
For the record, there are 16 bytes (or 128 bits) in a Guid.
This is what I have in mind:
/// <summary>
/// Generate a cryptographically strong Guid
/// </summary>
/// <returns>a random Guid</returns>
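private Guid GenerateNewGuid()
{
    byte[] guidBytes = new byte[16]; // Guids are 16 bytes long
    // RandomNumberGenerator.Create() returns the platform's CSPRNG
    // (RNGCryptoServiceProvider is marked obsolete as of .NET 6)
    using (RandomNumberGenerator random = RandomNumberGenerator.Create())
    {
        random.GetBytes(guidBytes);
    }
    return new Guid(guidBytes);
}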
Is there a better way to do this?
Edit:
This will be used for two purposes, a unique Id for a visitor, and a transaction Id for purchases (which will briefly be the token needed for viewing/updating sensitive information).
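If the random value should also be a well-formed version-4 UUID (so that tools inspecting the version field treat it as random), a sketch like the following sets the version and variant bits after filling the bytes. This spends 6 of the 128 random bits, leaving 122. The class and method names are hypothetical, and `RandomNumberGenerator.Fill` requires .NET Core 2.1 or later.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: cryptographically random bytes shaped as an RFC 4122 version-4 Guid.
public static class SecureGuid
{
    public static Guid NewVersion4()
    {
        byte[] bytes = new byte[16];
        RandomNumberGenerator.Fill(bytes); // cryptographically strong random bytes

        // Guid stores its third group little-endian, so the version nibble is
        // the high half of bytes[7]; set it to 4 (random).
        bytes[7] = (byte)((bytes[7] & 0x0F) | 0x40);
        // RFC 4122 variant: the top two bits of bytes[8] become 10.
        bytes[8] = (byte)((bytes[8] & 0x3F) | 0x80);
        return new Guid(bytes);
    }
}
```

The string form then always has '4' as the first digit of the third group and one of 8, 9, a or b as the first digit of the fourth.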

In answer to the OP's actual question whether this is cryptographically strong, the answer is yes since it is created directly from RNGCryptoServiceProvider. However the currently accepted answer provides a solution that is most definitely not cryptographically secure as per this SO answer:
Is Microsoft's GUID generator cryptographically secure.
Whether this is the correct approach architecturally, given the theoretical lack of uniqueness (easily checked with a db lookup), is another concern.

So, what you're building is not technically a GUID. A GUID is a Globally Unique Identifier; you're building a random string of 128 bits. Like the previous answerer, I suggest that you use the built-in GUID generation methods. Your method has an (albeit tremendously small) chance of generating duplicate GUIDs.
There are a few advantages to using the built-in functionality, including cross-machine uniqueness (partially due to the MAC address being referenced in the GUID; see here: http://en.wikipedia.org/wiki/Globally_Unique_Identifier).
Regardless of whether you use the built in methods, I suggest that you not expose the Purchase GUID to the customer. The standard method used by Microsoft code is to expose a Session GUID that identifies the customer and expires comparatively quickly. Cookies track customer username and saved passwords for session creation. Thus your 'short term purchase ID' is never actually passed to (or, more importantly, received from) the client and there is a more durable wall between your customers' personal information and the Interwebs at large.

Collisions are theoretically impossible (it's not Globally Unique for nothing), but predictability is a whole other question. As Christopher Stevenson correctly points out, given a few previously generated GUIDs it actually becomes possible to start predicting a pattern within a much smaller keyspace than you'd think. GUIDs guarantee uniqueness, not unpredictability. Most algorithms take it into account, but you should never count on it, especially not for a transaction Id for purchases, however briefly. You're creating an open door for brute-force session hijacking attacks.
To create a proper unique ID, take some random stuff from your system, append some visitor-specific information, and append a string only you know on the server, and then put a good hash algorithm over the whole thing. Hashes are meant to be unpredictable and irreversible, unlike GUIDs.
To simplify: if uniqueness is all you care about, why not just give all your visitors sequential integers, from 1 to infinity? Guaranteed to be unique, just terribly predictable: when you have just purchased item 684, you can start hacking away at 685 until it appears.
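A sketch of the ID recipe from this answer (random data, plus visitor-specific information, plus a server-only secret, then a hash) in C#. Two assumptions: the "good hash algorithm" is HMAC-SHA256 keyed with the server secret, which is the conventional way to mix a secret into a hash rather than appending it to the input, and the names `TokenFactory`, `visitorInfo` and `serverSecret` are hypothetical. The 32 random bytes alone already make the result unpredictable; the keyed hash ties it to the visitor and to the server.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical token builder following the recipe: random || visitor info, keyed by a server secret.
public static class TokenFactory
{
    public static string Create(string visitorInfo, byte[] serverSecret)
    {
        // "Random stuff from your system": 32 bytes from a CSPRNG.
        byte[] random = new byte[32];
        RandomNumberGenerator.Fill(random);

        // Append the visitor-specific information.
        byte[] visitor = Encoding.UTF8.GetBytes(visitorInfo);
        byte[] input = new byte[random.Length + visitor.Length];
        Buffer.BlockCopy(random, 0, input, 0, random.Length);
        Buffer.BlockCopy(visitor, 0, input, random.Length, visitor.Length);

        // Keyed hash with the secret that only the server knows.
        using (var hmac = new HMACSHA256(serverSecret))
        {
            return Convert.ToBase64String(hmac.ComputeHash(input));
        }
    }
}
```

The result is a 44-character Base64 string (32 bytes). Since the random part is not stored in the token, the server would keep the token itself (for example in the database) to look the visitor up later.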

To avoid collisions:
1. If you can't keep a global count, then use Guid.NewGuid().
2. Otherwise, increment some integer and use 1, 2, 3, 4...
"But isn't that ridiculously easy to guess?"
Yes, but accidental and deliberate collisions are different problems with different solutions, best solved separately, not least because predictability helps prevent accidental collision while simultaneously making deliberate collision easier.
If you can increment globally, then number 2 guarantees no collisions. UUIDs were invented as a means to approximate that without the ability to track globally.
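Within a single process, "increment some integer" can be done safely across threads with `Interlocked.Increment`. This is a minimal sketch with a hypothetical class name; a counter that stays global across machines or restarts would need shared storage (for example a database sequence), which this snippet does not attempt.

```csharp
using System.Threading;

// Hypothetical in-process ID source: 1, 2, 3, ... with no duplicates under concurrency.
public static class IdSource
{
    private static long _last; // last id handed out; starts at 0

    // Atomically increments and returns the new value.
    public static long Next() => Interlocked.Increment(ref _last);
}
```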
Let's say we use incrementing integers. Let's say the ID we have in a given case is 123.
We can then do something like:
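private static string GetProtectedID(int id)
{
    // the same secret seed must be used in GetIDFromProtectedID
    string hashString = id.ToString() + "this is my secret seed kjٵتשڪᴻᴌḶḇᶄ™∞ﮟﻑfasdfj90213";
    using(var sha = System.Security.Cryptography.SHA1.Create())
    {
        // 40 hex characters of SHA-1, followed by the plain id
        return string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(hashString)).Select(b => b.ToString("X2"))) + id.ToString();
    }
}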
Which produces 09C495910319E4BED2A64EA16149521C51791D8E123. To decode it back to the id we do:
private static int GetIDFromProtectedID(string str)
{
    // the first 40 characters are the SHA-1 hex digest; the rest is the id
    if(str == null || str.Length <= 40)
        return 0;
    int chkID;
    if(int.TryParse(str.Substring(40), out chkID))
    {
        string chkHash = chkID.ToString() + "this is my secret seed kjٵتשڪᴻᴌḶḇᶄ™∞ﮟﻑfasdfj90213";
        using(var sha = System.Security.Cryptography.SHA1.Create())
        {
            if(string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(chkHash)).Select(b => b.ToString("X2"))) == str.Substring(0, 40))
                return chkID;
        }
    }
    return 0;//or perhaps raise an exception here.
}
Even if someone guessed from that they were given number 123, it wouldn't let them deduce that the id for 122 was B96594E536C9F10ED964EEB4E3D407F183FDA043122.
Alternatively, the two could be given as separate tokens, and so on.

I generally just use Guid.NewGuid();
http://msdn.microsoft.com/en-us/library/system.guid.newguid(v=vs.110).aspx

conversion modified guid

I have the following t-sql code which I have converted to c#.
DECLARE @guidRegular UNIQUEIDENTIFIER, @dtmNow DATETIME
SELECT @guidRegular = '{5bf8e554-8dbc-4008-9d48-5c6e0a4d28d7}'
SELECT @dtmNow = '2012-02-09 18:31:38'
print (CAST(CAST(@guidRegular AS BINARY(10)) + CAST(@dtmNow AS BINARY(6)) AS UNIQUEIDENTIFIER))
When I execute the .NET version of the code (using the same Guid and DateTime), I get a different Guid. It looks like it has something to do with the DateTime element. Can anyone help?
c# extension code:
using System;
using System.Linq;
using System.Data.Linq;
...
...
public static class GuidExtensions
{
public static Guid ToNewModifiedGuid(this Guid guid)
{
var dateTime = new DateTime(2012,02,09,18,31,38);
var guidBinary = new Binary(guid.ToByteArray().Take(10).ToArray());
var dateBinary = new Binary(BitConverter.GetBytes(dateTime.ToBinary()).ToArray().Take(6).ToArray());
var bytes = new byte[guidBinary.Length + dateBinary.Length];
Buffer.BlockCopy(guidBinary.ToArray(), 0, bytes, 0, guidBinary.ToArray().Length);
Buffer.BlockCopy(dateBinary.ToArray(), 0, bytes, guidBinary.ToArray().Length, dateBinary.ToArray().Length);
return new Guid(bytes);
}
}

I'm not surprised that SQL and .NET have different binary representations of a date/time. I would be surprised if they were the same.
Your C# code is asking the DateTime structure to serialize a value to a 64-bit (8-byte) array that can be used to recreate the same value. Then you're throwing away 2 bytes (the year? the milliseconds? a checksum? who knows?).
Your SQL code is asking the SQL engine to take its internal representation of a datetime, which is also 8 bytes, throw away two, and give the result.
So:
If you want identical values, you need to stop relying on the internals of how a datetime is stored or serialized. Convert it to 6 bytes using a repeatable method you can write in both .NET and T-SQL.
Realize that you are removing the 6 bytes of a guid that represent the spatially unique portion and replacing them with the time. So you are creating a GUID that has the time encoded twice, and are greatly increasing the odds of duplicate GUIDs being created.
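A sketch of the first point on the .NET side: choose an explicit 6-byte encoding instead of relying on either runtime's internal datetime layout. Here the encoding is whole seconds since a fixed epoch, written most significant byte first into the last 6 bytes. The epoch (2000-01-01), the use of seconds and the class name are all assumptions, and the T-SQL side would have to implement the same arithmetic (not shown). It still discards the last 6 bytes of the original GUID, so the second point about duplicates applies unchanged.

```csharp
using System;

// Hypothetical replacement for ToNewModifiedGuid using an explicit, repeatable encoding.
public static class TimestampedGuid
{
    // Assumed fixed epoch; both the .NET and T-SQL implementations must use the same one.
    private static readonly DateTime Epoch = new DateTime(2000, 1, 1, 0, 0, 0);

    // Keeps the first 10 bytes of the Guid and replaces the last 6 with the number of
    // whole seconds since Epoch, big-endian.
    public static Guid WithTimestamp(Guid guid, DateTime dateTime)
    {
        long seconds = (long)Math.Floor((dateTime - Epoch).TotalSeconds);
        if (seconds < 0 || seconds >= (1L << 48))
            throw new ArgumentOutOfRangeException(nameof(dateTime), "Must fall within 48 bits of seconds after the epoch.");

        byte[] bytes = guid.ToByteArray();
        for (int i = 0; i < 6; i++)
        {
            // bytes[10..15] appear in this same order in the last group of Guid.ToString()
            bytes[10 + i] = (byte)((seconds >> (8 * (5 - i))) & 0xFF);
        }
        return new Guid(bytes);
    }
}
```

Because bytes 10 to 15 are printed in order, the last group of the resulting Guid's string form is simply the seconds value in hex, which makes it easy to compare with whatever the T-SQL side produces.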
Of course, this ignores the more glaring issue of "why would anyone want to do that?" I'm going to assume that it's some really brilliant subsystem, instead of the more likely explanation that somebody is desperately trying to solve the wrong problem.

The original article has a flaw in the logic. The author describes both Natural and Surrogate keys but doesn't recognize that the RFC for UUIDs can be used to create a Natural key. Of course, doing so would require creating a custom function for generating a UUID based on some solution domain information, rather than relying on the default machine/time-based function for their generation.
Doing a single function to replace the generation of the keys makes a lot more sense than this, though.

Generating a non-guid unique key outside of a database

I have a situation where I need to create some kind of uniqueness between 'entities', but it is not a GUID, and it is not saved in a database (It is saved, however. Just not by a database).
The basic use of the key is a mere redundancy check. It does not have to be as scalable as a real 'primary key', but in the simplest terms I can think of, this is how it works.
[receiver] has List<possibilities>.
possibilities exist independently, but many will have the same values (impossible to predict. This is by design)
Frequently, the receiver's list of possibilities will have to be emptied and then refilled (this is a business requirement).
The key is basically used to add a very lightweight redundancy check. In other words, sometimes the same possibility will be repeated, sometimes it should only appear once in the receiver's list.
I basically want to use something very light and simple. A string is sufficient. I was just wanting to figure out a modest algorithm to accomplish this. I thought about using the GetHashCode() method, but I am not certain about how reliable that is. Can I get some thoughts?

If GetHashCode() seems good enough at first glance, you can probably use an MD5 hash as well and get a lower collision probability. The resulting MD5 can be stored as a 24-character string by encoding it in Base64; see this example:
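public static class MD5Gen
{
    public static string Encode(string toEncode)
    {
        // a new MD5 instance per call: HashAlgorithm instances are not thread-safe
        using (MD5 hash = MD5.Create())
        {
            // 16-byte digest -> 24-character Base64 string
            return Convert.ToBase64String(
                hash.ComputeHash(Encoding.UTF8.GetBytes(toEncode)));
        }
    }
}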
With this you encode a source string into an MD5 hash, also in string form. You just have to express the "possibility" class in terms of a string.
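A sketch of how such string keys could drive the redundancy check, using a hypothetical `Possibility` type whose fields are joined into the string that gets hashed. A `HashSet<string>` rejects repeats cheaply, and it is cleared together with the list when the receiver is emptied. Unlike `string.GetHashCode()`, which is randomized per process on .NET Core and so should not be saved, the MD5-based key stays the same across runs.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for the question's "possibility".
public sealed class Possibility
{
    public string Name { get; }
    public int Value { get; }
    public Possibility(string name, int value) { Name = name; Value = value; }

    // Canonical string form: the fields that define "the same possibility".
    public override string ToString() => $"{Name}|{Value}";
}

// Hypothetical receiver that allows each possibility only once.
public sealed class Receiver
{
    private readonly List<Possibility> _possibilities = new List<Possibility>();
    private readonly HashSet<string> _keys = new HashSet<string>();

    public IReadOnlyList<Possibility> Possibilities => _possibilities;

    // Adds the possibility unless one with the same key is already present.
    public bool TryAdd(Possibility p)
    {
        if (!_keys.Add(Key(p)))
            return false;
        _possibilities.Add(p);
        return true;
    }

    // Empties the list (the business requirement) together with the keys.
    public void Clear()
    {
        _possibilities.Clear();
        _keys.Clear();
    }

    // 24-character Base64 MD5 of the canonical string.
    private static string Key(Possibility p)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(Encoding.UTF8.GetBytes(p.ToString())));
        }
    }
}
```

For possibilities that are allowed to repeat, the caller would simply skip `TryAdd` and add to its own list directly; which ones those are is a business rule this sketch does not model.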

Try this for generating Guid.
VBScript Function to Generate a UUID/GUID
If you are on Windows, you can use the simple VBScript below to generate a UUID. Just save the code to a file called createguid.vbs, and then run cscript createguid.vbs at a command prompt.
Set TypeLib = CreateObject("Scriptlet.TypeLib")
NewGUID = TypeLib.Guid
WScript.Echo(left(NewGUID, len(NewGUID)-2))
Set TypeLib = Nothing
Create a UUID/GUID via the Windows Command Line
If you have the Microsoft SDK installed on your system, you can use the utility uuidgen.exe, which is located in the "C:\Program Files\Microsoft SDK\Bin" directory (see the linked page for more info).
I would say go for the Windows command line as it is more reliable.

Unpredictable Unique Identifier

For an application I am working on, I need to generate a session token which must have following properties:
it should be unique for the application at least, globally unique would be even better
it should be unpredictable
it should not be too long, as it will have to be passed around in a http request header, so smaller is better
I was thinking about adapting a Guid structure with a cryptographic random number, but it may be overkill. Does anyone know of (or has anyone created) a data structure that would fit all those properties?

Let me be very clear on this point. All of the answers saying to use a GUID are deeply wrong and you should not under any circumstances do so.
There is nothing whatsoever in the specification for GUIDs that requires that they be unpredictable by attackers. GUIDs are allowed to be sequential if allocated in blocks, GUIDs are allowed to be created with non-crypto-strength randomness, and GUIDs are allowed to be created from known facts about the world, like your MAC address and the current time. Nothing whatsoever requires that GUIDs be unpredictable.
If you need an unpredictable session key, then generate one using a crypto-strength random number generator to make sufficiently many bits.
More generally: use the right tool for the job particularly when it comes to security-impacting code. GUIDs were designed to make sure that two companies did not accidentally give the same identifier to two different interfaces. If that's your application, use a GUID for it. GUIDs were invented to prevent accidents by benign entities, not to protect innocent users against determined attackers. GUIDs are like stop signs -- there to prevent accidental collisions, not to protect against attacking tanks.
GUIDs were not designed to solve any problem other than accident prevention, so do not under any circumstances use them to solve crypto problems. Use a crypto library specifically designed to solve your problem, and implemented by world-class experts.

You could use the cryptographic RandomNumberGenerator and get however many bytes you want to generate an identifier of suitable length.
byte[] bytes = new byte[8];
using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
{
    rng.GetBytes(bytes);
}
// produces a string like "4jpKc52ArXU="
var s = Convert.ToBase64String(bytes);
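Following the advice to use sufficiently many bits, a variant with 16 bytes (128 bits) that also keeps the string safe for an HTTP header or URL by using the Base64url alphabet (RFC 4648 §5) without padding. This is a sketch: the class name is hypothetical, the 16-byte default is an assumption you can raise, and `RandomNumberGenerator.Fill` requires .NET Core 2.1 or later.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical session-token generator: CSPRNG bytes rendered as unpadded Base64url.
public static class SessionToken
{
    public static string Create(int byteCount = 16)
    {
        byte[] bytes = new byte[byteCount];
        RandomNumberGenerator.Fill(bytes); // crypto-strength randomness

        // Base64url: '+' -> '-', '/' -> '_', and drop the '=' padding.
        return Convert.ToBase64String(bytes)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }
}
```

With the default 16 bytes the token is 22 characters long, which stays short enough for a request header.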

How about using what ASP.NET uses to generate unique, unpredictable and secure session ids:
using System;
using System.IO;
using System.Web;
using System.Web.SessionState;
class Program
{
static void Main()
{
var request = new HttpRequest("", "http://localhost", "");
var response = new HttpResponse(TextWriter.Null);
var context = new HttpContext(request, response);
var id = new SessionIDManager().CreateSessionID(context);
Console.WriteLine(id);
}
}

If you really and truly must have unpredictability, then generate a GUID and use a 128-bit bit mixer to rearrange the bits. Probably best to use a 64-bit bit mixer to mix the high and low portions. And, yes, there are bit mixers that guarantee unique output for every unique input.
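A sketch of that idea, using the 64-bit finalizer from MurmurHash3 (`fmix64`) as the bit mixer on each 8-byte half of the Guid. Each step (xor with a right shift, multiplication by an odd constant modulo 2^64) is invertible, so distinct inputs give distinct outputs. The choice of mixer and the class name are assumptions, and as this answer goes on to say, this is obfuscation rather than cryptography.

```csharp
using System;

// Hypothetical Guid obfuscator built on a bijective 64-bit mixer.
public static class GuidMixer
{
    // MurmurHash3 64-bit finalizer: a bijection on 64-bit values.
    private static ulong Fmix64(ulong k)
    {
        unchecked
        {
            k ^= k >> 33;
            k *= 0xff51afd7ed558ccdUL;
            k ^= k >> 33;
            k *= 0xc4ceb9fe1a85ec53UL;
            k ^= k >> 33;
        }
        return k;
    }

    // Mixes the low and high 8-byte halves of the Guid independently.
    public static Guid Mix(Guid guid)
    {
        byte[] bytes = guid.ToByteArray();
        ulong low = Fmix64(BitConverter.ToUInt64(bytes, 0));
        ulong high = Fmix64(BitConverter.ToUInt64(bytes, 8));
        Buffer.BlockCopy(BitConverter.GetBytes(low), 0, bytes, 0, 8);
        Buffer.BlockCopy(BitConverter.GetBytes(high), 0, bytes, 8, 8);
        return new Guid(bytes);
    }
}
```

Because the halves are mixed separately, a change in one half never affects the other; a mixer over all 128 bits would spread changes further, at the cost of more code.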
Note that this isn't completely unpredictable. If somebody watching knows that the value is an obfuscated GUID, then he can probably examine successive values and, with some effort, potentially reverse-engineer your bit mixer.
