Guid generating bug in C#

I was working with Guid.NewGuid() in C#, .NET 4.0, and I've found an interesting fact: every Guid I've generated has a '4' as its 13th hex digit.
da1471ac-11f7-4fb7-a7fa-927fffe8a97c
c90058aa-5d7f-4bb5-b3a9-c1db197cf3b1
fa68ec75-8cd2-4c24-92f8-41dbbdd428a0
d4efd455-e892-43ef-b7bf-9462c5dc4de4
e0a001a0-8969-4092-b7a2-e410ed2b351a
30ae98b9-48ae-425d-b6e7-e091502d6ce2
6a95de82-67ff-44c9-9f7b-e37a80462cf7
66768e46-6d60-42b4-b473-2f6f8bc1559a
I've tried it on several machines and got the same result.
Can anybody try it too or explain it?
Simple code for checking:
static void Main(string[] args)
{
    bool ok = false;
    for (int i = 0; i < 10000; i++)
    {
        var guid = Guid.NewGuid();
        // Index 14 of the "xxxxxxxx-xxxx-xxxx-..." string is the 13th hex digit.
        if (guid.ToString()[14] != '4')
            ok = true;
        Console.WriteLine(guid);
    }
    Console.WriteLine(ok ? "No bug!" : "4bug found!");
}

This is indeed the case and not a bug; it's a feature. That digit specifies which method was used to generate the GUID. A 4 means the GUID was generated randomly.
Eric Lippert has a fantastic series on this topic:
Part One
Part Two
Part Three

It's in the algorithm specification. V4 GUIDs must have a "4" in that position. If you're interested in the details, give this a shot: http://en.wikipedia.org/wiki/Globally_unique_identifier#Algorithm

This really is of no consequence. Guids are generated from a number of different components in order to help maximize the chances that they are unique. For example, I believe one component is an ID on your network card. Part is the date. And so on.
I can't tell you off the top of my head where the 4 came from, but this isn't at all a surprise and is definitely not a bug, as you seem to be suggesting.
You can learn more about the algorithm used to generate this number at http://en.wikipedia.org/wiki/Globally_unique_identifier.

This is expected behavior, not a bug. This position in the GUID identifies the basic generating algorithm that produced the GUID. There are a few different types; a V4 GUID is, with the exception of six bits in defined locations that identify the version and type variant, composed of random data. Other GUIDs use machine NICs and timestamp values.
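A short C# sketch of where those fixed bits live: it reads the version nibble and checks the variant bits of a Guid. GuidVersion and IsRfc4122Variant are hypothetical helper names, not framework APIs, and the byte offsets assume the little-endian storage of the first three fields in Guid.ToByteArray().

```csharp
using System;

// Version: the high nibble of Data3. Guid.ToByteArray() stores Data3
// little-endian in bytes 6-7, so that nibble is the top half of byte 7.
static int GuidVersion(Guid g) => g.ToByteArray()[7] >> 4;

// Variant: the top bits of the first byte of Data4 (byte 8).
// RFC 4122 GUIDs use the bit pattern 10xxxxxx there.
static bool IsRfc4122Variant(Guid g) => (g.ToByteArray()[8] & 0xC0) == 0x80;

Guid sample = Guid.Parse("da1471ac-11f7-4fb7-a7fa-927fffe8a97c");
Console.WriteLine($"{sample}: version {GuidVersion(sample)}, RFC 4122 variant: {IsRfc4122Variant(sample)}");
// prints: da1471ac-11f7-4fb7-a7fa-927fffe8a97c: version 4, RFC 4122 variant: True

Guid fresh = Guid.NewGuid();
Console.WriteLine($"{fresh}: version {GuidVersion(fresh)}");
```

The '4' the question noticed is simply GuidVersion printed as a hex digit; the variant bits are why the 17th hex digit of these GUIDs is always 8, 9, a or b.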

Related

Unique ticket numbers for software support system

I'm developing a ticketing system for tracking bugs and software changes using ASP.NET MVC 4 and Entity Framework 5. I need a way to pick a unique number from a set of possible numbers. My thought is to create a set of possible numbers and mark numbers from this set as they are used and assigned to a support ticket.
I have this code for generating all possible ticket numbers to choose from, but I want to have leading zeroes so that all ticket numbers have the same length:
using System.Collections.Generic;
using System.Linq;
public static class GenerateNumber
{
    private static IEnumerable<int> GenerateNumbers(int count)
    {
        return Enumerable.Range(0, count);
    }
    public static IEnumerable<string> GenerateTicketNumbers(int count)
    {
        return GenerateNumbers(count).Select(n => "TN" + n.ToString());
    }
}
I want the output of
IEnumerable<string> ticketNumbers = GenerateNumber.GenerateTicketNumbers(Int32.MaxValue);
to be something like this:
TN0000000001
.
.
.
TN2147483647
Hopefully we won't need anything as large as Int32.MaxValue, as that would mean we have way too many bugs, haha. I just wanted to err on the side of caution about the limits of the available numbers. Perhaps we could reuse ticket numbers after they have been resolved. However, I don't know how I feel about reuse, as it could lead to ambiguity when referring to documentation later on.
Considering the size of this set, is this the most efficient method to go about having unique ticket numbers?
Use an identity column in the database - this will autoincrement for you.
If you need a prefix as well, then store this as a separate varchar column and then for display purposes you can concatenate it (with your requisite leading zeros if that is absolutely really necessary). Trying to store an incrementing number in a varchar field is going to bite you in the ass one day.
As a side note, why the leading zeros? If I am fixing a ticket, I want to annotate my code with the ticket number. Leading zeros are just a pain - why not just have TN-123 and have the number get bigger as required?
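A minimal sketch of that display-time formatting, assuming the ticket's integer id comes from an identity column. The "D" numeric format specifier pads with leading zeros to the requested width, and the same format string would also give the question's generator its padding. FormatTicketNumber is a hypothetical name.

```csharp
using System;

// Hypothetical helper: 'id' is assumed to come from an identity column; the
// prefix and zero-padding are applied only when the ticket is displayed.
static string FormatTicketNumber(int id, int width) => "TN" + id.ToString("D" + width);

Console.WriteLine(FormatTicketNumber(1, 10));             // TN0000000001
Console.WriteLine(FormatTicketNumber(int.MaxValue, 10));  // TN2147483647
Console.WriteLine(FormatTicketNumber(123, 0));            // TN123 (no padding)
```

Keeping the stored value an int and formatting late means the padding width can change later without touching the data.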

is there any way I can generate a unique number that is NOT as long as a UUID (GUID)? [duplicate]

Closed 10 years ago.
Possible Duplicate:
Formulas to generate a unique id?
Basically I need to generate a unique number but I don't want it to be too long, such as a UUID. Something half of that size (if not smaller).
Can anyone think of any ways to do this?
Basically I'm going to have an app which might be in use by multiple people and the app generates files and uploads them to the web server. Those names need to be unique.
I'm not looking to use a database table to keep track of this stuff, by the way.
Generate a UUID, and only take the first half of the string.
If you're concerned about generating duplicate IDs, your options are to make them non-random and auto-increment, or to check for the existence of newly generated IDs:
do {
    newId = generateNewId();
} while (idExists(newId));
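A concrete C# version of that loop, as a sketch: the candidate is a random Int32 and the existence check is an in-memory HashSet, both hypothetical stand-ins for generateNewId() and idExists(). HashSet.Add returns false when the value is already present, so one call both checks and records the id. This only guarantees uniqueness within one process; with several users uploading files, the check would have to consult the shared store (for example, whether that file name already exists on the server).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for generateNewId()/idExists(): a random candidate id
// and an in-memory set of ids already handed out by this process.
var random = new Random();
var issued = new HashSet<int>();

int NextUniqueId()
{
    int newId;
    do
    {
        newId = random.Next();        // candidate id in [0, Int32.MaxValue)
    } while (!issued.Add(newId));     // Add returns false if the id was already issued
    return newId;
}

var batch = new List<int>();
for (int i = 0; i < 1000; i++)
    batch.Add(NextUniqueId());
Console.WriteLine($"Issued {issued.Count} distinct ids");
```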
If you need it unique and short go with UUID and use a url shortener.
Piqued my curiosity:
// create a 32-bit uid:
var i = BitConverter.ToUInt32(Guid.NewGuid().ToByteArray(), (new Random()).Next(0, 12));
// create a 64-bit uid
var l = BitConverter.ToUInt64(Guid.NewGuid().ToByteArray(), (new Random()).Next(0, 8));
Of course the following may be equally applicable because you lose most of the features of a guid when you truncate it, and might as well resort to a random number:
l = BitConverter.ToUInt64(BitConverter.GetBytes((new Random()).NextDouble()), 0);
... if you're looking for a 64-bit integer (keep in mind, though, that the bit pattern of a double in [0, 1) does not contain 64 random bits).
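If you do go the random-number route, here is a sketch that draws all 64 bits from System.Security.Cryptography.RandomNumberGenerator, rather than from System.Random or a slice of a GUID (version 4 fixes 6 of its 128 bits), and prints them as 16 hex digits, half the length of a GUID's 32. NewRandomId64 is a hypothetical name, and a random 64-bit value is only probabilistically unique, so an existence check like the do/while loop earlier still applies.

```csharp
using System;
using System.Security.Cryptography;

// Fills 8 bytes from the OS cryptographic RNG, so every one of the 64 bits is random.
static ulong NewRandomId64()
{
    var bytes = new byte[8];
    using (var rng = RandomNumberGenerator.Create())
    {
        rng.GetBytes(bytes);
    }
    return BitConverter.ToUInt64(bytes, 0);
}

ulong id = NewRandomId64();
// "x16" always yields exactly 16 lowercase hex digits (leading zeros kept).
string name = id.ToString("x16");
Console.WriteLine(name);
```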

conversion modified guid

I have the following T-SQL code, which I have converted to C#.
DECLARE @guidRegular UNIQUEIDENTIFIER, @dtmNow DATETIME
SELECT @guidRegular = '{5bf8e554-8dbc-4008-9d48-5c6e0a4d28d7}'
SELECT @dtmNow = '2012-02-09 18:31:38'
print (CAST(CAST(@guidRegular AS BINARY(10)) + CAST(@dtmNow AS BINARY(6)) AS UNIQUEIDENTIFIER))
When I execute the .NET version of the code (using the same Guid and DateTime), I get a different Guid. It looks like it has something to do with the DateTime element; can anyone help?
C# extension code:
using System.Data.Linq;
...
...
public static class GuidExtensions
{
    public static Guid ToNewModifiedGuid(this Guid guid)
    {
        var dateTime = new DateTime(2012,02,09,18,31,38);
        var guidBinary = new Binary(guid.ToByteArray().Take(10).ToArray());
        var dateBinary = new Binary(BitConverter.GetBytes(dateTime.ToBinary()).ToArray().Take(6).ToArray());
        var bytes = new byte[guidBinary.Length + dateBinary.Length];
        Buffer.BlockCopy(guidBinary.ToArray(), 0, bytes, 0, guidBinary.ToArray().Length);
        Buffer.BlockCopy(dateBinary.ToArray(), 0, bytes, guidBinary.ToArray().Length, dateBinary.ToArray().Length);
        return new Guid(bytes);
    }
}
I'm not surprised that SQL and .NET would have different binary representations of a date/time. I would be surprised if they didn't.
Your C# code is asking the DateTime structure to serialize a value to a 64-bit (8-byte) byte array that can be used to recreate the same value. Then you're throwing away 2 bytes (the year? the millisecond? a checksum? who knows?)
Your SQL code is asking the SQL engine to take its internal representation of a datetime (which is also 8 bytes), throw away two, and give the result.
So:
If you want identical values, you would need to stop relying on the internals of how a datetime is stored/serialized. Convert it to 6 bytes using a repeatable method you can write in both .NET and T-SQL.
Realize that you are removing the 6 bytes of a guid that represent the spatially unique portion and replacing them with the time. So you are creating a GUID that has the time encoded twice, and are greatly increasing the odds of duplicate GUIDs being created.
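One possible repeatable method, sketched in C# under stated assumptions: instead of DateTime.ToBinary(), it rebuilds the trailing 6 bytes of SQL Server's datetime storage (a 4-byte count of days since 1900-01-01 followed by a 4-byte count of 1/300-second ticks since midnight, big-endian). It assumes that CAST(... AS BINARY(6)) keeps the last 6 of those 8 bytes and that SQL Server's uniqueidentifier byte order matches Guid.ToByteArray(); verify both against your server before relying on it. SqlDateTimeTail6 and ToModifiedGuid are hypothetical names.

```csharp
using System;

// Assumed layout (SQL Server datetime): 4-byte day count since 1900-01-01, then a
// 4-byte count of 1/300-second ticks since midnight, both big-endian. Returns the
// last 6 of those 8 bytes, assumed to be what CAST(... AS BINARY(6)) keeps.
static byte[] SqlDateTimeTail6(DateTime value)
{
    int days = (value.Date - new DateTime(1900, 1, 1)).Days;
    // Truncates to whole 1/300 s ticks; SQL Server rounds instead, so only
    // values on a tick boundary (e.g. whole seconds) are guaranteed to match.
    int ticks = (int)(value.TimeOfDay.Ticks * 300 / TimeSpan.TicksPerSecond);
    return new byte[]
    {
        (byte)(days >> 8), (byte)days,
        (byte)(ticks >> 24), (byte)(ticks >> 16), (byte)(ticks >> 8), (byte)ticks
    };
}

// First 10 bytes of the GUID (Guid.ToByteArray() order) followed by the 6 time bytes.
static Guid ToModifiedGuid(Guid guid, DateTime value)
{
    var bytes = new byte[16];
    Array.Copy(guid.ToByteArray(), 0, bytes, 0, 10);
    Array.Copy(SqlDateTimeTail6(value), 0, bytes, 10, 6);
    return new Guid(bytes);
}

var original = Guid.Parse("5bf8e554-8dbc-4008-9d48-5c6e0a4d28d7");
var stamp = new DateTime(2012, 2, 9, 18, 31, 38);
Console.WriteLine(BitConverter.ToString(SqlDateTimeTail6(stamp))); // 9F-F2-01-31-51-B8
Console.WriteLine(ToModifiedGuid(original, stamp));                // 5bf8e554-8dbc-4008-9d48-9ff2013151b8
```

Here 0x9FF2 is 40946 days after 1900-01-01 and 0x013151B8 is 66698 seconds times 300, so the bytes can be checked by hand and compared with the T-SQL output.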
Of course, this ignores the more glaring issue of "why would anyone want to do that?" I'm going to assume that it's some really brilliant subsystem, instead of the more likely explanation that somebody is desperately trying to solve the wrong problem.
The original article has a flaw in the logic. The author describes both Natural and Surrogate keys but doesn't recognize that the RFC for UUIDs can be used to create a Natural key. Of course, doing so would require creating a custom function for generating a UUID based on some solution domain information, rather than relying on the default machine/time-based function for their generation.
Doing a single function to replace the generation of the keys makes a lot more sense than this, though.

Simple proof that GUID is not unique [closed]

Closed 10 years ago.
I'd like to prove that a GUID is not unique in a simple test program.
I expected the following code to run for hours, but it's not working. How can I make it work?
BigInteger begin = new BigInteger((long)0);
BigInteger end = new BigInteger("340282366920938463463374607431768211456",10); //2^128
for(begin; begin<end; begin++)
Console.WriteLine(System.Guid.NewGuid().ToString());
I'm using C#.
Kai, I have provided a program that will do what you want using threads. It is licensed under the following terms: you must pay me $0.0001 per hour per CPU core you run it on. Fees are payable at the end of each calendar month. Please contact me for my paypal account details at your earliest convenience.
using System;
using System.Collections.Generic;
using System.Linq;
namespace GuidCollisionDetector
{
    class Program
    {
        static void Main(string[] args)
        {
            //var reserveSomeRam = new byte[1024 * 1024 * 100]; // This indeed has no effect.
            Console.WriteLine("{0:u} - Building a bigHeapOGuids.", DateTime.Now);
            // Fill up memory with guids.
            var bigHeapOGuids = new HashSet<Guid>();
            try
            {
                do
                {
                    bigHeapOGuids.Add(Guid.NewGuid());
                } while (true);
            }
            catch (OutOfMemoryException)
            {
                // Release the ram we allocated up front.
                // Actually, these are pointless too.
                //GC.KeepAlive(reserveSomeRam);
                //GC.Collect();
            }
            Console.WriteLine("{0:u} - Built bigHeapOGuids, contains {1} of them.", DateTime.Now, bigHeapOGuids.LongCount());
            // Spool up some threads to keep checking if there's a match.
            // Keep running until the heat death of the universe.
            for (long k = 0; k < Int64.MaxValue; k++)
            {
                for (long j = 0; j < Int64.MaxValue; j++)
                {
                    Console.WriteLine("{0:u} - Looking for collisions with {1} thread(s)....", DateTime.Now, Environment.ProcessorCount);
                    System.Threading.Tasks.Parallel.For(0, Int32.MaxValue, (i) =>
                    {
                        if (bigHeapOGuids.Contains(Guid.NewGuid()))
                            throw new ApplicationException("Guids collided! Oh my gosh!");
                    }
                    );
                    Console.WriteLine("{0:u} - That was another {1} attempts without a collision.", DateTime.Now, ((long)Int32.MaxValue) * Environment.ProcessorCount);
                }
            }
            Console.WriteLine("Umm... why hasn't the universe ended yet?");
        }
    }
}
PS: I wanted to try out the Parallel extensions library. That was easy.
And using OutOfMemoryException as control flow just feels wrong.
EDIT
Well, it seems this still attracts votes. So I've fixed the GC.KeepAlive() issue. And changed it to run with C# 4.
And to clarify my support terms: support is only available on the 28/Feb/2010. Please use a time machine to make support requests on that day only.
EDIT 2
As always, the GC does a better job than I do at managing memory; any previous attempts at doing it myself were doomed to failure.
This will run for a lot more than hours. Assuming it loops at 1 GHz (which it won't - it will be a lot slower than that), it will run for 10790283070806014188970 years, which is roughly 800 billion times longer than the age of the universe.
Assuming Moore's law holds, it would be a lot quicker to not run this program, wait several hundred years and run it on a computer that is billions of times faster. In fact, any program that takes longer to run than it takes CPU speeds to double (about 18 months) will complete sooner if you wait until the CPU speeds have increased and buy a new CPU before running it (unless you write it so that it can be suspended and resumed on new hardware).
A GUID is theoretically non-unique. Here's your proof:
GUID is a 128 bit number
You cannot generate 2^128 + 1 or more GUIDs without re-using old GUIDs
However, if the entire power output of the sun was directed at performing this task, it would go cold long before it finished.
GUIDs can be generated using a number of different tactics, some of which take special measures to guarantee that a given machine will not generate the same GUID twice. Finding collisions in a particular algorithm would show that your particular method for generating GUIDs is bad, but would not prove anything about GUIDs in general.
Of course GUIDs can collide. Since GUIDs are 128-bits, just generate 2^128 + 1 of them and by the pigeonhole principle there must be a collision.
But when we say that a GUID is a unique, what we really mean is that the key space is so large that it is practically impossible to accidentally generate the same GUID twice (assuming that we are generating GUIDs randomly).
If you generate a sequence of n GUIDs randomly, then the probability of at least one collision is approximately p(n) = 1 - exp(-n^2 / (2 * 2^128)) (this is the birthday problem with the number of possible birthdays being 2^128).
n       p(n)
2^30    1.69e-21
2^40    1.77e-15
2^50    1.86e-09
2^60    1.95e-03
To make these numbers concrete, 2^60 = 1.15e+18. So, if you generate one billion GUIDs per second, it will take you 36 years to generate 2^60 random GUIDs and even then the probability that you have a collision is still 1.95e-03. You're more likely to be murdered at some point in your life (4.76e-03) than you are to find a collision over the next 36 years. Good luck.
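The table can be reproduced with a few lines of C#. This sketch evaluates the same approximation for n = 2^k; the only subtlety is that for very small x, 1 - Math.Exp(-x) rounds to zero in double precision, so it switches to the first two terms of the Taylor series (x - x^2/2) there. CollisionProbability is a hypothetical name.

```csharp
using System;

// p(n) = 1 - exp(-n^2 / (2 * 2^128)) with n = 2^log2N, so x = 2^(2*log2N - 129).
static double CollisionProbability(int log2N)
{
    double x = Math.Pow(2, 2 * log2N - 129);
    // For tiny x, 1 - Math.Exp(-x) rounds to 0 in double precision,
    // so use the first two Taylor terms of 1 - e^(-x) instead.
    return x < 1e-5 ? x - x * x / 2 : 1 - Math.Exp(-x);
}

foreach (int k in new[] { 30, 40, 50, 60 })
    Console.WriteLine($"2^{k}\t{CollisionProbability(k):0.00e+00}");
```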
If you're worried about uniqueness you can always purchase new GUIDs so you can throw away your old ones. I'll put some up on eBay if you'd like.
Personally, I think the "Big Bang" was caused when two GUIDs collided.
You can show that in O(1) time with a variant of the quantum bogosort algorithm.
Guid g1 = Guid.NewGuid();
Guid g2 = Guid.NewGuid();
if(g1 != g2) Universe.Current.Destroy();
Any two GUIDs are very likely unique (not equal).
See this SO entry, and from Wikipedia
While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128 or 3.4×10^38) is so large that the probability of the same number being generated twice is very small. For example, consider the observable universe, which contains about 5×10^22 stars; every star could then have 6.8×10^15 universally unique GUIDs.
So you would probably have to wait many more billions of years, and hope that you hit one before the universe as we know it comes to an end.
[Update:] As the comments below point out, newer MS GUIDs are V4 and do not use the MAC address as part of the GUID generation (I haven't seen any indication of a V5 implementation from MS though, so if anyone has a link confirming that, let me know). With V4, the GUID is essentially all random bits, and the odds against duplication remain so small as to be irrelevant for any practical usage. You certainly would not be likely to ever generate a duplicate GUID from just a single system test such as the OP was trying to do.
Most of these answers are missing one vital point about Microsoft's GUID implementation. The first part of the GUID is based on a timestamp and another part is based on the MAC address of the network card (or a random number if no NIC is installed).
If I understand this correctly, it means that the only reliable way to duplicate a GUID would be to run simultaneous GUID generations on multiple machines where the MAC addresses were the same AND where the clocks on both systems were at the same exact time when the generation occurred (the timestamp is based on milliseconds if I understand it correctly).... even then there are a lot of other bits in the number that are random, so the odds are still vanishingly small.
For all practical purposes the GUIDs are universally unique.
There is a pretty good description of the MS GUID over at "The Old New Thing" blog
Here's a nifty little extension method that you can use if you want to check guid uniqueness in many places in your code.
internal static class GuidExt
{
    public static bool IsUnique(this Guid guid)
    {
        while (guid != Guid.NewGuid())
        { }
        return false;
    }
}
To call it, simply call Guid.IsUnique whenever you generate a new guid...
Guid g = Guid.NewGuid();
if (!g.IsUnique())
{
    throw new GuidIsNotUniqueException();
}
...heck, I'd even recommend calling it twice to make sure it got it right in the first round.
Counting to 2^128 - ambitious.
Let's imagine that we can count 2^32 IDs per second per machine - not that ambitious, since it's not even 4.3 billion per second. Let's dedicate 2^32 machines to that task. Furthermore, let's get 2^32 civilisations to each dedicate the same resources to the task.
So far, we can count 2^96 IDs per second, meaning we will be counting for 2^32 seconds (a little over 136 years).
Now, all we need is to get 4,294,967,296 civilisations to each dedicate 4,294,967,296 machines, each machine capable of counting 4,294,967,296 IDs per second, purely to this task for the next 136 years or so - I suggest we get started on this essential task right now ;-)
Well, if a running time of roughly 10^22 years does not scare you, think that you will also need to store the generated GUIDs somewhere to check if you have a duplicate; storing 2^128 16-byte numbers would only require you to allocate 4951760157141521099596496896 terabytes of RAM upfront. So even imagining you have a computer that could fit all that, and that you somehow find a place to buy terabyte DIMMs at 10 grams each, combined they would weigh more than 8 Earth masses, enough to seriously shift Earth off its current orbit before you even press "Run". Think twice!
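The storage figure is easy to check with System.Numerics.BigInteger. The 10 grams per terabyte DIMM is the answer's own assumption, and Earth's mass (about 5.97×10^24 kg) is the only outside number used in this sketch.

```csharp
using System;
using System.Numerics;

// 2^128 GUIDs of 16 bytes each, expressed in terabytes of 2^40 bytes.
BigInteger terabytes = BigInteger.Pow(2, 128) * 16 / BigInteger.Pow(2, 40);
Console.WriteLine(terabytes); // 4951760157141521099596496896

// The answer's assumption: one terabyte DIMM weighs 10 grams (0.010 kg).
double kilograms = (double)terabytes * 0.010;
double earthMasses = kilograms / 5.97e24; // Earth's mass is roughly 5.97e24 kg
Console.WriteLine($"about {earthMasses:F1} Earth masses");
```

The result is exactly 2^92 terabytes, since 2^128 × 2^4 bytes / 2^40 bytes per terabyte = 2^92.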
for(begin; begin<end; begin)
Console.WriteLine(System.Guid.NewGuid().ToString());
You aren't incrementing begin so the condition begin < end is always true.
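For completeness, a sketch of the question's loop in a form that compiles. System.Numerics.BigInteger has no (string, radix) constructor, so BigInteger.Pow builds 2^128; `begin;` on its own is not a valid C# statement, so the initializer slot is left empty; and the counter is actually incremented. The demoLimit cap is a hypothetical addition so the sketch terminates; without it the loop would run for the 10^22-odd years estimated in an earlier answer.

```csharp
using System;
using System.Numerics;

BigInteger begin = BigInteger.Zero;
BigInteger end = BigInteger.Pow(2, 128);   // same value as the question's literal

// Hypothetical cap so this sketch terminates; drop it to "count to 2^128".
BigInteger demoLimit = 5;

// Empty initializer slot, and begin++ really advances the counter.
for (; begin < end && begin < demoLimit; begin++)
{
    Console.WriteLine(Guid.NewGuid());
}
```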
If GUID collisions are a concern, I would recommend using the ScottGuID instead.
Presumably you have reason to believe that the algorithm for producing Guids is not producing truly random numbers, but is in fact cycling with a period << 2^128.
e.g. the RFC 4122 method used to derive GUIDs fixes the values of some bits.
Proof of cycling is going to depend upon the possible size of the period.
For small periods, one approach might be a hash table of hash(GUID) -> GUID: on a hash collision, replace the stored GUID if the GUIDs do not match, and terminate if they do. Consider also only doing the replacement a random fraction of the time.
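A sketch of that hash-table approach in C#. FindRepeat is a hypothetical helper: it keeps a fixed-size array indexed by the low bits of Guid.GetHashCode(), terminates when a slot already holds the same GUID, and otherwise overwrites the slot. Memory stays bounded, at the cost of sometimes evicting a GUID before its repeat shows up. The demo feeds it a deliberately broken generator with a period of 1000 to show what a detection looks like.

```csharp
using System;

// Looks for a repeated GUID among at most maxAttempts draws, using a table of
// 2^tableBits slots indexed by hash. Returns the repeat, or null if none was seen.
static Guid? FindRepeat(Func<Guid> next, int tableBits, long maxAttempts)
{
    var table = new Guid?[1 << tableBits];
    int mask = table.Length - 1;
    for (long i = 0; i < maxAttempts; i++)
    {
        Guid g = next();
        int slot = g.GetHashCode() & mask;
        if (table[slot] == g)
            return g;        // same GUID seen before: terminate
        table[slot] = g;     // empty slot or a different GUID: replace
    }
    return null;
}

// A deliberately broken generator that cycles with period 1000.
var cycle = new Guid[1000];
for (int i = 0; i < cycle.Length; i++) cycle[i] = Guid.NewGuid();
long calls = 0;
Guid? repeat = FindRepeat(() => cycle[calls++ % cycle.Length], tableBits: 16, maxAttempts: 10_000);
Console.WriteLine(repeat.HasValue ? $"Repeat found: {repeat}" : "No repeat found");

// The real generator, with the same budget, is not expected to repeat.
Guid? real = FindRepeat(Guid.NewGuid, tableBits: 16, maxAttempts: 10_000);
Console.WriteLine(real.HasValue ? $"Repeat found: {real}" : "No repeat found");
```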
Ultimately if the maximum period between collisions is large enough (and isn't known in advance) any method is only going to yield a probability that the collision would be found if it existed.
Note that if the method of generating Guids is clock based (see the RFC), then it may not be possible to determine if collisions exist because either (a) you won't be able to wait long enough for the clock to wrap round, or (b) you can't request enough Guids within a clock tick to force a collision.
Alternatively you might be able to show a statistical relationship between the bits in the Guid, or a correlation of bits between Guids. Such a relationship might make it highly probable that the algorithm is flawed without necessarily being able to find an actual collision.
Of course, if you just want to prove that Guids can collide, then a mathematical proof, not a program, is the answer.
I don't understand why no one has mentioned upgrading your graphics card... Surely if you got a high-end NVIDIA Quadro FX 4800 or something (192 CUDA cores) this would go faster...
Of course if you could afford a few NVIDIA Quadro Plex 2200 S4s (at 960 CUDA cores each), this calculation would really scream. Perhaps NVIDIA would be willing to loan you a few for a "Technology Demonstration" as a PR stunt?
Surely they'd want to be part of this historic calculation...
But do you have to be sure you have a duplicate, or do you only care if there can be a duplicate. To be sure that you have two people with the same birthday, you need 366 people (not counting leap year). For there to be a greater than 50% chance of having two people with the same birthday you only need 23 people. That's the birthday problem.
If you have 32 bits, you only need 77,163 values to have a greater than 50% chance of a duplicate. Try it out:
Random baseRandom = new Random(0);
int DuplicateIntegerTest(int iterations)
{
    Random r = new Random(baseRandom.Next());
    int[] ints = new int[iterations];
    for (int i = 0; i < ints.Length; i++)
    {
        ints[i] = r.Next(); // non-negative Int32
    }
    Array.Sort(ints);
    for (int i = 1; i < ints.Length; i++)
    {
        if (ints[i] == ints[i - 1])
            return 1;
    }
    return 0;
}
void DoTest()
{
    baseRandom = new Random(0);
    int count = 0;
    int duplicates = 0;
    for (int i = 0; i < 1000; i++)
    {
        count++;
        duplicates += DuplicateIntegerTest(77163);
    }
    Console.WriteLine("{0} iterations had {1} with duplicates", count, duplicates);
}
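1000 iterations had 737 with duplicates
That is above 50% because Random.Next() returns non-negative Int32 values, which carry only 31 random bits; with 2^31 possible values, 77,163 draws give about a 75% chance of a duplicate.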
Now, 128 bits is a lot, so it takes a very large number of items before the chance of a collision becomes significant. Using the same approximation, you would need the following number of records for the given odds:
0.8 billion billion for a 1/1000 chance of a collision occurring
21.7 billion billion for 50% chance of a collision occurring
39.6 billion billion for 90% chance of a collision occurring
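These figures (and the 77,163 used for 32 bits) come from inverting the same birthday approximation: n ≈ sqrt(2N · ln(1 / (1 - p))) random values give a collision probability p among N = 2^bits possibilities. A C# sketch, with ValuesNeeded as a hypothetical name:

```csharp
using System;

// n ~ sqrt(2 * N * ln(1 / (1 - p))) with N = 2^bits possible values.
static double ValuesNeeded(double p, int bits) =>
    Math.Sqrt(2 * Math.Pow(2, bits) * Math.Log(1 / (1 - p)));

Console.WriteLine($"32 bits, p = 0.5: {ValuesNeeded(0.5, 32):N0}");
foreach (double p in new[] { 0.001, 0.5, 0.9 })
    Console.WriteLine($"128 bits, p = {p}: {ValuesNeeded(p, 128):E2}");
```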
There are about 1E14 emails sent per year so it would be about 400,000 years at this level before you would have a 90% chance of having two with the same GUID, but that is a lot different than saying you need to run a computer 83 billion times the age of the universe or that the sun would go cold before finding a duplicate.
Aren't you all missing a major point?
I thought GUIDs were generated using two things which make the chances of them being globally unique quite high. One is that they are seeded with the MAC address of the machine that you are on, and two, they use the time that they were generated plus a random number.
So unless you run it on the actual machine and run all your guesses within the smallest amount of time that the machine uses to represent a time in the GUID, you will never generate the same number no matter how many guesses you take using the system call.
I guess knowing the actual way a GUID is made would shorten the time to guess quite substantially.
Tony
You could hash the GUIDs. That way, you should get a result much faster.
Oh, of course, running multiple threads at the same time is also a good idea, that way you'll increase the chance of a race condition generating the same GUID twice on different threads.
Random (version 4) GUIDs effectively have only 122 random bits: 4 bits hold the version number and 2 more hold the variant.
Go to the cryogenics lab in New York City.
Freeze yourself for (roughly) 1990 years.
Get a job at Planet Express.
Buy a brand-new CPU. Build a computer, run the program, and place it in a safe place with a pseudo-perpetual motion machine like the doomsday machine.
Wait until the time machine is invented.
Jump to the future using the time machine. If you bought 1YHz 128bit CPU, go to 3,938,453,320 days 20 hours 15 minutes 38 seconds 463 ms 463 μs 374 ns 607 ps after when you started to run the program.
...?
PROFIT!!!
... It takes at least 10,783,127 years even if you had 1YHz CPU which is 1,000,000,000,000,000 (or 1,125,899,906,842,624 if you prefer to use binary prefix) times faster than 1GHz CPU.
So rather than waiting for the computation to finish, it would be better to feed the pigeons that lost their homes because the other n pigeons took them. :(
Or, you can wait until 128-bit quantum computer is invented. Then you may prove that GUID is not unique, by using your program in reasonable time(maybe).
Have you tried begin = begin + new BigInteger((long)1) in place of begin++?
If the number of UUID being generated follows Moore's law, the impression of never running out of GUID in the foreseeable future is false.
With 2 ^ 128 UUIDs, it will only take 18 months * Log2(2^128) ~= 192 years, before we run out of all UUIDs.
And I believe (with no statistical proof whatsoever) that in the past few years since mass adoption of UUIDs, the speed at which we are generating UUIDs is increasing way faster than Moore's law dictates. In other words, we probably have less than 192 years until we have to deal with a UUID crisis; that's a lot sooner than the end of the universe.
But since we definitely won't be running them out by the end of 2012, we'll leave it to other species to worry about the problem.
The odds of a bug in the GUID generating code are much higher than the odds of the algorithm generating a collision. The odds of a bug in your code to test the GUIDs is even greater. Give up.
The program, despite its errors, shows proof that a GUID is not unique. Those who try to prove the contrary are missing the point. This statement just proves the weak implementation of some of the GUID variations.
A GUID is not necessarily unique by definition; it is highly unique by definition. You just refined the meaning of "highly". Depending on the version, the implementer (MS or others), the use of VMs, etc., your definition of "highly" changes. (See the link in an earlier post.)
You can shorten your 128 bit table to prove your point. The best solution is to use a hash formula to shorten your table with duplicates, and then use the full value once the hash collides and based on that re-generate a GUID. If running from different locations, you would be storing your hash/full key pairs in a central location.
Ps: If the goal is just to generate x number of different values, create a hash table of this width and just check on the hash value.
Not to p**s on the bonfire here, but it does actually happen. Yes, I understand the joking you have been giving this guy, but the GUID is unique only in principle. I bumped into this thread because there is a bug in the WP7 emulator which means every time it boots it gives out the SAME GUID the first time it is called! So, while in theory you cannot have a conflict, if there is a problem generating said GUID, then you can get duplicates.
http://forums.create.msdn.com/forums/p/92086/597310.aspx#597310
Since part of Guid generation is based on the current machine's time, my theory to get a duplicate Guid is:
Perform a clean installation of Windows
Create a startup script that resets the time to 2010-01-01 12:00:00 just as Windows boots up.
Just after the startup script, it triggers your application to generate a Guid.
Clone this Windows installation, so that you rule out any subtle differences that may occur in subsequent boot-ups.
Re-image the hard drive with this image and boot-up the machine a few times.
For me, the time it takes for a single core to generate a UUIDv1 guarantees it will be unique. Even in a multi-core situation, if the UUID generator only allows one UUID to be generated at a time for your specific resource (keep in mind that multiple resources can totally utilize the same UUIDs, however unlikely, since the resource is inherently part of the address), then you will have more than enough UUIDs to last you until the timestamp burns out. At which point I really doubt you would care.
Here's a solution, too:
#include <QUuid>
#include <QString>
#include <iostream>
int main()
{
    QUuid uuid;
    while ( (uuid = QUuid::createUuid()) != QUuid::createUuid() ) { }
    std::cout << "Aha! I've found one! " << qPrintable( uuid.toString() ) << std::endl;
}
Note: requires Qt, but I guarantee that if you let it run long enough, it might find one.
(Note note: actually, now that I'm looking at it, there may be something about the generation algorithm that prevents two subsequently generated uuids that collide--but I kinda doubt it).
The only solution to prove GUIDs are not unique would be to have a World GUID Pool. Each time a GUID is generated somewhere, it should be registered with the organization. Or heck, we might include a standard that all GUID generators need to register automatically, and for that they need an active internet connection!

Creating GUIDs with a set Prefix

I wonder if there is a way to generate valid GUIDs/UUIDs where the first (or any) part is a user-selected prefix.
I.e., the GUID has the format AAAAAAAA-BBBB-CCCC-DDDD-DDDDDDDDDDDD, and I want to set any part to a pre-defined value (ideally the AAA's). The goal is to have GUIDs still globally unique, but they do not need to be cryptographically safe.
Sorry, you want too much from a GUID. Summarizing from both your question and your own answer/update, you want it to
1. Be a GUID
2. Not collide with any other GUID (be globally unique)
3. Ignore the standard on the interpretation of the first bits, using a reserved value
4. Use a personal scheme for the remaining bits
This is impossible, proof:
If it were possible, I could generate a GUID G1 and you could generate another GUID G2. Since we both ignore the standard and use the same reserved prefix, and my personal scheme for the other bits is outside your control, my GUID G1 can clash with your GUID G2. The non-collision property of GUIDs follows from sticking to the GUID standard.
The mechanisms to prevent collisions are indeed inherently privacy-sensitive. If I generate at random a GUID G1, I can guarantee that random GUID is unique if two conditions are satisfied:
1. It's a member of the subset of GUIDs under my control, and
2. I didn't generate the GUID before.
For GUIDs outside the subset under your control, you cannot guarantee (2). But how do you assign non-overlapping subsets of GUIDs to a single person? Using the MAC of a NIC is a simple, effective way. Other means are also possible. But in any case, the mere existence of such a subset is privacy-implicating. It's got to belong to someone, and I must be able to determine whether that's me or someone else. It's a bit harder to prove whether two random GUIDs G1 and G2 belong to the same subset (i.e. person), but the current schemes (which you object to) do not try to hide that.
Hmmm...so, you'd basically like a 12 byte GUID? Since, once you remove the uniqueness of the first 4 bytes (your AAA's), you've broken the existing algorithm - you'll need to come up with your own algorithm.
According to the relevant RFC, the GUID format breaks down to:
UUID = time-low "-" time-mid "-"
time-high-and-version "-"
clock-seq-and-reserved
clock-seq-low "-" node
time-low = 4hexOctet
time-mid = 2hexOctet
time-high-and-version = 2hexOctet
clock-seq-and-reserved = hexOctet
clock-seq-low = hexOctet
node = 6hexOctet
hexOctet = hexDigit hexDigit
hexDigit =
"0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" /
"a" / "b" / "c" / "d" / "e" / "f" /
"A" / "B" / "C" / "D" / "E" / "F"
The only static data in there is version (4 bits) and reserved/variant (2-3 bits). I don't see that they allowed for any "user specified" versions, but I'd say you'll be safe for the foreseeable future if you use 1111 as your version identifier. The existing versions are in section 4.1.3, but only 5 have been defined so far...that gives you 11 more revisions before collision.
So, if you can live with 6 or 7 bits of distinctness, a combination of Guid.NewGuid().ToByteArray() and creating a new Guid after your bit fiddling should get you there.
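A minimal C# sketch of that bit fiddling, assuming you take this answer's suggestion of the unassigned version value 1111 (15): start from Guid.NewGuid().ToByteArray(), overwrite the version nibble, and build a new Guid. Because the first three fields are stored little-endian in that array, the version nibble is the top half of byte 7. WithCustomVersion is a hypothetical name, and version 15 is not defined by RFC 4122.

```csharp
using System;

// Replaces the 4-bit version field of a GUID. Data3 is stored little-endian in
// ToByteArray(), so its high nibble (the version) is the top half of byte 7.
static Guid WithCustomVersion(Guid source, int version)
{
    byte[] bytes = source.ToByteArray();
    bytes[7] = (byte)((bytes[7] & 0x0F) | (version << 4));
    return new Guid(bytes);
}

Guid marked = WithCustomVersion(Guid.NewGuid(), 0xF);
Console.WriteLine(marked); // the 13th hex digit is now 'f'
```

Everything else about the GUID, including the variant bits, is left as NewGuid produced it.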
It's not possible to create GUIDs/UUIDs where the first (or any) part is a user-selected prefix, but you can write your own function to create a unique ID with the same number (36/38) of characters...
I recently had a similar need - I needed a GUID that was:
created by the standard guid algorithms, and therefore has a chance of being globally unique
has a defined prefix.
As you might imagine, I was doing something I shouldn't have.
You mention in one of your comments that you could just let the GUID generator run until it happens to hit upon a guid with the prefix you need. That's the tactic I took. Here's the code:
using System;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string target_prefix = "dead";
            while (true)
            {
                Guid g = Guid.NewGuid();
                string gs = g.ToString();
                if (gs.Substring(0, target_prefix.Length) == target_prefix)
                {
                    Console.WriteLine("Match: " + gs);
                }
                else
                {
                    //Console.WriteLine("Mismatch: " + gs);
                }
            }
        }
    }
}
For smaller prefixes it produces matches more quickly. I'd expect each additional digit of target prefix to make it take about 16x as long on average.
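A variant of that loop that stops at the first match and counts attempts. If the k prefix digits are random (the first 8 hex digits of a version-4 GUID are), each GUID matches with probability 16^-k, so the expected number of attempts is 16^k, which is where the factor of 16 per digit comes from. FindGuidWithPrefix is a hypothetical name, and the case-insensitive comparison is a choice made in this sketch.

```csharp
using System;

// Generates GUIDs until one starts with 'prefix'; returns it with the attempt count.
static (Guid Found, long Attempts) FindGuidWithPrefix(string prefix)
{
    long attempts = 0;
    while (true)
    {
        attempts++;
        Guid g = Guid.NewGuid();
        if (g.ToString().StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return (g, attempts);
    }
}

var match = FindGuidWithPrefix("de");   // about 16^2 = 256 attempts on average
Console.WriteLine($"{match.Found} after {match.Attempts} attempts");
```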
You can simply create a Guid and change the prefix to whatever you wish.
I have seen this in an open-source project, where the same question came up and was solved by generating GUIDs until one matched the desired prefix (ugh!).
Guid g = Guid.NewGuid();
string gs = g.ToString();
Guid f = new Guid(string.Format("{0}-{1}", "AAAAAAAA", gs.Substring(gs.IndexOf('-') + 1)));
Not nice, but works.
What bothered me in other posts on this subject is the claim that a GUID is globally unique. That's wrong in all cases: it just has enough room to generate unique GUIDs, but nothing guarantees global uniqueness. Time is not even considered when generating a GUID.
Thanks. My problem with these attempts is that they are not guaranteed to be globally unique, as Raymond Chen pointed out. I was wondering if there is another algorithm that generates GUIDs that are unique. I remember that there used to be implementations that used a Timestamp and/or the NIC MAC Address, but they are not used anymore since they are not cryptographic strong and/or there were privacy concerns.
I wonder: if I just make up my own, I should be fine? According to Wikipedia:
One to three of the most significant bits of the second byte in Data 4 define the type variant of the GUID:
Pattern   Description
0         Network Computing System backward compatibility
10        Standard
110       Microsoft Component Object Model backward compatibility; this includes the GUID's for important interfaces like IUnknown and IDispatch.
111       Reserved for future use.
The most significant four bits of Data3 define the version number, and the algorithm used.
So if I make up something in Data3/Data4, I would normally create my own implementation that should not clash with any other GUID, but of course there is always a bit of risk associated with that, so before I do that I wanted to check if there is an older, no-longer-used algorithm that generates truly unique IDs.
