Should I Use Path.GetRandomFileName or use a Guid? - c#

I need to generate unique folder names, should I use Path.GetRandomFileName or just use Guid.NewGuid?
Guids say they are globally unique, GetRandomFileName does not make such a claim.

I think both are equally random, the difference being that Path.GetRandomFileName produces an 8.3 filename (11 characters in total, not counting the dot), so it has a smaller set of possible names than Guid.NewGuid.
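To make the comparison concrete, here is a minimal sketch that creates a uniquely named folder with each approach. The class and method names (UniqueFolders, CreateWithRandomFileName, CreateWithGuid) are made up for illustration and do not come from the question.

```csharp
using System;
using System.IO;

public static class UniqueFolders
{
    // Path.GetRandomFileName returns an 8.3-style name: 8 characters, a dot, 3 characters.
    public static string CreateWithRandomFileName(string root)
    {
        string path = Path.Combine(root, Path.GetRandomFileName());
        Directory.CreateDirectory(path);
        return path;
    }

    // The "N" format renders the GUID as 32 hex digits without dashes or braces.
    public static string CreateWithGuid(string root)
    {
        string path = Path.Combine(root, Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(path);
        return path;
    }
}
```

Note that Directory.CreateDirectory does not fail when the folder already exists, so if a name collision matters, a Directory.Exists check before creating the folder is probably worth adding.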

Related

is there any way I can generate a unique number that is NOT as long as a UUID (GUID)? [duplicate]

Basically I need to generate a unique number but I don't want it to be too long, such as a UUID. Something half of that size (if not smaller).
Can anyone think of any ways to do this?
Basically I'm going to have an app which might be in use by multiple people and the app generates files and uploads them to the web server. Those names need to be unique.
I'm not looking to use a database table to keep track of this stuff, by the way.
Generate a UUID, and only take the first half of the string.
If you're concerned about generating duplicate IDs, your options are to make them non-random and auto-increment, or to check for the existence of newly generated IDs:
do {
    newId = generateNewId();
} while (idExists(newId));
If you need it unique and short, go with a UUID and use a URL shortener.
Piqued my curiosity:
// create a 32-bit uid:
var i = BitConverter.ToUInt32(Guid.NewGuid().ToByteArray(), (new Random()).Next(0, 12));
// create a 64-bit uid
var l = BitConverter.ToUInt64(Guid.NewGuid().ToByteArray(), (new Random()).Next(0, 8));
Of course the following may be equally applicable because you lose most of the features of a guid when you truncate it, and might as well resort to a random number:
l = BitConverter.ToUInt64(BitConverter.GetBytes((new Random()).NextDouble()), 0);
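... if you're looking for a 64-bit integer. Note that the bits of a double in [0, 1) are not uniformly distributed (the sign bit and most of the exponent bits barely vary), so filling an 8-byte buffer with Random.NextBytes is a better source of 64 random bits.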

Generate unique hash from filename

I'm looking to generate a unique random hash that has a minuscule chance of being duplicated. It should only contain numbers, and I want it to be 4 characters long. I have the file path in the form of
filepath = "c:\\users\\john\\filename.csv"
Now, I'd like to only select the "filename" part of that string and create a hash from that filename, though I want it to be different each time so if two users upload a similarly named file it will likely generate a different hash code. What's the best way to go about doing this?
I will be using this hash to append "001", "002", etc. on to create student IDs.
Generating a unique hash from a file's filename is fairly simple.
However...
It should only contain numbers, and I want it to be 4 characters long.
With only 4 numeric characters there are just 10,000 possible values, so you are guaranteed a collision once you have 10,001 different files, and you will likely hit one far sooner (the odds pass 50% at roughly 120 files). This makes it impossible to have a "minuscule chance of being duplicated".
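The collision odds can be checked with a short calculation. This is a sketch that assumes the hashes behave like uniformly random values; the class and method names are hypothetical.

```csharp
using System;

public static class BirthdayBound
{
    // Probability that at least two of `items` uniformly random values
    // drawn from `slots` possibilities are equal.
    public static double CollisionProbability(int items, int slots)
    {
        if (items > slots) return 1.0; // pigeonhole: a collision is certain

        double noCollision = 1.0;
        for (int i = 0; i < items; i++)
        {
            // The (i+1)-th value must avoid the i values already taken.
            noCollision *= (double)(slots - i) / slots;
        }
        return 1.0 - noCollision;
    }
}
```

Calling BirthdayBound.CollisionProbability(120, 10000) gives a value just above one half, which is why a 4-digit code cannot stay collision-free for long.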
Edit in response to comments:
You could do some simple type of hash, though this will give quite a few collisions:
string ComputeFourDigitStringHash(string filepath)
{
    string filename = System.IO.Path.GetFileNameWithoutExtension(filepath);
    // GetHashCode can be negative, so take the absolute value of the remainder.
    int hash = Math.Abs(filename.GetHashCode() % 10000);
    return hash.ToString("0000");
}
This will give you a 4 digit "hash" from the filename portion of the string. Note that it will have a lot of collisions, but it will give you something you can use.

Creating unique URLs in ASP.NET

In my website, I need to create unique URLs that an admin user would use to send it to a group of users. The unique URL is created whenever an admin creates a new form. I understand I can use a guid to represent unique URLs, but I am looking for something shorter (hopefully around 4 characters, since it's easier to remember). How would I generate a unique URL in ASP.NET that would look like this:
http://mydomain.com/ABCD
I understand some of the URL shortener websites (like bit.ly) do something like this with a very short unique URL. Is there an algorithm I can use?
How about something like
public static string GetRandomString(int length)
{
    string charPool = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890";
    StringBuilder sb = new StringBuilder();
    Random rnd = new Random();
    while ((length--) > 0)
        sb.Append(charPool[(int)(rnd.NextDouble() * charPool.Length)]);
    return sb.ToString();
}
and call
GetRandomString(4);
Just write an algorithm to select a certain number of characters from a GUID (e.g. the first 4 or 8 characters, every even character up to 4 or 8 characters.)
Be sure to check it against the database to make sure it isn't already in use, and if it is regenerate it. As a safeguard, maybe make a timeout (if it tries to generate 10 and they're all in use, give up,) but it's unlikely to use every possible combination.
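The check-and-regenerate idea with a retry limit could look roughly like this. It is only a sketch: the HashSet is a hypothetical stand-in for the database lookup, and the names ShortCodes and NextUnique are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ShortCodes
{
    private const string Pool =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    private static readonly Random Rng = new Random();

    // Hypothetical stand-in for the database table of codes already in use.
    private static readonly HashSet<string> Used = new HashSet<string>();

    public static string NextUnique(int length, int maxAttempts = 10)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            char[] chars = new char[length];
            for (int i = 0; i < length; i++)
                chars[i] = Pool[Rng.Next(Pool.Length)];

            string code = new string(chars);
            if (Used.Add(code)) // Add returns false when the code is already taken
                return code;
        }
        throw new InvalidOperationException(
            "No unused code found after " + maxAttempts + " attempts.");
    }
}
```

System.Random is not thread-safe, so a real web application would need a lock around it or a per-request generator, and the uniqueness check would have to be enforced by the database itself (for example with a unique constraint).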
I believe bit.ly performs a hash and then base64 encodes the result. You could do the same, although it'll be more than 4 characters. Be sure to add code that handles hashing collisions. You could append 1, 2, 3, etc. when the first hash is in use.
Another approach is to create a new table in a database. Every time you need a new URL, add a row to this table. You could use the PK as the URL value. This will give you up to 10,000 unique values using only four characters. Base64 encode for even more.
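As a sketch of the primary-key idea, the row ID can be rendered in a larger alphabet to keep the URL short. Base 62 (digits and letters only) is an assumption swapped in here for Base64, because it avoids the '+' and '/' characters that are awkward in URLs. The class name and alphabet order are illustrative choices.

```csharp
using System;
using System.Text;

public static class Base62
{
    private const string Digits =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Converts a non-negative database ID into a short URL token.
    public static string Encode(long id)
    {
        if (id < 0) throw new ArgumentOutOfRangeException(nameof(id));
        if (id == 0) return "0";

        var sb = new StringBuilder();
        while (id > 0)
        {
            sb.Insert(0, Digits[(int)(id % 62)]); // prepend the least significant digit
            id /= 62;
        }
        return sb.ToString();
    }

    // Converts a token back into the database ID.
    public static long Decode(string code)
    {
        long id = 0;
        foreach (char c in code)
        {
            int digit = Digits.IndexOf(c);
            if (digit < 0) throw new FormatException("Invalid character: " + c);
            id = id * 62 + digit;
        }
        return id;
    }
}
```

With this alphabet, four characters cover 62^4 = 14,776,336 IDs instead of 10,000.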

How to Create Deterministic Guids

In our application we are creating Xml files with an attribute that has a Guid value. This value needed to be consistent between file upgrades. So even if everything else in the file changes, the guid value for the attribute should remain the same.
One obvious solution was to create a static dictionary with the filenames and the Guids to be used for them. Then whenever we generate a file, we look up the filename in the dictionary and use the corresponding Guid. But this is not feasible because we might scale to hundreds of files and didn't want to maintain a big list of Guids.
So another approach was to make the Guid the same based on the path of the file. Since our file paths and application directory structure are unique, the Guid should be unique for that path. So each time we run an upgrade, the file gets the same guid based on its path. I found one cool way to generate such 'Deterministic Guids' (Thanks Elton Stoneman). It basically does this:
private Guid GetDeterministicGuid(string input)
{
    // Use an MD5 hash to get a 16-byte hash of the string.
    MD5CryptoServiceProvider provider = new MD5CryptoServiceProvider();
    // Note: Encoding.Default depends on the machine's settings, so the same input
    // can hash differently elsewhere; a fixed encoding such as UTF-8 avoids that.
    byte[] inputBytes = Encoding.Default.GetBytes(input);
    byte[] hashBytes = provider.ComputeHash(inputBytes);
    // Generate a Guid from the hash.
    Guid hashGuid = new Guid(hashBytes);
    return hashGuid;
}
So given a string, the Guid will always be the same.
Are there any other approaches or recommended ways to doing this? What are the pros or cons of that method?
As mentioned by @bacar, RFC 4122 §4.3 defines a way to create a name-based UUID. The advantage of doing this (over just using an MD5 hash) is that these are guaranteed not to collide with non-name-based UUIDs, and have a very (very) small possibility of collision with other name-based UUIDs.
There's no native support in the .NET Framework for creating these, but I posted code on GitHub that implements the algorithm. It can be used as follows:
Guid guid = GuidUtility.Create(GuidUtility.UrlNamespace, filePath);
To reduce the risk of collisions with other GUIDs even further, you could create a private GUID to use as the namespace ID (instead of using the URL namespace ID defined in the RFC).
This will convert any string into a Guid without having to import an outside assembly.
public static Guid ToGuid(string src)
{
    byte[] stringbytes = Encoding.UTF8.GetBytes(src);
    byte[] hashedBytes = new System.Security.Cryptography
        .SHA1CryptoServiceProvider()
        .ComputeHash(stringbytes);
    Array.Resize(ref hashedBytes, 16);
    return new Guid(hashedBytes);
}
There are much better ways to generate a unique Guid, but this is a way to consistently upgrade a string data key to a Guid data key.
As Rob mentions, your method doesn't generate a UUID, it generates a hash that looks like a UUID.
RFC 4122 on UUIDs specifically allows for deterministic (name-based) UUIDs: versions 3 and 5 use MD5 and SHA-1, respectively. Most people are probably familiar with version 4, which is random. Wikipedia gives a good overview of the versions. (Note that the word 'version' here seems to describe a 'type' of UUID; version 5 doesn't supersede version 4.)
There seem to be a few libraries out there for generating version 3/5 UUIDs, including the Python uuid module, boost.uuid (C++) and OSSP UUID. (I haven't looked for any .NET ones.)
You need to make a distinction between instances of the class Guid, and identifiers that are globally unique. A "deterministic guid" is actually a hash (as evidenced by your call to provider.ComputeHash). Hashes have a much higher chance of collisions (two different strings happening to produce the same hash) than Guids created via Guid.NewGuid.
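For reference, a version 5 (SHA-1, name-based) UUID can be sketched in C# along the lines of RFC 4122 §4.3. This is an illustrative sketch rather than a vetted library: the names NameBasedGuid and CreateV5 are invented here, and encoding the name as UTF-8 is an assumption. The byte swapping is needed because .NET stores the first three Guid fields little-endian, while the RFC works in network byte order.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class NameBasedGuid
{
    // Converts between .NET's mixed-endian Guid layout and network byte order.
    private static void SwapByteOrder(byte[] g)
    {
        Array.Reverse(g, 0, 4); // Data1
        Array.Reverse(g, 4, 2); // Data2
        Array.Reverse(g, 6, 2); // Data3
    }

    public static Guid CreateV5(Guid namespaceId, string name)
    {
        byte[] ns = namespaceId.ToByteArray();
        SwapByteOrder(ns);
        byte[] nameBytes = Encoding.UTF8.GetBytes(name); // assumption: UTF-8 names

        // Hash the namespace ID followed by the name.
        byte[] data = new byte[ns.Length + nameBytes.Length];
        Buffer.BlockCopy(ns, 0, data, 0, ns.Length);
        Buffer.BlockCopy(nameBytes, 0, data, ns.Length, nameBytes.Length);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
        {
            hash = sha1.ComputeHash(data);
        }

        // Keep the first 16 bytes, then stamp the version and variant bits.
        byte[] result = new byte[16];
        Array.Copy(hash, result, 16);
        result[6] = (byte)((result[6] & 0x0F) | 0x50); // version 5
        result[8] = (byte)((result[8] & 0x3F) | 0x80); // variant 10

        SwapByteOrder(result);
        return new Guid(result);
    }
}
```

A call might look like NameBasedGuid.CreateV5(myNamespace, filePath), where myNamespace is either one of the namespace IDs listed in the RFC or a private Guid generated once and kept constant.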
So the problem with your approach is that you will have to be ok with the possibility that two different paths will produce the same GUID. If you need an identifier that's unique for any given path string, then the easiest thing to do is just use the string. If you need the string to be obscured from your users, encrypt it - you can use ROT13 or something more powerful...
Attempting to shoehorn something that isn't a pure GUID into the GUID datatype could lead to maintenance problems in future...
MD5 is weak, I believe you can do the same thing with SHA-1 and get better results.
BTW, just a personal opinion: dressing an MD5 hash up as a GUID does not make it a good GUID. GUIDs by their very nature are non-deterministic, so this feels like a cheat. Why not just call a spade a spade and say it's a string-rendered hash of the input? You could do that by using this line, rather than the new Guid line:
string stringHash = BitConverter.ToString(hashBytes);
Here's a very simple solution that should be good enough for things like unit/integration tests:
var rnd = new Random(1234); // Seeded random number generator (deterministic).
var bytes = new byte[16];
rnd.NextBytes(bytes); // Fills every byte with a value from 0x00 to 0xff.
Console.WriteLine(new Guid(bytes));

Creating GUIDs with a set Prefix

I wonder if there is a way to generate valid GUIDs/UUIDs where the first part (or any part) is a user-selected prefix.
I.e., the GUID has the format AAAAAAAA-BBBB-CCCC-DDDD-DDDDDDDDDDDD, and I want to set any part to a pre-defined value (ideally the AAA's). The goal is to have GUIDs still globally unique, but they do not need to be cryptographically safe.
Sorry, you want too much from a GUID. Summarizing from both your question and your own answer/update, you want it to
1. Be a GUID
2. Not collide with any other GUID (be globally unique)
3. Ignore the standard on the interpretation of the first bits, using a reserved value
4. Use a personal scheme for the remaining bits
This is impossible. Proof:
If it were possible, I could generate a GUID G1 and you could generate another GUID G2. Since we both ignore the standard and use the same reserved prefix, and my personal scheme for the other bits is outside your control, my GUID G1 can clash with your GUID G2. The non-collision property of GUIDs follows from sticking to the GUID standard.
The mechanisms to prevent collisions are indeed inherently privacy-sensitive. If I generate at random a GUID G1, I can guarantee that random GUID is unique if two conditions are satisfied:
1. It's a member of the subset of GUIDs under my control, and
2. I didn't generate the GUID before.
For GUIDs outside the subset under your control, you cannot guarantee (2). But how do you assign non-overlapping subsets of GUIDs to a single person? Using the MAC of a NIC is a simple, effective way. Other means are also possible. But in any case, the mere existence of such a subset is privacy-implicating. It's got to belong to someone, and I must be able to determine whether that's me or someone else. It's a bit harder to prove whether two random GUIDs G1 and G2 belong to the same subset (i.e. person) but the current schemes (which you object to) do not try to hide that.
Hmmm...so, you'd basically like a 12 byte GUID? Since, once you remove the uniqueness of the first 4 bytes (your AAA's), you've broken the existing algorithm - you'll need to come up with your own algorithm.
According to the relevant RFC, the GUID format breaks down to:
UUID                   = time-low "-" time-mid "-"
                         time-high-and-version "-"
                         clock-seq-and-reserved
                         clock-seq-low "-" node
time-low               = 4hexOctet
time-mid               = 2hexOctet
time-high-and-version  = 2hexOctet
clock-seq-and-reserved = hexOctet
clock-seq-low          = hexOctet
node                   = 6hexOctet
hexOctet               = hexDigit hexDigit
hexDigit =
      "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" /
      "a" / "b" / "c" / "d" / "e" / "f" /
      "A" / "B" / "C" / "D" / "E" / "F"
The only static data in there is version (4 bits) and reserved/variant (2-3 bits). I don't see that they allowed for any "user specified" versions, but I'd say you'll be safe for the foreseeable future if you use 1111 as your version identifier. The existing versions are in section 4.1.3, but only 5 have been defined so far...that gives you 11 more revisions before collision.
So, if you can live with 6 or 7 bits of distinctness, a combination of Guid.NewGuid().ToByteArray() and creating a new Guid after your bit fiddling should get you there.
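The bit fiddling could be sketched as follows. The helper name WithVersion is hypothetical, and the index used relies on the layout of Guid.ToByteArray, where Data3 is stored little-endian so its most significant byte (the one holding the version nibble) is at index 7.

```csharp
using System;

public static class GuidBits
{
    // Returns a copy of `source` whose four version bits are replaced with `version` (0-15).
    public static Guid WithVersion(Guid source, int version)
    {
        if (version < 0 || version > 15)
            throw new ArgumentOutOfRangeException(nameof(version));

        byte[] bytes = source.ToByteArray();
        // Keep the low nibble of Data3's high byte, overwrite the high nibble.
        bytes[7] = (byte)((bytes[7] & 0x0F) | (version << 4));
        return new Guid(bytes);
    }
}
```

For example, GuidBits.WithVersion(Guid.NewGuid(), 15) yields a Guid whose third group starts with "f", matching the 1111 version identifier suggested above.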
It is not possible to create GUIDs/UUIDs where the first part (or any part) is a user-selected prefix, but you can write your own function to create a unique ID with the same number of characters (36/38).
I recently had a similar need - I needed a GUID that was:
created by the standard guid algorithms, and therefore has a chance of being globally unique
has a defined prefix.
As you might imagine, I was doing something I shouldn't have.
You mention in one of your comments that you could just let the GUID generator run until it happens to hit upon a guid with the prefix you need. That's the tactic I took. Here's the code:
using System;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string target_prefix = "dead";
            // Runs until interrupted, printing every GUID that starts with the prefix.
            while (true)
            {
                Guid g = Guid.NewGuid();
                string gs = g.ToString();
                if (gs.Substring(0, target_prefix.Length) == target_prefix)
                {
                    Console.WriteLine("Match: " + gs);
                }
                else
                {
                    //Console.WriteLine("Mismatch: " + gs);
                }
            }
        }
    }
}
For smaller prefixes it produces matches more quickly. Each additional hex digit in the target prefix should multiply the expected number of attempts by 16, since a random hex digit matches with probability 1/16.
You can simply create a Guid and change the prefix to whatever you wish it to be.
I have seen this in an OS project, where the same question came up and was solved by generating GUIDs until one matched the desired prefix (ugh!).
Guid g = Guid.NewGuid();
string gs = g.ToString();
Guid f = new Guid(string.Format("{0}-{1}", "AAAAAAAA", gs.Substring(gs.IndexOf('-') + 1)));
Not nice, but works.
What bothered me about other posts on this subject is the claim that a GUID is guaranteed to be globally unique. That's wrong in all cases: the format just has enough room to generate unique GUIDs, but nothing guarantees global uniqueness. Even time is not considered in generating a GUID.
Thanks. My problem with these attempts is that they are not guaranteed to be globally unique, as Raymond Chen pointed out. I was wondering if there is another algorithm that generates GUIDs that are unique. I remember that there used to be implementations that used a timestamp and/or the NIC MAC address, but they are not used anymore since they are not cryptographically strong and/or there were privacy concerns.
I wonder: if I just make up my own, should I be fine? According to Wikipedia:
One to three of the most significant bits of the second byte in Data 4 define the type variant of the GUID:
Pattern  Description
0        Network Computing System backward compatibility
10       Standard
110      Microsoft Component Object Model backward compatibility; this includes the GUIDs for important interfaces like IUnknown and IDispatch.
111      Reserved for future use.
The most significant four bits of Data3 define the version number, and the algorithm used.
So if I make up something in Data3/Data4, I would normally create my own implementation that should not clash with any other GUID, but of course there is always a bit of risk associated with that, so before I do that I wanted to check if there is an older, no longer used algorithm that generates truly unique IDs.
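Before making up values for those fields, it may help to inspect what existing GUIDs contain. The sketch below reads the version and variant bits from a Guid; it assumes RFC 4122's layout, where the variant bits sit in octet 8 (the first byte of Data4), and the names GuidInspector, GetVersion and GetVariantPattern are invented for the example.

```csharp
using System;

public static class GuidInspector
{
    // Version: the most significant four bits of Data3.
    // Data3 is stored little-endian in ToByteArray, so its high byte is at index 7.
    public static int GetVersion(Guid g)
    {
        return g.ToByteArray()[7] >> 4;
    }

    // Variant pattern as a bit string: "0", "10", "110" or "111".
    public static string GetVariantPattern(Guid g)
    {
        byte b = g.ToByteArray()[8]; // first byte of Data4
        if ((b & 0x80) == 0) return "0";
        if ((b & 0x40) == 0) return "10";
        if ((b & 0x20) == 0) return "110";
        return "111";
    }
}
```

Running this against the well-known IUnknown interface ID (00000000-0000-0000-C000-000000000046) reports the "110" pattern from the table above.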
