Random String generation - avoiding duplicates - c#

I am using the code below to generate random keys like TX8L1I:
public string GetNewId()
{
    string result = string.Empty;
    Random rnd = new Random();
    short codeLength = 5;
    string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    StringBuilder builder = new StringBuilder(codeLength);
    for (int i = 0; i < codeLength; ++i)
        builder.Append(chars[rnd.Next(chars.Length)]);
    result = string.Format("{0}{1}", "T", builder.ToString());
    return result;
}
Every time a key is generated, a corresponding record is created in the database using the generated result as the primary key. Of course this is not safe, since a primary key violation might occur.
What's the correct approach to this? Should I load all existing keys, check whether the new key already exists in that list, and if so generate another one?
Or would it be more efficient to move this logic to the database side? But even on the database side I still need to check that the generated random key does not already exist in the table.
Any ideas?
Thanks

Move the decision to the database. Add a primary key of type uniqueidentifier. Then it's a simple "fire and forget" algorithm: the database decides what the key is.

I think you should create the Random instance outside of your method.
A Random object created without an explicit seed is seeded from the system clock, so if you call your method several times very quickly it will use the same seed each time, and you'll get the same string each time.
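A minimal sketch of that advice, assuming the generator may be called from several threads (System.Random is not thread-safe, hence the lock). KeyGenerator is a hypothetical name; the character set and the "T" prefix are taken from the question.

```csharp
using System;
using System.Text;

public static class KeyGenerator
{
    private const string Chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // One generator for the lifetime of the process instead of one per call,
    // so rapid calls no longer restart from the same clock-based seed.
    private static readonly Random Rnd = new Random();

    // System.Random is not thread-safe; serialize access (assumption: the
    // method may be called from several request threads at once).
    private static readonly object Sync = new object();

    public static string GetNewId(int codeLength = 5)
    {
        var builder = new StringBuilder("T", codeLength + 1);
        lock (Sync)
        {
            for (int i = 0; i < codeLength; i++)
                builder.Append(Chars[Rnd.Next(Chars.Length)]);
        }
        return builder.ToString();
    }
}
```

This only removes the repeated-seed problem. Two calls can still produce the same 5-character code by chance, so the uniqueness check (or the database constraint) is still needed.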

Well, my reason for using random keys is that I don't want users to be able to work out how the keys are generated (the format), since the website is public and I don't want users to be trying to access keys that.
You've merged two really separate problems into one:
Identify a row in a database.
Have an identifier that is passed to and from clients, that matches this, but is not predictable.
This is a specific case of a more general set of two problems:
Identify a row in a database.
Have an identifier that is passed to and from clients, that matches this.
In this case, when we don't care about guessing, then the easiest way to deal with it is to just use the same identifier. E.g. for the database row identified by the integer 42 we use the string "42" and the mapping is a trivial int.Parse() or int.TryParse() in one direction and a trivial .ToString() or implicit .ToString() in the other.
And that's therefore the pattern we use without even thinking about it; public IDs and database keys are the same (perhaps with some conversion).
But the best solution to your specific case, where you want to prevent guessing, is not to change the key but to change the mapping between the key and the public identifier.
First, use auto-incremented integers ("IDENTITY" in SQL Server, and various similar concepts in other databases).
Then, when you send the key to the client (i.e. use it in a form value or append it to a URI), map it like this:
// Requires: using System.Linq; using System.Text;
private const string Seed = "this is my secret seed ¾Ÿˇʣכ ↼⊜┲◗ blah balh";

private static string GetProtectedID(int id)
{
    using (var sha = System.Security.Cryptography.SHA1.Create())
    {
        // 40 hex characters of SHA-1(id + Seed), followed by the id itself.
        byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(id.ToString() + Seed));
        return string.Join("", hash.Select(b => b.ToString("X2"))) + id.ToString();
    }
}
For example, with an ID of 123 this produces "989178D90470D8777F77C972AF46C4DED41EF0D9123".
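Now map back to the key with:
private static bool GetIDFromProtectedID(string str, out int id)
{
    int chkID;
    // Guard against null or too-short input, which would make Substring(40) throw.
    if (str != null && str.Length > 40 && int.TryParse(str.Substring(40), out chkID))
    {
        using (var sha = System.Security.Cryptography.SHA1.Create())
        {
            // Recompute the 40-character hash for the claimed id and compare it with the prefix.
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(chkID.ToString() + Seed));
            if (string.Join("", hash.Select(b => b.ToString("X2"))) == str.Substring(0, 40))
            {
                id = chkID;
                return true;
            }
        }
    }
    id = 0;
    return false;
}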
For "989178D90470D8777F77C972AF46C4DED41EF0D9123" this returns true and sets the id argument to 123. For "989178D90470D8777F77C972AF46C4DED41EF0D9122" (because I tried to guess the ID to attack your site) it returns false and sets id to 0. (The correct ID for key 122 is "F8AD0F55CA1B9426D18F684C4857E0C4D43984BA122", which is not easy to guess from having seen that for 123).
If you want, you can remove some of the 40 characters of the output to produce smaller IDs. This makes it less secure (an attacker has fewer possible IDs to brute-force) but should still be reasonable for many uses.
Obviously, you should use a different value for Seed than the one here, so that someone reading this answer can't use it to predict your IDs; change the seed and you change the IDs. Seed cannot be changed once set without altering every ID in the system. Sometimes that's a good thing (if identifiers are never meant to have long-term value anyway), but normally it's bad (you've just 404'd every page on the site that used one).
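A variant of the same idea, sketched under assumptions: it swaps the SHA-1 of id plus seed for HMAC-SHA256 (HMACSHA256 in System.Security.Cryptography), the standard keyed-hash construction for this kind of tamper check, and keeps only the first TagLength hex characters, as the truncation remark in this answer suggests. ProtectedId, Key and TagLength are hypothetical names, and the key shown is a placeholder for your own secret.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class ProtectedId
{
    // Hypothetical placeholder secret: load the real one from configuration.
    private static readonly byte[] Key = Encoding.UTF8.GetBytes("replace-with-your-own-secret");

    // Hex characters of the tag kept in the public ID (shorter = easier to brute-force).
    private const int TagLength = 16;

    private static string Tag(int id)
    {
        using (var hmac = new HMACSHA256(Key))
        {
            byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(id.ToString(CultureInfo.InvariantCulture)));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash) sb.Append(b.ToString("X2"));
            return sb.ToString(0, TagLength); // truncate the 64-character hex digest
        }
    }

    public static string Encode(int id) => Tag(id) + id.ToString(CultureInfo.InvariantCulture);

    public static bool TryDecode(string str, out int id)
    {
        id = 0;
        if (str == null || str.Length <= TagLength) return false;
        string digits = str.Substring(TagLength);
        // Strict parse: only plain digits, and no leading zeros, map back to an id.
        if (!int.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, out int candidate)) return false;
        if (candidate.ToString(CultureInfo.InvariantCulture) != digits) return false;
        if (Tag(candidate) != str.Substring(0, TagLength)) return false;
        id = candidate;
        return true;
    }
}
```

Because the numeric part is parsed strictly, inputs such as " 123" or "0123" are rejected rather than mapped to 123.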

I would move the logic to the database instead. Even though the database still has to check for the existence of the key, it's already within the database operation at that point, so there is no back-and-forth between the application and the database.
Also, if there's an index on this randomly generated key, a simple IF EXISTS check will be quick enough to determine whether the key has been used before.

You can do the following:
Generate your string and append a timestamp to it:
string newUniqueString = string.Format("{0}{1}", result, DateTime.Now.ToString("yyyyMMddHHmmssfffffff"));
This makes a repeated key much less likely, although two calls within the same clock tick (or on two different servers) can still produce the same value.
Or use a GUID:
string stringGuid = Guid.NewGuid().ToString();

You've committed a typical sin with Random: you shouldn't create a new Random instance
each time you want a random value (otherwise you'll get a badly skewed distribution with many repeats). Move the Random out of the function:
// Requires: using System; using System.Collections.Generic; using System.Text; using System.Threading;
// Let Generator be thread safe: one Random per thread, each with a distinct seed
// so threads started at the same moment don't share a sequence.
private static int s_Seed = Environment.TickCount;
private static ThreadLocal<Random> s_Generator = new ThreadLocal<Random>(
    () => new Random(Interlocked.Increment(ref s_Seed)));

public static Random Generator {
    get {
        return s_Generator.Value;
    }
}

// For the repetition test. You can remove the repetition test if
// the number of generated ids << 10000. HashSet is not thread safe:
// lock around it if GetNewId is called concurrently.
private HashSet<String> m_UsedIds = new HashSet<String>();

public string GetNewId() {
    int codeLength = 5;
    string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    while (true) {
        StringBuilder builder = new StringBuilder(codeLength);
        for (int i = 0; i < codeLength; ++i)
            builder.Append(chars[Generator.Next(chars.Length)]); // <- same generator for all calls
        string result = string.Format("{0}{1}", "T", builder.ToString());
        // Add returns false if the random string is in fact a repetition
        if (!m_UsedIds.Add(result))
            continue;
        return result;
    }
}
For codeLength = 5 there are Math.Pow(chars.Length, codeLength) possible
strings (Math.Pow(36, 5) == 60466176 ≈ 6e7). According to the birthday paradox
you can expect the first repetition after about sqrt(π/2 · 6e7) ≈ 1.25 · sqrt(6e7) ≈ 9,700 ids. If that's OK, you can skip the repetition test; otherwise a HashSet<String> could
be a solution.
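To put numbers on the birthday estimate, here is a small sketch using the standard approximations: the chance of at least one repeat among n codes drawn uniformly from N possibilities is about 1 - exp(-n(n-1)/(2N)), and the expected number of draws before the first repeat is about sqrt(πN/2). Birthday is a hypothetical helper name.

```csharp
using System;

public static class Birthday
{
    // Approximate probability that at least two of n uniformly random codes
    // are equal when there are `space` possible codes.
    public static double CollisionProbability(long n, double space)
        => 1.0 - Math.Exp(-(double)n * (n - 1) / (2.0 * space));

    // Approximate expected number of codes drawn before the first repeat.
    public static double ExpectedDrawsToFirstRepeat(double space)
        => Math.Sqrt(Math.PI * space / 2.0);
}
```

For N = 36^5 this gives about 9,700 expected ids before the first repeat, and roughly a 56% chance of at least one repeat after 10,000 ids.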

Related

Auto-generated id with a character I made

string SetTeacherId()
{
    char digit = 'T';
    string id = "";
    var count = db.Teachers.Count();
    if (count > 0)
    {
        var str = db.Teachers.OrderByDescending(a => a.TeacherID).Select(a => a.TeacherID).First();
        string digits = new string(str.Where(char.IsDigit).ToArray());
        string letters = new string(str.Where(char.IsLetter).ToArray());
        int numbers;
        if (!int.TryParse(digits, out numbers))
        {
        }
        id += letters + (++numbers).ToString("D4");
        return id;
    }
    else
        return digit + "0001";
}
Here I have a method called SetTeacherId. To be honest, it was written by my senior. The teacher ID auto-increments. Can someone write simpler code to auto-generate the ID? This is a little bit confusing.
I think the code is quite clear already.
var str = db.Teachers.OrderByDescending(a => a.TeacherID).Select(a => a.TeacherID).First();
This statement gets the largest teacher id.
string digits = new string(str.Where(char.IsDigit).ToArray());
This statement gets the digit part of the id.
int.TryParse(digits, out numbers)
This statement parses the digit string to an int.
id += letters + (++numbers).ToString("D4");
This statement builds the new id from the letter prefix followed by the number plus one, padded to four digits.
The only advice I can give is to add error handling in the empty if block.
If an explanation can count as an answer, this is mine.
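As a possibly simpler version of SetTeacherId, here is a sketch of the same logic as a pure function that takes the largest existing id, assuming ids are always a single letter followed by digits (e.g. "T0041"). NextTeacherId is a hypothetical name, and it throws instead of silently continuing when the number part can't be parsed.

```csharp
using System;
using System.Globalization;

public static class TeacherIds
{
    // Given the largest existing id (or null/empty when the table is empty),
    // return the next id, e.g. "T0041" -> "T0042".
    public static string NextTeacherId(string lastId, char prefix = 'T')
    {
        if (string.IsNullOrEmpty(lastId))
            return prefix + "0001";

        // Everything after the one-letter prefix is the number part.
        if (!int.TryParse(lastId.Substring(1), NumberStyles.None, CultureInfo.InvariantCulture, out int number))
            throw new FormatException("Unexpected teacher id format: '" + lastId + "'");

        // "D4" pads to at least four digits.
        return prefix + (number + 1).ToString("D4", CultureInfo.InvariantCulture);
    }
}
```

It could be called as TeacherIds.NextTeacherId(db.Teachers.OrderByDescending(a => a.TeacherID).Select(a => a.TeacherID).FirstOrDefault()). Two caveats apply to both versions: two users inserting at the same time can compute the same id, and once ids reach "T10000" string ordering puts "T9999" first, which is one more reason to prefer an IDENTITY column.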
First of all, it would have helped to say what data type the ID column in your table has.
We can only assume you're using NVARCHAR(x) to represent the IDs.
I'm not sure whether you can or want to change your tables, but you can make the ID column a numeric IDENTITY column, where you define the seed (starting value) and the increment (how much bigger each next value will be), and let the database management system deal with it.
Example: DBA Stack Exchange topic
But since you're doing C# development, I assume you'd rather use a programmatic approach.
What your function does is take the letters used in the previous ID and its number, increment the number by one and pad it so it contains 4 digits (hence the D4 in the ToString call).
Example: Teacher0001
Teacher0002
Teacher0003
...
Teacher9999
To make it simpler, you could have made the ID a numeric data type, read the biggest one,
incremented it by 1 and inserted it with the other values. Usually IDs are numeric and not really descriptive, as they are meant for machines to process quickly, not for humans.
But it's much simpler and cleaner to let the DB deal with it, as described in my 3rd paragraph.

Comparing strings multiple times

I'm generating random scripts, but I have to guarantee that each new one is unique (hasn't been repeated before). So basically each script that has already been generated gets compared against every new script.
Instead of just using normal string compare, I'm thinking there must be a way to hash each new script so that comparison will be faster.
Any ideas on how to hash strings to make multiple comparisons faster?
One way is to use a HashSet<String>
The HashSet<T> class provides high-performance set operations. A set is
a collection that contains no duplicate elements, and whose elements
are in no particular order.
HashSet<string> scripts = new HashSet<string>();
string generated_script = "some_text";
if (!scripts.Contains(generated_script)) // if the HashSet doesn't already contain your string, add it
{
    scripts.Add(generated_script);
}
Also, you can check for duplicate items in an array,
but this is not very efficient compared to a HashSet<String>:
// Requires: using System.Linq;
string[] array = new[] {"demo", "demo", "demo"};
// Number of distinct values that occur more than once (1 here: "demo")
int duplicates_count = array.GroupBy(x => x).Count(g => g.Count() > 1);
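HashSet<T>.Add already performs the lookup and returns false when the item is present, so the Contains-then-Add pair can be collapsed into one call. A sketch; ScriptRegistry and TryRegister are hypothetical names, and StringComparer.Ordinal is chosen so scripts are compared character by character.

```csharp
using System;
using System.Collections.Generic;

public class ScriptRegistry
{
    private readonly HashSet<string> _seen = new HashSet<string>(StringComparer.Ordinal);

    // True if the script is new (and records it); false if it was generated before.
    public bool TryRegister(string script) => _seen.Add(script);

    public int Count => _seen.Count;
}
```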
Use HashSet like below
string uniqueCode= "ABC";
string uniqueCode1 = "XYZ";
string uniqueCode2 = "ABC";
HashSet<string> uniqueList = new HashSet<string>();
uniqueList.Add(uniqueCode);
uniqueList.Add(uniqueCode1);
uniqueList.Add(uniqueCode2);
If you check uniqueList.Count you will see 2, so "ABC" is not stored twice.
You could use a HashSet: a hash set is guaranteed never to contain duplicates.
Store the script along with its hash:
class ScriptData
{
    public ScriptData(string script)
    {
        this.ScriptHash = script.GetHashCode();
        this.Script = script;
    }

    public int ScriptHash { get; private set; }
    public string Script { get; private set; }
}
Then, whenever you need to check whether a new random script is unique, take the hash code of the new script and search all your ScriptData instances for any with the same hash code. If you don't find any, you know your new random script is unique. If you do find some, they may be the same, and you'll have to compare the actual text of the scripts to see whether they're identical.
You can store each generated string in a HashSet.
For each new string, call the Contains method, which runs in O(1) average time. This is an easy way to decide whether the newly generated string was generated before.
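The two-step check described here (compare hash codes first, compare text only on a match) can avoid scanning every ScriptData by keying a dictionary on the hash code. A sketch under that assumption; ScriptIndex is a hypothetical name, and this is essentially what HashSet<string> does internally. Note that string.GetHashCode values are randomized per process on .NET Core, so they should not be persisted.

```csharp
using System.Collections.Generic;

public class ScriptIndex
{
    // Scripts grouped by hash code: only scripts in the same bucket need a text comparison.
    private readonly Dictionary<int, List<string>> _byHash = new Dictionary<int, List<string>>();

    // Stores the script and returns true if no identical script was stored before.
    public bool AddIfUnique(string script)
    {
        int hash = script.GetHashCode();
        if (!_byHash.TryGetValue(hash, out List<string> bucket))
        {
            _byHash[hash] = new List<string> { script };
            return true;               // no script with this hash, so it must be new
        }
        foreach (string existing in bucket)
            if (string.Equals(existing, script))
                return false;          // same hash and same text: a true duplicate
        bucket.Add(script);            // same hash, different text: a hash collision
        return true;
    }
}
```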

Using clock ticks as random number seed

I'm using the current clock ticks as a seed for random number generation. The random number is used in a pseudo GUID and a check in my database will make sure it doesn't already exist before returning. On average, this method will be called around 10k times in succession during the life of the process.
My concern is that an identical number might be generated back to back resulting in multiple unnecessary recursive calls to my database checking for the same ID. I'd like to avoid this if possible. What is the best way to test this scenario?
If it matters, application is .NET 4 and database is SQL Server 2008.
private static string GenerateUniqueDelId()
{
    // Generate a random integer using the current number of clock ticks as seed.
    // Then prefix number with "DEL" and date, finally padding random integer with leading zeros for a fixed 25-character total length.
    int seed = (int)DateTime.Now.Ticks;
    Random number = new Random(seed);
    string id = string.Format("DEL{0}{1}", DateTime.Today.ToString("yyyyMMdd"), number.Next().ToString("D14"));

    // Lookup record with generated ID in Sesame. If one exists, call method recursively.
    string query = "SELECT * FROM Lead WHERE Esm_Id = @Esm_Id";
    SqlParameter[] parameters = { new SqlParameter("@Esm_Id", id) };
    if (DataManager.GetRow(query, parameters, DelConnection.Sesame) != null) return GenerateUniqueDelId();

    // Otherwise, return ID.
    return id;
} //// End GenerateUniqueDelId()
You are right in your concern: You should move the creation of your Random instance out of your method body - otherwise you will re-seed with the same value many times which results in the same number sequence.
Also, you are kind of reinventing the wheel: the default constructor of the Random class already uses the current clock time as its seed.
The question is why don't you avoid all of this and just use an auto-generated Guid on the database side?
Quoting Jon Skeet
When you see the word "random" in a question title on Stack Overflow you can almost guarantee it will be the same fundamental problem as countless similar questions. This article takes a look at why randomness causes so many problems, and how to address them.
Check his article about random number generators
http://csharpindepth.com/Articles/Chapter12/Random.aspx
Basically, his solution looks like this:
using System;
using System.Threading;

public static class RandomProvider
{
    private static int seed = Environment.TickCount;

    private static ThreadLocal<Random> randomWrapper = new ThreadLocal<Random>(() =>
        new Random(Interlocked.Increment(ref seed))
    );

    public static Random GetThreadRandom()
    {
        return randomWrapper.Value;
    }
}
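Putting the answers together, a sketch of the question's method with one shared Random and a loop instead of recursion. The idExists delegate is a hypothetical stand-in for the DataManager.GetRow lookup, and maxAttempts is an assumed safety limit.

```csharp
using System;
using System.Globalization;

public static class DelIdGenerator
{
    // One shared generator instead of a new clock-seeded Random on every call.
    private static readonly Random Rnd = new Random();
    private static readonly object Sync = new object();

    // idExists: hypothetical stand-in for the database lookup in the question.
    public static string GenerateUniqueDelId(Func<string, bool> idExists, int maxAttempts = 100)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            int value;
            lock (Sync) { value = Rnd.Next(); }

            // "DEL" + yyyyMMdd + 14-digit zero-padded number = 25 characters.
            string id = string.Format(CultureInfo.InvariantCulture, "DEL{0}{1}",
                DateTime.Today.ToString("yyyyMMdd", CultureInfo.InvariantCulture),
                value.ToString("D14", CultureInfo.InvariantCulture));

            if (!idExists(id))
                return id; // a loop, so many hits in a row cannot overflow the stack
        }
        throw new InvalidOperationException("No unused id found after " + maxAttempts + " attempts.");
    }
}
```

The check-then-insert still has a race (another process can insert the same id between the check and your insert), so a unique constraint on Esm_Id remains the real guarantee.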

Stack Overflow in random number generator

For some reason, this code works fine when I don't use a seed in the Random class, but if I try to use DateTime.Now to get a more random number, I get a StackOverflowException! My class is really simple. Could someone tell me what I'm doing wrong here? See MakeUniqueFileName.
public class TempUtil
{
    private int strcmp(string s1, string s2)
    {
        try
        {
            for (int i = 0; i < s1.Length; i++)
                if (s1[i] != s2[i]) return 0;
            return 1;
        }
        catch (IndexOutOfRangeException)
        {
            return 0;
        }
    }

    private int Uniqueness(object randomObj)
    {
        switch (randomObj.ToString())
        {
            case "System.Object":
            case "System.String":
                return randomObj.ToString()[0];
            case "System.Int32":
                return int.Parse(randomObj.ToString());
            case "System.Boolean":
                return strcmp(randomObj.ToString(), "True");
            default:
                return Uniqueness(randomObj.ToString());
        }
    }

    public string MakeUniqueFileName()
    {
        return "C:\\windows\\temp\\" + new Random(Uniqueness(DateTime.Now)).NextDouble() + ".tmp";
    }
}
You're calling DateTime.Now.ToString(), which doesn't give you one of the strings you're checking for... so you're recursing, calling it with the same string... which still isn't one of the strings you're looking for.
You don't need to use Random to demonstrate the problem. This will do it very easily:
Uniqueness(""); // Tick, tick, tick... stack overflow
What did you expect it to be doing? It's entirely unclear what your code is meant to be doing, but I suggest you ditch the Uniqueness method completely. In fact, I suggest you get rid of the whole class, and use Path.GetTempFileName instead.
In short:
It should say
switch (randomObj.GetType().ToString())
instead of
switch (randomObj.ToString())
But even then this isn't very clever.
You are passing a DateTime instance to your Uniqueness method.
This falls through and calls itself with ToString - on a DateTime instance this will be a formatted DateTime string (such as "21/01/2011 13:13:01").
Since this string doesn't match any of your switch cases (again), the method calls itself again, but the result of calling ToString on a string is the same string.
You have caused an infinite call stack that results in the StackOverflowException.
There is no need to call Uniqueness: a Random instance created with the parameterless constructor is seeded from the current time anyway.
I suggest reading Random numbers from the C# in depth website.
The parameter-less constructor of Random already uses the current time as seed value. It uses the time ticks, used internally to represent a DateTime.
A problem with this approach, however, is that the system clock ticks very slowly compared to the CPU clock frequency. If you create a new instance of Random each time you need a random value, the clock may not have ticked between two calls, so the same random number is generated twice.
You can simply solve this problem by creating a single instance of Random.
public class TempUtil {
    private static readonly Random random = new Random();

    public string MakeUniqueFileName()
    {
        return @"C:\windows\temp\" + random.NextDouble() + ".tmp";
    }
}
This avoids the repeated-seed problem: consecutive calls advance the same generator instead of restarting from the same seed.
By the way
System.IO.Path.GetTempFileName()
automatically creates an empty temporary file with a unique name and returns the full path of that file.
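If you do want to build the name yourself rather than call Path.GetTempFileName, a sketch using Path.GetTempPath and Path.GetRandomFileName (both in System.IO) avoids hard-coding C:\windows\temp and avoids seeding Random altogether. TempPaths is a hypothetical name; unlike GetTempFileName, this does not create the file, so uniqueness on disk is very likely but not guaranteed.

```csharp
using System.IO;

public static class TempPaths
{
    // A random ".tmp" path in the current user's temp folder; the file is not created.
    public static string MakeUniqueFileName()
        => Path.Combine(Path.GetTempPath(),
                        Path.ChangeExtension(Path.GetRandomFileName(), ".tmp"));
}
```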
Where to begin.
1. There is already a string compare. Use it. It has been debugged.
2. Your Uniqueness function is illogical. The first two case labels return 'S', implicitly converted to an int (83). (Stacking those two labels is legal in C#, so no break is missing there.)
Your third case is like this:
if (x == "System.Int32") return int.Parse("System.Int32");
That always throws a FormatException, because "System.Int32" is not a number.
Your fourth case is like this:
if (x == "System.Boolean") return strcmp("System.Boolean", "True");
which always returns 0, because the first characters already differ.
Your default case calls the method recursively, causing the stack overflow (see comment above).
In order to fix this program, I recommend you read at least one good book on C#, then rethink your program, then write it. Perhaps JavaScript would be a better fit.

Generating the next available unique name in C#

Suppose you have a naming system in your app, where the app contains, say, 100 actions that create new objects, like:
Blur
Sharpen
Contrast
Darken
Matte
...
and each time you use one of these, a new instance is created with a unique editable name, like Blur01, Blur02, Blur03, Sharpen01, Matte01, etc. How would you generate the next available unique name, so that it's an O(1) operation or near constant time. Bear in mind that the user can also change the name to custom names, like RemoveFaceDetails, etc.
It's acceptable to have some constraints, like restricting the number of characters to 100, using letters, numbers, underscores, etc...
EDIT: You can also suggest solutions without "filling the gaps" that is without reusing the already used, but deleted names, except the custom ones of course.
I refer you to Michael A. Jackson's Two Rules of Program Optimization:
Don't do it.
For experts only: Don't do it yet.
Simple, maintainable code is far more important than optimizing for a speed problem that you think you might have later.
I would start simple: build a candidate name (e.g. "Sharpen01"), then loop through the existing filters to see if that name exists. If it does, increment and try again. This is O(N²), but until you get thousands of filters, that will be good enough.
If, sometime later, the O(N²) does become a problem, then I'd start by building a HashSet of existing names. Then you can check each candidate name against the HashSet, rather than iterating. Rebuild the HashSet each time you need a unique name, then throw it away; you don't need the complexity of maintaining it in the face of changes. This would leave your code easy to maintain, while only being O(N).
O(N) will be good enough. You do not need O(1). The user is not going to click "Sharpen" enough times for there to be any difference.
I would create a static integer in the action class that gets incremented and assigned as part of each new instance of the class. For instance:
class Blur
{
    private static int count = 0;
    private string _name;

    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }

    public Blur()
    {
        _name = "Blur" + count++.ToString();
    }
}
Since count is static, each time you create a new class, it will be incremented and appended to the default name. O(1) time.
EDIT
If you need to fill in the holes when you delete, I would suggest the following. It would automatically queue up numbers when items are renamed, but it would be more costly overall:
class Blur
{
    private static int count = 0;
    private static Queue<int> deletions = new Queue<int>();
    private string _name;

    public string Name
    {
        get { return _name; }
        set
        {
            _name = value;
            Delete();
        }
    }

    private int assigned;

    public Blur()
    {
        if (deletions.Count > 0)
        {
            assigned = deletions.Dequeue();
        }
        else
        {
            assigned = count++;
        }
        _name = "Blur" + assigned.ToString();
    }

    public void Delete()
    {
        if (assigned >= 0)
        {
            deletions.Enqueue(assigned);
            assigned = -1;
        }
    }
}
Also, when you delete an object, you'll need to call .Delete() on the object.
CounterClass Dictionary version
class CounterClass
{
    private int count;
    private Queue<int> deletions;

    public CounterClass()
    {
        count = 0;
        deletions = new Queue<int>();
    }

    public string GetNumber()
    {
        if (deletions.Count > 0)
        {
            return deletions.Dequeue().ToString();
        }
        return count++.ToString();
    }

    public void Delete(int num)
    {
        deletions.Enqueue(num);
    }
}
You can create a Dictionary to look up the counter for each string. Just make sure you parse out the index and call .Delete(int) whenever you rename or delete a value.
You can easily do it in O(m), where m is the number of existing instances of the name (and not dependent on n, the number of items in the list).
Look up the string S in question. If S isn't in the list, you're done.
S exists, so construct S+"01" and check for that. Continue incrementing (e.g. next try S+"02") until it doesn't exist.
This gives you unique names but they're still "pretty" and human-readable.
Unless you expect a large number of duplicates, this should be "near-constant" time because m will be so small.
Caveat: What if the string naturally ends with e.g. "01"? In your case this sounds unlikely so perhaps you don't care. If you do care, consider adding more of a suffix, e.g. "_01" instead of just "01" so it's easier to tell them apart.
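The steps in this answer can be sketched as follows, assuming the existing names are kept in a HashSet<string> so each lookup is O(1) on average and the whole search is O(m). NextFreeName is a hypothetical name, and it uses the "_01" style suffix suggested in the caveat.

```csharp
using System.Collections.Generic;

public static class UniqueNames
{
    // Returns baseName if it is free, otherwise baseName_01, baseName_02, ...
    public static string NextFreeName(string baseName, ISet<string> existing)
    {
        if (!existing.Contains(baseName))
            return baseName;

        for (int i = 1; ; i++)
        {
            string candidate = baseName + "_" + i.ToString("00");
            if (!existing.Contains(candidate))
                return candidate;
        }
    }
}
```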
You could do something like this:
private Dictionary<string, int> instanceCounts = new Dictionary<string, int>();

private string GetNextName(string baseName)
{
    int count;
    if (instanceCounts.TryGetValue(baseName, out count))
    {
        // the thing already exists, so add one to it
        count++;
    }
    else
    {
        // first use of this base name (TryGetValue set count to 0, so reset it)
        count = 1;
    }
    // update the dictionary with the new value
    instanceCounts[baseName] = count;
    // format the number as desired
    return baseName + count.ToString("00");
}
You would then just use it by calling GetNextName(...) with the base name you wanted, such as
string myNextName = GetNextName("Blur");
Using this, you wouldn't have to pre-init the dictionary.
It would fill in as you used the various base words.
Also, this is O(1).
I would create a dictionary with a string key and a integer value, storing the next number to use for a given action. This will be almost O(1) in practice.
private IDictionary<String, Int32> NextFreeActionNumbers = null;

private void InitializeNextFreeActionNumbers()
{
    this.NextFreeActionNumbers = new Dictionary<String, Int32>();
    this.NextFreeActionNumbers.Add("Blur", 1);
    this.NextFreeActionNumbers.Add("Sharpen", 1);
    this.NextFreeActionNumbers.Add("Contrast", 1);
    // ... and so on ...
}

private String GetNextActionName(String action)
{
    Int32 number = this.NextFreeActionNumbers[action];
    this.NextFreeActionNumbers[action] = number + 1;
    return String.Format("{0} {1}", action, number);
}
And you will have to check for collisions with user-edited values. Again, a dictionary might be a smart choice. There is no way around that: whatever way you generate your names, the user can always rename an existing item to the next name you will generate, unless you include all existing names in the generation scheme. (Or use a special character that is not allowed in user-edited names, but that would not be very nice.)
Because of the comments on reusing the holes, I want to add this here too: don't reuse the holes created by renaming or deletion. This will confuse the user, because names he deleted or modified will suddenly reappear.
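A sketch combining the two dictionaries this answer describes: a per-action counter plus a set of every name in use (including user renames), so generated names skip numbers a user has taken and never reuse holes. ActionNamer, GetNextActionName and Rename are hypothetical names; the "Blur 1" format follows GetNextActionName in this answer.

```csharp
using System.Collections.Generic;

public class ActionNamer
{
    // Next number to try for each action, e.g. "Blur" -> 3.
    private readonly Dictionary<string, int> _next = new Dictionary<string, int>();

    // Every name currently in use, including names the user typed in.
    private readonly HashSet<string> _inUse = new HashSet<string>();

    public string GetNextActionName(string action)
    {
        _next.TryGetValue(action, out int n);
        if (n == 0) n = 1;
        string name;
        do
        {
            name = action + " " + n;
            n++;
        } while (_inUse.Contains(name)); // skip numbers a user rename already took
        _next[action] = n;               // counter only moves forward: holes are not reused
        _inUse.Add(name);
        return name;
    }

    // Returns false (and changes nothing) if the new name is already taken.
    public bool Rename(string oldName, string newName)
    {
        if (_inUse.Contains(newName)) return false;
        _inUse.Remove(oldName);
        _inUse.Add(newName);
        return true;
    }
}
```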
I would look for ways to simplify the problem.
Are there any constraints that can be applied? As an example, would it be good enough if each user can only have one (active) type of action? Then, the actions could be distinguished using the name (or ID) of the user.
Blur (Ben F)
Blur (Adrian H)
Focus (Ben F)
Perhaps this is not an option in this case, but maybe something else would be possible. I would go to great lengths in order to avoid the complexity in some of the proposed solutions!
If you want O(1) time then just track how many instances of each you have. Keep a hashtable with all of the possible objects, when you create an object, increment the value for that object and use the result in the name.
You're definitely not going to want to expose a GUID to the user interface.
Are you proposing an initial name like "Blur04", letting the user rename it, and then raising an error message if the user's custom name conflicts? Or silently renaming it to "CustomName01" or whatever?
You can use a Dictionary to check for duplicates in O(1) time. You can have incrementing counters for each effect type in the class that creates your new effect instances. Like Kevin mentioned, it gets more complex if you have to fill in gaps in the numbering when an effect is deleted.
