I am working on a programming project for class, and I wanted to add something extra to the project by randomly generating data for it. My issue is that I have a list populating with copies of the same data even though it seems to be generating completely different things each time a new object is created. When I attempt to debug, I encounter very strange behavior. This is my code:
private void PopulateAtRandom(int amount)
{
// create a list of first names from a text file of 200 names
List<string> firstnames = new List<string>();
StreamReader reader = new StreamReader("Random First Names.txt");
while (!reader.EndOfStream)
firstnames.Add(reader.ReadLine());
reader.Close();
// create a list of last names from a text file of 500 names
List<string> lastnames = new List<string>();
reader = new StreamReader("Random Last Names.txt");
while (!reader.EndOfStream)
lastnames.Add(reader.ReadLine());
reader.Close();
// create a list of majors from a text file of 198 majors
List<string> majors = new List<string>();
reader = new StreamReader("Majors.txt");
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
majors.Add(line.Substring(0, line.IndexOf(" - ")));
}
reader.Close();
// create a list of high schools from a text file of 860 schools
List<string> highschools = new List<string>();
reader = new StreamReader("All Illinois High Schools.txt");
while (!reader.EndOfStream)
highschools.Add(reader.ReadLine().Split(',')[0]);
reader.Close();
// create a list of colleges from a text file of 9436 schools
List<string> colleges = new List<string>();
reader = new StreamReader("All US Colleges.txt");
while (!reader.EndOfStream)
colleges.Add(reader.ReadLine());
reader.Close();
students = new List<Student>();
for (int i = 0; i < amount; i++)
{
bool graduate = random.NextDouble() >= 0.5;
string fName = firstnames[random.Next(firstnames.Count)];
string lName = lastnames[random.Next(lastnames.Count)];
string major = majors[random.Next(majors.Count)];
int gradYear = RandomGauss(1950, 2017, 2013, (graduate ? 10 : 4));
string prevSchool = graduate ? colleges[random.Next(colleges.Count)]
: highschools[random.Next(highschools.Count)];
string otherInfo = graduate ? RandomWithDefault<string>(major, 0.05, majors)
: "" + RandomGauss(0, 60, 0, 15) + " transfer credits";
Student student = new Student(graduate, fName, lName, major, gradYear, prevSchool, otherInfo);
students.Add(student); /* I put a breakpoint here for debugging */
}
}
/**
* <summary>
* Return a random integer in the given range based on the specified gaussian distribution
* </summary>
*/
private int RandomGauss(int min, int max, double mean, double sigma){...}
/**
* <summary>
* Randomly return either the default value or a different value based on the given odds
* </summary>
*/
private T RandomWithDefault<T>(T defaultValue, double oddsOfOther, List<T> otherOptions){...}
private void buttonSubmit_Click(object sender, EventArgs e)
{
for (int i = 0; i < students.Count; i++)
{
Student student = students[i];
listBox.Items.Add(student); /* I put another breakpoint here for debugging */
}
}
I have been using PopulateAtRandom(1000); in my constructor. When buttonSubmit_Click() is called, listBox will display one of two things. The first entry is always unique, then either a) the next 500-ish entries are one student and the rest are a second student, or b) the rest of the entries are alternating between two different students. However, when I go to debug, I can see that every new entry into students is unique, as it should be. Then, when I check how listBox.Items is being populated, I find that same pattern of the first few being unique and the rest displaying only two different students. The actual act of debugging seems to affect this as well. For example, I will stop at the first breakpoint 20 times then let the program finish on its own until I reach the second breakpoint. As I stop on the second breakpoint, I find that each of those 20 students, plus another, display properly, then the following 979 follow that same pattern as earlier. I see this same effect no matter how many times I stop at the first breakpoint.
I have tried searching the internet for similar instances of this behavior, but I am not getting anywhere, which is probably because I am not sure how to word this issue. When I search using the title I provided for this question, I do not get anything remotely related to my problem, so if any of you know of a similar issue, please point me in the right direction. My only thought is that this is an issue with memory allocation. The PopulateAtRandom() method is using up a lot of memory with the lists I create before attempting to populate students, so maybe the program is recycling the same memory address for each new Student, and since students is really just a list of memory addresses, it ends up with repeats of the same addresses. C# does not seem to have a nice way of giving me the memory address of an object, so I haven't been able to confirm that. If this is the case, though, I am still not sure how to circumvent that issue, so any advice would be greatly appreciated. Thank you!
RandomGauss probably leverages the Random class, which creates a seed based on the time when it's instantiated. I'm guessing the RandomGauss method instantiates a new Random instance each time it's invoked. When you aren't debugging, your loop repeats a lot of times before the system's clock ticks to change time, so many of your Random instances end up using the same seed, and hence produce the same result the first time you ask them for a random number.
The solution is to create a single Random instance and store it in a field on your class.
e.g., instead of this:
/**
* <summary>
* Return a random integer in the given range based on the specified gaussian distribution
* </summary>
*/
private int RandomGauss(int min, int max, double mean, double sigma){
Random random = new Random();
// code that uses random ...
}
you want something more like this:
private Random random = new Random();
/**
* <summary>
* Return a random integer in the given range based on the specified gaussian distribution
* </summary>
*/
private int RandomGauss(int min, int max, double mean, double sigma){
// code that uses random ...
}
PS--there are utility methods that help with reading text from files.
// create a list of first names from a text file of 200 names
List<string> firstnames = File.ReadAllLines("Random First Names.txt").ToList();
I am doing an exercise from exercism.io, in which I have to generate random names for robots. I am able to get through a bulk of the tests until I hit this test:
[Fact]
public void Robot_names_are_unique()
{
var names = new HashSet<string>();
for (int i = 0; i < 10_000; i++) {
var robot = new Robot();
Assert.True(names.Add(robot.Name));
}
}
After some googling around, I stumbled upon a couple of solutions and found out about the Fisher-Yates algorithm. I tried to implement it into my own solution but unfortunately, I haven't been able to pass the final test, and I'm stumped. If anyone could point me in the right direction with this, I'd greatly appreciate it. My code is below:
EDIT: I forgot to mention that the format of the string has to follow this: @"^[A-Z]{2}\d{3}$"
public class Robot
{
string _name;
Random r = new Random();
string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
string nums = "0123456789";
public Robot()
{
_name = letter() + num();
}
public string Name
{
get { return _name; }
}
private string letter() => GetString(2 ,alpha.ToCharArray(), r);
private string num() => GetString(3, nums.ToCharArray(), r);
public void Reset() => _name = letter() + num();
public string GetString(int length,char[] chars, Random rnd)
{
Shuffle(chars, rnd);
return new string(chars, 0, length);
}
public void Shuffle(char[] _alpha, Random r)
{
for(int i = _alpha.Length - 1; i > 1; i--)
{
int j = r.Next(i);
char temp = _alpha[i];
_alpha[i] = _alpha[j];
_alpha[j] = temp;
}
}
}
The first rule of any ID is:
No matter how big it is or how many possible values it has, if you create enough of them, you will get a collision eventually.
To quote Trillian from The Hitchhiker's Guide: "[A collision] is not impossible. Just really, really unlikely."
However, in this case I think the cause is that you are creating Random instances in a loop. This is a classic beginner's mistake when working with Random. You should not create a new Random instance for each Robot instance; you should have one for the application that you reuse. Like all pseudorandom number generators, Random is deterministic: same inputs, same outputs.
As you did not specify a seed value, it will use a time-based one (on .NET Framework, the system tick count in milliseconds), which is going to be the same for at least the first 20+ loop iterations. So it is going to have the same seed and the same inputs, and therefore the same outputs.
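To see the "same seed, same outputs" point in isolation, here is a minimal sketch. The class name SeedDemo and the method FirstValues are made up for illustration; only System.Random and LINQ are real.

```csharp
using System;
using System.Linq;

public static class SeedDemo
{
    // Returns the first `count` values produced by a Random created with `seed`.
    // Two calls with the same seed return identical arrays.
    public static int[] FirstValues(int seed, int count)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, count).Select(_ => rng.Next(100)).ToArray();
    }
}
```

Calling SeedDemo.FirstValues(42, 5) twice gives the same five numbers both times, which is what happens to robots whose Random instances end up with the same time-based seed.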
The easiest solution for unique names is to use GUIDs. In theory, it is possible to generate non-unique GUIDs but it is pretty close to zero.
Here is the sample code:
var newUniqueName = Guid.NewGuid().ToString();
Sure GUIDs do not look pretty but they are really easy to use.
EDIT: Since I missed the additional requirement for the format, I see that the GUID format is not acceptable.
Here is an easy way to do that too. Since the format is two letters (26^2 possible values) and 3 digits (10^3 possible values), the final number of possible values is 26^2 * 10^3 = 676 * 1000 = 676000. This number is quite small, so Random can be used to generate a random integer in the range 0-675999, and then that number can be converted to the name. Here is the sample code:
var random = new System.Random();
var value = random.Next(676000);
var name = ((char)('A' + (value % 26))).ToString();
value /= 26;
name += (char)('A' + (value % 26));
value /= 26;
name += (char)('0' + (value % 10));
value /= 10;
name += (char)('0' + (value % 10));
value /= 10;
name += (char)('0' + (value % 10));
The usual disclaimer about possible identical names applies here too since we have 676000 possible variants and 10000 required names.
EDIT2: I tried the code above, and generating 10000 names using random numbers produced between 9915 and 9950 unique names. That is no good. I would use a simple static class member as a counter instead of a random number generator.
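A rough sketch of the counter idea, assuming sequential (non-random) names are acceptable for the exercise. RobotNames, FromIndex and Next are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Threading;

public static class RobotNames
{
    private const int NameCount = 26 * 26 * 10 * 10 * 10; // 676,000 possible names
    private static int _counter = -1;

    // Maps an index in 0..675999 to a name of the form AA000 (two letters, three digits).
    public static string FromIndex(int index)
    {
        if (index < 0 || index >= NameCount)
            throw new ArgumentOutOfRangeException(nameof(index));
        int digits = index % 1000;
        int letters = index / 1000;
        char first = (char)('A' + letters / 26);
        char second = (char)('A' + letters % 26);
        return $"{first}{second}{digits:D3}";
    }

    // Hands out the next unused name; throws once all 676,000 names are taken.
    public static string Next() => FromIndex(Interlocked.Increment(ref _counter));
}
```

The names come out in order (AA000, AA001, ...), so they are unique by construction but not random; whether that satisfies the exercise's other tests is something to check.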
First, let's review the test your code is failing against:
10,000 instances created
Must all have distinct names
So somehow, when creating 10000 "random" names, your code produces at least two names that are the same.
Now, let's have a look at the naming scheme you're using:
AB123
The maximum number of unique names we could possibly create is 468000 (26 * 25 * 10 * 9 * 8).
This seems like it should not be a problem, because 10000 < 468000 - but this is where the birthday paradox comes in!
From wikipedia:
In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday.
Rewritten for the purposes of your problem, we end up asking:
What's the probability that, in a set of 10000 randomly chosen people, some pair of them will have the same name.
The Wikipedia article also lists a function for approximating the number of people required to reach a 50% probability that two people will have the same name:
$$n \approx \frac{1}{2} + \sqrt{\frac{1}{4} + 2 m \ln 2}$$
where m is the total number of possible distinct values. Applying this with m=468000 gives us ~806 - meaning that after creating only 806 randomly named Robots, there's already a 50% chance of two of them having the same name.
By the time you reach Robot #10000, the chances of not having generated two names that are the same is basically 0.
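The numbers above can be checked with a few lines of code. This is only a sketch using the standard birthday-problem approximations (n ≈ 1/2 + sqrt(1/4 + 2·m·ln 2) for the 50% point, and exp(-n(n-1)/(2m)) for the chance that n draws are all distinct); the class and method names are invented for the example.

```csharp
using System;

public static class Birthday
{
    // Approximate number of random draws from m equally likely values
    // needed for a 50% chance of at least one repeat.
    public static double DrawsForEvenOdds(double m) =>
        0.5 + Math.Sqrt(0.25 + 2.0 * Math.Log(2.0) * m);

    // Approximate probability that n random draws from m values are all distinct.
    public static double ProbabilityAllDistinct(double n, double m) =>
        Math.Exp(-n * (n - 1) / (2.0 * m));
}
```

Birthday.DrawsForEvenOdds(468000) comes out just under 806, and Birthday.ProbabilityAllDistinct(10000, 468000) is vanishingly small, which matches the "basically 0" claim.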
As others have noted, you can solve this by using a Guid as the robot name instead.
If you want to retain the naming convention you might also get around this by implementing an LCG with an appropriate period and use that as a less collision-prone "naming generator".
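Here is one way the LCG idea might look, as a sketch rather than a tuned generator. It assumes repeated characters are allowed, so there are 676,000 names (26 * 26 * 10 * 10 * 10). The constants are my own choice: they satisfy the Hull-Dobell conditions for a full period (the increment shares no factor with the modulus, and multiplier - 1 is divisible by 4, 5 and 13), so every name appears exactly once before the sequence repeats. NameSequence is a hypothetical class name.

```csharp
using System;

public sealed class NameSequence
{
    private const long Modulus = 676000;  // 2^5 * 5^3 * 13^2 possible names
    private const long Multiplier = 261;  // Multiplier - 1 = 260 = 4 * 5 * 13
    private const long Increment = 7919;  // prime, shares no factor with Modulus
    private long _state;

    public NameSequence(int seed)
    {
        // Normalize the seed into 0..Modulus-1, also for negative seeds.
        _state = ((seed % Modulus) + Modulus) % Modulus;
    }

    // Advances the generator and formats the new state as two letters plus three digits.
    public string Next()
    {
        _state = (Multiplier * _state + Increment) % Modulus;
        int value = (int)_state;
        int digits = value % 1000;
        int letters = value / 1000;
        return $"{(char)('A' + letters / 26)}{(char)('A' + letters % 26)}{digits:D3}";
    }
}
```

The output is not statistically strong randomness; the only property relied on here is that it cannot collide until all 676,000 names are used. A single shared instance would have to hand out the names for all robots.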
Here's one way you can do it:
Generate the list of all possible names
For each robot, select a name from the list at random
Remove the selected name from the list so it can't be selected again
With this, you don't even need to shuffle. Something like this (note, I stole Optional Option's method of generating names because it's quite clever and I couldn't be bothered thinking of my own):
public class Robot
{
private static List<string> names;
private static Random rnd = new Random();
public string Name { get; private set; }
static Robot()
{
Console.WriteLine("Initializing");
// Generate possible candidates
// The second argument of Range is a count, so 676000 yields the values 0..675999
names = Enumerable.Range(0, 676000).Select(i =>
{
var sb = new StringBuilder(5);
sb.Append((char)('A' + i % 26));
i /= 26;
sb.Append((char)('A' + i % 26));
i /= 26;
sb.Append(i % 10);
i /= 10;
sb.Append(i % 10);
i /= 10;
sb.Append(i % 10);
return sb.ToString();
}).ToList();
}
public Robot()
{
// Note: if this needs to be multithreaded, then you'd need to do some work here
// to avoid two threads trying to take a name at the same time
// Also note: you should probably check that names.Count > 0
// and throw an error if not
var i = rnd.Next(names.Count); // the upper bound is exclusive, so no "- 1" is needed
Name = names[i];
names.RemoveAt(i);
}
}
Here's a fiddle that generates 20 random names. They can only be unique because they are removed once they are used.
The point about multithreading is very important, however. If you needed to be able to generate robots in parallel, then you'd need to add some code (e.g. locking the critical section of code) to ensure that only one name is being picked and removed from the list of candidates at a time, or else things will get really bad, really quickly. This is why, when people need a random id with a reasonable expectation that it'll be unique, without worrying that some other thread(s) are trying the same thing at the same time, they use GUIDs. The sheer number of possible GUIDs makes collisions very unlikely. But you don't have that luxury with only 676,000 possible values.
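A sketch of what the locking could look like, under the assumption that a single static pool is shared by all threads. NamePool, Take and the Gate lock object are names invented for this example. It also swaps the picked entry with the last one before removing it, so removal does not shift the whole list.

```csharp
using System;
using System.Collections.Generic;

public static class NamePool
{
    private static readonly object Gate = new object();
    private static readonly Random Rng = new Random();
    private static readonly List<string> Available = BuildNames();

    // Builds every name from AA000 to ZZ999.
    private static List<string> BuildNames()
    {
        var names = new List<string>(676000);
        for (int i = 0; i < 676000; i++)
            names.Add($"{(char)('A' + i / 26000)}{(char)('A' + i / 1000 % 26)}{i % 1000:D3}");
        return names;
    }

    // Picks and removes one name; the lock keeps the pick and the removal atomic.
    public static string Take()
    {
        lock (Gate)
        {
            if (Available.Count == 0)
                throw new InvalidOperationException("All names have been used.");
            int i = Rng.Next(Available.Count);
            string name = Available[i];
            // Swap-remove: overwrite the picked slot with the last item, then drop the last slot.
            Available[i] = Available[Available.Count - 1];
            Available.RemoveAt(Available.Count - 1);
            return name;
        }
    }
}
```

The lock also protects the shared Random, which is not safe to call from several threads at once.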
I am working with generating Random items from either one or another list. I am kind of struggling how to do that.
Basically I have two lists:
List<string> names = new List<string>();
List<string> surnames = new List<string>();
I know how to get an item from one list randomly, but I am struggling how to do so there will be a possibility of taking an item from either names or surnames.
I know there is possibly an easy solution for that but couldn't find it.
Any help would be appreciated.
I know how to get an item from one list randomly
Leverage the technique for taking a random item from a single list to build a simple approach that works with two lists.
Imagine that you have a list of length N = names.Count + surnames.Count
Pick a random position p between 0, inclusive, and N, exclusive
If the position p is less than names.Count, use names[p]
Otherwise, use surnames[p - names.Count]
Effectively, the above approach picks an item from a merged list without performing an actual merge.
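The steps above can be sketched as follows. TwoListPicker and PickFromEither are illustrative names, and the Random is passed in so that one shared instance can be reused.

```csharp
using System;
using System.Collections.Generic;

public static class TwoListPicker
{
    // Picks one item uniformly from the virtual concatenation of `first` and `second`
    // without building the merged list.
    public static string PickFromEither(IList<string> first, IList<string> second, Random rng)
    {
        int total = first.Count + second.Count;
        if (total == 0)
            throw new InvalidOperationException("Both lists are empty.");
        int p = rng.Next(total); // 0 <= p < total
        return p < first.Count ? first[p] : second[p - first.Count];
    }
}
```

Because every position in the combined range is equally likely, a longer list is picked from proportionally more often than a shorter one.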
Edit: It turns out that you wanted a random combination of names[] and surnames[]. This is a simpler task, which is achieved by picking a random element from an array twice - once from names[], and then separately from surnames[].
This should do the job:
Random r = new Random();
Int32 nameIdx = r.Next(names.Count);
Int32 surnameIdx = r.Next(surnames.Count);
String randFullname = names[nameIdx] + " " + surnames[surnameIdx];
This is just an example to show you how to work with random array accesses. If you need to select only one name or one surname (the question was not really clear on that point "but I am struggling how to do so there will be a possibility of taking an item from either names or surnames"), just throw another random [0 1] and pick the first or the second list basing your choice on the output value:
List<String> currentList;
Random r = new Random();
// r.Next(0, 2) returns 0 or 1, so each list is chosen with equal probability
if (r.Next(0, 2) == 0)
currentList = names;
else
currentList = surnames;
Int32 idx = r.Next(currentList.Count);
String result = currentList[idx];
Otherwise, just pick a single random entry from a concatenation:
List<String> con = names.Concat(surnames).ToList();
You can merge two lists and access random element as follows,
var newList = names.Concat(surnames).ToList();
Random r = new Random();
string rand = newList[r.Next(newList.Count)];
If you want to do it in a one-liner you could try the following:
var r = new Random();
var randomName = names.Concat(surnames).OrderBy(n => r.Next()).First();
It's not very efficient memory wise, but it should work.
I was wondering how to get on with this code; I am currently working on a tournament bracket system.
Currently I have created a comboBox that fetches all the lines from "log.txt" (there are 16 lines in the txt file). Then I created an assign button that is supposed to assign all the names to 16 textboxes called User1 --> User16; however, the same name can't be repeated.
I looked at "Array of list" & "Array of string", but I seem to be stuck since I can't really figure out what to put in the code.
My random button looks like this at the moment:
private void assign_Click(object sender, EventArgs e)
{
int x;
Random rnd = new Random();
x = rnd.Next(0, 16);
User1.Text = comboBox2.Items[x].ToString();
x = rnd.Next(0, 16);
User2.Text = comboBox2.Items[x].ToString();
x = rnd.Next(0, 16);
User3.Text = comboBox2.Items[x].ToString();
x = rnd.Next(0, 16);
User4.Text = comboBox2.Items[x].ToString();
and so on until I reach
x = rnd.Next(0, 16);
User16.Text = comboBox2.Items[x].ToString();
}
One of the simplest, but not necessarily most efficient, ways to do this is to put all your strings into a List<string> and remove them randomly one-by-one. This would work a lot better if you put all your textboxes into a collection as well. For example, given a list of strings called myStrings and a collection of textboxes called myTextboxes, you could:
// Capture the count first: myStrings shrinks on every pass, so testing
// i < myStrings.Count in the loop condition would stop halfway through
var count = myStrings.Count;
for (var i = 0; i < count; i++)
{
var idx = rnd.Next(0, myStrings.Count);
myTextboxes[i].Text = myStrings[idx]; // Note: we are assuming the two collections have
// the same length
myStrings.RemoveAt(idx);
}
This is very easy to implement and very easy to get right, but it's not terribly efficient (for 16 items, it probably doesn't matter) because your collection is repeatedly resized. For a more efficient approach, first shuffle your strings using the Fisher-Yates shuffle and then just assign the first entry from your shuffled strings to the first textbox, the second to the second, and so on.
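For reference, a Fisher-Yates shuffle can be sketched like this. ListShuffler is a made-up helper name; the Random is passed in so a single instance can be shared.

```csharp
using System;
using System.Collections.Generic;

public static class ListShuffler
{
    // Shuffles `items` in place. Each element is swapped with a randomly chosen
    // element at or before its own position, walking from the end to the start.
    public static void Shuffle<T>(IList<T> items, Random rng)
    {
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i, so an element may stay in place
            T temp = items[i];
            items[i] = items[j];
            items[j] = temp;
        }
    }
}
```

After ListShuffler.Shuffle(myStrings, rnd), assigning myStrings[i] to myTextboxes[i] for each i fills all the textboxes without repeats and without resizing the list.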
You could use a List, and after each assignment you could remove the assigned item from the list. This would prevent duplicates.
http://msdn.microsoft.com/en-us/library/cd666k3e(v=vs.110).aspx
How about removing each item after selecting it?
Try something like
comboBox2.Items.RemoveAt(x);
after you have read the item. Each time you do this, the upper bound in your
x = rnd.Next(0, 16);
code has to shrink by one, to
x = rnd.Next(0, 15);
and so on, until no items are left.
A different approach would be after selecting one randomly loop through all the filled ones (or all in general for simpler code) and check if it is already selected. If already selected get a new one until it's different.
For that you could use an array of textboxes (store what you have in an array) and loop through them like so
for(int i=0;i<16;i++)
if(textBoxArray[i].Text==comboBox2.Items[x].ToString()){
chosen=true;
}
But removing them from the combobox is much simpler and much faster to code. If you want them to still appear in your combobox, you could also keep the items in a List, get your items from that List and remove them from there.
The user will not see anything.
Accomplishing this is fairly simple.
First, you know there are 16 items in total. You don't need to randomize the list but rather, randomize the index that you use to access the item of the list. This part you know.
In order to avoid repeating items, you need to keep a list of indexes that have already been used. Once you have determined an unused index, that's when you need to access your list.
Example:
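class Sample
{
List<int> _usedIndexes;
// One Random shared by all calls; a new instance per call can repeat its seed
Random _rnd;
public Sample()
{
_usedIndexes = new List<int>();
_rnd = new Random();
}
public int GetRandomIndex(int s, int e)
{
// Assumes the same range on every call; without this check the loop below
// would never end once every index has been handed out
if (_usedIndexes.Count >= e - s)
throw new InvalidOperationException("All indexes have been used.");
//Initialize with a random number
int x = _rnd.Next(s, e);
//While the index exists in the list of used indexes, get another random number.
while(_usedIndexes.Exists(index => index == x))
{
x = _rnd.Next(s, e);
}
//Add the number to the list of used indexes
_usedIndexes.Add(x);
return x;
}
}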
Then you simply access the List of names you have with the index you have acquired as follows:
int unusedIndex = GetRandomIndex(0, 16);
User1.Text = comboBox2.Items[unusedIndex].ToString();
On my site I allow people to buy subscriptions in bulk (I call them vouchers). Once they have these vouchers, they give them to whomever they like, and each recipient enters the code into their account to upgrade it.
Right now I am thinking of using a 4-character alphanumeric code (upper case, lower case and digits), generated with something like this:
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
var stringChars = new char[4];
var random = new Random();
for (int i = 0; i < stringChars.Length; i++)
{
stringChars[i] = chars[random.Next(chars.Length)];
}
var finalString = new String(stringChars);
For now I think that will give me more than enough combinations and if I ever do run out I can always up the length of the code. I want to keep it short because I don't want the user to have to type in huge as numbers.
I also don't have the time to make a more elegant solution, such as a link in their email that they click to activate their account, which of course would also cut down on someone trying to randomly guess a voucher number.
These are things I would deal with if the site ever becomes more popular.
I am wondering, though, how I can handle the possible duplicate generation of the same voucher. My first thought was to check the database each time a voucher is created and, if the code already exists, make a new one.
However, that seems like it could be slow. So I also thought about loading all the keys first, storing them in memory and then checking there, but if the list keeps growing I might run into out-of-memory exceptions and all that great stuff.
So does anyone have any ideas? Or am I stuck doing one of the two methods I listed above?
I am using nhibernate, asp.net mvc and C#.
Edit
static void Main(string[] args)
{
List<string> hold = new List<string>();
for (int i = 0; i < 10000; i++)
{
HashAlgorithm sha = new SHA1CryptoServiceProvider();
byte[] result = sha.ComputeHash(BitConverter.GetBytes(i));
string hex = null;
foreach (byte x in result)
{
hex += String.Format("{0:x2}", x);
}
hold.Add(hex.Substring(0,3));
Console.WriteLine(hex.Substring(0, 4));
}
Console.WriteLine("Number of Distinct values {0}", hold.Distinct().Count());
}
Above is my attempt to use hashing. However, I think I am missing something, as it seems to produce quite a few more duplicates than expected.
Edit 2
I think I added what I was missing, but I am not sure if this is exactly what he meant. I am also not sure what to do when I have moved the starting position as far as it can go (my hash seems to give me a length of 40 places I can move it).
static void Main(string[] args)
{
int subStringLength = 4;
List<string> hold = new List<string>();
for (int i = 0; i < 10000; i++)
{
SHA1CryptoServiceProvider sha = new SHA1CryptoServiceProvider();
byte[] result = sha.ComputeHash(BitConverter.GetBytes(i));
string hex = null;
foreach (byte x in result)
{
hex += String.Format("{0:x2}", x);
}
int startingPositon = 0;
string possibleVoucherCode = hex.Substring(startingPositon,subStringLength);
string voucherCode = Move(subStringLength, hold, hex, startingPositon, possibleVoucherCode);
hold.Add(voucherCode);
}
Console.WriteLine("Number of Distinct values {0}", hold.Distinct().Count());
}
private static string Move(int subStringLength, List<string> hold, string hex, int startingPositon, string possibleVoucherCode)
{
if (hold.Contains(possibleVoucherCode))
{
int newPosition = startingPositon + 1;
if (newPosition <= hex.Length)
{
if ((newPosition + subStringLength) > hex.Length)
{
possibleVoucherCode = hex.Substring(newPosition, subStringLength);
return Move(subStringLength, hold, hex, newPosition, possibleVoucherCode);
}
// return something
return "0";
}
else
{
// return something
return "0";
}
}
else
{
return possibleVoucherCode;
}
}
}
It is going to be slow because you want to generate the vouchers randomly and then check the database for every generated code.
I would create a table vouchers with an id, the code and an is_used column. I would fill that table once with enough random codes. Since this can be done in a separate process, the performance won't be such a big problem. Let it run in the evening and the next day you get a fully filled vouchers-table.
If you want to prevent generating duplicate vouchers, that won't be a problem. You can generate them anyway and put them either in a System.Collections.Generic.HashSet (which prevents adding duplicates without throwing an exception) or call the Linq-method Distinct(), before adding them to that vouchers table.
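A sketch of the HashSet approach for pre-generating a batch. VoucherCodes, NewCode and NewBatch are hypothetical names. I have used RandomNumberGenerator.GetInt32 here on the assumption that you are on .NET Core 3.0 or later; it is harder to predict than System.Random, which matters for codes that are worth money.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class VoucherCodes
{
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Builds one random code of the given length from the 62-character alphabet.
    public static string NewCode(int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)];
        return new string(chars);
    }

    // Generates `count` distinct codes; HashSet.Add silently ignores duplicates.
    public static HashSet<string> NewBatch(int count, int length)
    {
        if (count > Math.Pow(Alphabet.Length, length))
            throw new ArgumentException("Not enough distinct codes of this length.");
        var codes = new HashSet<string>(StringComparer.Ordinal);
        while (codes.Count < count)
            codes.Add(NewCode(length));
        return codes;
    }
}
```

This only guarantees uniqueness within the batch; codes already stored in the vouchers table still have to be excluded when the batch is inserted.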
If you insist on short codes:
Use a GUID as a primary key, generate one random number. How you might want to translate this in to alpha-num is up to you.
Use the last byte or two of the guid and the random number. 1234-684687 This should make it slightly less easy to bruteforce coupons. And handle any (rare) collisions with an exception.
An easy way to shorten an int is to change its base (from 10 to 62). (This is in VB, and it is old code.)
This yields "2lkCB1" when given Int32.MaxValue
' given intValue as your random integer
Dim result As String = String.Empty
Dim digits as String = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Dim x As Integer
While (intValue > 0)
x = intValue Mod digits.Length
result = digits(x) & result
intValue = intValue - x
intValue = intValue \ digits.Length
End While
Return result
But now we're already answering more than one question.
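Since the rest of the thread is in C#, here is roughly the same base-62 conversion as a C# sketch. Base62 and Encode are names chosen for the example, and unlike the VB snippet it returns "0" for an input of zero instead of an empty string.

```csharp
using System;
using System.Text;

public static class Base62
{
    private const string Digits =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Converts a non-negative integer to its base-62 representation.
    public static string Encode(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (value == 0) return "0";
        var sb = new StringBuilder();
        while (value > 0)
        {
            // Prepend the lowest remaining digit, then drop it from the value.
            sb.Insert(0, Digits[value % Digits.Length]);
            value /= Digits.Length;
        }
        return sb.ToString();
    }
}
```

Base62.Encode(int.MaxValue) returns "2lkCB1", the same value quoted for the VB version.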
For a bulk data operation like this, I would recommend not using NHibernate and just doing straight ADO.NET.
Batch Check
Since you anticipate generating big batches of codes at once, you should batch multiple code checks into a single round-trip to the database. If you're using SQL Server 2008 or higher, you could do this using table-valued parameters, checking a whole list of codes at once.
SELECT DISTINCT b.Code
FROM @batch b
WHERE NOT EXISTS (
SELECT v.Code
FROM dbo.Voucher v
WHERE v.Code = b.Code
);
Concurrency
Now, what about concurrency issues? What if two users generate the same code at roughly the same time? Or simply in-between the time when we check the code for uniqueness and when we insert it into the Voucher table?
We can take care of that by modifying the query as follows:
DECLARE @batchid uniqueidentifier;
SET @batchid = NEWID();
INSERT INTO dbo.Voucher (Code, BatchId)
SELECT DISTINCT b.Code, @batchid
FROM @batch b
WHERE NOT EXISTS (
SELECT Code
FROM dbo.Voucher v
WHERE b.Code = v.Code
);
SELECT Code
FROM dbo.Voucher
WHERE BatchId = @batchid;
Executing via .NET
Assuming that you have defined the following table-valued user type...
CREATE TYPE dbo.VoucherCodeList AS TABLE (
Code nvarchar(8) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
/* !!! Remember to specify the collation on your Voucher.Code column too, since you want upper and lower-case codes. */
);
... you could execute this query via .NET code like this:
public ICollection<string> GenerateCodes(int numberOfCodes)
{
var result = new List<string>(numberOfCodes);
while (result.Count < numberOfCodes)
{
var batchSize = Math.Min(_batchSize, numberOfCodes - result.Count);
var batch = Enumerable.Range(0, batchSize)
.Select(x => GenerateRandomCode());
var oldResultCount = result.Count;
result.AddRange(FilterAndSecureBatch(batch));
var filteredBatchSize = result.Count - oldResultCount;
var collisionRatio = ((double)batchSize - filteredBatchSize) / batchSize;
// Automatically increment length of random codes if collisions begin happening too frequently
if (collisionRatio > _collisionThreshold)
CodeLength++;
}
return result;
}
private IEnumerable<string> FilterAndSecureBatch(IEnumerable<string> batch)
{
using (var command = _connection.CreateCommand())
{
command.CommandText = _sqlQuery; // the concurrency-safe query listed above
var metaData = new[] { new SqlMetaData("Code", SqlDbType.NVarChar, 8) };
var param = command.Parameters.Add("@batch", SqlDbType.Structured);
param.TypeName = "dbo.VoucherCodeList";
param.Value = batch.Select(x =>
{
var record = new SqlDataRecord(metaData);
record.SetString(0, x);
return record;
});
using (var reader = command.ExecuteReader())
while (reader.Read())
yield return reader.GetString(0);
}
}
Performance
After implementing all of this (and moving the command and parameter creation out of the loop so it would be re-used between batches), I was able to insert 10,000 codes using a batch size of 500 consistently in approx. 0.5 to 2 seconds, or 5 to 20 codes per millisecond.
Code Density / Collisions / Guessability
The _collisionThreshold field limits the density of your codes. It's a value between 0 and 1. Actually, it must be less than 1 or else you would wind up in an infinite loop when the 4 digit codes were exhausted (probably should add an assertion for this in code). I would recommend never turning it above 0.5 for performance reasons. More than 50% collisions would mean it's spending more time testing already-used codes than actually generating new ones.
Keeping the collision threshold low is how you would control how hard-to-guess your codes are. Setting _collisionThreshold to 0.01 would generate codes such that there's approximately a 1% chance of someone guessing a code.
If collisions occur too frequently, CodeLength (which is used by the GenerateRandomCode() method) will be incremented. This value needs to be persisted somewhere. After executing GenerateCodes(), check CodeLength to see if it has changed and then save the new value.
Source Code
The full code is available here: https://gist.github.com/3217856. I am the author of this code, and am releasing it under the MIT license. I had fun with this little challenge, and also got to learn how to pass a table-valued parameter to an inline parametrized query. I hadn't ever done that before. I've only ever passed them to full-fledged stored procedures.
A possible solution for you is like this:
Find the maximum ID of a voucher (an integer). Then, run any hash function on it, take the first 32 bits and convert to the string you want to show the user (or use a 32bit hash function such as Jenkins hash function). This will probably work, hash collisions are pretty rare. But this solution is very similar to yours, in the point of randomness.
You could run a test which finds the first 10 or 100 collisions (this should be enough for you) and forces the algorithm to "skip" them and use a different starting value. Then, you don't need to check the database at all (well, at least until you reach about 4294967296 vouchers...)
How about utilizing NHibernate's HiLo algorithm?
Here is an example on how you can get the next value (without DB access).
I have a list of games, which I need to select randomly, by the day of week,
the following code works perfectly for 1 game.
var gameOfTheDay = games.AllActive[(int)(DateTime.Today.GetHashCode()) % games.AllActive.Count()];
What I need is to return more than 1 game, randomized, based on X (X in the case above is the day of the week; I will change it to a specific string).
I need this to create a semi-random generation of items.
Semi - since I want to feed it a keyword, and get the same results per keyword
Random - since I need to make the game list random
For example, every time you enter a page with the title "hello", you will see THE SAME games, which were selected specifically for that keyword from the games list based on the keyword "hello".
In the same way the gameOfTheDay Works.
You can use LINQ for this:
int limit = 10;
string keyword = "foo";
Random rng = new Random(keyword.GetHashCode());
var gamesOfTheDay = games.OrderBy(x => rng.Next()).Take(limit);
However, this will have some overhead for the sort. If you have a lot of games compared to the amount you're selecting—enough that the sort might be too expensive, and enough that it's safe to just keep retrying in the event of a collision—manually doing it might be faster:
HashSet<Game> gamesOfTheDay = new HashSet<Game>();
while(gamesOfTheDay.Count < limit && gamesOfTheDay.Count < games.Length)
{
int idx = rng.Next(games.Length);
gamesOfTheDay.Add(games[idx]);
}
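Note that in either case the Random is constructed with a seed dependent on the keyword, so the order will be the same every time for that keyword. One caveat: string.GetHashCode() is not guaranteed to return the same value across processes or runtime versions (on .NET Core and later it is randomized per process), so derive the seed from a stable hash of the keyword if the selection has to survive a restart. You could similarly combine the hashes of the current DateTime and the keyword to get a unique random sequence for that day-keyword combination.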
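If the same keyword has to give the same games across application restarts, the seed should not come from string.GetHashCode(), whose value can differ between processes on .NET Core and later. Below is a sketch that derives the seed from a SHA-256 hash of the keyword instead and then does a partial shuffle. KeywordPicker, StableSeed and Pick are names invented for this example, and the result is only repeatable as long as the runtime's seeded Random algorithm and the machine's byte order stay the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class KeywordPicker
{
    // Derives a seed from the keyword's bytes, so it does not depend on string.GetHashCode().
    public static int StableSeed(string keyword)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(keyword));
            return BitConverter.ToInt32(hash, 0);
        }
    }

    // Returns up to `limit` distinct items, chosen the same way every time for a given keyword.
    public static List<T> Pick<T>(IReadOnlyList<T> items, string keyword, int limit)
    {
        var rng = new Random(StableSeed(keyword));
        var pool = items.ToList(); // work on a copy so the caller's list is untouched
        int take = Math.Min(limit, pool.Count);
        // Partial Fisher-Yates: only the first `take` positions need to be shuffled.
        for (int i = 0; i < take; i++)
        {
            int j = rng.Next(i, pool.Count);
            T tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return pool.GetRange(0, take);
    }
}
```

The selection also depends on the order of the input list, so the games should be passed in a consistent order (for example sorted by id).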
Use similar code to what you have now to randomly add games to a list of games (which will initially be empty) - if the game is already in the list, don't add it.
Stop when the list is the right size.
Untested code:
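var rand = new Random();
var randomGames = new List<Game>();
// Cap the target so the loop still ends when there are fewer games than limit
var target = Math.Min(limit, games.AllActive.Count());
while(randomGames.Count < target)
{
// Pick from the whole list, not just from the first `limit` entries
var aGame = games.AllActive[rand.Next(games.AllActive.Count())];
if (!randomGames.Contains(aGame))
{
randomGames.Add(aGame);
}
}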