How to assert uniqueness of a huge collection of strings? - c#

Let's say I have an algorithm which takes an unsigned 64-bit integer as input, and yields a string as a result. The string's alphabet is limited to [a-z, A-Z, 0-9] and its' maximum length is 16. So that's or 47,672,401,706,823,533,450,263,330,816 possible results.
I would like to assert the uniqueness of the algorithm's output. Read: I want to verify there are no collisions.
Is there an easy/quick 'n dirty way to do this, without having to fall back to (e.g.) some kind of database?
[EDIT]
Some clarification: the concerns uttered in the comments are legit, but no worries, I wasn't really planning on iterating over all possible combinations, my lifespan will probably be sub-1 century ;) Nor did I write my own algorithm to generate unique ID's. I just saw this and started wondering how one would go about asserting uniqueness for algorithms with very large result sets that can't be handled in-memory
[/EDIT]

As said in the comments, It would take a very long time to compute every possible entries, but just for fun, here is a try:
var workspace = new DirectoryInfo("MyWorkspace");
if (workspace.Exists)
{
workspace.Delete();
}
workspace.Create();
var limit = 23997907;
var buffer = new HashSet<string>();
ulong i = 0;
int j = 0;
var stopWatch = Stopwatch.StartNew();
while (i <= ulong.MaxValue)
{
var result = YourSuperAlgorythm(i);
// Check the result with current results
if (buffer.Contains(result))
{
throw new Exception("Failure !");
}
// Check the result with older results
foreach (var file in workspace.GetFiles())
{
var content = new HashSet<string>(File.ReadAllText(file.FullName).Split(';'));
if (content.Contains(result))
{
throw new Exception("Failure !");
}
}
buffer[j] = result;
i++;
j++;
if (j == arrayLimit)
{
stopWatch.Stop();
Console.WriteLine("Resetting. This loop takes " + stopWatch.Elapsed.TotalMilliseconds + "ms");
j = 0;
var file = Path.GetRandomFileName();
File.WriteAllText(Path.Combine(workspace.FullName, file), String.Join(";", buffer));
buffer = new HashSet<string>();
stopWatch.Restart();
}
}
You could probably optimize it but you won't have enought of a lifetime to check the results. For now, it did not even create a file to store the first set of entries :D. I will edit this post when one loop will be done!
Your only option is to prove mathematically your algorithm. Good luck with that...
EDIT1: for my test, I use this function:
private static string YourSuperAlgorythm(ulong i)
{
return i.ToString("x");
}
EDIT2: One loop takes 1477221.4261ms (~25min). And then the String.Join(";", buffer) line failed (OutOfMemory). So 23997907 is not the max value for my try. It must be decreased!

Related

Optimize recursive function that generates a sequence of chars

I have a function that needs to generate a sequence of num chars to test a security algorithm.
For instance, patternLength of 4 it would generate:
["0000", "0001", "0002", ... , "9999"]
and 3 would generate:
["000", "001", ... "999"] and so on.
As we know, recursion can be pretty expensive and begins to slow to a crawl at higher lengths so I'm hoping to speed it up by using some caching or DP. Is this at all possible?
Current function with recursion:
private static List<char> PossibleCharacters = new List<char>()
{
'0','1','2','3','4','5','6','7','8','9'
};
public static List<string> SequenceGenerator(int patternLength)
{
List<string> result = new List<string>();
if (patternLength > 0)
{
List<string> prev = SequenceGenerator(patternLength - 1);
foreach (string entry in prev)
{
foreach (char ch in PossibleCharacters)
{
result.Add(entry + ch);
}
}
}
else
{
result.Add("");
}
return result;
}
My messy attempt. I'm building the list starting with length 1, then 2, and so on. It gets to 999 -> 1000 where it becomes wrong, it should be 999 -> 0000. Of course, I'll need to clear the contents of the cache where the lengths aren't what I want.
// patternLength = 4
string[] result = new string[10000];
string[] cache = new string[10000];
result.Append("");
dp[0] = "0";
int i = 1;
int j = 0;
foreach (string entry in result)
{
foreach (char ch in PossibleCharacters)
{
cache[i] = entry + ch;
i++;
}
result[j + 1] = j.ToString();
j++;
}
return cache.toList();
Thanks all for your time.
Your code is inherently bounded by an O(10^n) minimum time complexity (where n is the length of the tested string) and there is no way to bypass this limit without changing its' functionality. In other words, your code is likely mostly slow because you are doing a lot of work and not because you implemented the aforementioned work in a particularly inefficient way. Optimizing string generation is very unlikely to provide any significant improvement in run time. Utilizing multithreading is more likely to provide real-world performance improvements in your described scenario, so personally I would start from there. Additionally, if you only need to access each tested string once, it is preferable to to generate them one-by-one as they are used in order to reduce your program's space complexity. Currently you are storing every string for the entire duration of your program or function's execution, which is a potentially wasteful usage of memory if they are only needed once.

Parallel.ForEach search doesn't find the correct value

This is my first attempt at parallel programming.
I'm writing a test console app before using this in my real app and I can't seem to get it right. When I run this, the parallel search is always faster than the sequential one, but the parallel search never finds the correct value. What am I doing wrong?
I tried it without using a partitioner (just Parallel.For); it was slower than the sequential loop and gave the wrong number. I saw a Microsoft doc that said for simple computations, using Partitioner.Create can speed things up. So I tried that but still got the wrong values. Then I saw Interlocked, but I think I'm using it wrong.
Any help would be greatly appreciated
Random r = new Random();
Stopwatch timer = new Stopwatch();
do {
// Make and populate a list
List<short> test = new List<short>();
for (int x = 0; x <= 10000000; x++)
{
test.Add((short)(r.Next(short.MaxValue) * r.NextDouble()));
}
// Initialize result variables
short rMin = short.MaxValue;
short rMax = 0;
// Do min/max normal search
timer.Start();
foreach (var amp in test)
{
rMin = Math.Min(rMin, amp);
rMax = Math.Max(rMax, amp);
}
timer.Stop();
// Display results
Console.WriteLine($"rMin: {rMin} rMax: {rMax} Time: {timer.ElapsedMilliseconds}");
// Initialize parallel result variables
short pMin = short.MaxValue;
short pMax = 0;
// Create list partioner
var rangePortioner = Partitioner.Create(0, test.Count);
// Do min/max parallel search
timer.Restart();
Parallel.ForEach(rangePortioner, (range, loop) =>
{
short min = short.MaxValue;
short max = 0;
for (int i = range.Item1; i < range.Item2; i++)
{
min = Math.Min(min, test[i]);
max = Math.Max(max, test[i]);
}
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMin), Math.Min(pMin, min));
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMax), Math.Max(pMax, max));
});
timer.Stop();
// Display results
Console.WriteLine($"pMin: {pMin} pMax: {pMax} Time: {timer.ElapsedMilliseconds}");
Console.WriteLine("Press enter to run again; any other key to quit");
} while (Console.ReadKey().Key == ConsoleKey.Enter);
Sample output:
rMin: 0 rMax: 32746 Time: 106
pMin: 0 pMax: 32679 Time: 66
Press enter to run again; any other key to quit
The correct way to do a parallel search like this is to compute local values for each thread used, and then merge the values at the end. This ensures that synchronization is only needed at the final phase:
var items = Enumerable.Range(0, 10000).ToList();
int globalMin = int.MaxValue;
int globalMax = int.MinValue;
Parallel.ForEach<int, (int Min, int Max)>(
items,
() => (int.MaxValue, int.MinValue), // Create new min/max values for each thread used
(item, state, localMinMax) =>
{
var localMin = Math.Min(item, localMinMax.Min);
var localMax = Math.Max(item, localMinMax.Max);
return (localMin, localMax); // return the new min/max values for this thread
},
localMinMax => // called one last time for each thread used
{
lock(items) // Since this may run concurrently, synchronization is needed
{
globalMin = Math.Min(globalMin, localMinMax.Min);
globalMax = Math.Max(globalMax, localMinMax.Max);
}
});
As you can see this is quite a bit more complex than a regular loop, and this is not even doing anything fancy like partitioning. An optimized solution would work over larger blocks to reduce overhead, but this is omitted for simplicity, and it looks like the OP is aware such issues already.
Be aware that multi threaded programming is difficult. While it is a great idea to try out such techniques in a playground rather than a real program, I would still suggest that you should start by studying the potential dangers of thread safety, there is fairly easy to find good resources about this.
Not all problems will be as obviously wrong like this, and it is quite easy to cause issues that breaks once in a million, or only when the cpu load is high, or only on single CPU systems, or issues that are only detected long after the code is put into production. It is a good practice to be paranoid whenever multiple threads may read and write the same memory concurrently.
I would also recommend learning about immutable data types, and pure functions, since these are much safer and easier to reason about once multiple threads are involved.
Interlocked.Exchange is thread safe only for Exchange, every Math.Min and Math.Max can be with race condition. You should compute min/max for every batch separately and then join results.
Using low-lock techniques like the Interlocked class is tricky and advanced. Taking into consideration that your experience in multithreading is not excessive, I would say go with a simple and trusty lock:
object locker = new object();
//...
lock (locker)
{
pMin = Math.Min(pMin, min);
pMax = Math.Max(pMax, max);
}

Performance impact of array writes is much larger than expected

I've stumbled upon this effect when debugging an application - see the repro code below.
It gives me the following results:
Data init, count: 100,000 x 10,000, 4.6133365 secs
Perf test 0 (False): 5.8289565 secs
Perf test 0 (True): 5.8485172 secs
Perf test 1 (False): 32.3222312 secs
Perf test 1 (True): 217.0089923 secs
As far as I understand, the array store operations shouldn't normally have such a drastic performance effect (32 vs 217 seconds). I wonder if anyone understands what effects are at play here?
UPD extra test added; Perf 0 shows the results as expected, Perf 1 - shows the performance anomaly.
class Program
{
static void Main(string[] args)
{
var data = InitData();
TestPerf0(data, false);
TestPerf0(data, true);
TestPerf1(data, false);
TestPerf1(data, true);
if (Debugger.IsAttached)
Console.ReadKey();
}
private static string[] InitData()
{
var watch = Stopwatch.StartNew();
var data = new string[100_000];
var maxString = 10_000;
for (int i = 0; i < data.Length; i++)
{
data[i] = new string('-', maxString);
}
watch.Stop();
Console.WriteLine($"Data init, count: {data.Length:n0} x {maxString:n0}, {watch.Elapsed.TotalSeconds} secs");
return data;
}
private static void TestPerf1(string[] vals, bool testStore)
{
var watch = Stopwatch.StartNew();
var counters = new int[char.MaxValue];
int tmp = 0;
for (var j = 0; ; j++)
{
var allEmpty = true;
for (var i = 0; i < vals.Length; i++)
{
var val = vals[i];
if (j < val.Length)
{
allEmpty = false;
var ch = val[j];
var count = counters[ch];
tmp ^= count;
if (testStore)
counters[ch] = count + 1;
}
}
if (allEmpty)
break;
}
// prevent the compiler from optimizing away our computations
tmp.GetHashCode();
watch.Stop();
Console.WriteLine($"Perf test 1 ({testStore}): {watch.Elapsed.TotalSeconds} secs");
}
private static void TestPerf0(string[] vals, bool testStore)
{
var watch = Stopwatch.StartNew();
var counters = new int[65536];
int tmp = 0;
for (var i = 0; i < 1_000_000_000; i++)
{
var j = i % counters.Length;
var count = counters[j];
tmp ^= count;
if (testStore)
counters[j] = count + 1;
}
// prevent the compiler from optimizing away our computations
tmp.GetHashCode();
watch.Stop();
Console.WriteLine($"Perf test 0 ({testStore}): {watch.Elapsed.TotalSeconds} secs");
}
}
After testing your code for quite some time my best guess is, as already said in the comments, that you experience a lot of cache-misses with your current solution. The line:
if (testStore)
counters[ch] = count + 1;
might be force the compiler to completely load a new cache-line into the memory and displace the current content. There might also be some problems with branch-prediction in this scenario. This is highly hardware dependent and I'm not aware of a really good solution to test this in any interpreted language (It's also quite hard in compiled languages where the hardware is set and well-known).
After going through the disassembly, you can clearly see that you also introduce a whole bunch of new instruction which might increase the before mentioned problems further.
Overall I'd advice you the re-write the complete algorithm as there are better places to improve performance instead of picking at this one little assignment. This would be the optimizations I'd suggest (this also improves readability):
Invert your i and j loop. This will remove the allEmpty variable completely.
Cast ch to int with var ch = (int) val[j]; - because you ALWAYS use it as index.
Think about why this might be a problem at all. You introduce a new instruction and any instruction comes at a cost. If this is really the primary "hot-spot" of your code you can start to think about better solutions (Remember: "premature optimization is the root of all evil").
As this is a "test setting" which the name suggests, is this important at all? Just remove it.
EDIT: Why did I suggest to invert to loops? With this little rearrangement of code:
foreach (var val in vals)
{
foreach (int ch in val)
{
var count = counters[ch];
tmp ^= count;
if (testStore)
{
counters[ch] = count + 1;
}
}
}
I come from runtimes like this:
to runtimes like this:
Do you still think it's not worth a try? I saved some orders of magnitude here and nearly eliminated the effect of the if (to be clear - all optimizations are disabled in the settings). If there are special reasons not to do this you should tell us more about the context in which this code will be used.
EDIT2: For the in-depth answer. My best explanation for why this problem occurs is because you cross-reference your cache-lines. In the lines:
for (var i = 0; i < vals.Length; i++)
{
var val = vals[i];
you load a really massive dataset. This is by far bigger than a cache-line itself. So it will most likely need to be loaded every iteration fresh from the memory into a new cache-line (displacing the old content). This is also known as "cache-thrashing" if I remember correctly. Thanks to #mjwills for pointing this out in his comment.
In my suggested solution, on the other hand, the content of a cache-line can stay alive as long as the inner loop did not exceed its boundaries (which happens a lot less if you use this direction of memory access).
This is the closest explanation why me code runs that much faster and it also supports the assumption that you have serious caching problems with your code.

Best Way to Check for Used Key with Nhibernate?

on my site I allow people to buy subscriptions to my site in bulk(I call them vouchers). Once they have these vouchers, they give them to whoever and they enter that code into their account to upgrade them.
Right now I am thinking of doing 4 alphanumeric code(upper case, lower case and digits) and will have something like this
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
var stringChars = new char[4];
var random = new Random();
for (int i = 0; i < stringChars.Length; i++)
{
stringChars[i] = chars[random.Next(chars.Length)];
}
var finalString = new String(stringChars);
For now I think that will give me more than enough combinations and if I ever do run out I can always up the length of the code. I want to keep it short because I don't want the user to have to type in huge as numbers.
I also don't have the time to make a more elegant solution maybe were they click a link or something in their email and it activates their account and of course this would cut down on someone trying to randomly guess a voucher number.
These are things I would deal with if the site every becomes more popular.
I am wondering though how can I handle the possible duplicate generation of the same voucher. My first thought was to check the database each time a voucher is created and if it exists then make a new one.
However that seems like it could be slow. So I thought also maybe getting all the keys first and store them in memory and they check there but if the list keeps growing I might run into out of memory exceptions and all that great stuff.
So does anyone have any ideas? Or am I stuck doing one of the 2 method I listed above?
I am using nhibernate, asp.net mvc and C#.
Edit
static void Main(string[] args)
{
List<string> hold = new List<string>();
for (int i = 0; i < 10000; i++)
{
HashAlgorithm sha = new SHA1CryptoServiceProvider();
byte[] result = sha.ComputeHash(BitConverter.GetBytes(i));
string hex = null;
foreach (byte x in result)
{
hex += String.Format("{0:x2}", x);
}
hold.Add(hex.Substring(0,3));
Console.WriteLine(hex.Substring(0, 4));
}
Console.WriteLine("Number of Distinct values {0}", hold.Distinct().Count());
}
above is my attempt to try to use hashing. However I think I am missing something as it seems to have quite a bit more duplicates then expected.
Edit 2
I think I added what I was missing but not sure if this is exactly what he meant. I am also not sure what to do in a situation when I moved it as far as I can move it(my has seems to give me a length of 40 places I can move it).
static void Main(string[] args)
{
int subStringLength = 4;
List<string> hold = new List<string>();
for (int i = 0; i < 10000; i++)
{
SHA1CryptoServiceProvider sha = new SHA1CryptoServiceProvider();
byte[] result = sha.ComputeHash(BitConverter.GetBytes(i));
string hex = null;
foreach (byte x in result)
{
hex += String.Format("{0:x2}", x);
}
int startingPositon = 0;
string possibleVoucherCode = hex.Substring(startingPositon,subStringLength);
string voucherCode = Move(subStringLength, hold, hex, startingPositon, possibleVoucherCode);
hold.Add(voucherCode);
}
Console.WriteLine("Number of Distinct values {0}", hold.Distinct().Count());
}
private static string Move(int subStringLength, List<string> hold, string hex, int startingPositon, string possibleVoucherCode)
{
if (hold.Contains(possibleVoucherCode))
{
int newPosition = startingPositon + 1;
if (newPosition <= hex.Length)
{
if ((newPosition + subStringLength) > hex.Length)
{
possibleVoucherCode = hex.Substring(newPosition, subStringLength);
return Move(subStringLength, hold, hex, newPosition, possibleVoucherCode);
}
// return something
return "0";
}
else
{
// return something
return "0";
}
}
else
{
return possibleVoucherCode;
}
}
}
It is going to be slow because you want to generate the vouchers randomly and then check the database for every generated code.
I would create a table vouchers with an id, the code and an is_used column. I would fill that table once with enough random codes. Since this can be done in a separate process, the performance won't be such a big problem. Let it run in the evening and the next day you get a fully filled vouchers-table.
If you want to prevent generating duplicate vouchers, that won't be a problem. You can generate them anyway and put them either in a System.Collections.Generic.HashSet (which prevents adding duplicates without throwing an exception) or call the Linq-method Distinct(), before adding them to that vouchers table.
If you insist on short codes:
Use a GUID as a primary key, generate one random number. How you might want to translate this in to alpha-num is up to you.
Use the last byte or two of the guid and the random number. 1234-684687 This should make it slightly less easy to bruteforce coupons. And handle any (rare) collisions with an exception.
Easy way to shorten an int, change it's base (from 10 to 62). (in VB, and this is old code)
This yields "2lkCB1" when given Int32.MaxValue
''//given intValue as your random integer
Dim result As String = String.Empty
Dim digits as String = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Dim x As Integer
While (intValue > 0)
x = intValue Mod digits.Length
result = digits(x) & result
intValue = intValue - x
intValue = intValue \ digits.Length
End While
Return result
But now we're already answering more than one question.
For a bulk data operation like this, I would recommend not using NHibernate and just doing straight ADO.NET.
Batch Check
Since you anticipate generating big batches of codes at once, you should batch multiple code checks into a single round-trip to the database. If you're using SQL Server 2008 or higher, you could do this using table-valued parameters, checking a whole list of codes at once.
SELECT DISTINCT b.Code
FROM #batch b
WHERE NOT EXISTS (
SELECT v.Code
FROM dbo.Voucher v
WHERE v.Code = b.Code
);
Concurrency
Now, what about concurrency issues? What if two users generate the same code at roughly the same time? Or simply in-between the time when we check the code for uniqueness and when we insert it into the Voucher table?
We can take care of that by modifying the query as follows:
DECLARE #batchid uniqueidentifier;
SET #batchid = NEWID();
INSERT INTO dbo.Voucher (Code, BatchId)
SELECT DISTINCT b.Code, #batchid
FROM #batch b
WHERE NOT EXISTS (
SELECT Code
FROM dbo.Voucher v
WHERE b.Code = v.Code
);
SELECT Code
FROM dbo.Voucher
WHERE BatchId = #batchid;
Executing via .NET
Assuming that you have defined the following table-valued user type...
CREATE TYPE dbo.VoucherCodeList AS TABLE (
Code nvarchar(8) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
/* !!! Remember to specify the collation on your Voucher.Code column too, since you want upper and lower-case codes. */
);
... you could execute this query via .NET code like this:
public ICollection<string> GenerateCodes(int numberOfCodes)
{
var result = new List<string>(numberOfCodes);
while (result.Count < numberOfCodes)
{
var batchSize = Math.Min(_batchSize, numberOfCodes - result.Count);
var batch = Enumerable.Range(0, batchSize)
.Select(x => GenerateRandomCode());
var oldResultCount = result.Count;
result.AddRange(FilterAndSecureBatch(batch));
var filteredBatchSize = result.Count - oldResultCount;
var collisionRatio = ((double)batchSize - filteredBatchSize) / batchSize;
// Automatically increment length of random codes if collisions begin happening too frequently
if (collisionRatio > _collisionThreshold)
CodeLength++;
}
return result;
}
private IEnumerable<string> FilterAndSecureBatch(IEnumerable<string> batch)
{
using (var command = _connection.CreateCommand())
{
command.CommandText = _sqlQuery; // the concurrency-safe query listed above
var metaData = new[] { new SqlMetaData("Code", SqlDbType.NVarChar, 8) };
var param = command.Parameters.Add("#batch", SqlDbType.Structured);
param.TypeName = "dbo.VoucherCodeList";
param.Value = batch.Select(x =>
{
var record = new SqlDataRecord(metaData);
record.SetString(0, x);
return record;
});
using (var reader = command.ExecuteReader())
while (reader.Read())
yield return reader.GetString(0);
}
}
Performance
After implementing all of this (and moving the command and parameter creation out of the loop so it would be re-used between batches), I was able to insert 10,000 codes using a batch size of 500 consistently in approx. 0.5 to 2 seconds, or 5 to 20 codes per millisecond.
Code Density / Collisions / Guessability
The _collisionThreshold field limits the density of your codes. It's a value between 0 and 1. Actually, it must be less than 1 or else you would wind up in an infinite loop when the 4 digit codes were exhausted (probably should add an assertion for this in code). I would recommend never turning it above 0.5 for performance reasons. More than 50% collisions would mean it's spending more time testing already-used codes than actually generating new ones.
Keeping the collision threshold low is how you would control how hard-to-guess your codes are. Setting _collisionThreshold to 0.01 would generate codes such that there's approximately a 1% chance of someone guessing a code.
If collisions occur too frequently, CodeLength (which is used by the GenerateRandomCode() method) will be incremented. This value needs to be persisted somewhere. After executing GenerateCodes(), check CodeLength to see if it has changed and then save the new value.
Source Code
The full code is available here: https://gist.github.com/3217856. I am the author of this code, and am releasing it under the MIT license. I had fun with this little challenge, and also got to learn how to pass a table-valued parameter to an inline parametrized query. I hadn't ever done that before. I've only ever passed them to full-fledged stored procedures.
A possible solution for you is like this:
Find the maximum ID of a voucher (an integer). Then, run any hash function on it, take the first 32 bits and convert to the string you want to show the user (or use a 32bit hash function such as Jenkins hash function). This will probably work, hash collisions are pretty rare. But this solution is very similar to yours, in the point of randomness.
You could run a test which finds the first 10 or 100 collisions (this should be enough for you) and forces the algorithm to "skip" them and use a different starting value. Then, you don't need to check the database at all (well, at least until you reach about 4294967296 vouchers...)
how about utilizing nHibernate's HiLo algorithm?
Here is an example on how you can get the next value (without DB access).

My static C# function is playing games with me... totaly weird!

So, I've written a small and, from what I initially thought, easy method in C#.
This static method is meant to be used as a simple password suggestion generator, and the code looks like this:
public static string CreateRandomPassword(int outputLength, string source = "")
{
var output = string.Empty;
for (var i = 0; i < outputLength; i++)
{
var randomObj = new Random();
output += source.Substring(randomObj.Next(source.Length), 1);
}
return output;
}
I call this function like this:
var randomPassword = StringHelper.CreateRandomPassword(5, "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890");
Now, this method almost always return random strings like "AAAAAA", "BBBBBB", "888888" etc.. , where I thought it should return strings like "A8JK2A", "82mOK7" etc.
However, and here is the wierd part; If I place a breakpoint, and step through this iteration line by line, I get the correct type of password in return. In 100% of the other cases, when Im not debugging, it gives me crap like "AAAAAA", "666666", etc..
How is this possible? Any suggestion is greatly appreciated! :-)
BTW, my system: Visual Studio 2010, C# 4.0, ASP.NET MVC 3 RTM project w/ ASP.NET Development Server. Haven't tested this code in any other environments.
Move the declaration for the randomObj outside the loop. When you're debugging it, it creates it with a new seed each time, because there's enough time difference for the seed to be different. But when you're not debugging, the seed time is basically the same for each iteration of the loop, so it's giving you the same start value each time.
And a minor nit -- it's a good habit to use a StringBuilder rather than a string for this sort of thing, so you don't have to re-initialize the memory space every time you append a character to the string.
In other words, like this:
public static string CreateRandomPassword(int outputLength, string source = "")
{
var output = new StringBuilder();
var randomObj = new Random();
for (var i = 0; i < outputLength; i++)
{
output.Append(source.Substring(randomObj.Next(source.Length), 1));
}
return output.ToString();
}
The behavior you're seeing is because Random is time-based, and when you're not debugging it flies through all 5 iterations at the same moment (more or less). So you're asking for the first random number off the same seed. When you're debugging, it takes long enough to get a new seed each time.
Move the declaration of Random outside the loop:
var randomObj = new Random();
for (var i = 0; i < outputLength; i++)
{
output += source.Substring(randomObj.Next(source.Length), 1);
}
Now you're moving forward 5 steps away from a Random seed instead of moving 1 step away from the same Random seed 5 times.
You are instantiating a new instance of Random() on each iteration through the loop with a new time-dependent seed. Given the granularity of the system clock and the speed of modern CPUs, this pretty much guarantees that you restart the pseudo-random sequence over and over with the same seed.
Try something like the following, though if you're single-threaded, you can safely omit the lock():
private static Random randomBits = new Random() ;
public static string CreateRandomPassword(int outputLength, string source = "")
{
StringBuilder sb = new StringBuilder(outputLength) ;
lock ( randomBits )
{
while ( sb.Length < outputLength )
{
sb.Append( randomBits.Next( source.Length) , 1 ) ;
}
}
return sb.ToString() ;
}
You only instantiate the RNG once. Every draws bits from the same RNG, so it should behave much more like a source of entropy. If you need repeatability for testing, use the Random constructor overload that lets you supply the seed. Same seed == same pseudo-random sequence.

Categories

Resources