Random.Next() - finding the Nth .Next() - c#

Given a consistently seeded Random:
Random r = new Random(0);
Calling r.Next() consistently produces the same series; so is there a way to quickly discover the N-th value in that series, without calling r.Next() N times?
My scenario is a huge array of values created via r.Next(). The app occasionally reads a value from the array at arbitrary indexes. I'd like to optimize memory usage by eliminating the array and instead, generating the values on demand. But brute-forcing r.Next() 5 million times to simulate the 5 millionth index of the array is more expensive than storing the array. Is it possible to short-cut your way to the Nth .Next() value, without / with less looping?

I don't know the details of the PRNG used in the BCL, but my guess is that you will find it extremely difficult / impossible to find a nice, closed-form solution for N-th value of the series.
How about this workaround:
Make the seed to the random-number generator the desired index, and then pick the first generated number. This is equally 'deterministic', and gives you a wide range to play with in O(1) space.
static int GetRandomNumber(int index)
{
    return new Random(index).Next();
}

In theory, if you knew the exact algorithm and the initial state, you could duplicate the series, but the end result would just be identical to calling r.Next().
Depending on how 'good' you need your random numbers to be, you might consider creating your own PRNG based on a linear congruential generator, which is relatively easy and fast to generate numbers from. If you can live with a "bad" PRNG, there are likely other algorithms that may suit your purpose better. Whether this would be faster or better than just storing a large array of numbers from r.Next() is another question.
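As a rough illustration of why a linear congruential generator fits this question: each step is the affine map x -> a*x + c (mod m), and affine maps compose into affine maps, so N steps can be collapsed by repeated squaring in O(log N) multiplications. The sketch below is only an assumption-laden example, not anything from System.Random: the class name JumpableLcg is hypothetical, the constants are the Numerical Recipes ones, and the modulus 2^32 comes from uint wrap-around.

```csharp
using System;

// Hypothetical helper, not part of the BCL: a 32-bit LCG with skip-ahead.
public static class JumpableLcg
{
    // Multiplier and increment from Numerical Recipes; the modulus is 2^32,
    // applied implicitly because uint arithmetic wraps around.
    const uint A = 1664525u;
    const uint C = 1013904223u;

    // One ordinary step: x -> A*x + C (mod 2^32).
    public static uint Step(uint x) => unchecked(A * x + C);

    // State reached after n steps from seed, using O(log n) multiplications.
    public static uint Jump(uint seed, ulong n)
    {
        uint accMul = 1u, accAdd = 0u;   // accumulated map, starts as identity
        uint curMul = A, curAdd = C;     // map for 2^k steps, starting at k = 0
        unchecked
        {
            while (n > 0)
            {
                if ((n & 1) == 1)
                {
                    // Compose: cur(acc(x)) = curMul*(accMul*x + accAdd) + curAdd.
                    accMul = curMul * accMul;
                    accAdd = curMul * accAdd + curAdd;
                }
                // Square the current map: cur(cur(x)).
                curAdd = curMul * curAdd + curAdd;
                curMul = curMul * curMul;
                n >>= 1;
            }
            return accMul * seed + accAdd;
        }
    }
}
```

With this, JumpableLcg.Jump(seed, 5000000) gives the same state as five million calls to Step, at the cost of about 23 loop iterations. The statistical quality is that of a plain LCG, so whether it is "good" enough depends on the application.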

No, I don't believe there is. For some RNG algorithms (such as linear congruential generators) it's possible in principle to get the n'th value without iterating through n steps, but the Random class doesn't provide a way of doing that.
I'm not sure whether the algorithm it uses makes it possible in principle -- it's a variant (details not disclosed in documentation) of Knuth's subtractive RNG, and it seems like the original Knuth RNG should be equivalent to some sort of polynomial-arithmetic thing that would allow access to the n'th value, but (1) I haven't actually checked that and (2) whatever tweaks Microsoft have made might break that.
If you have a good enough "scrambling" function f then you can use f(0), f(1), f(2), ... as your sequence of random numbers, instead of f(0), f(f(0)), f(f(f(0))), etc. (the latter being roughly what most RNGs do) and then of course it's trivial to start the sequence at any point you please. But you'll need to choose a good f, and it'll probably be slower than a standard RNG.
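One possible shape for such an f, sketched under assumptions: the mixing steps below are the 32-bit finalizer from MurmurHash3, used here as the "scrambling" function; the class name IndexedRandom and the way the seed is folded in are illustrative choices only. It has not been tested for statistical quality in this context.

```csharp
using System;

// Hypothetical helper: random-looking value for an index, without iteration.
public static class IndexedRandom
{
    // Returns a non-negative int for a given (seed, index) pair in O(1).
    public static int ValueAt(uint seed, uint index)
    {
        unchecked
        {
            // Spread the index with an odd constant, then offset by the seed.
            uint h = index * 0x9E3779B9u + seed;
            // Finalizer steps from MurmurHash3 (32-bit variant).
            h ^= h >> 16;
            h *= 0x85EBCA6Bu;
            h ^= h >> 13;
            h *= 0xC2B2AE35u;
            h ^= h >> 16;
            // Drop one bit so the result is a non-negative int.
            return (int)(h >> 1);
        }
    }
}
```

IndexedRandom.ValueAt(0, 4999999) then stands in for "the 5 millionth value" with no array and no loop. Note that with seed 0 and index 0 the result is 0, so a non-zero seed may be preferable.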

You could build your own on-demand dictionary of 'indexes' & 'random values'. This assumes that you will always 'demand' indexes in the same order each time the program runs or that you don't care if the results are the same each time the program runs.
Random rnd = new Random(0);
Dictionary<int,int> randomNumbers = new Dictionary<int,int>();
int getRandomNumber(int index)
{
    if (!randomNumbers.ContainsKey(index))
        randomNumbers[index] = rnd.Next();
    return randomNumbers[index];
}

Related

Random number within a range biased towards the minimum value of that range

I want to generate random numbers within a range (1 - 100000), but instead of purely random numbers I want the results to follow a kind of distribution. What I mean is that, in general, I want the numbers "clustered" around the minimum value of the range (1).
I've read about Box–Muller transform and normal distributions but I'm not quite sure how to use them to achieve the number generator.
How can I achieve such an algorithm using C#?
There are a lot of ways of doing this (using a uniform-distribution PRNG). Here are a few I know of:
1. Combine more uniform random variables to obtain the desired distribution
I am not a math guy, but there surely are equations for this. This kind of solution usually has the best properties from the randomness and statistical point of view. For more info see the famous:
Understanding “randomness”.
But there is only a limited number of distributions we know the combinations for.
2. Apply a non-linear function to a uniform random variable
This is the simplest to implement. You simply take floating-point randoms in the <0..1> range, apply your non-linear function to them (one that changes the distribution towards your wanted shape while the result stays in the <0..1> range), and rescale the result into your integer range, for example (in C++):
floor( pow( random(),5 ) * 100000 )
The problem is that this is just blind fitting of the distribution, so you usually need to tweak the constants a bit. It is a good idea to render histogram and randomness graphs to see the quality of the result directly, like here:
How to seed to generate random numbers?
You can also avoid overly blind fitting with Bézier curves, like here:
Random but most likely 1 float
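Since the question asks for C#, the C++ one-liner above might translate roughly as follows. This is only a sketch: the class name BiasedRandom and its parameters are made up for illustration, and the exponent 5 is the same arbitrary constant that would need tweaking against a histogram.

```csharp
using System;

// Hypothetical helper: skews uniform randoms toward the minimum of a range.
public static class BiasedRandom
{
    // Maps a uniform u in [0, 1) to an integer in [min, max], skewed toward min.
    // exponent > 1 strengthens the skew; exponent == 1 leaves it uniform.
    public static int FromUniform(double u, int min, int max, double exponent)
    {
        double shaped = Math.Pow(u, exponent);   // still in [0, 1)
        return min + (int)Math.Floor(shaped * (max - min + 1));
    }

    // Convenience overload drawing u from a System.Random instance.
    public static int Next(Random rng, int min, int max, double exponent)
    {
        return FromUniform(rng.NextDouble(), min, max, exponent);
    }
}
```

For example, BiasedRandom.Next(rng, 1, 100000, 5) returns values in 1..100000, and a uniform input of 0.5 maps to 3126, so half of all results fall at or below that value.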
3. Distribution-following pseudo-random generator
There are two approaches I know of for this. The simpler one is:
create a big enough array of size n
fill it with all values following the distribution
So simply loop through all the values you want to output, compute how many of each will be in the n-size array (from your distribution), and add that many copies of the number to the array. Beware that the filled size of the array might be slightly less than n due to rounding. If n is too small, you will be missing some of the less frequent numbers, so the probability of the least probable number multiplied by n should be at least 1. After the filling, change n to the real array size (the number of values actually filled in).
shuffle the array
now use the array as a linear list of random numbers
So instead of calling random() you just pick a number from the array and move to the next one. Once you reach the n-th value, shuffle the array and start from the first one again.
This solution has very good statistical properties (it follows the distribution exactly), but the randomness properties are not good and it requires an array and occasional shuffling. For more info see:
How to efficiently generate a set of unique random numbers with a predefined distribution?
The other variation of this avoids the use of an array and shuffling. It goes like this:
get random value in range <0..1>
apply inverse cumulated distribution function to convert to target range
As you can see, it's like the #2 Apply a non-linear function... approach, but instead of "some" non-linear function you use the distribution directly. So if p(x) is the probability of x in the range <0..1>, where 1 means 100%, then we need a function that accumulates all the probabilities up to x (the cumulative distribution function). For integers:
f(x) = p(0)+p(1)+...+p(x)
Now we need inverse function g() to it so:
y = f(x)
x = g(y)
Now if my memory serves me well then the generation should look like this:
y = random(); // <0..1>
x = g(y); // probability -> value
Many distributions have a known g() function, but for those that do not (or when we are too lazy to derive it) you can use binary search on the cumulative f(x), which is non-decreasing. I was too lazy to code it, so here is the slower linear-search version:
for (x=0;x<max;x++) if (f(x)>=y) break;
So when I put it all together (using only p(x)) I got this (C++):
y=random(); // uniform distribution pseudo random value in range <0..1>
for (f=0.0,x=0;x<max;x++) // loop x through all values
{
    f+=p(x); // f(x) cumulative distribution function
    if (f>=y) break;
}
// here x is your pseudo random value following p(x) distribution
This kind of solution usually has very good statistical and randomness properties, and does not require the distribution to be a continuous function (it can even be just an array of values).
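The binary-search variant mentioned above could look roughly like this in C#. It is a sketch under assumptions: the class name DiscreteSampler is hypothetical, p is passed in as an array of probabilities indexed by x, and the cumulative sums are precomputed once so that each draw costs O(log n).

```csharp
using System;

// Hypothetical helper: inverse-CDF sampling over a discrete distribution.
public sealed class DiscreteSampler
{
    private readonly double[] _cdf;   // _cdf[x] = p(0) + p(1) + ... + p(x)

    public DiscreteSampler(double[] p)
    {
        _cdf = new double[p.Length];
        double sum = 0.0;
        for (int x = 0; x < p.Length; x++)
        {
            sum += p[x];
            _cdf[x] = sum;
        }
    }

    // Smallest x with f(x) >= y, found by binary search.
    // If rounding leaves the last cumulative sum below y, the last x is returned.
    public int Invert(double y)
    {
        int lo = 0, hi = _cdf.Length - 1;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (_cdf[mid] >= y) hi = mid;
            else lo = mid + 1;
        }
        return lo;
    }

    // Draws one value following p(x).
    public int Next(Random rng)
    {
        return Invert(rng.NextDouble());
    }
}
```

For the original question, p would hold 100000 decreasing probabilities so that small values dominate, and the returned x would be shifted by one to land in 1..100000.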

How double hashing works in case of the .NET Dictionary?

The other day I was reading that article on CodeProject,
and I had a hard time understanding a few points about the implementation of the .NET Dictionary (considering the implementation here, without all the optimizations in .NET Core):
Note: If will add more items than the maximum number in the table
(i.e 7199369), the resize method will manually search the next prime
number that is larger than twice the old size.
Note: The reason that the sizes are being doubled while resizing the
array is to make the inner-hash table operations to have asymptotic
complexity. The prime numbers are being used to support
double-hashing.
So I tried to remember my CS classes from a decade ago, with the help of my good friend Wikipedia:
Open Addressing
Separate Chaining
Double Hashing
But I still don't really see how it relates to double hashing (which is a collision resolution technique for open-addressed hash tables), apart from the fact that the Resize() method doubles the number of entries, rounded to a prime number chosen from the current/old size. And to be honest, I don't really see the benefit of "doubling" the size, or what "asymptotic complexity" refers to (I guess the article meant the O(n) cost incurred when the underlying entries array is full and has to be resized).
First, if you double the size with or without using a prime, is it not really the same?
Second, to me, the .NET hash table uses a separate chaining technique for collision resolution.
I guess I must have missed a few things, and I would like someone to shed some light on those two points.
I got my answer on Reddit, so I am going to try to summarize it here:
Collision Resolution Technique
First off, it seems that collision resolution uses the separate chaining technique, not open addressing, and therefore there is no double hashing strategy:
The code goes as follows:
private struct Entry
{
    public int hashCode; // Lower 31 bits of hash code, -1 if unused
    public int next; // Index of next entry, -1 if last
    public TKey key; // Key of entry
    public TValue value; // Value of entry
}
It is just that, instead of having a dedicated storage (a list or the like) per bucket for the entries sharing the same hash code / index, everything is stored in the same entries array.
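To make the layout concrete, here is a heavily simplified sketch of separate chaining with a single shared entries array. It is not the actual Dictionary source: the class name TinyChainedMap is hypothetical, keys and values are plain ints, and there is no resize or removal. It only shows how a buckets array holds the index of the first entry of each chain and how next links the rest.

```csharp
using System;

// Minimal int -> int map, illustrative only: no resize, no removal.
public sealed class TinyChainedMap
{
    private struct Entry
    {
        public int Key;
        public int Value;
        public int Next;   // index of next entry in the chain, -1 if last
    }

    private readonly int[] _buckets;     // index of first entry per bucket, -1 if empty
    private readonly Entry[] _entries;   // every chain lives in this one array
    private int _count;

    public TinyChainedMap(int capacity)
    {
        _buckets = new int[capacity];
        for (int i = 0; i < capacity; i++) _buckets[i] = -1;
        _entries = new Entry[capacity];
    }

    private int BucketOf(int key)
    {
        return (key & 0x7FFFFFFF) % _buckets.Length;
    }

    public void Set(int key, int value)
    {
        int b = BucketOf(key);
        // Walk the chain for this bucket; overwrite if the key already exists.
        for (int i = _buckets[b]; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].Key == key) { _entries[i].Value = value; return; }
        }
        if (_count == _entries.Length)
            throw new InvalidOperationException("Full (this sketch has no resize).");
        // Prepend the new entry to the chain.
        _entries[_count] = new Entry { Key = key, Value = value, Next = _buckets[b] };
        _buckets[b] = _count++;
    }

    public bool TryGet(int key, out int value)
    {
        for (int i = _buckets[BucketOf(key)]; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].Key == key) { value = _entries[i].Value; return true; }
        }
        value = 0;
        return false;
    }
}
```

With a capacity of 7, keys 1 and 8 land in the same bucket and end up chained through next, yet both stay retrievable; no second hash function is involved.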
Prime Number
About the prime number, the answer lies here: https://cs.stackexchange.com/a/64191/42745. It is all about common factors:
Therefore, to minimize collisions, it is important to reduce the number of common factors between m and the elements of K. How can this
be achieved? By choosing m to be a number that has very few factors: a
prime number.
Doubling the underlying entries array size
It helps to avoid too many resize operations (i.e. copies) by growing the array by a large enough number of slots each time.
See that answer: https://stackoverflow.com/a/2369504/4636721
Hash-tables could not claim "amortized constant time insertion" if,
for instance, the resizing was by a constant increment. In that case
the cost of resizing (which grows with the size of the hash-table)
would make the cost of one insertion linear in the total number of
elements to insert. Because resizing becomes more and more expensive
with the size of the table, it has to happen "less and less often" to
keep the amortized cost of insertion constant.
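The argument in the quote can be checked with a small counting experiment. The sketch below is only a model: GrowthCost is a hypothetical name, the growth rule is passed in as a delegate, and plain doubling is used instead of the prime sizes the real Dictionary picks.

```csharp
using System;

// Hypothetical helper: counts the copy work caused by array growth.
public static class GrowthCost
{
    // Counts element copies made while appending n items to an array that
    // starts at capacity 1 and grows via the supplied rule whenever it is full.
    public static long CopiesFor(int n, Func<int, int> grow)
    {
        long copies = 0;
        int capacity = 1;
        for (int count = 0; count < n; count++)
        {
            if (count == capacity)
            {
                copies += count;            // a resize copies every existing element
                capacity = grow(capacity);
            }
        }
        return copies;
    }
}
```

For 1024 insertions, doubling (c => c * 2) costs 1023 copies in total, about one per insertion, while growing by a constant increment of one slot (c => c + 1) costs 523776 copies, which is quadratic in the number of elements.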

Create an identical "random" float based on multiple data

I'm working on a game (Unity) and I need to create a random float value (between 0 and 1) based on multiple int and/or float.
I think it'll be easier to manually create a single string for the function, but maybe it could accept a list of int and/or float.
Example of result:
"[5-91]-52-1" > 0.158756..
Important points:
The distribution of results (between 0 and 1) must be uniform (I don't want 90% of results between 0.45 and 0.55)
Asking twice for the same string must return the exact same result (even if I reload the app, or start it on a different computer, ...)
Results have no need to be unique.
Bonus Point:
Sometimes I need similar strings to return close results, but not every time. Is it possible for the "random generation" to take a boolean that toggles this feature?
What you've described is essentially the definition of a hash function.
So just use one and normalize the results into the range you want. The most basic case can use GetHashCode, but it is not guaranteed to produce the same results across different versions of the framework.
A stable version that is guaranteed to produce exactly the same results across machines would use a well-known, good hash, such as the cryptographic hash SHA256: take the first several bytes of the result as an integer and normalize. Cryptographic hash functions also conveniently take byte arrays as input, so you can combine multiple values as bytes directly and get a stable result.
var intValue = 42;
// Note: BitConverter uses the machine's byte order (see BitConverter.IsLittleEndian).
var bytesToHash = BitConverter.GetBytes(intValue);
var hash = System.Security.Cryptography.SHA256.Create()
    .ComputeHash(bytesToHash);
var toNormalize = BitConverter.ToUInt32(hash, 0);
// Normalize into the range [0, 1].
var fancyRandom = (double)toNormalize / UInt32.MaxValue;
To combine multiple values into a byte array, you can either manually concatenate the results of BitConverter.GetBytes or use a BinaryWriter on a MemoryStream.
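A sketch of the BinaryWriter route, assuming three inputs of fixed types purely for illustration (the class name StableRandom and the parameter list are made up). BinaryWriter writes its values in little-endian order, and the integer is assembled byte by byte, so the result should not depend on the machine's byte order.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: stable pseudo-random double derived from several values.
public static class StableRandom
{
    // Hashes the given values and maps the first 4 hash bytes to [0, 1].
    public static double FromValues(int a, int b, float c)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        using (var sha = SHA256.Create())
        {
            writer.Write(a);
            writer.Write(b);
            writer.Write(c);
            writer.Flush();
            byte[] hash = sha.ComputeHash(stream.ToArray());
            // Assemble the integer explicitly instead of relying on BitConverter.
            uint n = (uint)hash[0] | (uint)hash[1] << 8 | (uint)hash[2] << 16 | (uint)hash[3] << 24;
            return (double)n / uint.MaxValue;
        }
    }
}
```

Calling StableRandom.FromValues(5, 91, 52f) twice returns the same double each time, which is the property the question asks for.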
Alternatively, you can use the resulting integer as a seed for some custom implementation of a pseudo-random generator (the one in .NET does not guarantee the same results across machines or versions of .NET), as suggested in the comments, but I don't think it will give a significantly better distribution.
Note: make sure the resulting numbers are distributed "randomly enough" for your case. Cryptographic hash functions likely give the result you want, but I'm not sure how to prove that.
For the "bonus" part: I would be very surprised if you could find a pseudo-random generator that consistently produces close results for "similar" seeds. Instead, you can use the same approach as above on separate parts of the input: one part that stays the "same" and another that carries the variation (e.g. intValue & 0xFFFFFF00 for the stable part, intValue & 0xFF for the "small difference"), and then combine the resulting "random" numbers with some weight: randomFromStable + 0.05 * randomFromDifference.
I would suggest using the hashcode (or something similar) as the seed to a Random object. Hashcodes must be the same for the same string so you will always get the same sequence back.
As Nuf notes, hashcodes are only guaranteed to be the same in the same app-domain; so it may not work across restarts.
As to your bonus point, getting there without writing your own RNG will be hard. Any variance in the seed can and should cause a lot of variation in the resulting sequence.

C# random number generator

I'm looking for a random number generator that always produces the same "random" number for a given seed. The seed is defined by x + (y << 16), where x and y are positions on a heightmap.
I could create a new instance of System.Random every time with my seed, but that's a lot of GC pressure, especially since this will be called a lot of times.
EDIT:
"A lot" means half a million times.
Thanks to everyone that answered! I know I was unclear, but I learned here that a hash function is exactly what I want.
Since a hash function is apparently closer to what you want, consider a variation of the following:
int Hash(int n) {
    const int prime = 1031;
    return (((n & 0xFFFF) * prime % 0xFFFF)) ^ (n >> 16);
}
This XORs the least significant two bytes with the most significant two bytes of a four-byte number, after shuffling the least significant two bytes around a little by multiplication with a prime number. For non-negative n, the result is thus in the range 0 ≤ result < 0x10000 (i.e. it fits in a UInt16).
This should “shuffle” the input number a bit, reliably produces the same value for the same input and looks “random”. Now, I haven’t done a stochastic analysis of the distribution and if ever a statistician was to look at it, he would probably go straight into anaphylactic shock. (In fact, I have really written this implementation off the top of my head.)
If you require something less half-baked, consider using an established checksum (such as CRC32).
I could create a new instance of System.Random every time with my seed
Do that.
but that's a lot of GC pressure. Especially since this will be called a lot of times.
How many times do you call it? Does it verifiably perform badly? Notice, the GC is optimized to deal with lots of small objects with short life time. It should deal with this easily.
And, what would be the alternative that takes a seed but doesn’t create a new instance of some object? That sounds rather like a badly designed class, in fact.
See Simple Random Number Generation for C# source code. The state is just two unsigned integers, so it's easy to keep up with between calls. And the generator passes standard tests for quality.
What about storing a Dictionary<int, int> that provides the first value returned by a new Random object for a given seed?
class RandomSource
{
    Dictionary<int, int> _dictionary = new Dictionary<int, int>();
    public int GetValue(int seed)
    {
        int value;
        if (!_dictionary.TryGetValue(seed, out value))
        {
            value = _dictionary[seed] = new Random(seed).Next();
        }
        return value;
    }
}
This incurs the GC pressure of constructing a new Random instance the first time you want a value for a particular seed, but every subsequent call with the same seed will retrieve a cached value instead.
I don't think a "random number generator" is actually what you're looking for. Simply create another map and pre-populate it with random values. If your current heightmap is W x H, the simplest solution would be to create a W x H 2D array and just fill each element with a random value using System.Random. You can then look up the pre-populated random value for a particular (x, y) coordinate whenever you need it.
Alternatively, if your current heightmap actually stores some kind of data structure, you could modify that to store the random value in addition to the height value.
A side benefit that this has is that later, if you need to, you can perform operations over the entire "random" map to ensure that it has certain properties. For example, depending on the context (is this for a game?) you may find later that you want to smooth the randomness out across the map. This is trivial if you precompute and store the values as I've described.
CSharpCity provides source to several random number generators. You'll have to experiment to see whether these have less impact on performance than System.Random.
ExtremeOptimization offers a library with several generators. They also discuss quality and speed of the generators and compare against System.Random.
Finally, what do you mean by GC pressure? Do you really mean memory pressure, which is the only context I've seen it used in? The job of the GC is to handle the creation and destruction of gobs of objects very efficiently. I'm concerned that you're falling for the premature optimization temptation. Perhaps you can create a test app that gives some cold, hard numbers.

Is there a method to randomize integers so that visitors can't figure out the sequence of objects

I have an id in the URL. Normally it would be an auto-incremented number, so it would be 1, 2, 3, 4, 5, ...
I don't want visitors to figure out the sequence, so I want the number to be kind of random: 1 would be converted to 174891, 2 to 817482, and so on. But I want the result to stay in a specific range, like 1 to 1,000,000.
I figured out I can do this by XORing and shifting the bits of the integer, but I was wondering whether this has already been implemented somewhere.
Thanks
You could pass your integer as the seed to a random number generator. (Just make sure that it would be unique)
You could also generate the SHA-512 hash of the integer and use that instead.
However, the best thing to do here is to use a GUID instead of an integer.
EDIT: If it needs to be reversible, the correct way to do it is to encrypt the number using AES or a different encryption algorithm. However, this won't result in a number between one and a million.
Don't rely on obscurity -- i.e., non-sequential ids -- for security. Build your app so that even if someone does guess the next id, it's still secure.
If you do need non-sequential ids, though, generate a new id randomly each time. Store it in your table in a uniquely indexed column alongside your autogenerated primary key id. Then all you need to do is look up that column to get back the real id.
EDIT: In general, I prefer tvanfosson's approach on both scores. However, here's an answer to the question as stated...
These are fairly strange design constraints, to be honest - but they're reasonably easy to deal with:
Pick an arbitrary RNG seed which you will use on every execution of your program
Create an instance of Random using that seed
Create an array of integers 1..1000000
Shuffle the array using the Random instance
Create a "reverse mapping" array by going through the original array like this:
int[] reverseMapping = new int[mapping.Length];
for (int i = 0; i < mapping.Length; i++)
{
    // mapping holds values 1..N, so subtract 1 to get a valid array index
    reverseMapping[mapping[i] - 1] = i + 1;
}
Then you can map both ways. This does rely on the algorithm used by Random not changing, admittedly... if that's a concern, you could always generate this mapping once and save it somewhere.
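Put together, the steps above might look like the following sketch. The class name IdScrambler and its method names are hypothetical, the shuffle is a standard Fisher-Yates shuffle, and both arrays are indexed by id minus one so that ids 1..count fit exactly.

```csharp
using System;

// Hypothetical helper: fixed pseudo-random permutation of the ids 1..count.
public sealed class IdScrambler
{
    private readonly int[] _mapping;         // _mapping[id - 1] = public id
    private readonly int[] _reverseMapping;  // _reverseMapping[publicId - 1] = id

    public IdScrambler(int count, int seed)
    {
        var rng = new Random(seed);
        _mapping = new int[count];
        for (int i = 0; i < count; i++) _mapping[i] = i + 1;

        // Fisher-Yates shuffle driven by the seeded Random instance.
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = _mapping[i];
            _mapping[i] = _mapping[j];
            _mapping[j] = tmp;
        }

        _reverseMapping = new int[count];
        for (int i = 0; i < count; i++) _reverseMapping[_mapping[i] - 1] = i + 1;
    }

    public int ToPublic(int id) { return _mapping[id - 1]; }
    public int ToInternal(int publicId) { return _reverseMapping[publicId - 1]; }
}
```

For the range in the question this would be new IdScrambler(1000000, someFixedSeed), which holds two int arrays of a million elements each (about 8 MB in total).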
If you're looking for a fairly simple pseudo-random integer sequence, the linear congruential method is pretty good:
n_{i+1} = (a × n_i + k) mod m
Use prime numbers for a and k.
