Selecting a random item from a list with probability weighting using C#? - c#

I have a scenario where I take a list of users (20 users) from my database and give each user a weighting:
first 5 users: probability factor of 4
next 5 users: probability factor of 3
next 5 users: probability factor of 2
next 5 users: probability factor of 1
So a user in the first 5 is 4 times more likely to be picked than a user in the last 5.
How can I select a random user from the list using these probabilities in C#?
Can anybody help me with this? I am totally stuck on the logic.

You could add each user to the list as many times as its probability factor. So the first 5 users are in the list 4 times, the next 5 users 3 times, and so on. Then just select one user at random from the complete list.

Create a list of partial sums of weights. In your example, it would be
[4, 8, 12, 16, 20, 23, ...]
The last element is the sum of all weights. Pick a random integer between 0 (inclusive) and this sum (exclusive). Then your element is the first element whose partial sum is greater than the random number. So if you got 11, you need the third element; if you got 16, the fifth; and so on.
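The partial-sum lookup can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the class and method names (WeightedPicker, PartialSums, IndexForRoll, PickIndex) are made up for this example, and the weights are assumed to be positive integers.

```csharp
using System;
using System.Collections.Generic;

public static class WeightedPicker
{
    // Builds the running totals [w0, w0+w1, ...] from the weights.
    public static List<int> PartialSums(IEnumerable<int> weights)
    {
        var sums = new List<int>();
        int total = 0;
        foreach (int w in weights)
        {
            total += w;
            sums.Add(total);
        }
        return sums;
    }

    // Returns the index of the first partial sum that is greater than roll.
    public static int IndexForRoll(IReadOnlyList<int> partialSums, int roll)
    {
        for (int i = 0; i < partialSums.Count; i++)
        {
            if (roll < partialSums[i]) return i;
        }
        throw new ArgumentOutOfRangeException(nameof(roll));
    }

    // Picks an index with probability proportional to its weight.
    public static int PickIndex(IReadOnlyList<int> partialSums, Random rng)
    {
        int total = partialSums[partialSums.Count - 1];
        return IndexForRoll(partialSums, rng.Next(total)); // 0 <= roll < total
    }
}
```

With the weights 4, 4, 4, 4, 4, 3 the partial sums start 4, 8, 12, 16, 20, 23, and a roll of 11 maps to index 2 (the third user). For long lists the linear scan could be swapped for a binary search over the partial sums.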

I have a (bit hacky) solution for you:
Create a list containing the users, where each user is added as often as their weight says (e.g. if a user has a weight of 5, add them to the list 5 times). Then use a Random to fetch a user from that list; that should solve your problem.

One solution would be to find the smallest common denominator of the weights (or just multiply them together) and create a new list that contains the keys of the first list, but multiple times, i.e.:
user1
user1
user2
user3
user3
user3
Then just do newList.Skip(random.Next(newList.Count)).Take(1), where random is a Random instance, and you are set!
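A rough sketch of this "multiple entries" approach is shown below. The names (ExpandedListPicker, Expand, Pick) are hypothetical, and integer weights are assumed; it is only practical while the weights stay small.

```csharp
using System;
using System.Collections.Generic;

public static class ExpandedListPicker
{
    // Repeats each item Weight times, e.g. ("a", 2), ("b", 1) becomes a, a, b.
    public static List<T> Expand<T>(IEnumerable<(T Item, int Weight)> weighted)
    {
        var expanded = new List<T>();
        foreach (var (item, weight) in weighted)
        {
            for (int i = 0; i < weight; i++) expanded.Add(item);
        }
        return expanded;
    }

    // Every slot is equally likely, so heavier items win more often.
    public static T Pick<T>(IReadOnlyList<T> expanded, Random rng)
    {
        return expanded[rng.Next(expanded.Count)];
    }
}
```

Indexing the list directly does the same job as Skip(...).Take(1) without walking the sequence.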

You could apportion the probability range amongst the users using a dictionary. eg
User 1 has 1-4 (so max of 4)
User 2 has 5-8 (max of 8) etc etc...
Then after selecting the random number find which user within the dictionary it relates to. You can do this using Linq like so...
int iUser = users.OrderBy(p => p.Value).First(p => choice <= p.Value).Key;
...where users is a Dictionary<int,int> (Key = user number, Value = max value) and choice is the randomly generated value. The OrderBy is there because a Dictionary does not guarantee its enumeration order.
This is obviously more complex than the "multiple entries" method proposed by others but has its advantages if you
a) need a fractional weighting which makes the common denominator of your multiple entry method very small (resulting in many entries) or
b) need to weight heavily in favour of particular users (which would again have the effect of making the multiple entry method very large).
Working Example at ideone.

Related

Efficient and deterministic ranking items in collection?

I have a list of billions of items in SQL that a user can reorder arbitrarily by moving an item to another position in the list. I am considering a simple solution that halves the gap between double-valued ranks:
Id, Rank
1 10
2 20
3 30
4 40
5 50
Now the user moves item id=3 to the first position, and I recalculate its rank from its adjacent items (0 means no neighbour on the left, max means no neighbour on the right):
Id, Rank
3 (0+10)/2 = 5
1 10
2 20
4 40
5 50
There is a bug, though: this works only until the gap between neighbouring ranks shrinks to the precision limit (epsilon) of a double. After that you end up with several elements whose ranks can no longer be told apart, and they cannot be moved.
This can be avoided by occasionally recalculating the rank for the entire collection, but I hesitate to implement that at the moment because it looks like too much work.
I would like to know whether there is some other algorithmic solution that avoids updating billions of items, or whether this problem has a well-known name so that I can look for an appropriate solution myself.
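To make the precision limit concrete, here is a small sketch that counts how many times an item can be inserted directly after the same left neighbour before the midpoint stops being distinguishable. The class and method names are invented for this illustration, and the worst case (always inserting into the same gap) is assumed.

```csharp
public static class RankMidpoint
{
    // Repeatedly inserts at the midpoint between left and right, always keeping
    // the new item as the right neighbour, until the midpoint is no longer
    // strictly between the two ranks.
    public static int InsertsUntilCollision(double left, double right)
    {
        int count = 0;
        while (true)
        {
            double mid = (left + right) / 2;
            if (mid <= left || mid >= right) return count;
            right = mid;
            count++;
        }
    }
}
```

For the ranks 10 and 20 from the example this should give up after roughly fifty insertions, since a double carries about 52 bits of mantissa.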

How to pick random items through cumulative probability?

Exercise Background
The exercise consists of generating a 2D map whose x,y size is given by the user, and then placing random items from a table on each cell of the map.
I have an Items matrix indexed by [x, y] coordinates, and I have to pick items randomly for every cell of this matrix.
My Problem
I have to select random items from a table of 4 items whose probabilities are given as cumulative probabilities. A cell can hold more than one item, in different combinations.
I don't really know how to go about this problem, taking into account that 2 of the items have the same probability in the table given for the homework.
This is the table of probability given:
Food - 1
Weapons - 0.5
Enemy - 0.5
Trap - 0.3
My Items enumeration:
[Flags]
enum Items
{
    Food = 1 << 0,
    Weapon = 1 << 1,
    Enemy = 1 << 2,
    Trap = 1 << 3
}
Again, the expected output is a random pick, according to these probabilities, of which items a cell has. What I'd like as an answer is just a starting point or a way to approach the problem; I still want to try to do it myself, so please avoid complete code solutions if you can.
I find it easier to work with integers in this type of problem, so I'll work with:
Food - 10
Weapons - 5
Enemy - 5
Trap - 3
That gives a total of 10 + 5 + 5 + 3 = 23 total possible options.
Most computer RNGs work from base 0, so split the 23 options (as in 0..22) like this:
Food - 0..9 giving 10 options.
Weapons - 10..14 giving 5 options.
Enemy - 15..19 giving 5 options.
Trap - 20..22 giving 3 options.
Work through the possibilities in order, stopping when you reach the selected option. I will use pseudocode as my C++ is very rusty:
function pickFWET()
    pick <- randomInRange(0 to 22);
    if (pick < 10) return FOOD;
    if (pick < 15) return WEAPONS;
    if (pick < 20) return ENEMY;
    if (pick < 23) return TRAP;
    // If we reach here then there was an error.
    throwError("Wrong pick in pickFWET");
end function pickFWET
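A possible C# rendering of the pseudocode above, using the Items enumeration from the question. The ItemPicker class and its method names are assumptions made for this sketch, and it returns exactly one item per roll, as the pseudocode does.

```csharp
using System;

[Flags]
public enum Items
{
    Food = 1 << 0,
    Weapon = 1 << 1,
    Enemy = 1 << 2,
    Trap = 1 << 3
}

public static class ItemPicker
{
    // Maps a roll in 0..22 to an item using the ranges
    // 0..9 Food, 10..14 Weapon, 15..19 Enemy, 20..22 Trap.
    public static Items FromRoll(int pick)
    {
        if (pick < 0 || pick >= 23) throw new ArgumentOutOfRangeException(nameof(pick));
        if (pick < 10) return Items.Food;
        if (pick < 15) return Items.Weapon;
        if (pick < 20) return Items.Enemy;
        return Items.Trap;
    }

    // Random.Next(23) returns 0..22; the upper bound is exclusive.
    public static Items PickRandom(Random rng) => FromRoll(rng.Next(23));
}
```

Keeping the roll-to-item mapping separate from the random call makes the ranges easy to check by hand.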
If two items have the same cumulative probability then the probability of getting the latter item is 0. Double check the probability table, but if it is correct, then 'Weapons' is not a valid option to get.
In general, however, if you could 'somehow' generate a random number between 0 and 1, the problem would be easy, right? With a few if conditions you can choose one of the options given this random number.
With a little bit of searching you can easily find out how to generate a random number in whatever language you like.

Ideas about Generating Untraceable Invoice IDs

I want to print invoices for customers in my app. Each invoice has an Invoice ID. I want IDs to be:
Sequential (IDs issued later are larger)
32-bit integers
Not easily traceable like 1 2 3, so that people can't tell how many items we sell.
An idea of my own:
Number of seconds since a specific date and time (e.g. 1/1/2010 00:00).
Any other ideas on how to generate these numbers?
I don't like the idea of using time. You can run into all sorts of issues - time differences, several events happening in a single second and so on.
If you want something sequential and not easily traceable, how about generating a random number between 1 and whatever you wish (for example 100) for each new Id. Each new Id will be the previous Id + the random number.
You can also add a constant to your IDs to make them look more impressive. For example you can add 44323 to all your IDs and turn IDs 15, 23 and 27 into 44338, 44346 and 44350.
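The random-increment idea could look something like the sketch below. GappedIdGenerator and its parameters are hypothetical names; the default step range of 1 to 100 is the example value from the answer.

```csharp
using System;

public static class GappedIdGenerator
{
    // Next ID = previous ID + a random step in 1..maxStep (inclusive).
    public static int Next(int previousId, Random rng, int maxStep = 100)
    {
        int step = rng.Next(1, maxStep + 1); // upper bound is exclusive
        return checked(previousId + step);   // throws on 32-bit overflow
    }
}
```

With an average step of about 50, the 32-bit range is used up roughly 50 times faster than with plain consecutive numbers, which is worth keeping in mind when choosing maxStep.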
There are two problems in your question. One is solvable, one isn't (with the constraints you give).
Solvable: Unguessable numbers
The first one is quite simple: It should be hard for a customer to guess a valid invoice number (or the next valid invoice number), when the customer has access to a set of valid invoice numbers.
You can solve this with your constraint:
Split your invoice number in two parts:
A 20 bit prefix, taken from a sequence of increasing numbers (e.g. the natural numbers 0,1,2,...)
A 10 bit suffix that is randomly generated
With this scheme, there are about 1 million valid invoice numbers. You can precalculate them and store them in the database. When presented with an invoice number, check whether it is in your database. If it isn't, it's not valid.
Use a SQL sequence for handing out numbers. When issuing a new (i.e. unused) invoice number, increment the sequence and issue the n-th number from the precalculated list (ordered by value).
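As a rough illustration of the prefix/suffix split, the sketch below packs a 20-bit sequence value and a 10-bit random suffix into one integer. The InvoiceNumber class and its members are invented for this example, and RandomNumberGenerator.GetInt32 assumes .NET Core 3.0 or later; storing and validating the issued numbers in the database is left out.

```csharp
using System;
using System.Security.Cryptography;

public static class InvoiceNumber
{
    public const int PrefixBits = 20;
    public const int SuffixBits = 10;

    // Combines a sequence value (below 2^20) with a suffix (below 2^10).
    public static int Compose(int sequence, int suffix)
    {
        if (sequence < 0 || sequence >= (1 << PrefixBits))
            throw new ArgumentOutOfRangeException(nameof(sequence));
        if (suffix < 0 || suffix >= (1 << SuffixBits))
            throw new ArgumentOutOfRangeException(nameof(suffix));
        return (sequence << SuffixBits) | suffix;
    }

    // Draws the suffix from a cryptographic generator (assumes .NET Core 3.0+).
    public static int NewSuffix() => RandomNumberGenerator.GetInt32(1 << SuffixBits);

    // Recovers the sequence part by dropping the suffix bits.
    public static int SequencePart(int invoiceNumber) => invoiceNumber >> SuffixBits;
}
```

Because the sequence occupies the high bits, numbers composed this way still sort in issue order.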
Not solvable: Guessing the number of customers
If you want to prevent a customer who holds a number of valid invoice numbers from guessing how many invoice numbers you have issued so far (and therefore how many customers you have): this is not possible.
What you have here is a variant of the so-called "German tank problem". In the Second World War, the Allies used serial numbers printed on the gearboxes of German tanks to estimate how many tanks Germany had produced. This worked because the serial numbers increased without gaps.
But even when you increase the numbers with gaps, the solution for the German tank problem still works. It is quite easy:
You use the method described here to guess the highest issued invoice number
You guess the mean difference between two successive invoice numbers and divide the highest number by this value
You can use linear regression to get a stable delta value (if it exists).
Now you have a good guess about the order of magnitude of the number of invoices (200, 15000, half an million, etc.).
This works as long as a mean value (theoretically) exists for the gap between two successive invoice numbers. This is usually the case, even when using a random number generator, because most random number generators are designed to have such a mean value.
There is a countermeasure: you have to make sure that no mean value exists for the gap between two successive numbers. A random number generator with this property can be constructed very easily.
Example:
Start with the last invoice number plus one as current number
Multiply the current number with a random number >=2. This is your new current number.
Get a random bit: If the bit is 0, the result is your current number. Otherwise go back to step 2.
While this will work in theory, you will very soon run out of 32 bit integer numbers.
I don't think there is a practical solution for this problem. Either the gap between two successive number has a mean value (with little variance) and you can guess the amount of issued numbers easily. Or you will run out of 32 bit numbers very quickly.
Snakeoil (non working solutions)
Don't use any time-based solution. The timestamp is usually easy to guess (an approximately correct timestamp will probably be printed somewhere on the invoice). Using timestamps usually makes it easier for the attacker, not harder.
Don't use insecure random numbers. Most random number generators are not cryptographically safe. They usually have mathematical properties that are good for statistics but bad for your security (e.g. a predictable distribution, a stable mean value, etc.)
One solution may involve Exclusive OR (XOR) binary bitmaps. The result function is reversible, may generate non-sequential numbers (if the first bit of the least significant byte is set to 1), and is extremely easy to implement. And, as long as you use a reliable sequence generator (your database, for example,) there is no need for thread safety concerns.
According to MSDN, 'the result [of an exclusive-OR operation] is true if and only if exactly one of its operands is true.' By the reverse logic, equal operands always result in false.
As an example, I just generated a 32-bit sequence on Random.org. This is it:
11010101111000100101101100111101
This binary number translates to 3588381501 in decimal, 0xD5E25B3D in hex. Let's call it your base key.
Now, let's generate some values using the ([base key] XOR [ID]) formula. In C#, that's what your encryption function would look like:
public static long FlipMask(long baseKey, long ID)
{
    return baseKey ^ ID;
}
The following table contains some generated values. Its columns are the ID, the binary representation of the ID, the binary value after the XOR operation, and the final, 'encrypted' decimal value:
ID | Binary ID | Binary value after XOR | 'Encrypted' decimal value
0 | 000 | 11010101111000100101101100111101 | 3588381501
1 | 001 | 11010101111000100101101100111100 | 3588381500
2 | 010 | 11010101111000100101101100111111 | 3588381503
3 | 011 | 11010101111000100101101100111110 | 3588381502
4 | 100 | 11010101111000100101101100111001 | 3588381497
In order to reverse the generated key and determine the original value, you only need to do the same XOR operation using the same base key. Let's say we want to obtain the original value of the second row:
11010101111000100101101100111101 XOR
11010101111000100101101100111100 =
00000000000000000000000000000001
Which was indeed your original value.
Now, Stefan made very good points, and the first topic is crucial.
In order to cover his concerns, you may reserve the top, say, 8 bits to be purely random garbage (which I believe is called a nonce), which you generate when encrypting the original ID and ignore when reversing it. That would heavily increase your security at the expense of a generous slice of all the possible positive integer numbers with 32 bits (16,777,216 instead of 4,294,967,296, or 1/256 of it.)
A class to do that would look like this:
using System;

public static class int32crypto
{
    // C# follows ECMA 334v4, so Integer Literals have only two possible forms -
    // decimal and hexadecimal.
    // Original key: 0b11010101111000100101101100111101
    public static long baseKey = 0xD5E25B3D;

    public static long encrypt(long value)
    {
        // First we will extract from our baseKey the bits we'll actually use.
        // We do this with an AND mask, indicating the bits to extract.
        // Remember, we'll ignore the top 8 bits. So the mask must look like this:
        // Significance mask: 0b00000000111111111111111111111111
        long _sigMask = 0x00FFFFFF;
        // sigKey is our baseKey with only the indicated bits still true.
        long _sigKey = _sigMask & baseKey;
        // Nonce generation. First security issue, since Random()
        // is time-based on its first iteration. But that's OK for the sake
        // of explanation, and safe for most circumstances.
        // Next(256) returns 0..255 (the upper bound is exclusive).
        // The bits it occupies at first are the lowest eight, like this:
        // OriginalNonce: 0b000000000000000000000000NNNNNNNN
        long _tempNonce = new Random().Next(256);
        // We now shift them 24 bits (0x18) into the top byte, like this:
        // finalNonce: 0bNNNNNNNN000000000000000000000000
        _tempNonce = _tempNonce << 0x18;
        // And now we mix both Nonce and sigKey, 'poisoning' the original
        // key, like this:
        long _finalKey = _tempNonce | _sigKey;
        // Phew! Now we apply the final key to the value, and return
        // the encrypted value.
        return _finalKey ^ value;
    }

    public static long decrypt(long value)
    {
        // This is easier than encrypting. We will just ignore the bits
        // we know are used by our nonce.
        long _sigMask = 0x00FFFFFF;
        long _sigKey = _sigMask & baseKey;
        // We will do the same to the informed value:
        long _trueValue = _sigMask & value;
        // Now we decode and return the value:
        return _sigKey ^ _trueValue;
    }
}
Perhaps an idea may come from the military? Group invoices in blocks like these:
28th Infantry Division
--1st Brigade
---1st BN
----A Co
----B Co
---2nd BN
----A Co
----B Co
--2nd Brigade
---1st BN
----A Co
----B Co
---2nd BN
----A Co
----B Co
--3rd Brigade
---1st BN
----A Co
----B Co
---2nd BN
----A Co
----B Co
http://boards.straightdope.com/sdmb/showthread.php?t=432978
groups don't have to be sequential but numbers in groups do
UPDATE
Think about above as groups differentiated by place, time, person, etc. For example: create group using seller temporary ID, changing it every 10 days or by office/shop.
There is another idea, a bit weird you may say, but the more I think about it the more I like it. Why not count these invoices down? Choose a big number and count down. It's easy to trace the number of items when counting up, but counting down? How would anyone guess where the starting point is? It's easy to implement, too.
If the orders sit in an inbox until a single person processes them each morning, seeing that it took that person till 16:00 before he got round to creating my invoice will give me the impression that he's been busy. Getting the 9:01 invoice makes me feel like I'm the only customer today.
But if you generate the ID at the time when I place my order, the timestamp tells me nothing.
I think I therefore actually like the timestamps, assuming that collisions where two customers simultaneously need an ID created are rare.
You can see from the code below that I use newsequentialid() to generate a sequential number then convert that to a [bigint]. As that generates a consistent increment of 4294967296 I simply divide that number by the [id] on the table (it could be rand() seeded with nanoseconds or something similar). The result is a number that is always less than 4294967296 so I can safely add it and be sure I'm not overlapping the range of the next number.
Peace
Katherine
create table #generator (
    [id] [bigint],
    [guid] [uniqueidentifier] default (newsequentialid()) not null,
    [converted] as (convert([bigint], convert([varbinary](8), [guid], 1))) + 10000000000000000000,
    [converted_with_randomizer] as (convert([bigint], convert([varbinary](8), [guid], 1))) + 10000000000000000000 + cast((4294967296 / [id]) as [bigint])
);
insert into #generator ([id])
values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
select [id],
       [guid],
       [converted],
       [converted] - lag([converted], 1) over (order by [id]) as [orderly_increment],
       [converted_with_randomizer],
       [converted_with_randomizer] - lag([converted_with_randomizer], 1) over (order by [id]) as [disorderly_increment]
from #generator
order by [converted];
I do not know the reasons for the rules you set on the Invoice ID, but you could consider having an internal Invoice ID, which could be the sequential 32-bit integer, and an external Invoice ID that you share with your customers.
This way your internal ID can start at 1 and be incremented by one every time, and the customer invoice ID can be whatever you want.
I think Na Na has the correct idea with choosing a big number and counting down. Start off with a large value seed and either count up or down, but don't start with the last placeholder. If you use one of the other placeholders it will give an illusion of a higher invoice count....if they are actually looking at that anyway.
The only caveat here would be to modify the last X digits of the number periodically to maintain the appearance of a change.
Why not use an easily readable number constructed like this:
the first 12 digits are the date and time in yyyymmddhhmm format (that ensures the order of your invoice IDs)
the last x digits are the order number (in this example 8 digits)
The number you get is then something like 20130814140300000008
Then do some simple calculations with it, such as multiplying the first 12 digits:
(201308141403) * 3 = 603924424209
The second part (original: 00000008) can be obfuscated like this:
(10001234 - 00000008 * 256) * (minutes + 2) = 49995930
It is easy to translate it back into an easily readable number, but a customer who doesn't know the scheme has no clue at all.
Altogether this number would look like 603924424209-49995930 for an invoice on 14 August 2013 at 14:03 with the internal invoice number 00000008.
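The arithmetic above can be sketched as a small helper. InvoiceObfuscator and Obfuscate are names made up for this illustration; the constants 3, 10001234, 256 and 2 are the ones used in the worked example.

```csharp
using System;
using System.Globalization;

public static class InvoiceObfuscator
{
    public static string Obfuscate(DateTime timestamp, int orderNumber)
    {
        // First part: yyyyMMddHHmm read as a number, multiplied by 3.
        long datePart = long.Parse(
            timestamp.ToString("yyyyMMddHHmm", CultureInfo.InvariantCulture),
            CultureInfo.InvariantCulture);
        long first = datePart * 3;

        // Second part: (10001234 - order * 256) * (minutes + 2).
        long second = (10001234L - orderNumber * 256L) * (timestamp.Minute + 2);

        return first.ToString(CultureInfo.InvariantCulture)
            + "-" + second.ToString(CultureInfo.InvariantCulture);
    }
}
```

For 14 August 2013 at 14:03 and order number 8, this reproduces 603924424209-49995930 from the example. Note that the result is far wider than a 32-bit integer.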
You can write your own function that, applied to the previous number, generates the next number in the sequence: greater than the previous one, but not predictable at a glance. The numbers that can be generated come from a finite set (for example, integers between 1 and 2 to the power of 31) and may eventually repeat, though that is highly unlikely. To add more complexity to the generated numbers, you can append some alphanumeric characters. You can read about this here: Sequential Random Numbers.
An example generator can be
private static string GetNextnumber(int currentNumber)
{
    Int32 nextnumber = currentNumber + (currentNumber % 3) + 5;
    Random _random = new Random();
    // You can skip the two lines below if you don't want an alphanumeric suffix.
    int num = _random.Next(0, 26); // 0 to 25
    char let = (char)('a' + num);
    return nextnumber + let.ToString();
}
and you can call it like
string nextnumber = GetNextnumber(yourpreviouslyGeneratedNumber);

Struggling to make algorithm to generate board for a puzzle game

I'm looking to make a number puzzle game. For the sake of the question, let's say the board is a grid consisting of 4 x 4 squares. (In the actual puzzle game, this number will be 1..15)
A number may only occur once in each column and once in each row, a little like Sudoku, but without "squares".
Valid:
[1, 2, 3, 4
2, 3, 4, 1
3, 4, 1, 2
4, 1, 2, 3]
I can't seem to come up with an algorithm that will consistently generate valid, random n x n boards.
I'm writing this in C#.
Start by reading my series on graph colouring algorithms:
http://blogs.msdn.com/b/ericlippert/archive/tags/graph+colouring/
It is going to seem like this has nothing to do with your problem, but by the time you're done, you'll see that it has everything to do with your problem.
OK, now that you've read that, you know that you can use a graph colouring algorithm to describe a Sudoku-like puzzle and then solve a specific instance of the puzzle. But clearly you can use the same algorithm to generate puzzles.
Start by defining your graph regions that are fully connected.
Then modify the algorithm so that it tries to find two solutions.
Now create a blank graph and set one of the regions at random to a random colour. Try to solve the graph. Were there two solutions? Then add another random colour. Try it again. Were there no solutions? Then back up a step and add a different random colour.
Keep doing that -- adding random colours, backtracking when you get no solutions, and continuing until you get a puzzle that has a unique solution. And you're done; you've got a random puzzle generator.
It seems you could use this valid example as input to an algorithm that randomly swapped two rows a random number of times, then swapped two random columns a random number of times.
There aren't too many combinations you need to try. You can always rearrange a valid board so the top row is 1,2,3,4 (by remapping the symbols), and the left column is 1,2,3,4 (by rearranging rows 2 thru 4). On each row there are only 6 permutations of the remaining 3 symbols, so you can loop over those to find which of the 216 possible boards are valid. You may as well store the valid ones.
Then pick a valid board randomly, randomly rearrange the rows, and randomly reassign the symbols.
I don't speak C#, but the following algorithm ought to be easily translated.
Associate a set consisting of the numbers 1..N with each row and column:
for i = 1 to N
    row_set[i] = column_set[i] = Set(1 .. N)
Then make a single pass through the matrix, choosing an entry for each position randomly from the set elements valid at that row and column. Remove the number chosen from the respective row and column sets. Note that the intersection can turn out to be empty part-way through a row (the only number still missing from the row may already be used in that column), in which case you need to backtrack or start over.
for r = 1 to N
    for c = 1 to N
        k = RandomChoice( Intersection( column_set[c], row_set[r] ))
        puzzle_board[r, c] = k
        column_set[c] = column_set[c] - k
        row_set[r] = row_set[r] - k
    next c
next r
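One way the single-pass algorithm might translate to C# is sketched below. The names are hypothetical, and the restart-on-dead-end loop is an addition of this sketch (the pseudocode itself does not say what to do when no candidate is left); backtracking would likely scale better for larger boards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GreedyLatinSquare
{
    // One greedy pass; returns null if some cell has no valid candidate left.
    private static int[,] TryFill(int n, Random rng)
    {
        var rowSets = Enumerable.Range(0, n)
            .Select(_ => new HashSet<int>(Enumerable.Range(1, n))).ToArray();
        var colSets = Enumerable.Range(0, n)
            .Select(_ => new HashSet<int>(Enumerable.Range(1, n))).ToArray();
        var board = new int[n, n];
        for (int r = 0; r < n; r++)
        {
            for (int c = 0; c < n; c++)
            {
                // Numbers still unused in both this row and this column.
                var candidates = rowSets[r].Intersect(colSets[c]).ToList();
                if (candidates.Count == 0) return null; // dead end
                int k = candidates[rng.Next(candidates.Count)];
                board[r, c] = k;
                rowSets[r].Remove(k);
                colSets[c].Remove(k);
            }
        }
        return board;
    }

    // Repeats the greedy pass until one attempt completes.
    public static int[,] Generate(int n, Random rng)
    {
        while (true)
        {
            var board = TryFill(n, rng);
            if (board != null) return board;
        }
    }
}
```

Restarting keeps the code short, but the number of attempts needed is likely to grow with the board size.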
Looks like you want to generate uniformly distributed Latin Squares.
This pdf has a description of a method by Jacobson and Matthews (which was published elsewhere, a reference of which can be found here: http://designtheory.org/library/encyc/latinsq/z/)
Or you could potentially pre-generate a "lot" of them (before you ship :-)), store that in a file and randomly pick one.
Hope that helps.
The easiest way I can think of would be to create a partial game and solve it. If it's not solvable, or if it's wrong, make another. ;-)
Sudoku without squares sounds a bit like Sudoku. :)
http://www.codeproject.com/KB/game/sudoku.aspx
There is an explanation of the board generator code they use there.
Check out http://www.chiark.greenend.org.uk/~sgtatham/puzzles/ - he's got several puzzles that have precisely this constraint (among others).
A further solution would be this. Suppose you have a number of solutions. For each of them, you can generate a new solution by simply permuting the identifiers (1..15). These new solutions are of course logically the same, but to a player they will appear different.
The permutation might be done by treating each identifier in the initial solution as an index into an array, and then shuffling that array.
Use your first valid example:
1 2 3 4
2 3 4 1
3 4 1 2
4 1 2 3
Then, create 2 random permutations of {1, 2, 3, 4}.
Use the first to permute rows and then the second to permute columns.
You can find several ways to create permutations in Knuth's The Art of Computer Programming (TAOCP), Volume 4 Fascicle 2, Generating All Tuples and Permutations (2005), v+128pp. ISBN 0-201-85393-0.
If you can't find a copy in a library, a preprint (of the part that discusses permutations) is available at his site: fasc2b.ps.gz
EDIT - CORRECTION
The above solution is similar to 500-Internal Server Error's one. But I think neither will find all valid arrangements.
For example they'll find:
1 3 2 4
3 1 4 2
2 4 1 3
4 2 3 1
but not this one:
1 2 3 4
2 1 4 3
3 4 1 2
4 3 2 1
One more step is needed: after rearranging rows and columns (using either my way or 500's), create one more permutation (let's call it s3) and use it to permute all the numbers in the array.
s3 = randomPermutation(1 ... n)
for i = 1 to n
    for j = 1 to n
        array[i,j] = s3( array[i,j] )
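Putting the three permutations together might look like the following sketch. It starts from the cyclic board shown earlier (cell value = (row + column) mod n, plus 1) and applies a row permutation, a column permutation and a symbol permutation in one step. The class name and the Fisher-Yates helper are assumptions of this example, and the result is a valid board but is not claimed to be uniformly distributed over all possible boards.

```csharp
using System;
using System.Linq;

public static class ShuffledLatinSquare
{
    // Fisher-Yates shuffle of 0..n-1.
    private static int[] RandomPermutation(int n, Random rng)
    {
        int[] p = Enumerable.Range(0, n).ToArray();
        for (int i = n - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (p[i], p[j]) = (p[j], p[i]);
        }
        return p;
    }

    public static int[,] Generate(int n, Random rng)
    {
        int[] rows = RandomPermutation(n, rng);    // row order
        int[] cols = RandomPermutation(n, rng);    // column order
        int[] symbols = RandomPermutation(n, rng); // relabelling (s3)
        var board = new int[n, n];
        for (int r = 0; r < n; r++)
        {
            for (int c = 0; c < n; c++)
            {
                int cyclic = (rows[r] + cols[c]) % n; // 0-based cyclic board
                board[r, c] = symbols[cyclic] + 1;    // back to 1..n
            }
        }
        return board;
    }
}
```

Each of the three steps maps a valid board to a valid board, so no validity check is needed afterwards.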

Finding 2 or more numbers having the given number as GCF

I don't want to find the GCF of given numbers. I use Euclidean for that. I want to generate a series of numbers having a given GCF. For example if I choose 4, I should get something like 100, 72 or 4, 8 etc.,
Any pointers would be appreciated.
A series of pairs of numbers having N as a GCF is {N,N}, {N,2N}, {N,3N}, ....
In fact, any set consisting of N and 1 or more multiples of N has N as its GCF.
1. Maybe this question can be better answered at http://math.stackexchange.com
2. Just construct the numbers you are interested in by multiplying the GCD by numbers that the GCD does not divide. For your example of a given GCD = 4, that means
$k_1 = 4$, the GCD itself
$k_2 = 4 \cdot 2$, since 4 does not divide 2
$k_3 = 4 \cdot 3$, since 4 does not divide 3
not $k_4 = 4 \cdot 4$, since 4 divides 4, but
$k_4 = 4 \cdot 5$, since 4 does not divide 5, etc.
If 4 is the input, you want a list of numbers whose greatest common factor is 4. You can ensure this by making 4 the only factor in the entire series. Therefore, you multiply the number (4) by all primes to ensure that.
prime-list = 3, 5, 7, 11, 13, 17
gcf-list for 4 -> (3*4)12, (4*5)20, (4*7)28, (4*11)44, (4*13)52, (4*17)68, ...
This will give you a list such that the GCF of any two numbers is 4
Choose a set of numbers that are pairwise coprime (that is, gcd(x,y) = 1 for every x<>y in the set). Multiply each number by your target GCD.
I realize that this is an old question but I am going to provide my own answer along with an explanation of how I got there. First, let's call the GCF n.
Initially I would have suggested picking random integers and multiplying each by n to get the set of numbers. This of course gives you numbers evenly divisible by n, but not necessarily numbers with a GCF of n: if the integers all share a common factor other than 1, the GCF of the resulting set is n times that factor, not n. That being said, multiplying n by a set of integers seems the best way of ensuring that each number in the set is at least divisible by n.
One option would be to make one of those integers 1, but that would reduce the randomness of the set, as n would always be in the resulting set.
Next, you could multiply n by some prime numbers, but that would also reduce randomness as there would be fewer possible numbers, and the numbers don't actually need to be prime, just co-prime (GCF = 1 for the entire set).
You could also pick a set of numbers where each pair is co-prime, but again, only the set as a whole needs to be co-prime, not every pair (and checking every pair would be pretty processor intensive with larger sets).
So if you are going for fairly random numbers I would start by determining how many numbers you want in the set (whether that is randomly determined or predetermined) and then generating one less than that number completely 'randomly'. I would then compute the common prime factors for those numbers and then pick a random number that does not have any of those prime factors. Merely ensuring it does not have the same GCF is not sufficient as the GCF could have common factors to the final number. It only requires one number in a set that does not have any of the same prime factors as the other numbers in the set to make the GCF of that set '1'. I would then take that set of numbers and multiply each by n to get the set of numbers you want.
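The procedure in the last paragraph could be sketched like this. GcfSetGenerator, its parameters and the maxMultiplier bound are all invented for the illustration; testing gcd(common, last) == 1 is used as the way of checking that the last multiplier shares no prime factor with the common factor of the others.

```csharp
using System;
using System.Linq;

public static class GcfSetGenerator
{
    // Euclidean algorithm.
    public static int Gcd(int a, int b)
    {
        while (b != 0) (a, b) = (b, a % b);
        return Math.Abs(a);
    }

    // Returns count numbers (count >= 2) whose greatest common factor is n.
    public static int[] Generate(int n, int count, Random rng, int maxMultiplier = 1000)
    {
        var multipliers = new int[count];
        // All but the last multiplier are drawn freely.
        for (int i = 0; i < count - 1; i++)
            multipliers[i] = rng.Next(1, maxMultiplier + 1);

        // Common factor of the freely drawn multipliers.
        int common = multipliers.Take(count - 1).Aggregate((a, b) => Gcd(a, b));

        // Redraw the last multiplier until it shares no factor with common.
        int last;
        do
        {
            last = rng.Next(1, maxMultiplier + 1);
        } while (Gcd(common, last) != 1);
        multipliers[count - 1] = last;

        return multipliers.Select(m => m * n).ToArray();
    }
}
```

The retry loop always terminates in practice because 1 is coprime to everything; n times maxMultiplier is assumed to fit in an int.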
