How to write a random multiplier selection function? - C#

I am trying to write a function that returns one of the following multipliers, randomly selected but following the frequency requirement. What the table below defines is that, for 1 million calls to this function, 1500 will be returned once, 500 twice, and so on.
| Multiplier | Frequency Per Million |
|------------|-----------------------|
| 1500       | 1                     |
| 500        | 2                     |
| 200        | 50                    |
| 50         | 100                   |
| 25         | 20,000                |
| 5          | 75,000                |
| 3          | 414,326               |
| 2          | 490,521               |
Wondering what would be the best way to approach implementing this.

First, let's declare the model:
static Dictionary<int, int> multipliers = new Dictionary<int, int>() {
{1500, 1},
{ 500, 2},
{ 200, 50},
{ 50, 100},
{ 25, 20_000},
{ 5, 75_000},
{ 3, 414_326},
{ 2, 490_521},
};
Then you can easily choose a random multiplier:
// Easiest, but not thread safe
static Random random = new Random();
...
private static int RandomMultiplier() {
int r = random.Next(multipliers.Values.Sum());
int s = 0;
foreach (var pair in multipliers.OrderByDescending(p => p.Key))
if (r < (s += pair.Value))
return pair.Key;
throw new InvalidOperationException($"{r} must not reach this line");
}
...
int multiplier = RandomMultiplier();
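If it helps, a quick way to sanity-check the distribution is to tally a large number of draws and compare them against the table. This is just a hypothetical test harness (it assumes the multipliers dictionary and the RandomMultiplier method above, plus System and System.Linq):
// Draw one million multipliers and print how often each one came up.
var counts = new Dictionary<int, int>();
for (int i = 0; i < 1_000_000; i++)
{
    int m = RandomMultiplier();
    counts[m] = counts.TryGetValue(m, out var c) ? c + 1 : 1;
}
foreach (var pair in counts.OrderByDescending(p => p.Key))
    Console.WriteLine($"{pair.Key}: {pair.Value}");
The counts should roughly track the table; the very rare multipliers (frequency 1 or 2 per million) will naturally fluctuate from run to run.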

If the frequencies and the values to be returned are fixed, then nothing complicated is needed. You just need to account for the numbers already handled by the earlier if blocks, by adding their frequencies to each threshold.
private int GetRandomMultiplier()
{
var random = new Random();
var next = random.Next(1000000);
if (next < 1)
{
return 1500;
}
else if (next < 3)
{
return 500;
}
else if (next < 53)
{
return 200;
}
else if (next < 153)
{
return 50;
}
else if (next < 20153)
{
return 25;
}
else if (next < 95153)
{
return 5;
}
else if (next < 509479)
{
return 3;
}
return 2;
}
You don't want to create a new Random every time though, so create one once and use that.
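For example, a minimal reworking of the method above with a single shared instance (my sketch; on .NET 6+ you could also call Random.Shared.Next(...), which has the added benefit of being thread-safe):
private static readonly Random Rng = new Random(); // created once, reused for every call

private static int GetRandomMultiplier()
{
    var next = Rng.Next(1_000_000);
    if (next < 1) return 1500;
    if (next < 3) return 500;
    if (next < 53) return 200;
    if (next < 153) return 50;
    if (next < 20_153) return 25;
    if (next < 95_153) return 5;
    if (next < 509_479) return 3;
    return 2;
}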


C# logic to find offset and fetch values

I get the below resultset from a SQL query and I store it in var.
+-----------+--------+
| Rownumber | Data |
+-----------+--------+
| 0 | 9 |
| 1 | 0 |
| 2 | 4 |
| 3 | 9 |
| 4 | 15 |
| 5 | 2 |
| 6 | 1 |
| 7 | 6 |
| 8 | 0 |
| 9 | 4 |
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 13 | 1 |
| 14 | 1 |
| 15 | 1 |
| 16 | 1 |
| 17 | 1 |
| 18 | 1 |
| 19 | 1 |
| 20 | 1 |
| 21 | 1 |
| 22 | 1 |
+-----------+--------+
I want to write logic in C#:
I want to add the Data column values sequentially.
If the summed Data column value is more than or equal to 15, then I want to store the following values in two variables:
offset = the starting rownumber
Fetch = number of rows taken to achieve the sum of 15
E.g:
Iteration 1:
+-----------+--------+
| Rownumber | Data |
+-----------+--------+
| 0 | 9 |
| 1 | 0 |
| 2 | 4 |
| 3 | 9 |
+-----------+--------+
Expected variable values:
offset = 0
Fetch = 4 (num of rows taken to achieve the value of 15. Sum of value should be >= 15)
Iteration 2 :
+-----------+--------+
| Rownumber | Data |
+-----------+--------+
| 4 | 15 |
+-----------+--------+
Expected values:
offset = 4
Fetch = 1 (num of rows taken to achieve the value of 15)
Iteration 3:
+-----------+--------+
| Rownumber | Data |
+-----------+--------+
| 5 | 2 |
| 6 | 1 |
| 7 | 6 |
| 8 | 0 |
| 9 | 4 |
| 10 | 1 |
| 11 | 1 |
+-----------+--------+
Expected values:
offset = 5
Fetch = 7 (num of rows taken to achieve the value of 15)
The iteration will go on until the last value.
I suppose your model looks like the following
public class Data
{
public int Rownumber { get; set; }
public int data { get; set; }
}
public class Result
{
public int offset { get; set; }
public int fetsh { get; set; }
}
and you need the following code
public List<Result> GetResults(List<Data> data)
{
var sum = 0;
var start_taking_index = 0;
List<Result> results = new List<Result>();
for (int i = 0; i < data.Count; i++)
{
sum += data[i].data;
if(sum >= 15 || i == data.Count-1)
{
// if the sum exceeds 15, create a new result
results.Add(new Result
{
offset = start_taking_index,
fetsh = i - start_taking_index +1,
});
// then reset the tracking variables
start_taking_index = i+1;
sum = 0;
}
}
return results;
}
Here is an xUnit test for the scenario in the question:
[Fact]
public void GetResults_test()
{
List<Data> datas = new List<Data>()
{
new Data{Rownumber = 0,data= 9},
new Data{Rownumber = 1,data= 0},
new Data{Rownumber = 2,data= 4},
new Data{Rownumber = 3,data= 9},
new Data{Rownumber = 4,data=15},
new Data{Rownumber = 5,data= 2},
new Data{Rownumber = 6,data= 1},
new Data{Rownumber = 7,data= 6},
new Data{Rownumber = 8,data= 0},
new Data{Rownumber = 9,data= 4},
new Data{Rownumber = 10,data= 1},
new Data{Rownumber = 11,data= 1},
new Data{Rownumber = 12,data= 1},
new Data{Rownumber = 13,data= 1},
new Data{Rownumber = 14,data= 1},
new Data{Rownumber = 15,data= 1},
new Data{Rownumber = 16,data= 1},
new Data{Rownumber = 17,data= 1},
new Data{Rownumber = 18,data= 1},
new Data{Rownumber = 19,data= 1},
new Data{Rownumber = 20,data= 1},
new Data{Rownumber = 21,data= 1},
new Data{Rownumber = 22,data= 1},
};
var result = GetResults(datas);
Assert.NotEmpty(result);
// first
Assert.Equal(0,result[0].offset);
Assert.Equal(4,result[0].fetsh);
// second
Assert.Equal(4, result[1].offset);
Assert.Equal(1, result[1].fetsh);
//
Assert.Equal(5, result[2].offset);
Assert.Equal(7, result[2].fetsh);
//
Assert.Equal(12, result[3].offset);
Assert.Equal(11, result[3].fetsh);
// total count
Assert.Equal(4, result.Count);
}
I would go with:
using System;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
List<Data> datas = new List<Data>()
{
new Data{Rownumber = 0,data= 9},
new Data{Rownumber = 1,data= 0},
new Data{Rownumber = 2,data= 4},
new Data{Rownumber = 3,data= 9},
new Data{Rownumber = 4,data=15},
new Data{Rownumber = 5,data= 2},
new Data{Rownumber = 6,data= 1},
new Data{Rownumber = 7,data= 6},
new Data{Rownumber = 8,data= 0},
new Data{Rownumber = 9,data= 4},
new Data{Rownumber = 10,data= 1},
new Data{Rownumber = 11,data= 1},
new Data{Rownumber = 12,data= 1},
new Data{Rownumber = 13,data= 1},
new Data{Rownumber = 14,data= 1},
new Data{Rownumber = 15,data= 1},
new Data{Rownumber = 16,data= 1},
new Data{Rownumber = 17,data= 1},
new Data{Rownumber = 18,data= 1},
new Data{Rownumber = 19,data= 1},
new Data{Rownumber = 20,data= 1},
new Data{Rownumber = 21,data= 1},
new Data{Rownumber = 22,data= 1},
};
foreach(var entry in Calculate(datas))
{
Console.WriteLine("Offset: " + entry.Key + " | Fetch: " + entry.Value);
}
}
public static List<KeyValuePair<int, int>> Calculate(List<Data> data)
{
var result = new List<KeyValuePair<int, int>>();
int offset = 0, lastOffset = 0;
int sum = 0;
foreach(var entry in data)
{
sum += entry.data;
if(sum >= 15)
{
result.Add(new KeyValuePair<int, int>(lastOffset, offset - lastOffset + 1));
sum = 0;
lastOffset = offset + 1;
}
offset++;
}
return result;
}
public class Data
{
public int Rownumber { get; set; }
public int data { get; set; }
}
}
Mind that it does not have any guards - it's up to you.
System.Linq may do the trick for you with the Skip and TakeWhile extension methods:
static void Main()
{
// Fill source info
var source = new List<(int, int)>();
for (int i = 0; i <= 22; i++)
source.Add((i, i switch
{
0 => 9,
1 or 8 => 0,
2 or 9 => 4,
3 => 9,
4 => 15,
5 => 2,
7 => 6,
_ => 1,
}));
// Fetching result
foreach (var (offset, fetch) in GetResult(source))
Console.WriteLine($"Offset: {offset} | Fetch: {fetch}");
// Output:
// Offset: 0 | Fetch: 4
// Offset: 4 | Fetch: 1
// Offset: 5 | Fetch: 7
// Offset: 12 | Fetch: 11
Console.ReadKey();
}
static List<(int, int)> GetResult(List<(int, int)> source)
{
var result = new List<(int, int)>();
var proceededRecords = 0;
while (proceededRecords < source.Count)
{
var offset = proceededRecords;
var dataSum = 0;
var fetch = source.Skip(proceededRecords).TakeWhile((data, _) =>
{
if (dataSum >= 15)
return false;
dataSum += data.Item2;
return true;
}).Count();
proceededRecords += fetch;
result.Add((offset, fetch));
}
return result;
}
Remarks.
I used tuples to simplify the example and avoid creating a Model class with RowNumber and Data properties or a Result class with Offset and Fetch properties.
The idea was to loop over the source collection, taking an unknown number of tuples while the running sum of the second tuple value is less than 15. TakeWhile helps with that. Skip was used to... skip the records already fetched.
While your example shows that your data indexes are sequential and start at zero, it was not stated explicitly that this will always be the case.
This solution works even if the index of the data from your database does not start at 0, or is not sequential (or both), or if instead of an index you had any other kind of identifier, such as a timestamp.
i.e. if your data is like
+-----------+--------+
| Rownumber | Data |
+-----------+--------
| 6 | 1 |
| 7 | 6 |
| 8 | 0 |
| 9 | 4 |
etc...
+-----------+--------+
or
+-----------+--------+
| Rownumber | Data |
+-----------+--------
| 16 | 1 |
| 7 | 6 |
| 108 | 0 |
| 9 | 4 |
| 1910 | 1 |
| 121 | 1 |
etc..
+-----------+--------+
or even
+-----------------------+--------+
| Timestamp | Data |
+-----------------------+--------+
| 2021/03/02 - 10:06:24 | 1 |
| 2021/03/02 - 12:13:03 | 6 |
| 2021/03/04 - 02:48:57 | 0 |
| 2021/05/23 - 23:38:17 | 4 |
etc...
+-----------------------+--------+
Here is the part of the code actually doing the work. It does not really add complexity compared to a solution that would work only on zero-based sequential indexes:
var Results = new List<Result>();
var Group = new List<Data>();
var Sum = 0;
var CurrentIndex = 0;
while (Source.Any())
{
CurrentIndex = 0;
Sum = 0;
while (Sum < 15 && CurrentIndex < Source.Count)
{
Sum += Source[CurrentIndex].Value;
CurrentIndex++;
}
Group = Source.Take(CurrentIndex).ToList();
Source = Source.Skip(CurrentIndex).ToList();
Results.Add(new Result
{
Root = Group.First().Index,
Fetch = Group.Count
});
}
What it does is rather simple:
It enumerates the first elements of your collection (Source) while their sum is less than 15 (and there are still elements to enumerate).
It counts the number of elements just enumerated (the fetch) and gets the index of the first element (the root).
It then constructs a new collection by removing the elements that were just enumerated, and starts again with that new collection until there are no more elements to enumerate.
That is all.
The Group variable could be avoided altogether, which would give the following code. I preferred keeping it in my example as it shows that the group itself could be used to perform any kind of operation on its content if needed.
while (Source.Any())
{
CurrentIndex = 0;
Sum = 0;
while (Sum < 15 && CurrentIndex < Source.Count)
{
Sum += Source[CurrentIndex].Value;
CurrentIndex++;
}
Results.Add(new Result
{
Root = Source.First().Index,
Fetch = CurrentIndex
});
Source = Source.Skip(CurrentIndex).ToList();
}
By the way, the second nested while loop could be avoided by using LINQ, see below. However, this LINQ query is peculiar and should be used with caution.
The TakeWhile call uses an impure lambda, i.e. the lambda relies on external data: the captured variable Sum.
While this works perfectly fine, be aware that this kind of LINQ query can lead to problems further down the road. For instance, adding .AsParallel() to such a query would not work at all.
while (Source.Any())
{
Sum = 0;
Group = Source.TakeWhile(e =>
{
if (Sum < 15)
{
Sum += e.Value;
return true;
}
return false;
}).ToList();
Results.Add(new Result
{
Root = Group.First().Index,
Fetch = Group.Count
});
Source = Source.Skip(Group.Count).ToList();
}
Here is the complete code, as a full runnable LINQPad query, with randomly generated data:
void Main()
{
// Random data set preparation.
var Rnd = new Random();
var NonSequencialIndexes = Enumerable
.Range(100, 300)
.Where(i => Rnd.Next(2) == 1)
.OrderBy(i => Guid.NewGuid())
.Take(30)
.ToArray();
var Source = Enumerable
.Range(0, 30)
.Select(i => new Data
{
Index = NonSequencialIndexes[i],
Value = Rnd.Next(16)
})
.ToList()
.Dump("Random data set");
// Actual code
var Results = new List<Result>();
var Group = new List<Data>();
var Sum = 0;
var CurrentIndex = 0;
while (Source.Any())
{
CurrentIndex = 0;
Sum = 0;
while (Sum < 15 && CurrentIndex < Source.Count)
{
Sum += Source[CurrentIndex].Value;
CurrentIndex++;
}
Group = Source.Take(CurrentIndex).ToList();
Source = Source.Skip(CurrentIndex).ToList();
Results.Add(new Result
{
Root = Group.First().Index,
Fetch = Group.Count
});
}
// Display results
Results.Dump("Results");
}
// You can define other methods, fields, classes and namespaces here
public class Data
{
public int Index { get; set; }
public int Value { get; set; }
}
public class Result
{
public int Root { get; set; }
public int Fetch { get; set; }
}
An example of result:
+------+-------+
| Root | Fetch |
+------+-------+
| 346 | 3 |
+------+-------+
| 121 | 3 |
+------+-------+
| 381 | 2 |
+------+-------+
| 110 | 2 |
+------+-------+
| 334 | 2 |
+------+-------+
| 226 | 2 |
+------+-------+
| 148 | 2 |
+------+-------+
| 114 | 3 |
+------+-------+
| 397 | 3 |
+------+-------+
| 274 | 3 |
+------+-------+
| 135 | 3 |
+------+-------+
| 386 | 2 |
+------+-------+
for this data collection
+-------+------+
| Index | Value|
+-------+------+
| 346 | 0|
+-------+------+
| 294 | 14|
+-------+------+
| 152 | 11|
+-------+------+
| 121 | 3|
+-------+------+
| 234 | 6|
+-------+------+
| 393 | 13|
+-------+------+
| 381 | 8|
+-------+------+
| 305 | 15|
+-------+------+
| 110 | 13|
+-------+------+
| 357 | 9|
+-------+------+
| 334 | 8|
+-------+------+
| 214 | 13|
+-------+------+
| 226 | 6|
+-------+------+
| 248 | 15|
+-------+------+
| 148 | 12|
+-------+------+
| 131 | 9|
+-------+------+
| 114 | 3|
+-------+------+
| 250 | 4|
+-------+------+
| 217 | 11|
+-------+------+
| 397 | 3|
+-------+------+
| 312 | 7|
+-------+------+
| 191 | 7|
+-------+------+
| 274 | 7|
+-------+------+
| 292 | 6|
+-------+------+
| 277 | 14|
+-------+------+
| 135 | 2|
+-------+------+
| 240 | 12|
+-------+------+
| 163 | 12|
+-------+------+
| 386 | 12|
+-------+------+
| 330 | 5|
+-------+------+

Logarithmic distribution of profits among game winners

I have a game which, when it is finished, has a table of players and their scores.
On the other hand, I have a virtual pot of money that I want to distribute among these winners. I'm looking for a SQL query or piece of C# code to do so.
The descending sorted table looks like this:
UserId | Name | Score | Position | % of winnings | abs. winnings $
00579 | John | 754 | 1 | ? | 500 $
98983 | Sam | 733 | 2 | ? | ?
29837 | Rick | 654 | 3 | ? | ? <- there are 2 3rd places
21123 | Hank | 654 | 3 | ? | ? <- there are 2 3rd places
99821 | Buck | 521 | 5 | ? | ? <- there is no 4th, because of the 2 3rd places
92831 | Joe | 439 | 6 | ? | ? <- there are 2 6th places
99281 | Jack | 439 | 6 | ? | ? <- there are 2 6th places
12345 | Hal | 412 | 8 | ? | ?
98112 | Mick | 381 | 9 | ? | ?
and so on, until position 50
98484 | Sue | 142 | 50 | ? | 5 $
Be aware of the double 3rd and 6th places.
Now I want to distribute the total amount of (virtual) money ($10,000) among the first 50 positions. (It would be nice if the number of positions to distribute among, which is now 50, could be a variable.)
The max and min amounts (for position 1 and position 50) are fixed at 500 and 5.
Does anyone have a good idea for a SQL query or piece of C# code to fill the columns with % of winnings and absolute winnings $ correctly?
I would prefer a distribution that looks a bit logarithmic, like this (so that the higher positions get relatively more than the lower ones):
.
|.
| .
| .
| .
| .
| .
| .
| .
| .
I haven't done SQL since 1994, but I like C# :-). The following might suit; adjust the parameters passed to GetWinAmounts(...) as required:
private class DistributeWinPot {
private static double[] GetWinAmounts(int[] psns, double TotWinAmounts, double HighWeight, double LowWeight) {
double[] retval = new double[psns.Length];
double fac = -Math.Log(HighWeight / LowWeight) / (psns.Length - 1), sum = 0;
for (int i = 0; i < psns.Length; i++) {
sum += retval[i] = (i == 0 || psns[i] > psns[i - 1] ? HighWeight * Math.Exp(fac * (i - 1)) : retval[i - 1]);
}
double scaling = TotWinAmounts / sum;
for (int i = 0; i < psns.Length; i++) {
retval[i] *= scaling;
}
return retval;
}
public static void Main(string[] args) {
// set up dummy data, positions in an int array
int[] psns = new int[50];
for (int i = 0; i < psns.Length; i++) {
psns[i] = i+1;
}
psns[3] = 3;
psns[6] = 6;
double[] WinAmounts = GetWinAmounts(psns, 10000, 500, 5);
for (int i = 0; i < psns.Length; i++) {
System.Diagnostics.Trace.WriteLine((i + 1) + "," + psns[i] + "," + string.Format("{0:F2}", WinAmounts[i]));
}
}
}
Output from that code was:
1,1,894.70
2,2,814.44
3,3,741.38
4,3,741.38
5,5,614.34
6,6,559.24
7,6,559.24
8,8,463.41
9,9,421.84
10,10,384.00
11,11,349.55
12,12,318.20
13,13,289.65
14,14,263.67
15,15,240.02
16,16,218.49
17,17,198.89
18,18,181.05
19,19,164.81
20,20,150.03
21,21,136.57
22,22,124.32
23,23,113.17
24,24,103.02
25,25,93.77
26,26,85.36
27,27,77.71
28,28,70.74
29,29,64.39
30,30,58.61
31,31,53.36
32,32,48.57
33,33,44.21
34,34,40.25
35,35,36.64
36,36,33.35
37,37,30.36
38,38,27.64
39,39,25.16
40,40,22.90
41,41,20.85
42,42,18.98
43,43,17.27
44,44,15.72
45,45,14.31
46,46,13.03
47,47,11.86
48,48,10.80
49,49,9.83
50,50,8.95
Then how about this?
Select userid, log(score),
10000 * log(score) /
(Select Sum(log(score))
From TableName
Where score >=
(Select Min(score)
from (Select top 50 score
From TableName
Order By score desc) z))
From TableName
Order By score desc

How can I split a list vertically in n parts with LINQ

I would like to split a list into parts, without knowing how many items I will have in that list. This question is different from the ones about splitting a list into chunks of fixed size.
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
I would like the values to be split vertically.
Split in 2:
-------------------
| item 1 | item 6 |
| item 2 | item 7 |
| item 3 | item 8 |
| item 4 | item 9 |
| item 5 | |
Split in 3:
| item 1 | item 4 | item 7 |
| item 2 | item 5 | item 8 |
| item 3 | item 6 | item 9 |
Split in 4:
| item 1 | item 4 | item 6 | item 8 |
| item 2 | item 5 | item 7 | item 9 |
| item 3 | | | |
I've found a few C# extension methods that can do that, but they don't distribute the values the way I want. Here's what I've found:
// this technique is a horizontal distribution
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
int i = 0;
var splits = from item in list
group item by i++ % parts into part
select part.AsEnumerable();
return splits;
}
The result is this, but my problem is that the values are distributed horizontally:
| item 1 | item 2 |
| item 3 | item 4 |
| item 5 | item 6 |
| item 7 | item 8 |
| item 9 | |
or
| item 1 | item 2 | item 3 |
| item 4 | item 5 | item 6 |
| item 7 | item 8 | item 9 |
Any idea how I can distribute my values vertically and have the possibility to choose the number of parts that I want?
In real life
For those of you who want to know in which situation I would like to split a list vertically, here's a screenshot of a section of my website:
With .Take() and .Skip() you can:
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
int splitIndex = 4; // or (a.Length / 2) to split in the middle.
var list1 = a.Take(splitIndex).ToArray(); // Returns a specified number of contiguous elements from the start of a sequence.
var list2 = a.Skip(splitIndex).ToArray(); // Bypasses a specified number of elements in a sequence and then returns the remaining elements.
You can use .ToList() instead of .ToArray() if you want a List<int>.
EDIT:
After you changed (clarified maybe) your question a bit, I guess this is what you needed:
public static class Extensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int parts)
{
var list = new List<T>(source);
int defaultSize = (int)((double)list.Count / (double)parts);
int offset = list.Count % parts;
int position = 0;
for (int i = 0; i < parts; i++)
{
int size = defaultSize;
if (i < offset)
size++; // Just add one to the size (it's enough).
yield return list.GetRange(position, size);
// Set the new position after creating a part list, so that it always starts with position zero on the first yield return above.
position += size;
}
}
}
Using it:
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var lists = a.Split(2);
This would generate:
split in 2 : a.Split(2);
| item 1 | item 6 |
| item 2 | item 7 |
| item 3 | item 8 |
| item 4 | item 9 |
| item 5 | |
split in 3 : a.Split(3);
| item 1 | item 4 | item 7 |
| item 2 | item 5 | item 8 |
| item 3 | item 6 | item 9 |
split in 4 : a.Split(4);
| item 1 | item 4 | item 6 | item 8 |
| item 2 | item 5 | item 7 | item 9 |
| item 3 | | | |
Also, if you would have:
int[] b = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }; // 10 items
and split in 4 : b.Split(4);
| item 1 | item 4 | item 7 | item 9 |
| item 2 | item 5 | item 8 | item 10|
| item 3 | item 6 | | |
This seems to do the trick quite nicely. It can probably be done even more efficiently, but this was puzzling enough... It's much easier to do:
1|4|7|10
2|5|8
3|6|9
Than:
1|4|7|9
2|5|8|10
3|6|
I ignored the LINQ request at first, since I had trouble wrapping my head around it. The solution using normal array manipulation could result in something like this:
public static IEnumerable<IEnumerable<TListItem>> Split<TListItem>(this IEnumerable<TListItem> items, int parts)
where TListItem : struct
{
var itemsArray = items.ToArray();
int itemCount = itemsArray.Length;
int itemsOnlastRow = itemCount - ((itemCount / parts) * parts);
int numberOfRows = (int)(itemCount / (decimal)parts) + 1;
for (int row = 0; row < numberOfRows; row++)
{
yield return SplitToRow(itemsArray, parts, itemsOnlastRow, numberOfRows, row);
}
}
private static IEnumerable<TListItem> SplitToRow<TListItem>(TListItem[] items, int itemsOnFirstRows, int itemsOnlastRow,
int numberOfRows, int row)
{
for (int column = 0; column < itemsOnFirstRows; column++)
{
// Are we on the last row?
if (row == numberOfRows - 1)
{
// Are we within the number of items on that row?
if (column < itemsOnlastRow)
{
yield return items[(column + 1) * numberOfRows -1];
}
}
else
{
int firstblock = itemsOnlastRow * numberOfRows;
int index;
// are we in the first block?
if (column < itemsOnlastRow)
{
index = column*numberOfRows + ((row + 1)%numberOfRows) - 1;
}
else
{
index = firstblock + (column - itemsOnlastRow)*(numberOfRows - 1) + ((row + 1)%numberOfRows) - 1;
}
yield return
items[index];
}
}
}
The LINQ pseudo code would be:
//WARNING: DOES NOT WORK
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
int itemOnIndex = 0;
var splits = from item in list
group item by MethodToDefineRow(itemOnIndex++) into row
select row.AsEnumerable();
return splits;
}
But without knowing the number of items, there is no way to calculate the place where to put it.
So by doing a little pre-calculation, you can use LINQ to achieve the same thing as above. This requires going through the IEnumerable twice; there doesn't seem to be a way around that. The trick is to calculate the row each value will be assigned to.
//WARNING: Iterates the IEnumerable twice
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
int itemOnIndex = 0;
int itemCount = list.Count();
int itemsOnlastRow = itemCount - ((itemCount / parts) * parts);
int numberOfRows = (int)(itemCount / (decimal)parts) + 1;
int firstblock = (numberOfRows*itemsOnlastRow);
var splits = from item in list
group item by (itemOnIndex++ < firstblock) ? ((itemOnIndex -1) % numberOfRows) : ((itemOnIndex - 1 - firstblock) % (numberOfRows - 1)) into row
orderby row.Key
select row.AsEnumerable();
return splits;
}
Use .Take(numberOfElements) to specify the number of elements you want to take.
You can also use .First and .Last.
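For instance (a tiny hypothetical example, assuming System.Linq; note that Take/Skip split by position, they do not distribute items into columns on their own):
int[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var firstPart = a.Take(5).ToList(); // 1, 2, 3, 4, 5
var rest = a.Skip(5).ToList();      // 6, 7, 8, 9
var first = a.First();              // 1
var last = a.Last();                // 9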

C# recursive function - help understanding how it works?

I need help understanding how a function works: it is a recursive function with yield return, but I can't figure out how it works. It is used to calculate an (approximate) cumulative density function over a set of data.
Thanks a lot to everyone.
/// <summary>
/// Approximates the cumulative density through a recursive procedure
/// estimating counts of regions at different resolutions.
/// </summary>
/// <param name="data">Source collection of integer values</param>
/// <param name="max">The largest integer in the resulting cdf (it has to be a power of 2)</param>
/// <returns>A list of counts, where entry i is the number of records less than i</returns>
public static IEnumerable<int> FUNCT(IEnumerable<int> data, int max)
{
if (max == 1)
{
yield return data.Count();
}
else
{
var t = data.Where(x => x < max / 2);
var f = data.Where(x => x > max / 2);
foreach (var value in FUNCT(t, max / 2))
yield return value;
var count = t.Count();
f = f.Select(x => x - max / 2);
foreach (var value in FUNCT(f, max / 2))
yield return value + count;
}
}
In essence, IEnumerable functions that use yield return behave slightly differently from traditional recursive functions. As a base case, suppose you have:
IEnumerable<int> F(int n)
{
if (n == 1)
{
yield return 1;
yield return 2;
yield break; // end of the base case; without this, execution would fall through to the loops below
}
// Enter loop 1
foreach (var v in F(n - 1))
yield return v;
// End loop 1
int sum = 5;
// Enter loop 2
foreach (var v in F(n - 1))
yield return v + sum;
// End loop 2
// implied yield break;
}
void Main()
{
foreach (var v in F(2))
Console.Write(v);
// implied return
}
F takes the basic form of the original FUNCT. If we call F(2), then walking through the yields:
F(2)
| F(1)
| | yield return 1
| yield return 1
Console.Write(1);
| | yield return 2
| yield return 2
Console.Write(2)
| | RETURNS
| sum = 5;
| F(1)
| | yield return 1
| yield return 1 + 5
Console.Write(6)
| | yield return 2
| yield return 2 + 5
Console.Write(7)
| | RETURNS
| RETURNS
RETURNS
And 1267 is printed. Note that the yield return statement yields control to the caller, but the next iteration causes the function to continue where it had previously yielded.
The CDF method does add some additional complexity, but not much. The recursion splits the collection into two pieces and computes the CDF of each piece, until max = 1. Then the function counts the number of elements and yields it, with each yield propagating recursively to the enclosing loop.
To walk through FUNCT, suppose you run with data=[0,1,0,1,2,3,2,1] and max=4. Then running through the method, using the same Main function above as a driver, yields:
FUNCT([0,1,0,1,2,3,2,1], 4)
| max/2 = 2
| t = [0,1,0,1,1]
| f = [3] // (note: per my comment to the original question,
| // should be [2,3,2] to get true CDF. The 2s are
| // ignored since the method uses > max/2 rather than
| // >= max/2.)
| FUNCT(t,max/2) = FUNCT([0,1,0,1,1], 2)
| | max/2 = 1
| | t = [0,0]
| | f = [] // or [1,1,1]
| | FUNCT(t, max/2) = FUNCT([0,0], 1)
| | | max = 1
| | | yield return data.count = [0,0].count = 2
| | yield return 2
| yield return 2
Console.Write(2)
| | | RETURNS
| | count = t.count = 2
| | F(f, max/2) = FUNCT([], 1)
| | | max = 1
| | | yield return data.count = [].count = 0
| | yield return 0 + count = 2
| yield return 2
Console.Write(2)
| | | RETURNS
| | RETURNS
| count = t.Count() = 5
| f = f - max/2 = f - 2 = [1]
| FUNCT(f, max/2) = FUNCT([1], 2)
| | max = 2
| | max/2 = 1
| | t = []
| | f = [] // or [1]
| | FUNCT(t, max/2) = funct([], 1)
| | | max = 1
| | | yield return data.count = [].count = 0
| | yield return 0
| yield return 0 + count = 5
Console.Write(5)
| | | RETURNS
| | count = t.count = [].count = 0
| | f = f - max/2 = []
| | F(f, max/2) = funct([], 1)
| | | max = 1
| | | yield return data.count = [].count = 0
| | yield return 0 + count = 0 + 0 = 0
| yield return 0 + count = 0 + 5 = 5
Console.Write(5)
| | RETURNS
| RETURNS
RETURNS
So this returns the values (2,2,5,5). (using >= would yield the values (2,5,7,8) -- note that these are the exact values of a scaled CDF for non-negative integral data, rather than an approximation).
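If you actually want that true scaled CDF rather than the approximation, the only change needed (as far as I can tell; treat this as a suggestion rather than the author's intent) is in the two Where clauses inside FUNCT, so that values equal to max / 2 are counted in the upper half instead of being dropped:
var t = data.Where(x => x < max / 2);  // lower half, unchanged
var f = data.Where(x => x >= max / 2); // upper half now keeps values equal to max / 2
With that change, the walkthrough above produces (2, 5, 7, 8) instead of (2, 2, 5, 5).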
Interesting question. Assuming you understand how yield works, the comments on the function (in your question) are very helpful. I've commented the code as I understand it which might help:
public static IEnumerable<int> FUNCT(IEnumerable<int> data, int max)
{
if (max == 1)
{
// Effectively the end of the recursion.
yield return data.Count();
}
else
{
// Split the data into two sets
var t = data.Where(x => x < max / 2);
var f = data.Where(x => x > max / 2);
// In the set of smaller numbers, recurse to split it again
foreach (var value in FUNCT(t, max / 2))
yield return value;
// For the set of smaller numbers, get the count.
var count = t.Count();
// Shift the larger numbers so they are in the smaller half.
// This allows the recursive function to reach an end.
f = f.Select(x => x - max / 2);
// Recurse but add the count of smaller numbers. We already know there
// are at least 'count' values which are less than max / 2.
// Recurse to find out how many more there are.
foreach (var value in FUNCT(f, max / 2))
yield return value + count;
}
}

Select N random elements from a List<T> in C#

I need a quick algorithm to select 5 random elements from a generic list. For example, I'd like to get 5 random elements from a List<string>.
Using LINQ:
YourList.OrderBy(x => rnd.Next()).Take(5)
Iterate through and for each element make the probability of selection = (number needed)/(number left)
So if you had 40 items, the first would have a 5/40 chance of being selected. If it is, the next has a 4/39 chance, otherwise it has a 5/39 chance. By the time you get to the end you will have your 5 items, and often you'll have all of them before that.
This technique is called selection sampling, a special case of Reservoir Sampling. It's similar in performance to shuffling the input, but of course allows the sample to be generated without modifying the original data.
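A minimal sketch of that idea (my own wording of it in code, with names of my choosing; one of the later answers fleshes out the same approach as an extension method):
// Selection sampling: take each element with probability (still needed) / (still left).
static List<T> TakeSample<T>(IList<T> source, int needed, Random rng)
{
    var sample = new List<T>(needed);
    int left = source.Count;
    foreach (var item in source)
    {
        if (rng.Next(left) < needed) // probability needed / left
        {
            sample.Add(item);
            needed--;
        }
        left--;
    }
    return sample;
}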
public static List<T> GetRandomElements<T>(this IEnumerable<T> list, int elementsCount)
{
return list.OrderBy(arg => Guid.NewGuid()).Take(elementsCount).ToList();
}
This is actually a harder problem than it sounds like, mainly because many mathematically-correct solutions will fail to actually allow you to hit all the possibilities (more on this below).
First, here are some approaches that are easy to implement and correct if you have a truly random number generator:
(0) Kyle's answer, which is O(n).
(1) Generate a list of n pairs [(0, rand), (1, rand), (2, rand), ...], sort them by the second coordinate, and use the first k (for you, k=5) indices to get your random subset. I think this is easy to implement, although it is O(n log n) time.
(2) Init an empty list s = [] that will grow to be the indices of k random elements. Choose a number r in {0, 1, 2, ..., n-1} at random, r = rand % n, and add this to s. Next take r = rand % (n-1) and stick in s; add to r the # elements less than it in s to avoid collisions. Next take r = rand % (n-2), and do the same thing, etc. until you have k distinct elements in s. This has worst-case running time O(k^2). So for k << n, this can be faster. If you keep s sorted and track which contiguous intervals it has, you can implement it in O(k log k), but it's more work.
@Kyle - you're right, on second thought I agree with your answer. I hastily read it at first, and mistakenly thought you were suggesting to sequentially choose each element with fixed probability k/n, which would have been wrong - but your adaptive approach appears correct to me. Sorry about that.
Ok, and now for the kicker: asymptotically (for fixed k, n growing), there are n^k/k! choices of k element subset out of n elements [this is an approximation of (n choose k)]. If n is large, and k is not very small, then these numbers are huge. The best cycle length you can hope for in any standard 32 bit random number generator is 2^32 = 256^4. So if we have a list of 1000 elements, and we want to choose 5 at random, there's no way a standard random number generator will hit all the possibilities. However, as long as you're ok with a choice that works fine for smaller sets, and always "looks" random, then these algorithms should be ok.
Addendum: After writing this, I realized that it's tricky to implement idea (2) correctly, so I wanted to clarify this answer. To get O(k log k) time, you need an array-like structure that supports O(log m) searches and inserts - a balanced binary tree can do this. Using such a structure to build up an array called s, here is some pseudopython:
# Returns a container s with k distinct random numbers from {0, 1, ..., n-1}
def ChooseRandomSubset(n, k):
for i in range(k):
r = UniformRandom(0, n-i) # May be 0, must be < n-i
q = s.FirstIndexSuchThat( s[q] - q > r ) # This is the search.
s.InsertInOrder(q ? r + q : r + len(s)) # Inserts right before q.
return s
I suggest running through a few sample cases to see how this efficiently implements the above English explanation.
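For what it's worth, here is a C# sketch of the simpler O(k^2) form of idea (2) described above (my own rendering, not the tree-based pseudo-Python version; ChooseRandomSubset and its parameters are names I picked):
// Returns k distinct random values from {0, 1, ..., n-1}.
static List<int> ChooseRandomSubset(int n, int k, Random rng)
{
    var chosen = new List<int>();   // kept sorted
    for (int i = 0; i < k; i++)
    {
        int r = rng.Next(n - i);    // uniform over the n - i values still available
        foreach (int c in chosen)   // remap r to the r-th value not yet chosen
        {
            if (c <= r) r++;
            else break;             // the list is sorted, later entries cannot affect r
        }
        int pos = chosen.BinarySearch(r);
        chosen.Insert(pos < 0 ? ~pos : pos, r); // insert while keeping the list sorted
    }
    return chosen;
}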
I think the selected answer is correct and pretty sweet. I implemented it differently though, as I also wanted the result in random order.
static IEnumerable<SomeType> PickSomeInRandomOrder<SomeType>(
IEnumerable<SomeType> someTypes,
int maxCount)
{
Random random = new Random(DateTime.Now.Millisecond);
Dictionary<double, SomeType> randomSortTable = new Dictionary<double,SomeType>();
foreach(SomeType someType in someTypes)
randomSortTable[random.NextDouble()] = someType;
return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value);
}
I just ran into this problem, and some more google searching brought me to the problem of randomly shuffling a list: http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
To completely randomly shuffle your list (in place) you do this:
To shuffle an array a of n elements (indices 0..n-1):
for i from n − 1 downto 1 do
j ← random integer with 0 ≤ j ≤ i
exchange a[j] and a[i]
If you only need the first 5 elements, then instead of running i all the way from n-1 down to 1, you only need to run it down to n-5.
Let's say you need k items.
This becomes:
for (i = n − 1; i >= n-k; i--)
{
j = random integer with 0 ≤ j ≤ i
exchange a[j] and a[i]
}
Each item that is selected is swapped toward the end of the array, so the k elements selected are the last k elements of the array.
This takes time O(k), where k is the number of randomly selected elements you need.
Further, if you don't want to modify your initial list, you can write down all your swaps in a temporary list, reverse that list, and apply them again, thus performing the inverse set of swaps and returning you your initial list without changing the O(k) running time.
Finally, for the real stickler, if (n == k), you should stop at 1, not n-k, as the randomly chosen integer will always be 0.
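A straightforward C# rendering of that partial shuffle (my sketch, assuming System.Linq and that modifying the input array is acceptable; the k selected items end up at the tail of the array):
static T[] TakeKRandom<T>(T[] a, int k, Random rng)
{
    int n = a.Length;
    for (int i = n - 1; i >= n - k; i--)
    {
        int j = rng.Next(i + 1);      // random integer with 0 <= j <= i
        (a[j], a[i]) = (a[i], a[j]);  // swap the chosen element toward the end
    }
    return a.Skip(n - k).ToArray();   // the last k elements are the sample
}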
You can use this, but the ordering will happen on the client side:
.AsEnumerable().OrderBy(n => Guid.NewGuid()).Take(5);
12 years on and this question is still active. I didn't find an implementation of Kyle's solution I liked, so here it is:
public IEnumerable<T> TakeRandom<T>(IEnumerable<T> collection, int take)
{
var random = new Random();
var available = collection.Count();
var needed = take;
foreach (var item in collection)
{
if (random.Next(available) < needed)
{
needed--;
yield return item;
if (needed == 0)
{
break;
}
}
available--;
}
}
From Dragons in the Algorithm, an interpretation in C#:
int k = 10; // items to select
var items = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 });
var selected = new List<int>();
double needed = k;
double available = items.Count;
var rand = new Random();
while (selected.Count < k) {
if( rand.NextDouble() < needed / available ) {
selected.Add(items[(int)available-1]);
needed--;
}
available--;
}
This algorithm will select unique indices of the items list.
Was thinking about the comment by @JohnShedletsky on the accepted answer regarding (paraphrase):
you should be able to do this in O(subset.Length), rather than O(originalList.Length)
Basically, you should be able to generate subset random indices and then pluck them from the original list.
The Method
public static class EnumerableExtensions {
public static Random randomizer = new Random(); // you'd ideally be able to replace this with whatever makes you comfortable
public static IEnumerable<T> GetRandom<T>(this IEnumerable<T> list, int numItems) {
return (list as T[] ?? list.ToArray()).GetRandom(numItems);
// because ReSharper whined about duplicate enumeration...
/*
items.Add(list.ElementAt(randomizer.Next(list.Count()))) ) numItems--;
*/
}
// just because the parentheses were getting confusing
public static IEnumerable<T> GetRandom<T>(this T[] list, int numItems) {
var items = new HashSet<T>(); // don't want to add the same item twice; otherwise use a list
while (numItems > 0 )
// if we successfully added it, move on
if( items.Add(list[randomizer.Next(list.Length)]) ) numItems--;
return items;
}
// and because it's really fun; note -- you may get repetition
public static IEnumerable<T> PluckRandomly<T>(this IEnumerable<T> list) {
while( true )
yield return list.ElementAt(randomizer.Next(list.Count()));
}
}
If you wanted to be even more efficient, you would probably use a HashSet of the indices, not the actual list elements (in case you've got complex types or expensive comparisons);
The Unit Test
And to make sure we don't have any collisions, etc.
[TestClass]
public class RandomizingTests : UnitTestBase {
[TestMethod]
public void GetRandomFromList() {
this.testGetRandomFromList((list, num) => list.GetRandom(num));
}
[TestMethod]
public void PluckRandomly() {
this.testGetRandomFromList((list, num) => list.PluckRandomly().Take(num), requireDistinct:false);
}
private void testGetRandomFromList(Func<IEnumerable<int>, int, IEnumerable<int>> methodToGetRandomItems, int numToTake = 10, int repetitions = 100000, bool requireDistinct = true) {
var items = Enumerable.Range(0, 100);
IEnumerable<int> randomItems = null;
while( repetitions-- > 0 ) {
randomItems = methodToGetRandomItems(items, numToTake);
Assert.AreEqual(numToTake, randomItems.Count(),
"Did not get expected number of items {0}; failed at {1} repetition--", numToTake, repetitions);
if(requireDistinct) Assert.AreEqual(numToTake, randomItems.Distinct().Count(),
"Collisions (non-unique values) found, failed at {0} repetition--", repetitions);
Assert.IsTrue(randomItems.All(o => items.Contains(o)),
"Some unknown values found; failed at {0} repetition--", repetitions);
}
}
}
Selecting N random items from a group shouldn't have anything to do with order! Randomness is about unpredictability, not about shuffling positions in a group. All the answers that deal with some kind of ordering are bound to be less efficient than the ones that do not. Since efficiency is the key here, I will post something that doesn't change the order of items too much.
1) If you need true random values, meaning there is no restriction on which elements to choose from (i.e., a once-chosen item can be reselected):
public static List<T> GetTrueRandom<T>(this IList<T> source, int count,
bool throwArgumentOutOfRangeException = true)
{
if (throwArgumentOutOfRangeException && count > source.Count)
throw new ArgumentOutOfRangeException();
var randoms = new List<T>(count);
randoms.AddRandomly(source, count);
return randoms;
}
If you set the exception flag off, then you can choose random items any number of times.
If you have { 1, 2, 3, 4 }, then it can give { 1, 4, 4 }, { 1, 4, 3 } etc for 3 items or even { 1, 4, 3, 2, 4 } for 5 items!
This should be pretty fast, as it has nothing to check.
2) If you need individual members from the group with no repetition, then I would rely on a dictionary (as many have pointed out already).
public static List<T> GetDistinctRandom<T>(this IList<T> source, int count)
{
if (count > source.Count)
throw new ArgumentOutOfRangeException();
if (count == source.Count)
return new List<T>(source);
var sourceDict = source.ToIndexedDictionary();
if (count > source.Count / 2)
{
while (sourceDict.Count > count)
sourceDict.Remove(source.GetRandomIndex());
return sourceDict.Select(kvp => kvp.Value).ToList();
}
var randomDict = new Dictionary<int, T>(count);
while (randomDict.Count < count)
{
int key = source.GetRandomIndex();
if (!randomDict.ContainsKey(key))
randomDict.Add(key, sourceDict[key]);
}
return randomDict.Select(kvp => kvp.Value).ToList();
}
The code is a bit lengthier than other dictionary approaches here because I'm not only adding, but also removing from the list, so it's essentially two loops. You can see that I have not reordered anything at all when count becomes equal to source.Count. That's because I believe randomness should be in the returned set as a whole. I mean, if you want 5 random items from 1, 2, 3, 4, 5, it shouldn't matter if it's 1, 3, 4, 2, 5 or 1, 2, 3, 4, 5, but if you need 4 items from the same set, then it should unpredictably yield 1, 2, 3, 4, or 1, 3, 5, 2, or 2, 3, 5, 4, etc. Secondly, when the count of random items to be returned is more than half of the original group, it's easier to remove source.Count - count items from the group than to add count items. For performance reasons I have used source instead of sourceDict to get the random index in the remove method.
So if you have { 1, 2, 3, 4 }, this can end up in { 1, 2, 3 }, { 3, 4, 1 } etc for 3 items.
3) If you need truly distinct random values from your group by taking into account the duplicates in the original group, then you may use the same approach as above, but a HashSet will be lighter than a dictionary.
public static List<T> GetTrueDistinctRandom<T>(this IList<T> source, int count,
bool throwArgumentOutOfRangeException = true)
{
if (count > source.Count)
throw new ArgumentOutOfRangeException();
var set = new HashSet<T>(source);
if (throwArgumentOutOfRangeException && count > set.Count)
throw new ArgumentOutOfRangeException();
List<T> list = set.ToList();
if (count >= set.Count)
return list;
if (count > set.Count / 2)
{
while (set.Count > count)
set.Remove(list.GetRandom());
return set.ToList();
}
var randoms = new HashSet<T>();
randoms.AddRandomly(list, count);
return randoms.ToList();
}
The randoms variable is made a HashSet to avoid duplicates being added in the rarest of rare cases where Random.Next can yield the same value, especially when the input list is small.
So { 1, 2, 2, 4 } => 3 random items => { 1, 2, 4 } and never { 1, 2, 2}
{ 1, 2, 2, 4 } => 4 random items => exception!! or { 1, 2, 4 } depending on the flag set.
Some of the extension methods I have used:
static Random rnd = new Random();
public static int GetRandomIndex<T>(this ICollection<T> source)
{
return rnd.Next(source.Count);
}
public static T GetRandom<T>(this IList<T> source)
{
return source[source.GetRandomIndex()];
}
static void AddRandomly<T>(this ICollection<T> toCol, IList<T> fromList, int count)
{
while (toCol.Count < count)
toCol.Add(fromList.GetRandom());
}
public static Dictionary<int, T> ToIndexedDictionary<T>(this IEnumerable<T> lst)
{
return lst.ToIndexedDictionary(t => t);
}
public static Dictionary<int, T> ToIndexedDictionary<S, T>(this IEnumerable<S> lst,
Func<S, T> valueSelector)
{
int index = -1;
return lst.ToDictionary(t => ++index, valueSelector);
}
If it's all about performance, with tens of thousands of items in the list having to be iterated 10,000 times, then you may want a faster random class than System.Random, but I don't think that's a big deal considering the latter is most probably never the bottleneck; it's quite fast enough.
Edit: If you need to re-arrange order of returned items as well, then there's nothing that can beat dhakim's Fisher-Yates approach - short, sweet and simple..
I combined several of the above answers to create a Lazily-evaluated extension method. My testing showed that Kyle's approach (Order(N)) is many times slower than drzaus' use of a set to propose the random indices to choose (Order(K)). The former performs many more calls to the random number generator, plus iterates more times over the items.
The goals of my implementation were:
1) Do not realize the full list if given an IEnumerable that is not an IList. If I am given a sequence of a zillion items, I do not want to run out of memory. Use Kyle's approach for an on-line solution.
2) If I can tell that it is an IList, use drzaus' approach, with a twist. If K is more than half of N, I risk thrashing as I choose many random indices again and again and have to skip them. Thus I compose a list of the indices to NOT keep.
3) I guarantee that the items will be returned in the same order that they were encountered. Kyle's algorithm required no alteration. drzaus' algorithm required that I not emit items in the order that the random indices are chosen. I gather all the indices into a SortedSet, then emit items in sorted index order.
4) If K is large compared to N and I invert the sense of the set, then I enumerate all items and test if the index is not in the set. This means that
I lose the Order(K) run time, but since K is close to N in these cases, I do not lose much.
Here is the code:
/// <summary>
/// Takes k elements from the next n elements at random, preserving their order.
///
/// If there are fewer than n elements in items, this may return fewer than k elements.
/// </summary>
/// <typeparam name="TElem">Type of element in the items collection.</typeparam>
/// <param name="items">Items to be randomly selected.</param>
/// <param name="k">Number of items to pick.</param>
/// <param name="n">Total number of items to choose from.
/// If the items collection contains more than this number, the extra members will be skipped.
/// If the items collection contains fewer than this number, it is possible that fewer than k items will be returned.</param>
/// <returns>Enumerable over the retained items.
///
/// See http://stackoverflow.com/questions/48087/select-a-random-n-elements-from-listt-in-c-sharp for the commentary.
/// </returns>
public static IEnumerable<TElem> TakeRandom<TElem>(this IEnumerable<TElem> items, int k, int n)
{
var r = new FastRandom();
var itemsList = items as IList<TElem>;
if (k >= n || (itemsList != null && k >= itemsList.Count))
foreach (var item in items) yield return item;
else
{
// If we have a list, we can infer more information and choose a better algorithm.
// When using an IList, this is about 7 times faster (on one benchmark)!
if (itemsList != null && k < n/2)
{
// Since we have a List, we can use an algorithm suitable for Lists.
// If there are fewer than n elements, reduce n.
n = Math.Min(n, itemsList.Count);
// This algorithm picks K index-values randomly and directly chooses those items to be selected.
// If k is more than half of n, then we will spend a fair amount of time thrashing, picking
// indices that we have already picked and having to try again.
var invertSet = k >= n/2;
var positions = invertSet ? (ISet<int>) new HashSet<int>() : (ISet<int>) new SortedSet<int>();
var numbersNeeded = invertSet ? n - k : k;
while (numbersNeeded > 0)
if (positions.Add(r.Next(0, n))) numbersNeeded--;
if (invertSet)
{
// positions contains all the indices of elements to Skip.
for (var itemIndex = 0; itemIndex < n; itemIndex++)
{
if (!positions.Contains(itemIndex))
yield return itemsList[itemIndex];
}
}
else
{
// positions contains all the indices of elements to Take.
foreach (var itemIndex in positions)
yield return itemsList[itemIndex];
}
}
else
{
// Since we do not have a list, we will use an online algorithm.
// This permits us to skip the rest as soon as we have enough items.
var found = 0;
var scanned = 0;
foreach (var item in items)
{
var rand = r.Next(0,n-scanned);
if (rand < k - found)
{
yield return item;
found++;
}
scanned++;
if (found >= k || scanned >= n)
break;
}
}
}
}
I use a specialized random number generator, but you can just use C#'s Random if you want. (FastRandom was written by Colin Green and is part of SharpNEAT. It has a period of 2^128-1 which is better than many RNGs.)
Here are the unit tests:
[TestClass]
public class TakeRandomTests
{
/// <summary>
/// Ensure that when randomly choosing items from an array, all items are chosen with roughly equal probability.
/// </summary>
[TestMethod]
public void TakeRandom_Array_Uniformity()
{
const int numTrials = 2000000;
const int expectedCount = numTrials/20;
var timesChosen = new int[100];
var century = new int[100];
for (var i = 0; i < century.Length; i++)
century[i] = i;
for (var trial = 0; trial < numTrials; trial++)
{
foreach (var i in century.TakeRandom(5, 100))
timesChosen[i]++;
}
var avg = timesChosen.Average();
var max = timesChosen.Max();
var min = timesChosen.Min();
var allowedDifference = expectedCount/100;
AssertBetween(avg, expectedCount - 2, expectedCount + 2, "Average");
//AssertBetween(min, expectedCount - allowedDifference, expectedCount, "Min");
//AssertBetween(max, expectedCount, expectedCount + allowedDifference, "Max");
var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
}
/// <summary>
/// Ensure that when randomly choosing items from an IEnumerable that is not an IList,
/// all items are chosen with roughly equal probability.
/// </summary>
[TestMethod]
public void TakeRandom_IEnumerable_Uniformity()
{
const int numTrials = 2000000;
const int expectedCount = numTrials / 20;
var timesChosen = new int[100];
for (var trial = 0; trial < numTrials; trial++)
{
foreach (var i in Range(0,100).TakeRandom(5, 100))
timesChosen[i]++;
}
var avg = timesChosen.Average();
var max = timesChosen.Max();
var min = timesChosen.Min();
var allowedDifference = expectedCount / 100;
var countInRange =
timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
}
private IEnumerable<int> Range(int low, int count)
{
for (var i = low; i < low + count; i++)
yield return i;
}
private static void AssertBetween(int x, int low, int high, String message)
{
Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
}
private static void AssertBetween(double x, double low, double high, String message)
{
Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
}
}
Here you have an implementation based on the Fisher-Yates shuffle whose algorithmic complexity is O(n), where n is the subset or sample size instead of the list size, as John Shedletsky pointed out.
public static IEnumerable<T> GetRandomSample<T>(this IList<T> list, int sampleSize)
{
if (list == null) throw new ArgumentNullException("list");
if (sampleSize > list.Count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize");
var indices = new Dictionary<int, int>(); int index;
var rnd = new Random();
for (int i = 0; i < sampleSize; i++)
{
int j = rnd.Next(i, list.Count);
if (!indices.TryGetValue(j, out index)) index = j;
yield return list[index];
if (!indices.TryGetValue(i, out index)) index = i;
indices[j] = index;
}
}
Extending from #ers's answer, if one is worried about possible different implementations of OrderBy, this should be safe:
// Instead of this
YourList.OrderBy(x => rnd.Next()).Take(5)
// Temporarily transform
YourList
.Select(v => new {v, i = rnd.Next()}) // Associate a random index to each entry
.OrderBy(x => x.i).Take(5) // Sort by (at this point fixed) random index
.Select(x => x.v); // Go back to enumerable of entry
The simple solution I use (probably not good for large lists):
Copy the list into a temporary list, then in a loop randomly select an item from the temp list, put it in the selected items list, and remove it from the temp list (so it can't be reselected).
Example:
List<Object> temp = OriginalList.ToList();
List<Object> selectedItems = new List<Object>();
Random rnd = new Random();
Object o;
int i = 0;
while (i < NumberOfSelectedItems)
{
o = temp[rnd.Next(temp.Count)];
selectedItems.Add(o);
temp.Remove(o);
i++;
}
This is the best I could come up with on a first cut:
public List<String> getRandomItemsFromList(int returnCount, List<String> list)
{
List<String> returnList = new List<String>();
Dictionary<int, int> randoms = new Dictionary<int, int>();
while (randoms.Count != returnCount)
{
//generate new random between one and total list count
int randomInt = new Random().Next(list.Count);
// store this in dictionary to ensure uniqueness
try
{
randoms.Add(randomInt, randomInt);
}
catch (ArgumentException aex)
{
Console.Write(aex.Message);
} //we can assume this element exists in the dictonary already
//check for randoms length and then iterate through the original list
//adding items we select via random to the return list
if (randoms.Count == returnCount)
{
foreach (int key in randoms.Keys)
returnList.Add(list[randoms[key]]);
break; //break out of _while_ loop
}
}
return returnList;
}
Using a list of randoms within a range of 1 - total list count and then simply pulling those items in the list seemed to be the best way, but using the Dictionary to ensure uniqueness is something I'm still mulling over.
Also note I used a string list, replace as needed.
Based on Kyle's answer, here's my c# implementation.
/// <summary>
/// Picks random selection of available game ID's
/// </summary>
private static List<int> GetRandomGameIDs(int count)
{
var gameIDs = (int[])HttpContext.Current.Application["NonDeletedArcadeGameIDs"];
var totalGameIDs = gameIDs.Count();
if (count > totalGameIDs) count = totalGameIDs;
var rnd = new Random();
var leftToPick = count;
var itemsLeft = totalGameIDs;
var arrPickIndex = 0;
var returnIDs = new List<int>();
while (leftToPick > 0)
{
if (rnd.Next(0, itemsLeft) < leftToPick)
{
returnIDs .Add(gameIDs[arrPickIndex]);
leftToPick--;
}
arrPickIndex++;
itemsLeft--;
}
return returnIDs ;
}
This method may be equivalent to Kyle's.
Say your list is of size n and you want k elements.
Random rand = new Random();
for(int i = 0; k>0; ++i)
{
int r = rand.Next(0, n-i);
if(r<k)
{
//include element i
k--;
}
}
Works like a charm :)
-Alex Gilbert
Here is a benchmark of three different methods:
The implementation of the accepted answer from Kyle.
An approach based on random index selection with HashSet duplication filtering, from drzaus.
A more academic approach posted by Jesús López, called Fisher–Yates shuffle.
The testing will consist of benchmarking the performance with multiple different list sizes and selection sizes.
I also included a measurement of the standard deviation of these three methods, i.e. how well distributed the random selection appears to be.
In a nutshell, drzaus's simple solution seems to be the best overall, from these three. The selected answer is great and elegant, but it's not that efficient, given that the time complexity is based on the sample size, not the selection size. Consequently, if you select a small number of items from a long list, it will take orders of magnitude more time. Of course it still performs better than the solutions based on complete reordering.
Curiously enough, this O(n) time complexity issue is true even if you only touch the list when you actually return an item, like I do in my implementation. The only thing I can think of is that Random.Next() is pretty slow, and that performance benefits if you generate only one random number for each selected item.
And, also interestingly, the StdDev of Kyle's solution was significantly higher comparatively. I have no clue why; maybe the fault is in my implementation.
Sorry for the long code and output that will commence now; but I hope it's somewhat illuminative. Also, if you spot any issues in the tests or implementations, let me know and I'll fix it.
static void Main()
{
BenchmarkRunner.Run<Benchmarks>();
new Benchmarks() { ListSize = 100, SelectionSize = 10 }
.BenchmarkStdDev();
}
[MemoryDiagnoser]
public class Benchmarks
{
[Params(50, 500, 5000)]
public int ListSize;
[Params(5, 10, 25, 50)]
public int SelectionSize;
private Random _rnd;
private List<int> _list;
private int[] _hits;
[GlobalSetup]
public void Setup()
{
_rnd = new Random(12345);
_list = Enumerable.Range(0, ListSize).ToList();
_hits = new int[ListSize];
}
[Benchmark]
public void Test_IterateSelect()
=> Random_IterateSelect(_list, SelectionSize).ToList();
[Benchmark]
public void Test_RandomIndices()
=> Random_RandomIdices(_list, SelectionSize).ToList();
[Benchmark]
public void Test_FisherYates()
=> Random_FisherYates(_list, SelectionSize).ToList();
public void BenchmarkStdDev()
{
RunOnce(Random_IterateSelect, "IterateSelect");
RunOnce(Random_RandomIdices, "RandomIndices");
RunOnce(Random_FisherYates, "FisherYates");
void RunOnce(Func<IEnumerable<int>, int, IEnumerable<int>> method, string methodName)
{
Setup();
for (int i = 0; i < 1000000; i++)
{
var selected = method(_list, SelectionSize).ToList();
Debug.Assert(selected.Count() == SelectionSize);
foreach (var item in selected) _hits[item]++;
}
var stdDev = GetStdDev(_hits);
Console.WriteLine($"StdDev of {methodName}: {stdDev :n} (% of average: {stdDev / (_hits.Average() / 100) :n})");
}
double GetStdDev(IEnumerable<int> hits)
{
var average = hits.Average();
return Math.Sqrt(hits.Average(v => Math.Pow(v - average, 2)));
}
}
public IEnumerable<T> Random_IterateSelect<T>(IEnumerable<T> collection, int needed)
{
var count = collection.Count();
for (int i = 0; i < count; i++)
{
if (_rnd.Next(count - i) < needed)
{
yield return collection.ElementAt(i);
if (--needed == 0)
yield break;
}
}
}
public IEnumerable<T> Random_RandomIdices<T>(IEnumerable<T> list, int needed)
{
var selectedItems = new HashSet<T>();
var count = list.Count();
while (needed > 0)
if (selectedItems.Add(list.ElementAt(_rnd.Next(count))))
needed--;
return selectedItems;
}
public IEnumerable<T> Random_FisherYates<T>(IEnumerable<T> list, int sampleSize)
{
var count = list.Count();
if (sampleSize > count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize");
var indices = new Dictionary<int, int>(); int index;
for (int i = 0; i < sampleSize; i++)
{
int j = _rnd.Next(i, count);
if (!indices.TryGetValue(j, out index)) index = j;
yield return list.ElementAt(index);
if (!indices.TryGetValue(i, out index)) index = i;
indices[j] = index;
}
}
}
Output:
| Method | ListSize | Select | Mean | Error | StdDev | Gen 0 | Allocated |
|-------------- |--------- |------- |------------:|----------:|----------:|-------:|----------:|
| IterateSelect | 50 | 5 | 711.5 ns | 5.19 ns | 4.85 ns | 0.0305 | 144 B |
| RandomIndices | 50 | 5 | 341.1 ns | 4.48 ns | 4.19 ns | 0.0644 | 304 B |
| FisherYates | 50 | 5 | 573.5 ns | 6.12 ns | 5.72 ns | 0.0944 | 447 B |
| IterateSelect | 50 | 10 | 967.2 ns | 4.64 ns | 3.87 ns | 0.0458 | 220 B |
| RandomIndices | 50 | 10 | 709.9 ns | 11.27 ns | 9.99 ns | 0.1307 | 621 B |
| FisherYates | 50 | 10 | 1,204.4 ns | 10.63 ns | 9.94 ns | 0.1850 | 875 B |
| IterateSelect | 50 | 25 | 1,358.5 ns | 7.97 ns | 6.65 ns | 0.0763 | 361 B |
| RandomIndices | 50 | 25 | 1,958.1 ns | 15.69 ns | 13.91 ns | 0.2747 | 1298 B |
| FisherYates | 50 | 25 | 2,878.9 ns | 31.42 ns | 29.39 ns | 0.3471 | 1653 B |
| IterateSelect | 50 | 50 | 1,739.1 ns | 15.86 ns | 14.06 ns | 0.1316 | 629 B |
| RandomIndices | 50 | 50 | 8,906.1 ns | 88.92 ns | 74.25 ns | 0.5951 | 2848 B |
| FisherYates | 50 | 50 | 4,899.9 ns | 38.10 ns | 33.78 ns | 0.4349 | 2063 B |
| IterateSelect | 500 | 5 | 4,775.3 ns | 46.96 ns | 41.63 ns | 0.0305 | 144 B |
| RandomIndices | 500 | 5 | 327.8 ns | 2.82 ns | 2.50 ns | 0.0644 | 304 B |
| FisherYates | 500 | 5 | 558.5 ns | 7.95 ns | 7.44 ns | 0.0944 | 449 B |
| IterateSelect | 500 | 10 | 5,387.1 ns | 44.57 ns | 41.69 ns | 0.0458 | 220 B |
| RandomIndices | 500 | 10 | 648.0 ns | 9.12 ns | 8.54 ns | 0.1307 | 621 B |
| FisherYates | 500 | 10 | 1,154.6 ns | 13.66 ns | 12.78 ns | 0.1869 | 889 B |
| IterateSelect | 500 | 25 | 6,442.3 ns | 48.90 ns | 40.83 ns | 0.0763 | 361 B |
| RandomIndices | 500 | 25 | 1,569.6 ns | 15.79 ns | 14.77 ns | 0.2747 | 1298 B |
| FisherYates | 500 | 25 | 2,726.1 ns | 25.32 ns | 22.44 ns | 0.3777 | 1795 B |
| IterateSelect | 500 | 50 | 7,775.4 ns | 35.47 ns | 31.45 ns | 0.1221 | 629 B |
| RandomIndices | 500 | 50 | 2,976.9 ns | 27.11 ns | 24.03 ns | 0.6027 | 2848 B |
| FisherYates | 500 | 50 | 5,383.2 ns | 36.49 ns | 32.35 ns | 0.8163 | 3870 B |
| IterateSelect | 5000 | 5 | 45,208.6 ns | 459.92 ns | 430.21 ns | - | 144 B |
| RandomIndices | 5000 | 5 | 328.7 ns | 5.15 ns | 4.81 ns | 0.0644 | 304 B |
| FisherYates | 5000 | 5 | 556.1 ns | 10.75 ns | 10.05 ns | 0.0944 | 449 B |
| IterateSelect | 5000 | 10 | 49,253.9 ns | 420.26 ns | 393.11 ns | - | 220 B |
| RandomIndices | 5000 | 10 | 642.9 ns | 4.95 ns | 4.13 ns | 0.1307 | 621 B |
| FisherYates | 5000 | 10 | 1,141.9 ns | 12.81 ns | 11.98 ns | 0.1869 | 889 B |
| IterateSelect | 5000 | 25 | 54,044.4 ns | 208.92 ns | 174.46 ns | 0.0610 | 361 B |
| RandomIndices | 5000 | 25 | 1,480.5 ns | 11.56 ns | 10.81 ns | 0.2747 | 1298 B |
| FisherYates | 5000 | 25 | 2,713.9 ns | 27.31 ns | 24.21 ns | 0.3777 | 1795 B |
| IterateSelect | 5000 | 50 | 54,418.2 ns | 329.62 ns | 308.32 ns | 0.1221 | 629 B |
| RandomIndices | 5000 | 50 | 2,886.4 ns | 36.53 ns | 34.17 ns | 0.6027 | 2848 B |
| FisherYates | 5000 | 50 | 5,347.2 ns | 59.45 ns | 55.61 ns | 0.8163 | 3870 B |
StdDev of IterateSelect: 671.88 (% of average: 0.67)
StdDev of RandomIndices: 296.07 (% of average: 0.30)
StdDev of FisherYates: 280.47 (% of average: 0.28)
It is a lot harder than one would think. See the great article "Shuffling" by Jeff.
I did write a very short article on that subject including C# code:
Return random subset of N elements of a given array
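The linked article has the details; as a rough illustration of the idea (my sketch, not the article's exact code), a partial Fisher-Yates shuffle only needs to touch the first N positions:
// Partial Fisher-Yates: after n swaps, the first n slots hold a uniform random subset.
static T[] RandomSubset<T>(T[] source, int n, Random rnd)
{
    var copy = (T[])source.Clone();
    for (int i = 0; i < n; i++)
    {
        int j = rnd.Next(i, copy.Length);          // pick from the not-yet-fixed tail
        (copy[i], copy[j]) = (copy[j], copy[i]);   // swap it into position i
    }
    var result = new T[n];
    Array.Copy(copy, result, n);
    return result;
}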
Goal: Select N number of items from collection source without duplication.
I created an extension for any generic collection. Here's how I did it:
public static class CollectionExtension
{
public static IList<TSource> RandomizeCollection<TSource>(this IList<TSource> source, int maxItems)
{
int randomCount = source.Count > maxItems ? maxItems : source.Count;
int?[] randomizedIndices = new int?[randomCount];
Random random = new Random();
for (int i = 0; i < randomizedIndices.Length; i++)
{
int randomResult = -1;
while (randomizedIndices.Contains((randomResult = random.Next(0, source.Count))))
{
//0 -> since all lists start from index 0; source.Count -> upper bound (exclusive) for the random index
//continue looping while the generated random number is already in the list of randomizedIndices
}
randomizedIndices[i] = randomResult;
}
IList<TSource> result = new List<TSource>();
foreach (int index in randomizedIndices)
result.Add(source.ElementAt(index));
return result;
}
}
Short and simple. Hope this helps someone!
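Usage might look like this (the names are just placeholder data):
var names = new List<string> { "Alice", "Bob", "Carol", "Dave", "Eve" };
IList<string> threeRandomNames = names.RandomizeCollection(3);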
if (list.Count > maxListCount)
{
var rndList = new List<YourEntity>();
var r = new Random();
while (rndList.Count < maxListCount)
{
var addingElement = list[r.Next(list.Count)];
//element uniqueness checking - keep whichever check fits your case
if (rndList.Contains(addingElement))
//if (rndList.Any(p => p.Id == addingElement.Id))
    continue;
rndList.Add(addingElement);
}
return rndList;
}
public static IEnumerable<TItem> RandomSample<TItem>(this IReadOnlyList<TItem> items, int count)
{
if (count < 1 || count > items.Count)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
List<int> indexes = Enumerable.Range(0, items.Count).ToList();
int yieldedCount = 0;
while (yieldedCount < count)
{
int i = RandomNumberGenerator.GetInt32(indexes.Count);
int randomIndex = indexes[i];
yield return items[randomIndex];
// indexes.RemoveAt(i); // Avoid removing items from the middle of the list
indexes[i] = indexes[indexes.Count - 1]; // Replace yielded index with the last one
indexes.RemoveAt(indexes.Count - 1);
yieldedCount++;
}
}
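A possible call site (RandomNumberGenerator.GetInt32 lives in System.Security.Cryptography and requires .NET Core 3.0 or later; the data here is made up):
IReadOnlyList<string> colors = new[] { "red", "green", "blue", "yellow", "purple" };
List<string> sample = colors.RandomSample(3).ToList();   // 3 distinct colors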
public static IEnumerable<T> GetRandom<T>(IList<T> list, int count, Random random)
{
// Probably you should throw exception if count > list.Count
count = Math.Min(list.Count, count);
var selectedIndices = new SortedSet<int>();
// Random upper bound (exclusive)
int randomMax = list.Count;
while (selectedIndices.Count < count)
{
int randomIndex = random.Next(0, randomMax);
// skip over already selected indices
foreach (var selectedIndex in selectedIndices)
if (selectedIndex <= randomIndex)
++randomIndex;
else
break;
yield return list[randomIndex];
selectedIndices.Add(randomIndex);
--randomMax;
}
}
Memory: ~count
Complexity: O(count²)
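To make the skip-over step concrete: if indices 2 and 5 are already selected and the draw is 3, it gets bumped past 2 to 4, so each draw is effectively uniform over the not-yet-selected positions. A usage sketch:
// Usage sketch: 4 distinct picks from 10 items, sharing one Random instance.
var data = Enumerable.Range(1, 10).ToList();
var rng = new Random();
List<int> picks = GetRandom(data, 4, rng).ToList();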
I recently did this on my project using an idea similar to Tyler's point 1.
I was loading a bunch of questions and selecting five at random. Sorting was achieved by implementing IComparable<QuestionSorter> on a small wrapper class.
All questions were loaded into a QuestionSorter list, which was then sorted using the List's Sort method, and the first k elements were selected.
private class QuestionSorter : IComparable<QuestionSorter>
{
public double SortingKey
{
get;
set;
}
public Question QuestionObject
{
get;
set;
}
public QuestionSorter(Question q)
{
this.SortingKey = RandomNumberGenerator.RandomDouble; // presumably a custom helper that returns a random double (not the BCL RandomNumberGenerator)
this.QuestionObject = q;
}
public int CompareTo(QuestionSorter other)
{
if (this.SortingKey < other.SortingKey)
{
return -1;
}
else if (this.SortingKey > other.SortingKey)
{
return 1;
}
else
{
return 0;
}
}
}
Usage:
List<QuestionSorter> unsortedQuestions = new List<QuestionSorter>();
// add the questions here
unsortedQuestions.Sort(); // QuestionSorter implements IComparable<QuestionSorter>, so the default comparer is used
// select the first k elements
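For completeness, the "select the first k elements" step might look like this (the variable k and the projection back to Question are my addition):
int k = 5;
List<Question> randomQuestions = unsortedQuestions
    .Take(k)
    .Select(sorter => sorter.QuestionObject)
    .ToList();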
Why not something like this:
Dim ar As New ArrayList
Dim numToGet As Integer = 5
'hard code just to test
ar.Add("12")
ar.Add("11")
ar.Add("10")
ar.Add("15")
ar.Add("16")
ar.Add("17")
Dim randomListOfProductIds As New ArrayList
Dim toAdd As String = ""
For i = 0 To numToGet - 1
toAdd = ar(CInt((ar.Count - 1) * Rnd()))
randomListOfProductIds.Add(toAdd)
'remove from id list
ar.Remove(toAdd)
Next
'sorry i'm lazy and have to write vb at work :( and didn't feel like converting to c#
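For readers who prefer C#, here is a rough equivalent of the VB sketch above (the hard-coded IDs are just the test data from the snippet):
// Pick numToGet items, removing each pick so it cannot be chosen again.
var ids = new List<string> { "12", "11", "10", "15", "16", "17" };
int numToGet = 5;
var rnd = new Random();
var randomListOfProductIds = new List<string>();
for (int i = 0; i < numToGet; i++)
{
    int index = rnd.Next(ids.Count);
    randomListOfProductIds.Add(ids[index]);
    ids.RemoveAt(index);   // remove from the id list
}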
Here's my approach (full text here http://krkadev.blogspot.com/2010/08/random-numbers-without-repetition.html ).
It should run in O(K) instead of O(N), where K is the number of wanted elements and N is the size of the list to choose from:
public <T> List<T> take(List<T> source, int k) {
int n = source.size();
if (k > n) {
throw new IllegalStateException(
"Can not take " + k +
" elements from a list with " + n +
" elements");
}
List<T> result = new ArrayList<T>(k);
Map<Integer,Integer> used = new HashMap<Integer,Integer>();
int metric = 0;
for (int i = 0; i < k; i++) {
int off = random.nextInt(n - i);
while (true) {
metric++;
Integer redirect = used.put(off, n - i - 1);
if (redirect == null) {
break;
}
off = redirect;
}
result.add(source.get(off));
}
assert metric <= 2*k;
return result;
}
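The same idea ported to C# (a sketch; a Dictionary plays the role of the Java HashMap of index redirects, so the source list is never copied or mutated):
static List<T> TakeK<T>(IReadOnlyList<T> source, int k, Random random)
{
    int n = source.Count;
    if (k > n)
        throw new ArgumentException($"Cannot take {k} elements from a list with {n} elements");
    var result = new List<T>(k);
    var used = new Dictionary<int, int>();   // slot -> index it now stands in for
    for (int i = 0; i < k; i++)
    {
        int off = random.Next(n - i);
        // Follow the redirect chain, re-pointing each visited slot
        // at the last still-active slot (n - i - 1), just as the Java version does.
        while (used.TryGetValue(off, out int redirect))
        {
            used[off] = n - i - 1;
            off = redirect;
        }
        used[off] = n - i - 1;
        result.Add(source[off]);
    }
    return result;
}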
This isn't as elegant or efficient as the accepted solution, but it's quick to write up. First, permute the array randomly, then select the first K elements. In python,
import numpy as np
N = 20
K = 5
idx = np.arange(N)
np.random.shuffle(idx)
print(idx[:K])
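A C# counterpart of the same shuffle-then-take idea (a sketch; it copies the input so the original list is left untouched):
static List<T> ShuffleThenTake<T>(IReadOnlyList<T> list, int k, Random rnd)
{
    var copy = new List<T>(list);
    // Full Fisher-Yates shuffle, then keep the first k elements.
    for (int i = copy.Count - 1; i > 0; i--)
    {
        int j = rnd.Next(i + 1);
        (copy[i], copy[j]) = (copy[j], copy[i]);
    }
    return copy.GetRange(0, k);   // assumes k <= list.Count
}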
I would use an extension method.
public static IEnumerable<T> TakeRandom<T>(this IEnumerable<T> elements, int countToTake)
{
var random = new Random();
var internalList = elements.ToList();
var selected = new List<T>();
for (var i = 0; i < countToTake; ++i)
{
var next = random.Next(0, internalList.Count - selected.Count);
selected.Add(internalList[next]);
internalList[next] = internalList[internalList.Count - selected.Count];
}
return selected;
}
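Example call (note the method creates its own Random, so calling it repeatedly in a tight loop can yield correlated results on frameworks where Random is seeded from the clock):
var deck = Enumerable.Range(1, 52);
List<int> hand = deck.TakeRandom(5).ToList();   // 5 distinct values from the 52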
Using LINQ with large lists (when costly to touch each element) AND if you can live with the possibility of duplicates:
new int[5].Select(o => (int)(rnd.NextDouble() * maxIndex)).Select(i => YourIEnum.ElementAt(i))
For my use case I had a list of 100,000 elements, and because they were being pulled from a DB, this roughly halved (or better) the time compared to running a Random over the whole list.
With a large list, the odds of a duplicate are greatly reduced.
