Can I use LINQ to generate random numbers reliably in C#?

I am familiar with Random and RNGCryptoServiceProvider and how they can be used to generate random numbers in C#.
But I am also aware of the following method:
int start = 10;
int end = 50;
Enumerable.Range(start, end - start + 1).OrderBy(o => Guid.NewGuid()).Take(5); // Range takes (start, count), not (start, end)
Given the streaming power of LINQ and the thread-safety of this method (as opposed to sharing a single Random instance), I cannot find a reason not to use it.
What am I missing other than performance?

The Guid type is intended to create unique values, not random ones.
There is no guarantee that you will get a uniform distribution, and it will certainly not qualify as safe for encryption-related tasks.
Just use the class that is most appropriate for the task: Random for fast pseudo-random data and RNGCryptoServiceProvider for security.
The LINQ approach is a hack. When you have a threading problem, make the code thread-safe in an established manner.
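One established way to make Random usable from several threads is to give each thread its own instance. The following is only a sketch: the ThreadSafeRandom class and its member names are made up for illustration and do not come from the answers here. On .NET 6 and later, the built-in Random.Shared covers the same need.

```csharp
using System;
using System.Threading;

public static class ThreadSafeRandom
{
    // Hypothetical helper. Seeds are drawn from one locked generator so that
    // threads starting at the same instant do not receive identical seeds.
    private static readonly Random SeedSource = new Random();

    private static readonly ThreadLocal<Random> Local = new ThreadLocal<Random>(() =>
    {
        lock (SeedSource)
        {
            return new Random(SeedSource.Next());
        }
    });

    // Returns a value in [minInclusive, maxExclusive), like Random.Next.
    public static int Next(int minInclusive, int maxExclusive)
    {
        return Local.Value.Next(minInclusive, maxExclusive);
    }
}
```

Each thread then calls ThreadSafeRandom.Next(10, 51) without any locking on the hot path.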

If you want a LINQ-style interface, I would suggest something like this:
public IEnumerable<int> GetRandomSequence(int start, int end)
{
    var generator = new Random();
    // Endless sequence; Random.Next(start, end) excludes the upper bound.
    while (true)
        yield return generator.Next(start, end);
}
And then
var randomSequence = GetRandomSequence(10, 50).Take(5);
But it would be safer to use a counter instead of "while (true)" to avoid endless loops in cases like this:
var sequence = GetRandomSequence(10, 50);
var someOtherSequence = sequence.Skip(5); // here we forget to call Take, Zip, TakeWhile etc.
// some code
var someVar = someOtherSequence.ToList(); // this call never completes
var someOtherVar = someOtherSequence.Join( ... ); // and neither does this one
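A counter-bounded variant could look roughly like this. The RandomSequences class and the count parameter are additions made for illustration; the caller decides up front how many values the sequence may ever produce.

```csharp
using System;
using System.Collections.Generic;

public static class RandomSequences
{
    // Yields exactly `count` values in [start, end) and then stops, so a
    // forgotten Take() can no longer turn ToList() into an endless call.
    public static IEnumerable<int> GetRandomSequence(int start, int end, int count)
    {
        var generator = new Random();
        for (int i = 0; i < count; i++)
            yield return generator.Next(start, end);
    }
}
```

With this version, GetRandomSequence(10, 50, 5).ToList() always terminates.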

The problem (or possibly benefit?) of your technique is that it will not repeat any number in the range.
In a true random-set, you could reasonably see a number repeated on occasion.
Example: {12, 46, 12, 27, 12} should be valid output.
With your method, you'll never see 12 repeated. If you want the numbers 10-50 returned in a random order, but each one only once, then what you really want is a shuffling algorithm.
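A common shuffling algorithm is Fisher-Yates. The sketch below is not from the answer itself; the Shuffler class and PickDistinct helper are illustrative names. It shuffles the range in place and takes the first few values.

```csharp
using System;
using System.Linq;

public static class Shuffler
{
    // In-place Fisher-Yates shuffle: every permutation is equally likely,
    // assuming rng.Next is uniform.
    public static void Shuffle<T>(T[] items, Random rng)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }

    // Picks `count` distinct values from the inclusive range [start, end].
    public static int[] PickDistinct(int start, int end, int count, Random rng)
    {
        int[] values = Enumerable.Range(start, end - start + 1).ToArray();
        Shuffle(values, rng);
        return values.Take(count).ToArray();
    }
}
```

Shuffler.PickDistinct(10, 50, 5, new Random()) gives five different numbers between 10 and 50, which matches what the OrderBy(Guid.NewGuid()) trick was trying to do.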

Related

Randomize list with seed

I'm looking for a way to randomize a List using a seed. What I want to achieve by this is controlled randomization. In other words to make sure that a list is always randomized in the same order if I use the same seed to do it.
I'm currently using this code:
string arbitrarySeed = "someValue";
System.Random random = new System.Random(BitConverter.ToInt32(Encoding.ASCII.GetBytes(arbitrarySeed), 0));
List<object> randomizedOptions = items?.OrderBy(o => random.Next()).ToList();
Which seems a bit overcomplicated so I'm looking for a more elegant way to handle this.
I know that randomizing this way is not actually very random, but I don't need true randomness. Something that comes close will do fine.
Depending on how "consistent" you need the generated series to be, you could use string.GetHashCode:
string arbitrarySeed = "someValue";
System.Random random = new System.Random(arbitrarySeed.GetHashCode());
List<object> randomizedOptions = items?.OrderBy(o => random.Next()).ToList();
GetHashCode is not guaranteed to be consistent across .NET versions or implementations (e.g. Linux, Windows, macOS), and on .NET Core and later string hash codes are randomized per process by default, so it only helps if you need consistency within a single run.
Also note that BitConverter.ToInt32 will only take 4 bytes of the array you pass in, so "someValue" and "someValue2" would give you the same seed value.
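If the seed has to be stable across runs and has to take the whole string into account, one option is to hash the string yourself. The sketch below uses the 32-bit FNV-1a hash; the SeededShuffle class and its method names are invented for this example. Note that Random's own algorithm is documented as not guaranteed to stay the same across major framework versions, so the resulting order is only repeatable on the same runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SeededShuffle
{
    // 32-bit FNV-1a over the UTF-8 bytes of the text. Unlike
    // string.GetHashCode, this does not vary between processes.
    public static int StableSeed(string text)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(text))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return (int)hash;
        }
    }

    // Returns a shuffled copy; the same seed text gives the same order.
    public static List<T> Shuffle<T>(IEnumerable<T> items, string seedText)
    {
        var rng = new Random(StableSeed(seedText));
        var result = new List<T>(items);
        for (int i = result.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            T tmp = result[i];
            result[i] = result[j];
            result[j] = tmp;
        }
        return result;
    }
}
```

Usage would be something like SeededShuffle.Shuffle(items, "someValue").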

How do I insert an int into a sorted array quickly?

I'd like to insert an int into a sorted array. This operation is going to be performed very often, so it needs to be as fast as possible.
It is possible and even preferred to use a List or any other class instead of an array
All values are in the 1 to 34 range
The array typically contains exactly 14 values
I was thinking of many different approaches, including binary search and simple insert-on-copy, but found it hard to decide. Also, I felt like I missed an idea. Do you have experiences on this topic or any new ideas to consider?
I would use an int array of length 35 (because you said the range is 1-34) to record how many times each number occurs.
int[] status = new int[35]; // 35 zeros: the collection is currently empty (index 0 is unused)
status[10] = 1; // now the collection holds a single number: 10
status[11]++; // a new number, 11, is added to the collection
So if you want to add a number i to the list:
status[i]++; // O(1) to add a number
To remove an i from the list:
status[i]--; // O(1) to remove a number
Want to list all the numbers in the collection?
for (int i = 0; i < status.Length; i++)
{
    // status[i] is how many times the value i occurs
    for (int j = 0; j < status[i]; j++)
        Console.WriteLine(i);
}
// or, more simply, using LINQ
var result = status.SelectMany((count, value) => Enumerable.Repeat(value, count));
The following example may help you understand my code better:
the real number array: 1 12 12 15 9 34 // i don't care if it's sorted
the status array: status[1]=1,status[12]=2,status[15]=1,status[9]=1,status[34]=1
all others are 0
At 14 values this is a pretty small array. I don't think switching to a smarter data structure such as a list will win you much, especially if you need fast random access. Even binary search may actually be slower than linear search at this scale. Are you sure that, say, insert-on-copy does not satisfy your performance requirements?
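For reference, a plain linear insert into a fixed buffer might look like the sketch below. The SortedInsert class and the convention of tracking the element count separately are assumptions made for this example, not part of the question.

```csharp
using System;

public static class SortedInsert
{
    // Inserts `value` into the sorted prefix buffer[0..count) by shifting
    // larger elements one slot to the right. Returns the new count.
    public static int Insert(int[] buffer, int count, int value)
    {
        if (count >= buffer.Length)
            throw new InvalidOperationException("Buffer is full.");

        int i = count;
        while (i > 0 && buffer[i - 1] > value)
        {
            buffer[i] = buffer[i - 1];
            i--;
        }
        buffer[i] = value;
        return count + 1;
    }
}
```

With about 14 elements the whole buffer fits in a cache line or two, which is why this simple loop is hard to beat; measuring it against the alternatives is still the only way to know.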
"This operation is going to be performed very often, so it needs to be as fast as possible."
The things that you notice happen "very often" are frequently not the bottlenecks in the program - it's often surprising what the actual bottlenecks are. You should code something simple and measure the actual performance of your program before performing any optimizations.
"I was thinking of many different approaches, including binary search and simple insert-on-copy, but found it hard to decide."
Assuming that this is the bottleneck, the big-O performance of the different methods is not going to be relevant here because of the small size of your array. It is easier to just try a few different approaches, measure the results, see which performs best and choose that method. If you have followed the advice from the first paragraph, you already have a profiler set up that you can use for this step too.
For inserting into the middle, a LinkedList<int> would be the fastest option - anything else involves copying data. At 14 elements, don't stress over binary search etc - just walk forwards to the item you want:
using System;
using System.Collections.Generic;
static class Program
{
    static void Main()
    {
        LinkedList<int> data = new LinkedList<int>();
        Random rand = new Random(12345);
        for (int i = 0; i < 20; i++)
        {
            data.InsertSortedValue(rand.Next(300));
        }
        foreach (int i in data) Console.WriteLine(i);
    }
}
static class LinkedListExtensions {
    public static void InsertSortedValue(this LinkedList<int> list, int value)
    {
        LinkedListNode<int> node = list.First, next;
        if (node == null || node.Value > value)
        {
            list.AddFirst(value);
        }
        else
        {
            while ((next = node.Next) != null && next.Value < value)
                node = next;
            list.AddAfter(node, value);
        }
    }
}
Doing the brute-force approach is the best decision here because 14 isn't a number :). However, this is not a scalable decision, since should 14 become 14000 one day that will cause problems
What is the most common operation with your array?
Insert? Read?
Heap data structure will give you O(log(14)) for both of them. SortedDictionary may hit your performance.
Using a simple array will give you O(1) for reading and O(14) for insert.
By the way, have you tried System.Collections.Generic.SortedDictionary or System.Collections.Generic.SortedList?
If you're on .NET 4 you should take a look at SortedSet<T>. Otherwise take a look at SortedDictionary<TKey, TValue>, where you make TValue an object and just put null into it, because you're only interested in the keys.
If there are no repeated values in the array and the set of possible values won't change, maybe a fixed-size array where the value is equal to the index is a good choice.
Both insert and read are O(1)
You have a range of possible values from 1-34, which is rather narrow. So the fastest way would likely be an array with 34 slots. To insert a number n just do array[n-1]++, and to remove it do array[n-1]-- (if array[n-1] > 0).
To check if a value exists in your collection you do array[n-1]>0.
edit: Damn...Danny was faster. :)
Write a method that takes an array of integers and sorts them in place using Bubble Sort. The method is not allowed to create any additional arrays. Bubble Sort is a simple sorting algorithm that works by looping through the array to be sorted, comparing each pair of adjacent elements and swapping them if they are in the wrong order.

Designing a custom Random class

I know C# has the Random class and probably a few classes in LINQ to do this, but if I was to write my own code to randomly select an item from a collection without using any built in .NET objects, how would this be done?
I can't seem to nail the logic required for this - how would I tell the system when to stop an iteration and select the current value - at random?
EDIT: This is a hypothetical question. This is not related to a production coding matter. I am just curious.
Selecting a random element from a collection can be done as follows.
Random r = new Random();
int randomIndex = r.Next(myCollection.Count); // the upper bound is exclusive, so every index from 0 to Count - 1 can be chosen
var randomCollectionItem = myCollection[randomIndex];
Unless you have a VERY good reason, writing your own random generator is not necessary.
My advice to you is DON'T DO IT. Whatever reason you think you may have for not wanting to use the built-in library, I am pretty sure you misunderstood something. Please go back to the drawing board.
All of the advice above is technically accurate, but is kind of like giving a chemistry textbook to someone who wants to refine his own oil to use in his car.
There are many pseudo-random number generators. They aren't truly random, but they come at different quality, distinguished by their statistical and sequential properties and what purpose they are applicable for.
It very much depends on "how random you need it". If it just needs to "look random to a human", simple generators look like that:
rnd = seed; // some starting value
rnd = (a * rnd + b) % c; // next value
...
For well-chosen values of a, b, and c these generators are OK for simple statistical tests. A detailed discussion and common values for these can be found here.
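As a rough sketch of such a generator in C#: the class name is made up, and the constants a = 1664525, b = 1013904223, c = 2^32 are one widely published parameter set (the one given in Numerical Recipes), chosen here as an assumption rather than taken from the answer. This is fine for "looks random to a human" and unsuitable for anything security-related.

```csharp
using System;

public sealed class LinearCongruentialGenerator
{
    private const uint A = 1664525;
    private const uint B = 1013904223;
    private uint state;

    public LinearCongruentialGenerator(uint seed)
    {
        state = seed;
    }

    // rnd = (a * rnd + b) % c, where c = 2^32 comes for free from
    // unsigned 32-bit wraparound.
    public uint NextUInt32()
    {
        unchecked
        {
            state = A * state + B;
        }
        return state;
    }

    // Maps the next value to [0, maxExclusive) using the high bits,
    // which are the better-behaved ones in this kind of generator.
    public int Next(int maxExclusive)
    {
        if (maxExclusive <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxExclusive));
        return (int)(((ulong)NextUInt32() * (ulong)maxExclusive) >> 32);
    }
}
```

Picking a random item is then items[generator.Next(items.Length)].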
One interesting approach is to collect as much "external" data as possible, like time between keypresses, mouse movements, duration of disk reads, etc., and use an algorithm that accumulates randomness while discarding dependency. That is mathematically tricky though (IIRC not long ago a critical attack surfaced based on one of these not being as random as thought).
Only a very few special applications use a truly random external hardware source, anything between an open-input amplifier and radioactive decay.
You need to use a seed, something semi random provided by the computer itself.
Maybe use very fine resolution time and use the last couple microseconds when the method is called. That should be random enough to generate anything from 00 to 99, you can then go from there.
It sounds like your problem isn't in calculating a random number, but in how to use that random number to select an item from a list. Assuming you can create a random number somehow, all you need to do is use it as the argument to the list's indexer.
int index = customRandomGenerator.Next() % items.Length; // keep the index inside the collection (assumes Next() is non-negative)
var selection = items[index];
Assuming that your presupposition about having to iterate through the list is correct (or the collection doesn't have an indexer) then you could do:
int index = customRandomGenerator.Next() % items.Length; // assumes Next() is non-negative
Item selection = null;
for (int i = 0; i < items.Length; i++)
{
if (i == index)
{
selection = items[i];
break;
}
}
The only true "cryptographically strong" random number generator in the .NET Framework is System.Security.Cryptography.RandomNumberGenerator; run it through Reflector to see what it does. Looking at your problem, you would need to know the Count of the collection, otherwise you may never retrieve an item: you need to specify a start and end value to draw random numbers from. For that the Random class would work best; pop it through Reflector too.
Well, I never thought about implementing that myself, as it seems like reinventing the wheel, but you may have a look at this Wikipedia article; I hope it helps you do what you want:
Random Number Generator

break up array into little arrays

I am sending out email to a list of people. I have the list of recipients in an array, but the list can grow to 500 people. My mail server limits the number of recipients per message to 50, so if the list has more than 50 entries I need to break it up into different mails.
What is the best way to take one array and break it up into arrays of 50?
For example: if the array is 120 long, I would expect 3 arrays returned, one with 50, another with 50 and a third with 20.
You could use the Batch operation from MoreLINQ:
Person[] array = ...;
var arrays = array.Batch(50).Select(x => x.ToArray());
foreach (Person[] shorterArray in arrays)
{
...
}
(If you're happy with IEnumerable<Person> instead of arrays, you don't need the Select call of course.)
Maybe ArraySegment<T> works for you? You'd have to split it up manually though, but this is not hard in a loop.
int recipient = 0;
while (recipient < recipients.Length) { // recipients is a string[]; arrays expose Length, not Count
    ArraySegment<string> recipientSegment = new ArraySegment<string>(recipients, recipient, Math.Min(50, recipients.Length - recipient));
    // build your message here, using the recipientSegment for the names
    recipient += 50;
}
I would simply iterate over the complete array, building up the recipients string, then sending out an email when the limit is reached, then resetting the string and continuing on with the iteration until the next limit event or until the end of the array is reached.
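That iterate-and-flush idea could be sketched as follows. The Batching class is a made-up name, and collecting the batches into a list stands in for "sending out an email"; in real code the flush step would build and send the message instead.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Walks the source once, accumulating items until `limit` is reached,
    // then flushes the batch and starts a new one.
    public static List<List<T>> Split<T>(IEnumerable<T> source, int limit)
    {
        if (limit <= 0)
            throw new ArgumentOutOfRangeException(nameof(limit));

        var batches = new List<List<T>>();
        var current = new List<T>(limit);
        foreach (T item in source)
        {
            current.Add(item);
            if (current.Count == limit)
            {
                batches.Add(current); // limit reached: "send" this batch
                current = new List<T>(limit);
            }
        }
        if (current.Count > 0)
            batches.Add(current); // remaining items at the end of the list
        return batches;
    }
}
```

For 120 recipients and a limit of 50 this produces batches of 50, 50 and 20.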
If you can use LINQ then you may find this useful: Linq: How to group by maximum number of items
Shouldn't LINQ be the right stuff for this?
A common method for "paging" results from a set is to combine the Skip and Take methods provided by LINQ. This solution is great because it can be further combined with other LINQ methods to implement filtering, ordering, etc. as needed.
I'm not sure what the performance considerations are for your application, so keep in mind that this may not perform very well for sets where the number of pages is relatively large (i.e., batch size is significantly smaller than the total size of the set), but it's at least fairly straightforward for anyone familiar with this style of coding.
Here's an example of what this implementation might look like:
List<EmailAddress> list = new List<EmailAddress>();
const int BATCH_SIZE = 50;
for (int i = 0; i < list.Count; i += BATCH_SIZE)
{
IEnumerable<EmailAddress> currentBatch =
list.Skip(i).Take(BATCH_SIZE);
// do stuff...
}

random string generation - two generated one after another give same results

I have a simple piece of code:
public string GenerateRandomString()
{
string randomString = string.Empty;
Random r = new Random();
for (int i = 0; i < length; i++)
randomString += chars[r.Next(chars.Length)];
return randomString;
}
If I call this function to generate two strings, one after another, they are identical... but if I step through the two lines in the debugger, the results are different.
Does anyone know why this is happening?
This happens because the calls occur very close to each other (within the same millisecond), so the Random constructor seeds both Random objects with the same value (by default it derives the seed from the system clock).
So, there are two solutions, actually.
1. Provide your own seed value, that would be unique each time you construct the Random object.
2. Always use the same Random object - only construct it once.
Personally, I would use the second approach. It can be done by making the Random object static, or making it a member of the class.
The above answers are correct. I would suggest the following changes to your code though:
1) I would suggest using a StringBuilder instead of appending to the string all the time. Strings are immutable, so this is creating a new string each time you add to it. If you have never used StringBuilder, look it up. It is very useful for this sort of work.
2) You can make your method easier to reuse if you pass the length into the method itself. You could probably pass the chars array in as well, but I've left that out.
3) Use the same random object each time, as suggested above.
private static readonly Random _RandomObj = new Random(); // created once and shared by every call
public string GenerateRandomString(int length)
{
    StringBuilder randomString = new StringBuilder(length);
    for (int i = 0; i < length; i++)
        randomString.Append(chars[_RandomObj.Next(chars.Length)]);
    return randomString.ToString();
}
It's because you're creating two random objects at the same time. This is giving it the same seed, so you're going to get the same numbers.
When you debug it, there's time between the creation of the random objects which allow them to get different seeds.
The default constructor for Random (the one you're using) seeds the generator with a value based on the current time. If the time in milliseconds doesn't change between the first and second call of the function, it would use the same random seed.
My suggestion is to use a static Random object and only initialize it once.
Since the Random generator is seeded from the system clock, you are probably getting the same results within the same time period. There are several ways to correct this. If you are using loops, place the Random rnd = new Random(); outside of the loop.
Place the Random rnd = new Random(); line where you declare your variables and use the same variable throughout your program (rnd for this example).
This will work in most cases.
