Emulate Python's random.choice in .NET - c#

Python's random module has a function, random.choice:
random.choice(seq)
Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.
How can I emulate this in .NET?
public T RandomChoice<T> (IEnumerable<T> source)
Edit: I heard this as an interview question some years ago, but today the problem occurred naturally in my work. The interview question was stated with these constraints:
'the sequence is too long to save to memory'
'you can only loop over the sequence once'
'the sequence doesn't have a length/count method' (à la .NET IEnumerable)

To make a method that iterates the source only once, and doesn't have to allocate memory to store it temporarily, you count how many items you have iterated, and determine the probability that the current item should be the result:
public T RandomChoice<T> (IEnumerable<T> source) {
Random rnd = new Random();
T result = default(T);
int cnt = 0;
foreach (T item in source) {
cnt++;
if (rnd.Next(cnt) == 0) {
result = item;
}
}
return result;
}
When you are at the first item, the probability that it should be used is 1/1 (as it is the only item you have seen so far). When you are at the second item, the probability is 1/2 that it should replace the first item, and so on.
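The 1/1, 1/2, ... rule gives every item the same final chance. As a short check, assuming n items in total: item k is picked at step k with probability 1/k and then survives each later step j with probability 1 - 1/j, so the product telescopes:

```latex
P(\text{item } k \text{ is returned})
  = \frac{1}{k}\prod_{j=k+1}^{n}\left(1 - \frac{1}{j}\right)
  = \frac{1}{k}\prod_{j=k+1}^{n}\frac{j-1}{j}
  = \frac{1}{k}\cdot\frac{k}{n}
  = \frac{1}{n}
```

For k = n the product is empty, so the last item also ends up with probability 1/n.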
This naturally uses a bit more CPU, as it creates one random number per item rather than a single random number to select an item, as dasblinkenlight pointed out. As Dan Tao suggested, you can check whether the source implements IList<T> and, if it does, use its Count and indexer to pick an item directly:
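public T RandomChoice<T> (IEnumerable<T> source) {
Random rnd = new Random();
IList<T> list = source as IList<T>;
if (list != null) {
// Indexed collection: one random number picks the item directly
return list[rnd.Next(list.Count)];
}
// Otherwise use the single-pass implementation above
T result = default(T);
int cnt = 0;
foreach (T item in source) {
cnt++;
if (rnd.Next(cnt) == 0) result = item;
}
return result;
}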
Note: You should consider passing the Random instance into the method. Otherwise, if you call the method twice in quick succession, you can get the same random seed, because on .NET Framework the parameterless Random constructor seeds itself from the current time.
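Putting those suggestions together, here is a minimal sketch that takes the Random instance as a parameter, uses the IList<T> fast path when it is available, and otherwise falls back to the single pass. The class name RandomChoiceSketch is hypothetical, and throwing InvalidOperationException on an empty sequence is an assumed stand-in for Python's IndexError (it is what Enumerable.First throws in the same situation).

```csharp
using System;
using System.Collections.Generic;

public static class RandomChoiceSketch
{
    // Returns a uniformly random element of source, using the caller's Random.
    public static T RandomChoice<T>(this IEnumerable<T> source, Random rnd)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (rnd == null) throw new ArgumentNullException(nameof(rnd));

        // Fast path: an indexed collection needs only one random number.
        if (source is IList<T> list)
        {
            if (list.Count == 0)
                throw new InvalidOperationException("Sequence contains no elements.");
            return list[rnd.Next(list.Count)];
        }

        // Single pass: the k-th item replaces the current pick with probability 1/k.
        T result = default(T);
        int count = 0;
        foreach (T item in source)
        {
            count++;
            if (rnd.Next(count) == 0)
                result = item;
        }

        // Like Python's random.choice, refuse to pick from an empty sequence.
        if (count == 0)
            throw new InvalidOperationException("Sequence contains no elements.");
        return result;
    }
}
```

System.Random is not thread-safe, so a reasonable pattern is one shared instance per thread, passed to every call.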
The result of a test run, picking one number from an array containing 0 to 9 one million times, shows that the distribution of the chosen numbers is not skewed:
0: 100278
1: 99519
2: 99994
3: 100327
4: 99571
5: 99731
6: 100031
7: 100429
8: 99482
9: 100638

To avoid iterating through the sequence two times (once for the count and once for the element) it is probably a good idea to save your sequence in an array before getting its random element:
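using System;
using System.Collections.Generic;
using System.Linq;
public static class RandomExt {
private static Random rnd = new Random();
public static T RandomChoice<T> (this IEnumerable<T> source) {
// Materialize once so the sequence is not iterated twice
var arr = source.ToArray();
return arr[rnd.Next(arr.Length)];
}
// ICollection<T> has no indexer, so the indexed overload takes IList<T>
public static T RandomChoice<T> (this IList<T> source) {
return source[rnd.Next(source.Count)];
}
}
EDIT: Implemented a very good idea by Chris Sinclair.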

private static Random rng = new Random();
...
// Enumerates the source twice: once for Count(), once for Skip()/First()
return source.Skip(rng.Next(source.Count())).First();

public T RandomChoice<T> (IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var list = source.ToList();
if (list.Count < 1)
{
throw new InvalidOperationException("Sequence contains no elements");
}
var rnd = new Random();
return list[rnd.Next(0, list.Count)];
}
Or, as an extension method:
public static T RandomChoice<T> (this IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var list = source.ToList();
if (list.Count < 1)
{
throw new InvalidOperationException("Sequence contains no elements");
}
var rnd = new Random();
return list[rnd.Next(0, list.Count)];
}

I'd go with dasblinkenlight's answer, with one small change: leverage the fact that source might already be an indexed collection, in which case you really don't need to populate a new array (or list):
public static class RandomExt
{
public static T Choice<T>(this Random random, IEnumerable<T> sequence)
{
var list = sequence as IList<T> ?? sequence.ToList();
return list[random.Next(list.Count)];
}
}
Note that I also modified the interface from the abovementioned answer to make it more consistent with the Python version you referenced in your question:
var random = new Random();
var numbers = new int[] { 1, 2, 3 };
int randomNumber = random.Choice(numbers);
Edit: I like Guffa's answer even better, actually.

Well, get a list of all elements in the sequence, ask a random number generator for an index, and return the element at that index. Define what a sequence is: IEnumerable would be most obvious, but then you need to materialize it into a list to know the number of elements for the random number generator.
By the way, this is not emulating, it is implementing.
Is this some homework question from a beginner course?

Assuming one has an extension method IEnumerable.MinBy:
var r = new Random();
return source.MinBy(x => r.Next());
The method MinBy doesn't save the sequence to memory; like IEnumerable.Min, it makes a single iteration (see MoreLinq, or Enumerable.MinBy in .NET 6 and later).
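If MinBy is not available, the same "smallest random key wins" idea fits in a hand-written loop. This is a sketch, not MoreLinq's implementation; the class and method names are hypothetical, and it uses Random.NextDouble keys instead of Next(). Since every element's key is drawn independently, each element is equally likely to hold the minimum.

```csharp
using System;
using System.Collections.Generic;

public static class MinRandomKeySketch
{
    // Gives every element a random key and keeps the element with the smallest key.
    // One pass, constant memory; ties are rare and resolved in favour of the earlier element.
    public static T ChooseByMinRandomKey<T>(this IEnumerable<T> source, Random rnd)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (rnd == null) throw new ArgumentNullException(nameof(rnd));

        bool found = false;
        T best = default(T);
        double bestKey = 0.0;
        foreach (T item in source)
        {
            double key = rnd.NextDouble();   // in [0.0, 1.0)
            if (!found || key < bestKey)
            {
                found = true;
                best = item;
                bestKey = key;
            }
        }

        if (!found)
            throw new InvalidOperationException("Sequence contains no elements.");
        return best;
    }
}
```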

Related

Accessing yield return collection

Is there any way to access the IEnumerable<T> collection being built up by yield return in a loop from within the method building the IEnumerable itself?
Silly example:
Random random = new Random();
IEnumerable<int> UniqueRandomIntegers(int n, int max)
{
while ([RETURN_VALUE].Count() < n)
{
int value = random.Next(max);
if (![RETURN_VALUE].Contains(value))
yield return value;
}
}
There is no collection being built up. The sequence that is returned is evaluated lazily, and unless the caller explicitly copies the data to another collection, it will be gone as soon as it's been fetched.
If you want to ensure uniqueness, you'll need to do that yourself. For example:
IEnumerable<int> UniqueRandomIntegers(int n, int max)
{
HashSet<int> returned = new HashSet<int>();
for (int i = 0; i < n; i++)
{
int candidate;
do
{
candidate = random.Next(max);
} while (returned.Contains(candidate));
yield return candidate;
returned.Add(candidate);
}
}
Another alternative for unique random integers is to build a collection of max items and shuffle it, which can still be done just-in-time. This is more efficient in the case where max and n are similar (as you don't need to loop round until you're lucky enough to get a new item) but inefficient in the case where max is very large and n isn't.
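A sketch of that shuffle-based alternative, assuming n <= max: a Fisher-Yates shuffle performed lazily, with one swap per yielded value. The class and method names are hypothetical. It allocates an array of max ints up front (the memory cost mentioned above), but it never has to retry.

```csharp
using System;
using System.Collections.Generic;

public static class LazyShuffleSketch
{
    // Yields n distinct integers from [0, max) using a lazily evaluated Fisher-Yates shuffle.
    // Step i swaps a random not-yet-yielded value into slot i and yields it, so no retries occur.
    // Being an iterator, the argument checks run only when enumeration starts.
    public static IEnumerable<int> UniqueRandomIntegers(int n, int max, Random random)
    {
        if (random == null) throw new ArgumentNullException(nameof(random));
        if (n < 0 || n > max) throw new ArgumentOutOfRangeException(nameof(n));

        int[] pool = new int[max];            // O(max) memory, the trade-off noted above
        for (int i = 0; i < max; i++) pool[i] = i;

        for (int i = 0; i < n; i++)
        {
            int j = random.Next(i, max);      // j in [i, max): a position not yet yielded
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
            yield return pool[i];
        }
    }
}
```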
EDIT: As noted in comments, you can shorten this slightly by changing the body of the for loop to:
int candidate;
do
{
candidate = random.Next(max);
} while (!returned.Add(candidate));
yield return candidate;
That uses the fact that Add will return false if the item already exists in the set.

get next available integer using LINQ

Say I have a list of integers:
List<int> myInts = new List<int>() {1,2,3,5,8,13,21};
I would like to get the next available integer, ordered by increasing integer. Not the last or highest one, but in this case the next integer that is not in this list. In this case the number is 4.
Is there a LINQ statement that would give me this? As in:
var nextAvailable = myInts.SomeCoolLinqMethod();
Edit: Crap. I said the answer should be 2 but I meant 4. I apologize for that!
For example, imagine that you are responsible for handing out process IDs. You want to get the list of current process IDs and issue a new one, but the new one should not just be the highest value plus one. Rather, it should be the next one available from an ordered list of process IDs. You could also search for the next available one starting from the highest; it does not really matter.
I see a lot of answers that write a custom extension method, but it is possible to solve this problem with the standard LINQ extension methods and the static Enumerable class:
List<int> myInts = new List<int>() {1,2,3,5,8,13,21};
// This will set firstAvailable to 4.
int firstAvailable = Enumerable.Range(1, Int32.MaxValue).Except(myInts).First();
The answer provided by @Kevin has an undesirable performance profile. The logic accesses the source sequence numerous times: once for the .Count call, once for the .First call, and once for each .Contains call. If the IEnumerable<int> instance is a deferred sequence, such as the result of a .Select call, this will cause at least two evaluations of the sequence, plus one for each checked number. Even if you pass a list to the method, it will potentially go through the entire list for each checked number. Imagine running it on the sequence { 1, 1000000 } and you can see how it would not perform well.
LINQ strives to iterate source sequences no more than once. This is possible in general and can have a big impact on the performance of your code. Below is an extension method which iterates the sequence exactly once. It does so by looking at the difference between each successive pair, then adding 1 to the first lower number that is more than 1 away from the next number:
public static int? FirstMissing(this IEnumerable<int> numbers)
{
int? priorNumber = null;
foreach(var number in numbers.OrderBy(n => n))
{
var difference = number - priorNumber;
if(difference != null && difference > 1)
{
return priorNumber + 1;
}
priorNumber = number;
}
return priorNumber == null ? (int?) null : priorNumber + 1;
}
Since this extension method can be called on any arbitrary sequence of integers, we make sure to order them before we iterate. We then calculate the difference between the current number and the prior number. If this is the first number in the list, priorNumber will be null and thus difference will be null. If this is not the first number in the list, we check whether the difference from the prior number is greater than 1 (a difference of 0 is just a duplicate). If it is, we know there is a gap and we can add 1 to the prior number.
You can adjust the return statement to handle sequences with 0 or 1 items as you see fit; I chose to return null for empty sequences and n + 1 for the sequence { n }.
This will be fairly efficient:
static int Next(this IEnumerable<int> source)
{
int? last = null;
foreach (var next in source.OrderBy(_ => _))
{
// A gap exists only if next skips past last + 1; a repeated value is not a gap
if (last.HasValue && next > last.Value + 1)
{
return last.Value + 1;
}
last = next;
}
return last.HasValue ? last.Value + 1 : Int32.MaxValue;
}
public static class IntExtensions
{
public static int? SomeCoolLinqMethod(this IEnumerable<int> ints)
{
int counter = ints.Count() > 0 ? ints.First() : -1;
while (counter < int.MaxValue)
{
if (!ints.Contains(++counter)) return counter;
}
return null;
}
}
Usage:
var nextAvailable = myInts.SomeCoolLinqMethod();
Ok, here is the solution that I came up with that works for me.
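// Enumerable.Range takes (start, count), so the count must cover Min() through Max() + 1
var nextAvailableInteger = Enumerable.Range(myInts.Min(), myInts.Max() - myInts.Min() + 2).FirstOrDefault(r => !myInts.Contains(r));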
If anyone has a more elegant solution I would be happy to accept that one. But for now, this is what I'm putting in my code and moving on.
Edit: this is what I implemented after Kevin's suggestion to add an extension method. And that was the real answer: no single LINQ method does this, so it makes more sense to add my own. That is really what I was looking for.
public static int NextAvailableInteger(this IEnumerable<int> ints)
{
return NextAvailableInteger(ints, 1); // by default we use one
}
public static int NextAvailableInteger(this IEnumerable<int> ints, int defaultValue)
{
if (ints == null || ints.Count() == 0) return defaultValue;
var ordered = ints.OrderBy(v => v);
int counter = ints.Min();
int max = ints.Max();
while (counter < max)
{
if (!ordered.Contains(++counter)) return counter;
}
return (++counter);
}
Not sure if this qualifies as a cool LINQ method, but using the left outer join idea from this SO answer:
var thelist = new List<int> {1,2,3,4,5,100,101};
var nextAvailable = (from curr in thelist
join next in thelist
on curr + 1 equals next into g
from newlist in g.DefaultIfEmpty()
where !g.Any ()
orderby curr
select curr + 1).First();
This puts the processing on the SQL Server side if you're using LINQ to SQL, so you don't have to pull the ID lists from the server into memory.
var nextAvailable = myInts.Prepend(0).TakeWhile((x,i) => x == i).Last() + 1;
It is 7 years later, but there are better ways of doing this than the selected answer or the answer with the most votes.
The list is already in order, and based on the example, 0 doesn't count. We can just prepend 0 and check whether each item matches its index. TakeWhile stops evaluating once it hits a number that doesn't match, or at the end of the list.
The answer is the last item that matches, plus 1.
TakeWhile is more efficient than enumerating all the possible numbers and then excluding the existing numbers using Except, because TakeWhile only goes through the list until it finds the first available number, and the resulting sequence has at most n items.
The answer using Except is lazily evaluated, so First() does stop at the first missing number, but Except first builds a hash set of every existing number before yielding anything. That makes it slower and more memory-intensive than TakeWhile, which needs no extra storage.
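To compare the two approaches on the question's data, here is a small side-by-side sketch; the class and method names are hypothetical. The TakeWhile version relies on the input being sorted, distinct and starting at 1, while the Except version accepts any order. Enumerable.Prepend requires .NET Framework 4.7.1+ or .NET Core.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NextAvailableSketch
{
    // Scans 1, 2, 3, ... and returns the first value missing from 'existing' (any order).
    public static int ViaExcept(IEnumerable<int> existing) =>
        Enumerable.Range(1, int.MaxValue).Except(existing).First();

    // Assumes sortedDistinct is ascending, distinct and starts at 1: after prepending 0,
    // each value equals its index until the first gap.
    public static int ViaTakeWhile(IEnumerable<int> sortedDistinct) =>
        sortedDistinct.Prepend(0).TakeWhile((x, i) => x == i).Last() + 1;
}
```

For { 1, 2, 3, 5, 8, 13, 21 } both return 4, and for an empty list both return 1.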

Using LINQ to create an IEnumerable<> of delta values

I've got a list of timestamps (in ticks), and from this list I'd like to create another one that represents the delta time between entries.
Let's just say, for example, that my master timetable looks like this:
10
20
30
50
60
70
What I want back is this:
10
10
20
10
10
What I'm trying to accomplish here is to detect that #3 in the output table is an outlier by calculating the standard deviation. I've not taken statistics before, but I think that if I look for the prevalent value in the output list and throw out anything outside of 1 sigma, this will work adequately for me.
I'd love to be able to create the output list with a single LINQ query, but I haven't figured it out yet. Currently I'm just brute forcing it with a loop.
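For the outlier step, a minimal sketch assuming the population standard deviation (dividing by n, not n - 1) and the one-sigma threshold described above; the class and method names are hypothetical. On the sample timetable the deltas are 10, 10, 20, 10, 10, the mean is 12 and sigma is 4, so only the 20 is flagged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeltaOutlierSketch
{
    // Differences between consecutive timestamps, e.g. 10, 20, 30 -> 10, 10.
    public static List<long> Deltas(IReadOnlyList<long> ticks) =>
        ticks.Zip(ticks.Skip(1), (current, next) => next - current).ToList();

    // Returns the deltas farther than one population standard deviation from the mean.
    public static List<long> OutsideOneSigma(IReadOnlyList<long> deltas)
    {
        double mean = deltas.Average();
        double sigma = Math.Sqrt(deltas.Average(d => (d - mean) * (d - mean)));
        return deltas.Where(d => Math.Abs(d - mean) > sigma).ToList();
    }
}
```

With so few samples the outlier itself inflates sigma; a sample standard deviation or a median-based threshold would be reasonable alternatives.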
If you are running .NET 4.0, this should work fine:
var deltas = list.Zip(list.Skip(1), (current, next) => next - current);
Apart from the multiple enumerators, this is quite efficient; it should work well on any kind of sequence.
Here's an alternative for .NET 3.5:
var deltas = list.Skip(1)
.Select((next, index) => next - list[index]);
Obviously, this idea will only be efficient when the list's indexer is employed. Modifying it to use ElementAt may not be a good idea: quadratic run-time will occur for non-IList<T> sequences. In this case, writing a custom iterator is a good solution.
EDIT: If you don't like the Zip + Skip(1) idea, writing an extension such as this (untested) may be useful in these sorts of circumstances:
public class CurrentNext<T>
{
public T Current { get; private set; }
public T Next { get; private set; }
public CurrentNext(T current, T next)
{
Current = current;
Next = next;
}
}
...
public static IEnumerable<CurrentNext<T>> ToCurrentNextEnumerable<T>(this IEnumerable<T> source)
{
// Note: in an iterator method this check only runs once enumeration starts
if (source == null)
throw new ArgumentNullException("source");
using (var enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
yield break;
T current = enumerator.Current;
while (enumerator.MoveNext())
{
yield return new CurrentNext<T>(current, enumerator.Current);
current = enumerator.Current;
}
}
}
Which you could then use as:
var deltas = list.ToCurrentNextEnumerable()
.Select(c=> c.Next - c.Current);
You can use Ani's answer:
var deltas = list.Zip(list.Skip(1), (current, next) => next - current);
With a super-simple implementation of the Zip extension method:
public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> func)
{
// Dispose both enumerators when the zip finishes or is abandoned
using (var ie1 = first.GetEnumerator())
using (var ie2 = second.GetEnumerator())
{
while (ie1.MoveNext() && ie2.MoveNext())
yield return func(ie1.Current, ie2.Current);
}
}
That'll work with 3.5.
This should do the trick:
static IEnumerable<int> GetDeltas(IEnumerable<int> collection)
{
int? previous = null;
foreach (int value in collection)
{
if (previous != null)
{
yield return value - (int)previous;
}
previous = value;
}
}
Now you can call it on your collection like this:
var masterTimetable = GetMasterTimeTable();
var deltas = GetDeltas(masterTimetable);
It's not really LINQ, but will effectively do the trick.
It looks like there are sufficient answers to get you going already, but I asked a similar question back in the spring:
How to zip one ienumerable with itself
In the responses to my question, I learned about "Pairwise".
As I recall, explicitly implementing your own "Pairwise" enumerator means that you iterate through your list exactly once, whereas implementing "Pairwise" in terms of .Zip + .Skip(1) means that you will ultimately iterate over your list twice.
In my post I also include several examples of geometry (operating on lists of points) processing code such as Length/Distance, Area, Centroid.
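Not that I recommend this, but by totally abusing LINQ the following would work. (Beware that the lambda mutates captured state, so enumerating newvals a second time gives different results, and the first value printed is the first element minus 0.)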
var vals = new[] {10, 20, 30, 50, 60, 70};
int previous = 0;
var newvals = vals.Select(i =>
{
int dif = i - previous;
previous = i;
return dif;
});
foreach (var newval in newvals)
{
Console.WriteLine(newval);
}
One liner for you:
int[] i = new int[] { 10, 20, 30, 50, 60, 70 };
IEnumerable<int> x = Enumerable.Range(1, i.Count()-1).Select(W => i[W] - i[W - 1]);
LINQ is not really designed for what you're trying to do here, because it usually evaluates value by value, much like an extremely efficient combination of for-loops.
You'd have to know your current index, which you don't without some kind of workaround.

Get next N elements from enumerable

Context: C# 3.0, .Net 3.5
Suppose I have a method that generates random numbers (forever):
private static IEnumerable<int> RandomNumberGenerator() {
while (true) yield return GenerateRandomNumber(0, 100);
}
I need to group those numbers in groups of 10, so I would like something like:
foreach (IEnumerable<int> group in RandomNumberGenerator().Slice(10)) {
Assert.That(group.Count() == 10);
}
I have defined Slice method, but I feel there should be one already defined. Here is my Slice method, just for reference:
private static IEnumerable<T[]> Slice<T>(IEnumerable<T> enumerable, int size) {
var result = new List<T>(size);
foreach (var item in enumerable) {
result.Add(item);
if (result.Count == size) {
yield return result.ToArray();
result.Clear();
}
}
}
Question: is there an easier way to accomplish what I'm trying to do? Perhaps Linq?
Note: above example is a simplification, in my program I have an Iterator that scans given matrix in a non-linear fashion.
EDIT: Why Skip+Take is no good.
Effectively what I want is:
var group1 = RandomNumberGenerator().Skip(0).Take(10);
var group2 = RandomNumberGenerator().Skip(10).Take(10);
var group3 = RandomNumberGenerator().Skip(20).Take(10);
var group4 = RandomNumberGenerator().Skip(30).Take(10);
without the overhead of regenerating numbers (10+20+30+40) times. I need a solution that will generate exactly 40 numbers and break them into 4 groups of 10.
Are Skip and Take of any use to you?
Use a combination of the two in a loop to get what you want.
So,
list.Skip(10).Take(10);
Skips the first 10 records and then takes the next 10.
I have done something similar. But I would like it to be simpler:
//Remove "this" if you don't want it to be a extension method
public static IEnumerable<IList<T>> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new List<T>(size);
foreach (var x in xs)
{
curr.Add(x);
if (curr.Count == size)
{
yield return curr;
curr = new List<T>(size);
}
}
}
I think yours are flawed. You return the same array for all your chunks/slices so only the last chunk/slice you take would have the correct data.
Addition: Array version:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new T[size];
int i = 0;
foreach (var x in xs)
{
curr[i % size] = x;
if (++i % size == 0)
{
yield return curr;
curr = new T[size];
}
}
}
Addition: Linq version (not C# 2.0). As pointed out, it will not work on infinite sequences and will be a great deal slower than the alternatives:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
return xs.Select((x, i) => new { x, i })
.GroupBy(xi => xi.i / size, xi => xi.x)
.Select(g => g.ToArray());
}
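The list and array versions above drop a trailing partial chunk (for 7 items in chunks of 3, the 7th item is never yielded). Below is a sketch that keeps the remainder, still streams the source once, and validates its arguments eagerly by moving the iterator into a separate method; the names are hypothetical. On .NET 6 and later, the built-in Enumerable.Chunk(size) does the same job and also yields the final short chunk.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkSketch
{
    // Validates eagerly, then defers to the iterator below.
    public static IEnumerable<T[]> ChunksWithRemainder<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(source, size);
    }

    // Streams the source once and yields a fresh array per chunk, so callers can keep
    // earlier chunks; works on infinite sequences as long as the caller stops enumerating.
    private static IEnumerable<T[]> Iterate<T>(IEnumerable<T> source, int size)
    {
        var buffer = new List<T>(size);
        foreach (T item in source)
        {
            buffer.Add(item);
            if (buffer.Count == size)
            {
                yield return buffer.ToArray();
                buffer.Clear();
            }
        }
        if (buffer.Count > 0)
            yield return buffer.ToArray();   // the final, shorter chunk
    }
}
```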
Using Skip and Take would be a very bad idea. Calling Skip on an indexed collection may be fine, but calling it on any arbitrary IEnumerable<T> is liable to result in enumeration over the number of elements skipped, which means that if you're calling it repeatedly you're enumerating over the sequence an order of magnitude more times than you need to be.
Complain about "premature optimization" all you want, but that is just ridiculous.
I think your Slice method is about as good as it gets. I was going to suggest a different approach that would provide deferred execution and obviate the intermediate array allocation, but that is a dangerous game to play (i.e., if you try something like ToList on such a resulting IEnumerable<T> implementation, without enumerating over the inner collections, you'll end up in an endless loop).
(I've removed what was originally here, as the OP's improvements since posting the question have since rendered my suggestions here redundant.)
Let's see if you even need the complexity of Slice. If your random number generator is stateless, I would assume each call to it would generate unique random numbers, so perhaps this would be sufficient:
var group1 = RandomNumberGenerator().Take(10);
var group2 = RandomNumberGenerator().Take(10);
var group3 = RandomNumberGenerator().Take(10);
var group4 = RandomNumberGenerator().Take(10);
Each call to Take returns a new group of 10 numbers.
Now, if your random number generator re-seeds itself with a specific value each time it's iterated, this won't work. You'll simply get the same 10 values for each group. So instead, you would use:
var generator = RandomNumberGenerator();
var group1 = generator.Take(10);
var group2 = generator.Take(10);
var group3 = generator.Take(10);
var group4 = generator.Take(10);
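This keeps a single instance of the generator so that you can continue retrieving values without re-seeding it. Be aware, though, that an IEnumerable<T> returned by an iterator method restarts from the beginning each time it is enumerated, so each Take(10) runs the generator from the start; the groups differ only if the random state lives outside the iterator (for example, in a shared static Random).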
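A quick way to check how iterator methods behave: the sketch below uses a hypothetical SeededGenerator that creates its Random inside the iterator body. Two Take(10) calls on the same variable return the same ten numbers, because each call starts a new enumeration; sharing one IEnumerator<int> instead continues where the previous group stopped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IteratorRestartSketch
{
    // Hypothetical generator that seeds its own Random inside the iterator body,
    // so every enumeration (every GetEnumerator call) starts the sequence afresh.
    public static IEnumerable<int> SeededGenerator(int seed)
    {
        var random = new Random(seed);
        while (true) yield return random.Next(0, 100);
    }

    // Two Take(10) calls on the same IEnumerable<int>: each starts a new enumeration.
    public static (List<int> First, List<int> Second) TwoTakes(int seed)
    {
        IEnumerable<int> generator = SeededGenerator(seed);
        return (generator.Take(10).ToList(), generator.Take(10).ToList());
    }

    // Sharing one IEnumerator<int> continues where the previous group stopped.
    public static (List<int> First, List<int> Second) OneEnumerator(int seed)
    {
        using (IEnumerator<int> e = SeededGenerator(seed).GetEnumerator())
        {
            return (TakeNext(e, 10), TakeNext(e, 10));
        }
    }

    private static List<int> TakeNext(IEnumerator<int> e, int count)
    {
        var group = new List<int>(count);
        while (group.Count < count && e.MoveNext()) group.Add(e.Current);
        return group;
    }
}
```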
You could use the Skip and Take methods with any Enumerable object.
For your edit:
How about a function that takes a slice number and a slice size as a parameter?
private static IEnumerable<T> Slice<T>(IEnumerable<T> enumerable, int sliceSize, int sliceNumber) {
return enumerable.Skip(sliceSize * sliceNumber).Take(sliceSize);
}
It seems like we'd prefer for an IEnumerable<T> to have a fixed position counter so that we can do
var group1 = items.Take(10);
var group2 = items.Take(10);
var group3 = items.Take(10);
var group4 = items.Take(10);
and get successive slices rather than getting the first 10 items each time. We can do that with a new implementation of IEnumerable<T> which keeps one instance of its Enumerator and returns it on every call of GetEnumerator:
public class StickyEnumerable<T> : IEnumerable<T>, IDisposable
{
private IEnumerator<T> innerEnumerator;
public StickyEnumerable( IEnumerable<T> items )
{
innerEnumerator = items.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return innerEnumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return innerEnumerator;
}
public void Dispose()
{
if (innerEnumerator != null)
{
innerEnumerator.Dispose();
}
}
}
Given that class, we could implement Slice with
public static IEnumerable<IEnumerable<T>> Slices<T>(this IEnumerable<T> items, int size)
{
using (StickyEnumerable<T> sticky = new StickyEnumerable<T>(items))
{
IEnumerable<T> slice;
do
{
slice = sticky.Take(size).ToList();
yield return slice;
} while (slice.Count() == size);
}
yield break;
}
That works in this case, but StickyEnumerable<T> is generally a dangerous class to have around if the consuming code isn't expecting it. For example,
using (var sticky = new StickyEnumerable<int>(Enumerable.Range(1, 10)))
{
var first = sticky.Take(2);
var second = sticky.Take(2);
foreach (int i in second)
{
Console.WriteLine(i);
}
foreach (int i in first)
{
Console.WriteLine(i);
}
}
prints
1
2
3
4
rather than
3
4
1
2
Take a look at Take(), TakeWhile() and Skip()
I think the use of Slice() would be a bit misleading. I think of that as a means to copy a chunk of an array into a new array without causing side effects. In this scenario you would actually move the enumerable forward 10.
A possible better approach is to just use the Linq extension Take(). I don't think you would need to use Skip() with a generator.
Edit: Dang, I have been trying to test this behavior with the following code
Note: this wasn't really correct; I leave it here so others don't fall into the same mistake.
var numbers = RandomNumberGenerator();
var slice = numbers.Take(10);
public static IEnumerable<int> RandomNumberGenerator()
{
yield return random.Next();
}
but the Count() for slice is always 1. I also tried running it through a foreach loop, since I know that the LINQ extensions are generally lazily evaluated, and it only looped once. I eventually wrote the code below instead of using Take(), and it works:
public static IEnumerable<int> Slice(this IEnumerable<int> enumerable, int size)
{
var list = new List<int>();
foreach (var count in Enumerable.Range(0, size)) list.Add(enumerable.First());
return list;
}
If you notice I am adding the First() to the list each time, but since the enumerable that is being passed in is the generator from RandomNumberGenerator() the result is different every time.
So again with a generator using Skip() is not needed since the result will be different. Looping over an IEnumerable is not always side effect free.
Edit: I'll leave the last edit just so no one falls into the same mistake, but it worked fine for me just doing this:
var numbers = RandomNumberGenerator();
var slice1 = numbers.Take(10);
var slice2 = numbers.Take(10);
The two slices were different.
I had made some mistakes in my original answer, but some of the points still stand. Skip() and Take() are not going to work the same with a generator as they would with a list. Looping over an IEnumerable is not always side-effect free. Anyway, here is my take on getting a list of slices.
public static IEnumerable<int> RandomNumberGenerator()
{
while(true) yield return random.Next();
}
public static IEnumerable<IEnumerable<int>> Slice(this IEnumerable<int> enumerable, int size, int count)
{
var slices = new List<List<int>>();
foreach (var iteration in Enumerable.Range(0, count)){
var list = new List<int>();
list.AddRange(enumerable.Take(size));
slices.Add(list);
}
return slices;
}
I got this solution for the same problem:
int[] ints = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
IEnumerable<IEnumerable<int>> chunks = Chunk(ints, 2, t => t.Dump());
//won't enumerate, so won't do anything unless you force it:
chunks.ToList();
IEnumerable<T> Chunk<T, R>(IEnumerable<R> src, int n, Func<IEnumerable<R>, T> action){
IEnumerable<R> head;
IEnumerable<R> tail = src;
while (tail.Any())
{
head = tail.Take(n);
tail = tail.Skip(n);
yield return action(head);
}
}
If you just want the chunks returned without doing anything with them, use chunks = Chunk(ints, 2, t => t). What I would really like is to have t => t as the default action, but I haven't found out how to do that yet.

C# functional quicksort is failing

I'm trying to implement quicksort in a functional style in C# using LINQ, and this code randomly works or doesn't work, and I can't figure out why.
Important to mention: when I call this on an array or list, it works fine. But on an IEnumerable whose concrete type is unknown, it goes insane (usually it loses values or crashes; sometimes it works).
The code:
public static IEnumerable<T> Quicksort<T>(this IEnumerable<T> source)
where T : IComparable<T>
{
if (!source.Any())
yield break;
var pivot = source.First();
var sortedQuery = source.Skip(1).Where(a => a.CompareTo(source.First()) <= 0).Quicksort()
.Concat(new[] { pivot })
.Concat(source.Skip(1).Where(a => a.CompareTo(source.First()) > 0).Quicksort());
foreach (T key in sortedQuery)
yield return key;
}
Can you find any faults here that would cause this to fail?
Edit: Some better test code:
var rand = new Random();
var ienum = Enumerable.Range(1, 100).Select(a => rand.Next());
var array = ienum.ToArray();
try
{
array.Quicksort().Count();
Console.WriteLine("Array went fine.");
}
catch (Exception ex)
{
Console.WriteLine("Array did not go fine ({0}).", ex.Message);
}
try
{
ienum.Quicksort().Count();
Console.WriteLine("IEnumerable went fine.");
}
catch (Exception ex)
{
Console.WriteLine("IEnumerable did not go fine ({0}).", ex.Message);
}
Some enumerable instances, such as those returned by Linq to SQL or Entity Framework queries, are only designed to be iterated once. Your code requires multiple iterations and will crash or behave strangely on these types of instances. You'd have to materialize those enumerables with ToArray() or a similar method first.
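A minimal sketch of that advice, assuming it is acceptable to hold the whole input in memory: materialize once with ToList(), then partition the in-memory list. The method name QuicksortMaterialized is hypothetical. Because the input is read exactly once, even a sequence that yields different values on each enumeration (like the random test input in the question) sorts consistently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuicksortSketch
{
    // Reads the source exactly once, then recursively partitions in-memory lists.
    public static List<T> QuicksortMaterialized<T>(this IEnumerable<T> source)
        where T : IComparable<T>
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        List<T> items = source.ToList();
        if (items.Count <= 1) return items;

        T pivot = items[0];
        List<T> rest = items.GetRange(1, items.Count - 1);

        // Elements <= pivot, then the pivot, then elements > pivot.
        List<T> result = QuicksortMaterialized(rest.Where(a => a.CompareTo(pivot) <= 0));
        result.Add(pivot);
        result.AddRange(QuicksortMaterialized(rest.Where(a => a.CompareTo(pivot) > 0)));
        return result;
    }
}
```

Recursion depth grows with the input size on already-sorted data, so for large inputs Enumerable.OrderBy remains the safer choice.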
You should also be reusing that pivot so that you don't have to keep iterating for the first and remaining elements. This may not completely solve the problem, but it'll help in some cases:
public static IEnumerable<T> Quicksort<T>(this IEnumerable<T> source)
where T : IComparable<T>
{
if (!source.Any())
return source;
var pivot = source.First();
var remaining = source.Skip(1);
return remaining
.Where(a => a.CompareTo(pivot) <= 0).Quicksort()
.Concat(new[] { pivot })
.Concat(remaining.Where(a => a.CompareTo(pivot) > 0).Quicksort());
}
(You also don't need to iterate through the sortedQuery - just return it, it's already an IEnumerable<T>.)
On a related note, why do you feel the need to re-implement this functionality? Enumerable.OrderBy already does it for you.
Response to update:
Your tests are failing because your test is wrong, not the algorithm.
Random is a non-deterministic input source and, as I have explained above, the sort method needs to perform multiple iterations over the same sequence. If the sequence is totally random, then it is going to get different values on subsequent iterations. Essentially, you are trying to quicksort a sequence whose elements keep changing!
If you want the test to succeed, you need to make the input consistent. Use a seed for the random number generator:
static IEnumerable<int> GetRandomInput(int seed, int length)
{
Random rand = new Random(seed);
for (int i = 0; i < length; i++)
{
yield return rand.Next();
}
}
Then:
static void Main(string[] args)
{
var sequence = GetRandomInput(248917, 100);
int lastNum = 0;
bool isSorted = true;
foreach (int num in sequence.Quicksort())
{
if (num < lastNum)
{
isSorted = false;
break;
}
lastNum = num;
}
Console.WriteLine(isSorted ? "Sorted" : "Not sorted");
Console.ReadLine();
}
It will come back sorted.
