C# functional quicksort is failing - c#

I'm trying to implement quicksort in a functional style using C# using linq, and this code randomly works/doesn't work, and I can't figure out why.
Important to mention: When I call this on an array or list, it works fine. But on an unknown-what-it-really-is IEnumerable, it goes insane (loses values or crashes, usually. sometimes works.)
The code:
public static IEnumerable<T> Quicksort<T>(this IEnumerable<T> source)
where T : IComparable<T>
{
if (!source.Any())
yield break;
var pivot = source.First();
var sortedQuery = source.Skip(1).Where(a => a.CompareTo(source.First()) <= 0).Quicksort()
.Concat(new[] { pivot })
.Concat(source.Skip(1).Where(a => a.CompareTo(source.First()) > 0).Quicksort());
foreach (T key in sortedQuery)
yield return key;
}
Can you find any faults here that would cause this to fail?
Edit: Some better test code:
var rand = new Random();
var ienum = Enumerable.Range(1, 100).Select(a => rand.Next());
var array = ienum.ToArray();
try
{
array.Quicksort().Count();
Console.WriteLine("Array went fine.");
}
catch (Exception ex)
{
Console.WriteLine("Array did not go fine ({0}).", ex.Message);
}
try
{
ienum.Quicksort().Count();
Console.WriteLine("IEnumerable went fine.");
}
catch (Exception ex)
{
Console.WriteLine("IEnumerable did not go fine ({0}).", ex.Message);
}

Some enumerable instances, such as those returned by Linq to SQL or Entity Framework queries, are only designed to be iterated once. Your code requires multiple iterations and will crash or behave strangely on these types of instances. You'd have to materialize those enumerables with ToArray() or a similar method first.
You should also be reusing that pivot so that you don't have to keep iterating for the first and remaining elements. This may not completely solve the problem, but it'll help in some cases:
public static IEnumerable<T> Quicksort<T>(this IEnumerable<T> source)
where T : IComparable<T>
{
if (!source.Any())
return source;
var pivot = source.First();
var remaining = source.Skip(1);
return remaining
.Where(a => a.CompareTo(pivot) <= 0).Quicksort()
.Concat(new[] { pivot })
.Concat(remaining.Where(a => a.CompareTo(pivot) > 0).Quicksort());
}
(You also don't need to iterate through the sortedQuery - just return it, it's already an IEnumerable<T>.)
On a related note, why do you feel the need to re-implement this functionality? Enumerable.OrderBy already does it for you.
Response to update:
Your tests are failing because your test is wrong, not the algorithm.
Random is a non-deterministic input source and, as I have explained above, the sort method needs to perform multiple iterations over the same sequence. If the sequence is totally random, then it is going to get different values on subsequent iterations. Essentially, you are trying to quicksort a sequence whose elements keep changing!
If you want the test to succeed, you need to make the input consistent. Use a seed for the random number generator:
static IEnumerable<int> GetRandomInput(int seed, int length)
{
Random rand = new Random(seed);
for (int i = 0; i < length; i++)
{
yield return rand.Next();
}
}
Then:
static void Main(string[] args)
{
var sequence = GetRandomInput(248917, 100);
int lastNum = 0;
bool isSorted = true;
foreach (int num in sequence.Quicksort())
{
if (num < lastNum)
{
isSorted = false;
break;
}
lastNum = num;
}
Console.WriteLine(isSorted ? "Sorted" : "Not sorted");
Console.ReadLine();
}
It will come back sorted.

Related

Accessing yield return collection

Is there any way to access the IEnumerable<T> collection being build up by yield return in a loop from within the method building the IEnumerable itself?
Silly example:
Random random = new Random();
IEnumerable<int> UniqueRandomIntegers(int n, int max)
{
while ([RETURN_VALUE].Count() < n)
{
int value = random.Next(max);
if (![RETURN_VALUE].Contains(value))
yield return value;
}
}
There is no collection being built up. The sequence that is returned is evaluated lazily, and unless the caller explicitly copies the data to another collection, it will be gone as soon as it's been fetched.
If you want to ensure uniqueness, you'll need to do that yourself. For example:
IEnumerable<int> UniqueRandomIntegers(int n, int max)
{
HashSet<int> returned = new HashSet<int>();
for (int i = 0; i < n; i++)
{
int candidate;
do
{
candidate = random.Next(max);
} while (returned.Contains(candidate));
yield return candidate;
returned.Add(candidate);
}
}
Another alternative for unique random integers is to build a collection of max items and shuffle it, which can still be done just-in-time. This is more efficient in the case where max and n are similar (as you don't need to loop round until you're lucky enough to get a new item) but inefficient in the case where max is very large and n isn't.
EDIT: As noted in comments, you can shorten this slightly by changing the body of the for loop to:
int candidate;
do
{
candidate = random.Next(max);
} while (!returned.Add(candidate))
yield return candidate;
That uses the fact that Add will return false if the item already exists in the set.

Emulate Python's random.choice in .NET

Python's module 'random' has a function random.choice
random.choice(seq)
Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.
How can I emulate this in .NET ?
public T RandomChoice<T> (IEnumerable<T> source)
Edit: I heard this as an interview question some years ago, but today the problem occurred naturally in my work. The interview question was stated with constraints
'the sequence is too long to save to memory'
'you can only loop over the sequence once'
'the sequence doesn't have a length/count method' (à la .NET IEnumerable)
To make a method that iterates the source only once, and doesn't have to allocate memory to store it temporarily, you count how many items you have iterated, and determine the probability that the current item should be the result:
public T RandomChoice<T> (IEnumerable<T> source) {
Random rnd = new Random();
T result = default(T);
int cnt = 0;
foreach (T item in source) {
cnt++;
if (rnd.Next(cnt) == 0) {
result = item;
}
}
return result;
}
When you are at the first item, the probability is 1/1 that it should be used (as that is the only item that you have seen this far). When you are at the second item, the probability is 1/2 that it should replace the first item, and so on.
This will naturally use a bit more CPU, as it creates one random number per item, not just a single random number to select an item, as dasblinkenlight pointed out. You can check if the source implements IList<T>, as Dan Tao suggested, and use an implementation that uses the capabilities to get the length of the collection and access items by index:
public T RandomChoice<T> (IEnumerable<T> source) {
IList<T> list = source as IList<T>;
if (list != null) {
// use list.Count and list[] to pick an item by random
} else {
// use implementation above
}
}
Note: You should consider sending the Random instance into the method. Otherwise you will get the same random seed if you call the method two times too close in time, as the seed is created from the current time.
The result of a test run, picking one number from an array containing 0 - 9, 1000000 times, to show that the distribution of the chosen numbers is not skewed:
0: 100278
1: 99519
2: 99994
3: 100327
4: 99571
5: 99731
6: 100031
7: 100429
8: 99482
9: 100638
To avoid iterating through the sequence two times (once for the count and once for the element) it is probably a good idea to save your sequence in an array before getting its random element:
public static class RandomExt {
private static Random rnd = new Random();
public static T RandomChoice<T> (this IEnumerable<T> source) {
var arr = source.ToArray();
return arr[rnd.Next(arr.Length)];
}
public static T RandomChoice<T> (this ICollection<T> source) {
return source[rnd.Next(rnd.Count)];
}
}
EDIT Implemented a very good idea by Chris Sinclair.
private static Random rng = new Random();
...
return source.Skip(rng.next(source.Count())).Take(1);
public T RandomChoice<T> (IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var list = source.ToList();
if (list.Count < 1)
{
throw new MissingMemberException();
}
var rnd = new Random();
return list[rnd.Next(0, list.Count)];
}
or extension
public static T RandomChoice<T> (this IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var list = source.ToList();
if (list.Count < 1)
{
throw new MissingMemberException();
}
var rnd = new Random();
return list[rnd.Next(0, list.Count)];
}
I'd go with dasblinkenlight's answer, with one small change: leverage the fact that source might already be an indexed collection, in which case you really don't need to populate a new array (or list):
public static class RandomExt
{
public static T Choice<T>(this Random random, IEnumerable<T> sequence)
{
var list = sequence as IList<T> ?? sequence.ToList();
return list[random.Next(list.Count)];
}
}
Note that I also modified the interface from the abovementioned answer to make it more consistent with the Python version you referenced in your question:
var random = new Random();
var numbers = new int[] { 1, 2, 3 };
int randomNumber = random.Choice(numbers);
Edit: I like Guffa's answer even better, actually.
Well, get a list of all elements in the sequence. ask a random number generator for the index, return elemnt by index. Define what Sequence is - IEnumerable would be most obvious, but you need to materialize that into a list then to know the number of elements for the random number generator.
This is btw., not emulate, it is implement.
Is this some homework beginner study course question?
Assuming one has an extension method IEnumerable.MinBy:
var r = new Random();
return source.MinBy(x=>r.Next())
The method MinBy doesn't save the sequence to memory, it works like IEnumerable.Min making one iteration (see MoreLinq or elsewhere )

Why is IEnumerable.Count() reevaluating the query?

The following code prints "2 2 2 2" when I would expect "1 1 1 1". Why does "Count()" reevaluates the query?
class Class1
{
static int GlobalTag = 0;
public Class1()
{
tag = (++GlobalTag);
}
public int tag;
public int calls = 0;
public int Do()
{
calls++;
return tag;
}
}
class Program
{
static void Main(string[] args)
{
Class1[] cls = new Class1[] { new Class1(), new Class1(), new Class1(), new Class1() };
var result = cls.Where(c => (c.Do() % 2) == 0);
if (result.Count() <= 10)
{
if (result.Count() <= 10)
{
foreach (var c in cls)
{
Console.WriteLine(c.calls);
}
}
}
}
}
How else could it work? What would you expect Count() to do in order to cache the values?
LINQ to Objects generally executes lazily, only actually evaluating a query when it needs to - such as to count the elements. So the call to Where isn't evaluating the sequence at all; it just remembers the predicate and the sequence so that it can evaluate it when it needs to.
For a lot more details about how LINQ to Objects works, I suggest you read my Edulinq blog series. It's rather long (and not quite finished) but it'll give you a lot more insight into how LINQ to Objects works.
Not all sequences are repeatable, so it generally has to count them. To help it, you could call ToList() on the sequence - even if typed as IEnumerable<T>, LINQ-to-Objects will still short-cut and use the .Count - so very cheap, and repeatable.
For an example of a non-repeatable sequence:
static int evil;
static IEnumerable<int> GetSequence() {
foreach(var item in Enumerable.Range(1, Interlocked.Increment(ref evil)))
yield return item;
}
with demo:
var sequence = GetSequence();
Console.WriteLine(sequence.Count()); // 1
Console.WriteLine(sequence.Count()); // 2
Console.WriteLine(sequence.Count()); // 3

Get next N elements from enumerable

Context: C# 3.0, .Net 3.5
Suppose I have a method that generates random numbers (forever):
private static IEnumerable<int> RandomNumberGenerator() {
while (true) yield return GenerateRandomNumber(0, 100);
}
I need to group those numbers in groups of 10, so I would like something like:
foreach (IEnumerable<int> group in RandomNumberGenerator().Slice(10)) {
Assert.That(group.Count() == 10);
}
I have defined Slice method, but I feel there should be one already defined. Here is my Slice method, just for reference:
private static IEnumerable<T[]> Slice<T>(IEnumerable<T> enumerable, int size) {
var result = new List<T>(size);
foreach (var item in enumerable) {
result.Add(item);
if (result.Count == size) {
yield return result.ToArray();
result.Clear();
}
}
}
Question: is there an easier way to accomplish what I'm trying to do? Perhaps Linq?
Note: above example is a simplification, in my program I have an Iterator that scans given matrix in a non-linear fashion.
EDIT: Why Skip+Take is no good.
Effectively what I want is:
var group1 = RandomNumberGenerator().Skip(0).Take(10);
var group2 = RandomNumberGenerator().Skip(10).Take(10);
var group3 = RandomNumberGenerator().Skip(20).Take(10);
var group4 = RandomNumberGenerator().Skip(30).Take(10);
without the overhead of regenerating number (10+20+30+40) times. I need a solution that will generate exactly 40 numbers and break those in 4 groups by 10.
Are Skip and Take of any use to you?
Use a combination of the two in a loop to get what you want.
So,
list.Skip(10).Take(10);
Skips the first 10 records and then takes the next 10.
I have done something similar. But I would like it to be simpler:
//Remove "this" if you don't want it to be a extension method
public static IEnumerable<IList<T>> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new List<T>(size);
foreach (var x in xs)
{
curr.Add(x);
if (curr.Count == size)
{
yield return curr;
curr = new List<T>(size);
}
}
}
I think yours are flawed. You return the same array for all your chunks/slices so only the last chunk/slice you take would have the correct data.
Addition: Array version:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new T[size];
int i = 0;
foreach (var x in xs)
{
curr[i % size] = x;
if (++i % size == 0)
{
yield return curr;
curr = new T[size];
}
}
}
Addition: Linq version (not C# 2.0). As pointed out, it will not work on infinite sequences and will be a great deal slower than the alternatives:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
return xs.Select((x, i) => new { x, i })
.GroupBy(xi => xi.i / size, xi => xi.x)
.Select(g => g.ToArray());
}
Using Skip and Take would be a very bad idea. Calling Skip on an indexed collection may be fine, but calling it on any arbitrary IEnumerable<T> is liable to result in enumeration over the number of elements skipped, which means that if you're calling it repeatedly you're enumerating over the sequence an order of magnitude more times than you need to be.
Complain of "premature optimization" all you want; but that is just ridiculous.
I think your Slice method is about as good as it gets. I was going to suggest a different approach that would provide deferred execution and obviate the intermediate array allocation, but that is a dangerous game to play (i.e., if you try something like ToList on such a resulting IEnumerable<T> implementation, without enumerating over the inner collections, you'll end up in an endless loop).
(I've removed what was originally here, as the OP's improvements since posting the question have since rendered my suggestions here redundant.)
Let's see if you even need the complexity of Slice. If your random number generates is stateless, I would assume each call to it would generate unique random numbers, so perhaps this would be sufficient:
var group1 = RandomNumberGenerator().Take(10);
var group2 = RandomNumberGenerator().Take(10);
var group3 = RandomNumberGenerator().Take(10);
var group4 = RandomNumberGenerator().Take(10);
Each call to Take returns a new group of 10 numbers.
Now, if your random number generator re-seeds itself with a specific value each time it's iterated, this won't work. You'll simply get the same 10 values for each group. So instead, you would use:
var generator = RandomNumberGenerator();
var group1 = generator.Take(10);
var group2 = generator.Take(10);
var group3 = generator.Take(10);
var group4 = generator.Take(10);
This maintains an instance of the generator so that you can continue retrieving values without re-seeding the generator.
You could use the Skip and Take methods with any Enumerable object.
For your edit :
How about a function that takes a slice number and a slice size as a parameter?
private static IEnumerable<T> Slice<T>(IEnumerable<T> enumerable, int sliceSize, int sliceNumber) {
return enumerable.Skip(sliceSize * sliceNumber).Take(sliceSize);
}
It seems like we'd prefer for an IEnumerable<T> to have a fixed position counter so that we can do
var group1 = items.Take(10);
var group2 = items.Take(10);
var group3 = items.Take(10);
var group4 = items.Take(10);
and get successive slices rather than getting the first 10 items each time. We can do that with a new implementation of IEnumerable<T> which keeps one instance of its Enumerator and returns it on every call of GetEnumerator:
public class StickyEnumerable<T> : IEnumerable<T>, IDisposable
{
private IEnumerator<T> innerEnumerator;
public StickyEnumerable( IEnumerable<T> items )
{
innerEnumerator = items.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return innerEnumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return innerEnumerator;
}
public void Dispose()
{
if (innerEnumerator != null)
{
innerEnumerator.Dispose();
}
}
}
Given that class, we could implement Slice with
public static IEnumerable<IEnumerable<T>> Slices<T>(this IEnumerable<T> items, int size)
{
using (StickyEnumerable<T> sticky = new StickyEnumerable<T>(items))
{
IEnumerable<T> slice;
do
{
slice = sticky.Take(size).ToList();
yield return slice;
} while (slice.Count() == size);
}
yield break;
}
That works in this case, but StickyEnumerable<T> is generally a dangerous class to have around if the consuming code isn't expecting it. For example,
using (var sticky = new StickyEnumerable<int>(Enumerable.Range(1, 10)))
{
var first = sticky.Take(2);
var second = sticky.Take(2);
foreach (int i in second)
{
Console.WriteLine(i);
}
foreach (int i in first)
{
Console.WriteLine(i);
}
}
prints
1
2
3
4
rather than
3
4
1
2
Take a look at Take(), TakeWhile() and Skip()
I think the use of Slice() would be a bit misleading. I think of that as a means to give me a chuck of an array into a new array and not causing side effects. In this scenario you would actually move the enumerable forward 10.
A possible better approach is to just use the Linq extension Take(). I don't think you would need to use Skip() with a generator.
Edit: Dang, I have been trying to test this behavior with the following code
Note: this is wasn't really correct, I leave it here so others don't fall into the same mistake.
var numbers = RandomNumberGenerator();
var slice = numbers.Take(10);
public static IEnumerable<int> RandomNumberGenerator()
{
yield return random.Next();
}
but the Count() for slice is alway 1. I also tried running it through a foreach loop since I know that the Linq extensions are generally lazily evaluated and it only looped once. I eventually did the code below instead of the Take() and it works:
public static IEnumerable<int> Slice(this IEnumerable<int> enumerable, int size)
{
var list = new List<int>();
foreach (var count in Enumerable.Range(0, size)) list.Add(enumerable.First());
return list;
}
If you notice I am adding the First() to the list each time, but since the enumerable that is being passed in is the generator from RandomNumberGenerator() the result is different every time.
So again with a generator using Skip() is not needed since the result will be different. Looping over an IEnumerable is not always side effect free.
Edit: I'll leave the last edit just so no one falls into the same mistake, but it worked fine for me just doing this:
var numbers = RandomNumberGenerator();
var slice1 = numbers.Take(10);
var slice2 = numbers.Take(10);
The two slices were different.
I had made some mistakes in my original answer but some of the points still stand. Skip() and Take() are not going to work the same with a generator as it would a list. Looping over an IEnumerable is not always side effect free. Anyway here is my take on getting a list of slices.
public static IEnumerable<int> RandomNumberGenerator()
{
while(true) yield return random.Next();
}
public static IEnumerable<IEnumerable<int>> Slice(this IEnumerable<int> enumerable, int size, int count)
{
var slices = new List<List<int>>();
foreach (var iteration in Enumerable.Range(0, count)){
var list = new List<int>();
list.AddRange(enumerable.Take(size));
slices.Add(list);
}
return slices;
}
I got this solution for the same problem:
int[] ints = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
IEnumerable<IEnumerable<int>> chunks = Chunk(ints, 2, t => t.Dump());
//won't enumerate, so won't do anything unless you force it:
chunks.ToList();
IEnumerable<T> Chunk<T, R>(IEnumerable<R> src, int n, Func<IEnumerable<R>, T> action){
IEnumerable<R> head;
IEnumerable<R> tail = src;
while (tail.Any())
{
head = tail.Take(n);
tail = tail.Skip(n);
yield return action(head);
}
}
if you just want the chunks returned, not do anything with them, use chunks = Chunk(ints, 2, t => t). What I would really like is to have to have t=>t as default action, but I haven't found out how to do that yet.

Shove a delimited string into a List<int>

If I have for example the following string:
"123;3344;4334;12"
and I want these numbers in a generic List<int>, I guess I don't know of a good way here other than to split in a loop and do a conversion then add to a List<int> through each iteration. Does anyone have other ways to go about this?
Updated. Here's what I came up with. I want to do this the old fashion way, not with LINQ because I'm trying to get better with just strings, arrays, lists and manipulating and converting in general.
public List<int> StringToList(string stringToSplit, char splitDelimiter)
{
List<int> list = new List<int>();
if (string.IsNullOrEmpty(stringToSplit))
return list;
string[] values = stringToSplit.Split(splitDelimiter);
if (values.Length <= 1)
return list;
foreach (string s in values)
{
int i;
if (Int32.TryParse(s, out i))
list.Add(i);
}
return list;
}
This is a new string utility method I plan on using whenever I need to convert a delimited string list to List
So I'm returning an empty list back to the caller if something fails. Good/Bad? is it pretty common to do this?
Yes, there are more "elegant" ways to do this with LINQ but I want to do it manually..the old way for now just for my own understanding.
Also, what bothers me about this:
list.AddRange(str.Split(';').Select(Int32.Parse));
is that I have no idea:
How to shove in a TryParse there instead.
What if the str.Split(';').Select(Int32.Parse) just fails for whatever reason...then the method that this AddRange resides in is going to blow up and unless I add a try/catch around this whole thing, I'm screwed if I don't handle it properly.
static int? ToInt32OrNull(string s)
{
int value;
return (Int32.TryParse(s, out value)) ? value : default(int?);
}
// ...
var str = "123;3344;4334;12";
var list = new List<int>();
list.AddRange(str.Split(';')
.Select(ToInt32OrNull)
.Where(i => i != null)
.Cast<int>());
Questioner notes:
I don't know of a good way here other than to split in a loop and do a conversion then add to a List
In general, this is a major reason why LINQ was brought into C# - to remove the need to work with sequences of values by implementing loops, and instead just declare your intention to transform the sequence. If you ever find yourself thinking "I don't know how to do this except with a loop" - it's time to look into a LINQ construct which will do the work for you.
Performance Update:
Performance of LINQ has been quesioned below. While in the comments the idea of LINQ being slower is defended since we gain the benefits of readability, maintainability and composibility, there is another aspect which gives LINQ an easy performance advantage: parallelism. Here is an example where adding just one extension method call, AsParallel() doubles the performance. This is a great example of where scale-out beats micro-optimization without even needing to measure very carefully. Note I'm not claiming that micro-optimizations are not ever needed, but with the tools we have available at this level of absraction, the need becomes vanishingly small.
class Program
{
private const int ElementCount = 10000000;
static void Main(string[] args)
{
var str = generateString();
var stopwatch = new Stopwatch();
var list1 = new List<int>(ElementCount);
var list2 = new List<int>(ElementCount);
var split = str.Split(';');
stopwatch.Start();
list1.AddRange(split
.Select(ToInt32OrNull)
.Where(i => i != null)
.Cast<int>());
stopwatch.Stop();
TimeSpan nonParallel = stopwatch.Elapsed;
stopwatch.Restart();
list2.AddRange(split
.AsParallel()
.Select(ToInt32OrNull)
.Where(i => i != null)
.Cast<int>());
stopwatch.Stop();
TimeSpan parallel = stopwatch.Elapsed;
Debug.WriteLine("Non-parallel: {0}", nonParallel);
Debug.WriteLine("Parallel: {0}", parallel);
}
private static String generateString()
{
var builder = new StringBuilder(1048576);
var rnd = new Random();
for (int i = 0; i < ElementCount; i++)
{
builder.Append(rnd.Next(99999));
builder.Append(';');
}
builder.Length--;
return builder.ToString();
}
static int? ToInt32OrNull(string s)
{
int value;
return (Int32.TryParse(s, out value)) ? value : default(int?);
}
}
Non-parallel: 00:00:07.0719911
Parallel: 00:00:04.5933906
string str = "123;3344;4334;12";
List<int> list = new List<int>();
foreach (string s in str.Split(';'))
{
list.Add( Int32.Parse(s));
}
List<int> list = (from numString in "123;3344;4334;12".Split(';')
select int.Parse(numString)).ToList();
string myString = "123;3344;4334;12";
var ints = new List<int>();
(from s in myString.Split(';')
select int.Parse()).ToList().ForEach(i=>ints.Add(i));
I've heard .Net 4.0 may have added ForEach to Enumerable<T>, so the ToList might be unnecessary there (can't test).
I think this is simplest
var str = "123;3344;4334;12";
var list = str.Split(';').ToList().Cast<int>();

Categories

Resources