How to get results efficiently out of an Octree/Quadtree? - c#

I am working on a piece of 3D software that sometimes has to perform intersections between massive numbers of curves (sometimes ~100,000). The most natural way to do this is to do an N^2 bounding box check, and then intersect those curves whose bounding boxes overlap.
I heard good things about octrees, so I decided to try implementing one to see if I would get improved performance.
Here's my design:
Each octree node is implemented as a class with a list of subnodes and an ordered list of object indices.
When an object is being added, it's added to the lowest node that entirely contains the object, or some of that node's children if the object doesn't fill all of the children.
Now, what I want to do is retrieve all objects that share a tree node with a given object. To do this, I traverse all tree nodes, and if they contain the given index, I add all of their other indices to an ordered list.
This is efficient because the indices within each node are already ordered, so finding out if each index is already in the list is fast. However, the list ends up having to be resized, and this takes up most of the time in the algorithm. So what I need is some kind of tree-like data structure that will allow me to efficiently add ordered data, and also be efficient in memory.
Any suggestions?

Assuming you keep the size of the octree as a property of the tree, you should be able to preallocate a list that is larger than the number of things you could possibly put in it. Preallocating the capacity keeps the resize from happening as long as the capacity is larger than you need. I assume that you are using a SortedList to keep your ordered results.
var results = new SortedList<int, Node>(octTree.Count);
// now find the node and add the points
results.TrimExcess(); // reclaim space as needed
An alternative would be to augment your data structure keeping the size of the tree below the current node in the node itself. Then you'd be able to find the node of interest and directly determine what the size of the list needs to be. All you'd have to do is modify the insert/delete operations to update the size of each of the ancestors of the node inserted/deleted at the end of the operation.
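A minimal sketch of that second idea, assuming a hypothetical OctreeNode class with a Parent link (the question does not show its node type, so Parent, Indices and SubtreeCount are illustrative names). Each insert walks up the ancestors and bumps a counter, so the result list can be allocated once at its final size:

```csharp
using System.Collections.Generic;

// Hypothetical node type; Parent, Indices and SubtreeCount are assumed names.
public class OctreeNode
{
    public OctreeNode Parent;
    public readonly List<OctreeNode> Children = new List<OctreeNode>();
    public readonly List<int> Indices = new List<int>();

    // Number of indices stored in this node and all of its descendants.
    public int SubtreeCount;

    public OctreeNode AddChild()
    {
        var child = new OctreeNode { Parent = this };
        Children.Add(child);
        return child;
    }

    public void AddIndex(int index)
    {
        Indices.Add(index);
        // Update this node and every ancestor so the counts stay consistent.
        for (var node = this; node != null; node = node.Parent)
            node.SubtreeCount++;
    }

    // Collects every index at or below this node into a list sized up front.
    public List<int> CollectAll()
    {
        var results = new List<int>(SubtreeCount);
        Collect(this, results);
        return results;
    }

    private static void Collect(OctreeNode node, List<int> results)
    {
        results.AddRange(node.Indices);
        foreach (var child in node.Children)
            Collect(child, results);
    }
}
```

An object stored in several nodes is counted once per node, so SubtreeCount is an upper bound on the number of distinct objects, which is all a preallocation needs.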

SortedDictionary (.NET 2+) or SortedSet (.NET 4+) is probably what you want. They are tree structures.
SortedList is backed by arrays, so structurally it is no different from List: inserting anywhere but the end shifts elements and costs O(n).
However, it is still not entirely clear to me why you need this list as sorted.
Maybe if you could elaborate on this matter we could find a solution where you don't need sorting at all. For example a simple HashSet could do. It is faster at both lookups and insertions than SortedList or any of the tree structures if hashing is done properly.
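As a rough sketch of the HashSet idea, assuming each tree node exposes its object indices as a sorted int[] (the class and method names here are hypothetical, not from the question):

```csharp
using System;
using System.Collections.Generic;

public static class NeighbourQuery
{
    // nodeIndexLists: one sorted index array per tree node (assumed layout).
    public static HashSet<int> SharingANode(IEnumerable<int[]> nodeIndexLists, int target)
    {
        var found = new HashSet<int>();
        foreach (int[] indices in nodeIndexLists)
        {
            // The per-node arrays are sorted, so membership is a binary search.
            if (Array.BinarySearch(indices, target) < 0)
                continue;
            // UnionWith ignores duplicates, so the result needs no ordering.
            found.UnionWith(indices);
        }
        found.Remove(target);
        return found;
    }
}
```

The result is unordered; if a sorted result is needed afterwards, it can be sorted once at the end instead of being kept sorted during insertion.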
OK, now that it is clear to me that you want to merge sorted lists, I can try to write an implementation.
At first, I implemented merging using SortedDictionary to store heads of all the arrays. At each iteration I removed the smallest element from the dictionary and added the next one from the same array. Performance tests showed that overhead of SortedDictionary is huge, so that it is almost impossible to make it faster than simple concatenation+sorting. It even struggles to match SortedList performance on small tests.
Then I replaced SortedDictionary with custom-made binary heap implementation. Performance improvement was tremendous (more than 6 times). This Heap implementation even manages to beat .Distinct() (which is usually the fastest) in some tests.
Here is my code:
class Heap<T>
{
public Heap(int limit, IComparer<T> comparer)
{
this.comparer = comparer;
data = new T[limit];
}
int count = 0;
T[] data;
public void Add(T t)
{
data[count++] = t;
promote(count-1);
}
IComparer<T> comparer;
public int Count { get { return count; } }
public T Pop()
{
T result = data[0];
fill(0);
return result;
}
bool less(T a, T b)
{
return comparer.Compare(a,b)<0;
}
void fill(int index)
{
int child1 = index*2+1;
int child2 = index*2+2;
if(child1 >= Count)
{
data[index] = data[--count];
if(index!=count)
promote(index);
}
else
{
int bestChild = child1;
if(child2 < Count && less(data[child2], data[child1]))
{
bestChild = child2;
}
data[index] = data[bestChild];
fill(bestChild);
}
}
void promote(int index)
{
if(index==0)
return;
int parent = (index-1)/2;
if(less(data[index], data[parent]))
{
T tmp = data[parent];
data[parent] = data[index];
data[index] = tmp;
promote(parent);
}
}
}
struct ArrayCursor<T>
{
public T [] Array {get;set;}
public int Index {get;set;}
public bool Finished {get{return Array.Length == Index;}}
public T Value{get{return Array[Index];}}
}
class ArrayComparer<T> : IComparer<ArrayCursor<T>>
{
IComparer<T> comparer;
public ArrayComparer(IComparer<T> comparer)
{
this.comparer = comparer;
}
public int Compare (ArrayCursor<T> a, ArrayCursor<T> b)
{
return comparer.Compare(a.Value, b.Value);
}
}
static class HeapMerger
{
public static IEnumerable<T> MergeUnique<T>(this T[][] arrays)
{
bool first = true;
T last = default(T);
IEqualityComparer<T> eq = EqualityComparer<T>.Default;
foreach(T i in Merge(arrays))
if(first || !eq.Equals(last,i))
{
yield return i;
last = i;
first = false;
}
}
public static IEnumerable<T> Merge<T>(this T[][] arrays)
{
var map = new Heap<ArrayCursor<T>>(arrays.Length, new ArrayComparer<T>(Comparer<T>.Default));
Action<ArrayCursor<T>> tryAdd = (a)=>
{
if(!a.Finished)
map.Add(a);
};
for(int i=0;i<arrays.Length;i++)
tryAdd(new ArrayCursor<T>{Array=arrays[i], Index=0});
while(map.Count>0)
{
ArrayCursor<T> lowest = map.Pop();
yield return lowest.Value;
lowest.Index++;
tryAdd(lowest);
}
}
}
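For comparison, the "concatenation + sorting" baseline that the heap merge is measured against can be sketched with LINQ as follows. BaselineMerger and ConcatSortUnique are hypothetical names, and this is only one plausible way to write that baseline:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BaselineMerger
{
    // Flattens the arrays, drops duplicates, then sorts what is left.
    public static T[] ConcatSortUnique<T>(T[][] arrays)
    {
        return arrays.SelectMany(a => a)
                     .Distinct()
                     .OrderBy(x => x)
                     .ToArray();
    }
}
```

For input arrays that are each sorted, this should produce the same sequence as MergeUnique, which makes it a convenient reference when testing or benchmarking the heap version.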

Related

Looking for something like a HashSet, but with a range of values for the key?

I'm wondering if there is something like HashSet, but keyed by a range of values.
For example, we could add an item which is keyed to all integers between 100 and 4000. This item would be returned if we used any key between 100 and 4000, e.g. 287.
I would like the lookup speed to be quite close to HashSet, i.e. O(1). It would be possible to implement this using a binary search, but this would be too slow for the requirements. I would like to use standard .NET API calls as much as possible.
Update
This is interesting: https://github.com/mbuchetics/RangeTree
It has a time complexity of O(log(N)) where N is number of intervals, so it's not exactly O(1), but it could be used to build a working implementation.
I don't believe there's a structure for it already. You could implement something like a RangedDictionary:
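class RangedDictionary {
    private Dictionary<Range, int> _set = new Dictionary<Range, int>();
    public void Add(Range r, int key) {
        _set.Add(r, key);
    }
    public int Get(int key) {
        // find a range that includes that key and return _set[range];
        // a linear scan is the simplest way, at O(N) in the number of ranges
        foreach (KeyValuePair<Range, int> pair in _set) {
            if (pair.Key.Begin <= key && key <= pair.Key.End) {
                return pair.Value;
            }
        }
        throw new KeyNotFoundException("No range contains " + key);
    }
}
struct Range {
    public int Begin;
    public int End;
    //override GetHashCode() and Equals() methods so that you can index a Dictionary by Range
}
EDIT: changed HashSet to Dictionary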
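The Equals/GetHashCode overrides mentioned in the comment could look roughly like this. The struct is named IntRange here (a hypothetical name, chosen to avoid confusion with System.Range in newer .NET versions), and End is assumed to be inclusive:

```csharp
using System;

public struct IntRange : IEquatable<IntRange>
{
    public readonly int Begin;
    public readonly int End;

    public IntRange(int begin, int end)
    {
        Begin = begin;
        End = end;
    }

    // True when value lies inside the range, both ends included.
    public bool Contains(int value)
    {
        return Begin <= value && value <= End;
    }

    public bool Equals(IntRange other)
    {
        return Begin == other.Begin && End == other.End;
    }

    public override bool Equals(object obj)
    {
        return obj is IntRange && Equals((IntRange)obj);
    }

    public override int GetHashCode()
    {
        // Combine both bounds; unchecked lets the multiplication wrap around.
        unchecked { return (Begin * 397) ^ End; }
    }
}
```

Note that hashing only finds a range with exactly matching bounds. Looking up a single integer such as 287 still requires a search over the ranges, which is why this approach does not reach O(1) on its own.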
Here is a solution you can try out. However, it makes two assumptions:
No range overlaps.
Any number you request actually lies inside a range (there is no error check).
As written, the lookup is O(N), but you can make it O(log(N)) with little effort, I think.
The idea is that a class will handle the range thing, it will basically convert any value given to it to its range's lower boundary. This way your Hashtable (here a Dictionary) contains the low boundaries as keys.
public class Range
{
//We store all the ranges we have
private static List<int> ranges = new List<int>();
public int value { get; set; }
public static void CreateRange(int RangeStart, int RangeStop)
{
ranges.Add(RangeStart);
ranges.Sort();
}
public Range(int value)
{
int previous = ranges[0];
//Here we will find the range and give it the low boundary
//This is a very simple foreach loop but you can make it better
foreach (int item in ranges)
{
if (item > value)
{
break;
}
previous = item;
}
this.value = previous;
}
public override int GetHashCode()
{
return value;
}
}
Here is to test it.
class Program
{
static void Main(string[] args)
{
Dictionary<int, int> myRangedDic = new Dictionary<int,int>();
Range.CreateRange(10, 20);
Range.CreateRange(50, 100);
myRangedDic.Add(new Range(15).value, 1000);
myRangedDic.Add(new Range(75).value, 5000);
Console.WriteLine("searching for 16 : {0}", myRangedDic[new Range(16).value].ToString());
Console.WriteLine("searching for 64 : {0}", myRangedDic[new Range(64).value].ToString());
Console.ReadLine();
}
}
I don't believe you really can go below O(log(N)), because there is no way to know immediately which range a number is in; you must always compare it with a lower (or upper) bound.
If you had predetermined ranges, that would have been easier. For example, if every range spans one hundred values, you can find the range of any number by integer-dividing it by 100 (or by subtracting its remainder modulo 100 to get the lower bound), but here we can assume nothing, so we must check.
To go down to Log(N) with this solution, just replace the foreach with a loop that will look at the middle of the array, then split it in two every iteration...
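The binary search described above does not have to be written by hand, since List<int>.BinarySearch already exists. A minimal sketch, assuming the lower boundaries are kept in ascending order and the value is not below the first boundary (RangeLookup and FindRangeStart are hypothetical names):

```csharp
using System.Collections.Generic;

public static class RangeLookup
{
    // sortedStarts: ascending lower boundaries of non-overlapping ranges.
    // Returns the greatest start that is <= value.
    // Assumes value >= sortedStarts[0]; there is no error check, as in the answer.
    public static int FindRangeStart(List<int> sortedStarts, int value)
    {
        int i = sortedStarts.BinarySearch(value);
        if (i < 0)
            i = ~i - 1; // ~i is the index of the first element larger than value
        return sortedStarts[i];
    }
}
```

With boundaries 10 and 50, a lookup for 16 lands on 10 and a lookup for 64 lands on 50, matching what the foreach loop in the Range constructor computes.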

C# data structure, list which can dynamically resize up to a given limit, and allows fast access to any index

I'm implementing a memory system for an AI agent. It needs to have an internal list of state transitions which is capped at some number, say 10000.
If at capacity, adding a new memory should automatically remove the oldest memory.
Importantly, I should also need to be able to quickly access any item in this list.
A wrapper for Queue at first seemed obvious, but Queue does not allow fast access to an arbitrary element (it is O(n)).
Similarly, removing an item from the beginning of a List takes O(n).
LinkedLists allow fast additions and removals, but again do not allow quick access to every index.
An array would allow random access, but obviously it's not dynamically resizable and deletion is problematic.
I've seen a HashMap being suggested, but I'm unsure how that might be implemented.
Suggestions?
If you want the queue to be a fixed length, you could use a circular buffer which enables O(1) enqueue, dequeue and indexing operations and automatically overwrites old entries when the queue is full.
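A minimal circular buffer along those lines might look like this. It is a sketch rather than a complete collection: the class name and members are illustrative, and there is no enumeration or removal support.

```csharp
using System;

public class CircularBuffer<T>
{
    private readonly T[] items;
    private int start; // position of the oldest item
    private int count;

    public CircularBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        items = new T[capacity];
    }

    public int Count { get { return count; } }

    // Adds an item; when the buffer is full, the oldest item is overwritten.
    public void Add(T item)
    {
        int end = (start + count) % items.Length;
        items[end] = item;
        if (count == items.Length)
            start = (start + 1) % items.Length; // the oldest slot was just reused
        else
            count++;
    }

    // Index 0 is the oldest item, Count - 1 the newest.
    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= count) throw new ArgumentOutOfRangeException("index");
            return items[(start + index) % items.Length];
        }
    }
}
```

For the agent's memory, new CircularBuffer<Transition>(10000) would cap the history at 10000 entries (Transition being whatever type holds a state transition), with Add and the indexer both running in constant time.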
Try using a Dictionary with a LinkedList. The keys of the Dictionary are the indexes of the LinkedList nodes and the values of the Dictionary are of type LinkedListNode; that is, the LinkedList nodes.
The Dictionary would give you almost an O(1) on its operations and removing/adding LinkedListNode(s) to the beginning or end of a LinkedList is of O(1) as well.
Another alternative is to use a Hashtable. However, in this case you have to know the capacity of the table beforehand (see the Hashtable.Add method documentation) in order to get the O(1) performance:
If Count is less than the capacity of the Hashtable, this method is an O(1) operation. If the capacity needs to be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.
In the first solution, no matter what the capacity of the LinkedList or the Dictionary is, you would still get almost O(1) from both. Of course, an add or remove inside your memory class performs three or four such operations across the Dictionary and the LinkedList, but that is only a constant factor, so it is still O(1). Lookup is always O(1) because it uses the Dictionary only.
HashMap is for Java, so the closest equivalent is Dictionary (see "C# Java HashMap equivalent"). But I wouldn't say that this is the ultimate answer.
If you implement it as a Dictionary whose key is the content, then you can search for the content in O(1). However, you cannot have duplicate keys. Also, because a Dictionary is not ordered, you may not know which item is the oldest.
If you implement it as a Dictionary whose key is the index and whose value is the content, searching for the content still takes O(n) because you don't know where the content is.
A List or an array costs O(1) if you look up the content by index. So please double-check your statement that it takes O(n).
If searching by index is sufficient, then the circular array/buffer that @Lee mentioned is good enough.
Otherwise, similar to a database, you might want to maintain two separate structures: one for storing the data (a circular array) and the other for searching (a hash).
EDIT: @Lee has it right. A circular buffer seems to give you what you want. Answer left in place though.
I think the data structure you want might be a priority queue -- it depends on what you mean by 'quickly access any item'. If you mean 'able to enumerate all items in O(N)', then a priority queue fits the bill. If you mean 'enumerate the list in historical order', then it won't.
Assuming you need these operations;
add an item and associate with a time
remove the oldest item
enumerate all existing items in arbitrary order
Then you could easily extend this priority queue implementation I wrote a little while ago.
You'll want to implement IEnumerable as a loop through the T[] data array from 0 to cursor. This will give you your enumeration.
Implement a GetItem(i) function which returns this.data[i] so long as i <= cursor.
Implement an automatic size limit by putting this into the Push() method:
if (this.Size >= 10000) {
this.Pop();
}
I think this is O(log n) for push and pop, O(N) to enumerate ALL items, and O(1) to fetch the item at a given array index, so long as you don't need them in order.
using System;

namespace Mindfire.DataStructures
{
    // Binary max-heap: Pop() removes the item with the highest priority.
    public class PiorityQueue<T>
    {
        private int[] priorities;
        private T[] data;
        private int cursor; // index of the last used slot, -1 when empty
        private int capacity;

        public int Size
        {
            get
            {
                return cursor + 1;
            }
        }

        public PiorityQueue() : this(1)
        {
        }

        public PiorityQueue(int capacity)
        {
            this.cursor = -1;
            this.capacity = Math.Max(1, capacity); // doubling in Push needs a non-zero capacity
            this.priorities = new int[this.capacity];
            this.data = new T[this.capacity];
        }

        public T Pop()
        {
            if (this.Size == 0)
            {
                throw new InvalidOperationException($"The {this.GetType().Name} is Empty");
            }
            var result = this.data[0];
            // move the last item to the root, then sift it down
            this.data[0] = this.data[cursor];
            this.priorities[0] = this.priorities[cursor];
            this.data[cursor] = default(T); // clear the vacated slot
            this.cursor--;
            var loc = 0;
            while (true)
            {
                // children of loc in a 0-based array heap
                var l = loc * 2 + 1;
                var r = loc * 2 + 2;
                var largest = loc;
                if (l <= cursor && this.priorities[l] > this.priorities[largest])
                {
                    largest = l;
                }
                if (r <= cursor && this.priorities[r] > this.priorities[largest])
                {
                    largest = r;
                }
                if (largest == loc)
                {
                    break;
                }
                Swap(loc, largest);
                loc = largest;
            }
            return result;
        }

        public void Push(int priority, T v)
        {
            this.cursor++;
            if (this.cursor == this.capacity)
            {
                Resize(this.capacity * 2);
            }
            this.data[this.cursor] = v;
            this.priorities[this.cursor] = priority;
            // sift the new item up until its parent has an equal or higher priority
            var loc = this.cursor;
            while (loc > 0)
            {
                var parent = (loc - 1) / 2;
                if (this.priorities[parent] >= this.priorities[loc])
                {
                    break;
                }
                this.Swap(parent, loc);
                loc = parent;
            }
        }

        // Same operation with the arguments in the other order.
        public void Push(T item, int priority)
        {
            Push(priority, item);
        }

        public T Peek()
        {
            if (this.cursor >= 0)
            {
                return this.data[0];
            }
            else
            {
                return default(T);
            }
        }

        private void Swap(int a, int b)
        {
            if (a == b) { return; }
            var data = this.data[b];
            var priority = this.priorities[b];
            this.data[b] = this.data[a];
            this.priorities[b] = this.priorities[a];
            this.priorities[a] = priority;
            this.data[a] = data;
        }

        private void Resize(int newCapacity)
        {
            var newPriorities = new int[newCapacity];
            var newData = new T[newCapacity];
            this.priorities.CopyTo(newPriorities, 0);
            this.data.CopyTo(newData, 0);
            this.data = newData;
            this.priorities = newPriorities;
            this.capacity = newCapacity;
        }
    }
}

Writing the implementation for List<T> using IEnumerable<T> from scratch

Out of boredom I decided to write the implementation of List from scratch using IEnumerable. I ran into a few issues that I honestly don't know how to solve:
How would you resize a generic array (T[]) when an index is nulled or set to default(T)?
Since you cannot null T, how do you overcome the numerical primitive problem with their values being 0 by default?
If nothing can be done regarding #2, how do you stop the GetEnumerator() method from yield returning 0 when utilizing a numerical data type?
Last but not least, what is the standard practice regarding downsizing an array? I know for certain that one of the best solutions for upsizing is to increase the current length by a power of 2; if and when do you downsize? Per Remove/RemoveAt or by the currently used length % 2?
Here's what I've done so far:
public class List<T> : IEnumerable<T>
{
T[] list = new T[32];
int current;
public void Add(T item)
{
if (current + 1 > list.Length)
{
T[] temp = new T[list.Length * 2];
Array.Copy(list, temp, list.Length);
list = temp;
}
list[current] = item;
current++;
}
public void Remove(T item)
{
for (int i = 0; i < list.Length; i++)
if (list[i].Equals(item))
list[i] = default(T);
}
public void RemoveAt(int index)
{
list[index] = default(T);
}
public IEnumerator<T> GetEnumerator()
{
foreach (T item in list)
if (item != null && !item.Equals(default(T)))
yield return item;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
foreach (T item in list)
if (item != null && !item.Equals(default(T)))
yield return item;
}
}
Thanks in advance.
Well, for starters, your Remove and RemoveAt methods do not implement the same behavior as List<T>. List<T> will decrease in size by 1, whereas your List will remain constant in size. You should shift every element after the removed one down by one index.
Also, List<T>'s GetEnumerator iterates over every item in the list, regardless of what its value is.
I believe that will solve all of the issues you have. If someone adds a default(T) to the list, then a default(T) is what they will get back out again, regardless if T is an int and thus 0 or a class-type and thus null.
Finally, on downsizing: some growable array implementations rationalize that, if the array had ever gotten so big, then it is more likely than usual to get that big again. For that reason, they specifically avoid downsizing.
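List<T> itself follows that policy: removing items never shrinks the backing array, and the capacity is only released when the caller asks for it with TrimExcess. A small sketch that shows this (CapacityDemo is a hypothetical name):

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Returns the capacity before and after an explicit TrimExcess call.
    public static int[] ShrinkExplicitly()
    {
        var list = new List<int>(1024);
        for (int i = 0; i < 1000; i++)
            list.Add(i);

        list.RemoveRange(10, 990);   // Count drops to 10
        int before = list.Capacity;  // removal leaves the capacity untouched

        list.TrimExcess();           // explicit request to release the slack
        int after = list.Capacity;

        return new[] { before, after };
    }
}
```

A hand-written list can take the same approach: grow automatically, and offer a separate method for callers who know the list will stay small.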
The key problem you're running into is maintaining the internal array and what remove does. List<T> does not support partial arrays internally. That doesn't mean you can't, but doing so is far more complicated. To exactly mimic List<T> you want to keep an array and a field for the number of elements in the array that are actually utilized (the list length, which is equal to or less than array length).
Add is easy, you add an element to the end like you did.
Remove is more complicated. If you are removing an element from the end, set the end element to default(T) and change the list length. If you are removing an element from the beginning or middle, then you need to shift the contents of the array and set the last one to default(T). The reason we set the last element to default(T) is to clear the reference, not so we can tell whether or not it's "in use". We know if it's "in use" based on the position in the array and our stored list length.
Another key to implementation is the enumerator. You want to loop through the first elements until you hit the list length. Don't skip nulls.
This is not a complete implementation, but should be correct implementation of the methods you started.
btw, I would not agree with
I know for certain that the best solution for upsizing is to increase the current length by a power of 2
This is the default behavior of List<T> but it's not the best solution in all situations. That's exactly why List<T> allows you to specify a capacity. If you're loading a list from a source and know how many items you're adding, then you can pre-initialize the capacity of the list to reduce the number of copies. Similarly, if you're creating hundreds or thousands of lists that are larger than the default size or likely to be larger, it can be a benefit to memory utilization to pre-initialize the lists to be the same size. That way the memory they allocate and free will be the same continuous blocks and can be more efficiently allocated and deallocated repeatedly. For example, we have a reporting calculation engine that creates about 300,000 lists for each run, with many runs a second. We know the lists are always a few hundred items each, so we pre-initialize them all to 1024 capacity. This is more than most need, but since they're all the same length and they're created and disposed of very quickly, this makes memory reusage efficient.
using System;
using System.Collections;
using System.Collections.Generic;

public class MyList<T> : IEnumerable<T>
{
    T[] list = new T[32];
    int listLength;

    public void Add(T item)
    {
        if (listLength + 1 > list.Length)
        {
            T[] temp = new T[list.Length * 2];
            Array.Copy(list, temp, list.Length);
            list = temp;
        }
        list[listLength] = item;
        listLength++;
    }

    public void Remove(T item)
    {
        // only the first listLength slots are in use; the comparer also copes with null items
        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < listLength; i++)
            if (comparer.Equals(list[i], item))
            {
                RemoveAt(i);
                return;
            }
    }

    public void RemoveAt(int index)
    {
        if (index < 0 || index >= listLength)
        {
            throw new ArgumentOutOfRangeException("index", "'index' must be between 0 and list length.");
        }
        if (index == listLength - 1)
        {
            list[index] = default(T);
            listLength = index;
            return;
        }
        // need to shift the list: move the elements after index down by one
        Array.Copy(list, index + 1, list, index, listLength - index - 1);
        listLength--;
        list[listLength] = default(T); // clear the vacated slot
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; i < listLength; i++)
        {
            yield return list[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Way to pad an array to avoid index outside of bounds of array error

I expect to have at least 183 items in my list when I query it, but sometimes the result from my extract results in items count lower than 183. My current fix supposedly pads the array in the case that the count is less than 183.
if (extractArray.Count() < 183) {
int arraysize= extractArray.Count();
var tempArr = new String[183 - arraysize];
List<string> itemsList = extractArray.ToList<string>();
itemsList.AddRange(tempArr);
var values = itemsList.ToArray();
//-- Process the new array that is now at least 183 in length
}
But it seems my solution is not the best. I would appreciate any other solutions that could help ensure I get at least 183 items whenever the extract happens please.
I'd probably follow others' suggestions, and use a list. Use the "capacity" constructor for added performance:
var list = new List<string>(183);
Then, whenever you get a new array, do this (replace " " with whatever value you use to pad the array):
list.Clear();
list.AddRange(array);
// logically, you can do this without the if, but it saves an object allocation when the array is full
if (array.Length < 183)
list.AddRange(Enumerable.Repeat(" ", 183 - array.Length));
This way, the list is always reusing the same internal array, reducing allocations and GC pressure.
Or, you could use an extension method:
public static class ArrayExtensions
{
public static T ElementOrDefault<T>(this T[] array, int index)
{
return ElementOrDefault(array, index, default(T));
}
public static T ElementOrDefault<T>(this T[] array, int index, T defaultValue)
{
return index < array.Length ? array[index] : defaultValue;
}
}
Then code like this:
items.Zero = array[0];
items.One = array[1];
//...
Becomes this:
items.Zero = array.ElementOrDefault(0);
items.One = array.ElementOrDefault(1);
//...
Finally, this is the rather cumbersome idea with which I started writing this answer: You could wrap the array in an IList implementation that's guaranteed to have 183 indexes (I've omitted most of the interface member implementations for brevity):
class ConstantSizeReadOnlyArrayWrapper<T> : IList<T>
{
private readonly T[] _array;
private readonly int _constantSize;
private readonly T _padValue;
public ConstantSizeReadOnlyArrayWrapper(T[] array, int constantSize, T padValue)
{
//parameter validation omitted for brevity
_array = array;
_constantSize = constantSize;
_padValue = padValue;
}
private int MissingItemCount
{
get { return _constantSize - _array.Length; }
}
public IEnumerator<T> GetEnumerator()
{
//maybe you don't need to implement this, or maybe just returning _array.GetEnumerator() would suffice.
return _array.Concat(Enumerable.Repeat(_padValue, MissingItemCount)).GetEnumerator();
}
public int Count
{
get { return _constantSize; }
}
public bool IsReadOnly
{
get { return true; }
}
public int IndexOf(T item)
{
var arrayIndex = Array.IndexOf(_array, item);
if (arrayIndex < 0 && item.Equals(_padValue))
return _array.Length;
return arrayIndex;
}
public T this[int index]
{
get
{
if (index < 0 || index >= _constantSize)
throw new IndexOutOfRangeException();
return index < _array.Length ? _array[index] : _padValue;
}
set { throw new NotSupportedException(); }
}
}
Ack.
The Array base class implements the Resize method
if(extractArray.Length < 183)
Array.Resize<string>(ref extractArray, 183);
However, keep in mind that resizing is costly, because it allocates a new array and copies the elements, so this method is useful only if you require the array for some reason. If you can switch to a List, that is usually the better choice.
And I suppose you have a one-dimensional array of strings here, so I use the Length property to check the effective number of items in the array.
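One detail worth keeping in mind is that Array.Resize leaves the new slots at their default value, which is null for strings. If a real pad value is needed, the new slots have to be filled afterwards. A small sketch (ArrayPadding and PadTo are hypothetical names):

```csharp
using System;

public static class ArrayPadding
{
    // Returns an array of at least minLength items, with new slots set to pad.
    public static string[] PadTo(string[] source, int minLength, string pad)
    {
        int oldLength = source.Length;
        if (oldLength >= minLength)
            return source;

        // Array.Resize allocates a new array and copies the old items;
        // the added slots start out as null.
        Array.Resize(ref source, minLength);
        for (int i = oldLength; i < minLength; i++)
            source[i] = pad;
        return source;
    }
}
```

A call such as extractArray = ArrayPadding.PadTo(extractArray, 183, " ") would then guarantee 183 non-null entries.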
Since you've stated that you need to ensure there's 183 indexes, and that you need to pad it if there is not, I would suggest using a List instead of an array. You can do something like:
while (extractList.Count < 183)
{
extractList.Add(" "); // just add a space
}
If you ABSOLUTELY have to go back to an array, you can use something similar.
I can't say that I would recommend this solution, but I won't let that stop me from posting it! Whether they like to admit it or not, everyone likes linq solutions!
Using linq, given an array with X elements in it, you can generate an array with exactly Y (183 in your case) elements in it like this:
var items183exactly = extractArray.Length == 183 ? extractArray :
extractArray.Take(183)
.Concat(Enumerable.Repeat(string.Empty, Math.Max(0, 183 - extractArray.Length)))
.ToArray();
If there are fewer than 183 elements, the array will be padded with empty strings. If there are more than 183 elements, the array will be truncated. If there are exactly 183 elements, the array is used as is.
I don't claim that this is efficient or that it is necessarily a good idea. However, it does use linq (yippee!) and it is fun.

Get next N elements from enumerable

Context: C# 3.0, .Net 3.5
Suppose I have a method that generates random numbers (forever):
private static IEnumerable<int> RandomNumberGenerator() {
while (true) yield return GenerateRandomNumber(0, 100);
}
I need to group those numbers in groups of 10, so I would like something like:
foreach (IEnumerable<int> group in RandomNumberGenerator().Slice(10)) {
Assert.That(group.Count() == 10);
}
I have defined Slice method, but I feel there should be one already defined. Here is my Slice method, just for reference:
private static IEnumerable<T[]> Slice<T>(IEnumerable<T> enumerable, int size) {
var result = new List<T>(size);
foreach (var item in enumerable) {
result.Add(item);
if (result.Count == size) {
yield return result.ToArray();
result.Clear();
}
}
}
Question: is there an easier way to accomplish what I'm trying to do? Perhaps Linq?
Note: above example is a simplification, in my program I have an Iterator that scans given matrix in a non-linear fashion.
EDIT: Why Skip+Take is no good.
Effectively what I want is:
var group1 = RandomNumberGenerator().Skip(0).Take(10);
var group2 = RandomNumberGenerator().Skip(10).Take(10);
var group3 = RandomNumberGenerator().Skip(20).Take(10);
var group4 = RandomNumberGenerator().Skip(30).Take(10);
without the overhead of regenerating number (10+20+30+40) times. I need a solution that will generate exactly 40 numbers and break those in 4 groups by 10.
Are Skip and Take of any use to you?
Use a combination of the two in a loop to get what you want.
So,
list.Skip(10).Take(10);
Skips the first 10 records and then takes the next 10.
I have done something similar. But I would like it to be simpler:
//Remove "this" if you don't want it to be a extension method
public static IEnumerable<IList<T>> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new List<T>(size);
foreach (var x in xs)
{
curr.Add(x);
if (curr.Count == size)
{
yield return curr;
curr = new List<T>(size);
}
}
}
I think yours is flawed. You return the same array for all your chunks/slices, so only the last chunk/slice you take would have the correct data.
Addition: Array version:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new T[size];
int i = 0;
foreach (var x in xs)
{
curr[i % size] = x;
if (++i % size == 0)
{
yield return curr;
curr = new T[size];
}
}
}
Addition: Linq version (not C# 2.0). As pointed out, it will not work on infinite sequences and will be a great deal slower than the alternatives:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
return xs.Select((x, i) => new { x, i })
.GroupBy(xi => xi.i / size, xi => xi.x)
.Select(g => g.ToArray());
}
Using Skip and Take would be a very bad idea. Calling Skip on an indexed collection may be fine, but calling it on any arbitrary IEnumerable<T> is liable to result in enumeration over the number of elements skipped, which means that if you're calling it repeatedly you're enumerating over the sequence an order of magnitude more times than you need to be.
Complain of "premature optimization" all you want; but that is just ridiculous.
I think your Slice method is about as good as it gets. I was going to suggest a different approach that would provide deferred execution and obviate the intermediate array allocation, but that is a dangerous game to play (i.e., if you try something like ToList on such a resulting IEnumerable<T> implementation, without enumerating over the inner collections, you'll end up in an endless loop).
(I've removed what was originally here, as the OP's improvements since posting the question have since rendered my suggestions here redundant.)
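The re-enumeration cost described above can be made visible by counting how many values the source actually produces. This is only an illustrative sketch: EnumerationCost and its members are hypothetical names, and the counter is a plain static field, so it is not thread-safe.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCost
{
    public static int Generated;

    // Endless source that counts how many values it has produced.
    public static IEnumerable<int> Source()
    {
        while (true)
        {
            Generated++;
            yield return Generated;
        }
    }

    // Four groups via Skip/Take: every group restarts the source from the beginning.
    public static int CountWithSkipTake()
    {
        Generated = 0;
        for (int g = 0; g < 4; g++)
            Source().Skip(g * 10).Take(10).ToArray();
        return Generated;
    }

    // Four groups via a single pass that buffers ten items at a time.
    public static int CountWithSinglePass()
    {
        Generated = 0;
        var groups = new List<int[]>();
        var current = new List<int>(10);
        foreach (int x in Source())
        {
            current.Add(x);
            if (current.Count < 10) continue;
            groups.Add(current.ToArray());
            current.Clear();
            if (groups.Count == 4) break;
        }
        return Generated;
    }
}
```

The Skip/Take variant produces 10 + 20 + 30 + 40 values for the four groups, while the single pass produces exactly 40, which is the difference the question is asking about.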
Let's see if you even need the complexity of Slice. If your random number generator is stateless, I would assume each call to it would generate unique random numbers, so perhaps this would be sufficient:
var group1 = RandomNumberGenerator().Take(10);
var group2 = RandomNumberGenerator().Take(10);
var group3 = RandomNumberGenerator().Take(10);
var group4 = RandomNumberGenerator().Take(10);
Each call to Take returns a new group of 10 numbers.
Now, if your random number generator re-seeds itself with a specific value each time it's iterated, this won't work. You'll simply get the same 10 values for each group. So instead, you would use:
var generator = RandomNumberGenerator();
var group1 = generator.Take(10);
var group2 = generator.Take(10);
var group3 = generator.Take(10);
var group4 = generator.Take(10);
This maintains an instance of the generator so that you can continue retrieving values without re-seeding the generator.
You could use the Skip and Take methods with any Enumerable object.
For your edit :
How about a function that takes a slice number and a slice size as a parameter?
private static IEnumerable<T> Slice<T>(IEnumerable<T> enumerable, int sliceSize, int sliceNumber) {
return enumerable.Skip(sliceSize * sliceNumber).Take(sliceSize);
}
It seems like we'd prefer for an IEnumerable<T> to have a fixed position counter so that we can do
var group1 = items.Take(10);
var group2 = items.Take(10);
var group3 = items.Take(10);
var group4 = items.Take(10);
and get successive slices rather than getting the first 10 items each time. We can do that with a new implementation of IEnumerable<T> which keeps one instance of its Enumerator and returns it on every call of GetEnumerator:
public class StickyEnumerable<T> : IEnumerable<T>, IDisposable
{
private IEnumerator<T> innerEnumerator;
public StickyEnumerable( IEnumerable<T> items )
{
innerEnumerator = items.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return innerEnumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return innerEnumerator;
}
public void Dispose()
{
if (innerEnumerator != null)
{
innerEnumerator.Dispose();
}
}
}
Given that class, we could implement Slice with
public static IEnumerable<IEnumerable<T>> Slices<T>(this IEnumerable<T> items, int size)
{
using (StickyEnumerable<T> sticky = new StickyEnumerable<T>(items))
{
IEnumerable<T> slice;
do
{
slice = sticky.Take(size).ToList();
yield return slice;
} while (slice.Count() == size);
}
yield break;
}
That works in this case, but StickyEnumerable<T> is generally a dangerous class to have around if the consuming code isn't expecting it. For example,
using (var sticky = new StickyEnumerable<int>(Enumerable.Range(1, 10)))
{
var first = sticky.Take(2);
var second = sticky.Take(2);
foreach (int i in second)
{
Console.WriteLine(i);
}
foreach (int i in first)
{
Console.WriteLine(i);
}
}
prints
1
2
3
4
rather than
3
4
1
2
Take a look at Take(), TakeWhile() and Skip()
I think the use of Slice() would be a bit misleading. I think of that as a means to give me a chunk of an array as a new array without causing side effects. In this scenario you would actually move the enumerable forward by 10.
A possibly better approach is to just use the Linq extension Take(). I don't think you would need to use Skip() with a generator.
Edit: Dang, I have been trying to test this behavior with the following code.
Note: this wasn't really correct; I leave it here so others don't fall into the same mistake.
var numbers = RandomNumberGenerator();
var slice = numbers.Take(10);
public static IEnumerable<int> RandomNumberGenerator()
{
yield return random.Next();
}
but the Count() for slice is always 1. I also tried running it through a foreach loop, since I know that the Linq extensions are generally lazily evaluated, and it only looped once. I eventually did the code below instead of the Take() and it works:
public static IEnumerable<int> Slice(this IEnumerable<int> enumerable, int size)
{
var list = new List<int>();
foreach (var count in Enumerable.Range(0, size)) list.Add(enumerable.First());
return list;
}
If you notice I am adding the First() to the list each time, but since the enumerable that is being passed in is the generator from RandomNumberGenerator() the result is different every time.
So again with a generator using Skip() is not needed since the result will be different. Looping over an IEnumerable is not always side effect free.
Edit: I'll leave the last edit just so no one falls into the same mistake, but it worked fine for me just doing this:
var numbers = RandomNumberGenerator();
var slice1 = numbers.Take(10);
var slice2 = numbers.Take(10);
The two slices were different.
I had made some mistakes in my original answer but some of the points still stand. Skip() and Take() are not going to work the same with a generator as it would a list. Looping over an IEnumerable is not always side effect free. Anyway here is my take on getting a list of slices.
public static IEnumerable<int> RandomNumberGenerator()
{
while(true) yield return random.Next();
}
public static IEnumerable<IEnumerable<int>> Slice(this IEnumerable<int> enumerable, int size, int count)
{
var slices = new List<List<int>>();
foreach (var iteration in Enumerable.Range(0, count)){
var list = new List<int>();
list.AddRange(enumerable.Take(size));
slices.Add(list);
}
return slices;
}
I got this solution for the same problem:
int[] ints = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
IEnumerable<IEnumerable<int>> chunks = Chunk(ints, 2, t => t.Dump());
//won't enumerate, so won't do anything unless you force it:
chunks.ToList();
IEnumerable<T> Chunk<T, R>(IEnumerable<R> src, int n, Func<IEnumerable<R>, T> action){
IEnumerable<R> head;
IEnumerable<R> tail = src;
while (tail.Any())
{
head = tail.Take(n);
tail = tail.Skip(n);
yield return action(head);
}
}
If you just want the chunks returned, without doing anything with them, use chunks = Chunk(ints, 2, t => t). What I would really like is to have t => t as the default action, but I haven't found out how to do that yet.
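One way to get the effect of a default action is an overload, since a lambda cannot be used as the default value of an optional parameter (default values must be compile-time constants). The sketch below repeats the Chunk method in a hypothetical Chunker class so that it is self-contained, and adds the two-argument overload:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Chunker
{
    public static IEnumerable<T> Chunk<T, R>(IEnumerable<R> src, int n, Func<IEnumerable<R>, T> action)
    {
        IEnumerable<R> tail = src;
        while (tail.Any())
        {
            IEnumerable<R> head = tail.Take(n);
            tail = tail.Skip(n);
            yield return action(head);
        }
    }

    // Overload standing in for a default action of t => t.
    public static IEnumerable<IEnumerable<R>> Chunk<R>(IEnumerable<R> src, int n)
    {
        return Chunk(src, n, t => t);
    }
}
```

Each chunk is still a lazy Take over progressively longer Skip chains, so this version re-enumerates the source for every chunk and is best kept to small, repeatable sequences.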
