Managed memory leaked by C# iterator

I have a class that generates DNA sequences, which are represented by long strings. This class implements the IEnumerable<string> interface, and it can produce an infinite number of DNA sequences. Below is a simplified version of my class:
class DnaGenerator : IEnumerable<string>
{
private readonly IEnumerable<string> _enumerable;
public DnaGenerator() => _enumerable = Iterator();
private IEnumerable<string> Iterator()
{
while (true)
foreach (char c in new char[] { 'A', 'C', 'G', 'T' })
yield return new String(c, 10_000_000);
}
public IEnumerator<string> GetEnumerator() => _enumerable.GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
This class generates the DNA sequences by using an iterator. Instead of invoking the iterator again and again, an IEnumerable<string> instance is created during the construction and is cached as a private field. The problem is that using this class results in a sizable chunk of memory being constantly allocated, with the garbage collector being unable to recycle this chunk. Here is a minimal demonstration of this behavior:
var dnaGenerator = new DnaGenerator();
Console.WriteLine($"TotalMemory: {GC.GetTotalMemory(true):#,0} bytes");
DoWork(dnaGenerator);
GC.Collect();
Console.WriteLine($"TotalMemory: {GC.GetTotalMemory(true):#,0} bytes");
GC.KeepAlive(dnaGenerator);
static void DoWork(DnaGenerator dnaGenerator)
{
foreach (string dna in dnaGenerator.Take(5))
{
Console.WriteLine($"Processing DNA of {dna.Length:#,0} nucleotides" +
$", starting from {dna[0]}");
}
}
Output:
TotalMemory: 84,704 bytes
Processing DNA of 10,000,000 nucleotides, starting from A
Processing DNA of 10,000,000 nucleotides, starting from C
Processing DNA of 10,000,000 nucleotides, starting from G
Processing DNA of 10,000,000 nucleotides, starting from T
Processing DNA of 10,000,000 nucleotides, starting from A
TotalMemory: 20,112,680 bytes
Try it on Fiddle.
My expectation was that all generated DNA sequences would be eligible for garbage collection, since they are not referenced by my program. The only reference that I hold is the reference to the DnaGenerator instance itself, which is not meant to contain any sequences. This component just generates the sequences. Nevertheless, no matter how many or how few sequences my program generates, there are always around 20 MB of memory allocated after a full garbage collection.
My question is: Why is this happening? And how can I prevent this from happening?
.NET 6.0, Windows 10, 64-bit operating system, x64-based processor, Release build.
Update: The problem disappears if I replace this:
public IEnumerator<string> GetEnumerator() => _enumerable.GetEnumerator();
...with this:
public IEnumerator<string> GetEnumerator() => Iterator().GetEnumerator();
But I am not a fan of creating a new enumerable each time an enumerator is needed. My understanding is that a single IEnumerable<T> can create many IEnumerator<T>s. AFAIK these two interfaces are not meant to have a one-to-one relationship.

The problem is caused by the auto-generated implementation for the code using yield.
You can mitigate this somewhat by explicitly implementing the enumerator.
You have to fiddle it a bit by calling .Reset() from public IEnumerator<string> GetEnumerator() to ensure the enumeration restarts at each call:
class DnaGenerator : IEnumerable<string>
{
private readonly IEnumerator<string> _enumerable;
public DnaGenerator() => _enumerable = new IteratorImpl();
sealed class IteratorImpl : IEnumerator<string>
{
public bool MoveNext()
{
return true; // Infinite sequence.
}
public void Reset()
{
_index = 0;
}
public string Current
{
get
{
var result = new String(_data[_index], 10_000_000);
if (++_index >= _data.Length)
_index = 0;
return result;
}
}
public void Dispose()
{
// Nothing to do.
}
readonly char[] _data = { 'A', 'C', 'G', 'T' };
int _index;
object IEnumerator.Current => Current;
}
public IEnumerator<string> GetEnumerator()
{
_enumerable.Reset();
return _enumerable;
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

Note that 10,000,000 chars (16 bits each) take approximately 20 MB. If you look at the decompiled code, you will notice that yield return results in a compiler-generated <Iterator> class, which has a current field to store the string (to implement IEnumerator<string>.Current):
[CompilerGenerated]
private sealed class <Iterator>d__2 : IEnumerable<string>, IEnumerable, IEnumerator<string>, IEnumerator, IDisposable
{
 ...
private string <>2__current;
...
}
And the Iterator method itself will be compiled to something like this:
[IteratorStateMachine(typeof(<Iterator>d__2))]
private IEnumerable<string> Iterator()
{
return new <Iterator>d__2(-2);
}
As a result, once enumeration has started, the enumerator returned by _enumerable.GetEnumerator() keeps the current string in memory for as long as the DnaGenerator instance itself is not garbage collected.
UPD
My understanding is that a single IEnumerable can create many IEnumerators. AFAIK these two interfaces are not meant to have a one-to-one relationship.
Yes, an enumerable generated for yield return can create multiple enumerators, but in this particular case the implementation has a "one-to-one" relationship, because the generated class is both an IEnumerable and an IEnumerator:
private sealed class <Iterator>d__2 :
IEnumerable<string>, IEnumerable,
IEnumerator<string>, IEnumerator,
IDisposable
But I am not a fan of creating a new enumerable each time an enumerator is needed.
But that is actually what happens when you call _enumerable.GetEnumerator() (which is obviously an implementation detail). If you check the decompilation mentioned above, you will see that _enumerable = Iterator() is actually new <Iterator>d__2(-2), and <Iterator>d__2.GetEnumerator() looks something like this:
IEnumerator<string> IEnumerable<string>.GetEnumerator()
{
if (<>1__state == -2 && <>l__initialThreadId == Environment.CurrentManagedThreadId)
{
<>1__state = 0;
return this;
}
return new <Iterator>d__2(0);
}
So it actually creates a new iterator instance for every enumeration except the first one, which means your public IEnumerator<string> GetEnumerator() => Iterator().GetEnumerator(); approach is just fine.
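The reuse behaviour described above can be observed directly. The following is a minimal sketch, not part of the original answer: IteratorIdentityDemo, Letters and Probe are hypothetical names, and the result relies on an implementation detail of the Microsoft C# compiler (the first GetEnumerator() call on the creating thread returns the enumerable object itself), which is not guaranteed by the language specification.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorIdentityDemo
{
    // Hypothetical stand-in for the question's Iterator() method.
    public static IEnumerable<string> Letters()
    {
        yield return "A";
        yield return "C";
    }

    // Reports whether the first and the second GetEnumerator() call
    // hand back the cached enumerable object itself.
    public static (bool FirstIsSame, bool SecondIsSame) Probe()
    {
        IEnumerable<string> cached = Letters();
        IEnumerator<string> first = cached.GetEnumerator();
        IEnumerator<string> second = cached.GetEnumerator();
        return (object.ReferenceEquals(cached, first),
                object.ReferenceEquals(cached, second));
    }
}
```

With the compiler behaviour described in this answer, Probe() is expected to report that only the first enumerator is the cached object. That first enumerator is the one whose current field keeps the last string alive for as long as the enumerable is cached in a field.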

If memory usage (or speed) is a concern, you might (also) want to use bytes (or ints) to represent 4 nucleotides at once. Given what you shared with us, that might be the case.
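As a rough illustration of that suggestion, here is a sketch that packs four nucleotides into each byte using two bits per nucleotide. The class NucleotidePacking, its methods, and the A=0, C=1, G=2, T=3 encoding are assumptions made for this example, not something taken from the question. Under this scheme a sequence of 10,000,000 nucleotides needs about 2.5 MB instead of roughly 20 MB as a string.

```csharp
using System;

public static class NucleotidePacking
{
    // Assumed encoding: the index in this string is the 2-bit code.
    private const string Alphabet = "ACGT";

    // Packs a DNA string into bytes, four nucleotides per byte.
    public static byte[] Pack(string dna)
    {
        var packed = new byte[(dna.Length + 3) / 4];
        for (int i = 0; i < dna.Length; i++)
        {
            int code = Alphabet.IndexOf(dna[i]);
            if (code < 0)
                throw new ArgumentException($"Unexpected nucleotide '{dna[i]}'.", nameof(dna));
            // Nucleotide i occupies bits (i % 4) * 2 and (i % 4) * 2 + 1 of byte i / 4.
            packed[i / 4] |= (byte)(code << ((i % 4) * 2));
        }
        return packed;
    }

    // Restores the first 'length' nucleotides from a packed buffer.
    public static string Unpack(byte[] packed, int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = Alphabet[(packed[i / 4] >> ((i % 4) * 2)) & 0b11];
        return new string(chars);
    }
}
```

The original length has to be stored alongside the buffer, because the last byte may be only partially used.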

@GuruStron's answer demonstrated that the problem that I've presented here was created by my shallow understanding of C# iterators, and of how they are implemented internally. By storing an IEnumerable<string> in my DnaGenerator instances, I am gaining essentially nothing. When an enumerator is requested, both lines below result in allocating a single object. It's an autogenerated object with a dual personality: it is both an IEnumerable<string> and an IEnumerator<string>.
public IEnumerator<string> GetEnumerator() => _enumerable.GetEnumerator();
public IEnumerator<string> GetEnumerator() => Iterator().GetEnumerator();
By storing the _enumerable in a field I am just preventing this object from getting recycled.
Nevertheless I am still searching for ways to solve this non-issue, in a way that would allow me to keep the cached _enumerable field without causing a memory leak, and without resorting to implementing a full-fledged IEnumerable<string> from scratch as shown in @MatthewWatson's answer. The workaround that I found is to wrap my generated DNA sequences in StrongBox<string> wrappers:
private IEnumerable<StrongBox<string>> Iterator()
{
while (true)
foreach (char c in new char[] { 'A', 'C', 'G', 'T' })
yield return new(new String(c, 10_000_000));
}
Then I have to Unwrap the iterator before exposing it to the external world:
private readonly IEnumerable<string> _enumerable;
public DnaGenerator() => _enumerable = Iterator().Unwrap();
Here is the Unwrap extension method:
/// <summary>
/// Unwraps an enumerable sequence that contains values wrapped in StrongBox instances.
/// The latest StrongBox instance is emptied when the enumerator is disposed.
/// </summary>
public static IEnumerable<T> Unwrap<T>(this IEnumerable<StrongBox<T>> source)
=> new StrongBoxUnwrapper<T>(source);
private class StrongBoxUnwrapper<T> : IEnumerable<T>
{
private readonly IEnumerable<StrongBox<T>> _source;
public StrongBoxUnwrapper(IEnumerable<StrongBox<T>> source)
{
ArgumentNullException.ThrowIfNull(source);
_source = source;
}
public IEnumerator<T> GetEnumerator() => new Enumerator(_source.GetEnumerator());
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
private class Enumerator : IEnumerator<T>
{
private readonly IEnumerator<StrongBox<T>> _source;
private StrongBox<T> _latest;
public Enumerator(IEnumerator<StrongBox<T>> source)
{
ArgumentNullException.ThrowIfNull(source);
_source = source;
}
public T Current => _source.Current.Value;
object IEnumerator.Current => Current;
public bool MoveNext()
{
var moved = _source.MoveNext();
_latest = _source.Current;
return moved;
}
public void Dispose()
{
_source.Dispose();
if (_latest is not null) _latest.Value = default;
}
public void Reset() => _source.Reset();
}
}
The trick is to keep track of the latest StrongBox<T> that has been emitted by the enumerator, and set its Value to default when the enumerator is disposed.
Live demo.

Related

Writing an IEnumerator with performance comparable to array foreach

To add foreach support to a custom collection, you need to implement IEnumerable. Arrays, however, are special in that they essentially compile into a range-based for loop, which is much faster than using an IEnumerable. A simple benchmark confirms that:
number of elements: 20,000,000
byte[]: 6.860ms
byte[] as IEnumerable<byte>: 89.444ms
CustomCollection.IEnumerator<byte>: 89.667ms
The benchmark:
private byte[] byteArray = new byte[20000000];
private CustomCollection<byte> collection = new CustomCollection<byte>( 20000000 );
[Benchmark]
public void enumerateByteArray()
{
var counter = 0;
foreach( var item in byteArray )
counter += item;
}
[Benchmark]
public void enumerateByteArrayAsIEnumerable()
{
var counter = 0;
var casted = (IEnumerable<byte>) byteArray;
foreach( var item in casted )
counter += item;
}
[Benchmark]
public void enumerateCollection()
{
var counter = 0;
foreach( var item in collection )
counter += item;
}
And the implementation:
public class CustomCollectionEnumerator<T> : IEnumerator<T> where T : unmanaged
{
private CustomCollection<T> _collection;
private int _index;
private int _endIndex;
public CustomCollectionEnumerator( CustomCollection<T> collection )
{
_collection = collection;
_index = -1;
_endIndex = collection.Length;
}
public bool MoveNext()
{
if ( _index < _endIndex )
{
_index++;
return ( _index < _endIndex );
}
return false;
}
public T Current => _collection[ _index ];
object IEnumerator.Current => _collection[ _index ];
public void Reset() { _index = -1; }
public void Dispose() { }
}
public unsafe class CustomCollection<T> : IEnumerable<T> where T : unmanaged
{
private T* _ptr;
public int Length { get; private set; }
public T this[ int index ]
{
[MethodImpl( MethodImplOptions.AggressiveInlining )]
get => _ptr[ index ];
[MethodImpl( MethodImplOptions.AggressiveInlining )]
set => _ptr[ index ] = value;
}
public IEnumerator<T> GetEnumerator()
{
return new CustomCollectionEnumerator<T>( this );
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
Because arrays get special treatment from the compiler, they leave IEnumerable collections in the dust. Since C# focuses heavily on type safety, I can understand why this is the case, but it still incurs an absurd amount of overhead, especially for my custom collection, which enumerates in the exact same way as an array would. In fact, my custom collection is faster than a byte array in a range based for loop, as it uses pointer arithmetic to skip the CLR's array range checks.
So my question is: Is there a way to customize the behavior of a foreach loop such that I can achieve performance comparable to an array? Maybe through compiler intrinsics or manually compiling a delegate with IL?
Of course, I can always just use a range based for loop instead. I am just curious as to if there is any possible way to customize the low-level behavior of a foreach loop in a similar manner to how the compiler handles arrays.

A type doesn't actually need to implement IEnumerable/IEnumerable<T> to be used in a foreach statement. The foreach statement is duck-typed, meaning that the compiler first looks for public methods with the right signatures (GetEnumerator(), MoveNext() and Current) regardless of whether they are implementations of these interfaces, and only falls back to the interfaces if necessary.
This opens the door for some optimizations that can make a significant difference in a tight loop: GetEnumerator() can return a concrete type instead of IEnumerator<T>, which then allows the foreach loop to be built with non-virtual and potentially inlined calls, as well as making the enumerator a struct to avoid the GC overhead. Certain Framework collections such as List<T> also take advantage of this.
Together with a couple other optimizations, this enumerator based on your CustomCollection gets pretty close to a raw array loop in a microbenchmark:
public Enumerator GetEnumerator() => new Enumerator(this);
// Being a ref struct makes it less likely to mess up the pointer usage,
// but doesn't affect the foreach loop
// There is no technical reason why this couldn't implement IEnumerator
// as long as lifetime issues are considered
public unsafe ref struct Enumerator
{
// Storing the pointer directly instead of the collection reference to reduce indirection
// Assuming it's immutable for the lifetime of the enumerator
private readonly T* _ptr;
private uint _index;
private readonly uint _endIndex;
public T Current
{
get
{
// This check could be omitted at the cost of safety if consumers are
// expected to never manually use the enumerator in an incorrect order
if (_index >= _endIndex)
ThrowInvalidOp();
// Without the (int) cast Desktop x86 generates much worse code,
// but only if _ptr is generic. Not sure why.
return _ptr[(int)_index];
}
}
internal Enumerator(CustomCollection<T> collection)
{
_ptr = collection._ptr;
_index = UInt32.MaxValue;
_endIndex = (uint)collection.Length;
}
// Technically this could unexpectedly reset the enumerator if someone were to
// manually call MoveNext() countless times after it returns false for some reason
public bool MoveNext() => unchecked(++_index) < _endIndex;
// Pulling this out of the getter improves inlining of Current
private static void ThrowInvalidOp() => throw new InvalidOperationException();
}
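The duck-typing rule this answer relies on can be seen in isolation with a much smaller type. The sketch below is an illustration only: Countdown and CountdownDemo are hypothetical names, and the type deliberately implements neither IEnumerable nor IEnumerator, yet foreach accepts it because it exposes a public GetEnumerator() whose result has MoveNext() and Current.

```csharp
using System;

// No IEnumerable or IEnumerator interface is implemented anywhere in this type.
public readonly struct Countdown
{
    private readonly int _start;
    public Countdown(int start) => _start = start;

    // Returning the concrete struct lets the compiler bind foreach to
    // direct, non-virtual calls and avoids a heap allocation.
    public Enumerator GetEnumerator() => new Enumerator(_start);

    public struct Enumerator
    {
        private int _next;
        public Enumerator(int start) { _next = start + 1; }
        public int Current => _next;
        public bool MoveNext() => --_next > 0;
    }
}

public static class CountdownDemo
{
    // Sums start, start - 1, ..., 1 using a plain foreach loop.
    public static int Sum(int start)
    {
        int total = 0;
        foreach (int n in new Countdown(start))
            total += n;
        return total;
    }
}
```

Because the type has no interface implementation, it cannot be passed to LINQ methods; a real collection would typically offer both this pattern-based enumerator and the IEnumerable<T> implementation.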

Iterate/enumerate over part of a list?

Is there a way to remember the position of an enumerator?
I want to remember the position of an enumerator, so that I can reset it to a position before the current one. I don't want to go back to the beginning, so Reset() doesn't help.
By the way, is it possible to let the enumerator start at, for example, the second position?
List<string> list = new List<string>(new string[] { "a", "b", "c" });
IEnumerator<string> i = list.GetEnumerator();
i.MoveNext(); richTextBoxOutput.AppendText(i.Current);
IEnumerator<string> t = i; // how do I make a real copy of i?
i.MoveNext(); richTextBoxOutput.AppendText(i.Current);
i = t;
i.MoveNext(); richTextBoxOutput.AppendText(i.Current);

As you already have a List<>, why don't you maintain an index/counter and then use the IEnumerable Skip() extension method (possibly combined with Take(), followed by ForEach())?
Some possibly useful further info:
MSDN: Return Or Skip Elements in a Sequence
Stack Overflow: LINQ with Skip and Take
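A minimal sketch of that suggestion follows. SkipTakeDemo and Window are hypothetical names introduced for illustration; Skip, Take and ToList are the standard System.Linq extension methods.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeDemo
{
    // Returns 'count' items starting at the zero-based 'position'.
    // The position is an ordinary int, so it can be stored and restored at will.
    public static List<string> Window(IEnumerable<string> source, int position, int count)
        => source.Skip(position).Take(count).ToList();
}
```

For example, Window(list, 1, 2) on a list containing "a", "b", "c" yields "b" and "c". Note that Skip on a general IEnumerable<T> walks the skipped elements each time, so for a List<T> direct indexing is usually cheaper.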

Is there a way to remember the position of an enumerator?
Sometimes. It depends on how the enumerator is implemented.
In this case the enumerator is implemented as a mutable struct. That was a performance optimisation, one that people more often run into when it produces this "freeze position" behaviour in situations where they don't want it. (If you're ever writing a generic class that wraps an implementation of IEnumerable<T>, either hold that reference as the interface type rather than the type itself, or don't make it readonly even if it seems like it should be; otherwise you can end up with such a struct enumerator permanently frozen.)
Just change your code so that instead of:
IEnumerator<string> i = list.GetEnumerator();
…
IEnumerator<string> t = i;
You have either:
List<string>.Enumerator i = list.GetEnumerator();
…
List<string>.Enumerator t = i;
Or simply:
var i = list.GetEnumerator();
…
var t = i;
Now you have i and t defined in terms of this struct and copying from one to the other copies the struct rather than just the reference to the boxed struct.
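To make the effect concrete, here is a small sketch that records what each step sees. EnumeratorCopyDemo and Run are hypothetical names, and the output is collected in a list instead of the question's richTextBoxOutput so that the snippet stands on its own.

```csharp
using System.Collections.Generic;

public static class EnumeratorCopyDemo
{
    public static List<string> Run()
    {
        var list = new List<string> { "a", "b", "c" };
        var seen = new List<string>();

        var i = list.GetEnumerator();       // List<string>.Enumerator, a struct
        i.MoveNext(); seen.Add(i.Current);  // first element
        var t = i;                          // copies the struct, position included
        i.MoveNext(); seen.Add(i.Current);  // second element
        i = t;                              // rewind to the saved position
        i.MoveNext(); seen.Add(i.Current);  // second element again
        return seen;
    }
}
```

Run() returns "a", "b", "b": after restoring the copy, the enumerator advances from the saved position again. Had i and t been declared as IEnumerator<string>, both variables would refer to the same boxed enumerator and the last value would be "c".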
This will not work with all enumerators, and for that matter it isn't the best way to deliberately make it available when writing your own enumerator (if you needed to do so you'd be better adding some sort of Clone() or Snapshot() method to an enumerator that was a class rather than a struct), but it will work with List<T>.
A more flexible solution that doesn't depend on such a quirk of implementation would be:
public class SnapshotableListEnumerator<T> : IEnumerator<T>
{
private readonly IList<T> _list;
private int _idx;
private SnapshotableListEnumerator(IList<T> list, int idx)
{
_list = list;
_idx = idx;
}
public SnapshotableListEnumerator(IList<T> list)
: this(list, -1)
{
}
public bool MoveNext()
{
// Note that this enumerator doesn't complain about the list
// changing during enumeration, but we do want to check that
// a change doesn't push us past the end of the list, rather
// than caching the size.
if(_idx + 1 >= _list.Count)
return false;
++_idx;
return true;
}
public void Reset()
{
_idx = -1;
}
public T Current
{
get
{
if(_idx < 0 || _idx >= _list.Count)
throw new InvalidOperationException();
return _list[_idx];
}
}
object IEnumerator.Current
{
get { return Current; }
}
public void Dispose()
{
}
public SnapshotableListEnumerator<T> Snapshot()
{
return new SnapshotableListEnumerator<T>(_list, _idx);
}
}
public static class SnapshotableListEnumeratorHelper
{
public static SnapshotableListEnumerator<T> GetSnapshotableEnumerator<T>(this IList<T> list)
{
return new SnapshotableListEnumerator<T>(list);
}
}
Now you can call GetSnapshotableEnumerator() on any implementation of IList<T> and use its Snapshot() method whenever you want a copy of the position within the enumeration.

Do you definitely need an IEnumerator instance? Why not enumerate using the index and store that in your own variable?
var list = new List<string>(new[] { "a", "b", "c" });
var pos = 2; // this is the position
richTextBoxOutput.AppendText(list[pos]);
You can reset at any time with:
pos = (desired position);

Just when is a stackoverflow fair and sensible?

Code updated
To fix the bug with a filtered Interminable, the following code has been updated and merged into the original:
public static bool IsInfinity(this IEnumerable x) {
var it=
x as Infinity??((Func<object>)(() => {
var info=x.GetType().GetField("source", bindingAttr);
return null!=info?info.GetValue(x):x;
}))();
return it is Infinity;
}
bindingAttr is declared a constant.
Summary
I'm trying to implement an infinite enumerable, but I have encountered something that seems illogical and have temporarily run out of ideas. I need some direction to complete the code so that it becomes a semantic, logical, and reasonable design.
The whole story
I've asked the question a few hours ago:
Is an infinite enumerable still "enumerable"?
This might not be a good implementation pattern. What I'm trying to do is implement an enumerable that represents infinity, in a logical and semantic way (I thought...). I will put the code at the end of this post.
The big problem is that it exists only to represent an infinite enumerable; enumerating it doesn't make any sense, since it has no real elements.
So, besides providing dummy elements for the enumeration, there are four options I can imagine, and three of them lead to a StackOverflowException.
1. Throw an InvalidOperationException once it's going to be enumerated.
public IEnumerator<T> GetEnumerator() {
for(var message="Attempted to enumerate an infinite enumerable"; ; )
throw new InvalidOperationException(message);
}
2. and 3. are technically equivalent: let the stack overflow occur when it really overflows.
public IEnumerator<T> GetEnumerator() {
foreach(var x in this)
yield return x;
}
public IEnumerator<T> GetEnumerator() {
return this.GetEnumerator();
}
3. (described in 2)
4. Don't wait for it to happen; throw StackOverflowException directly.
public IEnumerator<T> GetEnumerator() {
throw new StackOverflowException("... ");
}
The tricky things are:
If option 1 is applied, enumerating this enumerable becomes an invalid operation. Isn't it weird to say that this lamp isn't used to illuminate (though it's true in my case)?
If option 2 or option 3 is applied, we have planned the stack overflow. Is it really, as the title asks, a case where a stack overflow is fair and sensible? Perfectly logical and reasonable?
The last choice is option 4. However, the stack does not really overflow, since we prevented it by throwing a fake StackOverflowException. This reminds me of what Tom Cruise, playing John Anderton, said: "But it didn't fall. You caught it. The fact that you prevented it from happening doesn't change the fact that it was going to happen."
Are there some good ways to avoid these illogical problems?
The code is compilable and testable; note that one of OPTION_1 to OPTION_4 should be defined before compiling.
Simple test
var objects=new object[] { };
Debug.Print("{0}", objects.IsInfinity());
var infObjects=objects.AsInterminable();
Debug.Print("{0}", infObjects.IsInfinity());
Classes
using System.Collections.Generic;
using System.Collections;
using System.Reflection; // needed for BindingFlags
using System;
public static partial class Interminable /* extensions */ {
public static Interminable<T> AsInterminable<T>(this IEnumerable<T> x) {
return Infinity.OfType<T>();
}
public static Infinity AsInterminable(this IEnumerable x) {
return Infinity.OfType<object>();
}
public static bool IsInfinity(this IEnumerable x) {
var it=
x as Infinity??((Func<object>)(() => {
var info=x.GetType().GetField("source", bindingAttr);
return null!=info?info.GetValue(x):x;
}))();
return it is Infinity;
}
const BindingFlags bindingAttr=
BindingFlags.Instance|BindingFlags.NonPublic;
}
public abstract partial class Interminable<T>: Infinity, IEnumerable<T> {
IEnumerator IEnumerable.GetEnumerator() {
return this.GetEnumerator();
}
#if OPTION_1
public IEnumerator<T> GetEnumerator() {
for(var message="Attempted to enumerate an infinite enumerable"; ; )
throw new InvalidOperationException(message);
}
#endif
#if OPTION_2
public IEnumerator<T> GetEnumerator() {
foreach(var x in this)
yield return x;
}
#endif
#if OPTION_3
public IEnumerator<T> GetEnumerator() {
return this.GetEnumerator();
}
#endif
#if OPTION_4
public IEnumerator<T> GetEnumerator() {
throw new StackOverflowException("... ");
}
#endif
public Infinity LongCount<U>(
Func<U, bool> predicate=default(Func<U, bool>)) {
return this;
}
public Infinity Count<U>(
Func<U, bool> predicate=default(Func<U, bool>)) {
return this;
}
public Infinity LongCount(
Func<T, bool> predicate=default(Func<T, bool>)) {
return this;
}
public Infinity Count(
Func<T, bool> predicate=default(Func<T, bool>)) {
return this;
}
}
public abstract partial class Infinity: IFormatProvider, ICustomFormatter {
partial class Instance<T>: Interminable<T> {
public static readonly Interminable<T> instance=new Instance<T>();
}
object IFormatProvider.GetFormat(Type formatType) {
return typeof(ICustomFormatter)!=formatType?null:this;
}
String ICustomFormatter.Format(
String format, object arg, IFormatProvider formatProvider) {
return "Infinity";
}
public override String ToString() {
return String.Format(this, "{0}", this);
}
public static Interminable<T> OfType<T>() {
return Instance<T>.instance;
}
}

public IEnumerator<T> GetEnumerator()
{
while (true)
yield return default(T);
}
This will create an infinite enumerator - a foreach on it will never end and will just continue to give out the default value.
Note that you will not be able to determine IsInfinity() the way you wrote in your code. That is because new Infinity().Where(o => o == /*do any kind of comparison*/) will still be infinite but will have a different type.

As mentioned in the other post you linked, it makes perfect sense for C# to enumerate an infinite enumeration, and there are a huge number of real-world examples where people write enumerators that simply never end (the first thing that springs to mind is a random number generator).
So you have a particular case in your mathematical problem where you need to define a special value (an infinite number of points of intersection). Usually, that is what I use simple static constants for. Just define a static constant IEnumerable and test against it to find out whether your algorithm produced "an infinite number of intersections" as its result.
To answer your current question more specifically: DO NOT EVER EVER cause a real stack overflow. This is about the nastiest thing you can do to users of your code. It cannot be caught and will immediately terminate your process (probably the only exception is when you are running inside an attached instrumenting debugger).
If at all, I would use NotSupportedException, which is used in other places to signal that a class does not support a feature (e.g. ICollections may throw this in Remove() if they are read-only).

If I understand correctly -- infinite is a confusing word here. I think you need a monad which is either enumerable or not. But let's stick with infinite for now.
I cannot think of a nice way of implementing this in C#. All ways this could be implemented don't integrate with C# generators.
With a C# generator, you can only emit valid values, so there's no way to indicate that this is an infinite enumerable. I don't like the idea of throwing exceptions from a generator to indicate that it is infinite, because to check that it is infinite you would have to try-catch every time.
If you don't need to support generators, then I see the following options:
Implement sentinel enumerable:
public class InfiniteEnumerable<T>: IEnumerable<T> {
private static readonly InfiniteEnumerable<T> val = new InfiniteEnumerable<T>();
public static InfiniteEnumerable<T> Value {
get {
return val;
}
}
public IEnumerator<T> GetEnumerator() {
throw new InvalidOperationException(
"This enumerable cannot be enumerated");
}
IEnumerator IEnumerable.GetEnumerator() {
throw new InvalidOperationException(
"This enumerable cannot be enumerated");
}
}
Sample usage:
IEnumerable<int> enumerable=GetEnumerable();
if(enumerable==InfiniteEnumerable<int>.Value) {
// This is 'infinite' enumerable.
}
else {
// enumerate it here.
}
Implement Infinitable<T> wrapper:
public class Infinitable<T>: IEnumerable<T> {
private IEnumerable<T> enumerable;
private bool isInfinite;
public Infinitable(IEnumerable<T> enumerable) {
this.enumerable=enumerable;
this.isInfinite=false;
}
public Infinitable() {
this.isInfinite=true;
}
public bool IsInfinite {
get {
return isInfinite;
}
}
public IEnumerator<T> GetEnumerator() {
if(isInfinite) {
throw new InvalidOperationException(
"The enumerable cannot be enumerated");
}
return this.enumerable.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator() {
if(isInfinite) {
throw new InvalidOperationException(
"The enumerable cannot be enumerated");
}
return this.enumerable.GetEnumerator();
}
}
Sample usage:
Infinitable<int> enumerable=GetEnumerable();
if(enumerable.IsInfinite) {
// This is 'infinite' enumerable.
}
else {
// enumerate it here.
foreach(var i in enumerable) {
}
}

Infinite sequences may be perfectly iterable/enumerable. Natural numbers are enumerable and so are rational numbers or PI digits. Infinite is the opposite of finite, not enumerable.
The variants that you've provided don't represent the infinite sequences. There are infinitely many different infinite sequences and you can see that they're different by iterating through them. Your idea, on the other hand, is to have a singleton, which goes against that diversity.
If you have something that cannot be enumerated (like the set of real numbers), then you just shouldn't define it as IEnumerable as it's breaking the contract.
If you want to discern between finite and infinite enumerable sequences, just create a new interface IInfiniteEnumerable : IEnumerable and mark infinite sequences with it.
Interface that marks infinite sequences
public interface IInfiniteEnumerable<T> : IEnumerable<T> {
}
A wrapper to convert an existing IEnumerable<T> to IInfiniteEnumerable<T> (IEnumerables are easily created with C#'s yield syntax, but we need to convert them to IInfiniteEnumerable )
public class InfiniteEnumerableWrapper<T> : IInfiniteEnumerable<T> {
IEnumerable<T> _enumerable;
public InfiniteEnumerableWrapper(IEnumerable<T> enumerable) {
_enumerable = enumerable;
}
public IEnumerator<T> GetEnumerator() {
return _enumerable.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator() {
return _enumerable.GetEnumerator();
}
}
Some infinity-aware routines (like calculating the sequence length)
//TryGetCount() returns null if the sequence is infinite
public static class EnumerableExtensions {
public static int? TryGetCount<T>(this IEnumerable<T> sequence) {
if (sequence is IInfiniteEnumerable<T>) {
return null;
} else {
return sequence.Count();
}
}
}
Two examples of sequences - a finite range sequence and the infinite Fibonacci sequence.
public class Sequences {
public static IEnumerable<int> GetIntegerRange(int start, int count) {
return Enumerable.Range(start, count);
}
public static IInfiniteEnumerable<int> GetFibonacciSequence() {
return new InfiniteEnumerableWrapper<int>(GetFibonacciSequenceInternal());
}
static IEnumerable<int> GetFibonacciSequenceInternal() {
var p = 0;
var q = 1;
while (true) {
yield return p;
var newQ = p + q;
p = q;
q = newQ;
}
}
}
A test app that generates random sequences and tries to calculate their lengths.
public class TestApp {
public static void Main() {
for (int i = 0; i < 20; i++) {
IEnumerable<int> sequence = GetRandomSequence();
Console.WriteLine(sequence.TryGetCount() ?? double.PositiveInfinity);
}
Console.ReadLine();
}
static Random _rng = new Random();
//Randomly generates an finite or infinite sequence
public static IEnumerable<int> GetRandomSequence() {
int random = _rng.Next(5) * 10;
if (random == 0) {
return Sequences.GetFibonacciSequence();
} else {
return Sequences.GetIntegerRange(0, random);
}
}
}
The program outputs something like this:
20
40
20
10
20
10
20
Infinity
40
30
40
Infinity
Infinity
40
40
30
20
30
40
30

IEnumerable from IEnumerator

I have written a custom IEnumerator. What's the simplest way to make an IEnumerable from it? The ideal solution (one line of code) would be if there were some class for that purpose. Or do I have to create my own?

There's no built-in method, unfortunately. I have this extension method that I use often enough:
static IEnumerable Iterate(this IEnumerator iterator)
{
while (iterator.MoveNext())
yield return iterator.Current;
}

In my collection of C# utils I have this:
class Enumerable<T> : IEnumerable<T>
{
Func<IEnumerator<T>> factory;
public Enumerable(Func<IEnumerator<T>> factory) { this.factory = factory; }
public IEnumerator<T> GetEnumerator() { return this.factory(); }
IEnumerator IEnumerable.GetEnumerator() { return this.GetEnumerator(); }
}
This takes an IEnumerator factory function, which usually can be provided very easily instead of the single IEnumerator instance (which yields wrong results after first iteration and breaks the semantics of IEnumerable). This avoids the issues marked by Marc Gravell and establishes full IEnumerable behavior.
I use it this way:
IEnumerable<Fruit> GetFruits()
{
var arg1 = ...
return new Enumerable<Fruit>(() => new FruitIterator(arg1, arg2, ...));
}

I would really approach this the other way around; while you can (as per Mike P's excellent answer) wrap an enumerator to pretend to be enumerable, there are some things that you can't really do - for example, it is hoped (although, to be fair, not insisted) that you can obtain multiple enumerators from an enumerable, ideally isolated and repeatable. So I would expect this to hold:
Assert.AreEqual(sequence.Sum(), sequence.Sum());
But if you "spoof" the enumerator into an enumerable, the second sequence will be empty. Or if you do them in parallel - just bizarre. And there are methods that process them in parallel - consider:
Assert.IsTrue(sequence.SequenceEqual(sequence));
this works both enumerators forward at the same time, so if you only have one enumerator, you are fairly scuppered.
There is a reset on enumerators, but this is largely a design mistake and shouldn't be used (it is even a formal requirement in the spec that iterator blocks throw an exception if you call it).
A better (IMO) question is "how do I get an enumerator from an enumerable", in which case the answer is "call GetEnumerator(), and remember to dispose the iterator" - or in simpler terms "use foreach".
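The failure mode described above can be sketched as follows. This is only an illustration: SpoofedEnumerable and SpoofDemo are hypothetical names, not part of any library, and the wrapper deliberately hands out the same enumerator on every call.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: always returns the one enumerator it was given.
sealed class SpoofedEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> _enumerator;
    public SpoofedEnumerable(IEnumerator<T> enumerator) => _enumerator = enumerator;
    public IEnumerator<T> GetEnumerator() => _enumerator;
    IEnumerator IEnumerable.GetEnumerator() => _enumerator;
}

static class SpoofDemo
{
    // An iterator block that produces a fresh enumerator over 1, 2, 3.
    private static IEnumerator<int> Numbers()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static (int First, int Second) SumTwice()
    {
        var sequence = new SpoofedEnumerable<int>(Numbers());
        int first = sequence.Sum();   // consumes the shared enumerator
        int second = sequence.Sum();  // the enumerator is already exhausted
        return (first, second);
    }
}
```

Calling SpoofDemo.SumTwice() returns (6, 0), so an assertion that the two sums are equal would fail for the spoofed sequence.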

Pretty simple:
class Enumerate : IEnumerable
{
private IEnumerator it;
public Enumerate(IEnumerator it) { this.it = it; }
public IEnumerator GetEnumerator() { return this.it; }
}
This also allows the user to call IEnumerator.Reset() if the enumerator you gave it supports it.

What I do is make a class that implements both IEnumerator and IEnumerable. Make GetEnumerator() return itself and you can iterate it like normal.
public class MyClassEnumerator : IEnumerator<MyClass>, IEnumerable<MyClass>
{
public MyClass Current { get; private set; }
object IEnumerator.Current => Current;
public void Reset()
{
...
}
public bool MoveNext()
{
...
}
public IEnumerator<MyClass> GetEnumerator()
{
return this;
}
IEnumerator IEnumerable.GetEnumerator()
{
return this;
}
public void Dispose()
{
...
}
}

Passing a single item as IEnumerable<T>

Is there a common way to pass a single item of type T to a method which expects an IEnumerable<T> parameter? Language is C#, framework version 2.0.
Currently I am using a helper method (it's .Net 2.0, so I have a whole bunch of casting/projecting helper methods similar to LINQ), but this just seems silly:
public static class IEnumerableExt
{
// usage: IEnumerableExt.FromSingleItem(someObject);
public static IEnumerable<T> FromSingleItem<T>(T item)
{
yield return item;
}
}
The other way would of course be to create and populate a List<T> or an array and pass it instead of IEnumerable<T>.
[Edit] As an extension method it might be named:
public static class IEnumerableExt
{
// usage: someObject.SingleItemAsEnumerable();
public static IEnumerable<T> SingleItemAsEnumerable<T>(this T item)
{
yield return item;
}
}
Am I missing something here?
[Edit2] We found someObject.Yield() (as @Peter suggested in the comments below) to be the best name for this extension method, mainly for brevity, so here it is along with the XML comment if anyone wants to grab it:
public static class IEnumerableExt
{
/// <summary>
/// Wraps this object instance into an IEnumerable&lt;T&gt;
/// consisting of a single item.
/// </summary>
/// <typeparam name="T"> Type of the object. </typeparam>
/// <param name="item"> The instance that will be wrapped. </param>
/// <returns> An IEnumerable&lt;T&gt; consisting of a single item. </returns>
public static IEnumerable<T> Yield<T>(this T item)
{
yield return item;
}
}

Well, if the method expects an IEnumerable you've got to pass something that is a list, even if it contains one element only.
passing
new[] { item }
as the argument should be enough I think

In C# 3.0 you can utilize the System.Linq.Enumerable class:
// using System.Linq
Enumerable.Repeat(item, 1);
This will create a new IEnumerable that only contains your item.

Your helper method is the cleanest way to do it, IMO. If you pass in a list or an array, then an unscrupulous piece of code could cast it and change the contents, leading to odd behaviour in some situations. You could use a read-only collection, but that's likely to involve even more wrapping. I think your solution is as neat as it gets.
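A minimal sketch of the concern above, assuming a hypothetical callee (TryTamper) that tries to cast the sequence it receives back to an array and overwrite its contents. FromSingleItem has the same shape as the helper in the question; the other names are illustrative only.

```csharp
using System.Collections.Generic;

static class MutationDemo
{
    // Same shape as the question's helper: an iterator block, not an array.
    private static IEnumerable<T> FromSingleItem<T>(T item)
    {
        yield return item;
    }

    // Hypothetical "unscrupulous" callee: casts the sequence and overwrites it if it can.
    private static bool TryTamper(IEnumerable<string> items)
    {
        if (items is string[] array && array.Length > 0)
        {
            array[0] = "tampered";
            return true;
        }
        return false;
    }

    public static (bool ArrayTampered, string ArrayValue, bool IteratorTampered) Run()
    {
        var array = new[] { "original" };
        bool arrayTampered = TryTamper(array);                          // the cast succeeds
        bool iteratorTampered = TryTamper(FromSingleItem("original"));  // the cast fails
        return (arrayTampered, array[0], iteratorTampered);
    }
}
```

MutationDemo.Run() returns (true, "tampered", false): the caller's array is modified behind its back, while the iterator-based sequence offers nothing to cast to.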

In C# 3 (I know you said 2), you can write a generic extension method which might make the syntax a little more acceptable:
static class IEnumerableExtensions
{
public static IEnumerable<T> ToEnumerable<T>(this T item)
{
yield return item;
}
}
client code is then item.ToEnumerable().

This helper method works for one item or many.
public static IEnumerable<T> ToEnumerable<T>(params T[] items)
{
return items;
}

I'm kind of surprised that no one suggested a new overload of the method with an argument of type T to simplify the client API.
public void DoSomething<T>(IEnumerable<T> list)
{
// Do Something
}
public void DoSomething<T>(T item)
{
DoSomething(new T[] { item });
}
Now your client code can just do this:
MyItem item = new MyItem();
Obj.DoSomething(item);
or with a list:
List<MyItem> itemList = new List<MyItem>();
Obj.DoSomething(itemList);

Either (as has previously been said)
MyMethodThatExpectsAnIEnumerable(new[] { myObject });
or
MyMethodThatExpectsAnIEnumerable(Enumerable.Repeat(myObject, 1));
As a side note, the last version can also be nice if you want an empty list of an anonymous object, e.g.
var x = MyMethodThatExpectsAnIEnumerable(Enumerable.Repeat(new { a = 0, b = "x" }, 0));

I agree with @EarthEngine's comments to the original post, which is that 'AsSingleton' is a better name. See this wikipedia entry. It then follows from the definition of singleton that if a null value is passed as an argument, 'AsSingleton' should return an IEnumerable with a single null value instead of an empty IEnumerable, which would settle the if (item == null) yield break; debate. I think the best solution is to have two methods: 'AsSingleton' and 'AsSingletonOrEmpty'; where, in the event that a null is passed as an argument, 'AsSingleton' will return a single null value and 'AsSingletonOrEmpty' will return an empty IEnumerable. Like this:
public static IEnumerable<T> AsSingletonOrEmpty<T>(this T source)
{
if (source == null)
{
yield break;
}
else
{
yield return source;
}
}
public static IEnumerable<T> AsSingleton<T>(this T source)
{
yield return source;
}
Then, these would, more or less, be analogous to the 'First' and 'FirstOrDefault' extension methods on IEnumerable which just feels right.

This is 30% faster than yield or Enumerable.Repeat when used in foreach due to this C# compiler optimization, and of the same performance in other cases.
public struct SingleSequence<T> : IEnumerable<T> {
public struct SingleEnumerator : IEnumerator<T> {
private readonly SingleSequence<T> _parent;
private bool _couldMove;
public SingleEnumerator(ref SingleSequence<T> parent) {
_parent = parent;
_couldMove = true;
}
public T Current => _parent._value;
object IEnumerator.Current => Current;
public void Dispose() { }
public bool MoveNext() {
if (!_couldMove) return false;
_couldMove = false;
return true;
}
public void Reset() {
_couldMove = true;
}
}
private readonly T _value;
public SingleSequence(T value) {
_value = value;
}
public IEnumerator<T> GetEnumerator() {
return new SingleEnumerator(ref this);
}
IEnumerator IEnumerable.GetEnumerator() {
return new SingleEnumerator(ref this);
}
}
in this test:
// Fastest among seqs, but still 30x slower than direct sum
// 49 mops vs 37 mops for yield, or c.30% faster
[Test]
public void SingleSequenceStructForEach() {
var sw = new Stopwatch();
sw.Start();
long sum = 0;
for (var i = 0; i < 100000000; i++) {
foreach (var single in new SingleSequence<int>(i)) {
sum += single;
}
}
sw.Stop();
Console.WriteLine($"Elapsed {sw.ElapsedMilliseconds}");
Console.WriteLine($"Mops {100000.0 / sw.ElapsedMilliseconds * 1.0}");
}

As I have just found, and seen that user LukeH suggested too, a nice simple way of doing this is as follows:
public static void PerformAction(params YourType[] items)
{
// Forward call to IEnumerable overload
PerformAction(items.AsEnumerable());
}
public static void PerformAction(IEnumerable<YourType> items)
{
foreach (YourType item in items)
{
// Do stuff
}
}
This pattern will allow you to call the same functionality in a multitude of ways: a single item; multiple items (comma-separated); an array; a list; an enumeration, etc.
I'm not 100% sure on the efficiency of using the AsEnumerable method though, but it does work a treat.
Update: The AsEnumerable function looks pretty efficient! (reference)

Although it's overkill for one method, I believe some people may find the Interactive Extensions useful.
The Interactive Extensions (Ix) from Microsoft includes the following method.
public static IEnumerable<TResult> Return<TResult>(TResult value)
{
yield return value;
}
Which can be utilized like so:
var result = EnumerableEx.Return(0);
Ix adds new functionality not found in the original Linq extension methods, and is a direct result of creating the Reactive Extensions (Rx).
Think, Linq Extension Methods + Ix = Rx for IEnumerable.
You can find both Rx and Ix on CodePlex.

I recently asked the same thing on another post
Is there a way to call a C# method requiring an IEnumerable<T> with a single value? ...with benchmarking.
I wanted people stopping by here to see the brief benchmark comparison shown at that newer post for 4 of the approaches presented in these answers.
It seems that simply writing new[] { x } in the arguments to the method is the shortest and fastest solution.

This may not be any better but it's kind of cool:
Enumerable.Range(0, 1).Select(i => item);

Sometimes I do this, when I'm feeling impish:
"_".Select(_ => 3.14) // or whatever; any type is fine
This is the same thing with less shift key presses, heh:
from _ in "_" select 3.14
For a utility function I find this to be the least verbose, or at least more self-documenting than an array, although it'll let multiple values slide; as a plus it can be defined as a local function:
static IEnumerable<T> Enumerate<T> (params T[] v) => v;
// usage:
IEnumerable<double> example = Enumerate(1.234);
Here are all of the other ways I was able to think of (runnable here):
using System;
using System.Collections.Generic;
using System.Linq;
public class Program {
public static IEnumerable<T> ToEnumerable1 <T> (T v) {
yield return v;
}
public static T[] ToEnumerable2 <T> (params T[] vs) => vs;
public static void Main () {
static IEnumerable<T> ToEnumerable3 <T> (params T[] v) => v;
p( new string[] { "three" } );
p( new List<string> { "three" } );
p( ToEnumerable1("three") ); // our utility function (yield return)
p( ToEnumerable2("three") ); // our utility function (params)
p( ToEnumerable3("three") ); // our local utility function (params)
p( Enumerable.Empty<string>().Append("three") );
p( Enumerable.Empty<string>().DefaultIfEmpty("three") );
p( Enumerable.Empty<string>().Prepend("three") );
p( Enumerable.Range(3, 1) ); // only for int
p( Enumerable.Range(0, 1).Select(_ => "three") );
p( Enumerable.Repeat("three", 1) );
p( "_".Select(_ => "three") ); // doesn't have to be "_"; just any one character
p( "_".Select(_ => 3.3333) );
p( from _ in "_" select 3.0f );
p( "a" ); // only for char
// these weren't available for me to test (might not even be valid):
// new Microsoft.Extensions.Primitives.StringValues("three")
}
static void p <T> (IEnumerable<T> e) =>
Console.WriteLine(string.Join(' ', e.Select((v, k) => $"[{k}]={v,-8}:{v.GetType()}").DefaultIfEmpty("<empty>")));
}

For those wondering about performance: @mattica has provided some benchmarking information in a similar question referenced above, but my benchmark tests gave a different result.
In .NET 7, yield return value is ~9% faster than new T[] { value } and allocates 75% as much memory. In most cases, this is already hyper-performant and is as good as you'll ever need.
I was curious if a custom single collection implementation would be faster or more lightweight. It turns out because yield return is implemented as IEnumerator<T> and IEnumerable<T>, the only way to beat it in terms of allocation is to do that in my implementation as well.
If you're passing IEnumerable<> to an outside library, I would strongly recommend not doing this unless you're very familiar with what you're building. That being said, I made a very simple (not-reuse-safe) implementation which was able to beat the yield method by 5ns and allocated only half as much as the array.
Because all tests were passed an IEnumerable<T>, value types generally performed worse than reference types. The best implementation I had was actually the simplest - you can look at the SingleCollection class in the gist I linked to. (This was 2ns faster than yield return, but allocated 88% of what the array would, compared to the 75% allocated for yield return.)
TL;DR: if you care about speed, use yield return item. If you really care about speed, use a SingleCollection.

The easiest way I'd say would be new T[]{item}; there's no dedicated syntax to do this. The closest equivalent that I can think of is the params keyword, but of course that requires you to have access to the method definition and is only usable with arrays.

Enumerable.Range(1,1).Select(_ => {
//Do some stuff... side effects...
return item;
});
The above code is useful when used like this:
var existingOrNewObject = MyData.Where(myCondition)
.Concat(Enumerable.Range(1,1).Select(_ => {
//Create my object...
return item;
})).Take(1).First();
In the above code snippet there is no empty/null check, and exactly one object is guaranteed to be returned, without fear of exceptions. Furthermore, because it is lazy, the closure will not be executed until it is established that no existing data fits the criteria.
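The laziness claim can be checked with a small sketch. The names LazyFallbackDemo and FirstOrCreate, and the "created" placeholder value, are assumptions made for illustration; the counter records how often the fallback closure actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyFallbackDemo
{
    // Returns the first matching item, or a newly "created" one if nothing matches,
    // together with the number of times the fallback closure was invoked.
    public static (string Result, int FactoryCalls) FirstOrCreate(
        IEnumerable<string> data, Func<string, bool> condition)
    {
        int factoryCalls = 0;
        string result = data.Where(condition)
            .Concat(Enumerable.Range(1, 1).Select(_ =>
            {
                factoryCalls++;      // side effect: only happens if this element is reached
                return "created";
            }))
            .First();
        return (result, factoryCalls);
    }
}
```

With data { "apple", "banana" }, the condition s => s == "banana" gives ("banana", 0), while s => s == "cherry" gives ("created", 1): the closure runs only when no existing item matches.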

To be filed under "Not necessarily a good solution, but still...a solution" or "Stupid LINQ tricks", you could combine Enumerable.Empty<>() with Enumerable.Append<>()...
IEnumerable<string> singleElementEnumerable = Enumerable.Empty<string>().Append("Hello, World!");
...or Enumerable.Prepend<>()...
IEnumerable<string> singleElementEnumerable = Enumerable.Empty<string>().Prepend("Hello, World!");
The latter two methods are available since .NET Framework 4.7.1 and .NET Core 1.0.
This is a workable solution if one were really intent on using existing methods instead of writing their own, though I'm undecided if this is more or less clear than the Enumerable.Repeat<>() solution. This is definitely longer code (partly due to type parameter inference not being possible for Empty<>()) and creates twice as many enumerator objects, however.
Rounding out this "Did you know these methods exist?" answer, Array.Empty<>() could be substituted for Enumerable.Empty<>(), but it's hard to argue that makes the situation...better.

I'm a bit late to the party but I'll share my way anyway.
My problem was that I wanted to bind the ItemsSource of a WPF TreeView to a single object. The hierarchy looks like this:
Project > Plot(s) > Room(s)
There was always going to be only one Project but I still wanted to show the project in the tree, without having to pass a collection with only that one object in it like some suggested.
Since you can only pass IEnumerable objects as ItemsSource I decided to make my class IEnumerable:
public class ProjectClass : IEnumerable<ProjectClass>
{
private readonly SingleItemEnumerator<ProjectClass> enumerator;
...
public IEnumerator<ProjectClass> GetEnumerator() => this.enumerator;
IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
}
And create my own Enumerator accordingly:
public class SingleItemEnumerator : IEnumerator
{
private bool hasMovedOnce;
public SingleItemEnumerator(object current)
{
this.Current = current;
}
public bool MoveNext()
{
if (this.hasMovedOnce) return false;
this.hasMovedOnce = true;
return true;
}
public void Reset()
{ }
public object Current { get; }
}
public class SingleItemEnumerator<T> : IEnumerator<T>
{
private bool hasMovedOnce;
public SingleItemEnumerator(T current)
{
this.Current = current;
}
public void Dispose() => (this.Current as IDisposable)?.Dispose(); // no-op when the item is not IDisposable
public bool MoveNext()
{
if (this.hasMovedOnce) return false;
this.hasMovedOnce = true;
return true;
}
public void Reset()
{ }
public T Current { get; }
object IEnumerator.Current => this.Current;
}
This is probably not the "cleanest" solution but it worked for me.
EDIT
To uphold the single responsibility principle, as @Groo pointed out, I created a new wrapper class:
public class SingleItemWrapper : IEnumerable
{
private readonly SingleItemEnumerator enumerator;
public SingleItemWrapper(object item)
{
this.enumerator = new SingleItemEnumerator(item);
}
public object Item => this.enumerator.Current;
public IEnumerator GetEnumerator() => this.enumerator;
}
public class SingleItemWrapper<T> : IEnumerable<T>
{
private readonly SingleItemEnumerator<T> enumerator;
public SingleItemWrapper(T item)
{
this.enumerator = new SingleItemEnumerator<T>(item);
}
public T Item => this.enumerator.Current;
public IEnumerator<T> GetEnumerator() => this.enumerator;
IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
}
Which I used like this
TreeView.ItemsSource = new SingleItemWrapper(itemToWrap);
EDIT 2
I corrected a mistake with the MoveNext() method.

I prefer
public static IEnumerable<T> Collect<T>(this T item, params T[] otherItems)
{
yield return item;
foreach (var otherItem in otherItems)
{
yield return otherItem;
}
}
This lets you call item.Collect() if you want the singleton, but it also lets you call item.Collect(item2, item3) if you want.

Managed memory leaked by C# iterator - c#

If memory usage (or speed) is a concern, you might (also) want to use bytes (or ints) to represent 4 nucleotides at once. Given what you shared with us, that might be the case.
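A rough sketch of that idea, packing four nucleotides into each byte at 2 bits apiece. The class name NucleotidePacking and the code mapping (A=0, C=1, G=2, T=3) are assumptions chosen for illustration, not something the question prescribes.

```csharp
using System;

static class NucleotidePacking
{
    // Assumed 2-bit codes: the index in this string is the code (A=0, C=1, G=2, T=3).
    private const string Alphabet = "ACGT";

    // Packs a DNA string into a byte array, four nucleotides per byte.
    public static byte[] Pack(string dna)
    {
        var packed = new byte[(dna.Length + 3) / 4];
        for (int i = 0; i < dna.Length; i++)
        {
            int code = Alphabet.IndexOf(dna[i]);
            if (code < 0)
                throw new ArgumentException($"Unexpected nucleotide '{dna[i]}' at index {i}.", nameof(dna));
            packed[i / 4] |= (byte)(code << ((i % 4) * 2));
        }
        return packed;
    }

    // Restores the original string; the length is needed because the last byte may be partly unused.
    public static string Unpack(byte[] packed, int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = Alphabet[(packed[i / 4] >> ((i % 4) * 2)) & 3];
        return new string(chars);
    }
}
```

Since a .NET string stores each char in 2 bytes, a sequence of 10,000,000 nucleotides takes roughly 20 MB as a string but 2,500,000 bytes in this packed form.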
