In order to make my life easier dealing to strings, I want to use a hashset initialized with StringComparer.OrdinalIgnoreCase.
But sometimes, I need to make an operation on all items.
This is clearly not the way I will achieve my goal for obvious performance reasons, but I'd like to know if this code makes sense, especially the "Set" part of the index, and how it could cause unwanted side effects on the collection.
Here is the HashSet implementation:
public class MyHashSet<T> : HashSet<T>
{
public T this[int index]
{
get
{
int i = 0;
foreach (T t in this)
{
if (i == index)
return t;
i++;
}
throw new IndexOutOfRangeException();
}
set
{
int i = 0;
foreach (T t in this)
{
if (i == index)
{
this.RemoveWhere(element => element.Equals(t));
this.Add(value);
return;
}
i++;
}
throw new IndexOutOfRangeException();
}
}
public MyHashSet()
{
}
public MyHashSet(IEnumerable<T> collection)
: base(collection)
{
}
public MyHashSet(IEnumerable<T> collection, IEqualityComparer<T> comparer)
: base(collection, comparer)
{
}
public MyHashSet(IEqualityComparer<T> comparer)
: base(comparer)
{
}
}
In what conditions isn't it safe?
In what conditions isn't it safe?
Any. You're trying to access the items in a HashSet by index, but they have no logical index. The order in which they are iterated is arbitrary and cannot be relied on, so the method doesn't make sense, at a conceptual level, under any circumstances.
If you want to be able to access items by index then use a collection that is ordered, such as a List.
Related
I have a very large collection that implements the generic IList<T> interface and contains tens of millions of elements, and I would like to process them in parallel using PLINQ. I noticed that the overhead of parallelism is quite significant because processing each individual element is very lightweight, so I am searching for ways to chunkify the processing by splitting the IList<T> into small segments. My goal is to have finally something like this:
IList<Person> source = GetAllPersons();
double averageAge = source
.Segmentate(1000) // Hypothetical operator that segmentates the source
.AsParallel()
.Select(segment => segment.Select(person => (double)person.CalculateAge()).Sum())
.Sum() / source.Count;
I could use the Batch operator from the MoreLinq library, or any of the answers from many related questions, but all of these solutions are allocating multiple small arrays (or lists), and are copying the source into these containers, and I don't want that. In my case I have the additional requirement of keeping the garbage collector idle as much as possible.
I noticed that the .NET has the ArraySegment type, that seems to fit perfectly with my requirements:
// Delimits a section of a one-dimensional array.
public readonly struct ArraySegment<T> : ICollection<T>, IEnumerable<T>,
IEnumerable, IList<T>, IReadOnlyCollection<T>, IReadOnlyList<T>
I could use this type to implement the allocation-free Segmentate operator like this:
/// <summary>Segmentates the source array into sized segments.</summary>
public static IEnumerable<ArraySegment<T>> Segmentate<T>(this T[] source, int size)
{
for (int offset = 0; offset < source.Length; offset += size)
{
yield return new ArraySegment<T>(source, offset,
Math.Min(size, source.Length - offset));
}
}
But I can't use this type because my source is an IList<T> and not an array. And copying it to an array is not really an option, because the source is mutated frequently. And creating new array copies all the time is against my requirements.
So I am searching for a ListSegment<T> type, but as far as I can see it doesn't exist in .NET. Do I have to implement it myself? And if so, how? Or is any other way to segmentate an IList<T> without causing allocations?
Clarification: My source collection is not a List<T>. It is a custom class that implements the IList<T> interface.
You need to implement an ArraySegment<T> equivalent for IList<T>. See implementation below. For optimal performance, consider using spans instead.
ListSegment<T> Struct
public readonly struct ListSegment<T> : IList<T>
{
public List<T> Items { get; }
public int Offset { get; }
public int Count { get; }
public ListSegment(List<T> items, int offset, int count)
{
Items = items ?? throw new ArgumentNullException(nameof(items));
Offset = offset;
Count = count;
if (items.Count < offset + count)
{
throw new ArgumentException("List segment out of range.", nameof(count));
}
}
public void CopyTo(T[] array, int index)
{
if (Count > 0)
{
Items.CopyTo(Offset, array, index, Count);
}
}
public bool Contains(T item) => IndexOf(item) != -1;
public int IndexOf(T item)
{
for (var i = Offset; i < Offset + Count; i++)
{
if (Items[i].Equals(item))
{
return i;
}
}
return -1;
}
private T ElementAt(int index)
{
if (Count > 0)
{
return Items[Offset + index];
}
throw new ArgumentOutOfRangeException(nameof(index));
}
public ListSegmentEnumerator GetEnumerator() => new ListSegmentEnumerator(this);
#region IEnumerable<T> interface
IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
#endregion
#region ICollection<T> interface
bool ICollection<T>.IsReadOnly => true;
void ICollection<T>.Add(T item) => throw new NotImplementedException();
bool ICollection<T>.Remove(T item) => throw new NotImplementedException();
void ICollection<T>.Clear() => throw new NotImplementedException();
#endregion
#region IList<T> interface
void IList<T>.Insert(int index, T item) => throw new NotImplementedException();
void IList<T>.RemoveAt(int index) => throw new NotImplementedException();
T IList<T>.this[int index]
{
get => ElementAt(index);
set => throw new NotImplementedException();
}
#endregion
public struct ListSegmentEnumerator : IEnumerator<T>
{
private readonly List<T> items;
private readonly int start;
private readonly int end;
private int current;
public ListSegmentEnumerator(ListSegment<T> segment)
{
items = segment.Items;
start = segment.Offset;
end = start + segment.Count;
current = start - 1;
}
public bool MoveNext()
{
if (current < end)
{
current++;
return current < end;
}
return false;
}
public T Current => items[current];
object IEnumerator.Current => Current;
void IEnumerator.Reset() => current = start - 1;
void IDisposable.Dispose() { }
}
}
This answer assumes that your concrete IList, is of type List. You can use the GetRange function, which pretty much does what you want:
A shallow copy of a collection of reference types, or a subset of that collection, contains only the references to the elements of the collection. The objects themselves are not copied. The references in the new list point to the same objects as the references in the original list.
Even the ArraySegment<T> will create some kind of reference object to store the array segment, so you can't completely avoid that.
If you want to avoid storing the references (aka shallow copy), then a Span would be in order, but your comment that your collection changes conflicts with this
Items should not be added or removed from the List while the Span is in use.
So, your only other solution, would be to create one yourself as you mentioned.
Warning: There is a reason why such a thing does not exist built in. The array is fixed size, so getting a segment is much safer to handle. Be careful of unexpected consequences and side-effects when creating such a construct. This is the reason why the Span warns you that it's unsafe. You only know your requirements and how your data changes, so your collection wrapper should take them into account and handle them accordingly.
I'm trying to develop a custom collection or list class which provides me the following capabilities:
Add(MyObject)
Add(MyObject, String) ' key
Remove(MyObject)
RemoveByKey(String) ' key
RemoveAt(Index)
Count
Clear
Item(Index)
Item(String) ' key
Contains(MyObject)
ContainsKey(String) ' key
GetEnumerator ' for-each MyObject
I've searched through IEnumerable, IList, ICollection but none are satisfying what I need above. For example, they're all missing storing of objects by Key(string).
How do I create such a collection/list object? I've noticed that the best thing that matches my requirements is the ListViewItemCollection object available by the system. I wish I could see the coding inside it to find out how it has implemented the storing and retrieval of objects.
Can anybody help out? Or guide me to tutorial links.
Thanks.
Example of such class could be System.Windows.Forms.Control.ControlCollection which is implemented like List<KeyValuePair<string,Control>> (actually Control already contains the key) and this[string] is implemented using ordinary for-loop (linear searching for the key).
We can help to speed this up by adding Dictionary and add every item with key to both collection (List+Dictionary). Items without key are added to List only.
EDIT: Further improvement may use List<KeyValuePair<string,T>> and Dictionary<string,KeyValuePair<int,T>> - mapping index from List to Dictionary for faster removing. RemoveAt should check if the key is preset and delete it from dictionary as well. RemoveByKey can get index for internal List.RemoveAt.
ADDON based on comments: implementation of IEnumerable<T> may look like this:
class MyObjectList<T>: IEnumerable<T> {
public IEnumerator<T> GetEnumerator() {
T[] a = items; int n = size;
for(int i = 0; i < n; i++)
yield return a[i]; }
IEnumerator IEnumerable.GetEnumerator() {
return GetEnumerator(); }
...the above are internals of List<T>
ADDON: here you can see my custom ListCore and List created from it (feel free to use it as you wish).
I bet there are tons of easier ways to do this, but here's one approach. You could create a struct containing the key and value of each item:
public sealed class Listionary<K, T> : IDictionary<K, T>, IList<T>
{
private struct ListionaryPair
{
public ListionaryPair(T item) : this()
{
Item = item;
}
public ListionaryPair(K key, T item) : this()
{
Key = key;
Item = item;
}
public K Key { get; private set; }
public T Item { get; private set; }
public bool HasKey { get; private set; }
}
private readonly List<ListionaryPair> list = new List<ListionaryPair>();
(The whole HasKey thing allows value types as K, or null references as valid keys. If you only want string keys you could replace this struct with KeyValuePair<string, T>)
And then both interfaces separately:
public void Add(T item)
{
list.Add(new ListionaryPair(item));
}
public void Add(K key, T item)
{
list.Add(new ListionaryPair(key, item));
}
public void RemoveAt(int index)
{
list.RemoveAt(index);
}
You can hide ugly methods by explicitly implementing them:
void ICollection<KeyValuePair<K, T>>.CopyTo(KeyValuePair<K, T>[] array, int arrayIndex)
{
// code implementing the method
}
You'll need some helper methods for access by key:
private int IndexOfKey(K key)
{
for (int i = 0; i < list.Count; i++)
{
var pair = list[i];
if (pair.HasKey && pair.Key == key)
{
return i;
}
}
return -1;
}
but if you get them right the rest won't be that much of a challenge:
public T this[K key]
{
get
{
int index = IndexOfKey(key);
if (index < 0)
{
throw new IndexOutOfRangeException();
}
return list[index].Item;
}
set
{
int index = IndexOfKey(key);
if (index < 0)
{
throw new IndexOutOfRangeException();
}
list[index] = new ListionaryPair(key, value);
}
}
It's quite a bit of coding to complete each interface method, but most will be short and simple. You'll have to decide whether you allow multiple items with the same key, whether IDictionary<,>.Clear() clears the entire collection or only keyed items, etc.
Also there's no backing Dictionary in this example, so performance might not be that great.
Code updated
For fixing the bug of a filtered Interminable, the following code is updated and merged into original:
public static bool IsInfinity(this IEnumerable x) {
var it=
x as Infinity??((Func<object>)(() => {
var info=x.GetType().GetField("source", bindingAttr);
return null!=info?info.GetValue(x):x;
}))();
return it is Infinity;
}
bindingAttr is declared a constant.
Summary
I'm trying to implement an infinite enumerable, but encountered something seem to be illogical, and temporarily run out of idea. I need some direction to complete the code, becoming a semantic, logical, and reasonable design.
The whole story
I've asked the question a few hours ago:
Is an infinite enumerable still "enumerable"?
This might not be a good pattern of implementation. What I'm trying to do, is implement an enumerable to present infinity, in a logical and semantic way(I thought ..). I would put the code at the last of this post.
The big problem is, it's just for presenting of infinite enumerable, but the enumeration on it in fact doesn't make any sense, since there are no real elements of it.
So, besides provide dummy elements for the enumeration, there are four options I can imagine, and three lead to the StackOverflowException.
Throw an InvalidOperationException once it's going to be enumerated.
public IEnumerator<T> GetEnumerator() {
for(var message="Attempted to enumerate an infinite enumerable"; ; )
throw new InvalidOperationException(message);
}
and 3. are technically equivalent, let the stack overflowing occurs when it's really overflowed.
public IEnumerator<T> GetEnumerator() {
foreach(var x in this)
yield return x;
}
public IEnumerator<T> GetEnumerator() {
return this.GetEnumerator();
}
(described in 2)
Don't wait for it happens, throw StackOverflowException directly.
public IEnumerator<T> GetEnumerator() {
throw new StackOverflowException("... ");
}
The tricky things are:
If option 1 is applied, that is, enumerate on this enumerable, becomes an invalid operation. Isn't it weird to say that this lamp isn't used to illuminate(though it's true in my case).
If option 2 or option 3 is applied, that is, we planned the stack overflowing. Is it really as the title, just when stackoverflow is fair and sensible? Perfectly logical and reasonable?
The last choice is option 4. However, the stack in fact does not really overflow, since we prevented it by throwing a fake StackOverflowException. This reminds me that when Tom Cruise plays John Anderton said that: "But it didn't fall. You caught it. The fact that you prevented it from happening doesnt change the fact that it was going to happen."
Some good ways to avoid the illogical problems?
The code is compile-able and testable, note that one of OPTION_1 to OPTION_4 shoule be defined before compile.
Simple test
var objects=new object[] { };
Debug.Print("{0}", objects.IsInfinity());
var infObjects=objects.AsInterminable();
Debug.Print("{0}", infObjects.IsInfinity());
Classes
using System.Collections.Generic;
using System.Collections;
using System;
public static partial class Interminable /* extensions */ {
public static Interminable<T> AsInterminable<T>(this IEnumerable<T> x) {
return Infinity.OfType<T>();
}
public static Infinity AsInterminable(this IEnumerable x) {
return Infinity.OfType<object>();
}
public static bool IsInfinity(this IEnumerable x) {
var it=
x as Infinity??((Func<object>)(() => {
var info=x.GetType().GetField("source", bindingAttr);
return null!=info?info.GetValue(x):x;
}))();
return it is Infinity;
}
const BindingFlags bindingAttr=
BindingFlags.Instance|BindingFlags.NonPublic;
}
public abstract partial class Interminable<T>: Infinity, IEnumerable<T> {
IEnumerator IEnumerable.GetEnumerator() {
return this.GetEnumerator();
}
#if OPTION_1
public IEnumerator<T> GetEnumerator() {
for(var message="Attempted to enumerate an infinite enumerable"; ; )
throw new InvalidOperationException(message);
}
#endif
#if OPTION_2
public IEnumerator<T> GetEnumerator() {
foreach(var x in this)
yield return x;
}
#endif
#if OPTION_3
public IEnumerator<T> GetEnumerator() {
return this.GetEnumerator();
}
#endif
#if OPTION_4
public IEnumerator<T> GetEnumerator() {
throw new StackOverflowException("... ");
}
#endif
public Infinity LongCount<U>(
Func<U, bool> predicate=default(Func<U, bool>)) {
return this;
}
public Infinity Count<U>(
Func<U, bool> predicate=default(Func<U, bool>)) {
return this;
}
public Infinity LongCount(
Func<T, bool> predicate=default(Func<T, bool>)) {
return this;
}
public Infinity Count(
Func<T, bool> predicate=default(Func<T, bool>)) {
return this;
}
}
public abstract partial class Infinity: IFormatProvider, ICustomFormatter {
partial class Instance<T>: Interminable<T> {
public static readonly Interminable<T> instance=new Instance<T>();
}
object IFormatProvider.GetFormat(Type formatType) {
return typeof(ICustomFormatter)!=formatType?null:this;
}
String ICustomFormatter.Format(
String format, object arg, IFormatProvider formatProvider) {
return "Infinity";
}
public override String ToString() {
return String.Format(this, "{0}", this);
}
public static Interminable<T> OfType<T>() {
return Instance<T>.instance;
}
}
public IEnumerator<T> GetEnumerator()
{
while (true)
yield return default(T);
}
This will create an infinite enumerator - a foreach on it will never end and will just continue to give out the default value.
Note that you will not be able to determine IsInfinity() the way you wrote in your code. That is because new Infinity().Where(o => o == /*do any kind of comparison*/) will still be infinite but will have a different type.
As mentioned in the other post you linked, an infinite enumeration makes perfectly sense for C# to enumerate and there are an huge amount of real-world examples where people write enumerators that just do never end(first thing that springs off my mind is a random number generator).
So you have a particular case in your mathematical problem, where you need to define a special value (infinite number of points of intersection). Usually, that is where I use simple static constants for. Just define some static constant IEnumerable and test against it to find out whether your algorithm had the "infinite number of intersection" as result.
To more specific answer your current question: DO NOT EVER EVER cause a real stack overflow. This is about the nastiest thing you can do to users of your code. It can not be caught and will immediately terminate your process(probably the only exception is when you are running inside an attached instrumenting debugger).
If at all, I would use NotSupportedException which is used in other places to signal that some class do not support a feature(E.g. ICollections may throw this in Remove() if they are read-only).
If I understand correctly -- infinite is a confusing word here. I think you need a monad which is either enumerable or not. But let's stick with infinite for now.
I cannot think of a nice way of implementing this in C#. All ways this could be implemented don't integrate with C# generators.
With C# generator, you can only emit valid values; so there's no way to indicate that this is an infinite enumerable. I don't like idea of throwing exceptions from generator to indicate that it is infinite; because to check that it is infinite, you will have to to try-catch every time.
If you don't need to support generators, then I see following options :
Implement sentinel enumerable:
public class InfiniteEnumerable<T>: IEnumerable<T> {
private static InfiniteEnumerable<T> val;
public static InfiniteEnumerable<T> Value {
get {
return val;
}
}
public IEnumerator<T> GetEnumerator() {
throw new InvalidOperationException(
"This enumerable cannot be enumerated");
}
IEnumerator IEnumerable.GetEnumerator() {
throw new InvalidOperationException(
"This enumerable cannot be enumerated");
}
}
Sample usage:
IEnumerable<int> enumerable=GetEnumerable();
if(enumerable==InfiniteEnumerable<int>.Value) {
// This is 'infinite' enumerable.
}
else {
// enumerate it here.
}
Implement Infinitable<T> wrapper:
public class Infinitable<T>: IEnumerable<T> {
private IEnumerable<T> enumerable;
private bool isInfinite;
public Infinitable(IEnumerable<T> enumerable) {
this.enumerable=enumerable;
this.isInfinite=false;
}
public Infinitable() {
this.isInfinite=true;
}
public bool IsInfinite {
get {
return isInfinite;
}
}
public IEnumerator<T> GetEnumerator() {
if(isInfinite) {
throw new InvalidOperationException(
"The enumerable cannot be enumerated");
}
return this.enumerable.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator() {
if(isInfinite) {
throw new InvalidOperationException(
"The enumerable cannot be enumerated");
}
return this.enumerable.GetEnumerator();
}
}
Sample usage:
Infinitable<int> enumerable=GetEnumerable();
if(enumerable.IsInfinite) {
// This is 'infinite' enumerable.
}
else {
// enumerate it here.
foreach(var i in enumerable) {
}
}
Infinite sequences may be perfectly iterable/enumerable. Natural numbers are enumerable and so are rational numbers or PI digits. Infinite is the opposite of finite, not enumerable.
The variants that you've provided don't represent the infinite sequences. There are infinitely many different infinite sequences and you can see that they're different by iterating through them. Your idea, on the other hand, is to have a singleton, which goes against that diversity.
If you have something that cannot be enumerated (like the set of real numbers), then you just shouldn't define it as IEnumerable as it's breaking the contract.
If you want to discern between finite and infinite enumerable sequences, just crate a new interface IInfiniteEnumerable : IEnumerable and mark infinite sequences with it.
Interface that marks infinite sequences
public interface IInfiniteEnumerable<T> : IEnumerable<T> {
}
A wrapper to convert an existing IEnumerable<T> to IInfiniteEnumerable<T> (IEnumerables are easily created with C#'s yield syntax, but we need to convert them to IInfiniteEnumerable )
public class InfiniteEnumerableWrapper<T> : IInfiniteEnumerable<T> {
IEnumerable<T> _enumerable;
public InfiniteEnumerableWrapper(IEnumerable<T> enumerable) {
_enumerable = enumerable;
}
public IEnumerator<T> GetEnumerator() {
return _enumerable.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator() {
return _enumerable.GetEnumerator();
}
}
Some infinity-aware routines (like calculating the sequence length)
//TryGetCount() returns null if the sequence is infinite
public static class EnumerableExtensions {
public static int? TryGetCount<T>(this IEnumerable<T> sequence) {
if (sequence is IInfiniteEnumerable<T>) {
return null;
} else {
return sequence.Count();
}
}
}
Two examples of sequences - a finite range sequence and the infinite Fibonacci sequence.
public class Sequences {
public static IEnumerable<int> GetIntegerRange(int start, int count) {
return Enumerable.Range(start, count);
}
public static IInfiniteEnumerable<int> GetFibonacciSequence() {
return new InfiniteEnumerableWrapper<int>(GetFibonacciSequenceInternal());
}
static IEnumerable<int> GetFibonacciSequenceInternal() {
var p = 0;
var q = 1;
while (true) {
yield return p;
var newQ = p + q;
p = q;
q = newQ;
}
}
}
A test app that generates random sequences and tries to calculate their lengths.
public class TestApp {
public static void Main() {
for (int i = 0; i < 20; i++) {
IEnumerable<int> sequence = GetRandomSequence();
Console.WriteLine(sequence.TryGetCount() ?? double.PositiveInfinity);
}
Console.ReadLine();
}
static Random _rng = new Random();
//Randomly generates an finite or infinite sequence
public static IEnumerable<int> GetRandomSequence() {
int random = _rng.Next(5) * 10;
if (random == 0) {
return Sequences.GetFibonacciSequence();
} else {
return Sequences.GetIntegerRange(0, random);
}
}
}
The program output something like this:
20
40
20
10
20
10
20
Infinity
40
30
40
Infinity
Infinity
40
40
30
20
30
40
30
Is there a significant complexity difference between these two implementation or does the compiler optimize it anyway?
Usage:
for(int i = 0; i < int.MaxValue; i++)
{
foreach(var item in GoodItems)
{
if(DoSomethingBad(item))
break; // this is later added.
}
}
Implementation (1):
public IEnumerable<T> GoodItems
{
get { return _list.Where(x => x.IsGood); }
}
Implementation (2):
public IEnumerable<T> GoodItems
{
get { foreach(var item in _list.Where(x => x.IsGood)) yield return item; }
}
It appears that IEnumerable methods should always be implemented using (2)? When is one better than the other?
I just built an example program and then used ILSpy to examine the output assembly. The second option will actually generate an extra class that wraps the call to Where but adds zero value to the code. The extra layer the code must follow will probably not cause performance issues in most programs but consider all the extra syntax just to perform the same thing at a slightly slower speed. Not worth it in my book.
where uses yield return internally. You don't need to wrap it in another yield return.
You do _list.where(x => x.IsGood); in both. With that said, isn't it obvious which has to be the better usage?
yield return has its usages, but this scenario, especially in a getter, is not the one
The extra code without payload in "implementation 2" is the less evil here.
Both variants lead to undesirable creation of new object each time you call the property getter. So, results of two sequential getter calls will not be equal:
interface IItem
{
bool IsGood { get; set; }
}
class ItemsContainer<T>
where T : IItem
{
private readonly List<T> items = new List<T>();
public IEnumerable<T> GoodItems
{
get { return items.Where(item => item.IsGood); }
}
// ...
}
// somewhere in code
class Item : IItem { /* ... */ }
var container = new ItemsContainer<Item>();
Console.WriteLine(container.GoodItems == container.GoodItems); // False; Oops!
You should avoid this side-effect:
class ItemsContainer<T>
where T : IItem
{
private readonly List<T> items;
private readonly Lazy<IEnumerable<T>> goodItems;
public ItemsContainer()
{
this.items = new List<T>();
this.goodItems = new Lazy<IEnumerable<T>>(() => items.Where(item => item.IsGood));
}
public IEnumerable<T> GoodItems
{
get { return goodItems.Value; }
}
// ...
}
or make a method instead of property:
public IEnumerable<T> GetGoodItems()
{
return _list.Where(x => x.IsGood);
}
Also, the property is not a good idea, if you want to provide snapshot of your items to the client code.
Internally, the first version gets compiled down to something that looks like this:
public IEnumerable<T> GoodItems
{
get
{
foreach (var item in _list)
if (item.IsGood)
yield return item;
}
}
Whereas the second one will now look something like:
public IEnumerable<T> GoodItems
{
get
{
foreach (var item in GoodItemsHelper)
yield return item;
}
}
private IEnumerable<T> GoodItemsHelper
{
get
{
foreach (var item in _list)
if (item.IsGood)
yield return item;
}
}
The Where clause in LINQ is implemented with deferred execution. So there's no need to apply the foreach (...) yield return ... pattern. You're making more work for yourself, and potentially for the runtime.
I don't know if the second version gets jitted to the same thing as the first. Semantically, the two are distinct in that the first does a single round of deferred execution while the second does two rounds. On those grounds I'd argue that the second would be more complex.
The real question you need to ask is: When you're exposing the IEnumerable, what guarantees are you making? Are you saying that you want to simply provide forward iteration? Or are you stating that your interface provides deferred execution?
In the code below, my intent for is to simply provide forward enumeration without random access:
private List<Int32> _Foo = new List<Int32>() { 1, 2, 3, 4, 5 };
public IEnumerable<Int32> Foo
{
get
{
return _Foo;
}
}
But here, I want to prevent unnecessary computation. I want my expensive computation to be performed only when a result is requested.
private List<Int32> _Foo = new List<Int32>() { 1, 2, 3, 4, 5 };
public IEnumerable<Int32> Foo
{
get
{
foreach (var item in _Foo)
{
var result = DoSomethingExpensive(item);
yield return result;
}
}
}
Even though both versions of Foo look identical on the outside, their internal implementation does different things. That's the part that you need to watch out for. When you use LINQ, you don't need to worry about deferring execution since most operators do it for you. In your own code, you may wish to go with the first or second depending on your needs.
I need a HashSet that preserves insertion ordering, are there any implementations of this in the framework?
Standard .NET HashSet do not preserve the insertion order.
For simple tests the insertion order may be preserved due to an accident, but it's not guaranteed and would not always work that way. To prove that it is enough to do some removals in between.
See this question for more information on that: Does HashSet preserve insertion order?
I have briefly implemented a HashSet which guarantees insertion order. It uses the Dictionary to look up items and the LinkedList to preserve order. All three insertion, removal and lookup work still in O(1).
public class OrderedSet<T> : ICollection<T>
{
private readonly IDictionary<T, LinkedListNode<T>> m_Dictionary;
private readonly LinkedList<T> m_LinkedList;
public OrderedSet()
: this(EqualityComparer<T>.Default)
{
}
public OrderedSet(IEqualityComparer<T> comparer)
{
m_Dictionary = new Dictionary<T, LinkedListNode<T>>(comparer);
m_LinkedList = new LinkedList<T>();
}
public int Count => m_Dictionary.Count;
public virtual bool IsReadOnly => m_Dictionary.IsReadOnly;
void ICollection<T>.Add(T item)
{
Add(item);
}
public bool Add(T item)
{
if (m_Dictionary.ContainsKey(item)) return false;
var node = m_LinkedList.AddLast(item);
m_Dictionary.Add(item, node);
return true;
}
public void Clear()
{
m_LinkedList.Clear();
m_Dictionary.Clear();
}
public bool Remove(T item)
{
if (item == null) return false;
var found = m_Dictionary.TryGetValue(item, out var node);
if (!found) return false;
m_Dictionary.Remove(item);
m_LinkedList.Remove(node);
return true;
}
public IEnumerator<T> GetEnumerator()
{
return m_LinkedList.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public bool Contains(T item)
{
return item != null && m_Dictionary.ContainsKey(item);
}
public void CopyTo(T[] array, int arrayIndex)
{
m_LinkedList.CopyTo(array, arrayIndex);
}
}
You can get this functionality easily using KeyedCollection<TKey,TItem> specifying the same type argument for TKey and TItem:
public class OrderedHashSet<T> : KeyedCollection<T, T>
{
protected override T GetKeyForItem(T item)
{
return item;
}
}
If you need constant complexity of Add, Remove, Contains and order preservation, then there's no such collection in .NET Framework 4.5.
If you're okay with 3rd party code, take a look at my repository (permissive MIT license):
https://github.com/OndrejPetrzilka/Rock.Collections
There's OrderedHashSet<T> collection:
based on classic HashSet<T> source code (from .NET Core)
preserves order of insertions and allows manual reordering
features reversed enumeration
has same operation complexities as HashSet<T>
Add and Remove operations are 20% slower compared to HashSet<T>
consumes 8 more bytes of memory per item
You can use OrderedDictionary to preserve the order of insertion. But beware of the cost of Removing items (O(n)).