HashSet that preserves ordering - c#

I need a HashSet that preserves insertion ordering. Are there any implementations of this in the framework?

The standard .NET HashSet does not preserve insertion order.
In simple tests the insertion order may appear to be preserved by accident, but this is not guaranteed and will not always hold. To see it break, it is enough to do some removals between insertions.
See this question for more information on that: Does HashSet preserve insertion order?
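The removal experiment mentioned above can be sketched as follows. This is only an illustration (the class and method names are made up for this example), and the exact enumeration order is an implementation detail that may differ between runtimes, so the code prints the order rather than asserting it.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetOrderDemo
{
    // Builds a set, removes an early element, then adds a new one.
    public static HashSet<int> BuildSet()
    {
        var set = new HashSet<int> { 1, 2, 3 };
        set.Remove(1); // frees a slot inside the set
        set.Add(4);    // may be stored in the freed slot rather than at the end
        return set;
    }

    public static void Main()
    {
        // Insertion order would be "2, 3, 4"; the printed order is not
        // guaranteed to match it.
        Console.WriteLine(string.Join(", ", BuildSet()));
    }
}
```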
I have briefly implemented a HashSet which guarantees insertion order. It uses a Dictionary to look up items and a LinkedList to preserve order. Insertion, removal and lookup all still work in O(1).
using System.Collections;
using System.Collections.Generic;
public class OrderedSet<T> : ICollection<T>
{
private readonly IDictionary<T, LinkedListNode<T>> m_Dictionary;
private readonly LinkedList<T> m_LinkedList;
public OrderedSet()
: this(EqualityComparer<T>.Default)
{
}
public OrderedSet(IEqualityComparer<T> comparer)
{
m_Dictionary = new Dictionary<T, LinkedListNode<T>>(comparer);
m_LinkedList = new LinkedList<T>();
}
public int Count => m_Dictionary.Count;
public virtual bool IsReadOnly => m_Dictionary.IsReadOnly;
void ICollection<T>.Add(T item)
{
Add(item);
}
public bool Add(T item)
{
if (m_Dictionary.ContainsKey(item)) return false;
var node = m_LinkedList.AddLast(item);
m_Dictionary.Add(item, node);
return true;
}
public void Clear()
{
m_LinkedList.Clear();
m_Dictionary.Clear();
}
public bool Remove(T item)
{
if (item == null) return false;
var found = m_Dictionary.TryGetValue(item, out var node);
if (!found) return false;
m_Dictionary.Remove(item);
m_LinkedList.Remove(node);
return true;
}
public IEnumerator<T> GetEnumerator()
{
return m_LinkedList.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public bool Contains(T item)
{
return item != null && m_Dictionary.ContainsKey(item);
}
public void CopyTo(T[] array, int arrayIndex)
{
m_LinkedList.CopyTo(array, arrayIndex);
}
}

You can get this functionality easily using KeyedCollection<TKey,TItem> specifying the same type argument for TKey and TItem:
public class OrderedHashSet<T> : KeyedCollection<T, T>
{
protected override T GetKeyForItem(T item)
{
return item;
}
}
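A short usage sketch of this approach, repeating the class so the example is self-contained. The demo class name is made up. Two behaviours differ from HashSet<T>: adding a duplicate throws an ArgumentException instead of returning false, and Remove is O(n) because the items live in a list.

```csharp
using System;
using System.Collections.ObjectModel;

public class OrderedHashSet<T> : KeyedCollection<T, T>
{
    protected override T GetKeyForItem(T item) => item;
}

public static class OrderedHashSetDemo
{
    public static void Main()
    {
        var set = new OrderedHashSet<string> { "pear", "apple", "fig" };

        // Adding a duplicate would throw, so check Contains first.
        if (!set.Contains("apple")) set.Add("apple");

        set.Remove("pear"); // O(n): the backing list shifts its elements
        set.Add("pear");    // re-added at the end

        Console.WriteLine(string.Join(", ", set)); // apple, fig, pear
    }
}
```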

If you need constant complexity of Add, Remove, Contains and order preservation, then there's no such collection in .NET Framework 4.5.
If you're okay with 3rd party code, take a look at my repository (permissive MIT license):
https://github.com/OndrejPetrzilka/Rock.Collections
There's OrderedHashSet<T> collection:
based on classic HashSet<T> source code (from .NET Core)
preserves order of insertions and allows manual reordering
features reversed enumeration
has same operation complexities as HashSet<T>
Add and Remove operations are 20% slower compared to HashSet<T>
consumes 8 more bytes of memory per item

You can use OrderedDictionary to preserve the order of insertion. But beware of the cost of removing items, which is O(n).

Related

How to segmentate an IList<T> to segments of N size, without creating copies and without memory allocations?

I have a very large collection that implements the generic IList<T> interface and contains tens of millions of elements, and I would like to process them in parallel using PLINQ. I noticed that the overhead of parallelism is quite significant because processing each individual element is very lightweight, so I am searching for ways to chunkify the processing by splitting the IList<T> into small segments. My goal is to have finally something like this:
IList<Person> source = GetAllPersons();
double averageAge = source
.Segmentate(1000) // Hypothetical operator that segmentates the source
.AsParallel()
.Select(segment => segment.Select(person => (double)person.CalculateAge()).Sum())
.Sum() / source.Count;
I could use the Batch operator from the MoreLinq library, or any of the answers from many related questions, but all of these solutions are allocating multiple small arrays (or lists), and are copying the source into these containers, and I don't want that. In my case I have the additional requirement of keeping the garbage collector idle as much as possible.
I noticed that .NET has the ArraySegment type, which seems to fit my requirements perfectly:
// Delimits a section of a one-dimensional array.
public readonly struct ArraySegment<T> : ICollection<T>, IEnumerable<T>,
IEnumerable, IList<T>, IReadOnlyCollection<T>, IReadOnlyList<T>
I could use this type to implement the allocation-free Segmentate operator like this:
/// <summary>Segmentates the source array into sized segments.</summary>
public static IEnumerable<ArraySegment<T>> Segmentate<T>(this T[] source, int size)
{
for (int offset = 0; offset < source.Length; offset += size)
{
yield return new ArraySegment<T>(source, offset,
Math.Min(size, source.Length - offset));
}
}
But I can't use this type because my source is an IList<T> and not an array. And copying it to an array is not really an option, because the source is mutated frequently. And creating new array copies all the time is against my requirements.
So I am searching for a ListSegment<T> type, but as far as I can see it doesn't exist in .NET. Do I have to implement it myself? And if so, how? Or is there any other way to segmentate an IList<T> without causing allocations?
Clarification: My source collection is not a List<T>. It is a custom class that implements the IList<T> interface.
You need to implement an ArraySegment<T> equivalent for IList<T>. See implementation below. For optimal performance, consider using spans instead.
ListSegment<T> Struct
public readonly struct ListSegment<T> : IList<T>
{
public List<T> Items { get; }
public int Offset { get; }
public int Count { get; }
public ListSegment(List<T> items, int offset, int count)
{
Items = items ?? throw new ArgumentNullException(nameof(items));
Offset = offset;
Count = count;
if (items.Count < offset + count)
{
throw new ArgumentException("List segment out of range.", nameof(count));
}
}
public void CopyTo(T[] array, int index)
{
if (Count > 0)
{
Items.CopyTo(Offset, array, index, Count);
}
}
public bool Contains(T item) => IndexOf(item) != -1;
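public int IndexOf(T item)
{
// Returns the index relative to the start of the segment, or -1 if not found.
var comparer = EqualityComparer<T>.Default;
for (var i = 0; i < Count; i++)
{
if (comparer.Equals(Items[Offset + i], item))
{
return i;
}
}
return -1;
}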
private T ElementAt(int index)
{
// Bounds-check against the segment, not the underlying list.
if (index >= 0 && index < Count)
{
return Items[Offset + index];
}
throw new ArgumentOutOfRangeException(nameof(index));
}
public ListSegmentEnumerator GetEnumerator() => new ListSegmentEnumerator(this);
#region IEnumerable<T> interface
IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
#endregion
#region ICollection<T> interface
bool ICollection<T>.IsReadOnly => true;
void ICollection<T>.Add(T item) => throw new NotImplementedException();
bool ICollection<T>.Remove(T item) => throw new NotImplementedException();
void ICollection<T>.Clear() => throw new NotImplementedException();
#endregion
#region IList<T> interface
void IList<T>.Insert(int index, T item) => throw new NotImplementedException();
void IList<T>.RemoveAt(int index) => throw new NotImplementedException();
T IList<T>.this[int index]
{
get => ElementAt(index);
set => throw new NotImplementedException();
}
#endregion
public struct ListSegmentEnumerator : IEnumerator<T>
{
private readonly List<T> items;
private readonly int start;
private readonly int end;
private int current;
public ListSegmentEnumerator(ListSegment<T> segment)
{
items = segment.Items;
start = segment.Offset;
end = start + segment.Count;
current = start - 1;
}
public bool MoveNext()
{
if (current < end)
{
current++;
return current < end;
}
return false;
}
public T Current => items[current];
object IEnumerator.Current => Current;
void IEnumerator.Reset() => current = start - 1;
void IDisposable.Dispose() { }
}
}
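Since the question states that the source is a custom IList<T> rather than a List<T>, here is a minimal sketch of the same idea over the interface, together with a Segmentate operator. The names ListWindow<T> and ListWindowExtensions are hypothetical. The window is a struct holding a reference and two integers, so yielding one does not allocate; the iterator itself is a single allocation per call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical read-only window over a range of an IList<T>.
public readonly struct ListWindow<T>
{
    private readonly IList<T> _source;
    public int Offset { get; }
    public int Count { get; }

    public ListWindow(IList<T> source, int offset, int count)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
        if (offset < 0 || count < 0 || offset > source.Count - count)
            throw new ArgumentOutOfRangeException(nameof(count));
        Offset = offset;
        Count = count;
    }

    // The index is relative to the start of the window.
    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _source[Offset + index];
        }
    }
}

public static class ListWindowExtensions
{
    public static IEnumerable<ListWindow<T>> Segmentate<T>(this IList<T> source, int size)
    {
        // Validate eagerly; the iterator body below runs lazily.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(source, size);
    }

    private static IEnumerable<ListWindow<T>> Iterate<T>(IList<T> source, int size)
    {
        for (int offset = 0; offset < source.Count; offset += size)
        {
            yield return new ListWindow<T>(source, offset, Math.Min(size, source.Count - offset));
        }
    }
}
```

Each window can then be consumed with a plain for loop over its indexer, which avoids enumerator allocations. The windows are only valid while the source keeps its length; if items are added or removed during processing, the offsets no longer describe the same elements.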
This answer assumes that your concrete IList is of type List. You can use the GetRange method, which pretty much does what you want:
A shallow copy of a collection of reference types, or a subset of that collection, contains only the references to the elements of the collection. The objects themselves are not copied. The references in the new list point to the same objects as the references in the original list.
Even the ArraySegment<T> will create some kind of reference object to store the array segment, so you can't completely avoid that.
If you want to avoid storing the references (aka shallow copy), then a Span would be in order, but your comment that your collection changes conflicts with this:
Items should not be added or removed from the List while the Span is in use.
So, your only other solution, would be to create one yourself as you mentioned.
Warning: There is a reason why such a thing does not exist built in. An array has a fixed size, so a segment of it is much safer to handle. Be careful of unexpected consequences and side effects when creating such a construct. This is the reason why the Span documentation warns you that it's unsafe. Only you know your requirements and how your data changes, so your collection wrapper should take them into account and handle them accordingly.

What would make this HashSet implementation fail?

In order to make my life easier when dealing with strings, I want to use a HashSet initialized with StringComparer.OrdinalIgnoreCase.
But sometimes, I need to perform an operation on all items.
This is clearly not the way I will achieve my goal, for obvious performance reasons, but I'd like to know if this code makes sense, especially the setter of the indexer, and how it could cause unwanted side effects on the collection.
Here is the HashSet implementation:
public class MyHashSet<T> : HashSet<T>
{
public T this[int index]
{
get
{
int i = 0;
foreach (T t in this)
{
if (i == index)
return t;
i++;
}
throw new IndexOutOfRangeException();
}
set
{
int i = 0;
foreach (T t in this)
{
if (i == index)
{
this.RemoveWhere(element => element.Equals(t));
this.Add(value);
return;
}
i++;
}
throw new IndexOutOfRangeException();
}
}
public MyHashSet()
{
}
public MyHashSet(IEnumerable<T> collection)
: base(collection)
{
}
public MyHashSet(IEnumerable<T> collection, IEqualityComparer<T> comparer)
: base(collection, comparer)
{
}
public MyHashSet(IEqualityComparer<T> comparer)
: base(comparer)
{
}
}
In what conditions isn't it safe?
Any. You're trying to access the items in a HashSet by index, but they have no logical index. The order in which they are iterated is arbitrary and cannot be relied on, so the method doesn't make sense, at a conceptual level, under any circumstances.
If you want to be able to access items by index then use a collection that is ordered, such as a List.
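Apart from the arbitrary order, one concrete side effect of the setter can be sketched like this. The helper below only emulates the shape of the question's setter (remove the old element, then add the new one) and is not the original code. If the new value is equal to an element that is already in the set under the set's comparer, the Add is a no-op and the set shrinks.

```csharp
using System;
using System.Collections.Generic;

public static class IndexerSetterDemo
{
    // Emulates "set[i] = value": remove the old element, add the new one.
    public static int ReplaceAndCount(HashSet<string> set, string oldItem, string newItem)
    {
        set.Remove(oldItem);
        set.Add(newItem); // returns false and adds nothing if an equal element exists
        return set.Count;
    }

    public static void Main()
    {
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "alpha", "beta" };

        // "BETA" equals "beta" under OrdinalIgnoreCase, so the set ends up smaller.
        Console.WriteLine(ReplaceAndCount(set, "alpha", "BETA")); // 1
    }
}
```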

Passing a single item as IEnumerable<T>

Is there a common way to pass a single item of type T to a method which expects an IEnumerable<T> parameter? Language is C#, framework version 2.0.
Currently I am using a helper method (it's .Net 2.0, so I have a whole bunch of casting/projecting helper methods similar to LINQ), but this just seems silly:
public static class IEnumerableExt
{
// usage: IEnumerableExt.FromSingleItem(someObject);
public static IEnumerable<T> FromSingleItem<T>(T item)
{
yield return item;
}
}
Other way would of course be to create and populate a List<T> or an Array and pass it instead of IEnumerable<T>.
[Edit] As an extension method it might be named:
public static class IEnumerableExt
{
// usage: someObject.SingleItemAsEnumerable();
public static IEnumerable<T> SingleItemAsEnumerable<T>(this T item)
{
yield return item;
}
}
Am I missing something here?
[Edit2] We found someObject.Yield() (as @Peter suggested in the comments below) to be the best name for this extension method, mainly for brevity, so here it is along with the XML comment if anyone wants to grab it:
public static class IEnumerableExt
{
/// <summary>
/// Wraps this object instance into an IEnumerable<T>
/// consisting of a single item.
/// </summary>
/// <typeparam name="T"> Type of the object. </typeparam>
/// <param name="item"> The instance that will be wrapped. </param>
/// <returns> An IEnumerable<T> consisting of a single item. </returns>
public static IEnumerable<T> Yield<T>(this T item)
{
yield return item;
}
}
Well, if the method expects an IEnumerable you've got to pass something that is a list, even if it contains one element only.
passing
new[] { item }
as the argument should be enough I think
In C# 3.0 you can utilize the System.Linq.Enumerable class:
// using System.Linq
Enumerable.Repeat(item, 1);
This will create a new IEnumerable that only contains your item.
Your helper method is the cleanest way to do it, IMO. If you pass in a list or an array, then an unscrupulous piece of code could cast it and change the contents, leading to odd behaviour in some situations. You could use a read-only collection, but that's likely to involve even more wrapping. I think your solution is as neat as it gets.
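The casting concern can be illustrated with a small sketch. The names CastDemo and TryTamper are invented for this example, and the pattern-matching syntax needs a newer compiler than the C# 2.0 mentioned in the question. A consumer that downcasts its argument can modify an array, but has nothing to cast an iterator to.

```csharp
using System;
using System.Collections.Generic;

public static class CastDemo
{
    public static IEnumerable<T> Yield<T>(T item)
    {
        yield return item;
    }

    // A consumer that downcasts its argument and mutates it if it can.
    public static bool TryTamper(IEnumerable<int> items)
    {
        if (items is int[] array && array.Length > 0)
        {
            array[0] = -1;
            return true;
        }
        return false;
    }

    public static void Main()
    {
        var asArray = new[] { 42 };
        Console.WriteLine(TryTamper(asArray));   // True
        Console.WriteLine(asArray[0]);           // -1
        Console.WriteLine(TryTamper(Yield(42))); // False
    }
}
```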
In C# 3 (I know you said 2), you can write a generic extension method which might make the syntax a little more acceptable:
static class IEnumerableExtensions
{
public static IEnumerable<T> ToEnumerable<T>(this T item)
{
yield return item;
}
}
client code is then item.ToEnumerable().
This helper method works for one item or many.
public static IEnumerable<T> ToEnumerable<T>(params T[] items)
{
return items;
}
I'm kind of surprised that no one suggested a new overload of the method with an argument of type T to simplify the client API.
public void DoSomething<T>(IEnumerable<T> list)
{
// Do Something
}
public void DoSomething<T>(T item)
{
DoSomething(new T[] { item });
}
Now your client code can just do this:
MyItem item = new MyItem();
Obj.DoSomething(item);
or with a list:
List<MyItem> itemList = new List<MyItem>();
Obj.DoSomething(itemList);
Either (as has previously been said)
MyMethodThatExpectsAnIEnumerable(new[] { myObject });
or
MyMethodThatExpectsAnIEnumerable(Enumerable.Repeat(myObject, 1));
As a side note, the last version can also be nice if you want an empty list of an anonymous object, e.g.
var x = MyMethodThatExpectsAnIEnumerable(Enumerable.Repeat(new { a = 0, b = "x" }, 0));
I agree with @EarthEngine's comments to the original post, which is that 'AsSingleton' is a better name. See this wikipedia entry. Then it follows from the definition of singleton that if a null value is passed as an argument that 'AsSingleton' should return an IEnumerable with a single null value instead of an empty IEnumerable which would settle the if (item == null) yield break; debate. I think the best solution is to have two methods: 'AsSingleton' and 'AsSingletonOrEmpty'; where, in the event that a null is passed as an argument, 'AsSingleton' will return a single null value and 'AsSingletonOrEmpty' will return an empty IEnumerable. Like this:
public static IEnumerable<T> AsSingletonOrEmpty<T>(this T source)
{
if (source == null)
{
yield break;
}
else
{
yield return source;
}
}
public static IEnumerable<T> AsSingleton<T>(this T source)
{
yield return source;
}
Then, these would, more or less, be analogous to the 'First' and 'FirstOrDefault' extension methods on IEnumerable which just feels right.
This is 30% faster than yield or Enumerable.Repeat when used in foreach due to this C# compiler optimization, and of the same performance in other cases.
public struct SingleSequence<T> : IEnumerable<T> {
public struct SingleEnumerator : IEnumerator<T> {
private readonly SingleSequence<T> _parent;
private bool _couldMove;
public SingleEnumerator(ref SingleSequence<T> parent) {
_parent = parent;
_couldMove = true;
}
public T Current => _parent._value;
object IEnumerator.Current => Current;
public void Dispose() { }
public bool MoveNext() {
if (!_couldMove) return false;
_couldMove = false;
return true;
}
public void Reset() {
_couldMove = true;
}
}
private readonly T _value;
public SingleSequence(T value) {
_value = value;
}
public IEnumerator<T> GetEnumerator() {
return new SingleEnumerator(ref this);
}
IEnumerator IEnumerable.GetEnumerator() {
return new SingleEnumerator(ref this);
}
}
in this test:
// Fastest among seqs, but still 30x times slower than direct sum
// 49 mops vs 37 mops for yield, or c.30% faster
[Test]
public void SingleSequenceStructForEach() {
var sw = new Stopwatch();
sw.Start();
long sum = 0;
for (var i = 0; i < 100000000; i++) {
foreach (var single in new SingleSequence<int>(i)) {
sum += single;
}
}
sw.Stop();
Console.WriteLine($"Elapsed {sw.ElapsedMilliseconds}");
Console.WriteLine($"Mops {100000.0 / sw.ElapsedMilliseconds * 1.0}");
}
As I have just found, and seen that user LukeH suggested too, a nice simple way of doing this is as follows:
public static void PerformAction(params YourType[] items)
{
// Forward call to IEnumerable overload
PerformAction(items.AsEnumerable());
}
public static void PerformAction(IEnumerable<YourType> items)
{
foreach (YourType item in items)
{
// Do stuff
}
}
This pattern will allow you to call the same functionality in a multitude of ways: a single item; multiple items (comma-separated); an array; a list; an enumeration, etc.
I'm not 100% sure on the efficiency of using the AsEnumerable method though, but it does work a treat.
Update: The AsEnumerable function looks pretty efficient! (reference)
Although it's overkill for one method, I believe some people may find the Interactive Extensions useful.
The Interactive Extensions (Ix) from Microsoft includes the following method.
public static IEnumerable<TResult> Return<TResult>(TResult value)
{
yield return value;
}
Which can be utilized like so:
var result = EnumerableEx.Return(0);
Ix adds new functionality not found in the original Linq extension methods, and is a direct result of creating the Reactive Extensions (Rx).
Think, Linq Extension Methods + Ix = Rx for IEnumerable.
You can find both Rx and Ix on CodePlex.
I recently asked the same thing on another post
Is there a way to call a C# method requiring an IEnumerable<T> with a single value? ...with benchmarking.
I wanted people stopping by here to see the brief benchmark comparison shown at that newer post for 4 of the approaches presented in these answers.
It seems that simply writing new[] { x } in the arguments to the method is the shortest and fastest solution.
This may not be any better but it's kind of cool:
Enumerable.Range(0, 1).Select(i => item);
Sometimes I do this, when I'm feeling impish:
"_".Select(_ => 3.14) // or whatever; any type is fine
This is the same thing with less shift key presses, heh:
from _ in "_" select 3.14
For a utility function I find this to be the least verbose, or at least more self-documenting than an array, although it'll let multiple values slide; as a plus it can be defined as a local function:
static IEnumerable<T> Enumerate<T>(params T[] v) => v;
// usage:
IEnumerable<double> example = Enumerate(1.234);
Here are all of the other ways I was able to think of (runnable here):
using System;
using System.Collections.Generic;
using System.Linq;
public class Program {
public static IEnumerable<T> ToEnumerable1 <T> (T v) {
yield return v;
}
public static T[] ToEnumerable2 <T> (params T[] vs) => vs;
public static void Main () {
static IEnumerable<T> ToEnumerable3 <T> (params T[] v) => v;
p( new string[] { "three" } );
p( new List<string> { "three" } );
p( ToEnumerable1("three") ); // our utility function (yield return)
p( ToEnumerable2("three") ); // our utility function (params)
p( ToEnumerable3("three") ); // our local utility function (params)
p( Enumerable.Empty<string>().Append("three") );
p( Enumerable.Empty<string>().DefaultIfEmpty("three") );
p( Enumerable.Empty<string>().Prepend("three") );
p( Enumerable.Range(3, 1) ); // only for int
p( Enumerable.Range(0, 1).Select(_ => "three") );
p( Enumerable.Repeat("three", 1) );
p( "_".Select(_ => "three") ); // doesn't have to be "_"; just any one character
p( "_".Select(_ => 3.3333) );
p( from _ in "_" select 3.0f );
p( "a" ); // only for char
// these weren't available for me to test (might not even be valid):
// new Microsoft.Extensions.Primitives.StringValues("three")
}
static void p <T> (IEnumerable<T> e) =>
Console.WriteLine(string.Join(' ', e.Select((v, k) => $"[{k}]={v,-8}:{v.GetType()}").DefaultIfEmpty("<empty>")));
}
For those wondering about performance: @mattica has provided some benchmarking information in a similar question referenced above, but my benchmark tests produced a different result.
In .NET 7, yield return value is ~9% faster than new T[] { value } and allocates 75% as much memory. In most cases, this is already hyper-performant and is as good as you'll ever need.
I was curious if a custom single collection implementation would be faster or more lightweight. It turns out because yield return is implemented as IEnumerator<T> and IEnumerable<T>, the only way to beat it in terms of allocation is to do that in my implementation as well.
If you're passing IEnumerable<> to an outside library, I would strongly recommend not doing this unless you're very familiar with what you're building. That being said, I made a very simple (not-reuse-safe) implementation which was able to beat the yield method by 5ns and allocated only half as much as the array.
Because all tests were passed an IEnumerable<T>, value types generally performed worse than reference types. The best implementation I had was actually the simplest - you can look at the SingleCollection class in the gist I linked to. (This was 2ns faster than yield return, but allocated 88% of what the array would, compared to the 75% allocated for yield return.)
TL;DR: if you care about speed, use yield return item. If you really care about speed, use a SingleCollection.
The easiest way, I'd say, would be new T[] { item }; there's no dedicated syntax to do this. The closest equivalent that I can think of is the params keyword, but of course that requires you to have access to the method definition and is only usable with arrays.
Enumerable.Range(1,1).Select(_ => {
//Do some stuff... side effects...
return item;
});
The above code is useful when using like
var existingOrNewObject = MyData.Where(myCondition)
.Concat(Enumerable.Range(1,1).Select(_ => {
//Create my object...
return item;
})).Take(1).First();
In the above code snippet there is no empty/null check, and it is guaranteed to return exactly one object without fear of exceptions. Furthermore, because it is lazy, the closure will not be executed until it is established that no existing data fits the criteria.
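The laziness claim can be checked with a small sketch. FirstOrCreate and the FactoryCalls counter are hypothetical names added for illustration; the counter only records how often the fallback closure runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyFallbackDemo
{
    public static int FactoryCalls;

    public static string FirstOrCreate(IEnumerable<string> data, Func<string, bool> condition)
    {
        return data.Where(condition)
            .Concat(Enumerable.Range(1, 1).Select(_ =>
            {
                FactoryCalls++; // side effect: runs only if this element is reached
                return "created";
            }))
            .First();
    }

    public static void Main()
    {
        var data = new List<string> { "apple", "banana" };

        Console.WriteLine(FirstOrCreate(data, s => s.StartsWith("b"))); // banana
        Console.WriteLine(FactoryCalls);                                // 0

        Console.WriteLine(FirstOrCreate(data, s => s.StartsWith("z"))); // created
        Console.WriteLine(FactoryCalls);                                // 1
    }
}
```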
To be filed under "Not necessarily a good solution, but still...a solution" or "Stupid LINQ tricks", you could combine Enumerable.Empty<>() with Enumerable.Append<>()...
IEnumerable<string> singleElementEnumerable = Enumerable.Empty<string>().Append("Hello, World!");
...or Enumerable.Prepend<>()...
IEnumerable<string> singleElementEnumerable = Enumerable.Empty<string>().Prepend("Hello, World!");
The latter two methods are available since .NET Framework 4.7.1 and .NET Core 1.0.
This is a workable solution if one were really intent on using existing methods instead of writing their own, though I'm undecided if this is more or less clear than the Enumerable.Repeat<>() solution. This is definitely longer code (partly due to type parameter inference not being possible for Empty<>()) and creates twice as many enumerator objects, however.
Rounding out this "Did you know these methods exist?" answer, Array.Empty<>() could be substituted for Enumerable.Empty<>(), but it's hard to argue that makes the situation...better.
I'm a bit late to the party but I'll share my way anyway.
My problem was that I wanted to bind the ItemsSource of a WPF TreeView to a single object. The hierarchy looks like this:
Project > Plot(s) > Room(s)
There was always going to be only one Project, but I still wanted to show the project in the tree without having to pass a collection with only that one object in it, as some suggested.
Since you can only pass IEnumerable objects as ItemsSource, I decided to make my class IEnumerable:
public class ProjectClass : IEnumerable<ProjectClass>
{
private readonly SingleItemEnumerator<ProjectClass> enumerator;
...
public IEnumerator<ProjectClass > GetEnumerator() => this.enumerator;
IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
}
And create my own Enumerator accordingly:
public class SingleItemEnumerator : IEnumerator
{
private bool hasMovedOnce;
public SingleItemEnumerator(object current)
{
this.Current = current;
}
public bool MoveNext()
{
if (this.hasMovedOnce) return false;
this.hasMovedOnce = true;
return true;
}
public void Reset()
{ }
public object Current { get; }
}
public class SingleItemEnumerator<T> : IEnumerator<T>
{
private bool hasMovedOnce;
public SingleItemEnumerator(T current)
{
this.Current = current;
}
// Null-conditional call: Current may not implement IDisposable.
public void Dispose() => (this.Current as IDisposable)?.Dispose();
public bool MoveNext()
{
if (this.hasMovedOnce) return false;
this.hasMovedOnce = true;
return true;
}
public void Reset()
{ }
public T Current { get; }
object IEnumerator.Current => this.Current;
}
This is probably not the "cleanest" solution but it worked for me.
EDIT
To uphold the single responsibility principle, as @Groo pointed out, I created a new wrapper class:
public class SingleItemWrapper : IEnumerable
{
private readonly SingleItemEnumerator enumerator;
public SingleItemWrapper(object item)
{
this.enumerator = new SingleItemEnumerator(item);
}
public object Item => this.enumerator.Current;
public IEnumerator GetEnumerator() => this.enumerator;
}
public class SingleItemWrapper<T> : IEnumerable<T>
{
private readonly SingleItemEnumerator<T> enumerator;
public SingleItemWrapper(T item)
{
this.enumerator = new SingleItemEnumerator<T>(item);
}
public T Item => this.enumerator.Current;
public IEnumerator<T> GetEnumerator() => this.enumerator;
IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
}
Which I used like this
TreeView.ItemsSource = new SingleItemWrapper(itemToWrap);
EDIT 2
I corrected a mistake with the MoveNext() method.
I prefer
public static IEnumerable<T> Collect<T>(this T item, params T[] otherItems)
{
yield return item;
foreach (var otherItem in otherItems)
{
yield return otherItem;
}
}
This lets you call item.Collect() if you want the singleton, but it also lets you call item.Collect(item2, item3) if you want.

Generic Key/Value pair collection that preserves insertion order?

I'm looking for something like a Dictionary<K,V> however with a guarantee that it preserves insertion order. Since Dictionary is a hashtable, I do not think it does.
Is there a generic collection for this, or do I need to use one of the old .NET 1.1 collections?
There is not. However, System.Collections.Specialized.OrderedDictionary should cover most needs for it.
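A brief usage sketch of the non-generic class (the demo class name is made up). Keys and values are typed as object, so reads need casts. Because the class has both an object-keyed indexer and an int indexer, integer keys are easy to confuse with positions.

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;

public static class OrderedDictionaryDemo
{
    public static void Main()
    {
        var map = new OrderedDictionary();
        map.Add("zebra", 1);
        map.Add("apple", 2);
        map.Add("mango", 3);

        // Enumeration yields DictionaryEntry values in insertion order.
        foreach (DictionaryEntry entry in map)
        {
            Console.WriteLine($"{entry.Key} = {entry.Value}");
        }

        int byKey = (int)map["apple"]; // lookup by key
        int byIndex = (int)map[0];     // lookup by position
        Console.WriteLine($"{byKey}, {byIndex}"); // 2, 1
    }
}
```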
EDIT: Another option is to turn this into a Generic. I haven't tested it but it compiles (C# 6) and should work. However, it will still have the same limitations that Ondrej Petrzilka mentions in comments below.
public class OrderdDictionary<T, K>
{
public OrderedDictionary UnderlyingCollection { get; } = new OrderedDictionary();
public K this[T key]
{
get
{
return (K)UnderlyingCollection[key];
}
set
{
UnderlyingCollection[key] = value;
}
}
public K this[int index]
{
get
{
return (K)UnderlyingCollection[index];
}
set
{
UnderlyingCollection[index] = value;
}
}
public ICollection<T> Keys => UnderlyingCollection.Keys.OfType<T>().ToList();
public ICollection<K> Values => UnderlyingCollection.Values.OfType<K>().ToList();
public bool IsReadOnly => UnderlyingCollection.IsReadOnly;
public int Count => UnderlyingCollection.Count;
public IDictionaryEnumerator GetEnumerator() => UnderlyingCollection.GetEnumerator();
public void Insert(int index, T key, K value) => UnderlyingCollection.Insert(index, key, value);
public void RemoveAt(int index) => UnderlyingCollection.RemoveAt(index);
public bool Contains(T key) => UnderlyingCollection.Contains(key);
public void Add(T key, K value) => UnderlyingCollection.Add(key, value);
public void Clear() => UnderlyingCollection.Clear();
public void Remove(T key) => UnderlyingCollection.Remove(key);
public void CopyTo(Array array, int index) => UnderlyingCollection.CopyTo(array, index);
}
There actually is one, which is generic and has been around since .NET 2.0. It's called KeyedCollection<TKey, TItem>. However, it comes with the restriction that it constructs the keys from the values, so it is not a generic Key/Value pair collection. (Although you can of course use it like KeyedCollection<TKey, Tuple<TKey, TItem>> as a workaround.)
If you need it as an IDictionary<TKey, TItem>, it has a Dictionary property. Note that this property is protected, so your subclass has to expose it.
A somewhat minor issue that I have with it is that it is an abstract class and you have to subclass it and implement:
protected abstract TKey GetKeyForItem(TItem item)
I'd rather just pass a lambda into the constructor for this purpose, but then again, I guess a virtual method is slightly faster than a lambda (any comments on this appreciated).
Edit As the question came up in the comments: KeyedCollection preserves order, as it inherits from Collection<T>, which does (it derives from IList<T>. See also the documentation of the Add method: Adds an object to the end of the Collection.).
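The "pass a lambda into the constructor" idea can be sketched with a small subclass. FuncKeyedCollection is a hypothetical helper, not a framework type; it simply forwards GetKeyForItem to a delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical helper: a KeyedCollection whose key selector is a delegate.
public class FuncKeyedCollection<TKey, TItem> : KeyedCollection<TKey, TItem>
{
    private readonly Func<TItem, TKey> _getKey;

    public FuncKeyedCollection(Func<TItem, TKey> getKey)
    {
        _getKey = getKey ?? throw new ArgumentNullException(nameof(getKey));
    }

    protected override TKey GetKeyForItem(TItem item) => _getKey(item);
}

public static class FuncKeyedCollectionDemo
{
    public static void Main()
    {
        var ages = new FuncKeyedCollection<string, KeyValuePair<string, int>>(pair => pair.Key);
        ages.Add(new KeyValuePair<string, int>("zoe", 31));
        ages.Add(new KeyValuePair<string, int>("adam", 27));

        Console.WriteLine(ages["adam"].Value); // 27 (lookup by key)
        Console.WriteLine(ages[0].Key);        // zoe (lookup by insertion position)
    }
}
```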
There is an OrderedDictionary class that is a dictionary but can be indexed in insertion order, but it is not generified. There is not a generified one in the .Net framework at present.
I have read a comment somewhere from someone on the .Net team that said that they may implement a generified version in the future, but if so it would most likely be called IndexableDictionary instead of OrderedDictionary to make its behaviour more obvious.
EDIT: found the quote. It was on the MSDN page for OrderedDictionary, attributed to David M. Kean from Microsoft:
This type is actually misnamed; it is not an 'ordered' dictionary as such, but rather an 'indexed' dictionary. Although, today there is no equivalent generic version of this type, if we add one in the future it is likely that we will name such as type 'IndexedDictionary'.
Here is a wrapper for the non-generic System.Collections.Specialized.OrderedDictionary type.
This type will return key, value, and pair sequences in insertion order, much like Ruby 2.0 hashes.
It does not require C#6 magic, conforms to IDictionary<TKey,TValue> (which also means that accessing a non-assigned key throws an exception), and ought to be serializable.
It is given the name 'IndexedDictionary' per note on Adrian's answer.
YMMV.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Linq;
/// <summary>
/// A dictionary that maintains insertion ordering of keys.
///
/// This is useful for emitting JSON where it is preferable to keep the key ordering
/// for various human-friendlier reasons.
///
/// There is no support to manually re-order keys or to access keys
/// by index without using Keys/Values or the Enumerator (eg).
/// </summary>
[Serializable]
public sealed class IndexedDictionary<TKey, TValue> : IDictionary<TKey, TValue>
{
// Non-generic version only in .NET 4.5
private readonly OrderedDictionary _backing = new OrderedDictionary();
private IEnumerable<KeyValuePair<TKey, TValue>> KeyValuePairs
{
get
{
return _backing.OfType<DictionaryEntry>()
.Select(e => new KeyValuePair<TKey, TValue>((TKey)e.Key, (TValue)e.Value));
}
}
public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
{
return KeyValuePairs.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public void Add(KeyValuePair<TKey, TValue> item)
{
_backing[item.Key] = item.Value;
}
public void Clear()
{
_backing.Clear();
}
public bool Contains(KeyValuePair<TKey, TValue> item)
{
return _backing.Contains(item.Key);
}
public void CopyTo(KeyValuePair<TKey, TValue>[] array, int arrayIndex)
{
KeyValuePairs.ToList().CopyTo(array, arrayIndex);
}
public bool Remove(KeyValuePair<TKey, TValue> item)
{
TValue value;
if (TryGetValue(item.Key, out value)
&& Equals(value, item.Value))
{
Remove(item.Key);
return true;
}
return false;
}
public int Count
{
get { return _backing.Count; }
}
public bool IsReadOnly
{
get { return _backing.IsReadOnly; }
}
public bool ContainsKey(TKey key)
{
return _backing.Contains(key);
}
public void Add(TKey key, TValue value)
{
_backing.Add(key, value);
}
public bool Remove(TKey key)
{
var result = _backing.Contains(key);
if (result) {
_backing.Remove(key);
}
return result;
}
public bool TryGetValue(TKey key, out TValue value)
{
object foundValue;
if ((foundValue = _backing[key]) != null
|| _backing.Contains(key))
{
// Either found with a non-null value, or contained value is null.
value = (TValue)foundValue;
return true;
}
value = default(TValue);
return false;
}
public TValue this[TKey key]
{
get
{
TValue value;
if (TryGetValue(key, out value))
return value;
throw new KeyNotFoundException();
}
set { _backing[key] = value; }
}
public ICollection<TKey> Keys
{
get { return _backing.Keys.OfType<TKey>().ToList(); }
}
public ICollection<TValue> Values
{
// Cast rather than OfType: OfType would silently drop null values.
get { return _backing.Values.Cast<TValue>().ToList(); }
}
}
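As a quick check of the behaviour the wrapper relies on, here is a minimal sketch that uses the non-generic OrderedDictionary directly. The class and method names (OrderedDictionaryDemo, KeysAfterEdits) are made up for illustration; only OrderedDictionary and DictionaryEntry come from the framework.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;

// Hypothetical demo class, not part of the answer above.
public static class OrderedDictionaryDemo
{
    // Returns the keys in enumeration order after a few adds and one removal.
    public static List<string> KeysAfterEdits()
    {
        var backing = new OrderedDictionary();
        backing.Add("zebra", 1);
        backing.Add("apple", 2);
        backing.Add("mango", 3);

        backing.Remove("apple");  // the remaining entries keep their relative order
        backing.Add("apple", 4);  // a re-added key goes to the end

        var keys = new List<string>();
        foreach (DictionaryEntry entry in backing)
        {
            keys.Add((string)entry.Key);
        }
        return keys;              // zebra, mango, apple
    }
}
```

The keys come back in the order they were added, not in sorted or hash order, which is what the generic wrapper exposes through Keys, Values and the enumerator.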
I know you're writing C#, but Java has a class called LinkedHashMap that keeps a doubly linked list running through its entries to maintain the insertion order of keys. If you can't find a suitable generic solution, perhaps that would be a starting point for implementing your own.
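A rough C# sketch of that idea, assuming a Dictionary for lookup and a LinkedList for order, might look like the following. LinkedMap and its Set method are hypothetical names, and this is not a port of Java's actual implementation.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch: a dictionary indexes the nodes of a linked list that records insertion order.
public sealed class LinkedMap<TKey, TValue> : IEnumerable<KeyValuePair<TKey, TValue>>
{
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _index =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public int Count { get { return _index.Count; } }

    // Adds a new key at the end, or updates an existing key in place.
    public void Set(TKey key, TValue value)
    {
        var pair = new KeyValuePair<TKey, TValue>(key, value);
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_index.TryGetValue(key, out node))
        {
            node.Value = pair;                 // existing key keeps its position
        }
        else
        {
            _index[key] = _order.AddLast(pair);
        }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_index.TryGetValue(key, out node))
        {
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public bool Remove(TKey key)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (!_index.TryGetValue(key, out node)) return false;
        _index.Remove(key);
        _order.Remove(node);                   // O(1) because we already hold the node
        return true;
    }

    public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
    {
        return _order.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Storing the list node as the dictionary value is what keeps removal at O(1); a plain List<T> would need a linear scan to find and remove the entry.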
Another option for a generic key/value collection that preserves insertion order is to use something like:
Queue<KeyValuePair<string, string>>
This is guaranteed to stay in insertion order. You can enqueue and dequeue in an ordered fashion, similar to Add/Remove on a dictionary, without resizing an array yourself. It can often serve as a middle ground between a non-resizing ordered (by insertion) array and an auto-resizing unordered (by insertion) list.
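To make the trade-off concrete, here is a small sketch of the queue approach. PairQueueDemo, Build and TryFind are hypothetical helper names. Note that a queue does not enforce unique keys and has no keyed lookup, so finding a key means scanning the items.

```csharp
using System.Collections.Generic;

// Hypothetical helpers illustrating a queue of key/value pairs.
public static class PairQueueDemo
{
    public static Queue<KeyValuePair<string, string>> Build()
    {
        var queue = new Queue<KeyValuePair<string, string>>();
        queue.Enqueue(new KeyValuePair<string, string>("key5", "value 1"));
        queue.Enqueue(new KeyValuePair<string, string>("key3", "value 2"));
        queue.Enqueue(new KeyValuePair<string, string>("key2", "value 3"));
        return queue;
    }

    // Linear scan: O(n), unlike the O(1) lookup of a dictionary.
    public static bool TryFind(Queue<KeyValuePair<string, string>> queue, string key, out string value)
    {
        foreach (var pair in queue)
        {
            if (pair.Key == key)
            {
                value = pair.Value;
                return true;
            }
        }
        value = null;
        return false;
    }
}
```

Enumerating or dequeuing yields the pairs in the order they were enqueued, but removal is only possible from the front, so this suits "process in order" scenarios better than "look up by key" ones.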
If you need constant complexity of Add, Remove, ContainsKey and order preservation, then there's no such generic collection in .NET Framework 4.5.
If you're okay with 3rd party code, take a look at my repository (permissive MIT license):
https://github.com/OndrejPetrzilka/Rock.Collections
It contains an OrderedDictionary<K,V> collection:
source code based on classic Dictionary<K,V> (from .NET Core)
preserves order of insertions and allows manual reordering
features reversed enumeration
has same operation complexities as Dictionary<K,V>
Add and Remove operations are ~20% slower compared to Dictionary<K,V>
consumes 8 more bytes of memory per item
Code:
//A SortedDictionary is sorted on the key (not value)
System.Collections.Generic.SortedDictionary<string, string> testSortDic = new SortedDictionary<string, string>();
//Add some values with the keys out of order
testSortDic.Add("key5", "value 1");
testSortDic.Add("key3", "value 2");
testSortDic.Add("key2", "value 3");
testSortDic.Add("key4", "value 4");
testSortDic.Add("key1", "value 5");
//Display the elements.
foreach (KeyValuePair<string, string> kvp in testSortDic)
{
Console.WriteLine("Key = {0}, value = {1}", kvp.Key, kvp.Value);
}
Output:
Key = key1, value = value 5
Key = key2, value = value 3
Key = key3, value = value 2
Key = key4, value = value 4
Key = key5, value = value 1

Is there a better data structure than Dictionary if the values are objects and a property of those objects are the keys?

I have a Dictionary<int, object> where the int is a property of obj. Is there a better data structure for this? I feel like using a property as the key is redundant.
This Dictionary<int, obj> is a field in a container class that allows for random indexing into the obj values based on an int id number. The simplified (no exception handling) indexer in the container class would look like:
obj this[int id]
{
get{ return this.myDictionary[id];}
}
where myDictionary is the aforementioned Dictionary<int, obj> holding the objects.
This may be the typical way of quick random access but I wanted to get second opinions.
There's no concrete class in the framework that does this. There's an abstract one though, KeyedCollection. You'll have to derive your own class from that one and implement the GetKeyForItem() method. That's pretty easy, just return the value of the property by which you want to index.
That's all you need to do, but do keep an eye on ChangeItemKey(). You have to do something meaningful when the property that you use as the key changes value. That's easy if you ensure that the property is immutable (it only has a getter), but quite awkward when you don't: the object itself now needs to be aware that it is stored in your collection. If you don't do anything about it (that is, call ChangeItemKey), the object gets lost in the collection and you can't find it again by key. Pretty close to a leak.
Note how Dictionary<> side-steps this problem by specifying the key value and the object separately. You may still not be able to find the object back but at least it doesn't get lost by design.
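For illustration, a minimal sketch of such a derived class could look like this. Widget, WidgetCollection and Rekey are hypothetical names; only KeyedCollection, GetKeyForItem and ChangeItemKey come from the framework.

```csharp
using System.Collections.ObjectModel;

// Hypothetical item type whose Id property doubles as the key.
public sealed class Widget
{
    public Widget(int id, string name)
    {
        Id = id;
        Name = name;
    }

    // Settable only inside this assembly, so key changes go through WidgetCollection.Rekey.
    public int Id { get; internal set; }
    public string Name { get; private set; }
}

public sealed class WidgetCollection : KeyedCollection<int, Widget>
{
    // The only member you are required to implement.
    protected override int GetKeyForItem(Widget item)
    {
        return item.Id;
    }

    // Changes an item's key without losing it in the lookup dictionary.
    public void Rekey(Widget item, int newId)
    {
        // Update the lookup first: ChangeItemKey locates the item by its current key.
        ChangeItemKey(item, newId);
        item.Id = newId;
    }
}
```

Usage would be along the lines of `var widgets = new WidgetCollection { new Widget(7, "a") };` followed by `widgets[7]`. Because the key type is int here, the indexer resolves to the key lookup; positional access would need a cast to Collection<Widget>.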
There is a KeyedCollection class.
EDIT: The KeyedCollection can use a dictionary internally, but it offers a cleaner interface for this particular scenario than a raw dictionary, since you can look up by value directly. Admittedly, I don't find it very useful in general.
You can trivially implement your own KeyedCollection if the extra overhead of the built-in one isn't worth it. The original KeyedCollection in System.Collections.ObjectModel is internally a Dictionary<TKey, TItem> plus a List<TItem>, which means it supports operations from both IList<> and IDictionary<>. For example, you can insert, access by index, and traverse the collection in insertion order (all of which IList<> provides) while still getting quick lookups by key (thanks to the dictionary). The cost is that every add or remove has to be performed on both underlying collections, plus a small memory overhead for the extra List<> (the objects themselves are not duplicated). Addition speed is not affected much (List<> addition is amortized O(1)), but removal is somewhat slower.
If you don't care about insertion order and accessing by index:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;
public class KeyedCollection<TKey, TItem> : ICollection<TItem>
{
MemberInfo _keyInfo;
Func<TItem, TKey> _keySelector;
Dictionary<TKey, TItem> _dict;
public TItem this[TKey key]
{
get { return _dict[key]; }
}
public int Count
{
get { return _dict.Count; }
}
public bool IsReadOnly
{
get { return false; }
}
public ICollection<TKey> Keys
{
get { return _dict.Keys; }
}
private ICollection<TItem> Items
{
get { return _dict.Values; }
}
public KeyedCollection(Expression<Func<TItem, TKey>> keySelector, IEqualityComparer<TKey> comparer = null)
{
var keyExpression = keySelector.Body as MemberExpression;
if (keyExpression != null)
_keyInfo = keyExpression.Member;
_keySelector = keySelector.Compile();
_dict = new Dictionary<TKey, TItem>(comparer);
}
private TKey GetKeyForItem(TItem item)
{
return _keySelector(item);
}
public bool ContainsKey(TKey key)
{
return _dict.ContainsKey(key);
}
public bool Contains(TItem item)
{
return ContainsKey(GetKeyForItem(item));
}
public bool TryGetItem(TKey key, out TItem item)
{
return _dict.TryGetValue(key, out item);
}
public void Add(TItem item)
{
_dict.Add(GetKeyForItem(item), item);
}
public void AddOrUpdate(TItem item)
{
_dict[GetKeyForItem(item)] = item;
}
public bool UpdateKey(TKey oldKey, TKey newKey)
{
TItem oldItem;
if (_keyInfo == null || !TryGetItem(oldKey, out oldItem) || !SetItem(oldItem, newKey)) // important
return false;
RemoveKey(oldKey);
Add(oldItem);
return true;
}
private bool SetItem(TItem item, TKey key)
{
var propertyInfo = _keyInfo as PropertyInfo;
if (propertyInfo != null)
{
if (!propertyInfo.CanWrite)
return false;
propertyInfo.SetValue(item, key, null);
return true;
}
var fieldInfo = _keyInfo as FieldInfo;
if (fieldInfo != null)
{
if (fieldInfo.IsInitOnly)
return false;
fieldInfo.SetValue(item, key);
return true;
}
return false;
}
public bool RemoveKey(TKey key)
{
return _dict.Remove(key);
}
public bool Remove(TItem item)
{
return RemoveKey(GetKeyForItem(item));
}
public void Clear()
{
_dict.Clear();
}
public void CopyTo(TItem[] array, int arrayIndex)
{
Items.CopyTo(array, arrayIndex);
}
public IEnumerator<TItem> GetEnumerator()
{
return Items.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
I have implemented ICollection<TItem> to make it more standard compliant - and also you get the nice collection initializer syntax! :)
A sample usage:
var p1 = new Person { Name = "a" };
var p2 = new Person { Name = "b" };
var people = new KeyedCollection<string, Person>(p => p.Name) { p1, p2 };
// p1 == people["a"];
// p2 == people["b"];
The "C# dynamic properties" post seems to show that using a Dictionary was a popular choice. The other posts suggest using a Hashtable:
Dictionary vs Hashtable
