ILookUp vs. Dictionary [duplicate]


I'm trying to wrap my head around which data structures are the most efficient and when / where to use which ones.
Now, it could be that I simply just don't understand the structures well enough, but how is an ILookup(of key, ...) different from a Dictionary(of key, list(of ...))?
Also where would I want to use an ILookup and where would it be more efficient in terms of program speed / memory / data accessing, etc?

Two significant differences:
Lookup is immutable. Yay :) (At least, I believe the concrete Lookup class is immutable, and the ILookup interface doesn't provide any mutating members. There could be other mutable implementations, of course.)
When you look up a key which isn't present in a lookup, you get an empty sequence back instead of a KeyNotFoundException. (Hence there's no TryGetValue, AFAICR.)
They're likely to be equivalent in efficiency - the lookup may well use a Dictionary<TKey, GroupingImplementation<TValue>> behind the scenes, for example. Choose between them based on your requirements. Personally I find that the lookup is usually a better fit than a Dictionary<TKey, List<TValue>>, mostly due to the first two points above.
Note that as an implementation detail, the concrete implementation of IGrouping<,> which is used for the values implements IList<TValue>, which means that it's efficient to use with Count(), ElementAt() etc.
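To make the missing-key difference concrete, here is a minimal sketch. The sample words and the LookupVsDictionaryDemo class are made up for illustration; only ToLookup, GroupBy, ToDictionary and TryGetValue come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupVsDictionaryDemo
{
    // Hypothetical sample data: words grouped by their first letter.
    static readonly string[] Words = { "apple", "avocado", "banana" };

    public static ILookup<char, string> BuildLookup() =>
        Words.ToLookup(w => w[0]);

    public static Dictionary<char, List<string>> BuildDictionary() =>
        Words.GroupBy(w => w[0]).ToDictionary(g => g.Key, g => g.ToList());

    public static void Main()
    {
        var lookup = BuildLookup();
        var dictionary = BuildDictionary();

        // A missing key yields an empty sequence from the lookup.
        Console.WriteLine(lookup['z'].Count()); // 0

        // dictionary['z'] would throw KeyNotFoundException,
        // so the usual pattern is TryGetValue.
        Console.WriteLine(dictionary.TryGetValue('z', out var list)); // False

        Console.WriteLine(lookup['a'].Count()); // 2
    }
}
```

With the dictionary, every caller has to remember the TryGetValue dance; with the lookup, the "no values" case is just an empty sequence.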

Interesting that nobody has stated the actual biggest difference (taken directly from MSDN):
A Lookup resembles a Dictionary. The difference is that a Dictionary maps keys to single values, whereas a Lookup maps keys to collections of values.

Both a Dictionary<Key, List<Value>> and a Lookup<Key, Value> logically can hold data organized in a similar way and both are of the same order of efficiency. The main difference is a Lookup is immutable: it has no Add() methods and no public constructor (and as Jon mentioned you can query a non-existent key without an exception and have the key as part of the grouping).
As to which do you use, it really depends on how you want to use them. If you are maintaining a map of key to multiple values that is constantly being modified, then a Dictionary<Key, List<Value>> is probably better since it is mutable.
If, however, you have a sequence of data and just want a read-only view of the data organized by key, then a lookup is very easy to construct and will give you a read-only snapshot.

Another difference not mentioned yet is that Lookup() supports null keys:
Lookup class implements the ILookup interface. Lookup is very similar to a dictionary except multiple values are allowed to map to the same key, and null keys are supported.
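A small sketch of the null-key behaviour, assuming string keys; the NullKeyDemo class and its sample data are invented for illustration.

```csharp
using System;
using System.Linq;

public static class NullKeyDemo
{
    // Hypothetical input containing null keys.
    static readonly string[] Keys = { null, "a", null };

    // ToLookup accepts null as a key and groups the matching items under it.
    public static int CountNullKeyInLookup() =>
        Keys.ToLookup(k => k)[null].Count();

    // ToDictionary rejects a null key with ArgumentNullException.
    public static bool DictionaryRejectsNullKey()
    {
        try
        {
            new string[] { null }.ToDictionary(k => k);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountNullKeyInLookup());     // 2
        Console.WriteLine(DictionaryRejectsNullKey()); // True
    }
}
```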

The primary difference between an ILookup<K,V> and a Dictionary<K, List<V>> is that a dictionary is mutable; you can add or remove keys, and also add or remove items from the list that is looked up. An ILookup is immutable and cannot be modified once created.
The underlying implementation of both mechanisms will be either the same or similar, so their searching speed and memory footprint will be approximately the same.

When an exception is not an option, go for Lookup
If you want a structure as efficient as a Dictionary but you don't know for sure that the input contains no duplicate keys, Lookup is safer.
As mentioned in another answer, it also supports null keys and always returns a valid result when queried with arbitrary data, so it is more resilient to unknown input (less prone than Dictionary to raising exceptions).
This is especially true if you compare it to the System.Linq.Enumerable.ToDictionary function:
// won't throw
new[] { 1, 1 }.ToLookup(x => x);
// System.ArgumentException: An item with the same key has already been added.
new[] { 1, 1 }.ToDictionary(x => x);
The alternative would be to write your own duplicate key management code inside of a foreach loop.
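For comparison, a rough sketch of what that hand-written duplicate-key handling might look like. GroupManually is a hypothetical helper, not a framework method.

```csharp
using System;
using System.Collections.Generic;

public static class GroupingDemo
{
    // Hypothetical helper: groups items by key the way ToLookup would,
    // but into a mutable Dictionary<TKey, List<TValue>>.
    public static Dictionary<TKey, List<TValue>> GroupManually<TKey, TValue>(
        IEnumerable<TValue> source, Func<TValue, TKey> keySelector)
    {
        var result = new Dictionary<TKey, List<TValue>>();
        foreach (var item in source)
        {
            var key = keySelector(item);

            // Create the bucket the first time a key is seen,
            // instead of letting Add throw on a duplicate key.
            if (!result.TryGetValue(key, out var bucket))
            {
                bucket = new List<TValue>();
                result.Add(key, bucket);
            }
            bucket.Add(item);
        }
        return result;
    }

    public static void Main()
    {
        var grouped = GroupManually(new[] { 1, 1, 2 }, x => x);
        Console.WriteLine(grouped[1].Count); // 2
        Console.WriteLine(grouped[2].Count); // 1
    }
}
```

Unlike ToLookup, this version still throws ArgumentNullException for a null key, because Dictionary does not accept one.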
Performance considerations, Dictionary: a clear winner
If you don't need a list and you are going to manage a huge number of items, Dictionary (or even your own custom tailored structure) would be more efficient:
Stopwatch stopwatch = new Stopwatch();
var list = new List<string>();
for (int i = 0; i < 5000000; ++i)
{
    list.Add(i.ToString());
}
stopwatch.Start();
var lookup = list.ToLookup(x => x);
stopwatch.Stop();
Console.WriteLine("Creation: " + stopwatch.Elapsed);
// ... Same but for ToDictionary
stopwatch.Restart();
var dictionary = list.ToDictionary(x => x);
stopwatch.Stop();
Console.WriteLine("Creation: " + stopwatch.Elapsed);
Because Lookup has to maintain a list of items for each key, it is slower to build than Dictionary (around 3x slower in this measurement, with a huge number of items):
Lookup speed:
Creation: 00:00:01.5760444
Dictionary speed:
Creation: 00:00:00.4418833

Related

ConcurrentDictionary<TKey, TValue> - How to efficiently "get N elements, starting from key K"?

Situation as follows:
I have a ConcurrentDictionary<TId, TItem>
For efficient paging, we want to implement "get N Items, starting from key K"
The best approach I came up with was:
public IEnumerable<TItem> Get( TId fromKey, int count )
{
    // parameter validation left out for brevity
    return items.Keys // KeyCollection of the Dictionary, please assume 'items' is a class field
        .SkipWhile(key => key != fromKey)
        .Take(count)
        .Select(x => items[x])
        .ToList();
}
But that feels really wrong. Especially because we explicitly do not want to "SkipWhile".
If I was OK to Skip, I could just do .Skip(n).Take(m) on the Values but that's explicitly not wanted. Requirement for me is: starting at key K, return N elements.
Maybe I am overthinking this and I should push back. But I have the feeling, I am missing something here.
So my question is: Is there a way to do this, without having to "skip over" in either KeyCollection or ValueCollection of the Dictionary?
EDIT
ConcurrentDictionary<TKey, TValue> is where I picked up the task. It is not carved in stone to keep that Type.
Order is no priority. Seniors and PO view it "good enough" to go by whatever order results from KeyCollection. But that's a good point to keep in mind looking to possible future feature requests.

Well, based on the comments, it sounds a bit bizarre, but I'm sure there are reasons you can't go into the backstory or details.
I would say this.
SkipWhile(key => key != fromKey) is really the only way you can find a key for the purpose of finding more keys "after it", so in that sense, what you have is correct. If your keyspace is not ridiculously large, that seems sufficient.
That said, a different data structure would be better. For example, you could implement a concurrent version of a dictionary + array or dictionary + linked list that allows you to access a key in O(1) and then the subsequent elements in O(m) inside of a lock (you could even make it a ReaderWriterLockSlim). That avoids the O(n) scan to find the key if just using ConcurrentDictionary.
Insertion would be a bit strange, because you'd have to maintain a somewhat arbitrary notion of what before and after mean. In the dictionary + array case for example, you could add key 'foo' into the dictionary and into slot 0 in the array. Key 'bar' would go into the dictionary as usual, and into slot 1, and so on.
Oh - and your dictionary entry would have to point to the location in the array or the linked list to get that O(m), as well as the data itself. And, if you want to de-duplicate the data, the array/list could point back to the dictionary entry instead of just holding the data.
Arrays will leave you with holes when items are deleted! That's where a linked list would be helpful. Writes will be a bit slower to maintain "ordering" (using this term loosely) and because you are accessing two underlying data structures.
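A rough sketch of the dictionary + linked list idea, under some assumptions: PagedStore and its members are hypothetical names, "order" means insertion order, and a plain lock stands in for the ReaderWriterLockSlim mentioned above to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: a dictionary indexes nodes of a linked list that holds the data.
public sealed class PagedStore<TKey, TItem>
{
    private readonly object _gate = new object();
    private readonly LinkedList<KeyValuePair<TKey, TItem>> _order =
        new LinkedList<KeyValuePair<TKey, TItem>>();
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TItem>>> _index =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TItem>>>();

    public void Add(TKey key, TItem item)
    {
        lock (_gate)
        {
            if (_index.ContainsKey(key))
                throw new ArgumentException("Duplicate key.", nameof(key));
            // Append to the list and remember the node for O(1) access by key.
            _index.Add(key, _order.AddLast(new KeyValuePair<TKey, TItem>(key, item)));
        }
    }

    public bool Remove(TKey key)
    {
        lock (_gate)
        {
            if (!_index.TryGetValue(key, out var node)) return false;
            _order.Remove(node); // O(1) because we already hold the node; no holes left behind
            _index.Remove(key);
            return true;
        }
    }

    // Finds fromKey in O(1), then walks at most 'count' nodes.
    // Returns an empty list if fromKey is unknown.
    public List<TItem> Get(TKey fromKey, int count)
    {
        var page = new List<TItem>();
        lock (_gate)
        {
            _index.TryGetValue(fromKey, out var node);
            for (; node != null && page.Count < count; node = node.Next)
                page.Add(node.Value.Value);
        }
        return page;
    }
}
```

For example, after adding keys "a", "b", "c" and "d" in that order, Get("b", 2) would return the items stored under "b" and "c" without scanning past "a".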

How can I know if I should access value by index or key in an OrderedDictionary in C#?

All this time I was using Dictionary to store key/value pairs until I came across this new class called OrderedDictionary which has got an additional feature of accessing data through index.
So, I wanted to know when could/would I be running into any situation that would ask me to access value through index when I have the key already. I have a small snippet below.
OrderedDictionary od = new OrderedDictionary();
od.Add("Key1", "Val1");
od.Add("Key2", "Val2");
od.Add("Key3", "Val3");
od.Add("Key4", "Val4");
The code above may not be the most appropriate example, but I would really appreciate it if someone could give a better one to answer my question.
Many Thanks!

I wanted to know when could/would I be running into any situation that would ask me to access value through index when I have the key already
I follow the YAGNI principle - You Aren't Gonna Need It. If you already know the key, then what value is there in accessing by index? The point of a dictionary is to do FAST lookups by key (by not scanning the entire collection). With an OrderedDictionary, lookups are still fast, but inserts and updates are marginally slower because the structure must keep the keys and indices in sync. Plus, the current framework implementation is not generic, so you'll have to do more casting, but there are plenty of 3rd party generic implementations out there. The fact that MS did not create a generic implementation may tell you something about the value of that type overall.
So the situation you "could" run into is needing to access the values in a particular order. Note that OrderedDictionary preserves insertion order; it does not sort by key. In that case you'll need to decide if you do that often enough to warrant the overhead of an OrderedDictionary or if you can just use LINQ queries to order the items outside of the structure.
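As an illustration of the two access paths, here is a short sketch using the non-generic System.Collections.Specialized.OrderedDictionary from the question. The OrderedDictionaryDemo wrapper is made up; note the casts that the non-generic API forces.

```csharp
using System;
using System.Collections.Specialized;

public static class OrderedDictionaryDemo
{
    public static OrderedDictionary Build()
    {
        var od = new OrderedDictionary();
        od.Add("Key1", "Val1");
        od.Add("Key2", "Val2");
        od.Add("Key3", "Val3");
        return od;
    }

    public static void Main()
    {
        var od = Build();

        // Access by key: the usual dictionary lookup.
        Console.WriteLine((string)od["Key2"]); // Val2

        // Access by index: position in insertion order.
        Console.WriteLine((string)od[0]);      // Val1

        // Inserting at a position shifts the indices of later entries.
        od.Insert(0, "Key0", "Val0");
        Console.WriteLine((string)od[0]);      // Val0
    }
}
```

Index access mostly pays off when position is itself meaningful, for example "the first entry added" or "the entry after this one", rather than when the key is already in hand.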

Theory of Hash based collections
Before choosing between Dictionary and OrderedDictionary, let's look how some collections are built.
Arrays
Arrays provide constant-time access when you know the index of your value.
So keys must be integers. If you don't know the index, you must traverse the full collection to check whether each value is the one you're looking for.
Dictionaries
The purpose of a dictionary is to provide (relatively) constant-time access to any value in it when the key is not an integer. However, since there is not always a perfect hash function to derive an integer from a key, hash codes will collide, and when several keys have the same hash code their entries are stored together in an array. Searching among these collided entries is slower, since it must traverse that array.
OrderedDictionary
OrderedDictionary is a kind of mix between the two previous collections. Index search will/should be the fastest (however, you need to profile to be sure of that point). The problem with index search is that, apart from special cases, you don't know the index at which your value was stored, so you must rely on the key. Which makes me wonder: why would you need an OrderedDictionary?
As one comment implies, I would be very interested to know what your use case is for such a collection. Most of the time, you either know the index or you don't, because that depends on the nature of the value. So you should use either an array or a Dictionary, not a mix of both.
Two very different use cases:
KeyValuePair<string, string>[] values = new KeyValuePair<string, string>[4];
values[0] = new KeyValuePair<string, string>("Key1", "Value1");
// And so on...
// Or
Dictionary<string, Person> persons = new Dictionary<string, Person>();
var asker = new Person { FirstName = "pradeep", LastName=" pradyumna" };
persons.Add(asker.Key, asker);
// Later in the code, you cannot know the index of the person without having the person instance.

Complexity of searching in a list and in a dictionary

Let's say I have a class:
class C
{
    public int uniqueField;
    public int otherField;
}
This is very simplified version of the actual problem. I want to store multiple instances of this class, where "uniqueField" should be unique for each instance.
What is better in this case?
a) Dictionary with uniqueField as the key
Dictionary<int, C> d;
or b) List?
List<C> l;
In the first case (a) the same data would be stored twice (as the key and as the field of a class instance). But the question is: is it faster to find an element in a dictionary than in a list? Or is it equally fast?
a)
d[searchedUniqueField]
b)
l.Find(x=>x.uniqueField==searchedUniqueField);

Assuming you've got quite a lot of instances, it's likely to be much faster to find the item in the dictionary. Basically a Dictionary<,> is a hash table, with O(1) lookup other than due to collisions.
Now if the collection is really small, then the extra overhead of finding the hash code, computing the right bucket and then looking through that bucket for matching hash codes, then performing a key equality check can take longer than just checking each element in a list.
If you might have a lot of instances but might not, I'd usually pick the dictionary approach. For one thing it expresses what you're actually trying to achieve: a simple way of accessing an element by a key. The overhead for small collections is unlikely to be very significant unless you have far more small collections than large ones.
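The two access patterns from the question can be put side by side. This is only a sketch: SearchDemo and its helper methods are made-up names, and the data is synthetic.

```csharp
using System;
using System.Collections.Generic;

public class C
{
    public int uniqueField;
    public int otherField;
}

public static class SearchDemo
{
    // Linear scan: checks elements one by one until a match is found, O(n).
    public static C FindInList(List<C> l, int id) =>
        l.Find(x => x.uniqueField == id);

    // Hash lookup: O(1) on average; returns null when the key is absent.
    public static C FindInDictionary(Dictionary<int, C> d, int id) =>
        d.TryGetValue(id, out var c) ? c : null;

    public static void Main()
    {
        var l = new List<C>();
        var d = new Dictionary<int, C>();
        for (int i = 0; i < 1000; i++)
        {
            var item = new C { uniqueField = i, otherField = i * 2 };
            l.Add(item);
            d.Add(item.uniqueField, item); // Add also enforces uniqueness of the key
        }

        Console.WriteLine(FindInList(l, 742).otherField);       // 1484
        Console.WriteLine(FindInDictionary(d, 742).otherField); // 1484
    }
}
```

A side effect of the dictionary approach is that Add throws on a duplicate key, so the uniqueness requirement is enforced by the structure rather than by convention.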

Use Dictionary when the number of lookups greatly exceeds the number of insertions. It is fine to use List when you will always have fewer than four items.
Reference - http://www.dotnetperls.com/dictionary-time

If you want to ensure that clients cannot create duplicate keys, you may want the class itself to be responsible for creating the unique key. Once unique key generation is the responsibility of the class, choosing between a dictionary and a list is the client's decision.

C# dictionary vs list usage

I had two questions. I was wondering if there is an easy class in the C# library that stores pairs of values instead of just one, so that I can store a class and an integer in the same node of the list. I think the easiest way is to just make a container class, but this is extra work each time, so I wanted to know whether I should be doing so or not. I know that later versions of .NET (I am using 3.5) have tuples that I could store, but those are not available to me.
I guess the bigger question is: what are the memory disadvantages of using a dictionary to store the integer-to-class map even though I don't need O(1) access and could afford to just search the list? What is the minimum size of the hash table? Should I just make the wrapper class I need?

If you need to store an unordered list of {integer, value}, then I would suggest making the wrapper class. If you need a data structure in which you can look up integer to get value (or, look up value to get integer), then I would suggest a dictionary.

The decision of List<Tuple<T1, T2>> (or List<KeyValuePair<T1, T2>>) vs Dictionary<T1, T2> is largely going to come down to what you want to do with it.
If you're going to be storing information and then iterating over it, without needing to do frequent lookups based on a particular key value, then a List is probably what you want. Depending on how you're going to use it, a LinkedList might be even better - slightly higher memory overheads, faster content manipulation (add/remove) operations.
On the other hand, if you're going to be primarily using the first value as a key to do frequent lookups, then a Dictionary is designed specifically for this purpose. Key value searching and comparison is significantly improved, so if you do much with the keys and your list is big a Dictionary will give you a big speed boost.
Data size is important to the decision. If you're talking about a couple hundred items or less, a List is probably fine. Above that point the lookup times will probably impact more significantly on execution time, so Dictionary might be more worth it.
There are no hard and fast rules. Every use case is different, so you'll have to balance your requirements against the overheads.

You can use a list of KeyValuePair: http://msdn.microsoft.com/en-us/library/5tbh8a42.aspx

You can use a Tuple<T,T1>, a list of KeyValuePair<T, T1> - or, an anonymous type, e.g.
var list = something.Select(x => new { Key = x.Something, Value = x.Value });

You can use either KeyValuePair or Tuple
For Tuple, you can read the following useful post:
What requirement was the tuple designed to solve?

Does the Enumerator of a Dictionary<TKey, TValue> return key value pairs in the order they were added?

I understand that a dictionary is not an ordered collection and one should not depend on the order of insertion and retrieval in a dictionary.
However, this is what I noticed:
Added 20 key value pairs to a Dictionary
Retrieved them by doing a foreach(KeyValuePair...)
The order of retrieval was same as the order in which they were added.
Tested for around 16 key value pairs.
Is this by design?

It's by coincidence, although predictably so. You absolutely shouldn't rely on it. Usually it will happen for simple situations, but if you start deleting elements and replacing them with anything either with the same hash code or just getting in the same bucket, that element will take the position of the original, despite having been added later than others.
It's relatively fiddly to reproduce this, but I managed to do it a while ago for another question:
using System;
using System.Collections.Generic;

class Test
{
    static void Main(string[] args)
    {
        var dict = new Dictionary<int, int>();
        dict.Add(0, 0);
        dict.Add(1, 1);
        dict.Add(2, 2);
        dict.Remove(0);
        dict.Add(10, 10);
        foreach (var entry in dict)
        {
            Console.WriteLine(entry.Key);
        }
    }
}
The results show 10, 1, 2 rather than 1, 2, 10.
Note that even though it looks like the current behaviour will always yield elements in insertion order if you don't perform any deletions, there's no guarantee that future implementations will do the same... so even in the restricted case where you know you won't delete anything, please don't rely on this.

From MSDN:
For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair<TKey, TValue> structure representing a value and its key. The order in which the items are returned is undefined.
[Emphasis added]

If you want to iterate through a Dictionary in a fixed order you could try OrderedDictionary

It is by design that the Dictionary<TKey,TValue> is not an ordered structure as it is intended to be used primarily more for key-based access.
If you need to retrieve items in key order, take a look at SortedDictionary<TKey, TValue>, which accepts an IComparer<TKey> that is used to sort the keys.
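A brief sketch of that suggestion. The keys are invented sample data, and StringComparer.OrdinalIgnoreCase is just one possible comparer.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDictionaryDemo
{
    // Returns the keys in enumeration order, joined for easy printing.
    public static string KeysInOrder()
    {
        // The comparer passed to the constructor decides the key order.
        var sorted = new SortedDictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        sorted.Add("pear", 3);
        sorted.Add("Apple", 1);
        sorted.Add("banana", 2);

        // Enumeration follows the comparer's order, not insertion order.
        return string.Join(", ", sorted.Keys);
    }

    public static void Main()
    {
        Console.WriteLine(KeysInOrder()); // Apple, banana, pear
    }
}
```

Note that this gives sorted order, which is a different guarantee from the insertion order asked about in the question.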

Is this by design? It probably wasn't in the original .NET Framework 2.0, but now there is an implicit contract that they will be ordered in the same order as added, because to change this would break so much code that relies on the behaviour of the original generic dictionary. Compare with the Go language, where their map deliberately returns a random ordering to prevent users of maps from relying on any ordering [1].
Any improvements or changes the framework writers make to Dictionary<T,V> would have to keep that implicit contract.
[1] "Since the release of Go 1.0, the runtime has randomized map iteration order. ", https://blog.golang.org/go-maps-in-action .

I don't think so; the dictionary does not guarantee the internal ordering of the items inside it.
If you need to keep the order as well, use additional data structure (array or list) along with the dictionary.
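One way this could look, as a minimal sketch: InsertionOrderedMap is a hypothetical wrapper that pairs a Dictionary with a List of keys. It is not a framework type and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: the dictionary gives fast lookup, the list remembers order.
public sealed class InsertionOrderedMap<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _map = new Dictionary<TKey, TValue>();
    private readonly List<TKey> _keys = new List<TKey>();

    public void Add(TKey key, TValue value)
    {
        _map.Add(key, value); // throws on a duplicate key before _keys is touched
        _keys.Add(key);
    }

    public bool Remove(TKey key)
    {
        if (!_map.Remove(key)) return false;
        _keys.Remove(key);    // O(n) scan; the price of keeping the order
        return true;
    }

    public TValue this[TKey key] => _map[key];

    // Enumerates pairs in the order the keys were added.
    public IEnumerable<KeyValuePair<TKey, TValue>> InInsertionOrder()
    {
        foreach (var key in _keys)
            yield return new KeyValuePair<TKey, TValue>(key, _map[key]);
    }
}
```

With this wrapper, adding 0, 1 and 2, removing 0 and then adding 10 enumerates as 1, 2, 10 regardless of how Dictionary happens to reuse its internal slots.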

I believe enumerating a Dictionary<K,V> will return the keys in the same order they were inserted if all the keys hash to the same value. This is because the Dictionary<K,V> implementation uses the hash code of the key object to insert key/value pairs into buckets, and the values are (usually) stored in the buckets in the order they are inserted. If you are consistently seeing this behavior with your user-defined objects, then perhaps you have not (correctly) overridden the GetHashCode() method?
