Complexity of searching in a list and in a dictionary - c#

Let's say I have a class:
class C
{
public int uniqueField;
public int otherField;
}
This is a very simplified version of the actual problem. I want to store multiple instances of this class, where "uniqueField" should be unique for each instance.
What is better in this case?
a) Dictionary with uniqueField as the key
Dictionary<int, C> d;
or b) List?
List<C> l;
In the first case (a) the same data would be stored twice (as the key and as the field of a class instance). But the question is: Is it faster to find an element in a dictionary than in a list? Or is it equally fast?
a)
d[searchedUniqueField]
b)
l.Find(x=>x.uniqueField==searchedUniqueField);

Assuming you've got quite a lot of instances, it's likely to be much faster to find the item in the dictionary. Basically a Dictionary<,> is a hash table, with O(1) lookup other than due to collisions.
Now if the collection is really small, then the extra overhead of finding the hash code, computing the right bucket and then looking through that bucket for matching hash codes, then performing a key equality check can take longer than just checking each element in a list.
If you might have a lot of instances but might not, I'd usually pick the dictionary approach. For one thing it expresses what you're actually trying to achieve: a simple way of accessing an element by a key. The overhead for small collections is unlikely to be very significant unless you have far more small collections than large ones.
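To make the comparison concrete, here is a minimal sketch of both lookups side by side. The helper class LookupSketch and its method names are made up for illustration; only C, Dictionary<int, C> and List<C> come from the question.

```csharp
using System.Collections.Generic;

public class C
{
    public int uniqueField;
    public int otherField;
}

// Hypothetical helper, not part of the question.
public static class LookupSketch
{
    // Dictionary lookup: hashes the key and probes a bucket, O(1) on average.
    // TryGetValue avoids the KeyNotFoundException that d[key] throws for a missing key.
    public static C FindInDictionary(Dictionary<int, C> d, int key)
    {
        C found;
        return d.TryGetValue(key, out found) ? found : null;
    }

    // List lookup: tests elements one by one until the predicate matches, O(n).
    // List<T>.Find returns null for a reference type when nothing matches.
    public static C FindInList(List<C> l, int key)
    {
        return l.Find(x => x.uniqueField == key);
    }
}
```

Both methods return null for a missing key, so they can be swapped behind the same call site while you measure which one suits your collection sizes.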

Use Dictionary when the number of lookups greatly exceeds the number of insertions. It is fine to use List when you will always have fewer than four items.
Reference - http://www.dotnetperls.com/dictionary-time

If you want to ensure that clients cannot create duplicate keys, you may want the class itself to be responsible for generating the unique key. Once key generation is the responsibility of the class, the choice between a dictionary and a list is up to the client.
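One way this could look, as a rough sketch: the class hands out its own keys from a static counter. The name SelfKeyedC and the counter are assumptions for illustration, not something from the question.

```csharp
using System.Threading;

// Hypothetical variant of the question's class C that generates its own key.
public class SelfKeyedC
{
    // Hypothetical shared counter; not in the original class.
    private static int nextId;

    public readonly int uniqueField;
    public int otherField;

    public SelfKeyedC(int otherField)
    {
        // Interlocked.Increment keeps the generated keys distinct across threads.
        uniqueField = Interlocked.Increment(ref nextId);
        this.otherField = otherField;
    }
}
```

Because uniqueField is readonly and assigned only in the constructor, client code cannot produce two instances with the same key (short of the counter wrapping around).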

Related

ConcurrentDictionary<TKey, TValue> - How to efficiently "get N elements, starting from key K"?

Situation as follows:
I have a ConcurrentDictionary<TId, TItem>
For efficient paging, we want to implement "get N Items, starting from key K"
The best approach I came up with was:
public IEnumerable<TItem> Get( TId fromKey, int count )
{
// parameter validation left out for brevity
return items.Keys // KeyCollection of the Dictionary, please assume 'items' is a class field
.SkipWhile(key => key != fromKey)
.Take(count)
.Select(x => items[x])
.ToList();
}
But that feels really wrong. Especially because we explicitly do not want to "SkipWhile".
If I was OK to Skip, I could just do .Skip(n).Take(m) on the Values but that's explicitly not wanted. Requirement for me is: starting at key K, return N elements.
Maybe I am overthinking this and I should push back. But I have the feeling, I am missing something here.
So my question is: Is there a way to do this, without having to "skip over" in either KeyCollection or ValueCollection of the Dictionary?
EDIT
ConcurrentDictionary<TKey, TValue> is where I picked up the task. It is not carved in stone to keep that Type.
Order is no priority. Seniors and PO view it "good enough" to go by whatever order results from KeyCollection. But that's a good point to keep in mind looking to possible future feature requests.
Well, based on the comments, it sounds a bit bizarre, but I'm sure there are reasons you can't go into the backstory or details.
I would say this.
SkipWhile(key => key != fromKey) is really the only way you can find a key for the purpose of finding more keys "after it", so in that sense, what you have is correct. If your keyspace is not ridiculously large, that seems sufficient.
That said, a different data structure would be better. For example, you could implement a concurrent version of a dictionary + array or dictionary + linked list that allows you to access a key in O(1) and then the subsequent elements in O(m) inside of a lock (you could even make it a ReaderWriterLockSlim). That avoids the O(n) scan to find the key if just using ConcurrentDictionary.
Insertion would be a bit strange, because you'd have to maintain a somewhat arbitrary notion of what before and after mean. In the dictionary + array case for example, you could add key 'foo' into the dictionary and into slot 0 in the array. Key 'bar' would go into the dictionary as usual, and into slot 1, and so on.
Oh - and your dictionary entry would have to point to the location in the array or the linked list to get that O(m), as well as the data itself. And, if you want to de-duplicate the data, the array/list could point back to the dictionary entry instead of just holding the data.
Arrays will leave you with holes when items are deleted! That's where a linked list would be helpful. Writes will be a bit slower to maintain "ordering" (using this term loosely) and because you are accessing two underlying data structures.
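A minimal sketch of the dictionary + linked list variant described above, assuming a plain lock is acceptable (a ReaderWriterLockSlim could replace it). The type name PagedMap and its members are hypothetical, and "order" here simply means insertion order.

```csharp
using System.Collections.Generic;

// Hypothetical type; insertion order stands in for "before" and "after".
public class PagedMap<TId, TItem>
{
    private readonly object gate = new object();
    private readonly LinkedList<KeyValuePair<TId, TItem>> order =
        new LinkedList<KeyValuePair<TId, TItem>>();
    // Each dictionary entry points at the list node, so finding the start is O(1).
    private readonly Dictionary<TId, LinkedListNode<KeyValuePair<TId, TItem>>> index =
        new Dictionary<TId, LinkedListNode<KeyValuePair<TId, TItem>>>();

    public void AddOrUpdate(TId key, TItem item)
    {
        lock (gate)
        {
            var pair = new KeyValuePair<TId, TItem>(key, item);
            if (index.TryGetValue(key, out var node))
                node.Value = pair;                // existing key keeps its position
            else
                index[key] = order.AddLast(pair); // new key goes to the end
        }
    }

    public bool Remove(TId key)
    {
        lock (gate)
        {
            if (!index.TryGetValue(key, out var node)) return false;
            order.Remove(node);                   // O(1) unlink, leaves no hole
            index.Remove(key);
            return true;
        }
    }

    // Returns up to 'count' items starting at 'fromKey' (inclusive):
    // O(1) to locate the start node, then O(count) to walk forward.
    public List<TItem> Get(TId fromKey, int count)
    {
        var result = new List<TItem>();
        lock (gate)
        {
            if (!index.TryGetValue(fromKey, out var node)) return result;
            for (; node != null && result.Count < count; node = node.Next)
                result.Add(node.Value.Value);
        }
        return result;
    }
}
```

The result is copied into a list inside the lock, so callers never enumerate the shared linked list while another thread modifies it.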

ILookUp vs. Dictionary [duplicate]

I'm trying to wrap my head around which data structures are the most efficient and when / where to use which ones.
Now, it could be that I simply just don't understand the structures well enough, but how is an ILookup(of key, ...) different from a Dictionary(of key, list(of ...))?
Also where would I want to use an ILookup and where would it be more efficient in terms of program speed / memory / data accessing, etc?
Two significant differences:
Lookup is immutable. Yay :) (At least, I believe the concrete Lookup class is immutable, and the ILookup interface doesn't provide any mutating members. There could be other mutable implementations, of course.)
When you lookup a key which isn't present in a lookup, you get an empty sequence back instead of a KeyNotFoundException. (Hence there's no TryGetValue, AFAICR.)
They're likely to be equivalent in efficiency - the lookup may well use a Dictionary<TKey, GroupingImplementation<TValue>> behind the scenes, for example. Choose between them based on your requirements. Personally I find that the lookup is usually a better fit than a Dictionary<TKey, List<TValue>>, mostly due to the first two points above.
Note that as an implementation detail, the concrete implementation of IGrouping<,> which is used for the values implements IList<TValue>, which means that it's efficient to use with Count(), ElementAt() etc.
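The missing-key difference is easy to see in code. This is only a sketch; the class MissingKeySketch and its method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper contrasting the two missing-key behaviours.
public static class MissingKeySketch
{
    // The lookup indexer returns an empty sequence for an absent key,
    // so no existence check is needed before counting.
    public static int CountInLookup(ILookup<string, int> lookup, string key)
    {
        return lookup[key].Count();
    }

    // The dictionary indexer would throw KeyNotFoundException for an absent key,
    // so TryGetValue is used to get the same "zero items" answer.
    public static int CountInDictionary(Dictionary<string, List<int>> d, string key)
    {
        List<int> values;
        return d.TryGetValue(key, out values) ? values.Count : 0;
    }
}
```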
Interesting that nobody has stated the actual biggest difference (Taken directly from MSDN):
A Lookup resembles a Dictionary. The difference is that a Dictionary maps keys to single values, whereas a Lookup maps keys to collections of values.
Both a Dictionary<Key, List<Value>> and a Lookup<Key, Value> logically can hold data organized in a similar way and both are of the same order of efficiency. The main difference is a Lookup is immutable: it has no Add() methods and no public constructor (and as Jon mentioned you can query a non-existent key without an exception and have the key as part of the grouping).
As to which do you use, it really depends on how you want to use them. If you are maintaining a map of key to multiple values that is constantly being modified, then a Dictionary<Key, List<Value>> is probably better since it is mutable.
If, however, you have a sequence of data and just want a read-only view of the data organized by key, then a lookup is very easy to construct and will give you a read-only snapshot.
Another difference not mentioned yet is that Lookup supports null keys:
Lookup class implements the ILookup interface. Lookup is very similar to a dictionary except multiple values are allowed to map to the same key, and null keys are supported.
The primary difference between an ILookup<K,V> and a Dictionary<K, List<V>> is that a dictionary is mutable; you can add or remove keys, and also add or remove items from the list that is looked up. An ILookup is immutable and cannot be modified once created.
The underlying implementation of both mechanisms will be either the same or similar, so their searching speed and memory footprint will be approximately the same.
When an exception is not an option, go for Lookup
If you are trying to get a structure as efficient as a Dictionary but you don't know for sure that the input contains no duplicate keys, Lookup is safer.
As mentioned in another answer, it also supports null keys, and it always returns a valid result when queried with arbitrary data, so it is more resilient to unknown input (less prone than Dictionary to raise exceptions).
And it is especially true if you compare it to the System.Linq.Enumerable.ToDictionary function :
// won't throw
new[] { 1, 1 }.ToLookup(x => x);
// System.ArgumentException: An item with the same key has already been added.
new[] { 1, 1 }.ToDictionary(x => x);
The alternative would be to write your own duplicate key management code inside of a foreach loop.
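For comparison, that hand-written loop might look something like the sketch below. The "keep the first item, ignore later duplicates" policy and the name ToDictionaryKeepFirst are assumptions; you may want a different rule, such as last one wins.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the duplicate policy (first item wins) is an assumption.
public static class DuplicateKeySketch
{
    public static Dictionary<TKey, T> ToDictionaryKeepFirst<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var result = new Dictionary<TKey, T>();
        foreach (var item in source)
        {
            var key = keySelector(item);
            // Skip later duplicates instead of letting Add throw ArgumentException.
            if (!result.ContainsKey(key))
                result.Add(key, item);
        }
        return result;
    }
}
```

Unlike ToLookup, this still throws for a null key, because Dictionary does not accept one.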
Performance considerations, Dictionary: a clear winner
If you don't need a list and you are going to manage a huge number of items, Dictionary (or even your own custom tailored structure) would be more efficient:
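using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

var list = new List<string>();
for (int i = 0; i < 5000000; ++i)
{
list.Add(i.ToString());
}
// Time only the construction of the lookup, not the list setup.
var stopwatch = Stopwatch.StartNew();
var lookup = list.ToLookup(x => x);
stopwatch.Stop();
Console.WriteLine("Creation: " + stopwatch.Elapsed);
// ... Same but for ToDictionary
stopwatch.Restart();
var dictionary = list.ToDictionary(x => x);
stopwatch.Stop();
Console.WriteLine("Creation: " + stopwatch.Elapsed);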
As Lookup has to maintain a list of items for each key, it is slower than Dictionary (around 3.5x slower in the run below, with a huge number of items):
Lookup speed:
Creation: 00:00:01.5760444
Dictionary speed:
Creation: 00:00:00.4418833

C# dictionary vs list usage

I had two questions. I was wondering if there is an easy class in the C# library that stores pairs of values instead of just one, so that I can store a class and an integer in the same node of the list. I think the easiest way is to just make a container class, but that is extra work each time, so I wanted to know whether I should be doing so or not. I know that later versions of .NET (I am using 3.5) have tuples that I can store, but those are not available to me.
I guess the bigger question is what are the memory disadvantages of using a dictionary to store the integer class map even though I don't need to access in O(1) and could afford to just search the list? What is the minimum size of the hash table? Should I just make the wrapper class I need?
If you need to store an unordered list of {integer, value}, then I would suggest making the wrapper class. If you need a data structure in which you can look up integer to get value (or, look up value to get integer), then I would suggest a dictionary.
The decision of List<Tuple<T1, T2>> (or List<KeyValuePair<T1, T2>>) vs Dictionary<T1, T2> is largely going to come down to what you want to do with it.
If you're going to be storing information and then iterating over it, without needing to do frequent lookups based on a particular key value, then a List is probably what you want. Depending on how you're going to use it, a LinkedList might be even better - slightly higher memory overheads, faster content manipulation (add/remove) operations.
On the other hand, if you're going to be primarily using the first value as a key to do frequent lookups, then a Dictionary is designed specifically for this purpose. Key value searching and comparison is significantly improved, so if you do much with the keys and your list is big a Dictionary will give you a big speed boost.
Data size is important to the decision. If you're talking about a couple hundred items or less, a List is probably fine. Above that point the lookup times will probably impact more significantly on execution time, so Dictionary might be more worth it.
There are no hard and fast rules. Every use case is different, so you'll have to balance your requirements against the overheads.
You can use a list of KeyValuePair: http://msdn.microsoft.com/en-us/library/5tbh8a42.aspx
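As a rough sketch of that suggestion, which also works on .NET 3.5: the class PairListSketch, the sample data and the string value type are placeholders for whatever class you actually store.

```csharp
using System.Collections.Generic;

// Hypothetical helper; string stands in for the asker's own class.
public static class PairListSketch
{
    // Stores (integer, value) pairs without writing a wrapper class.
    public static List<KeyValuePair<int, string>> Build()
    {
        var list = new List<KeyValuePair<int, string>>();
        list.Add(new KeyValuePair<int, string>(1, "one"));
        list.Add(new KeyValuePair<int, string>(2, "two"));
        return list;
    }

    // Linear search, O(n); returns null when no pair has the given key.
    public static string FindValue(List<KeyValuePair<int, string>> list, int key)
    {
        foreach (var pair in list)
        {
            if (pair.Key == key) return pair.Value;
        }
        return null;
    }
}
```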
You can use a Tuple<T,T1>, a list of KeyValuePair<T, T1> - or, an anonymous type, e.g.
var list = something.Select(x => new { Key = x.Something, Value = x.Value });
You can use either KeyValuePair or Tuple
For Tuple, you can read the following useful post:
What requirement was the tuple designed to solve?

Write-once read-many string-to-object map

I'm looking for a data structure that can possibly outperform Dictionary<string, object>. I have a map that has N items - the map is constructed once and then read many, many times. The map doesn't change during the lifetime of the program (no new items are added, no items are deleted and items are not reordered). Because the map doesn't change, it doesn't need to be thread-safe, even though the application using it is heavily multi-threaded. I expect that ~50% of lookups will happen for items not in the map.
Dictionary<TKey, TItem> is quite fast and I may end up using it but I wonder if there's another data structure that's faster for this scenario. While the rest of the program is obviously more expensive than this map, it is used in performance-critical parts and I'd like to speed it up as much as possible.
What you're looking for is a Perfect Hash Function. You can create one based on your list of strings, and then use it for the Dictionary.
The non-generic Hashtable has a constructor that accepts IHashCodeProvider that lets you specify your own hash function. I couldn't find an equivalent for Dictionary, so you might have to resort to using a Hashtable instead.
You can use it internally in your PerfectStringHash class, which will do all the type casting for you.
Note that you may need to be able to specify the number of buckets in the hash. I think Hashtable only lets you specify the load factor. You may find out you need to roll your own hash entirely. It's a good class for everyone to use, I guess, a generic perfect hash.
EDIT: Apparently someone already implemented some Perfect Hash algorithms in C#.
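For what it's worth, Dictionary<TKey, TValue> does have constructors that take an IEqualityComparer<TKey>, and the comparer's GetHashCode is what the dictionary uses to hash keys. Below is a minimal sketch of plugging in a custom hash that way. The names CustomHashComparer and ReadOnlyMapSketch are hypothetical, and the FNV-1a-style hash is only a placeholder: it is not a perfect hash, and Dictionary still chooses its own bucket count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer; swap GetHashCode for your own (ideally perfect) hash.
public sealed class CustomHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    // Placeholder FNV-1a-style hash over the UTF-16 chars of the string.
    public int GetHashCode(string s)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in s)
                hash = (hash ^ c) * 16777619;
            return (int)hash;
        }
    }
}

public static class ReadOnlyMapSketch
{
    // Builds the map once; afterwards it is only read.
    public static Dictionary<string, object> Build(
        IEnumerable<KeyValuePair<string, object>> items)
    {
        var map = new Dictionary<string, object>(new CustomHashComparer());
        foreach (var item in items)
            map[item.Key] = item.Value;
        return map;
    }
}
```

Whether this beats the default string hashing is something only a benchmark on your real keys can tell, especially with about half the lookups missing.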
The read performance of the generic dictionary is "close to O(1)" according to the remarks on MSDN for most TKey (and you should get pretty good performance with just string keys). And you get this out of the box, free, from the framework, without implementing your own collection.
http://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.90).aspx
If you need to stick with string keys - Dictionary is at least very good (if not best choice).
One more thing to note when you start measuring: consider whether computing the hash itself has a measurable impact. Computing the hash of a long string takes longer than for a short one. See if the items you want to search for can be represented as other objects whose hash can be computed in constant time.

Best performance on a String Dictionary in C#

I am designing a C# class that contains a string hierarchy, where each string has 0 or 1 parents.
My inclination is to implement this with a Dictionary<string,string> where the key is the child and value is the parent. The dictionary may have a large amount of values, but I can't say the exact size. This seems like it should perform faster than creating composite wrapper with references to the parent, but I could be wrong.
Is there an alternative approach I can take that will ensure better performance speed?
Retrieving values from a Dictionary<K,V> is extremely fast (close to O(1), i.e., almost constant time lookup regardless of the size of the collection) because the underlying implementation uses a hash table. Of course, if the key type uses a terrible hashing algorithm then the performance can degrade, but you can rest assured that this is likely not the case for the framework's string type.
However, as I asked in my comment, you need to answer a few questions:
Define what performance metric is most important, i.e., Time (CPU) or space (memory).
What are your requirements? How will this be used? What's your worst case scenario? Is this going to hold a ton of data with relatively infrequent lookups, do many lookups need to be performed in a short amount of time, or do both hold true?
The Dictionary<K,V> class also uses an array internally which will grow as you add items. Is this okay for you? Again, you need to be more specific in terms of your requirements before anyone can give you a complete answer.
Using a Dictionary will be slower than using direct references, because the Dictionary will have to compute a hash etc. If you really only need the parent and not the child's operation (which I doubt), then you could store the Strings in an array together with the index of the parent String.
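The array idea could be sketched roughly as follows, assuming callers can work with indices rather than the strings themselves. The type StringHierarchy and the convention of -1 for "no parent" are assumptions for illustration.

```csharp
using System;

// Hypothetical type: strings live in one array, parents in a parallel index array.
public sealed class StringHierarchy
{
    private readonly string[] names;
    private readonly int[] parents; // index of the parent, or -1 for a root

    public StringHierarchy(string[] names, int[] parents)
    {
        if (names.Length != parents.Length)
            throw new ArgumentException("names and parents must have the same length");
        this.names = names;
        this.parents = parents;
    }

    public string NameOf(int index)
    {
        return names[index];
    }

    // Direct array access, no hashing; returns null for a root.
    public string ParentOf(int index)
    {
        int parent = parents[index];
        return parent < 0 ? null : names[parent];
    }
}
```

This only pays off if you already hold the index of the child; going from a string to its index still needs a search or a dictionary.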
