In C#, I find myself using a List<T>, IList<T> or IEnumerable<T> 99% of the time. Is there a case when it would be better to use a Hashtable (or Dictionary<TKey,TValue> in 2.0 and above) over these?
Edit:
As pointed out, what someone would like to do with the collection often dictates what one should be using, so when would you use a Hashtable/Dictionary<TKey,TValue> over a List<T>?
Maybe not directly related to the OP's question, but there's a useful blog post about which collection structure to use at: SortedSets
Basically, what you want to do with the collection determines what type of collection you should create.
To summarise in more detail:
Use IList if you want to be able to enumerate and / or modify the collection (normally adding at end of list)
Use IEnumerable if you just want to enumerate the collection (don't need to add / remove - usually used as a return type)
Use IDictionary if you want to access elements by a key (adding / removing elements quickly using a key)
Use SortedSet if you want to access a collection in a predefined order (most common usage being to access the collection in order)
Overall, use Dictionary if you want to access / modify items by key in no particular order (preferred over a list, which is generally accessed in order; over IEnumerable, which you can't modify; over Hashtable, which is not strongly typed; and over SortedList when you don't need the keys sorted)
You use a hashtable (dictionary) when you want fast look up access to an item based on a key.
If you are using List, IList or IEnumerable, generally this means that you are looping over data (well, in the case of IEnumerable it definitely means that), and a hashtable isn't going to net you anything. Now if you were looking up a value in one list and using that to access data in another list, that would be a little different. For example:
Find position in list of Item foo.
Position in list for foo corresponds to position in another list which contains Foo_Value.
Access position in second list to get Foo_Value.
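The three steps above can be compared with a single keyed lookup. This is a minimal sketch, assuming string names and integer values; `LookupSketch` and both method names are hypothetical and only serve the illustration:

```csharp
using System.Collections.Generic;

public static class LookupSketch
{
    // O(n): find the position of the name in one list,
    // then read the same position in the second list.
    public static int? FindInParallelLists(List<string> names, List<int> values, string name)
    {
        int position = names.IndexOf(name);   // linear scan
        return position >= 0 ? values[position] : (int?)null;
    }

    // O(1) on average: the dictionary hashes the key and goes straight to the value.
    public static int? FindInDictionary(Dictionary<string, int> valuesByName, string name)
    {
        return valuesByName.TryGetValue(name, out int value) ? value : (int?)null;
    }
}
```

Both methods return null when the name is missing, so the two approaches can be swapped without changing the caller.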
Here is a link describing the different datatypes.
Another link.
Use a hashtable, when you need to be able to (quickly) look up items by key.
Of course, you can search through an IList or IEnumerable etc for a matching key but that will take O(n) time rather than O(1) for Hashtable or Dictionary.
Hash-tables are good choices if you are often doing "for something in collection" and you aren't concerned about the order of items in the collection.
Hash-tables are indexes. You can maintain a hash-table to index a list, so you can choose to access it either in order or randomly based upon the key.
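One way to index a list with a hash table is to keep both structures side by side and update them together. A minimal sketch; the `IndexedList` type and its members are hypothetical, not a framework class:

```csharp
using System.Collections.Generic;

public sealed class IndexedList<TKey, TItem>
{
    // The list keeps insertion order; the dictionary is the keyed index over it.
    private readonly List<TItem> _items = new List<TItem>();
    private readonly Dictionary<TKey, TItem> _byKey = new Dictionary<TKey, TItem>();

    public void Add(TKey key, TItem item)
    {
        // Dictionary.Add throws ArgumentException on a duplicate key,
        // before the list is touched, so the two stay in sync.
        _byKey.Add(key, item);
        _items.Add(item);
    }

    // Ordered access.
    public IReadOnlyList<TItem> InOrder => _items;

    // Keyed access.
    public bool TryGet(TKey key, out TItem item) => _byKey.TryGetValue(key, out item);
}
```

The sketch only supports adding; removal would need to update both structures, and removing from the list is O(n).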
Hashtable optimizes lookups. It computes a hash of each key you add. It then uses this hash code to look up the element very quickly. It is an older .NET Framework type. It is slower than the generic Dictionary type.
You're not really comparing the same things. When I use a dictionary, it's because I want a lookup for the data: usually I want to store a collection of objects and be able to quickly look them up using a key of some kind.
I use Hashtables quite often to send back key/value collections to JavaScript via page methods.
Dictionaries are good for caching things when you need to retrieve an object given its ID but don't want to have to hit the database: Assuming your collection is not large enough to induce a large number of collisions and your data needs retrieving often enough for an IEnumerable to be too slow, Dictionaries can give a decent speed-up.
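The caching idea can be sketched as follows. This assumes integer IDs and a caller-supplied loader delegate standing in for the database call; `ByIdCache` and `load` are hypothetical names, and the sketch is not thread-safe:

```csharp
using System;
using System.Collections.Generic;

public sealed class ByIdCache<TValue>
{
    private readonly Dictionary<int, TValue> _cache = new Dictionary<int, TValue>();
    private readonly Func<int, TValue> _load;   // hypothetical loader, e.g. a database query

    public ByIdCache(Func<int, TValue> load)
    {
        _load = load;
    }

    public TValue Get(int id)
    {
        if (!_cache.TryGetValue(id, out TValue value))
        {
            value = _load(id);    // slow path: runs only on the first request for this id
            _cache[id] = value;
        }
        return value;             // fast path: a single hash lookup
    }
}
```

For access from several threads, ConcurrentDictionary<TKey,TValue> would be the more usual starting point.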
There's no way of telling exactly without knowing what the collection is for, but unless the items in your collection are unique you cannot use a hashtable, as there will be nothing to use as a key. So perhaps the rule of thumb you are looking for is that if your members are all different and you want to pull individual instances out by key, use a hashtable. If you have a bunch of items that you wish to treat in the same way (such as doing a foreach on the entire set) use a list.
Related
Normally, I use a dictionary like a list, but with a key of a different type. I like the ability to quickly access individual items in the dictionary without having to loop through it until I find the item with the right property (because the property I'm looking for is in the Key).
But there is another possible use of a dictionary. I could just use the Key to store property A and the Value to store property B without ever using the dictionary's special functionality. For example, I could store a list of persons just by storing the forename in the key and the family name in the value (let's assume, for the sake of simplicity, that there won't ever be two people with the same forename, because I just couldn't come up with a better example). I would only use that dictionary to loop through it in a foreach loop and add items to it (no removing, sorting or accessing individual items). There would actually be no difference between using a List<KeyValuePair<string, string>> and using a Dictionary<string, string> (at least not in the example that I gave - I know that I could e.g. store multiple items with the same key in the list).
So, to sum it up, what should I do when I don't need to use the special functionalities a dictionary provides and just use it to store something that has exactly two properties:
use a Dictionary<,>
use a List<KeyValuePair<,>>
use a List<MyType> with MyType being a custom class that contains the two properties and a constructor.
Don't use dictionaries for that.
If you don't want to create a class for this purpose, use something like List<Tuple<T1,T2>>. But keep in mind a custom class will be both more readable and more flexible.
Here's the reason: it will be much easier to read your code if you use proper data structures. Using a dictionary will only confuse the reader, and you'll have problems the day a duplicate key shows up.
If someone reads your code and sees a Dictionary being used, he will assume you really mean to use a map-like structure. Your code should be clear and your intent should be obvious when reading it.
If you're concerned with performance you should probably store the data in a List. A Dictionary has a lot of internal overhead, in both memory and CPU.
If you're just concerned with readability, choose the data structure that best captures your intent. If you are storing key-value pairs (for example, custom fields in a bug tracker issue) then use a Dictionary. If you are just storing items without them having some kind of logical key, use a List.
It takes little work to create a custom class to use as an item in a List. Using a Dictionary just because it gives you a Key property for each item is a misuse of that data structure. It is easy to create a custom class that also has a Key property.
Use List<MyType> where MyType includes all the values.
The problem with the dictionary approach is that it's not flexible. If you later decide to add middle names, you'll need to redesign your whole data structure, rather than just adding another field to MyType.
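A minimal sketch of the List<MyType> option, using the forename / family name example from the question. `Person`, `PeopleSketch` and the sample names are hypothetical:

```csharp
using System.Collections.Generic;

public sealed class Person
{
    public Person(string forename, string familyName)
    {
        Forename = forename;
        FamilyName = familyName;
    }

    public string Forename { get; }
    public string FamilyName { get; }
    // A later addition such as a middle name is one more property here;
    // code that only reads Forename and FamilyName keeps working.
}

public static class PeopleSketch
{
    public static List<Person> Build()
    {
        return new List<Person>
        {
            new Person("John", "Smith"),
            // Same forename again: fine in a list, whereas
            // Dictionary<string, string>.Add would throw on the duplicate key.
            new Person("John", "Jones"),
        };
    }
}
```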
Say I have a Dictionary, and I add each key and value entry in a specific order.
Now, if I want later to be able to iterate this Dictionary in the same order entries were added, is it the order I get with simple foreach loop on this dictionary?
If not, I will be glad to hear how can I do that, I know this can be done easily with List instead of Dictionary but I don't want to.
Thanks
Normal Dictionary does not guarantee order of items.
You need OrderedDictionary if you want to maintain the order in which items were added to it. Note that there is no generic version of this class in the .NET Framework, so you either have to give up some type safety or find another implementation (e.g. https://www.codeproject.com/Articles/18615/OrderedDictionary-T-A-generic-implementation-of-IO as suggested by Tim S).
Alternatively if O(log n) lookup is fine and keys should be sorted - SortedDictionary.
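The difference between the two classes can be sketched like this, assuming string keys and the non-generic OrderedDictionary from System.Collections.Specialized; `OrderSketch` and the sample keys are hypothetical:

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;

public static class OrderSketch
{
    // OrderedDictionary enumerates entries in the order they were added.
    public static List<string> KeysInInsertionOrder()
    {
        var ordered = new OrderedDictionary();   // non-generic: keys and values are object
        ordered.Add("zebra", 1);
        ordered.Add("apple", 2);
        ordered.Add("mango", 3);

        var keys = new List<string>();
        foreach (DictionaryEntry entry in ordered)
        {
            keys.Add((string)entry.Key);         // the cast is the type safety given up
        }
        return keys;
    }

    // SortedDictionary enumerates entries in key order, whatever the insertion order.
    public static List<string> KeysInSortedOrder()
    {
        var sorted = new SortedDictionary<string, int>
        {
            { "zebra", 1 }, { "apple", 2 }, { "mango", 3 }
        };
        return new List<string>(sorted.Keys);
    }
}
```

The first method yields zebra, apple, mango; the second yields apple, mango, zebra.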
Sounds like what you want is a Queue<T>: http://msdn.microsoft.com/en-us/library/7977ey2c.aspx
Add your KeyValuePair<T, U> items to it in the order you want and then foreaching over it will be in first-in/first-out order.
Dictionaries are hash tables, which means that you can't guarantee that iterating the pairs will return them in the same order you added them.
Each pair is a KeyValuePair<T_K, T_V>, so you could have a List<KeyValuePair<string, string>> that would let you iterate in the order you add them if that's what you need.
The internal sort of the dictionary will depend on the hash function used. However if you need a sorted view of the data, you can use Enumerable.OrderBy.
I'm wondering whether it will be quicker to follow one pattern or another for constructing a unique list of objects in C#:
Option 1
Add all the items into a generic list
Call the list.Distinct function on it
Option 2
Iterate over each item
Check whether the item already exists in the list and if not add it
You can use HashSet<T>:
The HashSet class provides high-performance set operations. A set
is a collection that contains no duplicate elements, and whose
elements are in no particular order.
You can provide custom IEqualityComparer<T> via constructor.
This is one of those "should I use a shoe or a brick to pound a nail into the wood" questions. You should use the appropriate data structure for the job, which based on your requirement of "constructing a unique list of objects", the HashSet<T> class satisfies.
If you require the items in list format, you can always call ToList() on the set.
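A minimal sketch comparing Option 1 with a HashSet<T>-based version of Option 2; `UniqueSketch` and its method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UniqueSketch
{
    // Option 1 from the question: collect everything, then filter with Distinct.
    public static List<T> ViaDistinct<T>(IEnumerable<T> items)
    {
        return items.Distinct().ToList();
    }

    // Option 2, but with a HashSet<T> doing the "already exists?" check.
    // HashSet<T>.Add returns false for a duplicate, so each check is O(1) on average
    // instead of the O(n) scan that List<T>.Contains performs.
    public static List<T> ViaHashSet<T>(IEnumerable<T> items)
    {
        var seen = new HashSet<T>();
        var result = new List<T>();
        foreach (T item in items)
        {
            if (seen.Add(item))
            {
                result.Add(item);   // first occurrence only, in input order
            }
        }
        return result;
    }
}
```

If input order does not matter, `new HashSet<T>(items).ToList()` is shorter; a custom IEqualityComparer<T> can be passed to the HashSet<T> constructor when the default equality is not what you want.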
If you are concerned about the performance of looking up unique items, use a Dictionary<TKey, TValue>. Also, a dictionary requires unique keys, so you will never have duplicates.
I have a List of Dictionaries, List<Dictionary<String,Object>>. The key is an identifier of some abstract record. These Dictionaries come from various places. The size of each Dictionary is in the range [0, 1000].
All Dictionaries contain unique keys. After accumulating some Dictionaries I must make a search by key. It could be done by iterating the List and calling search method on every Dictionary or it could be done by copying all Dictionaries into one. These approaches do not offer very good performance. I am interested in ways to optimize this task.
Edit:
Thank you guys! Maybe I'll change the accumulation method and as result eliminate the problem itself!
Are you expecting there to be lots of key fetches after an initial population phase? If so, amalgamate everything into a single dictionary. If you'll only be doing a few fetches, I can't see any way you could get better than asking every dictionary.
Of course you could create a hybrid approach: create a new (initially empty) dictionary for the amalgamated results, and populate it as you're asked for keys - by searching through all the rest each time you're asked for a key which isn't already in your "big" dictionary.
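The hybrid approach can be sketched as below: search the source dictionaries only on a miss and remember what was found. `LazyMergedLookup` is a hypothetical name, and the sketch assumes the source dictionaries are not modified after it is constructed:

```csharp
using System.Collections.Generic;

public sealed class LazyMergedLookup<TKey, TValue>
{
    private readonly List<Dictionary<TKey, TValue>> _sources;
    // The "big" dictionary, initially empty, filled as keys are requested.
    private readonly Dictionary<TKey, TValue> _merged = new Dictionary<TKey, TValue>();

    public LazyMergedLookup(List<Dictionary<TKey, TValue>> sources)
    {
        _sources = sources;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        if (_merged.TryGetValue(key, out value))
        {
            return true;                     // seen before: one lookup
        }

        foreach (Dictionary<TKey, TValue> source in _sources)
        {
            if (source.TryGetValue(key, out value))
            {
                _merged[key] = value;        // remember it for next time
                return true;
            }
        }

        value = default(TValue);
        return false;                        // not present in any source
    }
}
```

Keys that are absent from every source are searched for again on each call; caching misses as well would need a separate set of known-missing keys.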
Is there no way of predicting which dictionary would have a particular key?
If there is any way to identify the dictionary of interest from a key, you can naturally create a cross-association table that maps each key to its dictionary.
If not, IMHO, I don't see any option other than iterating over the collection and asking each dictionary for the key, perhaps with a plain for loop rather than nicer-looking LINQ code.
Adding to what Jon said, there is a library called PowerCollections which contains MultiDictionary. If my memory is not corrupted, I believe you can use this for the purpose mentioned.
http://powercollections.codeplex.com/discussions/242163
It sounds like you have lots of dictionaries to "speed up" (assumption of motive) searches that are limited to certain "abstract record" types.
You can get away with one single dictionary, rather than maintaining a separate dictionary for each and every abstract record type as at present; on limited searches, check after finding the result that it is of the required abstract record type.
I have a dictionary structure, with multiple key value pairs inside.
myDict.Add(key1, value1);
myDict.Add(key2, value2);
myDict.Add(key3, value3);
My dictionary is used as a data source for some control. In the control's dropdown I see the items are like this:
key1
key2
key3
The order looks identical to my dictionary.
I know Dictionary is not like ArrayList, where you can get an item by index.
I cannot use SortedDictionary.
Now I need to add one more key value pair to this dictionary at some point of my program and I hope it has the same effect as I do this:
myDict.Add(newKey, newValue);
myDict.Add(key1, value1);
myDict.Add(key2, value2);
myDict.Add(key3, value3);
If I do this, I know newKey will display in my control as first element.
I have an idea to create a tempDict, put each pair in myDict to tempDict, then clear myDict, then add pairs back like this:
myDict.Add(newKey, newValue);
myDict.Add(key1, value1);
myDict.Add(key2, value2);
myDict.Add(key3, value3);
Is there better way than this?
Thanks!
Dictionary<K,V> does not have an ordering. Any perceived order maintenance is by chance (and an artifact of a particular implementation including, but not limited to, bucket selection order and count).
These are the approaches (just using the Base Class Libraries BCL) I know about:
Lookup<K,V>
.NET 3.5 and later, immutable, can map keys to multiple values (watch for duplicates during building)
OrderedDictionary
Old, non-generic, expected Dictionary performance bounds (other two approaches are O(n) for "get(key)/set(key)")
List<KeyValuePair<K,V>>
.NET2/3 okay, mutable, more legwork, can map keys to multiple values (watch for duplicates in inserts)
Happy coding.
Creating a hash data-structure that maintains insertion order is actually only a slight modification of a standard hash implementation (Ruby hashes now maintain insertion order); however, this was not done in .NET nor, more importantly, is it part of the Dictionary/IDictionary contract.
You cannot do that with the Dictionary class. It is working in your example because of a quirk in the way the data structure is implemented. The data structure actually stores the entries in temporal order in one array and then uses another array to index into the entry array. Enumerations are based on the entry array. That is why it appears to be ordered in your case. But, if you apply a series of removal and insertion operations you will notice this ordering gets perturbed.
Use KeyedCollection instead. It provides O(1) retrieval by both key and index and preserves temporal ordering.
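A minimal sketch using KeyedCollection<TKey, TItem> from System.Collections.ObjectModel, which is abstract and needs a small subclass that says where each item's key lives. `Entry`, `EntryCollection` and `KeyedSketch` are hypothetical names:

```csharp
using System.Collections.ObjectModel;

public sealed class Entry
{
    public Entry(string key, string value)
    {
        Key = key;
        Value = value;
    }

    public string Key { get; }
    public string Value { get; }
}

public sealed class EntryCollection : KeyedCollection<string, Entry>
{
    // Tells the collection which part of the item is its key.
    protected override string GetKeyForItem(Entry item)
    {
        return item.Key;
    }
}

public static class KeyedSketch
{
    public static EntryCollection Build()
    {
        var entries = new EntryCollection
        {
            new Entry("key1", "value1"),
            new Entry("key2", "value2"),
            new Entry("key3", "value3"),
        };
        // Put the new pair in front of the existing ones.
        entries.Insert(0, new Entry("newKey", "newValue"));
        return entries;
    }
}
```

After `Build()`, `entries[0]` is the newKey entry and `entries["key2"]` finds an entry by key. Inserting at the front shifts the other items, so that step is O(n) even though the lookups are not.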
From the MSDN page on Dictionary(TKey, TValue):
For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair<TKey, TValue> structure representing a value and its key. The order in which the items are returned is undefined.
I'm assuming you can't use SortedDictionary because the control depends on your data source being a Dictionary. If the control expects both the Dictionary type and sorted data, the control needs to be modified, because those two criteria contradict each other. You must use a different datatype if you need sorting/ordering functionality. Depending on undefined behavior is asking for trouble.
Don't use a dictionary - there is no guarantee the order of the keys won't change when you add further elements. Instead, define a class Pair for your Key-Value-Pairs (look here What is C# analog of C++ std::pair? for an example) and use a List<Pair> for your datasource. The List has an Insert operation you can use to insert new elements anywhere into your list.
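A minimal sketch of the list-based approach. It uses the built-in KeyValuePair<string, string> as a stand-in for a custom Pair class, and `PrependSketch` is a hypothetical name:

```csharp
using System.Collections.Generic;

public static class PrependSketch
{
    public static List<KeyValuePair<string, string>> Build()
    {
        var items = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("key1", "value1"),
            new KeyValuePair<string, string>("key2", "value2"),
            new KeyValuePair<string, string>("key3", "value3"),
        };

        // List<T>.Insert places the new pair at the given position;
        // index 0 makes it the first element without rebuilding the collection.
        items.Insert(0, new KeyValuePair<string, string>("newKey", "newValue"));
        return items;
    }
}
```

Unlike a dictionary, the list does nothing to stop two pairs from sharing a key, so uniqueness has to be checked by the caller if it matters.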
A Dictionary should not be used to sort objects; it should rather be used to look up objects. I would suggest something else if you want it to sort the objects too.
If you expand the Dictionary, there is no rule that would stop it from mixing up your items.