I have a List of Dictionaries, List<Dictionary<String,Object>>. The key is an identifier of some abstract record. These Dictionaries come from various places. The size of each Dictionary is in the range [0, 1000].
All Dictionaries contain unique keys. After accumulating some Dictionaries I have to look records up by key. I could iterate over the List and call a lookup method on every Dictionary, or I could copy all the Dictionaries into one. Neither approach performs very well, so I am interested in ways to optimize this task.
Edit:
Thank you guys! Maybe I'll change the accumulation method and, as a result, eliminate the problem itself!
Are you expecting there to be lots of key fetches after an initial population phase? If so, amalgamate everything into a single dictionary. If you'll only be doing a few fetches, I can't see any way you could get better than asking every dictionary.
Of course you could create a hybrid approach: create a new (initially empty) dictionary for the amalgamated results, and populate it as you're asked for keys - by searching through all the rest each time you're asked for a key which isn't already in your "big" dictionary.
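The hybrid approach above can be sketched roughly as follows. This is only an illustration: the class name LazyMergedLookup and its members are hypothetical, and it assumes the source dictionaries are not modified once lookups begin (otherwise the cached entries could go stale). Misses are not cached, so a key that exists nowhere is searched for every time.

```csharp
using System.Collections.Generic;

// Hypothetical helper: wraps the accumulated dictionaries and caches
// every key it has already located in one merged dictionary.
public sealed class LazyMergedLookup
{
    private readonly List<Dictionary<string, object>> _sources;
    private readonly Dictionary<string, object> _merged =
        new Dictionary<string, object>();

    public LazyMergedLookup(List<Dictionary<string, object>> sources)
    {
        _sources = sources;
    }

    // Number of keys found so far and copied into the merged dictionary.
    public int CachedCount => _merged.Count;

    public bool TryGetValue(string key, out object value)
    {
        // Fast path: the key was found by an earlier call.
        if (_merged.TryGetValue(key, out value))
            return true;

        // Slow path: ask every source dictionary, then remember the hit.
        foreach (var source in _sources)
        {
            if (source.TryGetValue(key, out value))
            {
                _merged[key] = value;
                return true;
            }
        }

        value = null;
        return false;
    }
}
```

The first lookup of a key costs one probe per source dictionary; repeated lookups of the same key cost a single probe.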
Is there no way of predicting which dictionary would have a particular key?
If there is any way to identify the dictionary of interest from a key, you can naturally create a cross-association table that maps each key to its dictionary.
If not, IMHO there is no other option than to iterate over the collection and ask each dictionary for the key, maybe using a standard for loop rather than nicer-looking LINQ code.
Adding to what Jon said, there is a library called PowerCollections which contains a MultiDictionary. If my memory serves, you can use it for this purpose.
http://powercollections.codeplex.com/discussions/242163
It sounds like you have lots of dictionaries in order to "speed up" (I am assuming the motive) searches that are limited to certain "abstract record" types.
You can get away with one single dictionary instead of maintaining a separate dictionary for each abstract record type, as you do at present. For a limited search, look the key up and then check that the result is of the required abstract record type.
Related
Normally, I use a dictionary like a list, but with a key of a different type. I like the ability to quickly access individual items in the dictionary without having to loop through it until I find the item with the right property (because the property I'm looking for is in the Key).
But there is another possible use of a dictionary. I could just use the Key to store property A and the Value to store property B without ever using the dictionary's special functionality. For example, I could store a list of persons just by storing the forename in the key and the family name in the value (let's assume, for the sake of simplicity, that there won't ever be two people with the same forename, because I just couldn't come up with a better example). I would only use that dictionary to loop through it in a foreach loop and add items to it (no removing, sorting or accessing individual items). There would actually be no difference between using a List<KeyValuePair<string, string>> and using a Dictionary<string, string> (at least not in the example that I gave; I know that I could, e.g., store multiple items with the same key in the list).
So, to sum it up, what should I do when I don't need to use the special functionalities a dictionary provides and just use it to store something that has exactly two properties:
use a Dictionary<,>
use a List<KeyValuePair<,>>
use a List<MyType> with MyType being a custom class that contains the two properties and a constructor.
Don't use dictionaries for that.
If you don't want to create a class for this purpose, use something like List<Tuple<T1,T2>>. But keep in mind a custom class will be both more readable and more flexible.
Here's the reason: your code will be much easier to read if you use proper data structures. Using a dictionary will only confuse the reader, and you'll have problems the day a duplicate key shows up.
If someone reads your code and sees a Dictionary being used, he will assume you really mean to use a map-like structure. Your code should be clear and your intent should be obvious when reading it.
If you're concerned with performance you should probably store the data in a List. A Dictionary has a lot of internal overhead, in both memory and CPU.
If you're just concerned with readability, choose the data structure that best captures your intent. If you are storing key-value pairs (for example, custom fields in a bug tracker issue) then use a Dictionary. If you are just storing items without them having some kind of logical key, use a List.
It takes little work to create a custom class to use as an item in a List. Using a Dictionary just because it gives you a Key property for each item is a misuse of that data structure. It is easy to create a custom class that also has a Key property.
Use List<MyType> where MyType includes all the values.
The problem with the dictionary approach is that it's not flexible. If you later decide to add middle names, you'll need to redesign your whole data structure, rather than just adding another field to MyType.
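A minimal sketch of the List<MyType> option, using the forename/family name example from the question. The class name Person, the helper PersonDemo and the sample names are made up for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical item type holding the two properties from the question.
public sealed class Person
{
    public string Forename { get; }
    public string FamilyName { get; }

    public Person(string forename, string familyName)
    {
        Forename = forename;
        FamilyName = familyName;
    }
}

public static class PersonDemo
{
    public static List<Person> Build()
    {
        // Unlike a Dictionary keyed by forename, a list accepts
        // two people who share the same forename.
        return new List<Person>
        {
            new Person("Ada", "Lovelace"),
            new Person("Alan", "Turing"),
            new Person("Ada", "Yonath"),
        };
    }
}
```

Adding a middle name later only means adding one property to Person; the code that iterates the list with foreach keeps working.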
I am using MemoryCache to store key/value pairs in my MVC .Net application.
I am using MemoryCache for 2 main purposes. One is to store sessions for user ids, and the other is to store constants (this is just an example). The keys could theoretically be the same in both cases, so I want some way to separate these 2.
I am thinking of 2 or 3 ways. Which way is superior? Or is there a better alternative?
1. Each key in the cache will be prefixed with a namespace.
"user_session:1", "user_session:2"
"constants:1", "constants:2"
2. Using nested dictionaries as values.
There will be a key "user_sessions" whose value will be a Dictionary that maps ids to the session object. There will be a key "constants" whose value will be a Dictionary.
3. Each "namespace" gets its own MemoryCache instance.
The disadvantage with #2 is that when I want to get the value belonging to a user ID, I need to first get the dictionary, then get the value for a key in that dictionary. Which means I need to store the dictionary in memory.
For example:
var userSessions = (Dictionary<string, string>)MemoryCache.Default["user_sessions"];
object session = userSessions["1"];
Go for option #3!
For the programmer it is easier to access.
It is the fastest solution (see comments on the other solutions)
It is the most memory efficient solution (see comments on the other solutions)
Option #2:
You said it yourself.
If the cache decides to remove a key, a whole dictionary is removed, resulting in more reloads of the values.
Option #1:
You have to concatenate strings (which costs performance and memory).
Longer key names produce longer compare times.
Adding items will be slower because the single cache contains twice as many keys.
I'm not sure what your actual implementation is but be cautious of using sessions in this way with the MVC framework. User identifiers are better left in cookies. Either way, I can see uses to go this route as well on occasion.
I would avoid using dictionaries in the cache in that way. I don't know what type of memory allocation you're looking at, but it could get really ugly if the server has high traffic and multiple dictionaries. As mentioned above in a comment, you would also have to worry about concurrency issues with dictionaries used in that way.
The better approach from the options you provided would be to give each namespace its own instance of the cache.
Basically I have a Dictionary<Guid, Movie> Movies collection and search for movies using Guid, which is basically movie.Guid. It works great, but I also want to be able to search the same dictionary using movie.Name without looping through each element.
Is this possible or do I have to create another Dictionary<K, V> for this?
Just have two Dictionaries, one of them having the guid as its key and the other with the name as its key.
If you don't want to look at every element, you need to index it the other direction. This means another Dictionary to get O(1).
You can iterate across the values, but then you aren't getting the constant-time lookup that a dictionary offers (which comes from the way the keys are hashed). The answer above about using two dictionaries to hash references to your objects may be a good solution if you don't have too many objects to reference.
You could search with the Values property:
dictionary.Values.Where(movie => movie.Name == "Some Name")
You'll lose the efficiency of a key based look up, but it will still work.
Since dictionaries are for one-way mapping you can't get keys from values.
You'll need two dictionaries.
There is also a suggestion:
You can use a custom hash function for keys instead of GUIDs and store Movie Names hash as keys. Then you can actually perform two way search in your dictionary.
Rather than using two dictionaries, you'd be much better off using one container class that has two dictionaries inside it.
Some guy named Jon came up with a partial solution to this (which you could easily build upon), leaving his code here: Getting key of value of a generic Dictionary?
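One possible shape for such a container class is sketched below. It is not Jon's code: MovieIndex and its members are hypothetical names, Movie is reduced to the two properties mentioned in the question, and the sketch assumes that movie names are unique.

```csharp
using System;
using System.Collections.Generic;

public sealed class Movie
{
    public Guid Guid { get; }
    public string Name { get; }

    public Movie(Guid guid, string name)
    {
        Guid = guid;
        Name = name;
    }
}

// Hypothetical container that keeps a Guid index and a Name index in sync.
public sealed class MovieIndex
{
    private readonly Dictionary<Guid, Movie> _byGuid = new Dictionary<Guid, Movie>();
    private readonly Dictionary<string, Movie> _byName = new Dictionary<string, Movie>();

    public void Add(Movie movie)
    {
        // Check both keys first so a failure cannot leave the indexes out of sync.
        if (_byGuid.ContainsKey(movie.Guid) || _byName.ContainsKey(movie.Name))
            throw new ArgumentException("A movie with this Guid or Name already exists.");

        _byGuid.Add(movie.Guid, movie);
        _byName.Add(movie.Name, movie);
    }

    public bool TryGetByGuid(Guid guid, out Movie movie) =>
        _byGuid.TryGetValue(guid, out movie);

    public bool TryGetByName(string name, out Movie movie) =>
        _byName.TryGetValue(name, out movie);

    public bool Remove(Guid guid)
    {
        if (!_byGuid.TryGetValue(guid, out var movie))
            return false;

        _byGuid.Remove(guid);
        _byName.Remove(movie.Name);
        return true;
    }
}
```

Both dictionaries store references to the same Movie objects, so the extra memory is the second index itself, not a second copy of the movies.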
You can't use that dictionary to do that search with anything like the same efficiency. But you can easily just run a LINQ query against your dictionary's Values property, which is just collection of the Movie values.
var moviesIWant = from m in movieLookup.Values
                  where m.Name == "Star Wars"
                  select m;
Some thoughts:
When you find your answer though, you would not have the guids, unless they were also a property of movie.
For a small dictionary, this is just fine. For large and repeated searches, you should consider creating other dictionaries keyed on the other values you wish to search on. Only in this way would you achieve lookup speed comparable to the guid lookup in your original dictionary.
You could create another dictionary keyed by Name. Once you've done this, you could search this dictionary by its key and it would have the same super-efficiency as your original dictionary, even for a very large dictionary.
var moviesByName = movieLookup.Values.ToDictionary(m => m.Name, m => m);
No, I don't believe it is possible. You'll have to use another dictionary.
If you are going to want to search on more movie attributes you may be better off moving the data down to a database and use that for querying. That is what databases are good for after all.
I have a dictionary structure, with multiple key value pairs inside.
myDict.Add(key1, value1);
myDict.Add(key2, value2);
myDict.Add(key3, value3);
My dictionary is used as a data source for some control. In the control's dropdown I see the items are like this:
key1
key2
key3
The order looks identical to my dictionary.
I know a Dictionary is not like an ArrayList: you can't get an item by index and so on.
I cannot use SortedDictionary.
Now I need to add one more key value pair to this dictionary at some point in my program, and I want it to have the same effect as if I had done this:
myDict.Add(newKey, newValue);
myDict.Add(key1, value1);
myDict.Add(key2, value2);
myDict.Add(key3, value3);
If I do this, I know newKey will display in my control as first element.
I have an idea: create a tempDict, copy each pair from myDict into tempDict, then clear myDict, then add the pairs back like this:
myDict.Add(newKey, newValue);
myDict.Add(key1, value1);
myDict.Add(key2, value2);
myDict.Add(key3, value3);
Is there a better way than this?
Thanks!
Dictionary<K,V> does not have an ordering. Any perceived order maintenance is by chance (and an artifact of a particular implementation including, but not limited to, bucket selection order and count).
These are the approaches I know about that use only the Base Class Library (BCL):
Lookup<K,V>: .NET4, immutable, can map keys to multiple values (watch for duplicates during building)
OrderedDictionary: old, non-generic, expected Dictionary performance bounds (the other two approaches are O(n) for "get(key)/set(key)")
List<KeyValuePair<K,V>>: .NET2/3 okay, mutable, more legwork, can map keys to multiple values (watch for duplicates in inserts)
Happy coding.
Creating a hash data-structure that maintains insertion order is actually only a slight modification of a standard hash implementation (Ruby hashes now maintain insertion order); however, this was not done in .NET nor, more importantly, is it part of the Dictionary/IDictionary contract.
You cannot do that with the Dictionary class. It is working in your example because of a quirk in the way the data structure is implemented. The data structure actually stores the entries in temporal order in one array and then uses another array to index into the entry array. Enumerations are based on the entry array. That is why it appears to be ordered in your case. But, if you apply a series of removal and insertion operations you will notice this ordering gets perturbed.
Use KeyedCollection<TKey, TItem> instead. It provides O(1) retrieval by both key and index and preserves temporal ordering.
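As a rough sketch, System.Collections.ObjectModel.KeyedCollection<TKey, TItem> is abstract, so you derive from it and tell it how to extract the key from an item. The Entry and EntryCollection types and the EntryDemo helper below are hypothetical names used only for illustration.

```csharp
using System.Collections.ObjectModel;

// Hypothetical item type that carries its own key.
public sealed class Entry
{
    public string Key { get; }
    public string Value { get; }

    public Entry(string key, string value)
    {
        Key = key;
        Value = value;
    }
}

// KeyedCollection keeps items in list order and also indexes them by key.
public sealed class EntryCollection : KeyedCollection<string, Entry>
{
    protected override string GetKeyForItem(Entry item) => item.Key;
}

public static class EntryDemo
{
    public static EntryCollection Build()
    {
        var entries = new EntryCollection();
        entries.Add(new Entry("key1", "value1"));
        entries.Add(new Entry("key2", "value2"));
        entries.Add(new Entry("key3", "value3"));

        // Put the new pair in front of the existing ones.
        entries.Insert(0, new Entry("newKey", "newValue"));
        return entries;
    }
}
```

After Build(), entries[0] is the newKey item and entries["key2"] still finds its item by key. Note that Insert at position 0 shifts the existing items, so the insertion itself is O(n).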
From the MSDN page on Dictionary(TKey, TValue):
For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair<TKey, TValue> structure representing a value and its key. The order in which the items are returned is undefined.
I'm assuming you can't use SortedDictionary because the control depends on your data source being a Dictionary. If the control expects both the Dictionary type and sorted data, the control needs to be modified, because those two criteria contradict each other. You must use a different datatype if you need sorting/ordering functionality. Depending on undefined behavior is asking for trouble.
Don't use a dictionary - there is no guarantee the order of the keys won't change when you add further elements. Instead, define a class Pair for your Key-Value-Pairs (look here What is C# analog of C++ std::pair? for an example) and use a List<Pair> for your datasource. The List has an Insert operation you can use to insert new elements anywhere into your list.
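A minimal sketch of the list-based data source. To keep it short it uses the built-in KeyValuePair<string, string> rather than the custom Pair class suggested above; the class name OrderedPairs is hypothetical, and whether your control can bind to such a list is an assumption you would need to check.

```csharp
using System.Collections.Generic;

public static class OrderedPairs
{
    public static List<KeyValuePair<string, string>> Build()
    {
        // A list keeps exactly the order in which items are placed.
        var items = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("key1", "value1"),
            new KeyValuePair<string, string>("key2", "value2"),
            new KeyValuePair<string, string>("key3", "value3"),
        };

        // Insert the new pair at index 0 so it is displayed first.
        items.Insert(0, new KeyValuePair<string, string>("newKey", "newValue"));
        return items;
    }
}
```

The list does not enforce unique keys, so checking for duplicates before inserting is up to the caller.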
A Dictionary should not be used to sort objects; it should be used to look objects up. I would suggest something else if you also want it to keep the objects in order.
If you expand the Dictionary, there is no rule that would stop it from mixing up the order of your items.
In C#, I find myself using a List<T>, IList<T> or IEnumerable<T> 99% of the time. Is there a case when it would be better to use a HashTable (or Dictionary<T,T> in 2.0 and above) over these?
Edit:
As pointed out, what someone would like to do with the collection often dictates what one should be using, so when would you use a Hashtable/Dictionary<T,T> over a List<T>?
Maybe not directly related to the OPs question, but there's a useful blog post about which collection structure to use at: SortedSets
Basically, what you want to do with the collection determines what type of collection you should create.
To summarise in more detail:
Use IList if you want to be able to enumerate and / or modify the collection (normally adding at end of list)
Use IEnumerable if you just want to enumerate the collection (don't need to add / remove - usually used as a return type)
Use IDictionary if you want to access elements by a key (adding / removing elements quickly using a key)
Use SortedSet if you want to access a collection in a predefined order (most common usage being to access the collection in order)
Overall, use Dictionary if you want to access / modify items by key in no particular order (preferred over list as that's generally done in order, preferred over enumeration as you can't modify an enumeration, preferred over hashtable as that's not strongly typed, preferred over sortedlist when you don't need keys sorted)
You use a hashtable (dictionary) when you want fast look up access to an item based on a key.
If you are using List, IList or IEnumerable, this generally means that you are looping over data (well, in the case of IEnumerable it definitely means that), and a hashtable isn't going to net you anything. Now if you were looking up a value in one list and using that to access data in another list, that would be a little different. For example:
Find position in list of Item foo.
Position in list for foo corresponds to position in another list which contains Foo_Value.
Access position in second list to get Foo_Value.
Here is a link describing the different datatypes.
Another link.
Use a hashtable, when you need to be able to (quickly) look up items by key.
Of course, you can search through an IList or IEnumerable etc for a matching key but that will take O(n) time rather than O(1) for Hashtable or Dictionary.
Hash-tables are good choices if you are often doing "for something in collection" and you aren't concerned about the order of items in the collection.
Hash-tables are indexes. You can maintain a hash-table to index a list, so you can choose to access it either in order or randomly based upon the key.
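Here is a small sketch of a hash table used as an index over a list. The Item and IndexedList types are hypothetical; the sketch assumes each item has a unique integer Id and it only supports adding items.

```csharp
using System.Collections.Generic;

public sealed class Item
{
    public int Id { get; }
    public string Name { get; }

    public Item(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

// Hypothetical wrapper: the list keeps the order, the dictionary is the index.
public sealed class IndexedList
{
    private readonly List<Item> _items = new List<Item>();
    private readonly Dictionary<int, Item> _byId = new Dictionary<int, Item>();

    public void Add(Item item)
    {
        // Dictionary.Add throws on a duplicate Id before the list is touched,
        // so the two structures cannot drift apart.
        _byId.Add(item.Id, item);
        _items.Add(item);
    }

    // Sequential access, in insertion order.
    public IReadOnlyList<Item> InOrder => _items;

    // Keyed access, O(1) on average.
    public bool TryGetById(int id, out Item item) =>
        _byId.TryGetValue(id, out item);
}
```

Use InOrder when you want to foreach over everything and TryGetById when you want one item by key.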
Hashtable optimizes lookups. It computes a hash of each key you add. It then uses this hash code to look up the element very quickly. It is an older .NET Framework type. It is slower than the generic Dictionary type.
You're not really comparing the same things. When I use a dictionary it's because I want a lookup for the data: usually I want to store a list of objects and be able to quickly look them up using a key of some kind.
I use Hashtables quite often to send back key/value collections to Javascript via page methods.
Dictionaries are good for caching things when you need to retrieve an object given its ID but don't want to have to hit the database: Assuming your collection is not large enough to induce a large number of collisions and your data needs retrieving often enough for an IEnumerable to be too slow, Dictionaries can give a decent speed-up.
There's no way of telling exactly without knowing what the collection is for, but unless the items in your collection are unique you cannot use a hashtable, as there will be nothing to use as a key. So perhaps the rule of thumb you are looking for is that if your members are all different and you want to pull individual instances out by key, use a hashtable. If you have a bunch of items that you wish to treat in the same way (such as doing a foreach on the entire set) use a list.