I had two questions. I was wondering whether the C# library has a simple class that stores pairs of values instead of just one, so that I can store a class and an integer in the same node of the list. I think the easiest way is to just make a container class, but that is extra work each time, so I wanted to know whether I should be doing so or not. I know that later versions of .NET (I am using 3.5) have tuples that I could store, but they are not available to me.
I guess the bigger question is: what are the memory disadvantages of using a dictionary to store the integer-to-class map, even though I don't need O(1) access and could afford to just search the list? What is the minimum size of the hash table? Should I just make the wrapper class I need?
If you need to store an unordered list of {integer, value}, then I would suggest making the wrapper class. If you need a data structure in which you can look up integer to get value (or, look up value to get integer), then I would suggest a dictionary.
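A minimal sketch of such a wrapper class, written so that it should compile on .NET 3.5 where Tuple is not available. The names Pair, First, Second and PairDemo are made up for illustration; they are not framework types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper; Pair, First and Second are illustrative names only.
public sealed class Pair<TFirst, TSecond>
{
    public TFirst First { get; private set; }
    public TSecond Second { get; private set; }

    public Pair(TFirst first, TSecond second)
    {
        First = first;
        Second = second;
    }
}

public static class PairDemo
{
    // Sums the integer half of every pair in the list.
    public static int SumSeconds(List<Pair<string, int>> items)
    {
        int total = 0;
        foreach (Pair<string, int> item in items)
        {
            total += item.Second;
        }
        return total;
    }

    public static void Main()
    {
        List<Pair<string, int>> items = new List<Pair<string, int>>();
        items.Add(new Pair<string, int>("apple", 3));
        items.Add(new Pair<string, int>("pear", 5));
        Console.WriteLine(SumSeconds(items)); // prints 8
    }
}
```

Because the class is generic, it only has to be written once rather than once per pair of types.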
The decision of List<Tuple<T1, T2>> (or List<KeyValuePair<T1, T2>>) vs Dictionary<T1, T2> is largely going to come down to what you want to do with it.
If you're going to be storing information and then iterating over it, without needing to do frequent lookups based on a particular key value, then a List is probably what you want. Depending on how you're going to use it, a LinkedList might be even better - slightly higher memory overheads, faster content manipulation (add/remove) operations.
On the other hand, if you're going to be primarily using the first value as a key to do frequent lookups, then a Dictionary is designed specifically for this purpose. Key value searching and comparison is significantly improved, so if you do much with the keys and your list is big a Dictionary will give you a big speed boost.
Data size is important to the decision. If you're talking about a couple hundred items or less, a List is probably fine. Above that point the lookup times will probably impact more significantly on execution time, so Dictionary might be more worth it.
There are no hard and fast rules. Every use case is different, so you'll have to balance your requirements against the overheads.
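To make the trade-off concrete, here is a rough sketch of the same lookup done both ways. LookupDemo and its method names are invented for this example, and returning null for a missing key is just one possible convention.

```csharp
using System;
using System.Collections.Generic;

public static class LookupDemo
{
    // Linear scan: checks each pair until the key matches, so the cost grows with the list.
    public static string FindInList(List<KeyValuePair<int, string>> pairs, int key)
    {
        foreach (KeyValuePair<int, string> pair in pairs)
        {
            if (pair.Key == key) return pair.Value;
        }
        return null; // not found
    }

    // Hash lookup: the cost stays roughly constant as the dictionary grows.
    public static string FindInDictionary(Dictionary<int, string> map, int key)
    {
        string value;
        return map.TryGetValue(key, out value) ? value : null;
    }

    public static void Main()
    {
        List<KeyValuePair<int, string>> pairs = new List<KeyValuePair<int, string>>();
        pairs.Add(new KeyValuePair<int, string>(1, "one"));
        pairs.Add(new KeyValuePair<int, string>(2, "two"));

        Dictionary<int, string> map = new Dictionary<int, string>();
        foreach (KeyValuePair<int, string> pair in pairs)
        {
            map[pair.Key] = pair.Value;
        }

        Console.WriteLine(FindInList(pairs, 2));     // prints two
        Console.WriteLine(FindInDictionary(map, 2)); // prints two
    }
}
```

If the code mostly iterates and rarely looks up by key, the list version is simpler and carries less overhead.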
You can use a list of KeyValuePair: http://msdn.microsoft.com/en-us/library/5tbh8a42.aspx
You can use a Tuple<T,T1>, a list of KeyValuePair<T, T1> - or, an anonymous type, e.g.
var list = something.Select(x => new { Key = x.Something, Value = x.Value });
You can use either KeyValuePair or Tuple
For Tuple, you can read the following useful post:
What requirement was the tuple designed to solve?
Related
var usedIds = list.Count > 20 ? new HashSet<int>() as ICollection<int> : new List<int>();
Assuming that List is more performant with 20 or fewer items and HashSet is more performant with more items (from this post), is it an efficient approach to use different collection types dynamically based on the predictable item count?
All of the actions for each of the collection types will be the same.
PS: I have also found the HybridCollection class, which seems to do the same thing automatically, but I've never used it, so I have no information on its performance either.
EDIT: My collection is mostly used as a buffer with many inserts and gets.
In theory, it could be, depending on how many and what type of operations you are performing on the collections. In practice, it would be a pretty rare case where such micro-optimization would justify the added complexity.
Also consider what type of data you are working with. If you are using int as the collection item as the first line of your question suggests, then the threshold is going to be quite a bit less than 20 where List is no longer faster than HashSet for many operations.
In any case, if you are going to do that, I would create a new collection class to handle it, something along the lines of the HybridDictionary, and expose it to your user code with some generic interface like IDictionary.
And make sure you profile it to be sure that your use case actually benefits from it.
There may even be a better option than either of those collections, depending on what exactly it is you are doing. i.e. if you are doing a lot of "before or after" inserts and traversals, then LinkedList might work better for you.
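If you do go down that road, the wrapper could look roughly like the sketch below. HybridIntSet is a made-up class, and the threshold of 20 is simply the number from the question, not something measured. Note that it enforces set semantics (no duplicates) in both modes so that behaviour does not change when the backing collection does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hybrid set of ints: a List<int> while small, a HashSet<int> once it grows.
public sealed class HybridIntSet
{
    private const int Threshold = 20; // assumed cut-over point; profile before trusting it
    private ICollection<int> items = new List<int>();

    public int Count { get { return items.Count; } }

    public bool IsHashed { get { return items is HashSet<int>; } }

    // Adds a value if it is not already present; returns false for duplicates.
    public bool Add(int value)
    {
        if (items.Contains(value)) return false;
        if (!IsHashed && items.Count >= Threshold)
        {
            items = new HashSet<int>(items); // one-time copy when the threshold is crossed
        }
        items.Add(value);
        return true;
    }

    public bool Contains(int value)
    {
        return items.Contains(value);
    }
}

public static class HybridDemo
{
    public static void Main()
    {
        HybridIntSet set = new HybridIntSet();
        for (int i = 0; i < 25; i++) set.Add(i);
        Console.WriteLine(set.Count);    // prints 25
        Console.WriteLine(set.IsHashed); // prints True
    }
}
```

Whether this beats a plain HashSet<int> is exactly what profiling would have to show.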
Hash tables like HashSet<T> and Dictionary<K,T> are faster at searching and inserting items in any order.
Arrays (T[]) are best used if you always have a fixed size and a lot of indexing operations. Writing items into an array of a reference type can be slower than expected, because array covariance in C# forces a runtime type check on each store.
List<T> is best used for dynamically sized collections with indexing operations.
I don't think it is a good idea to write something like the hybrid collection; it is better to use a collection chosen for your requirements. If you have a buffer with a lot of index-based operations, I would not suggest a hash table; as somebody already noted, a hash table by design uses more memory.
HashSet is for faster access, but List is for insert. If you don't plan adding new items, use HashSet, otherwise List.
If your collection is very small then the performance is virtually always going to be a non-issue. If you know that n is always less than 20, O(n) is, by definition, O(1). Everything is fast for small n.
Use the data structure that most appropriately represents how you are conceptually treating the data, the type of operations that you need to perform, and the type of operations that should be most efficient.
is it an efficient approach to use different collection types dynamically based on the predictable item count?
It can be, depending on what you mean by "efficiency" (MS offers the HybridDictionary class for that, though unfortunately it is non-generic). But irrespective of that, it's mostly a bad choice. I will explain both.
From an efficiency standpoint:
Addition will always be faster in a List<T>, since a HashSet<T> has to compute the hash code and store it. Even though removal and lookup will be faster with a HashSet<T> as the size grows, addition to the end is where List<T> wins. You will have to decide which is more important to you.
HashSet<T> will come up with a memory overhead compared to List<T>. See this for some illustration.
However, from a usability standpoint it need not make sense. A HashSet<T> is a set, unlike a List<T>, which is a bag. They are very different, and their uses are very different. For example:
HashSet<T> cannot have duplicates.
HashSet<T> will not care about any order.
So when you return a hybrid ICollection<T>, your requirement goes like this: "It doesn't matter whether duplicates can be added or not. Sometimes let it be added, sometimes not. Of course iteration order is not important anyway" - very rarely useful.
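A small sketch of that difference as seen through ICollection<int>. SetVersusBagDemo and AddAll are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SetVersusBagDemo
{
    // Adds the same values to any ICollection<int> and reports the resulting count.
    public static int AddAll(ICollection<int> target, params int[] values)
    {
        foreach (int value in values)
        {
            target.Add(value);
        }
        return target.Count;
    }

    public static void Main()
    {
        // Same calls, same interface, different results.
        Console.WriteLine(AddAll(new List<int>(), 1, 1, 2));    // prints 3
        Console.WriteLine(AddAll(new HashSet<int>(), 1, 1, 2)); // prints 2
    }
}
```

Code that receives the hybrid ICollection<int> cannot tell which of the two behaviours it will get.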
Good q, and +1.
HashSet is better, because it will probably use less space, and you will have faster access to elements.
Say that, in my method, I pass in a couple IEnumerables (probably because I'm going to get a bunch of objects from a db or something).
Then for each object in objects1, I want to pull out a diffobject from objects2 that has the same object.iD.
I don't want multiple enumerations (according to ReSharper), so I could make objects2 into a dictionary keyed with object.iD. Then I only enumerate once for each. (Secondary question) Is that a good pattern?
(primary question) What's too big? At what point would this be a horrible pattern? How many objects is too many objects for the dictionary?
Internally, it would be prevented from ever having more than two billion items. Since the way things are positioned within a dictionary is fairly complicated, if I were looking at dealing with a billion items (if a 16-bit value, for example, then 2GB), I'd be looking to store them in a database and retrieve them using data-access code.
I have to ask though, where are Objects1 and Objects2 coming from? It sounds as though you could do this at the DB level and it would be MUCH, MUCH more efficient than doing it in C#!
You might also want to consider using KeyValuePair[]
Enumerating a Dictionary yields instances of KeyValuePair.
If all you ever want to do is look up values in the dictionary given their Key, then yes, Dictionary is the way to go - they're pretty quick at doing that. However, if you want to sort items or search for them using the Value or a property of it, it's better to use something else.
As far as the size goes, they get a little slower as they get bigger. It's worth doing some benchmarks to see how it affects your needs, but you could always split values across multiple dictionaries based on their type or range. http://www.dotnetperls.com/dictionary-size
It's worth noting though that when you say "Then I only enumerate once for each", that's slightly incorrect. objects1 will be enumerated fully, but the dictionary of objects2 won't be enumerated. As long as you use the Key to retrieve values, it will hash the key and use the result to calculate the location where the value is stored, so a dictionary can get pretty quickly to the value you ask for. Ideally use an int for the Key because it can use that as the hash directly. You can enumerate them, but it's much better to look objects up using objects2Dictionary[key].
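The pattern from the question might look something like this sketch. Item, Detail, their fields and JoinDemo are all hypothetical stand-ins for the real types, and it assumes the ids in objects2 are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types standing in for the two kinds of object in the question.
public class Item { public int Id; public string Name; }
public class Detail { public int Id; public string Text; }

public static class JoinDemo
{
    public static List<string> Match(IEnumerable<Item> objects1, IEnumerable<Detail> objects2)
    {
        // Enumerates objects2 once; ToDictionary throws ArgumentException on duplicate ids.
        Dictionary<int, Detail> byId = objects2.ToDictionary(d => d.Id);

        List<string> result = new List<string>();
        foreach (Item item in objects1) // enumerates objects1 once
        {
            Detail detail;
            if (byId.TryGetValue(item.Id, out detail)) // hash lookup, no enumeration
            {
                result.Add(item.Name + ": " + detail.Text);
            }
        }
        return result;
    }

    public static void Main()
    {
        Item[] items = { new Item { Id = 1, Name = "a" }, new Item { Id = 2, Name = "b" } };
        Detail[] details = { new Detail { Id = 2, Text = "found" } };
        foreach (string line in Match(items, details))
        {
            Console.WriteLine(line); // prints b: found
        }
    }
}
```

TryGetValue is used instead of the indexer so that a missing id is skipped rather than throwing KeyNotFoundException.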
I'm looking for a data structure that can possibly outperform Dictionary<string, object>. I have a map that has N items - the map is constructed once and then read many, many times. The map doesn't change during the lifetime of the program (no new items are added, no items are deleted and items are not reordered). Because the map doesn't change, it doesn't need to be thread-safe, even though the application using it is heavily multi-threaded. I expect that ~50% of lookups will happen for items not in the map.
Dictionary<TKey, TItem> is quite fast and I may end up using it but I wonder if there's another data structure that's faster for this scenario. While the rest of the program is obviously more expensive than this map, it is used in performance-critical parts and I'd like to speed it up as much as possible.
What you're looking for is a Perfect Hash Function. You can create one based on your list of strings, and then use it for the Dictionary.
The non-generic Hashtable has a constructor that accepts IHashCodeProvider that lets you specify your own hash function. I couldn't find an equivalent for Dictionary, so you might have to resort to using a Hashtable instead.
You can use it internally in your PerfectStringHash class, which will do all the type casting for you.
Note that you may need to be able to specify the number of buckets in the hash. I think Hashtable only lets you specify the load factor. You may find that you need to roll your own hash entirely. A generic perfect hash would be a good class for everyone to use, I guess.
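For what it's worth, Dictionary<TKey, TValue> does have a constructor overload that takes an IEqualityComparer<TKey>, and the comparer's GetHashCode is what the dictionary uses, so a custom hash can be plugged in that way. Below is a minimal sketch; CustomStringComparer is a made-up name, and the hash in it is a plain polynomial placeholder, not a perfect hash. A real perfect hash would have to be generated from the known key set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer; swap the body of GetHashCode for a generated perfect hash.
public sealed class CustomStringComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    // Placeholder hash: simple polynomial over the characters (assumes s is not null).
    public int GetHashCode(string s)
    {
        unchecked
        {
            int hash = 17;
            foreach (char c in s)
            {
                hash = hash * 31 + c;
            }
            return hash;
        }
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        Dictionary<string, object> map =
            new Dictionary<string, object>(new CustomStringComparer());
        map["alpha"] = 1;

        object value;
        Console.WriteLine(map.TryGetValue("alpha", out value)); // prints True
        Console.WriteLine(map.TryGetValue("beta", out value));  // prints False
    }
}
```

This only replaces the hash function; it gives no control over the number of buckets, so it may still fall short of a true perfect-hash table.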
EDIT: Apparently someone already implemented some Perfect Hash algorithms in C#.
The read performance of the generic dictionary is "close to O(1)" according to the remarks on MSDN for most TKey (and you should get pretty good performance with just string keys). And you get this out of the box, free, from the framework, without implementing your own collection.
http://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.90).aspx
If you need to stick with string keys - Dictionary is at least very good (if not best choice).
One more thing to note when you start measuring: consider whether computing the hash itself has a measurable impact. Hashing a long string takes longer than hashing a short one. See whether the items you want to search for can be represented as other objects whose hash code is computed in constant time.
I am designing a C# class that contains a string hierarchy, where each string has 0 or 1 parents.
My inclination is to implement this with a Dictionary<string,string> where the key is the child and value is the parent. The dictionary may have a large amount of values, but I can't say the exact size. This seems like it should perform faster than creating composite wrapper with references to the parent, but I could be wrong.
Is there an alternative approach I can take that will ensure better performance speed?
Retrieving values from a Dictionary<K,V> is extremely fast (close to O(1), i.e., almost constant time lookup regardless of the size of the collection) because the underlying implementation uses a hash table. Of course, if the key type uses a terrible hashing algorithm then the performance can degrade, but you can rest assured that this is likely not the case for the framework's string type.
However, as I asked in my comment, you need to answer a few questions:
Define what performance metric is most important, i.e., Time (CPU) or space (memory).
What are your requirements? How will this be used? What's your worst case scenario? Is this going to hold a ton of data with relatively infrequent lookups, do many lookups need to be performed in a short amount of time, or do both hold true?
The Dictionary<K,V> class also uses an array internally which will grow as you add items. Is this okay for you? Again, you need to be more specific in terms of your requirements before anyone can give you a complete answer.
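For reference, the child-to-parent dictionary from the question could be used like this. It is only a sketch: HierarchyDemo and Ancestors are invented names, a root is assumed to be any string with no entry in the map, and the hierarchy is assumed to contain no cycles.

```csharp
using System;
using System.Collections.Generic;

public static class HierarchyDemo
{
    // Returns the chain of ancestors, nearest first. Roots have no entry in the map.
    // Assumes there are no cycles; a cycle would make this loop forever.
    public static List<string> Ancestors(Dictionary<string, string> parentOf, string child)
    {
        List<string> chain = new List<string>();
        string parent;
        while (parentOf.TryGetValue(child, out parent))
        {
            chain.Add(parent);
            child = parent;
        }
        return chain;
    }

    public static void Main()
    {
        Dictionary<string, string> parentOf = new Dictionary<string, string>();
        parentOf["leaf"] = "branch";
        parentOf["branch"] = "root";

        Console.WriteLine(string.Join(" > ", Ancestors(parentOf, "leaf").ToArray()));
        // prints branch > root
    }
}
```

Each step up the hierarchy is one hash lookup, so walking to the root costs roughly the depth of the node.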
Using a Dictionary will be slower than using direct references, because the Dictionary will have to compute a hash etc. If you really only need the parent and not the child's operation (which I doubt), then you could store the Strings in an array together with the index of the parent String.
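The array idea could be sketched as two parallel arrays. ParentIndexDemo and the use of -1 to mark a root are assumptions made for this example.

```csharp
using System;

public static class ParentIndexDemo
{
    // names[i] is a node; parentIndex[i] is the index of its parent, or -1 for a root.
    public static string ParentOf(string[] names, int[] parentIndex, int i)
    {
        int p = parentIndex[i];
        return p < 0 ? null : names[p];
    }

    public static void Main()
    {
        string[] names = { "root", "branch", "leaf" };
        int[] parentIndex = { -1, 0, 1 };

        Console.WriteLine(ParentOf(names, parentIndex, 2)); // prints branch
    }
}
```

This avoids hashing altogether, but it only helps if you already hold the index of the child; finding a string by value would still need a search or a dictionary.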
I want to have around 20,000 complex objects sitting in memory at all times (the app will run in an indefinite loop). I am considering using either List<MyObject> and then converting the list to Dictionary<int, MyObject>, or just avoiding List altogether and keeping the objects in a dictionary. I was wondering, is it pricey to convert the list to a dictionary each time I need to look up an object? What would be better? Have them stored as a Dictionary at all times? Or have a List and use lambdas to get the needed object? Or should I look at other options?
Please note, I don't need queue or stack behavior when object retrieval causes dequeuing.
Thanks in advance.
Using a lambda lookup against the list is O(N), which for 20,000 items is not inconsiderable. However, if you know you'll always need to fetch the object by a known key, you can use a dictionary which is O(1) - that's as fast as algorithms go. So if there's some way you can structure your data/application so that you can base retrieval around some sort of predictable, repeatable, unique key, that will maximize performance. The worst thing (from a performance standpoint) is some complex lookup routine against a list, but sometimes it is unavoidable.
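As a rough illustration of the two options, here is a sketch. MyObject's fields and the StoreDemo helpers are made up, and the key is assumed to be a unique int Id.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape for the objects in the question.
public class MyObject { public int Id; public string Payload; }

public static class StoreDemo
{
    // Lambda lookup against the list: O(N), returns null when nothing matches.
    public static MyObject FindByScan(List<MyObject> list, int id)
    {
        return list.Find(o => o.Id == id);
    }

    // Keyed lookup against a dictionary that is built once and kept around.
    public static MyObject FindByKey(Dictionary<int, MyObject> map, int id)
    {
        MyObject found;
        return map.TryGetValue(id, out found) ? found : null;
    }

    public static void Main()
    {
        List<MyObject> list = new List<MyObject>();
        Dictionary<int, MyObject> map = new Dictionary<int, MyObject>();
        for (int i = 0; i < 20000; i++)
        {
            MyObject o = new MyObject { Id = i, Payload = "item " + i };
            list.Add(o);
            map[o.Id] = o; // same object reference, so no data is duplicated
        }

        Console.WriteLine(FindByScan(list, 19999).Payload); // prints item 19999
        Console.WriteLine(FindByKey(map, 19999).Payload);   // prints item 19999
    }
}
```

Rebuilding the dictionary from the list before every lookup would itself be an O(N) pass, so it only pays off if the dictionary is kept and reused.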
Regardless of what you're doing, if you need to access the List, then you are going to need to loop through it to find whatever you want.
If you need to access the Dictionary, then you have the option to use the key value to immediately retrieve what you are looking for, or, if you must, you can still loop through the Dictionary's Values.
Just use the Dictionary.