Utilisation of a Dictionary-like class - C#

For the purpose of XML serialisation I had to disband a Dictionary collection I was using. I wrote a very straightforward alternative which consists of 2 classes:
NameValueItem: contains Name (Key) and Value
NameValueCollection: derived from CollectionBase and maintains a collection of NameValueItem objects.
I've included some standard methods to help maintain the collection (Add, Contains and Remove). So just like most Dictionary types, the Name (or Key) is unique:
    public bool Contains(NameValueItem item)
    {
        foreach (NameValueItem lItem in List)
            if (lItem.Name.Equals(item.Name))
                return true;
        return false;
    }
Add uses this Contains method to determine whether to include a given item into the collection:
    public void Add(NameValueItem item)
    {
        if (!Contains(item))
            List.Add(item);
    }
As bog standard, straightforward and easy as this code appears it's proving to be a little sluggish. Is there anything that can be done to improve the performance of this? Or alternatives I could use?
I was considering creating a NameValueHashSet, which is derived from HashSet.
Optional...:
I had a question which I was going to ask in a separate thread, but I'll leave it up to you as to whether you'd like to address it or not.
I wanted to add 2 properties to the NameValueCollection, Names and Values, which return a List of strings from the Collection of NameValueItem objects. Instead I built them into methods GetNames() and GetValues(), as I have to build the collection (i.e. create a List (names/values), iterate over collection adding names/value to List and return List).
Is this a better alternative, in terms of good coding practice, performance, etc.? My thinking about properties has always been that they should be as stripped back as possible: only references, arithmetic, etc. should exist in them, with no layers of processing. Anything more than that should be built into a method. Thoughts?

Perhaps you shouldn't try to rebuild what the framework already provides? Your implementation of a dictionary will perform poorly because it does not scale: Contains scans the whole collection, so every Add is O(n). The built-in Dictionary<TKey, TValue> has O(1) lookups, and most insert and delete operations are O(1) as well (unless there are collisions or the internal storage must be expanded).
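As a rough sketch of that suggestion, the question's collection could keep its Add/Contains/Remove surface while delegating to a Dictionary<string, string>. The class name NameValueStore and the bool return value of Add are assumptions made for illustration (the original Add returns void), and this sketch does not address XML serialisation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical replacement for the question's NameValueCollection:
// names stay unique, but the membership test is a hash lookup, not a scan.
public class NameValueStore
{
    private readonly Dictionary<string, string> items =
        new Dictionary<string, string>();

    public int Count { get { return items.Count; } }

    // Mirrors the original Add: a name that is already present is ignored.
    // Returns true when the item was actually added.
    public bool Add(string name, string value)
    {
        if (items.ContainsKey(name))
            return false;
        items.Add(name, value);
        return true;
    }

    public bool Contains(string name) { return items.ContainsKey(name); }

    public bool Remove(string name) { return items.Remove(name); }

    public bool TryGetValue(string name, out string value)
    {
        return items.TryGetValue(name, out value);
    }
}
```

With this shape, adding the same name twice leaves the first value in place, just as the original Add does.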
You can extend the existing dictionary to provide XML serialization support; see this question and answers: Serialize Class containing Dictionary member
As for your second question: Dictionary already exposes Keys and Values properties that return collections of the keys and values. These are enumerated on demand as requested by the caller rather than copied, which is likely preferable to building a full List every time (which requires iterating through all the elements in the dictionary). If the caller wants a list, they can call dictionary.Values.ToList().
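A minimal sketch of both styles, assuming a Dictionary<string, string>; the class and method names here are made up for illustration. ToList() is the LINQ extension method, so it needs a using System.Linq directive.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class KeysAndValuesDemo
{
    // Copies the keys into a new list; the copy happens only because the caller asks for it.
    public static List<string> NamesOf(Dictionary<string, string> settings)
    {
        return settings.Keys.ToList();
    }

    // Walks the values directly, without building an intermediate list.
    public static int TotalValueLength(Dictionary<string, string> settings)
    {
        int total = 0;
        foreach (string value in settings.Values)
            total += value.Length;
        return total;
    }
}
```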

Related

Is it OK to use dictionaries when I don't need to quickly access their values?

Normally, I use a dictionary like a list, but with a key of a different type. I like the ability to quickly access individual items in the dictionary without having to loop through it until I find the item with the right property (because the property I'm looking for is in the Key).
But there is another possible use of a dictionary. I could just use the Key to store property A and the Value to store property B without ever using the dictionary's special functionality. For example, I could store a list of persons just by storing the forename in the key and the family name in the value (let's assume, for the sake of simplicity, that there won't ever be two people with the same forename, because I just couldn't come up with a better example). I would only use that dictionary to loop through it in a foreach loop and add items to it (no removing, sorting or accessing individual items). There would actually be no difference between using a List<KeyValuePair<string, string>> and using a Dictionary<string, string> (at least not in the example that I gave; I know that I could, e.g., store multiple items with the same key in the list).
So, to sum it up, what should I do when I don't need to use the special functionalities a dictionary provides and just use it to store something that has exactly two properties:
use a Dictionary<,>
use a List<KeyValuePair<,>>
use a List<MyType> with MyType being a custom class that contains the two properties and a constructor.
Don't use dictionaries for that.
If you don't want to create a class for this purpose, use something like List<Tuple<T1,T2>>. But keep in mind a custom class will be both more readable and more flexible.
Here's the reason: your code will be much easier to read if you use proper data structures. Using a dictionary will only confuse the reader, and you'll have problems the day a duplicate key shows up.
If someone reads your code and sees a Dictionary being used, he will assume you really mean to use a map-like structure. Your code should be clear and your intent should be obvious when reading it.
If you're concerned with performance, you should probably store the data in a List. A Dictionary has a lot of internal overhead, in both memory and CPU.
If you're just concerned with readability, choose the data structure that best captures your intent. If you are storing key-value pairs (for example, custom fields in a bug tracker issue) then use a Dictionary. If you are just storing items without them having some kind of logical key, use a List.
It takes little work to create a custom class to use as an item in a List. Using a Dictionary just because it gives you a Key property for each item is a misuse of that data structure. It is easy to create a custom class that also has a Key property.
Use List<MyType> where MyType includes all the values.
The problem with the dictionary approach is that it's not flexible. If you later decide to add middle names, you'll need to redesign your whole data structure, rather than just adding another field to MyType.
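A minimal sketch of the List<MyType> option, using the forename/family name example from the question. The names Person and PersonListDemo are assumptions chosen for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical item type; adding a MiddleName later means one more property here,
// with no change to the collection type that holds the items.
public class Person
{
    public string Forename { get; private set; }
    public string FamilyName { get; private set; }

    public Person(string forename, string familyName)
    {
        Forename = forename;
        FamilyName = familyName;
    }
}

public static class PersonListDemo
{
    // Loops over the list in insertion order, which is all the question needs.
    public static List<string> FullNames(List<Person> people)
    {
        var result = new List<string>();
        foreach (Person p in people)
            result.Add(p.Forename + " " + p.FamilyName);
        return result;
    }
}
```

Unlike a Dictionary<string, string> keyed on forename, the list accepts two people who share a forename.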

Best way for constructing a unique list of objects in C#

I'm wondering whether it will be quicker to follow one pattern or another for constructing a unique list of objects in C#:
Option 1
Add all the items into a generic list
Call the list.Distinct function on it
Option 2
Iterate over each item
Check whether the item already exists in the list and if not add it
You can use HashSet<T>:
The HashSet class provides high-performance set operations. A set
is a collection that contains no duplicate elements, and whose
elements are in no particular order.
You can provide custom IEqualityComparer<T> via constructor.
This is one of those "should I use a shoe or a brick to pound a nail into the wood" questions. You should use the appropriate data structure for the job, which based on your requirement of "constructing a unique list of objects", the HashSet<T> class satisfies.
If you require the items in list format, you can always call ToList() on the set.
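As a sketch of the HashSet<T> approach with a comparer passed to the constructor: the example below uses the built-in StringComparer.OrdinalIgnoreCase as the IEqualityComparer<string>, and the class and method names are made up for illustration. Because a HashSet<T> does not promise any ordering, the sketch also fills a separate list so that first-seen order is kept; that part is a design choice, not something the answers above require.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueListDemo
{
    // Builds a list of distinct names, ignoring case, in first-seen order.
    // HashSet<T>.Add returns false for a duplicate, so no separate Contains check is needed.
    public static List<string> DistinctIgnoringCase(IEnumerable<string> names)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (string name in names)
        {
            if (seen.Add(name))
                result.Add(name);
        }
        return result;
    }
}
```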
If you are concerned about the performance of looking up unique items, use a Dictionary<TKey, TValue>. Also, a dictionary requires unique keys, so you will never have duplicates.

How can I discover the keys used in a C# indexer?

We have a legacy class that uses an indexer to allow people to add arbitrary key-value pairs:
legacyInstance["myKey"] = value;
Sadly, we can't look at or modify the code for this class. We can only access it through its various properties.
What we need to do is clone this object in an intelligent way. Normal properties are simple to clone, of course:
myClone.blargle = legacyInstance.blargle;
But how do we discover all of the keys that someone may have added to a particular object instance in order to clone them correctly? Ideally something like this:
string[] keys = legacyInstance.<#magic!#>;
foreach (string key in keys)
{
myClone[key] = legacyInstance[key];
}
And while I suspect the magic may involve some Reflection, getting at it is proving difficult since I am a Reflection tyro.
Grab ILSpy and dig into the sources. If your collection doesn't expose members to iterate over keys (like Dictionary exposes Keys) then this is your only option. By the way, maybe your collection is IEnumerable? It is actually strange for a collection not to provide any iteration facilities. Isn't it called YouCanOnlyUseIndexCollection?

Why does HttpSessionState not implement IDictionary?

HttpSessionState appears to be a typical key -> value collection, so why does it not implement the IDictionary-Interface?
Background: I am trying to output/save the Context of my ASP.NET Website when an error occurs and wanted to do this with a recursive function, that outputs a Collection and all containing Collections. Because HttpSessionState only implements ICollection and IEnumerable, I am losing the information about the keys if I want to do it in a generic manner (= working with interfaces).
IDictionary implies that the target collection is capable of quick lookups by key. (As far as I am aware) HttpSessionState is just a list of items, not a dictionary style structure. As a search of that structure would take linear time there's no reason to treat it as a dictionary. If you need a lot of quick lookups then copy the keys and values into a true dictionary. If you don't need quick lookups, then you'll just need to specialize for that class.
There are more things to an interface than just a list of method prototypes. There are semantics that need to be preserved for an interface too. Quick lookups by key is one such non-explicit assumption for (most) consumers of any IDictionary.
How about writing your own IDictionary-implementing wrapper that takes an HttpSessionState object in its constructor and behaves as you want? I'm assuming you want to do this so you can swap out other kinds of name-value (IDictionary-implementing) session implementations.
Of course, as Billy points out, this is a great way to dress a poor-performing pseudo-dictionary in dictionary clothes!
Just loop through the Session Keys and reference values like so:
Session[key]

When to use a HashTable

In C#, I find myself using a List<T>, IList<T> or IEnumerable<T> 99% of the time. Is there a case when it would be better to use a Hashtable (or Dictionary<TKey,TValue> in 2.0 and above) over these?
Edit:
As pointed out, what someone would like to do with the collection often dictates what one should be using, so when would you use a Hashtable/Dictionary<TKey,TValue> over a List<T>?
Maybe not directly related to the OPs question, but there's a useful blog post about which collection structure to use at: SortedSets
Basically, what you want to do with the collection determines what type of collection you should create.
To summarise in more detail:
Use IList if you want to be able to enumerate and / or modify the collection (normally adding at end of list)
Use IEnumerable if you just want to enumerate the collection (don't need to add / remove - usually used as a return type)
Use IDictionary if you want to access elements by a key (adding / removing elements quickly using a key)
Use SortedSet if you want to access a collection in a predefined order (most common usage being to access the collection in order)
Overall, use Dictionary if you want to access / modify items by key in no particular order (preferred over a list as that's generally accessed in order, preferred over an enumeration as you can't modify an enumeration, preferred over Hashtable as that's not strongly typed, preferred over SortedList when you don't need keys sorted)
You use a hashtable (dictionary) when you want fast look up access to an item based on a key.
If you are using List, IList or IEnumerable, this generally means that you are looping over data (in the case of IEnumerable it definitely means that), and a hashtable isn't going to net you anything. Now if you were looking up a value in one list and using that to access data in another list, that would be a little different. For example:
Find position in list of Item foo.
Position in list for foo corresponds to position in another list which contains Foo_Value.
Access position in second list to get Foo_Value.
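The three steps above can be sketched as follows, next to the dictionary lookup they could be replaced with. The class and method names are assumptions for illustration, and both lists are assumed to have the same length.

```csharp
using System.Collections.Generic;

public static class ParallelListsDemo
{
    // Steps 1-3: find the item's position, then read the same position in the second list.
    // IndexOf scans the list, so this lookup is O(n).
    public static string LookupViaLists(List<string> items, List<string> values, string item)
    {
        int position = items.IndexOf(item);
        return position >= 0 ? values[position] : null;
    }

    // The dictionary equivalent: a single hash lookup, no scan.
    public static string LookupViaDictionary(Dictionary<string, string> valuesByItem, string item)
    {
        string value;
        return valuesByItem.TryGetValue(item, out value) ? value : null;
    }
}
```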
Use a hashtable, when you need to be able to (quickly) look up items by key.
Of course, you can search through an IList or IEnumerable etc for a matching key but that will take O(n) time rather than O(1) for Hashtable or Dictionary.
Hash-tables are good choices if you are often doing "for something in collection" and you aren't concerned about the order of items in the collection.
Hash tables are indexes. You can maintain a hash table to index a list, so you can choose to access it either in order or randomly by key.
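One way to picture a hash table indexing a list is the sketch below, in which a List holds the items in insertion order and a Dictionary maps each key to its position in that list. IndexedList is a hypothetical class for illustration; it deliberately has no Remove, because removing from the list would shift the stored positions.

```csharp
using System.Collections.Generic;

// Hypothetical container: the list keeps insertion order, the dictionary indexes it by key.
public class IndexedList
{
    private readonly List<KeyValuePair<string, string>> ordered =
        new List<KeyValuePair<string, string>>();
    private readonly Dictionary<string, int> positionByKey =
        new Dictionary<string, int>();

    // Adds the pair at the end of the list and records its position under the key.
    // Returns false and changes nothing when the key is already present.
    public bool Add(string key, string value)
    {
        if (positionByKey.ContainsKey(key))
            return false;
        positionByKey.Add(key, ordered.Count);
        ordered.Add(new KeyValuePair<string, string>(key, value));
        return true;
    }

    // Ordered access: enumerate in insertion order.
    public IEnumerable<KeyValuePair<string, string>> InOrder
    {
        get { return ordered; }
    }

    // Keyed access: one hash lookup, then a direct index into the list.
    public bool TryGet(string key, out string value)
    {
        int position;
        if (positionByKey.TryGetValue(key, out position))
        {
            value = ordered[position].Value;
            return true;
        }
        value = null;
        return false;
    }
}
```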
Hashtable optimizes lookups. It computes a hash of each key you add. It then uses this hash code to look up the element very quickly. It is an older .NET Framework type. It is slower than the generic Dictionary type.
You're not really comparing the same things. When I use a dictionary, it's because I want a lookup for the data: usually I want to store a list of objects and be able to quickly look them up using a key of some kind.
I use Hashtables quite often to send back key/value collections to JavaScript via page methods.
Dictionaries are good for caching things when you need to retrieve an object given its ID but don't want to have to hit the database: Assuming your collection is not large enough to induce a large number of collisions and your data needs retrieving often enough for an IEnumerable to be too slow, Dictionaries can give a decent speed-up.
There's no way of telling exactly without knowing what the collection is for, but unless the items in your collection are unique you cannot use a hashtable, as there will be nothing to use as a key. So perhaps the rule of thumb you are looking for is that if your members are all different and you want to pull individual instances out by key, use a hashtable. If you have a bunch of items that you wish to treat in the same way (such as doing a foreach on the entire set) use a list.
