Composite key in Dictionary; override GetHashCode(), Equals etc or use structs? - c#

I have quite a few dictionaries where the key is a composite of several different values (mostly strings and integers). Do I implement these keys as classes (and override GetHashCode(), Equals() etc) or do I use struct instead?
ReSharper makes it easy to do the overriding, but the code looks horrible. Are there any performance implications of using a struct instead?

If your only problem is defining equality for the use in a Dictionary<TKey,TValue> then another path you may choose is implementing an IEqualityComparer<T>. This can be manually passed to the dictionary constructor and take care of equality comparisons for the TKey value without modification to the key type.
If you have the more general problem of defining equality for your composite values, then I would focus on making the composite value natively support equality. Yes, defining the full set of methods necessary for equality is a pain, but it's mostly boilerplate code. Getting it right is more important than whether or not the boilerplate looks messy.
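As a minimal sketch of the first option, the comparer below supplies equality for a key type without touching the type itself. PersonKey, PersonKeyComparer and ComparerDemo are hypothetical names invented for this illustration, and the 17/31 hash mixing is just one common convention, not the only valid choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that we cannot (or do not want to) modify.
public class PersonKey
{
    public PersonKey(string name, int age) { Name = name; Age = age; }
    public string Name { get; private set; }
    public int Age { get; private set; }
}

// Equality lives outside the key type and is handed to the dictionary.
public class PersonKeyComparer : IEqualityComparer<PersonKey>
{
    public bool Equals(PersonKey x, PersonKey y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.Age == y.Age && string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    public int GetHashCode(PersonKey key)
    {
        unchecked // overflow is harmless when mixing hash codes
        {
            int hash = 17;
            hash = hash * 31 + (key.Name == null ? 0 : key.Name.GetHashCode());
            hash = hash * 31 + key.Age;
            return hash;
        }
    }
}

public static class ComparerDemo
{
    public static bool Lookup()
    {
        var map = new Dictionary<PersonKey, string>(new PersonKeyComparer());
        map.Add(new PersonKey("Ann", 30), "first");
        // A different instance with the same values is treated as the same key.
        return map.ContainsKey(new PersonKey("Ann", 30));
    }
}
```

The key class stays untouched, so the same type could be used elsewhere with a different comparer.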

I would actually say that for any struct one should always manually code an override of Equals() and GetHashCode() along with implementing IEquatable<T>, if it's at all likely that it could be used by someone as a key, so I certainly wouldn't use it just to avoid doing so.
As well as requiring boxing, the default implementation is rather slow as it uses reflection to examine the fields. There is also a bug in at least some framework versions: the implementation quite wisely optimises to a binary compare when doing so will give the correct results, but unfortunately misjudges when this is the case, and hence two structs containing equivalent decimal fields may be considered unequal.
When a quick composite is needed that doesn't really have any meaning to the system besides being a composite key, I'd recommend using Tuple. Tuple.Create() makes it easy to compose them, and the overrides for Equals() and GetHashCode() are pretty reasonable.
In some cases it can also be suitable to use anonymous classes as keys (only within the context of a given method of course), and here the overrides for Equals() and GetHashCode() are pretty reasonable too.
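A short sketch of both ideas, assuming .NET 4.0 or later for System.Tuple. The class, method and key names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class TupleKeyDemo
{
    public static int Lookup()
    {
        var stock = new Dictionary<Tuple<string, int>, int>();
        stock[Tuple.Create("widget", 2024)] = 5;
        // A freshly created tuple with equal items is an equal key.
        return stock[Tuple.Create("widget", 2024)];
    }

    public static bool AnonymousTypesCompareByValue()
    {
        // Both expressions have the same property names, types and order,
        // so the compiler gives them the same anonymous type.
        var a = new { Name = "widget", Year = 2024 };
        var b = new { Name = "widget", Year = 2024 };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }
}
```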

To create such a composite class, the recommended technique is to inherit from Tuple<int, string, ...>.
This way, you don't have to override GetHashCode and Equals yourself; the base class does it for you.
You can easily provide meaningful get accessors for each field.
public class CompositeKey : Tuple<string, int>
{
    public CompositeKey(string name, int age)
        : base(name, age)
    {
    }
    public string Name { get { return Item1; } }
    public int Age { get { return Item2; } }
}
This also enforces immutability, which is appropriate for dictionary keys.
As for performance, the built-in Tuples are quite fast. I found custom structs can be faster, though if you really need every extra bit of performance, the best option is to encode your key data directly into an int or long.
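To illustrate the last point, here is one possible encoding, assuming the composite key consists of exactly two 32-bit integers. PackedKey and its methods are hypothetical helpers; keys containing strings would need a different scheme.

```csharp
public static class PackedKey
{
    // Pack two 32-bit ints into one 64-bit key: high half = a, low half = b.
    public static long Pack(int a, int b)
    {
        // The uint cast keeps b from sign-extending into the high half.
        return ((long)a << 32) | unchecked((uint)b);
    }

    public static int High(long key) { return unchecked((int)(key >> 32)); }

    public static int Low(long key) { return unchecked((int)key); }
}
```

A Dictionary<long, TValue> can then use the packed value directly, since long already has value-based Equals and GetHashCode.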

Related

C# override Dictionary ContainsKey

I just can't find any proper piece of code to do what I need.
I'm using Dict.ContainsKey, but because I always create a new key object to look up, ContainsKey always returns false (the hash code of the new object is different from that of the stored key).
Can someone please advise how to override ContainsKey, or how to handle key comparison in this situation?
My dictionary looks like this:
Dictionary<someObj, int>
public class someObj
{
    public int someobjParam { get; set; }
    public int someobjParamTwo { get; set; }
}
You don't need to override ContainsKey - you need to either override Equals and GetHashCode in someObj (which should be renamed to conform to .NET naming conventions, btw) or you need to pass an IEqualityComparer<someObj> to the Dictionary<,> constructor. Either way, that code is used to compare keys (and obtain hash codes from them).
Basically you need to make Equals determine equality, and make GetHashCode return the same code for equal objects and ideally different codes for different objects - see Eric Lippert's article on GetHashCode for more details.
Also, you should consider making someObj immutable: mutable dictionary keys are generally a bad idea, as if you modify the key in a hashcode-sensitive way after using it as a key within the dictionary, you won't be able to find it again. If the point of your custom type really is to be a key, then just make it immutable.
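The hazard can be reproduced with a small sketch. MutableKey and MutableKeyDemo are hypothetical types written only to show the effect; the key has value-based equality but a public setter.

```csharp
using System.Collections.Generic;

// Hypothetical mutable key with value-based equality.
public class MutableKey
{
    public int A { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && other.A == A;
    }

    public override int GetHashCode() { return A; }
}

public static class MutableKeyDemo
{
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { A = 1 };
        var map = new Dictionary<MutableKey, string>();
        map[key] = "value";
        // The hash code changes, but the entry stays filed under the old one.
        key.A = 2;
        return map.ContainsKey(key);
    }
}
```

Here the lookup fails even though the very same object is passed back in, because the dictionary searches using the new hash code.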
For simplicity, you should also consider making someObj implement IEquatable<someObj>, and also think about whether it would be appropriate to be a struct instead of a class. If you implement IEquatable<someObj> you should also override object.Equals in a consistent way. Usually the object.Equals implementation will just call the most strongly-typed IEquatable<T>.Equals implementation.
You don't need to override ContainsKey, but rather instruct the dictionary when it should consider that two keys are equal.
One way to do that is by implementing IEquatable<someObj> in your key class. Do this if the concept of equality is global across your app:
public class someObj : IEquatable<someObj>
{
    public int someobjParam { get; set; }
    public int someobjParamTwo { get; set; }
    // override GetHashCode() and Equals(); for an example
    // see http://msdn.microsoft.com/en-us/library/ms131190%28v=vs.110%29.aspx
}
Another one is by implementing an IEqualityComparer<someObj> and passing it to the dictionary's constructor.

When to use a GUID for a class

I'm working on a simple application with a few classes. This all started when I wanted to use the Remove method on a List<Car>. This method requires that you override the Equals and the GetHashCode methods for the Car type. In this situation, I decided to implement an ID property on the Car class. That way, my Equals method simply checks for ID equality, and my GetHashCode method returns base.GetHashCode().
Is this a good approach, or is implementing a GUID for a small class too heavy-handed? There wouldn't be any need for it without the reasons I explained above. The only requirement for uniqueness for this Car type is that it be unique within the List<T> collection to which it belongs. But adding the GUID property seemed like the quickest way around the GetHashCode mess. BTW, there are no int properties on my Car type.
"There wouldn't be any need for it without the reasons I explained above."
If your class doesn't logically have an ID, then it certainly seems odd to include it just for the sake of equality.
For example, if you have two instances which have equal properties for everything apart from ID, are they really non-equal? If they are, you should potentially just use the default implementation of Equals/GetHashCode which uses reference identity for equality. Where you would use two objects with the same ID, you just use two references to the same object instead.
It really all depends on the context, and you haven't given much of that - but adding an ID just for equality is a bit of a design smell.
Instead of implementing Equals and GetHashCode just use RemoveAll:
myList.RemoveAll(x => x.ID == myCar.ID);
This allows you to specify a predicate that indicates what items should be removed instead (it doesn't matter that you are only removing one item).
Implementing Equals and GetHashCode in the way you describe strikes me as extremely dodgy: if your Equals implementation returns true, then your GetHashCode method needs to return the same value for both objects so that they will be placed in the same bucket in a hashtable. Your implementation (as I understand it) doesn't meet this criterion, as the base GetHashCode implementation is almost certainly going to return different values for two Car instances, regardless of whether they have the same ID or not.
Implementing Equals and GetHashCode isn't entirely trivial and is probably something I'd generally avoid doing if there are alternatives. If you really want to do this, then take a look at these resources:
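If the ID is kept, a consistent pair would derive both methods from it. This is only a sketch that assumes Car has a Guid Id set at construction; the rest of the class is omitted.

```csharp
using System;

public class Car : IEquatable<Car>
{
    public Car(Guid id) { Id = id; }

    public Guid Id { get; private set; }

    public bool Equals(Car other)
    {
        return !ReferenceEquals(other, null) && other.Id == Id;
    }

    public override bool Equals(object obj) { return Equals(obj as Car); }

    // Derived from the same data as Equals, so equal cars hash alike.
    public override int GetHashCode() { return Id.GetHashCode(); }
}
```

With this in place, List<Car>.Remove finds a car by ID even when given a different instance.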
What is the best algorithm for an overridden System.Object.GetHashCode?
Default implementation for Object.GetHashCode().
implementing the Equals method
Also hash codes are not GUIDs

Advantage of deriving external class from IEqualityComparer<> over overriding GetHashCode and Equals

I need to hash against a member variable instead of the class, so that the dictionary doesn't just check whether the reference is present. Without overriding the defaults, it won't find an identical Value; it only succeeds if it finds the exact same instance of HashedType, so this code fails:
Dictionary.Add(new HashedType(4));
Dictionary.Contains(new HashedType(4)); // fails to find 4
Definition of HashedType:
public class HashedType
{
    public HashedType(Int32 value) { Value = value; }
    public HashedType(String value) { Value = value; }
    public object Value;
    public void Serialize(Serializer s)
    {
        // Writes 0 or 1 first to record the type, then the value itself.
        if (Value.GetType() == typeof(Int32))
        {
            s.Set<Int32>(0);
            s.Set<Int32>((Int32)Value);
        }
        else
        {
            s.Set<Int32>(1);
            s.Set<String>((String)Value);
        }
    }
}
It looks like I can override GetHashCode() and Equals() to do this for me.
However, MSDN recommends that I create a separate class that implements IEqualityComparer and instantiate the dictionaries that use HashedType with it (HashedTypeComparer : IEqualityComparer).
To help make this easier, I've derived from Dictionary and created
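public class HashedTypeComparer : IEqualityComparer<HashedType>
{
    // object.Equals compares boxed values by value; == on object compares references.
    public bool Equals(HashedType a, HashedType b) { return object.Equals(a.Value, b.Value); }
    public int GetHashCode(HashedType a) { return a.Value.GetHashCode(); }
}
public class HashedTypeDictionary<U> : Dictionary<HashedType, U>
{
    public HashedTypeDictionary() : base(new HashedTypeComparer()) { }
}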
This all seems contrived.
Is the only advantage that I don't have to change Equals()?
I mean, really speaking, I would want Equals to compare against that single member anyway.
The idea is that object.Equals is the natural equality for that type (and GetHashCode should match that idea of equality). IEqualityComparer is used when you want a different equality on a case-by-case basis.
Consider for example, a string. The overridden Equals & GetHashCode methods do case-sensitive comparisons. But what if you want a dictionary where the keys are not case-sensitive? You write an IEqualityComparer that is not case-sensitive and pass it in the constructor of the dictionary.
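For strings you often don't even need to write the comparer, because the framework ships several. The sketch below uses the built-in StringComparer.OrdinalIgnoreCase; the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveDemo
{
    public static Dictionary<string, int> Build()
    {
        // StringComparer.OrdinalIgnoreCase is a built-in IEqualityComparer<string>.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        counts["Apple"] = 1;
        counts["APPLE"] = 2; // same key under this comparer, so it overwrites
        return counts;
    }
}
```

The string type itself is unchanged; only this one dictionary treats keys case-insensitively.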
Your example sounds like any two instances of HashedType should normally be treated as equal if their members are equal. In that case I'd recommend overriding the object.Equals and object.GetHashCode methods and not writing an IEqualityComparer.
The reason you would choose one over the other is whether you always want instances of a given type to be compared using a certain logic, or only in this one situation.
Equals and GetHashCode provide the "true" implementation of whether two objects are logically equal. IEqualityComparer allows you to override that in a case-by-case basis, and to separate ownership (it might be different parties who control the entities versus the code using them).
Imagine, for a moment, that you don't own the underlying class (i.e. it's produced by another team, or only given to you as a binary). You always can create the IEqualityComparer. You might not have the option of changing Equals and GetHashCode...
If you want the Dictionary behavior to work by default the majority of the time, override GetHashCode and Equals. Bear in mind that for this to work, their results must never change during the lifetime of the object, so if they are based on Value, then Value should be set in the constructor and exposed as a read-only property.
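A minimal sketch of that advice applied to HashedType, with the serialization method left out. It assumes Value only ever holds an int or a string, as in the question.

```csharp
public sealed class HashedType
{
    private readonly object _value; // set once in the constructor, never changed

    public HashedType(int value) { _value = value; }
    public HashedType(string value) { _value = value; }

    public object Value { get { return _value; } }

    public override bool Equals(object obj)
    {
        var other = obj as HashedType;
        // object.Equals compares boxed ints and strings by value.
        return other != null && object.Equals(_value, other._value);
    }

    public override int GetHashCode()
    {
        return _value == null ? 0 : _value.GetHashCode();
    }
}
```

Because the field is readonly, the hash code cannot change while an instance is stored as a key.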
IEqualityComparer is really used for when you want to compare things differently in one section of your program.

What is the correct implementation of a "composite" variable type?

I have a program that has to manage objects with a composite key.
This key, to keep it simple, is a pair of strings.
I have the following code:
public struct MyKey
{
    public string Part1 { get; set; } // always set
    public string Part2 { get; set; } // can be null
    public MyKey(string part1, string part2) : this()
    {
        this.Part1 = part1;
        this.Part2 = part2;
    }
}
This is OK for storing my values.
Now I want to be able to:
use the equality operator (two keys are equal when both Part1 and Part2 are equal)
use the key in a Dictionary, especially with the Contains method
I've guessed at a number of things (overloading the equality operator, overriding the GetHashCode and Equals methods, implementing IComparable, etc.), but I'm not sure which steps are necessary to reach my goals and which will just cause overhead.
Thanks in advance.
Use the .NET 4.0 Tuple; it has a correct Equals() and GetHashCode() based on the component values. I've used Tuple before, or if the parts are strings you can always concatenate them with a separator. But if you truly want to keep that type as your key, you do want a proper Equals() and GetHashCode(), so in that case override them and have YourType implement IEquatable<YourType>.
p.s. Here's a good example of overriding the GetHashCode() if you want to do that manually instead of a Tuple...
What is the best algorithm for an overridden System.Object.GetHashCode?
You need to override Equals and GetHashCode in order to use your object as a key in a dictionary. This answer provides an excellent explanation.
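Applied to the MyKey struct from the question, that could look like the sketch below. It assumes ordinal, case-sensitive comparison of both parts and makes the struct immutable; the hash mixing constants are a common convention rather than a requirement.

```csharp
using System;

public struct MyKey : IEquatable<MyKey>
{
    private readonly string _part1;
    private readonly string _part2;

    public MyKey(string part1, string part2) { _part1 = part1; _part2 = part2; }

    public string Part1 { get { return _part1; } } // always set
    public string Part2 { get { return _part2; } } // can be null

    public bool Equals(MyKey other)
    {
        // string.Equals(a, b) treats two nulls as equal.
        return string.Equals(_part1, other._part1) && string.Equals(_part2, other._part2);
    }

    public override bool Equals(object obj) { return obj is MyKey && Equals((MyKey)obj); }

    public override int GetHashCode()
    {
        unchecked // overflow is harmless when mixing hash codes
        {
            int hash = 17;
            hash = hash * 31 + (_part1 == null ? 0 : _part1.GetHashCode());
            hash = hash * 31 + (_part2 == null ? 0 : _part2.GetHashCode());
            return hash;
        }
    }

    public static bool operator ==(MyKey left, MyKey right) { return left.Equals(right); }
    public static bool operator !=(MyKey left, MyKey right) { return !left.Equals(right); }
}
```

Implementing IEquatable<MyKey> lets the dictionary compare keys without boxing, and the two operators cover the == requirement.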
First of all, I would stay away from struct because of boxing and the various coding pitfalls it can lead us into.
I would override both GetHashCode and Equals. GetHashCode since it is used in dictionaries and Equals since it is used by various ORMs and can be handy in implementing business logic.

Was IEqualityComparer<T> introduced for the following reasons?

1) Are the reasons why IEqualityComparer<T> was introduced:
a) so we would be able to compare objects (of particular type) for equality in as many different ways as needed
b) and by having a standard interface for implementing a custom equality comparison, chances are that much greater that third party classes will accept this interface as a parameter and by that allow us to inject into these classes equality comparison behavior via objects implementing IEqualityComparer<T>
2) I assume IEqualityComparer<T> should not be implemented on type T that we're trying to compare for equality, but instead we should implement it on helper class(es)?
Thank you
I'm doubtful that anyone here will be able to answer with any authority the reason that the interface was introduced (my guess--and that's all it is--would be to support one of the generic set types like Dictionary<TKey, TValue> or HashSet<T>), but its purpose is clear:
Defines methods to support the comparison of objects for equality.
If you combine this with the fact you can have multiple types implementing this interface (see StringComparer), then the answer to question a is yes.
The reason for this is threefold:
Operators (in this case, ==) are not polymorphic; if the type is upcasted to a higher level than where the type-specific comparison logic is defined, then you'll end up performing a reference comparison rather than using the logic within the == operator.
Equals() requires at least one valid reference and can provide different logic depending on whether it's called on the first or second value (one could be more derived and override the logic of the other).
Lastly and most importantly, the comparison logic provided by the type may not be what the user is after. For example, strings (in C#) are case sensitive when compared using == or Equals. This means that any container (like Dictionary<string, T> or HashSet<string>) would be case-sensitive. Allowing the user to provide another type that implements IEqualityComparer<string> means that the user can use whatever logic they like to determine if one string equals the other, including ignoring case.
As for question b, probably, though I wouldn't be surprised if this wasn't high on the list of priorities.
For your final question, I'd say that's generally true. While there's nothing stopping you from doing so, it is confusing to think that type T would provide custom comparison logic that is different from that provided on type T just because it's referenced as an IEqualityComparer<T>.
Agreed on a and b.
"Should not be" is always a normative question and rarely a good metric. You do what works without getting into trouble (Pragmatic Programmer). The fact that you can implement the interface as stateful, stateless, or any other way makes it possible to implement (alternative) comparers for all types, including value types, enums, sealed types, and even abstract types. In essence it is a Strategy pattern.
Sometimes there's a natural equality comparison for a type, in which case it should implement IEquatable<T>, not IEqualityComparer<T>. At other times, there are multiple possible ways of comparing objects for equality - so it makes sense to implement IEqualityComparer<T> then. It allows hash tables (and sets etc) to work in a flexible way.
