I've got an object that has 3 integer values; combined, the 3 integers are always unique. I want a quick way to find a specific object among thousands.
My idea was to combine the 3 integers into a string, so 1, 2533 and 9 would become the unique string 1-2533-9. But is this the most efficient way? The numbers cannot be bigger than 2^16, so I could also use bit shifting to create a long, which I think would be faster than building a string from them. Are there other options? What should I do?
The main thing I want to achieve is finding the object quickly, even with a collection of thousands of objects.
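For reference, the bit-shifting idea from the question could look roughly like the sketch below. TripletKey and Pack are hypothetical names, and the sketch assumes each value lies in the range 0..65535 (if 2^16 itself is a legal value, each field would need 17 bits instead).

```csharp
using System;

public static class TripletKey
{
    // Packs three values into one long, 16 bits per value, so that
    // distinct triplets always produce distinct keys.
    // Assumption: each value is in the range 0..65535.
    public static long Pack(int first, int second, int third)
    {
        Check(first);
        Check(second);
        Check(third);
        return ((long)first << 32) | ((long)second << 16) | (long)third;
    }

    private static void Check(int value)
    {
        if (value < 0 || value > ushort.MaxValue)
            throw new ArgumentException("Each value must fit in 16 bits.");
    }
}
```

The resulting long can then be used directly as the key of a Dictionary<long, T>, which avoids allocating a string per lookup.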
public class SomeClass
{
private readonly IDictionary<CompositeIntegralTriplet, object> _dictionary = new Dictionary<CompositeIntegralTriplet, object>();
}
public sealed class CompositeIntegralTriplet : IEquatable<CompositeIntegralTriplet>
{
public CompositeIntegralTriplet(int first, int second, int third)
{
First = first;
Second = second;
Third = third;
}
public int First { get; }
public int Second { get; }
public int Third { get; }
public override bool Equals(object other)
{
var otherAsTriplet = other as CompositeIntegralTriplet;
return Equals(otherAsTriplet);
}
public override int GetHashCode()
{
unchecked
{
var hashCode = First;
hashCode = (hashCode*397) ^ Second;
hashCode = (hashCode*397) ^ Third;
return hashCode;
}
}
public bool Equals(CompositeIntegralTriplet other) => other != null && First == other.First && Second == other.Second && Third == other.Third;
}
I'm looking at how to build the best hash code for a class, and I've seen several algorithms. I saw this one: Hash Code implementation. It seems that the GetHashCode methods of the .NET classes are similar (as seen by reflecting over the code).
So the question is: why not create the static class below to build a hash code automatically, just by passing in the fields we consider to be the "key"?
// Old version, see edit
public static class HashCodeBuilder
{
public static int Hash(params object[] keys)
{
if (object.ReferenceEquals(keys, null))
{
return 0;
}
int num = 42;
checked
{
for (int i = 0, length = keys.Length; i < length; i++)
{
num += 37;
if (object.ReferenceEquals(keys[i], null))
{ }
else if (keys[i].GetType().IsArray)
{
foreach (var item in (IEnumerable)keys[i])
{
num += Hash(item);
}
}
else
{
num += keys[i].GetHashCode();
}
}
}
return num;
}
}
And use it like this:
// Old version, see edit
public sealed class A : IEquatable<A>
{
public A()
{ }
public string Key1 { get; set; }
public string Key2 { get; set; }
public string Value { get; set; }
public override bool Equals(object obj)
{
return this.Equals(obj as A);
}
public bool Equals(A other)
{
return object.ReferenceEquals(other, null)
? false
: Key1 == other.Key1 && Key2 == other.Key2;
}
public override int GetHashCode()
{
return HashCodeBuilder.Hash(Key1, Key2);
}
}
Wouldn't that be much simpler than writing a custom method every time? Am I missing something?
EDIT
Following all the remarks, I ended up with the following code:
public static class HashCodeBuilder
{
public static int Hash(params object[] args)
{
if (args == null)
{
return 0;
}
int num = 42;
unchecked
{
foreach(var item in args)
{
if (ReferenceEquals(item, null))
{ }
else if (item.GetType().IsArray)
{
foreach (var subItem in (IEnumerable)item)
{
num = num * 37 + Hash(subItem);
}
}
else
{
num = num * 37 + item.GetHashCode();
}
}
}
return num;
}
}
public sealed class A : IEquatable<A>
{
public A()
{ }
public string Key1 { get; set; }
public string Key2 { get; set; }
public string Value { get; set; }
public override bool Equals(object obj)
{
return this.Equals(obj as A);
}
public bool Equals(A other)
{
if(ReferenceEquals(other, null))
{
return false;
}
else if(ReferenceEquals(this, other))
{
return true;
}
return Key1 == other.Key1
&& Key2 == other.Key2;
}
public override int GetHashCode()
{
return HashCodeBuilder.Hash(Key1, Key2);
}
}
Your Equals method is broken - it's assuming that two objects with the same hash code are necessarily equal. That's simply not the case.
Your hash code method looked okay at a quick glance, but could actually do with some work - see below. It means boxing any value type values and creating an array every time you call it, but other than that it's okay (as SLaks pointed out, there are some issues around the collection handling). You might want to consider writing some generic overloads which would avoid those performance penalties for common cases (1, 2, 3 or 4 arguments, perhaps). You might also want to use a foreach loop instead of a plain for loop, just to be idiomatic.
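A minimal sketch of what such generic overloads might look like, assuming the same HashCodeBuilder class name as in the question (the seed 17 and multiplier 31 are just the usual illustrative choices). Because the arguments are generic rather than object, value types are not boxed and no params array is allocated:

```csharp
public static class HashCodeBuilder
{
    // Two-argument overload: no boxing, no array allocation.
    public static int Hash<T1, T2>(T1 a, T2 b)
    {
        unchecked // overflow simply wraps around
        {
            int hash = 17;
            hash = hash * 31 + ItemHash(a);
            hash = hash * 31 + ItemHash(b);
            return hash;
        }
    }

    // Three-argument overload built the same way.
    public static int Hash<T1, T2, T3>(T1 a, T2 b, T3 c)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + ItemHash(a);
            hash = hash * 31 + ItemHash(b);
            hash = hash * 31 + ItemHash(c);
            return hash;
        }
    }

    // Null references contribute 0; for value types the null test is always false.
    private static int ItemHash<T>(T value)
    {
        return value == null ? 0 : value.GetHashCode();
    }
}
```

A class would then call it as return HashCodeBuilder.Hash(Key1, Key2); exactly as before.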
You could do the same sort of thing for equality, but it would be slightly harder and messier.
EDIT: For the hash code itself, you're only ever adding values. I suspect you were trying to do this sort of thing:
int hash = 17;
hash = hash * 31 + firstValue.GetHashCode();
hash = hash * 31 + secondValue.GetHashCode();
hash = hash * 31 + thirdValue.GetHashCode();
return hash;
But that multiplies the hash by 31, it doesn't add 31. Currently your hash code will always return the same for the same values, whether or not they're in the same order, which isn't ideal.
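To make the difference concrete, here is a small sketch comparing the two schemes on plain int values (HashDemo and its method names are made up for illustration). The purely additive version gives the same result for (1, 2) and (2, 1), while the multiply-then-add version does not:

```csharp
public static class HashDemo
{
    // Mirrors the "old version" in the question: only ever adds.
    public static int AdditiveHash(params int[] values)
    {
        int num = 42;
        unchecked
        {
            foreach (int value in values)
            {
                num += 37;
                num += value;
            }
        }
        return num;
    }

    // Multiplies the running hash before adding each value,
    // so the position of each value affects the result.
    public static int MultiplicativeHash(params int[] values)
    {
        int hash = 17;
        unchecked
        {
            foreach (int value in values)
            {
                hash = hash * 31 + value;
            }
        }
        return hash;
    }
}
```

Here AdditiveHash(1, 2) and AdditiveHash(2, 1) are both 119, whereas MultiplicativeHash(1, 2) is 16370 and MultiplicativeHash(2, 1) is 16400.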
EDIT: It seems there's some confusion over what hash codes are used for. I suggest that anyone who isn't sure reads the documentation for Object.GetHashCode and then Eric Lippert's blog post about hashing and equality.
This is what I'm using:
public static class ObjectExtensions
{
/// <summary>
/// Simplifies correctly calculating hash codes based upon
/// Jon Skeet's answer here
/// http://stackoverflow.com/a/263416
/// </summary>
/// <param name="obj"></param>
/// <param name="memberThunks">Thunks that return all the members upon which
/// the hash code should depend.</param>
/// <returns></returns>
public static int CalculateHashCode(this object obj, params Func<object>[] memberThunks)
{
// Overflow is okay; just wrap around
unchecked
{
int hash = 5;
foreach (var member in memberThunks)
hash = hash * 29 + member().GetHashCode();
return hash;
}
}
}
Example usage:
public class Exhibit
{
public virtual Document Document { get; set; }
public virtual ExhibitType ExhibitType { get; set; }
#region System.Object
public override bool Equals(object obj)
{
return Equals(obj as Exhibit);
}
public bool Equals(Exhibit other)
{
return other != null &&
Document.Equals(other.Document) &&
ExhibitType.Equals(other.ExhibitType);
}
public override int GetHashCode()
{
return this.CalculateHashCode(
() => Document,
() => ExhibitType);
}
#endregion
}
Instead of calling keys[i].GetType().IsArray, you should try to cast it to IEnumerable (using the as keyword).
You can fix the Equals method without repeating the field list by registering a static list of fields, like I do here using a collection of delegates.
This also avoids the array allocation per-call.
Note, however, that my code doesn't handle collection properties.
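A rough sketch of the as-cast suggestion, under the assumption that strings should still be hashed as single values (a string is itself an IEnumerable of chars, so it needs to be excluded explicitly). SequenceHash is a hypothetical helper, not the code from the linked answer:

```csharp
using System.Collections;

public static class SequenceHash
{
    public static int Hash(object item)
    {
        if (item == null)
        {
            return 0;
        }

        // "as" yields null when the object is not a sequence, so arrays,
        // lists and other collections are all handled by the same branch.
        var sequence = item as IEnumerable;
        if (sequence == null || item is string)
        {
            return item.GetHashCode();
        }

        unchecked
        {
            int hash = 17;
            foreach (var element in sequence)
            {
                hash = hash * 31 + Hash(element); // recurse for nested collections
            }
            return hash;
        }
    }
}
```

With this, an int[] and a List<int> holding the same elements in the same order produce the same hash, which GetType().IsArray would not give you.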
I have two objects using the following classes:
public class Test {
public string Name {get; set;}
public List<Input> Inputs {get;set;}
......
//some other properties I don't need to check
}
public class Input {
public int VariableA {get;set;}
public int VariableB {get;set;}
public List<Sancti> Sancts {get;set;}
}
public class Sancti {
public string Symbol {get;set;}
public double Percentage {get;set;}
}
I want to check whether two instances of Test have the same Inputs value. I've done this using a loop, but I don't believe that is the right way to do it.
I've read some links (link1, link2), but they read like gibberish to me. Are there simpler ways to do this, like a one-liner, something like:
test1.Inputs.IsTheSameAs(test2.Inputs)?
I was really hoping for a more readable method, preferably LINQ.
NOTE: Order of inputs should not matter.
One way is to check the set difference between the two lists. If the result of listA minus listB has no elements, that means that everything in listA exists in listB. If the reverse is also true, then the two lists are equal.
bool equal = testA.Inputs.Except(testB.Inputs).Count() == 0
&& testB.Inputs.Except(testA.Inputs).Count() == 0;
Another is to simply check each element of listA and see if it exists in listB (and vice versa):
bool equal = testA.Inputs.All(x => testB.Inputs.Contains(x))
&& testB.Inputs.All(x => testA.Inputs.Contains(x));
That being said, either of these can give a false positive if one element in a list is "equal" to multiple elements in the other. For example, the following two lists would be considered equal using the above approaches:
listA = { 1, 2, 3, 4 };
listB = { 1, 1, 2, 2, 3, 3, 4, 4 };
To prevent that from happening, you would need to perform a one-to-one comparison rather than the blanket set-based checks above. There are several ways to do this; one is to first sort both lists and then check them against each other index by index:
var listASorted = testA.Inputs.OrderBy(x => x);
var listBSorted = testB.Inputs.OrderBy(x => x);
bool equal = testA.Inputs.Count == testB.Inputs.Count
&& listASorted.Zip(listBSorted, (x, y) => x.Equals(y)).All(b => b);
(If the lists are already sorted or if you'd prefer to check the lists exactly (with ordering preserved), then you can skip the sorting step of this method.)
One thing to note with this method, however, is that Input needs to implement IComparable<Input> in order for the lists to be sorted properly. How you implement it exactly is up to you, but one possible way would be to sort Input based on the XOR of VariableA and VariableB:
public class Input : IComparable<Input>
{
...
public int CompareTo(Input other)
{
// Orders by the XOR of the two variables, as described above.
int a = this.VariableA ^ this.VariableB;
int b = other.VariableA ^ other.VariableB;
return a.CompareTo(b);
}
}
(In addition, Input should also override GetHashCode and Equals, as itsme86 describes in his answer.)
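If implementing IComparable is not convenient, another one-to-one check is to count occurrences instead of sorting. The sketch below is only one possible way to do it; MultisetComparer and HaveSameElements are made-up names, and it assumes the elements are non-null and have consistent Equals and GetHashCode implementations:

```csharp
using System.Collections.Generic;

public static class MultisetComparer
{
    // Returns true when both sequences contain the same elements the same
    // number of times, regardless of order.
    public static bool HaveSameElements<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var counts = new Dictionary<T, int>();

        // Count how often each element occurs in the first sequence.
        foreach (var item in first)
        {
            int n;
            counts.TryGetValue(item, out n);
            counts[item] = n + 1;
        }

        // Every element of the second sequence must use up one occurrence.
        foreach (var item in second)
        {
            int n;
            if (!counts.TryGetValue(item, out n) || n == 0)
            {
                return false;
            }
            counts[item] = n - 1;
        }

        // Any leftover count means the first sequence had extra elements.
        foreach (var remaining in counts.Values)
        {
            if (remaining != 0)
            {
                return false;
            }
        }
        return true;
    }
}
```

With the lists from the example above, { 1, 2, 3, 4 } and { 1, 1, 2, 2, 3, 3, 4, 4 } are reported as different, while { 1, 2, 2 } and { 2, 1, 2 } are reported as equal.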
EDIT:
After being drawn back to this answer, I would now like to offer a much simpler solution:
var listASorted = testA.Inputs.OrderBy(x => x);
var listBSorted = testB.Inputs.OrderBy(x => x);
bool equal = listASorted.SequenceEqual(listBSorted);
(As before, you can skip the sorting step if the lists are already sorted or you want to compare them with their existing ordering intact.)
SequenceEqual uses the default equality comparer for the element type to determine equality. For a class that does not override Equals, this means reference equality, so two distinct Input objects with the same property values are not considered equal. To compare by value, either override Equals and GetHashCode on Input, or define an IEqualityComparer for Input and pass it to SequenceEqual:
public class InputComparer : IEqualityComparer<Input>
{
public bool Equals(Input a, Input b)
{
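return a.VariableA == b.VariableA
&& a.VariableB == b.VariableB;
// ... and so on for any other members that define equality
}
public int GetHashCode(Input a)
{
// Must be consistent with Equals: use the same members compared there.
unchecked { return (a.VariableA * 397) ^ a.VariableB; }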
}
}
You can change your Input and Sancti class definitions to override Equals and GetHashCode. The following solution considers that 2 Inputs are equal when:
VariableA are equal and
VariableB are equal and
The Sancts List are equal, considering that the Sancti elements with the same Symbol must have the same Percentage to be equal
You may need to change this if your specifications are different:
public class Input
{
public int VariableA { get; set; }
public int VariableB { get; set; }
public List<Sancti> Sancts { get; set; }
public override bool Equals(object obj)
{
Input otherInput = obj as Input;
if (ReferenceEquals(otherInput, null))
return false;
if ((this.VariableA == otherInput.VariableA) &&
(this.VariableB == otherInput.VariableB) &&
this.Sancts.OrderBy(x=>x.Symbol).SequenceEqual(otherInput.Sancts.OrderBy(x => x.Symbol)))
return true;
else
{
return false;
}
}
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + VariableA.GetHashCode();
hash = hash * 23 + VariableB.GetHashCode();
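// Hash the elements in the same order Equals compares them; List<T>.GetHashCode()
// is reference-based and would break the Equals/GetHashCode contract.
foreach (var sancti in Sancts.OrderBy(x => x.Symbol))
hash = hash * 23 + sancti.GetHashCode();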
return hash;
}
}
}
public class Sancti
{
public string Symbol { get; set; }
public double Percentage { get; set; }
public override bool Equals(object obj)
{
Sancti otherInput = obj as Sancti;
if (ReferenceEquals(otherInput, null))
return false;
if ((this.Symbol == otherInput.Symbol) && (this.Percentage == otherInput.Percentage) )
return true;
else
{
return false;
}
}
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + Symbol.GetHashCode();
hash = hash * 23 + Percentage.GetHashCode();
return hash;
}
}
}
With this in place, a single call checks whether the Inputs are equal (element by element, in order):
test1.Inputs.SequenceEqual(test2.Inputs);
I have a custom class that I was trying to use as a key for a dictionary:
// I tried setting more than enough capacity also...
var dict = new Dictionary<MyPoint, MyPoint>(capacity);
Now let me be clear, the goal here is to compare two SIMILAR but DIFFERENT lists, using X, Y, and Date as a composite key. The values will vary between these two lists, and I'm trying to quickly compare them and compute their differences.
Here is the class code:
public class MyPoint : IEquatable<MyPoint>
{
public short X { get; set; }
public short Y { get; set; }
public DateTime Date { get; set; }
public double MyValue { get; set; }
public override bool Equals(object obj)
{
return base.Equals(obj as MyPoint);
}
public bool Equals(MyPoint other)
{
if (other == null)
{
return false;
}
return (Date == other.Date)
&& (X == other.X)
&& (Y == other.Y);
}
public override int GetHashCode()
{
return Date.GetHashCode()
| X.GetHashCode()
| Y.GetHashCode();
}
}
I also tried keying with a struct:
public struct MyPointKey
{
public short X;
public short Y;
public DateTime Date;
// The value is not on these, because the struct is only used as key
}
In both cases dictionary writing was very, very slow (reading was quick).
I changed the key to a string, with the format:
var dict = new Dictionary<string, MyPoint>(capacity);
var key = string.Format("{0}_{1}", item.X, item.Y);
I was amazed at how much quicker this is -- it's at least 10 times faster. I tried Release mode, no debugger, and every scenario I could think of.
This dictionary will contain 350,000 or more items, so performance does matter.
Any thoughts or suggestions? Thanks!
Another edit...
I'm trying to compare two lists of things in the fastest way I can. This is what I'm working with. The Dictionary is important for fast lookups against the source list.
IList<MyThing> sourceList;
IDictionary<MyThing, MyThing> comparisonDict;
Parallel.ForEach(sourceList,
sourceItem =>
{
double compareValue = 0;
MyThing compareMatch = null;
if (comparisonDict.TryGetValue(sourceItem, out compareMatch))
{
compareValue = compareMatch.MyValue;
}
// Do a delta check on the item
double difference = sourceItem.MyValue- compareValue;
if (Math.Abs(difference) > 1)
{
// Record the difference...
}
});
As others have said in the comments, the problem is in your GetHashCode() implementation. Taking your code, and running 10,000,000 iterations with the string key took 11-12 seconds. Running with your existing hashCode I stopped it after over a minute. Using the following hashCode implementation took under 5 seconds.
public override int GetHashCode()
{
var hashCode = Date.GetHashCode();
hashCode = (hashCode * 37) ^ X.GetHashCode();
hashCode = (hashCode * 37) ^ Y.GetHashCode();
return hashCode;
}
The problem is that when you get into large numbers, the items are all colliding in the same buckets, due to the ORs. A dictionary where everything is in the same bucket is just a list.
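To illustrate the effect, the sketch below counts how many distinct hash codes each combining strategy produces over a small grid of values. It is a simplification that works on plain ints rather than the actual MyPoint fields, and CollisionDemo is a hypothetical name. OR can only ever set bits, so many different (x, y) pairs collapse onto the same result:

```csharp
using System;
using System.Collections.Generic;

public static class CollisionDemo
{
    // Combines with bitwise OR, as in the question's GetHashCode.
    public static int OrHash(int x, int y)
    {
        return x | y;
    }

    // Combines with multiply-then-XOR, as in the suggested replacement.
    public static int MixedHash(int x, int y)
    {
        unchecked
        {
            return (x * 37) ^ y;
        }
    }

    // Counts the distinct hash codes over all pairs (x, y) with 0 <= x, y < size.
    public static int CountDistinct(Func<int, int, int> hash, int size)
    {
        var seen = new HashSet<int>();
        for (int x = 0; x < size; x++)
        {
            for (int y = 0; y < size; y++)
            {
                seen.Add(hash(x, y));
            }
        }
        return seen.Count;
    }
}
```

For a 64 x 64 grid there are 4096 distinct pairs, but CountDistinct(CollisionDemo.OrHash, 64) yields only 64 distinct codes, while the multiply-then-XOR version yields far more.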
If I understood you correctly, you'd like to use a set while still maintaining the order of the keys. In that case, use SortedSet`1 instead.
Code:
class Program {
static void Main(string[] args) {
SortedSet<MyKey> list = new SortedSet<MyKey>() {
new MyKey(0, 0, new DateTime(2015, 6, 4)),
new MyKey(0, 1, new DateTime(2015, 6, 3)),
new MyKey(1, 1, new DateTime(2015, 6, 3)),
new MyKey(0, 0, new DateTime(2015, 6, 3)),
new MyKey(1, 0, new DateTime(2015, 6, 3)),
};
foreach(var entry in list) {
Console.WriteLine(string.Join(", ", entry.X, entry.Y, entry.Date));
}
Console.ReadKey();
}
}
I changed your MyPoint class as follows:
public sealed class MyKey : IEquatable<MyKey>, IComparable<MyKey> {
public readonly short X;
public readonly short Y;
public readonly DateTime Date;
public MyKey(short x, short y, DateTime date) {
this.X = x;
this.Y = y;
this.Date = date;
}
public override bool Equals(object that) {
return this.Equals(that as MyKey);
}
public bool Equals(MyKey that) {
// ReferenceEquals avoids calling the overloaded == operator, which would recurse.
if(ReferenceEquals(that, null)) {
return false;
}
return this.Date == that.Date
&& this.X == that.X
&& this.Y == that.Y;
}
public static bool operator ==(MyKey lhs, MyKey rhs) {
return ReferenceEquals(lhs, null) ? ReferenceEquals(rhs, null) : lhs.Equals(rhs);
}
public static bool operator !=(MyKey lhs, MyKey rhs) {
return !(lhs == rhs);
}
public override int GetHashCode() {
int result;
unchecked {
result = (int)X;
result = 31 * result + (int)Y;
result = 31 * result + Date.GetHashCode();
}
return result;
}
public int CompareTo(MyKey that) {
int result = this.X.CompareTo(that.X);
if(result != 0) {
return result;
}
result = this.Y.CompareTo(that.Y);
if(result != 0) {
return result;
}
result = this.Date.CompareTo(that.Date);
return result;
}
}
Output:
0, 0, 03.06.2015 00:00:00
0, 0, 04.06.2015 00:00:00
0, 1, 03.06.2015 00:00:00
1, 0, 03.06.2015 00:00:00
1, 1, 03.06.2015 00:00:00
I am using RTBTextPointer as custom key in dictionary...
Init.SpintaxEditorPropertyMain.SpintaxListDict = new Dictionary<RTBTextPointer, SpintaxEditorProperties.SpintaxMappedValue>(new RTBTextPointerComparer());
I wrote the RTBTextPointer and RTBTextPointerComparer classes in a class library and use them in different WPF projects:
if (Init.SpintaxEditorPropertyMain.SpintaxListDict.ContainsKey(_index) == false)
{
Init.SpintaxEditorPropertyMain.SpintaxListDict.Add(_index,_SpintaxMappedVal);
}
Every time, ContainsKey returns false even when the dictionary already contains the key, so duplicate entries end up in the dictionary. Is anything wrong in my GetHashCode()?
public class RTBTextPointer
{
static int _row;
static int _column;
public int Row
{
get
{
return _row;
}
set
{
_row = value;
}
}
public int Column
{
get
{
return _column;
}
set
{
_column = value;
}
}
}
public class RTBTextPointerComparer : IEqualityComparer<RTBTextPointer>
{
public bool Equals(RTBTextPointer x, RTBTextPointer y)
{
bool result = int.Equals(x.Column, y.Column) && (int.Equals(x.Row, y.Row));
return result;
}
public int GetHashCode(RTBTextPointer obj)
{
var result = 0;
int hCode = obj.Column ^ obj.Row;
result = hCode.GetHashCode();
return result;
}
}
Please help me
Thanks in advance
I don't think you need to create a separate comparer. Just overriding Equals and GetHashCode should suffice.
Also, if you have very simple properties like that, you could switch to auto properties
public class RTBTextPointer
{
public int Row
{
get;
set;
}
public int Column
{
get;
set;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj))
{
return false;
}
if (ReferenceEquals(this, obj))
{
return true;
}
var other = obj as RTBTextPointer;
if (other == null)
{
return false;
}
return other.Row == Row && other.Column == Column;
}
public override int GetHashCode()
{
unchecked
{
// 397 or some other prime number
return (Row * 397) ^ Column;
}
}
}
See unchecked for more information about that.
If you have more than two properties, and if those properties could be null, the GetHashCode might look like this:
unchecked
{
var result = 0;
result = (result * 397) ^ (Prop1 != null ? Prop1.GetHashCode() : 0);
result = (result * 397) ^ (Prop2 != null ? Prop2.GetHashCode() : 0);
result = (result * 397) ^ (Prop3 != null ? Prop3.GetHashCode() : 0);
result = (result * 397) ^ (Prop4 != null ? Prop4.GetHashCode() : 0);
// ...
return result;
}
Your problem probably stems from the following declarations in RTBTextPointer:
static int _row;
static int _column;
These don't do what I think you're intending. They should be
private int _row;
private int _column;
As it is right now, these fields are static members of RTBTextPointer, so every instance reads and writes the same two values: setting Row or Column on one instance changes it for all of them. If you remove static, each instance gets its own fields, which I believe is your intent.
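A tiny sketch of the difference, using two hypothetical classes that differ only in the static modifier on the backing field:

```csharp
// Backing field is static: one shared value for all instances.
public class SharedFieldPointer
{
    static int _row;
    public int Row
    {
        get { return _row; }
        set { _row = value; }
    }
}

// Backing field is an instance field: one value per object.
public class InstanceFieldPointer
{
    private int _row;
    public int Row
    {
        get { return _row; }
        set { _row = value; }
    }
}
```

If you create two SharedFieldPointer objects and set Row to 1 on the first and 2 on the second, both report Row == 2 afterwards. With InstanceFieldPointer the first keeps 1 and the second keeps 2.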
Once that is corrected, I would reconsider the design of your class, at least if you intend to use it as a key in a Dictionary. RTBTextPointer should be immutable, or at least the fields and properties that GetHashCode() depends on should be. Here's why:
When you add an object as a key to a dictionary, its associated value is placed in a hash bucket, which is simply some data structure associated with a hash code. Assume we have some arbitrary key RTBTextPointer with Row = 2 and Column = 2 and a value of "Foo". Its GetHashCode would be 0 (2 XOR 2).
Hash    Key                    Value
0       RTBTextPointer(2,2)    Foo
Right now, a call to Dictionary.ContainsKey() would return true when looking for RTBTextPointer(2,2). Now consider what happens if this RTBTextPointer is changed to have Row = 4. Its hash code would now be 6 (4 XOR 2). The call to Dictionary.ContainsKey() would now return false, and the value Foo would be inaccessible, because the key has a hash code that depends upon mutable state.
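The scenario can be reproduced with a short sketch. MutablePoint is a hypothetical stand-in for RTBTextPointer with instance properties and the same XOR-based hash:

```csharp
using System.Collections.Generic;

public class MutablePoint
{
    public int Row { get; set; }
    public int Column { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MutablePoint;
        return other != null && other.Row == Row && other.Column == Column;
    }

    public override int GetHashCode()
    {
        return Row ^ Column;
    }
}

public static class MutableKeyDemo
{
    // Adds a key, mutates it, then looks it up again.
    public static bool KeyFoundAfterMutation()
    {
        var key = new MutablePoint { Row = 2, Column = 2 };          // hash 0
        var dict = new Dictionary<MutablePoint, string> { { key, "Foo" } };

        key.Row = 4;                                                  // hash is now 6

        // The entry was stored under hash 0, so the lookup with hash 6 misses it.
        return dict.ContainsKey(key);
    }
}
```

KeyFoundAfterMutation() returns false even though the very same object is still sitting in the dictionary, and looking up a fresh (2, 2) key fails as well because Equals no longer matches.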
As a final note, I would consider overriding the Equals() and GetHashCode() methods of object.
I need a kind of dictionary where the key is a pair of an enum and an int and the value is an object. So I want to map a pair to some object.
One option would be
public enum SomeEnum
{
value1, value2
}
class Key
{
public SomeEnum enumValue;
public int counter;
// Do I have to implement Compare here?
}
Dictionary<Key, object> _myDictionary;
Another option would be to convert the enum and int to some unique key:
string key = String.Format("{0}/{1}", enumValue, intValue)
That approach requires string parsing and a lot of extra work.
Is there an easy way to do this?
I would go with something similar to
public enum SomeEnum
{
value1, value2
}
public struct Key
{
public SomeEnum enumValue;
public int counter;
}
Dictionary<Key, object> _myDictionary;
I think that would do it?
If you are going to put this in a dictionary, then you will need to make sure you implement a meaningful .Equals and .GetHashCode or the dictionary will not behave correctly.
I'd start off with something like the following for the basic compound key, and then implement a custom IComparer to get the sort order you need.
public class MyKey
{
private readonly SomeEnum enumeration;
private readonly int number;
public MyKey(SomeEnum enumeration, int number)
{
this.enumeration = enumeration;
this.number = number;
}
public int Number
{
get { return number; }
}
public SomeEnum Enumeration
{
get { return enumeration; }
}
public override int GetHashCode()
{
int hash = 23 * 37 + this.enumeration.GetHashCode();
hash = hash * 37 + this.number.GetHashCode();
return hash;
}
public override bool Equals(object obj)
{
var supplied = obj as MyKey;
if (supplied == null)
{
return false;
}
if (supplied.enumeration != this.enumeration)
{
return false;
}
if (supplied.number != this.number)
{
return false;
}
return true;
}
}
If you are using .NET 4.0 or later, you could use the Tuple class.
var key = Tuple.Create(SomeEnum.value1, 3);
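As a rough sketch of how that plays out as a dictionary key (TupleKeyDemo is a made-up helper, and the enum is redeclared only to keep the example self-contained): Tuple already implements value-based Equals and GetHashCode, so a freshly created tuple with the same components finds the stored entry.

```csharp
using System;
using System.Collections.Generic;

public enum SomeEnum
{
    value1, value2
}

public static class TupleKeyDemo
{
    // Stores a value under an (enum, int) pair and looks it up again
    // with a different but equal Tuple instance.
    public static object StoreAndLookup(SomeEnum kind, int counter, object value)
    {
        var dictionary = new Dictionary<Tuple<SomeEnum, int>, object>();
        dictionary[Tuple.Create(kind, counter)] = value;

        object found;
        dictionary.TryGetValue(Tuple.Create(kind, counter), out found);
        return found;
    }
}
```

No custom Equals, GetHashCode or comparer is needed here, at the cost of the key members being called Item1 and Item2 rather than something descriptive.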