Generate hash of object consistently - c#

I'm trying to get a hash (md5 or sha) of an object.
I've implemented this:
http://alexmg.com/post/2009/04/16/Compute-any-hash-for-any-object-in-C.aspx
I'm using nHibernate to retrieve my POCOs from a database.
When running GetHash on this, it's different each time it's selected and hydrated from the database. I guess this is expected, as the underlying proxies will change.
Anyway,
Is there a way to get a hash of all the properties on an object, consistently each time?
I've toyed with the idea of using a StringBuilder over this.GetType().GetProperties..... and creating a hash on that, but that seems inefficient?
As a side note, this is for change-tracking these entities from one database (RDBMS) to a NoSQL store
(comparing hash values to see if objects changed between rdbms and nosql)

If you're not overriding GetHashCode you just inherit Object.GetHashCode. Object.GetHashCode basically just returns the memory address of the instance, if it's a reference object. Of course, each time an object is loaded it will likely be loaded into a different part of memory and thus result in a different hash code.
It's debatable whether that's the correct thing to do; but that's what was implemented "back in the day" so it can't change now.
If you want something consistent then you have to override GetHashCode and create a code based on the "value" of the object (i.e. the properties and/or fields). This can be as simple as a distributed merging of the hash codes of all the properties/fields. Or, it could be as complicated as you need it to be. If all you're looking for is something to differentiate two different objects, then using a unique key on the object might work for you.If you're looking for change tracking, using the unique key for the hash probably isn't going to work
I simply use all the hash codes of the fields to create a reasonably distributed hash code for the parent object. For example:
public override int GetHashCode()
{
unchecked
{
int result = (Name != null ? Name.GetHashCode() : 0);
result = (result*397) ^ (Street != null ? Street.GetHashCode() : 0);
result = (result*397) ^ Age;
return result;
}
}
The use of the prime number 397 is to generate a unique number for a value to better distribute the hash code. See http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/ for more details on the use of primes in hash code calculations.
You could, of course, use reflection to get at all the properties to do this, but that would be slower. Alternatively you could use the CodeDOM to generate code dynamically to generate the hash based on reflecting on the properties and cache that code (i.e. generate it once and reload it next time). But, this of course, is very complex and might not be worth the effort.
An MD5 or SHA hash or CRC is generally based on a block of data. If you want that, then using the hash code of each property doesn't make sense. Possibly serializing the data to memory and calculating the hash that way would be more applicable, as Henk describes.

If this 'hash' is solely used to determine whether entities have changed then the following algorithm may help (NB it is untested and assumes that the same runtime will be used when generating hashes (otherwise the reliance on GetHashCode for 'simple' types is incorrect)):
public static byte[] Hash<T>(T entity)
{
var seen = new HashSet<object>();
var properties = GetAllSimpleProperties(entity, seen);
return properties.Select(p => BitConverter.GetBytes(p.GetHashCode()).AsEnumerable()).Aggregate((ag, next) => ag.Concat(next)).ToArray();
}
private static IEnumerable<object> GetAllSimpleProperties<T>(T entity, HashSet<object> seen)
{
foreach (var property in PropertiesOf<T>.All(entity))
{
if (property is int || property is long || property is string ...) yield return property;
else if (seen.Add(property)) // Handle cyclic references
{
foreach (var simple in GetAllSimpleProperties(property, seen)) yield return simple;
}
}
}
private static class PropertiesOf<T>
{
private static readonly List<Func<T, dynamic>> Properties = new List<Func<T, dynamic>>();
static PropertiesOf()
{
foreach (var property in typeof(T).GetProperties())
{
var getMethod = property.GetGetMethod();
var function = (Func<T, dynamic>)Delegate.CreateDelegate(typeof(Func<T, dynamic>), getMethod);
Properties.Add(function);
}
}
public static IEnumerable<dynamic> All(T entity)
{
return Properties.Select(p => p(entity)).Where(v => v != null);
}
}
This would then be useable like so:
var entity1 = LoadEntityFromRdbms();
var entity2 = LoadEntityFromNoSql();
var hash1 = Hash(entity1);
var hash2 = Hash(entity2);
Assert.IsTrue(hash1.SequenceEqual(hash2));

GetHashCode() returns an Int32 (not an MD5).
If you create two objects with all the same property values they will not have the same Hash if you use the base or system GetHashCode().
String is an object and an exception.
string s1 = "john";
string s2 = "john";
if (s1 == s2) returns true and will return the same GetHashCode()
If you want to control equality comparison of two objects then you should override the GetHash and Equality.
If two object are the same then they must also have the same GetHash(). But two objects with the same GetHash() are not necessarily the same. A comparison will first test the GetHash() and if it gets a match there it will test the Equals. OK there are some comparisons that go straight to Equals but you should still override both and make sure two identical objects produce the same GetHash.
I use this for syncing a client with the server. You could use all the Properties or you could have any Property change change the VerID. The advantage here is a simpler quicker GetHashCode(). In my case I was resetting the VerID with any Property change already.
public override bool Equals(Object obj)
{
//Check for null and compare run-time types.
if (obj == null || !(obj is FTSdocWord)) return false;
FTSdocWord item = (FTSdocWord)obj;
return (OjbID == item.ObjID && VerID == item.VerID);
}
public override int GetHashCode()
{
return ObjID ^ VerID;
}
I ended up using ObjID alone so I could do the following
if (myClientObj == myServerObj && myClientObj.VerID <> myServerObj.VerID)
{
// need to synch
}
Object.GetHashCode Method
Two objects with the same property values. Are they equal? Do they produce the same GetHashCode()?
personDefault pd1 = new personDefault("John");
personDefault pd2 = new personDefault("John");
System.Diagnostics.Debug.WriteLine(po1.GetHashCode().ToString());
System.Diagnostics.Debug.WriteLine(po2.GetHashCode().ToString());
// different GetHashCode
if (pd1.Equals(pd2)) // returns false
{
System.Diagnostics.Debug.WriteLine("pd1 == pd2");
}
List<personDefault> personsDefault = new List<personDefault>();
personsDefault.Add(pd1);
if (personsDefault.Contains(pd2)) // returns false
{
System.Diagnostics.Debug.WriteLine("Contains(pd2)");
}
personOverRide po1 = new personOverRide("John");
personOverRide po2 = new personOverRide("John");
System.Diagnostics.Debug.WriteLine(po1.GetHashCode().ToString());
System.Diagnostics.Debug.WriteLine(po2.GetHashCode().ToString());
// same hash
if (po1.Equals(po2)) // returns true
{
System.Diagnostics.Debug.WriteLine("po1 == po2");
}
List<personOverRide> personsOverRide = new List<personOverRide>();
personsOverRide.Add(po1);
if (personsOverRide.Contains(po2)) // returns true
{
System.Diagnostics.Debug.WriteLine("Contains(p02)");
}
}
public class personDefault
{
public string Name { get; private set; }
public personDefault(string name) { Name = name; }
}
public class personOverRide: Object
{
public string Name { get; private set; }
public personOverRide(string name) { Name = name; }
public override bool Equals(Object obj)
{
//Check for null and compare run-time types.
if (obj == null || !(obj is personOverRide)) return false;
personOverRide item = (personOverRide)obj;
return (Name == item.Name);
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}

Related

C# Dictionary with a class as key

My question is basically the opposite of Dictionary.ContainsKey return False, but a want True and of "the given key was not present in the dictionary" error when using a self-defined class as key:
I want to use a medium-sized class as the dictionary's key, and the dictionary must compare the keys by reference, not by value equality. The problem is, that the class already implements Equals() (which is performing value equality - which is what not what I want here).
Here's a small test class for reproduction:
class CTest
{
public int m_iValue;
public CTest (int i_iValue)
{
m_iValue = i_iValue;
}
public override bool Equals (object i_value)
{
if (ReferenceEquals (null, i_value))
return false;
if (ReferenceEquals (this, i_value))
return true;
if (i_value.GetType () != GetType ())
return false;
return m_iValue == ((CTest)i_value).m_iValue;
}
}
I have NOT yet implemented GetHashCode() (actually I have, but it only returns base.GetHashCode() so far).
Now I created a test program with a dictionary that uses instances of this class as keys. I can add multiple identical instances to the dictionary without problems, but this only works because GetHashCode() returns different values:
private static void Main ()
{
var oTest1 = new CTest (1);
var oTest2 = new CTest (1);
bool bEquals = Equals (oTest1, oTest2); // true
var dict = new Dictionary<CTest, int> ();
dict.Add (oTest1, 1);
dict.Add (oTest2, 2); // works
var iValue1 = dict[oTest1]; // correctly returns 1
var iValue2 = dict[oTest2]; // correctly returns 2
int iH1 = oTest1.GetHashCode (); // values different on each execution
int iH2 = oTest2.GetHashCode (); // values different on each execution, but never equals iH1
}
And the hash values are different every time, maybe because the calculatation in object.GetHashCode() uses some randomization or some numbers that come from the reference handle (which is different for each object).
However, this answer on Why is it important to override GetHashCode when Equals method is overridden? says that GetHashCode() must return the same values for equal objects, so I added
public override int GetHashCode ()
{
return m_iValue;
}
After that, I could not add multiple equal objects to the dictionary any more.
Now, there are two conclusions:
If I removed my own GetHashCode() again, the hash values will be different again and the dictionary can be used. But there may be situations that accidentally give the same hash code for two equal objects, which will cause an exception at runtime, whose cause will for sure never be found. Because of that (little, but not zero) risk, I cannot use a dictionary.
If I correctly implement GetHashCode() like I am supposed to do, I cannot use a dictionary anyway.
What possibilities exist to still use a dictionary?
Like many times before, I had the idea for a solution when writing this question.
You can specify an IEqualityComparer<TKey> in the constructor of the dictionary. There is one in the .net framework, but it's internal sealed, so you need to implement your own:
Is there any kind of "ReferenceComparer" in .NET?
internal class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
static ReferenceComparer ()
{
Instance = new ReferenceComparer<T> ();
}
public static ReferenceComparer<T> Instance { get; }
public bool Equals (T x, T y)
{
return ReferenceEquals (x, y);
}
public int GetHashCode (T obj)
{
return System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode (obj);
}
}

GetHashCode() method returning value instead of address

class FirstClass { }
class SecondClass { }
class Program
{
private static void Main(string[] args)
{
var firstClass1 = new FirstClass();
var firstClass2 = firstClass1;
var secondClass1 = new SecondClass();;
var secondClass2 = secondClass1;
object null1 = null;
object null2 = null;
int a = 10, b = 10, c = 20;
Console.WriteLine("firstClass1 == firstClass2:\t" + SameReference(firstClass1, firstClass2));
Console.WriteLine("secondClass1 == secondClass2:\t" + SameReference(secondClass1, secondClass2));
Console.WriteLine("firstClass1 == secondClass1:\t" + SameReference(firstClass1, secondClass1));
//Console.WriteLine("null1 == null2:\t" + SameReference(null1, null2));
Console.WriteLine("null1==firstClass1:\t" + SameReference(null1, firstClass1));
Console.WriteLine("a == b:\t" + SameReference(a, b));
Console.ReadKey();
}
public static bool SameReference(object object1,object object2)
{
if ((object1 == null && object2 != null) || (object1 != null && object2 == null))
return false;
if ((object1 == null && object2 == null) || (object1.GetHashCode() == object2.GetHashCode()))
{
Console.WriteLine(object1.GetHashCode() + "\t" + object2.GetHashCode());
return true;
}
return false;
}
}
In the above code, GetHashCode() method is returning 10 and 10 for a and b, but I want to compare addresses. That is how GetHashCode() method should work.please explain.
Please see the docs (emphasis mine):
The GetHashCode method can be overridden by a derived type. If GetHashCode is not overridden, hash codes for reference types are computed by calling the Object.GetHashCode method of the base class, which computes a hash code based on an object's reference; for more information, see RuntimeHelpers.GetHashCode. In other words, two objects for which the ReferenceEquals method returns true have identical hash codes. If value types do not override GetHashCode, the ValueType.GetHashCode method of the base class uses reflection to compute the hash code based on the values of the type's fields. In other words, value types whose fields have equal values have equal hash codes. For more information about overriding GetHashCode, see the "Notes to Inheritors" section.
According to this answer, GetHashCode() for int is intended to return the value of the int. This can also be seen if you look at the C# source code for Int32:
// The absolute value of the int contained.
public override int GetHashCode() {
return m_value;
}
GetHashCode() is working as expected. As per MSDN:
A hash code is a numeric value that is used to insert and identify an object in a hash-based collection such as the Dictionary class, the Hashtable class, or a type derived from the DictionaryBase class. The GetHashCode method provides this hash code for algorithms that need quick checks of object equality.
As far as GetHashCode() is concerned, the hash code for int a=10,b=10 would always be the same, as - because an int is a primitive type - a and b are both equal to each other.
if you want to check reference equality (that is helpful when you have object from same class not object from two different classes) then you should use
.ReferenceEquals() tests whether or not two objects are the same instance and cannot be overridden.
Because for GethashCode :-
Two objects that are equal return hash codes that are equal.
However, the reverse is not true:
For this you should override GetHashcode in your class as below and calculate hashcode based on property and then try to find equal or not by overriding Equal method.
public class FirstClass
{
public int Id {get;set;}
public override bool Equals(Object obj)
{
if (!(obj is FirstClass)) return false;
FirstClass f = (FirstClass) obj;
return thid.Id == f.id;
}
public override int GetHashCode()
{
return this.Id;
}
}
Just from you code you have objects of two different classes as they are belong to two different class they are not going to be equal. Please refer C# documentation for GetHashCode and Equal.
so this line of code is invalid :
Console.WriteLine("firstClass1==secondClass1:\t"+SameReference(firstClass1,secondClass1));

the easest way of findind the difference of two objects of the same class [duplicate]

The project I'm working on needs some simple audit logging for when a user changes their email, billing address, etc. The objects we're working with are coming from different sources, one a WCF service, the other a web service.
I've implemented the following method using reflection to find changes to the properties on two different objects. This generates a list of the properties that have differences along with their old and new values.
public static IList GenerateAuditLogMessages(T originalObject, T changedObject)
{
IList list = new List();
string className = string.Concat("[", originalObject.GetType().Name, "] ");
foreach (PropertyInfo property in originalObject.GetType().GetProperties())
{
Type comparable =
property.PropertyType.GetInterface("System.IComparable");
if (comparable != null)
{
string originalPropertyValue =
property.GetValue(originalObject, null) as string;
string newPropertyValue =
property.GetValue(changedObject, null) as string;
if (originalPropertyValue != newPropertyValue)
{
list.Add(string.Concat(className, property.Name,
" changed from '", originalPropertyValue,
"' to '", newPropertyValue, "'"));
}
}
}
return list;
}
I'm looking for System.IComparable because "All numeric types (such as Int32 and Double) implement IComparable, as do String, Char, and DateTime." This seemed the best way to find any property that's not a custom class.
Tapping into the PropertyChanged event that's generated by the WCF or web service proxy code sounded good but doesn't give me enough info for my audit logs (old and new values).
Looking for input as to if there is a better way to do this, thanks!
#Aaronaught, here is some example code that is generating a positive match based on doing object.Equals:
Address address1 = new Address();
address1.StateProvince = new StateProvince();
Address address2 = new Address();
address2.StateProvince = new StateProvince();
IList list = Utility.GenerateAuditLogMessages(address1, address2);
"[Address] StateProvince changed from
'MyAccountService.StateProvince' to
'MyAccountService.StateProvince'"
It's two different instances of the StateProvince class, but the values of the properties are the same (all null in this case). We're not overriding the equals method.
IComparable is for ordering comparisons. Either use IEquatable instead, or just use the static System.Object.Equals method. The latter has the benefit of also working if the object is not a primitive type but still defines its own equality comparison by overriding Equals.
object originalValue = property.GetValue(originalObject, null);
object newValue = property.GetValue(changedObject, null);
if (!object.Equals(originalValue, newValue))
{
string originalText = (originalValue != null) ?
originalValue.ToString() : "[NULL]";
string newText = (newText != null) ?
newValue.ToString() : "[NULL]";
// etc.
}
This obviously isn't perfect, but if you're only doing it with classes that you control, then you can make sure it always works for your particular needs.
There are other methods to compare objects (such as checksums, serialization, etc.) but this is probably the most reliable if the classes don't consistently implement IPropertyChanged and you want to actually know the differences.
Update for new example code:
Address address1 = new Address();
address1.StateProvince = new StateProvince();
Address address2 = new Address();
address2.StateProvince = new StateProvince();
IList list = Utility.GenerateAuditLogMessages(address1, address2);
The reason that using object.Equals in your audit method results in a "hit" is because the instances are actually not equal!
Sure, the StateProvince may be empty in both cases, but address1 and address2 still have non-null values for the StateProvince property and each instance is different. Therefore, address1 and address2 have different properties.
Let's flip this around, take this code as an example:
Address address1 = new Address("35 Elm St");
address1.StateProvince = new StateProvince("TX");
Address address2 = new Address("35 Elm St");
address2.StateProvince = new StateProvince("AZ");
Should these be considered equal? Well, they will be, using your method, because StateProvince does not implement IComparable. That's the only reason why your method reported that the two objects were the same in the original case. Since the StateProvince class does not implement IComparable, the tracker just skips that property entirely. But these two addresses are clearly not equal!
This is why I originally suggested using object.Equals, because then you can override it in the StateProvince method to get better results:
public class StateProvince
{
public string Code { get; set; }
public override bool Equals(object obj)
{
if (obj == null)
return false;
StateProvince sp = obj as StateProvince;
if (object.ReferenceEquals(sp, null))
return false;
return (sp.Code == Code);
}
public bool Equals(StateProvince sp)
{
if (object.ReferenceEquals(sp, null))
return false;
return (sp.Code == Code);
}
public override int GetHashCode()
{
return Code.GetHashCode();
}
public override string ToString()
{
return string.Format("Code: [{0}]", Code);
}
}
Once you've done this, the object.Equals code will work perfectly. Instead of naïvely checking whether or not address1 and address2 literally have the same StateProvince reference, it will actually check for semantic equality.
The other way around this is to extend the tracking code to actually descend into sub-objects. In other words, for each property, check the Type.IsClass and optionally the Type.IsInterface property, and if true, then recursively invoke the change-tracking method on the property itself, prefixing any audit results returned recursively with the property name. So you'd end up with a change for StateProvinceCode.
I use the above approach sometimes too, but it's easier to just override Equals on the objects for which you want to compare semantic equality (i.e. audit) and provide an appropriate ToString override that makes it clear what changed. It doesn't scale for deep nesting but I think it's unusual to want to audit that way.
The last trick is to define your own interface, say IAuditable<T>, which takes a second instance of the same type as a parameter and actually returns a list (or enumerable) of all of the differences. It's similar to our overridden object.Equals method above but gives back more information. This is useful for when the object graph is really complicated and you know you can't rely on Reflection or Equals. You can combine this with the above approach; really all you have to do is substitute IComparable for your IAuditable and invoke the Audit method if it implements that interface.
This project on github checks nearly any type of property and can be customized as you need.
You might want to look at Microsoft's Testapi It has an object comparison api that does deep comparisons. It might be overkill for you but it could be worth a look.
var comparer = new ObjectComparer(new PublicPropertyObjectGraphFactory());
IEnumerable<ObjectComparisonMismatch> mismatches;
bool result = comparer.Compare(left, right, out mismatches);
foreach (var mismatch in mismatches)
{
Console.Out.WriteLine("\t'{0}' = '{1}' and '{2}'='{3}' do not match. '{4}'",
mismatch.LeftObjectNode.Name, mismatch.LeftObjectNode.ObjectValue,
mismatch.RightObjectNode.Name, mismatch.RightObjectNode.ObjectValue,
mismatch.MismatchType);
}
Here a short LINQ version that extends object and returns a list of properties that are not equal:
usage: object.DetailedCompare(objectToCompare);
public static class ObjectExtensions
{
public static List<Variance> DetailedCompare<T>(this T val1, T val2)
{
var propertyInfo = val1.GetType().GetProperties();
return propertyInfo.Select(f => new Variance
{
Property = f.Name,
ValueA = f.GetValue(val1),
ValueB = f.GetValue(val2)
})
.Where(v => !v.ValueA.Equals(v.ValueB))
.ToList();
}
public class Variance
{
public string Property { get; set; }
public object ValueA { get; set; }
public object ValueB { get; set; }
}
}
You never want to implement GetHashCode on mutable properties (properties that could be changed by someone) - i.e. non-private setters.
Imagine this scenario:
you put an instance of your object in a collection which uses GetHashCode() "under the covers" or directly (Hashtable).
Then someone changes the value of the field/property that you've used in your GetHashCode() implementation.
Guess what... your object is permanently lost in the collection since the collection uses GetHashCode() to find it! You've effectively changed the hashcode value from what was originally placed in the collection. Probably not what you wanted.
Liviu Trifoi solution: Using CompareNETObjects library.
GitHub - NuGet package - Tutorial.
I think this method is quite neat, it avoids repetition or adding anything to classes. What more are you looking for?
The only alternative would be to generate a state dictionary for the old and new objects, and write a comparison for them. The code for generating the state dictionary could reuse any serialisation you have for storing this data in the database.
The my way of Expression tree compile version. It should faster than PropertyInfo.GetValue.
static class ObjDiffCollector<T>
{
private delegate DiffEntry DiffDelegate(T x, T y);
private static readonly IReadOnlyDictionary<string, DiffDelegate> DicDiffDels;
private static PropertyInfo PropertyOf<TClass, TProperty>(Expression<Func<TClass, TProperty>> selector)
=> (PropertyInfo)((MemberExpression)selector.Body).Member;
static ObjDiffCollector()
{
var expParamX = Expression.Parameter(typeof(T), "x");
var expParamY = Expression.Parameter(typeof(T), "y");
var propDrName = PropertyOf((DiffEntry x) => x.Prop);
var propDrValX = PropertyOf((DiffEntry x) => x.ValX);
var propDrValY = PropertyOf((DiffEntry x) => x.ValY);
var dic = new Dictionary<string, DiffDelegate>();
var props = typeof(T).GetProperties();
foreach (var info in props)
{
var expValX = Expression.MakeMemberAccess(expParamX, info);
var expValY = Expression.MakeMemberAccess(expParamY, info);
var expEq = Expression.Equal(expValX, expValY);
var expNewEntry = Expression.New(typeof(DiffEntry));
var expMemberInitEntry = Expression.MemberInit(expNewEntry,
Expression.Bind(propDrName, Expression.Constant(info.Name)),
Expression.Bind(propDrValX, Expression.Convert(expValX, typeof(object))),
Expression.Bind(propDrValY, Expression.Convert(expValY, typeof(object)))
);
var expReturn = Expression.Condition(expEq
, Expression.Convert(Expression.Constant(null), typeof(DiffEntry))
, expMemberInitEntry);
var expLambda = Expression.Lambda<DiffDelegate>(expReturn, expParamX, expParamY);
var compiled = expLambda.Compile();
dic[info.Name] = compiled;
}
DicDiffDels = dic;
}
public static DiffEntry[] Diff(T x, T y)
{
var list = new List<DiffEntry>(DicDiffDels.Count);
foreach (var pair in DicDiffDels)
{
var r = pair.Value(x, y);
if (r != null) list.Add(r);
}
return list.ToArray();
}
}
class DiffEntry
{
public string Prop { get; set; }
public object ValX { get; set; }
public object ValY { get; set; }
}

Removing duplicates from a List or HashSet in c#

I have a very simple test method that returns a List that has a number of duplicates, but when it did not I thought I'd try HashSet as that should remove duplicates, but it appears I need to override the Equals and GetHashCode but I am really struggling to understand what I need to do. I would appreciate some pointers please.
HashSet<object> test = XmlManager.PeriodHashSet(Server.MapPath("../Xml/XmlFile.xml"));
foreach (Object period in test2)
{
PeriodData pd = period as PeriodData;
Response.Write(pd.PeriodName + "<br>");
}
I also tried it with the following
List<object> test = XmlManager.PeriodList(Server.MapPath("../Xml/XmlFile.xml"));
List<object> test2 = test.Distinct().ToList();
foreach (Object period in test2)
{
PeriodData pd = period as PeriodData;
Response.Write(pd.PeriodName + "<br>");
}
The PeriodData objuect is delcarewd as follows:
public class PeriodData
{
private int m_StartYear = -9999999;
private int m_EndYear = -9999999;
private string m_PeriodName = String.Empty;
public int StartYear
{
get { return m_StartYear; }
set { m_StartYear = value; }
}
public int EndYear
{
get { return m_EndYear; }
set { m_EndYear = value; }
}
public string PeriodName
{
get { return m_PeriodName; }
set { m_PeriodName = value; }
}
}
It is the returned PeriodName I want to remove the duplicate for.
For the HashSet<T> to work, you need to, at a minimum, override Object.Equals and Object.GetHashCode. This is what allows the hashing algorithm to know what makes two objects "distinct" or the same by values.
In terms of simplifying and improving the code, there are two major changes I'd recommend to make this work:
First, you should use HashSet<PeriodData> (or List<PeriodData>), not HashSet<object>.
Second, your PeriodData class should implement IEquatable<PeriodData> in order to provide proper hashing and equality.
You have to decide what makes two periods equal. If all three properties have to be the same for two periods to be equal, then you can implement Equals thus:
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
PeriodData other = (PeriodData)obj;
return m_StartYear == other.m_StartYear && m_EndYear == other.m_EndYear && string.Equals(m_PeriodName, other.m_PeriodName);
}
For GetHashCode, you could do something like this:
public override int GetHashCode()
{
return (((m_StartYear * 397) ^ m_EndYear) * 397) ^ m_PeriodName.GetHashCode();
}
(Credit where it is due: these are adapted from the code generated by ReSharper's code generation tool.)
As others have noted, it would be better to implement IEquatable<T> as well.
If you cannot modify the class, or you do not want to modify it, you can put the equality comparison logic in another class that implements IEqualityComparer<PeriodData, which you can pass to the appropriate constructor of HashSet<PeriodData> and Enumerable.Distinct()
You have to implement IEquatable<T> to make Distinct() work.
How would the framework know how to say "those two objects are identical" if you don't? You have to provide the framework a way to compare your objects, that's the purpose of the IEquatable<T> implementation.

Distinct() returns duplicates with a user-defined type

I'm trying to write a Linq query which returns an array of objects, with unique values in their constructors. For integer types, Distinct returns only one copy of each value, but when I try creating my list of objects, things fall apart. I suspect it's a problem with the equality operator for my class, but when I set a breakpoint, it's never hit.
Filtering out the duplicate int in a sub-expression solves the problem, and also saves me from constructing objects that will be immediately discarded, but I'm curious why this version doesn't work.
UPDATE: 11:04 PM Several folks have pointed out that MyType doesn't override GetHashCode(). I'm afraid I oversimplified the example. The original MyType does indeed implement it. I've added it below, modified only to put the hash code in a temp variable before returning it.
Running through the debugger, I see that all five invocations of GetHashCode return a different value. And since MyType only inherits from Object, this is presumably the same behavior Object would exhibit.
Would I be correct then to conclude that the hash should instead be based on the contents of Value? This was my first attempt at overriding operators, and at the time, it didn't appear that GetHashCode needed to be particularly fancy. (This is the first time one of my equality checks didn't seem to work properly.)
class Program
{
static void Main(string[] args)
{
int[] list = { 1, 3, 4, 4, 5 };
int[] list2 =
(from value in list
select value).Distinct().ToArray(); // One copy of each value.
MyType[] distinct =
(from value in list
select new MyType(value)).Distinct().ToArray(); // Two objects created with 4.
Array.ForEach(distinct, value => Console.WriteLine(value));
}
}
class MyType
{
public int Value { get; private set; }
public MyType(int arg)
{
Value = arg;
}
public override int GetHashCode()
{
int retval = base.GetHashCode();
return retval;
}
public override bool Equals(object obj)
{
if (obj == null)
return false;
MyType rhs = obj as MyType;
if ((Object)rhs == null)
return false;
return this == rhs;
}
public static bool operator ==(MyType lhs, MyType rhs)
{
bool result;
if ((Object)lhs != null && (Object)rhs != null)
result = lhs.Value == rhs.Value;
else
result = (Object)lhs == (Object)rhs;
return result;
}
public static bool operator !=(MyType lhs, MyType rhs)
{
return !(lhs == rhs);
}
}
You need to override GetHashCode() in your class. GetHashCode must be implemented in tandem with Equals overloads. It is common for code to check for hashcode equality before calling Equals. That's why your Equals implementation is not getting called.
Your suspicion is correct,it is the equality which currently just checks the object references. Even your implementation does not do anything extra, change it to this:
public override bool Equals(object obj)
{
if (obj == null)
return false;
MyType rhs = obj as MyType;
if ((Object)rhs == null)
return false;
return this.Value == rhs.Value;
}
In you equality method you are still testing for reference equality, rather than semantic equality, eg on this line:
result = (Object)lhs == (Object)rhs
you are just comparing two object references which, even if they hold exactly the same data, are still not the same object. Instead, your test for equality needs to compare one or more properties of your object. For instance, if your object had an ID property, and objects with the same ID should be considered semantically equivalent, then you could do this:
result = lhs.ID == rhs.ID
Note that overriding Equals() means you should also override GetHashCode(), which is another kettle of fish, and can be quite difficult to do correctly.
You need to implement GetHashCode().
It seems that a simple Distinct operation can be implemented more elegantly as follows:
var distinct = items.GroupBy(x => x.ID).Select(x => x.First());
where ID is the property that determines if two objects are semantically equivalent. From the confusion here (including that of myself), the default implementation of Distinct() seems to be a little convoluted.
I think MyType needs to implement IEquatable for this to work.
The other answers have pretty much covered the fact that you need to implement Equals and GetHashCode correctly, but as a side note you may be interested to know that anonymous types have these values implemented automatically:
var distinct =
(from value in list
select new {Value = value}).Distinct().ToArray();
So without ever having to define this class, you automatically get the Equals and GetHashCode behavior you're looking for. Cool, eh?

Categories

Resources