I have the following classes:
public class MyDocuments
{
public DateTime registeredDate;
public string version;
public List<Document> registeredDocuments;
}
public class Document
{
public string name;
public List<File> registeredFiles;
}
public class File
{
public string name;
public string content;
}
I have an instance of MyDocuments which has several documents in its List<Document> registeredDocuments. I get a new List<Document> from the user.
How can I verify that the new object doesn't exist in the list? I want to compare by value not reference.
I'm thinking of using HashSet instead of List. Is this the proper approach?
How are equality comparisons performed?
Whenever the BCL classes want to perform an equality check between objects of some type T, they do so by calling one or both of the methods in some implementation of IEqualityComparer<T>. To get hold of such an implementation, the framework looks to EqualityComparer<T>.Default.
As mentioned in the documentation, this property produces an IEqualityComparer<T> like this:
The Default property checks whether type T implements the
System.IEquatable<T> interface and, if so, returns an
EqualityComparer<T> that uses that implementation. Otherwise, it
returns an EqualityComparer<T> that uses the overrides of
Object.Equals and Object.GetHashCode provided by T.
What are my options?
So, in general, to dictate how equality comparisons should be performed you can:
1. Explicitly provide an implementation of IEqualityComparer<T> to the class or method that performs equality checks. This option is not very visible with List<T>, but many LINQ methods (such as Contains) do support it.
2. Make your class implement IEquatable<T>. This will make EqualityComparer<T>.Default use this implementation, and is a good choice whenever there is an obvious "natural" way to compare objects of type T.
3. Override object.GetHashCode and object.Equals without implementing IEquatable<T>. However, this is simply an inferior version of #2 and, as far as I know, should always be avoided.
Which option to pick?
A good rule of thumb is: if there is an obvious and natural way to compare objects of class T, consider having it implement IEquatable<T>; this will make sure your comparison logic is used throughout the framework without any additional involvement. If there is no obvious candidate, or if you want to compare in a manner different than the default, implement your own IEqualityComparer<T> and pass the implementation as an argument to the class or method that needs to perform equality checks.
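As a rough sketch of options 1 and 2 side by side: the class below is a simplified stand-in for the question's File class (renamed DocFile here, a hypothetical name chosen to avoid clashing with System.IO.File), and DocFileNameComparer is an equally hypothetical comparer that looks at the name only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's File class.
public sealed class DocFile : IEquatable<DocFile>
{
    public string Name;
    public string Content;

    // Option 2: the "natural" comparison, picked up by EqualityComparer<DocFile>.Default.
    public bool Equals(DocFile other)
    {
        return other != null && Name == other.Name && Content == other.Content;
    }

    // Keep Equals(object) and GetHashCode consistent with Equals(DocFile).
    public override bool Equals(object obj) { return Equals(obj as DocFile); }

    public override int GetHashCode()
    {
        int h = Name == null ? 0 : Name.GetHashCode();
        return unchecked(h * 31 + (Content == null ? 0 : Content.GetHashCode()));
    }
}

// Option 1: an alternative comparison (name only, ignoring case), passed in explicitly.
public sealed class DocFileNameComparer : IEqualityComparer<DocFile>
{
    public bool Equals(DocFile x, DocFile y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(DocFile obj)
    {
        return obj == null || obj.Name == null
            ? 0
            : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
    }
}

public static class ComparerOptionsDemo
{
    public static void Main()
    {
        var files = new List<DocFile> { new DocFile { Name = "a.txt", Content = "hello" } };
        var probe = new DocFile { Name = "A.TXT", Content = "other" };

        // List<T>.Contains uses the default comparer, i.e. DocFile.Equals(DocFile).
        Console.WriteLine(files.Contains(probe));
        // The LINQ overload accepts an explicit comparer.
        Console.WriteLine(files.Contains(probe, new DocFileNameComparer()));
    }
}
```

With these definitions the first call should report no match (name and content both differ under the default comparison) and the second should report a match, since only the case-insensitive name is considered.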
You will need to implement the Equals() method, and probably GetHashCode() as well. See this answer for an example.
You should implement IEquatable<T>.
When you implement this interface on your custom object, any equality checks (e.g. Contains, IndexOf) are automatically done using your object's implementation.
Override the object.Equals method.
Here's an example straight from the documentation:
public class Person
{
private string idNumber;
private string personName;
public Person(string name, string id)
{
this.personName = name;
this.idNumber = id;
}
public override bool Equals(Object obj)
{
Person personObj = obj as Person;
if (personObj == null)
return false;
else
return idNumber.Equals(personObj.idNumber);
}
public override int GetHashCode()
{
return this.idNumber.GetHashCode();
}
}
The Equals method returns a bool indicating whether obj is equal to this.
Something like this at the top level, continued down at the sub-levels:
public class MyDocuments
{
public DateTime registeredDate;
public string version;
public HashSet<Document> registeredDocuments;
public override bool Equals(Object o)
{
if( !(o is MyDocuments) ) return false;
MyDocuments that = (MyDocuments)o;
if( !String.Equals(this.version, that.version) ) return false;
if( this.registeredDocuments.Count != that.registeredDocuments.Count ) return false;
// assuming registeredDate doesn't matter for equality...
foreach( Document d in this.registeredDocuments )
if( !that.registeredDocuments.Contains(d) )
return false;
return true;
}
public override int GetHashCode()
{
int ret = version == null ? 0 : version.GetHashCode(); // null-safe, to match String.Equals in Equals
foreach (Document d in this.registeredDocuments)
ret ^= d.GetHashCode(); // xor isn't great, but better than nothing.
return ret;
}
}
Note: Caching could be useful for the HashCode values if the properties were change-aware.
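To illustrate what "continued down at the sub-levels" might look like, here is a minimal sketch. DocFile is a hypothetical stand-in for the question's File class, the assumption that file order matters is mine, and FindNew is an illustrative helper rather than anything from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's File and Document classes.
public sealed class DocFile : IEquatable<DocFile>
{
    public string Name;
    public string Content;

    public bool Equals(DocFile other)
    {
        return other != null && Name == other.Name && Content == other.Content;
    }

    public override bool Equals(object obj) { return Equals(obj as DocFile); }

    public override int GetHashCode()
    {
        int h = Name == null ? 0 : Name.GetHashCode();
        return unchecked(h * 31 + (Content == null ? 0 : Content.GetHashCode()));
    }
}

public sealed class Document : IEquatable<Document>
{
    public string Name;
    public List<DocFile> RegisteredFiles = new List<DocFile>();

    public bool Equals(Document other)
    {
        // Assumption: the order of files matters. SequenceEqual compares element by
        // element using EqualityComparer<DocFile>.Default, i.e. DocFile.Equals(DocFile).
        return other != null
            && Name == other.Name
            && RegisteredFiles.SequenceEqual(other.RegisteredFiles);
    }

    public override bool Equals(object obj) { return Equals(obj as Document); }

    public override int GetHashCode()
    {
        int h = Name == null ? 0 : Name.GetHashCode();
        foreach (DocFile f in RegisteredFiles)
            h = unchecked(h * 31 + (f == null ? 0 : f.GetHashCode()));
        return h;
    }
}

public static class DocumentLookupDemo
{
    // Returns the incoming documents that are not yet registered, compared by value.
    public static List<Document> FindNew(IEnumerable<Document> registered, IEnumerable<Document> incoming)
    {
        var known = new HashSet<Document>(registered);
        return incoming.Where(d => !known.Contains(d)).ToList();
    }

    public static void Main()
    {
        var registered = new List<Document>
        {
            new Document { Name = "spec", RegisteredFiles = { new DocFile { Name = "a.txt", Content = "x" } } }
        };
        var incoming = new List<Document>
        {
            new Document { Name = "spec", RegisteredFiles = { new DocFile { Name = "a.txt", Content = "x" } } },
            new Document { Name = "notes" }
        };

        foreach (Document d in FindNew(registered, incoming))
            Console.WriteLine(d.Name); // only "notes" is new
    }
}
```

One caveat with this design: the hash code depends on mutable fields, so a document should not be modified while it sits inside a HashSet, or lookups may miss it.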
Related
I am aware of the fact that I always have to override Equals(object) and GetHashCode() when implementing IEquatable<T>.Equals(T).
However, I don't understand, why in some situations the Equals(object) wins over the generic Equals(T).
For example, why is the following happening? If I declare IEquatable<T> for an interface and implement a concrete type X for it, the general Equals(object) is called by a HashSet<X> when comparing items of that type against each other. In all other situations, where at least one of the sides is cast to the interface, the correct Equals(T) is called.
Here's a code sample to demonstrate:
public interface IPerson : IEquatable<IPerson> { }
//Simple example implementation of Equals (returns always true)
class Person : IPerson
{
public bool Equals(IPerson other)
{
return true;
}
public override bool Equals(object obj)
{
return true;
}
public override int GetHashCode()
{
return 0;
}
}
private static void doEqualityCompares()
{
var t1 = new Person();
var hst = new HashSet<Person>();
var hsi = new HashSet<IPerson>();
hst.Add(t1);
hsi.Add(t1);
//Direct comparison
t1.Equals(t1); //IEquatable<T>.Equals(T)
hst.Contains(t1); //Equals(object) --> why? both sides inherit of IPerson...
hst.Contains((IPerson)t1); //IEquatable<T>.Equals(T)
hsi.Contains(t1); //IEquatable<T>.Equals(T)
hsi.Contains((IPerson)t1); //IEquatable<T>.Equals(T)
}
HashSet<T> calls EqualityComparer<T>.Default to get the default equality comparer when no comparer is provided.
EqualityComparer<T>.Default determines whether T implements IEquatable<T>. If it does, it uses that; if not, it uses object.Equals and object.GetHashCode.
Your Person object implements IEquatable<IPerson>, not IEquatable<Person>.
When you have a HashSet<Person> it ends up checking if Person is an IEquatable<Person>, which it's not, so it uses the object methods.
When you have a HashSet<IPerson> it checks if IPerson is an IEquatable<IPerson>, which it is, so it uses those methods.
As for the remaining case, why does the line:
hst.Contains((IPerson)t1);
call the IEquatable<IPerson> Equals method even though it's called on the HashSet<Person>? Here you're calling Contains on a HashSet<Person> and passing in an IPerson. HashSet<Person>.Contains requires the parameter to be a Person; an IPerson is not a valid argument. However, a HashSet<Person> is also an IEnumerable<Person>, and since IEnumerable<T> is covariant, that means it can be treated as an IEnumerable<IPerson>, which has a Contains extension method (through LINQ) which accepts an IPerson as a parameter.
Enumerable.Contains also uses EqualityComparer<T>.Default to get its equality comparer when none is provided. In the case of this method call we're actually calling Contains on an IEnumerable<IPerson>, which means EqualityComparer<IPerson>.Default is checking to see if IPerson is an IEquatable<IPerson>, which it is, so that Equals method is called.
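One possible way to get HashSet<Person> to use a typed Equals as well is to have Person additionally implement IEquatable<Person>. The sketch below assumes that change; the static LastCall field is purely a hypothetical probe for observing which overload ran and is not part of the question's code.

```csharp
using System;
using System.Collections.Generic;

public interface IPerson : IEquatable<IPerson> { }

public class Person : IPerson, IEquatable<Person>
{
    // Records which overload ran last (demonstration only).
    public static string LastCall = "";

    public bool Equals(IPerson other) { LastCall = "Equals(IPerson)"; return true; }
    public bool Equals(Person other) { LastCall = "Equals(Person)"; return true; }
    public override bool Equals(object obj) { LastCall = "Equals(object)"; return true; }
    public override int GetHashCode() { return 0; }
}

public static class WhichEqualsDemo
{
    // Looks the item up in a HashSet<Person> and reports the overload that was used.
    public static string ProbeSetOfPerson()
    {
        var p = new Person();
        var set = new HashSet<Person> { p };
        Person.LastCall = "";
        set.Contains(p);
        return Person.LastCall;
    }

    // Same lookup, but through a HashSet<IPerson>.
    public static string ProbeSetOfIPerson()
    {
        var p = new Person();
        var set = new HashSet<IPerson> { p };
        Person.LastCall = "";
        set.Contains(p);
        return Person.LastCall;
    }

    public static void Main()
    {
        Console.WriteLine(ProbeSetOfPerson());
        Console.WriteLine(ProbeSetOfIPerson());
    }
}
```

With both interfaces implemented, EqualityComparer<Person>.Default should pick Equals(Person) and EqualityComparer<IPerson>.Default should pick Equals(IPerson), so Equals(object) is no longer the fallback for either set.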
Although IComparable<in T> is contravariant with respect to T, such that any type which implements IComparable<IPerson> would automatically be considered an implementation of IComparable<Person>, the type IEquatable<T> is intended for use with sealed types, especially structures. The requirement that Object.GetHashCode() be consistent with both IEquatable<T>.Equals(T) and Object.Equals(Object) generally implies that the latter two methods should behave identically, which in turn implies that one of them should chain to the other. While there is a large performance difference between passing a struct directly to an IEquatable<T> implementation of the proper type, compared with constructing an instance of the structure's boxed-heap-object type and having an Equals(Object) implementation copy the structure data out of that, no such performance difference exists with reference types. If IEquatable<T>.Equals(T) and Equals(Object) are going to be equivalent and T is an inheritable reference type, there's no meaningful difference between:
public override bool Equals(Object obj)
{
MyType other = obj as MyType;
if (other==null || other.GetType() != this.GetType())
return false;
// ... test whether other matches this
}
public bool Equals(MyType other)
{
if (other==null || other.GetType() != this.GetType())
return false;
// ... test whether other matches this
}
The latter could save one typecast, but that's unlikely to make a sufficient performance difference to justify having two methods.
While comparing instances of a custom class, I noticed that a call to Contains doesn't work the way I expect it to. Assuming that the default comparison goes by the reference (pointer or whatever it's called), I implemented both CompareTo and Equals. I made sure to be implementing IComparable, of course.
It still doesn't work, and I get no hits when I put breakpoints on those methods.
What am I missing? And if I'm not missing anything, is using extension methods the best option?
public override bool Equals(Object input)
{
return Id == ((MyType) input).Id;
}
public int CompareTo(Object input)
{
return Id - ((MyType)input).Id;
}
A better implementation could be:
public bool Equals(MyType other)
{
// if 'other' is a null reference, or if 'other' is more derived or less derived
if ((object)other == (object)null || other.GetType() != GetType())
return false;
// OK, check members (assuming 'Id' has a type that makes '==' a wise choice)
return Id == other.Id;
}
public override bool Equals(object obj)
{
// call to other overload
return Equals(obj as MyType);
}
public override int GetHashCode()
{
return Id.GetHashCode();
}
You can mark the class as implementing IEquatable<MyType> in that case (but it will work even without that).
Regarding GetHashCode: always remember to override it. You should have seen a compiler warning that it is problematic to override Equals(object) without overriding GetHashCode. Never leave return base.GetHashCode() in the override (assuming the base class is System.Object). Either implement something based on the members that participate in Equals or, if you do not think GetHashCode will actually be used in your case, say:
public override int GetHashCode()
{
throw new NotSupportedException("We don't have GetHashCode, sorry");
}
If you absolutely know that you will only be using List<>.Contains, and not e.g. Dictionary<,>, HashSet<> and not Linq's Distinct(), etc. etc., it could work with GetHashCode() simply throwing.
IComparable<MyType> is not needed unless you sort List<MyType> or MyType[], or you use Linq's OrderBy with MyType, or you use SortedDictionary<,>, SortedSet<>.
Overloading operator == is not needed for these uses.
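For reference, a small sketch of how the pieces above behave with List<T>.Contains. It assumes Id is an int, and DerivedType is a hypothetical subclass added only to show the effect of the GetType() check.

```csharp
using System;
using System.Collections.Generic;

public class MyType : IEquatable<MyType>
{
    public int Id; // assumption: Id is an int

    public bool Equals(MyType other)
    {
        // Reject null and anything that is not exactly the same runtime type.
        if ((object)other == null || other.GetType() != GetType())
            return false;
        return Id == other.Id;
    }

    public override bool Equals(object obj) { return Equals(obj as MyType); }

    public override int GetHashCode() { return Id.GetHashCode(); }
}

// Hypothetical subclass, used only to show the effect of the GetType() check.
public class DerivedType : MyType { }

public static class ContainsDemo
{
    public static void Main()
    {
        var list = new List<MyType> { new MyType { Id = 1 } };

        Console.WriteLine(list.Contains(new MyType { Id = 1 }));      // same Id, same type
        Console.WriteLine(list.Contains(new MyType { Id = 2 }));      // different Id
        Console.WriteLine(list.Contains(new DerivedType { Id = 1 })); // same Id, more derived type
    }
}
```

The first lookup should succeed and the other two should fail: one because the Id differs, the other because the runtime types differ.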
I would like to use Distinct() with my data, declared as IEnumerable<KeyValuePair<IdentType, string>>. In this case, I have to implement my own IEqualityComparer, and here is my question:
Is there any difference between below implementations?
public int GetHashCode(KeyValuePair<IdentType, string> obj) {
return EqualityComparer<string>.Default.GetHashCode(obj.Value);
}
and
public int GetHashCode(KeyValuePair<IdentType, string> obj) {
return obj.Value.GetHashCode();
}
There is only a small difference between your two methods.
EqualityComparer<string>.Default returns an instance of GenericEqualityComparer<T> if the type implements IEquatable<T> (which string does). So GetHashCode(obj.Value) ends up calling
public override int GetHashCode(T obj) {
if (obj == null) return 0;
return obj.GetHashCode();
}
which is the same as you calling obj.Value.GetHashCode(); directly, except for the fact that if you have a null string the default comparer will return 0 and the direct call version will throw a null reference exception.
Just one: the equality comparer's GetHashCode will return 0 if the string is null, whereas the second implementation will throw an exception.
One difference is that EqualityComparer<string>.Default.GetHashCode would not crash when you pass null to it.
using System;
using System.Collections.Generic;
public class Test
{
public static void Main()
{
var n = EqualityComparer<string>.Default.GetHashCode(null);
Console.WriteLine(n);
}
}
Other than that, the results would be identical by design, because System.String implements IEquatable<System.String>
The Default property checks whether type T implements the System.IEquatable<T> generic interface and, if so, returns an EqualityComparer<T> that invokes the implementation of the IEquatable<T>.Equals method. Otherwise, it returns an EqualityComparer<T> that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
No. It doesn't. The implementation will be the same since they both call GetHashCode() on the actual class, in this case string.
In the end, the CreateComparer method inside EqualityComparer creates a GenericEqualityComparer, and the implementation of its GetHashCode is:
public override int GetHashCode(T obj) {
if (obj == null) return 0;
return obj.GetHashCode();
}
In this case, obj will be the original string on which you would otherwise call GetHashCode. The only case that will make it behave differently is when your string is null.
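Putting this together, a complete comparer for Distinct() might look like the sketch below. The key type int is only a stand-in for IdentType, which is not shown in the question, and ValueOnlyComparer is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compares pairs by Value only; the key type (int here) is a stand-in for IdentType.
public sealed class ValueOnlyComparer : IEqualityComparer<KeyValuePair<int, string>>
{
    private static readonly EqualityComparer<string> Values = EqualityComparer<string>.Default;

    public bool Equals(KeyValuePair<int, string> x, KeyValuePair<int, string> y)
    {
        return Values.Equals(x.Value, y.Value);
    }

    public int GetHashCode(KeyValuePair<int, string> obj)
    {
        // Null-safe: a null Value hashes to 0 instead of throwing.
        return Values.GetHashCode(obj.Value);
    }
}

public static class DistinctPairsDemo
{
    public static int CountDistinctValues(IEnumerable<KeyValuePair<int, string>> pairs)
    {
        return pairs.Distinct(new ValueOnlyComparer()).Count();
    }

    public static void Main()
    {
        var pairs = new List<KeyValuePair<int, string>>
        {
            new KeyValuePair<int, string>(1, "a"),
            new KeyValuePair<int, string>(2, "a"),
            new KeyValuePair<int, string>(3, null),
            new KeyValuePair<int, string>(4, null)
        };

        Console.WriteLine(CountDistinctValues(pairs)); // "a" and null
    }
}
```

Going through EqualityComparer<string>.Default for both methods keeps Equals and GetHashCode consistent, including for null values.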
I've implemented IEqualityComparer and IEquatable (both just to be sure), but when I call the Distinct() method on a collection it does not call the methods that come with it. Here is the code that I execute when calling Distinct().
ObservableCollection<GigViewModel> distinctGigs = new ObservableCollection<GigViewModel>(Gigs.Distinct<GigViewModel>());
return distinctGigs;
I want to return an ObservableCollection that doesn't contain any of the duplicate objects that are in the 'Gigs' ObservableCollection.
I implement the interfaces like this on the GigViewModel class:
public class GigViewModel : INotifyPropertyChanged, IEqualityComparer<GigViewModel>, IEquatable<GigViewModel>
{
....
}
And override the methods that come with the interfaces like so:
public bool Equals(GigViewModel x, GigViewModel y)
{
if (x.Artiest.Naam == y.Artiest.Naam)
{
return true;
}
else
{
return false;
}
}
public int GetHashCode(GigViewModel obj)
{
return obj.Artiest.Naam.GetHashCode();
}
public bool Equals(GigViewModel other)
{
if (other.Artiest.Naam == this.Artiest.Naam)
{
return true;
}
else
{
return false;
}
}
Thanks for all the help I'm getting. So I've created a separate class that implements IEqualityComparer and passed its instance into the Distinct method. But the methods are still not being triggered.
EqualityComparer:
class GigViewModelComparer : IEqualityComparer<GigViewModel>
{
public bool Equals(GigViewModel x, GigViewModel y)
{
if (x.Artiest.Naam == y.Artiest.Naam)
{
return true;
}
else
{
return false;
}
}
public int GetHashCode(GigViewModel obj)
{
return obj.Artiest.Naam.GetHashCode();
}
}
The Distinct() call:
GigViewModelComparer comp = new GigViewModelComparer();
ObservableCollection<GigViewModel> distinctGigs = new ObservableCollection<GigViewModel>(Gigs.Distinct(comp));
return distinctGigs;
EDIT2:
The GetHashCode() method DOES get called! After implementing the new class. But the collection still contains duplicates. I have a list of 'Gigs' that contain an 'Artiest' (or Artist) object. This Artist has a Naam property which is a String (Name).
So you had the object that itself is being compared implement both IEquatable as well as IEqualityComparer. That generally doesn't make sense. IEquatable is a way of saying an object can compare itself to something else. IEqualityComparer is a way of saying it can compare two different things you give it to each other. You generally want to do one or the other, not both.
If you want to implement IEquatable then the object not only needs to have an Equals method of the appropriate signature, but it needs to override GetHashCode to have a sensible implementation for the given definition of equality. You didn't do that. You created GetHashCode method that takes an object as a parameter, but that's the overload used for IEqualityComparer. You need to override the parameter-less version when using IEquatable (the one defined in Object).
If you want to create a class that implements IEqualityComparer you need to pass the comparer to the Distinct method. Since you've defined the object as its own comparer you'd need to pass in some instance of this object as the second parameter. Of course, this doesn't really make a whole lot of sense this way; so it would be better, if you go this route, to pull out the two methods that go with IEqualityComparer into a new type, and pass an instance of that type to the Distinct method. If you actually passed an object with those definitions in as a comparer, it'd work just fine.
Following MSDN's advice, you'd be best off creating a separate class for your equality comparisons:
We recommend that you derive from the EqualityComparer<T> class
instead of implementing the IEqualityComparer<T> interface, because
the EqualityComparer<T> class tests for equality using the
IEquatable<T>.Equals method instead of the Object.Equals method. This
is consistent with the Contains, IndexOf, LastIndexOf, and Remove
methods of the Dictionary<TKey, TValue> class and other generic
collections.
So, create a class, GigViewModelComparer, that derives from EqualityComparer<GigViewModel>, and put your Equals and GetHashCode methods there.
Then, pass in an instance of that new comparer class in your call to Gigs.Distinct(new GigViewModelComparer()) and it should work. Follow along in the example in the MSDN link I provided above.
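A minimal sketch of such a comparer follows. The Artiest and GigViewModel classes are stripped-down, hypothetical stand-ins for the question's types, and treating a null gig the same as a gig without an artist name is an assumption made here for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, stripped-down stand-ins for the question's types.
public class Artiest { public string Naam; }
public class GigViewModel { public Artiest Artiest; }

public sealed class GigViewModelComparer : EqualityComparer<GigViewModel>
{
    // Assumption: a null gig and a gig without an artist both count as "no name".
    private static string NameOf(GigViewModel gig)
    {
        return gig == null || gig.Artiest == null ? null : gig.Artiest.Naam;
    }

    public override bool Equals(GigViewModel x, GigViewModel y)
    {
        return string.Equals(NameOf(x), NameOf(y), StringComparison.Ordinal);
    }

    public override int GetHashCode(GigViewModel obj)
    {
        string name = NameOf(obj);
        return name == null ? 0 : name.GetHashCode();
    }
}

public static class DistinctGigsDemo
{
    public static List<GigViewModel> DistinctByArtist(IEnumerable<GigViewModel> gigs)
    {
        return gigs.Distinct(new GigViewModelComparer()).ToList();
    }

    public static void Main()
    {
        var gigs = new List<GigViewModel>
        {
            new GigViewModel { Artiest = new Artiest { Naam = "Muse" } },
            new GigViewModel { Artiest = new Artiest { Naam = "Muse" } },
            new GigViewModel { Artiest = new Artiest { Naam = "Adele" } }
        };

        foreach (GigViewModel gig in DistinctByArtist(gigs))
            Console.WriteLine(gig.Artiest.Naam);
    }
}
```

Distinct first groups candidates by GetHashCode and only then calls Equals, so both methods have to agree on what "the same gig" means; here both rely on the artist name alone.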
I've never seen somebody implement IEqualityComparer in the same class as the type of objects the collection in question contains; that is probably at least part of your problem.
I have followed the suggestions from this post to try and get Distinct() working in my code but I am still having issues. Here are the two objects I am working with:
public class InvoiceItem : IEqualityComparer<InvoiceItem>
{
public InvoiceItem(string userName, string invoiceNumber, double invoiceAmount)
{
this.UserName = userName;
this.InvoiceNumber= invoiceNumber;
this.InvoiceAmount= invoiceAmount;
}
public string UserName { get; set; }
public string InvoiceNumber { get; set; }
public double InvoiceAmount { get; set; }
public bool Equals(InvoiceItem left, InvoiceItem right)
{
if ((object)left.InvoiceNumber == null && (object)right.InvoiceNumber == null) { return true; }
if ((object)left.InvoiceNumber == null || (object)right.InvoiceNumber == null) { return false; }
return left.InvoiceNumber == right.InvoiceNumber;
}
public int GetHashCode(InvoiceItem item)
{
return item.InvoiceNumber == null ? 0 : item.InvoiceNumber.GetHashCode();
}
}
public class InvoiceItems : List<InvoiceItem>{ }
My goal is to populate an InvoiceItems object (we will call it aBunchOfInvoiceItems) with a couple thousand InvoiceItem objects and then do:
InvoiceItems distinctItems = aBunchOfInvoiceItems.Distinct();
When I set this code up and run it, I get an error that says
Cannot implicitly convert type 'System.Collections.Generic.IEnumerable' to 'InvoiceReader.Form1.InvoiceItems'. An explicit conversion exists (are you missing a cast?)
I don't understand how to fix this. Should I be taking a different approach? Any suggestions are greatly appreciated.
Distinct returns a generic IEnumerable<T>. It does not return an InvoiceItems instance. In fact, behind the curtains it returns a proxy object that implements an iterator that is only accessed on demand (i.e. as you iterate over it).
You can explicitly coerce it into a List<> by calling .ToList(). You still need to convert it to your custom list type, though. The easiest way is probably to have an appropriate constructor, and calling that:
public class InvoiceItems : List<InvoiceItem> {
public InvoiceItems() { }
// Copy constructor
public InvoiceItems(IEnumerable<InvoiceItem> other) : base(other) { }
}
// …
InvoiceItems distinctItems = new InvoiceItems(aBunchOfInvoiceItems.Distinct());
Konrad Rudolph's answer should tackle your compilation problems. There is another important semantic correctness issue here that has been missed: none of your equality logic is actually going to be used.
When a comparer is not provided to Distinct, it uses EqualityComparer<T>.Default. This is going to try to use the IEquatable<T> interface, and if this is missing, falls back on the plain old Equals(object other) method declared on object. For hashing, it will use the GetHashCode() method, also declared on object. Since the interface hasn't been implemented by your type, and none of the aforementioned methods have been overridden, there's a big problem: Distinct will just fall back on reference equality, which is not what you want.
The IEqualityComparer<T> interface is typically used when one wants to write an equality comparer that is decoupled from the type itself. On the other hand, when a type wants to be able to compare an instance of itself with another, it typically implements IEquatable<T>. I suggest one of:
Get InvoiceItem to implement IEquatable<InvoiceItem> instead.
Move the comparison logic to a separate InvoiceItemComparer : IEqualityComparer<InvoiceItem> type, and then call invoiceItems.Distinct(new InvoiceItemComparer());
If you want a quick hack with your existing code, you can do invoiceItems.Distinct(new InvoiceItem());
Quite simply, aBunchOfInvoiceItems.Distinct() returns an IEnumerable<InvoiceItem> and you are trying to assign that to something that is not an IEnumerable<InvoiceItem>.
However, the base class of InvoiceItems has a constructor that takes such an object, so you can use this:
public class InvoiceItems : List<InvoiceItem>
{
public InvoiceItems(IEnumerable<InvoiceItem> items)
: base(items) { }
}
Then you can use:
InvoiceItems distinctItems = new InvoiceItems(aBunchOfInvoiceItems.Distinct());
As is though, I don't see much benefit in deriving from List<InvoiceItem> so I would probably lean more toward:
List<InvoiceItem> distinctItems = aBunchOfInvoiceItems.Distinct().ToList();
The error has everything to do with your class InvoiceItems, which inherits from List<InvoiceItem>.
Distinct returns an IEnumerable<InvoiceItem>: InvoiceItems is a very specific type of IEnumerable<InvoiceItem>, but any IEnumerable<InvoiceItem> is not necessarily an InvoiceItems.
An implicit conversion operator is not an option here, since you can't define conversions to or from interfaces (thanks Saed). A constructor that accepts the sequence works instead:
public class InvoiceItems : List<InvoiceItem>
{
public InvoiceItems(IEnumerable<InvoiceItem> items) : base(items) { }
}
Other things to note:
Inheriting from List<T> is usually bad. Implement IList<T> instead.
Using a list throws away one of the big benefits of LINQ, which is lazy evaluation. Be sure that prefetching the results is actually what you want to do.
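The lazy-evaluation point can be seen in a small sketch like this one (the names are illustrative only): the deferred query reflects later changes to the source, while the materialized list is a snapshot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDistinctDemo
{
    // Returns { count seen by the deferred query, count held by the snapshot }.
    public static int[] Run()
    {
        var numbers = new List<int> { 1, 1, 2 };

        IEnumerable<int> lazy = numbers.Distinct();       // nothing is evaluated yet
        List<int> snapshot = numbers.Distinct().ToList(); // evaluated immediately

        numbers.Add(3); // change the source after both queries were created

        return new[] { lazy.Count(), snapshot.Count };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0]); // deferred query sees 1, 2 and 3
        Console.WriteLine(counts[1]); // snapshot still holds 1 and 2
    }
}
```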
Aside from the custom class vs IEnumerable issue that the other answers deal with, there is one major problem with your code. Your class implements IEqualityComparer instead of IEquatable. When you use Distinct, the items being filtered must either implement IEquatable themselves, or you must use the overload that takes an IEqualityComparer parameter. As it stands now, your call to Distinct will not filter the items according to the IEqualityComparer Equals and GetHashCode methods you provided.
IEqualityComparer should be implemented by another class than the one being compared. If a class knows how to compare itself, like your InvoiceItem class, it should implement IEquatable.
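As a hedged sketch of that suggestion, InvoiceItem could implement IEquatable<InvoiceItem> roughly as follows. It assumes, as in the question, that the invoice number alone decides equality; the constructor takes a double amount here so that it matches the property type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class InvoiceItem : IEquatable<InvoiceItem>
{
    public InvoiceItem(string userName, string invoiceNumber, double invoiceAmount)
    {
        UserName = userName;
        InvoiceNumber = invoiceNumber;
        InvoiceAmount = invoiceAmount;
    }

    public string UserName { get; set; }
    public string InvoiceNumber { get; set; }
    public double InvoiceAmount { get; set; }

    // Assumption: two items are equal when their invoice numbers match.
    public bool Equals(InvoiceItem other)
    {
        return other != null && InvoiceNumber == other.InvoiceNumber;
    }

    public override bool Equals(object obj) { return Equals(obj as InvoiceItem); }

    public override int GetHashCode()
    {
        return InvoiceNumber == null ? 0 : InvoiceNumber.GetHashCode();
    }
}

public static class DistinctInvoicesDemo
{
    public static List<InvoiceItem> DistinctItems(IEnumerable<InvoiceItem> items)
    {
        // No comparer argument needed: Distinct uses EqualityComparer<InvoiceItem>.Default,
        // which finds the IEquatable<InvoiceItem> implementation.
        return items.Distinct().ToList();
    }

    public static void Main()
    {
        var items = new List<InvoiceItem>
        {
            new InvoiceItem("alice", "INV-1", 10.0),
            new InvoiceItem("bob", "INV-1", 12.5),
            new InvoiceItem("carol", "INV-2", 7.0)
        };

        foreach (InvoiceItem item in DistinctItems(items))
            Console.WriteLine(item.InvoiceNumber + " " + item.UserName);
    }
}
```

Distinct keeps the first item it encounters for each invoice number, so which duplicate survives depends on the order of the source sequence.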