In .NET, the fact that IEnumerable<T> extends IEnumerable often comes in handy. Frustratingly, though, IEqualityComparer<T> and IComparer<T> do not extend their non-generic counterparts, despite the fact that the EqualityComparer<T> and Comparer<T> classes implement both interfaces. Is there a reason for this discrepancy?
If I have a sequence of strings I can use that to get a sequence of objects, since each string is also an object. This is true of any sequence; I can always get a sequence of objects when given any sequence.
If I have an object that can compare two strings I can't use it to compare two objects, since those two objects might not be strings.
The reason is that IEnumerable<T> is covariant, whereas IComparer<T> and IEqualityComparer<T> are not; they are contravariant. (Not just in the C# sense, although that's true as well, but also in the conceptual computer science sense.)
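To make the two directions concrete, here is a minimal sketch. The TextLengthComparer class and the VarianceDemo wrapper are made up for illustration; only the interfaces come from the framework.

```csharp
using System;
using System.Collections.Generic;

public static class VarianceDemo
{
    // Illustrative comparer: orders any two objects by the length of their ToString() text.
    private sealed class TextLengthComparer : IComparer<object>
    {
        public int Compare(object x, object y) =>
            (x?.ToString() ?? "").Length.CompareTo((y?.ToString() ?? "").Length);
    }

    public static int Run()
    {
        // Covariance: a sequence of strings can be used as a sequence of objects.
        IEnumerable<string> strings = new List<string> { "pear", "fig" };
        IEnumerable<object> objects = strings;
        foreach (object item in objects) Console.WriteLine(item);

        // Contravariance: something that compares any objects can compare strings.
        IComparer<object> anyComparer = new TextLengthComparer();
        IComparer<string> stringComparer = anyComparer;

        // The opposite assignments are rejected by the compiler:
        // IEnumerable<string> back = objects;
        // IComparer<object> wider = stringComparer;

        return stringComparer.Compare("pear", "fig"); // positive: "pear" is longer
    }
}
```

A generic comparer extending the non-generic IComparer would amount to the second rejected assignment: it would promise to compare arbitrary objects.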
Why the concrete EqualityComparer<T> and Comparer<T> classes implement both interfaces, rather than just the generic versions, is a decision that can only really be explained by the people who created those types. I'd imagine that they did it because at the time enough people were using the non-generic versions of the interfaces that they wanted the types to be usable with all of the existing non-generic code.
I am working on a C# project where I make heavy use of interfaces, and the System.Collections.Immutable library. I wish to sort implementations of one of my interfaces in an immutable set, ImmutableSortedSet<IMyInterface>.
In Java this is a straightforward matter of implementing Comparable<IMyInterface> and overriding the equals and hash code functions. In .NET I found a similar interface, IComparable<IMyInterface>, but it warns implementers that if they choose to implement the interface, then they should also overload the comparison operators (<, >, <=, >=), as well as implement IEquatable<IMyInterface>. IEquatable<T> in turn warns implementers that they should override the equals and hash code functions, as well as overload the '==' and '!=' operators.
Now I'm having second thoughts about implementing IComparable<T>. I'm not creating a new primitive type here; I just want to provide a convenient sort order for a complex reference type. Furthermore, there seems to be a certain problem in C# with overloading operators at the interface level, so I am leaning towards using a separate IComparer<IMyInterface> implementation.
What really raised my eyebrows though was hearing this:
The IEquatable<T> interface is used by generic collection objects such as Dictionary<TKey, TValue>, List<T>, and LinkedList<T> when testing for equality in such methods as Contains, IndexOf, LastIndexOf, and Remove. It should be implemented for any object that might be stored in a generic collection.
Does this combined with
If you implement IEquatable<T>, you should also override the base class implementations of Object.Equals(Object) and GetHashCode so that their behavior is consistent with that of the IEquatable<T>.Equals method. If you do override Object.Equals(Object), your overridden implementation is also called in calls to the static Equals(System.Object, System.Object) method on your class. In addition, you should overload the op_Equality and op_Inequality operators. This ensures that all tests for equality return consistent results.
mean that I am expected to overload both the '==' and '!=' operators for any type that I want to store inside a generic collection?
IComparable<T> is the preferred mechanism to provide comparison support for sorting. The advice of implementing the comparison operators doesn't make a lot of sense for most types, and they could not be utilized by generic collections anyway. You should also implement IEquatable<T>, override GetHashCode(), and override object.Equals to delegate to IEquatable<T>.Equals.
In general, whenever I implement IComparable<T>, I also implement the non-generic IComparable, but I implement IComparable.CompareTo explicitly such that it is normally hidden.
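A minimal sketch of that combination, assuming a made-up Priority type with a single Level property (the type is purely illustrative and sealed to keep equality simple):

```csharp
using System;

// Hypothetical example type; the name and its single property are illustrative.
public sealed class Priority : IComparable<Priority>, IComparable, IEquatable<Priority>
{
    public int Level { get; }

    public Priority(int level) { Level = level; }

    // Generic comparison used by sorting APIs; any instance sorts after null.
    public int CompareTo(Priority other) =>
        other is null ? 1 : Level.CompareTo(other.Level);

    // Explicit implementation: only visible when the object is viewed as IComparable.
    int IComparable.CompareTo(object obj)
    {
        if (obj is null) return 1;
        if (obj is Priority other) return CompareTo(other);
        throw new ArgumentException("Object must be of type Priority.", nameof(obj));
    }

    public bool Equals(Priority other) => !(other is null) && Level == other.Level;

    // object.Equals delegates to the typed Equals so that both always agree.
    public override bool Equals(object obj) => Equals(obj as Priority);

    public override int GetHashCode() => Level.GetHashCode();
}
```

No comparison or equality operators are overloaded here; sorting and the generic collections do not use them.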
As you already noticed, if you need sorting, you can pass an IComparer<T> implementation and your type doesn't have to implement IComparable<T>. In some cases it's easier and in some cases it's the only way to go, e.g. when you want the same type to be sorted differently in different situations. You can just pass different IComparer<T> implementations in that case.
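As a rough sketch of that approach: the members of IMyInterface, the Item class and both comparers below are assumptions made up for the example, and SortedSet<T> stands in for the immutable set to keep it dependency-free. Null handling is left out for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface members and implementation, for illustration only.
public interface IMyInterface
{
    string Name { get; }
    int Rank { get; }
}

public sealed class Item : IMyInterface
{
    public string Name { get; }
    public int Rank { get; }
    public Item(string name, int rank) { Name = name; Rank = rank; }
}

// Two interchangeable orderings for the same type.
public sealed class ByRank : IComparer<IMyInterface>
{
    public int Compare(IMyInterface x, IMyInterface y) => x.Rank.CompareTo(y.Rank);
}

public sealed class ByName : IComparer<IMyInterface>
{
    public int Compare(IMyInterface x, IMyInterface y) =>
        string.CompareOrdinal(x.Name, y.Name);
}

public static class ComparerDemo
{
    // Builds a sorted set with the supplied ordering and returns the first element's name.
    public static string FirstName(IComparer<IMyInterface> comparer)
    {
        var set = new SortedSet<IMyInterface>(comparer)
        {
            new Item("beta", 1),
            new Item("alpha", 2),
        };
        return set.Min.Name;
    }
}
```

FirstName(new ByRank()) returns "beta" and FirstName(new ByName()) returns "alpha". Keep in mind that a sorted set treats elements whose comparison result is 0 as duplicates, so the comparer also defines what "the same element" means for the set.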
About the IEquatable<T>-related quotes: for most situations overriding Equals and GetHashCode is enough. It will make your type work with List<T>.Contains and similar methods. It will also allow you to use your type as a key in Dictionary<TKey, TValue> and store it in HashSet<T>. That's because all these cases use EqualityComparer<T>.Default when no comparer is specified.
The way EqualityComparer<T>.Default works can be seen in its source code. It checks whether your type implements IEquatable<T>; if it doesn't, it creates an instance of ObjectEqualityComparer<T>, which just uses the Equals and GetHashCode methods to determine equality.
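A small sketch to illustrate, using a made-up Point type that overrides Equals and GetHashCode but implements neither IEquatable<T> nor any operators:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: overrides Equals/GetHashCode only.
public sealed class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj) =>
        obj is Point p && p.X == X && p.Y == Y;

    public override int GetHashCode() => (X * 397) ^ Y;
}

public static class DefaultComparerDemo
{
    // List<T>.Contains goes through EqualityComparer<T>.Default, which falls back to Equals(object).
    public static bool ListContains() =>
        new List<Point> { new Point(1, 2) }.Contains(new Point(1, 2));

    // HashSet<T> additionally relies on GetHashCode.
    public static bool SetContains() =>
        new HashSet<Point> { new Point(1, 2) }.Contains(new Point(1, 2));

    // == was not overloaded, so it remains a reference comparison.
    public static bool OperatorEquals() =>
        new Point(1, 2) == new Point(1, 2);
}
```

ListContains() and SetContains() return true while OperatorEquals() returns false, so the collections work without any operator overloads.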
The extension method ToList() returns a List<TSource>. Following the same pattern, ToDictionary() returns a Dictionary<TKey, TSource>.
I am curious why those methods do not type their return values as IList<TSource> and IDictionary<TKey, TSource> respectively. This seems even odder because ToLookup<TSource, TKey> types its return value as an interface instead of an actual implementation.
Looking at the source of those extension methods using dotPeek or another decompiler, we see the following implementation (showing ToList() because it is shorter):
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
return new List<TSource>(source);
}
So why does this method type its return value as a specific implementation of the interface and not the interface itself? The only change would be the return type.
I am curious because the IEnumerable<> extensions are very consistent in their signatures, except for those two cases. I always thought it to be a bit strange.
Additionally, to make things even more confusing, the documentation for ToLookup() states:
Creates a Lookup<TKey, TElement> from an IEnumerable<T> according to a specified key selector function.
but the return type is ILookup<TKey, TElement>.
In Edulinq, Jon Skeet mentions that the return type is List<T> instead of IList<T>, but does not touch the subject further.
Extensive searching has yielded no answer, so here I ask you:
Is there any design decision behind not typing the return values as interfaces, or is it just happenstance?
Returning List<T> has the advantage that those methods of List<T> that are not part of IList<T> are easily used. There are a lot of things you can do with a List<T> that you cannot do with an IList<T>.
In contrast, Lookup<TKey, TElement> has only one available method that ILookup<TKey, TElement> does not have (ApplyResultSelector), and you probably would not end up using that anyway.
These kinds of decisions may feel arbitrary, but I guess that ToList() returns List<T> rather than an interface because List<T> not only implements IList<T> but also adds other members not available on an IList<T>-typed reference.
For example, AddRange().
See what IList<T> extends (http://msdn.microsoft.com/en-us/library/5y536ey6.aspx):
public interface IList<T> : ICollection<T>,
IEnumerable<T>, IEnumerable
And List<T> (http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx):
public class List<T> : IList<T>, ICollection<T>,
IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>, IEnumerable<T>,
IEnumerable
Maybe your own code doesn't require IReadOnlyList<T>, IReadOnlyCollection<T> or ICollection, but other components in the .NET Framework and other products may rely on a more specialized list object, and that's why the .NET dev team decided not to return an interface.
Don't assume that always returning an interface is the best practice. It is only when your code or third-party code requires that kind of encapsulation.
There are a number of advantages to just having a List over an IList. To begin with, List has methods that IList does not. You also know what the implementation is, which allows you to reason about how it will behave. You know it can efficiently add to the end but not the start, you know that its indexer is very fast, etc.
You don't need to worry about your structure being changed to a LinkedList and wrecking the performance of your application. When it comes to data structures like this it really is important in quite a lot of contexts to know how your data structure is implemented, not just the contract that it follows. It's behavior that shouldn't ever change.
You also can't pass an IList to a method accepting a List, which is something that you see quite a lot of. ToList is frequently used because the person really needs an instance of List, to match a signature they can't control, and IList doesn't help with that.
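A short sketch of both points; SumAll is a hypothetical stand-in for a signature you can't control:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListDemo
{
    // Hypothetical third-party style signature that insists on List<int>.
    public static int SumAll(List<int> values) => values.Sum();

    public static int Run()
    {
        IEnumerable<int> source = Enumerable.Range(1, 3);   // 1, 2, 3
        List<int> list = source.ToList();

        list.AddRange(new[] { 10, 20 });     // member of List<T>, not of IList<T>
        int index = list.BinarySearch(10);   // member of List<T>; the list is sorted here

        IList<int> asInterface = list;       // widening to the interface is always possible
        // SumAll(asInterface);              // does not compile: IList<int> is not List<int>
        Console.WriteLine(asInterface.Count);

        return SumAll(list) + index;         // 36 + 3
    }
}
```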
Then we ask ourselves what advantages there are to returning IList. Well, we could possibly return some other implementation of a list, but as mentioned before this is likely to have very detrimental consequences, almost certainly much greater than anything that could be gained from using another type. It might give you warm fuzzies to be using an interface instead of an implementation, but I don't feel that is a good mentality, either in general or in this context. As a rule, returning an interface is not preferable to returning a concrete implementation. "Be liberal in what you accept and specific in what you provide." The parameters to your methods should, where possible, be interfaces defining the least amount of functionality you need, so that your caller can pass in any implementation that does what you need of it, and you should provide as concrete an implementation as the caller is "allowed" to see, so that they can do as much with the result as that object is capable of. Returning an interface restricts that value, which is only occasionally something that you want to do.
So now we move on to "Why return ILookup and not Lookup?" Well, first off, Lookup isn't a general-purpose collection class. There is no Lookup in System.Collections.*. The Lookup class that is exposed through LINQ exposes no constructors publicly. You're not able to create one except through ToLookup. It also exposes virtually no functionality that isn't already exposed through ILookup. In this particular case they designed the interface specifically around this exact method (ToLookup), and the Lookup class is a class specifically designed to implement that interface. Because of all of this, virtually all of the points discussed about List just don't apply here. Would it have been a problem to return Lookup instead? No, not really. In this case it really just doesn't matter much at all either way.
In my opinion returning a List<T> is justified by the fact that the method name says ToList. Otherwise it would have to be named ToIList. It is the very purpose of this method to convert an unspecific IEnumerable<T> to the specific type List<T>.
If you had a method with an unspecific name like GetResults, then a return type like IList<T> or IEnumerable<T> would seem appropriate to me.
If you look at the implementation of the Lookup<TKey, TElement> class with reflector, you'll see a lot of internal members, that are only accessible to LINQ itself. There is no public constructor and Lookup objects are immutable. Therefore there would be no advantage in exposing Lookup directly.
The Lookup<TKey, TElement> class seems to be more or less LINQ-internal and is not meant for public use.
I believe that the decision to return a List<> instead of an IList<> is that one of the more common use cases for calling ToList is to force immediate evaluation of the entire list. By returning a List<> this is guaranteed. With an IList<> the implementation can still be lazy, which would defeat the "primary" purpose of the call.
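To illustrate the "force immediate evaluation" use case, here is a small sketch that counts how often a selector runs (the counter and the class name are just for the demonstration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EvaluationDemo
{
    public static int[] Run()
    {
        int calls = 0;

        // Deferred: building the query does not invoke the selector.
        IEnumerable<int> query = new[] { 1, 2, 3 }.Select(x => { calls++; return x * 2; });
        int beforeToList = calls;

        // ToList enumerates the query once and stores the results.
        List<int> list = query.ToList();
        int afterToList = calls;

        // Reading the list repeatedly does not run the selector again.
        int total = list.Sum() + list.Sum();
        Console.WriteLine(total);
        int afterReuse = calls;

        return new[] { beforeToList, afterToList, afterReuse };
    }
}
```

Run() returns { 0, 3, 3 }: nothing is evaluated before ToList, and nothing is re-evaluated afterwards.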
This is one of the common things that programmers have difficulty understanding around the use of interfaces and concrete types.
Returning a concrete List<T> that implements IList<T> only gives the method consumer more information. Here is what the List object implements (via MSDN):
[SerializableAttribute]
public class List<T> : IList<T>, ICollection<T>, IList, ICollection,
IReadOnlyList<T>, IReadOnlyCollection<T>, IEnumerable<T>, IEnumerable
Returning a List<T> gives us the ability to call members on all of these interfaces in addition to those of List<T> itself. For example, we can only use List<T>.BinarySearch(T) on a List<T>, as it exists in List<T> but not in IList<T>.
In general, to maximize the flexibility of our methods, we should take the most abstract types as parameters (i.e. only the things we're going to use) and return the least abstract type possible (to give the caller a more capable return object).
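A minimal sketch of that guideline, with a made-up helper (Texts.NonEmpty is not a framework method):

```csharp
using System;
using System.Collections.Generic;

public static class Texts
{
    // Accepts the most abstract type that is sufficient: any sequence of strings.
    // Returns the concrete list it builds, so callers get everything List<T> offers.
    public static List<string> NonEmpty(IEnumerable<string> lines)
    {
        if (lines == null) throw new ArgumentNullException(nameof(lines));

        var result = new List<string>();
        foreach (string line in lines)
        {
            if (!string.IsNullOrWhiteSpace(line)) result.Add(line.Trim());
        }
        return result;
    }
}
```

Callers can pass an array, a HashSet<string> or a LINQ query, and can still store the result in an IList<string> or IEnumerable<string> variable if that is all they want to depend on.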
In general, when you call ToList() on a sequence you're looking for a concrete type; otherwise the sequence could stay typed as IEnumerable<T>. You don't need to convert to a List unless you're doing something that requires a concrete list.
The short answer is that, in general, returning the most specific type available is recommended by the authoritative Framework Design Guidelines. (Sorry, I don't have a citation on hand, but I remember this clearly since it stuck out in contrast to the Java community guidelines, which prefer the opposite.)
This makes sense to me. You can always do e.g. IList<int> list = x.ToList(); only the library author needs to be concerned with being able to support the concrete return type.
ToLookup is the odd one out, but it is perfectly within the guidelines: it returns the most specific type that the library authors are willing to support (as others have pointed out, the concrete Lookup<TKey, TElement> type appears to be more of an internal type not meant for public use).
Because List<T> actually implements a range of interfaces, not just IList:
public class List<T> : IList<T>, ICollection<T>, IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>, IEnumerable<T>, IEnumerable{
}
Each of those interfaces defines a range of features to which List<T> must conform. Picking just one of them would render the bulk of the implementation unusable.
If you do want to return IList, nothing stops you from having your own simple wrapper:
public static IList<TSource> ToIList<TSource>(this IEnumerable<TSource> source)
{
if (source == null) throw new ArgumentNullException(nameof(source));
return source.ToList();
}
If a function returns a newly-constructed immutable object, the caller should generally not care about the precise type returned, provided it is capable of holding the actual data that it contains. For example, a function that is supposed to return an IImmutableMatrix might normally return an ImmutableArrayMatrix backed by a privately-held array, but if all the cells hold zeroes it might instead return a ZeroMatrix, backed only by Width and Height fields (with a getter that simply returns zero all the time). The caller wouldn't care whether it was given an ImmutableArrayMatrix or a ZeroMatrix; both types would allow all of their cells to be read and guarantee their values would never change, and that's what the caller would care about.
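A rough sketch of such a factory follows. The type names come from the paragraph above, but their members and the Matrix.From method are assumptions made for illustration; bounds checking in ZeroMatrix is omitted.

```csharp
using System;

// Hypothetical interface; the members are assumed for the example.
public interface IImmutableMatrix
{
    int Width { get; }
    int Height { get; }
    double this[int row, int column] { get; }
}

// Stores only the dimensions; every cell reads as zero.
public sealed class ZeroMatrix : IImmutableMatrix
{
    public int Width { get; }
    public int Height { get; }
    public ZeroMatrix(int width, int height) { Width = width; Height = height; }
    public double this[int row, int column] => 0.0;
}

// Backed by a private copy of the source array, so later changes to the source are not visible.
public sealed class ImmutableArrayMatrix : IImmutableMatrix
{
    private readonly double[,] cells;
    public ImmutableArrayMatrix(double[,] source) { cells = (double[,])source.Clone(); }
    public int Height => cells.GetLength(0);
    public int Width => cells.GetLength(1);
    public double this[int row, int column] => cells[row, column];
}

public static class Matrix
{
    // Declared return type is the interface; the concrete type depends on the data.
    public static IImmutableMatrix From(double[,] source)
    {
        foreach (double value in source)
        {
            if (value != 0.0) return new ImmutableArrayMatrix(source);
        }
        return new ZeroMatrix(source.GetLength(1), source.GetLength(0));
    }
}
```

A caller that only reads cells through IImmutableMatrix cannot tell which of the two implementations it received.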
On the other hand, functions that return newly-constructed objects that allow open-ended mutation should generally return the precise type the caller is going to expect. Unless there will be a means by which the caller can request different return types (e.g. by calling ToyotaFactory.Build("Corolla") versus ToyotaFactory.Build("Prius")) there's no reason for the declared return type to be anything else. While factories that return immutable data-holding objects can select a type based on the data to be contained, factories that return freely-mutable types will have no way of knowing what data may be put into them. If different callers will have different needs (e.g. returning to the extant example, some callers' needs would be met with an array, while others' would not) they should be given a choice of factory methods.
BTW, something like IEnumerable<T>.GetEnumerator() is a bit of a special case. The returned object will almost always be mutable, but only in a very highly-constrained fashion; indeed, it is expected that the returned object, regardless of its type, will have exactly one piece of mutable state: its position in the enumeration sequence. Although an IEnumerator<T> is expected to be mutable, the portions of its state which would vary in derived-class implementations are not.
Can someone please explain, with an example that I can understand, the difference between .Equals, IComparable and IComparer?
I was asked this in an interview.
Well first off, on the surface, Equals is a method (present in every object), while IComparable and IComparer are interfaces.
Equals is present in any class and can be overridden to provide equality testing suited to the class (it's good practice to override GetHashCode as well). By default, for reference types, it just tests whether the two references point to the same object, which is often not very useful. Equals (and GetHashCode) are usually given a different implementation in the context of searching or hashing.
Implementing IComparable is a more fine-grained way of comparison, as it provides the CompareTo method, which is a greater-than/less-than comparison, as opposed to Equals, which is simply an is-equal-or-not comparison. For example, a binary search tree structure could benefit from this method.
IComparer is similar to IComparable, except that it works from the outside. It allows you to define a "neutral" object that is used for comparing two other objects without modifying them directly, which you need to do with IComparable.
Equals is a method, while the other two are interfaces. That looks like the biggest difference.
More seriously, @ChrisSinclair gave you an answer in the comments...
Equals returns true or false depending on whether the two objects are equal (or are the same reference, depending on your implementation). For IComparable/IComparer, see: difference between IComparable and IComparer
.Equals() gives your class a way to test for equality against all other possible objects. This can be considered the fallback for object equality. So it answers the question: am I equivalent to the object passed in as a parameter?
IComparable provides for a way of comparing objects which can be ordered, possible uses include sorting. Implementing this interface puts the ordering logic into your class.
IComparer does pretty much the same as IComparable, except that the logic is contained in a separate class.
1) Are the reasons why IEqualityComparer<T> was introduced:
a) so we would be able to compare objects (of particular type) for equality in as many different ways as needed
b) and by having a standard interface for implementing a custom equality comparison, chances are that much greater that third party classes will accept this interface as a parameter and by that allow us to inject into these classes equality comparison behavior via objects implementing IEqualityComparer<T>
2) I assume IEqualityComparer<T> should not be implemented on type T that we're trying to compare for equality, but instead we should implement it on helper class(es)?
Thank you
I'm doubtful that anyone here will be able to answer with any authority the reason that the interface was introduced (my guess--and that's all it is--would be to support one of the generic set types like Dictionary<TKey, TValue> or HashSet<T>), but its purpose is clear:
Defines methods to support the comparison of objects for equality.
If you combine this with the fact you can have multiple types implementing this interface (see StringComparer), then the answer to question a is yes.
The reason for this is threefold:
Operators (in this case, ==) are not polymorphic; if the type is upcasted to a higher level than where the type-specific comparison logic is defined, then you'll end up performing a reference comparison rather than using the logic within the == operator.
Equals() requires at least one valid reference and can provide different logic depending on whether it's called on the first or second value (one could be more derived and override the logic of the other).
Lastly and most importantly, the comparison logic provided by the type may not be what the user is after. For example, strings (in C#) are case sensitive when compared using == or Equals. This means that any container (like Dictionary<string, T> or HashSet<string>) would be case-sensitive. Allowing the user to provide another type that implements IEqualityComparer<string> means that the user can use whatever logic they like to determine if one string equals the other, including ignoring case.
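The first and third points can be seen in a few lines. This is only a sketch; the class and variable names are arbitrary:

```csharp
using System;
using System.Collections.Generic;

public static class EqualityDemo
{
    public static bool[] Run()
    {
        // Two distinct string instances with the same content, viewed as object.
        object a = new string('x', 3);
        object b = new string('x', 3);

        // == is bound at compile time; for object operands it is a reference comparison.
        bool operatorResult = a == b;

        // Equals is virtual, so the call reaches string.Equals and compares content.
        bool equalsResult = a.Equals(b);

        // The default comparer for string is case-sensitive...
        var caseSensitive = new HashSet<string> { "Key" };

        // ...but the caller can inject different equality logic.
        var ignoreCase = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Key" };

        return new[]
        {
            operatorResult,                  // false
            equalsResult,                    // true
            caseSensitive.Contains("KEY"),   // false
            ignoreCase.Contains("KEY"),      // true
        };
    }
}
```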
As for question b, probably, though I wouldn't be surprised if this wasn't high on the list of priorities.
For your final question, I'd say that's generally true. While there's nothing stopping you from doing so, it is confusing to think that type T would provide custom comparison logic that is different from that provided on type T just because it's referenced as an IEqualityComparer<T>.
Agreed on a and b.
"Should not be" is always a normative question and rarely a good metric. You do what works without getting into trouble (Pragmatic Programmer). The fact that you can implement the interface in a stateful or stateless way, or any other way you like, makes it possible to implement (alternative) comparers for all types, including value types, enums, sealed types, and even abstract types. In essence it is a Strategy pattern.
Sometimes there's a natural equality comparison for a type, in which case it should implement IEquatable<T>, not IEqualityComparer<T>. At other times, there are multiple possible ways of comparing objects for equality - so it makes sense to implement IEqualityComparer<T> then. It allows hash tables (and sets etc) to work in a flexible way.
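As a sketch of both cases side by side, assume a made-up Customer type whose natural equality is its Id, plus a helper comparer that offers an alternative notion of equality:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type with a natural equality (by Id), implemented via IEquatable<T>.
public sealed class Customer : IEquatable<Customer>
{
    public int Id { get; }
    public string Email { get; }
    public Customer(int id, string email) { Id = id; Email = email; }

    public bool Equals(Customer other) => !(other is null) && Id == other.Id;
    public override bool Equals(object obj) => Equals(obj as Customer);
    public override int GetHashCode() => Id.GetHashCode();
}

// Alternative equality kept in a helper class: same email, ignoring case.
public sealed class SameEmailComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Email, y.Email, StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals: equal emails (ignoring case) give equal hash codes.
    public int GetHashCode(Customer obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Email);
}

public static class CustomerDemo
{
    private static readonly Customer[] Sample =
    {
        new Customer(1, "a@example.com"),
        new Customer(2, "A@EXAMPLE.COM"),
    };

    // Uses the natural equality (IEquatable<Customer>): two distinct Ids.
    public static int DistinctById() => new HashSet<Customer>(Sample).Count;

    // Uses the injected comparer: both share one email.
    public static int DistinctByEmail() =>
        new HashSet<Customer>(Sample, new SameEmailComparer()).Count;
}
```

DistinctById() returns 2 and DistinctByEmail() returns 1 for the same two objects.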
IEquatable<T> could have been declared to be contravariant in T, since it only uses T in an input position (or, equivalently, U being a subtype of T should imply that IEquatable<T> is [a subtype of] IEquatable<U>).
So, why did the BCL team not annotate it (for C# 4.0) with the 'in' keyword, as they did with many other generic interfaces (like the entirely analogous IComparable<T>)?
I think this is mainly for a philosophical reason rather than a technical limitation–as it's perfectly possible to simply annotate the interface. IEquatable<T> is meant to compare objects of the same type for exact equality. An instance of a superclass is not usually considered equal to an instance of a subclass. Equality in this sense implies type equality too. This is a bit different from IComparable<in T>. It can be sensible to define a relative sort order across different types.
To quote MSDN page on IEquatable<T>:
Notes to Implementers:
Replace the type parameter of the IEquatable<T> interface with the type that is implementing this interface.
This sentence further demonstrates the fact that IEquatable<T> is meant to work between instances of a single concrete type.
Inheritable types should generally not implement IEquatable<T>. If IEquatable<T> included a GetHashCode() method, one could define the semantics of IEquatable<T> to say that items should compare equal when examined as T's. Unfortunately, the fact that IEquatable<T> is bound to the same hash code as Object.Equals means that in general IEquatable<T> has to implement essentially the same semantics as Object.Equals.
Consequently, if an implementation of IEquatable<BaseClass> does anything other than call Object.Equals within it, a derived class which overrides Object.Equals and GetHashCode() and does not re-implement IEquatable<BaseClass> will end up with a broken implementation of that interface; an implementation of IEquatable<BaseClass> which simply calls Object.Equals will work just fine, even in that scenario, but will offer no real advantage over a class which doesn't implement IEquatable<T>.
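The broken scenario can be reproduced with a small sketch. Shape and Circle are made-up types; the base class deliberately uses its own logic in IEquatable<Shape>.Equals instead of calling Object.Equals.

```csharp
using System;
using System.Collections.Generic;

public class Shape : IEquatable<Shape>
{
    public string Name { get; }
    public Shape(string name) { Name = name; }

    // Own logic rather than a call to Object.Equals: this is the risky part.
    public bool Equals(Shape other) => !(other is null) && Name == other.Name;

    public override bool Equals(object obj) => Equals(obj as Shape);
    public override int GetHashCode() => Name.GetHashCode();
}

public class Circle : Shape
{
    public int Radius { get; }
    public Circle(int radius) : base("circle") { Radius = radius; }

    // Overrides Object.Equals and GetHashCode but does not re-implement IEquatable<Shape>.
    public override bool Equals(object obj) => obj is Circle c && c.Radius == Radius;
    public override int GetHashCode() => Radius;
}

public static class BrokenEquatableDemo
{
    public static bool[] Run()
    {
        Shape small = new Circle(1);
        Shape large = new Circle(2);

        return new[]
        {
            small.Equals((object)large),                           // Circle.Equals(object)
            small.Equals(large),                                   // Shape.Equals(Shape)
            EqualityComparer<Shape>.Default.Equals(small, large),  // picks IEquatable<Shape>
        };
    }
}
```

Run() returns { false, true, true }: the two circles differ according to Object.Equals yet compare equal through IEquatable<Shape>, while their hash codes differ, which is exactly the inconsistency that breaks hash-based collections.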
Given that inheritable classes shouldn't be implementing IEquatable<T> in the first place, the notion of variance is not relevant to proper implementations of the interface.