When Implementing IEqualityComparer Should GetHashCode check for null?

When implementing IEqualityComparer<Product> (Product is a class), ReSharper complains that the null check below is always false:
public int GetHashCode(Product product)
{
    // Check whether the object is null.
    if (Object.ReferenceEquals(product, null))
        return 0;
    // ... other stuff ...
}
(Code example from MSDN VS.9 documentation of Enumerable.Except)
ReSharper may be wrong, but when searching for an answer, I came across the official documentation for IEqualityComparer<T> which has an example where null is not checked for:
public int GetHashCode(Box bx)
{
    int hCode = bx.Height ^ bx.Length ^ bx.Width;
    return hCode.GetHashCode();
}
Additionally, the documentation for GetHashCode() states that ArgumentNullException will be thrown when "The type of obj is a reference type and obj is null."
So, when implementing IEqualityComparer should GetHashCode check for null, and if so, what should it do with null (throw an exception or return a value)?
I'm interested most in .NET framework official documentation that specifies one way or another if null should be checked.

ReSharper is wrong.
Obviously, code you write can call that particular GetHashCode method and pass in a null value. All known callers might ensure this never happens, but ReSharper can only take existing code (patterns) into account.
So in this case, check for null and do the "right thing".
Corollary: If the method in question were private, then ReSharper might analyze the public code (though I'm not sure it does) and verify that there is indeed no way this particular private method can be called with a null reference. But since it is a public method, and one available through an interface, ReSharper is wrong.

The documentation says that null values should never be hashable, and that attempting to do so should always result in an exception.
Of course, you're free to do whatever you want. If you want to create a hash-based structure for which null keys are valid, you can; in that case, simply ignore this warning.

ReSharper has some special case code here. It will not warn about the ReferenceEquals in this:
if (ReferenceEquals(obj, null)) { throw new ArgumentNullException("obj"); }
It will warn about the ReferenceEquals in this:
if (ReferenceEquals(obj, null)) { return 0; }
Throwing an ArgumentNullException is consistent with the contract specified in IEqualityComparer(Of T).GetHashCode.
If you go to the definition of IEqualityComparer (F12) you'll also find further documentation:
// Exceptions:
// System.ArgumentNullException:
// The type of obj is a reference type and obj is null.
int GetHashCode(T obj);
So ReSharper is right that there is something wrong, but the error displayed doesn't match the change you should make to the code.
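A minimal sketch of the throwing variant, assuming a hypothetical Product with Name and Code members (the question does not show them). It is one way to follow the documented contract, not the only valid implementation:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration; the question does not show Product's members.
public sealed class Product
{
    public string Name { get; set; }
    public int Code { get; set; }
}

public sealed class ThrowingProductComparer : IEqualityComparer<Product>
{
    public bool Equals(Product x, Product y)
    {
        if (ReferenceEquals(x, y)) return true; // same instance, or both null
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.Code == y.Code && x.Name == y.Name;
    }

    public int GetHashCode(Product obj)
    {
        // Follows the documented contract: throw for a null reference.
        if (ReferenceEquals(obj, null)) throw new ArgumentNullException("obj");

        int nameHash = obj.Name == null ? 0 : obj.Name.GetHashCode();
        return nameHash ^ obj.Code.GetHashCode();
    }
}
```

Note that Equals still accepts nulls here; only GetHashCode rejects them.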

There is some nuance to this question.
The docs state that IEqualityComparer<T>.GetHashCode(T) throws on null input; however EqualityComparer<>.Default - which is almost certainly by far the most used implementation - does not throw.
Clearly, an implementation does not need to throw on null; it merely has the option to.
However, I'd argue that no implementation should ever throw on null here, it's just confusing, and a possible source of bugs. Exceptions are a pain in any case, being a non-local control flow mechanism, and that alone argues for using them when necessary only (i.e.: not here). But additionally, for IEqualityComparer specifically, the docs state that whenever Equals(x, y) then GetHashCode(x) should equal GetHashCode(y) - and Equals does allow nulls, and is not documented as throwing any exceptions.
The invariant that equality implies hashcode equality makes implementing things relying on those hashcodes much simpler. Having a gotcha with the null value is a design cost you should avoid paying without need. And here there is no need, ever.
In short:
do not throw from GetHashCode, even though it is allowed
and do check for nulls; ReSharper's warning is incorrect.
Doing this results in simpler code with fewer gotchas, and it follows the behavior of EqualityComparer<>.Default which is the most common implementation used.
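A sketch of the null-tolerant approach, assuming a hypothetical Box type modeled on the documentation example quoted in the question. The value returned for null (0 here) is an arbitrary choice; what matters is that it is the same every time, so that Equals(null, null) being true stays consistent with the hash codes:

```csharp
using System.Collections.Generic;

// Hypothetical Box type, modeled on the documentation example quoted in the question.
public sealed class Box
{
    public int Height { get; set; }
    public int Length { get; set; }
    public int Width { get; set; }
}

public sealed class NullTolerantBoxComparer : IEqualityComparer<Box>
{
    public bool Equals(Box x, Box y)
    {
        if (ReferenceEquals(x, y)) return true; // same instance, or both null
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.Height == y.Height && x.Length == y.Length && x.Width == y.Width;
    }

    public int GetHashCode(Box bx)
    {
        // Equals(null, null) is true, so null gets one fixed hash code instead of an exception.
        if (ReferenceEquals(bx, null)) return 0;
        return bx.Height ^ bx.Length ^ bx.Width;
    }
}
```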

Related

Can XPathNavigator.Evaluate(string) return null?

Can this overload of XPathNavigator.Evaluate return null?
// Can "result" be null ?
object result = xmlDoc.CreateNavigator().Evaluate(xpathString);
If the answer is no, then why does ReSharper say that result may be null?
string str = result.ToString(); // Resharper: Possible NullReferenceException
I found nothing in the documentation about an input that might cause it to return null. I also tried inspecting the Reference Source for this function, but it was unfruitful.
I know that R# uses code annotations, but I still don't trust this warning: I tried different inputs and none of them returned null.

Looking at the code, it does look like it would be highly unlikely to get a null from XPathNavigator.Evaluate. There are a couple of possible code paths that might get you a null, but I suspect they're pathological edge cases (if evaluating a function that should be a number function, but isn't, or if the operand to a query is already null). I doubt these would happen under normal circumstances.
I don't know why ReSharper has the [CanBeNull] annotation on the return value. If I had to guess, I'd say it's because the method is virtual, and therefore there's no way to guarantee that the implementation will always return a value. Or because it calls an abstract method on another class that doesn't have any null-ness guarantees, and there's no check on the return of that value, so again, there's no guarantee that it won't be null.
The annotations are based on static control flow analysis, and that can only get you so far. ReSharper will provide the strongest hints that it can. If it knows it's not null, it will annotate it so, if it doesn't know, it will flag it [CanBeNull], and err on the side of caution.
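If the warning cannot be ruled out, a cheap guard satisfies both the analysis and the pathological cases. The sketch below is only an illustration; the XPathDemo class and the choice to return an empty string for null are assumptions, not something the question prescribes:

```csharp
using System;
using System.Globalization;
using System.Xml;
using System.Xml.XPath;

public static class XPathDemo
{
    // Evaluates an XPath expression and converts the result to text,
    // returning an empty string if the result happens to be null.
    public static string EvaluateToString(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XPathNavigator navigator = doc.CreateNavigator();
        object result = navigator.Evaluate(xpath);
        return result == null
            ? string.Empty
            : Convert.ToString(result, CultureInfo.InvariantCulture);
    }
}
```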

What to name a variant of a Get() method?

I am developing an API for a repository-like abstraction. I have two methods:
// Throws an exception if object cannot be found
MyObj Get(MyIdType id);
// Returns false if object cannot be found; no exception
bool TryGet(out MyObj obj);
There is a requirement for a third variant: one that returns null if object cannot be found, and does not throw an exception.
// Returns null if object cannot be found; no exception
MyObj ?????(MyIdType id);
I'm stuck on what to name it. GetOrDefault has been ruled out as confusing. GetIfNotNull has been suggested, but also seems unclear. GetOrNull is the most promising so far.
Does anyone have any other suggestions, or know of any public APIs whose conventions I can follow?

I would opt to not have a Get method that behaves differently in two situations. Why not have Get return null in all cases? Why throw an exception at all?
I would opt to leave it up to user code to throw an exception if a null value is returned, if required.
See this question for further guidance related to when to throw exceptions.

I'd go with GetOrDefault (as you suggested yourself) based on the LINQ extension method FirstOrDefault.
Maybe GetValue and GetValueOrDefault would sound better though.

How about: GetOrDefault
The ...OrDefault is fairly standard in LINQ.

You could try GetObjectOrReturnDefaultValue or, since you know it's a reference type GetObjectOrReturnNull. It's long and ugly, but it's not ambiguous.

I'd keep only the bool TryGetXXXXX(out T value) variant on your interface and provide the rest as extension methods on it. That keeps the interface itself very compact, while still making it as useful as the client wants.
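A sketch of that layout, under some assumptions: the interface name IRepository, the id parameter on TryGet, the KeyNotFoundException, and the in-memory DictionaryRepository are all hypothetical and only there to make the example complete.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repository interface exposing only the Try-pattern method.
public interface IRepository<TId, TItem> where TItem : class
{
    bool TryGet(TId id, out TItem item);
}

public static class RepositoryExtensions
{
    // Throws if the item cannot be found.
    public static TItem Get<TId, TItem>(this IRepository<TId, TItem> repository, TId id)
        where TItem : class
    {
        TItem item;
        if (!repository.TryGet(id, out item))
            throw new KeyNotFoundException("No item found for id " + id);
        return item;
    }

    // Returns null if the item cannot be found; does not throw for a missing item.
    public static TItem GetOrNull<TId, TItem>(this IRepository<TId, TItem> repository, TId id)
        where TItem : class
    {
        TItem item;
        return repository.TryGet(id, out item) ? item : null;
    }
}

// Minimal in-memory implementation, only here to make the sketch usable.
public sealed class DictionaryRepository<TId, TItem> : IRepository<TId, TItem>
    where TItem : class
{
    private readonly Dictionary<TId, TItem> items = new Dictionary<TId, TItem>();

    public void Add(TId id, TItem item) { items[id] = item; }

    public bool TryGet(TId id, out TItem item) { return items.TryGetValue(id, out item); }
}
```

Implementers only have to write TryGet; every caller gets Get and GetOrNull for free.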

In my opinion, you should stick with:
MyObj Get(MyIdType id);
Instead of throwing an exception here, simply return null. If there is a definite requirement to throw an exception or, optionally, return null, I would try:
MyObj Get (MyIdType id, bool ReturnDefault = false) // if .net 4
I don't particularly like this option - but sometimes requirements will override what we think feels right or natural.

Is there a way to mark a method as ensuring that T is not null?

For example, if I have a method defined as...
T Create()
{
    T t = Factory.Create<T>();
    // ...
    Assert.IsNotNull(t, "Some message.");
    // -or-
    if (t == null) throw new Exception("...");
    // -or- anything that verifies that it is not null
    return t;
}
...and I am calling that method from somewhere else...
void SomewhereElse()
{
    T t = Create();
    // >><<
}
...at >><<, I know (meaning me, the person who wrote this) that t is guaranteed to not be null. Is there a way (an attribute, perhaps, that I have not found) to mark a method as ensuring that a reference type that it returns or otherwise passes out (perhaps an out parameter) is guaranteed by internal logic to not be null?
I have to sheepishly admit that ReSharper is mostly why I care as it highlights anything it thinks could cause either InvalidOperationException or NullReferenceException. I figure either it's reading something that I can mark on my methods or it just knows that Assert.IsNotNull, simple boolean checks or a few other things will remove the chance of something being null and that it can remove the highlight.
Any thoughts? Am I just falling victim to oh-my-god-resharper-highlights-it-I-have-to-fix-it disease?

If ReSharper is why you care, then you can mark the Factory.Create<T>() method with their [NotNull] attribute, described in their web help.
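A sketch of what that could look like. Several things here are assumptions: the attribute is declared locally as a stand-in so the example compiles without the JetBrains.Annotations package, WidgetFactory and its Func<T> parameter are hypothetical, and whether ReSharper honors a locally declared attribute depends on how its annotation settings are configured.

```csharp
using System;

namespace JetBrains.Annotations
{
    // Stand-in declaration so the sketch compiles on its own; in a real project
    // the attribute would normally come from JetBrains' own annotations.
    [AttributeUsage(AttributeTargets.Method | AttributeTargets.Parameter |
                    AttributeTargets.Property | AttributeTargets.Field)]
    public sealed class NotNullAttribute : Attribute { }
}

public static class WidgetFactory
{
    // The attribute is only a hint for static analysis; the runtime check below
    // is what actually enforces the guarantee.
    [JetBrains.Annotations.NotNull]
    public static T Create<T>(Func<T> factory) where T : class
    {
        T t = factory();
        if (t == null) throw new InvalidOperationException("Factory returned null.");
        return t;
    }
}
```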

Not sure how R# handles this, but the Contract.Assert method may be what you're looking for.

You could put a constraint on T to only allow struct.
You could use a language extension that allows you to make stronger definitions of pre/post conditions for your function (contract based programming), like SpecSharp, or Code Contracts. Code Contracts seems to leverage built-in systems from C# 4.0. I have no experience with either - only heard of them.

Could you cast T to an object then check if its null?
var o = (object)Factory.Create<T>();
if(o == null) throw new Exception();

How does ReSharper know "Expression is always true"?

Check out the following code:
private void Foo(object bar)
{
    Type type = bar.GetType();
    if (type != null) // Expression is always true
    {
    }
}
ReSharper claims type will never be null. That's obvious to me because there's always going to be a type for bar, but how does ReSharper know that? How can it know that the result of a method will never be null?
Type is not a struct so it can't be that. And if the method were written by me, then the return value could certainly be null (not necessarily GetType, but something else).
Is ReSharper clever enough to know that, for only that particular method, the result will never be null? (Like there's a hard-coded list of known .NET methods which will never return null.)

JetBrains perfectly explains how ReSharper does this in their features list.
Summary from link (this particular question is about NotNullAttribute):
We have analyzed a great share of .NET Framework Class Library, as well as NUnit Framework, and annotated it through external XML files, using a set of custom attributes from the JetBrains.Annotations namespace, specifically:
StringFormatMethodAttribute (for methods that take format strings as parameters)
InvokerParameterNameAttribute (for methods with string literal arguments that should match one of caller parameters)
AssertionMethodAttribute (for assertion methods)
AssertionConditionAttribute (for condition parameters of assertion methods)
TerminatesProgramAttribute (for methods that terminate control flow)
CanBeNullAttribute (for values that can be null)
NotNullAttribute (for values that can not be null)
UsedImplicitlyAttribute (for entities that should not be marked as unused)
MeansImplicitUseAttribute (for extending semantics of any other attribute to mean that the corresponding entity should not be marked as unused)

Yes, it basically has knowledge of some well-known methods. You should find the same for string concatenation too, for example:
string x = null;
string y = null;
string z = x + y;
if (z == null)
{
    // ReSharper should warn about this never executing
}
Now the same information is also becoming available via Code Contracts - I don't know whether JetBrains is hooking directly into this information, has its own database, or a mixture of the two.

GetType is not virtual. Your assumption is most likely correct in your last statement.
Edit: to answer your comment question - it can't infer with your methods out of the box.

object.GetType is not virtual, so you cannot yourself implement a version that returns a null value. Therefore, if bar is null, you will get a NullReferenceException, and otherwise type will never be null.

C#: Should I bother checking for null in this situation?

Let's say I have this extension method:
public static bool HasFive<T>(this IEnumerable<T> subjects)
{
    if(subjects == null)
        throw new ArgumentNullException("subjects");
    return subjects.Count() == 5;
}
Do you think this null check and exception throwing is really necessary? I mean, when I use the Count method, an ArgumentNullException will be thrown anyways, right?
I can maybe think of one reason why I should, but would just like to hear others' views on this. And yes, my reason for asking is partly laziness (I want to write as little as possible), but also because I think a bunch of null checking and exception throwing clutters up the methods, which often end up twice as long as they really need to be. Someone should know better than to send null into a method :p
Anyways, what do you guys think?
Note: Count() is an extension method and will throw an ArgumentNullException, not a NullReferenceException. See Enumerable.Count<TSource> Method (IEnumerable<TSource>). Try it yourself if you don't believe me =)
Note2: After the answers given here I have been persuaded to start checking more for null values. I am still lazy though, so I have started to use the Enforce class in Lokad Shared Libraries. I can recommend taking a look at it. Instead of my example, I can now write this:
public static bool HasFive<T>(this IEnumerable<T> subjects)
{
    Enforce.Argument(() => subjects);
    return subjects.Count() == 5;
}

Yes, it will throw an ArgumentNullException. I can think of two reasons for putting the extra checking in:
If you later go back and change the method to do something before calling subjects.Count() and forget to put the check in at that point, you could end up with a side effect before the exception is thrown, which isn't nice.
Currently, the stack trace will show subjects.Count() at the top, and probably with a message with the source parameter name. This could be confusing to the caller of HasFive who can see a subjects parameter name.
EDIT: Just to save me having to write it yet again elsewhere:
The call to subjects.Count() will throw an ArgumentNullException, not a NullReferenceException. Count() is another extension method here, and assuming the implementation in System.Linq.Enumerable is being used, that's documented (correctly) to throw an ArgumentNullException. Try it if you don't believe me.
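The difference in the reported parameter name can be observed directly. This is a small sketch; the names HasFiveUnchecked, HasFiveChecked and ParamNameOf are hypothetical helpers introduced only for the comparison:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullCheckDemo
{
    // No explicit check: the exception comes from Enumerable.Count.
    public static bool HasFiveUnchecked<T>(this IEnumerable<T> subjects)
    {
        return subjects.Count() == 5;
    }

    // Explicit check: the exception names this method's own parameter.
    public static bool HasFiveChecked<T>(this IEnumerable<T> subjects)
    {
        if (subjects == null) throw new ArgumentNullException("subjects");
        return subjects.Count() == 5;
    }

    // Returns the ParamName reported when the action throws ArgumentNullException,
    // or null if it does not throw.
    public static string ParamNameOf(Action action)
    {
        try { action(); }
        catch (ArgumentNullException e) { return e.ParamName; }
        return null;
    }
}
```

Calling both variants with a null sequence yields an ArgumentNullException either way, but only the checked version reports the caller-visible name "subjects"; the unchecked one reports the parameter name used inside Enumerable.Count.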
EDIT: Making this easier...
If you do a lot of checks like this you may want to make it simpler to do so. I like the following extension method:
internal static void ThrowIfNull<T>(this T argument, string name)
    where T : class
{
    if (argument == null)
    {
        throw new ArgumentNullException(name);
    }
}
The example method in the question can then become:
public static bool HasFive<T>(this IEnumerable<T> subjects)
{
    subjects.ThrowIfNull("subjects");
    return subjects.Count() == 5;
}
Another alternative would be to write a version which checked the value and returned it like this:
internal static T NullGuard<T>(this T argument, string name)
    where T : class
{
    if (argument == null)
    {
        throw new ArgumentNullException(name);
    }
    return argument;
}
You can then call it fluently:
public static bool HasFive<T>(this IEnumerable<T> subjects)
{
    return subjects.NullGuard("subjects").Count() == 5;
}
This is also helpful for copying parameters in constructors etc:
public Person(string name, int age)
{
    this.name = name.NullGuard("name");
    this.age = age;
}
(You might want an overload without the argument name for places where it's not important.)
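One possible shape for that overload, shown next to the named version so the pair is self-contained. The class name GuardExtensions and the public accessibility are assumptions made for the sketch:

```csharp
using System;

public static class GuardExtensions
{
    // Named version: the exception carries the caller-supplied parameter name.
    public static T NullGuard<T>(this T argument, string name) where T : class
    {
        if (argument == null) throw new ArgumentNullException(name);
        return argument;
    }

    // Overload without a name, for places where the name is not important.
    public static T NullGuard<T>(this T argument) where T : class
    {
        if (argument == null) throw new ArgumentNullException();
        return argument;
    }
}
```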

I think @Jon Skeet is absolutely spot on, however I'd like to add the following thoughts:
Providing a meaningful error message is useful for debugging, logging and exception reporting. An exception thrown by the BCL is less likely to describe the specific circumstances of the exception WRT your codebase. Perhaps this is less of an issue with null checks which (most of the time) necessarily can't give you much domain-specific information - 'I was passed a null unexpectedly, no idea why' is pretty much the best you can do most of the time, however sometimes you can provide more information and obviously this is more likely to be relevant when dealing with other exception types.
The null check acts as a form of documentation: it clearly demonstrates to other developers (and to you, if and when you come back to the code a year later) that someone might pass a null, and that it would be problematic if they did so.
Expanding on Jon's excellent point - you might do something before the null gets picked up - I think it is vitally important to engage in defensive programming. Checking for an exception before running other code is a form of defensive programming as you are taking into account things might not work the way you expected (or changes might be made in the future that you didn't expect) and ensuring that no matter what happens (assuming your null check isn't removed) such problems cannot arise.
It's a form of runtime assert that your parameter is not null. You can proceed on the assumption that it isn't.
The above assumption can result in slimmer code: you write the rest of your code knowing the parameter is not null, cutting down on extraneous subsequent null checks.

In my opinion you should check for the null value. Two things come to mind.
It makes explicit the possible errors that can happen during runtime.
It also gives you a chance to throw a better exception instead of a generic ArgumentNullException. Thus, making the reason for the exception more explicit.

The exception that you will get thrown will be an Object reference not set to an instance of an object.
Not the most useful of exceptions when tracking down the problem.
The way you have it there will give you much more useful information by specifically stating that it's your subjects reference that is null.

I think it is a good practice to do precondition checks at the top of the function. Maybe it's just my code that is full of bugs, but this practice has caught a lot of errors for me.
Also, it's much easier to figure out the source of the problem if you got an ArgumentNullException with the name of the parameter, thrown from the most relevant stack frame. Also, the code in the body of your function can change over time so I wouldn't depend on it catching precondition problems in the future.

It always depends on the context (in my opinion).
For instance, when writing a library (for others to use), it certainly makes sense to fully check each and every parameter and throw the appropriate exceptions.
When writing methods that are used inside a project, I usually skip those checks, attempting to reduce the size of the codebase. But even in this case, there might be a level (between application layers) where you still place such checks. It depends on the context, on the size of the project, on the size of the team working on it...
It certainly doesn't make sense doing it for small projects built by one person :)

It depends on the concrete method. In this case, I think the exception is not necessary, and the extension method is more useful if it can deal with null.
public static bool HasFive<T>(this IEnumerable<T> subjects) {
    if ( object.ReferenceEquals( subjects, null ) ) { return false; }
    return subjects.Count() == 5;
}
If you call "items.HasFive()" and "items" is null, then it is true that items does not have five items.
But if you have an extension method such as:
public static T GetFift<T>(this IEnumerable<T> subjects) {
...
}
then an exception for "subjects == null" should be thrown, because there is no valid way to deal with it.

If you look at the source of the Enumerable class (System.Core.dll), where many of the default extension methods for IEnumerable<T> are defined, you can see that they all check their arguments for null references.
public static IEnumerable<TSource> Skip<TSource>(this IEnumerable<TSource> source, int count)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    return SkipIterator<TSource>(source, count);
}
It's a bit of an obvious point, but I tend to follow what I find in the base framework library source as you know that is more than likely to be best practices.
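The split between Skip and SkipIterator in the snippet above also matters for when the check runs: the body of an iterator block (a method using yield return) does not execute until enumeration starts, so a null check placed inside it would be deferred. A sketch of the same wrapper pattern follows; the EveryOther operator is a hypothetical example, not a framework method:

```csharp
using System;
using System.Collections.Generic;

public static class EagerValidationExtensions
{
    // Public wrapper: not an iterator block, so the null check runs at call time.
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        return EveryOtherIterator(source);
    }

    // Iterator block: its body only runs once the caller starts enumerating.
    private static IEnumerable<T> EveryOtherIterator<T>(IEnumerable<T> source)
    {
        bool take = true;
        foreach (T item in source)
        {
            if (take) yield return item;
            take = !take;
        }
    }
}
```

With this layout the exception surfaces at the line that builds the query, not later at the line that happens to enumerate it.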

Yes, for two reasons:
Firstly, the other extension methods on IEnumerable do and consumers of your code can expect yours to do so as well, but secondly and more importantly, if you have a long chain of operators in your query then knowing which one threw the exception is useful information.

In my opinion one should check for known conditions that will raise errors later on (at least for public methods). That way it's easier to detect the root of the problem.
I would raise a more informational exception like:
if (subjects == null)
{
    throw new ArgumentNullException("subjects", "subjects is null.");
}
