Why does comparing two strings as object cause an unexpected result? - c#

Consider the following piece of code.
object str = new string(new char[] { 't', 'e', 's', 't' });
object str1 = new string(new char[] { 't', 'e', 's', 't' });
Console.WriteLine(str==str1); // false
Console.WriteLine(str.Equals(str1)); // true
I understand how the equality operator works here: because we have implicitly cast to object, the operator compares the two references and returns false.
But I am confused by the second one. Returning true suggests that it calls the Equals override provided by the String type, which checks whether the contents of the strings are equal.
My question is: why doesn't the operator check for content equality as well? Their actual type is string, not object, right?
Meanwhile, the following code outputs true for both:
object str = "test";
object str1 = "test";
Console.WriteLine(str==str1); // true
Console.WriteLine(str.Equals(str1)); // true

With:
Console.WriteLine(str==str1); // false
it is determined at compile-time which C# pre-defined (formal) overload of operator == to use. Since str and str1 are declared as object, the overload operator ==(object, object) is chosen. This is fixed at compile-time. Just because the actual run-time types happen to be more specific, that does not change. If you want binding at run-time, use Console.WriteLine((dynamic)str == (dynamic)str1); /* true */ instead.
With:
Console.WriteLine(str.Equals(str1)); // true
you call a virtual method on object. Virtual means it will go to whatever override is relevant at run-time. The class System.String has an override, and since str will have run-time type System.String, the override will be used by the "virtual dispatch".
Regarding the addition to the bottom of your question: That situation is different because of string interning. String interning is an optimization where the same physical instance is used for formally distinct strings whose values are identical. When you have two strings whose values are given in the source code, string interning will "optimize" and make two references to the same instance. This is usually harmless because strings are guaranteed to be immutable. So normally you do not care if it is the same instance or another instance with identical value. But in your example, we can "reveal" the interning.
Note: String interning was not relevant to your original question. Only after you added a new example to your question, string interning became relevant.
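The three behaviours described in this answer can be put side by side in a small sketch. The class and method names (BindingDemo, Fresh and so on) are made up for illustration; only the language rules come from the answer above.

```csharp
using System;

public static class BindingDemo
{
    // Builds a fresh string at run time so that it is not an interned literal.
    public static object Fresh() => new string(new[] { 't', 'e', 's', 't' });

    // Both operands have compile-time type object, so == is a reference comparison.
    public static bool OperatorOnObjects(object a, object b) => a == b;

    // Equals is virtual: String's override runs and compares the characters.
    public static bool VirtualEquals(object a, object b) => a.Equals(b);

    // dynamic defers operator binding to run time, where both operands are strings.
    public static bool OperatorOnDynamic(object a, object b) => (dynamic)a == (dynamic)b;

    public static void Main()
    {
        object x = Fresh(), y = Fresh();
        Console.WriteLine(OperatorOnObjects(x, y));  // False
        Console.WriteLine(VirtualEquals(x, y));      // True
        Console.WriteLine(OperatorOnDynamic(x, y));  // True

        object p = "test", q = "test";               // identical literals are interned
        Console.WriteLine(OperatorOnObjects(p, q));  // True
    }
}
```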

When == is used on an expression of type object, it resolves to a reference comparison, the same check that System.Object.ReferenceEquals performs.
Equals is just a virtual method and behaves as such, so the overridden version will be used (which, for the string type, compares the contents).

This happens because of string interning; when you write:
object str = "test";
object str1 = "test";
Console.WriteLine(str==str1);
This works as expected because identical string literals are interned: only one copy is stored, so the two references actually point to the same object.
If you create a string from an array of chars, it is built at run time and is not interned, even though it is equivalent to the above. Since string is a reference type, the two are effectively different objects in memory.
Have a look at this article: https://blogs.msdn.microsoft.com/ericlippert/2009/09/28/string-interning-and-string-empty/
The Equals method is overridden in string, therefore it's comparing the actual content of the string rather than the address as == (ReferenceEquals) does in your case as the type is object.
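As a rough illustration of the interning behaviour, the sketch below compares a literal with a string built at run time, and then asks the intern pool for the shared instance via string.Intern and string.IsInterned. The class name InternDemo is an assumption made for the example.

```csharp
using System;

public static class InternDemo
{
    public static void Main()
    {
        string literal = "test";                                  // interned literal
        string built = new string(new[] { 't', 'e', 's', 't' });  // separate object, same text

        // Different objects, even though the contents match.
        Console.WriteLine(object.ReferenceEquals(literal, built));                 // False

        // Intern returns the pooled instance, which is the literal's object.
        Console.WriteLine(object.ReferenceEquals(literal, string.Intern(built)));  // True

        // IsInterned returns the pooled reference if one exists, otherwise null.
        Console.WriteLine(string.IsInterned(built) != null);                       // True
    }
}
```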

I believe it is because the String == operator only takes string operands, while the .Equals method takes an object parameter.
Since the string == operator only takes string operands, overload resolution selects the object == operator for the comparison.
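A short sketch of that overload selection: casting both operands back to string lets the compiler pick the string == operator again. The names OverloadDemo, AsObjects and AsStrings are hypothetical.

```csharp
using System;

public static class OverloadDemo
{
    // Operands typed as object: the predefined reference-equality operator is chosen.
    public static bool AsObjects(object a, object b) => a == b;

    // Operands typed as string: String's overloaded == is chosen and compares contents.
    public static bool AsStrings(object a, object b) => (string)a == (string)b;

    public static void Main()
    {
        object s1 = new string(new[] { 't', 'e', 's', 't' });
        object s2 = new string(new[] { 't', 'e', 's', 't' });

        Console.WriteLine(AsObjects(s1, s2)); // False
        Console.WriteLine(AsStrings(s1, s2)); // True
    }
}
```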

The documentation for the String.Equals method gives this as a remark:
This method performs an ordinal (case-sensitive and
culture-insensitive) comparison.
So, the comparison is done by checking the string char by char, thus giving true.
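To make "ordinal (case-sensitive and culture-insensitive)" concrete, here is a minimal sketch. The overload taking a StringComparison value is shown only as a contrast; it is not part of the quoted remark.

```csharp
using System;

public static class OrdinalDemo
{
    public static void Main()
    {
        string a = "test";

        Console.WriteLine(a.Equals("test"));  // True: same characters
        Console.WriteLine(a.Equals("TEST"));  // False: ordinal comparison is case-sensitive

        // An explicit StringComparison changes the rules used.
        Console.WriteLine(a.Equals("TEST", StringComparison.OrdinalIgnoreCase)); // True
    }
}
```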

Related

C# RequireNonDefaultAttribute strange behaviour: Equals vs == inside IsValid() [duplicate]

What is the difference between a.Equals(b) and a == b for value types, reference types, and strings? It would seem as though a == b works just fine for strings, but I'm trying to be sure to use good coding practices.
From When should I use Equals and when should I use ==:
The Equals method is just a virtual one defined in System.Object, and overridden by whichever classes choose to do so. The == operator is an operator which can be overloaded by classes, but which usually has identity behaviour.
For reference types where == has not been overloaded, it compares whether two references refer to the same object - which is exactly what the implementation of Equals does in System.Object.
Value types do not provide an overload for == by default. However, most of the value types provided by the framework provide their own overload. The default implementation of Equals for a value type is provided by ValueType, and uses reflection to make the comparison, which makes it significantly slower than a type-specific implementation normally would be. This implementation also calls Equals on pairs of references within the two values being compared.
using System;

public class Test
{
    static void Main()
    {
        // Create two equal but distinct strings
        string a = new string(new char[] {'h', 'e', 'l', 'l', 'o'});
        string b = new string(new char[] {'h', 'e', 'l', 'l', 'o'});
        Console.WriteLine (a==b);
        Console.WriteLine (a.Equals(b));
        // Now let's see what happens with the same tests but
        // with variables of type object
        object c = a;
        object d = b;
        Console.WriteLine (c==d);
        Console.WriteLine (c.Equals(d));
    }
}
The result of this short sample program is
True
True
False
True
Here is a great blog post about WHY the implementations are different.
Essentially == is going to be bound at compile time using the types of the variables and .Equals is going to be dynamically bound at runtime.
In the most shorthand answer:
The == operator checks identity (i.e. a == b: are these two the same object?).
.Equals() checks value (i.e. a.Equals(b): do both hold identical values?).
With one exception:
For string and predefined value types (such as int, float, etc.),
the == operator compares values rather than identity (the same as using .Equals()).
One significant difference between them is that == is a static binary operator that works on two instances of a type whereas Equals is an instance method. The reason this matters is that you can do this:
Foo foo = new Foo();
Foo foo2 = null;
bool same = foo2 == foo; // false, no exception
But you cannot do this without throwing a NullReferenceException:
Foo foo = new Foo();
Foo foo2 = null;
foo2.Equals(foo); // throws NullReferenceException
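If a null-safe call to Equals is wanted, the static object.Equals(a, b) method or the null-conditional operator are two common options. The sketch below assumes an empty placeholder class Foo, as in the snippets above.

```csharp
using System;

public class Foo { }

public static class NullDemo
{
    public static void Main()
    {
        Foo foo = new Foo();
        Foo foo2 = null;

        Console.WriteLine(foo2 == foo);                 // False: operators are static, null is fine
        Console.WriteLine(object.Equals(foo2, foo));    // False: static helper checks for null first
        Console.WriteLine(foo2?.Equals(foo) ?? false);  // False: call is skipped when foo2 is null

        try
        {
            foo2.Equals(foo);                           // instance call on a null reference
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException");
        }
    }
}
```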
At a simple level, the difference is which method is called. The == operator will attempt to bind to operator == if one is defined for the types in question. If none is found, reference types get a reference comparison; the predefined value types have built-in value comparisons, while a user-defined struct without its own operator == cannot be compared with == at all (it is a compile-time error). A .Equals call will do a virtual dispatch on the .Equals method.
As to what the particular methods do, it's all in the code. Users can define / override these methods and do anything they please. Ideally these methods should be equivalent (sorry for the pun) and have the same output, but that is not always the case.
One simple way to help remember the difference is that a.Equals(b) is more analogous to
a == (object)b.
The .Equals() method is not generic and accepts an argument of type "object", and so when comparing to the == operator you have to think about it as if the right-hand operand were cast to object first.
One implication is that a.Equals(b) will nearly always return some value for a and b, regardless of type (the normal way to override is to just return false if b is an unknown type). a == b will simply fail to compile if there's no comparison available for those types.
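The int and long types give a compact illustration of this asymmetry. The sketch only uses built-in types; the class name TypeMismatchDemo is made up.

```csharp
using System;

public static class TypeMismatchDemo
{
    public static void Main()
    {
        int i = 1;
        long l = 1;

        Console.WriteLine(i == l);         // True: i is widened to long before comparing
        Console.WriteLine(l.Equals(i));    // True: binds to Int64.Equals(long), i is widened
        Console.WriteLine(i.Equals(l));    // False: binds to Int32.Equals(object); a boxed long is not an int
        Console.WriteLine(i.Equals("1"));  // False: argument is a different type

        // bool b = i == "1";              // does not compile: no == operator for int and string
    }
}
```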
"==" is an operator that can be overloaded to perform different things based on the types being compared.
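For reference types that do not overload it, the default operation performed by "==" is a reference comparison, not a.Equals(b).
Here's what an overload looks like, written as if it were declared inside the string class (for illustration only: operators must be declared in the type they apply to, so you cannot add one to string yourself):
public static bool operator == (string str1, string str2)
{
    // Deliberately odd: treats strings of equal length as equal.
    return str1.Length == str2.Length;
}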
Note that this is different than str1.Equals(str2);
Derived classes can also override and redefine Equals().
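For a type you do own, a consistent set of overloads might look like the sketch below. Money is a hypothetical class invented for this example; the point is that ==, !=, Equals and GetHashCode are kept in agreement, and that == still falls back to a reference comparison once the variables are typed as object.

```csharp
using System;

// Hypothetical value-like class used only for illustration.
public sealed class Money : IEquatable<Money>
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // Value comparison shared by Equals(object) and the operators.
    public bool Equals(Money other) =>
        !ReferenceEquals(other, null) && Amount == other.Amount && Currency == other.Currency;

    public override bool Equals(object obj) => Equals(obj as Money);

    public override int GetHashCode() => (Amount, Currency).GetHashCode();

    // Handles null on either side before delegating to Equals.
    public static bool operator ==(Money a, Money b) =>
        ReferenceEquals(a, b) || (!ReferenceEquals(a, null) && a.Equals(b));

    public static bool operator !=(Money a, Money b) => !(a == b);
}

public static class MoneyDemo
{
    public static void Main()
    {
        var a = new Money(5m, "EUR");
        var b = new Money(5m, "EUR");
        object oa = a, ob = b;

        Console.WriteLine(a == b);         // True: Money's overloaded operator
        Console.WriteLine(oa == ob);       // False: object == compares references
        Console.WriteLine(oa.Equals(ob));  // True: virtual dispatch to Money.Equals
    }
}
```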
As far as "best practices" go, it depends on your intent.
For strings you want to be careful of culture-specific comparisons. The classic example is the German sharp S (ß), which looks a bit like a B. This should match "ss" but doesn't in a simple == comparison.
For string comparisons that are culture sensitive use: String.Compare(expected, value, StringComparison....) == 0 ? with the StringComparison overload you need.
By default, both == and .Equals() are equivalent apart from the possibility of calling .Equals() on a null instance (which would give you a NullReferenceException). You can, however, override the functionality of either of them independently (though I'm not sure that would ever be a good idea unless you're trying to work around the shortcomings of another system), which would mean you could MAKE them different.
You'll find people on both sides of the aisle as to the one to use. I prefer the operator rather than the function.
If you're talking about strings, though, it's likely a better idea to use string.Compare() instead of either one of those options.

Does casting null to string cause boxing?

Imagine code like this:
var str = (String)null;
Does it differ from:
String str;
Or:
String str = null;
Does the first code cause boxing of the null value, or is it rather resolved at compile time to string?
String is a reference type, so no, there's no boxing.
var str = (String)null;
String str = null;
These two are equivalent. In the first line, the compiler infers the type of str from the right hand side of the expression. In the second line, the cast from null to string is implicit.
String str;
The last one is equivalent to String str = null if it's a field declaration, which means str will be assigned its default value, which is null. If, however, str is a local variable, it'll have to be explicitly assigned before you can use it.
Let's take your question and pick it apart.
Will the code in your question cause boxing?
No, it will not.
This is not because any of the 3 statements operate differently (there are differences though, more below), but boxing is not a concept that occurs when using strings.
Boxing occurs when you take a value type and wrap it up into an object. A string is a reference type, and thus there will never be boxing involved with it.
So boxing is out, what about the rest, are the three statements equal?
These two will do the same:
var str = (String)null;
String str = null;
The third one (second one in the order of your question though) is different in the sense that it only declares the str identifier to be of type String, it does not specifically initialize it to null.
However, if this is a field declaration of a class, this will be the same since all fields are initialized to defaults / zeroes when an object is constructed, and thus it will actually be initialized to null anyway.
If, on the other hand, this is a local variable, you now have an uninitialized variable. Judging from the fact that you write var ..., which is illegal in terms of fields, this is probably more correct for your question.
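The distinctions drawn above can be sketched in one place: the two equivalent null assignments, a field that defaults to null, and, for contrast, what real boxing of a value type looks like. The class and field names are placeholders.

```csharp
using System;

public class NullVsBoxingDemo
{
    private string field;   // never assigned: fields default to null

    public static void Main()
    {
        var a = (string)null;   // var needs the cast to infer the type string
        string b = null;        // implicit conversion from the null literal

        // string c;
        // Console.WriteLine(c);  // does not compile: use of unassigned local variable

        Console.WriteLine(a == b);                                // True
        Console.WriteLine(new NullVsBoxingDemo().field == null);  // True

        int n = 42;
        object boxed = n;          // boxing: the int value is copied into a new object
        int unboxed = (int)boxed;  // unboxing copies it back out
        Console.WriteLine(unboxed);                               // 42
    }
}
```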
MSDN says,
Boxing is the process of converting a value type to the type object
String is not a value type and so there will be no boxing/unboxing.
Yes, they are equal. Since string is a reference type, even if you just write string str; as a field it will get its default value, which is null.
These two are equal:
var str = (String)null;
String str = null;
However, this one,
String str;
Depending on the context might or might not be equal to previous expressions. If it's a local variable, then it's not equal. You must explicitly initialise it. If it's a class variable, then it's initialised to null.
None of them causes boxing.

Is it possible to create a string that's not reference-equal to any other string?

It seems like .NET goes out of its way to make strings that are equal by value equal by reference.
In LINQPad, I tried the following, hoping it'd bypass interning string constants:
var s1 = new string("".ToCharArray());
var s2 = new string("".ToCharArray());
object.ReferenceEquals(s1, s2).Dump();
but that returns true. However, I want to create a string that's reliably distinguishable from any other string object.
(The use case is creating a sentinel value to use for an optional parameter. I'm wrapping WebForms' Page.Validate(), and I want to choose the appropriate overload depending on whether the caller gave me the optional validation group argument. So I want to be able to detect whether the caller omitted that argument, or whether he passed a value that happens to be equal to my default value. Obviously there are other, less arcane ways of approaching this specific use case; the aim of this question is more academic.)
It seems like .NET goes out of its way to make strings that are equal
by value equal by reference.
Actually, there are really only two special cases for strings that exhibit behavior like what you're describing here:
String literals in your code are interned, so the same literal in two places will result in a reference to the same object.
The empty string is a particularly weird case, where as far as I know literally every empty string in a .NET program is in fact the same object (i.e., "every empty string" constitutes a single string). This is the only case I know of in .NET where using the new keyword (on a class) may potentially not result in the allocation of a new object.
From your question I get the impression you already knew about the first case. The second case is the one you've stumbled across. As others have pointed out, if you just go ahead and use a non-empty string, you'll find it's quite easy to create a string that isn't reference-equal to any other string in your program:
public static string Sentinel = new string(new char[] { 'x' });
As a little editorial aside, I actually wouldn't mind this so much (as long as it were documented); but it kind of irks me that the CLR folks (?) implemented this optimization without also going ahead and doing the same for arrays. That is, it seems to me they might as well have gone ahead and made every new T[0] refer to the same object too. Or, you know, not done that for strings either.
If the strings are ReferenceEqual, they are the same object. When you call new string(new char[0]), you don't get a new object that happens to be reference-equal to string.Empty; that would be impossible. Rather, you get a new reference to the already-created string.Empty instance. This is a result of special-case code in the string constructor.
Try this:
var s1 = new string(new char[] { 'A', 'b' });
var s2 = new string(new char[] { 'A', 'b' });
object.ReferenceEquals(s1, s2).Dump();
Also, beware that string constants are interned, so all instances of the literal "Ab" in your code will be reference equal to one another, because they all refer to the same string object. Constant folding applies, too, so the constant expression "A" + "b" will also be reference equal to "Ab".
Your sentinel value, therefore, can be a privately-created non-zero-length string.
You can put non-printable characters into the string... even the 0/nul character. But really, I'd just use null for the sentinel value, and try to ensure code elsewhere is using the empty string instead of null.
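A possible shape for the sentinel approach is sketched below. It uses an overload rather than an optional parameter, because optional parameter defaults must be compile-time constants and a string constant would be an interned literal. ValidationHelper, NoGroup and the returned messages are all hypothetical; the real Page.Validate calls are left out.

```csharp
using System;

public static class ValidationHelper
{
    // Built at run time, so it is a distinct object from any literal with the same text.
    private static readonly string NoGroup = new string(new[] { 'n', 'o', 'n', 'e' });

    // Caller omitted the group: pass the private sentinel along.
    public static string Validate() => Validate(NoGroup);

    // Reference comparison tells the sentinel apart from an equal-valued argument.
    public static string Validate(string validationGroup) =>
        object.ReferenceEquals(validationGroup, NoGroup)
            ? "validate all"
            : "validate group '" + validationGroup + "'";

    public static void Main()
    {
        Console.WriteLine(Validate());        // validate all
        Console.WriteLine(Validate("none"));  // validate group 'none'
    }
}
```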
So I want to be able to detect whether the caller omitted that argument, or whether he passed a value that happens to be equal to my default value.
I've never done this before, but my thoughts would be to make a Nullable class... but instead of Nullable it would be Parameter and would keep track of whether or not it has been assigned anything (including null).
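One way such a wrapper could look is sketched below. Parameter<T>, IsSet and Describe are invented names, and the generic struct form is an assumption about what the answer had in mind; a default-valued struct means "omitted", while anything converted from T counts as supplied, even null.

```csharp
using System;

// Hypothetical wrapper: records whether a value was supplied at all.
public readonly struct Parameter<T>
{
    public T Value { get; }
    public bool IsSet { get; }

    public Parameter(T value)
    {
        Value = value;
        IsSet = true;
    }

    // Lets callers pass a plain T where a Parameter<T> is expected.
    public static implicit operator Parameter<T>(T value) => new Parameter<T>(value);
}

public static class ParameterDemo
{
    // default(Parameter<string>) has IsSet == false, which marks an omitted argument.
    public static string Describe(Parameter<string> group = default) =>
        group.IsSet ? "group: " + (group.Value ?? "null") : "omitted";

    public static void Main()
    {
        Console.WriteLine(Describe());              // omitted
        Console.WriteLine(Describe("login"));       // group: login
        Console.WriteLine(Describe((string)null));  // group: null
    }
}
```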

What's the difference between these two comparison statements?

What's the difference between these two comparison statements?
var result = EqualityComparer<T>.Default.Equals(@this, null);
var result = @this == null;
Obviously the aim is to test whether the object '@this' is null.
Well, it depends on the type of @this. If it doesn't have an overload of ==, the second line will just perform a direct reference comparison, whereas the first line will call an overridden Equals method or an implementation of IEquatable<T>.Equals.
Any sensible implementation will give the same result for both comparisons.
The first statement calls the Equals() method between objects to see if their values are equal, assuming it has been overridden and implemented in the class T. The second statement compares the references instead, unless the == operator has been overloaded, as in the String class.
Operator == on object references performs a ReferenceEquals-style check, so it tests whether both references point to the same memory location.
Equals, instead, is just a virtual method, so it can behave differently for different types, as it can be overridden.
For example, for the CLR string type, Equals compares the content of the strings and not the references, even though string is a reference type.
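The difference becomes visible in generic code, where == on a type parameter constrained to class is bound as a reference comparison even when T turns out to be string. The sketch below uses made-up method names ByOperator and ByComparer.

```csharp
using System;
using System.Collections.Generic;

public static class ComparisonDemo
{
    // == on T is bound at compile time, when nothing is known about T beyond "class".
    public static bool ByOperator<T>(T a, T b) where T : class => a == b;

    // The default comparer uses IEquatable<T> or the overridden Equals, and tolerates null.
    public static bool ByComparer<T>(T a, T b) => EqualityComparer<T>.Default.Equals(a, b);

    public static void Main()
    {
        string x = new string(new[] { 'a', 'b' });
        string y = new string(new[] { 'a', 'b' });

        Console.WriteLine(ByOperator(x, y));                // False: distinct objects
        Console.WriteLine(ByComparer(x, y));                // True: same contents
        Console.WriteLine(ByOperator<string>(null, null));  // True
        Console.WriteLine(ByComparer<string>(x, null));     // False
    }
}
```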

Why does: string st = "" + 12; work in c# without conversion?

This is super dumb, but I've googled and checked references and I just cannot find an answer... Why can an int or float etc. be added as part of a string without conversion, but not on its own? That is:
while this works fine:
string st = "" + 12;
this doesn't (of course):
string st = 12;
Where is the magic here? I know it works I just want to know WHY it works and how I control HOW the conversion is done?
In the first statement, the left operand for the + is a string, and as such + becomes the concatenation operator. The compiler finds the overload for that operator which takes a string and an arbitrary value as operands. This converts both operands to strings (using ToString()) then joins them.
The second statement does not work because there's no implicit cast from int to string.
You can control how the conversion is done by using parentheses to change the order of operations (semi-effective) or by writing code to handle the conversions pre-emptively.
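As a small illustration of both points, the sketch below shows how parentheses change what gets concatenated, and how an explicit ToString call with a format string and culture gives control over the conversion. The specific format strings are just examples.

```csharp
using System;
using System.Globalization;

public static class ConversionDemo
{
    public static void Main()
    {
        // + is left-associative: once a string is on the left, everything is concatenated.
        Console.WriteLine("" + 1 + 2);    // 12
        Console.WriteLine("" + (1 + 2));  // 3
        Console.WriteLine(1 + 2 + "");    // 3

        // Converting explicitly lets you pick the format and the culture.
        Console.WriteLine(12.ToString("D4"));                                    // 0012
        Console.WriteLine(1234.5.ToString("F2", CultureInfo.InvariantCulture));  // 1234.50
    }
}
```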
This is how string concatenation is designed to work, and as BoltClock's answer noted, the compiler is using the + as the string concatenation operator. From the C# Language Specification on string concatenation, we find that:
Any non-string argument is converted to its string representation by invoking the virtual ToString method inherited from type object.
String concats in .NET ultimately resolve to calls to one of the overloads of the static String.Concat methods. This is an optimization to reduce the number of temporary strings that would otherwise be created when multiple concatenations occur in a single statement.
In short, the reason this works is that a number of the String.Concat overloads accept object in the argument list, and since int, float, etc. are in essence objects, they can be passed to the Concat overload that accepts one or more object parameters. Internally, Concat basically calls .ToString() on the incoming object, thereby turning your int into its string representation.
In your specific example
string st = "" + 12;
The compiler will recognize that the first string is empty and simply call the String.Concat(object) overload, which will convert the integer 12 to a string and assign it to st.
This overload is called because the integer can be implicitly boxed to fit into the object type and therefore satisfy the method overload selection.
Because the compiler will call .ToString() on all objects if one of the parameters is a string.
This is working because of the operator overload of the operator + on the string datatype.
Because the first is an expression, and there C# makes an implicit conversion.
The second is an assignment of a static value, and the static value has no methods to be called.
As long as the value to be assigned is a variable or the result of an expression, there will be a ToString method to call.
It's because the compiler translates the addition to a call to String.Concat().
Internally, all operands are boxed (if necessary) and passed to the String.Concat method (which of course calls ToString() on all arguments).
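Written out by hand, the translation described above would look roughly like this. Which exact call the compiler emits depends on the compiler version, so treat the explicit String.Concat call as an approximation of the generated code rather than a guarantee.

```csharp
using System;

public static class ConcatDemo
{
    public static void Main()
    {
        string viaOperator = "" + 12;

        // Roughly what the + above turns into: 12 is boxed, then ToString() is called on it.
        string viaConcat = string.Concat((object)"", (object)12);

        // The conversion itself is just Int32.ToString().
        string viaToString = 12.ToString();

        Console.WriteLine(viaOperator);                 // 12
        Console.WriteLine(viaOperator == viaConcat);    // True
        Console.WriteLine(viaOperator == viaToString);  // True
    }
}
```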
