Does casting null to string cause boxing? - c#

Imagine code like this:
var str = (String)null;
Does it differ from:
String str;
Or:
String str = null;
Does the first snippet cause boxing of the null value, or is it resolved to string at compile time?

String is a reference type, so no, there's no boxing.
var str = (String)null;
String str = null;
These two are equivalent. In the first line, the compiler infers the type of str from the right hand side of the expression. In the second line, the cast from null to string is implicit.
String str;
The last one is equivalent to String str = null if it's a field declaration, which means str will be assigned its default value, which is null. If, however, str is a local variable, it'll have to be explicitly assigned before you can use it.
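A minimal sketch of the three declarations side by side. The class name Holder and the member names are hypothetical, chosen only for illustration:

```csharp
using System;

class Holder
{
    // Field with no initializer: the runtime sets it to default(string), which is null.
    public string Field;
}

static class Program
{
    static void Main()
    {
        var a = (String)null;   // the cast only gives 'var' a type to infer; no boxing
        String b = null;        // implicit conversion from the null literal

        Console.WriteLine(a == null);                  // True
        Console.WriteLine(b == null);                  // True
        Console.WriteLine(new Holder().Field == null); // True

        // String c;
        // Console.WriteLine(c); // would not compile: use of unassigned local variable 'c'
    }
}
```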

Let's take your question and pick it apart.
Will the code in your question cause boxing?
No, it will not.
This is not because the three statements behave identically (they differ, as explained below), but because boxing is not a concept that applies to strings.
Boxing occurs when you take a value type and wrap it up into an object. A string is a reference type, and thus there will never be boxing involved with it.
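To make the distinction concrete, here is a small sketch contrasting a value type with a string. The variable names are made up for the example:

```csharp
using System;

static class BoxingDemo
{
    static void Main()
    {
        int number = 42;
        object boxed = number;      // boxing: the int value is copied into a new heap object
        int unboxed = (int)boxed;   // unboxing: the value is copied back out

        string text = "hello";
        object sameText = text;     // reference conversion only: no copy, no boxing

        Console.WriteLine(unboxed);                                        // 42
        Console.WriteLine(object.ReferenceEquals(text, sameText));         // True
        Console.WriteLine(object.ReferenceEquals(boxed, (object)number));  // False: a second, separate box
    }
}
```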
So boxing is out, what about the rest, are the three statements equal?
These two will do the same:
var str = (String)null;
String str = null;
The third one (second one in the order of your question though) is different in the sense that it only declares the str identifier to be of type String, it does not specifically initialize it to null.
However, if this is a field declaration of a class, this will be the same since all fields are initialized to defaults / zeroes when an object is constructed, and thus it will actually be initialized to null anyway.
If, on the other hand, this is a local variable, you now have an uninitialized variable. Judging from the fact that you write var ..., which is illegal in terms of fields, this is probably more correct for your question.

MSDN says,
Boxing is the process of converting a value type to the type object
String is not a value type and so there will be no boxing/unboxing.
Yes, they are equal. Since string is a reference type, even string str; gets the default value null when it is a field. A local variable, however, must be assigned before it is read.

These two are equal:
var str = (String)null;
String str = null;
However, this one,
String str;
Depending on the context, it might or might not be equivalent to the previous declarations. If it's a local variable, it is not: you must explicitly initialise it. If it's a field of a class, it is initialised to null.
None of them causes boxing.

Related

Why comparing two strings as object causes unexpected result

Consider the following piece of code.
object str = new string(new char[] { 't', 'e', 's', 't' });
object str1 = new string(new char[] { 't', 'e', 's', 't' });
Console.WriteLine(str==str1); // false
Console.WriteLine(str.Equals(str1)); // true
I understand how the equality operator works here: since we have implicitly cast to object, the equality operator checks whether the two references are equal and returns false.
But I am confused by the second one. Returning true suggests that it calls the Equals override provided by the String type, which checks whether the contents of the strings are equal.
My question is why the operator doesn't check for content equality as well. Their actual type is string, not object, right?
Meanwhile, the following code outputs true for both:
object str = "test";
object str1 = "test";
Console.WriteLine(str==str1); // true
Console.WriteLine(str.Equals(str1)); // true
With:
Console.WriteLine(str==str1); // false
it is determined at compile-time which C# pre-defined (formal) overload of operator == to use. Since str and str1 are declared as object, the overload operator ==(object, object) is chosen. This is fixed at compile-time. Just because the actual run-time types happen to be more specific, that does not change. If you want binding at run-time, use Console.WriteLine((dynamic)str == (dynamic)str1); /* true */ instead.
With:
Console.WriteLine(str.Equals(str1)); // true
you call a virtual method on object. Virtual means it will go to whatever override is relevant at run-time. The class System.String has an override, and since str will have run-time type System.String, the override will be used by the "virtual dispatch".
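A short sketch of the two binding rules described above. It also shows that casting both operands back to string makes the compiler pick the string overload of ==; the class name is hypothetical:

```csharp
using System;

static class EqualityDemo
{
    static void Main()
    {
        object a = new string(new[] { 't', 'e', 's', 't' });
        object b = new string(new[] { 't', 'e', 's', 't' });

        Console.WriteLine(a == b);                 // False: operator ==(object, object), a reference comparison
        Console.WriteLine(a.Equals(b));            // True: the virtual call reaches String.Equals
        Console.WriteLine((string)a == (string)b); // True: operator ==(string, string) is chosen at compile time
    }
}
```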
Regarding the addition to the bottom of your question: That situation is different because of string interning. String interning is an optimization where the same physical instance is used for formally distinct strings whose values are identical. When you have two strings whose values are given in the source code, string interning will "optimize" and make two references to the same instance. This is usually harmless because strings are guaranteed to be immutable. So normally you do not care if it is the same instance or another instance with identical value. But in your example, we can "reveal" the interning.
Note: String interning was not relevant to your original question. Only after you added a new example to your question, string interning became relevant.
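Interning can be observed directly with object.ReferenceEquals and string.Intern. This is only an illustrative sketch, and the variable names are made up:

```csharp
using System;

static class InterningDemo
{
    static void Main()
    {
        string literal = "test";
        string built = new string(new[] { 't', 'e', 's', 't' });

        Console.WriteLine(object.ReferenceEquals(literal, "test"));               // True: same interned literal
        Console.WriteLine(object.ReferenceEquals(literal, built));                // False: built at run time
        Console.WriteLine(object.ReferenceEquals(literal, string.Intern(built))); // True: Intern returns the pooled instance
    }
}
```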
When == is used on an expression of type object, it'll resolve to System.Object.ReferenceEquals.
Equals is just a virtual method and behaves as such, so the overridden version will be used (which, for string type compares the contents).
This happens because of string interning; when you write:
object str = "test";
object str1 = "test";
Console.WriteLine(str==str1);
This works as expected because the compiler silently stores the two identical literals in one location, so the two references actually point to the same object.
If you create a string from an array of chars, the compiler is not clever enough to understand that it is equivalent to the above, so, string being a reference type, you effectively get two different objects in memory.
Have a look at this article: https://blogs.msdn.microsoft.com/ericlippert/2009/09/28/string-interning-and-string-empty/
The Equals method is overridden in string, therefore it's comparing the actual content of the string rather than the address as == (ReferenceEquals) does in your case as the type is object.
I believe it is because the String == operator only takes string types as parameters, while the .Equals method takes object types as parameters.
Since the string == operator only takes string types as parameters, overload resolution selects the object == operator for the comparison.
The documentation for the String.Equals method gives this as a remark:
This method performs an ordinal (case-sensitive and
culture-insensitive) comparison.
So, the comparison is done by checking the string char by char, thus giving true.

"string s = null" is technically incorrect?

Please let me know if this is in the wrong place, or let me know a better place for it.
This question is not about the syntax, more the idea behind it.
I would like to know what null is essentially, or isn't as the case may be
First off, I just want to clarify exactly what null is. By my understanding, null is
technically nothing. So the statement
string s = null;
is technically incorrect? You are assigning a variable no value but a variable can't not have a value? Am I correct in thinking this?
My idea of null is that it's kind of like thinking, "If I go and get a drink of water, I will need a cup to put the water in". null is the space in which the data will be placed in, but the space doesn't exist yet. Following this idea:
string s = "";
would be more appropriate, no? And along this thought (though a bit less confusing)
int n = 0;
follows the same idea, where 0 is a value, but the value of 0 is nothing?
The original line of code is quite valid. In C#, any reference type variable can have no value, i.e. it refers to no object. null is quite different to an empty string. Consider a variable of some other type, e.g. Form. What would be the equivalent of an empty string then? A null in C# is basically the same as a NULL in a database.
Value types are a bit different. Because reference type variables contain a reference, it is possible for them to refer to no object. Value type variables, on the other hand, contain a value and so cannot be null. A struct or enum is a value type and a class or delegate is a reference type. So, if a variable on the stack contains the value zero then, for a reference type that means no object, i.e. null, and for a value type that means the default value for that type, e.g. zero for numbers, false for bool and DateTime.MinValue for DateTime.
C# references are basically tarted-up pointers. Just as in C/C++ a pointer can be null to point to no object, so a C# reference-type variable can be null to refer to no object. In the case of strings, an empty string is an object and is very different to null. An empty balloon is still a balloon and very different to no balloon at all.
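The balloon analogy can be checked in a few lines. This is a minimal sketch with made-up variable names:

```csharp
using System;

static class NullVersusEmpty
{
    static void Main()
    {
        string none = null;   // refers to no object
        string empty = "";    // refers to a real string object of length 0

        Console.WriteLine(empty.Length);                 // 0
        Console.WriteLine(none == empty);                // False
        Console.WriteLine(string.IsNullOrEmpty(none));   // True
        Console.WriteLine(string.IsNullOrEmpty(empty));  // True

        try
        {
            Console.WriteLine(none.Length);              // there is no object to ask
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("null has no Length");
        }
    }
}
```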
Your confusion likely lies in the difference between reference types and value types.
void Foo() {
string s = null;
}
This does not create a string. Instead, s is a reference to a string. However, s is currently referring to (or pointing to, in C terminology) nothing.
s ---> [nothing]
Now when we do this:
s = "Stack Overflow";
we are making s refer to that string. s itself doesn't contain the string.
s ----> "Stack Overflow"
Note that "" itself is a string, and does exist
s ----> ""
Strings are actually a bad example because of string interning.
Value types on the other hand line up with your confusion.
An int for example, must have a value. If you don't assign it one, it will (generally) take the default value of 0.
See more:
Value Types and Reference Types
This is not wrong as much as it is redundant because null is the default value of string
as int num = 0 is redundant because 0 is the default value of int
If you need to initialize your string then you should go for string s = "" or my personal favorite string s = string.Empty
Null means nothing. 0 is a value, but null means that there is no value. Actually in C#:
string s=null;
means that the variable s refers to no instance of the class string, so it has no value.
Imagine string s is just a variable pointing to a place in memory which was allocated for you.
string s = null; allocates a variable, but it is not pointing to any place in memory.
string s = ""; might look the same, but it points to a slot in memory containing a literal with an empty string.
I hope that made the problem a bit more clear.

Is it possible to create a string that's not reference-equal to any other string?

It seems like .NET goes out of its way to make strings that are equal by value equal by reference.
In LINQPad, I tried the following, hoping it'd bypass interning string constants:
var s1 = new string("".ToCharArray());
var s2 = new string("".ToCharArray());
object.ReferenceEquals(s1, s2).Dump();
but that returns true. However, I want to create a string that's reliably distinguishable from any other string object.
(The use case is creating a sentinel value to use for an optional parameter. I'm wrapping WebForms' Page.Validate(), and I want to choose the appropriate overload depending on whether the caller gave me the optional validation group argument. So I want to be able to detect whether the caller omitted that argument, or whether he passed a value that happens to be equal to my default value. Obviously there are other, less arcane ways of approaching this specific use case; the aim of this question is more academic.)
It seems like .NET goes out of its way to make strings that are equal
by value equal by reference.
Actually, there are really only two special cases for strings that exhibit behavior like what you're describing here:
String literals in your code are interned, so the same literal in two places will result in a reference to the same object.
The empty string is a particularly weird case, where as far as I know literally every empty string in a .NET program is in fact the same object (i.e., "every empty string" constitutes a single string). This is the only case I know of in .NET where using the new keyword (on a class) may potentially not result in the allocation of a new object.
From your question I get the impression you already knew about the first case. The second case is the one you've stumbled across. As others have pointed out, if you just go ahead and use a non-empty string, you'll find it's quite easy to create a string that isn't reference-equal to any other string in your program:
public static string Sentinel = new string(new char[] { 'x' });
As a little editorial aside, I actually wouldn't mind this so much (as long as it were documented); but it kind of irks me that the CLR folks (?) implemented this optimization without also going ahead and doing the same for arrays. That is, it seems to me they might as well have gone ahead and made every new T[0] refer to the same object too. Or, you know, not done that for strings either.
If the strings are ReferenceEqual, they are the same object. When you call new string(new char[0]), you don't get a new object that happens to be reference-equal to string.Empty; that would be impossible. Rather, you get a new reference to the already-created string.Empty instance. This is a result of special-case code in the string constructor.
Try this:
var s1 = new string(new char[] { 'A', 'b' });
var s2 = new string(new char[] { 'A', 'b' });
object.ReferenceEquals(s1, s2).Dump();
Also, beware that string constants are interned, so all instances of the literal "Ab" in your code will be reference equal to one another, because they all refer to the same string object. Constant folding applies, too, so the constant expression "A" + "b" will also be reference equal to "Ab".
Your sentinel value, therefore, can be a privately-created non-zero-length string.
You can put non-printable characters into the string... even the 0/nul character. But really, I'd just use null for the sentinel value, and try to ensure code elsewhere is using the empty string instead of null.
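As a rough sketch of the sentinel idea: a privately created, non-empty string can be told apart from an equal-valued literal by comparing references. The names Sentinel and IsSentinel are hypothetical:

```csharp
using System;

static class SentinelDemo
{
    // Built at run time, so it is not the interned literal "x".
    static readonly string Sentinel = new string(new[] { 'x' });

    // True only for the sentinel object itself, not for strings with equal contents.
    public static bool IsSentinel(string value) => object.ReferenceEquals(value, Sentinel);

    static void Main()
    {
        Console.WriteLine(IsSentinel(Sentinel));   // True
        Console.WriteLine(IsSentinel("x"));        // False: the literal is a different object
        Console.WriteLine(Sentinel == "x");        // True: same contents
    }
}
```

Note that a C# optional parameter needs a compile-time constant as its default, so a sentinel like this would have to be passed in from a parameterless overload rather than used as the default value itself.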
So I want to be able to detect whether the caller omitted that argument, or whether he passed a value that happens to be equal to my default value.
I've never done this before, but my thoughts would be to make a Nullable class... but instead of Nullable it would be Parameter and would keep track on whether or not it has been assigned anything (including null).
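One possible shape for such a wrapper, purely as a sketch. The Parameter type, its members, and the Describe method are all hypothetical and not part of any framework:

```csharp
using System;

// Hypothetical type: records whether a value was supplied, even if that value is null.
readonly struct Parameter<T>
{
    public bool IsAssigned { get; }
    public T Value { get; }

    public Parameter(T value)
    {
        IsAssigned = true;
        Value = value;
    }

    // Lets callers pass a plain T where a Parameter<T> is expected.
    public static implicit operator Parameter<T>(T value) => new Parameter<T>(value);
}

static class ParameterDemo
{
    // default(Parameter<string>) has IsAssigned == false, which marks "argument omitted".
    public static string Describe(Parameter<string> group = default(Parameter<string>))
    {
        return group.IsAssigned ? "given: " + (group.Value ?? "null") : "omitted";
    }

    static void Main()
    {
        Console.WriteLine(Describe());               // omitted
        Console.WriteLine(Describe("abc"));          // given: abc
        Console.WriteLine(Describe((string)null));   // given: null
    }
}
```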

What does assigning variable to null do?

In the following I get a compile time error that says "Use of unassigned local variable 'match'"
if I just enter string match; but it works when I use string match = null;
So what is the difference and in general, if a string is not being assigned a value right away should I be assigning to null like this?
string question = "Why do I need to assign to null";
char[] delim = { ' ' };
string[] strArr = question.Split(delim);
//Throws Error
string match;
//No Error
//string match = null;
foreach (string s in strArr)
{
if (s == "Why")
{
match = "Why";
}
}
Console.WriteLine(match);
The C# language prevents the use of a local until it has been definitely assigned a value. In this example the compiler doesn't understand the semantics of Split and has to assume that strArr can be an empty collection, and hence that the body of the loop might not execute. This means that, from a definite assignment perspective, the foreach doesn't assign match a value. Hence it's still unassigned when you get to WriteLine.
By changing the declaration to string match = null, the variable is marked as definitely assigned from the very start, so the analysis of the loop no longer matters.
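A compilable variant of the code from the question, sketched as a method so that both outcomes can be seen. The method name FindMatch and the "(no match)" fallback are additions for illustration:

```csharp
using System;

static class DefiniteAssignmentDemo
{
    public static string FindMatch(string question)
    {
        string[] words = question.Split(' ');

        string match = null;          // definitely assigned from here on
        foreach (string word in words)
        {
            if (word == "Why")
            {
                match = word;
            }
        }
        return match;                 // compiles; still null if nothing matched
    }

    static void Main()
    {
        Console.WriteLine(FindMatch("Why do I need to assign to null") ?? "(no match)"); // Why
        Console.WriteLine(FindMatch("No such word here") ?? "(no match)");               // (no match)
    }
}
```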
Depends on your scenario, though:
string match = null;
Or:
string match = string.Empty;
are both acceptable practices.
In your case it is possible for match never to have a value assigned, thus the compiler error.
You are finding the difference between declaration and assignment. Declaration, with lines like
string match;
simply declares to the compiler that you will be using a variable match of type string. Assignment, with lines like
match = null;
assigns the value null to match.
It is possible for a language to declare that declaration and assignment must always be separated (I'm not 100% sure, but I believe that old versions of Visual Basic did this), but most languages allow you to combine declaration and assignment, writing
string match = null; // combined declaration and assignment
to mean
string match; // declaration
match = null; // assignment
C# requires that variables be assigned before they are used. Unlike fields and events, local variables aren't automatically assigned default values, so you have to prove to the compiler that, before you use match, match will have some value. The compiler doesn't care which value match has, as long as that variable is of type string.
In your case, the compiler can't prove with local analysis that strArr will be nonempty because the compiler doesn't inspect the code of Split, so there is no guarantee that the code will even enter the foreach loop, let alone meet the condition to assign to match. Since the Console.WriteLine call uses match, and since match may not be assigned at runtime with the string match declaration, the compiler requires you to assign match outside the loop. One way to meet the requirement is to use string match = null instead of string match.
The compiler has realised that there is a chance you can use match without it ever being assigned to anything. The foreach loop may never get executed. So you have declared the variable, but the compiler has realised it can be accessed without ever being assigned, hence the error.
You have the if() block in there which would initialize the variable 'match' if the condition is met. In that case, match is an object representing actual block in memory.
However, if the if() condition isn't met, there is no 'else' block that does a default initialization of the 'match' variable, in which case you'll be attempting to access a non-initialized object, which would fail.
you can work around this by:
As you commented, default initializing 'match' before the for-loop.
Adding a default 'else' condition after the for-loop.
Luckily, if you're working in an IDE, it points this out to you as a compile-time error.
When you state, type variable = null;, you are initializing the variable. If you state type variable;, you are only declaring the variable.

What is the difference in string.Equals("string") and "String".Equals(string)?

Is there any difference in following two lines of code that compares the string values.
string str = "abc";
if(str.Equals("abc"))
and
if("abc".Equals(str))
In the first line I am calling the Equals method on a string variable to compare it with a string literal. The second line does the reverse. Is it just a difference in coding style, or is there a difference in the way these two statements are processed by the compiler?
The only difference is that, in the first case, when you do:
str.Equals("abc")
If str is null, you'll get an exception at runtime. By doing:
"abc".Equals(str)
If str is null, you'll get false.
The difference is that in the second example, you will never get a NullReferenceException because a literal can't be null.
To add to the other answers: the static string.Equals("abc", str) method always avoids triggering a null reference exception, regardless of which order you pass the two strings.
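The three call styles can be compared with a null variable. This is a small sketch, and the class name is made up:

```csharp
using System;

static class NullSafeEquals
{
    static void Main()
    {
        string str = null;

        Console.WriteLine("abc".Equals(str));         // False
        Console.WriteLine(string.Equals("abc", str)); // False
        Console.WriteLine(string.Equals(str, "abc")); // False
        Console.WriteLine(string.Equals(str, null));  // True: two nulls compare equal

        try
        {
            Console.WriteLine(str.Equals("abc"));     // instance call on a null reference
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("str.Equals threw NullReferenceException");
        }
    }
}
```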
As mmyers said, the second example will not throw a NullReferenceException. While this allows the program to "appear" to run error free, it may lead to unintended results.
Yes, the way the compiler processes the statements is different. The Equals function for String follows the same guidelines in most languages. Here is some pseudocode:
override def Equals(that: String): Boolean // should override Object.Equals
    if (that == null) return false
    for i from 0 until this.length
        if (!this(i).Equals(that(i))) return false
    return true
Normally, the method will first check that that is a String, and that this and that have the same length.
You can see, as others pointed out, that if that is null the method returns false. On the other hand, the method is part of String, so it cannot be called on null. That is why, in your example, if str is null you will get a NullReferenceException.
That being said, if you know both variables are non-null Strings of the same length, both statements will evaluate to the same result in the same time.
