Reference of two String in C# [duplicate] - c#

The code is pretty self explanatory. I expected when I made a1 and b1 that I was creating two different string instances that contain the same text. So I figure a1 == b1 would be true but object.ReferenceEquals(a1,b1) would be false, but it isn't. Why?
//make two seemingly different string instances
string a1 = "test";
string b1 = "test";
Console.WriteLine(object.ReferenceEquals(a1, b1)); // prints True. why?
//explicitly "recreating" b2
string a2 = "test";
string b2 = "tes";
b2 += "t";
Console.WriteLine(object.ReferenceEquals(a2, b2)); // prints False
//explicitly using new string constructor
string a3 = new string("test".ToCharArray());
string b3 = new string("test".ToCharArray());
Console.WriteLine(object.ReferenceEquals(a3, b3)); // prints False

Literal string objects are coalesced into single instances by the compiler. This is actually required by the specification:
Each string literal does not necessarily result in a new string instance. When two or more string literals that are equivalent according to the string equality operator (Section 7.9.7) appear in the same assembly, these string literals refer to the same string instance.

The compiler is optimized to that the if string literals are equal with "==" operator than it does not need to create a new instance and both refer to the same instance... So, that's why your first part of question answered True.
Although string is a reference type, the equality operators (== and !=) are defined to compare the values of string objects, not references. This makes testing for string equality more intuitive. For example:
string a = "hello";
string b = "h";
// Append to contents of 'b'
b += "ello";
Console.WriteLine(a == b);
Console.WriteLine((object)a == (object)b);
This displays "True" and then "False" because the content of the strings are equivalent, but a and b do not refer to the same string instance.
The + operator concatenates strings:
string a = "good " + "morning";
This creates a string object that contains "good morning".
Strings are immutable--the contents of a string object cannot be changed after the object is created, although the syntax makes it appear as if you can do this. For example, when you write this code, the compiler actually creates a new string object to hold the new sequence of characters, and that new object is assigned to b. The string "h" is then eligible for garbage collection.
string b = "h";
b += "ello";
for more reference check this on msdn and this

Compiler optimization. Simple as that.

Related

Although String is a reference type why "==" operator behave correctly in 2 string? [duplicate]

This question already has answers here:
In C#, why is String a reference type that behaves like a value type?
(12 answers)
Closed 2 years ago.
We know String is a reference type and immutable and so reasonable to be that. It means the address of a string store in Stack and its content store in Heap section memory. On the other hand, we know "==" Operation just compares stack value. So for example we have the following code that compares two Objects as we expect in theory.
And in Microsoft Docs we have this:
By default, two non-record reference-type operands are equal if they refer to the same object:
public static void Main()
{
var a = new MyClass(1);
var b = new MyClass(1);
var c = a;
Console.WriteLine(a == b); // output: False
Console.WriteLine(a == c); // output: True
}
But my question is why in a string that is reference type "==" operation behaves like value-type! I just curious about technical reason and how Dotnet act for it, not just concept (like here )
string t1 = "test";
string t2 = "test";
bool result = t1 == t2; // output: True
As mentioned in comments, string type just implements custom equality comparision operators. Any class can do that and make == (and other equality operators) behave "like value-type". For example, that's what is done for new "record" types which are reference types but behave "like" value types.
However, there is one more thing. In your example code:
string t1 = "test";
string t2 = "test";
bool result = t1 == t2; // output: True
Even if you check if both strings actually refer to the same object, it will still return true:
bool result = Object.ReferenceEquals(t1, t2); // output: True
That's because they ARE actually refer to the same object, because of string interning. There is a global pool of strings in the program, in which one can place a string or if string is already there - get a reference to that same string (via String.Intern call). All string literals declared in your program automatically are placed there. That means both t1 and t2 refer to the same string object in that intern pool.
If you do something like:
string t1 = "test";
string t2 = new StringBuilder().Append("test").ToString();
bool result = Object.ReferenceEquals(t1, t2);
Then t2 is no longer automatically interned and comparision will return false (but of course, comparision with == will still return true).

Reference object comparison of type string

Consider the following code:
public static void Main()
{
string str1 = "abc";
string str2 = "abc";
if (str1 == str2)
{
Console.WriteLine("True");
}
else
{
Console.WriteLine("False");
}
Console.ReadLine();
}
The output is "True". string is a reference type in .Net & I am comparing two different objects, but still the output is "True".
Is is because it internally calls ToString() method on both objects & before comparing them?
Or is it because a string is an immutable type? Two completely distinct string objects having the same value would point to same memory location on the heap?
How does string comparison happens?
How does memory allocation works on the heap? Will two different string objects with the same value point to same memory location, or to a different one?
Strings are compared by value by default.
Objects are compared by reference by default.
Identical string literals in the same assembly are interned to be the same reference.
Identical strings that are not literals can legally be interned to the same reference, but in practice typically are not.
So now you should be able to understand why you get the given output in this program fragment:
string a1 = "a";
string a2 = "a";
string aa1 = a1 + a2;
string aa2 = a1 + a2;
object o1 = a1;
object o2 = a2;
object o3 = aa1;
object o4 = aa2;
Console.WriteLine(a1 == a2); // True
Console.WriteLine(aa1 == aa2); // True
Console.WriteLine(o1 == o2); // True
Console.WriteLine(o3 == o4); // False
Does that make sense?
For the string type, == compares the values of the strings.
See http://msdn.microsoft.com/en-us/library/53k8ybth.aspx
Regarding your question about addressing, a few lines of code says they will have the same address.
static void Main(string[] args)
{
String s1 = "hello";
String s2 = "hello";
String s3 = s2.Clone() as String;
Console.Out.WriteLine(Get(s1));
Console.Out.WriteLine(Get(s2));
Console.Out.WriteLine(Get(s3));
s1 = Console.In.ReadLine();
s1 = Console.In.ReadLine();
s3 = s2.Clone() as String;
Console.Out.WriteLine(Get(s1));
Console.Out.WriteLine(Get(s2));
Console.Out.WriteLine(Get(s3));
}
public static string Get(object a)
{
GCHandle handle = GCHandle.Alloc(a, GCHandleType.Pinned);
IntPtr pointer = GCHandle.ToIntPtr(handle);
handle.Free();
return "0x" + pointer.ToString("X");
}
Results in the same address for each set of tests.
Get() courtosey of Memory address of an object in C#
String Class has done operator overloading to write custom logic for == operator.
That's why when == is used in case of string it does not compare references but actual value.
Please see https://stackoverflow.com/a/1659107/562036 for more info
It's entirely likely that a large portion of the developer base comes
from a Java background where using == to compare strings is wrong and
doesn't work. In C# there's no (practical) difference (for strings).
string is reference type, because its does not have default allocation size, but is treated as a value type for sanity reasons, could you image a world were == would not work between to exact string values.

C# assign by reference

Is it possible to assign by reference? I know that ref has to be used in methods.
string A = "abc";
string B = A;
B = "abcd";
Console.WriteLine(A); // abc
Console.WriteLine(B); // abcd
Can I have some sort of
string A = "abc";
string B = (ref)A;
B = "abcd"; // A was assigned to B as reference, so changing B is the same as changing A
Console.WriteLine(A); // abcd
Console.WriteLine(B); // abcd
That's how it works already. Strings are a reference type- your variable A is a reference (like a pointer) to a string on the heap, and you are just copying the pointer's value (the address of the string) into the variable B.
Your example doesn't change the value of A when you assign "abcd" to B because strings are treated specially in .net. They are immutable, as Kevin points out- but it is also important to note that they have value type semantics, that is assignments always result in the reference pointing to a new string, and doesn't change the value of the existing string stored in the variable.
If, instead of Strings, you used (for example) cars, and changed a property, you'd see this is the case:
public class Car {
public String Color { get; set; }
}
Car A = new Car { Color = "Red" };
Car B = A;
B.Color = "Blue";
Console.WriteLine(A.Color); // Prints "Blue"
// What you are doing with the strings in your example is the equivalent of:
Car C = A;
C = new Car { Color = "Black" };
It's probably worth noting that it does not work this way for value types (integers, doubles, floats, longs, decimals, booleans, structs, etc). Those are copied by value, unless they are boxed as an Object.
You aren't modifying the reference to A. You are creating a whole new string. A still shows "abc", because it can't be changed by modifying B. Once you modify B, it points to a whole new object. Strings are immutable too, so any change to one creates a new string.
To further answer your question with non-immutable reference types, it is possible to modify the properties of an object that a variable points to and it will show the changed effect when you access other variables pointing to the same object. This does not mean however that you can have a variable point to a brand new object, and have other variables (that pointed to the old object) point to that new object automatically without modifying them as well.
Strings are immutable that's true. However you can resolve your issue by encapsulating string within a class and making A and B instances of that class. Then A = B should work.
public class ReferenceContainer<T>
{
public T Value {get;set;}
public ReferenceContainer(T item)
{
Value = item;
}
public override string ToString()
{
return Value.ToString();
}
public static implicit operator T (ReferenceContainer<T> item)
{
return Value;
}
}
var A = new ReferenceContainer<string>("X");
var B = A;
B.Value = "Y";
Console.WriteLine(A);// ----> Y
Console.WriteLine(B);// ----> Y
Strings are already references, after B = A then B.equals(A) will return true. However, when you do B = "abcd" you're doing the same thing, you're assigning B to a reference to the string literal.
What you are wanting to do is modify the data pointed to by the string, however, because Strings in .NET are immutable there is no way to do that.
Strings are special objects in C# because they are immutable, otherwise it would be by reference. You can run this snippet to see.
public class Foo
{
public string strA;
}
Foo A = new Foo() { strA = "abc" };
Foo B = A;
B.strA = "abcd";
Console.WriteLine(A.strA);// abcd
Console.WriteLine(B.strA);//abcd
All you do is this:
string A = "abc";
ref string B = ref A;
B = "abcd"; // A was assigned to B as reference, so changing B is the same as changing A
Console.WriteLine(A); // abcd
Console.WriteLine(B); // abcd

String reference

If a and b are both references to the same object why doesn't value of a change when we change value of b in part3. And if I assume that (as in part3) that b is dereferenced when I pass it a new literal string ,why doesn't it also dereference in part2 when I pass "foo" litaral string (ReferenceEquals returns true).
//part1
string a = "foo";
string b = a;
System.Console.WriteLine("\na = {0}\nb = {1}", a, b); //a=foo b=foo
System.Console.WriteLine("a == b : {0}", a == b);//True
System.Console.WriteLine("ReferenceEquals(a, b): {0}", ReferenceEquals(a, b));//True
//part2
b = "foo";
System.Console.WriteLine("\na = {0}\nb = {1}", a, b);//a=foo b=foo
System.Console.WriteLine("a == b : {0}", a == b);//True
System.Console.WriteLine("ReferenceEquals(a, b): {0}", ReferenceEquals(a, b));//True
//part3
b = "bar";
System.Console.WriteLine("\na = {0}\nb = {1}", a, b);//a=foo b=bar
System.Console.WriteLine("a == b : {0}", a == b);//False
System.Console.WriteLine("ReferenceEquals(a, b): {0}", ReferenceEquals(a, b));//False
a and b both refer to the same object.
However, you're changing b, not the object.
String objects are immutable and cannot be changed.
When you write b = "foo", you're changing b to refer to a different String instance.
However, string literals are interned, so writing "foo" will always give the same String instance.
You can get two different String instances with the same value by calling String.Copy:
string a = "foo";
string b = "foo";
Console.WriteLine(ReferenceEquals(a, b)); //True
b = String.Copy("foo");
Console.WriteLine(ReferenceEquals(a, b)); //False
If you change something , like a = "newstring"; it means, that 'a' points to new reference.
Strings are immutable -> you cannot change string itself (a[0] = 'b';).
In part2, when you assign some constant used previously, the reference to the old one is used. This is called 'Interning'.
If you did b = "fo" + "o";, the references wouldn't be equal (in this example they would, because compiler optimizes this, but if the string is created other way than directly used, references are the same).
var a = "foo";
var b = "fo";
b = b + "o";
// in this point, the references AREN'T equal.
b = string.Intern(b);
// in this point, the references ARE equal.
"Strings are immutable--the contents of a string object cannot be changed after the object is created, although the syntax makes it appear as if you can do this."
See http://msdn.microsoft.com/en-us/library/362314fe.aspx
a and b are both references to the same thing in part 1
In part 2 the references remain the same because the compiler has worked out in advance you are just reusing the same string literal (a little memory optimisation) and because strings are immutable it knows it is safe to make that optimisation.
In part 3 you are changing the reference to b only. a remains a reference to "foo" as it was before.
a and b both refer to the same object. How can a change, if you are assigning 'a' to 'b' and then changing the value of 'b'

why don't string object refs behave like other object refs?

string a = "a";
string b = a;
string a = "c";
Why does string b still have the value "a" and not "c"?
As string is an object and not a stack value type, what's with this behaviour?
Thanks
You're pointing the variable to something new, it's no different than if you said
Foo a = new Foo();
Foo b = a;
a = new Foo();
// a no longer equal to b
In this example, b is pointing to what a initially referenced. By changing the value of a, a and b are no longer referencing the same object in memory. This is different than working with properties of a and b.
Foo a = new Foo();
Foo b = a;
a.Name = "Bar";
Console.WriteLine(b.Name);
In this case, "Bar" gets written to the screen because a and b still reference the same object.
Let me start by saying that your choices for variables and data are poor. It makes it very difficult for someone to say "the string a in your example..." because "a" could be the content of the string, or the variable containing the reference. (And it is easily confused with the indefinite article 'a'.)
Also, your code doesn't compile because it declares variable "a" twice. You are likely to get better answers if you ask questions in a way that makes them amenable to being answered clearly.
So let's start over.
We have two variables and two string literals.
string x = "hello";
string y = x;
x = "goodbye";
Now the question is "why does y equal 'hello' and not 'goodbye'"?
Let's go back to basics. What is a variable? A variable is a storage location.
What is a value of the string type? A value of the string type is a reference to string data..
What is a variable of type string? Put it together. A variable of type string is a storage location which holds a reference to string data.
So, what is x? a storage location. What is its first value? a reference to the string data "hello".
What is y? a storage location. What is its first value? a reference to the string data "hello", same as x.
Now we change the contents of storage location x to refer to the string data "goodbye". The contents of storage location y do not change; we didn't set y.
Make sense?
why don’t string object refs behave like other object refs?
I deny the premise of the question. String object refs do behave like other object refs. Can you give an example of where they don't?
Part of what confuses people so much about this is thinking of the following as an append operation:
str1 = str1 + str2;
If string were a mutable type, and the above were shorthand for something like this:
str1.Append(str2);
Then what you're asking would make sense.
But str1 = str1 + str2 is not just some method call on a mutable object; it is an assignment. Realizing this makes it clear that setting a = "c" in your example is no different from assigning any variable (reference type or not) to something new.
The below comparison between code that deals with two List<char> objects and code that deals with two string objects should hopefully make this clearer.
var a = new List<char>();
var b = a; // at this point, a and b refer to the same List<char>
b.Add('a'); // since a and b refer to the same List<char> ...
if (b.Contains('a')) { /* ...this is true... */ }
if (a.Contains('a')) { /* ...and so is this */ }
// HOWEVER...
a = new List<char>(); // now a and b do NOT refer to the same List<char>...
if (b.Contains('a')) { /* ...so this is still true... */ }
if (a.Contains('a')) { /* ...but this is not */ }
Compare this with a slightly modified version of the code you posted:
string a = "a";
string b = a; // at this point, a and b refer to the same string ("a")...
if (b == "a") { /* ...so this is true... */ }
if (a == "a") { /* ...and so is this */ }
// REMEMBER: the below is not simply an append operation like List<T>.Add --
// it is an ASSIGNMENT
a = a + "c"; // now they do not -- b is still "c", but a is "ac"
if (b == "a") { /* ...so this is still true... */ }
if (a == "a") { /* ...but this is not */ }
In .Net, a, b and c are reference to the objects and not the objects themselves. When you reset a, you are pointing this reference to a new memory location. The old memory location and any references to it are unchanged.
I guess the OP thinks string objects to be mutable, so something like var = "content";
would actually store the new character array inside the already existing object.
String is, however, an immutable type, which means that in this case a new string object is created and assigned to var.
See for example:
http://codebetter.com/blogs/patricksmacchia/archive/2008/01/13/immutable-types-understand-them-and-use-them.aspx
It is a misunderstanding because of the builtin string support of c#.
string a = "123"; //The way to write it in C#
string a = new string("123"); //Would be more obvious
The second way to define a is more obvious what happens, but it is verbose.Since strings have direct support from the compiler calling the string constructor is unnecessary.
Writing your example verbose:
string a = new string("a");
string b = a;
string a = new string("c");
Here the behavior is as expected a gets a reference to the new string object assigned. while the reference held by b still points to the old string.

Categories

Resources