Does Substring create another instance in C#?

I am new to C# strings and I am confused about Object.ReferenceEquals. I read an article that says ReferenceEquals checks whether two references are the same instance. In the program below I am checking object.ReferenceEquals(s1, s4). Even though they hold the same data, why does it return false?
string s1 = "akhil";
string s2 = "akhil";
Console.WriteLine(object.ReferenceEquals(s1, s2)); //true
s2 = "akhil jain";
Console.WriteLine(object.ReferenceEquals(s1, s2)); //false
//Console.WriteLine(s1 == s2);
//Console.WriteLine(s1.Equals(s2));
string s3 = "akhil";
//1".Substring(0, 5);
Console.WriteLine(s3+" " +s1);
Console.WriteLine(object.ReferenceEquals(s1,s3)); //true
string s4 = "akhil1".Substring(0, 5);
Console.WriteLine(object.ReferenceEquals(s1, s4)); // false: why, when s4 holds the same data as s1?

The references in s1, s2 and s3 are the same because a string literal gets interned. Substring returns a new string with a new reference; it doesn't try to second-guess your parameters and check the intern pool.
String.Intern(String) Method
The common language runtime conserves string storage by maintaining a
table, called the intern pool, that contains a single reference to
each unique literal string declared or created programmatically in
your program. Consequently, an instance of a literal string with a
particular value only exists once in the system.
For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal
string from the intern pool and assigns it to each variable.
Though, useless fact 3454345.2: since .NET 2 you have been able to turn interning off, for whatever reasons you may have.
CompilationRelaxations Enum
NoStringInterning Marks an assembly as not requiring string-literal interning. In an application domain, the common
language runtime creates one string object for each unique string
literal, rather than making multiple copies. This behavior, called
string interning, internally requires building auxiliary tables that
consume memory resources.
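A minimal sketch of the behaviour described above, reusing the values from the question. The class name InternDemo is only illustrative, and the comments assume the default behaviour in which string literals in one program are interned.

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        string literal = "akhil";                 // literal: comes from the intern pool
        string built = "akhil1".Substring(0, 5);  // created at run time: a new object

        // Same characters, different objects.
        Console.WriteLine(literal == built);                                      // True
        Console.WriteLine(object.ReferenceEquals(literal, built));                // False

        // string.Intern returns the pooled reference for this value.
        Console.WriteLine(object.ReferenceEquals(literal, string.Intern(built))); // True
    }
}
```

Calling string.Intern on every run-time string is rarely worthwhile; for ordinary comparisons == or Equals is the usual choice.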

When you instantiate two objects, their references are not equal, so Object.ReferenceEquals returns false. Strings, however, are a special case. If you write a string literal in code, the CLR keeps it in a table called the intern pool. As a result, two variables initialized with the same literal reference the same object in memory, and Object.ReferenceEquals returns true.
When a string is produced by some operation in your code, it is not automatically added to the intern pool. It therefore has a different reference, even though its content may be the same. This is also explained in the remarks of the documentation for Object.ReferenceEquals.
Note that the String.Equals() method would return true. In C# you can also use the '==' operator on strings. See your adjusted code below.
string s1 = "akhil";
string s2 = "akhil";
Console.WriteLine(s1.Equals(s2)); //true
s2 = "akhil jain";
Console.WriteLine(s1.Equals(s2)); //false
string s3 = "akhil";
Console.WriteLine(s3 + " " + s1);
Console.WriteLine(s1.Equals(s3)); //true
string s4 = "akhil1".Substring(0, 5);
Console.WriteLine(s1.Equals(s4)); //this now returns true as well
Console.WriteLine(s1 == s4); //so does this

object.ReferenceEquals returns false because it checks whether both references point to the same object. ReferenceEquals does not check for data equality, only whether both references denote the same object in memory.
As TheGeneral already mentioned, string literals are interned and stored in a table called the intern pool, so that string objects are stored efficiently.
When a string literal is assigned to multiple variables, they all point to the same object in the intern pool. Hence you get true from object.ReferenceEquals. But when you compare this with a substring, a new object has been created in memory. The reference comparison returns false because these are two different objects occupying different memory locations.
Strings that are created dynamically or read from an external source are not interned automatically.
If you try the following, you will get true for object.ReferenceEquals:
Console.WriteLine(object.ReferenceEquals(s1, string.Intern(s4)));
You can check with value types such as int that ReferenceEquals returns false even when one variable is assigned to another.
int a = 10;
int b = a;
Console.WriteLine(ReferenceEquals(a, b)); //false
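This is because each value-type argument is boxed into its own object when it is passed to ReferenceEquals, so the two references never refer to the same object.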
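A small sketch of why the int comparison above is false. ReferenceEquals takes object parameters, so each int argument is boxed into a fresh object; the class name BoxingDemo is only illustrative.

```csharp
using System;

class BoxingDemo
{
    static void Main()
    {
        int a = 10;

        // Each argument is boxed separately, so even a compared with itself is False.
        Console.WriteLine(object.ReferenceEquals(a, a));          // False

        // Box once, then compare the single boxed object with itself.
        object boxed = a;
        Console.WriteLine(object.ReferenceEquals(boxed, boxed));  // True

        // Value comparison is what is normally wanted for value types.
        Console.WriteLine(a.Equals(10));                          // True
    }
}
```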

Related

Why can a string object be reassigned again and again?

Isn't string immutable? Why can line in the following example be reassigned again and again with file.ReadLine()? Thanks.
int counter = 0;
string line;
// Read the file and display it line by line.
System.IO.StreamReader file =
new System.IO.StreamReader("c:\\test.txt");
while((line = file.ReadLine()) != null)
{
Console.WriteLine (line);
counter++;
}
file.Close();
// Suspend the screen.
Console.ReadLine();
Isn't string immutable?
Yes, strings are immutable.
Why can line in the following example be reassigned again and again
line is not a string. line is a variable which refers to a string. Variables are called variables because they vary.
That might imply that this fact is a property of references. It is not.
The number 1 is immutable, right? No matter what you do to 1, it is 1. If you add 10 to 1 and get 11, you have not changed 10 or 1. 10 and 1 remain 10 and 1. The result is 11, a brand new number.
So if numbers are immutable, then why can I say:
int x = 1;
x = x + 10;
? Because x is not a number. x is a variable which holds a number, and variables can vary.
Let's think of another example. Think of something that is immutable in real life. Say, the value of a coin. If you have a dime, there is nothing you can do to make it worth more or less than 10 cents and still have it be a dime. Dimes are immutable. Suppose you have a drawer that you keep exactly one coin in, and today it contains a dime. Tomorrow you take the dime out of the drawer and put in a quarter. How did you do that, if a dime is immutable? Variables are like drawers. You can change their contents, even if the objects in the drawer are immutable.
Finally, the title of your question clearly shows the fundamental cause of your confusion:
Why can a string object be reassigned again and again?
Objects are not things that can be assigned in the first place. Variables can be assigned, and variables are not objects. Variables are storage locations that can contain values.
If you have been taught by some book that variables are a kind of object -- and many beginner books make this mistake -- then throw away that book and get a decent book that is not full of lies. The string returned by ReadLine is an object. A reference to that object is assigned to the variable. The value of that variable is then a reference to an immutable object.
There are two things in C# that look like variables but have slightly different semantics.
First, const locals or fields are not variables, because constants cannot vary. If they could vary then they would be variables, not constants. If you say
const string s = "Hello";
then not only is Hello immutable, but s is too. You should only use const for things that are logically immutable for all time. The price of gold, the name of your bank, your last name, these things can all change. The atomic weight of gold, the value of pi, these things cannot ever change, and so they can be const. C# only allows certain types to be const, and only allows certain expressions to initialize a const.
A readonly field is halfway between a const and a variable. A readonly field is a variable in the constructor or field initializer, and is illegal to write to from any other location. C# treats readonly fields as values, not variables, in all code outside of a constructor or field initializer.
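To make the const and readonly distinction concrete, here is a hedged sketch. The Coin class and its members are invented for illustration and are not from the question.

```csharp
using System;

class Coin
{
    public const int CentsPerDollar = 100;  // constant: fixed at compile time
    private readonly string name;           // readonly: set once, in the constructor

    public Coin(string name)
    {
        this.name = name;                   // allowed: we are inside the constructor
    }

    public string Name => name;

    // Uncommenting the method below causes a compile error, because a readonly
    // field cannot be assigned outside a constructor or field initializer.
    // public void Rename(string newName) { name = newName; }
}

class ConstDemo
{
    static void Main()
    {
        var dime = new Coin("dime");
        Console.WriteLine(dime.Name);            // dime
        Console.WriteLine(Coin.CentsPerDollar);  // 100
    }
}
```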
A variable that refers to an immutable object can be reassigned, but the object itself cannot be mutated.
In this example:
string s1 = "hello";
string s2 = s1;
s1 = "goodbye";
The symbol s1 is reassigned to a new string object "goodbye", while s2 refers to the same original string ("hello"), because the string was never mutated.
Your variable 'line' should be thought of as a pointer to the string, not as the string itself.
The string itself cannot be changed. For example you cannot do
line[4] = 'a'
expecting to be able to change the fifth char. Contrast with C, where you can do
mystr[4] = 'a'
(in most cases anyway)
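If a changed character is really what is wanted, the usual route in C# is to build a new string. One way to do that, sketched with a made-up sample value, is to go through a char array:

```csharp
using System;

class ModifiedCopyDemo
{
    static void Main()
    {
        string line = "hello world";

        // line[4] = 'a';   // does not compile: the string indexer is read-only

        char[] chars = line.ToCharArray();   // copy the characters into a mutable array
        chars[4] = 'a';                      // change the fifth character of the copy
        string changed = new string(chars);  // build a new string from the array

        Console.WriteLine(line);     // hello world
        Console.WriteLine(changed);  // hella world
    }
}
```

The original string is untouched; changed is a separate object.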
Immutable doesn't mean a reference to the object can't be reassigned, it means the object itself can't be mutated (changed).
You can reassign the reference forever, but you can't change any of the strings themselves.
If you made the reference a constant, then it couldn't be reassigned.

(string)combination Purpose?

I'm following an exercise which tasks me to...
"Declare two variables of type string with values "Hello" and "World".
Declare a variable of type object. Assign the value obtained of
concatenation of the two string variables (add space if necessary) to
this variable. Print the variable of type object".
Now here was my original solution:
string hi = "Hello";
string wo = "World";
object hiwo = hi + " " + wo;
Console.WriteLine(hiwo);
Console.ReadLine();
I found a good website that gives sample solutions to the exercises I am working through, and I have started comparing them with my answers. For this one I was nearly spot on, apart from an extra line. I've modified my original code to make the comparison easier.
My modified code:
string firstWord = "Hello";
string secondWord = "World";
object combination = firstWord + " " + secondWord;
Console.WriteLine(combination);
Given Solution:
string firstWord = "Hello";
string secondWord = "World";
object combination = firstWord + " " + secondWord;
string a = (string)combination;
Console.WriteLine(a);
I believe understanding this extra line is the purpose of the exercise. So my question is: why does the extra line exist, and what are the benefits of having it? The section of the book is about understanding types and variables.
The extra line is a type cast:
A cast is a way of explicitly informing the compiler that you intend to make the conversion and that you are aware that data loss might occur.
Usually, a cast doesn't really return a different object. It just checks if the object is, at runtime, of the type you're casting to. That is, the expression firstWord + secondWord returns an object of type string. Assigning it to a variable of type object doesn't change the fact it's really a string. Similarly, doing (string) combination doesn't return a different object – it just tells the compiler that the expression is of type string. (If combination wasn't really a string, the check would fail and throw an exception.)
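The following sketch illustrates both halves of that paragraph: the cast yields the same object, and a cast to the wrong type fails at run time. The variable number and the class name CastDemo are additions for illustration only.

```csharp
using System;

class CastDemo
{
    static void Main()
    {
        string firstWord = "Hello";
        string secondWord = "World";
        object combination = firstWord + " " + secondWord;

        // The cast does not create a new object; a and combination are the same instance.
        string a = (string)combination;
        Console.WriteLine(object.ReferenceEquals(combination, a));  // True

        // Casting an object that is not a string throws at run time.
        object number = 42;
        try
        {
            string s = (string)number;
            Console.WriteLine(s);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("42 is not a string");
        }

        // The as operator returns null instead of throwing.
        string maybe = number as string;
        Console.WriteLine(maybe == null);  // True
    }
}
```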
In this case there is no benefit to having it there that I can see. Console.WriteLine(object) converts the object to a string internally, and an object that is already a string will just "convert" to itself.
In your solution, when you call
Console.WriteLine(combination)
the ToString() method is called internally. Therefore you don't notice any difference.
From MSDN
If value is null, only the line terminator is written. Otherwise, the ToString method of value is called to produce its string representation, and the resulting string is written to the standard output stream.
Whereas in the given solution object is first converted to string and then written.
To understand the difference let's take another example
TextBox tb = new TextBox();
Console.WriteLine(tb);
The output would be "System.Windows.Forms.TextBox, Text: ", that is, the type of the object.
In your version what is happening in the line Console.WriteLine is a call to the virtual ToString method, which because of being virtual is in fact executed in its version implemented in the string class (which just returns the string).
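The virtual dispatch can be seen with two small made-up classes, one that overrides ToString and one that does not. This is only a sketch; Plain and Greeting are hypothetical names.

```csharp
using System;

class Plain { }

class Greeting
{
    public override string ToString() => "Hello World";
}

class ToStringDemo
{
    static void Main()
    {
        object a = new Plain();
        object b = new Greeting();
        object c = "Hello World";

        // Console.WriteLine(object) calls the most derived ToString of each object.
        Console.WriteLine(a);  // Plain (the default ToString returns the type name)
        Console.WriteLine(b);  // Hello World
        Console.WriteLine(c);  // Hello World (string.ToString returns the string itself)
    }
}
```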
The given solution explicitly casts the object into string. The difference is thus in increased readability - less things are happening behind the scene - it is made explicit that you're operating on a string instance.
The extra line is basically casting the object to the string type in order for it to be printed out.
Another way would be...
string firstWord = "hello";
string secondWord = "world";
object combination = string.Format("{0} {1}", firstWord, secondWord);
Console.WriteLine(combination.ToString());

Object reference behaviour

In the snippet below, I have two variables, firstString and secondString, which hold the same value "Hello". So the referenced location for both variables is the same.
var firstString = "Hello";
var secondString = "Hello";
bool isSameReference = Object.ReferenceEquals(firstString, secondString);
//same reference for both variables
But updating secondString to "Hey.." does not update firstString, even though it refers to the same location. Why is the other variable not updated when both refer to the same location?
secondString = "Hey..";
isSameReference = Object.ReferenceEquals(firstString, secondString);
//reference changed but firstString not updated
Setting secondString back to its previous value, "Hello", makes the references the same again.
secondString = "Hello";
isSameReference = Object.ReferenceEquals(firstString, secondString);
//now the reference for both variables are same
Why does C# behave this way, and how does the framework handle this internally? Thanks in advance.
The process is called string interning. It is done to save space and processing time when allocating a new string with exactly the same content as an existing one. String interning also makes comparing interned strings a trivial operation. This is possible because String is an immutable type.
You did not update the string; you updated the reference, which now points to "Hey.." and no longer to the "Hello" string that firstString refers to. Furthermore, the C# compiler collects every string literal in your code in a list without duplicates. That is why two "Hello" literals in different places are the same string when you compare their references.
C# (.NET) holds every string literal only once in the .NET heap. The "Hello" and "Hey.." strings are stored in two different locations on the heap. Initially, firstString and secondString both point to the "Hello" location. secondString = "Hey.."; just changes the secondString variable to point to the heap location where "Hey.." is stored. Be aware that a string variable holds the address (reference) of the place in the heap where the string is actually located.
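For contrast, here is a sketch with a mutable type. StringBuilder is used as a stand-in, since the question only involves string. It shows that a change is visible through both variables only when the shared object is mutated, never when one variable is reassigned.

```csharp
using System;
using System.Text;

class SharedReferenceDemo
{
    static void Main()
    {
        var first = new StringBuilder("Hello");
        var second = first;                   // both variables refer to one object

        second.Append(" there");              // mutates the shared object
        Console.WriteLine(first);             // Hello there

        second = new StringBuilder("Hey..");  // reassignment: only second changes
        Console.WriteLine(first);             // Hello there
        Console.WriteLine(second);            // Hey..
    }
}
```

A string offers no operation like Append that changes the object in place, so the first case can never arise for strings.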

C# Changing a string after it has been created

Okay I know this question is painfully simple, and I'll admit that I am pretty new to C# as well. But the title doesn't describe the entire situation here so hear me out.
I need to alter a URL string which is being created in a C# code-behind, removing the substring ".aspx" from the end of the string. So basically I know that my URL, coming into this class, will be something like "Blah.aspx" and I want to get rid of the ".aspx" part of that string. I assume this is quite easy to do by just finding that substring and removing it if it exists (or some similar strategy; I would appreciate an elegant solution if someone has done this before). Here is the problem:
"Because strings are immutable, it is not possible (without using unsafe code) to modify the value of a string object after it has been created." This is from the MSDN official website. So I'm wondering now, if strings are truly immutable, then I simply can't (shouldn't) alter the string after it has been made. So how can I make sure that what I'm planning to do is safe?
You don't change the string, you change the variable. Instead of that variable referring to a string such as "foo.aspx", alter it to point to a new string that has the value "foo".
As an analogy, adding one to the number two doesn't change the number two. Two is still just the same as it always was; you have changed a variable from referring to one number to referring to another.
As for your specific case, EndsWith and Remove make it easy enough:
if (url.EndsWith(".aspx"))
url = url.Remove(url.Length - ".aspx".Length);
Note here that Remove takes a string and an integer and gives us a brand new string, which we need to assign back to our variable. It doesn't change the string itself.
Also note that there is a Uri class that you can use for parsing URLs, and it will be able to handle all of the complex situations that can arise, including hashes, query parameters, etc. You should use that to parse out the aspects of a URL that you are interested in.
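As a rough sketch of that suggestion, System.Uri can split a URL into its parts so that only the path is edited. The URL below is a hypothetical example, not one from the question.

```csharp
using System;

class UriDemo
{
    static void Main()
    {
        // Hypothetical URL, used only for illustration.
        var uri = new Uri("http://example.com/folder/Blah.aspx?id=5#top");

        Console.WriteLine(uri.AbsolutePath);  // /folder/Blah.aspx
        Console.WriteLine(uri.Query);         // ?id=5
        Console.WriteLine(uri.Fragment);      // #top

        // Strip the extension from the path only, leaving query and fragment alone.
        string path = uri.AbsolutePath;
        if (path.EndsWith(".aspx", StringComparison.OrdinalIgnoreCase))
        {
            path = path.Remove(path.Length - ".aspx".Length);
        }
        Console.WriteLine(path);              // /folder/Blah
    }
}
```

Working on AbsolutePath avoids the case where ".aspx" is followed by a query string, which a plain EndsWith check on the whole URL would miss.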
String immutability is not a problem for normal usage -- it just means that member functions like "Replace", instead of modifying the existing string object, return a new one. In practical terms that usually just means you have to remember to copy the change back to the original, like:
string x = "Blah.aspx";
x.Replace(".aspx", ""); // still "Blah.aspx"
x = x.Replace(".aspx", ""); // now "Blah"
The weirdness around strings comes from the fact that System.String inherits from System.Object, yet, because of its immutability, behaves much like a value type rather than a typical reference type. For example, if you pass a string into a function, the function has no way to change what the caller sees, unless you pass it by reference:
void Test(string y)
{
y = "bar";
}
void Test(ref string z)
{
z = "baz";
}
string x = "foo";
Test(x); // x is still "foo"
Test(ref x); // x is now "baz"
A String in C# is immutable, as you say. Meaning that this would create multiple String objects in memory:
String s = "String of numbers 0";
s += "1";
s += "2";
So, while the variable s would return to you the value String of numbers 012, internally it required the creation of three strings in memory to accomplish.
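When many pieces are appended, a common alternative is System.Text.StringBuilder, which collects the text in a mutable buffer and produces one string at the end. A minimal sketch using the same values follows; for only three pieces the difference is negligible.

```csharp
using System;
using System.Text;

class BuilderDemo
{
    static void Main()
    {
        // The builder holds the characters in a mutable buffer.
        var builder = new StringBuilder("String of numbers 0");
        builder.Append("1");
        builder.Append("2");

        // A single string object is produced here.
        string s = builder.ToString();
        Console.WriteLine(s);  // String of numbers 012
    }
}
```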
In your particular case, the solution is quite simple:
String myPath = "C:\\folder1\\folder2\\myFile.aspx";
myPath = Path.Combine(Path.GetDirectoryName(myPath), Path.GetFileNameWithoutExtension(myPath));
Again, this appears as if myPath has changed, but it really has not. An internal copy and assign took place and you get to keep using the same variable.
Also, if you must preserve the original variable, you could simply make a new variable:
String myPath = "C:\\folder1\\folder2\\myFile.aspx";
String thePath = Path.Combine(Path.GetDirectoryName(myPath), Path.GetFileNameWithoutExtension(myPath));
Either way, you end up with a variable you can use.
Note that the use of the Path methods ensures you get proper path operations, and not blind String replacements that could have unintended side-effects.
String.Replace() will not modify the string. It will create a new one. So the following code:
String myUrl = @"http://mypath.aspx";
String withoutExtension = myUrl.Replace(".aspx", "");
will create a brand-new string which is assigned to withoutExtension.

Why doesn't interning work on copies of a string?

Given:
object literal1 = "abc";
object literal2 = "abc";
object copiedVariable = string.Copy((string)literal1);
if (literal1 == literal2)
Console.WriteLine("objects are equal because of interning");//Are equal
if(literal1 == copiedVariable)
Console.WriteLine("copy is equal");
else
Console.WriteLine("copy not eq");//NOT equal
These results imply that copiedVariable is not subject to string interning. Why?
Is there a circumstance where its useful to have equivalent strings that are not interned or is this behavior due to some language detail?
If you think about it, the interning of strings is a process that is triggered at compile time on literals. Which implies that:
it is implicit when you assign/bind a literal to a variable
it is implicit when you copy a reference (i.e. string a = some_other_string_variable;)
On the other hand, if you create an instance of a string manually at run time, by using a StringBuilder or by Copy-ing, then you have to specifically request that it be interned by invoking the Intern method of the String class.
Even in the remarks section of the documentation it is stated that:
The common language runtime conserves string storage by maintaining a
table, called the intern pool, that contains a single reference to
each unique literal string declared or created programmatically in
your program. Consequently, an instance of a literal string with a
particular value only exists once in the system. For example, if you
assign the same literal string to several variables, the runtime
retrieves the same reference to the literal string from the intern
pool and assigns it to each variable.
And the documentation for the Copy method of the String class states that it:
Creates a new instance of String with the same value as a specified
String.
which implies that it's not going to just return a reference to the same string (from the intern pool). Again, if it did there wouldn't be much use for it then, would there?!
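The sketch below mirrors the code in the question, with two assumptions: new string(char[]) stands in for string.Copy, and the casts to string are added to show that == on object-typed operands compares references while == on string-typed operands compares values.

```csharp
using System;

class CopyDemo
{
    static void Main()
    {
        object literal1 = "abc";
        object literal2 = "abc";
        // Stand-in for string.Copy: build a new string from the same characters.
        object copy = new string(((string)literal1).ToCharArray());

        Console.WriteLine(literal1 == literal2);               // True: same interned object
        Console.WriteLine(literal1 == copy);                   // False: reference comparison
        Console.WriteLine((string)literal1 == (string)copy);   // True: value comparison

        // Explicitly interning the copy yields the literal's reference.
        Console.WriteLine(object.ReferenceEquals(literal1, string.Intern((string)copy)));  // True
    }
}
```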
Some languages require the result to be a copy for certain methods/procedures.
For example, in substring-type methods. The semantics would then be the same even if you call foo.substring(0, foo.length) (which is how you would probably implement stringcopy).
Note: IIRC*, this is NOT the case with .NET's implementation of string.Substring though. It is not really clear from MSDN either. (see below)
It returns:
A string that is equivalent to the substring of length length that
begins at startIndex in this instance, or Empty if startIndex is equal
to the length of this instance and length is zero.
It notes:
This method does not modify the value of the current instance.
Instead, it returns a new string with length characters starting from
the startIndex position in the current string.
UPDATE
I remembered correctly. It does indeed do a check in string InternalSubString(int startIndex, int length, bool fAlwaysCopy) when fAlwaysCopy is false. Substring passes false to this method.
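This can be probed from user code. The sketch below prints the result for a full-length Substring rather than assuming it, since it is an implementation detail and not a documented guarantee; the sample value is arbitrary.

```csharp
using System;

class SubstringDemo
{
    static void Main()
    {
        string source = "akhil1";
        string whole = source.Substring(0, source.Length);
        string part = source.Substring(0, 5);

        // Implementation detail: whether the full-length substring is the same
        // instance may depend on the runtime, so just print what happens.
        Console.WriteLine(object.ReferenceEquals(source, whole));

        // A shorter string has different content, so it must be a new object.
        Console.WriteLine(object.ReferenceEquals(source, part));  // False
    }
}
```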
UPDATE 2
It looks like string.Copy could have used InternalSubString, passing true for the aforementioned parameter, but looking at the disassembly, it seems to use a slightly more optimized version, possibly to save a method call.
Sorry for the redundant information.
* The reason I remember was when implementing the substring procedure for IronScheme, which the R6RS specification requires to make a copy :)
