Why can a string object be reassigned again and again?

Isn't string immutable? Why can line in the following example be reassigned again and again with file.ReadLine()? Thanks.
int counter = 0;
string line;
// Read the file and display it line by line.
System.IO.StreamReader file =
    new System.IO.StreamReader("c:\\test.txt");
while ((line = file.ReadLine()) != null)
{
    Console.WriteLine(line);
    counter++;
}
file.Close();
// Suspend the screen.
Console.ReadLine();

Isn't string immutable?
Yes, strings are immutable.
Why can line in the following example be reassigned again and again?
line is not a string. line is a variable which refers to a string. Variables are called variables because they vary.
That might imply that this fact is a property of references. It is not.
The number 1 is immutable, right? No matter what you do to 1, it is 1. If you add 10 to 1 and get 11, you have not changed 10 or 1. 10 and 1 remain 10 and 1. The result is 11, a brand new number.
So if numbers are immutable, then why can I say:
int x = 1;
x = x + 10;
? Because x is not a number. x is a variable which holds a number, and variables can vary.
Let's think of another example. Think of something that is immutable in real life. Say, the value of a coin. If you have a dime, there is nothing you can do to make it worth more or less than 10 cents and still have it be a dime. Dimes are immutable. Suppose you have a drawer that you keep exactly one coin in, and today it contains a dime. Tomorrow you take the dime out of the drawer and put in a quarter. How did you do that, if a dime is immutable? Variables are like drawers. You can change their contents, even if the objects in the drawer are immutable.
Finally, the title of your question clearly shows the fundamental cause of your confusion:
Why can a string object be reassigned again and again?
Objects are not things that can be assigned in the first place. Variables can be assigned, and variables are not objects. Variables are storage locations that can contain values.
If you have been taught by some book that variables are a kind of object -- and many beginner books make this mistake -- then throw away that book and get a decent book that is not full of lies. The string returned by ReadLine is an object. A reference to that object is assigned to the variable. The value of that variable is then a reference to an immutable object.
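To make that last point concrete, here is a minimal sketch. The names ReassignDemo and ReadTwoLines are made up for illustration, and a StringReader stands in for the file so the snippet is self-contained. It keeps a second reference to the first line, then reassigns line, and the first string is still intact:

```csharp
using System;
using System.IO;

public static class ReassignDemo
{
    // Returns { first line, second line } read from the given text.
    public static string[] ReadTwoLines(string text)
    {
        using (var reader = new StringReader(text))
        {
            string line = reader.ReadLine(); // line refers to the first string object
            string first = line;             // a second variable referring to that same object
            line = reader.ReadLine();        // line now refers to a different string object
            // The first string was never modified; only the variable changed.
            return new[] { first, line };
        }
    }
}
```

Calling ReassignDemo.ReadTwoLines("dime\nquarter") gives back both strings: the drawer called line got new contents, and the dime is still a dime.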
There are two things in C# that look like variables but have slightly different semantics.
First, const locals or fields are not variables, because constants cannot vary. If they could vary then they would be variables, not constants. If you say
const string s = "Hello";
then not only is Hello immutable, but s is too. You should only use const for things that are logically immutable for all time. The price of gold, the name of your bank, your last name, these things can all change. The atomic weight of gold, the value of pi, these things cannot ever change, and so they can be const. C# only allows certain types to be const, and only allows certain expressions to initialize a const.
A readonly field is halfway between a const and a variable. A readonly field is a variable in the constructor or field initializer, and is illegal to write to from any other location. C# treats readonly fields as values, not variables, in all code outside of a constructor or field initializer.
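Here is a small sketch of the three cases side by side. The Drawer class and its members are hypothetical names chosen to match the coin analogy; the commented-out lines are the ones the compiler would reject:

```csharp
using System;

public sealed class Drawer
{
    // const: fixed at compile time; it is not a variable at all.
    public const int DimeValueInCents = 10;

    // readonly: may only be assigned in a field initializer or a constructor.
    private readonly string label;

    // An ordinary field is a variable and may be reassigned anywhere.
    private string coin;

    public Drawer(string label, string coin)
    {
        this.label = label; // allowed: we are inside the constructor
        this.coin = coin;
    }

    public void Replace(string newCoin)
    {
        coin = newCoin;            // allowed: coin is an ordinary variable
        // label = "other";        // would not compile: readonly field outside a constructor
        // DimeValueInCents = 25;  // would not compile: a const cannot be assigned
    }

    public string Describe()
    {
        return label + ": " + coin;
    }
}
```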

Immutable does not mean that a variable cannot be reassigned; it means the object it refers to cannot be mutated.
In this example:
string s1 = "hello";
string s2 = s1;
s1 = "goodbye";
The variable s1 is reassigned to refer to a new string object, "goodbye", while s2 still refers to the original string ("hello"), because that string was never mutated.

Your variable 'line' should be thought of as a pointer to the string, not as the string.
The string itself cannot be changed. For example, you cannot write
line[4] = 'a';
and expect it to change the fifth char; the string indexer in C# is read-only, so this does not compile. Contrast with C, where you can write
mystr[4] = 'a';
(in most cases, anyway)
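If you do need "the same text with the fifth char changed", the usual route is to build a new string. A minimal sketch, with CharReplaceDemo and WithCharAt as made-up names:

```csharp
using System;

public static class CharReplaceDemo
{
    // Returns a new string equal to source except for the character at index.
    public static string WithCharAt(string source, int index, char replacement)
    {
        char[] chars = source.ToCharArray(); // a mutable copy of the characters
        chars[index] = replacement;          // arrays can be modified in place
        return new string(chars);            // a brand new string object
    }
}
```

After string changed = CharReplaceDemo.WithCharAt(line, 4, 'a'); the variable line still refers to the untouched original; assign the result back to line if that is the variable you want updated.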

Immutable doesn't mean that a reference to the object can't be reassigned; it means the object itself can't be mutated (changed).
You can reassign the reference forever, but you can't change any of the strings themselves.
If you made the reference a constant, then it couldn't be reassigned.

Related

Passing by reference to n-th element in C#

In C, if we have an array, we can pass it by reference to a function. We can also use simple addition of (n-1) to pass the reference starting from n-th element of the array like this:
char *strArr[5];
char *str1 = "I want that!\n";
char *str2 = "I want this!\n";
char *str3 = "I want those!\n";
char *str4 = "I want these!\n";
char *str5 = "I want them!\n";
strArr[0] = str1;
strArr[1] = str2;
strArr[2] = str3;
strArr[3] = str4;
strArr[4] = str5;
printPartially(strArr + 1, 4); //we can pass like this to start printing from 2nd element
....
void printPartially(char** strArrPart, char size){
    int i;
    for (i = 0; i < size; ++i)
        printf("%s", strArrPart[i]); /* never pass data as the format string */
}
Resulting in these:
I want this!
I want those!
I want these!
I want them!
Process returned 0 (0x0) execution time : 0.006 s
Press any key to continue.
In C#, we can also pass a reference to an object by ref (or out). That includes arrays, in which case the whole array is passed (or at least, this is how I suppose it works). But how can we pass a reference to the n-th element of the array, so that inside the function there is only a string[] with one element fewer than the original string[], without creating a new array?
Must we use unsafe? I am looking for a solution (if possible) without unsafe.
Edit:
I understand that we can pass an array in C# without the ref keyword. Perhaps mentioning ref when talking about arrays made my question misleading. What I meant to ask is this: besides passing a reference to a whole object, can the ref keyword be used to pass a reference starting at the n-th element of the array, as C does? My apologies for any misunderstanding caused by my phrasing.

The "safe" approach would be to pass an ArraySegment struct instead.
You can of course pass a pointer to a character using unsafe C#, but then you need to worry about buffer overruns.
Incidentally, an array in C# is (usually) allocated on the heap, so passing it normally (without ref) doesn't copy the array; what is passed is still a reference (just a new copy of that reference).
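A rough sketch of the ArraySegment approach, assuming you only need to read the elements. SegmentDemo, JoinSegment and Demo are hypothetical names; the index loop uses only the Array, Offset and Count properties of the struct:

```csharp
using System;
using System.Text;

public static class SegmentDemo
{
    // Concatenates the elements covered by the segment without copying the array.
    public static string JoinSegment(ArraySegment<string> part)
    {
        var sb = new StringBuilder();
        for (int i = part.Offset; i < part.Offset + part.Count; i++)
        {
            sb.Append(part.Array[i]); // part.Array is the original array
        }
        return sb.ToString();
    }

    public static string Demo()
    {
        string[] strArr = { "that!", "this!", "those!", "these!", "them!" };
        // Offset 1, count 4: a window starting at the 2nd element.
        var tail = new ArraySegment<string>(strArr, 1, 4);
        return JoinSegment(tail);
    }
}
```

The segment is just a small struct holding the array reference plus an offset and a count, which is about as close as safe code gets to strArr + 1 in C.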

Edit:
You won't be able to do it the way you do in C in safe code.
A C# array (e.g. string[]) is derived from the abstract type Array.
It is not just a simple memory block as it is in C.
So you can't pass a reference to one of its elements and start iterating from there.
But there are some solutions that will give you the same flavor (without unsafe), such as:
As @Chris mentioned, you can use ArraySegment<T>.
As an array also implements IEnumerable<T>, you can use .Skip and pass the returned value. This gives you an IEnumerable<T> instead of an array, but it still lets you iterate.
etc...

If the method only needs to read from the array, you can use LINQ:
string[] strings = {"str1", "str2", "str3", ...."str10"};
print(strings.Skip(1).Take(4).ToArray());

Your confusion is a very common one. The essential point is realizing that "reference types" and "passing by reference" (the ref keyword) are totally independent. In this specific case, since string[] is a reference type (as are all arrays), the object is not copied when you pass it around, hence you are always referring to the same object.
Modified version of the C# code:
string[] strArr = new string[5];
strArr[0] = "I want that!\n";
strArr[1] = "I want this!\n";
strArr[2] = "I want those!\n";
strArr[3] = "I want these!\n";
strArr[4] = "I want them!\n";
// Skip/Take need "using System.Linq;". Note that ToArray() copies the four elements into a new array.
printPartially(strArr.Skip(1).Take(4).ToArray());
void printPartially(string[] strArr)
{
    foreach (string str in strArr)
    {
        Console.Write(str); // the strings already end with "\n"
    }
}

The question is old, but maybe this answer will be useful to someone.
As of C# 7.2 there are more types you can use in this case, e.g. Span<T> or Memory<T>.
They allow exactly what you describe in your question (and much more).
Here's a great article about them
Currently, if you want to use them, remember to add <LangVersion>7.2</LangVersion> to the .csproj file of your project to enable C# 7.2 features.
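A minimal sketch of the Span idea, assuming a runtime where Span<T> and the AsSpan extension are available (for example .NET Core 2.1 or later, or the System.Memory package). SpanDemo and its methods are made-up names:

```csharp
using System;
using System.Text;

public static class SpanDemo
{
    // Reads the elements through the span; no array is allocated here.
    public static string JoinAll(ReadOnlySpan<string> items)
    {
        var sb = new StringBuilder();
        foreach (string item in items)
        {
            sb.Append(item);
        }
        return sb.ToString();
    }

    public static string Demo()
    {
        string[] strArr = { "that!", "this!", "those!", "these!", "them!" };
        // AsSpan(start, length) creates a view over elements 1..4 of strArr.
        return JoinAll(strArr.AsSpan(1, 4));
    }

    // Shows that a span is a view: writing through it changes the array.
    public static bool IsView()
    {
        string[] strArr = { "a", "b", "c" };
        Span<string> tail = strArr.AsSpan(1);
        tail[0] = "changed";            // writes to strArr[1]
        return strArr[1] == "changed";
    }
}
```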

Why can you change an immutable string variable if it cannot be changed?

This might have been asked somewhere else, but I did not see it when I did a simple search. I am taking a C# class for work, and I know that string is immutable and that a StringBuilder can be changed throughout the code.
But why does C# let you change the string if it is immutable?
Or does it let you change the string because it takes a new memory spot, creates a whole new string, and abandons the old address location? If that is the case, and the automatic garbage collection comes through and tidies up the abandoned memory locations, does it really matter whether a person uses string or StringBuilder?
Code:
string sss = "abcd";
sss = "ghy";
sss = "nnnnnn";

That's changing the value of a variable by assigning a different value to it.
That is not mutating the value to which the variable refers.

You've basically answered the question yourself. The short answer is: "C# does not let you change the string."
For various design reasons, String in C# is implemented as a reference type. What you're doing is changing what sss references; you're not changing the thing that's being referenced. 'abcd' exists separately from 'ghy', which exists separately from 'nnnnnn'. You're not changing one thing into another thing. You're creating a whole new thing, and then changing which one sss references.
Strings are immutable in C# to make them behave like value-types even though they're implemented as reference types. Why? Because the alternative is worse:
Consider an example where strings are reference types, but aren't immutable - and you can change 'abcd' into 'ab12'. How would that look?
string mutableString = "abcd";
string anotherRef = mutableString;
mutableString.mutateTo("ab12"); //A method that doesn't exist - for exposition only
At this point anotherRef would also be "ab12", which seems very unintuitive.
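You can see that same surprise with a type that really is mutable. This sketch uses StringBuilder in place of the imaginary mutable string; AliasDemo and its method are hypothetical names:

```csharp
using System.Text;

public static class AliasDemo
{
    public static string MutateThroughOneReference()
    {
        var mutable = new StringBuilder("abcd");
        StringBuilder anotherRef = mutable; // same object, two variables
        mutable.Replace("cd", "12");        // mutates the object in place
        return anotherRef.ToString();       // the change is visible through anotherRef
    }
}
```

With string this cannot happen, because there is no operation that changes the object both variables refer to.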
Regarding StringBuilder:
StringBuilder is more of a performance thing that's used in special cases. For most string concatenation, it doesn't matter. If you have two strings like "My name is" and "Pete" and you concatenate them, then you'll end up with a new string "My name is Pete", and the old strings will probably still be around in memory somewhere. Not a big deal. You have the full string that you want, and not a lot of time or memory was wasted. Consider a more extreme case though: what if you have more words: "The" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog" and you want to concatenate them all together:
"The" + "quick" yields "The quick" (a new string)
"The quick" + "brown" yields "The quick brown" (another new string)
"The quick brown" + "fox" yields "The quick brown fox" (another new string - getting longer)
"The quick brown fox" + "jumps" yields "The quick brown fox jumps" (another new string - each more expensive than the last!)
etc. . .
You can see that this method results in a lot of extra intermediate strings that you don't really care about, but they'll all be allocated - which takes time - and they'll all hang around in memory until the GC gets rid of them. StringBuilder gives you a method to do the concatenation without having to allocate every one of the intermediate results.
Note: this next part isn't strictly true, but it's close enough for this example, and there is no contractual behavior for the internal implementation of StringBuilder. You can think of it as "lazy concatenation." It can hold references to all of the component strings internally, without allocating space for the final result until you call the ToString method when you're done appending strings. This lets it compute the final size it needs and allocate enough space for the whole combined string once instead of n times.
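For the nine-word case, the StringBuilder version might look like the sketch below. ConcatDemo and JoinWords are made-up names, and how StringBuilder manages its buffer internally is left to the runtime:

```csharp
using System.Text;

public static class ConcatDemo
{
    // Joins the words with single spaces using one StringBuilder.
    public static string JoinWords(string[] words)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < words.Length; i++)
        {
            if (i > 0)
            {
                sb.Append(' ');
            }
            sb.Append(words[i]); // appends into the builder's buffer; no intermediate string objects
        }
        return sb.ToString();    // the only string created for the result
    }
}
```

For a handful of strings the difference is not worth worrying about; it starts to matter when the loop runs many times.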

Object reference behaviour

In the snippet below, I have two variables, firstString and secondString, which hold the same value, "Hello". So both variables refer to the same location.
var firstString = "Hello";
var secondString = "Hello";
bool isSameReference = Object.ReferenceEquals(firstString, secondString);
//same reference for both variables
But updating secondString to "Hey.." does not update firstString, even though both referred to the same location. Why is firstString not updated when both variables refer to the same location?
secondString = "Hey..";
isSameReference = Object.ReferenceEquals(firstString, secondString);
//reference changed but firstString not updated
Setting secondString back to its previous value, "Hello", makes the references the same again.
secondString = "Hello";
isSameReference = Object.ReferenceEquals(firstString, secondString);
//now the reference for both variables are same
Why does C# behave this way, and how does the framework handle this internally? Thanks in advance.

This process is called interning. You can read more on string interning there. It is done to save space and processing time when a new string is allocated with exactly the same content as an existing one. String interning also makes string comparison a trivial operation. This is possible because String is an immutable type.

You did not update the string; you updated the reference to a string. It now points to "Hey..", which has nothing to do with the string "Hello" that firstString refers to. Furthermore, the C# compiler collects every string literal in your code in a list without duplicates. That's why two "Hello" literals in different places are the same string if you compare references to them.

C# (.NET) holds every string literal only once on the .NET heap. The "Hello" and "Hey.." strings are stored in two different locations on the .NET heap. Initially, firstString and secondString both point to the "Hello" location. secondString = "Hey.."; just changes the secondString variable to point to the location on the heap where "Hey.." is located. You should be aware that a string variable holds the address (reference) of the place on the heap where the string is really located.

C# Changing a string after it has been created

Okay I know this question is painfully simple, and I'll admit that I am pretty new to C# as well. But the title doesn't describe the entire situation here so hear me out.
I need to alter a URL string which is being created in a C# code-behind, removing the substring ".aspx" from the end of the string. So basically I know that my URL, coming into this class, will be something like "Blah.aspx" and I want to get rid of the ".aspx" part of that string. I assume this is quite easy to do by just finding that substring and removing it if it exists (or some similar strategy; I would appreciate it if someone has an elegant solution they've used before). Here is the problem:
"Because strings are immutable, it is not possible (without using unsafe code) to modify the value of a string object after it has been created." This is from the MSDN official website. So I'm wondering now, if strings are truly immutable, then I simply can't (shouldn't) alter the string after it has been made. So how can I make sure that what I'm planning to do is safe?

You don't change the string, you change the variable. Instead of that variable referring to a string such as "foo.aspx", alter it to point to a new string that has the value "foo".
As an analogy, adding one to the number two doesn't change the number two. Two is still just the same as it always was; you have changed a variable from referring to one number to referring to another.
As for your specific case, EndsWith and Remove make it easy enough:
if (url.EndsWith(".aspx"))
    url = url.Remove(url.Length - ".aspx".Length);
Note here that Remove takes a string and an integer and gives us a brand new string, which we need to assign back to our variable. It doesn't change the string itself.
Also note that there is a Uri class that you can use for parsing URLs, and it will be able to handle all of the complex situations that can arise, including hashes, query parameters, etc. You should use that to parse out the aspects of a URL that you are interested in.
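As a rough sketch of the Uri route, assuming the input is an absolute URL: the helper below (UrlDemo.StripAspx is a made-up name) only touches the path, so a query string or fragment that happens to contain ".aspx" is left alone.

```csharp
using System;

public static class UrlDemo
{
    // Removes a trailing ".aspx" from the path of an absolute URL.
    public static string StripAspx(string url)
    {
        var uri = new Uri(url);
        string path = uri.AbsolutePath; // e.g. "/Blah.aspx"
        if (path.EndsWith(".aspx", StringComparison.OrdinalIgnoreCase))
        {
            // Remove returns a new string; path is reassigned to it.
            path = path.Remove(path.Length - ".aspx".Length);
        }
        // Scheme and host, then the edited path, then the untouched query and fragment.
        return uri.GetLeftPart(UriPartial.Authority) + path + uri.Query + uri.Fragment;
    }
}
```

For example, UrlDemo.StripAspx("http://example.com/Blah.aspx?id=1#top") yields "http://example.com/Blah?id=1#top".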

String immutability is not a problem for normal usage -- it just means that member functions like "Replace", instead of modifying the existing string object, return a new one. In practical terms that usually just means you have to remember to copy the change back to the original, like:
string x = "Blah.aspx";
x.Replace(".aspx", ""); // still "Blah.aspx"
x = x.Replace(".aspx", ""); // now "Blah"
The weirdness around strings comes from the fact that System.String is a reference type, yet, because of its immutability, behaves much like a value type. For example, if you pass a string into a function, the function has no way to change what the caller sees, unless you pass it by reference:
void Test(string y)
{
    y = "bar";
}
void Test(ref string z)
{
    z = "baz";
}
string x = "foo";
Test(x); // x is still "foo"
Test(ref x); // x is now "baz"

A String in C# is immutable, as you say. Meaning that this would create multiple String objects in memory:
String s = "String of numbers 0";
s += "1";
s += "2";
So, while the variable s would return to you the value String of numbers 012, internally it required the creation of three strings in memory to accomplish.
In your particular case, the solution is quite simple:
String myPath = "C:\\folder1\\folder2\\myFile.aspx";
myPath = Path.Combine(Path.GetDirectoryName(myPath), Path.GetFileNameWithoutExtension(myPath));
Again, this appears as if myPath has changed, but it really has not. An internal copy and assign took place and you get to keep using the same variable.
Also, if you must preserve the original variable, you could simply make a new variable:
String myPath = "C:\\folder1\\folder2\\myFile.aspx";
String thePath = Path.Combine(Path.GetDirectoryName(myPath), Path.GetFileNameWithoutExtension(myPath));
Either way, you end up with a variable you can use.
Note that the use of the Path methods ensures you get proper path operations, and not blind String replacements that could have unintended side-effects.

String.Replace() will not modify the string. It will create a new one. So the following code:
String myUrl = @"http://mypath.aspx";
String withoutExtension = myUrl.Replace(".aspx", "");
will create a brand-new string which is assigned to withoutExtension.

Why doesn't interning work on copies of a string?

Given:
object literal1 = "abc";
object literal2 = "abc";
object copiedVariable = string.Copy((string)literal1);
if (literal1 == literal2)
    Console.WriteLine("objects are equal because of interning"); // are equal
if (literal1 == copiedVariable)
    Console.WriteLine("copy is equal");
else
    Console.WriteLine("copy not eq"); // NOT equal
These results imply that copiedVariable is not subject to string interning. Why?
Is there a circumstance where it's useful to have equivalent strings that are not interned, or is this behavior due to some language detail?

If you think about it, the interning of strings is a process that is triggered at compile time on literals. Which implies that:
it is implicit when you assign/bind a literal to a variable
it is implicit when you copy a reference (i.e. string a = some_other_string_variable;)
On the other hand, if you create an instance of a string manually at run-time, by using a StringBuilder or by Copy-ing, then you have to specifically request to intern it by invoking the Intern method of the String class.
Even in the remarks section of the documentation it is stated that:
The common language runtime conserves string storage by maintaining a
table, called the intern pool, that contains a single reference to
each unique literal string declared or created programmatically in
your program. Consequently, an instance of a literal string with a
particular value only exists once in the system. For example, if you
assign the same literal string to several variables, the runtime
retrieves the same reference to the literal string from the intern
pool and assigns it to each variable.
And the documentation for the Copy method of the String class states that it:
Creates a new instance of String with the same value as a specified
String.
which implies that it's not going to just return a reference to the same string (from the intern pool). Again, if it did there wouldn't be much use for it then, would there?!
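Here is a small sketch of that explicit request. It builds the copy with new string(char[]) rather than string.Copy, and InternDemo and Compare are made-up names. The string built at run-time has the same value as the literal but is a different object until you ask for the interned instance:

```csharp
using System;

public static class InternDemo
{
    // Returns { equal by value, same object before Intern, same object after Intern }.
    public static bool[] Compare()
    {
        string literal = "abc";                             // interned because it is a literal
        string built = new string(new[] { 'a', 'b', 'c' }); // created at run-time, not interned

        bool equalValue = literal == built;                 // string == compares contents
        bool sameBefore = object.ReferenceEquals(literal, built);
        // Intern returns the pooled instance that has the same value.
        bool sameAfter = object.ReferenceEquals(literal, string.Intern(built));

        return new[] { equalValue, sameBefore, sameAfter };
    }
}
```

On the runtimes I know of this gives true, false, true, which matches the behavior described in the question.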

Some languages require the result to be a copy for certain methods/procedures.
For example, in substring-type methods. The semantics would then be the same even if you call foo.substring(0, foo.length) (which is how you would probably implement stringcopy).
Note: IIRC*, this is NOT the case with .NET's implementation of string.Substring, though. It is not really clear from MSDN either. (see below)
It returns:
A string that is equivalent to the substring of length length that
begins at startIndex in this instance, or Empty if startIndex is equal
to the length of this instance and length is zero.
It notes:
This method does not modify the value of the current instance.
Instead, it returns a new string with length characters starting from
the startIndex position in the current string.
UPDATE
I remembered correctly: it does indeed do a check with string InternalSubString(int startIndex, int length, bool fAlwaysCopy) if fAlwaysCopy is not false. Substring passes false to this method.
UPDATE 2
It looks like string.Copy could have used InternalSubString by passing true to the aforementioned parameter, but looking at the disassembly, it seems to use a slightly more optimized version and possibly saves a method call.
Sorry for the redundant information.
* The reason I remember was when implementing the substring procedure for IronScheme, which the R6RS specification requires to make a copy :)
