Strings in Java and C#


I recently moved over to C# from Java and want to know how to explicitly define a string that's stored on the heap.
For example:
In Java, there are two ways we can define strings:
String s = "Hello"; // goes in the string pool and is interned
String s1 = new String("Hello"); // creates a new string on the heap
AFAIK, C# has only one way of defining a string:
String s = "Hello"; // goes on the heap and is interned
Is there a way I can force this string to be created on the heap, like we do in Java using the new operator? There is no business need for me to do this; it's just for my understanding.

In C#, strings are ALWAYS created on the heap. Constant strings are also (by default) always interned.
You can force a non-constant string to be interned using string.Intern(), as the following code demonstrates:
string a1 = "TE";
string a2 = "ST";
string a = a1 + a2;
if (string.IsInterned(a) != null)
    Console.WriteLine("a was interned");
else
    Console.WriteLine("a was not interned");
string.Intern(a);
if (string.IsInterned(a) != null)
    Console.WriteLine("a was interned");
else
    Console.WriteLine("a was not interned");

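In C#, a data type is either
a value type (e.g. int, struct), whose value is stored directly in the variable: on the stack for a typical local variable, or inline inside the containing object when it is a field of a class
a reference type (e.g. string, class), whose instances are created on the heap while the variable holds only a reference
Since string is a reference type, a string object is always created on the heap.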
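To make the value/reference distinction concrete, here is a minimal sketch. The types PointValue and PointRef are hypothetical names invented for this illustration; the sketch shows copy semantics, which is what you can observe from code, rather than where the runtime physically places the data.

```csharp
using System;

// Hypothetical types, named only for this illustration.
struct PointValue { public int X; }
class PointRef { public int X; }

static class CopySemanticsDemo
{
    // Returns the X seen through the ORIGINAL variables after their copies were modified.
    public static (int StructX, int ClassX) Run()
    {
        var v1 = new PointValue { X = 1 };
        var v2 = v1;   // copies the whole struct
        v2.X = 2;      // changes only the copy

        var r1 = new PointRef { X = 1 };
        var r2 = r1;   // copies only the reference
        r2.X = 2;      // both variables refer to the same object

        return (v1.X, r1.X);
    }

    public static void Main()
    {
        var (structX, classX) = Run();
        Console.WriteLine($"struct original X = {structX}"); // prints 1
        Console.WriteLine($"class original X = {classX}");   // prints 2
    }
}
```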

On the .NET platform, strings are always created on the heap. If you want to edit a string, say:
string foo = "abc";
foo = foo + "efg";
it will create a new string; it WON'T EDIT the previous one. The previous one becomes eligible for garbage collection once nothing references it any more (an interned literal such as "abc" stays alive in the intern pool). But, to conclude, a string will always be created on the heap.
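A short sketch of that behaviour, assuming nothing beyond the base class library (the class name ImmutabilityDemo is made up for the example): a second variable keeps pointing at the old string while foo is rebound to a newly built one.

```csharp
using System;

static class ImmutabilityDemo
{
    public static (string Before, string After, bool SameObject) Run()
    {
        string foo = "abc";
        string before = foo;   // second reference to the same string object
        foo = foo + "efg";     // builds a new string at run time and rebinds foo
        return (before, foo, object.ReferenceEquals(before, foo));
    }

    public static void Main()
    {
        var (before, after, same) = Run();
        Console.WriteLine(before); // prints abc (the old string is unchanged)
        Console.WriteLine(after);  // prints abcefg
        Console.WriteLine(same);   // prints False
    }
}
```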

As in Java, you can also construct a string explicitly with new:
char[] letters = { 'A', 'B', 'C' };
string alphabet = new string(letters);
Various other ways are explained in this link.

On .NET, your literal string will be created on the heap and a reference added to the intern pool before the program starts.
Allocating a new string on the heap occurs at runtime if you do something dynamic like concatenating two variables:
String s = string1 + string2;
See: http://msdn.microsoft.com/library/system.string.intern.aspx

Related

How to view the address of string, to check its (reference) type in C#?

using System;
using System.Runtime.InteropServices;
using System.Security.Claims;
using System.Text;

Method();

unsafe void Method()
{
    string a = "hello";
    string b = a;
    //or: string b = "hello";
    Console.WriteLine(object.ReferenceEquals(a, b)); // True.

    string aa = "hello";
    string bb = "h";
    bb += "ello";
    Console.WriteLine(object.ReferenceEquals(aa, bb)); // False.

    int aaa = 100;
    int bbb = aaa;
    Console.WriteLine(object.ReferenceEquals(aaa, bbb)); // False.

    string* pointer1;
    string* pointer2;
    string word1 = "Hello";
    string word2 = "Hello";
    pointer1 = &word1;
    pointer2 = &word2;
    ulong addr1 = (ulong)pointer1;
    ulong addr2 = (ulong)pointer2;
    Console.WriteLine($"Address of variable named word1: {addr1}");
    Console.WriteLine($"Address of variable named word2: {addr2}");
}
Why different locations?
It works correctly with object/string.ReferenceEquals. But I can't see the ADDRESSes of strings. Beginner in the world of IT. Be kind, people.

We'll start from here:
string word1 = "Hello";
string word2 = "Hello";
It seems you expect word1 and word2 to refer to the same string object in memory. For normal reference types that's not how it works (strings can be a little different... we'll get there): you should expect two different objects. The two objects have equivalent values, but they are still different objects. Also note that &word1 and &word2 in your code take the addresses of the two local variables, which are separate slots that each hold a reference; they are not the address of the string object, so they differ even when both variables refer to the same string.
This is important. Imagine the next line changed the string for word1. You would not want the word2 variable to also change.
Now, strings are a little bit "special" in this area. By default, the compiler and runtime intern equivalent string literals (this can depend on the runtime and on assembly-level settings). This means the same object in memory is used for literals with equivalent values.
This is possible because strings are immutable. That is, calling, say, word1.Replace("e", "3") does not change the value of the string in word1 to instead be "H3llo", and therefore word2 is also not modified by changes made from word1. Instead, the Replace() call returns a new string. Additionally, all the string methods and properties work this way, such that there is no way to change an existing string in-place.
If you want word1 to receive that new value, you must also assign it to the variable: word1 = word1.Replace("e", "3");. Since this is a new assignment and only assigns to word1, the word2 variable will still show "Hello". So everything works as expected, and you were able to save some memory use while the two values were equal. Again: strings have special treatment here, and this is a little different from how most reference objects work by default.
But there's another important thing to understand about memory management in .NET. The Garbage Collector can sometimes move objects to new locations. This means any address you see at one moment may not be the address it uses the next moment. This can especially happen during the compaction phase of garbage collection.
Now, it is possible to pin objects via the fixed keyword, but this is not usually a good idea; it's something to avoid unless you really need it: say to pass the object to an outside unmanaged library. There are a number of reasons for this, but one is that it prevents the garbage collector from moving the object until the fixed block closes, which gets in the way of heap compaction.
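If you do want to look at the address of the string data itself, purely for learning purposes, one possible sketch is below. It swaps the fixed keyword for GCHandle pinning so that no unsafe code is required; the class and method names are invented for the example, and the printed addresses are only meaningful while the handles are held.

```csharp
using System;
using System.Runtime.InteropServices;

static class StringAddressDemo
{
    // Pins both strings, prints where their character data lives,
    // and reports whether the two variables refer to the same storage.
    public static bool ShareStorage(string x, string y)
    {
        GCHandle hx = GCHandle.Alloc(x, GCHandleType.Pinned);
        GCHandle hy = GCHandle.Alloc(y, GCHandleType.Pinned);
        try
        {
            IntPtr px = hx.AddrOfPinnedObject();
            IntPtr py = hy.AddrOfPinnedObject();
            Console.WriteLine($"x data at 0x{px.ToInt64():X}, y data at 0x{py.ToInt64():X}");
            return px == py;
        }
        finally
        {
            // Always unpin, so the garbage collector can move the objects again.
            hx.Free();
            hy.Free();
        }
    }

    public static void Main()
    {
        string word1 = "Hello";
        string word2 = "Hello";
        string built = new string("Hello".ToCharArray());

        // Two identical literals: expected True under default interning.
        Console.WriteLine(ShareStorage(word1, word2));
        // A string built at run time is a separate object: False.
        Console.WriteLine(ShareStorage(word1, built));
    }
}
```

For everyday code, object.ReferenceEquals answers the same question without pinning anything.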

String inside structure behavior

Let's say we have structure
struct MyStruct
{
public string a;
}
When we assign it to a new variable, what happens to the string? For example, we expect that the string should be shared when structs are copied on the stack. We're using this code to test it, but it returns different pointers:
var a = new MyStruct();
a.a = "test";
var b = a;
IntPtr pA = Marshal.StringToCoTaskMemAnsi(a.a);
IntPtr pB = Marshal.StringToCoTaskMemAnsi(b.a);
Console.WriteLine("Pointer of a : {0}", (int)pA);
Console.WriteLine("Pointer of b : {0}", (int)pB);
The question is: when structs that contain a string are copied on the stack, is the string shared or is it recreated?
[UPDATE]
We also tried this code, it returns different pointers as well:
char charA2 = a.a[0];
char charB2 = b.a[0];
unsafe
{
    var pointerA2 = &charA2;
    var pointerB2 = &charB2;
    Console.WriteLine("Pointer of a : {0}", (int)pointerA2);
    Console.WriteLine("Pointer of b : {0}", (int)pointerB2);
}

The code you use to test it 'Copies the contents of a managed String to a block of memory allocated from the unmanaged COM task allocator.' according to MSDN. I would be surprised if any two subsequent calls to StringToCoTaskMemAnsi would return the same pointer. You can look at the memory address of the two string references or assign an object id using the debugger. Or easier: object.ReferenceEquals(a.a, b.a);
In your update, you are pointing to the stack location of the character variables, also not a good way of finding out. In any case, you are just copying the reference when you assign a string to another string, so they should always be the same.

Strings are immutable and string is a reference type. Furthermore, in your example the string "test" is interned. So no matter how many copies of the struct you make, you ultimately have multiple pointers to the same underlying storage (unless you go through gyrations to copy it to a new block of memory, which your contrived examples are doing).
Rest assured that there is only one copy, pointed at multiple times.
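A minimal sketch of the check suggested above, using object.ReferenceEquals instead of marshalling. The string is deliberately built at run time (an assumption added for the demo) to show that the sharing comes from copying the reference and does not depend on interning; StructCopyDemo is a made-up class name.

```csharp
using System;

struct MyStruct
{
    public string a;
}

static class StructCopyDemo
{
    public static bool CopySharesString()
    {
        var a = new MyStruct();
        // Built at run time, so this particular object is not the interned literal.
        a.a = new string(new[] { 't', 'e', 's', 't' });

        var b = a; // copies the struct, i.e. copies the reference stored in field a

        return object.ReferenceEquals(a.a, b.a);
    }

    public static void Main()
    {
        Console.WriteLine(CopySharesString()); // prints True
    }
}
```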

Object.ReferenceEquals returns true for matching strings

I am using Mono and encountered an interesting result when comparing references of two strings. The code below demonstrates an example:
using System;
class Program
{
static void Main()
{
String s1 = "asd";
String s2 = "asd";
Console.WriteLine("Reference Equals: {0}", Object.ReferenceEquals(s1, s2));
Console.ReadLine();
}
}
Yields true.
It is interesting, two strings have same value but obviously they refer to two different instances. What is going on?
mono --version : Mono JIT compiler version 3.2.6
OS X 10.9.2

http://msdn.microsoft.com/en-us/library/system.string.intern.aspx
The common language runtime conserves string storage by maintaining a
table, called the intern pool, that contains a single reference to
each unique literal string declared or created programmatically in
your program. Consequently, an instance of a literal string with a
particular value only exists once in the system.
The below shows behavior when strings are not created from a string literal.
static void Main(string[] args)
{
var string1 = new string(new []{'c'});
var string2 = new string(new []{'c'});
Console.WriteLine(string1.Equals(string2)); //true
Console.WriteLine(Object.ReferenceEquals(string1,string2)); //false
}

It is interesting, two strings have same value but obviously they
refer to two different instances
No, they don't refer to two different instances. In fact, there are no two different instances: only one instance is created, because you are providing the same string literal.
In your program, one and only one instance is created for all identical string constants, and all string reference variables refer to that same instance. Hence, when you run the ReferenceEquals() method on those references, you get True, as they refer to the same instance.
From : String Interning
The CLR maintains a table called the intern pool that contains a
single, unique reference to every literal string that's either declared or created programmatically while your program's running.
If you want to see the result you expected, try the snippet below. It creates two different instances, because each string is built by a constructor call.
Try this:
String s1 = new String("asd".ToCharArray());
String s2 = new String("asd".ToCharArray());
Console.WriteLine("Reference Equals: {0}",Object.ReferenceEquals(s1, s2));//False

How is memory allocated for a String object without the new keyword or a constructor?

In C# if we want to create a variable of type string we can use:
string str="samplestring"; // this will allocate the space to hold the string
In C#, string is a class type, so if we want to create an object, normally we have to use the new keyword. So how is allocation happening without new or constructors?

When you write
string str="samplestring";
the compiler will generate two instructions:
First, ldstr gets a string literal from the metadata, allocates the requisite amount of memory, creates a new String object, and pushes a reference to it onto the stack.
Then stloc (or one of its short forms, e.g. stloc.0) stores that reference in the local variable str.
Note that ldstr will allocate memory only once for each distinct sequence of characters.
So in example below both variables will point at the same object in memory:
// CLR will allocate memory and create a new String object
// from the string literal stored in the metadata
string a = "abc";
// CLR won't create a new String object. Instead, it will look up the existing
// reference pointing to the String object created from the "abc" literal
string b = "abc";
This process is known as string interning.
Also, as you know, in .NET strings are immutable, so the contents of a String object cannot be changed after the object is created. That is, every time you concatenate strings at run time, the CLR creates a new String object.
For example, the following lines of code:
string a = "abc";
string b = a + "xyz";
Will be compiled into the following IL (not exactly, of course):
ldstr will allocate memory and create a new String object from "abc" literal
stloc will store the reference to that object in the local variable a
ldloc will push that reference onto the stack
ldstr will allocate memory and create a new String object from "xyz" literal
call will invoke System.String::Concat on the two String references on the stack
(A call to System.String::Concat is itself carried out by many IL instructions and internal calls which, in short, check the lengths of both strings, allocate the requisite amount of memory to store the concatenation result, and then copy both strings into the newly allocated memory.)
stloc will store the reference to the newly created string in the local variable b
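The effect of those steps can be observed from C# itself. The sketch below is only an illustration under default interning (ConcatDemo is a hypothetical name): concatenating two literals is folded into a single constant at compile time, while concatenating through a variable goes through String.Concat at run time and yields a separate object.

```csharp
using System;

static class ConcatDemo
{
    public static (bool FoldedIsLiteral, bool RuntimeIsLiteral) Run()
    {
        // Both operands are constants, so the compiler emits one literal "abcxyz".
        string folded = "abc" + "xyz";

        // One operand is a variable, so the concatenation happens at run time.
        string a = "abc";
        string b = a + "xyz";

        return (object.ReferenceEquals(folded, "abcxyz"),
                object.ReferenceEquals(b, "abcxyz"));
    }

    public static void Main()
    {
        var (folded, runtime) = Run();
        Console.WriteLine(folded);  // prints True
        Console.WriteLine(runtime); // prints False
    }
}
```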

This is simply the C# compiler giving you a shortcut by allowing string literals.
If you'd rather, you can instantiate a string by any number of different constructors. For example:
char[] chars = { 'w', 'o', 'r', 'd' };
string myStr = new String(chars);

According to the MS documentation you do not need to use the new command to use the default string constructor.
However this does work.
char[] letters = { 'A', 'B', 'C' };
string alphabet = new string(letters);
c# Strings (from MSDN programming guide)

Strings are in fact reference types. The variable holds a reference to the value in memory. Therefore you are just assigning the reference, and not the value, to the variable. I would recommend you have a look at this video from Pluralsight (you can get a free 14-day trial)
Pluralsight C# Fundamentals - Strings
Disclaimer: I am in no way related to Pluralsight. I am a subscriber and I love the videos over there

While everything is an object in .NET, there are still primitive types (int, bool, etc.) that do not require instantiation.
As you can see here, a string variable is an address-sized reference (4 bytes in a 32-bit process, 8 bytes in a 64-bit process) pointing to a vector/array structure that can extend up to 2GB. Remember that strings are immutable types, so when you change a string you are not editing the existing value, but instead allocating new memory for the new value and then changing your string reference to point to the new memory structure.
Hope that helps.

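When you create a string using a literal, then internally, depending on whether or not your assembly is marked with the NoStringInterning flag, it behaves roughly like this (pseudocode, since String has no constructor that takes a string):
String str = new String("samplestring"); // with NoStringInterning, the runtime is not required to intern
// or without NoStringInterning (the default)
String str = String.Intern("samplestring");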

In Java, if you write something like this:
String s1 = "abc";
String s2 = "abc";
memory will be allocated for "abc" in the so-called string pool, and both s1 and s2 will refer to that memory. So s1 == s2 will return true ("==" compares references in Java). But if you write:
String s1 = new String("abc");
String s2 = new String("abc");
s1 == s2 will return false. In C#, literals are pooled in the same way, but == on strings compares values rather than references, so use Object.ReferenceEquals to compare references.
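For comparison, here is a rough C# counterpart (EqualityDemo is a made-up name). C# has no string constructor that takes a string, so this sketch builds the two separate instances from char arrays instead; it shows that == on strings compares contents in C#, while reference identity has to be checked explicitly.

```csharp
using System;

static class EqualityDemo
{
    public static (bool ValueEqual, bool SameReference) Run()
    {
        // Two distinct string objects with the same contents.
        string s1 = new string("abc".ToCharArray());
        string s2 = new string("abc".ToCharArray());

        bool valueEqual = s1 == s2;                         // string == compares characters
        bool sameReference = object.ReferenceEquals(s1, s2); // compares object identity

        return (valueEqual, sameReference);
    }

    public static void Main()
    {
        var (valueEqual, sameReference) = Run();
        Console.WriteLine(valueEqual);    // prints True
        Console.WriteLine(sameReference); // prints False
    }
}
```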

C# passing parameters by reference

I have the below piece of code, which prefixes a string to the start of each member of a string array, i.e. ["a","b","c"] prefixed with "z" becomes ["za","zb","zc"].
private string[] Prefix(string[] a, string b) {
for(int i = 0;i < a.Length;i++) {
a[i] = b + a[i];
}
return a;
}
The function works fine (although if there's a better way to do this, I'm happy to hear it), but I'm having issues when passing parameters.
string[] s1 = new string[] {"a","b"};
string[] s2 = Prefix(s1,"z");
Now, as far as I can tell, I'm passing s1 by value. But when the Prefix function has finished, s2 and s1 have the same value of ["za","zb"], as if s1 had been passed by reference. I was certain you had to explicitly declare this behaviour in C#, and am very confused.

As others have said, the reference is passed by value. That means your s1 reference is copied to a, but they both still refer to the same object in memory. What I would do to fix your code is write it like this (it needs using System.Linq; and using System.Collections.Generic;):
private IEnumerable<string> Prefix(IEnumerable<string> a, string b) {
    return a.Select(s => b + s);
}
and call it like this:
string[] s1 = new string[] {"a","b"};
string[] s2 = Prefix(s1,"z").ToArray();
This not only fixes your problem, but also allows you to work with Lists and other string collections in addition to simple arrays.

C#, like Java before it, passes everything by value by default.
However, for reference types, that value is a reference. Note that both string and array are reference types.

Here's a better way to do it:
private string[] Prefix(string[] a, string b) {
return a.Select(s => b + s).ToArray();
}
or even:
private IEnumerable<string> Prefix(IEnumerable<string> a, string b) {
return a.Select(s => b + s);
}

You are passing the reference to s1 by value. In other words, your a parameter (when in the Prefix function scope), and your s1 variable, are references to the same array.

Strings are immutable.
This means that when you append a string to a string, you get a totally new string. It's for performance reasons: it's cheaper to make a new string than to reallocate the existing array.
That is why it feels like you are working with strings by value.
In C#, references to reference types (i.e. classes, arrays and strings, whose instances are created on the heap) are passed by value by default, so the method receives a copy of the reference, not a copy of the object.

Actually, a reference to s1 was passed by value to Prefix(). While you now have two different references, both s1 and s2 still refer to the same string array, as arrays are reference types in C#.

Passing by value doesn't mean that you can't change things.
Objects are passed by value, but the value is (effectively) a pointer to the object.
Arrays are objects and a pointer to the array gets passed in. If you change the contents of the array in a method, the array will reflect the changes.
This doesn't happen with strings only because strings are immutable - once constructed, their contents can't change.

A reference to the string array is passed by value.
Consequently, the original array reference in the calling method cannot be changed, meaning that a = new string[10] within the Prefix method would have no impact on the s1 reference in the calling method. But the array itself is mutable, and a duplicate reference to the same array is capable of making changes to it in a way that would be visible to any other reference to the same array.
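The difference between reassigning the parameter and mutating the array it refers to can be sketched as follows. The method names are invented for this illustration, and the ref variant is included only as a contrast to show what true pass-by-reference looks like.

```csharp
using System;

static class ParameterPassingDemo
{
    // Rebinds only the local copy of the reference; the caller's variable is unaffected.
    static void Reassign(string[] a) { a = new string[10]; }

    // Writes through the reference; the caller sees the change in the shared array.
    static void Mutate(string[] a) { a[0] = "changed"; }

    // With ref, the caller's variable itself is passed, so rebinding is visible.
    static void ReassignByRef(ref string[] a) { a = new string[10]; }

    public static (int AfterReassign, string AfterMutate, int AfterRef) Run()
    {
        string[] s1 = { "a", "b" };

        Reassign(s1);
        int afterReassign = s1.Length;   // still 2

        Mutate(s1);
        string afterMutate = s1[0];      // "changed"

        ReassignByRef(ref s1);
        int afterRef = s1.Length;        // now 10

        return (afterReassign, afterMutate, afterRef);
    }

    public static void Main()
    {
        var (afterReassign, afterMutate, afterRef) = Run();
        Console.WriteLine(afterReassign); // prints 2
        Console.WriteLine(afterMutate);   // prints changed
        Console.WriteLine(afterRef);      // prints 10
    }
}
```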

The value of an object is passed. For "reference" objects this is the value of the reference. A clone/copy/duplicate of the underlying data is not made.
To fix the issue observed, simply don't mutate the input array -- instead, create a new output array/object and fill it in appropriately. (If you must use arrays, I would likely use this approach as it's so boringly C-like anyway.)
Alternately, you can clone the input array first (which also creates a new object). Using a clone (which is a shallow copy) in this case is okay because the inner members (strings) are themselves immutable -- even though they are reference types the value of the underlying object can't be changed once created. For nested mutable types, more care may need to be taken.
Here are two methods which can be used to create a shallow copy:
string[] B = (string[])A.Clone();
string[] B = (new List<string>(A)).ToArray();
This list is not exhaustive.
Personally though, I would use LINQ. Here is a teaser:
return a.Select(x => b + x).ToArray();
