String inside structure behavior

Let's say we have a structure:
struct MyStruct
{
public string a;
}
When we assign it to a new variable, what happens to the string? For example, we expect the string to be shared when structs are copied on the stack. We're using this code to test it, but it returns different pointers:
var a = new MyStruct();
a.a = "test";
var b = a;
IntPtr pA = Marshal.StringToCoTaskMemAnsi(a.a);
IntPtr pB = Marshal.StringToCoTaskMemAnsi(b.a);
Console.WriteLine("Pointer of a : {0}", (int)pA);
Console.WriteLine("Pointer of b : {0}", (int)pB);
The question is: when structs that contain a string are copied on the stack, do the copies share the string, or is the string recreated?
[UPDATE]
We also tried this code, it returns different pointers as well:
char charA2 = a.a[0];
char charB2 = b.a[0];
unsafe
{
var pointerA2 = &charA2;
var pointerB2 = &charB2;
Console.WriteLine("Pointer of a : {0}", (int)pointerA2);
Console.WriteLine("Pointer of b : {0}", (int)pointerB2);
}

The code you use to test it 'Copies the contents of a managed String to a block of memory allocated from the unmanaged COM task allocator.' according to MSDN. I would be surprised if any two subsequent calls to StringToCoTaskMemAnsi would return the same pointer. You can look at the memory address of the two string references or assign an object id using the debugger. Or easier: object.ReferenceEquals(a.a, b.a);
In your update, you are pointing to the stack location of the character variables, also not a good way of finding out. In any case, you are just copying the reference when you assign a string to another string, so they should always be the same.

Strings are immutable, and string is a reference type. Furthermore, in your example the string "test" is interned. So no matter how many copies of the struct you make, you ultimately have multiple references to the same underlying storage (unless you go through gyrations to copy it to a new block of memory, which is what your contrived examples are doing).
Rest assured that there is only one copy, pointed at multiple times.
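
A minimal sketch of the check suggested above, using object.ReferenceEquals instead of pointer tricks. The class name StructCopyDemo and the method SharesString are made up for illustration; MyStruct mirrors the struct from the question.

```csharp
using System;

struct MyStruct
{
    public string a;
}

static class StructCopyDemo
{
    // Hypothetical helper: copies a struct and reports whether both copies
    // hold a reference to the very same string object.
    public static bool SharesString(string value)
    {
        var a = new MyStruct();
        a.a = value;
        var b = a; // copies the struct, i.e. copies the reference stored in field a
        return object.ReferenceEquals(a.a, b.a);
    }

    static void Main()
    {
        // An interned literal.
        Console.WriteLine(SharesString("test"));             // True
        // A string built at run time, so interning plays no role here.
        Console.WriteLine(SharesString(new string('x', 4))); // True
    }
}
```

The second call suggests that the sharing does not depend on interning: copying the struct copies the reference, whatever string it points to.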

Related

c# modify interned string through Span/Memory and MemoryMarshal

I started digging into new C#/.net core features called Span and Memory and so far they look very good.
However, when I encountered MemoryMarshal.AsMemory method I found out the following interesting use case:
const string source1 = "immutable string";
const string source2 = "immutable string";
var memory = MemoryMarshal.AsMemory(source1.AsMemory());
ref char first = ref memory.Span[0];
first = 'X';
Console.WriteLine(source1);
Console.WriteLine(source2);
Output in both cases is Xmmutable string (tested on Windows 10 x64, .net471 and .netcore2.1). And as far as I can see any string that is interned can now be modified in one place and then all references to that string will use updated value.
Is there any way to prevent such behavior? And is it possible to "unintern" string?

This is just the way it works
MemoryMarshal.AsMemory<T>(ReadOnlyMemory<T>) Method
Creates a Memory<T> instance from a ReadOnlyMemory<T>.
Returns
- Memory<T> A memory block that represents the same memory as the ReadOnlyMemory<T>.
Remarks
This method must be used with extreme caution. ReadOnlyMemory<T> is used to represent immutable data and other memory that is not meant to be written to. Memory<T> instances created by this method should not be written to. The purpose of this method is to allow variables typed as Memory<T> but only used for reading to store a ReadOnlyMemory<T>.
More things you shouldn't do
private const string source1 = "immutable string1";
private const string source2 = "immutable string2";
public unsafe static void Main()
{
fixed(char* c = source1)
{
*c = 'f';
}
Console.WriteLine(source1);
Console.WriteLine(source2);
Console.ReadKey();
}
Output
fmmutable string1
immutable string2
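
For contrast, here is a sketch of the supported way to get a modified version of a string: copy the characters into a private buffer, change the copy, and build a new string. SafeEditDemo and ReplaceFirst are hypothetical names used only for this illustration.

```csharp
using System;

static class SafeEditDemo
{
    // Hypothetical helper: returns a new string with the first character
    // replaced. The input string is never written to.
    public static string ReplaceFirst(string source, char replacement)
    {
        if (string.IsNullOrEmpty(source)) return source;
        char[] buffer = source.ToCharArray(); // private copy of the characters
        buffer[0] = replacement;
        return new string(buffer);            // new String object built from the copy
    }

    static void Main()
    {
        const string source1 = "immutable string";
        const string source2 = "immutable string";
        string edited = ReplaceFirst(source1, 'X');
        Console.WriteLine(edited);  // Xmmutable string
        Console.WriteLine(source1); // immutable string
        Console.WriteLine(source2); // immutable string
    }
}
```

Because only the copy is written to, the interned literal and every other reference to it stay intact.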

Strings in Java and C#

I recently moved over to C# from Java and wanted to know how to explicitly define a string that's stored on the heap.
For example:
In Java, there are two ways we can define Strings:
String s = "Hello"; // Goes in the string pool and is interned
String s1 = new String("Hello"); // Creates a new string on the heap
AFAIK, C# has only one way of defining a String:
String s = "Hello"; // Goes on the heap and is interned
Is there a way I can force this string to be created on the heap, like we do in Java using the new operator? There is no business need for me to do this; it's just for my understanding.

In C#, strings are ALWAYS created on the heap. Constant strings are also (by default) always interned.
You can force a non-constant string to be interned using string.Intern(), as the following code demonstrates:
string a1 = "TE";
string a2 = "ST";
string a = a1 + a2;
if (string.IsInterned(a) != null)
Console.WriteLine("a was interned");
else
Console.WriteLine("a was not interned");
string.Intern(a);
if (string.IsInterned(a) != null)
Console.WriteLine("a was interned");
else
Console.WriteLine("a was not interned");

In C#, data types can be either:
value types - stored inline wherever they are declared: typically on the stack for local variables, or inside the containing object or array on the heap (e.g. int, struct)
reference types - whose instances are created on the heap (e.g. string, class)
Since string is a reference type, a string is always created on the heap.

On the .NET platform, strings are always created on the heap. If you want to edit a string, say:
string foo = "abc";
foo = foo + "efg";
it will create a new string; it WON'T EDIT the previous one. The previous one becomes eligible for garbage collection once nothing references it any more (an interned literal such as "abc" stays in the intern pool). But, to conclude, a string is always created on the heap.

Like Java:
char[] letters = { 'A', 'B', 'C' };
string alphabet = new string(letters);
and various ways are explained in this link.

On .Net your literal string will be created on the heap and a reference added to the intern pool before the program starts.
Allocating a new string on the heap occurs at runtime if you do something dynamic like concatenating two variables:
String s = string1 + string2;
See: http://msdn.microsoft.com/library/system.string.intern.aspx
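
The difference between literals, run-time strings and string.Intern can be sketched with reference comparisons. This assumes an ordinary JIT-compiled program in which both literals live in the same assembly; InternDemo and its method names are made up for the example.

```csharp
using System;

static class InternDemo
{
    // Two identical literals in the same assembly resolve to one object.
    public static bool LiteralsShareReference()
    {
        string x = "TEST";
        string y = "TEST";
        return object.ReferenceEquals(x, y);
    }

    // A string built at run time is a separate object, even with equal contents.
    public static bool RuntimeStringIsSeparate()
    {
        string literal = "TEST";
        string built = new string(new[] { 'T', 'E', 'S', 'T' });
        return literal == built && !object.ReferenceEquals(literal, built);
    }

    // string.Intern maps equal strings to a single pooled reference.
    public static bool InternUnifiesReferences()
    {
        string first = new string(new[] { 'p', 'o', 'o', 'l' });
        string second = new string(new[] { 'p', 'o', 'o', 'l' });
        return object.ReferenceEquals(string.Intern(first), string.Intern(second));
    }

    static void Main()
    {
        Console.WriteLine(LiteralsShareReference());  // True
        Console.WriteLine(RuntimeStringIsSeparate()); // True
        Console.WriteLine(InternUnifiesReferences()); // True
    }
}
```

Note that string.Intern returns the pooled reference; the variable passed in is not changed, so the return value has to be used.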

How is memory allocated for a String object without the new keyword or a constructor?

In C# if we want to create a variable of type string we can use:
string str="samplestring"; // this will allocate the space to hold the string
In C#, string is a class type, so if we want to create an object, normally we have to use the new keyword. So how is allocation happening without new or constructors?

When you write
string str="samplestring";
the compiler will generate two IL instructions:
First, ldstr gets the string literal from the metadata, allocates the requisite amount of memory, creates a new String object, and pushes a reference to it onto the stack.
Then stloc (or one of its short forms, e.g. stloc.0) stores that reference in the local variable str.
Note that ldstr will allocate memory only once for each distinct sequence of characters.
So in example below both variables will point at the same object in memory:
// CLR will allocate memory and create a new String object
// from the string literal stored in the metadata
string a = "abc";
// CLR won't create a new String object. Instead, it will look up the existing
// reference to the String object created from the "abc" literal
string b = "abc";
This process is known as string interning.
Also, as you know, in .NET strings are immutable. So the contents of a String object cannot be changed after the object is created. That is, every time you're concatenating string, CLR will create a new String object.
For example, the following lines of code:
string a = "abc";
string b = a + "xyz";
Will be compiled into the following IL (not exactly, of course):
ldstr will allocate memory and create a new String object from "abc" literal
stloc will store the reference to that object in the local variable a
ldloc will push that reference onto the stack
ldstr will allocate memory and create a new String object from "xyz" literal
call will invoke the System.String::Concat on these String objects on the stack
A call to System.String::Concat will be decomposed into dozens of IL instructions and internal calls. Which, in short, will check lengths of both strings and allocate the requisite amount of memory to store the concatenation result and then copy those strings into the newly allocated memory.
stloc will store the reference to the newly created string in the local variable b
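
The effect of those steps can be observed from C# without reading the IL. The sketch below (ConcatDemo and its method names are illustrative, not from the answer) keeps a second reference to the original string and shows that concatenation leaves it untouched while producing a different object.

```csharp
using System;

static class ConcatDemo
{
    // Returns what an alias of the original string holds after the
    // variable it was copied from has been "modified" by concatenation.
    public static string AliasAfterConcat()
    {
        string a = "abc";
        string alias = a; // second reference to the same String object
        a = a + "xyz";    // String.Concat allocates a new object; a now points to it
        return alias;     // still refers to the untouched original
    }

    // True when the concatenation result is a different object from its input.
    public static bool ConcatCreatesNewObject()
    {
        string a = "abc";
        string b = a + "xyz";
        return !object.ReferenceEquals(a, b);
    }

    static void Main()
    {
        Console.WriteLine(AliasAfterConcat());       // abc
        Console.WriteLine(ConcatCreatesNewObject()); // True
    }
}
```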

This is simply the C# compiler giving you a shortcut by allowing string literals.
If you'd rather, you can instantiate a string by any number of different constructors. For example:
char[] chars = { 'w', 'o', 'r', 'd' };
string myStr = new String(chars);

According to the MS documentation you do not need to use the new command to use the default string constructor.
However this does work.
char[] letters = { 'A', 'B', 'C' };
string alphabet = new string(letters);
c# Strings (from MSDN programming guide)

Strings are in fact reference types. The variable holds a reference to the value in memory. Therefore you are just assigning the reference, not the value, to the variable. I would recommend you have a look at this video from Pluralsight (you can get a free 14-day trial)
Pluralsight C# Fundamentals - Strings
Disclaimer: I am in no way related to Pluralsight. I am a subscriber and I love the videos over there

While everything is an object in .NET, there are still primitive types (int, bool, etc.) that do not require instantiation.
A string variable is a reference (4 bytes in a 32-bit process, 8 bytes in a 64-bit one) pointing to an array-like structure that can grow to up to 2 GB. Remember that strings are immutable types, so when you change a string you are not editing the existing value, but instead allocating new memory for the new value and then changing your string reference to point to that new memory.
Hope that helps.

When you create a string from a literal, what happens internally depends on whether your assembly is marked with the NoStringInterning flag. Conceptually, it looks like:
// with NoStringInterning (interning of literals is not required)
String str = new String("samplestring");
// without NoStringInterning (the literal is interned)
String str = String.Intern("samplestring");

In Java, if you write something like this:
String s1 = "abc";
String s2 = "abc";
memory will be allocated for "abc" in the so-called string pool, and both s1 and s2 will refer to that memory. And s1 == s2 will return true ("==" compares references). But if you write:
String s1 = new String("abc");
String s2 = new String("abc");
s1 == s2 will return false. In C# the allocation behaves similarly, but == on strings compares contents rather than references, so it returns true in both cases; use object.ReferenceEquals to compare references.

How to determine the size of an instance?

I have set my project to accept unsafe code and have the following helper Class to determine the size of an instance:
struct MyStruct
{
public long a;
public long b;
}
public static class CloneHelper
{
public unsafe static void GetSize(BookSetViewModel book)
{
long n = 0;
MyStruct inst;
inst.a = 0;
inst.b = 0;
n = Marshal.SizeOf(inst);
}
}
This works perfectly fine with a struct. However as soon as I use the actual class-instance that is passed in:
public unsafe static void GetSize(BookSetViewModel book)
{
long n = 0;
n = Marshal.SizeOf(book);
}
I get this error:
Type 'BookSetViewModel' cannot be marshaled as an unmanaged structure;
no meaningful size or offset can be computed.
Any idea how I could fix this?
Thanks,

Well, it really depends on what you mean by the "size" of an instance. There's the size of the single object in memory, but you usually need to think about any objects that the root object refers to. That's how much memory may be reclaimable after the root becomes eligible for garbage collection... but you can't just add them up, as those objects may be referred to by multiple other objects, and indeed there may be repeated references even within a single object.
This blog post shows some code I've used before to determine the size of the raw objects (header + fields), disregarding any extra cost due to the objects that one object refers to. It's not something I would use in production code, but it's useful for experimenting with how large an object is under varying circumstances.
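
The linked post's code is not reproduced here. As a rough stand-in, and only as an assumption about how such a measurement can be done, the sketch below allocates many instances and divides the growth reported by GC.GetTotalMemory by the count. SizeEstimator, EstimateBytesPerInstance and TwoLongs are hypothetical names; the result is an approximation of header plus fields and varies with runtime and bitness.

```csharp
using System;

// Hypothetical sample class with the same fields as MyStruct in the question.
sealed class TwoLongs
{
    public long a;
    public long b;
}

static class SizeEstimator
{
    // Rough estimate: heap growth divided by the number of instances created.
    // Objects referenced by the instances are only counted if the factory
    // allocates them as well.
    public static long EstimateBytesPerInstance<T>(Func<T> factory, int count = 100000)
        where T : class
    {
        var keep = new T[count];                // allocated before the first measurement
        long before = GC.GetTotalMemory(true);  // true = collect garbage first
        for (int i = 0; i < count; i++)
        {
            keep[i] = factory();
        }
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keep);                     // keep the instances alive until measured
        return (after - before) / count;
    }

    static void Main()
    {
        Console.WriteLine(EstimateBytesPerInstance(() => new TwoLongs()));
    }
}
```

As the answer says, this is something for experimenting rather than for production code: other allocations on the same heap during the loop distort the figure.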

C# passing parameters by reference

I have the below piece of code which prefixes a string to the start of each member of a string array, i.e. ["a","b","c"] prefixed with "z" becomes ["za","zb","zc"].
private string[] Prefix(string[] a, string b) {
for(int i = 0;i < a.Length;i++) {
a[i] = b + a[i];
}
return a;
}
The function works fine (although if there's a better way to do this, I'm happy to hear it), but I'm having issues when passing parameters.
string[] s1 = new string[] {"a","b"};
string[] s2 = Prefix(s1,"z");
Now as far as I can tell, I'm passing s1 by value. But when the Prefix function has finished, s2 and s1 have the same value of ["za","zb"], as if s1 had been passed by reference. I was certain you had to explicitly declare this behaviour in C#, and am very confused.

As others have said, the reference is passed by value. That means your s1 reference is copied to a, but they both still refer to the same object in memory. What I would do to fix your code is write it like this:
private IEnumerable<string> Prefix(IEnumerable<string> a, string b) {
return a.Select(s => b + s);
}
And call it like this:
string[] s1 = new string[] {"a","b"};
string[] s2 = Prefix(s1,"z").ToArray();
This not only fixes your problem, but also allows you to work with Lists and other string collections in addition to simple arrays.

C#, like Java before it, passes everything by value by default.
However, for reference types, that value is a reference. Note that both string and array are reference types.

Here's a better way to do it:
private string[] Prefix(string[] a, string b) {
return a.Select(s => b + s).ToArray();
}
or even:
private IEnumerable<string> Prefix(IEnumerable<string> a, string b) {
return a.Select(s => b + s);
}

You are passing the reference to s1 by value. In other words, your a parameter (when in the Prefix function scope), and your s1 variable, are references to the same array.

Strings are immutable.
This means that when you append a string to a string you get a totally new string. It's for performance reasons: it's cheaper to make a new string than to reallocate the existing array.
Hence it feels like you are working with strings by value.
In C#, variables of reference types (i.e. classes created on the heap, as opposed to value types) hold references, and those references are passed by value by default.

Actually, a reference to s1 was passed by value to Prefix(). While you now have two different references, both s1 and s2 still refer to the same string array, as arrays are reference types in C#.

Passing by value doesn't mean that you can't change things.
Objects are passed by value, but the value is (effectively) a pointer to the object.
Arrays are objects and a pointer to the array gets passed in. If you change the contents of the array in a method, the array will reflect the changes.
This doesn't happen with strings only because strings are immutable - once constructed, their contents can't change.

A reference to the string array is passed by value.
Consequently, the original array reference in the calling method cannot be changed, meaning that a = new string[10] within the Prefix method would have no impact on the s1 reference in the calling method. But the array itself is mutable, and a duplicate reference to the same array is capable of making changes to it in a way that would be visible to any other reference to the same array.
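
A short sketch of the three cases described above: reassigning the parameter, mutating the array through it, and reassigning through a ref parameter. PassByValueDemo and its method names are invented for the example.

```csharp
using System;

static class PassByValueDemo
{
    // Reassigns only the local copy of the reference; the caller is unaffected.
    public static void Reassign(string[] a)
    {
        a = new string[10];
    }

    // Writes through the copied reference into the one shared array.
    public static void Mutate(string[] a)
    {
        a[0] = "changed";
    }

    // With ref, the caller's variable itself is passed, so reassignment is visible.
    public static void ReassignByRef(ref string[] a)
    {
        a = new string[10];
    }

    static void Main()
    {
        string[] s1 = { "a", "b" };

        Reassign(s1);
        Console.WriteLine(s1.Length); // 2

        Mutate(s1);
        Console.WriteLine(s1[0]);     // changed

        ReassignByRef(ref s1);
        Console.WriteLine(s1.Length); // 10
    }
}
```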

The value of an object is passed. For "reference" objects this is the value of the reference. A clone/copy/duplicate of the underlying data is not made.
To fix the issue observed, simply don't mutate the input array -- instead, create a new output array/object and fill it in appropriately. (If you must use arrays, I would likely use this approach as it's so boringly C-like anyway.)
Alternately, you can clone the input array first (which also creates a new object). Using a clone (which is a shallow copy) in this case is okay because the inner members (strings) are themselves immutable -- even though they are reference types the value of the underlying object can't be changed once created. For nested mutable types, more care may need to be taken.
Here are two methods which can be used to create a shallow copy:
string[] B = (string[])A.Clone();
string[] B = (new List<string>(A)).ToArray();
This list is not exhaustive.
Personally though, I would use LINQ. Here is a teaser:
return a.Select(x => b + x).ToArray();
