c# modify interned string through Span/Memory and MemoryMarshal

I started digging into the new C#/.NET Core features called Span and Memory, and so far they look very good.
However, when I came across the MemoryMarshal.AsMemory method, I found the following interesting use case:
const string source1 = "immutable string";
const string source2 = "immutable string";
var memory = MemoryMarshal.AsMemory(source1.AsMemory());
ref char first = ref memory.Span[0];
first = 'X';
Console.WriteLine(source1);
Console.WriteLine(source2);
The output in both cases is Xmmutable string (tested on Windows 10 x64, .NET Framework 4.7.1 and .NET Core 2.1). As far as I can see, any interned string can now be modified in one place, and every reference to that string will then see the updated value.
Is there any way to prevent such behavior? And is it possible to "unintern" string?

This is just the way it works
MemoryMarshal.AsMemory(ReadOnlyMemory) Method
Creates a Memory instance from a ReadOnlyMemory.
Returns
- Memory<T> A memory block that represents the same memory as the ReadOnlyMemory<T>.
Remarks
This method must be used with extreme caution. ReadOnlyMemory is used to represent immutable data and other memory that is not meant to
be written to. Memory instances created by this method should not
be written to. The purpose of this method is to allow variables typed
as Memory but only used for reading to store a ReadOnlyMemory.
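As far as I know there is no switch that blocks the write once someone calls MemoryMarshal.AsMemory; the practical answer is to copy before modifying. The following is a minimal sketch, assuming you only need an edited copy of the text. SafeEdit and WithFirstChar are hypothetical names used for illustration; ReadOnlyMemory<char>.ToArray() copies the characters into a fresh array, so the interned original is never written to.

```csharp
using System;

public static class SafeEdit
{
    // Returns a new string with the first character replaced.
    // The source is copied into a fresh array first, so the original
    // (possibly interned) string is never written to.
    public static string WithFirstChar(string source, char replacement)
    {
        if (source.Length == 0) return source;
        char[] copy = source.AsMemory().ToArray(); // ToArray() allocates a copy
        copy[0] = replacement;
        return new string(copy);
    }

    public static void Main()
    {
        const string source1 = "immutable string";
        const string source2 = "immutable string";

        string edited = WithFirstChar(source1, 'X');

        Console.WriteLine(edited);  // Xmmutable string
        Console.WriteLine(source1); // immutable string
        Console.WriteLine(source2); // immutable string
    }
}
```

The copy costs one extra allocation, which is the price of leaving every other reference to the literal untouched.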
More things you shouldn't do
private const string source1 = "immutable string1";
private const string source2 = "immutable string2";
public unsafe static void Main()
{
fixed(char* c = source1)
{
*c = 'f';
}
Console.WriteLine(source1);
Console.WriteLine(source2);
Console.ReadKey();
}
Output
fmmutable string1
immutable string2

Related

How to view the address of string, to check its (reference) type in C#?

using System;
using System.Runtime.InteropServices;
using System.Security.Claims;
using System.Text;
Method();
unsafe void Method()
{
string a = "hello";
string b = a;
//or: string b = "hello";
Console.WriteLine(object.ReferenceEquals(a, b)); // True.
string aa = "hello";
string bb = "h";
bb += "ello";
Console.WriteLine(object.ReferenceEquals(aa, bb)); // False.
int aaa = 100;
int bbb = aaa;
Console.WriteLine(object.ReferenceEquals(aaa, bbb)); // False.
string* pointer1;
string* pointer2;
string word1 = "Hello";
string word2 = "Hello";
pointer1 = &word1;
pointer2 = &word2;
ulong addr1 = (ulong)pointer1;
ulong addr2 = (ulong)pointer2;
Console.WriteLine($"Address of variable named word1: {addr1}");
Console.WriteLine($"Address of variable named word2: {addr2}");
}
Why different locations?
It works correctly with object/string.ReferenceEquals. But I can't see the ADDRESSes of strings. Beginner in the world of IT. Be kind, people.

We'll start from here:
string word1 = "Hello";
string word2 = "Hello";
It seems you expect word1 and word2 to refer to the same string object in memory. But that's not how it works for normal objects (strings can be a little different... we'll get there). For normal reference types, you should expect two different objects. The two objects have equivalent values, but they are still different objects.
This is important. Imagine the next line changed the string for word1. You would not want the word2 variable to also change.
Now, strings are a little bit "special" in this area. Depending on which version of .Net you're running, the compiler may opt to intern equivalent strings. This means it will use the same object in memory for strings with equivalent values.
This is possible because strings are immutable. That is, calling, say, word1.Replace("e", "3") does not change the value of the string in word1 to instead be "H3llo", and therefore word2 is also not modified by changes made from word1. Instead, the Replace() call returns a new string. Additionally, all the string methods and properties work this way, such that there is no way to change an existing string in-place.
If you want word1 to receive that new value, you must also assign it to the variable: word1 = word1.Replace("e", "3");. Since this is a new assignment and only assigns to word1, the word2 variable will still show "Hello". So everything works as expected, and you were able to save some memory use while the two values were equal. Again: strings have special treatment here, and this is a little different from how most reference objects work by default.
But there's another important thing to understand about memory management in .NET. The Garbage Collector can sometimes move objects to new locations. This means any address you see at one moment may not be the address it uses the next moment. This can especially happen during the compaction phase of garbage collection.
Now, it is possible to pin objects via the fixed keyword, but this is not usually a good idea; it's something to avoid unless you really need it: say to pass the object to an outside unmanaged library. There are a number of reasons for this, but one is that a pinned object cannot be moved by the garbage collector until the fixed block closes, which gets in the way of compaction and can fragment the heap.
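Note that &word1 in the question yields the address of the local variable holding the reference, not the address of the string object, which is why the two numbers differ. If you do want to look at where the characters live, one option is to pin the string briefly. This is only a sketch: StringAddressDemo and AddressOfFirstChar are hypothetical names, and the printed address is a snapshot that may be stale as soon as the handle is freed.

```csharp
using System;
using System.Runtime.InteropServices;

public static class StringAddressDemo
{
    // Pins the string briefly and returns the address of its first character.
    // The value is only guaranteed to be valid while the handle is allocated;
    // after Free() the garbage collector may move the object again.
    public static IntPtr AddressOfFirstChar(string s)
    {
        GCHandle handle = GCHandle.Alloc(s, GCHandleType.Pinned);
        try
        {
            return handle.AddrOfPinnedObject();
        }
        finally
        {
            handle.Free();
        }
    }

    public static void Main()
    {
        string word1 = "Hello";
        string word2 = "Hello";                           // same literal
        string word3 = new string("Hello".ToCharArray()); // equal value, separate object

        Console.WriteLine($"word1: 0x{AddressOfFirstChar(word1).ToInt64():X}");
        Console.WriteLine($"word2: 0x{AddressOfFirstChar(word2).ToInt64():X}");
        Console.WriteLine($"word3: 0x{AddressOfFirstChar(word3).ToInt64():X}");
    }
}
```

With interned literals you would typically see the same address for word1 and word2 and a different one for word3. For everyday code, object.ReferenceEquals answers the "same object?" question without any pinning.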

Why constants in c# can be declared in methods? [duplicate]

Whenever I have local variables in a method, ReSharper suggests to convert them to constants:
// instead of this:
var s = "some string";
var flags = BindingFlags.Public | BindingFlags.Instance;
// ReSharper suggest to use this:
const string s = "some string";
const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
Given that these are really constant values (and not variables) I understand that ReSharper suggest to change them to const.
But apart from that, is there any other advantage when using const (e.g. better performance) which justifies using const BindingFlags instead of the handy and readable var keyword?
BTW: I just found a similar question here: Resharper always suggesting me to make const string instead of string, but I think it is more about fields of a class where my question is about local variable/consts.

The compiler will throw an error if you try to assign a value to a constant, thus possibly preventing you from accidentally changing it.
Also, usually there is a small performance benefit to using constants vs. variables. This has to do with the way they are compiled to the MSIL, per this MSDN magazine Q&A:
Now, wherever myInt is referenced in the code, instead of having to do a "ldloc.0" to get the value from the variable, the MSIL just loads the constant value which is hardcoded into the MSIL. As such, there's usually a small performance and memory advantage to using constants. However, in order to use them you must have the value of the variable at compile time, and any references to this constant at compile time, even if they're in a different assembly, will have this substitution made.
Constants are certainly a useful tool if you know the value at compile time. If you don't, but want to ensure that your variable is set only once, you can use the readonly keyword in C# (which maps to initonly in MSIL) to indicate that the value of the variable can only be set in the constructor; after that, it's an error to change it. This is often used when a field helps to determine the identity of a class, and is often set equal to a constructor parameter.

tl;dr for local variables with literal values, const makes no difference at all.
Your distinction of "inside methods" is very important. Let's look at it, then compare it with const fields.
Const local variables
The only benefit of a const local variable is that the value cannot be reassigned.
However const is limited to primitive types (int, double, ...), enums and string, which limits its applicability.
Digression: There are proposals for the C# compiler to allow a more general concept of 'readonly' locals (here) which would extend this benefit to other scenarios. They will probably not be thought of as const though, and would likely have a different keyword for such declarations (i.e. let or readonly var or something like that).
Consider these two methods:
private static string LocalVarString()
{
var s = "hello";
return s;
}
private static string LocalConstString()
{
const string s = "hello";
return s;
}
Built in Release mode we see the following (abridged) IL:
.method private hidebysig static string LocalVarString() cil managed
{
ldstr "hello"
ret
}
.method private hidebysig static string LocalConstString() cil managed
{
ldstr "hello"
ret
}
As you can see, they both produce the exact same IL. Whether the local s is const or not has no impact.
The same is true for primitive types. Here's an example using int:
private static int LocalVarInt()
{
var i = 1234;
return i;
}
private static int LocalConstInt()
{
const int i = 1234;
return i;
}
And again, the IL:
.method private hidebysig static int32 LocalVarInt() cil managed
{
ldc.i4 1234
ret
}
.method private hidebysig static int32 LocalConstInt() cil managed
{
ldc.i4 1234
ret
}
So again we see no difference. There cannot be a performance or memory difference here. The only difference is that the developer cannot re-assign the symbol.
Const fields
Comparing a const field with a variable field is different. A non-const field must be read at runtime. So you end up with IL like this:
// Load a const field
ldc.i4 1234
// Load a non-const field
ldsfld int32 MyProject.MyClass::_myInt
It's clear to see how this could result in a performance difference, assuming the JIT cannot inline a constant value itself.
Another important difference here is for public const fields that are shared across assemblies. If one assembly exposes a const field, and another uses it, then the actual value of that field is copied at compile time. This means that if the assembly containing the const field is updated but the using assembly is not re-compiled, then the old (and possibly incorrect) value will be used.
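The difference between a const field and a regular static field is also visible through reflection, which may help when checking whether a public value is at risk of being baked into other assemblies. A small sketch, with Limits, MaxItems, MaxUsers and ConstInspection as hypothetical names; FieldInfo.IsLiteral is true for fields whose value is written at compile time.

```csharp
using System;
using System.Reflection;

public static class Limits
{
    public const int MaxItems = 1234;           // compile-time constant, copied into callers
    public static readonly int MaxUsers = 1234; // ordinary static field, read at run time
}

public static class ConstInspection
{
    // True when the named public static field is a compile-time constant.
    public static bool IsCompileTimeConstant(Type type, string fieldName)
    {
        var field = type.GetField(fieldName, BindingFlags.Public | BindingFlags.Static);
        return field != null && field.IsLiteral;
    }

    public static void Main()
    {
        Console.WriteLine(IsCompileTimeConstant(typeof(Limits), "MaxItems")); // True
        Console.WriteLine(IsCompileTimeConstant(typeof(Limits), "MaxUsers")); // False
    }
}
```

If a public value might change between releases, static readonly avoids the stale-copy problem because callers read the field at run time instead of embedding the value.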
Const expressions
Consider these two declarations:
const int i = 1 + 2;
int i = 1 + 2;
For the const form, the addition must be computed at compile time, meaning the number 3 is kept in the IL.
For the non-const form, the compiler is free to emit the addition operation in the IL, though the JIT would almost certainly apply a basic constant folding optimisation so the generated machine code would be identical.
The C# 7.3 compiler emits the ldc.i4.3 opcode for both of the above expressions.

As per my understanding, const values do not exist at run time, i.e. in the form of a variable stored in some memory location; they are embedded in the MSIL code at compile time, and hence would have an impact on performance. Moreover, the runtime would not be required to perform any housekeeping (conversion checks, garbage collection, etc.) on them, whereas variables require these checks.

const is a compile time constant - that means all your code that is using the const variable is compiled to contain the constant expression the const variable contains - the emitted IL will contain that constant value itself.
This means the memory footprint is smaller for your method because the constant does not require any memory to be allocated at runtime.

Besides the small performance improvement, when you declare a constant you are explicitly enforcing two rules on yourself and other developers who will use your code:
I have to initialize it with a value right now; I can't do it anywhere else.
I cannot change its value anywhere.
In code it's all about readability and communication.

A const value is also 'shared' between all instances of an object. It could result in lower memory usage as well.
As an example:
public class NonStatic
{
int one = 1;
int two = 2;
int three = 3;
int four = 4;
int five = 5;
int six = 6;
int seven = 7;
int eight = 8;
int nine = 9;
int ten = 10;
}
public class Static
{
static int one = 1;
static int two = 2;
static int three = 3;
static int four = 4;
static int five = 5;
static int six = 6;
static int seven = 7;
static int eight = 8;
static int nine = 9;
static int ten = 10;
}
Memory consumption is tricky in .NET and I won't pretend to understand the finer details of it, but if you instantiate a list with a million 'Static' objects, it is likely to use considerably less memory than a list with a million 'NonStatic' objects.
static void Main(string[] args)
{
var maxSize = 1000000;
var items = new List<NonStatic>();
//var items = new List<Static>();
for (var i=0;i<maxSize;i++)
{
items.Add(new NonStatic());
//items.Add(new Static());
}
Console.WriteLine(System.Diagnostics.Process.GetCurrentProcess().WorkingSet64);
Console.Read();
}
When using 'NonStatic' the working set is 69,398,528 compared to only 32,423,936 when using static.

The const keyword tells the compiler that it can be fully evaluated at compile time. There is a performance & memory advantage to this, but it is small.

Constants in C# provide a named location in memory to store a data value. It means that the value of the variable will be known in compile time and will be stored in a single place.
When you declare it, it is kind of 'hardcoded' in the Microsoft Intermediate Language (MSIL).
Although a little, it can improve the performance of your code. If I'm declaring a variable, and I can make it a const, I always do it. Not only because it can improve performance, but also because that's the idea of constants. Otherwise, why do they exist?
Reflector can be really useful in situations like this one. Try declaring a variable and then make it a constant, and see what code is generated in IL. Then all you need to do is see the difference in the instructions, and see what those instructions mean.

Should .NET strings really be considered immutable?

Consider the following code:
unsafe
{
string foo = string.Copy("This can't change");
fixed (char* ptr = foo)
{
char* pFoo = ptr;
pFoo[8] = pFoo[9] = ' ';
}
Console.WriteLine(foo); // "This can   change" (three spaces: two overwritten chars plus the original space)
}
This creates a pointer to the first character of foo, copies it into a mutable pointer, and overwrites the characters at positions 8 and 9 with ' '.
Notice I never actually reassigned foo; instead, I changed its value by modifying its state, or mutating the string. Therefore, .NET strings are mutable.
This works so well, in fact, that the following code:
unsafe
{
string bar = "Watch this";
fixed (char* p = bar)
{
char* pBar = p;
pBar[0] = 'C';
}
string baz = "Watch this";
Console.WriteLine(baz); // Unrelated, right?
}
will print "Catch this" due to string literal interning.
This has plenty of applicable uses, for example this:
string GetForInputData(byte[] inputData)
{
// allocate a mutable buffer...
char[] buffer = new char[inputData.Length];
// fill the buffer with input data
// ...and a string to return
return new string(buffer);
}
gets replaced by:
unsafe string GetForInputData(byte[] inputData)
{
// allocate a string to return
string result = new string('\0', inputData.Length);
fixed (char* ptr = result)
{
// fill the result with input data
}
return result; // return it
}
This could save potentially huge memory allocation / performance costs if you work in a speed-critical field (e.g. encodings).
I guess you could say that this doesn't count because it "uses a hack" to make pointers mutable, but then again it was the C# language designers who supported assigning a string to a pointer in the first place. (In fact, this is done all the time internally in String and StringBuilder, so technically you could make your own StringBuilder with this.)
So, should .NET strings really be considered immutable?

§ 18.6 of the C# language specification (The fixed statement) specifically addresses the case of modifying a string through a fixed pointer, and indicates that doing so can result in undefined behavior:
Modifying objects of managed type through fixed pointers can result in undefined behavior. For example, because strings are immutable, it is the programmer’s responsibility to ensure that the characters referenced by a pointer to a fixed string are not modified.
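For the buffer-filling use case from the question, .NET Core 2.1 and later (not .NET Framework) offer string.Create, which hands you a writable span over the new string before it is published, so no unsafe code or intermediate char[] is needed. A minimal sketch follows; StringBuilding is a hypothetical class name, and the byte-to-char cast is only a placeholder for whatever decoding the real code performs.

```csharp
using System;

public static class StringBuilding
{
    // Builds a string of inputData.Length characters without an intermediate char[].
    // The byte -> char cast stands in for a real decoding step.
    public static string GetForInputData(byte[] inputData)
    {
        return string.Create(inputData.Length, inputData, (span, bytes) =>
        {
            for (int i = 0; i < span.Length; i++)
            {
                span[i] = (char)bytes[i];
            }
        });
    }

    public static void Main()
    {
        byte[] data = { 72, 105, 33 };
        Console.WriteLine(GetForInputData(data)); // Hi!
    }
}
```

The span is only writable inside the callback, so the string is still immutable from the point of view of every caller that receives it.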

I just had to play with this and experiment to confirm whether the addresses of string literal are pointing into the same memory location.
The results are:
string foo = "Fix value?"; //New address: 0x02b215f8
string foo2 = "Fix value?"; //Points to same address: 0x02b215f8
string fooCopy = string.Copy(foo); //New address: 0x021b2888
fixed (char* p = foo)
{
p[9] = '!';
}
Console.WriteLine(foo);
Console.WriteLine(foo2);
Console.WriteLine(fooCopy);
//Reference is equal, which means referring to same memory address
Console.WriteLine(string.ReferenceEquals(foo, foo2)); //true
//Reference is not equal, which creates another string in new memory address
Console.WriteLine(string.ReferenceEquals(foo, fooCopy)); //false
We see that foo is initialized with a string literal located at memory address 0x02b215f8 on my PC. Assigning the same string literal to foo2 references the same memory address, while creating a copy of that string literal allocates a new object. Further testing via string.ReferenceEquals() confirms that foo and foo2 are the same reference, while foo and fooCopy are different references.
It is interesting to see how a string literal can be manipulated in memory and how that affects other variables that merely reference it. Since this behavior exists, it is one of the things we should be careful about.

String inside structure behavior

Let's say we have structure
struct MyStruct
{
public string a;
}
When we assign it to a new variable, what happens to the string? For example, we expect the string to be shared when structs are copied on the stack. We're using this code to test it, but it returns different pointers:
var a = new MyStruct();
a.a = "test";
var b = a;
IntPtr pA = Marshal.StringToCoTaskMemAnsi(a.a);
IntPtr pB = Marshal.StringToCoTaskMemAnsi(b.a);
Console.WriteLine("Pointer of a : {0}", (int)pA);
Console.WriteLine("Pointer of b : {0}", (int)pB);
The question is: when structs that contain a string are copied on the stack, is the string shared or is it recreated?
[UPDATE]
We also tried this code, it returns different pointers as well:
char charA2 = a.a[0];
char charB2 = b.a[0];
unsafe
{
var pointerA2 = &charA2;
var pointerB2 = &charB2;
Console.WriteLine("Pointer of a : {0}", (int)pointerA2);
Console.WriteLine("Pointer of b : {0}", (int)pointerB2);
}

The code you use to test it 'Copies the contents of a managed String to a block of memory allocated from the unmanaged COM task allocator.' according to MSDN. I would be surprised if any two subsequent calls to StringToCoTaskMemAnsi would return the same pointer. You can look at the memory address of the two string references or assign an object id using the debugger. Or easier: object.ReferenceEquals(a.a, b.a);
In your update, you are pointing to the stack location of the character variables, also not a good way of finding out. In any case, you are just copying the reference when you assign a string to another string, so they should always be the same.

Strings are immutable in storage and a reference type. Furthermore, in your example, the string "test" is interned. So no matter how many copies of the struct you make, you ultimately have multiple pointers to the same underlying storage (unless you go through gyrations to copy it to a new block of memory, which your contrived examples are doing).
Rest assured that there is only one copy, pointed at multiple times.
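A quick way to confirm this without touching pointers is object.ReferenceEquals on the two fields. The sketch below reuses MyStruct from the question; StructCopyDemo and SharesString are hypothetical names added for illustration.

```csharp
using System;

public struct MyStruct
{
    public string a;
}

public static class StructCopyDemo
{
    // True when both structs hold a reference to the very same string object.
    public static bool SharesString(MyStruct x, MyStruct y)
    {
        return object.ReferenceEquals(x.a, y.a);
    }

    public static void Main()
    {
        var a = new MyStruct();
        a.a = "test";
        var b = a; // copies the struct, including the reference stored in field a

        Console.WriteLine(SharesString(a, b)); // True

        // A string built at run time with the same characters is a different object.
        b.a = new string("test".ToCharArray());
        Console.WriteLine(SharesString(a, b)); // False
        Console.WriteLine(a.a == b.a);         // True (== compares values)
    }
}
```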

how String object is allocate memory without having new keyword or constructor?

In C# if we want to create a variable of type string we can use:
string str="samplestring"; // this will allocate the space to hold the string
In C#, string is a class type, so if we want to create an object, normally we have to use the new keyword. So how is allocation happening without new or constructors?

When you write
string str="samplestring";
compiler will generate two instructions:
Firstly, ldstr gets a string literal from the metadata; allocates the requisite amount of memory; creates a new String object and pushes the reference to it onto the stack.
Then stloc (or one of its short forms, e.g. stloc.0) stores that reference in the local variable str.
Note, that ldstr will allocate memory only once for each sequence of characters.
So in example below both variables will point at the same object in memory:
// CLR will allocate memory and create a new String object
// from the string literal stored in the metadata
string a = "abc";
// CLR won't create a new String object. Instead, it will look up for an existing
// reference pointing to the String object created from "abc" literal
string b = "abc";
This process is known as string interning.
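Interning can be observed from C# with object.ReferenceEquals and string.Intern. This is a small sketch rather than a definitive test of runtime behavior; InterningDemo and BuildAtRuntime are hypothetical names, and the string is built from a char array so the compiler cannot turn it into a literal.

```csharp
using System;

public static class InterningDemo
{
    // Builds "abc" at run time so it is a fresh object, not the interned literal.
    public static string BuildAtRuntime()
    {
        return new string(new[] { 'a', 'b', 'c' });
    }

    public static void Main()
    {
        string a = "abc";
        string b = "abc";
        string c = BuildAtRuntime();

        Console.WriteLine(object.ReferenceEquals(a, b));                // True
        Console.WriteLine(object.ReferenceEquals(a, c));                // False
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(c))); // True
    }
}
```

string.Intern returns the pooled instance for a given value, so passing the run-time string yields the same object the literals refer to.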
Also, as you know, in .NET strings are immutable. So the contents of a String object cannot be changed after the object is created. That is, every time you're concatenating string, CLR will create a new String object.
For example, the following lines of code:
string a = "abc";
string b = a + "xyz";
Will be compiled into the following IL (not exactly, of course):
ldstr will allocate memory and create a new String object from "abc" literal
stloc will store the reference to that object in the local variable a
ldloc will push that reference onto the stack
ldstr will allocate memory and create a new String object from "xyz" literal
call will invoke the System.String::Concat on these String objects on the stack
A call to System.String::Concat will be decomposed into dozens of IL instructions and internal calls. Which, in short, will check lengths of both strings and allocate the requisite amount of memory to store the concatenation result and then copy those strings into the newly allocated memory.
stloc will store the reference to the newly created string in the local variable b

This is simply the C# compiler giving you a shortcut by allowing string literals.
If you'd rather, you can instantiate a string by any number of different constructors. For example:
char[] chars = { 'w', 'o', 'r', 'd' };
string myStr = new String(chars);

According to the MS documentation you do not need to use the new command to use the default string constructor.
However this does work.
char[] letters = { 'A', 'B', 'C' };
string alphabet = new string(letters);
c# Strings (from MSDN programming guide)

Strings are in fact reference types. The variable holds a reference to the value in memory. Therefore you are just assigning the reference, and not the value, to the variable. I would recommend you have a look at this video from Pluralsight (you can get a free 14-day trial)
Pluralsight C# Fundamentals - Strings
Disclaimer: I am in no way related to Pluralsight. I am a subscriber and I love the videos over there

While everything is an object in .NET, there are still primitive types (int, bool, etc.) that do not require instantiation.
As you can see here, a string variable is a reference (4 bytes in a 32-bit process, 8 bytes in a 64-bit one) pointing to a vector/array structure that can extend up to 2GB. Remember that strings are immutable types, so when you change a string you are not editing the existing object, but instead allocating new memory for the new value and then changing your string reference to point to the new memory structure.
Hope that helps.

When you create a string from a literal, what happens internally depends on whether your assembly is marked with the NoStringInterning flag. Conceptually, it looks like:
// default: the literal is interned
String str = String.Intern("samplestring");
// or with NoStringInterning: the runtime is not required to intern it
String str = new String("samplestring");

In java if you write something like that:
String s1 = "abc";
String s2 = "abc";
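memory will be allocated for "abc" in the so-called string pool, and both s1 and s2 will refer to that memory. And s1 == s2 will return true ("==" compares references). But if you write:
String s1 = new String("abc");
String s2 = new String("abc");
s1 == s2 will return false. I guess in C# the pooling of literals is the same, although == on C# strings compares values rather than references, so you would need object.ReferenceEquals to observe the difference.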
