Performance considerations for String and StringBuilder - C#

All,
Given string s = "abcd", does string w = s.Substring(2) return a newly allocated String object (i.e. a fresh copy of "cd" created internally), or a String literal?
For StringBuilder, when appending string values forces the StringBuilder to grow, are all the contents copied to a new memory location, or are only the pointers to the earlier String values reassigned to the new location?

String is immutable, so any operation that "changes" the string will in effect return a new string. This includes Substring and all other operations on String, including those that do not change the length (such as ToLower()).
StringBuilder internally holds a linked list of chunks of characters. When it needs to grow, a new chunk is allocated and linked to the existing ones, and the new data is copied into it. In other words, an append does not copy the whole StringBuilder buffer, only the data you are appending. I double-checked this against the Framework 4 reference sources.
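A minimal sketch of both points, for illustration only. The names SubstringDemo, SubstringIsNewObject and CapacityAfterGrowth are made up for this example. The capacity a StringBuilder grows to is an implementation detail that can differ between runtime versions, so the code reports it instead of assuming a value.

```csharp
using System;
using System.Text;

static class SubstringDemo
{
    // True when Substring(2) returned an object that has the content "cd"
    // but is not the same instance as the "cd" literal.
    public static bool SubstringIsNewObject()
    {
        string s = "abcd";
        string w = s.Substring(2);
        return w == "cd" && !ReferenceEquals(w, "cd");
    }

    // Appends past the initial capacity and returns the capacity afterwards.
    public static int CapacityAfterGrowth()
    {
        var sb = new StringBuilder(4);   // room for 4 characters
        sb.Append("abcd");               // fills the initial buffer
        sb.Append("efgh");               // forces the builder to grow
        return sb.Capacity;
    }
}
```

Calling SubstringDemo.SubstringIsNewObject() shows that the substring is a separate object; CapacityAfterGrowth() only guarantees a value of at least 8.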

For string s = "abcd", does string w = s.Substring(2) return a newly allocated String object? Yes.
For StringBuilder, when appending string values forces the StringBuilder to grow, are all the contents copied to a new memory location? Yes in versions before .NET Framework 4, where the buffer was reallocated and copied; from .NET Framework 4 on, a new chunk is linked in and the existing contents stay where they are.
Any change to a String, small or large, results in a new String.

If you are going to make a large number of edits to a string, it is better to do this via StringBuilder.
From MSDN:
You can use the StringBuilder class instead of the String class for operations that make multiple changes to the value of a string. Unlike instances of the String class, StringBuilder objects are mutable; when you concatenate, append, or delete substrings from a string, the operations are performed on a single string.

Strings are immutable objects, so every time you make a change you create a new instance of the string. The Substring method does not change the value of the original string.
Regards.

The difference between String and StringBuilder is an important concept that matters when an application has to edit a large number of strings.
String
A String object is a sequential collection of System.Char objects, each representing a UTF-16 code unit; the type belongs to the System namespace. Because the value of these objects is read-only, String is defined as immutable. The maximum size of a String object in memory is 2 GB, or about 1 billion characters.
Immutable
Being immutable means that every time a System.String method that appears to modify the string is used, a new string object is created in memory, which requires a new allocation of space for that object.
Example:
The string concatenation operator += makes it appear that the value of the string variable named test changes. In fact, it creates a new String object, with a different value and address from the original, and assigns it to the test variable.
string test = string.Empty; // must be initialized before += can read it
test += "red"; // a new string object is created
test += "coding"; // a new string object is created
test += "planet"; // a new string object is created
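The effect can be observed by keeping a second reference to the string before concatenating. This is only a sketch; ImmutabilityDemo and Append are hypothetical names chosen for the example.

```csharp
using System;

static class ImmutabilityDemo
{
    // Returns the original text, the text after +=, and whether both
    // variables still refer to the same object.
    public static (string original, string current, bool sameObject) Append()
    {
        string test = "red";
        string original = test;   // second reference to the same object
        test += "coding";         // test now refers to a newly created string
        return (original, test, ReferenceEquals(original, test));
    }
}
```

The original reference still reads "red" after the call, which is what immutability means in practice: += rebinds the variable, it does not alter the object.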
StringBuilder
StringBuilder is a dynamic object that belongs to the System.Text namespace and allows the number of characters in the string it encapsulates to be modified; this characteristic is called mutability.
Mutability
To be able to append, remove, replace or insert characters, a StringBuilder maintains a buffer to accommodate expansions to the string. New data is appended to the buffer if room is available; otherwise, a new, larger buffer is allocated, data from the original buffer is copied to the new buffer, and the new data is then appended to the new buffer.
StringBuilder sb = new StringBuilder("");
sb.Append("red");
sb.Append("blue");
sb.Append("green ");
string colors = sb.ToString();
Performance
To illustrate the performance difference between String and StringBuilder, I created the following example (it needs using System.Diagnostics and using System.Text):
Stopwatch timer = new Stopwatch();
string str = string.Empty;
timer.Start();
for (int i = 0; i < 10000; i++) {
    str += i.ToString(); // allocates a new string on every iteration
}
timer.Stop();
Console.WriteLine("String : {0}", timer.Elapsed);
timer.Restart();
StringBuilder sbr = new StringBuilder(string.Empty);
for (int i = 0; i < 10000; i++) {
    sbr.Append(i.ToString()); // appends into the builder's buffer
}
timer.Stop();
Console.WriteLine("StringBuilder : {0}", timer.Elapsed);
The output is:
String : 00:00:00.0706661
StringBuilder : 00:00:00.0012373

Related

.NET Core 2.1 string.Create

I use string.Create method to create a new string like this:
var rawStr = "raw str";
var newStr = string.Create(rawStr.Length, rawStr,
(chars, str) =>
{
chars = str.ToCharArray();
});
but the result, newStr, contains only empty characters.
I saw an answer here, and modified my code:
var rawStr = "raw str";
var newStr = string.Create(rawStr.Length, rawStr.ToCharArray(),
(chars, str) =>
{
//chars = str.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
chars[i] = str[i];
}
});
Then newStr's value is raw str. Why is that?

I'll try to explain what you are doing.
The String.Create documentation says:
Creates a new string with a specific length and initializes it after creation by using the specified callback.
What you can achieve with Create is a kind of conversion. In its simplest form, it just takes an array of chars and creates a string.
Let's break down the function:
string.Create<TType>(newStrLength, typedObject, creationFunction);
TType - The input type (a string in your first attempt, a char array in your second) that will be converted or used to create the new string
newStrLength - The length of the new string, which you must provide
typedObject - An object of type TType that will be passed to the creation function
creationFunction - A lambda function that does something with the characters and the TType object. Create calls this function. The chars are provided by Create, as a Span<char>, and they are the new string's characters to modify as you please.
In your case, your creation function takes the characters from the source one by one and maps them to the new string, effectively creating a copy.
In your first attempt the following happens:
The chars parameter is a local variable that refers to the new string's characters; the assignment replaces that reference with the new array returned by ToCharArray. After this assignment you are no longer referencing the characters that will be used to create the string. The original characters remain unchanged.
In the second attempt you are changing the values of the original characters, and thus the new string uses them.
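As a sketch of the same idea without the explicit loop: the callback can write into the span it is given by copying the source characters over. StringCreateDemo and CopyWithCreate are hypothetical names used only for this example; string.Create, AsSpan and CopyTo are the framework calls, available from .NET Core 2.1.

```csharp
using System;

static class StringCreateDemo
{
    // Builds a copy of raw by filling the characters that string.Create hands out.
    public static string CopyWithCreate(string raw)
    {
        return string.Create(raw.Length, raw, (chars, source) =>
        {
            // Write INTO chars; assigning a new value to chars would only
            // rebind the local parameter and leave the new string untouched.
            source.AsSpan().CopyTo(chars);
        });
    }
}
```

StringCreateDemo.CopyWithCreate("raw str") yields a string equal to "raw str", because the callback mutates the buffer that Create later returns as the new string.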

There is no need for this complexity. Use this syntax:
var rawStr = "raw str";
var newStr = rawStr;

Replace char(0x10) with a String (The Optimized way)

This is a common question, but I hope it does not get tagged as a duplicate, since the nature of the question is different (please read the whole question, not only the title).
Unaware of the existence of String.Replace I wrote the following:
int theIndex = 0;
while ((theIndex = message.IndexOf(separationChar, theIndex)) != -1) // we found the character
{
    theIndex++;
    if (theIndex < message.Length) // not in the last position
    {
        message = message.Insert(theIndex, theTime);
    }
    else
    {
        // I don't think this is really necessary
        break;
    }
} // while finding characters
As you can see, I am in effect replacing each occurrence of separationChar in the message String with separationChar followed by a String called "theTime".
Now, this works fine for small strings, but I have been given a really huge String (on the order of several hundred kilobytes; by the way, is there a limit for String or StringBuilder?) and it takes a lot of time...
So my questions are:
1) Is it more efficient if I just do
oldString = separationChar.ToString();
newString = oldString.Insert(1, theTime); // Insert needs a start index; 1 places theTime right after the separator
message = message.Replace(oldString, newString);
2) Is there any other way to process very long Strings, inserting a String (theTime) wherever a given char is found, in a fast and efficient way?
Thanks a lot

As Danny already mentioned, string.Insert() actually creates a new instance each time you use it, and these also have to be garbage collected at some point.
You could instead start with an empty StringBuilder to construct the result string:
// Extension methods must be declared inside a static class
public static string Replace(this string str, char find, string replacement)
{
    StringBuilder result = new StringBuilder(str.Length); // initial capacity
    int pointer = 0;
    int index;
    while ((index = str.IndexOf(find, pointer)) >= 0)
    {
        // Append the unprocessed data up to the character
        result.Append(str, pointer, index - pointer);
        // Append the replacement string
        result.Append(replacement);
        // Next unprocessed data starts after the character
        pointer = index + 1;
    }
    // Append the remainder of the unprocessed data
    result.Append(str, pointer, str.Length - pointer);
    return result.ToString();
}
This will not cause a new string to be created (and garbage collected) for each occurrence of the character. Instead, when the internal buffer of the StringBuilder is full, it will create a new buffer chunk "of sufficient capacity". Quote from reference source, when its buffer is full:
Compute the length of the new block we need
We make the new chunk at least big enough for the current need (minBlockCharCount), but also as big as the current length (thus doubling capacity), up to a maximum
(so we stay in the small object heap, and never allocate really big chunks even if
the string gets really big).

Thank you for answering my question.
I am writing an answer because I have to report that I tried the solution from point 1) of my question, and it is indeed more efficient according to the results of my program. String.Replace can replace a string (built from a char) with another string very fast.
oldString = separationChar.ToString();
newString = oldString.Insert(1, theTime); // Insert needs a start index; 1 places theTime right after the separator
message = message.Replace(oldString, newString);
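For reference, the String.Replace approach can be wrapped in a small helper. This is a sketch under assumptions: ReplaceDemo and InsertAfterEach are hypothetical names, and the variable names follow the question.

```csharp
using System;

static class ReplaceDemo
{
    // Inserts theTime after every occurrence of separationChar in one pass.
    public static string InsertAfterEach(string message, char separationChar, string theTime)
    {
        string oldString = separationChar.ToString();
        string newString = oldString + theTime;   // separator followed by the inserted text
        return message.Replace(oldString, newString);
    }
}
```

One behavioural difference from the loop in the question: the loop skips a separator that is the last character of the message, while Replace also inserts theTime after a trailing separator.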

Why string pointer position is different?

Why is the string's pointer position different each time I run the application when I use StringBuilder, but the same when I declare a variable?
void Main()
{
    string str_01 = "my string";
    string str_02 = GetString();
    unsafe
    {
        fixed (char* pointerToStr_01 = str_01)
        {
            fixed (char* pointerToStr_02 = str_02)
            {
                Console.WriteLine((Int64)pointerToStr_01);
                Console.WriteLine((Int64)pointerToStr_02);
            }
        }
    }
}
private string GetString()
{
    StringBuilder sb = new StringBuilder();
    sb.Append("my string");
    return sb.ToString();
}
Output:
40907812
178488268
next time:
40907812
179023248
next time:
40907812
178448964

str_01 holds a reference to a constant string. StringBuilder, however, builds string instances dynamically, so the returned string instance is not referentially the same instance as the constant string with the same content. System.Object.ReferenceEquals() will return false.
Since str_01 is a reference to a constant string, its data is probably stored in a data section of the executable, which always gets the same address in the application's virtual address space.
Edit:
You can see the "my string" text in UTF-8 encoding when you open the compiled .exe file using PE.Explorer or similar software. It is present in the .data section of the file, including a preferred Virtual Address where the section should be loaded in process virtual memory.
I have, however, not been able to reproduce str_01 having the same address on multiple runs of the application, probably because my x64 Windows 8.1 performs address space layout randomization (ASLR). Because of that, all pointers will be different across multiple runs of the application, even those that point directly to loaded PE sections.

Just because two strings are equal doesn't mean they refer to the same object (which would mean having the same pointer). C# does not intern all strings automatically, for performance reasons among others. If you want the pointers to be the same for both strings, you can intern str_02 using string.Intern.
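A small sketch of the difference between equality and reference identity, and of what string.Intern changes. InternDemo and Compare are hypothetical names. That the literal itself lives in the intern pool is the usual runtime behaviour and is treated here as an assumption, so the last flag is described as "typically true" rather than guaranteed.

```csharp
using System;
using System.Text;

static class InternDemo
{
    public static (bool equalContent, bool sameBeforeIntern, bool sameAfterIntern) Compare()
    {
        string literal = "my string";
        string built = new StringBuilder().Append("my string").ToString();

        bool equalContent = literal == built;                  // compares characters
        bool sameBeforeIntern = ReferenceEquals(literal, built); // compares object identity

        // Intern returns the pooled instance with this content, if one exists.
        string interned = string.Intern(built);
        bool sameAfterIntern = ReferenceEquals(literal, interned); // typically true

        return (equalContent, sameBeforeIntern, sameAfterIntern);
    }
}
```

The string produced by StringBuilder has equal content but is a different object, which is why its address in the question varies independently of the literal's address.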

The fixed statement pins the string in memory and gives you its address.
Because str_01 is a constant string, its memory is set up when the program starts, and it points to the same location every time:
fixed (char* pointerToStr_01 = str_01)
But in the case of
fixed (char* pointerToStr_02 = str_02)
the memory is allocated dynamically, so the location it points to varies every time.
Hence the difference in that string's pointer each time we run.

I do not agree that the output of
Console.WriteLine((Int64)pointerToStr_01);
is always the same for you; I tested it myself to make my point clearer.
Let's have a look at both cases:
In the case of string str_01 = "my string", the pointer value printed for this variable will not necessarily be the same as in the previous run, because each run of the program creates a new String object and assigns "my string" to it. The pointer value you print inside the fixed statement goes out of scope when the program ends, so the previous value is not remembered on the next run.
I think you can now explain the behavior of StringBuilder yourself.
Also check with:
string str_01 = GetString();
private static string GetString()
{
    var sb = new String(new char[] {'m','y',' ','s','t','r','i','n','g'});
    return sb;
}

Add into string field [duplicate]

This question already has an answer here:
String Concatenation using '+' operator
Please tell me what the problem is with this C# code.
string str = string.Empty;
for (var i = 1; i <= 1000; i++)
    str += i.ToString();
This was interview question.

Actually, there is nothing wrong with your code as such.
In this case, however, StringBuilder is more appropriate than string,
because StringBuilder is mutable whereas string is immutable.
Whenever you modify the String object using +=, a new string object is created, so by the end of your loop many string objects have been created.
If you use StringBuilder, the same object is modified each time you Append a string to it.
You can find more info on MSDN: StringBuilder Class
The String object is immutable. Every time you use one of the methods
in the System.String class, you create a new string object in memory,
which requires a new allocation of space for that new object. In
situations where you need to perform repeated modifications to a
string, the overhead associated with creating a new String object can
be costly. The System.Text.StringBuilder class can be used when you
want to modify a string without creating a new object. For example,
using the StringBuilder class can boost performance when concatenating
many strings together in a loop.
Solution:
This
string str = string.Empty;
for (var i = 1; i <= 1000; i++)
    str += i.ToString();
should be this
StringBuilder str = new StringBuilder();
for (var i = 1; i <= 1000; i++)
    str.Append(i.ToString());

There is an answer here.
The compiler can't do anything if you concatenate in a loop, and this does generate a lot of garbage.

Why does .NET create new substrings instead of pointing into existing strings?

From a brief look using Reflector, it looks like String.Substring() allocates memory for each substring. Am I correct that this is the case? I thought that wouldn't be necessary since strings are immutable.
My underlying goal was to create an IEnumerable<string> Split(this String, Char) extension method that allocates no additional memory.

One reason why most languages with immutable strings create new substrings rather than refer into existing strings is that sharing would interfere with garbage collecting those strings later.
What happens if a string is used for its substring, but then the larger string becomes unreachable (except through the substring)? The larger string will be uncollectable, because collecting it would invalidate the substring. What seemed like a good way to save memory in the short term becomes a memory leak in the long term.

Not possible without poking around inside .NET's String class. You would have to pass around references to a mutable array and make sure no one messed it up.
.NET will create a new string every time you ask it to. The only exception is interned strings, which are created by the compiler (you can also intern strings yourself); they are placed in memory once, and references then point to that single string, for memory and performance reasons.

Each string has to have its own string data, given the way the String class is implemented.
You can make your own SubString structure that uses part of a string:
public struct SubString {
    private string _str;
    private int _offset, _len;
    public SubString(string str, int offset, int len) {
        _str = str;
        _offset = offset;
        _len = len;
    }
    public int Length { get { return _len; } }
    public char this[int index] {
        get {
            // Valid indexes are 0 .. _len - 1
            if (index < 0 || index >= _len) throw new IndexOutOfRangeException();
            return _str[_offset + index];
        }
    }
    public void WriteToStringBuilder(StringBuilder s) {
        // Append(string, int, int) copies the slice without creating a substring
        s.Append(_str, _offset, _len);
    }
    public override string ToString() {
        return _str.Substring(_offset, _len);
    }
}
You can flesh it out with other methods, such as comparison, that can also be implemented without extracting the string.
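On runtimes that provide ReadOnlyMemory<char> (.NET Core 2.1 and later), a similar view type already exists. The following is a sketch of a Split that yields views instead of new strings, assuming that runtime; SplitDemo and SplitViews are hypothetical names. It avoids a string allocation per segment, although the iterator object itself is still allocated.

```csharp
using System;
using System.Collections.Generic;

static class SplitDemo
{
    // Yields each segment as a view over the original string's characters.
    public static IEnumerable<ReadOnlyMemory<char>> SplitViews(this string text, char separator)
    {
        int start = 0;
        int index;
        while ((index = text.IndexOf(separator, start)) >= 0)
        {
            yield return text.AsMemory(start, index - start); // no copy of the characters
            start = index + 1;
        }
        yield return text.AsMemory(start); // remainder after the last separator
    }
}
```

Each view keeps the whole original string reachable, which is exactly the garbage collection trade-off discussed in the other answers; calling ToString() on a view creates a real substring when one is needed.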

Because strings are immutable in .NET, every string operation that results in a new string object will allocate a new block of memory for the string contents.
In theory, it could be possible to reuse the memory when extracting a substring, but that would make garbage collection very complicated: what if the original string is garbage-collected? What would happen to the substring that shares a piece of it?
Of course, nothing prevents the .NET BCL team from changing this behavior in future versions of .NET. It wouldn't have any impact on existing code.

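Adding to the point that Strings are immutable, you should be aware of how many String instances the following snippet generates in memory.
String s1 = "Hello", s2 = ", ", s3 = "World!";
String res = s1 + s2 + s3;
The C# compiler turns s1 + s2 + s3 into a single call to String.Concat(s1, s2, s3), so only one new string instance is allocated here, and res is a reference to it.
Intermediate instances appear when the concatenation is split across statements (for example temp1 = s1 + s2 followed by res = temp1 + s3) or repeated in a loop: each step then creates a new string instance.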
