Closed 13 years ago.
Possible Duplicate:
At what point does using a StringBuilder become insignificant or an overhead?
Related/Duplicate Questions
String vs StringBuilder
As plainly as possible, I have this method 1:
cmd2.CommandText = ("insert into " + TableName + " values (" + string.Join(",", insertvalues) + ");");
I am wondering whether method 2 would be faster if I did:
StringBuilder sb2 = new StringBuilder();
sb2.Append("insert into ");
sb2.Append(TableName);
sb2.Append(" values (");
sb2.Append(string.Join(",", insertvalues));
sb2.Append(");");
cmd2.CommandText = sb2.ToString();
You could also try String.Format, which I believe uses a StringBuilder internally but has increased readability.
cmd2.CommandText = string.Format("insert into {0} values ({1});", TableName, string.Join(",", insertvalues));
(This is for C#)
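As a quick sanity check, the three approaches above (a single + expression, an explicit StringBuilder, and string.Format) can be run side by side to confirm they build exactly the same command text. This is a minimal sketch: tableName and the contents of insertvalues are hypothetical stand-ins for the question's TableName and insertvalues, and it only compares output, not speed.

```csharp
using System;
using System.Text;

// Hypothetical inputs standing in for the question's TableName and insertvalues.
string tableName = "People";
string[] insertvalues = { "1", "'Alice'", "42" };

// Method 1: one + expression; the C# compiler turns it into a string.Concat call.
string viaConcat = "insert into " + tableName + " values (" + string.Join(",", insertvalues) + ");";

// Method 2: explicit StringBuilder, one Append per fragment.
var sb2 = new StringBuilder();
sb2.Append("insert into ");
sb2.Append(tableName);
sb2.Append(" values (");
sb2.Append(string.Join(",", insertvalues));
sb2.Append(");");
string viaBuilder = sb2.ToString();

// Method 3: string.Format with positional placeholders.
string viaFormat = string.Format("insert into {0} values ({1});", tableName, string.Join(",", insertvalues));

Console.WriteLine(viaConcat); // insert into People values (1,'Alice',42);
```

For a single statement like this the choice is mostly about readability; any timing difference should be measured rather than assumed.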
For small programs this will be a premature optimization.
If you want to take these kinds of optimization into consideration, then it is better to measure, because the result depends on the size of the concatenated strings as well as on the number of appends.
Besides that, IMO the StringBuilder method looks and reads better. According to http://dotnetperls.com/stringbuilder-performance, StringBuilder does outperform string concatenation after 5 to 10 added strings.
In C# an expression in the form "a" + b + "c" is optimized by the compiler into String.Concat("a", b, "c") so you will not get intermediary strings. This would be more efficient than a StringBuilder.
From here:
The Java language provides special support for the string concatenation operator ( + ), and for conversion of other objects to strings. String concatenation is implemented through the StringBuffer class and its append method.
So it would seem that the compiler is using StringBuffer on your behalf anyway.
A good compiler should optimize this for you - but don't take my word for it when you can easily find out for yourself.
Unless you are doing this in a tight loop, the difference in performance is likely to be insignificant.
String concatenation of values is usually a bad way to construct SQL statements when you could use bind variables instead. This allows the database to optimize the queries. Using bind is likely to make a much bigger difference than optimizing your string construction - and with bind you only need to construct the string once per session instead of once per query.
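To make the bind-variable idea concrete, the command text can be built once with one named placeholder per value and then reused for every row, binding only the values. This is a sketch under assumptions: the @p0, @p1, ... naming follows SQL Server's named-parameter style (other providers use different placeholder syntax), BuildParameterizedInsert is a hypothetical helper, and the actual binding is shown only as comments because it needs a live SqlCommand.

```csharp
using System;
using System.Linq;

// Hypothetical helper: builds "insert into <table> values (@p0,@p1,...);".
// Table names cannot be passed as parameters, so tableName must come from trusted code.
static string BuildParameterizedInsert(string tableName, int valueCount)
{
    var names = Enumerable.Range(0, valueCount).Select(i => "@p" + i);
    return "insert into " + tableName + " values (" + string.Join(",", names) + ");";
}

// Built once; the same text can be reused for every row.
string sql = BuildParameterizedInsert("People", 3);
Console.WriteLine(sql); // insert into People values (@p0,@p1,@p2);

// With a real SqlCommand you would then bind each value per execution, e.g.:
// cmd2.CommandText = sql;
// cmd2.Parameters.AddWithValue("@p0", firstValue); // firstValue is hypothetical
```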
Use String.Format.
String.Format("insert into {0} values ({1});", TableName, string.Join(",", insertvalues));
It's more readable.
Your two methods will not differ in performance. The reason is that your string is concatenated in one expression, and the compiler will create a StringBuilder for that expression. The following are equivalent and result in the same code:
String s1 = "five" + '=' + 5;
String s2 = new StringBuilder().append("five").append('=').append(5).toString();
If your code splits up the expression, for instance in a loop, creating your own StringBuilder will perform better. After compilation, the naive version using string + concatenation results in code like:
String s3 = "";
for (int n = 0; n < 5; n++) {
    s3 = new StringBuilder(s3).append(getText(n)).append('=').append(n).append("\n").toString();
}
Writing the method with an explicit StringBuilder can save the creation of unnecessary StringBuilder objects.
For simple methods you normally do not have to optimise string concatenation yourself, but in situations where the code is on the critical path, or where you are forced to use many different string expressions to build up your end result, it is good to know what happens under the hood so you can decide whether it is worth the extra effort.
Note that StringBuffer is thread-safe, while StringBuilder is not. StringBuilder is the faster choice for situations where multi-threaded access is not an issue.
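The loop case described above translates to C# roughly as follows: the naive version allocates a fresh string on every +=, while the explicit builder reuses one mutable buffer. This is a sketch; GetText is a hypothetical stand-in for the answer's getText(n).

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the answer's getText(n).
static string GetText(int n) => "item" + n;

// Naive version: each += allocates a new string holding everything built so far.
string s3 = "";
for (int n = 0; n < 5; n++)
{
    s3 += GetText(n) + "=" + n + "\n";
}

// Explicit builder: one buffer is appended to on every iteration.
var sb = new StringBuilder();
for (int n = 0; n < 5; n++)
{
    sb.Append(GetText(n)).Append('=').Append(n).Append('\n');
}
string s4 = sb.ToString();

Console.Write(s4);
```

Both produce the same text; the difference is only in how many intermediate strings are allocated along the way.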
Related
I have started to use StringBuilder as I hear it's much more optimized when it comes to outputting strings.
My question is regarding the use of + as strings are immutable and when you add them together it allocates a new string.
If I use this operator in the arguments for the StringBuilder.Append function, I assume it will essentially have the same overhead.
For example:
string animal1 = "dog";
string animal2 = "cat";
stringBuilder.Append("Today I saw a " + animal1 + " and " + animal2);
My guess is that this could concatenate these texts together allocating memory anyway.
I assume the more efficient (albeit verbose) way to do this would be:
stringBuilder.Append("Today I saw a");
stringBuilder.Append(animal1);
stringBuilder.Append(" and ");
stringBuilder.Append(animal2);
Is this correct?
You have a bug in the second example, as you miss a space between "Today I saw a" and animal1. Unless you are doing this in an excessive way (within a loop with many iterations) you'll probably find no measurable difference, so your best bet is probably to aim for readability.
$"Today I saw a {animal1} and a {animal2}"
Yes, I added an a before the second animal too :) I'm not handling cases for "an" though.
You also have the option of using AppendFormat if you want to be less verbose with all those appends...
stringBuilder.AppendFormat("Today I saw a {0} and a {1}", animal1, animal2);
Concatenating strings like that allocates the memory for a new string, just to have it appended to your StringBuilder, which is just wasteful. As you noted, you should just explicitly Append them instead.
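On newer runtimes there is also a middle ground between the verbose Append calls and the wasteful concatenation. Assuming C# 10 and .NET 6 or later, passing an interpolated string to StringBuilder.Append binds to an overload that takes a StringBuilder.AppendInterpolatedStringHandler, which writes the pieces straight into the builder without first creating a temporary string. On older frameworks the same line still compiles but does create that temporary string, so treat the allocation saving as version-dependent.

```csharp
using System;
using System.Text;

string animal1 = "dog";
string animal2 = "cat";
var stringBuilder = new StringBuilder();

// On C# 10 / .NET 6+, this uses the interpolated-string handler overload of Append,
// so no intermediate string is allocated for the whole sentence.
stringBuilder.Append($"Today I saw a {animal1} and a {animal2}");

// The explicit equivalent (note the trailing space after "a").
var explicitBuilder = new StringBuilder();
explicitBuilder.Append("Today I saw a ").Append(animal1).Append(" and a ").Append(animal2);

Console.WriteLine(stringBuilder.ToString()); // Today I saw a dog and a cat
```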
How does StringBuilder work?
What does it do internally? Does it use unsafe code?
And why is it so fast (compared to the + operator)?
When you use the + operator to build up a string:
string s = "01";
s += "02";
s += "03";
s += "04";
then on the first concatenation we make a new string of length four and copy "01" and "02" into it -- four characters are copied. On the second concatenation we make a new string of length six and copy "0102" and "03" into it -- six characters are copied. On the third concat, we make a string of length eight and copy "010203" and "04" into it -- eight characters are copied. So far a total of 4 + 6 + 8 = 18 characters have been copied for this eight-character string. Keep going.
...
s += "99";
On the 98th concat we make a string of length 198 and copy "010203...98" and "99" into it. That gives us a total of 4 + 6 + 8 + ... + 198 = 9,898 characters copied, in order to make this 198-character string.
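The arithmetic above can be checked with a few lines that replay the same sequence of += operations and tally the copying. This is a counting model that assumes each concatenation copies both operands into a brand-new string, as described above; it does not measure the real runtime.

```csharp
using System;

// Replays s = "01"; s += "02"; ... s += "99"; and counts characters copied,
// assuming every concatenation copies both operands into a fresh string.
long copied = 0;
string s = "01";
for (int i = 2; i <= 99; i++)
{
    string piece = i.ToString("00");   // "02", "03", ..., "99"
    copied += s.Length + piece.Length; // the whole new string is written
    s += piece;
}

Console.WriteLine($"length = {s.Length}, characters copied = {copied}");
// length = 198, characters copied = 9898
```

The copied total grows roughly with the square of the final length, whereas the final string itself is only 198 characters long.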
A string builder doesn't do all that copying. Rather, it maintains a mutable array that is hoped to be larger than the final string, and stuffs new things into the array as necessary.
What happens when the guess is wrong and the array gets full? There are two strategies. In the previous version of the framework, the string builder reallocated and copied the array when it got full, and doubled its size. In the new implementation, the string builder maintains a linked list of relatively small arrays, and appends a new array onto the end of the list when the old one gets full.
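The older "double the array when it gets full" strategy can be illustrated with a toy buffer. This is a simplified model written for illustration only, not the real System.Text.StringBuilder (which, per the paragraph above, now uses a list of chunks); the starting size of 4 is an arbitrary assumption.

```csharp
using System;

// Toy model of a growable char buffer that doubles when full.
// NOT the real StringBuilder implementation; sizes are arbitrary.
char[] buffer = new char[4];
int length = 0;
int reallocations = 0;

void Append(string value)
{
    if (length + value.Length > buffer.Length)
    {
        int newSize = Math.Max(buffer.Length * 2, length + value.Length);
        Array.Resize(ref buffer, newSize); // copies existing characters once per growth
        reallocations++;
    }
    value.CopyTo(0, buffer, length, value.Length); // write into the spare space
    length += value.Length;
}

// Same input as the += example: "01" through "99".
for (int i = 1; i <= 99; i++)
{
    Append(i.ToString("00"));
}

string result = new string(buffer, 0, length);
Console.WriteLine($"{result.Length} chars, {reallocations} reallocations");
```

Because the capacity doubles, only a handful of reallocations (and copies) happen while building the same 198-character string.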
Also, as you have conjectured, the string builder can do tricks with "unsafe" code to improve its performance. For example, the code which writes the new data into the array can already have checked that the array write is going to be within bounds. By turning off the safety system it can avoid the per-write check that the jitter might otherwise insert to verify that every write to the array is safe. The string builder does a number of these sorts of tricks to do things like ensuring that buffers are reused rather than reallocated, ensuring that unnecessary safety checks are avoided, and so on. I recommend against these sorts of shenanigans unless you are really good at writing unsafe code correctly, and really do need to eke out every last bit of performance.
StringBuilder's implementation has changed between versions, I believe. Fundamentally though, it maintains a mutable structure of some form. I believe it used to use a string which was still being mutated (using internal methods) and would just make sure it would never be mutated after it was returned.
The reason StringBuilder is faster than using string concatenation in a loop is precisely because of the mutability - it doesn't require a new string to be constructed after each mutation, which would mean copying all the data within the string etc.
For just a single concatenation, it's actually slightly more efficient to use + than to use StringBuilder. It's only when you're performing multiple operations and you don't really need the intermediate results that StringBuilder shines.
See my article on StringBuilder for more information.
The Microsoft CLR does do some operations with internal calls (not quite the same as unsafe code). The biggest performance benefit over a bunch of + concatenated strings is that it writes to a char[] and doesn't create as many intermediate strings. When you call ToString(), it builds a completed, immutable string from your contents.
The StringBuilder uses a string buffer that can be altered, compared to a regular String that can't be. When you call the ToString method of the StringBuilder it will just freeze the string buffer and convert it into a regular string, so it doesn't have to copy all the data one extra time.
As the StringBuilder can alter the string buffer, it doesn't have to create a new string value for each and every change to the string data. When you use the + operator, the compiler turns that into a String.Concat call that creates a new string object. This seemingly innocent piece of code:
str += ",";
compiles into this:
str = String.Concat(str, ",");
The getter would get a message about the state of the struct it's in, which is defined by a combination of the struct's properties, which are finite.
First, this isn't about micro-optimization, I just think that:
StringBuilder msg = new StringBuilder();
...
msg.AppendLine("static string");
...
looks cleaner than:
String msg = String.Empty;
...
msg = String.Concat(msg, "static string", System.Environment.NewLine);
...
so just an aesthetic choice.
The msg variable has to be initialized empty either way, so the extra line doesn't bug me. But how bad is it to construct a new StringBuilder in a getter that returns a string?
EDIT
I just wanted to know if putting a StringBuilder constructor in a getter was going to open up a huge can of worms, be ridiculous overhead, some never use constructors in getters kind of anti-pattern thing, etc... not to know everybody's favorite most performant way to concat non-looping strings.
EDIT2
What is the usual, best practice, performance threshold for a getter and does a StringBuilder constructor fall below that?
Why not do this?
msg + "static string" + Environment.NewLine
It will compile to the same as your second example.
Update
You changed your code so that it appears that you want to create a very large string containing lots of lines.
Then I guess it's fine to use StringBuilder, but I'd suggest that you make it a method (how about overriding ToString?) rather than a property so that callers are less likely to assume that it's cheap to call.
From a performance point of view, with the data supplied (three substrings), the String.Concat is better.
But, if inside the getter, you have lines like if(state == 0) that disrupt the efficiency of Concat or + operator then use StringBuilder for its good efficiency in string memory handling and for its clear syntax on AppendLine. Look at this site for data on StringBuilder vs Concat vs + and for some info on StringBuilder tips and mistakes
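Following the suggestions above (branching logic such as if(state == 0), AppendLine for readability, and a method rather than a property so callers don't assume it is cheap), a sketch might look like this. Describe, state, and name are hypothetical names chosen for illustration; they are not from the question's actual struct.

```csharp
using System;
using System.Text;

// Hypothetical method standing in for the question's getter; names are illustrative.
static string Describe(int state, string name)
{
    var msg = new StringBuilder();
    msg.AppendLine("static string");
    if (state == 0)
    {
        msg.AppendLine("state is idle");
    }
    else
    {
        msg.Append("state is ").Append(state).AppendLine();
    }
    msg.Append("name: ").Append(name);
    return msg.ToString();
}

Console.WriteLine(Describe(0, "sensor"));
```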
The String.Concat function outperforms StringBuilder by 2.3 times when not using lots of strings. Also, if for example you write code like a + b + c + d + f (five string variables), the compiler will compile it to use string.Concat(string[]) in the IL code. (An expression made only of literals, such as "a" + "b" + "c", is folded into a single constant at compile time.)
In addition to the existing answers, I should say that, considering you're talking about properties of the struct (I don't think there are thousands of them, but...), it's important to mention that the difference is not only in the semantics but in the actual functionality too.
To be shorter and clearer:
+ and string.Concat on every call create a new string object.
If you use StringBuilder instead, it operates on a single string buffer. So, from a performance point of view, it's very convenient to use StringBuilder if you have a long string to compose.
When developing in Java a couple of years ago I learned that it is better to append a char if I had a single character instead of a string with one character because the VM would not have to do any lookup on the string value in its internal string pool.
string stringappend = "Hello " + name + ".";
string charappend = "Hello " + name + '.'; // better?
When I started programming in C# I never thought of the chance that it would be the same with its "VM". I came across C# String Theory—String intern pool that states that C# also has an internal string pool (I guess it would be weird if it didn't) so my question is,
are there actually any benefits in appending a char instead of a string when concatenating to a string in C#, or is it just gibberish?
Edit: Please disregard StringBuilder and string.Format, I am more interested in why I would replace "." with '.' in code. I am well aware of those classes and functions.
If given a choice, I would pass a string rather than a char when calling System.String.Concat or the (equivalent) + operator.
The only overloads that I see for System.String.Concat all take either strings or objects. Since a char isn't a string, the object version would be chosen. This would cause the char to be boxed. After Concat verifies that the object reference isn't null, it would then call object.ToString on the char. It would then generate the dreaded single-character string that was being avoided in the first place, before creating the new concatenated string.
So I don't see how passing a char is going to gain anything.
Maybe someone wants to look at the Concat operation in Reflector to see if there is special handling for char?
UPDATE
As I thought, this test confirms that char is slightly slower.
using System;
using System.Diagnostics;
namespace ConsoleApplication19
{
    class Program
    {
        static void Main(string[] args)
        {
            TimeSpan throwAwayString = StringTest(100);
            TimeSpan throwAwayChar = CharTest(100);
            TimeSpan realStringTime = StringTest(10000000);
            TimeSpan realCharTime = CharTest(10000000);
            Console.WriteLine("string time: {0}", realStringTime);
            Console.WriteLine("char time: {0}", realCharTime);
            Console.ReadLine();
        }
        private static TimeSpan StringTest(int attemptCount)
        {
            Stopwatch sw = new Stopwatch();
            string concatResult = string.Empty;
            sw.Start();
            for (int counter = 0; counter < attemptCount; counter++)
                concatResult = counter.ToString() + ".";
            sw.Stop();
            return sw.Elapsed;
        }
        private static TimeSpan CharTest(int attemptCount)
        {
            Stopwatch sw = new Stopwatch();
            string concatResult = string.Empty;
            sw.Start();
            for (int counter = 0; counter < attemptCount; counter++)
                concatResult = counter.ToString() + '.';
            sw.Stop();
            return sw.Elapsed;
        }
    }
}
Results:
string time: 00:00:02.1878399
char time: 00:00:02.6671247
When developing in Java a couple of years ago I learned that it is better to append a char if I had a single character instead of a string with one character because the VM would not have to do any lookup on the string value in its internal string pool.
Appending a char to a String is likely to be slightly faster than appending a 1 character String because:
the append(char) operation doesn't have to load the string length,
it doesn't have to load the reference to the string characters array,
it doesn't have to load and add the string's start offset,
it doesn't have to do a bounds check on the array index, and
it doesn't have to increment and test a loop variable.
Take a look at the Java source code for String and related classes. You might be surprised what goes on under the hood.
The intern pool has nothing to do with it. The interning of string literals happens just once during class loading. Interning of non-literal strings occurs only if the application explicitly calls String.intern().
This may be interesting:
http://www.codeproject.com/KB/cs/StringBuilder_vs_String.aspx
StringBuilders are not necessarily faster than strings; as said before, it depends: on machine configuration, available memory vs. processor power, and framework version. Your profiler is your best buddy in this case :)
Back 2 Topic:
You should just TRY which is faster. Do that concatenation a bazillion times and let your profiler watch. You will see possible differences.
All string concatenation in .NET (with the standard operators i.e. +) requires the runtime to reserve enough memory for a complete new string to hold the results of the concatenation. This is due to the string type being immutable.
If you are performing string concatenation many times over (e.g. within a loop) you will suffer performance issues (and eventually memory issues if the string is sufficiently large) as the .NET runtime needs to continually allocate and deallocate memory space to hold each new string.
It's probably for this reason that you're thinking (correctly) that excessive string concatenation can be problematic. It has very little (if anything) to do with concatenating a char rather than a string type.
The alternative to this is to use the StringBuilder class within the System.Text namespace. This class represents a mutable string-like object that can be used to concatenate strings without much of the resulting performance issues. This is because the StringBuilder class will reserve a specific amount of memory for a string, and will allow concatenations to be appended to the end of the reserved memory amount without requiring a complete new copy of the entire string.
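The "reserve a specific amount of memory" point can be made explicit with the StringBuilder(int capacity) constructor. A minimal sketch, assuming the final size is known in advance; the figure of 100000 periods simply mirrors the test that follows and is otherwise arbitrary.

```csharp
using System;
using System.Text;

// Reserve room for 100000 characters up front, then append 100000 periods.
// Since the initial capacity already fits the final length, the buffer never grows.
var builder = new StringBuilder(100000);
for (int i = 0; i < 100000; i++)
{
    builder.Append('.');
}
string dots = builder.ToString();

Console.WriteLine(dots.Length); // 100000
```

If the final size is unknown, the default constructor still works; presizing only saves the internal growth steps.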
EDIT:
With regard to the specifics of string lookups versus char lookups, I whipped up this little test:
using System;
using System.Diagnostics;
class Program
{
    static void Main(string[] args)
    {
        string stringtotal = "";
        string chartotal = "";
        Stopwatch stringconcat = new Stopwatch();
        Stopwatch charconcat = new Stopwatch();
        stringconcat.Start();
        for (int i = 0; i < 100000; i++)
        {
            stringtotal += ".";
        }
        stringconcat.Stop();
        charconcat.Start();
        for (int i = 0; i < 100000; i++)
        {
            chartotal += '.';
        }
        charconcat.Stop();
        Console.WriteLine("String: " + stringconcat.Elapsed.ToString());
        Console.WriteLine("Char  : " + charconcat.Elapsed.ToString());
        Console.ReadLine();
    }
}
It merely times (using the high-resolution Stopwatch class) how long it takes to concatenate 100000 dots/periods (.) of type string vs. 100000 dots/periods of type char.
I ran this test several times so that no single run would skew the results; each time, the results were similar to the following:
String: 00:00:06.4606331
Char : 00:00:06.4528073
Therefore, in the context of multiple concatenations, I'd say that there's very little difference (in all likelihood, no difference when taking standard test run tolerances into account) between the two!
I agree with what everyone is saying about using StringBuilder if you are doing lots of string concatenation because String is an immutable type, but don't forget there's an overhead to creating a StringBuilder instance too, so you'll have to choose when to use which.
In one of Bill Wagner's Effective C# books (or maybe in all three of them), he touched on this too. Broadly speaking, if all you need is to add a few string fragments together, string.Format is better, but if you need to build up a large string value in a potentially large loop, use StringBuilder.
Every time you concatenate strings using the + operator, the runtime creates a new string. To avoid that, the recommended practice is to use the StringBuilder class, which has an Append method. You can also use AppendLine and AppendFormat.
If you do not want to use StringBuilder, then you can use string.Format:
string str = string.Format("Hello {0}.", name);
Since strings are immutable types both would require creating a new instance of a string before the value is returned back to you.
I would consider string.Concat(...) for a small number of concatenations or use the StringBuilder class for many string concatenations.
I can't speak to C#, but in Java, the main advantage is not the compile-time gain but the run-time gain.
Yes, if you use a String, then at compile time Java will have to look the String up in its internal pool and possibly create a new String object. But this just happens once, at compile time, when you create the .class files. The user will never see this.
What the user will see is that at run time, if you give a character, the program just has to retrieve the character. Done. If you give a String, it must first retrieve the String object handle. Then it must set up a loop to go through all the characters, retrieve the one character, observe that there are no more characters, and stop. I haven't looked at the generated bytecode, but it's clearly several times as much work.
My question is this: Is string concatenation in C# safe? If string concatenation leads to unexpected errors, and replacing that string concatenation by using StringBuilder causes those errors to disappear, what might that indicate?
Background: I am developing a small command line C# application. It takes command line arguments, performs a slightly complicated SQL query, and outputs about 1300 rows of data into a formatted XML file.
My initial program would always run fine in debug mode. However, in release mode it would get to about the 750th SQL result, and then die with an error. The error was that a certain column of data could not be read, even though the Read() method of the SqlDataReader object had just returned true.
This problem was fixed by using StringBuilder for all operations in the code, where previously there had been "string1 + string2". I'm not talking about string concatenation inside the SQL query loop, where StringBuilder was already in use. I'm talking about simple concatenations between two or three short string variables earlier in the code.
I had the impression that C# was smart enough to handle the memory management for adding a few strings together. Am I wrong? Or does this indicate some other sort of code problem?
To answer your question:
String concatenation in C# (and .NET in general) is "safe", but doing it in a tight loop as you describe is likely to cause severe memory pressure and put strain on the garbage collector.
I would hazard a guess that the errors you speak of were related to resource exhaustion of some sort, but it would be helpful if you could provide more detail — for example, did you receive an exception? Did the application terminate abnormally?
Background:
.NET strings are immutable, so when you do a concatenation like this:
var stringList = new List<string> { "aaa", "bbb", "ccc", "ddd" /* , ... */ };
string result = String.Empty;
foreach (var s in stringList)
{
    result = result + s;
}
This is roughly equivalent to the following:
string result = "";
result = "aaa";
string temp1 = result + "bbb";
result = temp1;
string temp2 = temp1 + "ccc";
result = temp2;
string temp3 = temp2 + "ddd";
result = temp3;
// ...
result = tempN + x;
The purpose of this example is to emphasise that each time around the loop results in the allocation of a new temporary string.
Since the strings are immutable, the runtime has no option but to allocate a new string each time you add another string to the end of your result.
Although the result string is constantly updated to point to the latest and greatest intermediate result, you are producing a lot of these unnamed temporary strings that become eligible for garbage collection almost immediately.
At the end of this concatenation you will have the following strings stored in memory (assuming, for simplicity, that the garbage collector has not yet run).
string a = "aaa";
string b = "bbb";
string c = "ccc";
// ...
string temp1 = "aaabbb";
string temp2 = "aaabbbccc";
string temp3 = "aaabbbcccddd";
string temp4 = "aaabbbcccdddeee";
string temp5 = "aaabbbcccdddeeefff";
string temp6 = "aaabbbcccdddeeefffggg";
// ...
Although all of these implicit temporary variables are eligible for garbage collection almost immediately, they still have to be allocated. When performing concatenation in a tight loop this is going to put a lot of strain on the garbage collector and, if nothing else, will make your code run very slowly. I have seen the performance impact of this first hand, and it becomes truly dramatic as your concatenated string becomes larger.
The recommended approach is to always use a StringBuilder if you are doing more than a few string concatenations. StringBuilder uses a mutable buffer to reduce the number of allocations that are necessary in building up your string.
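Applied to the loop shown earlier, the recommendation looks like this. It is a sketch using the same four sample strings; when the pieces are already in a collection, string.Concat(IEnumerable<string>) is another option that avoids writing the loop at all.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var stringList = new List<string> { "aaa", "bbb", "ccc", "ddd" };

// StringBuilder: appends into one mutable buffer instead of a new temporary string per iteration.
var sb = new StringBuilder();
foreach (var s in stringList)
{
    sb.Append(s);
}
string viaBuilder = sb.ToString();

// Alternative when the pieces are already in a collection.
string viaConcat = string.Concat(stringList);

Console.WriteLine(viaBuilder); // aaabbbcccddd
```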
String concatenation is safe, though more memory-intensive than using a StringBuilder when concatenating large numbers of strings in a loop. In extreme cases you could run out of memory.
It's almost certainly a bug in your code.
Maybe you're concatenating a very large number of strings. Or maybe it's something completely different.
I'd go back to debugging without any preconceptions of the root cause - if you're still having problems try to reduce it to the minimum needed to repro the problem and post code.
Apart from the fact that what you're doing is probably best done with XML APIs instead of strings or StringBuilder, I doubt that the error you see is due to string concatenation. Maybe switching to StringBuilder just masked the error or went over it gracefully, but I doubt using strings really was the cause.
How long would the concatenation version take compared with the StringBuilder version? It's possible that your connection to the DB is being closed. If you are doing a lot of concatenation, I would go with StringBuilder as it is a bit more efficient.
One cause may be that strings are immutable in .Net so when you do an operation on one such as concatenation you are actually creating a new string.
Another possible cause is that string length is an int so the maximum possible length is Int32.MaxValue or 2,147,483,647.
In either case a StringBuilder is better than "string1 + string2" for this type of operation. Although, using the built-in XML capabilities would be even better.
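As a rough idea of what "the built-in XML capabilities" could look like, LINQ to XML builds the document as objects and takes care of escaping, so no markup is concatenated by hand. This is a sketch: the rows array is a hypothetical stand-in for the SqlDataReader results, and the element and attribute names are invented for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical rows standing in for the SqlDataReader results in the question.
(int Id, string Name)[] rows = { (1, "Alice & Bob"), (2, "Carol") };

// Build the XML with XElement/XAttribute instead of concatenating markup;
// special characters such as "&" are escaped by the API.
var doc = new XElement("rows",
    rows.Select(r => new XElement("row",
        new XAttribute("id", r.Id),
        new XElement("name", r.Name))));

string xml = doc.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(xml);
```

For very large outputs, XmlWriter streams to a file without holding the whole document in memory, which may suit a 1300-row export as well.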
string.Concat(string[]) is by far the fastest way to concatenate strings. It literally kills StringBuilder in performance when used in loops, especially if you create the StringBuilder in each iteration.
There are loads of references if you Google "c# string format vs stringbuilder" or something like that.
http://www.codeproject.com/KB/cs/StringBuilder_vs_String.aspx gives you an idea about the times. There, string.Join wins the concatenation test, but I believe this is because string.Concat(string, string) is used instead of the overload that takes an array.
If you take a look at the MSIL code generated by the different methods, you'll see what's going on beneath the hood.
Here is my shot in the dark...
Strings in .NET (not stringbuilders) go into the String Intern Pool. This is basically an area managed by the CLR to share strings to improve performance. There has to be some limit here, although I have no idea what that limit is. I imagine all the concatenation you are doing is hitting the ceiling of the string intern pool. So SQL says yes I have a value for you, but it can't put it anywhere so you get an exception.
A quick and easy test would be to nGen your assembly and see if you still get the error. After nGen'ing, your application will no longer use the pool.
If that fails, I'd contact Microsoft to try and get some hard details. I think my idea sounds plausible, but I have no idea why it works in debug mode. Perhaps in debug mode strings aren't interned. I am also no expert.
When compounding strings together I always use StringBuilder. It's designed for it and is more efficient than simply using "string1 + string2".