How does the C# compiler work with a split? - c#

I have a List<string> that I am iterating through, splitting each item and then adding it to a StringBuilder.
foreach(string part in List)
{
StringBuilder.Append(part.Split(':')[1] + " ");
}
So my question is how many strings are created by doing this split? All of the splits are going to produce two items. So... I was thinking that it will create a string[2] and then an empty string. But, does it then create the concatenation of the string[1] + " " and then add it to the StringBuilder or is this optimized?

The code is actually equivalent to this:
foreach(string part in myList)
{
sb.Append(string.Concat(part.Split(':')[1], " "));
}
So yes, an additional string, representing the concatenation of the second part of the split and the single-space string, will be created.
Including the original string, you also have the two created by the call to Split(), and a reference to the literal string " ", which will be loaded from the assembly metadata.
You can save yourself the call to Concat() by appending the split result and the space sequentially:
sb.Append(part.Split(':')[1]).Append(" ");
Note that if you are only using string literals, then the compiler will make one optimization for you:
sb.Append("This is " + "one string");
is actually compiled to
sb.Append("This is one string");

3 extra strings for every item (plus the string[] array that Split() returns):
part[0];
part[1];
part[1] + " "
The fewest allocations would come from avoiding all the temporary strings completely, but the usual micro-optimization caveats apply.
var start = part.IndexOf(':') + 1;
stringbuilder.Append(part, start, part.Length-start).Append(' ');
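To make the IndexOf idea above concrete, here is a minimal self-contained sketch. The class name SplitFreeAppend, the method JoinSecondParts and the sample data are hypothetical. It assumes each item contains at most one ':' (as the question states); unlike Split(':')[1], it keeps everything after the first ':' and appends the whole item when no ':' is present.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SplitFreeAppend
{
    // Appends the text after the first ':' of each item, followed by a space,
    // without creating any intermediate substring or array.
    public static string JoinSecondParts(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        foreach (string part in items)
        {
            // IndexOf returns -1 when ':' is missing, so start becomes 0
            // and the whole item is appended (an assumption of this sketch).
            int start = part.IndexOf(':') + 1;
            sb.Append(part, start, part.Length - start).Append(' ');
        }
        return sb.ToString();
    }

    static void Main()
    {
        var list = new List<string> { "a:one", "b:two", "c:three" };
        // Prints the three second parts separated by spaces (with a trailing space).
        Console.WriteLine(JoinSecondParts(list));
    }
}
```

The only string allocated per call is the final result from ToString(); the StringBuilder copies characters straight out of each source string.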

You have the original string: 1 string
Split() splits it into two: 2 strings
You have the second part joined with the space: 1 string
The StringBuilder does not create a new string.
The current code uses 4 strings, including the original.
If you want to save one string, do:
StringBuilder.Append(part.Split(':')[1]);
StringBuilder.Append(" ");

This code:
foreach(string part in List)
{
StringBuilder.Append(part.Split(':')[1] + " ");
}
Is equivalent to:
foreach(string part in List)
{
string tmp = string.Concat(part.Split(':')[1], " ");
StringBuilder.Append(tmp);
}
So yes, it's creating a string needlessly. This would be better, at least in terms of the number of strings generated:
foreach(string part in List)
{
StringBuilder.Append(part.Split(':')[1])
.Append(" ");
}

So for a list of n items (each one is called part in your code) you are allocating:
nx strings for the splits, where x is the number of pieces per split (I assume 2).
n strings for the concatenations.
Roughly n + 1 strings for the StringBuilder, though probably far fewer.
So you have nx + n + n + 1 at the end, which is 4n + 1 assuming the split always results in two values.
One way to improve this would be:
foreach(string part in List)
{
var val = part.Split(':')[1];
StringBuilder.EnsureCapacity(StringBuilder.Length + val.Length + 1);
StringBuilder.Append(val);
StringBuilder.Append(' ');
}
This makes it 3n + 1. It is a rough estimate, as StringBuilder allocates more space whenever it runs out, but if you call EnsureCapacity you prevent it from under-allocating.

Probably the only way to be sure how this is compiled is to build it and decompile it with Reflector to see how it is handled internally. In any case, keep in mind that it probably has no noticeable impact on the performance of the app as a whole.

Related

Optimize an iteration of IEnumerable<string> [duplicate]

For a long time, I have always appended strings in the following way.
For example, if I want to get all the employee names separated by some symbol (in the example below I opted for the pipe symbol):
string final=string.Empty;
foreach(Employee emp in EmployeeList)
{
final+=emp.Name+"|"; // if i want to separate them by pipe symbol
}
At the end I do a substring to remove the last pipe symbol, as it is not required:
final=final.Substring(0,final.Length-1);
Is there a more efficient way of doing this?
I don't want to append the pipe symbol for the last item and then do a substring again.
Use string.Join() and a Linq projection with Select() instead:
finalString = string.Join("|", EmployeeList.Select( x=> x.Name));
Three reasons why this approach is better:
It is much more concise and readable: it expresses intent, not how you want to achieve your goal (in your case, concatenating strings in a loop). Using a simple projection with Linq also helps here.
It is optimized by the framework for performance: in most cases string.Join() will use a StringBuilder internally, so you are not creating multiple strings that are then unreferenced and must be garbage collected. Also see: Do not concatenate strings inside loops
You don’t have to worry about special cases: string.Join() automatically handles the case of the “last item” after which you do not want another separator. Again, this simplifies your code and makes it less error-prone.
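As a rough, runnable illustration of the string.Join() answer, the sketch below uses a hypothetical Employee class with a Name property (the question does not show its definition) and made-up sample names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's Employee type.
class Employee
{
    public string Name { get; set; }
}

static class JoinNames
{
    // Projects each employee to its name and joins the names with '|'.
    // No separator is emitted after the last item.
    public static string PipeSeparated(IEnumerable<Employee> employees)
    {
        return string.Join("|", employees.Select(e => e.Name));
    }

    static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { Name = "Ann" },
            new Employee { Name = "Bob" },
            new Employee { Name = "Cy" }
        };
        Console.WriteLine(PipeSeparated(employees)); // Ann|Bob|Cy
    }
}
```

An empty list simply yields an empty string, so no special handling for the "last item" or the "no items" case is needed.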
I like using the aggregate function in linq, such as:
string[] words = { "one", "two", "three" };
var res = words.Aggregate((current, next) => current + ", " + next);
You should join your strings.
Example (borrowed from MSDN):
using System;
class Sample {
public static void Main() {
String[] val = {"apple", "orange", "grape", "pear"};
String sep = ", ";
String result;
Console.WriteLine("sep = '{0}'", sep);
Console.WriteLine("val[] = {{'{0}' '{1}' '{2}' '{3}'}}", val[0], val[1], val[2], val[3]);
result = String.Join(sep, val, 1, 2);
Console.WriteLine("String.Join(sep, val, 1, 2) = '{0}'", result);
}
}
For building up a string like this, a StringBuilder is probably a better choice.
For your final pipe issue, simply leave the last append outside of the loop:
int size = EmployeeList.Count; // assumes a non-empty List<Employee>
for (int i = 0; i < size - 1; i++)
{
final += EmployeeList[i].Name + "|";
}
final += EmployeeList[size - 1].Name;
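If you do want the StringBuilder route suggested here, one common variation is to write the separator before every item except the first, which also copes with an empty list. This is only a sketch; the class and method names are hypothetical and it works on plain strings rather than the asker's Employee objects.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SeparatorFirst
{
    // Joins the names with the given separator using a single StringBuilder.
    public static string Join(IEnumerable<string> names, char separator)
    {
        var sb = new StringBuilder();
        bool first = true;
        foreach (string name in names)
        {
            // Emit the separator before every item except the first one,
            // so nothing has to be trimmed afterwards.
            if (!first)
            {
                sb.Append(separator);
            }
            sb.Append(name);
            first = false;
        }
        return sb.ToString();
    }

    static void Main()
    {
        var names = new List<string> { "Ann", "Bob", "Cy" };
        Console.WriteLine(Join(names, '|')); // Ann|Bob|Cy
    }
}
```

A boolean flag is used rather than checking sb.Length, because a leading empty name would otherwise cause the first separator to be skipped.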

Should I use StringBuilder in C# if I have a large amount of concatenations even if the total length is small?

I am reviewing some code and I see a large amount of string concatenation, but the strings are all very small. Something like this:
public string BuildList()
{
return "A" + GetCount() + "B" + TotalCount() + "C" + AmountLeft()
+ "D" + DaysLeft() + "R" + Initials() + "E";
}
I am simplifying it, but in total the longest string is about 300 characters and there are about 20 + operators. I am trying to figure out if it's worth converting it to StringBuilder and doing something like this:
public string BuildList()
{
var sb = new StringBuilder();
sb.Append("A");
sb.Append(GetCount());
sb.Append("B");
sb.Append(TotalCount());
sb.Append("C");
sb.Append(AmountLeft());
sb.Append("D");
// etc . .
return sb.ToString();
}
I can't see a big difference from testing but wanted to see if there is a good breakeven rule about using StringBuilder (either length of string or number of different concatenations)?
For such a fixed situation, with a limited number of variable strings and an always-the-same piece of text between those values, I would personally use string formatting. That way, your intention becomes a lot clearer and it’s easier to see what’s happening:
return string.Format("A{0}B{1}C{2}D{3}R{4}E",
GetCount(), TotalCount(), AmountLeft(), DaysLeft(), Initials());
Note that string.Format is slightly slower than string.Concat (which as others said is used when you build a string using +). However, it’s unlikely that string concatenation will be a bottleneck in your application, so you should favor clarity over micro-optimization until that becomes an actual problem.
No, in that case there is no reason to favour a StringBuilder over string concatenation.
Your first code will become a single call to String.Concat, which would be marginally more efficient than using a StringBuilder.

String Format with undefined number of characters c#

So I'm working on formatting a string and I need to line it up in a table, but this string has an undetermined number of characters. Is there any way to have the string be in the same spot for each column? So far I have:
ostring += "Notes\t\t"
+ " : "
+ employees[number].Notes
+ "\t\t"
+ employees[number].FirstNotes
+ "\t\t"
+ employees[number].SecondNotes;
I use a similar approach on the other rows, but they have a predetermined number of digits. This one doesn't, so I can't use the format specifiers I would like.
Any ideas on what I need to do?
You can use String.PadRight() to force the string to a specific size, rather than using tabs.
When you are using String.Format, a format item has the following syntax:
{ index[,alignment][ :formatString] }
Thus you can specify alignment which indicates the total length of the field into which the argument is inserted and whether it is right-aligned (a positive integer) or left-aligned (a negative integer).
Also it's better to use StringBuilder to build strings:
var builder = new StringBuilder();
var employee = employees[number];
builder.AppendFormat("Notes {0,20} {1,10} {2,15}",
employee.Notes, employee.FirstNotes, employee.SecondNotes);
You would first have to loop over every entry to find the largest one so you know how wide to make the columns, something like:
var notesWidth = employees.Max(e => e.Notes.Length);
var firstNotesWidth = employees.Max(e => e.FirstNotes.Length);
// etc...
Then you can pad the columns to the correct width:
var output = new StringBuilder();
foreach(var employee in employees)
{
output.Append(employee.Notes.PadRight(notesWidth+1));
output.Append(employee.FirstNotes.PadRight(firstNotesWidth+1));
// etc...
}
And please don't do a lot of string "adding" ("1" + "2" + "3" + ...) in a loop. Use a StringBuilder instead. It is much more efficient.
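The two ideas in these answers (measure the widest entry, then use the alignment component of a format item) can be combined. The sketch below is one possible way to do it; the NoteRow class and the sample values are hypothetical, and it assumes the list is non-empty (Max throws on an empty sequence) and contains no null strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical row type standing in for the asker's employee record.
class NoteRow
{
    public string Notes;
    public string FirstNotes;
}

static class TableDemo
{
    public static string Render(IList<NoteRow> rows)
    {
        // Width of the first column is the longest Notes value.
        int notesWidth = rows.Max(r => r.Notes.Length);

        // A negative alignment left-aligns the value and pads it on the right,
        // e.g. "{0,-11} {1}" for a width of 11.
        string format = "{0,-" + notesWidth + "} {1}";

        var sb = new StringBuilder();
        foreach (NoteRow row in rows)
        {
            sb.AppendFormat(format, row.Notes, row.FirstNotes).AppendLine();
        }
        return sb.ToString();
    }

    static void Main()
    {
        var rows = new List<NoteRow>
        {
            new NoteRow { Notes = "short", FirstNotes = "x" },
            new NoteRow { Notes = "much longer", FirstNotes = "y" }
        };
        // The second column starts at the same position on every line.
        Console.Write(Render(rows));
    }
}
```

Note that this only lines up in a monospaced font, which is the same limitation PadRight() has.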

How should I concatenate strings?

Are there differences between these examples? Which should I use in which case?
var str1 = "abc" + dynamicString + dynamicString2;
var str2 = String.Format("abc{0}{1}", dynamicString, dynamicString2);
var str3 = new StringBuilder("abc").
Append(dynamicString).
Append(dynamicString2).
ToString();
var str4 = String.Concat("abc", dynamicString, dynamicString2);
There are similar questions:
Difference in String concatenation which only asks about the + operator, and it's not even mentioned in the answer that it is converted to String.Concat
What's the best string concatenation method, which is not really related to my question: it asks for the best way, not for a comparison of the possible ways to concatenate a string and their outputs, as this question does.
This question is asking about what happens in each case, what will be the real output of those examples? What are the differences about them? Where should I use them in which case?
As long as you are not dealing with very many (100+) strings or with very large (Length > 10000) strings, the only criterion is readability.
For problems of this size, use the +. That + overload was added to the string class for readability.
Use string.Format() for more complicated compositions and when substitutions or formatting are required.
Use a StringBuilder when combining many pieces (hundreds or more) or very large pieces (length >> 1000). StringBuilder has no readability features, it's just there for performance.
Gathering information from all the answers it turns out to behave like this:
The + operator is the same as String.Concat; it can be used for small concatenations outside a loop.
At compile time, the + operator produces a single string if all its operands are constants, while an explicit String.Concat call stays a call even if its arguments are constants.
String.Format works like StringBuilder.AppendFormat (compare example 3), except that String.Format validates its parameters and instantiates its internal StringBuilder based on the length of the parameters.
String.Format should be used when format string is needed, and to concat simple strings.
StringBuilder should be used when you need to concatenate big strings or in a loop.
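On the "what will be the real output" part of the question: all four variants produce the same string; they differ only in how they get there. A small sketch to check this, with a hypothetical wrapper class and made-up inputs:

```csharp
using System;
using System.Text;

static class ConcatWays
{
    // Builds the same string in the four ways shown in the question.
    public static string[] BuildAll(string dynamicString, string dynamicString2)
    {
        string str1 = "abc" + dynamicString + dynamicString2;
        string str2 = string.Format("abc{0}{1}", dynamicString, dynamicString2);
        string str3 = new StringBuilder("abc")
            .Append(dynamicString)
            .Append(dynamicString2)
            .ToString();
        string str4 = string.Concat("abc", dynamicString, dynamicString2);
        return new[] { str1, str2, str3, str4 };
    }

    static void Main()
    {
        foreach (string s in BuildAll("X", "Y"))
        {
            Console.WriteLine(s); // abcXY each time
        }
    }
}
```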
Use the + operator in your scenario.
I would only use the String.Format() method when you have a mix of variable and static data to hold in your string. For example:
string result=String.Format(
"Today {0} scored {1} {2} and {3} points against {4}",..);
//looks nicer than
string result = "Today " + playerName + " scored " + goalCount + " " +
scoreType + " and " + pointCount + " against " + opposingTeam;
I don't see the point of using a StringBuilder, since you're only dealing with three strings.
I personally only use Concat when dealing with a String array.
My rule of thumb is to use String.Format if you are doing a relatively small amount of concatenation (<100) and StringBuilder for times where the concatenation is going to be large or is potentially going to be large. I use String.Join if I have an array and there isn't any formatting needed.
You can also use the Aggregate function in LINQ if you have an enumerable collection:
http://msdn.microsoft.com/en-us/library/bb548651.aspx
@Jerod Houghtelling's answer:
Actually, String.Format uses a StringBuilder behind the scenes (use Reflector on String.Format if you want to check).
I agree with the following answer in general.
@Xander, I believe you. However, my code shows StringBuilder is faster than string.Format.
Beat this:
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10000; i++)
{
string r = string.Format("ABC{0}{1}{2}", i, i-10,
"dasdkadlkdjakdljadlkjdlkadjalkdj");
}
sw.Stop();
Console.WriteLine("string.format: " + sw.ElapsedTicks);
sw.Reset();
sw.Start();
for (int i = 0; i < 10000; i++)
{
StringBuilder sb = new StringBuilder();
string r = sb.AppendFormat("ABC{0}{1}{2}", i, i - 10,
"dasdkadlkdjakdljadlkjdlkadjalkdj").ToString();
}
sw.Stop();
Console.WriteLine("AppendFormat: " + sw.ElapsedTicks);
It's important to understand that strings are immutable, they don't change. So ANY time that you change, add, modify, or whatever a string - it is going to create a new 'version' of the string in memory, then give the old version up for garbage collection. So something like this:
string output = firstName.ToUpper().ToLower() + "test";
This is going to create up to THREE new strings in memory: one for the output of ToUpper(), one for the output of ToLower(), and one for the concatenation with "test", which is the string that output ends up referring to.
So unless you use StringBuilder or string.Format, anything else you do is going to create extra instances of your string in memory. This is of course an issue inside of a loop where you could end up with hundreds or thousands of extra strings. Hope that helps
It is important to remember that strings do not behave like regular objects.
Take the following code:
string s3 = "Hello ";
s3 += "World";
This piece of code will create a new string on the heap and place "Hello " into it. Your string reference on the stack will then point to it (just like a regular object).
Line 2 will then create a second string on the heap, "Hello World", and point the reference on the stack to it. The initial heap allocation still stands until the garbage collector runs.
So....if you have a load of these calls before the garbage collector is called you could be wasting a lot of memory.
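A quick way to see this for yourself is to keep a second reference to the original string and append to the first. The names below are hypothetical; this is only a sketch of the behaviour described above.

```csharp
using System;

static class ImmutableDemo
{
    // Returns true if appending produced a new string object
    // and left the original untouched.
    public static bool AppendCreatesNewString()
    {
        string s1 = "Hello ";
        string s2 = s1;   // second reference to the same string object
        s1 += "World";    // s1 now refers to a newly created string

        return s2 == "Hello "
            && s1 == "Hello World"
            && !ReferenceEquals(s1, s2);
    }

    static void Main()
    {
        Console.WriteLine(AppendCreatesNewString()); // True
    }
}
```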
I see that nobody knows this method:
string Color = "red";
Console.WriteLine($"The apple is {Color}");
var str3 = new StringBuilder()
.AppendFormat("abc{0}{1}", dynamicString, dynamicString2).ToString();
In my experience the code above is the fastest, so use it if you want speed. Use anything else if you don't care.
