using PLINQ or Parallel with keeping original order - c#

I have a large collection of elements. I want to call ToString for each element and build one string.
My first approach was too slow:
string str = "";
list.ForEach( g => {
string s = g.ToString();
if(s != "")
str = str + g.ToString() + "\n";
});
I tried using the Parallel class and PLINQ as shown below but then the order of the elements in the final string was not like in the original.
Parallel
System.Threading.Tasks.Parallel.ForEach(list, g => {
string s = g.ToString();
if(s != "")
str = str + g.ToString() + "\n";
});
PLINQ
string str = "";
list.AsParallel().AsOrdered().ForAll( g => {
string s = g.ToString();
if(s != "")
str = str + g.ToString() + "\n";
});
How can I improve the performance and keep the original order?
Thanks

I think that trying to use parallelism is not the right solution here; using a better algorithm is.
Currently, your code is O(n²), because each concatenation creates a completely new string and so has to copy the whole previous string into the new one.
But you can do this in O(n) if you use the mutable StringBuilder instead of the immutable string. That way, you can just append at the end of the existing StringBuilder and you don't have to recopy everything built so far.
And you can also achieve the same performance with less code using a special method just for joining strings together: string.Join().
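As a rough sketch of both suggestions, the question's loop could look like the following. The class and method names (JoinDemo, WithStringBuilder, WithJoin) are made up for illustration. Note one behavioural difference: string.Join puts the separator between items, so it does not produce the trailing "\n" that the original loop does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class JoinDemo
{
    // Linear-time version of the question's loop: append to one mutable buffer.
    public static string WithStringBuilder<T>(IEnumerable<T> items)
    {
        var sb = new StringBuilder();
        foreach (T item in items)
        {
            string s = item == null ? "" : item.ToString();
            if (s != "")
                sb.Append(s).Append('\n');
        }
        return sb.ToString();
    }

    // string.Join variant: less code, separator only BETWEEN items (no trailing newline).
    public static string WithJoin<T>(IEnumerable<T> items)
    {
        return string.Join("\n",
            items.Select(i => i == null ? "" : i.ToString()).Where(s => s != ""));
    }

    public static void Main()
    {
        var list = new List<object> { 1, "", "two", 3 };
        Console.Write(WithStringBuilder(list));
        Console.WriteLine(WithJoin(list));
    }
}
```

Both versions walk the list once and keep the original order, because nothing runs in parallel.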
But there might be an even better solution: use a StreamWriter. For example, if you wanted to write the resulting string to a file, that would be a better solution, because the whole string doesn't need to be in memory at all.
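A minimal sketch of the streaming idea, assuming the final destination is a file. The helper name WriteItems and the temp-file path are hypothetical; the helper takes a TextWriter so the same code works with a StreamWriter or any other writer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamDemo
{
    // Writes each non-empty item followed by '\n' straight to the writer,
    // so the full result never has to exist as one string in memory.
    public static void WriteItems<T>(IEnumerable<T> items, TextWriter writer)
    {
        foreach (T item in items)
        {
            string s = item == null ? "" : item.ToString();
            if (s != "")
            {
                writer.Write(s);
                writer.Write('\n');
            }
        }
    }

    public static void Main()
    {
        // Hypothetical output location, chosen only for this example.
        string path = Path.Combine(Path.GetTempPath(), "items.txt");
        using (var writer = new StreamWriter(path))
        {
            WriteItems(new List<int> { 1, 2, 3 }, writer);
        }
        Console.WriteLine("Wrote " + path);
    }
}
```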
Parallelism isn't something that will magically solve all your performance problems. You need to think about what the threads will be doing, especially if you have some shared data (like str in your code). In your case, you could try to use parallelism, but it wouldn't be that simple and I'm not sure it would actually improve the performance.
It would work like this: each thread would get a range of indexes in the list and concatenate those. At the end, the main thread would then concatenate the results from all threads together. But there is no built-in method for that, so you would need to write all that code by yourself. (This would work, because string concatenation is associative.)
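The range-per-thread idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: ChunkedJoin, Build and chunkCount are invented names, and whether it beats the single-threaded StringBuilder version depends on how expensive ToString is, so it should be measured before use.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

public static class ChunkedJoin
{
    // Each chunk covers a contiguous index range and is built independently;
    // the partial strings are then concatenated in chunk order, which
    // preserves the original element order.
    public static string Build<T>(IList<T> items, int chunkCount)
    {
        if (chunkCount < 1) throw new ArgumentOutOfRangeException("chunkCount");

        var parts = new string[chunkCount];
        int chunkSize = (items.Count + chunkCount - 1) / chunkCount;

        Parallel.For(0, chunkCount, c =>
        {
            int start = c * chunkSize;
            int end = Math.Min(start + chunkSize, items.Count);
            var sb = new StringBuilder();   // private to this chunk, no shared state
            for (int i = start; i < end; i++)
            {
                string s = items[i] == null ? "" : items[i].ToString();
                if (s != "")
                    sb.Append(s).Append('\n');
            }
            parts[c] = sb.ToString();       // each chunk writes only its own slot
        });

        return string.Concat(parts);
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3, 4, 5 };
        Console.Write(Build(list, 2));
    }
}
```

The important design point is that no thread touches a shared str variable; the only shared structure is the parts array, and every chunk writes to a different index.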

Related

Should I use StringBuilder in C# if I have a large amount of concatenations even if the total length is small?

I am reviewing some code and I see a large amount of string concatenation, but they are all very small strings. Something like this:
public string BuildList()
{
return "A" + GetCount() + "B" + TotalCount() + "C" + AmountLeft()
+ "D" + DaysLeft() + "R" + Initials() + "E";
}
I am simplifying it, but in total the longest string is about 300 characters and there are about 20 + operators. I am trying to figure out if it's worth converting it to StringBuilder and doing something like this:
public string BuildList()
{
var sb = new StringBuilder();
sb.Append("A");
sb.Append(GetCount());
sb.Append("B");
sb.Append(TotalCount());
sb.Append("C");
sb.Append(AmountLeft());
sb.Append("D");
// etc . .
}
I can't see a big difference from testing but wanted to see if there is a good breakeven rule about using StringBuilder (either length of string or number of different concatenations)?
For such a fixed situation, with a limited number of variable strings and the same fixed text between those values every time, I would personally use string formatting. That way, your intention becomes a lot clearer and it's easier to see what's happening:
return string.Format("A{0}B{1}C{2}D{3}R{4}E",
GetCount(), TotalCount(), AmountLeft(), DaysLeft(), Initials());
Note that string.Format is slightly slower than string.Concat (which as others said is used when you build a string using +). However, it’s unlikely that string concatenation will be a bottleneck in your application, so you should favor clarity over micro-optimization until that becomes an actual problem.
No, in that case there is no reason to favour a StringBuilder over string concatenation.
Your first code will become a single call to String.Concat, which would be marginally more efficient than using a StringBuilder.
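To make the comparison concrete, here is a small sketch with the two styles side by side. The class, method and parameter names are placeholders standing in for the question's GetCount(), TotalCount() and so on; both methods return the same text, so the choice is mostly about readability.

```csharp
using System;

public static class SmallConcatDemo
{
    // A single + expression: the compiler turns this into one String.Concat call.
    public static string WithPlus(int count, int total, string initials)
    {
        return "A" + count + "B" + total + "R" + initials + "E";
    }

    // The same result with a format string, which makes the fixed layout visible.
    public static string WithFormat(int count, int total, string initials)
    {
        return string.Format("A{0}B{1}R{2}E", count, total, initials);
    }

    public static void Main()
    {
        Console.WriteLine(WithPlus(3, 10, "JS"));
        Console.WriteLine(WithFormat(3, 10, "JS"));
    }
}
```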

remove first element from array

PHP developer here working with c#.
I'm using a technique to remove a block of text from a large string by exploding the string into an array and then shifting the first element out of the array and turning what remains back into a string.
With PHP (an awesome & easy language) it was just
$array = explode('somestring',$string);
array_shift($array);
$newstring = implode(' ', $array);
and I'm done.
I get so mad at c# for not allowing me to create dynamic arrays and for not offering me default functions that can do the same thing as PHP regarding arrays. Instead of dynamic arrays I have to create lists and predefine key structures etc. But I'm new and I'm sure there are still equally graceful ways to do the same with c#.
Will someone show me a clean way to accomplish this goal with c#?
Rephrase of question: How can I remove the first element from an array using c# code.
Here is how far I've gotten, but RemoveAt throws a error while debugging so I don't believe it works:
//scoop-out feed header information
if (entry_start != "")
{
string[] parts = Regex.Split(this_string, @entry_start);
parts.RemoveAt(0);
this_string = String.Join(" ", parts);
}
I get so mad at c# for not allowing me to create dynamic arrays
You may take a look at the List<T> class. Its RemoveAt might be worth checking.
But for your particular scenario you could simply use LINQ and the Skip extension method (don't forget to add using System.Linq; to your file in order to bring it into scope):
if (entry_start != "")
{
string[] parts = Regex.Split(this_string, @entry_start).Skip(1).ToArray();
this_string = String.Join(" ", parts);
}
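For completeness, the List<T>.RemoveAt route mentioned above might look like this. It is only a sketch: RemoveFirstDemo and DropFirstPart are invented names, and the sample input is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RemoveFirstDemo
{
    // Copies the split result into a List<string>, which (unlike an array)
    // can shrink, then removes the first element and joins the rest.
    public static string DropFirstPart(string text, string pattern)
    {
        var parts = new List<string>(Regex.Split(text, pattern));
        if (parts.Count > 0)
            parts.RemoveAt(0);
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        // prints "one two"
        Console.WriteLine(DropFirstPart("header<entry>one<entry>two", "<entry>"));
    }
}
```

This is closest in spirit to PHP's array_shift, while Skip(1) avoids the intermediate list.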
C# is not designed to be quick and dirty, nor does it particularly specialize in text manipulation. Furthermore, the technique you use for removing a portion from the beginning of a string is crazy imho.
Why don't you just use String.Substring(int start, int length) coupled with String.IndexOf("your delimiter")?
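A possible shape for the Substring/IndexOf suggestion, as a sketch. The helper name AfterFirst is hypothetical, and returning the input unchanged when the delimiter is missing is an assumption, not something the question specifies. Unlike the split-and-join approach, later occurrences of the delimiter are left in place rather than replaced by spaces.

```csharp
using System;

public static class SubstringDemo
{
    // Returns everything after the first occurrence of the delimiter.
    public static string AfterFirst(string text, string delimiter)
    {
        int index = text.IndexOf(delimiter, StringComparison.Ordinal);
        if (index < 0)
            return text; // assumption: no delimiter means nothing to strip
        return text.Substring(index + delimiter.Length);
    }

    public static void Main()
    {
        // prints "one<entry>two"
        Console.WriteLine(AfterFirst("header<entry>one<entry>two", "<entry>"));
    }
}
```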
Here is the C# code corresponding to your PHP snippet:
string input = "a,b,c,d,e";
string[] splitvals = input.Split(',');
string output = String.Join(",", splitvals, 1, splitvals.Length-1);
MessageBox.Show(output);
You can use LINQ for this:
if (entry_start != "")
this_string = String.Join(" ", Regex.Split(this_string, @entry_start).Skip(1).ToArray());
string split = ",";
string str = "asd1,asd2,asd3,asd4,asd5";
string[] ary = str.Split(new string[] { split }, StringSplitOptions.RemoveEmptyEntries);
string newstr = string.Join(split, ary, 1, ary.Count() - 1);
This splits at ",", removes the first record, then joins the rest back together with ",".
As stated above, you can use LINQ. Skip(int) returns an IEnumerable<string> that you can then convert back to an array with ToArray().
string[] myArray = new string[]{"this", "is", "an", "array"};
myArray = myArray.Skip(1).ToArray();
You might be more comfortable with generic lists than arrays, which work more like PHP arrays.
List<T>
But if your goal is "to remove a block of text from a large string" then the easier way would be:
string Example = "somestring";
string BlockRemoved = Example.Substring(1);
// BlockRemoved = "omestring"
Edit
I misunderstood the question, thinking you were just removing the first element from the array where the array consisted of the characters that make up the string.
To split a string by a delimiter, look at the String.Split method instead. Some good examples are given here.

How does the C# compiler work with a split?

I have a List<string> that I am iterating through, splitting each item and then adding it to a StringBuilder.
foreach(string part in List)
{
StringBuilder.Append(part.Split(':')[1] + " ");
}
So my question is how many strings are created by doing this split? All of the splits are going to produce two items. So... I was thinking that it will create a string[2] and then an empty string. But, does it then create the concatenation of the string[1] + " " and then add it to the StringBuilder or is this optimized?
The code is actually equivalent to this:
foreach(string part in myList)
{
sb.Append(string.Concat(part.Split(':')[1], " "));
}
So yes, an additional string, representing the concatenation of the second part of the split and the single-space string, will be created.
Including the original string, you also have the two created by the call to Split(), and a reference to the literal string " ", which will be loaded from the assembly metadata.
You can save yourself the call to Concat() by just appending the split result and the space sequentially:
sb.Append(part.Split(':')[1]).Append(" ");
Note that if you are only using string literals, then the compiler will make one optimization for you:
sb.Append("This is " + "one string");
is actually compiled to
sb.Append("This is one string");
3 extra strings for every item
part[0];
part[1];
part[1] + " "
The fewest allocations possible would come from avoiding all the temporary strings completely, but the usual micro-optimization caveats apply.
var start = part.IndexOf(':') + 1;
stringbuilder.Append(part, start, part.Length-start).Append(' ');
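A quick sketch to check that the Split-free version produces the same text as the Split version. SplitDemo, ViaSplit and ViaIndexOf are illustrative names, and the comparison assumes, as in the question, that every item contains exactly one ':'.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SplitDemo
{
    // Allocates an array plus two strings per item via Split.
    public static string ViaSplit(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        foreach (string part in items)
            sb.Append(part.Split(':')[1]).Append(' ');
        return sb.ToString();
    }

    // Copies the characters after ':' directly into the builder,
    // with no temporary strings per item.
    public static string ViaIndexOf(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        foreach (string part in items)
        {
            int start = part.IndexOf(':') + 1;
            sb.Append(part, start, part.Length - start).Append(' ');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var list = new List<string> { "a:1", "b:22" };
        Console.WriteLine(ViaSplit(list) == ViaIndexOf(list));
    }
}
```

If an item could contain more than one ':', the two versions diverge: Split(':')[1] stops at the second colon, while the IndexOf version keeps everything after the first.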
You have the original string 'split' - 1 string
You have the 'split' split into two - 2 strings
You have the two parts of split joined - 1 string
The string builder does not create a new string.
The current code uses 4 strings, including the original.
If you want to save one string do:
StringBuilder.Append(part.Split(':')[1]);
StringBuilder.Append(" ");
This code:
foreach(string part in List)
{
StringBuilder.Append(part.Split(':')[1] + " ");
}
Is equivalent to:
foreach(string part in List)
{
string tmp = string.Concat(part.Split(':')[1], " ");
StringBuilder.Append(tmp);
}
So yes, it's creating a string needlessly. This would be better, at least in terms of the number of strings generated:
foreach(string part in List)
{
StringBuilder.Append(part.Split(':')[1])
.Append(" ");
}
So for each value in the list (n, known as part in your code) you are allocating:
x (I assume 2) strings for the split.
n strings for the concatenation.
Roughly n + 1 strings for the StringBuilder; probably far fewer though.
So you have nx + n + n + 1 at the end, and assuming the split always results in two values 4n + 1.
One way to improve this would be:
foreach(string part in List)
{
var val = part.Split(':')[1];
StringBuilder.EnsureCapacity(StringBuilder.Length + val.Length + 1);
StringBuilder.Append(val);
StringBuilder.Append(' ');
}
This makes it 3n + 1. It is a rough estimate as StringBuilder allocates strings as it runs out of space - but if you EnsureCapacity you will prevent it from getting it wrong.
Probably the only way to be sure how this is compiled is to build it and decompile it again with Reflector to see how it's handled internally. Anyway, keep in mind that it probably has no impact on the performance of the app as a whole.

What would be faster in this context String.Format or String.Replace?

string str = "my {0} long string {1} need formatting";
Should I be doing the following,
str = string.Format(str, "really", "doesn't");
or creating a method like so and calling str = str.ReplaceWithValues("really", "doesn't");
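// Extension methods must be static and declared in a static class.
public static string ReplaceWithValues(this string str, params object[] values) {
string ret = str;
for (int i = 0; i < values.Length; i++) {
// Replace in the running result, using the i-th value.
ret = ret.Replace(string.Concat("{", i, "}"), values[i].ToString());
}
return ret;
}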
It seems like StringBuilder.AppendFormat() isn't efficient when it comes to doing simple replacements like this since it goes character by character through the string.
Why do you want to reinvent String.Format?
I'd just use the framework method - it does exactly what you want, is robust, and is going to make sense to those that follow...
Just to satisfy your curiosity:
It seems like StringBuilder.AppendFormat() isn't efficient when it comes to doing simple replacements like this since it goes character by character through the string.
String.Format, FYI, uses StringBuilder.AppendFormat internally. That being said, StringBuilder.AppendFormat is quite efficient. You mention that it goes "character by character" through the string; however, in your case, you're using multiple calls to Replace and Concat instead. A single pass through the string with a StringBuilder is likely to be much quicker. If you really need to know, you could profile this to check. On my machine, I get the following timings if I run both of the above 1,000,000 times:
String.Format - 1029464 ticks
Custom method - 2988568 ticks
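If you want to repeat that kind of measurement yourself, a rough sketch could look like this. It is not the code that produced the numbers above; the class name, the loop count and the corrected ReplaceWithValues helper are assumptions, and the ticks you get will depend on your machine, runtime and build configuration.

```csharp
using System;
using System.Diagnostics;

public static class FormatVsReplace
{
    // A working variant of the question's helper: one Replace call per placeholder.
    public static string ReplaceWithValues(this string str, params object[] values)
    {
        string ret = str;
        for (int i = 0; i < values.Length; i++)
            ret = ret.Replace("{" + i + "}", Convert.ToString(values[i]));
        return ret;
    }

    public static void Main()
    {
        const string template = "my {0} long string {1} need formatting";
        const int iterations = 1000000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            string.Format(template, "really", "doesn't");
        sw.Stop();
        Console.WriteLine("String.Format - " + sw.ElapsedTicks + " ticks");

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            template.ReplaceWithValues("really", "doesn't");
        sw.Stop();
        Console.WriteLine("Custom method - " + sw.ElapsedTicks + " ticks");
    }
}
```

Run it in release mode outside the debugger, otherwise the timings are not representative.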
The custom procedure will increase its cost with each additional placeholder and produce throwaway strings for the garbage collector with each intermediate call to Replace.
Besides the likelihood that string.Format is much faster than multiple calls to Replace, string.Format also has overloads for culture-sensitive formatting.
The flexibility and intuitiveness of string.Format is at least as compelling as the speed.
If all you want is to concatenate some strings, why not just do that?
string result = "my " + x + " long string " + y + " need formatting";
or
string result = string.Concat("my ", x, " long string ", y, " need formatting");
In C# the + operator actually turns into a string.Concat() call, and I always thought String.Format("Hello: {0} {1}", "Bob", "Jones") was the fastest. It turns out, after running a sample outside the debugger in release mode, that "Bob" + "Jones" is much faster. There is a cutoff point though: I believe that at around 8 concatenations or so, string.Format becomes faster.

How should I concatenate strings?

Are there differences between these examples? Which should I use in which case?
var str1 = "abc" + dynamicString + dynamicString2;
var str2 = String.Format("abc{0}{1}", dynamicString, dynamicString2);
var str3 = new StringBuilder("abc").
Append(dynamicString).
Append(dynamicString2).
ToString();
var str4 = String.Concat("abc", dynamicString, dynamicString2);
There are similar questions:
Difference in String concatenation which only asks about the + operator, and it's not even mentioned in the answer that it is converted to String.Concat
What's the best string concatenation method which is not really related to my question, where it is asking for the best, and not a comparation of the possible ways to concatenate a string and their outputs, as this question does.
This question is asking about what happens in each case, what will be the real output of those examples? What are the differences about them? Where should I use them in which case?
As long as you are not dealing with very many (100+) strings or with very large (Length > 10000) strings, the only criterion is readability.
For problems of this size, use the +. That + overload was added to the string class for readability.
Use string.Format() for more complicated compositions and when substitutions or formatting are required.
Use a StringBuilder when combining many pieces (hundreds or more) or very large pieces (length >> 1000). StringBuilder has no readability features, it's just there for performance.
Gathering information from all the answers it turns out to behave like this:
The + operator is the same as String.Concat; it is fine for small concatenations outside a loop and for small tasks.
At compile time, the + operator produces a single string when all operands are constants, whereas an explicit String.Concat call is left as a call even when its arguments are constants.
String.Format is the same as using a StringBuilder (example 3), except that String.Format validates its parameters and instantiates the internal StringBuilder with a capacity based on the length of the parameters.
String.Format should be used when a format string is needed, and for concatenating simple strings.
StringBuilder should be used when you need to concatenate big strings or concatenate in a loop.
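As a small illustration of the summary above, the four forms from the question all yield the same text; they differ only in how they get there. ConcatWays and All are made-up names for this sketch.

```csharp
using System;
using System.Text;

public static class ConcatWays
{
    // Returns the result of each of the question's four approaches, in order.
    public static string[] All(string a, string b)
    {
        return new[]
        {
            "abc" + a + b,                                           // compiled to String.Concat
            string.Format("abc{0}{1}", a, b),                        // parses the format string
            new StringBuilder("abc").Append(a).Append(b).ToString(), // mutable buffer
            string.Concat("abc", a, b)                               // explicit Concat call
        };
    }

    public static void Main()
    {
        foreach (string s in All("x", "y"))
            Console.WriteLine(s); // prints "abcxy" four times
    }
}
```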
Use the + operator in your scenario.
I would only use the String.Format() method when you have a mix of variable and static data to hold in your string. For example:
string result=String.Format(
"Today {0} scored {1} {2} and {3} points against {4}",..);
//looks nicer than
string result = "Today " + playerName + " scored " + goalCount + " " +
scoreType + " and " + pointCount + " against " + opposingTeam;
I don't see the point of using a StringBuilder, since you're already dealing with three string literals.
I personally only use Concat when dealing with a String array.
My rule of thumb is to use String.Format if you are doing a relatively small amount of concatenation (<100) and StringBuilder when the amount of concatenation is, or could become, large. I use String.Join if I have an array and there isn't any formatting needed.
You can also use the Aggregate function in LINQ if you have an enumerable collection:
http://msdn.microsoft.com/en-us/library/bb548651.aspx
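A minimal sketch of the Aggregate idea, using a StringBuilder as the accumulator. The class and method names are invented for the example. Seeding with a StringBuilder is a deliberate choice here: the simpler parts.Aggregate((a, b) => a + b) also works, but it creates a new string at every step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class AggregateDemo
{
    // Folds the sequence into a single StringBuilder, then materializes once.
    public static string Combine(IEnumerable<string> parts)
    {
        return parts
            .Aggregate(new StringBuilder(), (sb, p) => sb.Append(p))
            .ToString();
    }

    public static void Main()
    {
        // prints "abcdef"
        Console.WriteLine(Combine(new[] { "abc", "de", "f" }));
    }
}
```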
@Jerod Houghtelling's answer
Actually String.Format uses a StringBuilder behind the scenes (use Reflector on String.Format if you want)
I agree with the following answer in general
@Xander. I believe you man. However my code shows sb is faster than string.format.
Beat this:
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10000; i++)
{
string r = string.Format("ABC{0}{1}{2}", i, i-10,
"dasdkadlkdjakdljadlkjdlkadjalkdj");
}
sw.Stop();
Console.WriteLine("string.format: " + sw.ElapsedTicks);
sw.Reset();
sw.Start();
for (int i = 0; i < 10000; i++)
{
StringBuilder sb = new StringBuilder();
string r = sb.AppendFormat("ABC{0}{1}{2}", i, i - 10,
"dasdkadlkdjakdljadlkjdlkadjalkdj").ToString();
}
sw.Stop();
Console.WriteLine("AppendFormat: " + sw.ElapsedTicks);
It's important to understand that strings are immutable, they don't change. So ANY time that you change, add, modify, or whatever a string - it is going to create a new 'version' of the string in memory, then give the old version up for garbage collection. So something like this:
string output = firstName.ToUpper().ToLower() + "test";
This is going to create a string (for output), then create THREE other strings in memory (one for: ToUpper(), ToLower()'s output, and then one for the concatenation of "test").
So unless you use StringBuilder or string.Format, anything else you do is going to create extra instances of your string in memory. This is of course an issue inside of a loop where you could end up with hundreds or thousands of extra strings. Hope that helps
It is important to remember that strings do not behave like regular objects.
Take the following code:
string s3 = "Hello ";
s3 += "World";
This piece of code will create a new string on the heap and place "Hello " into it. Your string variable on the stack will then point to it (just like a regular object).
Line 2 will then create a second string on the heap, "Hello World", and point the variable on the stack to it. The initial heap allocation still stands until the garbage collector runs.
So....if you have a load of these calls before the garbage collector is called you could be wasting a lot of memory.
I see that nobody has mentioned this method:
string Color = "red";
Console.WriteLine($"The apple is {Color}");
var str3 = new StringBuilder()
.AppendFormat("abc{0}{1}", dynamicString, dynamicString2).ToString();
The code above is the fastest, so use it if you want speed; use anything else if you don't care.
