string.IndexOf performance


This simple piece of C# code, which is meant to find script blocks in HTML, takes 0.5 seconds to run on a 74K-char string with only 9 script blocks in it. This is a non-debug release binary on a 2.8 GHz i7 CPU. I made several runs through this code to make sure that performance is not impeded by the JIT. It is not.
This is VS2010, .NET 4.0 Client Profile, x64.
Why is this so slow?
int[] _exclStart = new int[100];
int[] _exclStop = new int[100];
int _excl = 0;
for (int f = input.IndexOf("<script", 0); f != -1; )
{
_exclStart[_excl] = f;
f = input.IndexOf("</script", f + 8);
if (f == -1)
{
_exclStop[_excl] = input.Length;
break;
}
_exclStop[_excl] = f;
f = input.IndexOf("<script", f + 8);
++_excl;
}

I used the source of this page as an example, then duplicated the content 8 times, resulting in a page some 334,312 bytes long. Using StringComparison.Ordinal yields a massive performance difference.
string newInput = string.Format("{0}{0}{0}{0}{0}{0}{0}{0}", input.Trim().ToLower());
//string newInput = input.Trim().ToLower();
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
int[] _exclStart = new int[100];
int[] _exclStop = new int[100];
int _excl = 0;
for (int f = newInput.IndexOf("<script", 0, StringComparison.Ordinal); f != -1; )
{
_exclStart[_excl] = f;
f = newInput.IndexOf("</script", f + 8, StringComparison.Ordinal);
if (f == -1)
{
_exclStop[_excl] = newInput.Length;
break;
}
_exclStop[_excl] = f;
f = newInput.IndexOf("<script", f + 8, StringComparison.Ordinal);
++_excl;
}
sw.Stop();
Console.WriteLine(sw.Elapsed.TotalMilliseconds);
Running it 5 times yields almost the same result each time (the loop timings did not change significantly, so for this simple code almost no time is spent on JIT compilation).
Output using your original code (in Milliseconds):
10.2786
11.4671
11.1066
10.6537
10.0723
Output using the above code instead (in Milliseconds):
0.3055
0.2953
0.2972
0.3112
0.3347
Notice that my test results are around 0.010 seconds (original code) and 0.0003 seconds (ordinal code). That means something other than this code is wrong on your side.
If, as you say, using StringComparison.Ordinal does nothing for your performance, then either you're using incorrect timers to measure performance, or you have a large overhead in obtaining your input value (such as re-reading it from a stream) that you don't realise.
Tested under Windows 7 x64 running on a 3GHz i5 using .NET 4 Client Profile.
Suggestions:
Use StringComparison.Ordinal.
Make sure you're using System.Diagnostics.Stopwatch to time performance.
Declare a local variable for the input instead of using values external to the function (e.g. string newInput = input.Trim().ToLower();).
Again, I stress: I am getting a 50 times faster speed on test data that is apparently over 4 times larger, using the exact same code you provided. That means my test is running some 200 times faster than yours, which is not something anyone would expect given we're running the same environment, just i5 (me) versus i7 (you).
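The three suggestions above can be combined into one helper. This is a sketch that assumes you want a growable List of (start, stop) pairs instead of the question's fixed 100-element arrays, which would throw an IndexOutOfRangeException on a 101st block. The ScriptBlockFinder and Find names are illustrative, not from the original post.

```csharp
using System;
using System.Collections.Generic;

public static class ScriptBlockFinder
{
    private const string OpenTag = "<script";
    private const string CloseTag = "</script";

    // Returns (start, stop) pairs: start is the index of "<script", stop is the
    // index of the following "</script", or input.Length if there is none.
    // StringComparison.Ordinal avoids the culture-sensitive matching used by
    // the IndexOf(string, int) overload in the question.
    public static List<(int Start, int Stop)> Find(string input)
    {
        var blocks = new List<(int Start, int Stop)>();
        int f = input.IndexOf(OpenTag, 0, StringComparison.Ordinal);
        while (f != -1)
        {
            int start = f;
            // Search for the closing tag after the opening tag.
            f = input.IndexOf(CloseTag, f + OpenTag.Length, StringComparison.Ordinal);
            if (f == -1)
            {
                blocks.Add((start, input.Length)); // unterminated block runs to the end
                break;
            }
            blocks.Add((start, f));
            // Resume the search for the next opening tag after the closing tag.
            f = input.IndexOf(OpenTag, f + CloseTag.Length, StringComparison.Ordinal);
        }
        return blocks;
    }
}
```

Wrap a call to ScriptBlockFinder.Find(newInput) in a Stopwatch, as in the test code above, to time it.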

The IndexOf overload you're using is culture-sensitive, which will affect performance. Instead, use:
input.IndexOf("<script", 0, StringComparison.Ordinal);

I would recommend using Regex for this; it offers a significant performance improvement because the expressions are compiled only once. IndexOf, by contrast, is essentially a loop that runs on a per-character basis, which probably means you have 3 "loops" within your main for loop. Of course, IndexOf won't be as slow as a hand-written loop, but the time still increases as the input grows. Regex has built-in functions that return the number and positions of occurrences of each pattern you define.
Edit: this might shed some more light on the performance of IndexOf: IndexOf Perf
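A minimal Regex-based sketch of the same search, assuming a block runs from "<script" to the next "</script" (or to the end of the input), like the IndexOf loop in the question. The pattern and the RegexScriptFinder name are illustrative, and whether this actually beats an ordinal IndexOf loop on your input is worth measuring rather than assuming.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexScriptFinder
{
    // Constructed once and cached in a static field.
    // "<script", then as few characters as possible until the lookahead sees
    // "</script" or the very end of the input (\z). Singleline lets '.' match
    // newlines, since script blocks usually span several lines.
    private static readonly Regex ScriptBlock = new Regex(
        @"<script.*?(?=</script|\z)",
        RegexOptions.Singleline);

    // Returns (start, stop) pairs in the same shape as the IndexOf loop:
    // stop is the index of "</script", or input.Length if there is none.
    public static List<(int Start, int Stop)> Find(string input)
    {
        var blocks = new List<(int Start, int Stop)>();
        foreach (Match m in ScriptBlock.Matches(input))
        {
            blocks.Add((m.Index, m.Index + m.Length));
        }
        return blocks;
    }
}
```

Matches returns all occurrences in one call, so the number of blocks is simply the Count of the returned list.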

I just tested IndexOf performance with .NET 4.0 on Windows 7:
public void Test()
{
var input = "Hello world, I'm ekk. This is test string";
TestStringIndexOfPerformance(input, StringComparison.CurrentCulture);
TestStringIndexOfPerformance(input, StringComparison.InvariantCulture);
TestStringIndexOfPerformance(input, StringComparison.Ordinal);
Console.ReadLine();
}
private static void TestStringIndexOfPerformance(string input, StringComparison stringComparison)
{
var count = 0;
var startTime = DateTime.UtcNow;
TimeSpan result;
for (var index = 0; index != 1000000; index++)
{
count = input.IndexOf("<script", 0, stringComparison);
}
result = DateTime.UtcNow.Subtract(startTime);
Console.WriteLine("{0}: {1}", stringComparison, count);
Console.WriteLine("Total time: {0}", result.TotalMilliseconds);
Console.WriteLine("--------------------------------");
}
And the results (total time in milliseconds for 1,000,000 calls each) are:
CurrentCulture:
Total time: 225.4008
InvariantCulture:
Total time: 187.2003
Ordinal:
Total time: 124.8003
As you can see, Ordinal performs a little better.
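The same comparison can be timed with System.Diagnostics.Stopwatch, which is intended for measuring elapsed time, instead of subtracting DateTime.UtcNow values. This is a sketch; the IndexOfBenchmark name and the untimed warm-up call are additions, not part of the original test.

```csharp
using System;
using System.Diagnostics;

public static class IndexOfBenchmark
{
    // Times `iterations` calls of IndexOf with the given comparison.
    // lastResult returns the value IndexOf produced, so the calls have an
    // observable effect and the result can be printed like in the test above.
    public static TimeSpan Time(string input, string value, StringComparison comparison,
                                int iterations, out int lastResult)
    {
        // Warm-up call outside the timed region (JIT, culture data loading).
        lastResult = input.IndexOf(value, 0, comparison);

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            lastResult = input.IndexOf(value, 0, comparison);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For example, calling IndexOfBenchmark.Time(input, "<script", comparison, 1000000, out _) for each of the three StringComparison values repeats the test above with a better timer.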

I won't discuss the code itself here (it could probably be written with Regex and so on), but in my opinion it is slow because the IndexOf() *inside* the for loop always rescans the string from the beginning (it always starts from index 0). Try scanning from the last occurrence found instead.

Related

How come for loops in C# are so slow when concatenating strings? [duplicate]

I wrote a program that runs a simple for loop in both C++ and C#, yet the same thing takes dramatically longer in C#, why is that? Did I fail to account for something in my test?
C# (13.95s)
static double timeStamp() {
return (double)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
}
static void Main(string[] args) {
double timeStart = timeStamp();
string f = "";
for(int i=0; i<100000; i++) {
f += "Sample";
}
double timeEnd = timeStamp();
double timeDelta = timeEnd - timeStart;
Console.WriteLine(timeDelta.ToString());
Console.Read();
}
C++ (0.20s)
long int timeStampMS() {
milliseconds ms = duration_cast<milliseconds> (system_clock::now().time_since_epoch());
return ms.count();
}
int main() {
long int timeBegin = timeStampMS();
string test = "";
for (int i = 0; i < 100000; i++) {
test += "Sample";
}
long int timeEnd = timeStampMS();
long double delta = timeEnd - timeBegin;
cout << to_string(delta) << endl;
cin.get();
}

On my PC, changing the code to use StringBuilder and converting to a String at the end, the execution time went from 26.15 seconds to 0.0012 seconds, or over 20,000 times faster.
var fb = new StringBuilder();
for (int i = 0; i < 100000; ++i) {
fb.Append("Sample");
}
var f = fb.ToString();
As explained in the .NET documentation, the StringBuilder class is a mutable string object that is useful when you are making many changes to a string, as opposed to the String class, which is an immutable object that requires a new object to be created every time you, e.g., concatenate two Strings together. Because StringBuilder is implemented as a linked list of character arrays, and new blocks of up to 8000 characters are added at a time, StringBuilder.Append is much faster.

The C++ loop may be fast because it doesn't actually need to do anything. A good optimizer can prove that removing the entire loop makes no observable difference in the behaviour of the program (execution time doesn't count as observable). I don't know if the C# runtime is allowed to do a similar optimization. In any case, to guarantee sensible measurements, you must always use the result in a way that is observable.
Assuming the optimizer didn't remove the loop, appending a constant-length string to a std::string has amortized constant complexity. Strings in C# are immutable, so the operation creates a new copy of the string every time, and so it has linear complexity. The longer the string becomes, the more significant this difference in asymptotic complexity becomes. You can achieve the same asymptotic complexity by using the mutable StringBuilder in C#.
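Rough arithmetic for the question's loop, assuming each f += "Sample" allocates a new string and copies the entire result, so the i-th of the n = 100,000 iterations copies 6i characters:

```latex
\sum_{i=1}^{n} 6i = 3n(n+1) = 3 \cdot 100\,000 \cdot 100\,001 = 30\,000\,300\,000 \approx 3 \times 10^{10} \text{ characters}
```

At 2 bytes per char that is roughly 60 GB of copying, whereas a mutable buffer only has to hold the final 6n = 600,000 characters.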

Since strings are immutable, each concatenation creates a new string.
The used strings are left for dead, awaiting garbage collection.
A StringBuilder is instantiated once, and new chunks of data are added when needed, expanding its capacity via MakeRoom (.NET source).
Test it using a StringBuilder:
string stringToAppend = "Sample";
int iteratorMaxValue = 100000;
StringBuilder sb = new StringBuilder(stringToAppend.Length * iteratorMaxValue);
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < iteratorMaxValue; i++) {
sb.Append(stringToAppend);
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
4 milliseconds on my machine.

How to assert uniqueness of a huge collection of strings?

Let's say I have an algorithm which takes an unsigned 64-bit integer as input and yields a string as a result. The string's alphabet is limited to [a-z, A-Z, 0-9] and its maximum length is 16. So that's 62^16, or 47,672,401,706,823,533,450,263,330,816 possible results.
I would like to assert the uniqueness of the algorithm's output. Read: I want to verify there are no collisions.
Is there an easy/quick 'n dirty way to do this, without having to fall back to (e.g.) some kind of database?
[EDIT]
Some clarification: the concerns raised in the comments are legitimate, but no worries, I wasn't really planning on iterating over all possible combinations; my lifespan will probably be under a century ;) Nor did I write my own algorithm to generate unique IDs. I just saw this and started wondering how one would go about asserting uniqueness for algorithms with very large result sets that can't be handled in memory.
[/EDIT]

As said in the comments, it would take a very long time to compute every possible entry, but just for fun, here is a try:
// Needs: using System; using System.Collections.Generic; using System.Diagnostics; using System.IO;
var workspace = new DirectoryInfo("MyWorkspace");
if (workspace.Exists)
{
workspace.Delete(true); // recursive: the folder may still hold files from a previous run
}
workspace.Create();
var limit = 23997907; // number of results kept in memory before flushing them to a file
var buffer = new HashSet<string>();
ulong i = 0;
int j = 0;
var stopWatch = Stopwatch.StartNew();
// i <= ulong.MaxValue is always true; in an unchecked context i++ wraps to 0 after MaxValue
while (i <= ulong.MaxValue)
{
var result = YourSuperAlgorythm(i);
// Check the result with current results
if (buffer.Contains(result))
{
throw new Exception("Failure !");
}
// Check the result with older results (re-reads every file for every result)
foreach (var file in workspace.GetFiles())
{
var content = new HashSet<string>(File.ReadAllText(file.FullName).Split(';'));
if (content.Contains(result))
{
throw new Exception("Failure !");
}
}
buffer.Add(result); // HashSet<T> has no indexer, so add instead of buffer[j] = result
i++;
j++;
if (j == limit)
{
stopWatch.Stop();
Console.WriteLine("Resetting. This loop takes " + stopWatch.Elapsed.TotalMilliseconds + "ms");
j = 0;
var file = Path.GetRandomFileName();
File.WriteAllText(Path.Combine(workspace.FullName, file), String.Join(";", buffer));
buffer = new HashSet<string>();
stopWatch.Restart();
}
}
You could probably optimize it, but you won't have enough of a lifetime to check the results. For now, it has not even created a file to store the first set of entries :D. I will edit this post when one loop is done!
Your only option is to prove your algorithm mathematically. Good luck with that...
EDIT1: for my test, I use this function:
private static string YourSuperAlgorythm(ulong i)
{
return i.ToString("x");
}
EDIT2: One loop takes 1477221.4261ms (~25min). And then the String.Join(";", buffer) line failed (OutOfMemory). So 23997907 is not the max value for my try. It must be decreased!
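For the test function in EDIT1, the mathematical route is short: i.ToString("x") has a left inverse (parsing the hex string back), and a function with a left inverse cannot map two different inputs to the same output. The sketch below only spot-checks that round trip to illustrate the argument; it does not replace it. The InjectivityCheck class and its Inverse and RoundTrips methods are hypothetical names.

```csharp
using System.Globalization;

public static class InjectivityCheck
{
    // The test function from EDIT1: lowercase hexadecimal, no leading zeros.
    public static string YourSuperAlgorythm(ulong i) => i.ToString("x");

    // Hypothetical left inverse: parses the hex string back into a ulong.
    public static ulong Inverse(string s) =>
        ulong.Parse(s, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);

    // Spot-checks Inverse(YourSuperAlgorythm(i)) == i for the given samples.
    public static bool RoundTrips(params ulong[] samples)
    {
        foreach (var i in samples)
        {
            if (Inverse(YourSuperAlgorythm(i)) != i)
            {
                return false;
            }
        }
        return true;
    }
}
```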

Optimization of algorithm that convert 8-bit number to char

I need to convert an 8-bit number such as 00001110 to a char. The problem is easy, so I wrote the code and everything is working fine, but now I need to optimize it for speed as much as possible.
In the test class:
class Program
{
static void Main(string[] args)
{
Random r = new Random();
int[] testTab = new int[8];
Normal n = new Normal();
long time;
Stopwatch watch = new Stopwatch();
watch.Start();
for (int i = 0; i < 9000; i++)
{
for (int j = 0; j < 8; j++)
{
testTab[j] = r.Next(2);
}
n.SetTable(testTab);
n.Decode();
}
watch.Stop();
time = watch.ElapsedTicks;
Console.WriteLine(time);
time = watch.ElapsedMilliseconds;
Console.WriteLine(time);
Console.ReadKey();
}
}
and the class with the algorithm:
class Normal
{
private int[] _tab = new int[8];
public void SetTable(int[] tab)
{
_tab = tab;
}
public void Decode()
{
char a = ((char)( _tab[0]*1 + _tab[1]*2 + _tab[2]*4 + _tab[3]*8 + _tab[4]*16 + _tab[5]*32 +
_tab[6]*64 + _tab[7]*128)); // bit 7 weighs 128 (2^7)
}
}
For 9000 iterations I get 2 ms. That is not a long time (for 9000 iterations), but I have a good processor in my PC.
The final code will run on a smartphone, which has no powerful CPU. In my test I use random data; in the final version I will load the data from the camera (so it will take longer) and try to repeat this operation 10 times per second, which is why I need the best time even for the smallest operations.
Is there a faster way to convert byte to char than this?
char a = ((char)( _tab[0]*1 + _tab[1]*2 + _tab[2]*4 + _tab[3]*8 + _tab[4]*16 + _tab[5]*32 + _tab[6]*64 + _tab[7]*128));

tl;dr Your conversion code is already efficient, and is not your bottleneck.
Your benchmarking is flawed. You are not just timing the conversion of binary stored in int[] to integer value. You are also timing the generation of your random data. I expect that the majority of the time is spent generating the random data.
Re-write your benchmarking program to operate on data prepared before you start timing. Make sure that the duration of the test is at least 5 or 10 seconds so that you can generate meaningful answers. If you only run for two milliseconds then the granularity of your timer affects the quality of your results.
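A sketch of such a benchmark, assuming the question's bit order (tab[0] is the least significant bit) and using 128 as the weight of the top bit. All random input is generated before the Stopwatch starts, and the decoded values are summed into a checksum so the work has an observable effect. DecodeBenchmark, Run and its parameters are illustrative names.

```csharp
using System;
using System.Diagnostics;

public static class DecodeBenchmark
{
    // Same weighting as the question's Decode(): element j has weight 2^j.
    public static char Decode(int[] tab) =>
        (char)(tab[0] * 1 + tab[1] * 2 + tab[2] * 4 + tab[3] * 8 +
               tab[4] * 16 + tab[5] * 32 + tab[6] * 64 + tab[7] * 128);

    public static TimeSpan Run(int samples, int repetitions, out long checksum)
    {
        // Prepare all input up front so random generation is not timed.
        var rng = new Random(42);
        var data = new int[samples][];
        for (int s = 0; s < samples; s++)
        {
            data[s] = new int[8];
            for (int j = 0; j < 8; j++)
            {
                data[s][j] = rng.Next(2);
            }
        }

        checksum = 0;
        var watch = Stopwatch.StartNew();
        for (int r = 0; r < repetitions; r++)
        {
            foreach (var tab in data)
            {
                checksum += Decode(tab); // consume the result so the work is observable
            }
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For example, DecodeBenchmark.Run(9000, 10000, out var sum) decodes 90 million arrays; increase the repetitions until a run lasts several seconds.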
Bear in mind that in your real application you will be taking a camera picture of a QR code and decoding it. The cost of that is many orders of magnitude greater than the cost of converting the 8-bit int arrays.
Your code to do that conversion is already efficient. Do not seek to optimize it further. Not only is there no need to optimize it, there is little hope for significant gains. For the sake of clarity and conciseness you may well opt to use one of the .NET library methods that perform such a conversion, but performance of this part of your program is not an issue.
As an aside, it looks like you need to be converting the 8 bit value to byte, adding these values to a byte array, and then feeding to Encoding.GetString to obtain your text. A cast to UTF-16 char as per your code is not correct.
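A sketch of that approach, assuming the bits arrive least-significant-first in groups of 8 (as in the question's Decode) and that the payload is UTF-8 text; substitute the Encoding your data actually uses. BitDecoder, PackBits and DecodeText are hypothetical names.

```csharp
using System;
using System.Text;

public static class BitDecoder
{
    // Packs groups of 8 bits (0/1 ints, least significant bit first) into bytes.
    public static byte[] PackBits(int[] bits)
    {
        if (bits.Length % 8 != 0)
        {
            throw new ArgumentException("Bit count must be a multiple of 8.", nameof(bits));
        }
        var bytes = new byte[bits.Length / 8];
        for (int b = 0; b < bytes.Length; b++)
        {
            int value = 0;
            for (int j = 0; j < 8; j++)
            {
                value |= bits[b * 8 + j] << j; // bit j of byte b
            }
            bytes[b] = (byte)value;
        }
        return bytes;
    }

    // Assumption: the payload is UTF-8 encoded text.
    public static string DecodeText(int[] bits) => Encoding.UTF8.GetString(PackBits(bits));
}
```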

It's worth trying this:
var yourString = "00100000";
char yourChar = (char) Convert.ToByte(yourString, 2); // you get ' ' (space)
It may or may not be faster, but it is definitely simpler, more stable, and more maintainable.

I ran some tests with different implementations.
First was Melnikovl's answer.
Second was mine, where I replaced + with | and * with the << operator.
Third was the author's original solution.
I tested with modified code and measured only the conversion code.
The first and second solutions showed slightly better performance. But BitConverter was better a little more often, so I think you should choose it (also because of the simplicity of the code).
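The shift/OR variant is not shown in the answer; a sketch of what it presumably looks like, next to the question's arithmetic version with 128 as the weight of the top bit. BitPacking and both method names are hypothetical.

```csharp
public static class BitPacking
{
    // The question's arithmetic version: element j has weight 2^j.
    public static char DecodeArithmetic(int[] tab) =>
        (char)(tab[0] * 1 + tab[1] * 2 + tab[2] * 4 + tab[3] * 8 +
               tab[4] * 16 + tab[5] * 32 + tab[6] * 64 + tab[7] * 128);

    // Shift/OR version: each 0/1 element is shifted to its bit position.
    // In C#, << binds tighter than |, so no extra parentheses are needed.
    public static char DecodeShift(int[] tab) =>
        (char)(tab[0] | tab[1] << 1 | tab[2] << 2 | tab[3] << 3 |
               tab[4] << 4 | tab[5] << 5 | tab[6] << 6 | tab[7] << 7);
}
```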

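byte[] bytes = { 1, 1, 1, 1 };
// each element is a whole byte; ToInt32 reads 4 bytes starting at index 0
int i = BitConverter.ToInt32(bytes, 0);
char a = (char)i;
Don't forget to check whether the byte array is little- or big-endian (BitConverter.IsLittleEndian).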

My static C# function is playing games with me... totally weird!

So, I've written a small and, from what I initially thought, easy method in C#.
This static method is meant to be used as a simple password suggestion generator, and the code looks like this:
public static string CreateRandomPassword(int outputLength, string source = "")
{
var output = string.Empty;
for (var i = 0; i < outputLength; i++)
{
var randomObj = new Random();
output += source.Substring(randomObj.Next(source.Length), 1);
}
return output;
}
I call this function like this:
var randomPassword = StringHelper.CreateRandomPassword(5, "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890");
Now, this method almost always returns strings like "AAAAAA", "BBBBBB", "888888" etc., where I thought it should return strings like "A8JK2A", "82mOK7" etc.
However, and here is the weird part: if I place a breakpoint and step through this iteration line by line, I get the correct type of password in return. In 100% of the other cases, when I'm not debugging, it gives me crap like "AAAAAA", "666666", etc.
How is this possible? Any suggestion is greatly appreciated! :-)
BTW, my system: Visual Studio 2010, C# 4.0, ASP.NET MVC 3 RTM project w/ ASP.NET Development Server. Haven't tested this code in any other environments.

Move the declaration for the randomObj outside the loop. When you're debugging it, it creates it with a new seed each time, because there's enough time difference for the seed to be different. But when you're not debugging, the seed time is basically the same for each iteration of the loop, so it's giving you the same start value each time.
And a minor nit -- it's a good habit to use a StringBuilder rather than a string for this sort of thing, so you don't have to re-initialize the memory space every time you append a character to the string.
In other words, like this:
public static string CreateRandomPassword(int outputLength, string source = "")
{
var output = new StringBuilder();
var randomObj = new Random();
for (var i = 0; i < outputLength; i++)
{
output.Append(source.Substring(randomObj.Next(source.Length), 1));
}
return output.ToString();
}

The behavior you're seeing is because the default seed of Random is time-based, and when you're not debugging, it flies through all 5 iterations at the same moment (more or less). So you're asking for the first random number from the same seed each time. When you're debugging, it takes long enough to get a new seed each time.
Move the declaration of Random outside the loop:
var randomObj = new Random();
for (var i = 0; i < outputLength; i++)
{
output += source.Substring(randomObj.Next(source.Length), 1);
}
Now you're moving forward 5 steps away from a Random seed instead of moving 1 step away from the same Random seed 5 times.

You are instantiating a new instance of Random() on each iteration through the loop with a new time-dependent seed. Given the granularity of the system clock and the speed of modern CPUs, this pretty much guarantees that you restart the pseudo-random sequence over and over with the same seed.
Try something like the following, though if you're single-threaded, you can safely omit the lock():
private static readonly Random randomBits = new Random();
public static string CreateRandomPassword(int outputLength, string source = "")
{
StringBuilder sb = new StringBuilder(outputLength);
lock (randomBits)
{
while (sb.Length < outputLength)
{
// Append one character of source, picked at a random index
sb.Append(source[randomBits.Next(source.Length)]);
}
}
return sb.ToString();
}
You only instantiate the RNG once. Every call draws bits from the same RNG, so it behaves much more like a source of entropy. If you need repeatability for testing, use the Random constructor overload that lets you supply the seed. Same seed == same pseudo-random sequence.
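A small sketch of that last point, which is also the root of the question's bug: two Random instances created with the same seed return identical sequences. On the .NET Framework the parameterless constructor derives its seed from the system clock, so instances created within the same clock tick behave like this. SeedDemo and SameSequence are hypothetical names.

```csharp
using System;

public static class SeedDemo
{
    // Returns true if two generators built from the same seed produce the
    // same first `count` values.
    public static bool SameSequence(int seed, int count)
    {
        var a = new Random(seed);
        var b = new Random(seed);
        for (int i = 0; i < count; i++)
        {
            if (a.Next() != b.Next())
            {
                return false;
            }
        }
        return true;
    }
}
```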

When to use StringBuilder?

I understand the benefits of StringBuilder.
But if I want to concatenate 2 strings, then I assume that it is better (faster) to do it without StringBuilder. Is this correct?
At what point (number of strings) does it become better to use StringBuilder?

I warmly suggest reading The Sad Tragedy of Micro-Optimization Theater, by Jeff Atwood.
It covers simple concatenation vs. StringBuilder vs. other methods.
Now, if you want to see some numbers and graphs, follow the link ;)

But if I want to concatenate 2 strings, then I assume that it is better (faster) to do it without StringBuilder. Is this correct?
That is indeed correct; you can find the reason explained very well in:
Article about strings and StringBuilder
Summed up: if you can concatenate strings in one go, like
var result = a + " " + b + " " + c + ..
you are better off without StringBuilder, because only one copy is made (the length of the resulting string is calculated beforehand).
For a structure like
var result = a;
result += " ";
result += b;
result += " ";
result += c;
..
new objects are created each time, so there you should consider StringBuilder.
At the end, the article sums up these rules of thumb:
Rules Of Thumb
So, when should you use StringBuilder, and when should you use the string concatenation operators?
Definitely use StringBuilder when you're concatenating in a non-trivial loop - especially if you don't know for sure (at compile time) how many iterations you'll make through the loop. For example, reading a file a character at a time, building up a string as you go using the += operator is potentially performance suicide.
Definitely use the concatenation operator when you can (readably) specify everything which needs to be concatenated in one statement. (If you have an array of things to concatenate, consider calling String.Concat explicitly - or String.Join if you need a delimiter.)
Don't be afraid to break literals up into several concatenated bits - the result will be the same. You can aid readability by breaking a long literal into several lines, for instance, with no harm to performance.
If you need the intermediate results of the concatenation for something other than feeding the next iteration of concatenation, StringBuilder isn't going to help you. For instance, if you build up a full name from a first name and a last name, and then add a third piece of information (the nickname, maybe) to the end, you'll only benefit from using StringBuilder if you don't need the (first name + last name) string for other purpose (as we do in the example which creates a Person object).
If you just have a few concatenations to do, and you really want to do them in separate statements, it doesn't really matter which way you go. Which way is more efficient will depend on the number of concatenations the sizes of string involved, and what order they're concatenated in. If you really believe that piece of code to be a performance bottleneck, profile or benchmark it both ways.
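A sketch of the three cases these rules describe: one concatenation expression, String.Join for an array with a delimiter, and StringBuilder for a loop whose iteration count is only known at run time. The ConcatExamples class and its method names are illustrative.

```csharp
using System.Text;

public static class ConcatExamples
{
    // Everything in one expression: the compiler emits a single
    // String.Concat call, so the result string is allocated once.
    public static string FullName(string first, string last) => first + " " + last;

    // An array of pieces with a delimiter: String.Join.
    public static string Csv(string[] fields) => string.Join(",", fields);

    // Iteration count only known at run time: StringBuilder, pre-sized
    // because the final length is easy to compute here.
    public static string Repeat(string piece, int count)
    {
        var sb = new StringBuilder(piece.Length * count);
        for (int i = 0; i < count; i++)
        {
            sb.Append(piece);
        }
        return sb.ToString();
    }
}
```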

System.String is an immutable object: whenever you modify its content, a new string is allocated, and this takes time (and memory).
Using StringBuilder you modify the actual content of the object without allocating a new one.
So use StringBuilder when you need to do many modifications on the string.

Not really...you should use StringBuilder if you concatenate large strings or you have many concatenations, like in a loop.

If you concatenate strings in a loop, you should consider using StringBuilder instead of a regular String.
For a single concatenation, you may not see any difference in execution time at all.
Here is a simple test app to prove the point:
static void Main(string[] args)
{
//warm-up rounds:
Test(500);
Test(500);
//test rounds:
Test(500);
Test(1000);
Test(10000);
Test(50000);
Test(100000);
Console.ReadLine();
}
private static void Test(int iterations)
{
int testLength = iterations;
Console.WriteLine($"----{iterations}----");
//TEST 1 - String
var startTime = DateTime.Now;
var resultString = "test string";
for (var i = 0; i < testLength; i++)
{
resultString += i.ToString();
}
Console.WriteLine($"STR: {(DateTime.Now - startTime).TotalMilliseconds}");
//TEST 2 - StringBuilder
startTime = DateTime.Now;
var stringBuilder = new StringBuilder("test string");
for (var i = 0; i < testLength; i++)
{
stringBuilder.Append(i.ToString());
}
string resultString2 = stringBuilder.ToString();
Console.WriteLine($"StringBuilder: {(DateTime.Now - startTime).TotalMilliseconds}");
Console.WriteLine("---------------");
Console.WriteLine("");
}
Results (in milliseconds):
----500----
STR: 0.1254
StringBuilder: 0
---------------
----1000----
STR: 2.0232
StringBuilder: 0
---------------
----10000----
STR: 28.9963
StringBuilder: 0.9986
---------------
----50000----
STR: 1019.2592
StringBuilder: 4.0079
---------------
----100000----
STR: 11442.9467
StringBuilder: 10.0363
---------------

There's no definitive answer, only rules-of-thumb. My own personal rules go something like this:
If concatenating in a loop, always use a StringBuilder.
If the strings are large, always use a StringBuilder.
If the concatenation code is tidy and readable on the screen then it's probably ok.
If it isn't, use a StringBuilder.

To paraphrase
Then shalt thou count to three, no more, no less. Three shall be the number thou shalt count, and the number of the counting shall be three. Four shalt thou not count, neither count thou two, excepting that thou then proceed to three. Once the number three, being the third number, be reached, then lobbest thou thy Holy Hand Grenade of Antioch
I generally use StringBuilder for any block of code which would result in the concatenation of three or more strings.

Since it's difficult to find an explanation for this that isn't either influenced by opinions or followed by a battle of prides, I wrote a bit of code in LINQPad to test it myself.
I found that using small strings rather than i.ToString() changes response times (visible in small loops).
The test uses different sequences of iterations to keep time measurements in sensibly comparable ranges.
I'll copy the code at the end so you can try it yourself (results.Charts...Dump() won't work outside LINQPad).
Output (charts not reproduced here; X-axis: number of iterations tested, Y-axis: time taken in ticks), one chart per iterations sequence:
Iterations sequence: 2, 3, 4, 5, 6, 7, 8, 9, 10
Iterations sequence: 10, 20, 30, 40, 50, 60, 70, 80
Iterations sequence: 100, 200, 300, 400, 500
Code (Written using LINQPad 5):
void Main()
{
Test(2, 3, 4, 5, 6, 7, 8, 9, 10);
Test(10, 20, 30, 40, 50, 60, 70, 80);
Test(100, 200, 300, 400, 500);
}
void Test(params int[] iterationsCounts)
{
$"Iterations sequence: {string.Join(", ", iterationsCounts)}".Dump();
int testStringLength = 10;
RandomStringGenerator.Setup(testStringLength);
var sw = new System.Diagnostics.Stopwatch();
var results = new Dictionary<int, TimeSpan[]>();
// This call before starting to measure time removes initial overhead from first measurement
RandomStringGenerator.GetRandomString();
foreach (var iterationsCount in iterationsCounts)
{
TimeSpan elapsedForString, elapsedForSb;
// string
sw.Restart();
var str = string.Empty;
for (int i = 0; i < iterationsCount; i++)
{
str += RandomStringGenerator.GetRandomString();
}
sw.Stop();
elapsedForString = sw.Elapsed;
// string builder
sw.Restart();
var sb = new StringBuilder(string.Empty);
for (int i = 0; i < iterationsCount; i++)
{
sb.Append(RandomStringGenerator.GetRandomString());
}
sw.Stop();
elapsedForSb = sw.Elapsed;
results.Add(iterationsCount, new TimeSpan[] { elapsedForString, elapsedForSb });
}
// Results
results.Chart(r => r.Key)
.AddYSeries(r => r.Value[0].Ticks, LINQPad.Util.SeriesType.Line, "String")
.AddYSeries(r => r.Value[1].Ticks, LINQPad.Util.SeriesType.Line, "String Builder")
.DumpInline();
}
static class RandomStringGenerator
{
static Random r;
static string[] strings;
public static void Setup(int testStringLength)
{
r = new Random(DateTime.Now.Millisecond);
strings = new string[10];
for (int i = 0; i < strings.Length; i++)
{
strings[i] = Guid.NewGuid().ToString().Substring(0, testStringLength);
}
}
public static string GetRandomString()
{
var indx = r.Next(0, strings.Length);
return strings[indx];
}
}

But if I want to concatenate 2 strings, then I assume that it's better and faster to do so without StringBuilder. Is this correct?
Yes. But more importantly, it is vastly more readable to use a vanilla String in such situations. Using StringBuilder in a loop, on the other hand, makes sense and can also be as readable as concatenation.
I’d be wary of rules of thumb that cite specific numbers of concatenation as a threshold. Using it in loops (and loops only) is probably just as useful, easier to remember and makes more sense.

As long as you can physically type out the concatenations (a + b + c ...), it shouldn't make a big difference. N squared (at N = 10) is a 100X slowdown, which shouldn't be too bad.
The big problem is when you are concatenating hundreds of strings. At N = 100, you get a 10,000X slowdown, which is pretty bad.

A single concatenation is not worth using a StringBuilder. I've typically used 5 concatenations as a rule of thumb.

I don't think there's a clear line between when to use it and when not to, unless, of course, someone has performed extensive testing to come up with the golden conditions.
For me, I will not use StringBuilder if I am just concatenating 2 huge strings. If there's a loop with a nondeterministic count, I'm likely to, even if the loop might have small counts.
