Am I undermining the efficiency of StringBuilder? - c#

I've started using StringBuilder in preference to straight concatenation, but it seems like it's missing a crucial method. So, I implemented it myself, as an extension:
public static void Append(this StringBuilder stringBuilder, params string[] args)
{
    foreach (string arg in args)
        stringBuilder.Append(arg);
}
This turns the following mess:
StringBuilder sb = new StringBuilder();
...
sb.Append(SettingNode);
sb.Append(KeyAttribute);
sb.Append(setting.Name);
Into this:
sb.Append(SettingNode, KeyAttribute, setting.Name);
I could use sb.AppendFormat("{0}{1}{2}",..., but this seems even less preferred, and still harder to read. Is my extension a good method, or does it somehow undermine the benefits of StringBuilder? I'm not trying to prematurely optimize anything, as my method is more about readability than speed, but I'd also like to know I'm not shooting myself in the foot.

I see no problem with your extension. If it works for you it's all good.
I myself prefer:
sb.Append(SettingNode)
.Append(KeyAttribute)
.Append(setting.Name);
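If you like that chained style, the extension itself can support it by returning the builder (a small variation of my own, not part of the question's code):
public static StringBuilder Append(this StringBuilder stringBuilder, params string[] args)
{
    foreach (string arg in args)
        stringBuilder.Append(arg);
    return stringBuilder; // returning the builder lets callers keep chaining .Append(...) calls
}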

Questions like this can always be answered with a simple test case.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
namespace SBTest
{
class Program
{
private const int ITERATIONS = 1000000;
private static void Main(string[] args)
{
Test1();
Test2();
Test3();
}
private static void Test1()
{
var sw = Stopwatch.StartNew();
var sb = new StringBuilder();
for (var i = 0; i < ITERATIONS; i++)
{
sb.Append("TEST" + i.ToString("00000"),
"TEST" + (i + 1).ToString("00000"),
"TEST" + (i + 2).ToString("00000"));
}
sw.Stop();
Console.WriteLine("Testing Append() extension method...");
Console.WriteLine("--------------------------------------------");
Console.WriteLine("Test 1 iterations: {0:n0}", ITERATIONS);
Console.WriteLine("Test 1 milliseconds: {0:n0}", sw.ElapsedMilliseconds);
Console.WriteLine("Test 1 output length: {0:n0}", sb.Length);
Console.WriteLine("");
}
private static void Test2()
{
var sw = Stopwatch.StartNew();
var sb = new StringBuilder();
for (var i = 0; i < ITERATIONS; i++)
{
sb.Append("TEST" + i.ToString("00000"));
sb.Append("TEST" + (i+1).ToString("00000"));
sb.Append("TEST" + (i+2).ToString("00000"));
}
sw.Stop();
Console.WriteLine("Testing multiple calls to Append() built-in method...");
Console.WriteLine("--------------------------------------------");
Console.WriteLine("Test 2 iterations: {0:n0}", ITERATIONS);
Console.WriteLine("Test 2 milliseconds: {0:n0}", sw.ElapsedMilliseconds);
Console.WriteLine("Test 2 output length: {0:n0}", sb.Length);
Console.WriteLine("");
}
private static void Test3()
{
var sw = Stopwatch.StartNew();
var sb = new StringBuilder();
for (var i = 0; i < ITERATIONS; i++)
{
sb.AppendFormat("{0}{1}{2}",
"TEST" + i.ToString("00000"),
"TEST" + (i + 1).ToString("00000"),
"TEST" + (i + 2).ToString("00000"));
}
sw.Stop();
Console.WriteLine("Testing AppendFormat() built-in method...");
Console.WriteLine("--------------------------------------------");
Console.WriteLine("Test 3 iterations: {0:n0}", ITERATIONS);
Console.WriteLine("Test 3 milliseconds: {0:n0}", sw.ElapsedMilliseconds);
Console.WriteLine("Test 3 output length: {0:n0}", sb.Length);
Console.WriteLine("");
}
}
public static class SBExtentions
{
public static void Append(this StringBuilder sb, params string[] args)
{
foreach (var arg in args)
sb.Append(arg);
}
}
}
On my PC, the output is:
Testing Append() extension method...
--------------------------------------------
Test 1 iterations: 1,000,000
Test 1 milliseconds: 1,080
Test 1 output length: 29,700,006
Testing multiple calls to Append() built-in method...
--------------------------------------------
Test 2 iterations: 1,000,000
Test 2 milliseconds: 1,001
Test 2 output length: 29,700,006
Testing AppendFormat() built-in method...
--------------------------------------------
Test 3 iterations: 1,000,000
Test 3 milliseconds: 1,124
Test 3 output length: 29,700,006
So your extension method is only slightly slower than the Append() method and is slightly faster than the AppendFormat() method, but in all 3 cases, the difference is entirely too trivial to worry about. Thus, if your extension method enhances the readability of your code, use it!

There is a little bit of overhead in creating the extra array, but I doubt that it's a lot. You should measure.
If it turns out that the overhead of creating string arrays is significant, you can mitigate it by having several overloads - one for two parameters, one for three, one for four and so on - so that only when you get to a higher number of parameters (e.g. six or seven) will it need to create the array. The overloads would be like this:
public static void Append(this StringBuilder builder, string item1, string item2)
{
    builder.Append(item1);
    builder.Append(item2);
}
public static void Append(this StringBuilder builder, string item1, string item2, string item3)
{
    builder.Append(item1);
    builder.Append(item2);
    builder.Append(item3);
}
public static void Append(this StringBuilder builder, string item1, string item2,
                          string item3, string item4)
{
    builder.Append(item1);
    builder.Append(item2);
    builder.Append(item3);
    builder.Append(item4);
}
// etc
And then one final overload using params, e.g.
public static void Append(this StringBuilder builder, string item1, string item2,
                          string item3, string item4, params string[] otherItems)
{
    builder.Append(item1);
    builder.Append(item2);
    builder.Append(item3);
    builder.Append(item4);
    foreach (string item in otherItems)
    {
        builder.Append(item);
    }
}
I'd certainly expect these (or just your original extension method) to be faster than using AppendFormat - which needs to parse the format string, after all.
Note that I didn't make these overloads call each other pseudo-recursively - I suspect they'd be inlined, but if they weren't the overhead of setting up a new stack frame etc could end up being significant. (We're assuming the overhead of the array is significant, if we've got this far.)

Other than a bit of overhead, I don't personally see any issues with it. Definitely more readable. As long as you're passing a reasonable number of params in I don't see the problem.

From a clarity perspective, your extension is ok.
It would probably be best to simply use the .Append(x).Append(y).Append(z) format if you never have more than about 5 or 6 items.
StringBuilder itself would only net you a performance gain if you were processing many thousands of items. In addition, you'll be creating the array every time you call the method.
So if you're doing it for clarity, that's ok. If you're doing it for efficiency, then you're probably on the wrong track.

I wouldn't say you're undermining its efficiency, but you may be doing something inefficient when a more efficient method is available. AppendFormat is what I think you want here. If the {0}{1}{2} string being used constantly is too ugly, I tend to put my format strings in consts above, so the look would be more or less the same as your extension.
sb.AppendFormat(SETTING_FORMAT, var1, var2, var3);
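For example (the const declaration itself isn't shown in the answer; SETTING_FORMAT is the assumed name):
private const string SETTING_FORMAT = "{0}{1}{2}";
...
sb.AppendFormat(SETTING_FORMAT, SettingNode, KeyAttribute, setting.Name);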

I haven't tested recently, but in the past, StringBuilder was actually slower than plain-vanilla string concatenation ("this " + "that") until you get to about 7 concatenations.
If this is string concatenation that is not happening in a loop, you may want to consider if you should be using the StringBuilder at all. (In a loop, I start to worry about allocations with plain-vanilla string concatenation, since strings are immutable.)
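As a rough illustration of why the loop case matters (a sketch, not a benchmark from this thread):
// Plain concatenation in a loop: each += allocates a brand-new string,
// so the work grows roughly quadratically with the number of pieces.
string s = "";
for (int i = 0; i < 1000; i++)
    s += i + ",";

// StringBuilder mutates one internal buffer instead.
var sb = new StringBuilder();
for (int i = 0; i < 1000; i++)
    sb.Append(i).Append(',');
string result = sb.ToString();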

Potentially even faster, because it performs at most one reallocation/copy step no matter how many strings you append:
public static void Append(this StringBuilder stringBuilder, params string[] args)
{
    int required = stringBuilder.Length;
    foreach (string arg in args)
        required += arg.Length;
    if (stringBuilder.Capacity < required)
        stringBuilder.Capacity = required;
    foreach (string arg in args)
        stringBuilder.Append(arg);
}
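One trade-off worth noting (my observation, not the answer's): this version walks args twice and grows Capacity to exactly the required size, so a later Append beyond that point can still trigger another reallocation; it helps most when the builder is filled in a single call.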

Ultimately it comes down to which one results in less string creation. I have a feeling that the extension will result in a higher string count than using the format string. But the performance probably won't be that different.

Chris,
Inspired by this Jon Skeet response (second answer), I slightly rewrote your code. Basically, I added the TestRunner method which runs the passed-in function and reports the elapsed time, eliminating a little redundant code. Not to be smug, but rather as a programming exercise for myself. I hope it's helpful.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
namespace SBTest
{
class Program
{
private static void Main(string[] args)
{
// JIT everything
AppendTest(1);
AppendFormatTest(1);
int iterations = 1000000;
// Run Tests
TestRunner(AppendTest, iterations);
TestRunner(AppendFormatTest, iterations);
Console.ReadLine();
}
private static void TestRunner(Func<int, long> action, int iterations)
{
GC.Collect();
var sw = Stopwatch.StartNew();
long length = action(iterations);
sw.Stop();
Console.WriteLine("--------------------- {0} -----------------------", action.Method.Name);
Console.WriteLine("iterations: {0:n0}", iterations);
Console.WriteLine("milliseconds: {0:n0}", sw.ElapsedMilliseconds);
Console.WriteLine("output length: {0:n0}", length);
Console.WriteLine("");
}
private static long AppendTest(int iterations)
{
var sb = new StringBuilder();
for (var i = 0; i < iterations; i++)
{
sb.Append("TEST" + i.ToString("00000"),
"TEST" + (i + 1).ToString("00000"),
"TEST" + (i + 2).ToString("00000"));
}
return sb.Length;
}
private static long AppendFormatTest(int iterations)
{
var sb = new StringBuilder();
for (var i = 0; i < iterations; i++)
{
sb.AppendFormat("{0}{1}{2}",
"TEST" + i.ToString("00000"),
"TEST" + (i + 1).ToString("00000"),
"TEST" + (i + 2).ToString("00000"));
}
return sb.Length;
}
}
public static class SBExtentions
{
public static void Append(this StringBuilder sb, params string[] args)
{
foreach (var arg in args)
sb.Append(arg);
}
}
}
Here's the output:
--------------------- AppendTest -----------------------
iterations: 1,000,000
milliseconds: 1,274
output length: 29,700,006
--------------------- AppendFormatTest -----------------------
iterations: 1,000,000
milliseconds: 1,381
output length: 29,700,006

Related

Is there any performance boost by using lambda expressions in C#?

I would like to know if there is any performance difference between the following:
void Example(string message)
{
System.Console.WriteLine(message);
}
or
void Example(string message)
{
ConsoleMessage(message);
}
void ConsoleMessage(string message) => System.Console.WriteLine(message);
You can check how it compiles on this website - SharpLab. As you can see
void ConsoleMessage(string message) => System.Console.WriteLine(message);
is compiled to another method. The only difference is one more method call when using the second version. You can read more about the cost of method calls in this post - How expensive are method calls in .net
It's very, very unlikely to be your bottleneck though. As always, write the most readable code you can first, and then benchmark it to see whether it performs well enough. If it doesn't, use a profiler to find the hotspots which may be worth micro-optimising.
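If a profiler ever did show such a wrapper call mattering, one option (my illustration; the JIT usually inlines tiny methods like this on its own anyway) is to hint the JIT explicitly:
using System.Runtime.CompilerServices;

static class ConsoleHelpers
{
    // Hint the JIT to inline the wrapper so the extra call disappears in the generated code.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void ConsoleMessage(string message) => System.Console.WriteLine(message);
}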
I wrote this:
static void ConsoleMessage(string message) => Console.WriteLine(message);
static void ExampleIndirect(string message)
{
ConsoleMessage(message);
}
static void ExampleDirect(string message)
{
Console.WriteLine(message);
}
By using JustDecompile on the release exe, I found that:
private static void ConsoleMessage(string message)
{
Console.WriteLine(message);
}
private static void ExampleDirect(string message)
{
Console.WriteLine(message);
}
private static void ExampleIndirect(string message)
{
Program.ConsoleMessage(message);
}
So I'm not so sure that it "is compiled to another method".
However, the difference seems not to be noticeable on benchmarks.
EDIT:
Results of my benchmark:
Iterations   Direct time (ms)   Indirect time (ms)
10           1                  0
5000         154                148
50000        1514               1502
100000       3025               3019
500000       15191              15150
1000000      30362              30276
And the code :
static readonly System.IO.TextWriter writer = Console.Out; // not shown in the original; any TextWriter works

static void Loop(int times)
{
    Stopwatch sw = Stopwatch.StartNew();
    writer.Write(times.ToString() + "\t");
    for (int i = 0; i < times; i++)
        ExampleDirect("Test " + i.ToString());
    writer.Write(sw.ElapsedMilliseconds.ToString() + "\t");
    sw.Restart();
    for (int i = 0; i < times; i++)
        ExampleIndirect("Test " + i.ToString());
    writer.Write(sw.ElapsedMilliseconds.ToString() + Environment.NewLine);
}

Speed up string concatenation [duplicate]

I have a program that will write a series of files in a loop. The filename is constructed using a parameter from an object supplied to the method.
ANTS Performance Profiler says this is dog slow and I'm not sure why:
public string CreateFilename(MyObject obj)
{
return "sometext-" + obj.Name + ".txt";
}
Is there a more performant way of doing this? The method is hit thousands of times and I don't know of a good way outside of having a discrete method for this purpose since the input objects are out of my control and regularly change.
The compiler will optimize your two concats into one call to:
String.Concat("sometext-", obj.Name, ".txt")
There is no faster way to do this.
If you instead compute the filename within the class itself, it will run much faster, in exchange for decreased performance when modifying the object. Mind you, I'd be very concerned if computing a filename was a bottleneck; writing to the file is way slower than coming up with its name.
See code samples below. When I benchmarked them with optimizations on (in LINQPad 5), Test2 ran about 15x faster than Test1. Among other things, Test2 doesn't constantly generate/discard tiny string objects.
void Main()
{
Test1();
Test1();
Test1();
Test2();
Test2();
Test2();
}
void Test1()
{
System.Diagnostics.Stopwatch sw = new Stopwatch();
MyObject1 mo = new MyObject1 { Name = "MyName" };
sw.Start();
long x = 0;
for (int i = 0; i < 10000000; ++i)
{
x += CreateFileName(mo).Length;
}
Console.WriteLine(x); //Sanity Check, prevent clever compiler optimizations
sw.ElapsedMilliseconds.Dump("Test1");
}
public string CreateFileName(MyObject1 obj)
{
return "sometext-" + obj.Name + ".txt";
}
void Test2()
{
System.Diagnostics.Stopwatch sw = new Stopwatch();
MyObject2 mo = new MyObject2 { Name = "MyName" };
sw.Start();
long x = 0;
for (int i = 0; i < 10000000; ++i)
{
x += mo.FileName.Length;
}
Console.WriteLine(x); //Sanity Check, prevent clever compiler optimizations
sw.ElapsedMilliseconds.Dump("Test2");
}
public class MyObject1
{
public string Name;
}
public class MyObject2
{
public string FileName { get; private set;}
private string _name;
public string Name
{
get
{
return _name;
}
set
{
_name=value;
FileName = "sometext-" + _name + ".txt";
}
}
}
I also tested adding memoization to CreateFileName, but it barely improved performance over Test1, and it couldn't possibly beat out Test2, since it performs the equivalent steps with additional overhead for hash lookups.
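A memoized version might look roughly like this (my sketch; _fileNameCache is a name I'm introducing, not the exact code benchmarked above):
private readonly Dictionary<string, string> _fileNameCache = new Dictionary<string, string>();

public string CreateFileNameMemoized(MyObject1 obj)
{
    string fileName;
    if (!_fileNameCache.TryGetValue(obj.Name, out fileName))
    {
        // Build once per distinct name, then reuse the cached string.
        fileName = "sometext-" + obj.Name + ".txt";
        _fileNameCache[obj.Name] = fileName;
    }
    return fileName;
}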

Fast way to use String.Contains with huge list C#

I have something like this:
List<string> listUser = new List<string>();
listUser.Add("user1");
listUser.Add("user2");
listUser.Add("userhacker");
listUser.Add("user1other");

List<string> key_blacklist = new List<string>();
key_blacklist.Add("hacker");
key_blacklist.Add("other");

foreach (string user in listUser)
{
    foreach (string key in key_blacklist)
    {
        if (user.Contains(key))
        {
            // remove it from listUser
        }
    }
}
The result for listUser should be: user1, user2.
The problem is that I have a huge listUser (more than 10 million entries) and a huge key_blacklist (100,000 entries). That code is very, very slow.
Is there any way to make it faster?
UPDATE: I found a new solution here:
http://cc.davelozinski.com/c-sharp/fastest-way-to-check-if-a-string-occurs-within-a-string
Hope it helps someone who ends up there! :)
If you don't have much control over how the list of users is constructed, you can at least test each item in the list in parallel, which on modern machines with multiple cores will speed up the checking a fair bit.
var filteredUsers = listUser.AsParallel().Where(
    s =>
    {
        foreach (var key in key_blacklist)
        {
            if (s.Contains(key))
            {
                return false; // Not to be included
            }
        }
        return true; // To be included, as no match with the blacklist
    }).ToList();
Also - do you have to use .Contains? .Equals is going to be much much quicker, because in almost all cases a non-match will be determined when the HashCodes differ, which can be found only by an integer comparison. Super quick.
If you do need .Contains, you may want to think about restructuring the app. What do these strings in the list really represent? Separate sub-groups of users? Can I test each string, at the time it's added, for whether it represents a user on the blacklist?
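That last idea might look like this (a sketch; IsBlacklisted and candidateUser are names I'm introducing):
static bool IsBlacklisted(string user, List<string> key_blacklist)
{
    foreach (var key in key_blacklist)
    {
        if (user.Contains(key))
            return true;
    }
    return false;
}

// Filter once, at the point where users enter the list, instead of re-scanning the whole list later.
if (!IsBlacklisted(candidateUser, key_blacklist))
    listUser.Add(candidateUser);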
UPDATE: In response to #Rawling's comment below - If you know that there is a finite set of usernames which have, say, "hacker" as a substring, that set would have to be pretty large before running a .Equals test of each username against a candidate would be slower than running .Contains on the candidate. This is because HashCode is really quick.
If you are using Entity Framework or LINQ to SQL, then building the query in LINQ and sending it to the server can improve performance.
Instead of removing the items, you query for the items that fulfil the requirement, i.e. users whose names don't contain a banned expression:
listUser.Where(u => !key_blacklist.Any(u.Contains)).Select(u => u).ToList();
A possible solution is to use a tree-like data structure.
The basic idea is to have the blacklisted words organised like this:
+ h
| + ha
| + hac
| - hacker
| - [other words beginning with hac]
|
+ f
| + fu
| + fuk
| - fukoff
| - [other words beginning with fuk]
Then, when you check for blacklisted words, you avoid searching the whole list of words beginning with "hac" if you find out that your user string does not even contain "h".
In the example I provided, with your sample data, this of course makes no difference, but with real data sets it should significantly reduce the number of Contains calls, since you don't check against the full list of blacklisted words every time.
Here is a code example (please note that the code is pretty bad; this is just to illustrate my idea):
using System;
using System.Collections.Generic;
using System.Linq;
class Program {
class Blacklist {
public string Start;
public int Level;
const int MaxLevel = 3;
public Dictionary<string, Blacklist> SubBlacklists = new Dictionary<string, Blacklist>();
public List<string> BlacklistedWords = new List<string>();
public Blacklist() {
Start = string.Empty;
Level = 0;
}
Blacklist(string start, int level) {
Start = start;
Level = level;
}
public void AddBlacklistedWord(string word) {
if (word.Length > Level && Level < MaxLevel) {
string index = word.Substring(0, Level + 1);
Blacklist sublist = null;
if (!SubBlacklists.TryGetValue(index, out sublist)) {
sublist = new Blacklist(index, Level + 1);
SubBlacklists[index] = sublist;
}
sublist.AddBlacklistedWord(word);
} else {
BlacklistedWords.Add(word);
}
}
public bool ContainsBlacklistedWord(string wordToCheck) {
if (wordToCheck.Length > Level && Level < MaxLevel) {
foreach (var sublist in SubBlacklists.Values) {
if (wordToCheck.Contains(sublist.Start)) {
return sublist.ContainsBlacklistedWord(wordToCheck);
}
}
}
return BlacklistedWords.Any(x => wordToCheck.Contains(x));
}
}
static void Main(string[] args) {
List<string> listUser = new List<string>();
listUser.Add("user1");
listUser.Add("user2");
listUser.Add("userhacker");
listUser.Add("userfukoff1");
Blacklist blacklist = new Blacklist();
blacklist.AddBlacklistedWord("hacker");
blacklist.AddBlacklistedWord("fukoff");
foreach (string user in listUser) {
if (blacklist.ContainsBlacklistedWord(user)) {
Console.WriteLine("Contains blacklisted word: {0}", user);
}
}
}
}
You are using the wrong thing. If you have a lot of data, you should be using either HashSet<T> or SortedSet<T>. If you don't need the data sorted, go with HashSet<T>. Here is a program I wrote to demonstrate the time differences:
class Program
{
private static readonly Random random = new Random((int)DateTime.Now.Ticks);
static void Main(string[] args)
{
Console.WriteLine("Creating Lists...");
var stringList = new List<string>();
var hashList = new HashSet<string>();
var sortedList = new SortedSet<string>();
var searchWords1 = new string[3];
int ndx = 0;
for (int x = 0; x < 1000000; x++)
{
string str = RandomString(10);
if (x == 5 || x == 500000 || x == 999999)
{
str = "Z" + str;
searchWords1[ndx] = str;
ndx++;
}
stringList.Add(str);
hashList.Add(str);
sortedList.Add(str);
}
Console.WriteLine("Lists created!");
var sw = new Stopwatch();
sw.Start();
bool search1 = stringList.Contains(searchWords1[2]);
sw.Stop();
Console.WriteLine("List<T> {0} ==> {1}ms", search1, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
search1 = hashList.Contains(searchWords1[2]);
sw.Stop();
Console.WriteLine("HashSet<T> {0} ==> {1}ms", search1, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
search1 = sortedList.Contains(searchWords1[2]);
sw.Stop();
Console.WriteLine("SortedSet<T> {0} ==> {1}ms", search1, sw.ElapsedMilliseconds);
}
private static string RandomString(int size)
{
var builder = new StringBuilder();
char ch;
for (int i = 0; i < size; i++)
{
ch = Convert.ToChar(Convert.ToInt32(Math.Floor(26 * random.NextDouble() + 65)));
builder.Append(ch);
}
return builder.ToString();
}
}
On my machine, I got the following results:
Creating Lists...
Lists created!
List<T> True ==> 15ms
HashSet<T> True ==> 0ms
SortedSet<T> True ==> 0ms
As you can see, List<T> was extremely slow compared to HashSet<T> and SortedSet<T>. Those were almost instantaneous.

Writing to console char by char, fastest way

In a current project of mine I have to parse a string, and write parts of it to the console. While testing how to do this without too much overhead, I discovered that one way I was testing is actually faster than Console.WriteLine, which is slightly confusing to me.
I'm aware this is not the proper way to benchmark stuff, but I'm usually fine with a rough "this is faster than this", which I can tell after running it a few times.
static void Main(string[] args)
{
var timer = new Stopwatch();
timer.Restart();
Test1("just a little test string.");
timer.Stop();
Console.WriteLine(timer.Elapsed);
timer.Restart();
Test2("just a little test string.");
timer.Stop();
Console.WriteLine(timer.Elapsed);
timer.Restart();
Test3("just a little test string.");
timer.Stop();
Console.WriteLine(timer.Elapsed);
}
static void Test1(string str)
{
Console.WriteLine(str);
}
static void Test2(string str)
{
foreach (var c in str)
Console.Write(c);
Console.Write('\n');
}
static void Test3(string str)
{
using (var stream = new StreamWriter(Console.OpenStandardOutput()))
{
foreach (var c in str)
stream.Write(c);
stream.Write('\n');
}
}
As you can see, Test1 is using Console.WriteLine. My first thought was to simply call Write for every char; see Test2. But this took roughly twice as long. My guess would be that it flushes after every write, which makes it slower. So I tried Test3, using a StreamWriter (AutoFlush off), which turned out to be about 25% faster than Test1, and I'm really curious why that is. Or is it that writing to the console can't be benchmarked properly? (I noticed some strange data when adding more test cases...)
Can someone enlighten me?
Also, if there's a better way to do this (going though a string and only writing parts of it to the console), feel free to comment on that.
First, I agree with the other comments that your test harness leaves something to be desired... I rewrote it and included it below. The results after the rewrite show a clear winner:
//Test 1 = 00:00:03.7066514
//Test 2 = 00:00:24.6765818
//Test 3 = 00:00:00.8609692
From this, you are correct: the buffered StreamWriter is more than 25% faster. It's faster only because it's buffered. Internally the StreamWriter implementation uses a default buffer size of around 1~4 KB (depending on the stream type). If you construct the StreamWriter with an 8-byte buffer (the smallest allowed) you will see most of the performance improvement disappear. You can also see this by adding a Flush() call after each write.
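For instance, constructing the writer with a deliberately tiny buffer makes the difference visible (an illustration; the 8-byte minimum is the answer's claim, which I haven't re-verified):
// A tiny buffer forces a flush to the underlying console stream on almost every write,
// which erases most of the buffering advantage.
var tinyBuffered = new StreamWriter(Console.OpenStandardOutput(), Console.OutputEncoding, bufferSize: 8);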
Here is the test rewritten to obtain the numbers above:
private static StreamWriter stdout = new StreamWriter(Console.OpenStandardOutput());
static void Main(string[] args)
{
Action<string>[] tests = new Action<string>[] { Test1, Test2, Test3 };
TimeSpan[] timming = new TimeSpan[tests.Length];
// Repeat the entire sequence of tests many times to accumulate the result
for (int i = 0; i < 100; i++)
{
for( int itest =0; itest < tests.Length; itest++)
{
string text = String.Format("just a little test string, test = {0}, iteration = {1}", itest, i);
Action<string> thisTest = tests[itest];
//Clear the console so that each test begins from the same state
Console.Clear();
var timer = Stopwatch.StartNew();
//Repeat the test many times, if this was not using the console
//I would use a much higher number, say 10,000
for (int j = 0; j < 100; j++)
thisTest(text);
timer.Stop();
//Accumulate the result, but ignore the first run
if (i != 0)
timming[itest] += timer.Elapsed;
//Depending on what you are benchmarking you may need to force GC here
}
}
//Now print the results we have collected
Console.Clear();
for (int itest = 0; itest < tests.Length; itest++)
Console.WriteLine("Test {0} = {1}", itest + 1, timming[itest]);
Console.ReadLine();
}
static void Test1(string str)
{
Console.WriteLine(str);
}
static void Test2(string str)
{
foreach (var c in str)
Console.Write(c);
Console.Write('\n');
}
static void Test3(string str)
{
foreach (var c in str)
stdout.Write(c);
stdout.Write('\n');
}
I ran each of your tests 10,000 times and the results on my machine are the following:
test1 - 0.6164241
test2 - 8.8143273
test3 - 0.9537039
this is the script I used:
static void Main(string[] args)
{
Test1("just a little test string."); // warm up
GC.Collect(); // compact Heap
GC.WaitForPendingFinalizers(); // and wait for the finalizer queue to empty
Stopwatch timer = new Stopwatch();
timer.Start();
for (int i = 0; i < 10000; i++)
{
Test1("just a little test string.");
}
timer.Stop();
Console.WriteLine(timer.Elapsed);
}
I changed the code to run each test 1000 times.
static void Main(string[] args) {
var timer = new Stopwatch();
timer.Restart();
for (int i = 0; i < 1000; i++)
Test1("just a little test string.");
timer.Stop();
TimeSpan elapsed1 = timer.Elapsed;
timer.Restart();
for (int i = 0; i < 1000; i++)
Test2("just a little test string.");
timer.Stop();
TimeSpan elapsed2 = timer.Elapsed;
timer.Restart();
for (int i = 0; i < 1000; i++)
Test3("just a little test string.");
timer.Stop();
TimeSpan elapsed3 = timer.Elapsed;
Console.WriteLine(elapsed1);
Console.WriteLine(elapsed2);
Console.WriteLine(elapsed3);
Console.Read();
}
My output:
00:00:05.2172738
00:00:09.3893525
00:00:05.9624869
I also ran this one 10000 times and got these results:
00:00:00.6947374
00:00:09.6185047
00:00:00.8006468
This seems in keeping with what others observed. I was curious why Test3 was slower than Test1, so I wrote a fourth test:
timer.Start();
using (var stream = new StreamWriter(Console.OpenStandardOutput()))
{
for (int i = 0; i < testSize; i++)
{
Test4("just a little test string.", stream);
}
}
timer.Stop();
This one reuses the stream for each test, thus avoiding the overhead of recreating it each time. Result:
00:00:00.4090399
Although this is the fastest, it writes all the output at the end of the using block, which may not be what you are after. I would imagine that this approach would chew up more memory as well.
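For completeness, Test4 isn't shown above; presumably it is just the char loop writing to the passed-in stream, something like:
static void Test4(string str, StreamWriter stream)
{
    foreach (var c in str)
        stream.Write(c);
    stream.Write('\n');
}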

c# ref for speed

I fully understand the ref keyword in .NET.
Since it uses the same variable, would it increase speed to use ref instead of making a copy?
I find the bottleneck to be in password generation.
Here is my code:
protected internal string GetSecurePasswordString(string legalChars, int length)
{
Random myRandom = new Random();
string myString = "";
for (int i = 0; i < length; i++)
{
int charPos = myRandom.Next(0, legalChars.Length - 1);
myString = myString + legalChars[charPos].ToString();
}
return myString;
}
Is it better to use ref for legalChars?
Passing a string by value does not copy the string. It only copies the reference to the string. There's no performance benefit to passing the string by reference instead of by value.
No, you shouldn't pass the string reference by reference.
However, you are creating several strings pointlessly. If you're creating long passwords, that could be why it's a bottleneck. Here's a faster implementation:
protected internal string GetSecurePasswordString(string legalChars, int length)
{
Random myRandom = new Random();
char[] chars = new char[length];
for (int i = 0; i < length; i++)
{
int charPos = myRandom.Next(0, legalChars.Length - 1);
chars[i] = legalChars[charPos];
}
return new string(chars);
}
However, it still has three big flaws:
It creates a new instance of Random each time. If you call this method twice in quick succession, you'll get the same password twice. Bad idea.
The upper bound specified in a Random.Next() call is exclusive - so you'll never use the last character of legalChars.
It uses System.Random, which is not meant to be in any way cryptographically secure. Given that this is meant to be for a "secure password" you should consider using something like System.Security.Cryptography.RandomNumberGenerator. It's more work to do so because the API is harder, but you'll end up with a more secure system (if you do it properly).
You might also want to consider using SecureString, if you get really paranoid.
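A cryptographically stronger sketch, assuming .NET Core 3.0+ where RandomNumberGenerator.GetInt32 is available (on older frameworks you would need GetBytes plus manual range reduction):
using System.Security.Cryptography;

protected internal string GetSecurePasswordString(string legalChars, int length)
{
    var chars = new char[length];
    for (int i = 0; i < length; i++)
    {
        // GetInt32's upper bound is exclusive, so every character of legalChars can be chosen.
        int charPos = RandomNumberGenerator.GetInt32(legalChars.Length);
        chars[i] = legalChars[charPos];
    }
    return new string(chars);
}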
Strings in .NET are immutable, so all modify operations on strings result in the creation (and garbage collection) of new strings. No performance gain would be achieved by using ref in this case. Instead, use StringBuilder.
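The StringBuilder version of that loop would look something like this (a sketch; note it still inherits the Random issues called out in the answer above):
protected internal string GetSecurePasswordString(string legalChars, int length)
{
    Random myRandom = new Random();
    var myString = new StringBuilder(length);            // pre-size to avoid growing
    for (int i = 0; i < length; i++)
    {
        int charPos = myRandom.Next(legalChars.Length);   // exclusive upper bound covers all chars
        myString.Append(legalChars[charPos]);
    }
    return myString.ToString();
}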
A word about the general performance gain of passing a string ByReference ("ref") instead of ByValue:
There is a performance gain, but it is very small!
Consider the program below, where a function is called 10,000,000 times with a string argument, by value and by reference. The average times measured were:
ByValue: 249 milliseconds
ByReference: 226 milliseconds
In general "ref" is a little faster, but it's often not worth worrying about.
Here is my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
namespace StringPerformanceTest
{
class Program
{
static void Main(string[] args)
{
const int n = 10000000;
int k;
string time, s1;
Stopwatch sw;
// List for testing ("1", "2", "3" ...)
List<string> list = new List<string>(n);
for (int i = 0; i < n; i++)
list.Add(i.ToString());
// Test ByVal
k = 0;
sw = Stopwatch.StartNew();
foreach (string s in list)
{
s1 = s;
if (StringTestSubVal(s1)) k++;
}
time = GetElapsedString(sw);
Console.WriteLine("ByVal: " + time);
Console.WriteLine("123 found " + k + " times.");
// Test ByRef
k = 0;
sw = Stopwatch.StartNew();
foreach (string s in list)
{
s1 = s;
if (StringTestSubRef(ref s1)) k++;
}
time = GetElapsedString(sw);
Console.WriteLine("Time ByRef: " + time);
Console.WriteLine("123 found " + k + " times.");
}
static bool StringTestSubVal(string s)
{
if (s == "123")
return true;
else
return false;
}
static bool StringTestSubRef(ref string s)
{
if (s == "123")
return true;
else
return false;
}
static string GetElapsedString(Stopwatch sw)
{
if (sw.IsRunning) sw.Stop();
TimeSpan ts = sw.Elapsed;
return String.Format("{0:00}:{1:00}:{2:00}.{3:000}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds);
}
}
}
