ReverseString, a C# interview-question - c#

I had an interview question that asked me for my 'feedback' on a piece of code a junior programmer wrote. They hinted there may be a problem and said it will be used heavily on large strings.
public string ReverseString(string sz)
{
string result = string.Empty;
for(int i = sz.Length-1; i>=0; i--)
{
result += sz[i];
}
return result;
}
I couldn't spot it. I saw no problems whatsoever.
In hindsight I could have said the code should resize the string up front, but it looks like C# doesn't have a resize (I am a C++ guy).
I ended up writing things like: use an iterator if possible, since [x] on a container might not be random access and so may be slow, plus other miscellaneous things. But I definitely said I had never had to optimize C# code, so my thinking may not have failed me in the interview.
I wanted to know, what is the problem with this code, do you guys see it?
-edit-
I changed this into a wiki because there can be several right answers.
Also, I am so glad I explicitly said I had never had to optimize a C# program and mentioned the other miscellaneous things. Oops. I always thought C# didn't have any performance problems with this type of thing.

Most importantly? That will suck performance wise - it has to create lots of strings (one per character). The simplest way is something like:
public static string Reverse(string sz) // ideal for an extension method
{
if (string.IsNullOrEmpty(sz) || sz.Length == 1) return sz;
char[] chars = sz.ToCharArray();
Array.Reverse(chars);
return new string(chars);
}

The problem is that string concatenations are expensive to do as strings are immutable in C#. The example given will create a new string one character longer each iteration which is very inefficient. To avoid this you should use the StringBuilder class instead like so:
public string ReverseString(string sz)
{
var builder = new StringBuilder(sz.Length);
for(int i = sz.Length-1; i>=0; i--)
{
builder.Append(sz[i]);
}
return builder.ToString();
}
The StringBuilder is written specifically for scenarios like this as it gives you the ability to concatenate strings without the drawback of excessive memory allocation.
You will notice I have provided the StringBuilder with an initial capacity which you don't often see. As you know the length of the result to begin with, this removes needless memory allocations.
What normally happens is that the StringBuilder is allocated an initial amount of memory (16 characters by default). Once the contents exceed that capacity, it doubles (I think) its own capacity and carries on. This is much better than allocating memory each time, as would happen with normal strings, but if you can avoid this as well it's even better.
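To watch the capacity behaviour described above, a small sketch can record Capacity while characters are appended. The exact growth steps are an implementation detail and may differ between .NET versions, so no expected output is shown; CapacityDemo and GrowthSteps are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CapacityDemo
{
    // Appends `count` characters one at a time and records every distinct
    // Capacity value the builder reports along the way.
    public static List<int> GrowthSteps(StringBuilder builder, int count)
    {
        var steps = new List<int> { builder.Capacity };
        for (int i = 0; i < count; i++)
        {
            builder.Append('x');
            if (builder.Capacity != steps[steps.Count - 1])
                steps.Add(builder.Capacity);
        }
        return steps;
    }
}
```

Calling GrowthSteps(new StringBuilder(), 1000) should report several capacity changes, while GrowthSteps(new StringBuilder(1000), 1000) should report only the initial one, which is the saving the pre-sized constructor buys.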

A few comments on the answers given so far:
Every single one of them (so far!) will fail on surrogate pairs and combining characters. Oh the joys of Unicode. Reversing a string isn't the same as reversing a sequence of chars.
I like Marc's optimisation for null, empty, and single character inputs. In particular, not only does this get the right answer quickly, but it also handles null (which none of the other answers do)
I originally thought that ToCharArray followed by Array.Reverse would be the fastest, but it does create one "garbage" copy.
The StringBuilder solution creates a single string (not char array) and manipulates that until you call ToString. There's no extra copying involved... but there's a lot more work maintaining lengths etc.
Which is the more efficient solution? Well, I'd have to benchmark it to have any idea at all - but even so that's not going to tell the whole story. Are you using this in a situation with high memory pressure, where extra garbage is a real pain? How fast is your memory vs your CPU, etc?
As ever, readability is usually king - and it doesn't get much better than Marc's answer on that front. In particular, there's no room for an off-by-one error, whereas I'd have to actually put some thought into validating the other answers. I don't like thinking. It hurts my brain, so I try not to do it very often. Using the built-in Array.Reverse sounds much better to me. (Okay, so it still fails on surrogates etc, but hey...)
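To make the surrogate-pair point concrete, here is a minimal sketch (SurrogateDemo and its method names are invented for this example). It reverses a string char by char, the way the answers so far do, and then checks whether the result still contains only well-formed surrogate pairs.

```csharp
using System;

static class SurrogateDemo
{
    // The char-by-char reversal used by the answers so far.
    public static string ReverseChars(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // True if every surrogate in s belongs to a well-formed high+low pair.
    public static bool HasValidSurrogates(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]))
            {
                if (i + 1 >= s.Length || !char.IsLowSurrogate(s[i + 1])) return false;
                i++; // skip the low half of the pair
            }
            else if (char.IsLowSurrogate(s[i]))
            {
                return false; // low surrogate with no high surrogate before it
            }
        }
        return true;
    }
}
```

For an input such as "a\U0001F600b" the original passes the check, but the char-reversed string does not, because the two halves of the pair end up in the wrong order.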

Since strings are immutable, each += statement creates a new string by copying the string from the previous step along with the single new character. Effectively, this is an O(n^2) algorithm instead of O(n).
A faster way would be (O(n)):
// pseudocode:
static string ReverseString(string input) {
char[] buf = new char[input.Length];
for(int i = 0; i < buf.Length; ++i)
buf[i] = input[input.Length - i - 1];
return new string(buf);
}
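To put a number on the O(n^2) claim: step k of the += loop allocates a new string of length k, so building n characters copies 1 + 2 + ... + n = n(n+1)/2 characters in total. A tiny sketch that just does this arithmetic (CopyCost is a hypothetical helper, it does not measure real allocations):

```csharp
using System;

static class CopyCost
{
    // Characters written when an n-character string is built one += at a time:
    // step k produces a fresh string of length k, so the total is 1 + 2 + ... + n.
    public static long NaiveCopies(int n)
    {
        long total = 0;
        for (int k = 1; k <= n; k++) total += k;
        return total;
    }

    // Characters written by the buffer-based version: each one exactly once.
    public static long BufferCopies(int n)
    {
        return n;
    }
}
```

For n = 1000 that is 500500 character writes against 1000, and the gap widens quadratically as the string grows.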

You can do this in .NET 3.5 instead:
public static string Reverse(this string s)
{
return new String((s.ToCharArray().Reverse()).ToArray());
}

A better way to tackle it would be to use a StringBuilder: since it is not immutable, you won't get the terrible object generation behavior that you would get above. In .NET all strings are immutable, which means that the += operator there will create a new object each time it is hit. StringBuilder uses an internal buffer, so the reversal could be done in the buffer with no extra object allocations.

You should use the StringBuilder class to create your resulting string. A string is immutable, so when you append to it in each iteration of the loop, a new string has to be created, which isn't very efficient.

I prefer something like this:
using System;
using System.Text;
namespace SpringTest3
{
static class Extentions
{
static private StringBuilder ReverseStringImpl(string s, int pos, StringBuilder sb)
{
return (s.Length <= --pos || pos < 0) ? sb : ReverseStringImpl(s, pos, sb.Append(s[pos]));
}
static public string Reverse(this string s)
{
return ReverseStringImpl(s, s.Length, new StringBuilder()).ToString();
}
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine("abc".Reverse());
}
}
}

x is the string to reverse.
Stack<char> stack = new Stack<char>(x);
string s = new string(stack.ToArray());

This method cuts the number of iterations in half. Rather than starting from the end, it starts from the beginning and swaps characters until it hits center. Had to convert the string to a char array because the indexer on a string has no setter.
public string Reverse(String value)
{
if (String.IsNullOrEmpty(value)) throw new ArgumentNullException("value");
char[] array = value.ToCharArray();
for (int i = 0; i < value.Length / 2; i++)
{
char temp = array[i];
array[i] = array[(array.Length - 1) - i];
array[(array.Length - 1) - i] = temp;
}
return new string(array);
}

Necromancing.
As a public service, this is how you actually CORRECTLY reverse a string (reversing a string is NOT equal to reversing a sequence of chars)
public static class Test
{
private static System.Collections.Generic.List<string> GraphemeClusters(string s)
{
System.Collections.Generic.List<string> ls = new System.Collections.Generic.List<string>();
System.Globalization.TextElementEnumerator enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(s);
while (enumerator.MoveNext())
{
ls.Add((string)enumerator.Current);
}
return ls;
}
// this
private static string ReverseGraphemeClusters(string s)
{
if(string.IsNullOrEmpty(s) || s.Length == 1)
return s;
System.Collections.Generic.List<string> ls = GraphemeClusters(s);
ls.Reverse();
return string.Join("", ls.ToArray());
}
public static void TestMe()
{
string s = "Les Mise\u0301rables";
// s = "noël";
string r = ReverseGraphemeClusters(s);
// This would be wrong:
// char[] a = s.ToCharArray();
// System.Array.Reverse(a);
// string r = new string(a);
System.Console.WriteLine(r);
}
}
See:
https://vimeo.com/7403673
By the way, in Golang, the correct way is this:
package main
import (
"unicode"
"regexp"
)
func main() {
str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme(str))
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme2(str))
}
func ReverseGrapheme(str string) string {
buf := []rune("")
checked := false
index := 0
ret := ""
for _, c := range str {
if !unicode.Is(unicode.M, c) {
if len(buf) > 0 {
ret = string(buf) + ret
}
buf = buf[:0]
buf = append(buf, c)
if checked == false {
checked = true
}
} else if checked == false {
ret = string(append([]rune(""), c)) + ret
} else {
buf = append(buf, c)
}
index += 1
}
return string(buf) + ret
}
func ReverseGrapheme2(str string) string {
re := regexp.MustCompile("\\PM\\pM*|.")
slice := re.FindAllString(str, -1)
length := len(slice)
ret := ""
for i := 0; i < length; i += 1 {
ret += slice[length-1-i]
}
return ret
}
And the incorrect way is this (ToCharArray.Reverse):
func Reverse(s string) string {
runes := []rune(s)
for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
runes[i], runes[j] = runes[j], runes[i]
}
return string(runes)
}
Note that you need to know the difference between
- a character and a glyph
- a byte (8 bit) and a codepoint/rune (32 bit)
- a codepoint and a GraphemeCluster [32+ bit] (aka Grapheme/Glyph)
Reference:
Character is an overloaded term that can mean many things.
A code point is the atomic unit of information. Text is a sequence of
code points. Each code point is a number which is given meaning by the
Unicode standard.
A grapheme is a sequence of one or more code points that are displayed
as a single, graphical unit that a reader recognizes as a single
element of the writing system. For example, both a and ä are
graphemes, but they may consist of multiple code points (e.g. ä may be
two code points, one for the base character a followed by one for the
diaeresis; but there's also an alternative, legacy, single code point
representing this grapheme). Some code points are never part of any
grapheme (e.g. the zero-width non-joiner, or directional overrides).
A glyph is an image, usually stored in a font (which is a collection
of glyphs), used to represent graphemes or parts thereof. Fonts may
compose multiple glyphs into a single representation, for example, if
the above ä is a single code point, a font may choose to render that as
two separate, spatially overlaid glyphs. For OTF, the font's GSUB and
GPOS tables contain substitution and positioning information to make
this work. A font may contain multiple alternative glyphs for the same
grapheme, too.
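The three levels from the list above can be counted in C# roughly like this. This is a sketch with made-up names (TextUnits and its methods); how StringInfo segments more complex sequences, such as emoji joined with a zero-width joiner, depends on the .NET version, so only simple inputs are discussed here.

```csharp
using System;
using System.Globalization;

static class TextUnits
{
    // UTF-16 code units: what string.Length and the string indexer work with.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // Unicode code points: a surrogate pair counts as one.
    public static int CodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i)) i++; // skip the low half
            count++;
        }
        return count;
    }

    // Text elements as reported by StringInfo (base character plus combining marks).
    public static int TextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }
}
```

For "e\u0301\U0001F600" (an e with a combining acute accent followed by one emoji) this gives 4 code units, 3 code points and 2 text elements.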

static string reverseString(string text)
{
Char[] a = text.ToCharArray();
string b = "";
for (int q = a.Count() - 1; q >= 0; q--)
{
b = b + a[q].ToString();
}
return b;
}

Related

Property or indexer 'string.this[int]' cannot be assigned to -- it's read only

I didn't get the problem - I was trying to do a simple action:
for(i = x.Length-1, j = 0 ; i >= 0 ; i--, j++)
{
backx[j] = x[i];
}
Both are declared:
String x;
String backx;
What is the problem? I get the error in the title.
If there is a problem - is there another way to do that?
The result (As the name 'backx' hints) is that backx will contain the string X backwards.
P.S. x is not empty - it contains a substring from another string.
Strings are immutable: you can retrieve the character at a certain position, but you cannot change the character to a new one directly.
Instead you'll have to build a new string with the change. There are several ways to do this, but StringBuilder does the job in a similar fashion to what you already have:
StringBuilder sb = new StringBuilder(backx);
sb[j] = x[i];
backx = sb.ToString();
EDIT: If you take a look at the string public facing API, you'll see this indexer:
public char this[int index] { get; }
This shows that you can "get" a value, but because no "set" is available, you cannot assign values to that indexer.
EDITx2: If you're looking for a way to reverse a string, there are a few different ways, but here's one example with an explanation as to how it works: http://www.dotnetperls.com/reverse-string
String is immutable in .NET - this is why you get the error.
You can get a reverse string with LINQ:
string x = "abcd";
string backx = new string(x.Reverse().ToArray());
Console.WriteLine(backx); // output: "dcba"
Strings are immutable. You have to convert to a char array, and then you will be able to modify it.
Or you can use StringBuilder.
For example:
char[] wordArray = word.ToCharArray();
In C# strings are immutable. You cannot "set" the Xth character to whatever you want. If you want to construct a new string, or be able to "edit" a string, use e.g. the StringBuilder class.
Strings are immutable in C#. You can read more about it here: http://msdn.microsoft.com/en-us/library/362314fe.aspx
Both the variables you have are strings, while you are treating them as if they were arrays (well, they are). Of course it is valid to read characters from a string through this mechanism, but you cannot assign to it that way.
Since you are trying to reverse a string, do take a look at this post. It has lot of information.
public static string ReverseName( string theName)
{
string revName = string.Empty;
foreach (char a in theName)
{
revName = a + revName;
}
return revName;
}
This is simple and does not involve arrays directly.
The code below simply swaps the index of each char in the string which enables you to only have to iterate half way through the original string which is pretty efficient if you're dealing with a lot of characters. The result is the original string reversed. I tested this with a string consisting of 100 characters and it executed in 0.0000021 seconds.
private string ReverseString(string testString)
{
int j = testString.Length - 1;
char[] charArray = new char[testString.Length];
for (int i = 0; i <= j; i++)
{
if (i != j)
{
charArray[i] = testString[j];
charArray[j] = testString[i];
}
j--;
}
return new string(charArray);
}
In case you need to replace, e.g., index 2 in a string, use this (it is ugly, but it is easily maintainable). Note that Replace returns a new string, so the result must be assigned, and that it replaces every occurrence of that character, not only the one at index 2.
V1 - you know what you want to put there. Here you are saying, in pseudocode, string[2] = 'R';
row3String = row3String.Replace(row3String[2], 'R');
V2 - you need to put either char R or char Y there. Here string[2] = 'R' if it was 'Y', otherwise 'Y' (this one-line if needs some form of else)
row3String = row3String.Replace(row3String[2], row3String[2].Equals('Y') ? 'R' : 'Y');
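If only the character at one position should change, one option is to go through the StringBuilder indexer, which does have a setter. A minimal sketch, assuming a helper called ReplaceAt (the name is made up here):

```csharp
using System.Text;

static class StringEdit
{
    // Returns a copy of s with only the character at `index` replaced.
    // The StringBuilder indexer throws if index is outside the string.
    public static string ReplaceAt(string s, int index, char replacement)
    {
        var sb = new StringBuilder(s);
        sb[index] = replacement;
        return sb.ToString();
    }
}
```

The difference from String.Replace(char, char) shows up as soon as the character occurs more than once: for "YYY", ReplaceAt changes one position, while Replace('Y', 'R') changes all three.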

.NET StringBuilder - check if ends with string

What is the best (shortest and fastest) way to check if StringBuilder ends with specific string?
If I want to check just one char, that's not a problem sb[sb.Length-1] == 'c', but how to check if it's ends with longer string?
I can think about something like looping from "some string".Length and read characters one by one, but maybe there exists something more simple? :)
At the end I want to have extension method like this:
StringBuilder sb = new StringBuilder("Hello world");
bool hasString = sb.EndsWith("world");
To avoid the performance overhead of generating the full string, you can use the ToString(int,int) overload that takes the index range.
public static bool EndsWith(this StringBuilder sb, string test)
{
if (sb.Length < test.Length)
return false;
string end = sb.ToString(sb.Length - test.Length, test.Length);
return end.Equals(test);
}
Edit: It would probably be desirable to define an overload that takes a StringComparison argument:
public static bool EndsWith(this StringBuilder sb, string test)
{
return EndsWith(sb, test, StringComparison.CurrentCulture);
}
public static bool EndsWith(this StringBuilder sb, string test,
StringComparison comparison)
{
if (sb.Length < test.Length)
return false;
string end = sb.ToString(sb.Length - test.Length, test.Length);
return end.Equals(test, comparison);
}
Edit2: As pointed out by Tim S in the comments, there is a flaw in my answer (and all other answers that assume character-based equality) that affects certain Unicode comparisons. Unicode does not require two (sub)strings to have the same sequence of characters to be considered equal. For example, the precomposed character é should be treated as equal to the character e followed by the combining mark U+0301.
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
string s = "We met at the cafe\u0301";
Console.WriteLine(s.EndsWith("café")); // True
StringBuilder sb = new StringBuilder(s);
Console.WriteLine(sb.EndsWith("café")); // False
If you want to handle these cases correctly, it might be easiest to just call StringBuilder.ToString(), and then use the built-in String.EndsWith.
On msdn you can find the topic on how to search text in the StringBuilder object. The two options available to you are:
Call ToString and search the returned String object.
Use the Chars property to sequentially search a range of characters.
Since the first option is out of the question, you'll have to go with the Chars property.
public static class StringBuilderExtensions
{
public static bool EndsWith(this StringBuilder sb, string text)
{
if (sb.Length < text.Length)
return false;
var sbLength = sb.Length;
var textLength = text.Length;
for (int i = 1; i <= textLength; i++)
{
if (text[textLength - i] != sb[sbLength - i])
return false;
}
return true;
}
}
TL;DR
If your goal is to get a piece or the whole of the StringBuilder's contents in a String object, you should use its ToString function. But if you aren't yet done creating your string, it's better to treat the StringBuilder as a character array and operate on it that way than to create a bunch of strings you don't need.
String operations on a character array can become complicated by localization or encoding, since a string can be encoded in many ways (UTF8 or Unicode, for example), but its characters (System.Char) are meant to be 16-bit UTF16 values.
I've written the following method which returns the index of a string if it exists within the StringBuilder and -1 otherwise. You can use this to create the other common String methods like Contains, StartsWith, and EndsWith. This method is preferable to others because it should handle localization and casing properly, and does not force you to call ToString on the StringBuilder. It creates one garbage value if you specify that case should be ignored, and you can fix this to maximize memory savings by using Char.ToLower instead of precomputing the lower case of the string like I do in the function below. EDIT: Also, if you're working with a string encoded in UTF32, you'll have to compare two characters at a time instead of just one.
You're probably better off using ToString unless you're going to be looping, working with large strings, and doing manipulation or formatting.
public static int IndexOf(this StringBuilder stringBuilder, string str, int startIndex = 0, int? count = null, CultureInfo culture = null, bool ignoreCase = false)
{
if (stringBuilder == null)
throw new ArgumentNullException("stringBuilder");
// No string to find.
if (str == null)
throw new ArgumentNullException("str");
if (str.Length == 0)
return -1;
// Make sure the start index is valid.
if (startIndex < 0 || startIndex > stringBuilder.Length)
throw new ArgumentOutOfRangeException("startIndex", startIndex, "The index must refer to a character within the string.");
// Now that we've validated the parameters, let's figure out how many characters there are to search.
var maxPositions = stringBuilder.Length - str.Length - startIndex;
if (maxPositions < 0) return -1; // zero still leaves one position to check (x == startIndex)
// If a count argument was supplied, make sure it's within range.
if (count.HasValue && (count <= 0 || count > maxPositions))
throw new ArgumentOutOfRangeException("count");
// Ensure that "count" has a value.
maxPositions = count ?? maxPositions;
if (count <= 0) return -1;
// If no culture is specified, use the current culture. This is how the string functions behave but
// in the case that we're working with a StringBuilder, we probably should default to Ordinal.
culture = culture ?? CultureInfo.CurrentCulture;
// If we're ignoring case, we need all the characters to be in culture-specific
// lower case for when we compare to the StringBuilder.
if (ignoreCase) str = str.ToLower(culture);
// Where the actual work gets done. Iterate through the string one character at a time.
for (int y = 0, x = startIndex, endIndex = startIndex + maxPositions; x <= endIndex; x++, y = 0)
{
// y is set to 0 at the beginning of the loop, and it is increased when we match the characters
// with the string we're searching for.
while (y < str.Length && str[y] == (ignoreCase ? Char.ToLower(stringBuilder[x + y], culture) : stringBuilder[x + y]))
y++;
// The while loop will stop early if the characters don't match. If it didn't stop
// early, that means we found a match, so we return the index of where we found the
// match.
if (y == str.Length)
return x;
}
// No matches.
return -1;
}
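As a rough illustration of building the other methods on top of an index search, here is a deliberately simplified, ordinal-only sketch. It is not the method above: the names OrdinalIndexOf, OrdinalContains and OrdinalEndsWith are invented for this example, there is no culture or case handling, and an empty search string is treated as found at startIndex.

```csharp
using System;
using System.Text;

static class StringBuilderSearch
{
    // First index >= startIndex at which value occurs (char-by-char, ordinal), or -1.
    public static int OrdinalIndexOf(this StringBuilder sb, string value, int startIndex = 0)
    {
        if (sb == null) throw new ArgumentNullException("sb");
        if (value == null) throw new ArgumentNullException("value");
        if (startIndex < 0 || startIndex > sb.Length)
            throw new ArgumentOutOfRangeException("startIndex");

        for (int x = startIndex; x <= sb.Length - value.Length; x++)
        {
            int y = 0;
            while (y < value.Length && sb[x + y] == value[y]) y++;
            if (y == value.Length) return x;
        }
        return -1;
    }

    public static bool OrdinalContains(this StringBuilder sb, string value)
    {
        return sb.OrdinalIndexOf(value) >= 0;
    }

    // Only one start position can match, so search from exactly that position.
    public static bool OrdinalEndsWith(this StringBuilder sb, string value)
    {
        int start = sb.Length - value.Length;
        return start >= 0 && sb.OrdinalIndexOf(value, start) == start;
    }
}
```

None of these calls ToString, so no temporary strings are created while the builder is still being filled.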
The primary reason one generally uses a StringBuilder object rather than concatenating strings is because of the memory overhead you incur since strings are immutable. The performance hit you see when you do excessive string manipulation without using a StringBuilder is often the result of collecting all the garbage strings you created along the way.
Take this for example:
string firstString = "1st",
secondString = "2nd",
thirdString = "3rd",
fourthString = "4th";
string all = firstString;
all += " & " + secondString;
all += " &" + thirdString;
all += "& " + fourthString + ".";
If you were to run this and open it up in a memory profiler, you'd find a set of strings that look something like this:
"1st", "2nd", "3rd", "4th",
" & ", " & 2nd", "1st & 2nd"
" &", "&3rd", "1st & 2nd &3rd"
"& ", "& 4th", "& 4th."
"1st & 2nd &3rd& 4th."
That's fourteen total objects we created in that scope, but if you don't realize that every single addition operator creates a whole new string every time you might think there's only five. So what happens to the nine other strings? They languish away in memory until the garbage collector decides to pick them up.
So now to my point: if you're trying to find something out about a StringBuilder object and you don't want to call ToString(), it probably means you aren't done building that string yet. And if you're trying to find out if the builder ends with "Foo", it's wasteful to call sb.ToString(sb.Length - 3, 3) == "Foo" because you're creating another string object that becomes orphaned and obsolete the minute you make the call.
My guess is that you're running a loop aggregating text into your StringBuilder and you want to end the loop or just do something different if the last few characters are some sentinel value you're expecting.
private static bool EndsWith(this StringBuilder builder, string value) {
return builder.GetLast( value.Length ).SequenceEqual( value );
}
private static IEnumerable<char> GetLast(this StringBuilder builder, int count) {
count = Math.Min( count, builder.Length );
return Enumerable.Range( builder.Length - count, count ).Select( i => builder[ i ] );
}
I'm giving you what you asked for (with the limitations you state) but not the best way to do it. Something like:
StringBuilder sb = new StringBuilder("Hello world");
bool hasString = sb.Remove(0, sb.Length - "world".Length).ToString() == "world"; // note: Remove modifies sb itself

Replace a list of invalid character with their valid version (like tr)

I need to do something like this dreamed .trReplace:
str = str.trReplace("áéíüñ","aeiu&");
It should change this string:
a stríng with inválid charactérs
to:
a string with invalid characters
My current ideas are:
str = str.Replace("á","a").Replace("é","e").Replace("í","i"...
and:
sb = new StringBuilder(str);
sb.Replace("á","a");
sb.Replace("é","e");
sb.Replace("í","i"...
But I don't think they are efficient for long strings.
Richard has a good answer, but performance may suffer slightly on longer strings (about 25% slower than the straight string replace shown in the question). I felt compelled to look into this a little further. There are actually several good related answers already on StackOverflow, as captured below:
Fastest way to remove chars from string
C# Stripping / converting one or more characters
There is also a good article on the CodeProject covering the different options.
http://www.codeproject.com/KB/string/fastestcscaseinsstringrep.aspx
The function provided in Richard's answer gets slower with longer strings because the replacements happen one character at a time; if you have long runs of non-mapped characters, you waste extra cycles re-appending the string piece by piece. As such, if you take a few points from the CodeProject article, you end up with a slightly modified version of Richard's answer that looks like:
private static readonly Char[] ReplacementChars = new[] { 'á', 'é', 'í', 'ü', 'ñ' };
private static readonly Dictionary<Char, Char> ReplacementMappings = new Dictionary<Char, Char>
{
{ 'á', 'a'},
{ 'é', 'e'},
{ 'í', 'i'},
{ 'ü', 'u'},
{ 'ñ', '&'}
};
private static string Translate(String source)
{
var startIndex = 0;
var currentIndex = 0;
var result = new StringBuilder(source.Length);
while ((currentIndex = source.IndexOfAny(ReplacementChars, startIndex)) != -1)
{
result.Append(source.Substring(startIndex, currentIndex - startIndex));
result.Append(ReplacementMappings[source[currentIndex]]);
startIndex = currentIndex + 1;
}
if (startIndex == 0)
return source;
result.Append(source.Substring(startIndex));
return result.ToString();
}
NOTE Not all edge cases have been tested.
NOTE Could replace ReplacementChars with ReplacementMappings.Keys.ToArray() for a slight cost.
Assuming that NOT every character is a replacement char, this will actually run slightly faster than straight string replacements (again, about 20%).
That being said, remember when considering performance cost, what we are actually talking about... in this case... the difference between the optimized solution and original solution is about 1 second over 100,000 iterations on a 1,000 character string.
Either way, just wanted to add some information to the answers for this question.
I did something similar for ICAO Passports. The names had to be 'transliterated'. Basically I had a Dictionary of char to char mappings.
Dictionary<char, char> mappings;
static public string Translate(string s)
{
var t = new StringBuilder(s.Length);
foreach (char c in s)
{
char to;
if (mappings.TryGetValue(c, out to))
t.Append(to);
else
t.Append(c);
}
return t.ToString();
}
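Along the same lines, the trReplace from the question could be sketched as an extension method that builds the char-to-char dictionary from the two strings. TrReplace is a hypothetical name, and the sketch assumes precomposed characters (one char per accented letter) and equally long from/to strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class TrExtensions
{
    // Replaces each character found in `from` with the character at the
    // same position in `to`, in a single pass over the input.
    public static string TrReplace(this string s, string from, string to)
    {
        if (from.Length != to.Length)
            throw new ArgumentException("from and to must have the same length");

        var map = new Dictionary<char, char>(from.Length);
        for (int i = 0; i < from.Length; i++)
            map[from[i]] = to[i];

        var result = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            char mapped;
            result.Append(map.TryGetValue(c, out mapped) ? mapped : c);
        }
        return result.ToString();
    }
}
```

With the question's own input, "a stríng with inválid charactérs".TrReplace("áéíüñ", "aeiu&") yields "a string with invalid characters". If the same mapping is used many times, building the dictionary once and reusing it would avoid the setup cost on every call.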
What you want is a way to go through the string once and do all the replacements. I am not sure that regex is the best way to do it if you want efficiency. It could very well be that a case switch (for all the characters that you want to replace) in a for loop to test every character is faster. I would profile the two approaches.
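A minimal sketch of that switch-in-a-loop idea, using the five characters from the question (TranslateSwitch is a made-up name, and whether this beats the other approaches is exactly what profiling would have to show):

```csharp
static class SwitchTranslate
{
    // One pass over a copy of the characters; the switch handles every
    // character that needs replacing and leaves the rest untouched.
    public static string TranslateSwitch(string s)
    {
        char[] array = s.ToCharArray();
        for (int i = 0; i < array.Length; i++)
        {
            switch (array[i])
            {
                case 'á': array[i] = 'a'; break;
                case 'é': array[i] = 'e'; break;
                case 'í': array[i] = 'i'; break;
                case 'ü': array[i] = 'u'; break;
                case 'ñ': array[i] = '&'; break;
            }
        }
        return new string(array);
    }
}
```

The obvious trade-off is that the mapping is hard-coded, so it only suits a fixed, small set of replacements.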
It would be better to use an array of char instead of Stringbuilder.
The indexer is faster than calling the Append method, because each Append call has to:
push all local variables to the stack
move to Append address
return to address
pop all local variables from the stack
The example below is about 20 percent faster (depends on your hardware and input string)
static Dictionary<char, char> mappings;
public static string TranslateV2(string s)
{
var len = s.Length;
var array = new char[len];
char c;
for (var index = 0; index < len; index++)
{
c = s[index];
if (mappings.ContainsKey(c))
array[index] = mappings[c];
else
array[index] = c;
}
return new string(array);
}

how to replace characters in a array quickly

I am using a XML Text reader on a XML file that may contain characters that are invalid for the reader. My initial thought was to create my own version of the stream reader and clean out the bad characters but it is severely slowing down my program.
public class ClensingStream : StreamReader
{
private static char[] badChars = { '\x00', '\x09', '\x0A', '\x10' };
//snip
public override int Read(char[] buffer, int index, int count)
{
var tmp = base.Read(buffer, index, count);
for (int i = 0; i < buffer.Length; ++i)
{
//check the element in the buffer to see if it is one of the bad characters.
if(badChars.Contains(buffer[i]))
buffer[i] = ' ';
}
return tmp;
}
}
According to my profiler, the code is spending 88% of its time in if(badChars.Contains(buffer[i])). What is the correct way to do this so I am not causing horrible slowness?
The reason that it spends so much time in that line is because the Contains method loops through the array to look for the character.
Put the characters in a HashSet<char> instead:
private static HashSet<char> badChars =
new HashSet<char>(new char[] { '\x00', '\x09', '\x0A', '\x10' });
The code to check if the set contains the character looks the same as when looking in the array, but it uses the hash code of the character to look for it instead of looping through all the items in the array.
Alternatively, you could put the characters in a switch, that way the compiler would create an efficient comparison:
switch (buffer[i]) {
case '\x00':
case '\x09':
case '\x0A':
case '\x10': buffer[i] = ' '; break;
}
If you have more characters (five or six IIRC), the compiler will actually create a hash table to look up the cases, so that would be similar to using a HashSet.
You might have better results with a switch statement:
switch (buffer[i])
{
case '\x00':
case '\x09':
case '\x0A':
case '\x10':
buffer[i] = ' ';
break;
}
This should be compiled down to fast code by the JIT compiler at runtime. Heck, the compiler might get close too. You don't need a method call this way either.
You could use regular expressions for that which should be optimized. Read the text into a string and use Replace with your characters in the regular expression afterwards.
However, your code also looks fine to me, I guess regex also can't do anything else than searching through your text... and you need to take a string there which you don't need to do with the other options.
You could check how well it optimises if you only check the chars that were actually read, making it:
for (int i = index; i < index + count; i++){
//etc
}
Don't know if/how much this would help you, you'd have to profile your real world application to check
Try converting the char[] to a string and then using IndexOfAny.
You could use a boolean array
char[] badChars = { '\x00', '\x09', '\x0A', '\x10' };
char maxChar = badChars.Max();
Debug.Assert(maxChar < 256);
bool[] badCharsTable = new bool[maxChar + 1];
Array.ForEach(badChars, ch => badCharsTable[ch] = true);
and replace badChars.Contains(...) with (ch < badCharsTable.Length && badCharsTable[ch]).
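Put together, the lookup-table idea might look like the sketch below. BadCharTable and Clean are hypothetical names; the bad characters are the ones from the question, and the loop only touches the index..index+count range rather than the whole buffer.

```csharp
static class BadCharTable
{
    private static readonly char[] BadChars = { '\x00', '\x09', '\x0A', '\x10' };
    private static readonly bool[] Table = BuildTable(BadChars);

    // Table[ch] is true exactly for the bad characters; the table is only
    // as long as the largest bad character requires.
    private static bool[] BuildTable(char[] bad)
    {
        int max = 0;
        foreach (char ch in bad)
            if (ch > max) max = ch;

        var table = new bool[max + 1];
        foreach (char ch in bad)
            table[ch] = true;
        return table;
    }

    // Replaces bad characters with a space, looking only at buffer[index .. index + count - 1].
    public static void Clean(char[] buffer, int index, int count)
    {
        for (int i = index; i < index + count; i++)
        {
            char ch = buffer[i];
            if (ch < Table.Length && Table[ch])
                buffer[i] = ' ';
        }
    }
}
```

Inside an overridden Read, the count passed to Clean would be the number of characters the base call reports as read. Each check is then a bounds comparison plus one array read, with no method call per character.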

Most efficient way to concatenate strings?

What's the most efficient way to concatenate strings?
Rico Mariani, the .NET Performance guru, had an article on this very subject. It's not as simple as one might suspect. The basic advice is this:
If your pattern looks like:
x = f1(...) + f2(...) + f3(...) + f4(...)
that's one concat and it's zippy, StringBuilder probably won't help.
If your pattern looks like:
if (...) x += f1(...)
if (...) x += f2(...)
if (...) x += f3(...)
if (...) x += f4(...)
then you probably want StringBuilder.
Yet another article to support this claim comes from Eric Lippert where he describes the optimizations performed on one line + concatenations in a detailed manner.
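In C# the two patterns from the advice above might look like this (a sketch with invented names; the string parameters stand in for the f1(...)..f4(...) calls):

```csharp
using System.Text;

static class ConcatPatterns
{
    // Pattern 1: a single expression. The compiler emits one String.Concat
    // call for the whole chain, so no intermediate strings are built.
    public static string AllAtOnce(string a, string b, string c, string d)
    {
        return a + b + c + d;
    }

    // Pattern 2: conditional appends. Each += would copy everything built
    // so far, so the pieces are collected in a StringBuilder instead.
    public static string Conditional(string[] parts, bool[] include)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            if (include[i]) sb.Append(parts[i]);
        }
        return sb.ToString();
    }
}
```

With only four conditional pieces the difference is unlikely to matter much; the StringBuilder version pays off as the number of appends grows.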
The StringBuilder.Append() method is much better than using the + operator. But I've found that, when executing 1000 concatenations or less, String.Join() is even more efficient than StringBuilder.
StringBuilder sb = new StringBuilder();
sb.Append(someString);
The only problem with String.Join is that you have to concatenate the strings with a common delimiter.
Edit: as @ryanversaw pointed out, you can make the delimiter string.Empty.
string key = String.Join("_", new String[]
{ "Customers_Contacts", customerID, database, SessionID });
There are 6 types of string concatenations:
Using the plus (+) symbol.
Using string.Concat().
Using string.Join().
Using string.Format().
Using string.Append().
Using StringBuilder.
In an experiment, string.Concat() proved to be the best approach when there are fewer than roughly 1000 words; with more than 1000 words, StringBuilder should be used.
For more information, check this site.
string.Join() vs string.Concat()
The string.Concat method here is equivalent to the string.Join method invocation with an empty separator. Appending an empty string is fast, but not doing so is even faster, so the string.Concat method would be superior here.
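The equivalence is easy to check with a small sketch (JoinVsConcat is a name made up for the example); it says nothing about which call is faster, only that they produce the same string:

```csharp
using System;

static class JoinVsConcat
{
    // Joins with an empty separator.
    public static string ViaJoin(string[] parts)
    {
        return string.Join(string.Empty, parts);
    }

    // Concatenates the array directly, with no separator involved at all.
    public static string ViaConcat(string[] parts)
    {
        return string.Concat(parts);
    }
}
```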
From Chinh Do - StringBuilder is not always faster:
Rules of Thumb
When concatenating three dynamic string values or less, use traditional string concatenation.
When concatenating more than three dynamic string values, use StringBuilder.
When building a big string from several string literals, use either the @ string literal or the inline + operator.
Most of the time StringBuilder is your best bet, but there are cases as shown in that post that you should at least think about each situation.
If you're operating in a loop, StringBuilder is probably the way to go; it saves you the overhead of creating new strings regularly. In code that'll only run once, though, String.Concat is probably fine.
However, Rico Mariani (.NET optimization guru) made up a quiz in which he stated at the end that, in most cases, he recommends String.Format.
Here is the fastest method I've evolved over a decade for my large-scale NLP app. I have variations for IEnumerable<T> and other input types, with and without separators of different types (Char, String), but here I show the simple case of concatenating all strings in an array into a single string, with no separator. Latest version here is developed and unit-tested on C# 7 and .NET 4.7.
There are two keys to higher performance; the first is to pre-compute the exact total size required. This step is trivial when the input is an array as shown here. For handling IEnumerable<T> instead, it is worth first gathering the strings into a temporary array for computing that total (The array is required to avoid calling ToString() more than once per element since technically, given the possibility of side-effects, doing so could change the expected semantics of a 'string join' operation).
Next, given the total allocation size of the final string, the biggest boost in performance is gained by building the result string in-place. Doing this requires the (perhaps controversial) technique of temporarily suspending the immutability of a new String which is initially allocated full of zeros. Any such controversy aside, however...
...note that this is the only bulk-concatenation solution on this page which entirely avoids an extra round of allocation and copying by the String constructor.
Complete code:
/// <summary>
/// Concatenate the strings in 'rg', none of which may be null, into a single String.
/// </summary>
public static unsafe String StringJoin(this String[] rg)
{
int i;
if (rg == null || (i = rg.Length) == 0)
return String.Empty;
if (i == 1)
return rg[0];
String s, t;
int cch = 0;
do
cch += rg[--i].Length;
while (i > 0);
if (cch == 0)
return String.Empty;
i = rg.Length;
fixed (Char* _p = (s = new String(default(Char), cch)))
{
Char* pDst = _p + cch;
do
if ((t = rg[--i]).Length > 0)
fixed (Char* pSrc = t)
memcpy(pDst -= t.Length, pSrc, (UIntPtr)(t.Length << 1));
while (pDst > _p);
}
return s;
}
[DllImport("MSVCR120_CLR0400", CallingConvention = CallingConvention.Cdecl)]
static extern unsafe void* memcpy(void* dest, void* src, UIntPtr cb);
I should mention that this code has a slight modification from what I use myself. In the original, I call the cpblk IL instruction from C# to do the actual copying. For simplicity and portability in the code here, I replaced that with P/Invoke memcpy instead, as you can see. For highest performance on x64 (but maybe not x86) you may want to use the cpblk method instead.
From this MSDN article:
There is some overhead associated with creating a StringBuilder object, both in time and memory. On a machine with fast memory, a StringBuilder becomes worthwhile if you're doing about five operations. As a rule of thumb, I would say 10 or more string operations is a justification for the overhead on any machine, even a slower one.
So if you trust MSDN, go with StringBuilder if you have to do more than 10 string operations/concatenations; otherwise simple string concatenation with '+' is fine.
Try these two pieces of code and you will find the solution.
static void Main(string[] args)
{
StringBuilder s = new StringBuilder();
for (int i = 0; i < 10000000; i++)
{
s.Append( i.ToString());
}
Console.Write("End");
Console.Read();
}
Vs
static void Main(string[] args)
{
string s = "";
for (int i = 0; i < 10000000; i++)
{
s += i.ToString();
}
Console.Write("End");
Console.Read();
}
You will find that the first version finishes really quickly and uses a reasonable amount of memory.
The second version may be OK on memory, but it will take longer... much longer.
So if you have an application with a lot of users and you need speed, use the first. If you have a short-lived, single-user app, you can probably use either one, and the second will feel more "natural" to developers.
Cheers.
It's also important to point out that you should use the + operator if you are concatenating string literals.
When you concatenate string literals or string constants by using the + operator, the compiler creates a single string. No run time concatenation occurs.
How to: Concatenate Multiple Strings (C# Programming Guide)
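One way to see this for yourself, as a sketch with hypothetical names: a const initializer must be a compile-time constant, so the fact that the declaration below compiles shows that the + between constants is evaluated by the compiler.

```csharp
public static class LiteralConcat
{
    private const string Prefix = "Customers";

    // A const must be initialized with a compile-time constant expression.
    // This line compiling means the concatenation is folded by the compiler
    // into the single string "Customers_Contacts".
    public const string Key = Prefix + "_" + "Contacts";
}
```

Replacing Prefix with a non-constant field would make the declaration of Key fail to compile, because the concatenation would then have to happen at run time.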
Adding to the other answers, please keep in mind that StringBuilder can be told an initial amount of memory to allocate.
The capacity parameter defines the maximum number of characters that can be stored in the memory allocated by the current instance. Its value is assigned to the Capacity property. If the number of characters to be stored in the current instance exceeds this capacity value, the StringBuilder object allocates additional memory to store them.
If capacity is zero, the implementation-specific default capacity is used.
Repeatedly appending to a StringBuilder that hasn't been pre-allocated can result in a lot of unnecessary allocations just like repeatedly concatenating regular strings.
If you know how long the final string will be, can trivially calculate it, or can make an educated guess about the common case (allocating too much isn't necessarily a bad thing), you should be providing this information to the constructor or the Capacity property. This matters especially when running performance tests to compare StringBuilder with other methods like String.Concat, which do the same thing internally. Any test you see online which doesn't include StringBuilder pre-allocation in its comparisons is wrong.
If you can't make any kind of guess about the size, you're probably writing a utility function which should have its own optional argument for controlling pre-allocation.
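A minimal sketch of pre-allocation, assuming the final length can be computed cheaply from the inputs (the helper name is hypothetical): the lengths are summed first and passed to the StringBuilder constructor so the buffer never has to grow.

```csharp
using System.Text;

public static class PreSizedBuilder
{
    public static string ConcatAll(string[] parts)
    {
        // First pass: compute the exact size of the result.
        int total = 0;
        foreach (string part in parts)
        {
            if (part != null)
                total += part.Length;
        }

        // Passing the size as the initial capacity avoids regrowth.
        // A capacity of zero falls back to the default capacity.
        var sb = new StringBuilder(total);

        // Second pass: append every part. Appending null adds nothing.
        foreach (string part in parts)
            sb.Append(part);

        return sb.ToString();
    }
}
```

When only a rough estimate is available, passing that estimate is still likely to reduce the number of reallocations compared with the default capacity.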
The following may be one more alternative for concatenating multiple strings.
String str1 = "sometext";
string str2 = "some other text";
string afterConcate = $"{str1}{str2}";
string interpolation
Another solution:
inside the loop, add each piece to a List<string> instead of appending to a string.
List<string> lst= new List<string>();
for(int i=0; i<100000; i++){
...........
lst.Add(...);
}
return String.Join("", lst.ToArray());
It is very, very fast.
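A self-contained variant of the same idea, as a sketch with a hypothetical method name: since .NET 4, String.Join also accepts an IEnumerable<string>, so the list can be passed directly without the ToArray() copy.

```csharp
using System.Collections.Generic;

public static class ListJoin
{
    // Collects the decimal form of 0..count-1 and joins them in one step.
    public static string BuildDigits(int count)
    {
        var pieces = new List<string>(count);
        for (int i = 0; i < count; i++)
            pieces.Add(i.ToString());

        // Join allocates the result once, after all pieces are known.
        return string.Join(string.Empty, pieces);
    }
}
```

For example, ListJoin.BuildDigits(5) returns "01234".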
The most efficient is to use StringBuilder, like so:
StringBuilder sb = new StringBuilder();
sb.Append("string1");
sb.Append("string2");
...etc...
String strResult = sb.ToString();
@jonezy: String.Concat is fine if you have a couple of small things. But if you're concatenating megabytes of data, your program will likely tank.
System.String is immutable. When we modify the value of a string variable, new memory is allocated for the new value and the previous allocation becomes eligible for garbage collection. System.StringBuilder was designed around the concept of a mutable string, where a variety of operations can be performed without allocating a separate memory location for the modified string.
I've tested all the methods on this page, and in the end I developed my own solution, which is the fastest and the least memory-intensive.
Note: tested in Framework 4.8
[MemoryDiagnoser]
public class StringConcatSimple
{
private string
title = "Mr.", firstName = "David", middleName = "Patrick", lastName = "Callan";
[Benchmark]
public string FastConcat()
{
return FastConcat(
title, " ",
firstName, " ",
middleName, " ",
lastName);
}
[Benchmark]
public string StringBuilder()
{
var stringBuilder =
new StringBuilder();
return stringBuilder
.Append(title).Append(' ')
.Append(firstName).Append(' ')
.Append(middleName).Append(' ')
.Append(lastName).ToString();
}
[Benchmark]
public string StringBuilderExact24()
{
var stringBuilder =
new StringBuilder(24);
return stringBuilder
.Append(title).Append(' ')
.Append(firstName).Append(' ')
.Append(middleName).Append(' ')
.Append(lastName).ToString();
}
[Benchmark]
public string StringBuilderEstimate100()
{
var stringBuilder =
new StringBuilder(100);
return stringBuilder
.Append(title).Append(' ')
.Append(firstName).Append(' ')
.Append(middleName).Append(' ')
.Append(lastName).ToString();
}
[Benchmark]
public string StringPlus()
{
return title + ' ' + firstName + ' ' +
middleName + ' ' + lastName;
}
[Benchmark]
public string StringFormat()
{
return string.Format("{0} {1} {2} {3}",
title, firstName, middleName, lastName);
}
[Benchmark]
public string StringInterpolation()
{
return
$"{title} {firstName} {middleName} {lastName}";
}
[Benchmark]
public string StringJoin()
{
return string.Join(" ", title, firstName,
middleName, lastName);
}
[Benchmark]
public string StringConcat()
{
return string.
Concat(new String[]
{ title, " ", firstName, " ",
middleName, " ", lastName });
}
}
Yes, it uses unsafe code:
public static unsafe string FastConcat(string str1, string str2, string str3, string str4, string str5, string str6, string str7)
{
var capacity = 0;
var str1Length = 0;
var str2Length = 0;
var str3Length = 0;
var str4Length = 0;
var str5Length = 0;
var str6Length = 0;
var str7Length = 0;
if (str1 != null)
{
str1Length = str1.Length;
capacity = str1Length;
}
if (str2 != null)
{
str2Length = str2.Length;
capacity += str2Length;
}
if (str3 != null)
{
str3Length = str3.Length;
capacity += str3Length;
}
if (str4 != null)
{
str4Length = str4.Length;
capacity += str4Length;
}
if (str5 != null)
{
str5Length = str5.Length;
capacity += str5Length;
}
if (str6 != null)
{
str6Length = str6.Length;
capacity += str6Length;
}
if (str7 != null)
{
str7Length = str7.Length;
capacity += str7Length;
}
string result = new string(' ', capacity);
fixed (char* dest = result)
{
var x = dest;
if (str1Length > 0)
{
fixed (char* src = str1)
{
Unsafe.CopyBlock(x, src, (uint)str1Length * 2);
x += str1Length;
}
}
if (str2Length > 0)
{
fixed (char* src = str2)
{
Unsafe.CopyBlock(x, src, (uint)str2Length * 2);
x += str2Length;
}
}
if (str3Length > 0)
{
fixed (char* src = str3)
{
Unsafe.CopyBlock(x, src, (uint)str3Length * 2);
x += str3Length;
}
}
if (str4Length > 0)
{
fixed (char* src = str4)
{
Unsafe.CopyBlock(x, src, (uint)str4Length * 2);
x += str4Length;
}
}
if (str5Length > 0)
{
fixed (char* src = str5)
{
Unsafe.CopyBlock(x, src, (uint)str5Length * 2);
x += str5Length;
}
}
if (str6Length > 0)
{
fixed (char* src = str6)
{
Unsafe.CopyBlock(x, src, (uint)str6Length * 2);
x += str6Length;
}
}
if (str7Length > 0)
{
fixed (char* src = str7)
{
Unsafe.CopyBlock(x, src, (uint)str7Length * 2);
}
}
}
return result;
}
You can edit the method and adapt it to your case. For example you can make it something like
public static unsafe string FastConcat(string str1, string str2, string str3 = null, string str4 = null, string str5 = null, string str6 = null, string str7 = null)
For just two strings, you definitely do not want to use StringBuilder. There is some threshold above which the StringBuilder overhead is less than the overhead of allocating multiple strings.
So, for more than 2-3 strings, use DannySmurf's code. Otherwise, just use the + operator.
It really depends on your usage pattern.
A detailed benchmark between string.Join, string.Concat and string.Format can be found here: String.Format Isn't Suitable for Intensive Logging
(This is actually the same answer I gave to this question)
It would depend on the code.
StringBuilder is more efficient generally, but if you're only concatenating a few strings and doing it all in one line, code optimizations will likely take care of it for you. It's important to think about how the code looks too: for larger sets StringBuilder will make it easier to read, for small ones StringBuilder will just add needless clutter.
