I have a string in which I need to replace markers with values from a dictionary. It has to be as efficient as possible. Doing a loop with a string.replace is just going to consume memory (strings are immutable, remember). Would StringBuilder.Replace() be any better since this is was designed to work with string manipulations?
I was hoping to avoid the expense of RegEx, but if that is going to be a more efficient then so be it.
Note: I don't care about code complexity, only how fast it runs and the memory it consumes.
Average stats: 255-1024 characters in length, 15-30 keys in the dictionary.
Using RedGate Profiler using the following code
class Program
{
static string data = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
static Dictionary<string, string> values;
static void Main(string[] args)
{
Console.WriteLine("Data length: " + data.Length);
values = new Dictionary<string, string>()
{
{ "ab", "aa" },
{ "jk", "jj" },
{ "lm", "ll" },
{ "yz", "zz" },
{ "ef", "ff" },
{ "st", "uu" },
{ "op", "pp" },
{ "x", "y" }
};
StringReplace(data);
StringBuilderReplace1(data);
StringBuilderReplace2(new StringBuilder(data, data.Length * 2));
Console.ReadKey();
}
private static void StringReplace(string data)
{
foreach(string k in values.Keys)
{
data = data.Replace(k, values[k]);
}
}
private static void StringBuilderReplace1(string data)
{
StringBuilder sb = new StringBuilder(data, data.Length * 2);
foreach (string k in values.Keys)
{
sb.Replace(k, values[k]);
}
}
private static void StringBuilderReplace2(StringBuilder data)
{
foreach (string k in values.Keys)
{
data.Replace(k, values[k]);
}
}
}
String.Replace = 5.843ms
StringBuilder.Replace #1 = 4.059ms
Stringbuilder.Replace #2 = 0.461ms
String length = 1456
stringbuilder #1 creates the stringbuilder in the method while #2 does not so the performance difference will end up being the same most likely since you're just moving that work out of the method. If you start with a stringbuilder instead of a string then #2 might be the way to go instead.
As far as memory, using RedGateMemory profiler, there is nothing to worry about until you get into MANY replace operations in which stringbuilder is going to win overall.
This may be of help:
https://learn.microsoft.com/en-us/archive/blogs/debuggingtoolbox/comparing-regex-replace-string-replace-and-stringbuilder-replace-which-has-better-performance
The short answer appears to be that String.Replace is faster, although it may have a larger impact on your memory footprint / garbage collection overhead.
Would stringbuilder.replace be any better [than String.Replace]
Yes, a lot better. And if you can estimate an upper bound for the new string (it looks like you can) then it will probably be fast enough.
When you create it like:
var sb = new StringBuilder(inputString, pessimisticEstimate);
then the StringBuilder will not have to re-allocate its buffer.
Yes, StringBuilder will give you both gain in speed and memory (basically because it won't create an instance of a string each time you will make a manipulation with it - StringBuilder always operates with the same object). Here is an MSDN link with some details.
Converting data from a String to a StringBuilder and back will take some time. If one is only performing a single replace operation, this time may not be recouped by the efficiency improvements inherent in StringBuilder. On the other hand, if one converts a string to a StringBuilder, then performs many Replace operations on it, and converts it back at the end, the StringBuilder approach is apt to be faster.
Rather than running 15-30 replace operations on the entire string, it might be more efficient to use something like a trie data structure to hold your dictionary. Then you can loop through your input string once to do all your searching/replacing.
It will depend a lot on how many of the markers are present in a given string on average.
Performance of searching for a key is likely to be similar between StringBuilder and String, but StringBuilder will win if you have to replace many markers in a single string.
If you only expect one or two markers per string on average, and your dictionary is small, I would just go for the String.Replace.
If there are many markers, you might want to define a custom syntax to identify markers - e.g. enclosing in braces with a suitable escaping rule for a literal brace. You can then implement a parsing algorithm that iterates through the characters of the string once, recognizing and replacing each marker that it finds. Or use a regex.
My two cents here, I just wrote couple of lines of code to test how each method performs and, as expected, result is "it depends".
For longer strings Regex seems to be performing better, for shorter strings, String.Replace it is. I can see that usage of StringBuilder.Replace is not very useful, and if wrongly used, it could be lethal in GC perspective (I tried to share one instance of StringBuilder).
Check my StringReplaceTests GitHub repo.
The problem with #DustinDavis' answer is that it recursively operates on the same string. Unless you're planning on doing a back-and-forth type of manipulation, you really should have separate objects for each manipulation case in this kind of test.
I decided to create my own test because I found some conflicting answers all over the Web, and I wanted to be completely sure. The program I am working on deals with a lot of text (files with tens of thousands of lines in some cases).
So here's a quick method you can copy and paste and see for yourself which is faster. You may have to create your own text file to test, but you can easily copy and paste text from anywhere and make a large enough file for yourself:
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Windows;
void StringReplace_vs_StringBuilderReplace( string file, string word1, string word2 )
{
using( FileStream fileStream = new FileStream( file, FileMode.Open, FileAccess.Read ) )
using( StreamReader streamReader = new StreamReader( fileStream, Encoding.UTF8 ) )
{
string text = streamReader.ReadToEnd(),
#string = text;
StringBuilder #StringBuilder = new StringBuilder( text );
int iterations = 10000;
Stopwatch watch1 = new Stopwatch.StartNew();
for( int i = 0; i < iterations; i++ )
if( i % 2 == 0 ) #string = #string.Replace( word1, word2 );
else #string = #string.Replace( word2, word1 );
watch1.Stop();
double stringMilliseconds = watch1.ElapsedMilliseconds;
Stopwatch watch2 = new Stopwatch.StartNew();
for( int i = 0; i < iterations; i++ )
if( i % 2 == 0 ) #StringBuilder = #StringBuilder .Replace( word1, word2 );
else #StringBuilder = #StringBuilder .Replace( word2, word1 );
watch2.Stop();
double StringBuilderMilliseconds = watch1.ElapsedMilliseconds;
MessageBox.Show( string.Format( "string.Replace: {0}\nStringBuilder.Replace: {1}",
stringMilliseconds, StringBuilderMilliseconds ) );
}
}
I got that string.Replace() was faster by about 20% every time swapping out 8-10 letter words. Try it for yourself if you want your own empirical evidence.
Related
Is there any way to make Search and addToSearch faster?
I am trying to make it faster. I am not sure if regex in addtosearch can be a problem, it is really small. I am out ofideas how to optimize it further. Now i am just trying to meet word count. I wonder if there is a way to concatenate parts of name that are not empty more effectivly than i do.
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System;
namespace AutoComplete
{
public struct FullName
{
public string Name;
public string Surname;
public string Patronymic;
}
public class AutoCompleter
{
private List<string> listOfNames = new List<string>();
private static readonly Regex sWhitespace = new Regex(#"\s+");
public void AddToSearch(List<FullName> fullNames)
{
foreach (FullName i in fullNames)
{
string nameToAdd = "";
if (!string.IsNullOrWhiteSpace(i.Surname))
{
nameToAdd += sWhitespace.Replace(i.Surname, "") + " ";
}
if (!string.IsNullOrWhiteSpace(i.Name))
{
nameToAdd += sWhitespace.Replace(i.Name, "") + " ";
}
if (!string.IsNullOrWhiteSpace(i.Patronymic))
{
nameToAdd += sWhitespace.Replace(i.Patronymic, "") + " ";
}
listOfNames.Add(nameToAdd.Substring(0, nameToAdd.Length - 1));
}
}
public List<string> Search(string prefix)
{
if (prefix.Length > 100 || string.IsNullOrWhiteSpace(prefix))
{
throw new System.Exception();
}
List<string> namesWithPrefix = new List<string>();
foreach (string name in listOfNames)
{
if (IsPrefix(prefix, name))
{
namesWithPrefix.Add(name);
}
}
return namesWithPrefix;
}
private bool IsPrefix(string prefix, string stringToSearch)
{
if (stringToSearch.Length < prefix.Length)
{
return false;
}
for (int i = 0; i < prefix.Length; i++)
{
if (prefix[i] != stringToSearch[i])
{
return false
}
}
return true
}
}
}
Regular expression (Regexp) are great because of their ease-of use and flexibility but most Regexp engines are actually quite slow. This is the case for the one of C#. Moreover, strings can contain Unicode character and "\s" needs to consider all the (fancy) spaces characters included in the Unicode character set. This make Regexp search/replace much slower. If you know your input does not contain such characters (eg. ASCII), then you can write a much faster implementation. Alternatively, you can play with RegexpOptions like Compiled and CultureInvariant so to reduce a bit the run time.
The AddToSearch performs many hidden allocations. Indeed, += create a new string (because C# string are immutable and not designed to be often resized) and Replace calls does allocate new strings too. You can speed up the computation by directly replace and write the result in a preallocated buffer and simply copy the result with a Substring like you currently do.
Search is fine and it is not easy to optimize it. That being said, if listOfNames is big, then you can use multiple threads so to significantly speed up the computation. Be careful though because Add is not thread-safe. Parallel linkq may help you to do that easily (I never tested it though).
Another solution to speed up a bit the computation of Search is to start the loop of IsPrefix from prefix.Length-1. Indeed, if most string contains the beginning of the prefix, then a significant portion of the time will be spend comparing nearly equal characters. The probability that prefix[prefix.Length-1] != stringToSearch[prefix.Length-1] is higher than prefix[0] != stringToSearch[0]. Additionally, partial loop unrolling may help a bit to speed up the function if the JIT is not able to do that.
Others have already pointed out that the use of regex can be problematic. I would personally consider using str.Replace(" ", String.Empty) - if I understood the regex correctly; I normally try to avoid regex as I have a hard time reading code using regex. Note that String.Empty does not allocate a new string.
That said, I think performance could boost if you would not store the names in a List but at least order the list alpabetically. Thus you do not need to iterate all elemnts of the list but e.g. use binary search to find all elements matching a given prefix - as range within the list of names you already have.
I'm working on a book encryption program for one of my courses and I've run into a problem. Our professor gave us the example of using say Pride and Prejudice as the book used to encrypt, so I chose that one to test my program. The current function I'm using to remove the punctuation from the string is taking so long that the program is being forced into break mode. This function works for smaller strings even pages long, but when I fed it Pride and Prejudice it takes way to long.
public void removePunctuation(ref string s) {
string result = "";
for (int i = 0; i < s.Length; i++) {
if (Char.IsWhiteSpace(s[i])) {
result += ' ';
} else if (!Char.IsLetter(s[i]) && !Char.IsNumber(s[i])) {
// do nothing
} else {
result += s[i];
}
}
s = result;
}
So I think I need a faster way to remove punctuation from this string if anyone has any suggestions? I know looping through every character is horrible, but I'm stumped and I was never taught Regex in depth.
Edit: I was asked how I was storing the string in the dictionary class! This is the constructor for another class that actually uses the formatted string.
public CodeBook(string book)
{
BookMap = new Dictionary<string, List<int>>();
Key = book.Split(null).ToList(); // split string into words
foreach(string s in Key)
{
if (!BookMap.Keys.Contains(s))
{
BookMap.Add(s, Enumerable.Range(0, Key.Count).Where(i => Key[i] == s).ToList());
// add word and add list of occurrances of word
}
}
}
This is slow because you construct string by concatenations in a loop. You have several approaches that are more performant:
Use StringBuilder - unlike string concatenation which constructs a new object each time you add a character, this approach expands the string under construction by larger chunks, preventing excessive garbage creation.
Use LINQ's filtering with Where - this approach constructs an array of chars in a single shot, then constructs a single string from it.
Use regular expression's Replace - this method is optimized to deal with strings of virtually unlimited sizes.
Roll your own algorithm - create an array of chars that corresponds to the length of the original string. Walk through the string, and add the characters that you wish to keep to the array. Use string's constructor that takes the array, the initial index, and the length to construct the string at once.
Looping through every character once is not that bad. You're doing it all in one pass, that's not trivial to avoid.
The problem lies in the fact that the framework will need to allocate a new copy of the (partial) string whenever you do something like
result += s[i];
You can avoid that by introducing a StringBuilder documented here to append non-punctuation characters as you go.
public string removePunctuation(string s)
{
var result = new StringBuilder();
for (int i = 0; i < s.Length; i++) {
if (Char.IsWhiteSpace(s[i])) {
result.Append(" ");
} else if (!Char.IsLetter(s[i]) && !Char.IsNumber(s[i])) {
// do nothing
} else {
result.Append(s[i]);
}
}
return result.ToString();
}
You could further reduce the number of necessary Append calls with a refined algorithm, for example look ahead to the next punctuation and append larger portions at once, or use an existing string manipulation library like RegEx. But the introduction of StringBuilder above should give you a noticable performance gain already.
I was never taught Regex in depth
Use the search provider of your choice, you may end up with a tested solution which you can just study and use: https://stackoverflow.com/a/5871826/1132334
You can use Regex to remove punctuations as below.
public string removePunctuation(string s)
{
string result = Regex.Replace(s, #"[^\w\s]", "");
return result;
}
^ Means: not these characters (letters, numbers).
\w Means: word characters.
\s Means: space characters.
I need to do something like this dreamed .trReplace:
str = str.trReplace("áéíüñ","aeiu&");
It should change this string:
a stríng with inválid charactérs
to:
a string with invalid characters
My current ideas are:
str = str.Replace("á","a").Replace("é","e").Replace("í","ï"...
and:
sb = new StringBuilder(str)
sb.Replace("á","a").
sb.Replace("é","e")
sb.Replace("í","ï"...
But I don't think they are efficient for long strings.
Richard has a good answer, but performance may suffer slightly on longer strings (about 25% slower than straight string replace as shown in question). I felt complelled to look in to this a little further. There are actually several good related answers already on StackOverflow as captured below:
Fastest way to remove chars from string
C# Stripping / converting one or more characters
There is also a good article on the CodeProject covering the different options.
http://www.codeproject.com/KB/string/fastestcscaseinsstringrep.aspx
To explain why the function provided in Richards answer gets slower with longer strings is due to the fact that the replacements are happening one character at a time; thus if you have large sequences of non-mapped characters, you are wasting extra cycles while re-appending together the string . As such, if you want to take a few points from the CodePlex Article you end up with a slightly modified version of Richards answer that looks like:
private static readonly Char[] ReplacementChars = new[] { 'á', 'é', 'í', 'ü', 'ñ' };
private static readonly Dictionary<Char, Char> ReplacementMappings = new Dictionary<Char, Char>
{
{ 'á', 'a'},
{ 'é', 'e'},
{ 'í', 'i'},
{ 'ü', 'u'},
{ 'ñ', '&'}
};
private static string Translate(String source)
{
var startIndex = 0;
var currentIndex = 0;
var result = new StringBuilder(source.Length);
while ((currentIndex = source.IndexOfAny(ReplacementChars, startIndex)) != -1)
{
result.Append(source.Substring(startIndex, currentIndex - startIndex));
result.Append(ReplacementMappings[source[currentIndex]]);
startIndex = currentIndex + 1;
}
if (startIndex == 0)
return source;
result.Append(source.Substring(startIndex));
return result.ToString();
}
NOTE Not all edge cases have been tested.
NOTE Could replace ReplacementChars with ReplacementMappings.Keys.ToArray() for a slight cost.
Assuming that NOT every character is a replacement char, then this will actually run slightly faster than straigt string replacements (again about 20%).
That being said, remember when considering performance cost, what we are actually talking about... in this case... the difference between the optimized solution and original solution is about 1 second over 100,000 iterations on a 1,000 character string.
Either way, just wanted to add some information to the answers for this question.
I did something similar for ICAO Passports. The names had to be 'transliterated'. Basically I had a Dictionary of char to char mappings.
Dictionary<char, char> mappings;
static public string Translate(string s)
{
var t = new StringBuilder(s.Length);
foreach (char c in s)
{
char to;
if (mappings.TryGetValue(c, out to))
t.Append(to);
else
t.Append(c);
}
return t.ToString();
}
What you want is a way to go through the string once and do all the replacements. I am not not sure that regex is the best way to do it if you want efficiency. It could very well be that a case switch (for all the characters that you want to replace) in a for loop to test every character is faster. I would profile the two approaches.
It would be better to use an array of char instead of Stringbuilder.
The indexer is faster than calling the Append method, because:
push all local variables to the stack
move to Append address
return to address
pop all local variables from the stack
The example below is about 20 percent faster (depends on your hardware and input string)
static Dictionary<char, char> mappings;
public static string TranslateV2(string s)
{
var len = s.Length;
var array = new char[len];
char c;
for (var index = 0; index < len; index++)
{
c = s[index];
if (mappings.ContainsKey(c))
array[index] = mappings[c];
else
array[index] = c;
}
return new string(array);
}
If I want to concatenate a string N number of times, which method should i prefer?
Take this code as an example:
public static string Repeat(this string instance, int times)
{
var result = string.Empty;
for (int i = 0; i < times; i++)
result += instance;
return result;
}
This method may be invoked with "times" set to 5, or 5000. What method should I prefer to use?
string.Join? Stringbuilder? Just standard string.Concat?
A similar function is going to be implemented in a commercial library so I really need the "optimal" way to do this.
public static string Repeat(this string instance, int times)
{
if (times == 1 || string.IsNullOrEmpty(instance)) return instance;
if (times == 0) return "";
if (times < 0) throw new ArgumentOutOfRangeException("times");
StringBuilder sb = new StringBuilder(instance.Length * times);
for (int i = 0; i < times; i++)
sb.Append(instance);
return sb.ToString();
}
Stringbuilder ofcourse. It is meant for fast string join operations, because it won't create new object each time you want to join a string.
For details see here.
StringBuilder.
"result += result;"
creates a new string each and every time you do the assignment, and assign that new string to your variable, since strings are immutable.
Go with StringBuilder, definitely.
Never make an assumption that one method is faster than another -- you must alway measure the performance of both and then decide.
Surprisingly, for smaller numbers of iterations, just a standard string concatenation (result += string) is often faster than using a string builder.
If you know that the number of iterations will always be the same (e.g. it will always be 50 iterations), then I would suggest that you make some performance measurements using different methods.
If you really want to get clever, make performance measurements against number of iterations and you can find the 'crossover point' where one method is faster than another and hard-code that threshold into the method:
if(iterations < 30)
{
CopyWithConcatenation();
}
else
{
CopyWithStringBuilder();
}
The exact performance crossover point will depend on the specific detail of your code and you'll never be able to find out what they are without making performance measurements.
To complicate things a little more, StringBuilder has 'neater' memory management that string concatenation (which creates more temporary instances) so this might also have an impact on overall performance outside of your string loop (like the next time the Garbage Collector runs).
Let us know how you got on.
I had an interview question that asked me for my 'feedback' on a piece of code a junior programmer wrote. They hinted there may be a problem and said it will be used heavily on large strings.
public string ReverseString(string sz)
{
string result = string.Empty;
for(int i = sz.Length-1; i>=0; i--)
{
result += sz[i]
}
return result;
}
I couldn't spot it. I saw no problems whatsoever.
In hindsight I could have said the user should resize but it looks like C# doesn't have a resize (i am a C++ guy).
I ended up writing things like use an iterator if its possible, [x] in containers could not be random access so it may be slow. and misc things. But I definitely said I never had to optimize C# code so my thinking may have not failed me on the interview.
I wanted to know, what is the problem with this code, do you guys see it?
-edit-
I changed this into a wiki because there can be several right answers.
Also i am so glad i explicitly said i never had to optimize a C# program and mentioned the misc other things. Oops. I always thought C# didnt have any performance problems with these type of things. oops.
Most importantly? That will suck performance wise - it has to create lots of strings (one per character). The simplest way is something like:
public static string Reverse(string sz) // ideal for an extension method
{
if (string.IsNullOrEmpty(sz) || sz.Length == 1) return sz;
char[] chars = sz.ToCharArray();
Array.Reverse(chars);
return new string(chars);
}
The problem is that string concatenations are expensive to do as strings are immutable in C#. The example given will create a new string one character longer each iteration which is very inefficient. To avoid this you should use the StringBuilder class instead like so:
public string ReverseString(string sz)
{
var builder = new StringBuilder(sz.Length);
for(int i = sz.Length-1; i>=0; i--)
{
builder.Append(sz[i]);
}
return builder.ToString();
}
The StringBuilder is written specifically for scenarios like this as it gives you the ability to concatenate strings without the drawback of excessive memory allocation.
You will notice I have provided the StringBuilder with an initial capacity which you don't often see. As you know the length of the result to begin with, this removes needless memory allocations.
What normally happens is it allocates an amount of memory to the StringBuilder (default 16 characters). Once the contents attempts to exceed that capacity it doubles (I think) its own capactity and carries on. This is much better than allocating memory each time as would happen with normal strings, but if you can avoid this as well it's even better.
A few comments on the answers given so far:
Every single one of them (so far!) will fail on surrogate pairs and combining characters. Oh the joys of Unicode. Reversing a string isn't the same as reversing a sequence of chars.
I like Marc's optimisation for null, empty, and single character inputs. In particular, not only does this get the right answer quickly, but it also handles null (which none of the other answers do)
I originally thought that ToCharArray followed by Array.Reverse would be the fastest, but it does create one "garbage" copy.
The StringBuilder solution creates a single string (not char array) and manipulates that until you call ToString. There's no extra copying involved... but there's a lot more work maintaining lengths etc.
Which is the more efficient solution? Well, I'd have to benchmark it to have any idea at all - but even so that's not going to tell the whole story. Are you using this in a situation with high memory pressure, where extra garbage is a real pain? How fast is your memory vs your CPU, etc?
As ever, readability is usually king - and it doesn't get much better than Marc's answer on that front. In particular, there's no room for an off-by-one error, whereas I'd have to actually put some thought into validating the other answers. I don't like thinking. It hurts my brain, so I try not to do it very often. Using the built-in Array.Reverse sounds much better to me. (Okay, so it still fails on surrogates etc, but hey...)
Since strings are immutable, each += statement will create a new string by copying the string in the last step, along with the single character to form a new string. Effectively, this will be an O(n2) algorithm instead of O(n).
A faster way would be (O(n)):
// pseudocode:
static string ReverseString(string input) {
char[] buf = new char[input.Length];
for(int i = 0; i < buf.Length; ++i)
buf[i] = input[input.Length - i - 1];
return new string(buf);
}
You can do this in .NET 3.5 instead:
public static string Reverse(this string s)
{
return new String((s.ToCharArray().Reverse()).ToArray());
}
Better way to tackle it would be to use a StringBuilder, since it is not immutable you won't get the terrible object generation behavior that you would get above. In .net all strings are immutable, which means that the += operator there will create a new object each time it is hit. StringBuilder uses an internal buffer, so the reversal could be done in the buffer w/ no extra object allocations.
You should use the StringBuilder class to create your resulting string. A string is immutable so when you append a string in each interation of the loop, a new string has to be created, which isn't very efficient.
I prefer something like this:
using System;
using System.Text;
namespace SpringTest3
{
static class Extentions
{
static private StringBuilder ReverseStringImpl(string s, int pos, StringBuilder sb)
{
return (s.Length <= --pos || pos < 0) ? sb : ReverseStringImpl(s, pos, sb.Append(s[pos]));
}
static public string Reverse(this string s)
{
return ReverseStringImpl(s, s.Length, new StringBuilder()).ToString();
}
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine("abc".Reverse());
}
}
}
x is the string to reverse.
Stack<char> stack = new Stack<char>(x);
string s = new string(stack.ToArray());
This method cuts the number of iterations in half. Rather than starting from the end, it starts from the beginning and swaps characters until it hits center. Had to convert the string to a char array because the indexer on a string has no setter.
public string Reverse(String value)
{
if (String.IsNullOrEmpty(value)) throw new ArgumentNullException("value");
char[] array = value.ToCharArray();
for (int i = 0; i < value.Length / 2; i++)
{
char temp = array[i];
array[i] = array[(array.Length - 1) - i];
array[(array.Length - 1) - i] = temp;
}
return new string(array);
}
Necromancing.
As a public service, this is how you actually CORRECTLY reverse a string (reversing a string is NOT equal to reversing a sequence of chars)
public static class Test
{
private static System.Collections.Generic.List<string> GraphemeClusters(string s)
{
System.Collections.Generic.List<string> ls = new System.Collections.Generic.List<string>();
System.Globalization.TextElementEnumerator enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(s);
while (enumerator.MoveNext())
{
ls.Add((string)enumerator.Current);
}
return ls;
}
// this
private static string ReverseGraphemeClusters(string s)
{
if(string.IsNullOrEmpty(s) || s.Length == 1)
return s;
System.Collections.Generic.List<string> ls = GraphemeClusters(s);
ls.Reverse();
return string.Join("", ls.ToArray());
}
public static void TestMe()
{
string s = "Les Mise\u0301rables";
// s = "noël";
string r = ReverseGraphemeClusters(s);
// This would be wrong:
// char[] a = s.ToCharArray();
// System.Array.Reverse(a);
// string r = new string(a);
System.Console.WriteLine(r);
}
}
See:
https://vimeo.com/7403673
By the way, in Golang, the correct way is this:
package main
import (
"unicode"
"regexp"
)
func main() {
str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme(str))
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme2(str))
}
func ReverseGrapheme(str string) string {
buf := []rune("")
checked := false
index := 0
ret := ""
for _, c := range str {
if !unicode.Is(unicode.M, c) {
if len(buf) > 0 {
ret = string(buf) + ret
}
buf = buf[:0]
buf = append(buf, c)
if checked == false {
checked = true
}
} else if checked == false {
ret = string(append([]rune(""), c)) + ret
} else {
buf = append(buf, c)
}
index += 1
}
return string(buf) + ret
}
func ReverseGrapheme2(str string) string {
re := regexp.MustCompile("\\PM\\pM*|.")
slice := re.FindAllString(str, -1)
length := len(slice)
ret := ""
for i := 0; i < length; i += 1 {
ret += slice[length-1-i]
}
return ret
}
And the incorrect way is this (ToCharArray.Reverse):
func Reverse(s string) string {
runes := []rune(s)
for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
runes[i], runes[j] = runes[j], runes[i]
}
return string(runes)
}
Note that you need to know the difference between
- a character and a glyph
- a byte (8 bit) and a codepoint/rune (32 bit)
- a codepoint and a GraphemeCluster [32+ bit] (aka Grapheme/Glyph)
Reference:
Character is an overloaded term than can mean many things.
A code point is the atomic unit of information. Text is a sequence of
code points. Each code point is a number which is given meaning by the
Unicode standard.
A grapheme is a sequence of one or more code points that are displayed
as a single, graphical unit that a reader recognizes as a single
element of the writing system. For example, both a and ä are
graphemes, but they may consist of multiple code points (e.g. ä may be
two code points, one for the base character a followed by one for the
diaresis; but there's also an alternative, legacy, single code point
representing this grapheme). Some code points are never part of any
grapheme (e.g. the zero-width non-joiner, or directional overrides).
A glyph is an image, usually stored in a font (which is a collection
of glyphs), used to represent graphemes or parts thereof. Fonts may
compose multiple glyphs into a single representation, for example, if
the above ä is a single code point, a font may chose to render that as
two separate, spatially overlaid glyphs. For OTF, the font's GSUB and
GPOS tables contain substitution and positioning information to make
this work. A font may contain multiple alternative glyphs for the same
grapheme, too.
static string reverseString(string text)
{
Char[] a = text.ToCharArray();
string b = "";
for (int q = a.Count() - 1; q >= 0; q--)
{
b = b + a[q].ToString();
}
return b;
}