.NET StringBuilder - check if ends with string - c#

What is the best (shortest and fastest) way to check if StringBuilder ends with specific string?
If I want to check just one char, that's not a problem sb[sb.Length-1] == 'c', but how to check if it's ends with longer string?
I can think about something like looping from "some string".Length and read characters one by one, but maybe there exists something more simple? :)
At the end I want to have extension method like this:
StringBuilder sb = new StringBuilder("Hello world");
bool hasString = sb.EndsWith("world");

To avoid the performance overhead of generating the full string, you can use the ToString(int,int) overload that takes the index range.
public static bool EndsWith(this StringBuilder sb, string test)
{
if (sb.Length < test.Length)
return false;
string end = sb.ToString(sb.Length - test.Length, test.Length);
return end.Equals(test);
}
Edit: It would probably be desirable to define an overload that takes a StringComparison argument:
public static bool EndsWith(this StringBuilder sb, string test)
{
return EndsWith(sb, test, StringComparison.CurrentCulture);
}
public static bool EndsWith(this StringBuilder sb, string test,
StringComparison comparison)
{
if (sb.Length < test.Length)
return false;
string end = sb.ToString(sb.Length - test.Length, test.Length);
return end.Equals(test, comparison);
}
Edit2: As pointed out by Tim S in the comments, there is a flaw in my answer (and all other answers that assume character-based equality) that affects certain Unicode comparisons. Unicode does not require two (sub)strings to have the same sequence of characters to be considered equal. For example, the precomposed character é should be treated as equal to the character e followed by the combining mark U+0301.
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
string s = "We met at the cafe\u0301";
Console.WriteLine(s.EndsWith("café")); // True
StringBuilder sb = new StringBuilder(s);
Console.WriteLine(sb.EndsWith("café")); // False
If you want to handle these cases correctly, it might be easiest to just call StringBuilder.ToString(), and then use the built-in String.EndsWith.

On msdn you can find the topic on how to search text in the StringBuilder object. The two options available to you are:
Call ToString and search the returned String object.
Use the Chars property to sequentially search a range of characters.
Since the first option is out of the question. You'll have to go with the Chars property.
public static class StringBuilderExtensions
{
public static bool EndsWith(this StringBuilder sb, string text)
{
if (sb.Length < text.Length)
return false;
var sbLength = sb.Length;
var textLength = text.Length;
for (int i = 1; i <= textLength; i++)
{
if (text[textLength - i] != sb[sbLength - i])
return false;
}
return true;
}
}

TL;DR
If you're goal is to get a piece or the whole of the StringBuilder's contents in a String object, you should use its ToString function. But if you aren't yet done creating your string, it's better to treat the StringBuilder as a character array and operate in that way than to create a bunch of strings you don't need.
String operations on a character array can become complicated by localization or encoding, since a string can be encoded in many ways (UTF8 or Unicode, for example), but its characters (System.Char) are meant to be 16-bit UTF16 values.
I've written the following method which returns the index of a string if it exists within the StringBuilder and -1 otherwise. You can use this to create the other common String methods like Contains, StartsWith, and EndsWith. This method is preferable to others because it should handle localization and casing properly, and does not force you to call ToString on the StringBuilder. It creates one garbage value if you specify that case should be ignored, and you can fix this to maximize memory savings by using Char.ToLower instead of precomputing the lower case of the string like I do in the function below. EDIT: Also, if you're working with a string encoded in UTF32, you'll have to compare two characters at a time instead of just one.
You're probably better off using ToString unless you're going to be looping, working with large strings, and doing manipulation or formatting.
public static int IndexOf(this StringBuilder stringBuilder, string str, int startIndex = 0, int? count = null, CultureInfo culture = null, bool ignoreCase = false)
{
if (stringBuilder == null)
throw new ArgumentNullException("stringBuilder");
// No string to find.
if (str == null)
throw new ArgumentNullException("str");
if (str.Length == 0)
return -1;
// Make sure the start index is valid.
if (startIndex < 0 && startIndex < stringBuilder.Length)
throw new ArgumentOutOfRangeException("startIndex", startIndex, "The index must refer to a character within the string.");
// Now that we've validated the parameters, let's figure out how many characters there are to search.
var maxPositions = stringBuilder.Length - str.Length - startIndex;
if (maxPositions <= 0) return -1;
// If a count argument was supplied, make sure it's within range.
if (count.HasValue && (count <= 0 || count > maxPositions))
throw new ArgumentOutOfRangeException("count");
// Ensure that "count" has a value.
maxPositions = count ?? maxPositions;
if (count <= 0) return -1;
// If no culture is specified, use the current culture. This is how the string functions behave but
// in the case that we're working with a StringBuilder, we probably should default to Ordinal.
culture = culture ?? CultureInfo.CurrentCulture;
// If we're ignoring case, we need all the characters to be in culture-specific
// lower case for when we compare to the StringBuilder.
if (ignoreCase) str = str.ToLower(culture);
// Where the actual work gets done. Iterate through the string one character at a time.
for (int y = 0, x = startIndex, endIndex = startIndex + maxPositions; x <= endIndex; x++, y = 0)
{
// y is set to 0 at the beginning of the loop, and it is increased when we match the characters
// with the string we're searching for.
while (y < str.Length && str[y] == (ignoreCase ? Char.ToLower(str[x + y]) : str[x + y]))
y++;
// The while loop will stop early if the characters don't match. If it didn't stop
// early, that means we found a match, so we return the index of where we found the
// match.
if (y == str.Length)
return x;
}
// No matches.
return -1;
}
The primary reason one generally uses a StringBuilder object rather than concatenating strings is because of the memory overhead you incur since strings are immutable. The performance hit you see when you do excessive string manipulation without using a StringBuilder is often the result of collecting all the garbage strings you created along the way.
Take this for example:
string firstString = "1st",
secondString = "2nd",
thirdString = "3rd",
fourthString = "4th";
string all = firstString;
all += " & " + secondString;
all += " &" + thirdString;
all += "& " + fourthString + ".";
If you were to run this and open it up in a memory profiler, you'd find a set of strings that look something like this:
"1st", "2nd", "3rd", "4th",
" & ", " & 2nd", "1st & 2nd"
" &", "&3rd", "1st & 2nd &3rd"
"& ", "& 4th", "& 4th."
"1st & 2nd &3rd& 4th."
That's fourteen total objects we created in that scope, but if you don't realize that every single addition operator creates a whole new string every time you might think there's only five. So what happens to the nine other strings? They languish away in memory until the garbage collector decides to pick them up.
So now to my point: if you're trying to find something out about a StringBuilder object and you're not wanting to call ToString(), it probably means you aren't done building that string yet. And if you're trying to find out if the builder ends with "Foo", it's wasteful to call sb.ToString(sb.Length - 1, 3) == "Foo" because you're creating another string object that becomes orphaned and obsolete the minute you made the call.
My guess is that you're running a loop aggregating text into your StringBuilder and you want to end the loop or just do something different if the last few characters are some sentinel value you're expecting.

private static bool EndsWith(this StringBuilder builder, string value) {
return builder.GetLast( value.Length ).SequenceEqual( value );
}
private static IEnumerable<char> GetLast(this StringBuilder builder, int count) {
count = Math.Min( count, builder.Length );
return Enumerable.Range( builder.Length - count, count ).Select( i => builder[ i ] );
}

I'm giving you what you asked for (with the limitations you state) but not the best way to do it. Something like:
StringBuilder sb = new StringBuilder("Hello world");
bool hasString = sb.Remove(1,sb.Length - "world".Length) == "world";

Related

How to identify if a string contains more than one instance of a specific character?

I want to check if a string contains more than one character in the string?
If i have a string 12121.23.2 so i want to check if it contains more than one . in the string.
You can compare IndexOf to LastIndexOf to check if there is more than one specific character in a string without explicit counting:
var s = "12121.23.2";
var ch = '.';
if (s.IndexOf(ch) != s.LastIndexOf(ch)) {
...
}
You can easily count the number of occurences of a character with LINQ:
string foo = "12121.23.2";
foo.Count(c => c == '.');
If performance matters, write it yourself:
public static bool ContainsDuplicateCharacter(this string s, char c)
{
bool seenFirst = false;
for (int i = 0; i < s.Length; i++)
{
if (s[i] != c)
continue;
if (seenFirst)
return true;
seenFirst = true;
}
return false;
}
In this way, you only make one pass through the string's contents, and you bail out as early as possible. In the worst case you visit all characters only once. In #dasblinkenlight's answer, you would visit all characters twice, and in #mensi's answer, you have to count all instances, even though once you have two you can stop the calculation. Further, using the Count extension method involves using an Enumerable<char> which will run more slowly than directly accessing the characters at specific indices.
Then you may write:
string s = "12121.23.2";
Debug.Assert(s.ContainsDuplicateCharacter('.'));
Debug.Assert(s.ContainsDuplicateCharacter('1'));
Debug.Assert(s.ContainsDuplicateCharacter('2'));
Debug.Assert(!s.ContainsDuplicateCharacter('3'));
Debug.Assert(!s.ContainsDuplicateCharacter('Z'));
I also think it's nicer to have a function that explains exactly what you're trying to achieve. You could wrap any of the other answers in such a function too, however.
Boolean MoreThanOne(String str, Char c)
{
return str.Count(x => x==c) > 1;
}

Has anyone implemented a Regex and/or Xml parser around StringBuilders or Streams?

I'm building a stress-testing client that hammers servers and analyzes responses using as many threads as the client can muster. I'm constantly finding myself throttled by garbage collection (and/or lack thereof), and in most cases, it comes down to strings that I'm instantiating only to pass them off to a Regex or an Xml parsing routine.
If you decompile the Regex class, you'll see that internally, it uses StringBuilders to do nearly everything, but you can't pass it a string builder; it helpfully dives down into private methods before starting to use them, so extension methods aren't going to solve it either. You're in a similar situation if you want to get an object graph out of the parser in System.Xml.Linq.
This is not a case of pedantic over-optimization-in-advance. I've looked at the Regex replacements inside a StringBuilder question and others. I've also profiled my app to see where the ceilings are coming from, and using Regex.Replace() now is indeed introducing significant overhead in a method chain where I'm trying to hit a server with millions of requests per hour and examine XML responses for errors and embedded diagnostic codes. I've already gotten rid of just about every other inefficiency that's throttling the throughput, and I've even cut a lot of the Regex overhead out by extending StringBuilder to do wildcard find/replace when I don't need capture groups or backreferences, but it seems to me that someone would have wrapped up a custom StringBuilder (or better yet, Stream) based Regex and Xml parsing utility by now.
Ok, so rant over, but am I going to have to do this myself?
Update: I found a workaround which lowered peak memory consumption from multiple gigabytes to a few hundred megs, so I'm posting it below. I'm not adding it as an answer because a) I generally hate to do that, and b) I still want to find out if someone takes the time to customize StringBuilder to do Regexes (or vice-versa) before I do.
In my case, I could not use XmlReader because the stream I am ingesting contains some invalid binary content in certain elements. In order to parse the XML, I have to empty out those elements. I was previously using a single static compiled Regex instance to do the replace, and this consumed memory like mad (I'm trying to process ~300 10KB docs/sec). The change that drastically reduced consumption was:
I added the code from this StringBuilder Extensions article on
CodeProject for the handy IndexOf method.
I added a (very) crude WildcardReplace method that allows one wildcard character (* or ?) per invocation
I replaced the Regex usage with a WildcardReplace() call to empty the contents of the offending elements
This is very unpretty and tested only as far as my own purposes required; I would have made it more elegant and powerful, but YAGNI and all that, and I'm in a hurry. Here's the code:
/// <summary>
/// Performs basic wildcard find and replace on a string builder, observing one of two
/// wildcard characters: * matches any number of characters, or ? matches a single character.
/// Operates on only one wildcard per invocation; 2 or more wildcards in <paramref name="find"/>
/// will cause an exception.
/// All characters in <paramref name="replaceWith"/> are treated as literal parts of
/// the replacement text.
/// </summary>
/// <param name="find"></param>
/// <param name="replaceWith"></param>
/// <returns></returns>
public static StringBuilder WildcardReplace(this StringBuilder sb, string find, string replaceWith) {
if (find.Split(new char[] { '*' }).Length > 2 || find.Split(new char[] { '?' }).Length > 2 || (find.Contains("*") && find.Contains("?"))) {
throw new ArgumentException("Only one wildcard is supported, but more than one was supplied.", "find");
}
// are we matching one character, or any number?
bool matchOneCharacter = find.Contains("?");
string[] parts = matchOneCharacter ?
find.Split(new char[] { '?' }, StringSplitOptions.RemoveEmptyEntries)
: find.Split(new char[] { '*' }, StringSplitOptions.RemoveEmptyEntries);
int startItemIdx;
int endItemIdx;
int newStartIdx = 0;
int length;
while ((startItemIdx = sb.IndexOf(parts[0], newStartIdx)) > 0
&& (endItemIdx = sb.IndexOf(parts[1], startItemIdx + parts[0].Length)) > 0) {
length = (endItemIdx + parts[1].Length) - startItemIdx;
newStartIdx = startItemIdx + replaceWith.Length;
// With "?" wildcard, find parameter length should equal the length of its match:
if (matchOneCharacter && length > find.Length)
break;
sb.Remove(startItemIdx, length);
sb.Insert(startItemIdx, replaceWith);
}
return sb;
}
Here try this. Everything's char based and relatively low level for efficiency. Any number of your *s or ?s can be used. However, your * is now ✪ and your ? is now ★. Around three days of work went into this to make it as clean as possible. You can even enter multiple queries on one sweep!
Example usage: wildcard(new StringBuilder("Hello and welcome"), "hello✪w★l", "be") results in "become".
////////////////////////////////////////////////////////////////////////////////////////////////////////
///////////// Search for a string/s inside 'text' using the 'find' parameter, and replace with a string/s using the replace parameter
// ✪ represents multiple wildcard characters (non-greedy)
// ★ represents a single wildcard character
public StringBuilder wildcard(StringBuilder text, string find, string replace, bool caseSensitive = false)
{
return wildcard(text, new string[] { find }, new string[] { replace }, caseSensitive);
}
public StringBuilder wildcard(StringBuilder text, string[] find, string[] replace, bool caseSensitive = false)
{
if (text.Length == 0) return text; // Degenerate case
StringBuilder sb = new StringBuilder(); // The new adjusted string with replacements
for (int i = 0; i < text.Length; i++) { // Go through every letter of the original large text
bool foundMatch = false; // Assume match hasn't been found to begin with
for(int q=0; q< find.Length; q++) { // Go through each query in turn
if (find[q].Length == 0) continue; // Ignore empty queries
int f = 0; int g = 0; // Query cursor and text cursor
bool multiWild = false; // multiWild is ✪ symbol which represents many wildcard characters
int multiWildPosition = 0;
while(true) { // Loop through query characters
if (f >= find[q].Length || (i + g) >= text.Length) break; // Bounds checking
char cf = find[q][f]; // Character in the query (f is the offset)
char cg = text[i + g]; // Character in the text (g is the offset)
if (!caseSensitive) cg = char.ToLowerInvariant(cg);
if (cf != '★' && cf != '✪' && cg != cf && !multiWild) break; // Break search, and thus no match is found
if (cf == '✪') { multiWild = true; multiWildPosition = f; f++; continue; } // Multi-char wildcard activated. Move query cursor, and reloop
if (multiWild && cg != cf && cf != '★') { f = multiWildPosition + 1; g++; continue; } // Match since MultiWild has failed, so return query cursor to MultiWild position
f++; g++; // Reaching here means that a single character was matched, so move both query and text cursor along one
}
if (f == find[q].Length) { // If true, query cursor has reached the end of the query, so a match has been found!!!
sb.Append(replace[q]); // Append replacement
foundMatch = true;
if (find[q][f - 1] == '✪') { i = text.Length; break; } // If the MultiWild is the last char in the query, then the rest of the string is a match, and so close off
i += g - 1; // Move text cursor along by the amount equivalent to its found match
}
}
if (!foundMatch) sb.Append(text[i]); // If a match wasn't found at that point in the text, then just append the original character
}
return sb;
}
XmlReader is a stream-based XML parser. See http://msdn.microsoft.com/en-us/library/756wd7zs.aspx
The Mono project has switched the license for their core libraries to an MIT X11 license. If you need to create a regex library customized for performance in your particular application, you should be able to start with the latest code from Mono's implementation of the System library.

How do I get the last four characters from a string in C#?

Suppose I have a string:
"34234234d124"
I want to get the last four characters of this string which is "d124". I can use SubString, but it needs a couple of lines of code, including naming a variable.
Is it possible to get this result in one expression with C#?
mystring.Substring(Math.Max(0, mystring.Length - 4)); //how many lines is this?
If you're positive the length of your string is at least 4, then it's even shorter:
mystring.Substring(mystring.Length - 4);
You can use an extension method:
public static class StringExtension
{
public static string GetLast(this string source, int tail_length)
{
if(tail_length >= source.Length)
return source;
return source.Substring(source.Length - tail_length);
}
}
And then call:
string mystring = "34234234d124";
string res = mystring.GetLast(4);
Update 2020: C# 8.0 finally makes this easy:
> "C# 8.0 finally makes this easy"[^4..]
"easy"
You can also slice arrays in the same way, see Indices and ranges.
All you have to do is..
String result = mystring.Substring(mystring.Length - 4);
Ok, so I see this is an old post, but why are we rewriting code that is already provided in the framework?
I would suggest that you add a reference to the framework DLL "Microsoft.VisualBasic"
using Microsoft.VisualBasic;
//...
string value = Strings.Right("34234234d124", 4);
string mystring = "34234234d124";
mystring = mystring.Substring(mystring.Length-4)
Using Substring is actually quite short and readable:
var result = mystring.Substring(mystring.Length - Math.Min(4, mystring.Length));
// result == "d124"
Here is another alternative that shouldn't perform too badly (because of deferred execution):
new string(mystring.Reverse().Take(4).Reverse().ToArray());
Although an extension method for the purpose mystring.Last(4) is clearly the cleanest solution, albeit a bit more work.
You can simply use Substring method of C#. For ex.
string str = "1110000";
string lastFourDigits = str.Substring((str.Length - 4), 4);
It will return result 0000.
A simple solution would be:
string mystring = "34234234d124";
string last4 = mystring.Substring(mystring.Length - 4, 4);
Definition:
public static string GetLast(string source, int last)
{
return last >= source.Length ? source : source.Substring(source.Length - last);
}
Usage:
GetLast("string of", 2);
Result:
of
string var = "12345678";
var = var[^4..];
// var = "5678"
This is index operator that literally means "take last four chars from end (^4) until the end (..)"
mystring = mystring.Length > 4 ? mystring.Substring(mystring.Length - 4, 4) : mystring;
Compared to some previous answers, the main difference is that this piece of code takes into consideration when the input string is:
Null
Longer than or matching the requested length
Shorter than the requested length.
Here it is:
public static class StringExtensions
{
public static string Right(this string str, int length)
{
return str.Substring(str.Length - length, length);
}
public static string MyLast(this string str, int length)
{
if (str == null)
return null;
else if (str.Length >= length)
return str.Substring(str.Length - length, length);
else
return str;
}
}
It is just this:
int count = 4;
string sub = mystring.Substring(mystring.Length - count, count);
I would like to extend the existing answer mentioning using new ranges in C# 8 or higher: To make the code usable for all possible strings, even those shorter than 4, there is some form of condition needed! If you want to copy code, I suggest example 5 or 6.
string mystring ="C# 8.0 finally makes slicing possible";
1: Slicing taking the end part- by specifying how many characters to omit from the beginning- this is, what VS 2019 suggests:
string example1 = mystring[Math.Max(0, mystring.Length - 4)..] ;
2: Slicing taking the end part- by specifying how many characters to take from the end:
string example2 = mystring[^Math.Min(mystring.Length, 4)..] ;
3: Slicing taking the end part- by replacing Max/Min with the ?: operator:
string example3 = (mystring.length > 4)? mystring[^4..] : mystring);
Personally, I like the second and third variant more than the first.
MS doc reference for Indices and ranges:
Null? But we are not done yet concerning universality. Every example so far will throw an exception for null strings. To consider null (if you don´t use non-nullable strings with C# 8 or higher), and to do it without 'if' (classic example 'with if' already given in another answer) we need:
4: Slicing considering null- by specifying how many characters to omit:
string example4 = mystring?[Math.Max(0, mystring.Length - 4)..] ?? string.Empty;
5: Slicing considering null- by specifying how many characters to take:
string example5 = mystring?[^Math.Min(mystring.Length, 4)..] ?? string.Empty;
6: Slicing considering null with the ?: operator (and two other '?' operators ;-) :
(You cannot put that in a whole in a string interpolation e.g. for WriteLine.)
string example6 = (mystring?.Length > 4) ? filePath[^4..] : mystring ?? string.Empty;
7: Equivalent variant with good old Substring() for C# 6 or 7.x:
(You cannot put that in a whole in a string interpolation e.g. for WriteLine.)
string example7 = (mystring?.Length > 4) ? mystring.Substring(mystring.Length- 4) : mystring ?? string.Empty;
Graceful degradation?
I like the new features of C#. Putting them on one line like in the last examples maybe looks a bit excessive. We ended up a little perl´ish didn´t we?
But it´s a good example for learning and ok for me to use it once in a tested library method.
Even better that we can get rid of null in modern C# if we want and avoid all this null-specific handling.
Such a library/extension method as a shortcut is really useful. Despite the advances in C# you have to write your own to get something easier to use than repeating the code above for every small string manipulation need.
I am one of those who began with BASIC, and 40 years ago there was already Right$(,). Funny, that it is possible to use Strings.Right(,) from VB with C# still too as was shown in another answer.
C# has chosen precision over graceful degradation (in opposite to old BASIC).
So copy any appropriate variant you like in these answers and define a graceful shortcut function for yourself, mine is an extension function called RightChars(int).
This works nice, as there are no errors if there are less characters in the string than the requested amount.
using System.Linq;
string.Concat("123".TakeLast(4));
This won't fail for any length string.
string mystring = "34234234d124";
string last4 = Regex.Match(mystring, "(?!.{5}).*").Value;
// last4 = "d124"
last4 = Regex.Match("d12", "(?!.{5}).*").Value;
// last4 = "d12"
This is probably overkill for the task at hand, but if there needs to be additional validation, it can possibly be added to the regular expression.
Edit: I think this regex would be more efficient:
#".{4}\Z"
Using the range operator is the easiest way for me. No many codes is required.
In your case, you can get what you want like this:
// the ^ operator indicates the element position from the end of a sequence
string str = "34234234d124"[^4..]
string x = "34234234d124";
string y = x.Substring(x.Length - 4);
Use a generic Last<T>. That will work with ANY IEnumerable, including string.
public static IEnumerable<T> Last<T>(this IEnumerable<T> enumerable, int nLastElements)
{
int count = Math.Min(enumerable.Count(), nLastElements);
for (int i = enumerable.Count() - count; i < enumerable.Count(); i++)
{
yield return enumerable.ElementAt(i);
}
}
And a specific one for string:
public static string Right(this string str, int nLastElements)
{
return new string(str.Last(nLastElements).ToArray());
}
Suggest using TakeLast method, for example: new String(text.TakeLast(4).ToArray())
I threw together some code modified from various sources that will get the results you want, and do a lot more besides. I've allowed for negative int values, int values that exceed the length of the string, and for end index being less than the start index. In that last case, the method returns a reverse-order substring. There are plenty of comments, but let me know if anything is unclear or just crazy. I was playing around with this to see what all I might use it for.
/// <summary>
/// Returns characters slices from string between two indexes.
///
/// If start or end are negative, their indexes will be calculated counting
/// back from the end of the source string.
/// If the end param is less than the start param, the Slice will return a
/// substring in reverse order.
///
/// <param name="source">String the extension method will operate upon.</param>
/// <param name="startIndex">Starting index, may be negative.</param>
/// <param name="endIndex">Ending index, may be negative).</param>
/// </summary>
public static string Slice(this string source, int startIndex, int endIndex = int.MaxValue)
{
// If startIndex or endIndex exceeds the length of the string they will be set
// to zero if negative, or source.Length if positive.
if (source.ExceedsLength(startIndex)) startIndex = startIndex < 0 ? 0 : source.Length;
if (source.ExceedsLength(endIndex)) endIndex = endIndex < 0 ? 0 : source.Length;
// Negative values count back from the end of the source string.
if (startIndex < 0) startIndex = source.Length + startIndex;
if (endIndex < 0) endIndex = source.Length + endIndex;
// Calculate length of characters to slice from string.
int length = Math.Abs(endIndex - startIndex);
// If the endIndex is less than the startIndex, return a reversed substring.
if (endIndex < startIndex) return source.Substring(endIndex, length).Reverse();
return source.Substring(startIndex, length);
}
/// <summary>
/// Reverses character order in a string.
/// </summary>
/// <param name="source"></param>
/// <returns>string</returns>
public static string Reverse(this string source)
{
char[] charArray = source.ToCharArray();
Array.Reverse(charArray);
return new string(charArray);
}
/// <summary>
/// Verifies that the index is within the range of the string source.
/// </summary>
/// <param name="source"></param>
/// <param name="index"></param>
/// <returns>bool</returns>
public static bool ExceedsLength(this string source, int index)
{
return Math.Abs(index) > source.Length ? true : false;
}
So if you have a string like "This is an extension method", here are some examples and results to expect.
var s = "This is an extension method";
// If you want to slice off end characters, just supply a negative startIndex value
// but no endIndex value (or an endIndex value >= to the source string length).
Console.WriteLine(s.Slice(-5));
// Returns "ethod".
Console.WriteLine(s.Slice(-5, 10));
// Results in a startIndex of 22 (counting 5 back from the end).
// Since that is greater than the endIndex of 10, the result is reversed.
// Returns "m noisnetxe"
Console.WriteLine(s.Slice(2, 15));
// Returns "is is an exte"
Hopefully this version is helpful to someone. It operates just like normal if you don't use any negative numbers, and provides defaults for out of range params.
string var = "12345678";
if (var.Length >= 4)
{
var = var.substring(var.Length - 4, 4)
}
// result = "5678"
assuming you wanted the strings in between a string which is located 10 characters from the last character and you need only 3 characters.
Let's say StreamSelected = "rtsp://72.142.0.230:80/SMIL-CHAN-273/4CIF-273.stream"
In the above, I need to extract the "273" that I will use in database query
//find the length of the string
int streamLen=StreamSelected.Length;
//now remove all characters except the last 10 characters
string streamLessTen = StreamSelected.Remove(0,(streamLen - 10));
//extract the 3 characters using substring starting from index 0
//show Result is a TextBox (txtStreamSubs) with
txtStreamSubs.Text = streamLessTen.Substring(0, 3);
public static string Last(this string source, int tailLength)
{
return tailLength >= source.Length ? source : source[^tailLength..];
}
This is a bit more than the OP question, but is an example of how to use the last 3 of a string for a specific purpose. In my case, I wanted to do a numerical sort (LINQ OrderBy) on a number field that is stored as a string (1 to 3 digit numbers.) So, to get the string numbers to sort like numeric numbers, I need to left-pad the string numbers with zeros and then take the last 3. The resulting OrderBy statement is:
myList = myList.OrderBy(x => string.Concat("00",x.Id)[^3..])
The string.Concat() used in the OrderBy statement results in strings like "001", "002", "011", "021", "114" which sort the way they would if they were stored as numbers.

ReverseString, a C# interview-question

I had an interview question that asked me for my 'feedback' on a piece of code a junior programmer wrote. They hinted there may be a problem and said it will be used heavily on large strings.
public string ReverseString(string sz)
{
string result = string.Empty;
for(int i = sz.Length-1; i>=0; i--)
{
result += sz[i]
}
return result;
}
I couldn't spot it. I saw no problems whatsoever.
In hindsight I could have said the user should resize but it looks like C# doesn't have a resize (i am a C++ guy).
I ended up writing things like use an iterator if its possible, [x] in containers could not be random access so it may be slow. and misc things. But I definitely said I never had to optimize C# code so my thinking may have not failed me on the interview.
I wanted to know, what is the problem with this code, do you guys see it?
-edit-
I changed this into a wiki because there can be several right answers.
Also i am so glad i explicitly said i never had to optimize a C# program and mentioned the misc other things. Oops. I always thought C# didnt have any performance problems with these type of things. oops.
Most importantly? That will suck performance wise - it has to create lots of strings (one per character). The simplest way is something like:
public static string Reverse(string sz) // ideal for an extension method
{
if (string.IsNullOrEmpty(sz) || sz.Length == 1) return sz;
char[] chars = sz.ToCharArray();
Array.Reverse(chars);
return new string(chars);
}
The problem is that string concatenations are expensive to do as strings are immutable in C#. The example given will create a new string one character longer each iteration which is very inefficient. To avoid this you should use the StringBuilder class instead like so:
public string ReverseString(string sz)
{
var builder = new StringBuilder(sz.Length);
for(int i = sz.Length-1; i>=0; i--)
{
builder.Append(sz[i]);
}
return builder.ToString();
}
The StringBuilder is written specifically for scenarios like this as it gives you the ability to concatenate strings without the drawback of excessive memory allocation.
You will notice I have provided the StringBuilder with an initial capacity which you don't often see. As you know the length of the result to begin with, this removes needless memory allocations.
What normally happens is it allocates an amount of memory to the StringBuilder (default 16 characters). Once the contents attempts to exceed that capacity it doubles (I think) its own capactity and carries on. This is much better than allocating memory each time as would happen with normal strings, but if you can avoid this as well it's even better.
A few comments on the answers given so far:
Every single one of them (so far!) will fail on surrogate pairs and combining characters. Oh the joys of Unicode. Reversing a string isn't the same as reversing a sequence of chars.
I like Marc's optimisation for null, empty, and single character inputs. In particular, not only does this get the right answer quickly, but it also handles null (which none of the other answers do)
I originally thought that ToCharArray followed by Array.Reverse would be the fastest, but it does create one "garbage" copy.
The StringBuilder solution creates a single string (not char array) and manipulates that until you call ToString. There's no extra copying involved... but there's a lot more work maintaining lengths etc.
Which is the more efficient solution? Well, I'd have to benchmark it to have any idea at all - but even so that's not going to tell the whole story. Are you using this in a situation with high memory pressure, where extra garbage is a real pain? How fast is your memory vs your CPU, etc?
As ever, readability is usually king - and it doesn't get much better than Marc's answer on that front. In particular, there's no room for an off-by-one error, whereas I'd have to actually put some thought into validating the other answers. I don't like thinking. It hurts my brain, so I try not to do it very often. Using the built-in Array.Reverse sounds much better to me. (Okay, so it still fails on surrogates etc, but hey...)
Since strings are immutable, each += statement will create a new string by copying the string in the last step, along with the single character to form a new string. Effectively, this will be an O(n2) algorithm instead of O(n).
A faster way would be (O(n)):
// pseudocode:
static string ReverseString(string input) {
char[] buf = new char[input.Length];
for(int i = 0; i < buf.Length; ++i)
buf[i] = input[input.Length - i - 1];
return new string(buf);
}
You can do this in .NET 3.5 instead:
public static string Reverse(this string s)
{
return new String((s.ToCharArray().Reverse()).ToArray());
}
Better way to tackle it would be to use a StringBuilder, since it is not immutable you won't get the terrible object generation behavior that you would get above. In .net all strings are immutable, which means that the += operator there will create a new object each time it is hit. StringBuilder uses an internal buffer, so the reversal could be done in the buffer w/ no extra object allocations.
You should use the StringBuilder class to create your resulting string. A string is immutable so when you append a string in each interation of the loop, a new string has to be created, which isn't very efficient.
I prefer something like this:
using System;
using System.Text;
namespace SpringTest3
{
static class Extentions
{
static private StringBuilder ReverseStringImpl(string s, int pos, StringBuilder sb)
{
return (s.Length <= --pos || pos < 0) ? sb : ReverseStringImpl(s, pos, sb.Append(s[pos]));
}
static public string Reverse(this string s)
{
return ReverseStringImpl(s, s.Length, new StringBuilder()).ToString();
}
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine("abc".Reverse());
}
}
}
x is the string to reverse.
Stack<char> stack = new Stack<char>(x);
string s = new string(stack.ToArray());
This method cuts the number of iterations in half. Rather than starting from the end, it starts from the beginning and swaps characters until it hits center. Had to convert the string to a char array because the indexer on a string has no setter.
public string Reverse(String value)
{
if (String.IsNullOrEmpty(value)) throw new ArgumentNullException("value");
char[] array = value.ToCharArray();
for (int i = 0; i < value.Length / 2; i++)
{
char temp = array[i];
array[i] = array[(array.Length - 1) - i];
array[(array.Length - 1) - i] = temp;
}
return new string(array);
}
Necromancing.
As a public service, this is how you actually CORRECTLY reverse a string (reversing a string is NOT equal to reversing a sequence of chars)
public static class Test
{
private static System.Collections.Generic.List<string> GraphemeClusters(string s)
{
System.Collections.Generic.List<string> ls = new System.Collections.Generic.List<string>();
System.Globalization.TextElementEnumerator enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(s);
while (enumerator.MoveNext())
{
ls.Add((string)enumerator.Current);
}
return ls;
}
// this
private static string ReverseGraphemeClusters(string s)
{
if(string.IsNullOrEmpty(s) || s.Length == 1)
return s;
System.Collections.Generic.List<string> ls = GraphemeClusters(s);
ls.Reverse();
return string.Join("", ls.ToArray());
}
public static void TestMe()
{
string s = "Les Mise\u0301rables";
// s = "noël";
string r = ReverseGraphemeClusters(s);
// This would be wrong:
// char[] a = s.ToCharArray();
// System.Array.Reverse(a);
// string r = new string(a);
System.Console.WriteLine(r);
}
}
See:
https://vimeo.com/7403673
By the way, in Golang, the correct way is this:
package main
import (
"unicode"
"regexp"
)
func main() {
str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme(str))
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme2(str))
}
func ReverseGrapheme(str string) string {
buf := []rune("")
checked := false
index := 0
ret := ""
for _, c := range str {
if !unicode.Is(unicode.M, c) {
if len(buf) > 0 {
ret = string(buf) + ret
}
buf = buf[:0]
buf = append(buf, c)
if checked == false {
checked = true
}
} else if checked == false {
ret = string(append([]rune(""), c)) + ret
} else {
buf = append(buf, c)
}
index += 1
}
return string(buf) + ret
}
func ReverseGrapheme2(str string) string {
re := regexp.MustCompile("\\PM\\pM*|.")
slice := re.FindAllString(str, -1)
length := len(slice)
ret := ""
for i := 0; i < length; i += 1 {
ret += slice[length-1-i]
}
return ret
}
And the incorrect way is this (ToCharArray.Reverse):
func Reverse(s string) string {
runes := []rune(s)
for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
runes[i], runes[j] = runes[j], runes[i]
}
return string(runes)
}
Note that you need to know the difference between
- a character and a glyph
- a byte (8 bit) and a codepoint/rune (32 bit)
- a codepoint and a GraphemeCluster [32+ bit] (aka Grapheme/Glyph)
Reference:
Character is an overloaded term than can mean many things.
A code point is the atomic unit of information. Text is a sequence of
code points. Each code point is a number which is given meaning by the
Unicode standard.
A grapheme is a sequence of one or more code points that are displayed
as a single, graphical unit that a reader recognizes as a single
element of the writing system. For example, both a and ä are
graphemes, but they may consist of multiple code points (e.g. ä may be
two code points, one for the base character a followed by one for the
diaresis; but there's also an alternative, legacy, single code point
representing this grapheme). Some code points are never part of any
grapheme (e.g. the zero-width non-joiner, or directional overrides).
A glyph is an image, usually stored in a font (which is a collection
of glyphs), used to represent graphemes or parts thereof. Fonts may
compose multiple glyphs into a single representation, for example, if
the above ä is a single code point, a font may chose to render that as
two separate, spatially overlaid glyphs. For OTF, the font's GSUB and
GPOS tables contain substitution and positioning information to make
this work. A font may contain multiple alternative glyphs for the same
grapheme, too.
static string reverseString(string text)
{
Char[] a = text.ToCharArray();
string b = "";
for (int q = a.Count() - 1; q >= 0; q--)
{
b = b + a[q].ToString();
}
return b;
}

Most efficient way to concatenate strings?

What's the most efficient way to concatenate strings?
Rico Mariani, the .NET Performance guru, had an article on this very subject. It's not as simple as one might suspect. The basic advice is this:
If your pattern looks like:
x = f1(...) + f2(...) + f3(...) + f4(...)
that's one concat and it's zippy, StringBuilder probably won't help.
If your pattern looks like:
if (...) x += f1(...)
if (...) x += f2(...)
if (...) x += f3(...)
if (...) x += f4(...)
then you probably want StringBuilder.
Yet another article to support this claim comes from Eric Lippert where he describes the optimizations performed on one line + concatenations in a detailed manner.
The StringBuilder.Append() method is much better than using the + operator. But I've found that, when executing 1000 concatenations or less, String.Join() is even more efficient than StringBuilder.
StringBuilder sb = new StringBuilder();
sb.Append(someString);
The only problem with String.Join is that you have to concatenate the strings with a common delimiter.
Edit: as #ryanversaw pointed out, you can make the delimiter string.Empty.
string key = String.Join("_", new String[]
{ "Customers_Contacts", customerID, database, SessionID });
There are 6 types of string concatenations:
Using the plus (+) symbol.
Using string.Concat().
Using string.Join().
Using string.Format().
Using string.Append().
Using StringBuilder.
In an experiment, it has been proved that string.Concat() is the best way to approach if the words are less than 1000(approximately) and if the words are more than 1000 then StringBuilder should be used.
For more information, check this site.
string.Join() vs string.Concat()
The string.Concat method here is equivalent to the string.Join method invocation with an empty separator. Appending an empty string is fast, but not doing so is even faster, so the string.Concat method would be superior here.
From Chinh Do - StringBuilder is not always faster:
Rules of Thumb
When concatenating three dynamic string values or less, use traditional string concatenation.
When concatenating more than three dynamic string values, use StringBuilder.
When building a big string from several string literals, use either the # string literal or the inline + operator.
Most of the time StringBuilder is your best bet, but there are cases as shown in that post that you should at least think about each situation.
If you're operating in a loop, StringBuilder is probably the way to go; it saves you the overhead of creating new strings regularly. In code that'll only run once, though, String.Concat is probably fine.
However, Rico Mariani (.NET optimization guru) made up a quiz in which he stated at the end that, in most cases, he recommends String.Format.
Here is the fastest method I've evolved over a decade for my large-scale NLP app. I have variations for IEnumerable<T> and other input types, with and without separators of different types (Char, String), but here I show the simple case of concatenating all strings in an array into a single string, with no separator. Latest version here is developed and unit-tested on C# 7 and .NET 4.7.
There are two keys to higher performance; the first is to pre-compute the exact total size required. This step is trivial when the input is an array as shown here. For handling IEnumerable<T> instead, it is worth first gathering the strings into a temporary array for computing that total (The array is required to avoid calling ToString() more than once per element since technically, given the possibility of side-effects, doing so could change the expected semantics of a 'string join' operation).
Next, given the total allocation size of the final string, the biggest boost in performance is gained by building the result string in-place. Doing this requires the (perhaps controversial) technique of temporarily suspending the immutability of a new String which is initially allocated full of zeros. Any such controversy aside, however...
...note that this is the only bulk-concatenation solution on this page which entirely avoids an extra round of allocation and copying by the String constructor.
Complete code:
/// <summary>
/// Concatenate the strings in 'rg', none of which may be null, into a single String.
/// </summary>
public static unsafe String StringJoin(this String[] rg)
{
int i;
if (rg == null || (i = rg.Length) == 0)
return String.Empty;
if (i == 1)
return rg[0];
String s, t;
int cch = 0;
do
cch += rg[--i].Length;
while (i > 0);
if (cch == 0)
return String.Empty;
i = rg.Length;
fixed (Char* _p = (s = new String(default(Char), cch)))
{
Char* pDst = _p + cch;
do
if ((t = rg[--i]).Length > 0)
fixed (Char* pSrc = t)
memcpy(pDst -= t.Length, pSrc, (UIntPtr)(t.Length << 1));
while (pDst > _p);
}
return s;
}
[DllImport("MSVCR120_CLR0400", CallingConvention = CallingConvention.Cdecl)]
static extern unsafe void* memcpy(void* dest, void* src, UIntPtr cb);
I should mention that this code has a slight modification from what I use myself. In the original, I call the cpblk IL instruction from C# to do the actual copying. For simplicity and portability in the code here, I replaced that with P/Invoke memcpy instead, as you can see. For highest performance on x64 (but maybe not x86) you may want to use the cpblk method instead.
From this MSDN article:
There is some overhead associated with
creating a StringBuilder object, both
in time and memory. On a machine with
fast memory, a StringBuilder becomes
worthwhile if you're doing about five
operations. As a rule of thumb, I
would say 10 or more string operations
is a justification for the overhead on
any machine, even a slower one.
So if you trust MSDN go with StringBuilder if you have to do more than 10 strings operations/concatenations - otherwise simple string concat with '+' is fine.
Try this 2 pieces of code and you will find the solution.
static void Main(string[] args)
{
StringBuilder s = new StringBuilder();
for (int i = 0; i < 10000000; i++)
{
s.Append( i.ToString());
}
Console.Write("End");
Console.Read();
}
Vs
static void Main(string[] args)
{
string s = "";
for (int i = 0; i < 10000000; i++)
{
s += i.ToString();
}
Console.Write("End");
Console.Read();
}
You will find that 1st code will end really quick and the memory will be in a good amount.
The second code maybe the memory will be ok, but it will take longer... much longer.
So if you have an application for a lot of users and you need speed, use the 1st. If you have an app for a short term one user app, maybe you can use both or the 2nd will be more "natural" for developers.
Cheers.
It's also important to point it out that you should use the + operator if you are concatenating string literals.
When you concatenate string literals or string constants by using the + operator, the compiler creates a single string. No run time concatenation occurs.
How to: Concatenate Multiple Strings (C# Programming Guide)
Adding to the other answers, please keep in mind that StringBuilder can be told an initial amount of memory to allocate.
The capacity parameter defines the maximum number of characters that can be stored in the memory allocated by the current instance. Its value is assigned to the Capacity property. If the number of characters to be stored in the current instance exceeds this capacity value, the StringBuilder object allocates additional memory to store them.
If capacity is zero, the implementation-specific default capacity is used.
Repeatedly appending to a StringBuilder that hasn't been pre-allocated can result in a lot of unnecessary allocations just like repeatedly concatenating regular strings.
If you know how long the final string will be, can trivially calculate it, or can make an educated guess about the common case (allocating too much isn't necessarily a bad thing), you should be providing this information to the constructor or the Capacity property. Especially when running performance tests to compare StringBuilder with other methods like String.Concat, which do the same thing internally. Any test you see online which doesn't include StringBuilder pre-allocation in its comparisons is wrong.
If you can't make any kind of guess about the size, you're probably writing a utility function which should have its own optional argument for controlling pre-allocation.
Following may be one more alternate solution to concatenate multiple strings.
String str1 = "sometext";
string str2 = "some other text";
string afterConcate = $"{str1}{str2}";
string interpolation
Another solution:
inside the loop, use List instead of string.
List<string> lst= new List<string>();
for(int i=0; i<100000; i++){
...........
lst.Add(...);
}
return String.Join("", lst.ToArray());;
it is very very fast.
The most efficient is to use StringBuilder, like so:
StringBuilder sb = new StringBuilder();
sb.Append("string1");
sb.Append("string2");
...etc...
String strResult = sb.ToString();
#jonezy: String.Concat is fine if you have a couple of small things. But if you're concatenating megabytes of data, your program will likely tank.
System.String is immutable. When we modify the value of a string variable then a new memory is allocated to the new value and the previous memory allocation released. System.StringBuilder was designed to have concept of a mutable string where a variety of operations can be performed without allocation separate memory location for the modified string.
I've tested all the methods in this page and at the end I've developed my solution that is the fastest and less memory expensive.
Note: tested in Framework 4.8
[MemoryDiagnoser]
public class StringConcatSimple
{
private string
title = "Mr.", firstName = "David", middleName = "Patrick", lastName = "Callan";
[Benchmark]
public string FastConcat()
{
return FastConcat(
title, " ",
firstName, " ",
middleName, " ",
lastName);
}
[Benchmark]
public string StringBuilder()
{
var stringBuilder =
new StringBuilder();
return stringBuilder
.Append(title).Append(' ')
.Append(firstName).Append(' ')
.Append(middleName).Append(' ')
.Append(lastName).ToString();
}
[Benchmark]
public string StringBuilderExact24()
{
var stringBuilder =
new StringBuilder(24);
return stringBuilder
.Append(title).Append(' ')
.Append(firstName).Append(' ')
.Append(middleName).Append(' ')
.Append(lastName).ToString();
}
[Benchmark]
public string StringBuilderEstimate100()
{
var stringBuilder =
new StringBuilder(100);
return stringBuilder
.Append(title).Append(' ')
.Append(firstName).Append(' ')
.Append(middleName).Append(' ')
.Append(lastName).ToString();
}
[Benchmark]
public string StringPlus()
{
return title + ' ' + firstName + ' ' +
middleName + ' ' + lastName;
}
[Benchmark]
public string StringFormat()
{
return string.Format("{0} {1} {2} {3}",
title, firstName, middleName, lastName);
}
[Benchmark]
public string StringInterpolation()
{
return
$"{title} {firstName} {middleName} {lastName}";
}
[Benchmark]
public string StringJoin()
{
return string.Join(" ", title, firstName,
middleName, lastName);
}
[Benchmark]
public string StringConcat()
{
return string.
Concat(new String[]
{ title, " ", firstName, " ",
middleName, " ", lastName });
}
}
Yes, it use unsafe
public static unsafe string FastConcat(string str1, string str2, string str3, string str4, string str5, string str6, string str7)
{
var capacity = 0;
var str1Length = 0;
var str2Length = 0;
var str3Length = 0;
var str4Length = 0;
var str5Length = 0;
var str6Length = 0;
var str7Length = 0;
if (str1 != null)
{
str1Length = str1.Length;
capacity = str1Length;
}
if (str2 != null)
{
str2Length = str2.Length;
capacity += str2Length;
}
if (str3 != null)
{
str3Length = str3.Length;
capacity += str3Length;
}
if (str4 != null)
{
str4Length = str4.Length;
capacity += str4Length;
}
if (str5 != null)
{
str5Length = str5.Length;
capacity += str5Length;
}
if (str6 != null)
{
str6Length = str6.Length;
capacity += str6Length;
}
if (str7 != null)
{
str7Length = str7.Length;
capacity += str7Length;
}
string result = new string(' ', capacity);
fixed (char* dest = result)
{
var x = dest;
if (str1Length > 0)
{
fixed (char* src = str1)
{
Unsafe.CopyBlock(x, src, (uint)str1Length * 2);
x += str1Length;
}
}
if (str2Length > 0)
{
fixed (char* src = str2)
{
Unsafe.CopyBlock(x, src, (uint)str2Length * 2);
x += str2Length;
}
}
if (str3Length > 0)
{
fixed (char* src = str3)
{
Unsafe.CopyBlock(x, src, (uint)str3Length * 2);
x += str3Length;
}
}
if (str4Length > 0)
{
fixed (char* src = str4)
{
Unsafe.CopyBlock(x, src, (uint)str4Length * 2);
x += str4Length;
}
}
if (str5Length > 0)
{
fixed (char* src = str5)
{
Unsafe.CopyBlock(x, src, (uint)str5Length * 2);
x += str5Length;
}
}
if (str6Length > 0)
{
fixed (char* src = str6)
{
Unsafe.CopyBlock(x, src, (uint)str6Length * 2);
x += str6Length;
}
}
if (str7Length > 0)
{
fixed (char* src = str7)
{
Unsafe.CopyBlock(x, src, (uint)str7Length * 2);
}
}
}
return result;
}
You can edit the method and adapt it to your case. For example you can make it something like
public static unsafe string FastConcat(string str1, string str2, string str3 = null, string str4 = null, string str5 = null, string str6 = null, string str7 = null)
For just two strings, you definitely do not want to use StringBuilder. There is some threshold above which the StringBuilder overhead is less than the overhead of allocating multiple strings.
So, for more that 2-3 strings, use DannySmurf's code. Otherwise, just use the + operator.
It really depends on your usage pattern.
A detailed benchmark between string.Join, string,Concat and string.Format can be found here: String.Format Isn't Suitable for Intensive Logging
(This is actually the same answer I gave to this question)
It would depend on the code.
StringBuilder is more efficient generally, but if you're only concatenating a few strings and doing it all in one line, code optimizations will likely take care of it for you. It's important to think about how the code looks too: for larger sets StringBuilder will make it easier to read, for small ones StringBuilder will just add needless clutter.

Categories

Resources