Is StringBuilder really faster than Aggregate?

string c = tmpArr[0].Aggregate(string.Empty, (current, m) => current + (m.Name + " "));
StringBuilder sb = new StringBuilder();
foreach (Mobile m in tmpArr[0])
    sb.Append(m.Name + " ");
string c2 = sb.ToString();
Which of those two is faster? Aggregate is certainly cleaner, but is it fast, or is it the same as doing:
foreach (Mobile m in tmpArr[0])
    c += m.Name + " ";
What I would really like to do is something like string.Join(",", tmpArr[0]), but I don't want it to concatenate their ToString() values, just their Names. How would I do that best?
My problem with not using string.Join is that I would have to do something like this:
string separator = "";
StringBuilder sb = new StringBuilder();
foreach (Mobile m in tmpArr[0])
{
    sb.Append(separator + m.Name);
    separator = ", ";
}

If you append strings in a loop (c += m.Name + " ";) you are causing lots of intermediate strings to be created; this causes "telescopic" memory usage and puts extra load on the GC. Aggregate, mixed with the fluent API of StringBuilder, can help here, but so would looping with StringBuilder. It isn't Aggregate that is important: what matters is not creating lots of intermediate strings.
For example, I would use:
foreach (Mobile m in tmpArr[0])
    sb.Append(m.Name).Append(" ");
This creates even fewer intermediate strings, because m.Name + " " is never built ;p
And for a similar example using StringBuilder in Aggregate:
string c = tmpArr[0].Aggregate(new StringBuilder(),
(current, m) => current.Append(m.Name).Append(" ")).ToString();
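A minimal runnable sketch contrasting the approaches discussed above, assuming the element type only needs a Name property (the Mobile record below is a hypothetical stand-in, not the asker's class). All three return the same string; the difference is that the += version allocates a new string on every iteration, while the StringBuilder versions append into one growing buffer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for the question's Mobile type.
public sealed record Mobile(string Name);

public static class NameConcat
{
    // Explicit loop: each Append writes straight into the builder's buffer.
    public static string WithLoop(IEnumerable<Mobile> mobiles)
    {
        var sb = new StringBuilder();
        foreach (Mobile m in mobiles)
            sb.Append(m.Name).Append(" ");
        return sb.ToString();
    }

    // Aggregate seeded with a StringBuilder: the same work expressed as a fold.
    // Append returns the same builder, so it can serve as the accumulator.
    public static string WithAggregate(IEnumerable<Mobile> mobiles) =>
        mobiles.Aggregate(new StringBuilder(),
                          (current, m) => current.Append(m.Name).Append(" "))
               .ToString();

    // The pattern to avoid: each += allocates a new, longer string
    // and copies everything built so far into it.
    public static string WithConcatenation(IEnumerable<Mobile> mobiles)
    {
        string c = string.Empty;
        foreach (Mobile m in mobiles)
            c += m.Name + " ";
        return c;
    }
}
```

With only a handful of items the difference is negligible; it grows with the number of items because each += copies everything built so far.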

I don't want it to concat their ToString values, just their Names, how would I do that best?
string.Join(",",tmpArr[0].Select(t => t.Name).ToArray())
But most of the time It. Just. Doesn't. Matter!
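A small sketch of the same idea without ToArray(): since .NET Framework 4, string.Join has an overload that accepts IEnumerable<string>, so the projected names can be passed directly. Phone is a hypothetical stand-in for the element type of tmpArr[0].

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with a Name property.
public sealed record Phone(string Name);

public static class JoinNames
{
    // Select projects each element to its Name; string.Join(string, IEnumerable<string>)
    // puts the separator only between items, so there is no trailing comma.
    public static string Names(IEnumerable<Phone> items) =>
        string.Join(",", items.Select(t => t.Name));
}
```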

Because string is immutable, every concatenation has a performance cost. This is what StringBuilder is mainly designed for: it acts like a "mutable" string. I haven't done much benchmarking for speed, but for memory optimization StringBuilder is definitely better.

Aggregate runs an anonymous method against each item in the IEnumerable. The method is wrapped in a Func<> delegate, and its return value becomes the accumulator passed to the next call.
It's basically like calling a function that does the appending once per item.
So the stack allocation/deallocation for all those method calls certainly adds more overhead than a simple for/foreach loop.
So, in my opinion, the second method would be faster.

Aggregate itself is not the problem. The problem is that you are concatenating strings in a loop. When you concatenate two strings with the + operator, new memory must be allocated and both strings are copied into it. So if you use + five times, you actually create five new strings. That's why you should use StringBuilder or string.Join, which avoid this.
If you want to use Join along with LINQ for better readability, you still can; just don't use Aggregate, but something like Select and ToArray.

Something like this?
string joined = string.Join(",", myItems.Select(x => x.Name).ToArray());

Related

Find a Class object's Item with Maximum character count in a User Defined List

I am new to LINQ. I have a scenario where I am stuck trying to do a task in a single expression rather than a loop:
In an array of a custom class, how do I find the element with the maximum length of a specific string property?
StringBuilder sb = new StringBuilder();
CultureInfo[] cinfo = CultureInfo.GetCultures(CultureTypes.AllCultures);
foreach (CultureInfo ci in cinfo)
{
sb.Append(ci.Name + "," + ci.TwoLetterISOLanguageName + "," + ci.ThreeLetterISOLanguageName + "," + ci.ThreeLetterWindowsLanguageName + "," + ci.DisplayName + "," + ci.EnglishName + Environment.NewLine);
}
lb.Items.Add(sb.ToString());
txtData.Text = sb.ToString();
Here cinfo is the collection, and I want to find the element in the whole array whose Name property has the most characters.

You can order by the length of Name then take the first:
var cultureWithLongestName = CultureInfo
.GetCultures(CultureTypes.AllCultures)
.OrderByDescending(x => x.Name.Length)
.FirstOrDefault();

The easiest way to go about doing something like that is to order by the length and take the first element. We can do that like so:
var elementWithLongestName = collection.OrderByDescending(e => e.Name.Length).First();
HOWEVER, sorting with OrderByDescending gives you O(n log n) time complexity (it uses Quicksort). We can do better: we can solve this in linear time (O(n)):
Using .NET 5.0 or below
Unfortunately, before .NET 6.0 there is no built-in LINQ operator that finds a maximum element by one of its properties. There are two ways we can achieve this anyway:
Using vanilla LINQ
We can find the maximum length in the collection, and then find an element with that length:
var longestNameLength = collection.Select(e => e.Name.Length).Max();
var elementWithLongestName = collection.First(e => e.Name.Length == longestNameLength);
One downside of this approach (besides being more verbose) is that we're enumerating the same collection twice. In your case, that's fine because the source collection is a simple in-memory array. If the source collection was a more complicated IEnumerable (like a collection that lazily loads items from a database), it may be a good idea to turn it into an in-memory collection first by using ToList() or ToArray().
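If you need a single pass on .NET 5.0 or below without a third-party package, Aggregate can also do the job. This is a sketch, not one of the two approaches above; Person is a hypothetical element type. An unseeded Aggregate throws InvalidOperationException on an empty sequence, and with the strict > comparison the first of several equally long names wins.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with a Name property.
public sealed record Person(string Name);

public static class Longest
{
    // One pass over the sequence: keep whichever element has the longer Name so far.
    // Strict '>' keeps the earlier element on ties.
    // Throws InvalidOperationException if the sequence is empty (unseeded Aggregate).
    public static Person ByName(IEnumerable<Person> collection) =>
        collection.Aggregate((best, e) => e.Name.Length > best.Name.Length ? e : best);
}
```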
Using MoreLinq
You can use the popular package MoreLinq to achieve what you asked for:
var elementWithLongestName = collection.MaxBy(e => e.Name.Length).First(); // MoreLinq 3.x: MaxBy returns all elements with the maximal key
Using .NET 6.0 and above
In this version, the MaxBy operator (that does exactly what you need) is included out of the box and there is no need to use a third-party library. You can just write:
var elementWithLongestName = collection.MaxBy(e => e.Name.Length);
NOTE: a stable release of .NET 6.0 is scheduled for November 2021.

Iterating through object properties in IList without a loop

Consider the following situation:
public class Employee
{
public string Name { get; set; }
public string Email { get; set; }
}
public class EnployeeGroup
{
//List of employees in marketting
public IList<Employee> MarkettingEmployees{ get; }
//List of employees in sales
public IList<Employee> SalesEmployees{ get; }
}
private EnployeeGroup GroupA;
int MarkettingCount;
string MarkettingNames;
MarkettingCount = GroupA.MarkettingEmployees.Count; //assigns MarkettingCount=5, this will always be 5-10 employees
MarkettingNames = ???; // How can I join all the GroupA.MarkettingEmployees Name values into a comma-separated string?
//I tried a loop:
foreach(Employee MktEmployee in GroupA.MarkettingEmployees)
{
MarkettingNames += MktEmployee.Name + ", ";
}
The loop works, but I want to know:
Is looping the most efficient/elegant way of doing this? If not, what are the better alternatives? I tried string.Join but couldn't get it working.
I want to avoid LINQ.

You need a little bit of LINQ whether you like it or not ;)
MarkettingNames = string.Join(", ", GroupA.MarkettingEmployees.Select(e => e.Name));
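Since the question asked to avoid LINQ, here is a sketch of an alternative that still relies on string.Join but collects the names with a plain loop first. The Employee class mirrors the question's (with the property syntax completed); EmployeeNames is a hypothetical helper name.

```csharp
using System.Collections.Generic;

public class Employee
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class EmployeeNames
{
    // Gathers names without LINQ, then lets string.Join place ", " between them,
    // so there is no trailing separator to trim afterwards.
    public static string Join(IList<Employee> employees)
    {
        var names = new List<string>(employees.Count);
        foreach (Employee e in employees)
            names.Add(e.Name);
        return string.Join(", ", names);
    }
}
```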

From a practical standpoint, there's no reasonable argument for avoiding a loop. Iteration is at the heart of every general-purpose programming language.
Using LINQ is elegant in simple cases. Again, there's no sound reason to avoid it per se.
If you are looking for a rather obscure, academic solution, there's always tail recursion. However, your data structure would have to be adapted for it. Note that even if you use it, a smart compiler will detect it and optimize it into a loop. The odds are against you!

As an alternative, you could use a StringBuilder with Append instead of creating a new string at each iteration.

This would be much more efficient (see caveat below):
var stringBuilder = new StringBuilder();
foreach (Employee MktEmployee in GroupA.MarkettingEmployees)
{
    stringBuilder.Append(MktEmployee.Name).Append(", ");
}
MarkettingNames = stringBuilder.ToString();
than this:
foreach(Employee MktEmployee in GroupA.MarkettingEmployees)
{
MarkettingNames += MktEmployee.Name + ", ";
}
Edit: With a large number of employees this would be much more efficient. However, for a trivial loop of 5-10 employees it is actually slightly less efficient.
In small cases this isn't much of a performance hit, but in large cases the payoff will be significant.
Also, if you are to use the explicit loop approach, it's probably best to trim off that last ", " by using something like:
myString = myString.Trim().TrimEnd(',');
The article below explains when you should use StringBuilder to concatenate strings.
In short, with your approach each concatenation creates a new string, which eats up a lot of memory. Each iteration also has to copy all the existing characters of MarkettingNames into the new string before appending yet another MktEmployee.Name + ", ".
Thank you, Jon Skeet: http://www.yoda.arachsys.com/csharp/stringbuilder.html

Best way to remove the last character from a string built with stringbuilder

I have the following
data.AppendFormat("{0},",dataToAppend);
The problem with this is that I am using it in a loop and there will be a trailing comma. What is the best way to remove the trailing comma?
Do I have to change data to a string and then substring it?

The simplest and most efficient way is to perform this command:
data.Length--;
By doing this you move the end pointer (i.e. the last index) back one character without reallocating the object's buffer. In fact, clearing a StringBuilder is best done with Length as well (but do use the Clear() method instead for clarity, because that's what its implementation looks like):
data.Length = 0;
again, because it doesn't reallocate the buffer. Think of it as saying, "I don't want to recognize these characters anymore." Even when calling ToString(), it won't read anything past its Length; it can't. It's a mutable object that usually allocates more capacity than what you provide it; it's simply built this way.
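A minimal sketch of the Length approach applied to the question's loop (TrailingComma is a hypothetical helper, and the values passed in are illustrative). The guard matters: setting Length to a negative number throws ArgumentOutOfRangeException, so data.Length-- on an empty builder would fail.

```csharp
using System.Collections.Generic;
using System.Text;

public static class TrailingComma
{
    public static string Build(IEnumerable<string> values)
    {
        var data = new StringBuilder();
        foreach (string dataToAppend in values)
            data.AppendFormat("{0},", dataToAppend);

        // Drop the trailing comma in place; the buffer is not reallocated.
        // Guarded because Length-- on an empty builder would throw.
        if (data.Length > 0)
            data.Length--;

        return data.ToString();
    }
}
```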

Just use
string.Join(",", yourCollection)
This way you don't need the StringBuilder and the loop.
A longer addition about the async case: as of 2019, it's not a rare setup for the data to arrive asynchronously.
If your data are in an async collection, there is no string.Join overload taking IAsyncEnumerable<T>. But it's easy to create one manually by adapting the code of string.Join:
public static class StringEx
{
public static async Task<string> JoinAsync<T>(string separator, IAsyncEnumerable<T> seq)
{
if (seq == null)
throw new ArgumentNullException(nameof(seq));
await using (var en = seq.GetAsyncEnumerator())
{
if (!await en.MoveNextAsync())
return string.Empty;
string firstString = en.Current?.ToString();
if (!await en.MoveNextAsync())
return firstString ?? string.Empty;
// Null separator and values are handled by the StringBuilder
var sb = new StringBuilder(256);
sb.Append(firstString);
do
{
var currentValue = en.Current;
sb.Append(separator);
if (currentValue != null)
sb.Append(currentValue);
}
while (await en.MoveNextAsync());
return sb.ToString();
}
}
}
If the data arrive asynchronously but the IAsyncEnumerable<T> interface is not supported (like SqlDataReader, mentioned in the comments), it's relatively easy to wrap the data into an IAsyncEnumerable<T>:
async IAsyncEnumerable<(object first, object second, object product)> ExtractData(
SqlDataReader reader)
{
while (await reader.ReadAsync())
yield return (reader[0], reader[1], reader[2]);
}
and use it:
Task<string> Stringify(SqlDataReader reader) =>
StringEx.JoinAsync(
", ",
ExtractData(reader).Select(x => $"{x.first} * {x.second} = {x.product}"));
To use Select on an IAsyncEnumerable<T>, you'll need the NuGet package System.Interactive.Async (in newer versions, the standard operators such as Select were split out into System.Linq.Async).

How about this:
string str = "The quick brown fox jumps over the lazy dog,";
StringBuilder sb = new StringBuilder(str);
sb.Remove(str.Length - 1, 1);

Use the following on the resulting string after the loop:
data.ToString().TrimEnd(',')
or simply change to
string commaSeparatedList = input.Aggregate((a, x) => a + ", " + x);

I prefer manipulating the length of the StringBuilder:
data.Length = data.Length - 1;

I recommend changing your loop algorithm:
Add the comma not AFTER the item, but BEFORE it
Use a boolean variable that starts as false to suppress the first comma
Set this boolean variable to true after testing it
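The steps above, sketched as a small helper (the name CommaFirst.Build is hypothetical): the comma is written before every item except the first, so there is never a trailing comma to remove.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CommaFirst
{
    public static string Build(IEnumerable<string> items)
    {
        var data = new StringBuilder();
        bool needComma = false;       // starts false: no comma before the first item
        foreach (string item in items)
        {
            if (needComma)
                data.Append(',');     // the comma goes BEFORE the item
            needComma = true;         // every later item gets a comma
            data.Append(item);
        }
        return data.ToString();
    }
}
```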

You should use the string.Join method to turn a collection of items into a comma delimited string. It will ensure that there is no leading or trailing comma, as well as ensure the string is constructed efficiently (without unnecessary intermediate strings).

The simplest way would be to use the Join() method:
public static void Trail()
{
var list = new List<string> { "lala", "lulu", "lele" };
var data = string.Join(",", list);
}
If you really need the StringBuilder, trim the end comma after the loop:
data.ToString().TrimEnd(',');

Yes, convert it to a string once the loop is done:
String str = data.ToString().TrimEnd(',');

You have two options. The first one is very easy: use the Remove method, which is quite efficient. The second way is to use ToString with a start index and a length (see the MSDN documentation).
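Both options sketched (DropLast is a hypothetical helper class). Note that the overload is StringBuilder.ToString(int startIndex, int length), which takes a length rather than an end index. Remove modifies the builder in place; the ToString overload leaves the builder unchanged and returns a shorter copy. Both assume the builder holds at least one character.

```csharp
using System.Text;

public static class DropLast
{
    // Option 1: delete the final character from the builder itself.
    public static string ViaRemove(StringBuilder sb)
    {
        sb.Remove(sb.Length - 1, 1);
        return sb.ToString();
    }

    // Option 2: copy out everything except the final character; sb is not modified.
    public static string ViaToString(StringBuilder sb) =>
        sb.ToString(0, sb.Length - 1);
}
```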

There is a similar question on Stack Overflow.
I liked the answer there that uses a StringBuilder extension method:
RemoveLast Method
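The linked method isn't reproduced here, so the following is only a hypothetical extension in the same spirit: the name RemoveLast and the choice to remove the given suffix only when it is actually present are assumptions.

```csharp
using System;
using System.Text;

public static class StringBuilderExtensions
{
    // Hypothetical helper: removes `value` from the end of the builder if it is there.
    // Returns the builder so calls can be chained, like Append.
    public static StringBuilder RemoveLast(this StringBuilder sb, string value)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (string.IsNullOrEmpty(value) || sb.Length < value.Length)
            return sb;

        int start = sb.Length - value.Length;
        for (int i = 0; i < value.Length; i++)
        {
            if (sb[start + i] != value[i])
                return sb;            // suffix doesn't match: leave the builder alone
        }
        return sb.Remove(start, value.Length);
    }
}
```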

Gotcha!!
Most of the answers on this thread won't work if you use AppendLine like below:
var builder = new StringBuilder();
builder.AppendLine("One,");
builder.Length--; // Won't work
Console.Write(builder.ToString());
builder = new StringBuilder();
builder.AppendLine("One,");
builder.Length += -1; // Won't work
Console.Write(builder.ToString());
builder = new StringBuilder();
builder.AppendLine("One,");
Console.Write(builder.ToString().TrimEnd(',')); // Won't work
WHY??? #(&**(&#!!
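The issue is simple but took me a while to figure out: AppendLine appends Environment.NewLine, which adds two invisible characters at the end on Windows, CR and LF (carriage return and line feed), but only LF on Linux and macOS. Therefore, you need to remove the comma plus the newline, which is 3 characters on Windows:
var builder = new StringBuilder();
builder.AppendLine("One,");
builder.Length -= 1 + Environment.NewLine.Length; // This will work on any platform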
Console.WriteLine(builder.ToString());
In Conclusion
Use Length-- or Length -= 1 if the last method you called was Append. Use Length -= 3 (more portably, Length -= 1 + Environment.NewLine.Length) if the last method you called was AppendLine.
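If you want to drop the comma but keep the line break, one platform-independent sketch is to remove the single character just before Environment.NewLine. The helper name is hypothetical, and it assumes the builder ends with the text added by AppendLine.

```csharp
using System;
using System.Text;

public static class LineComma
{
    // Removes the comma immediately before the trailing newline that AppendLine added.
    // Environment.NewLine is "\r\n" on Windows and "\n" on Linux/macOS.
    public static StringBuilder RemoveCommaBeforeNewLine(StringBuilder builder)
    {
        int commaIndex = builder.Length - Environment.NewLine.Length - 1;
        if (commaIndex >= 0 && builder[commaIndex] == ',')
            builder.Remove(commaIndex, 1);
        return builder;
    }
}
```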

Simply shorten the StringBuilder's length by 1:
StringBuilder sb = new StringBuilder("a,b,");
sb.Length--;
I know this may not look like the most effective way, as it translates to sb.Length = sb.Length - 1;
An alternative solution:
sb.Remove(starting_index, how_many_character_to_delete);
For our case it would be:
sb.Remove(sb.Length - 1, 1);

String to dictionary using regex (want to optimize)

I have a string in the format "$0Option one$1$Option two$2$Option three" (etc.) that I want to convert into a dictionary where each number corresponds to an option. I currently have a working solution for this problem, but since this method is called for every entry I'm importing (a few thousand), I want it to be as optimized as possible.
public Dictionary<string, int> GetSelValsDictBySelValsString(string selectableValuesString)
{
// Get all numbers in the string.
var correspondingNumbersArray = Regex.Split(selectableValuesString, @"[^\d]+").Where(x => (!String.IsNullOrWhiteSpace(x))).ToArray();
List<int> correspondingNumbers = new List<int>();
int number;
foreach (string s in correspondingNumbersArray)
{
Int32.TryParse(s, out number);
correspondingNumbers.Add(number);
}
selectableValuesString = selectableValuesString.Replace("$", "");
var selectableStringValuesArray = Regex.Split(selectableValuesString, @"[\d]+").Where(x => (!String.IsNullOrWhiteSpace(x))).ToArray();
var selectableValues = new Dictionary<string, int>();
for (int i = 0; i < selectableStringValuesArray.Count(); i++)
{
selectableValues.Add(selectableStringValuesArray.ElementAt(i), correspondingNumbers.ElementAt(i));
}
return selectableValues;
}

The first thing that caught my attention in your code is that it processes the input string three times: twice with Split() and once with Replace(). The Matches() method is a much better tool than Split() for this job. With it, you can extract everything you need in a single pass. It makes the code a lot easier to read, too.
The second thing I noticed was all those loops and intermediate objects. You're using LINQ already; really use it, and you can eliminate all of that clutter and improve performance. Check it out:
public static Dictionary<int, string> GetSelectValuesDictionary(string inputString)
{
return Regex.Matches(inputString, @"(?<key>[0-9]+)\$*(?<value>[^$]+)")
.Cast<Match>()
.ToDictionary(
m => int.Parse(m.Groups["key"].Value),
m => m.Groups["value"].Value);
}
Notes:
Cast<Match>() is necessary because MatchCollection only advertises itself as an IEnumerable, and we need it to be an IEnumerable<Match>. (This applies to .NET Framework; since .NET Core 2.0, MatchCollection also implements IEnumerable<Match>, so the Cast is redundant there but harmless.)
I used [0-9] instead of \d on the off chance that your values might contain digits from non-Latin writing systems; in .NET, \d matches them all.
Static Regex methods like Matches() automatically cache the Regex objects, but if this method is going to be called a lot (especially if you're using a lot of other regexes, too), you might want to create a static Regex object anyway. If performance is really critical, you can specify RegexOptions.Compiled while you're at it.
My code, like yours, makes no attempt to deal with malformed input. In particular, mine will throw an exception if the number turns out to be too large, while yours just converts it to zero. This probably isn't relevant to your real code, but I felt compelled to express my unease at seeing you call TryParse() without checking the return value. :/
You also don't make sure your keys are unique. Like @Gabe, I flipped it around and used the numeric values as the keys, because they happened to be unique and the string values weren't. I trust that, too, is not a problem with your real data. ;)
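Following the note about caching, here is a sketch that holds the same pattern in a static Regex created with RegexOptions.Compiled (the SelectValues class name is hypothetical). Like the answer's code, it does no validation of malformed input.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SelectValues
{
    // Built once and reused across calls; Compiled trades startup cost for faster matching.
    private static readonly Regex OptionPattern =
        new Regex(@"(?<key>[0-9]+)\$*(?<value>[^$]+)", RegexOptions.Compiled);

    public static Dictionary<int, string> Parse(string inputString) =>
        OptionPattern.Matches(inputString)
            .Cast<Match>()   // needed on .NET Framework; a no-op where MatchCollection is IEnumerable<Match>
            .ToDictionary(
                m => int.Parse(m.Groups["key"].Value),
                m => m.Groups["value"].Value);
}
```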

Your selectableStringValuesArray is not actually an array! This means that every time you index into it (with ElementAt) or count it (with Count), it has to rerun the regex and walk through the list of results looking for non-whitespace. You need something like this instead:
var selectableStringValuesArray = Regex.Split(selectableValuesString, @"[\d]+").Where(x => (!String.IsNullOrWhiteSpace(x))).ToArray();
You should also fix your correspondingNumbersString because it has the same problem.
I see you're using C# 4, though, so you can use Zip to combine the lists and then you wouldn't have to create an array or use any loops. You could create your dictionary like this:
return correspondingNumbersString.Zip(selectableStringValuesArray,
(number, str) => new KeyValuePair<int, string>(int.Parse(number), str))
.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
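A self-contained sketch of the Zip approach, assuming the number strings and option strings have already been extracted into two arrays (the helper name and inputs are hypothetical). Note that Zip stops at the end of the shorter sequence, so a count mismatch is silently truncated rather than reported.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ZipToDictionary
{
    // Pairs the i-th number with the i-th option, then builds the dictionary.
    public static Dictionary<int, string> Build(string[] numbers, string[] values) =>
        numbers.Zip(values, (number, str) => new KeyValuePair<int, string>(int.Parse(number), str))
               .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
}
```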

Can all 'for' loops be replaced with a LINQ statement?

Is it possible to write the following foreach as a LINQ statement? And, as the more general question: can any for loop be replaced by a LINQ statement?
I'm not interested in any potential performance cost just the potential of using declarative approaches in what is traditionally imperative code.
private static string SomeMethod()
{
if (ListOfResources.Count == 0)
return string.Empty;
var sb = new StringBuilder();
foreach (var resource in ListOfResources)
{
if (sb.Length != 0)
sb.Append(", ");
sb.Append(resource.Id);
}
return sb.ToString();
}
Cheers
AWC

Sure. Heck, you can replace arithmetic with LINQ queries:
http://blogs.msdn.com/ericlippert/archive/2009/12/07/query-transformations-are-syntactic.aspx
But you shouldn't.
The purpose of a query expression is to represent a query operation. The purpose of a "for" loop is to iterate over a particular statement so as to have its side-effects executed multiple times. Those are frequently very different. I encourage replacing loops whose purpose is merely to query data with higher-level constructs that more clearly query the data. I strongly discourage replacing side-effect-generating code with query comprehensions, though doing so is possible.

In general yes, but there are specific cases that are extremely difficult. For instance, the following code in the general case does not port to a LINQ expression without a good deal of hacking.
var list = new List<Func<int>>();
foreach ( var cur in (new int[] {1,2,3})) {
list.Add(() => cur);
}
The reason is that with a for loop, it's possible to see the side effects of how the iteration variable is captured in a closure. LINQ expressions hide the lifetime semantics of the iteration variable and prevent you from seeing the side effects of capturing its value.
Note. The above code is not equivalent to the following LINQ expression.
var list = Enumerable.Range(1,3).Select(x => () => x).ToList();
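In C# 4 and earlier, the foreach sample produces a list of Func<int> objects which all return 3, while the LINQ version produces a list of Func<int> which return 1, 2 and 3 respectively. This is what makes this style of capture difficult to port. (Since C# 5, each foreach iteration gets a fresh copy of the iteration variable, so the two versions now agree; the shared-variable behavior remains for for loops.)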
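A sketch you can run on a current compiler to see the capture rules (class and method names are hypothetical). Since C# 5, each foreach iteration gets a fresh variable, so the foreach closures return 1, 2 and 3 like the LINQ version; a for loop still shares one variable across iterations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Capture
{
    public static int[] ForEachCapture()
    {
        var list = new List<Func<int>>();
        foreach (var cur in new[] { 1, 2, 3 })
            list.Add(() => cur);        // C# 5+: a new 'cur' per iteration
        return list.Select(f => f()).ToArray();
    }

    public static int[] ForCapture()
    {
        var list = new List<Func<int>>();
        for (int i = 1; i <= 3; i++)
            list.Add(() => i);          // one 'i' shared by every closure
        return list.Select(f => f()).ToArray();
    }

    public static int[] LinqCapture() =>
        Enumerable.Range(1, 3)
                  .Select(x => (Func<int>)(() => x))   // each lambda captures its own x
                  .ToList()
                  .Select(f => f())
                  .ToArray();
}
```

After the for loop finishes, i is 4, which is why every closure in ForCapture returns 4.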

In fact, your code does something which is fundamentally very functional, namely it reduces a list of strings to a single string by concatenating the list items. The only imperative thing about the code is the use of a StringBuilder.
The functional code makes this much easier, actually, because it doesn't require a special case like your code does. Better still, .NET already has this particular operation implemented, probably more efficiently than your code (see footnote 1):
return String.Join(", ", ListOfResources.Select(s => s.Id.ToString()).ToArray());
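(Yes, the call to ToArray() is annoying, but Join is a very old method and predates LINQ. Since .NET 4, String.Join also has an IEnumerable<string> overload, so ToArray() can be dropped there.)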
Of course, a “better” version of Join could be used like this:
return ListOfResources.Select(s => s.Id).Join(", ");
The implementation is rather straightforward – but once again, using the StringBuilder (for performance) makes it imperative.
public static String Join<T>(this IEnumerable<T> items, String delimiter) {
if (items == null)
throw new ArgumentNullException("items");
if (delimiter == null)
throw new ArgumentNullException("delimiter");
var strings = items.Select(item => item.ToString()).ToList();
if (strings.Count == 0)
return string.Empty;
int length = strings.Sum(str => str.Length) +
delimiter.Length * (strings.Count - 1);
var result = new StringBuilder(length);
bool first = true;
foreach (string str in strings) {
if (first)
first = false;
else
result.Append(delimiter);
result.Append(str);
}
return result.ToString();
}
1) Without having looked at the implementation in the reflector, I’d guess that String.Join makes a first pass over the strings to determine the overall length. This can be used to initialize the StringBuilder accordingly, thus saving expensive copy operations later on.
EDIT by SLaks: Here is the reference source for the relevant part of String.Join from .Net 3.5:
string jointString = FastAllocateString( jointLength );
fixed (char * pointerToJointString = &jointString.m_firstChar) {
UnSafeCharBuffer charBuffer = new UnSafeCharBuffer( pointerToJointString, jointLength);
// Append the first string first and then append each following string prefixed by the separator.
charBuffer.AppendString( value[startIndex] );
for (int stringToJoinIndex = startIndex + 1; stringToJoinIndex <= endIndex; stringToJoinIndex++) {
charBuffer.AppendString( separator );
charBuffer.AppendString( value[stringToJoinIndex] );
}
BCLDebug.Assert(*(pointerToJointString + charBuffer.Length) == '\0', "String must be null-terminated!");
}

The specific loop in your question can be done declaratively like this:
var result = ListOfResources
.Select<Resource, string>(r => r.Id.ToString())
.Aggregate<string, StringBuilder>(new StringBuilder(), (sb, s) => sb.Append(sb.Length > 0 ? ", " : String.Empty).Append(s))
.ToString();
As to performance, you can expect a performance drop but this is acceptable for most applications.

I think what's most important here is that to avoid semantic confusion, your code should only be superficially functional when it is actually functional. In other words, please don't use side effects in LINQ expressions.

Technically, yes.
Any foreach loop can be converted to LINQ by using a ForEach extension method, such as the one in MoreLinq.
If you only want to use "pure" LINQ (only the built-in extension methods), you can abuse the Aggregate extension method, like this:
foreach (type item in collection) { statements }
// becomes:
type item;
collection.Aggregate(true, (j, itemTemp) => {
    item = itemTemp;
    statements
    return true;
});
This will correctly handle any foreach loop, even JaredPar's answer. EDIT: Unless it uses ref / out parameters, unsafe code, or yield return.
Don't you dare use this trick in real code.
In your specific case, you should use a string Join extension method, such as this one:
///<summary>Appends a list of strings to a StringBuilder, separated by a separator string.</summary>
///<param name="builder">The StringBuilder to append to.</param>
///<param name="strings">The strings to append.</param>
///<param name="separator">A string to append between the strings.</param>
public static StringBuilder AppendJoin(this StringBuilder builder, IEnumerable<string> strings, string separator) {
if (builder == null) throw new ArgumentNullException("builder");
if (strings == null) throw new ArgumentNullException("strings");
if (separator == null) throw new ArgumentNullException("separator");
bool first = true;
foreach (var str in strings) {
if (first)
first = false;
else
builder.Append(separator);
builder.Append(str);
}
return builder;
}
///<summary>Combines a collection of strings into a single string.</summary>
public static string Join<T>(this IEnumerable<T> strings, string separator, Func<T, string> selector) { return strings.Select(selector).Join(separator); }
///<summary>Combines a collection of strings into a single string.</summary>
public static string Join(this IEnumerable<string> strings, string separator) { return new StringBuilder().AppendJoin(strings, separator).ToString(); }

In general, you can write a lambda expression (a delegate) that represents the body of the foreach loop, in your case something like:
resource => { if (sb.Length != 0) sb.Append(", "); sb.Append(resource.Id); }
and then simply pass it to a ForEach method (such as List<T>.ForEach). Whether this is a good idea depends on the complexity of the body: if it's too big and complex, you probably don't gain anything from it except possible confusion ;)
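For example, with List<T>.ForEach (a method on List<T>, not a LINQ operator) and a hypothetical Resource record with an int Id, the loop body becomes a lambda. The lambda still mutates sb, which is exactly the kind of side effect the earlier answers advise keeping out of query-style code.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the question's resource type.
public sealed record Resource(int Id);

public static class ForEachLambda
{
    public static string Describe(List<Resource> listOfResources)
    {
        var sb = new StringBuilder();
        // The former loop body, now a lambda; it captures and mutates sb.
        listOfResources.ForEach(resource =>
        {
            if (sb.Length != 0)
                sb.Append(", ");
            sb.Append(resource.Id);
        });
        return sb.ToString();
    }
}
```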
