After String.Reverse, a mistake occurs in ToString - c#

When the application runs, the IDE tells me Input string is not in the correct format.
(Convert.ToInt32(_subStr.Reverse().ToString().Substring(4, _subStr.Length - 4))*1.6).ToString()
I don't know exactly how Reverse() should be used here.

There is no Reverse method in the String class, so the method you're using is actually the Enumerable.Reverse extension method. This compiles because String implements IEnumerable<char>, but the result is not a string; it's another implementation of IEnumerable<char>. When you call ToString() on that, you get this: System.Linq.Enumerable+<ReverseIterator>d__a0`1[System.Char].
If you want to convert this IEnumerable<char> to a string, you can do it like this:
string reversed = new string(_subStr.Reverse().ToArray());
(Convert.ToInt32(reversed.Substring(4, _subStr.Length - 4))*1.6).ToString()
Note, however, that this is not a correct way of reversing a string: it will fail in some cases due to Unicode surrogate pairs and combining characters. See here and there for explanations.
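For the Unicode caveat above, one possible sketch reverses text elements instead of individual char values, using StringInfo.GetTextElementEnumerator from System.Globalization. The names TextReverser and ReverseTextElements are made up for illustration and are not part of the original answer. The sketch keeps each surrogate pair and each base-plus-combining sequence together; it does not claim to handle every Unicode edge case.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class TextReverser
{
    // Hypothetical helper: reverses a string one text element at a time,
    // so a surrogate pair or a base character with its combining marks
    // is moved as a unit instead of being split.
    public static string ReverseTextElements(string input)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(input);
        while (enumerator.MoveNext())
        {
            elements.Add(enumerator.GetTextElement());
        }

        elements.Reverse();              // List<T>.Reverse reverses in place
        return string.Concat(elements);  // join the elements back into one string
    }
}
```

For a string that contains only digits, as in the question, this gives the same result as new string(_subStr.Reverse().ToArray()).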

As the Reverse method is an extension of IEnumerable<T>, you get an IEnumerable<T> as result, and as that doesn't override the ToString method, you will get the original implementation from the Object class, that simply returns the type name of the object. Turn the IEnumerable<T> into an array, then you can create a string from it.
You should first get the part of the string that is digits, then reverse it. That way it will work, regardless of what characters you have in the rest of the string. Although using the Reverse extension doesn't work properly to reverse any string (as Thomas Levesque pointed out), it will work for a string with only digits:
(
    Int32.Parse(
        new String(_subStr.Substring(0, _subStr.Length - 4).Reverse().ToArray())
    ) * 1.6
).ToString();

The simplest approach would be:
string backwards = "sdrawkcab";
string reverse = "";
for (int i = backwards.Length - 1; i >= 0; i--)
    reverse += backwards[i];
The result is: reverse == "backwards".
Then you can do:
reverse = (Convert.ToInt32(reverse) * 1.6).ToString();
Your piping approach (string.Substring().Split().Last()...) will very quickly lead to a memory bottleneck if you loop through lines of text.
Also important to note: strings are immutable, so on each iteration of the for loop the += operator creates a new string instance in memory. That makes this an O(n²) algorithm, which is less memory-efficient than the alternatives.
For a more efficient implementation, see:
https://stackoverflow.com/a/1009707/14473033
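One commonly used linear-time alternative to the += loop copies the characters into an array once and reverses that array in place. This is only a sketch, with a made-up helper name (ReverseChars), and it is not necessarily what the linked answer shows. Like the loop above, it reverses individual char values, so the Unicode caveats still apply.

```csharp
using System;

static class CharArrayReverser
{
    // Hypothetical helper: reverses the UTF-16 code units of a string in O(n).
    public static string ReverseChars(string input)
    {
        char[] chars = input.ToCharArray(); // one copy of the characters
        Array.Reverse(chars);               // in-place reversal, no new strings
        return new string(chars);           // one final string allocation
    }
}
```

Usage: CharArrayReverser.ReverseChars("sdrawkcab") returns "backwards".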


Increment formatted string (e.g. "07-01" to "07-02")

I have a string with the format 00-00 and I want to increment it to 00-01.
Currently I am using Split() but I have the feeling that my approach is not really best practice.
I don't have to worry about edge cases and just want to know if there is an elegant solution.
Thanks
Linq approach without edge cases like 00-99 or 99-99
string input = "07-01";
string result = string.Join("-", input.Split('-')
    .Select(int.Parse)
    .Select((x, i) => (i == 1 ? ++x : x).ToString("00")));
Your method of utilizing string.Split() would be my suggestion too.
However, I'm wondering whether you are repeatedly incrementing that number. If this is a one-time action, then I agree with using the split method.
If you are continually incrementing this number (e.g. as a counter), you will start noticing that it takes more resources to do string operations (split) and conversions (string to int, int to string); compared to incrementing the value of the integer.
In this case, I would advocate that you keep the integer value in memory, and keep using that to generate your output string, instead of always having to start from scratch.
However, the latter suggestion only applies if you keep incrementing the same values. Your question did not specify that that is the case.
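The suggestion to keep the integer in memory could look roughly like the sketch below. TwoPartCounter and Next are hypothetical names, and the sketch assumes the 00-00 format from the question and ignores overflow past 99, as the question allows. The string is parsed once; after that only an integer is incremented and formatted.

```csharp
using System;

// Hypothetical counter type, not taken from the original answers.
sealed class TwoPartCounter
{
    private readonly int _prefix; // the part before the dash, never changed
    private int _number;          // the part after the dash, incremented in memory

    public TwoPartCounter(string formatted)
    {
        // Parse the "00-00" string a single time.
        string[] parts = formatted.Split('-');
        _prefix = int.Parse(parts[0]);
        _number = int.Parse(parts[1]);
    }

    // Increments the stored integer and formats the result; no re-parsing.
    public string Next()
    {
        _number++;
        return _prefix.ToString("00") + "-" + _number.ToString("00");
    }
}
```

Usage: new TwoPartCounter("07-01").Next() returns "07-02", and a second call on the same instance returns "07-03".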

Time complexity for C# StringBuilder initialized with string [duplicate]

How does StringBuilder work?
What does it do internally? Does it use unsafe code?
And why is it so fast (compared to the + operator)?
When you use the + operator to build up a string:
string s = "01";
s += "02";
s += "03";
s += "04";
then on the first concatenation we make a new string of length four and copy "01" and "02" into it -- four characters are copied. On the second concatenation we make a new string of length six and copy "0102" and "03" into it -- six characters are copied. On the third concat, we make a string of length eight and copy "010203" and "04" into it -- eight characters are copied. So far a total of 4 + 6 + 8 = 18 characters have been copied for this eight-character string. Keep going.
...
s += "99";
On the 98th concat we make a string of length 198 and copy "010203...98" and "99" into it. That gives us a total of 4 + 6 + 8 + ... + 198 = a lot, in order to make this 198 character string.
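To put a number on "a lot", the copying described above can be simulated with a small sketch. ConcatCost and CharsCopied are made-up names, and the model simply assumes that every += copies every character of the newly created string, as the explanation above describes.

```csharp
using System;

static class ConcatCost
{
    // Hypothetical helper: counts the characters copied when a two-character
    // string is extended by two characters at a time with +=.
    public static int CharsCopied(int concatenations)
    {
        int length = 2; // the starting string "01"
        int total = 0;
        for (int i = 0; i < concatenations; i++)
        {
            length += 2;     // the new string is two characters longer
            total += length; // every character of the new string is copied
        }
        return total;
    }
}
```

CharsCopied(3) returns 18, matching the 4 + 6 + 8 above, and CharsCopied(98) returns 9898: almost ten thousand characters copied to build a 198-character string.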
A string builder doesn't do all that copying. Rather, it maintains a mutable array that is hoped to be larger than the final string, and stuffs new things into the array as necessary.
What happens when the guess is wrong and the array gets full? There are two strategies. In the previous version of the framework, the string builder reallocated and copied the array when it got full, and doubled its size. In the new implementation, the string builder maintains a linked list of relatively small arrays, and appends a new array onto the end of the list when the old one gets full.
Also, as you have conjectured, the string builder can do tricks with "unsafe" code to improve its performance. For example, the code which writes the new data into the array can already have checked that the array write is going to be within bounds. By turning off the safety system it can avoid the per-write check that the jitter might otherwise insert to verify that every write to the array is safe. The string builder does a number of these sorts of tricks to do things like ensuring that buffers are reused rather than reallocated, ensuring that unnecessary safety checks are avoided, and so on. I recommend against these sorts of shenanigans unless you are really good at writing unsafe code correctly, and really do need to eke out every last bit of performance.
StringBuilder's implementation has changed between versions, I believe. Fundamentally though, it maintains a mutable structure of some form. I believe it used to use a string which was still being mutated (using internal methods) and would just make sure it would never be mutated after it was returned.
The reason StringBuilder is faster than using string concatenation in a loop is precisely because of the mutability - it doesn't require a new string to be constructed after each mutation, which would mean copying all the data within the string etc.
For just a single concatenation, it's actually slightly more efficient to use + than to use StringBuilder. It's only when you're performing multiple operations and you don't really need the intermediate results that StringBuilder shines.
See my article on StringBuilder for more information.
The Microsoft CLR does do some operations with internal call (not quite the same as unsafe code). The biggest performance benefit over a bunch of + concatenated strings is that it writes to a char[] and doesn't create as many intermediate strings. When you call ToString(), it builds a completed, immutable string from your contents.
The StringBuilder uses a string buffer that can be altered, compared to a regular String that can't be. When you call the ToString method of the StringBuilder it will just freeze the string buffer and convert it into a regular string, so it doesn't have to copy all the data one extra time.
As the StringBuilder can alter the string buffer, it doesn't have to create a new string value for each and every change to the string data. When you use the + operator, the compiler turns that into a String.Concat call that creates a new string object. This seemingly innocent piece of code:
str += ",";
compiles into this:
str = String.Concat(str, ",");

Does string.Replace(string, string) create additional strings?

We have a requirement to transform a string containing a date in dd/mm/yyyy format to ddmmyyyy format (In case you want to know why I am storing dates in a string, my software processes bulk transactions files, which is a line based textual file format used by a bank).
And I am currently doing this:
string oldFormat = "01/01/2014";
string newFormat = oldFormat.Replace("/", "");
Sure enough, this converts "01/01/2014" to "01012014". But my question is, does the replace happen in one step, or does it create an intermediate string (e.g.: "0101/2014" or "01/012014")?
Here's the reason why I am asking this:
I am processing transaction files ranging in size from few kilobytes to hundreds of megabytes. So far I have not had a performance/memory problem, because I am still testing with very small files. But when it comes to megabytes I am not sure if I will have problems with these additional strings. I suspect that would be the case because strings are immutable. With millions of records this additional memory consumption will build up considerably.
I am already using StringBuilders for output file creation. And I also know that the discarded strings will be garbage collected (at some point before the end of the time). I was wondering if there is a better, more efficient way of replacing all occurrences of a specific character/substring in a string, one that does not create an additional string.
Sure enough, this converts "01/01/2014" to "01012014". But my question
is, does the replace happen in one step, or does it create an
intermediate string (e.g.: "0101/2014" or "01/012014")?
No, it doesn't create intermediate strings for each replacement. But it does create a new string, because, as you already know, strings are immutable.
Why?
There is no reason to create a new string on each replacement: it's very simple to avoid, and avoiding it gives a huge performance boost.
If you are very interested, referencesource.microsoft.com and the SSCLI 2.0 source code demonstrate this (see also "How to see code of method which marked as MethodImplOptions.InternalCall"):
FCIMPL3(Object*, COMString::ReplaceString, StringObject* thisRefUNSAFE,
        StringObject* oldValueUNSAFE, StringObject* newValueUNSAFE)
{
    // unnecessary code omitted
    while (((index=COMStringBuffer::LocalIndexOfString(thisBuffer,oldBuffer,
            thisLength,oldLength,index))>-1) && (index<=endIndex-oldLength))
    {
        replaceIndex[replaceCount++] = index;
        index+=oldLength;
    }
    if (replaceCount != 0)
    {
        //Calculate the new length of the string and ensure that we have
        // sufficient room.
        INT64 retValBuffLength = thisLength -
            ((oldLength - newLength) * (INT64)replaceCount);
        gc.retValString = COMString::NewString((INT32)retValBuffLength);
        // unnecessary code omitted
    }
}
As you can see, retValBuffLength is calculated from replaceCount, the number of matches found, so the result string is allocated once at its final length. The real implementation may be a bit different in .NET 4.0 (SSCLI 4.0 has not been released), but I assure you it's not doing anything silly :-).
I was wondering if there is a better, more efficient way of replacing
all occurrences of a specific character/substring in a string, that
does not additionally create an string.
Yes: a reusable StringBuilder with a capacity of ~2000 characters, which avoids any memory allocation. This only holds if the replacement lengths are equal, and it can give you a nice performance gain if you're in a tight loop.
Before writing anything, run benchmarks with big files, and see if the performance is enough for you. If performance is enough - don't do anything.
Well, I'm not a .NET development team member (unfortunately), but I'll try to answer your question.
Microsoft has a great site of .NET Reference Source code, and according to it, String.Replace calls an external method that does the job. I wouldn't argue about how it is implemented, but there's a small comment to this method that may answer your question:
// This method contains the same functionality as StringBuilder Replace. The only difference is that
// a new String has to be allocated since Strings are immutable
Now, if we follow the StringBuilder.Replace implementation, we'll see what it actually does inside.
A little more on string objects:
Although String is immutable in .NET, this is not some kind of limitation; it's a contract. String is actually a reference type, and what it includes is the length of the actual string plus the buffer of characters. You can actually get an unsafe pointer to this buffer and change it "on the fly", but I wouldn't recommend doing this.
Now, the StringBuilder class also holds a character array, and when you pass the string to its constructor it actually copies the string's buffer to its own (see Reference Source). What it doesn't have, though, is the contract of immutability, so when you modify a string using StringBuilder you are actually working with the char array. Note that when you call ToString() on a StringBuilder, it creates a new "immutable" string and copies its buffer there.
So, if you need a fast and memory-efficient way to make changes to a string, StringBuilder is definitely your choice, especially since Microsoft explicitly recommends using StringBuilder if you "perform repeated modifications to a string".
I haven't found any sources, but I strongly doubt that the implementation creates a new string for every replacement. I'd also implement it with a StringBuilder internally. So String.Replace is absolutely fine if you want to replace once in a huge string. But if you have to replace many times, you should consider using StringBuilder.Replace, because every call of String.Replace creates a new string.
So you can use StringBuilder.Replace since you're already using a StringBuilder.
Is StringBuilder.Replace() more efficient than String.Replace?
String.Replace() vs. StringBuilder.Replace()
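Since the question mentions that the output is already built with StringBuilders, one way to avoid the extra string altogether is to copy the characters straight into the output builder and skip the separators. This is a minimal sketch, assuming a hypothetical helper named AppendWithoutSlashes; it is not taken from any of the answers.

```csharp
using System;
using System.Text;

static class SlashStripper
{
    // Hypothetical helper: appends value to output while skipping '/'
    // characters, so no intermediate string such as "01012014" is created.
    public static void AppendWithoutSlashes(StringBuilder output, string value)
    {
        foreach (char c in value)
        {
            if (c != '/')
            {
                output.Append(c);
            }
        }
    }
}
```

Whether this is measurably faster than oldFormat.Replace("/", "") depends on the workload, so it is worth benchmarking with large files first, as suggested above.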
There is no string method for that. You are on your own. But you can try something like this:
oldFormat = "dd/mm/yyyy";
string[] dt = oldFormat.Split('/');
// Join the three parts without separators to get ddmmyyyy.
string newFormat = string.Format("{0}{1}{2}", dt[0], dt[1], dt[2]);
or
StringBuilder sb = new StringBuilder(dt[0]);
sb.AppendFormat("{0}{1}", dt[1], dt[2]);

Difference between using append method of StringBuilder class and concatenation "+" operator [duplicate]

What is the difference between using the Append method of the StringBuilder class and concatenation with the "+" operator?
In what way is the Append method more efficient or faster than the "+" operator when concatenating two strings?
First of all, String and StringBuilder are different classes.
The String class represents an immutable type, whereas the StringBuilder class represents a mutable one.
When you use + to concatenate your strings, it uses the String.Concat method, and every time it returns a new string object.
The StringBuilder.Append method appends a copy of the specified string. It doesn't return a new string; it changes the original one.
As for efficiency, you should read Jeff's article called The Sad Tragedy of Micro-Optimization Theater:
It. Just. Doesn't. Matter!
We already know none of these operations
will be performed in a loop, so we can rule out brutally poor
performance characteristics of naive string concatenation. All that's
left is micro-optimization, and the minute you begin worrying about
tiny little optimizations, you've already gone down the wrong path.
Oh, you don't believe me? Sadly, I didn't believe it myself, which is
why I got drawn into this in the first place. Here are my results --
for 100,000 iterations, on a dual core 3.5 GHz Core 2 Duo.
1: Simple Concatenation 606 ms
2: String.Format 665 ms
3: string.Concat 587 ms
4: String.Replace 979 ms
5: StringBuilder 588 ms
Strings are immutable, so when you append, you actually create a new object in the background.
When you use StringBuilder, it provides an efficient method for concatenating strings.
To be honest, you are not really going to notice a big improvement if you use it once or twice. But the efficiency comes in when you use the StringBuilder in loops.
When you concatenate two strings you actually create a new string with the result. A StringBuilder has the ability to resize itself as you add to it, which can be faster.
As with all things, it depends. If you are simply concatenating two small strings like this:
string s = "a" + "b";
Then at best there will be no difference in performance, but likely this will be quicker than using a StringBuilder and is also easier to read.
StringBuilder is more suitable for cases where you are concatenating an arbitrary number of strings, which you don't know at compile time.
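As an illustration of that last case, here is a small sketch that joins a number of strings known only at run time. Joiner and JoinWithCommas are hypothetical names chosen for the example; the point is only that Append keeps writing into one builder instead of producing a new string on every iteration.

```csharp
using System;
using System.Text;

static class Joiner
{
    // Hypothetical helper: concatenates an arbitrary number of items,
    // separated by commas, using a single StringBuilder.
    public static string JoinWithCommas(string[] items)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < items.Length; i++)
        {
            if (i > 0)
            {
                sb.Append(','); // separator between items, not before the first
            }
            sb.Append(items[i]);
        }
        return sb.ToString(); // one final string is created here
    }
}
```

For this particular task string.Join(",", items) does the same job; the loop is spelled out only to show where Append replaces +=.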

Does String.ToLower() always allocate memory?

Does String.ToLower() return the same reference (e.g. without allocating any new memory) if all the characters are already lower-case?
Memory allocation is cheap, but running a quick check on zillions of short strings is even cheaper. Most of the time the input I'm working with is already lower-case, but I want to make it that way if it isn't.
I'm working with C# / .NET in particular, but my curiosity extends to other languages so feel free to answer for your favorite one!
NOTE: Strings are immutable but that does not mean a function always has to return a new one, rather it means nothing can change their character content.
I expect so, yes. A quick test agrees (but this is not evidence):
string a = "abc", b = a.ToLower();
bool areSame = ReferenceEquals(a, b); // false
In general, try to work with comparers that do what you want. For example, if you want a case-insensitive dictionary, use one:
var lookup = new Dictionary<string, int>(
StringComparer.InvariantCultureIgnoreCase);
Likewise:
bool ciEqual = string.Equals("abc", "ABC",
StringComparison.InvariantCultureIgnoreCase);
String is immutable. String.ToLower() will always return a new instance, so every ToLower() call allocates another string.
Java implementation of String.toLowerCase() from Sun actually doesn't always allocate new String. It checks if all chars are lowercase, and if so, it returns original string.
[edit]
Interning doesn't help -- see the comments to this answer.
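The check-first behaviour described for Java can be approximated in C# with a small wrapper. This is a sketch under stated assumptions: ToLowerIfNeeded is a made-up helper, it uses the invariant culture, and it inspects one char at a time, so letters outside the Basic Multilingual Plane (stored as surrogate pairs) are not detected by the check.

```csharp
using System;

static class LowerCaseHelper
{
    // Hypothetical helper: returns the same reference when no character
    // would change, and only calls ToLowerInvariant when it has to.
    public static string ToLowerIfNeeded(string value)
    {
        foreach (char c in value)
        {
            if (char.ToLowerInvariant(c) != c)
            {
                return value.ToLowerInvariant(); // at least one char differs
            }
        }
        return value; // already lower-case: no allocation by this helper
    }
}
```

When most inputs are already lower-case, as in the question, the common path is a single scan that returns the original reference.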
If you use the following code, it will not allocate new memory and it will overwrite the original string (this may or may not be what you want). It expects an ASCII string. Expect weird things to occur if you call this on strings returned from functions you do not control.
public static unsafe void UnsafeToLower(string asciiString)
{
    fixed (char* pstr = asciiString)
    {
        // Walk the buffer up to the terminating null character.
        for (char* p = pstr; *p != 0; ++p)
            // 'A'..'Z' (0x41..0x5A) is mapped to 'a'..'z'; other chars are left alone.
            *p = (*p > 0x40) && (*p < 0x5b) ? (char)(*p | 0x60) : (*p);
    }
}
It takes about 25% as long as ToLowerInvariant and avoids memory allocation.
I would only use something like this if you are doing say 100,000 or more strings regularly inside a tight loop.
