C++ string vs C# string, different running times. Why?

I was experimenting with C++ to observe the effects of variables' scope-bounded-declarations and usage in loops on the running time of the program, as follows:
for(int i=0; i<10000000 ; ++i){
string s = "HELLO THERE!";
}
and
string s;
for(int i=0; i<10000000 ; ++i){
s = "HELLO THERE!";
}
The first program ran in ~1 second while the second one ran in ~250 milliseconds, as expected. Built-in types showed no significant difference, so I stuck with strings in both languages.
I was discussing this with a friend of mine and he said this wouldn't happen in C#. We tried it and saw for ourselves that it does not: in C#, scope-bounded declarations of strings don't affect the running time of the program.
Why is there this difference? Is it a bad optimization in C++ strings (I strongly doubt that, though) or something else?

Strings in C# are immutable, so the assignment can just copy over a reference. In C++, however, strings are mutable, so the entire contents of the string need to be copied over.
If you want to verify this hypothesis, try with a (significantly) longer string constant. Runtime in C++ should go up, but runtime in C# should remain the same.

Strings in C# are immutable.
C# uses references, so the memory is not copied.
In C#, "HELLO THERE!" is allocated once and is not copied on each assignment.
For example:
string a = "HELLO";
string b = a;
Both point to the same piece of memory. In C++ they do not: the two strings have equal contents but live in different places. If you want the same result in C++, you should use pointers (or smart pointers):
string *a = new string("hello");
string *b = a;
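A minimal C++ sketch of the contrast described above, assuming a C++11 or later standard library (where copies of std::string do not share storage). The helper names copy_shares_buffer and pointer_shares_object are made up for illustration, and std::shared_ptr stands in for the smart-pointer suggestion.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Copying a std::string gives the copy its own character storage.
bool copy_shares_buffer() {
    std::string a = "HELLO";
    std::string b = a;            // the characters are copied
    return a.data() == b.data();  // false: two separate buffers
}

// Copying a shared_ptr copies only the pointer; both refer to one string.
bool pointer_shares_object() {
    auto a = std::make_shared<std::string>("HELLO");
    std::shared_ptr<std::string> b = a;  // no characters are copied
    (*b)[0] = 'J';                       // the change is visible through a too
    return a.get() == b.get() && *a == "JELLO";
}
```

Calling copy_shares_buffer() should return false and pointer_shares_object() should return true. The second case is roughly what a C# string assignment does, except that C# strings cannot be modified through either reference.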

Related

Passing by reference to n-th element in C#

In C, if we have an array, we can pass it by reference to a function. We can also use simple addition of (n-1) to pass the reference starting from n-th element of the array like this:
char *strArr[5];
char *str1 = "I want that!\n";
char *str2 = "I want this!\n";
char *str3 = "I want those!\n";
char *str4 = "I want these!\n";
char *str5 = "I want them!\n";
strArr[0] = str1;
strArr[1] = str2;
strArr[2] = str3;
strArr[3] = str4;
strArr[4] = str5;
printPartially(strArr + 1, 4); //we can pass like this to start printing from 2nd element
....
void printPartially(char** strArrPart, char size){
int i;
for (i = 0; i < size; ++i)
printf("%s", strArrPart[i]); // use a fixed format string rather than the data itself
}
Resulting in these:
I want this!
I want those!
I want these!
I want them!
Process returned 0 (0x0) execution time : 0.006 s
Press any key to continue.
In C#, we can also pass a reference to an object with ref (or out). That object can be an array, in which case the whole array is passed (or at least, this is how I suppose it works). But how do we pass a reference to the n-th element of the array, so that inside the function there is only a string[] with one element fewer than the original string[], without creating a new array?
Must we use unsafe? I am looking for a solution (if possible) without unsafe.
Edit:
I understand that we can pass an array in C# without the ref keyword. Perhaps mentioning ref alongside arrays made my question misleading. What I meant to ask is this: can the ref keyword (or anything else) be used to pass a reference to the n-th element of an array, as C does, rather than only a reference to the whole object? My apologies for any misunderstanding caused by my phrasing.
The "safe" approach would be to pass an ArraySegment struct instead.
You can of course pass a pointer to a character using unsafe C#, but then you need to worry about buffer overruns.
Incidentally, an Array in C# is (usually) allocated on the heap, so passing it normally (without ref) doesn't mean copying the array- it's still a reference that is passed (just a new one).
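To illustrate the idea behind a segment type, here is a small C++ sketch of a non-owning view: it stores where the visible part starts and how many elements it covers, and copies nothing. Segment, make_segment and join_segment are hypothetical names for this example only; this is not the ArraySegment API itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view: a pointer to the first visible element
// plus a count. Creating the view copies no elements.
struct Segment {
    const std::string* first;
    std::size_t count;
    const std::string& operator[](std::size_t i) const { return first[i]; }
};

// Build a view over items[offset .. offset + count).
Segment make_segment(const std::vector<std::string>& items,
                     std::size_t offset, std::size_t count) {
    assert(offset <= items.size() && count <= items.size() - offset);
    return Segment{items.data() + offset, count};
}

// Concatenate the visible elements, in the spirit of printPartially.
std::string join_segment(const Segment& seg) {
    std::string out;
    for (std::size_t i = 0; i < seg.count; ++i) out += seg[i];
    return out;
}
```

With a five-element vector, make_segment(arr, 1, 4) yields a view whose element 0 is the same object as arr[1]. The bounds check in make_segment is what the raw pointer version in C lacks.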
Edit:
You won't be able to do it as you do in C in safe code.
A C# array (i.e. string[]) is derived from abstract type Array.
It is not only a simple memory block as it is in C.
So you can't pass a reference to one of its elements and start iterating from there.
But there are some solutions which will give you the same effect (without unsafe):
Like:
As @Chris mentioned, you can use ArraySegment<T>.
As an array is also an IEnumerable<T>, you can use .Skip and pass the returned value. This gives you an IEnumerable<T> instead of an array, but it still lets you iterate.
etc...
If the method should only read from the array, you can use LINQ:
string[] strings = {"str1", "str2", "str3", ...."str10"};
print(strings.Skip(1).Take(4).ToArray());
Your confusion is a very common one. The essential point is realizing that "reference types" and "passing by reference" (the ref keyword) are totally independent. In this specific case, since string[] is a reference type (as are all arrays), the object is not copied when you pass it around, hence you are always referring to the same object.
Modified Version of C# Code:
string[] strArr = new string[5];
strArr[0] = "I want that!\n";
strArr[1] = "I want this!\n";
strArr[2] = "I want those!\n";
strArr[3] = "I want these!\n";
strArr[4] = "I want them!\n";
printPartially(strArr.Skip(1).Take(4).ToArray());
void printPartially(string[] strArr)
{
foreach (string str in strArr)
{
Console.WriteLine(str);
}
}
The question is old, but maybe this answer will be useful for someone.
As of C# 7.2 there are more types to use in this case, e.g. Span<T> or Memory<T>.
They allow exactly the thing you mentioned in your question (and much more).
Here's a great article about them
Currently, if you want to use them, remember to add <LangVersion>7.2</LangVersion> to the .csproj file of your project to enable C# 7.2 features.

C# taking arguments & declaring at runtime from text file

I have a text file with the following:
int A = 5 ;
string str = "tempstring" ;
str DosomeMethod 15 16 20 22 ;
When reading the text file through my program, I want to declare int A = 5 and string str = "tempstring" at runtime.
It can be like
string[] st = freader.readline().split(' ');
if (st[0]=="int")
{
str[0] str[1] = str[4];
}
I know that the above is the wrong syntax but I want to do something like this with some reference.
Can anybody help, without using Irony .NET?
This is relatively advanced material.
You can't really do it as you described: C# is a statically typed language, so variables cannot be declared from text at runtime.
However, you could look at one of these approaches:
IronPython - a script language which uses .NET.
C#'s dynamic keyword.
Reflection
Dynamically compiling C# code within application
Or, if you seek a quick solution, you may set these variables (string temp and so on) within your code, then put values into them according to the string.
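The "quick solution" amounts to keeping a table of named values and filling it in as each line is read. A minimal C++ sketch of that idea follows; Variables and declare are hypothetical names, and the parser assumes the exact space-separated layout shown in the question (so string values cannot contain spaces).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical runtime "variable table": names map to values, per type.
struct Variables {
    std::map<std::string, int> ints;
    std::map<std::string, std::string> strings;
};

// Parse one declaration of the form `int A = 5 ;` or
// `string str = "tempstring" ;`. Returns false for anything else.
bool declare(Variables& vars, const std::string& line) {
    std::istringstream in(line);
    std::string type, name, equals, value;
    if (!(in >> type >> name >> equals >> value) || equals != "=") return false;
    if (type == "int") {
        std::istringstream number(value);
        int parsed = 0;
        if (!(number >> parsed)) return false;
        vars.ints[name] = parsed;
        return true;
    }
    if (type == "string") {
        // Strip the surrounding double quotes if present.
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
            value = value.substr(1, value.size() - 2);
        vars.strings[name] = value;
        return true;
    }
    return false;
}
```

Feeding it the first two lines of the sample file stores A and str in the table; the third line (the method call) is rejected and would need its own handling.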

C++ ">>" and "<<" IO in C#?

Is there a C# library that provides the functionality of ">>" and "<<" for IO in C++? It was really convenient for console apps. Granted not a lot of console apps are in C#, but some of us use it for them.
I know about Console.Read[Line]|Write[Line] and Streams|FileStream|StreamReader|StreamWriter; that's not part of the question.
I don't think I'm being specific enough.
int a,b;
cin >> a >> b;
IS AMAZING!!
string input = Console.ReadLine();
string[] data = input.Split( ' ' );
a = Convert.ToInt32( data[0] );
b = Convert.ToInt32( data[1] );
... long-winded enough? Plus there are other reasons why the C# solution is worse. I must read the entire line or make my own buffer for it. If the line I'm working on is, say, the 1000th line of Bell's triangle, I waste so much time reading everything at once.
EDIT:
GAR!!!
OK THE PROBLEM!!!
Using IntX to handle HUGE numbers, like the .NET 4.0 BigInteger, to produce the Bell triangle. If you know the Bell triangle, it gets freaking huge very, very quickly. The whole point of this question is that I need to deal with each number individually. If you read an entire line, you could easily hit gigs of data. This is kind of the same as digits of Pi. For example, 42pow1048576 is 1.6 MB! I have neither the time nor the memory to read all the numbers as one string and then pick the one I want.
No, and I wouldn't. C# != C++
You should try your best to stick with the language convention of whatever language you are working in.
I think I get what you are after: simple, default formatted input. I think the reason there is no TextReader.ReadXXX() is that this is parsing, and parsing is hard: for example: should ReadFloat():
ignore leading whitespace
require decimal point
require trailing whitespace (123abc)
handle exponentials (12.3a3 parses differently to 12.4e5?)
Not to mention what the heck does ReadString() do? From C++, you would expect "read to the next whitespace", but the name doesn't say that.
Now all of these have good sensible answers, and I agree C# (or rather, the BCL) should provide them, but I can certainly understand why they would choose to not provide fragile, nearly impossible to use correctly, functions right there on a central class.
EDIT:
For the buffering problem, an ugly solution is:
static class TextReaderEx {
static public string ReadWord(this TextReader reader) {
int c;
// Skip leading whitespace
while (-1 != (c = reader.Peek()) && char.IsWhiteSpace((char)c)) reader.Read();
// Read to next whitespace
var result = new StringBuilder();
while (-1 != (c = reader.Peek()) && !char.IsWhiteSpace((char)c)) {
reader.Read();
result.Append((char)c);
}
return result.ToString();
}
}
...
int.Parse(Console.In.ReadWord())
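For comparison, this is roughly the token-at-a-time behaviour of C++ streams that the question is after. The sketch below uses only operator>> on a std::istream; read_nth_token is a made-up helper name. Only one token is held at a time, never the whole line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read whitespace-separated tokens one at a time and keep the one at
// zero-based position `index`, counted from the current stream position.
// Earlier tokens are overwritten as reading proceeds.
bool read_nth_token(std::istream& in, std::size_t index, std::string& token) {
    for (std::size_t i = 0; i <= index; ++i) {
        if (!(in >> token)) return false;  // stream ended too early
    }
    return true;
}
```

For a stream containing "1 2 5 15 52", read_nth_token(in, 3, t) leaves "15" in t, and a following read_nth_token(in, 0, t) picks up "52" because the stream position has advanced.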
Nope. You're stuck with Console.WriteLine. You could create a wrapper that offered this functionality, though.
You can use Console.WriteLine and Console.ReadLine for this purpose. Both are in the System namespace.
You have System.IO.Stream(Reader|Writer)
And for console: Console.Write, Console.Read
Not that I know of. If you are interested in chaining output, you can use System.Text.StringBuilder.
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder(VS.71).aspx
StringBuilder builder = new StringBuilder();
builder.Append("hello").Append(" world!");
Console.WriteLine(builder.ToString());
Perhaps not as pretty as C++, but as another poster states, C# != C++.
This is not even possible in C#, no matter how hard you try:
The left-hand side and right-hand side of operators are always passed by value; this rules out the possibility of cin.
The right hand side of << and >> must be an integer; this rules out cout.
The first point is to make sure operator overloading is a little less messy than in C++ (debatable, but it surely makes things a lot simpler), and the second point was specifically chosen to rule out C++'s cin and cout way of dealing with IO, IIRC.

String Concatenation unsafe in C#, need to use StringBuilder?

My question is this: Is string concatenation in C# safe? If string concatenation leads to unexpected errors, and replacing that string concatenation by using StringBuilder causes those errors to disappear, what might that indicate?
Background: I am developing a small command line C# application. It takes command line arguments, performs a slightly complicated SQL query, and outputs about 1300 rows of data into a formatted XML file.
My initial program would always run fine in debug mode. However, in release mode it would get to about the 750th SQL result, and then die with an error. The error was that a certain column of data could not be read, even though the Read() method of the SqlDataReader object had just returned true.
This problem was fixed by using StringBuilder for all operations in the code, where previously there had been "string1 + string2". I'm not talking about string concatenation inside the SQL query loop, where StringBuilder was already in use. I'm talking about simple concatenations between two or three short string variables earlier in the code.
I had the impression that C# was smart enough to handle the memory management for adding a few strings together. Am I wrong? Or does this indicate some other sort of code problem?
To answer your question:
String concatenation in C# (and .NET in general) is "safe", but doing it in a tight loop as you describe is likely to cause severe memory pressure and put strain on the garbage collector.
I would hazard a guess that the errors you speak of were related to resource exhaustion of some sort, but it would be helpful if you could provide more detail — for example, did you receive an exception? Did the application terminate abnormally?
Background:
.NET strings are immutable, so when you do a concatenation like this:
var stringList = new List<string> {"aaa", "bbb", "ccc", "ddd" /* ... */ };
string result = String.Empty;
foreach (var s in stringList)
{
result = result + s;
}
This is roughly equivalent to the following:
string result = "";
result = "aaa";
string temp1 = result + "bbb";
result = temp1;
string temp2 = temp1 + "ccc";
result = temp2;
string temp3 = temp2 + "ddd";
result = temp3;
// ...
result = tempN + x;
The purpose of this example is to emphasise that each time around the loop results in the allocation of a new temporary string.
Since the strings are immutable, the runtime has no alternative options but to allocate a new string each time you add another string to the end of your result.
Although the result string is constantly updated to point to the latest and greatest intermediate result, you are producing a lot of these unnamed temporary strings that become eligible for garbage collection almost immediately.
At the end of this concatenation you will have the following strings stored in memory (assuming, for simplicity, that the garbage collector has not yet run).
string a = "aaa";
string b = "bbb";
string c = "ccc";
// ...
string temp1 = "aaabbb";
string temp2 = "aaabbbccc";
string temp3 = "aaabbbcccddd";
string temp4 = "aaabbbcccdddeee";
string temp5 = "aaabbbcccdddeeefff";
string temp6 = "aaabbbcccdddeeefffggg";
// ...
Although all of these implicit temporary variables are eligible for garbage collection almost immediately, they still have to be allocated. When performing concatenation in a tight loop this is going to put a lot of strain on the garbage collector and, if nothing else, will make your code run very slowly. I have seen the performance impact of this first hand, and it becomes truly dramatic as your concatenated string becomes larger.
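To put rough numbers on this, the following C++ sketch counts how many characters get copied in each approach. It is a simplified model, not a measurement: it assumes every `result = result + s` step copies the whole new result, that all pieces have the same length, and that the single buffer never has to grow. Both function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Characters copied when n pieces of length k are joined with
// `result = result + s`: step i builds a new string of i * k characters.
std::size_t chars_copied_by_repeated_concat(std::size_t n, std::size_t k) {
    std::size_t total = 0;
    for (std::size_t i = 1; i <= n; ++i) total += i * k;
    return total;  // equals k * n * (n + 1) / 2
}

// Characters copied when each piece is appended once to a buffer that is
// already large enough (reallocations on growth are ignored here).
std::size_t chars_copied_by_single_buffer(std::size_t n, std::size_t k) {
    return n * k;
}
```

For the seven three-character pieces "aaa" through "ggg", the model gives 3 + 6 + ... + 21 = 84 characters copied by repeated concatenation versus 21 for a single buffer. The gap grows quadratically with the number of pieces, which matches the "truly dramatic" slowdown described above.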
The recommended approach is to always use a StringBuilder if you are doing more than a few string concatenations. StringBuilder uses a mutable buffer to reduce the number of allocations that are necessary in building up your string.
String concatenation is safe, though more memory intensive than using a StringBuilder if concatenating large numbers of strings in a loop. In extreme cases you could be running out of memory.
It's almost certainly a bug in your code.
Maybe you're concatenating a very large number of strings. Or maybe it's something else completely different.
I'd go back to debugging without any preconceptions of the root cause - if you're still having problems try to reduce it to the minimum needed to repro the problem and post code.
Apart from the fact that what you're doing is probably best done with XML APIs instead of strings or StringBuilder, I doubt that the error you see is due to string concatenation. Maybe switching to StringBuilder just masked the error or stepped over it gracefully, but I doubt using strings really was the cause.
How long does the concatenation version take compared with the StringBuilder version? It's possible that your connection to the DB is being closed. If you are doing a lot of concatenation, I would go with StringBuilder as it is a bit more efficient.
One cause may be that strings are immutable in .Net so when you do an operation on one such as concatenation you are actually creating a new string.
Another possible cause is that string length is an int so the maximum possible length is Int32.MaxValue or 2,147,483,647.
In either case a StringBuilder is better than "string1 + string2" for this type of operation. Although, using the built-in XML capabilities would be even better.
string.Concat(string[]) is by far the fastest way to concatenate strings. It literally kills StringBuilder in performance when used in loops, especially if you create the StringBuilder in each iteration.
There are loads of references if you Google "c# string format vs stringbuilder" or something like that.
http://www.codeproject.com/KB/cs/StringBuilder_vs_String.aspx gives you an idea about the times. Here string.Join wins the concatenation test, but I believe this is because string.Concat(string, string) is used instead of the overloaded version that takes an array.
If you take a look at the MSIL code that is generated by the different methods, you'll see what's going on beneath the hood.
Here is my shot in the dark...
Strings in .NET (not stringbuilders) go into the String Intern Pool. This is basically an area managed by the CLR to share strings to improve performance. There has to be some limit here, although I have no idea what that limit is. I imagine all the concatenation you are doing is hitting the ceiling of the string intern pool. So SQL says yes I have a value for you, but it can't put it anywhere so you get an exception.
A quick and easy test would be to NGen your assembly and see if you still get the error. After NGen'ing, your application will no longer use the pool.
If that fails, I'd contact Microsoft to try and get some hard details. I think my idea sounds plausible, but I have no idea why it works in debug mode. Perhaps in debug mode strings aren't interned. I am also no expert.
When compounding strings together I always use StringBuilder. It's designed for it and is more efficient than simply using "string1 + string2".

Why should I never use an unsafe block to modify a string?

I have a String which I would like to modify in some way. For example: reverse it or upcase it.
I have discovered that the fastest way to do this is by using a unsafe block and pointers.
For example:
unsafe
{
fixed (char* str = text)
{
*str = 'X';
}
}
Are there any reasons why I should never ever do this?
The .NET Framework requires strings to be immutable. Due to this requirement it is able to optimise all sorts of operations.
String interning is one great example of how this requirement is leveraged heavily. To speed up some string comparisons (and reduce memory consumption), the .NET Framework maintains a dictionary of pointers; all predefined strings live in this dictionary, as do any strings on which you call the String.Intern method. When the IL instruction ldstr is executed, it checks the interned dictionary and avoids memory allocation if the string is already allocated. Note: String.Concat will not check for interned strings.
This property of the .NET Framework means that if you start mucking around directly with strings you can corrupt your intern table and in turn corrupt other references to the same string.
For example:
// these strings get interned
string hello = "hello";
string hello2 = "hello";
string helloworld, helloworld2;
helloworld = hello;
helloworld += " world";
helloworld2 = hello;
helloworld2 += " world";
unsafe
{
// very bad, this changes an interned string which affects
// all app domains.
fixed (char* str = hello2)
{
*str = 'X';
}
fixed (char* str = helloworld2)
{
*str = 'X';
}
}
Console.WriteLine("hello = {0} , hello2 = {1}", hello, hello2);
// output: hello = Xello , hello2 = Xello
Console.WriteLine("helloworld = {0} , helloworld2 = {1}", helloworld, helloworld2);
// output : helloworld = hello world , helloworld2 = Xello world
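The same failure mode can be mimicked outside .NET. The C++ sketch below models an intern table as a map from text to one shared object; InternPool and mutation_leaks_to_other_holders are hypothetical names, and the in-place write stands in for the unsafe pointer write. It is a toy model under those assumptions, not a description of how the CLR implements interning.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical intern pool: equal text always yields the same shared object.
class InternPool {
public:
    std::shared_ptr<std::string> intern(const std::string& text) {
        auto it = pool_.find(text);
        if (it != pool_.end()) return it->second;
        auto entry = std::make_shared<std::string>(text);
        pool_.emplace(text, entry);
        return entry;
    }
private:
    std::unordered_map<std::string, std::shared_ptr<std::string>> pool_;
};

// Mutating one handle in place changes what every other holder sees,
// including later lookups of the original text.
bool mutation_leaks_to_other_holders() {
    InternPool pool;
    auto hello = pool.intern("hello");
    auto hello2 = pool.intern("hello");
    (*hello2)[0] = 'X';                 // stands in for the unsafe write
    auto later = pool.intern("hello");  // the pool still maps "hello" here
    return *hello == "Xello" && *later == "Xello";
}
```

After the write, the table entry for "hello" hands out an object whose contents are "Xello", so the table itself is now inconsistent. That is the kind of corruption the answer warns about.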
Are there any reasons why I should never ever do this?
Yes, very simple: because .NET relies on the fact that strings are immutable. Some operations (e.g. s.Substring(0, s.Length)) actually return a reference to the original string. If that string now gets modified, all other references will see the change as well.
It is better to use a StringBuilder to modify a string, since this is the standard way.
Put it this way: how would you feel if another programmer decided to replace 0 with 1 everywhere in your code, at execution time? It would play hell with all your assumptions. The same is true with strings. Everyone expects them to be immutable, and codes with that assumption in mind. If you violate that, you are likely to introduce bugs - and they'll be really hard to trace.
Oh dear lord yes.
1) Because that class is not designed to be tampered with.
2) Because strings are designed and expected throughout the framework to be immutable. That means that code that everyone else writes (including MSFT) is expecting a string's underlying value never to change.
3) Because this is premature optimization and that is E V I L.
Agreed about StringBuilder, or just convert your string to an array of chars/bytes and work there. Also, you gave the example of "upcasing" -- the String class has a ToUpper method, and if that's not at least as fast as your unsafe "upcasing", I'll eat my hat.
