C# string passed as a function argument

string myString;
void WriteString( string myString ) // This myString is copied.
{
// Writing to myString.
myString[0] = 'b'; // changing this is just changing a copy
}
void ReadString( string myString ) // Is this myString copied, even though I'm not writing at all?
{
if( myString[0] == 'a' ) // only reads through the string's get-only indexer
DebugConsole.Write("I just read myString and first character was 'a'");
}
I wonder whether, in the case above, the compiler would distinguish the two functions and optimize ReadString by passing myString by reference or by inlining the function. If not, what should be done if myString is too large to ignore the cost of copying it?

Regardless of the compiler's optimizations (which, no, would not make much of a difference here anyway), string in C# is a reference type, so passing it to a method never copies the characters: only the reference is passed (by value).
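A minimal sketch to check this, assuming a standard .NET console project; the class and method names below are hypothetical. If object.ReferenceEquals reports true inside the method, the parameter refers to the caller's string object itself, so no character data was copied, however large the string is.

```csharp
using System;

public static class StringPassingDemo
{
    // Hypothetical helper: true when 'param' is the very same object as
    // 'original', which shows that passing the string copied no characters.
    public static bool ReceivesSameObject(string param, string original)
    {
        return object.ReferenceEquals(param, original);
    }

    // Like the question's ReadString: reading through the get-only indexer
    // neither copies nor modifies the string.
    public static bool StartsWithA(string myString)
    {
        return myString.Length > 0 && myString[0] == 'a';
    }
}
```

For example, with big = new string('a', 1_000_000), ReceivesSameObject(big, big) is true: the method sees the caller's million-character object, not a copy.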
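Furthermore, string objects are immutable (the indexer is read-only), which means your WriteString function wouldn't compile in the first place. To change a character, build a new string, for example with StringBuilder from System.Text: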
StringBuilder builder = new StringBuilder(myString);
builder[0] = 'b';
myString = builder.ToString();
Note, of course, that this solution will not change any references to the string made outside the function. In order to do that, pass it as a ref parameter.
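A sketch of both variants, assuming a non-empty input string (WriteStringDemo and WriteStringRef are hypothetical names). Only the ref version changes what the caller's variable refers to; the original string object is never modified in either case.

```csharp
using System.Text;

public static class WriteStringDemo
{
    // Without ref: the new string is assigned only to the local parameter,
    // so the caller's variable still refers to the original string.
    // Assumes myString is non-empty (builder[0] would throw otherwise).
    public static void WriteString(string myString)
    {
        StringBuilder builder = new StringBuilder(myString);
        builder[0] = 'b';
        myString = builder.ToString();
    }

    // With ref: assigning to the parameter replaces the caller's reference.
    public static void WriteStringRef(ref string myString)
    {
        StringBuilder builder = new StringBuilder(myString);
        builder[0] = 'b';
        myString = builder.ToString();
    }
}
```

Starting from s = "abc", WriteString(s) leaves s as "abc", whereas WriteStringRef(ref s) makes s refer to the new string "bbc".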

Related

Why does String.IsInterned return a string

I see that String.Intern will actually add a string to the intern pool, and String.IsInterned will return the reference to the corresponding interned string (or null if there isn't one). This makes me wonder:
Why does IsInterned return the referenced interned string and not a bool indicating whether a given string has been interned so far? I feel it's a funny use for an Is notation.
In what case would the code below return true?
bool InternCheck(string s)
{
string internedString = String.IsInterned(s);
return internedString != null && !String.Equals(internedString, s);
}
Why does IsInterned return the referenced interned string and not a bool indicating whether a given string has been interned so far? I feel it's a funny use for an Is notation.
For a definitive "why?" you'd need to ask Microsoft. However, compare IsInterned() with the similar (though of course functionally different) HashSet<T>.Add(). That is, it's convenient to have a method that checks whether something is true and, if it is, provides the value you wanted as part of returning that information.
As for why this method doesn't follow the TryXXX() pattern, again, you'd have to ask Microsoft, but we can easily guess. The method could obviously have returned a bool and provided the string reference as an out parameter. But note that here the return type is a reference type, so null is an adequate way to indicate non-existence. That is different from the various types that implement TryXXX() methods, such as int.TryParse(), where every value of the result type could be a legitimate result.
In what case would the code below return true?
I don't see how that code would ever return true. If the string is not interned, it will return false, and if it is interned, then the interned string is necessarily always equal to the string that was passed in, and so !string.Equals(...) would also be false.
Is there some reason you think otherwise?
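To check this empirically, the sketch below wraps the question's InternCheck in a hypothetical class and calls it on three inputs: a literal (literals are interned), an equal string built at runtime, and a fresh GUID string that is very unlikely to be in the pool. Each call returns false, since IsInterned returns either null or a string equal to its argument.

```csharp
using System;
using System.Text;

public static class InternCheckDemo
{
    // The question's method, unchanged.
    public static bool InternCheck(string s)
    {
        string internedString = String.IsInterned(s);
        return internedString != null && !String.Equals(internedString, s);
    }

    // Three kinds of input: a literal, an equal string built at runtime,
    // and a fresh GUID string that is unlikely to be in the intern pool.
    public static bool[] CheckSamples()
    {
        string literal = "Hello World";
        string built = new StringBuilder("Hello").Append(" World").ToString();
        string unique = Guid.NewGuid().ToString();
        return new[] { InternCheck(literal), InternCheck(built), InternCheck(unique) };
    }
}
```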
Let's imagine the String.IsInterned method were to return a bool. Then all you'd know from calling bool whoopie = String.IsInterned(s); is that your string has the same value as an interned string. There is no indication that you hold the same reference as the interned string.
Now, the point of interning is to keep memory pressure down. You know you're creating a lot of strings with the same value, and you want to make sure you're not clogging up memory.
There's a cost to interning, and that cost had better be less than the cost of using up RAM.
So, back to String.IsInterned hypothetically returning a bool.
Since you wouldn't know whether you have the interned reference (which you want, otherwise there's no point in interning), you'd end up writing this code a lot:
if (String.IsInterned(s))
{
s = String.GetInterned(s);
}
Or:
s = String.IsInterned(s) ? String.GetInterned(s) : s;
String.GetInterned is also a hypothetical method.
With the actual implementation of IsInterned this code becomes slightly simpler:
s = String.IsInterned(s) ?? s;
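A small sketch of what that line buys you (PreferInterned is a hypothetical name for the same one-liner): a string built at runtime is a different object from the equal interned string, and the ?? idiom swaps it for the pooled instance, while a string with no pooled equivalent comes back unchanged.

```csharp
using System;

public static class InternIdiomDemo
{
    // Swap a string for its pooled instance when one exists; otherwise keep it.
    public static string PreferInterned(string s)
    {
        return String.IsInterned(s) ?? s;
    }
}
```

For example, with pooled = String.Intern("Hello World") and built = new StringBuilder("Hello").Append(" World").ToString(), object.ReferenceEquals(built, pooled) is false, but object.ReferenceEquals(PreferInterned(built), pooled) is true.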
Let's see if we can improve this design.
If I try to implement a TryGetInterned-style method, I might implement it like this:
public static bool TryGetInterned(this string input, out string output)
{
string intermediate = String.IsInterned(input);
output = intermediate ?? input;
return intermediate != null;
}
This code works perfectly fine, but it leads to this kind of code repetition:
string s = "Hello World";
if (s.TryGetInterned(out string s2))
Console.WriteLine(s2); // `s` is interned
else
Console.WriteLine(s2); // `s` is NOT interned
This seems pretty pointless.
Compare this to the current IsInterned method:
string s = "Hello World";
s = String.IsInterned(s) ?? s;
Console.WriteLine(s);
Much simpler.
The only implementation that I could consider an improvement, in some circumstances, is this:
public static string GetIsInternedOrSelf(this string input)
=> String.IsInterned(input) ?? input;
Now I have this:
string s = "Hello World";
s = s.GetIsInternedOrSelf();
Console.WriteLine(s);
It's an improvement, but we've lost the ability to know if the string was interned.
The bottom line is that I think String.IsInterned is probably as well designed as it could be.

Modifying C# Out parameter more than once

When you have a function with an out parameter, is it best practice to create a new variable inside the function and assign it to the out parameter at the end of the function? Or should you give the out parameter an empty/default value at the beginning and modify it throughout the function?
I'm trying to come up with some reasoning as to why one of these coding styles/practices is better to use.
Option 1: Using just the out parameter.
public bool SomeFunc(out string outStr)
{
outStr = "";
if (errorCond)
return false;
outStr += "foo";
outStr += "bar";
return true;
}
Option 2: Using a temporary variable.
public bool SomeFunc1(out string outStr)
{
string tempStr = "";
outStr = ""; // Prevents error CS0177 ("The out parameter 'outStr' must be assigned before control leaves the current method") on the return false line.
if (errorCond)
return false;
tempStr += "foo";
tempStr += "bar";
outStr = tempStr;
return true;
}
Even though both of these achieve the same outcome, which is preferable? Are there any drawbacks to either one of them?
Actually, it doesn't matter; you just have to assign the out parameter on every path before the method returns.
But it is preferable to avoid using out or ref parameters:
Working with members that define out or reference parameters requires that the developer understand pointers, subtle differences between value types and reference types, and initialization differences between out and reference parameters.
For me, the second one is unnecessary overhead.
Assign a default value at the beginning of the method, and then change the value if necessary.
Look at examples in the .NET source code, such as int.TryParse or Enum.TryParse.
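A minimal sketch of that default-first style, using a hypothetical TryBuildGreeting method in place of the question's SomeFunc: the out parameter is assigned before the first early return, so the compiler's definite-assignment check (CS0177) is satisfied on every path. This mirrors int.TryParse, which sets its result to 0 when parsing fails.

```csharp
using System;

public static class OutParameterDemo
{
    // Hypothetical Try*-style method: assign the out parameter a default
    // first, then modify it only on the success path.
    public static bool TryBuildGreeting(string name, out string greeting)
    {
        greeting = "";
        if (string.IsNullOrWhiteSpace(name))
            return false;          // greeting is already assigned ("")

        greeting += "Hello, ";
        greeting += name;
        return true;
    }
}
```

No temporary variable is needed: on failure the caller simply sees the default value.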

What's the point of variables in C# 7.0's pattern matching?

I've been reading about pattern matching with the "is" operator in C# 7.0, and I don't see the reason for the variable declaration.
If we have this
if(str is string s)
Console.WriteLine(s);
What's the point of "s"?
This post says the whole point of such variables is to avoid accessing the evaluated object twice. But I don't see the point!
If that is true, then str is just a string object, and accessing it twice to pass it to Console.WriteLine shouldn't be much of a consideration. In any case, its content is copied to s, and accessing s would surely take just as much time as accessing str.
What I am asking is: why declare variables in the pattern matching feature of C# 7.0 when accessing the evaluated variable should be about the same operation as copying it and then accessing its copy?
What's the point of "s"?
It's a variable of the type that you've just checked for, which you often want to use.
Your example is an unfortunate one as Console.WriteLine accepts object as well... but suppose you wanted to print out the length of the string. Here's a complete example without pattern matching:
public void PrintLengthIfString(object obj)
{
if (obj is string)
{
string str = (string) obj;
Console.WriteLine(str.Length);
}
}
It's not only longer, but it's effectively performing the same check twice: once for the is operator, and once for the cast. Pattern matching makes this simpler by getting the value of the string as part of the is operator:
public void PrintLengthIfString(object obj)
{
if (obj is string str)
{
// No cast here, it's in the pattern match!
Console.WriteLine(str.Length);
}
}
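One more detail worth a sketch: the type pattern never matches null, so obj is string str is false for a null argument and no separate null check is needed. LengthIfString and its -1 sentinel are hypothetical choices for illustration.

```csharp
using System;

public static class PatternDemo
{
    // Returns the string's length, or -1 when obj is not a string.
    // The type pattern does not match null, so null also yields -1.
    public static int LengthIfString(object obj)
    {
        if (obj is string str)
        {
            return str.Length;   // str is already typed as string; no cast
        }
        return -1;
    }
}
```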

string concatenation and reference equality

In C#, strings are immutable and managed. In theory, that would mean concatenating any strings A and B causes the allocation of a new buffer; however, this is all pretty obfuscated. When you concatenate with the identity (the empty string), the reference remains intact. Is this a compile-time optimization, or is the overloaded assignment operator making the decision not to realloc at runtime? Furthermore, how does the runtime/compiler handle s2's value/allocation when I modify the value of s1? My program would indicate that the memory at the original address of s1 remains intact (and s2 continues pointing there) while a realloc occurs for the new value and s1 is then pointed there. Is this an accurate description of what happens under the covers?
Example program:
static void Main(string[] args)
{
string s1 = "Some random text I chose";
string s2 = s1;
string s3 = s2;
Console.WriteLine(Object.ReferenceEquals(s1, s2)); // true
s1 = s1 + "";
Console.WriteLine(Object.ReferenceEquals(s1, s2)); // true
Console.WriteLine(s2);
s1 = s1 + " something else";
Console.WriteLine(Object.ReferenceEquals(s1, s2)); // false cause s1 got realloc'd
Console.WriteLine(Object.ReferenceEquals(s2, s3)); // true
Console.WriteLine(s2);
Console.ReadKey();
}
When you concatenate with the identity (the empty string), the reference remains intact. Is this a compile-time optimization, or is the overloaded assignment operator making the decision not to realloc at runtime?
It is both a compile time optimization and also an optimization performed in the implementation of the overloaded concatenation operator. If you concat two compile time literals, or concat a string known to be null or empty at compile time, the concatenation is done at compile time, and then potentially interned, and will therefore be reference equal to any other compile time literal string that has the same value.
Additionally, String.Concat is implemented such that if you concat a string with either null or an empty string, it just returns the other string (unless the other string was null, in which case it returns an empty string). The test you already have demonstrates this, as you're concatting a non-compile time literal string with an empty string and it's staying reference equal.
Of course, if you don't believe your own test, you can look at the source to see that if one of the arguments is null or empty, it simply returns the other.
if (IsNullOrEmpty(str0)) {
if (IsNullOrEmpty(str1)) {
return String.Empty;
}
return str1;
}
if (IsNullOrEmpty(str1)) {
return str0;
}
When you concatenate with the identity (the empty string), the reference remains intact. Is this a compile-time optimization, or is the overloaded assignment operator making the decision not to realloc at runtime?
This is a run-time optimization. Here is how it is implemented in Mono:
public static String Concat(String str0, String str1) {
Contract.Ensures(Contract.Result<String>() != null);
Contract.Ensures(Contract.Result<String>().Length ==
(str0 == null ? 0 : str0.Length) +
(str1 == null ? 0 : str1.Length));
Contract.EndContractBlock();
// ========= OPTIMIZATION BEGINS ===============
if (IsNullOrEmpty(str0)) {
if (IsNullOrEmpty(str1)) {
return String.Empty;
}
return str1;
}
if (IsNullOrEmpty(str1)) {
return str0;
}
// ========== OPTIMIZATION ENDS =============
int str0Length = str0.Length;
String result = FastAllocateString(str0Length + str1.Length);
FillStringChecked(result, 0, str0);
FillStringChecked(result, str0Length, str1);
return result;
}
The compiler may perform additional optimizations of its own - for example, concatenating two string literals produces a new literal value at compile time, without calling string.Concat. This is no different from C#'s handling of other expressions that include compile-time constants of other data types, though.
Furthermore, how does the runtime/compiler handle s2's value/allocation when I modify the value of s1?
s1 and s2 are independent references to the same string object, which is immutable. Reassigning another object to one of them does not change the other reference.
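Both points can be checked with a short sketch (the class and method names are hypothetical). It calls string.Concat directly so that no compile-time folding is involved, and it assumes a non-empty input.

```csharp
using System;

public static class ConcatDemo
{
    // Concat with an empty operand hands back the other operand itself.
    // Assumes s is non-empty (two empty operands yield String.Empty).
    public static bool EmptyConcatReturnsSameObject(string s)
    {
        return object.ReferenceEquals(string.Concat(s, ""), s)
            && object.ReferenceEquals(string.Concat("", s), s);
    }

    // Reassigning s1 to a new string leaves s2 pointing at the original.
    public static bool ReassignLeavesOtherReference(string original)
    {
        string s1 = original;
        string s2 = s1;
        s1 = s1 + " something else";   // allocates a new string for s1
        return object.ReferenceEquals(s2, original) && !object.ReferenceEquals(s1, s2);
    }
}
```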
It is a decision by the String.Concat function not to concatenate the string. Moreover, this particular case is rewritten so that it just checks whether s1 is null and assigns "" to s1 if so:
s1 = s1 + "";
gets optimized by the compiler into:
s1 = s1 ?? "";
String concatenation is specified to return a string whose sequence of characters is the concatenation of the sequences encapsulated by the string representations of the things being concatenated. In cases where no existing string contains the proper sequence of characters, the concatenation code will need to create a new one; further, even in cases where an existing string might contain the proper sequence of characters, it will usually be faster for the computer to create a new string than try to find the existing one. I believe, however, that concatenation is allowed to return an existing string in any case where it can quickly find one that contains the proper characters, and in the case of concatenating a zero-length string to a non-zero-length string, finding a string which contains the proper characters is easy.
Because of behavioral details like the above, in most cases the only legitimate application of ReferenceEquals with strings is in situations where a true result is interpreted to say "the strings definitely contain the same characters" and a false result to say "the strings might not contain the same characters". It should not be interpreted as saying anything about where the strings came from, how they were created, or anything like that.
When you concatenate with the identity (the empty string), the reference remains intact. Is this a compile-time optimization, or is the overloaded assignment operator making the decision not to realloc at runtime?
Neither. It's the Concat method that makes that decision. The code is actually compiled into:
s1 = String.Concat(s1, "");
The Concat method contains this code, which makes it return the first parameter if the second is null or empty:
if (IsNullOrEmpty(str1)) {
return str0;
}
Ref: Microsoft reference source: String.Concat(string, string)
My program would indicate that the memory at the original address of s1 remains intact (and s2 continues pointing there) while a realloc occurs for the new value and then s1 is pointed there
That is correct.

In C#, when sending a parameter to a method, when should we use "ref", when "out", and when neither?

In general, you should avoid using ref and out, if possible.
That being said, use ref when the method might need to modify the value. Use out when the method always should assign something to the value.
The difference between ref and out is that with out, the compiler enforces the rule that you must assign something to the out parameter before returning. With ref, you must assign a value to the variable before passing it as a ref parameter.
Obviously, the above applies when you are writing your own methods. If you need to call methods that were declared with the ref or out modifier on their parameters, you must use the same modifier before your argument when calling the method.
Also remember that for reference types (classes), C# passes the reference by value. So if you give a method a reference-type argument, the method can modify the object's data, even without ref or out. But it cannot modify the caller's reference itself (that is, it cannot change which object the caller's variable refers to).
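The three cases can be seen side by side in this sketch, using List<int> as the reference type; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;

public static class ReferenceParameterDemo
{
    // The parameter holds a copy of the caller's reference, so changes to
    // the object it points at are visible to the caller...
    public static void AddItem(List<int> list)
    {
        list.Add(42);
    }

    // ...but pointing the parameter at a new object is not.
    public static void ReplaceWithoutRef(List<int> list)
    {
        list = new List<int> { 1, 2, 3 };
    }

    // With ref, the caller's variable itself is replaced.
    public static void ReplaceWithRef(ref List<int> list)
    {
        list = new List<int> { 1, 2, 3 };
    }
}
```

After AddItem(items), items contains 42; after ReplaceWithoutRef(items), it still contains only 42; after ReplaceWithRef(ref items), it is the new three-element list.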
They are used mainly to obtain multiple return values from a method call. Personally, I tend to not use them. If I want multiple return values from a method then I'll create a small class to hold them.
ref and out are used when you want something back from the method in that parameter. As I recall, they both compile down to the same IL, but the C# compiler applies different definite-assignment rules to each, so you have to be specific.
Here are some examples:
static void Main(string[] args)
{
string myString;
MyMethod0(myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod0(string param1)
{
param1 = "Hello";
}
The above won't compile because myString is never initialised. If myString is initialised to string.Empty, then the output of the program will be an empty line, because all MyMethod0 does is assign a new string to param1, its local copy of the reference.
static void Main(string[] args)
{
string myString;
MyMethod1(out myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod1(out string param1)
{
param1 = "Hello";
}
myString is not initialised in the Main method, yet the program outputs "Hello". This is because the myString variable in the Main method is updated from MyMethod1. MyMethod1 does not expect param1 to already contain anything, so it can be left uninitialised. However, the method must assign something to it before returning.
static void Main(string[] args)
{
string myString;
MyMethod2(ref myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod2(ref string param1)
{
param1 = "Hello";
}
This, again, will not compile. This is because ref demands that myString in the Main method is initialised to something first. But, if the Main method is changed so that myString is initialised to string.Empty then the code will compile and the output will be Hello.
So, the difference is that out can be used with an uninitialised variable, whereas ref must be passed an initialised one. And if you pass a variable with neither, the method cannot replace the reference it holds.
Just to be clear: if the object being passed is already a reference type, then the method can update the object and the updates are reflected in the calling code; however, the reference to the object cannot be changed. So if I write code like this:
static void Main(string[] args)
{
string myString = "Hello";
MyMethod0(myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod0(string param1)
{
param1 = "World";
}
The output from the program will be Hello, and not World because the method only changed its local copy of the reference, not the reference that was passed in.
I hope this makes sense. My general rule of thumb is simply not to use them. I feel they are a throwback to pre-OO days. (But that's just my opinion.)
(this is supplemental to the existing answers - a few extra considerations)
There is another scenario for using ref in C#, more commonly seen in things like XNA... Normally, when you pass a value type (struct) around, it gets copied. This uses stack space and a few CPU cycles, and has the side effect that any modifications to the struct in the invoked method are lost.
(Aside: structs should normally be immutable, but mutable structs aren't uncommon in XNA.)
To get around this, it is quite common to see ref in such programs.
But in most programs (i.e. where you are using classes as the default), you can normally just pass the reference "by value" (i.e. no ref/out).
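A minimal sketch of the difference, assuming a hypothetical mutable Particle struct of the kind found in XNA-style code: the by-value method modifies its own copy, while the ref method modifies the caller's struct in place without copying it.

```csharp
// Hypothetical mutable struct, XNA-style.
public struct Particle
{
    public float X, Y, Z;
}

public static class StructRefDemo
{
    // By value: the method works on a copy, so the caller's struct is unchanged.
    public static void NudgeCopy(Particle p)
    {
        p.X += 1f;
    }

    // By ref: no copy is made, and the caller's struct is modified in place.
    public static void NudgeInPlace(ref Particle p)
    {
        p.X += 1f;
    }
}
```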
Another very common use-case of out is the Try* pattern, for example:
string s = Console.ReadLine();
int i;
if(int.TryParse(s, out i)) {
Console.WriteLine("You entered a valid int: " + i);
}
Or similarly, TryGetValue on a dictionary.
This could use a tuple instead, but it is such a common pattern that it is reasonably understood, even by people who struggle with too much ref/out.
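For comparison, here is a sketch of both shapes over Dictionary<TKey, TValue>.TryGetValue (TryGetAge and GetAge are hypothetical names). The tuple version needs C# 7.0 or later; note that TryGetValue sets its out argument to default(int), which is 0, when the key is missing.

```csharp
using System.Collections.Generic;

public static class TryPatternDemo
{
    // Classic Try* shape: bool result plus an out parameter.
    public static bool TryGetAge(Dictionary<string, int> ages, string name, out int age)
    {
        return ages.TryGetValue(name, out age);
    }

    // Hypothetical tuple-returning alternative carrying the same information.
    public static (bool found, int age) GetAge(Dictionary<string, int> ages, string name)
    {
        bool found = ages.TryGetValue(name, out int age);
        return (found, age);
    }
}
```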
Very simple really. You use exactly the same keyword that the parameter was originally declared with in the method. If it was declared as out, you have to use out. If it was declared as ref, you have to use ref.
In addition to Colin's detailed answer, you could also use out parameters to return multiple values from one method call. See for example the method below which returns 3 values.
static void AssignSomeValues(out int first, out bool second, out string third)
{
first = 12 + 12;
second = false;
third = "Output parameters are okay";
}
You could use it like so
static void Main(string[] args)
{
int i;
string s;
bool b;
AssignSomeValues(out i, out b, out s);
Console.WriteLine("Int value: {0}", i);
Console.WriteLine("Bool value: {0}", b);
Console.WriteLine("String value: {0}", s);
// Wait for the Enter key before terminating the program
Console.ReadLine();
}
Just make sure that you assign a valid value to each out parameter to avoid getting an error.
Try to avoid using ref. Out is okay, because you know what will happen: the old value will be gone and a new value will be in your variable, even if the function failed. However, just by looking at the function, you have no idea what will happen to a ref parameter. It may stay the same, be modified, or be replaced by an entirely new object.
Whenever I see ref, I get nervous.
ref is to be avoided (I believe there is an FxCop rule for this as well); however, use ref when the object being referenced may itself be replaced. If you see the ref keyword, you know that the variable may no longer refer to the same underlying object after the method is called.

