Why does String.IsInterned return a string - c#

I see that String.Intern will actually add a string to the intern-pool and String.IsInterned will return the reference to that corresponding interned string. This makes me wonder:
Why does IsInterned return the referenced interned string and not a bool indicating whether a given string has been interned so far? I feel it's a funny use for an Is notation.
In what case would the code below return true?
bool InternCheck(string s)
{
    string internedString = String.IsInterned(s);
    return internedString != null && !String.Equals(internedString, s);
}

Why does IsInterned return the referenced interned string and not a bool indicating whether a given string has been interned so far? I feel it's a funny use for an Is notation.
For a definitive "why?" you would need to ask Microsoft. However, compare IsInterned() with the similar (though of course functionally different) HashSet<T>.Add(). That is, it's convenient to have a method that checks whether something is true and, if it is, also hands back the value you wanted as part of the answer.
As for why this method doesn't follow the TryXXX() pattern, again, you'd have to ask Microsoft, but we can easily guess. The method could have returned a bool and provided the string reference as an out parameter. But here the return type is a reference type, so null is an adequate way to indicate non-existence, which is not the case for many of the types that implement TryXXX() methods.
In what case would the code below return true?
I don't see how that code would ever return true. If the string is not interned, it will return false, and if it is interned, then the interned string is necessarily always equal to the string that was passed in, and so !string.Equals(...) would also be false.
Is there some reason you think otherwise?
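To see what IsInterned actually hands back, here is a minimal sketch. The class name InternDemo and the helper IsThePooledInstance are made up for illustration, and the Guid is only there to obtain a string that is assumed not to be in the pool already.

```csharp
using System;

public static class InternDemo
{
    // Hypothetical helper: true only when s itself is the instance held by the intern pool.
    public static bool IsThePooledInstance(string s)
    {
        string pooled = String.IsInterned(s);
        return pooled != null && ReferenceEquals(pooled, s);
    }

    public static void Main()
    {
        // A fresh run-time string; assumed not to be in the intern pool yet.
        string original = Guid.NewGuid().ToString();
        Console.WriteLine(String.IsInterned(original) == null); // True

        // Intern it and keep the reference the pool gives back.
        string interned = String.Intern(original);

        // Same characters, different object.
        string copy = new string(interned.ToCharArray());
        string pooled = String.IsInterned(copy);

        Console.WriteLine(String.Equals(pooled, copy));    // True: same value
        Console.WriteLine(ReferenceEquals(pooled, copy));  // False: copy is not the pooled instance
        Console.WriteLine(IsThePooledInstance(interned));  // True
    }
}
```

The value comparison is always true when IsInterned returns non-null, which is why InternCheck in the question cannot return true; only a reference comparison can tell the two instances apart.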

Let's imagine that String.IsInterned were to return a bool. Then all you'd know from calling bool whoopie = String.IsInterned(s); is that the value of your string is the same as that of a string that is interned. There is no indication that you hold the reference to the interned string itself.
Now the point of interning is to hold memory pressure down. You know you're creating a lot of similar strings and you want to ensure that you're not clogging up memory.
There's a cost to interning and that cost better be less than the cost of using up RAM.
So, back to String.IsInterned hypothetically returning a bool.
Since you don't know whether you hold the interned reference (and you'd want it, otherwise there's no point in interning), you'd end up writing this code a lot:
if (String.IsInterned(s))
{
s = String.GetInterned(s);
}
Or:
s = String.IsInterned(s) ? String.GetInterned(s) : s;
String.GetInterned is also a hypothetical method.
With the actual implementation of IsInterned this code becomes slightly simpler:
s = String.IsInterned(s) ?? s;
Let's see if we can improve this design.
If I try to implement a TryGetInterned-style method I might implement it like this:
public static bool TryGetInterned(this string input, out string output)
{
string intermediate = String.IsInterned(input);
output = intermediate ?? input;
return intermediate != null;
}
This code works perfectly fine, but it leads to this kind of code repetition:
string s = "Hello World";
if (s.TryGetInterned(out string s2))
Console.WriteLine(s2); // `s` is interned
else
Console.WriteLine(s2); // `s` is NOT interned
This seems pretty pointless.
Compare this to the current IsInterned method:
string s = "Hello World";
s = String.IsInterned(s) ?? s;
Console.WriteLine(s);
Much simpler.
The only implementation that I could consider an improvement, in some circumstances, is this:
public static string GetIsInternedOrSelf(this string input)
=> String.IsInterned(input) ?? input;
Now I have this:
string s = "Hello World";
s = s.GetIsInternedOrSelf();
Console.WriteLine(s);
It's an improvement, but we've lost the ability to know if the string was interned.
The bottom line is that I think String.IsInterned is probably as well designed as it could be.

Related

String immutability C# (Heap memory, new operator)

I recently learned about Stack and Heap and I wanted to ask a question concerning it. I've been "experimenting" with strings and I cannot explain - why is the following true if I am creating two different blocks of memory on the heap?
static void Main()
{
string test = "yes";
string secondTest = "yes";
Console.WriteLine(test == secondTest); //true
string thirdTest = new string("yes");
Console.WriteLine(secondTest == thirdTest); //true
}
The first string named test is the same as secondTest, because they have the same reference value, but when I create the third string thirdTest am I not creating a new block of memory on the heap by using "new"?
Why is it still true?
My guess:
What I wrote is exactly the same and I misunderstood the new operator, since when I watched tutorials, they were in Java language.
String name = "John";
String aThirdName = new String("John");
System.out.println(name == aThirdName); // false
This means that what I thought was different
(string test = "yes") = (string thirdTest = new string("yes"))
is actually the same. (By that I mean that those two lines are analogical)
If my guess is right, how do I create a new memory block on the heap with the same value?
(I want to know, just for learning purposes, I know that it is ineffective for the memory to have a lot of variables that have the same value that are on different memory blocks inside the heap)
As mentioned in comments, string is a bad example since it has an overloaded == operator and an overridden Equals method. String is a reference type, but due to these and other behaviors it effectively behaves (in most cases) like a value type, especially with regard to equality.
That being said, if you were to create a simple class you'll find your test behaves exactly as you'd expect.
Snippets of the equality implementation in String to give you some context.
public static bool operator ==(string? a, string? b)
{
    return string.Equals(a, b);
}
// Determines whether two Strings match.
public static bool Equals(string? a, string? b)
{
    if (object.ReferenceEquals(a, b))
    {
        return true;
    }
    if (a is null || b is null || a.Length != b.Length)
    {
        return false;
    }
    return EqualsHelper(a, b);
}
It then starts down a rabbit hole of code with EqualsHelper that's not worth chasing in here (if you're interested, you can decompile it or find it online).
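The "simple class" point can be sketched as follows. Box is a hypothetical class invented for this illustration; it deliberately does not overload == or override Equals. The second half shows one way to get a second string object with the same characters, using the char[] constructor.

```csharp
using System;

// Hypothetical class with no equality overloads, so == compares references.
public sealed class Box
{
    public string Value { get; }
    public Box(string value) => Value = value;
}

public static class EqualityDemo
{
    public static void Main()
    {
        var a = new Box("yes");
        var b = new Box("yes");
        Console.WriteLine(a == b); // False: two objects, reference comparison

        string literal = "yes";
        string built = new string(new[] { 'y', 'e', 's' }); // separate heap object

        Console.WriteLine(literal == built);                // True: string == compares characters
        Console.WriteLine(ReferenceEquals(literal, built)); // False: different objects
    }
}
```

So the second block of memory does exist; it is the overloaded == on string that hides it, and ReferenceEquals makes it visible again.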
string firstTest = new string("test") and string secondTest = "test" hold the same value, but they are not the same object: the literal refers to an interned instance, while new string(...) allocates a separate one. As for why firstTest == secondTest is true: the String class overrides Equals and also overloads the == operator to use Equals.

What's the point of variables in C# 7.0's pattern matching?

I've been reading about the "is" operator pattern matching in C# 7.0 and I don't see the reason for variable declaration.
If we have this
if(str is string s)
Console.WriteLine(s);
What's the point of "s"?
In this post it says the whole point of such variables is not to access the evaluated object twice. But there's no point!
If this is true then that means str is just a string object and accessing it twice to Console.WriteLine it shouldn't be that much of a consideration. In any case its content is copied to s, and accessing s surely takes just as much time as accessing str.
What I am asking is: why declare variables in the pattern matching feature of C# 7.0 when accessing the evaluated variable should be about the same operation as copying it and then accessing its copy?
What's the point of "s"?
It's a variable of the type that you've just checked for, which you often want to use.
Your example is an unfortunate one as Console.WriteLine accepts object as well... but suppose you wanted to print out the length of the string. Here's a complete example without pattern matching:
public void PrintLengthIfString(object obj)
{
if (obj is string)
{
string str = (string) obj;
Console.WriteLine(str.Length);
}
}
It's not only longer, but it's effectively performing the same check twice: once for the is operator, and once for the cast. Pattern matching makes this simpler, by getting the value of the string as part of the is operator:
public void PrintLengthIfString(object obj)
{
if (obj is string str)
{
// No cast here, it's in the pattern match!
Console.WriteLine(str.Length);
}
}

string concatenation and reference equality

In C# strings are immutable and managed. In theory that would mean the concatenation of any strings A and B would cause the allocation of a new buffer; however, this is all pretty obfuscated. When you concatenate with the identity (the empty string) the reference maintains intact. Is this a compile time optimization or is the overloaded assignment operator making the decision to not realloc at runtime? Furthermore, how does the runtime/compiler handle s2's value/allocation when I modify the value of s1? My program would indicate that the memory at the original address of s1 remains intact (and s2 continues pointing there) while a realloc occurs for the new value and then s1 is pointed there. Is this an accurate description of what happens under the covers?
Example program:
static void Main(string[] args)
{
string s1 = "Some random text I chose";
string s2 = s1;
string s3 = s2;
Console.WriteLine(Object.ReferenceEquals(s1, s2)); // true
s1 = s1 + "";
Console.WriteLine(Object.ReferenceEquals(s1, s2)); // true
Console.WriteLine(s2);
s1 = s1 + " something else";
Console.WriteLine(Object.ReferenceEquals(s1, s2)); // false cause s1 got realloc'd
Console.WriteLine(Object.ReferenceEquals(s2, s3));
Console.WriteLine(s2);
Console.ReadKey();
}
When you concatenate with the identity (the empty string) the reference maintains intact. Is this a compile time optimization or is the overloaded assignment operator making the decision to not realloc at runtime?
It is both a compile time optimization and also an optimization performed in the implementation of the overloaded concatenation operator. If you concat two compile time literals, or concat a string known to be null or empty at compile time, the concatenation is done at compile time, and then potentially interned, and will therefore be reference equal to any other compile time literal string that has the same value.
Additionally, String.Concat is implemented such that if you concat a string with either null or an empty string, it just returns the other string (unless the other string was null, in which case it returns an empty string). The test you already have demonstrates this, as you're concatting a non-compile time literal string with an empty string and it's staying reference equal.
Of course if you don't believe your own test, you can look at the source to see that if one of the arguments is null or empty then it simply returns the other.
if (IsNullOrEmpty(str0)) {
if (IsNullOrEmpty(str1)) {
return String.Empty;
}
return str1;
}
if (IsNullOrEmpty(str1)) {
return str0;
}
When you concatenate with the identity (the empty string) the reference maintains intact. Is this a compile time optimization or is the overloaded assignment operator making the decision to not realloc at runtime?
This is a run-time optimization. Here is how it is implemented in Mono:
public static String Concat(String str0, String str1) {
Contract.Ensures(Contract.Result<String>() != null);
Contract.Ensures(Contract.Result<String>().Length ==
(str0 == null ? 0 : str0.Length) +
(str1 == null ? 0 : str1.Length));
Contract.EndContractBlock();
// ========= OPTIMIZATION BEGINS ===============
if (IsNullOrEmpty(str0)) {
if (IsNullOrEmpty(str1)) {
return String.Empty;
}
return str1;
}
if (IsNullOrEmpty(str1)) {
return str0;
}
// ========== OPTIMIZATION ENDS =============
int str0Length = str0.Length;
String result = FastAllocateString(str0Length + str1.Length);
FillStringChecked(result, 0, str0);
FillStringChecked(result, str0Length, str1);
return result;
}
The compiler may produce additional optimizations of its own - for example, concatenating two string literals produces a new literal value at compile time, without calling string.Concat. This is not different from C#'s handling of other expressions that include compile-time constants of other data types, though.
Furthermore, how does the runtime/compiler handle s2's value/allocation when I modify the value of s1?
s1 and s2 are independent references to the same string object, which is immutable. Reassigning another object to one of them does not change the other reference.
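A minimal sketch of both points, calling String.Concat directly so the run-time check is what gets exercised. MakeRuntimeString is a hypothetical helper whose only job is to produce a string the compiler cannot treat as a constant.

```csharp
using System;

public static class ConcatDemo
{
    // Hypothetical helper: builds the string at run time so nothing is folded at compile time.
    public static string MakeRuntimeString() => new string(new[] { 't', 'e', 'x', 't' });

    public static void Main()
    {
        string s1 = MakeRuntimeString();
        string s2 = s1; // second reference to the same object

        s1 = string.Concat(s1, "");
        Console.WriteLine(ReferenceEquals(s1, s2)); // True: Concat returned its first argument

        s1 = s1 + " something else";
        Console.WriteLine(ReferenceEquals(s1, s2)); // False: s1 now refers to a new string
        Console.WriteLine(s2);                      // text
    }
}
```

Reassigning s1 never touches the object s2 refers to; it only changes which object s1 points at.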
The String.Concat function decides not to concatenate when one operand is empty. In addition, the statement
s1 = s1 + "";
gets optimized by the compiler into a null check that substitutes "" when s1 is null:
s1 = s1 ?? "";
If you want to learn more check out this link
String concatenation is specified to return a string whose sequence of characters is the concatenation of the sequences encapsulated by the string representations of the things being concatenated. In cases where no existing string contains the proper sequence of characters, the concatenation code will need to create a new one; further, even in cases where an existing string might contain the proper sequence of characters, it will usually be faster for the computer to create a new string than try to find the existing one. I believe, however, that concatenation is allowed to return an existing string in any case where it can quickly find one that contains the proper characters, and in the case of concatenating a zero-length string to a non-zero-length string, finding a string which contains the proper characters is easy.
Because of behavioral details like the above, in most cases the only legitimate application of ReferenceEquals with strings is in situations where a true result is interpreted to say "the strings definitely contain the same characters" and a false result to say "the strings might not contain the same characters". It should not be interpreted as saying anything about where the strings came from, how they were created, or anything like that.
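That reading of ReferenceEquals can be sketched as a fast path in front of a full comparison. SameCharacters is a hypothetical helper written for illustration; String.Equals already performs the same reference check internally, so this is not something you need to write yourself.

```csharp
using System;

public static class StringCompare
{
    // Hypothetical helper: a true reference check is conclusive, a false one is not.
    public static bool SameCharacters(string a, string b)
    {
        if (ReferenceEquals(a, b))
        {
            return true; // same object, so definitely the same characters
        }
        // Different objects may still hold the same characters; compare them.
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    public static void Main()
    {
        string a = "abc";
        string b = new string(new[] { 'a', 'b', 'c' }); // distinct object, same characters

        Console.WriteLine(SameCharacters(a, a));     // True
        Console.WriteLine(SameCharacters(a, b));     // True
        Console.WriteLine(SameCharacters(a, "abd")); // False
    }
}
```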
When you concatenate with the identity (the empty string) the
reference maintains intact. Is this a compile time optimization or is
the overloaded assignment operator making the decision to not realloc
at runtime?
Neither. It's the Concat method that makes that decision. The code is actually compiled into:
s1 = String.Concat(s1, "");
The Concat method contains this code, that makes it return the first parameter if the second is empty:
if (IsNullOrEmpty(str1)) {
return str0;
}
Ref: Microsoft reference source: String.Concat(string, string)
My program would indicate that the memory at the original address of
s1 remains intact (and s2 continues pointing there) while a realloc
occurs for the new value and then s1 is pointed there
That is correct.

"bool" as object vs "string" as object testing equality

I am relatively new to C#, and I noticed something interesting today that I guess I have never noticed or perhaps I am missing something. Here is an NUnit test to give an example:
object boolean1 = false;
object boolean2 = false;
Assert.That(boolean1 == boolean2);
This unit test fails, but this one passes:
object string1 = "string";
object string2 = "string";
Assert.That(string1 == string2);
I'm not that surprised in and of itself that the first one fails seeing as boolean1, and boolean2 are different references. But it is troubling to me that the first one fails, and the second one passes. I read (on MSDN somewhere) that some magic was done to the String class to facilitate this. I think my question really is why wasn't this behavior replicated in bool? As a note... if the boolean1 and 2 are declared as bool then there is no problem.
What is the reason for these differences or why it was implemented that way? Is there a situation where you would want to reference a bool object for anything except its value?
It's because the strings are in fact referring to the same instance. String literals are interned, so that identical literals are reused. This means that in your code, the two string variables will refer to the same, interned string instance.
You can read some more about it here: Strings in .NET and C# (by Jon Skeet)
Update
Just for completeness: as Anthony points out, it is string literals that are interned, which can be shown with the following code:
object firstString = "string1";
object secondString = "string1";
Console.WriteLine(firstString == secondString); // prints True
int n = 1;
object thirdString = "string" + n.ToString();
object fourthString = "string" + n.ToString();
Console.WriteLine(thirdString == fourthString); // prints False
Operator Overloading.
The Boolean class does not have an overloaded == operator. The String class does.
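As Fredrik said, you are doing a reference compare with the boolean comparison. Because both string variables are declared as object, the string comparison is a reference compare too: the overloaded == operator on String, which does a value compare, is only selected when both operands are statically typed as string. The string test passes because the two identical literals are interned and so refer to the same instance. See the System.String page on MSDN.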
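A minimal sketch of how == on object-typed operands differs from value equality, for both bool and string. The class and helper names are hypothetical; the strings are built at run time so that interning of literals does not affect the result.

```csharp
using System;

public static class BoxedEqualityDemo
{
    // Hypothetical helpers: == on object operands compares references,
    // object.Equals dispatches to the run-time type's Equals override.
    public static bool ReferenceCompare(object a, object b) => a == b;
    public static bool ValueCompare(object a, object b) => object.Equals(a, b);

    public static void Main()
    {
        object b1 = false; // each assignment boxes into its own object
        object b2 = false;
        Console.WriteLine(ReferenceCompare(b1, b2)); // False
        Console.WriteLine(ValueCompare(b1, b2));     // True

        int n = 1;
        object s1 = "string" + n.ToString(); // built at run time, not a literal
        object s2 = "string" + n.ToString();
        Console.WriteLine(ReferenceCompare(s1, s2));  // False
        Console.WriteLine(ValueCompare(s1, s2));      // True
        Console.WriteLine((string)s1 == (string)s2);  // True: overloaded string ==
    }
}
```

If what you care about is the value, declare the variables with their real type or call Equals rather than relying on == between objects.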

What is the best practice for syntax in casting a variable?

Which (if any) is more correct? Why?
string someVariable = (string) someOtherVariable;
string someVariable = someOtherVariable.ToString();
string someVariable = someOtherVariable as string;
I've used all three, but I don't have any preference or understanding why one is better than the other.
These are not all examples of casting.
This is a cast:
string someVariable = (string) someOtherVariable;
This is method call:
string someVariable = someOtherVariable.ToString();
And this is a safe cast:
string someVariable = someOtherVariable as string;
The first and third examples are actual casts. The first cast has the potential to throw an InvalidCastException whereas the third example will not throw that exception. That is why the as operator is known as a safe cast.
Here's my article on the subject.
http://blogs.msdn.com/ericlippert/archive/2009/10/08/what-s-the-difference-between-as-and-cast-operators.aspx
As for which one is "most correct", the one that is most correct is the one that has the meaning you intend to convey to the reader of the program.
"ToString()" conveys "this is probably not a string; if it is not, then I wish to obtain from the object a string which represents it."
The "cast" operator conveys either "this is a string, and I am willing to have my program crash if I am wrong", or the opposite, "this is not a string and I want to call a user-defined conversion on this object to string".
The "as" operator conveys "this might be a string and if it isn't, I want the result to be null."
Which of those four things do you mean?
The three do different things -- none are "more correct", it depends on your situation. If you have a bunch of objects that may not be strings, you'd probably use .ToString() (with a null check, if you expect nulls).
If you only care about the non-null strings, but still expect to be receiving non-strings, use an "as" cast, and then ignore the values that come in as null (they were either originally null, or of a non-string type)
If you expect to receive only strings, it is best to use the (string) cast. This expresses the intent best in the code.
object foo = 5;
string str = (string)foo; // exception
string str = foo as string; // null
string str = foo.ToString(); // "5"
object foo = "bar";
string str = (string)foo; // "bar"
string str = foo as string; // "bar"
string str = foo.ToString(); // "bar"
object foo = null;
string str = (string)foo; // null
string str = foo as string; // null
string str = foo.ToString(); // exception
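The three cases above can be run side by side with a small sketch like this one. ConversionDemo, Describe and Show are hypothetical names; catching NullReferenceException is done here only to report the outcome, not as a recommended practice.

```csharp
using System;

public static class ConversionDemo
{
    // Hypothetical helper: reports what each of the three forms produces for one input.
    public static string Describe(object value)
    {
        string viaCast;
        try { viaCast = Show((string)value); }
        catch (InvalidCastException) { viaCast = "InvalidCastException"; }

        string viaAs = Show(value as string);

        string viaToString;
        try { viaToString = Show(value.ToString()); }
        catch (NullReferenceException) { viaToString = "NullReferenceException"; }

        return $"cast: {viaCast}, as: {viaAs}, ToString: {viaToString}";
    }

    // Renders null as the word null and everything else in quotes.
    private static string Show(string s) => s == null ? "null" : "\"" + s + "\"";

    public static void Main()
    {
        Console.WriteLine(Describe(5));     // cast: InvalidCastException, as: null, ToString: "5"
        Console.WriteLine(Describe("bar")); // cast: "bar", as: "bar", ToString: "bar"
        Console.WriteLine(Describe(null));  // cast: null, as: null, ToString: NullReferenceException
    }
}
```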
The as keyword is very useful if you think the conversion might fail during a downcast. For instance, if I want to perform the same operation on similar types in a Control list... let's say unchecking all CheckBoxes:
foreach (Control ctrl in Controls)
{
    CheckBox box = ctrl as CheckBox;
    if (box != null)
        box.Checked = false;
}
This way, if my list has something else, like a text box or a label, no exception is thrown (as simply sets the variable = null if it fails), and it's very efficient. There is no exception overhead.
The ideas of CAST and CONVERT should not be confused here. Casting involves viewing an object as if it was another type. Converting involves transforming an object to another type.
If your intention is to CAST to a string, you should use the first or third. (Option depends on what you want to happen in the error condition. See bangoker's answer.)
If your intention is to CONVERT to a string, you should use the second. (Or better, ChaosPandion's modified statement with the ternary operator.) That is because the ToString method's behaviour is defined as converting the object into a string representation.
This is 100% personal preference here, but for me I use the following:
string someVariable = (string) someOtherVariable;
When converting to a child or parent type (eg. NetworkStream->Stream)
string someVariable = someOtherVariable.ToString();
When converting to a new type (int -> string)
And I never use the latter (as string) method, mostly because coming from a C/C++ background I prefer the () syntax and it's a bit more concise.
There is a big difference between casting with parentheses and casting with "as".
Basically, parentheses will throw an exception while "as" will return null instead of raising an exception.
More detailed info here
string someVariable = (string) someOtherVariable;
This is your good old normal cast, and it will throw an exception if you try to cast something to a type it cannot be cast to (thus sometimes you need to check first whether the cast is possible).
string someVariable = someOtherVariable.ToString();
is not really casting; it executes a method that may come from different places (interfaces) but that ALL objects in C# have, since they inherit from Object, which defines it. Its default behavior is to return the name of the object's type, but you can override it to return whatever you want your class to print from the ToString method.
string someVariable = someOtherVariable as string;
This is the newer C# cast: it first checks whether the value is castable (as if by testing variable is string) and then performs the cast if it is valid, or returns null if it is not. That can be a silent error if you are expecting exceptions, so you should check the result against null.
Basically
myType as myOtherType
is the same as:
myOtherType something = null;
if (myType is myOtherType)
{
    something = (myOtherType) myType;
}
except that as will check and cast in one step, and not in 2.
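As a rough check of that equivalence, the two-step form can be written as a generic helper and compared against as. IsThenCast is a hypothetical name, and the sketch only covers reference types.

```csharp
using System;

public static class AsDemo
{
    // Hypothetical helper spelling out the two-step "is, then cast" form.
    public static T IsThenCast<T>(object value) where T : class
    {
        T result = null;
        if (value is T)
        {
            result = (T)value;
        }
        return result;
    }

    public static void Main()
    {
        object text = "hello";
        object number = 5;

        Console.WriteLine(ReferenceEquals(IsThenCast<string>(text), text as string)); // True
        Console.WriteLine(IsThenCast<string>(number) == null);                        // True
        Console.WriteLine((number as string) == null);                                // True
    }
}
```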
First of all, you should avoid casting with the as operator. The article linked explains why.
Second, you can use the as operator ONLY if you expect that the value might not be of the type you cast to. So you will have to check that manually.
Also, the obj.ToString() method call is not a cast; it converts the object to a string representation (which for a string is the string itself). This can be overridden by any class.
So as a general rule I follow this:
Always use (Type) casting.
Use as operator only if object can be of other type than you cast to.
If using as operator - ALWAYS check the result for NULL.
Use ToString only in cases when you need to display information about the object.
If your question is about the best practice for syntax in casting a variable, then I prefer to use the following:
var someVariable = someOtherVariable as string ?? string.Empty;
Of course you can use someVariableDefaultValue instead of string.Empty.
If you are casting not to string but to some complex type, then I recommend the following syntax, which uses the null-conditional operator (sometimes called the safe navigation operator):
var complexVariable = otherComplexVariable as ComplexType;
if (complexVariable?.IsCondition == true)
{
//your code if cast passed and IsCondition is true
}
else
{
//your code if cast not passed or IsCondition is false
}
