Possible to create case insensitive string class?

What would be required to create a case-insensitive string type that otherwise behaves exactly like a string?
I've never heard of anyone making a case insensitive string type like this and it's obviously not part of the framework, but it seems like it could be very useful. The fact that SQL does case insensitive comparisons by default is a great case in point. So I'm thinking it's either not possible, or else there's a really good reason why no one does it that I'm not aware of.
I know it would require using an implicit operator for assignment, and you would have to override the equals operator. And for overriding GetHashCode(), I'm thinking you could just return ToLower().GetHashCode().
What am I missing?
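For illustration, the idea described above can be sketched as a small wrapper struct. This is only a sketch under assumptions: the name CiString is hypothetical, it uses ordinal (not culture-aware) case-insensitive comparison, and it hashes with StringComparer.OrdinalIgnoreCase instead of ToLower().GetHashCode() so that hashing and equality follow the same rules.

```csharp
using System;

// Hypothetical wrapper type; the name CiString is illustrative only.
public readonly struct CiString : IEquatable<CiString>
{
    private readonly string _value;

    public CiString(string value) { _value = value ?? string.Empty; }

    // Lets you write: CiString s = "Hello";
    public static implicit operator CiString(string s) => new CiString(s);

    public bool Equals(CiString other) =>
        string.Equals(ToString(), other.ToString(), StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) => obj is CiString other && Equals(other);

    // Must agree with Equals: values that differ only by case hash identically.
    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(ToString());

    // default(CiString) has a null backing field, so fall back to "".
    public override string ToString() => _value ?? string.Empty;

    public static bool operator ==(CiString a, CiString b) => a.Equals(b);
    public static bool operator !=(CiString a, CiString b) => !a.Equals(b);
}

public static class CiStringDemo
{
    public static void Main()
    {
        CiString a = "Hello";
        Console.WriteLine(a == "HELLO");   // True
        Console.WriteLine(a != "World");   // True
    }
}
```

The sketch deliberately has no implicit conversion back to string. With implicit conversions in both directions, an expression such as a == "HELLO" would be ambiguous between this operator and string's own ==, which is one of the ways such a type cannot quite behave "exactly like a string".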

Comparing strings is rather easy. You can simply use the Equals method or the Compare method.
Example:
string s = "A";
s.Equals("a", StringComparison.InvariantCultureIgnoreCase); // Will return true.
s.Equals("a", StringComparison.InvariantCulture); // Will return false.
You should also look at this. That will explain a little more on comparing strings.
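When the goal is case-insensitive lookups rather than one-off comparisons, the built-in StringComparer classes can be handed to a collection, which avoids a custom string type altogether. A minimal sketch, with hypothetical class and variable names:

```csharp
using System;
using System.Collections.Generic;

public static class ComparerDemo
{
    public static void Main()
    {
        // The comparer controls both hashing and equality of the keys.
        var ages = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        ages["Alice"] = 30;

        Console.WriteLine(ages.ContainsKey("ALICE"));   // True
        Console.WriteLine(ages["alice"]);               // 30

        // String.Compare returns 0 when the strings are considered equal.
        Console.WriteLine(string.Compare("A", "a", StringComparison.OrdinalIgnoreCase)); // 0
    }
}
```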

Building on deathismyfriend's answer above, I would add an extension method to the string class:
public static class StringExtensions
{
    public static int CaseInsensitiveCompare(this string s, string stringToCompare)
    {
        return String.Compare(s, stringToCompare, StringComparison.InvariantCultureIgnoreCase);
    }
}
And the call:
int result = firstString.CaseInsensitiveCompare(secondString);

It wouldn't behave "exactly like a string". The string type is special and is baked into the language spec. C# strings exhibit special behavior, such as
being a reference type that behaves like a value type: strings are immutable, and == compares their contents. Reference types are normally compared by...well...reference.
having their literals interned by default. That means that there is only ever a single instance of a given string literal. Strings built at run time are not interned automatically, though. The following code creates three separate strings: a, b and c all have the value quick, but Object.ReferenceEquals() is false when comparing any two of them unless you pass them through String.Intern() first:
string a = "The quick brown dog...".Substring(4, 5);
string b = new string(new char[] { 'q', 'u', 'i', 'c', 'k' });
string c = new StringBuilder()
    .Append('q')
    .Append('u')
    .Append('i')
    .Append('c')
    .Append('k')
    .ToString();
[edited to note: while one might think that this should be possible, a little fiddling around suggests that one can't actually create a custom implementation/subtype of CompareInfo as it has no public constructors and its default constructor is internal. More in the answers to this question: Globally set String.Compare/ CompareInfo.Compare to Ordinal
Grrr...]
What you could do is this:
Culture-sensitive string comparisons (String.Compare and CompareTo without an explicit StringComparison) are done using the current culture's collation/comparison rules. Create a custom culture for your app, say, a copy of the US culture that uses the collation/comparison rules you need. Set that as the current culture and Bob's-yer-uncle.
You'll still get compiler/ReSharper whines because you're doing string comparisons without specifying the desired comparison semantics, but your code will be clean.
For more details, see
https://msdn.microsoft.com/en-us/library/kzwcbskc(v=vs.90).aspx
https://msdn.microsoft.com/en-us/library/se513yha(v=vs.100).aspx

Related

LINQ C# Dictionary [duplicate]

true.ToString();
false.ToString();
Output:
True
False
Is there a valid reason for it being "True" and not "true"? It breaks when writing XML as XML's boolean type is lower case, and also isn't compatible with C#'s true/false (not sure about CLS though).
Update
Here is my very hacky way of getting around it in C# (for use with XML)
internal static string ToXmlString(this bool b)
{
return b.ToString().ToLower();
}
Of course that adds 1 more method to the stack, but removes ToLowers() everywhere.

Only people from Microsoft can really answer that question. However, I'd like to offer some fun facts about it ;)
First, this is what it says in MSDN about the Boolean.ToString() method:
Return Value
Type: System.String
TrueString if the value of this instance is true, or FalseString if the value of this instance is false.
Remarks
This method returns the constants "True" or "False". Note that XML is case-sensitive, and that the XML specification recognizes "true" and "false" as the valid set of Boolean values. If the String object returned by the ToString() method is to be written to an XML file, its String.ToLower method should be called first to convert it to lowercase.
Here comes the fun fact #1: it doesn't return TrueString or FalseString at all. It uses hardcoded literals "True" and "False". Wouldn't do you any good if it used the fields, because they're marked as readonly, so there's no changing them.
The alternative method, Boolean.ToString(IFormatProvider) is even funnier:
Remarks
The provider parameter is reserved. It does not participate in the execution of this method. This means that the Boolean.ToString(IFormatProvider) method, unlike most methods with a provider parameter, does not reflect culture-specific settings.
What's the solution? Depends on what exactly you're trying to do. Whatever it is, I bet it will require a hack ;)

...because the .NET environment is designed to support many languages.
System.Boolean (in mscorlib.dll) is designed to be used internally by languages to support a boolean datatype. C# uses all lowercase for its keywords, hence 'bool', 'true', and 'false'.
VB.NET however uses standard casing: hence 'Boolean', 'True', and 'False'.
Since the languages have to work together, you couldn't have true.ToString() (C#) giving a different result to True.ToString() (VB.NET). The CLR designers picked the standard CLR casing notation for the ToString() result.
The string representation of the boolean true is defined to be Boolean.TrueString.
(There's a similar case with System.String: C# presents it as the 'string' type).

For XML you can use the XmlConvert.ToString method.
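A short sketch of what that looks like, using System.Xml.XmlConvert in both directions (the class name XmlBoolDemo is hypothetical):

```csharp
using System;
using System.Xml;

public static class XmlBoolDemo
{
    public static void Main()
    {
        // XmlConvert produces the lowercase forms that XML Schema expects.
        Console.WriteLine(XmlConvert.ToString(true));    // true
        Console.WriteLine(XmlConvert.ToString(false));   // false

        // Parsing back from the XML representation to a bool.
        bool parsed = XmlConvert.ToBoolean("true");
        Console.WriteLine(parsed);                       // True
    }
}
```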

It's simple code to convert that to all lower case.
Not so simple to convert "true" back to "True", however.
true.ToString().ToLower()
is what I use for xml output.

How is it not compatible with C#? Boolean.Parse and Boolean.TryParse are case insensitive, and the parsing is done by comparing the value to Boolean.TrueString or Boolean.FalseString, which are "True" and "False".
EDIT: When looking at the Boolean.ToString method in reflector it turns out that the strings are hard coded so the ToString method is as follows:
public override string ToString()
{
if (!this)
{
return "False";
}
return "True";
}

I know the reason why it is the way it is has already been addressed, but when it comes to "custom" boolean formatting, I've got two extension methods that I can't live without anymore :-)
public static class BoolExtensions
{
public static string ToString(this bool? v, string trueString, string falseString, string nullString="Undefined") {
return v == null ? nullString : v.Value ? trueString : falseString;
}
public static string ToString(this bool v, string trueString, string falseString) {
return ToString(v, trueString, falseString, null);
}
}
Usage is trivial. The following converts various bool values to their Portuguese representations:
string verdadeiro = true.ToString("verdadeiro", "falso");
string falso = false.ToString("verdadeiro", "falso");
bool? v = null;
string nulo = v.ToString("verdadeiro", "falso", "nulo");

This probably harks back to the old VB (pre-.NET) days, when bool.ToString produced True or False.

Is it possible to create a string that's not reference-equal to any other string?

It seems like .NET goes out of its way to make strings that are equal by value equal by reference.
In LINQPad, I tried the following, hoping it'd bypass interning string constants:
var s1 = new string("".ToCharArray());
var s2 = new string("".ToCharArray());
object.ReferenceEquals(s1, s2).Dump();
but that returns true. However, I want to create a string that's reliably distinguishable from any other string object.
(The use case is creating a sentinel value to use for an optional parameter. I'm wrapping WebForms' Page.Validate(), and I want to choose the appropriate overload depending on whether the caller gave me the optional validation group argument. So I want to be able to detect whether the caller omitted that argument, or whether he passed a value that happens to be equal to my default value. Obviously there are other, less arcane ways of approaching this specific use case; the aim of this question is more academic.)

It seems like .NET goes out of its way to make strings that are equal
by value equal by reference.
Actually, there are really only two special cases for strings that exhibit behavior like what you're describing here:
String literals in your code are interned, so the same literal in two places will result in a reference to the same object.
The empty string is a particularly weird case, where as far as I know literally every empty string in a .NET program is in fact the same object (i.e., "every empty string" constitutes a single string). This is the only case I know of in .NET where using the new keyword (on a class) may potentially not result in the allocation of a new object.
From your question I get the impression you already knew about the first case. The second case is the one you've stumbled across. As others have pointed out, if you just go ahead and use a non-empty string, you'll find it's quite easy to create a string that isn't reference-equal to any other string in your program:
public static string Sentinel = new string(new char[] { 'x' });
As a little editorial aside, I actually wouldn't mind this so much (as long as it were documented); but it kind of irks me that the CLR folks (?) implemented this optimization without also going ahead and doing the same for arrays. That is, it seems to me they might as well have gone ahead and made every new T[0] refer to the same object too. Or, you know, not done that for strings either.

If the strings are ReferenceEqual, they are the same object. When you call new string(new char[0]), you don't get a new object that happens to be reference-equal to string.Empty; that would be impossible. Rather, you get a new reference to the already-created string.Empty instance. This is a result of special-case code in the string constructor.
Try this:
var s1 = new string(new char[] { 'A', 'b' });
var s2 = new string(new char[] { 'A', 'b' });
object.ReferenceEquals(s1, s2).Dump();
Also, beware that string constants are interned, so all instances of the literal "Ab" in your code will be reference equal to one another, because they all refer to the same string object. Constant folding applies, too, so the constant expression "A" + "b" will also be reference equal to "Ab".
Your sentinel value, therefore, can be a privately-created non-zero-length string.
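The difference between run-time strings and interned ones can be checked with a few lines. This is a sketch with hypothetical names; it only relies on the documented behavior that String.Intern returns the same reference for equal strings:

```csharp
using System;

public static class InternDemo
{
    public static void Main()
    {
        var s1 = new string(new[] { 'A', 'b' });
        var s2 = new string(new[] { 'A', 'b' });

        // Two separately constructed non-empty strings are distinct objects...
        Console.WriteLine(object.ReferenceEquals(s1, s2));   // False
        // ...even though they are equal by value.
        Console.WriteLine(s1 == s2);                          // True

        // Interning maps equal strings to one shared instance.
        Console.WriteLine(object.ReferenceEquals(
            string.Intern(s1), string.Intern(s2)));           // True
    }
}
```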

You can put non-printable characters into the string... even the 0/nul character. But really, I'd just use null for the sentinel value, and try to ensure code elsewhere is using the empty string instead of null.

So I want to be able to detect whether the caller omitted that argument, or whether he passed a value that happens to be equal to my default value.
I've never done this before, but my thought would be to make a Nullable<T>-style class... but instead of Nullable<T> it would be Parameter<T>, and it would keep track of whether or not it has been assigned anything (including null).
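One way that idea might look is sketched below. Everything here is an assumption for illustration: the Parameter<T> struct, its IsSet and Value members, and the Describe method standing in for the Page.Validate() wrapper.

```csharp
using System;

// Hypothetical wrapper that remembers whether a value was supplied at all.
public readonly struct Parameter<T>
{
    public bool IsSet { get; }
    public T Value { get; }

    public Parameter(T value)
    {
        Value = value;
        IsSet = true;
    }

    // Callers keep passing plain T values; the conversion marks them as set.
    public static implicit operator Parameter<T>(T value) => new Parameter<T>(value);
}

public static class ParameterDemo
{
    // default(Parameter<string>) has IsSet == false, so it can act as the sentinel.
    public static string Describe(Parameter<string> group = default)
    {
        if (!group.IsSet) return "omitted";
        return "explicit: " + (group.Value ?? "(null)");
    }

    public static void Main()
    {
        string none = null;
        Console.WriteLine(Describe());         // omitted
        Console.WriteLine(Describe("login"));  // explicit: login
        Console.WriteLine(Describe(none));     // explicit: (null)
    }
}
```

Because the default value of a struct is a compile-time constant, it can be used as the default of an optional parameter, which a string created with new cannot.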

Question about Object Identity and Object Equality and String class exception

This is a Java and C# question.
We all know that object identity (==) tests whether two objects refer to the same location, and object equality (the Equals method) tests whether two different (non-identical) objects have the same value. But in the case of string objects, object identity and object equality are the same.
E.g. the two boolean expressions in the if statements below both return true:
string a="123";
string b="123";
if(a==b)
if(a.Equals(b))
Why is it so??
What is the rationale behind this design decision?

Java and C# both use a memory-saving technique called string interning. Because strings are immutable in these languages, they can pool frequently-used strings (including hard-coded string literals, like in your example) and use multiple references to that one string in memory to save space.

As far as I know, in .net the == Operator for Strings is overloaded to use Equals() instead of object identity. See this explanation for details: http://www.dotnetperls.com/string-equals
If you need to know whether it's really the same object, use this:
Object.ReferenceEquals(string1, string2)
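To see the overloaded operator and the identity check disagree, compare a literal with an equal string built at run time. A small sketch (the class name is hypothetical):

```csharp
using System;

public static class StringIdentityDemo
{
    public static void Main()
    {
        string a = "123";
        string b = new string(new[] { '1', '2', '3' });

        // string overloads ==, so this compares contents.
        Console.WriteLine(a == b);                         // True
        Console.WriteLine(a.Equals(b));                    // True

        // These two ask whether a and b are the same object.
        Console.WriteLine(object.ReferenceEquals(a, b));   // False
        Console.WriteLine((object)a == (object)b);         // False
    }
}
```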

Actually, at least in Java, there is a caching mechanism on strings. A pitfall is that two strings that are equal will sometimes, but not always, return true when applying the identity operator. The following code prints false:
String a="123";
String b="12";
b=b+"3";
System.out.println(a==b);

If you really want to make sure that a.equals(b) is true but a == b is false for two Strings a and b, then you can use the completely undervalued (^^) String constructor:
String a = new String("abc");
String b = new String("abc");
if (a.equals(b)) {
doTheyAreEqual();
if (a != b) {
doButNotTheSame();
}
}

Cannot convert type 'System.Enum' to int

(OK, I'll expose the depths of my ignorance here, please be gentle)
Background
I've got a method which looks (a bit) like this:
public void AddLink(Enum enumVal)
{
string identifier = m_EnumInterpreter(enumVal);
AddLink(identifier);
}
The EnumInterpreter is a Func<Enum, string> that is passed in when the parent class is created.
I'm using Enum because at this level it is 'none of my business'- I don't care which specific enum it is. The calling code just uses a (generated) enum to avoid magic strings.
Question
If the EnumInterpreter sends back an empty string, I'd like to throw an exception with the actual value of enumVal. I thought I would just be able to cast to int, but the compiler won't have it. What am I doing wrong? (Please don't say 'everything').

System.Enum cannot be directly cast to Integer, but it does explicitly implement IConvertible, meaning you can use the following:
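public void AddLink(Enum enumVal)
{
    string identifier = m_EnumInterpreter(enumVal);
    if (string.IsNullOrEmpty(identifier))
    {
        // Convert.ToInt32 works here because System.Enum implements IConvertible.
        throw new ArgumentException("EnumInterpreter returned an empty string for enumVal=" + Convert.ToInt32(enumVal));
    }
    AddLink(identifier);
}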
Keep in mind that if your enum's underlying type is something other than int (such as uint or long), the value may not fit in an Int32, in which case Convert.ToInt32 throws an OverflowException. In that case, replace the Convert call with the one matching the underlying type (if it's known).

No, you aren't able to cast it to an int because System.Enum is not an enum, it's just the base class for enums.
EDIT:
You can get the value as follows, but it is ugly:
int intVar = (int)enumVal.GetType().GetField("value__").GetValue(enumVal);

try this..
m_EnumInterpreter((int) (object) enumVal);

Various things here:
1) Ryan's answer looks OK, but... I would rather pass the Enum down to the enum interpreter, so that you can do the whole Convert.To... there. If you know that you are using ONLY int-based enums, Convert.ToInt32() is just fine. Otherwise you may want to handle other conversions, using either reflection or try/catch.
2) You may also consider using members of the Enum class, like Enum.GetName(), Enum.GetValues(), etc., since they deal directly with the defined names and values independent of the enum type.
3) Technically I would not throw the exception outside the enum interpreter. If that condition is generally true, throw the exception from inside the enum interpreter, so that all uses of the class will benefit from the validation. Otherwise you might end up duplicating code.
4) You seem to have a C++/MFC background, judging from your variable naming. You might want to get into C# style naming conventions; it will ease your life when using/reading other people's code and libraries. Check out MS's StyleCop for a good add-in to help with naming.
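As a sketch of points 1 and 2, the numeric value can be obtained without knowing the underlying type. The enum Sample and the helper names below are hypothetical; Enum.ToString("D") and Convert.ToInt64 are the only framework calls relied on:

```csharp
using System;

// Hypothetical enum with a uint underlying type and a value too large for int.
public enum Sample : uint { Big = 3720116125 }

public static class EnumValueDemo
{
    // The "D" format yields the numeric value as text for any underlying type.
    public static string NumericText(Enum value) => value.ToString("D");

    // Int64 covers every underlying type except large ulong values.
    public static long NumericValue(Enum value) => Convert.ToInt64(value);

    public static void Main()
    {
        Enum enumVal = Sample.Big;
        Console.WriteLine(NumericText(enumVal));    // 3720116125
        Console.WriteLine(NumericValue(enumVal));   // 3720116125
        Console.WriteLine(Enum.GetName(typeof(Sample), enumVal)); // Big
    }
}
```

For an exception message, the text form is usually enough, since it avoids choosing an integer type at all.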

I don't know whether to include this in my question, or as an answer. The problem is that it isn't THE answer, but it is the answer that works for me.
What I discovered, by chance while trying something else, is that if I just wodge it onto the end of a string, I get what I want:
throw new Exception("EnumInterpreter returns empty string for enumVal=" + enumVal);
//EnumInterpreter returns empty string for enumVal=3720116125
I actually simplified to int in my question, the real data type is uint (in this particular instance). Fortunately, given that I only actually wanted the string, I don't have to worry about that.
I'm not sure which of the three other answers is 'right', so vote away...

For me it was enough to cast to object first, since it's just a compilation error.
public static int AsInt(this Enum @this)
{
    return (int)(object)@this;
}

I understand that this is probably not the solution to your exact problem, but I just want to post how I solved this for a particular API I was using.
int result = (int) (ActualEnumType) MethodThatReturnsSystemEnumType( arg1, arg2 );
Hopefully that will be of help to someone. Double cast FTW.

Why not parse the enum to a string and return the actual enum value?
public enum MyEnum { Flower = 1, Tree = 2, Animal = 3 };
string name = MyEnum.Flower.ToString(); // returns "Flower"
As far as I know, only the ToString overloads that take an IFormatProvider are obsolete; the parameterless ToString() is still fine. I would've thought the actual enum representation would be more useful than the int?

C# Generics, Comparing 2 strings fail unless explicitly specified

I thought I'd seen it all, but this... :)
I was working on a generic graph of type string,
Graph<string> graph = new Graph<string>();
Graph is declared with a class constraint like this:
public class Graph<T> where T : class
Next I fill up the graph with some dynamically generated strings:
for (char t = 'A'; t < 'J'; t++)
{
GraphPrim.Add(t.ToString());
}
So far so good. (Node is an internal class containing the original value and a list of references to other nodes, because it's a graph.)
Now, when I try to create relations between the different nodes, I have to look up the right node by checking its value, and that's where the weirdness starts.
The following code is a direct copy of the results found in the Immediate window after doing some tests:
Nodes.First().Value
"A"
Nodes.First().Value == "A"
false
Nodes.First().Value.ToString() == "A"
true
Am I totally missing something, or shouldn't Nodes.First().Value == "A" use a string comparison method? (The JIT compiler has knowledge about the type being used at run time, and with that, its supported methods, right?) It seems to me that when a string is not explicitly specified, it will do a reference check rather than a string test.
It would be great if someone could explain this to me,
Thanks in advance!

If the types aren't fully known up front (i.e. Value is only known as T, and is not strictly known to be a string), use things like:
object.Equals(Nodes.First().Value,"A")
Of course, you could cast, but in this case you'd need a double-cast ((string)(object)) which is ugly.
If you know the two objects are the same type (i.e. two T values), then you can use:
EqualityComparer<T>.Default.Equals(x,y)
The advantage of the above is that it avoids boxing of structs and supports lifted Nullable<T> operators, and IEquatable<T> in addition to Equals.

If the Value property of your Nodes is object, the == operator in
Nodes.First().Value == "A"
will do a comparison by reference instead of comparing strings.

== is a static method and therefore not virtual. The selection of which == method to use is done at compile-time, not run-time. Depending on the compile-time type of the object, it is probably choosing the implementation of == for objects that compares by reference.
If you use the virtual Equals methods instead, this will work as you expect.
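The compile-time binding can be reproduced in isolation. In this sketch (all names hypothetical), the generic method only knows T is a class, so == compiles to a reference comparison, while EqualityComparer<T>.Default ends up calling string's own equality:

```csharp
using System;
using System.Collections.Generic;

public static class GenericEqualityDemo
{
    // With only "where T : class" known, == is bound to reference equality.
    public static bool ByOperator<T>(T x, T y) where T : class => x == y;

    // Resolved at run time to IEquatable<T>.Equals or the virtual Equals.
    public static bool ByComparer<T>(T x, T y) => EqualityComparer<T>.Default.Equals(x, y);

    public static void Main()
    {
        string literal = "Ab";
        string built = new string(new[] { 'A', 'b' });   // equal value, different object

        Console.WriteLine(ByOperator(literal, built));   // False
        Console.WriteLine(ByComparer(literal, built));   // True
        Console.WriteLine(literal == built);             // True (string's overload)
    }
}
```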
