I have started to learn LINQ recently. I came across a few built-in methods like Min() and Max().
These two methods work as expected with int[]. But when it comes to string[], I am curious how they work. I have tried some code:
string[] cars = { "Volvo", "BMW", "Ford", "Mazda" };
Console.WriteLine(cars.Max());
Console.WriteLine(cars.Min());
The output was like this:
**Volvo for Max()
BMW for Min()**
Can you please explain how this works? Is it taking the first letter in alphabetical order, or is it using some other mechanism, such as one based on ASCII values?
All types that implement the IComparable or IComparable<T> interface can be compared, using the CompareTo method implemented by each type. All primitive types implement IComparable<T>, including char, and so does string. LINQ's Min<T>(IEnumerable<T>) and Max<T>(IEnumerable<T>) use this implementation to find the minimum or maximum in an enumerable.
String Comparisons
Comparing strings though is a bit more interesting than comparing integers. The strings are typically compared in dictionary order (lexicographically) but ... whose dictionary? Different languages have different sorting rules, and sometimes two letters are considered a single one. Even Danes forget that AA is equivalent to Å in Danish.
The dictionary used to compare strings is provided by the CultureInfo class. By default, the current thread's culture is used which typically matches the culture of the end user (in desktop applications) or the system locale in server applications. In a Danish culture for example, AA is treated differently from aa - I think one of them is ordered after other letters of the same case and the other isn't, but don't ask me which.
The InvariantCulture specifies a locale-insensitive culture that can be used to handle strings the same way in every locale. It uses mostly sensible settings (eg . for the decimal point) except dates, where it uses the US format instead of the ISO8601 (YYYY-MM-DD) format as everyone would expect.
Custom comparisons
It's possible to specify a different comparison method by passing a class that implements IComparer<T> to any LINQ method affected by order. Min<T>(IEnumerable<T>, IComparer<T>) is one example.
The StringComparer class contains some predefined comparers:
CurrentCulture is the default
CurrentCultureIgnoreCase uses the current culture but ignores case, so A is equal to a. This is very useful eg in dictionaries.
InvariantCulture and InvariantCultureIgnoreCase use the Invariant culture for ordering
Finally, Ordinal and OrdinalIgnoreCase don't use a dictionary but compare the Unicode values of the characters. That's the fastest option if you don't care about locale rules
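As a rough illustration of how the choice of comparer changes the result, here is a minimal sketch. It assumes .NET 6 or later, where the Max/Min overloads taking an IComparer<T> are available. The ComparerDemo class and the lowercase "mazda" entry are made up for the example and are not from the question.

```csharp
using System;
using System.Linq;

public static class ComparerDemo
{
    // Hypothetical data: "mazda" is lowercase on purpose to show the effect of case.
    public static readonly string[] Cars = { "Volvo", "BMW", "Ford", "mazda" };

    public static void Main()
    {
        // Ordinal compares UTF-16 code units, so 'm' (109) sorts after 'V' (86).
        Console.WriteLine(Cars.Max(StringComparer.Ordinal));           // mazda

        // Ignoring case, "Volvo" is last again.
        Console.WriteLine(Cars.Max(StringComparer.OrdinalIgnoreCase)); // Volvo

        // 'B' (66) is the smallest first character either way.
        Console.WriteLine(Cars.Min(StringComparer.Ordinal));           // BMW
    }
}
```

On older frameworks, cars.OrderBy(c => c, comparer).Last() gives the same answer at the cost of a full sort.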
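.Max() uses Comparer<TSource>.Default, which for strings performs the same culture-sensitive comparison as Compare(String, String).
That method compares two specified String objects and returns an integer that indicates their relative position in the sort order.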
Source code of .Max() for string compare
public static TSource Max<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    Comparer<TSource> comparer = Comparer<TSource>.Default;
    TSource value = default(TSource);
    if (value == null) {
        foreach (TSource x in source) {
            if (x != null && (value == null || comparer.Compare(x, value) > 0))
                value = x;
        }
        return value;
    }
    else {
        bool hasValue = false;
        foreach (TSource x in source) {
            if (hasValue) {
                if (comparer.Compare(x, value) > 0) //Compare strings
                    value = x;
            }
            else {
                value = x;
                hasValue = true;
            }
        }
        if (hasValue) return value;
        throw Error.NoElements();
    }
}
Compare(String, String)
https://learn.microsoft.com/en-us/dotnet/api/system.string.compare?view=net-6.0#system-string-compare(system-string-system-string)
Strings are compared alphabetically, which translates into the following logic:
which item has a higher first character
which item has a higher second character
which item has a higher third character
...
The difference is detected at the first character that differs. So, when you compare string1 with string2, then string1 is greater than string2 if at their first differing character (from left to right), string1 has a greater value at that position than string2. If string2 is string1 + string3, then the first difference between string1 and string2 is beyond the end of string1 and at that point the comparison yields that string2 is greater than string1.
If you are dissatisfied with this comparison, then you can specify what comparer you intend to use, see here: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.max?view=net-6.0#system-linq-enumerable-max-1(system-collections-generic-ienumerable((-0))-system-collections-generic-icomparer((-0)))
Basically, in that case you need to implement an IComparer<string> and pass it as the second parameter, like
cars.Max(yourComparer)
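A custom comparer could look something like the following sketch. LengthComparer is a hypothetical class invented for this example (it orders by length first, then ordinally), "Mercedes" is added to the list only to make the difference visible, and the Max overload that accepts a comparer is assumed to be available (.NET 6 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: shorter strings sort first, ties are broken ordinally.
public sealed class LengthComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return -1;   // null sorts before everything else
        if (y is null) return 1;
        int byLength = x.Length.CompareTo(y.Length);
        return byLength != 0 ? byLength : string.CompareOrdinal(x, y);
    }
}

public static class LengthComparerDemo
{
    public static void Main()
    {
        string[] cars = { "Volvo", "BMW", "Ford", "Mazda", "Mercedes" };

        Console.WriteLine(cars.Max(new LengthComparer())); // Mercedes (longest)
        Console.WriteLine(cars.Min(new LengthComparer())); // BMW (shortest)
    }
}
```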
With a list of strings, Min() and Max() return the first and last item in alphabetical order respectively, whereas with integers they return the numerically smallest and largest values in the list.
Please check this image and its output
In my app I need to parse a string like this, ".Add(20).Subtract(10).Add(2)" in a generic way into a series of method calls. In code I will supply the user with a value of T, and then expect the user to type an expression of the above format to calculate a new T from the expression. In the above example, I show the user an int and they typed the above string.
I need to aggregate any number of these chained string-representation of method calls into one cache-able property (delegate? Func<T,T>?) so that whenever a new value of T comes along it can be passed through the cached expression.
I initially thought there would be a way to aggregate these like a functional-programming pipeline, the outcome being a Func<T,T> that could represent the pipeline of methods. I'm guaranteed to know typeof(T) beforehand.
I'm hitting issues. Here's where I'm at:
I can regex the string with
\.(?<expName>[A-Z,a-z]+)\((?<expValue>[^)]+)\)
To get these matches:
| expName | expValue |
|---|---|
| "Add" | "20" |
| "Subtract" | "10" |
| "Add" | "2" |
I was expecting to use a TypeConverter to parse all expValue matches but I realized that given an arbitrary method T Foo(object arg) the arg can be any type to be determined by the specific method. The only guarantee is that a T input should always result in a T output.
We already know what type T is so we can theoretically map typeof(T) to a set of strings representing method names. I tried creating Dictionaries like this:
public static readonly Dictionary<string, Func<double, double, double>> DoubleMethods = new Dictionary<string, Func<double, double, double>>()
{
{"Add",(d,v)=>d+v },
{"Subtract",(d,v)=>d-v },
{"Multiply",(d,v)=>d*v },
{"Divide",(d,v)=>d/v }
};
public static Dictionary<string, Func<T, T, T>> TypeMethods<T>(Type t)
{
if(t.GetType() == typeof(double)) { return DoubleMethods; }
}
This won't compile, as I can't mix generics like this.
How do I create a linking structure that maps strings of predefined method names to a method, and then pass it the arg?
I also see that I will incur a bunch of boxing/unboxing penalties for arguments that happen to be primitive types, as in the example int.Add(int addedVal) method.
I believe I'm delving into parser/lexer territory without much familiarity.
Can you give an example of some code to point me in the right direction?
I'm not sure I see the need for the generics part:
var ops = "Add(20).Subtract(10).Divide(2).Multiply(5)";//25
var res = ops
.Split("().".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.Chunk(2)
.Aggregate(0.0,(r,arr)=> r = DoubleMethods[arr[0]](r, double.Parse(arr[1])));
All those inputs parse as double, so let's just break the input string into chunks of 2 after splitting on the punctuation:
Add 20
Subtract 10
Divide 2
Multiply 5
Then run an agg op where we start from 0 (I wanted to start from 20 actually so that is what the add 20 is for)
The agg op looks up the method to call in the dictionary using the first element of the chunk
DoubleMethods[arr[0]]
And calls it passing in the current accumulator value r and the double parsing of the second element of the chunk:
(r, double.Parse(arr[1]))
and store the result into the accumulator for passing into the next op
I commented "do it in decimal" because it doesn't have floating point imprecision, but I used double just because your code did; you could swap to using decimal if you like, main point being that I can't see why you're worried about generics when decimal/double can store values one would encounter in ints too.
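If the cache-able Func<T,T> from the question is still wanted, the same dictionary lookup can be composed into one delegate up front and reused for every new value. The following is only a sketch for T = double: Pipeline and Compile are hypothetical names, the regex is the one from the question with the stray comma removed from the character class, and invariant-culture number parsing is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class Pipeline
{
    // Same operations as the DoubleMethods dictionary in the question.
    private static readonly Dictionary<string, Func<double, double, double>> DoubleMethods =
        new Dictionary<string, Func<double, double, double>>
        {
            { "Add", (d, v) => d + v },
            { "Subtract", (d, v) => d - v },
            { "Multiply", (d, v) => d * v },
            { "Divide", (d, v) => d / v }
        };

    private static readonly Regex Call =
        new Regex(@"\.(?<expName>[A-Za-z]+)\((?<expValue>[^)]+)\)");

    // Builds one delegate that applies every call in the expression, left to right.
    public static Func<double, double> Compile(string expression)
    {
        Func<double, double> pipeline = x => x;
        foreach (Match m in Call.Matches(expression))
        {
            // Throws KeyNotFoundException for an unknown method name.
            Func<double, double, double> op = DoubleMethods[m.Groups["expName"].Value];
            double arg = double.Parse(m.Groups["expValue"].Value, CultureInfo.InvariantCulture);
            Func<double, double> previous = pipeline;   // capture the chain built so far
            pipeline = x => op(previous(x), arg);
        }
        return pipeline;
    }

    public static void Main()
    {
        Func<double, double> f = Compile(".Add(20).Subtract(10).Add(2)");
        Console.WriteLine(f(5));   // 17
        Console.WriteLine(f(0));   // 12
    }
}
```

The string is parsed only once; each later call to f just runs the composed delegates, so the result of Compile can be cached per expression.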
//baseValue is the value you're applying the change to.
So, let's pretend the class is called Number:
Number etc = new Number(startingAmount)
In Number we would have this.baseValue = startingAmount
public static T add(T baseValue, T change){
return baseValue+change;
}
public static T subtract(T baseValue, T change){
return baseValue-change;
}
public static T multiply(T baseValue, T change){
return baseValue*change;
}
public static T divide(T baseValue, T change){
return baseValue/change;
}
This should work... At least I hope it does
Here is a video on generics https://www.youtube.com/watch?v=K1iu1kXkVoA&t=1s
Java == c# so everything should be almost exactly the same
In part of my application I have an option that displays a list of albums by the current artist that aren't in the music library. To get this I call a music API to get the list of all albums by that artist and then I remove the albums that are in the current library.
To cope with the different casing of names and the possibility of missing (or extra) punctuation in the title, I have written an IEqualityComparer to use in the .Except call:
var missingAlbums = allAlbums.Except(ownedAlbums, new NameComparer());
This is the Equals method:
public bool Equals(string x, string y)
{
// Check whether the compared objects reference the same data.
if (ReferenceEquals(x, y)) return true;
// Check whether any of the compared objects is null.
if (x is null || y is null)
return false;
return string.Compare(x, y, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) == 0;
}
This is the GetHashCode method:
public int GetHashCode(string obj)
{
// Check whether the object is null
if (obj is null) return 0;
// Make lower case. How do I strip symbols?
return obj.ToLower().GetHashCode();
}
This fails, of course, when the string contains symbols as I'm not removing them before getting the hash code so the two strings (e.g. "Baa, baa, black sheep" and "Baa baa Black sheep") are still not equal even after converting to lower case.
I have written a method that will strip the symbols, but that meant I had to guess what those symbols actually are. It works for the cases I've tried so far, but I'm expecting it to fail eventually. I'd like a more reliable method of removing the symbols.
Given that the CompareOptions.IgnoreSymbols exists, is there a method I can call that will strip these characters from a string? Or failing that, a method that will return all the symbols?
I have found the IsPunctuation method for characters, but I can't determine whether what this deems to be punctuation is the same as what the string compare option deems to be a symbol.
If you're going to use the CompareOptions enum, I feel like you might as well use it with the CompareInfo class that it's documented as being designed for:
Defines the string comparison options to use with CompareInfo.
Then you can just use the GetHashCode(string, CompareOptions) method from that class (and even the Compare(string, string, CompareOptions) method if you like).
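Put together, the comparer might look roughly like this. It is a sketch rather than a drop-in replacement: it uses CultureInfo.InvariantCulture (the question used the current culture) so that hash codes do not depend on the thread's culture, and the album titles in Main are just the example strings from the question plus one invented title.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public sealed class NameComparer : IEqualityComparer<string>
{
    private const CompareOptions Options =
        CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols;

    // Assumption: invariant culture, so Equals and GetHashCode always agree.
    private static readonly CompareInfo Info = CultureInfo.InvariantCulture.CompareInfo;

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return Info.Compare(x, y, Options) == 0;
    }

    public int GetHashCode(string obj)
    {
        // The same options as Equals, so equal strings produce equal hash codes.
        return obj is null ? 0 : Info.GetHashCode(obj, Options);
    }
}

public static class NameComparerDemo
{
    public static void Main()
    {
        var allAlbums = new[] { "Baa, baa, black sheep", "Some Other Album" };
        var ownedAlbums = new[] { "Baa baa Black sheep" };

        var missingAlbums = allAlbums.Except(ownedAlbums, new NameComparer());
        Console.WriteLine(string.Join("; ", missingAlbums));
    }
}
```

Because both methods go through the same CompareInfo with the same options, there is no need to strip symbols by hand.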
I'll keep this one short. I'm writing a module which will be required to compare two large integers which are input as strings (note: they are large, but not large enough to exceed Int64 bounds).
The strings are padded, so the choice is between taking the extra step of converting them to their integer equivalent or comparing them as strings.
What I'm doing is converting each of them to Int64 and comparing them that way. However, I believe that string comparisons would also work. Seeing as I'd like it to be as efficient as possible, what are your opinions on comparing integers via:
string integer1 = "123";
string integer2 = "456";
if (Int64.Parse(integer1) <= Int64.Parse(integer2))
OR
string integer1 = "123";
string integer2 = "456";
if (integer1.CompareTo(integer2) < 0)
It is better to use Int64.TryParse, since these are string fields:
string integer1 = "123";
string integer2 = "456";
long value1=0;
long value2=0;
long.TryParse(integer1 ,out value1);
long.TryParse(integer2 ,out value2);
if(value1<=value2)
No, string comparison will not work in general. You should use your first version: convert the strings to numbers by parsing them, and then compare the numbers.
It would be good to have a look here, which explains thoroughly what the CompareTo method does. In a few words:
Compares the current instance with another object of the same type and returns an integer that indicates whether the current instance precedes, follows, or occurs in the same position in the sort order as the other object.
So since "123" and "456" are strings, they compare one string to another and not the one integer to the other.
Last but not least, it would be better to use the TryParse method for parsing your numbers, since your input may accidentally not be an integer. The way you use it is fairly easy:
Int64 value1 = 0;
Int64.TryParse(integer1, out value1);
Here value1 is the value you get after converting the string integer1. So for both of your values, you should use this if statement:
if (Int64.TryParse(integer1, out value1) && Int64.TryParse(integer2, out value2))
{
    if (value1 <= value2)
    {
        // integer1 is less than or equal to integer2.
    }
    else
    {
        // integer1 is greater than integer2.
    }
}
else
{
    // At least one of the two conversions failed.
}
It's fair to question whether the conversion (parse) is worth the cost. If String.CompareTo were really efficient AND the numbers were always of a scale and format* such that string comparison is reliable, then you might be better off. You could measure the performance, but you'll likely find that converting and comparing integers is faster and more robust than a string comparison.
*String comparison works if the number strings are of equal length, with leading 0s as necessary. So '003', '020', and '100' will sort correctly but '3', '20', and '100' will not.
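The padding caveat is easy to check with a small sketch. PaddedCompareDemo and its two helper methods are hypothetical names; ordinal comparison and invariant-culture parsing are assumptions chosen so that the results do not depend on the current culture.

```csharp
using System;
using System.Globalization;

public static class PaddedCompareDemo
{
    // Compares the raw text, character by character.
    public static bool LessOrEqualAsText(string a, string b)
    {
        return string.CompareOrdinal(a, b) <= 0;
    }

    // Parses both strings and compares the resulting numbers.
    public static bool LessOrEqualAsNumber(string a, string b)
    {
        return long.Parse(a, CultureInfo.InvariantCulture)
            <= long.Parse(b, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(LessOrEqualAsText("003", "020"));  // True: equal length, zero padded
        Console.WriteLine(LessOrEqualAsText("3", "20"));     // False: '3' sorts after '2'
        Console.WriteLine(LessOrEqualAsNumber("3", "20"));   // True
    }
}
```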
I have a simple enum:
enum E
{
FullNameForA = 1,
A = 1,
FullNameForB = 2,
B = 2
}
The goal is to be able to use different string values for the same integral values with a twist - FullNameFor* must be used as default. In other words, a user can provide E.A as an input but the code should use E.FullNameForA for output.
It seems like by default C# will use alphabetical ordering of elements with the same integral value, which makes my goal harder. Is that right? Any ideas how to overcome this?
"It seems like by default C# will use alphabetical ordering of elements with the same integral value"
In what context? When you convert the value back to a string? From the docs of Enum.ToString:
If multiple enumeration members have the same underlying value and you attempt to retrieve the string representation of an enumeration member's name based on its underlying value, your code should not make any assumptions about which name the method will return.
(Note that the decision is in the BCL - it's not a language decision.)
I suggest that if you want a canonical string representation for each value, you create a Dictionary<E, string> and consult that rather than calling ToString().
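A minimal sketch of that dictionary approach, using the enum from the question. EnumNames and ToCanonicalName are hypothetical names introduced for the example. Since E.A and E.FullNameForA are the same value, one entry per underlying value is enough.

```csharp
using System;
using System.Collections.Generic;

public enum E
{
    FullNameForA = 1,
    A = 1,
    FullNameForB = 2,
    B = 2
}

public static class EnumNames
{
    // One canonical name per underlying value.
    private static readonly Dictionary<E, string> Canonical = new Dictionary<E, string>
    {
        { E.FullNameForA, nameof(E.FullNameForA) },
        { E.FullNameForB, nameof(E.FullNameForB) }
    };

    public static string ToCanonicalName(E value)
    {
        // Fall back to ToString() for values without a canonical entry.
        return Canonical.TryGetValue(value, out string name) ? name : value.ToString();
    }

    public static void Main()
    {
        E input = (E)Enum.Parse(typeof(E), "A");      // user typed the short name
        Console.WriteLine(ToCanonicalName(input));    // FullNameForA
        Console.WriteLine(ToCanonicalName(E.B));      // FullNameForB
    }
}
```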
Consider this alternative solution. You can decorate enum values with the DescriptionAttribute to give them a more human-friendly name:
enum E
{
[System.ComponentModel.Description("FullNameForA")]
A = 1
}
Then you can extract the value of that attribute like so:
// Must be declared in a static class; DescriptionAttribute requires "using System.ComponentModel;".
public static string AsString(this Enum value)
{
    var type = value.GetType();
    if (!type.IsEnum)
        throw new ArgumentException();
    var fieldInfo = type.GetField(value.ToString());
    if (fieldInfo == null)
        return value.ToString();
    var attribs = fieldInfo.GetCustomAttributes(typeof(DescriptionAttribute), false) as DescriptionAttribute[];
    return attribs.Length > 0 ? attribs[0].Description : value.ToString();
}
This of course isn't the best performing solution because it relies on reflection.
I have the following variables:
string str1 = "1";
string str2 = "asd";
string str3 = "3.5";
string str4 = "a";
Now I need to find the data type of each string i.e. the data type to which it can be converted if quotes are removed. Here is what I would like each variable to convert to:
str1 - integer
str2 - string
str3 - double
str4 - char
Note: if the string has single character it should be char, though a string can have single letter, I'm limiting it.
FYI: these values are obtained from a DataGrid where I manually entered values. So everything is becoming a string.
Is there any way to do this?
Of course, there's no definite way to do this, but if you create a list of data types you want to check ordered by priority, then something like this may do the trick.
object ParseString(string str)
{
    int intValue;
    double doubleValue;
    char charValue;
    bool boolValue;
    // Place a check higher in the if-else chain to give that type higher priority.
    if (int.TryParse(str, out intValue))
        return intValue;
    else if (double.TryParse(str, out doubleValue))
        return doubleValue;
    else if (char.TryParse(str, out charValue))
        return charValue;
    else if (bool.TryParse(str, out boolValue))
        return boolValue;
    // None of the types matched; treat the input as a plain string.
    return null;
}
Just call this function on each string, and you should have the appropriate type of object returned. A simple type check can then tell you how the string was parsed.
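For the four strings in the question, the priority-ordered approach could be exercised like this. This is a variant sketch, not the code above verbatim: TypeGuesser is a hypothetical class, the parse uses the invariant culture (an assumption, so that "3.5" parses regardless of regional settings), and it returns the original string instead of null when nothing else matches.

```csharp
using System;
using System.Globalization;

public static class TypeGuesser
{
    // Tries the candidate types in priority order; falls back to the string itself.
    public static object ParseString(string str)
    {
        if (int.TryParse(str, NumberStyles.Integer, CultureInfo.InvariantCulture, out int i))
            return i;
        if (double.TryParse(str, NumberStyles.Float, CultureInfo.InvariantCulture, out double d))
            return d;
        if (char.TryParse(str, out char c))   // succeeds only for single-character strings
            return c;
        return str;
    }

    public static void Main()
    {
        foreach (string s in new[] { "1", "asd", "3.5", "a" })
        {
            Console.WriteLine(s + " -> " + ParseString(s).GetType().Name);
        }
        // 1 -> Int32
        // asd -> String
        // 3.5 -> Double
        // a -> Char
    }
}
```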
Use meta-data, if you can
Having to guess what the data types are is not a good idea.
Two things:
1. Where is the data coming from?
If it's a database, are you sure they're strings?
If it is a database, there should be some metadata returned that will tell you what the data types of the fields are.
If it's an XML file, is there a schema defined that will give you the types?
2. If you have to continue to guess:
Be aware that you can have strings that happen to be numbers but are perfectly valid strings, e.g. phone numbers and bank account numbers, which are best expressed as strings.
Also, these numbers can have many digits; if you convert them to doubles you may lose some digits to floating-point inaccuracies (you should be OK up to 14 or 15 digits).
I'm sure by now, since I've taken my time typing this, there are lots of answers telling you how to do this (i.e. TryParse int first, then double, then test the length for char, and if not then it's a string, etc.). But if I were you, I'd try NOT to do that, and see if there's any way you can get, or pass, some metadata that will tell you what type it IS and not just what type it might be.
Use the TryParse method of each type.
There is no built-in way to do this. You could attempt TryParse on number types with increasing precision, but it wouldn't be guaranteed to be right.
Your best bet would be to process it like you would manually, i.e. is there a decimal place? No, then it's an integer. How big? Is it negative?
The data type of each of these items is string. If you want to attempt to parse them into different types you can use Int32.TryParse, Double.TryParse, etc. Or you can use Regex:
bool isInt = new Regex(@"^\d+$").IsMatch(str);
bool isDouble = !(isInt) && new Regex(@"^\d+\.\d+$").IsMatch(str);
bool isChar = !(isInt || isDouble) && new Regex(@"^.$").IsMatch(str);
bool isString = !(isInt || isDouble || isChar);