What's the meaning of Array[string[i] - 'a'] in C#? [duplicate] - c#

This question already has answers here:
Java: Subtract '0' from char to get an int... why does this work?
(10 answers)
How does subtracting the character '0' from a char change it into an int?
(4 answers)
Closed 8 years ago.
I’m learning Java from "Introduction to Java Programming, 9th Edition" by Daniel Liang. In chapter 9, "Strings", I encountered this piece of code:
public static int hexCharToDecimal(char ch) {
    if (ch >= 'A' && ch <= 'F')
        return 10 + ch - 'A';
    else
        return ch - '0';
}
Can someone explain what just happened here? How is it possible to add or subtract chars and integers, and what is the meaning behind it?

From the Docs
The char data type is a single 16-bit Unicode character.
A char is represented by its code point value:
min: '\u0000' (or 0)
max: '\uffff' (or 65,535)
You can see all of the English alphabetic code points on an ASCII table.
Note that 0 == \u0000 and 65,535 == \uffff, as well as everything in between. They are corresponding values.
A char is actually just stored as a number (its code point value). We have syntax to represent characters like char c = 'A';, but it's equivalent to char c = 65; and 'A' == 65 is true.
So in your code, the chars are being represented by their decimal values to do arithmetic (whole numbers from 0 to 65,535).
For example, the char 'A' is represented by its code point 65 (decimal value in ASCII table):
System.out.print('A'); // prints A
System.out.print((int)('A')); // prints 65 because you cast it to an int
As a note, a short is a 16-bit signed integer, so even though a char is also 16-bits, the maximum integer value of a char (65,535) exceeds the maximum integer value of a short (32,767). Therefore, a cast to (short) from a char cannot always work. And the minimum integer value of a char is 0, whereas the minimum integer value of a short is -32,768.
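To make that range point concrete, here is a small hypothetical snippet in C# (the analogous Java cast behaves the same way); casting a char whose code point exceeds the short range wraps around:
char c = '\uffff';                   // code point 65,535, the char maximum
short s = unchecked((short)c);       // does not fit: short only goes up to 32,767
Console.WriteLine((int)c);           // 65535
Console.WriteLine(s);                // -1, because the value wrapped around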
For your code, let's say that the char was 'D'. Note that 'D' == 68 since its code point is 68.
return 10 + ch - 'A';
This returns 10 + 68 - 65, so it will return 13.
Now let's say the char was 'Q' == 81.
if (ch >= 'A' && ch <= 'F')
This is false since 'Q' > 'F' (81 > 70), so it would go into the else block and execute:
return ch - '0';
This returns 81 - 48 so it will return 33.
Your function returns an int type, but if it were to instead return a char or have the int casted to a char afterward, then the value 33 returned would represent the '!' character, since 33 is its code point value. Look up the character in ASCII table or Unicode table to verify that '!' == 33 (compare decimal values).

This is because char is a primitive type which can be used as a numerical value. Every character in a string is encoded as a specific number (not entirely true in all cases, but good enough for a basic understanding of the matter) and Java allows you to use chars in such a way.
It probably allows this mostly for historical reasons: this is how it worked in C, and it was presumably justified with "performance" or something like that.
If you think it's weird, then don't worry, I think so too.
The other answer is actually incorrect. ASCII is a specific encoding (an encoding is a specification that maps numbers to characters, e.g. 65 = 'A', 32 = space), and it is not the one used in Java. A Java char is two bytes wide and is interpreted using the UTF-16 Unicode encoding.

Chars are in turn stored as integers (their character code values), so you can perform addition and subtraction on them as integers, which yields the numeric value of the resulting character.

Regardless of how Java actually stores the char datatype, what's certain is this: the character 'A' subtracted from the character 'A' is represented as the null character, \0. In memory, this means every bit is 0. The size a char takes up in memory may vary from language to language, but as far as I know, the null character is the same in all of them: every bit equal to 0.
As an int value, a piece of memory with every bit equal to 0 represents the integer value of 0.
And as it turns out, when you do "character math", subtracting any alphabetical character from any other alphabetical character (of the same case) results in bits being flipped in such a way that, if you interpret them as an int, they represent the distance between those characters. Additionally, subtracting the char '0' from any other numeric char results in the int value of the digit that char represents, for basically the same reason.
'A' - 'A' = '\0'
'a' - 'a' = '\0'
'0' - '0' = '\0'
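Tying this back to the title, the Array[string[i] - 'a'] pattern uses exactly this arithmetic to turn a lowercase letter into an array index. A minimal C# sketch (the names are illustrative, and it assumes the string contains only lowercase ASCII letters):
string s = "hello";
int[] counts = new int[26];              // one slot per lowercase letter 'a'..'z'
for (int i = 0; i < s.Length; i++)
    counts[s[i] - 'a']++;                // 'h' - 'a' == 7, so "hello" increments counts[7], and so on
Console.WriteLine(counts['l' - 'a']);    // 2, because "hello" contains two 'l' characters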

Related

representing a hexadecimal value by converting it to char

So I am outputting the value 0x11a1 by converting it to char.
Then I multiply 0x11a1 by itself and output it again, but I do not get what I expect. By doing this {int hgvvv = chch0;} and outputting it to the console, I can see that the computer thinks that 0x11a1 * 0x11a1 equals 51009, but it actually equals 20367169.
As a result I do not get what I want.
Could you please explain to me why?
char chch0 = (char)0x11a1;
Console.WriteLine(chch0);
chch0 = (char)(chch0 * chch0);
Console.WriteLine(chch0);
int hgvvv = chch0;
Console.WriteLine(hgvvv);
We know that 1 byte is 8 bits.
We know that a char in C# is 2 bytes, which would be 16 bits.
If we multiply 0x11a1 by 0x11a1 we get 0x136c741.
0x136c741 in binary is 0001001101101100011101000001
Considering we only have 16 bits, we would only see the last 16 bits, which are: 1100011101000001
1100011101000001 in hex is 0xc741.
This is the 51009 that you are seeing.
You are being limited by the size of the char type in C#.
Hope this answer cleared things up!
By enabling the checked context in your project or by adding it this way in your code:
checked {
char chch0 = (char)0x11a1;
Console.WriteLine(chch0);
chch0 = (char)(chch0 * chch0); // OverflowException
Console.WriteLine(chch0);
int hgvvv = chch0;
Console.WriteLine(hgvvv);
}
You will see that you get an OverflowException, because the char type (2 bytes wide) can only store values up to Char.MaxValue = 0xFFFF.
The value you expect (20367169) is larger than 0xFFFF, and you basically get only the two least significant bytes the type was able to store. Which is:
Console.WriteLine(20367169 & 0xFFFF);
// prints: 51009
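As an aside, if the goal were the full product rather than a char, one option (a sketch, not part of the original question) is to store the result in an int; the char operands are already promoted to int for the multiplication, so nothing is lost until you cast back to char:
char chch0 = (char)0x11a1;
int product = chch0 * chch0;     // char operands are promoted to int, so the multiplication itself does not overflow
Console.WriteLine(product);      // prints: 20367169 (0x136C741)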

Difficulty summing items from a large string of numbers using foreach [duplicate]

This question already has answers here:
Convert char to int in C#
(20 answers)
Closed 4 years ago.
I have been trying one of the Project Euler challenges but I have gotten stuck with an annoying problem.
double sum = 0;
string numbers = "3710728753390210279...."
foreach (int item in numbers) sum += item;
Console.WriteLine(sum);
Console.ReadLine();
When I run this code it doesn't split each number how I expect it to; e.g. the first number, 3, will instead be 51, and the second number, 7, will be 55. I don't understand where it gets these numbers from.
Thanks in advance.
The other answers here haven't explained why you are seeing those unexpected numbers.
I think you are probably expecting the loop foreach (int item in numbers) to loop through the individual "numbers" in the string and automatically cast these numbers to integers. That's not what's happening (well, it is, but not how you expect).
The foreach loop treats the string as an IEnumerable<char> and iterates through each char in the string, starting with '3', '7', '1', ....
In .NET, characters and strings are encoded as Unicode UTF-16 (as #TomBlodget pointed out in the comments). This means that each char can be converted to its character code unit. Your code is actually summing the character code units.
In C# the code units for the characters '0', '1', ..., '9' are in the range 48, ..., 57. For this reason you can do something like #Yeldar's answer:
foreach (char item in numbers)
sum += item - '0'; // if item == '9' this is equivalent to 57 - 48 = 9
So, if the string only contains digits, then subtracting the '0' character implicitly converts the char to its int counterpart, and you end up with the actual numerical value it represents (i.e. '7' - '0' => 55 - 48 = 7).
The other answers here provide solutions to overcome the issue. I thought it would be useful to explain why it was happening.
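A small sketch of what the loop actually sees, which makes the 51 and 55 from the question obvious:
string numbers = "371";
foreach (int item in numbers)
    Console.Write(item + " ");   // prints 51 55 49 - the UTF-16 code units of '3', '7', '1'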
If you know that the string only contains numerals then this works:
string numbers = "3710728753390210279";
int sum = numbers.Sum(x => x - '0');
If you're not sure it contains only numerals then this will filter out non-numerals:
int sum = numbers.Where(char.IsDigit).Sum(x => x - '0');
If you are sure that there are only digits in a string, then you can actually subtract the value of char '0', which will do the magic for you:
int sum = 0;
string numbers = "371";
foreach (char item in numbers)
sum += item - '0';
Console.WriteLine(sum);
Please note that char is used in the foreach, so there is no implicit cast to int. Also, int is used for sum instead of double, because you don't actually need a floating-point number here.
This solution is a little more verbose but it will let you see clearly what is happening and will also exclude any non-numeric characters that may pop up in the string:
double sum = 0;
string numbers = "3710728753390210279";
foreach (char item in numbers)
{
    int intVal;
    if (int.TryParse(item.ToString(), out intVal))
        sum += intVal;
}
Console.WriteLine(sum);

Find the minimum of a number sequence with negative numbers with Linq

The following code doesn't give the expected minimum -1. Instead I get 0.
Do you know why?
class MainClass
{
    public static void Main (string[] args)
    {
        string numbers = "-1 0 1 2 3 4 5";
        Console.WriteLine (numbers.Split (' ')[0]); // output: -1
        string max = numbers.Split(' ').Max();
        string min = numbers.Split(' ').Min();
        Console.WriteLine("{0} {1}", max, min); // output: "5 0"
    }
}
I haven't fully worked out an answer yet, but it appears to be because the - isn't accounted for. You can confirm this with CompareOrdinal:
Console.WriteLine(String.CompareOrdinal("-1", "0")); // -3 meaning -1 min
Console.WriteLine(String.Compare("-1", "0")); // 1 meaning 0 min
Either way, you are trying to compare numbers, so you should treat them as numbers so that subtleties like this don't appear.
Attempted explanation...
String implements IComparable<string>, so Enumerable.Min uses that implementation (see remarks), which in turn uses CompareTo.
Now, in the notes for this method:
Character sets include ignorable characters. The CompareTo(String) method does not consider such characters when it performs a culture-sensitive comparison. For example, if the following code is run on the .NET Framework 4 or later, a comparison of "animal" with "ani-mal" (using a soft hyphen, or U+00AD) indicates that the two strings are equivalent.
(Emphasis mine)
As you can see, the - is ignored; hence "0", which has a smaller value in the ASCII table, is used for the comparison.
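A short sketch reproducing the documented soft-hyphen example (results can vary with the runtime and culture, so treat the exact outputs as illustrative):
// "ani\u00ADmal" contains a soft hyphen (U+00AD), an "ignorable" character
Console.WriteLine(string.Compare("animal", "ani\u00ADmal"));                           // 0: culture-sensitive comparison ignores it
Console.WriteLine(string.Compare("animal", "ani\u00ADmal", StringComparison.Ordinal)); // non-zero: ordinal comparison counts every code unit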
It's a string, so getting the max of strings is totally different from getting the max of numbers. For instance, if you had an array like the one below:
string[] someStringArray = new string[] { "1", "12", "2" };
calling Max() on this array would give "2", because "2" is "higher" in string order than "12".
Thinking about the Max/Min value of strings/chars, you need to think about alphabetical order. If you have a collection of the letters A-Z, calling Min() will return A and calling Max() will return Z.
To get Max/Min in numerical order you need to convert to some number type, like int.
See below:
string numbers = "-1 0 1 2 3 4 5";
int min = numbers.Split(' ').Select(x => int.Parse(x)).Min();
Console.WriteLine(min); // prints -1
There are two reasons for this behaviour:
You are sorting strings instead of numbers. This means that behind the scenes, Linq is using String.CompareTo() to compare the strings.
String.CompareTo() has special behaviour for -, which it treats as a HYPHEN and not a MINUS. (Note: This hyphen should not be confused with a soft hyphen which has the character code U00AD.)
Consider this code:
Console.WriteLine("-1".CompareTo("0")); // 1
Console.WriteLine("-1".CompareTo("1")); // 1
Console.WriteLine("-1".CompareTo("2")); // -1
Notice how, counter-intuitively, the "-1" is AFTER "0" and "1" but BEFORE "2".
This explains why when ordering the strings, the "-1" is neither the max nor the min.
Also see the answer to this question for more details.
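To follow the advice above and compare them as numbers, a minimal sketch (it requires using System.Linq):
string numbers = "-1 0 1 2 3 4 5";
int[] values = numbers.Split(' ').Select(int.Parse).ToArray();
Console.WriteLine("{0} {1}", values.Max(), values.Min()); // output: "5 -1"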

How to (theoretically) print all possible double precision numbers in C#?

For a little personal research project I want to generate a string list of all possible values a double precision floating point number can have.
I've found the "r" formatting option, which guarantees that the string can be parsed back into the exact same bit representation:
string s = myDouble.ToString("r");
But how to generate all possible bit combinations? Preferably ordered by value.
Maybe using the unchecked keyword somehow?
unchecked
{
    // for all long values
    myDouble[i] = myLong++;
}
Disclaimer: It's more a theoretical question, I am not going to read all the numbers... :)
using unsafe code:
ulong i = 0; // ulong is 64 bits wide, like double
unsafe
{
    double* d = (double*)&i;
    for (; i < ulong.MaxValue; i++)
        Console.WriteLine(*d);
}
You can start with all possible values 0 <= x < 1. You can create those by using zero for the exponent and different values for the mantissa.
The mantissa is stored in 52 bits of the 64 bits that make up a double precision number, so that makes for 2 ^ 52 = 4503599627370496 different numbers between 0 and 1.
From the description of the double format you can figure out how the bit pattern (eight bytes) should look for those numbers; then you can use the BitConverter.ToDouble method to do the conversion.
Then you can set the sign bit to make the negative version of all those numbers.
All those numbers are unique; beyond that you will start getting duplicate values, because there are several ways to express the same value when the exponent is non-zero. For each new non-zero exponent you get the values that were not possible to express with the previously used exponents.
The values between 0 and 1 will keep you busy for the foreseeable future, however, so you can just start with those.
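If you want to experiment without unsafe code, here is a small sketch along those lines, using BitConverter.Int64BitsToDouble to reinterpret a 64-bit pattern as a double (it only walks a handful of patterns, since 2^52 values is far too many to print):
// Zero exponent, small mantissas: the smallest positive (subnormal) doubles
for (long bits = 1; bits <= 5; bits++)
{
    double d = BitConverter.Int64BitsToDouble(bits);   // reinterpret the bit pattern as a double
    Console.WriteLine(d.ToString("r"));                // "r" keeps the text round-trippable
}
// Setting the sign bit (the most significant bit) gives the negative counterpart
double negative = BitConverter.Int64BitsToDouble(unchecked((long)0x8000000000000001UL));
Console.WriteLine(negative.ToString("r"));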
This should be doable in safe code: Create a bit string. Convert that to a double. Output. Increment. Repeat.... A LOT.
string bstr = "01010101010101010101010101010101"; // this is 32 instead of 64, adjust as needed
long v = 0;
for (int i = bstr.Length - 1; i >= 0; i--) v = (v << 1) + (bstr[i] - '0'); // note: with this loop, bstr[0] ends up as the least significant bit
double d = BitConverter.ToDouble(BitConverter.GetBytes(v), 0);
// increment bstr and loop

Why does TextReader.Read return an int, not a char?

Consider the following code ( .Dump() in LINQPad simply writes to the console):
var s = "𤭢"; //3 byte code point. 4 byte UTF32 encoded
s.Dump();
s.Length.Dump(); // 2
TextReader sr = new StringReader("𤭢");
int i;
while ((i = sr.Read()) >= 0)
{
    // notice here we are yielded two
    // 2-byte values, but as ints
    i.ToString("X").Dump(); // D852, DF62
}
Given the outcome above, why does TextReader.Read() return an int and not a char? Under what circumstances might it read a value greater than 2 bytes?
TextReader.Read() will never read greater than 2 bytes; however, it returns -1 to mean "no more characters to read" (end of string). Therefore, its return type needs to go up to Int32 (4 bytes) from Char (2 bytes) to be able to express the full Char range plus -1.
TextReader.Read() probably uses int to allow returning -1 when reaching the end of the text:
The next character from the text reader, or -1 if no more characters are available. The default implementation returns -1.
And, the Length is 2 because Strings are UTF-16 sequences, which require surrogate pairs to represent code points above U+FFFF.
{ 0xD852, 0xDF62 } <=> U+24B62 (𤭢)
You can get the UTF-32 code point from them with Char.ConvertToUtf32():
Char.ConvertToUtf32("𤭢", 0).ToString("X").Dump(); // 24B62
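For completeness, a sketch of how the two code units returned by Read() can be recombined into a code point (it assumes the input is well-formed UTF-16, so a high surrogate is always followed by a low surrogate):
TextReader sr = new StringReader("𤭢");
int i;
while ((i = sr.Read()) >= 0)
{
    char c = (char)i;
    if (char.IsHighSurrogate(c))
    {
        char low = (char)sr.Read();                                   // assumes a valid low surrogate follows
        Console.WriteLine(char.ConvertToUtf32(c, low).ToString("X")); // 24B62
    }
    else
    {
        Console.WriteLine(((int)c).ToString("X"));
    }
}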
