I'm surprised by C# compiler behavior in the following example:
int i = 1024;
uint x = 2048;
x = x+i; // error CS0266: Cannot implicitly convert type 'long' to 'uint' ...
That seems OK, since int + uint can overflow. However, if uint is changed to int, the error disappears, as if int + int could not overflow:
int i = 1024;
int x = 2048;
x = x+i; // OK, int
Moreover, uint + uint = uint:
uint i = 1024;
uint x = 2048;
x = x+i; // OK, uint
It seems totally obscure.
Why int + int = int and uint + uint = uint, but int + uint = long?
What is the motivation for this decision?
Why int + int = int and uint + uint = uint, but int + uint = long? What is the motivation for this decision?
The way the question is phrased implies the presupposition that the design team wanted int + uint to be long, and chose type rules to attain that goal. That presupposition is false.
Rather, the design team thought:
What mathematical operations are people most likely to perform?
What mathematical operations can be performed safely and efficiently?
What conversions between numeric types can be performed without loss of magnitude and precision?
How can the rules for operator resolution be made both simple and consistent with the rules for method overload resolution?
As well as many other considerations such as whether the design works for or against debuggable, maintainable, versionable programs, and so on. (I note that I was not in the room for this particular design meeting, as it predated my time on the design team. But I have read their notes and know the kinds of things that would have concerned the design team during this period.)
Investigating these questions led to the present design: that arithmetic operations are defined as int + int --> int, uint + uint --> uint, long + long --> long, int may be converted to long, uint may be converted to long, and so on.
A consequence of these decisions is that when adding uint + int, overload resolution chooses long + long as the closest match, and long + long is long, therefore uint + int is long.
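A minimal sketch (the variable names are mine) that shows the resulting type at runtime:
int i = 1024;
uint u = 2048;
var sum = u + i;                   // overload resolution selects long + long
Console.WriteLine(sum.GetType());  // System.Int64
Console.WriteLine(sum);            // 3072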
Making uint + int behave in some different way that you might consider more sensible was not a design goal of the team at all, because mixing signed and unsigned values is, first, rare in practice and, second, almost always a bug. The design team could have added special cases for every combination of signed and unsigned one, two, four, and eight byte integers, as well as char, float, double and decimal, or any subset of those many hundreds of cases, but that works against the goal of simplicity.
So in short, on the one hand we have a large amount of design work to make a feature that we want no one to actually use easier to use at the cost of a massively complicated specification. On the other hand we have a simple specification that produces an unusual behavior in a rare case we expect no one to encounter in practice. Given those choices, which would you choose? The C# design team chose the latter.
The short answer is "because the Standard says that it shall be so"; see the informative §14.2.5.2 of ISO 23270. The normative §13.1.2 (Implicit numeric conversions) says:
The implicit numeric conversions are:
...
From int to long, float, double, or decimal.
From uint to long, ulong, float, double, or decimal.
...
Conversions from int, uint, long or ulong to float and from long or ulong to double can cause a loss of precision, but will never cause a loss of magnitude.
The other implicit numeric conversions never lose any information. (emph. mine)
The [slightly] longer answer is that you are adding two different types: a 32-bit signed integer and a 32-bit unsigned integer:
the domain of a signed 32-bit integer is -2,147,483,648 (0x80000000) — +2,147,483,647 (0x7FFFFFFF).
the domain of an unsigned 32-bit integer is 0 (0x00000000) — +4,294,967,295 (0xFFFFFFFF).
So the types aren't compatible, since an int can't contain any arbitrary uint and a uint can't contain any arbitrary int. They are implicitly converted (a widening conversion, per the requirement of §13.1.2 that no information be lost) to the next largest type that can contain both: a long in this case, a signed 64-bit integer, which has the domain -9,223,372,036,854,775,808 (0x8000000000000000) — +9,223,372,036,854,775,807 (0x7FFFFFFFFFFFFFFF).
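As a quick illustration (a minimal sketch; the variable names are mine), long is the smallest integral type whose domain covers both of those ranges:
long lo = int.MinValue;       // -2,147,483,648 fits in a long
long hi = uint.MaxValue;      //  4,294,967,295 fits in a long as well
// int  bad1 = uint.MaxValue; // does not compile: no implicit uint -> int conversion
// uint bad2 = int.MinValue;  // does not compile: no implicit int -> uint conversion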
Edited to note: Just as an aside, executing this code:
var x = 1024 + 2048u ;
Console.WriteLine( "'x' is an instance of `{0}`" , x.GetType().FullName ) ;
does not yield a long as in the original poster's example. Instead, what is produced is:
'x' is an instance of `System.UInt32`
This is because of the way constant expressions are typed. The first element in the expression, 1024, has no suffix and as such is an int; the second element, 2048u, is a uint, according to the rules:
If the literal has no suffix, it has the first of these types in which its value can be represented: int, uint, long, ulong.
If the literal is suffixed by U or u, it has the first of these types in which its value can be represented: uint, ulong.
And since 1024 is a constant expression whose value fits in the range of uint, it is implicitly converted to uint (an implicit constant expression conversion), so overload resolution selects uint + uint and the sum is evaluated as a uint.
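A small sketch contrasting the constant case with the variable case (the identifiers are mine):
var a = 1024 + 2048u;            // both operands are constants and 1024 fits in uint,
                                 // so uint + uint is selected: a is uint
int i = 1024;
var b = i + 2048u;               // i is a variable, so no constant conversion applies: b is long
Console.WriteLine(a.GetType());  // System.UInt32
Console.WriteLine(b.GetType());  // System.Int64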
Consistency is the hobgoblin of little minds.
This is a manifestation of overload resolution for numeric types.
Numeric promotion consists of automatically performing certain implicit conversions of the operands of the predefined unary and binary numeric operators. Numeric promotion is not a distinct mechanism, but rather an effect of applying overload resolution to the predefined operators. Numeric promotion specifically does not affect evaluation of user-defined operators, although user-defined operators can be implemented to exhibit similar effects.
http://msdn.microsoft.com/en-us/library/aa691328(v=vs.71).aspx
If you have a look at
long operator *(long x, long y);
uint operator *(uint x, uint y);
from that link, you see those are two possible overloads (the example refers to operator *, but the same is true for operator +).
The uint is implicitly converted to a long for overload resolution, as is int.
From uint to long, ulong, float, double, or decimal.
From int to long, float, double, or decimal.
http://msdn.microsoft.com/en-us/library/aa691282(v=vs.71).aspx
What is the motivation for this decision?
It would likely take a member of the design team to answer that aspect. Eric Lippert, where are you? :-) Note though that @Nicolas's reasoning below is very plausible: both operands are converted to the "smallest" type that can contain the full range of values of both operands.
I think the behavior of the compiler is pretty logical and expected.
In the following code:
int i;
int j;
var k = i + j;
There is an exact overload for this operation, so k is int. The same logic applies when adding two uints, two bytes or what have you. The compiler's job is easy here; it's happy because overload resolution finds an exact match. There is a pretty good chance that the person writing this code expects k to be an int and is aware that the operation can overflow in certain circumstances.
Now consider the case you are asking about:
uint i;
int j;
var k = i + j;
What does the compiler see? Well it sees an operation that has no matching overload; there is no operator + overload that takes an int and a uint as its two operands. So the overload resolution algorithm goes ahead and tries to find an operator overload that can be valid. This means it has to find an overload where the types involved can "hold" the original operands; that is, both i and j have to be implicitly convertible to said type(s).
The compiler can't implicitly convert uint to int because such a conversion doesn't exist. It can't implicitly convert int to uint either, because that conversion doesn't exist (either one could cause a change in magnitude). So the only choice it really has is to choose the first broader type that can "hold" both operand types, which in this case is long. Once both operands are implicitly converted to long, k being long is obvious.
The motivation for this behavior is, IMO, to choose the safest available option and not second-guess the dubious coder's intent. The compiler cannot make an educated guess as to what the person writing this code expects k to be. An int? Well, why not a uint? Both options seem equally bad. The compiler chooses the only logical path, the safe one: long. If the coder wants k to be either int or uint, he only has to explicitly cast one of the operands.
And last, but not least, the C# compiler's overload resolution algorithm does not consider the return type when deciding the best overload. So the fact that you are storing the operation result in a uint is completely irrelevant to the compiler and has no effect whatsoever in the overload resolution process.
This is all speculation on my part, and I may be completely wrong. But it does seem logical reasoning.
int i = 1024;
uint x = 2048;
// Technique #1: Convert.ToUInt32 throws an OverflowException if i is negative
x = x + Convert.ToUInt32(i);
// Technique #2: a checked cast also throws an OverflowException if i is negative
x = x + checked((uint)i);
// Technique #3: an unchecked cast never throws; a negative i wraps around
x = x + unchecked((uint)i);
// Technique #4: a plain cast uses the enclosing checked/unchecked context (unchecked by default)
x = x + (uint)i;
The numerical promotion rules for C# are loosely based upon those of Java and C, which work by identifying a type to which both operands can be converted and then making the result be the same type. I think such an approach was reasonable in the 1980s, but newer languages should set it aside in favor of one that looks at how values are used (e.g. if I were designing a language, then given Int32 i1,i2,i3,i4; Int64 l; a compiler would process i4=i1+i2+i3; using 32-bit math [throwing an exception in case of overflow] but would process l=i1+i2+i3; with 64-bit math), but the C# rules are what they are and don't seem likely to change.
It should be noted that the C# promotion rules by definition always select the overloads which are deemed "most suitable" by the language specification, but that doesn't mean they're really the most suitable for any useful purpose. For example, double f=1111111100/11111111.0f; would seem like it should yield 100.0, and it would be correctly computed if both operands were promoted to double, but the compiler will instead convert the integer 1111111100 to float yielding 1111111040.0f, and then perform the division yielding 99.999992370605469.
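A short sketch of that example (the variable names are mine), comparing the float-promoted division with one where the operands are promoted to double:
double f = 1111111100 / 11111111.0f;  // the int is first converted to float (1111111040.0f),
                                      // then a float division is performed
Console.WriteLine(f);                 // approximately 99.9999923706055, not 100
double g = 1111111100 / 11111111.0;   // a double literal promotes the division to double
Console.WriteLine(g);                 // 100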
Related
Earlier today I was trying to add two ushorts and I noticed that I had to cast the result back to ushort. I thought it might've become a uint (to prevent a possible unintended overflow?), but to my surprise it was an int (System.Int32).
Is there some clever reason for this or is it maybe because int is seen as the 'basic' integer type?
Example:
ushort a = 1;
ushort b = 2;
ushort c = a + b; // <- "Cannot implicitly convert type 'int' to 'ushort'. An explicit conversion exists (are you missing a cast?)"
uint d = a + b; // <- "Cannot implicitly convert type 'int' to 'uint'. An explicit conversion exists (are you missing a cast?)"
int e = a + b; // <- Works!
Edit: Like GregS' answer says, the C# spec says that both operands (in this example 'a' and 'b') should be converted to int. I'm interested in the underlying reason for why this is part of the spec: why doesn't the C# spec allow for operations directly on ushort values?
The simple and correct answer is "because the C# Language Specification says so".
Clearly you are not happy with that answer and want to know "why does it say so". You are looking for "credible and/or official sources", that's going to be a bit difficult. These design decisions were made a long time ago, 13 years is a lot of dog lives in software engineering. They were made by the "old timers" as Eric Lippert calls them, they've moved on to bigger and better things and don't post answers here to provide an official source.
It can be inferred however, at a risk of merely being credible. Any managed compiler, like C#'s, has the constraint that it needs to generate code for the .NET virtual machine. The rules for which are carefully (and quite readably) described in the CLI spec. It is the Ecma-335 spec, you can download it for free from here.
Turn to Partition III, chapters 3.1 and 3.2. They describe the two IL instructions available to perform an addition, add and add.ovf. Click the link to Table 2, "Binary Numeric Operations"; it describes what operands are permissible for those IL instructions. Note that there are just a few types listed there. byte and short as well as all unsigned types are missing. Only int, long, IntPtr and floating point (float and double) are allowed. With additional constraints marked by an x: you can't add an int to a long, for example. These constraints are not entirely artificial, they are based on things you can do reasonably efficiently on available hardware.
Any managed compiler has to deal with this in order to generate valid IL. That isn't difficult, simply convert the ushort to a larger value type that's in the table, a conversion that's always valid. The C# compiler picks int, the next larger type that appears in the table. Or in general, convert any of the operands to the next largest value type so they both have the same type and meet the constraints in the table.
Now there's a new problem however, a problem that drives C# programmers pretty nutty. The result of the addition is of the promoted type. In your case that will be int. So adding two ushort values of, say, 0x9000 and 0x9000 has a perfectly valid int result: 0x12000. Problem is: that's a value that doesn't fit back into a ushort. The value overflowed. But it didn't overflow in the IL calculation, it only overflows when the compiler tries to cram it back into a ushort. 0x12000 is truncated to 0x2000. A bewilderingly different value that only makes some sense when you count with 2 or 16 fingers, not with 10.
Notable is that the add.ovf instruction doesn't deal with this problem. It is the instruction to use to automatically generate an overflow exception. But it won't here: the actual calculation on the converted ints didn't overflow.
This is where the real design decision comes into play. The old-timers apparently decided that simply truncating the int result to ushort was a bug factory. It certainly is. They decided that you have to acknowledge that you know that the addition can overflow and that it is okay if it happens. They made it your problem, mostly because they didn't know how to make it theirs and still generate efficient code. You have to cast. Yes, that's maddening, I'm sure you didn't want that problem either.
Quite notable is that the VB.NET designers took a different solution to the problem. They actually made it their problem and didn't pass the buck. You can add two UShorts and assign the result to a UShort without a cast. The difference is that the VB.NET compiler actually generates extra IL to check for the overflow condition. That's not cheap code, it makes every short addition about 3 times as slow. But it is otherwise part of the reason why Microsoft maintains two languages that have otherwise very similar capabilities.
Long story short: you are paying a price because you use a type that's not a very good match with modern cpu architectures. Which in itself is a Really Good Reason to use uint instead of ushort. Getting traction out of ushort is difficult, you'll need a lot of them before the memory savings outweigh the cost of manipulating them. Not just because of the limited CLI spec, an x86 core takes an extra cpu cycle to load a 16-bit value because of the operand prefix byte in the machine code. Not actually sure if that is still the case today, it used to be back when I still paid attention to counting cycles. A dog year ago.
Do note that you can feel better about these ugly and dangerous casts by letting the C# compiler generate the same code that the VB.NET compiler generates. So you get an OverflowException when the cast turned out to be unwise. Use Project > Properties > Build tab > Advanced button > tick the "Check for arithmetic overflow/underflow" checkbox. Just for the Debug build. Why this checkbox isn't turned on automatically by the project template is another very mystifying question btw, a decision that was made too long ago.
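A short sketch of both behaviors described above (assuming the default, unchecked, context):
ushort a = 0x9000;
ushort b = 0x9000;
// The addition itself is performed in int, producing 0x12000.
ushort sum1 = (ushort)(a + b);           // unchecked (default) context: silently truncated to 0x2000
ushort sum2 = checked((ushort)(a + b));  // checked context: throws OverflowException at runtime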
ushort x = 5, y = 12;
The following assignment statement will produce a compilation error, because the arithmetic expression on the right-hand side of the assignment operator evaluates to int by default.
ushort z = x + y; // Error: conversion from int to ushort
http://msdn.microsoft.com/en-us/library/cbf1574z(v=vs.71).aspx
EDIT:
In the case of arithmetic operations on ushort, the operands are converted to a type that can hold all values, so that overflow can be avoided. The operands are converted, in order of preference, to int, uint, long, or ulong.
Please see the C# Language Specification. In this document, go to section 4.1.5 Integral types (around page 80 in the Word document). Here you will find:
For the binary +, –, *, /, %, &, ^, |, ==, !=, >, <, >=, and <= operators, the operands are converted to type T, where T is the first of int, uint, long, and ulong that can fully represent all possible values of both operands. The operation is then performed using the precision of type T, and the type of the result is T (or bool for the relational operators). It is not permitted for one operand to be of type long and the other to be of type ulong with the binary operators.
Eric Lippert has stated in an answer to a related question:
Arithmetic is never done in shorts in C#. Arithmetic can be done in ints, uints, longs and ulongs, but arithmetic is never done in shorts. Shorts promote to int and the arithmetic is done in ints, because like I said before, the vast majority of arithmetic calculations fit into an int. The vast majority do not fit into a short. Short arithmetic is possibly slower on modern hardware which is optimized for ints, and short arithmetic does not take up any less space; it's going to be done in ints or longs on the chip.
From the C# language spec:
7.3.6.2 Binary numeric promotions
Binary numeric promotion occurs for the operands of the predefined +, –, *, /, %, &, |, ^, ==, !=, >, <, >=, and <= binary operators. Binary numeric promotion implicitly converts both operands to a common type which, in case of the non-relational operators, also becomes the result type of the operation. Binary numeric promotion consists of applying the following rules, in the order they appear here:
· If either operand is of type decimal, the other operand is converted to type decimal, or a binding-time error occurs if the other operand is of type float or double.
· Otherwise, if either operand is of type double, the other operand is converted to type double.
· Otherwise, if either operand is of type float, the other operand is converted to type float.
· Otherwise, if either operand is of type ulong, the other operand is converted to type ulong, or a binding-time error occurs if the other operand is of type sbyte, short, int, or long.
· Otherwise, if either operand is of type long, the other operand is converted to type long.
· Otherwise, if either operand is of type uint and the other operand is of type sbyte, short, or int, both operands are converted to type long.
· Otherwise, if either operand is of type uint, the other operand is converted to type uint.
· Otherwise, both operands are converted to type int.
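A few concrete consequences of those rules (a sketch; the variable names are mine):
sbyte sb = 1; int i = 2; uint u = 3; long l = 4; ulong ul = 5;
var r1 = i + i;                    // int   (the last rule)
var r2 = u + u;                    // uint
var r3 = u + sb;                   // long  (uint mixed with sbyte, short or int)
var r4 = u + l;                    // long
var r5 = ul + u;                   // ulong
// var r6 = ul + i;                // binding-time error: ulong mixed with a signed integer type
Console.WriteLine(r3.GetType());   // System.Int64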
There is no reason that is intended. This is just an effect of applying the rules of overload resolution, which state that the first overload for whose parameters there are implicit conversions from the arguments is the one that will be used.
This is stated in the C# Specification, section 7.3.6 as follows:
Numeric promotion is not a distinct mechanism, but rather an effect of applying overload resolution to the predefined operators.
It goes on illustrating with an example:
As an example of numeric promotion, consider the predefined implementations of the binary * operator:
int operator *(int x, int y);
uint operator *(uint x, uint y);
long operator *(long x, long y);
ulong operator *(ulong x, ulong y);
float operator *(float x, float y);
double operator *(double x, double y);
decimal operator *(decimal x, decimal y);
When overload resolution rules (§7.5.3) are applied to this set of operators, the effect is to select the first of the operators for which implicit conversions exist from the operand types. For example, for the operation b * s, where b is a byte and s is a short, overload resolution selects operator *(int, int) as the best operator.
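The spec's b * s example can be verified directly (a minimal sketch):
byte b = 3;
short s = 4;
var product = b * s;                   // operator *(int, int) is selected
Console.WriteLine(product.GetType());  // System.Int32
Console.WriteLine(product);            // 12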
Your question is, in fact, a bit tricky. The reason why this specification is part of the language is... because they took that decision when they created the language. I know this sounds like a disappointing answer, but that's just how it is.
However, the real answer would probably involve many contextual decisions made back in 1999-2000. I am sure the team who made C# had pretty robust debates about all those language details.
...
C# is intended to be a simple, modern, general-purpose, object-oriented programming language.
Source code portability is very important, as is programmer portability, especially for those programmers already familiar with C and C++.
Support for internationalization is very important.
...
The quote above is from Wikipedia C#
All of those design goals might have influenced their decision. For instance, in the year 2000, most systems were already natively 32-bit, so they might have decided to limit the number of types smaller than that, since such values get converted to 32 bits anyway when performing arithmetic operations, which is generally slower.
At that point, you might ask me: if there is implicit conversion on those types, why did they include them at all? Well, one of their design goals, as quoted above, is portability.
Thus, if you need to write a C# wrapper around an old C or C++ program, you might need those types to store some values. In that case, those types are pretty handy.
That's a decision Java did not make. For instance, if you write a Java program that interacts with a C++ program from which you receive ushort values, well, Java only has short (which is signed), so you can't easily assign one to the other and expect correct values.
The next available type in Java that can hold such a value is int (32 bits, of course). You have just doubled your memory usage, which might not be a big deal unless you have to instantiate an array of 100,000 elements.
In fact, we must remember that those decisions were made by looking at both the past and the future in order to provide a smooth transition from one to the other.
But now I feel that I am diverging from the initial question.
So your question is a good one, and hopefully I was able to bring some answers to you, even if I know that's probably not what you wanted to hear.
If you'd like, you could even read more about the C# spec; links below. There is some documentation there that might be interesting for you.
Integral types
The checked and unchecked operators
Implicit Numeric Conversions Table
By the way, I believe you should probably reward habib-osu for it, since he provided a fairly good answer to the initial question with a proper link. :)
Regards
As per the MSDN article: https://msdn.microsoft.com/en-us/library/8s682k58%28v=vs.80%29.aspx
Explicit conversion is required by some compilers to support narrowing conversions.
Based on the statements made in the above MSDN link, is it safe to say that converting int to uint is a narrowing conversion, and also that converting uint to int is a narrowing conversion?
Converting an integral or floating point type from signed to unsigned and vice-versa is neither a narrowing nor a widening conversion, since the number of bits used to store it remains unchanged.
Instead, it is a change of representation and can utterly change the number (e.g. signed -1 is converted to unsigned 0xffffffff).
In fact, if you use unchecked arithmetic:
unchecked
{
int x = -1;
uint y = (uint) x;
int z = (int) y;
Debug.Assert(x == z); // Will succeed for all x.
uint a = 0xffffffff;
int b = (int) a;
uint c = (uint) b;
Debug.Assert(a == c); // Will succeed for all a.
}
So a round trip works in both directions, which proves that narrowing does not occur in either direction.
Narrowing and widening only applies to integral and floating point types where a different number of bits are used to store one or more parts of the number.
However, because the number's value can be changed, you must cast to make such a conversion, to ensure that you don't do it accidentally.
It is not narrowing, but you must still be explicit. Both can hold values that the other doesn't have. int can hold negative values, which uint can't and uint can hold values greater than 2147483647, which is the maximum value for an int.
However, both int and uint use 32 bits (= 4 bytes) to hold information.
When you convert an int to a byte, you will lose information, because a byte can only hold 8 bits.
Converting between int and uint doesn't lose any bits, but the bits have a different meaning. (in 50% of the cases)
You must be explicit about it to indicate that you know what you are doing. It will prevent mistakes that would be caused by an implicit cast.
I'm a Java programmer trying to migrate to C#, and this gotcha has me slightly stumped:
int a = 1;
a = 0x08000000 | a;
a = 0x80000000 | a;
The first line compiles just fine. The second does not. It seems to recognise that there is a constant with a sign bit, and for some reason it decides to cast the result to a long, resulting in the error:
Cannot implicitly convert type 'long' to 'int'.
An explicit conversion exists (are you missing a cast?)
The fix I have so far is:
a = (int)(0x80000000 | a);
Which deals with the cast but still leaves a warning:
Bitwise-or operator used on a sign-extended operand;
consider casting to a smaller unsigned type first
What would be the correct C# way to express this in an error/warning/long-free way?
I find it interesting that in all these answers, only one person actually suggested doing what the warning says. The warning is telling you how to fix the problem; pay attention to it.
Bitwise-or operator used on a sign-extended operand; consider casting to a smaller unsigned type first
The bitwise or operator is being used on a sign-extended operand: the int. That's causing the result to be converted to a larger type: long. An unsigned type smaller than long is uint. So do what the warning says; cast the sign-extended operand -- the int -- to uint:
result = (int)(0x80000000 | (uint) operand);
Now there is no sign extension.
Of course this just raises the larger question: why are you treating a signed integer as a bitfield in the first place? This seems like a dangerous thing to do.
A numeric integer literal is an int by default, unless the number is too large to fit in an int, in which case it becomes a uint instead (and so on for long and ulong).
As the value 0x80000000 is too large to fit in an int, it's a uint value. When you use the | operator on an int and a uint, both are converted to long, as neither can be safely converted to the other.
The value can be represented as an int, but then you have to ignore that it becomes a negative value. The compiler won't do that silently, so you have to instruct it to make the value an int without caring about the overflow:
a = unchecked((int)0x80000000) | a;
(Note: This only instructs the compiler how to convert the value, so there is no code created for doing the conversion to int.)
Your issue is because the literal 0x80000000 doesn't fit in an int, so it is typed as uint, and mixing a uint with an int promotes both operands to long.
It should work fine if you use a uint.
a = ((uint)0x80000000) | a; //assuming a is a uint
Changing that line to
a = (int)((uint)0x80000000 | (uint)a);
does it for me.
The problem you have here is that
0x80000000 is an unsigned integer literal. The specification says that an integer literal is of the first type in the list (int, uint, long, ulong) which can hold the literal. In this case it is uint.
a is probably an int
This causes the result to be a long. I don't see a nicer way other than casting the result back to int, unless you know that a can't be negative. Then you can cast a to uint or declare it that way in the first place.
You can make it less ugly by creating a constant that starts with the letter 'O'. (And unlike on this website, in Visual Studio it doesn't show it in a different color.)
const int Ox80000000 = unchecked((int)0x80000000);
/// ...
const int every /* */ = Ox80000000;
const int thing /* */ = 0x40000000;
const int lines /* */ = 0x20000000;
const int up /* */ = 0x10000000;
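With such constants in place, the original expression needs no cast and produces no warning (a sketch, assuming the constants above are in scope):
int a = 1;
a = every | a;       // int | int: no promotion to long, no cast, no warning
a = Ox80000000 | a;  // same, using the constant directly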
I have a question about the implicit type conversion
Why does this implicit type conversion work in C#? I've learned that implicit conversions usually don't work.
I have a code sample here about implicit type conversion
char c = 'a';
int x = c;
int n = 5;
int answer = n * c;
Console.WriteLine(answer);
UPDATE: I am using this question as the subject of my blog today. Thanks for the great question. Please see the blog for future additions, updates, comments, and so on.
http://blogs.msdn.com/ericlippert/archive/2009/10/01/why-does-char-convert-implicitly-to-ushort-but-not-vice-versa.aspx
It is not entirely clear to me what exactly you are asking. "Why" questions are difficult to answer. But I'll take a shot at it.
First, code which has an implicit conversion from char to int (note: this is not an "implicit cast", this is an "implicit conversion") is legal because the C# specification clearly states that there is an implicit conversion from char to int, and the compiler is, in this respect, a correct implementation of the specification.
Now, you might sensibly point out that the question has been thoroughly begged. Why is there an implicit conversion from char to int? Why did the designers of the language believe that this was a sensible rule to add to the language?
Well, first off, the obvious things which would prevent this from being a rule of the language do not apply. A char is implemented as an unsigned 16 bit integer that represents a character in a UTF-16 encoding, so it can be converted to a ushort without loss of precision, or, for that matter, without change of representation. The runtime simply goes from treating this bit pattern as a char to treating the same bit pattern as a ushort.
It is therefore possible to allow a conversion from char to ushort. Now, just because something is possible does not mean it is a good idea. Clearly the designers of the language thought that implicitly converting char to ushort was a good idea, but implicitly converting ushort to char is not. (And since char to ushort is a good idea, it seems reasonable that char-to-anything-that-ushort-goes-to is also reasonable, hence, char to int. Also, I hope that it is clear why allowing explicit casting of ushort to char is sensible; your question is about implicit conversions.)
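A short sketch of the asymmetry being described (the variable names are mine):
char c = 'a';
ushort u = c;       // implicit char -> ushort: allowed
int i = c;          // implicit char -> int: allowed, i == 97
// char d = u;      // not allowed: there is no implicit ushort -> char conversion
char d = (char)u;   // an explicit cast is required in that direction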
So we actually have two related questions here: first, why is it a bad idea to allow implicit conversions from ushort/short/byte/sbyte to char? And second, why is it a good idea to allow implicit conversions from char to ushort?
Unlike you, I have the original notes from the language design team at my disposal. Digging through those, we discover some interesting facts.
The first question is covered in the notes from April 14th, 1999, where the question of whether it should be legal to convert from byte to char arises. In the original pre-release version of C#, this was legal for a brief time. I've lightly edited the notes to make them clear without an understanding of 1999-era pre-release Microsoft code names. I've also added emphasis on important points:
[The language design committee] has chosen to provide an implicit conversion from bytes to chars, since the domain of one is completely contained by the other. Right now, however, [the runtime library] only provide Write methods which take chars and ints, which means that bytes print out as characters since that ends up being the best method. We can solve this either by providing more methods on the Writer class or by removing the implicit conversion.
There is an argument for why the latter is the correct thing to do. After all, bytes really aren't characters. True, there may be a useful mapping from bytes to chars, but ultimately, 23 does not denote the same thing as the character with ascii value 23, in the same way that 23B denotes the same thing as 23L. Asking [the library authors] to provide this additional method simply because of how a quirk in our type system works out seems rather weak. So I would suggest that we make the conversion from byte to char explicit.
The notes then conclude with the decision that byte-to-char should be an explicit conversion, and integer-literal-in-range-of-char should also be an explicit conversion.
Note that the language design notes do not call out why ushort-to-char was also made illegal at the same time, but you can see that the same logic applies. When calling a method overloaded as M(int) and M(char), when you pass it a ushort, odds are good that you want to treat the ushort as a number, not as a character. And a ushort is NOT a character representation in the same way that a ushort is a numeric representation, so it seems reasonable to make that conversion illegal as well.
The decision to make char go to ushort was made on the 17th of September, 1999; the design notes from that day on this topic simply state "char to ushort is also a legal implicit conversion", and that's it. No further exposition of what was going on in the language designer's heads that day is evident in the notes.
However, we can make educated guesses as to why implicit char-to-ushort was considered a good idea. The key idea here is that the conversion from number to character is a "possibly dodgy" conversion. It's taking something that you do not KNOW is intended to be a character, and choosing to treat it as one. That seems like the sort of thing you want to call out that you are doing explicitly, rather than accidentally allowing it. But the reverse is much less dodgy. There is a long tradition in C programming of treating characters as integers -- to obtain their underlying values, or to do mathematics on them.
In short: it seems reasonable that using a number as a character could be an accident and a bug, but it also seems reasonable that using a character as a number is deliberate and desirable. This asymmetry is therefore reflected in the rules of the language.
Does that answer your question?
The basic idea is that conversions that cannot lead to data loss can be implicit, whereas conversions which may lead to data loss have to be explicit (using, for instance, a cast operator).
So implicitly converting from char to int will work in C#.
[edit]As others pointed out, a char is a 16-bit number in C#, so this conversion is just from a 16-bit integer to a 32-bit integer, which is possible without data-loss.[/edit]
C# supports implicit conversions; the part "usually don't work" is probably coming from some other language, probably C++, where some glorious string implementations provided implicit conversions to diverse pointer-types, creating some gigantic bugs in applications.
When you, in whatever language, provide type-conversions, you should default to explicit conversions, and only provide implicit conversions for special cases.
From C# Specification
6.1.2 Implicit numeric conversions
The implicit numeric conversions are:
• From sbyte to short, int, long, float, double, or decimal.
• From byte to short, ushort, int, uint, long, ulong, float, double, or decimal.
• From short to int, long, float, double, or decimal.
• From ushort to int, uint, long, ulong, float, double, or decimal.
• From int to long, float, double, or decimal.
• From uint to long, ulong, float, double, or decimal.
• From long to float, double, or decimal.
• From ulong to float, double, or decimal.
• From char to ushort, int, uint, long, ulong, float, double, or decimal.
• From float to double.
Conversions from int, uint, long, or ulong to float and from long or ulong to double may cause a loss of precision, but will never cause a loss of magnitude. The other implicit numeric conversions never lose any information. There are no implicit conversions to the char type, so values of the other integral types do not automatically convert to the char type.
From the MSDN page about the char type (char (C# Reference)):
A char can be implicitly converted to ushort, int, uint, long, ulong, float, double, or decimal. However, there are no implicit conversions from other types to the char type.
It's because they have implemented an implicit conversion from char to all those types. Now if you ask why they implemented them, I'm really not sure; certainly it helps when working with the ASCII representation of chars, or something like that.
Casting to a narrower type can cause data loss. Here char is 16 bits and int is 32 bits, so the conversion happens without loss of data.
Real life example: we can put a small vessel into a big vessel but not vice versa without external help.
The core of @Eric Lippert's blog entry is his educated guess for the reasoning behind this decision of the C# language designers:
"There is a long tradition in C programming of treating characters as integers
-- to obtain their underlying values, or to do mathematics on them."
It can cause errors though, such as:
var s = new StringBuilder('a');
Which you might think initialises the StringBuilder with an 'a' character to start with, but actually sets the capacity of the StringBuilder to 97.
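A sketch that makes the gotcha visible (the last line shows what was probably intended):
var s = new StringBuilder('a');  // 'a' converts to int 97, so the capacity constructor is called
Console.WriteLine(s.Capacity);   // 97
Console.WriteLine(s.Length);     // 0 -- no character was appended
var t = new StringBuilder("a");  // starts with the text "a"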
It works because each character is handled internally as a number, hence the cast is implicit.
The char is implicitly converted to its Unicode numeric value, which is an integer.
The implicit conversion from char to number types makes no sense, in my opinion, because a loss of information happens. You can see it from this example:
string ab = "ab";
char a = ab[0];
char b = ab[1];
var d = a + b; //195
We have put all pieces of information from the string into chars. If by any chance only the information from d is kept, all that is left to us is a number which makes no sense in this context and cannot be used to recover the previously provided information. Thus, the most useful way to go would be to implicitly convert the "sum" of chars to a string.