What’s the point of the post-increment ++ operator having higher precedence than the pre-increment ++ operator? That is, is there a situation where x++ having the same level of precedence as ++x would cause an expression to return a wrong result?
Let's start with defining some terms, so that we're all talking about the same thing.
The primary operators are postfix "x++" and "x--", the member access operator "x.y", the call operator "f(x)", the array dereference operator "a[x]", and the new, typeof, default, checked, unchecked and delegate operators.
The unary operators are "+x", "-x", "~x", "!x", "++x", "--x" and the cast "(T)x".
The primary operators are by definition of higher precedence than the unary operators.
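As a hedged illustration of what that split means in practice (variable names invented), consider how a primary operator binds tighter than a unary one:

using System;

int x = 5;
int y = -x++;          // parsed as -(x++): postfix ++ (primary) binds tighter than unary minus
Console.WriteLine(y);  // -5: the old value of x is negated
Console.WriteLine(x);  // 6: x was incremented as a side effect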
Your question is
is there a situation where x++ having same level of precedence as ++x would cause an expression to return a wrong result?
It is not at all clear to me what you mean logically by "the wrong result". If we changed the rules of precedence in such a way that the value of an expression changed then the new result would be the right result. The right result is whatever the rules say the right result is. That's how we define "the right result" -- the right result is what you get when you correctly apply the rules.
We try to set up the rules so that they are useful and make it easy to express the meaning you intend to express. Is that what you mean by "the wrong result"? That is, are you asking if there is a situation where one's intuition about what the right answer is would be incorrect?
I submit to you that if that is the case, then this is not a helpful angle to pursue because almost no one's intuition about the "correct" operation of the increment operators actually matches the current specification, much less some hypothetical counterfactual specification. In almost every C# book I have edited, the author has in some subtle or gross way mis-stated the meaning of the increment operators.
These are side-effecting operations with unusual semantics, and they come out of a language - C - with deliberately vague operational semantics. We have tried hard in the definition of C# to make the increment and decrement operators sensible and strictly defined, but it is impossible to come up with something that makes intuitive sense to everyone, since everyone has a different experience with the operators in C and C++.
Perhaps it would be helpful to approach the problem from a different angle. Your question presupposes a counterfactual world in which postfix and prefix ++ are specified to have the same precedence, and then asks for a criticism of that design choice in that counterfactual world. But there are many different ways that could happen. We could make them have the same precedence by putting both into the "primary" category. Or we could make them have the same precedence by putting them both into the "unary" category. Or we could invent a new level of precedence between primary and unary. Or below unary. Or above primary. We could also change the associativity of the operators, not just their precedence.
Perhaps you could clarify the question as to which counterfactual world you'd like to have criticized. Given any of those counterfactuals, I can give you a criticism of how that choice would lead to code that was unnecessarily verbose or confusing, but without a clear concept of the counterfactual design you're asking for criticism on, I worry that I'd spend a lot of time criticising something other than what you actually have in mind.
Make a specific proposed design change, and we'll see what its consequences are.
John, you have answered the question yourself: these two constructions are mostly used around function calls: ++x when you want to increase the value first and then use it, and x++ when you want to use the current value and then increase it. That can be very useful, depending on the context. Looking at return x++ vs return ++x, I see no potential for error: the code means exactly what it says :) The only problem is the programmer, who might use these two constructions without understanding the operators' precedence and semantics, and thus miss the meaning.
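A minimal sketch of that difference, assuming a captured integer variable named counter (invented for illustration):

using System;

int counter = 10;

int NextPost() { return counter++; }  // returns the current value; counter is incremented afterwards
int NextPre()  { return ++counter; }  // counter is incremented first; the new value is returned

Console.WriteLine(NextPost());  // 10 (counter is now 11)
Console.WriteLine(NextPre());   // 12 (counter went from 11 to 12 before returning)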
I understand the difference between == and .Equals. There are plenty of other questions on here that explain the difference in detail, e.g. What is the difference between .Equals and == and Bitwise equality, among many others.
My question is: why have them both (I realise there must be a very good reason) - they both appear to do the same thing (unless overridden differently).
When would == be overloaded in a different way to how .equals is overridden?
== is bound statically, at compile-time, because operators are always static. You overload operators - you can't override them. Equals(object) is executed polymorphically, because it's overridden.
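For instance, a small sketch of that distinction using string (the behaviour here is the documented one; the values are arbitrary):

using System;

object a = "hello";
object b = new string("hello".ToCharArray());  // a distinct instance with equal contents

Console.WriteLine(a == b);                  // False: == is picked from the compile-time type (object), so it compares references
Console.WriteLine(a.Equals(b));             // True: Equals dispatches on the runtime type (string), so it compares contents
Console.WriteLine((string)a == (string)b);  // True: with compile-time type string, string's == overload compares contents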
In terms of when you'd want them to be different...
Often reference types will override Equals but not overload == at all. It can be useful to easily tell the difference between "these two references refer to the same object" and "these two references refer to equal objects". (You can use ReferenceEquals if necessary, of course - and as Eric points out in comments, that's clearer.) You want to be really clear about when you do that, mind you.
double has this behavior for NaN values; ==(double, double) will always return false when either operand is NaN, even if they're the same NaN. Equals can't do that without invalidating its contract. (Admittedly GetHashCode is broken for different NaN values, but that's a different matter...)
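A quick sketch of that NaN behaviour:

using System;

double x = double.NaN;

Console.WriteLine(x == x);       // False: IEEE 754 arithmetic says NaN is not equal to anything, even itself
Console.WriteLine(x.Equals(x));  // True: Equals stays reflexive to honour its contract
Console.WriteLine(x != x);       // True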
I can't remember ever implementing them to give different results, personally.
My question is: why have them both (I realise there must be a very good reason)
If there's a good reason it has yet to be explained to me. Equality comparisons in C# are a godawful mess, and were #9 on my list of things I regret about the design of C#:
http://www.informit.com/articles/article.aspx?p=2425867
Mathematically, equality is the simplest equivalence relation and it should obey the rules:

x == x should always be true
x == y should always be the same as y == x
x == y and x != y should always be opposite valued
if x == y and y == z are true, then x == z must be true

C#'s == and Equals mechanisms guarantee none of these properties! (Though, thankfully, ReferenceEquals guarantees all of them.)
As Jon notes in his answer, == is dispatched based on the compile-time types of both operands, and .Equals(object) and .Equals(T) from IEquatable<T> are dispatched based on the runtime type of the left operand. Why are either of those dispatch mechanisms correct? Equality is not a predicate that favours its left hand side, so why should some but not all of the implementations do so?
Really what we want for user-defined equality is a multimethod, where the runtime types of both operands have equal weight, but that's not a concept that exists in C#.
Worse, it is incredibly common that Equals and == are given different semantics -- usually that one is reference equality and the other is value equality. There is no reason by which the naive developer would know which was which, or that they were different. This is a considerable source of bugs. And it only gets worse when you realize that GetHashCode and Equals must agree, but == need not.
Were I designing a new language from scratch, and I for some crazy reason wanted operator overloading -- which I don't -- then I would design a system that would be much, much more straightforward. Something like: if you implement IComparable<T> on a type then you automatically get <, <=, ==, !=, and so on, operators defined for you, and they are implemented so that they are consistent. That is x<=y must have the semantics of x<y || x==y and also the semantics of !(x>y), and that x == y is always the same as y == x, and so on.
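To make that hypothetical concrete, here is a hedged sketch in today's C# of hand-deriving all the comparison operators from a single CompareTo, so they are consistent by construction (the Money type is invented for illustration; C# does not generate any of this for you):

using System;

public readonly struct Money : IComparable<Money>, IEquatable<Money>
{
    private readonly decimal amount;
    public Money(decimal amount) => this.amount = amount;

    // The single source of truth; everything below is derived from it.
    public int CompareTo(Money other) => amount.CompareTo(other.amount);

    public bool Equals(Money other) => CompareTo(other) == 0;
    public override bool Equals(object obj) => obj is Money m && Equals(m);
    public override int GetHashCode() => amount.GetHashCode();

    // Consistent by construction: x <= y is exactly !(x > y), x == y is symmetric, etc.
    public static bool operator ==(Money x, Money y) => x.CompareTo(y) == 0;
    public static bool operator !=(Money x, Money y) => x.CompareTo(y) != 0;
    public static bool operator <(Money x, Money y) => x.CompareTo(y) < 0;
    public static bool operator >(Money x, Money y) => x.CompareTo(y) > 0;
    public static bool operator <=(Money x, Money y) => x.CompareTo(y) <= 0;
    public static bool operator >=(Money x, Money y) => x.CompareTo(y) >= 0;
}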
Now, if your question really is:
How on earth did we get into this godawful mess?
Then I wrote down some thoughts on that back in 2009:
https://blogs.msdn.microsoft.com/ericlippert/2009/04/09/double-your-dispatch-double-your-fun/
The TLDR is: framework designers and language designers have different goals and different constraints, and they sometimes fail to take each other's goals and constraints into account in their designs, so the platform ends up without a consistent, logical experience. It's a failure of the design process.
When would == be overloaded in a different way to how .equals is overridden?
I would never do so unless I had a very unusual, very good reason. When I implement arithmetic types I always implement all of the operators to be consistent with each other.
One case that can come up is when you have a previous codebase that depends on reference equality via ==, but you decide you want to add value equality checking. One way to do this is to implement IEquatable<T>, which is great, but now what about all that existing code that was assuming only references were equal? Should the inherited Object.Equals be different from how IEquatable<T>.Equals works? This doesn't have an easy answer, as ideally you want all of those functions/operators to act in a consistent way.
For a concrete case in the BCL where this happened, look at TimeZoneInfo. In that particular case, == and Object.Equals were kept the same, but it's not clear-cut that this was the best choice.
As an aside, one way you can mitigate the above problem is to make the class immutable. In this case, code is less likely to be broken by having previously relied on reference equality, since you can't mutate the instance via a reference and invalidate an equality that was previously checked.
Generally, you want them to do the same thing, particularly if your code is going to be used by anyone other than yourself and the person next to you. Ideally, for anyone who uses your code, you want to adhere to the principle of least surprise, which having randomly different behaviours violates. Having said this:
Overloading equality is generally a bad idea, unless a type is immutable, and sealed. If you're at the stage where you have to ask questions about it, then the odds of getting it right in any other case are slim. There are lots of reasons for this:
A. Equals and GetHashCode play together to enable dictionaries and hash sets to work - if you have an inconsistent implementation (or if the hash code changes over time) then one of the following can occur (see the sketch after this list):
Dictionaries/sets start performing as effectively linear-time lookups.
Items get lost in dictionaries/sets.
B. What were you really trying to do? Generally, the identity of an object in an object-oriented language IS its reference. So having two equal objects with different references is just a waste of memory. There was probably no need to create a duplicate in the first place.
C. What you often find when you start implementing equality for objects is that you're looking for a definition of equality that is "for a particular purpose". This makes it a really bad idea to burn your one-and-only Equals for this - much better to define different EqualityComparers for the uses.
D. As others have pointed out, you overload operators but override methods. This means that unless the operators call the methods, horribly amusing and inconsistent results occur when someone tries to use == and finds the wrong (unexpected) method gets called at the wrong level of the hierarchy.
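To illustrate point A above, a minimal sketch (the Box class is invented) of how a mutable key whose hash code changes gets lost in a HashSet:

using System;
using System.Collections.Generic;

class Box
{
    public int Value;  // mutable state that participates in equality

    public override bool Equals(object obj) => obj is Box b && b.Value == Value;
    public override int GetHashCode() => Value;  // the hash changes when Value changes
}

class Program
{
    static void Main()
    {
        var set = new HashSet<Box>();
        var box = new Box { Value = 1 };
        set.Add(box);

        box.Value = 2;  // the item is still stored in the bucket for hash code 1

        Console.WriteLine(set.Contains(box));  // False: lookup probes the bucket for hash code 2
        Console.WriteLine(set.Count);          // 1: the item still occupies space, it is just unreachable
    }
}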
Languages like Java and C# have both bitwise and logical operators.
Logical operators make only sense with boolean operands, bitwise operators work with integer types as well. Since C had no boolean type and treats all non-zero integers as true, the existence of both logical and bitwise operators makes sense there. However, languages like Java or C# have a boolean type so the compiler could automatically use the right kind of operators, depending on the type context.
So, is there some concrete reason for having both logical and bitwise operators in those languages? Or were they just included for familiarity reasons?
(I am aware that you can use the "bitwise" operators in a boolean context to circumvent the short-circuiting in Java and C#, but I have never needed such behaviour, so I guess it might be a mostly unused special case.)
1) is there some concrete reason for having both logical and bitwise operators in those languages?
Yes:
We have boolean operators to do boolean logic (on boolean values).
We have bitwise operators to do bitwise logic (on integer values).
2) I am aware that you can use the "bitwise" operators in a boolean context to circumvent the short-circuiting in Java and C#,
As far as C# goes, this simply is not true.
C# has for example 2 boolean AND operators: & (full) and && (short) but it does not allow bitwise operations on booleans.
So, there really is no 'overlap' or redundancy between logical and bitwise operators. The two do not apply to the same types.
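To make the distinction concrete, a small C# sketch (values invented): the same & token names two different operators, selected by the operand types:

using System;

bool p = false, q = true;
Console.WriteLine(p & q);   // False: logical AND on bools, both operands evaluated (no short-circuit)
Console.WriteLine(p && q);  // False: logical AND on bools, q is never evaluated since p is false

int a = 0b1100, b = 0b1010;
Console.WriteLine(Convert.ToString(a & b, 2));  // 1000: bitwise AND on ints

// bool mixed = a & q;  // does not compile: there is no & operator taking an int and a bool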
In C#, with booleans:
&& is a short-circuiting logical operator
& is a non-short-circuiting logical operator
Bitwise, it just uses & as legacy syntax from C/C++... but it's really quite different. If anything, it would be better as a completely different symbol to avoid any confusion. But there aren't many symbols left, unless you wanted to go for &&& or |||, but that's a bit ugly.
Late answer, but I'll try to get to your real point.
You are correct. And the easiest way to make your point is to mention that other typed languages (like Visual Basic) have logical operators that can act on both Boolean and Integer expressions.
VB Or operator: http://msdn.microsoft.com/en-us/library/06s37a7f.aspx
VB Bitwise example: http://visualbasic.about.com/od/usingvbnet/a/bitops01_2.htm
This was very much a language design decision. Java and C# didn’t have to be the way they are. They just are the way they are. Java and C# did indeed inherit much of their syntax from C for the sake of familiarity. Other languages didn’t and work just fine.
A design decision like this has consequences. Short circuit evaluation is one. Disallowing mixed types (which can be confusing for humans to read) is another. I’ve come to like it but maybe I’ve just been staring at Java for too long.
Visual Basic added AndAlso and OrElse as a way to do short-circuit evaluation. Unlike Basic's other logical operators, these work only on Booleans.
VB OrElse: http://msdn.microsoft.com/en-us/library/ea1sssb2.aspx
Short-circuit description: http://support.microsoft.com/kb/817250
The distinction wasn’t made because strong typing makes it impossible to only have one set of logic operators in a language. It was made because they wanted short circuit evaluation and they wanted a clear way to signal to the reader what was happening.
The other reason (besides short-circuiting) that C and C++ have different kinds of logical operators is to allow any non-zero number to be reinterpreted as TRUE and zero as FALSE. To do that they need operators that tell them to interpret values that way. Java rejects this whole reinterpretation idea and throws an error in your face if you try to do that with its logical operators. If it weren't for short-circuit evaluation, the only reason left would simply be that they wanted the operators to look different when doing different things.
So yes, if the Java and C# language designers hadn't cared about any of that they could have used one set of logical operators for both bitwise and boolean logic and figured out which to do based on operand type like some other languages do. They just didn't.
I'll answer for Java.
Logical operators are used with booleans and bitwise operators are used with ints. They can't be mixed.
Why not reduce them to one operator such as "&" or "|"? Java was designed to be friendly for C/C++ users, so it got their syntax. Nowadays these operators cannot be reduced because of backwards compatibility.
As you have already said, there's a difference between & and && (the same goes for | and ||), so you need two sets of boolean operators.
Independently of that, you may need bitwise operators, and the natural choice is &, | and so on, since there's no confusion to avoid.
Why complicate things and use the two-character versions for them?
The compiler cannot infer the proper operator by looking only at the arguments; which one to choose is a decision about evaluation strategy, i.e. about lazy (short-circuit) evaluation. E.g.
public boolean a() {
    doStuffA();    // side effect: runs whenever a() is called
    return false;
}

public boolean b() {
    doStuffB();    // side effect: skipped when short-circuited
    return true;
}
And now:
a() & b() will execute doStuffB(), while a() && b() will not, because a() returns false and the right operand is never evaluated.
This is just a curiosity about whether there is a fundamental thing stopping something like this (or correct me if there's already some way):
public TTo Convert<TTo, TFrom>(TFrom from)
{
...
}
Called like this:
SomeType someType = converter.Convert(someOtherType);
Because what would happen if you did this?
static void M(int x){}
static void M(double x){}
static T N<T>() { return default(T); }
...
M(N());
Now what is T? int or double?
It's all very easy to solve the problem when you know what the type you're assigning to is, but much of the time the type you're assigning to is the thing you're trying to figure out in the first place.
Reasoning from inside to outside is hard enough. Reasoning from outside to inside is far more difficult, and doing both at the same time is extremely difficult. If it is hard for the compiler to understand what is going on, imagine how hard it is for the human trying to read, understand and debug the code when inferences can be made both from and to the type of the context of an expression. This kind of inference makes programs harder to understand, not easier, and so it would be a bad idea to add it to C#.
Now, that said, C# does support this feature with lambda expressions. When faced with an overload resolution problem in which the lambda can be bound two, three, or a million different ways, we bind it two, three or a million different ways and then evaluate those million different possible bindings to determine which one is "the best". This makes overload resolution at least NP-HARD in C#, and it took me the better part of a year to implement. We were willing to make that investment because (1) lambdas are awesome, and (2) most of the time people write programs that can be analyzed in a reasonable amount of time and can be understood by humans. So it was worth the cost. But in general, that kind of advanced analysis is not worth the cost.
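As a small sketch of that lambda case (the overload names are invented): the compiler tries each candidate binding of the lambda and keeps the ones that work:

using System;

static class Demo
{
    static void Run(Func<int, int> f)    => Console.WriteLine("int overload");
    static void Run(Func<string, int> f) => Console.WriteLine("string overload");

    static void Main()
    {
        Run(x => x * 2);     // only the int binding makes x * 2 valid
        Run(x => x.Length);  // only the string binding gives x a Length property
    }
}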
C# expressions always* have a fixed type, regardless of their surroundings.
You're asking for an expression whose type is determined by whatever it's assigned to; that would violate this principle.
*) except for lambda expressions, method groups, and the null literal.
Unlike Java, C# type inference doesn't take the return type into account. And don't ask me why; Eric Lippert has answered these "why can't C# ..." questions:
because no one ever designed, specified, implemented, tested, documented and shipped that feature
Currently I'm teaching a class of C++ programmers the basics of the C# language. As we discussed the topic of operators, I used the C# standard categories of primary, unary, etc. operators.
One of the attendees felt puzzled, because in the C# standard the "postfix ++/--" operators have been put in the category of primary operators rather than the "prefix ++/--" ones. Her rationale behind this confusion was that she would rather implement the C++ operator "postfix ++/--" in terms of the operator "prefix ++/--". In other words, she would rather count the operator "prefix ++/--" as a primary operator. - I understand her point, but I can't give her a rationale for it. OK, the operators "postfix ++/--" have a higher precedence than "prefix ++/--", but is this the only rationale behind that?
The spec mentions this in section "14.2.1 Operator precedence and associativity".
So my very neutral question: Why are Postfix ++/-- categorized as primary Operators in C#?
Is there a deeper truth in it?
Since the ECMA standard itself does not define what a 'Primary' operator is, other than order of precedence (i.e. coming before 'Unary'), there can be no other significance. The choice of words was probably bad.
Take into account that in many C-like languages, postfix operators tend to create a temporary variable where the expression's intermediate result is stored (see: "Prefer prefix operators over postfix" at Semicolon). Thus, they are fundamentally different from the prefix version.
Nonetheless, quickly checking how Mono and Visual Studio compile for-loops using the postfix and prefix forms, I saw that the IL code produced is identical. Only if you use the postfix/prefix expression's value does it translate to different IL (only affecting where the 'dup' instruction is placed), at least with those implementations mentioned.
EDIT: Okay, now I'm back home, I've removed most of the confusing parts...
I don't know why x++ is classified as a primary expression but ++x isn't; although I doubt it makes much difference in terms of the code you would write. Ditto precedence. I wonder whether the postfix is deemed primary as it's used more commonly? The annotated C# specs don't have any annotations around this, by the way, in either the ECMA edition or the Microsoft C# 4 editions. (I can't immediately find my C# 3 edition to check.)
However, in terms of implementation, I would think of ++ as a sort of pseudo-operator which is used by both prefix and postfix expressions. In particular, when you overload the ++ operator, that overload is used for both postfix and prefix increment. This is unlike C++, as stakx pointed out in a comment.
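As a hedged sketch of that point (the Counter type is invented): you write a single operator ++ in C#, and the compiler uses it for both forms, itself handling the "hand back the old value" part of the postfix form:

using System;

public class Counter
{
    public int Value { get; }
    public Counter(int value) => Value = value;

    // One overload serves both ++c and c++; the compiler does the rest.
    public static Counter operator ++(Counter c) => new Counter(c.Value + 1);
}

public static class Demo
{
    public static void Main()
    {
        var a = new Counter(0);
        var b = a++;  // postfix: b keeps the old object, a is replaced by the incremented one
        var c = ++a;  // prefix: a is replaced first, then c refers to the new object
        Console.WriteLine($"{a.Value} {b.Value} {c.Value}");  // 2 0 2
    }
}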
One thing to note is that while a post-increment/post-decrement expression has to have a primary expression as an operand, a pre-increment/pre-decrement expression only has to have a unary expression as an operand. In both cases the operand has to be classified as a variable, property access or indexer access though, so I'm not sure what practical difference that makes, if any.
EDIT: Just to give another bit of commentary, even though it seems arbitrary, I agree it does seem odd when the spec states:
Primary expressions include the simplest forms of expressions
But the list of steps for pre-increment is shorter/simpler than the list of steps for post-increment (as it doesn't include the "save the value" step).
The difference is that:
a[i++] will access the element indexed i.
a[++i] will access the element indexed i+1.
In both cases, after the execution of a[i++] or a[++i],
i will be i+1.
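A tiny C# sketch of those two accesses (the array contents are invented):

using System;

int[] a = { 10, 20, 30 };

int i = 0;
Console.WriteLine(a[i++]);  // prints 10, the element indexed 0; i is 1 afterwards
i = 0;
Console.WriteLine(a[++i]);  // prints 20, the element indexed 1; i is 1 afterwards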
This can cause trouble in C and C++, because there you can't make assumptions about the order in which function arguments are evaluated (C# defines it as left-to-right):
function(i++, i++, i++)
will increment i 3 times, but you don't know in which order. If i is initially 4 you could get
function(4,5,6)
but also function(6,5,4)
or also function(6,4,5).
And that is still nothing, because I used native types in the example (for example int); things get worse when you have classes.
Overloading the operator changes neither its precedence nor its prefix/postfix behaviour, and this too can cause trouble.
So in one case "++" is applied before the value is used, in the other case it is applied "after" the value is used. And when you overload it, it is probably better to have it applied before the value is used (so ++something is much better than something++, at least from an overloading point of view).
Take a C++ class with an overloaded ++ (of which we have 2 instances, foo and bar):
foo = bar++; // calls bar.operator++(int): bar is incremented, foo gets a copy of the old value
foo = ++bar; // calls bar.operator++(): bar is incremented first, foo gets the new value
and there's much difference. Especially when you just don't assign your reference but do something more complex with it, or internally your object has stuff that has to do with shallow-copies VS deep copies.
It's probably a very lame question, but I found no references in the C# specification about round brackets. Please point me to the spec or MSDN if the answer to this question is obvious.
What is the inner difference between (MyType)SomeObj.Property1 and (MyType)(SomeObj.Property1) in C# ?
AFAIK, in the first case (the (MyType)SomeObj.Property1 cast) it would be a reference of the concrete type (MyType) to Property1. In the second case, such a reference would execute the get accessor SomeObj.get_Property1.
And that could eventually lead to subtle errors if the get accessor has any side effects (and it often does).
Could anyone point me to the exact documentation where such behaviour is specified?
Updated: Thank you for pointing this out. And I deeply apologize for such a dumb question - after posting it, I found a typo in the example I was fiddling with, and realized that the second case's behaviour was based not on the code I had tried to compile, but on previously compiled, completely different code. So my question was initially based on my own blindness...
They are equivalent. This is determined by the operator precedence rules in the C# language, chapter 7.2.1 in the C# Language Specification:
The . operator is at the top of that list; the cast operator is 2nd. The . operator "wins". You will have to use parentheses if you need the cast applied first, because Property1 is a property of the MyType class:
((MyType)SomeObj).Property1
There is absolutely no difference. The . operator binds more tightly than the typecast operator, so the extra parentheses make no difference. See here for details of operator precedence; the operators in question are in the first two groups.
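To make that concrete, a hedged sketch (the Container and MyType types are invented):

using System;

class MyType { }

class Container
{
    public object Property1 { get; } = new MyType();
}

class Demo
{
    static void Main()
    {
        var someObj = new Container();

        // Identical: member access binds tighter than the cast, so in both
        // cases the cast applies to the value returned by the getter.
        MyType a = (MyType)someObj.Property1;
        MyType b = (MyType)(someObj.Property1);
        Console.WriteLine(ReferenceEquals(a, b));  // True: same object either way

        // Parentheses change the meaning only when you cast the receiver itself
        // before the member access, e.g. (hypothetically, if someObj were typed
        // as object and Property1 were declared on MyType):
        // var c = ((MyType)someObj).Property1;
    }
}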