Why are casting and conversion operations are syntactically indistinguishable?

Why are casting and conversion operations are syntactically indistinguishable? - c#

Stack Overflow has several questions about casting boxed values: 1, 2.
The solution requires first to unbox the value and only after that cast it to another type. Nevertheless, boxed value "knows" its own type, and I see no reason why conversion operator could not be called.
Moreover, the same issue is valid for reference types:
void Main()
{
object obj = new A();
B b = (B)obj;
}
public class A
{
}
public class B {}
This code throws InvalidCastException. So it's not the matter of value vs reference type; it's how compiler behaves.
For the upper code it emits castclass B, and for the code
void Main()
{
A obj = new A();
B b = (B)obj;
}
public class A
{
public static explicit operator B(A obj)
{
return new B();
}
}
public class B
{
}
it emits call A.op_Explicit.
Aha! Here compiler sees that A has an operator and calls it. But what then happens if B inherits from A? Not so fast, compiler is quite clever...it just says:
A.explicit operator B(A)': user-defined conversions to or from a
derived class are not allowed
Ha! No ambiguity!
but why on Earth did they allow two rather different operations to look the same?! What was the reason?

Your observation is, as far as I can tell, the observation that I made here:
http://ericlippert.com/2009/03/03/representation-and-identity/
There are two basic usages of the cast operator in C#:
(1) My code has an expression of type B, but I happen to have more information than the compiler does. I claim to know for certain that at runtime, this object of type B will actually always be of derived type D. I will inform the compiler of this claim by inserting a cast to D on the expression. Since the compiler probably cannot verify my claim, the compiler might ensure its veracity by inserting a run-time check at the point where I make the claim. If my claim turns out to be inaccurate, the CLR will throw an exception.
(2) I have an expression of some type T which I know for certain is not of type U. However, I have a well-known way of associating some or all values of T with an “equivalent” value of U. I will instruct the compiler to generate code that implements this operation by inserting a cast to U. (And if at runtime there turns out to be no equivalent value of U for the particular T I’ve got, again we throw an exception.)
The attentive reader will have noticed that these are opposites. A neat trick, to have an operator which means two contradictory things, don’t you think?
So apparently you are one of the "attentive readers" I called out who have noticed that we have one operation that logically means two rather different things. This is a good observation!
Your question is "why is that the case?" This is not a good question! :-)
As I have noted many times on this site, I cannot answer "why" questions satisfactorily. "Because that's what the specification says" is a correct answer but unsatisfactory. Really what the questioner is usually looking for is a summary of the language design process.
When the C# language design team designs features the debates can go on for literally months, they can involve a dozen people discussing many different proposals each with their own pros and cons, that generate hundreds of pages of notes. Even if I had the relevant information from the late 1990s meetings about cast operations, which I don't, it seems hard to summarize it concisely in a manner that would be satisfactory to the original questioner.
Moreover, in order to satisfactorily answer this question one would of course have to discuss the entire historical perspective. C# was designed to be immediately productive for existing C, C++ and Java programmers, and so it borrows many of the conventions of these languages, including its basic mechanisms for conversion operators. In order to properly answer the question we would have to discuss the history of the cast operator in C, C++ and Java as well. This seems like far too much information to expect in an answer on StackOverflow.
Frankly, the most likely explanation is that this decision was not the result of long debate between the merits of different positions. Rather, it's likely the language design team considered how it is done in C, C++ and Java, made a reasonable compromise position that didn't look too terrible, and moved on to other more interesting business. A proper answer would therefore be almost entirely historical; why did Ritchie design the cast operator like he did for C? I don't know, and we can't ask him.
My advice to you is that you stop asking "why?" questions about the history of programming language design and instead ask specific technical questions about specific code, questions that have a short answers.

Conversion operators are essentially "glorified method calls", so the compiler (as opposed to the runtime) already needs to know that you want to use the conversion operator and not a typecast. Basically the compiler needs to check whether a conversion exists to be able to generate the appropriate bytecode for it.
Your first code sample essentially looks like "convert from object to B", as the compiler has no idea that variable can only contain an A. According to the rules that means the compiler must emit a typecast operation.
Your second code sample is obvious to the compiler, because "convert from A to B" can be done with the conversion operator.

Related

What is wrong with inferred type of template and implicit type conversion?

I am reading "The D Programming Language" by Andrei Alexandrescu and one sentence puzzled me. Consider such code (p.138):
T[] find(T)(T[] haystack, T needle) {
while (haystack.length > 0 && haystack[0] != needle) {
haystack = haystack[1 .. $];
}
return haystack;
}
and call (p.140):
double[] a = [ 1.0, 2.5, 2.0, 3.4 ];
a = find(a, 2); // Error! ' find(double[], int)' undefined
Explanation (paragraph below the code):
If we squint hard enough, we do see that the intent of the caller in this case was to have T = double and benefit from the nice implicit conversion from int to double. However, having the language attempt combinatorially at the same time implicit conversions and type deduction is a dicey proposition in the general case, so D does not attempt to do all that.
I am puzzled because such language as C# tries to infer the type — if it cannot do this, user gets error, but if it can, well, it works. C# lives with it for several years and I didn't hear any story how this feature ruined somebody's day.
And so my questions is this — what dangers are involved with inferring types as in the example above?
I can see only advantages, it is easy to write generic function and it is easy to call it. Otherwise you would have to introduce more parameters in generic class/function and write special constrains expressing allowed conversions only for the sake of inferring types.

The first thing to note is that it doesn't say that there's a problem with type deduction, it's that there's a problem with type deduction and implicit conversion at the same time.
So, if you have:
a = find(a, 2.0);
Then it's happy to deduce double as the type.
And if explicitly typed for double, it's happy to deduce that 2 should be implicitly cast to 2.0.
What it's not going to do is both at the same time.
Now, C# does do that. And I think for the most part I agree with you, it's generally convenient and generally works well enough.
It is true at the same time that it can be confusing, especially in cases where it leads to more than one equally reasonable overload.
Why type inference and implicit operator is not working in the following situations? and Why does Assert.AreEqual on custom struct with implicit conversion operator fail? are both questions about why a particular combination of implicit conversion and type inference didn't work.
Unexpected effect of implicit cast on delegate type inference has other factors, but again just refusing to consider the method in the list of possible matches because the argument type didn't match would mean the problem didn't happen.
They'd both be a lot simpler if the answer was always "because you are doing implicit conversion and type inference at the same time, and that's not supported". Which is what the answer would be with D.
On the other hand though, such problems don't arise that often, so I still favour the C# design decision to allow for both, but that there are some problems means it's still a reasonable design decision to not allow them.

Does C# support type inference of the return type?

This is just a curiousity about if there is a fundamental thing stopping something like this (or correct me if there's already some way):
public TTo Convert<TTo, TFrom>(TFrom from)
{
...
}
Called like this:
SomeType someType = converter.Convert(someOtherType);

Because what would happen if you did this?
static void M(int x){}
static void M(double x){}
static T N<T>() {}
...
M(N());
Now what is T? int or double?
It's all very easy to solve the problem when you know what the type you're assigning to is, but much of the time the type you're assigning to is the thing you're trying to figure out in the first place.
Reasoning from inside to outside is hard enough. Reasoning from outside to inside is far more difficult, and doing both at the same time is extremely difficult. If it is hard for the compiler to understand what is going on, imagine how hard it is for the human trying to read, understand and debug the code when inferences can be made both from and to the type of the context of an expression. This kind of inference makes programs harder to understand, not easier, and so it would be a bad idea to add it to C#.
Now, that said, C# does support this feature with lambda expressions. When faced with an overload resolution problem in which the lambda can be bound two, three, or a million different ways, we bind it two, three or a million different ways and then evaluate those million different possible bindings to determine which one is "the best". This makes overload resolution at least NP-HARD in C#, and it took me the better part of a year to implement. We were willing to make that investment because (1) lambdas are awesome, and (2) most of the time people write programs that can be analyzed in a reasonable amount of time and can be understood by humans. So it was worth the cost. But in general, that kind of advanced analysis is not worth the cost.

C# expressions always* have a fixed type, regardless of their surroundings.
You're asking for an expression whose type is determined by whatever it's assigned to; that would violate this principle.
*) except for lambda expressions, function groups, and the null literal.

Unlike Java, in C# type reference doesn't base on the return type. And don't ask me why, Eric Lippert had answered these "why can't C# ..." questions:
because no one ever designed, specified, implemented, tested,
documented and shipped that feature

Why are most types in C# inherited from System.Object? [duplicate]

This question already has answers here:
Why does every class in .NET inherit from Object?
(9 answers)
Closed 9 years ago.
I was checking the int and float types in C# and even they have the "ToString" etc, methods meaning they are inherited from System.Object. But Doesn't this cause a performance hit? I understand that they did not make base types like int objects in java because of performance. Doesn't this rule apply to .NET as well? And if it does, then does that mean .NET is slower than Java? But practically that's not true because the programs i have made in C# run way better than those I made in Java. So is there something I don't understand here?

It's very important to understand the differences between value types and reference types. The core of the difference is what the value of an expression of the type is.
Consider:
int x = 10;
SomeClass y = new SomeClass();
Here, the value of x really is 10 - the bits for 10 end up in the memory associated with the variable x.
The value of y is a reference - a way of getting to a separate object in memory.
The difference becomes very important when you use assignment, particularly with mutable reference types:
int x1 = 10;
int x2 = x1;
SomeClass y1 = new SomeClass();
SomeClass y2 = y1;
y1.SomeProperty = "Fred";
Console.WriteLine(y2.SomeProperty);
In both cases, the value of the variable is copied in the assignment - so x2's value is 10; y2's value is a reference to the same object. So when the object's data is modified via the property in the penultimate, you can still see that difference via y2.
You would rarely write z1.SomeProperty = ... when z1 is a variable of a value type, as most value types are immutable. (You can't change the value of the data itself - you have to assign a new value to the variable explicitly.) If you did, however you wouldn't see any changes via variables which were previously initialized using an assignment z1 because the value would have been copied in that assignment.
I have an article on value types and reference types which goes into all this in more detail.
Now, C# and .NET have a unified type system such that even value types inherit from System.Object. That means you can call all the messages from Object on value type values. Sometimes that requires boxing (converting a value type value into an Object) and sometimes it doesn't... I won't go into all the rules right now. Importantly, if a value type overrides an object method (e.g. ToString, or GetHashCode) the value doesn't need to be boxed to call the method.
While boxing does have a performance penalty, it's typically overstated in my experience. These days with generics, boxing isn't really needed as much as it used to be - but so long as you only use it sensibly, it's unlikely to become a significant performance problem in your application.
EDIT: Regarding games, performance, and learning things a bit at a time...
I've seen lots of people asking questions on a relatively advanced topic without understanding the basics. There's nothing wrong with being a beginner, obviously, but in my view it's really important to learn the basics first, in a "newbie friendly" environment - which I don't think games count as.
Personally I find that console apps are the easiest way of learning most core new concepts - whether that's language features, file IO, collections, web services, threading etc. Obviously for things like "learning Windows Forms" you can't do that with a console app - but it really helps if nothing other than the Windows Forms part is new to you.
So I would strongly advise that you learn the basics of C# - things like how value types and reference types work, how parameter passing works, generics, classes, inheritance etc - before you move onto games. That may sound like extra work, but it's likely to save you time later on. When something doesn't behave as you expect it to, you'll know how the language works, so you can focus on the API behaving differently, etc. Trying to learn one thing at a time makes the whole process smoother, in my experience.
In particular, the performance requirements of games mean that sometimes it's worth writing very non-idiomatic C#. Things like preallocating objects and reusing them where you'd normally create new objects... even creating mutable structs which I'd pretty much never do in normal development. In the critical game loop, boxing or creating any objects at all may be a bad idea - but that doesn't mean these things are "expensive" in the normal frame of reference. It's really important that you understand when these things are appropriate, and when they're not... and if you start with games development, you'll get an imbalanced view of these things, IMO. You'll also potentially try to micro-optimize areas where you really don't need to - but if you have a solid grounding in C# and .NET to start with, you'll be in a better position to get everything in perspective.

As long as all objects appear to be derived from System.Object, then for all practical purposed they are derived from System.Object. Underneath the hood things are optimized into perfection so that an int is just a four-byte int most of the time but it's an object when it needs to be an object. That's what boxing is all about.
The compiler and the run-time both work very, very hard to make sure that primitive types are not overburdened with a lot of excess baggage in size or function. At the same time, all rules needed to meet the C# specification and the object hierarchy are ensured. Sometimes the compiler takes shortcuts, sometimes the run-time does the work.

One advantage of having a common base class means that you could write a method like
public void DoSomething(Object object)
{
....
}
and essentially pass in anything.
Not sure about the performance aspect though.

EDIT: To clarify my position on boxing: Jon said it well when he asked me to point out: "[boxing is] as expensive as creating any other small object". So, I don't mean to overstate the performance impact. However, I do intent to arm intelligent readers with the information they need to make smart decisions based on their individual circumstances.
EDIT: OK, so if you want to be technical, I retract my previous statement. From C# ECMA standard: "All value types implicitly inherit from class object. It is not possible for any type to derive from a value type, and value types are thus implicitly sealed (§17.1.1.2)." (C# ECMA Standard Page 130) HOWEVER... It's my understanding that the thrust of the OP's question is really in relation to PERFORMANCE and how the types are treated under the hood in .NET. To that point: Value Types ARE treated differently from Reference Types - and simple types (int, float, etc) are stored and operated upon in an efficient manner. When they ARE treated like Objects, you pay an expensive performance cost (as the OP suspects). The moral of this story is, for me, and hopefully maybe someone else, is to AVOID BOXING - which, in practical terms, means Value Types ARE different from Object... take my answer as you may.
You are mistaken. System.Integer and System.Float are not subclasses of System.Object. They are what's referred to as "Value Types".
you can see this is true in the documentation: http://msdn.microsoft.com/en-us/library/system.int32.aspx which shows that Int is a "struct".
This article discusses the topic in detail: http://msdn.microsoft.com/en-us/library/34yytbws%28v=vs.71%29.aspx
You should definitely read that last one if you're interested in this.
In regards to your broader question, "Why are most types in C# inherited from System.Object", you'll find there is nothing unique about this. Java has java.lang.Object, Objective-C has NSObject, etc. etc. The distinction between value types and reference types is nearly universal. I'm not sure I really need to get into the long-winded answer to that here, because I think pointing out the difference, and the article about value types in C#, has already probably answered your question.
To clarify everyone else's questions about boxing, etc.: http://msdn.microsoft.com/en-us/magazine/cc301569.aspx

What is the difference between casting and conversion? [duplicate]

This question already has answers here:
Is casting the same thing as converting?
(11 answers)
Closed 9 years ago.
Eric Lippert's comments in this question have left me thoroughly confused. What is the difference between casting and conversion in C#?

Casting is a way of telling the compiler "Object X is really Type Y, go ahead and treat it as such."
Conversion is saying "I know Object X isn't Type Y, but there exists a way of creating a new Object from X of Type Y, go ahead and do it."

I believe what Eric is trying to say is:
Casting is a term describing syntax (hence the Syntactic meaning).
Conversion is a term describing what actions are actually taken behind the scenes (and thus the Semantic meaning).
A cast-expression is used to convert
explicitly an expression to a given
type.
And
A cast-expression of the form (T)E,
where T is a type and E is a
unary-expression, performs an explicit
conversion (§13.2) of the value of E
to type T.
Seems to back that up by saying that a cast operator in the syntax performs an explicit conversion.

I am reminded of the anecdote told by Richard Feynman where he is attending a philosophy class and the professor askes him "Feynman, you're a physicist, in your opinion is an electron an 'essential object'?" So Feynman asks the clarifying question "is a brick an essential object?" to the class. Every student has a different answer to that question. They say that the fundamental abstract notion of "brickness" is the essential object. No, one specific, unique brick is the essential object. No, the parts of the brick you can empirically observe is the essential object. And so on.
Which is of course not to answer your question.
I'm not going to go through all these dozen answers and debate with their authors about what I really meant. I'll write a blog article on the subject in a few weeks and we'll see if that throws any light on the matter.
How about an analogy though, a la Feynman. You wish to bake a loaf of banana bread Saturday morning (as I do almost every Saturday morning.) So you consult The Joy of Cooking, and it says "blah blah blah... In another bowl, whisk together the dry ingredients. ..."
Clearly there is a strong relationship between that instruction and your actions tomorrow morning, but equally clearly it would be a mistake to conflate the instruction with the action. The instruction consists of text. It has a location, on a particular page. It has punctuation. Were you to be in the kitchen whisking together flour and baking soda, and someone asked "what's your punctuation right now?", you'd probably think it was an odd question. The action is related to the instruction, but the textual properties of the instruction are not properties of the action.
A cast is not a conversion in the same way that a recipe is not the act of baking a cake. A recipe is text which describes an action, which you can then perform. A cast operator is text which describes an action - a conversion - which the runtime can then perform.

From the C# Spec 14.6.6:
A cast-expression is used to convert
explicitly an expression to a given
type.
...
A cast-expression of the form (T)E,
where T is a type and E is a
unary-expression, performs an explicit
conversion (§13.2) of the value of E
to type T.
So casting is a syntactic construct used to instruct the compiler to invoke explicit conversions.
From the C# Spec §13:
A conversion enables an expression of
one type to be treated as another
type. Conversions can be implicit or
explicit, and this determines whether
an explicit cast is required.
[Example: For instance, the conversion
from type int to type long is
implicit, so expressions of type int
can implicitly be treated as type
long. The opposite conversion, from
type long to type int, is explicit, so
an explicit cast is required.
So conversions are where the actual work gets done. You'll note that the cast-expression quote says that it performs explicit conversions but explicit conversions are a superset of implicit conversions, so you can also invoke implicit conversions (even if you don't have to) via cast-expressions.

Just my understanding, probably much too simple:
When casting the essential data remains intact (same internal representation) - "I know this is a dictionary, but you can use it as a ICollection".
When converting, you are changing the internal representation to something else - "I want this int to be a string".

After reading Eric's comments, an attempt in plain english:
Casting means that the two types are actually the same at some level. They may implement the same interface or inherit from the same base class or the target can be "same enough" (a superset?) for the cast to work such as casting from Int16 to Int32.
Converting types then means that the two objects may be similar enough to be converted. Take for example a string representation of a number. It is a string, it cannot simply be cast into a number, it needs to be parsed and converted from one to the other, and, the process may fail. It may fail for casting as well but I imagine that's a much less expensive failure.
And that's the key difference between the two concepts I think. Conversion will entail some sort of parsing, or deeper analysis and conversion of the source data. Casting does not parse. It simply attempts a match at some polymorphic level.

Casting is the creation of a value of one type from another value of another type. Conversion is a type of casting in which the internal representation of the value must also be changed (rather than just its interpretation).
In C#, casting and converting are both done with a cast-expression:
( type ) unary-expression
The distinction is important (and the point is made in the comment) because only conversions may be created by a conversion-operator-declarator. Therefore, only (implicit or explicit) conversions may be created in code.
A non-conversion implicit cast is always available for subtype-to-supertype casts, and a non-conversion explicit cast is always available for supertype-to-subtype casts. No other non-conversion casts are allowed.

In this context, casting means that you are exposing an object of a given type for manipulation as some other type, conversion means that you are actually changing an object of a given type to an object of another type.

This page of the MSDN C# documentation suggests that a cast is specific instance of conversion: the "explicit conversion." That is, a conversion of the form x = (int)y is a cast.
Automatic data type changes (such as myLong = myInt) are the more generic "conversion."

A cast is an operator on a class/struct. A conversion is a method/process on one or the other of the affected classes/structs, or may be in a complete different class/struct (i.e. Converter.ToInt32()
Cast operators come in two flavors: implicit and explicit
Implicit cast operators indicate that data of one type (say, Int32) can always be represented as another type (decimal) without loss of data/precision.
int i = 25;
decimal d = i;
Explicit cast operators indicate that data of one type (decimal) can always be faithfully represented as another type (int), but there may be loss of data/precision. Therefor the compiler requires you to explicitly state that you are aware of this and want to do it anyway, through use of the explicit cast syntax:
decimal d = 25.0001;
int i = (int)d;
Conversion takes two types that are not necessarily related in any way, and attempts to convert one into the other through some process, such as parsing. If all known conversion algorithms fail, the process may either throw an exception or return a default value:
string s = "200";
int i = Converter.ToInt32(s); // set i to 200 by parsing s
string s = "two hundred";
int i = Converter.ToInt32(s); // sets i to 0 because the parse fails
Eric's references to syntactic conversion vs. symantic conversion are basically an operator vs. methodology distinction.

A cast is syntactical, and may or may not involve a conversion (depending on the type of cast). As you know, C++ allows specifying the type of cast you want to use.
Casting up/down the hierarchy may or may not be considered conversion, depending on who you ask (and what language they're talking about!)
Eric (C#) is saying that casting to a different type always involves a conversion, though that conversion may not even change the internal representation of the instance.
A C++-guy will disagree, since a static_cast might not result in any extra code (so the "conversion" is not actually real!)

Casting and Conversion are basically the same concept in C#, except that a conversion may be done using any method such as Object.ToString(). Casting is only done with the casting operator (T) E, that is described in other posts, and may make use of conversions or boxing.
Which conversion method does it use? The compiler decides based on the classes and libraries provided to the compiler at compile-time. If an implicit conversion exists, you are not required to use the casting operator. Object o = String.Empty. If only explicit conversions exist, you must use the casting operator. String s = (String) o.
You can create explicit and implicit conversion operators in your own classes. Note: conversions can make the data look very similar or nothing like the original type to you and me, but it's all defined by the conversion methods, and makes it legal to the compiler.
Casting always refers to the use of the casting operator. You can write
Object o = float.NaN;
String s = (String) o;
But if you access s, in for example a Console.WriteLine, you will receive a runtime InvalidCastException. So, the cast operator still attempts to use conversion at access time, but will settle for boxing during assignment.

INSERTED EDIT#2: isn't it hilariously inconsistent myopia that since I provided this answer, the question has been marked as duplicate of a question which asks, "Is casting the same thing as converting?". And the answers of "No" are overwhelmingly upvoted. Yet my answer here which points out the generative essence for why casts are not the same as conversion is overwhelmingly downvoted (yet I have one +1 in the comments). I suppose that readers have a difficult time with comprehending that casts apply at the denotational syntax/semantics layer and conversions apply at the operational semantics layer. For example, a cast of a reference (or pointer in C/C++)—referring to a boxed data type—to another data type, doesn't (in all languages and scenarios) generate a conversion of the boxed data. For example, in C float a[1]; int* p = (int*)&a; doesn't insure that *p refers to int data.
A compiler compiles from denotational semantics to operational semantics. The compilation is not bijective, i.e. it isn't guaranteed to uncompile (e.g. Java, LLVM, asm.js, or C# bytecode) back to any denotational syntax which compiles to that bytecode (e.g. Scala, Python, C#, C via Emscripten, etc). Thus the two layers not the same.
Thus most obviously a 'cast' and a 'conversion' are not the same thing. My answer here is pointing out that the terms apply to two different layers of semantics. Casts apply to the semantics of what the denotational layer (input syntax of the compiler) knows about. Conversions apply to the semantics of what the operational (runtime or intermediate bytecode) layer knows about. I used the standard term of 'erased' to describe what happens to denotational semantics that aren't explicitly recorded in the operational semantics layer.
For example, reified generics are an example of recording denotational semantics in the operational semantics layer, but they have the disadvantage of making the operational semantics layer incompatible with higher-order denotational semantics, e.g. this is why it was painful to consider implementing Scala's higher-kinded generics on C#'s CLR because C#'s denotational semantics for generics was hard-coded at the operational semantics layer.
Come on guys, stop downvoting someone who knows a lot more than you do. Do your homework first before you vote.
INSERTED EDIT: Casting is an operation that happens at the denotational semantics layer (where types are expressed in their full semantics). A cast may (e.g. explicit conversion) or may not (e.g. upcasting) cause a conversion at the runtime semantic layer. The downvotes on my answer (and the upvoting on Marc Gavin's comment) indicates to me that most people don't understand the differences between denotational semantics and operational (execution) semantics. Sigh.
I will state Eric Lippert's answer more simply and more generally for all languages, including C#.
A cast is syntax so (like all syntax) is erased at compile-time; whereas, a conversion causes some action at runtime.
That is a true statement for every computer language that I am aware of in the entire universe. Note that the above statement does not say that casting and conversions are mutually exclusive.
A cast may cause a conversion at runtime, but there are cases where it may not.
The reason we have two distinct words, i.e. cast and conversion, is we need a way to separately describe what is happening in syntax (the cast operator) and at runtime (conversion, or type check and possible conversion).
It is important that we maintain this separation-of-concepts, because in some programming languages the cast never causes a conversion. Also so that we understand implicit casting (e.g. upcasting) is happening only at compile-time. The reason I wrote this answer is because I want to help readers understand in terms of being multilingual with computer languages. And also to see how that general definition correctly applies in the C# case as well.
Also I wanted to help readers see how I generalize concepts in my mind, which helps me as computer language designer. I am trying to pass along the gift of a very reductionist, abstract way of thinking. But I am also trying to explain this in a very practical way. Please feel free to let me know in the comments if I need to improve the elucidation.
Eric Lippert wrote:
A cast is not a conversion in the same way that a recipe is not the
act of baking a cake. A recipe is text which describes an action,
which you can then perform. A cast operator is text which describes an
action - a conversion - which the runtime can then perform.
The recipe is what is happening in syntax. Syntax is always erased, and replaced with either nothing or some runtime code.
For example, I can write a cast in C# that does nothing and is entirely erased at compile-time when it is does not cause a change in the storage requirements or is upcasting. We can clearly see that a cast is just syntax, that makes no change to the runtime code.
int x = 1;
int y = (int)x;
Giraffe g = new Giraffe();
Animal a = (Animal)g;
That can be used for documentation purposes (yet noisy), but it is essential in languages that have type inference, where a cast is sometimes necessary to tell the compiler what type you wish it to infer.
For an example, in Scala a None has the type Option[Nothing] where Nothing is the bottom type that is the sub-type of all possible types (not super-type). So sometimes when using None, the type needs to be casted to a specific type, because Scala only does local type inference, thus can't always infer the type you intended.
// (None : Option[Int]) casts None to Option[Int]
println(Some(7) <*> ((None : Option[Int]) <*> (Some(9) > add)))
A cast could know at compile-time that it requires a type conversion, e.g. int x = (int)1.5, or could require a type check and possible type conversion at runtime, e.g. downcasting. The cast (i.e. the syntax) is erased and replaced with the runtime action.
Thus we can clearly see that equating all casts with explicit conversion, is an error of implication in the MSDN documentation. That documentation is intending to say that explicit conversion requires a cast operator, but it should not be trying to also imply that all casts are explicit conversions. I am confident that Eric Lippert can clear this up when he writes the blog he promised in his answer.
ADD: From the comments and chat, I can see that there is some confusion about the meaning of the term erased.
The term 'erased' is used to describe information that was known at compile-time, which is not known at runtime. For example, types can be erased in non-reified generics, and it is called type erasure.
Generally speaking all the syntax is erased, because generally CLI is not bijective (invertible, and one-to-one) with C#. You cannot always go backwards from some arbitrary CLI code back to the exact C# source code. This means information has been erased.
Those who say erased is not the right term, are conflating the implementation of a cast with the semantic of the cast. The cast is a higher-level semantic (I think it is actually higher than syntax, it is denotational semantics at least in case of upcasting and downcasting) that says at that level of semantics that we want to cast the type. Now how that gets done at runtime is entirely different level of semantics. In some languages it might always be a NOOP. For example, in Haskell all typing information is erased at compile-time.

Why is C# statically typed?

I am a PHP web programmer who is trying to learn C#.
I would like to know why C# requires me to specify the data type when creating a variable.
Class classInstance = new Class();
Why do we need to know the data type before a class instance?

As others have said, C# is static/strongly-typed. But I take your question more to be "Why would you want C# to be static/strongly-typed like this? What advantages does this have over dynamic languages?"
With that in mind, there are lots of good reasons:
Stability Certain kinds of errors are now caught automatically by the compiler, before the code ever makes it anywhere close to production.
Readability/Maintainability You are now providing more information about how the code is supposed to work to future developers who read it. You add information that a specific variable is intended to hold a certain kind of value, and that helps programmers reason about what the purpose of that variable is.
This is probably why, for example, Microsoft's style guidelines recommended that VB6 programmers put a type prefix with variable names, but that VB.Net programmers do not.
Performance This is the weakest reason, but late-binding/duck typing can be slower. In the end, a variable refers to memory that is structured in some specific way. Without strong types, the program will have to do extra type verification or conversion behind the scenes at runtime as you use memory that is structured one way physically as if it were structured in another way logically.
I hesitate to include this point, because ultimately you often have to do those conversions in a strongly typed language as well. It's just that the strongly typed language leaves the exact timing and extent of the conversion to the programmer, and does no extra work unless it needs to be done. It also allows the programmer to force a more advantageous data type. But these really are attributes of the programmer, rather than the platform.
That would itself be a weak reason to omit the point, except that a good dynamic language will often make better choices than the programmer. This means a dynamic language can help many programmers write faster programs. Still, for good programmers, strongly-typed languages have the potential to be faster.
Better Dev Tools If your IDE knows what type a variable is expected to be, it can give you additional help about what kinds of things that variable can do. This is much harder for the IDE to do if it has to infer the type for you. And if you get more help with the minutia of an API from the IDE, then you as a developer will be able to get your head around a larger, richer API, and get there faster.
Or perhaps you were just wondering why you have to specify the class name twice for the same variable on the same line? The answer is two-fold:
Often you don't. In C# 3.0 and later you can use the var keyword instead of the type name in many cases. Variables created this way are still statically typed, but the type is now inferred for you by the compiler.
Thanks to inheritance and interfaces sometimes the type on the left-hand side doesn't match the type on the right hand side.

It's simply how the language was designed. C# is a C-style language and follows in the pattern of having types on the left.
In C# 3.0 and up you can kind of get around this in many cases with local type inference.
var variable = new SomeClass();
But at the same time you could also argue that you are still declaring a type on the LHS. Just that you want the compiler to pick it for you.
EDIT
Please read this in the context of the users original question
why do we need [class name] before a variable name?
I wanted to comment on several other answers in this thread. A lot of people are giving "C# is statically type" as an answer. While the statement is true (C# is statically typed), it is almost completely unrelated to the question. Static typing does not necessitate a type name being to the left of the variable name. Sure it can help but that is a language designer choice not a necessary feature of static typed languages.
These is easily provable by considering other statically typed languages such as F#. Types in F# appear on the right of a variable name and very often can be altogether ommitted. There are also several counter examples. PowerShell for instance is extremely dynamic and puts all of its type, if included, on the left.

One of the main reasons is that you can specify different types as long as the type on the left hand side of the assignment is a parent type of the type on the left (or an interface that is implemented on that type).
For example given the following types:
class Foo { }
class Bar : Foo { }
interface IBaz { }
class Baz : IBaz { }
C# allows you to do this:
Foo f = new Bar();
IBaz b = new Baz();
Yes, in most cases the compiler could infer the type of the variable from the assignment (like with the var keyword) but it doesn't for the reason I have shown above.
Edit: As a point of order - while C# is strongly-typed the important distinction (as far as this discussion is concerned) is that it is in fact also a statically-typed language. In other words the C# compiler does static type checking at compilation time.

C# is a statically-typed, strongly-typed language like C or C++. In these languages all variables must be declared to be of a specific type.

Ultimately because Anders Hejlsberg said so...

You need [class name] in front because there are many situations in which the first [class name] is different from the second, like:
IMyCoolInterface obj = new MyInterfaceImplementer();
MyBaseType obj2 = new MySubTypeOfBaseType();
etc. You can also use the word 'var' if you don't want to specify the type explicitely.

Why do we need to know the data type
before a class instance?
You don't! Read from right to left. You create the variable and then you store it in a type safe variable so you know what type that variable is for later use.
Consider the following snippet, it would be a nightmare to debug if you didn't receive the errors until runtime.
void FunctionCalledVeryUnfrequently()
{
ClassA a = new ClassA();
ClassB b = new ClassB();
ClassA a2 = new ClassB(); //COMPILER ERROR(thank god)
//100 lines of code
DoStuffWithA(a);
DoStuffWithA(b); //COMPILER ERROR(thank god)
DoStuffWithA(a2);
}
When you'r thinking you can replace the new Class() with a number or a string and the syntax will make much more sense. The following example might be a bit verbose but might help to understand why it's designed the way it is.
string s = "abc";
string s2 = new string(new char[]{'a', 'b', 'c'});
//Does exactly the same thing
DoStuffWithAString("abc");
DoStuffWithAString(new string(new char[]{'a', 'b', 'c'}));
//Does exactly the same thing

C#, as others have pointed out, is a strongly, statically-typed language.
By stating up front what the type you're intending to create is, you'll receive compile-time warnings when you try to assign an illegal value. By stating up front what type of parameters you accept in methods, you receive those same compile-time warnings when you accidentally pass nonsense into a method that isn't expecting it. It removes the overhead of some paranoia on your behalf.
Finally, and rather nicely, C# (and many other languages) doesn't have the same ridiculous, "convert anything to anything, even when it doesn't make sense" mentality that PHP does, which quite frankly can trip you up more times than it helps.

c# is a strongly-typed language, like c++ or java. Therefore it needs to know the type of the variable. you can fudge it a bit in c# 3.0 via the var keyword. That lets the compiler infer the type.

That's the difference between a strongly typed and weakly typed language. C# (and C, C++, Java, most more powerful languages) are strongly typed so you must declare the variable type.

When we define variables to hold data we have to specify the type of data that those variables will hold. The compiler then checks that what we are doing with the data makes sense to it, i.e. follows the rules. We can't for example store text in a number - the compiler will not allow it.
int a = "fred"; // Not allowed. Cannot implicitly convert 'string' to 'int'
The variable a is of type int, and assigning it the value "fred" which is a text string breaks the rules- the compiler is unable to do any kind of conversion of this string.

In C# 3.0, you can use the 'var' keyword - this uses static type inference to work out what the type of the variable is at compile time
var foo = new ClassName();
variable 'foo' will be of type 'ClassName' from then on.

One things that hasn't been mentioned is that C# is a CLS (Common Language Specification) compliant language. This is a set of rules that a .NET language has to adhere to in order to be interopable with other .NET languages.
So really C# is just keeping to these rules. To quote this MSDN article:
The CLS helps enhance and ensure
language interoperability by defining
a set of features that developers can
rely on to be available in a wide
variety of languages. The CLS also
establishes requirements for CLS
compliance; these help you determine
whether your managed code conforms to
the CLS and to what extent a given
tool supports the development of
managed code that uses CLS features.
If your component uses only CLS
features in the API that it exposes to
other code (including derived
classes), the component is guaranteed
to be accessible from any programming
language that supports the CLS.
Components that adhere to the CLS
rules and use only the features
included in the CLS are said to be
CLS-compliant components
Part of the CLS is the CTS the Common Type System.
If that's not enough acronyms for you then there's a tonne more in .NET such as CLI, ILasm/MSIL, CLR, BCL, FCL,

Because C# is a strongly typed language

Static typing also allows the compiler to make better optimizations, and skip certain steps. Take overloading for example, where you have multiple methods or operators with the same name differing only by their arguments. With a dynamic language, the runtime would need to grade each version in order to determine which is the best match. With a static language like this, the final code simply points directly to the appropriate overload.
Static typing also aids in code maintenance and refactoring. My favorite example being the Rename feature of many higher-end IDEs. Thanks to static typing, the IDE can find with certainty every occurrence of the identifier in your code, and leave unrelated identifiers with the same name intact.
I didn't notice if it were mentioned yet or not, but C# 4.0 introduces dynamic checking VIA the dynamic keyword. Though I'm sure you'd want to avoid it when it's not necessary.

Why C# requires me to specify the data type when creating a variable.
Why do we need to know the data type before a class instance?
I think one thing that most answers haven't referenced is the fact that C# was originally meant and designed as "managed", "safe" language among other things and a lot of those goals are arrived at via static / compile time verifiability. Knowing the variable datatype explicitly makes this problem MUCH easier to solve. Meaning that one can make several automated assessments (C# compiler, not JIT) about possible errors / undesirable behavior without ever allowing execution.
That verifiability as a side effect also gives you better readability, dev tools, stability etc. because if an automated algorithm can understand better what the code will do when it actually runs, so can you :)

Statically typed means that Compiler can perform some sort of checks at Compile time not at run time. Every variable is of particular or strong type in Static type. C# is strongly definitely strongly typed.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.