Are there philosophical differences between Smalltalk OOP and Simula OOP?
This question relates indirectly to Java & C# vs. C++. As I understand it, C++ is based on Simula, but Java and C# are more or less from the Smalltalk family.
There are several key differences in 'style' within the broader OOP banner.
In all cases, a statement about a static or dynamic type system means predominantly one or the other; the issue is far from clear cut or clearly defined.
Also, many languages choose to blur the line between the choices, so this is not a list of binary choices by any means.
Polymorphic late binding
or "what does foo.Bar(x) mean?"
The hierarchy of types is flattened to a specific implementation per instance (often done via a vtable), often allowing explicit reference to the base class's implementation.
Conceptually, you look at the most specific type that foo is at the call site. If it has an implementation of Bar for the parameter x, that is called; if not, the parent of foo is chosen and the process is repeated.
Examples: C++/Java/C#; the term "Simula style" is often used.
Pure message passing. The code in foo which handles messages named "Bar" is asked to accept the x. Only the name matters, not any assumptions the call site may have had about exactly what Bar was meant to be. Contrast this with the previous style, in which the method in question was known to be some Bar defined on the type hierarchy known at compile time (though the precise place in the hierarchy is left until runtime).
Examples: Objective-C/Ruby; the term "Smalltalk style" is often used.
The first is often used within statically typed frameworks, where it is an error, checked at compile time, for no such implementation to exist. Further, these languages often differentiate between Bar(x) and Bar(y) if x and y are of different types. This is method overloading, and the resulting methods with the same name are viewed as entirely different.
The second is often used in dynamic languages (which tend to avoid method overloading); as such it is possible that, at runtime, the type of foo has no 'handler' for the message named 'Bar'. Different languages handle this in different ways.
Both can be implemented behind the scenes in the same fashion if desired (often the default for the second, Smalltalk style is to invoke a method, but this is not a defined behaviour in all cases).
Since the former style can frequently be implemented as simple pointer-offset function calls, it can more easily be made relatively fast. This does not mean that the other style cannot also be made fast, but more work may be required to ensure that the greater flexibility is not compromised when doing so.
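As a rough C# illustration of the first style (the class names here are made up, not from the question): the most specific override for the instance is what runs, and the base implementation can still be referenced explicitly.

using System;

class Animal
{
    // Virtual: the implementation is selected per instance,
    // typically through a vtable-like dispatch table.
    public virtual string Speak() { return "..."; }
}

class Dog : Animal
{
    // Fills the slot declared on Animal; can still reach the base version.
    public override string Speak() { return "Woof (base said: " + base.Speak() + ")"; }
}

class Program
{
    static void Main()
    {
        Animal a = new Dog();
        // Static type is Animal, but the most specific implementation runs.
        Console.WriteLine(a.Speak()); // Woof (base said: ...)
    }
}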
Inheritance/Reuse
or "Where do babies come from?"
Class based
Method implementations are organized into groups called classes. When implementation inheritance is desired a class is defined which extends the parent class. In this way it gains all exposed aspects of the parent (both fields and methods) and can choose to alter certain/all of those aspects but cannot remove any. You can add and update but not delete.
Examples: C++/Java/C# (note that both Smalltalk and Simula use this)
Prototype based
Any instance of an object is simply a collection of named methods and state in the form of (again named) fields. Whenever a new instance of this 'type' is desired, an existing instance can be cloned to make a new one. This new instance retains a copy of the state and methods of the original but can then be modified to remove, add or alter the existing named fields and methods.
Examples: Self/JavaScript
Again, the first tends to happen in static languages and the second in dynamic ones, though this is by no means a requirement; they simply lend themselves to those styles.
Interface or Class based
or "what or how?"
Interfaces list the methods that are required. They are a contract.
Examples: VB6
Classes list methods that are required but may optionally supply their implementation
Examples: Simula
This is very much not a binary choice. Most class-based languages allow the concept of abstract methods (ones with no implementation yet). If you have a class where all methods are abstract (called pure virtual in C++), then the class amounts to pretty much an interface, albeit one that may also have defined some state (fields). A true interface should have no state (since it defines only what is possible, not how it happens).
Only older OOP languages tend to rely solely on one or the other.
VB6 relies only on interfaces and has no implementation inheritance.
Simula lets you declare pure virtual classes, but you can still instantiate them (with runtime errors on use).
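A small C# sketch of the distinction, with made-up names: the interface states only what is required, while the (abstract) class may supply some of the implementation and may hold state.

using System;

// Pure contract: no state, no implementation.
interface IShape
{
    double Area();
}

// Class-based: supplies some implementation and state,
// while leaving other methods abstract (pure virtual).
abstract class Shape
{
    public string Name = "unnamed";      // state
    public abstract double Area();       // required, no implementation

    public virtual string Describe()     // supplied, but overridable
    {
        return Name + " with area " + Area();
    }
}

class Circle : Shape
{
    public double Radius;
    public override double Area() { return Math.PI * Radius * Radius; }
}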
Single or Multiple Inheritance
or "Who is the daddy?"
Single
Only one type can be a parent to another. In the Class based form above you can extend (take implementation from) only one type. Typically this form includes the concept of interfaces as first class aspects of the language to make up for this.
advantages include cleaner metadata and introspection, simpler language rules.
complications include making it harder to bring useful methods into scope (things like MixIns and Extension methods seek to mitigate this sort of problem)
Examples: C#/Java
Multiple - you can extend multiple classes
advantages include certain structures are easier to model and design
complications include complex rules for collision resolution, especially when overloaded methods exist which could take either parent type.
Examples: C++/Eiffel
This question provokes considerable debate, especially as it is a key differentiator between C++'s OOP implementation and many of the modern statically typed languages perceived as possible successors, like C# and Java.
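For example, in C# (hypothetical names) a class may extend only one class, but it can implement any number of interfaces to compensate:

using System;

interface IPrintable { void Print(); }
interface ISavable   { void Save(); }

class Document { /* the one and only base class */ }

// Single implementation inheritance, multiple interface inheritance.
class Report : Document, IPrintable, ISavable
{
    public void Print() { Console.WriteLine("printing"); }
    public void Save()  { Console.WriteLine("saving"); }
}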
Mutability
or "what do you want to do to me?"
Mutable
Objects, once created can have their state changed.
Immutable
Objects, once created cannot be changed.
Frequently this is not all or nothing; it is simply a default (most commonly used OOP languages default to mutable). This can have a great deal of effect on how the language is structured. Many primarily functional languages which have added OOP features default their objects to immutable state.
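A small C# sketch of the two defaults (the types are made up): the mutable form is what the language gives you out of the box, while the immutable form has to be opted into, for example with readonly fields.

// Mutable: state can change after construction (the C# default).
class MutablePoint
{
    public int X;
    public int Y;
}

// Immutable: all state is fixed at construction time.
class ImmutablePoint
{
    public readonly int X;
    public readonly int Y;

    public ImmutablePoint(int x, int y) { X = x; Y = y; }

    // "Changing" a value produces a new object instead of mutating this one.
    public ImmutablePoint WithX(int x) { return new ImmutablePoint(x, Y); }
}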
'Pureness' of their OOP
or "Is everything an Object?"
Absolutely everything in the system is viewed as an object (possibly even down to the methods themselves which are simply another kind of object and can be interacted with in the same way other objects can be).
Examples: Smalltalk
Not everything is an object, you cannot pass messages to everything (though the system might jump through hoops to make it seem like you can)
Examples: C++/C#/Java (see note*)
This is quite complex, since techniques like auto-boxing of primitives make it seem like everything is an object, but you will find that several boundary cases exist where this 'compiler magic' is discovered and the proverbial Wizard of Oz is found behind the curtain, resulting in problems or errors.
In languages with immutability as a default this is less likely to happen, since the key aspect of objects (that they contain both methods and state) means that things which are similar to objects but not quite have less scope for complications.
In regards to Java/C#, the auto-boxing (boxing in C#) system lets you treat, syntactically, any variable as if it were an object, but in actuality this is not the case, and this is exhibited in areas such as attempting to lock on a boxed value (rejected by the compiler, as it would be an obvious bug).
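For example, in C# an int can be used syntactically as an object, but the boxing boundary still shows through; trying to lock on a value type is one of those boundary cases and is rejected at compile time. A small sketch:

using System;

class BoxingDemo
{
    static void Main()
    {
        int i = 42;

        // Boxing: the value is copied into a new heap object.
        object boxed = i;
        Console.WriteLine(boxed.ToString()); // "42"

        // The next line would not compile (error CS0185): each use would box i
        // into a different object, so the lock could never work as intended.
        // lock (i) { }
    }
}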
Static or Dynamic
or "Who do you think you are?"
A far more pervasive aspect of language design and not one to get into here but the choices inherent in this decision impact many aspects of OOP as mentioned earlier.
Just the aspects of polymorphic late binding above can depend on:
The type of the object to whom the message is being passed (at compile time/run time)
The type of the parameter(s) which are being passed (at compile time/run time)
The more dynamic a language gets, the more complex these decisions tend to become, but conversely the more input the language user, rather than the language designer, has in the decision.
Giving examples here would be somewhat foolhardy, since statically typed languages may be modified to include dynamic aspects (like C# 4.0).
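For instance, C# 4.0's dynamic keyword defers member binding to run time while the rest of the language stays statically typed. A minimal sketch:

using System;

class DynamicDemo
{
    static void Main()
    {
        dynamic x = "hello";
        // Bound at run time to string.ToUpper().
        Console.WriteLine(x.ToUpper());

        x = 42;
        // Also bound at run time; a missing member would only
        // surface as a RuntimeBinderException when this line runs.
        Console.WriteLine(x.ToString());
    }
}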
I'd put Java and C# in the Simula camp as well:
Smalltalk, being dynamically typed, is quite apart from the four other languages you cite.
Smalltalk is structurally typed (alias duck typing) while the other four are nominally typed.
(What Java and C# have in common with Smalltalk is being mainly based on a VM, but there is little influence on the programming style).
Java and C# are definitely not from the Smalltalk family. Alan Kay even said that when he created OOP he did not have anything like Java or C++ in mind. Java, C#, and C++ all interpret OOP in pretty much the same way.
Languages like Smalltalk and Ruby have a radically different model that is based on message passing. In C++ classes are essentially namespaces for methods and state. Method invocations are bound at compile time. Smalltalk does not bind a "method call" until runtime. The result of this is that in C++
foo->bar
is compiled to mean "call the bar method on the foo object." If bar is non virtual, I'd imagine that the address of the bar method is specifically referenced.
In Smalltalk
foo bar
means "send the message bar to the foo object." foo can do whatever it wants with this message when it arrives. The default behavior is to call the method named bar, but that is not required. This property is exploited in Ruby for ActiveRecord column accessors. When you have an ActiveRecord object and you send it the name of a column in its database table as a message, if there is no method with that name defined, it checks to see if there is a column by that name on the table and if there is returns the value.
Message passing might seem like a tiny, irrelevant detail, but out of it, the rest of OOP easily flows.
"OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I'm not aware of them." -- Alan Kay, creator of Smalltalk
Eiffel is a statically typed, compiled, multiple inheritance pure OOP language.
http://dev.eiffel.com
Of the modern (and I use the term lightly) OO programming languages Objective C is the most like smalltalk.
Messages:
In C++,C# and Java: messages are bound at compile time.
You can think of a method call as a message being sent to the object.
In Objective C,Smalltalk: messages are bound at run time.
I would say statically typed and dynamically typed OOP are two separate disciplines within the same school of OOP.
Java, C#, and C++ all follow a similar OOP strategy. It is based on function calls that are bound at compile time. Depending on the call, either the direct function call or an offset into a vtable is fixed when compilation happens. By contrast, Smalltalk's OOP is based on message passing. Conceptually, every method call is a message to the receiving object asking whether it has a method called "Foo."
Smalltalk has no concept of interfaces. It only has similar looking methods. In the C++ group of languages, everything is bound to interfaces. One cannot implement AddRef and Release without also implementing QueryInterface (even if it is just a stub) because they are all part of the IUnknown interface. In Smalltalk, there is no IUnknown. There is only a collection of 3 functions, any of which could be implemented or not.
I'd say there is also a pretty big difference, conceptually, between class-based OOP (of which Smalltalk, Simula, C# and Java are all examples) and prototype-based OOP (which started with Self and is most widespread in JavaScript).
Aside from the above points, there is also a conceptual breakdown of Smalltalk vs. Simula.
Conceptually, "Smalltalk-style" typically indicates that the method run when a message is called is determined at run time, aiding polymorphism.
"Simula-style", on the other hand, usually seems to indicate where all method calls are really just a convenient way of writing overloaded function calls--no runtime polymorphism. (Please correct me if I'm wrong.)
In the middle, we have Java: all methods virtual by default, but statically typed and has compile-time type dispatch.
Example:
// C++
#include <iostream>

class Base {
public:
    // Deliberately NOT virtual, so calls bind to the static type of the pointer.
    void doSomething() {
        std::cout << "Base::doSomething() called!\n";
    }
};

class Derived : public Base {
public:
    void doSomething() {
        std::cout << "Derived::doSomething() called!\n";
    }
};

int main() {
    Base* b = new Base();
    Derived* d = new Derived();

    b->doSomething();  // prints "Base::doSomething() called!"
    d->doSomething();  // prints "Derived::doSomething() called!"

    Base* d2 = d;      // OK; Liskov substitution principle.
    d2->doSomething(); // prints "Base::doSomething() called!" (!)

    delete b;
    delete d;
    return 0;
}
VS:
// Objective-C (manual reference counting, non-ARC)
// Base.h
#import <Foundation/Foundation.h>
#include <stdio.h>

@interface Base : NSObject
- (void)doSomething;
@end

// Base.m
#import "Base.h"

@implementation Base
- (void)doSomething {
    printf("doSomething sent to Base!\n");
}
@end

// Derived.h
#import "Base.h"

@interface Derived : Base
@end

// Derived.m
#import "Derived.h"

@implementation Derived
- (void)doSomething {
    printf("doSomething sent to Derived!\n");
}
@end

// Main.m
#import "Base.h"
#import "Derived.h"

int main() {
    Base* b = [[Base alloc] init];
    Derived* d = [[Derived alloc] init];

    [b doSomething];  // prints "doSomething sent to Base!"
    [d doSomething];  // prints "doSomething sent to Derived!"

    Base* d2 = d;
    [d2 doSomething]; // prints "doSomething sent to Derived!"

    [b release];
    [d release];
    return 0;
}
So I have a situation where I need different numbers of arguments for a function depending on the end result I desire.
I am new to C# and have heard about overloading the function, which is not something I have seen before (I started in JavaScript).
But it looks a bit dirty, like it is not good practice even though it does work. Is it generally a bad idea to use overloaded functions? I could probably do an alternative with more work, but overloads do make life easier.
It just feels very uncomfortable having more than one method with the same name. Are these considered standard features and acceptable code practice? Or could they lead to some messy problems in the future that my inexperience does not know about yet, and thus I should avoid them?
Function overloads.
This is actually named method overloading. C# does not make a direct distinction between methods which return a value and those that don't, hence "method" rather than "function".
But it looks bit dirty
It is a key component to the language which is a common practice and definitely not frowned upon.
like not a good practice to do even though it does work.
The idea is to provide different variants for a consumer. One consumer may only have type X available while only type Y is offered. By offering more variants, the library and/or instance is more flexible. Plus, it lessens failure points by not making the consumer convert data just to get it into the method.
(i started in JavaScript).
Don't try to program in the style of the language one is accustomed to. Use the specific features of any new language as designed. Trying to do Java in C# or Ruby in C# is foolish. All languages have their design points; program to the language, not to a style of programming.
just feels very uncomfortable having more than one method with the same name.
Coming from a language which is not type safe, that is an understandable reaction. But keep in mind that the compiler is enforcing safety so that widget X is only matched with widget X; it is a true feature and not a gimmick.
Frankly when I see code which does not provide multiple overloads, I view it as either laziness of the developer or some god awful time crunch, hence rushed code.
Don't go overboard...simply provide enough overloads to make the class useable by a majority of consumers.
Or could it lead to some messy problems in the future
If one is not consistent, possibly yes.
So be consistent in the placement of the parameters. If an int starts one method, the other overloads should also start with that same int, if it is offered. Don't mix the order; a sketch of this follows below.
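A small hypothetical sketch of that kind of consistency: the shared leading parameter keeps the same position in every overload.

class Renderer
{
    // Consistent: the int id always comes first; extra options are appended.
    public void Render(int id) { }
    public void Render(int id, string format) { }
    public void Render(int id, string format, bool verbose) { }

    // Avoid mixing the order between overloads, e.g.
    // public void Render(string format, int id) { }   // confusing
}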
Consider the following class:
public class MyClass
{
public void MyMethod(int a, object b)
{
}
}
If someone else calls your class like this:
new MyClass().MyMethod(1, 1);
And then in a future version of your assembly you add an innocent overload:
public class MyClass
{
public void MyMethod(int a, object b)
{
}
public void MyMethod(object a, int b)
{
}
}
That someone else's code will not compile against the new assembly.
You are correct that method overloading can introduce problems... however it is not always problematic.
Suppose the simple case: you have a method that operates on one type, T. If you are tempted to add an overload to handle a second type U, consider what interfaces and base classes T and U might have in common (including T or U extending one another). If there is a common type, consider making that the argument type at design time (if it is specific enough). If not, then you may need a method overload. A good contrived example is a method that returns the square of a number. There is no common abstraction for types that have a * operator (though in C# you can define * for your own types), so you would have to make two methods to handle an int and a double:
public int SquareMe(int x) { return x * x; }
public double SquareMe(double x) { return x * x; }
If, however, you found yourself wanting to make a method operating on List<T>, IEnumerable<T>, and T[], you may be better off writing the method to accept an IEnumerable<T> (and just calling ToArray() on it immediately to prevent the IEnumerable from expanding multiple times if your code needs it multiple times - if you're just foreach'ing it once, there's no need to expand it) this way you're left with only (1) method to write tests for. Every method, particularly on publicly consumed API's is more to maintain, document, test, automate, etc. Simpler is usually better (but complexity has its place, too). It's difficult to give an algorithm for design of API's (if there was an existing algorithm for such a thing, we could just have the design generated as the output from some hypothetical program, yes?)
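A sketch of that suggestion, with hypothetical names: one method takes IEnumerable&lt;T&gt; and materializes it once, instead of separate overloads for List&lt;T&gt;, T[] and so on.

using System.Collections.Generic;
using System.Linq;

static class Stats
{
    // One method instead of overloads for List<double>, double[], etc.
    public static double SumOfSquares(IEnumerable<double> values)
    {
        // Materialize once so the sequence is not enumerated repeatedly.
        double[] items = values.ToArray();

        double sum = 0;
        foreach (double x in items)
            sum += x * x;
        return sum;
    }
}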
When it comes to designing classes and interfaces for public consumption, you should be very careful about method overloading (and your entire API in general; method overloading introducing subtle breaking changes is just one thing to think about, and almost any change could be a breaking change). If your API is used by everyone, as Microsoft's are, all changes to the API have to be very well thought out and have minimal to zero breaking changes.
If it's for "internal" use (and you can detect compilation breaks at build time) then if the compiler's happy, method overloading shouldn't be too big of a deal in and of itself. That being said - someone might call a different overload by accident because of what C# will choose. It's probably more important to have explicit method names (Microsoft recommends spelling things out in C#, generally) that intuitively (i.e. subjectively) match the content of what the method does than the concern of overloading.
Like other things, this language features is a trade off between being explicit and implicit and whether or not it's a good idea varies on the situation; method overloading can be both used and abused. In general try to learn the existing practices, patterns and culture of a new language before developing your own style on things so that you can take advantage of everyone's successes and failures before you. Method overloading definitely has its place in C#.
So, you have a situation that would be made easier by a core feature of the language you are using... and you're concerned about that? I wouldn't worry too much.
It might be an idea to make an attempt and, once you're happy with it, take it over to codereview.stackexchange.com to get some feedback.
If the reason for varying signatures is because 'the end result you desire' varies then that's a case for having different functions.
Overloading is helpful when you have a number of optional parameters. If you have five optional parameters it's less obvious what will happen if you specify some but not others. If you create overloads then you can provide different versions of the function with required parameters. Perhaps behind the scenes they can all call a private method with optional parameters, but that remains hidden from public use.
Why not just use an optional parameter?
void Foo(int op = 42)
{
    if (op != 42)
    {
        // do something with the supplied value
    }
    else
    {
        // handle the default case
    }
}

int x = 33;
Foo();   // op == 42 (the default)
Foo(x);  // op == 33
Why does every class in .NET inherit from Object?
I was checking the int and float types in C# and even they have the ToString etc. methods, meaning they inherit from System.Object. But doesn't this cause a performance hit? I understand that base types like int were not made objects in Java because of performance. Doesn't this rule apply to .NET as well? And if it does, does that mean .NET is slower than Java? But practically that's not true, because the programs I have made in C# run way better than those I made in Java. So is there something I don't understand here?
It's very important to understand the differences between value types and reference types. The core of the difference is what the value of an expression of the type is.
Consider:
int x = 10;
SomeClass y = new SomeClass();
Here, the value of x really is 10 - the bits for 10 end up in the memory associated with the variable x.
The value of y is a reference - a way of getting to a separate object in memory.
The difference becomes very important when you use assignment, particularly with mutable reference types:
int x1 = 10;
int x2 = x1;
SomeClass y1 = new SomeClass();
SomeClass y2 = y1;
y1.SomeProperty = "Fred";
Console.WriteLine(y2.SomeProperty);
In both cases the value of the variable is copied in the assignment, so x2's value is 10 and y2's value is a reference to the same object. So when the object's data is modified via the property on the penultimate line, you can still see that change via y2.
You would rarely write z1.SomeProperty = ... when z1 is a variable of a value type, as most value types are immutable. (You can't change the value of the data itself; you have to assign a new value to the variable explicitly.) If you could, however, you wouldn't see any changes via other variables which were previously initialized from z1, because the value would have been copied in that assignment.
I have an article on value types and reference types which goes into all this in more detail.
Now, C# and .NET have a unified type system, such that even value types inherit from System.Object. That means you can call all the methods declared on Object on value type values. Sometimes that requires boxing (converting a value type value into an object) and sometimes it doesn't; I won't go into all the rules right now. Importantly, if a value type overrides an Object method (e.g. ToString or GetHashCode), the value doesn't need to be boxed to call the method.
While boxing does have a performance penalty, it's typically overstated in my experience. These days with generics, boxing isn't really needed as much as it used to be - but so long as you only use it sensibly, it's unlikely to become a significant performance problem in your application.
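A small illustration of those boxing rules as I understand them (worth verifying against your own compiler):

using System;

class BoxingRules
{
    static void Main()
    {
        int i = 123;

        // No box: int overrides ToString, so it is called directly on the value.
        string s = i.ToString();

        // Boxing: the value is copied into a heap object so it can be
        // treated as an Object reference.
        object boxed = i;

        // Unboxing: copy the value back out.
        int j = (int)boxed;

        Console.WriteLine(s + " " + boxed + " " + j);
    }
}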
EDIT: Regarding games, performance, and learning things a bit at a time...
I've seen lots of people asking questions on a relatively advanced topic without understanding the basics. There's nothing wrong with being a beginner, obviously, but in my view it's really important to learn the basics first, in a "newbie friendly" environment - which I don't think games count as.
Personally I find that console apps are the easiest way of learning most core new concepts - whether that's language features, file IO, collections, web services, threading etc. Obviously for things like "learning Windows Forms" you can't do that with a console app - but it really helps if nothing other than the Windows Forms part is new to you.
So I would strongly advise that you learn the basics of C# - things like how value types and reference types work, how parameter passing works, generics, classes, inheritance etc - before you move onto games. That may sound like extra work, but it's likely to save you time later on. When something doesn't behave as you expect it to, you'll know how the language works, so you can focus on the API behaving differently, etc. Trying to learn one thing at a time makes the whole process smoother, in my experience.
In particular, the performance requirements of games mean that sometimes it's worth writing very non-idiomatic C#. Things like preallocating objects and reusing them where you'd normally create new objects... even creating mutable structs which I'd pretty much never do in normal development. In the critical game loop, boxing or creating any objects at all may be a bad idea - but that doesn't mean these things are "expensive" in the normal frame of reference. It's really important that you understand when these things are appropriate, and when they're not... and if you start with games development, you'll get an imbalanced view of these things, IMO. You'll also potentially try to micro-optimize areas where you really don't need to - but if you have a solid grounding in C# and .NET to start with, you'll be in a better position to get everything in perspective.
As long as all objects appear to be derived from System.Object, then for all practical purposes they are derived from System.Object. Under the hood, things are optimized to perfection, so that an int is just a four-byte int most of the time, but it is an object when it needs to be an object. That's what boxing is all about.
The compiler and the run-time both work very, very hard to make sure that primitive types are not overburdened with a lot of excess baggage in size or function. At the same time, all rules needed to meet the C# specification and the object hierarchy are ensured. Sometimes the compiler takes shortcuts, sometimes the run-time does the work.
One advantage of having a common base class means that you could write a method like
public void DoSomething(object value)
{
    // ... works with any object, including boxed value types
}
and essentially pass in anything.
Not sure about the performance aspect though.
EDIT: To clarify my position on boxing: Jon said it well when he asked me to point out that boxing is "as expensive as creating any other small object". So, I don't mean to overstate the performance impact. However, I do intend to arm intelligent readers with the information they need to make smart decisions based on their individual circumstances.
EDIT: OK, so if you want to be technical, I retract my previous statement. From the C# ECMA standard: "All value types implicitly inherit from class object. It is not possible for any type to derive from a value type, and value types are thus implicitly sealed (§17.1.1.2)." (C# ECMA Standard, page 130.) HOWEVER... it's my understanding that the thrust of the OP's question is really in relation to PERFORMANCE and how the types are treated under the hood in .NET. To that point: value types ARE treated differently from reference types, and simple types (int, float, etc.) are stored and operated upon in an efficient manner. When they ARE treated like Objects, you pay a performance cost (as the OP suspects). The moral of this story, for me and hopefully for someone else, is to AVOID BOXING, which, in practical terms, means value types ARE different from Object... take my answer as you may.
You are mistaken. System.Int32 and System.Single (C#'s int and float) are not ordinary subclasses of System.Object; they are what's referred to as "value types".
You can see this in the documentation: http://msdn.microsoft.com/en-us/library/system.int32.aspx which shows that Int32 is a "struct".
This article discusses the topic in detail: http://msdn.microsoft.com/en-us/library/34yytbws%28v=vs.71%29.aspx
You should definitely read that last one if you're interested in this.
In regards to your broader question, "Why are most types in C# inherited from System.Object", you'll find there is nothing unique about this. Java has java.lang.Object, Objective-C has NSObject, etc. etc. The distinction between value types and reference types is nearly universal. I'm not sure I really need to get into the long-winded answer to that here, because I think pointing out the difference, and the article about value types in C#, has already probably answered your question.
To clarify everyone else's questions about boxing, etc.: http://msdn.microsoft.com/en-us/magazine/cc301569.aspx
If I have various subclasses of something, and an algorithm which operates on instances of those subclasses, and if the behaviour of the algorithm varies slightly depending on what particular subclass an instance is, then the most usual object-oriented way to do this is using virtual methods.
For example if the subclasses are DOM nodes, and if the algorithm is to insert a child node, that algorithm differs depending on whether the parent node is a DOM element (which can have children) or DOM text (which can't): and so the insertChildren method may be virtual (or abstract) in the DomNode base class, and implemented differently in each of the DomElement and DomText subclasses.
Another possibility is give the instances a common property, whose value can be read: for example the algorithm might read the nodeType property of the DomNode base class; or for another example, you might have different types (subclasses) of network packet, which share a common packet header, and you can read the packet header to see what type of packet it is.
I haven't used run-time-type information much, including:
The is and as keywords in C#
Downcasting
The Object.GetType method in .NET
The typeid operator in C++
When I'm adding a new algorithm which depends on the type of subclass, I tend instead to add a new virtual method to the class hierarchy.
My question is, when is it appropriate to use run-time-type information, instead of virtual functions?
When there's no other way around. Virtual methods are always preferred but sometimes they just can't be used. There's couple of reasons why this could happen but most common one is that you don't have source code of classes you want to work with or you can't change them. This often happens when you work with legacy system or with closed source commercial library.
In .NET it might also happen that you have to load new assemblies on the fly, like plugins, and you generally have no base classes and have to use something like duck typing.
In C++, among some other obscure cases (which mostly deal with inferior design choices), RTTI is a way to implement so-called multi methods.
These constructions ("is" and "as") are very familiar to Delphi developers, since event handlers usually downcast objects to a common ancestor. For example, the event OnClick passes the only argument Sender: TObject regardless of the type of the object, whether it is TButton, TListBox or any other. If you want to know something more about this object, you have to access it through "as", but in order to avoid an exception, you can check it with "is" beforehand. This downcasting allows design-time binding of objects and methods that would not be possible with strict class type checking. Imagine you want to do the same thing whether the user clicks a Button or a ListBox; if they provided us with different prototypes of functions, it would not be possible to bind them to the same procedure.
In the more general case, an object can call a function that notifies that the object has, for example, changed. But it leaves the destination the possibility of knowing it "personally" (through as and is), though not necessarily. It does this by passing Self as the most common ancestor of all objects (TObject in Delphi's case).
dynamic_cast<>, if I remember correctly, depends on RTTI. Some obscure outer interfaces might also rely on RTTI when an object is passed through a void pointer (for whatever reason that might happen).
That being said, I haven't seen typeof() in the wild in 10 years of pro C++ maintenance work. (Luckily.)
You can refer to More Effective C# for a case where run-time type checking is OK.
Item 3: Specialize Generic Algorithms Using Runtime Type Checking
You can easily reuse generics by simply specifying new type parameters. A new instantiation with new type parameters means a new type having similar functionality.
All this is great, because you write less code. However, sometimes being more generic means not taking advantage of a more specific, but clearly superior, algorithm. The C# language rules take this into account. All it takes is for you to recognize that your algorithm can be more efficient when the type parameters have greater capabilities, and then to write that specific code. Furthermore, creating a second generic type that specifies different constraints doesn't always work. Generic instantiations are based on the compile-time type of an object, and not the runtime type. If you fail to take that into account, you can miss possible efficiencies.
For example, suppose you write a class that provides reverse-order enumeration over a sequence of items represented through IEnumerable<T>. In order to enumerate it backwards, you may iterate it and copy the items into an intermediate collection with indexer access, like List<T>, and then enumerate that collection backwards using the indexer. But if your original IEnumerable is an IList, why not take advantage of that and provide a more performant way (without copying to an intermediate collection) to iterate the items backwards? It is basically a special case we can take advantage of while still providing the same behaviour (iterating the sequence backwards).
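A rough sketch of that kind of specialization, with names of my own choosing rather than the book's:

using System.Collections.Generic;
using System.Linq;

static class ReverseEnumerable
{
    public static IEnumerable<T> ReverseOf<T>(IEnumerable<T> source)
    {
        // Runtime type check: if the sequence already supports indexing,
        // walk it backwards in place instead of copying it.
        IList<T> list = source as IList<T>;
        if (list != null)
        {
            for (int i = list.Count - 1; i >= 0; i--)
                yield return list[i];
            yield break;
        }

        // General case: copy to an intermediate collection first.
        T[] copy = source.ToArray();
        for (int i = copy.Length - 1; i >= 0; i--)
            yield return copy[i];
    }
}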
But in general you should carefully consider run-time type checking and ensure that it doesn't violate the Liskov Substitution Principle.
I am a PHP web programmer who is trying to learn C#.
I would like to know why C# requires me to specify the data type when creating a variable.
Class classInstance = new Class();
Why do we need to know the data type before a class instance?
As others have said, C# is static/strongly-typed. But I take your question more to be "Why would you want C# to be static/strongly-typed like this? What advantages does this have over dynamic languages?"
With that in mind, there are lots of good reasons:
Stability: Certain kinds of errors are now caught automatically by the compiler, before the code ever makes it anywhere close to production.
Readability/Maintainability: You are now providing more information about how the code is supposed to work to future developers who read it. You add information that a specific variable is intended to hold a certain kind of value, and that helps programmers reason about what the purpose of that variable is.
This is probably why, for example, Microsoft's style guidelines recommended that VB6 programmers put a type prefix with variable names, but that VB.Net programmers do not.
Performance: This is the weakest reason, but late binding/duck typing can be slower. In the end, a variable refers to memory that is structured in some specific way. Without strong types, the program will have to do extra type verification or conversion behind the scenes at runtime as you use memory that is structured one way physically as if it were structured in another way logically.
I hesitate to include this point, because ultimately you often have to do those conversions in a strongly typed language as well. It's just that the strongly typed language leaves the exact timing and extent of the conversion to the programmer, and does no extra work unless it needs to be done. It also allows the programmer to force a more advantageous data type. But these really are attributes of the programmer, rather than the platform.
That would itself be a weak reason to omit the point, except that a good dynamic language will often make better choices than the programmer. This means a dynamic language can help many programmers write faster programs. Still, for good programmers, strongly-typed languages have the potential to be faster.
Better Dev Tools: If your IDE knows what type a variable is expected to be, it can give you additional help about what kinds of things that variable can do. This is much harder for the IDE to do if it has to infer the type for you. And if you get more help with the minutiae of an API from the IDE, then you as a developer will be able to get your head around a larger, richer API, and get there faster.
Or perhaps you were just wondering why you have to specify the class name twice for the same variable on the same line? The answer is two-fold:
Often you don't. In C# 3.0 and later you can use the var keyword instead of the type name in many cases. Variables created this way are still statically typed, but the type is now inferred for you by the compiler.
Thanks to inheritance and interfaces sometimes the type on the left-hand side doesn't match the type on the right hand side.
It's simply how the language was designed. C# is a C-style language and follows in the pattern of having types on the left.
In C# 3.0 and up you can kind of get around this in many cases with local type inference.
var variable = new SomeClass();
But at the same time you could also argue that you are still declaring a type on the LHS. Just that you want the compiler to pick it for you.
EDIT
Please read this in the context of the user's original question:
why do we need [class name] before a variable name?
I wanted to comment on several other answers in this thread. A lot of people are giving "C# is statically typed" as an answer. While the statement is true (C# is statically typed), it is almost completely unrelated to the question. Static typing does not necessitate a type name being to the left of the variable name. Sure, it can help, but that is a language designer's choice, not a necessary feature of statically typed languages.
This is easily provable by considering other statically typed languages such as F#. Types in F# appear on the right of a variable name and can very often be omitted altogether. There are also counter-examples: PowerShell, for instance, is extremely dynamic and puts all of its types, if included, on the left.
One of the main reasons is that you can specify different types, as long as the type on the left-hand side of the assignment is a parent type of the type on the right-hand side (or an interface implemented by that type).
For example given the following types:
class Foo { }
class Bar : Foo { }
interface IBaz { }
class Baz : IBaz { }
C# allows you to do this:
Foo f = new Bar();
IBaz b = new Baz();
Yes, in most cases the compiler could infer the type of the variable from the assignment (like with the var keyword) but it doesn't for the reason I have shown above.
Edit: As a point of order - while C# is strongly-typed the important distinction (as far as this discussion is concerned) is that it is in fact also a statically-typed language. In other words the C# compiler does static type checking at compilation time.
C# is a statically-typed, strongly-typed language like C or C++. In these languages all variables must be declared to be of a specific type.
Ultimately because Anders Hejlsberg said so...
You need [class name] in front because there are many situations in which the first [class name] is different from the second, like:
IMyCoolInterface obj = new MyInterfaceImplementer();
MyBaseType obj2 = new MySubTypeOfBaseType();
etc. You can also use the keyword 'var' if you don't want to specify the type explicitly.
Why do we need to know the data type
before a class instance?
You don't! Read from right to left. You create the object and then you store it in a type-safe variable, so you know what type that object is for later use.
Consider the following snippet, it would be a nightmare to debug if you didn't receive the errors until runtime.
void FunctionCalledVeryUnfrequently()
{
    ClassA a = new ClassA();
    ClassB b = new ClassB();
    ClassA a2 = new ClassB(); // COMPILER ERROR (thank god)

    // 100 lines of code

    DoStuffWithA(a);
    DoStuffWithA(b);  // COMPILER ERROR (thank god)
    DoStuffWithA(a2);
}
When you think about it, you can replace new Class() with a number or a string and the syntax will make much more sense. The following example might be a bit verbose, but it might help in understanding why it's designed the way it is.
string s = "abc";
string s2 = new string(new char[]{'a', 'b', 'c'});
//Does exactly the same thing
DoStuffWithAString("abc");
DoStuffWithAString(new string(new char[]{'a', 'b', 'c'}));
//Does exactly the same thing
C#, as others have pointed out, is a strongly, statically-typed language.
By stating up front what the type you're intending to create is, you'll receive compile-time warnings when you try to assign an illegal value. By stating up front what type of parameters you accept in methods, you receive those same compile-time warnings when you accidentally pass nonsense into a method that isn't expecting it. It removes the overhead of some paranoia on your behalf.
Finally, and rather nicely, C# (and many other languages) doesn't have the same ridiculous, "convert anything to anything, even when it doesn't make sense" mentality that PHP does, which quite frankly can trip you up more times than it helps.
C# is a strongly typed language, like C++ or Java. Therefore it needs to know the type of the variable. You can fudge it a bit in C# 3.0 via the var keyword. That lets the compiler infer the type.
That's the difference between a strongly typed and weakly typed language. C# (and C, C++, Java, most more powerful languages) are strongly typed so you must declare the variable type.
When we define variables to hold data we have to specify the type of data that those variables will hold. The compiler then checks that what we are doing with the data makes sense to it, i.e. follows the rules. We can't for example store text in a number - the compiler will not allow it.
int a = "fred"; // Not allowed. Cannot implicitly convert 'string' to 'int'
The variable a is of type int, and assigning it the value "fred", which is a text string, breaks the rules; the compiler is unable to do any kind of conversion of this string.
In C# 3.0, you can use the 'var' keyword - this uses static type inference to work out what the type of the variable is at compile time
var foo = new ClassName();
variable 'foo' will be of type 'ClassName' from then on.
One things that hasn't been mentioned is that C# is a CLS (Common Language Specification) compliant language. This is a set of rules that a .NET language has to adhere to in order to be interopable with other .NET languages.
So really C# is just keeping to these rules. To quote this MSDN article:
The CLS helps enhance and ensure language interoperability by defining a set of features that developers can rely on to be available in a wide variety of languages. The CLS also establishes requirements for CLS compliance; these help you determine whether your managed code conforms to the CLS and to what extent a given tool supports the development of managed code that uses CLS features.
If your component uses only CLS features in the API that it exposes to other code (including derived classes), the component is guaranteed to be accessible from any programming language that supports the CLS. Components that adhere to the CLS rules and use only the features included in the CLS are said to be CLS-compliant components.
Part of the CLS is the CTS the Common Type System.
If that's not enough acronyms for you, then there are a tonne more in .NET, such as CLI, ILasm/MSIL, CLR, BCL, FCL, and so on.
Because C# is a strongly typed language
Static typing also allows the compiler to make better optimizations, and skip certain steps. Take overloading for example, where you have multiple methods or operators with the same name differing only by their arguments. With a dynamic language, the runtime would need to grade each version in order to determine which is the best match. With a static language like this, the final code simply points directly to the appropriate overload.
Static typing also aids in code maintenance and refactoring. My favorite example being the Rename feature of many higher-end IDEs. Thanks to static typing, the IDE can find with certainty every occurrence of the identifier in your code, and leave unrelated identifiers with the same name intact.
I didn't notice whether it has been mentioned yet or not, but C# 4.0 introduces dynamic checking via the dynamic keyword. Though I'm sure you'd want to avoid it when it's not necessary.
Why C# requires me to specify the data type when creating a variable.
Why do we need to know the data type before a class instance?
I think one thing that most answers haven't referenced is the fact that C# was originally meant and designed as a "managed", "safe" language, among other things, and a lot of those goals are reached via static, compile-time verifiability. Knowing the variable's datatype explicitly makes this problem MUCH easier to solve, meaning that the C# compiler (not the JIT) can make several automated assessments about possible errors or undesirable behaviour without ever allowing execution.
That verifiability as a side effect also gives you better readability, dev tools, stability etc. because if an automated algorithm can understand better what the code will do when it actually runs, so can you :)
Statically typed means that the compiler can perform some sorts of checks at compile time rather than at run time. Every variable is of a particular type in a statically typed language, and C# is definitely both statically and strongly typed.
Isn't it much more elegant and neat to have an IStringable interface?
Who needs this Type.FullName object returned to us?
EDIT: everyone keeps asking why do I think it's more elegant..
Well, it's just like that: instead of IComparable, object would have a CompareTo method that by default throws an exception or returns 0.
There are objects that cannot and should not be described as a string. object could just as well have returned string.Empty; Type.FullName is just an arbitrary choice.
And for methods such as Console.Write(object), I think it should be: Write(IStringable).
However, if you are using WriteLine with anything but strings (or something whose ToString is obvious, such as numbers), it seems to me that's for debugging mode only.
By the way - how should I comment to you all? Is it okay that I post an answer?
There are three virtual methods that IMHO should have never been added to System.Object...
ToString()
GetHashCode()
Equals()
All of these could have been implemented as you suggest with an interface. Had they done so I think we'd be much better off. So why are these a problem? Let's just focus on ToString():
If ToString() is expected to be overridden, and someone uses ToString() and displays the results, you have an implicit contract that the compiler cannot enforce. You assume that ToString() is overridden, but there is no way to force that to be the case.
With an IStringable you would only need to add it to your generic type constraint, or derive your interface from it, to require its use on implementing objects.
If the benefit you find in overriding ToString() is for the debugger, you should start using [System.Diagnostics.DebuggerDisplayAttribute].
As for needing this implementation for converting objects to strings via String.Format() and/or Console.WriteLine, they could have deferred to System.Convert.ToString(object) and checked for something like 'IStringable', failing over to the type's name if it was not implemented.
As Christopher Estep points out, it's culture specific.
So I guess I stand alone here saying I hate System.Object and all of its virtual methods. But I do love C# as a whole, and overall I think the designers did a great job.
Note: If you intend to depend upon the behavior of ToString() being overridden, I would suggest you go ahead and define your IStringable interface (a sketch follows below). Unfortunately you'll have to pick another name for the method if you really want to require it.
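A minimal sketch of what such a hypothetical IStringable contract could look like; none of these types exist in the BCL, and the method is renamed to sidestep the clash with Object.ToString() mentioned above.

// Hypothetical: not part of the .NET Framework.
interface IStringable
{
    string ToDisplayString();
}

class Money : IStringable
{
    public decimal Amount;
    public string ToDisplayString() { return Amount.ToString("0.00"); }
}

static class Formatter
{
    // The compiler now enforces the contract via a generic constraint.
    public static string Describe<T>(T value) where T : IStringable
    {
        return value.ToDisplayString();
    }
}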
More:
My coworkers and I were just speaking on this topic. I think another big problem with ToString() is answering the question "what is it used for?". Is it display text? Serialization text? Debugging text? The full type name?
Having Object.ToString makes APIs like Console.WriteLine possible.
From a design perspective the designers of the BCL felt that the ability to provide a string representation of an instance should be common to all objects. True full type name is not always helpful but they felt the ability to have customizable representation at a root level outweighed the minor annoyance of seeing a full type name in output.
True, you could implement Console.WriteLine with no Object.ToString and instead do an interface check, defaulting to the full name of the type if the interface was not present. But then every single API which wanted to capture the string representation of an object instance would have to implement this logic. Given the number of times Object.ToString is used just within the core BCL, this would have led to a lot of duplication.
I imagine it exists because it's a wildly convenient thing to have on all objects and doesn't require add'l cruft to use. Why do you think IStringable would be more elegant?
Not at all.
It doesn't need to be implemented and it returns culture-specific results.
This method returns a human-readable string that is culture-sensitive. For example, for an instance of the Double class whose value is zero, the implementation of Double.ToString might return "0.00" or "0,00" depending on the current UI culture.
Further, while it comes with its own implementation, it can be overridden, and often is.
Why make it more complicated? The way it is right now basically establishes that each and every object is capable of printing its value to a string, I can't see anything wrong with that.
A "stringable" representation is useful in so many scenarios, the library designers probably thought ToString() was more straightforward.
With IStringable, you will have to do an extra check/cast to see if you can output an object in string format. It's too much of a hit on perf for such a common operation that should be a good thing to have for 99.99% of all objects anyway.
Mmmm, so it can be overridden in derived classes possibly?
Structs and Objects both have the ToString() member to ease debugging.
The easiest example of this can be seen with Console.WriteLine which receives a whole list of types including object, but also receives params object[] args. As Console is often a layer on-top of TextWriter these statements are also helpful (sometimes) when writing to files and other streams (sockets).
It also illustrates a simple object oriented design that shows you interfaces shouldn't be created just because you can.
My new base class:
class Object : global::System.Object
{
    [Obsolete("Do not use ToString()", true)]
    public sealed override string ToString()
    {
        return base.ToString();
    }

    [Obsolete("Do not use Equals(object)", true)]
    public sealed override bool Equals(object obj)
    {
        return base.Equals(obj);
    }

    [Obsolete("Do not use GetHashCode()", true)]
    public sealed override int GetHashCode()
    {
        return base.GetHashCode();
    }
}
There's indeed little use in having the Type.FullName returned to you, but it would be of even less use if an empty string or null were returned. You ask why it exists. That's not too easy to answer and has been a much-debated issue for years. More than a decade ago, several new languages decided that it would be convenient to implicitly cast an object to a string when it was needed; those languages include Perl, PHP and JavaScript, but none of them follows the object-orientation paradigm thoroughly.
Approaches
Designers of object oriented languages had a harder problem. In general, there were three approaches for getting the string representation of an object:
Use multiple inheritance, simply inherit from String as well and you can be cast to a string
Single inheritance: add ToString to the base class as a virtual method
Either: make the cast operator or copy constructor overloadable for strings
Perhaps you'd ask yourself Why would you need a ToString or equiv. in the first place? As some others already noted: the ToString is necessary for introspection (it is called when you hover your mouse over any instance of an object) and the debugger will show it too. As a programmer, you know that on any non-null object you can safely call ToString, always. No cast needed, no conversion needed.
It is considered good programming practice to always implement ToString in your own objects with a meaningful value from your persistable properties. Overloads can help if you need different types of representation of your class.
More history
If you dive a bit deeper in the history, we see SmallTalk taking a wider approach. The base object has many more methods, including printString, printOn etc.
A small decade later, when Bertrand Meyer wrote his landmark book Object-Oriented Software Construction, he suggested using a rather wide base class, GENERAL. It includes methods like print, print_line and tagged_out, the latter showing all properties of the object, but no default ToString. He suggests that the second base object, ANY, from which all user-defined objects derive, can be expanded, which seems like the prototype approach we now know from JavaScript.
In C++, the only multiple inheritance language still in widespread use, no common ancestor exists for all classes. This could be the best candidate language to employ your own approach, i.e. use IStringable. But C++ has other ways: you can overload the cast operator and the copy constructor to implement stringability. In practice, having to be explicit about a to-string-implementation (as you suggest with IStringable) becomes quite cumbersome. C++ programmers know that.
In Java we find the first appearance of toString for a mainstream language. Unfortunately, Java has two main types: objects and value types. Value types do not have a toString method, instead you need to use Integer.toString or cast to the object counterpart. This has proven very cumbersome throughout the years, but Java programmers (incl. me) learnt to live with it.
Then came C# (I skipped a few languages, don't want to make it too long), which was first intended as a display language for the .NET platform, but proved very popular after initial skepticism. The C# designers (Anders Hejlsberg et al) looked mainly at C++ and Java and tried to take the best of both worlds. The value type remained, but boxing was introduced. This made it possible to have value types derive from Object implicitly. Adding ToString analogous to Java was just a small step and was done to ease the transition from the Java world, but has shown its invaluable merits by now.
Oddity
Though you don't directly ask about it, but why would the following have to fail?
object o = null;
Console.WriteLine(o.ToString());
and while you think about it, consider the following, which does not fail:
public static class ObjectExtensions
{
    public static string MakeString(this object o)
    { return o == null ? "null" : o.ToString(); }
}
// elsewhere:
object o = null;
Console.WriteLine(o.MakeString());
which makes me ask the question: if the language designers had thought of extension methods early on, would the ToString method have been part of the extension methods, to prevent unnecessary NullReferenceExceptions? Some consider this bad design, others consider it a timesaver.
Eiffel, at the time, had a special class NIL which represented nothingness, but still had all the base class's methods. Sometimes I wished that C# or Java had abandoned null altogether, just like Bertrand Meyer did.
Conclusion
The wide approach of classical languages like Eiffel and Smalltalk has been replaced by a very narrow approach. Java still has a lot of methods on Object, C# only has a handful. This is of course good for implementations. Keeping ToString in the package simply keeps programming clean and understandable at the same time and because it is virtual, you can (and should!) always override it, which will make your code better apprehendable.
EDIT: the asker edited the question and made a comparison to IComparable, same is probably true for ICloneable. Those are very good remarks and it is often considered that IComparable should've been included in Object. In line with Java, C# has Equals and not IComparable, but against Java, C# does not have ICloneable (Java has clone()).
You also state that it is handy for debugging only. Well, consider this everywhere you need to get the string version of something (contrived, no ext. methods, no String.Format, but you get the idea):
CarInfo car = new CarInfo();
BikeInfo bike = new BikeInfo();
string someInfoText = "Car " +
(car is IStringable) ? ((IStringable) car).ToString() : "none") +
", Bike " +
(bike is IStringable) ? ((IStringable) bike).ToString() : "none");
and compare that with this. Whichever you find easier you should choose:
CarInfo car = new CarInfo();
BikeInfo bike = new BikeInfo();
string someInfoText = "Car " + car.ToString() + ", Bike " + bike.ToString();
Remember that languages are about making things clearer and easier. Many parts of the language (LINQ, extension methods, ToString(), the ?? operator) are created as conveniences. None of these are necessities, but sure are we glad that we have them. Only when we know how to use them well, we also find the true value of a feature (or not).
I'd like to add a couple of thoughts on why .NET's System.Object class definition has a ToString() method or member function, in addition to the previous postings on debugging.
Since the .NET Common Language Runtime (CLR) or Execution Runtime supports Reflection, being able to instantiate an object given the string representation of the class type seems to be essential and fundamental. And if I'm not mistaken, all reference values in the CLR are derived from System.Object, having the ToString() method in the class ensures its availability and usage through Reflection. Defining and implementing an interface along the lines of IStringable, is not mandatory or required when defining a class in .NET, and would not ensure the ability to dynamically create a new instance after querying an assembly for its supported class types.
As more advanced .NET functionality available in the 2.0, 3.0 and 3.5 runtimes, such as Generics and LINQ, are based on Reflection and dynamic instantiation, not to mention .NET's Dynamic Language Runtime (DLR) support that allow for .NET implementations of scripting languages, such as Ruby and Python, being able to identify and create an instance by a string type seems to be an essential and indispensable function to have in all class definitions.
In short, if we can't identify and name a specific class we want to instantiate, how can we create it? Relying on a ToString() method that has the base class behavior of returning the Class Type as a "human readable" string seems to make sense.
Maybe a review of the articles and books from Jeffrey Richter and Don Box on the .NET Framework design and architecture may provide better insights on this topic as well.