VB Compiler does not do implicit casts to Object? - c#

I've recently had a strange issue with one of my APIs reported. Essentially for some reason when used with VB code the VB compiler does not do implicit casts to Object when trying to invoke the ToString() method.
The following is a minimal code example firstly in C# and secondly in VB:
Graph g = new Graph();
g.LoadFromEmbeddedResource("VDS.RDF.Configuration.configuration.ttl");
foreach (Triple t in g.Triples)
{
    Console.WriteLine(t.Subject.ToString());
}
The above compiles and runs fine while the below does not:
Dim g As Graph = New Graph()
g.LoadFromEmbeddedResource("VDS.RDF.Configuration.configuration.ttl")
For Each t As Triple In g.Triples
    Console.WriteLine(t.Subject.ToString())
Next
The second VB example gives the following compiler exception:
Overload resolution failed because no accessible 'ToString' accepts this number of arguments.
This appears to be due to the fact that the type of the property t.Subject that I am trying to write to the console has explicitly defined ToString() methods which take parameters. The VB compiler appears to expect one of these to be used and does not seem to implicitly cast to Object and use the standard Object.ToString() method whereas the C# compiler does.
Is there any way around this, e.g. a VB compiler option, or is it best just to ensure that the type of the property (which is an interface in this example) explicitly defines an unparameterized ToString() method to ensure compatibility with VB?
Edit
Here are the additional details requested by Lucian
Graph is an implementation of an interface but that is actually irrelevant since it is the INode interface which is the type that t.Subject returns which is the issue.
INode defines two overloads for ToString() both of which take parameters
Yes it is a compile time error
No I do not use hide-by-name, the API is all written in C# so I couldn't generate that kind of API if I wanted to
Note that I've since added an explicit unparameterized ToString() overload to the interface which has fixed the issue for VB users.
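The fix described in the edit can be sketched roughly as follows. This is only an illustration: INodeSketch and UriNodeSketch are hypothetical stand-ins, not the real INode API, and the parameter lists of the two overloads are assumptions.

```csharp
using System;

// Hypothetical stand-in for an interface like INode (names and parameters are assumptions).
public interface INodeSketch
{
    // Parameterized overloads, similar in spirit to the ones described in the question.
    string ToString(string format);
    string ToString(string format, int width);

    // Explicitly declared parameterless overload, so that a VB caller holding
    // the interface type can bind ToString() at compile time.
    string ToString();
}

public sealed class UriNodeSketch : INodeSketch
{
    private readonly string _value;

    public UriNodeSketch(string value) { _value = value; }

    // The ordinary Object.ToString() override also satisfies the interface member.
    public override string ToString() { return _value; }

    public string ToString(string format) { return string.Format(format, _value); }

    public string ToString(string format, int width) { return ToString(format).PadRight(width); }
}

public static class NodeProgram
{
    public static void Main()
    {
        INodeSketch node = new UriNodeSketch("http://example.org/s");
        Console.WriteLine(node.ToString());        // http://example.org/s
        Console.WriteLine(node.ToString("<{0}>")); // <http://example.org/s>
    }
}
```

Implementing classes need no extra work here, because the inherited or overridden Object.ToString() already matches the new interface member.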

RobV, I'm the VB spec lead, so I should be able to answer your question, but I'll need some clarification please...
What are the overloads defined on "Graph"? It'd help if you could make a self-contained repro. It's hard to explain overloading behavior without knowing the overload candidates :)
You said it failed with a "compiler exception". That doesn't really exist. Do you mean a "compile-time error"? Or a "run-time exception"?
Something to check is whether you're relying on any kind of "hide-by-name" vs "hide-by-sig" behavior. C# compiler only ever emits "hide-by-sig" APIs; VB compiler can emit either depending on whether you use the "Shadows" keyword.
C# overload algorithm is to walk up the inheritance hierarchy level by level until it finds a possible match; VB overload algorithm is to look at all levels of the inheritance hierarchy simultaneously to see which has the best match. This is all a bit theoretical, but with a small self-contained repro of your problem I could explain what it means in practice.
Hans, I don't think your explanation is the right one. Your code gives compile-time error "BC30455: Argument not specified for parameter 'mumble' of ToString". But RobV had experienced "Overload resolution failed because no accessible 'ToString' accepts this number of arguments".
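The C# half of the "level by level" point above can be seen in a small sketch. Animal, Dog and Feed are hypothetical names made up for this illustration, and it only demonstrates the C# behaviour, not what VB would pick for equivalent code.

```csharp
using System;

public class Animal
{
    // Exact match for an int argument, but declared one level up.
    public string Feed(int portions) { return "Animal.Feed(int)"; }
}

public class Dog : Animal
{
    // Only applicable via a boxing conversion, but declared on the most derived type.
    public string Feed(object food) { return "Dog.Feed(object)"; }
}

public static class OverloadSketch
{
    public static string Call()
    {
        // C# finds an applicable method on Dog and stops there,
        // so the better-matching base method is never considered.
        return new Dog().Feed(1);
    }

    public static void Main()
    {
        Console.WriteLine(Call()); // Dog.Feed(object)
    }
}
```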

Here's a repro of this behavior. It also shows you the workaround, cast with CObj():
Module Module1
    Sub Main()
        Dim itf As IFoo = New CFoo()
        Console.WriteLine(itf.ToString()) '' Error BC30455
        Console.WriteLine(CObj(itf).ToString()) '' Okay
    End Sub
End Module
Interface IFoo
    Function ToString(ByVal mumble As Integer) As String
End Interface
Class CFoo
    Implements IFoo
    Function ToString1(ByVal mumble As Integer) As String Implements IFoo.ToString
        Return "foo"
    End Function
End Class
I think this is annotated in the VB.NET Language Specification, chapter 11.8.1 "Overloaded method resolution":
The justification for this rule is that if a program is loosely-typed (that is, most or all variables are declared as Object), overload resolution can be difficult because all conversions from Object are narrowing. Rather than have the overload resolution fail in many situations (requiring strong typing of the arguments to the method call), resolution of the appropriate overloaded method to call is deferred until run time. This allows the loosely-typed call to succeed without additional casts.
An unfortunate side-effect of this, however, is that performing the late-bound call requires casting the call target to Object. In the case of a structure value, this means that the value must be boxed to a temporary. If the method eventually called tries to change a field of the structure, this change will be lost once the method returns.
Interfaces are excluded from this special rule because late binding always resolves against the members of the runtime class or structure type, which may have different names than the members of the interfaces they implement.
Not sure. I'd transliterate it as: VB.NET is a loosely typed language where many object references are commonly late bound. This makes method overload resolution perilous.

Related

What does "late-bound access to the destination object" mean?

The docs for Interlocked.Exchange<T> contain the following remark:
This method overload is preferable to the Exchange(Object, Object) method overload, because the latter requires late-bound access to the destination object.
I am quite bewildered by this note. To me "late binding" refers to runtime method dispatch and doesn't seem to have anything to do with the technical specifics of atomically swapping two memory locations. What is the note talking about? What does "late-bound access" mean in this context?
canton7's answer is correct, and thanks for the shout-out. I'd like to add a few additional points.
This sentence, as is too often the case in the .NET documentation, both chooses to enstructure bizarre word usements, and thoroughly misses the point. For me, the poor word choice that stood out was not "late bound", which merely misses the point. The really awful word choice is using "destination object" to mean variable. A variable is not an object, any more than your sock drawer is a pair of socks. A variable contains a reference to an object, just as a sock drawer contains socks, and those two things should not be confused.
As you note, the reason to prefer the T version has nothing to do with late binding. The reason to prefer the T version is C# does not allow variant conversions on ref arguments. If you have a variable shelly of type Turtle, you cannot pass ref shelly to a method that takes ref object, because that method could write a Tiger into a ref object.
What then are the logical consequences of using the Object-taking overload on shelly? There are only two possibilities:
We copy the value of shelly to a second variable of type Object, do the exchange, and then copy the new value back, and now our operation is no longer atomic, which was the whole point of calling the interlocked exchange.
We change shelly to be of type Object, and now we are in a non-statically-typed and therefore bug-prone world, where we cannot ever be sure that shelly still contains a reference to Turtle.
Since both of those alternatives are terrible, you should use the generic version because it allows the aliased variable to be of the correct type throughout the operation.
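A minimal sketch of the generic overload in use, assuming a hypothetical Turtle class and a TurtleBox holder invented for this example:

```csharp
using System;
using System.Threading;

public class Turtle
{
    public readonly string Name;
    public Turtle(string name) { Name = name; }
}

// Hypothetical holder type; the field keeps its precise static type, Turtle.
public sealed class TurtleBox
{
    private Turtle shelly;

    public TurtleBox(Turtle initial) { shelly = initial; }

    public Turtle Current
    {
        get { return Volatile.Read(ref shelly); }
    }

    public Turtle Swap(Turtle replacement)
    {
        // T is inferred as Turtle, so "ref shelly" aliases the field directly
        // and the swap is a single atomic operation.
        // Passing "ref shelly" to Exchange(ref object, object) would not compile.
        return Interlocked.Exchange(ref shelly, replacement);
    }
}

public static class ExchangeProgram
{
    public static void Main()
    {
        TurtleBox box = new TurtleBox(new Turtle("Shelly"));
        Turtle old = box.Swap(new Turtle("Sheldon"));
        Console.WriteLine(old.Name);         // Shelly
        Console.WriteLine(box.Current.Name); // Sheldon
    }
}
```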
The equivalent remark for Interlocked.Exchange(object, object) is:
Beginning with .NET Framework 2.0, the Exchange<T>(T, T) method overload provides a type-safe alternative for reference types. We recommend that you call it instead of this overload.
Although I haven't heard it used in this way before, I think by "late-bound" it simply means "non type-safe", as you need to cast the object to your concrete type (at runtime) before using it.
As well as virtual method dispatch, "Late Binding" also commonly refers to reflection, as the exact method to be called similarly isn't known until runtime.
To quote Eric Lippert:
Basically by "early binding" we mean "the binding analysis is performed by the compiler and baked in to the generated program"; if the binding fails then the program does not run because the compiler did not get to the code generation phase. By "late binding" we mean "some aspect of the binding will be performed by the runtime" and therefore a binding failure will manifest as a runtime failure
(emphasis mine). Under this rather loose definition, casting object to a concrete type and then calling a method on it could be seen as "late bound", as there's an element of the binding which is performed at, and could fail at, runtime.
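To make the distinction concrete, here is a rough sketch contrasting a call the compiler binds with one that is looked up by reflection at run time. BindingSketch and its method names are made up for this example.

```csharp
using System;
using System.Reflection;

public static class BindingSketch
{
    // Early bound: the compiler resolves ToUpperInvariant on string.
    // A typo in the method name would be a compile-time error.
    public static string Early(string s)
    {
        return s.ToUpperInvariant();
    }

    // Late bound: the method is looked up by name on the runtime type,
    // so a failed lookup only shows up when this code actually runs.
    public static string Late(object o)
    {
        MethodInfo m = o.GetType().GetMethod("ToUpperInvariant", Type.EmptyTypes);
        if (m == null)
        {
            throw new MissingMethodException(o.GetType().Name, "ToUpperInvariant");
        }
        return (string)m.Invoke(o, null);
    }

    public static void Main()
    {
        Console.WriteLine(Early("baz")); // BAZ
        Console.WriteLine(Late("baz"));  // BAZ
        try
        {
            Late(42); // int has no ToUpperInvariant method
        }
        catch (MissingMethodException)
        {
            Console.WriteLine("binding failed at run time");
        }
    }
}
```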

dynamic and generics in C#

As discovered in C# 3.5, the following is not possible due to type erasure:
int foo<T>(T bar)
{
return bar.Length; // will not compile unless I do something like where T : string
}
foo("baz");
I believe the reason this doesn't work is in C# and java, is due to a concept called type erasure, see http://en.wikipedia.org/wiki/Type_erasure.
Having read about the dynamic keyword, I wrote the following: -
int foo<T>(T bar)
{
dynamic test = bar;
return test.Length;
}
foo("baz"); // will compile and return 3
So, as far as I understand, dynamic will bypass compile time checking but if the type has been erased, surely it would still be unable to resolve the symbol unless it goes deeper and uses some kind of reflection?
Is using the dynamic keyword in this way bad practice and does this make generics a little more powerful?
Dynamics and generics are two completely different notions. If you want compile-time safety and speed, use strong typing (generics or just standard OOP techniques such as inheritance or composition). If you do not know the type at compile time you could use dynamics, but they will be slower because they use runtime invocation, and less safe because if the type doesn't implement the method you are attempting to invoke you will get a runtime error.
The 2 notions are not interchangeable and depending on your specific requirements you could use one or the other.
Of course having the following generic constraint is completely useless because string is a sealed type and cannot be used as a generic constraint:
int foo<T>(T bar) where T : string
{
return bar.Length;
}
you'd rather have this:
int foo(string bar)
{
return bar.Length;
}
I believe the reason this doesn't work is in C# and java, is due to a concept called type erasure, see http://en.wikipedia.org/wiki/Type_erasure.
No, this isn't because of type erasure. Anyway there is no type erasure in C# (unlike Java): a distinct type is constructed by the runtime for each different set of type arguments, there is no loss of information.
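A quick way to see that type arguments survive to run time is to inspect them from inside a generic method. ReifiedSketch and Describe are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

public static class ReifiedSketch
{
    // typeof(T) works here because the type argument is still known at run time.
    public static string Describe<T>(T value)
    {
        return typeof(T).Name;
    }

    public static void Main()
    {
        Console.WriteLine(Describe("baz")); // String
        Console.WriteLine(Describe(42));    // Int32

        // Each set of type arguments yields a distinct runtime type.
        Console.WriteLine(typeof(List<int>) == typeof(List<string>)); // False
    }
}
```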
The reason why it doesn't work is that the compiler knows nothing about T, so it can only assume that T inherits from object, so only the members of object are available. You can, however, provide more information to the compiler by adding a constraint on T. For instance, if you have an interface IBar with a Length property, you can add a constraint like this:
int foo<T>(T bar) where T : IBar
{
return bar.Length;
}
But if you want to be able to pass either an array or a string, it won't work, because the Length property isn't declared in any interface implemented by both String and Array...
No, C# does not have type erasure; Java does.
But if you specify only T, without any constraint, you cannot use bar.Length, because T can be virtually anything.
foo(new Bar());
Here T would resolve to the class Bar, so the Length property might not be available.
You can only use members on T when you guarantee that T really has them. (This is done with where constraints.)
With dynamics, you lose compile-time checking, and I suggest that you do not use them for hacking around generics.
In this case you would not benefit from dynamics in any way. You just delay the error, as an exception is thrown if the dynamic object does not have a Length property. When accessing the Length property in a generic method, I can't see any reason not to constrain it to types that definitely have this property.
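The "delayed error" can be sketched like this. DynamicSketch and TryLengthOf are hypothetical helpers written for this example; the exception type is the one the C# runtime binder throws when a member cannot be found.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicSketch
{
    public static int LengthOf<T>(T bar)
    {
        dynamic test = bar;
        return test.Length; // member lookup happens at run time
    }

    // Wraps the call to show where the failure surfaces.
    public static bool TryLengthOf<T>(T bar, out int length)
    {
        try
        {
            length = LengthOf(bar);
            return true;
        }
        catch (RuntimeBinderException)
        {
            // The type had no Length member; the compiler could not warn us.
            length = 0;
            return false;
        }
    }

    public static void Main()
    {
        int n;
        Console.WriteLine(TryLengthOf("baz", out n)); // True
        Console.WriteLine(n);                         // 3
        Console.WriteLine(TryLengthOf(42, out n));    // False
    }
}
```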
"Dynamics are a powerful new tool that make interop with dynamic languages as well as COM easier, and can be used to replace much turgid reflective code. They can be used to tell the compiler to execute operations on an object, the checking of which is deferred to runtime.
The great danger lies in the use of dynamic objects in inappropriate contexts, such as in statically typed systems, or worse, in place of an interface/base class in a properly typed system."
Quoted from article
Thought I'd weigh in on this one, because no one clarified how generics work "under the hood". The notion of an unconstrained T being treated as object is mentioned above, and is quite clear. What is not talked about is what happens when we compile C#, VB, or any other supported language to Intermediate Language (IL), which is more akin to an assembly language or to Java bytecode. Unlike Java bytecode, IL does have generics: the type parameters are kept in the metadata, so a generic type such as List<T> is compiled once, as a generic type. So the new question is where the specialization happens. It happens at runtime, in the JIT compiler: the first time code uses a particular constructed type, the runtime produces a version of the code with T replaced by the actual type. Each value type argument gets its own specialized native code, while reference type arguments generally share one copy. To be clear, a MyList<string> used as new MyList<string>() stays MyList<string> in IL; there is no separate MyList_string class in the assembly.
That's my (limited) understanding of what's going on. The point being, the benefit of this approach is that type checking is done at compile time and the specialized code avoids boxing and casts at runtime, which is probably why generics are so loved and used everywhere by .NET developers.
On the downside? Each distinct value type instantiation adds JIT-compiled code in memory at runtime, although the output assembly (EXE or DLL) does not grow with the number of instantiations. In practice I doubt you'll ever consider generics to be a problem.

Generic type parameter covariance and multiple interface implementations

If I have a generic interface with a covariant type parameter, like this:
interface IGeneric<out T>
{
string GetName();
}
And If I define this class hierarchy:
class Base {}
class Derived1 : Base{}
class Derived2 : Base{}
Then I can implement the interface twice on a single class, like this, using explicit interface implementation:
class DoubleDown: IGeneric<Derived1>, IGeneric<Derived2>
{
string IGeneric<Derived1>.GetName()
{
return "Derived1";
}
string IGeneric<Derived2>.GetName()
{
return "Derived2";
}
}
If I use the (non-generic) DoubleDown class and cast it to IGeneric<Derived1> or IGeneric<Derived2>, it functions as expected:
var x = new DoubleDown();
IGeneric<Derived1> id1 = x; //cast to IGeneric<Derived1>
Console.WriteLine(id1.GetName()); //Derived1
IGeneric<Derived2> id2 = x; //cast to IGeneric<Derived2>
Console.WriteLine(id2.GetName()); //Derived2
However, casting the x to IGeneric<Base>, gives the following result:
IGeneric<Base> b = x;
Console.WriteLine(b.GetName()); //Derived1
I expected the compiler to issue an error, as the call is ambiguous between the two implementations, but it returned the first declared interface.
Why is this allowed?
(inspired by A class implementing two different IObservables?. I tried to show to a colleague that this will fail, but somehow, it didn't)
If you have tested both of:
class DoubleDown: IGeneric<Derived1>, IGeneric<Derived2> {
string IGeneric<Derived1>.GetName() {
return "Derived1";
}
string IGeneric<Derived2>.GetName() {
return "Derived2";
}
}
class DoubleDown: IGeneric<Derived2>, IGeneric<Derived1> {
string IGeneric<Derived1>.GetName() {
return "Derived1";
}
string IGeneric<Derived2>.GetName() {
return "Derived2";
}
}
You must have noticed that the result actually changes with the order in which you declare the interfaces to implement. But I'd say it is just unspecified.
First off, the specification (§13.4.4 Interface mapping) says:
If more than one member matches, it is unspecified which member is the implementation of I.M.
This situation can only occur if S is a constructed type where the two members as declared in the generic type have different signatures, but the type arguments make their signatures identical.
Here we have two questions to consider:
Q1: Do your generic interfaces have different signatures?
A1: Yes. They are IGeneric<Derived2> and IGeneric<Derived1>.
Q2: Could the statement IGeneric<Base> b=x; make their signatures identical with type arguments?
A2: No. You invoked the method through a generic covariant interface definition.
Thus your call meets the unspecified condition. But how could this happen?
Remember, whichever interface you use to refer to the object of type DoubleDown, it is always a DoubleDown. That is, it always has these two GetName methods. The interface you use to refer to it, in fact, performs contract selection.
The following is part of a screenshot captured from the real test:
This image shows what would be returned by GetMembers at runtime. Whichever way you refer to it, IGeneric<Derived1>, IGeneric<Derived2> or IGeneric<Base>, the result is no different. The following two images show more details:
As the images show, these two constructed interfaces have neither the same name nor any other signature or token that would make them identical.
The compiler can't throw an error on the line
IGeneric<Base> b = x;
Console.WriteLine(b.GetName()); //Derived1
because there is no ambiguity that the compiler can know about. GetName() is in fact a valid method on interface IGeneric<Base>. The compiler doesn't track the runtime type of b to know that there is a type in there which could cause an ambiguity. So it's left up to the runtime to decide what to do. The runtime could throw an exception, but the designers of the CLR apparently decided against that (which I personally think was a good decision).
To put it another way, let's say that instead you simply had written the method:
public void CallIt(IGeneric<Base> b)
{
string name = b.GetName();
}
and you provide no classes implementing IGeneric<T> in your assembly. You distribute this, and many others implement this interface only once and are able to call your method just fine. However, someone eventually consumes your assembly, creates the DoubleDown class and passes it into your method. At what point should the compiler throw an error? Surely the already compiled and distributed assembly containing the call to GetName() can't produce a compiler error. You could say that the assignment from DoubleDown to IGeneric<Base> produces the ambiguity, but once again we could add another level of indirection into the original assembly:
public void CallItOnDerived1(IGeneric<Derived1> b)
{
CallIt(b); //b will be cast to IGeneric<Base>
}
Once again, many consumers could call either CallIt or CallItOnDerived1 and be just fine. But our consumer passing DoubleDown also is making a perfectly legal call that could not cause a compiler error when they call CallItOnDerived1 as converting from DoubleDown to IGeneric<Derived1> should certainly be OK. Thus, there is no point at which the compiler can throw an error other than possibly on the definition of DoubleDown, but this would eliminate the possibility of doing something potentially useful with no workaround.
I have actually answered this question more in depth elsewhere, and also provided a potential solution if the language could be changed:
No warning or error (or runtime failure) when contravariance leads to ambiguity
Given that the chance of the language changing to support this is virtually zero, I think that the current behavior is alright, except that it should be laid out in the specifications so that all implementations of the CLR would be expected to behave the same way.
Holy goodness, lots of really good answers here to what is quite a tricky question. Summing up:
The language specification does not clearly say what to do here.
This scenario usually arises when someone is attempting to emulate interface covariance or contravariance; now that C# has interface variance we hope that fewer people will use this pattern.
Most of the time "just pick one" is a reasonable behaviour.
How the CLR actually chooses which implementation is used in an ambiguous covariant conversion is implementation-defined. Basically, it scans the metadata tables and picks the first match, and C# happens to emit the tables in source code order. You can't rely on this behaviour though; either can change without notice.
I'd only add one other thing, and that is: the bad news is that interface reimplementation semantics do not exactly match the behaviour specified in the CLI specification in scenarios where these sorts of ambiguities arise. The good news is that the actual behaviour of the CLR when re-implementing an interface with this kind of ambiguity is generally the behaviour that you'd want. Discovering this fact led to a spirited debate between me, Anders and some of the CLI spec maintainers and the end result was no change to either the spec or the implementation. Since most C# users do not even know what interface reimplementation is to begin with, we hope that this will not adversely affect users. (No customer has ever brought it to my attention.)
The question asked, "Why doesn't this produce a compiler warning?".
In VB, it does (I implemented it).
The type system doesn't carry enough information to provide a warning at time of invocation about variance ambiguity. So the warning has to be emitted earlier ...
In VB, if you declare a class C which implements both IEnumerable(Of Fish) and IEnumerable(Of Dog), then it gives a warning saying that the two will conflict in the common case IEnumerable(Of Animal). This is enough to stamp out variance-ambiguity from code that's written entirely in VB.
However, it doesn't help if the problem class was declared in C#. Also note that it's completely reasonable to declare such a class if no one invokes a problematic member on it.
In VB, if you perform a cast from such a class C into IEnumerable(Of Animal), then it gives a warning on the cast. This is enough to stamp out variance-ambiguity even if you imported the problem class from metadata.
However, it's a poor warning location because it's not actionable: you can't go and change the cast. The only actionable warning to people would be to go back and change the class definition. Also note that it's completely reasonable to perform such a cast if no one invokes a problematic member on it.
Question:
How come VB emits these warnings but C# doesn't?
Answer:
When I put them into VB, I was enthusiastic about formal computer science, and had only been writing compilers for a couple of years, and I had the time and enthusiasm to code them up.
Eric Lippert was doing them in C#. He had the wisdom and maturity to see that coding up such warnings in the compiler would take a lot of time that could be better spent elsewhere, and was sufficiently complex that it carried high risk. Indeed the VB compilers had bugs in these very warnings that were only fixed in VS2012.
Also, to be frank, it was impossible to come up with a warning message useful enough that people would understand it. Incidentally,
Question:
How does the CLR resolve the ambiguity when choosing which one to invoke?
Answer:
It bases it on the lexical ordering of inheritance statements in the original source code, i.e. the lexical order in which you declared that C implements IEnumerable(Of Fish) and IEnumerable(Of Dog).
Delving into the C# language specification, it looks like the behaviour is not specified (if I did not get lost along the way).
7.4.4 Function member invocation
The run-time processing of a function member invocation consists of the following steps, where M is the function member and, if M is an instance member, E is the instance expression:
[...]
o The function member implementation to invoke is determined:
• If the compile-time type of E is an interface, the function member to invoke is the implementation of M provided by the run-time type of the instance referenced by E. This function member is determined by applying the interface mapping rules (§13.4.4) to determine the implementation of M provided by the run-time type of the instance referenced by E.
13.4.4 Interface mapping
Interface mapping for a class or struct C locates an implementation for each member of each interface specified in the base class list of C. The implementation of a particular interface member I.M, where I is the interface in which the member M is declared, is determined by examining each class or struct S, starting with C and repeating for each successive base class of C, until a match is located:
• If S contains a declaration of an explicit interface member implementation that matches I and M, then this member is the implementation of I.M.
• Otherwise, if S contains a declaration of a non-static public member that matches M, then this member is the implementation of I.M. If more than one member matches, it is unspecified which member is the implementation of I.M. This situation can only occur if S is a constructed type where the two members as declared in the generic type have different signatures, but the type arguments make their signatures identical.

Why can't C# member names be the same as the enclosing type name?

In C#, the following code doesn't compile:
class Foo {
public string Foo;
}
The question is: why?
More exactly, I understand that this doesn't compile because (I quote):
member names cannot be the same as their enclosing type
Ok, fine. I understand that, I won't do it again, I promise.
But I really don't understand why the compiler refuses to accept any field having the same name as its enclosing type. What is the underlying issue that prevents me from doing that?
Strictly speaking, this is a limitation imposed by C#, most likely for convenience of syntax. A constructor has a method body, but its member entry in IL is denoted as ".ctor" and it has slightly different metadata than a normal method (In the Reflection classes, ConstructorInfo derives from MethodBase, not MethodInfo.) I don't believe there's a .NET limitation that prevents creating a member (or even a method) with the same name as the outer type, though I haven't tried it.
I was curious, so I confirmed it's not a .NET limitation. Create the following class in VB:
Public Class Class1
Public Sub Class1()
End Sub
End Class
In C#, you reference it as:
var class1 = new Class1();
class1.Class1();
Because Foo is reserved as the name of the constructor.
So if your code was allowed - what would you call the constructor?
Even if it was possible to do this by treating the constructor as a special case and introducing new rules into method / member binding - would it be a good idea? It would inevitably lead to confusion at some point.
Because the member name clashes with the name of the class's constructor?
There is a right way to do it and a wrong way to do it.
Why Doesn't C# allow it?
Because there is no good reason to do so. Why would you want to create such confusion in your life?
I think the CLR allows it, as another post proves with a vb.net example and it should not be restricted, but I would not want to create an application based on the same rules that the CLR operates in. The abstraction makes code more clear. I think the argument works on the same level as multiple inheritance. Yes it can be done in some languages, but it causes confusion. My answer therefore would be to reduce ambiguity and confusion and is based in the c# parser/compiler. A design choice by the C# team.

Why do we need new keywords for Covariance and Contravariance in C#?

Can someone explain why there is the need to add an out or in modifier to indicate that a generic type parameter is covariant or contravariant in C# 4.0?
I've been trying to understand why this is important and why the compiler can't just figure it out.
Thanks,
Josh
Eric Lippert, who works on the language, has a series of posts on MSDN that should help clarify the issues involved:
http://blogs.msdn.com/ericlippert/archive/tags/Covariance+and+Contravariance/default.aspx
When reading the articles shown at that link, start at the bottom and work up.
Eventually you'll get to #7 (Why do we need a syntax at all?).
We don't actually need them, any more than we need abstract on classes or both out and ref. They exist just so that we, as programmers, can make our intention crystal clear, so that maintenance programmers know what we are doing, and the compiler can verify that we are doing it right.
Well, the main problem is that if you have a class hierarchy like:
class Foo { .. }
class Bar : Foo { .. }
And you have an IEnumerator<Bar>, you can't use that as an IEnumerator<Foo> even though that would be perfectly safe. In 3.5 this forces a large number of painful gyrations. This operation would always be safe but is denied by the type system because it doesn't know about the covariant use of the generic type parameter. IEnumerator<Bar> can only return a Bar and every Bar is a Foo.
Similarly, if you had an IEqualityComparer<Foo> it can be used to compare any pair of objects of type Foo even if one or both is a Bar, but it cannot be cast into an IEqualityComparer<Bar> because it doesn't know about the contravariant use of the generic type parameter. IEqualityComparer<Foo> only consumes objects of type Foo and every Bar is a Foo.
Without these keywords we're forced to assume the generic argument can occur as both an argument to a method and as the result type of a method, and so we can't safely allow either of the above conversions.
With them, the type system is free to allow us to safely upcast and downcast between those interfaces in the direction indicated by the keyword and we get errors indicating when we would violate the discipline required to ensure that safety.
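Both conversions described above can be sketched with the framework's own variant interfaces, IEnumerable<out T> and IEqualityComparer<in T>. The Id field and FooByIdComparer are assumptions added so the example has something to compare.

```csharp
using System;
using System.Collections.Generic;

public class Foo { public int Id; }
public class Bar : Foo { }

// A comparer written once against the base type (hypothetical helper).
public sealed class FooByIdComparer : IEqualityComparer<Foo>
{
    public bool Equals(Foo x, Foo y) { return x.Id == y.Id; }
    public int GetHashCode(Foo obj) { return obj.Id; }
}

public static class VarianceSketch
{
    public static int CountViaBase(List<Bar> bars)
    {
        // Covariance (out T): a sequence of Bar is usable as a sequence of Foo.
        IEnumerable<Foo> foos = bars;
        int count = 0;
        foreach (Foo f in foos) { count++; }
        return count;
    }

    public static int CountDistinct(List<Bar> bars)
    {
        // Contravariance (in T): a comparer of Foo is usable as a comparer of Bar.
        IEqualityComparer<Bar> barComparer = new FooByIdComparer();
        return new HashSet<Bar>(bars, barComparer).Count;
    }

    public static void Main()
    {
        List<Bar> bars = new List<Bar>
        {
            new Bar { Id = 1 }, new Bar { Id = 1 }, new Bar { Id = 2 }
        };
        Console.WriteLine(CountViaBase(bars));  // 3
        Console.WriteLine(CountDistinct(bars)); // 2
    }
}
```

Neither assignment compiles against the C# 3.5 versions of these interfaces, which is the "painful gyrations" case mentioned above.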
The in and out keywords have been keywords since C# 1.0, and have been used in the context of in- and out- parameters to methods.
Covariance and contravariance are constraints on how one can implement an interface. There is no good way to infer them - the only way, I think, is from usage, and that would be messy and in the end it wouldn't work.
Jon and Joel both provided a pretty complete answer to this, but the bottom line is that they aren't so much needed by the compiler but rather help guarantee the safety of the implementation by explicitly indicating the variance of the parameter. This follows a very similar pattern to requiring the out or ref keyword at both the calling site and the declaration site.
