How the CLR implements IEnumerable<T> on Arrays? [duplicate] - c#

So as you may know, arrays in C# implement IList<T>, among other interfaces. Somehow though, they do this without publicly implementing the Count property of IList<T>! Arrays have only a Length property.
Is this a blatant example of C#/.NET breaking its own rules about the interface implementation or am I missing something?

So as you may know, arrays in C# implement IList<T>, among other interfaces
Well, yes, erm no, not really. This is the declaration for the Array class in the .NET 4 framework:
[Serializable, ComVisible(true)]
public abstract class Array : ICloneable, IList, ICollection, IEnumerable,
IStructuralComparable, IStructuralEquatable
{
// etc..
}
It implements System.Collections.IList, not System.Collections.Generic.IList<>. It can't, Array is not generic. Same goes for the generic IEnumerable<> and ICollection<> interfaces.
But the CLR creates concrete array types on the fly, so it could technically create one that implements these interfaces. This is however not the case. Try this code for example:
using System;
using System.Collections.Generic;
class Program {
static void Main(string[] args) {
var goodmap = typeof(Derived).GetInterfaceMap(typeof(IEnumerable<int>));
var badmap = typeof(int[]).GetInterfaceMap(typeof(IEnumerable<int>)); // Kaboom
}
}
abstract class Base { }
class Derived : Base, IEnumerable<int> {
public IEnumerator<int> GetEnumerator() { return null; }
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
The GetInterfaceMap() call fails for a concrete array type with "Interface not found". Yet a cast to IEnumerable<> works without a problem.
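Both halves of that claim can be checked side by side. The following is a minimal sketch; the class and method names are made up for illustration, and it assumes the runtime reports the failure as an ArgumentException, as described in this thread.

```csharp
using System;
using System.Collections.Generic;

static class ArrayCastDemo
{
    // The conversion to IEnumerable<int> is accepted by the compiler and the CLR,
    // so enumerating the array through the interface works.
    public static int SumViaInterface(int[] values)
    {
        IEnumerable<int> sequence = values; // implicit reference conversion
        int sum = 0;
        foreach (int v in sequence) sum += v;
        return sum;
    }

    // Reflection is where the abstraction leaks: asking for the interface map
    // of a generic interface on an array type is expected to throw.
    public static bool InterfaceMapAvailable()
    {
        try
        {
            typeof(int[]).GetInterfaceMap(typeof(IEnumerable<int>));
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(SumViaInterface(new[] { 1, 2, 3 }));
        Console.WriteLine(InterfaceMapAvailable());
    }
}
```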
This is quacks-like-a-duck typing. It is the same kind of typing that creates the illusion that every value type derives from ValueType which derives from Object. Both the compiler and the CLR have special knowledge of array types, just as they do of value types. The compiler sees your attempt at casting to IList<> and says "okay, I know how to do that!". And emits the castclass IL instruction. The CLR has no trouble with it, it knows how to provide an implementation of IList<> that works on the underlying array object. It has built-in knowledge of the otherwise hidden System.SZArrayHelper class, a wrapper that actually implements these interfaces.
Which it doesn't do explicitly like everybody claims, the Count property you asked about looks like this:
internal int get_Count<T>() {
//! Warning: "this" is an array, not an SZArrayHelper. See comments above
//! or you may introduce a security hole!
T[] _this = JitHelpers.UnsafeCast<T[]>(this);
return _this.Length;
}
Yes, you can certainly call that comment "breaking the rules" :) It is otherwise darned handy. And extremely well hidden, you can check this out in SSCLI20, the shared source distribution for the CLR. Search for "IList" to see where the type substitution takes place. The best place to see it in action is clr/src/vm/array.cpp, GetActualImplementationForArrayGenericIListMethod() method.
This kind of substitution in the CLR is pretty mild compared to what happens in the language projection in the CLR that allows writing managed code for WinRT (aka Metro). Just about any core .NET type gets substituted there. IList<> maps to IVector<> for example, an entirely unmanaged type. Itself a substitution, COM doesn't support generic types.
Well, that was a look at what happens behind the curtain. It can be very uncomfortable, strange and unfamiliar seas with dragons living at the end of the map. It can be very useful to make the Earth flat and model a different image of what's really going on in managed code. Mapping it to everybody's favorite answer is comfortable that way. Which doesn't work so well for value types (don't mutate a struct!) but this one is very well hidden. The GetInterfaceMap() method failure is the only leak in the abstraction that I can think of.

New answer in the light of Hans's answer
Thanks to the answer given by Hans, we can see the implementation is somewhat more complicated than we might think. Both the compiler and the CLR try very hard to give the impression that an array type implements IList<T> - but array variance makes this trickier. Contrary to the answer from Hans, the array types (single-dimensional, zero-based anyway) do implement the generic collections directly, because the type of any specific array isn't System.Array - that's just the base type of the array. If you ask an array type what interfaces it supports, it includes the generic types:
foreach (var type in typeof(int[]).GetInterfaces())
{
Console.WriteLine(type);
}
Output:
System.ICloneable
System.Collections.IList
System.Collections.ICollection
System.Collections.IEnumerable
System.Collections.IStructuralComparable
System.Collections.IStructuralEquatable
System.Collections.Generic.IList`1[System.Int32]
System.Collections.Generic.ICollection`1[System.Int32]
System.Collections.Generic.IEnumerable`1[System.Int32]
For single-dimensional, zero-based arrays, as far as the language is concerned, the array really does implement IList<T> too. Section 12.1.2 of the C# specification says so. So whatever the underlying implementation does, the language has to behave as if the type of T[] implements IList<T> as with any other interface. From this perspective, the interface is implemented with some of the members being explicitly implemented (such as Count). That's the best explanation at the language level for what's going on.
Note that this only holds for single-dimensional arrays (and zero-based arrays, not that C# as a language says anything about non-zero-based arrays). T[,] doesn't implement IList<T>.
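The difference between single-dimensional and multi-dimensional arrays can be observed with Type.IsAssignableFrom. This is only a small sketch with made-up names; it checks assignability and says nothing about how the runtime implements it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class RankDemo
{
    // True when a value of arrayType can be assigned to IList<int>.
    public static bool IsGenericList(Type arrayType)
    {
        return typeof(IList<int>).IsAssignableFrom(arrayType);
    }

    // True when a value of arrayType can be assigned to the non-generic IList,
    // which System.Array itself implements.
    public static bool IsNonGenericList(Type arrayType)
    {
        return typeof(IList).IsAssignableFrom(arrayType);
    }

    static void Main()
    {
        Console.WriteLine(IsGenericList(typeof(int[])));     // single-dimensional
        Console.WriteLine(IsGenericList(typeof(int[,])));    // two-dimensional
        Console.WriteLine(IsNonGenericList(typeof(int[,]))); // non-generic IList
    }
}
```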
From a CLR perspective, something funkier is going on. You can't get the interface mapping for the generic interface types. For example:
typeof(int[]).GetInterfaceMap(typeof(ICollection<int>))
Gives an exception of:
Unhandled Exception: System.ArgumentException: Interface maps for generic
interfaces on arrays cannot be retrived.
So why the weirdness? Well, I believe it's really due to array covariance, which is a wart in the type system, IMO. Even though IList<T> is not covariant (and can't be safely), array covariance allows this to work:
string[] strings = { "a", "b", "c" };
IList<object> objects = strings;
... which makes it look like typeof(string[]) implements IList<object>, when it doesn't really.
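The cost of that conversion shows up when writing through the interface. The sketch below uses made-up names and assumes the store is rejected with an ArrayTypeMismatchException, the same exception an ordinary covariant array store raises.

```csharp
using System;
using System.Collections.Generic;

static class CovarianceDemo
{
    // Reading through IList<object> is safe: every string is an object.
    public static string ReadFirst(string[] strings)
    {
        IList<object> objects = strings; // allowed because of array covariance
        return (string)objects[0];
    }

    // Writing is checked at run time, because the underlying array
    // is still a string[] and cannot hold a boxed int.
    public static bool StoreIsRejected(string[] strings)
    {
        IList<object> objects = strings;
        try
        {
            objects[0] = 42;
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    static void Main()
    {
        string[] strings = { "a", "b", "c" };
        Console.WriteLine(ReadFirst(strings));
        Console.WriteLine(StoreIsRejected(strings));
    }
}
```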
The CLI spec (ECMA-335) partition 1, section 8.7.1, has this:
A signature type T is compatible-with a signature type U if and only if at least one of the following holds
...
T is a zero-based rank-1 array V[], and U is IList<W>, and V is array-element-compatible-with W.
(It doesn't actually mention ICollection<W> or IEnumerable<W> which I believe is a bug in the spec.)
For non-variance, the CLI spec goes along with the language spec directly. From section 8.9.1 of partition 1:
Additionally, a created vector with element type T, implements the interface System.Collections.Generic.IList<U>, where U := T. (§8.7)
(A vector is a single-dimensional array with a zero base.)
Now in terms of the implementation details, clearly the CLR is doing some funky mapping to keep the assignment compatibility here: when a string[] is asked for the implementation of ICollection<object>.Count, it can't handle that in quite the normal way. Does this count as explicit interface implementation? I think it's reasonable to treat it that way, as unless you ask for the interface mapping directly, it always behaves that way from a language perspective.
What about ICollection.Count?
So far I've talked about the generic interfaces, but then there's the non-generic ICollection with its Count property. This time we can get the interface mapping, and in fact the interface is implemented directly by System.Array. The documentation for the ICollection.Count property implementation in Array states that it's implemented with explicit interface implementation.
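For the non-generic interface, the mapping can be inspected with reflection. This is a rough sketch with made-up names; it prints whatever the runtime reports rather than assuming particular method names.

```csharp
using System;
using System.Collections;
using System.Reflection;

static class NonGenericMapDemo
{
    // Count is not visible on the array variable itself (only Length is),
    // but it is reachable through the ICollection interface.
    public static int CountViaInterface(Array array)
    {
        ICollection collection = array;
        return collection.Count;
    }

    static void Main()
    {
        // Unlike the generic interfaces, this call is expected to succeed.
        InterfaceMapping map = typeof(int[]).GetInterfaceMap(typeof(ICollection));
        for (int i = 0; i < map.InterfaceMethods.Length; i++)
        {
            Console.WriteLine(map.InterfaceMethods[i].Name + " -> "
                + map.TargetMethods[i].DeclaringType + "." + map.TargetMethods[i].Name);
        }

        Console.WriteLine(CountViaInterface(new int[10]));
    }
}
```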
If anyone can think of a way in which this kind of explicit interface implementation is different from "normal" explicit interface implementation, I'd be happy to look into it further.
Old answer around explicit interface implementation
Despite the above, which is more complicated because of the knowledge of arrays, you can still do something with the same visible effects through explicit interface implementation.
Here's a simple standalone example:
public interface IFoo
{
void M1();
void M2();
}
public class Foo : IFoo
{
// Explicit interface implementation
void IFoo.M1() {}
// Implicit interface implementation
public void M2() {}
}
class Test
{
static void Main()
{
Foo foo = new Foo();
foo.M1(); // Compile-time failure
foo.M2(); // Fine
IFoo ifoo = foo;
ifoo.M1(); // Fine
ifoo.M2(); // Fine
}
}

IList<T>.Count is implemented explicitly:
int[] intArray = new int[10];
IList<int> intArrayAsList = (IList<int>)intArray;
Debug.Assert(intArrayAsList.Count == 10);
This is done so that when you have a simple array variable, you don't have both Count and Length directly available.
In general, explicit interface implementation is used when you want to ensure that a type can be used in a particular way, without forcing all consumers of the type to think about it that way.
Edit: Whoops, bad recall there. ICollection.Count is implemented explicitly. The generic IList<T> is handled as Hans describes in his answer.

Explicit interface implementation. In short, you declare it like void IControl.Paint() { } or int IList<T>.Count { get { return 0; } }.

It's no different than an explicit interface implementation of IList. Just because you implement the interface doesn't mean its members need to appear as class members. It does implement the Count property, it just doesn't expose it on X[].

With reference-sources being available:
//----------------------------------------------------------------------------------------
// ! READ THIS BEFORE YOU WORK ON THIS CLASS.
//
// The methods on this class must be written VERY carefully to avoid introducing security holes.
// That's because they are invoked with special "this"! The "this" object
// for all of these methods are not SZArrayHelper objects. Rather, they are of type U[]
// where U[] is castable to T[]. No actual SZArrayHelper object is ever instantiated. Thus, you will
// see a lot of expressions that cast "this" "T[]".
//
// This class is needed to allow an SZ array of type T[] to expose IList<T>,
// IList<T.BaseType>, etc., etc. all the way up to IList<Object>. When the following call is
// made:
//
// ((IList<T>) (new U[n])).SomeIListMethod()
//
// the interface stub dispatcher treats this as a special case, loads up SZArrayHelper,
// finds the corresponding generic method (matched simply by method name), instantiates
// it for type <T> and executes it.
//
// The "T" will reflect the interface used to invoke the method. The actual runtime "this" will be
// array that is castable to "T[]" (i.e. for primitivs and valuetypes, it will be exactly
// "T[]" - for orefs, it may be a "U[]" where U derives from T.)
//----------------------------------------------------------------------------------------
sealed class SZArrayHelper {
// It is never legal to instantiate this class.
private SZArrayHelper() {
Contract.Assert(false, "Hey! How'd I get here?");
}
/* ... snip ... */
}
Specifically this part:
the interface stub dispatcher treats this as a special case, loads up
SZArrayHelper, finds the corresponding generic method (matched simply
by method name), instantiates it for type <T> and executes it.
(Emphasis mine)
Source (scroll up).

Related

How to "carry" covariance through multiple interfaces

I've got an interface structure that looks like this:
At the most basic level is an IDataProducer with this definition:
public interface IDataProducer<out T>
{
IEnumerable<T> GetRecords();
}
and an IDataConsumer that looks like this:
public interface IDataConsumer<out T>
{
IDataProducer<T> Producer { set; }
}
Finally, I've got an IWriter that derives off of IDataConsumer like so:
public interface IWriter<out T> : IDataConsumer<T>
{
String FileToWriteTo { set; }
void Start();
}
I wanted to make IWriter's generic type T covariant so that I could implement a Factory method to create Writers that could handle different objects without having to know what type would be returned ahead of time. This was implemented by marking the generic type "out". The problem is, I'm having a compile error on IDataConsumer because of this:
Invalid variance: The type parameter 'T' must be contravariantly valid on 'IDataConsumer<T>.Producer'. 'T' is covariant.
I'm not really sure how this can be. It looks to me like the generic type is marked as covariant through the whole chain of interfaces, but it is very possible I don't totally understand how covariance works. Can someone explain to me what I am doing wrong?
The problem is that your Producer property is write-only. That is, you are actually using T in a contravariant way, by passing the value that is generic on type T into the implementer of the interface, rather than the implementer passing it out.
One of the things I like best about the way the C# language design team handled the variance feature in generic interfaces is that the keywords used to denote covariant and contravariant type parameters are mnemonic with the way the parameters are used. I always have a hard time remembering what the words "covariant" and "contravariant" mean, but I never have any trouble remembering what out T vs. in T means. The former means that you promise to only return T values from the interface (e.g. method return values or property getters), while the latter means that you promise to only accept T values into the interface (e.g. method parameters or property setters).
You broke that promise by providing a setter for the Producer property.
Depending on how these interfaces are implemented, it's possible what you want is interface IDataConsumer<in T> instead. That would at least compile. :) And as long as the IDataConsumer<T> implementation really is only consuming the T values, that would probably work. Hard to say without a more complete example.
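As a rough sketch of that suggestion: with the consumer declared as in T, the interfaces compile and a consumer of a base type can stand in for a consumer of a derived type. Animal, Giraffe, GiraffeProducer and AnimalCounter are hypothetical types added for illustration; only the two interfaces come from the question.

```csharp
using System;
using System.Collections.Generic;

public interface IDataProducer<out T>
{
    IEnumerable<T> GetRecords();
}

// 'in' instead of 'out': T only flows into the consumer.
public interface IDataConsumer<in T>
{
    IDataProducer<T> Producer { set; }
}

public class Animal { }
public class Giraffe : Animal { }

public class GiraffeProducer : IDataProducer<Giraffe>
{
    public IEnumerable<Giraffe> GetRecords()
    {
        yield return new Giraffe();
        yield return new Giraffe();
    }
}

// Hypothetical consumer that can handle any Animal; it just counts records.
public class AnimalCounter : IDataConsumer<Animal>
{
    public int Seen { get; private set; }

    public IDataProducer<Animal> Producer
    {
        set
        {
            foreach (Animal animal in value.GetRecords()) Seen++;
        }
    }
}

public static class ContravarianceDemo
{
    public static int Run()
    {
        var counter = new AnimalCounter();
        IDataConsumer<Giraffe> consumer = counter; // contravariant conversion
        consumer.Producer = new GiraffeProducer(); // giraffes are animals, so this is safe
        return counter.Seen;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Whether this direction fits depends on the real implementations: a factory that hands back writers for a more derived T needs the consumer to accept the base type, not the other way around.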
Peter's answer is correct. To add to it: it helps to try out some examples and see what goes wrong. Suppose the code you had originally was allowed by the compiler. We could then say:
class TigerConsumer : IDataConsumer<Tiger>
{
public IDataProducer<Tiger> p;
public IDataProducer<Tiger> Producer { set { p = value; } }
... and so on ...
}
class GiraffeProducer : IDataProducer<Giraffe>
{
public IEnumerable<Giraffe> GetRecords() {
yield return new Giraffe();
}
}
...
TigerConsumer t = new TigerConsumer();
IDataConsumer<Mammal> m = t; // compatible with IDataConsumer<Mammal>
m.Producer = new GiraffeProducer(); // compatible with IDataProducer<Mammal>
foreach(Tiger tiger in t.p.GetRecords())
// And we just cast a giraffe to tiger
Every step on the way here is perfectly typesafe but the program is plainly wrong. Either one of those conversions has to be illegal, or one of the interfaces is not safe for covariance. We wish all those conversions to be legal, and therefore we must detect the lack of type safety in your interface declarations.

Arrays vs Generic in C#

I have noticed that array, in c#, implements ICollection<T>. How can an array implement a generic container interface, yet not be generic itself? Is it possible for us to do the same?
Edit: I would also like to know how the array is not generic, yet it accepts any type and has type safety.
public class ListOfStrings : IList<string>
{
...
}
This is a great example that demonstrates that we can create non-generics from a generic (Thank you MarcinJuraszek!!). This collection would be stuck with strings. My guess is that it has nothing to do with the generic value type declaration of string and is some internal wiring that I am unfamiliar with.
Thank you again!
Yes, it's totally possible. You can declare something like this:
public class MyListOfStrings : IList<string>
{
}
and as long as you implement all the properties/methods IList<string> requires, everything will work just fine. As you can see MyListOfStrings is not generic.
You should also remember that Arrays are special types, and there is a bunch of stuff going on with them that's not happening with regular user-defined types. Some of it is described on MSDN, and the part that seems to be related to your questions is here:
Starting with the .NET Framework 2.0, the Array class implements the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, and System.Collections.Generic.IEnumerable<T> generic interfaces. The implementations are provided to arrays at run time, and as a result, the generic interfaces do not appear in the declaration syntax for the Array class. In addition, there are no reference topics for interface members that are accessible only by casting an array to the generic interface type (explicit interface implementations). The key thing to be aware of when you cast an array to one of these interfaces is that members which add, insert, or remove elements throw NotSupportedException.
As you can see Array implements IList<T>, ICollection<T> and IEnumerable<T> in a special way, and it's not something you can do with your own type.

Casting generic container of type to container of inherited type?

If I have two classes:
public class A { }
public class B : A { }
and I create a generic container and a function that takes it:
public void Foo(List<A> lst) { ... }
I get an invalid conversion if I attempt casting a List<B> to a List<A>, and instead have to pass it like so:
var derivedList = new List<B>();
Foo(new List<A>(derivedList));
Is there some way to pass a List<B> to this function without the overhead of allocating a brand new list, or does C# not support converting from a generic container of a derived type to its base type?
A List<B> simply isn't a List<A> - after all, you can add a plain A to a List<A>, but not to a List<B>.
If you're using C# 4 and .NET 4 and your Foo method only really needs to iterate over the list, then change the method to:
public void Foo(IEnumerable<A> lst) { ... }
In .NET 4, IEnumerable<T> is covariant in T, which allows a conversion from IEnumerable<B> (including a List<B>) to IEnumerable<A>. This is safe because values only ever flow "out" of IEnumerable<A>.
For a much more detailed look at this, you can watch the video of the session I gave at NDC 2010 as part of the torrent of NDC 2010 videos.
This is not possible. C# doesn't support co / contra variance on concrete types such as List<T>. It does support it on interfaces though so if you switch Foo to the following signature you can avoid an allocation
public void Foo(IEnumerable<A> enumerable) { ... }
If you wish to pass list-like things to routines which are going to read them but not write them, it would be possible to define a generic covariant IReadableList<out T> interface, so that an IReadableList<Cat> could be passed to a routine expecting an IReadableList<Animal>. Unfortunately, common existing IList<T> implementations don't implement any such thing, and so the only way to implement one would be to implement a wrapper class (which could accept an IList as a parameter), but it probably wouldn't be too hard. Such a class should also implement non-generic IList, also as read-only, to allow code to evaluate Count without having to know the type of the items in the list.
Note that an object's implementation of IReadableList<T> should not be regarded as any promise of immutability. It would be perfectly reasonable to have a read-write list or wrapper class implement IReadableList<T>, since a read-write list is readable. It's not possible to use an IReadableList<T> to modify a list without casting it to something else, but there's no guarantee a list passed as IReadableList<T> can't be modified some other way, such as by casting it to something else, or by using a reference stored elsewhere.
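A minimal sketch of such an interface and wrapper might look like this. IReadableList<out T> is the hypothetical interface described above, ReadableListWrapper<T>, Animal and Cat are made-up names, and the non-generic IList implementation mentioned above is left out to keep the example short.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical covariant, read-only view of a list.
public interface IReadableList<out T> : IEnumerable<T>
{
    int Count { get; }
    T this[int index] { get; }
}

// Wraps any IList<T> without copying it. The wrapper is a view, not a snapshot:
// changes made to the inner list remain visible through it.
public sealed class ReadableListWrapper<T> : IReadableList<T>
{
    private readonly IList<T> _inner;

    public ReadableListWrapper(IList<T> inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        _inner = inner;
    }

    public int Count { get { return _inner.Count; } }
    public T this[int index] { get { return _inner[index]; } }
    public IEnumerator<T> GetEnumerator() { return _inner.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public class Animal { public virtual string Name { get { return "animal"; } } }
public class Cat : Animal { public override string Name { get { return "cat"; } } }

public static class ReadableListDemo
{
    // Accepts a list of any Animal subtype thanks to the 'out' modifier.
    public static string FirstName(IReadableList<Animal> animals)
    {
        return animals[0].Name;
    }

    public static void Main()
    {
        var cats = new List<Cat> { new Cat() };
        Console.WriteLine(FirstName(new ReadableListWrapper<Cat>(cats)));
    }
}
```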

Covariant generic parameter

I'm trying to understand this but I didn't get any appropriate results from searching.
In C# 4, I can do
public interface IFoo<out T>
{
}
How is this different from
public interface IFoo<T>
{
}
All I know is the out makes the generic parameter covariant (??).
Can someone explain the usage of the <out T> part with an example? And also, why is it applicable only to interfaces and delegates and not to classes?
Sorry if it's a duplicate and close it as such if it is.
Can someone explain the usage of the out T part with an example?
Sure. IEnumerable<T> is covariant. That means you can do this:
static void FeedAll(IEnumerable<Animal> animals)
{
foreach(Animal animal in animals) animal.Feed();
}
...
IEnumerable<Giraffe> giraffes = GetABunchOfGiraffes();
FeedAll(giraffes);
"Covariant" means that the assignment compatibility relationship of the type argument is preserved in the generic type. Giraffe is assignment compatible with Animal, and therefore that relationship is preserved in the constructed types: IEnumerable<Giraffe> is assignment compatible with IEnumerable<Animal>.
Why is it applicable only to interfaces and delegates and not to classes?
The problem with classes is that classes tend to have mutable fields. Let's take an example. Suppose we allowed this:
class C<out T>
{
private T t;
OK, now think this question through carefully before you go on. Can C<T> have any method outside of the constructor that sets the field t to something other than its default?
Because it must be typesafe, C<T> can now have no methods that take a T as an argument; T can only be returned. So who sets t, and where do they get the value they set it from?
Covariant class types really only work if the class is immutable. And we don't have a good way to make immutable classes in C#.
I wish we did, but we have to live with the CLR type system that we were given. I hope in the future we can have better support for both immutable classes, and for covariant classes.
If this feature interests you, consider reading my long series on how we designed and implemented the feature. Start from the bottom:
https://blogs.msdn.microsoft.com/ericlippert/tag/covariance-and-contravariance/
If we're talking about generic variance:
Covariance is all about values being returned from an operation back to the caller.
Contravariance is the opposite: it's about values being passed in by the caller.
From what I know, if a type parameter is only used for output, you can use out; if it is only used for input, you can use in. The keywords are a convenience, because you may not remember which form is called covariance and which is called contravariance. You don't have to declare the conversions explicitly: once the type has been declared with the modifier, the relevant conversions are available implicitly.
There is no variance (either covariance or contravariance) in classes: even if you have a class that only uses the type parameter for input (or only uses it for output), you can't specify the in or out modifiers. Only interfaces and delegates can have variant type parameters. Firstly, the CLR doesn't allow it. Secondly, from a conceptual point of view, interfaces represent a way of looking at an object from a particular perspective, whereas classes are more actual implementation types.
It means that if you have this:
class Parent { }
class Child : Parent { }
Then an instance of IFoo<Child> is also an instance of IFoo<Parent>.
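Here is a small sketch of that statement. The interface is given one member so that there is something to call, and ChildFoo is a made-up implementation.

```csharp
using System;

public interface IFoo<out T>
{
    T Get(); // T appears only in an output position
}

public class Parent { }
public class Child : Parent { }

public class ChildFoo : IFoo<Child>
{
    public Child Get() { return new Child(); }
}

public static class VarianceDemo
{
    public static bool Check()
    {
        IFoo<Child> childFoo = new ChildFoo();
        IFoo<Parent> parentFoo = childFoo; // compiles only because of 'out'
        Parent p = parentFoo.Get();        // still returns a Child instance

        object boxed = childFoo;
        return p is Child && boxed is IFoo<Parent>; // the runtime type test agrees
    }

    public static void Main()
    {
        Console.WriteLine(Check());
    }
}
```

Removing the out modifier from IFoo<T> turns the assignment to IFoo<Parent> into a compile-time error.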

Why can I not return a List<Foo> if asked for a List<IFoo>? [duplicate]

I understand that, if S is a child class of T, then a List<S> is not a child of List<T>. Fine. But interfaces have a different paradigm: if Foo implements IFoo, then why is a List<Foo> not (an example of) a List<IFoo>?
As there can be no actual class IFoo, does this mean that I would always have to cast each element of the list when exposing a List<IFoo>? Or is this simply bad design and I have to define my own collection class ListOfIFoos to be able to work with them? Neither seem reasonable to me...
What would be the best way of exposing such a list, given that I am trying to program to interfaces? I am currently tending towards actually storing my List<Foo> internally as a List<IFoo>.
Your List<Foo> is not a subclass of List<IFoo> because you cannot store in it a MyOwnFoo object, which also happens to be an IFoo implementation. (Liskov substitution principle)
The idea of storing a List<IFoo> instead of a dedicated List<Foo> is OK. If you need to cast the list's contents to their implementation type, this probably means your interface is not appropriate.
Here's an example of why you can't do it:
// Suppose we could do this...
public List<IDisposable> GetDisposables()
{
return new List<MemoryStream>();
}
// Then we could do this
List<IDisposable> disposables = GetDisposables();
disposables.Add(new Form());
At that point a list which was created to hold MemoryStreams now has a Form in it. Bad!
So basically, this restriction is present to maintain type safety. In C# 4 and .NET 4.0 there will be limited support for this (it's called variance) but it still won't support this particular scenario, for exactly the reasons given above.
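To make the boundary concrete, here is a sketch of what the C# 4 variance support does and does not allow. The class and method names are made up; it relies only on IEnumerable<T> being covariant from .NET 4 onwards.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class VarianceLimitsDemo
{
    // Allowed from C# 4 on: IEnumerable<out T> only hands values out,
    // so nobody can add a Form to the underlying list through it.
    public static IEnumerable<IDisposable> GetDisposables()
    {
        return new List<MemoryStream> { new MemoryStream(), new MemoryStream() };
    }

    // Still a compile-time error, for the reason given above:
    // public static List<IDisposable> Broken() { return new List<MemoryStream>(); }

    public static int DisposeAll()
    {
        int count = 0;
        foreach (IDisposable disposable in GetDisposables())
        {
            disposable.Dispose();
            count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(DisposeAll());
    }
}
```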
In your returning function, you have to make the list a list of interfaces, and when you create the object, make it as an object that implements it. Like this:
public List<IFoo> getList()
{
List<IFoo> r = new List<IFoo>();
for(int i=0;i<100;i++)
r.Add(new Foo(i+15));
return r;
}
MASSIVE EDIT
You won't be able to do this even with C# 4.0 [thanks Jon], but you can get around it using ConvertAll:
public List<IFoo> IFoos()
{
var x = new List<Foo>(); //Foo implements IFoo
/* .. */
return x.ConvertAll<IFoo>(f => f); //thanks Marc
}
The simple answer is that List<Foo> is a different type to List<IFoo>, in the same way that DateTime is different to IPAddress, for example.
However, the fact that you have IFoo implies that collections of IFoo are expected to contain at least two implementations of IFoo (FooA, FooB, etc...) because if you expect there to only ever be one implementation of IFoo, Foo, then the IFoo type is redundant.
So, if there is only ever going to be one derived type of an interface, forget the interface and save on the overhead. If there are two or more derived types of an interface then always use the interface type in collections/generic parameters.
If you find yourself writing thunking code then there's probably a design flaw somewhere.
If, at the time that IList<T> was invented, Microsoft had been aware that future versions of .net would support interface covariance and contravariance, it would have been possible and useful to split the interface into IReadableList<out T>, IAppendable<in T>, and IList<T> which would inherit both of the above. Doing so would have imposed a small amount of additional work on vb.net implementers (they would have to define both read-only and read-write versions of the indexed property, since for some reason .net doesn't allow a read-write property to serve as a read-only property) but would mean that methods which simply need to read items from a list could receive an IReadableList<T> in covariant fashion, and methods which simply need a collection they can append to could receive an IAppendable<T> in contravariant fashion.
Unfortunately, the only way such a thing could be implemented today would be if Microsoft provided a means for new interfaces to be substitutable for older ones, with implementations of the old interfaces automatically using default methods supplied by the new ones. I would think such a feature (interface substitutability) would be extremely helpful, but I wouldn't hold my breath waiting for Microsoft to implement it.
Given that there's no way to back-fit IReadableList<T> into IList<T>, an alternative approach would be to define one's own list-related interface. The one difficulty with doing so is that all instances of System.Collections.Generic.List<T> would have to be replaced with some other type, though the difficulty of doing that could be minimized if one were to define a List<T> struct in a different namespace which contained a single System.Collections.Generic.List<T> field and defined widening conversions to and from the system type (using a struct rather than a class would mean that code would avoid the need to create new heap objects when casting in any scenario where the struct wouldn't have to be boxed).
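The struct-based approach could be sketched roughly as follows. The MyCollections namespace, the trimmed-down IReadableList<out T> interface and the choice of forwarded members are all assumptions made for illustration; a real version would forward many more members of the system list.

```csharp
namespace MyCollections
{
    // Hypothetical covariant read-only interface, reduced to two members.
    public interface IReadableList<out T>
    {
        int Count { get; }
        T this[int index] { get; }
    }

    // A struct holding a single reference to the system list.
    // Note: default(List<T>) has a null inner list and would fail on use.
    public struct List<T> : IReadableList<T>
    {
        private readonly System.Collections.Generic.List<T> _inner;

        private List(System.Collections.Generic.List<T> inner) { _inner = inner; }

        public int Count { get { return _inner.Count; } }
        public T this[int index] { get { return _inner[index]; } }
        public void Add(T item) { _inner.Add(item); }

        // Widening conversions to and from the system type; no copy is made.
        public static implicit operator List<T>(System.Collections.Generic.List<T> inner)
        {
            return new List<T>(inner);
        }

        public static implicit operator System.Collections.Generic.List<T>(List<T> wrapper)
        {
            return wrapper._inner;
        }
    }
}

public static class WrapperDemo
{
    public static int RoundTrip()
    {
        var original = new System.Collections.Generic.List<string> { "a", "b" };
        MyCollections.List<string> wrapped = original;          // conversion in
        System.Collections.Generic.List<string> back = wrapped; // and back out
        back.Add("c");
        return wrapped.Count; // both names refer to the same underlying list
    }

    public static void Main()
    {
        System.Console.WriteLine(RoundTrip());
    }
}
```

Converting the struct to IReadableList<T> boxes it, so the allocation saving only applies where the struct type itself is used.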
