Why can Enumerable.Except be used on a string array in C#?


Example
Here is a code example I found at the Pete on Software blog:
var listThree = new string[] { "Pete", "On", "Software" };
var listFour = new string[] { "Joel", "On", "Software" };
var stringExcept = listThree.Except(listFour);
The code compiles and runs. So far so good.
Question
However, I don't understand why it works.
So, can anyone explain why I can use Enumerable.Except on a string array?
Perhaps, it will be clear to me if someone could explain how to read the signature of Enumerable.Except and give me a code example:
public static IEnumerable<TSource> Except<TSource>(
this IEnumerable<TSource> first,
IEnumerable<TSource> second
)
What I know
I know the concepts of generics and extension methods, but obviously not well enough to understand the code example above. I have also already used some basic LINQ queries.

Except is an extension method which extends any type that implements IEnumerable<T>. This includes the System.Array type which implements IEnumerable<T>.
The note on the linked page explains why the docs don't show System.Array implementing IEnumerable<T>
In the .NET Framework version 2.0, the Array class implements the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, and System.Collections.Generic.IEnumerable<T> generic interfaces. The implementations are provided to arrays at run time, and therefore are not visible to the documentation build tools. As a result, the generic interfaces do not appear in the declaration syntax for the Array class, and there are no reference topics for interface members that are accessible only by casting an array to the generic interface type (explicit interface implementations). The key thing to be aware of when you cast an array to one of these interfaces is that members which add, insert, or remove elements throw NotSupportedException.
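To make the quoted note concrete, here is a minimal sketch of what "provided at run time" means in practice. The class and method names (ArrayInterfaceDemo, AddThrows) are made up for illustration; the only assumption is the behaviour described in the note above.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayInterfaceDemo
{
    // Returns true when adding through the IList<string> view is rejected.
    public static bool AddThrows(string[] names)
    {
        IList<string> view = names; // implicit conversion, no cast needed
        try
        {
            view.Add("extra"); // arrays have a fixed size
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var names = new[] { "Pete", "On", "Software" };

        // A string[] can be used wherever an IEnumerable<string> is expected.
        IEnumerable<string> sequence = names;
        Console.WriteLine(string.Join(",", sequence)); // Pete,On,Software

        Console.WriteLine(AddThrows(names)); // True
    }
}
```

Reading through the interface works as usual; only the members that would change the length of the array throw.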

It just says that if you have an IEnumerable of a given type TSource (in this case, string), you can Except it with another IEnumerable of the same type and get a third IEnumerable of the same type back. The key point is that the two IEnumerable inputs must have the same element type (and the return value will have that element type too).

An array of T (that is, a T[]) is also an IEnumerable<T>. In your question, T is System.String. Enumerable.Except is an extension method on IEnumerable<T>, so it also works for a string[]. And stringExcept = listThree.Except(listFour); is equivalent to
stringExcept = Enumerable.Except(listThree, listFour);

The compiler will match the TSource argument to string as a string array implements the IEnumerable<string> interface and thus matches the first argument of the extension method. So the answer is two things:
string[] implements IEnumerable<string>
The compiler is intelligent enough to infer the generic arguments
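As a sketch of those two points, the three calls below are intended to be the same call written three ways: extension syntax with inference, a plain static call with inference, and a static call with the type argument spelled out. The class name ExtensionCallDemo and the method AllFormsAgree are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtensionCallDemo
{
    // Returns true when all three spellings produce the same sequence.
    public static bool AllFormsAgree()
    {
        var listThree = new string[] { "Pete", "On", "Software" };
        var listFour = new string[] { "Joel", "On", "Software" };

        // Extension syntax: TSource is inferred as string from the receiver.
        IEnumerable<string> a = listThree.Except(listFour);

        // The same method called as an ordinary static method.
        IEnumerable<string> b = Enumerable.Except(listThree, listFour);

        // The same call with the generic argument written explicitly.
        IEnumerable<string> c = Enumerable.Except<string>(listThree, listFour);

        return a.SequenceEqual(b) && b.SequenceEqual(c);
    }

    public static void Main()
    {
        Console.WriteLine(AllFormsAgree()); // True
    }
}
```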

The Except method returns the elements in the first enumerable that do not also appear in the second enumerable. So in the case you specified, the result would be {"Pete"}: "On" and "Software" appear in both arrays, and "Joel" appears only in the second one.
In this case, thinking in terms of string arrays is perhaps a red herring. It might be more advantageous to think in terms of object equality (http://msdn.microsoft.com/en-us/library/system.object.equals.aspx).
The Microsoft documentation is here: http://msdn.microsoft.com/en-us/library/system.linq.enumerable.except.aspx
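A small sketch of the set-difference behaviour described in this answer. The helper name Difference is hypothetical; the second and third calls illustrate that the result contains distinct elements and that equality can be customised with a comparer.

```csharp
using System;
using System.Linq;

public static class ExceptDemo
{
    // Elements of first that do not appear in second, as an array.
    public static string[] Difference(string[] first, string[] second)
    {
        return first.Except(second).ToArray();
    }

    public static void Main()
    {
        var listThree = new[] { "Pete", "On", "Software" };
        var listFour = new[] { "Joel", "On", "Software" };

        Console.WriteLine(string.Join(",", Difference(listThree, listFour))); // Pete

        // Except is a set operation, so duplicates in the first sequence collapse.
        Console.WriteLine(string.Join(",", Difference(new[] { "a", "a", "b" }, new[] { "b" }))); // a

        // Equality decides what counts as "the same"; a comparer can change it.
        var ignoreCase = new[] { "ON" }.Except(new[] { "on" }, StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(ignoreCase.Count()); // 0
    }
}
```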

Related

Why does the compiler pick the extension method on string over implicit char array?

If I have System.Linq imported, I can use this ToArray overload in the following call:
var x = "foo".ToArray();
and x is assigned a char[] with three elements that are the characters from the string "foo". Then if I add a custom extension method in scope:
public static T[] ToArray<T>(this T toConvert) => new[] { toConvert };
The compiler silently changes its mind and x becomes a string[] with one element that is the string "foo".
Why did the compiler not complain about ambiguity? I know some seemingly ambiguous situations are resolved automatically by the compiler without errors, but I can't find any documentation or references about this type of situation. Basically, treating a string as a string rather than an implicit array of char seems to be the preferred behavior...

The first extension method you reference:
public static TSource[] ToArray<TSource> (this System.Collections.Generic.IEnumerable<TSource> source);
Converts an IEnumerable<TSource> to an array (interface of generic type).
The second extension method you made:
public static T[] ToArray<T>(this T toConvert) => new[] { toConvert };
Converts any T object to a single-element array. When you call it on a string, T is inferred as string, so the parameter type is exactly string and the argument needs only an identity conversion. The first method needs an implicit reference conversion from string to IEnumerable<char>. Overload resolution ranks an exact match above a conversion to an implemented interface, so the second method is the better candidate and there is no ambiguity to report.
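The following sketch reproduces the situation, assuming both ToArray methods are brought into scope by using directives at the same level so that they compete in the same candidate set. The namespaces Demo.Extensions and Demo.App and the class names are hypothetical.

```csharp
using System;
using System.Linq;
using Demo.Extensions;

namespace Demo.Extensions
{
    public static class SingleItemExtensions
    {
        // The custom extension from the question: wraps any value in a one-element array.
        public static T[] ToArray<T>(this T toConvert) => new[] { toConvert };
    }
}

namespace Demo.App
{
    public static class OverloadProbe
    {
        // Extension syntax: both ToArray methods are candidates here.
        public static string ExtensionSyntaxType() => "foo".ToArray().GetType().Name;

        // Naming the LINQ class explicitly bypasses the custom method.
        public static string ExplicitLinqType() => Enumerable.ToArray("foo").GetType().Name;

        public static void Main()
        {
            Console.WriteLine(ExtensionSyntaxType()); // String[]
            Console.WriteLine(ExplicitLinqType());    // Char[]
        }
    }
}
```

If the char[] result is what you want while the custom method is in scope, calling Enumerable.ToArray directly (or "foo".ToCharArray()) avoids the competition altogether.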
C# language spec, go about 60% down to find Method Overloading:
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/introduction
C# overload resolution:
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-7.3/improved-overload-candidates
VB version, although applies to C# mostly:
https://learn.microsoft.com/en-us/dotnet/visual-basic/reference/language-specification/overload-resolution

Does usage of contains on IEnumerable cast it to a List?

I'm using LINQ to filter data I get from the database. Due to earlier design choices, one method returns an IEnumerable<int>, which I then use in a LINQ statement to decide which IDs may be returned (code follows below). I can't find anything about this in the documentation, so my question is: does the Contains method implicitly cast the IEnumerable to a List for the statement to be executed? (If so, would it be better to use a List in the first place instead of an IEnumerable?)
Code Example
private List<MyData> GetAllPermittedData()
{
    IEnumerable<int> permittedIds = GetPermittedIDs();
    return (from a in MyDataHandler.GetAllData()
            where permittedIds.Contains(a.Id)
            select a).ToList();
}
Like I asked above I'm not sure if the Contains part implicitly converts permittedIds into a List<int> (for/inside the use of the Contains statement). If this is the case then a followup question would be if it is not better to already use the following statement instead (performance-wise):
private List<MyData> GetAllPermittedData()
{
    List<int> permittedIds = GetPermittedIDs().ToList();
    return (from a in MyDataHandler.GetAllData()
            where permittedIds.Contains(a.Id)
            select a).ToList();
}

The LINQ operator will attempt to cast it to ICollection<T> first. If the cast succeeds, it calls that collection's own Contains method. Since List<T> implements this interface, the list's Contains method is used.
Note that if you use the overload that accepts an IEqualityComparer, it must iterate over the enumerable and the ICollection shortcut is not taken.
You can see this implementation in the .NET Framework reference source:
public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value) {
ICollection<TSource> collection = source as ICollection<TSource>;
if (collection != null) return collection.Contains(value);
return Contains<TSource>(source, value, null);
}
Jon Skeet also has a good (and lengthy) blog series called "Reimplementing LINQ" where he discusses the implementation in depth. He specifically covers Contains in part 32 of his blog.
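One way to observe the shortcut is a probe collection that records which of its members get called. This is only a sketch: ProbeCollection is a hypothetical class, and the outcome depends on the LINQ implementation in use. With the reference source quoted above, Contains should be forwarded to the collection and the enumerator should never be requested.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical probe: remembers whether Contains or GetEnumerator was used.
public sealed class ProbeCollection : ICollection<int>
{
    private readonly List<int> _items = new List<int> { 1, 2, 3 };

    public bool ContainsCalled { get; private set; }
    public bool Enumerated { get; private set; }

    public bool Contains(int item) { ContainsCalled = true; return _items.Contains(item); }
    public IEnumerator<int> GetEnumerator() { Enumerated = true; return _items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public int Count => _items.Count;
    public bool IsReadOnly => true;
    public void CopyTo(int[] array, int arrayIndex) => _items.CopyTo(array, arrayIndex);

    // The probe is read-only, so the mutating members are not supported.
    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Remove(int item) => throw new NotSupportedException();
}

public static class ContainsDemo
{
    public static void Main()
    {
        var probe = new ProbeCollection();
        IEnumerable<int> asSequence = probe; // static type is only IEnumerable<int>

        bool found = asSequence.Contains(2); // binds to Enumerable.Contains

        Console.WriteLine(found);
        Console.WriteLine("Contains forwarded: " + probe.ContainsCalled);
        Console.WriteLine("Sequence enumerated: " + probe.Enumerated);
    }
}
```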

The Contains method may try to cast the passed IEnumerable<T> to IList<T> or to ICollection<T>. If the cast succeeds, it may directly use the methods of IList<T>, otherwise it will enumerate over the full sequence.
Note that I am writing may because this is implementation-specific and it is not specified in the docs. As such, it could be different across .NET versions and also in alternative implementations such as Mono.
Your advantage by providing only an IEnumerable<T> is that you have more freedom to exchange the object returned from that property without changing the public interface. The performance cost of the attempted cast to IList<T> or similar should be negligible.
In any case, this way is more performant than your suggestion of calling ToList, as that will actually create a new List<T> and copy all items from the enumeration into it.

Contains exists as an extension method for IEnumerable<T>. But you don't need to convert your IEnumerable to a List<T> with ToList(); you could simply use that IEnumerable<T> to fill a HashSet<T>:
var permittedIds = new HashSet<int>(GetPermittedIDs());
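A sketch of how the HashSet<T> idea might slot into the method from the question, assuming the filtering happens in memory (LINQ to Objects). MyData here is a stand-in with just an Id property, and Filter is a hypothetical helper taking the two sequences as parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's MyData type; only Id matters here.
public sealed class MyData
{
    public int Id { get; set; }
}

public static class PermittedFilter
{
    public static List<MyData> Filter(IEnumerable<MyData> allData, IEnumerable<int> permittedIds)
    {
        // Build the lookup once; each Contains call is then a hash lookup
        // instead of a scan over the permitted IDs.
        var lookup = new HashSet<int>(permittedIds);
        return allData.Where(d => lookup.Contains(d.Id)).ToList();
    }

    public static void Main()
    {
        var data = Enumerable.Range(1, 5).Select(i => new MyData { Id = i });
        var permitted = new[] { 2, 4, 4 }; // duplicates are harmless in a set

        var result = Filter(data, permitted);
        Console.WriteLine(string.Join(",", result.Select(d => d.Id))); // 2,4
    }
}
```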

List<T> Constructor (IEnumerable<T>) doesn't accept an array created from Array.CreateInstance(Type, Int32)

I'm a newbie to programming and .net, and I am having a hard time understanding why using the List<T>(IEnumerable<T>) constructor accepts an array created using [], but does not accept an array created using Array.CreateInstance(Type, Int32).
Here is what works:
DirectoryInfo[] dirsArray = foo.GetDirectories();
List<DirectoryInfo> dirsList = new List<DirectoryInfo>(dirsArray);
Here is what doesn't:
Array dirsArray = Array.CreateInstance(typeof(DirectoryInfo), 10); //assume we know 10 is the required length
List<DirectoryInfo> dirsList = new List<DirectoryInfo>(dirsArray);
The above gives the following compiler errors:
Error 1 The best overloaded method match for 'System.Collections.Generic.List<System.IO.DirectoryInfo>.List(System.Collections.Generic.IEnumerable<System.IO.DirectoryInfo>)' has some invalid arguments
Error 2 Argument 1: cannot convert from 'System.Array' to 'System.Collections.Generic.IEnumerable<System.IO.DirectoryInfo>'
But I know that List<T>(IEnumerable<T>) can accept any IEnumerable as an argument. And I know that System.Array is IEnumerable. Not only because that is in the reference, but because the first example using the [] constructor syntax works fine.
So then what is the problem here? Does Array.CreateInstance somehow manage to create an array that is not IEnumerable?

The Array class itself is not an IEnumerable<T>. You will need to cast the result of Array.CreateInstance:
var dirsArray = (DirectoryInfo[]) Array.CreateInstance(typeof(DirectoryInfo), 10);
Array is the base class for all array types, and the implementation of IEnumerable<T> is provided at run time. So it is not possible to use an Array as an IEnumerable<T> at compile time. From MSDN:
Starting with the .NET Framework 2.0, the Array class implements the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, and System.Collections.Generic.IEnumerable<T> generic interfaces. The implementations are provided to arrays at run time, and as a result, the generic interfaces do not appear in the declaration syntax for the Array class.
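The difference between the compile-time type and the run-time type can be checked directly. In this sketch the class name CreateInstanceDemo and the helper ToDirectoryList are illustrative; the array elements are left as null because only the array's type matters here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CreateInstanceDemo
{
    // The compiler only sees System.Array, so a cast to the real
    // array type is needed before the List<T> constructor accepts it.
    public static List<DirectoryInfo> ToDirectoryList(Array untyped)
    {
        return new List<DirectoryInfo>((DirectoryInfo[])untyped);
    }

    public static void Main()
    {
        Array dirsArray = Array.CreateInstance(typeof(DirectoryInfo), 10);

        // At run time the object really is a DirectoryInfo[] ...
        Console.WriteLine(dirsArray.GetType() == typeof(DirectoryInfo[])); // True

        // ... and therefore also an IEnumerable<DirectoryInfo>.
        Console.WriteLine(dirsArray is IEnumerable<DirectoryInfo>); // True

        Console.WriteLine(ToDirectoryList(dirsArray).Count); // 10
    }
}
```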

Because Array.CreateInstance(Type, Int32) does not return an IEnumerable<T> instance as far as the compiler is concerned; its declared return type is Array. If you create an array using [], its static type is T[], which is an IEnumerable<T>.
Please take a look at this thread, it's quite explanatory: Why isn't Array a generic type?

Array.CreateInstance returns an untyped Array which as you'll notice, doesn't implement IEnumerable<T> (it's not generic), so can't be used with the List<T> constructor.
However, what's actually returned for Array.CreateInstance is typed, you just have to cast it to the type you want (all typed arrays are derived from the base Array class). So you can do this:
List<DirectoryInfo> dirsList = new List<DirectoryInfo>((DirectoryInfo[])dirsArray);
And it should compile.
That said, I've never found a reason to use Array.CreateInstance.
Update: Since in the comments to the original question you talk about converting a ControlCollection to a List<Control>, it should be noted that you can't cast a ControlCollection to a typed array. So something like this doesn't work:
var lst = (Control[])myControlCollection;
Because ControlCollection (and a lot of the older, pre-generics, collections in the framework) doesn't derive from Array. In that case, since ControlCollection implements IEnumerable, you can use the Cast extension method:
var lst = new List<Control>(myControlCollection.Cast<Control>());
You can use this trick with a lot of the collection classes in the framework if they implement IEnumerable. MatchCollection to give another example.
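Here is a sketch of the same Cast trick applied to MatchCollection, which keeps the example free of any UI framework. The method name Numbers and the sample input are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CastDemo
{
    // Collects every run of digits in the input as a typed list.
    public static List<string> Numbers(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\d+");

        // Cast<Match>() turns the non-generic IEnumerable into an
        // IEnumerable<Match>, which the rest of LINQ can work with.
        return matches.Cast<Match>().Select(m => m.Value).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Numbers("a1b22c333"))); // 1,22,333
    }
}
```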

C# Variance Issue with IEnumerable<T> vs <T> [duplicate]

So, I'm having an issue with similar code to below:
public static String MyFunc<T>(this IEnumerable<T> list) where T : struct
{
... some code ...
return myString;
}
public static String MyFunc<T>(this T o) where T : struct
{
... some code ...
return myString;
}
The problem is that when I try to call MyFunc on a List, it uses the second function instead of the one that accepts an IEnumerable. I know this has to do with variance, but I'm uncertain as to how to force it to use the first function rather than the second. The code I would use to call the first one would be:
List<int> x = new List<int>();
String s = x.MyFunc();
The above code immediately goes to the second function and I need it to use the first. How can I force the desired behavior? Incidentally, I'm using .NET 4.0

The reason that it's currently picking the second method is that a conversion from a type to itself (second method, T=List<int>, conversion from List<int> to List<int>) will always be "better" than a conversion to the type it implements (first method, T=int, conversion from List<int> to IEnumerable<int>). This has nothing to do with variance, by the way - it's just the method overloading algorithm and type inference.
Note that with your current code, although the second overload is picked, it will then be found to be invalid because T violates the T : struct constraint. The constraint is only checked after the overload is chosen. See Eric Lippert's blog post on this for more details.
I suggest you just give the two methods different names.
EDIT: As noted by Anthony in comments, this can work if you call it as:
x.AsEnumerable().MyFunc();
Or just change the declaration to:
IEnumerable<int> x = new List<int>();
x.MyFunc();
It's not entirely clear to me exactly why it's better here - in this case after type argument substitution, you've basically got IEnumerable<T> as the parameter type in both cases. However, I would still strongly recommend using different names here. The fact that it's got me puzzling over the spec to work out which overload is being called should be enough indication that the behaviour won't be immediately clear to everyone reading the code.
EDIT: I think the reason is here (from the C# 5 spec, section 7.5.3.2):
A type parameter is less specific than a non-type parameter
So just T is less specific than IEnumerable<T>, even though the latter involves a type parameter. It's still not clear to me whether this is the language designers' intended behaviour... I can see why a type which involves type parameters should be seen as less specific than a type which doesn't involve type parameters, but not quite this wording...
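The two workarounds from this answer can be tried in a small sketch. The method bodies are replaced with marker strings so the chosen overload is visible; the names MyExtensions and OverloadDemo are illustrative. The sketch deliberately leaves out the direct x.MyFunc() call on a List<int>, since how that call is treated is exactly the point under discussion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MyExtensions
{
    public static string MyFunc<T>(this IEnumerable<T> list) where T : struct
    {
        return "sequence overload";
    }

    public static string MyFunc<T>(this T o) where T : struct
    {
        return "single-value overload";
    }
}

public static class OverloadDemo
{
    public static string ViaAsEnumerable(List<int> x) => x.AsEnumerable().MyFunc();

    public static string ViaDeclaredType(List<int> x)
    {
        IEnumerable<int> y = x; // the static type is now IEnumerable<int>
        return y.MyFunc();
    }

    public static void Main()
    {
        var x = new List<int> { 1, 2, 3 };
        int n = 42;

        Console.WriteLine(ViaAsEnumerable(x)); // sequence overload
        Console.WriteLine(ViaDeclaredType(x)); // sequence overload
        Console.WriteLine(n.MyFunc());         // single-value overload
    }
}
```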

Co-variant array conversion from x to y may cause run-time exception

I have a private readonly list of LinkLabels (IList<LinkLabel>). I later add LinkLabels to this list and add those labels to a FlowLayoutPanel as follows:
foreach(var s in strings)
{
_list.Add(new LinkLabel{Text=s});
}
flPanel.Controls.AddRange(_list.ToArray());
Resharper shows me a warning: Co-variant array conversion from LinkLabel[] to Control[] can cause run-time exception on write operation.
Please help me to figure out:
What does this mean?
This is a user control and will not be accessed by multiple objects to set up labels, so keeping the code as it is should not affect it.

What it means is this
Control[] controls = new LinkLabel[10]; // compile time legal
controls[0] = new TextBox(); // compile time legal, runtime exception
And in more general terms
string[] array = new string[10];
object[] objs = array; // legal at compile time
objs[0] = new Foo(); // again legal, with runtime exception
In C#, you are allowed to reference an array of objects (in your case, LinkLabels) as an array of a base type (in this case, as an array of Controls). It is also compile time legal to assign another object that is a Control to the array. The problem is that the array is not actually an array of Controls. At runtime, it is still an array of LinkLabels. As such, the assignment, or write, will throw an exception.
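A runnable sketch of the failure described above, using string and a boxed int in place of the UI types. The run-time check raises ArrayTypeMismatchException; the names CovarianceDemo and WriteThrows are illustrative.

```csharp
using System;

public static class CovarianceDemo
{
    // Returns true when writing a non-string through the object[] view fails.
    public static bool WriteThrows()
    {
        string[] strings = new string[1];
        object[] objects = strings; // covariant array conversion, legal at compile time

        objects[0] = "still a string"; // fine: the element really is a string

        try
        {
            objects[0] = 42; // compiles, but the array is a string[] at run time
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(WriteThrows()); // True
    }
}
```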

I'll try to clarify Anthony Pegram's answer.
A generic type is covariant in some type argument when it only returns values of said type (e.g. Func<out TResult> returns instances of TResult, IEnumerable<out T> returns instances of T). That is, if something returns instances of TDerived, you can just as well work with such instances as if they were of TBase.
A generic type is contravariant in some type argument when it only accepts values of said type (e.g. Action<in TArgument> accepts instances of TArgument). That is, if something needs instances of TBase, you can just as well pass in instances of TDerived.
It seems quite logical that generic types which both accept and return instances of some type (unless it is defined twice in the generic type signature, e.g. CoolList<TIn, TOut>) are neither covariant nor contravariant in the corresponding type argument. For example, List is defined in .NET 4 as List<T>, not List<in T> or List<out T>.
Some compatibility reasons might have caused Microsoft to ignore that argument and make arrays covariant in their element type. Maybe they conducted an analysis and found that most people only use arrays as if they were read-only (that is, they only use array initializers to write some data into an array), and, as such, the advantages outweigh the disadvantages caused by possible run-time errors when someone tries to make use of covariance while writing into the array. Hence it is allowed but not encouraged.
As for your original question, list.ToArray() creates a new LinkLabel[] with values copied from original list, and, to get rid of (reasonable) warning, you'll need to pass in Control[] to AddRange. list.ToArray<Control>() will do the job: ToArray<TSource> accepts IEnumerable<TSource> as its argument and returns TSource[]; List<LinkLabel> implements read-only IEnumerable<out LinkLabel>, which, thanks to IEnumerable covariance, could be passed to the method accepting IEnumerable<Control> as its argument.
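The difference between the two ToArray calls can be seen without WinForms. In this sketch Control, LinkLabel and TextBox are empty stand-in classes declared locally, not the real System.Windows.Forms types, and ToArrayDemo is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Empty stand-ins for the WinForms types, so the sketch needs no UI references.
public class Control { }
public class LinkLabel : Control { }
public class TextBox : Control { }

public static class ToArrayDemo
{
    public static string CovariantArrayType(IList<LinkLabel> list)
    {
        Control[] covariant = list.ToArray(); // really a LinkLabel[] behind a Control[] reference
        return covariant.GetType().Name;
    }

    public static string GenuineArrayType(IList<LinkLabel> list)
    {
        // IList<LinkLabel> is read as IEnumerable<Control> thanks to covariance,
        // and ToArray<Control> allocates a real Control[].
        Control[] genuine = list.ToArray<Control>();
        genuine[0] = new TextBox(); // safe: the array accepts any Control
        return genuine.GetType().Name;
    }

    public static void Main()
    {
        IList<LinkLabel> list = new List<LinkLabel> { new LinkLabel() };

        Console.WriteLine(CovariantArrayType(list)); // LinkLabel[]
        Console.WriteLine(GenuineArrayType(list));   // Control[]
    }
}
```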

The warning is due to the fact that you could theoretically add a Control other than a LinkLabel to the LinkLabel[] through the Control[] reference to it. This would cause a runtime exception.
The conversion is happening here because AddRange takes a Control[].
More generally, converting a container of a derived type to a container of a base type is only safe if you can't subsequently modify the container in the way just outlined. Arrays do not satisfy that requirement.

The most straightforward "solution":
flPanel.Controls.AddRange(_list.AsEnumerable<Control>().ToArray());
Now, since you are covariantly converting the list of LinkLabels to IEnumerable<Control> and building a new Control[] from it, there are no more concerns: it is not possible to "add" an item to an enumerable, and the array that AddRange receives really is a Control[].

The issue's root cause is correctly described in other answers, but to resolve the warning, you can always write:
_list.ForEach(lnkLbl => flPanel.Controls.Add(lnkLbl));

With VS 2008, I am not getting this warning. This must be new to .NET 4.0.
Clarification: according to Sam Mackrill it's Resharper who displays a warning.
The C# compiler does not know that AddRange will not modify the array passed to it. Since AddRange has a parameter of type Control[], it could in theory try to assign a TextBox to the array, which would be perfectly correct for a true array of Control, but the array is in reality an array of LinkLabels and will not accept such an assignment.
Making arrays covariant in C# was a bad decision by Microsoft. While it might seem a good idea to be able to assign an array of a derived type to an array of a base type in the first place, this can lead to run-time errors!

How about this?
flPanel.Controls.AddRange(_list.OfType<Control>().ToArray());
