I'm wondering why C# is moving towards more pattern-based programming rather than conventional approaches.
For example, the foreach statement expects the loop source to have a magic method called GetEnumerator, which returns an object that has a few more magic members such as MoveNext and Current, but it doesn't mandate any specific interface. C# could have mandated that a class used in foreach implement IEnumerable or IEnumerable<T>, as it does for the using statement, which expects the object to implement the IDisposable interface.
Also, I see a similar trend with async/await keywords as well....
Of course there must be a good reason for that, but I find it hard to understand why the compiler/CLR requires "magic methods" rather than relying on interfaces.
foreach
I would say it's about both performance and compatibility.
If you had chosen foreach to use IEnumerable it would have made iteration
over all generic collections very slow for value-type T (because of
boxing/unboxing).
If you had chosen to use IEnumerable<T>, iterating over ArrayList and
all non-generic collections from early .NET versions would not have been
possible.
I think the design decision was good. When foreach was introduced (.NET 1.0) there were no generics in .NET (they were introduced in .NET 2.0). Choosing IEnumerable as the source of foreach enumeration would have made it perform poorly with generic collections, or would have required a radical change later. I guess the designers already knew that they were going to introduce generics not long afterwards.
Additionally, specifying it as "use IEnumerable<T> when it's available, or IEnumerable when it's not" is not much different from "use the available GetEnumerator method, or fail to compile when there isn't one", is it?
update
As @mikez mentioned in the comments, there is one more advantage. Because GetEnumerator is not required to return IEnumerator/IEnumerator<T>, you can return a struct and not worry about boxing when the enumerator is used by the loop.
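To make the pattern concrete, here is a minimal sketch of a type that foreach accepts even though it implements no interface. The names Countdown, Enumerator and CountdownDemo are invented for illustration; only the member shapes (GetEnumerator, MoveNext, Current) are what the compiler looks for.

```csharp
// Hypothetical type: it implements no interface at all, yet foreach accepts it
// because it exposes a public GetEnumerator() whose result has MoveNext() and Current.
public sealed class Countdown
{
    private readonly int _start;
    public Countdown(int start) => _start = start;

    public Enumerator GetEnumerator() => new Enumerator(_start);

    // A struct enumerator: foreach calls its members directly, so nothing is boxed.
    public struct Enumerator
    {
        private int _next;
        public Enumerator(int start) { _next = start + 1; }
        public int Current => _next;
        public bool MoveNext() => --_next > 0;
    }
}

public static class CountdownDemo
{
    // Sums start, start - 1, ..., 1 using a plain foreach.
    public static int Sum(int start)
    {
        int total = 0;
        foreach (int n in new Countdown(start)) // compiles without IEnumerable
        {
            total += n;
        }
        return total;
    }
}
```

Because Current is declared as int, the loop variable is strongly typed without any cast, which is the type-safety benefit the pattern offered before generics existed.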
LINQ
The same magic methods situation occurs when you use LINQ and syntax based queries. When you write
var results = from item in source
where item != "test"
select item.ToLower();
it's transformed by compiler into
var results = source.Where(x => x != "test")
.Select(x => x.ToLower());
And because that code works no matter which interface source implements, the same applies to the query syntax. As long as every method call in the transformed, method-based query can be resolved by the compiler, everything is OK.
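As a rough illustration of that point, the sketch below uses query syntax against a type that has nothing to do with IEnumerable<T>. Box<T> and BoxDemo are made-up names; the only assumption the compiler makes is that suitable Where and Select methods can be found after the query is rewritten.

```csharp
using System;

// Hypothetical wrapper that implements no LINQ-related interface.
public sealed class Box<T>
{
    public T Value { get; }
    public bool HasValue { get; }

    public Box(T value) { Value = value; HasValue = true; }
    private Box() { HasValue = false; }

    public static Box<T> Empty { get; } = new Box<T>();

    // The compiler only needs methods with these names and compatible shapes.
    public Box<T> Where(Func<T, bool> predicate) =>
        HasValue && predicate(Value) ? this : Empty;

    public Box<TResult> Select<TResult>(Func<T, TResult> selector) =>
        HasValue ? new Box<TResult>(selector(Value)) : Box<TResult>.Empty;
}

public static class BoxDemo
{
    public static string Run(string input)
    {
        var source = new Box<string>(input);

        // Rewritten by the compiler to source.Where(...).Select(...).
        var result = from item in source
                     where item != "test"
                     select item.ToLower();

        return result.HasValue ? result.Value : "(empty)";
    }
}
```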
async/await
I'm not entirely sure, but I think the same applies to async/await. When you use these keywords, the compiler generates a bunch of code for you, which is then compiled as if you had written it yourself. As long as the code produced by that transformation compiles, everything is OK.
Related
Could anyone point out the differences between C# statements and their equivalent extension methods? E.g. foreach vs. .ForEach (the extension method).
If there are any differences, what are they? Security-wise? Performance-wise? Which one is better to use? Which one is safer?
And if there are no differences, then why bother writing them?
I've been thinking and searching a bit about this question of mine and didn't find an answer.
It depends on the implementation of the extension method you use. Internally, there's really nothing special about most versions of .ForEach.
There would be a minimal, negligible cost to load the extension method at app load and compile time. There may be minimal overhead in going through .ForEach to reach the underlying foreach, as it's technically only a wrapper. It could potentially cause security issues, but only because it can create closure situations where your objects may not be collected at the time expected (e.g. held in scope longer). Ultimately, there's very little difference, and it comes down to taste. Unless, of course, you're trying to shave off every millisecond, in which case using the native foreach body is the best way to go.
I would like to mention that the .ForEach strikes against the premise of using lambda statements being purely functional, that is, it breaks the "functional" style and introduces the possibility of side-effects. Using a foreach body makes the code more readable, and explicit.
Please see:
Why there is no ForEach extension method on IEnumerable?
It's a trade off. The extension method is certainly more concise, and it provides compile time checking. The extension method also can introduce difficulty of readability, difficulty of maintainability, and side-effects.
Taken from here
The second reason is that doing so adds zero new representational
power to the language. Doing this lets you rewrite this perfectly
clear code:
foreach(Foo foo in foos){ statement involving foo; }
into this code:
foos.ForEach((Foo foo)=>{ statement involving foo; });
which uses almost exactly the same characters in slightly different
order. And yet the second version is harder to understand, harder to
debug, and introduces closure semantics, thereby potentially changing
object lifetimes in subtle ways.
The provided answers are inaccurate. There are many pitfalls when using a ForEach extension method. E.g. the following extension method may easily become a performance killer:
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
foreach (var item in source)
{
action(item);
}
}
And then we misuse it:
IEnumerable<T> items = new List<T>();
items.ForEach(UpdateItem);
Looks nice, right? Well, here the ForEach() extension method is called on an IEnumerable<T> which means the compiler is forced to allocate a generic enumerator instead of using an optimized, allocation-free version. Then, the Action argument calls for another quite heavy delegate allocation. Put this loop on a hot path and the Garbage Collector will go nuts, causing significant performance issues.
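The difference in enumerator types can be sketched as follows. EnumeratorKinds is an invented name, and the snippet only shows which enumerator each loop binds to; it does not measure allocations, so treat the performance claim itself as something to verify with a profiler.

```csharp
using System.Collections.Generic;

public static class EnumeratorKinds
{
    // List<T>.Enumerator is a struct, which is what makes the direct loop allocation-free.
    public static bool ListEnumeratorIsStruct() =>
        typeof(List<int>.Enumerator).IsValueType;

    public static int SumDirect(List<int> list)
    {
        int total = 0;
        foreach (int x in list) total += x;  // binds to List<int>.GetEnumerator(), a struct
        return total;
    }

    public static int SumViaInterface(IEnumerable<int> items)
    {
        int total = 0;
        foreach (int x in items) total += x; // binds to IEnumerable<int>.GetEnumerator();
                                             // for a List<T> the struct enumerator is boxed
        return total;
    }
}
```

Both methods return the same result; they differ only in which GetEnumerator the compiler selects from the static type of the loop source.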
Please see my other answer, where I explain this in much greater detail.
In terms of security, I have seen developers accidentally including a third-party assembly to use a specific ForEach() extension method. This implied shipping an unwanted dependency from who-knows-where with unknown capabilities.
Summary
foreach is safer.
foreach is more performant.
foreach is better. The compiler knows exactly how to deal with it efficiently.
.ForEach is similar to Parallel.ForEach. I've seen the regular .ForEach used to develop/debug parallel versions before. What's nice about it is that you don't have to change a bunch of code to move between the two.
In general, if I have no intentions to do the Parallel.ForEach, then I prefer the regular foreach for readability.
Previously, I used the IEnumerable<T> type whenever I passed a collection as a method parameter.
But recently I had a problem with an IEnumerable<T> that was created like this:
var peoples = names.Select(name => new People(name));
In this case, every time I enumerate peoples (for example, with foreach), new instances of the People class are created, which can easily cause errors.
So I want to ask whether it is right to use IEnumerable<T> as the parameter type. I think it may cause problems (see the example above) and that this type should not be used. What alternatives do you recommend (ICollection<T>, IList<T>, etc.), and when should each be used?
Or do you think this is a silly question, because only a fool would create objects inside Select?
Of course, I know that I can use ToArray() or ToList() and thus solve the problem. But someone else who calls this method may not know that. I would like to know how to prevent this by choosing the correct parameter type. A list or an array is too specific for me when I just want to "enumerate" objects.
IEnumerable is not a collection. It is just something that you can "enumerate". The problem is not passing IEnumerable to your method, the problem is that if you are using LINQ (Select method), every time you read the enumerable it will execute the code again. If you only want to have it executed once, you can use the ToArray() or ToList() methods:
var peoples = names.Select(name => new People(name)).ToList();
Like this you can still pass it to any method accepting an IEnumerable (or List) and it will only create one instance for each person.
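A small sketch of the behaviour described above, assuming a People class like the one in the question. The static Created counter and the DeferredDemo class are added purely for the demonstration.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class People
{
    public static int Created; // counts constructor calls, for the demo only
    public string Name { get; }
    public People(string name) { Name = name; Created++; }
}

public static class DeferredDemo
{
    // Returns how many People objects are constructed when the sequence is enumerated twice.
    public static int CountCreations(bool materialize)
    {
        People.Created = 0;
        var names = new[] { "Ann", "Bob" };

        IEnumerable<People> peoples = names.Select(name => new People(name));
        if (materialize) peoples = peoples.ToList(); // runs the query once, here

        foreach (var p in peoples) { }
        foreach (var p in peoples) { }
        return People.Created;
    }
}
```

Without ToList() the two loops construct four objects in total; with it, the two objects created during materialization are reused by both loops.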
Edit:
Your method shouldn't worry about this kind of problem. It's the caller's problem. There might be perfectly good reasons to call your method with an enumerable instead of a list. The caller should know that the enumerable gives different results if it is passed to different methods, so you shouldn't worry about that.
The only exception is if you enumerate the parameter more than once in the method itself. In this case you should cache the parameter in a list inside the method and then enumerate the list instead as many times as you need.
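For that exception, a minimal sketch might look like this. Stats.Spread is a hypothetical helper that needs two passes over its input, so it caches the parameter first.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Stats
{
    // Hypothetical helper: difference between the largest and smallest value.
    public static int Spread(IEnumerable<int> values)
    {
        // Materialize once so a lazy or expensive source is only enumerated one time.
        List<int> cached = values.ToList();
        if (cached.Count == 0) return 0;

        int min = cached.Min(); // first pass, over the cached list
        int max = cached.Max(); // second pass, over the cached list
        return max - min;
    }
}
```

The parameter type stays IEnumerable<int>, so callers can still pass arrays, lists or lazy queries.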
The ToArray and ToList suggestions expose more than one might initially think. We're tempted to think of this advice as simply saying that calling ToList/ToArray, either at the call site or as the first thing in your method, corrects the issue. But your question is whether IEnumerable<T> is appropriate: you could change the parameter type from IEnumerable<T> to something else (like ICollection<T>), which puts the onus on the caller to convert to something that implements this interface (note that T[], List<T>, IList<T> and Collection<T> all do). Part of the problem with this approach is that these interfaces represent mutable collections, whereas IEnumerable<T> advertises that the method only enumerates items, which is just one of the reasons I don't like this approach.
What if the potential bug was not really a bug at all? Perhaps the caller intends these to be defensive copies or dumb-data objects - in these latter cases, it may be inefficient by some measure but so is requiring them to make a copy - but in this proposed use it definitely is not a bug. Likewise, a one-size fits all recommendation doesn't fit because IEnumerable<T> objects don't have to ever terminate - but requiring an array type to be passed-in would mean that infinite (i.e. computed) or merely large IEnumerable<T> objects would be out of the question.
Regardless, I think you're right to pose questions regarding defensive programming. However, in this case I think the best solution is to stick with IEnumerable<T> and educate rather than limit your callers based on the speculation that they might, in some limited circumstances, introduce a bug.
Paraphrasing a quote:
The problem with designing to prevent issues that idiots will make is that the idiots are so damned ingenious.
Hope this helps. Cheers!
Note This is not a question about how to implement or emulate duck typing in C#...
For several years I was under the impression that certain C# language features were dependent on data structures defined in the language itself (which always seemed like an odd chicken & egg scenario to me). For example, I was under the impression that the foreach loop was only available to use with types that implemented IEnumerable.
Since then I've come to understand that the C# compiler uses duck typing to determine whether an object can be used in a foreach loop, looking for a GetEnumerator method rather than IEnumerable. This makes a lot of sense as it removes the chicken & egg conundrum.
I'm a little confused as to why this doesn't seem to be the case with the using block and IDisposable. Is there any particular reason the compiler can't use duck typing and look for a Dispose method? What's the reason for this inconsistency?
Perhaps there's something else going on under the hood with IDisposable?
Discussing why you would ever have an object with a Dispose method that didn't implement IDisposable is outside the scope of this question :)
There's nothing special about IDisposable here - but there is something special about iterators.
Before C# 2, using this duck type on foreach was the only way you could implement a strongly-typed iterator, and also the only way of iterating over value types without boxing. I suspect that if C# and .NET had had generics to start with, foreach would have required IEnumerable<T> instead, and not had the duck typing.
Now the compiler uses this sort of duck typing in a couple of other places I can think of:
Collection initializers look for a suitable Add overload (as well as the type having to implement IEnumerable, just to show that it really is a collection of some kind); this allows for flexible adding of single items, key/value pairs etc
LINQ (Select etc) - this is how LINQ achieves its flexibility, allowing the same query expression format against multiple types, without having to change IEnumerable<T> itself
The C# 5 await expressions require GetAwaiter to return an awaiter type which has IsCompleted / OnCompleted / GetResult
In all of these cases this makes it easier to add the feature to existing types and interfaces, where the concept didn't exist earlier on.
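As an illustration of the await case, the sketch below makes a TimeSpan awaitable by adding a GetAwaiter extension method. TimeSpanAwaitExtensions and AwaitDemo are invented names; Task.Delay and TaskAwaiter are the framework pieces being reused, and TaskAwaiter already supplies IsCompleted, OnCompleted and GetResult.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class TimeSpanAwaitExtensions
{
    // Hypothetical extension: gives TimeSpan a GetAwaiter, which is what `await` looks for.
    public static TaskAwaiter GetAwaiter(this TimeSpan delay) =>
        Task.Delay(delay).GetAwaiter();
}

public static class AwaitDemo
{
    public static async Task<string> RunAsync()
    {
        await TimeSpan.FromMilliseconds(10); // TimeSpan itself knows nothing about await
        return "done";
    }
}
```

This is the "add the feature to existing types" benefit in practice: TimeSpan was not modified, and no interface had to be retrofitted onto it.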
Given that IDisposable has been in the framework since the very first version, I don't think there would be any benefit in duck typing the using statement. I know you explicitly tried to discount the reasons for having Dispose without implementing IDisposable from the discussion, but I think it's a crucial point. There need to be good reasons to implement a feature in the language, and I would argue that duck typing is a feature above-and-beyond supporting a known interface. If there's no clear benefit in doing so, it won't end up in the language.
There's no chicken and egg: foreach could depend on IEnumerable since IEnumerable doesn't depend on foreach. The reason foreach is permitted on collections not implementing IEnumerable is probably largely historic:
In C#, it is not strictly necessary
for a collection class to inherit from
IEnumerable and IEnumerator in order
to be compatible with foreach; as long
as the class has the required
GetEnumerator, MoveNext, Reset, and
Current members, it will work with
foreach. Omitting the interfaces has
the advantage of allowing you to
define the return type of Current to
be more specific than object, thereby
providing type-safety.
Furthermore, not all chicken and egg problems are actually problems: for example a function can call itself (recursion!) or a reference type can contain itself (like a linked list).
So when using came around why would they use something as tricky to specify as duck typing when they can simply say: implement IDisposable? Fundamentally, by using duck typing you're doing an end-run around the type system, which is only useful when the type system is insufficient (or impractical) to address a problem.
The question you are asking is not a chicken-and-egg situation. It's more about how the language compiler is implemented. The C# and VB.NET compilers are implemented differently: if you write a simple hello world program, compile it with both compilers and inspect the IL, the results will differ. Coming back to your question, here is roughly what the C# compiler generates for a foreach loop over an IEnumerable (shown as equivalent C# rather than IL):
IEnumerator e = arr.GetEnumerator();
while (e.MoveNext())
{
    object item = e.Current; // the loop body works with item
}
So the C# compiler is tweaked for the case of foreach.
Forgive me if this is a duplicate, but it's a minor issue for me and I can only spend so long on my curiosity. Why is it that when I use an implicitly typed loop variable in a foreach block, I get no Intellisense? The inferred type seems to be quite obvious.
I am using ReSharper, but when I switch the Intellisense to VS I get the same behaviour, so I don't think it's to blame.
EDIT: Sorry, a bit late, but I was iterating DataTable.Rows, which uses an untyped iterator, as Marc explains below.
I suspect that the data you are enumerating is not typed - for example, a lot of things that were written in 1.1 only implement IEnumerable, and don't have a custom iterator (you don't actually need IEnumerable<T> to do typed iteration - and indeed you don't even need IEnumerable to use foreach; a lot of 1.1 types wrote special enumerator types to avoid boxing/casting etc - lots of work). In many cases it would be a breaking change to fix them.
A trivial example here is PropertyDescriptorCollection:
var props = TypeDescriptor.GetProperties(obj);
foreach(PropertyDescriptor prop in props) {...} // fine
but actually, PropertyDescriptorCollection's enumerator is just IEnumerator, so Current is object - and hence you'll always get object when you use var:
var props = TypeDescriptor.GetProperties(obj);
foreach(var prop in props) {...} // prop is "object"
Contrast this to the (equally 1.1) StringCollection; this has a custom enumerator (StringEnumerator); so if you used foreach with var, you'd get string (not object).
In anything 2.0 and above, it would be reasonable to expect better typing, for two reasons:
generics (for the simple cases), making it possible to write strongly-typed collections sensibly
iterator blocks (for the non-trivial cases), making it possible to write custom iterators without going insane
But even then there are still cases when you don't get the type you expect; you can either (and perhaps more clearly) specify the type manually, or use Cast<T>() / OfType<T>().
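A short sketch of the Cast<T>() option, using ArrayList as a stand-in for any collection that only implements the non-generic IEnumerable. CastDemo is an invented name.

```csharp
using System.Collections;
using System.Linq;

public static class CastDemo
{
    public static int TotalLength()
    {
        // ArrayList only implements the non-generic IEnumerable,
        // so `foreach (var s in items)` would infer object.
        var items = new ArrayList { "alpha", "be" };

        int total = 0;
        foreach (var s in items.Cast<string>()) // s is string here, so s.Length compiles
        {
            total += s.Length;
        }
        return total;
    }
}
```

Cast<T>() throws InvalidCastException for an element of the wrong type, whereas OfType<T>() silently skips such elements, so the choice depends on whether mixed content is expected.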
Ok, I'm hoping the community at large will aid us in solving a workplace debate that has been ongoing for a while. This has to do with defining interfaces that either accept or return lists of some type. There are several ways of doing this:
public interface Foo
{
// Alternatives under discussion; a real interface would declare only one of these:
Bar[] Bars { get; }
IEnumerable<Bar> Bars { get; }
ICollection<Bar> Bars { get; }
IList<Bar> Bars { get; }
}
My own preference is to use IEnumerable for arguments and arrays for return values:
public interface Foo
{
void Do(IEnumerable<Bar> bars);
Bar[] Bars { get; }
}
My argument for this approach is that the implementation class can create a List directly from the IEnumerable and simply return it with List.ToArray(). However, some believe that IList should be returned instead of an array. The problem I have with that is that you are now required to wrap it in a ReadOnlyCollection before returning. And the option of returning IEnumerable seems troublesome for client code.
What do you use/prefer? (especially with regards to libraries that will be used by other developers outside your organization)
My preference is IEnumerable<T>. Any other of the suggested interfaces gives the appearance of allowing the consumer to modify the underlying collection. This is almost certainly not what you want to do as it's allowing consumers to silently modify an internal collection.
Another good one IMHO, is ReadOnlyCollection<T>. It allows for all of the fun .Count and Indexer properties and unambiguously says to the consumer "you cannot modify my data".
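A minimal sketch of the ReadOnlyCollection<T> approach, using string in place of Bar to keep it self-contained. The Foo class body and ReadOnlyDemo are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class Foo
{
    private readonly List<string> _bars = new List<string> { "a", "b" };

    // Read-only wrapper over the internal list: Count and indexer, no mutation.
    public ReadOnlyCollection<string> Bars => _bars.AsReadOnly();
}

public static class ReadOnlyDemo
{
    public static bool MutationIsRejected()
    {
        var foo = new Foo();
        try
        {
            // The wrapper still implements IList<T>, so this compiles...
            ((IList<string>)foo.Bars).Add("c");
            return false;
        }
        catch (NotSupportedException)
        {
            return true; // ...but the mutating members throw at run time.
        }
    }
}
```

Note that AsReadOnly() returns a wrapper, not a copy: later changes made to the internal list by Foo itself remain visible through Bars.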
I don't return arrays - they really are a terrible return type to use when creating an API - if you truly need a mutable sequence use the IList<T> or ICollection<T> interface or return a concrete Collection<T> instead.
Also I would suggest that you read Arrays considered somewhat harmful by Eric Lippert:
I got a moral question from an author
of programming language textbooks the
other day requesting my opinions on
whether or not beginner programmers
should be taught how to use arrays.
Rather than actually answer that
question, I gave him a long list of my
opinions about arrays, how I use
arrays, how we expect arrays to be
used in the future, and so on. This
gets a bit long, but like Pascal, I
didn't have time to make it shorter.
Let me start by saying when you
definitely should not use arrays, and
then wax more philosophical about the
future of modern programming and the
role of the array in the coming world.
For property collections that are indexed (and the indices have necessary semantic meaning), you should use ReadOnlyCollection<T> (read only) or IList<T> (read/write). It's the most flexible and expressive. For non-indexed collections, use IEnumerable<T> (read only) or ICollection<T> (read/write).
Method parameters should use IEnumerable<T> unless they 1) need to add/remove items to the collection (ICollection<T>) or 2) require indexes for necessary semantic purposes (IList<T>). If the method can benefit from indexing when it is available (such as a sorting routine), the implementation can always try "as IList<T>" and fall back to .ToList() when that fails.
I think about this in terms of writing the most useful code possible: code that can do more.
Put in those terms, it means I like to accept the weakest interface possible as method arguments, because that makes my code useful from more places. In this case, that's an IEnumerable<T>. Have an array? You can call my method. Have a List? You can call my method. Have an iterator block? You get the idea.
It also means I like my methods to return the strongest interface that is convenient, so that code that relies on the method can easily do more. In this case, that would be IList<T>. Note that this doesn't mean I will construct a list just so I can return it. It just means that if I already have something that implements IList<T>, I may as well use it.
Note that I'm a little unconventional with regards to return types. A more typical approach is to also return weaker types from methods to avoid locking yourself into a specific implementation.
I would prefer IEnumerable as it is the most high-level of the interfaces, giving the end user the opportunity to convert it as they wish. Even though this may provide the user with minimal functionality to begin with (basically only enumeration), it would still be enough to cover virtually any need, especially with the extension methods ToArray(), ToList() etc.
IEnumerable<T> is very useful for lazy-evaluated iteration, especially in scenarios that use method chaining.
But as a return type for a typical data access tier, a Count property is often useful, and I would prefer to return an ICollection<T> with a Count property or possibly IList<T> if I think typical consumers will want to use an indexer.
This is also an indication to the caller that the collection has actually been materialized. And thus the caller can iterate through the returned collection without getting exceptions from the data access tier. This can be important. For example, a service may generate a stream (e.g. SOAP) from the returned collection. It can be awkward if an exception is thrown from the data access layer while generating the stream due to lazy-evaluated iteration, as the output stream is already partially written when the exception is thrown.
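The failure mode described above can be sketched like this. DataAccess, LoadLazy and LoadMaterialized are hypothetical names, and the thrown exception stands in for whatever the real data tier might raise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DataAccess
{
    // Hypothetical data source: an iterator block, so nothing runs until enumeration starts.
    public static IEnumerable<int> LoadLazy()
    {
        yield return 1;
        throw new InvalidOperationException("connection lost");
    }

    // Materializing inside the data tier surfaces the failure before anything is returned.
    public static ICollection<int> LoadMaterialized() => LoadLazy().ToList();
}

public static class LazyFailureDemo
{
    public static string Run()
    {
        var log = new List<string>();
        IEnumerable<int> rows = DataAccess.LoadLazy(); // no exception yet
        try
        {
            foreach (int row in rows) log.Add("wrote " + row); // output is partially written
        }
        catch (InvalidOperationException)
        {
            log.Add("failed mid-stream");
        }
        return string.Join("; ", log);
    }
}
```

With LoadMaterialized() the same exception is thrown inside the data access call itself, before the caller has written any output.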
Since the Linq extension methods were added to IEnumerable<T>, I've found that my use of the other interfaces has declined considerably; probably around 80%. I used to use List<T> religiously as it had methods that accepted delegates for lazy evaluation like Find, FindAll, ForEach and the like. Since that's available through System.Linq's extensions, I've replaced all those references with IEnumerable<T> references.
I wouldn't go with an array; it's a type that allows modification yet doesn't have add/remove... kind of like the worst of the pack. If I want to allow modifications, then I would use a type that supports add/remove.
When you want to prevent modifications, you are already wrapping or copying it, so I don't see what's wrong with an IEnumerable or a ReadOnlyCollection. I would go with the latter... something I don't like about IEnumerable is that it's lazy by nature, yet when you use it only to wrap pre-loaded data, calling code that works with it tends either to assume pre-loaded data or to have extra "unnecessary" lines :( ... that can give ugly results when things change.