Does LINQ have a sequence operator that allows you to perform some action on every element without projecting it to a new sequence?
This might seem a bit awkward, but I'd just like to know :)
Example:
IEnumerable<IDisposable> x;
x.PERFORM_ACTION_ON_EVERY_ELEMENT(m => m.Dispose());
Obviously, this could be done using something like:
foreach (var element in x) element.Dispose();
But if something like that actually exists, that would be nice.
No, it doesn't exist, specifically for the reason you mention: it seems awkward to have a single operator that behaves completely differently from all the others.
Eric Lippert, one of the C# compiler developers, has an article about this.
But we can go a bit deeper here. I am philosophically opposed to providing such a method, for two reasons.
The first reason is that doing so violates the functional programming principles that all the other sequence operators are based upon. Clearly the sole purpose of a call to this method is to cause side effects.
The purpose of an expression is to compute a value, not to cause a side effect. The purpose of a statement is to cause a side effect. The call site of this thing would look an awful lot like an expression (though, admittedly, since the method is void-returning, the expression could only be used in a “statement expression” context.)
It does not sit well with me to make the one and only sequence operator that is only useful for its side effects.
You can use this method:
using System;
using System.Collections.Generic;

public static class Extension
{
    public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var t in source)
        {
            action(t);
        }
        return source;
    }
}
It returns the source so you can pass it along to another extension method as needed. Or, if you want it to return void, you can change the method slightly.
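One consequence of returning source is easy to miss: ForEach runs eagerly, and the next operator in the chain enumerates source a second time. For a lazy LINQ query, the whole pipeline (and any side effects inside it) therefore runs twice. A minimal sketch of that behavior; the class names ChainingExtensions and ChainingDemo are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainingExtensions
{
    // Same shape as the ForEach above: run the action on every element, then return the source.
    public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var t in source)
        {
            action(t);
        }
        return source;
    }
}

public static class ChainingDemo
{
    public static (int Evaluations, int Logged, int Sum) Run()
    {
        int evaluations = 0;
        // A lazy query: the Select callback runs again on every enumeration.
        IEnumerable<int> lazy = Enumerable.Range(1, 4).Select(n => { evaluations++; return n; });

        var logged = new List<int>();
        // ForEach enumerates the query once; Sum() then enumerates it a second time.
        int sum = lazy.ForEach(logged.Add).Sum();
        return (evaluations, logged.Count, sum);
    }
}
```

Here the Select callback runs 8 times for 4 elements. If that matters, materialize the sequence first (for example with ToList()) or use the void-returning variant.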
The morelinq project has a ForEach operator. LINQ itself doesn't, as LINQ is all about functional programming, and ForEach has side effects.
Here is a similar discussion on this: Run a method on all objects within a collection
I'm wondering why C# is moving towards more pattern-based programming rather than conventional, interface-based approaches.
For example, the foreach statement expects the loop source to have a "magic" method called GetEnumerator, which returns an object with a few more magic members such as MoveNext and Current, but it doesn't mandate any specific interface. C# could have mandated that a class used in foreach implement IEnumerable or IEnumerable<T>, as it does for the using statement, which expects the object to implement the IDisposable interface.
I see a similar trend with the async/await keywords as well...
Of course there must be a good reason for that, but I find it a little odd, and I'd like to understand why the compiler/CLR requires "magic methods" rather than relying on interfaces.
foreach
I would say it's about both performance and compatibility:
If you had chosen foreach to use IEnumerable, iterating over generic collections would have been very slow for value types T (because of boxing/unboxing).
If you had chosen to use IEnumerable<T>, iterating over ArrayList and all the other non-generic collections from early .NET versions would not have been possible.
I think the design decision was good. When foreach was introduced (C# 1.0), there was nothing like generics in .NET (they were introduced in .NET 2.0). Choosing IEnumerable as the source of foreach enumeration would have made it perform poorly with generic collections, or would have required a radical change later. I guess the designers already knew that they were going to introduce generics not long afterwards.
Additionally, a rule like "use IEnumerable<T> when it's available, or IEnumerable when it's not" is not much different from "use the available GetEnumerator method, or don't compile when there isn't one", is it?
update
As @mikez mentioned in the comments, there is one more advantage: when GetEnumerator isn't required to return IEnumerator/IEnumerator<T>, you can return a struct and not worry about boxing when the loop uses the enumerator.
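A minimal sketch of the pattern-based rule, using a hypothetical Countdown type: it implements neither IEnumerable nor IEnumerable<T>, yet foreach accepts it because it exposes GetEnumerator(), which returns a struct with MoveNext() and Current.

```csharp
using System.Collections.Generic;

// Hypothetical type: implements neither IEnumerable nor IEnumerable<T>.
public readonly struct Countdown
{
    private readonly int _start;
    public Countdown(int start) => _start = start;

    // foreach only looks for an accessible GetEnumerator() method...
    public Enumerator GetEnumerator() => new Enumerator(_start);

    // ...whose return type has MoveNext() and Current. Because this is a struct
    // used directly by the loop, no enumerator object is allocated or boxed.
    public struct Enumerator
    {
        private int _current;
        public Enumerator(int start) => _current = start + 1;
        public int Current => _current;
        public bool MoveNext() => --_current > 0;
    }
}

public static class CountdownDemo
{
    public static List<int> Collect(int start)
    {
        var seen = new List<int>();
        foreach (var n in new Countdown(start)) // compiles without any interface
        {
            seen.Add(n);
        }
        return seen;
    }
}
```

The trade-off is that Countdown cannot be passed to LINQ operators or anything else expecting IEnumerable<T>. List<T> gets both benefits by exposing a public GetEnumerator() that returns the struct List<T>.Enumerator while implementing IEnumerable<T>.GetEnumerator() explicitly.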
LINQ
The same "magic methods" situation occurs when you use LINQ query syntax. When you write
var results = from item in source
where item != "test"
select item.ToLower();
the compiler transforms it into
var results = source.Where(x => x != "test")
.Select(x => x.ToLower());
Because that method-call code works no matter which interface source implements, as long as suitable Where and Select methods (instance or extension) can be found, the same applies to query syntax: if every method call in the translated, method-based query can be resolved by the compiler, everything is OK.
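To illustrate, here is a sketch built around a hypothetical Maybe<T> type that implements no collection or LINQ interface at all. Because it has instance methods named Where and Select with suitable signatures, the same query expression as above compiles against it:

```csharp
using System;

// Hypothetical type for illustration: it implements no collection or LINQ interface.
public sealed class Maybe<T>
{
    public static readonly Maybe<T> None = new Maybe<T>();

    public bool HasValue { get; }
    public T Value { get; }

    private Maybe() { }
    public Maybe(T value) { HasValue = true; Value = value; }

    // "where" in a query expression is translated into a call to a method named Where...
    public Maybe<T> Where(Func<T, bool> predicate) =>
        HasValue && predicate(Value) ? this : None;

    // ...and "select" into a call to a method named Select.
    public Maybe<R> Select<R>(Func<T, R> selector) =>
        HasValue ? new Maybe<R>(selector(Value)) : Maybe<R>.None;
}
```

With this in place, `from item in new Maybe<string>("TEST") where item != "test" select item.ToLower()` produces a Maybe<string> holding "test", while starting from "test" produces Maybe<string>.None.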
async/await
I'm not entirely sure, but I think the same applies to async/await. When you use these keywords, the compiler generates a bunch of code for you, which is then compiled as if you had written it yourself. As long as the code produced by that transformation compiles, everything is OK.
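The await side is indeed pattern-based: await looks for an accessible GetAwaiter() method, and extension methods count. The sketch below (class names hypothetical) makes a TimeSpan directly awaitable by delegating to Task.Delay; the returned TaskAwaiter supplies the rest of the pattern (IsCompleted, OnCompleted, GetResult).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class TimeSpanAwaitExtensions
{
    // TimeSpan implements no "awaitable" interface; this extension method is all
    // the compiler needs to accept "await someTimeSpan".
    public static TaskAwaiter GetAwaiter(this TimeSpan delay) =>
        Task.Delay(delay).GetAwaiter();
}

public static class AwaitDemo
{
    public static async Task<int> WaitThenAnswerAsync()
    {
        await TimeSpan.FromMilliseconds(10); // binds to the extension GetAwaiter above
        return 42;
    }
}
```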
Could anyone point out the differences between C# statements and their similar extension methods? For example: foreach vs. .ForEach (the extension method).
If there are any difference, what are they? Security wise? Performance wise? Which one is better to use? Which one is safer? etc.
And if there are no differences, then why bother writing them?
I've been thinking about this question and searching a bit, but didn't find an answer.
It depends on the implementation of the extension method you use. Internally, there's really nothing special about most versions of .ForEach.
Loading the extension method adds minimal/negligible time at app load and compile time. There "may" be minimal overhead because .ForEach is technically only a wrapper around the underlying foreach, plus a delegate call per element. It could potentially cause security issues, but only because it can create closure situations where your objects may not be collected when expected (e.g. they are held in scope longer). Ultimately, there's very, very little difference, and it comes down to taste. Unless, of course, you're trying to shave off every millisecond, in which case using the native foreach statement is the best way to go.
I would like to mention that .ForEach goes against the premise that lambdas should be purely functional: it breaks the "functional" style and introduces the possibility of side effects. Using a foreach body makes the code more readable and explicit.
Please see:
Why there is no ForEach extension method on IEnumerable?
It's a trade-off. The extension method is certainly more concise, and it provides compile-time checking. However, it can also hurt readability and maintainability, and it introduces side effects.
Taken from here
The second reason is that doing so adds zero new representational power to the language. Doing this lets you rewrite this perfectly clear code:
foreach(Foo foo in foos){ statement involving foo; }
into this code:
foos.ForEach((Foo foo)=>{ statement involving foo; });
which uses almost exactly the same characters in slightly different order. And yet the second version is harder to understand, harder to debug, and introduces closure semantics, thereby potentially changing object lifetimes in subtle ways.
The provided answers are inaccurate. There are many pitfalls when using a ForEach extension method. E.g. the following extension method may easily become a performance killer:
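using System;
using System.Collections.Generic;

// Extension methods must live in a static class; the class name is arbitrary.
public static class EnumerableExtensions
{
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
        {
            action(item);
        }
    }
}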
And then we misuse it:
IEnumerable<T> items = new List<T>();
items.ForEach(UpdateItem);
Looks nice, right? Well, here the ForEach() extension method is called on an IEnumerable<T>, which means enumeration goes through IEnumerable<T>.GetEnumerator(), so a boxed generic enumerator is allocated instead of List<T>'s optimized, allocation-free struct enumerator. Then, the Action argument requires another, fairly heavy delegate allocation. Put this loop on a hot path and the garbage collector will go nuts, causing significant performance issues.
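A sketch of one mitigation, under stated assumptions: add an overload for List<T> so that, when the receiver's static type is List<T>, the loop binds to the struct List<T>.Enumerator. The name ForEachItem is hypothetical; an extension named ForEach would never be picked for a List<T> receiver, because List<T> already has an instance ForEach(Action<T>) and instance methods win. Note that this does not help the snippet above, where items is declared as IEnumerable<T>: the fallback overload is still chosen there.

```csharp
using System;
using System.Collections.Generic;

public static class ForEachOverloads
{
    // Chosen when the static type is List<T>: foreach uses List<T>.GetEnumerator(),
    // which returns the struct List<T>.Enumerator, so no enumerator is heap-allocated.
    public static void ForEachItem<T>(this List<T> source, Action<T> action)
    {
        foreach (var item in source)
            action(item);
    }

    // Fallback for any other sequence: foreach goes through IEnumerable<T>.GetEnumerator(),
    // so a struct enumerator (if the collection has one) is boxed.
    public static void ForEachItem<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
            action(item);
    }
}
```

Creating the Action once and reusing it across calls also avoids allocating a new delegate on every call.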
Please see my other answer, where I explain this in much greater detail.
In terms of security, I have seen developers accidentally include a third-party assembly just to use a specific ForEach() extension method. This meant shipping an unwanted dependency from who-knows-where with unknown capabilities.
Summary
foreach is safer.
foreach is more performant.
foreach is better. The compiler knows exactly how to deal with it efficiently.
.ForEach is similar to Parallel.ForEach. I've seen the regular .ForEach used to develop/debug parallel versions before. What's nice about it is that you don't have to change much code to move between the two.
In general, if I have no intention of switching to Parallel.ForEach, I prefer the regular foreach for readability.
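A sketch of that workflow, assuming a hypothetical SumOfSquares helper: the loop body is a single Action<int>, so switching between List<T>.ForEach and Parallel.ForEach changes only one call. The body must be thread-safe for the parallel version, hence Interlocked.Add on the shared total.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSwitchDemo
{
    public static long SumOfSquares(List<int> numbers, bool parallel)
    {
        long total = 0;
        // Interlocked keeps the shared accumulator correct when iterations run concurrently.
        Action<int> body = n => Interlocked.Add(ref total, (long)n * n);

        if (parallel)
            Parallel.ForEach(numbers, body); // parallel version
        else
            numbers.ForEach(body);           // List<T>.ForEach, sequential version

        return total;
    }
}
```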
I thought it would be nice to do something like this (with the lambda doing a yield return):
public IList<T> Find<T>(Expression<Func<T, bool>> expression) where T : class, new()
{
IList<T> list = GetList<T>();
var fun = expression.Compile();
var items = () => {
foreach (var item in list)
if (fun.Invoke(item))
yield return item; // This is not allowed by C#
}
return items.ToList();
}
However, I found out that I can't use yield in an anonymous method. I'm wondering why. The yield docs just say it is not allowed.
Since it isn't allowed, I just created a List and added the items to it.
Eric Lippert recently wrote a series of blog posts about why yield is not allowed in some cases.
Part 1
Part 2
Part 3
Part 4
Part 5
Part 6
EDIT2:
Part 7 (this one was posted later and specifically addresses this question)
You will probably find the answer there...
EDIT1: this is explained in the comments of Part 5, in Eric's answer to Abhijeet Patel's comment:
Q: Eric, can you also provide some insight into why "yields" are not allowed inside an anonymous method or lambda expression?
A: Good question. I would love to have anonymous iterator blocks. It would be totally awesome to be able to build yourself a little sequence generator in-place that closed over local variables. The reason why not is straightforward: the benefits don't outweigh the costs. The awesomeness of making sequence generators in-place is actually pretty small in the grand scheme of things and nominal methods do the job well enough in most scenarios. So the benefits are not that compelling.
The costs are large. Iterator rewriting is the most complicated transformation in the compiler, and anonymous method rewriting is the second most complicated. Anonymous methods can be inside other anonymous methods, and anonymous methods can be inside iterator blocks. Therefore, what we do is first we rewrite all anonymous methods so that they become methods of a closure class. This is the second-last thing the compiler does before emitting IL for a method. Once that step is done, the iterator rewriter can assume that there are no anonymous methods in the iterator block; they've all been rewritten already. Therefore the iterator rewriter can just concentrate on rewriting the iterator, without worrying that there might be an unrealized anonymous method in there. Also, iterator blocks never "nest", unlike anonymous methods. The iterator rewriter can assume that all iterator blocks are "top level".
If anonymous methods are allowed to contain iterator blocks, then both those assumptions go out the window. You can have an iterator block that contains an anonymous method that contains an anonymous method that contains an iterator block that contains an anonymous method, and... yuck. Now we have to write a rewriting pass that can handle nested iterator blocks and nested anonymous methods at the same time, merging our two most complicated algorithms into one far more complicated algorithm. It would be really hard to design, implement, and test. We are smart enough to do so, I'm sure. We've got a smart team here. But we don't want to take on that large burden for a "nice to have but not necessary" feature. -- Eric
Eric Lippert has written an excellent series of articles on the limitations of iterator blocks (and the design decisions behind those choices).
In particular, iterator blocks are implemented by some sophisticated compiler code transformations. These transformations would interact with the transformations that happen inside anonymous functions or lambdas, such that in certain circumstances both would try to 'convert' the code into some other construct that was incompatible with the other.
As a result they are forbidden from interaction.
How iterator blocks work under the hood is dealt with well here.
As a simple example of an incompatibility:
public IList<T> GreaterThan<T>(T t)
{
IList<T> list = GetList<T>();
var items = () => {
foreach (var item in list)
if (fun.Invoke(item))
yield return item; // This is not allowed by C#
}
return items.ToList();
}
The compiler simultaneously wants to convert this into something like:
// inner class
private class Magic
{
private T t;
private IList<T> list;
private Magic(List<T> list, T t) { this.list = list; this.t = t;}
public IEnumerable<T> DoIt()
{
var items = () => {
foreach (var item in list)
if (fun.Invoke(item))
yield return item;
}
}
}
public IList<T> GreaterThan<T>(T t)
{
var magic = new Magic(GetList<T>(), t)
var items = magic.DoIt();
return items.ToList();
}
and at the same time, the iterator aspect is trying to do its work of building a little state machine. Certain simple examples might work with a fair amount of sanity checking: first dealing with the (possibly arbitrarily) nested closures, then seeing whether the very bottom-level resulting classes could be transformed into iterator state machines.
However:
1) This would be quite a lot of work.
2) It couldn't possibly work in all cases without, at the very least, the iterator block aspect being able to prevent the closure aspect from applying certain transformations for efficiency (like promoting local variables to instance variables rather than a fully fledged closure class).
3) If there were even a slight chance of overlap where it was impossible, or sufficiently hard, to implement, the number of resulting support issues would likely be high, since the subtle breaking change would be lost on many users.
4) It can be very easily worked around.
In your example, like so:
public IList<T> Find<T>(Expression<Func<T, bool>> expression)
    where T : class, new()
{
    return FindInner(expression).ToList();
}

private IEnumerable<T> FindInner<T>(Expression<Func<T, bool>> expression)
    where T : class, new()
{
    IList<T> list = GetList<T>();
    var fun = expression.Compile();
    foreach (var item in list)
        if (fun.Invoke(item))
            yield return item;
}
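Since C# 7.0 there is a lighter workaround: a local function, unlike a lambda, may be an iterator. A minimal sketch, simplified from the question: it takes the list and a Func<T, bool> directly instead of calling GetList<T>() and compiling an expression, and LocalIteratorDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LocalIteratorDemo
{
    public static IList<T> Find<T>(IList<T> list, Func<T, bool> predicate)
    {
        return Filter().ToList();

        // A local function may contain yield return; it closes over list and predicate
        // just as the lambda in the question would have.
        IEnumerable<T> Filter()
        {
            foreach (var item in list)
                if (predicate(item))
                    yield return item;
        }
    }
}
```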
Unfortunately, I don't know why they didn't allow this, since it's of course entirely possible to envision how it would work.
However, anonymous methods are already a piece of "compiler magic" in the sense that the method will be extracted either to a method in the existing class, or even to a whole new class, depending on whether it deals with local variables or not.
Additionally, iterator methods using yield are also implemented using compiler magic.
My guess is that one of these two makes the code unidentifiable to the other piece of magic, and that it was decided not to spend time making this work for the current versions of the C# compiler. Of course, it might not be a conscious choice at all; it may just not work because nobody thought to implement it.
For a 100% accurate answer, I would suggest you use the Microsoft Connect site and ask the question there; I'm sure you'll get something usable in return.
I would do this:
IList<T> list = GetList<T>();
var fun = expression.Compile();
return list.Where(item => fun.Invoke(item)).ToList();
Of course, you need System.Core.dll from .NET 3.5 referenced for the LINQ method, and you need to include:
using System.Linq;
Maybe it's just a syntax limitation. In Visual Basic .NET, which is very similar to C#, it is perfectly possible, if awkward, to write:
Sub Main()
    Console.Write("x: ")
    Dim x = CInt(Console.ReadLine())
    For Each elem In Iterator Function()
            Dim i = x
            Do
                Yield i
                i += 1
                x -= 1
            Loop Until i = x + 20
        End Function() ' here
        Console.WriteLine($"{elem} to {x}")
    Next
    Console.ReadKey()
End Sub
Also note the parentheses marked ' here: the lambda function Iterator Function...End Function returns an IEnumerable(Of Integer) but is not such an object by itself. It must be called to get that object, and that's what the () after End Function does.
The code converted by [1] raises errors in C# 7.3 (CS0149):
static void Main()
{
Console.Write("x: ");
var x = System.Convert.ToInt32(Console.ReadLine());
// ERROR: CS0149 - Method name expected
foreach (var elem in () =>
{
var i = x;
do
{
yield return i;
i += 1;
x -= 1;
}
while (i != x + 20);
}())
Console.WriteLine($"{elem} to {x}");
Console.ReadKey();
}
I strongly disagree with the reason given in the other answers, that it's difficult for the compiler to handle. The Iterator Function() you see in the VB.NET example is specifically created for lambda iterators.
In VB, there is the Iterator keyword; it has no C# counterpart. IMHO, there is no real reason this is not a feature of C#.
So if you really, really want anonymous iterator functions, currently use Visual Basic or (I haven't checked it) F#, as stated in a comment on Part #7 in @Thomas Levesque's answer (do Ctrl+F for F#).
I've read that it is usually bad practice to extend System.Object, which I do agree with.
I am curious, however, if the following would be considered a useful extension method, or is it still bad practice?
It is similar to extending System.Object, but not exactly:
public static R InvokeFunc<T, R>(this T input, Func<T, R> func)
{
return func.Invoke(input);
}
This essentially allows any object to invoke any function that takes that object as a parameter and returns R, whether that function belongs to the object or not. I think this could facilitate some interesting 'inversion of control', but not sure about it overall.
Thoughts?
Well, there are really two points here:
1) Whether it is a good idea to create an extension method with this T so it will be applied to all types?
2) Whether the particular extension method described is useful?
For the 1st question, the answer is "sometimes"; it depends on the context. You can have an extension method apply to all classes, much as LINQ extends every IEnumerable<T>, as long as you pick an appropriate namespace. I would consider creating this type of extension method within the System namespace a bad idea, but if it were more targeted, then perhaps it would be useful.
For the 2nd, since the invocation is immediate, the choice of syntax is as follows:
int res = other.InvokeFunc<Other, int>(Callback);
var res2 = (new Func<Other, int>(Callback))(other);
var res3 = Callback(other);
Looking at that, a simple call to the method, passing the instance in, is more natural and typical. However, if your extension method becomes more elaborate, I go back to my first point: it depends on the context (it could help with encapsulation).
All this does is give you the ability to refer to a method as a parameter, which is exactly what delegates already allow you to do in C#.
I don't see it being more useful (in the IoC case) than a delegate of type Func<T,R>. It's just another way of invoking it.
UPDATE
As mentioned in the comments, I think this method only helps you in creating delegates more efficiently. But either way, you do not use the created delegate any further since you invoke it immediately. So an extension method like this would make more sense to me:
public static Func<R> InvokeFunc<T, R>(this T input, Func<T, R> func)
{
return () => func(input);
}
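A sketch contrasting the two versions. They cannot both be called InvokeFunc in the same class, because their parameter lists are identical and C# does not overload on return type alone, so they are renamed here to the hypothetical names Pipe (immediate) and Bind (deferred). The deferred version captures the input and evaluates func only when the returned delegate is invoked.

```csharp
using System;

public static class FuncExtensions
{
    // Immediate form, as in the question: evaluates func(input) right away.
    public static R Pipe<T, R>(this T input, Func<T, R> func) => func(input);

    // Deferred form, as in the update: captures input; nothing runs until the
    // returned Func<R> is invoked.
    public static Func<R> Bind<T, R>(this T input, Func<T, R> func) => () => func(input);
}
```

For example, with a List<int> named items, `items.Pipe(l => l.Count)` reads the count immediately, while `items.Bind(l => l.Count)` returns a delegate that reflects any items added before it is called.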
Why can't I use an IEnumerable with params? Will this ever be fixed? I really wish they would rewrite the old libraries to use generics...
Why can't I use an IEnumerable with params?
The question presupposes that the design team must provide a reason to not add a feature to the language. This presupposition is false.
Rather, in order for a feature to be used by you it needs to be thought of, designed, specified, implemented, tested, documented and shipped. All of these have large costs.
The "params enumerable" feature has been thought of and designed. It has never been specified, implemented, tested, documented or shipped.
Therefore, you cannot use the feature.
UPDATE: As of this writing (early 2015), the feature has now been specified, but implementation, testing, documentation and shipping were cut for C# 6.0 in the latter part of 2014. See Lucian's announcement here: http://roslyn.codeplex.com/discussions/568820.
Since it has still not been implemented, tested, documented and shipped, there is still no such feature. Hopefully this will make it into a hypothetical future version of C#.
UPDATE: I should clarify what I mean by "the feature" since it is possible we all have different ideas in our heads what "the feature" is. The feature I'm talking about is to allow you to say something like
void Frob(params IEnumerable<int> x)
{
foreach(int y in x) ...
}
and then the call site can either be in the "normal form" of passing a sequence of integers, or the "expanded form" of Frob(10, 20, 30). If in the expanded form, the compiler generates the call as though you'd said Frob(new int[] { 10, 20, 30}), the same as it does for param arrays. The point of the feature is that it is often the case that the method never uses random access to the array, and therefore, we could weaken the requirement that the params be an array. The params could just be a sequence instead.
You can do this today by making an overload:
void Frob(params int[] x) { Frob((IEnumerable<int>)x); }
void Frob(IEnumerable<int> x)
{
foreach(int y in x) ...
}
which is a bit of a pain. We could simply allow you to use IEnumerable as the type of the params argument and be done with it.
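A runnable sketch of that overload pair. The original Frob bodies are elided, so the summing body here is a hypothetical stand-in. For reference, C# 13 later added "params collections", which accept params IEnumerable<int> directly; the pair below is the workaround for earlier compilers.

```csharp
using System.Collections.Generic;

public static class FrobDemo
{
    // Expanded form: Frob(10, 20, 30) binds here; the array is forwarded as a sequence.
    public static int Frob(params int[] x) => Frob((IEnumerable<int>)x);

    // Normal form: any sequence, including a lazy LINQ query, binds here.
    public static int Frob(IEnumerable<int> x)
    {
        int total = 0; // hypothetical body: sum the elements
        foreach (int y in x)
        {
            total += y;
        }
        return total;
    }
}
```

Both `FrobDemo.Frob(10, 20, 30)` and `FrobDemo.Frob(from c in customers select c.Age)` compile, with only one real implementation.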
Will this ever be fixed?
I hope so. This feature has been on the list for a long time. It would make a lot of functions work much more nicely with LINQ.
Frob(from c in customers select c.Age);
without having to write two different versions of Frob.
However, it is a mere "small convenience" feature; it doesn't actually add a whole lot of new power to the language. That's why it's never made it high enough on the priority list to make it to the "specification is written" stage.
I really wish they would rewrite the old libraries to use generics.
Comment noted.
Ah, I think I may now have understood what you mean. I think you want to be able to declare a method like this:
public void Foo<T>(params IEnumerable<T> items)
{
}
And then be able to call it with a "normal" argument like this:
IEnumerable<string> existingEnumerable = ...;
Foo(existingEnumerable);
or with multiple parameters like this:
Foo("first", "second", "third");
Is that what you're after? (Noting that you'd want the first form to use T=string, rather than T=IEnumerable<string> with a single element...)
If so, I agree it could be useful - but it's easy enough to have:
public void Foo<T>(params T[] items)
{
Foo((IEnumerable<T>) items);
}
public void Foo<T>(IEnumerable<T> items)
{
}
I don't find I do this often enough to make the above a particularly ugly workaround.
Note that when calling the above code, you'll want to explicitly specify the type argument, to avoid the compiler preferring the params overload. So for example:
List<string> x = new List<string>();
Foo<string>(x);
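A small sketch of that pitfall, with hypothetical FooDemo overloads that report which overload ran and what T was inferred as. Without an explicit type argument, the params overload wins with T = List<string>, because passing the list as a single List<string> element is an identity conversion, which beats converting it to IEnumerable<string>.

```csharp
using System;
using System.Collections.Generic;

public static class FooDemo
{
    // Each overload reports its own name and the inferred T (for illustration only).
    public static (string Overload, Type Inferred) Foo<T>(params T[] items) => ("params", typeof(T));
    public static (string Overload, Type Inferred) Foo<T>(IEnumerable<T> items) => ("sequence", typeof(T));
}
```

So `FooDemo.Foo(x)` reports ("params", List<string>), while `FooDemo.Foo<string>(x)` reports ("sequence", string).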
The params parameters are sent as an array, and an IEnumerable<T> doesn't provide the random access that is required to act as an array.
You have to create the array from the IEnumerable when you call the method:
TheMethod(theIEnumerable.ToArray());