Could anyone point out the differences between C# statements and their equivalent extension methods? For example: foreach vs. the .ForEach extension method.
If there are any differences, what are they? Security-wise? Performance-wise? Which one is better to use? Which one is safer? etc.
And if there are no differences, then why bother writing them?
I've been thinking and searching a bit about this question of mine and didn't find my answer.
It depends on the implementation of the extension method you use. Internally, there's really nothing special about List<T>'s version of .ForEach.
There would be minimal/negligible time to load the extension method at app load and compile time. There may be minimal overhead to convert the .ForEach syntax into the underlying foreach, as it's technically only a wrapper. It could potentially cause security issues, but only because it can create closure situations where your objects may not be collected at the time expected (e.g. held in scope longer). Ultimately there's very, very little difference, and it comes down to taste. Unless, of course, you're trying to shave off every millisecond, in which case using the native foreach is the best way to go.
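The closure point can be made concrete with a small sketch (the class and method names here are made up for illustration): a lambda that captures a local keeps that local alive for as long as the delegate itself is reachable.

```csharp
using System;

public static class ClosureLifetimeDemo
{
    // Returns a delegate that captures a large local. The compiler hoists
    // bigBuffer into a closure object referenced by the delegate, so the
    // buffer stays alive for as long as the returned delegate is reachable,
    // potentially far longer than the method's scope suggests.
    public static Func<int> Capture()
    {
        var bigBuffer = new byte[1024 * 1024];
        return () => bigBuffer.Length;
    }
}
```

A plain foreach body that doesn't capture anything has no such effect on object lifetimes.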
I would like to mention that .ForEach strikes against the premise that lambda expressions should be purely functional; that is, it breaks the "functional" style and introduces the possibility of side effects. Using a foreach body makes the code more readable and explicit.
Please see:
Why there is no ForEach extension method on IEnumerable?
It's a trade-off. The extension method is certainly more concise, and it provides compile-time checking. But the extension method can also hurt readability and maintainability, and it invites side effects.
Taken from here
The second reason is that doing so adds zero new representational
power to the language. Doing this lets you rewrite this perfectly
clear code:
foreach(Foo foo in foos){ statement involving foo; }
into this code:
foos.ForEach((Foo foo)=>{ statement involving foo; });
which uses almost exactly the same characters in slightly different
order. And yet the second version is harder to understand, harder to
debug, and introduces closure semantics, thereby potentially changing
object lifetimes in subtle ways.
The provided answers are inaccurate. There are many pitfalls when using a ForEach extension method. E.g. the following extension method may easily become a performance killer:
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
    foreach (var item in source)
    {
        action(item);
    }
}
And then we misuse it:
IEnumerable<T> items = new List<T>();
items.ForEach(UpdateItem);
Looks nice, right? Well, here the ForEach() extension method is called on an IEnumerable<T>, which means List<T>'s optimized, allocation-free struct enumerator gets boxed into a heap-allocated IEnumerator<T>. Then the Action argument requires another delegate allocation: the method group UpdateItem is converted to a new Action<T> on every call. Put this loop on a hot path and the Garbage Collector will go nuts, causing significant performance issues.
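A sketch of the difference (the class and method names are invented for illustration): iterating a List<int> through its concrete type uses the public struct List<int>.Enumerator, while iterating it through IEnumerable<int> forces that struct to be boxed.

```csharp
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // foreach binds to List<int>.GetEnumerator(), which returns the
    // struct List<int>.Enumerator: no heap allocation for the enumerator.
    public static long SumDirect(List<int> list)
    {
        long sum = 0;
        foreach (var x in list) sum += x;
        return sum;
    }

    // foreach binds to IEnumerable<int>.GetEnumerator(), which returns
    // IEnumerator<int>: the struct enumerator is boxed, allocating on
    // every call. Same result, extra garbage.
    public static long SumViaInterface(IEnumerable<int> source)
    {
        long sum = 0;
        foreach (var x in source) sum += x;
        return sum;
    }
}
```

Both methods compute the same sum; only the allocation behavior differs.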
Please see my other answer, where I explain this in much greater detail.
In terms of security, I have seen developers accidentally include a third-party assembly just to use a specific ForEach() extension method. That meant shipping an unwanted dependency from who-knows-where, with unknown capabilities.
Summary
foreach is safer.
foreach is more performant.
foreach is better. The compiler knows exactly how to deal with it efficiently.
.ForEach is similar to Parallel.ForEach. I've seen the regular .ForEach used to develop/debug parallel versions before. What's nice about it is that you don't have to change a bunch of code to move between the two.
In general, if I have no intentions to do the Parallel.ForEach, then I prefer the regular foreach for readability.
Related
I'm wondering why C# is moving towards more pattern-based programming rather than conventional ways.
For example, the foreach statement expects the loop source to have a magic method called GetEnumerator, which returns an object with a few more magic methods like MoveNext and Current, but no specific interface is mandated. C# could have required that a class used in foreach implement IEnumerable or IEnumerable<T>, just as it requires an object used in a using statement to implement the IDisposable interface.
Also, I see a similar trend with the async/await keywords...
Of course there must be a good reason for it, but it seems a little odd to me; why does the compiler/CLR require "magic methods" rather than relying on interfaces?
foreach
I would say it's about both performance and compatibility.
If foreach had been defined to use IEnumerable, iterating all generic collections would have been very slow for value types T (because of boxing/unboxing).
If it had been defined to use IEnumerable<T>, iterating over ArrayList and all the non-generic collections from early .NET versions would not have been possible.
I think the design decision was good. When foreach was introduced (.NET 1.1) there were no generics in .NET (they arrived in .NET 2.0). Choosing IEnumerable as the source of foreach enumeration would have made it work poorly with generic collections, or would have required a radical change. I guess the designers already knew they were going to introduce generics not long after.
Additionally, declaring the rule as use IEnumerable<T> when it's available, or IEnumerable when it's not is not much different from use the available GetEnumerator method, or fail to compile when there isn't one, is it?
update
As @mikez mentioned in the comments, there is one more advantage. When you don't require GetEnumerator to return IEnumerator/IEnumerator<T>, it can return a struct, and you don't have to worry about boxing when the enumerator is used by the loop.
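To illustrate that advantage, here is a minimal, hypothetical sketch: a type that foreach accepts purely by pattern, implementing no interface at all, with a struct enumerator so nothing is boxed.

```csharp
// Works with foreach even though it implements no interface:
// the compiler only looks for GetEnumerator/MoveNext/Current.
public struct CountToThree
{
    public Enumerator GetEnumerator() => new Enumerator();

    // A struct enumerator: never boxed when used by the foreach loop.
    public struct Enumerator
    {
        private int _current; // starts at 0

        public int Current => _current;

        public bool MoveNext()
        {
            _current++;
            return _current <= 3;
        }
    }
}
```

`foreach (var i in new CountToThree())` then yields 1, 2, 3 with no heap allocation.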
LINQ
The same magic methods situation occurs when you use LINQ and syntax based queries. When you write
var results = from item in source
              where item != "test"
              select item.ToLower();
it's transformed by the compiler into:
var results = source.Where(x => x != "test")
                    .Select(x => x.ToLower());
And because that code works no matter which interfaces source implements, the same applies to the syntax-based query. As long as every method call in the resulting method-based query can be properly bound by the compiler, everything is OK.
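A small sketch of that point (the Box<T> type is invented here for illustration): query syntax binds to any suitably shaped Select method, no interface required.

```csharp
using System;

public class Box<T>
{
    public T Value;
    public Box(T value) { Value = value; }

    // The presence of this method is all the compiler needs to accept
    // Box<T> as the source of a degenerate query expression.
    public Box<R> Select<R>(Func<T, R> projection) => new Box<R>(projection(Value));
}

public static class QueryPatternDemo
{
    public static int Run()
    {
        // The compiler translates this into: new Box<int>(21).Select(v => v * 2)
        var result = from v in new Box<int>(21)
                     select v * 2;
        return result.Value; // 42
    }
}
```

Note that no using System.Linq is needed here; the query binds to Box<T>'s own Select.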
async/await
I'm not so sure, but I think the same applies to async/await. When you use these keywords the compiler generates a bunch of code for you, which is then compiled as if you had written it yourself. As long as the code produced by that transformation compiles, everything is OK.
I recently started on WPF, and I noticed that you have to do a lot of casting (especially with events). This is an aesthetic issue, but I was wondering how bad it would be if I'd use an extension method to cast, instead of using normal casting.
public static T Cast<T>(this object obj)
{
    return (T)obj;
}
This would mean I could avoid a few nested parentheses, and change:
Console.WriteLine(((DataGridCell)e.OriginalSource).ActualHeight);
to:
Console.WriteLine(e.OriginalSource.Cast<DataGridCell>().ActualHeight);
Are there any clear disadvantages that I might be overlooking? How disgusted will people be when they encounter this in code? :)
This is similar in intent to Enumerable.Cast, so I wouldn't necessarily say that people will be disgusted.
Are there any clear disadvantages that I might be overlooking?
The main disadvantage is that this will be an extension method available to every single variable in your code, since you're extending System.Object. I typically avoid extension methods on Object for this reason, as it "pollutes" intellisense.
That being said, there are other disadvantages:
If you used this on an existing IEnumerable, you'd get a name collision with Enumerable.Cast<T>. A file having your namespace included but missing a using System.Linq could easily be misunderstood by other developers, as this would have a very different meaning to the expected "Cast<T>" extension method.
If you use this on a value type, you're introducing boxing (pushing the value type into an object) followed by an unbox-and-cast, which can actually throw an exception that wouldn't occur with a normal cast. Your extension method will throw if you do:
int i = 42;
float f = i.Cast<float>();
This might be unexpected, since float f = (float)i; is perfectly legal. For details, see Eric Lippert's post on Representation and Identity. If you do write this, I would definitely recommend adding a class constraint to your method.
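A hedged sketch of that suggestion; the name CastRef is made up here, partly to dodge the Enumerable.Cast<T> name collision mentioned above:

```csharp
public static class CastExtensions
{
    // The class constraint turns i.CastRef<float>() into a compile-time
    // error instead of a runtime InvalidCastException, because value
    // types can no longer be used as the target type at all.
    public static T CastRef<T>(this object obj) where T : class
    {
        return (T)obj;
    }
}
```

Reference-type conversions, e.g. `e.OriginalSource.CastRef<DataGridCell>()`, still work as before.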
I, personally, would just use parenthesis. This is a common, language supported feature, and should be understandable to all C# developers. Casting has the advantages of being shorter, understandable, and side effect free (in terms of intellisense, etc).
The other option would be to make this a normal static method, which would allow you to write:
Console.WriteLine(Utilities.Cast<DataGridCell>(e.OriginalSource).ActualHeight);
This eliminates the disadvantage of "polluting" IntelliSense and makes it obvious that it's a method you wrote, but it increases the amount of typing required. It also does nothing to prevent the boxing and unbox/cast issue.
The main disadvantage is that casting is well known to every C# developer, while your Cast<T> method is just another not-invented-here wheel. The next step, usually, is a set of extensions like IsTrue, IsFalse, IsNull, etc.
It's syntax garbage.
I need to find the minimum between 3 values, and I ended up doing something like this:
Math.Min(Math.Min(val1, val2), val3)
It just seems a little silly to me, because other languages use variadic functions for this. I highly doubt this was an oversight though.
Is there any reason why a simple Min/Max function shouldn't be variadic? Are there performance implications? Is there a variadic version that I didn't notice?
If it is a collection (an implementation of IEnumerable<T>), one can easily use the functions in the System.Linq library:
int min = new int[] {2,3,4,8}.Min();
Furthermore, it's easy to implement these methods on your own:
public static class Maths
{
    public static T Min<T>(params T[] vals)
    {
        return vals.Min();
    }

    public static T Max<T>(params T[] vals)
    {
        return vals.Max();
    }
}
You can call these methods with simple scalars, so Maths.Min(14, 25, 13, 2) would give 2.
These methods are generic, so there is no need to implement them for each numeric type (int, float, ...).
I think the basic reason these methods are not provided in general is that every time you called them, an array (or at least an IList object) would have to be allocated. Keeping the low-level two-argument methods avoids that overhead. However, I agree one could add these methods to the Math class to make the life of some programmers easier.
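A small sketch of that trade-off (Min3 is a hypothetical params overload, not a framework method):

```csharp
using System;
using System.Linq;

public static class MinDemo
{
    // The compiler materializes a fresh int[] at every call site:
    // Min3(a, b, c) compiles to Min3(new int[] { a, b, c }).
    public static int Min3(params int[] vals) => vals.Min();

    public static (int, int) Compare()
    {
        int a = 2, b = 7, c = 4;

        int viaNesting = Math.Min(Math.Min(a, b), c); // no allocation
        int viaParams = Min3(a, b, c);                // allocates int[3]

        return (viaNesting, viaParams); // both are 2
    }
}
```

On a hot path, the hidden array allocation is exactly the overhead the answer describes.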
CommuSoft has addressed how to accomplish the equivalent in C#, so I won't retread that part.
To specifically address your question "Why aren't C#'s Math.Min/Max variadic?", two thoughts come to mind.
First, Math.Min (and Math.Max) is not, in fact, a C# language feature, it is a .NET framework library feature. That may seem pedantic, but it is an important distinction. C# does not, in fact, provide any special purpose language feature for determining the minimum or maximum value between two (or more) potential values.
Secondly, as Eric Lippert has pointed out a number of times, language features (and presumably framework features) are not "removed" or actively excluded - all features are unimplemented until someone designs, implements, tests, documents and ships the feature. See here for an example.
Not being a .NET framework developer, I cannot speak to the actual decision process that occurred, but it seems like this is a classic case of a feature that simply never rose to the level of inclusion, similar to the sequence foreach "feature" Eric discusses in the provided link.
I think CommuSoft is providing a robust answer that is at least suited for people searching for something along these lines, and that should be accepted.
With that said, the reason is surely to avoid, in the common two-value case, the overhead needed for the less likely case of comparing a whole group of values.
As pointed out by @arx, a params array would be unnecessary overhead in the most common case, but it would also add the overhead of the loop that would have to run over the array internally n - 1 times.
I can easily see an argument for having created the method in addition to the basic form, but with LINQ that's just no longer necessary.
class my_class
{
    public int add_1(int a, int b) { return a + b; }

    public Func<int, int, int> add_2 = (a, b) => { return a + b; };
}
add_1 is a function, whereas add_2 is a delegate. However, in this context delegates can fulfill a similar role.
Due to precedent and the design of the language, the default choice for C# methods should be regular methods.
However, both approaches have pros and cons, so I've produced a list. Are there any more advantages or disadvantages to either approach?
Advantages to conventional methods.
more conventional
outside users of the function see named parameters - for the add_2 syntax arg_n and a type is generally not enough information.
works better with intellisense - ty Minitech
works with reflection - ty Minitech
works with inheritance - ty Eric Lippert
has a "this" - ty CodeInChaos
lower overheads, speed and memory - ty Minitech and CodeInChaos
don't need to think about public/private with respect to both changing and using the function. - ty CodeInChaos
less dynamic, less is permitted that is not known at compile time - ty CodeInChaos
Advantages to "field of delegate type" methods.
more consistent: there's no split between member functions and data members, it's all just data members.
can outwardly look and behave like a variable.
storing it in a container works well.
multiple classes could use the same function as if it were each one's member function; this would be very generic, concise, and have good code reuse.
straightforward to use anywhere, for example as a local function.
presumably works well when passed around with garbage collection.
more dynamic, less must be known at compile time, for example there could be functions that configure the behaviour of objects at run time.
as if encapsulating its code, it can be combined and reworked, msdn.microsoft.com/en-us/library/ms173175%28v=vs.80%29.aspx
outside users of the function see unnamed parameters - sometimes this is helpful, although it would be nice to be able to name them.
can be more compact: in this simple example the return could be removed, and with a single parameter the parentheses could go too.
roll your own behaviours like inheritance - ty Eric Lippert
other considerations such as functional, modular, distributed, (code writing, testing or reasoning about code) etc...
Please don't vote to close; that's happened already and it got reopened. It's a valid question even if you don't think the delegates approach has much practical use given how it conflicts with established coding style, or you don't like the advantages of delegates.
First off, the "high order bit" for me with regards to this design decision would be that I would never do this sort of thing with a public field/method. At the very least I would use a property, and probably not even that.
For private fields, I use this pattern fairly frequently, usually like this:
class C
{
    private static Func<int, int> ActualFunction = (int y) => { ... };
    private Func<int, int> Function = ActualFunction.Memoize();
}
and now I can very easily test the performance characteristics of different memoization strategies without having to change the text of ActualFunction at all.
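The Memoize() extension isn't defined in the answer; a minimal sketch of what it might look like:

```csharp
using System;
using System.Collections.Generic;

public static class FuncExtensions
{
    // Wraps f so each distinct argument is computed once and cached.
    // Not thread-safe; a ConcurrentDictionary variant would fix that.
    public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
    {
        var cache = new Dictionary<A, R>();
        return a =>
        {
            if (!cache.TryGetValue(a, out var result))
            {
                result = f(a);
                cache[a] = result;
            }
            return result;
        };
    }
}
```

Swapping in a different caching strategy then only touches this one extension, which is exactly the flexibility the answer describes.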
Another advantage of the "methods are fields of delegate type" strategy is that you can implement code-sharing techniques different from the ones we've "baked in" to the language. A protected field of delegate type is essentially a virtual method, but more flexible. Derived classes can replace it with whatever they want, and you have emulated a regular virtual method. But you could build custom inheritance mechanisms; if you really like prototype inheritance, for example, you could have a convention that if the field is null, a method on some prototypical instance is called instead, and so on.
A major disadvantage of the methods-are-fields-of-delegate-type approach is that of course, overloading no longer works. Fields must be unique in name; methods merely must be unique in signature. Also, you don't get generic fields the way that we get generic methods, so method type inference stops working.
The second one, in my opinion, offers absolutely no advantage over the first. It's much less readable, it's probably less efficient (since Invoke has to be implied), and it isn't any more concise. What's more, if you ever use reflection it won't show up as a method, so if you replace your methods this way in every class, you might break something that seems like it should work. In Visual Studio, IntelliSense won't show a description of the method, since you can't put XML comments on delegates (at least not in the same way you would on normal methods), and you don't know what the field points to anyway unless it's readonly (and what if the constructor changed it?). It will also show up as a field, not a method, which is confusing.
The only time you should really use lambdas is in methods where closures are required, or when it offers a significant convenience advantage. Otherwise you're just decreasing readability (basically the readability of my first paragraph versus the current one) and breaking compatibility with previous versions of C#.
Why you should avoid delegates as methods by default, and what the alternatives are:
Learning curve
Using delegates this way will surprise a lot of people. Not everyone can wrap their head around delegates, or why you'd want to swap out functions. There seems to be a learning curve. Once you get past it, delegates seem simple.
Perf and reliability
There's a performance loss to invoking delegates in this manner. This is another reason I would default to traditional method declaration unless it enabled something special in my pattern.
There's also an execution safety issue. Public fields are nullable. If you're passed an instance of a class with a public field you'll have to check that it isn't null before using it. This hurts perf and is kind of lame.
You can work around this by changing all public fields to properties (which is a rule in all .Net coding standards anyhow). Then in the setter throw an ArgumentNullException if someone tries to assign null.
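A sketch of that workaround (the Worker/Log names are hypothetical): the backing field is never null, and the setter guards against null assignment.

```csharp
using System;

public class Worker
{
    private Action<string> log = _ => { }; // no-op default, never null

    public Action<string> Log
    {
        get { return log; }
        set
        {
            // Enforce the invariant at the boundary instead of
            // null-checking at every call site.
            if (value == null)
                throw new ArgumentNullException(nameof(value));
            log = value;
        }
    }

    public void DoWork()
    {
        // Internal code can invoke without a null check.
        Log("working...");
    }
}
```

Callers assign `w.Log = s => ...;` once, and every invocation afterwards is safe.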
Program design
Even if you can deal with all of this, allowing methods to be mutated at all goes against much of the design of static OO and functional programming languages.
In static OO languages, types are static, and dynamic behavior is enabled through polymorphism: you can know the exact behavior of a type based on its runtime type. This is very helpful when debugging an existing program. Allowing your types to be modified at run time harms this.
In both the static OO and functional programming paradigms, limiting and isolating side effects is quite helpful, and using fully immutable structures is one of the primary ways to do it. The only point of exposing methods as delegates is to create mutable structures, which has the exact opposite effect.
Alternatives
If you really wanted to go so far as to always use delegates to replace methods, you should be using a language like IronPython or something else built on top of the DLR. Those languages will be tooled and tuned for the paradigm you're trying to implement. Users and maintainers of your code won't be surprised.
That being said, there are uses that justify using delegates as a substitute for methods. You shouldn't consider this option unless you have a compelling reason to do so that overrides these performance, confusion, reliability, and design issues. You should only do so if you're getting something in return.
Uses
For private members, Eric Lippert's answer describes a good use: (Memoization).
You can use it to implement a Strategy Pattern in a function-based manner rather than requiring a class hierarchy. Again, I'd use private members for this...
...Example code:
public class Context
{
    private Func<int, int, int> executeStrategy;

    public Context(Func<int, int, int> executeStrategy)
    {
        this.executeStrategy = executeStrategy;
    }

    public int ExecuteStrategy(int a, int b)
    {
        return executeStrategy(a, b);
    }
}
I have found a particular case where I think public delegate properties are warranted: to implement a Template Method Pattern with instances instead of derived classes...
...This is particularly useful in automated integration tests where you have a lot of setup/tear down. In such cases it often makes sense to keep state in a class designed to encapsulate the pattern rather than rely on the unit test fixture. This way you can easily support sharing the skeleton of the test suite between fixtures, without relying on (sometimes shoddy) test fixture inheritance. It also might be more amenable to parallelization, depending on the implementation of your tests.
var test = new MyFancyUITest
{
    // I usually name these things in a more test-specific manner...
    Setup = () => { /* ... */ },
    TearDown = () => { /* ... */ },
};
test.Execute();
Intellisense Support
outside users of the function see unnamed parameters - sometimes this is helpful, although it would be nice to be able to name them.
Use a named delegate - I believe this will get you at least some IntelliSense for the parameters (probably just the names, less likely the XML docs - please correct me if I'm wrong):
public class MyClass
{
    public delegate int DoSomethingImpl(int foo, int bizBar);

    public DoSomethingImpl DoSomething = (x, y) => { return x + y; };
}
I'd avoid delegate properties/fields as method replacements for public methods. For private methods it's a tool, but not one I use very often.
Instance delegate fields have a per-instance memory cost. Probably a premature optimization for most classes, but still something to keep in mind.
Your code uses a public mutable field, which can be changed at any time. That hurts encapsulation.
If you use the field initializer syntax, you can't access this, so field initializer syntax is mainly useful for static methods.
Makes static analysis much harder, since the implementation of that method isn't known at compile-time.
There are some cases where delegate properties/fields might be useful:
Handlers of some sort, especially if multicasting (and thus the event subscription pattern) doesn't make much sense
Assigning something that can't be easily described by a simple method body. Such as a memoized function.
The delegate is generated at runtime, or at least its value is only decided at runtime
Using a closure over local variables is an alternative to using a method and private fields. I strongly dislike classes with lots of fields, especially when some of those fields are used by only two methods or fewer. In these situations, using a delegate in a field can be preferable to conventional methods:
class MyClassConventional
{
    // When Mark() is called, remember the value so that we can do
    // something with it in Process(). Not used in any other method.
    int? someValue;
    int X;

    void Mark()
    {
        someValue = X;
    }

    void Process()
    {
        // Do something with someValue.Value
    }
}
class MyClassClosure
{
    int X;
    Action Process = null;

    void Mark()
    {
        int someValue = X;
        Process = () =>
        {
            // Do something with someValue
        };
    }
}
This question presents a false dichotomy - between functions, and a delegate with an equivalent signature. The main difference is that the delegate-field form should only be used when there is no other choice. Use it in your day-to-day work and it will be thrown out of any code review.
The benefits that have been mentioned are far outweighed by the fact that there is almost never a reason to write code this obscure; it makes it look like you don't know how to program in C#.
I urge anyone reading this to ignore those stated benefits, since they are all overwhelmed by that one consideration.
The only exception to that rule is if you have a need for one of the benefits and that need can't be satisfied in any other way. In that case, you'll need to write more comments than code to explain why you have a good reason to do it. Be prepared to answer as clearly as Eric Lippert did. You'd better be able to explain, as well as Eric does, why you can't accomplish your requirements and write understandable code at the same time.
Does LINQ have a sequence operator that performs some action on every element without projecting it into a new sequence?
This might seem a bit awkward, but it's just for me to know :)
Example:
IEnumerable<IDisposable> x;
x.PERFORM_ACTION_ON_EVERY_ELEMENT(m => m.Dispose());
Obviously, this could be done using something like:
foreach (var element in x) element.Dispose();
But if something actually exists, that would be nice.
No, it doesn't exist, specifically for the reason you mention: it would seem awkward to have a single operator that behaves completely differently from all the others.
Eric Lippert, one of the C# compiler developers, has an article about this:
But we can go a bit deeper here. I am philosophically opposed to providing such a method, for two reasons.
The first reason is that doing so violates the functional programming principles that all the other sequence operators are based upon. Clearly the sole purpose of a call to this method is to cause side effects.
The purpose of an expression is to compute a value, not to cause a side effect. The purpose of a statement is to cause a side effect. The call site of this thing would look an awful lot like an expression (though, admittedly, since the method is void-returning, the expression could only be used in a “statement expression” context.)
It does not sit well with me to make the one and only sequence operator that is only useful for its side effects.
You can use this method:
public static class Extension
{
    public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var t in source)
        {
            action(t);
        }
        return source;
    }
}
It returns the source, so you can pass it along to another extension method as needed. If you want it to return void, you can change the method a little.
The morelinq project has a ForEach operator. LINQ itself doesn't, as LINQ is all about functional programming, and ForEach has side effects.
Here is a similar discussion on this: Run a method on all objects within a collection