What is the motivation of C# ExpressionVisitor's implementation? - c#

I have to design a solution for a task, and I would like to use something theoretically similar to C#'s ExpressionVisitor.
Out of curiosity I opened the .NET sources for ExpressionVisitor to have a look at it. Since then I've been wondering why the .NET team implemented the visitor the way they did.
For example MemberInitExpression.Accept looks like this:
protected internal override Expression Accept(ExpressionVisitor visitor) {
return visitor.VisitMemberInit(this);
}
My - probably noob - question is: does it make any sense? I mean, shouldn't the Accept method itself be responsible for how the visiting is implemented? I expected something like this (removing the internal visibility so that it can be overridden from outside):
protected override Expression Accept(ExpressionVisitor visitor) {
return this.Update(
visitor.VisitAndConvert(this.NewExpression, "VisitMemberInit"),
visitor.Visit(this.Bindings, VisitMemberBinding)
);
}
But this code is in the base ExpressionVisitor's VisitMemberInit method, which gets called from MemberInitExpression.Accept. So there seems to be no benefit to the Accept implementation here.
Why not just process the tree in the base ExpressionVisitor, and forget about all the Accept methods?
I hope you understand my points, and hope someone could shed some light on the motivation behind this implementation. Probably I don't understand the Visitor pattern at all?...

A visitor can override the way any expression is visited. If your proposal were implemented everywhere, the visitor would never be called: all the visitation logic would be in the overrides of Accept, and non-BCL code cannot override this method.
If you write visitor.Visit((Expression)this.SomeExpression) (like you do in the question) then how are you going to perform dynamic dispatch on the type of SomeExpression? Now the visitor has to perform the dynamic dispatch. Note, that your 2nd code snippet makes the simplifying assumption that all sub-expressions to be visited have known type. Try to write the code for BinaryExpression to see what I mean.
Maybe I did not understand something but this proposal does not make sense.
The purpose of the Accept method is a performance optimization. Each accept is a virtual call which is rather cheap. The alternative would be to have a huge switch in the visitor over the expression type (which is an enum). That's probably slower.
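To make the dispatch argument concrete, here is a minimal sketch of the same shape on a tiny tree. The types Node, Num, Add, INodeVisitor, Evaluator and Printer are hypothetical and are not the BCL's classes; the point is only that each Accept is a single virtual call that selects the matching Visit method, so the visitor never has to switch on the node type.

```csharp
using System;

// Hypothetical two-node expression tree, far smaller than System.Linq.Expressions.
abstract class Node
{
    // Each node type picks the matching visitor method with one virtual call.
    public abstract T Accept<T>(INodeVisitor<T> visitor);
}

sealed class Num : Node
{
    public int Value { get; }
    public Num(int value) { Value = value; }
    public override T Accept<T>(INodeVisitor<T> visitor) => visitor.VisitNum(this);
}

sealed class Add : Node
{
    public Node Left { get; }
    public Node Right { get; }
    public Add(Node left, Node right) { Left = left; Right = right; }
    public override T Accept<T>(INodeVisitor<T> visitor) => visitor.VisitAdd(this);
}

interface INodeVisitor<T>
{
    T VisitNum(Num node);
    T VisitAdd(Add node);
}

// The traversal logic lives in the visitors, so new algorithms need no changes to the nodes.
sealed class Evaluator : INodeVisitor<int>
{
    public int VisitNum(Num node) => node.Value;
    public int VisitAdd(Add node) => node.Left.Accept(this) + node.Right.Accept(this);
}

sealed class Printer : INodeVisitor<string>
{
    public string VisitNum(Num node) => node.Value.ToString();
    public string VisitAdd(Add node) =>
        "(" + node.Left.Accept(this) + " + " + node.Right.Accept(this) + ")";
}

static class Program
{
    static void Main()
    {
        Node tree = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        Console.WriteLine(tree.Accept(new Evaluator())); // 6
        Console.WriteLine(tree.Accept(new Printer()));   // (1 + (2 + 3))
    }
}
```

Note that VisitAdd calls Accept on children whose static type is only Node; the virtual call is what recovers the concrete type.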

The visitor pattern allows the algorithm to essentially be separated from the structure it's operating on. In this case the structure it operates on is the expression tree.
Notice that the Visit methods in the visitor are virtual. This means we could conceivably write different implementations of ExpressionVisitor that do different things to an expression tree (and, indeed, there are different implementations). And we can do this without changing any code in the expression tree classes themselves.
Examples of different visitor implementations might be something like having one visitor that turns the expression tree back into a string representing C# code (or perhaps code in another language).
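As a small illustration of such a custom visitor, the sketch below subclasses the real System.Linq.Expressions.ExpressionVisitor and overrides VisitBinary. The class name AddToSubtractVisitor and the rewrite it performs are made up for the example; nothing in the expression classes has to change for it to work.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical visitor: rewrites every addition node into a subtraction.
class AddToSubtractVisitor : ExpressionVisitor
{
    protected override Expression VisitBinary(BinaryExpression node)
    {
        // Visit the children first so that nested additions are rewritten too.
        Expression left = Visit(node.Left);
        Expression right = Visit(node.Right);

        if (node.NodeType == ExpressionType.Add)
            return Expression.Subtract(left, right);

        // Any other binary node is rebuilt only if a child changed.
        return node.Update(left, node.Conversion, right);
    }
}

static class Program
{
    static void Main()
    {
        Expression<Func<int, int, int>> add = (a, b) => a + b;
        var rewritten =
            (Expression<Func<int, int, int>>)new AddToSubtractVisitor().Visit(add);

        Console.WriteLine(add.Compile()(5, 3));       // 8
        Console.WriteLine(rewritten.Compile()(5, 3)); // 2
    }
}
```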

Related

Interface conflict resolution in C#

This is a spin-off question based on Eric Lippert's answer on this question.
I would like to know why the C# language is designed not being able to detect the correct interface member in the following specific case. I am not looking on feedback whether designing a class this way is considered best practice.
class Turtle { }
class Giraffe { }
class Ark : IEnumerable<Turtle>, IEnumerable<Giraffe>
{
public IEnumerator<Turtle> GetEnumerator()
{
yield break;
}
// explicit interface member 'IEnumerable.GetEnumerator'
IEnumerator IEnumerable.GetEnumerator()
{
yield break;
}
// explicit interface member 'IEnumerable<Giraffe>.GetEnumerator'
IEnumerator<Giraffe> IEnumerable<Giraffe>.GetEnumerator()
{
yield break;
}
}
In the code above, Ark has 3 conflicting implementations of GetEnumerator(). This conflict is resolved by treating IEnumerable<Turtle>'s implementation as the default, and requiring specific casts for both others.
Retrieving the enumerators works like a charm:
var ark = new Ark();
var e1 = ((IEnumerable<Turtle>)ark).GetEnumerator(); // turtle
var e2 = ((IEnumerable<Giraffe>)ark).GetEnumerator(); // giraffe
var e3 = ((IEnumerable)ark).GetEnumerator(); // object
// since IEnumerable<Turtle> is the default implementation, we don't need
// a specific cast to be able to get its enumerator
var e4 = ark.GetEnumerator(); // turtle
Why isn't there a similar resolution for LINQ's Select extension method? Is there a proper design decision to allow the inconsistency between resolving the former, but not the latter?
// This is not allowed, but I don't see any reason why ..
// ark.Select(x => x); // turtle expected
// these are allowed
ark.Select<Turtle, Turtle>(x => x);
ark.Select<Giraffe, Giraffe>(x => x);
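For comparison, here is a minimal sketch of a third form that compiles: giving the receiver a static type that names exactly one IEnumerable<T>, so inference has a single candidate. The Ark below is a variant of the one above that yields one turtle and two giraffes (an assumption made purely so the two sequences can be told apart).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

class Turtle { }
class Giraffe { }

class Ark : IEnumerable<Turtle>, IEnumerable<Giraffe>
{
    // One turtle and two giraffes, so the two sequences are easy to tell apart.
    public IEnumerator<Turtle> GetEnumerator()
    {
        yield return new Turtle();
    }

    IEnumerator<Giraffe> IEnumerable<Giraffe>.GetEnumerator()
    {
        yield return new Giraffe();
        yield return new Giraffe();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class Program
{
    static void Main()
    {
        var ark = new Ark();

        // The static type of each variable leaves only one IEnumerable<A> to infer from.
        IEnumerable<Turtle> turtles = ark;
        IEnumerable<Giraffe> giraffes = ark;

        Console.WriteLine(turtles.Select(x => x).Count());  // 1
        Console.WriteLine(giraffes.Select(x => x).Count()); // 2
    }
}
```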
It's important to first understand what mechanism is being used to resolve the call to the extension method Select. C# uses a generic type inference algorithm which is fairly complex; see the C# specification for the details. (I really should write a blog article explaining it all; I recorded a video about it in 2006 but unfortunately it has disappeared.)
But basically, the idea of generic type inference on Select is: we have:
public static IEnumerable<R> Select<A, R>(
this IEnumerable<A> items,
Func<A, R> projection)
From the call
ark.Select(x => x)
we must deduce what A and R were intended to be.
Since R depends on A, and in fact is equal to A, the problem reduces to finding A. The only information we have is the type of ark. We know that ark:
Is Ark
Extends object
Implements IEnumerable<Giraffe>
Implements IEnumerable<Turtle>
IEnumerable<T> extends IEnumerable and is covariant.
Turtle and Giraffe extend Animal which extends object.
Now, if those are the only things you know, and you know that we're looking for IEnumerable<A>, what conclusions can you reach about A?
There are a number of possibilities:
Choose Animal, or object.
Choose Turtle or Giraffe by some tiebreaker.
Decide that the situation is ambiguous, and give an error.
We can reject the first option. A design principle of C# is: when faced with a choice between options, always choose one of the options or produce an error. C# never says "you gave me a choice between Apple and Cake so I choose Food". It always chooses from the choices you gave it, or it says that it has no basis on which to make a choice.
Moreover, if we chose Animal, that just makes the situation worse. See the exercise at the end of this post.
You propose the second option, and your proposed tiebreaker is "an implicitly implemented interface gets priority over an explicitly implemented interface".
This proposed tiebreaker has some problems, starting with there is no such thing as an implicitly implemented interface. Let's make your situation slightly more complicated:
interface I<T>
{
void M();
void N();
}
class C : I<Turtle>, I<Giraffe>
{
void I<Turtle>.M() {}
public void M() {} // Used for I<Giraffe>.M
void I<Giraffe>.N() {}
public void N() {} // Used for I<Turtle>.N
public static void DoIt<T>(I<T> i) {i.M(); i.N();}
}
When we call C.DoIt(new C()) what happens? Neither interface is "explicitly implemented". Neither interface is "implicitly implemented". Interface members are implicitly or explicitly implemented, not interfaces.
Now we could say "an interface that has all of its members implicitly implemented is an implicitly implemented interface". Does that help? Nope. Because in your example, IEnumerable<Turtle> has one member implicitly implemented and one member explicitly implemented: the overload of GetEnumerator that returns IEnumerator is a member of IEnumerable<Turtle> and you've explicitly implemented it.
(ASIDE: A commenter notes that the above is inelegantly worded; it is not entirely clear from the specification whether members "inherited" from "base" interfaces are "members" of the "derived" interface, or whether it is simply the case that a "derivation" relationship between interfaces is simply the statement of a requirement that any implementor of the "derived" interface must also implement the "base". The specification has historically been unclear on this point and it is possible to make arguments either way. Regardless, my point is that the derived interface requires you to implement a certain set of members, and some of those members can be implicitly implemented and some can be explicitly implemented, and we can count how many there are of each should we choose to.)
So now maybe the proposed tiebreaker is "count the members, and the interface that has the least members explicitly implemented is the winner".
So let's take a step back here and ask the question: how on earth would you document this feature? How would you explain it? Suppose a customer comes to you and says "why are turtles being chosen over giraffes here?" How would you explain it?
Now suppose the customer asks "how can I make a prediction about what the compiler will do when I write the code?" Remember, that customer might not have the source code to Ark; it might be a type in a third-party library. Your proposal makes the invisible-to-users implementation decisions of third parties into relevant factors that control whether other people's code is correct or not. Developers generally are opposed to features that make it impossible for them to understand what their code does, unless there is a corresponding boost in power.
(For example: virtual methods make it impossible to know what your code does, but they are very useful; no one has made the argument that this proposed feature has a similar usefulness bonus.)
Suppose that third party changes a library so that a different number of members are explicitly implemented in a type you depend on. Now what happens? A third party changing whether or not a member is explicitly implemented can cause compilation errors in other people's code.
Even worse, it can not cause a compilation error; imagine a situation in which someone makes a change just in the number of methods that are implicitly implemented, and those methods are not even methods that you call, but that change silently causes a sequence of turtles to become a sequence of giraffes.
Those scenarios are really, really bad. C# was carefully designed to prevent this kind of "brittle base class" failure.
Oh, but it gets worse. Suppose we did like this tiebreaker; could we even implement it reliably?
How can we even tell if a member is explicitly implemented? The metadata in the assembly has a table that lists what class members are explicitly mapped to what interface members, but is that a reliable reflection of what is in the C# source code?
No, it is not! There are situations in which the C# compiler must secretly generate explicitly implemented interfaces on your behalf in order to satisfy the verifier (describing them would be quite off topic). So you cannot actually tell very easily how many interface members the type's implementor decided to implement explicitly.
It gets worse still: suppose the class is not even implemented in C#? Some languages always fill in the explicit interface table, and in fact I think Visual Basic might be one of those languages. So your proposal is to make the type inference rules possibly different for classes authored in VB than an equivalent type authored in C#.
Try explaining that to someone who just ported a class from VB to C# to have an identical public interface, and now their tests stop compiling.
Or, consider it from the perspective of the person implementing class Ark. Suppose that person wishes to express the intention "this type can be used as both a sequence of turtles and giraffes, but if there is an ambiguity, choose turtles". Do you believe that any developer who wished to express that intention would naturally and easily come to the conclusion that the way to do that is to make one of the interfaces more implicitly implemented than the other?
If that were the sort of thing that developers needed to be able to disambiguate, then there should be a well-designed, clear, discoverable feature with those semantics. Something like:
class Ark : default IEnumerable<Turtle>, IEnumerable<Giraffe> ...
for example. That is, the feature should be obvious and searchable, rather than emerging by accident from an unrelated decision about what the public surface area of the type should be.
In short: The number of interface members that are explicitly implemented is not a part of the .NET type system. It's a private implementation strategy decision, not a public surface that the compiler should use to make decisions.
Finally, I've left the most important reason for last. You said:
I am not looking on feedback whether designing a class this way is considered best practice.
But that is an extremely important factor! The rules of C# are not designed to make good decisions about crappy code; they're designed to make crappy code into broken code that does not compile, and that has happened. The system works!
Making a class that implements two different versions of the same generic interface is a terrible idea and you should not do it. Because you should not do it, there is no incentive for the C# compiler team to spend even a minute figuring out how to help you do it better. This code gives you an error message. That is good. It should! That error message is telling you you're doing it wrong, so stop doing it wrong and start doing it right. If it hurts when you do that, stop doing that!
(One can certainly point out that the error message does a poor job of diagnosing the problem; this leads to another whole bunch of subtle design decisions. It was my intention to improve that error message for these scenarios, but the scenarios were too rare to make them a high priority and I did not get to it before I left Microsoft in 2012. Apparently no one else has made it a priority in the years that followed either.)
UPDATE: You ask why a call to ark.GetEnumerator can do the right thing automatically. That is a much easier question. The principle here is a simple one:
Overload resolution chooses the best member that is both accessible and applicable.
"Accessible" means that the caller has access to the member because it is "public enough", and "applicable" means "all the arguments match their formal parameter types".
When you call ark.GetEnumerator() the question is not "which implementation of IEnumerable<T> should I choose"? That's not the question at all. The question is "which GetEnumerator() is both accessible and applicable?"
There is only one, because explicitly implemented interface members are not accessible members of Ark. There is only one accessible member, and it happens to be applicable. One of the sensible rules of C# overload resolution is if there is only one accessible applicable member, choose it!
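The accessibility rule can be seen in isolation with a minimal sketch. The names IGreeter, IShouter and Person are hypothetical; the point is that the explicitly implemented member is simply not a candidate when the call goes through the class type.

```csharp
using System;

interface IGreeter { string Greet(); }
interface IShouter { string Greet(); }

class Person : IGreeter, IShouter
{
    // Public member of Person; it also serves as the implementation of IGreeter.Greet.
    public string Greet() => "hello";

    // Explicit implementation: not an accessible member of Person,
    // reachable only through a reference of type IShouter.
    string IShouter.Greet() => "HELLO";
}

static class Program
{
    static void Main()
    {
        var p = new Person();
        Console.WriteLine(p.Greet());             // hello (the only accessible Greet)
        Console.WriteLine(((IGreeter)p).Greet()); // hello
        Console.WriteLine(((IShouter)p).Greet()); // HELLO
    }
}
```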
Exercise: What happens when you cast ark to IEnumerable<Animal>? Make a prediction:
I will get a sequence of turtles
I will get a sequence of giraffes
I will get a sequence of giraffes and turtles
I will get a compile error
I will get something else -- what?
Now try out your prediction and see what really happens. Draw conclusions as to whether it is a good or bad idea to write types that have multiple constructions of the same generic interface.

Is there any advantage in disallowing interface implementation for existing classes?

In static OOP languages, interfaces are used in order to declare that several classes share some logical property - they are disposable, they can be compared to an int, they can be serialized, etc.
Let's say .NET didn't have a standard IDisposable interface, and I've just come up with this beautiful idea:
interface IDiscardable { void Discard(); }
My app uses a lot of System.Windows.Forms, and I think that a Form satisfies the logical requirements for being an IDiscardable. The problem is, Form is defined outside of my project, so C# (and Java, C++...) won't allow me to implement IDiscardable for it. C# doesn't allow me to formally represent the fact that a Form can be discarded (and I'll probably end up with a MyForm wrapper class or something).
In contrast, Haskell has typeclasses, which are logically similar to interfaces. A Show instance can be presented (or serialized) as a string, Eq allows comparisons, etc. But there's one crucial difference: you can write a typeclass instance (which is similar to implementing an interface) without accessing the source code of a type. So if Haskell supplies me with some Form type, writing a Discardable instance for it is trivial.
My question is: from a language designer's perspective, is there any advantage to the first approach? Haskell is not an object-oriented language - does the second approach violate OOP in any way?
Thanks!
This is a difficult question, which stems from a common misunderstanding. Haskell type classes (TC), are said to be "logically similar" to the interfaces or abstract classes (IAC) from object-oriented programming languages. They are not. They represent different concepts about types and programming languages: IAC are a case of subtyping, while TC is a form of parametric polymorphism.
Nevertheless, since your questions are methodological, here I answer from a methodological side. To start with the second question:
does the second approach [that of extending the implementation of a class outside the class] violate OOP in any way
Object-oriented programming is a set of ideas to describe the execution of a program, the main elements of an execution, how to specify these elements in the program's code, and how to structure a program so as to separate the specification of different elements. In particular, OOP is based on these ideas:
At any state of its execution, a process (executing program) consists of a set of objects. This set is dynamic: it may contain different objects at different states, via object creation and destruction.
Every object has an internal state represented by a set of fields, which may include references to other related objects. Relations are dynamic: the same field of the same object a may at different states point to different objects.
Every object can receive some messages from another object. Upon receiving a message, the object may alter its state and may send messages to objects in its fields.
Every object is an instance of a class: the class describes what fields the object has, what messages it can receive, and what it does upon receiving a message.
In an object a, the same field a.f may at different states point to different objects, which may belong to different classes. Thus, a need not know to what class those objects b belong; it only needs to know what messages those objects accept. For this reason, the type of those fields can be an interface.
The interface declares a set of messages that an object can receive. The class specifies explicitly what interfaces are satisfied by the objects of that class.
My answer to the question: in my opinion yes.
Implementing an interface (as suggested in the example) outside a class breaks one of these ideas: that the class of the object describes the complete set of messages that objects in that class can receive.
You may like to know, though, that this is (in part) what "Aspects", as in AspectJ, are about. An Aspect describes the implementation of a certain "method" in several classes, and these implementations are incorporated (woven) into the class.
To return to the first question, "is there any advantage to the first approach", the answer would also be yes: all the behaviour of an object (what messages it answers to) is described in only one place, in the class.
Well, the Haskell approach does have one disadvantage, which is when you write, for example, two different libraries that each provides its own implementation of interface Foo for the same external type (provided by yet a third library). In this case now these two libraries can't be used at the same time in the same program. So if you call lack of a disadvantage an advantage, then I guess that would be one advantage for the OOP language way of doing this—but it's a pretty weak advantage.
What I would add to this, however, is that Haskell type classes are a bit like OOP interfaces, but not entirely like them. But type classes are also a bit like the Strategy and Template Method patterns; a type class can be simulated by explicitly passing around a "dictionary" object that provides implementations for the type class operations. So the following Haskell type class:
class Monoid m where
mempty :: m
mappend :: m -> m -> m
...can be simulated with this explicit dictionary type:
data Monoid_ m = Monoid_ { _mempty :: m, _mappend :: m -> m -> m }
...or an OOP interface like this:
interface Monoid<M> {
M empty();
M append(M a, M b);
}
What type classes add on top of this is that the compiler will maintain and pass around your dictionaries implicitly. Sometimes in the Haskell community you get arguments about when and whether type classes are superior to explicit dictionary passing; see for example Gabriel Gonzalez's "Scrap your type classes" blog entry (and keep in mind that he doesn't 100% agree with what he says there!). So the OOP counterpart to this idea would be instead of extending the language to allow external implements declarations, what are the drawbacks to just explicitly using Strategies or Template Methods?
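In C# terms, the explicit-dictionary version might be sketched as follows. The names IMonoid, IntSum, StringConcat and MonoidOps are hypothetical; the "dictionary" is an ordinary object passed as an argument, which is the part a Haskell compiler would otherwise supply implicitly.

```csharp
using System;
using System.Collections.Generic;

// The "dictionary" of operations, written as an ordinary interface.
interface IMonoid<T>
{
    T Empty { get; }
    T Append(T a, T b);
}

// Instances are defined outside int and string, without touching those types.
sealed class IntSum : IMonoid<int>
{
    public int Empty => 0;
    public int Append(int a, int b) => a + b;
}

sealed class StringConcat : IMonoid<string>
{
    public string Empty => "";
    public string Append(string a, string b) => a + b;
}

static class MonoidOps
{
    // The caller passes the dictionary explicitly.
    public static T Concat<T>(IEnumerable<T> items, IMonoid<T> monoid)
    {
        T result = monoid.Empty;
        foreach (T item in items)
            result = monoid.Append(result, item);
        return result;
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(MonoidOps.Concat(new[] { 1, 2, 3 }, new IntSum()));             // 6
        Console.WriteLine(MonoidOps.Concat(new[] { "a", "b", "c" }, new StringConcat())); // abc
    }
}
```

Because the instance is an argument, two libraries can each ship their own IMonoid<int> without conflict; the price is that every call site has to choose one.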
What you are describing is the adapter pattern: composing an object in a new type that provides some additional behavior to the underlying type, in this case the implementation of another interface.
As with so many design patterns, different languages choose different design patterns to incorporate directly into the language itself and provide special language support, often in the form of a more concise syntax, while other patterns need to be implemented through the use of other mechanisms without their own special syntax.
C# doesn't have special language support for the adapter pattern; you need to create a new explicit type that composes your other type, implements the interface, and uses the composed type to fulfill the interface's contract. Is it possible for them to add such a feature to the language? Sure. But like any other feature request in existence, it needs to be designed, implemented, tested, and documented, with all sorts of other expenses accounted for. This feature has (thus far) not made the cut.
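A minimal sketch of that hand-written adapter, using the IDiscardable interface from the question. ThirdPartyWindow is a hypothetical stand-in for a class such as Form that cannot be edited, and DiscardableWindow is an invented name for the wrapper.

```csharp
using System;

interface IDiscardable { void Discard(); }

// Stand-in for a type we cannot modify (hypothetical; plays the role of Form).
sealed class ThirdPartyWindow
{
    public bool IsClosed { get; private set; }
    public void Close() { IsClosed = true; }
}

// The adapter composes the existing type and fulfills the interface by delegating to it.
sealed class DiscardableWindow : IDiscardable
{
    private readonly ThirdPartyWindow _window;

    public DiscardableWindow(ThirdPartyWindow window) { _window = window; }

    public void Discard() => _window.Close();
}

static class Program
{
    static void Main()
    {
        var window = new ThirdPartyWindow();
        IDiscardable discardable = new DiscardableWindow(window);

        discardable.Discard();
        Console.WriteLine(window.IsClosed); // True
    }
}
```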
What you are describing is called duck typing, after the phrase "If it walks like a duck, swims like a duck, and quacks like a duck, then it's a duck".
C# actually does allow dynamic (run-time) duck typing through the dynamic keyword. What it doesn't allow is static (compile-time) duck typing.
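A minimal sketch of the run-time variant, with hypothetical Duck and Robot classes that share no interface. The member lookup happens when the call executes, so passing an object without a Quack method compiles but throws a RuntimeBinderException at run time.

```csharp
using System;

// Two unrelated types that happen to expose the same member.
public class Duck  { public string Quack() => "Quack"; }
public class Robot { public string Quack() => "Beep"; }

public static class Program
{
    // No interface constraint: the call is bound at run time via dynamic.
    public static string MakeItQuack(dynamic thing) => thing.Quack();

    static void Main()
    {
        Console.WriteLine(MakeItQuack(new Duck()));  // Quack
        Console.WriteLine(MakeItQuack(new Robot())); // Beep
    }
}
```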
You'd probably need somebody from Microsoft to come along and provide the exact reasons this doesn't exist in C#, but here are some likely candidates:
The "minus 100 points" philosophy of adding features. It is not enough for a feature to have no drawbacks; to justify the effort of implementing, testing, maintaining and supporting a language feature, it has to provide a clear benefit. Between the dynamic keyword and the adapter pattern, there are not many situations where this is useful. Reflection is also powerful enough that it would be possible to effectively provide duck typing; for example, I believe it would be relatively straightforward to use Castle's DynamicProxy for this.
There are situations where you want a class to be able to specify how it is accessed. For example, fluent APIs often control the valid orderings and combinations of chained methods on a class through the use of interfaces. See, for example, this article. If my fluent class was designed around a grammar which stated that once method A was called, no other methods except B could be called, I could control this with interfaces like:
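public class FluentExample : ICanCallAB
{
// After A(), only B() is available on the returned reference.
public ICanCallB A()
{
return this;
}
// After B(), both A() and B() are available again.
public ICanCallAB B()
{
return this;
}
}
public interface ICanCallA
{
ICanCallB A();
}
public interface ICanCallB
{
ICanCallAB B();
}
public interface ICanCallAB : ICanCallA, ICanCallB
{
}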
Of course, a consumer could get around this using casting or dynamic, but at least in this case the class can state its own intent.
Related to the above point, an interface implementation is a declaration of meaning. For example, Tree and Poodle might both have a Bark() member, but I would not want to be able to use Tree as an IDog.

How to properly partition code in a C# functional library?

As a premise, one of the key differences of FP design for reusable libraries (from what I'm learning) is that they are more data-centric than their OO counterparts (in general).
This also seems to be confirmed by emerging techniques like TFD (Type-First Development), well explained by Tomas Petricek in this blog post.
Nowadays languages are multi-paradigm, and Petricek himself explains in his book various functional techniques usable from C#.
What I'm interested in here, and hence the question, is how to properly partition code.
So I've defined library data structures, using the equivalent of discriminated unions (as shown in Petricek's book), and I plan to use them with immutable lists and/or tuples according to the domain logic of my requirements.
Where do I place operations (methods ... functions) that act on data structures?
If I want to define a higher-order function that uses a function value embodied in a standard delegate Func<T1...TResult>, where do I place it?
Common sense tells me to group these methods in static classes, but I'd like confirmation from people who have already written functional libs in C#.
Assuming that this is correct and I have a higher-order function like this:
static class AnimalTopology {
IEnumerable<Animal> ListVertebrated(Func<Skeleton, bool> selector) {
// remainder omitted
}
}
If choosing vertebrate animals has N particular cases that I want to expose in the library, what's the more correct way to expose them?
static class VertebratedSelectorsA {
// this is compatible with "Func<Skeleton, bool> selector"
static bool Algorithm1(Skeleton s) {
//...
}
}
or
static class VertebratedSelectorsB {
// this method creates the function for later application
static Func<Skeleton, bool> CreateAlgorithm1Selector(Skeleton s) {
// ...
}
}
Any pointers would be much appreciated.
EDIT:
I want to quote two phrases from T. Petricek, Real World Functional Programming foreword by Mads Torgersen:
[...] You can use functional programming techniques in C# to great benefit,
though it is easier and more natural to do so in F#.
[...]
Functional programming is a state of mind. [...]
EDIT-2:
I feel there's a need to further clarify the question. The "functional" mentioned in the title strictly relates to Functional Programming; I'm not asking about the more functional way of grouping methods in the sense of the more logical way, or the way that makes more sense in general.
This implies that the implementation will try to follow as much as possible the founding concepts of FP, summarized by the NOOO manifesto and quoted here for convenience and clarity:
Functions and Types over classes
Purity over mutability
Composition over inheritance
Higher-order functions over method dispatch
Options over nulls
The question is about how to lay out a C# library written following FP concepts, so (for example) putting methods inside the data structures is absolutely not an option, because that is a founding Object-Oriented paradigm.
EDIT-3:
Even though the question has received an answer (and various comments), I don't want to give the wrong impression that anyone has said one programming paradigm is superior to another.
As before, I'll cite an authority on FP, Don Syme, in his book Expert F# 3.0 (ch. 20 - Designing F# Libraries - pg. 565):
[...] It's a common misconception that the functional and OO programming methodologies compete; in fact, they're largely orthogonal. [...]
Note: If you want a shorter, more-to-the-point answer, see my other answer. I am aware that this one here might seem to ramble & go on forever & talk past your issue, but perhaps it will give you a few ideas.
It is difficult to answer your question without knowing the exact relationship between Animal and Skeleton. I will make a recommendation about this relationship in the second half of my answer, but before I do that, I will simply go along with what I see in your post.
First I will try to infer a few things from your code:
static class AnimalTopology
{
// Note: I made this function `static`... or did you omit the keyword on purpose?
static IEnumerable<Animal> ListVertebrated(Func<Skeleton, bool> selector)
{
…
}
}
If you have designed that function according to functional principles, it should have no side-effects. That is, its output relies only on its arguments. (And in a semi-object-oriented setting, perhaps on other static members of AnimalTopology; but since you didn't show any, let us ignore that possibility.)
If the function is indeed side-effect-free (and does not access static members of AnimalTopology), then the function's type signature suggests that it is possible to derive an Animal from a Skeleton, because it accepts something that acts on Skeletons and returns Animals.
If this is also true, then let me assume the following for the sake of being able to give an answer:
class Skeleton
{
…
public Animal Animal { get { … } } // Skeletons have animals!? We'll get to that.
}
Now it is obvious that your function is impossible to implement, since it could derive Animals from Skeletons, but it doesn't receive any Skeleton at all; it only receives a predicate function that acts on a Skeleton. (You could fix this by adding a second parameter of type Func<IEnumerable<Skeleton>> getSkeletons, but...)
In my opinion, something like the following would make more sense:
static IEnumerable<Animal> GetVertebrates(this IEnumerable<Skeleton> skeletons,
Func<Skeleton, bool> isVertebrate)
{
return skeletons
.Where(isVertebrate)
.Select(s => s.Animal);
}
Now, one might wonder why you are guessing animals from their skeletons; and isn't the bool property "is vertebrate" an inherent property of an animal (or skeleton)? Are there really several ways to decide on this?
I would suggest the following:
class Animal
{
Skeleton Skeleton { get; } // not only vertebrates have skeletons!
}
class Vertebrate : Animal { … } // vertebrates are a kind of animal
static class AnimalsExtensions
{
public static IEnumerable<Vertebrate> ThatAreVertebrates(this IEnumerable<Animal> animals)
{
return animals.OfType<Vertebrate>();
}
}
Please note the use of extension methods above. Here's an example how to use it:
List<Animal> animals = …;
IEnumerable<Vertebrate> vertebrates = animals.ThatAreVertebrates();
Now suppose your extension method did more complex work. In that case, it might be a good idea to put it inside its own designated "algorithm type":
interface IVertebrateSelectionAlgorithm
{
IEnumerable<Vertebrate> GetVertebrates(IEnumerable<Animal> animals);
}
This has the advantage that it can be set up / parameterized e.g. via a class constructor; and you could split up the algorithm into several methods that all reside in the same class (but are all private except for GetVertebrates.)
Of course you can do the same kind of parameterization with functional closures, but in my experience that quickly gets messy in a C# setting. Here, classes are a good means to group a set of functions together as one logical entity.
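For reference, the closure-based parameterization might look like the sketch below. The Skeleton type with a VertebraCount property and the VertebrateSelectors.WithAtLeast factory are hypothetical stand-ins, since the question does not show the real types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Skeleton type.
sealed class Skeleton
{
    public int VertebraCount { get; }
    public Skeleton(int vertebraCount) { VertebraCount = vertebraCount; }
}

static class VertebrateSelectors
{
    // Factory: the returned predicate closes over minVertebrae.
    public static Func<Skeleton, bool> WithAtLeast(int minVertebrae)
        => skeleton => skeleton.VertebraCount >= minVertebrae;
}

static class Program
{
    static void Main()
    {
        var skeletons = new List<Skeleton>
        {
            new Skeleton(0), new Skeleton(7), new Skeleton(33)
        };

        // The configured predicate plugs into any API that takes Func<Skeleton, bool>.
        Func<Skeleton, bool> hasSpine = VertebrateSelectors.WithAtLeast(1);
        Console.WriteLine(skeletons.Where(hasSpine).Count()); // 2
    }
}
```

With one parameter this stays readable; the messiness mentioned above tends to appear once several configuration values and helper steps have to be captured together.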
Where do I place operations (methods ... functions) that act on data structures?
I see four common approaches (in no particular order):
Put the functions inside the data structures. (This is the object-oriented "method" approach. It is suitable when a function acts only on an instance of that type. It is perhaps less appropriate e.g. when a function "draws together" several objects of different types, and spits out an object of yet another type. In this case, I would...)
Put the functions inside their own designated "algorithm classes". (This seems reasonable when the functions do much or complex work, or need to be parameterized/configured, or where you might want to split the algorithm into several functions that you can then logically "group" together by putting them in a class type.)
Turn the functions into lambdas (a.k.a. anonymous delegates, closures, etc.). (This works well if they're small and you only need them in one specific place; the code won't be easily reusable in a different place.)
Put the functions in a static class and make them extension methods. (That's how LINQ to Objects works. It is a hybrid functional & object-oriented approach. It takes some extra care to get the discoverability / namespacing issue right. Many people will think this approach breaks "encapsulation" when taken too far. For a counter-argument, read the excellent C++ article "How Non-Member Functions Improve Encapsulation"; substitute "extension method" for "non-member friend function".)
Note: I could go into each of these in more detail if people want, but before I do that, I'll wait and see what kind of feedback this answer receives.

When is it correct to create an extension method?

I have a piece of code like the following:
public class ActivityHelper
{
public void SetDate(IList<Activity> anActivityList)
{
foreach(Activity current in anActivityList)
{
current.Date = DateTime.Now;
}
}
//More methods, properties, fields, etc...
}
This could easily be converted to an extension method. For example:
public static void SetDate(this IList<Activity> anActivityList)
{
foreach(Activity current in anActivityList)
{
current.Date = DateTime.Now;
}
}
The original function doesn't use any instance specific data or methods from the ActivityHelper class which makes it seem like it is in the incorrect place. Is this the correct time to write an extension method? What are the correct scenarios in which to create extension methods?
Brad Abrams has written about extension method design guidelines:
CONSIDER using extension methods in any of the following scenarios:
To provide helper functionality relevant to every implementation of an interface, if said functionality can be written in terms of the core interface. This is because concrete implementations cannot otherwise be assigned to interfaces. For example, the LINQ to Objects operators are implemented as extension methods for all IEnumerable types. Thus, any IEnumerable<> implementation is automatically LINQ-enabled.
When an instance method would introduce a dependency on some type, but such a dependency would break dependency management rules. For example, a dependency from String to System.Uri is probably not desirable, and so String.ToUri() instance method returning System.Uri would be the wrong design from a dependency management perspective. A static extension method Uri.ToUri(this string str) returning System.Uri would be a much better design.
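The second guideline can be sketched as follows. Since C# requires extension methods to be declared in a static, non-generic class, the sketch assumes a hypothetical holder class named StringUriExtensions; only the ToUri name and its signature come from the guideline text:

```csharp
using System;

// Hypothetical static class; the guideline only names the method itself.
static class StringUriExtensions
{
    // Declared outside System.String, so String needs no reference to System.Uri.
    public static Uri ToUri(this string str)
    {
        return new Uri(str);
    }
}

static class Program
{
    static void Main()
    {
        Uri uri = "http://example.com/path".ToUri();
        Console.WriteLine(uri.Host); // prints "example.com"
    }
}
```

The dependency now points from the extension class to both String and Uri, rather than from String to Uri.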
I think Extension methods are only appropriate if there is a compelling reason to make the method an extension method.
If the type is one you do not control, and the method should appear to be integral to the type, or if there is a compelling reason to not put the method directly on the type (such as creating an unwanted dependency) then an extension method could be appropriate.
Personally, if the expectation of the user of your API will already be to use the "ActivityHelper" class when working with collections of Activities, then I would probably not create an extension method for this. A standard, non-extension method will actually be a simpler API, since it's easily understood and discoverable. Extension methods are tricky from a usage standpoint - you're calling a method that "looks like" it exists somewhere other than where it actually exists. While this can simplify syntax, it reduces maintainability and discoverability.
In my experience extension methods work best when they:
Don't have side-effects (most of the extension methods my team wrote that have side-effects, we ended up removing because they caused more problems than they helped)
Offer functionality that applies to every possible instance or value of the type they're extending. (Again citing an example from my team, string.NormalizeUrl() is not appropriate because not all strings are even URLs anyway)
I usually create extension methods to help me write code that flows smoothly. Whether one is appropriate generally depends on the method you are creating.
If you feel that the method is general enough that it should already have been in the framework, then it's okay to create an extension method for it.
But you first need to check that the class you are extending will always be in a state that your extension method can handle.
For guidelines, see Brad's article:
http://blogs.msdn.com/b/brada/archive/2009/01/12/framework-design-guidelines-extension-methods.aspx
In essence, Extension Methods provide a more fluent style syntax for Helper methods. This translates into the ability to seemingly add functionality to types or all implementations of interfaces.
However, I generally steer away from declaring Extension Methods with a void return type, as I feel the usefulness of this fluent style syntax, which allows you to compose statements, is negated when the method in question doesn't return anything.
However, I guess it can be handy to have your methods picked up by IntelliSense... :-)
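To illustrate the point about return types, here is a sketch of the question's SetDate written so that it returns the list it was called on. The Activity class is a hypothetical stand-in, and passing the date as a parameter is an assumption made to keep the example deterministic:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Activity type from the question.
class Activity
{
    public DateTime Date { get; set; }
}

static class ActivityListExtensions
{
    // Returns the same list instance so that further calls can be chained.
    public static IList<Activity> SetDate(this IList<Activity> activities, DateTime date)
    {
        foreach (Activity current in activities)
        {
            current.Date = date;
        }
        return activities;
    }
}

static class Program
{
    static void Main()
    {
        IList<Activity> activities = new List<Activity> { new Activity(), new Activity() };

        // The call composes with whatever comes next because it returns the list.
        int count = activities.SetDate(new DateTime(2020, 1, 1)).Count;
        Console.WriteLine(count); // prints 2
    }
}
```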

Advice on C# Expression Trees

I'm working on a method that accepts an expression tree as a parameter, along with a type (or instance) of a class.
The basic idea is that this method will add certain things to a collection that will be used for validation.
public interface ITestInterface
{
//Specify stuff here.
}
private static void DoSomething<T>(Expression<Func<T, object>> expression, params ITestInterface[] rule)
{
// Stuff is done here.
}
The method is called as follows:
class TestClass
{
public int MyProperty { get; set; }
}
class OtherTestClass : ITestInterface
{
// Blah Blah Blah.
}
static void Main(string[] args)
{
DoSomething<TestClass>(t => t.MyProperty,
new OtherTestClass());
}
I'm doing it this way because I'd like for the property names that are passed in to be strong typed.
A couple of things I'm struggling with..
Within DoSomething, I'd like to get a PropertyInfo type (from the body passed in) of T and add it to a collection along with rule[]. Currently, I'm thinking about using expression.Body and removing [propertyname] from "Convert.([propertyname])" and using reflection to get what I need. This seems cumbersome and wrong. Is there a better way?
Is this a specific pattern I'm using?
Lastly, any suggestions or clarifications as to my misunderstanding of what I'm doing are appreciated and / or resources or good info on C# expression trees are appreciated as well.
Thanks!
Ian
Edit:
An example of what expression.Body.ToString() returns within the DoSomething method is a string that contains "Convert(t.MyProperty)" if called from the example above.
I do need it to be strongly typed, so it will not compile if I change a property name.
Thanks for the suggestions!
I rely heavily on expression trees to push a lot of what I want to do with my current application to compile-time, i.e. static type checking.
I traverse expression trees to translate them into something else which "makes sense".
One thing I've ended up doing a lot: instead of writing URLs by hand, I rely on an MVC-like approach where I declare lambda functions and translate (or rather, interpret) the compiler-generated expression tree into a URL. When this URL is invoked, I do the opposite. This way, I have what I call compile-time checks for broken links, and this works great with refactoring and overloads as well. I think it's cool to think about using expression trees in this way.
You might want to check out the visitor pattern. It's a pain to get started with because it doesn't make much sense in the beginning, but it ties everything together, and it's a very formal way to solve type checking in compiler construction. You could do the same, but instead of type checking, emit whatever you need.
Something which I'm currently pounding my head against is the ability to build a simple framework for translating (or actually I should say interpreting) expression trees and emitting JavaScript. The idea is that the compiler-generated expression trees will translate into valid JavaScript which interfaces with some object model.
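As a rough illustration of the visitor approach, the sketch below subclasses System.Linq.Expressions.ExpressionVisitor and overrides VisitMember to record every member an expression touches. The MemberNameCollector and Customer types are hypothetical and exist only for this example; a real translator would emit output in more of the Visit overrides:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical visitor: records the name of every member accessed in an expression.
class MemberNameCollector : ExpressionVisitor
{
    public List<string> MemberNames { get; } = new List<string>();

    protected override Expression VisitMember(MemberExpression node)
    {
        MemberNames.Add(node.Member.Name);
        return base.VisitMember(node); // keep walking into the inner expression
    }
}

// Hypothetical model type for the demonstration.
class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
}

static class Program
{
    static void Main()
    {
        Expression<Func<Customer, bool>> filter = c => c.Age > 18 && c.Name != null;

        var collector = new MemberNameCollector();
        collector.Visit(filter);

        Console.WriteLine(string.Join(", ", collector.MemberNames)); // prints "Age, Name"
    }
}
```

The base class handles the traversal and dispatch to the right Visit method, so the subclass only overrides the node kinds it cares about.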
What's exciting about this is the way the compiler is always able to tell me when I go wrong and sure the end result is just a bunch of strings but the important part is how these strings got created. They went through some verification and that means something.
Once you get that going there is little you can't do with expression trees.
While working with the System.Reflection.Emit stuff I found myself using expression trees to create a light-weight framework for dynamic compilation, which at compile time could basically say if my dynamically created assemblies would compile as well, and this worked seamlessly with reflection and static type checking. I took this further and further and ended up with something which in the end saved a lot of time and proved to be very agile and robust.
So I love this kind of stuff, and this is what meta programming is all about, writing programs in your programs that do programs. I say keep it coming!
Collecting PropertyInfo objects from Expression.Body seems similar to my solution to another question.
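The linked solution is not reproduced here, so the following is only a sketch of one way to do it, with a hypothetical helper named PropertyResolver. Rather than parsing the "Convert(t.MyProperty)" string, it inspects the tree: a value-type property returned as object is wrapped in a Convert node (a UnaryExpression), whose operand is the MemberExpression that carries the PropertyInfo:

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper; the name and shape are assumptions for illustration.
static class PropertyResolver
{
    public static PropertyInfo GetProperty<T>(Expression<Func<T, object>> expression)
    {
        Expression body = expression.Body;

        // Value-type properties are boxed to object, which appears as a Convert node.
        if (body is UnaryExpression unary &&
            (unary.NodeType == ExpressionType.Convert ||
             unary.NodeType == ExpressionType.ConvertChecked))
        {
            body = unary.Operand;
        }

        if (body is MemberExpression member && member.Member is PropertyInfo property)
        {
            return property;
        }

        throw new ArgumentException(
            "Expression must be a simple property access, e.g. t => t.MyProperty.",
            nameof(expression));
    }
}

class TestClass
{
    public int MyProperty { get; set; }
}

static class Program
{
    static void Main()
    {
        PropertyInfo property = PropertyResolver.GetProperty<TestClass>(t => t.MyProperty);
        Console.WriteLine(property.Name); // prints "MyProperty"
    }
}
```

The call site stays strongly typed: renaming MyProperty without updating the lambda is a compile error.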
I appreciate what you are trying to do with the property here. I have run into this conundrum. It always feels weird to write:
DoSomething("MyProperty", new OtherClass());
If the property ever changes name, or the text is mistyped in the call, then there will be a problem. What I have come to learn is that this is something you probably have to deal with via testing. Specifically, unit testing. I would write unit tests to enforce that the "DoSomething" calls work correctly.
The other thing you might try is to decorate your properties with attributes, and then reflect against your class when it is constructed looking for properties with the attribute, and load rules.
[DoSomething(typeof(OtherClass), typeof(OtherClass2))]
public int MyProperty
{
get;
set;
}
In this case the constructor (perhaps in a base class?) would dynamically create an OtherClass object and an OtherClass2 object, and load them into a collection along with the name of the property.
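A sketch of how that could look. The DoSomethingAttribute definition, the RuleLoader helper, and the Validated class are all hypothetical, and the rule types are assumed to have public parameterless constructors:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute: stores the rule types attached to a property.
[AttributeUsage(AttributeTargets.Property)]
class DoSomethingAttribute : Attribute
{
    public Type[] RuleTypes { get; }

    public DoSomethingAttribute(params Type[] ruleTypes)
    {
        RuleTypes = ruleTypes;
    }
}

class OtherClass { }
class OtherClass2 { }

class Validated
{
    [DoSomething(typeof(OtherClass), typeof(OtherClass2))]
    public int MyProperty { get; set; }
}

static class RuleLoader
{
    // Maps each decorated property name to newly created rule instances.
    public static Dictionary<string, List<object>> LoadRules(Type type)
    {
        var rules = new Dictionary<string, List<object>>();
        foreach (PropertyInfo property in type.GetProperties())
        {
            var attribute = property.GetCustomAttribute<DoSomethingAttribute>();
            if (attribute == null) continue; // property is not decorated

            var instances = new List<object>();
            foreach (Type ruleType in attribute.RuleTypes)
            {
                // Assumes a public parameterless constructor on each rule type.
                instances.Add(Activator.CreateInstance(ruleType));
            }
            rules[property.Name] = instances;
        }
        return rules;
    }
}

static class Program
{
    static void Main()
    {
        var rules = RuleLoader.LoadRules(typeof(Validated));
        Console.WriteLine(rules["MyProperty"].Count); // prints 2
    }
}
```

The property name is read from reflection metadata rather than typed as a string, so a rename cannot silently break the association.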
