How well does F# perform compared to C#? [duplicate] - c#

This question already has answers here:
Closed 14 years ago.
Of course both F# and C# compile to IL and the CLR JITter does most of the hard work, however I wanted to know whether there's anything implicit in the F# language or its core libraries which result in lesser performance compared to C#?
Additionally, is there anything in the use of functional programming in the .net framework that might make it slower compared to C#?
I am going to sit down and measure differences of course, as that really is the only real way to know for sure, however I just wondered whether anybody had any broad-stroke knowledge on this topic.
See Also
C# / F# Performance comparison
Is a program F# any more efficient (execution-wise) than C#?

There is nothing intrinsic that makes one language faster than the other. They both run on the CLR and hence have roughly the same performance characteristics of any language that runs on the CLR. There are features of the respective languages though that do affect performance.
Tail recursion is a great example. If you write an F# app which heavily uses tail recursion, the compiler will attempt to rewrite it as an iterative loop removing the recursion penalty. Additionally F# uses the .tail IL op code in order to ask the CLR to remove the recursion stack penalty where possible.
It's easy to demonstrate this by translating some F# continuation samples to C# and then insert huge data sets. F# will work and C# will eventually crash.
http://blogs.msdn.com/jaredpar/archive/2008/11/13/comparing-continuations-in-c-and-f-part-3-double-wrong.aspx
But that is not necessarily a deficiency in C#. C# is not meant to be written with continuations while that is certainly the case in F#. It's in fact not surprising that writing algorithms meant for F# in C# produce not so great results.
Conversely, C# iterators are usually more efficient than F# sequence expressions. The reason is that C# iterators are very efficient state machines vs. F#'s are continuation passing.
In general though, in the abscence of threads, if you use the languages as they are intended to be used they will have similar performance characteristics.

A good example is the sieve of eratosthenes.
A co-worker and I wrote similar sieves in C# and F#. The performance of the C# version was almost 10 times slower than it's functional counterpart written by my co-worker.
There were probably some inefficiencies in the C# version that could have been cleaned up, but the F# version is noticeably faster.
This sort of problem lends itself to being written in a functional language..
Hope this helps some.
Edit -
Here's one of the C# examples using similar functionality to F#'s List.Partition. I'll keep looking for the F# example. I have hundreds of projects that it could be in, it's just a matter of sorting through all of my stuff to find it (I save everything I ever experiment with, so this could be time consuming.. lol)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ListPartitionTest
{
public static class IEnumerableExtensions
{
public static KeyValuePair<IEnumerable<T>, IEnumerable<T>> Partition<T>(this IEnumerable<T> items, Func<T, bool> f)
{
return items.Aggregate(
new KeyValuePair<IEnumerable<T>, IEnumerable<T>>(Enumerable.Empty<T>(), Enumerable.Empty<T>()),
(acc, i) =>
{
if (f(i))
{
return new KeyValuePair<IEnumerable<T>, IEnumerable<T>>(acc.Key.Concat(new[] { i }), acc.Value);
}
else
{
return new KeyValuePair<IEnumerable<T>, IEnumerable<T>>(acc.Key, acc.Value.Concat(new[] { i }));
}
});
}
}
class PrimeNumbers
{
public int Floor { get; private set; }
public int Ceiling { get; private set; }
private IEnumerable<int> _sieve;
public PrimeNumbers(int floor, int ceiling)
{
Floor = floor;
Ceiling = ceiling;
}
public List<int> Go()
{
_sieve = Enumerable.Range(Floor, (Ceiling - Floor) + 1).ToList();
for (int i = (Floor < 2) ? 2 : Floor; i <= Math.Sqrt(Ceiling); i++)
{
_sieve = _sieve.Where(x => (x % i != 0 && x != i));
foreach (int x in _sieve)
{
Console.Write("{0}, ", x);
}
Console.WriteLine();
}
return _sieve.ToList();
}
}
class Program
{
static void Main(string[] args)
{
System.Diagnostics.Stopwatch s = new System.Diagnostics.Stopwatch();
int floor = 1;
int ceiling = 10;
s.Start();
PrimeNumbers p = new PrimeNumbers(floor, ceiling);
p.Go();
//foreach (int i in p.Go()) Console.Write("{0} ", i);
s.Stop();
Console.WriteLine("\n{0} to {1} completed in {2}", floor, ceiling, s.Elapsed.TotalMilliseconds);
Console.ReadLine();
}
}
}

Related

Print Numbers from 1 to 1000 using oops concept

How to print numbers from 1 to 1000 using oops concepts (i.e) without using loops, array and recursion
Since you mentioned the OOP concepts, one of which is encapsulation. You don't care about the implementation details when the method gets the job done.
Almost all linq extension methods use loops (Actually differed Iterators) in their implementation, it is easy not to realize that because it is a details encapsulated in the implementation.
To answer your question, the only way to do that without a loop is to write the WriteLine call 1000 times.
In OOP, you create a class to encapsulate the logic and then use it:
class Program
{
static void Main(string[] args)
{
(new RangePrinter).PrintRange(1, 1001);
}
}
No loops right ? Actually the loop is encapsulated in the implementation.
class RangePrinter
{
/* Injecting the Write target is skipped for simplicity */
public void PrintRange(int lowerBound, int upperBound)
{
for(int i = lowerBound; i < upperBound; i++)
{
Console.WriteLine(i);
}
}
}
The same applies when you use Enumerable (aka Linq extension methods):
Enumerable.Range(1, 1000).ToList().ForEach(x=> { Console.WriteLine(x); });
Here how the .NET Framework team implements the Range Internal Method:
static IEnumerable<int> RangeIterator(int start, int count) {
for (int i = 0; i < count; i++) yield return start + i;
}
Conclusion, when you need to do repetitive work then you need a loop. Any C# API that iterates over a collection (Linq's Where, Select, etc..) uses a loop. Loops are not bad unless the are not needed or they are nested when there is alternative approaches.
Just for fun, if this is meant to be a puzzle (forget the OOP requirement part), then you can do this:
string oneToThousand = #"1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7\r\n8\r\n9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n" +
"21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n" +
"41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56\r\n57\r\n58\r\n59\r\n60\r\n" +
"61\r\n62\r\n63\r\n64\r\n65\r\n66\r\n67\r\n68\r\n69\r\n70\r\n71\r\n72\r\n73\r\n74\r\n75\r\n76\r\n77\r\n78\r\n79\r\n80\r\n" +
"81\r\n82\r\n83\r\n84\r\n85\r\n86\r\n87\r\n88\r\n89\r\n90\r\n91\r\n92\r\n93\r\n94\r\n95\r\n96\r\n97\r\n98\r\n99\r\n100\r\n";
/* Continue to 1000 */
Console.WriteLine(oneToThousand);
A recursive function (DEF) is a function which either calls itself or is in a potential cycle of function calls. As the definition specifies, there are two types of recursive functions. Consider a function which calls itself: we call this type of recursion immediate recursion.
Example:
public static void PrintTo(int number)
{
if (number == 0)
return;
Console.WriteLine(number);
PrintTo(number - 1);
}
static void Main(string[] args)
{
PrintTo(1000);
}

Writing a C# version of Haskell infinite Fibonacci series function

Note: The point of this question is more from a curiosity perspective. I want to know out of curiosity whether it is even possible to transliterate the Haskell implementation into a functional C# equivalent.
So I've been learning myself Haskell for great good, and while solving Project Euler problems I ran into this beautiful Haskell Fibonacci implementation:
fibs :: [Integer]
fibs = 1:1:zipWith (+) fibs (tail fibs)
Of course I was tempted to write a C# version like this, so:
If I do this:
IEnumerable<int> fibs =
Enumerable.Zip(Enumerable.Concat(new int[] { 1, 1 }, fibs),
//^^error
fibs.Skip(1), (f, s) => f + s);
The error says use of unassigned local variable fibs.
So I went slightly imperative, while this compiles...
public static IEnumerable<int> Get()
{
return Enumerable.Zip(Enumerable.Concat(new int[] { 1, 1 }, Get()),
Get().Skip(1), (f, s) => f + s);
}
It breaks with a stack overflow exception! So I came here..
Questions:
Can anyone think of a functional C# equivalent that works?
I'd like some insight into why my solutions don't work.
The answer to your first question is: this is how to do it in C#:
using System;
using System.Collections.Generic;
using System.Linq;
class P
{
static IEnumerable<int> F()
{
yield return 1;
yield return 1;
foreach(int i in F().Zip(F().Skip(1), (a,b)=>a+b))
yield return i;
}
static void Main()
{
foreach(int i in F().Take(10))
Console.WriteLine(i);
}
}
The answer to your second question is: C# is eager by default, so your method has an unbounded recursion. Iterators that use yield however return an enumerator immediately, but do not construct each element until required; they are lazy. In Haskell everything is lazy automatically.
UPDATE: Commenter Yitz points out correctly that this is inefficient because, unlike Haskell, C# does not automatically memoize the results. It's not immediately clear to me how to fix it while keeping this bizarre recursive algorithm intact.
Of course you would never actually write fib like this in C# when it is so much easier to simply:
static IEnumerable<BigInteger> Fib()
{
BigInteger prev = 0;
BigInteger curr = 1;
while (true)
{
yield return curr;
var next = curr + prev;
prev = curr;
curr = next;
}
}
Unlike the C# version provided in Eric Lippert's answer, this F# version avoids repeated computation of elements and therefore has comparable efficiency with Haskell:
let rec fibs =
seq {
yield 1
yield 1
for (a, b) in Seq.zip fibs (Seq.skip 1 fibs) do
yield a + b
}
|> Seq.cache // this is critical for O(N)!
I have to warn you that I'm trying to fix your attempts, not to make a productive code.
Also, this solution is good to make our brains to explode, and maybe the computer also.
In your first snippet you tried to call recursive your field or local variable, that is not possible.Instead we can try with a lambda which could be more similar to that. We know from Church, that is also not possible, at least in the traditional way. Lambda expressions are unnamed; you can't call them by their name ( inside of the implementation ). But you can use the fixed point to do recursion. If you have a sane mind there is big chance of not knowing what is that, anyway you should give a try to this link before continuing with this.
fix :: (a -> a) -> a
fix f = f (fix f)
This will be the c# implementation (which is wrong)
A fix<A>(Func<A,A> f) {
return f(fix(f));
}
Why is wrong? Because fix(f) represents a beautiful stackoverflow. So we need to make it lazy:
A fix<A>(Func<Func<A>,A> f) {
return f(()=>fix(f));
}
Now is lazy! Actually you will see a lot of this in the following code.
In your second snippet and also in the first, you have the problem that the second argument to Enumerable.Concat is not lazy, and you will have stackoverflow exception, or infinite loop in the idealistic way. So let's make it lazy.
static IEnumerable<T> Concat<T>(IEnumerable<T> xs,Func<IEnumerable<T>> f) {
foreach (var x in xs)
yield return x;
foreach (var y in f())
yield return y;
}
Now, we have the whole "framework" to implement what you have tried in the functional way.
void play() {
Func<Func<Func<IEnumerable<int>>>, Func<IEnumerable<int>>> F = fibs => () =>
Concat(new int[] { 1, 1 },
()=> Enumerable.Zip (fibs()(), fibs()().Skip(1), (a,b)=> a + b));
//let's see some action
var n5 = fix(F)().Take(5).ToArray(); // instant
var n20 = fix(F)().Take(20).ToArray(); // relative fast
var n30 = fix(F)().Take(30).ToArray(); //this will take a lot of time to compute
//var n40 = fix(F)().Take(40).ToArray(); //!!! OutOfMemoryException
}
I know that the F signature is ugly like hell, but this is why languages like haskell exists, and even F#. C# is not made for this work to be done like this.
Now, the question is, why haskell can achieve something like this?Why? because whenever you say something like
a:: Int
a = 4
The most similar translation in C# is :
Func<Int> a = () => 4
Actually is much more involved in the haskell implementation, but this is the idea why similar method of solving problems looks so different if you want to write it in both languages
Here it is for Java, dependent on Functional Java:
final Stream<Integer> fibs = new F2<Integer, Integer, Stream<Integer>>() {
public Stream<Integer> f(final Integer a, final Integer b) {
return cons(a, curry().f(b).lazy().f(a + b));
}
}.f(1, 2);
For C#, you could depend on XSharpX
A take on Eric's answer that has Haskell equivalent performance, but still has other issues(thread safety, no way to free memory).
private static List<int> fibs = new List<int>(){1,1};
static IEnumerable<int> F()
{
foreach (var fib in fibs)
{
yield return fib;
}
int a, b;
while (true)
{
a = fibs.Last();
b = fibs[fibs.Count() - 2];
fibs.Add(a+b);
yield return a + b;
}
}
Translating from a Haskell environment to a .NET environment is much easier if you use F#, Microsoft's functional declarative language similar to Haskell.

C# Linq vs. Currying

I am playing a little bit with functional programming and the various concepts of it. All this stuff is very interesting. Several times I have read about Currying and what an advantage it has.
But I do not get the point with this. The following source demonstrates the using of the curry concept and the solution with linq. Actually, I do not see any advatages of using the currying concept.
So, what is the advantage of using currying?
static bool IsPrime(int value)
{
int max = (value / 2) + 1;
for (int i = 2; i < max; i++)
{
if ((value % i) == 0)
{
return false;
}
}
return true;
}
static readonly Func<IEnumerable<int>, IEnumerable<int>> GetPrimes =
HigherOrder.GetFilter<int>().Curry()(IsPrime);
static void Main(string[] args)
{
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
Console.Write("Primes:");
//Curry
foreach (int n in GetPrimes(numbers))
{
Console.Write(" {0}", n);
}
Console.WriteLine();
//Linq
foreach (int n in numbers.Where(p => IsPrime(p)))
{
Console.Write(" {0}", n);
}
Console.ReadLine();
}
Here is the HigherOrder Filter Method:
public static Func<Func<TSource, bool>, IEnumerable<TSource>, IEnumerable<TSource>> GetFilter<TSource>()
{
return Filter<TSource>;
}
what is the advantage of using currying?
First off, let's clarify some terms. People use "currying" to mean both:
reformulating a method of two parameters into a methods of one parameter that returns a method of one parameter and
partial application of a method of two parameters to produce a method of one parameter.
Clearly these two tasks are closely related, and hence the confusion. When speaking formally, one ought to restrict "currying" to refer to the first definition, but when speaking informally either usage is common.
So, if you have a method:
static int Add(int x, int y) { return x + y; }
you can call it like this:
int result = Add(2, 3); // 5
You can curry the Add method:
static Func<int, int> MakeAdder(int x) { return y => Add(x, y); }
and now:
Func<int, int> addTwo = MakeAdder(2);
int result = addTwo(3); // 5
Partial application is sometimes also called "currying" when speaking informally because it is obviously related:
Func<int, int> addTwo = y=>Add(2,y);
int result = addTwo(3);
You can make a machine that does this process for you:
static Func<B, R> PartiallyApply<A, B, R>(Func<A, B, R> f, A a)
{
return (B b)=>f(a, b);
}
...
Func<int, int> addTwo = PartiallyApply<int, int, int>(Add, 2);
int result = addTwo(3); // 5
So now we come to your question:
what is the advantage of using currying?
The advantage of either technique is that it gives you more flexibility in dealing with methods.
For example, suppose you are writing an implementation of a path finding algorithm. You might already have a helper method that gives you an approximate distance between two points:
static double ApproximateDistance(Point p1, Point p2) { ... }
But when you are actually building the algorithm, what you often want to know is what is the distance between the current location and a fixed end point. What the algorithm needs is Func<Point, double> -- what is the distance from the location to the fixed end point? What you have is Func<Point, Point, double>. How are you going to turn what you've got into what you need? With partial application; you partially apply the fixed end point as the first argument to the approximate distance method, and you get out a function that matches what your path finding algorithm needs to consume:
Func<Point, double> distanceFinder = PartiallyApply<Point, Point, double>(ApproximateDistance, givenEndPoint);
If the ApproximateDistance method had been curried in the first place:
static Func<Point, double> MakeApproximateDistanceFinder(Point p1) { ... }
Then you would not need to do the partial application yourself; you'd just call MakeApproximateDistanceFinder with the fixed end point and you'd be done.
Func<Point, double> distanceFinder = MakeApproximateDistanceFinder(givenEndPoint);
The comment by #Eric Lippert on What is the advantage of Currying in C#? (achieving partial function) points to this blog post:
Currying and Partial Function Application
Where I found this the best explantion that works for me:
From a theoretical standpoint, it is interesting because it (currying) simplifies
the lambda calculus to include only those functions which have at most
one argument. From a practical perspective, it allows a programmer to
generate families of functions from a base function by fixing the
first k arguments. It is akin to pinning up something on the wall
that requires two pins. Before being pinned, the object is free to
move anywhere on the surface; however, when when first pin is put in
then the movement is constrained. Finally, when the second pin is put
in then there is no longer any freedom of movement. Similarly, when a
programmer curries a function of two arguments and applies it to the
first argument then the functionality is limited by one dimension.
Finally, when he applies the new function to the second argument then
a particular value is computed.
Taking this further I see that functional programming essentially introduces 'data flow programming as opposed to control flow' this is akin to using say SQL instead of C#. With this definition I see why LINQ is and why it has many many applications outside of pure Linq2Objects - such as events in Rx.
The advantage of using currying is largely to be found in functional languages, which are built to benefit from currying, and have a convenient syntax for the concept. C# is not such a language, and implementations of currying in C# are usually difficult to follow, as is the expression HigherOrder.GetFilter<int>().Curry()(IsPrime).

How to inject/generate plumbing code into methods decorated with an Attribute?

I was reading through some articles on Caching and Memoization and how to implement it easily using delegates and generics. The syntax was pretty straightforward, and it is surprisingly easy to implement, but I just feel due to the repetitive nature it should be possible to generate code based on an Attribute, instead of having to write the same plumbing code over and over.
Let's say we start off with the default example:
class Foo
{
public int Fibonacci(int n)
{
return n > 1 ? Fibonacci(n-1) + Fibonacci(n-2) : n;
}
}
And then to memoize this:
// Let's say we have a utility class somewhere with the following extension method:
// public static Func<TResult> Memoize<TResult>(this Func<TResult> f)
class Foo
{
public Func<int,int> Fibonacci = fib;
public Foo()
{
Fibonacci = Fibonacci.Memoize();
}
public int fib(int n)
{
return n > 1 ? Fibonacci(n-1) + Fibonacci(n-2) : n;
}
}
I thought, wouldn't it be simpler to just make a code generator that spits out this code, once it finds a tagged method that matches one of the Memoize extension methods. So in stead of writing this plumbing code, I could just add an attribute:
class Foo
{
[Memoize]
public int Fibonacci(int n)
{
return n > 1 ? Fibonacci(n-1) + Fibonacci(n-2) : n;
}
}
Honestly, I know this is looking more like compiler sugar that should be converted by a preprocessor than actual code generation but my question is:
What do you think is the best way to find the methods in a c# source file that have a given attribute, parsing out the parametertypes and returntype, and generating a delegate that matches this fingerprint
What would be the best way to integrate this into the build process, without actually overwriting my code. Is it possible to do some preprocessing on the source files before passing it on to the compiler?
Thanks for any and all ideas.
Update:
I have looked into the Postsharp library as Shay had suggested, and it seemed very suited for the job on non time-critical applications like Transaction Management, Tracing or Security.
However when using it in a time-critical context it proved a LOT slower than the delegate. One million iterations of the Fibonacci example with each implementation resulted in a 80x slower runtime. (0.012ms postsharp vs 0.00015ms delegate per call)
But honestly the result is completely acceptable in the context in which I intend to use it. Thanks for the responses!
Update2:
Apparently the author of Postsharp is working hard on a release 2.0 which will include, among other things, performance improvements in produced code, and compile time.
I bump into this memoizer attribute using postsharp
I've used the following Memoize function in a project of mine:
public class Foo
{
public int Fibonacci(int n)
{
return n > 1 ? Fibonacci(n - 1) + Fibonacci(n - 2) : n;
}
}
class Program
{
public static Func<Т, TResult> Memoize<Т, TResult>(Func<Т, TResult> f) where Т : IEquatable<Т>
{
Dictionary<Т, TResult> map = new Dictionary<Т, TResult>();
return a =>
{
TResult local;
if (!TryGetValue<Т, TResult>(map, a, out local))
{
local = f(a);
map.Add(a, local);
}
return local;
};
}
private static bool TryGetValue<Т, TResult>(Dictionary<Т, TResult> map, Т key, out TResult value) where Т : IEquatable<Т>
{
EqualityComparer<Т> comparer = EqualityComparer<Т>.Default;
foreach (KeyValuePair<Т, TResult> pair in map)
{
if (comparer.Equals(pair.Key, key))
{
value = pair.Value;
return true;
}
}
value = default(TResult);
return false;
}
static void Main(string[] args)
{
var foo = new Foo();
// Transform the original function and render it with memory
var memoizedFibonacci = Memoize<int, int>(foo.Fibonacci);
// memoizedFibonacci is a transformation of the original function that can be used from now on:
// Note that only the first call will hit the original function
Console.WriteLine(memoizedFibonacci(3));
Console.WriteLine(memoizedFibonacci(3));
Console.WriteLine(memoizedFibonacci(3));
Console.WriteLine(memoizedFibonacci(3));
}
}
In my project I needed only functions with a single argument that implement IEquatable<Т> but this can be generalized even further.
Another important remark is that this code is not thread safe. If you need thread safety you will need to synchronize the read/write access to the internal map hashtable.
To specifically address your points:
This would be likely too hard to do
in the way that you describe, as
you'd need a full-blown C# grammar
parser. What might be a more viable
alternative is writing a managed app
which could load the compiled
assembly and extract type
information using Reflection. This
would involve getting all Type
objects in a given assembly, looking
for methods on the types, retrieving
the custom attribute, and then
emitting the memoization code (this
part might be a bit difficult).
If you go the route I mention in #1, you could simply
add a post-build step to run your tool. Visual Studio
(which uses MSBuild underneath) makes this relatively easy.
If you write a plugin to PostSharp instead of using its LAOS library, you would get no performance hit.

Is yield useful outside of LINQ?

When ever I think I can use the yield keyword, I take a step back and look at how it will impact my project. I always end up returning a collection instead of yeilding because I feel the overhead of maintaining the state of the yeilding method doesn't buy me much. In almost all cases where I am returning a collection I feel that 90% of the time, the calling method will be iterating over all elements in the collection, or will be seeking a series of elements throughout the entire collection.
I do understand its usefulness in linq, but I feel that only the linq team is writing such complex queriable objects that yield is useful.
Has anyone written anything like or not like linq where yield was useful?
Note that with yield, you are iterating over the collection once, but when you build a list, you'll be iterating over it twice.
Take, for example, a filter iterator:
IEnumerator<T> Filter(this IEnumerator<T> coll, Func<T, bool> func)
{
foreach(T t in coll)
if (func(t)) yield return t;
}
Now, you can chain this:
MyColl.Filter(x=> x.id > 100).Filter(x => x.val < 200).Filter (etc)
You method would be creating (and tossing) three lists. My method iterates over it just once.
Also, when you return a collection, you are forcing a particular implementation on you users. An iterator is more generic.
I do understand its usefulness in linq, but I feel that only the linq team is writing such complex queriable objects that yield is useful.
Yield was useful as soon as it got implemented in .NET 2.0, which was long before anyone ever thought of LINQ.
Why would I write this function:
IList<string> LoadStuff() {
var ret = new List<string>();
foreach(var x in SomeExternalResource)
ret.Add(x);
return ret;
}
When I can use yield, and save the effort and complexity of creating a temporary list for no good reason:
IEnumerable<string> LoadStuff() {
foreach(var x in SomeExternalResource)
yield return x;
}
It can also have huge performance advantages. If your code only happens to use the first 5 elements of the collection, then using yield will often avoid the effort of loading anything past that point. If you build a collection then return it, you waste a ton of time and space loading things you'll never need.
I could go on and on....
I recently had to make a representation of mathematical expressions in the form of an Expression class. When evaluating the expression I have to traverse the tree structure with a post-order treewalk. To achieve this I implemented IEnumerable<T> like this:
public IEnumerator<Expression<T>> GetEnumerator()
{
if (IsLeaf)
{
yield return this;
}
else
{
foreach (Expression<T> expr in LeftExpression)
{
yield return expr;
}
foreach (Expression<T> expr in RightExpression)
{
yield return expr;
}
yield return this;
}
}
Then I can simply use a foreach to traverse the expression. You can also add a Property to change the traversal algorithm as needed.
At a previous company, I found myself writing loops like this:
for (DateTime date = schedule.StartDate; date <= schedule.EndDate;
date = date.AddDays(1))
With a very simple iterator block, I was able to change this to:
foreach (DateTime date in schedule.DateRange)
It made the code a lot easier to read, IMO.
yield was developed for C#2 (before Linq in C#3).
We used it heavily in a large enterprise C#2 web application when dealing with data access and heavily repeated calculations.
Collections are great any time you have a few elements that you're going to hit multiple times.
However in lots of data access scenarios you have large numbers of elements that you don't necessarily need to pass round in a great big collection.
This is essentially what the SqlDataReader does - it's a forward only custom enumerator.
What yield lets you do is quickly and with minimal code write your own custom enumerators.
Everything yield does could be done in C#1 - it just took reams of code to do it.
Linq really maximises the value of the yield behaviour, but it certainly isn't the only application.
Whenever your function returns IEnumerable you should use "yielding". Not in .Net > 3.0 only.
.Net 2.0 example:
public static class FuncUtils
{
public delegate T Func<T>();
public delegate T Func<A0, T>(A0 arg0);
public delegate T Func<A0, A1, T>(A0 arg0, A1 arg1);
...
public static IEnumerable<T> Filter<T>(IEnumerable<T> e, Func<T, bool> filterFunc)
{
foreach (T el in e)
if (filterFunc(el))
yield return el;
}
public static IEnumerable<R> Map<T, R>(IEnumerable<T> e, Func<T, R> mapFunc)
{
foreach (T el in e)
yield return mapFunc(el);
}
...
I'm not sure about C#'s implementation of yield(), but on dynamic languages, it's far more efficient than creating the whole collection. on many cases, it makes it easy to work with datasets much bigger than RAM.
I am a huge Yield fan in C#. This is especially true in large homegrown frameworks where often methods or properties return List that is a sub-set of another IEnumerable. The benefits that I see are:
the return value of a method that uses yield is immutable
you are only iterating over the list once
it a late or lazy execution variable, meaning the code to return the values are not executed until needed (though this can bite you if you dont know what your doing)
of the source list changes, you dont have to call to get another IEnumerable, you just iterate over IEnumeable again
many more
One other HUGE benefit of yield is when your method potentially will return millions of values. So many that there is the potential of running out of memory just building the List before the method can even return it. With yield, the method can just create and return millions of values, and as long the caller also doesnt store every value. So its good for large scale data processing / aggregating operations
Personnally, I haven't found I'm using yield in my normal day-to-day programming. However, I've recently started playing with the Robotics Studio samples and found that yield is used extensively there, so I also see it being used in conjunction with the CCR (Concurrency and Coordination Runtime) where you have async and concurrency issues.
Anyway, still trying to get my head around it as well.
Yield is useful because it saves you space. Most optimizations in programming makes a trade off between space (disk, memory, networking) and processing. Yield as a programming construct allows you to iterate over a collection many times in sequence without needing a separate copy of the collection for each iteration.
consider this example:
static IEnumerable<Person> GetAllPeople()
{
return new List<Person>()
{
new Person() { Name = "George", Surname = "Bush", City = "Washington" },
new Person() { Name = "Abraham", Surname = "Lincoln", City = "Washington" },
new Person() { Name = "Joe", Surname = "Average", City = "New York" }
};
}
static IEnumerable<Person> GetPeopleFrom(this IEnumerable<Person> people, string where)
{
foreach (var person in people)
{
if (person.City == where) yield return person;
}
yield break;
}
static IEnumerable<Person> GetPeopleWithInitial(this IEnumerable<Person> people, string initial)
{
foreach (var person in people)
{
if (person.Name.StartsWith(initial)) yield return person;
}
yield break;
}
static void Main(string[] args)
{
var people = GetAllPeople();
foreach (var p in people.GetPeopleFrom("Washington"))
{
// do something with washingtonites
}
foreach (var p in people.GetPeopleWithInitial("G"))
{
// do something with people with initial G
}
foreach (var p in people.GetPeopleWithInitial("P").GetPeopleFrom("New York"))
{
// etc
}
}
(Obviously you are not required to use yield with extension methods, it just creates a powerful paradigm to think about data.)
As you can see, if you have a lot of these "filter" methods (but it can be any kind of method that does some work on a list of people) you can chain many of them together without requiring extra storage space for each step. This is one way of raising the programming language (C#) up to express your solutions better.
The first side-effect of yield is that it delays execution of the filtering logic until you actually require it. If you therefore create a variable of type IEnumerable<> (with yields) but never iterate through it, you never execute the logic or consume the space which is a powerful and free optimization.
The other side-effect is that yield operates on the lowest common collection interface (IEnumerable<>) which enables the creation of library-like code with wide applicability.
Note that yield allows you to do things in a "lazy" way. By lazy, I mean that the evaluation of the next element in the IEnumberable is not done until the element is actually requested. This allows you the power to do a couple of different things. One is that you could yield an infinitely long list without the need to actually make infinite calculations. Second, you could return an enumeration of function applications. The functions would only be applied when you iterate through the list.
I've used yeild in non-linq code things like this (assuming functions do not live in same class):
public IEnumerable<string> GetData()
{
foreach(String name in _someInternalDataCollection)
{
yield return name;
}
}
...
public void DoSomething()
{
foreach(String value in GetData())
{
//... Do something with value that doesn't modify _someInternalDataCollection
}
}
You have to be careful not to inadvertently modify the collection that your GetData() function is iterating over though, or it will throw an exception.
Yield is very useful in general. It's in ruby among other languages that support functional style programming, so its like it's tied to linq. It's more the other way around, that linq is functional in style, so it uses yield.
I had a problem where my program was using a lot of cpu in some background tasks. What I really wanted was to still be able to write functions like normal, so that I could easily read them (i.e. the whole threading vs. event based argument). And still be able to break the functions up if they took too much cpu. Yield is perfect for this. I wrote a blog post about this and the source is available for all to grok :)
The System.Linq IEnumerable extensions are great, but sometime you want more. For example, consider the following extension:
public static class CollectionSampling
{
public static IEnumerable<T> Sample<T>(this IEnumerable<T> coll, int max)
{
var rand = new Random();
using (var enumerator = coll.GetEnumerator());
{
while (enumerator.MoveNext())
{
yield return enumerator.Current;
int currentSample = rand.Next(max);
for (int i = 1; i <= currentSample; i++)
enumerator.MoveNext();
}
}
}
}
Another interesting advantage of yielding is that the caller cannot cast the return value to the original collection type and modify your internal collection

Categories

Resources