Is it possible to implement McCarthy's amb-operator for non-deterministic choice in C#?
Apparently .NET lacks continuation support but yield return could be useful. Would this be possible in other static .NET-languages like F#?
Yes, yield return does a form of continuation. Although for many useful cases, Linq provides functional operators that allow you to plug together a lazy sequence generator, so in fact in C# 3 it isn't necessary to use yield return so much (except when adding more Linq-style extensions of your own to plug gaps in the library, e.g. Zip, Unfold).
In the example we factorise an integer by brute force. Essentially the same example in C# can be done with the built-in Linq operators:
var factors = Enumerable.Range(2, 100)
.Join(Enumerable.Range(2, 100),
n => 1, n => 1, (i, j) => new { i, j })
.First(v => v.i*v.j == 481);
Console.WriteLine("Factors are " + factors.i + ", " + factors.j);
Here the starting points are my two calls to Enumerable.Range, which is built-in to Linq but you could implement yourself as:
IEnumerable<int> Range(int start, int count)
{
    // Like Enumerable.Range, the second argument is a count, not an end value.
    for (int n = start; n < start + count; n++)
        yield return n;
}
There are two odd parameters, the n => 1, n => 1 parameters to Join. I'm picking 1 as the key value for Join to use when matching up items, hence all combinations will match and so I get to test every combination of numbers from the ranges.
Then I turn the pair of values into a kind of tuple (an anonymous type) with:
(i, j) => new { i, j })
Finally, I pick the first such tuple for which my test is satisfied:
.First(v => v.i*v.j == 481);
Update
The code inside the call to First need not be merely a short test expression. It can be a whole lot of imperative code which needs to be "restarted" if the test fails:
.First(v =>
{
    Console.WriteLine("Aren't lambdas powerful things?");
    return v.i*v.j == 481;
});
So the part of the program that potentially needs to be restarted with different values goes in that lambda. Whenever that lambda wants to restart itself with different values, it just returns false - the equivalent of calling amb with no arguments.
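As a rough illustration of the same idea in query syntax, here is a minimal sketch. The class and method names (AmbSketch, FirstFactorPair) are made up for this example; each from clause plays the role of one amb choice point, and the where clause plays the role of a failing branch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AmbSketch
{
    // Hypothetical helper: tries every pair drawn from the candidate
    // sequence and returns the first one whose product equals target.
    public static (int I, int J) FirstFactorPair(int target)
    {
        var candidates = Enumerable.Range(2, 100); // the values 2..101

        // Each "from" clause is one choice point; "where" rejects a
        // combination so the query moves on to the next one.
        var query =
            from i in candidates
            from j in candidates
            where i * j == target
            select (I: i, J: j);

        // First() stops at the first match and throws
        // InvalidOperationException if no combination satisfies the test.
        return query.First();
    }
}
```

Calling AmbSketch.FirstFactorPair(481) returns the pair (13, 37). The nested from clauses compile down to SelectMany, so the pairs are generated lazily and the search stops as soon as First finds a match.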
This is not an answer to your question, but it may get you what you want.
amb is used for nondeterministic computing. As you may know, Prolog is a nondeterministic language using the notion of unification to bind values to variables (basically what amb ends up doing).
There IS an implementation of this functionality in C#, called YieldProlog. As you guessed, the yield operator is an important requisite for this.
http://yieldprolog.sourceforge.net/
Assume
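List<int> diff(List<int> a, List<int> b)
{
    // assume same length lists
    List<int> result = new List<int>(a.Count);
    // The constructor argument only reserves capacity; Count starts at 0,
    // so loop over a.Count and append with Add.
    for (int i = 0; i < a.Count; ++i)
    {
        result.Add(a[i] - b[i]);
    }
    return result;
}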
I would like to have some kind of one-liner do the same, or something that uses a lambda, rather than re-writing all the boilerplate.
For instance, in Python, this would be either
[ai-bi for ai,bi in zip(a,b)]
or even
np.array(a) - np.array(b)
Is there a nice way to write this in C#? All my searches find ways to remove or add list elements, but nothing about element-wise actions.
Linq has a Zip method as well:
var diff = a.Zip(b, (ai, bi) => ai - bi);
Note that one potential bug in your code is if b has fewer elements than a then you'd get an exception when you try to access an element outside the range of b. Zip will only return items as long as both collections have items, which is effectively the shorter of the two collection lengths.
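To make the truncation behavior concrete, here is a small sketch. The wrapper name Difference is only for illustration; the only library call involved is Enumerable.Zip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipSketch
{
    // Element-wise difference; Zip stops as soon as either input runs out.
    public static List<int> Difference(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.Zip(b, (ai, bi) => ai - bi).ToList();
    }
}
```

With a = { 5, 7, 9 } and b = { 1, 2 } this returns a list with the two elements 4 and 5; the unmatched 9 is silently dropped rather than causing an exception.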
I hit this dead end while trying to replace exceptions with the Either monad in C#.
This leads me to think that it may not be a language-specific problem, but rather a feature that is missing from the technique itself.
Let me try to re-explain it more globally:
Given:
I have a third-party function (one that is imported into my code and that I cannot modify) which receives a lazy list (C# IEnumerable, F# Seq, ...) and consumes it.
I want:
To apply a function (LINQ Select, map, ...) to the method's lazy list argument, so that each element of the list is taken lazily and run through a computation that might fail (by throwing an exception or by returning Error/Either).
The list to be consumed only "inside" the third-party function; I don't want to iterate over each element more than once.
With exceptions/side effects this is easy to achieve: throw an exception from the Select/map function when an error is found, which stops the execution "inside" the third-party function. I can then handle the exception outside of it (without the third party being "aware" of my error handling), which leaves the responsibility for error handling with me.
With Either, it does not seem possible to get the same behavior without altering the third-party function. Intuitively, I tried to convert the list of Eithers into an Either of list, but this can only be done by consuming the list with functions like aggregate or reduce (does Haskell's sequence function act the same way?).
All this leads me to ask: are Maybe/Either, or Error as a return type, missing this behavior? Is there another way to achieve the same thing with them?
As far as I can tell, Haskell Either is isomorphic to C#/Java-style exceptions, meaning that there's a translation from Either-based code to exception-based code, and vice versa. I'm not quite sure about this, though, as there may be some edge cases that I'm not aware of.
On the other hand, what I am sure of is that Either () a is isomorphic to Maybe a, so in the following, I'm going to stick with Either and ignore Maybe.
What you can do with exceptions in C#, you can also do with Either. The default in C# is to do no error handling (see note 1):
public IEnumerable<TResult> NoCatch<TResult, T>(
IEnumerable<T> source, Func<T, TResult> selector)
{
return source.Select(selector);
}
Because Select is lazy, the work happens when the result is enumerated: it iterates over source until an exception happens. If no exception is thrown, the caller gets every TResult; if selector throws, the exception propagates to whoever is enumerating. However, if elements of source were handled before the exception was thrown, and there were side effects, that work remains done.
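The following sketch tries to show that behavior with a stand-in for the third-party consumer. ThirdPartySum and Run are hypothetical names invented for this example; the point is only that the exception escapes from inside the consumer and that earlier side effects stay done.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NoCatchSketch
{
    // Stand-in for the third-party method: it only knows how to enumerate.
    public static int ThirdPartySum(IEnumerable<int> items)
    {
        var total = 0;
        foreach (var item in items) total += item;
        return total;
    }

    // Returns the elements that were processed before the selector threw.
    public static List<int> Run(IEnumerable<int> source)
    {
        var processed = new List<int>();
        var mapped = source.Select(x =>
        {
            if (x >= 10) throw new ArgumentOutOfRangeException(nameof(x));
            processed.Add(x); // side effect that remains done
            return x;
        });

        // The consumer knows nothing about the error handling around it.
        try { ThirdPartySum(mapped); }
        catch (ArgumentOutOfRangeException) { /* handled by the caller */ }

        return processed;
    }
}
```

For the input { 1, 3, 5, 11, 2, 12 }, Run returns the three elements 1, 3 and 5: enumeration stops inside ThirdPartySum at 11, and nothing after it is touched.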
You can do the same in Haskell using sequence:
noCatch :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
noCatch f = sequence . fmap f
If f is a function that returns Either, then it behaves in the same way:
*Answer> noCatch (\i -> if i < 10 then Right i else Left i) [1, 3, 5, 2]
Right [1,3,5,2]
*Answer> noCatch (\i -> if i < 10 then Right i else Left i) [1, 3, 5, 11, 2, 12]
Left 11
As you can see, if no Left value is ever returned, you get a Right case back, with all the mapped elements. If just one Left case is returned, you get that, and no further processing is done.
You could also imagine that you have a C# method that suppresses individual exceptions:
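public IEnumerable<TResult> Suppress<TResult, T>(
    IEnumerable<T> source, Func<T, TResult> selector)
{
    foreach (var x in source)
    {
        TResult result;
        // yield return is not allowed inside a try block that has a catch
        // clause, so compute the value first and yield it afterwards.
        try { result = selector(x); } catch { continue; }
        yield return result;
    }
}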
In Haskell, you could do this with Either:
filterRight :: (a -> Either e b) -> [a] -> [b]
filterRight f = rights . fmap f
This returns all the Right values, and ignores the Left values:
*Answer> filterRight (\i -> if i < 10 then Right i else Left i) [1, 3, 5, 11, 2, 12]
[1,3,5,2]
You can also write a method that processes the input until the first exception is thrown (if any):
public IEnumerable<TResult> ProcessUntilException<TResult, T>(
    IEnumerable<T> source, Func<T, TResult> selector)
{
    foreach (var x in source)
    {
        TResult result;
        // Stop producing values at the first exception.
        try { result = selector(x); } catch { yield break; }
        yield return result;
    }
}
Again, you can achieve the same effect with Haskell:
takeWhileRight :: (a -> Either e b) -> [a] -> [Either e b]
takeWhileRight f = takeWhile isRight . fmap f
Examples:
*Answer> takeWhileRight (\i -> if i < 10 then Right i else Left i) [1, 3, 5, 11, 2, 12]
[Right 1,Right 3,Right 5]
*Answer> takeWhileRight (\i -> if i < 10 then Right i else Left i) [1, 3, 5, 2]
[Right 1,Right 3,Right 5,Right 2]
As you can see, however, both the C# examples and the Haskell examples need to be aware of the style of error-handling. While you can translate between the two styles, you can't use one with a method/function that expects the other.
If you have a third-party C# method that expects exception handling to be the way things are done, you can't pass it a sequence of Either values and hope that it can deal with it. You'd have to modify the method.
The converse isn't quite true, though, because exception-handling is built into C# (and Haskell as well, in fact); you can't really opt out of exception-handling in such languages. Imagine, however, a language that doesn't have built-in exception-handling (PureScript, perhaps?), and this would be true as well.
1 C# code may not compile.
I haven't got a compiler handy, but you may want to check my language-ext project. It's a functional base class library for C#.
For your needs it has:
Seq<A> which is a cons like lazy enumerable which will only evaluate once
Try<A> which is a delegate based monad which allows you to capture exceptions from third party code
Other common error handling monads: Option<A>, Either<L, R>, etc.
Bonus variants of those monads: OptionAsync<A>, TryOption<A>, TryAsync<A>, TryOptionAsync<A>
Ability to easily convert between those types: ToOption(), ToEither(), etc.
To apply a function (LINQ Select, map, ...) to the method's lazy list argument, so that each element of the list is taken lazily and run through a computation that might fail (by throwing an exception or by returning Error/Either).
The list to be consumed only "inside" the third-party function; I don't want to iterate over each element more than once.
The actual goal here is a little unclear. In language-ext you could do this:
using LanguageExt;
using static LanguageExt.Prelude;
// Dummy lazy enumerable
IEnumerable<int> Values()
{
for(int i = 0; i < 100; i++)
{
yield return UnreliableExternalFunc(i);
}
}
// Convert to a lazy sequence
Seq<int> seq = Seq(Values());
// Invoke external function that takes an IEnumerable
ExternalFunction(seq);
// Calling it again won't evaluate it twice
ExternalFunction(seq);
But if the Values() function threw an exception then it would end its yielding and would return. So you'd ideally have this:
// Dummy lazy enumerable
IEnumerable<Try<int>> Values()
{
for(int i = 0; i < 100; i++)
{
yield return Try(() => UnreliableExternalFunc(i));
}
}
Try is the constructor function for the Try monad, so your result would be a sequence of Try thunks. If you don't care about the exception, you could convert it to an Option:
// Dummy lazy enumerable
IEnumerable<Option<int>> Values()
{
for(int i = 0; i < 100; i++)
{
yield return Try(() => UnreliableExternalFunc(i)).ToOption();
}
}
Then you could access all successes via:
var validValues = Values().Somes();
Or you could instead use Either:
// Dummy lazy enumerable
IEnumerable<Either<Exception, int>> Values()
{
for(int i = 0; i < 100; i++)
{
yield return Try(() => UnreliableExternalFunc(i)).ToEither();
}
}
Then you can get the valid results thus:
var seq = Seq(Values());
var validValues = seq.Rights();
And the errors:
var errors = seq.Lefts();
I converted it to a Seq so it doesn't evaluate twice.
One way or another, if you want to catch an exception that happens during the lazy evaluation of the enumerable, then you will need to wrap each value. If the exception can occur from the usage of the lazy value, but within a function then your only hope is to surround it with a Try:
// Convert to a lazy sequence
Seq<int> seq = Seq(Values()); // Values is back to returning IEnumerable<int>
// Invoke external function that takes an IEnumerable
var res = Try(() => ExternalFunction(seq)).IfFail(Seq<int>.Empty);
// Calling it again won't evaluate it twice
ExternalFunction(seq);
Intuitively, I tried to convert the list of Eithers into an Either of list, but this can only be done by consuming the list with functions like aggregate or reduce (does Haskell's sequence function act the same way?).
You can do this in language-ext like so:
IEnumerable<Either<L, R>> listOfEithers = ...;
Either<L, IEnumerable<R>> eitherList = listOfEithers.Sequence();
Traverse is also supported:
Either<L, IEnumerable<R>> eitherList = listOfEithers.Traverse(x => map(x));
All combinations of monads support Sequence() and Traverse(), so you could do it with a Seq<Either<L, R>> to get an Either<L, Seq<R>>, which would guarantee that the lazy sequence isn't invoked multiple times. Or a Seq<Try<A>> to get a Try<Seq<A>>, or any of the async variants for concurrent sequencing and traversal.
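To see why sequencing necessarily consumes the list, here is a library-free sketch of the idea. It is not language-ext's implementation: a tuple with a nullable Error field stands in for Either<string, int>, and the name Sequence is reused purely for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceSketch
{
    // Hypothetical stand-in for Either<string, int>: Error == null means
    // "Right", anything else means "Left".
    public static (string Error, List<int> Values) Sequence(
        IEnumerable<(string Error, int Value)> items)
    {
        var values = new List<int>();
        foreach (var item in items) // the input is enumerated right here
        {
            if (item.Error != null) return (item.Error, null); // first Left wins
            values.Add(item.Value);
        }
        return (null, values);
    }
}
```

The result cannot be known to be a success until every element has been inspected, so the input is walked before anything is returned. That is why this conversion cannot be pushed lazily into a consumer that expects plain values.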
I'm not sure whether any of this covers what you're asking, as the question is a bit broad. A more specific example would be useful.
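Let's assume you have a function that returns a lazily-enumerated object:
struct AnimalCount
{
    public int Chickens;
    public int Goats;
    public AnimalCount(int chickens, int goats) { Chickens = chickens; Goats = goats; }
}
IEnumerable<AnimalCount> FarmsInEachPen()
{
    ....
    yield return new AnimalCount(x, y);
    ....
}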
You also have two functions that consume two separate IEnumerables, for example:
ConsumeChicken(IEnumerable<int>);
ConsumeGoat(IEnumerable<int>);
How can you call ConsumeChicken and ConsumeGoat without a) converting FarmsInEachPen() with ToList() beforehand (it might have two zillion records) and b) multi-threading?
Basically:
ConsumeChicken(FarmsInEachPen().Select(x => x.Chickens));
ConsumeGoat(FarmsInEachPen().Select(x => x.Goats));
But without forcing the double enumeration.
I can solve it with multithread, but it gets unnecessarily complicated with a buffer queue for each list.
So I'm looking for a way to split the AnimalCount enumerator into two int enumerators without fully evaluating AnimalCount. There is no problem running ConsumeGoat and ConsumeChicken together in lock-step.
I can feel the solution just out of my grasp but I'm not quite there. I'm thinking along the lines of a helper function that returns an IEnumerable being fed into ConsumeChicken and each time the iterator is used, it internally calls ConsumeGoat, thus executing the two functions in lock-step. Except, of course, I don't want to call ConsumeGoat more than once..
I don't think there is a way to do what you want, since ConsumeChicken(IEnumerable<int>) and ConsumeGoat(IEnumerable<int>) are called sequentially, each of them enumerating a list separately. How do you expect that to work without two separate enumerations of the list?
Depending on the situation, a better solution is to have ConsumeChicken(int) and ConsumeGoat(int) methods (which each consume a single item), and call them in alternation. Like this:
foreach(var animal in animals)
{
ConsumeChicken(animal.Chickens);
ConsumeGoat(animal.Goats);
}
This will enumerate the animals collection only once.
Also, a note: depending on your LINQ-provider and what exactly it is you're trying to do, there may be better options. For example, if you're trying to get the total sum of both chickens and goats from a database using linq-to-sql or linq-to-entities, the following query..
from a in animals
group a by 0 into g
select new
{
TotalChickens = g.Sum(x => x.Chickens),
TotalGoats = g.Sum(x => x.Goats)
}
will result in a single query, and do the summation on the database-end, which is greatly preferable to pulling the entire table over and doing the summation on the client end.
The way you have posed your problem, there is no way to do this. IEnumerable<T> is a pull enumerable - that is, you can GetEnumerator to the front of the sequence and then repeatedly ask "Give me the next item" (MoveNext/Current). You can't, on one thread, have two different things pulling from the animals.Select(a => a.Chickens) and animals.Select(a => a.Goats) at the same time. You would have to do one then the other (which would require materializing the second).
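As a reminder of what "pull" means here, this is roughly what a foreach loop does under the hood. The method name Drain is made up for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class PullSketch
{
    // Walks a sequence by hand using the pull protocol.
    public static List<int> Drain(IEnumerable<int> source)
    {
        var seen = new List<int>();
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())      // "give me the next item"
                seen.Add(e.Current);  // read the item that was just produced
        }
        return seen;
    }
}
```

The consumer drives the loop: nothing is produced until MoveNext is called, which is why two independent consumers on one thread cannot both be in the middle of their own loop over the same enumerator.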
The suggestion BlueRaja made is one way to change the problem slightly. I would suggest going that route.
The other alternative is to utilize IObservable<T> from Microsoft's reactive extensions (Rx), a push enumerable. I won't go into the details of how you would do that, but it's something you could look into.
Edit:
The above is assuming that ConsumeChickens and ConsumeGoats are both returning void or are at least not returning IEnumerable<T> themselves - which seems like an obvious assumption. I'd appreciate it if the lame downvoter would actually comment.
Actually, the simplest way to achieve what you want is to convert the FarmsInEachPen return value to a push collection (IObservable) and use Reactive Extensions to work with it:
var observable = new Subject<AnimalCount>();
// Subscribe attaches each consumer; Do on its own never runs without a subscriber.
observable.Subscribe(x => DoSomethingWithChicken(x.Chickens));
observable.Subscribe(x => DoSomethingWithGoat(x.Goats));
foreach (var item in FarmsInEachPen())
{
    observable.OnNext(item);
}
I figured it out, thanks in large part to the path that @Lee put me on.
You need to share a single enumerator between the two zips, and use an adapter function to project the correct element into the sequence.
private static IEnumerable<object> ConsumeChickens(IEnumerable<int> xList)
{
foreach (var x in xList)
{
Console.WriteLine("X: " + x);
yield return null;
}
}
private static IEnumerable<object> ConsumeGoats(IEnumerable<int> yList)
{
foreach (var y in yList)
{
Console.WriteLine("Y: " + y);
yield return null;
}
}
private static IEnumerable<int> SelectHelper(IEnumerator<AnimalCount> enumerator, int i)
{
bool c = i != 0 || enumerator.MoveNext();
while (c)
{
if (i == 0)
{
yield return enumerator.Current.Chickens;
c = enumerator.MoveNext();
}
else
{
yield return enumerator.Current.Goats;
}
}
}
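private static void Main(string[] args)
{
// GetAnimals() stands for the lazy source of AnimalCount values; it is not shown here.
var enumerator = GetAnimals().GetEnumerator();
var chickensList = ConsumeChickens(SelectHelper(enumerator, 0));
var goatsList = ConsumeGoats(SelectHelper(enumerator, 1));
// Zip pulls one item from each side in turn, which keeps the two consumers in lock-step.
var temp = chickensList.Zip(goatsList, (i, i1) => (object) null);
temp.ToList();
// iterations is presumably a counter maintained by GetAnimals(); it is not shown here either.
Console.WriteLine("Total iterations: " + iterations);
}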
In LINQ, Where is a streaming operator, whereas OrderByDescending is a non-streaming operator. AFAIK, a streaming operator only gathers the next item that is necessary, while a non-streaming operator evaluates the entire data stream at once.
I fail to see the relevance of defining a streaming operator. To me, it is redundant with deferred execution. Take the example where I have written a custom extension and consumed it using the Where operator and OrderByDescending.
public static class ExtensionStuff
{
public static IEnumerable<int> Where(this IEnumerable<int> sequence, Func<int, bool> predicate)
{
foreach (int i in sequence)
{
if (predicate(i))
{
yield return i;
}
}
}
}
public static void Main()
{
TestLinq3();
}
private static void TestLinq3()
{
int[] items = { 1, 2, 3,4 };
var selected = items.Where(i => i < 3)
.OrderByDescending(i => i);
Write(selected);
}
private static void Write(IEnumerable<int> selected)
{
foreach(var i in selected)
Console.WriteLine(i);
}
In either case, Where needs to evaluate each element in order to determine which elements meet the condition. The fact that it yields seems to only become relevant because the operator gains deferred execution.
So, what is the importance of Streaming Operators?
There are two aspects: speed and memory.
The speed aspect becomes more apparent when you use a method like .Take() to only consume a portion of the original result set.
// Consumes ten elements, yields 5 results.
Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
.Take(5)
.ToList();
// Consumes one million elements, yields 5 results.
Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
.OrderByDescending(i => i)
.Take(5)
.ToList();
Because the first example uses only streaming operators before the call to Take, you only end up yielding values 1 through 10 before Take stops evaluating. Furthermore, only one value is loaded into memory at a time, so you have a very small memory footprint.
In the second example, OrderByDescending is not streaming, so the moment Take pulls the first item, the entire result that's passed through the Where filter has to be placed in memory for sorting. This could take a long time and produce a big memory footprint.
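One way to check this for yourself is to count how many source elements are actually pulled. The sketch below uses a smaller range (1,000 instead of one million) and a counting Select; CountPulled is a name invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingSketch
{
    // Returns how many source elements were pulled to produce five results.
    public static int CountPulled(bool sortFirst)
    {
        var pulled = 0;
        IEnumerable<int> query = Enumerable.Range(1, 1000)
            .Select(i => { pulled++; return i; }) // counts every element pulled
            .Where(i => i % 2 == 0);

        if (sortFirst)
            query = query.OrderByDescending(i => i); // must buffer everything

        query.Take(5).ToList();
        return pulled;
    }
}
```

CountPulled(false) returns 10, because Take stops the pipeline after the fifth even number. CountPulled(true) returns 1000, because the sort has to see every element before it can hand over the first one.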
Even if you weren't using Take, the memory issue can be important. For example:
// Puts half a million elements in memory, sorts, then outputs them.
var numbers = Enumerable.Range(1, 1000000).Where(i => i % 2 == 0)
.OrderByDescending(i => i);
foreach(var number in numbers) Console.WriteLine(number);
// Puts one element in memory at a time.
var numbers = Enumerable.Range(1, 1000000).Where(i => i % 2 == 0);
foreach(var number in numbers) Console.WriteLine(number);
The fact that it yields seems to only become relevant because the
operator gains deferred execution.
So, what is the importance of Streaming Operators?
For example, you cannot process infinite sequences with buffering (non-streaming) extension methods, whereas you can "run" such a sequence (until you abort) just fine using only streaming extension methods.
Take for example this method:
public IEnumerable<int> GetNumbers(int start)
{
int num = start;
while(true)
{
yield return num;
num++;
}
}
You can use Where just fine:
foreach (var num in GetNumbers(0).Where(x => x % 2 == 0))
{
Console.WriteLine(num);
}
OrderBy() would not work in this case since it would have to exhaustively enumerate the results before emitting a single number.
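A bounded variant of the loop above, as a sketch: because Where and Take both stream, the infinite generator is only advanced as far as needed. FirstEvens is a hypothetical helper name, and GetNumbers is repeated here only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InfiniteSketch
{
    // Same shape as the GetNumbers method above: an endless counter.
    public static IEnumerable<int> GetNumbers(int start)
    {
        var num = start;
        while (true)
            yield return num++;
    }

    // Terminates because every operator in the chain is streaming.
    public static List<int> FirstEvens(int count)
    {
        return GetNumbers(0).Where(x => x % 2 == 0).Take(count).ToList();
    }
}
```

FirstEvens(3) returns 0, 2 and 4. Inserting an OrderBy anywhere before the Take would make the call run until memory is exhausted.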
Just to be explicit: in the case you mentioned there's no advantage to the fact that Where streams, since the OrderByDescending sucks the whole thing in anyway. There are, however, times when the advantages of streaming are used (other answers/comments have given examples), so all LINQ operators stream to the best of their ability. OrderBy streams as much as it can, which happens to be not very much. Where streams very effectively.
Can someone please explain to me what I am missing here? Based on my basic understanding, a LINQ result is calculated when the result is used, and I can see that in the following code.
static void Main(string[] args)
{
Action<IEnumerable<int>> print = (x) =>
{
foreach (int i in x)
{
Console.WriteLine(i);
}
};
int[] arr = { 1, 2, 3, 4, 5 };
int cutoff = 1;
IEnumerable<int> result = arr.Where(x => x < cutoff);
Console.WriteLine("First Print");
cutoff = 3;
print(result);
Console.WriteLine("Second Print");
cutoff = 4;
print(result);
Console.Read();
}
Output:
First Print
1
2
Second Print
1
2
3
Now I changed the
arr.Where(x => x < cutoff);
to
IEnumerable<int> result = arr.Take(cutoff);
and the output is as follow.
First Print
1
Second Print
1
Why with Take, it does not use the current value of the variable?
The behavior you're seeing comes from the different ways in which the arguments to the LINQ methods are evaluated. The Where method receives a lambda that captures the variable cutoff itself, not a copy of its value. The lambda is evaluated on demand and hence sees the value of cutoff at that time.
The Take method (and similar methods like Skip) takes an int parameter, and hence cutoff is passed by value. The value used is the value of cutoff at the moment the Take method is called, not when the query is evaluated.
Note: The term late binding here is a bit incorrect. Late binding generally refers to the process where the members an expression binds to are determined at run time rather than compile time. In C# you'd accomplish this with dynamic or reflection. LINQ's behavior of evaluating its parts on demand is known as delayed (deferred) execution.
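If you do want Take-like behavior that follows the variable, one option is to pass a delegate instead of an int. The extension below is a hypothetical helper written for this sketch; it is not part of LINQ.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredTakeSketch
{
    // Hypothetical helper: reads the count when enumeration starts rather
    // than when the query is built, so it behaves like the Where example.
    public static IEnumerable<T> TakeDeferred<T>(
        this IEnumerable<T> source, Func<int> count)
    {
        var remaining = count(); // runs on the first MoveNext, not at call time
        if (remaining <= 0) yield break;
        foreach (var item in source)
        {
            yield return item;
            if (--remaining == 0) yield break;
        }
    }
}
```

With int cutoff = 1 and result = arr.TakeDeferred(() => cutoff), setting cutoff = 3 before enumerating yields 1, 2, 3, and setting it to 4 before a second enumeration yields 1, 2, 3, 4. The iterator body runs afresh on each enumeration, so the lambda is re-read each time.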
There are a few different things getting confused here.
Late-binding: This is where the meaning of code is determined after it was compiled. For example, x.DoStuff() is early-bound if the compiler checks that objects of x's type have a DoStuff() method (considering extension methods and default arguments too) and then produces the call to it in the code it outputs, or fails with a compiler error otherwise. It is late-bound if the search for the DoStuff() method is done at run-time and throws a run-time exception if there was no DoStuff() method. There are pros and cons to each, and C# is normally early-bound but has support for late-binding (most simply through dynamic but the more convoluted approaches involving reflection also count).
Delayed execution: Strictly speaking, all Linq methods immediately produce a result. However, that result is an object which stores a reference to an enumerable object (often the result of the previous Linq method) which it will process in an appropriate manner when it is itself enumerated. For example, we can write our own Take method as:
private static IEnumerable<T> TakeHelper<T>(IEnumerable<T> source, int number)
{
foreach(T item in source)
{
yield return item;
if(--number == 0)
yield break;
}
}
public static IEnumerable<T> Take<T>(this IEnumerable<T> source, int number)
{
if(source == null)
throw new ArgumentNullException();
if(number < 0)
throw new ArgumentOutOfRangeException();
if(number == 0)
return Enumerable.Empty<T>();
return TakeHelper(source, number);
}
Now, when we use it:
var taken4 = someEnumerable.Take(4);//taken4 has a value, so we've already done
//something. If it was going to throw
//an argument exception it would have done so
//by now.
var firstTaken = taken4.First();//only now does the object in taken4
//do the further processing that iterates
//through someEnumerable.
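The split between Take and TakeHelper matters for exactly this reason. The sketch below contrasts the built-in Enumerable.Take with a naive single-method iterator; NaiveTake and ThrowsWhenBuilt are names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EagerValidationSketch
{
    // Validation placed inside an iterator body is itself deferred: nothing
    // in this method runs until the first MoveNext.
    public static IEnumerable<T> NaiveTake<T>(IEnumerable<T> source, int number)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        foreach (var item in source)
        {
            if (number-- <= 0) yield break;
            yield return item;
        }
    }

    // True if merely building the query (without enumerating it) throws.
    public static bool ThrowsWhenBuilt(Func<IEnumerable<int>> build)
    {
        try { build(); return false; }
        catch (ArgumentNullException) { return true; }
    }
}
```

ThrowsWhenBuilt(() => ((IEnumerable<int>)null).Take(4)) returns true, while ThrowsWhenBuilt(() => EagerValidationSketch.NaiveTake<int>(null, 4)) returns false: the naive version only complains once someone enumerates it.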
Captured variables: Normally when we make use of a variable, we make use of its current state:
int i = 2;
string s = "abc";
Console.WriteLine(i);
Console.WriteLine(s);
i = 3;
s = "xyz";
It's pretty intuitive that this prints 2 and abc and not 3 and xyz. In anonymous functions and lambda expressions though, when we make use of a variable we are "capturing" it as a variable, and so we will end up using the value it has when the delegate is invoked:
int i = 2;
string s = "abc";
Action λ = () =>
{
Console.WriteLine(i);
Console.WriteLine(s);
};
i = 3;
s = "xyz";
λ();
Creating the λ doesn't use the values of i and s, but creates a set of instructions as to what to do with i and s when λ is invoked. Only when that happens are the values of i and s used.
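A common place where this shows up is a for loop, where all the lambdas share one loop variable. The following sketch (method names are made up) shows the effect and the usual per-iteration copy that avoids it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureSketch
{
    public static List<int> SharedLoopVariable()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            readers.Add(() => i);   // every lambda captures the same variable i

        // The loop has finished, so i is 3 for all of them.
        return readers.Select(r => r()).ToList();
    }

    public static List<int> CopyPerIteration()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;           // a fresh variable on each iteration
            readers.Add(() => copy);
        }
        return readers.Select(r => r()).ToList();
    }
}
```

SharedLoopVariable returns 3, 3, 3 and CopyPerIteration returns 0, 1, 2. The lambdas are identical in both cases; the only difference is which variable they captured.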
Putting it all together: In none of your cases do you have any late-binding. That is irrelevant to your question.
In both you have delayed execution. Both the call to Take and the call to Where return enumerable objects which will act upon arr when they are enumerated.
In only one do you have a captured variable. The call to Take passes an integer directly to Take and Take makes use of that value. The call to Where passes a Func<int, bool> created from a lambda expression, and that lambda expression captures an int variable. Where knows nothing of this capture, but the Func does.
That's the reason the two behave so differently in how they treat cutoff.
Take doesn't take a lambda but an integer; as such, its result can't change when you change the original variable.