Observing an asynchronous sequence with 'yield return'

The following sample works fine:
static IEnumerable<int> GenerateNum(int sequenceLength)
{
    for (int i = 0; i < sequenceLength; i++)
    {
        yield return i;
    }
}

static void Main(string[] args)
{
    //var observ = Observable.Start(() => GenerateNum(1000));
    var observ = GenerateNum(1000).ToObservable();
    observ.Subscribe(
        (x) => Console.WriteLine("test:" + x),
        (Exception ex) => Console.WriteLine("Error received from source: {0}.", ex.Message),
        () => Console.WriteLine("End of sequence.")
    );
    Console.ReadKey();
}
However, what I really want is to use the commented-out line: I want to run the number generator asynchronously and, every time it yields a new value, have that value written to the console. That doesn't seem to work. How can I modify this code to make it work?

When doing this for asynchronous execution in a console app, you may want to use the ToObservable(IEnumerable<TSource>, IScheduler) overload (see Observable.ToObservable Method (IEnumerable, IScheduler)). To use the built-in thread pool scheduler, for example, try
var observ = GenerateNum(1000).ToObservable(Scheduler.ThreadPool);
It works for me. To expand, the following complete example works exactly as I think you intend:
static Random r = new Random();

static void Main(string[] args)
{
    var observ = GenerateNum(1000).ToObservable(Scheduler.ThreadPool);
    observ.Subscribe(
        (x) => Console.WriteLine("test:" + x),
        (Exception ex) => Console.WriteLine("Error received from source: {0}.", ex.Message),
        () => Console.WriteLine("End of sequence.")
    );
    while (Console.ReadKey(true).Key != ConsoleKey.Escape)
    {
        Console.WriteLine("You pressed a key.");
    }
}

static IEnumerable<int> GenerateNum(int sequenceLength)
{
    for (int i = 0; i < sequenceLength; i++)
    {
        Thread.Sleep(r.Next(1, 200));
        yield return i;
    }
}
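
For a version that does not depend on a key press to keep the process alive, the following is a minimal sketch of the same idea. It assumes the System.Reactive NuGet package is referenced, and it swaps in TaskPoolScheduler.Default for Scheduler.ThreadPool, which is a substitution on my part rather than what the answer above uses. The class and method names (BackgroundEnumeration, Collect) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundEnumeration
{
    // Same generator shape as above, with a short fixed pause per item.
    private static IEnumerable<int> GenerateNum(int sequenceLength)
    {
        for (int i = 0; i < sequenceLength; i++)
        {
            Thread.Sleep(10);
            yield return i;
        }
    }

    // Enumerates the generator on the task pool and blocks the caller
    // until the observable signals completion or an error.
    public static List<int> Collect(int sequenceLength)
    {
        var received = new List<int>();
        var finished = new TaskCompletionSource<bool>();

        GenerateNum(sequenceLength)
            .ToObservable(TaskPoolScheduler.Default) // assumption: replaces Scheduler.ThreadPool
            .Subscribe(
                x => received.Add(x),                 // called once per yielded value
                ex => finished.TrySetException(ex),   // surface errors to the waiting caller
                () => finished.TrySetResult(true));   // end of sequence

        finished.Task.GetAwaiter().GetResult();
        return received;
    }

    public static void Main()
    {
        foreach (int x in Collect(5))
            Console.WriteLine("test:" + x);
        Console.WriteLine("End of sequence.");
    }
}
```

Blocking on a TaskCompletionSource here plays the role that Console.ReadKey plays in the answer: it keeps Main from returning while the scheduler is still enumerating.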

Related

Parallel.For<int> not working as expected

I wrote a simple Parallel.For loop, but when I run the code I get random results. I expect total to be 15 (1+2+3+4+5). I used Interlocked.Add to prevent race conditions and strange behavior. Can someone explain why the output is random and not 15?
public class Program
{
public static void Main(string[] args)
{
Console.WriteLine("before Dowork");
DoWork();
Console.WriteLine("After Dowork");
Console.ReadLine();
}
public static void DoWork()
{
try
{
int total = 0;
var result = Parallel.For<int>(0, 6,
() => 0,
(i, status, y) =>
{
return i;
},
(x) =>
{
Interlocked.Add(ref total, x);
});
if (result.IsCompleted)
Console.WriteLine($"total is: {total}");
else Console.WriteLine("loop not ready yet");
}
catch(Exception e)
{
Console.WriteLine(e.Message);
}
}
}

Instead of using
(i, status, y) =>
{
    return i;
}
you should use
(i, status, y) =>
{
    return y + i;
}
Parallel.For splits the source sequence into several partitions. The items in each partition are processed sequentially, but multiple partitions may be executed in parallel.
Each partition has a local state. The local state is the return value of the above lambda function, and it is also passed in as the y parameter. With return i, each partition's final state is just the last index it processed, so the total depends on how the range happens to be partitioned. That is why you should return y + i: it updates the local state to the sum of the previous state and the input value i.
After every item of a partition has been processed, the final value of the local state is passed to the last function, where you sum up all the states:
(x) =>
{
    Interlocked.Add(ref total, x);
}
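
Putting the three delegates together, here is a minimal self-contained sketch of the corrected loop. The class and method names (LocalStateSum, Sum) are made up for illustration; only Parallel.For and Interlocked.Add come from the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LocalStateSum
{
    // Sums the integers in [fromInclusive, toExclusive) using per-partition subtotals.
    public static int Sum(int fromInclusive, int toExclusive)
    {
        int total = 0;
        Parallel.For<int>(
            fromInclusive,
            toExclusive,
            () => 0,                                            // localInit: every partition starts at 0
            (i, loopState, subtotal) => subtotal + i,           // body: carry the running subtotal forward
            subtotal => Interlocked.Add(ref total, subtotal));  // localFinally: merge once per partition
        return total;
    }

    public static void Main()
    {
        // 0 + 1 + 2 + 3 + 4 + 5
        Console.WriteLine($"total is: {Sum(0, 6)}"); // prints "total is: 15"
    }
}
```

Interlocked.Add is only needed in the last delegate, because that is the only place where several partitions touch the shared total.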

Observable with Time interval not displaying results on subscribe

I am trying to add a time interval to this Observable sequence (that is, produce an integer sequence at a specific interval), but it doesn't seem to work. When I remove the timer, it works fine. Am I applying the timer incorrectly?
var timer = Observable.Interval(TimeSpan.FromSeconds(2)).Take(4);
var nums = Observable.Range(1,1200).Where(a => a % 2 == 0);
var sourcenumbs = timer.SelectMany(nums);
var results = sourcenumbs.Subscribe(
x => Console.WriteLine("OnNext: {0}",x),
ex => Console.WriteLine("OnError: {0}",ex.Message),
() => Console.WriteLine("OnComplete")
);
This code displays no output. Does it get disposed before it reaches the Subscribe?
But if I add a for loop with a sleep in it, then it works. Why?
for (int i = 0; i < 10; i++)
{
Thread.Sleep(TimeSpan.FromSeconds(0.9));
}

Is this what you want?
static void Main(string[] args)
{
    Execute();
    Console.ReadKey();
}

private static async void Execute()
{
    var intervals = Observable.Interval(TimeSpan.FromSeconds(2)).StartWith(0);
    var evenNumbers = Enumerable.Range(1, 1200).Where(a => a % 2 == 0);
    var evenNumbersAtIntervals = intervals.Zip(evenNumbers, (_, num) => num);
    try
    {
        await evenNumbersAtIntervals.ForEachAsync(
            x => Console.WriteLine("OnNext: {0}", x)
        );
        Console.WriteLine("Complete");
    }
    catch (Exception e)
    {
        Console.WriteLine("Exception " + e);
    }
}
Note that evenNumbers is an Enumerable, not an Observable.

C# async within an action

I would like to write a method which accepts several parameters, including an action and a retry count, and invokes it.
So I have this code:
public static IEnumerable<Task> RunWithRetries<T>(List<T> source, int threads, Func<T, Task<bool>> action, int retries, string method)
{
object lockObj = new object();
int index = 0;
return new Action(async () =>
{
while (true)
{
T item;
lock (lockObj)
{
if (index < source.Count)
{
item = source[index];
index++;
}
else
break;
}
int retry = retries;
while (retry > 0)
{
try
{
bool res = await action(item);
if (res)
retry = -1;
else
//sleep if not success..
Thread.Sleep(200);
}
catch (Exception e)
{
LoggerAgent.LogException(e, method);
}
finally
{
retry--;
}
}
}
}).RunParallel(threads);
}
RunParallel is an extension method for Action; it looks like this:
public static IEnumerable<Task> RunParallel(this Action action, int amount)
{
List<Task> tasks = new List<Task>();
for (int i = 0; i < amount; i++)
{
Task task = Task.Factory.StartNew(action);
tasks.Add(task);
}
return tasks;
}
Now, the issue: The thread is just disappearing or collapsing without waiting for the action to finish.
I wrote this example code:
private static async Task ex()
{
List<int> ints = new List<int>();
for (int i = 0; i < 1000; i++)
{
ints.Add(i);
}
var tasks = RetryComponent.RunWithRetries(ints, 100, async (num) =>
{
try
{
List<string> test = await fetchSmthFromDb();
Console.WriteLine("#" + num + " " + test[0]);
return test[0] == "test";
}
catch (Exception e)
{
Console.WriteLine(e.StackTrace);
return false;
}
}, 5, "test");
await Task.WhenAll(tasks);
}
fetchSmthFromDb is a simple method returning Task<List<string>> that fetches something from the db; it works perfectly fine when invoked outside of this example.
Whenever the List<string> test = await fetchSmthFromDb(); line is reached, the thread seems to exit: Console.WriteLine("#" + num + " " + test[0]); is never triggered, and when debugging the breakpoint is never hit.
The Final Working Code
private static async Task DoWithRetries(Func<Task> action, int retryCount, string method)
{
while (true)
{
try
{
await action();
break;
}
catch (Exception e)
{
LoggerAgent.LogException(e, method);
}
if (retryCount <= 0)
break;
retryCount--;
await Task.Delay(200);
};
}
public static async Task RunWithRetries<T>(List<T> source, int threads, Func<T, Task<bool>> action, int retries, string method)
{
Func<T, Task> newAction = async (item) =>
{
await DoWithRetries(async ()=>
{
await action(item);
}, retries, method);
};
await source.ParallelForEachAsync(newAction, threads);
}

The problem is in this line:
return new Action(async () => ...
You start an async operation with the async lambda, but don't return a task to await. That is, it runs on worker threads, but you'll never find out when it's done. Your program terminates before the async operation is complete, which is why you don't see any output.
It needs to be:
return new Func<Task>(async () => ...
UPDATE
First, you need to split responsibilities of methods, so you don't mix retry policy (which should not be hardcoded to a check of a boolean result) with running tasks in parallel.
Then, as previously mentioned, you run your while (true) loop 100 times instead of doing things in parallel.
As @MachineLearning pointed out, use Task.Delay instead of Thread.Sleep.
Overall, your solution looks like this:
using System.Collections.Async;

static async Task DoWithRetries(Func<Task> action, int retryCount, string method)
{
    while (true)
    {
        try
        {
            await action();
            break;
        }
        catch (Exception e)
        {
            LoggerAgent.LogException(e, method);
        }
        if (retryCount <= 0)
            break;
        retryCount--;
        await Task.Delay(millisecondsDelay: 200);
    };
}

static async Task Example()
{
    List<int> ints = new List<int>();
    for (int i = 0; i < 1000; i++)
        ints.Add(i);
    Func<int, Task> actionOnItem =
        async item =>
        {
            await DoWithRetries(async () =>
            {
                List<string> test = await fetchSmthFromDb();
                Console.WriteLine("#" + item + " " + test[0]);
                if (test[0] != "test")
                    throw new InvalidOperationException("unexpected result"); // will be re-tried
            },
            retryCount: 5,
            method: "test");
        };
    await ints.ParallelForEachAsync(actionOnItem, maxDegreeOfParalellism: 100);
}
You need to use the AsyncEnumerator NuGet Package in order to use the ParallelForEachAsync extension method from the System.Collections.Async namespace.

Besides the final complete reengineering, I think it's very important to underline what was really wrong with the original code.
0) First of all, as @Serge Semenov immediately pointed out, Action has to be replaced with
Func<Task>
But there are two other essential changes.
1) With an async delegate as the argument, it is necessary to use the more recent Task.Run instead of the older Task.Factory.StartNew pattern (otherwise you have to add Unwrap() explicitly).
2) Moreover, the ex() method can't be async, since Task.WhenAll must be waited on with Wait() rather than with await.
At that point, even though there are logical errors that need reengineering, from a purely technical standpoint it does work and the output is produced.
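
To make point 1) concrete, here is a small sketch that contrasts the two calls using only the base class library. The TaskCompletionSource "gate" is a device I added to keep the async delegate suspended so the difference can be observed deterministically; the names StartNewVersusRun and ObserveAsync are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class StartNewVersusRun
{
    // Returns two flags: whether the inner task was still pending after the
    // StartNew task had finished, and whether the Task.Run task was pending.
    public static async Task<(bool innerPending, bool runPending)> ObserveAsync()
    {
        // The gate keeps the async delegate suspended until we release it.
        var gate = new TaskCompletionSource<bool>();
        Func<Task> work = async () => { await gate.Task; };

        // StartNew yields Task<Task>: the outer task ends as soon as the
        // delegate returns its inner task, which is still running here.
        Task<Task> outer = Task.Factory.StartNew(work);
        Task inner = await outer;
        bool innerPending = !inner.IsCompleted;

        // Task.Run unwraps automatically, so its task tracks the async work itself.
        Task run = Task.Run(work);
        bool runPending = !run.IsCompleted;

        gate.SetResult(true);                     // let both delegates finish
        await Task.WhenAll(outer.Unwrap(), run);  // Unwrap() gives StartNew the same behavior
        return (innerPending, runPending);
    }

    public static void Main()
    {
        var (innerPending, runPending) = ObserveAsync().GetAwaiter().GetResult();
        Console.WriteLine($"inner pending: {innerPending}, Task.Run pending: {runPending}");
    }
}
```

Both flags should come out true: waiting on the task returned by StartNew alone says nothing about whether the async body has finished, which matches the symptom described in the question.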
A test is available online: http://rextester.com/HMMI93124

How to limit the number of items that are passing concurrently through an entire Dataflow pipeline?

I want to limit the number of items posted in a Dataflow pipeline. The number of items depends on the production environment.
These objects (images) consume a large amount of memory, so I would like to post a new one only when the last block of the pipeline has finished its job.
I tried to use a SemaphoreSlim to throttle the producer and release it in the last block of the pipeline. It works, but if an exception is raised during processing, the program waits forever and the exception is never caught.
Here is a sample that resembles our code.
How can I do this?
static void Main(string[] args)
{
SemaphoreSlim semaphore = new SemaphoreSlim(1, 2);
var downloadString = new TransformBlock<string, string>(uri =>
{
Console.WriteLine("Downloading '{0}'...", uri);
return new WebClient().DownloadString(uri);
});
var createWordList = new TransformBlock<string, string[]>(text =>
{
Console.WriteLine("Creating word list...");
char[] tokens = text.ToArray();
for (int i = 0; i < tokens.Length; i++)
{
if (!char.IsLetter(tokens[i]))
tokens[i] = ' ';
}
text = new string(tokens);
return text.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);
});
var filterWordList = new TransformBlock<string[], string[]>(words =>
{
Console.WriteLine("Filtering word list...");
throw new InvalidOperationException("ouch !"); // explicit for test
return words.Where(word => word.Length > 3).OrderBy(word => word)
.Distinct().ToArray();
});
var findPalindromes = new TransformBlock<string[], string[]>(words =>
{
Console.WriteLine("Finding palindromes...");
var palindromes = new ConcurrentQueue<string>();
Parallel.ForEach(words, word =>
{
string reverse = new string(word.Reverse().ToArray());
if (Array.BinarySearch<string>(words, reverse) >= 0 &&
word != reverse)
{
palindromes.Enqueue(word);
}
});
return palindromes.ToArray();
});
var printPalindrome = new ActionBlock<string[]>(palindromes =>
{
try
{
foreach (string palindrome in palindromes)
{
Console.WriteLine("Found palindrome {0}/{1}",
palindrome, new string(palindrome.Reverse().ToArray()));
}
}
finally
{
semaphore.Release();
}
});
downloadString.LinkTo(createWordList);
createWordList.LinkTo(filterWordList);
filterWordList.LinkTo(findPalindromes);
findPalindromes.LinkTo(printPalindrome);
downloadString.Completion.ContinueWith(t =>
{
if (t.IsFaulted)
((IDataflowBlock)createWordList).Fault(t.Exception);
else createWordList.Complete();
});
createWordList.Completion.ContinueWith(t =>
{
if (t.IsFaulted)
((IDataflowBlock)filterWordList).Fault(t.Exception);
else filterWordList.Complete();
});
filterWordList.Completion.ContinueWith(t =>
{
if (t.IsFaulted)
((IDataflowBlock)findPalindromes).Fault(t.Exception);
// enter here when an exception throws
else findPalindromes.Complete();
});
findPalindromes.Completion.ContinueWith(t =>
{
if (t.IsFaulted)
((IDataflowBlock)printPalindrome).Fault(t.Exception);
// the fault is propagated here but not caught
else printPalindrome.Complete();
});
try
{
for (int i = 0; i < 10; i++)
{
Console.WriteLine(i);
downloadString.Post("http://www.google.com");
semaphore.Wait(); // waits here when an exception throws
}
downloadString.Complete();
printPalindrome.Completion.Wait();
}
catch (AggregateException agg)
{
Console.WriteLine("An error has occured : " + agg);
}
Console.WriteLine("Done");
Console.ReadKey();
}

You should simply wait on both the semaphore and the completion task together. In that way if the block ends prematurely (either by exception or cancellation) then the exception will be rethrown and if not then you will wait on your semaphore until there's room to post more.
You can do that with Task.WhenAny and SemaphoreSlim.WaitAsync:
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(i);
    downloadString.Post("http://www.google.com");
    if (printPalindrome.Completion.IsCompleted)
    {
        break;
    }
    Task.WhenAny(semaphore.WaitAsync(), printPalindrome.Completion).Wait();
}
Note: using Task.Wait is only appropriate in this case as it's Main. Usually this should be an async method and you should await the task returned from Task.WhenAny.
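
As a rough illustration of the async form mentioned in the note, the sketch below reproduces the pattern without Dataflow so it can run on its own. A TaskCompletionSource stands in for printPalindrome.Completion and a callback stands in for downloadString.Post; those stand-ins, and the names ThrottledProducer, ProduceAsync and Demo, are assumptions made for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledProducer
{
    // Posts up to maxItems. After each post it waits for a free slot or for the
    // pipeline to terminate, whichever happens first.
    public static async Task ProduceAsync(
        int maxItems, SemaphoreSlim slots, Task pipelineCompletion, Action<int> post)
    {
        for (int i = 0; i < maxItems; i++)
        {
            if (pipelineCompletion.IsCompleted)
                break;
            post(i);
            // WhenAny itself does not throw; it returns whichever task finished first.
            await Task.WhenAny(slots.WaitAsync(), pipelineCompletion);
        }
        // In a real pipeline, call Complete() on the head block before this line.
        // Awaiting the completion task rethrows the pipeline's exception, if any.
        await pipelineCompletion;
    }

    // Simulates a pipeline whose last stage faults while handling the third item.
    public static string Demo()
    {
        var slots = new SemaphoreSlim(0);
        var completion = new TaskCompletionSource<bool>(); // stand-in for printPalindrome.Completion
        Action<int> post = i =>
        {
            if (i == 2)
                completion.SetException(new InvalidOperationException("ouch !"));
            else
                slots.Release(); // the last stage finished this item
        };
        try
        {
            ProduceAsync(10, slots, completion.Task, post).GetAwaiter().GetResult();
            return "completed";
        }
        catch (InvalidOperationException e)
        {
            return "failed: " + e.Message;
        }
    }

    public static void Main() => Console.WriteLine(Demo()); // prints "failed: ouch !"
}
```

One caveat of this sketch: when the completion task wins the race, the WaitAsync call is left pending. Passing a CancellationToken to WaitAsync and cancelling it afterwards would avoid leaving that waiter behind.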

This is how I handled throttling, allowing only 10 items in the source block at any one time. You could modify this to allow 1. Make sure that you also throttle the other blocks in the pipeline; otherwise, you could end up with 1 item in the source block and many more in the next block.
var sourceBlock = new BufferBlock<string>(
    new ExecutionDataflowBlockOptions() {
        SingleProducerConstrained = true,
        BoundedCapacity = 10 });
Then the producer does this:
sourceBlock.SendAsync("value", shutdownToken).Wait(shutdownToken);
If you're using async / await, just await the SendAsync call.
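
The difference between Post and SendAsync on a bounded block can be seen in a few lines. This is a sketch that assumes the System.Threading.Tasks.Dataflow library is available to the project; the capacity of 2 and the names BoundedBufferDemo and RunAsync are chosen only for the demonstration.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedBufferDemo
{
    // Returns three flags: Post was declined when the buffer was full, SendAsync
    // was still pending at that point, and SendAsync succeeded after a Receive.
    public static async Task<(bool declined, bool pending, bool accepted)> RunAsync()
    {
        var buffer = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 2 });

        buffer.Post(1);
        buffer.Post(2);                         // the buffer is now at capacity
        bool declined = !buffer.Post(3);        // Post does not wait; it returns false when full

        Task<bool> send = buffer.SendAsync(3);  // SendAsync waits for room instead
        bool pending = !send.IsCompleted;

        buffer.Receive();                       // consuming an item frees a slot
        bool accepted = await send;             // the postponed item is then taken
        return (declined, pending, accepted);
    }

    public static void Main()
    {
        var (declined, pending, accepted) = RunAsync().GetAwaiter().GetResult();
        Console.WriteLine($"declined={declined}, pending={pending}, accepted={accepted}");
    }
}
```

This is why the producer above uses SendAsync: with BoundedCapacity set, Post would silently drop the item whenever the block is full.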

Reactive Extensions and Retry

So a series of articles popped up on my radar this morning. It started with this question, which led to the original example and source code on GitHub.
I rewrote it slightly, so I can start using it in Console and Service applications:
public static class Extensions
{
static readonly TaskPoolScheduler Scheduler = new TaskPoolScheduler(new TaskFactory());
// Licensed under the MIT license with <3 by GitHub
/// <summary>
/// An exponential back off strategy which starts with 1 second and then 4, 9, 16...
/// </summary>
[SuppressMessage("Microsoft.Security", "CA2104:DoNotDeclareReadOnlyMutableReferenceTypes")]
public static readonly Func<int, TimeSpan> ExponentialBackoff = n => TimeSpan.FromSeconds(Math.Pow(n, 2));
/// <summary>
/// A linear strategy which starts with 1 second and then 2, 3, 4...
/// </summary>
[SuppressMessage("Microsoft.Security", "CA2104:DoNotDeclareReadOnlyMutableReferenceTypes")]
public static readonly Func<int, TimeSpan> LinearStrategy = n => TimeSpan.FromSeconds(1*n);
/// <summary>
/// Returns a cold observable which retries (re-subscribes to) the source observable on error up to the
/// specified number of times or until it successfully terminates. Allows for customizable back off strategy.
/// </summary>
/// <param name="source">The source observable.</param>
/// <param name="retryCount">The number of attempts of running the source observable before failing.</param>
/// <param name="strategy">The strategy to use in backing off, exponential by default.</param>
/// <param name="retryOnError">A predicate determining for which exceptions to retry. Defaults to all</param>
/// <param name="scheduler">The scheduler.</param>
/// <returns>
/// A cold observable which retries (re-subscribes to) the source observable on error up to the
/// specified number of times or until it successfully terminates.
/// </returns>
[SuppressMessage("Microsoft.Reliability", "CA2000:Dispose objects before losing scope")]
public static IObservable<T> RetryWithBackoffStrategy<T>(
this IObservable<T> source,
int retryCount = 3,
Func<int, TimeSpan> strategy = null,
Func<Exception, bool> retryOnError = null,
IScheduler scheduler = null)
{
strategy = strategy ?? ExponentialBackoff;
scheduler = scheduler ?? Scheduler;
if (retryOnError == null)
retryOnError = e => true;
int attempt = 0;
return Observable.Defer(() =>
{
return ((++attempt == 1) ? source : source.DelaySubscription(strategy(attempt - 1), scheduler))
.Select(item => new Tuple<bool, T, Exception>(true, item, null))
.Catch<Tuple<bool, T, Exception>, Exception>(e => retryOnError(e)
? Observable.Throw<Tuple<bool, T, Exception>>(e)
: Observable.Return(new Tuple<bool, T, Exception>(false, default(T), e)));
})
.Retry(retryCount)
.SelectMany(t => t.Item1
? Observable.Return(t.Item2)
: Observable.Throw<T>(t.Item3));
}
}
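
As a quick check on what the two strategies actually produce, the sketch below tabulates the delays for the first few attempts. Squared mirrors the Math.Pow(n, 2) expression used by ExponentialBackoff above; Doubling is a hypothetical 2^n variant included only for comparison and is not part of the original extension.

```csharp
using System;
using System.Linq;

public static class BackoffDelays
{
    // Same expression as ExponentialBackoff above: n squared, in seconds.
    public static readonly Func<int, TimeSpan> Squared =
        n => TimeSpan.FromSeconds(Math.Pow(n, 2));

    // Hypothetical doubling strategy, shown only for comparison: 2^n seconds.
    public static readonly Func<int, TimeSpan> Doubling =
        n => TimeSpan.FromSeconds(Math.Pow(2, n));

    // Evaluates a strategy for attempts 1..attempts and returns the delays in seconds.
    public static double[] Seconds(Func<int, TimeSpan> strategy, int attempts) =>
        Enumerable.Range(1, attempts).Select(n => strategy(n).TotalSeconds).ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Seconds(Squared, 4)));  // 1, 4, 9, 16
        Console.WriteLine(string.Join(", ", Seconds(Doubling, 4))); // 2, 4, 8, 16
    }
}
```

So the strategy named ExponentialBackoff grows quadratically (1, 4, 9, 16 seconds), which is worth knowing when choosing retryCount.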
Now to test how it works, I've written this small program:
class Program
{
static void Main(string[] args)
{
int tryCount = 0;
var cts = new CancellationTokenSource();
var sched = new TaskPoolScheduler(new TaskFactory());
var source = Observable.Defer(
() =>
{
Console.WriteLine("Action {0}", tryCount);
var a = 5/tryCount++;
return Observable.Return("yolo");
});
source.RetryWithBackoffStrategy(scheduler: sched, strategy: Extensions.LinearStrategy, retryOnError: exception => exception is DivideByZeroException);
while (!cts.IsCancellationRequested)
source.Subscribe(
res => { Console.WriteLine("Result: {0}", res); },
ex =>
{
Console.WriteLine("Error: {0}", ex.Message);
},
() =>
{
cts.Cancel();
Console.WriteLine("End Processing after {0} attempts", tryCount);
});
}
}
Initially I thought that the act of subscribing would automatically trigger all the subsequent retries. That was not the case, so I had to implement a cancellation token and loop until it signals that all retries have been exhausted.
The other option is to use AutoResetEvent:
class Program
{
static void Main(string[] args)
{
int tryCount = 0;
var auto = new AutoResetEvent(false);
var source = Observable.Defer(
() =>
{
Console.WriteLine("Action {0}", tryCount);
var a = 5/tryCount++;
return Observable.Return("yolo");
});
source.RetryWithBackoffStrategy(strategy: Extensions.LinearStrategy, retryOnError: exception => exception is DivideByZeroException);
while (!auto.WaitOne(1))
{
source.Subscribe(
res => { Console.WriteLine("Result: {0}", res); },
ex =>
{
Console.WriteLine("Error: {0}", ex.Message);
},
() =>
{
Console.WriteLine("End Processing after {0} attempts", tryCount);
auto.Set();
});
}
}
}
In both scenarios it will display these lines:
Action 0
Error: Attempted to divide by zero.
Action 1
Result: yolo
End Processing after 2 attempts
The question I have to this crowd is: Is this the best way to use this extension? Or is there a way to subscribe to the Observable so it will re-fire itself, up to the number of retries?
FINAL UPDATE
Based on Brandon's suggestion, this is the proper way of subscribing:
internal class Program
{
#region Methods
private static void Main(string[] args)
{
int tryCount = 0;
IObservable<string> source = Observable.Defer(
() =>
{
Console.WriteLine("Action {0}", tryCount);
int a = 5 / tryCount++;
return Observable.Return("yolo");
});
source.RetryWithBackoffStrategy(strategy: Extensions.ExponentialBackoff, retryOnError: exception => exception is DivideByZeroException, scheduler: Scheduler.Immediate)
.Subscribe(
res => { Console.WriteLine("Result: {0}", res); },
ex => { Console.WriteLine("Error: {0}", ex.Message); },
() =>
{
Console.WriteLine("End Processing after {0} attempts", tryCount);
});
}
#endregion
}
The output will be slightly different:
Action 0
Action 1
Result: yolo
End Processing after 2 attempts
This turned out to be quite a useful extension. Here is another example of how it can be used, where the strategy and the error handling are supplied as delegates.
internal class Program
{
#region Methods
private static void Main(string[] args)
{
int tryCount = 0;
IObservable<string> source = Observable.Defer(
() =>
{
Console.WriteLine("Action {0}", tryCount);
int a = 5 / tryCount++;
return Observable.Return("yolo");
});
source.RetryWithBackoffStrategy(
strategy: i => TimeSpan.FromMilliseconds(1),
retryOnError: exception =>
{
if (exception is DivideByZeroException)
{
Console.WriteLine("Tried to divide by zero");
return true;
}
return false;
},
scheduler: Scheduler.Immediate).Subscribe(
res => { Console.WriteLine("Result: {0}", res); },
ex => { Console.WriteLine("Error: {0}", ex.Message); },
() =>
{
Console.WriteLine("Succeeded after {0} attempts", tryCount);
});
}
#endregion
}
Output:
Action 0
Tried to divide by zero
Action 1
Result: yolo
Succeeded after 2 attempts

Yeah, Rx is generally asynchronous, so when writing tests you need to wait for it to finish (otherwise Main just exits right after your call to Subscribe).
Also, make sure you subscribe to the observable produced by calling source.RetryWithBackoffStrategy(...). That call produces a new observable that has the retry semantics; source itself is unchanged.
The easiest solution in cases like this is to literally use Wait:
try
{
    var source2 = source.RetryWithBackoffStrategy(/*...*/);
    // blocks the current thread until the source finishes
    var result = source2.Wait();
    Console.WriteLine("result=" + result);
}
catch (Exception err)
{
    // a format string without a placeholder would silently drop the exception
    Console.WriteLine("uh oh: {0}", err);
}
If you use something like NUnit (which supports asynchronous tests) to write your tests, then you can do:
[Test]
public async Task MyTest()
{
    var source = /* ... */;
    var source2 = source.RetryWithBackoffStrategy(/*...*/);
    var result = await source2; // you can await observables
    Assert.That(result, Is.EqualTo(5));
}
