I'm trying to change Stephen Toub's ForEachAsync<T> extension method into an extension which returns a result...
Stephen's extension:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
My approach (not working; tasks get executed but result is wrong)
public static Task<TResult[]> ForEachAsync<T, TResult>(this IList<T> source,
int degreeOfParallelism, Func<T, Task<TResult>> body)
{
return Task.WhenAll<TResult>(
from partition in Partitioner.Create(source).GetPartitions(degreeOfParallelism)
select Task.Run<TResult>(async () =>
{
using (partition)
while (partition.MoveNext())
await body(partition.Current); // When I "return await",
// I get good results but only one per partition
return default(TResult);
}));
}
I know I somehow have to return (WhenAll?) the results from the last part, but I haven't yet figured out how to do it...
Update: The result I get is just degreeOfParallelism times null (I guess because of default(TResult)) even though all the tasks get executed. I also tried to return await body(...) and then the result was fine, but only degreeOfParallelism number of tasks got executed.
Now that the Parallel.ForEachAsync API has become part of the standard libraries (.NET 6), it makes sense to implement a variant that returns a Task<TResult[]>, based on this API. Here is an implementation:
/// <summary>
/// Executes a foreach loop on an enumerable sequence, in which iterations may run
/// in parallel, and returns the results of all iterations in the original order.
/// </summary>
public static Task<TResult[]> ForEachAsync<TSource, TResult>(
IEnumerable<TSource> source,
ParallelOptions parallelOptions,
Func<TSource, CancellationToken, ValueTask<TResult>> body)
{
ArgumentNullException.ThrowIfNull(source);
ArgumentNullException.ThrowIfNull(parallelOptions);
ArgumentNullException.ThrowIfNull(body);
List<TResult> results = new();
if (source.TryGetNonEnumeratedCount(out int count)) results.Capacity = count;
IEnumerable<(TSource, int)> withIndexes = source.Select((x, i) => (x, i));
return Parallel.ForEachAsync(withIndexes, parallelOptions, async (entry, ct) =>
{
(TSource item, int index) = entry;
TResult result = await body(item, ct).ConfigureAwait(false);
lock (results)
{
while (results.Count <= index) results.Add(default);
results[index] = result;
}
}).ContinueWith(t =>
{
if (t.IsFaulted)
{
TaskCompletionSource<TResult[]> tcs = new();
tcs.SetException(t.Exception.InnerExceptions);
return tcs.Task;
}
if (t.IsCanceled)
{
TaskCompletionSource<TResult[]> tcs = new();
tcs.SetCanceled(new TaskCanceledException(t).CancellationToken);
return tcs.Task;
}
Debug.Assert(t.IsCompletedSuccessfully);
lock (results) return Task.FromResult(results.ToArray());
}, default, TaskContinuationOptions.DenyChildAttach |
TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default).Unwrap();
}
This implementation supports all the options and the functionality of the Parallel.ForEachAsync overload that has an IEnumerable<T> as source. Its behavior in case of errors and cancellation is identical. The results are arranged in the same order as the associated elements in the source sequence.
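As a narrower illustration of the same ordering idea: when the source is already an array, a simpler sketch is to iterate over the indexes and let each iteration write into its own slot of a preallocated result array, so no lock is needed. The OrderedResultsSketch and SquareAllAsync names and the squaring body are made up for this example, and it assumes .NET 6 or later.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class OrderedResultsSketch
{
    // Hypothetical example: squares every element, at most maxDop at a time,
    // and returns the results in the same order as the input.
    public static async Task<int[]> SquareAllAsync(int[] input, int maxDop)
    {
        var results = new int[input.Length];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDop };

        await Parallel.ForEachAsync(Enumerable.Range(0, input.Length), options, async (i, ct) =>
        {
            await Task.Delay(10, ct);          // stand-in for real asynchronous work
            results[i] = input[i] * input[i];  // each iteration owns exactly one slot
        });

        return results;
    }
}
```

Calling SquareAllAsync(new[] { 1, 2, 3, 4, 5 }, 2) yields 1, 4, 9, 16, 25 regardless of which iteration finishes first, because the position of each result is fixed by its index.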
Your LINQ query can only ever have the same number of results as the number of partitions - you're just projecting each partition into a single result.
If you don't care about the order, you just need to assemble the results of each partition into a list, then flatten them afterwards.
public static async Task<TResult[]> ExecuteInParallel<T, TResult>(this IList<T> source, int degreeOfParallelism, Func<T, Task<TResult>> body)
{
var lists = await Task.WhenAll<List<TResult>>(
Partitioner.Create(source).GetPartitions(degreeOfParallelism)
.Select(partition => Task.Run<List<TResult>>(async () =>
{
var list = new List<TResult>();
using (partition)
{
while (partition.MoveNext())
{
list.Add(await body(partition.Current));
}
}
return list;
})));
return lists.SelectMany(list => list).ToArray();
}
(I've renamed this from ForEachAsync, as ForEach sounds imperative (suitable for the Func<T, Task> in the original) whereas this is fetching results. A foreach loop doesn't have a result - this does.)
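If the order does matter, one possible sketch (not part of the answer above) is to drop the partitioner and throttle with a SemaphoreSlim instead. Task.WhenAll returns its results in the same order as the tasks it was given, so the output lines up with the source. The SelectThrottledAsync name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSelectSketch
{
    // Hypothetical helper: runs body for every item with at most
    // maxConcurrency invocations in flight at any time.
    public static async Task<TResult[]> SelectThrottledAsync<T, TResult>(
        this IEnumerable<T> source, int maxConcurrency, Func<T, Task<TResult>> body)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = source.Select(async item =>
        {
            await gate.WaitAsync().ConfigureAwait(false);
            try { return await body(item).ConfigureAwait(false); }
            finally { gate.Release(); }
        }).ToList(); // materialize so that every task is created exactly once

        // The result array follows the order of the tasks, i.e. the source order.
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

Unlike the partition-based version, every item gets its own task up front, so this trades some allocation for ordered results.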
I need the functionality of a hysteresis filter in RX. It should emit a value from the source stream only when the previously emitted value and the current input value differ by a certain amount. As a generic extension method, it could have the following signature:
public static IObservable<T> HysteresisFilter<T>(this IObservable<T> source, Func<T/*previously emitted*/, T/*current*/, bool> filter)
I was not able to figure out how to implement this with existing operators. I was looking for something like lift from RxJava, or any other way to create my own operator. I have seen this checklist, but I haven't found any example on the web.
The following two approaches (which are essentially the same) work, but they feel like workarounds to me. Is there a more idiomatic Rx way to do this, without wrapping a subject or actually implementing an operator?
async Task Main()
{
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
var rnd = new Random();
var s = Observable.Interval(TimeSpan.FromMilliseconds(10))
.Scan(0d, (a,_) => a + rnd.NextDouble() - 0.5)
.Publish()
.AutoConnect()
;
s.Subscribe(Console.WriteLine, cts.Token);
s.HysteresisFilter((p, c) => Math.Abs(p - c) > 1d).Subscribe(x => Console.WriteLine($"1> {x}"), cts.Token);
s.HysteresisFilter2((p, c) => Math.Abs(p - c) > 1d).Subscribe(x => Console.WriteLine($"2> {x}"), cts.Token);
await Task.Delay(Timeout.InfiniteTimeSpan, cts.Token).ContinueWith(_=>_, TaskContinuationOptions.OnlyOnCanceled);
}
public static class ReactiveOperators
{
public static IObservable<T> HysteresisFilter<T>(this IObservable<T> source, Func<T, T, bool> filter)
{
return new InternalHysteresisFilter<T>(source, filter).AsObservable;
}
public static IObservable<T> HysteresisFilter2<T>(this IObservable<T> source, Func<T, T, bool> filter)
{
var subject = new Subject<T>();
T lastEmitted = default;
bool emitted = false;
source.Subscribe(
value =>
{
if (!emitted || filter(lastEmitted, value))
{
subject.OnNext(value);
lastEmitted = value;
emitted = true;
}
}
, ex => subject.OnError(ex)
, () => subject.OnCompleted()
);
return subject;
}
private class InternalHysteresisFilter<T>: IObserver<T>
{
Func<T, T, bool> filter;
T lastEmitted;
bool emitted;
private readonly Subject<T> subject = new Subject<T>();
public IObservable<T> AsObservable => subject;
public InternalHysteresisFilter(IObservable<T> source, Func<T, T, bool> filter)
{
this.filter = filter;
source.Subscribe(this);
}
public IDisposable Subscribe(IObserver<T> observer)
{
return subject.Subscribe(observer);
}
public void OnNext(T value)
{
if (!emitted || filter(lastEmitted, value))
{
subject.OnNext(value);
lastEmitted = value;
emitted = true;
}
}
public void OnError(Exception error)
{
subject.OnError(error);
}
public void OnCompleted()
{
subject.OnCompleted();
}
}
}
Sidenote: There will be several thousand such filters applied to as many streams. I need throughput over latency, so I am looking for the solution with the least overhead in both CPU and memory, even if others look fancier.
Most examples I've seen in the book Introduction to Rx are using the method Observable.Create for creating new operators.
The Create factory method is the preferred way to implement custom observable sequences. The usage of subjects should largely remain in the realms of samples and testing. (citation)
public static IObservable<T> HysteresisFilter<T>(this IObservable<T> source,
Func<T, T, bool> predicate)
{
return Observable.Create<T>(observer =>
{
T lastEmitted = default;
bool emitted = false;
return source.Subscribe(value =>
{
if (!emitted || predicate(lastEmitted, value))
{
observer.OnNext(value);
lastEmitted = value;
emitted = true;
}
}, observer.OnError, observer.OnCompleted);
});
}
This answer is equivalent to @Theodor's, but it avoids using Observable.Create, which I would generally avoid.
public static IObservable<T> HysteresisFilter2<T>(this IObservable<T> source,
Func<T, T, bool> predicate)
{
return source
.Scan((emitted: default(T), isFirstItem: true, emit: false), (state, newItem) => state.isFirstItem || predicate(state.emitted, newItem)
? (newItem, false, true)
: (state.emitted, false, false)
)
.Where(t => t.emit)
.Select(t => t.emitted);
}
.Scan is what you want to use when you're tracking state across items within an observable.
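To see which values this kind of state tracking lets through without setting up an Rx pipeline, here is a pull-based analog over IEnumerable<T>. It only illustrates the same "compare against the last emitted value" logic and is not a replacement for the observable operators. The HysteresisFilterSketch class name is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HysteresisFilterSketch
{
    // Pull-based illustration: yields the first value, then only values for which
    // predicate(lastEmitted, current) returns true.
    public static IEnumerable<T> HysteresisFilter<T>(
        this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        T lastEmitted = default;
        bool emitted = false;
        foreach (T value in source)
        {
            if (!emitted || predicate(lastEmitted, value))
            {
                lastEmitted = value;
                emitted = true;
                yield return value;
            }
        }
    }

    public static int[] Demo()
    {
        int[] input = { 0, 1, 2, 3, 2, 5, 4, 7 };
        // Emit only when the value differs from the last emitted one by more than 1.
        return input.HysteresisFilter((p, c) => Math.Abs(p - c) > 1).ToArray(); // 0, 2, 5, 7
    }
}
```

Note that 3 is dropped even though it differs from the previous input by 1: the comparison is always against the last emitted value (2), not against the last input.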
I have the below method (it's an extension method but not relevant to this question) and I would like to use GroupBy on the results of the method.
class MyClass
{
public async Task<string> GetRank()
{
return "X";
}
public async static Task Test()
{
List<MyClass> items = new List<MyClass>() { new MyClass() };
var grouped = items.GroupBy(async _ => (await _.GetRank()));
}
}
The type of grouped is IGrouping<Task<string>, MyClass>, however I need to group by the actual awaited result of the async method (string). Despite using await and making the lambda async, I still get IGrouping<Task<string>, ..> instead of IGrouping<string, ...>
How to use GroupBy and group by a result of async Task<string> method and get a grouping by string?
You probably are looking to await all your tasks first, then group
// projection to task
var tasks = items.Select(y => AsyncMethod(y));
// Await them all
var results = await Task.WhenAll(tasks);
// group stuff
var groups = results.GroupBy(x => ...);
Full Demo here
Note: You didn't really have any testable code, so I just plumbed up something similar.
Update
The reason why your example isn't working
items.GroupBy(async _ => (await _.GetRank()))
is that an async lambda is really just a method that returns a task, which is why you are getting IGrouping<Task<string>, MyClass>.
You need to wait for all your tasks to finish before you can do anything with their results.
To further explain what is happening take a look at this SharpLab example
Your async lambda basically resolves to this
new Func<int, Task<string>>(<>c__DisplayClass1_.<M>b__0)
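A small sketch of the practical consequence, using a made-up GetRankAsync stand-in: grouping by the tasks themselves puts every item in its own group (the keys are distinct Task<string> objects compared by reference), while awaiting first and grouping by the string gives the expected groups.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class GroupByTaskDemo
{
    // Hypothetical stand-in for GetRank(): even numbers rank "A", odd numbers rank "B".
    static async Task<string> GetRankAsync(int n)
    {
        await Task.Yield();
        return n % 2 == 0 ? "A" : "B";
    }

    public static async Task<(int byTask, int byResult)> CountGroupsAsync()
    {
        int[] items = { 1, 2, 3, 4 };

        // Keys are Task<string> instances; each call creates a new task object,
        // so no two items ever share a key.
        int byTask = items.GroupBy(n => GetRankAsync(n)).Count();

        // Await everything first, then group by the actual string.
        var ranked = await Task.WhenAll(
            items.Select(async n => (Item: n, Rank: await GetRankAsync(n))));
        int byResult = ranked.GroupBy(x => x.Rank, x => x.Item).Count();

        return (byTask, byResult); // (4, 2)
    }
}
```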
Here is an asynchronous version of GroupBy. It expects a task as the result of keySelector, and returns a task that can be awaited:
public static async Task<IEnumerable<IGrouping<TKey, TSource>>>
GroupByAsync<TSource, TKey>(this IEnumerable<TSource> source,
Func<TSource, Task<TKey>> keySelector)
{
var tasks = source.Select(async item => (Key: await keySelector(item), Item: item));
var entries = await Task.WhenAll(tasks);
return entries.GroupBy(entry => entry.Key, entry => entry.Item);
}
It can be used like this:
class MyClass
{
public async Task<string> GetRank()
{
await Task.Delay(100);
return "X";
}
public async static Task Test()
{
var items = new List<MyClass>() { new MyClass(), new MyClass() };
var grouped = items.GroupByAsync(async _ => (await _.GetRank()));
foreach (var grouping in await grouped)
{
Console.WriteLine($"Key: {grouping.Key}, Count: {grouping.Count()}");
}
}
}
Output:
Key: X, Count: 2
Here's a dumbed-down version of what I want to do:
private static int Inc(int input)
{
return input + 1;
}
private static async Task<int> IncAsync(int input)
{
await Task.Delay(200);
return input + 1;
}
private static async Task<IEnumerable<TResult>> GetResultsAsync<TInput, TResult>(Func<TInput, TResult> func, IEnumerable<TInput> values)
{
var tasks = values.Select(value => Task.Run(() => func(value)))
.ToList();
await Task.WhenAll(tasks);
return tasks.Select(t => t.Result);
}
public async void TestAsyncStuff()
{
var numbers = new[] { 1, 2, 3, 4 };
var resultSync = await GetResultsAsync(Inc, numbers); // returns IEnumerable<int>
Console.WriteLine(string.Join(",", resultSync.Select(n => $"{n}")));
// The next line is the important one:
var resultAsync = await GetResultsAsync(IncAsync, numbers); // returns IEnumerable<Task<int>>
}
So basically, GetResultsAsync() is intended to be a generic method that will get the results of a function for a set of input values. In TestAsyncStuff() you can see how it would work for calling a synchronous function (Inc()).
The trouble comes when I want to call an asynchronous function (IncAsync()). The result I get back is of type IEnumerable<Task<int>>. I could do a Task.WhenAll() on that result, and that works:
var tasksAsync = (await GetResultsAsync(IncAsync, numbers)).ToList();
await Task.WhenAll(tasksAsync);
var resultAsync = tasksAsync.Select(t => t.Result);
Console.WriteLine(string.Join(",", resultAsync.Select(n => $"{n}")));
But I'd like to tighten up the code and do the await inline. It should look something like this:
var resultAsync = await GetResultsAsync(async n => await IncAsync(n), numbers);
But that also returns an IEnumerable<Task<int>>! I could do this:
var resultAsync = await GetResultsAsync(n => IncAsync(n).GetAwaiter().GetResult(), numbers);
And that works... but from what I've seen, use of Task.GetAwaiter().GetResult() or Task.Result is not encouraged.
So what is the correct way to do this?
You should create two overloads of GetResultsAsync. One should accept a 'synchronous' delegate which returns TResult. This method will wrap each delegate into a task, and run them asynchronously:
private static async Task<IEnumerable<TResult>> GetResultsAsync<TInput, TResult>(
Func<TInput, TResult> func, IEnumerable<TInput> values)
{
var tasks = values.Select(value => Task.Run(() => func(value)));
return await Task.WhenAll(tasks);
}
The second overload will accept an 'asynchronous' delegate, which returns Task<TResult>. This method doesn't need to wrap each delegate into a task, because they are already tasks:
private static async Task<IEnumerable<TResult>> GetResultsAsync<TInput, TResult>(
Func<TInput, Task<TResult>> func, IEnumerable<TInput> values)
{
var tasks = values.Select(value => func(value));
return await Task.WhenAll(tasks);
}
You can even call the second method from the first one to avoid code duplication:
private static async Task<IEnumerable<TResult>> GetResultsAsync<TInput, TResult>(
Func<TInput, TResult> func, IEnumerable<TInput> values)
{
return await GetResultsAsync(x => Task.Run(() => func(x)), values);
}
NOTE: These methods don't simplify your life a lot. The same results can be achieved with
var resultSync = await Task.WhenAll(numbers.Select(x => Task.Run(() => Inc(x))));
var resultAsync = await Task.WhenAll(numbers.Select(IncAsync));
I'd say that your concern is a stylistic one: you want something that reads better. For your first case consider:
var resultSync= numbers.AsParallel()/*.AsOrdered()*/.Select(Inc);
on the grounds that PLINQ already does what you're trying to do: it parallelizes IEnumerables. For your second case, there's no point in creating Tasks around Tasks. The equivalent would be:
var resultAsync = numbers.AsParallel()./*AsOrdered().*/Select(n => IncAsync(n).Result);
but I like Sergey's await Task.WhenAll(numbers.Select(IncAsync)) better.
Perhaps what I really like is a LINQ-style pair of overloads:
var numbers = Enumerable.Range(1,6);
var resultSync = await Enumerable.Range(1,6).SelectAsync(Inc);
var resultAsync = await Enumerable.Range(1,100).SelectAsync(IncAsync);
Console.WriteLine("sync" + string.Join(",", resultSync));
Console.WriteLine("async" + string.Join(",", resultAsync));
static class IEnumerableTasks
{
public static Task<TResult[]> SelectAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> func)
{
return Task.WhenAll( source.Select(async n => await Task.Run(()=> func(n))));
}
public static Task<TResult[]> SelectAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> func)
{
return Task.WhenAll(source.Select(func));
}
}
static int Inc(int input)
{
Task.Delay(1000).Wait();
return input+1;
}
static async Task<int> IncAsync(int input)
{
await Task.Delay(1000);
return input + 1;
}
Incidentally, changing Range(1,6) to Range(1,40) shows the advantage of async: on my machine, the timing for the sync version can rise steeply, whereas the async version stays at a second or so even for Range(1, 100000).
In C#6 I have the following extensions:
public static void With<T>(this T value, Action<T> action) {
action(value);
}
public static R With<T, R>(this T value, Func<T, R> function) {
return function(value);
}
Is there a way to have Async versions of these extensions?
UPDATE
I am adding an example to clarify. Consider (context is EF context):
IList<Post> posts = context.Posts.With(x => x.ToList());
Now how to do this if I would like to use ToListAsync?
IList<Post> posts = await context.Posts.WithAsync(x => x.ToListAsync());
Or
IList<Post> posts = context.Posts.WithAsync(x => await x.ToListAsync());
What should be the best approach and how would the extension look like?
I strongly suggest not using async/await in your extension methods, so that no state machine is generated. Just return the task, and wait on it or await it when you need the result.
You can use your second method for the async case too:
public static R With<T, R>(this T value, Func<T, R> function)
{
return function(value);
}
Or you can constrain the method to async use only:
public static R WithAsync<T, R>(this T value, Func<T, R> function)
where R : Task
{
return function(value);
}
I have a blog post on asynchronous delegate types. In summary, the async version of Action<T> is Func<T, Task>, and the async version of Func<T, R> is Func<T, Task<R>>.
I recommend you provide all overloads for maximum usability:
public static void With<T>(this T value, Action<T> action) {
action(value);
}
public static R With<T, R>(this T value, Func<T, R> function) {
return function(value);
}
public static Task With<T>(this T value, Func<T, Task> function) {
return function(value);
}
public static Task<R> With<T, R>(this T value, Func<T, Task<R>> function) {
return function(value);
}
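A short usage sketch of the two Task-returning shapes. The methods are named WithAsync here purely to keep the example self-contained and unambiguous; that naming, and the WithSketch class, are assumptions for the example rather than something the overload set above requires.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WithSketch
{
    // Async counterpart of Action<T>: the delegate returns a Task.
    public static Task WithAsync<T>(this T value, Func<T, Task> function) => function(value);

    // Async counterpart of Func<T, R>: the delegate returns a Task<R>.
    public static Task<R> WithAsync<T, R>(this T value, Func<T, Task<R>> function) => function(value);

    public static async Task<int> DemoAsync()
    {
        var log = new List<string>();

        // No result: binds to the Func<T, Task> overload, and the await
        // does not continue until the lambda has finished.
        await log.WithAsync(async l => { await Task.Delay(10); l.Add("done"); });

        // With a result: binds to the Func<T, Task<R>> overload with R = int.
        int length = await "hello".WithAsync(async s => { await Task.Delay(10); return s.Length; });

        return log.Count + length; // 1 + 5
    }
}
```

Neither method is marked async itself: they simply hand back the task produced by the delegate, and the caller decides when to await it.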
Just do it as with any other function:
public static async Task With<T>(this T value, Func<T, Task> action) {
await action(value);
}
public static async Task<R> With<T, R>(this T value, Func<T, Task<R>> function) {
return await function(value);
}
Make it async.
Make it return a Task. If you need an actual return type, use Task<InsertReturnTypeHere> instead of Task.
And for good measure, name it WithAsync. That allows With<T> to coexist with the async implementation, and it is also the common convention.
public static async Task WithAsync<T>(this T value, Func<T, Task> actionAsync)
{
await actionAsync(value);
}
public static void With<T>(this T value, Action<T> action) {
action(value);
}
Have your Action schedule a Task itself. With does not expect any value in return so it doesn't have to care how the action is run.
public static R With<T, R>(this T value, Func<T, R> function) {
return function(value);
}
Supply a function which returns a Task. You can use it like var y = await x.With(async z => { /* ... */ });.
Conclusion: you do not need to make any changes.
It depends on the amount of processing you intend to do and how you intend for it to be processed.
Do you need a Thread? If so, then using Task provides a good alternative to Thread.
Otherwise, there are quite a few threads which may already be available in the Thread Pool for you to use (see this question). You can access these threads using 'BeginInvoke'.
static void _TestLogicForBeginInvoke(int i)
{
System.Threading.Thread.Sleep(10);
System.Console.WriteLine("Tested");
}
static void _Callback(IAsyncResult iar)
{
System.Threading.Thread.Sleep(10);
System.Console.WriteLine("Callback " + iar.CompletedSynchronously);
}
static void TestBeginInvoke()
{
//Callback is written after Tested and NotDone.
var call = new System.Action<int>(_TestLogicForBeginInvoke);
//Start the call
var callInvocation = call.BeginInvoke(0, _Callback, null);
//Write output
System.Console.WriteLine("Output");
int times = 0;
//Wait for the call to be completed a few times
while (false == callInvocation.IsCompleted && ++times < 10)
{
System.Console.WriteLine("NotDone");
}
//Probably still not completed.
System.Console.WriteLine("IsCompleted " + callInvocation.IsCompleted);
//Can only be called once; should be called to free the thread assigned to calling the logic associated with BeginInvoke and the callback.
call.EndInvoke(callInvocation);
}//Callback
The output should be:
Output
NotDone
NotDone
NotDone
NotDone
NotDone
NotDone
NotDone
NotDone
NotDone
IsCompleted False
Tested
Callback False
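Any 'Delegate' type you define can be invoked on the Thread Pool using the 'BeginInvoke' method of the delegate instance. Note that this works on .NET Framework only: on .NET Core and later, delegate BeginInvoke throws PlatformNotSupportedException. See also MSDN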
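For comparison, the Task-based alternative mentioned at the top of this answer could look like the sketch below. Task.Run queues the delegate to the thread pool, and the returned task can be awaited instead of pairing BeginInvoke with EndInvoke. The class and method names are invented for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadPoolSketch
{
    // Same shape as the test logic above, but reporting where it ran.
    static bool TestLogic(int i)
    {
        Thread.Sleep(10);
        return Thread.CurrentThread.IsThreadPoolThread;
    }

    public static async Task<bool> RunOnPoolAsync()
    {
        // Start the call on a thread pool thread.
        Task<bool> call = Task.Run(() => TestLogic(0));

        // The caller keeps running while the work is in progress.
        Console.WriteLine("Output");

        // Awaiting takes the place of EndInvoke and also rethrows any exception.
        bool ranOnPool = await call;
        Console.WriteLine("IsCompleted " + call.IsCompleted);
        return ranOnPool;
    }
}
```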
I have a function like this:
public async Task<SomeViewModel> SampleFunction()
{
var data = service.GetData();
var myList = new List<SomeViewModel>();
myList.AddRange(data.Select(x => new SomeViewModel
{
Id = x.Id,
DateCreated = x.DateCreated,
Data = await service.GetSomeDataById(x.Id)
}));
return myList;
}
My await isn't working as it can only be used in a method or lambda marked with the async modifier. Where do I place the async with this function?
You can only use await inside an async method/delegate. In this case you must mark that lambda expression as async.
But wait, there's more...
Select is from the pre-async era and so it doesn't handle async lambdas (in your case it would return IEnumerable<Task<SomeViewModel>> instead of IEnumerable<SomeViewModel> which is what you actually need).
You can, however, add that functionality yourself (preferably as an extension method), but you need to consider whether you wish to await each item before moving on to the next (sequentially) or await all items together at the end (concurrently).
Sequential async
static async Task<TResult[]> SelectAsync<TItem, TResult>(this IEnumerable<TItem> enumerable, Func<TItem, Task<TResult>> selector)
{
var results = new List<TResult>();
foreach (var item in enumerable)
{
results.Add(await selector(item));
}
return results.ToArray();
}
Concurrent async
static Task<TResult[]> SelectAsync<TItem, TResult>(this IEnumerable<TItem> enumerable, Func<TItem, Task<TResult>> selector)
{
return Task.WhenAll(enumerable.Select(selector));
}
Usage
public Task<SomeViewModel[]> SampleFunction()
{
return service.GetData().SelectAsync(async x => new SomeViewModel
{
Id = x.Id,
DateCreated = x.DateCreated,
Data = await service.GetSomeDataById(x.Id)
});
}
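To make the difference between the two variants visible, the sketch below gives them distinct (hypothetical) names and records how many selector calls are in flight at the same time. The Work function and the 50 ms delay are stand-ins for a real call such as service.GetSomeDataById.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SelectAsyncComparison
{
    public static async Task<TResult[]> SelectSequentialAsync<TItem, TResult>(
        this IEnumerable<TItem> items, Func<TItem, Task<TResult>> selector)
    {
        var results = new List<TResult>();
        foreach (var item in items) results.Add(await selector(item));
        return results.ToArray();
    }

    public static Task<TResult[]> SelectConcurrentAsync<TItem, TResult>(
        this IEnumerable<TItem> items, Func<TItem, Task<TResult>> selector)
        => Task.WhenAll(items.Select(selector));

    // Returns the highest number of selector calls that were in flight at once.
    public static async Task<int> MaxInFlightAsync(bool concurrent)
    {
        var gate = new object();
        int inFlight = 0, peak = 0;

        async Task<int> Work(int x)
        {
            lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
            await Task.Delay(50); // stand-in for an asynchronous service call
            lock (gate) { inFlight--; }
            return x * 2;
        }

        var input = new[] { 1, 2, 3 };
        if (concurrent) await input.SelectConcurrentAsync(x => Work(x));
        else await input.SelectSequentialAsync(x => Work(x));
        return peak;
    }
}
```

With the sequential variant the peak stays at 1, because each call is awaited before the next one starts. With the concurrent variant all three calls are started before the first delay elapses, so the peak is 3 and the total time is roughly that of a single call.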
You're using await inside of a lambda, and that lambda is going to be transformed into its own separate named method by the compiler. To use await it must itself be async, and not just be defined in an async method. When you make the lambda async you now have a sequence of tasks that you want to translate into a sequence of their results, asynchronously. Task.WhenAll does exactly this, so we can pass our new query to WhenAll to get a task representing our results, which is exactly what this method wants to return:
public Task<SomeViewModel[]> SampleFunction()
{
return Task.WhenAll(service.GetData().Select(
async x => new SomeViewModel
{
Id = x.Id,
DateCreated = x.DateCreated,
Data = await service.GetSomeDataById(x.Id)
}));
}
Though maybe too heavyweight for your use case, using TPL Dataflow will give you finer control over your async processing.
public async Task<List<SomeViewModel>> SampleFunction()
{
var data = service.GetData();
var transformBlock = new TransformBlock<X, SomeViewModel>(
async x => new SomeViewModel
{
Id = x.Id,
DateCreated = x.DateCreated,
Data = await service.GetSomeDataById(x.Id)
},
new ExecutionDataflowBlockOptions
{
// Let 8 "service.GetSomeDataById" calls run at once.
MaxDegreeOfParallelism = 8
});
var result = new List<SomeViewModel>();
var actionBlock = new ActionBlock<SomeViewModel>(
vm => result.Add(vm));
transformBlock.LinkTo(actionBlock,
new DataflowLinkOptions { PropagateCompletion = true });
foreach (var x in data)
{
transformBlock.Post(x);
}
transformBlock.Complete();
await actionBlock.Completion;
return result;
}
This could be substantially less long-winded if service.GetData() returned an IObservable<X> and this method returned an IObservable<SomeViewModel>.