Parallel ForEach wait 500 ms before spawning - c#

I have this situation:
var tasks = new List<ITask> ...
Parallel.ForEach(tasks, currentTask => currentTask.Execute() );
Is it possible to instruct PLinq to wait for 500ms before the next thread is spawned?
System.Threading.Thread.Sleep(5000);

You are using Parallel.Foreach totally wrong, You should make a special Enumerator that rate limits itself to getting data once every 500 ms.
I made some assumptions on how your DTO works due to you not providing any details.
private IEnumerator<SomeResource> GetRateLimitedResource()
{
SomeResource someResource = null;
do
{
someResource = _remoteProvider.GetData();
if(someResource != null)
{
yield return someResource;
Thread.Sleep(500);
}
} while (someResource != null);
}
here is how your paralell should look then
Parallel.ForEach(GetRateLimitedResource(), SomeFunctionToProcessSomeResource);

There are already some good suggestions. I would agree with others that you are using PLINQ in a manner it wasn't meant to be used.
My suggestion would be to use System.Threading.Timer. This is probably better than writing a method that returns an IEnumerable<> that forces a half second delay, because you may not need to wait the full half second, depending on how much time has passed since your last API call.
With the timer, it will invoke a delegate that you've provided it at the interval you specify, so even if the first task isn't done, a half second later it will invoke your delegate on another thread, so there won't be any extra waiting.
From your example code, it sounds like you have a list of tasks, in this case, I would use System.Collections.Concurrent.ConcurrentQueue to keep track of the tasks. Once the queue is empty, turn off the timer.

You could use Enumerable.Aggregate instead.
var task = tasks.Aggregate((t1, t2) =>
t1.ContinueWith(async _ =>
{ Thread.Sleep(500); return t2.Result; }));
If you don't want the tasks chained then there is also the overload to Select assuming the tasks are in order of delay.
var tasks = Enumerable
.Range(1, 10)
.Select(x => Task.Run(() => x * 2))
.Select((x, i) => Task.Delay(TimeSpan.FromMilliseconds(i * 500))
.ContinueWith(_ => x.Result));
foreach(var result in tasks.Select(x => x.Result))
{
Console.WriteLine(result);
}
From the comments a better options would be to guard the resource instead of using the time delay.
static object Locker = new object();
static int GetResultFromResource(int arg)
{
lock(Locker)
{
Thread.Sleep(500);
return arg * 2;
}
}
var tasks = Enumerable
.Range(1, 10)
.Select(x => Task.Run(() => GetResultFromResource(x)));
foreach(var result in tasks.Select(x => x.Result))
{
Console.WriteLine(result);
}

In this case how about a Producer-Consumer pattern with a BlockingCollection<T>?
var tasks = new BlockingCollection<ITask>();
// add tasks, if this is an expensive process, put it out onto a Task
// tasks.Add(x);
// we're done producin' (allows GetConsumingEnumerable to finish)
tasks.CompleteAdding();
RunTasks(tasks);
With a single consumer thread:
static void RunTasks(BlockingCollection<ITask> tasks)
{
foreach (var task in tasks.GetConsumingEnumerable())
{
task.Execute();
// this may not be as accurate as you would like
Thread.Sleep(500);
}
}
If you have access to .Net 4.5 you can use Task.Delay:
static void RunTasks(BlockingCollection<ITask> tasks)
{
foreach (var task in tasks.GetConsumingEnumerable())
{
Task.Delay(500)
.ContinueWith(() => task.Execute())
.Wait();
}
}

Related

Executing N number of threads in parallel and in a sequential manner

I have an application where i have 1000+ small parts of 1 large file.
I have to upload maximum of 16 parts at a time.
I used Thread parallel library of .Net.
I used Parallel.For to divide in multiple parts and assigned 1 method which should be executed for each part and set DegreeOfParallelism to 16.
I need to execute 1 method with checksum values which are generated by different part uploads, so i have to set certain mechanism where i have to wait for all parts upload say 1000 to complete.
In TPL library i am facing 1 issue is it is randomly executing any of the 16 threads from 1000.
I want some mechanism using which i can run first 16 threads initially, if the 1st or 2nd or any of the 16 thread completes its task next 17th part should be started.
How can i achieve this ?
One possible candidate for this can be TPL Dataflow. This is a demonstration which takes in a stream of integers and prints them out to the console. You set the MaxDegreeOfParallelism to whichever many threads you wish to spin in parallel:
void Main()
{
var actionBlock = new ActionBlock<int>(
i => Console.WriteLine(i),
new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 16});
foreach (var i in Enumerable.Range(0, 200))
{
actionBlock.Post(i);
}
}
This can also scale well if you want to have multiple producer/consumers.
Here is the manual way of doing this.
You need a queue. The queue is sequence of pending tasks. You have to dequeue and put them inside list of working task. When ever the task is done remove it from list of working task and take another from queue. Main thread controls this process. Here is the sample of how to do this.
For the test i used List of integer but it should work for other types because its using generics.
private static void Main()
{
Random r = new Random();
var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
ParallelQueue(items, DoWork);
}
private static void ParallelQueue<T>(List<T> items, Action<T> action)
{
Queue pending = new Queue(items);
List<Task> working = new List<Task>();
while (pending.Count + working.Count != 0)
{
if (pending.Count != 0 && working.Count < 16) // Maximum tasks
{
var item = pending.Dequeue(); // get item from queue
working.Add(Task.Run(() => action((T)item))); // run task
}
else
{
Task.WaitAny(working.ToArray());
working.RemoveAll(x => x.IsCompleted); // remove finished tasks
}
}
}
private static void DoWork(int i) // do your work here.
{
// this is just an example
Task.Delay(i).Wait();
Console.WriteLine(i);
}
Please let me know if you encounter problem of how to implement DoWork for your self. because if you change method signature you may need to do some changes.
Update
You can also do this with async await without blocking the main thread.
private static void Main()
{
Random r = new Random();
var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
Task t = ParallelQueue(items, DoWork);
// able to do other things.
t.Wait();
}
private static async Task ParallelQueue<T>(List<T> items, Func<T, Task> func)
{
Queue pending = new Queue(items);
List<Task> working = new List<Task>();
while (pending.Count + working.Count != 0)
{
if (working.Count < 16 && pending.Count != 0)
{
var item = pending.Dequeue();
working.Add(Task.Run(async () => await func((T)item)));
}
else
{
await Task.WhenAny(working);
working.RemoveAll(x => x.IsCompleted);
}
}
}
private static async Task DoWork(int i)
{
await Task.Delay(i);
}
var workitems = ... /*e.g. Enumerable.Range(0, 1000000)*/;
SingleItemPartitioner.Create(workitems)
.AsParallel()
.AsOrdered()
.WithDegreeOfParallelism(16)
.WithMergeOptions(ParallelMergeOptions.NotBuffered)
.ForAll(i => { Thread.Slee(1000); Console.WriteLine(i); });
This should be all you need. I forgot how the methods are named exactly... Look at the documentation.
Test this by printing to the console after sleeping for 1sec (which this sample code does).
Another option would be to use a BlockingCollection<T> as a queue between your file reader thread and your 16 uploader threads. Each uploader thread would just loop around consuming the blocking collection until it is complete.
And, if you want to limit memory consumption in the queue you can set an upper limit on the blocking collection such that the file-reader thread will pause when the buffer has reached capacity. This is particularly useful in a server environment where you may need to limit memory used per user/API call.
// Create a buffer of 4 chunks between the file reader and the senders
BlockingCollection<Chunk> queue = new BlockingCollection<Chunk>(4);
// Create a cancellation token source so you can stop this gracefully
CancellationTokenSource cts = ...
File reader thread
...
queue.Add(chunk, cts.Token);
...
queue.CompleteAdding();
Sending threads
for(int i = 0; i < 16; i++)
{
Task.Run(() => {
foreach (var chunk in queue.GetConsumingEnumerable(cts.Token))
{
.. do the upload
}
});
}

Handle tasks which complete after Task.WhenAll().Wait() specified timeout

I am trying to use Task.WhenAll(tasks).Wait(timeout) to wait for tasks to complete and after that process task results.
Consider this example:
var tasks = new List<Task<Foo>>();
tasks.Add(Task.Run(() => GetData1()));
tasks.Add(Task.Run(() => GetData2()));
Task.WhenAll(tasks).Wait(TimeSpan.FromSeconds(5));
var completedTasks = tasks
.Where(t => t.Status == TaskStatus.RanToCompletion)
.Select(t => t.Result)
.ToList();
// Process completed tasks
// ...
private Foo GetData1()
{
Thread.Sleep(TimeSpan.FromSeconds(4));
return new Foo();
}
private Foo GetData2()
{
Thread.Sleep(TimeSpan.FromSeconds(10));
// How can I get the result of this task once it completes?
return new Foo();
}
It is possible that one of these tasks will not complete their execution within 5 second timeout.
Is it possible to somehow process results of the tasks that have completed after specified timeout? Maybe I am not using right approach in this situation?
EDIT:
I am trying to get all task results that managed to complete within specified timeout. There could be the following outcomes after Task.WhenAll(tasks).Wait(TimeSpan.FromSeconds(5)):
First task completes within 5 seconds.
Second task completes within 5 seconds.
Both tasks complete within 5 seconds.
None of the tasks complete within 5 seconds. Is it possible to get task results that haven't completed within 5 seconds, but have completed later, lets say, after 10 seconds?
In the end with help of the user who removed his answer, I ended up with this solution:
private const int TimeoutInSeconds = 5;
private static void Main(string[] args)
{
var tasks = new List<Task>()
{
Task.Run( async() => await Task.Delay(30)),
Task.Run( async() => await Task.Delay(300)),
Task.Run( async() => await Task.Delay(6000)),
Task.Run( async() => await Task.Delay(8000))
};
Task.WhenAll(tasks).Wait(TimeSpan.FromSeconds(TimeoutInSeconds));
var completedTasks = tasks
.Where(t => t.Status == TaskStatus.RanToCompletion).ToList();
var incompleteTasks = tasks
.Where(t => t.Status != TaskStatus.RanToCompletion).ToList();
Task.WhenAll(incompleteTasks)
.ContinueWith(t => { ProcessDelayedTasks(incompleteTasks); });
ProcessCompletedTasks(completedTasks);
Console.ReadKey();
}
private static void ProcessCompletedTasks(IEnumerable<Task> delayedTasks)
{
Console.WriteLine("Processing completed tasks...");
}
private static void ProcessDelayedTasks(IEnumerable<Task> delayedTasks)
{
Console.WriteLine("Processing delayed tasks...");
}
Instead of Waitall, you probably just want to do some sort of Spin/sleep of 5 seconds and then query the list as you are above.
You should then be able to enumerate again after a few more seconds to see what else has finished.
If performance is a concern, you may want to have additional 'wrapping' to see if All tasks have completed before 5 seconds.
I think there's a possible loss of task items between
var completedTasks = tasks.Where(t => t.Status == TaskStatus.RanToCompletion).ToList();
and
var incompleteTasks = tasks.Where(t => t.Status != TaskStatus.RanToCompletion).ToList();
because some tasks may ran to completition during this time.
As a workaround (not correct though) you coud swap these lines. In this case some tasks may present in each (completedTasks and incompleteTasks) list. But maybe it's better than to be lost completely.
A unit test to compare number of started tasks and number of tasks in completedTasks and incompleteTasks lists may also be useful.

How to use Rx to deliver events on a schedule?

I'm using Reactive Extensions for C#. I want several threads to enqueue items on a ConcurrentQueue. Then I want to Subscribe to that queue, but only get 1 element every 1 second. This answer almost works, but not when I add more elements to the queue.
Given a queue of ints: [1, 2, 3, 4, 5, 6]. I want Subscribe(Console.WriteLine) to print a value every second. I want to add more ints from another thread onto the queue while Rx is printing these numbers out. Any ideas?
To pace an input stream to output no faster than at a rate described by a Timespan interval, use this:
var paced = input.Select(i => Observable.Empty<T>()
.Delay(interval)
.StartWith(i)).Concat();
See here for an explanation. Here's an example implementation tailored to a concurrent queue that dequeues quickly. Note that using the ToObservable extension of IEnumerable<T> to convert ConcurrentQueue<T> to an observable directly would be a mistake, because sadly this observable completes as soon as the queue is empty. It's jolly annoying that - at least as far as I can see - there's no asynchronous dequeue on a ConcurrentQueue<T> and so I had to introduce a polling mechanism. Other abstractions (e.g. BlockingCollection<T>) may serve you better!
public static class ObservableExtensions
{
public static IObservable<T> Pace<T>(this ConcurrentQueue<T> queue,
TimeSpan interval)
{
var source = Observable.Create<T>(async (o, ct) => {
while(!ct.IsCancellationRequested)
{
T next;
while(queue.TryDequeue(out next))
o.OnNext(next);
// You might want to use some arbitrary shorter interval here
// to allow the stream to resume after a long delay in source
// events more promptly
await Task.Delay(interval, ct);
}
ct.ThrowIfCancellationRequested();
});
// this does the pacing
return source.Select(i => Observable.Empty<T>()
.Delay(interval)
.StartWith(i)).Concat()
.Publish().RefCount(); // to allow multiple subscribers
}
}
Example usage:
public static void Main()
{
var queue = new ConcurrentQueue<int>();
var stopwatch = new Stopwatch();
queue.Pace(TimeSpan.FromSeconds(1))
.Subscribe(
x => Console.WriteLine(stopwatch.ElapsedMilliseconds + ": x" + x),
e => Console.WriteLine(e.Message),
() => Console.WriteLine("Done"));
stopwatch.Start();
queue.Enqueue(1);
queue.Enqueue(2);
Thread.Sleep(500);
queue.Enqueue(3);
Thread.Sleep(5000);
queue.Enqueue(4);
queue.Enqueue(5);
queue.Enqueue(6);
Console.ReadLine();
}
May be you will satisfied with one of Observable.Buffer overload. But consider not to use buffering with long running subsriptions because buffered elements can stress your RAM.
You can also build you own extension method with any desired behavior using Observable.Generate
void Main()
{
var queue = new ConcurrentQueue<int>();
queue.Enqueue(1);
queue.Enqueue(2);
queue.Enqueue(3);
queue.Enqueue(4);
queue.ObserveEach(TimeSpan.FromSeconds(1)).DumpLive("queue");
}
// Define other methods and classes here
public static class Ex {
public static IObservable<T> ObserveConcurrentQueue<T>(this ConcurrentQueue<T> queue, TimeSpan period)
{
return Observable
.Generate(
queue,
x => true,
x => x,
x => x.DequeueOrDefault(),
x => period)
.Where(x => !x.Equals(default(T)));
}
public static T DequeueOrDefault<T>(this ConcurrentQueue<T> queue)
{
T result;
if (queue.TryDequeue(out result))
return result;
else
return default(T);
}
}

Speculative execution using the TPL

I have a List<Task<bool>> that I want to enumerate in parallel finding the first task to complete with a result of true and not waiting for or observe exceptions on any of the other tasks still pending.
var tasks = new List<Task<bool>>
{
Task.Delay(2000).ContinueWith(x => false),
Task.Delay(0).ContinueWith(x => true),
};
I have tried to use PLINQ to do something like:
var task = tasks.AsParallel().FirstOrDefault(t => t.Result);
Which executes in parallel, but doesn't return as soon as it finds a satisfying result. because accessing the Result property is blocking. In order for this to work using PLINQ, I'd have to write this aweful statement:
var cts = new CancellationTokenSource();
var task = tasks.AsParallel()
.FirstOrDefault(t =>
{
try
{
t.Wait(cts.Token);
if (t.Result)
{
cts.Cancel();
}
return t.Result;
}
catch (OperationCanceledException)
{
return false;
}
} );
I've written up an extension method that yields tasks as they complete like so.
public static class Exts
{
public static IEnumerable<Task<T>> InCompletionOrder<T>(this IEnumerable<Task<T>> source)
{
var tasks = source.ToList();
while (tasks.Any())
{
var t = Task.WhenAny(tasks);
yield return t.Result;
tasks.Remove(t.Result);
}
}
}
// and run like so
var task = tasks.InCompletionOrder().FirstOrDefault(t => t.Result);
But it feels like this is something common enough that there is a better way. Suggestions?
Maybe something like this?
var tcs = new TaskCompletionSource<Task<bool>>();
foreach (var task in tasks)
{
task.ContinueWith((t, state) =>
{
if (t.Result)
{
((TaskCompletionSource<Task<bool>>)state).TrySetResult(t);
}
},
tcs,
TaskContinuationOptions.OnlyOnRanToCompletion |
TaskContinuationOptions.ExecuteSynchronously);
}
var firstTaskToComplete = tcs.Task;
Perhaps you could try the Rx.Net library. Its very good for in effect providing Linq to Work.
Try this snippet in LinqPad after you reference the Microsoft Rx.Net assemblies.
using System
using System.Linq
using System.Reactive.Concurrency
using System.Reactive.Linq
using System.Reactive.Threading.Tasks
using System.Threading.Tasks
void Main()
{
var tasks = new List<Task<bool>>
{
Task.Delay(2000).ContinueWith(x => false),
Task.Delay(0).ContinueWith(x => true),
};
var observable = (from t in tasks.ToObservable()
//Convert task to an observable
let o = t.ToObservable()
//SelectMany
from x in o
select x);
var foo = observable
.SubscribeOn(Scheduler.Default) //Run the tasks on the threadpool
.ToList()
.First();
Console.WriteLine(foo);
}
First, I don't understand why are you trying to use PLINQ here. Enumerating a list of Tasks shouldn't take long, so I don't think you're going to gain anything from parallelizing it.
Now, to get the first Task that already completed with true, you can use the (non-blocking) IsCompleted property:
var task = tasks.FirstOrDefault(t => t.IsCompleted && t.Result);
If you wanted to get a collection of Tasks, ordered by their completion, have a look at Stephen Toub's article Processing tasks as they complete. If you want to list those that return true first, you would need to modify that code. If you don't want to modify it, you can use a version of this approach from Stephen Cleary's AsyncEx library.
Also, in the specific case in your question, you could “fix” your code by adding .WithMergeOptions(ParallelMergeOptions.NotBuffered) to the PLINQ query. But doing so still wouldn't work most of the time and can waste threads a lot even when it does. That's because PLINQ uses a constant number of threads and partitioning and using Result would block those threads most of the time.

List of objects with async Task methods, execute all concurrently

Given the following:
BlockingCollection<MyObject> collection;
public class MyObject
{
public async Task<ReturnObject> DoWork()
{
(...)
return await SomeIOWorkAsync();
}
}
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit though (I believe the Task Factory/ThreadPool does some management here)?
You can make use of the WhenAll extension method.
var combinedTask = await Task.WhenAll(collection.Select(x => x.DoWork());
It will start all tasks concurrently and waits for all to finish.
ThreadPool manages the number of threads running, but that won't help you much with asynchronous Tasks.
Because of that, you need something else. One way to do this is to utilize ActionBlock from TPL Dataflow:
int limit = …;
IEnumerable<MyObject> collection = …;
var block = new ActionBlock<MyObject>(
o => o.DoWork(),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });
foreach (var obj in collection)
block.Post(o);
block.Complete();
await block.Completion;
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit
The easiest way to do that is with Task.WhenAll:
ReturnObject[] results = await Task.WhenAll(collection.Select(x => x.DoWork()));
This will invoke DoWork on all MyObjects in the collection and then wait for them all to complete. The thread pool handles all throttling sensibly.
Is there a different way if I want to capture every individual DoWork() return immediately instead of waiting for all items to complete?
Yes, you can use the method described by Jon Skeet and Stephen Toub. I have a similar solution in my AsyncEx library (available via NuGet), which you can use like this:
// "tasks" is of type "Task<ReturnObject>[]"
var tasks = collection.Select(x => x.DoWork()).OrderByCompletion();
foreach (var task in tasks)
{
var result = await task;
...
}
My comment was a bit cryptic, so I though I'd add this answer:
List<Task<ReturnObject>> workTasks =
collection.Select( o => o.DoWork() ).ToList();
List<Task> resultTasks =
workTasks.Select( o => o.ContinueWith( t =>
{
ReturnObject r = t.Result;
// do something with the result
},
// if you want to run this on the UI thread
TaskScheduler.FromCurrentSynchronizationContext()
)
)
.ToList();
await Task.WhenAll( resultTasks );

Categories

Resources