I have a TransformManyBlock that creates many "actors". They flow through several TransformBlocks of processing. Once all of the actors have completed all of the steps I need to process everything as a whole again.
The final
var CopyFiles = new TransformBlock<Actor, Actor>(async actor =>
{
//Copy all fo the files and then wait until they are all done
await actor.CopyFiles();
//pass me along to the next process
return actor;
}, new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = -1 });
How do I detect when all of the Actors have been processed? Completions seems to be propagated immediately and only tell you there are no more items to process. That doesn't tell me when the last Actor is finished processing.
Completion in TPL Dataflow is done by calling Complete which returns immediately and signals completion. That makes the block refuse further messages but it continues processing the items it already contain.
When a block completes processing all its items it completes its Completion task. You can await that task to be notified when all the block's work has been done.
When you link blocks together (with PropogateCompletion turned on) you only need to call Complete on the first and await the last one's Completion property:
var copyFilesBlock = new TransformBlock<Actor, Actor>(async actor =>
{
await actor.CopyFilesAsync();
return actor;
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = -1 });
// Fill copyFilesBlock
copyFilesBlock.Complete();
await copyFilesBlock.Completion;
Related
I need your help, with threads I'm full 0 and you only need to create a certain thread and complete it on command, BUT I do not create each thread in advance, as there will be a lot of them, I do it like this:
Thread thread = new Thread(() => Go(..... many many variables that are taken from the listview ......));
thread.Start();
So, as noted above, variables are taken from the listview, which in turn is loaded by me from the file and then I run the threads I need. BUT the process in the stream is infinite and will end only if I completely close the program, and I would like to end the stream in the same way as I started it (right click on the desired line-start/stop). As I said, I have never worked with threads and thought that it was somehow simple, like when you start a thread, you assign it an ID and end it with the same ID, but alas. I have searched all over Google and have not found an EXAMPLE that suits me (I will repeat for the third time - I have never worked with threads and I do not need to say "go read about TPL"), so I ask for help, preferably with an example)
I have a very bad idea: in the sheet there is an invisible column in which an id is generated at the start, then when I send a command to start the thread, a unique variable is created with the name for example int id1=0 and its name is passed to the thread itself and each time the loop starts, id1=0 or 1 is checked in it, respectively, if 0-continue, if 1-empty. Well, it is logical that when you click the stop button, its value changes to 1. But something seems to me that the holy spirit of multithreading will punish me for this when the threads become 100+. I read this idea somewhere, so don't swear)
You do not need hundreds of threads for this. Your worker "threads" are performing HTTP requests, which can be done asynchronously without requiring a new thread. Also, hundreds of threads wouldn't really help you unless you have hundreds of CPU cores (you don't).
For this sort of work, I'd recommend the following:
Write a method that does all the work your thread does, but also checks a CancellationToken with each iteration.
Calls the method in a loop, once for each account, and store the resulting tasks in an array or list. Or use LINQ (as I do in this example) to create the list.
When your program terminates, activate the CancellationToken.
After cancelling, you have to await all the tasks in order to observe any possible exceptions and exit cleanly.
For example
public async Task DoTheWork(Account account, CancellationToken token)
{
while (!token.IsCancellationRequested)
{
var result = await httpClient.GetAsync(account.Url);
await DoSomethingWithResult(result);
await Task.Delay(1000);
}
}
//Main program
var accounts = GetAccountList();
var source = new CancellationTokenSource();
var tasks = accounts.Select( x => DoTheWork(x, source.Token) ).ToList();
//When exiting
source.Cancel();
await Task.WhenAll( tasks );
source.Dispose();
Indivivdual cancellation
Here's another approach that keeps a list of the accounts and a delegate that can be used for cancelling the task for that specific account.
//Declare this somewhere it will persist for the duration of the program
//The key to this dictionary is the account you wish to cancel
//The value is a delegate that you can call to cancel its task
Dictionary<Account, Func<Task>> _tasks = new Dictionary<Account, Func<Task>>();
async Task CreateTasks()
{
var accounts = GetAccounts();
foreach (var account in accounts)
{
var source = new CancellationTokenSource();
var task = DoTheWork(account, source.Token);
_tasks.Add(account, () => { source.Cancel(); return task; });
}
}
//Retrieve the delegate from the dictionary and call it to cancel its task
//Then await the task to observe any exceptions
//Then remove it from the list
async Task CancelTask(Account account)
{
var cancelAction = _tasks[account];
var task = cancelAction();
await task;
_tasks.Remove(account);
}
async Task CancelAllTasks()
{
var tasks = _tasks.Select(x => x.Value()).ToList();
await Task.WhenAll(tasks);
}
I have a problem with determining how to detect completion within a looping TPL Dataflow.
I have a feedback loop in part of a dataflow which is making GET requests to a remote server and processing data responses (transforming these with more dataflow then committing the results).
The data source splits its results into pages of 1000 records, and won't tell me how many pages it has available for me. I have to just keep reading until i get less than a full page of data.
Usually the number of pages is 1, frequently it is up to 10, every now and again we have 1000s.
I have many requests to fetch at the start.
I want to be able to use a pool of threads to deal with this, all of which is fine, I can queue multiple requests for data and request them concurrently. If I stumble across an instance where I need to get a big number of pages I want to be using all of my threads for this. I don't want to be left with one thread churning away whilst the others have finished.
The issue I have is when I drop this logic into dataflow, such as:
//generate initial requests for activity
var request = new TransformManyBlock<int, DataRequest>(cmp => QueueRequests(cmp));
//fetch the initial requests and feedback more requests to our input buffer if we need to
TransformBlock<DataRequest, DataResponse> fetch = null;
fetch = new TransformBlock<DataRequest, DataResponse>(async req =>
{
var resp = await Fetch(req);
if (resp.Results.Count == 1000)
await fetch.SendAsync(QueueAnotherRequest(req));
return resp;
}
, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });
//commit each type of request
var commit = new ActionBlock<DataResponse>(async resp => await Commit(resp));
request.LinkTo(fetch);
fetch.LinkTo(commit);
//when are we complete?
QueueRequests produces an IEnumerable<DataRequest>. I queue the next N page requests at once, accepting that this means I send slightly more calls than I need to. DataRequest instances share a LastPage counter to avoid neadlessly making requests that we know are after the last page. All this is fine.
The problem:
If I loop by feeding back more requests into fetch's input buffer as I've shown in this example, then i have a problem with how to signal (or even detect) completion. I can't set completion on fetch from request, as once completion is set I can't feedback any more.
I can monitor for the input and output buffers being empty on fetch, but I think I'd be risking fetch still being busy with a request when I set completion, thus preventing queuing requests for additional pages.
I could do with some way of knowing that fetch is busy (either has input or is busy processing an input).
Am I missing an obvious/straightforward way to solve this?
I could loop within fetch, rather than queuing more requests. The problem with that is I want to be able to use a set maximum number of threads to throttle what I'm doing to the remote server. Could a parallel loop inside the block share a scheduler with the block itself and the resulting thread count be controlled via the scheduler?
I could create a custom transform block for fetch to handle the completion signalling. Seems like a lot of work for such a simple scenario.
Many thanks for any help offered!
In TPL Dataflow, you can link the blocks with DataflowLinkOptions with specifying the propagation of completion of the block:
request.LinkTo(fetch, new DataflowLinkOptions { PropagateCompletion = true });
fetch.LinkTo(commit, new DataflowLinkOptions { PropagateCompletion = true });
After that, you simply call the Complete() method for the request block, and you're done!
// the completion will be propagated to all the blocks
request.Complete();
The final thing you should use is Completion task property of the last block:
commit.Completion.ContinueWith(t =>
{
/* check the status of the task and correctness of the requests handling */
});
For now I have added a simple busy state counter to the fetch block:-
int fetch_busy = 0;
TransformBlock<DataRequest, DataResponse> fetch_activity=null;
fetch = new TransformBlock<DataRequest, ActivityResponse>(async req =>
{
try
{
Interlocked.Increment(ref fetch_busy);
var resp = await Fetch(req);
if (resp.Results.Count == 1000)
{
await fetch.SendAsync( QueueAnotherRequest(req) );
}
Interlocked.Decrement(ref fetch_busy);
return resp;
}
catch (Exception ex)
{
Interlocked.Decrement(ref fetch_busy);
throw ex;
}
}
, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });
Which I then use to signal complete as follows:-
request.Completion.ContinueWith(async _ =>
{
while ( fetch.InputCount > 0 || fetch_busy > 0 )
{
await Task.Delay(100);
}
fetch.Complete();
});
Which doesnt seem very elegant, but should work I think.
Updated to explain things more clearly
I've got an application that runs a number of tasks. Some are created initially and other can be added later. I need need a programming structure that will wait on all the tasks to complete. Once the all the tasks complete some other code should run that cleans things up and does some final processing of data generated by the other tasks.
I've come up with a way to do this, but wouldn't call it elegant. So I'm looking to see if there is a better way.
What I do is keep a list of the tasks in a ConcurrentBag (a thread safe collection). At the start of the process I create and add some tasks to the ConcurrentBag. As the process does its thing if a new task is created that also needs to finish before the final steps I also add it to the ConcurrentBag.
Task.Wait accepts an array of Tasks as its argument. I can convert the ConcurrentBag into an array, but that array won't include any Tasks added to the Bag after Task.Wait was called.
So I have a two step wait process in a do while loop. In the body of the loop I do a simple Task.Wait on the array generated from the Bag. When it completes it means all the original tasks are done. Then in the while test I do a quick 1 millisecond test of a new array generated from the ConcurrentBag. If no new tasks were added, or any new tasks also completed it will return true, so the not condition exits the loop.
If it returns false (because a new task was added that didn't complete) we go back and do a non-timed Task.Wait. Then rinse and repeat until all new and old tasks are done.
// defined on the class, perhaps they should be properties
CancellationTokenSource Source = new CancellationTokenSource();
CancellationToken Token = Source.Token;
ConcurrentBag<Task> ToDoList = new ConcurrentBag<Task>();
public void RunAndWait() {
// start some tasks add them to the list
for (int i = 0; i < 12; i++)
{
Task task = new Task(() => SillyExample(Token), Token);
ToDoList.Add(task);
task.Start();
}
// now wait for those task, and any other tasks added to ToDoList to complete
try
{
do
{
Task.WaitAll(ToDoList.ToArray(), Token);
} while (! Task.WaitAll(ToDoList.ToArray(), 1, Token));
}
catch (OperationCanceledException e)
{
// any special handling of cancel we might want to do
}
// code that should only run after all tasks complete
}
Is there a more elegant way to do this?
I'd recommend using a ConcurrentQueue and removing items as you wait for them. Due to the first-in-first-out nature of queues, if you get to the point where there's nothing left in the queue, you know that you've waited for all the tasks that have been added up to that point.
ConcurrentQueue<Task> ToDoQueue = new ConcurrentQueue<Task>();
...
while(ToDoQueue.Count > 0 && !Token.IsCancellationRequested)
{
Task task;
if(ToDoQueue.TryDequeue(out task))
{
task.Wait(Token);
}
}
Here's a very cool way using Microsoft's Reactive Framework (NuGet "Rx-Main").
var taskSubject = new Subject<Task>();
var query = taskSubject.Select(t => Observable.FromAsync(() => t)).Merge();
var subscription =
query.Subscribe(
u => { /* Each Task Completed */ },
() => Console.WriteLine("All Tasks Completed."));
Now, to add tasks, just do this:
taskSubject.OnNext(Task.Run(() => { }));
taskSubject.OnNext(Task.Run(() => { }));
taskSubject.OnNext(Task.Run(() => { }));
And then to signal completion:
taskSubject.OnCompleted();
It is important to note that signalling completion doesn't complete the query immediately, it will wait for all of the tasks to finish too. Signalling completion just says that you will no longer add any new tasks.
Finally, if you want to cancel, then just do this:
subscription.Dispose();
I have a requirement to iterate through a large list and for each item call a web service to get some data. However, I want to throttle the number of requests to the WS to say no more than 5 concurrent requests executing at any one time. All calls to the WS are made using async/await. I am using the TPL dataflow BufferBlock with a BoundedCapacity of 5. Everything is working properly but what I am noticing is that the consumer, which awaits the WS call, blocks the queue until its finished resulting in all the requests in the bufferblock being performed serially. Is it possible to have the consumer always processing 5 items off the queue at a time? Or do I need to set up multiple consumers or start looking into action blocks? So in a nutshell I want to seed the queue with 5 items. As one item is processed a sixth will take its place and so on so I always have 5 concurrent requests going until there are no more items to process.
I used this as my guide: Async Producer/Consumer Queue using Dataflow
Thanks for any help. Below is a simplified version of the code
//set up
BufferBlock<CustomObject> queue = new BufferBlock<CustomObject>(new DataflowBlockOptions { BoundedCapacity = 5 });
var producer = QueueValues(queue, values);
var consumer = ConsumeValues(queue);
await Task.WhenAll(producer, consumer, queue.Completion);
counter = await consumer;
//producer
function QueueValues(BufferBlock<CustomObject> queue, IList<CustomObject> values)
{
foreach (CustomObject value in values)
{
await queue.SendAsync(value);
}
queue.Complete();
}
//consumer
function ConsumeValues(BufferBlock<CustomObject> queue)
{
while (await queue.OutputAvailableAsync())
{
CustomObject value = await queue.ReceiveAsync();
await CallWebServiceAsync(value);
}
}
Your use of TPL Dataflow is rather strange. Normally, you'd move the consumption and processing into the flow. Append a TransformBlock to call the webservice. Delete ConsumeValues.
ConsumeValues executes sequentially which is fundamentally not what you want.
Instead of BoundedCapacity I think you rather want MaxDegreeOfParallelism.
You should be using ActionBlock with MaxDegreeOfParallelism set to 5. You may also want to set a BoundedCapacity but that's for throttling the producer and not the consumer:
var block = new ActionBlock<CustomObject>(
item => CallWebServiceAsync(item),
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 5,
BoundedCapacity = 1000
});
foreach (CustomObject value in values)
{
await block.SendAsync(value);
}
block.Complete();
await block.Completion;
I have inherited a C#/XAML/Win 8 application. There is some code which is set to run every n seconds.
The code that sets that up is:
if(!_syncThreadStarted)
{
await Task.Run(() => SyncToDatabase());
_syncThreadStarted = true;
}
The above code is ran once.
And then inside SyncToDatabase() we have:
while (true)
{
DatabaseSyncer dbSyncer = new DatabaseSyncer();
await dbSyncer.DeserializeAndUpdate();
await Task.Delay(10); // after elapsed time re-run above code
}
The method DeserializeAndUpdate queries a in-memory collection of objects and pushes those objects to a web service.
Sometimes the send request to the web service takes longer than expected meaning duplicate items are sent.
Question: Is there a way to have a thread or some type of thread pool/background worker which I can stop/abort/destroy inside the method SyncToDatabase() , and then initialize/start it once we are done? This will ensure no subsequent requests are fired while a previous request is still pending.
Edit: I am not very knowledgeable when it comes to Threads, but the logic I want is:
Create thread which runs some method every x seconds, and when it starts that thread stop the "running every x seconds" part, after thread has complete start the "run every x seconds" part again.
E.g. if the thread kicks off at 10:01:30AM and does not complete until 10:01:39AM (9 seconds) the next thread should start at 10:01:44AM (5 seconds after work completed) - does that make sense? I do not want 2 or more threads running at the same time.
Here is my code for the above:
var period = TimeSpan.FromSeconds(5);
var completed = true;
ThreadPoolTimer syncTimer = ThreadPoolTimer.CreatePeriodicTimer(async (source) =>
{
// stop further threads from starting (in case this work takes longer than var period)
syncTimer.Cancel();
DatabaseSyncer dbSyncer = new DatabaseSyncer();
await dbSyncer.DeserializeAndUpdate(); // makes webservices calls
Dispatcher.RunAsync(CoreDispatcerPriority.High, async () =>
{
// Update UI
}
completed = true;
}, period,
(source) =>
{
if(!completed)
{
syncTimer.Cancel(); // not sure if this is correct...
}
}
Thanks,
Andrew)
This is not specific to Windows 8. Usually Task.Run is used for CPU-bound work, to offload it to a pool thread and keep the UI (or the core service loop) responsive. In your case, as far as I can tell, the main payload is dbSyncer.DeserializeAndUpdate, which is already asynchronous and most likely network-IO bound, rather than CPU-bound.
Besides, the author of the original code does _syncThreadStarted = true after await Task.Run(() => SyncToDatabase()). That doesn't make sense, because the work on the pool thread would have been already done by the time _syncThreadStarted = true is executed, thanks to the await.
To cancel the loop inside SyncToDatabase you could use Task Cancellation Pattern. Is SyncToDatabase itself an async method? I presume so, because there's an await in the while loop. Given that, the code which calls it could look something like this:
if(_syncTask != null && !_syncTask.IsCompleted)
{
_ct.Cancel();
// here you may want to make sure that the pending task has been fully shut down,
// keeping possible re-entrancy in mind
// See: https://stackoverflow.com/questions/18999827/a-pattern-for-self-cancelling-and-restarting-task
_syncTask = null;
}
_ct = new CancellationTokenSource();
// _syncTask = SyncToDatabase(ct.Token); // do not await
// edited to run on another thread, as requested by the OP
var _syncTask = Task.Run(async () => await SyncToDatabase(ct.Token), ct.Token);
_syncThreadStarted = true;
And SyncToDatabase could look like:
async Task SyncToDatabase(CancellationToken token)
{
while (true)
{
token.ThrowIfCancellationRequested();
DatabaseSyncer dbSyncer = new DatabaseSyncer();
await dbSyncer.DeserializeAndUpdate();
await Task.Delay(10, token); // after elapsed time re-run above code
}
}
Check this answer for more details on how to cancel and restart a task.
I may have misunderstood the question, but the execution of SynchToDatabase() will wait on the completion of await dbSyncer.DeserializeAndUpdaet() (due to the await keyword, go figure ;)) before executing the continuation, which will then delay for 10 ms (do you want 10ms or did you mean 10 seconds? Parameter for Task.Delay is in milliseconds), then loop back to re-execute the DbSyncer method, so I don't see the problem.