I Have an application , in a nutshell it creates objects of type "WebPage".
These objecsts are then inserted into a SQL database.
I want to retrive these records from the database , and then load them into some files.
I create a While loop to read the results of the query , and for each row returned a Webpage object is created and added to a static ConcurrentQueue.
Here is where my problem is :
I want to have a seperate thread , that when something new appears on the ConcurrentQueue - it responds and writes the object out to my file. I already have this code working in a single threaded and serial fashion , but I want to speed it up.
I currently have a piece of code inside the reader from the SQL database , when the ConcurrentQueue reaches a certain amount of objects - it sends an autoreset event (see below)
if(flow.CheckEngineCapacity >= 2000 || (Convert.ToInt32(totalRows) - numberOfRecords) < 2000)
{
waitHandle.Set();
Thread fileProcessor = new Thread(delegate () { flow.ProcessExportEngineFlow(waitHandle); });
fileProcessor.Start();
}
what ends up happening is some sort of context switch where the main thread seems to sleep until that one completes - I did attempt to try work with await and async but suspect that is not what I need.
How would I go about getting it work in the following pattern
New object is added to ConcurrentQueue
When a certain amount of objects on the ConcurrentQueue is reached , start to Dequeue the objects and load them into the files while still adding objects to the concurrent queue
NOTE that if the concurrentqueue hits a certain amount of objects it should block until the thread doing Dequeue can free up some space.
The reason I am doing this is to make the solution as performant as possible - the bottlenecks should be write to files and read from database.
The below is the example of the class I have been trying to put together :
public class EngineFlow
{
private static ConcurrentQueue<WebPages> _concurrentWebPageList = new ConcurrentQueue<WebPages>();
public bool IncreaseEngineFlow(WebPages page)
{
bool sucessfullyadded = false;
if (_concurrentWebPageList.Count <= 2000)
{
_concurrentWebPageList.Enqueue(page);
sucessfullyadded = true;
}
else
{
return sucessfullyadded;
}
return sucessfullyadded;
}
public int CheckEngineCapacity { get { return _concurrentWebPageList.Count; } }
private WebPages DecreaseEngineFlow()
{
WebPages page;
_concurrentWebPageList.TryDequeue(out page);
return page;
}
public void ProcessExportEngineFlow(AutoResetEvent waitHandle)
{
if (waitHandle.WaitOne() == false)
{
Thread.Sleep(100);
}
else
{
while (!_concurrentWebPageList.IsEmpty)
{
Console.WriteLine(DecreaseEngineFlow().URL);
Console.WriteLine(CheckEngineCapacity);
waitHandle.Set();
}
}
}
Originally this was meant to be a producer and consumer but I feel like I may be overthinking it.
Thank you #Henk Holterman
The new class used a BlockingCollection - which solved all the problems :
Task.Run(() =>
{
flow.ProcessExportEngineFlow();
});
Task.Run(() =>
{
while (reader.Read())
{
flow.IncreaseEngineFlow(webpage);
}
Class Definition :
private BlockingCollection<WebPages> _concurrentWebPageList = new BlockingCollection<WebPages>(new ConcurrentQueue<WebPages>(), 1000);
//private static ConcurrentQueue<WebPages> _concurrentWebPageList = new ConcurrentQueue<WebPages>();
public void IncreaseEngineFlow(WebPages page)
{
_concurrentWebPageList.Add(page);
}
public WebPages DecreaseEngineFlow()
{
return _concurrentWebPageList.Take();
}
public void ProcessExportEngineFlow()
{
while(!_concurrentWebPageList.IsCompleted)
{
WebPages page = null;
try
{
page = _concurrentWebPageList.Take();
}
catch (InvalidOperationException) { }
if(page != null)
{
Console.WriteLine(page.URL);
}
}
}
public bool GetEngineState()
{
return _concurrentWebPageList.IsCompleted;
}
public void SetEngineCompleted()
{
_concurrentWebPageList.CompleteAdding();
}
Related
I want to use IAsyncEnumerable like a Source for akka streams. But I not found, how do it.
No sutiable method in Source class for this code.
using System.Collections.Generic;
using System.Threading.Tasks;
using Akka.Streams.Dsl;
namespace ConsoleApp1
{
class Program
{
static async Task Main(string[] args)
{
Source.From(await AsyncEnumerable())
.Via(/*some action*/)
//.....
}
private static async IAsyncEnumerable<int> AsyncEnumerable()
{
//some async enumerable
}
}
}
How use IAsyncEnumerbale for Source?
This has been done in the past as a part of Akka.NET Streams contrib package, but since I don't see it there anymore, let's go through on how to implement such source. The topic can be quite long, as:
Akka.NET Streams is really about graph processing - we're talking about many-inputs/many-outputs configurations (in Akka.NET they're called inlets and outlets) with support for cycles in graphs.
Akka.NET is not build on top of .NET async/await or even on top of .NET standard thread pool library - they're both pluggable, which means that the lowest barier is basically using callbacks and encoding what C# compiler sometimes does for us.
Akka.NET streams is capable of both pushing and pulling values between stages/operators. IAsyncEnumerable<T> can only pull data while IObservable<T> can only push it, so we get more expressive power here, but this comes at a cost.
The basics of low level API used to implement custom stages can be found in the docs.
The starter boilerplate looks like this:
public static class AsyncEnumerableExtensions {
// Helper method to change IAsyncEnumerable into Akka.NET Source.
public static Source<T, NotUsed> AsSource<T>(this IAsyncEnumerable<T> source) =>
Source.FromGraph(new AsyncEnumerableSource<T>(source));
}
// Source stage is description of a part of the graph that doesn't consume
// any data, only produce it using a single output channel.
public sealed class AsyncEnumerableSource<T> : GraphStage<SourceShape<T>>
{
private readonly IAsyncEnumerable<T> _enumerable;
public AsyncEnumerableSource(IAsyncEnumerable<T> enumerable)
{
_enumerable = enumerable;
Outlet = new Outlet<T>("asyncenumerable.out");
Shape = new SourceShape<T>(Outlet);
}
public Outlet<T> Outlet { get; }
public override SourceShape<T> Shape { get; }
/// Logic if to a graph stage, what enumerator is to enumerable.
protected override GraphStageLogic CreateLogic(Attributes inheritedAttributes) =>
new Logic(this);
sealed class Logic: OutGraphStageLogic
{
public override void OnPull()
{
// method called whenever a consumer asks for new data
}
public override void OnDownstreamFinish()
{
// method called whenever a consumer stage finishes,used for disposals
}
}
}
As mentioned, we don't use async/await straight away here: even more, calling Logic methods in asynchronous context is unsafe. To make it safe we need to register out methods that may be called from other threads using GetAsyncCallback<T> and call them via returned wrappers. This will ensure, that not data races will happen when executing asynchronous code.
sealed class Logic : OutGraphStageLogic
{
private readonly Outlet<T> _outlet;
// enumerator we'll call for MoveNextAsync, and eventually dispose
private readonly IAsyncEnumerator<T> _enumerator;
// callback called whenever _enumerator.MoveNextAsync completes asynchronously
private readonly Action<Task<bool>> _onMoveNext;
// callback called whenever _enumerator.DisposeAsync completes asynchronously
private readonly Action<Task> _onDisposed;
// cache used for errors thrown by _enumerator.MoveNextAsync, that
// should be rethrown after _enumerator.DisposeAsync
private Exception? _failReason = null;
public Logic(AsyncEnumerableSource<T> source) : base(source.Shape)
{
_outlet = source.Outlet;
_enumerator = source._enumerable.GetAsyncEnumerator();
_onMoveNext = GetAsyncCallback<Task<bool>>(OnMoveNext);
_onDisposed = GetAsyncCallback<Task>(OnDisposed);
}
// ... other methods
}
The last part to do are methods overriden on `Logic:
OnPull used whenever the downstream stage calls for new data. Here we need to call for next element of async enumerator sequence.
OnDownstreamFinish called whenever the downstream stage has finished and will not ask for any new data. It's the place for us to dispose our enumerator.
Thing is these methods are not async/await, while their enumerator's equivalent are. What we basically need to do there is to:
Call corresponding async methods of underlying enumerator (OnPull → MoveNextAsync and OnDownstreamFinish → DisposeAsync).
See, if we can take their results immediately - it's important part that usually is done for us as part of C# compiler in async/await calls.
If not, and we need to wait for the results - call ContinueWith to register our callback wrappers to be called once async methods are done.
sealed class Logic : OutGraphStageLogic
{
// ... constructor and fields
public override void OnPull()
{
var hasNext = _enumerator.MoveNextAsync();
if (hasNext.IsCompletedSuccessfully)
{
// first try short-path: values is returned immediately
if (hasNext.Result)
// check if there was next value and push it downstream
Push(_outlet, _enumerator.Current);
else
// if there was none, we reached end of async enumerable
// and we can dispose it
DisposeAndComplete();
}
else
// we need to wait for the result
hasNext.AsTask().ContinueWith(_onMoveNext);
}
// This method is called when another stage downstream has been completed
public override void OnDownstreamFinish() =>
// dispose enumerator on downstream finish
DisposeAndComplete();
private void DisposeAndComplete()
{
var disposed = _enumerator.DisposeAsync();
if (disposed.IsCompletedSuccessfully)
{
// enumerator disposal completed immediately
if (_failReason is not null)
// if we close this stream in result of error in MoveNextAsync,
// fail the stage
FailStage(_failReason);
else
// we can close the stage with no issues
CompleteStage();
}
else
// we need to await for enumerator to be disposed
disposed.AsTask().ContinueWith(_onDisposed);
}
private void OnMoveNext(Task<bool> task)
{
// since this is callback, it will always be completed, we just need
// to check for exceptions
if (task.IsCompletedSuccessfully)
{
if (task.Result)
// if task returns true, it means we read a value
Push(_outlet, _enumerator.Current);
else
// otherwise there are no more values to read and we can close the source
DisposeAndComplete();
}
else
{
// task either failed or has been cancelled
_failReason = task.Exception as Exception ?? new TaskCanceledException(task);
FailStage(_failReason);
}
}
private void OnDisposed(Task task)
{
if (task.IsCompletedSuccessfully) CompleteStage();
else {
var reason = task.Exception as Exception
?? _failReason
?? new TaskCanceledException(task);
FailStage(reason);
}
}
}
As of Akka.NET v1.4.30 this is now natively supported inside Akka.Streams via the RunAsAsyncEnumerable method:
var input = Enumerable.Range(1, 6).ToList();
var cts = new CancellationTokenSource();
var token = cts.Token;
var asyncEnumerable = Source.From(input).RunAsAsyncEnumerable(Materializer);
var output = input.ToArray();
bool caught = false;
try
{
await foreach (var a in asyncEnumerable.WithCancellation(token))
{
cts.Cancel();
}
}
catch (OperationCanceledException e)
{
caught = true;
}
caught.ShouldBeTrue();
I copied that sample from the Akka.NET test suite, in case you're wondering.
You can also use an existing primitive for streaming large collections of data. Here is an example of using Source.unfoldAsync to stream pages of data - in this case github repositories using Octokit - until there is no more.
var source = Source.UnfoldAsync<int, RepositoryPage>(startPage, page =>
{
var pageTask = client.GetRepositoriesAsync(page, pageSize);
var next = pageTask.ContinueWith(task =>
{
var page = task.Result;
if (page.PageNumber * pageSize > page.Total) return Option<(int, RepositoryPage)>.None;
else return new Option<(int, RepositoryPage)>((page.PageNumber + 1, page));
});
return next;
});
To run
using var sys = ActorSystem.Create("system");
using var mat = sys.Materializer();
int startPage = 1;
int pageSize = 50;
var client = new GitHubClient(new ProductHeaderValue("github-search-app"));
var source = ...
var sink = Sink.ForEach<RepositoryPage>(Console.WriteLine);
var result = source.RunWith(sink, mat);
await result.ContinueWith(_ => sys.Terminate());
class Page<T>
{
public Page(IReadOnlyList<T> contents, int page, long total)
{
Contents = contents;
PageNumber = page;
Total = total;
}
public IReadOnlyList<T> Contents { get; set; } = new List<T>();
public int PageNumber { get; set; }
public long Total { get; set; }
}
class RepositoryPage : Page<Repository>
{
public RepositoryPage(IReadOnlyList<Repository> contents, int page, long total)
: base(contents, page, total)
{
}
public override string ToString() =>
$"Page {PageNumber}\n{string.Join("", Contents.Select(x => x.Name + "\n"))}";
}
static class GitHubClientExtensions
{
public static async Task<RepositoryPage> GetRepositoriesAsync(this GitHubClient client, int page, int size)
{
// specify a search term here
var request = new SearchRepositoriesRequest("bootstrap")
{
Page = page,
PerPage = size
};
var result = await client.Search.SearchRepo(request);
return new RepositoryPage(result.Items, page, result.TotalCount);
}
}
I'm trying to process documents asynchronously. The idea is that the user sends documents to a service, which takes time, and will look at the results later (about 20-90 seconds per document).
Ideally, I would like to just fill some kind of observable collection that would be emptied by the system as fast as it can. When there is an item, process it and produce the expected output in another object, and when there is no item just do nothing. When the user checks the output collection, he will find the items that are already processed.
Ideally all items would be visible from the start and would have a state (completed, ongoing or in queue), but once I know how to do the first, I should be able to handle the states.
I'm not sure which object to use for that, right now I'm looking at BlockingCollection but I don't think it's suited for the job, as I can't fill it while it's being emptied from the other end.
private BlockingCollection<IDocument> _jobs = new BlockingCollection<IDocument>();
public ObservableCollection<IExtractedDocument> ExtractedDocuments { get; }
public QueueService()
{
ExtractedDocuments = new ObservableCollection<IExtractedDocument>();
}
public async Task Add(string filePath, List<Extra> extras)
{
if (_jobs.IsAddingCompleted || _jobs.IsCompleted)
_jobs = new BlockingCollection<IDocument>();
var doc = new Document(filePath, extras);
_jobs.Add(doc);
_jobs.CompleteAdding();
await ProcessQueue();
}
private async Task ProcessQueue()
{
foreach (var document in _jobs.GetConsumingEnumerable(CancellationToken.None))
{
var resultDocument = await service.ProcessDocument(document);
ExtractedDocuments.Add(resultDocument );
Debug.WriteLine("Job completed");
}
}
This is how I'm handling it right now. If I remove the CompleteAdding call, it hangs on the second attempt. If I have that statement, then I can't just fill the queue, I have to empty it first which defeats the purpose.
Is there a way of having what I'm trying to achieve? A collection that I would fill and the system would process asynchronously and autonomously?
To summarize, I need :
A collection that I can fill, that would be processed gradually and asynchronously. A document or series or document can be added while some are being processed.
An ouput collection that would be filled after the process is complete
The UI thread and app to still be responsive while everything is running
I don't need to have multiple processes in parallel, or one document at a time. Whichever is easiest to put in place and maintain will do (small scale application). I'm assuming one at a time is simpler.
A common pattern here is to have a callback method that executes upon a document state change. With a background task running, it will chew threw documents as fast as it can. Call Dispose to shutdown the processor.
If you need to process the callback on a gui thread, you'll need to synchornize the callback to your main thread some how. Windows forms has methods to do this if that's what you are using.
This example program implements all the necessary classes and interfaces, and you can fine tune and tweak things as you need.
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApp2
{
class Program
{
private static Task Callback(IExtractedDocument doc, DocumentProcessor.DocState docState)
{
Console.WriteLine("Processing doc {0}, state: {1}", doc, docState);
return Task.CompletedTask;
}
public static void Main()
{
using DocumentProcessor docProcessor = new DocumentProcessor(Callback);
Console.WriteLine("Processor started, press any key to end processing");
for (int i = 0; i < 100; i++)
{
if (Console.KeyAvailable)
{
break;
}
else if (i == 5)
{
// make an error
docProcessor.Add(null);
}
else
{
docProcessor.Add(new Document { Text = "Test text " + Guid.NewGuid().ToString() });
}
Thread.Sleep(500);
}
Console.WriteLine("Doc processor shut down, press ENTER to quit");
Console.ReadLine();
}
public interface IDocument
{
public string Text { get; }
}
public class Document : IDocument
{
public string Text { get; set; }
}
public interface IExtractedDocument : IDocument
{
public IDocument OriginalDocument { get; }
public Exception Error { get; }
}
public class ExtractedDocument : IExtractedDocument
{
public override string ToString()
{
return $"Orig text: {OriginalDocument?.Text}, Extracted Text: {Text}, Error: {Error}";
}
public IDocument OriginalDocument { get; set; }
public string Text { get; set; }
public Exception Error { get; set; }
}
public class DocumentProcessor : IDisposable
{
public enum DocState { Processing, Completed, Error }
private readonly BlockingCollection<IDocument> queue = new BlockingCollection<IDocument>();
private readonly Func<IExtractedDocument, DocState, Task> callback;
private CancellationTokenSource cancelToken = new CancellationTokenSource();
public DocumentProcessor(Func<IExtractedDocument, DocState, Task> callback)
{
this.callback = callback;
Task.Run(() => StartQueueProcessor()).GetAwaiter();
}
public void Dispose()
{
if (!cancelToken.IsCancellationRequested)
{
cancelToken.Cancel();
}
}
public void Add(IDocument doc)
{
if (cancelToken.IsCancellationRequested)
{
throw new InvalidOperationException("Processor is disposed");
}
queue.Add(doc);
}
private void ProcessDocument(IDocument doc)
{
try
{
// do processing
DoCallback(new ExtractedDocument { OriginalDocument = doc }, DocState.Processing);
if (doc is null)
{
throw new ArgumentNullException("Document to process was null");
}
IExtractedDocument successExtractedDocument = DoSomeDocumentProcessing(doc);
DoCallback(successExtractedDocument, DocState.Completed);
}
catch (Exception ex)
{
DoCallback(new ExtractedDocument { OriginalDocument = doc, Error = ex }, DocState.Error);
}
}
private IExtractedDocument DoSomeDocumentProcessing(IDocument originalDocument)
{
return new ExtractedDocument { OriginalDocument = originalDocument, Text = "Extracted: " + originalDocument.Text };
}
private void DoCallback(IExtractedDocument result, DocState docState)
{
if (callback != null)
{
// send callbacks in background
callback(result, docState).GetAwaiter();
}
}
private void StartQueueProcessor()
{
try
{
while (!cancelToken.Token.IsCancellationRequested)
{
if (queue.TryTake(out IDocument doc, 1000, cancelToken.Token))
{
// can chance to Task.Run(() => ProcessDocument(doc)).GetAwaiter() for parallel execution
ProcessDocument(doc);
}
}
}
catch (OperationCanceledException)
{
// ignore, don't need to throw or worry about this
}
while (queue.TryTake(out IDocument doc))
{
DoCallback(new ExtractedDocument { Error = new ObjectDisposedException("Processor was disposed") }, DocState.Error);
}
}
}
}
}
I have an expensive method to call for creating a batch of source items:
private Task<List<SourceItem>> GetUnprocessedBatch(int batchSize)
{
//impl
}
I want to populate new items only when there is no item to process(or it falls below a certain threshold). I couldn't figure out which Source method to use so far.
I have implemented a crude stream that would keep returning new items:
public class Stream
{
private readonly Queue<SourceItem> scrapeAttempts;
private int batchSize = 100;
private int minItemCount = 10;
public Stream()
{
scrapeAttempts = new Queue<SourceItem>();
}
public async Task<SourceItem> Next()
{
if (scrapeAttempts.Count < minItemCount)
{
var entryScrapeAttempts = await GetUnprocessedBatch(batchSize);
entryScrapeAttempts.ForEach(attempt => scrapeAttempts.Enqueue(attempt));
}
return scrapeAttempts.Dequeue();
}
}
I expected Source.Task would work but it looks like it calls it only once. How can I create a source for this scenario?
So, conceptually what you want is a Source stage, that fetches elements asynchronously in batches, buffers the batch and propagates events downstream one by one. When the buffer is close to being empty, we want to eagerly call the next fetch on the side thread (but not more than once), so it could complete while we're emptying current batch.
This sort of behavior will require building a custom GraphStage. One that could look like this:
sealed class PreFetch<T> : GraphStage<SourceShape<T>>
{
private readonly int threshold;
private readonly Func<Task<IEnumerable<T>>> fetch;
private readonly Outlet<T> outlet = new Outlet<T>("prefetch");
public PreFetch(int threshold, Func<Task<IEnumerable<T>>> fetch)
{
this.threshold = threshold;
this.fetch = fetch;
this.Shape = new SourceShape<T>(this.outlet);
}
public override SourceShape<T> Shape { get; }
protected override GraphStageLogic CreateLogic(Attributes inheritedAttributes) => new Logic(this);
private sealed class Logic : GraphStageLogic
{
public Logic(PreFetch<T> stage) : base(stage.Shape)
{
// queue for batched elements
var queue = new Queue<T>();
// flag which indicates, that pull from downstream was made,
// but we didn't have any elements at that moment
var wasPulled = false;
// determines if fetch was already called
var fetchInProgress = false;
// in order to cooperate with async calls without data races,
// we need to register async callbacks for success and failure scenarios
var onSuccess = this.GetAsyncCallback<IEnumerable<T>>(batch =>
{
foreach (var item in batch) queue.Enqueue(item);
if (wasPulled)
{
// if pull was requested but not fulfilled, we need to push now, as we have elements
// it assumes that fetch returned non-empty batch
Push(stage.outlet, queue.Dequeue());
wasPulled = false;
}
fetchInProgress = false;
});
var onFailure = this.GetAsyncCallback<Exception>(this.FailStage);
SetHandler(stage.outlet, onPull: () => {
if (queue.Count < stage.threshold && !fetchInProgress)
{
// if queue occupation reached bellow expected capacity
// call fetch on a side thread and handle its result asynchronously
stage.fetch().ContinueWith(task =>
{
// depending on if task was failed or not, we call corresponding callback
if (task.IsFaulted || task.IsCanceled)
onFailure(task.Exception as Exception ?? new TaskCanceledException(task));
else onSuccess(task.Result);
});
fetchInProgress = true;
}
// if queue is empty, we cannot push immediatelly, so we only mark
// that pull request has been made but not fulfilled
if (queue.Count == 0)
wasPulled = true;
else
{
Push(stage.outlet, queue.Dequeue());
wasPulled = false;
}
});
}
}
}
Here's what I'm trying to do:
Keep a queue in memory of items that need processed (i.e. IsProcessed = 0)
Every 5 seconds, get unprocessed items from the db, and if they're not already in the queue, add them
Continuous pull items from the queue, process them, and each time an item is processed, update it in the db (IsProcessed = 1)
Do this all "as parallel as possible"
I have a constructor for my service like
public MyService()
{
Ticker.Elapsed += FillQueue;
}
and I start that timer when the service starts like
protected override void OnStart(string[] args)
{
Ticker.Enabled = true;
Task.Run(() => { ConsumeWork(); });
}
and my FillQueue is like
private static async void FillQueue(object source, ElapsedEventArgs e)
{
var items = GetUnprocessedItemsFromDb();
foreach(var item in items)
{
if(!Work.Contains(item))
{
Work.Enqueue(item);
}
}
}
and my ConsumeWork is like
private static void ConsumeWork()
{
while(true)
{
if(Work.Count > 0)
{
var item = Work.Peek();
Process(item);
Work.Dequeue();
}
else
{
Thread.Sleep(500);
}
}
}
However this is probably a naive implementation and I'm wondering whether .NET has any type of class that is exactly what I need for this type of situation.
Though #JSteward' answer is a good start, you can improve it with mixing up the TPL-Dataflow and Rx.NET extensions, as a dataflow block may easily become an observer for your data, and with Rx Timer it will be much less effort for you (Rx.Timer explanation).
We can adjust MSDN article for your needs, like this:
private const int EventIntervalInSeconds = 5;
private const int DueIntervalInSeconds = 60;
var source =
// sequence of Int64 numbers, starting from 0
// https://msdn.microsoft.com/en-us/library/hh229435.aspx
Observable.Timer(
// fire first event after 1 minute waiting
TimeSpan.FromSeconds(DueIntervalInSeconds),
// fire all next events each 5 seconds
TimeSpan.FromSeconds(EventIntervalInSeconds))
// each number will have a timestamp
.Timestamp()
// each time we select some items to process
.SelectMany(GetItemsFromDB)
// filter already added
.Where(i => !_processedItemIds.Contains(i.Id));
var action = new ActionBlock<Item>(ProcessItem, new ExecutionDataflowBlockOptions
{
// we can start as many item processing as processor count
MaxDegreeOfParallelism = Environment.ProcessorCount,
});
IDisposable subscription = source.Subscribe(action.AsObserver());
Also, your check for item being already processed isn't quite accurate, as there is a possibility to item get selected as unprocessed from db right at the time you've finished it's processing, yet didn't update it in database. In this case item will be removed from Queue<T>, and after that added there again by producer, this is why I've added the ConcurrentBag<T> to this solution (HashSet<T> isn't thread-safe):
private static async Task ProcessItem(Item item)
{
if (_processedItemIds.Contains(item.Id))
{
return;
}
_processedItemIds.Add(item.Id);
// actual work here
// save item as processed in database
// we need to wait to ensure item not to appear in queue again
await Task.Delay(TimeSpan.FromSeconds(EventIntervalInSeconds * 2));
// clear the processed cache to reduce memory usage
_processedItemIds.Remove(item.Id);
}
public class Item
{
public Guid Id { get; set; }
}
// temporary cache for items in process
private static ConcurrentBag<Guid> _processedItemIds = new ConcurrentBag<Guid>();
private static IEnumerable<Item> GetItemsFromDB(Timestamped<long> time)
{
// log event timing
Console.WriteLine($"Event # {time.Value} at {time.Timestamp}");
// return items from DB
return new[] { new Item { Id = Guid.NewGuid() } };
}
You can implement cache clean up in other way, for example, start a "GC" timer, which will remove processed items from cache on regular basis.
To stop events and processing items you should Dispose the subscription and, maybe, Complete the ActionBlock:
subscription.Dispose();
action.Complete();
You can find more information about Rx.Net in their guidelines on github.
You could use an ActionBlock to do your processing, it has a built in queue that you can post work to. You can read up on tpl-dataflow here: Intro to TPL-Dataflow also Introduction to Dataflow, Part 1. Finally, this is a quick sample to get you going. I've left out a lot but it should at least get you started.
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace MyWorkProcessor {
public class WorkProcessor {
public WorkProcessor() {
Processor = CreatePipeline();
}
public async Task StartProcessing() {
try {
await Task.Run(() => GetWorkFromDatabase());
} catch (OperationCanceledException) {
//handle cancel
}
}
private CancellationTokenSource cts {
get;
set;
}
private ITargetBlock<WorkItem> Processor {
get;
}
private TimeSpan DatabasePollingFrequency {
get;
} = TimeSpan.FromSeconds(5);
private ITargetBlock<WorkItem> CreatePipeline() {
var options = new ExecutionDataflowBlockOptions() {
BoundedCapacity = 100,
CancellationToken = cts.Token
};
return new ActionBlock<WorkItem>(item => ProcessWork(item), options);
}
private async Task GetWorkFromDatabase() {
while (!cts.IsCancellationRequested) {
var work = await GetWork();
await Processor.SendAsync(work);
await Task.Delay(DatabasePollingFrequency);
}
}
private async Task<WorkItem> GetWork() {
return await Context.GetWork();
}
private void ProcessWork(WorkItem item) {
//do processing
}
}
}
I have a method that queues some work to be executed asynchronously. I'd like to return some sort of handle to the caller that can be polled, waited on, or used to fetch the return value from the operation, but I can't find a class or interface that's suitable for the task.
BackgroundWorker comes close, but it's geared to the case where the worker has its own dedicated thread, which isn't true in my case. IAsyncResult looks promising, but the provided AsyncResult implementation is also unusable for me. Should I implement IAsyncResult myself?
Clarification:
I have a class that conceptually looks like this:
class AsyncScheduler
{
private List<object> _workList = new List<object>();
private bool _finished = false;
public SomeHandle QueueAsyncWork(object workObject)
{
// simplified for the sake of example
_workList.Add(workObject);
return SomeHandle;
}
private void WorkThread()
{
// simplified for the sake of example
while (!_finished)
{
foreach (object workObject in _workList)
{
if (!workObject.IsFinished)
{
workObject.DoSomeWork();
}
}
Thread.Sleep(1000);
}
}
}
The QueueAsyncWork function pushes a work item onto the polling list for a dedicated work thread, of which there will only over be one. My problem is not with writing the QueueAsyncWork function--that's fine. My question is, what do I return to the caller? What should SomeHandle be?
The existing .Net classes for this are geared towards the situation where the asynchronous operation can be encapsulated in a single method call that returns. That's not the case here--all of the work objects do their work on the same thread, and a complete work operation might span multiple calls to workObject.DoSomeWork(). In this case, what's a reasonable approach for offering the caller some handle for progress notification, completion, and getting the final outcome of the operation?
Yes, implement IAsyncResult (or rather, an extended version of it, to provide for progress reporting).
public class WorkObjectHandle : IAsyncResult, IDisposable
{
private int _percentComplete;
private ManualResetEvent _waitHandle;
public int PercentComplete {
get {return _percentComplete;}
set
{
if (value < 0 || value > 100) throw new InvalidArgumentException("Percent complete should be between 0 and 100");
if (_percentComplete = 100) throw new InvalidOperationException("Already complete");
if (value == 100 && Complete != null) Complete(this, new CompleteArgs(WorkObject));
_percentComplete = value;
}
public IWorkObject WorkObject {get; private set;}
public object AsyncState {get {return WorkObject;}}
public bool IsCompleted {get {return _percentComplete == 100;}}
public event EventHandler<CompleteArgs> Complete; // CompleteArgs in a usual pattern
// you may also want to have Progress event
public bool CompletedSynchronously {get {return false;}}
public WaitHandle
{
get
{
// initialize it lazily
if (_waitHandle == null)
{
ManualResetEvent newWaitHandle = new ManualResetEvent(false);
if (Interlocked.CompareExchange(ref _waitHandle, newWaitHandle, null) != null)
newWaitHandle.Dispose();
}
return _waitHandle;
}
}
public void Dispose()
{
if (_waitHandle != null)
_waitHandle.Dispose();
// dispose _workObject too, if needed
}
public WorkObjectHandle(IWorkObject workObject)
{
WorkObject = workObject;
_percentComplete = 0;
}
}
public class AsyncScheduler
{
private Queue<WorkObjectHandle> _workQueue = new Queue<WorkObjectHandle>();
private bool _finished = false;
public WorkObjectHandle QueueAsyncWork(IWorkObject workObject)
{
var handle = new WorkObjectHandle(workObject);
lock(_workQueue)
{
_workQueue.Enqueue(handle);
}
return handle;
}
private void WorkThread()
{
// simplified for the sake of example
while (!_finished)
{
WorkObjectHandle handle;
lock(_workQueue)
{
if (_workQueue.Count == 0) break;
handle = _workQueue.Dequeue();
}
try
{
var workObject = handle.WorkObject;
// do whatever you want with workObject, set handle.PercentCompleted, etc.
}
finally
{
handle.Dispose();
}
}
}
}
If I understand correctly you have a collection of work objects (IWorkObject) that each complete a task via multiple calls to a DoSomeWork method. When an IWorkObject object has finished its work you'd like to respond to that somehow and during the process you'd like to respond to any reported progress?
In that case I'd suggest you take a slightly different approach. You could take a look at the Parallel Extension framework (blog). Using the framework, you could write something like this:
public void QueueWork(IWorkObject workObject)
{
Task.TaskFactory.StartNew(() =>
{
while (!workObject.Finished)
{
int progress = workObject.DoSomeWork();
DoSomethingWithReportedProgress(workObject, progress);
}
WorkObjectIsFinished(workObject);
});
}
Some things to note:
QueueWork now returns void. The reason for this is that the actions that occur when progress is reported or when the task completes have become part of the thread that executes the work. You could of course return the Task that the factory creates and return that from the method (to enable polling for example).
The progress-reporting and finish-handling are now part of the thread because you should always avoid polling when possible. Polling is more expensive because usually you either poll too frequently (too early) or not often enough (too late). There is no reason you can't report on the progress and finishing of the task from within the thread that is running the task.
The above could also be implemented using the (lower level) ThreadPool.QueueUserWorkItem method.
Using QueueUserWorkItem:
public void QueueWork(IWorkObject workObject)
{
ThreadPool.QueueUserWorkItem(() =>
{
while (!workObject.Finished)
{
int progress = workObject.DoSomeWork();
DoSomethingWithReportedProgress(workObject, progress);
}
WorkObjectIsFinished(workObject);
});
}
The WorkObject class can contain the properties that need to be tracked.
public class WorkObject
{
public PercentComplete { get; private set; }
public IsFinished { get; private set; }
public void DoSomeWork()
{
// work done here
this.PercentComplete = 50;
// some more work done here
this.PercentComplete = 100;
this.IsFinished = true;
}
}
Then in your example:
Change the collection from a List to a Dictionary that can hold Guid values (or any other means of uniquely identifying the value).
Expose the correct WorkObject's properties by having the caller pass the Guid that it received from QueueAsyncWork.
I'm assuming that you'll start WorkThread asynchronously (albeit, the only asynchronous thread); plus, you'll have to make retrieving the dictionary values and WorkObject properties thread-safe.
private Dictionary<Guid, WorkObject> _workList =
new Dictionary<Guid, WorkObject>();
private bool _finished = false;
public Guid QueueAsyncWork(WorkObject workObject)
{
Guid guid = Guid.NewGuid();
// simplified for the sake of example
_workList.Add(guid, workObject);
return guid;
}
private void WorkThread()
{
// simplified for the sake of example
while (!_finished)
{
foreach (WorkObject workObject in _workList)
{
if (!workObject.IsFinished)
{
workObject.DoSomeWork();
}
}
Thread.Sleep(1000);
}
}
// an example of getting the WorkObject's property
public int GetPercentComplete(Guid guid)
{
WorkObject workObject = null;
if (!_workList.TryGetValue(guid, out workObject)
throw new Exception("Unable to find Guid");
return workObject.PercentComplete;
}
The simplest way to do this is described here. Suppose you have a method string DoSomeWork(int). You then create a delegate of the correct type, for example:
Func<int, string> myDelegate = DoSomeWork;
Then you call the BeginInvoke method on the delegate:
int parameter = 10;
myDelegate.BeginInvoke(parameter, Callback, null);
The Callback delegate will be called once your asynchronous call has completed. You can define this method as follows:
void Callback(IAsyncResult result)
{
var asyncResult = (AsyncResult) result;
var #delegate = (Func<int, string>) asyncResult.AsyncDelegate;
string methodReturnValue = #delegate.EndInvoke(result);
}
Using the described scenario, you can also poll for results or wait on them. Take a look at the url I provided for more info.
Regards,
Ronald
If you don't want to use async callbacks, you can use an explicit WaitHandle, such as a ManualResetEvent:
public abstract class WorkObject : IDispose
{
ManualResetEvent _waitHandle = new ManualResetEvent(false);
public void DoSomeWork()
{
try
{
this.DoSomeWorkOverride();
}
finally
{
_waitHandle.Set();
}
}
protected abstract DoSomeWorkOverride();
public void WaitForCompletion()
{
_waitHandle.WaitOne();
}
public void Dispose()
{
_waitHandle.Dispose();
}
}
And in your code you could say
using (var workObject = new SomeConcreteWorkObject())
{
asyncScheduler.QueueAsyncWork(workObject);
workObject.WaitForCompletion();
}
Don't forget to call Dispose on your workObject though.
You can always use alternate implementations which create a wrapper like this for every work object, and who call _waitHandle.Dispose() in WaitForCompletion(), you can lazily instantiate the wait handle (careful: race conditions ahead), etc. (That's pretty much what BeginInvoke does for delegates.)