I currently have a function that performs a set of 10 tasks in parallel. After those 10 tasks complete, I move on to the next 10, until my queue is empty. I am looking to improve the efficiency of that algorithm: right now, if 9 of my tasks complete in 1 minute and the 10th takes another 10 minutes, I have to wait for all 10 tasks to complete even though I have 9 free slots that 9 other tasks could start using.
Is there a way that, when a task completes, I immediately send another task for processing within that same level (foreach loop)? I saw that ConcurrentDictionary can be used. Can you please guide me and provide some sample code?
public async Task Test()
{
    List<Task> listoftasks = new List<Task>();
    foreach (var level in levels)
    {
        Queue<Model1> queue = new Queue<Model1>(Store);
        while (queue.Count > 0)
        {
            for (int i = 0; i < 10; i++)
            {
                if (!queue.TryDequeue(out Model1 item))
                {
                    break;
                }
                listoftasks.Add(Task.Run(() => DoSomething(item)));
            }
            await Task.WhenAll(listoftasks);
            listoftasks.Clear();
        }
    }
}
You can just use LimitedConcurrencyLevelTaskScheduler to achieve the desired behaviour (see the sample on https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler?view=net-5.0). In this case you can push all the tasks at one moment and they will be executed with the desired level of concurrency (no more than 10 tasks in parallel in your case).
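A minimal sketch of that approach, assuming you copy the LimitedConcurrencyLevelTaskScheduler sample class from the linked documentation page into your project (Store and DoSomething are taken from the question's code):

// Assumes LimitedConcurrencyLevelTaskScheduler is the sample class from the linked docs page.
var scheduler = new LimitedConcurrencyLevelTaskScheduler(10);
var factory = new TaskFactory(scheduler);
var tasks = new List<Task>();
foreach (var item in Store)
{
    // Every task is queued immediately; the scheduler runs at most 10 at a time.
    tasks.Add(factory.StartNew(() => DoSomething(item)));
}
await Task.WhenAll(tasks);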
You can get each Task to dequeue an item. Use a ConcurrentQueue to ensure thread-safety.
It's kind of a poor-man's scheduler, but it's very lightweight.
ConcurrentQueue<Model1> queue;

void Dequeue()
{
    while (queue.TryDequeue(out var item))
        DoSomething(item);
}
public async Task Test()
{
    queue = new ConcurrentQueue<Model1>(Store);
    var listoftasks = new List<Task>();
    for (var i = 0; i < Environment.ProcessorCount; i++)
        listoftasks.Add(Task.Run(() => Dequeue()));
    await Task.WhenAll(listoftasks);
}
Note: this does not handle exceptions, so all exceptions must be handled or swallowed
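If you need the workers to keep draining the queue despite failures, one hedged variation is to catch per item inside Dequeue (the logging choice here is just an example):

void Dequeue()
{
    while (queue.TryDequeue(out var item))
    {
        try
        {
            DoSomething(item);
        }
        catch (Exception ex)
        {
            // Keep draining the queue; replace with your own error policy.
            Console.Error.WriteLine(ex);
        }
    }
}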
Personally I'd use an ActionBlock (out of the TPL Dataflow library). It:
- has built-in MaxDegreeOfParallelism
- can easily deal with async IO-bound workloads, or non-async CPU-bound workloads
- has cancellation support (if needed)
- can be built into larger pipelines
- can run as a perpetual consumer in a multi-producer environment
Given
private ActionBlock<Model> _processor;
Setup
_processor = new ActionBlock<Model>(
    DoSomething,
    new ExecutionDataflowBlockOptions()
    {
        CancellationToken = SomeCancelationTokenIfNeeded,
        MaxDegreeOfParallelism = 10,
        SingleProducerConstrained = true
    });
Some Method
public static void DoSomething(Model item)
{ ... }
Usage
await _processor.SendAsync(someItem);
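Teardown (when no more items will ever be sent), if you need it:

_processor.Complete();       // stop accepting new items
await _processor.Completion; // wait for all in-flight items to finish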
I am using the below code snippet to try to run jobs (selected by the user from the UI) non-blocking from the main thread (asynchronously) and concurrently with respect to each other, with some throttling set up to prevent too many jobs hogging all the RAM. I used many sources such as Stephen Cleary's blog, this link on ActionBlock, as well as this one from #i3arnon.
public class ActionBlockJobsAsyncImpl
{
    private ActionBlock<Job> qJobs;
    private Dictionary<Job, CancellationTokenSource> cTokSrcs;

    public ActionBlockJobsAsyncImpl()
    {
        qJobs = new ActionBlock<Job>(
            async a_job => await RunJobAsync(a_job),
            new ExecutionDataflowBlockOptions
            {
                BoundedCapacity = boundedCapacity,
                MaxDegreeOfParallelism = maxDegreeOfParallelism,
            });
        cTokSrcs = new Dictionary<Job, CancellationTokenSource>();
    }

    private async Task<bool> RunJobAsync(Job a_job)
    {
        JobArgs args = JobAPI.GetJobArgs(a_job);
        bool ok = await JobAPI.RunJobAsync(args, cTokSrcs[a_job].Token);
        return ok;
    }

    private async Task Produce(IEnumerable<Job> jobs)
    {
        foreach (var job in jobs)
        {
            await qJobs.SendAsync(job);
        }
        //qJobs.Complete();
    }

    public override async Task SubmitJobs(IEnumerable<Job> jobs)
    {
        //-Establish new cancellation token and task status
        foreach (var job in jobs)
        {
            cTokSrcs[job] = new CancellationTokenSource();
        }
        // Start the producer.
        var producer = Produce(jobs);
        // Wait for everything to complete.
        await Task.WhenAll(producer);
    }
}
The reason I commented out the qJobs.Complete() call is that the user should be able to submit jobs continuously from the UI (the same ones or different ones), and I learnt from implementing and testing my first pass using BufferBlock that I shouldn't have that Complete() call if I wanted such a continuous producer/consumer queue. But BufferBlock, as I learnt, doesn't support running concurrent jobs; hence this second pass with ActionBlock instead.
In the above code using ActionBlock, when the user selects jobs and clicks Run in the UI, this calls the SubmitJobs method. The int parameters are boundedCapacity=8 and maxDegreeOfParallelism=DataflowBlockOptions.Unbounded. But the code as is currently does nothing (i.e., it doesn't run any job); my analogous BufferBlock implementation, on the other hand, at least used to run the jobs asynchronously, albeit sequentially with respect to each other. Here, it never runs any of the jobs and I don't see any error messages either. I'd appreciate any ideas on what I'm doing wrong and perhaps some useful ideas on how to fix the problem. Thanks for your interest.
I have a bunch of async methods, which I invoke from the Dispatcher. The methods do not perform any work in the background; they just wait for some I/O operation, or wait for a response from the web server.
async Task FetchAsync()
{
    // Prepare request on the UI thread
    var response = await new WebClient().DownloadDataTaskAsync("http://example.com/"); // URL is illustrative
    // Process response on the UI thread
}
Now I want to perform load tests by calling multiple FetchAsync() calls in parallel, with some max degree of parallelism.
My first attempt was using Parallel.ForEach(), but it does not work well with async/await.
var option = new ParallelOptions {MaxDegreeOfParallelism = 10};
Parallel.ForEach(UnitsOfWork, uow => uow.FetchAsync().Wait());
I've been looking at reactive extensions, but I'm still not able to take advantage of Dispatcher and async/await.
My goal is to not create separate thread for each FetchAsync(). Can you give me some hints how to do it?
Just call FetchAsync without awaiting each call, and then use Task.WhenAll to await all of them together.
var tasks = new List<Task>();
var max = 10;
for (int i = 0; i < max; i++)
{
    tasks.Add(FetchAsync());
}
await Task.WhenAll(tasks);
Here is a generic, reusable solution to your question that you can use not only with your FetchAsync method but with any async method that has the same signature. The API includes real-time concurrency throttling support as well.
The parameters are self-explanatory: totalRequestCount is how many async requests (FetchAsync calls) you want to make in total; asyncProcessor is the FetchAsync method itself; maxDegreeOfParallelism is an optional nullable parameter. If you want real-time throttling with a maximum number of concurrent async requests, set it; otherwise leave it null.
public static Task ForEachAsync(
    int totalRequestCount,
    Func<Task> asyncProcessor,
    int? maxDegreeOfParallelism = null)
{
    IEnumerable<Task> tasks;
    if (maxDegreeOfParallelism != null)
    {
        SemaphoreSlim throttler = new SemaphoreSlim(maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value);
        tasks = Enumerable.Range(0, totalRequestCount).Select(async requestNumber =>
        {
            await throttler.WaitAsync();
            try
            {
                await asyncProcessor().ConfigureAwait(false);
            }
            finally
            {
                throttler.Release();
            }
        });
    }
    else
    {
        tasks = Enumerable.Range(0, totalRequestCount).Select(requestNumber => asyncProcessor());
    }
    return Task.WhenAll(tasks);
}
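For example, to make 100 FetchAsync calls with at most 10 in flight at any moment:

await ForEachAsync(100, FetchAsync, maxDegreeOfParallelism: 10);

Leaving maxDegreeOfParallelism as null starts all 100 requests at once.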
I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
    .Select(service => Task.Run(async delegate
    {
        var availabilityResponse = await client.QueryAvailability(service);
        // Do some other stuff, not really important
    }));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
    var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
    if (response.IsSuccessStatusCode)
    {
        return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
    }
    throw new HttpException((int)response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend: what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(
    async service =>
    {
        var availabilityResponse = await client.QueryAvailability(service);
        // ...
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

foreach (var service in RunData.Demand)
{
    block.Post(service);
}

block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class (just reference System.Threading).
SemaphoreSlim sem = new SemaphoreSlim(4, 4);
foreach (var service in RunData.Demand)
{
    await sem.WaitAsync();
    Task t = Task.Run(async () =>
    {
        var availabilityResponse = await client.QueryAvailability(service);
        // do your other stuff here with the result of QueryAvailability
    });
    t.ContinueWith(_ => sem.Release());
}
The semaphore acts as a locking mechanism: you can only enter the semaphore by calling Wait (or WaitAsync), which subtracts one from the count, and calling Release adds one back.
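One caveat: as written, the loop can finish while the last few tasks are still running, and their exceptions go unobserved. A small variation (a sketch, using the same names as above) collects the tasks and awaits them at the end, releasing the semaphore in a finally block:

var tasks = new List<Task>();
foreach (var service in RunData.Demand)
{
    await sem.WaitAsync();
    tasks.Add(Task.Run(async () =>
    {
        try
        {
            var availabilityResponse = await client.QueryAvailability(service);
            // do your other stuff here
        }
        finally
        {
            sem.Release(); // free the slot even if the call throws
        }
    }));
}
await Task.WhenAll(tasks);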
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
    return Task.WhenAll(
        from partition in Partitioner.Create(source).GetPartitions(dop)
        select Task.Run(async delegate
        {
            using (partition)
                while (partition.MoveNext())
                    await body(partition.Current);
        }));
}
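Partitioner lives in System.Collections.Concurrent. Applied to the question's scenario, usage might look something like this (4 partitions, so at most 4 requests in flight):

await RunData.Demand.ForEachAsync(4, async service =>
{
    var availabilityResponse = await client.QueryAvailability(service);
    // Do some other stuff
});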
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
    var tasks = new List<Task>();
    foreach (var item in items)
    {
        tasks.Add(asyncAction(item));
        if (tasks.Count < maxParallel)
            continue;
        var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
        if (notCompleted.Count >= maxParallel)
            await Task.WhenAny(notCompleted);
    }
    await Task.WhenAll(tasks);
}
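Usage against the question's data might look like:

await RunData.Demand.RunParallelAsync(async service =>
{
    var availabilityResponse = await client.QueryAvailability(service);
    // Do some other stuff
}, maxParallel: 4);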
I have an application where I have 1000+ small parts of one large file.
I have to upload a maximum of 16 parts at a time.
I used the Task Parallel Library of .NET: I used Parallel.For to divide the work into multiple parts, assigned one method to be executed for each part, and set the degree of parallelism to 16.
I need to execute one method with the checksum values generated by the different part uploads, so I have to set up a mechanism to wait for all part uploads (say, 1000) to complete.
One issue I am facing with the TPL is that it randomly executes any of the 16 threads from the 1000.
I want some mechanism whereby I can run the first 16 threads initially, and when the 1st or 2nd or any of the 16 threads completes its task, the 17th part is started immediately.
How can I achieve this?
One possible candidate for this is TPL Dataflow. Here is a demonstration that takes a stream of integers and prints them to the console; you set MaxDegreeOfParallelism to however many threads you wish to run in parallel:
void Main()
{
    var actionBlock = new ActionBlock<int>(
        i => Console.WriteLine(i),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 16 });

    foreach (var i in Enumerable.Range(0, 200))
    {
        actionBlock.Post(i);
    }

    // Signal that no more items are coming and wait for the block to drain,
    // so the demo doesn't exit before everything has printed.
    actionBlock.Complete();
    actionBlock.Completion.Wait();
}
This can also scale well if you want to have multiple producer/consumers.
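For instance, Post is thread-safe, so several producers can feed the same throttled block; a rough sketch reusing actionBlock from above:

var producer1 = Task.Run(() => { for (var i = 0; i < 100; i++) actionBlock.Post(i); });
var producer2 = Task.Run(() => { for (var i = 100; i < 200; i++) actionBlock.Post(i); });
Task.WaitAll(producer1, producer2);
actionBlock.Complete();
actionBlock.Completion.Wait();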
Here is the manual way of doing this.
You need a queue: the queue is the sequence of pending tasks. You dequeue items and put them inside a list of working tasks; whenever a task is done, you remove it from the list of working tasks and take another from the queue. The main thread controls this process. Here is a sample of how to do this.
For the test I used a List of integers, but it should work for other types because it uses generics.
private static void Main()
{
    Random r = new Random();
    var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
    ParallelQueue(items, DoWork);
}

private static void ParallelQueue<T>(List<T> items, Action<T> action)
{
    Queue<T> pending = new Queue<T>(items);
    List<Task> working = new List<Task>();

    while (pending.Count + working.Count != 0)
    {
        if (pending.Count != 0 && working.Count < 16) // maximum tasks
        {
            var item = pending.Dequeue();              // get item from queue
            working.Add(Task.Run(() => action(item))); // run task
        }
        else
        {
            Task.WaitAny(working.ToArray());
            working.RemoveAll(x => x.IsCompleted); // remove finished tasks
        }
    }
}

private static void DoWork(int i) // do your work here.
{
    // this is just an example
    Task.Delay(i).Wait();
    Console.WriteLine(i);
}
Please let me know if you encounter problems implementing DoWork for yourself, because if you change the method signature you may need to make some changes.
Update
You can also do this with async/await, without blocking the main thread.
private static void Main()
{
    Random r = new Random();
    var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
    Task t = ParallelQueue(items, DoWork);
    // able to do other things here.
    t.Wait();
}

private static async Task ParallelQueue<T>(List<T> items, Func<T, Task> func)
{
    Queue<T> pending = new Queue<T>(items);
    List<Task> working = new List<Task>();

    while (pending.Count + working.Count != 0)
    {
        if (working.Count < 16 && pending.Count != 0)
        {
            var item = pending.Dequeue();
            working.Add(Task.Run(() => func(item)));
        }
        else
        {
            await Task.WhenAny(working);
            working.RemoveAll(x => x.IsCompleted);
        }
    }
}

private static async Task DoWork(int i)
{
    await Task.Delay(i);
}
var workitems = ... /*e.g. Enumerable.Range(0, 1000000)*/;

SingleItemPartitioner.Create(workitems)
    .AsParallel()
    .AsOrdered()
    .WithDegreeOfParallelism(16)
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .ForAll(i => { Thread.Sleep(1000); Console.WriteLine(i); });
This should be all you need. I forget exactly how the methods are named... look at the documentation.
Test this by printing to the console after sleeping for 1 second (which this sample code does).
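Note that SingleItemPartitioner is not part of the BCL; it ships with the ParallelExtensionsExtras samples. If you'd rather stay on built-in APIs, a roughly equivalent sketch (on .NET 4.5+) uses Partitioner.Create with NoBuffering so items are handed out one at a time:

// NoBuffering hands items to workers one at a time instead of in chunks.
Partitioner.Create(workitems, EnumerablePartitionerOptions.NoBuffering)
    .AsParallel()
    .AsOrdered()
    .WithDegreeOfParallelism(16)
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .ForAll(i => { Thread.Sleep(1000); Console.WriteLine(i); });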
Another option would be to use a BlockingCollection<T> as a queue between your file reader thread and your 16 uploader threads. Each uploader thread would just loop around consuming the blocking collection until it is complete.
And, if you want to limit memory consumption in the queue you can set an upper limit on the blocking collection such that the file-reader thread will pause when the buffer has reached capacity. This is particularly useful in a server environment where you may need to limit memory used per user/API call.
// Create a buffer of 4 chunks between the file reader and the senders
BlockingCollection<Chunk> queue = new BlockingCollection<Chunk>(4);
// Create a cancellation token source so you can stop this gracefully
CancellationTokenSource cts = ...
File reader thread
...
queue.Add(chunk, cts.Token);
...
queue.CompleteAdding();
Sending threads
for (int i = 0; i < 16; i++)
{
    Task.Run(() =>
    {
        foreach (var chunk in queue.GetConsumingEnumerable(cts.Token))
        {
            // .. do the upload
        }
    });
}
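To know when all 1000 parts are done (e.g., before computing the final checksum), keep the 16 tasks and wait on them; a sketch:

var uploaders = new Task[16];
for (int i = 0; i < 16; i++)
{
    uploaders[i] = Task.Run(() =>
    {
        foreach (var chunk in queue.GetConsumingEnumerable(cts.Token))
        {
            // .. do the upload
        }
    });
}
Task.WaitAll(uploaders); // or await Task.WhenAll(uploaders) in async code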
The code below continues to create threads, even when the queue is empty, until eventually an OutOfMemoryException occurs. If I replace the Parallel.ForEach with a regular foreach, this does not happen. Does anyone know of reasons why this may happen?
public delegate void DataChangedDelegate(DataItem obj);

public class Consumer
{
    public DataChangedDelegate OnCustomerChanged;
    public DataChangedDelegate OnOrdersChanged;

    private CancellationTokenSource cts;
    private CancellationToken ct;
    private BlockingCollection<DataItem> queue;

    public Consumer(BlockingCollection<DataItem> queue)
    {
        this.queue = queue;
        Start();
    }

    private void Start()
    {
        cts = new CancellationTokenSource();
        ct = cts.Token;
        Task.Factory.StartNew(() => DoWork(), ct);
    }

    private void DoWork()
    {
        Parallel.ForEach(queue.GetConsumingPartitioner(), item =>
        {
            if (item.DataType == DataTypes.Customer)
            {
                OnCustomerChanged(item);
            }
            else if (item.DataType == DataTypes.Order)
            {
                OnOrdersChanged(item);
            }
        });
    }
}
I think Parallel.ForEach() was made primarily for processing bounded collections. And it doesn't expect collections like the one returned by GetConsumingPartitioner(), where MoveNext() blocks for a long time.
The problem is that Parallel.ForEach() tries to find the best degree of parallelism, so it starts as many Tasks as the TaskScheduler lets it run. But the TaskScheduler sees there are many Tasks that take a very long time to finish, and that they're not doing anything (they block) so it keeps on starting new ones.
I think the best solution is to set the MaxDegreeOfParallelism.
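A hedged sketch of that fix applied to the code above (the exact limit is up to you):

Parallel.ForEach(
    queue.GetConsumingPartitioner(),
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    item => { /* handle item as before */ });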
As an alternative, you could use TPL Dataflow's ActionBlock. The main difference in this case is that ActionBlock doesn't block any threads when there are no items to process, so the number of threads wouldn't get anywhere near the limit.
The producer/consumer pattern is mainly used when there is just one producer and one consumer.
However, what you are trying to achieve (multiple consumers) fits more neatly into the worklist pattern. The following code was taken from the unit 2 slide deck "2c - Shared Memory Patterns" from a parallel programming class taught at the University of Utah, which is available in the download at http://ppcp.codeplex.com/
BlockingCollection<Item> worklist;
CancellationTokenSource cts;
int itemcount;

public void Run()
{
    int num_workers = 4;

    // Create the worklist, filled with the initial work.
    worklist = new BlockingCollection<Item>(
        new ConcurrentQueue<Item>(GetInitialWork()));
    cts = new CancellationTokenSource();
    itemcount = worklist.Count();

    for (int i = 0; i < num_workers; i++)
        Task.Factory.StartNew(RunWorker);
}

IEnumerable<Item> GetInitialWork() { ... }

public void RunWorker()
{
    try
    {
        do
        {
            Item i = worklist.Take(cts.Token);
            // Take blocks until an item is available or the token is cancelled.
            Process(i);
            // Exit the loop if no more items are left.
        } while (Interlocked.Decrement(ref itemcount) > 0);
    }
    finally
    {
        if (!cts.IsCancellationRequested)
            cts.Cancel();
    }
}

public void AddWork(Item item)
{
    Interlocked.Increment(ref itemcount);
    worklist.Add(item);
}

public void Process(Item i)
{
    // Do what you want with the work item here.
}
The preceding code allows you to add worklist items to the queue, and lets you set an arbitrary number of workers (in this case, four) to pull items out of the queue and process them.
Another great resource for parallelism in .NET 4.0 is the book "Parallel Programming with Microsoft .NET", which is freely available at: http://msdn.microsoft.com/en-us/library/ff963553
Internally in the Task Parallel Library, the Parallel.For and Parallel.Foreach follow a hill-climbing algorithm to determine how much parallelism should be utilized for the operation.
More or less, they start with running the body on one task, move to two, and so on, until a break-point is reached and they need to reduce the number of tasks.
This works quite well for method bodies that complete quickly, but if the body takes a long time to run, it may take a long time before it realizes it needs to decrease the degree of parallelism. Until that point, it continues adding tasks, possibly crashing the computer.
I learned the above during a lecture given by one of the developers of the Task Parallel Library.
Specifying the MaxDegreeOfParallelism is probably the easiest way to go.