Parallel.Foreach in dynamic List

Parallel.Foreach in dynamic List - c#

I am working on the file uploading to server.I am using Parallel.Foreach to split the task.In my Sample Source
Task.Factory.StartNew (delegate
{
Parallel.ForEach (metaDatas, new ParallelOptions{ MaxDegreeOfParallelism = 3 }, itemModel =>
{
apimanager.ContentConnector.uploadItem (0, itemModel.PhysicalFileName, itemModel.ParentId, itemModel.Size, itemModel.path, fetchDataDelegate, finishedUploadingDataDelegate, failedToUploadDataDelegate);
});
Console.WriteLine ("Upload Completed");
});
in my case metaDatas list is dynamically update.how can i use the latest metaDatas List in Parallel.ForEach.

I met the same problem.
I found a non standard solution that can help.
var buffer = new BlockingCollection<Task>(maxThreads);
// process while download all threads
while (!Queue.IsEmpty || buffer.Count > 0)
{
SomeClass item;
while (Queue.TryDequeue(out item))
{
var localItem = item;
var localTask = new Task(() =>
{
// work with your item
// in this place you can append Queue
buffer.Take(); // free buffer for adding new threads
});
buffer.Add(localTask);
localTask.Start();
}
// in this point Queue is empty,
// but buffer have remaining thread
Task.Delay(50).Wait();
}
So, Queue is ConcurrentQueue<SomeClass>, your dymanic source
It's not perfect solution, but it's work.

Related

TPL Dataflow LinkTo TransformBlock is very slow

I have two TransformBlocks which are arranged in a loop. They link their data to each other. TransformBlock 1 is an I/O block reading data and is limited to a maximum of 50 tasks. It reads the data and some meta data. Then they are passed to the second block. The second block decides on the meta data if the message goes again to the first block. So after the meta data matches the criteria and a short wait the data should go again back again to the I/O block. The second blocks MaxDegreeOfParallelism can be unlimited.
Now I have noticed when I send a lot of data to the I/O block it takes a long time till the messages are linked to the second block. It takes like 10 minutes to link the data and they are all sent in a bunch. Like 1000 entries in a few seconds.
Normally I would implement it like so:
public void Start()
{
_ioBlock = new TransformBlock<Data,Tuple<Data, MetaData>>(async data =>
{
var metaData = await ReadAsync(data).ConfigureAwait(false);
return new Tuple<Data, MetaData>(data, metaData);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 50 });
_waitBlock = new TransformBlock<Tuple<Data, MetaData>,Data>(async dataMetaData =>
{
var data = dataMetaData.Item1;
var metaData = dataMetaData.Item2;
if (!metaData.Repost)
{
return null;
}
await Task.Delay(TimeSpan.FromMinutes(1)).ConfigureAwait(false);
return data;
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
_ioBlock.LinkTo(_waitBlock);
_waitBlock.LinkTo(_ioBlock, data => data != null);
_waitBlock.LinkTo(DataflowBlock.NullTarget<Data>());
foreach (var data in Enumerable.Range(0, 2000).Select(i => new Data(i)))
{
_ioBlock.Post(data);
}
}
But because of the described problem I have to implement it like so:
public void Start()
{
_ioBlock = new ActionBlock<Data>(async data =>
{
var metaData = await ReadAsync(data).ConfigureAwait(false);
var dataMetaData= new Tuple<Data, MetaData>(data, metaData);
_waitBlock.Post(dataMetaData);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 50 });
_waitBlock = new ActionBlock<Tuple<Data, MetaData>>(async dataMetaData =>
{
var data = dataMetaData.Item1;
var metaData = dataMetaData.Item2;
if (metaData.Repost)
{
await Task.Delay(TimeSpan.FromMinutes(1)).ConfigureAwait(false);
_ioBlock.Post(data);
}
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
foreach (var data in Enumerable.Range(0, 2000).Select(i => new Data(i)))
{
_ioBlock.Post(data);
}
}
When I use the second approach the data get linked/posted faster (one by one). But it feels more like a hack to me. Anybody know how to fix the problem? Some friends recommended me to use TPL Pipeline but it seems much more complicated to me.

Problem solved. You need to set
ExecutionDataflowBlockOptions.EnsureOrdered
to forward the data immediately to the next/wait block.
Further information:
Why do blocks run in this order?

Chaining tasks with continuation and run parallel task afterward

The work flow of the parallel tasks
I am hoping to get help on the problem I am facing. So the problem is that I am running parallel tasks to search through folders for files. Each task entails identifying files and add it to a array of files. Next, wait until every task completes so that files are gathered up, then perform sorting on the results. Next, process the sorted file independently, by running one task per file to read through it to get a matching pattern back. The final stage is to aggregate all the results together in human readable format and display it in a user-friendly way.
So the question is that I want to chain the tasks in a proper way that does not blocks the UI thread. I would like to be able to cancel everything at any stage the program is at.
To sum it up:
Stage 1: Find files by searching through folders. Each task search recursively through a folder tree.
Stage 2: Sort all the files found and clean up duplicates
Stage 3: Start new tasks to process the files independently. Each task opens a file and search for matching pattern.
Stage 4: Aggregate result from every single file search into one giant result set and make it pretty for human to read.
List<Task> myTasks = new List<Task>();
// ==== stage 1 ======
for(int i = 0; i < 10; i++) {
string directoryName = directories[i];
Task t = new Task(() =>
{
FindFiles(directoryName);
});
myTasks.Add(t);
t.Start();
}
// ==== stage 2 ====
Task sortTask = Task.Factory.ContinueWhenAll(myTasks.ToArray(), (t) =>
{
if(_fileResults.Count > 1) {
// sort the files and remove any duplicates
}
});
sortTask.Wait();
// ==== stage 3 ====
Task tt = new Task(() =>
{
Parallel.For(0, _fileResults.Count, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount, CancellationToken = token, TaskScheduler = _taskScheduler },
(i, loopstate) => {
// 1. open file
// 2. read file
// 3. read file line by line
}
}
// == stage 4 ===
tt.ContinueWith((t) =>
{
// 1. aggregate the file results into one giant result set
// 2. display the giant result set in human readable format
}, token, TaskContinuationOptions.OnlyOnRanToCompletion, TaskScheduler.FromCurrentSynchronizationContext());
tt.start();

Don't synchronously wait for any of the tasks to finish. If any of those operations need to take place after a previously created task, add that work as a continuation of that task instead.

Have you considered using the async/await feature - By the sounds of your question, it's the perfect match for your needs. Here's a quick attempt at your problem using it:
try
{
List<Task<File[]>> stage1Tasks = new List<Task<File[]>>();
// ==== stage 1 ======
for (int i = 0; i < 10; i++)
{
string directoryName = directories[i];
Task<File[]> t = Task.Run(() =>
{
return FindFiles(directoryName);
},
token);
stage1Tasks.Add(t);
}
File[][] files = await Task.WhenAll(stage1Tasks).ConfigureAwait(false);
// Flatten files.
File[] _fileResults = files.SelectMany(x => x).ToArray();
// ==== stage 2 ====
Task<File[]> sortFilesTask = Task.Run(() =>
{
if (_fileResults.Count > 1)
{
// sort the files and remove any duplicates
return _fileResults.Reverse().ToArray();
}
},
token);
File[] _sortedFileResults = await sortFilesTask.ConfigureAwait(false);
// ==== stage 3 ====
Task<SomeResult[]> tt = Task.Run(() =>
{
SomeResult[] results = new SomeResult[_sortedFileResults.Length];
Parallel.ForEach(_sortedFileResults,
new ParallelOptions {
MaxDegreeOfParallelism = Environment.ProcessorCount,
CancellationToken = token,
TaskScheduler = _taskScheduler
},
(i, loopstate) =>
{
// 1. open file
// 2. read file
// 3. read file line by line
results[i] = new SomeResult( /* here goes your results for each file */);
});
return results;
},
token);
SomeResult[] theResults = await tt.ConfigureAwait(false);
// == stage 4 ===
// 1. aggregate the file results into one giant result set
// 2. display the giant result set in human readable format
// ....
}
catch (TaskCanceledException)
{
// some task has been cancelled...
}

Executing N number of threads in parallel and in a sequential manner

I have an application where i have 1000+ small parts of 1 large file.
I have to upload maximum of 16 parts at a time.
I used Thread parallel library of .Net.
I used Parallel.For to divide in multiple parts and assigned 1 method which should be executed for each part and set DegreeOfParallelism to 16.
I need to execute 1 method with checksum values which are generated by different part uploads, so i have to set certain mechanism where i have to wait for all parts upload say 1000 to complete.
In TPL library i am facing 1 issue is it is randomly executing any of the 16 threads from 1000.
I want some mechanism using which i can run first 16 threads initially, if the 1st or 2nd or any of the 16 thread completes its task next 17th part should be started.
How can i achieve this ?

One possible candidate for this can be TPL Dataflow. This is a demonstration which takes in a stream of integers and prints them out to the console. You set the MaxDegreeOfParallelism to whichever many threads you wish to spin in parallel:
void Main()
{
var actionBlock = new ActionBlock<int>(
i => Console.WriteLine(i),
new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 16});
foreach (var i in Enumerable.Range(0, 200))
{
actionBlock.Post(i);
}
}
This can also scale well if you want to have multiple producer/consumers.

Here is the manual way of doing this.
You need a queue. The queue is sequence of pending tasks. You have to dequeue and put them inside list of working task. When ever the task is done remove it from list of working task and take another from queue. Main thread controls this process. Here is the sample of how to do this.
For the test i used List of integer but it should work for other types because its using generics.
private static void Main()
{
Random r = new Random();
var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
ParallelQueue(items, DoWork);
}
private static void ParallelQueue<T>(List<T> items, Action<T> action)
{
Queue pending = new Queue(items);
List<Task> working = new List<Task>();
while (pending.Count + working.Count != 0)
{
if (pending.Count != 0 && working.Count < 16) // Maximum tasks
{
var item = pending.Dequeue(); // get item from queue
working.Add(Task.Run(() => action((T)item))); // run task
}
else
{
Task.WaitAny(working.ToArray());
working.RemoveAll(x => x.IsCompleted); // remove finished tasks
}
}
}
private static void DoWork(int i) // do your work here.
{
// this is just an example
Task.Delay(i).Wait();
Console.WriteLine(i);
}
Please let me know if you encounter problem of how to implement DoWork for your self. because if you change method signature you may need to do some changes.
Update
You can also do this with async await without blocking the main thread.
private static void Main()
{
Random r = new Random();
var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
Task t = ParallelQueue(items, DoWork);
// able to do other things.
t.Wait();
}
private static async Task ParallelQueue<T>(List<T> items, Func<T, Task> func)
{
Queue pending = new Queue(items);
List<Task> working = new List<Task>();
while (pending.Count + working.Count != 0)
{
if (working.Count < 16 && pending.Count != 0)
{
var item = pending.Dequeue();
working.Add(Task.Run(async () => await func((T)item)));
}
else
{
await Task.WhenAny(working);
working.RemoveAll(x => x.IsCompleted);
}
}
}
private static async Task DoWork(int i)
{
await Task.Delay(i);
}

var workitems = ... /*e.g. Enumerable.Range(0, 1000000)*/;
SingleItemPartitioner.Create(workitems)
.AsParallel()
.AsOrdered()
.WithDegreeOfParallelism(16)
.WithMergeOptions(ParallelMergeOptions.NotBuffered)
.ForAll(i => { Thread.Slee(1000); Console.WriteLine(i); });
This should be all you need. I forgot how the methods are named exactly... Look at the documentation.
Test this by printing to the console after sleeping for 1sec (which this sample code does).

Another option would be to use a BlockingCollection<T> as a queue between your file reader thread and your 16 uploader threads. Each uploader thread would just loop around consuming the blocking collection until it is complete.
And, if you want to limit memory consumption in the queue you can set an upper limit on the blocking collection such that the file-reader thread will pause when the buffer has reached capacity. This is particularly useful in a server environment where you may need to limit memory used per user/API call.
// Create a buffer of 4 chunks between the file reader and the senders
BlockingCollection<Chunk> queue = new BlockingCollection<Chunk>(4);
// Create a cancellation token source so you can stop this gracefully
CancellationTokenSource cts = ...
File reader thread
...
queue.Add(chunk, cts.Token);
...
queue.CompleteAdding();
Sending threads
for(int i = 0; i < 16; i++)
{
Task.Run(() => {
foreach (var chunk in queue.GetConsumingEnumerable(cts.Token))
{
.. do the upload
}
});
}

How to mark a TPL dataflow cycle to complete?

Given the following setup in TPL dataflow.
var directory = new DirectoryInfo(#"C:\dev\kortforsyningen_dsm\tiles");
var dirBroadcast=new BroadcastBlock<DirectoryInfo>(dir=>dir);
var dirfinder = new TransformManyBlock<DirectoryInfo, DirectoryInfo>((dir) =>
{
return directory.GetDirectories();
});
var tileFilder = new TransformManyBlock<DirectoryInfo, FileInfo>((dir) =>
{
return directory.GetFiles();
});
dirBroadcast.LinkTo(dirfinder);
dirBroadcast.LinkTo(tileFilder);
dirfinder.LinkTo(dirBroadcast);
var block = new XYZTileCombinerBlock<FileInfo>(3, (file) =>
{
var coordinate = file.FullName.Split('\\').Reverse().Take(3).Reverse().Select(s => int.Parse(Path.GetFileNameWithoutExtension(s))).ToArray();
return XYZTileCombinerBlock<CloudBlockBlob>.TileXYToQuadKey(coordinate[0], coordinate[1], coordinate[2]);
},
(quad) =>
XYZTileCombinerBlock<FileInfo>.QuadKeyToTileXY(quad,
(z, x, y) => new FileInfo(Path.Combine(directory.FullName,string.Format("{0}/{1}/{2}.png", z, x, y)))),
() => new TransformBlock<string, string>((s) =>
{
Trace.TraceInformation("Combining {0}", s);
return s;
}));
tileFilder.LinkTo(block);
using (new TraceTimer("Time"))
{
dirBroadcast.Post(directory);
block.LinkTo(new ActionBlock<FileInfo>((s) =>
{
Trace.TraceInformation("Done combining : {0}", s.Name);
}));
block.Complete();
block.Completion.Wait();
}
i am wondering how I can mark this to complete because of the cycle. A directory is posted to the dirBroadcast broadcaster which posts to the dirfinder that might post back new dirs to the broadcaster, so i cant simply mark it as complete because it would block any directories being added from the dirfinder. Should i redesign it to keep track of the number of dirs or is there anything for this in TPL.

If the purpose of your code is to traverse the directory structure using some sort of parallelism then I would suggest not using TPL Dataflow and use Microsoft's Reactive Framework instead. I think it becomes much simpler.
Here's how I would do it.
First define a recursive function to build the list of directories:
Func<DirectoryInfo, IObservable<DirectoryInfo>> recurse = null;
recurse = di =>
Observable
.Return(di)
.Concat(di.GetDirectories()
.ToObservable()
.SelectMany(di2 => recurse(di2)))
.ObserveOn(Scheduler.Default);
This performs the recurse of the directories and uses the default Rx scheduler which causes the observable to run in parallel.
So by calling recurse with an input DirectoryInfo I get an observable list of the input directory and all of its descendants.
Now I can build a fairly straight-forward query to get the results I want:
var query =
from di in recurse(new DirectoryInfo(#"C:\dev\kortforsyningen_dsm\tiles"))
from fi in di.GetFiles().ToObservable()
let zxy =
fi
.FullName
.Split('\\')
.Reverse()
.Take(3)
.Reverse()
.Select(s => int.Parse(Path.GetFileNameWithoutExtension(s)))
.ToArray()
let suffix = String.Format("{0}/{1}/{2}.png", zxy[0], zxy[1], zxy[2])
select new FileInfo(Path.Combine(di.FullName, suffix));
Now I can action the query like this:
query
.Subscribe(s =>
{
Trace.TraceInformation("Done combining : {0}", s.Name);
});
Now I may have missed a little bit in your custom code but if this is an approach you want to take I'm sure you can fix any logical issues quite easily.
This code automatically handles completion when it runs out of child directories and files.
To add Rx to your project look for "Rx-Main" in NuGet.

I am sure this is not always possible, but in many cases (including directory enumeration) you can use a running counter and the Interlocked functions to have a cyclic one-to-many dataflow that completes:
public static ISourceBlock<string> GetDirectoryEnumeratorBlock(string path, int maxParallel = 5)
{
var outputBuffer = new BufferBlock<string>();
var count = 1;
var broadcastBlock = new BroadcastBlock<string>(s => s);
var getDirectoriesBlock = new TransformManyBlock<string, string>(d =>
{
var files = Directory.EnumerateDirectories(d).ToList();
Interlocked.Add(ref count, files.Count - 1); //Adds the subdir count, minus 1 for the current directory.
if (count == 0) //if count reaches 0 then all directories have been enumerated.
broadcastBlock.Complete();
return files;
}, new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = maxParallel });
broadcastBlock.LinkTo(outputBuffer, new DataflowLinkOptions() { PropagateCompletion = true });
broadcastBlock.LinkTo(getDirectoriesBlock, new DataflowLinkOptions() { PropagateCompletion = true });
getDirectoriesBlock.LinkTo(broadcastBlock);
getDirectoriesBlock.Post(path);
return outputBuffer;
}
I have used this with a slight modification to enumerate files, but it works well. Be careful with the max degree of parallelism, this can quickly saturate a network file system!

I don't see any way this can be done, because each block (dirBroadcast and tileFilder) depends on the other one and can't complete on its own.
I suggest you redesign your directory traversal without TPL Dataflow, which isn't a good fit for this kind of problem. A better approach in my opinion would simply be to recursively scan the directories and fill your block with a stream of files:
private static void FillBlock(DirectoryInfo directoryInfo, XYZTileCombinerBlock<FileInfo> block)
{
foreach (var fileInfo in directoryInfo.GetFiles())
{
block.Post(fileInfo);
}
foreach (var subDirectory in directoryInfo.GetDirectories())
{
FillBlock(subDirectory, block);
}
}
FillBlock(directory, block);
block.Complete();
await block.Completion;

Here is a generalized approach of Andrew Hanlon's solution. It returns a TransformBlock that supports posting messages recursively to itself, and completes automatically when there are no more messages to process.
The transform lambda has three arguments instead of the usual one. The first argument is the item being processed. The second argument is the "path" of the processed message, which is a sequence IEnumerable<TInput> containing its parent messages. The third argument is an Action<TInput> that posts new messages to the block, as children of the current message.
/// <summary>Creates a dataflow block that supports posting messages to itself,
/// and knows when it has completed processing all messages.</summary>
public static IPropagatorBlock<TInput, TOutput>
CreateRecursiveTransformBlock<TInput, TOutput>(
Func<TInput, IEnumerable<TInput>, Action<TInput>, Task<TOutput>> transform,
ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
if (transform == null) throw new ArgumentNullException(nameof(transform));
dataflowBlockOptions = dataflowBlockOptions ?? new ExecutionDataflowBlockOptions();
int pendingCount = 1; // The initial 1 represents the completion of input1 block
var input1 = new TransformBlock<TInput, (TInput, IEnumerable<TInput>)>(item =>
{
Interlocked.Increment(ref pendingCount);
return (item, Enumerable.Empty<TInput>());
}, new ExecutionDataflowBlockOptions()
{
CancellationToken = dataflowBlockOptions.CancellationToken,
BoundedCapacity = dataflowBlockOptions.BoundedCapacity
});
var input2 = new BufferBlock<(TInput, IEnumerable<TInput>)>(new DataflowBlockOptions()
{
CancellationToken = dataflowBlockOptions.CancellationToken
// Unbounded capacity
});
var output = new TransformBlock<(TInput, IEnumerable<TInput>), TOutput>(async entry =>
{
try
{
var (item, path) = entry;
var postChildAction = CreatePostAction(item, path);
return await transform(item, path, postChildAction).ConfigureAwait(false);
}
finally
{
if (Interlocked.Decrement(ref pendingCount) == 0) input2.Complete();
}
}, dataflowBlockOptions);
Action<TInput> CreatePostAction(TInput parentItem, IEnumerable<TInput> parentPath)
{
return item =>
{
// The Post will be unsuccessful only in case of block failure
// or cancellation, so no specific action is needed here.
if (input2.Post((item, parentPath.Append(parentItem))))
{
Interlocked.Increment(ref pendingCount);
}
};
}
input1.LinkTo(output);
input2.LinkTo(output);
PropagateCompletion(input1, input2,
condition: () => Interlocked.Decrement(ref pendingCount) == 0);
PropagateCompletion(input2, output);
PropagateFailure(output, input1, input2); // Ensure that all blocks are faulted
return DataflowBlock.Encapsulate(input1, output);
async void PropagateCompletion(IDataflowBlock block1, IDataflowBlock block2,
Func<bool> condition = null)
{
try
{
await block1.Completion.ConfigureAwait(false);
}
catch { }
if (block1.Completion.Exception != null)
{
block2.Fault(block1.Completion.Exception.InnerException);
}
else
{
if (block1.Completion.IsCanceled) return; // On cancellation do nothing
if (condition == null || condition()) block2.Complete();
}
}
async void PropagateFailure(IDataflowBlock block1, IDataflowBlock block2,
IDataflowBlock block3)
{
try
{
await block1.Completion.ConfigureAwait(false);
}
catch (Exception ex)
{
if (block1.Completion.IsCanceled) return; // On cancellation do nothing
block2.Fault(ex); block3.Fault(ex);
}
}
}
// Overload with synchronous delegate
public static IPropagatorBlock<TInput, TOutput>
CreateRecursiveTransformBlock<TInput, TOutput>(
Func<TInput, IEnumerable<TInput>, Action<TInput>, TOutput> transform,
ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
return CreateRecursiveTransformBlock<TInput, TOutput>((item, path, postAction) =>
Task.FromResult(transform(item, path, postAction)), dataflowBlockOptions);
}
The resulting block consists internally of three blocks: two input blocks that receive messages, and one output block that processes the messages. The first input block receives messages from outside, and the second input block receives messages from inside. The second input block has unbounded capacity, so an infinite recursion will eventually result to an OutOfMemoryException.
Usage example:
var fileCounter = CreateRecursiveTransformBlock<string, int>(
(folderPath, parentPaths, postChild) =>
{
var subfolders = Directory.EnumerateDirectories(folderPath);
foreach (var subfolder in subfolders) postChild(subfolder);
var files = Directory.EnumerateFiles(folderPath);
Console.WriteLine($"{folderPath} has {files.Count()} files"
+ $", and is {parentPaths.Count()} levels deep");
return files.Count();
});
fileCounter.LinkTo(DataflowBlock.NullTarget<int>());
fileCounter.Post(Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments));
fileCounter.Complete();
fileCounter.Completion.Wait();
The above code prints in the console all the subfolders of the folder "MyDocuments".

Just to show my real answer, a combination of TPL and Rx.
Func<DirectoryInfo, IObservable<DirectoryInfo>> recurse = null;
recurse = di =>
Observable
.Return(di)
.Concat(di.GetDirectories()
.Where(d => int.Parse(d.Name) <= br_tile[0] && int.Parse(d.Name) >= tl_tile[0])
.ToObservable()
.SelectMany(di2 => recurse(di2)))
.ObserveOn(Scheduler.Default);
var query =
from di in recurse(new DirectoryInfo(Path.Combine(directory.FullName, baselvl.ToString())))
from fi in di.GetFiles().Where(f => int.Parse(Path.GetFileNameWithoutExtension(f.Name)) >= br_tile[1]
&& int.Parse(Path.GetFileNameWithoutExtension(f.Name)) <= tl_tile[1]).ToObservable()
select fi;
query.Subscribe(block.AsObserver());
Console.WriteLine("Done subscribing");
block.Complete();
block.Completion.Wait();
Console.WriteLine("Done TPL Block");
where block is my var block = new XYZTileCombinerBlock<FileInfo>

Async/await tasks and WaitHandle

Say I have 10N items(I need to fetch them via http protocol), in the code N Tasks are started to get data, each task takes 10 items in sequence. I put the items in a ConcurrentQueue<Item>. After that, the items are processed in a thread-unsafe method one by one.
async Task<Item> GetItemAsync()
{
//fetch one item from the internet
}
async Task DoWork()
{
var tasks = new List<Task>();
var items = new ConcurrentQueue<Item>();
var handles = new List<ManualResetEvent>();
for i 1 -> N
{
var handle = new ManualResetEvent(false);
handles.Add(handle);
tasks.Add(Task.Factory.StartNew(async delegate
{
for j 1 -> 10
{
var item = await GetItemAsync();
items.Enqueue(item);
}
handle.Set();
});
}
//begin to process the items when any handle is set
WaitHandle.WaitAny(handles);
while(true)
{
if (all handles are set && items collection is empty) //***
break;
//in another word: all tasks are really completed
while(items.TryDequeue(out item))
{
AThreadUnsafeMethod(item); //process items one by one
}
}
}
I don't know what if condition can be placed in the statement marked ***. I can't use Task.IsCompleted property here, because I use await in the task, so the task is completed very soon. And a bool[] that indicates whether the task is executed to the end looks really ugly, because I think ManualResetEvent can do the same work. Can anyone give me a suggestion?

Well, you could build this yourself, but I think it's tons easier with TPL Dataflow.
Something like:
static async Task DoWork()
{
// By default, ActionBlock uses MaxDegreeOfParallelism == 1,
// so AThreadUnsafeMethod is not called in parallel.
var block = new ActionBlock<Item>(AThreadUnsafeMethod);
// Start off N tasks, each asynchronously acquiring 10 items.
// Each item is sent to the block as it is received.
var tasks = Enumerable.Range(0, N).Select(Task.Run(
async () =>
{
for (int i = 0; i != 10; ++i)
block.Post(await GetItemAsync());
})).ToArray();
// Complete the block when all tasks have completed.
Task.WhenAll(tasks).ContinueWith(_ => { block.Complete(); });
// Wait for the block to complete.
await block.Completion;
}

You can do a WaitOne with a timeout of zero to check the state. Something like this should work:
if (handles.All(handle => handle.WaitOne(TimeSpan.Zero)) && !items.Any())
break;
http://msdn.microsoft.com/en-us/library/cc190477.aspx

Thanks all. At last I found CountDownEvent is very suitable for this scenario. The general implementation looks like this:(for others' information)
for i 1 -> N
{
//start N tasks
//invoke CountDownEvent.Signal() at the end of each task
}
//see if CountDownEvent.IsSet here

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Parallel.Foreach in dynamic List - c#

Related

TPL Dataflow LinkTo TransformBlock is very slow

Chaining tasks with continuation and run parallel task afterward

Executing N number of threads in parallel and in a sequential manner

How to mark a TPL dataflow cycle to complete?

Async/await tasks and WaitHandle

Categories

Resources