I have several actions that I want to execute in the background, but they have to be executed synchronously one after the other.
I was wondering if it's a good idea to use the Task.ContinueWith method to achieve this. Do you foresee any problems with this?
My code looks something like this:
private object syncRoot = new object();
private Task latestTask;

public void EnqueueAction(System.Action action)
{
    lock (syncRoot)
    {
        if (latestTask == null)
            latestTask = Task.Factory.StartNew(action);
        else
            latestTask = latestTask.ContinueWith(tsk => action());
    }
}
There is one flaw with this, which I recently discovered myself, because I am also using this method of ensuring tasks execute sequentially.
In my application I had thousands of instances of these mini-queues and quickly ran into memory issues. Since the queues were often idle, I was holding onto the last completed task object for a long time and preventing it from being garbage collected. Since the result object of that last completed task was often over 85,000 bytes, it was allocated on the Large Object Heap (which is not compacted during garbage collection). This resulted in fragmentation of the LOH and the process continuously growing in size.
As a hack to avoid this, you can schedule a no-op task right after the real one, within your lock. For a real solution, I will need to move to a different method of controlling the scheduling.
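For illustration, a rough sketch of that hack applied to the code above (hedged: this assumes the only thing keeping the completed task alive is the latestTask field, and that the real work produces a task whose result object is large):
// after scheduling the real continuation, chain a no-op so the field
// holds a small task instead of the one carrying the large result
latestTask = latestTask.ContinueWith(tsk => action());
latestTask = latestTask.ContinueWith(_ => { });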
This should work as designed (using the fact that TPL will schedule the continuation immediately if the corresponding task already has completed).
Personally, in this case I would just use a dedicated thread drawing tasks from a concurrent queue (ConcurrentQueue) - this is more explicit, but easier to follow when reading the code, especially if you want to find out e.g. how many tasks are currently queued.
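A rough sketch of that dedicated-thread approach (hedged: it uses BlockingCollection wrapping a ConcurrentQueue so the worker can block when the queue is empty; the ActionQueue class name and PendingCount property are illustrative, not from the question):
private readonly BlockingCollection<Action> queue =
    new BlockingCollection<Action>(new ConcurrentQueue<Action>());

public ActionQueue()
{
    var worker = new Thread(() =>
    {
        // GetConsumingEnumerable blocks until an item is available,
        // so actions run one after the other on this single thread
        foreach (var action in queue.GetConsumingEnumerable())
            action();
    });
    worker.IsBackground = true;
    worker.Start();
}

public void EnqueueAction(Action action)
{
    queue.Add(action);
}

public int PendingCount
{
    get { return queue.Count; } // easy to see how many tasks are currently queued
}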
I used this snippet and it seems to work as designed.
The number of instances in my case does not run into the thousands, but stays in the single digits.
Nevertheless, no issues so far.
I would be interested in a ConcurrentQueue example, if there is one.
Thanks
What I'm trying to accomplish: I have an action block with MaxDegreeOfParallelism = 4. I want to create one local instance of a session object for each parallel path, so 4 session objects in total. If these were threads, I would create something like:
ThreadLocal<Session> sessionPerThread = new ThreadLocal<Session>(() => new Session());
I know blocks are not threads so I'm looking for something similar but for blocks. Any way to create this?
This block is in a service and runs for months on end. During that time period tons of threads are used for each concurrent slot of the block so thread local storage is not appropriate. I need something tied to the logical block slot. Also this block never completes, it runs the entire lifetime of the service.
Note: The above suggested answer is not valid for what I am asking. I'm specifically asking for something different than thread local and the above answer is using thread local. This is a different question entirely.
As it sounds like you already know, Dataflow blocks provide absolutely no guarantee of correlation between blocks, execution, and threads. Even with max parallelism set to 4, all 4 tasks could be executing on the same thread. Or an individual task may execute on many threads.
Given that you ultimately want to reuse n instances of an expensive service for your n degrees of parallelism, let's take dataflow completely out of the picture for a minute, since it doesn't help (or directly hinder) you from any general solution to this problem. It's actually quite simple. You can use a ConcurrentStack<T>, where T is the type of your service that is expensive to instantiate. You have code that appears at the top of the method (or delegate) that represents one of your parallel units of work:
private ConcurrentStack<T> reusableServices;

private void DoWork()
{
    T service;
    if (!this.reusableServices.TryPop(out service))
    {
        service = new T(); // expensive construction
    }

    // Use your shared service.
    //// Code here.

    // Put the service back when we're done with it so someone else can use it.
    this.reusableServices.Push(service);
}
Now in this way, you can quickly see that you create exactly as many instances of your expensive service as you have parallel executions of DoWork(). You don't even have to hard-code the degree of parallelism you expect. And it's orthogonal to how you actually schedule that parallelism (so threadpool, Dataflow, PLINQ, etc. doesn't matter).
So you can just use DoWork() as your Dataflow block's delegate and you're set to go.
Of course, there's nothing magical about ConcurrentStack<T> here, except that the locks around push and pop are built into the type so you don't have to do it yourself.
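To connect this back to the question, a hedged sketch of wiring DoWork-style logic into an ActionBlock (Session is the questioner's type; WorkItem and Session.Process are placeholders for whatever the block actually receives and does):
private readonly ConcurrentStack<Session> reusableSessions = new ConcurrentStack<Session>();

public ActionBlock<WorkItem> CreateBlock()
{
    return new ActionBlock<WorkItem>(item =>
    {
        Session session;
        if (!reusableSessions.TryPop(out session))
            session = new Session(); // expensive, but happens at most ~4 times with MaxDegreeOfParallelism = 4

        try
        {
            session.Process(item); // placeholder for the real per-item work
        }
        finally
        {
            reusableSessions.Push(session); // return it for the next parallel execution
        }
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
}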
I have 10 lists of over 100 MB each containing emails, and I want to process them with multiple threads as fast as possible, without loading them into memory (something like reading line by line or reading small blocks).
I have created a function which removes invalid ones based on a regex, and another which organizes them into other lists by domain.
I managed to do it using one thread with:
while (reader.Peek() != -1)
but it takes too damn long.
How can I use multiple threads (around 100-200), and maybe a BackgroundWorker or something, so the form stays usable while the lists are processed in parallel?
I'm new to csharp :P
Unless the data is on multiple physical discs, chances are that any more than a few threads will slow down, rather than speed up, the process.
What'll happen is that rather than reading consecutive data (pretty fast), you'll end up seeking to one place to read data for one thread, then seeking to somewhere else to read data for another thread, and so on. Seeking is relatively slow, so it ends up slower -- often quite a lot slower.
About the best you can do is dedicate one thread to reading data from each physical disc, then another to process the data -- but unless your processing is quite complex, or you have a lot of fast hard drives, one thread for processing may be entirely adequate.
There are multiple approaches to it:
1.) You can create threads explicitly, like Thread t = new Thread(...), but this approach is expensive in terms of creating and managing each thread.
2.) You can use the .NET ThreadPool and pass your work method to the static QueueUserWorkItem method of the ThreadPool class. This approach needs some manual bookkeeping and synchronization primitives.
3.) You can create an array of System.Threading.Tasks.Task, each processing one list; the tasks execute in parallel using the available processors on the machine, and you pass the array to Task.WaitAll(Task[]) to wait for their completion. This approach is Task Parallelism, and you can find detailed information on MSDN:
Task[] tasks = new Task[10];
for (int i = 0; i < 10; i++)
{
    // automatically create an async task and execute it on a ThreadPool thread
    tasks[i] = Task.Factory.StartNew([address of function/lambda expression]);
}
try
{
    // Wait for all tasks to complete
    Task.WaitAll(tasks);
}
catch (AggregateException ae)
{
    // handle the aggregate exception here
    // it is raised if one or more tasks throw; all exceptions from the faulting tasks are accumulated in this exception object
}
// continue your processing further
You will want to take a look at the Task Parallel Library (TPL).
This library is made for exactly this kind of parallel work. It will run your actions on the ThreadPool in (typically) the most efficient fashion. The one caution is that if you run 100-200 threads at once, you will likely pay for it in context switching, unless you have 100-200 processors. A good rule of thumb is to only run as many tasks in parallel as you have processors.
Some other good resources to review how to use the TPL:
Why and how to use the TPL
How to start a task.
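As a rough illustration of that rule of thumb (hedged: the folder path and ProcessLine are placeholders for your actual file locations and per-email processing), you can cap the parallelism explicitly:
var files = Directory.EnumerateFiles(@"C:\lists", "*.txt"); // assumed location of the 10 list files
Parallel.ForEach(
    files,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    file =>
    {
        // File.ReadLines streams the file line by line instead of loading it into memory
        foreach (var line in File.ReadLines(file))
            ProcessLine(line); // hypothetical per-email processing (regex filter, grouping, etc.)
    });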
I would be inclined to use parallel linq (plinq).
Something along the lines of:
Lists.AsParallel()
.SelectMany(list => list)
.Where(MyItemFileringFunction)
.GroupBy(DomainExtractionFunction)
AsParallel tells LINQ it can do this in parallel (which means the ordering of everything that follows will not be maintained)
SelectMany takes your individual lists and unrolls them so that all items from all lists are effectively in a single Enumerable
Where filters the items using your predicate function
GroupBy collects them by key, where DomainExtractionFunction is a function that extracts a key (the domain name, in your case) from each item (i.e. the email)
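Since the lists live in large files, here is a hedged sketch of how the same pipeline could stream them line by line rather than holding the lists in memory (filePaths, IsValidEmail and ExtractDomain are placeholders for your file list and your existing filter/grouping functions):
var groupedByDomain = filePaths                  // e.g. the 10 file paths
    .AsParallel()
    .SelectMany(path => File.ReadLines(path))    // streams each file line by line
    .Where(IsValidEmail)                         // your regex-based filter
    .GroupBy(ExtractDomain);                     // your domain-extraction function

foreach (var domainGroup in groupedByDomain)
    File.WriteAllLines(domainGroup.Key + ".txt", domainGroup); // one output list per domain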
I have an application that makes use of parallelization for processing data.
The main program is in C#, while one of the data-analysis routines is in an external C++ DLL. This library scans the data and calls a callback every time a certain signal is found within it. The signals should be collected, sorted and then stored to disk.
Here is my first simple implementation of the method invoked by the callback and of the method for sorting and storing data:
// collection where the found signals are saved
List<MySignal> mySignalList = new List<MySignal>();

// method invoked by the callback
private void Collect(int type, long time)
{
    lock (locker) { mySignalList.Add(new MySignal(type, time)); }
}

// store signals to disk
private void Store()
{
    // sort the signals
    mySignalList.Sort();
    // file is an object that manages writing data to a FileStream
    file.Write(mySignalList.ToArray());
}
The data is a two-dimensional array (short[][] data) of size 10000 x n, with n variable. I use parallelization in this way:
Parallel.For(0, 10000, (int i) =>
{
    // wrapper for the external c++ dll
    ProcessData(data[i]);
});
Now, for each of the 10000 arrays I estimate that 0 to 4 callbacks could be fired. I'm facing a bottleneck, and given that my CPU is not fully utilized, I suppose the lock (together with thousands of callbacks) is the problem (am I right, or could there be something else?). I've tried the ConcurrentBag collection, but performance is even worse (in line with other users' findings).
I thought that a possible way to use lock-free code would be to have multiple collections. Then a strategy would be needed to make each thread of the parallel process work on a single collection. The collections could, for instance, live in a dictionary keyed by thread ID, but I do not know of any .NET facility for this (I would have to know the thread IDs to initialize the dictionary before launching the parallelization). Is this idea feasible and, if so, does a .NET tool exist for it? Or, alternatively, is there any other idea to speed up the process?
[EDIT]
I've followed Reed Copsey's suggestion and used the following solution (according to the VS2010 profiler, the overhead of locking and adding to the list previously took 15% of the resources, but now only 1%):
// master collection where the found signals are saved
List<MySignal> mySignalList = new List<MySignal>();

// thread-local storage of data (each thread works on its own List<MySignal>)
ThreadLocal<List<MySignal>> threadLocal;

// analyze data
private void AnalizeData()
{
    using (threadLocal = new ThreadLocal<List<MySignal>>(() =>
        { return new List<MySignal>(); }))
    {
        Parallel.For<int>(0, 10000,
            () => { return 0; },
            (i, loopState, localState) =>
            {
                // wrapper for the external c++ dll
                ProcessData(data[i]);
                return 0;
            },
            (localState) =>
            {
                lock (this)
                {
                    // add the thread-local list to the master collection
                    mySignalList.AddRange(threadLocal.Value);
                    threadLocal.Value.Clear();
                }
            });
    }
}

// method invoked by the callback
private void Collect(int type, long time)
{
    threadLocal.Value.Add(new MySignal(type, time));
}
You might want to look at using ThreadLocal<T> to hold your collections. This automatically allocates a separate collection per thread.
That being said, there are overloads of Parallel.For which work with local state and have a final collection pass at the end. This, potentially, would allow you to spawn your ProcessData wrapper where each loop body works on its own collection, and then recombine at the end. This would, potentially, eliminate the need for locking (since each thread is working on its own data set) until the recombination phase, which happens once per thread (instead of once per task, i.e. 10000 times). This could reduce the number of locks you're taking from ~25000 (0-4 * 10000) down to a few (system and algorithm dependent, but on a quad core system, probably around 10 in my experience).
For details, see my blog post on aggregating data with Parallel.For/ForEach. It demonstrates the overloads and explains how they work in more detail.
You don't say how much of a "bottleneck" you're encountering. But let's look at the locks.
On my machine (quad core, 2.4 GHz), a lock costs about 70 nanoseconds if it's not contended. I don't know how long it takes to add an item to a list, but I can't imagine that it takes more than a few microseconds. But let's say it takes 100 microseconds (I would be very surprised to find that it's even 10 microseconds) to add an item to the list, taking lock contention into account. So if you're adding 40,000 items to the list, that's 4,000,000 microseconds, or 4 seconds. And I would expect one core to be pegged if that were the case.
I haven't used ConcurrentBag, but I've found the performance of BlockingCollection to be very good.
I suspect, though, that your bottleneck is somewhere else. Have you done any profiling?
The basic collections in C# aren't thread safe.
The problem you're having is due to the fact that you're locking the entire collection just to call an add() method.
You could create a thread-safe collection that only locks single elements inside the collection, instead of the whole collection.
Lets look at a linked list for example.
Implement an add(item (or list)) method that does the following:
Lock collection.
A = get last item.
set last item reference to the new item (or last item in new list).
lock last item (A).
unlock collection.
add new items/list to the end of A.
unlock locked item.
This will lock the whole collection for just 3 simple tasks when adding.
Then when iterating over the list, just do a trylock() on each object. If it's locked, wait for the lock to be free (that way you're sure that the add() finished).
In C# you can do an empty lock() block on the object as a trylock().
So now you can add safely and still iterate over the list at the same time.
Similar solutions can be implemented for the other commands if needed.
Any built-in solution for a collection is going to involve some locking. There may be ways to avoid it, perhaps by segregating the actual data constructs being read/written, but you're going to have to lock SOMEWHERE.
Also, understand that Parallel.For() will use the thread pool. While simple to implement, you lose fine-grained control over creation/destruction of threads, and the thread pool involves some serious overhead when starting up a big parallel task.
From a conceptual standpoint, I would try two things in tandem to speed up this algorithm:
Create threads yourself, using the Thread class. This frees you from the scheduling slowdowns of the thread pool; a thread starts processing (or waiting for CPU time) when you tell it to start, instead of the thread pool feeding requests for threads into its internal workings at its own pace. You should be aware of the number of threads you have going at once; the rule of thumb is that the benefits of multithreading are overcome by the overhead when you have more than twice the number of active threads as "execution units" available to execute threads. However, you should be able to architect a system that takes this into account relatively simply.
Segregate the collection of results, by creating a dictionary of collections of results. Each results collection is keyed to some token carried by the thread doing the processing and passed to the callback. The dictionary can have multiple elements READ at one time without locking, and as each thread is WRITING to a different collection within the Dictionary there shouldn't be a need to lock those lists (and even if you did lock them you wouldn't be blocking other threads). The result is that the only collection that has to be locked such that it would block threads is the main dictionary, when a new collection for a new thread is added to it. That shouldn't have to happen often if you're smart about recycling tokens.
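A hedged sketch of that second point, using ConcurrentDictionary as the outer dictionary and the managed thread ID as the token (in a hand-rolled thread design you would pass your own token into the callback instead; MySignal and file come from the question's code):
// one result list per worker thread; only the dictionary itself needs to be thread-safe
private readonly ConcurrentDictionary<int, List<MySignal>> resultsByThread =
    new ConcurrentDictionary<int, List<MySignal>>();

// method invoked by the callback
private void Collect(int type, long time)
{
    // GetOrAdd only contends when a thread shows up for the first time
    var list = resultsByThread.GetOrAdd(
        Thread.CurrentThread.ManagedThreadId,
        _ => new List<MySignal>());
    list.Add(new MySignal(type, time)); // no lock: this list is only touched by the current thread
}

// after the parallel work completes, merge, sort and store
private void Store()
{
    var all = resultsByThread.Values.SelectMany(list => list).ToList();
    all.Sort();
    file.Write(all.ToArray());
}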
Alright...I've given the site a fair search and have read over many posts about this topic. I found this question: Code for a simple thread pool in C# especially helpful.
However, as it always seems, what I need varies slightly.
I have looked over the MSDN example and adapted it to my needs somewhat. The example I refer to is here: http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80,printer).aspx
My issue is this. I have a fairly simple set of code that loads a web page via the HttpWebRequest and WebResponse classes and reads the results via a Stream. I fire off this method in a thread as it needs to be executed many times. The method itself is pretty short, but the number of times it needs to be fired (with different data each time) varies. It can be anywhere from 1 to 200.
Everything I've read seems to indicate the ThreadPool class being the prime candidate. Here is where things get tricky. I might need to fire off this thing say 100 times, but I can only have 3 threads at most running (for this particular task).
I've tried setting the MaxThreads on the ThreadPool via:
ThreadPool.SetMaxThreads(3, 3);
I'm not entirely convinced this approach is working. Furthermore, I don't want to clobber other web sites or programs running on the system this will be running on. So, by limiting the # of threads on the ThreadPool, can I be certain that this pertains to my code and my threads only?
The MSDN example uses the event drive approach and calls WaitHandle.WaitAll(doneEvents); which is how I'm doing this.
So the heart of my question is, how does one ensure or specify a maximum number of threads that can be run for their code, but have the code keep running more threads as the previous ones finish up until some arbitrary point? Am I tackling this the right way?
Sincerely,
Jason
Okay, I've added a semaphore approach and completely removed the ThreadPool code. It seems simple enough. I got my info from: http://www.albahari.com/threading/part2.aspx
It's this example that showed me how:
[text below here is a copy/paste from the site]
A Semaphore with a capacity of one is similar to a Mutex or lock, except that the Semaphore has no "owner" – it's thread-agnostic. Any thread can call Release on a Semaphore, while with Mutex and lock, only the thread that obtained the resource can release it.
In this following example, ten threads execute a loop with a Sleep statement in the middle. A Semaphore ensures that not more than three threads can execute that Sleep statement at once:
class SemaphoreTest
{
    static Semaphore s = new Semaphore(3, 3); // Available=3; Capacity=3

    static void Main()
    {
        for (int i = 0; i < 10; i++)
            new Thread(Go).Start();
    }

    static void Go()
    {
        while (true)
        {
            s.WaitOne();
            Thread.Sleep(100); // Only 3 threads can get here at once
            s.Release();
        }
    }
}
Note: if you are limiting this to "3" just so you don't overwhelm the machine running your app, I'd make sure this is a problem first. The threadpool is supposed to manage this for you. On the other hand, if you don't want to overwhelm some other resource, then read on!
You can't manage the size of the threadpool just for your own task (or really much of anything else about it).
In this case, I'd use a semaphore to manage access to your resource. In your case, your resource is running the web scrape, or calculating some report, etc.
To do this, in your static class, create a semaphore object:
System.Threading.Semaphore S = new System.Threading.Semaphore(3, 3);
Then, in each thread, you do this:
try
{
    // wait your turn (decrement)
    S.WaitOne();

    // do your thing
}
finally
{
    // release so others can go (increment)
    S.Release();
}
Each thread will block on the S.WaitOne() until it is given the signal to proceed. Once S has been decremented 3 times, all threads will block until one of them increments the counter.
This solution isn't perfect.
If you want something a little cleaner, and more efficient, I'd recommend going with a BlockingQueue approach wherein you enqueue the work you want performed into a global Blocking Queue object.
Meanwhile, you have three threads (which you created yourself -- not from the threadpool) popping work out of the queue to perform. This isn't that tricky to set up and is very fast and simple.
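A hedged sketch of that pattern (using BlockingCollection, available since .NET 4, in place of a hand-rolled blocking queue; FetchPage stands in for your HttpWebRequest/WebResponse method):
// global blocking queue of work items (here, URLs to fetch)
static BlockingCollection<string> workQueue = new BlockingCollection<string>();

static void StartWorkers()
{
    // exactly three consumer threads, created explicitly rather than taken from the threadpool
    for (int i = 0; i < 3; i++)
    {
        var worker = new Thread(() =>
        {
            // blocks when the queue is empty, exits once CompleteAdding() has been called
            foreach (var url in workQueue.GetConsumingEnumerable())
                FetchPage(url); // your HttpWebRequest/WebResponse method
        });
        worker.IsBackground = true;
        worker.Start();
    }
}

// the producer enqueues anywhere from 1 to 200 items
static void Enqueue(string url) { workQueue.Add(url); }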
Examples:
Best threading queue example / best practice
Best method to get objects from a BlockingQueue in a concurrent program?
It's a static class like any other, which means that anything you do with it affects every other thread in the current process. It doesn't affect other processes.
I consider this one of the larger design flaws in .NET, however. Who came up with the brilliant idea of making the thread pool static? As your example shows, we often want a thread pool dedicated to our task, without having it interfere with unrelated tasks elsewhere in the system.
Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if anyone more knowledgeable could take a peek at my code and see if there's any obvious problems. No one I work with knows multithreading, so I'm really just on my own, on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
public class ItemValidationService
{
    /// <summary>
    /// The object to lock on in this class, for multithreading purposes.
    /// </summary>
    private static object locker = new object();

    /// <summary>Items that have been validated.</summary>
    private HashSet<int> validatedItems;

    /// <summary>Items that are currently being validated.</summary>
    private HashSet<int> validatingItems;

    /// <summary>Remove an item from the index if its links are bad.</summary>
    /// <param name="id">The ID of the item.</param>
    public void ValidateItem(int id)
    {
        lock (locker)
        {
            if (!this.validatedItems.Contains(id) &&
                !this.validatingItems.Contains(id))
            {
                ThreadPool.QueueUserWorkItem(sender =>
                {
                    this.Validate(id);
                });
            }
        }
    } // method

    private void Validate(int itemId)
    {
        lock (locker)
        {
            this.validatingItems.Add(itemId);
        }

        // *********************************************
        // Time-consuming routine to validate an item...
        // *********************************************

        lock (locker)
        {
            this.validatingItems.Remove(itemId);
            this.validatedItems.Add(itemId);
        }
    } // method
} // class
The thread pool is a convenient choice if you have lightweight, sporadic processing that isn't time-sensitive. However, I recall reading on MSDN that it's not appropriate for large-scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread which holds a queue of work items. Then fork a bunch of workers that pop items off that queue to process. I use a blocking queue so that when there are no items to process, the workers just block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If it's mainly CPU-bound, then I would create no more than 1 worker thread per processor/core on the box. This will tell you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as File Access or database access then you could use a few more threads than the number of processors.
Be careful, QueueUserWorkItem might fail
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems collections before queueing a work item, you do not add the item to validatingItems until the new thread spins up. That means there is a window in which another thread could call ValidateItem(id) with the same id (unless this is running on a single main thread).
I would add the item to validatingItems just before queuing the work item, inside the lock.
Edit: also QueueUserWorkItem() returns a bool so you should use the return value to make sure the item was queued and THEN add it to the validatingItems queue.
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition that exists in your code, if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen in the main thread (ValidateItem), not in the thread pool thread (Validate method). This call should occur a line before the queueing of the work item to the pool.
A worse bug comes from not checking the return value of QueueUserWorkItem. Queueing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems list and handle the error (probably by throwing an exception).
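Putting the two fixes above together, a hedged sketch of how ValidateItem might look (assuming the rest of the questioner's class is unchanged):
public void ValidateItem(int id)
{
    lock (locker)
    {
        if (this.validatedItems.Contains(id) || this.validatingItems.Contains(id))
            return;

        // mark the item as in-flight before queuing, so a second call with the same id is a no-op
        this.validatingItems.Add(id);

        // QueueUserWorkItem can return false; undo the bookkeeping if it does
        if (!ThreadPool.QueueUserWorkItem(state => this.Validate(id)))
        {
            this.validatingItems.Remove(id);
            throw new InvalidOperationException("Could not queue validation work item for id " + id);
        }
    }
}
Validate() would then no longer add the item itself; it would only remove it from validatingItems and add it to validatedItems when done.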
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and an item could take up to a few seconds to be validated. That could be quite a large number of threads -- using your metrics, worst case 60-90 threads! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control number of threads created -- i.e. you could have a maximum number of threads allowed, but under smaller loads, you wouldn't necessarily have to create the maximum number if fewer ended up getting the job done -- i.e. your own pool size could be dynamic with a cap.
When using the thread pool, I also find it more difficult to monitor the execution of the pool's threads as they perform the work. So, unless it's fire-and-forget, I am in favor of more controlled execution. I know you mentioned that your app exits after all 65K items are completed. How are you monitoring your threads to determine whether they have completed their work -- i.e. that all queued workers are done? Are you monitoring the status of all items in the HashSets? I think that by queuing your items up and having your own worker threads consume from that queue, you can gain more control. Admittedly, this can come at the cost of more overhead in terms of signaling between threads to indicate when all items have been queued, allowing them to exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even being used for non-Robotics stuff (Myspace.com uses it behind the scenes for their content-delivery network).
I would recommend looking into MSDN: Task Parallel Library - Dataflow. You can find examples there of implementing Producer-Consumer; in your case the database produces the items to validate, and the validation routine is the consumer.
I'd also recommend using ConcurrentDictionary<TKey, TValue> as a "concurrent hash set", where you just populate the keys and ignore the values :). You can potentially make your code lock-free.
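A hedged sketch of what that could look like (assuming TPL Dataflow is available and reusing the questioner's Validate logic; the names and the degree of parallelism are illustrative):
// "concurrent hash sets": only the keys matter, the byte values are ignored
ConcurrentDictionary<int, byte> validated = new ConcurrentDictionary<int, byte>();
ConcurrentDictionary<int, byte> validating = new ConcurrentDictionary<int, byte>();

// the consumer: validates items posted to it, a few at a time
var validator = new ActionBlock<int>(id =>
{
    // TryAdd returns false if the id is already in flight, so duplicates are skipped lock-free
    if (validated.ContainsKey(id) || !validating.TryAdd(id, 0))
        return;

    Validate(id); // the questioner's time-consuming routine

    byte ignored;
    validating.TryRemove(id, out ignored);
    validated.TryAdd(id, 0);
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

// the producer: as the database hands out ids, post them to the block
// validator.Post(id);

// when the 65K items are done, signal completion and wait
// validator.Complete();
// validator.Completion.Wait();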