Limiting Self-Expanding Tasks - C#

Background: I'm using .NET 4.0 and the new Task class. Using the code found at http://msdn.microsoft.com/en-us/library/ee789351.aspx I've implemented a task scheduler that limits the number of concurrently executing threads. I've also modified the code to use a BlockingCollection instead of a LinkedList for my list of tasks, so I can limit how large the list of pending tasks gets. Each of my tasks can potentially spawn other tasks; so task A can spawn tasks B, C, and D.
Problem: Task A isn't technically complete until it has added tasks B, C, and D to the queue, but if the queue is full, A blocks. So how do I have a limited task queue with self-expanding tasks?
The reason I need to limit the size of the queue is that it would otherwise explode my memory use. A single task can spawn thousands of other tasks; 10-15 tasks each queuing up thousands more... you get the picture.
Suggestions would be appreciated!
Thanks,
Dan

how do I have a limited task queue with self-expanding tasks
I would say you don't; that is, your requirements are in conflict with each other. If you must have a hard limit on the number of pending tasks, then you must accept that an attempt to pend a new task can either block (as now) or fail.
If "a single task can spawn thousands of other tasks" then you are necessarily going to have the possibility of a large amount of pending work. The purpose of the task queue is to act as a place to hold pending work. However, since (I would hope) most tasks will not pend thousands of new tasks, over time the amount of pending work will diminish, eventually to nothing.
In a sense, one of the points of having a queue of pending tasks is precisely that new pends can be done without regard to currently available processing time. 'Thousands' of items in the wait queue should not be a problem, as far as memory use goes. Millions maybe, but even then - have you profiled and demonstrated this to be a problem?
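If you do keep the hard limit, the choice between blocking and failing is easy to express with BlockingCollection. A minimal sketch, not the OP's actual scheduler; the capacity of 1000 and DoWork are placeholders:

using System;
using System.Collections.Concurrent;

// A bounded queue of pending work; 1000 is an arbitrary illustrative cap.
BlockingCollection<Action> pending = new BlockingCollection<Action>(1000);

// Blocking behaviour: waits until space frees up (what the OP sees now).
pending.Add(() => DoWork());

// Failing behaviour: gives up after 100 ms and lets the caller decide.
bool accepted = pending.TryAdd(() => DoWork(), TimeSpan.FromMilliseconds(100));
if (!accepted)
{
    // e.g. run the child inline, or log it and retry later
}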

I would suggest using a limited 'master' queue, and having any dependent tasks run in a separate, unlimited 'slave' queue. So Task A goes in the 'master' queue, but the other tasks are created in the 'slave' queue.
To keep limits in place, you'd still stop more than (for example) 10 tasks from being queued in the master queue; or you could stop anything being added to the master queue while there are more than 30 tasks in the slave queue.
Hope that helps - if you have any questions about this answer please post a comment here and I will be happy to help or provide an example!

You could perhaps have two lists of tasks in the task scheduler: PrimaryTasks and SecondaryTasks, with a different limit for each.
Have a small limit on primary tasks, but a larger limit on secondary tasks. Since the secondary tasks don't expand, I assume they won't ever block, and hence the primary tasks will eventually finish too.
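A minimal sketch of that two-list idea, assuming BlockingCollection-backed queues; the names, the limits, and DoChildWork are made up for illustration:

using System;
using System.Collections.Concurrent;

// Small cap for tasks that expand, a much larger one for leaf tasks.
BlockingCollection<Action> primaryTasks = new BlockingCollection<Action>(10);
BlockingCollection<Action> secondaryTasks = new BlockingCollection<Action>(10000);

primaryTasks.Add(() =>
{
    // Children go to the secondary queue, so the parent task never
    // blocks on the same (small) queue it came from.
    for (int i = 0; i < 1000; i++)
    {
        secondaryTasks.Add(() => DoChildWork()); // DoChildWork is hypothetical
    }
});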


TPL Dataflow Block with permanent Task/Thread

Stephen Toub mentions in this Channel 9 video that a *Block creates a task when an item is pushed to its incoming queue; once all items in the queue are processed, the task gets destroyed.
If I use a lot of blocks to build up a mesh, the number of actually running tasks is not clear (and if the TaskScheduler is the default one, the number of active ThreadPool threads is also not clear).
Does TPL Dataflow offer a way for me to say: "OK, I want this kind of block to have a permanently running task (thread)"?
TL;DR: there is no way to dedicate a thread to a block, as that clearly conflicts with the purpose of TPL Dataflow, short of implementing your own TaskScheduler. Do measure before trying to improve your application's performance.
I just watched the video and can't find such a phrase in there:
creates a task when an item is pushed to its incoming queue; once all items in the queue are processed, the task gets destroyed.
Maybe I'm missing something, but all that Stephen says is: [at the beginning] we have a common producer-consumer problem, which can easily be implemented with the .NET 4.0 stack, but the problem is that if the data runs out, the consumer leaves the loop and never returns.
[After that] Stephen explains how such a problem can be solved with TPL Dataflow, and he says that the ActionBlock starts a Task if one isn't already started. Inside that task there is code which waits (in async fashion) for a new message, freeing up the thread but not destroying the task.
Stephen also mentions tasks while explaining how messages are sent across linked blocks, and there he says that the posting task will fade away if there is no data to send. That doesn't mean the task corresponding to the block fades away; it's only a child task used to send the data, and that's it.
In TPL Dataflow the only way to tell a block that there won't be any more data is to call its Complete method or to complete any of the linked blocks. After that the consuming task will be stopped and, once all buffered data has been processed, the block will end its task.
According to the official GitHub repository for TPL Dataflow, all tasks for message handling inside blocks are created with DenyChildAttach and, sometimes, with the PreferFairness flag. So there is no reason to provide a mechanism to pin one thread directly to a block, as it would get stuck and waste CPU resources whenever there is no data for the block. You could introduce a custom TaskScheduler for the blocks, but right now it's not obvious why you would need that.
If you're worried that some block may get more CPU time than others, there is a way to mitigate that. According to the official docs, you can try setting the MaxMessagesPerTask property, forcing the task to restart after some number of messages has been processed. Still, this should be done only after measuring actual execution times.
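For example, a sketch of setting that property; the block body and the value 50 are placeholders:

using System.Threading.Tasks.Dataflow;

// Recycle the block's internal task after every 50 processed messages,
// so other blocks' tasks get a chance to be scheduled.
var options = new ExecutionDataflowBlockOptions { MaxMessagesPerTask = 50 };
var block = new ActionBlock<int>(item => Process(item), options); // Process is hypothetical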
Now, back to your words:
number of actually running tasks is not clear
the number of active ThreadPool threads is also not clear
How did you profile your application? While debugging you can easily find all active tasks and all active threads. If that's not enough, you can profile your application, either with native Microsoft tools or with a specialized profiler such as dotTrace. Such a toolkit can easily show you what's going on in your app.
The talk is about the internal machinery of the TPL Dataflow library. As a mechanism it is quite efficient, and you shouldn't really worry about overhead unless your intended throughput is on the order of 100,000 messages per second or more (in which case you should look for ways to chunkify your workload). Even with workloads of very small granularity, the difference between processing the messages using a single task for all messages and a dedicated task for each one should be hardly noticeable. A Task is an object that normally "weighs" a couple hundred bytes, and the .NET platform is able to create and recycle millions of objects of this size per second.
It would be a problem if each Task required its own dedicated 1MB thread in order to run, but this is not the case. Typically the tasks are executed using ThreadPool threads, and a single ThreadPool thread can potentially execute millions of short-lived tasks per second.
I should also mention that TPL Dataflow supports asynchronous lambdas too (lambdas with Task return types), in which case the blocks essentially don't have to execute any code at all: they just await the generated promise-style tasks to complete, and asynchronous waiting needs no thread.
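A small sketch of such an asynchronous lambda; the Task.Delay stands in for real async I/O:

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var block = new ActionBlock<string>(async url =>
{
    // While this await is pending, no thread is blocked on the block's behalf.
    await Task.Delay(1000); // placeholder for real asynchronous work
    Console.WriteLine(url);
});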

C# Multithreading Model

I have a single-threaded C# application and am currently working on making it multi-threaded using thread pools. I am stuck deciding which model would work for my problem.
Here's my current scenario:
while (true)
{
    do_sometask();
    wait(time);
}
And this is repeated almost forever. The new scenario has multiple threads that each do the above. I could easily implement it by spawning a number of threads based on the tasks I have to perform, where each thread performs its task and then waits forever.
The issue is that I may not know the number of tasks, so I can't just blindly spawn 500 threads. I thought about using a thread pool, but because almost every thread loops forever and is never freed up for new tasks in the queue, I am not sure which other model to use.
I am looking for an idea or solution where I can break the loop in a thread and free it up instead of waiting, but have it come back and resume the same task after the wait (when the time has elapsed, using something like a timer, or by checking the timestamp of when the task last ran).
That way I could use a limited number of threads (like in a thread pool) and serve the tasks that come in while the old threads are (virtually) waiting.
Any help is really appreciated.
If all you have is a bunch of things that happen periodically, it sounds like what you want is a bunch of timers. Create a timer for each task, to fire when appropriate. So if you have two different tasks:
using System;
using System.Threading;

// Task1 happens once per minute
Timer task1Timer = new Timer(
    s => DoTask1(),
    null,
    TimeSpan.FromMinutes(1),
    TimeSpan.FromMinutes(1));

// Task2 happens once every 47 seconds
Timer task2Timer = new Timer(
    s => DoTask2(),
    null,
    TimeSpan.FromSeconds(47),
    TimeSpan.FromSeconds(47));
The timer is a pretty lightweight object, so having a whole bunch of them isn't really a problem. A timer only takes CPU resources when it fires, and the callback method is executed on a pool thread.
There is one potential problem: if you have a whole lot of timers all with the same period, the callbacks will all be called at the same time. The thread pool should handle that gracefully by limiting the number of concurrent tasks, but I can't say for sure. If your wait times are staggered, though, this is going to work well.
If you have small wait times (less than a second), then you probably need a different technique. I'll detail that if required.
With the following design, you only have one thread blocked at any time.
Have one thread (the master thread) waiting on a concurrent blocking collection, such as BlockingCollection. This thread is blocked by a call to TryTake until something is placed in the collection, or until a timeout passed into the call expires (more on the timeout later).
Once it is unblocked, it may have a unit of work to process. It checks whether there is one (i.e., the TryTake call didn't time out), then whether there is capacity to perform the work, and if so queues up a thread (pool thread, Task, or whatever) to service it. The master thread then goes back to the blocking collection and tries to take another unit of work; the cycle continues.
As a unit of work begins, it is recorded so the master thread can see how many workers are running. Once the unit completes, the record is removed and the thread is freed.
You want the timeout so that, if too many operations are judged to be running concurrently, you can re-evaluate a set period of time later. Otherwise a unit of work could sit in the blocking collection until a new unit is added, which is not optimal.
Outside users of this instance can queue up new units of work by simply dropping them in the collection.
You can use a cancellation token to immediately unblock the thread when it's time to shut down operations. Have the worker operations take cancellation tokens as well so they can halt on shutdown.
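A rough sketch of that master loop, assuming Action work items; the one-second timeout and MaxConcurrent cap are illustrative, not tuned values:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

BlockingCollection<Action> workQueue = new BlockingCollection<Action>();
CancellationTokenSource cts = new CancellationTokenSource();
int runningCount = 0;
const int MaxConcurrent = 8; // arbitrary concurrency cap

try
{
    while (true)
    {
        Action work;
        // Wake up at least once per second to re-evaluate capacity;
        // cts.Cancel() unblocks this call immediately at shutdown.
        if (!workQueue.TryTake(out work, 1000, cts.Token))
            continue;

        if (Interlocked.CompareExchange(ref runningCount, 0, 0) >= MaxConcurrent)
        {
            workQueue.Add(work); // over capacity: put it back for later
            continue;
        }

        Interlocked.Increment(ref runningCount);
        Task.Factory.StartNew(() =>
        {
            try { work(); }
            finally { Interlocked.Decrement(ref runningCount); }
        });
    }
}
catch (OperationCanceledException)
{
    // shutdown requested
}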
I could implement it with the help of a thread pool and a few conditions that check the last activity of a task before adding it back to the thread-pool queue.

.NET is not starting the tasks as I order it - what is the logic behind starting new tasks?

After working a while I noticed that even if you spawn 1000 tasks, they don't start immediately. So even if I start 1000 tasks, 100 of them are running and 900 are waiting to run.
So my question is: how do they begin to start?
How does .NET determine when to start running a task and when to leave it as WaitingToRun?
What methodology can I follow to start them immediately?
I want to have a certain number of tasks/threads running all the time.
If I use threads instead of tasks, will they start running immediately, or will .NET start them as it pleases, like tasks?
The question may not be very clear, so please ask me to clarify.
Basically I am keeping 1000 tasks spawned (when one task completes I start another), but only 125 of them are Running and 875 are WaitingToRun. :)
This is how I start a task:
Task.Factory.StartNew(() =>
{
    startCheckingProxies();
});
If you are talking about Task objects, they run on top of the thread pool, so they will not all start immediately, each on its own thread. Instead, a limited number of tasks will initially be started on threads coming from the pool, and the threads will then be reused to run the next tasks, and so on.
Of course, this is just a high-level description; the logic behind it is more complex and implements a lot of optimizations.
You can also start tasks with the overload of StartNew that lets you tweak options and scheduler settings. Note, however, that running on a large number of threads will likely result in worse performance. Thread creation and context switching have significant costs, and running thousands of threads will, in my opinion, backfire.
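If the goal is really a fixed set of always-running workers, one of those options is TaskCreationOptions.LongRunning, which with the default scheduler gives each task a dedicated thread instead of a pool thread. A sketch reusing the OP's method; the worker count of 8 is arbitrary:

using System.Threading.Tasks;

for (int i = 0; i < 8; i++)
{
    Task.Factory.StartNew(
        () => startCheckingProxies(),
        TaskCreationOptions.LongRunning); // hint: dedicated thread, not a pool thread
}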
Tasks are ultimately executed on threads under the hood.
There is a limit to how much benefit you can get by spawning new threads. Each thread has some overhead, so at some point the overhead exceeds the benefit of spawning a new thread. If you leave the spawning of those tasks to the Framework, it decides for itself how many threads to run at once, and it makes that decision based on how much productivity it thinks it can get from those threads.
I'm pretty sure that optimal number is not going to be a thousand; I've written Windows services where the optimal number of threads to run at the same time was the number of cores in the machine (in my case, 4).

C# ThreadPool Implementation / Performance Spikes

In an attempt to speed up processing of physics objects in C# I decided to change a linear update algorithm into a parallel algorithm. I believed the best approach was to use the ThreadPool as it is built for completing a queue of jobs.
When I first implemented the parallel algorithm, I queued up a job for every physics object. Keep in mind, a single job completes fairly quickly (updates forces, velocity, position, checks for collision with the old state of any surrounding objects to make it thread safe, etc). I would then wait on all jobs to be finished using a single wait handle, with an interlocked integer that I decremented each time a physics object completed (upon hitting zero, I then set the wait handle). The wait was required as the next task I needed to do involved having the objects all be updated.
The first thing I noticed was that performance was erratic. On average the thread pooling seemed to be going a bit faster, but it had massive spikes (roughly 10 ms per update, with random jumps to 40-60 ms). I attempted to profile this using ANTS, but I could not gain any insight into why the spikes were occurring.
My next approach was to still use the ThreadPool, but instead split all the objects into groups. I initially started with only 8 groups, as that was how many cores my computer had. The performance was great: it far outperformed the single-threaded approach and had no spikes (about 6 ms per update).
The only thing I thought about was that, if one job completed before the others, there would be an idle core. Therefore, I increased the number of jobs to about 20, and even up to 500. As I expected, the time dropped to 5 ms.
So my questions are as follows:
Why did spikes occur when the jobs were small and numerous?
Is there any insight into how the ThreadPool is implemented that would help me to understand how best to use it?
Using threads has a price: you need context switching, and you need locking (the job queue is most probably locked when a thread tries to fetch a new job). This price is usually small compared to the actual work your thread is doing, but if the work finishes quickly, the price becomes significant.
Your solution seems correct. A reasonable rule of thumb is to have twice as many threads as there are cores.
As you probably suspect, the spikes are likely caused by the code that manages the thread pool and distributes tasks to it.
For parallel programming, there are more sophisticated approaches than "manually" distributing work across different threads (even if using the threadpool).
See Parallel Programming in the .NET Framework, for instance, for an overview of the different options. In your case, the "solution" may be as simple as this:
Parallel.ForEach(physicObjects, physicObject => Process(physicObject));
Here's my take on your two questions:
I'd like to start with question 2 (how the thread pool works), because it actually holds the key to answering question 1. The thread pool is implemented (without going into details) as a thread-safe work queue plus a group of worker threads (which may shrink or grow as needed). When the user calls QueueUserWorkItem, the task is put into the work queue. The workers keep polling the queue and taking work if they are idle. Once they manage to take a task, they execute it and then return to the queue for more work (this is very important!). So the work is done by the workers on demand: as the workers become idle, they take more pieces of work to do.
Having said the above, it's simple to see the answer to question 1 (why you saw a performance difference with more fine-grained tasks): with fine grain you get more load balancing (a very desirable property), i.e. your workers do more or less the same amount of work, and all cores are exploited uniformly. As you said, with a coarse-grained task distribution there may be longer and shorter tasks, so one or more cores may lag behind, slowing down the overall computation while the others do nothing. With small tasks the problem goes away: each worker thread takes one small task at a time and then goes back for more. If a thread picks up a shorter task, it will go to the queue more often; if it takes a longer task, it will go to the queue less often, so things balance out.
Finally, when the jobs are too fine-grained, and considering that the pool may grow to over 1,000 threads, there is very high contention on the queue when all the threads go back to take more work (which happens very often), and that may account for the spikes you are seeing. If the underlying implementation uses a blocking lock to access the queue, context switches become very frequent, which hurts performance a lot and makes it look rather random.
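To make the trade-off concrete, here is a sketch of the grouped variant, assuming an objects array and an UpdatePhysics method (both hypothetical); the group count follows the "twice as many as cores" rule of thumb from the earlier answer:

using System;
using System.Threading;

int groupCount = 2 * Environment.ProcessorCount;
int chunkSize = (objects.Length + groupCount - 1) / groupCount;
using (CountdownEvent done = new CountdownEvent(groupCount))
{
    for (int g = 0; g < groupCount; g++)
    {
        int start = g * chunkSize;
        ThreadPool.QueueUserWorkItem(_ =>
        {
            // Each work item touches the pool's queue once per update,
            // not once per physics object, so contention stays low.
            int end = Math.Min(start + chunkSize, objects.Length);
            for (int i = start; i < end; i++)
                UpdatePhysics(objects[i]);
            done.Signal();
        });
    }
    done.Wait(); // all objects updated before the next simulation step
}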
Answer to question 1:
This is because of thread switching. Thread switching (or context switching, in OS terms) costs CPU cycles each time execution moves between threads. Multi-threading usually increases the speed of a program, but when each piece of work is very small and quick, the context switching takes more time than the work itself, so overall throughput decreases. You can find more information about this in OS concepts books.
Answer to question 2:
Actually I have only an overall insight into the ThreadPool, and I can't explain its structure exactly.
To learn more about the ThreadPool, start with the ThreadPool Class documentation.
Each version of the .NET Framework adds more capabilities that use the ThreadPool indirectly, such as the Parallel.ForEach method mentioned before (added in .NET 4 along with System.Threading.Tasks), which makes code more readable and neat. You can learn more in the Task Schedulers documentation as well.
At a very basic level, what it does is this: it creates, say, 20 threads and puts them in a list. Each time it receives a delegate to execute asynchronously, it takes an idle thread from the list and executes the delegate. If no available thread is found, the delegate is put into a queue. Every time a delegate finishes executing, the thread checks whether the queue has any items and, if so, takes one and executes it on the same thread.
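A toy version of that shape, just to illustrate the description above (the real ThreadPool is far more sophisticated):

using System;
using System.Collections.Concurrent;
using System.Threading;

class ToyThreadPool
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    public ToyThreadPool(int threadCount)
    {
        for (int i = 0; i < threadCount; i++)
        {
            Thread worker = new Thread(() =>
            {
                // Each worker loops forever: take a delegate, run it, repeat.
                foreach (Action work in _queue.GetConsumingEnumerable())
                    work();
            });
            worker.IsBackground = true;
            worker.Start();
        }
    }

    public void QueueWorkItem(Action work)
    {
        _queue.Add(work); // runs on an idle worker, or waits in the queue
    }
}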

Embarrassingly parallelizable tasks in .NET

I am working on a problem where I need to perform a lot of embarrassingly parallelizable tasks. Each task is created by reading data from the database, but a collection of all the tasks would exceed the amount of memory on the machine, so tasks have to be created, processed, and disposed of incrementally. I am wondering what would be a good approach to solving this problem. I am thinking of the following two approaches:
Implement a synchronized task queue. Implement a producer (task creator) that reads data from the database and puts tasks in the queue (limiting the number of tasks currently in the queue to a constant value to make sure the amount of memory is not exceeded). Have multiple consumers (task processors) that read tasks from the queue, process them, store the results, and dispose of the tasks. What would be a good number of consumers in this approach?
Use the .NET parallel extensions (PLINQ or Parallel.For), but I understand that a collection of tasks has to be created first (can we add tasks to the collection while processing in the parallel for?). So we would create batches of tasks -- say, N tasks at a time -- process each batch, and then read another N tasks.
What are your thoughts on these two approaches?
Use a ThreadPool with a bounded queue to avoid overwhelming the system.
If each of your worker tasks is CPU-bound, then configure your system initially so that the number of threads equals the number of hardware threads your box can run.
If your tasks aren't CPU-bound, you'll have to experiment with the pool size to find an optimal solution for your particular situation.
You may have to experiment with either approach to get to the optimal configuration.
Basically, test, adjust, test, repeat until you're happy.
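A sketch of approach 1 with a bounded queue, using tasks as the consumers rather than raw ThreadPool threads; WorkItem, ReadItemsFromDatabase, Process, and StoreResult are hypothetical, and the capacity of 100 is just a starting point to tune:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// The producer blocks on Add once 100 items are pending, capping memory use.
BlockingCollection<WorkItem> queue = new BlockingCollection<WorkItem>(100);

Task producer = Task.Factory.StartNew(() =>
{
    foreach (WorkItem item in ReadItemsFromDatabase())
        queue.Add(item);
    queue.CompleteAdding(); // lets the consumers' loops end
});

int consumerCount = Environment.ProcessorCount; // CPU-bound starting point
Task[] consumers = new Task[consumerCount];
for (int i = 0; i < consumerCount; i++)
{
    consumers[i] = Task.Factory.StartNew(() =>
    {
        foreach (WorkItem item in queue.GetConsumingEnumerable())
            StoreResult(Process(item));
    });
}

Task.WaitAll(consumers);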
I've not had the opportunity to actually use PLINQ; however, I do know that PLINQ (like vanilla LINQ) is based on IEnumerable. As such, I think this might be a case where it would make sense to implement the task producer via C# iterator blocks (i.e., the yield keyword).
Assuming you are not doing any operations where the entire set of tasks must be known in advance (e.g., ordering), I would expect PLINQ to consume only as many tasks as it can process at once. Also, this article references some strategies for controlling just how PLINQ goes about consuming input (the section titled "Processing Query Output").
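A sketch of that iterator-based producer feeding PLINQ; MoreRowsAvailable, ReadNextFromDatabase, Process, StoreResult, and WorkItem are placeholders:

using System;
using System.Collections.Generic;
using System.Linq;

static IEnumerable<WorkItem> ReadTasks()
{
    // Lazily yields one task at a time, so the whole set
    // never has to exist in memory at once.
    while (MoreRowsAvailable())
        yield return ReadNextFromDatabase();
}

static void Run()
{
    var results = ReadTasks()
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount)
        .Select(item => Process(item));

    foreach (var result in results)
        StoreResult(result);
}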
EDIT: Comparing PLINQ to a ThreadPool.
According to this MSDN article, efficiently allocating work to a thread pool is not at all trivial, and even when you do it "right", using the TPL generally exhibits better performance.
Use the ThreadPool.
Then you can queue up everything, and items will be run as threads become available to the pool, without overwhelming the system. The only trick is determining the optimum number of threads to run at a time.
Sounds like a job for Microsoft HPC Server 2008. Given that it's the number of tasks that's overwhelming, you need some kind of parallel process manager. That's what HPC server is all about.
http://www.microsoft.com/hpc/en/us/default.aspx
In order to give a good answer we need a few questions answered.
Is each individual task parallelizable? Or is each task the product of a parallelizable main task?
Also, is it the number of tasks that would cause the system to run out of memory, or the quantity of data each task holds and processes?
Sounds like Windows Workflow Foundation (WF) might be a good fit for this. It might also give you some extra benefits, such as pause/resume on your tasks.
