What is the best way to accomplish a queue line for threads so that I can only have a max number of threads and if I already have that many the code waits for a free slot before continuing..
Pseudo codeish example of what I mean, Im sure this can be done in a better way...
(Please check the additional requirements below)
private int _MaxThreads = 10;
private int _CurrentThreads = 0;
public void main(string[] args)
{
List<object> listWithLotsOfItems = FillWithManyThings();
while(listWithLotsOfItems.Count> 0)
{
// get next item that needs to be worked on
var item = listWithLotsOfItems[0];
listWithLotsOfItems.RemoveAt(0);
// IMPORTANT!, more items can be added as we go.
listWithLotsOfItems.AddRange(AddMoreItemsToBeProcessed());
// wait for free thread slot
while (_CurrentThreads >= _MaxThreads)
Thread.Sleep(100);
Interlocked.Increment(ref _CurrentThreads); // risk of letting more than one thread through here...
Thread t = new Thread(new ParameterizedThreadStart(WorkerThread(item));
t.Start();
}
}
public void WorkerThread(object bigheavyObject)
{
// do heavy work here
Interlocked.Decrement(ref _CurrentThreads);
}
Looked at Sempahore but that seems to be needing to run inside the threads and not outside before it is created. In this example the Semaphore is used inside the thread after it is created to halt it, and in my case there could be over 100k threads that need to run before the job is done so I would rather not create the thread before a slot is available.
(link to semaphore example)
In the real application, data can be added to the list of items as the program progresses so the Parallel.ForEach won't really work either (I'm doing this in a script component in a SSIS package to send data to a very slow WCF).
SSIS has .Net 4.0
So, let me first of all say that what you're trying to do is only going to give you a small enhancement to performance in a very specific arrangement. It can be a lot of work to try and tune at the thread-allocation level, so be sure you have a very good reason before proceeding.
Now, first of all, if you want to simply queue up the work, you can put it on the .NET thread pool. It will only allocate threads up to the maximum configured and any work that doesn't fit onto those (if all the threads are busy) will be queued up until a thread becomes available.
The simplest way to do this is to call:
Task.Factory.StartNew(() => { /* Your code */});
This creates a TPL task and schedules it to run on the default task scheduler, which should in turn allocate the task to the thread-pool.
If you need to wait for these tasks to complete before proceeding, you can add them to a collection and then use Task.WaitAll(...):
var tasks = new List<Task>();
tasks.Add(Task.Factory.StartNew(() => { /* Your code */}));
// Before leaving the script.
Task.WaitAll(tasks);
However, if you need to go deeper and control the scheduling of these tasks, you can look at creating a custom task scheduler that supports limited concurrency. This MSDN article goes into more details about it and suggests a possible implementation, but it isn't a trivial task.
The easiest way to do this is with the overload of Parallel.ForEach() which allows you to select MaxDegreeOfParallelism.
Here's a sample program:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
public static class Program
{
private static void Main()
{
List<int> items = Enumerable.Range(1, 100).ToList();
Parallel.ForEach(items, new ParallelOptions {MaxDegreeOfParallelism = 5}, process);
}
private static void process(int item)
{
Console.WriteLine("Processing " + item);
Thread.Sleep(2000);
}
}
}
If you run this, you'll see that it processes 5 elements very quickly and then there's a delay (caused by Thread.Sleep(2000)) before the next block of elements is processed. This is because in this sample code no more than 5 threads are allowed to execute at once.
Note that if MaxDegreeOfParallelism exceeds the threadpool's minimum thread value, then it may take a while for all the threads to be started.
The reason for this is that Parallel.ForEach() uses threadpool threads - and there is a certain number of threads that the threadpool keeps available by default. When creating threads beyond this limit, a delay is introduced inbetween each new threadpool thread creation.
You can set the minimum number of threadpool threads to a higher value using ThreadPool.SetMinThreads(), but I do NOT recommend this.
However, if you do want to do so, here's an example which sets the minimum thread count to 20:
ThreadPool.GetMinThreads(out dummy, out ioThreads);
ThreadPool.SetMinThreads(20, ioThreads);
If you do that and then run the previous code with MaxDegreeOfParallelism = 20 you'll see that there's no longer any delay when the initial threads are created.
Have you considered using a Wait handle? See this
Also you can use Parallel.Foreach to manage the thread creation for you.
Hope it helps ;)
Related
If I am creating Tasks using a for loop will those tasks run in parallel or would they just run one after the other?
Here is my code -
private void initializeAllSpas()
{
Task[] taskArray = new Task[spaItems.Count];
for(int i = 0; i < spaItems.Count; i++)
{
taskArray[i] = Task.Factory.StartNew(() => spaItems[i].initializeThisSpa());
}
Task.WhenAll(taskArray).Wait();
foreach (var task in taskArray) task.Dispose();
}
where spaItems is a list of items from another class, call it SpaItem, in which the initializeThisSpa() function opens a file and updates the information for that particular SpaItem.
My question is, does the above code actually excute initializeThisSpa() on all of the spaItems at the same time? if not, how can I correct that?
(I Ignored syntax issues if any and not tested)
At the same time?..
Not guaranteed. At least (the best bet) definitely there will be nano secs difference.
Tasks are placed in a queue.
And every task waits for its opportunity for a thread from threadpool, for its turn of execution.
It all depends on the availability of threads in thread pool. If no thread available, the tasks waits in queue.
There are different states for the task before its final execution. Here is a good explanation. And after going through this link, you will come to know that it is almost impossible to call a function at the same time from multiple tasks.
https://blogs.msdn.microsoft.com/pfxteam/2009/08/30/the-meaning-of-taskstatus/
You can achieve tasks sequentially (one after another) calling a specific function by creating tasks with methods like "ContinueWith, ContinueWhenAll, ContinueWhenAny,"
An example is below in MSDN documentation link.
https://msdn.microsoft.com/en-us/library/dd321473(v=vs.110).aspx
I have a function where I want to execute in a separate thread avoiding two threads to access the same resources. Also I want to make sure that if the thread is currently executing then stop that thread and start executing the new thread. This is what I have:
volatile int threadCount = 0; // use it to know the number of threads being executed
private void DoWork(string text, Action OncallbackDone)
{
threadCount++;
var t = new Thread(new ThreadStart(() =>
{
lock (_lock) // make sure that this code is only accessed by one thread
{
if (threadCount > 1) // if a new thread got in here return and let the last one execute
{
threadCount--;
return;
}
// do some work in here
Thread.Sleep(1000);
OncallbackDone();
threadCount--;
}
}));
t.Start();
}
if I fire that method 5 times then all the threads will be waiting for the lock until the lock is released. I want to make sure that I execute the last thread though. when the threads are waiting to be the owner of the lock how can I determine which will be the next one owning the lock. I want them to own the resource in the order that I created the threads...
EDIT
I am not creating this application with .net 4.0 . Sorry for not mentioning what I was trying to accomplish. I am creating an autocomplete control where I am filtering a lot of data. I don't want the main window to freeze eveytime I want to filter results. also I want to filter results as the user types. If the user types 5 letters at once I want to stop all threads and I will just be interested in the last one. because the lock blocks all the threads sometimes the last thread that I created may own the lock first.
I think you are overcomplicating this. If you are able to use 4.0, then just use the Task Parallel Library. With it, you can just set up a ContinueWith function so that threads that must happen in a certain order are done in the order you dictate. If this is NOT what you are looking for, then I actually would suggest that you not use threading, as this sounds like a synchronous action that you are trying to force into parallelism.
If you are just looking to cancel tasks: then here is a SO question on how to cancel TPL tasks. Why waste the resources if you are just going to dump them all except for the last one.
If you are not using 4.0, then you can accomplish the same thing with a Background Worker. It just takes more boilerplate code to accomplish the same thing :)
I agree with Justin in that you should use the .NET 4 Task Parallel Library. But if you want complete control you should not use the default Task Scheduler, which favors LIFO, but create your own Task Scheduler (http://msdn.microsoft.com/en-us/library/system.threading.tasks.taskscheduler.aspx) and implement the logic that you want to determine which task gets preference.
Using Threads directly is not recommended unless you have deep knowledge of .NET Threading. If you are on .NET 4.0; Tasks and TPL are preferred.
This is what I came up with after reading the links that you guys posted. I guess I needed a Queue therefore I implemented:
volatile int threadCount = 0;
private void GetPredicateAsync(string text, Action<object> DoneCallback)
{
threadCount++;
ThreadPool.QueueUserWorkItem((x) =>
{
lock (_lock)
{
if (threadCount > 1) // disable executing threads at same time
{
threadCount--;
return; // if a new thread is created exit.
// let the newer task do work!
}
// do work in here
Application.Current.Dispatcher.BeginInvoke(new Action(() =>
{
threadCount--;
DoneCallback(Foo);
}));
}
},text);
}
I can have a maximum of 5 threads running simultaneous at any one time which makes use of 5 separate hardware to speedup the computation of some complex calculations and return the result. The API (contains only one method) for each of this hardware is not thread safe and can only run on a single thread at any point in time. Once the computation is completed, the same thread can be re-used to start another computation on either the same or a different hardware depending on availability. Each computation is stand alone and does not depend on the results of the other computation. Hence, up to 5 threads may complete its execution in any order.
What is the most efficient C# (using .Net Framework 2.0) coding solution for keeping track of which hardware is free/available and assigning a thread to the appropriate hardware API for performing the computation? Note that other than the limitation of 5 concurrently running threads, I do not have any control over when or how the threads are fired.
Please correct me if I am wrong but a lock free solution is preferred as I believe it will result in increased efficiency and a more scalable solution.
Also note that this is not homework although it may sound like it...
.NET provides a thread pool that you can use. System.Threading.ThreadPool.QueueUserWorkItem() tells a thread in the pool to do some work for you.
Were I designing this, I'd not focus on mapping threads to your HW resources. Instead I'd expose a lockable object for each HW resource - this can simply be an array or queue of 5 Objects. Then for each bit of computation you have, call QueueUserWorkItem(). Inside the method you pass to QUWI, find the next available lockable object and lock it (aka, dequeue it). Use the HW resource, then re-enqueue the object, exit the QUWI method.
It won't matter how many times you call QUWI; there can be at most 5 locks held, each lock guards access to one instance of your special hardware device.
The doc page for Monitor.Enter() shows how to create a safe (blocking) Queue that can be accessed by multiple workers. In .NET 4.0, you would use the builtin BlockingCollection - it's the same thing.
That's basically what you want. Except don't call Thread.Create(). Use the thread pool.
cite: Advantage of using Thread.Start vs QueueUserWorkItem
// assume the SafeQueue class from the cited doc page.
SafeQueue<SpecialHardware> q = new SafeQueue<SpecialHardware>()
// set up the queue with objects protecting the 5 magic stones
private void Setup()
{
for (int i=0; i< 5; i++)
{
q.Enqueue(GetInstanceOfSpecialHardware(i));
}
}
// something like this gets called many times, by QueueUserWorkItem()
public void DoWork(WorkDescription d)
{
d.DoPrepWork();
// gain access to one of the special hardware devices
SpecialHardware shw = q.Dequeue();
try
{
shw.DoTheMagicThing();
}
finally
{
// ensure no matter what happens the HW device is released
q.Enqueue(shw);
// at this point another worker can use it.
}
d.DoFollowupWork();
}
A lock free solution is only beneficial if the computation time is very small.
I would create a facade for each hardware thread where jobs are enqueued and a callback is invoked each time a job finishes.
Something like:
public class Job
{
public string JobInfo {get;set;}
public Action<Job> Callback {get;set;}
}
public class MyHardwareService
{
Queue<Job> _jobs = new Queue<Job>();
Thread _hardwareThread;
ManualResetEvent _event = new ManualResetEvent(false);
public MyHardwareService()
{
_hardwareThread = new Thread(WorkerFunc);
}
public void Enqueue(Job job)
{
lock (_jobs)
_jobs.Enqueue(job);
_event.Set();
}
public void WorkerFunc()
{
while(true)
{
_event.Wait(Timeout.Infinite);
Job currentJob;
lock (_queue)
{
currentJob = jobs.Dequeue();
}
//invoke hardware here.
//trigger callback in a Thread Pool thread to be able
// to continue with the next job ASAP
ThreadPool.QueueUserWorkItem(() => job.Callback(job));
if (_queue.Count == 0)
_event.Reset();
}
}
}
Sounds like you need a thread pool with 5 threads where each one relinquishes the HW once it's done and adds it back to some queue. Would that work? If so, .Net makes thread pools very easy.
Sounds a lot like the Sleeping barber problem. I believe the standard solution to that is to use semaphores
I have some work (a job) that is in a queue (so there a several of them) and I want each job to be processed by a thread.
I was looking at Rx but this is not what I wanted and then came across the parallel task library.
Since my work will be done in an web application I do not want client to be waiting for each job to be finished, so I have done the following:
public void FromWebClientRequest(int[] ids);
{
// I will get the objects for the ids from a repository using a container (UNITY)
ThreadPool.QueueUserWorkItem(delegate
{
DoSomeWorkInParallel(ids, container);
});
}
private static void DoSomeWorkInParallel(int[] ids, container)
{
Parallel.ForEach(ids, id=>
{
Some work will be done here...
var respository = container.Resolve...
});
// Here all the work will be done.
container.Resolve<ILogger>().Log("finished all work");
}
I would call the above code on a web request and then the client will not have to wait.
Is this the correct way to do this?
TIA
From the MSDN docs I see that Unitys IContainer Resolve method is not thread safe (or it is not written). This would mean that you need to do that out of the thread loop. Edit: changed to Task.
public void FromWebClientRequest(int[] ids);
{
IRepoType repoType = container.Resolve<IRepoType>();
ILogger logger = container.Resolve<ILogger>();
// remove LongRunning if your operations are not blocking (Ie. read file or download file long running queries etc)
// prefer fairness is here to try to complete first the requests that came first, so client are more likely to be able to be served "first come, first served" in case of high CPU use with lot of requests
Task.Factory.StartNew(() => DoSomeWorkInParallel(ids, repoType, logger), TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
private static void DoSomeWorkInParallel(int[] ids, IRepoType repository, ILogger logger)
{
// if there are blocking operations inside this loop you ought to convert it to tasks with LongRunning
// why this? to force more threads as usually would be used to run the loop, and try to saturate cpu use, which would be doing nothing most of the time
// beware of doing this if you work on a non clustered database, since you can saturate it and have a bottleneck there, you should try and see how it handles your workload
Parallel.ForEach(ids, id=>{
// Some work will be done here...
// use repository
});
logger.Log("finished all work");
}
Plus as fiver stated, if you have .Net 4 then Tasks is the way to go.
Why go Task (question in comment):
If your method fromClientRequest would be fired insanely often, you would fill the thread pool, and overall system performance would probably not be as good as with .Net 4 with fine graining. This is where Task enters the game. Each task is not its own thread but the new .Net 4 thread pool creates enough threads to maximize performance on a system, and you do not need to bother on how many cpus and how much thread context switches would there be.
Some MSDN quotes for ThreadPool:
When all thread pool threads have been
assigned to tasks, the thread pool
does not immediately begin creating
new idle threads. To avoid
unnecessarily allocating stack space
for threads, it creates new idle
threads at intervals. The interval is
currently half a second, although it
could change in future versions of the
.NET Framework.
The thread pool has a default size of
250 worker threads per available
processor
Unnecessarily increasing the number of
idle threads can also cause
performance problems. Stack space must
be allocated for each thread. If too
many tasks start at the same time, all
of them might appear to be slow.
Finding the right balance is a
performance-tuning issue.
By using Tasks you discard those issues.
Another good thing is you can fine grain the type of operation to run. This is important if your tasks do run blocking operations. This is a case where more threads are to be allocated concurrently since they would mostly wait. ThreadPool cannot achieve this automagically:
Task.Factory.StartNew(() => DoSomeWork(), TaskCreationOptions.LongRunning);
And of course you are able to make it finish on demand without resorting to ManualResetEvent:
var task = Task.Factory.StartNew(() => DoSomeWork());
task.Wait();
Beside this you don't have to change the Parallel.ForEach if you don't expect exceptions or blocking, since it is part of the .Net 4 Task Parallel Library, and (often) works well and optimized on the .Net 4 pool as Tasks do.
However if you do go to Tasks instead of parallel for, remove the LongRunning from the caller Task, since Parallel.For is a blocking operations and Starting tasks (with the fiver loop) is not. But this way you loose the kinda first-come-first-served optimization, or you have to do it on a lot more Tasks (all spawned through ids) which probably would give less correct behaviour. Another option is to wait on all tasks at the end of DoSomeWorkInParallel.
Another way is to use Tasks:
public static void FromWebClientRequest(int[] ids)
{
foreach (var id in ids)
{
Task.Factory.StartNew(i =>
{
Wl(i);
}
, id);
}
}
I would call the above code on a web
request and then the client will not
have to wait.
This will work provided the client does not need an answer (like Ok/Fail).
Is this the correct
way to do this?
Almost. You use Parallel.ForEach (TPL) for the jobs but run it from a 'plain' Threadpool job. Better to use a Task for the outer job as well.
Also, handle all exceptions in that outer Task. And be careful about the thread-safety of the container etc.
Our scenario is a network scanner.
It connects to a set of hosts and scans them in parallel for a while using low priority background threads.
I want to be able to schedule lots of work but only have any given say ten or whatever number of hosts scanned in parallel. Even if I create my own threads, the many callbacks and other asynchronous goodness uses the ThreadPool and I end up running out of resources. I should look at MonoTorrent...
If I use THE ThreadPool, can I limit my application to some number that will leave enough for the rest of the application to Run smoothly?
Is there a threadpool that I can initialize to n long lived threads?
[Edit]
No one seems to have noticed that I made some comments on some responses so I will add a couple things here.
Threads should be cancellable both
gracefully and forcefully.
Threads should have low priority leaving the GUI responsive.
Threads are long running but in Order(minutes) and not Order(days).
Work for a given target host is basically:
For each test
Probe target (work is done mostly on the target end of an SSH connection)
Compare probe result to expected result (work is done on engine machine)
Prepare results for host
Can someone explain why using SmartThreadPool is marked wit ha negative usefulness?
In .NET 4 you have the integrated Task Parallel Library. When you create a new Task (the new thread abstraction) you can specify a Task to be long running. We have made good experiences with that (long being days rather than minutes or hours).
You can use it in .NET 2 as well but there it's actually an extension, check here.
In VS2010 the Debugging Parallel applications based on Tasks (not threads) has been radically improved. It's advised to use Tasks whenever possible rather than raw threads. Since it lets you handle parallelism in a more object oriented friendly way.
UPDATE
Tasks that are NOT specified as long running, are queued into the thread pool (or any other scheduler for that matter).
But if a task is specified to be long running, it just creates a standalone Thread, no thread pool is involved.
The CLR ThreadPool isn't appropriate for executing long-running tasks: it's for performing short tasks where the cost of creating a thread would be nearly as high as executing the method itself. (Or at least a significant percentage of the time it takes to execute the method.) As you've seen, .NET itself consumes thread pool threads, you can't reserve a block of them for yourself lest you risk starving the runtime.
Scheduling, throttling, and cancelling work is a different matter. There's no other built-in .NET worker-queue thread pool, so you'll have roll your own (managing the threads or BackgroundWorkers yourself) or find a preexisting one (Ami Bar's SmartThreadPool looks promising, though I haven't used it myself).
In your particular case, the best option would not be either threads or the thread pool or Background worker, but the async programming model (BeginXXX, EndXXX) provided by the framework.
The advantages of using the asynchronous model is that the TcpIp stack uses callbacks whenever there is data to read and the callback is automatically run on a thread from the thread pool.
Using the asynchronous model, you can control the number of requests per time interval initiated and also if you want you can initiate all the requests from a lower priority thread while processing the requests on a normal priority thread which means the packets will stay as little as possible in the internal Tcp Queue of the networking stack.
Asynchronous Client Socket Example - MSDN
P.S. For multiple concurrent and long running jobs that don't do allot of computation but mostly wait on IO (network, disk, etc) the better option always is to use a callback mechanism and not threads.
I'd create your own thread manager. In the following simple example a Queue is used to hold waiting threads and a Dictionary is used to hold active threads, keyed by ManagedThreadId. When a thread finishes, it removes itself from the active dictionary and launches another thread via a callback.
You can change the max running thread limit from your UI, and you can pass extra info to the ThreadDone callback for monitoring performance, etc. If a thread fails for say, a network timeout, you can reinsert back into the queue. Add extra control methods to Supervisor for pausing, stopping, etc.
using System;
using System.Collections.Generic;
using System.Threading;
namespace ConsoleApplication1
{
public delegate void CallbackDelegate(int idArg);
class Program
{
static void Main(string[] args)
{
new Supervisor().Run();
Console.WriteLine("Done");
Console.ReadKey();
}
}
class Supervisor
{
Queue<System.Threading.Thread> waitingThreads = new Queue<System.Threading.Thread>();
Dictionary<int, System.Threading.Thread> activeThreads = new Dictionary<int, System.Threading.Thread>();
int maxRunningThreads = 10;
object locker = new object();
volatile bool done;
public void Run()
{
// queue up some threads
for (int i = 0; i < 50; i++)
{
Thread newThread = new Thread(new Worker(ThreadDone).DoWork);
newThread.IsBackground = true;
waitingThreads.Enqueue(newThread);
}
LaunchWaitingThreads();
while (!done) Thread.Sleep(200);
}
// keep starting waiting threads until we max out
void LaunchWaitingThreads()
{
lock (locker)
{
while ((activeThreads.Count < maxRunningThreads) && (waitingThreads.Count > 0))
{
Thread nextThread = waitingThreads.Dequeue();
activeThreads.Add(nextThread.ManagedThreadId, nextThread);
nextThread.Start();
Console.WriteLine("Thread " + nextThread.ManagedThreadId.ToString() + " launched");
}
done = (activeThreads.Count == 0) && (waitingThreads.Count == 0);
}
}
// this is called by each thread when it's done
void ThreadDone(int threadIdArg)
{
lock (locker)
{
// remove thread from active pool
activeThreads.Remove(threadIdArg);
}
Console.WriteLine("Thread " + threadIdArg.ToString() + " finished");
LaunchWaitingThreads(); // this could instead be put in the wait loop at the end of Run()
}
}
class Worker
{
CallbackDelegate callback;
public Worker(CallbackDelegate callbackArg)
{
callback = callbackArg;
}
public void DoWork()
{
System.Threading.Thread.Sleep(new Random().Next(100, 1000));
callback(System.Threading.Thread.CurrentThread.ManagedThreadId);
}
}
}
Use the built-in threadpool. It has good capabilities.
Alternatively you can look at the Smart Thread Pool implementation here or at Extended Thread Pool for a limit on the maximum number of working threads.