How to limit the number of parallel processes executed from a C# program?

I need to write code in C# that will execute a single process 1000 times (each time with different command-line arguments), and I want to limit the number of processes that run in parallel to 4, so that the rest wait until at least one has finished before the next one starts.
How can I do that?

You have a couple of options here.
One is to use Parallel.For with a capped MaxDegreeOfParallelism:
Parallel.For(0, 1000, new ParallelOptions { MaxDegreeOfParallelism = 4 },
i =>
{
//do something
});
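For the scenario in the question (1000 runs of one executable, at most 4 at a time), the loop body could launch the process and block until it exits. This is only a sketch; the executable name and the "--job" argument format are placeholders:
// requires: using System.Diagnostics; using System.Threading.Tasks;
Parallel.For(0, 1000, new ParallelOptions { MaxDegreeOfParallelism = 4 }, i =>
{
    var psi = new ProcessStartInfo("myprocess.exe", "--job " + i)
    {
        UseShellExecute = false
    };
    using (var p = Process.Start(psi))
    {
        p.WaitForExit(); // keeps this parallel slot occupied until the process exits
    }
});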
Another one is to use a Semaphore:
Semaphore pool = new Semaphore(4, 4); // initial count 4 so four workers can start right away; maximum count 4
You would call pool.WaitOne() before starting a worker and pool.Release() when the worker finishes.
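A minimal sketch of that pattern; the executable name and the "--job" argument format are placeholders:
// requires: using System.Diagnostics; using System.Threading;
Semaphore pool = new Semaphore(4, 4); // 4 free slots initially, 4 maximum

for (int i = 0; i < 1000; i++)
{
    pool.WaitOne();        // blocks while 4 workers are already running
    int iCopy = i;         // copy the loop variable before closing over it
    new Thread(() =>
    {
        try
        {
            using (var p = Process.Start("myprocess.exe", "--job " + iCopy))
            {
                p.WaitForExit();
            }
        }
        finally
        {
            pool.Release(); // free the slot for the next worker
        }
    }).Start();
}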
Or you might be able to limit the thread pool's size (not recommended)

You can set the maximum number of threads in the ThreadPool to 4 (note that SetMaxThreads cannot set the limit below the number of processors, so on a machine with more than 4 cores the call simply returns false and has no effect):
ThreadPool.SetMaxThreads(4, 4);
Then you can queue up the 1000 processes in the ThreadPool:
for (int i = 0; i < 1000; i++)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(LaunchProcess));
}
Here is the method that will run on a background thread and launch your process. It uses WaitForExit() to ensure the process finishes before the thread completes.
static void LaunchProcess(object o)
{
var p = new Process();
p.StartInfo = new ProcessStartInfo("myprocess.exe");
p.Start();
p.WaitForExit(); // Critical to wait here for the process to finish
}
That should limit the number of active processes to 4, I believe. Haven't tested.
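Since the question needs a different command-line argument for each run, a possible variation (a sketch; the "--job" format is made up) passes the loop index as the work item's state:
for (int i = 0; i < 1000; i++)
{
    ThreadPool.QueueUserWorkItem(LaunchProcess, i); // the index becomes the state object
}

static void LaunchProcess(object state)
{
    int jobIndex = (int)state;
    var p = new Process();
    p.StartInfo = new ProcessStartInfo("myprocess.exe", "--job " + jobIndex);
    p.Start();
    p.WaitForExit(); // wait so the pool thread stays busy until the process exits
}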

Related

Stopping Parallel.ForEach if one of the threads runs for more than N minutes

I'm looking for a way to stop Parallel.ForEach if one of the threads runs for more than 2 minutes.
The following solution is, I think, not very good because it uses twice as many threads:
Parallel.ForEach(items, (item, opt) =>
{
var thread = new Thread(() => { /* a process */ });
thread.Start();
bool finished = thread.Join(TimeSpan.FromMinutes(2));
if (!finished)
{
thread.Abort();
opt.Stop();
}
});
Do you know a better solution for the issue?
First of all, I want to note that the Parallel class will not create a thread for each of your items; it uses the default ThreadPool, which by default has a number of threads equal to the processor core count. Another problem with your code is that you do not stop all the tasks after 2 minutes of work, you only cancel the one you happened to wait on for two minutes.
I suggest you remove the Thread usage from your code and instead create an array of Tasks sharing a single CancellationToken, with a timeout set on the CancellationTokenSource (or a timeout passed when waiting), and start them all. Your code should also explicitly check the token for a pending cancellation.
So your code could be something like this:
var cts = new CancellationTokenSource();
// two minutes in milliseconds
cts.CancelAfter(2 * 60 * 1000);

var tasks = new List<Task>();
foreach (var item in items)
{
    // this copy is needed because of how closures work in C#
    var localItem = item;
    tasks.Add(Task.Run(() =>
    {
        /* a process with localItem here */
        // this check should be repeated from time to time in your calculations
        cts.Token.ThrowIfCancellationRequested();
    },
    // all tasks share the same token
    cts.Token));
}

// option 1: stop waiting after 2 minutes (this does not by itself cancel the tasks)
Task.WaitAll(tasks.ToArray(), TimeSpan.FromMinutes(2));
// option 2: wait for all tasks; the CancelAfter call above cancels them all
// once 2 minutes have passed since the start
Task.WaitAll(tasks.ToArray());
Update:
As you said that each item is independent, you can create a CancellationTokenSource for each task, but, as @ScottChamberlain noted, in that case too many tasks may run at the same time. You can write your own TaskScheduler, use a Semaphore (or its slim version, SemaphoreSlim), or simply use the Parallel class with ParallelOptions.MaxDegreeOfParallelism set appropriately.
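A sketch of that last combination, giving each item its own two-minute token while capping the degree of parallelism (ProcessItem is a placeholder for the real work, which should poll the token it receives):
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.ForEach(items, options, item =>
{
    using (var itemCts = new CancellationTokenSource(TimeSpan.FromMinutes(2)))
    {
        try
        {
            // ProcessItem should call itemCts.Token.ThrowIfCancellationRequested() periodically
            ProcessItem(item, itemCts.Token);
        }
        catch (OperationCanceledException)
        {
            // this item exceeded two minutes; the other items keep running
        }
    }
});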

Suggest a way in threading so that a fixed number of threads is always used for the computation?

I have a List<string> of file paths containing thousands of files. I want to process them in parallel in a function using 8 threads.
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(fileList, opt, item => DoSomething(item));
This code works fine for me, but it only guarantees that at most 8 threads run; I want 8 threads to be running at all times. The CLR decides how many threads to use based on the CPU load.
Please suggest a way to ensure that 8 threads are always used for the computation, with minimum overhead.
Use a producer / consumer model. Create one producer and 8 consumers. For example:
BlockingCollection<string> _filesToProcess = new BlockingCollection<string>();
// start 8 tasks to do the processing
List<Task> _consumers = new List<Task>();
for (int i = 0; i < 8; ++i)
{
var t = Task.Factory.StartNew(ProcessFiles, TaskCreationOptions.LongRunning);
_consumers.Add(t);
}
// Populate the queue
foreach (var filename in filelist)
{
_filesToProcess.Add(filename);
}
// Mark the collection as complete for adding
_filesToProcess.CompleteAdding();
// wait for consumers to finish
Task.WaitAll(_consumers.ToArray(), Timeout.Infinite);
Your processing method removes things from the BlockingCollection and processes them:
void ProcessFiles()
{
foreach (var filename in _filesToProcess.GetConsumingEnumerable())
{
// do something with the file name
}
}
That will keep 8 threads running until the collection is empty. Assuming, of course, you have 8 cores on which to run the threads. If you have fewer available cores, then there will be a lot of context switching, which will cost you.
See BlockingCollection for more information.
With a static counter, you could keep track of the number of currently running tasks.
Every time you start a task you can use Task.ContinueWith (http://msdn.microsoft.com/en-us/library/dd270696.aspx) to be notified that it is over, so you can start another one.
This way there will always be 8 tasks running.
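A sketch of that idea: eight task "chains" each take one file from a shared queue and use ContinueWith to schedule the next, so there are always 8 tasks in flight until the queue is drained (fileList and DoSomething are the names from the question):
// requires: System.Collections.Concurrent, System.Linq, System.Threading.Tasks
var pending = new ConcurrentQueue<string>(fileList);

Func<Task> startNext = null;
startNext = () =>
{
    string file;
    if (!pending.TryDequeue(out file))
        return Task.FromResult(0);             // queue drained, this chain ends

    return Task.Run(() => DoSomething(file))
               .ContinueWith(_ => startNext()) // schedule the next item when this one finishes
               .Unwrap();                      // flatten Task<Task> into Task
};

var workers = Enumerable.Range(0, 8).Select(_ => startNext()).ToArray();
Task.WaitAll(workers);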
OrderablePartitioner<Tuple<int, int>> chunkPart = Partitioner.Create(0, fileList.Count, 1); // partition the list into chunks of 1 entry
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(chunkPart, opt, chunkRange =>
{
for (int i = chunkRange.Item1; i < chunkRange.Item2; i++)
{
DoSomething(fileList[i]);
}
});

How can I use AutoResetEvent to signal the main thread's Process() function to start threads again once the first set of worker threads is done processing

Requirement: at any given point in time, only 4 threads should be calling four different functions. As soon as these threads complete, the next available threads should call the same functions.
Current code: this seems to be the worst possible way to achieve something like this. The while(true) loop causes unnecessary CPU spikes, and I could see CPU usage rising to 70% when running the following code.
Question: how can I use an AutoResetEvent to signal the main thread's Process() function to start the next 4 threads once the first 4 worker threads are done processing, without wasting CPU cycles? Please suggest.
public class Demo
{
object protect = new object();
private int counter;
public void Process()
{
int maxthread = 4;
while (true)
{
if (counter <= maxthread)
{
counter++;
Thread t = new Thread(new ThreadStart(DoSomething));
t.Start();
}
}
}
private void DoSomething()
{
try
{
Thread.Sleep(50000); //simulate long running process
}
finally
{
lock (protect)
{
counter--;
}
}
}
}
You can use the TPL to achieve what you want in a simpler way. If you run the code below, you'll notice that an entry is written after each task terminates, and only after all four tasks terminate is the "Finished batch" entry written.
This sample uses Task.WaitAll to wait for the completion of all tasks. The code uses an infinite loop for illustration purposes only; you should calculate the hasPendingWork condition based on your requirements so that you only start a new batch of tasks if required.
For example:
private static void Main(string[] args)
{
bool hasPendingWork = true;
do
{
var tasks = InitiateTasks();
Task.WaitAll(tasks);
Console.WriteLine("Finished batch...");
} while (hasPendingWork);
}
private static Task[] InitiateTasks()
{
var tasks = new Task[4];
for (int i = 0; i < tasks.Length; i++)
{
int wait = 1000*i;
tasks[i] = Task.Factory.StartNew(() =>
{
Thread.Sleep(wait);
Console.WriteLine("Finished waiting: {0}", wait);
});
}
return tasks;
}
One other thing: from the textual requirement section of your question I'm led to believe that a batch of four new threads should only start after all previous four threads have completed. However, the code you posted is not compatible with that requirement, since it starts a new thread immediately after a previous thread terminates. You should clarify what exactly your requirement is.
UPDATE:
If you want to start a new thread immediately after one of the four threads terminates, you can still use the TPL instead of starting new threads explicitly, and limit the number of concurrently running tasks to four by using a SemaphoreSlim. For example:
private static SemaphoreSlim TaskController = new SemaphoreSlim(4);
private static void Main(string[] args)
{
var random = new Random(570);
while (true)
{
// Blocks thread without wasting CPU
// if the number of resources (4) is exhausted
TaskController.Wait();
Task.Factory.StartNew(() =>
{
Console.WriteLine("Started");
Thread.Sleep(random.Next(1000, 3000));
Console.WriteLine("Completed");
// Releases a resource meaning TaskController.Wait will unblock
TaskController.Release();
});
}
}
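One caveat with the loop body above: if the work throws, Release is never called and a slot is permanently lost. Releasing in a finally block avoids that:
TaskController.Wait();
Task.Factory.StartNew(() =>
{
    try
    {
        Console.WriteLine("Started");
        Thread.Sleep(random.Next(1000, 3000));
        Console.WriteLine("Completed");
    }
    finally
    {
        TaskController.Release(); // the slot is returned even if the work throws
    }
});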

How do you get list of running threads in C#?

I create dynamic threads in C# and I need to get the status of those running threads.
List<string>[] list;
list = dbConnect.Select();
for (int i = 0; i < list[0].Count; i++)
{
Thread th = new Thread(() =>{
sendMessage(list[0]['1']);
//calling callback function
});
th.Name = "SID"+i;
th.Start();
}
for (int i = 0; i < list[0].Count; i++)
{
// how can I get the list of running threads here?
}
How can you get list of running threads?
On Threads
I would avoid explicitly creating threads on your own.
It is much preferable to use ThreadPool.QueueUserWorkItem or, if you can use .NET 4.0, the much more powerful Task Parallel Library, which also allows you to use ThreadPool threads in a much more flexible way (Task.Factory.StartNew is worth a look).
What if we choose to go by the approach of explicitly creating threads?
Let's suppose that your list[0].Count returns 1000 items. Let's also assume that you are performing this on a high-end (at the time of this writing) 16-core machine. The immediate effect is that we have 1000 threads competing for these limited resources (the 16 cores).
The larger the number of tasks and the longer each of them runs, the more time will be spent in context switching. In addition, creating threads is expensive; this overhead of creating each thread explicitly could be avoided by reusing existing threads.
So while the initial intent of multithreading may be to increase speed, as we can see it can have quite the opposite effect.
How do we overcome 'over'-threading?
This is where the ThreadPool comes into play.
A thread pool is a collection of threads that can be used to perform a number of tasks in the background.
How do they work:
Once a thread in the pool completes its task, it is returned to a queue of waiting threads, where it can be reused. This reuse enables applications to avoid the cost of creating a new thread for each task.
Thread pools typically have a maximum number of threads. If all the threads are busy, additional tasks are placed in queue until they can be serviced as threads become available.
So we can see that by using thread pool threads we are more efficient both:
in terms of maximizing the actual work getting done: since we are not over-saturating the processors with threads, less time is spent switching between threads and more time is spent actually executing the code a thread is supposed to run. Thread startup is also faster, since each thread pool thread is readily available, as opposed to waiting until a new thread gets constructed;
and in terms of minimizing memory consumption: the thread pool will limit the number of threads to the thread pool size, enqueuing any requests that are beyond that limit (see ThreadPool.GetMaxThreads). The primary reason behind this design choice is, of course, so that we don't over-saturate the limited number of cores with too many thread requests, keeping context switching at lower levels.
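For reference, the pool limits on a given machine can be inspected like this (the actual numbers vary by runtime version and configuration):
int workerThreads, completionPortThreads;
ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
Console.WriteLine("Max worker threads: {0}, max I/O completion threads: {1}",
                  workerThreads, completionPortThreads);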
Too much theory; let's put it to the test!
Right, it's nice to know all this in theory, but let's put it into practice and see what
the numbers tell us, with a simplified, crude version of the application that can give us a coarse indication of the difference in orders of magnitude. We will do a comparison between new Thread, ThreadPool and the Task Parallel Library (TPL).
new Thread
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
Thread thread = new Thread(() =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0) // By the way, what does your sendMessage(list[0]['1']); mean? what is this '1'? if it is i you are not thread safe.
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
thread.Name = "SID" + iCopy; // you can also use i here.
thread.Start();
}
Console.ReadKey();
Console.WriteLine("Memory usage: " + (GC.GetTotalMemory(false) - initialMemoryFootPrint));
Console.ReadKey();
}
Result: (see the summary table below)
ThreadPool.QueueUserWorkItem
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
ThreadPool.QueueUserWorkItem((w) =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0)
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
}
Console.ReadKey();
Console.WriteLine("Memory usage: " + (GC.GetTotalMemory(false) - initialMemoryFootPrint));
Console.ReadKey();
}
Result: (see the summary table below)
Task Parallel Library (TPL)
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
Task.Factory.StartNew(() =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0) // By the way, what does your sendMessage(list[0]['1']); mean? what is this '1'? if it is i you are not thread safe.
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
}
Console.ReadKey();
Console.WriteLine("Memory usage: " + (GC.GetTotalMemory(false) - initialMemoryFootPrint));
Console.ReadKey();
}
Result: (see the summary table below)
So we can see that:
+--------+------------+------------+--------+
|        | new Thread | ThreadPool | TPL    |
+--------+------------+------------+--------+
| Time   | 6749ms     | 228ms      | 222ms  |
| Memory | ≈300kb     | ≈103kb     | ≈123kb |
+--------+------------+------------+--------+
The above falls nicely in line with what we anticipated in theory: high memory for new Thread, as well as slower overall performance compared to ThreadPool. ThreadPool and TPL have equivalent performance, with TPL having a slightly higher memory footprint than a pure thread pool, but it's probably a price worth paying given the added flexibility Tasks provide (such as cancellation, waiting for completion, and querying task status).
At this point, we have proven that using ThreadPool threads is the preferable option in terms of speed and memory.
Still, we have not answered your question: how to track the state of the running threads.
To answer your question
Given the insights we have gathered, this is how I would approach it:
List<string>[] list = dbConnect.Select();
int itemCount = list[0].Count;
Task[] tasks = new Task[itemCount];
var stopwatch = Stopwatch.StartNew(); // optional timing, as in the benchmarks above
for (int i = 0; i < itemCount; i++)
{
tasks[i] = Task.Factory.StartNew(() =>
{
// NOTE: Do not use i in here as it is not thread safe to do so!
sendMessage(list[0]['1']);
//calling callback function
});
}
// if required you can wait for all tasks to complete
Task.WaitAll(tasks);
// or for any task you can check its state with properties such as:
tasks[1].IsCanceled
tasks[1].IsCompleted
tasks[1].IsFaulted
tasks[1].Status
As a final note, you cannot use the variable i directly in your Thread.Start, since it would create a closure over a changing variable which would effectively be shared amongst all threads. To get around this (assuming you need to access i), simply create a copy of the variable and pass the copy in; this creates one closure per thread, which makes it thread safe.
Good luck!
Use Process.Threads:
var currentProcess = Process.GetCurrentProcess();
var threads = currentProcess.Threads;
Note: any threads owned by the current process will show up here, including those not explicitly created by you.
If you only want the threads that you created, well, why don't you just keep track of them when you create them?
Create a List<Thread> and store each new thread in your first for loop in it.
List<string>[] list;
List<Thread> threads = new List<Thread>();
list = dbConnect.Select();
for (int i = 0; i < list[0].Count; i++)
{
Thread th = new Thread(() =>{
sendMessage(list[0]['1']);
//calling callback function
});
th.Name = "SID"+i;
th.Start();
threads.Add(th);
}
for (int i = 0; i < list[0].Count; i++)
{
// e.g. wait for each thread, or inspect threads[i].IsAlive / threads[i].ThreadState
threads[i].Join();
}
However, if you don't need i, you can make the second loop a foreach instead of a for.
As a side note, if your sendMessage function does not take very long to execute, you should use something lighter weight than a full Thread, such as ThreadPool.QueueUserWorkItem or, if it is available to you, a Task.
Process.GetCurrentProcess().Threads
This gives you a list of all threads running in the current process, but beware that there are threads other than those you started yourself.
Use Process.Threads to iterate through your threads.

How to run C# threads in sequence

I have a situation in which I want to start 3 threads called: tr1, tr2 and tr3
I want tr2 to start after tr1 and tr3 to start after tr2.
How can I achieve this?
What possible reason do you have for that? If you don't need them to run in parallel, why do you need 3 threads?
Anyway, you can call thread1.Join() from thread2 and thread2.Join() from thread3, so that each thread waits for the previous one.
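A sketch of that approach (M1, M2 and M3 stand for whatever the three threads actually do):
var tr1 = new Thread(M1);
var tr2 = new Thread(() => { tr1.Join(); M2(); });
var tr3 = new Thread(() => { tr2.Join(); M3(); });

tr1.Start();
tr2.Start();
tr3.Start();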
In Fx4 you can use Tasks and the ContinueWith feature.
And while it does make sense to have Tasks (jobs) that must be run in sequence, it does not seem so sensible for Threads. Why don't you use one thread that executes m1(), m2() and m3() in sequence? Especially if you are on Fx <= 3.5.
Another aspect here is error handling. The task library will handle that more or less invisibly, but without it you need to take care of it yourself.
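A sketch of the Fx4 version (m1, m2 and m3 as above); each continuation starts only after the previous task has finished, and TaskContinuationOptions can be used to control what happens when a step faults:
Task.Factory.StartNew(m1)
    .ContinueWith(t => m2())
    .ContinueWith(t => m3());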
Make each thread start the next one.
However, if they all run in sequence anyways, what is the reason you want to use multiple threads in the first place?
The simplest way is just to have a small sleep between the startups; that is how I "solved" the problem.
Another option is to start tr2 from tr1, and tr3 from tr2, after you've done the non thread-safe things.
How are they dependent on each other? Why don't you just have one thread?
You can use WaitHandles to signal that a thread has completed working.
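For example, a sketch with AutoResetEvent wait handles (the /* work */ bodies are placeholders):
var tr1Done = new AutoResetEvent(false);
var tr2Done = new AutoResetEvent(false);

var tr1 = new Thread(() => { /* tr1's work */ tr1Done.Set(); });
var tr2 = new Thread(() => { tr1Done.WaitOne(); /* tr2's work */ tr2Done.Set(); });
var tr3 = new Thread(() => { tr2Done.WaitOne(); /* tr3's work */ });

tr1.Start();
tr2.Start();
tr3.Start();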
The reason why you would want to run threads in sequence is simple: you want to compare it with the parallel one and see the speed improvements.
The following example runs threads in sequence and computes the elapsed time. Comment out the Join call in the first loop and uncomment the second loop to have the threads run concurrently.
using System.Diagnostics;
Console.WriteLine(Thread.CurrentThread.ManagedThreadId);
var sw = new Stopwatch();
sw.Start();
Thread[] threads = new Thread[10];
for (var i = 0; i < 10; i++)
{
var t = new Thread(() =>
{
var id = Thread.CurrentThread.ManagedThreadId;
var r = new Random().Next(500, 1500);
Thread.Sleep(r);
Console.WriteLine($"{id} finished in {r} ms");
}
);
threads[i] = t;
}
foreach (var t in threads)
{
t.Start();
t.Join();
}
// foreach (var t in threads)
// {
// t.Join();
// }
sw.Stop();
var elapsed = sw.ElapsedMilliseconds;
Console.WriteLine($"elapsed: {elapsed} ms");
