Hi is there any possible way to get the status of the threads from a Thread.Join, or can i make a breakout from a Thread.Join at a specified period?
For eg:
I have a loop that have n-jobs, i've got 3 free cores for 3 parallel threads, and after Joining the 3 threads, i wonder if there's a way to check if a thread has done it's job to start another job in it's place.
I want to keep the 3 cores working all time, not to wait for all threads to stop and then start another 3 of them.
The simplest, and most likely best, solution is to use the threadpool. The threadpool automatically scales based on available processors and cores.
ThreadPool.QueueUserWorkItem(state => TaskOne());
ThreadPool.QueueUserWorkItem(state => TaskTwo());
ThreadPool.QueueUserWorkItem(state => TaskThree());
ThreadPool.QueueUserWorkItem(state => TaskFour());
If you need to do this the hard way, you could keep a queue of pending tasks and a list of currently running tasks, and use a timeout for the Join() call so that it returns false if the thread is not ready.
I can't think of any reason to prefer the complex to the simple solution, but there might be one, of course.
var MAX_RUNNING = 3;
var JOIN_TIMEOUT_MS = 50;
var waiting = new Queue<ThreadStart>();
var running = new List<Thread>();
waiting.Enqueue(new ThreadStart(TaskOne));
waiting.Enqueue(new ThreadStart(TaskTwo));
waiting.Enqueue(new ThreadStart(TaskThree));
waiting.Enqueue(new ThreadStart(TaskFour));
while (waiting.Any() || running.Any())
{
while (running.Count < MAX_RUNNING && waiting.Any())
{
var next = new Thread(waiting.Dequeue());
next.Start();
running.Add(next);
}
for (var i = running.Count - 1; i >= 0; --i)
{
var t = running[i];
if(t.ThreadState == System.Threading.ThreadState.Stopped) {
running.RemoveAt(i);
break;
}
if (t.Join(JOIN_TIMEOUT_MS))
{
running.RemoveAt(i);
break;
}
}
}
Related
I have a listener that waits for messages to arrive in a queue. I want to throttle the amount of tasks so that for every 1000 messages in queue I need to wait until they complete before processing the next set of messages. The reason for this is the ProcessMessage code calls a WCF service and that seems to get overloaded with too many concurrent calls at once.
I want to know is this the best way to achieve thisthrottling? This code below looks a bit hacky.
var isEmpty = false;
var maxThreads = 1000;
var currentThreadCount = 0;
List<Task> taskList = new List<Task>();
while(!isEmpty)
{
var message = GetMessageFromServer();
if(!String.IsNullorEmpty(message))
{
isEmpty = true;
}
else
{
if(currentThreadCount == maxThreads)
{
task.WaitAll(tasksList.ToArray());
currentThreadCount = 0;
}
else
{
taskList.Add(Task.Run(() => ProcessMessage(message)));
}
}
}
Assuming that you are interested in the result of ProcessMessage I would suggest considering the use of Channels (cf.).
What this could improve with respect to your solution is that you could ensure a consistent amount of work and not stopping when 1000 is reached and waiting until all tasks are finished.
I have an array of filepath in List<string> with thousands of files. I want to process them in a function parallel with 8 threads.
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(fileList, opt, item => DoSomething(item));
This code works fine for me but it guarantees to run max 8 threads and I want to run 8 threads always. CLR decides the number of threads to be use as per CPU load.
Please suggest a way in threading that always 8 threads are used in computing with minimum overhead.
Use a producer / consumer model. Create one producer and 8 consumers. For example:
BlockingCollection<string> _filesToProcess = new BlockingCollection<string>();
// start 8 tasks to do the processing
List<Task> _consumers = new List<Task>();
for (int i = 0; i < 8; ++i)
{
var t = Task.Factory.StartNew(ProcessFiles, TaskCreationOptions.LongRunning);
_consumers.Add(t);
}
// Populate the queue
foreach (var filename in filelist)
{
_filesToProcess.Add(filename);
}
// Mark the collection as complete for adding
_filesToProcess.CompleteAdding();
// wait for consumers to finish
Task.WaitAll(_consumers.ToArray(), Timeout.Infinite);
Your processing method removes things from the BlockingCollection and processes them:
void ProcessFiles()
{
foreach (var filename in _filesToProcess.GetConsumingEnumerable())
{
// do something with the file name
}
}
That will keep 8 threads running until the collection is empty. Assuming, of course, you have 8 cores on which to run the threads. If you have fewer available cores, then there will be a lot of context switching, which will cost you.
See BlockingCollection for more information.
Within a static counter, you might be able to get the number of current threads.
Every time you call start a task there is the possibility to use the Task.ContinueWith (http://msdn.microsoft.com/en-us/library/dd270696.aspx) to notify that it's over and you can start another one.
This way there is going to be always 8 tasks running.
OrderablePartitioner<Tuple<int, int>> chunkPart = Partitioner.Create(0, fileList.Count, 1);//Partition the list in chunk of 1 entry
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(chunkPart, opt, chunkRange =>
{
for (int i = chunkRange.Item1; i < chunkRange.Item2; i++)
{
DoSomething(fileList[i].FullName);
}
});
I'm using C# Parallel.ForEach to process more than thousand subsets of data. One set takes 5-30 minutes to process, depending on size of the set. In my computer with option
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount
I'll get 8 parallel processes. As I understood, processes are divided equally between parallel tasks (e.g. the first task gets jobs number 1,9,17 etc, the second gets 2,10,18 etc.); therefore, one task can finish own jobs sooner than others. Because those sets of data took less time than others.
The problem is that four parallel tasks finish their jobs within 24 hours, but the last one finish in 48 hours. It there some chance to organize parallelism so that all parallel tasks are finishing equally? It means all parallel tasks continue working until all jobs are done?
Since the jobs are not equal, you can't split the number of jobs between processors and have them finish at about the same time. I think what you need here is 8 worker threads that retrieve the next job in line. You will have to use a lock on the function to get the next job.
Somebody correct me if I'm wrong, but off the top of my head... a worker thread could be given a function like this:
public void ProcessJob()
{
for (Job myJob = GetNextJob(); myJob != null; myJob = GetNextJob())
{
// process job
}
}
And the function to get the next job would look like:
private List<Job> jobs;
private int currentJob = 0;
private Job GetNextJob()
{
lock (jobs)
{
Job job = null;
if (currentJob < jobs.Count)
{
job = jobs[currentJob];
currentJob++;
}
return job;
}
}
It seems that there is no ready-to-use solution and it has to be created.
My previous code was:
var ListOfSets = (from x in Database
group x by x.SetID into z
select new { ID = z.Key}).ToList();
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount;
Parallel.ForEach(ListOfSets, po, SingleSet=>
{
AnalyzeSet(SingleSet.ID);
});
To share work equally between all CPU-s, I still use Parallel to do the work, but instead of ForEach I use For and an idea from Matt. The new code is:
Parallel.For(0, Environment.ProcessorCount, i=>
{
while(ListOfSets.Count() > 0)
{
double SetID = 0;
lock (ListOfSets)
{
SetID = ListOfSets[0].ID;
ListOfSets.RemoveAt(0);
}
AnalyzeSet(SetID);
}
});
So, thank you for your advice.
One option, as suggested by others, is to manage your own producer consumer queue. I'd like to note that using the BlockingCollection makes this very easy to do.
BlockingCollection<JobData> queue = new BlockingCollection<JobData>();
//add data to queue; if it can be done quickly, just do it inline.
//If it's expensive, start a new task/thread just to add items to the queue.
foreach (JobData job in data)
queue.Add(job);
queue.CompleteAdding();
for (int i = 0; i < Environment.ProcessorCount; i++)
{
Task.Factory.StartNew(() =>
{
foreach (var job in queue.GetConsumingEnumerable())
{
ProcessJob(job);
}
}, TaskCreationOptions.LongRunning);
}
I create dynamic threads in C# and I need to get the status of those running threads.
List<string>[] list;
list = dbConnect.Select();
for (int i = 0; i < list[0].Count; i++)
{
Thread th = new Thread(() =>{
sendMessage(list[0]['1']);
//calling callback function
});
th.Name = "SID"+i;
th.Start();
}
for (int i = 0; i < list[0].Count; i++)
{
// here how can i get list of running thread here.
}
How can you get list of running threads?
On Threads
I would avoid explicitly creating threads on your own.
It is much more preferable to use the ThreadPool.QueueUserWorkItem or if you do can use .Net 4.0 you get the much more powerful Task parallel library which also allows you to use a ThreadPool threads in a much more powerful way (Task.Factory.StartNew is worth a look)
What if we choose to go by the approach of explicitly creating threads?
Let's suppose that your list[0].Count returns 1000 items. Let's also assume that you are performing this on a high-end (at the time of this writing) 16core machine. The immediate effect is that we have 1000 threads competing for these limited resources (the 16 cores).
The larger the number of tasks and the longer each of them runs, the more time will be spent in context switching. In addition, creating threads is expensive, this overhead creating each thread explicitly could be avoided if an approach of reusing existing threads is used.
So while the initial intent of multithreading may be to increase speed, as we can see it can have quite the opposite effect.
How do we overcome 'over'-threading?
This is where the ThreadPool comes into play.
A thread pool is a collection of threads that can be used to perform a number of tasks in the background.
How do they work:
Once a thread in the pool completes its task, it is returned to a queue of waiting threads, where it can be reused. This reuse enables applications to avoid the cost of creating a new thread for each task.
Thread pools typically have a maximum number of threads. If all the threads are busy, additional tasks are placed in queue until they can be serviced as threads become available.
So we can see that by using a thread pool threads we are more efficient both
in terms of maximizing the actual work getting done. Since we are not over saturating the processors with threads, less time is spent switching between threads and more time actually executing the code that a thread is supposed to do.
Faster thread startup: Each threadpool thread is readily available as opposed to waiting until a new thread gets constructed.
in terms of minimising memory consumption, the threadpool will limit the number of threads to the threadpool size enqueuing any requests that are beyond the threadpool size limit. (see ThreadPool.GetMaxThreads). The primary reason behind this design choice, is of course so that we don't over-saturate the limited number of cores with too many thread requests keeping context switching to lower levels.
Too much Theory, let's put all this theory to the test!
Right, it's nice to know all this in theory, but let's put it to practice and see what
the numbers tell us, with a simplified crude version of the application that can give us a coarse indication of the difference in orders of magnitude. We will do a comparison between new Thread, ThreadPool and Task Parallel Library (TPL)
new Thread
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
Thread thread = new Thread(() =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0) // By the way, what does your sendMessage(list[0]['1']); mean? what is this '1'? if it is i you are not thread safe.
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
thread.Name = "SID" + iCopy; // you can also use i here.
thread.Start();
}
Console.ReadKey();
Console.WriteLine(GC.GetTotalMemory(false) - initialMemoryFootPrint);
Console.ReadKey();
}
Result:
ThreadPool.EnqueueUserWorkItem
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
ThreadPool.QueueUserWorkItem((w) =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0)
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
}
Console.ReadKey();
Console.WriteLine("Memory usage: " + (GC.GetTotalMemory(false) - initialMemoryFootPrint));
Console.ReadKey();
}
Result:
Task Parallel Library (TPL)
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
Task.Factory.StartNew(() =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0) // By the way, what does your sendMessage(list[0]['1']); mean? what is this '1'? if it is i you are not thread safe.
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
}
Console.ReadKey();
Console.WriteLine("Memory usage: " + (GC.GetTotalMemory(false) - initialMemoryFootPrint));
Console.ReadKey();
}
Result:
So we can see that:
+--------+------------+------------+--------+
| | new Thread | ThreadPool | TPL |
+--------+------------+------------+--------+
| Time | 6749 | 228ms | 222ms |
| Memory | ≈300kb | ≈103kb | ≈123kb |
+--------+------------+------------+--------+
The above falls nicely inline to what we anticipated in theory. High memory for new Thread as well as slower overall performance when compared to ThreadPool. ThreadPool and TPL have equivalent performance with TPL having a slightly higher memory footprint than a pure thread pool but it's probably a price worth paying given the added flexibility Tasks provide (such as cancellation, waiting for completion querying status of task)
At this point, we have proven that using ThreadPool threads is the preferable option in terms of speed and memory.
Still, we have not answered your question. How to track the state of the threads running.
To answer your question
Given the insights we have gathered, this is how I would approach it:
List<string>[] list = listdbConnect.Select()
int itemCount = list[0].Count;
Task[] tasks = new Task[itemCount];
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
tasks[i] = Task.Factory.StartNew(() =>
{
// NOTE: Do not use i in here as it is not thread safe to do so!
sendMessage(list[0]['1']);
//calling callback function
});
}
// if required you can wait for all tasks to complete
Task.WaitAll(tasks);
// or for any task you can check its state with properties such as:
tasks[1].IsCanceled
tasks[1].IsCompleted
tasks[1].IsFaulted
tasks[1].Status
As a final note, you can not use the variable i in your Thread.Start, since it would create a closure over a changing variable which would effectively be shared amongst all Threads. To get around this (assuming you need to access i), simply create a copy of the variable and pass the copy in, this would make one closure per thread which would make it thread safe.
Good luck!
Use Process.Threads:
var currentProcess = Process.GetCurrentProcess();
var threads = currentProcess.Threads;
Note: any threads owned by the current process will show up here, including those not explicitly created by you.
If you only want the threads that you created, well, why don't you just keep track of them when you create them?
Create a List<Thread> and store each new thread in your first for loop in it.
List<string>[] list;
List<Thread> threads = new List<Thread>();
list = dbConnect.Select();
for (int i = 0; i < list[0].Count; i++)
{
Thread th = new Thread(() =>{
sendMessage(list[0]['1']);
//calling callback function
});
th.Name = "SID"+i;
th.Start();
threads.add(th)
}
for (int i = 0; i < list[0].Count; i++)
{
threads[i].DoStuff()
}
However if you don't need i you can make the second loop a foreach instead of a for
As a side note, if your sendMessage function does not take very long to execute you should somthing lighter weight then a full Thread, use a ThreadPool.QueueUserWorkItem or if it is available to you, a Task
Process.GetCurrentProcess().Threads
This gives you a list of all threads running in the current process, but beware that there are threads other than those you started yourself.
Use Process.Threads to iterate through your threads.
I have an application I have already started working with and it seems I need to rethink things a bit. The application is a winform application at the moment. Anyway, I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread. What I have done is loop through the number of threads variable and create the threads accordingly. I am not performing any locking (and not sure I need to or not) on the threads. I am new to threading and am running into possible issue with multiple cores. I need some advice as to how I can make this perform better.
Before a thread is created some records are pulled from my database to be processed. That list object is sent to the thread and looped through. Once it reaches the end of the loop, the thread call the data functions to pull some new records, replacing the old ones in the list. This keeps going on until there are no more records. Here is my code:
private void CreateThreads()
{
_startTime = DateTime.Now;
var totalThreads = 0;
var totalRecords = 0;
progressThreadsCreated.Maximum = _threadCount;
progressThreadsCreated.Step = 1;
LabelThreadsCreated.Text = "0 / " + _threadCount.ToString();
this.Update();
for(var i = 1; i <= _threadCount; i++)
{
LabelThreadsCreated.Text = i + " / " + _threadCount;
progressThreadsCreated.Value = i;
var adapter = new Dystopia.DataAdapter();
var records = adapter.FindAllWithLocking(_recordsPerThread,_validationId,_validationDateTime);
if(records != null && records.Count > 0)
{
totalThreads += 1;
LabelTotalProcesses.Text = "Total Processes Created: " + totalThreads.ToString();
var paramss = new ArrayList { i, records };
var thread = new Thread(new ParameterizedThreadStart(ThreadWorker));
thread.Start(paramss);
}
this.Update();
}
}
private void ThreadWorker(object paramList)
{
try
{
var parms = (ArrayList) paramList;
var stopThread = false;
var threadCount = (int) parms[0];
var records = (List<Candidates>) parms[1];
var runOnce = false;
var adapter = new Dystopia.DataAdapter();
var lastCount = records.Count;
var runningCount = 0;
while (_stopThreads == false)
{
if (!runOnce)
{
CreateProgressArea(threadCount, records.Count);
}
else
{
ResetProgressBarMethod(threadCount, records.Count);
}
runOnce = true;
var counter = 0;
if (records.Count > 0)
{
foreach (var record in records)
{
counter += 1;
runningCount += 1;
_totalRecords += 1;
var rec = record;
var proc = new ProcRecords();
proc.Validate(ref rec);
adapter.Update(rec);
UpdateProgressBarMethod(threadCount, counter, emails.Count, runningCount);
if (_stopThreads)
{
break;
}
}
UpdateProgressBarMethod(threadCount, -1, lastCount, runningCount);
if (!_noRecordsInPool)
{
records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
if (records == null || records.Count <= 0)
{
_noRecordsInPool = true;
break;
}
else
{
lastCount = records.Count;
}
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
Something simple you could do that would improve perf would be to use a ThreadPool to manage your thread creation. This allows the OS to allocate a group of thread paying the thread create penalty once instead of multiple times.
If you decide to move to .NET 4.0, Tasks would be another way to go.
I allow the user to input the number
of threads they would like to have
running. I also allow the user to
allocate the number of records to
process per thread.
This isn't something you really want to expose to the user. What are they supposed to put? How can they determine what's best? This is an implementation detail best left to you, or even better, the CLR or another library.
I am not performing any locking (and
not sure I need to or not) on the
threads.
The majority of issues you'll have with multithreading will come from shared state. Specifically, in your ThreadWorker method, it looks like you refer to the following shared data: _stopThreads, _totalRecords, _noRecordsInPool, _recordsPerThread, _validationId, and _validationDateTime.
Just because these data are shared, however, doesn't mean you'll have issues. It all depends on who reads and writes them. For example, I think _recordsPerThread is only written once initially, and then read by all threads, which is fine. _totalRecords, however, is both read and written by each thread. You can run into threading issues here since _totalRecords += 1; consists of a non-atomic read-then-write. In other words, you could have two threads read the value of _totalRecords (say they both read the value 5), then increment their copy and then write it back. They'll both write back the value 6, which is now incorrect since it should be 7. This is a classic race condition. For this particular case, you could use Interlocked.Increment to atomically update the field.
In general, to do synchronization between threads in C#, you can use the classes in the System.Threading namespace, e.g. Mutex, Semaphore, and probably the most common, Monitor (equivalent to lock) which allows only one thread to execute a specific portion of code at a time. The mechanism you use to synchronize depends entirely on your performance requirements. For example, if you throw a lock around the body of your ThreadWorker, you'll destroy any performance gains you got through multithreading by effectively serializing the work. Safe, but slow :( On the other hand, if you use Interlocked.Increment and judiciously add other synchronization where necessary, you'll maintain your performance and your app will be correct :)
Once you've gotten your worker method to be thread-safe, you should use some other mechanism to manage your threads. ThreadPool was mentioned, and you could also use the Task Parallel Library, which abstracts over the ThreadPool and smartly determines and scales how many threads to use. This way, you take the burden off of the user to determine what magic number of threads they should run.
The obvious answer is to question why you want threads in the first place? Where is the analysis and benchmarks that show that using threads will be an advantage?
How are you ensuring that non-gui threads do not interact with the gui? How are you ensuring that no two threads interact with the same variables or datastructures in an unsafe way? Even if you realise you do need to use locking, how are you ensuring that the locks don't result in each thread processing their workload serially, removing any advantages that multiple threads might have provided?