C# TaskFactory ContinueWhenAll unexpectedly running before all tasks complete

I have a data processing program in C# (.NET 4.6.2; WinForms for the UI). I'm experiencing a strange situation where computer speed seems to be causing Task.Factory.ContinueWhenAll to run earlier than expected or some Tasks are reporting complete before actually running. As you can see below, I have a queue of up to 390 tasks, with no more than 4 in queue at once. When all tasks are complete, the status label is updated to say complete. The ScoreManager involves retrieving information from a database, performing several client-side calculations, and saving to an Excel file.
When running the program from my laptop, everything functions as expected; when running from a substantially more powerful workstation, I experience this issue. Unfortunately, due to organizational limitations, I likely cannot get Visual Studio on the workstation to debug directly. Does anyone have any idea what might be causing this for me to investigate?
private void button1_Click(object sender, EventArgs e)
{
int startingIndex = cbStarting.SelectedIndex;
int endingIndex = cbEnding.SelectedIndex;
lblStatus.Text = "Running";
if (endingIndex < startingIndex)
{
MessageBox.Show("Ending must be further down the list than starting.");
return;
}
List<string> lItems = new List<string>();
for (int i = startingIndex; i <= endingIndex; i++)
{
lItems.Add(cbStarting.Items[i].ToString());
}
System.IO.Directory.CreateDirectory(cbMonth.SelectedItem.ToString());
ThreadPool.SetMaxThreads(4, 4);
List<Task<ScoreResult>> tasks = new List<Task<ScoreResult>>();
for (int i = startingIndex; i <= endingIndex; i++)
{
ScoreManager sm = new ScoreManager(cbStarting.Items[i].ToString(),
cbMonth.SelectedItem.ToString());
Task<ScoreResult> task = Task.Factory.StartNew<ScoreResult>((manager) =>
((ScoreManager)manager).Execute(), sm);
sm = null;
Action<Task<ScoreResult>> itemcomplete = ((_task) =>
{
if (_task.Result.errors.Count > 0)
{
txtLog.Invoke((MethodInvoker)delegate
{
txtLog.AppendText("Item " + _task.Result.itemdetail +
" had errors/warnings:" + Environment.NewLine);
});
foreach (ErrorMessage error in _task.Result.errors)
{
txtLog.Invoke((MethodInvoker)delegate
{
txtLog.AppendText("\t" + error.ErrorText +
Environment.NewLine);
});
}
}
else
{
txtLog.Invoke((MethodInvoker)delegate
{
txtLog.AppendText("Item " + _task.Result.itemdetail +
" succeeded." + Environment.NewLine);
});
}
});
task.ContinueWith(itemcomplete);
tasks.Add(task);
}
Action<Task[]> allComplete = ((_tasks) =>
{
lblStatus.Invoke((MethodInvoker)delegate
{
lblStatus.Text = "Complete";
});
});
Task.Factory.ContinueWhenAll<ScoreResult>(tasks.ToArray(), allComplete);
}

You are creating fire-and-forget tasks that you neither wait for nor observe, here:
task.ContinueWith(itemcomplete);
tasks.Add(task);
Task.Factory.ContinueWhenAll<ScoreResult>(tasks.ToArray(), allComplete);
The ContinueWith method returns a Task. You probably need to attach the allComplete continuation to these tasks, instead of their antecedents:
List<Task> continuations = new List<Task>();
Task continuation = task.ContinueWith(itemcomplete);
continuations.Add(continuation);
// continuations is a Task[], so the non-generic overload is the one that accepts Action<Task[]>
Task.Factory.ContinueWhenAll(continuations.ToArray(), allComplete);
As a side note, you could make your code half in size and significantly more readable if you used async/await instead of the old-school ContinueWith and Invoke((MethodInvoker) technique.
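As a rough illustration of that side note, here is a minimal console sketch of the async/await shape. It is not the original WinForms handler: ScoreResult, Execute and ProcessItemAsync are hypothetical stand-ins, and the work is simulated. In a WinForms handler declared async, the code after each await would resume on the UI thread, so the Invoke calls would not be needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AwaitAllSketch
{
    // Hypothetical stand-in for the question's ScoreResult type.
    public sealed class ScoreResult
    {
        public string ItemDetail;
        public List<string> Errors = new List<string>();
    }

    // Stand-in for ScoreManager.Execute(): simulates the work for one item.
    static ScoreResult Execute(string item)
    {
        var result = new ScoreResult { ItemDetail = item };
        if (item.EndsWith("2")) result.Errors.Add("simulated warning");
        return result;
    }

    // Runs one item and builds its log line. The returned task covers both
    // the work and the per-item follow-up, so nothing is fire-and-forget.
    static async Task<string> ProcessItemAsync(string item)
    {
        ScoreResult r = await Task.Run(() => Execute(item));
        return r.Errors.Count > 0
            ? "Item " + r.ItemDetail + " had errors/warnings: " + string.Join("; ", r.Errors)
            : "Item " + r.ItemDetail + " succeeded.";
    }

    public static async Task<string[]> RunAllAsync(IEnumerable<string> items)
    {
        // WhenAll completes only after every per-item task has completed.
        return await Task.WhenAll(items.Select(item => ProcessItemAsync(item)));
    }

    public static void Main()
    {
        string[] lines = RunAllAsync(new[] { "item1", "item2", "item3" })
            .GetAwaiter().GetResult();
        foreach (string line in lines) Console.WriteLine(line);
        Console.WriteLine("Complete");
    }
}
```

Task.WhenAll returns the results in the same order as the input tasks, which keeps the log lines aligned with the items.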
Also: setting an upper limit to the number of ThreadPool threads in order to control the degree of parallelism is extremely inadvisable:
ThreadPool.SetMaxThreads(4, 4); // Don't do this!
You can use the Parallel class instead. It allows controlling the MaxDegreeOfParallelism quite easily.
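A minimal sketch of that suggestion, assuming the per-item work can be expressed as a loop body. The names BoundedParallelSketch and Run are illustrative, and Thread.Sleep stands in for the real work. The method also records the highest concurrency it observed, to show that the limit holds without touching the ThreadPool settings.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelSketch
{
    // Processes every item with at most maxParallel bodies running at once
    // and returns the highest concurrency actually observed.
    public static int Run(int itemCount, int maxParallel, ConcurrentBag<int> processed)
    {
        int running = 0, peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, i =>
        {
            int now = Interlocked.Increment(ref running);

            // Record the peak with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
                Interlocked.CompareExchange(ref peak, now, seen);

            Thread.Sleep(10);      // stand-in for the real per-item work
            processed.Add(i);
            Interlocked.Decrement(ref running);
        });

        // ForEach returns only after every iteration has finished.
        return peak;
    }

    public static void Main()
    {
        var processed = new ConcurrentBag<int>();
        int peak = Run(40, 4, processed);
        Console.WriteLine("Processed " + processed.Count + " items, peak concurrency " + peak);
    }
}
```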

After discovering that the task state was IsFaulted, I added code to write exception information to the log (https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/exception-handling-task-parallel-library). The problem seems to be an underlying database issue: there are not enough connections left in the connection pool ("Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached."). The additional speed allows queries to fire more quickly and more frequently. I'm not entirely sure why, as I do have the SqlConnection enclosed in a using clause, but I'm investigating a few things on that front. At any rate, the problem is clearly a little different from what I thought above, so I'm marking this quasi-answered.
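For readers who want to do the same kind of check, here is a small sketch of inspecting faulted tasks. It is not the code the poster added: the helper names and the simulated TimeoutException are assumptions made for illustration, and it assumes no task is canceled.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FaultLoggingSketch
{
    static int Succeed() { return 1; }

    // Simulates the kind of failure described above.
    static int Fail() { throw new TimeoutException("simulated pool timeout"); }

    public static Task<int>[] StartSample()
    {
        return new[]
        {
            Task.Run<int>(() => Succeed()),
            Task.Run<int>(() => Fail())
        };
    }

    // Returns one log line per task, including the exception messages of faulted ones.
    public static List<string> Describe(Task<int>[] tasks)
    {
        var log = new List<string>();
        try
        {
            Task.WaitAll(tasks);   // throws AggregateException if any task faulted
        }
        catch (AggregateException)
        {
            // Deliberately not handled here: each task is inspected below.
        }

        for (int i = 0; i < tasks.Length; i++)
        {
            if (tasks[i].IsFaulted)
            {
                foreach (Exception ex in tasks[i].Exception.Flatten().InnerExceptions)
                    log.Add("Task " + i + " faulted: " + ex.Message);
            }
            else
            {
                log.Add("Task " + i + " returned " + tasks[i].Result);
            }
        }
        return log;
    }

    public static void Main()
    {
        foreach (string line in Describe(StartSample()))
            Console.WriteLine(line);
    }
}
```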

Related

C#, multi-threading - Form not updating

I have a monte-carlo simulation running across multiple threads with a progress bar to inform the user how it's going. The progress bar management is done in a separate thread using Invoke, but the Form is not updating.
Here is my code:
Thread reportingThread = new Thread(() => UpdateProgress(iSims, ref myBag));
reportingThread.Priority = ThreadPriority.AboveNormal;
reportingThread.Start();
and here is the function being called:
private void UpdateProgress(int iSims, ref ConcurrentBag<simResult> myBag)
{
int iCount;
string sText;
if (myBag == null)
iCount = 0;
else
iCount = myBag.Count;
while (iCount < iSims)
{
if (this.Msg.InvokeRequired)
{
sText = iCount.ToString() + " simulations of " + iSims + " completed.";
this.Msg.BeginInvoke((MethodInvoker) delegate() { this.Msg.Text = sText; this.Refresh(); });
}
Thread.Sleep(1000);
iCount = myBag.Count;
}
}
I have used both Application.DoEvents() and this.Refresh() to try to force the form to update, but nothing happens.
UPDATE: Here is the procedure calling the above function
private void ProcessLeases(Boolean bValuePremium)
{
int iSims, iNumMonths, iNumYears, iIndex, iNumCores, iSimRef;
int iNumSimsPerThread, iThread, iAssets, iPriorityLevel;
string sMsg;
DateTime dtStart, dtEnd;
TimeSpan span;
var threads = new List<Thread>();
ConcurrentBag<simResult> myBag = new ConcurrentBag<simResult>();
ConcurrentBag<summaryResult> summBag = new ConcurrentBag<summaryResult>();
this.Msg.Text = "Updating all settings";
Application.DoEvents();
ShowProgressPanel();
iSims = objSettings.getSimulations();
iNumCores = Environment.ProcessorCount;
this.Msg.Text = "Initialising model";
Application.DoEvents();
iNumSimsPerThread = Convert.ToInt16(Math.Round(Convert.ToDouble(iSims) / Convert.ToDouble(iNumCores), 0));
this.Msg.Text = "Spawning " + iNumCores.ToString() + " threads";
for (iThread = 0; iThread < iNumCores; iThread++)
{
int iStart, iEnd;
if (iThread == 0)
{
iStart = (iThread * iNumSimsPerThread) + 1;
iEnd = ((iThread + 1) * iNumSimsPerThread);
}
else
{
if (iThread < (iNumCores - 1))
{
iStart = (iThread * iNumSimsPerThread) + 1;
iEnd = ((iThread + 1) * iNumSimsPerThread);
}
else
{
iStart = (iThread * iNumSimsPerThread) + 1;
iEnd = iSims;
}
}
Thread thread = new Thread(() => ProcessParallelMonteCarloTasks(iStart, iEnd, iNumMonths, iSimRef, iSims, ref objDB, iIndex, ref objSettings, ref myBag, ref summBag));
switch (iPriorityLevel)
{
case 1: thread.Priority = ThreadPriority.Highest; break;
case 2: thread.Priority = ThreadPriority.AboveNormal; break;
default: thread.Priority = ThreadPriority.Normal; break;
}
thread.Start();
threads.Add(thread);
}
// Now start the thread to aggregate the MC results
Thread MCThread = new Thread(() => objPortfolio.MCAggregateThread(ref summBag, (iSims * iAssets), iNumMonths));
MCThread.Priority = ThreadPriority.AboveNormal;
MCThread.Start();
threads.Add(MCThread);
// Here we review the CollectionBag size to report progress to the user
Thread reportingThread = new Thread(() => UpdateProgress(iSims, ref myBag));
reportingThread.Priority = ThreadPriority.AboveNormal;
reportingThread.Start();
// Wait for all threads to complete
//this.Msg.Text = iNumCores.ToString() + " Threads running.";
foreach (var thread in threads)
thread.Join();
reportingThread.Abort();
this.Msg.Text = "Aggregating results";
Application.DoEvents();
this.Msg.Text = "Preparing Results";
Application.DoEvents();
ShowResults();
ShowResultsPanel();
}
As you can see, there are a number of updates to the Form before my Invoked call and they all work fine - in each case, I am using Application.DoEvents() to update.
myBag is a ConcurrentBag into which each monte-carlo thread dumps its results. By using the Count property, I can see how many simulations have completed and update the user.
foreach (var thread in threads)
thread.Join();
This is your problem. You are blocking here so nothing will ever update in the UI thread until all your threads complete.
This is a critical point - .DoEvents() happens naturally and all by itself every time a block of code you have attached to a user interface event handler completes executing. One of your primary responsibilities as a developer is to make sure that any code executing in a user interface event handler completes in a timely manner (a few hundred milliseconds, maximum). If you write your code this way then there is never, ever, a need to call DoEvents()... ever.
You should always write your code this way.
Aside from performance benefits, a major plus of using threads is that they are asynchronous by nature - to take advantage of this you have to write your code accordingly. Breaking out of procedural habits is a hard one. What you need to do is to forget the .Join altogether and get out of your ProcessLeases method here - let the UI have control again.
You are dealing with updates in your threads already, so all you need is a completion notification to let you pick up in a new handler when all of your threads finish their work. You'll need to keep track of your threads: have each of them notify on completion (i.e., invoke some delegate back on the UI thread) and in whatever method handles it you would do something like
if (allThreadsAreFinished) // <-- You'll need to implement something here
{
reportingThread.Abort();
this.Msg.Text = "Preparing Results";
ShowResults();
ShowResultsPanel();
}
Alternatively, you could also simply call ProcessLeases in a background thread (making sure to correctly invoke all of your calls within it) and then it wouldn't matter that you are blocking that thread with a .Join. You could also then do away with all of the messy calls to .DoEvents().
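One way to get that "notify me when everyone is done" behaviour without a Join is to represent each worker as a Task and await their combined completion. The following is a console sketch under that assumption, not a drop-in replacement for ProcessLeases; StartWorkers and ProcessAsync are hypothetical names and the simulation batches are replaced by Thread.Sleep.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CompletionNotificationSketch
{
    // Starts workerCount simulated workers and returns a task that completes
    // when all of them have finished. The caller is never blocked by a Join.
    public static Task<int[]> StartWorkers(int workerCount)
    {
        Task<int>[] workers = Enumerable.Range(0, workerCount)
            .Select(id => Task.Run(() =>
            {
                Thread.Sleep(50 * (id + 1));   // stand-in for one simulation batch
                return id;
            }))
            .ToArray();
        return Task.WhenAll(workers);
    }

    public static async Task<string> ProcessAsync()
    {
        Task<int[]> all = StartWorkers(4);

        // In a WinForms handler the UI thread would be free to repaint and to
        // run BeginInvoke callbacks while this await is pending.
        int[] finished = await all;

        return "Preparing Results for " + finished.Length + " workers";
    }

    public static void Main()
    {
        Console.WriteLine(ProcessAsync().GetAwaiter().GetResult());
    }
}
```

The blocking GetResult call in Main is only there because a console program has no message loop; in a form, the event handler itself would be declared async and would simply await ProcessAsync.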
Additionally, you don't need the call to this.Refresh(); here :
this.Msg.BeginInvoke((MethodInvoker) delegate() {
this.Msg.Text = sText;
this.Refresh();
});
If you aren't blocking the UI thread the control will update just fine without it and you'll only add extra work for nothing. If you are blocking the UI thread then adding the .Refresh() call won't help because the UI thread won't be free to execute it any more than it will be free to execute the previous line. This is programming chaotically - randomly adding code hoping that it will work instead of examining and understanding the reasons why it doesn't.
Chapter 2 : The Workplace Analogy.
Imagine the UI thread is like the manager. The manager can delegate a task in several ways. Using .Join as you've done it is a bit like the manager giving everyone a job to do - Joe gets one job, Lucy gets another, Bill gets a third, and Sara gets a fourth. The manager has follow-up work to do once everyone is done so he comes up with a plan to get it done as soon as possible.
Immediately after giving everyone their task, the manager goes and sits at Joe's desk and stares at him, doing nothing, until Joe is done. When Joe finishes, he moves to Lucy's desk to check if she is done. If she isn't he waits there until Lucy finishes, then moves to Bill's desk and stares at him until he is done... then moves to Sara's desk.
Clearly this isn't productive. Furthermore, each of the four team members has been sending email status updates (Manager.BeginInvoke -> read your email!) to their manager but he hasn't read any of them because he has been spending all of his time sitting at their desks, staring at them, waiting for them to finish their tasks. He hasn't done anything else, for that matter, either. The bosses have been asking what's going on, his phone's been ringing, nobody has updated the weekly financials - nothing. The manager hasn't been able to do anything else because he decided that he needed to sit on his bottom and watch his team work until they finished.
The manager isn't responding... The manager may respond again if you wait. Do you want to fire the manager?
[YES - FIRE HIM] [NO - Keep Waiting]
Better, one would think, if the manager simply set everyone off to work on stuff and then got on with doing other things. All he cares about is when they finish working so all it takes is one more instruction for them to notify him when their work is complete. The UI thread is like your application's manager - its time is precious and you should use as little of it as absolutely necessary. If you have work to do, delegate to a worker thread and don't have the manager sit around waiting for others to finish work - have them notify when things are ready and let the manager go back to work.
Well the code is very partial, but at a glance if
this.Msg.InvokeRequired == false
the following code doesn't get executed. Can that be the issue?

Adding a multithreading scenario for an application in c#

I have developed an application in C#. The class structure is as follows.
Form1 => The UI form. Has a BackgroundWorker, a progress bar, and an "ok" button.
SourceReader, TimedWebClient, HttpWorker, ReportWriter // classes that do some work
Controller => Has overall control. On the "ok" button click, an instance of this class called "cntrlr" is created. This cntrlr is a global variable in Form1.cs.
(In the constructor of the Controller I create SourceReader, TimedWebClient, HttpWorker, and ReportWriter instances.)
Then I call the RunWorkerAsync() of the background worker.
Within it code is as follows.
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
int iterator = 1;
for (iterator = 1; iterator <= this.urlList.Count; iterator++)
{
cntrlr.Vmain(iterator-1);
backgroundWorker1.ReportProgress(iterator);
}
}
At the moment, ReportProgress updates the progress bar.
The urlList mentioned above has 1000 URLs. cntrlr.Vmain(int i) currently runs the whole process for one URL at a time. I want to split the work across several threads, each processing 100 URLs. Access to the other instances and their methods does not need to be restricted, but access to ReportWriter should be limited to one thread at a time. I can't find a way to do this. If anyone has an idea or an answer, please explain.
If you do want to restrict multiple threads using the same method concurrently then I would use the Semaphore class to facilitate the required thread limit; here's how...
A semaphore is like a mean nightclub bouncer: it has been given a club capacity and is not allowed to exceed this limit. Once the club is full, no one else can enter and a queue builds up outside. Then, as one person leaves, another can enter (analogy thanks to J. Albahari).
A Semaphore with a value of one is equivalent to a Mutex or Lock except that the Semaphore has no owner, so it is thread-agnostic. Any thread can call Release on a Semaphore, whereas with a Mutex/Lock only the thread that obtained the Mutex/Lock can release it.
Now, for your case we are able to use Semaphores to limit concurrency and prevent too many threads from executing a particular piece of code at once. In the following example five threads try to enter a night club that only allows entry to three...
using System;
using System.Threading;

class BadAssClub
{
    static SemaphoreSlim sem = new SemaphoreSlim(3);

    static void Main()
    {
        for (int i = 1; i <= 5; i++)
            new Thread(Enter).Start(i);
    }

    // Enforce only three threads running the guarded section at once.
    // The parameter must be object to match ParameterizedThreadStart.
    static void Enter(object id)
    {
        Console.WriteLine(id + " wants to enter.");
        sem.Wait();   // acquire outside the try so Release only runs after a successful Wait
        try
        {
            Console.WriteLine(id + " is in!");
            Thread.Sleep(1000 * (int)id);
            Console.WriteLine(id + " is leaving...");
        }
        finally
        {
            sem.Release();
        }
    }
}
Note that SemaphoreSlim is a lighter-weight version of the Semaphore class and incurs about a quarter of the overhead. It is sufficient for what you require.
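Applied to the question's requirement (only one thread in ReportWriter at a time), the same idea with a count of one might look like the sketch below. The List<string> is a hypothetical stand-in for ReportWriter, and the string built per URL stands in for the HTTP work; neither comes from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class SerializedWriterSketch
{
    public static List<string> Run(int threadCount, int urlsPerThread)
    {
        var report = new List<string>();            // stand-in for ReportWriter; not thread-safe by itself
        var writerGate = new SemaphoreSlim(1, 1);   // at most one thread may write at a time
        var workers = new List<Thread>();

        for (int t = 0; t < threadCount; t++)
        {
            int threadId = t;   // copy, so each lambda sees its own value
            var worker = new Thread(() =>
            {
                for (int u = 0; u < urlsPerThread; u++)
                {
                    // Stand-in for the per-URL work, which runs unrestricted.
                    string line = "thread " + threadId + " url " + u;

                    writerGate.Wait();
                    try { report.Add(line); }
                    finally { writerGate.Release(); }
                }
            });
            worker.Start();
            workers.Add(worker);
        }

        foreach (Thread worker in workers) worker.Join();
        return report;
    }

    public static void Main()
    {
        Console.WriteLine(Run(10, 100).Count + " lines written");
    }
}
```

A plain lock statement around the write would serve equally well here, since the same thread always acquires and releases the gate.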
I hope this helps.
I think I would have used the ThreadPool instead of a BackgroundWorker, and given each thread 1, not 100, URLs to process. The thread pool will limit the number of threads it starts at once, so you won't have to worry about getting 1000 requests at once. Have a look here for a good example:
http://msdn.microsoft.com/en-us/library/3dasc8as.aspx
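A minimal sketch of that approach, assuming one work item per URL and a CountdownEvent to find out when all of them are done. ProcessAll is a hypothetical name and the string concatenation stands in for the real request.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

public static class ThreadPoolPerUrlSketch
{
    public static ConcurrentBag<string> ProcessAll(string[] urls)
    {
        var results = new ConcurrentBag<string>();
        using (var done = new CountdownEvent(urls.Length))
        {
            foreach (string url in urls)
            {
                // One work item per URL; the pool decides how many run at once.
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        results.Add("processed " + (string)state);   // stand-in for the real work
                    }
                    finally
                    {
                        done.Signal();   // always count the item, even if the work throws
                    }
                }, url);
            }

            // Blocks the caller, so call this from a background thread
            // (for example the DoWork handler), not from the UI thread.
            done.Wait();
        }
        return results;
    }

    public static void Main()
    {
        string[] urls = Enumerable.Range(1, 1000).Select(i => "http://example.com/" + i).ToArray();
        Console.WriteLine(ProcessAll(urls).Count + " URLs processed");
    }
}
```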
Feeling a little more adventurous? Consider using TPL DataFlow to download a bunch of urls:
var urls = new[]{
"http://www.google.com",
"http://www.microsoft.com",
"http://www.apple.com",
"http://www.stackoverflow.com"};
var tb = new TransformBlock<string, string>(async url => {
using(var wc = new WebClient())
{
var data = await wc.DownloadStringTaskAsync(url);
Console.WriteLine("Downloaded : {0}", url);
return data;
}
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 4});
var ab = new ActionBlock<string>(data => {
//process your data
Console.WriteLine("data length = {0}", data.Length);
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 1});
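// Join the producer's output to the consumer and pass completion along with it.
tb.LinkTo(ab, new DataflowLinkOptions { PropagateCompletion = true });
foreach(var u in urls)
{
tb.Post(u);
}
tb.Complete();
ab.Completion.Wait(); // block until every item has been processed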
Note how you can control the parallelism of each block explicitly, so you can gather in parallel but process without going concurrent (for example).
Just grab it with NuGet. Easy.

Logging exceptions for each item in Parallel.ForEach and Task.Factory.StartNew from it

I am trying to use Parallel.ForEach on a list and, for each item in the list, make a database call. I am trying to log each item, with or without an error. I just wanted to check with the experts here whether I am doing things the right way. For this example, I am simulating the I/O using file access instead of database access.
static ConcurrentQueue<IdAndErrorMessage> queue = new ConcurrentQueue<IdAndErrorMessage>();
private static void RunParallelForEach()
{
List<int> list = Enumerable.Range(1, 5).ToList<int>();
Console.WriteLine("Start....");
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
Parallel.ForEach(list, (tempId) =>
{
string errorMessage = string.Empty;
try
{
ComputeBoundOperationTest(tempId);
/*Below task is I/O bound - so do this Async.*/
Task[] task = new Task[1]
{
Task.Factory.StartNew(() => FileAccess(tempId))
};
Task.WaitAll(task);
}
catch (Exception ex)
{
errorMessage = ex.ToString();
}
if (queue.SingleOrDefault((IdAndErrorMessageObj) => IdAndErrorMessageObj.Id == tempId) == null)
{
queue.Enqueue(new IdAndErrorMessage(tempId, errorMessage));
}
}
);
Console.WriteLine("Stop....");
Console.WriteLine("Total milliseconds :- " + stopWatch.ElapsedMilliseconds.ToString());
}
Below are the helper methods :-
private static byte[] FileAccess(int id)
{
if (id == 5)
{
throw new ApplicationException("This is some file access exception");
}
return File.ReadAllBytes(Directory.GetFiles(Environment.SystemDirectory).First());
//return File.ReadAllBytes("Files/" + fileName + ".docx");
}
private static void ComputeBoundOperationTest(int tempId)
{
//Console.WriteLine("Compute-bound operation started for :- " + tempId.ToString());
if (tempId == 4)
{
throw new ApplicationException("Error thrown for id = 4 from compute-bound operation");
}
Thread.Sleep(20);
}
private static void EnumerateQueue(ConcurrentQueue<IdAndErrorMessage> queue)
{
Console.WriteLine("Enumerating the queue items :- ");
foreach (var item in queue)
{
Console.WriteLine(item.Id.ToString() + (!string.IsNullOrWhiteSpace(item.ErrorMessage) ? item.ErrorMessage : "No error"));
}
}
There is no reason to do this:
/*Below task is I/O bound - so do this Async.*/
Task[] task = new Task[1]
{
Task.Factory.StartNew(() => FileAccess(tempId))
};
Task.WaitAll(task);
By scheduling this in a separate task, and then immediately waiting on it, you're just tying up more threads. You're better off leaving this as:
/*Below task is I/O bound - but just call it.*/
FileAccess(tempId);
That being said, given that you're making a logged value (exception or success) for every item, you might want to consider writing this into a method and then just calling the entire thing as a PLINQ query.
For example, if you write this into a method that handles the try/catch (with no threading), and returns the "logged string", ie:
string ProcessItem(int id) { // ...
You could write the entire operation as:
var results = theIDs.AsParallel().Select(id => ProcessItem(id));
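To make the suggestion concrete, here is a self-contained sketch. The body of ProcessItem is an assumption (the answer elides it), and DoWork is a hypothetical stand-in that fails for id 4, like the question's test helper.

```csharp
using System;
using System.Linq;

public static class PlinqLoggingSketch
{
    // Hypothetical per-item work: throws for id 4.
    static void DoWork(int id)
    {
        if (id == 4) throw new ApplicationException("Error thrown for id = 4");
    }

    // Handles its own try/catch and returns the string to log for this item.
    public static string ProcessItem(int id)
    {
        try
        {
            DoWork(id);
            return id + ": No error";
        }
        catch (Exception ex)
        {
            return id + ": " + ex.Message;
        }
    }

    public static string[] ProcessAll(int[] ids)
    {
        // AsOrdered keeps the results in input order; drop it if order is irrelevant.
        return ids.AsParallel().AsOrdered().Select(id => ProcessItem(id)).ToArray();
    }

    public static void Main()
    {
        foreach (string line in ProcessAll(Enumerable.Range(1, 5).ToArray()))
            Console.WriteLine(line);
    }
}
```

Because every item produces exactly one string, no shared queue and no duplicate check are needed.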
You might want to remove Console.WriteLine from the threaded code. The reason is that there can be only one console per Windows app, so if two or more threads write to the console in parallel, one has to wait.
As a replacement for your custom error queue, you might want to look at .NET 4's AggregateException: catch that and process the exceptions accordingly. The InnerExceptions property will give you the necessary list of exceptions.
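A short sketch of that pattern, with a made-up failure condition (even ids throw). One caveat worth knowing: once an iteration throws, Parallel.ForEach stops starting new iterations, so the collected list is not guaranteed to cover every item. If a log entry per item is required, a per-item try/catch is still needed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AggregateExceptionSketch
{
    public static List<string> Run(IEnumerable<int> ids)
    {
        var messages = new List<string>();
        try
        {
            Parallel.ForEach(ids, id =>
            {
                // Illustrative failure condition only.
                if (id % 2 == 0)
                    throw new InvalidOperationException("failed for id " + id);
            });
        }
        catch (AggregateException ae)
        {
            // Exceptions thrown by loop bodies arrive wrapped in one AggregateException.
            foreach (Exception inner in ae.Flatten().InnerExceptions)
                messages.Add(inner.Message);
        }
        return messages;
    }

    public static void Main()
    {
        foreach (string message in Run(new[] { 1, 2, 3 }))
            Console.WriteLine(message);
    }
}
```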
And a general code review comment, don't use magic numbers like 4 in if (tempId == 4) Instead have some const defined which tells what 4 stands for. e.g. if (tempId == Error.FileMissing)
Parallel.ForEach runs an action/func concurrently up to a certain number of simultaneous instances. If the iterations are not inherently independent of one another, you're not getting any performance gains, and you are likely reducing performance by introducing expensive context switching and contention. You say that you want to do a "database call" and are simulating it with a file operation. If each iteration uses the same resource (the same row in a database table, for example, or writing to the same file in the same location) then they're not really going to be run in parallel. Only one will be running at a time; the others will simply be "waiting" to get hold of the resource, needlessly making your code complex.
You haven't detailed what you want to do for each iteration; but when I've encountered situations like this with other programmers, they almost always aren't really doing things in parallel and they've simply gone through and replaced foreachs with Parallel.ForEach in the hopes of magically gaining performance or magically making use of multi-CPU/Core processors.

Deadlock Risk in Nested Parallel For

Take the following naive implementation of a nested async loop using the ThreadPool:
ThreadPool.SetMaxThreads(10, 10);
CountdownEvent icnt = new CountdownEvent(1);
for (int i = 0; i < 50; i++)
{
icnt.AddCount();
ThreadPool.QueueUserWorkItem((inum) =>
{
Console.WriteLine("i" + inum + " scheduled...");
Thread.Sleep(10000); // simulated i/o
CountdownEvent jcnt = new CountdownEvent(1);
for (int j = 0; j < 50; j++)
{
jcnt.AddCount();
ThreadPool.QueueUserWorkItem((jnum) =>
{
Console.WriteLine("j" + jnum + " scheduled...");
Thread.Sleep(20000); // simulated i/o
jcnt.Signal();
Console.WriteLine("j" + jnum + " complete.");
}, j);
}
jcnt.Signal();
jcnt.Wait();
icnt.Signal();
Console.WriteLine("i" + inum + " complete.");
}, i);
}
icnt.Signal();
icnt.Wait();
Now, you'd never use this pattern (it will deadlock on start) but it does demonstrate a specific deadlock you can cause with the threadpool - by blocking while waiting for nested threads to complete after the blocking threads have consumed the entire pool.
I'm wondering if there's any potential risk of generating similarly detrimental behavior using the nested Parallel.For version of this:
Parallel.For(1, 50, (i) =>
{
Console.WriteLine("i" + i + " scheduled...");
Thread.Sleep(10000); // simulated i/o
Parallel.For(1, 5, (j) =>
{
Thread.Sleep(20000); // simulated i/o
Console.WriteLine("j" + j + " complete.");
});
Console.WriteLine("i" + i + " complete.");
});
Obviously the scheduling mechanism is far more sophisticated (and I haven't seen this version deadlock at all), but the underlying risk seems like it may still be lurking there. Is it theoretically possible to dry up the pool that Parallel.For uses to the point of creating deadlock by having dependencies on nested threads? I.e., is there a limit to the number of threads that Parallel.For keeps in its back pocket for jobs that are scheduled after a delay?
No, there is no risk of a deadlock like that in Parallel.For() (or Parallel.ForEach()).
There are some factors that would lower the risk of deadlock (like dynamic count of threads used). But there is also a reason why the deadlock is impossible: the iteration is run on the original thread too. What that means is that if the ThreadPool is completely busy, the computation will run completely synchronously. In that case, you won't get any speedup from using Parallel.For(), but your code will still run, no deadlocks.
Also, a similar situation with Tasks is also solved correctly: if you Wait() on a Task (or access its Result) that hasn't been scheduled yet, it will run inline in the current thread. I think this is primarily a performance optimization, but I think it could also avoid deadlocks in some specific cases.
But I think the question is more theoretical than practical. The .NET 4 ThreadPool has a default maximum thread count of something like a thousand. And if you have a thousand threads blocking at the same moment, you're doing something very wrong.

.NET Multithreading help

I have an application I have already started working with and it seems I need to rethink things a bit. The application is a winform application at the moment. Anyway, I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread. What I have done is loop through the number of threads variable and create the threads accordingly. I am not performing any locking (and not sure I need to or not) on the threads. I am new to threading and am running into possible issue with multiple cores. I need some advice as to how I can make this perform better.
Before a thread is created some records are pulled from my database to be processed. That list object is sent to the thread and looped through. Once it reaches the end of the loop, the thread call the data functions to pull some new records, replacing the old ones in the list. This keeps going on until there are no more records. Here is my code:
private void CreateThreads()
{
_startTime = DateTime.Now;
var totalThreads = 0;
var totalRecords = 0;
progressThreadsCreated.Maximum = _threadCount;
progressThreadsCreated.Step = 1;
LabelThreadsCreated.Text = "0 / " + _threadCount.ToString();
this.Update();
for(var i = 1; i <= _threadCount; i++)
{
LabelThreadsCreated.Text = i + " / " + _threadCount;
progressThreadsCreated.Value = i;
var adapter = new Dystopia.DataAdapter();
var records = adapter.FindAllWithLocking(_recordsPerThread,_validationId,_validationDateTime);
if(records != null && records.Count > 0)
{
totalThreads += 1;
LabelTotalProcesses.Text = "Total Processes Created: " + totalThreads.ToString();
var paramss = new ArrayList { i, records };
var thread = new Thread(new ParameterizedThreadStart(ThreadWorker));
thread.Start(paramss);
}
this.Update();
}
}
private void ThreadWorker(object paramList)
{
try
{
var parms = (ArrayList) paramList;
var stopThread = false;
var threadCount = (int) parms[0];
var records = (List<Candidates>) parms[1];
var runOnce = false;
var adapter = new Dystopia.DataAdapter();
var lastCount = records.Count;
var runningCount = 0;
while (_stopThreads == false)
{
if (!runOnce)
{
CreateProgressArea(threadCount, records.Count);
}
else
{
ResetProgressBarMethod(threadCount, records.Count);
}
runOnce = true;
var counter = 0;
if (records.Count > 0)
{
foreach (var record in records)
{
counter += 1;
runningCount += 1;
_totalRecords += 1;
var rec = record;
var proc = new ProcRecords();
proc.Validate(ref rec);
adapter.Update(rec);
UpdateProgressBarMethod(threadCount, counter, records.Count, runningCount);
if (_stopThreads)
{
break;
}
}
UpdateProgressBarMethod(threadCount, -1, lastCount, runningCount);
if (!_noRecordsInPool)
{
records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
if (records == null || records.Count <= 0)
{
_noRecordsInPool = true;
break;
}
else
{
lastCount = records.Count;
}
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
Something simple you could do that would improve perf would be to use the ThreadPool to manage your thread creation. The pool reuses a group of threads, so you pay the thread-creation penalty once instead of every time.
If you decide to move to .NET 4.0, Tasks would be another way to go.
"I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread."
This isn't something you really want to expose to the user. What are they supposed to put? How can they determine what's best? This is an implementation detail best left to you, or even better, the CLR or another library.
"I am not performing any locking (and not sure I need to or not) on the threads."
The majority of issues you'll have with multithreading will come from shared state. Specifically, in your ThreadWorker method, it looks like you refer to the following shared data: _stopThreads, _totalRecords, _noRecordsInPool, _recordsPerThread, _validationId, and _validationDateTime.
Just because these data are shared, however, doesn't mean you'll have issues. It all depends on who reads and writes them. For example, I think _recordsPerThread is only written once initially, and then read by all threads, which is fine. _totalRecords, however, is both read and written by each thread. You can run into threading issues here since _totalRecords += 1; consists of a non-atomic read-then-write. In other words, you could have two threads read the value of _totalRecords (say they both read the value 5), then increment their copy and then write it back. They'll both write back the value 6, which is now incorrect since it should be 7. This is a classic race condition. For this particular case, you could use Interlocked.Increment to atomically update the field.
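The fix for the counter can be sketched as follows. The field name mirrors the question's _totalRecords, but the surrounding class and the worker loop are illustrative assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedCounterSketch
{
    static int _totalRecords;

    // Each worker increments the shared counter once per record.
    public static int CountWithInterlocked(int workers, int recordsPerWorker)
    {
        _totalRecords = 0;
        Parallel.For(0, workers, w =>
        {
            for (int i = 0; i < recordsPerWorker; i++)
                Interlocked.Increment(ref _totalRecords);   // atomic read-modify-write
        });
        return _totalRecords;
    }

    public static void Main()
    {
        // With a plain "_totalRecords += 1" the total could come out lower
        // than expected; with Interlocked it always matches.
        Console.WriteLine(CountWithInterlocked(8, 100000));
    }
}
```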
In general, to do synchronization between threads in C#, you can use the classes in the System.Threading namespace, e.g. Mutex, Semaphore, and probably the most common, Monitor (equivalent to lock) which allows only one thread to execute a specific portion of code at a time. The mechanism you use to synchronize depends entirely on your performance requirements. For example, if you throw a lock around the body of your ThreadWorker, you'll destroy any performance gains you got through multithreading by effectively serializing the work. Safe, but slow :( On the other hand, if you use Interlocked.Increment and judiciously add other synchronization where necessary, you'll maintain your performance and your app will be correct :)
Once you've gotten your worker method to be thread-safe, you should use some other mechanism to manage your threads. ThreadPool was mentioned, and you could also use the Task Parallel Library, which abstracts over the ThreadPool and smartly determines and scales how many threads to use. This way, you take the burden off of the user to determine what magic number of threads they should run.
The obvious answer is to question why you want threads in the first place? Where is the analysis and benchmarks that show that using threads will be an advantage?
How are you ensuring that non-GUI threads do not interact with the GUI? How are you ensuring that no two threads interact with the same variables or data structures in an unsafe way? Even if you realise you do need to use locking, how are you ensuring that the locks don't result in each thread processing its workload serially, removing any advantages that multiple threads might have provided?
