So let me explain: I'm making a file signature scanner in C# that uses MD5 to compute the hashes. A problem I ran into is that ComputeHash() blocks the UI thread, so I turned to Task.Run(), which I hoped would solve the problem.
Even with the hash computation (and everything else) inside the task, the UI thread is still blocked, or at least noticeably slowed down. If I remove the hash computation, the UI thread is no longer blocked.
Here is my snippet of code:
Task.Run(() =>
{
    if (int.Parse(label5.Text) != int.Parse(label7.Text))
    {
        listBox1.SelectedIndex++;
        label11.Text = listBox1.SelectedItem.ToString();
        /*progressBar1.Increment(1);
        label5.Text = progressBar1.Value.ToString();
        int percentage = Convert.ToInt32(progressBar1.Value / (double)progressBar1.Maximum * 100);
        label2.Text = "Scanning files (" + percentage + "%)";*/
        label2.Text = "Scanning";
        label9.Text = currentThreats.ToString();
        try
        {
            StringBuilder buff = new StringBuilder();
            using (MD5 md5 = MD5.Create())
            {
                using (FileStream stream = File.OpenRead(label11.Text))
                {
                    byte[] hash = md5.ComputeHash(stream);
                    buff.Append(BitConverter.ToString(hash).Replace("-", string.Empty));
                }
            }
            if (Reference.VirusList.Contains(buff.ToString())) currentThreats++;
        }
        catch { }
        scanned++;
        label5.Text = scanned.ToString();
    }
    else
    {
        StopCurrentScan(false);
    }
});
NOTE: This is being run from a Timer with an interval of 1 millisecond. Just mentioning it in case it helps solve the problem.
Well, my UI thread isn't lagging out like hell anymore, since I followed the steps below that people suggested:
1. I used await Task.Run() instead of a bare Task.Run() so that exceptions get logged, and I set Control.CheckForIllegalCrossThreadCalls = true; just after InitializeComponent so I could better understand my mistakes.
2. I removed ALL UI calls (like updating controls) from the task; instead, in my case, a timer on the UI thread updates the controls I need from local variables.
3. Similar to 2, but reversed: only the "resource-intensive" work (like ComputeHash()) runs in the task (see the sketch below).
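For illustration, here is a minimal sketch of that split, assuming a hypothetical ScanNextFileAsync helper and the fields/controls from the snippet above (Reference.VirusList, currentThreats, scanned, label5, label9); only the hashing runs off the UI thread, and the controls are updated after the await, back on the UI thread:
// Minimal sketch (hypothetical helper): hash off the UI thread, touch controls afterwards.
private async Task ScanNextFileAsync(string path)
{
    string hashHex = null;
    try
    {
        // Only the expensive part runs inside Task.Run.
        hashHex = await Task.Run(() =>
        {
            using (var md5 = MD5.Create())
            using (var stream = File.OpenRead(path))
            {
                return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", string.Empty);
            }
        });
    }
    catch (Exception)
    {
        // Ignore unreadable files, as the original code did.
    }

    // Back on the UI thread: safe to update controls here.
    if (hashHex != null && Reference.VirusList.Contains(hashHex))
        currentThreats++;
    scanned++;
    label5.Text = scanned.ToString();
    label9.Text = currentThreats.ToString();
}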
I have a data processing program in C# (.NET 4.6.2; WinForms for the UI). I'm experiencing a strange situation where computer speed seems to be causing Task.Factory.ContinueWhenAll to run earlier than expected or some Tasks are reporting complete before actually running. As you can see below, I have a queue of up to 390 tasks, with no more than 4 in queue at once. When all tasks are complete, the status label is updated to say complete. The ScoreManager involves retrieving information from a database, performing several client-side calculations, and saving to an Excel file.
When running the program from my laptop, everything functions as expected; when running from a substantially more powerful workstation, I experience this issue. Unfortunately, due to organizational limitations, I likely cannot get Visual Studio on the workstation to debug directly. Does anyone have any idea what might be causing this for me to investigate?
private void button1_Click(object sender, EventArgs e)
{
    int startingIndex = cbStarting.SelectedIndex;
    int endingIndex = cbEnding.SelectedIndex;
    lblStatus.Text = "Running";
    if (endingIndex < startingIndex)
    {
        MessageBox.Show("Ending must be further down the list than starting.");
        return;
    }
    List<string> lItems = new List<string>();
    for (int i = startingIndex; i <= endingIndex; i++)
    {
        lItems.Add(cbStarting.Items[i].ToString());
    }
    System.IO.Directory.CreateDirectory(cbMonth.SelectedItem.ToString());
    ThreadPool.SetMaxThreads(4, 4);
    List<Task<ScoreResult>> tasks = new List<Task<ScoreResult>>();
    for (int i = startingIndex; i <= endingIndex; i++)
    {
        ScoreManager sm = new ScoreManager(cbStarting.Items[i].ToString(),
            cbMonth.SelectedItem.ToString());
        Task<ScoreResult> task = Task.Factory.StartNew<ScoreResult>((manager) =>
            ((ScoreManager)manager).Execute(), sm);
        sm = null;
        Action<Task<ScoreResult>> itemcomplete = ((_task) =>
        {
            if (_task.Result.errors.Count > 0)
            {
                txtLog.Invoke((MethodInvoker)delegate
                {
                    txtLog.AppendText("Item " + _task.Result.itemdetail +
                        " had errors/warnings:" + Environment.NewLine);
                });
                foreach (ErrorMessage error in _task.Result.errors)
                {
                    txtLog.Invoke((MethodInvoker)delegate
                    {
                        txtLog.AppendText("\t" + error.ErrorText +
                            Environment.NewLine);
                    });
                }
            }
            else
            {
                txtLog.Invoke((MethodInvoker)delegate
                {
                    txtLog.AppendText("Item " + _task.Result.itemdetail +
                        " succeeded." + Environment.NewLine);
                });
            }
        });
        task.ContinueWith(itemcomplete);
        tasks.Add(task);
    }
    Action<Task[]> allComplete = ((_tasks) =>
    {
        lblStatus.Invoke((MethodInvoker)delegate
        {
            lblStatus.Text = "Complete";
        });
    });
    Task.Factory.ContinueWhenAll<ScoreResult>(tasks.ToArray(), allComplete);
}
You are creating fire-and-forget tasks that you never await or observe, here:
task.ContinueWith(itemcomplete);
tasks.Add(task);
Task.Factory.ContinueWhenAll<ScoreResult>(tasks.ToArray(), allComplete);
The ContinueWith method returns a Task. You probably need to attach the allComplete continuation to these continuation tasks, instead of to their antecedents:
List<Task> continuations = new List<Task>();             // before the loop
Task continuation = task.ContinueWith(itemcomplete);     // inside the loop
continuations.Add(continuation);
Task.Factory.ContinueWhenAll(continuations.ToArray(), allComplete);  // after the loop
As a side note, you could make your code half the size and significantly more readable if you used async/await instead of the old-school ContinueWith and Invoke((MethodInvoker)...) technique.
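For example, a rough sketch of the async/await version, trimmed down and assuming ScoreManager.Execute stays synchronous (error logging condensed):
// Sketch only: the same work expressed with async/await instead of ContinueWith/Invoke.
private async void button1_Click(object sender, EventArgs e)
{
    lblStatus.Text = "Running";
    var tasks = new List<Task<ScoreResult>>();
    for (int i = cbStarting.SelectedIndex; i <= cbEnding.SelectedIndex; i++)
    {
        var sm = new ScoreManager(cbStarting.Items[i].ToString(), cbMonth.SelectedItem.ToString());
        tasks.Add(Task.Run(() => sm.Execute()));
    }

    foreach (var task in tasks)
    {
        ScoreResult result = await task;   // resumes on the UI thread, so no Invoke is needed
        txtLog.AppendText("Item " + result.itemdetail +
            (result.errors.Count > 0 ? " had errors/warnings." : " succeeded.") + Environment.NewLine);
    }

    lblStatus.Text = "Complete";
}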
Also: setting an upper limit to the number of ThreadPool threads in order to control the degree of parallelism is extremely inadvisable:
ThreadPool.SetMaxThreads(4, 4); // Don't do this!
You can use the Parallel class instead. It allows controlling the MaxDegreeOfParallelism quite easily.
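As a rough sketch (assuming ScoreManager.Execute is thread-safe), limiting concurrency with Parallel.ForEach instead of ThreadPool.SetMaxThreads might look like this:
// Sketch: cap concurrency at 4 without touching the global ThreadPool limits.
// Parallel.ForEach blocks the calling thread, so from a UI handler you would
// wrap this whole call in await Task.Run(...).
string month = cbMonth.SelectedItem.ToString();
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
var results = new ConcurrentBag<ScoreResult>();

Parallel.ForEach(lItems, options, item =>
{
    var sm = new ScoreManager(item, month);
    results.Add(sm.Execute());
});
// Report the contents of 'results' back on the UI thread afterwards.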
After discovering the task state was IsFaulted, I added some code to log exception information (https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/exception-handling-task-parallel-library). It seems the problem is an underlying database issue: there are not enough connections left in the connection pool ("Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached."). The additional speed lets queries fire more quickly/frequently. I'm not entirely sure why, as I do have the SqlConnection enclosed in a using clause, but I'm investigating a few things on that front. At any rate, the problem is clearly a little different from what I thought above, so I'm marking this quasi-answered.
I have this method for receiving a file.
public Task Download(IProgress<int> downloadProgress)
{
    return Task.Run(async () =>
    {
        var counter = 0;
        var buffer = new byte[1024];
        while (true)
        {
            var byteCount = await _networkStream.ReadAsync(buffer, 0, buffer.Length);
            counter += byteCount;
            downloadProgress.Report(counter);
            if (byteCount != buffer.Length)
                break;
        }
    });
}
Then in the UI I call it like this:
await Download(progress);
where progress is an IProgress<int> that simply updates a label.
When I run it, the UI is blocked (though after some time it does update the label correctly). I don't understand why; shouldn't Task.Run() create a new thread?
How do I fix this please?
You are calling downloadProgress.Report in an infinite loop without any pause in execution. My educated guess is that every time execution time is available on the UI thread, the non-UI thread requests an operation that needs the UI thread's time (as demanded by the synchronisation context), and therefore clogs it up with invocations.
Essentially, rather than blocking the UI thread with one long execution, you may be blocking it with an un-ending stream of tiny ones.
Try putting a Thread.Sleep(10) in your 'spinlock' while(true) { ... } loop and see if that alleviates the issue.
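As a rough illustration of that idea (and of reporting less often in general; the every-64-reads threshold below is arbitrary), the method might become:
// Sketch: same Download method, with the suggested pause and fewer Report calls
// so the UI thread is not flooded with invocations.
public Task Download(IProgress<int> downloadProgress)
{
    return Task.Run(async () =>
    {
        var counter = 0;
        var reads = 0;
        var buffer = new byte[1024];
        while (true)
        {
            var byteCount = await _networkStream.ReadAsync(buffer, 0, buffer.Length);
            counter += byteCount;
            if (++reads % 64 == 0)            // report roughly every 64 KB instead of every read
                downloadProgress.Report(counter);
            if (byteCount != buffer.Length)
                break;
            Thread.Sleep(10);                 // the pause suggested above
        }
        downloadProgress.Report(counter);     // make sure the final count is reported
    });
}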
I have a Monte Carlo simulation running across multiple threads, with a progress bar to inform the user how it's going. The progress bar management is done in a separate thread using Invoke, but the Form is not updating.
Here is my code:
Thread reportingThread = new Thread(() => UpdateProgress(iSims, ref myBag));
reportingThread.Priority = ThreadPriority.AboveNormal;
reportingThread.Start();
and here is the function being called:
private void UpdateProgress(int iSims, ref ConcurrentBag<simResult> myBag)
{
    int iCount;
    string sText;
    if (myBag == null)
        iCount = 0;
    else
        iCount = myBag.Count;
    while (iCount < iSims)
    {
        if (this.Msg.InvokeRequired)
        {
            sText = iCount.ToString() + " simulations of " + iSims + " completed.";
            this.Msg.BeginInvoke((MethodInvoker) delegate() { this.Msg.Text = sText; this.Refresh(); });
        }
        Thread.Sleep(1000);
        iCount = myBag.Count;
    }
}
I have used both Application.DoEvents() and this.Refresh() to try to force the form to update, but nothing happens.
UPDATE: Here is the procedure calling the above function
private void ProcessLeases(Boolean bValuePremium)
{
    int iSims, iNumMonths, iNumYears, iIndex, iNumCores, iSimRef;
    int iNumSimsPerThread, iThread, iAssets, iPriorityLevel;
    string sMsg;
    DateTime dtStart, dtEnd;
    TimeSpan span;
    var threads = new List<Thread>();
    ConcurrentBag<simResult> myBag = new ConcurrentBag<simResult>();
    ConcurrentBag<summaryResult> summBag = new ConcurrentBag<summaryResult>();
    this.Msg.Text = "Updating all settings";
    Application.DoEvents();
    ShowProgressPanel();
    iSims = objSettings.getSimulations();
    iNumCores = Environment.ProcessorCount;
    this.Msg.Text = "Initialising model";
    Application.DoEvents();
    iNumSimsPerThread = Convert.ToInt16(Math.Round(Convert.ToDouble(iSims) / Convert.ToDouble(iNumCores), 0));
    this.Msg.Text = "Spawning " + iNumCores.ToString() + " threads";
    for (iThread = 0; iThread < iNumCores; iThread++)
    {
        int iStart, iEnd;
        if (iThread == 0)
        {
            iStart = (iThread * iNumSimsPerThread) + 1;
            iEnd = ((iThread + 1) * iNumSimsPerThread);
        }
        else
        {
            if (iThread < (iNumCores - 1))
            {
                iStart = (iThread * iNumSimsPerThread) + 1;
                iEnd = ((iThread + 1) * iNumSimsPerThread);
            }
            else
            {
                iStart = (iThread * iNumSimsPerThread) + 1;
                iEnd = iSims;
            }
        }
        Thread thread = new Thread(() => ProcessParallelMonteCarloTasks(iStart, iEnd, iNumMonths, iSimRef, iSims, ref objDB, iIndex, ref objSettings, ref myBag, ref summBag));
        switch (iPriorityLevel)
        {
            case 1: thread.Priority = ThreadPriority.Highest; break;
            case 2: thread.Priority = ThreadPriority.AboveNormal; break;
            default: thread.Priority = ThreadPriority.Normal; break;
        }
        thread.Start();
        threads.Add(thread);
    }
    // Now start the thread to aggregate the MC results
    Thread MCThread = new Thread(() => objPortfolio.MCAggregateThread(ref summBag, (iSims * iAssets), iNumMonths));
    MCThread.Priority = ThreadPriority.AboveNormal;
    MCThread.Start();
    threads.Add(MCThread);
    // Here we review the CollectionBag size to report progress to the user
    Thread reportingThread = new Thread(() => UpdateProgress(iSims, ref myBag));
    reportingThread.Priority = ThreadPriority.AboveNormal;
    reportingThread.Start();
    // Wait for all threads to complete
    //this.Msg.Text = iNumCores.ToString() + " Threads running.";
    foreach (var thread in threads)
        thread.Join();
    reportingThread.Abort();
    this.Msg.Text = "Aggregating results";
    Application.DoEvents();
    this.Msg.Text = "Preparing Results";
    Application.DoEvents();
    ShowResults();
    ShowResultsPanel();
}
As you can see, there are a number of updates to the Form before my Invoked call and they all work fine; in each case I am using Application.DoEvents() to force the update.
myBag is a ConcurrentBag into which each Monte Carlo thread dumps its results. By checking its Count, I can see how many simulations have completed and update the user.
foreach (var thread in threads)
    thread.Join();
This is your problem. You are blocking here so nothing will ever update in the UI thread until all your threads complete.
This is a critical point - .DoEvents() happens naturally and all by itself every time a block of code you have attached to a user interface event handler completes executing. One of your primary responsibilities as a developer is to make sure that any code executing in a user interface event handler completes in a timely manner (a few hundred milliseconds, maximum). If you write your code this way then there is never, ever, a need to call DoEvents()... ever.
You should always write your code this way.
Aside from performance benefits, a major plus of using threads is that they are asynchronous by nature - to take advantage of this you have to write your code accordingly. Breaking out of procedural habits is a hard one. What you need to do is to forget the .Join altogether and get out of your ProcessLeases method here - let the UI have control again.
You are dealing with updates in your threads already so all you need is completion notification to let you pick up in a new handler when all of your threads finish their work. You'll need to keep track of your threads - have them each notify on completion (ie: invoke some delegate back on the UI thread, etc) and in whatever method handles it you would do something like
if (allThreadsAreFinished) // <-- You'll need to implement something here
{
    reportingThread.Abort();
    this.Msg.Text = "Preparing Results";
    ShowResults();
    ShowResultsPanel();
}
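One rough way to implement that check (a sketch, not the only option; the field and method names are made up, and it assumes reportingThread has been promoted to a field) is an Interlocked counter that each worker decrements as its last action, with the last one to finish invoking back onto the UI thread:
// Sketch: set _remainingWorkers to the number of threads before starting them,
// and have each worker call WorkerFinished() just before it exits.
private int _remainingWorkers;

private void WorkerFinished()
{
    if (Interlocked.Decrement(ref _remainingWorkers) == 0)
    {
        // Last worker done: hop back onto the UI thread to finish up.
        this.BeginInvoke((MethodInvoker)delegate
        {
            reportingThread.Abort();    // or better, signal it to exit its loop
            this.Msg.Text = "Preparing Results";
            ShowResults();
            ShowResultsPanel();
        });
    }
}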
Alternatively, you could simply call ProcessLeases on a background thread (making sure to correctly Invoke all of the UI calls within it), and then it wouldn't matter that you block that thread with a .Join. You could also then do away with all of the messy calls to .DoEvents().
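A minimal sketch of that alternative (the handler name and the argument passed to ProcessLeases are just placeholders):
// Sketch: run the whole ProcessLeases flow off the UI thread. Every UI access inside
// ProcessLeases must then go through Invoke/BeginInvoke, and the Application.DoEvents()
// calls can simply be deleted.
private void btnRun_Click(object sender, EventArgs e)
{
    var worker = new Thread(() => ProcessLeases(false));
    worker.IsBackground = true;
    worker.Start();
}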
Additionally, you don't need the call to this.Refresh(); here:
this.Msg.BeginInvoke((MethodInvoker) delegate() {
    this.Msg.Text = sText;
    this.Refresh();
});
If you aren't blocking the UI thread the control will update just fine without it and you'll only add extra work for nothing. If you are blocking the UI thread then adding the .Refresh() call won't help because the UI thread won't be free to execute it any more than it will be free to execute the previous line. This is programming chaotically - randomly adding code hoping that it will work instead of examining and understanding the reasons why it doesn't.
Chapter 2: The Workplace Analogy.
Imagine the UI thread is like the manager. The manager can delegate a task in several ways. Using .Join as you've done it is a bit like the manager giving everyone a job to do - Joe gets one job, Lucy gets another, Bill gets a third, and Sara gets a fourth. The manager has follow-up work to do once everyone is done so he comes up with a plan to get it done as soon as possible.
Immediately after giving everyone their task, the manager goes and sits at Joe's desk and stares at him, doing nothing, until Joe is done. When Joe finishes, he moves to Lucy's desk to check if she is done. If she isn't he waits there until Lucy finishes, then moves to Bill's desk and stares at him until he is done... then moves to Sara's desk.
Clearly this isn't productive. Furthermore, each of the four team members has been sending email status updates (Manager.BeginInvoke -> read your email!) to their manager, but he hasn't read any of them because he has been spending all of his time sitting at their desks, staring at them, waiting for them to finish their tasks. He hasn't done anything else, for that matter, either. The bosses have been asking what's going on, his phone's been ringing, nobody has updated the weekly financials - nothing. The manager hasn't been able to do anything else because he decided that he needed to sit on his bottom and watch his team work until they finished.
The manager isn't responding... The manager may respond again if you wait. Do you want to fire the manager?
[YES - FIRE HIM] [NO - Keep Waiting]
Better, one would think, if the manager simply set everyone off to work on stuff and then got on with doing other things. All he cares about is when they finish working so all it takes is one more instruction for them to notify him when their work is complete. The UI thread is like your application's manager - its time is precious and you should use as little of it as absolutely necessary. If you have work to do, delegate to a worker thread and don't have the manager sit around waiting for others to finish work - have them notify when things are ready and let the manager go back to work.
Well, the code is very partial, but at a glance, if
this.Msg.InvokeRequired == false
the following code doesn't get executed. Can that be the issue?
I have developed an application in C#. The class structure is as follows.
Form1 => The UI form. Has a BackgroundWorker, a progress bar, and an "OK" button.
SourceReader, TimedWebClient, HttpWorker, ReportWriter // classes that do some work
Controller => Has overall control. When the "OK" button is clicked, an instance of this class called "cntrlr" is created. This cntrlr is a global variable in Form1.cs.
(In the constructor of the Controller I create the SourceReader, TimedWebClient, HttpWorker and ReportWriter instances.)
Then I call RunWorkerAsync() on the background worker.
The code within it is as follows.
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    int iterator = 1;
    for (iterator = 1; iterator <= this.urlList.Count; iterator++)
    {
        cntrlr.Vmain(iterator - 1);
        backgroundWorker1.ReportProgress(iterator);
    }
}
At the moment ReportProgress updates the progress bar.
The urlList mentioned above holds 1000 URLs, and cntrlr.Vmain(int i) currently handles the whole process. I want to hand the work to several threads, each one processing 100 URLs. Access to the other instances and their methods doesn't need to be restricted, but access to ReportWriter should be limited to only one thread at a time. I can't find a way to do this. If anyone has an idea or an answer, please explain.
If you do want to restrict multiple threads from using the same method concurrently, then I would use the Semaphore class to enforce the required thread limit; here's how...
A semaphore is like a mean night club bouncer: it has been given a club capacity and is not allowed to exceed this limit. Once the club is full, no one else can enter... a queue builds up outside. Then, as one person leaves, another can enter (analogy thanks to J. Albahari).
A Semaphore with a value of one is equivalent to a Mutex or Lock except that the Semaphore has no owner so that it is thread ignorant. Any thread can call Release on a Semaphore whereas with a Mutex/Lock only the thread that obtained the Mutex/Lock can release it.
Now, for your case we are able to use Semaphores to limit concurrency and prevent too many threads from executing a particular piece of code at once. In the following example five threads try to enter a night club that only allows entry to three...
class BadAssClub
{
    static SemaphoreSlim sem = new SemaphoreSlim(3);

    static void Main()
    {
        for (int i = 1; i <= 5; i++)
            new Thread(Enter).Start(i);
    }

    // Enforce only three threads running this method at once.
    static void Enter(object id)
    {
        try
        {
            Console.WriteLine(id + " wants to enter.");
            sem.Wait();
            Console.WriteLine(id + " is in!");
            Thread.Sleep(1000 * (int)id);   // simulate some work
            Console.WriteLine(id + " is leaving...");
        }
        finally
        {
            sem.Release();
        }
    }
}
Note that SemaphoreSlim is a lighter-weight version of the Semaphore class and incurs about a quarter of the overhead. It is sufficient for what you require.
I hope this helps.
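Applied to your case, a minimal sketch would wrap every call into ReportWriter with a SemaphoreSlim initialized to 1 (the WriteReport method name below is assumed, since the ReportWriter API isn't shown):
// Sketch: allow only one thread at a time into the ReportWriter.
static SemaphoreSlim reportGate = new SemaphoreSlim(1, 1);

void WriteSafely(ReportWriter writer, string line)
{
    reportGate.Wait();
    try
    {
        writer.WriteReport(line);   // hypothetical method; use whatever ReportWriter actually exposes
    }
    finally
    {
        reportGate.Release();
    }
}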
I think I would have used the ThreadPool instead of the BackgroundWorker, and given each thread 1 URL, not 100, to process. The thread pool limits the number of threads it starts at once, so you won't have to worry about firing 1000 requests at once. Have a look here for a good example:
http://msdn.microsoft.com/en-us/library/3dasc8as.aspx
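A rough sketch of that idea, with one work item per URL and a countdown so you know when everything has finished (ProcessUrl is a hypothetical per-URL version of cntrlr.Vmain):
// Sketch: queue one work item per URL and wait for them all with a CountdownEvent.
var done = new CountdownEvent(urlList.Count);
foreach (string url in urlList)
{
    ThreadPool.QueueUserWorkItem(state =>
    {
        try
        {
            ProcessUrl((string)state);   // hypothetical; whatever one URL's worth of work is
        }
        finally
        {
            done.Signal();
        }
    }, url);
}
done.Wait();   // don't block the UI thread with this; wait from the BackgroundWorker's DoWork instead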
Feeling a little more adventurous? Consider using TPL Dataflow to download a bunch of URLs:
var urls = new[]
{
    "http://www.google.com",
    "http://www.microsoft.com",
    "http://www.apple.com",
    "http://www.stackoverflow.com"
};

var tb = new TransformBlock<string, string>(async url =>
{
    using (var wc = new WebClient())
    {
        var data = await wc.DownloadStringTaskAsync(url);
        Console.WriteLine("Downloaded : {0}", url);
        return data;
    }
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

var ab = new ActionBlock<string>(data =>
{
    // process your data
    Console.WriteLine("data length = {0}", data.Length);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });

tb.LinkTo(ab); // join output of producer to consumer block

foreach (var u in urls)
{
    tb.Post(u);
}
tb.Complete();
Note how you can control the parallelism of each block explicitly, so you can gather in parallel but process without going concurrent (for example).
Just grab it from NuGet. Easy.
I have an application I've already started working on, and it seems I need to rethink things a bit. It is a WinForms application at the moment. I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread. What I have done is loop through the thread-count variable and create the threads accordingly. I am not performing any locking (and not sure whether I need to) on the threads. I am new to threading and am running into possible issues with multiple cores. I need some advice on how to make this perform better.
Before a thread is created, some records are pulled from my database to be processed. That list object is passed to the thread and looped through. Once it reaches the end of the loop, the thread calls the data functions to pull new records, replacing the old ones in the list. This keeps going until there are no more records. Here is my code:
private void CreateThreads()
{
    _startTime = DateTime.Now;
    var totalThreads = 0;
    var totalRecords = 0;
    progressThreadsCreated.Maximum = _threadCount;
    progressThreadsCreated.Step = 1;
    LabelThreadsCreated.Text = "0 / " + _threadCount.ToString();
    this.Update();
    for (var i = 1; i <= _threadCount; i++)
    {
        LabelThreadsCreated.Text = i + " / " + _threadCount;
        progressThreadsCreated.Value = i;
        var adapter = new Dystopia.DataAdapter();
        var records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
        if (records != null && records.Count > 0)
        {
            totalThreads += 1;
            LabelTotalProcesses.Text = "Total Processes Created: " + totalThreads.ToString();
            var paramss = new ArrayList { i, records };
            var thread = new Thread(new ParameterizedThreadStart(ThreadWorker));
            thread.Start(paramss);
        }
        this.Update();
    }
}
private void ThreadWorker(object paramList)
{
    try
    {
        var parms = (ArrayList) paramList;
        var stopThread = false;
        var threadCount = (int) parms[0];
        var records = (List<Candidates>) parms[1];
        var runOnce = false;
        var adapter = new Dystopia.DataAdapter();
        var lastCount = records.Count;
        var runningCount = 0;
        while (_stopThreads == false)
        {
            if (!runOnce)
            {
                CreateProgressArea(threadCount, records.Count);
            }
            else
            {
                ResetProgressBarMethod(threadCount, records.Count);
            }
            runOnce = true;
            var counter = 0;
            if (records.Count > 0)
            {
                foreach (var record in records)
                {
                    counter += 1;
                    runningCount += 1;
                    _totalRecords += 1;
                    var rec = record;
                    var proc = new ProcRecords();
                    proc.Validate(ref rec);
                    adapter.Update(rec);
                    UpdateProgressBarMethod(threadCount, counter, records.Count, runningCount);
                    if (_stopThreads)
                    {
                        break;
                    }
                }
                UpdateProgressBarMethod(threadCount, -1, lastCount, runningCount);
                if (!_noRecordsInPool)
                {
                    records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
                    if (records == null || records.Count <= 0)
                    {
                        _noRecordsInPool = true;
                        break;
                    }
                    else
                    {
                        lastCount = records.Count;
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
Something simple you could do that would improve performance is to use the ThreadPool to manage your thread creation. This lets the OS allocate a group of threads, paying the thread-creation penalty once instead of multiple times.
If you decide to move to .NET 4.0, Tasks would be another way to go.
I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread.
This isn't something you really want to expose to the user. What are they supposed to put? How can they determine what's best? This is an implementation detail best left to you, or even better, the CLR or another library.
I am not performing any locking (and not sure whether I need to) on the threads.
The majority of issues you'll have with multithreading will come from shared state. Specifically, in your ThreadWorker method, it looks like you refer to the following shared data: _stopThreads, _totalRecords, _noRecordsInPool, _recordsPerThread, _validationId, and _validationDateTime.
Just because these data are shared, however, doesn't mean you'll have issues. It all depends on who reads and writes them. For example, I think _recordsPerThread is only written once initially, and then read by all threads, which is fine. _totalRecords, however, is both read and written by each thread. You can run into threading issues here since _totalRecords += 1; consists of a non-atomic read-then-write. In other words, you could have two threads read the value of _totalRecords (say they both read the value 5), then increment their copy and then write it back. They'll both write back the value 6, which is now incorrect since it should be 7. This is a classic race condition. For this particular case, you could use Interlocked.Increment to atomically update the field.
In general, to do synchronization between threads in C#, you can use the classes in the System.Threading namespace, e.g. Mutex, Semaphore, and probably the most common, Monitor (equivalent to lock) which allows only one thread to execute a specific portion of code at a time. The mechanism you use to synchronize depends entirely on your performance requirements. For example, if you throw a lock around the body of your ThreadWorker, you'll destroy any performance gains you got through multithreading by effectively serializing the work. Safe, but slow :( On the other hand, if you use Interlocked.Increment and judiciously add other synchronization where necessary, you'll maintain your performance and your app will be correct :)
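As a small sketch of both options mentioned above (_totalRecords is the field from your ThreadWorker; the helper method names and lock object are placeholders):
// Sketch: two ways to make the shared counter safe.
private int _totalRecords;
private readonly object _statsLock = new object();

private void CountRecordAtomic()
{
    // Option 1: lock-free atomic increment, replacing the racy "_totalRecords += 1;".
    Interlocked.Increment(ref _totalRecords);
}

private void CountRecordLocked()
{
    // Option 2: a Monitor/lock, useful when several fields must change together.
    lock (_statsLock)
    {
        _totalRecords += 1;
    }
}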
Once you've gotten your worker method to be thread-safe, you should use some other mechanism to manage your threads. ThreadPool was mentioned, and you could also use the Task Parallel Library, which abstracts over the ThreadPool and smartly determines and scales how many threads to use. This way, you take the burden off of the user to determine what magic number of threads they should run.
The obvious response is to question why you want threads in the first place. Where are the analysis and benchmarks that show that using threads will be an advantage?
How are you ensuring that non-GUI threads do not interact with the GUI? How are you ensuring that no two threads interact with the same variables or data structures in an unsafe way? Even if you realise you do need locking, how are you ensuring that the locks don't result in each thread processing its workload serially, removing any advantage that multiple threads might have provided?