I have a problem stopping a Parallel.ForEach loop.
I am iterating over a set of about 40,000 DataRows retrieved from a table, and I need to stop my loop immediately when I have 100 items in my result set. The problem is that when I call the Stop method on the ParallelLoopState, the iteration is not stopped immediately, which leaves my result set inconsistent (either too few or too many items).
Is there no way to make sure that all threads are killed as soon as I call Stop?
List<DataRow> rows = new List<DataRow>(dataTable.Select());
ConcurrentDictionary<string, object> resultSet = new ConcurrentDictionary<string, object>();
rows.EachParallel(delegate (DataRow row, ParallelLoopState state)
{
if (!state.IsStopped)
{
using (SqlConnection sqlConnection = new SqlConnection(Global.ConnStr))
{
sqlConnection.Open();
//{
// Do some processing.......
//}
var sourceKey = "key retrieved from processing";
if (!resultSet.ContainsKey(sourceKey))
{
object myCustomObj = new object();
resultSet.AddOrUpdate(
sourceKey,
myCustomObj,
(key, oldValue) => myCustomObj);
}
if (resultSet.Values.Count == 100)
state.Stop();
}
}
});
The documentation page of ParallelLoopState.Stop explains that calling Stop() will prevent new iterations from starting. It won't abort any existing iterations.
Stop() also sets the IsStopped property to true. Long running iterations can check the value of IsStopped and exit prematurely if required.
This is called cooperative cancellation, which is far better than aborting threads. Aborting a thread is expensive and makes cleanup difficult. Imagine what would happen if a ThreadAbortException was thrown just when you wanted to commit your work.
Cooperative cancellation, on the other hand, allows a task to exit gracefully after committing or aborting transactions as necessary, closing connections, cleaning up other state and files, etc.
Furthermore, Parallel uses tasks, not dedicated threads, to process chunks of data. One of the threads that runs those tasks is the original thread that started the parallel operation. Aborting wouldn't just waste thread-pool threads, it would also kill the calling thread.
This is not a bug either - Parallel is meant to solve data parallelism problems, not asynchronous execution. In this scenario, one wants the system to use as many tasks as appropriate to process the data and continue once that processing is complete.
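If the goal is a result set of exactly 100 items, one option is to make the "check the count, add, maybe stop" step atomic, so that iterations still running after Stop() cannot push the count past the limit. The following is only a sketch under assumptions: StopDemo, Collect, gate and the "key" + item stand-in for the real processing are hypothetical names, and it assumes the source yields at least as many distinct keys as the limit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class StopDemo
{
    // Collects at most `limit` distinct keys, then asks the loop to stop.
    public static ConcurrentDictionary<string, object> Collect(int[] source, int limit)
    {
        var resultSet = new ConcurrentDictionary<string, object>();
        var gate = new object();

        Parallel.ForEach(source, (item, state) =>
        {
            // Cheap early exit for iterations that begin after Stop() was requested.
            if (state.IsStopped) return;

            string key = "key" + item; // hypothetical stand-in for the real processing

            lock (gate)
            {
                // Check and add under one lock so the count can never pass the limit.
                if (resultSet.Count < limit)
                    resultSet.TryAdd(key, new object());

                // Stop() prevents new iterations; iterations already running still finish.
                if (resultSet.Count >= limit)
                    state.Stop();
            }
        });

        return resultSet;
    }

    public static void Main()
    {
        var result = Collect(Enumerable.Range(0, 40000).ToArray(), 100);
        Console.WriteLine(result.Count);
    }
}
```

Iterations that were already in flight when Stop() was called still run to completion, but the guarded check means they no longer add anything once the limit is reached.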
Related
Let's say I have a .NET Core 2.0/2.1 program.
There is a thread executing the following method. I want to stop it forcefully.
Important notes:
Cooperative cancellation (for example, with CancellationToken) is a good thing, but it is not an option here
The XY problem (https://en.wikipedia.org/wiki/XY_problem) does exist, but I just want to know whether stopping this thread is actually possible
while (true)
{
var i = 0;
try
{
Console.WriteLine($"Still alive {i++}");
}
catch (Exception e)
{
Console.WriteLine($"Caught {e.GetType().Name}");
}
}
Tried several options:
Thread.Abort - throws PlatformNotSupportedException, not an option
Thread.Interrupt - only works for threads in WaitSleepJoin state, which is not the case
Calling native API methods such as TerminateThread from kernel32.dll on Windows. This approach has a lot of problems like non-released locks (https://msdn.microsoft.com/en-us/library/windows/desktop/ms686717(v=vs.85).aspx)
Concerns, from most important to least:
Releasing locks
Disposing objects in using statements
Actually collecting allocated objects
(as a corner case we can assume that our thread does not perform any heap allocations at all)
Use a ManualResetEventSlim. The instance will need to be available to both the thread you are trying to stop and the thread which will cause the stop.
In your while(true) loop, do something like this:
var shouldTerminate = mres.Wait(100);
if (shouldTerminate) { break; }
What this does is wait until the ManualResetEvent is put into a Set state, or 100ms, whichever comes first. The value returned indicates if the event is Set or Unset. You'll start off with the MRE in an Unset state, and when the control thread wishes to terminate the worker thread, it will call the Set method, and then it can Join the worker thread to wait for it to finish. This is important as in your loop you could perhaps be waiting on a network call to finish, and the worker won't actually terminate until you are back at the top of the loop again. If you need to, you could check the MRE with Wait at multiple points in the worker thread to prevent further expensive operations from continuing.
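Put together, the pattern described above might look like the following sketch. The names MresDemo, RunWorker and runForMs are made up for illustration, and the 10 ms wait is an arbitrary choice.

```csharp
using System;
using System.Threading;

public static class MresDemo
{
    // Runs a worker loop for roughly runForMs milliseconds, then stops it cooperatively.
    // Returns how many loop iterations the worker completed.
    public static int RunWorker(int runForMs)
    {
        int iterations = 0;

        using (var mres = new ManualResetEventSlim(false)) // starts in the unset state
        {
            var worker = new Thread(() =>
            {
                while (true)
                {
                    iterations++; // stand-in for one unit of real work

                    // Wait returns true once Set() has been called, false on timeout.
                    if (mres.Wait(10)) break;
                }
            });

            worker.Start();
            Thread.Sleep(runForMs); // the control thread does something else meanwhile
            mres.Set();             // ask the worker to terminate
            worker.Join();          // wait until it has actually finished
        }

        return iterations;
    }

    public static void Main()
    {
        Console.WriteLine("Iterations: " + RunWorker(100));
    }
}
```

The Join call matters: it guarantees the worker has left its loop before the event is disposed and before the result is read.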
I understand a Barrier can be used to have several tasks synchronise their completion before a second phase runs.
I would like to have several tasks synchronise multiple steps like so:
state is 1;
Task1 runs and pauses waiting for state to become 2;
Task2 runs and pauses waiting for state to become 2;
Task2 is final Task and causes the state to progress to state 2;
Task1 runs and pauses waiting for state to become 3;
Task2 runs and pauses waiting for state to become 3;
Task2 is final Task and causes the state to progress to state 3;
state 3 is final state and so all tasks exit.
I know I can spin up new tasks at the end of each state, but since each task does not take too long, I want to avoid creating new tasks for each step.
I can run the above synchronously using for loops, but final state can be 100,000, and so I would like to make use of more than one thread to run the process faster as the process is CPU bound.
I have tried using a counter to keep track of the number of completed Tasks that is incremented by each Task on completion. If the Task is the final Task to complete then it will change the state to the next state. All completed Tasks then wait using while (iterationState == state) await Task.Yield but the performance is terrible and it seems to me a very crude way of doing it.
What is the most efficient way to get the above done? There must be an optimised tool to get this done?
I'm using Parallel.For, creating 300 tasks, and each task needs to run through up to 100,000 states. Each task running through one state completes in less than a second, and creating 300 * 100,000 tasks is a huge overhead that makes running the whole thing synchronously much faster, even if using a single thread.
So I'd like to create 300 Tasks and have these Tasks synchronise moving through the 100,000 states. Hopefully the overhead of creating only 300 tasks instead of 300 * 100,000 tasks, with the overhead of optimised synchronisation between the tasks, will run faster than when doing it synchronously on a single thread.
Each state must complete fully before the next state can be run.
So - what's the optimal synchronisation technique for this scenario? Thanks!
while (iterationState == state) await Task.Yield is indeed a terrible solution to synchronize across your 300 tasks (and no, 300 isn't necessarily super-expensive: you'll only get a reasonable number of threads allocated).
The key problem here isn't the Parallel.For, it's synchronizing across 300 tasks so that they wait efficiently until each of them has completed a given phase.
The simplest and cleanest solution here would probably be to have a for loop over the stages and a parallel.for over the bit you want parallelized:
for (int stage = 0; stage < 100000; stage++)
{
// the next line blocks until all 300 have completed
// will use thread pool threads as necessary
Parallel.For( ... 300 items to process this stage ... );
}
No extra synchronization primitives needed, no spin-waiting consuming CPU, no needless thrashing between threads trying to see if they are ready to progress.
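As a concrete, scaled-down sketch of that shape: the outer loop walks the stages in order and each Parallel.For call returns only after every item has finished the current stage. StageDemo, Run and the values array are hypothetical, and the per-item work is a trivial addition standing in for the real computation.

```csharp
using System;
using System.Threading.Tasks;

public static class StageDemo
{
    // Each of the taskCount items accumulates the stage number at every stage.
    public static long[] Run(int taskCount, int stages)
    {
        var values = new long[taskCount];

        for (int stage = 1; stage <= stages; stage++)
        {
            int current = stage; // copy for the lambda

            // Blocks until all items have completed this stage,
            // so stage N+1 never starts before stage N is done.
            Parallel.For(0, taskCount, i =>
            {
                values[i] += current; // each index is touched by exactly one iteration
            });
        }

        return values;
    }

    public static void Main()
    {
        long[] result = Run(300, 100);
        Console.WriteLine(result[0]);
    }
}
```

Whether this beats a single-threaded loop depends on how much work each item does per stage; with very small work items the per-stage scheduling overhead can dominate, so it is worth measuring.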
I think I am understanding what you are trying to do, so here is a suggested way to handle it. Note - I am using Action as the type for the blocking collection, but you can change it to whatever would work best in your scenario.
// Shared variables
CountdownEvent workItemsCompleted = new CountdownEvent(300);
BlockingCollection<Action> workItems = new BlockingCollection<Action>();
CancellationTokenSource cancelSource = new CancellationTokenSource();
// Work Item Queue Thread
// token should be passed in from cancelSource.Token
for(int i=1; i < stages; ++i)
{
// Expect 300 signals before this stage counts as complete
workItemsCompleted.Reset(300);
for(int j=0; j < workItemsForStage[i].Count; ++j)
{
workItems.Add(() => {}); // Add your work item here
}
workItemsCompleted.Wait(token); // Blocks until every work item of this stage has signalled
}
// Worker threads that are making use of the queue
// token should be passed to the threads from cancelSource.Token
while(!token.IsCancellationRequested)
{
var item = workItems.Take(token); // Blocks until an item is available or the token is cancelled
item();
workItemsCompleted.Signal();
}
You can use cancelSource from your main thread to cancel the running operations if you need to. In your worker threads you would then need to handle the OperationCanceledException. With this setup you can launch as many worker threads as you need and easily benchmark where you get your best performance (maybe it is with only 10 worker threads, etc). Just launch as many workers as you want and then queue up the work items in the work item queue thread. It's basically a producer-consumer model, except that the producer queues up one phase of the work, blocks until that phase is done, and then queues up the next round of work.
I'm trying to write to Azure Table Storage asynchronously using BeginExecute but have been getting inconsistent results. When I change BeginExecute to Execute, then everything gets written properly, but I'm guessing I have something wrong in my threads that they are either cancelling each other or something depending on how fast the main thread sends the messages. Here's what I'm doing now:
TableOperation op = TableOperation.Insert(entity);
_table.BeginExecute(op, new AsyncCallback(onTableExecuteComplete), entity);
private void onTableExecuteComplete(IAsyncResult result)
{
TableResult tr = _table.EndExecute(result);
if ((tr.HttpStatusCode < 200) || (tr.HttpStatusCode > 202))
{
Console.WriteLine("Error writing to table.");
}
}
I'm testing it with a few entries, and I'll get one or two entries in the table, but not all of them. Any ideas on how to catch errors and make sure that all the entries are written properly?
Update: I found that when I put Thread.Sleep(5000); at the end of my main thread, everything finishes writing. Is there a way to pause the main thread before it ends to ensure all other threads have finished so they don't get cancelled before they're done?
Most likely your main thread ends and the process exits, which terminates any background threads (including the thread-pool threads that complete the asynchronous calls) before they finish. When you are doing asynchronous programming, your main thread either needs to run long enough for the work to complete (as in a service), or it needs to wait for the asynchronous tasks to finish:
var result = _table.BeginExecute(op,
new AsyncCallback(onTableExecuteComplete), entity);
result.AsyncWaitHandle.WaitOne();
Source: http://msdn.microsoft.com/en-us/library/system.iasyncresult.aspx
This of course raises the question: if you don't need to do anything else while you are waiting for the "asynchronous" task to complete, then you might as well do it synchronously to keep things simpler. The purpose of the asynchronous pattern is to serve threads that shouldn't be blocked while waiting for some other process to finish, at a cost of increased complexity.
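If several writes are started before waiting, one way to hold the main thread until all callbacks have run is a CountdownEvent that each callback signals. This is a sketch of that idea, not the Azure API: ThreadPool.QueueUserWorkItem stands in for BeginExecute and its callback, and PendingWritesDemo, WriteAll and completed are hypothetical names.

```csharp
using System;
using System.Threading;

public static class PendingWritesDemo
{
    // Starts `count` simulated asynchronous writes and blocks until all have completed.
    // Returns the number of completed callbacks.
    public static int WriteAll(int count)
    {
        int completed = 0;

        using (var pending = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                // Stand-in for _table.BeginExecute(op, callback, entity).
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        // Stand-in for EndExecute and the status check in the callback.
                        Interlocked.Increment(ref completed);
                    }
                    finally
                    {
                        pending.Signal(); // always signal, even if the callback throws
                    }
                });
            }

            pending.Wait(); // main thread waits here instead of sleeping for a fixed time
        }

        return completed;
    }

    public static void Main()
    {
        Console.WriteLine(WriteAll(5));
    }
}
```

Compared with Thread.Sleep(5000), this waits exactly as long as the outstanding operations need and no longer.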
I have a thread, call it the "Parsing thread".
Thread parsingThread = new Thread(myMethod);
I perform some computations on this thread, of which the last involves more parallel computations.
public void ReadCityFiles(BlockingCollection<GeonamesFileInfo> files)
{
Parallel.ForEach<GeonamesFileInfo>(
files.GetConsumingPartitioner<GeonamesFileInfo>(),
new ParallelOptions { MaxDegreeOfParallelism = _maxParallelism },
(inputFile, args) =>
{
RaiseFileParsing(inputFile);
using (var input = new System.IO.StreamReader(inputFile.FullName))
{
while (!input.EndOfStream)
{
RaiseEntryParsed(ParseCity(input.ReadLine()));
Interlocked.Increment(ref _parsedEntries);
}
}
RaiseFileParsed(inputFile);
});
RaiseDirectoryParsed(Directory);
}
The problem is that when these very long and computationally expensive async foreach operations finish (~30 mins), the "Parsing Thread" doesn't resume. My GUI is still responsive, but the RaiseDirectoryParsed function that is supposed to continue to run on the "Parsing Thread" is never called. I debugged the program up to this point, and am pretty baffled as to what to do in this situation.
The point of BlockingCollection is that when an operation cannot be performed now, but might be in the future (e.g. Take() or Add() on a collection with bounded capacity), it will block. The same applies to GetConsumingEnumerable() and thus also to GetConsumingPartitioner(): if the collection is currently empty, the enumerable will block until you add more items to the collection.
But there is also a way to tell the collection that you're not going to add new items anymore and that it shouldn't block when empty from now on: the CompleteAdding() method. If you call this when you know you won't be adding any more new items to the collection, your Parallel.ForEach() won't block anymore and your thread will continue executing.
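A small sketch of the effect, assuming a producer that knows when it is done. It uses the built-in GetConsumingEnumerable() rather than GetConsumingPartitioner(), and CompleteAddingDemo, Consume and processed are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class CompleteAddingDemo
{
    // Produces itemCount items, consumes them in parallel, and returns how many were processed.
    public static int Consume(int itemCount)
    {
        int processed = 0;

        using (var files = new BlockingCollection<int>())
        {
            var producer = Task.Run(() =>
            {
                for (int i = 0; i < itemCount; i++)
                    files.Add(i);

                // Without this call the consuming loop below would block forever
                // once the collection runs empty.
                files.CompleteAdding();
            });

            // Returns only after the collection is both marked complete and drained.
            Parallel.ForEach(files.GetConsumingEnumerable(), item =>
            {
                Interlocked.Increment(ref processed);
            });

            producer.Wait();
        }

        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(Consume(1000));
    }
}
```

Code placed after the Parallel.ForEach call, such as the RaiseDirectoryParsed call in the question, runs only once the producer has called CompleteAdding() and the remaining items have been consumed.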
I have a loop that creates a lot of threads:
foreach (DataGridViewRow dgvRow in dataGridView1.Rows)
{
Class h = new Class(dgvRow.Cells["name"].Value.ToString());
Thread trdmyClass = new Thread(h.SeeInfoAboutName);
trdmyClass.IsBackground = true;
trdmyClass.Start();
}
This works fine and creates the threads that I need, but I want to stop all of these threads at once (using Thread.Abort()), for example when I click a button.
How can I do this?
I wouldn't use Thread.Abort. It can have some very nasty consequences. What you should be doing is keeping track of the threads you create by putting them into a list. You can then use the ManualResetEvent class. The threads should periodically check whether the event has been set and, if it has, clean up and exit. I use the WaitOne method with a millisecond timeout and then check the return value, which lets the threads keep running in a loop. If true is returned, the signal is set and you can exit the loop or otherwise return from your thread. If you're using .NET 4, you can also use a CancellationToken.
http://msdn.microsoft.com/en-us/library/system.threading.manualresetevent.aspx
http://msdn.microsoft.com/en-us/library/system.threading.cancellationtoken.aspx
Read more about the issues with Thread.Abort here: http://msdn.microsoft.com/en-us/library/ty8d3wta.aspx
EDIT: I use a ManualResetEvent because it's thread safe and you can use it to synchronize the processing in the threads, for example if you're implementing a producer/consumer pattern. A volatile boolean could be used as well. I recommend keeping the threads in a list in case you need to wait for them to complete, so you can Join on each one. This may or may not be applicable to your problem though. It's usually a good idea, especially if you're exiting, to Join all your threads to allow them to finish any cleanup they may be doing.
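A rough sketch of this approach, with the threads tracked in a list, a shared ManualResetEvent polled via WaitOne, and a Join on each thread at the end. ThreadListDemo, StartAndStop and exited are made-up names, and the 10 ms timeout is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadListDemo
{
    // Starts threadCount workers, signals them all to stop, and waits for each to exit.
    // Returns the number of workers that exited cleanly.
    public static int StartAndStop(int threadCount)
    {
        int exited = 0;

        using (var stopSignal = new ManualResetEvent(false))
        {
            var threads = new List<Thread>();

            for (int i = 0; i < threadCount; i++)
            {
                var worker = new Thread(() =>
                {
                    // WaitOne returns false on timeout, true once the event is set.
                    while (!stopSignal.WaitOne(10))
                    {
                        // one unit of work would go here
                    }

                    // cleanup would go here
                    Interlocked.Increment(ref exited);
                });
                worker.IsBackground = true;
                threads.Add(worker);
                worker.Start();
            }

            Thread.Sleep(50);  // stand-in for "until the button is clicked"
            stopSignal.Set();  // one signal stops every worker

            foreach (var thread in threads)
                thread.Join(); // give each worker time to finish its cleanup
        }

        return exited;
    }

    public static void Main()
    {
        Console.WriteLine(StartAndStop(4));
    }
}
```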
You really shouldn't use Thread.Abort(), it can be very dangerous. Instead, you should provide some way to signal to the threads that they are canceled. Each thread would then periodically check whether it's canceled and end if it was.
One way to do this would be to use CancellationToken, which does exactly that. The framework methods that support cancellation work with this type too.
Your code could then look something like this:
// field to keep CancellationTokenSource:
CancellationTokenSource m_cts;
// in your method:
m_cts = new CancellationTokenSource();
foreach (DataGridViewRow dgvRow in dataGridView1.Rows)
{
Class h = new Class(dgvRow.Cells["name"].Value.ToString());
Thread trdmyClass = new Thread(() => h.SeeInfoAboutName(m_cts.Token));
trdmyClass.IsBackground = true;
trdmyClass.Start();
}
//somewhere else, where you want to cancel the threads:
m_cts.Cancel();
// the SeeInfoAboutName() method
public void SeeInfoAboutName(CancellationToken cancellationToken)
{
while (!cancellationToken.IsCancellationRequested)
{
// do some work
}
}
Keep all the threads in a List, and then loop through the list and stop them.