"ConcurrentBag(T) is a thread-safe bag implementation, optimized for
scenarios where the same thread will be both producing and consuming
data stored in the bag." - MSDN
I have this exact use case (multiple threads both consuming and producing), but I also need to determine, efficiently and in a timely manner, when the bag becomes permanently empty (my threads only produce based on what they have consumed, and the bag is jump-started with a single element before the threads are started).
I'm having trouble figuring out an efficient, race-condition-free way to do this that doesn't rely on a global lock. I believe that introducing a global lock would negate the benefits of using the mostly lock-free ConcurrentBag.
My actual use case is an "unordered" (binary) tree traversal. I just need to visit every node, and do some very light computation for each and every one of them. I don't care about the order in which they are visited. The algorithm should terminate when all nodes have been visited.
int taskCount = Environment.ProcessorCount;
Task[] tasks = new Task[taskCount];
var bag = new ConcurrentBag<TreeNode>();
bag.Add(root);
for (int i = 0; i < taskCount; i++)
{
int threadId = i;
tasks[threadId] = new Task(() =>
{
while(???) // Putting !bag.IsEmpty here would obviously be wrong, as some other thread could have removed the last node but not yet added that node's "children"
{
TreeNode node;
bool success = bag.TryTake(out node);
if (!success) continue; //This spinning is probably not very clever here, but I don't really mind it.
// Placeholder: Do stuff with node
if (node.Left != null) bag.Add(node.Left);
if (node.Right != null) bag.Add(node.Right);
}
});
tasks[threadId].Start();
}
Task.WaitAll(tasks);
So how could one add an efficient termination condition to this? I don't mind the condition becoming costly when the bag is close to being empty.
I had this problem before. I had threads register as being in a wait state before checking the queue. If the queue is empty and all other threads are waiting as well, we are done. If other threads are still busy, then (here comes the hack) sleep for 10 ms. I believe it is possible to solve this without the sleep by using some kind of synchronization (maybe a Barrier).
The code went like this:
string Dequeue()
{
Interlocked.Increment(ref threadCountWaiting);
try
{
while (true)
{
string result = queue.TryDequeue();
if (result != null)
return result;
if (cancellationToken.IsCancellationRequested || threadCountWaiting == pendingThreadCount)
{
Interlocked.Decrement(ref pendingThreadCount);
return null;
}
Thread.Sleep(10);
}
}
finally
{
Interlocked.Decrement(ref threadCountWaiting);
}
}
It might be possible to replace both the sleep and the counter maintenance with Barrier. I just did not bother, this was complicated enough already.
Interlocked operations can become scalability bottlenecks because they are implemented with hardware-level atomic instructions that effectively lock the cache line. So you might want to insert a fast path at the beginning of the method:
string result = queue.TryDequeue();
if (result != null)
return result;
And most of the time the fast path will be taken.
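An alternative sketch for the original ConcurrentBag code: instead of counting waiting threads, count outstanding work items. The counter below (pendingCount, a name introduced here) is incremented for every node added to the bag and decremented only after a node's children have been added, so it can only reach zero once the whole tree has been visited. Treat this as a starting point, not a tested implementation.
int pendingCount = 1; // the root is about to be added
var bag = new ConcurrentBag<TreeNode>();
bag.Add(root);

// Body of each worker task:
while (Volatile.Read(ref pendingCount) > 0)
{
    TreeNode node;
    if (!bag.TryTake(out node))
    {
        Thread.Yield(); // nothing available right now; another thread may still add children
        continue;
    }
    // Placeholder: Do stuff with node
    if (node.Left != null) { Interlocked.Increment(ref pendingCount); bag.Add(node.Left); }
    if (node.Right != null) { Interlocked.Increment(ref pendingCount); bag.Add(node.Right); }
    Interlocked.Decrement(ref pendingCount); // this node and its children are now accounted for
}
Once pendingCount reaches zero no thread can ever add another node, so every worker's while condition fails and Task.WaitAll returns.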
Related
I have a recursive function for a tree.
Can I loop over the child nodes with Parallel.ForEachAsync?
private async Task<List<ResponseBase<BatchRowData>>> SaveDepartments(DepartmentTree node,
string parentUnitGuid, List<ResponseBase<BatchRowData>> allResponses)
{
if (parentUnitGuid == null)
{
return allResponses;
}
await Parallel.ForEachAsync(node.children, async (child, cancellationToken) =>
{
ResponseBase<BatchRowData> response = new ResponseBase<BatchRowData>();
//...do something
Unit unit = new Unit();
unit.SerialNum = child.data.DepartmentNumber;
unit.UnitName = child.data.DepartmentName;
unit.ParentUnitGuid = parentUnitGuid;
string unitGuid = await DBGate.PostAsync<string>("organization/SaveUnit", unit);
if (unitGuid != null)
{
response.IsSuccess = true;
response.ResponseData.ReturnGuid = unitGuid;
await SaveDepartments(child, unitGuid, allResponses);
}
else
{
response.IsSuccess = false;
response.ResponseData.ErrorDescription = "Failed to Save";
}
allResponses.Add(response);
});
return allResponses;
}
It works, but I wonder whether the tree's level order is always preserved with Parallel.ForEachAsync, because in my tree it must be.
Or should I use a simple synchronous foreach?
My application is ASP.NET Core 6.0.
First things first, the List<T> collection is not thread-safe, so you should synchronize the threads that are trying to Add to the list:
lock (allResponses) allResponses.Add(response);
Otherwise the behavior of your program is undefined.
Now regarding the validity of calling the Parallel.ForEachAsync recursively, it doesn't endanger the correctness of your program, but it does present the challenge of how to control the degree of parallelism. In general when you parallelize an operation you want to be able to limit the degree of parallelism, because over-parallelizing can be harmful for both the client and the server. You don't want to exhaust the memory, the CPU or the network bandwidth of either end.
The ParallelOptions.MaxDegreeOfParallelism property limits the parallelism of a specific Parallel.ForEachAsync loop. It does not have a recursive effect, so you can't depend on it here.
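For reference, the per-loop limit is applied like this (a sketch reusing node.children from the question); a nested Parallel.ForEachAsync started inside the body gets no limit from these options:
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };
await Parallel.ForEachAsync(node.children, options, async (child, cancellationToken) =>
{
    // ... per-child work; any recursive loop in here has its own, separate limit ...
});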
The easiest way to solve this problem is to use a SemaphoreSlim as the throttler, and enclose the body of the Parallel.ForEachAsync loop in a try/finally block:
await semaphore.WaitAsync();
try
{
// The body
}
finally { semaphore.Release(); }
The semaphore should be initialized with the desired maximum parallelism for both of its arguments. For example, if the desired limit is 5 then:
SemaphoreSlim semaphore = new(5, 5);
Another way of solving this problem would be to have a single, non-recursive Parallel.ForEachAsync loop that is fed with the items of a Channel<(DepartmentTree, string)>, and inside the body of the loop you write the children of each department back into the channel. This should be more efficient than nesting multiple Parallel.ForEachAsync loops one inside the other, but it is also more complex to implement and more prone to programming errors (bugs), so I'll leave it out of this answer.
using System;
using System.Threading;
namespace Threading
{
class Program
{
static void Main(string[] args)
{
Semaphore even = new Semaphore(1, 1);
Semaphore odd = new Semaphore(1, 1);
Thread evenThread = new Thread(() =>
{
for (int i = 1; i <= 100; i++)
{
even.WaitOne();
if(i % 2 == 0)
{
Console.WriteLine(i);
}
odd.Release();
}
});
Thread oddThread = new Thread(() =>
{
for(int i = 1; i <=100; i++)
{
odd.WaitOne();
if(i%2 != 0)
{
Console.WriteLine(i);
}
even.Release();
}
});
oddThread.Start();
evenThread.Start();
}
}
}
So I have written this code where one thread is producing odd numbers and the other is producing even numbers.
Using semaphores I have made sure that they print the numbers in order, and it works perfectly.
But I have a special situation in mind: each thread waits until the other thread releases its semaphore. So can there be a condition where both threads are waiting, no thread is making any progress, and there is a deadlock?
For deadlock to occur, two or more threads must be trying to acquire two or more resources, but do so in different orders. See e.g. Deadlock and Would you explain lock ordering?.
Your code does not involve more than one lock per thread† and so does not have the ability to deadlock.
It does have the ability to throw an exception. As noted in this comment, it is theoretically possible for one of the threads to get far enough ahead of the other thread that it attempts to release a semaphore lock that hasn't already been taken. For example, if evenThread is pre-empted (or simply doesn't get scheduled to start running) before it gets to its first call to even.WaitOne(), but oddThread gets to run, then oddThread can acquire the odd semaphore, handle the if statement, and then try to call even.Release() before evenThread has had a chance to acquire that semaphore.
This will result in a SemaphoreFullException being thrown by the call to Release().
This would be a more likely possibility on a single-CPU system, something that is very hard to find these days. :) But it's still theoretically possible for any CPU configuration.
† Actually, there's an implicit lock in the Console.WriteLine() call, which is thread-safe by design. But from your code's point of view, that's an atomic operation. It's not possible for your code to acquire that lock and then wait on another. So it doesn't have any relevance to your specific question.
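One simple way to avoid the exception, as a sketch rather than a required fix: give the even semaphore zero initial permits, so the two threads strictly alternate and Release() is only ever called on a semaphore whose count is zero.
// 1 is odd, so the odd thread goes first and starts with the only permit.
Semaphore even = new Semaphore(0, 1);
Semaphore odd = new Semaphore(1, 1);
// The loop bodies stay exactly as in the question; each thread now has to wait
// for the other's Release() before every iteration, so neither can run ahead
// and release a semaphore that is already full.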
I have an event source that is fired very frequently by network I/O. Because of the underlying design, the event is raised on a different thread each time. I wrapped this event via Rx with Observable.FromEventPattern(...), and I'm using TakeWhile(predict) to filter some special event data.
Right now I have some concerns about its thread safety. The TakeWhile(predict) works as a 'hit and mute', but in a concurrent situation can that still be guaranteed? I guess the underlying implementation could be something like this (I can't read the source code since it's too complicated...):
public static IObservable<TSource> TakeWhile<TSource>(this IObservable<TSource> source, Func<TSource, bool> predict)
{
ISubject<TSource> takeUntilObservable = new TempObservable<TSource>();
IDisposable dps = null;
// 0: takeUntilObservable still active; 1: predict failed, disposed, and OnCompleted already sent.
int state = 0;
dps = source.Subscribe(
(s) =>
{
/* NOTE: the 'hit and mute' here is still not thread safe; one thread may enter the 'else' branch and be inside CompareExchange while another thread has already passed predict(...) and is calling OnNext(...).
* So the CompareExchange here is mainly to avoid OnCompleted() and Dispose() being called multiple times.
*/
if (predict(s) && state == 0)
{
takeUntilObservable.OnNext(s);
}
else
{
// != 0 means already disposed and OnCompleted already sent; avoids multiple calls from parallel threads.
if (0 == Interlocked.CompareExchange(ref state, 1, 0))
{
try
{
takeUntilObservable.OnCompleted();
}
finally
{
dps.Dispose();
}
}
}
},
() =>
{
try
{
takeUntilObservable.OnCompleted();
}
finally { dps.Dispose(); }
},
(ex) => { takeUntilObservable.OnError(ex); });
return takeUntilObservable;
}
That TempObservable is just a simple implementation of ISubject.
If my guess is reasonable, then it seems the thread safety can't be guaranteed, meaning some unexpected event data may still reach OnNext(...) while that 'mute' is still in progress.
Then I wrote a simple test to verify it, but unexpectedly, the results were all positive:
public class MultipleTheadEventSource
{
public event EventHandler OnSthNew;
int cocurrentCount = 1000;
public void Start()
{
for (int i = 0; i < this.cocurrentCount; i++)
{
int j = i;
ThreadPool.QueueUserWorkItem((state) =>
{
var safe = this.OnSthNew;
if (safe != null)
safe(j, null);
});
}
}
}
[TestMethod()]
public void MultipleTheadEventSourceTest()
{
int loopTimes = 10;
int onCompletedCalledTimes = 0;
for (int i = 0; i < loopTimes; i++)
{
MultipleTheadEventSource eventSim = new MultipleTheadEventSource();
var host = Observable.FromEventPattern(eventSim, "OnSthNew");
host.TakeWhile(p => { return int.Parse(p.Sender.ToString()) < 110; }).Subscribe((nxt) =>
{
//try to print the unexpected values, BUT I never saw it happen!!!
if (int.Parse(nxt.Sender.ToString()) >= 110)
{
this.testContextInstance.WriteLine(nxt.Sender.ToString());
}
}, () => { Interlocked.Increment(ref onCompletedCalledTimes); });
eventSim.Start();
}
// simply wait for everything to be done.
Thread.Sleep(60000);
this.testContextInstance.WriteLine("onCompletedCalledTimes: " + onCompletedCalledTimes);
}
Before I did the testing, some friends here suggested I try Synchronize<TSource> or ObserveOn to make it thread safe. Any thoughts on my reasoning above, and why was the issue not reproduced?
As per your other question, the answer still remains the same: In Rx you should assume that Observers are called in a serialized fashion.
To provide a better answer: originally the Rx team ensured that observable sequences were thread safe, but the performance penalty for well-behaved/well-designed applications was unnecessary. So the decision was taken to remove the built-in thread safety and, with it, the performance cost. To opt back into thread safety you can apply the Synchronize() method, which serializes all OnNext/OnError/OnCompleted calls. This doesn't mean they will be called on the same thread, but you won't get your OnNext method called while another one is being processed.
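If Synchronize() is available in your version, opting back in looks roughly like this (a sketch using the event source from your test code):
// Serializes OnNext/OnError/OnCompleted so they never overlap, regardless of
// which thread raised the underlying event.
Observable.FromEventPattern(eventSim, "OnSthNew")
    .Synchronize()
    .TakeWhile(p => int.Parse(p.Sender.ToString()) < 110)
    .Subscribe(nxt => { /* ... */ });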
The bad news: from memory this happened in Rx 2.0, and you are specifically asking about Rx 1.0. (I am not sure Synchronize() even exists in 1.x.)
So if you are on Rx v1, you have only a blurry picture of what is thread safe and what isn't. I am pretty sure the Subjects are safe, but I can't be sure about the factory methods like FromEventPattern.
My recommendation is: if you need to ensure thread safety, serialize your data pipeline. The easiest way to do this is to use a single-threaded IScheduler implementation, e.g. a DispatcherScheduler or an EventLoopScheduler instance.
Some good news is that when I wrote the book on Rx it targeted v1, so this section is very relevant for you: http://introtorx.com/Content/v1.0.10621.0/15_SchedulingAndThreading.html
So if your query right now looked like this:
Observable.FromEventPattern(....)
.TakeWhile(x=>x>5)
.Subscribe(....);
To ensure that the pipeline is serialized you can create an EventLoopScheduler (at the cost of dedicating a thread to this):
var scheduler = new EventLoopScheduler();
Observable.FromEventPattern(....)
.ObserveOn(scheduler)
.TakeWhile(x=>x>5)
.Subscribe(....);
I am trying to write a round-robin scheduler for lightweight threads (fibers). It must scale to handle as many concurrently-scheduled fibers as possible. I also need to be able to schedule fibers from threads other than the one the run loop is on, and preferably unschedule them from arbitrary threads as well (though I could live with only being able to unschedule them from the run loop).
My current idea is to have a circular doubly-linked list, where each fiber is a node and the scheduler holds a reference to the current node. This is what I have so far:
using Interlocked = System.Threading.Interlocked;
public class Thread {
internal Future current_fiber;
public void RunLoop () {
while (true) {
var fiber = current_fiber;
if (fiber == null) {
// block the thread until a fiber is scheduled
continue;
}
if (fiber.Fulfilled)
fiber.Unschedule ();
else
fiber.Resume ();
//if (current_fiber == fiber) current_fiber = fiber.next;
Interlocked.CompareExchange<Future> (ref current_fiber, fiber.next, fiber);
}
}
}
public abstract class Future {
public bool Fulfilled { get; protected set; }
internal Future previous, next;
// this must be thread-safe
// it inserts this node before thread.current_fiber
// (getting the exact position doesn't matter, as long as the
// chosen nodes haven't been unscheduled)
public void Schedule (Thread thread) {
next = this; // maintain circularity, even if this is the only node
previous = this;
try_again:
var current = Interlocked.CompareExchange<Future> (ref thread.current_fiber, this, null);
if (current == null)
return;
var target = current.previous;
while (target == null) {
// current was unscheduled; negotiate for new current_fiber
var potential = current.next;
var actual = Interlocked.CompareExchange<Future> (ref thread.current_fiber, potential, current);
current = (actual == current? potential : actual);
if (current == null)
goto try_again;
target = current.previous;
}
// I would lock "current" and "target" at this point.
// How can I do this w/o risk of deadlock?
next = current;
previous = target;
target.next = this;
current.previous = this;
}
// this would ideally be thread-safe
public void Unschedule () {
var prev = previous;
if (prev == null) {
// already unscheduled
return;
}
previous = null;
if (next == this) {
next = null;
return;
}
// Again, I would lock "prev" and "next" here
// How can I do this w/o risk of deadlock?
prev.next = next;
next.previous = prev;
}
public abstract void Resume ();
}
As you can see, my sticking point is that I cannot ensure the order of locking, so I can't lock more than one node without risking deadlock. Or can I? I don't want to have a global lock on the Thread object, since the amount of lock contention would be extreme. Plus, I don't especially care about insertion position, so if I lock each node separately then Schedule() could use something like Monitor.TryEnter and just keep walking the list until it finds an unlocked node.
Overall, I'm not invested in any particular implementation, as long as it meets the requirements I've mentioned. Any ideas would be greatly appreciated. Thanks!
EDIT - The comments indicate that people think I'm talking about Win32 fibers (which I'm not). In a nutshell, all I want to do is schedule bits of code to run one after another on a single thread. It is similar to the TPL / Async CTP, but AFAIK those do not guarantee continuations on the same thread unless it happens to be the UI thread. I am open to alternate suggestions on how to implement the above, but please don't just say "don't use fibers."
Use the Task Parallel Library.
Create a custom TaskScheduler as shown on MSDN. In your custom task scheduler, if you want just one thread you can have just one thread.
If you want to prevent it from scheduling tasks inline, override protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) and return false.
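A rough sketch of such a scheduler is below; the class name and details are mine rather than the MSDN sample's, so treat it as a starting point rather than a production implementation:
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class SingleThreadTaskScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();

    public SingleThreadTaskScheduler()
    {
        var thread = new Thread(() =>
        {
            // Every task queued to this scheduler runs here, one after the other.
            foreach (var task in _queue.GetConsumingEnumerable())
                TryExecuteTask(task);
        });
        thread.IsBackground = true;
        thread.Start();
    }

    protected override void QueueTask(Task task)
    {
        _queue.Add(task);
    }

    // Refuse inline execution so work never runs on a foreign thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false;
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return _queue.ToArray();
    }
}
Work is then pinned to that thread with Task.Factory.StartNew(action, CancellationToken.None, TaskCreationOptions.None, scheduler), and continuations stay there by passing the same scheduler to ContinueWith.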
Don't use fibers with .NET; read anything by Joe Duffy on this subject. Use the TPL instead for a correct implementation of a user-mode scheduler.
I am guessing that running on a single thread is not required, only that one task at a time runs to completion. If this is the case, download the TPL samples and look into using the LimitedConcurrencyLevelTaskScheduler with a maximum concurrency of 1.
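Usage would be along these lines, assuming you have pulled the LimitedConcurrencyLevelTaskScheduler class in from the samples:
// One task at a time runs to completion, though not necessarily on the same thread.
var scheduler = new LimitedConcurrencyLevelTaskScheduler(1);
var factory = new TaskFactory(scheduler);
factory.StartNew(() => { /* fiber body */ });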
You should be able to use a simple lock-free message queue to queue your Futures.
Rather than keeping a cyclic data structure, you pull the Future off the queue before calling Resume(), and append it again when it completes (if there is more work to do).
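A rough sketch of that shape, built on ConcurrentQueue<T> and the Future type from the question (assuming Fulfilled means the fiber is finished and an unfulfilled fiber should simply be re-queued after Resume()):
using System.Collections.Concurrent;
using Thread = System.Threading.Thread;

public class FiberRunLoop
{
    private readonly ConcurrentQueue<Future> _queue = new ConcurrentQueue<Future>();

    // Safe to call from any thread.
    public void Schedule(Future fiber)
    {
        _queue.Enqueue(fiber);
    }

    public void RunLoop()
    {
        while (true)
        {
            Future fiber;
            if (!_queue.TryDequeue(out fiber))
            {
                Thread.Yield(); // or block on a semaphore/event instead of spinning
                continue;
            }
            if (fiber.Fulfilled)
                continue; // finished: simply dropping it unschedules it
            fiber.Resume();
            if (!fiber.Fulfilled)
                _queue.Enqueue(fiber); // still has work: back to the end of the line (round-robin)
        }
    }
}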
I have an application I have already started working on, and it seems I need to rethink things a bit. The application is a WinForms application at the moment. Anyway, I allow the user to input the number of threads they would like to have running, and I also allow the user to allocate the number of records to process per thread. What I have done is loop through the thread-count variable and create the threads accordingly. I am not performing any locking (and I'm not sure whether I need to) around the threads. I am new to threading and am running into possible issues with multiple cores. I need some advice on how I can make this perform better.
Before a thread is created, some records are pulled from my database to be processed. That list object is sent to the thread and looped through. Once it reaches the end of the loop, the thread calls the data functions to pull some new records, replacing the old ones in the list. This keeps going on until there are no more records. Here is my code:
private void CreateThreads()
{
_startTime = DateTime.Now;
var totalThreads = 0;
var totalRecords = 0;
progressThreadsCreated.Maximum = _threadCount;
progressThreadsCreated.Step = 1;
LabelThreadsCreated.Text = "0 / " + _threadCount.ToString();
this.Update();
for(var i = 1; i <= _threadCount; i++)
{
LabelThreadsCreated.Text = i + " / " + _threadCount;
progressThreadsCreated.Value = i;
var adapter = new Dystopia.DataAdapter();
var records = adapter.FindAllWithLocking(_recordsPerThread,_validationId,_validationDateTime);
if(records != null && records.Count > 0)
{
totalThreads += 1;
LabelTotalProcesses.Text = "Total Processes Created: " + totalThreads.ToString();
var paramss = new ArrayList { i, records };
var thread = new Thread(new ParameterizedThreadStart(ThreadWorker));
thread.Start(paramss);
}
this.Update();
}
}
private void ThreadWorker(object paramList)
{
try
{
var parms = (ArrayList) paramList;
var stopThread = false;
var threadCount = (int) parms[0];
var records = (List<Candidates>) parms[1];
var runOnce = false;
var adapter = new Dystopia.DataAdapter();
var lastCount = records.Count;
var runningCount = 0;
while (_stopThreads == false)
{
if (!runOnce)
{
CreateProgressArea(threadCount, records.Count);
}
else
{
ResetProgressBarMethod(threadCount, records.Count);
}
runOnce = true;
var counter = 0;
if (records.Count > 0)
{
foreach (var record in records)
{
counter += 1;
runningCount += 1;
_totalRecords += 1;
var rec = record;
var proc = new ProcRecords();
proc.Validate(ref rec);
adapter.Update(rec);
UpdateProgressBarMethod(threadCount, counter, records.Count, runningCount);
if (_stopThreads)
{
break;
}
}
UpdateProgressBarMethod(threadCount, -1, lastCount, runningCount);
if (!_noRecordsInPool)
{
records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
if (records == null || records.Count <= 0)
{
_noRecordsInPool = true;
break;
}
else
{
lastCount = records.Count;
}
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
Something simple you could do that would improve perf would be to use the ThreadPool to manage your thread creation. This lets the runtime allocate a group of threads up front, paying the thread-creation penalty once instead of on every batch.
If you decide to move to .NET 4.0, Tasks would be another way to go.
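Concretely, the creation loop could hand the batch to the pool instead of creating a raw thread, reusing the same ThreadWorker and paramss from the question (sketch):
// .NET 2.0+: reuse pool threads instead of paying a thread-creation cost per batch.
ThreadPool.QueueUserWorkItem(ThreadWorker, paramss);

// .NET 4.0+: the same thing expressed as a Task.
Task.Factory.StartNew(() => ThreadWorker(paramss));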
"I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread."
This isn't something you really want to expose to the user. What are they supposed to put? How can they determine what's best? This is an implementation detail best left to you, or even better, the CLR or another library.
"I am not performing any locking (and not sure I need to or not) on the threads."
The majority of issues you'll have with multithreading will come from shared state. Specifically, in your ThreadWorker method, it looks like you refer to the following shared data: _stopThreads, _totalRecords, _noRecordsInPool, _recordsPerThread, _validationId, and _validationDateTime.
Just because these data are shared, however, doesn't mean you'll have issues. It all depends on who reads and writes them. For example, I think _recordsPerThread is only written once initially, and then read by all threads, which is fine. _totalRecords, however, is both read and written by each thread. You can run into threading issues here since _totalRecords += 1; consists of a non-atomic read-then-write. In other words, you could have two threads read the value of _totalRecords (say they both read the value 5), then increment their copy and then write it back. They'll both write back the value 6, which is now incorrect since it should be 7. This is a classic race condition. For this particular case, you could use Interlocked.Increment to atomically update the field.
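In code, the fix for that particular counter is a one-line change (sketch):
// Racy: read, add and write happen as separate steps.
_totalRecords += 1;

// Atomic: the whole increment is a single indivisible operation.
Interlocked.Increment(ref _totalRecords);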
In general, to do synchronization between threads in C#, you can use the classes in the System.Threading namespace, e.g. Mutex, Semaphore, and probably the most common, Monitor (equivalent to lock) which allows only one thread to execute a specific portion of code at a time. The mechanism you use to synchronize depends entirely on your performance requirements. For example, if you throw a lock around the body of your ThreadWorker, you'll destroy any performance gains you got through multithreading by effectively serializing the work. Safe, but slow :( On the other hand, if you use Interlocked.Increment and judiciously add other synchronization where necessary, you'll maintain your performance and your app will be correct :)
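For shared structures that need more than a single atomic operation, a small locked region is the usual tool; for example (a sketch with hypothetical field names):
private readonly object _sync = new object();
private readonly List<Candidates> _failedRecords = new List<Candidates>(); // hypothetical shared list

private void RecordFailure(Candidates rec)
{
    // Only one thread at a time can be inside this block, so the
    // non-thread-safe List<T> is never mutated concurrently.
    lock (_sync)
    {
        _failedRecords.Add(rec);
    }
}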
Once you've gotten your worker method to be thread-safe, you should use some other mechanism to manage your threads. ThreadPool was mentioned, and you could also use the Task Parallel Library, which abstracts over the ThreadPool and smartly determines and scales how many threads to use. This way, you take the burden off of the user to determine what magic number of threads they should run.
The obvious response is to question why you want threads in the first place. Where are the analysis and benchmarks that show that using threads will be an advantage?
How are you ensuring that non-GUI threads do not interact with the GUI? How are you ensuring that no two threads interact with the same variables or data structures in an unsafe way? Even if you realise you do need locking, how are you ensuring that the locks don't result in each thread processing its workload serially, removing any advantage that multiple threads might have provided?