Is wrapping a Task with lock not very useful? - c#

What intent is expressed here?:
lock(Locker)
{
    Task.Factory.StartNew(()=>
    {
        foreach(var item in this.MyNonCurrentCollection)
        {
            //modify non-concurrent collection
        }
    }, CancellationToken.None, TaskCreationOptions.None, TaskScheduler.FromCurrentSynchronizationContext())
    .ContinueWith(t => this.RaisePropertyChanged("MyNonCurrentCollection"));
}
Will the system lock (queue) until the Task completes, or will the system lock only to start a new Task? The latter implies that this lock is kind of useless, right? I am just trying to discover intent from someone else's code. The idea here is to protect MyNonCurrentCollection from being modified by another thread.

Will the system lock (queue) until the Task completes
No.
will the system lock only to start a new Task?
Yes.
The latter implies that this lock is kind of useless, right?
It would seem so, although you can't always be sure without seeing the full context. For example, sometimes I'll write code that needs to check if it should start a task, based on a resource that requires locking, thus locking around code that just starts the task might be appropriate. If you're not doing anything besides starting the task though, that's probably not the case.
The idea here is to protect MyNonCurrentCollection from being modified by another thread.
This does nothing to prevent that.
Side note, modifying a collection inside of a foreach over that collection is a bad idea. Some collections will be nice enough to just throw some sort of concurrent modification exception. Less nice collections will just produce mangled results.

The system will lock until the task is instantiated and kicked off. Task.Factory.StartNew is asynchronous. Your lock should not be acquired for very long, even if the task takes a while.
Inside the task, you should be actually locking the shared resource, not around the creation of the task. The lock will not have an effect on the safety of the resource unless the task completes extremely quickly and gets preemptively scheduled before the lock is exited.
This is a bug, yes.
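As a rough sketch of the advice above, the lock can be taken inside the task body so that it is held while the collection is actually being mutated. The names `CollectionOwner`, `_locker`, `_items` and `AddRangeAsync` are hypothetical and do not come from the code in the question; this only illustrates where the `lock` statement goes.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public class CollectionOwner
{
    private readonly object _locker = new object();
    private readonly List<int> _items = new List<int>();

    // The lock is acquired inside the delegate, so it is held while the
    // list is modified, not merely while the task is being queued.
    public Task AddRangeAsync(int start, int count)
    {
        return Task.Run(() =>
        {
            lock (_locker)
            {
                for (int i = 0; i < count; i++)
                {
                    _items.Add(start + i);
                }
            }
        });
    }

    // Readers take the same lock so they never observe a half-updated list.
    public int Count
    {
        get { lock (_locker) { return _items.Count; } }
    }
}
```

Every piece of code that touches the list has to take the same lock; protecting only one writer does not help.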

Related

Why the lock inside AsyncLock does not block the thread?

I'm trying to understand how the AsyncLock works.
First of all, here's a snippet to prove that it actually works:
var l = new AsyncLock();
var tasks = new List<Task>();
while (true)
{
    Console.ReadLine();
    var i = tasks.Count + 1;
    tasks.Add(Task.Run(async () =>
    {
        Console.WriteLine($"[{i}] Acquiring lock ...");
        using (await l.LockAsync())
        {
            Console.WriteLine($"[{i}] Lock acquired");
            await Task.Delay(-1);
        }
    }));
}
By "works" I mean that you can run as many tasks as you want (by hitting Enter) and the number of threads doesn't grow. If you replace it with traditional lock, you'll see that the new threads are started, which is what we try to avoid.
But the first thing you see in the source code is... the lock
Can somebody please explain me how this works, why it doesn't block, and what am I missing here?
Can somebody please explain me how this works, why it doesn't block, and what am I missing here?
The short answer is that lock is just an internal mechanism used to guarantee thread safety. The lock is never exposed in any way, and there's no way for any thread to hold that lock for any real amount of time. In this way, it's similar to the locks used internally by various concurrent collections.
There is an alternate approach that uses lock-free programming, but I have found lock-free programming to be extremely difficult to write, read, and maintain. A great example of this (which is sadly not online) was a bunch of Dr. Dobb's articles in the late '90s, each one trying to out-do the last with a better lock-free queue implementation. It turns out they were all faulty - in some cases, the bugs took more than a decade to find.
For my own code, I do not use lock-free programming, except where the correctness of the code is trivially obvious.
As far as the async lock vs lock concepts, I'm going to take a stab at explaining this. There's a feeling I get that I have only felt when working with asynchronous coordination primitives. It's something I've thought a lot about writing a blog post on, but I don't have the right words to make it understandable. That said, here goes...
Asynchronous coordination primitives exist on a completely different plane than normal coordination primitives. Synchronous primitives block threads and signal threads. Asynchronous primitives just work on plain objects; the blocking or signaling is just "by convention".
So, with a normal lock, the calling code must take the lock immediately. But with an asynchronous "lock", the attempted lock is just a request, just an object. The calling code doesn't even need to await it. It's possible to request several locks and await them all together with Task.WhenAll. Or even combine them with other things; code can do crazy things like (a)wait for two locks to both be free or for a signal (like AsyncManualResetEvent) to be sent, and then cancel the lock requests if the signal comes in first.
From a thread perspective, it's kinda-sorta like user-mode thread scheduling. There's also some similarities to cooperative multitasking (as opposed to preemptive). But overall, the asynchronous primitives are "lifted" to a different plane, where one works only with objects and blocks of code, not threads.
The lock inside AsyncLock is released very quickly. Each task that tries to acquire the AsyncLock successfully acquires its internal lock, and the actual locking logic is done with a queue.
By wrapping LockAsync() in a using block, the lock is released when the block ends: LockAsync returns a disposable Key object, which is disposed at the end of the using block, and disposing it releases the lock. See https://github.com/StephenCleary/AsyncEx/blob/master/src/Nito.AsyncEx.Coordination/AsyncLock.cs#L182-L185
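To make the "internal lock plus queue" idea concrete, here is a deliberately simplified sketch. It is not the AsyncEx implementation; `TinyAsyncLock`, `_mutex`, `_waiters` and `Releaser` are made-up names, and cancellation support is left out. The point is only that the `lock` statement guards a few instructions of bookkeeping and is never held while a caller waits.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class TinyAsyncLock
{
    private readonly object _mutex = new object();
    private readonly Queue<TaskCompletionSource<IDisposable>> _waiters =
        new Queue<TaskCompletionSource<IDisposable>>();
    private bool _taken;

    public Task<IDisposable> LockAsync()
    {
        lock (_mutex) // held only long enough to update the bookkeeping
        {
            if (!_taken)
            {
                _taken = true;
                return Task.FromResult<IDisposable>(new Releaser(this));
            }

            // Already taken: hand back an incomplete task instead of blocking.
            var waiter = new TaskCompletionSource<IDisposable>(
                TaskCreationOptions.RunContinuationsAsynchronously);
            _waiters.Enqueue(waiter);
            return waiter.Task;
        }
    }

    private void Release()
    {
        TaskCompletionSource<IDisposable> next = null;
        lock (_mutex)
        {
            if (_waiters.Count > 0) { next = _waiters.Dequeue(); }
            else { _taken = false; }
        }

        // Complete the next waiter outside the internal lock.
        if (next != null) { next.SetResult(new Releaser(this)); }
    }

    private sealed class Releaser : IDisposable
    {
        private readonly TinyAsyncLock _owner;
        private bool _disposed; // not thread-safe; fine for a sketch

        public Releaser(TinyAsyncLock owner) { _owner = owner; }

        public void Dispose()
        {
            if (_disposed) { return; }
            _disposed = true;
            _owner.Release();
        }
    }
}
```

A second caller of `LockAsync()` gets a task that is simply not completed yet, so no thread is parked while it waits; the task completes when the first holder disposes its releaser.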

When should Task.ContinueWith be called with TaskScheduler.Current as an argument?

We are using this code snippet from StackOverflow to produce a Task that completes as soon as the first of a collection of tasks completes successfully. Due to the non-linear nature of its execution, async/await is not really viable, and so this code uses ContinueWith() instead. It doesn't specify a TaskScheduler, though, which a number of sources have mentioned can be dangerous because it uses TaskScheduler.Current when most developers usually expect TaskScheduler.Default behavior from continuations.
The prevailing wisdom appears to be that you should always pass an explicit TaskScheduler into ContinueWith. However, I haven't seen a clear explanation of when different TaskSchedulers would be most appropriate.
What is a specific example of a case where it would be best to pass TaskScheduler.Current into ContinueWith(), as opposed to TaskScheduler.Default? Are there rules of thumb to follow when making this decision?
For context, here's the code snippet I'm referring to:
public static Task<T> FirstSuccessfulTask<T>(IEnumerable<Task<T>> tasks)
{
    var taskList = tasks.ToList();
    var tcs = new TaskCompletionSource<T>();
    int remainingTasks = taskList.Count;
    foreach(var task in taskList)
    {
        task.ContinueWith(t =>
        {
            if(t.Status == TaskStatus.RanToCompletion)
                tcs.TrySetResult(t.Result);
            else
            if(Interlocked.Decrement(ref remainingTasks) == 0)
                // every task has failed: surface all of their exceptions
                tcs.SetException(new AggregateException(
                    taskList.SelectMany(t1 => t1.Exception.InnerExceptions)));
        });
    }
    return tcs.Task;
}
You probably need to choose a task scheduler that is appropriate for the actions the continuation delegate performs.
Consider the following examples:
Task ContinueWithUnknownAction(Task task, Action<Task> actionOfTheUnknownNature)
{
    // We know nothing about what the action does, so we decide to respect the environment
    // in which the current function is called
    return task.ContinueWith(actionOfTheUnknownNature, TaskScheduler.Current);
}
int count;
Task ContinueWithKnownAction(Task task)
{
    // We fully control the continuation action and we know that it can be safely
    // executed by a thread pool thread.
    return task.ContinueWith(t => Interlocked.Increment(ref count), TaskScheduler.Default);
}
Func<int> cpuHeavyCalculation = () => 0;
Action<Task> printCalculationResultToUI = task => { };
void OnUserAction()
{
    // Assert that SynchronizationContext.Current is not null.
    // We know that the continuation will modify the UI, and it can be safely executed
    // only on a UI thread.
    Task.Run(cpuHeavyCalculation)
        .ContinueWith(printCalculationResultToUI, TaskScheduler.FromCurrentSynchronizationContext());
}
Your FirstSuccessfulTask() probably is the example where you can use TaskScheduler.Default, because the continuation delegate instance can be safely executed on a thread pool.
You can also use custom task scheduler to implement custom scheduling logic in your library. For example see Scheduler page on Orleans framework website.
For more information check:
It's All About the SynchronizationContext article by Stephen Cleary
TaskScheduler, threads and deadlocks article by Cosmin Lazar
StartNew is Dangerous article by Stephen Cleary
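Following the suggestion that FirstSuccessfulTask() can safely run its continuations on the thread pool, a variant that passes TaskScheduler.Default explicitly might look like the sketch below. The class name `TaskExtras` and method name `FirstSuccessful` are placeholders, and the null check on `Exception` (to tolerate cancelled tasks) is an addition that is not in the original snippet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TaskExtras
{
    public static Task<T> FirstSuccessful<T>(IEnumerable<Task<T>> tasks)
    {
        var taskList = tasks.ToList();
        var tcs = new TaskCompletionSource<T>();
        int remaining = taskList.Count;

        foreach (var task in taskList)
        {
            task.ContinueWith(t =>
            {
                if (t.Status == TaskStatus.RanToCompletion)
                {
                    tcs.TrySetResult(t.Result);
                }
                else if (Interlocked.Decrement(ref remaining) == 0)
                {
                    // All tasks finished without success: collect their errors.
                    // Cancelled tasks have a null Exception, so skip those.
                    tcs.TrySetException(new AggregateException(
                        taskList.Where(x => x.Exception != null)
                                .SelectMany(x => x.Exception.InnerExceptions)));
                }
            },
            CancellationToken.None,
            TaskContinuationOptions.None,
            TaskScheduler.Default); // explicit: never picks up an ambient scheduler
        }

        return tcs.Task;
    }
}
```

With the scheduler spelled out, the continuation behaves the same whether the method is called from a UI thread, from inside a task on a custom scheduler, or from plain thread pool code.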
I'll have to rant a bit, this is getting way too many programmers into trouble. Every programming aid that was designed to make threading look easy creates five new problems that programmers have no chance to debug.
BackgroundWorker was the first one, a modest and sensible attempt to hide the complications. But nobody realizes that the worker runs on the threadpool so should never occupy itself with I/O. Everybody gets that wrong, not many ever notice. And forgetting to check e.Error in the RunWorkerCompleted event, hiding exceptions in threaded code is a universal problem with the wrappers.
The async/await pattern is the latest, it makes it really look easy. But it composes extraordinarily poorly, async turtles all the way down until you get to Main(). They had to fix that eventually in C# version 7.1 because everybody got stuck on it. But they did not fix the drastic ConfigureAwait() problem in a library. It is completely biased towards library authors knowing what they are doing, notable is that a lot of them work for Microsoft and tinker with WinRT.
The Task class bridged the gap between the two, its design goal was to make it very composable. Good plan, they could not predict how programmers were going to use it. But also a liability, inspiring programmers to ContinueWith() up a storm to glue tasks together. Even when it doesn't make sense to do so because those tasks merely run sequentially. Notable is that they even added an optimization to ensure that the continuation runs on the same thread to avoid the context switch overhead. Good plan, but creating the undebuggable problem that this web site is named for.
So yes, the advice you saw was good. Task is useful to deal with asynchronicity, a common problem that you have to deal with when services move into the "cloud" and latency gets to be a detail you can no longer ignore. If you ContinueWith() that kind of code then you invariably care about the specific thread that executes the continuation. Provided by TaskScheduler, low odds that it isn't the one provided by FromCurrentSynchronizationContext(). Which is how async/await happened.
If current task is a child task, then using TaskScheduler.Current will mean the scheduler will be that which the task it is in, is scheduled to; and if not inside another task, TaskScheduler.Current will be TaskScheduler.Default and thus use the ThreadPool.
If you use TaskScheduler.Default, then it will always go to the ThreadPool.
The only reason you would use TaskScheduler.Current:
To avoid the default scheduler issue, you should always pass an
explicit TaskScheduler to Task.ContinueWith and Task.Factory.StartNew.
From Stephen Cleary's post ContinueWith is Dangerous, Too.
There's further explanation here from Stephen Toub on his MSDN blog.
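The difference between the two properties can be observed directly. The sketch below is only an illustration: `SchedulerDemo` and its method names are invented here, and the non-default scheduler used to try it out is whatever the caller passes in (for example the `ExclusiveScheduler` of a `ConcurrentExclusiveSchedulerPair`).

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SchedulerDemo
{
    // Reports what TaskScheduler.Current is inside a task that was
    // started on the given scheduler.
    public static TaskScheduler CurrentInsideTaskStartedOn(TaskScheduler scheduler)
    {
        Task<TaskScheduler> probe = Task.Factory.StartNew(
            () => TaskScheduler.Current,
            CancellationToken.None,
            TaskCreationOptions.None,
            scheduler);
        return probe.Result;
    }

    // Reports what TaskScheduler.Current is on a plain thread that is
    // not executing any task.
    public static TaskScheduler CurrentOnFreshThread()
    {
        TaskScheduler seen = null;
        var thread = new Thread(() => seen = TaskScheduler.Current);
        thread.Start();
        thread.Join();
        return seen;
    }
}
```

Inside the task, `TaskScheduler.Current` is the scheduler the task was started on, so a `ContinueWith` or `StartNew` call without an explicit scheduler would inherit it; on the plain thread it falls back to `TaskScheduler.Default`.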
I most certainly don't think I am capable of providing bullet proof answer but I will give my five cents.
What is a specific example of a case where it would be best to pass TaskScheduler.Current into ContinueWith(), as opposed to TaskScheduler.Default?
Imagine you are working on some web API that the web server naturally makes multithreaded. You need to limit your parallelism because you don't want to use all the resources of your web server, but at the same time you want to speed up your processing time, so you decide to write a custom task scheduler with a lowered concurrency level.
Now your API needs to query some database and order the results, but there are millions of results, so you decide to do it via merge sort (divide and conquer). You then need all the child tasks of this algorithm to be compliant with your custom task scheduler (TaskScheduler.Current), because otherwise the algorithm will end up taking all the resources and your web server's thread pool will starve.
When to use TaskScheduler.Current, TaskScheduler.Default, TaskScheduler.FromCurrentSynchronizationContext(), or some other TaskScheduler
TaskScheduler.FromCurrentSynchronizationContext() - Specific to the UI thread context of WPF and Windows Forms applications; you use this when you want to get back to the UI thread after offloading some work to a non-UI thread
example taken from here
private void button_Click(…)
{
    … // #1 on the UI thread
    Task.Factory.StartNew(() =>
    {
        … // #2 long-running work, so offloaded to non-UI thread
    }).ContinueWith(t =>
    {
        … // #3 back on the UI thread
    }, TaskScheduler.FromCurrentSynchronizationContext());
}
TaskScheduler.Default - Almost all the time, when you don't have any specific requirements or edge cases to deal with.
TaskScheduler.Current - I think I've given one generic example above, but in general it should be used when you have either a custom scheduler or you explicitly passed TaskScheduler.FromCurrentSynchronizationContext() to a TaskFactory or to Task.Factory.StartNew and later you use continuation tasks or inner tasks (so pretty damn rare imo).

Are non-thread-safe functions async safe?

Consider the following async function that modifies a non-thread-safe list:
async Task AddNewToList(List<Item> list)
{
    // Suppose load takes a few seconds
    Item item = await LoadNextItem();
    list.Add(item);
}
Simply put: Is this safe?
My concern is that one may invoke the async method, and then while it's loading (either on another thread, or as an I/O operation), the caller may modify the list.
Suppose that the caller is partway through the execution of list.Clear(), for example, and suddenly the Load method finishes! What will happen?
Will the task immediately interrupt and run the list.Add(item); code? Or will it wait until the main thread is done with all scheduled CPU tasks (ie: wait for Clear() to finish), before running the code?
Edit: Since I've basically answered this for myself below, here's a bonus question: Why? Why does it immediately interrupt instead of waiting for CPU bound operations to complete? It seems counter-intuitive to not queue itself up, which would be completely safe.
Edit: Here's a different example I tested myself. The comments indicate the order of execution. I am disappointed!
TaskCompletionSource<bool> source;
private async void buttonPrime_click(object sender, EventArgs e)
{
    source = new TaskCompletionSource<bool>(); // 1
    await source.Task; // 2
    source = null; // 4
}
private void buttonEnd_click(object sender, EventArgs e)
{
    source.SetResult(true); // 3
    MessageBox.Show(source.ToString()); // 5 and exception is thrown
}
No, it's not safe. However, also consider that the caller might have spawned a thread and passed the List to its child thread before calling your code, even in a non-async environment, which will have the same detrimental effect.
So, although not safe, there is nothing inherently thread-safe about receiving a List from a caller anyway - there is no way of knowing whether the list is actually being processed from threads other than your own.
Short answer
You always need to be careful using async.
Longer answer
It depends on your SynchronizationContext and TaskScheduler, and what you mean by "safe."
When your code awaits something, it creates a continuation and wraps it in a task, which is then posted to the current SynchronizationContext's TaskScheduler. The context will then determine when and where the continuation will run. The default scheduler simply uses the thread pool, but different types of applications can extend the scheduler and provide more sophisticated synchronization logic.
If you are writing an application that has no SynchronizationContext (for example, a console application or an ASP.NET Core application), the continuation is simply put on the thread pool, and could execute in parallel with your main thread. In this case you must use lock or synchronized objects such as ConcurrentDictionary<> instead of Dictionary<>, for anything other than local references or references that are closed with the task.
If you are writing a WinForms application, the continuations are put in the message queue, and will all execute on the main thread. This makes it safe to use non-synchronized objects. However, there are other worries, such as deadlocks. And of course if you spawn any threads, you must make sure they use lock or Concurrent objects, and any UI invocations must be marshaled back to the UI thread. Also, if you are nutty enough to write a WinForms application with more than one message pump (this is highly unusual) you'd need to worry about synchronizing any common variables.
If you are writing an ASP.NET application, the SynchronizationContext will ensure that, for a given request, no two threads are executing at the same time. Your continuation might run on a different thread (due to a performance feature known as thread agility), but they will always have the same SynchronizationContext and you are guaranteed that no two threads will access your variables at the same time (assuming, of course, they are not static, in which case they span across HTTP requests and must be synchronized). In addition, the pipeline will block parallel requests for the same session so that they execute in series, so your session state is also protected from threading concerns. However you still need to worry about deadlocks.
And of course you can write your own SynchronizationContext and assign it to your threads, meaning that you specify your own synchronization rules that will be used with async.
See also How do yield and await implement flow of control in .NET?
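For the case without a SynchronizationContext, one possible way to protect the list across awaits is an async-friendly gate such as SemaphoreSlim. This is only a sketch: `GuardedList` and its members are invented names, `Task.Delay` stands in for the question's `LoadNextItem()`, and a plain `lock` around the `Add` call would work just as well here since nothing is awaited while the gate is held.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class GuardedList
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly List<int> _items = new List<int>();

    public async Task AddAfterLoadAsync(int value)
    {
        // Stand-in for the slow load; the gate is NOT held during it.
        await Task.Delay(10).ConfigureAwait(false);

        // Only the mutation itself is serialized.
        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            _items.Add(value);
        }
        finally
        {
            _gate.Release();
        }
    }

    public async Task<int> CountAsync()
    {
        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            return _items.Count;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Any other code that clears or enumerates the list would need to go through the same gate for this to help.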
Assuming the "invalid access" occurs in LoadNextItem(): the Task will throw an exception. Since the context is captured, it will pass on to the caller's thread, so list.Add will not be reached.
So, no it's not thread-safe.
Yes I think that could be a problem.
I would return the item and add it to the list on the main thread.
private async void GetIntButton(object sender, RoutedEventArgs e)
{
    List<int> Ints = new List<int>();
    Ints.Add(await GetInt());
}
private async Task<int> GetInt()
{
    await Task.Delay(100);
    return 1;
}
But you have to call it from an async method, so I do not think this would work either.

Thread safety in C# lambdas

I came across a piece of C# code like this today:
lock(obj)
{
    // perform various operations
    ...
    // send a message via a queue but in the same process, Post(yourData, callback)
    messagingBus.Post(data, () =>
    {
        // perform operation
        ...
        if(condition == true)
        {
            // perform a long running, out of process operation
            operation.Perform();
        }
    });
}
My question is this: can the callback function ever be invoked in such a way as to cause the lock(obj) to not be released before operation.Perform() is called? i.e., is there a way that the callback function can be invoked on the same thread that is holding the lock, and before that thread has released the lock?
EDIT: messagingBus.Post(...) can be assumed to be an insert on to a queue, that then returns immediately. The callback is invoked on some other thread, probably from the thread pool.
For the operation.Perform() you can read it as Thread.Sleep(10000) - just something that runs for a long time and doesn't share or mutate any state.
I'm going to guess.
Post in .net generally implies that the work will be done by another thread or at another time.
So yes, it's not only possible that the lock on obj will be released before Perform is called, it's fairly likely it will happen. However, it's not guaranteed. Perform may complete before the lock is released.
That doesn't mean it's a problem. The "perform various actions" part may need the lock. messagingBus may need the lock to queue the action. The work inside may not need the lock at all, in which case the code is thread safe.
This is all a guess because there's no notion of what work is being done, why it must be inside a lock, and what Post or perform does. So the code may be perfectly safe, or it may be horribly flawed.
Without knowing what messagingBus.Post is doing, you can't tell. If Post invokes the delegate it is given (the lambda expression in your example) then the lock will be in place while that lambda executes. If Post schedules that delegate for execution at a later time, then the lock will not be in place while the lambda executes. It's not clear what the lock(obj) is for, to lock calls to messagingBus.Post, or what... Detailing the type (including full namespace) of the messagingBus variable would go a long way to providing better details.
If the callback executes asynchronously, then yes, the lock may still be held when Perform() is called, unless Post() does something specific to avoid that case (which would be unusual).
If the callback was scheduled on the same thread as the call to Post() (e.g. in the extreme example where the thread pool has only 1 thread), a typical thread pool implementation would not execute the callback until the thread finishes its current task, which in this case would require it to release the lock before executing Perform().
It's impossible to answer your question without knowing how messagingBus.Post is implemented. Async APIs typically provide no guarantee that the callback will be executed truly concurrently. For example, .NET APM methods such as FileStream.BeginRead may decide to perform the operation synchronously, in which case the callback will be executed on the same thread that called BeginRead. The returned IAsyncResult.CompletedSynchronously will be set to true in this case.
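The two possibilities discussed in these answers can be told apart with Monitor.IsEntered, which reports whether the calling thread currently holds a given lock. In the sketch below, `PostInline` and `PostQueued` are hypothetical stand-ins for the unknown messagingBus.Post; nothing here reflects how a real message bus is implemented.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CallbackLockDemo
{
    // Hypothetical Post that runs the callback immediately on the caller's thread.
    public static void PostInline(Action callback)
    {
        callback();
    }

    // Hypothetical Post that queues the callback to the thread pool and returns.
    public static Task PostQueued(Action callback)
    {
        return Task.Run(callback);
    }

    // True if the thread running the callback owns the lock when it runs.
    public static bool HeldDuringInlineCallback()
    {
        object gate = new object();
        bool held = false;
        lock (gate)
        {
            PostInline(() => held = Monitor.IsEntered(gate));
        }
        return held;
    }

    public static bool HeldByCallbackThreadWhenQueued()
    {
        object gate = new object();
        bool held = true;
        Task pending;
        lock (gate)
        {
            pending = PostQueued(() => held = Monitor.IsEntered(gate));
        }
        pending.Wait(); // wait outside the lock for the callback to finish
        return held;
    }
}
```

In the inline case the callback runs on the thread that owns the lock, so a long-running operation inside it would keep the lock held. In the queued case the callback's thread never owns the lock, although the posting thread may or may not have left the lock block by the time the callback starts.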

Does Monitor.Wait need synchronization?

I have developed a generic producer-consumer queue which pulses by Monitor in the following way:
the enqueue :
public void EnqueueTask(T task)
{
    _workerQueue.Enqueue(task);
    Monitor.Pulse(_locker);
}
the dequeue:
private T Dequeue()
{
    T dequeueItem;
    if (_workerQueue.Count > 0)
    {
        _workerQueue.TryDequeue(out dequeueItem);
        if(dequeueItem!=null)
            return dequeueItem;
    }
    while (_workerQueue.Count == 0)
    {
        Monitor.Wait(_locker);
    }
    _workerQueue.TryDequeue(out dequeueItem);
    return dequeueItem;
}
the wait section produces the following SynchronizationLockException :
"object synchronization method was called from an unsynchronized block of code"
Do I need to synchronize it? Why? Is it better to use ManualResetEvents or the Slim version in .NET 4.0?
Yes, the current thread needs to "own" the monitor in order to call either Wait or Pulse, as documented. (So you'll need to lock for Pulse as well.) I don't know the details for why it's required, but it's the same in Java. I've usually found I'd want to do that anyway though, to make the calling code clean.
Note that Wait releases the monitor itself, then waits for the Pulse, then reacquires the monitor before returning.
As for using ManualResetEvent or AutoResetEvent instead - you could, but personally I prefer using the Monitor methods unless I need some of the other features of wait handles (such as atomically waiting for any/all of multiple handles).
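The ownership rule is easy to confirm in isolation. The sketch below uses invented names (`MonitorOwnershipDemo` and its helpers) and simply checks whether Monitor.Pulse and Monitor.Wait throw SynchronizationLockException with and without the lock held; the zero timeout passed to Wait is only there so the call cannot block.

```csharp
using System;
using System.Threading;

public static class MonitorOwnershipDemo
{
    // Returns true if the action throws SynchronizationLockException.
    private static bool ThrowsSyncLock(Action action)
    {
        try
        {
            action();
            return false;
        }
        catch (SynchronizationLockException)
        {
            return true;
        }
    }

    public static bool PulseWithoutLockThrows()
    {
        object locker = new object();
        return ThrowsSyncLock(() => Monitor.Pulse(locker));
    }

    public static bool WaitWithoutLockThrows()
    {
        object locker = new object();
        return ThrowsSyncLock(() => Monitor.Wait(locker, 0));
    }

    public static bool PulseInsideLockThrows()
    {
        object locker = new object();
        return ThrowsSyncLock(() =>
        {
            lock (locker)
            {
                Monitor.Pulse(locker);
            }
        });
    }
}
```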
From the MSDN description of Monitor.Wait():
Releases the lock on an object and blocks the current thread until it reacquires the lock.
The 'releases the lock' part is the problem: the object isn't locked. You are treating the _locker object as though it is a WaitHandle. Doing your own locking design that's provably correct is a form of black magic that's best left to our medicine men, Jeffrey Richter and Joe Duffy. But I'll give this one a shot:
public class BlockingQueue<T> {
    private Queue<T> queue = new Queue<T>();
    public void Enqueue(T obj) {
        lock (queue) {
            queue.Enqueue(obj);
            Monitor.Pulse(queue);
        }
    }
    public T Dequeue() {
        T obj;
        lock (queue) {
            while (queue.Count == 0) {
                Monitor.Wait(queue);
            }
            obj = queue.Dequeue();
        }
        return obj;
    }
}
In most any practical producer/consumer scenario you will want to throttle the producer so it cannot fill the queue unbounded. Check Duffy's BoundedBuffer design for an example. If you can afford to move to .NET 4.0 then you definitely want to take advantage of its ConcurrentQueue class, it has lots more black magic with low-overhead locking and spin-waiting.
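As one way to get the throttling described above without hand-written locking, .NET 4.0's BlockingCollection<T> accepts a bounded capacity. This is a swapped-in technique, not Duffy's BoundedBuffer design, and the `BoundedQueueDemo` class and its parameters are made up for illustration.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BoundedQueueDemo
{
    // Pushes 1..itemCount through a queue that holds at most `capacity`
    // items and returns the sum seen by the consumer.
    public static int ProduceAndConsume(int itemCount, int capacity)
    {
        using (var queue = new BlockingCollection<int>(capacity))
        {
            Task producer = Task.Run(() =>
            {
                for (int i = 1; i <= itemCount; i++)
                {
                    queue.Add(i); // blocks while the queue is full
                }
                queue.CompleteAdding(); // lets the consumer loop end
            });

            int sum = 0;
            foreach (int item in queue.GetConsumingEnumerable())
            {
                sum += item; // blocks while the queue is empty
            }

            producer.Wait();
            return sum;
        }
    }
}
```

The producer stalls whenever the consumer falls behind by more than `capacity` items, which keeps memory use bounded.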
The proper way to view Monitor.Wait and Monitor.Pulse/PulseAll is not as providing a means of waiting, but rather (for Wait) as a means of letting the system know that the code is in a waiting loop which can't exit until something of interest changes, and (for Pulse/PulseAll) as a means of letting the system know that code has just changed something that might satisfy the exit condition of some other thread's waiting loop. One should be able to replace all occurrences of Wait with Sleep(0) and still have code work correctly (even if much less efficiently, as a result of spending CPU time repeatedly testing conditions that haven't changed).
For this mechanism to work, it is necessary to avoid the possibility of the following sequence:
1. The code in the wait loop tests the condition when it isn't satisfied.
2. The code in another thread changes the condition so that it is satisfied.
3. The code in that other thread pulses the lock (which nobody is yet waiting on).
4. The code in the wait loop performs a Wait since its condition wasn't satisfied.
The Wait method requires that the waiting thread have a lock, since that's the only way it can be sure that the condition it's waiting upon won't change between the time it's tested and the time the code performs the Wait. The Pulse method requires a lock because that's the only way it can be sure that if another thread has "committed" itself to performing a Wait, the Pulse won't occur until after the other thread actually does so. Note that using Wait within a lock doesn't guarantee that it's being used correctly, but there's no way that using Wait outside a lock could possibly be correct.
The Wait/Pulse design actually works reasonably well if both sides cooperate. The biggest weaknesses of the design, IMHO, are (1) there's no mechanism for a thread to wait until any of a number of objects is pulsed; (2) even if one is "shutting down" an object such that all future wait loops should exit immediately (probably by checking an exit flag), the only way to ensure that any Wait to which a thread has committed itself will get a Pulse is to acquire the lock, possibly waiting indefinitely for it to become available.
