I am starting a task with the following code:
var token = tokenSource.Token;
var taskWithToken = new Task(() =>
    new ProcessMyCommand(_unitOfWork, ..., batchRunId, token).Execute(),
    token);
In my ContinueWith, I need to know the batchRunId and possibly some other variables listed in the ..., but it doesn't appear that this is possible:
taskWithToken.ContinueWith(task =>
{
    if (!task.IsCanceled)
        return;
    //TODO: make sure no more subsequent runs happen
    //TODO: sync with source data
});
Is there something I am missing? How can I make sure the .ContinueWith executes with access to the values it needs?
First, I'm not even sure you need a continuation in your case. Your code could be simplified to something like:
var taskWithToken = new Task(() =>
    {
        new ProcessMyCommand(_unitOfWork, ..., batchRunId, token).Execute();
        // code from the continuation here
    },
    token);
But if you do want to use ContinueWith() and you're worried about it because of the ReSharper warning, you don't need to be. Most of the time, code like this is perfectly fine and you can ignore the warning.
Longer version: when you write a lambda that references something from the enclosing scope (a so-called closure), the compiler has to generate code for that. Exactly how it does that is an implementation detail, but the current compiler generates a single closure class for all closures inside a single method.
What this means in your case is that the compiler generates a class that contains the locals this (because of _unitOfWork), request and batchRunId (and maybe others that you didn't show). This closure object is shared between the new Task lambda and the ContinueWith() lambda, even though the second lambda doesn't use request or this. And as long as the second lambda is referenced from somewhere, those objects can't be garbage collected, even though they can't be accessed from it.
So, this situation can lead to a memory leak, which I believe is why ReSharper is warning you about it. But in almost all cases, this memory leak either doesn't exist (because the second lambda isn't referenced longer than the first one) or it's very small. So, most of the time, you can safely ignore that warning. But if you get mysterious memory leaks, you should investigate the way you're using lambdas and especially places where you get this warning.
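To the question's underlying worry: a ContinueWith lambda is an ordinary closure, so it can read batchRunId exactly like the task's own lambda does. The sketch below also applies the closure-sharing point above by building the continuation in a separate method, so the continuation's delegate captures only batchRunId and holds no reference to the unit of work. Execute is a hypothetical stub standing in for ProcessMyCommand(...).Execute(), and Task.Run is used instead of new Task(...) plus Start(); treat it as an illustration under those assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BatchRunner
{
    // Hypothetical stand-in for new ProcessMyCommand(_unitOfWork, ..., batchRunId, token).Execute().
    static void Execute(object unitOfWork, int batchRunId, CancellationToken token)
    {
        token.ThrowIfCancellationRequested();
        // ... real work for batchRunId would go here ...
    }

    public static Task<string> Start(object unitOfWork, int batchRunId, CancellationTokenSource tokenSource)
    {
        var token = tokenSource.Token;
        // This lambda captures unitOfWork, batchRunId and token in one closure object.
        Task work = Task.Run(() => Execute(unitOfWork, batchRunId, token), token);
        return AttachContinuation(work, batchRunId);
    }

    // Created in its own method, so this lambda's closure contains only batchRunId.
    static Task<string> AttachContinuation(Task antecedent, int batchRunId) =>
        antecedent.ContinueWith(
            task => task.IsCanceled
                ? $"batch {batchRunId} was canceled"
                : $"batch {batchRunId} finished: {task.Status}",
            TaskScheduler.Default);
}
```

If the token is already canceled when Task.Run is called, the work never runs and the antecedent ends up Canceled, which the continuation can still report together with batchRunId.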
You can create a MyTaskData class to store your data and result; it can also have a MyTaskData PreviousTaskData property (holding the previous task's data), creating a linked list of results. Create a Task<MyTaskData> that, at the end, does return myNewTaskData;. Then call ContinueWith(...), inside which you can get the previous results through the antecedent task's Result property.
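A minimal sketch of that idea, assuming a MyTaskData shape with just a BatchRunId, a string Result and the PreviousTaskData link (the properties other than PreviousTaskData are illustrative). The continuation reads everything it needs from the antecedent's Result, so it does not need to capture any locals at all.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical data holder along the lines described above.
public sealed class MyTaskData
{
    public int BatchRunId { get; set; }
    public string Result { get; set; }
    public MyTaskData PreviousTaskData { get; set; }  // link to the previous step
}

public static class TaskDataChain
{
    public static Task<MyTaskData> Run(int batchRunId)
    {
        // First step returns its inputs together with its result.
        Task<MyTaskData> first = Task.Run(() =>
            new MyTaskData { BatchRunId = batchRunId, Result = "processed" });

        // The continuation gets its data from the antecedent's Result.
        // Result throws an AggregateException if the first task faulted or was canceled.
        return first.ContinueWith(previous =>
        {
            MyTaskData prev = previous.Result;
            return new MyTaskData
            {
                BatchRunId = prev.BatchRunId,
                Result = "synced",
                PreviousTaskData = prev
            };
        }, TaskScheduler.Default);
    }
}
```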
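As for running the continuation only for a canceled Task, ContinueWith has an overload with a TaskContinuationOptions parameter (see MSDN) where you can specify OnlyOnCanceled (which matches your if (!task.IsCanceled) return; check), or NotOnCanceled for the opposite case.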
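A sketch of the OnlyOnCanceled variant, with a hypothetical onCanceled callback standing in for the TODO work. Note that it passes CancellationToken.None rather than the work's own token: a continuation registered with an already-canceled token is itself canceled and would never run.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancelOnlyContinuation
{
    // Runs onCanceled(batchRunId) only if the antecedent ended Canceled; otherwise
    // the continuation task itself ends up Canceled and the callback never runs.
    public static Task Attach(Task antecedent, int batchRunId, Action<int> onCanceled) =>
        antecedent.ContinueWith(
            _ => onCanceled(batchRunId),
            CancellationToken.None,
            TaskContinuationOptions.OnlyOnCanceled,
            TaskScheduler.Default);
}
```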
I want to cache calculation results in a ConcurrentDictionary<TKey,TValue>. Several threads may query the cache for an entry and generate it if it does not exist.
Since GetOrAdd(TKey, Func<TKey,TValue>) is not atomic, I think I should use GetOrAdd(TKey, TValue) with Task<CacheItem> as TValue.
So, when a thread wants to query a cache item, it creates a cold task coldTask (a task that is not started and that may generate the item), calls var cacheTask = cache.GetOrAdd(key, coldTask) for some key object, and then checks whether cacheTask has been started or even has a result. If cacheTask has not been started, the calling thread starts it.
Is this a valid approach in principle?
One problem that remains is that
if (cacheTask.Status == TaskStatus.Created)
    cacheTask.Start();
is not atomic, so the cacheTask may be started from another thread, before cacheTask.Start() is called here.
Is
try {
    if (cacheTask.Status == TaskStatus.Created)
        cacheTask.Start();
} catch {}
a valid workaround?
The principle should be fine. To start the task, you should be able to do something like:
var newTask = new Task<CacheItem>(...);
var dictionaryTask = myDictionary.GetOrAdd(myKey, newTask);
if (dictionaryTask == newTask) {
    newTask.Start();
}
return await dictionaryTask;
That should ensure that only the thread that created the task starts it.
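A self-contained sketch of the cold-task approach, assuming a generic wrapper class (ColdTaskCache and its GetAsync method are illustrative names). It returns the stored task instead of awaiting it, so callers decide how to wait; the reference comparison is what guarantees Start() is called exactly once per stored task, which also removes the need for the Status check and the empty catch from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class ColdTaskCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Task<TValue>> cache =
        new ConcurrentDictionary<TKey, Task<TValue>>();

    public Task<TValue> GetAsync(TKey key, Func<TKey, TValue> generate)
    {
        // Cold task: created but not started, so losing the race wastes only an allocation.
        var newTask = new Task<TValue>(() => generate(key));
        var dictionaryTask = cache.GetOrAdd(key, newTask);

        // Only the thread whose task was actually stored starts it.
        if (ReferenceEquals(dictionaryTask, newTask))
            newTask.Start(TaskScheduler.Default);

        return dictionaryTask;
    }
}
```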
I would suggest checking out Lazy<T>, since it is somewhat related. I would also suggest doing some benchmarking, since the most appropriate approach will depend on your specific use case. Keep in mind that awaiting or blocking on a task has some overhead, so the trade-off depends on the cost of generating values and how often that is done.
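For comparison, a sketch of the common Lazy<T> variant (LazyCache is an illustrative name). GetOrAdd may build several Lazy wrappers under contention, but constructing a Lazy does not run the factory, and only the stored wrapper's Value is ever read.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class LazyCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> cache =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    public TValue Get(TKey key, Func<TKey, TValue> generate)
    {
        // Extra Lazy instances created by losing threads are discarded unused.
        Lazy<TValue> lazy = cache.GetOrAdd(key,
            k => new Lazy<TValue>(() => generate(k), LazyThreadSafetyMode.ExecutionAndPublication));

        // The first reader of Value runs generate; concurrent readers block and reuse the result.
        return lazy.Value;
    }
}
```

One design caveat: with ExecutionAndPublication, an exception thrown by generate is cached and rethrown on every later read of Value, so failed entries stay failed unless you remove them.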
As I suggested in the comments, I'd use TaskCompletionSource<TResult> and reference equality to avoid races and unnecessary additional tasks to be scheduled:
var tcs = new TaskCompletionSource<CacheItem>();
var actualTask = theDictionary.GetOrAdd(key, tcs.Task);
if(ReferenceEquals(actualTask, tcs.Task))
{
    //Do the actual work here
    tcs.SetResult(new CacheItem());
}
return actualTask;
If generation can fail then the //Do the actual work here section should be wrapped in a try/catch and SetException should be used on the completion source (to indicate to any existing waiters that the failure has occurred). But then you have to consider what it means for that failed entry in the cache, whether to remove or retry, etc, and all of the complexity that arises from trying to build a cache in the first place.
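One way to handle the failure case, as a sketch: it assumes a retry-on-next-call policy (the answer deliberately leaves the policy open), and TcsCache is an illustrative name. The failed entry is removed through ConcurrentDictionary's ICollection<KeyValuePair<,>>.Remove, which deletes the entry only if the key still maps to this exact task, and the exception is then published to any existing waiters via SetException.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class TcsCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Task<TValue>> cache =
        new ConcurrentDictionary<TKey, Task<TValue>>();

    public Task<TValue> GetAsync(TKey key, Func<TKey, TValue> generate)
    {
        var tcs = new TaskCompletionSource<TValue>(TaskCreationOptions.RunContinuationsAsynchronously);
        var actualTask = cache.GetOrAdd(key, tcs.Task);
        if (!ReferenceEquals(actualTask, tcs.Task))
            return actualTask;            // another caller owns generation for this key

        try
        {
            tcs.SetResult(generate(key));
        }
        catch (Exception ex)
        {
            // Assumed policy: evict the failed entry so a later call can retry.
            ((ICollection<KeyValuePair<TKey, Task<TValue>>>)cache)
                .Remove(new KeyValuePair<TKey, Task<TValue>>(key, tcs.Task));
            tcs.SetException(ex);         // current waiters observe the failure
        }
        return tcs.Task;
    }
}
```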
I'm running in circles with this one. I have some tasks on an HttpClient (.NET 4 with the HttpClient package from NuGet). In one of them I'm trying to assign a value to a variable that I declared OUTSIDE the task, at the beginning of the function, but when execution gets to the point where I use it, the variable has lost the assigned value and is back to its initial value, as if it never changed. But I'm pretty sure it DID change at some point, when execution passed through the task.
I've made this screenshot to show it more easily:
What should I do to make my xmlString KEEP the value that was assigned to it inside the task, and use it OUTSIDE the task???
Thanks in advance for your help guys.
Judging by your screenshot (it would be better if you provided the code in your question as well) you are never awaiting your task. Therefore, your last usage where you obtain the value of xmlString happens before your task has finished executing, and presumably before your .ContinueWith() has assigned the variable.
Ideally, your enclosing method should be async as well; then you can simply await the task. Otherwise, you can call .ContinueWith(...).Wait() first, though at that point you're not leveraging async semantics at all.
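A deterministic sketch of what is happening. A TaskCompletionSource stands in for client.GetAsync(...) so the "response" arrives exactly when we choose; xmlString mirrors the question's variable, and the rest of the names are illustrative. Reading the variable before the continuation has run returns the old value; awaiting the continuation first returns the assigned one.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationTiming
{
    public static async Task<(string Before, string After)> RunAsync()
    {
        string xmlString = "initial";

        // Hypothetical stand-in for client.GetAsync(...): completes only when we say so.
        var response = new TaskCompletionSource<string>();
        Task continuation = response.Task.ContinueWith(t => { xmlString = t.Result; }, TaskScheduler.Default);

        string before = xmlString;       // the continuation cannot have run yet
        response.SetResult("<xml/>");    // the "HTTP response" arrives later
        await continuation;              // wait until the assignment has actually happened
        return (before, xmlString);
    }
}
```

The closure is not losing the value; the outer code simply reads the variable before the continuation has written it.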
Why don't you use await? It makes the code a lot cleaner.
Replace the client.GetAsync() line with the following:
HttpResponseMessage resp = await client.GetAsync(par);
Then move the try-catch logic from your task to after that line, and it should work as you originally intended!
EDIT:
Servy is half-right in the comments. Apart from the Microsoft.Net.HttpClient you will most probably need to manually add Microsoft.Bcl.Async too.
I believe that I understand what a closure is for an anonymous function and am familiar with the traditional pitfalls. Good questions covering this topic are here and here. The purpose is not to understand why or how this works in a general sense but to suss out intricacies I may be unaware of when depending on the behavior of generated closure class references. Specifically, what pitfalls exist when reporting on the behavior of an externally modified variable captured in a closure?
Example
I have a long-running, massively concurrent worker service that has exactly one error case: when it cannot retrieve work. The degree of concurrency (the number of conceptual threads to use) is configurable. Note that conceptual threads are implemented as Task<T> instances via the TPL. Because the service loops constantly trying to get work, and that is multiplied by the unknown degree of concurrency, thousands to tens of thousands of errors could be generated per second.
As such, I need a reporting mechanism that is time-bound rather than attempt-bound, that is isolated to its own conceptual thread, and that is cancellable. To that end, I devised a recursive Task lambda that accesses my fault counter every 5 minutes outside of the primary attempt-based looping that is trying to get work:
var faults = 1;
Action<Task> reportDelay = null;
reportDelay =
    // 300000 is 5 min
    task => Task.Delay(300000, cancellationToken).ContinueWith(
        subsequentTask =>
        {
            // `faults` is modified outside the anon method
            Logger.Error(
                $"{faults} failed attempts to get work since the last known success.");
            reportDelay(subsequentTask);
        },
        cancellationToken);
// start the report task - runs concurrently with below
reportDelay.Invoke(Task.CompletedTask);
// example get work loop for context
while (true)
{
    object work = null;
    try
    {
        work = await GetWork();
        // cancel via the (hypothetical) CancellationTokenSource behind cancellationToken;
        // a CancellationToken itself has no Cancel() method
        cancellationTokenSource.Cancel();
        return work;
    }
    catch
    {
        faults++;
    }
}
Concerns
I understand that, in this case, the generated closure will point by reference to my faults variable (which is incremented whenever any conceptual thread attempts to get work but can't). I likewise understand that this is generally discouraged, but from what I can tell only because it leads to unexpected behavior when code is written expecting the closure to capture a value.
Here, I want and rely on the closure capturing the faults variable by reference. I want to report the value of the variable around the time the continuation is called (it does not have to be exact). I am mildly concerned about faults being prematurely GC'd, but I cancel the loop before exiting that lexical scope, which makes me think it should be safe. Is there anything else I'm not thinking of? What dangers are there when considering closure access beyond the mutability of the underlying value?
Answer and Explanation
I have accepted an answer below that refactors the code to avoid the need for closure access by reifying the fault monitor into its own class. However, because this does not answer the question directly, I will include a brief explanation here for future readers of the reliable behavior:
So long as the closed-over variable remains in scope for the life of the closure, it can be relied upon to behave as a true reference variable. The dangers of accessing a variable modified in an outer scope from within a closure are:
You must understand that the variable will behave as a reference within the closure, mutating its value as it is modified in the outer scope. The closure variable will always contain the current runtime value of the outer scope variable, not the value at the time the closure is generated.
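You must also understand that the variable's lifetime is handled for you: the compiler hoists a captured local into a compiler-generated closure object, so it lives at least as long as any closure that references it, and the garbage collector cannot reclaim it while the closure is still reachable (there is no invalid-pointer case in safe C#). The flip side is that everything captured in that closure object stays alive for as long as the closure does.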
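Both points can be seen in a few lines (CaptureDemo and its methods are illustrative names). The first method shows the closure reading the variable itself rather than a snapshot; the second shows a captured local outliving the method that declared it, because it lives in the closure object owned by the returned delegate.

```csharp
using System;

public static class CaptureDemo
{
    // The lambda captures the variable faults itself, not a copy of its value.
    public static (int Early, int Late) ReadThroughClosure()
    {
        int faults = 1;
        Func<int> report = () => faults;
        int early = report();   // 1
        faults += 41;           // modified in the outer scope
        int late = report();    // 42: the closure sees the current value
        return (early, late);
    }

    // count outlives this call: it lives as long as the returned delegate does.
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }
}
```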
Here is a quick alternative that avoids some of the issues you may be concerned with. Also, as @Servy mentioned, just calling a separate async function will do. The ConcurrentStack just makes it easy to add and clear; additionally, more information than just the count could be logged.
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
public class FaultCounter {
    private ConcurrentStack<Exception> faultsSinceLastSuccess;
    public async Task RunServiceCommand() {
        faultsSinceLastSuccess = new ConcurrentStack<Exception>();
        // one shared source, so DoWork's Cancel() also stops the logging loop
        var cts = new CancellationTokenSource();
        var faultCounter = StartFaultLogging(cts);
        var worker = DoWork(cts);
        await Task.WhenAll(faultCounter, worker);
        Console.WriteLine("Done.");
    }
    public async Task StartFaultLogging(CancellationTokenSource cts) {
        while (!cts.IsCancellationRequested) {
            Logger.Error($"{faultsSinceLastSuccess.Count} failed attempts to get work since the last known success.");
            faultsSinceLastSuccess.Clear();
            try {
                await Task.Delay(300 * 1000, cts.Token);
            }
            catch (OperationCanceledException) {
                break; // work succeeded: stop without waiting out the full delay
            }
        }
    }
    public async Task<object> DoWork(CancellationTokenSource cts) {
        while (true) {
            object work = null;
            try {
                work = await GetWork();
                cts.Cancel();
                return work;
            }
            catch (Exception ex) {
                faultsSinceLastSuccess.Push(ex);
            }
        }
    }
}
I see some issues here in your solution:
You read and write the faults variable in a non-thread-safe manner, so in theory either of your threads could see a stale value (and concurrent faults++ calls can lose increments). You can fix that by using the Interlocked class, especially for the incrementing.
Your action doesn't seem to use its task parameter, so why do you need it to be an Action accepting a Task? Also, in the continuation you aren't checking the token's cancellation flag, so, again in theory, you may end up in a situation where your code runs smoothly but you still get the error emails.
You start the long task without the LongRunning flag, which is unfriendly to the task scheduler.
Your recursive action could be rewritten in while loop instead, removing the unnecessary overhead in your code.
Closures in C# are implemented into a compiler generated class, so the GC shouldn't be a concern for you, as long as you're looping your retry code.
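A sketch combining the Interlocked and while-loop suggestions, assuming the fault count lives in a small class instead of a captured local (FaultReporter and its members are illustrative names, and report stands in for Logger.Error). Cancellation is the normal way to stop the loop, so the resulting OperationCanceledException is swallowed only when the token was actually canceled.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class FaultReporter
{
    private int faults;

    // Called from any number of worker tasks; Interlocked makes the increment atomic.
    public void RecordFault() => Interlocked.Increment(ref faults);

    // Called on a successful fetch: resets the "since last success" count.
    public void RecordSuccess() => Interlocked.Exchange(ref faults, 0);

    public int Current => Volatile.Read(ref faults);

    // A plain loop replaces the recursive Delay/ContinueWith chain.
    public async Task RunAsync(TimeSpan interval, Action<int> report, CancellationToken token)
    {
        try
        {
            while (true)
            {
                await Task.Delay(interval, token);
                report(Current);
            }
        }
        catch (OperationCanceledException) when (token.IsCancellationRequested)
        {
            // Normal shutdown: cancellation is how the reporter is stopped.
        }
    }
}
```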
I'm wondering will this scenario be thread safe and are there issues that I'm not currently seeing:
From an ASP.NET controller, I call a non-static method of a non-static class (this class is in another project, and it is injected into the controller).
This method (which is non-static) does some work and calls another static method, passing it userId.
Finally, the static method does some work (for which userId is needed).
I believe this approach is thread safe, and that everything will be done properly if two users call this method at the same time (let's say in the same nanosecond). Am I correct or completely wrong? If I am wrong, what would be the correct way of using static methods within an ASP.NET project?
EDIT
Here is the code :)
This is the call from the controller:
await _workoutService.DeleteWorkoutByIdAsync(AzureRedisFeedsConnectionMultiplexer.GetRedisDatabase(),AzureRedisLeaderBoardConnectionMultiplexer.GetRedisDatabase(), workout.Id, userId);
Here is what DeleteWorkoutByIdAsync looks like:
public async Task<bool> DeleteWorkoutByIdAsync(IDatabase redisDb,IDatabase redisLeaderBoardDb, Guid id, string userId)
{
    using (var databaseContext = new DatabaseContext())
    {
        var workout = await databaseContext.Trenings.FindAsync(id);
        if (workout == null)
        {
            return false;
        }
        databaseContext.Trenings.Remove(workout);
        await databaseContext.SaveChangesAsync();
        await RedisFeedService.StaticDeleteFeedItemFromFeedsAsync(redisDb,redisLeaderBoardDb, userId, workout.TreningId.ToString());
    }
    return true;
}
As you can see, DeleteWorkoutByIdAsync calls the static method StaticDeleteFeedItemFromFeedsAsync, which looks like this:
public static async Task StaticDeleteFeedItemFromFeedsAsync(IDatabase redisDb,IDatabase redisLeaderBoardDd, string userId, string workoutId)
{
    var deleteTasks = new List<Task>();
    var feedAllRedisVals = await redisDb.ListRangeAsync("FeedAllWorkouts:" + userId);
    DeleteItemFromRedisAsync(redisDb, feedAllRedisVals, "FeedAllWorkouts:" + userId, workoutId, ref deleteTasks);
    await Task.WhenAll(deleteTasks);
}
And here is the static method DeleteItemFromRedisAsync, which is called in StaticDeleteFeedItemFromFeedsAsync:
private static void DeleteItemFromRedisAsync(IDatabase redisDb, RedisValue [] feed, string redisKey, string workoutId, ref List<Task> deleteTasks)
{
    var itemToRemove = "";
    foreach (var f in feed)
    {
        if (f.ToString().Contains(workoutId))
        {
            itemToRemove = f;
            break;
        }
    }
    if (!string.IsNullOrEmpty(itemToRemove))
    {
        deleteTasks.Add(redisDb.ListRemoveAsync(redisKey, itemToRemove));
    }
}
"Thread safe" isn't a standalone term. Thread safe in the face of what? What kind of concurrent modifications are you expecting here?
Let's look at a few aspects here:
Your own mutable shared state: You have no shared state whatsoever in this code; so it's automatically thread safe.
Indirect shared state: DatabaseContext. This looks like an SQL database, and those tend to be thread "safe", but what exactly that means depends on the database in question. For example, you're removing a Trenings row, and if some other thread also removes the same row, you're likely to get a (safe) concurrency violation exception. And depending on the isolation level, you may get concurrency violation exceptions even for certain other mutations of "Trenings". At worst that means one failed request, but the database itself won't be corrupted.
Redis is essentially single-threaded, so all operations are serialized and in that sense "thread safe" (which might not buy you much). Your delete code gets a set of keys, then deletes at most one of those. If two or more threads simultaneously attempt to delete the same key, it is possible that one thread will attempt to delete a non-existing key, and that may be unexpected to you (but it won't cause DB corruption).
Implicit consistency between redis and SQL: It looks like you're using GUIDs, so the chances of unrelated things clashing are small. Your example only contains a delete operation (which is likely not to cause consistency issues), so it's hard to speculate whether redis and the SQL database will stay consistent under all other circumstances. In general, if your IDs are never reused, you're probably safe, but keeping two databases in sync is a hard problem, and you're quite likely to make a mistake somewhere.
However, your code seems excessively complicated for what it's doing. I'd recommend you simplify it dramatically if you want to be able to maintain this in the long run.
Don't use ref parameters unless you really know what you're doing (and it's not necessary here).
Don't mix up strings with other data types, so avoid ToString() where possible. Definitely avoid nasty tricks like Contains to check for key equality. You want your code to break when something unexpected happens, because code that "limps along" can be virtually impossible to debug (and you will write bugs).
Don't effectively return an array of tasks if the only thing you can really do is wait for all of them - might as well do that in the callee to simplify the API.
Don't use redis. It's probably just a distraction here - you already have another database, so it's very unlikely you need it here, except for performance reasons, and it's extremely premature to go adding whole extra database engines for a hypothetical performance problem. There's a reasonable chance that the extra overhead of requiring extra connections may make your code slower than if you had just one db, especially if you can't save many sql queries.
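A sketch of the simplifications suggested above (no ref parameter, exact comparison instead of Contains, a single awaitable Task instead of a list of tasks). To keep it runnable without Redis, it codes against a hypothetical IListStore interface whose two methods play the roles of StackExchange.Redis's IDatabase.ListRangeAsync and ListRemoveAsync, and it assumes a hypothetical "workoutId|payload" entry format; substitute your real format and parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical abstraction over the two list operations the original code uses.
public interface IListStore
{
    Task<string[]> ListRangeAsync(string key);
    Task<long> ListRemoveAsync(string key, string value);
}

public static class FeedCleanup
{
    // Returns a Task the caller simply awaits: no ref List<Task> to pass around.
    public static async Task<bool> DeleteFeedItemAsync(IListStore store, string userId, string workoutId)
    {
        string key = "FeedAllWorkouts:" + userId;
        string[] feed = await store.ListRangeAsync(key);

        // Assumed entry format "workoutId|payload": compare the id exactly, not with Contains().
        string item = feed.FirstOrDefault(entry => entry.Split('|')[0] == workoutId);
        if (item == null)
            return false;

        return await store.ListRemoveAsync(key, item) > 0;
    }
}

// In-memory test double so the sketch can run without a Redis server.
public sealed class InMemoryListStore : IListStore
{
    private readonly Dictionary<string, List<string>> lists = new Dictionary<string, List<string>>();

    public void Push(string key, string value)
    {
        lock (lists)
        {
            if (!lists.TryGetValue(key, out var list))
                lists[key] = list = new List<string>();
            list.Add(value);
        }
    }

    public Task<string[]> ListRangeAsync(string key)
    {
        lock (lists)
            return Task.FromResult(lists.TryGetValue(key, out var list) ? list.ToArray() : new string[0]);
    }

    public Task<long> ListRemoveAsync(string key, string value)
    {
        lock (lists)
        {
            long removed = lists.TryGetValue(key, out var list) ? list.RemoveAll(v => v == value) : 0;
            return Task.FromResult(removed);
        }
    }
}
```

With entries "w1|first" and "w10|second", deleting "w1" removes only the first one; a Contains("w1") check would have matched both.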
Note: this answer was posted before the OP amended their question to add their code, revealing that this is actually a question of whether async/await is thread-safe.
Static methods are not a problem in and of themselves. If a static method is self-contained and manages to do its job using local variables only, then it is perfectly thread safe.
Problems arise if the static method is not self-contained (for example, it delegates to thread-unsafe code), or if it manipulates static state in a non-thread-safe fashion, i.e. accesses static variables for both reading and writing outside of a lock() statement.
For example, int.Parse() and int.TryParse() are static, but perfectly thread safe. Imagine the horror if they were not.
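A sketch of both cases from this answer, with illustrative names (RequestStats and its members are not from the question). FeedKey is self-contained and safe to call concurrently; Record touches shared static state, so every read and write goes through the same lock. Without the lock, total++ could lose increments and concurrent writes could corrupt the Dictionary, which is not thread safe.

```csharp
using System;
using System.Collections.Generic;

public static class RequestStats
{
    private static readonly object Gate = new object();
    private static readonly Dictionary<string, int> PerUser = new Dictionary<string, int>();
    private static int total;

    // Self-contained: only parameters and locals, so concurrent calls cannot interfere.
    public static string FeedKey(string userId) => "FeedAllWorkouts:" + userId;

    // Reads and writes shared static state, so all access is serialized by one lock.
    public static void Record(string userId)
    {
        lock (Gate)
        {
            total++;
            PerUser.TryGetValue(userId, out int count);
            PerUser[userId] = count + 1;
        }
    }

    public static int Total { get { lock (Gate) return total; } }

    public static int CountFor(string userId)
    {
        lock (Gate) return PerUser.TryGetValue(userId, out int count) ? count : 0;
    }
}
```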
What you are doing here is collecting tasks into a shared list (deleteTasks). If you do this, I would recommend one of two things:
1) Use thread-safe collections:
https://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx
2) Let your DeleteItemFromRedisAsync return a task and await it.
Although I don't see any issues in this particular case, as soon as you refactor it so that DeleteItemFromRedisAsync can be called multiple times in parallel, you will have issues. The reason is that if multiple threads can modify your deleteTasks list, you are no longer guaranteed to collect them all (https://msdn.microsoft.com/en-us/library/dd997373(v=vs.110).aspx: if two threads do an Add (append to the end) on a non-thread-safe list at the same time, one of them can be lost), so you might miss a task when waiting for all of them to finish.
Also, I would avoid mixing paradigms. Either use async/await, or keep track of a collection of tasks and let methods add to that list; don't do both. This will help the maintainability of your code in the long run. (Note: methods can still return a task; you collect those and then wait for all of them. But then the collecting method is responsible for any threading issues, instead of them being hidden in the method that is being called.)
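The two options can be sketched side by side (TaskCollection, DeleteOneAsync and the integer ids are illustrative stand-ins for the redis deletes). Option 1 gathers tasks from parallel callers into a ConcurrentBag, so no Add is lost; option 2 has each helper return its own Task that the caller awaits, so there is no shared list at all.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class TaskCollection
{
    // Option 1: a thread-safe collection, if tasks really must be gathered from parallel callers.
    public static async Task<int> CollectWithBagAsync(int count)
    {
        var tasks = new ConcurrentBag<Task<int>>();
        Parallel.For(0, count, i => tasks.Add(Task.Run(() => i)));
        int[] results = await Task.WhenAll(tasks);
        return results.Length;
    }

    // Option 2: the helper returns its own Task and the caller awaits it.
    // DeleteOneAsync is a hypothetical stand-in for a single redis delete.
    private static async Task<bool> DeleteOneAsync(int id)
    {
        await Task.Yield();
        return id >= 0;
    }

    public static async Task<int> AwaitEachAsync(int count)
    {
        int deleted = 0;
        for (int i = 0; i < count; i++)
            if (await DeleteOneAsync(i))
                deleted++;
        return deleted;
    }
}
```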
I am going to use this method in a load test, which means thousands of calls may happen very quickly from different threads. Do I have to consider what would happen on a subsequent call, where a new WebClient is created before the prior await has completed?
public static async Task<string> SendRequest(this string url)
{
    using (var wc = new WebClient())
    {
        var bytes = await wc.DownloadDataTaskAsync(url);
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            return await reader.ReadToEndAsync();
        }
    }
}
I use the term reentrant to describe the fact that this method will be called by one or more threads.
So we want to know what potential problems could arise from using this method in a multithreaded context, either through a single call in an environment that has multiple threads, or where multiple calls are being made from one or more threads.
The first thing to look at is what this method exposes externally. If we're designing this method, we can control what it does, but not what the callers do. We need to assume that anyone can do anything with whatever they pass into our method, with the returned value, and with the type/object instance that the method is called on. Let's look at each of these in turn.
The URL:
Obviously the caller can pass in an invalid URL, but that's not an issue that's specific to asynchrony or multithreading. They can't really do anything else with this parameter. They can't mutate the string from another thread after passing it to us, because string is immutable (or at least observably immutable externally).
The return value:
So at first glance, this may in fact appear to be a problem. We're returning an object instance (a Task); that object is mutated by the method we're writing (to mark it as faulted, canceled, or completed), and it is also likely to be mutated by the caller of this method (to add continuations). It's also quite plausible for this Task to end up being mutated from multiple different threads (the task could be passed to any number of other threads, which could mutate it by adding continuations, or read values while we're mutating it).
Fortunately, Task was specifically designed to support all of these situations, and it functions properly thanks to the synchronization it performs internally. As authors of this method, we don't need to care who adds what continuations to our task, from what thread, whether different callers are adding them at the same time, what order things happen in, whether continuations are added before or after we mark the task as completed, or any of that. While the task can be mutated externally, even from other threads, nothing they could do would be observable to us from this method. Likewise, their continuations will function appropriately regardless of what we do: they always fire some time after the task is marked as completed, or immediately if it was already completed. Tasks don't have the race condition that an event-based model has, where a handler added after the completion event has already fired is never called.
Finally, we have the state of the type/instance.
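The difference from an event-based model can be shown directly (CompletionSignals is an illustrative name, and a plain multicast delegate stands in for a one-shot completion event). A continuation attached after the task has completed still runs; a handler subscribed after the delegate was invoked never sees the notification.

```csharp
using System;
using System.Threading.Tasks;

public static class CompletionSignals
{
    // A continuation attached after the task has completed still runs.
    public static bool LateContinuationRuns()
    {
        var tcs = new TaskCompletionSource<int>();
        tcs.SetResult(42);                                   // completion happens first
        bool ran = false;
        tcs.Task.ContinueWith(t => { ran = t.Result == 42; }, TaskScheduler.Default).Wait();
        return ran;
    }

    // A handler subscribed after a one-shot notification was raised never sees it.
    public static bool LateEventHandlerRuns()
    {
        bool handled = false;
        EventHandler completed = null;
        completed?.Invoke(null, EventArgs.Empty);            // "completion" raised with no subscribers
        completed += (sender, e) => handled = true;          // subscribing afterwards
        return handled;
    }
}
```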
This one is easy. It's a static method, so there are no instance fields that we could access even if we wanted to. There are also no static fields that this method accesses, so no state is shared between threads that way that we need to be concerned about.
Other than the string input and task output, the state that this method uses is entirely local variables that are never accessible outside of this method. Since this method does everything in a single thread (if there is a synchronization context, or it at least does everything sequentially even if thread pool threads are used), we don't need to worry about any threading issues internally, only what could be happening externally by the caller.
When you're concerned about methods being called multiple times before previous calls have finished, the primary concern here is around access to fields. If the method was accessing instance/static fields, then one would need to consider the implications not only of a method being called with any given input state, but also with what's going on if other methods are accessing those fields at the same time. Since we access none, this is moot for this method.
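To see the argument in practice without network access, here is a sketch with a hypothetical FakeSendRequest that, like SendRequest, uses only its parameter and locals. Many overlapping calls are started at once, each suspending at an await while others begin, and every result still matches its own input.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public static class ReentrancyDemo
{
    // Hypothetical stand-in for SendRequest: only its parameter and locals are used.
    public static async Task<string> FakeSendRequest(this string url)
    {
        var buffer = new StringBuilder();        // local: each call gets its own
        buffer.Append(url);
        await Task.Delay(1);                     // other calls can start while this one is suspended
        buffer.Append("#done");
        return buffer.ToString();
    }

    public static async Task<bool> RunManyAsync(int count)
    {
        string[] urls = Enumerable.Range(0, count).Select(i => "http://example.test/" + i).ToArray();
        string[] results = await Task.WhenAll(urls.Select(u => u.FakeSendRequest()));
        // Task.WhenAll preserves order, so results[i] belongs to urls[i].
        return results.Zip(urls, (r, u) => r == u + "#done").All(ok => ok);
    }
}
```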