I'm wondering whether this scenario is thread safe, and whether there are issues I'm not currently seeing:
From an ASP.NET controller I call a non-static method on a non-static class (this class is in another project and is injected into the controller).
This method does some work and calls a static method, passing it userId.
Finally, the static method does some work (for which userId is needed).
I believe this approach is thread safe, and that everything will be done properly if two users call this method at the same time (let's say in the same nanosecond). Am I correct or completely wrong? If I am wrong, what would be the correct way of using static methods within an ASP.NET project?
EDIT
Here is the code :)
This is the call from the controller:
await _workoutService.DeleteWorkoutByIdAsync(AzureRedisFeedsConnectionMultiplexer.GetRedisDatabase(),AzureRedisLeaderBoardConnectionMultiplexer.GetRedisDatabase(), workout.Id, userId);
Here is what DeleteWorkoutByIdAsync looks like:
public async Task<bool> DeleteWorkoutByIdAsync(IDatabase redisDb, IDatabase redisLeaderBoardDb, Guid id, string userId)
{
    using (var databaseContext = new DatabaseContext())
    {
        var workout = await databaseContext.Trenings.FindAsync(id);
        if (workout == null)
        {
            return false;
        }
        databaseContext.Trenings.Remove(workout);
        await databaseContext.SaveChangesAsync();
        await RedisFeedService.StaticDeleteFeedItemFromFeedsAsync(redisDb, redisLeaderBoardDb, userId, workout.TreningId.ToString());
    }
    return true;
}
As you can see, DeleteWorkoutByIdAsync calls the static method StaticDeleteFeedItemFromFeedsAsync, which looks like this:
public static async Task StaticDeleteFeedItemFromFeedsAsync(IDatabase redisDb, IDatabase redisLeaderBoardDb, string userId, string workoutId)
{
    var deleteTasks = new List<Task>();
    var feedAllRedisVals = await redisDb.ListRangeAsync("FeedAllWorkouts:" + userId);
    DeleteItemFromRedisAsync(redisDb, feedAllRedisVals, "FeedAllWorkouts:" + userId, workoutId, ref deleteTasks);
    await Task.WhenAll(deleteTasks);
}
And here is the static method DeleteItemFromRedisAsync, which is called in StaticDeleteFeedItemFromFeedsAsync:
private static void DeleteItemFromRedisAsync(IDatabase redisDb, RedisValue[] feed, string redisKey, string workoutId, ref List<Task> deleteTasks)
{
    var itemToRemove = "";
    foreach (var f in feed)
    {
        if (f.ToString().Contains(workoutId))
        {
            itemToRemove = f;
            break;
        }
    }
    if (!string.IsNullOrEmpty(itemToRemove))
    {
        deleteTasks.Add(redisDb.ListRemoveAsync(redisKey, itemToRemove));
    }
}
"Thread safe" isn't a standalone term. Thread safe in the face of what? What kind of concurrent modifications are you expecting here?
Let's look at a few aspects here:
Your own mutable shared state: You have no shared state whatsoever in this code, so it's automatically thread safe.
Indirect shared state: DatabaseContext. This looks like an SQL database, and those tend to be thread "safe", but what exactly that means depends on the database in question. For example, you're removing a Trenings row, and if some other thread also removes the same row, you're likely to get a (safe) concurrency violation exception. Depending on the isolation level, you may also get concurrency violation exceptions for certain other mutations of Trenings. At worst that means one failed request; the database itself won't be corrupted.
Redis is essentially single-threaded, so all operations are serialized and in that sense "thread safe" (which might not buy you much). Your delete code reads the values in a list, then removes at most one of them. If two or more threads simultaneously attempt to remove the same value, one of them may try to remove a value that no longer exists (ListRemoveAsync then simply reports that nothing was removed). That may be unexpected to you, but it won't corrupt the database.
Implicit consistency between Redis and SQL: It looks like you're using GUIDs, so the chances of unrelated things clashing are small. Your example only contains a delete operation (which is unlikely to cause consistency issues), so it's hard to say whether Redis and the SQL database will stay consistent under all other circumstances. In general, if your IDs are never reused, you're probably safe, but keeping two databases in sync is a hard problem, and you're quite likely to make a mistake somewhere.
However, your code seems excessively complicated for what it's doing. I'd recommend you simplify it dramatically if you want to be able to maintain this in the long run.
Don't use ref parameters unless you really know what you're doing (and it's not necessary here).
Don't mix up strings with other data types, so avoid ToString() where possible. Definitely avoid nasty tricks like Contains to check for key equality. You want your code to break when something unexpected happens, because code that "limps along" can be virtually impossible to debug (and you will write bugs).
Don't effectively return an array of tasks if the only thing you can really do is wait for all of them - might as well do that in the callee to simplify the API.
Don't use redis. It's probably just a distraction here - you already have another database, so it's very unlikely you need it here, except for performance reasons, and it's extremely premature to go adding whole extra database engines for a hypothetical performance problem. There's a reasonable chance that the extra overhead of requiring extra connections may make your code slower than if you had just one db, especially if you can't save many sql queries.
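A minimal sketch of the first three suggestions, under stated assumptions: a hypothetical IFeedStore interface (with a hypothetical InMemoryFeedStore) stands in for StackExchange.Redis's IDatabase so the example runs without a Redis server, and a hypothetical FeedItem record stores the workout ID as a Guid instead of embedding it in a string. The single DeleteFeedItemAsync method awaits its own removal, so there is no ref List<Task> and no task list leaking out to the caller, and it matches on ID equality rather than Contains.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical typed feed entry: the workout ID is a Guid, not a substring.
public record FeedItem(Guid WorkoutId, string Payload);

// Hypothetical stand-in for the Redis list operations used in the question.
public interface IFeedStore
{
    Task<IReadOnlyList<FeedItem>> GetFeedAsync(string key);
    Task RemoveAsync(string key, FeedItem item);
}

// In-memory implementation so the sketch runs without a Redis server.
public sealed class InMemoryFeedStore : IFeedStore
{
    private readonly ConcurrentDictionary<string, List<FeedItem>> _lists = new();

    public void Add(string key, FeedItem item)
    {
        var list = _lists.GetOrAdd(key, _ => new List<FeedItem>());
        lock (list) { list.Add(item); }
    }

    public Task<IReadOnlyList<FeedItem>> GetFeedAsync(string key)
    {
        if (!_lists.TryGetValue(key, out var list))
            return Task.FromResult<IReadOnlyList<FeedItem>>(Array.Empty<FeedItem>());
        // Return a snapshot so callers never see the list change under them.
        lock (list) { return Task.FromResult<IReadOnlyList<FeedItem>>(list.ToArray()); }
    }

    public Task RemoveAsync(string key, FeedItem item)
    {
        if (_lists.TryGetValue(key, out var list))
            lock (list) { list.Remove(item); }
        return Task.CompletedTask;
    }
}

public static class FeedService
{
    // Returns a Task the caller simply awaits; no ref List<Task> parameter.
    public static async Task<bool> DeleteFeedItemAsync(IFeedStore store, string userId, Guid workoutId)
    {
        var key = "FeedAllWorkouts:" + userId;
        var feed = await store.GetFeedAsync(key);
        // Exact ID equality instead of string Contains.
        var item = feed.FirstOrDefault(f => f.WorkoutId == workoutId);
        if (item is null) return false;
        await store.RemoveAsync(key, item);
        return true;
    }
}
```

The controller-side service would then just await FeedService.DeleteFeedItemAsync(...); a false result makes the "nothing to delete" case explicit instead of silently skipped.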
Note: this answer was posted before the OP amended their question to add their code, revealing that this is actually a question of whether async/await is thread-safe.
Static methods are not a problem in and of themselves. If a static method is self-contained and manages to do its job using local variables only, then it is perfectly thread safe.
Problems arise if the static method is not self-contained (that is, it delegates to thread-unsafe code), or if it manipulates static state in a non-thread-safe fashion, i.e. reads and writes static variables outside of a lock() block.
For example, int.Parse() and int.TryParse() are static, but perfectly thread safe. Imagine the horror if they were not.
What you are doing here is collecting tasks in a shared list (deleteTasks). If you do this, I would recommend one of two things:
1) Use thread-safe collections:
https://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx
2) Let your DeleteItemFromRedisAsync return a task and await it.
Although I don't see any issues in this particular case, as soon as you refactor it so that DeleteItemFromRedisAsync can be called multiple times in parallel, you will have issues. The reason is that if multiple threads can modify your deleteTasks list, you are no longer guaranteed to collect them all (https://msdn.microsoft.com/en-us/library/dd997373(v=vs.110).aspx). If two threads do an Add (add-to-the-end) at the same time in a non-thread-safe way, one of them can be lost, so you might miss a task when waiting for all of them to finish.
Also, I would avoid mixing paradigms. Either use async/await, or keep track of a collection of tasks and let methods add to that collection; don't do both. This will help the maintainability of your code in the long run. (Note that methods can still return a task; you collect those and then wait for all of them. But then the collecting method is responsible for any threading issues, instead of them being hidden in the method being called.)
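To illustrate option 1, here is a minimal sketch (the TaskCollector name is hypothetical) in which many Parallel.For iterations, possibly on different threads, add tasks to a shared ConcurrentBag<Task>. Unlike an unsynchronized List<Task>, the bag does not lose concurrent adds, so Task.WhenAll sees every task.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class TaskCollector
{
    // Starts `count` small tasks from parallel iterations and collects them
    // in a thread-safe bag so that none are lost before WhenAll.
    public static async Task<int> RunAllAsync(int count)
    {
        var tasks = new ConcurrentBag<Task>();
        var completed = 0;

        Parallel.For(0, count, i =>
        {
            // ConcurrentBag.Add is safe to call from several threads at once.
            tasks.Add(Task.Run(() => Interlocked.Increment(ref completed)));
        });

        // Every task that was added is awaited here.
        await Task.WhenAll(tasks);
        return completed;
    }
}
```

Swapping the bag for a plain List<Task> in this sketch is exactly the situation the answer warns about: adds can be lost or the list can throw.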
Related
In my ASP.NET Core app, at some points I'm querying a couple of ADs (Active Directory servers) for data. This being AD, the queries take some time to complete, and the DirectoryServices API contains only synchronous calls.
Is it a good practice to try and wrap the AD sync calls as async? I think it's done like this (just an example, not the real query):
private async Task<string[]> GetUserGroupsAsync(string samAccountName)
{
    var func = new Func<string, string[]>(sam =>
    {
        var result = new List<string>();
        using (var ctx = new PrincipalContext(ContextType.Domain, "", "", ""))
        {
            var p = new UserPrincipal(ctx)
            {
                SamAccountName = sam
            };
            using (var search_obj = new PrincipalSearcher(p))
            {
                var query_result = search_obj.FindOne();
                if (query_result != null)
                {
                    var usuario = query_result as UserPrincipal;
                    var directory_entry = usuario.GetUnderlyingObject() as DirectoryEntry;
                    var grupos = usuario.GetGroups(ctx).OfType<GroupPrincipal>().ToArray();
                    if (grupos != null)
                    {
                        foreach (GroupPrincipal g in grupos)
                        {
                            result.Add(g.Name);
                        }
                    }
                }
            }
        }
        return result.ToArray();
    });
    var result = await Task.Run(() => func(samAccountName));
    return result;
}
Is it a good practice
Usually not.
In a desktop app, where you don't want to hold up the UI thread, this can actually be a good idea. That Task.Run moves the work to a different thread, and the UI thread can continue responding to user input while you're waiting for a response.
You tagged ASP.NET. The answer there is also "it depends". ASP.NET has a limited number of worker threads that it's allowed to use. The benefit of asynchronous code is to allow a thread to go and work on some other request while you're waiting for a response. Thus, you can serve more requests with the same number of available threads. It helps the overall performance of your application.
If you're calling await GetUserGroupsAsync(), then there is absolutely no benefit to doing what you're doing. You're freeing up the calling thread, but you've tied up another thread-pool thread that will sit blocked until a response is returned. So your net thread savings is zero, and you have the additional CPU overhead of setting up the task.
If you intend to call GetUserGroupsAsync() and then go and get other data while you wait for a response, then this can save time. It won't save threads, just time. But be aware that you are now taking up two threads for each request instead of just one, which means you can hit the ASP.NET max thread count faster, potentially hurting the overall performance of your application.
But whether you want to save time in ASP.NET, or if you want to free up the UI thread in a desktop app, I would still argue that you should not use Task.Run inside GetUserGroupsAsync(). If the caller wants to offload that waiting to another thread so it can then go get other data, then the caller can use Task.Run, like this:
var groupsTask = Task.Run(() => GetUserGroups(samAccountName)); // GetUserGroups is the plain synchronous version
// make HTTP request or get some other external data while we wait
var groups = await groupsTask;
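A runnable sketch of that caller-side pattern, assuming a hypothetical synchronous GetUserGroups that stands in for the blocking DirectoryServices query (simulated here with Thread.Sleep) and a hypothetical GetOtherDataAsync for the other external call. The library method stays synchronous; only the caller decides to offload it so the two waits overlap.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class UserGroupsDemo
{
    // Hypothetical synchronous stand-in for the DirectoryServices query.
    public static string[] GetUserGroups(string samAccountName)
    {
        Thread.Sleep(100); // simulates a blocking network call
        return new[] { samAccountName + "-users", samAccountName + "-admins" };
    }

    // Hypothetical truly asynchronous call (e.g. an HTTP request).
    public static async Task<string> GetOtherDataAsync()
    {
        await Task.Delay(100);
        return "other data";
    }

    // The caller offloads the blocking call so it overlaps with the other I/O.
    // Note that this occupies a thread-pool thread while GetUserGroups blocks.
    public static async Task<(string[] Groups, string Other)> LoadAsync(string samAccountName)
    {
        var groupsTask = Task.Run(() => GetUserGroups(samAccountName));
        var other = await GetOtherDataAsync();
        var groups = await groupsTask;
        return (groups, other);
    }
}
```

As the answer points out, this trades an extra thread for lower latency; it is not a way to save threads.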
Whether you should create a method for a class depends on the answer to this question: if someone thinks about what this class represents, would they expect the class to have this functionality?
Compare this with the string class and string equality. Most people would think that two strings are equal if they have exactly the same characters in the same order. However, for a lot of applications it can be handy to compare two strings case-insensitively. Instead of changing the equality method of string, a separate class was created: StringComparer, which contains several ways to compare strings using different definitions of equality.
If someone said: "Okay, I've just created a class that provides several methods to compare two strings for equality", would you expect case-insensitive comparison to be one of the methods of this class? Of course you would!
The same should apply to your class. I don't know what your class represents. However, apparently you thought that someone who has an object of this class would be happy to "Get User Groups". That user is happy not to have to know how the method was implemented, or anything about the internals of the class, in order to get the user groups.
This information hiding is an important property of classes. It gives the creator of the class the freedom to change how the class works internally, without having to change how the class is used.
So if everyone who knows what your class represents would think: "of course getting user groups will take a considerable amount of time", and "of course my thread will be waiting idly while getting user groups", then users of your class would expect async/await, to avoid waiting idly.
On the other hand, it might be that users of your class would say: "Well, I know that getting user groups will take some heavy calculations. It will take some time, but my thread will be very busy". In that case, they won't expect an async method.
Assuming that you have a non-async method to get the user groups:
string[] GetUserGroups(string samAccountName) {...}
The async method would be very simple:
Task<string[]> GetUserGroupsAsync(string samAccountName)
{
    return Task.Run(() => GetUserGroups(samAccountName));
}
The only thing you would have to decide is: do the users of my class expect this method?
Advantages and Disadvantages
Disadvantage of having a Sync and an Async method:
People who learn about your class have to learn about more methods
Users of your class can't decide how the async method calls the sync one, without creating an extra async method, which will only add to the confusion
You'll have to add an extra unit test
You'll have to maintain the async method forever.
Advantages of having an async method:
If in the future the user groups are fetched from another process, for instance a database, an XML file, or maybe the internet, then you can change the class internally without having to change its many, many users (after all, your classes are all very popular, aren't they?)
Conclusion
If people look at your class, and they wouldn't even think that fetching user groups would be an async method, then don't create it.
If you think that maybe in future it could be that another process provides the user groups, then it would be wise to prepare your users about this.
I tried to transform a simple sequential loop into a parallel loop using the System.Threading.Tasks library.
The code compiles and returns correct results, but it does not save any computation time; on the contrary, it takes longer.
EDIT: Sorry guys, I have probably oversimplified the question and made some errors doing that.
To add some information: I am running the code on an i7-4700QM, and it is referenced in a Grasshopper script.
Here is the actual code. I also switched to non-thread-local variables:
public static class LineNet
{
    public static List<Ray> SolveCpu(List<Speaker> sources, List<Receiver> targets, List<Panel> surfaces)
    {
        ConcurrentBag<Ray> rays = new ConcurrentBag<Ray>();
        for (int i = 0; i < sources.Count; i++)
        {
            Parallel.For(
                0,
                targets.Count,
                j =>
                {
                    Line path = new Line(sources[i].Position, targets[j].Position);
                    Ray ray = new Ray(path, i, j);
                    if (Utils.CheckObstacles(ray, surfaces))
                    {
                        rays.Add(ray);
                    }
                }
            );
        }
        // copy the bag into the declared return type
        return new List<Ray>(rays);
    }
}
The Grasshopper implementation just collects sources, targets and surfaces, calls SolveCpu, and returns the rays.
I understand that dispatching workload to threads is expensive, but is it that expensive?
Or is the ConcurrentBag just preventing parallel calculation?
Also, my classes are immutable (?), but if I use a common List, the kernel aborts the operation and throws an exception. Can someone tell me why?
Without a good Minimal, Complete, and Verifiable code example that reliably reproduces the problem, it is not possible to provide a definitive answer. The code you posted does not even appear to be an excerpt of real code, because the type declared as the return type of the method isn't the same as the value actually returned by the return statement.
However, the code you posted certainly does not seem like a good use of Parallel.For(). Your Line constructor would have to be fairly expensive to justify parallelizing the task of creating the items. And to be clear, that's the only possible win here.
At the end, you still need to aggregate all of the Line instances that you created into a single list, so all those intermediate lists created for the Parallel.For() tasks are just pure overhead. And the aggregation is necessarily serialized (i.e. only one thread at a time can be adding an item to the result collection), and in the worst way (each thread only gets to add a single item before it gives up the lock and another thread has a chance to take it).
Frankly, you'd be better off storing each local List<T> in a collection, and then aggregating them all at once in the main thread after Parallel.For() returns. Not that that would be likely to make the code perform better than a straight-up non-parallelized implementation. But at least it would be less likely to be worse. :)
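One way to follow that advice is the Parallel.For<TLocal> overload with localInit and localFinally: each worker fills its own List<T>, and the shared result is touched only once per worker, under a lock. The sketch below is an assumption-laden stand-in, not the asker's code: a hypothetical IsVisible predicate replaces Utils.CheckObstacles, plain (i, j) pairs replace Ray, and it also flattens the nested loops into a single range so each parallel iteration handles one pair. Whether it beats a sequential loop still depends on how expensive the per-item work is.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PairSolver
{
    // Hypothetical stand-in for Utils.CheckObstacles.
    private static bool IsVisible(int source, int target) => (source + target) % 3 != 0;

    public static List<(int Source, int Target)> Solve(int sourceCount, int targetCount)
    {
        var result = new List<(int, int)>();
        var gate = new object();

        // One flattened range: index encodes the (i, j) pair.
        Parallel.For(0, sourceCount * targetCount,
            // localInit: one private list per worker
            () => new List<(int, int)>(),
            // body: touches only the worker's private list, so no locking
            (index, _, local) =>
            {
                int i = index / targetCount, j = index % targetCount;
                if (IsVisible(i, j)) local.Add((i, j));
                return local;
            },
            // localFinally: merge each private list once, under a lock
            local => { lock (gate) result.AddRange(local); });

        return result;
    }
}
```

The lock in localFinally runs once per worker rather than once per item, which is the difference from adding every item to a shared ConcurrentBag.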
The bottom line is that you don't seem to have a workload that could benefit from parallelization. If you think otherwise, you'll need to explain the basis for that thought in a clearer, more detailed way.
if I use a common List the kernel aborts the operation and throws an exception, is someone able to tell why?
You're already using (it appears) List<T> as the local data for each task, and indeed that should be fine, as tasks don't share their local data.
But if you are asking why you get an exception if you try to use List<T> instead of ConcurrentBag<T> for the result variable, well that's entirely to be expected. The List<T> class is not thread safe, but Parallel.For() will allow each task it runs to execute the localFinally delegate concurrently with all the others. So you have multiple threads all trying to modify the same not-thread-safe collection concurrently. This is a recipe for disaster. You're fortunate you get the exception; the actual behavior is undefined, and it's just as likely you'll simply corrupt the data structure as cause a run-time exception.
I believe that I understand what a closure is for an anonymous function and am familiar with the traditional pitfalls. Good questions covering this topic are here and here. The purpose is not to understand why or how this works in a general sense but to suss out intricacies I may be unaware of when depending on the behavior of generated closure class references. Specifically, what pitfalls exist when reporting on the behavior of an externally modified variable captured in a closure?
Example
I have a long-running, massively concurrent worker service that has exactly one error case: when it cannot retrieve work. The degree of concurrency (number of conceptual threads to use) is configurable. Note that conceptual threads are implemented as Task<T> via the TPL. Because the service constantly loops trying to get work, multiplied by the unknown degree of concurrency this can mean thousands to tens of thousands of errors per second.
As such, I need a reporting mechanism that is time-bound rather than attempt-bound, that is isolated to its own conceptual thread, and that is cancellable. To that end, I devised a recursive Task lambda that accesses my fault counter every 5 minutes outside of the primary attempt-based looping that is trying to get work:
var faults = 1;
Action<Task> reportDelay = null;
reportDelay =
    // 300000 is 5 min
    task => Task.Delay(300000, cancellationToken).ContinueWith(
        subsequentTask =>
        {
            // `faults` is modified outside the anon method
            Logger.Error(
                $"{faults} failed attempts to get work since the last known success.");
            reportDelay(subsequentTask);
        },
        cancellationToken);

// start the report task - runs concurrently with below
reportDelay.Invoke(Task.CompletedTask);

// example get work loop for context
while (true)
{
    object work = null;
    try
    {
        work = await GetWork();
        cancellationToken.Cancel();
        return work;
    }
    catch
    {
        faults++;
    }
}
Concerns
I understand that, in this case, the generated closure will point by reference to my faults variable (which is incremented whenever any conceptual thread attempts to get work but can't). I likewise understand that this is generally discouraged, but from what I can tell only because it leads to unexpected behavior when the code expects the closure to capture a value.
Here, I want and rely on the closure capturing the faults variable by reference. I want to report the value of the variable around the time the continuation is called (it does not have to be exact). I am mildly concerned about faults being prematurely GC'd but I cancel the loop before exiting that lexical scope making me think it should be safe. Is there anything else I'm not thinking of? What dangers are there when considering closure access outside of mutability of the underlying value?
Answer and Explanation
I have accepted an answer below that refactors the code to avoid the need for closure access by reifying the fault monitor into its own class. However, because this does not answer the question directly, I will include a brief explanation here for future readers of the reliable behavior:
So long as the closed-over variable remains in scope for the life of the closure, it can be relied upon to behave as a true reference variable. The dangers of accessing a variable modified in an outer scope from within a closure are:
You must understand that the variable will behave as a reference within the closure, mutating its value as it is modified in the outer scope. The closure variable will always contain the current runtime value of the outer scope variable, not the value at the time the closure is generated.
You must write your program in such a way as to guarantee that the exterior variable lives at least as long as the anonymous function/closure itself. In C# the compiler takes care of this: a captured local is hoisted into a compiler-generated closure object, which the GC will not collect while the closure is still reachable, so the reference cannot become an invalid pointer.
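A small sketch of the first point (the ClosureDemo name is hypothetical): the lambda captures the variable itself rather than its value at creation time, so it observes increments made afterwards. Interlocked and Volatile are used because, as an answer below notes, the counter is written from several threads.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ClosureDemo
{
    public static (int SeenAtCreation, int SeenLater) Run()
    {
        var faults = 0;
        // The lambda captures the variable `faults`, not its current value.
        Func<int> readFaults = () => Volatile.Read(ref faults);

        var seenAtCreation = readFaults();

        // Increment from several threads after the lambda was created.
        Parallel.For(0, 100, _ => Interlocked.Increment(ref faults));

        // The same lambda now sees the updated value.
        return (seenAtCreation, readFaults());
    }
}
```

Run() returns (0, 100): the same delegate reads two different values because it reads the shared, hoisted variable each time it is invoked.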
Here is a quick alternative that avoids some of the issues you may be concerned with. Also, as @Servy mentioned, just calling a separate async function will do. The ConcurrentStack makes it easy to add and clear; additionally, more information than just the count could be logged.
public class FaultCounter {
    private ConcurrentStack<Exception> faultsSinceLastSuccess;

    // async Task rather than async void, so the caller can await it and observe exceptions
    public async Task RunServiceCommand() {
        faultsSinceLastSuccess = new ConcurrentStack<Exception>();
        // one shared source, so that DoWork's Cancel() also stops the fault logger
        var cts = new CancellationTokenSource();
        var faultCounter = StartFaultLogging(cts.Token);
        var worker = DoWork(cts);
        await Task.WhenAll(faultCounter, worker);
        Console.WriteLine("Done.");
    }

    public async Task StartFaultLogging(CancellationToken token) {
        try {
            while (!token.IsCancellationRequested) {
                Logger.Error($"{faultsSinceLastSuccess.Count} failed attempts to get work since the last known success.");
                faultsSinceLastSuccess.Clear();
                // pass the token so that cancellation ends the 5-minute wait early
                await Task.Delay(300 * 1000, token);
            }
        }
        catch (OperationCanceledException) {
            // expected: DoWork got work and cancelled the token
        }
    }

    public async Task<object> DoWork(CancellationTokenSource cts) {
        while (true) {
            object work = null;
            try {
                work = await GetWork();
                cts.Cancel();
                return work;
            }
            catch (Exception ex) {
                faultsSinceLastSuccess.Push(ex);
            }
        }
    }
}
I see some issues here in your solution:
You read and write the faults variable in a non-thread-safe manner, so in theory either of your threads could see a stale value. You can fix that by using the Interlocked class, especially for the incrementing.
Your action doesn't appear to use its Task parameter, so why do you need it to be an Action<Task>? Also, in the continuation you aren't checking the token's cancellation flag, so, again in theory, your code could run smoothly and you would still get the error emails.
You start the long-running task without the LongRunning flag, which is unfriendly to the task scheduler.
Your recursive action could be rewritten as a while loop instead, removing the unnecessary overhead in your code.
Closures in C# are implemented as a compiler-generated class, so the GC shouldn't be a concern for you, as long as you're looping your retry code.
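Putting several of those points together, here is a sketch (the FaultReporter class and its members are hypothetical) that replaces the recursive continuation with a while loop, passes the token to Task.Delay so cancellation is honored, and uses Interlocked for the increment and for resetting the counter on success, keeping the "since the last known success" meaning.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class FaultReporter
{
    private int _faults;

    // Called by any worker that fails to get work.
    public void RecordFault() => Interlocked.Increment(ref _faults);

    // Called when a worker gets work: resets "failures since last success".
    public void RecordSuccess() => Interlocked.Exchange(ref _faults, 0);

    // Reports the current fault count once per interval until cancelled.
    public async Task RunAsync(TimeSpan interval, Action<int> report, CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                await Task.Delay(interval, token);
                report(Volatile.Read(ref _faults));
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown: the token was cancelled while waiting.
        }
    }
}
```

In the question's service, report would call Logger.Error, the interval would be five minutes, and the worker loop would call RecordFault in its catch block and RecordSuccess before cancelling.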
I am going to use this method in a Load Test which means thousands of calls may happen very quickly from different threads. I am wondering if I have to consider what would happen on subsequent call, where a new WebClient is created but before the prior await is complete?
public static async Task<string> SendRequest(this string url)
{
    using (var wc = new WebClient())
    {
        var bytes = await wc.DownloadDataTaskAsync(url);
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            return await reader.ReadToEndAsync();
        }
    }
}
I use the term reentrant to describe the fact that this method will be called by one or more threads.
So we want to know what potential problems could arise from using this method in a multithreaded context, either through a single call in an environment that has multiple threads, or where multiple calls are being made from one or more threads.
The first thing to look at is what this method exposes externally. If we're designing this method, we can control what it does, but not what the callers do. We need to assume that anyone can do anything with whatever they pass into our method, with the returned value, and with the type/object instance that the method is called on. Let's look at each of these in turn.
The URL:
Obviously the caller can pass in an invalid URL, but that's not an issue that's specific to asynchrony or multithreading. They can't really do anything else with this parameter. They can't mutate the string from another thread after passing it to us, because string is immutable (or at least observably immutable externally).
The return value:
So at first glance, this in fact may appear to be a problem. We're returning an object instance (a Task); that object is being mutated by this method that we're writing (to mark it as faulted, excepted, completed) and it is also likely to be mutated by the caller of this method (to add continuations). It's also quite plausible for this Task to end up being mutated from multiple different threads (the task could be passed to any number of other threads, which could mutate it by adding continuations, or be reading values while we're mutating it).
Fortunately, Task was very specifically designed to support all of these situations, and it will function properly due to the synchronization that it performs internally. As authors of this method, we don't need to care who adds what continuations to our task, from what thread, whether or not different people are adding them at the same time, what order things happen in, whether continuations are added before or after we mark the task as completed, or any of that. While the task can be mutated externally, even from other threads, there's nothing that they could do that would be observable to us, from this method. Likewise, their continuations are going to function appropriately regardless of what we do. Their continuations will always fire some time after the task is marked as completed, or immediately if it was already completed. It doesn't have the possible race conditions that an event based model has of adding an event handler after the event is fired to signal completion.
Finally, we have state of the type/instance.
This one is easy. It's a static method, so there are no instance fields that we could access even if we wanted to. There are also no static fields that this method accesses, so no state is shared between threads that way that we need to be concerned about.
Other than the string input and task output, the state that this method uses is entirely local variables that are never accessible outside of this method. Since this method does everything in a single thread (if there is a synchronization context, or it at least does everything sequentially even if thread pool threads are used), we don't need to worry about any threading issues internally, only what could be happening externally by the caller.
When you're concerned about methods being called multiple times before previous calls have finished, the primary concern here is around access to fields. If the method was accessing instance/static fields, then one would need to consider the implications not only of a method being called with any given input state, but also with what's going on if other methods are accessing those fields at the same time. Since we access none, this is moot for this method.
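To see the point about local state, here is a sketch that fires many overlapping calls at a method shaped like SendRequest, with the download replaced by a hypothetical FakeDownloadAsync (a Task.Delay plus a URL-derived string) so it runs without a network. Because each call touches only its own locals, every result still matches its own input.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public static class ReentrancyDemo
{
    // Hypothetical stand-in for WebClient.DownloadDataTaskAsync.
    private static async Task<byte[]> FakeDownloadAsync(string url)
    {
        await Task.Delay(10);
        return Encoding.UTF8.GetBytes("body of " + url);
    }

    // Same shape as SendRequest: only locals, no static or instance fields.
    public static async Task<string> SendRequest(this string url)
    {
        var bytes = await FakeDownloadAsync(url);
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            return await reader.ReadToEndAsync();
        }
    }

    // Starts `count` overlapping calls and checks each result against its own input.
    public static async Task<bool> AllResultsMatchAsync(int count)
    {
        var urls = Enumerable.Range(0, count).Select(i => $"http://example.test/{i}").ToArray();
        var results = await Task.WhenAll(urls.Select(u => u.SendRequest()));
        return urls.Zip(results, (u, r) => r == "body of " + u).All(ok => ok);
    }
}
```

If SendRequest stored, say, the last downloaded bytes in a static field, the same test could start mixing up results; that is the field access the answer says to look for.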
Suppose I have a static helper class that I'm using a lot in a web app. Suppose that the app receives about 20 requests per second for a sustained period of time and that, by magic, two requests ask the static class to do some work at the exact same nanosecond.
What happens when this happens?
To provide some context, the class is used to perform a LINQ to SQL query: it receives a few parameters, including the UserID, and returns a list of custom objects.
thanks.
It entirely depends on what your "some work" means. If it doesn't involve any shared state, it's absolutely fine. If it requires access to shared state, you'll need work out how to handle that in a thread-safe way.
A general rule of thumb is that a class's public API should be thread-safe for static methods, but doesn't have to be thread-safe for instance methods - typically any one instance is only used within a single thread. Of course it depends on what your class is doing, and what you mean by thread-safe.
What happens when this happens?
If your methods are reentrant, then they are thread safe, and chances are they will work. If those static methods rely on some shared state and you haven't synchronized access to that state, chances are the shared state will get corrupted. And you don't need 20 requests hitting the method in the same nanosecond to corrupt your shared state; two are more than enough if you don't synchronize access.
So static methods by themselves are not evil (well actually they are as they are not unit test friendly but that's another topic), it's the way they are implemented that matters in a multithreaded environment. So you should make them thread safe.
UPDATE:
Since you mentioned LINQ to SQL in the comments: as long as all variables used in the static method are local, the method is thread safe. For example:
public static SomeEntity GetEntity(int id)
{
    using (var db = new SomeDbContext())
    {
        return db.SomeEntities.FirstOrDefault(x => x.Id == id);
    }
}
You must ensure your methods are thread safe, so don't use static fields to store any kind of state. If you are creating new objects inside the static method, there is no problem, because each thread has its own objects.
It depends if the static class has any state or not (i.e. static variables shared across all calls). If it does not, then it's fine. If it does, it's not good. Examples:
// Fine
static class Whatever
{
    // members of a static class must themselves be static
    public static string DoSomething() {
        return "something";
    }
}

// Death from above
static class WhateverUnsafe
{
    static int count = 0;
    public static int Count() {
        return ++count;
    }
}
You can make the second one work fine using locks, but then you risk introducing deadlocks and other concurrency issues.
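For a simple counter like WhateverUnsafe, a lock is not the only option: Interlocked.Increment makes the read-modify-write atomic without taking a lock. A minimal sketch (the WhateverSafe name is hypothetical):

```csharp
using System.Threading;

static class WhateverSafe
{
    static int count = 0;

    // Atomically increments the shared counter and returns the new value.
    public static int Count() => Interlocked.Increment(ref count);

    // Current value, read with acquire semantics.
    public static int Current => Volatile.Read(ref count);
}
```

Calling Count() 10,000 times from a Parallel.For raises the counter by exactly 10,000, whereas the ++count version can lose increments.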
I have built massive web applications with static classes but they never have any shared state.
It crashes in a nasty way (if you are doing this to share state), so avoid doing this in a web app... or, alternatively, protect the reads/writes with a lock:
http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlockslim.aspx
But honestly, you really should avoid using statics unless you REALLY have to, and if you really have to, you must be very careful with your locking strategy and test it to destruction to make sure you have managed to isolate reads and writes from each other.
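A minimal sketch of that reader/writer pattern around a static cache (the UserNameCache class is hypothetical): many readers may hold the read lock at once, a writer gets exclusive access, and try/finally guarantees each lock is released even if the body throws.

```csharp
using System.Collections.Generic;
using System.Threading;

static class UserNameCache
{
    private static readonly Dictionary<int, string> names = new Dictionary<int, string>();
    private static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    public static bool TryGet(int id, out string name)
    {
        rwLock.EnterReadLock(); // shared: many readers at once
        try { return names.TryGetValue(id, out name); }
        finally { rwLock.ExitReadLock(); }
    }

    public static void Set(int id, string name)
    {
        rwLock.EnterWriteLock(); // exclusive: blocks readers and other writers
        try { names[id] = name; }
        finally { rwLock.ExitWriteLock(); }
    }

    public static int Count
    {
        get
        {
            rwLock.EnterReadLock();
            try { return names.Count; }
            finally { rwLock.ExitReadLock(); }
        }
    }
}
```

Every access to the dictionary goes through one of the two lock modes; a single unguarded read or write anywhere would reintroduce the race.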