How to set up global variables per Parallel.ForEach iteration? - c#

I'm looking for a way to set up a variable inside a Parallel.ForEach loop and make the variable easily accessible anywhere in the system, to avoid having to pass all desired values deep into the system as parameters. This is primarily for logging purposes.
Parallel.ForEach(orderIds, options, orderId =>
{
    var currentOrderId = orderId;
});
And sometime later, deep in the code
public void DeepMethod(string searchVal)
{
    // Access currentOrderId here somehow, so I can log this was called for the specified order
}

As noted in the comments, globally-scoped state for concurrently executing code is a poor design choice. If done correctly, you wind up with hard-to-maintain code and contention between concurrently executing code. If done incorrectly, you wind up with hard-to-find, hard-to-fix bugs.
There's not much context in your question, so it's impossible to suggest anything specific. But, given the description you've provided, the usual approach would be to define a class that represents the state for the concurrently executed operation, in which you keep the value or values that you want to be able to access at the "deep" level of the "system" (by this, I infer that you mean "deep" as in depth of call stack, and "system" as in the collection of methods involved in implementing this operation).
By using a class to contain the values and implementation of your concurrently executed operation, you then would have direct access to the value that's specific to that particular branch (thread) of the concurrently executed operation, as an instance field of your class, in the methods implemented in that class.
More broadly: a major tenet in writing concurrent code is to avoid sharing mutable data between threads. Shared data should be immutable (e.g. like a string object), and mutated data (like status values that you seem to be describing here) should be kept in data structures that are private to each thread.
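A minimal sketch of that approach, using hypothetical names (OrderProcessor, Run and Log are illustrative, not from the question):

public class OrderProcessor
{
    private readonly string _orderId; // per-operation state, private to this iteration

    public OrderProcessor(string orderId)
    {
        _orderId = orderId;
    }

    public void Run(string searchVal)
    {
        // ... whatever work the operation does ...
        DeepMethod(searchVal);
    }

    private void DeepMethod(string searchVal)
    {
        // _orderId is available here as an instance field, without being passed down the call stack
        Console.WriteLine($"Order {_orderId}: DeepMethod called with {searchVal}");
    }
}

Each iteration then gets its own instance, so nothing mutable is shared between threads:

Parallel.ForEach(orderIds, options, orderId =>
{
    new OrderProcessor(orderId).Run("some search value");
});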

Related

Is it permissible to cache/reuse Thread.GetNamedDataSlot between threads?

The Thread.GetNamedDataSlot method acquires a named data slot that can be used with Thread.SetData.
Can the result of the GetNamedDataSlot function be cached (and reused across all threads) or should it be invoked in/for every thread?
The documentation does not explicitly say it "shouldn't" be re-used although it does not say it can be either. Furthermore, the example shows GetNamedDataSlot used at every GetData/SetData site; even within the same thread.
For example (note that the BarSlot slot is not created/assigned on every thread from which the TLS is accessed):
public class Foo {
    private static LocalDataStoreSlot BarSlot = Thread.GetNamedDataSlot("foo_bar");

    public static void SetMethodCalledFromManyThreads(string awesome) {
        Thread.SetData(BarSlot, awesome);
    }

    public static void ReadMethodCalledFromManyThreads() {
        Console.WriteLine("Data:" + Thread.GetData(BarSlot));
    }
}
I ask this question in relation to code structure; any micro performance gains, if any, are a freebie. Any critical issues or performance degradation caused by the reuse would make it not a viable option.
Can the result of the GetNamedDataSlot function be cached (and reused across all threads) or should it be invoked in/for every thread?
Unfortunately, the documentation isn't 100% clear on this point. Some interesting passages include…
From Thread.GetNamedDataSlot Method (String):
Data slots are unique per thread. No other thread (not even a child thread) can get that data
And from LocalDataStoreSlot Class:
The data slots are unique per thread or context; their values are not shared between the thread or context objects
At best, these make clear that each thread gets its own copy of the data. But the passages can be read to mean either that the LocalDataStoreSlot itself is per-thread, or simply the data to which it refers is per-thread. I believe it's the latter, but I can't point to a specific MSDN page that says so.
So, we can look at the implementation details:
There is a single slot manager per process, which is used to maintain all of the per-thread slots. A LocalDataStoreSlot returned in one thread can be passed to another thread and used there, and it would be owned by the same manager, and use the same slot index (because the slot table is also per-process). It also happens that the Thread.SetData() method will implicitly create the thread-local data store for that slot if it doesn't already exist.
The Thread.GetData() method simply returns null if you haven't already set a value or the thread-local data store hasn't been created. So, the behavior of GetData() remains consistent whether or not you have called SetData() in that thread already.
Since the slots are managed at a process-level basis, you can reuse the LocalDataStoreSlot values across threads. Once allocated, the slot is used up for all threads, and the data stored for that slot will be unique for each thread. Sharing the LocalDataStoreSlot value across threads shares the slot, but even for a single slot, you get thread-local storage for each thread.
Indeed, looking at it this way, the implementation you show would be the desirable way to use this API. After all, it's an alternative to [ThreadStatic], and the only way to ensure a different LocalDataStoreSlot value for each thread in your code would be either to use [ThreadStatic] (which, if you were willing to use it, you might as well apply to the data itself), or to maintain your own dictionary of LocalDataStoreSlot values, indexed presumably by Thread.ManagedThreadId.
Personally, I'd just use [ThreadStatic]. MSDN even recommends this, and it has IMHO clearer semantics. But if you want to use LocalDataStoreSlot, it seems to me that the implementation you have is correct.
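For comparison, here is a minimal sketch of the [ThreadStatic] alternative, reusing the hypothetical Foo shape from the question's example:

public class Foo {
    [ThreadStatic]
    private static string BarValue; // each thread sees its own copy of this field

    public static void SetMethodCalledFromManyThreads(string awesome) {
        BarValue = awesome;
    }

    public static void ReadMethodCalledFromManyThreads() {
        // Reads the value set on the current thread; null if this thread never called the setter.
        Console.WriteLine("Data:" + BarValue);
    }
}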

Calling static async methods from an ASP.NET project

I'm wondering whether this scenario is thread safe and whether there are issues that I'm not currently seeing:
From an ASP.NET controller I call a non-static method on a non-static class (this class is in another project, and the class is injected into the controller).
This method (which is non-static) does some work and calls some other static method, passing it the userId.
Finally, the static method does some work (for which the userId is needed).
I believe this approach is thread safe, and that everything will be done properly if two users call this method at the same time (let's say in the same nanosecond). Am I correct or completely wrong? If I am wrong, what would be the correct way of using static methods within an ASP.NET project?
EDIT
Here is code :)
This is the call from the controller:
await _workoutService.DeleteWorkoutByIdAsync(AzureRedisFeedsConnectionMultiplexer.GetRedisDatabase(),AzureRedisLeaderBoardConnectionMultiplexer.GetRedisDatabase(), workout.Id, userId);
Here is what DeleteWorkoutByIdAsync looks like:
public async Task<bool> DeleteWorkoutByIdAsync(IDatabase redisDb, IDatabase redisLeaderBoardDb, Guid id, string userId)
{
    using (var databaseContext = new DatabaseContext())
    {
        var workout = await databaseContext.Trenings.FindAsync(id);
        if (workout == null)
        {
            return false;
        }
        databaseContext.Trenings.Remove(workout);
        await databaseContext.SaveChangesAsync();
        await RedisFeedService.StaticDeleteFeedItemFromFeedsAsync(redisDb, redisLeaderBoardDb, userId, workout.TreningId.ToString());
    }
    return true;
}
As you can see, DeleteWorkoutByIdAsync calls the static method StaticDeleteFeedItemFromFeedsAsync, which looks like this:
public static async Task StaticDeleteFeedItemFromFeedsAsync(IDatabase redisDb, IDatabase redisLeaderBoardDd, string userId, string workoutId)
{
    var deleteTasks = new List<Task>();
    var feedAllRedisVals = await redisDb.ListRangeAsync("FeedAllWorkouts:" + userId);
    DeleteItemFromRedisAsync(redisDb, feedAllRedisVals, "FeedAllWorkouts:" + userId, workoutId, ref deleteTasks);
    await Task.WhenAll(deleteTasks);
}
And here is the static method DeleteItemFromRedisAsync, which is called from StaticDeleteFeedItemFromFeedsAsync:
private static void DeleteItemFromRedisAsync(IDatabase redisDb, RedisValue[] feed, string redisKey, string workoutId, ref List<Task> deleteTasks)
{
    var itemToRemove = "";
    foreach (var f in feed)
    {
        if (f.ToString().Contains(workoutId))
        {
            itemToRemove = f;
            break;
        }
    }
    if (!string.IsNullOrEmpty(itemToRemove))
    {
        deleteTasks.Add(redisDb.ListRemoveAsync(redisKey, itemToRemove));
    }
}
"Thread safe" isn't a standalone term. Thread Safe in the the face of what? What kind of concurrent modifications are you expecting here?
Let's look at a few aspects here:
Your own mutable shared state: You have no shared state whatsoever in this code; so it's automatically thread safe.
Indirect shared state: DatabaseContext. This looks like an SQL database, and those tend to be thread "safe", but what exactly that means depends on the database in question. For example, you're removing a Trenings row, and if some other thread also removes the same row, you're likely to get a (safe) concurrency violation exception. And depending on isolation level, you may get concurrency violation exceptions even for certain other mutations of Trenings. At worst that means one failed request, but the database itself won't corrupt.
Redis is essentially single-threaded, so all operations are serialized and in that sense "thread safe" (which might not buy you much). Your delete code gets a set of keys, then deletes at most one of those. If two or more threads simultaneously attempt to delete the same key, it is possible that one thread will attempt to delete a non-existing key, and that may be unexpected to you (but it won't cause DB corruption).
Implicit consistency between redis+SQL: It looks like you're using guids, so the chances of unrelated things clashing are small. Your example only contains a delete operation (which is unlikely to cause consistency issues), so it's hard to speculate whether under all other circumstances redis and the SQL database will stay consistent. In general, if your IDs are never reused, you're probably safe - but keeping two databases in sync is a hard problem, and you're quite likely to make a mistake somewhere.
However, your code seems excessively complicated for what it's doing. I'd recommend you simplify it dramatically if you want to be able to maintain this in the long run.
Don't use ref parameters unless you really know what you're doing (and it's not necessary here).
Don't mix up strings with other data types, so avoid ToString() where possible. Definitely avoid nasty tricks like Contains to check for key equality. You want your code to break when something unexpected happens, because code that "limps along" can be virtually impossible to debug (and you will write bugs).
Don't effectively return an array of tasks if the only thing you can really do is wait for all of them - might as well do that in the callee to simplify the API.
Don't use redis. It's probably just a distraction here - you already have another database, so it's very unlikely you need it here, except for performance reasons, and it's extremely premature to go adding whole extra database engines for a hypothetical performance problem. There's a reasonable chance that the extra overhead of requiring extra connections may make your code slower than if you had just one DB, especially if you can't save many SQL queries.
Note: this answer was posted before the OP amended their question to add their code, revealing that this is actually a question of whether async/await is thread-safe.
Static methods are not a problem in and of themselves. If a static method is self-contained and manages to do its job using local variables only, then it is perfectly thread safe.
Problems arise if the static method is not self-contained (delegates to thread-unsafe code), or if it manipulates static state in a non-thread-safe fashion, i.e. accesses static variables for both read and write outside of a lock statement.
For example, int.Parse() and int.TryParse() are static, but perfectly thread safe. Imagine the horror if they were not thread-safe.
What you are doing here is synchronizing on a list (deleteTasks). If you do this I would recommend one of two things.
1) Use thread-safe collections:
https://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx
2) Let your DeleteItemFromRedisAsync return a Task and await it (see the sketch after this answer).
Although I don't think there are any issues in this particular case, as soon as you refactor it so that DeleteItemFromRedisAsync can get called multiple times in parallel, you will have issues. The reason is that if multiple threads can modify your list of deleteTasks, you are no longer guaranteed to collect them all (https://msdn.microsoft.com/en-us/library/dd997373(v=vs.110).aspx - if two threads do an "Add"/add-to-the-end in a non-thread-safe way at the same time, one of them is lost), so you might have missed a task when waiting for all of them to finish.
Also, I would avoid mixing paradigms. Either use async/await, or keep track of a collection of tasks and let methods add to that list; don't do both. This will help the maintainability of your code in the long run. (Note: methods can still return a task, you collect those and then wait for all of them, but then the collecting method is responsible for any threading issues instead of them being hidden in the method that is being called.)
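A minimal sketch of option 2, keeping the question's matching logic but letting the method await the Redis call itself (this assumes the same StackExchange.Redis IDatabase API used in the question):

private static async Task DeleteItemFromRedisAsync(IDatabase redisDb, RedisValue[] feed, string redisKey, string workoutId)
{
    foreach (var f in feed)
    {
        if (f.ToString().Contains(workoutId))
        {
            // Await the removal here instead of collecting tasks in a shared list.
            await redisDb.ListRemoveAsync(redisKey, f);
            return;
        }
    }
}

The caller then simply awaits it:

await DeleteItemFromRedisAsync(redisDb, feedAllRedisVals, "FeedAllWorkouts:" + userId, workoutId);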

Parallel processing of two objects of same class MRO

If we have two objects of the same class that run in parallel to each other
object1 // runs on processor 1
object2 // runs on processor 2
In C#, class objects have their own set of data members but share the same set of functions.
How will the compiler allocate methods to class objects if both objects want to execute the same method at the same time?
object1.process();
object2.process();
How will the compiler decide the priority between objects of the same class at run time?
I think I understand the question... Methods are code. They are bytes like data members, but you can be sure those bytes do not change. So there is no issue with "allocation"; the code can be executed on any thread at any time without the risk of data corruption.
Indirectly, however, the method's code may access data members, and you will have to make sure those members are not changed in an interleaved manner by the different threads.
You can do this in a number of ways, which I am sure are documented all over the net (check re-entrancy, locking, semaphores, mutexes and atomic operations).
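To illustrate the distinction, a minimal sketch with hypothetical names: per-instance fields are fine when each instance is used by only one thread (as with object1/object2 above), while state shared between the threads needs protection, e.g. a lock.

public class Worker
{
    private int _localCount;                           // per-instance: object1 and object2 each have their own
    private static int _sharedCount;                   // static: shared by every instance and every thread
    private static readonly object _sync = new object();

    public void Process()
    {
        // Safe without locking here, assuming each instance is only used by one thread.
        _localCount++;

        // Needed: _sharedCount is visible to both threads, so updates must not interleave.
        lock (_sync)
        {
            _sharedCount++;
        }
    }
}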

C# TPL: Invoke method on outer scoped instance

So my title was fairly obscure; here is what I'm worried about: can I invoke a method on an instance of a class that is declared outside of the block without suffering pitfalls, i.e. are there concurrency issues for code structured as below?
HeavyLifter hl = new HeavyLifter();
var someActionBlock = new ActionBlock<string>(n =>
{
    int liftedStuff = hl.DoSomeHeavyLifting(n);
    if (liftedStuff > 0)
        .....
});
The source of my concerns are below.
The Block may have multiple threads running at the same time, and each of these threads may enter the DoSomeHeavyLifting method. Does each function invocation get its own frame pointer? Should I make sure I don't reference any variables outside of the DoSomeHeavyLifting scope?
Is there a better way to do this than to instantiate a HeavyLifter in my block?
Any help is greatly appreciated. I'm not too lost, but I know concurrency is the king of latent errors and corner cases.
Assuming that by frame pointer you mean stack frame, then yes, each invocation gets its own stack frame and associated variables. If parameters to the function are reference types, then all of the parameters will refer to the same object.
Whether or not it's safe to use the same HeavyLifter instance for all invocations depends on whether the DoSomeHeavyLifting method has side effects. That is, whether DoSomeHeavyLifting modifies any of the contents of the HeavyLifter object's state. (or any other referenced objects)
Ultimately whether it is safe to do this depends largely on what DoSomeHeavyLifting does internally. If it's carefully constructed in order to be reentrant then there are no problems calling it the way you have it. If however, DoSomeHeavyLifting modifies the state, or the state is modified as a side effect of any other operation, then the decision would have to be made in the context of the overall architecture how to handle it. For example, do you allow the state change, and enforce atomicity, or do you prevent any state change that affects the operation? Without knowing what the method is actually doing, it's impossible to give any specific advice.
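For illustration, a minimal sketch under those assumptions (the HeavyLifter shape here is hypothetical): the first method is reentrant because it only uses its parameter and locals; the second mutates instance state shared by all invocations, so it needs some form of synchronization (here, Interlocked) to stay correct under concurrent calls.

public class HeavyLifter
{
    private int _totalLifted;

    // Reentrant: no shared state is read or written, so concurrent calls are safe.
    public int DoSomeHeavyLifting(string n)
    {
        return n?.Length ?? 0;
    }

    // Mutates shared instance state; safe for concurrent callers only because
    // the update is performed atomically via Interlocked.
    public int DoSomeHeavyLiftingAndCount(string n)
    {
        var lifted = n?.Length ?? 0;
        Interlocked.Add(ref _totalLifted, lifted);
        return lifted;
    }
}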
In general when designing for concurrency it's usually best to assume the worst:
If a race condition can happen, it will.
When a race condition happens, you will lose the race in the most complex way your code allows.
Non-atomic state updates will corrupt each other, and leave your object in an undefined state.
If you use a lock there will be a case where you could deadlock.
Something that doesn't ever happen in debug will always happen in release.

Does using ConcurrentDictionary TryGetValue within an if statement make the if contents thread-safe?

If I have a ConcurrentDictionary and use TryGetValue within an if statement, does this make the if statement's contents thread safe? Or must you still lock within the if statement?
Example:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
    //Users is a list.
    client.Users.Add(item);
}
or do I have to do:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
    lock (client)
    {
        //Users is a list.
        client.Users.Add(item);
    }
}
Yes, you have to lock inside the if statement; the only guarantee you get from ConcurrentDictionary is that its methods are thread safe.
The accepted answer could be misleading, depending on your point of view and the scope of thread safety you are trying to achieve. This answer is aimed at people who stumble on this question while learning about threading and concurrency:
It's true that locking on the output of the dictionary retrieval (the Client object) makes some of the code thread safe, but only the code that is accessing that retrieved object within the lock. In the example, it's possible that another thread removes that object from the dictionary after the current thread retrieves it. (Even though there are no statements between the retrieval and the lock, other threads can still execute in between.) Then, this code would add the item to the Client object's Users list even though the Client is no longer in the concurrent dictionary. That could cause an exception, a synchronization problem, or a race condition.
It depends on what the rest of the program is doing. But in the scenario I'm describing, it would be safer to put the lock around the entire dictionary retrieval. And then a regular dictionary might be faster and simpler than a concurrent dictionary, as long as you always lock on it while using it!
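A minimal sketch of that alternative (the field names and wrapping method are illustrative, and User stands in for whatever type item is): as long as every reader and writer, including whatever removes clients, takes the same lock, the lookup and the modification happen as one atomic step.

private readonly object _sync = new object();
private readonly Dictionary<Guid, Client> _clients = new Dictionary<Guid, Client>();

public void AddUserToClient(Guid clientGUID, User item)
{
    lock (_sync)
    {
        if (_clients.TryGetValue(clientGUID, out var client))
        {
            // Still inside the lock, so no other thread can remove or replace
            // this client between the lookup and the Add.
            client.Users.Add(item);
        }
    }
}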
While both of the current answers are technically true I think that the potential exists for them to be a little misleading and they don't express ConcurrentDictionary's big strengths. Maybe the OP's original way of solving the problem with locks worked in that specific circumstance but this answer is aimed more generally towards people learning about ConcurrentDictionary for the first time.
ConcurrentDictionary is designed so that you don't have to use locks. It has several specialty methods designed around the idea that some other thread could modify the object in the dictionary while you're currently working on it.

For a simple example, the TryUpdate method lets you check whether a key's value has changed between when you got it and the moment that you're trying to update it. If the value that you've got matches the value currently in the ConcurrentDictionary, you can update it and TryUpdate returns true. If not, TryUpdate returns false. The documentation for the TryUpdate method can make this a little confusing because it doesn't make it explicitly clear why there is a comparison value, but that's the idea behind the comparison value.

If you want a little more control around adding or updating, you can use one of the overloads of the AddOrUpdate method to either add a value for a key if it doesn't exist at the moment that you're trying to add it, or update the value if some other thread has already added a value for the specified key. The context of whatever you're trying to do will dictate the appropriate method to use. The point is that, rather than locking, try taking a look at the specialty methods that ConcurrentDictionary provides and prefer those over trying to come up with your own locking solution.
In the case of OP's original question, I would suggest that instead of this:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
    //Users is a list.
    client.Users.Add(item);
}
One might try the following instead*:
ConcurrentDictionary<Guid, Client> m_Clients;
Client originalClient;
if (m_Clients.TryGetValue(clientGUID, out originalClient))
{
    //The Client object will need to implement IEquatable if more
    //than an object instance comparison needs to be done. This
    //sample code assumes that Client implements IEquatable.

    //If copying a Client is not trivial, you'll probably want to
    //also implement a simple type of copy in a method of the Client
    //object. This sample code assumes that the Client object has
    //a ShallowCopy method to do this copy for simplicity's sake.
    Client modifiedClient = originalClient.ShallowCopy();

    //Make whatever modifications to modifiedClient that need to get
    //made...
    modifiedClient.Users.Add(item);

    //Now update the value in the ConcurrentDictionary
    if (!m_Clients.TryUpdate(clientGUID, modifiedClient, originalClient))
    {
        //Do something if the Client object was updated in between
        //when it was retrieved and when the code here tries to
        //modify it.
    }
}
*Note: in the example above, I'm using TryUpdate for ease of demonstrating the concept. In practice, if you need to make sure that an object gets added if it doesn't exist or updated if it does, the AddOrUpdate method would be the ideal option because the method handles all of the looping required to check for add vs. update and take the appropriate action.
It might seem like it's a little harder at first because it may be necessary to implement IEquatable and, depending on how instances of Client need to be copied, some sort of copying functionality, but it pays off in the long run if you're working with ConcurrentDictionary and objects within it in any serious way.
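For completeness, a minimal sketch of that AddOrUpdate approach (the Client construction in the add delegate is illustrative, and note that the update delegate may run more than once if threads race, so it should be free of side effects):

m_Clients.AddOrUpdate(
    clientGUID,
    // Called if the key is not yet present: build a new Client containing the item.
    key => new Client { Users = new List<User> { item } },
    // Called if the key already exists: return the replacement value.
    (key, existingClient) =>
    {
        var modifiedClient = existingClient.ShallowCopy();
        modifiedClient.Users.Add(item);
        return modifiedClient;
    });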
