Parallel processing of two objects of the same class (MRO) - C#

If we have two objects of the same class that run in parallel:
object1 // runs on processor 1
object2 // runs on processor 2
In C#, each class object has its own set of data members, but all objects share the same set of functions.
How will the compiler allocate methods to class objects if both objects want to execute the same method at the same time?
object1.process();
object2.process();
How will the compiler decide the priorities of objects of the same class at run time?

I think I understand the question... Methods are code. They are bytes, like data members, but you can be sure those bytes do not change. So there is no issue with "allocation"; the code can be executed on any thread at any time without the risk of data corruption.
Indirectly, however, the method's code may access data members, and you will have to make sure those members are not changed in an interleaved manner by the different threads.
You can do this in a number of ways, which I am sure are documented all over the net (look up re-entrancy, locking, semaphores, mutexes and atomic operations).
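For illustration, a minimal sketch (the Counter class and its members are invented for this example, not taken from the question): both objects execute the same shared method body concurrently, and only the per-instance data needs protection.

using System.Threading.Tasks;

class Counter
{
    private int _count;                           // per-instance data member
    private readonly object _gate = new object();

    public void Increment()
    {
        // The method's code is shared by all instances and all threads;
        // only the instance data (_count) has to be guarded.
        lock (_gate) { _count++; }
    }
}

class Program
{
    static void Main()
    {
        var object1 = new Counter();
        var object2 = new Counter();

        // Both objects run the same method at the same time; each locks
        // only its own gate, so they never block each other.
        Task.WaitAll(
            Task.Run(() => object1.Increment()),
            Task.Run(() => object2.Increment()));
    }
}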

Related

How to set up global variables per Parallel.ForEach iteration?

I'm looking for a way to set a variable inside a Parallel.ForEach loop and make it easily accessible anywhere in the system, to avoid having to pass all the desired values deep into the system as parameters. This is primarily for logging purposes.
Parallel.ForEach(orderIds, options, orderId =>
{
    var currentOrderId = orderId;
});
And sometime later, deep in the code
public void DeepMethod(string searchVal)
{
    // Access currentOrderId here somehow, so I can log this was called for the specified order
}
As noted in the comments, globally-scoped state for concurrently executing code is a poor design choice. If done correctly, you wind up with hard-to-maintain code and contention between concurrently executing code. If done incorrectly, you wind up with hard-to-find, hard-to-fix bugs.
There's not much context in your question, so it's impossible to suggest anything specific. But, given the description you've provided, the usual approach would be to define a class that represents the state for the concurrently executed operation, in which you keep the value or values that you want to be able to access at the "deep" level of the "system" (by this, I infer that you mean "deep" as in depth of call stack, and "system" as in the collection of methods involved in implementing this operation).
By using a class to contain the values and implementation of your concurrently executed operation, you then would have direct access to the value that's specific to that particular branch (thread) of the concurrently executed operation, as an instance field of your class, in the methods implemented in that class.
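For instance, a minimal sketch of that shape (OrderProcessor and its members are hypothetical names invented here; DeepMethod is the method from the question):

using System;
using System.Threading.Tasks;

public class OrderProcessor
{
    // Per-operation state: each Parallel.ForEach iteration gets its own
    // instance, so nothing here is shared between threads.
    private readonly string _currentOrderId;

    public OrderProcessor(string orderId) => _currentOrderId = orderId;

    public void Process()
    {
        // ... the operation's work, however deep the call stack goes ...
        DeepMethod("some value");
    }

    public void DeepMethod(string searchVal)
    {
        // _currentOrderId is available at any depth without being passed
        // as a parameter, and it is private to this operation.
        Console.WriteLine($"Order {_currentOrderId}: {searchVal}");
    }
}

// Usage inside the loop:
// Parallel.ForEach(orderIds, options, orderId => new OrderProcessor(orderId).Process());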
More broadly: a major tenet in writing concurrent code is to avoid sharing mutable data between threads. Shared data should be immutable (e.g. like a string object), and mutated data (like status values that you seem to be describing here) should be kept in data structures that are private to each thread.

Is it permissible to cache/reuse Thread.GetNamedDataSlot between threads?

The Thread.GetNamedDataSlot method acquires a slot name that can be used with Thread.SetData.
Can the result of the GetNamedDataSlot function be cached (and reused across all threads) or should it be invoked in/for every thread?
The documentation does not explicitly say it shouldn't be reused, although it does not say it can be, either. Furthermore, the example shows GetNamedDataSlot used at every GetData/SetData site, even within the same thread.
For example (note that the BarSlot slot is not created/assigned on each specific thread from which the TLS is accessed):
public class Foo {
    private static LocalDataStoreSlot BarSlot = Thread.GetNamedDataSlot("foo_bar");
    public static void SetMethodCalledFromManyThreads(string awesome) {
        Thread.SetData(BarSlot, awesome);
    }
    public static void ReadMethodCalledFromManyThreads() {
        Console.WriteLine("Data: " + Thread.GetData(BarSlot));
    }
}
I ask this question in relation to code structure; any micro performance gains, if any, are a freebie. Any critical issue or performance degradation caused by the reuse would make it a non-viable option.
Can the result of the GetNamedDataSlot function be cached (and reused across all threads) or should it be invoked in/for every thread?
Unfortunately, the documentation isn't 100% clear on this point. Some interesting passages include…
From Thread.GetNamedDataSlot Method (String):
Data slots are unique per thread. No other thread (not even a child thread) can get that data
And from LocalDataStoreSlot Class:
The data slots are unique per thread or context; their values are not shared between the thread or context objects
At best, these make clear that each thread gets its own copy of the data. But the passages can be read to mean either that the LocalDataStoreSlot itself is per-thread, or simply the data to which it refers is per-thread. I believe it's the latter, but I can't point to a specific MSDN page that says so.
So, we can look at the implementation details:
There is a single slot manager per process, which is used to maintain all of the per-thread slots. A LocalDataStoreSlot returned in one thread can be passed to another thread and used there, and it would be owned by the same manager, and use the same slot index (because the slot table is also per-process). It also happens that the Thread.SetData() method will implicitly create the thread-local data store for that slot if it doesn't already exist.
The Thread.GetData() method simply returns null if you haven't already set a value or the thread-local data store hasn't been created. So, the behavior of GetData() remains consistent whether or not you have called SetData() in that thread already.
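A quick illustration of that behavior (a sketch; "demo" is an arbitrary slot name, and System.Threading is assumed to be imported):

LocalDataStoreSlot slot = Thread.GetNamedDataSlot("demo");
Console.WriteLine(Thread.GetData(slot) == null); // True: nothing set on this thread yet
Thread.SetData(slot, 42);                        // implicitly creates this thread's store
Console.WriteLine(Thread.GetData(slot));         // 42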
Since the slots are managed at a process-level basis, you can reuse the LocalDataStoreSlot values across threads. Once allocated, the slot is used up for all threads, and the data stored for that slot will be unique for each thread. Sharing the LocalDataStoreSlot value across threads shares the slot, but even for a single slot, you get thread-local storage for each thread.
Indeed, looking at it this way, the implementation you show is the desirable way to use this API. After all, it's an alternative to [ThreadStatic], and the only way to ensure a different LocalDataStoreSlot value for each thread in your code would be either to use [ThreadStatic] (which, if you wanted to use it, you might as well have used for the data itself), or to maintain your own dictionary of LocalDataStoreSlot values, indexed presumably by Thread.ManagedThreadId.
Personally, I'd just use [ThreadStatic]. MSDN even recommends this, and it has IMHO clearer semantics. But if you want to use LocalDataStoreSlot, it seems to me that the implementation you have is correct.
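For comparison, a minimal [ThreadStatic] version of the Foo example above (a sketch, not code from the question):

public static class Foo {
    // Each thread sees its own copy of this field. Avoid an inline
    // initializer: it would run only on the first thread that touches the type.
    [ThreadStatic]
    private static string _bar;

    public static void SetMethodCalledFromManyThreads(string awesome) {
        _bar = awesome;
    }

    public static void ReadMethodCalledFromManyThreads() {
        Console.WriteLine("Data: " + _bar);
    }
}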

How to map ImmutableArray without getting it cast to IEnumerable which is not thread safe?

So I'm working in a multithreaded environment and I want to use ImmutableArray all the time because it's thread-safe.
Unfortunately, ImmutableArray implements thread-unsafe interfaces, and so the Select method from LINQ returns IEnumerable<T>.
This way, my thread-safe variable becomes thread-unsafe.
How do I map from ImmutableArray<T> to ImmutableArray<U>?
It seems that there are a lot of misunderstandings behind this question. You need to go look at the source code for the Select method and learn about the yield keyword.
Second, LINQ methods are made to be short-lived. You have various threads doing various processing tasks. Are you using a pipeline situation, where you want to transform data in one thread and pass the result to another thread? You have to be careful with the yield keyword in that situation; essentially, you need to flush (er, realize, for lack of a better word) your collections before passing them to the next thread so that the actual work is done in the present thread. In that scenario, object ownership kicks in and you don't need thread-safe collections.
In short, the enumerable returned from calling Select on ImmutableArray is perfectly thread-safe. You can realize it at any point and it won't give you any errors. Of course it will only iterate through the data that was contained in your collection at the time you called Select. It won't know anything about newly assigned instances.
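A minimal sketch of that realize-then-hand-off idea (the numbers/labels names are invented for illustration):

using System;
using System.Collections.Immutable;
using System.Linq;

ImmutableArray<int> numbers = ImmutableArray.Create(1, 2, 3);

// Select is lazy; ToImmutableArray realizes the projection on the current
// thread, producing a new immutable (and therefore freely shareable) array.
ImmutableArray<string> labels = numbers.Select(n => "#" + n).ToImmutableArray();

Console.WriteLine(string.Join(", ", labels)); // #1, #2, #3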

Can a write scheduler be optimized using pending writes, total writes, and total bytes written?

I'd like to confirm that this interview question is essentially impossible:
The goal of this project is to create an implementation of a device
that can write files to a directory and then to design a write
scheduling system for a set of devices. The write scheduler system
should consist of an interface named WriteScheduler and at least two
concrete implementations of it. The first should be a fair round robin
scheduler and the second should be a write scheduler that uses some
combination of pending writes, total writes, and total bytes written
to optimize the writes in some way. All code for this project should
be thread-safe.
Given an unspecified interface, how could a scheduler be optimized with just that data?
It says "...to optimize the writes in some way".
So exactly what you attempt to optimize for is apparently pretty much up to you. For that matter, it may not even be particularly important that your attempted optimization be highly (or at all) successful.
If I had to guess, I'd say they're probably mostly interested in the basic idea that you can define an abstract interface, then write a couple of implementations of that interface that differ in at least some semi-meaningful way (but still meet the interface's specification).
I presume the c++ tag was a mistake?
OK, to the subject:
From what you've given us, I can imagine an interface like this...
IDevice: this can even be an empty marker interface.
Device:
- must implement the IDevice interface,
- its constructor consumes an IWriteScheduler instance,
- must be thread-safe for the following operations:
- method open(string path),
- method create(string path),
- method read(out byte[] buffer, int attempt_read_n),
- method write(in byte[] buffer),
- method close(),
- the device will place a lock around all public methods,
- all methods are potentially blocking.
Now IWriteScheduler:
Task WriteAsync(IDevice device, int size, Action writing_task)
Device behaviour:
- in a write operation, the device creates an action that performs the actual write of the data into a file,
- it calls the IWriteScheduler instance, e.g. IWriteScheduler.WriteAsync(this, 41561, action), and then waits on the returned task,
- OR the device can return the Task to the caller of the write operation, to let the caller decide whether waiting is appropriate.
That is all.
WriteScheduler:
- implements IWriteScheduler,
- has to hold a lock around the WriteAsync operation,
- has a private class DeviceData { IDevice, total_bytes, pending_writes, total_writes, Task last_task },
- the initial last_task has to be an already-completed task (one can be created even under older .NET versions),
- keeps a dictionary of DeviceData keyed by IDevice,
- exposes a virtual WriteAsync method whose default implementation is round robin,
- which means that in WriteAsync the scheduler updates the correct DeviceData (looked up by the IDevice dictionary key) and uses that DeviceData's last_task to start the next write by chaining with ContinueWith,
- this simple chaining of tasks ensures round-robin scheduling.
Other scheduling policies follow simply by inheriting from WriteScheduler and overriding:
virtual Task WriteAsync(IDevice device, int size, Action writing_task);
Task chaining even lets you put delays on write operations, and enables many more tricks; see the sketch below.
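A minimal sketch of that default round-robin chaining (all names here are hypothetical, merely following the outline above):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public interface IDevice { }

public interface IWriteScheduler
{
    Task WriteAsync(IDevice device, int size, Action writingTask);
}

public class WriteScheduler : IWriteScheduler
{
    private readonly object _gate = new object();
    private readonly Dictionary<IDevice, DeviceData> _devices =
        new Dictionary<IDevice, DeviceData>();

    public virtual Task WriteAsync(IDevice device, int size, Action writingTask)
    {
        lock (_gate)
        {
            if (!_devices.TryGetValue(device, out DeviceData data))
            {
                data = new DeviceData();
                _devices[device] = data;
            }
            data.PendingWrites++;
            data.TotalWrites++;
            data.TotalBytes += size;

            // Chain the new write onto the device's previous one; the chain
            // serializes writes per device, which yields the behaviour
            // described above.
            data.LastTask = data.LastTask.ContinueWith(_ =>
            {
                writingTask();
                lock (_gate) { data.PendingWrites--; }
            });
            return data.LastTask;
        }
    }

    protected class DeviceData
    {
        public Task LastTask = Task.CompletedTask; // initial completed task
        public int PendingWrites;
        public long TotalWrites;
        public long TotalBytes;
    }
}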
For scheduling based on pending_writes, total_writes and total_bytes: there are many ways to schedule, and many specific rules that such a data structure can support:
- there could be a writes-per-second limit,
- there could be a bytes-per-second limit,
- these could help keep hard drives from burning out :),
- both could be handled within this data structure by inserting tasks that call Thread.Sleep, which would work but would be awful :),
- a better solution for delays would be a System.Threading.Timer inside DeviceData and a queue of writing tasks.
I hope this was helpful; I have provided an analysis of how the interface could be implemented, along with some ideas for scheduling scenarios...
/IP/

thread-safety of primitive concurrent read and write

Simplified illustration below: how does .NET deal with such a situation?
And if it would cause problems, would I have to lock/gate access to each and every field/property that might at times be written to and accessed from different threads?
A field somewhere
public class CrossRoads {
    public int _timeouts;
}
A background thread writer
public void TimeIsUp(CrossRoads crossRoads){
    crossRoads._timeouts++;
}
Possibly at the same time, trying to read elsewhere
public void HowManyTimeOuts(CrossRoads crossRoads){
    int timeOuts = crossRoads._timeouts;
}
The simple answer is that the above code has the ability to cause problems if accessed simultaneously from multiple threads.
The .NET Framework provides two solutions: interlocking and thread synchronization.
For simple data type manipulation (e.g. ints), interlocking using the Interlocked class will work correctly and is the recommended approach.
In fact, Interlocked provides specific methods (Increment and Decrement) that make this process easy:
Add an IncrementCount method to your CrossRoads class:
public void IncrementCount() {
    Interlocked.Increment(ref _timeouts);
}
Then call this from your background worker:
public void TimeIsUp(CrossRoads crossRoads){
    crossRoads.IncrementCount();
}
Reads of the value are atomic, unless it is a 64-bit value on a 32-bit OS. See the Interlocked.Read method documentation for more detail.
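For the 64-bit case, a small sketch (Stats and _total are invented names, with System.Threading assumed to be imported):

class Stats {
    private long _total;   // a 64-bit field updated by other threads

    public long ReadTotal() {
        // On a 32-bit OS a plain read of a long can tear across two 32-bit
        // reads; Interlocked.Read performs it as one atomic operation.
        return Interlocked.Read(ref _total);
    }
}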
For class objects or more complex operations, you will need to use thread synchronization locking (lock in C# or SyncLock in VB.Net).
This is accomplished by creating a static synchronization object at the level the lock is to be applied (for example, inside your class), obtaining a lock on that object, and performing (only) the necessary operations inside that lock:
private static object SynchronizationObject = new Object();
public void PerformSomeCriticalWork()
{
lock (SynchronizationObject)
{
// do some critical work
}
}
The good news is that reads and writes to ints are guaranteed to be atomic, so no torn values. However, it is not guaranteed to do a safe ++, and the read could potentially be cached in registers. There's also the issue of instruction re-ordering.
I would use:
Interlocked.Increment(ref crossroads._timeouts);
For the write, which will ensure no values are lost, and;
int timeouts = Interlocked.CompareExchange(ref crossroads._timeouts, 0, 0);
For the read, since this observes the same rules as the increment. Strictly speaking "volatile" is probably enough for the read, but it is so poorly understood that the Interlocked seems (IMO) safer. Either way, we're avoiding a lock.
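Putting both together (a sketch reusing the question's method names; the Console output line is added purely for illustration):

// Writer thread: an atomic, lost-update-free increment.
public void TimeIsUp(CrossRoads crossRoads){
    Interlocked.Increment(ref crossRoads._timeouts);
}

// Reader thread: CompareExchange(ref x, 0, 0) returns the current value and
// writes 0 back only if the value was already 0, so it acts as a pure
// atomic read with the same memory semantics as the increment.
public void HowManyTimeOuts(CrossRoads crossRoads){
    int timeOuts = Interlocked.CompareExchange(ref crossRoads._timeouts, 0, 0);
    Console.WriteLine("Timeouts so far: " + timeOuts);
}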
Well, I'm not a C# developer, but this is how it typically works at this level:
how does .NET deal with such a situation?
Unlocked. The increment is not guaranteed to be atomic.
Would i have to lock/gate access to each and every field/property that might at times be written to + accessed from different threads?
Yes. An alternative would be to make a lock for the object available to the clients, then tell the clients they must lock the object while using the instance. This will reduce the number of lock acquisitions and guarantee a more consistent, predictable state for your clients.
Forget dotnet. At the machine language level, crossRoads._timeouts++ will be implemented as an INC [memory] instruction. This is known as a Read-Modify-Write instruction. These instructions are atomic with respect to multi-threading on a single processor*, (essentially implemented with time-slicing,) but are not atomic with respect to multi-threading using multiple processors or multiple cores.
So:
If you can guarantee that only TimeIsUp() will ever modify crossRoads._timeouts, and if you can guarantee that only one thread will ever execute TimeIsUp(), then it will be safe to do this. The writing in TimeIsUp() will work fine, and the reading in HowManyTimeOuts() (and any place else) will work fine. But if you also modify crossRoads._timeouts elsewhere, or if you ever spawn one more background thread writer, you will be in trouble.
In either case, my advice would be to play it safe and lock it.
(*) They are atomic with respect to multi-threading on a single processor because context switches between threads happen on a periodic interrupt, and on the x86 architectures these instructions are atomic with respect to interrupts, meaning that if an interrupt occurs while the CPU is executing such an instruction, the interrupt will wait until the instruction completes. This does not hold true with more complex instructions, for example those with the REP prefix.
Although an int is a 'native' size for a CPU (which deals in 32 or 64 bits at a time), if you are reading and writing the same variable from different threads, you are best off locking that variable and synchronizing access.
Even where the individual read or write of an int is atomic, there is no guarantee that a compound read-modify-write such as ++ is.
You can also use Interlocked.Increment for your purposes here.
