C# TPL: Invoke method on outer scoped instance

So my title was fairly obscure; here is what I'm worried about. Can I invoke a method on an instance of a class that is declared outside of the block without suffering pitfalls? That is, are there concurrency issues for code structured as below?
HeavyLifter hl = new HeavyLifter();
var someActionBlock = new ActionBlock<string>(n =>
{
    int liftedStuff = hl.DoSomeHeavyLifting(n);
    if (liftedStuff > 0)
        .....
});
The source of my concerns are below.
The Block may have multiple threads running at the same time, and each of these threads may enter the DoSomeHeavyLifting method. Does each function invocation get its own frame pointer? Should I make sure I don't reference any variables outside of the DoSomeHeavyLifting scope?
Is there a better way to do this than to instantiate a HeavyLifter in my block?
Any help is greatly appreciated, I'm not too lost, but I know Concurrency is the King of latent errors and corner cases.

Assuming that by frame pointer you mean stack frame, then yes, each invocation gets its own stack frame and associated local variables. If parameters to the function are reference types, though, parameters in different invocations can still refer to the same object.
Whether or not it's safe to use the same HeavyLifter instance for all invocations depends on whether the DoSomeHeavyLifting method has side effects. That is, whether DoSomeHeavyLifting modifies any of the contents of the HeavyLifter object's state. (or any other referenced objects)
Ultimately whether it is safe to do this depends largely on what DoSomeHeavyLifting does internally. If it's carefully constructed in order to be reentrant then there are no problems calling it the way you have it. If however, DoSomeHeavyLifting modifies the state, or the state is modified as a side effect of any other operation, then the decision would have to be made in the context of the overall architecture how to handle it. For example, do you allow the state change, and enforce atomicity, or do you prevent any state change that affects the operation? Without knowing what the method is actually doing, it's impossible to give any specific advice.
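For example, here is a minimal sketch of that distinction, with hypothetical field and method names; the stateless method is safe to call concurrently from the block, while the stateful variant must guard its shared counter:
public class HeavyLifter
{
    private int _totalLifted;                       // hypothetical shared mutable state
    private readonly object _gate = new object();

    // Safe: touches only its parameter and locals, so concurrent
    // invocations from the block cannot interfere with each other.
    public int DoSomeHeavyLifting(string n)
    {
        return n.Length;                            // stand-in for the real work
    }

    // Needs synchronization: the increment is a read-modify-write on
    // shared state, so concurrent calls must be serialized.
    public int DoSomeHeavyLiftingWithState(string n)
    {
        int lifted = n.Length;
        lock (_gate)
        {
            _totalLifted += lifted;
        }
        return lifted;
    }
}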
In general when designing for concurrency it's usually best to assume the worst:
If a race condition can happen, it will.
When a race condition happens, you will lose the race in the most complex way your code allows.
Non-atomic state updates will corrupt each other, and leave your object in an undefined state.
If you use a lock there will be a case where you could deadlock.
Something that never happens in debug will always happen in release.

Related

Why does my code not speed up with a multithreaded Parallel.For loop?

I tried to transform a simple sequential loop into a parallel computed loop with the System.Threading.Tasks library.
The code compiles and returns correct results, but it does not save any computational cost; if anything, it takes longer.
EDIT: Sorry guys, I probably oversimplified the question and made some errors in doing so.
To append additional information, I am running the code on an i7-4700QM, and it is referenced in a Grasshopper script.
Here is the actual code. I also switched to non-thread-local variables.
public static class LineNet
{
    public static List<Ray> SolveCpu(List<Speaker> sources, List<Receiver> targets, List<Panel> surfaces)
    {
        ConcurrentBag<Ray> rays = new ConcurrentBag<Ray>();
        for (int i = 0; i < sources.Count; i++)
        {
            Parallel.For(
                0,
                targets.Count,
                j =>
                {
                    Line path = new Line(sources[i].Position, targets[j].Position);
                    Ray ray = new Ray(path, i, j);
                    if (Utils.CheckObstacles(ray, surfaces))
                    {
                        rays.Add(ray);
                    }
                }
            );
        }
    }
}
The Grasshopper implementation just collects sources, targets, and surfaces, calls the SolveCpu method, and returns rays.
I understand that dispatching workload to threads is expensive, but is it really that expensive?
Or is the ConcurrentBag just preventing parallel calculation?
Plus, my classes are immutable (?), but if I use a common List the kernel aborts the operation and throws an exception; can someone tell me why?
Without a good Minimal, Complete, and Verifiable code example that reliably reproduces the problem, it is not possible to provide a definitive answer. The code you posted does not even appear to be an excerpt of real code, because the type declared as the return type of the method isn't the same as the value actually returned by the return statement.
However, certainly the code you posted does not seem like a good use of Parallel.For(). Your Line constructor would have to be fairly expensive to justify parallelizing the task of creating the items. And to be clear, that's the only possible win here.
At the end, you still need to aggregate all of the Line instances that you created into a single list, so all those intermediate lists created for the Parallel.For() tasks are just pure overhead. And the aggregation is necessarily serialized (i.e. only one thread at a time can be adding an item to the result collection), and in the worst way (each thread only gets to add a single item before it gives up the lock and another thread has a chance to take it).
Frankly, you'd be better off storing each local List<T> in a collection, and then aggregating them all at once in the main thread after Parallel.For() returns. Not that that would be likely to make the code perform better than a straight-up non-parallelized implementation. But at least it would be less likely to be worse. :)
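As a sketch of that suggestion applied to the question's loop (assuming the Speaker/Receiver/Panel/Ray types and Utils.CheckObstacles from the question): each task accumulates hits into its own List<Ray> via localInit, and the per-task lists are merged once per task in localFinally, rather than contending on every single Add():
var rays = new List<Ray>();                      // plain list; guarded by the lock below
Parallel.For(
    0,
    targets.Count,
    () => new List<Ray>(),                       // localInit: one private list per task
    (j, state, localRays) =>
    {
        Line path = new Line(sources[i].Position, targets[j].Position);
        Ray ray = new Ray(path, i, j);
        if (Utils.CheckObstacles(ray, surfaces))
        {
            localRays.Add(ray);                  // no contention: the list is task-private
        }
        return localRays;
    },
    localRays =>
    {
        lock (rays) { rays.AddRange(localRays); }  // localFinally: merge once per task
    }
);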
The bottom line is that you don't seem to have a workload that could benefit from parallelization. If you think otherwise, you'll need to explain the basis for that thought in a clearer, more detailed way.
if I use a common List the kernel aborts the operation and throws an exception; can someone tell me why?
You're already using (it appears) List<T> as the local data for each task, and indeed that should be fine, as tasks don't share their local data.
But if you are asking why you get an exception if you try to use List<T> instead of ConcurrentBag<T> for the result variable, well, that's entirely to be expected. The List<T> class is not thread-safe, but Parallel.For() will allow each task it runs to execute the localFinally delegate concurrently with all the others. So you have multiple threads all trying to modify the same not-thread-safe collection concurrently. This is a recipe for disaster. You're fortunate you get the exception; the actual behavior is undefined, and it's just as likely you'll simply corrupt the data structure as cause a run-time exception.

How to set up global variables per Parallel.ForEach iteration?

I'm looking for a way to set up a variable inside a Parallel.ForEach loop and make it easily accessible anywhere in the system, to avoid having to pass all desired values deep into the system as parameters. This is primarily for logging purposes.
Parallel.ForEach(orderIds, options, orderId =>
{
    var currentOrderId = orderId;
});
And sometime later, deep in the code
public void DeepMethod(string searchVal)
{
    // Access currentOrderId here somehow, so I can log this was called for the specified order
}
As noted in the comments, globally-scoped state for concurrently executing code is a poor design choice. If done correctly, you wind up with hard-to-maintain code and contention between concurrently executing code. If done incorrectly, you wind up with hard-to-find, hard-to-fix bugs.
There's not much context in your question, so it's impossible to suggest anything specific. But, given the description you've provided, the usual approach would be to define a class that represents the state for the concurrently executed operation, in which you keep the value or values that you want to be able to access at the "deep" level of the "system" (by this, I infer that you mean "deep" as in depth of call stack, and "system" as in the collection of methods involved in implementing this operation).
By using a class to contain the values and implementation of your concurrently executed operation, you then would have direct access to the value that's specific to that particular branch (thread) of the concurrently executed operation, as an instance field of your class, in the methods implemented in that class.
More broadly: a major tenet in writing concurrent code is to avoid sharing mutable data between threads. Shared data should be immutable (e.g. like a string object), and mutated data (like status values that you seem to be describing here) should be kept in data structures that are private to each thread.
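A minimal sketch of that per-operation class, using hypothetical names built around the snippet in the question (OrderProcessor is invented; DeepMethod comes from the question):
public class OrderProcessor
{
    private readonly string _orderId;   // state private to this one operation

    public OrderProcessor(string orderId)
    {
        _orderId = orderId;
    }

    public void Process()
    {
        // ... work that eventually calls down to DeepMethod ...
        DeepMethod("some search value");
    }

    private void DeepMethod(string searchVal)
    {
        // _orderId is directly available here; no global state required
        Console.WriteLine($"Called for order {_orderId}: {searchVal}");
    }
}

// Usage inside the loop: each iteration gets its own instance, so there
// is no shared mutable state between threads.
Parallel.ForEach(orderIds, options, orderId =>
{
    new OrderProcessor(orderId).Process();
});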

Do until a var is not null [closed]

Edit, a few years on: this was clearly a terrible approach, dating from when I was starting out with C#/.NET. Hopefully this question helps another noob with the same "problem".
Is this the best way to approach this scenario?
while (true)
{
    if (Main.ActiveForm != null)
    {
        Main.ActiveForm.Invoke(new MethodInvoker(Main.SomeMethod));
        break;
    }
}
This is performed on a second thread.
Is this the best way to approach this scenario?
Just to clarify, the scenario is "I have a property of reference type; as soon as the property is not null I wish to invoke one of its methods", and the technique is "spin up another thread, busy-wait until the value is not null, invoke, and stop waiting".
The answer to your question is no, this is not the best way to approach this scenario. This is a terrible way to solve this problem for several reasons.
First, the code is simply wrong. The C# language makes no guarantee that this code works. If it works, then it is working by accident, and it can stop working at any time.
There are three reasons that this code is wrong.
The first reason it is wrong is because of the way threads work on modern operating systems. It is possible that the two threads are each assigned to their own processor. When a processor accesses memory on a modern machine, it does not go out to main memory every time. Rather, it fetches hundreds or thousands of nearby values into a cache the first time you hit an address. From then on, it accesses the local cache rather than taking the expensive bus ride back to main memory. The implications of that should be obvious: if one thread is writing and another thread is reading, then one thread might be writing to one processor cache and the other might be reading from an entirely different processor cache. They can be inconsistent forever if nothing forces them to be consistent, and therefore your loop can run forever even if the property has been set on another thread.
(And the "backwards" case is also possible; if the value of the property is now null, and was set at some time in the past, then it is possible that the second thread is reading the old, stale value and not the fresh null value. It therefore could decide to not wait at all, and invoke the method on a stale value of the property.)
The second reason this code is wrong is because it has a race condition. Suppose someone assigns the property to non-null on thread one, and then thread two reads it as non-null so you enter the body of the "if", and then thread three assigns it back to null, and then thread two reads it as null and crashes.
The third reason this code is wrong is because the compiler -- either the C# compiler or the jitter -- is permitted to "optimize" it so that it stays in the loop forever without doing the test a second time. The optimizer is allowed to analyze the code and realize that after the first time through the loop, if the test fails then nothing in the rest of the loop can cause it to succeed. It is permitted to then skip the test the next time through because it "knows" that it cannot succeed. Remember, the optimizer is permitted to make any optimization that would be invisible in a single-threaded program.
The optimizer does not actually make this optimization (to my knowledge) but it is permitted to, and a future version could do so. The optimizer can and does make similar optimizations in similar situations.
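Sketched concretely, using the equivalent simplified loop:
// What the code does today:
while (Main.ActiveForm == null) { }

// What the optimizer is permitted to turn it into, since the rewrite is
// invisible to a single-threaded program:
if (Main.ActiveForm == null)
{
    while (true) { }   // the property is never re-read; spins forever
}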
In order to make this code correct there must be a memory barrier in place. The most common technique for introducing a barrier is to make the access "volatile". The memory barrier forces the processor to abandon its cache and go back to main memory, and discourages the compiler from making aggressive optimizations. Of course, properties may not be volatile, and this technique utterly wrecks performance because it eliminates one of the most important optimizations in modern processors. You might as well be accessing main memory by carrier pigeon; the cost is that onerous compared to hitting the cache.
Second, the code is bad because you are burning an entire processor sitting there in a tight loop checking a property. Imagine a processor is a car. Maybe your business owns four cars. You are taking one of them and driving it around the block non-stop, at high speed, until the mailman arrives. That is a waste of a valuable resource! It will make the entire machine less responsive, on laptops it will chew through battery like there is no tomorrow, it'll create waste heat, it's just bad.
I note however that at least you are marshalling the cross-thread call back to the UI thread, which is correct.
The best way to solve this problem is to not solve it. If you need something to happen when a property becomes non-null, then the best solution is to handle a change event associated with that property.
If you cannot do that then the best solution is to make the action the responsibility of the property. Change the setter so that it does the action when it is set to non-null.
If you can't make it the responsibility of the property, then make it the responsibility of the user who is setting the property. Require that every time the property be set to non-null, that the action be performed.
If you can't do that then the safest way to solve this problem is to NOT spin up another thread. Instead, spin up a timer that signals the main thread every half second or so, and have the timer event handler do the check and perform the action.
Busy-waiting is almost always the wrong solution.
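If you do end up polling, here is a minimal sketch of the timer approach described above, assuming a WinForms context; System.Windows.Forms.Timer raises its Tick event on the UI thread, so no second thread and no Invoke are needed:
var timer = new System.Windows.Forms.Timer { Interval = 500 };   // half a second
timer.Tick += (sender, e) =>
{
    if (Main.ActiveForm != null)
    {
        timer.Stop();        // stop polling once the form exists
        Main.SomeMethod();   // Tick already runs on the UI thread
    }
};
timer.Start();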
All you need to do is attach an event handler to the Activated event of your form. Add the following inside that form's constructor:
Activated += SomeMethod;
And it will be fired whenever you re-activate the form after previously using another application.
The primary advantage of this approach is that you avoid creating a new thread just to have it sitting around doing a spinwait (using up a lot of CPU cycles).
If you want to use this approach, note that you have a race condition: someone else might set Main.ActiveForm to null between your test and your Invoke() call. That would result in an exception.
Copy the variable locally before doing any tests, so the reference you test is the same one you invoke on and cannot be made null out from under you.
while (true)
{
    var form = Main.ActiveForm;
    if (form != null)
    {
        form.Invoke(new MethodInvoker(Main.SomeMethod));
        break;
    }
}
When you use a loop like that, you waste CPU.
The better way to do this is to use events:
// make event object in some shared place
var ev = new ManualResetEvent(false);
// do when form loaded
ev.Set();
// wait in thread
ev.WaitOne();
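Putting those pieces together, a minimal sketch of the event-based version (assuming the form exposes the event and calls Set() from its Load handler):
// In some shared place visible to both the form and the worker thread:
public static readonly ManualResetEvent FormReady = new ManualResetEvent(false);

// Worker thread: block cheaply instead of spinning.
var worker = new Thread(() =>
{
    FormReady.WaitOne();            // sleeps until signalled; burns no CPU
    var form = Main.ActiveForm;     // copy locally, per the earlier answer
    if (form != null)
    {
        form.Invoke(new MethodInvoker(Main.SomeMethod));
    }
});
worker.Start();

// In the form's Load event handler:
// FormReady.Set();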
Use:
while (Main.ActiveForm == null) { }
I would do it like this:
while (Main.ActiveForm == null)
{
    // maybe a sleep here?!
}
Main.ActiveForm.Invoke(new MethodInvoker(Main.SomeMethod));

Why do we need the direct call in a thread-safe call block?

Referring to the thread-safe call tutorial on MSDN, have a look at the following statements:
// InvokeRequired required compares the thread ID of the
// calling thread to the thread ID of the creating thread.
// If these threads are different, it returns true.
if (this.textBox1.InvokeRequired) {
    SetTextCallback d = new SetTextCallback(SetText);
    this.Invoke(d, new object[] { text });
} else {
    this.textBox1.Text = text;
}
Of course, I've used it many times in my code, and I understand roughly why it's there.
But I still have some unresolved questions about those statements, so please help me figure them out.
The questions are:
Can the code run correctly with the statements in the if body only? I tried it, and it only seems to cause a problem if the control is not completely initialized. Are there other problems I don't know about?
What is the advantage of calling the method directly (the else body) instead of going via the invoker? Does it save resources (CPU, RAM) or something else?
Thanks!
You can of course always call using the Invoker, but:
It usually makes the code more verbose and difficult to read.
It is less efficient as there are several extra layers to contend with (setting up delegates, calling the dispatcher and so on).
If you are sure you'll always be on the GUI thread, you can just ignore the above checks and call directly.
If you always run just the first part of the if statement, it will always be fine, as Invoke already checks if you're on the UI thread.
The reason you don't want to do this is that Invoke has to do a lot of work to run your method, even if you're already on the right thread. Here's what it has to do (extracted from the source of Control.cs):
Find the marshaling control via an upward traversal of the parent control chain
Check if the control is an ActiveX control and, if so, demand unmanaged code permissions
Work out if the call needs to be invoked asynchronously to avoid potential deadlock
Take a copy of the calling thread's execution context so the same security permissions will be used when the delegate is finally called
Enqueue the method call, then post a message to invoke the method, then wait (if synchronous) until it completes
None of the steps in the second branch are required during a direct call from the UI thread, as all the preconditions are already guaranteed, so it's definitely going to be faster, although to be fair, unless you're updating controls very frequently, you're very unlikely to notice any difference.
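For context, here is the complete method the quoted fragment belongs to in the MSDN pattern; a minimal sketch assuming a form with a textBox1 control:
private delegate void SetTextCallback(string text);

private void SetText(string text)
{
    if (this.textBox1.InvokeRequired)
    {
        // Called from a worker thread: marshal the call to the UI thread.
        SetTextCallback d = new SetTextCallback(SetText);
        this.Invoke(d, new object[] { text });
    }
    else
    {
        // Already on the UI thread: safe to touch the control directly,
        // and none of the Invoke machinery listed above is needed.
        this.textBox1.Text = text;
    }
}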

C# optimizations and side effects

Can optimizations done by the C# compiler or the JITter have visible side effects?
One example I've thought of:
var x = new Something();
A(x);
B(x);
When calling A(x), x is guaranteed to be kept alive until the end of A, because B uses the same parameter. But if B is defined as
public void B(Something x) { }
Then the call to B(x) can be eliminated by the optimizer, and a GC.KeepAlive(x) call might be necessary instead.
Can this optimization actually be done by the JITter?
Are there other optimizations that might have visible side effects, except stack trace changes?
If your function B does not use the parameter x, then eliminating it and collecting x early does not have any visible side effects.
To be "visible side effects", they have to be visible to the program, not to an external tool like a debugger or object viewer.
When calling A(x) x is guaranteed to be kept alive to the end of A - because B uses the same parameter.
This statement is false. Suppose method A always throws an exception. The jitter could know that B will never be reached, and therefore x can be released immediately. Suppose method A goes into an unconditional infinite loop after its last reference to x; again, the jitter could know that via static analysis, determine that x will never be referenced again, and schedule it to be cleaned up. I do not know if the jitter actually performs these optimizations; they seem dodgy, but they are legal.
Can this optimization (namely, doing early cleanup of a reference that is not used anywhere) actually be done by the JITter?
Yes, and in practice, it is done. That is not an observable side effect.
This is justified by section 3.9 of the specification, which I quote for your convenience:
If the object, or any part of it, cannot be accessed by any possible continuation of execution, other than the running of destructors, the object is considered no longer in use, and it becomes eligible for destruction. The C# compiler and the garbage collector may choose to analyze code to determine which references to an object may be used in the future. For instance, if a local variable that is in scope is the only existing reference to an object, but that local variable is never referred to in any possible continuation of execution from the current execution point in the procedure, the garbage collector may (but is not required to) treat the object as no longer in use.
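To connect that back to the question's GC.KeepAlive suggestion, a sketch assuming Something has a finalizer whose timing is observable:
var x = new Something();
A(x);
B(x);              // if B ignores its parameter, the jitter may already treat
                   // x as unreachable once A's last use of it completes
GC.KeepAlive(x);   // keeps x reachable until this call, so its finalizer
                   // cannot run any earlier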
Can optimizations done by the C# compiler or the JITter have visible side effects?
Your question is answered in section 3.10 of the specification, which I quote here for your convenience:
Execution of a C# program proceeds such that the side effects of each executing thread are preserved at critical execution points. A side effect is defined as a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception. The critical execution points at which the order of these side effects must be preserved are references to volatile fields, lock statements, and thread creation and termination. The execution environment is free to change the order of execution of a C# program, subject to the following constraints:
Data dependence is preserved within a thread of execution. That is, the value of each variable is computed as if all statements in the thread were executed in original program order.
Initialization ordering rules are preserved.
The ordering of side effects is preserved with respect to volatile reads and writes.
Additionally, the execution environment need not evaluate part of an expression if it can deduce that that expression's value is not used and that no needed side effects are produced (including any caused by calling a method or accessing a volatile field). When program execution is interrupted by an asynchronous event (such as an exception thrown by another thread), it is not guaranteed that the observable side effects are visible in the original program order.
The second-to-last paragraph is I believe the one you are most concerned about; that is, what optimizations is the runtime allowed to perform with respect to affecting observable side effects? The runtime is permitted to perform any optimization which does not affect an observable side effect.
Note that in particular data dependence is only preserved within a thread of execution. Data dependence is not guaranteed to be preserved when observed from another thread of execution.
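A sketch of the kind of cross-thread observation that is not protected, with hypothetical fields a and b and no volatile or lock anywhere:
class ReorderDemo
{
    static int a = 0, b = 0;   // plain, non-volatile fields

    static void Writer()
    {
        a = 1;
        b = 2;   // within this thread, program order is preserved
    }

    static void Reader()
    {
        // Running on another thread, this may legally observe b == 2
        // while still seeing a == 0: cross-thread data dependence is
        // not guaranteed without volatile fields or locks.
        if (b == 2 && a == 0)
        {
            System.Console.WriteLine("Writes observed out of order");
        }
    }
}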
If that doesn't answer your question, ask a more specific question. In particular, a careful and precise definition of "observable side effect" will be necessary to answer your question in more detail, if you do not consider the definition given above to match your definition of "observable side effect".
Including B in your question just confuses the matter. Given this code:
var x = new Something();
A(x);
Assuming that A(x) is managed code, then calling A(x) maintains a reference to x, so the garbage collector can't collect x until after A returns. Or at least until A no longer needs it. The optimizations done by the JITer (absent bugs) will not prematurely collect x.
You should define what you mean by "visible side effects." One would hope that JITter optimizations at least have the side effect of making your code smaller or faster. Are those "visible"? Or do you mean "undesirable"?
Eric Lippert has started a great series about refactoring which leads me to believe that the C# Compiler and JITter makes sure not to introduce side effects. Part 1 and Part 2 are currently online.

Categories

Resources