I have a service that I am rewriting to use threading. I understand that state from one thread should not be accessed by another, but I'm a little confused by what constitutes 'state'. Does that mean any field/property/method outside of the method scope?
Specifically, my service looks something like this:
public class MyService
{
    private IRepository<MyClass> repository;
    private ILogger log;
    ...

    public void MyMethod()
    {
        ...
        var t = new Thread(MyMethodAsync);
        t.Start(someState);
    }

    //Is this OK???
    public void MyMethodAsync(object state)
    {
        var someState = (MyState)state;
        log.Log("Starting");
        var someData = repository.GetSomeData(someState.Property);
        //process data
        log.Log("Done");
    }

    //Or should I be doing this:
    public void MyMethodAsync2(object state)
    {
        var someState = (MyState)state;
        lock (log)
        {
            log.Log("Starting");
        }
        lock (repository)
        {
            var someData = repository.GetSomeData(someState.Property);
        }
        //process data
        lock (log)
        {
            log.Log("Done");
        }
    }
}
Er... nope, you don't need to lock resources that are read-only. The purpose of locking is that, if you need to check the value of a resource before writing to it, another thread can't change the value between your read and your write. i.e.:
SyncLock MyQueue
    If MyQueue.Length = 0 Then
        PauseFlag.Reset
    End If
End SyncLock
Without the lock, another thread could add an item to the queue between our checking the queue length and our setting the pause flag, and the process-queue thread would then sit in a paused state even though there was work waiting.
If all threads only access the queue in a read-only fashion (not that I can think of a single useful application of a read-only queue), then there's no need to lock it.
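In C#, the same check-then-act pattern might look roughly like this (a sketch only - the queue, the lock object, the pause flag and the WorkItem type are assumed for illustration, not taken from the original post):
// Assumed fields for the sketch:
private readonly object _queueLock = new object();
private readonly Queue<WorkItem> _workQueue = new Queue<WorkItem>();
private readonly ManualResetEvent _pauseFlag = new ManualResetEvent(true);

private void PauseIfQueueIsEmpty()
{
    lock (_queueLock)
    {
        // The check and the Reset happen under the same lock, so no other
        // thread can enqueue an item between the two operations.
        if (_workQueue.Count == 0)
        {
            _pauseFlag.Reset();
        }
    }
}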
"State" is all the data contained in the class, and the real issue as far as concurrency goes is write access, so your intuition is right.
Even worse, locking read-only structures is a good way to create deadlocks.
I want to start some new threads each for one repeating operation. But when such an operation is already in progress, I want to discard the current task. In my scenario I need very current data only - dropped data is not an issue.
In MSDN I found the Mutex class, but as I understand it, it waits for its turn, blocking the current thread. I also want to ask: does something already exist in the .NET Framework that does the following:
Is some method M already being executed?
If so, return (and let me increase some counter for statistics)
If not, start method M in a new thread
The lock(someObject) statement, which you may have come across, is syntactic sugar around Monitor.Enter and Monitor.Exit.
However, if you use the monitor in this more verbose way, you can also use Monitor.TryEnter which allows you to check if you'll be able to get the lock - hence checking if someone else already has it and is executing code.
So instead of this:
var lockObject = new object();
lock (lockObject)
{
    // do some stuff
}
try this (option 1):
private int _alreadyBeingExecutedCounter;
private readonly object lockObject = new object();

public void MyMethod()
{
    if (Monitor.TryEnter(lockObject))
    {
        // you'll only end up here if you got the lock when you tried to get it -
        // otherwise you'll never execute this code.
        // do some stuff

        // call Exit to release the lock
        Monitor.Exit(lockObject);
    }
    else
    {
        // didn't get the lock - someone else was executing the code above -
        // so I don't need to do any work!
        Interlocked.Increment(ref _alreadyBeingExecutedCounter);
    }
}
(you'll probably want to put a try..finally in there to ensure the lock is released)
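For instance, a sketch of option 1 with the lock wrapped in try..finally (reusing the same lockObject and _alreadyBeingExecutedCounter fields from above):
public void MyMethod()
{
    if (Monitor.TryEnter(lockObject))
    {
        try
        {
            // do some stuff
        }
        finally
        {
            // released even if "do some stuff" throws
            Monitor.Exit(lockObject);
        }
    }
    else
    {
        // someone else already holds the lock and is doing the work
        Interlocked.Increment(ref _alreadyBeingExecutedCounter);
    }
}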
or dispense with the explicit lock altogether and do this
(option 2)
private int _inUseCount;

public void MyMethod()
{
    if (Interlocked.Increment(ref _inUseCount) == 1)
    {
        // do some stuff
    }
    Interlocked.Decrement(ref _inUseCount);
}
[Edit: in response to your question about this]
No - don't use this to lock on. Create a privately scoped object to act as your lock.
Otherwise you have this potential problem:
public class MyClassWithLockInside
{
    public void MethodThatTakesLock()
    {
        lock (this)
        {
            // do some work
        }
    }
}
public class Consumer
{
    private static MyClassWithLockInside _instance = new MyClassWithLockInside();

    public void ThreadACallsThis()
    {
        lock (_instance)
        {
            // Having taken a lock on our instance of MyClassWithLockInside,
            // do something long running
            Thread.Sleep(6000);
        }
    }

    public void ThreadBCallsThis()
    {
        // If thread B calls this while thread A is still inside the lock above,
        // this method will block as it tries to get a lock on the same object
        // ["this" inside the class = _instance outside]
        _instance.MethodThatTakesLock();
    }
}
In the above example, some external code has managed to disrupt the internal locking of our class just by taking out a lock on something that was externally accessible.
It's much better to create a private object that you control, and that no one outside your class has access to, to avoid these sorts of problems; this includes not locking on this or on the type itself (typeof(MyClassWithLockInside)).
One option would be to work with a reentrancy sentinel:
You could define an int field (initialized to 0) and update it via Interlocked.Increment on entering the method, only proceeding if the result is 1. At the end, just do an Interlocked.Decrement.
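A minimal sketch of that sentinel (the field and method names are just examples), with the decrement in a finally so the flag is reset even if the work throws:
private int _sentinel; // 0 = not running, >0 = running

public void MyMethodGuarded()
{
    if (Interlocked.Increment(ref _sentinel) == 1)
    {
        try
        {
            // do the actual work - only one thread at a time gets here
        }
        finally
        {
            Interlocked.Decrement(ref _sentinel);
        }
    }
    else
    {
        // someone else is already running the method; just undo our increment
        Interlocked.Decrement(ref _sentinel);
    }
}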
Another option:
From your description it seems that you have a producer-consumer scenario...
For this case it might be helpful to use something like BlockingCollection as it is thread-safe and mostly lock-free...
Another option would be to use ConcurrentQueue or ConcurrentStack...
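For illustration, a minimal ConcurrentQueue sketch (the MyData type and ProcessData method are placeholders, not from the question):
private readonly ConcurrentQueue<MyData> _queue = new ConcurrentQueue<MyData>();

// producer side - can be called from any thread
public void Enqueue(MyData data)
{
    _queue.Enqueue(data);
}

// consumer side - drains whatever is currently queued
public void DrainQueue()
{
    while (_queue.TryDequeue(out MyData item))
    {
        ProcessData(item); // placeholder for your own processing
    }
}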
You might find some useful information on the following site (the PDF is also downloadable - I recently downloaded it myself). The Advanced Threading chapters on Suspend and Resume or Aborting may be what you are interested in.
You should use the Interlocked class's atomic operations for best performance, since they don't require system-level synchronization (any "standard" primitive needs it, and that involves system-call overhead).
// Simple non-reentrant mutex without ownership. It is easy to extend to support
// those features: set an owner after acquiring the lock (compare a Thread reference
// with Thread.CurrentThread, for example), check for matching identity, and add a
// counter for reentrancy.
// Can't use bool because it's not supported by CompareExchange.
private int _lock;

public bool TryLock()
{
    // if (Interlocked.Increment(ref _inUseCount) == 1)
    // That kind of code is buggy - the counter can change between the increment
    // returning and the condition check. The increment is atomic; this "if" isn't.
    // Use CompareExchange instead.
    // If the value is 0, change it to 1 atomically; the original value is returned.
    // Returns true if this thread successfully occupied the lock.
    return Interlocked.CompareExchange(ref _lock, 1, 0) == 0;
}

public bool Release()
{
    // Returns true if the lock was occupied; false if it was already free.
    return Interlocked.CompareExchange(ref _lock, 0, 1) == 1;
}
So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since collection of data can get quite large I need to be able to occasionally save/clear the collection so as to not run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. we don't get any records added to the collection between the Save and Clear operations).
Currently, if the record count is above a certain threshold, I save the data in the current collection within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();

Parallel.ForEach(inputrecords, input =>
{
    lock (StackLock)
    {
        if (OutRecs.Count >= 50000)
        {
            Save(OutRecs);
            OutRecs.Clear();
        }
    }
    OutRecs.Push(CreateOutputRecord(input));
});

if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to output collection? If not is there a better way to do this?
Your lock will work correctly, but it will not be very efficient because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so performing a lock in each iteration of each thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
    // collection of objects to iterate over
    inputrecords,
    // delegate to initialize thread-local data
    () => new List<OutRecord>(),
    // body of loop
    (inputrecord, loopstate, localstorage) =>
    {
        localstorage.Add(CreateOutputRecord(inputrecord));
        if (localstorage.Count > 1000)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
        return localstorage;
    },
    // localFinally delegate - runs once for each thread/partition after it finishes
    localstorage =>
    {
        if (localstorage.Count > 0)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
    });
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name.
{
    void WriteRecord(T record);
    void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there's too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
An implementation of IRecordWriter could look something like this:
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
    private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
    private readonly int _maxCapacity;
    private bool _flushing;

    public BufferedRecordWriter(int maxCapacity = 100)
    {
        _maxCapacity = maxCapacity;
    }

    public void WriteRecord(T record)
    {
        _buffer.Enqueue(record);
        if (_buffer.Count >= _maxCapacity && !_flushing)
            Flush();
    }

    public void Flush()
    {
        _flushing = true;
        try
        {
            var recordsToWrite = new List<T>();
            while (_buffer.TryDequeue(out T dequeued))
            {
                recordsToWrite.Add(dequeued);
            }
            if (recordsToWrite.Any())
                WriteRecords(recordsToWrite);
        }
        finally
        {
            _flushing = false;
        }
    }

    protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches the maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue, other threads can keep adding records even while Flush is dequeuing them.
That WriteRecords method could be anything specific to how you write your records. Instead of this being an abstract class, the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind, because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface, which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter. This will avoid most concurrent executions, but it's okay if concurrent executions both read from the ConcurrentQueue.
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split into two - one as a buffer, and another as the "final" writer that sends them to a database or file or some other destination.
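As a rough illustration only (the file-based writer below is hypothetical and not part of the original design), a concrete implementation of the abstract class might look like:
public class FileRecordWriter<T> : BufferedRecordWriter<T>
{
    private readonly string _path;

    public FileRecordWriter(string path, int maxCapacity = 100)
        : base(maxCapacity)
    {
        _path = path;
    }

    protected override void WriteRecords(IEnumerable<T> records)
    {
        // Placeholder: append one line per record to a text file.
        File.AppendAllLines(_path, records.Select(r => r.ToString()));
    }
}
The parallel-processing class would then only ever talk to WriteRecord and Flush through the IRecordWriter<T> interface.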
I'm having a hard time wrapping my head around accessing a singleton class with multiple threads.
This article has given me a nice starting point to get my singleton thread safe: http://csharpindepth.com/Articles/General/Singleton.aspx
My singleton class is supposed to treat a group of files as a single unit of data, but process them in a parallel fashion.
I store information of each file in a dictionary and return to the calling thread a unique key (which will be created using a DateTime and a random number) so that each thread can later refer to its own file.
public string AddFileForProcessing(FileForProcessing file)
{
    var id = CreateUniqueFileId();
    var resultFile = CreateResultFileFor(file);
    //These collections are written here and only read elsewhere
    _files.Add(id, file);
    _results.Add(id, resultFile);
    return id;
}
Then later threads call methods passing this id.
public void WriteProcessResultToProperFile(string id, string[] processingResult)
{
    //locate the proper file in dictionary using id and then write information...
    File.AppendAllLines(_results[id].FileName, processingResult);
}
Those methods will be accessed inside a class that:
a) Responds to a FileWatcher's Created event and creates threads that call AddFileForProcessing:
public void ProcessIncomingFile(object sender, EventArgs e)
{
    var file = ((FileProcessingEventArg)e).File;
    ThreadPool.QueueUserWorkItem(
        item =>
        {
            ProcessFile(file);
        });
}
b) Inside ProcessFile, I add the file to the dictionary and start processing.
private void ProcessFile(FileForProcessing file)
{
    var key = filesManager.AddFileForProcessing(file);
    var records = filesManager.GetRecordsCollection(key);
    for (var i = 0; i < records.Count; i++)
    {
        //Do my processing here
        filesManager.WriteProcessResultToProperFile(key, processingResult);
    }
}
Now I don't get what happens when two threads call these methods, given they're both using the same instance.
Each thread will call AddFileForProcessing and WriteProcessResultToProperFile with a different parameter. Does that make them two different calls?
Since each call will operate on a file uniquely identified by an id that belongs to a single thread (i.e., no file will be accessed by more than one thread), can I leave this method as is, or do I still have to "lock" my method?
Yes, as long as you only read from the shared dictionary all should be fine. And you can process the files in parallel as long as they are different files, as you correctly mention.
The documentation explains:
A Dictionary<TKey, TValue> can support multiple readers concurrently, as long as the collection is not modified.
So you can't do anything in parallel (without a lock) if anyone can call AddFileForProcessing, but with calls only to WriteProcessResultToProperFile it will be fine. This implies that if you want to call AddFileForProcessing in parallel, you need locks in both methods (in fact, in all parts of the code that touch this dictionary).
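For example, if AddFileForProcessing really can be called from several threads at once, one option (just a sketch, reusing the fields and helper methods from the question) is a single private lock object used by every member that touches the dictionaries:
private readonly object _syncRoot = new object();

public string AddFileForProcessing(FileForProcessing file)
{
    var id = CreateUniqueFileId();
    var resultFile = CreateResultFileFor(file);
    lock (_syncRoot)
    {
        _files.Add(id, file);
        _results.Add(id, resultFile);
    }
    return id;
}

public void WriteProcessResultToProperFile(string id, string[] processingResult)
{
    string fileName;
    lock (_syncRoot)
    {
        fileName = _results[id].FileName;
    }
    File.AppendAllLines(fileName, processingResult);
}
Alternatively, a ConcurrentDictionary<string, ...> for _files and _results would let you drop the explicit lock entirely.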
I have a C# program that has a list that does writes and reads in separate threads. The write is user initiated and can change the data at any random point in time. The read runs in a constant loop. It doesn't matter if the read is missing data in any given loop, as long as the data it does receive is valid and it gets the new data in a future loop.
After considering ConcurrentBag, I settled on using locks for a variety of reasons (simplicity being one of them). After implementing the locks, a coworker mentioned to me that using temporary references to point to the old List in memory would work just as well, but I am concerned about what will happen if the new assignment and the reference assignment happen at the same time.
Q: Is the temporary reference example below thread safe?
Update: User input provides a list of strings which are used in DoStuff(). You can think of these strings as a definition of constants and as such the strings need to be persisted for future loops. They are not deleted in DoStuff(), only read. UserInputHandler is the only thread that will ever change this list and DoStuff() is the only thread that will ever read from this list. Nothing else has access to it.
Additionally, I am aware of the Concurrent namespace and have used most of the collections in it in other projects, but I have chosen not to use them here because of the extra code complexity that they add (i.e. ConcurrentBag doesn't have a simple Clear() function, etc.). A simple lock is good enough in this situation. The question is only whether the second example below is thread safe.
Lock
static List<string> constants = new List<string>();

//Thread A
public void UserInputHandler(List<string> userProvidedConstants)
{
    lock (constants)
    {
        constants.Clear();
        foreach (var constant in userProvidedConstants)
        {
            constants.Add(constant);
        }
    }
}

//Thread B
public void DoStuff()
{
    lock (constants)
    {
        //Do read-only actions with constants here
        foreach (var constant in constants)
        {
            //readonly actions
        }
    }
}
Reference
static List<string> constants = new List<string>();

//Thread A
public void UserInputHandler(List<string> userProvidedConstants)
{
    lock (constants)
    {
        constants = new List<string>();
        foreach (var constant in userProvidedConstants)
        {
            constants.Add(constant);
        }
    }
}

//Thread B
public void DoStuff()
{
    var constantsReference = constants;
    //Do read-only actions with constantsReference here
    foreach (var constant in constantsReference)
    {
        //readonly actions
    }
}
This is not safe without the lock. Copying the reference to the list doesn't really do anything for you in this context. It's still quite possible for the list that you are currently iterating to be mutated in another thread while you are iterating it, causing all sorts of possible badness.
I think what you're looking for is BlockingCollection. Check out the following link for getting starting using it:
http://msdn.microsoft.com/en-us/library/dd997371%28v=vs.110%29.aspx
Here's an example of using BlockingCollection. ThreadB won't start enumerating the BlockingCollection until there are items available, and when it runs out of items, it will stop enumerating until more items become available (or until the IsCompleted property returns true)
private static readonly BlockingCollection<int> Items = new BlockingCollection<int>();

//ThreadA
public void LoadStuff()
{
    Items.Add(1);
    Items.Add(2);
    Items.Add(3);
    // call Items.CompleteAdding() once no more items will ever be added,
    // so that the consumer's foreach below can finish
}

//ThreadB
public void DoStuff()
{
    foreach (var item in Items.GetConsumingEnumerable())
    {
        //Do stuff here
    }
}
Lock Free is dangerous and not portable. Don't do it. If you need to read on how to do lock-free, you probably shouldn't be doing it.
I think I misunderstood the question. I was under the strange impression that the list was only ever added to, or that only the most recent version mattered. No idea how I came to that when he explicitly shows a Clear() call.
I apologize for the confusion.
This code is being disputed, so use it at your own risk. I'm quite sure it should work on x86/x64, but I have no clue about ARM.
You could do something like this
//Suggested to just use volatile instead of a memory barrier
static volatile IReadOnlyList<int> _MyList = new List<int>().AsReadOnly();

void Load()
{
    var localList = new List<int>(_MyList); // copy the current snapshot
    localList.Add(1);
    localList.Add(2);
    localList.Add(3);
    _MyList = localList.AsReadOnly(); // publish the new read-only snapshot
}

void DoStuff()
{
    var localList = _MyList; // take one local reference and iterate over it
    foreach (var item in localList)
    {
        // read-only work with item
    }
}
This should work well for heavy read workloads. If you have more than one writer that modifies _MyList, you'll need to figure out a way to synchronize them.
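If there is more than one writer, one simple way to synchronize them (a sketch of the same copy-and-publish idea, with illustrative names) is to serialize only the writers behind a private lock while readers stay lock-free:
static readonly object _writeLock = new object();
static volatile IReadOnlyList<int> _myList = new List<int>().AsReadOnly();

static void AddItems(IEnumerable<int> newItems)
{
    lock (_writeLock) // only writers contend for this lock
    {
        var copy = new List<int>(_myList); // copy the current snapshot
        copy.AddRange(newItems);
        _myList = copy.AsReadOnly();       // publish the new snapshot
    }
}

static void ReadItems()
{
    var local = _myList; // readers never take the lock
    foreach (var item in local)
    {
        // read-only work
    }
}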
I have 2 threads that are triggered at the same time and run in parallel. These 2 threads are going to be manipulating a string value, but I want to make sure that there are no data inconsistencies. For that I want to use a lock with Monitor.Pulse and Monitor.Wait. I used a method that I found on another question/answer, but whenever I run my program, the first thread gets stuck at the Monitor.Wait level. I think that's because the second thread has already "Pulsed" and "Waited". Here is some code to look at:
string currentInstruction;

public void nextInstruction()
{
    Action[] actions = {
        fetch,
        decode
    };
    Parallel.Invoke(actions);
    _pc++;
}

public void fetch()
{
    lock (irLock)
    {
        currentInstruction = "blah";
        GiveTurnTo(2, irLock);
        WaitTurn(1, irLock);
    }
    decodeEvent.WaitOne();
}

public void decode()
{
    decodeEvent.Set();
    lock (irLock)
    {
        WaitTurn(2, irLock);
        currentInstruction = "decoding...";
        GiveTurnTo(1, irLock);
    }
}
// Below are the methods I talked about before.
// Which thread currently has its turn (assumed field, not shown in the original)
static int threadInControl = 1;

// Wait for turn to use lock object
public static void WaitTurn(int threadNum, object _lock)
{
    // While( not this thread's turn )
    while (threadInControl != threadNum)
    {
        // "Let go" of lock on SyncRoot and wait until
        // someone finishes their turn with it
        Monitor.Wait(_lock);
    }
}

// Pass turn over to other thread
public static void GiveTurnTo(int nextThreadNum, object _lock)
{
    threadInControl = nextThreadNum;
    // Notify waiting threads that it's someone else's turn
    Monitor.Pulse(_lock);
}
Any idea how to get 2 parallel threads to communicate (manipulate the same resources) within the same cycle using locks or anything else?
You want to run two pieces of code in parallel, but you lock them at the start using the same variable?
As nvoigt mentioned, it already sounds wrong. What you have to do is remove the lock from there. Use it only when you are about to access something exclusively.
Btw, "data inconsistencies" can be avoided by not having them in the first place. Do not use the currentInstruction field directly (is it a field?), but provide a thread-safe CurrentInstruction property.
private object _currentInstructionLock = new object();
private string _currentInstruction;

public string CurrentInstruction
{
    get { return _currentInstruction; }
    set
    {
        lock (_currentInstructionLock)
            _currentInstruction = value;
    }
}
One other thing is naming: local variable names starting with _ are bad style. Some people (including me) use the underscore prefix to distinguish private fields. Property names should start with a capital letter and local variable names with a lowercase one.