In ConcurrentDictionary, is the read operation reading the latest updated value?

I am using a ConcurrentDictionary (ongoingConnectionDic) in my code:
1. I check if a serial port number exists in the dictionary.
2. If it does not exist, I add it to the dictionary.
3. I perform communication with the serial port.
4. I remove the element from ongoingConnectionDic.
5. If it does exist, I put the thread in a wait state.
My question is: can I be sure that when I perform a read operation, no other thread is simultaneously writing or updating the value? In other words, am I reading the most recent value of the dictionary?
If not, how do I achieve what I want?
Sample program:
using System;
using System.Collections.Concurrent;
using System.Threading;

class Program
{
    // Dictionary in question
    private static ConcurrentDictionary<string, string> ongoingPrinterJobs =
        new ConcurrentDictionary<string, string>();

    private static void sendPrint(string printerName)
    {
        if (ongoingPrinterJobs.ContainsKey(printerName))
        {
            // Add to pending list and run a thread to finish pending jobs by calling print();
        }
        else
        {
            // Add it to the dictionary so that no other thread can use the printer
            ongoingPrinterJobs.TryAdd(printerName, "");
            ThreadPool.QueueUserWorkItem(new WaitCallback(print), printerName);
        }
    }

    private static void print(object stateInfo)
    {
        string printerName = (stateInfo as string);
        string dummy;
        // do printing work
        // Remove from dictionary
        ongoingPrinterJobs.TryRemove(printerName, out dummy);
    }

    static void Main(string[] args)
    {
        // Run threads here at random to print something on different printers
        // Sample run with 10 printers
        Random r = new Random();
        for (int i = 0; i < 10; i++)
        {
            sendPrint(r.Next(0, 10).ToString());
        }
    }
}

The concurrent collections are designed so that an enumerator stays valid even if another thread writes to the collection while you're enumerating; this wasn't the case with the standard collections. Some of them (ConcurrentQueue, ConcurrentStack, ConcurrentBag) achieve this by taking a "snapshot" of the collection upon enumeration, while ConcurrentDictionary's enumerator walks the live contents instead.
Either way, a method such as ContainsKey only reflects the state of the dictionary at some moment during the call, so by the time you act on its result you may be reading stale data.
With that said, as others have mentioned in their comments, other issues of thread safety must still be considered (e.g. race conditions).
The only way to prevent someone inserting a value into the collection after you've read a value but before you've written one is to lock the collection prior to reading, ensuring synchronized access to the collection throughout the entire transaction (i.e. the read and the subsequent write of a value).
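For this particular pattern, though, ConcurrentDictionary already offers an atomic check-and-insert: TryAdd returns false if the key is already present. Here is a minimal sketch (my rewording of the question's sendPrint, not the original code) that closes the gap between ContainsKey and TryAdd:
private static void sendPrint(string printerName)
{
    // TryAdd is atomic: exactly one thread can win the race to claim a printer.
    if (ongoingPrinterJobs.TryAdd(printerName, ""))
    {
        // We own the printer now; no other thread saw TryAdd succeed for this key.
        ThreadPool.QueueUserWorkItem(print, printerName);
    }
    else
    {
        // Another job is already in flight for this printer: queue it as pending.
    }
}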

Related

Dictionary of ManualResetEvent - ThreadSafety

I'm synchronizing some threads using a dictionary of ManualResetEvents. It looks something like this. My question is: is it thread-safe to call a getter/indexer of the dictionary like this, or should I call the getter from the context of a lock and store the value in a local variable?
Enum Type
enum RecvType
{
    Type1,
    Type2
    // etc...
}
Dictionary of ManualResetEvents
Dictionary<RecvType, ManualResetEvent> recvSync;
Wait operation
void WaitForRecv(RecvType recvType, int timeout = 10000)
{
    if (!recvSync[recvType].WaitOne(timeout))
    {
        throw new TimeoutException();
    }
    // do stuff
}
EventHandler (called from another thread)
void RecvDone(object sender, RecvType recvType)
{
    recvSync[recvType].Set();
}
EDIT - clarify dictionary population
Dictionary Instantiation
public MyClass()
{
    recvSync = new Dictionary<RecvType, ManualResetEvent>();
    // populate dictionary (not modified after here)
    socketWrapper.RecvDone += RecvDone;
}
According to the documentation:
A Dictionary<TKey,TValue> can support multiple readers concurrently, as long as the collection is not modified.
So your pattern of usage is OK as far as thread safety goes. The behavior of a "frozen" Dictionary<K,V> being read by multiple threads is well defined.
You could consider communicating your intentions more clearly by using an ImmutableDictionary<K,V> instead of a normal Dictionary<K,V>, but that clarity would come with a cost: finding an element in an ImmutableDictionary<K,V> is ~10 times slower.
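For illustration, a minimal sketch of the immutable approach (my addition, assuming the System.Collections.Immutable package is referenced; the field type changes from the question's Dictionary accordingly):
using System.Collections.Immutable;

private readonly ImmutableDictionary<RecvType, ManualResetEvent> recvSync;

public MyClass()
{
    var builder = ImmutableDictionary.CreateBuilder<RecvType, ManualResetEvent>();
    builder.Add(RecvType.Type1, new ManualResetEvent(false));
    builder.Add(RecvType.Type2, new ManualResetEvent(false));
    // The result cannot be mutated, so concurrent reads are trivially safe.
    recvSync = builder.ToImmutable();
    socketWrapper.RecvDone += RecvDone;
}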

Parallel.ForEach: Best way to save off a collection when its record count gets high?

So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since the collection of data can get quite large, I need to be able to occasionally save/clear the collection so as to not run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. that no records get added to the collection between the Save and Clear operations).
Currently I'm saying: if the record count is above a certain threshold, save the data in the current collection, within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();

Parallel.ForEach(inputrecords, input =>
{
    lock (StackLock)
    {
        if (OutRecs.Count >= 50000)
        {
            Save(OutRecs);
            OutRecs.Clear();
        }
    }
    OutRecs.Push(CreateOutputRecord(input));
});
if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to output collection? If not is there a better way to do this?
Your lock will work correctly, but it will not be very efficient, because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so taking a lock in each iteration of each thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
    // collection of objects to iterate over
    inputrecords,
    // delegate to initialize thread-local data
    () => new List<OutRecord>(),
    // body of loop
    (inputrecord, loopstate, localstorage) =>
    {
        localstorage.Add(CreateOutputRecord(inputrecord));
        if (localstorage.Count > 1000)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
        return localstorage;
    },
    // localFinally delegate: runs once per worker after it finishes its share
    localstorage =>
    {
        if (localstorage.Count > 0)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
    });
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name
{
    void WriteRecord(T record);
    void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there are too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
An implementation of IRecordWriter could look something like this:
// requires System.Collections.Concurrent, System.Collections.Generic, System.Linq
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
    private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
    private readonly int _maxCapacity;
    private bool _flushing;

    protected BufferedRecordWriter(int maxCapacity = 100)
    {
        _maxCapacity = maxCapacity;
    }

    public void WriteRecord(T record)
    {
        _buffer.Enqueue(record);
        if (_buffer.Count >= _maxCapacity && !_flushing)
            Flush();
    }

    public void Flush()
    {
        _flushing = true;
        try
        {
            var recordsToWrite = new List<T>();
            while (_buffer.TryDequeue(out T dequeued))
            {
                recordsToWrite.Add(dequeued);
            }
            if (recordsToWrite.Any())
                WriteRecords(recordsToWrite);
        }
        finally
        {
            _flushing = false;
        }
    }

    protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches the maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue, other threads can keep enqueuing records even while Flush is draining it.
That Flush method could be anything specific to how you write your records. Instead of this being an abstract class the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter. This will avoid most concurrent executions, but it's okay if concurrent executions both read from the ConcurrentQueue.
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split into two - one as a buffer, and another as the "final" writer that sends them to a database or file or some other destination.
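To make the shape concrete, here is a hypothetical subclass (my illustration; FileRecordWriter and its path parameter are not from the original answer) that flushes each batch to a file:
// requires System.IO in addition to the usings above
public class FileRecordWriter : BufferedRecordWriter<string>
{
    private readonly string _path;
    private readonly object _writeLock = new object();

    public FileRecordWriter(string path, int maxCapacity = 100)
        : base(maxCapacity)
    {
        _path = path;
    }

    protected override void WriteRecords(IEnumerable<string> records)
    {
        // File.AppendAllLines is not safe for concurrent writers to one file,
        // so serialize the actual I/O even though the buffer itself is lock-free.
        lock (_writeLock)
        {
            File.AppendAllLines(_path, records);
        }
    }
}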

Multithreaded access to singleton class

I'm having a hard time wrapping my head around accessing a singleton class with multiple threads.
This article has given me a nice starting point to get my singleton thread safe: http://csharpindepth.com/Articles/General/Singleton.aspx
My singleton class is supposed to treat a group of files as a single unit of data, but process them in a parallel fashion.
I store information about each file in a dictionary and return to the calling thread a unique key (which will be created using a DateTime and a random number) so that each thread can later refer to its own file.
public string AddFileForProcessing(FileForProcessing file)
{
    var id = CreateUniqueFileId();
    var resultFile = CreateResultFileFor(file);
    // These collections are written here and only read elsewhere
    _files.Add(id, file);
    _results.Add(id, resultFile);
    return id;
}
Then later threads call methods passing this id.
public void WriteProcessResultToProperFile(string id, string[] processingResult)
{
    // Locate the proper file in the dictionary using id, then write the information
    File.AppendAllLines(_results[id].FileName, processingResult);
}
Those methods will be accessed inside a class that:
a) Responds to a FileWatcher's Created event and creates threads that call AddFileForProcessing:
public void ProcessIncomingFile(object sender, EventArgs e)
{
    var file = ((FileProcessingEventArg)e).File;
    ThreadPool.QueueUserWorkItem(
        item =>
        {
            ProcessFile(file);
        });
}
b) Inside ProcessFile, I add the file to the dictionary and start processing.
private void ProcessFile(FileForProcessing file)
{
    var key = filesManager.AddFileForProcessing(file);
    var records = filesManager.GetRecordsCollection(key);
    for (var i = 0; i < records.Count; i++)
    {
        // Do my processing here, producing processingResult
        filesManager.WriteProcessResultToProperFile(key, processingResult);
    }
}
Now I don't get what happens when two threads call these methods, given they're both using the same instance.
Each thread will call AddFileForProcessing and WriteProcessResultToProperFile with a different parameter. Does that make them two different calls?
Since each call operates on a file uniquely identified by an id belonging to a single thread (i.e. no file suffers multiple accesses), can I leave these methods as they are, or do I still have to "lock" them?
Yes, as long as you only read from the shared dictionary all should be fine. And you can process the files in parallel as long as they are different files, as you correctly mention.
The documentation explains:
A Dictionary<TKey, TValue> can support multiple readers concurrently, as long as the collection is not modified.
So you can't do anything in parallel while anyone can call AddFileForProcessing without a lock. But with calls only to WriteProcessResultToProperFile, it will be fine. This implies that if you want to call AddFileForProcessing in parallel, you need locks in both methods (in fact, in every piece of code that touches this dictionary).
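A minimal sketch of what that looks like (my addition; it assumes one lock object, here called _sync, shared by every member that touches the dictionaries):
private readonly object _sync = new object();

public string AddFileForProcessing(FileForProcessing file)
{
    var id = CreateUniqueFileId();
    var resultFile = CreateResultFileFor(file);
    lock (_sync)
    {
        _files.Add(id, file);
        _results.Add(id, resultFile);
    }
    return id;
}

public void WriteProcessResultToProperFile(string id, string[] processingResult)
{
    string fileName;
    lock (_sync)
    {
        fileName = _results[id].FileName;
    }
    // The actual file write can stay outside the lock, since each id maps
    // to a distinct file touched by only one thread.
    File.AppendAllLines(fileName, processingResult);
}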

Thread Safety: Lock vs Reference

I have a C# program that has a list that does writes and reads in separate threads. The write is user-initiated and can change the data at any random point in time. The read runs in a constant loop. It doesn't matter if the read misses data in any given loop, as long as the data it does receive is valid and it gets the new data in a future loop.
After considering ConcurrentBag, I settled on using locks for a variety of reasons (simplicity being one of them). After implementing the locks, a coworker mentioned to me that using temporary references to point to the old List in memory would work just as well, but I am concerned about what will happen if the new assignment and the reference assignment happen at the same time.
Q: Is the temporary reference example below thread-safe?
Update: User input provides a list of strings which are used in DoStuff(). You can think of these strings as a definition of constants, and as such the strings need to be persisted for future loops. They are not deleted in DoStuff(), only read. UserInputHandler is the only thread that will ever change this list, and DoStuff() is the only thread that will ever read from this list. Nothing else has access to it.
Additionally, I am aware of the Concurrent namespace and have used most of the collections in it in other projects, but I have chosen not to use them here because of the extra code complexity they add (e.g. ConcurrentBag doesn't have a simple Clear() function, etc.). A simple lock is good enough in this situation. The question is only whether the second example below is thread-safe.
Lock
static List<string> constants = new List<string>();

//Thread A
public void UserInputHandler(List<string> userProvidedConstants)
{
    lock (constants)
    {
        constants.Clear();
        foreach (var constant in userProvidedConstants)
        {
            constants.Add(constant);
        }
    }
}

//Thread B
public void DoStuff()
{
    lock (constants)
    {
        // Do read-only actions with constants here
        foreach (var constant in constants)
        {
            // read-only actions
        }
    }
}
Reference
static List<string> constants = new List<string>();

//Thread A
public void UserInputHandler(List<string> userProvidedConstants)
{
    lock (constants)
    {
        constants = new List<string>();
        foreach (var constant in userProvidedConstants)
        {
            constants.Add(constant);
        }
    }
}

//Thread B
public void DoStuff()
{
    var constantsReference = constants;
    // Do read-only actions with constantsReference here
    foreach (var constant in constantsReference)
    {
        // read-only actions
    }
}
This is not safe without the lock. Copying the reference to the list doesn't really do anything for you in this context. It's still quite possible for the list that you are currently iterating to be mutated in another thread while you are iterating it, causing all sorts of possible badness.
I think what you're looking for is BlockingCollection. Check out the following link for getting starting using it:
http://msdn.microsoft.com/en-us/library/dd997371%28v=vs.110%29.aspx
Here's an example of using BlockingCollection. ThreadB won't start enumerating the BlockingCollection until there are items available, and when it runs out of items, it will stop enumerating until more items become available (or until the IsCompleted property returns true)
private static readonly BlockingCollection<int> Items = new BlockingCollection<int>();

//ThreadA
public void LoadStuff()
{
    Items.Add(1);
    Items.Add(2);
    Items.Add(3);
}

//ThreadB
public void DoStuff()
{
    foreach (var item in Items.GetConsumingEnumerable())
    {
        // Do stuff here
    }
}
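One detail worth adding (my note, not the answerer's): the consuming foreach only terminates after the producer signals that no more items are coming.
//ThreadA, once all loading is done
public void FinishLoading()
{
    // After this, GetConsumingEnumerable() drains the remaining items and ends,
    // and IsCompleted eventually returns true.
    Items.CompleteAdding();
}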
Lock-free is dangerous and not portable. Don't do it. If you need to read up on how to do lock-free, you probably shouldn't be doing it.
I think I misunderstood the question. I was under the strange impression that the list was only ever added to, or that only the most recent version matters. No idea how I came to that when he explicitly shows a Clear() call.
I apologize for the confusion.
This code is disputed, use at your own risk; I'm fairly sure it works on x86/x64, but I have no clue about ARM.
You could do something like this (the original sketch used a pseudocode ReadOnlyList<T>; here it is made concrete with a list of ints):
// Suggested to just use volatile instead of MemoryBarrier.
// Publish an immutable snapshot; readers grab the reference once.
static volatile IReadOnlyList<int> _myList = new List<int>().AsReadOnly();

void Load()
{
    // Copy, modify, then swap in the new read-only list.
    var local = new List<int>(_myList);
    local.Add(1);
    local.Add(2);
    local.Add(3);
    _myList = local.AsReadOnly(); // making it read-only to make the intent clear
}

void DoStuff()
{
    var local = _myList; // one stable snapshot for the whole loop
    foreach (var item in local)
    {
        // read-only work
    }
}
This should work well for read-heavy workloads. If you have more than one writer modifying _myList, you'll need to figure out a way to synchronize them.
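One way to handle multiple writers (my sketch, not part of the original answer) is a compare-and-swap loop on the published reference; the field is left non-volatile here because a volatile field passed by ref to Interlocked triggers a compiler warning:
using System.Threading;

static IReadOnlyList<int> _myList = new List<int>().AsReadOnly();

void AddItem(int item)
{
    while (true)
    {
        var snapshot = _myList;
        var updated = new List<int>(snapshot) { item }.AsReadOnly();
        // Publish only if no other writer swapped the list in the meantime;
        // otherwise loop and retry against the newer snapshot.
        if (Interlocked.CompareExchange(ref _myList, updated, snapshot) == snapshot)
            break;
    }
}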

Why doesn't a foreach loop work in certain cases?

I was using a foreach loop to go through a list of data to process (removing said data once processed; this was inside a lock). This method caused an ArgumentException now and then.
Catching it would have been expensive, so I tried tracking down the issue, but I couldn't figure it out.
I have since switched to a for loop and the problem seems to have gone away. Can someone explain what happened? Even with the exception message I don't quite understand what took place behind the scenes.
Why does the for loop apparently work? Did I set up the foreach loop wrong, or what?
This is pretty much how my loops were set up:
foreach (string data in new List<string>(Foo.Requests))
{
    // Process the data.
    lock (Foo.Requests)
    {
        Foo.Requests.Remove(data);
    }
}
and
for (int i = 0; i < Foo.Requests.Count; i++)
{
    string data = Foo.Requests[i];
    // Process the data.
    lock (Foo.Requests)
    {
        Foo.Requests.Remove(data);
    }
}
EDIT: The for/foreach loop runs inside a while loop, like so:
while (running)
{
    // [...]
}
EDIT: Added more information about the exception as requested.
System.ArgumentException: Destination array was not long enough. Check destIndex and length, and the array's lower bounds
at System.Array.Copy (System.Array sourceArray, Int32 sourceIndex, System.Array destinationArray, Int32 destinationIndex, Int32 length) [0x00000]
at System.Collections.Generic.List`1[System.String].CopyTo (System.String[] array, Int32 arrayIndex) [0x00000]
at System.Collections.Generic.List`1[System.String].AddCollection (ICollection`1 collection) [0x00000]
at System.Collections.Generic.List`1[System.String]..ctor (IEnumerable`1 collection) [0x00000]
EDIT: The reason for the locking is that there is another thread adding data. Also, eventually, more than one thread will be processing data (so if the entire setup is wrong, please advise).
EDIT: It was hard to pick a good answer.
I found Eric Lippert's comment deserving, but he didn't really answer (I up-voted his comment anyhow).
Pavel Minaev, Joel Coehoorn and Thorarin all gave answers I liked and up-voted. Thorarin also took an extra 20 minutes to write some helpful code.
I wish I could accept all 3 and have the reputation split between them, but alas.
Pavel Minaev is the next most deserving, so he gets the credit.
Thanks for the help, good people. :)
Your problem is that the constructor of List<T> that creates a new list from an IEnumerable<T> (which is the one you're calling) isn't thread-safe with respect to its argument. What happens is that while this:
new List<string>(Foo.Requests)
is executing, another thread changes Foo.Requests. You'll have to lock it for the duration of that call.
[EDIT]
As pointed out by Eric, another problem is that List<T> isn't guaranteed to be safe for readers while another thread is changing it, either. That is, concurrent readers are okay, but a concurrent reader and writer are not. And while you lock your writes against each other, you don't lock your reads against your writes.
After seeing your exception, it looks to me like Foo.Requests is being changed while the shallow copy is being constructed. Change it to something like this:
List<string> requests;
lock (Foo.Requests)
{
    requests = new List<string>(Foo.Requests);
}
foreach (string data in requests)
{
    // Process the data.
    lock (Foo.Requests)
    {
        Foo.Requests.Remove(data);
    }
}
Not the question, but...
That being said, I somewhat doubt the above is what you want either. If new requests come in during processing, they will not have been processed when your foreach loop terminates. Since I was bored, here's something along the lines of what I think you're trying to achieve:
class RequestProcessingThread
{
    // Used to signal this thread when there is new work to be done
    private AutoResetEvent _processingNeeded = new AutoResetEvent(true);

    // Used to request that processing terminate
    private ManualResetEvent _stopProcessing = new ManualResetEvent(false);

    // Signalled when the thread has stopped processing
    private AutoResetEvent _processingStopped = new AutoResetEvent(false);

    /// <summary>
    /// Called to start processing
    /// </summary>
    public void Start()
    {
        _stopProcessing.Reset();
        Thread thread = new Thread(ProcessRequests);
        thread.Start();
    }

    /// <summary>
    /// Called to request a graceful shutdown of the processing thread
    /// </summary>
    public void Stop()
    {
        _stopProcessing.Set();
        // Optionally wait for the thread to terminate here
        _processingStopped.WaitOne();
    }

    /// <summary>
    /// This method does the actual work
    /// </summary>
    private void ProcessRequests()
    {
        WaitHandle[] waitHandles = new WaitHandle[] { _processingNeeded, _stopProcessing };
        Foo.RequestAdded += OnRequestAdded;
        while (true)
        {
            while (Foo.Requests.Count > 0)
            {
                string request;
                lock (Foo.Requests)
                {
                    request = Foo.Requests.Peek();
                }
                // Process request
                Debug.WriteLine(request);
                lock (Foo.Requests)
                {
                    Foo.Requests.Dequeue();
                }
            }
            if (WaitHandle.WaitAny(waitHandles) == 1)
            {
                // _stopProcessing was signalled, exit the loop
                break;
            }
        }
        Foo.RequestAdded -= OnRequestAdded;
        _processingStopped.Set();
    }

    /// <summary>
    /// This method will be called when a new request gets added to the queue
    /// </summary>
    private void OnRequestAdded()
    {
        _processingNeeded.Set();
    }
}
static class Foo
{
    public delegate void RequestAddedHandler();

    public static event RequestAddedHandler RequestAdded;

    static Foo()
    {
        Requests = new Queue<string>();
    }

    public static Queue<string> Requests
    {
        get;
        private set;
    }

    public static void AddRequest(string request)
    {
        lock (Requests)
        {
            Requests.Enqueue(request);
        }
        if (RequestAdded != null)
        {
            RequestAdded();
        }
    }
}
There are still a few problems with this, which I will leave to the reader:
- Checking for _stopProcessing should probably be done after every request is processed
- The Peek() / Dequeue() approach won't work if you have multiple threads doing the processing
- Insufficient encapsulation: Foo.Requests is accessible, but Foo.AddRequest needs to be used to add any requests if you want them processed
- In the case of multiple processing threads: you need to handle the queue being empty inside the loop, since there is no lock around the Count > 0 check
Your locking scheme is broken. You need to lock Foo.Requests for the entire duration of the loop, not just when removing an item. Otherwise the item might become invalid in the middle of your "process the data" operation, and the enumeration might change in between moving from item to item. And that assumes you don't need to insert into the collection during this interval as well. If you do, you really need to refactor to use a proper producer/consumer queue.
To be completely honest, I would suggest refactoring that. You are removing items from the collection while also iterating over it. Your loop could actually exit before you've processed all items.
Three things:
- I wouldn't put the lock within the for(each) statement, but outside of it.
- I wouldn't lock the actual collection, but a local static object.
- You cannot modify a list/collection while you're enumerating it.
For more information, check:
http://msdn.microsoft.com/en-us/library/c5kehkcz(VS.80).aspx
lock (lockObject)
{
    foreach (string data in new List<string>(Foo.Requests))
        Foo.Requests.Remove(data);
}
The problem is the expression
new List<string>(Foo.Requests)
inside your foreach, because it's not under a lock. I assume that while .NET is copying your requests collection into a new list, the list is being modified by another thread.
foreach (string data in new List<string>(Foo.Requests))
{
    // Process the data.
    lock (Foo.Requests)
    {
        Foo.Requests.Remove(data);
    }
}
Suppose you have two threads executing this code. The exception trace shows the failure inside the List<string> constructor (at System.Collections.Generic.List`1[System.String]..ctor):
1. Thread1 starts processing the list.
2. Thread2 calls the List constructor, which takes a count to size the array it creates.
3. Thread1 changes the number of items in the list.
4. Thread2 now has the wrong number of items.
Your locking scheme is wrong. It's even wrong in the for loop example.
You need to lock every time you access the shared resource - even to read or copy it. This doesn't mean you need to lock for the whole operation. It does mean that everyone sharing this shared resource needs to participate in the locking scheme.
Also consider defensive copying:
List<string> todos = null;
List<string> empty = new List<string>();
lock (Foo.Requests)
{
    todos = Foo.Requests;
    Foo.Requests = empty;
}
// now process the local list todos
Even so, all those that share Foo.Requests must participate in the locking scheme.
You are trying to remove objects from the list as you iterate through it. (OK, technically you are not doing this, but that's the goal you are trying to achieve.)
Here's how you do it properly: while iterating, build a second list of the entries you want to remove. When you're done iterating, remove them all from the original list (the original showed Java-style pseudocode; this is the C# equivalent):
var entriesToRemove = new List<string>();
foreach (var entry in originalList)
{
    if (SomeCondition(entry)) // whatever marks an entry as done
    {
        entriesToRemove.Add(entry);
    }
}
// Then, when done iterating:
foreach (var entry in entriesToRemove)
{
    originalList.Remove(entry);
}
In C#, the equivalent of Java's removeAll is List<T>.RemoveAll, which takes a predicate.
I know it's not what you asked for, but just for the sake of my own sanity, does the following represent the intention of your code:
private object _locker = new object();
// ...
lock (_locker)
{
    Foo.Requests.Clear();
}
