Why doesn't a foreach loop work in certain cases? - c#

I was using a foreach loop to go through a list of data to process (removing said data once processed--this was inside a lock). This method caused an ArgumentException now and then.
Catching it would have been expensive so I tried tracking down the issue but I couldn't figure it out.
I have since switched to a for loop and the problem seems to have gone away. Can someone explain what happened? Even with the exception message I don't quite understand what took place behind the scenes.
Why is the for loop apparently working? Did I set up the foreach loop wrong or what?
This is pretty much how my loops were set up:
foreach (string data in new List<string>(Foo.Requests))
{
// Process the data.
lock (Foo.Requests)
{
Foo.Requests.Remove(data);
}
}
and
for (int i = 0; i < Foo.Requests.Count; i++)
{
string data = Foo.Requests[i];
// Process the data.
lock (Foo.Requests)
{
Foo.Requests.Remove(data);
}
}
EDIT: The for* loop is in a while setup like so:
while (running)
{
// [...]
}
EDIT: Added more information about the exception as requested.
System.ArgumentException: Destination array was not long enough. Check destIndex and length, and the array's lower bounds
at System.Array.Copy (System.Array sourceArray, Int32 sourceIndex, System.Array destinationArray, Int32 destinationIndex, Int32 length) [0x00000]
at System.Collections.Generic.List`1[System.String].CopyTo (System.String[] array, Int32 arrayIndex) [0x00000]
at System.Collections.Generic.List`1[System.String].AddCollection (ICollection`1 collection) [0x00000]
at System.Collections.Generic.List`1[System.String]..ctor (IEnumerable`1 collection) [0x00000]
EDIT: The reason for the locking is that there is another thread adding data. Also, eventually, more than one thread will be processing data (so if the entire setup is wrong, please advise).
EDIT: It was hard to pick a good answer.
I found Eric Lippert's comment deserving but he didn't really answer (up-voted his comment anyhow).
Pavel Minaev, Joel Coehoorn and Thorarin all gave answers I liked and up-voted. Thorarin also took an extra 20 minutes to write some helpful code.
I wish I could accept all 3 and have it split the reputation, but alas.
Pavel Minaev is the next deserving so he gets the credit.
Thanks for the help good people. :)

Your problem is that the constructor of List<T> that creates a new list from IEnumerable (which is what you call) isn't thread-safe with respect to its argument. What happens is that while this:
new List<string>(Foo.Requests)
is executing, another thread changes Foo.Requests. You'll have to lock it for the duration of that call.
[EDIT]
As pointed out by Eric, another problem is that List<T> isn't guaranteed safe for readers to read while another thread is changing it, either. I.e. concurrent readers are okay, but a concurrent reader and writer are not. And while you lock your writes against each other, you don't lock your reads against your writes.

After seeing your exception, it looks to me that Foo.Requests is being changed while the shallow copy is being constructed. Change it to something like this:
List<string> requests;
lock (Foo.Requests)
{
requests = new List<string>(Foo.Requests);
}
foreach (string data in requests)
{
// Process the data.
lock (Foo.Requests)
{
Foo.Requests.Remove(data);
}
}
Not the question, but...
That being said, I somewhat doubt the above is what you want either. If new requests are coming in during processing, they will not have been processed when your foreach loop terminates. Since I was bored, here's something along the lines that I think you're trying to achieve:
class RequestProcessingThread
{
// Used to signal this thread when there is new work to be done
private AutoResetEvent _processingNeeded = new AutoResetEvent(true);
// Used for request to terminate processing
private ManualResetEvent _stopProcessing = new ManualResetEvent(false);
// Signalled when thread has stopped processing
private AutoResetEvent _processingStopped = new AutoResetEvent(false);
/// <summary>
/// Called to start processing
/// </summary>
public void Start()
{
_stopProcessing.Reset();
Thread thread = new Thread(ProcessRequests);
thread.Start();
}
/// <summary>
/// Called to request a graceful shutdown of the processing thread
/// </summary>
public void Stop()
{
_stopProcessing.Set();
// Optionally wait for thread to terminate here
_processingStopped.WaitOne();
}
/// <summary>
/// This method does the actual work
/// </summary>
private void ProcessRequests()
{
WaitHandle[] waitHandles = new WaitHandle[] { _processingNeeded, _stopProcessing };
Foo.RequestAdded += OnRequestAdded;
while (true)
{
while (Foo.Requests.Count > 0)
{
string request;
lock (Foo.Requests)
{
request = Foo.Requests.Peek();
}
// Process request
Debug.WriteLine(request);
lock (Foo.Requests)
{
Foo.Requests.Dequeue();
}
}
if (WaitHandle.WaitAny(waitHandles) == 1)
{
// _stopProcessing was signalled, exit the loop
break;
}
}
// Unsubscribe the same handler that was subscribed above
Foo.RequestAdded -= OnRequestAdded;
_processingStopped.Set();
}
/// <summary>
/// This method will be called when a new request gets added to the queue
/// </summary>
private void OnRequestAdded()
{
_processingNeeded.Set();
}
}
static class Foo
{
public delegate void RequestAddedHandler();
public static event RequestAddedHandler RequestAdded;
static Foo()
{
Requests = new Queue<string>();
}
public static Queue<string> Requests
{
get;
private set;
}
public static void AddRequest(string request)
{
lock (Requests)
{
Requests.Enqueue(request);
}
if (RequestAdded != null)
{
RequestAdded();
}
}
}
There are still a few problems with this, which I will leave to the reader:
Checking for _stopProcessing should probably be done after every time a request is processed
The Peek() / Dequeue() approach won't work if you have multiple threads doing processing
Insufficient encapsulation: Foo.Requests is accessible, but Foo.AddRequest needs to be used to add any requests if you want them processed.
In case of multiple processing threads: need to handle the queue being empty inside the loop, since there is no lock around the Count > 0 check.
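The last three points can be addressed together by checking for emptiness and removing the item inside the same critical section. What follows is only a minimal sketch under assumptions: RequestQueue and its members are hypothetical names, not part of the code above, and the queue is kept private so that every access has to go through the lock.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: the queue is private, so all access goes through _sync.
public sealed class RequestQueue
{
    private readonly object _sync = new object();
    private readonly Queue<string> _requests = new Queue<string>();

    public void Enqueue(string request)
    {
        lock (_sync)
        {
            _requests.Enqueue(request);
        }
    }

    // Checks for emptiness and removes the item in one critical section,
    // so two consumers can never take the same request or dequeue from an empty queue.
    public bool TryDequeue(out string request)
    {
        lock (_sync)
        {
            if (_requests.Count > 0)
            {
                request = _requests.Dequeue();
                return true;
            }
        }
        request = null;
        return false;
    }
}
```

A processing thread would then loop on `while (queue.TryDequeue(out string request)) { ... }`. The trade-off compared with Peek() followed by Dequeue() is that a request is already off the queue while it is being processed, so a failure during processing has to be handled explicitly if the request must not be lost.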

Your locking scheme is broken. You need to lock Foo.Requests for the entire duration of the loop, not just when removing an item. Otherwise the item might become invalid in the middle of your "process the data" operation, and the enumeration might change as you move from item to item. And that assumes you don't need to insert into the collection during this interval as well. If you do, you really need to refactor to use a proper producer/consumer queue.

To be completely honest, I would suggest refactoring that. You are removing items from the object while also iterating over that. Your loop could actually exit before you've processed all items.

Three things:
- I wouldn't put the lock within the for(each) statement, but outside of it.
- I wouldn't lock the actual collection, but a local static object.
- You cannot modify a list/collection that you're enumerating.
For more information check:
http://msdn.microsoft.com/en-us/library/c5kehkcz(VS.80).aspx
lock (lockObject) {
foreach (string data in new List<string>(Foo.Requests))
Foo.Requests.Remove(data);
}

The problem is the expression
new List<string>(Foo.Requests)
inside your foreach, because it's not under a lock. I assume that while .NET copies your requests collection into a new list, the list is modified by another thread.

foreach (string data in new List<string>(Foo.Requests))
{
// Process the data.
lock (Foo.Requests)
{
Foo.Requests.Remove(data);
}
}
Suppose you have two threads executing this code.
at System.Collections.Generic.List`1[System.String]..ctor
Thread1 starts processing the list.
Thread2 calls the List constructor, which takes a count for the array to be created.
Thread1 changes the number of items in the list.
Thread2 has the wrong number of items.
Your locking scheme is wrong. It's even wrong in the for loop example.
You need to lock every time you access the shared resource - even to read or copy it. This doesn't mean you need to lock for the whole operation. It does mean that everyone sharing this shared resource needs to participate in the locking scheme.
Also consider defensive copying:
List<string> todos = null;
List<string> empty = new List<string>();
lock(Foo.Requests)
{
todos = Foo.Requests;
Foo.Requests = empty;
}
//now process local list todos
Even so, all those that share Foo.Requests must participate in the locking scheme.
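One caveat with the swap as written: it locks on Foo.Requests and then replaces that very object, so a thread arriving later would lock on a different instance than a thread that is still waiting on the old one. A sketch of the same idea with a dedicated lock object follows. RequestStore, Add and TakeAll are hypothetical names chosen for illustration, and the list is private so that nobody can bypass the lock.

```csharp
using System.Collections.Generic;

// Hypothetical holder for the shared list; all access goes through Sync.
public static class RequestStore
{
    // The lock object never changes, unlike the list it protects.
    private static readonly object Sync = new object();
    private static List<string> _pending = new List<string>();

    public static void Add(string request)
    {
        lock (Sync)
        {
            _pending.Add(request);
        }
    }

    // Hands the current batch to the caller and starts a fresh list.
    // After this returns, only the caller holds a reference to the batch,
    // so it can be processed without any further locking.
    public static List<string> TakeAll()
    {
        lock (Sync)
        {
            List<string> taken = _pending;
            _pending = new List<string>();
            return taken;
        }
    }
}
```

The lock is held only for the duration of the swap, so producers are never blocked while a batch is being processed.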

You are trying to remove objects from list as you are iterating through list. (OK, technically, you are not doing this, but that's the goal you are trying to achieve).
Here's how you do it properly: while iterating, construct another list of entries that you want to remove. Simply construct another (temp) list, put all entries you want to remove from original list into the temp list.
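// Entry, SomeCondition and originalList are placeholders for your own type, test and list.
List<Entry> entriesToRemove = new List<Entry>();
foreach (Entry entry in originalList)
{
if (entry.SomeCondition())
{
entriesToRemove.Add(entry);
}
}
// Then, when done iterating, remove them in one pass:
originalList.RemoveAll(entriesToRemove.Contains);
This uses the RemoveAll method of the List<T> class, which takes a predicate.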

I know it's not what you asked for, but just for the sake of my own sanity, does the following represent the intention of your code:
private object _locker = new object();
// ...
lock (_locker) {
Foo.Requests.Clear();
}

Related

Broken lock strategy - analysis and correction

I'm asking this primarily as a sanity check: In a C# (8.0) application I've got this bit of code, which spuriously fails with an "object is not synchronized" exception from Monitor.Pulse() (I've omitted irrelevant code for clarity):
// vanilla multiple-producer single-consumer queue stuff:
private Queue<Message> messages = new Queue<Message>();
private void ConsumerThread () {
Queue<Message> myMessages = new Queue<Message>();
while (...) {
lock (messages) {
// wait
while (messages.Count == 0)
Monitor.Wait(messages);
// swap
(messages, myMessages) = (myMessages, messages);
}
// process
while (myMessages.Count > 0)
DoStuff(myMessages.Dequeue());
}
}
public void EnqueueMessage (...) {
Message message = new Message(...);
lock (messages) {
messages.Enqueue(message);
Monitor.Pulse(messages);
}
}
I'm fairly new to C# and also I was stressed when I wrote that. Now I am reviewing that code to fix the exception and I'm immediately raising an eyebrow at the fact that I reassigned messages inside the consumer's lock.
I looked around and found Is it bad to overwrite a lock object if it is the last statement in the lock?, which validates my raised eyebrow.
However, I still don't have a lot of confidence (inexperience + stress), so, just to confirm: Is the following analysis of why this is broken correct?
If the following happens, in this order:
Stuff happens to be in the queue.
Consumer thread locks messages (and will skip wait loop).
EnqueueMessage tries to lock messages, waits for lock.
Consumer thread swaps messages and myMessages, releases lock.
EnqueueMessage takes lock.
EnqueueMessage adds item to messages and calls Monitor.Pulse(messages), except messages isn't the same object that it locked in step (3), since it was swapped out from under us in (4). Possible consequences include:
Calling Monitor.Pulse on a non-locked object (what used to be myMessages) -- hence the aforementioned exception.
Enqueueing to the wrong queue and the consequences of that.
Even weirder stuff if the consumer thread manages to complete another full loop cycle while EnqueueMessage is still somewhere in its lock{}.
Right? I'm pretty sure that's right, it feels very basic, but I just want to confirm because I'm completely burnt out right now.
Then, whether that's correct or not: Does the following proposed fix make sense?
It seems to me like the fix is super simple: Instead of using messages as the monitor object, just use some dedicated dummy object that won't be changed:
private readonly object messagesLock = new object();
private Queue<Message> messages = new Queue<Message>();
private void ConsumerThread () {
Queue<Message> myMessages = new Queue<Message>();
while (...) {
lock (messagesLock) {
while (messages.Count == 0)
Monitor.Wait(messagesLock);
(messages, myMessages) = (myMessages, messages);
}
}
...
}
public void EnqueueMessage (...) {
...;
lock (messagesLock) {
messages.Enqueue(...);
Monitor.Pulse(messagesLock);
}
}
Where the intent is to avoid any issues caused by swapping out the lock object in strange places.
And that should work... right?
Hardly anybody has used a bare Queue for multi-threading since the concurrent collections arrived in .NET Framework 4 (2010).
It is trivial with concurrent collections.
BlockingCollection<Message> myMessages = new BlockingCollection<Message>();
private void ConsumerThread () {
while (...)
{
var message = myMessages.Take();
}
...
}
public void EnqueueMessage (Message msg) {
...;
myMessages.Add(msg);
}
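A slightly fuller sketch of the same idea, with a way to shut the consumer down cleanly. Everything beyond BlockingCollection itself is an assumption made for illustration: Message is a placeholder class, and MessagePump, its handler callback and Stop() are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Placeholder for the real message type.
public sealed class Message
{
    public string Text { get; }
    public Message(string text) { Text = text; }
}

public sealed class MessagePump
{
    private readonly BlockingCollection<Message> _messages = new BlockingCollection<Message>();
    private readonly Action<Message> _handler;
    private readonly Thread _consumer;

    public MessagePump(Action<Message> handler)
    {
        _handler = handler;
        _consumer = new Thread(ConsumerThread);
        _consumer.Start();
    }

    private void ConsumerThread()
    {
        // Blocks while the collection is empty; the loop ends once
        // CompleteAdding() has been called and everything has been consumed.
        foreach (Message message in _messages.GetConsumingEnumerable())
        {
            _handler(message);
        }
    }

    // Safe to call from any number of producer threads.
    public void EnqueueMessage(Message message)
    {
        _messages.Add(message);
    }

    // Signals that no more messages will arrive and waits for the consumer to drain.
    public void Stop()
    {
        _messages.CompleteAdding();
        _consumer.Join();
    }
}
```

There is no explicit lock, Monitor.Wait or Monitor.Pulse here, so there is no lock object that could be swapped out by accident. Note that calling EnqueueMessage after Stop() throws an InvalidOperationException.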

Parallel.ForEach: Best way to save off a collection when its record count gets high?

So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since the collection of data can get quite large, I need to be able to occasionally save/clear the collection so as to not run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. we don't get any records added to the collection between the Save and Clear operations).
Currently I'm saying, if the record count is above a certain threshold, save the data in the current collection, within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();
Parallel.ForEach(inputrecords, input =>
{
lock(StackLock)
{
if (OutRecs.Count >= 50000)
{
Save(OutRecs);
OutRecs.Clear();
}
}
OutRecs.Push(CreateOutputRecord(input));
});
if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to output collection? If not is there a better way to do this?
Your lock will work correctly but it will not be very efficient because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so performing a lock in each iteration of each thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
// collection of objects to iterate over
inputrecords,
// delegate to initialize thread-local data
() => new List<OutRecord>(),
// body of loop
(inputrecord, loopstate, localstorage) =>
{
localstorage.Add(CreateOutputRecord(inputrecord));
if (localstorage.Count > 1000)
{
// Save() must be thread-safe, or you'll need to wrap it in a lock
Save(localstorage);
localstorage.Clear();
}
return localstorage;
},
// finally block gets executed after each thread exits
localstorage =>
{
if (localstorage.Count > 0)
{
// Save() must be thread-safe, or you'll need to wrap it in a lock
Save(localstorage);
localstorage.Clear();
}
});
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name.
{
void WriteRecord(T record);
void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there's too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
An implementation of IRecordWriter could look something like this:
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
private readonly int _maxCapacity;
private bool _flushing;
protected BufferedRecordWriter(int maxCapacity = 100)
{
_maxCapacity = maxCapacity;
}
public void WriteRecord(T record)
{
_buffer.Enqueue(record);
if (_buffer.Count >= _maxCapacity && !_flushing)
Flush();
}
public void Flush()
{
_flushing = true;
try
{
var recordsToWrite = new List<T>();
while (_buffer.TryDequeue(out T dequeued))
{
recordsToWrite.Add(dequeued);
}
if(recordsToWrite.Any())
WriteRecords(recordsToWrite);
}
finally
{
_flushing = false;
}
}
protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches the maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue it can keep reading records even as they are added.
That WriteRecords method could be anything specific to how you write your records. Instead of this being an abstract class, the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface, which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter. This will avoid most concurrent executions, but it's okay if concurrent executions both read from the ConcurrentQueue.
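If you do want at most one flush running at a time, one option is to replace the bool flag with an int that is claimed through Interlocked.CompareExchange. This is a sketch of that variation, not part of the outline above: SingleFlushRecordWriter and InMemoryRecordWriter are hypothetical names, and the in-memory subclass exists only to make the example self-contained.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public abstract class SingleFlushRecordWriter<T>
{
    private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
    private readonly int _maxCapacity;
    private int _flushing; // 0 = idle, 1 = a flush is in progress

    protected SingleFlushRecordWriter(int maxCapacity = 100)
    {
        _maxCapacity = maxCapacity;
    }

    public void WriteRecord(T record)
    {
        _buffer.Enqueue(record);
        if (_buffer.Count >= _maxCapacity)
            Flush();
    }

    public void Flush()
    {
        // Only the thread that flips the flag from 0 to 1 proceeds;
        // every other caller returns immediately instead of waiting.
        if (Interlocked.CompareExchange(ref _flushing, 1, 0) != 0)
            return;
        try
        {
            var recordsToWrite = new List<T>();
            while (_buffer.TryDequeue(out T dequeued))
                recordsToWrite.Add(dequeued);
            if (recordsToWrite.Count > 0)
                WriteRecords(recordsToWrite);
        }
        finally
        {
            Volatile.Write(ref _flushing, 0);
        }
    }

    protected abstract void WriteRecords(IEnumerable<T> records);
}

// Demo implementation that just remembers each batch it was given.
public sealed class InMemoryRecordWriter<T> : SingleFlushRecordWriter<T>
{
    public List<List<T>> Batches { get; } = new List<List<T>>();

    public InMemoryRecordWriter(int maxCapacity) : base(maxCapacity) { }

    protected override void WriteRecords(IEnumerable<T> records)
    {
        Batches.Add(new List<T>(records));
    }
}
```

A Flush() call that returns early has not written anything itself, so records enqueued during a running flush may stay in the buffer until the next one. The final Flush() after the parallel loop has finished is uncontended and therefore drains whatever is left.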
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split into two - one as a buffer, and another as the "final" writer that sends them to a database or file or some other destination.

In ConcurrentDictionary, is the read operation reading the latest updated value?

I am using a ConcurrentDictionary (ongoingConnectionDic) in my code:
I check if a serial port number exists in the Dictionary.
If not existing, I add it into dictionary.
I perform communication with the serial port.
I remove the element from the ongoingConnectionDic.
If existing, I put the thread in wait.
My question is, can I ensure that when I perform a read operation, no other thread is simultaneously writing / updating the value ? So, am I reading the most recent value of the dictionary ?
If not, how do I achieve what I want?
Sample program:
class Program
{
// Dictionary in question
private static ConcurrentDictionary<string, string> ongoingPrinterJobs =
new ConcurrentDictionary<string, string>();
private static void sendPrint(string printerName)
{
if (ongoingPrinterJobs.ContainsKey(printerName))
{
// Add to pending list and run a thread to finish pending jobs by calling print();
}
else
{
ongoingPrinterJobs.TryAdd(printerName, ""); // -- Add it to the dictionary so that no other thread can
// use the printer
ThreadPool.QueueUserWorkItem(new WaitCallback(print), printerName);
}
}
private static void print(object stateInfo)
{
string printerName = (stateInfo as string);
string dummy;
// do printing work
// Remove from dictionary
ongoingPrinterJobs.TryRemove(printerName, out dummy);
}
static void Main(string[] args)
{
// Run threads here in random to print something on different printers
// Sample run with 10 printers
Random r = new Random();
for ( int i = 0 ; i < 10 ; i++ )
{
sendPrint(r.Next(0, 10).ToString());
}
}
}
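Some concurrent collections, such as ConcurrentQueue and ConcurrentBag, take a "snapshot" of the collection upon enumeration. ConcurrentDictionary does not: its enumerator is safe to use while other threads write, but it is not a moment-in-time snapshot, so it may or may not reflect writes made after enumeration started.
A method such as ContainsKey only tells you what was true at the instant it ran. By the time you act on the result, another thread may already have added or removed the key, so in that sense you may be acting on stale data.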
All concurrent collections allow you to do is ensure you can enumerate over a collection even if another thread writes to it while you're enumerating. This wasn't the case with the standard collections.
With that said, as others have mentioned in their comments, other issues of thread safety must still be considered (E.g. race conditions).
The only way to prevent someone inserting a value into the collection after you've attempted to read a value but before writing a value is to lock the collection prior to reading the value to begin with, to ensure synchronized access to the collection throughout the entire transaction (i.e. the reading and subsequent writing of a value).
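For the specific "check, then add" step in the question, the read and the write can also be collapsed into a single call: ConcurrentDictionary.TryAdd inserts the key only if it is absent and returns false otherwise, so exactly one caller wins. A minimal sketch, assuming hypothetical names PrinterRegistry, TryClaim and Release:

```csharp
using System.Collections.Concurrent;

// Hypothetical wrapper around the dictionary from the question.
public static class PrinterRegistry
{
    private static readonly ConcurrentDictionary<string, string> OngoingPrinterJobs =
        new ConcurrentDictionary<string, string>();

    // Returns true only for the single caller that managed to claim the printer.
    // There is no separate ContainsKey check that could go stale in between.
    public static bool TryClaim(string printerName)
    {
        return OngoingPrinterJobs.TryAdd(printerName, "");
    }

    // Called when printing has finished so that the printer can be claimed again.
    public static void Release(string printerName)
    {
        OngoingPrinterJobs.TryRemove(printerName, out _);
    }
}
```

In sendPrint, the ContainsKey/TryAdd pair would become a single `if (PrinterRegistry.TryClaim(printerName))`, with the pending-list branch in the else. This only covers a single-key claim; a transaction that spans several reads and writes still needs a lock as described above.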

Thread Safety: Lock vs Reference

I have a C# program that has a list that is written and read in separate threads. The write is user initiated and can change the data at any random point in time. The read runs in a constant loop. It doesn't matter if the read is missing data in any given loop, as long as the data it does receive is valid and it gets the new data in a future loop.
After considering ConcurrentBag, I settled on using locks for a variety of reasons (simplicity being one of them). After implementing the locks, a coworker mentioned to me that using temporary references to point to the old List in memory would work just as well, but I am concerned about what will happen if the new assignment and the reference assignment would happen at the same time.
Q: Is the temporary reference example below thread safe?
Update: User input provides a list of strings which are used in DoStuff(). You can think of these strings as a definition of constants and as such the strings need to be persisted for future loops. They are not deleted in DoStuff(), only read. UserInputHandler is the only thread that will ever change this list and DoStuff() is the only thread that will ever read from this list. Nothing else has access to it.
Additionally, I am aware of the Concurrent namespace and have used most of the collections in it in other projects, but I have chosen not to use them here because of the extra code complexity that they add (e.g. ConcurrentBag doesn't have a simple Clear() function). A simple lock is good enough in this situation. The question is only whether the second example below is thread safe.
Lock
static List<string> constants = new List<string>();
//Thread A
public void UserInputHandler(List<string> userProvidedConstants)
{
lock(constants)
{
constants.Clear();
foreach(var constant in userProvidedConstants)
{
constants.Add(constant);
}
}
}
//Thread B
public void DoStuff()
{
lock(constants)
{
//Do read only actions with constants here
foreach(var constant in constants)
{
//readonly actions
}
}
}
Reference
static List<string> constants = new List<string>();
//Thread A
public void UserInputHandler(List<string> userProvidedConstants)
{
lock(constants)
{
constants = new List<string>();
foreach(var constant in userProvidedConstants)
{
constants.Add(constant);
}
}
}
//Thread B
public void DoStuff()
{
var constantsReference = constants;
//Do read only actions with constantsReference here
foreach(var constant in constantsReference)
{
//readonly actions
}
}
This is not safe without the lock. Copying the reference to the list doesn't really do anything for you in this context. It's still quite possible for the list that you are currently iterating to be mutated in another thread while you are iterating it, causing all sorts of possible badness.
I think what you're looking for is BlockingCollection. Check out the following link for getting started with it:
http://msdn.microsoft.com/en-us/library/dd997371%28v=vs.110%29.aspx
Here's an example of using BlockingCollection. ThreadB won't start enumerating the BlockingCollection until there are items available, and when it runs out of items, it will stop enumerating until more items become available (or until the IsCompleted property returns true)
private static readonly BlockingCollection<int> Items = new BlockingCollection<int>();
//ThreadA
public void LoadStuff()
{
Items.Add(1);
Items.Add(2);
Items.Add(3);
}
//ThreadB
public void DoStuff()
{
foreach (var item in Items.GetConsumingEnumerable())
{
//Do stuff here
}
}
Lock Free is dangerous and not portable. Don't do it. If you need to read on how to do lock-free, you probably shouldn't be doing it.
I think I misunderstood the question. I was under the strange impression that the list was only ever added to, or that only the most recent version is what matters. No idea how I came to that when he explicitly shows a "Clear()" call.
I apologize for the confusion.
This code is being disputed, use at your own risk, but I'm quite sure it should work on x86/x64, but no clue about ARM
You could do something like this
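//Suggested to just use volatile instead of memorybarrier
//(int is used as the element type here purely for illustration)
static volatile IReadOnlyList<int> _MyList = new List<int>().AsReadOnly();
static void Load()
{
// Copy the current list, change the copy, then publish it with a single reference write.
List<int> LocalList = new List<int>(_MyList);
LocalList.Add(1);
LocalList.Add(2);
LocalList.Add(3);
_MyList = LocalList.AsReadOnly(); //Making it more clear
}
static void DoStuff()
{
// Readers take the reference once and only ever see a fully built list.
IReadOnlyList<int> LocalList = _MyList;
foreach (int tmp in LocalList)
{
// read-only work here
}
}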
This should work well for heavy read workloads. If you have more than one writer that modifies _MyList, you'll need to figure out a way to synchronize them.

How to avoid double check locking when adding items to a Dictionary<> object in .NET?

I have a question about improving the efficiency of my program. I have a Dictionary<string, Thingey> defined to hold named Thingeys. This is a web application that will create multiple named Thingey’s over time. Thingey’s are somewhat expensive to create (not prohibitively so) but I’d like to avoid it whenever possible. My logic for getting the right Thingey for the request looks a lot like this:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
// create a new thingey on 1st reference
Thingey newThingey = new Thingey(request);
lock (this.Thingeys)
{
if (!this.Thingeys.ContainsKey(thingeyName))
{
this.Thingeys.Add(thingeyName, newThingey);
}
// else - oops someone else beat us to it
// newThingey will eventually get GCed
}
}
return this.Thingeys[thingeyName];
}
In this application, Thingeys live forever once created. We don’t know how to create them or which ones will be needed until the app starts and requests begin coming in. My concern with the above code is that there are occasional instances where newThingey is created redundantly, because we get multiple simultaneous requests for the same Thingey before it has been added. We end up creating 2 of them but only adding one to our collection.
Is there a better way to get Thingeys created and added that doesn’t involve check/create/lock/check/add with the rare extraneous thingey that we created but end up never using? (And this code works and has been running for some time. This is just the nagging bit that has always bothered me.)
I'm trying to avoid locking the dictionary for the duration of creating a Thingey.
This is the standard double check locking problem. The way it is implemented here is unsafe and can cause various problems - potentially up to the point of a crash in the first check if the internal state of the dictionary is screwed up bad enough.
It is unsafe because you are checking it without synchronization and if your luck is bad enough you can hit it while some other thread is in the middle of updating internal state of the dictionary
A simple solution is to place the first check under a lock as well. A problem with this is that this becomes a global lock and in web environment under heavy load it can become a serious bottleneck.
If we are talking about .NET environment, there are ways to work around this issue by piggybacking on the ASP.NET synchronization mechanism.
Here is how I did it in the NDjango rendering engine: I keep one global dictionary and one dictionary per rendering thread. When a request comes in, I check the local dictionary first. This check does not have to be synchronized, and if the thingy is there I just take it.
If it is not, I synchronize on the global dictionary and check whether it is there. If it is not in the global dictionary, I add it there first while still under the lock. Either way I then add it to my thread dictionary and release the lock.
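A rough sketch of that two-level lookup is shown here. It is an illustration under assumptions, not the NDjango code: ThingeyCache is a hypothetical name, Thingey is a stand-in class constructed from a name rather than a Request, and a [ThreadStatic] field plays the role of the per-thread dictionary.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the real Thingey; the original is constructed from a Request.
public sealed class Thingey
{
    public string Name { get; }
    public Thingey(string name) { Name = name; }
}

public static class ThingeyCache
{
    private static readonly object GlobalLock = new object();
    private static readonly Dictionary<string, Thingey> Global = new Dictionary<string, Thingey>();

    // One dictionary per thread; a [ThreadStatic] field starts out null on every thread.
    [ThreadStatic]
    private static Dictionary<string, Thingey> _local;

    public static Thingey Get(string name)
    {
        if (_local == null)
            _local = new Dictionary<string, Thingey>();

        // No synchronization needed: only the current thread ever touches _local.
        if (_local.TryGetValue(name, out Thingey thingey))
            return thingey;

        // Slow path: consult (and, if necessary, populate) the shared dictionary under the lock.
        lock (GlobalLock)
        {
            if (!Global.TryGetValue(name, out thingey))
            {
                thingey = new Thingey(name);
                Global.Add(name, thingey);
            }
        }

        _local.Add(name, thingey);
        return thingey;
    }
}
```

Each thread pays for the global lock at most once per key. The cost is one extra dictionary per thread, and the scheme relies on Thingeys never being removed or replaced once created, which matches the "live forever" assumption in the question.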
Well, from my point of view simpler code is better, so I'd only use one lock:
private readonly object thingeysLock = new object();
private readonly Dictionary<string, Thingey> thingeys;
public Thingey GetThingey(Request request)
{
string key = request.ThingeyName;
lock (thingeysLock)
{
Thingey ret;
if (!thingeys.TryGetValue(key, out ret))
{
ret = new Thingey(request);
thingeys[key] = ret;
}
return ret;
}
}
Locks are really cheap when they're not contended. The downside is that this means that occasionally you will block everyone for the whole duration of the time you're creating a new Thingey. Clearly to avoid creating redundant thingeys you'd have to at least block while multiple threads create the Thingey for the same key. Reducing it so that they only block in that situation is somewhat harder.
I would suggest you use the above code but profile it to see whether it's fast enough. If you really need "only block when another thread is already creating the same thingey" then let us know and we'll see what we can do...
EDIT: You've commented on Adam's answer that you "don't want to lock while a new Thingey is being created" - you do realise that there's no getting away from that if there's contention for the same key, right? If thread 1 starts creating a Thingey, then thread 2 asks for the same key, your alternatives for thread 2 are either waiting or creating another instance.
EDIT: Okay, this is generally interesting, so here's a first pass at the "only block other threads asking for the same item".
private readonly object dictionaryLock = new object();
private readonly object creationLocksLock = new object();
private readonly Dictionary<string, Thingey> thingeys;
private readonly Dictionary<string, object> creationLocks;
public Thingey GetThingey(Request request)
{
string key = request.ThingeyName;
Thingey ret;
bool entryExists;
lock (dictionaryLock)
{
entryExists = thingeys.TryGetValue(key, out ret);
// Atomically mark the dictionary to say we're creating this item,
// and also set an entry for others to lock on
if (!entryExists)
{
thingeys[key] = null;
lock (creationLocksLock)
{
creationLocks[key] = new object();
}
}
}
// If we found something, great!
if (ret != null)
{
return ret;
}
// Otherwise, see if we're going to create it or whether we need to wait.
if (entryExists)
{
object creationLock;
lock (creationLocksLock)
{
creationLocks.TryGetValue(key, out creationLock);
}
// If creationLock is null, it means the creating thread has finished
// creating it and removed the creation lock, so we don't need to wait.
if (creationLock != null)
{
lock (creationLock)
{
Monitor.Wait(creationLock);
}
}
// We *know* it's in the dictionary now - so just return it.
lock (dictionaryLock)
{
return thingeys[key];
}
}
else // We said we'd create it
{
Thingey thingey = new Thingey(request);
// Put it in the dictionary
lock (dictionaryLock)
{
thingeys[key] = thingey;
}
// Tell anyone waiting that they can look now
lock (creationLocksLock)
{
Monitor.PulseAll(creationLocks[key]);
creationLocks.Remove(key);
}
return thingey;
}
}
Phew!
That's completely untested, and in particular it isn't in any way, shape or form robust in the face of exceptions in the creating thread... but I think it's the generally right idea :)
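On .NET 4 or later, a similar "only block callers asking for the same key" effect can be sketched with far less code by storing Lazy<Thingey> values in a ConcurrentDictionary. This is an alternative technique rather than the approach above. Thingey here is a stand-in class keyed by name (the original is built from a Request), ThingeyRegistry is a hypothetical name, and the Created counter exists only for demonstration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Stand-in for the real Thingey; counts constructions for demonstration only.
public sealed class Thingey
{
    public static int Created;
    public string Name { get; }

    public Thingey(string name)
    {
        Name = name;
        Interlocked.Increment(ref Created);
    }
}

public sealed class ThingeyRegistry
{
    private readonly ConcurrentDictionary<string, Lazy<Thingey>> _thingeys =
        new ConcurrentDictionary<string, Lazy<Thingey>>();

    public Thingey GetThingey(string name)
    {
        // Under contention GetOrAdd may build more than one Lazy wrapper,
        // but only one is stored, and building a wrapper is cheap.
        Lazy<Thingey> lazy = _thingeys.GetOrAdd(
            name,
            key => new Lazy<Thingey>(
                () => new Thingey(key),
                LazyThreadSafetyMode.ExecutionAndPublication));

        // The first caller runs the factory; concurrent callers asking for the
        // same key wait here, while callers asking for other keys are unaffected.
        return lazy.Value;
    }
}
```

With ExecutionAndPublication, an exception thrown by the factory is cached and rethrown to every later caller for that key, so a failed creation needs explicit handling (for example removing the entry) if retries are wanted.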
If you're looking to avoid blocking unrelated threads, then additional work is needed (and should only be necessary if you've profiled and found that performance is unacceptable with the simpler code). I would recommend using a lightweight wrapper class that asynchronously creates a Thingey and using that in your dictionary.
private Dictionary<string, ThingeyWrapper> Thingeys = new Dictionary<string, ThingeyWrapper>();
private class ThingeyWrapper
{
public Thingey Thing { get; private set; }
private object creationFlag;
private Request request;
public ThingeyWrapper(Request request)
{
creationFlag = new object();
this.request = request;
}
public void WaitForCreation()
{
object flag = creationFlag;
if(flag != null)
{
lock(flag)
{
if(request != null) Thing = new Thingey(request);
creationFlag = null;
request = null;
}
}
}
}
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
ThingeyWrapper output;
lock (this.Thingeys)
{
if(!this.Thingeys.TryGetValue(thingeyName, out output))
{
output = new ThingeyWrapper(request);
this.Thingeys.Add(thingeyName, output);
}
}
output.WaitForCreation();
return output.Thing;
}
While you are still locking on all calls, the creation process is much more lightweight.
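For comparison, on .NET 4.0 or later the same wrapper idea can be sketched with built-in types: a `ConcurrentDictionary<string, Lazy<Thingey>>`, where `Lazy<T>` plays the role of `ThingeyWrapper`. This is only a sketch under assumptions: the `Request` and `Thingey` classes below are stand-ins made up so the example compiles, and `ThingeyCache` is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-ins for the asker's types.
public class Request
{
    public string ThingeyName { get; set; }
}

public class Thingey
{
    public static int Created; // counts constructor calls, for demonstration only
    public string Name { get; private set; }

    public Thingey(Request request)
    {
        Interlocked.Increment(ref Created);
        Name = request.ThingeyName;
    }
}

public class ThingeyCache
{
    private readonly ConcurrentDictionary<string, Lazy<Thingey>> thingeys =
        new ConcurrentDictionary<string, Lazy<Thingey>>();

    public Thingey GetThingey(Request request)
    {
        // GetOrAdd may build more than one Lazy under contention, but only one
        // is stored, and a Lazy that is never stored never runs its factory.
        Lazy<Thingey> lazy = thingeys.GetOrAdd(
            request.ThingeyName,
            _ => new Lazy<Thingey>(
                () => new Thingey(request),
                LazyThreadSafetyMode.ExecutionAndPublication));

        // Threads asking for the same key block here until it is created;
        // threads asking for other keys are not affected.
        return lazy.Value;
    }
}
```

One trade-off to be aware of: in this mode, if the `Thingey` constructor throws, the `Lazy<T>` caches the exception and rethrows it to every later caller for that key.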
Edit
This issue has stuck with me more than I expected it to, so I whipped together a somewhat more robust solution that follows this general pattern. You can find it here.
IMHO, if this piece of code is called from many threads simultaneously, it is recommended to check twice.
(But: I'm not sure that you can safely call ContainsKey while some other thread is calling Add. So it might not be possible to avoid the lock at all.)
If you just want to avoid creating a Thingey that is never used, create it within the locking block:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
lock (this.Thingeys)
{
// only one thread can create the same Thingey
if (!this.Thingeys.ContainsKey(thingeyName))
{
Thingey newThingey = new Thingey(request);
this.Thingeys.Add(thingeyName, newThingey);
}
}
}
return this.Thingeys[thingeyName];
}
You have to ask yourself whether the specific ContainsKey operation and the getter are themselves thread-safe (and will stay that way in newer versions), because those may and will be invoked while another thread has the dictionary locked and is performing the Add.
Typically, .NET locks are fairly efficient if used correctly, and I believe that in this situation you're better off doing this:
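Thingey thingey;
bool exists;
lock (thingeys) {
exists = thingeys.TryGetValue(thingeyName, out thingey);
}
if (!exists) {
// Create outside the lock so other threads are not blocked meanwhile.
Thingey created = new Thingey(request);
lock (thingeys) {
// Another thread may have added one in the meantime; if so, use theirs.
if (!thingeys.TryGetValue(thingeyName, out thingey)) {
thingeys.Add(thingeyName, created);
thingey = created;
}
}
}
return thingey;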
Well, I hope I'm not being too naive in giving this answer, but what I would do, as Thingeys are expensive to create, would be to add the key with a null value. That is, something like this:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
lock (this.Thingeys)
{
if (!this.Thingeys.ContainsKey(thingeyName))
{
// reserve the key with a null placeholder on 1st reference...
this.Thingeys.Add(thingeyName, null);
// ...then create the new thingey and store it under that key
Thingey newThingey = new Thingey(request);
this.Thingeys[thingeyName] = newThingey;
}
// else - oops someone else beat us to it
// but it doesn't matter anymore since we only created one Thingey
}
}
return this.Thingeys[thingeyName];
}
I modified your code in a rush so no testing was done.
Anyway, I hope my idea is not so naive. :D
You might be able to buy a little bit of speed efficiency at the expense of memory. If you create an immutable array that lists all of the created Thingys and reference the array with a static variable, then you could check the existence of a Thingy outside of any lock, since immutable arrays are always thread safe. Then when adding a new Thingy, you can create a new array with the additional Thingy and replace it (in the static variable) in one (atomic) set operation. Some new Thingys may be missed because of race conditions, but the program shouldn't fail. It just means that on rare occasions extra duplicate Thingys will be made.
This will not replace the need for duplicate checking when creating a new Thingy, and it will use a lot of memory resources, but it will not require that the lock be taken or held while creating a Thingy.
I'm thinking of something along these lines, sorta:
private Dictionary<string, Thingey> Thingeys;
// An immutable list of (most of) the thingeys that have been created.
private string[] existingThingeys = new string[0];
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
// Reference the same list throughout the method, just in case another
// thread replaces the global reference between operations.
string[] localThingyList = existingThingeys;
// Check to see if we already made this Thingey. (This might miss some,
// but it doesn't matter.)
// This operation on an immutable array is thread-safe.
if (localThingyList.Contains(thingeyName))
{
// But referencing the dictionary is not thread-safe.
lock (this.Thingeys)
{
if (this.Thingeys.ContainsKey(thingeyName))
return this.Thingeys[thingeyName];
}
}
Thingey newThingey = new Thingey(request);
Thingey ret;
// We haven't locked anything at this point, but we have created a new
// Thingey that we probably needed.
lock (this.Thingeys)
{
// If it turns out that the Thingey was already there, then
// return the old one.
if (!Thingeys.TryGetValue(thingeyName, out ret))
{
// Otherwise, add the new one.
Thingeys.Add(thingeyName, newThingey);
ret = newThingey;
}
}
// Update our existingThingeys array atomically.
string[] newThingyList = new string[localThingyList.Length + 1];
Array.Copy(localThingyList, newThingyList, localThingyList.Length);
newThingyList[localThingyList.Length] = thingeyName;
existingThingeys = newThingyList; // Voila!
return ret;
}
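A plain assignment like the one that ends this method can lose an update when two threads publish at the same time: both copy the same old array, and the later write wins. That is harmless for this design, since the array is only a hint, but if it matters, a compare-and-swap retry loop is one way around it. A minimal sketch, assuming .NET 4.5 or later for `Volatile.Read`; the `NameSnapshot` class and its members are hypothetical names, not part of the code above:

```csharp
using System;
using System.Threading;

// Hypothetical helper: a copy-on-write list of names.
public class NameSnapshot
{
    // Never mutated in place; only replaced wholesale.
    private string[] names = new string[0];

    // Lock-free read against whichever snapshot is current.
    public bool Contains(string name)
    {
        string[] snapshot = Volatile.Read(ref names);
        return Array.IndexOf(snapshot, name) >= 0;
    }

    public void Add(string name)
    {
        while (true)
        {
            string[] current = Volatile.Read(ref names);
            if (Array.IndexOf(current, name) >= 0)
            {
                return; // already published
            }

            string[] updated = new string[current.Length + 1];
            Array.Copy(current, updated, current.Length);
            updated[current.Length] = name;

            // Publish only if nobody replaced the array since we read it;
            // otherwise loop and retry against the newer snapshot.
            if (Interlocked.CompareExchange(ref names, updated, current) == current)
            {
                return;
            }
        }
    }
}
```

Each `Add` still copies the whole array, so this only suits lists that are read far more often than they grow.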
