ConcurrentQueue in a ConcurrentDictionary Duplicate Error

I have one thread that receives messages every 10 seconds and another that writes these messages to the database every minute.
Each message has a different sender which is named serialNumber in my case.
Therefore, I created a ConcurrentDictionary like below.
public ConcurrentDictionary<string, ConcurrentQueue<PacketModel>> _dicAllPackets;
The key of the dictionary is the serialNumber and the value is the collection of messages received over one minute. I collect a minute of data so that, instead of going to the database every 10 seconds, I go there once a minute and cut the number of database round trips to 1/6.
public class ShotManager
{
private const int SLEEP_THREAD_FOR_FILE_LIST_DB_SHOOTER = 25000;
private bool ACTIVE_FILE_DB_SHOOT_THREAD = false;
private List<Devices> _devices = new List<Devices>();
public ConcurrentDictionary<string, ConcurrentQueue<PacketModel>> _dicAllPackets;
public ShotManager()
{
ACTIVE_FILE_DB_SHOOT_THREAD = Utility.GetAppSettings("AppConfig", "0", "ACTIVE_LIST_DB_SHOOT") == "1";
init();
}
private void init()
{
using (iotemplaridbContext dbContext = new iotemplaridbContext())
_devices = (from d in dbContext.Devices select d).ToList();
if (_dicAllPackets is null)
_dicAllPackets = new ConcurrentDictionary<string, ConcurrentQueue<PacketModel>>();
foreach (var device in _devices)
{
if(!_dicAllPackets.ContainsKey(device.SerialNumber))
_dicAllPackets.TryAdd(device.SerialNumber, new ConcurrentQueue<PacketModel> { });
}
}
public void Spinner()
{
while (ACTIVE_FILE_DB_SHOOT_THREAD)
{
try
{
Parallel.ForEach(_dicAllPackets, devicePacket =>
{
Thread.Sleep(100);
readAndShot(devicePacket);
});
Thread.Sleep(SLEEP_THREAD_FOR_FILE_LIST_DB_SHOOTER);
//init();
}
catch (Exception ex)
{
//init();
tLogger.EXC("Spinner exception for write...", ex);
}
}
}
public void EnqueueObjectToQueue(string serialNumber, PacketModel model)
{
if (_dicAllPackets != null)
{
if (!_dicAllPackets.ContainsKey(serialNumber))
_dicAllPackets.TryAdd(serialNumber, new ConcurrentQueue<PacketModel> { });
else
_dicAllPackets[serialNumber].Enqueue(model);
}
}
private void readAndShot(KeyValuePair<string, ConcurrentQueue<PacketModel>> keyValuePair)
{
StringBuilder sb = new StringBuilder();
if (keyValuePair.Value.Count() <= 0)
{
return;
}
sb.AppendLine($"INSERT INTO ......) VALUES(");
//the reason why I don't use while(TryDequeue(out ..)){..} is there's constantly enqueue to this dictionary, so the thread will be occupied with a single device for so long
for (int i = 0; i < 10; i++)
{
keyValuePair.Value.TryDequeue(out PacketModel packet);
if (packet != null)
{
/*
*** do something and fill the sb...
*/
}
else
{
Console.WriteLine("No packet found! For Device: " + keyValuePair.Key);
break;
}
}
insertIntoDB(sb.ToString()[..(sb.Length - 5)] + ";");
}
}
EnqueueObjectToQueue is called from a different class, as shown below.
private void packetToDictionary(string serialNumber, string jsonPacket, string messageTimeStamp)
{
PacketModel model = new PacketModel {
MachineData = jsonPacket,
DataInsertedAt = messageTimeStamp
};
_shotManager.EnqueueObjectToQueue(serialNumber, model);
}
The function above is called from the message handler itself.
private void messageReceiveHandler(object sender, MessageReceviedEventArgs e){
//do something...parse from e and call the func
string jsonPacket = ""; //something parsed from e
string serialNumber = ""; //something parsed from e
string message_timestamp = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss");
ThreadPool.QueueUserWorkItem(state => packetToDictionary(serialNumber, jsonPacket, message_timestamp));
}
The problem is that sometimes packets are enqueued under the wrong serialNumber, or are repeated (duplicate entries).
Is it clever to use ConcurrentQueue in a ConcurrentDictionary like this?

No, it's not a good idea to use a ConcurrentDictionary with nested ConcurrentQueues as values. It's impossible to update this structure atomically. Take this for example:
if (!_dicAllPackets.ContainsKey(serialNumber))
_dicAllPackets.TryAdd(serialNumber, new ConcurrentQueue<PacketModel> { });
else
_dicAllPackets[serialNumber].Enqueue(model);
This little piece of code is riddled with race conditions. A thread running this code can be preempted by another thread at any point between the ContainsKey, the TryAdd, the [] indexer and the Enqueue invocations. The other thread can alter the state of the structure and invalidate the conditions on which the correctness of the current thread's work is based.
A ConcurrentDictionary is a good idea when you have a simple Dictionary that contains immutable values, you want to use it concurrently, and using a lock around each access could potentially create significant contention. You can read more about this here: When should I use ConcurrentDictionary and Dictionary?
My suggestion is to switch to a simple Dictionary<string, Queue<PacketModel>>, and synchronize it with a lock. If you are careful and you avoid doing anything irrelevant while holding the lock, the lock will be released so quickly that other threads will rarely be blocked by it. Use the lock just to protect the reading and updating of a specific entry of the structure, and nothing else.
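A minimal sketch of that suggestion, under a few assumptions: PacketBuffer, Enqueue and DequeueBatch are hypothetical names, and PacketModel is reduced to the two properties shown in the question. Note that the first packet of a previously unseen serialNumber is enqueued as well, whereas the question's EnqueueObjectToQueue only adds an empty queue in that case.

```csharp
using System.Collections.Generic;

// Reduced stand-in for the PacketModel class from the question.
public class PacketModel
{
    public string MachineData { get; set; }
    public string DataInsertedAt { get; set; }
}

// Hypothetical wrapper: one plain dictionary, one lock, nothing else shared.
public class PacketBuffer
{
    private readonly object _locker = new object();
    private readonly Dictionary<string, Queue<PacketModel>> _packets =
        new Dictionary<string, Queue<PacketModel>>();

    // Called by the receiving threads.
    public void Enqueue(string serialNumber, PacketModel model)
    {
        lock (_locker)
        {
            if (!_packets.TryGetValue(serialNumber, out Queue<PacketModel> queue))
            {
                queue = new Queue<PacketModel>();
                _packets.Add(serialNumber, queue);
            }
            queue.Enqueue(model); // lookup, add and enqueue happen as one atomic step
        }
    }

    // Called by the database writer: removes up to maxCount packets of one device.
    public List<PacketModel> DequeueBatch(string serialNumber, int maxCount)
    {
        var batch = new List<PacketModel>();
        lock (_locker)
        {
            if (_packets.TryGetValue(serialNumber, out Queue<PacketModel> queue))
            {
                while (batch.Count < maxCount && queue.Count > 0)
                    batch.Add(queue.Dequeue());
            }
        }
        return batch; // build the SQL and call the database outside the lock
    }
}
```

Only the dictionary and queue access sits inside the lock. Building the INSERT statement and talking to the database work on the returned list, so they never hold up the receiving threads.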
Alternative designs
A ConcurrentDictionary<string, Queue<PacketModel>> structure might be a good option, on the condition that you never remove queues from the dictionary. Otherwise there is still room for race conditions. You should use the GetOrAdd method exclusively to atomically get or add a queue in the dictionary, and always use the queue itself as the lock object before doing anything with it (either reading or writing):
Queue<PacketModel> queue = _dicAllPackets
.GetOrAdd(serialNumber, _ => new Queue<PacketModel>());
lock (queue)
{
queue.Enqueue(model);
}
Using a ConcurrentDictionary<string, ImmutableQueue<PacketModel>> is also possible, because in this case the value of the ConcurrentDictionary is immutable and you won't need to lock anything. You'll always need to use the AddOrUpdate method, so that the dictionary is updated with a single call, as an atomic operation.
_dicAllPackets.AddOrUpdate
(
serialNumber,
key => ImmutableQueue.Create<PacketModel>(model),
(key, queue) => queue.Enqueue(model)
);
The queue.Enqueue(model) call inside the updateValueFactory delegate does not mutate the queue. Instead it creates a new ImmutableQueue<PacketModel> and discards the previous one. The immutable collections are not very efficient in general. But if your goal is to minimize the contention between threads, at the cost of increasing the work that each thread has to do, then you might find them useful.
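The AddOrUpdate call covers the producer side only. For the writer thread, taking a batch out of the same structure could be sketched as follows. ImmutableQueueDrain and DequeueBatch are hypothetical names, not library APIs; the sketch relies on ConcurrentDictionary.TryUpdate succeeding only when the stored queue is still the instance that was read.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Collections.Immutable;

public static class ImmutableQueueDrain
{
    // Hypothetical helper: atomically removes up to maxCount items for one key.
    public static List<T> DequeueBatch<T>(
        ConcurrentDictionary<string, ImmutableQueue<T>> dictionary,
        string key, int maxCount)
    {
        while (true)
        {
            var batch = new List<T>();
            if (!dictionary.TryGetValue(key, out ImmutableQueue<T> original))
                return batch; // unknown key: nothing to dequeue

            ImmutableQueue<T> remaining = original;
            while (batch.Count < maxCount && !remaining.IsEmpty)
            {
                remaining = remaining.Dequeue(out T item);
                batch.Add(item);
            }

            // Succeeds only if no other thread replaced the queue in the meantime.
            // Otherwise the batch is discarded and the loop retries on the fresh queue.
            if (dictionary.TryUpdate(key, remaining, original))
                return batch;
        }
    }
}
```

Because a failed TryUpdate throws the batch away and starts over, no packet is lost or handed out twice, at the cost of occasionally repeating the dequeue work under contention.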

Related

Unable to implement data parsing in a multi-threaded context using lock

I've built a program that
takes in a list of record data from a file
parses and cleans up each record in a parsing object
outputs it to an output file
So far this has worked on a single thread, but since the record count can exceed 1 million in some cases, we want to make it multithreaded. Multithreading in .NET is new to me; I've given it a shot but it's not working. Below I provide more details and code:
Main Class (simplified):
public class MainClass
{
parseObject[] parseObjects;
Thread[] threads;
List<InputLineItem> inputList = new List<InputLineItem>();
FileUtils fileUtils = new FileUtils();
public GenParseUtilsThreaded(int threadCount)
{
this.threadCount = threadCount;
Init();
}
public void Init()
{
inputList = fileUtils.GetInputList();
parseObjects = new parseObject[threadCount - 1];
threads = new Thread[threadCount - 1];
InitParseObjects();
Parse();
}
private void InitParseObjects()
{
//using a ref of fileUtils to use as my lock expression
parseObjects[0] = new ParseObject(ref fileUtils);
parseObjects[0].InitValues();
for (int i = 1; i < threadCount - 1; i++)
{
parseObjects[i] = new parseObject(ref fileUtils);
parseObjects[i].InitValues();
}
}
private void InitThreads()
{
for (int i = 0; i < threadCount - 1; i++)
{
Thread t = new Thread(new ThreadStart(parseObjects[0].CleanupAndParseInput));
threads[i] = t;
}
}
public void Parse()
{
try
{
InitThreads();
int objectIndex = 0;
foreach (InputLineItem inputLineItem in inputList)
{
parseObjects[0].inputLineItem = inputLineItem;
threads[objectIndex].Start();
objectIndex++;
if (objectIndex == threadCount)
{
objectIndex = 0;
InitThreads(); //do i need to re-init the threads after I've already used them all once?
}
}
}
catch (Exception e)
{
Console.WriteLine("(286) The following error occured: " + e);
}
}
}
}
And my Parse object class (also simplified):
public class ParseObject
{
public ParserLibrary parser { get; set; }
public FileUtils fileUtils { get; set; }
public InputLineItem inputLineItem { get; set; }
public ParseObject( ref FileUtils fileUtils)
{
this.fileUtils = fileUtils;
}
public void InitValues()
{
//relevant config of parser library object occurs here
}
public void CleanupFields()
{
parser.Clean(inputLineItem.nameValue);
inputLineItem.nameValue = GetCleanupUpValueFromParser();
}
private string GetCleanupFieldValue()
{
//code to extract cleanup up value from parses
}
public void CleanupAndParseInput()
{
CleanupFields();
ParseInput();
}
public void ParseInput()
{
try
{
parser.Parse(InputLineItem.NameValue);
}
catch (Exception e)
{
}
try
{
lock (fileUtils)
{
WriteOutputToFile(inputLineItem);
}
}
catch (Exception e)
{
Console.WriteLine("(414) Failed to write to output: " + e);
}
}
public void WriteOutputToFile(InputLineItem inputLineItem)
{
//writes updated value to output file
}
}
When I try to run the Parse function, I get this error:
An unhandled exception of type 'System.AccessViolationException' occurred in GenParse.NET.dll
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
That being said, I feel like there's a whole lot more that I'm doing wrong here aside from what is causing that error.
I also have further questions:
Do I create multiple parse objects and iteratively feed them to each thread as I'm attempting to do, or should I use one Parse object that gets shared or cloned across each thread?
If, outside the thread, I change a value in the object that I'm passing to the thread, will that change reflect in the object passed to the thread? i.e, is the object passed by value or reference?
Is there a more efficient way for each record to be assigned to a thread and its parse object than I am currently doing with the objectIndex iterator?
THANKS!

Do I create multiple parse objects and iteratively feed them to each thread as I'm attempting to do, or should I use one Parse object that gets shared or cloned across each thread?
You initialize each thread with new ThreadStart(parseObjects[0].CleanupAndParseInput) so all threads will share the same parse object. It is a fairly safe bet that the parse objects are not threadsafe. So each thread should have a separate object. Note that this might not be sufficient, if the parse library uses any global fields it might be non-threadsafe even when using separate objects.
If, outside the thread, I change a value in the object that I'm passing to the thread, will that change reflect in the object passed to the thread? i.e, is the object passed by value or reference?
Class instances are reference types, so the thread receives a reference to the same object, not a copy, and it can observe changes you make from outside. However, changes to an object are not guaranteed to be visible to other threads unless a memory barrier is issued. Most synchronization constructs (like lock) issue memory barriers. Keep in mind that any non-atomic operation is unsafe if a field is written and read concurrently.
Is there a more efficient way for each record to be assigned to a thread and its parse object than I am currently doing with the objectIndex iterator?
Using manual threads in this way is very old-school. The modern, easier, and probably faster way is to use a parallel-for loop. This will try to be smart about how many threads it will use and try to adapt chunk sizes to keep the synchronization overhead low.
var items = new List<int>();
ParseObject LocalInit()
{
// Do initialization. This is run once for each thread used
return new ParseObject();
}
ParseObject ThreadMain(int value, ParallelLoopState state, ParseObject threadLocalObject)
{
// Do whatever you need to do
// This is run on multiple threads
return threadLocalObject;
}
void LocalFinally(ParseObject obj)
{
// Do Cleanup for each thread
}
Parallel.ForEach(items, LocalInit, ThreadMain, LocalFinally);
As a final note, I would advise against using multithreading unless you are familiar with the potential dangers and pitfalls it involves, at least for any project where the result is important. There are many ways to screw up and make a program that will work 99.9% of the time, and silently corrupt data the remaining 0.1% of the time.

Share local collection between Threads in C#

I have two methods as below
private void MethodB_GetId()
{
//Calling Method A continuously in different threads
//Let's say its calling for Id = 1 to 100
}
private void MethodA_GetAll()
{
List<string> lst;
lock(_locker)
{
lst = SomeService.Get(); //This get return all 100 ids in one shot.
//Some other processing and then return result.
}
}
Now the client calls MethodB_GetId continuously to fetch data for ids 1 to 100 in random order. (It requires data for only some of these 100 ids, not all of them.)
MethodA_GetAll gets all the data from the network (maybe a cache or a database) in one shot and returns the whole collection to method B, which then extracts the record it is interested in.
If MethodA_GetAll() makes the GetAll call multiple times, fetching the same records over and over is wasteful. So I can put a lock around it: while one thread is fetching the records, the others are blocked.
Let's say MethodA_GetAll, called for Id = 1, acquires the lock and all the other callers are waiting for the lock to be released.
What I want is that once the data has been made available by any one thread, the other threads don't make the call again.
Solution option:
1. Make List global to that class and thread safe. (I don't have that option)
Somehow thread 1 needs to tell all the other threads that it already has the records, so they don't fetch them again.
something like
lock(_locker && Lst!=null) //Not here lst is local to every thread
{
//If this satisfy then only fetch records
}
Please excuse the poorly framed question; I posted this in a bit of a hurry.

It sounds like you want to create a threadsafe cache. One way to do this is to use Lazy<T>.
Here's an example for a cache of type List<string>:
public sealed class DataProvider
{
public DataProvider()
{
_cache = new Lazy<List<string>>(createCache);
}
public void DoSomethingThatNeedsCachedList()
{
var list = _cache.Value;
// Do something with list.
Console.WriteLine(list[10]);
}
readonly Lazy<List<string>> _cache;
List<string> createCache()
{
// Dummy implementation.
return Enumerable.Range(1, 100).Select(x => x.ToString()).ToList();
}
}
When you need to access the cached value, you just access _cache.Value. If it hasn't yet been created, then the method you passed to the Lazy<T>'s constructor will be called to initialise it. In the example above, this is the createCache() method.
This is done in a threadsafe manner, so that if two threads try to access the cached value simultaneously when it hasn't been created yet, one of the threads will actually end up calling createCache() and the other thread will be blocked until the cached value has been initialised.
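To see that behaviour for yourself, here is a small sketch that hits a Lazy<List<string>> from several threads at once and counts how often the factory ran. LazyCacheDemo, Run and the 50 ms sleep are made up for illustration; the thread-safety mode is spelled out, although ExecutionAndPublication is already the default for this constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LazyCacheDemo
{
    private static int _factoryCalls; // counts how often the factory actually ran

    private static readonly Lazy<List<string>> _cache = new Lazy<List<string>>(
        () =>
        {
            Interlocked.Increment(ref _factoryCalls);
            Thread.Sleep(50); // simulate a slow fetch so the callers overlap
            return Enumerable.Range(1, 100).Select(x => x.ToString()).ToList();
        },
        LazyThreadSafetyMode.ExecutionAndPublication);

    // Reads the cached list from several threads and returns the factory call count.
    public static int Run(int threadCount)
    {
        Parallel.For(0, threadCount,
            new ParallelOptions { MaxDegreeOfParallelism = threadCount },
            i => GC.KeepAlive(_cache.Value));
        return Volatile.Read(ref _factoryCalls);
    }
}
```

However many threads call Run, the factory count stays at 1: the late callers wait for the first one and then share its list.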

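You can try double-checked locking on lst. Marking the field volatile is the conservative choice, so that a thread that skips the lock cannot observe the reference before the list is fully built:
private volatile List<string> lst;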
private void MethodA_GetAll()
{
if (lst == null)
{
lock (_locker)
{
if (lst == null)
{
// do your thing
}
}
}
}

An enumerator wrapper that pre-buffers a number of items from underlying enumerator in advance

Suppose I have some IEnumerator<T> which does a fair amount of processing inside the MoveNext() method.
The code consuming from that enumerator does not just consume as fast as data is available, but occasionally waits (the specifics of which are irrelevant to my question) in order to synchronize the time when it needs to resume consumption. But when it does the next call to MoveNext(), it needs the data as fast as possible.
One way would be to pre-consume the whole stream into some list or array structure for instant enumeration. That would be a waste of memory however, as at any single point in time, only one item is in use, and it would be prohibitive in cases where the whole data does not fit into memory.
So is there something generic in .net that wraps an enumerator / enumerable in a way that it asynchronously pre-iterates the underlying enumerator a couple of items in advance and buffers the results so that it always has a number of items available in its buffer and the calling MoveNext will never have to wait? Obviously items consumed, i.e. iterated over by a subsequent MoveNext from the caller, would be removed from the buffer.
N.B. Part of what I'm trying to do is also called Backpressure, and, in the Rx world, has already been implemented in RxJava and is under discussion in Rx.NET. Rx (observables that push data) can be considered the opposite approach of enumerators (enumerators allow pulling of data). Backpressure is relatively easy in the pulling approach, as my answer shows: Just pause consumption. It's harder when pushing, requiring an additional feedback mechanism.

A more concise alternative to your custom enumerable class is to do this:
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> source, int bufferSize)
{
var queue = new BlockingCollection<T>(bufferSize);
Task.Run(() => {
try { foreach(var i in source) queue.Add(i); }
finally { queue.CompleteAdding(); } // always unblock the consumer, even if the source throws
});
return queue.GetConsumingEnumerable();
}
This can be used as:
var slowEnumerable = GetMySlowEnumerable();
var buffered = slowEnumerable.Buffer(10); // Populates up to 10 items on a background thread

There are different ways to implement this yourself, and I decided to use
a single dedicated thread per enumerator that does the asynchronous pre-buffering
a fixed number of elements to pre-buffer
Which is perfect for my case at hand (only a few, very long-running enumerators), but e.g. creating a thread might be too heavy if you use lots and lots of enumerators, and the fixed number of elements may be too inflexible if you need something more dynamic, based perhaps on the actual content of the items.
I have so far only tested its main feature, and some rough edges may remain. It can be used like this:
int bufferSize = 5;
IEnumerable<int> en = ...;
foreach (var item in new PreBufferingEnumerable<int>(en, bufferSize))
{
...
Here's the gist of the Enumerator:
class PreBufferingEnumerator<TItem> : IEnumerator<TItem>
{
private readonly IEnumerator<TItem> _underlying;
private readonly int _bufferSize;
private readonly Queue<TItem> _buffer;
private bool _done;
private bool _disposed;
public PreBufferingEnumerator(IEnumerator<TItem> underlying, int bufferSize)
{
_underlying = underlying;
_bufferSize = bufferSize;
_buffer = new Queue<TItem>();
Thread preBufferingThread = new Thread(PreBufferer) { Name = "PreBufferingEnumerator.PreBufferer", IsBackground = true };
preBufferingThread.Start();
}
private void PreBufferer()
{
while (true)
{
lock (_buffer)
{
while (_buffer.Count == _bufferSize && !_disposed)
Monitor.Wait(_buffer);
if (_disposed)
return;
}
if (!_underlying.MoveNext())
{
lock (_buffer)
{
_done = true;
Monitor.PulseAll(_buffer); // wake a consumer blocked in MoveNext on an empty buffer
}
return;
}
var current = _underlying.Current; // do outside lock, in case underlying enumerator does something inside get_Current()
lock (_buffer)
{
_buffer.Enqueue(current);
Monitor.Pulse(_buffer);
}
}
}
public bool MoveNext()
{
lock (_buffer)
{
while (_buffer.Count == 0 && !_done && !_disposed)
Monitor.Wait(_buffer);
if (_buffer.Count > 0)
{
Current = _buffer.Dequeue();
Monitor.Pulse(_buffer); // so PreBufferer thread can fetch more
return true;
}
return false; // _done || _disposed
}
}
public TItem Current { get; private set; }
public void Dispose()
{
lock (_buffer)
{
if (_disposed)
return;
_disposed = true;
_buffer.Clear();
Current = default(TItem);
Monitor.PulseAll(_buffer);
}
}
// Reset() and the non-generic IEnumerator.Current are omitted from this gist.
}

Serially process ConcurrentQueue and limit to one message processor. Correct pattern?

I'm building a multithreaded app in .net.
I have a thread that listens to a connection (abstract, serial, tcp...).
When it receives a new message, it adds it to the queue via AddMessage, which then calls startSpool. startSpool checks whether the spool is already running; if it is, it returns, otherwise it starts the spool in a new thread. The reason for this is that the messages HAVE to be processed serially, FIFO.
So, my questions are...
Am I going about this the right way?
Are there better, faster, cheaper patterns out there?
My apologies if there is a typo in my code, I was having problems copying and pasting.
ConcurrentQueue<IMyMessage > messages = new ConcurrentQueue<IMyMessage>();
const int maxSpoolInstances = 1;
object lcurrentSpoolInstances;
int currentSpoolInstances = 0;
Thread spoolThread;
public void AddMessage(IMyMessage message)
{
this.messages.Add(message);
this.startSpool();
}
private void startSpool()
{
bool run = false;
lock (lcurrentSpoolInstances)
{
if (currentSpoolInstances <= maxSpoolInstances)
{
this.currentSpoolInstances++;
run = true;
}
else
{
return;
}
}
if (run)
{
this.spoolThread = new Thread(new ThreadStart(spool));
this.spoolThread.Start();
}
}
private void spool()
{
Message.ITimingMessage message;
while (this.messages.Count > 0)
{
// TODO: Is this below line necessary or does the TryDequeue cover this?
message = null;
this.messages.TryDequeue(out message);
if (message != null)
{
// My long running thing that does something with this message.
}
}
lock (lcurrentSpoolInstances)
{
this.currentSpoolInstances--;
}
}

This would be easier using BlockingCollection<T> instead of ConcurrentQueue<T>.
Something like this should work:
class MessageProcessor : IDisposable
{
BlockingCollection<IMyMessage> messages = new BlockingCollection<IMyMessage>();
public MessageProcessor()
{
// Start the consumer in the constructor to prevent the race condition in the existing code (where you could start multiple threads)
Task.Factory.StartNew(this.Spool, TaskCreationOptions.LongRunning);
}
public void AddMessage(IMyMessage message)
{
this.messages.Add(message);
}
private void Spool()
{
foreach(IMyMessage message in this.messages.GetConsumingEnumerable())
{
// long running thing that does something with this message.
}
}
public void FinishProcessing()
{
// This will tell the spooling you're done adding, so it shuts down
this.messages.CompleteAdding();
}
void IDisposable.Dispose()
{
this.FinishProcessing();
}
}
Edit: If you wanted to support multiple consumers, you could handle that via a separate constructor. I'd refactor this to:
public MessageProcessor(int numberOfConsumers = 1)
{
for (int i=0;i<numberOfConsumers;++i)
StartConsumer();
}
private void StartConsumer()
{
// Each call starts one long-running consumer task
Task.Factory.StartNew(this.Spool, TaskCreationOptions.LongRunning);
}
This would allow you to start any number of consumers. Note that this breaks strict FIFO ordering: with this change, up to numberOfConsumers elements may be processed concurrently.
Multiple producers are already supported. The above is thread safe, so any number of threads can call Add(message) in parallel, with no changes.
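As a quick illustration of the single-consumer case, the sketch below has two producer tasks feeding one BlockingCollection while one consumer records the order in which it sees the messages. FifoDemo, Run and the message strings are made up for the example.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FifoDemo
{
    // Two producers add messages; one consumer records the order it saw them in.
    public static List<string> Run()
    {
        var messages = new BlockingCollection<string>();
        var processed = new List<string>(); // touched only by the consumer task

        Task consumer = Task.Factory.StartNew(() =>
        {
            // Blocks while the collection is empty, ends after CompleteAdding.
            foreach (string message in messages.GetConsumingEnumerable())
                processed.Add(message);
        }, TaskCreationOptions.LongRunning);

        Task producerA = Task.Run(() => { for (int i = 0; i < 100; i++) messages.Add("A" + i); });
        Task producerB = Task.Run(() => { for (int i = 0; i < 100; i++) messages.Add("B" + i); });
        Task.WaitAll(producerA, producerB);

        messages.CompleteAdding(); // lets GetConsumingEnumerable finish
        consumer.Wait();
        return processed;
    }
}
```

The A and B messages interleave differently from run to run, but each producer's own messages always come out in the order they were added, and nothing is processed twice.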

I think that Reed's answer is the best way to go, but for the sake of academics, here is an example using the concurrent queue. You had some races in the code that you posted (depending upon how you handle incrementing currentSpoolInstances).
The changes I made (below) were:
Switched to a Task instead of a Thread (uses thread pool instead of incurring the cost of creating a new thread)
added the code to increment/decrement your spool instance count
changed the "if currentSpoolInstances <= max ..." to just < to avoid having one too many workers (probably just a typo)
changed the way that empty queues were handled to avoid a race: I think you had a race where your while loop could have tested false (your thread begins to exit), but at that moment a new item is added (so your spool thread is exiting, but your spool count > 0, so your queue stalls).
private ConcurrentQueue<IMyMessage> messages = new ConcurrentQueue<IMyMessage>();
const int maxSpoolInstances = 1;
object lcurrentSpoolInstances = new object();
int currentSpoolInstances = 0;
public void AddMessage(IMyMessage message)
{
this.messages.Enqueue(message);
this.startSpool();
}
private void startSpool()
{
lock (lcurrentSpoolInstances)
{
if (currentSpoolInstances < maxSpoolInstances)
{
this.currentSpoolInstances++;
Task.Factory.StartNew(spool, TaskCreationOptions.LongRunning);
}
}
}
private void spool()
{
IMyMessage message;
while (true)
{
// you do not need to null message because it is an "out" parameter, had it been a "ref" parameter, you would want to null it.
if(this.messages.TryDequeue(out message))
{
// My long running thing that does something with this message.
}
else
{
lock (lcurrentSpoolInstances)
{
if (this.messages.IsEmpty)
{
this.currentSpoolInstances--;
return;
}
}
}
}
}

Check 'Pipelines pattern': http://msdn.microsoft.com/en-us/library/ff963548.aspx
Use BlockingCollection for the 'buffers'.
Each Processor (e.g. ReadStrings, CorrectCase, ..), should run in a Task.
HTH..

How to avoid double check locking when adding items to a Dictionary<> object in .NET?

I have a question about improving the efficiency of my program. I have a Dictionary<string, Thingey> defined to hold named Thingeys. This is a web application that will create multiple named Thingey’s over time. Thingey’s are somewhat expensive to create (not prohibitively so) but I’d like to avoid it whenever possible. My logic for getting the right Thingey for the request looks a lot like this:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
// create a new thingey on 1st reference
Thingey newThingey = new Thingey(request);
lock (this.Thingeys)
{
if (!this.Thingeys.ContainsKey(thingeyName))
{
this.Thingeys.Add(thingeyName, newThingey);
}
// else - oops someone else beat us to it
// newThingey will eventually get GCed
}
}
return this. Thingeys[thingeyName];
}
In this application, Thingeys live forever once created. We don’t know how to create them or which ones will be needed until the app starts and requests begin coming in. My problem with the above code is that there are occasional instances where newThingey is created needlessly, because we get multiple simultaneous requests for it before it’s been created. We end up creating 2 of them but only adding one to our collection.
Is there a better way to get Thingeys created and added that doesn’t involve check/create/lock/check/add with the rare extraneous thingey that we created but end up never using? (And this code works and has been running for some time. This is just the nagging bit that has always bothered me.)
I'm trying to avoid locking the dictionary for the duration of creating a Thingey.

This is the standard double-checked locking problem. The way it is implemented here is unsafe and can cause various problems, potentially up to a crash in the first check if the internal state of the dictionary is corrupted badly enough.
It is unsafe because you are checking the dictionary without synchronization, and if you are unlucky you can hit it while some other thread is in the middle of updating its internal state.
A simple solution is to place the first check under a lock as well. The problem with this is that it becomes a global lock, and in a web environment under heavy load it can become a serious bottleneck.
If we are talking about a .NET environment, there are ways to work around this issue by piggybacking on the ASP.NET synchronization mechanism.
Here is how I did it in the NDjango rendering engine: I keep one global dictionary and one dictionary per rendering thread. When a request comes in, I check the local dictionary first. This check does not have to be synchronized, and if the thingy is there I just take it.
If it is not, I synchronize on the global dictionary and check whether it is there. If it is, I add it to my thread dictionary and release the lock. If it is not in the global dictionary, I add it there first while still under the lock.
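A rough sketch of that two-level lookup, not the actual NDjango code: TwoLevelCache and GetOrCreate are hypothetical names, and ThreadLocal<T> stands in for whatever per-thread storage the rendering engine uses. It is only valid because entries are never removed once created.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class TwoLevelCache<TValue>
{
    private readonly object _globalLock = new object();
    private readonly Dictionary<string, TValue> _global = new Dictionary<string, TValue>();

    // One private dictionary per thread, created on that thread's first access.
    private readonly ThreadLocal<Dictionary<string, TValue>> _local =
        new ThreadLocal<Dictionary<string, TValue>>(() => new Dictionary<string, TValue>());

    public TValue GetOrCreate(string key, Func<string, TValue> factory)
    {
        Dictionary<string, TValue> local = _local.Value;
        if (local.TryGetValue(key, out TValue value))
            return value; // no synchronization: only this thread touches 'local'

        lock (_globalLock)
        {
            if (!_global.TryGetValue(key, out value))
            {
                value = factory(key); // created at most once, under the global lock
                _global.Add(key, value);
            }
        }
        local.Add(key, value); // later lookups on this thread skip the lock
        return value;
    }
}
```

Each thread pays for the global lock once per key; after that its lookups are plain unsynchronized dictionary reads.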

Well, from my point of view simpler code is better, so I'd only use one lock:
private readonly object thingeysLock = new object();
private readonly Dictionary<string, Thingey> thingeys = new Dictionary<string, Thingey>();
public Thingey GetThingey(Request request)
{
string key = request.ThingeyName;
lock (thingeysLock)
{
Thingey ret;
if (!thingeys.TryGetValue(key, out ret))
{
ret = new Thingey(request);
thingeys[key] = ret;
}
return ret;
}
}
Locks are really cheap when they're not contended. The downside is that this means that occasionally you will block everyone for the whole duration of the time you're creating a new Thingey. Clearly to avoid creating redundant thingeys you'd have to at least block while multiple threads create the Thingey for the same key. Reducing it so that they only block in that situation is somewhat harder.
I would suggest you use the above code but profile it to see whether it's fast enough. If you really need "only block when another thread is already creating the same thingey" then let us know and we'll see what we can do...
EDIT: You've commented on Adam's answer that you "don't want to lock while a new Thingey is being created" - you do realise that there's no getting away from that if there's contention for the same key, right? If thread 1 starts creating a Thingey, then thread 2 asks for the same key, your alternatives for thread 2 are either waiting or creating another instance.
EDIT: Okay, this is generally interesting, so here's a first pass at the "only block other threads asking for the same item".
private readonly object dictionaryLock = new object();
private readonly object creationLocksLock = new object();
private readonly Dictionary<string, Thingey> thingeys;
private readonly Dictionary<string, object> creationLocks;
public Thingey GetThingey(Request request)
{
string key = request.ThingeyName;
Thingey ret;
bool entryExists;
lock (dictionaryLock)
{
entryExists = thingeys.TryGetValue(key, out ret);
// Atomically mark the dictionary to say we're creating this item,
// and also set an entry for others to lock on
if (!entryExists)
{
thingeys[key] = null;
lock (creationLocksLock)
{
creationLocks[key] = new object();
}
}
}
// If we found something, great!
if (ret != null)
{
return ret;
}
// Otherwise, see if we're going to create it or whether we need to wait.
if (entryExists)
{
object creationLock;
lock (creationLocksLock)
{
creationLocks.TryGetValue(key, out creationLock);
}
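// If creationLock is null, it means the creating thread has finished
// creating it and removed the creation lock, so we don't need to wait.
if (creationLock != null)
{
lock (creationLock)
{
// Re-check under the lock so that a pulse sent before we started waiting is not missed.
while (true)
{
lock (dictionaryLock)
{
if (thingeys[key] != null) break;
}
Monitor.Wait(creationLock);
}
}
}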
// We *know* it's in the dictionary now - so just return it.
lock (dictionaryLock)
{
return thingeys[key];
}
}
else // We said we'd create it
{
Thingey thingey = new Thingey(request);
// Put it in the dictionary
lock (dictionaryLock)
{
thingeys[key] = thingey;
}
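// Tell anyone waiting that they can look now
object creationLock;
lock (creationLocksLock)
{
creationLock = creationLocks[key];
creationLocks.Remove(key);
}
// Monitor.PulseAll must be called while holding the lock on the same object
lock (creationLock)
{
Monitor.PulseAll(creationLock);
}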
return thingey;
}
}
Phew!
That's completely untested, and in particular it isn't in any way, shape or form robust in the face of exceptions in the creating thread... but I think it's the generally right idea :)
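For comparison, the same "only block other threads asking for the same item" behaviour can be sketched with a different technique than the hand-rolled Monitor code: a ConcurrentDictionary whose values are Lazy<T> wrappers. This is a swapped-in alternative, and KeyedLazyCache and GetOrCreate are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

public class KeyedLazyCache<TValue>
{
    private readonly ConcurrentDictionary<string, Lazy<TValue>> _items =
        new ConcurrentDictionary<string, Lazy<TValue>>();

    public TValue GetOrCreate(string key, Func<string, TValue> factory)
    {
        // Under contention GetOrAdd may build more than one cheap Lazy wrapper,
        // but only one is stored, and only the stored one ever runs the factory.
        Lazy<TValue> lazy = _items.GetOrAdd(key, k => new Lazy<TValue>(() => factory(k)));
        return lazy.Value; // blocks only threads asking for this same key
    }
}
```

A call would look like cache.GetOrCreate(request.ThingeyName, _ => new Thingey(request)). One caveat: with the default Lazy<T> mode, an exception thrown by the factory is cached and rethrown on every later access for that key.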

If you're looking to avoid blocking unrelated threads, then additional work is needed (and should only be necessary if you've profiled and found that performance is unacceptable with the simpler code). I would recommend using a lightweight wrapper class that asynchronously creates a Thingey and using that in your dictionary.
Dictionary<string, ThingeyWrapper> thingeys = new Dictionary<string, ThingeyWrapper>();
private class ThingeyWrapper
{
public Thingey Thing { get; private set; }
private object creationFlag;
private Request request;
public ThingeyWrapper(Request request)
{
creationFlag = new object();
this.request = request;
}
public void WaitForCreation()
{
object flag = creationFlag;
if(flag != null)
{
lock(flag)
{
if(request != null) Thing = new Thingey(request);
creationFlag = null;
request = null;
}
}
}
}
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
ThingeyWrapper output;
lock (this.Thingeys)
{
if(!this.Thingeys.TryGetValue(thingeyName, out output))
{
output = new ThingeyWrapper(request);
this.Thingeys.Add(thingeyName, output);
}
}
output.WaitForCreation();
return output.Thing;
}
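While you are still locking on all calls, the dictionary lock is only held long enough to look up or add a wrapper, which is much more lightweight. The expensive Thingey construction happens under the wrapper's own lock.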
Edit
This issue has stuck with me more than I expected it to, so I whipped together a somewhat more robust solution that follows this general pattern. You can find it here.
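The wrapper above does by hand roughly what the framework's Lazy<T> does, so one possible shortcut is to store Lazy<Thingey> values in a ConcurrentDictionary. The sketch below assumes that approach; it is not the solution linked above. Request and Thingey are hypothetical stand-ins, LazyThingeyCache is a made-up name, and the Created counter is there only for demonstration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the types in the question.
public class Request
{
    public string ThingeyName { get; set; }
}

public class Thingey
{
    public static int Created; // demonstration only: counts constructor calls

    public Thingey(Request request)
    {
        Interlocked.Increment(ref Created);
        Thread.Sleep(50); // simulate expensive construction
    }
}

public class LazyThingeyCache
{
    private readonly ConcurrentDictionary<string, Lazy<Thingey>> thingeys =
        new ConcurrentDictionary<string, Lazy<Thingey>>();

    public Thingey GetThingey(Request request)
    {
        // GetOrAdd may run this factory more than once under contention,
        // but it only builds a cheap Lazy wrapper, and only one wrapper
        // is ever stored for a given key.
        Lazy<Thingey> lazy = thingeys.GetOrAdd(
            request.ThingeyName,
            _ => new Lazy<Thingey>(
                () => new Thingey(request),
                LazyThreadSafetyMode.ExecutionAndPublication));

        // Value runs the constructor at most once per stored wrapper and
        // blocks only the threads that asked for this key.
        return lazy.Value;
    }
}
```

One thing to keep in mind with this mode of Lazy<T>: if the Thingey constructor throws, the exception is cached, so later calls for that key will rethrow it rather than retry.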

IMHO, if this piece of code is called from many threads simultaneously, it is best to check twice: once before taking the lock and once inside it.
(But: I'm not sure that you can safely call ContainsKey while some other thread is calling Add. So it might not be possible to avoid the lock at all.)
If you just want to avoid creating a Thingey that is never used, create it inside the locked block:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
lock (this.Thingeys)
{
// only one can create the same Thingey
if (!this.Thingeys.ContainsKey(thingeyName))
{
Thingey newThingey = new Thingey(request);
this.Thingeys.Add(thingeyName, newThingey);
}
}
}
return this.Thingeys[thingeyName];
}
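On the ContainsKey question: as far as I know, the documentation for Dictionary<TKey, TValue> only promises safety for multiple concurrent readers while nothing is modifying the collection, so the unlocked first check is a gamble. One way to keep the check-twice shape while staying within that contract is a ReaderWriterLockSlim. A rough sketch, with hypothetical Request and Thingey stand-ins and a made-up class name:

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-ins for the types in the question.
public class Request
{
    public string ThingeyName { get; set; }
}

public class Thingey
{
    public static int Created; // demonstration only: counts constructor calls

    public Thingey(Request request)
    {
        Interlocked.Increment(ref Created);
    }
}

public class DoubleCheckedThingeyCache
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly Dictionary<string, Thingey> thingeys =
        new Dictionary<string, Thingey>();

    public Thingey GetThingey(Request request)
    {
        string thingeyName = request.ThingeyName;
        Thingey thingey;

        // First check: any number of readers may hold the read lock at once.
        rwLock.EnterReadLock();
        try
        {
            if (thingeys.TryGetValue(thingeyName, out thingey))
                return thingey;
        }
        finally
        {
            rwLock.ExitReadLock();
        }

        // Second check: the write lock is exclusive, so only one thread
        // can create and add a Thingey for a missing key.
        rwLock.EnterWriteLock();
        try
        {
            if (!thingeys.TryGetValue(thingeyName, out thingey))
            {
                thingey = new Thingey(request);
                thingeys.Add(thingeyName, thingey);
            }
            return thingey;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }
}
```

The trade-off is the same as in the code above: the Thingey is constructed while the exclusive lock is held, so a slow constructor stalls every other caller until it finishes.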

You have to ask yourself whether the specific ContainsKey operation and the getter are themselves thread-safe (and will stay that way in newer versions), because those may and will be invoked while another thread has the dictionary locked and is performing the Add.
Typically, .NET locks are fairly efficient if used correctly, and I believe that in this situation you're better off doing this:
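Thingey thingey;
bool exists;
lock (thingeys) {
exists = thingeys.TryGetValue(thingeyName, out thingey);
}
if (!exists) {
// Create outside the lock so other threads are not blocked meanwhile.
Thingey created = new Thingey(request);
lock (thingeys) {
// Another thread may have added one in the meantime; if so, use theirs.
if (!thingeys.TryGetValue(thingeyName, out thingey)) {
thingeys.Add(thingeyName, created);
thingey = created;
}
}
}
return thingey;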

Well, I hope I'm not being too naive in giving this answer, but since Thingeys are expensive to create, what I would do is add the key with a null value first. Something like this:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
lock (this.Thingeys)
{
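// Check again inside the lock: another thread may have added the key
// while we were waiting (Add would throw on a duplicate key).
if (!this.Thingeys.ContainsKey(thingeyName))
{
// reserve the key with a null placeholder
this.Thingeys.Add(thingeyName, null);
// create a new thingey on 1st reference
Thingey newThingey = new Thingey(request);
Thingeys[thingeyName] = newThingey;
}
// else - oops someone else beat us to it
// but it doesn't matter anymore since we only created one Thingey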
}
}
return this.Thingeys[thingeyName];
}
I modified your code in a rush so no testing was done.
Anyway, I hope my idea is not so naive. :D

You might be able to buy a little speed at the expense of memory. If you keep an array that lists all of the created Thingeys, never modify it once it has been published, and reference it from a static variable, then you can check for the existence of a Thingey outside of any lock, since an array that is never modified is safe to read from multiple threads. Then when adding a new Thingey, you create a new array with the additional entry and replace the old one (in the static variable) in one atomic reference assignment. Some new Thingeys may be missed because of race conditions, but the program shouldn't fail. It just means that on rare occasions a duplicate Thingey will be created.
This will not replace the need for duplicate checking when creating a new Thingy, and it will use a lot of memory resources, but it will not require that the lock be taken or held while creating a Thingy.
I'm thinking of something along these lines, sorta:
private Dictionary<string, Thingey> Thingeys;
// An immutable list of (most of) the thingeys that have been created.
private string[] existingThingeys = new string[0];
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
// Reference the same list throughout the method, just in case another
// thread replaces the global reference between operations.
string[] localThingyList = existingThingeys;
// Check to see if we already made this Thingey. (This might miss some,
// but it doesn't matter.)
// This operation on an immutable array is thread-safe.
if (localThingyList.Contains(thingeyName))
{
// But referencing the dictionary is not thread-safe.
lock (this.Thingeys)
{
if (this.Thingeys.ContainsKey(thingeyName))
return this.Thingeys[thingeyName];
}
}
Thingey newThingey = new Thingey(request);
Thingey ret;
// We haven't locked anything at this point, but we have created a new
// Thingey that we probably needed.
lock (this.Thingeys)
{
// If it turns out that the Thingey was already there, then
// return the old one.
if (!Thingeys.TryGetValue(thingeyName, out ret))
{
// Otherwise, add the new one.
Thingeys.Add(thingeyName, newThingey);
ret = newThingey;
}
}
// Update our existingThingeys array atomically.
string[] newThingyList = new string[localThingyList.Length + 1];
Array.Copy(localThingyList, newThingyList, localThingyList.Length);
newThingyList[localThingyList.Length] = thingeyName;
existingThingeys = newThingyList; // Voila!
return ret;
}
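A variation on the same idea, sketched under my own assumptions: instead of keeping a separate array of names next to a locked dictionary, treat the whole dictionary as a snapshot that is never modified after it is published, and swap in a new copy with Interlocked.CompareExchange. Reads then take no lock at all. As with the code above, two threads racing on the same new name may both construct a Thingey, and the loser's copy is thrown away. Request and Thingey are hypothetical stand-ins, and SnapshotThingeyCache is a made-up name.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-ins for the types in the question.
public class Request
{
    public string ThingeyName { get; set; }
}

public class Thingey
{
    public static int Created; // demonstration only: counts constructor calls

    public Thingey(Request request)
    {
        Interlocked.Increment(ref Created);
    }
}

public class SnapshotThingeyCache
{
    // Never mutated after being published; replaced wholesale instead.
    private Dictionary<string, Thingey> snapshot =
        new Dictionary<string, Thingey>();

    public Thingey GetThingey(Request request)
    {
        string thingeyName = request.ThingeyName;
        Thingey created = null;

        while (true)
        {
            // Work against one snapshot for the whole iteration.
            Dictionary<string, Thingey> current = Volatile.Read(ref snapshot);

            Thingey existing;
            if (current.TryGetValue(thingeyName, out existing))
                return existing; // ours (if any) is discarded as a duplicate

            // Construct at most once per call, even if we have to retry.
            if (created == null)
                created = new Thingey(request);

            // Copy-on-write: build a new dictionary with the extra entry.
            var next = new Dictionary<string, Thingey>(current);
            next[thingeyName] = created;

            // Publish only if nobody replaced the snapshot in the meantime;
            // otherwise loop and re-check against the newer snapshot.
            if (ReferenceEquals(
                    Interlocked.CompareExchange(ref snapshot, next, current),
                    current))
                return created;
        }
    }
}
```

Every add copies the whole dictionary, so this only makes sense when reads vastly outnumber new keys, which is the same memory-for-speed trade described above.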
