We are dealing with a lot of files which need to be opened and close for data reads mostly.
Is it a good idea or not to cache the memorystream of each file in a temp hashtable or some other object?
We have noticed when opening files over 100MB we are running into out of memory exceptions.
We are using a wpf app.
We could successfully open the files 1 or 2 time sometimes 3 to 4 times but after that we are running into out of memory exceptions.
If you are currently caching these files, then you would expect to run out of memory quite quickly.
If you aren't caching them yet, don't, because you'll just make it worse. Perhaps you have a memory leak? Are you disposing of the memorystream once you've used it?
The best way to deal with large files is to stream data in and out (using FileStreams), so that you don't have to have the whole file in memory at once...
One issue with the MemoryStream is the internal buffer doubles in size each time the capacity is forced to increase. Even if your MemoryStream is 100MB and your file is 101MB, as soon as you try to write that last 1MB to the MemoryStream the internal buffer on MemoryStream is doubled to 200MB. You may reduce this if you give the Memory Buffer a starting capacity equal to that of you files. But this will still allow the files to use all of the memory and stop any new allocations after the some of the files are loaded. If create a cache object that is help inside of a WeakReference object you would be able to allow the garbage collector to toss a few of your cached files as needed. But don't forget you will need to add code to recreate the lost cache on demand.
public class CacheStore<TKey, TCache>
{
private static object _lockStore = new object();
private static CacheStore<TKey, TCache> _store;
private static object _lockCache = new object();
private static Dictionary<TKey, TCache> _cache =
new Dictionary<TKey, TCache>();
public TCache this[TKey index]
{
get
{
lock (_lockCache)
{
if (_cache.ContainsKey(index))
return _cache[index];
return default(TCache);
}
}
set
{
lock (_lockCache)
{
if (_cache.ContainsKey(index))
_cache.Remove(index);
_cache.Add(index, value);
}
}
}
public static CacheStore<TKey, TCache> Instance
{
get
{
lock (_lockStore)
{
if (_store == null)
_store = new CacheStore<TKey, TCache>();
return _store;
}
}
}
}
public class FileCache
{
private WeakReference _cache;
public FileCache(string fileLocation)
{
if (!File.Exists(fileLocation))
throw new FileNotFoundException("fileLocation", fileLocation);
this.FileLocation = fileLocation;
}
private MemoryStream GetStream()
{
if (!File.Exists(this.FileLocation))
throw new FileNotFoundException("fileLocation", FileLocation);
return new MemoryStream(File.ReadAllBytes(this.FileLocation));
}
public string FileLocation { get; private set; }
public MemoryStream Data
{
get
{
if (_cache == null)
_cache = new WeakReference(GetStream(), false);
var ret = _cache.Target as MemoryStream;
if (ret == null)
{
Recreated++;
ret = GetStream();
_cache.Target = ret;
}
return ret;
}
}
public int Recreated { get; private set; }
}
class Program
{
static void Main(string[] args)
{
var cache = CacheStore<string, FileCache>.Instance;
var fileName = #"c:\boot.ini";
cache[fileName] = new FileCache(fileName);
var ret = cache[fileName].Data.ToArray();
Console.WriteLine("Recreated {0}", cache[fileName].Recreated);
Console.WriteLine(Encoding.ASCII.GetString(ret));
GC.Collect();
var ret2 = cache[fileName].Data.ToArray();
Console.WriteLine("Recreated {0}", cache[fileName].Recreated);
Console.WriteLine(Encoding.ASCII.GetString(ret2));
GC.Collect();
var ret3 = cache[fileName].Data.ToArray();
Console.WriteLine("Recreated {0}", cache[fileName].Recreated);
Console.WriteLine(Encoding.ASCII.GetString(ret3));
Console.Read();
}
}
It's very dificutl say "yes" or "no", if is file content caching right in the common case and/or with so little informations. However - finited resources are real state of world, and you (as developer) must count with it. If you want cache something, you should use some mechanism for auto unloading data. In .NET framework you can use a WeakReference class, which unloads the target object (byte array and memory stream are objects too).
If you have the HW in you control, and you can use 64bit and have funds for very big RAM, you can cache big files.
However, you should be humble to resources (cpu,ram) and use the "cheap" way of implementation.
I think that the problem is that after you are done, the file is not disposed immediatly, it is waiting to the next GC cycle.
Streams are IDisposable, whice means you can and should use the using block. then the stream will dispose immidiatly when your are done dealing with it.
I don't think that caching such amount of data is a good solution, even if you don't get ever memroy overflow. Check out Memory Mapped files solution, which means that file lays on file system but speed of reading is almost equal to the in memory ones (there is an overhead for sure). Check out this link. MemoryMappedFiles
P.S. Ther are pretty good articles and examples on this topic arround in internet.
Good Luck.
Related
Writing Stringbuilder to file asynchronously. This code takes control of a file, writes a stream to it and releases it. It deals with requests from asynchronous operations, which may come in at any time.
The FilePath is set per class instance (so the lock Object is per instance), but there is potential for conflict since these classes may share FilePaths. That sort of conflict, as well as all other types from outside the class instance, would be dealt with retries.
Is this code suitable for its purpose? Is there a better way to handle this that means less (or no) reliance on the catch and retry mechanic?
Also how do I avoid catching exceptions that have occurred for other reasons.
public string Filepath { get; set; }
private Object locker = new Object();
public async Task WriteToFile(StringBuilder text)
{
int timeOut = 100;
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
while (true)
{
try
{
//Wait for resource to be free
lock (locker)
{
using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
{
writer.Write(text.ToString());
}
}
break;
}
catch
{
//File not available, conflict with other class instances or application
}
if (stopwatch.ElapsedMilliseconds > timeOut)
{
//Give up.
break;
}
//Wait and Retry
await Task.Delay(5);
}
stopwatch.Stop();
}
How you approach this is going to depend a lot on how frequently you're writing. If you're writing a relatively small amount of text fairly infrequently, then just use a static lock and be done with it. That might be your best bet in any case because the disk drive can only satisfy one request at a time. Assuming that all of your output files are on the same drive (perhaps not a fair assumption, but bear with me), there's not going to be much difference between locking at the application level and the lock that's done at the OS level.
So if you declare locker as:
static object locker = new object();
You'll be assured that there are no conflicts with other threads in your program.
If you want this thing to be bulletproof (or at least reasonably so), you can't get away from catching exceptions. Bad things can happen. You must handle exceptions in some way. What you do in the face of error is something else entirely. You'll probably want to retry a few times if the file is locked. If you get a bad path or filename error or disk full or any of a number of other errors, you probably want to kill the program. Again, that's up to you. But you can't avoid exception handling unless you're okay with the program crashing on error.
By the way, you can replace all of this code:
using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
{
writer.Write(text.ToString());
}
With a single call:
File.AppendAllText(Filepath, text.ToString());
Assuming you're using .NET 4.0 or later. See File.AppendAllText.
One other way you could handle this is to have the threads write their messages to a queue, and have a dedicated thread that services that queue. You'd have a BlockingCollection of messages and associated file paths. For example:
class LogMessage
{
public string Filepath { get; set; }
public string Text { get; set; }
}
BlockingCollection<LogMessage> _logMessages = new BlockingCollection<LogMessage>();
Your threads write data to that queue:
_logMessages.Add(new LogMessage("foo.log", "this is a test"));
You start a long-running background task that does nothing but service that queue:
foreach (var msg in _logMessages.GetConsumingEnumerable())
{
// of course you'll want your exception handling in here
File.AppendAllText(msg.Filepath, msg.Text);
}
Your potential risk here is that threads create messages too fast, causing the queue to grow without bound because the consumer can't keep up. Whether that's a real risk in your application is something only you can say. If you think it might be a risk, you can put a maximum size (number of entries) on the queue so that if the queue size exceeds that value, producers will wait until there is room in the queue before they can add.
You could also use ReaderWriterLock, it is considered to be more 'appropriate' way to control thread safety when dealing with read write operations...
To debug my web apps (when remote debug fails) I use following ('debug.txt' end up in \bin folder on the server):
public static class LoggingExtensions
{
static ReaderWriterLock locker = new ReaderWriterLock();
public static void WriteDebug(string text)
{
try
{
locker.AcquireWriterLock(int.MaxValue);
System.IO.File.AppendAllLines(Path.Combine(Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase).Replace("file:\\", ""), "debug.txt"), new[] { text });
}
finally
{
locker.ReleaseWriterLock();
}
}
}
Hope this saves you some time.
I have a problem with freeing memory in C#. I have a static class containing a static dictionary, which is filled with references to objects. Single object zajumie large amount of memory. From time to time I release the memory by deleting obsolete references to the object set to null and remove the item from the dictionary. Unfortunately, in this case, the memory is not slowing down, time after reaching the maximum size of the memory in the system is as if a sudden release of unused resources and the amount of memory used correctly decreases.
Below is the diagram of classes:
public class cObj
{
public DateTime CreatedOn;
public object ObjectData;
}
public static class cData
{
public static ConcurrentDictionary<Guid, cObj> ObjectDict = new ConcurrentDictionary<Guid, cObj>();
public static FreeData()
{
foreach(var o in ObjectDict)
{
if (o.Value.CreatedOn <= DateTime.Now.AddSeconds(-30))
{
cObj Data;
if (ObjectDict.TryGetValue(o.Key, out Data))
{
Data.Status = null;
Data.ObjectData = null;
ObjectDict.TryRemove(o.Key, out Data);
}
}
}
}
}
In this case, the memory is released. If, however, after this operation, I call
GC.Collect ();
Followed by the expected release of unused objects.
How to solve the problem, so you do not have to use the GC.Collect()?
You shouldn't have to call GC.Collect() in most cases. To GC.Collect or not?
I've had similar scenarios where I've just created a dictionary that's limited to n entries, I did this myself on top of ConcurrentDictionary but you could use BlockingCollection.
One possible advantage is that if 1 million entries get added at the same time, all except n will be available for garbage collection rather than 30 seconds later.
Environment : .net 4.0
I have a task that transforms XML files with a XSLT stylesheet, here is my code
public string TransformFileIntoTempFile(string xsltPath,
string xmlPath)
{
var transform = new MvpXslTransform();
transform.Load(xsltPath, new XsltSettings(true, false),
new XmlUrlResolver());
string tempPath = Path.GetTempFileName();
using (var writer = new StreamWriter(tempPath))
{
using (XmlReader reader = XmlReader.Create(xmlPath))
{
transform.Transform(new XmlInput(reader), null,
new XmlOutput(writer));
}
}
return tempPath;
}
I have X threads that can launch this task in parallel.
Sometimes my input file are about 300 MB, sometimes it's only a few MB.
My problem : I get OutOfMemoryException when my program try to transform some big XML files in the same time.
How can I avoid these OutOfMemoryEception ? My idea is to stop a thread before executing the task until there is enough available memory, but I don't know how to do that. Or there is some other solution (like putting my task in a distinct application).
Thanks
I don't recommend blocking a thread. In worst case, you'll just end up starving the task that could potentially free the memory you needed, leading to deadlock or very bad performance in general.
Instead, I suggest you keep a work queue with priorities. Get the tasks from the Queue scheduled fairly across a thread pool. Make sure no thread ever blocks on a wait operation, instead repost the task to the queue (with a lower priority).
So what you'd do (e.g. on receiving an OutOfMemory exception), is post the same job/task onto the queue and terminate the current task, freeing up the thread for another task.
A simplistic approach is to use LIFO which ensures that a task posted to the queue will have 'lower priority' than any other jobs already on that queue.
Since .NET Framework 4 we have API to work with good old Memory-Mapped Files feature which is available many years within from Win32API, so now you can use it from the .NET Managed Code.
For your task better fit "Persisted memory-mapped files" option,
MSDN:
Persisted files are memory-mapped files that are associated with a
source file on a disk. When the last process has finished working with
the file, the data is saved to the source file on the disk. These
memory-mapped files are suitable for working with extremely large
source files.
On the page of MemoryMappedFile.CreateFromFile() method description you can find a nice example describing how to create a memory mapped Views for the extremely large file.
EDIT: Update regarding considerable notes in comments
Just found method MemoryMappedFile.CreateViewStream() which creates a stream of type MemoryMappedViewStream which is inherited from a System.IO.Stream.
I believe you can create an instance of XmlReader from this stream and then instantiate your custom implementation of the XslTransform using this reader/stream.
EDIT2: remi bourgarel (OP) already tested this approach and looks like this particular XslTransform implementation (I wonder whether ANY would) wont work with MM-View stream in way which was supposed
The main problem is that you are loading the entire Xml file. If you were to just transform-as-you-read the out of memory problem should not normally appear.
That being said I found a MS support article which suggests how it can be done:
http://support.microsoft.com/kb/300934
Disclaimer: I did not test this so if you use it and it works please let us know.
You could consider using a queue to throttle how many concurrent transforms are being done based on some sort of artificial memory boundary e.g. file size. Something like the following could be used.
This sort of throttling strategy can be combined with maximum number of concurrent files being processed to ensure your disk is not being thrashed too much.
NB I have not included necessary try\catch\finally around execution to ensure that exceptions are propogated to calling thread and Waithandles are always released. I could go into further detail here.
public static class QueuedXmlTransform
{
private const int MaxBatchSizeMB = 300;
private const double MB = (1024 * 1024);
private static readonly object SyncObj = new object();
private static readonly TaskQueue Tasks = new TaskQueue();
private static readonly Action Join = () => { };
private static double _CurrentBatchSizeMb;
public static string Transform(string xsltPath, string xmlPath)
{
string tempPath = Path.GetTempFileName();
using (AutoResetEvent transformedEvent = new AutoResetEvent(false))
{
Action transformTask = () =>
{
MvpXslTransform transform = new MvpXslTransform();
transform.Load(xsltPath, new XsltSettings(true, false),
new XmlUrlResolver());
using (StreamWriter writer = new StreamWriter(tempPath))
using (XmlReader reader = XmlReader.Create(xmlPath))
{
transform.Transform(new XmlInput(reader), null,
new XmlOutput(writer));
}
transformedEvent.Set();
};
double fileSizeMb = new FileInfo(xmlPath).Length / MB;
lock (SyncObj)
{
if ((_CurrentBatchSizeMb += fileSizeMb) > MaxBatchSizeMB)
{
_CurrentBatchSizeMb = fileSizeMb;
Tasks.Queue(isParallel: false, task: Join);
}
Tasks.Queue(isParallel: true, task: transformTask);
}
transformedEvent.WaitOne();
}
return tempPath;
}
private class TaskQueue
{
private readonly object _syncObj = new object();
private readonly Queue<QTask> _tasks = new Queue<QTask>();
private int _runningTaskCount;
public void Queue(bool isParallel, Action task)
{
lock (_syncObj)
{
_tasks.Enqueue(new QTask { IsParallel = isParallel, Task = task });
}
ProcessTaskQueue();
}
private void ProcessTaskQueue()
{
lock (_syncObj)
{
if (_runningTaskCount != 0) return;
while (_tasks.Count > 0 && _tasks.Peek().IsParallel)
{
QTask parallelTask = _tasks.Dequeue();
QueueUserWorkItem(parallelTask);
}
if (_tasks.Count > 0 && _runningTaskCount == 0)
{
QTask serialTask = _tasks.Dequeue();
QueueUserWorkItem(serialTask);
}
}
}
private void QueueUserWorkItem(QTask qTask)
{
Action completionTask = () =>
{
qTask.Task();
OnTaskCompleted();
};
_runningTaskCount++;
ThreadPool.QueueUserWorkItem(_ => completionTask());
}
private void OnTaskCompleted()
{
lock (_syncObj)
{
if (--_runningTaskCount == 0)
{
ProcessTaskQueue();
}
}
}
private class QTask
{
public Action Task { get; set; }
public bool IsParallel { get; set; }
}
}
}
Update
Fixed bug in maintaining batch size when rolling over to next batch window:
_CurrentBatchSizeMb = fileSizeMb;
I experience strange memory leak in computation expensive content-based image retrieval (CBIR) .NET application
The concept is that there is service class with thread loop which captures images from some source and then passes them to image tagging thread for annotation.
Image tags are queried from repository by the service class at specified time intervals and stored in its in-memory cache (Dictionary) to avoid frequent db hits.
The classes in the project are:
class Tag
{
public Guid Id { get; set; } // tag id
public string Name { get; set; } // tag name: e.g. 'sky','forest','road',...
public byte[] Jpeg { get; set; } // tag jpeg image patch sample
}
class IRepository
{
public IEnumerable<Tag> FindAll();
}
class Service
{
private IDictionary<Guid, Tag> Cache { get; set; } // to avoid frequent db reads
// image capture background worker (ICBW)
// image annotation background worker (IABW)
}
class Image
{
public byte[] Jpeg { get; set; }
public IEnumerable<Tag> Tags { get; set; }
}
ICBW worker captures jpeg image from some image source and passes it to IABW worker for annotation. IABW worker first tries to update Cache if time has come and then annotates the image by some algorithm creating Image object and attaching Tags to it then storing it to annotation repository.
Service cache update snippet in IABW worker is:
IEnumerable<Tag> tags = repository.FindAll();
Cache.Clear();
tags.ForEach(t => Cache.Add(t.Id, t));
IABW is called many times a second and is pretty processor extensive.
While running it for days I found memory increase in task manager. Using Perfmon to watch for Process/Private Bytes and .NET Memory/Bytes in all heaps I found them both increasing over the time.
Experimenting with the application I found that Cache update is the problem. If it is not updated there is no problem with the mem increase. But if the Cache update is as frequent as once in 1-5 minutes application gets ouf of mem pretty fast.
What might be the reason of that mem leak? Image objects are created quite often containing references to Tag objects in Cache. I presume when the Cache dictionary is created those references somehow are not garbage collected in the future.
Does it need to explicitly null managed byte[] objects to avoid memory leak e.g. by implementing Tag, Image as IDisposable?
Edit: 4 aug 2001, addition of the buggy code snippet causing quick mem leak.
static void Main(string[] args)
{
while (!Console.KeyAvailable)
{
IEnumerable<byte[]> data = CreateEnumeration(100);
PinEntries(data);
Thread.Sleep(900);
Console.Write(String.Format("gc mem: {0}\r", GC.GetTotalMemory(true)));
}
}
static IEnumerable<byte[]> CreateEnumeration(int size)
{
Random random = new Random();
IList<byte[]> data = new List<byte[]>();
for (int i = 0; i < size; i++)
{
byte[] vector = new byte[12345];
random.NextBytes(vector);
data.Add(vector);
}
return data;
}
static void PinEntries(IEnumerable<byte[]> data)
{
var handles = data.Select(d => GCHandle.Alloc(d, GCHandleType.Pinned));
var ptrs = handles.Select(h => h.AddrOfPinnedObject());
IntPtr[] dataPtrs = ptrs.ToArray();
Thread.Sleep(100); // unmanaged function call taking byte** data
handles.ToList().ForEach(h => h.Free());
}
No, you don't need to set anything to null or dispose of anything if it's just memory as you've shown.
I suggest you get hold of a good profiler to work out where the leak is. Do you have anything non-memory-related that you might be failing to dispose of, e.g. loading a GDI+ image to get the bytes?
I have a question about improving the efficiency of my program. I have a Dictionary<string, Thingey> defined to hold named Thingeys. This is a web application that will create multiple named Thingey’s over time. Thingey’s are somewhat expensive to create (not prohibitively so) but I’d like to avoid it whenever possible. My logic for getting the right Thingey for the request looks a lot like this:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
// create a new thingey on 1st reference
Thingey newThingey = new Thingey(request);
lock (this.Thingeys)
{
if (!this.Thingeys.ContainsKey(thingeyName))
{
this.Thingeys.Add(thingeyName, newThingey);
}
// else - oops someone else beat us to it
// newThingey will eventually get GCed
}
}
return this. Thingeys[thingeyName];
}
In this application, Thingeys live forever once created. We don’t know how to create them or which ones will be needed until the app starts and requests begin coming in. The question I have is in the above code is there are occasional instances where newThingey is created because we get multiple simultaneous requests for it before it’s been created. We end up creating 2 of them but only adding one to our collection.
Is there a better way to get Thingeys created and added that doesn’t involve check/create/lock/check/add with the rare extraneous thingey that we created but end up never using? (And this code works and has been running for some time. This is just the nagging bit that has always bothered me.)
I'm trying to avoid locking the dictionary for the duration of creating a Thingey.
This is the standard double check locking problem. The way it is implemented here is unsafe and can cause various problems - potentially up to the point of a crash in the first check if the internal state of the dictionary is screwed up bad enough.
It is unsafe because you are checking it without synchronization and if your luck is bad enough you can hit it while some other thread is in the middle of updating internal state of the dictionary
A simple solution is to place the first check under a lock as well. A problem with this is that this becomes a global lock and in web environment under heavy load it can become a serious bottleneck.
If we are talking about .NET environment, there are ways to work around this issue by piggybacking on the ASP.NET synchronization mechanism.
Here is how I did it in NDjango rendering engine: I keep one global dictionary and one dictionary per rendering thread. When a request comes I check the local dictionary first - this check does not have to be synchronized and if the thingy is there I just take it
If it is not I synchronize on the global dictionary check if it is there and if it is add it to my thread dictionary and release the lock. If it is not in the global dictionary I add it there first while still under lock.
Well, from my point of view simpler code is better, so I'd only use one lock:
private readonly object thingeysLock = new object();
private readonly Dictionary<string, Thingey> thingeys;
public Thingey GetThingey(Request request)
{
string key = request.ThingeyName;
lock (thingeysLock)
{
Thingey ret;
if (!thingeys.TryGetValue(key, out ret))
{
ret = new Thingey(request);
thingeys[key] = ret;
}
return ret;
}
}
Locks are really cheap when they're not contended. The downside is that this means that occasionally you will block everyone for the whole duration of the time you're creating a new Thingey. Clearly to avoid creating redundant thingeys you'd have to at least block while multiple threads create the Thingey for the same key. Reducing it so that they only block in that situation is somewhat harder.
I would suggest you use the above code but profile it to see whether it's fast enough. If you really need "only block when another thread is already creating the same thingey" then let us know and we'll see what we can do...
EDIT: You've commented on Adam's answer that you "don't want to lock while a new Thingey is being created" - you do realise that there's no getting away from that if there's contention for the same key, right? If thread 1 starts creating a Thingey, then thread 2 asks for the same key, your alternatives for thread 2 are either waiting or creating another instance.
EDIT: Okay, this is generally interesting, so here's a first pass at the "only block other threads asking for the same item".
private readonly object dictionaryLock = new object();
private readonly object creationLocksLock = new object();
private readonly Dictionary<string, Thingey> thingeys;
private readonly Dictionary<string, object> creationLocks;
public Thingey GetThingey(Request request)
{
string key = request.ThingeyName;
Thingey ret;
bool entryExists;
lock (dictionaryLock)
{
entryExists = thingeys.TryGetValue(key, out ret);
// Atomically mark the dictionary to say we're creating this item,
// and also set an entry for others to lock on
if (!entryExists)
{
thingeys[key] = null;
lock (creationLocksLock)
{
creationLocks[key] = new object();
}
}
}
// If we found something, great!
if (ret != null)
{
return ret;
}
// Otherwise, see if we're going to create it or whether we need to wait.
if (entryExists)
{
object creationLock;
lock (creationLocksLock)
{
creationLocks.TryGetValue(key, out creationLock);
}
// If creationLock is null, it means the creating thread has finished
// creating it and removed the creation lock, so we don't need to wait.
if (creationLock != null)
{
lock (creationLock)
{
Monitor.Wait(creationLock);
}
}
// We *know* it's in the dictionary now - so just return it.
lock (dictionaryLock)
{
return thingeys[key];
}
}
else // We said we'd create it
{
Thingey thingey = new Thingey(request);
// Put it in the dictionary
lock (dictionaryLock)
{
thingeys[key] = thingey;
}
// Tell anyone waiting that they can look now
lock (creationLocksLock)
{
Monitor.PulseAll(creationLocks[key]);
creationLocks.Remove(key);
}
return thingey;
}
}
Phew!
That's completely untested, and in particular it isn't in any way, shape or form robust in the face of exceptions in the creating thread... but I think it's the generally right idea :)
If you're looking to avoid blocking unrelated threads, then additional work is needed (and should only be necessary if you've profiled and found that performance is unacceptable with the simpler code). I would recommend using a lightweight wrapper class that asynchronously creates a Thingey and using that in your dictionary.
Dictionary<string, ThingeyWrapper> thingeys = new Dictionary<string, ThingeyWrapper>();
private class ThingeyWrapper
{
public Thingey Thing { get; private set; }
private object creationLock;
private Request request;
public ThingeyWrapper(Request request)
{
creationFlag = new object();
this.request = request;
}
public void WaitForCreation()
{
object flag = creationFlag;
if(flag != null)
{
lock(flag)
{
if(request != null) Thing = new Thingey(request);
creationFlag = null;
request = null;
}
}
}
}
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
ThingeyWrapper output;
lock (this.Thingeys)
{
if(!this.Thingeys.TryGetValue(thingeyName, out output))
{
output = new ThingeyWrapper(request);
this.Thingeys.Add(thingeyName, output);
}
}
output.WaitForCreation();
return output.Thing;
}
While you are still locking on all calls, the creation process is much more lightweight.
Edit
This issue has stuck with me more than I expected it to, so I whipped together a somewhat more robust solution that follows this general pattern. You can find it here.
IMHO, if this piece of code is called from many thread simultaneous, it is recommended to check it twice.
(But: I'm not sure that you can safely call ContainsKey while some other thread is call Add. So it might not be possible to avoid the lock at all.)
If you just want to avoid the Thingy is created but not used, just create it within the locking block:
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
lock (this.Thingeys)
{
// only one can create the same Thingy
Thingey newThingey = new Thingey(request);
if (!this.Thingeys.ContainsKey(thingeyName))
{
this.Thingeys.Add(thingeyName, newThingey);
}
}
}
return this. Thingeys[thingeyName];
}
You have to ask yourself the question whether the specific ContainsKey operation and the getter are themselfes threadsafe (and will stay that way in newer versions), because those may and willbe invokes while another thread has the dictionary locked and is performing the Add.
Typically, .NET locks are fairly efficient if used correctly, and I believe that in this situation you're better of doing this:
bool exists;
lock (thingeys) {
exists = thingeys.TryGetValue(thingeyName, out thingey);
}
if (!exists) {
thingey = new Thingey();
}
lock (thingeys) {
if (!thingeys.ContainsKey(thingeyName)) {
thingeys.Add(thingeyName, thingey);
}
}
return thingey;
Well I hope not being to naive at giving this answer. but what I would do, as Thingyes are expensive to create, would be to add the key with a null value. That is something like this
private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
if (!this.Thingeys.ContainsKey(thingeyName))
{
lock (this.Thingeys)
{
this.Thingeys.Add(thingeyName, null);
if (!this.Thingeys.ContainsKey(thingeyName))
{
// create a new thingey on 1st reference
Thingey newThingey = new Thingey(request);
Thingeys[thingeyName] = newThingey;
}
// else - oops someone else beat us to it
// but it doesn't mather anymore since we only created one Thingey
}
}
return this.Thingeys[thingeyName];
}
I modified your code in a rush so no testing was done.
Anyway, I hope my idea is not so naive. :D
You might be able to buy a little bit of speed efficiency at the expense of memory. If you create an immutable array that lists all of the created Thingys and reference the array with a static variable, then you could check the existance of a Thingy outside of any lock, since immutable arrays are always thread safe. Then when adding a new Thingy, you can create a new array with the additional Thingy and replace it (in the static variable) in one (atomic) set operation. Some new Thingys may be missed, because of race conditions, but the program shouldn't fail. It just means that on rare occasions extra duplicate Thingys will be made.
This will not replace the need for duplicate checking when creating a new Thingy, and it will use a lot of memory resources, but it will not require that the lock be taken or held while creating a Thingy.
I'm thinking of something along these lines, sorta:
private Dictionary<string, Thingey> Thingeys;
// An immutable list of (most of) the thingeys that have been created.
private string[] existingThingeys;
public Thingey GetThingey(Request request)
{
string thingeyName = request.ThingeyName;
// Reference the same list throughout the method, just in case another
// thread replaces the global reference between operations.
string[] localThingyList = existingThingeys;
// Check to see if we already made this Thingey. (This might miss some,
// but it doesn't matter.
// This operation on an immutable array is thread-safe.
if (localThingyList.Contains(thingeyName))
{
// But referencing the dictionary is not thread-safe.
lock (this.Thingeys)
{
if (this.Thingeys.ContainsKey(thingeyName))
return this.Thingeys[thingeyName];
}
}
Thingey newThingey = new Thingey(request);
Thiney ret;
// We haven't locked anything at this point, but we have created a new
// Thingey that we probably needed.
lock (this.Thingeys)
{
// If it turns out that the Thingey was already there, then
// return the old one.
if (!Thingeys.TryGetValue(thingeyName, out ret))
{
// Otherwise, add the new one.
Thingeys.Add(thingeyName, newThingey);
ret = newThingey;
}
}
// Update our existingThingeys array atomically.
string[] newThingyList = new string[localThingyList.Length + 1];
Array.Copy(localThingyList, newThingey, localThingyList.Length);
newThingey[localThingyList.Length] = thingeyName;
existingThingeys = newThingyList; // Voila!
return ret;
}