I have a four-level data structure defined like this:
Dictionary<Type1, Dictionary<Type2, Dictionary<Type3, List<Type4>>>>
The whole thing is encapsulated in a class which also maintains thread safety. Currently it just locks the whole collection while it reads or manipulates the data (reading is orders of magnitude more common than writing).
I was thinking of replacing the Dictionary with ConcurrentDictionary and List with ConcurrentBag (its items don't have to be ordered).
If I do so, can I just eliminate the locks and be sure the concurrent collections will do their job correctly?
I'm nearly a year late to the question, but just in case anyone finds themselves in a similar position to Matěj Zábský, ask yourself:
Can you use a Dictionary<Tuple<Type1, Type2, Type3>, List<Type4>> instead?
It is considerably easier to work with, and considering that hash tables (i.e. Dictionaries) are O(1) data structures with a somewhat hefty constant component (even more so if you move to a ConcurrentDictionary), it would likely perform faster too. It would also use less memory, and be pretty trivial to convert to a ConcurrentDictionary.
Of course, if you need to enumerate all of a given Type2 for a given Type1 key, the nested dictionaries are possibly the way to go. But is that a requirement?
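A minimal sketch of the flattened layout, assuming hypothetical key types (a string and two ints) and string items in place of Type1 through Type4, and using a value tuple as the composite key instead of Tuple<...>. It also shows the trade-off mentioned above: a full-key lookup is direct, while finding everything under one Type1 value means scanning the keys.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical stand-ins: (Region, Year, Month) plays the role of Type1..Type3,
// and string plays the role of Type4.
var index = new ConcurrentDictionary<(string Region, int Year, int Month), ConcurrentBag<string>>();

// GetOrAdd creates the bag on first use; ConcurrentBag.Add is itself thread-safe.
index.GetOrAdd(("eu", 2024, 1), _ => new ConcurrentBag<string>()).Add("order-1");
index.GetOrAdd(("eu", 2024, 1), _ => new ConcurrentBag<string>()).Add("order-2");
index.GetOrAdd(("us", 2024, 2), _ => new ConcurrentBag<string>()).Add("order-3");

// A lookup by the full key goes through a single dictionary.
int euJanuaryCount = index.TryGetValue(("eu", 2024, 1), out var bag) ? bag.Count : 0;

// Everything under one first-level key now requires a scan over all keys.
int euKeyCount = index.Keys.Count(k => k.Region == "eu");

Console.WriteLine($"{euJanuaryCount} items, {euKeyCount} key(s) for eu");
```

Note that the value factory passed to GetOrAdd may run more than once under contention, although only one bag is ever stored for a given key.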
The concurrent collections will prevent data corruption and crashes, but the code won't be semantically equivalent to your current one. For example, if you iterate one of the concurrent dictionaries, some of the items may belong to different updates:
"The enumerator returned from the dictionary is safe to use concurrently with reads and writes to the dictionary, however it does not represent a moment-in-time snapshot of the dictionary. The contents exposed through the enumerator may contain modifications made to the dictionary after GetEnumerator was called."
If you want to maintain the exact behavior you have now, yet save on the cost of locking, you may want to lock with ReaderWriterLockSlim which is especially suited for cases with more reads than writes.
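As a rough sketch of that suggestion, here is a two-level version of the nested structure guarded by a ReaderWriterLockSlim. The class name NestedStore, its methods, and the string/int key types are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var store = new NestedStore();
store.Add("a", 1, "x");
store.Add("a", 1, "y");
Console.WriteLine(store.Snapshot("a", 1).Count);

// Hypothetical wrapper: many readers may hold the read lock at once,
// while a writer gets exclusive access.
sealed class NestedStore
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Dictionary<string, Dictionary<int, List<string>>> _data = new();

    public void Add(string key1, int key2, string item)
    {
        _lock.EnterWriteLock();
        try
        {
            if (!_data.TryGetValue(key1, out var inner))
                _data[key1] = inner = new Dictionary<int, List<string>>();
            if (!inner.TryGetValue(key2, out var list))
                inner[key2] = list = new List<string>();
            list.Add(item);
        }
        finally { _lock.ExitWriteLock(); }
    }

    // Returns a copy so callers never touch the shared list outside the lock.
    public List<string> Snapshot(string key1, int key2)
    {
        _lock.EnterReadLock();
        try
        {
            return _data.TryGetValue(key1, out var inner) && inner.TryGetValue(key2, out var list)
                ? new List<string>(list)
                : new List<string>();
        }
        finally { _lock.ExitReadLock(); }
    }
}
```

Returning a copy keeps the moment-in-time semantics the existing lock provides, at the cost of one allocation per read.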
Related
I am using a ConcurrentDictionary to store log-lines, and when I need to display them to the user I call ToList() to generate a list. But the weird thing is that some users receive the most recent lines first in the list, while they should logically be last.
Is this because ConcurrentDictionary doesn't guarantee a persistent order on the IEnumerable interface, or what can be the reason?
No, ConcurrentDictionary (and Dictionary<TKey, TValue> for that matter) does not guarantee the ordering of the keys in the list. You'll have to use a different data type or perform the sorting yourself. For non-concurrent code you would use SortedDictionary<TKey, TValue>, but I don't believe there is an analogue in the concurrent collections.
No. The list order of ConcurrentDictionary is NOT guaranteed, lines can come out in any order.
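One way to "perform the sorting yourself" is to key each line by a monotonically increasing sequence number and sort on read. This is only a sketch: the sequence counter and the line format are assumptions, not taken from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var lines = new ConcurrentDictionary<long, string>();
long sequence = 0;

// Each writer takes the next sequence number atomically and uses it as the key.
Parallel.For(0, 100, i =>
{
    long id = Interlocked.Increment(ref sequence);
    lines[id] = $"line {id}";
});

// Enumeration order is unspecified, so impose the order explicitly when reading.
var ordered = lines.OrderBy(pair => pair.Key).Select(pair => pair.Value).ToList();

Console.WriteLine(ordered.First());
Console.WriteLine(ordered.Last());
```

If the log is append-only and never looked up by key, a ConcurrentQueue<string> may be a simpler fit, since it preserves insertion order.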
I am in need of a data type that is able to insert entries and then be able to quickly determine if an entry has already been inserted. A Dictionary seems to suit this need (see example). However, I have no use for the dictionary's values. Should I still use a dictionary or is there another better suited data type?
public class Foo
{
    private Dictionary<string, bool> Entities;
    ...
    public void AddEntity(string bar)
    {
        if (!Entities.ContainsKey(bar))
        {
            // bool value true here has no use and is just a placeholder
            Entities.Add(bar, true);
        }
    }

    public string[] GetEntities()
    {
        return Entities.Keys.ToArray();
    }
}
You can use HashSet<T>.
"The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order."
Habib's answer is excellent, but in multi-threaded environments, if you use a HashSet<T> you consequently have to use locks to protect access to it. I find myself more prone to creating deadlocks with lock statements. Also, locks yield a worse speedup per Amdahl's law, because adding a lock statement reduces the percentage of your code that actually runs in parallel.
For those reasons, a ConcurrentDictionary<T,object> fits the bill in multi-threaded environments. If you end up using one, then wrap it like you did in your question. Just new up objects to toss in as values as needed, since the values won't be important. You can verify that there are no lock statements in its source code.
If you didn't need mutability of the collection then this would be moot. But your question implies that you do need it, since you have an AddEntity method.
Additional info 2017-05-19 - actually, ConcurrentDictionary does use locks internally, although not lock statements per se--it uses Monitor.Enter (check out the TryAddInternal method). However, it seems to lock on individual buckets within the dictionary, which means there will be less contention than putting the entire thing in a lock statement.
So all in all, ConcurrentDictionary is often better for multithreaded environments.
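A small sketch of the dictionary-as-set idea, assuming a byte placeholder as the unused value (any cheap value type would do) and made-up entity names:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// The byte value is a throwaway placeholder; only the keys matter.
var entities = new ConcurrentDictionary<string, byte>();

// TryAdd returns false when the key already exists, so no ContainsKey check is needed.
Parallel.For(0, 1000, i =>
{
    entities.TryAdd($"entity-{i % 10}", 0);
});

// Keys returns a snapshot collection, so copying it to an array is safe.
string[] snapshot = entities.Keys.ToArray();
Console.WriteLine(snapshot.Length);
```

Because TryAdd is atomic, the check-then-add race that a separate ContainsKey call would introduce does not arise.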
It's actually quite difficult (impossible?) to make a concurrent hash set using only the Interlocked methods. I tried on my own and kept running into the problem of needing to alter two things at the same time--something that only locking can do in general. One workaround I found was to use singly-linked lists for the hash buckets and intentionally create cycles in a list when one thread needed to operate on a node without interference from other threads; this would cause other threads to get caught spinning around in the same spot until that thread was done with its node and undid the cycle. Sure, it technically didn't use locks, but it did not scale well.
I have a dictionary with a fixed collection of keys, which I create at the beginning of the program. Later, I have some threads updating the dictionary with values.
No pairs are added or removed once the threads started.
Each thread has its own key, meaning only one thread will access a certain key.
The thread might update the value.
The question is, should I lock the dictionary?
UPDATE:
Thanks all for the answers,
I tried to simplify the situation when I asked this question, just to understand the behaviour of the dictionary.
To make myself clear, here is the full version:
I have a dictionary with ~3000 entries (fixed keys), and I have more than one thread accessing the keys (a shared resource), but I know for a fact that only one thread is accessing a given key entry at a time.
So, should I lock the dictionary? And, now that you have the full version, is a dictionary the right choice at all?
Thanks!
FROM MSDN
A Dictionary can support multiple readers concurrently, as long as the collection is not modified.
To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
For a thread-safe alternative, see ConcurrentDictionary<TKey, TValue>.
Let's deal with your question one interpretation at a time.
First interpretation: Given how Dictionary<TKey, TValue> is implemented, with the context I've given, do I need to lock the dictionary?
No, you don't.
Second interpretation: Given how Dictionary<TKey, TValue> is documented, with the context I've given, do I need to lock the dictionary?
Yes, you definitely should.
There is no guarantee that the access, which might be OK today, will be OK tomorrow, in a multithreaded world since the type is documented as not threadsafe. This allows the programmers to make certain assumptions about the state and integrity of the type that they would otherwise have to build in guarantees for.
A hotfix or update to .NET, or a whole new version, might change the implementation and make it break and this is your fault for relying on undocumented behavior.
Third interpretation: Given the context I've given, is a dictionary the right choice?
No it isn't. Either switch to a threadsafe type, or simply don't use a dictionary at all. Why not just use a variable per thread instead?
Conclusion: If you intend to use the dictionary, lock the dictionary. If it's OK to switch to something else, do it.
Use a ConcurrentDictionary, don't reinvent the wheel.
Better still, refactor your code to avoid this unnecessary contention.
If there is no communication between the threads you could just do something like this:
Assume a function that changes a value:
private static KeyValuePair<TKey, TValue> ValueChanger<TKey, TValue>(
    KeyValuePair<TKey, TValue> initial)
{
    // I don't know what you do, so I'll just return the value.
    return initial;
}
Let's say you have some starting data:
var start = Enumerable.Range(1, 3000)
    .Select(i => new KeyValuePair<int, object>(i, new object()));
You could process them all at once like this:
var results = start.AsParallel().Select(ValueChanger);
When results is evaluated, the ValueChanger calls run in parallel across the available cores, yielding a ParallelQuery<KeyValuePair<int, object>>, which is also an IEnumerable<KeyValuePair<int, object>>.
There will be no interaction between the threads, thus no possible concurrency problems.
If you want to turn the results into a Dictionary, you could do this:
var resultsDictionary = results.ToDictionary(p => p.Key, p => p.Value);
This may or may not be useful in your situation, but without more detail it's hard to say.
If each thread accesses only one "value" and you don't care about the others, I'd say you don't need a Dictionary at all. You can use ThreadLocal or ThreadStatic variables.
If you do need a Dictionary, you definitely need a lock.
If you're on .NET 4.0 or above, I strongly suggest using ConcurrentDictionary; you don't need to synchronize access yourself because it is already thread-safe.
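A minimal sketch of the ConcurrentDictionary route for the fixed-key case, assuming int keys 1 to 3000 and int values that each worker increments once (the actual value type and update logic are not given in the question):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Pre-populate the fixed key set before any worker starts.
var values = new ConcurrentDictionary<int, int>(
    Enumerable.Range(1, 3000).Select(i => new KeyValuePair<int, int>(i, 0)));

// Each worker updates only its own key; AddOrUpdate applies the update atomically per key.
Parallel.ForEach(values.Keys, key =>
{
    values.AddOrUpdate(key, 1, (_, old) => old + 1);
});

int total = values.Values.Sum();
Console.WriteLine(total);
```

The update delegate passed to AddOrUpdate can be invoked more than once if another thread changes the same key in between, so it should be free of side effects.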
The Dictionary is not thread-safe, but in your code you do not have to lock it; you said each thread updates only its own value, so you do not have a multithreading problem!
I do not have the code, so I'm not 100% sure.
Also check this: Making dictionary access thread-safe?
If you're not adding keys, but simply modifying values, why not completely remove the need for writing directly to the Dictionary by storing complex objects as the value and modifying a value within the complex type. That way, you respect the thread safety constraints of the dictionary.
So:
class ValueWrapper<T>
{
    public T Value { get; set; }
}
//...
var myDic = new Dictionary<KeyType, ValueWrapper<ValueType>>();
//...
myDic[someKey].Value = newValue;
You're now not writing directly to the dictionary but you can modify values.
Don't try to do the same with keys. They should necessarily be immutable.
Given the constraint "I know for a fact that only one thread is accessing a key entry at a time", I don't think you have any problem.
Possible modifications of a Dictionary are: add, update and remove.
If the Dictionary is modified or allowed to be modified, you must use a synchronization mechanism of choice to eliminate the potential race condition, in which one thread reads the old dirty value while a second thread is currently replacing the value/updating the key.
To save you some work, use the ConcurrentDictionary in this scenario.
If the Dictionary is never modified after creation, there won't be any race conditions. A synchronization is therefore not required.
This is a special scenario in which you can replace the table with a read-only table. To add the important robustness, like guarding against potential bugs by accidentally manipulating the table, you should make the Dictionary immutable (or read-only). To give the developer compiler support, such an immutable implementation must throw an exception on any manipulation attempts.
To save you some work, you can use the ReadOnlyDictionary in this scenario. Note that the underlying Dictionary of the ReadOnlyDictionary is still mutable and that its changes are propagated to the ReadOnlyDictionary facade. The ReadOnlyDictionary only helps to ensure that the table is not accidentally modified by its consumers.
This means: Dictionary is never an option in a multithreaded context.
Rather use the ConcurrentDictionary or a synchronization mechanism in general (or use the ReadOnlyDictionary if you can guarantee that the original source collection never changes).
Since you allow and expect manipulations of the table ("[...] the thread might update the value"), you must use a synchronization mechanism of choice or the ConcurrentDictionary.
I like the lock-free operation of the ConcurrentDictionary and use it in two objects:
ConcurrentDictionary<DateTime, myObj> myIndexByDate
ConcurrentDictionary<myObjSummary, ConcurrentDictionary<int, myObj>> myObjectSummaryIndex
These two objects need to stay in sync. Is the only way to do this to use a lock, thus losing all the benefits of the ConcurrentDictionary?
I would create a custom class with the two dictionaries and use a lock only in the methods that change the dictionaries (Add, Delete).
You don't lose the benefits of the concurrent dictionary, as this approach requires much less code than you would need with a normal dictionary.
ConcurrentDictionary is only "thread-safe" on the operations on a particular instance. e.g. while a ConcurrentDictionary.TryAdd() call is being invoked, no other threads can be modifying the collection...
This doesn't mean that, when you get a value from one dictionary and add it to another, the value still exists in the original dictionary while you're adding it to the second.
You probably have an invariant stating that, while you're moving a value from one dictionary to the other, no values in the original dictionary can be removed (or at least not that value, but that's a little more difficult to guarantee with ConcurrentDictionary).
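A sketch of the "custom class with two dictionaries" idea, assuming a simplified pair of indexes (by date and by int id) holding string values. The DualIndex name and its members are hypothetical. Writes are serialized by a lock so the two indexes change together, while reads go straight to the concurrent dictionaries.

```csharp
using System;
using System.Collections.Concurrent;

var index = new DualIndex();
var date = new DateTime(2024, 1, 1);
index.Add(date, 7, "first");
Console.WriteLine(index.TryGetById(7, out var found) ? found : "missing");
index.Remove(date, 7);
Console.WriteLine(index.TryGetByDate(date, out _) ? "still there" : "removed");

// Hypothetical wrapper: only writers take the lock.
sealed class DualIndex
{
    private readonly object _writeLock = new object();
    private readonly ConcurrentDictionary<DateTime, string> _byDate = new();
    private readonly ConcurrentDictionary<int, string> _byId = new();

    public void Add(DateTime date, int id, string value)
    {
        lock (_writeLock)
        {
            _byDate[date] = value;
            _byId[id] = value;
        }
    }

    public bool Remove(DateTime date, int id)
    {
        lock (_writeLock)
        {
            bool removedByDate = _byDate.TryRemove(date, out _);
            bool removedById = _byId.TryRemove(id, out _);
            return removedByDate && removedById;
        }
    }

    // Lock-free reads: each lookup is safe on its own.
    public bool TryGetByDate(DateTime date, out string value) => _byDate.TryGetValue(date, out value);
    public bool TryGetById(int id, out string value) => _byId.TryGetValue(id, out value);
}
```

A reader that consults both indexes without the lock can still observe one updated and the other not yet; if that matters, the reads need the same lock (or a reader/writer lock).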
While going through some database code looking for a bug unrelated to this question, I noticed that in some places List<T> was being used inappropriately. Specifically:
There were many threads concurrently accessing the List as readers, but using indexes into the list instead of enumerators.
There was a single writer to the list.
There was zero synchronization, readers and writers were accessing the list at the same time, but because of code structure the last element would never be accessed until the method that executed the Add() returned.
No elements were ever removed from the list.
By the C# documentation, this should not be thread safe.
Yet it has never failed. I am wondering: because of the specific implementation of the List (I am assuming internally it's an array that re-allocs when it runs out of space), is the 1-writer, 0-enumerator, n-reader, add-only scenario accidentally thread safe, or is there some unlikely scenario where this could blow up in the current .NET 4 implementation?
edit: Important detail I left out reading some of the replies. The readers treat the List and its contents as read-only.
This can and will blow up. It just hasn't yet. Stale indices are usually the first thing to go. It will blow up just when you don't want it to. You are probably lucky at the moment.
As you are using .NET 4.0, I'd suggest changing the list to a suitable collection from System.Collections.Concurrent, which is guaranteed to be thread safe. I'd also avoid using array indices and switch to ConcurrentDictionary if you need to look something up:
http://msdn.microsoft.com/en-us/library/dd287108.aspx
The fact that it has never failed and your application doesn't crash doesn't mean that this scenario is thread safe. For instance, suppose the writer thread updates a field within the list, let's say a long field, while a reader thread is reading that field. The value returned may be a bitwise combination of the old value and the new one! That can happen because the reader starts reading the value from memory, but before it finishes, the writer thread updates it.
Edit: That, of course, assumes the reader threads only read the data without updating anything. I am sure they don't change the contents of the list itself, but they could change a property or field within the value they read. For instance:
for (int index = 0; index < list.Count; index++)
{
    MyClass myClass = list[index]; // OK, we are just reading the value from the list
    myClass.SomeInteger++; // boom: the same variable may be updated from other threads...
}
This example is not about the thread safety of the list itself, but rather about the shared variables that the list exposes.
The conclusion is that you have to use a synchronization mechanism such as lock before interacting with the list, even if it has only one writer and no items are removed; that will help you prevent subtle bugs and failure scenarios that could have been avoided in the first place.
Thread safety only matters when data is modified by more than one thread at a time. The number of readers does not matter. Even when someone is writing while someone else reads, the reader gets either the old data or the new; it still works. The fact that elements can only be accessed after Add() returns prevents parts of an element from being read separately. If you started using the Insert() method, readers could get the wrong data.
It follows then, that if the architecture is 32 bits, writing a field bigger than 32 bits, such as long and double, is not a thread safe operation; see the documentation for System.Double:
"Assigning an instance of this type is not thread safe on all hardware platforms because the binary representation of that instance might be too large to assign in a single atomic operation."
If the list is fixed in size, however, this situation matters only if the List is storing value types greater than 32 bits. If the list is only holding reference types, then any thread safety issues stem from the reference types themselves, not from their storage and retrieval from the List. For instance, immutable reference types are less likely to cause thread safety issues than mutable reference types.
Moreover, you can't control the implementation details of List: that class was mainly designed for performance, and it's likely to change in the future with that aspect, rather than thread safety, in mind.
In particular, adding elements to a list or otherwise changing its size is not thread safe even if the list's elements are 32 bits long, since there is more involved in inserting, adding, or removing than just placing the element in the list. If such operations are needed after other threads have access to the list, then locking access to the list or using a concurrent list implementation is a better choice.
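For the 64-bit case specifically, the Interlocked class offers atomic reads and writes of a long regardless of platform word size. The following sketch uses an assumed shared counter that flips between two bit patterns; with Interlocked.Exchange and Interlocked.Read the reader should never see a mix of the two.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

long shared = 0;
int unexpected = 0;

// Writer alternates between all-zero bits (0) and all-one bits (-1).
var writer = Task.Run(() =>
{
    for (int i = 0; i < 1_000_000; i++)
        Interlocked.Exchange(ref shared, (i & 1) == 0 ? 0L : -1L);
});

// Reader counts any value that is neither pattern, which would indicate a torn read.
var reader = Task.Run(() =>
{
    for (int i = 0; i < 1_000_000; i++)
    {
        long value = Interlocked.Read(ref shared);
        if (value != 0L && value != -1L)
            unexpected++;
    }
});

Task.WaitAll(writer, reader);
Console.WriteLine(unexpected);
```

This only addresses atomicity of a single 64-bit value; it does nothing for the resize and count bookkeeping that List<T>.Add performs.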
First off, to some of the posts and comments, since when was documentation reliable?
Second, this answer is more to the general question than the specifics of the OP.
I agree with MrFox in theory because this all boils down to two questions:
Is the List class implemented as a flat array?
If yes, then:
Can a write instruction be preempted in the middle of a write?
I believe this is not the case -- the full write will happen before anything can read that DWORD or whatever. In other words, it will never happen that I write two of the four bytes of a DWORD and then you read 1/2 of the new value and 1/2 of the old one.
So, if you're indexing an array by providing an offset to some pointer, you can read safely without thread-locking. If the List is doing more than just simple pointer math, then it is not thread safe.
If the List was not using a flat array, I think you would have seen it crash by now.
My own experience is that it is safe to read a single item from a List via index without thread-locking. This is all just IMHO though, so take it for what it's worth.
Worst case, such as if you need to iterate through the list, the best thing to do is:
lock the List
create an array the same size
use CopyTo() to copy the List to the array
unlock the List
then iterate through the array instead of the list.
In C++/CLI:
List<Object^>^ objects = gcnew List<Object^>();
// in some reader thread:
Monitor::Enter(objects);
// copy the list into a private array while holding the lock
array<Object^>^ objs = gcnew array<Object^>(objects->Count);
objects->CopyTo(objs);
Monitor::Exit(objects);
// use objs array
Even with the memory allocation, this will be faster than locking the List and iterating through the entire thing before unlocking it.
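The same copy-then-iterate steps can be sketched in C#, assuming a dedicated lock object (here called gate, a made-up name) that every writer also takes before calling Add:

```csharp
using System;
using System.Collections.Generic;

var items = new List<string> { "a", "b", "c" };
var gate = new object(); // hypothetical lock object shared by every reader and writer

string[] snapshot;
lock (gate)
{
    // Copy while holding the lock; List<T>.ToArray allocates and copies in one call.
    snapshot = items.ToArray();
}

// Iterate the private copy without holding the lock.
foreach (var item in snapshot)
    Console.WriteLine(item);
```

The lock statement releases the monitor even if the copy throws, which the explicit Enter/Exit pair only does when wrapped in try/finally.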
Just a heads up though: if you want a fast system, thread-locking is your worst enemy. Use ZeroMQ instead. I can speak from experience, message-based synch is the right way to go.