I would like to use the new concurrent collections in .NET 4.0 to solve the following problem.
The basic data structure I want is a producer-consumer queue; there will be a single consumer and multiple producers.
There are items of type A, B, C, D, and E that will be added to this queue. Items of type A, B, and C are added in the normal manner and processed in order.
However, items of type D or E can exist in the queue at most once. If one of these is to be added and an unprocessed item of the same type already exists in the queue, the new item should update the existing one in place. The queue position would not change (i.e. it would not go to the back of the queue) after the update.
Which .NET 4.0 classes would be best for this?
I don't think .NET 4 has any queue that supports an atomic AddOrUpdate operation. Only ConcurrentDictionary supports it, and that's not suitable if you need the order preserved.
So one option is to use some combination of the two.
However, please be aware that you will lose the safety of the concurrent structures as soon as you do combined operations on them; you must implement the locking mechanism on your own (look here for an example of such situation: A .Net4 Gem: The ConcurrentDictionary - Tips & Tricks).
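A minimal sketch of that combination (Item, ItemType, and UpdatableQueue are hypothetical names for the scenario in the question): a plain Queue<T> preserves order, a Dictionary<,> tracks the queued "unique" types, and a single lock guards every combined operation. Each queue entry is a mutable slot, so a queued D or E can be updated in place without changing its position.

using System.Collections.Generic;

public enum ItemType { A, B, C, D, E }

public class Item
{
    public ItemType Type { get; set; }
    public object Payload { get; set; }
}

public class UpdatableQueue
{
    private class Slot { public Item Value; }

    private readonly Queue<Slot> _queue = new Queue<Slot>();
    // Tracks the queued-but-unprocessed slot for each unique type (D, E).
    private readonly Dictionary<ItemType, Slot> _unique = new Dictionary<ItemType, Slot>();
    private readonly object _gate = new object();

    public void Add(Item item)
    {
        lock (_gate)
        {
            bool isUnique = item.Type == ItemType.D || item.Type == ItemType.E;
            Slot slot;
            if (isUnique && _unique.TryGetValue(item.Type, out slot))
            {
                slot.Value = item; // update in place; queue position unchanged
                return;
            }
            slot = new Slot { Value = item };
            _queue.Enqueue(slot);
            if (isUnique) _unique[item.Type] = slot;
        }
    }

    public bool TryTake(out Item item)
    {
        lock (_gate)
        {
            if (_queue.Count == 0) { item = null; return false; }
            Slot slot = _queue.Dequeue();
            item = slot.Value;
            _unique.Remove(item.Type); // no-op for A, B and C
            return true;
        }
    }
}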
A second option would be to search for third-party implementations.
Related
I have a background service (IHostedService) in .NET Core 3.1 that takes requests from hundreds of clients (machines in a factory) using sockets (home-rolled). My issue is that multiple calls can come in on different threads to the same method on a class that has access to an object (shared state). This is common in the codebase. The requests also have to be processed in the correct order.
The reason this is not in a database is performance (it's a real-time system). I know I can use a lock, but I don't want to have locks all over the codebase.
What is a standard way to handle this situation? Do you use an in-memory database? An in-memory cache? Or do I just have to add locks everywhere?
// MachineState is assumed to be a simple state class, e.g.
// public class MachineState { public bool IsRunning { get; set; } }
public class Machine
{
    public MachineState MachineState { get; set; }

    // Gets called by multiple threads from multiple clients
    public bool CheckMachineStatus()
    {
        return MachineState.IsRunning;
    }

    // Gets called by multiple threads from multiple clients
    public void SetMachineStatus()
    {
        MachineState = new MachineState { IsRunning = false }; // "Stopped"
    }
}
Update
Here's an example. I have a console app that talks to a machine via sockets, for weighing products. When the console app initializes it will load data into memory (information about the products being weighed). All of this is done on the main thread, to keep data integrity.
When a call comes in from the weigher on thread 1, it gets switched to the main thread to access the product information and to finish any other work, like raising events for other parts of the system.
Currently this switching from threads 1, 2, ... N to the main thread is done by a home-rolled solution, written to avoid having locking code all over the codebase. It dates from .NET 1.1, and since moving to .NET Core 3.1 I thought there might be a framework, library, tool, technique, etc. that could handle this for us, or just a better way.
This is an existing system that I'm still learning. Hope this makes sense.
Using an in-memory database is an option, as long as you are willing to delegate all concurrency-inducing situations to the database and handle none of them in your own code. For example, if you must update a value in the database depending on some condition, then the condition should be checked by the database, not by your own code.
Adding locks everywhere is also an option, one that will almost certainly lead to unmaintainable code quite quickly. The code will probably be riddled with hidden bugs from the get-go, bugs that you will discover one by one over time, usually under the most unfortunate of circumstances.
You must realize that you are dealing with a difficult problem, with no magic solutions available. Managing shared state in a multithreaded application has always been a source of pain.
My suggestion is to encapsulate all this complexity inside thread-safe classes that the rest of your application can safely invoke. How you make these classes thread-safe depends on the situation.
Using locks is the most flexible option, but not always the most efficient, because it has the potential of creating contention.
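As a minimal sketch of the lock-based approach, here is the Machine class from the question wrapped so that every read and write of the shared state goes through a single lock (MachineState is assumed to be a simple class with an IsRunning property):

public class ThreadSafeMachine
{
    private readonly object _gate = new object();
    private MachineState _state = new MachineState { IsRunning = true };

    // Safe to call from any number of client threads.
    public bool IsRunning
    {
        get { lock (_gate) { return _state.IsRunning; } }
    }

    public void Stop()
    {
        // The whole read-modify-write happens inside the lock,
        // so no caller can observe a half-finished update.
        lock (_gate) { _state = new MachineState { IsRunning = false }; }
    }
}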
Using thread-safe collections, like the ConcurrentDictionary for example, is less flexible, because the thread-safety guarantees they offer are limited to the integrity of their internal state. If, for example, you must update one collection based on a condition obtained from another collection, the whole operation cannot be made atomic just by using thread-safe collections. On the other hand, these collections offer better performance than simple locks.
Using immutable collections, like the ImmutableQueue for example, is another interesting option. They are less efficient, both memory- and CPU-wise, than the concurrent collections (adding/removing is in many cases O(log n) instead of O(1)), and no more flexible than them, but they are very efficient specifically at providing snapshots of actively processed data. For updating an immutable collection atomically, there is the handy ImmutableInterlocked.Update method. It updates a reference to an immutable collection with an updated version of the same collection, without using locks. In case of contention with other threads, it may invoke the supplied transformation multiple times, until it wins the race.
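A minimal sketch of that last option (the EventLog class and its members are illustrative, not from any library):

using System.Collections.Immutable;

public class EventLog
{
    private ImmutableQueue<string> _events = ImmutableQueue<string>.Empty;

    // Lock-free: the transformation is retried until the underlying
    // compare-and-swap succeeds.
    public void Add(string message)
    {
        ImmutableInterlocked.Update(ref _events, q => q.Enqueue(message));
    }

    // A cheap, consistent snapshot of everything logged so far.
    public ImmutableQueue<string> Snapshot() => _events;
}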
I'm always confused about which of these to pick. As I see it, I use a Dictionary over a List when I want a key and a value, so I can easily find a value by its key, but I'm never sure whether I should use a ConcurrentDictionary or a Dictionary.
Before you go off at me for not putting much research into this: I have tried, but it seems Google hasn't really got anything on Dictionary vs ConcurrentDictionary, only material on each one individually.
I have asked a friend this before, but all they said was: "use ConcurrentDictionary if you use your dictionary a lot in code", and I didn't really want to pester them into explaining it in more detail. Could anyone expand on this?
"Use ConcurrentDictionary if you use your dictionary in a lot in code" is kind of vague advice. I don't blame you for the confusion.
ConcurrentDictionary is primarily for use in an environment where you're updating the dictionary from multiple threads (or async tasks). You can use a standard Dictionary from as much code as you like if it's from a single thread ;)
If you look at the methods on a ConcurrentDictionary, you'll spot some interesting methods like TryAdd, TryGetValue, TryUpdate, and TryRemove.
For example, consider a typical pattern you might see for working with a normal Dictionary class.
// There are better ways to do this... but we need an example ;)
if (!dictionary.ContainsKey(id))
    dictionary.Add(id, value);
This has an issue: between the check for whether the key exists and the call to Add, a different thread could call Add with that same id. When this thread then calls Add, it will throw an exception. The TryAdd method handles that for you and returns true/false telling you whether it added the item (or whether that key was already in the dictionary).
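For comparison, a small sketch of the same pattern with a ConcurrentDictionary (the key and values are arbitrary):

using System.Collections.Concurrent;

var dictionary = new ConcurrentDictionary<int, string>();

// The "check then add" is a single atomic operation; no other
// thread can sneak in between the two steps.
bool added = dictionary.TryAdd(42, "value");

// GetOrAdd is another common pattern: it returns the existing value,
// or atomically adds a new one and returns that.
string current = dictionary.GetOrAdd(42, id => "value for " + id);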
So unless you're working in a multi-threaded section of code, you probably can just use the standard Dictionary class. That being said, you could theoretically have locks to prevent concurrent access to a dictionary; that question is already addressed in "Dictionary locking vs. ConcurrentDictionary".
The biggest reason to use ConcurrentDictionary over the normal Dictionary is thread safety. If your application has multiple threads using the same dictionary at the same time, you need the thread-safe ConcurrentDictionary; this is particularly true when those threads are writing to or building the dictionary.
The downside to using ConcurrentDictionary without multi-threading is overhead. All the machinery that makes it thread-safe is still there: the locks are still taken and the checks still happen, costing processing time and extra memory.
ConcurrentDictionary is useful when you need to access a dictionary across multiple threads (i.e. multithreading). Vanilla Dictionary objects do not possess this capability and therefore should only be used in a single-threaded manner.
A ConcurrentDictionary is useful when you want a high-performance dictionary that can be safely accessed by multiple threads concurrently. Compared to a standard Dictionary protected with a lock, it is more efficient under heavy usage because of its granular locking implementation. Instead of all threads competing for a single lock, the ConcurrentDictionary maintains multiple locks internally, minimizing contention and limiting the possibility of it becoming a bottleneck.
Despite these nice characteristics, the number of scenarios where using a ConcurrentDictionary is the best option is actually quite small. There are two reasons for that:
The thread-safety guarantees offered by the ConcurrentDictionary are limited to the protection of its internal state. That's it. If you want to do anything slightly non-trivial, like updating the dictionary and another variable as an atomic operation, you are out of luck. This is not a supported scenario for a ConcurrentDictionary. Even protecting the elements it contains (in case they are mutable objects) is not supported. If you update one of its values using the AddOrUpdate method, the dictionary is protected but the value is not. "Update" in this context means replacing the existing value with another one, not modifying the existing value.
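A small sketch of the difference (keys and values are arbitrary):

using System.Collections.Concurrent;
using System.Collections.Generic;

var totals = new ConcurrentDictionary<string, int>();

// Safe: "update" here atomically replaces the old int with a new one.
totals.AddOrUpdate("clicks", 1, (key, oldValue) => oldValue + 1);

// Not safe: the dictionary protects the reference to the List, but
// nothing protects the List itself; List<T>.Add is not thread-safe.
var groups = new ConcurrentDictionary<string, List<int>>();
groups.GetOrAdd("bucket", _ => new List<int>()).Add(42);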
Whenever you find it tempting to use a ConcurrentDictionary, there are usually better alternatives available, alternatives that do not involve shared state, which is what a ConcurrentDictionary essentially is. No matter how efficient its locking scheme is, it will have a hard time beating an architecture where there is no shared state at all, and each thread does its own thing without interfering with the other threads. Commonly used libraries that follow this principle are PLINQ and the TPL Dataflow library. Below is a PLINQ example:
Dictionary<string, Product> dictionary = productIDs
    .AsParallel()
    .Select(id => GetProduct(id))
    .ToDictionary(product => product.Barcode);
Instead of creating a dictionary beforehand, and then having multiple threads filling it concurrently with values, you can trust PLINQ to produce a dictionary utilizing more efficient strategies, involving partitioning of the initial workload, and assigning each partition to a different worker thread. A single thread will eventually aggregate the partial results, and fill the dictionary.
The accepted answer above is correct. However, it is worth mentioning explicitly that if a dictionary is not being modified, i.e. it is only ever read from, then Dictionary<TKey,TValue> is preferred regardless of the number of threads, because no synchronization is required.
For example: caching config in a Dictionary<TKey,TValue> that is populated once at startup and then read throughout the life of the application.
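A minimal sketch of that scenario (LoadConfigFromDisk is a hypothetical helper):

using System.Collections.Generic;

public static class AppConfig
{
    // Populated exactly once, before any other thread touches it,
    // and never modified afterwards: safe for concurrent reads.
    private static readonly Dictionary<string, string> Settings = LoadConfigFromDisk();

    public static string Get(string key) => Settings[key];

    private static Dictionary<string, string> LoadConfigFromDisk()
    {
        // Stand-in for reading a real config file at startup.
        return new Dictionary<string, string> { ["Environment"] = "Production" };
    }
}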
When to use a thread-safe collection : ConcurrentDictionary vs. Dictionary
If you are only reading keys or values, the Dictionary<TKey,TValue> is faster, because no synchronization is required if the dictionary is not being modified by any threads.
I want to implement "producer/two consumers" functionality.
Producer: scans directories recursively and adds directory information to some storage (I guess a Queue<>).
Consumer 1: retrieves data about a directory and writes it to an XML file.
Consumer 2: retrieves data about a directory and adds it to a TreeNode.
So both consumers (1 and 2) have to work with the same data, because if one of them calls Dequeue(), the other one will miss that item.
The only idea I have is to make two different Queue<> objects and have the producer fill both with the same data. Then each consumer works with its own Queue object.
I hope you can advise something more attractive.
LMAX Disruptor is one solution to this problem.
Article: http://martinfowler.com/articles/lmax.html
Illustration of the single-producer, multithreaded consumer ring buffer: http://martinfowler.com/articles/images/lmax/disruptor.png
Be aware that you will need good, nearly expert-level, knowledge of how atomic instructions and lock-free algorithms work on your target platform.
The description below is different from LMAX - I adapted it to the OP's scenario.
The underlying structure could be either a ring buffer (fixed capacity) or a lock-free linked list (unlimited capacity, but only available on platforms that support certain kinds of multi-word atomic instructions).
The producer will just push stuff to the front.
Each consumer keeps an iterator to the item that it is processing, and advances that iterator at its own pace.
Besides the consumers, there is also a trailing garbage collector that tries to advance as well, but never past any of the consumers' iterators. Thus it eventually cleans up items that both consumers have finished processing, and only those items.
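Here is a simplified, lock-based sketch of the per-consumer-cursor idea (the real Disruptor is lock-free, and the trailing cleanup is omitted; all names are illustrative):

using System.Collections.Generic;

public class BroadcastBuffer<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly object _gate = new object();

    // The producer appends; existing items never move.
    public void Push(T item)
    {
        lock (_gate) { _items.Add(item); }
    }

    // Each consumer owns a cursor and advances it at its own pace,
    // so every consumer sees every item exactly once.
    public bool TryReadAt(ref int cursor, out T item)
    {
        lock (_gate)
        {
            if (cursor < _items.Count)
            {
                item = _items[cursor++];
                return true;
            }
        }
        item = default(T);
        return false;
    }
}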
You could use ZeroMQ, which has this functionality built into it (and a lot more) -
http://learning-0mq-with-pyzmq.readthedocs.org/en/latest/pyzmq/patterns/pushpull.html
The above example is with Python code, but there are .NET bindings -
http://zeromq.org/bindings:clr
I have a BlockingCollection that I'm using in a classic publish-subscribe example, where the collection works as a buffer. When it reaches N items it has to wait for the readers to consume at least one item. This works fine.
Now I would like to be able to reset, at runtime, the maximum number of items the collection can hold. I know how to use locks and monitors to achieve this and scrap the BlockingCollection altogether, but I don't want to reimplement something that already exists in the core framework.
Is there any way to achieve that?
I'm writing some code that receives data from a socket, deserializes it, and then passes it to my application. The deserialized objects can be grouped by their ids (the id is generated during the deserialization process).
To increase the performance of my application I wanted to use the new parallelism capabilities that came with C# 4.0. The only constraint I have is that two threads cannot access objects with the same id. I know that I could just lock() on a sync object placed inside each object, but I want to avoid these locks (performance is an issue here).
The design I've thought about:
Create some kind of partitioner that will split the data by id (this will make sure that every buffer I get has objects with the same ids grouped together).
Assign threads using TPL or PLINQ.
Can someone suggest some sources that do this?
I would suggest PLINQ when developing for multiple processors or cores.
PLINQ is a query execution engine that accepts any LINQ-to-Objects or LINQ-to-XML query and automatically utilizes multiple processors or cores for execution when they are available. The change in programming model is tiny, meaning you don't need to be a concurrency guru to use it. In fact, threads and locks won't even come up unless you really want to dive under the hood to understand how it all works. PLINQ is a key component of Parallel FX, the next generation of concurrency support in the Microsoft® .NET Framework.
This covers:
From LINQ to PLINQ
PLINQ Programming Model
Processing Query Output
Concurrent Exceptions
Ordering in the Output Results
Side Effects
Putting PLINQ to Work
Parallel LINQ (PLINQ)
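For the id-grouping constraint specifically, a minimal PLINQ sketch (Message and Process are hypothetical stand-ins for the deserialized objects and the application logic):

using System.Collections.Generic;
using System.Linq;

public class Message
{
    public int Id { get; set; }
    public string Payload { get; set; }
}

public static class Pipeline
{
    public static void ProcessAll(IEnumerable<Message> messages)
    {
        messages
            .GroupBy(m => m.Id)   // all objects with the same id land in one group
            .AsParallel()         // different groups run on different threads
            .ForAll(group =>
            {
                // Within a group processing is sequential, so two threads
                // never touch objects with the same id - no locks needed.
                foreach (var message in group)
                    Process(message);
            });
    }

    private static void Process(Message message)
    {
        // Application logic goes here.
    }
}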