What is the best data structure to use to do the following:
2 Threads:
1 Produces (writes) to a data structure
1 Consumes (reads and then deletes) from the data structure.
Thread safe
Producer and Consumer can access the data structure simultaneously
Efficient for large amounts of data
I wouldn't say that point 4 (truly simultaneous access) is impossible, but it is pretty hard, and actually, you should think hard about whether you really have that requirement.
...
Now that you've realized you don't, Queue<T> is what immediately springs to mind when reading Producer/Consumer.
Let's say you have a thread running ProducerProc() and another running ConsumerProc(), plus a method CreateThing() which produces and a method HandleThing() which consumes. My solution would look something like this:
private Queue<T> queue;

private void ProducerProc()
{
    while (true) // real abort condition goes here
    {
        lock (this.queue)
        {
            this.queue.Enqueue(CreateThing());
            Monitor.Pulse(this.queue); // wake the consumer if it is waiting
        }
        Thread.Yield();
    }
}

private void ConsumerProc()
{
    while (true)
    {
        T thing;
        lock (this.queue)
        {
            // Wait in a loop: a pulse can arrive while the consumer is not
            // yet waiting, so only proceed once there is actually an item.
            while (this.queue.Count == 0)
            {
                Monitor.Wait(this.queue);
            }
            thing = this.queue.Dequeue();
        }
        HandleThing(thing); // handle outside the lock
    }
}
Seeing lock, you realize immediately that the two threads do NOT access the data structure completely simultaneously. But they only hold the lock for the tiniest amount of time, and the Pulse/Wait pairing makes the consumer thread react immediately to the producer thread. This should really be good enough.
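As a usage sketch (not part of the original answer), here is a minimal, hypothetical way to wire the two procedures up, with T fixed to int and stub CreateThing/HandleThing implementations:

using System;
using System.Collections.Generic;
using System.Threading;

class PulseWaitDemo
{
    private readonly Queue<int> queue = new Queue<int>();
    private int counter;

    private int CreateThing() { Thread.Sleep(100); return ++counter; } // stub producer work
    private void HandleThing(int thing) { Console.WriteLine("consumed {0}", thing); } // stub consumer work

    private void ProducerProc()
    {
        while (true)
        {
            int thing = CreateThing(); // produce outside the lock
            lock (queue)
            {
                queue.Enqueue(thing);
                Monitor.Pulse(queue);
            }
            Thread.Yield();
        }
    }

    private void ConsumerProc()
    {
        while (true)
        {
            int thing;
            lock (queue)
            {
                while (queue.Count == 0)
                    Monitor.Wait(queue);
                thing = queue.Dequeue();
            }
            HandleThing(thing);
        }
    }

    static void Main()
    {
        var demo = new PulseWaitDemo();
        new Thread(demo.ProducerProc) { IsBackground = true }.Start();
        new Thread(demo.ConsumerProc) { IsBackground = true }.Start();
        Thread.Sleep(1000); // let the demo run briefly, then exit
    }
}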
I'm setting up an application that reads data from a load cell and, in real time, based on the data read, interrupts the thrust of a motor. It is essential to have a high-frequency reading from the load cell.
I'm programming in C# and I decided to use a separate thread to acquire data from the load cell.
My question is this: how can I use the data acquired in that thread in a thread-safe way, for example to show it in a chart?
This is the thread I start to acquire data into the queue.
Thread t = new Thread(() =>
{
    Thread.CurrentThread.IsBackground = true;
    while (save_in_queue)
    {
        Thread.Sleep(1);
        if (queue.Count <= 1000)
        {
            queue.Enqueue(Frm_main.ComPh1.LeggiAnalogica(this.Address));
        }
        else
        {
            queue.Dequeue();
            queue.Enqueue(Frm_main.ComPh1.LeggiAnalogica(this.Address));
        }
    }
});
t.Name = "Queue " + this.name;
t.Start();
This is the method I use to share the queue filled in the thread with the main form:
public void SetData(Queue<int> q)
{
    this.data = q;
}
This is the timer I use in the main application to set data for the series
private void timer1_Tick(object sender, EventArgs e)
{
    List<int> dati = new List<int>();
    lock (data)
    {
        dati = data.ToList();
    }
    grafico.Series[serie.Name].Points.Clear();
    for (int x = 0; x < dati.Count; x++)
    {
        DataPoint pt = new DataPoint();
        pt.XValue = x;
        pt.YValues = new double[] { dati.ElementAt(x) };
        grafico.Series[serie.Name].Points.Add(pt);
    }
}
This code does not work, because sometimes I receive the exception "Collection was modified; enumeration operation may not execute" on the line dati = data.ToList();
For me it's pretty clear why I receive this exception, but how can I solve it?
I would like to avoid using too many lock statements or too many synchronization variables, in order not to reduce the acquisition performance, which at the moment is excellent.
Don't do this in your consumer thread:
lock (data)
{
    dati = data.ToList();
}
You're using the queue for two different purposes: you're using it to pass data between the two threads, which is good; but you're also using it as a history buffer for previous data samples. That's bad.
What's doubly bad is that each time the timer ticks, you're locking the queue long enough to let the consumer copy maybe hundreds of data points that it had already copied on earlier ticks.
This is bad too:
if (queue.Count <= 1000)
{
    queue.Enqueue(Frm_main.ComPh1.LeggiAnalogica(this.Address));
}
else
{
    queue.Dequeue(); // <== THIS IS BAD!
    queue.Enqueue(Frm_main.ComPh1.LeggiAnalogica(this.Address));
}
One problem with that is that you are making the producer manage the history buffer (by limiting the length of the queue), but it's the consumer who cares about its length.
Another problem is that the producer does not lock the queue. If any thread needs to lock a data structure, then every thread needs to lock it.
The producer should do just one thing: It should read data from the sensor, and stuff the data into a queue.
The queue should be used for just one purpose: To communicate new data between the threads.
The consumer should lock the queue just long enough to take the new data out of it and copy that into its own, private collection (which is where the history belongs).
Multi-threaded programming often can be counter-intuitive. One example: if you can decrease the amount of time that threads spend accessing a shared object by increasing the amount of work that each thread does privately, that often will improve the overall performance of the program. That's because locking is expensive, and because accessing memory locations that have been touched by other threads is expensive.
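A minimal sketch of that structure, assuming the question's setup (the ReadSensor stub stands in for Frm_main.ComPh1.LeggiAnalogica, and the 1000-sample history length is taken from the question; everything else is hypothetical):

using System;
using System.Collections.Generic;

class AcquisitionSketch
{
    private readonly Queue<int> queue = new Queue<int>();   // shared between threads
    private readonly List<int> history = new List<int>();   // consumer-private buffer
    private volatile bool save_in_queue = true;

    private int ReadSensor() { return 0; } // stand-in for the real sensor read

    // Producer thread: read the sensor, lock just long enough to enqueue.
    private void AcquireProc()
    {
        while (save_in_queue)
        {
            int sample = ReadSensor();
            lock (queue)
            {
                queue.Enqueue(sample);
            }
        }
    }

    // Consumer (e.g. the timer tick): drain quickly, then work privately.
    private void ConsumeTick()
    {
        lock (queue)
        {
            while (queue.Count > 0)
                history.Add(queue.Dequeue());
        }
        if (history.Count > 1000)                          // the consumer, not the
            history.RemoveRange(0, history.Count - 1000);  // producer, trims history
        // ...rebuild the chart points from 'history' here...
    }
}

The lock is now held only for the few dequeues that arrived since the last tick, never for re-copying the whole history.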
You may want to check the System.Collections.Concurrent namespace, which provides thread-safe implementations of several collections:
The System.Collections.Concurrent namespace provides several thread-safe collection classes that should be used in place of the corresponding types in the System.Collections and System.Collections.Generic namespaces whenever multiple threads are accessing the collection concurrently.
https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent
So you can use System.Collections.Concurrent.ConcurrentQueue<T> instead of a plain Queue<T> in order to get a lock-free solution to your problem.
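For instance, a sketch of how the acquisition scenario above might look with ConcurrentQueue<T> (class and method names here are hypothetical):

using System.Collections.Concurrent;
using System.Collections.Generic;

class ConcurrentQueueSketch
{
    // Thread-safe without any lock statements on the caller's side.
    private readonly ConcurrentQueue<int> queue = new ConcurrentQueue<int>();

    public void Produce(int sample)
    {
        queue.Enqueue(sample);               // safe from the acquisition thread
    }

    public List<int> DrainForChart()
    {
        var batch = new List<int>();
        int value;
        while (queue.TryDequeue(out value))  // safe from the UI timer
            batch.Add(value);
        return batch;
    }
}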
I've written a lot of multi-threaded C# code, and I've never had a deadlock in any code I've released.
I use the following rules of thumb:
I tend to use nothing but the lock keyword (I also use other techniques such as reader/writer locks, but sparingly, and only if required for speed).
I use Interlocked.Increment if I am dealing with a long.
I tend to use the smallest granular unit of locking: I only tend to lock around primitive data structures such as long, dictionary or list.
I'm wondering if it's even possible to generate a deadlock if these rules of thumb are consistently followed, and if so, what the code would look like?
Update
I also use these rules of thumb:
Avoid adding a lock around anything that could pause indefinitely, especially I/O operations. If you absolutely have to do so, ensure that absolutely everything within the lock will time out after a set TimeSpan.
The objects I use for locking are always dedicated objects, e.g. object _lockDict = new object(); then lock(_lockDict) { // Access dictionary here }.
Update
Great answer from Jon Skeet. It also confirms why I never get deadlocks as I tend to instinctively avoid nested locks, and even if I do use them, I've always instinctively kept the entry order consistent.
And in response to my comment on tending to use nothing but the lock keyword, i.e. using Dictionary + lock instead of ConcurrentDictionary, Jon Skeet made this comment:
@Contango: That's exactly the approach I'd take too.
I'd go for simple code with locking over "clever" lock-free code every time, until there's evidence that it's causing an issue.
Yes, it's easy to deadlock, without actually accessing any data:
private readonly object lock1 = new object();
private readonly object lock2 = new object();

public void Method1()
{
    lock (lock1)
    {
        Thread.Sleep(1000);
        lock (lock2)
        {
        }
    }
}

public void Method2()
{
    lock (lock2)
    {
        Thread.Sleep(1000);
        lock (lock1)
        {
        }
    }
}
Call both Method1 and Method2 at roughly the same time, and boom - deadlock. Each thread will be waiting for the "inner" lock, which the other thread has acquired as its "outer" lock.
If you make sure you always acquire locks in the same order (e.g. "never acquire lock2 unless you already own lock1") and release the locks in the reverse order (which is implicit if you're acquiring/releasing with lock) then you won't get that sort of deadlock.
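For example, a sketch of the same two methods made deadlock-free by imposing a single acquisition order (assuming the lock1/lock2 fields from the example above):

public void Method1()
{
    lock (lock1)
    {
        lock (lock2)
        {
            // ...
        }
    }
}

public void Method2()
{
    lock (lock1) // same order as Method1: lock1, then lock2
    {
        lock (lock2)
        {
            // ...
        }
    }
}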
You can still get a deadlock with async code, with just a single thread involved - but that involves Task as well:
public async Task FooAsync()
{
    BarAsync().Wait(); // Don't do this!
}

public async Task BarAsync()
{
    await Task.Delay(1000);
}
If you run that code from a WinForms thread, you'll deadlock in a single thread - FooAsync will be blocking on the task returned by BarAsync, and the continuation for BarAsync won't be able to run because it's waiting to get back onto the UI thread. Basically, you shouldn't issue blocking calls from the UI thread...
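The standard fix here (a sketch) is to await the inner task instead of blocking on it, so the UI thread stays free to run the continuation:

public async Task FooAsync()
{
    await BarAsync(); // yields to the UI thread instead of blocking it
}

public async Task BarAsync()
{
    await Task.Delay(1000);
}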
As long as you only ever lock on one thing, it's impossible; if one thread tries to lock on multiple locks, then yes. The dining philosophers problem nicely illustrates a simple deadlock caused with simple data.
As the other answers have already shown:

void Thread1Method()
{
    lock (lock1)
    {
        // Do smth
        lock (lock2)
        { }
    }
}

void Thread2Method()
{
    lock (lock2)
    {
        // Do smth
        lock (lock1) // opposite order to Thread1Method: deadlock is possible
        { }
    }
}
Addendum to what Skeet wrote:
The problem normally isn't with "only" two locks... (clearly there could be a deadlock even with only two locks, but we want to play in Hard mode :-) )...
Let's say that in your program there are 10 lockable resources... Let's call them a1...a10. You must be sure that you'll always lock them in the same order, even for subsets of them... If one method needs a3, a5 and a7, and another method needs a4, a5 and a7, you must be sure that both will try locking them in the "right" order. For simplicity's sake, in this case the order is clear: a1->a10.
Normally lock objects aren't numbered, and/or they aren't taken in a single method... For example:
void MethodA()
{
    lock (Lock1)
    {
        CommonMethod();
    }
}

void MethodB()
{
    lock (Lock3)
    {
        CommonMethod();
    }
}

void CommonMethod()
{
    lock (Lock2)
    {
    }
}

void MethodC()
{
    lock (Lock1)
    {
        lock (Lock2)
        {
            lock (Lock3)
            {
            }
        }
    }
}
Here, even with the Lock* objects numbered, it isn't immediately clear that the locks could be taken in the wrong order (MethodB+CommonMethod takes Lock3+Lock2, while MethodC takes Lock1+Lock2+Lock3)... And it isn't immediately clear even though we are playing with three very big advantages: we are speaking of deadlocks, so we are looking for them; the locks are numbered; and the whole code is around 30 lines.
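One mitigation (a hypothetical helper, not from the answer above) is "lock leveling": assign every lock a numeric level and assert at runtime that locks are only ever taken in increasing level order, so a wrong-order path throws immediately instead of deadlocking only when it races another thread:

using System;
using System.Collections.Generic;
using System.Threading;

public sealed class LeveledLock
{
    [ThreadStatic] private static Stack<int> heldLevels;
    private readonly object gate = new object();
    private readonly int level;

    public LeveledLock(int level)
    {
        this.level = level;
    }

    public IDisposable Acquire()
    {
        if (heldLevels == null) heldLevels = new Stack<int>();
        if (heldLevels.Count > 0 && heldLevels.Peek() >= level)
            throw new InvalidOperationException(string.Format(
                "Lock order violation: taking level {0} while holding level {1}.",
                level, heldLevels.Peek()));
        Monitor.Enter(gate);
        heldLevels.Push(level);
        return new Releaser(this);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly LeveledLock owner;
        public Releaser(LeveledLock owner) { this.owner = owner; }

        public void Dispose()
        {
            heldLevels.Pop();        // LIFO release, guaranteed by 'using'
            Monitor.Exit(owner.gate);
        }
    }
}

With levels assigned as Lock1=1, Lock2=2, Lock3=3, the MethodB+CommonMethod path (level 3, then level 2) would throw the very first time it runs, instead of deadlocking intermittently.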
Suppose a class has an array (it doesn't really matter of what). One thread is adding data to that array, while another thread needs to process the data that is already in it. With my limited knowledge of multithreading, how could this work? The first problem I can think of is an item being added while the other thread is processing what's still there. At first I thought that wouldn't be a problem (the processor thread would just get it next time it processed), but then I realized that while the processor thread removes items it has already processed, the adding thread would not see this change, possibly (?) wreaking havoc. Is there any good way to implement this behavior?
What you've described is basically the Readers-Writers Problem. To take care of the multithreading, you're either going to need a concurrent collection or a lock. The simplest implementation of a lock is just locking on a dedicated object:
private Object myLock = new Object();

public MyClass ReadFromSharedArray()
{
    lock (myLock)
    {
        // do whatever here (read and return an item)
        return null; // placeholder so the skeleton compiles
    }
}

public void WriteToSharedArray(MyClass data)
{
    lock (myLock)
    {
        // Do whatever here
    }
}
There are better locks, such as ReaderWriterLockSlim, but this sort of basic implementation should be a good starting point.
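As a sketch of what the ReaderWriterLockSlim variant might look like (the shared List<int> and the method names here are hypothetical):

using System.Collections.Generic;
using System.Threading;

class SharedBuffer
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly List<int> items = new List<int>();

    // Many readers may hold the read lock at the same time.
    public int[] Snapshot()
    {
        rwLock.EnterReadLock();
        try { return items.ToArray(); }
        finally { rwLock.ExitReadLock(); }
    }

    // Writers get exclusive access.
    public void Add(int item)
    {
        rwLock.EnterWriteLock();
        try { items.Add(item); }
        finally { rwLock.ExitWriteLock(); }
    }
}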
Also, you mentioned adding/removing from arrays; I'm assuming you meant Lists (or better yet a Queue) - there's a ConcurrentQueue which could be a good replacement.
I'm sorry, I know this topic has been done to death (I've read this and this and a few more) but there is one issue I have which I am not sure how to do 'correctly'.
Currently my code for a multithreaded Sudoku strategy is the following:
public class MultithreadedStrategy : ISudokuSolverStrategy
{
    private Sudoku Sudoku;
    private List<Thread> ThreadList = new List<Thread>();
    private Object solvedLocker = new Object();
    private bool _solved;

    public bool Solved // This is slow!
    {
        get
        {
            lock (solvedLocker)
            {
                return _solved;
            }
        }
        set
        {
            lock (solvedLocker)
            {
                _solved = value;
            }
        }
    }

    private int threads;
    private ConcurrentQueue<Node> queue = new ConcurrentQueue<Node>();

    public MultithreadedStrategy(int t)
    {
        threads = t;
        Solved = false;
    }

    public Sudoku Solve(Sudoku sudoku)
    {
        // It seems conceivable to me that there may not be
        // a starting point where there is only one option.
        // Therefore we may need to search multiple trees.
        Console.WriteLine("WARNING: This may require a large amount of memory.");
        Sudoku = sudoku;

        // Throw nodes on queue
        int firstPos = Sudoku.FindZero();
        foreach (int i in Sudoku.AvailableNumbers(firstPos))
        {
            Sudoku.Values[firstPos] = i;
            queue.Enqueue(new Node(firstPos, i, false, Sudoku));
        }

        // Setup threads
        for (int i = 0; i < threads; i++)
        {
            ThreadList.Add(new Thread(new ThreadStart(ProcessQueue)));
            ThreadList[i].Name = String.Format("Thread {0}", i + 1);
        }

        // Set them running
        foreach (Thread t in ThreadList)
            t.Start();

        // Wait until solution found (conditional timeout?)
        foreach (Thread t in ThreadList)
            t.Join();

        // Return Sudoku
        return Sudoku;
    }

    public void ProcessQueue()
    {
        Console.WriteLine("{0} running...", Thread.CurrentThread.Name);
        Node currentNode;
        while (!Solved) // ACCESSING Solved IS SLOW FIX ME!
        {
            if (queue.TryDequeue(out currentNode))
            {
                currentNode.GenerateChildrenAndRecordSudoku();
                foreach (Node child in currentNode.Children)
                {
                    queue.Enqueue(child);
                }

                // Only 1 thread will have the solution (no?)
                // so no need to be careful about locking
                if (currentNode.CurrentSudoku.Complete())
                {
                    Sudoku = currentNode.CurrentSudoku;
                    Solved = true;
                }
            }
        }
    }
}
(Yes, I have done DFS with and without recursion; the above strategy is a modified BFS.)
I was wondering whether I am allowed to change my private bool _solved; to a private volatile bool _solved; and get rid of the accessors. I think this might be a bad thing because my ProcessQueue() method changes the state of _solved. Am I correct? I know boolean reads and writes are atomic, but I don't want compiler optimisations to mess up the order of my read/write statements (especially since the write only happens once).
Basically, the lock statement adds tens of seconds to the run time of this strategy. Without the lock it runs an awful lot faster (although it is still relatively slow compared to a DFS, because of the memory allocation within currentNode.GenerateChildrenAndRecordSudoku()).
Before getting into alternatives: it is probably safe to go with a low-lock solution here by making access to the boolean volatile. This situation is ideal, as it is unlikely that you have complex observation-ordering requirements. ("volatile" does not guarantee that multiple volatile operations are observed to have consistent ordering from multiple threads, only that reads and writes have acquire and release semantics.)
However, low-lock solutions make me very nervous and I would not use one unless I was sure I had need to.
The first thing I would do is find out why there is so much contention on the lock. An uncontended lock should take 20-80 nanoseconds; you should only get a significant performance decrease if the lock is contended. Why is the lock so heavily contended? Fix that problem and your performance problems will go away.
The second thing I might do if contention cannot be reduced is to use a reader-writer lock. If I understand your scenario correctly, you will have many readers and only one writer, which is ideal for a reader-writer lock.
Leaving the question of volatility aside: as others have pointed out, there are basic mistakes in your threading logic like spinning on a boolean. This stuff is hard to get right. You might consider using the Task Parallel Library here as a higher-level abstraction than rolling your own threading logic. The TPL is ideally suited for problems where significant work must be done on multiple threads. (Note that the TPL does not magically make not-thread-safe code thread-safe. But it does provide a higher level of abstraction, so that you are dealing with Tasks rather than Threads. Let the TPL schedule the threads for you.)
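For illustration, a minimal, self-contained sketch of what a TPL worker pool with cooperative cancellation could look like (the queue contents and the "solved" test here are stand-ins, not the question's sudoku logic):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class TplSearchSketch
{
    static void Main()
    {
        var work = new ConcurrentQueue<int>();
        for (int i = 0; i < 100; i++) work.Enqueue(i);

        // Cancellation replaces the locked Solved flag: workers observe the
        // token instead of a shared boolean guarded by a lock.
        var cts = new CancellationTokenSource();
        var workers = new Task[4];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = Task.Run(() =>
            {
                while (!cts.IsCancellationRequested && work.TryDequeue(out int item))
                {
                    if (item == 42)   // stand-in for "sudoku solved"
                        cts.Cancel(); // tell every worker to stop
                }
            });
        }
        Task.WaitAll(workers);
        Console.WriteLine("Done.");
    }
}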
Finally: the idea that a sudoku solver should take tens of seconds indicates to me that the solver is, frankly, not very good. The sudoku problem is, in its theoretically worst possible case, a hard problem to solve quickly no matter how many threads you throw at it. But for "newspaper" quality sudokus you should be able to write a solver that runs in a fraction of a second. There's no need to farm the work out to multiple threads if you can do the whole thing in a few hundred milliseconds.
If you're interested, I have a C# program that quickly finds solutions to sudoku problems here:
http://blogs.msdn.com/b/ericlippert/archive/tags/graph+colouring/
So the first thing: fix your wait loop to just join the threads...
// Set them running
foreach (Thread t in ThreadList)
    t.Start();

// Wait until solution found (conditional timeout?)
foreach (Thread t in ThreadList)
    t.Join(/* timeout optional here */);
Then there is the issue of when to shut down the threads. My advice is to introduce a wait handle on the class and then have the worker threads just loop on that...
ManualResetEvent mreStop = new ManualResetEvent(false);
//...
while (!mreStop.WaitOne(0))
{
    //...
}
Now, instead of the Solved setter, use a method that signals all threads that they should quit...

public bool Solved
{
    get
    {
        return _solved;
    }
}

// As Eric suggests, this should be a private method, not a property set.
private void SetCompleted()
{
    _solved = true;
    mreStop.Set();
}
The benefit to this approach is that if a thread fails to quit within a timeout period you can signal the mreStop to stop the workers without setting _solved to true.
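Putting those pieces together, the worker loop might look like this (a sketch reusing the Node, queue, and Sudoku members from the question):

public void ProcessQueue()
{
    Node currentNode;
    while (!mreStop.WaitOne(0))      // no lock taken just to test a flag
    {
        if (!queue.TryDequeue(out currentNode))
            continue;

        currentNode.GenerateChildrenAndRecordSudoku();
        foreach (Node child in currentNode.Children)
            queue.Enqueue(child);

        if (currentNode.CurrentSudoku.Complete())
        {
            Sudoku = currentNode.CurrentSudoku;
            SetCompleted();          // sets _solved and signals mreStop
        }
    }
}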
volatile IS used to prevent optimizations such as caching and reordering of reads/writes for a single variable. Using it in this case is exactly what it's designed for. I don't see what your concern is.
lock is a slow yet working alternative, because it implicitly introduces a memory fence; but in your case you would be taking a lock just for the memory-fence side effect, which is not really a nice idea.
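Concretely, the low-lock version being discussed would look something like this (a sketch of the change the asker proposed):

// volatile gives reads acquire semantics and writes release semantics,
// so no lock is needed just to publish a single bool flag.
private volatile bool _solved;

public bool Solved
{
    get { return _solved; }
    set { _solved = value; }
}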
This isn't about the different methods I could or should be using to utilize the queues in the best manner; rather, it's about something I have seen happening that makes no sense to me.
void Runner()
{
    // member variable
    queue = Queue.Synchronized(new Queue());
    while (true)
    {
        if (0 < queue.Count)
        {
            queue.Dequeue();
        }
    }
}
This is run in a single thread:
var t = new Thread(Runner);
t.IsBackground = true;
t.Start();
Other events are "Enqueue"ing elsewhere. What I've seen happen is that, over a period of time, the Dequeue will actually throw InvalidOperationException: queue empty. This should be impossible, seeing as how the count guarantees there is something there, and I'm positive that nothing else is "Dequeue"ing.
The question(s):
Is it possible that the Enqueue actually increases the count before the item is fully on the queue (whatever that means...)?
Is it possible that the thread is somehow restarting (expiring, reseting...) at the Dequeue statement, but immediately after it already removed an item?
Edit (clarification):
These code pieces are part of a wrapper class that implements the background helper thread. The Dequeue here is the only Dequeue, and all Enqueue/Dequeue calls go through the synchronized member variable (queue).
Using Reflector, you can see that no, the count does not get increased until after the item is added.
As Ben points out, it does seem as if you do have multiple people calling dequeue.
You say you are positive that nothing else is calling dequeue. Is that because you only have the one thread calling dequeue? Is dequeue called anywhere else at all?
EDIT:
I wrote a little sample code, but could not get the problem to reproduce. It just kept running and running without any exceptions.
How long was it running before you got errors? Maybe you can share a bit more of the code.
class Program
{
    static Queue q = Queue.Synchronized(new Queue());
    static bool running = true;

    static void Main()
    {
        Thread producer1 = new Thread(() =>
        {
            while (running)
            {
                q.Enqueue(Guid.NewGuid());
                Thread.Sleep(100);
            }
        });

        Thread producer2 = new Thread(() =>
        {
            while (running)
            {
                q.Enqueue(Guid.NewGuid());
                Thread.Sleep(25);
            }
        });

        Thread consumer = new Thread(() =>
        {
            while (running)
            {
                if (q.Count > 0)
                {
                    Guid g = (Guid)q.Dequeue();
                    Console.Write(g.ToString() + " ");
                }
                else
                {
                    Console.Write(" . ");
                }
                Thread.Sleep(1);
            }
        });

        consumer.IsBackground = true;
        consumer.Start();
        producer1.Start();
        producer2.Start();

        Console.ReadLine();
        running = false;
    }
}
Here is what I think the problematic sequence is:
(0 < queue.Count) evaluates to true, the queue is not empty.
This thread gets preempted and another thread runs.
The other thread removes an item from the queue, emptying it.
This thread resumes execution, but is now within the if block, and attempts to dequeue from an empty queue.
However, you say nothing else is dequeuing...
Try outputting the count inside the if block. If you see the count jump numbers downwards, someone else is dequeuing.
Here's a possible answer from the MSDN page on this topic:
Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads.
My guess is that you're correct - at some point, there's a race condition happening, and you end up dequeuing something that isn't there.
A Mutex or a Monitor lock (the lock statement) is probably appropriate here.
Good luck!
Are the other areas that are "Enqueuing" data also using the same synchronized queue object? In order for the Queue.Synchronized to be thread-safe, all Enqueue and Dequeue operations must use the same synchronized queue object.
From MSDN:
To guarantee the thread safety of the Queue, all operations must be done through this wrapper only.
Edited:
If you are looping over many items that involve heavy computation or if you are using a long-term thread loop (communications, etc.), you should consider having a wait function such as System.Threading.Thread.Sleep, System.Threading.WaitHandle.WaitOne, System.Threading.WaitHandle.WaitAll, or System.Threading.WaitHandle.WaitAny in the loop, otherwise it might kill system performance.
Question 1: if you're using a synchronized queue, then no, you're safe! But you'll need to use the synchronized instance on both sides, the producer and the consumer.
Question 2: terminating your worker thread when there is no work to do is a simple job. However, either way you need a monitoring thread, or you have the queue start a background worker thread whenever the queue has something to do. The latter sounds more like the ActiveObject pattern than a simple queue (whose single responsibility, per the Single Responsibility Principle, should be queueing).
In addition, I'd go for a blocking queue instead of your code above. The way your code works requires CPU processing power even if there is no work to do. A blocking queue lets your worker thread sleep whenever there is nothing to do. You can have multiple sleeping threads running without using CPU processing power.
C# didn't come with a blocking queue implementation at the time, but there are many out there. See this example and this one.
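(Since .NET 4, the framework itself provides one: BlockingCollection<T>. A minimal sketch of the blocking consume pattern it enables, with hypothetical demo values:)

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class BlockingQueueSketch
{
    static void Main()
    {
        var queue = new BlockingCollection<int>();

        var consumer = Task.Run(() =>
        {
            // GetConsumingEnumerable blocks (using no CPU) until items
            // arrive, and completes when CompleteAdding is called.
            foreach (int item in queue.GetConsumingEnumerable())
                Console.WriteLine(item);
        });

        for (int i = 0; i < 5; i++)
            queue.Add(i);
        queue.CompleteAdding();
        consumer.Wait();
    }
}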
Another option for making thread-safe use of queues is the ConcurrentQueue<T> class, introduced after this question was asked in 2009. This may help avoid having to write your own synchronization code, or at least help make it much simpler.
From .NET Framework 4.6 onward, ConcurrentQueue<T> also implements the interface IReadOnlyCollection<T>.
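For the code in this question, ConcurrentQueue<T> also removes the check-then-act race between Count and Dequeue, because TryDequeue tests and removes in one atomic operation (a sketch mirroring the Runner method above):

using System.Collections.Concurrent;

// member variable
ConcurrentQueue<object> queue = new ConcurrentQueue<object>();

void Runner()
{
    while (true)
    {
        object item;
        if (queue.TryDequeue(out item)) // atomic test-and-remove
        {
            // process item; no separate Count check needed
        }
    }
}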