How to detect AddingCompleted of a BlockingCollection without race condition and exception? - c#

I'm using a BlockingCollection{T} that's filled from only one thread and consumed by only one thread. Producing and consuming items works fine. The problem is at the end of this operation. The task blocks (as expected) at GetConsumingEnumerable. After calling CompleteAdding the task will dispose the BlockingCollection and will finish without any exceptions. So far so good.
Now I have a thread that adds items to the BlockingCollection. This thread has to test IsAddingCompleted and then add the item. But there's a race condition between checking IsAddingCompleted and adding the item. There's a TryAdd method, but it also raises an exception if adding is already completed.
How can I add an item or test for adding completed without an additional lock? Why does TryAdd throw any exceptions? Returning false would be fine if adding is already completed.
The very simplified code looks like this:
private BlockingCollection<string> _items = new BlockingCollection<string>();
public void Start()
{
    Task.Factory.StartNew(
        () =>
        {
            foreach (var item in this._items.GetConsumingEnumerable())
            {
            }
            this._items.Dispose();
        });
    Thread.Sleep(50); // Wait for Task
    this._items.CompleteAdding(); // Complete adding
}
public void ConsumeItem(string item)
{
    if (!this._items.IsAddingCompleted)
    {
        this._items.Add(item);
    }
}
Yes, I know that this code doesn't make sense, because there's nearly no chance to add any item and the foreach loop does nothing. The consuming task doesn't matter for my problem.
The problem is in the ConsumeItem method. I could add an additional lock (Semaphore) around ConsumeItem and CompleteAdding+Dispose, but I'm trying to avoid the performance impact.
How can I add items without any exceptions? Losing items will be fine if adding has been completed.
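One possible approach, sketched here under assumptions rather than offered as the definitive answer: since the check and the Add cannot be made atomic without a lock, skip the IsAddingCompleted check and treat the exception as the "completed" signal. TryAddOrDrop is a hypothetical helper name, not part of the framework. It relies on Add throwing InvalidOperationException once CompleteAdding has been called, and on ObjectDisposedException deriving from InvalidOperationException, so a single catch covers both cases.

```csharp
using System;
using System.Collections.Concurrent;

public static class BlockingCollectionExtensions
{
    // Hypothetical helper (not part of the BCL): returns false instead of
    // throwing when the collection no longer accepts items.
    public static bool TryAddOrDrop<T>(this BlockingCollection<T> collection, T item)
    {
        try
        {
            collection.Add(item);
            return true;
        }
        catch (InvalidOperationException)
        {
            // Thrown when CompleteAdding() has already been called.
            // ObjectDisposedException derives from InvalidOperationException,
            // so an already disposed collection ends up here as well.
            return false;
        }
    }
}
```

With this, ConsumeItem could shrink to a single call such as this._items.TryAddOrDrop(item), and a lost item simply yields false. One caveat: the documentation states that Dispose is not thread-safe, so it is probably safer to dispose the collection only once the producer can no longer call Add, or to skip the Dispose call.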

Related

How to invoke a consumer method as soon as BlockingCollection got populated?

Background:
From reading many sources, I understand that BlockingCollection<T> is designed to remove the need to check whether new data is available in a collection shared between threads. If new data is inserted into the shared collection, the consumer thread wakes up immediately, so you do not have to poll for new data at fixed intervals, typically in a while loop.
I have a similar requirement:
I have a blocking collection of size 1.
This collection will be populated from 3 places (3 producers).
Currently I'm using a while loop to check whether the collection has something or not.
I want to execute the ProcessInbox() method as soon as the blocking collection gets a value, and empty the collection, without polling for new data at fixed intervals in a while loop. How can we achieve this?
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
namespace ConsoleApp1
{
    class Program
    {
        private static BlockingCollection<int> _processingNotificationQueue = new(1);
        private static void GetDataFromQueue(CancellationToken cancellationToken)
        {
            Console.WriteLine("GDFQ called");
            int data;
            //while (!cancellationToken.IsCancellationRequested)
            while(!_processingNotificationQueue.IsCompleted)
            {
                try
                {
                    if(_processingNotificationQueue.TryTake(out data))
                    {
                        Console.WriteLine("Take");
                        ProcessInbox();
                    }
                }
                catch (Exception ex)
                {
                }
            }
        }
        private static void ProcessInbox()
        {
            Console.WriteLine("PI called");
        }
        private static void PostDataToQueue(object state)
        {
            Console.WriteLine("PDTQ called");
            _processingNotificationQueue.TryAdd(1);
        }
        private void MessageInsertedToTabale()
        {
            PostDataToQueue(new CancellationToken());
        }
        private void FewMessagesareNotProcessed()
        {
            PostDataToQueue(new CancellationToken());
        }
        static void Main(string[] args)
        {
            Console.WriteLine("Start");
            new Timer(PostDataToQueue, new CancellationToken(), TimeSpan.Zero,
                TimeSpan.FromMilliseconds(100));
            // new Thread(()=> PostDataToQueue()).Start();
            new Thread(() => GetDataFromQueue(new CancellationToken())).Start();
            Console.WriteLine("End");
            Console.ReadKey();
        }
    }
}
Just foreach over it. It's blocking. As long as the collection is not marked as completed, your foreach will block while the collection is empty and wake up as soon as new items are added.
See the first ConsumingEnumerableDemo at https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.blockingcollection-1?view=net-6.0 and imagine the consumer loop, foreach (var item in bc.GetConsumingEnumerable()), running in another thread. The producer there has a delay between new items, so you should be able to tinker with it easily and see how the consumer wakes up "relatively immediately". There's just one producer in the demo, but I don't see a problem with multiple producers, since Add is thread-safe.
I can't guarantee that there's no significant delay between adding a new item and waking up a sleeping consumer, because 'significant' is a totally case-dependent word. There probably is some delay, at least for switching threads, but I doubt the collection does any additional throttling. I suppose it signals sleeping consumers to wake up before Add returns in the producer. And I suppose that Inbox processing is probably a UI thing for humans, where a delay on the order of 100 ms won't be noticeable, and I'd expect the Add/wake-up latency to be well below that. No guarantees though.
If the 'blocking foreach' part sounds evil to you for some reason, you'll probably have to switch to a different synchronization mechanism (**). This is a BlockingCollection, right? It's not an evented collection or something. The TryXXX methods are there for cases where you want to limit exceptions for some reason and can deal with scheduling updates yourself, like you do here (*).
(*) Well, almost. The code you posted is missing two important things. First, your while loop busy-spins at maximum speed when the collection is empty. That's usually a deadly no-no, especially for anything that runs on batteries. Consider adding some dead time, along the lines of if(hasItems) doWork; else sleep(sometime);. The other thing is the try-catch. As I read the docs, when the collection throws, it means the collection is "done": no more items, ever. There is no point in looping over a dead collection. The try-catch should not be inside the loop but should enclose the loop, so that looping stops when the collection is finished.
(**) I personally like Rx (Reactive Extensions). Here it'd be a simple subject and one observer. Also, async/await/IAsyncEnumerable are tempting and can help avoid synchronizing by sleeping, but they can still end up in a busy-spinning loop if not done carefully. There are more choices, but the question was about BlockingCollection, so this is just FYI.
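The blocking foreach described above can be sketched roughly as follows. This is a minimal illustration, not the asker's program: InboxDemo and Run are made-up names, the processed counter stands in for ProcessInbox(), and the producers use Add (which blocks while the bounded collection is full) where the question uses TryAdd (which would return false and drop the item instead).

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class InboxDemo
{
    // Runs three producers and one consumer; returns how many items were processed.
    public static int Run()
    {
        using var queue = new BlockingCollection<int>(boundedCapacity: 1);
        int processed = 0;

        var consumer = Task.Run(() =>
        {
            // Blocks while the queue is empty; the loop ends once
            // CompleteAdding() has been called and the queue is drained.
            foreach (int item in queue.GetConsumingEnumerable())
            {
                processed++; // stand-in for ProcessInbox()
            }
        });

        var producers = Enumerable.Range(0, 3)
            .Select(id => Task.Run(() =>
            {
                // Add blocks while the bounded queue is full.
                for (int i = 0; i < 5; i++) queue.Add(id);
            }))
            .ToArray();

        Task.WaitAll(producers);
        queue.CompleteAdding(); // lets the consumer's foreach finish
        consumer.Wait();
        return processed;
    }
}
```

There is no polling loop and no sleep here: the consumer thread is parked inside GetConsumingEnumerable until an item arrives. With three producers adding five items each, Run() returns 15.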

C# lock leads to freeze

I have this list of sounds:
List<SourceVoice> runningInstances;
I attach an event handler to each sound object so that I can remove it from the list when it is stopped.
sourceVoice.StreamEnd += delegate
{
    lock (runningInstances)
    {
        runningInstances.Remove(sourceVoice);
    }
};
I also have this stop function, which can be called from any thread.
public void stop(int fadeoutTime)
{
    lock (runningInstances)
    {
        foreach (var sourceVoice in runningInstances)
        {
            if (!sourceVoice.IsDisposed)
            {
                sourceVoice.Stop();
                sourceVoice.FlushSourceBuffers();
                sourceVoice.DestroyVoice();
                sourceVoice.Dispose();
            }
        }
        runningInstances.Clear();
    }
}
I thought that since I make the event handler a delegate, it would always wait until the object is unlocked. However, it seems that it freezes there.
There are two possibilities:
The event is raised on the same thread as sourceVoice.Stop(). The lock() {} then has no effect because it is re-entrant, but the lock itself is also harmless. The items should already have been removed when Clear() is called (although removing an item from the list while the foreach is still enumerating it would make the enumeration throw an InvalidOperationException).
The event is raised on another (thread-pool) thread. This is up to sourceVoice.Stop(). The lock() will block the event handler until after runningInstances.Clear(). After that the handlers will run, and removing from an empty List<> is not an error.
Neither would cause any 'freezing', so there must be something relevant in the code we don't see.
Delegates are just callbacks; they don't make any guarantees about threading. You may want to check out the ConcurrentBag class, which is already thread-safe, so you can worry less about locking with respect to the collection (note, though, that it has no method for removing a specific item, so it is not a drop-in replacement here).
It looks like one of the calls within the lock scope of the stop method is probably causing the StreamEnd event to fire. You could test for this by stepping through the code in the stop method and seeing if it jumps into the event handler. I would hazard a guess that it's the sourceVoice.Stop() call.
You can change your stop method as below if sourceVoice.Stop() always raises the sourceVoice.StreamEnd event.
public void stop(int fadeoutTime)
{
    foreach (var sourceVoice in runningInstances.ToList<SourceVoice>())
    {
        if (!sourceVoice.IsDisposed)
        {
            sourceVoice.Stop();
            sourceVoice.FlushSourceBuffers();
            sourceVoice.DestroyVoice();
            sourceVoice.Dispose();
        }
    }
}
To learn more about .ToList(), see "ToList() -- Does it Create a New List?"
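A variation on the answers above, sketched under assumptions: take a snapshot and clear the list while holding the lock, then call Stop() only after the lock has been released, so that a StreamEnd handler can never be waiting on a lock that the stopping thread still holds. FakeVoice is a hypothetical stand-in for SourceVoice that raises StreamEnd synchronously from Stop(); how the real library raises the event is not confirmed here. VoicePool and its private lock object are also illustrative names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SourceVoice: Stop() raises StreamEnd synchronously.
public sealed class FakeVoice
{
    public event Action StreamEnd;
    public bool Stopped { get; private set; }

    public void Stop()
    {
        Stopped = true;
        StreamEnd?.Invoke();
    }
}

public sealed class VoicePool
{
    private readonly object _gate = new object(); // dedicated lock object
    private readonly List<FakeVoice> _running = new List<FakeVoice>();

    public int Count
    {
        get { lock (_gate) return _running.Count; }
    }

    public void Add(FakeVoice voice)
    {
        // The handler removes the voice when its stream ends.
        voice.StreamEnd += () => { lock (_gate) _running.Remove(voice); };
        lock (_gate) _running.Add(voice);
    }

    public void StopAll()
    {
        FakeVoice[] snapshot;
        lock (_gate)
        {
            // Copy and clear under the lock; nothing else happens in here.
            snapshot = _running.ToArray();
            _running.Clear();
        }

        // No lock is held while stopping, so handlers on any thread can run.
        foreach (var voice in snapshot) voice.Stop();
    }
}
```

The handlers that fire during StopAll find the list already empty, and removing a missing item from a List<T> simply returns false.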

Thread won't resume after multithreaded session

I have a thread, call it the "Parsing thread".
Thread parsingThread = new Thread(myMethod);
I perform some computations on this thread, of which the last involves more parallel computations.
public void ReadCityFiles(BlockingCollection<GeonamesFileInfo> files)
{
    Parallel.ForEach<GeonamesFileInfo>(
        files.GetConsumingPartitioner<GeonamesFileInfo>(),
        new ParallelOptions { MaxDegreeOfParallelism = _maxParallelism },
        (inputFile, args) =>
        {
            RaiseFileParsing(inputFile);
            using (var input = new System.IO.StreamReader(inputFile.FullName))
            {
                while (!input.EndOfStream)
                {
                    RaiseEntryParsed(ParseCity(input.ReadLine()));
                    Interlocked.Increment(ref _parsedEntries);
                }
            }
            RaiseFileParsed(inputFile);
        });
    RaiseDirectoryParsed(Directory);
}
The problem is that when these very long and computationally expensive async foreach operations finish (~30 mins), the "Parsing Thread" doesn't resume. My GUI is still responsive, but the RaiseDirectoryParsed function that is supposed to continue to run on the "Parsing Thread" is never called. I debugged the program up to this point, and am pretty baffled as to what to do in this situation.
The point of BlockingCollection is that when an operation cannot be performed now, but might be in the future (e.g. Take() or Add() on a collection with bounded capacity), it will block. The same applies to GetConsumingEnumerable() and thus also to GetConsumingPartitioner(): if the collection is currently empty, the enumerable will block until you add more items to the collection.
But there is also a way to tell the collection that you're not going to add new items anymore and that it shouldn't block when empty from now on: the CompleteAdding() method. If you call this when you know you won't be adding any more new items to the collection, your Parallel.ForEach() won't block anymore and your thread will continue executing.
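A minimal sketch of that behaviour, with assumptions stated up front: CompleteAddingDemo and the file names are invented for illustration, and the sketch uses the built-in GetConsumingEnumerable() rather than the GetConsumingPartitioner() extension from the question. The producer marks the collection as complete once it has queued everything, and only then can Parallel.ForEach return.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class CompleteAddingDemo
{
    // Queues ten placeholder file names and returns how many were "parsed".
    public static int Run()
    {
        using var files = new BlockingCollection<string>();
        int parsed = 0;

        var producer = Task.Run(() =>
        {
            for (int i = 0; i < 10; i++) files.Add($"file{i}.txt");

            // Without this call the Parallel.ForEach below would block forever,
            // waiting for items that never arrive.
            files.CompleteAdding();
        });

        Parallel.ForEach(
            files.GetConsumingEnumerable(),
            new ParallelOptions { MaxDegreeOfParallelism = 2 },
            file => Interlocked.Increment(ref parsed)); // stand-in for parsing

        producer.Wait();
        return parsed; // code after the loop is reached only once adding is complete
    }
}
```

In the question's terms, the call to RaiseDirectoryParsed corresponds to the code after the loop: it runs only when some thread has called CompleteAdding() on the collection.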

Patterns for handling an event firing from another thread

Suppose I have two threads. On one thread, I'm performing a sequence of jobs. On the other thread I'm waiting for user input to control those jobs (i.e., skip to the next job). For example (pseudo-C#):
public class Worker {
    public List<Job> Jobs { get; set; }
    public Worker(Controller anotherThread) {
        anotherThread.SkipJobRequested += OnSkipJobRequested;
    }
    public void DoWork() {
        foreach (Job job in Jobs) {
            // Do various work...
        }
    }
    // Event that fires on Controller thread
    public void OnSkipJobRequested(Object sender, EventArgs args) {
        // Somehow skip to the next job
    }
}
I'm uncertain how I should handle skipping to the next job. One possibility that occurs to me would be to have an instance variable (like, IsSkipRequested) that I set when a skip is requested, and check it at various junctures within DoWork().
Do any other patterns exist that I could use to handle this event?
Another pattern is the .NET class BackgroundWorker, which seems like it should suit your purpose. You could make Job a subclass of BackgroundWorker and cycle through the jobs. The difference is that BackgroundWorker doesn't know about your UI thread, only whether cancellation has been requested.
In this pattern the UI thread would call CancelAsync, and your DoWork method would check CancellationPending at convenient intervals to decide whether or not to proceed. You would call RunWorkerAsync for the next job inside the RunWorkerCompleted event handler.
Another suggestion would be (and you have mentioned this above) to make List<Job> a Queue<Job> and, rather than having DoWork perform a foreach, keep the currently executing job in your object; you could then simply dequeue each item when it is ready for processing.
The Task Parallel Library allows you to specify a cancellation token (although you would probably have to pass this to your job execution code and handle cancellation in there). When skip is pressed, you can call Cancel on the CancellationTokenSource that issued the token and start the next job from the queue. Additionally, when firing off a new task you can specify an action to perform on completion (a continuation); this allows you to chain your tasks together in sequential order and skip tasks when required. Below is an example without a cancellation token:
_currentJob = Jobs.Dequeue();
Task.Factory.StartNew(() => _currentJob.execute())
    .ContinueWith(t =>
    {
        // On task completion logic
        ExecuteNextJobFromQueue();
    });
Be aware that this approach would probably work best if Job is performing multiple tasks and not one big blocking task as you would need to check for cancellation during job execution.
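The cancellation-token idea can be sketched like this, as one possible shape rather than a definitive design. Each job gets its own CancellationTokenSource, so a skip request cancels only the job that is currently running. Representing a job as an Action<CancellationToken>, and the RequestSkip and Log members, are assumptions made for this illustration; in the question's code, RequestSkip would be called from the SkipJobRequested handler.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class Worker
{
    private readonly object _gate = new object();
    private CancellationTokenSource _currentSkip;

    // Records the outcome of each job, purely for illustration.
    public List<string> Log { get; } = new List<string>();

    // Called from the controller thread: cancels only the job running right now.
    public void RequestSkip()
    {
        lock (_gate) _currentSkip?.Cancel();
    }

    public void DoWork(IEnumerable<Action<CancellationToken>> jobs)
    {
        foreach (var job in jobs)
        {
            using var cts = new CancellationTokenSource();
            lock (_gate) _currentSkip = cts;
            try
            {
                // The job is expected to poll the token at convenient points.
                job(cts.Token);
                Log.Add("done");
            }
            catch (OperationCanceledException)
            {
                Log.Add("skipped"); // move on to the next job
            }
            finally
            {
                // Clear the reference before the source is disposed.
                lock (_gate) _currentSkip = null;
            }
        }
    }
}
```

A job cooperates by calling token.ThrowIfCancellationRequested() between units of work. As noted above, a job that performs one long blocking call without checking the token cannot be skipped this way.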

locking object in async operation

I want the following code to do this:
Check if a value is in cache
If in cache, get the value from it and proceed
If not in cache, perform the logic to enter it in cache, but do this asynchronously, as the operation may take a long time and I don't want to hold up the user
As you will see in my code, I place a lock on the cache in the async thread. Is my setup below thread-safe? And does placing the lock mean that the cache will not be accessible for other threads to read from while the async operation takes place? I do not want a situation where the cache is locked in an async thread, preventing other requests from accessing it.
There is also a chance that the same request may be made by several threads, hence the lock.
Any recommendations on how I could improve the code would be great.
// Check if the value is in cache
if (!this.Cache.Contains(key))
{
    // Perform processing of files async in another thread so rendering is not slowed down
    ThreadPool.QueueUserWorkItem(delegate
    {
        lock (this.Cache)
        {
            if (!this.Cache.Contains(key))
            {
                // Perform the operation to get value for cache here
                var cacheValue = operation();
                this.Cache.Add(key, cacheValue);
            }
        }
    });
    return "local value";
}
else
{
    // Return the string from cache as they are present there
    return this.Cache.GetFilename(key);
}
Note: this.Cache represents a cache object.
The application is a web application on .net 3.5.
How about changing the delegate to look like this:
var cacheValue = operation();
lock (this.Cache)
{
    if (!this.Cache.Contains(key))
    {
        // Store the value computed above; the lock is held only for this short step
        this.Cache.Add(key, cacheValue);
    }
}
This kind of coding locks the dictionary for a very short time. You can also try using ConcurrentDictionary (available from .NET 4 onwards), which mostly doesn't do any locking at all.
Alex.
There are several problems with your code. Problems include: calling Cache.Contains outside a lock while other threads may be modifying the collection; invoking operation within a lock which may cause deadlocks; etc.
Here's a thread-safe implementation of a cache that satisfies all your requirements:
class Cache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Task<TValue>> items;
    public Cache()
    {
        this.items = new ConcurrentDictionary<TKey, Task<TValue>>();
    }
    public Task<TValue> GetAsync(TKey key, Func<TKey, TValue> valueFactory)
    {
        return this.items.GetOrAdd(key,
            k => Task.Factory.StartNew<TValue>(() => valueFactory(k)));
    }
}
The GetAsync method works as follows: First it checks if there is a Task in the items dictionary for the given key. If there is no such Task, it runs valueFactory asynchronously on the ThreadPool and stores the Task object that represents the pending asynchronous operation in the dictionary. Code calling GetAsync can wait for the Task to finish, which will return the value calculated by valueFactory. This all happens in an asynchronous, non-blocking, thread-safe manner.
Example usage:
var cache = new Cache<string, int>();
Task<int> task = cache.GetAsync("Hello World", s => s.Length);
// ... do something else ...
task.Wait();
Console.WriteLine(task.Result);
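One caveat worth illustrating: ConcurrentDictionary.GetOrAdd may invoke its value factory more than once when several threads race on the same missing key, so with the class above an expensive operation could occasionally be started twice, even though only one Task ends up stored. A common mitigation, shown here as a sketch and not as part of the original answer, is to store a Lazy<Task<TValue>> so that the work starts only when the single stored entry is first read. LazyCache is a hypothetical name, and Task.Run assumes .NET 4.5 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class LazyCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<Task<TValue>>> _items =
        new ConcurrentDictionary<TKey, Lazy<Task<TValue>>>();

    public Task<TValue> GetAsync(TKey key, Func<TKey, TValue> valueFactory)
    {
        // GetOrAdd may build several Lazy wrappers under contention, but only
        // one is stored, and building a Lazy does not start any work.
        var lazy = _items.GetOrAdd(
            key,
            k => new Lazy<Task<TValue>>(
                () => Task.Run(() => valueFactory(k)),
                LazyThreadSafetyMode.ExecutionAndPublication));

        // Reading Value starts the task at most once for the stored wrapper.
        return lazy.Value;
    }
}
```

Usage is the same as in the example above: callers receive the same Task for the same key and can wait on it or read its Result.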
Looks like a standard solution, except for the retrieval in the background thread. It will be thread safe as long as all other bits of the code that use the cache also take out a lock on the same cache reference before modifying it.
From your code, other threads will still be able to read from the cache (or write to it, if they don't take out a lock()). The code will only block at the point where a lock() statement is encountered.
Does the return "local value" make sense? Would you not need to retrieve the item in that function anyway in the case of a cache miss?
