Deadlock testing with TestSchedulers, Rx and BlockingCollection - c#

I have the following class, which subscribes to an int observable and multiplies each value by 2. To make it realistic, I added a Thread.Sleep to simulate heavy processing.
public class WorkingClass
{
private BlockingCollection<int> _collection = new BlockingCollection<int>(1);
public WorkingClass(IObservable<int> rawValues)
{
rawValues.Subscribe(x => _collection.Add(x));
}
public IObservable<int> ProcessedValues()
{
return Observable.Create<int>(observer =>
{
while (true)
{
int value;
try
{
value = _collection.Take();
}
catch (Exception ex)
{
observer.OnError(ex);
break;
}
Thread.Sleep(1000); //Simulate long work
observer.OnNext(value * 2);
}
return Disposable.Empty;
});
}
}
I'm having trouble testing it. In the following test I just want to assert that if the source stream emits the value 1, the SUT emits the value 2:
[Test]
public void SimpleTest()
{
var sourceValuesScheduler = new TestScheduler();
var newThreadScheduler = new TestScheduler();
var source = sourceValuesScheduler.CreateHotObservable(
new Recorded<Notification<int>>(1000, Notification.CreateOnNext(1)));
var sut = new WorkingClass(source);
var observer = sourceValuesScheduler.CreateObserver<int>();
sut.ProcessedValues()
.SubscribeOn(newThreadScheduler) //The cold part (i.e, the while loop) of the ProcessedValues Observable should run in a different thread
.Subscribe(observer);
sourceValuesScheduler.AdvanceTo(1000);
observer.Messages.AssertEqual(new Recorded<Notification<int>>(1000, Notification.CreateOnNext(2)));
}
If I run this test the assert fails because the newThreadScheduler was never started and consequently the ProcessedValues observable was never created. If I do this:
sourceValuesScheduler.AdvanceTo(1000);
newThreadScheduler.AdvanceTo(1000);
It doesn't work either, because newThreadScheduler runs on the same thread as sourceValuesScheduler, so the test hangs right after the processed value is emitted, at the line:
value = _collection.Take();
Is there a way we can have multiple TestSchedulers running on different threads? Otherwise how can I test classes like this?

Take() blocks until there is an item to remove from the BlockingCollection<int> or you call CompleteAdding() on it.
Given your current implementation, the thread on which you subscribe to ProcessedValues() and execute the while loop will never finish.
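To make the blocking behaviour concrete, here is a minimal sketch (the class name BlockingTakeDemo and its Run method are made up for illustration and are not part of the question's code). It consumes on a background task and shows that Take() only stops blocking for good once CompleteAdding() has been called and the collection is empty, at which point it throws InvalidOperationException:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the original WorkingClass.
public static class BlockingTakeDemo
{
    // Drains a bounded collection on a background task and returns what happened.
    public static List<string> Run()
    {
        var log = new List<string>();
        using (var collection = new BlockingCollection<int>(1))
        {
            Task consumer = Task.Run(() =>
            {
                while (true)
                {
                    try
                    {
                        int value = collection.Take(); // blocks while the collection is empty
                        log.Add("took " + value);
                    }
                    catch (InvalidOperationException)
                    {
                        // Thrown once the collection is empty and marked as complete for adding.
                        log.Add("completed");
                        break;
                    }
                }
            });

            collection.Add(1);
            collection.Add(2);           // blocks until the consumer has taken 1 (capacity is 1)
            collection.CompleteAdding(); // lets the consumer loop terminate
            consumer.Wait();
        }
        return log;
    }
}
```

Run() returns the entries "took 1", "took 2" and "completed", in that order. Without the CompleteAdding() call the consumer task would stay blocked in Take() forever, which is what happens to the test thread in the question.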
You are supposed to consume the BlockingCollection<int> on a separate thread. You may, for example, create a consumer Task when ProcessedValues() is called. Consider the following implementation, which also disposes of the BlockingCollection<int>:
public sealed class WorkingClass : IDisposable
{
private BlockingCollection<int> _collection = new BlockingCollection<int>(1);
private List<Task> _consumerTasks = new List<Task>();
public WorkingClass(IObservable<int> rawValues)
{
rawValues.Subscribe(x => _collection.Add(x));
}
public IObservable<int> ProcessedValues()
{
return Observable.Create<int>(observer =>
{
_consumerTasks.Add(Task.Factory.StartNew(() => Consume(observer), TaskCreationOptions.LongRunning));
return Disposable.Empty;
});
}
private void Consume(IObserver<int> observer)
{
try
{
foreach (int value in _collection.GetConsumingEnumerable())
{
Thread.Sleep(1000); //Simulate long work
observer.OnNext(value * 2);
}
}
catch (Exception ex)
{
observer.OnError(ex);
}
}
public void Dispose()
{
_collection.CompleteAdding();
Task.WaitAll(_consumerTasks.ToArray());
_collection.Dispose();
}
}
It can be tested using the following code:
var sourceValuesScheduler = new TestScheduler();
var source = sourceValuesScheduler.CreateHotObservable(
new Recorded<Notification<int>>(1000, Notification.CreateOnNext(1)));
var observer = sourceValuesScheduler.CreateObserver<int>();
using (var sut = new WorkingClass(source))
{
sourceValuesScheduler.AdvanceTo(1000); //add to collection
sut.ProcessedValues().Subscribe(observer); //consume
} //...and wait until the loop exits
observer.Messages.AssertEqual(new Recorded<Notification<int>>(1000, Notification.CreateOnNext(2)));

Related

C# Design pattern for periodic execution of multiple Threads

I have the following requirement in my C# Windows Service:
1. When the service starts, it fetches a collection of data from the db and keeps it in memory.
2. A piece of business logic has to be executed periodically from 3 different threads.
3. Each thread executes the same business logic with a different subset of the data collection mentioned in step 1. Each thread produces a different result set.
4. All 3 threads run periodically if any change happened to the data collection.
5. When a client calls the service, the service should be able to return the status of the thread execution.
I know C# has different mechanisms to implement periodic thread execution: timers, threads with Sleep, EventWaitHandle, etc.
I am trying to understand which threading mechanism or design pattern is the best fit for this requirement.
A more modern approach would be to use tasks, but have a look at the principles:
namespace Test {
public class Program {
public static void Main() {
System.Threading.Thread main = new System.Threading.Thread(() => new Processor().Startup());
main.IsBackground = false;
main.Start();
System.Console.ReadKey();
}
}
public class ProcessResult { /* add your result state */ }
public class ProcessState {
public ProcessResult ProcessResult1 { get; set; }
public ProcessResult ProcessResult2 { get; set; }
public ProcessResult ProcessResult3 { get; set; }
public string State { get; set; }
}
public class Processor {
private readonly object _Lock = new object();
private readonly DataFetcher _DataFetcher;
private ProcessState _ProcessState;
public Processor() {
_DataFetcher = new DataFetcher();
_ProcessState = null;
}
public void Startup() {
_DataFetcher.DataChanged += DataFetcher_DataChanged;
}
private void DataFetcher_DataChanged(object sender, DataEventArgs args) => StartProcessingThreads(args.Data);
private void StartProcessingThreads(string data) {
lock (_Lock) {
_ProcessState = new ProcessState() { State = "Starting", ProcessResult1 = null, ProcessResult2 = null, ProcessResult3 = null };
System.Threading.Thread one = new System.Threading.Thread(() => DoProcess1(data)); // manipulate the data to a subset
one.IsBackground = true;
one.Start();
System.Threading.Thread two = new System.Threading.Thread(() => DoProcess2(data)); // manipulate the data to a subset
two.IsBackground = true;
two.Start();
System.Threading.Thread three = new System.Threading.Thread(() => DoProcess3(data)); // manipulate the data to a subset
three.IsBackground = true;
three.Start();
}
}
public ProcessState GetState() => _ProcessState;
private void DoProcess1(string dataSubset) {
// do work
ProcessResult result = new ProcessResult(); // this object contains the result
// on completion
lock (_Lock) {
_ProcessState = new ProcessState() { State = (_ProcessState?.State ?? string.Empty) + ", 1 done", ProcessResult1 = result, ProcessResult2 = _ProcessState?.ProcessResult2, ProcessResult3 = _ProcessState?.ProcessResult3 };
}
}
private void DoProcess2(string dataSubset) {
// do work
ProcessResult result = new ProcessResult(); // this object contains the result
// on completion
lock (_Lock) {
_ProcessState = new ProcessState() { State = (_ProcessState?.State ?? string.Empty) + ", 2 done", ProcessResult1 = _ProcessState?.ProcessResult1, ProcessResult2 = result, ProcessResult3 = _ProcessState?.ProcessResult3 };
}
}
private void DoProcess3(string dataSubset) {
// do work
ProcessResult result = new ProcessResult(); // this object contains the result
// on completion
lock (_Lock) {
_ProcessState = new ProcessState() { State = (_ProcessState?.State ?? string.Empty) + ", 3 done", ProcessResult1 = _ProcessState?.ProcessResult1, ProcessResult2 = _ProcessState?.ProcessResult2, ProcessResult3 = result };
}
}
}
public class DataEventArgs : System.EventArgs {
// data here is string, but could be anything -- just think of thread safety when accessing from the 3 processors
private readonly string _Data;
public DataEventArgs(string data) {
_Data = data;
}
public string Data => _Data;
}
public class DataFetcher {
// watch for data changes and fire when data has changed
public event System.EventHandler<DataEventArgs> DataChanged;
}
}
The simplest solution would be to define the scheduled logic as Task-returning methods, execute them using Task.Run(), and have the main thread wait for them to finish using Task.WaitAny(). When a task finishes, you can call Task.WaitAny() again, but in place of the finished task you'd pass Task.Delay(timeUntilNextSchedule).
This way the tasks don't block the main thread, and you avoid spinning the CPU just to wait. In general, you can avoid managing threads directly in modern .NET.
Depending on other requirements, such as standardized error handling, monitoring, or management of these scheduled tasks, you could also rely on a more robust solution like Hangfire.
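As a rough sketch of the Task.Run plus Task.Delay idea, assuming a single unit of work per schedule: the code below uses await in place of the blocking Task.WaitAny call, which is a simplification of what the answer describes. PeriodicRunner and RunAsync are hypothetical names chosen for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class PeriodicRunner
{
    // Runs `work` on the thread pool, then waits `period` before the next run.
    // Returns the number of completed runs once `token` is cancelled.
    public static async Task<int> RunAsync(Action work, TimeSpan period, CancellationToken token)
    {
        int runs = 0;
        while (!token.IsCancellationRequested)
        {
            await Task.Run(work);
            runs++;
            try
            {
                // No thread is blocked while waiting for the next schedule.
                await Task.Delay(period, token);
            }
            catch (OperationCanceledException)
            {
                break; // cancellation was requested during the delay
            }
        }
        return runs;
    }
}
```

For the three-worker requirement in the question, one option would be to start three such loops, each closing over its own subset of the data, and to keep their Task objects around so that a status call can inspect them.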

How to stop a TransformBlock from processing remaining queued messages based on a condition?

Below is the code for a simple workflow using TPL DataFlow in a Console project.
Three Test objects, TestA, TestB and TestC, are posted into the starting bufferBlock. This is linked to a TransformBlock which evaluates each test's PerformTestAsync() method, which returns a Task<TestResult>. The TransformBlock is linked to an ActionBlock which writes the test result to the Console.
All this appears to work fine. However, what I am struggling to do is change the code so that the FIRST time await t.PerformTestAsync() returns TestResult.Failed, the TransformBlock does NOT process any more messages and certainly does not pass any more to the ActionBlock, except for the failed result. So, for my example code, I would only like to see "OK" and "Failed" in the console window, and for testC.PerformTestAsync() never to have been called at all.
How might I be able to achieve this?
Code:
class Program
{
static void Main(string[] args)
{
// Create workflow blocks
var bufferBlock = new BufferBlock<TestBase>();
var transformBlock = new TransformBlock<TestBase, TestResult>(async t => await t.PerformTestAsync());
var actionBlock = new ActionBlock<TestResult>(i => Console.WriteLine(i));
// Link Blocks
bufferBlock.LinkTo(transformBlock, new DataflowLinkOptions() { PropagateCompletion = true });
transformBlock.LinkTo(actionBlock, new DataflowLinkOptions() { PropagateCompletion = true });
// Create Tests
var tests = new List<TestBase>() { new TestA(), new TestB(), new TestC() };
// Post them into start of workflow
foreach (var test in tests)
{
bufferBlock.Post<TestBase>(test);
}
bufferBlock.Complete();
actionBlock.Completion.Wait();
Console.ReadLine();
}
}
public enum TestResult
{
OK,
Error,
Failed
}
public abstract class TestBase
{
private readonly string _name;
public TestBase(string name)
{
_name = name;
}
public abstract Task<TestResult> PerformTestAsync();
}
public class TestA : TestBase
{
public TestA() : base("Test A")
{
}
public override Task<TestResult> PerformTestAsync()
{
// Do some processing for this test...
return Task.FromResult(TestResult.OK);
}
}
public class TestB : TestBase
{
public TestB() : base("Test B")
{
}
public override Task<TestResult> PerformTestAsync()
{
// Do some processing for this test...
return Task.FromResult(TestResult.Failed);
}
}
public class TestC : TestBase
{
public TestC() : base("Test C")
{
}
public override Task<TestResult> PerformTestAsync()
{
// Do some processing for this test...
return Task.FromResult(TestResult.OK);
}
}
When await t.PerformTestAsync() returns TestResult.Failed, throw an exception. That will fault the block, which then completes in a faulted state, and no further items will be processed.
var transformBlock = new TransformBlock<TestBase, TestResult>(async t =>
{
var result = await t.PerformTestAsync();
if (result == TestResult.Failed)
throw new InvalidOperationException();
return result;
});
Note that the exception you throw will be propagated to the Completion task of the final block, i.e. your ActionBlock. When you await that task you'll be able to handle the faulted flow or ignore as you choose.
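Handling the faulted flow at the end of the pipeline could look like the sketch below. It assumes the System.Threading.Tasks.Dataflow assembly is referenced. FaultingPipelineDemo, its int/string item types and the "item 2 failed" message are made up for illustration. Because each linked block may wrap the propagated exception in another AggregateException, the sketch uses GetBaseException() to reach the original error instead of catching a specific type.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class; the item types and messages are illustrative only.
public static class FaultingPipelineDemo
{
    public static async Task<List<string>> RunAsync()
    {
        var output = new List<string>();

        var transform = new TransformBlock<int, string>(n =>
        {
            if (n == 2)
                throw new InvalidOperationException("item 2 failed"); // faults the block
            return "item " + n;
        });
        var sink = new ActionBlock<string>(s => output.Add(s));
        transform.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        transform.Post(1);
        transform.Post(2);
        transform.Post(3); // never processed: the block has already faulted
        transform.Complete();

        try
        {
            await sink.Completion; // the fault propagates to the last block
        }
        catch (Exception ex)
        {
            // Unwrap any nested AggregateException to get the original error.
            output.Add("faulted: " + ex.GetBaseException().Message);
        }
        return output;
    }
}
```

The last entry of the returned list is "faulted: item 2 failed" and "item 3" never appears. Whether "item 1" reaches the sink before the fault is a race, because a faulting block discards its pending output.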

How to manage a lifetime of an infinite observable when it's dictated by the subscriber not the source

I'm pushing data updates/changes using IObservable. I have a method, GetLatestElement, that gets the latest data from a database. Whenever anyone calls UpdateElement and the data gets updated, a message is distributed over a messaging system.
So I'm creating an observable that emits the latest value, and then re-emits the new value when it receives the update event from the messaging system:
public IObservable<IElement> GetElement(Guid id)
{
return Observable.Create<T>((observer) =>
{
observer.OnNext(GetLatestElement(id));
// subscribe to internal or external update notifications
var messageCallback = (message) =>
{
// new update message received,
observer.OnNext(GetLatestElement(id));
};
messageService.SubscribeToTopic(id, messageCallback);
return Disposable.Create(() => Console.WriteLine("Observer Disposed"));
});
}
My problem is that this is indefinite. These updates will potentially happen forever. Since I'm trying to get the system as state-less as possible, a new Observable is created for each request for GetElementType. This means the lifetime is dictated by the subscriber, not the source of the data.
I'll never call OnCompleted() in the Observable; I want it to complete when the Observer/User is done.
However, I need to call messageService.Unsubscribe(messageCallback); at some point in time in order to unsubscribe from the messages when the Observable is done with.
I could do this when the subscription is disposed, but then I can only subscribe a single time, which seems likely to introduce bugs.
How should this be done with Observables?
It seems there is some misunderstanding about how Observable.Create works. Whenever you call Subscribe on the result of your GetElement(), the body of Observable.Create is executed. So each subscriber has a separate subscription to your messageService, with a separate callback to execute. If you unsubscribe, you only remove the subscription of that subscriber. All the others remain active, because they have their own messageCallback. That is assuming, of course, that messageService is implemented properly. Here is a sample application illustrating that:
static IElement GetLatestElement(Guid id) {
return new Element();
}
public class Element : IElement {
}
public interface IElement {
}
class MessageService {
private Dictionary<Guid, Dictionary<Action<IElement>, CancellationTokenSource>> _subs = new Dictionary<Guid, Dictionary<Action<IElement>, CancellationTokenSource>>();
public void SubscribeToTopic(Guid id, Action<IElement> callback) {
var ct = new CancellationTokenSource();
if (!_subs.ContainsKey(id))
_subs[id] = new Dictionary<Action<IElement>, CancellationTokenSource>();
_subs[id].Add(callback, ct);
Task.Run(() =>
{
while (!ct.IsCancellationRequested) {
callback(new Element());
Thread.Sleep(500);
}
});
}
public void Unsubscribe(Guid id, Action<IElement> callback) {
_subs[id][callback].Cancel();
_subs[id].Remove(callback);
}
}
public static IObservable<IElement> GetElement(Guid id)
{
var messageService = new MessageService();
return Observable.Create<IElement>((observer) =>
{
observer.OnNext(GetLatestElement(id));
// subscribe to internal or external update notifications
Action<IElement> messageCallback = (message) =>
{
// new update message received,
observer.OnNext(GetLatestElement(id));
};
messageService.SubscribeToTopic(id, messageCallback);
return Disposable.Create(() => {
messageService.Unsubscribe(id, messageCallback);
Console.WriteLine("Observer Disposed");
});
});
}
public static void Main(string[] args) {
var ob = GetElement(Guid.NewGuid());
var sub1 = ob.Subscribe(c =>
{
Console.WriteLine("got element");
});
var sub2 = ob.Subscribe(c =>
{
Console.WriteLine("got element 2");
});
// at this point we see both subscribers receive messages
Console.ReadKey();
sub1.Dispose();
// first one is unsubscribed, but second one is still alive
Console.ReadKey();
}
So, as I said in the comments, I see no reason to complete your observable in this case.
As Evk pointed out, Observable.Create runs then disposes almost immediately. If you want to keep the messageService subscription open though, Rx can help you with that. Look at MessageObservableProvider. The rest is just to make things compile:
public class MessageObservableProvider
{
private MessageService messageService;
private Dictionary<Guid, IObservable<Unit>> _messageNotifications = new Dictionary<Guid, IObservable<Unit>>();
private IObservable<Unit> GetMessageNotifications(Guid id)
{
return Observable.Create<Unit>((observer) =>
{
Action<Message> messageCallback = _ => observer.OnNext(Unit.Default);
messageService.SubscribeToTopic(id, messageCallback);
return Disposable.Create(() =>
{
messageService.Unsubscribe(messageCallback);
Console.WriteLine("Observer Disposed");
});
});
}
public IObservable<IElement> GetElement(Guid id)
{
if(!_messageNotifications.ContainsKey(id))
_messageNotifications[id] = GetMessageNotifications(id).Publish().RefCount();
return _messageNotifications[id]
.Select(_ => GetLatestElement(id))
.StartWith(GetLatestElement(id));
}
private IElement GetLatestElement(Guid id)
{
throw new NotImplementedException();
}
}
public class IElement { }
public class Message { }
public class MessageService
{
public void SubscribeToTopic(Guid id, Action<Message> callback)
{
throw new NotImplementedException();
}
public void Unsubscribe(Action<Message> callback)
{
throw new NotImplementedException();
}
}
Your original Create implementation incorporated the functionality of a StartWith and a Select. I moved those out, so now the Observable.Create just returns a notification when a new message is available.
More importantly though, in GetElement there's now a .Publish().RefCount() call. This will leave the messageService subscription open (by not calling .Dispose()) as long as there's at least one child observable (subscription) hanging around.
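The effect of .Publish().RefCount() can be checked in isolation with a small sketch. It assumes the System.Reactive package is referenced. RefCountDemo is a made-up name, and the counting source only stands in for the messageService subscription.

```csharp
using System;
using System.Reactive;
using System.Reactive.Disposables;
using System.Reactive.Linq;

// Hypothetical demo class; the source only counts subscribe/dispose calls.
public static class RefCountDemo
{
    // Returns "connects/disconnects" snapshots taken at three points in time.
    public static string Run()
    {
        int connects = 0;
        int disconnects = 0;

        // Stand-in for SubscribeToTopic / Unsubscribe on the message service.
        IObservable<Unit> source = Observable.Create<Unit>(observer =>
        {
            connects++;
            return Disposable.Create(() => disconnects++);
        });

        IObservable<Unit> shared = source.Publish().RefCount();

        IDisposable first = shared.Subscribe(_ => { });
        IDisposable second = shared.Subscribe(_ => { });
        string bothActive = connects + "/" + disconnects;

        first.Dispose();
        string oneLeft = connects + "/" + disconnects;

        second.Dispose();
        string noneLeft = connects + "/" + disconnects;

        return bothActive + " " + oneLeft + " " + noneLeft;
    }
}
```

Run() returns "1/0 1/0 1/1": the underlying source is subscribed once for both subscribers and is only disposed when the last of them goes away.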

How to implement a continuous producer-consumer pattern inside a Windows Service

Here's what I'm trying to do:
Keep a queue in memory of items that need to be processed (i.e. IsProcessed = 0)
Every 5 seconds, get unprocessed items from the db, and if they're not already in the queue, add them
Continuously pull items from the queue, process them, and each time an item is processed, update it in the db (IsProcessed = 1)
Do this all "as parallel as possible"
I have a constructor for my service like
public MyService()
{
Ticker.Elapsed += FillQueue;
}
and I start that timer when the service starts like
protected override void OnStart(string[] args)
{
Ticker.Enabled = true;
Task.Run(() => { ConsumeWork(); });
}
and my FillQueue is like
private static async void FillQueue(object source, ElapsedEventArgs e)
{
var items = GetUnprocessedItemsFromDb();
foreach(var item in items)
{
if(!Work.Contains(item))
{
Work.Enqueue(item);
}
}
}
and my ConsumeWork is like
private static void ConsumeWork()
{
while(true)
{
if(Work.Count > 0)
{
var item = Work.Peek();
Process(item);
Work.Dequeue();
}
else
{
Thread.Sleep(500);
}
}
}
However this is probably a naive implementation and I'm wondering whether .NET has any type of class that is exactly what I need for this type of situation.
Though @JSteward's answer is a good start, you can improve on it by combining TPL Dataflow with the Rx.NET extensions: a dataflow block can easily act as an observer for your data, and an Rx timer takes much less effort (Rx.Timer explanation).
We can adapt the MSDN article to your needs, like this:
private const int EventIntervalInSeconds = 5;
private const int DueIntervalInSeconds = 60;
var source =
// sequence of Int64 numbers, starting from 0
// https://msdn.microsoft.com/en-us/library/hh229435.aspx
Observable.Timer(
// fire first event after 1 minute waiting
TimeSpan.FromSeconds(DueIntervalInSeconds),
// fire all next events each 5 seconds
TimeSpan.FromSeconds(EventIntervalInSeconds))
// each number will have a timestamp
.Timestamp()
// each time we select some items to process
.SelectMany(GetItemsFromDB)
// filter already added
.Where(i => !_processedItemIds.ContainsKey(i.Id));
var action = new ActionBlock<Item>(ProcessItem, new ExecutionDataflowBlockOptions
{
// we can start as many item processing as processor count
MaxDegreeOfParallelism = Environment.ProcessorCount,
});
IDisposable subscription = source.Subscribe(action.AsObserver());
Also, your check for whether an item has already been processed isn't quite accurate: an item can be selected from the db as unprocessed right when you've finished processing it but haven't yet updated it in the database. In that case the item is removed from the Queue<T> and then added again by the producer. This is why I've added a ConcurrentDictionary<TKey, TValue>, used as a thread-safe set, to this solution (HashSet<T> isn't thread-safe, and ConcurrentBag<T> has no way to remove a specific item):
private static async Task ProcessItem(Item item)
{
// TryAdd is atomic: it returns false if the item is already being processed
if (!_processedItemIds.TryAdd(item.Id, 0))
{
return;
}
// actual work here
// save item as processed in database
// we need to wait to ensure item not to appear in queue again
await Task.Delay(TimeSpan.FromSeconds(EventIntervalInSeconds * 2));
// clear the processed cache to reduce memory usage
byte ignored;
_processedItemIds.TryRemove(item.Id, out ignored);
}
public class Item
{
public Guid Id { get; set; }
}
// temporary cache for items in process (the byte value is unused)
private static ConcurrentDictionary<Guid, byte> _processedItemIds = new ConcurrentDictionary<Guid, byte>();
private static IEnumerable<Item> GetItemsFromDB(Timestamped<long> time)
{
// log event timing
Console.WriteLine($"Event # {time.Value} at {time.Timestamp}");
// return items from DB
return new[] { new Item { Id = Guid.NewGuid() } };
}
You can implement the cache cleanup in another way, for example by starting a "GC" timer which removes processed items from the cache on a regular basis.
To stop events and processing items you should Dispose the subscription and, maybe, Complete the ActionBlock:
subscription.Dispose();
action.Complete();
You can find more information about Rx.Net in their guidelines on github.
You could use an ActionBlock to do your processing; it has a built-in queue that you can post work to. You can read up on TPL Dataflow here: Intro to TPL-Dataflow, and also Introduction to Dataflow, Part 1. Finally, this is a quick sample to get you going. I've left out a lot, but it should at least get you started.
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace MyWorkProcessor {
public class WorkProcessor {
public WorkProcessor() {
Processor = CreatePipeline();
}
public async Task StartProcessing() {
try {
await Task.Run(() => GetWorkFromDatabase());
} catch (OperationCanceledException) {
//handle cancel
}
}
private CancellationTokenSource cts {
get;
} = new CancellationTokenSource(); // must exist before CreatePipeline() reads cts.Token
private ITargetBlock<WorkItem> Processor {
get;
}
private TimeSpan DatabasePollingFrequency {
get;
} = TimeSpan.FromSeconds(5);
private ITargetBlock<WorkItem> CreatePipeline() {
var options = new ExecutionDataflowBlockOptions() {
BoundedCapacity = 100,
CancellationToken = cts.Token
};
return new ActionBlock<WorkItem>(item => ProcessWork(item), options);
}
private async Task GetWorkFromDatabase() {
while (!cts.IsCancellationRequested) {
var work = await GetWork();
await Processor.SendAsync(work);
await Task.Delay(DatabasePollingFrequency);
}
}
private async Task<WorkItem> GetWork() {
return await Context.GetWork();
}
private void ProcessWork(WorkItem item) {
//do processing
}
}
}

Parallel Tasks Sharing a Global Variable

Hi I am new to using Parallel tasks. I have a function which I need to run multiple times in parallel. Below is the dummy code to show this,
public MyClass GlobalValue;
static void Main(string[] args)
{
Task task1 = Task.Factory.StartNew(() => SaveValue());
Task task2 = Task.Factory.StartNew(() => SaveValue());
Task task3 = Task.Factory.StartNew(() => SaveValue());
}
public void SaveValue()
{
string val = GetValueFromDB();
if (GlobalValue == null)
{
GlobalValue = new MyClass(val);
}
else if (GlobalValue.Key != val)
{
GlobalValue = new MyClass(val);
}
string result = GlobalValue.GetData();
}
Now the line GlobalValue = new MyClass(val) is called every time. Kindly help me with this. I think there is a problem with the global variable.
You need to synchronize the access to the shared data, as each thread will try to access it at the same time, and see that it's null, then all will allocate.
Note that the synchronization, if done via lock, will likely cause the three threads to effectively run sequentially, as only one thread can enter a lock at a time.
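A minimal sketch of the lock-based synchronization described above. SharedValueCache, GetOrReplace and the Allocations counter are hypothetical names, and a plain string stands in for the question's MyClass.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the shared GlobalValue; a string replaces MyClass.
public sealed class SharedValueCache
{
    private readonly object _gate = new object();
    private string _key;
    private int _allocations;

    // Number of times the cached value was (re)created.
    public int Allocations
    {
        get { lock (_gate) { return _allocations; } }
    }

    // Replaces the cached value only when the key differs; the check and the
    // assignment happen under one lock, so concurrent callers cannot interleave.
    public string GetOrReplace(string key)
    {
        lock (_gate)
        {
            if (_key != key)
            {
                _key = key;
                _allocations++;
            }
            return _key;
        }
    }
}
```

With three tasks calling GetOrReplace("a") concurrently, Allocations ends up at 1 rather than 3. As noted above, the callers take turns inside the lock, so only work done outside it actually runs in parallel.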
Well, why not do this:
static void Main()
{
var tasks = new[]
{
Task.Factory.StartNew(() => YourFunction()),
Task.Factory.StartNew(() => YourFunction()),
Task.Factory.StartNew(() => YourFunction())
};
Task.WaitAll(tasks);
}
public static string YourFunction()
{
var yourClass = new MyClass(GetValueFromDB());
return yourClass.GetData();
}
I don't see why you need GlobalValue. Is MyClass expensive to instantiate? More notably, you don't do anything with the results so all is moot.
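Since the features are available, assuming you're using .NET 4.5 (C# 5.0), you could do
static void Main()
{
// Main cannot be async in C# 5.0, so block on an async helper instead
MainAsync().GetAwaiter().GetResult();
}
static async Task MainAsync()
{
string[] results = await Task.WhenAll(YourFunction(), YourFunction(), YourFunction());
}
public static Task<string> YourFunction()
{
// Task.Run moves the work to the thread pool so the three calls can overlap
return Task.Run(() => new MyClass(GetValueFromDB()).GetData());
}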
For the sake of illustration, you could still use a global variable, but it would largely negate the benefits of parallelization. You just have to make sure you serialize access to shared state, or use thread-safe types that do it for you.
Adapted from your example:
private readonly SemaphoreSlim globalLock = new SemaphoreSlim(1);
...
public void SaveValue()
{
string val = GetValueFromDB();
MyClass thisValue;
globalLock.Wait();
try
{
if (this.GlobalValue == null)
{
this.GlobalValue = new MyClass(val);
}
else if (this.GlobalValue.Key != val)
{
this.GlobalValue = new MyClass(val);
}
thisValue = this.GlobalValue;
}
finally
{
globalLock.Release();
}
string result = thisValue.GetData();
}
