Multiple publishers sending concurrent messages to a single subscriber in Retlang? - c#

I need to build an application where some number of instances of an object are generating "pulses", concurrently. (Essentially this just means that they are incrementing a counter.) I also need to track the total counters for each object. Also, whenever I perform a read on a counter, it needs to be reset to zero.
So I was talking to a guy at work, and he mentioned Retlang and message-based concurrency, which sounded super interesting. But obviously I am very new to the concept. So I've built a small prototype, and I get the expected results, which is awesome - but I'm not sure if I've potentially made some logical errors and left the software open to bugs, due to my inexperience with Retlang and concurrent programming in general.
First off, I have these classes:
public class Plc {
private readonly IChannel<Pulse> _channel;
private readonly IFiber _fiber;
private readonly int _pulseInterval;
private readonly int _plcId;
public Plc(IChannel<Pulse> channel, int plcId, int pulseInterval) {
_channel = channel;
_pulseInterval = pulseInterval;
_fiber = new PoolFiber();
_plcId = plcId;
}
public void Start() {
_fiber.Start();
// Not sure if it's safe to pass in a delegate which will run in an infinite loop...
// AND use a shared channel object...
_fiber.Enqueue(() => {
SendPulse();
});
}
private void SendPulse() {
while (true) {
// Not sure if it's safe to use the same channel object in different
// IFibers...
_channel.Publish(new Pulse() { PlcId = _plcId });
Thread.Sleep(_pulseInterval);
}
}
}
public class Pulse {
public int PlcId { get; set; }
}
The idea here is that I can instantiate multiple Plcs, pass each one the same IChannel, and then have them execute the SendPulse function concurrently, which would allow each one to publish to the same channel. But as you can see from my comments, I'm a little skeptical that what I'm doing is actually legit. I'm mostly worried about using the same IChannel object to Publish in the context of different IFibers, but I'm also worried about never returning from the delegate that was passed to Enqueue. I'm hoping some one can provide some insight as to how I should be handling this.
Also, here is the "subscriber" class:
public class PulseReceiver {
private int[] _pulseTotals;
private readonly IFiber _fiber;
private readonly IChannel<Pulse> _channel;
private object _pulseTotalsLock;
public PulseReceiver(IChannel<Pulse> channel, int numberOfPlcs) {
_pulseTotals = new int[numberOfPlcs];
_channel = channel;
_fiber = new PoolFiber();
_pulseTotalsLock = new object();
}
public void Start() {
_fiber.Start();
_channel.Subscribe(_fiber, this.UpdatePulseTotals);
}
private void UpdatePulseTotals(Pulse pulse) {
// This occurs in the execution context of the IFiber.
// If we were just dealing with the the published Pulses from the channel, I think
// we wouldn't need the lock, since I THINK the published messages would be taken
// from a queue (i.e. each Plc is publishing concurrently, but Retlang enqueues
// the messages).
lock(_pulseTotalsLock) {
_pulseTotals[pulse.PlcId - 1]++;
}
}
public int GetTotalForPlc(int plcId) {
// However, this access takes place in the application thread, not in the IFiber,
// and I think there could potentially be a race condition here. I.e. the array
// is being updated from the IFiber, but I think I'm reading from it and resetting values
// concurrently in a different thread.
lock(_pulseTotalsLock) {
if (plcId <= _pulseTotals.Length) {
int currentTotal = _pulseTotals[plcId - 1];
_pulseTotals[plcId - 1] = 0;
return currentTotal;
}
}
return -1;
}
}
So here, I am reusing the same IChannel that was given to the Plc instances, but having a different IFiber subscribe to it. Ideally then I could receive the messages from each Plc, and update a single private field within my class, but in a thread safe way.
From what I understand (and I mentioned in my comments), I think that I would be safe to simply update the _pulseTotals array in the delegate which I gave to the Subscribe function, because I would receive each message from the Plcs serially.
However, I'm not sure how best to handle the bit where I need to read the totals and reset them. As you can see from the code and comments, I ended up wrapping a lock around any access to the _pulseTotals array. But I'm not sure if this is necessary, and I would love to know a) if it is in fact necessary to do this, and why, or b) the correct way to implement something similar.
And finally for good measure, here's my main function:
static void Main(string[] args) {
Channel<Pulse> pulseChannel = new Channel<Pulse>();
PulseReceiver pulseReceiver = new PulseReceiver(pulseChannel, 3);
pulseReceiver.Start();
List<Plc> plcs = new List<Plc>() {
new Plc(pulseChannel, 1, 500),
new Plc(pulseChannel, 2, 250),
new Plc(pulseChannel, 3, 1000)
};
plcs.ForEach(plc => plc.Start());
while (true) {
Thread.Sleep(10000);
Console.WriteLine(string.Format("Plc 1: {0}\nPlc 2: {1}\nPlc 3: {2}\n", pulseReceiver.GetTotalForPlc(1), pulseReceiver.GetTotalForPlc(2), pulseReceiver.GetTotalForPlc(3)));
}
}
I instantiate one single IChannel, pass it to everything, where internally the Receiver subscribes with an IFiber, and where the Plcs use IFibers to "enqueue" a non-returning method which continually publishes to the channel.
Again, the console output looks exactly like I would expect it to look, i.e. I see 20 "pulses" for Plc 1 after waiting 10 seconds. And the resetting of the counters after a read also seems to work, i.e. Plc 1 has 20 "pulses" after each 10 second increment. But that doesn't reassure me that I haven't overlooked something important.
I'm really excited to learn a bit more about Retlang and concurrent programming techniques, so hopefuly someone has the time to sift through my code and offer some suggestions for my specific concerns, or else even a different design based on my requirements!

Related

Broken lock strategy - analysis and correction

I'm asking this primarily as a sanity check: In a C# (8.0) application I've got this bit of code, which spuriously fails with an "object is not synchronized" exception from Monitor.pulse() (I've omitted irrelevant code for clarity):
// vanilla multiple-producer single-consumer queue stuff:
private Queue<Message> messages = new Queue<Message>();
private void ConsumerThread () {
Queue<Message> myMessages = new Queue<Message>();
while (...) {
lock (messages) {
// wait
while (messages.Count == 0)
Monitor.Wait(messages);
// swap
(messages, myMessages) = (myMessages, messages);
}
// process
while (myMessages.Count > 0)
DoStuff(myMessages.Dequeue());
}
}
public void EnqueueMessage (...) {
Message message = new Message(...);
lock (messages) {
messages.Enqueue(message);
Monitor.Pulse(messages);
}
}
I'm fairly new to C# and also I was stressed when I wrote that. Now I am reviewing that code to fix the exception and I'm immediately raising an eyebrow at the fact that I reassigned messages inside the consumer's lock.
I looked around and found Is it bad to overwrite a lock object if it is the last statement in the lock?, which validates my raised eyebrow.
However, I still don't have a lot of confidence (inexperience + stress), so, just to confirm: Is the following analysis of why this is broken correct?
If the following happens, in this order:
Stuff happens to be in the queue.
Consumer thread locks messages (and will skip wait loop).
EnqueueMessage tries to lock messages, waits for lock.
Consumer thread swaps messages and myMessages, releases lock.
EnqueueMessage takes lock.
EnqueueMessage adds item to messages and calls Monitor.pulse(messages) except messages isn't the same object that it locked in step (3), since it was swapped out from under us in (4). Possible consequences include:
Calling Monitor.Pulse on a non-locked object (what used to be myMessages) -- hence the aforementioned exception.
Enqueueing to the wrong queue and the consequences of that.
Even weirder stuff if the consumer thread manages to complete another full loop cycle while EnqueueMessage is still somewhere in its lock{}.
Right? I'm pretty sure that's right, it feels very basic, but I just want to confirm because I'm completely burnt out right now.
Then, whether that's correct or not: Does the following proposed fix make sense?
It seems to me like the fix is super simple: Instead of using messages as the monitor object, just use some dedicated dummy object that won't be changed:
private readonly object messagesLock = new object();
private Queue<Message> messages = new Queue<Message>();
private void ConsumerThread () {
Queue<Message> myMessages = new Queue<Message>();
while (...) {
lock (messagesLock) {
while (messages.Count == 0)
Monitor.Wait(messagesLock);
(messages, myMessages) = (myMessages, messages);
}
}
...
}
public void EnqueueMessage (...) {
...;
lock (messagesLock) {
messages.Enqueue(...);
Monitor.Pulse(messagesLock);
}
}
Where the intent is to avoid any issues caused by swapping out the lock object in strange places.
And that should work... right?
Nobody uses Queue in multi-threading since .NET 2 probably 16 yrs ago (correct me if I am wrong with dates).
it is trivial with concurrent collections.
BlockingColleciton<Message> myMessages = new BlockingColleciton<Message>();
private void ConsumerThread () {
while (...)
{
var message = myMessages.Take();
}
...
}
public void EnqueueMessage (Message msg) {
...;
myMessages.Add(msg);
}

How to return a data before method complete execution?

I have a slow and expensive method that return some data for me:
public Data GetData(){...}
I don't want to wait until this method will execute. Rather than I want to return a cached data immediately.
I have a class CachedData that contains one property Data cachedData.
So I want to create another method public CachedData GetCachedData() that will initiate a new task(call GetData inside of it) and immediately return cached data and after task will finish we will update the cache.
I need to have thread safe GetCachedData() because I will have multiple request that will call this method.
I will have a light ping "is there anything change?" each minute and if it will return true (cachedData != currentData) then I will call GetCachedData().
I'm new in C#. Please, help me to implement this method.
I'm using .net framework 4.5.2
The basic idea is clear:
You have a Data property which is wrapper around an expensive function call.
In order to have some response immediately the property holds a cached value and performs updating in the background.
No need for an event when the updater is done because you poll, for now.
That seems like a straight-forward design. At some point you may want to use events, but that can be added later.
Depending on the circumstances it may be necessary to make access to the property thread-safe. I think that if the Data cache is a simple reference and no other data is updated together with it, a lock is not necessary, but you may want to declare the reference volatile so that the reading thread does not rely on a stale cached (ha!) version. This post seems to have good links which discuss the issues.
If you will not call GetCachedData at the same time, you may not use lock. If data is null (for sure first run) we will wait long method to finish its work.
public class SlowClass
{
private static object _lock;
private static Data _cachedData;
public SlowClass()
{
_lock = new object();
}
public void GetCachedData()
{
var task = new Task(DoStuffLongRun);
task.Start();
if (_cachedData == null)
task.Wait();
}
public Data GetData()
{
if (_cachedData == null)
GetCachedData();
return _cachedData;
}
private void DoStuffLongRun()
{
lock (_lock)
{
Console.WriteLine("Locked Entered");
Thread.Sleep(5000);//Do Long Stuff
_cachedData = new Data();
}
}
}
I have tested on console application.
static void Main(string[] args)
{
var mySlow = new SlowClass();
var mySlow2 = new SlowClass();
mySlow.GetCachedData();
for (int i = 0; i < 5; i++)
{
Console.WriteLine(i);
mySlow.GetData();
mySlow2.GetData();
}
mySlow.GetCachedData();
Console.Read();
}
Maybe you can use the MemoryCache class,
as explained here in MSDN

How to manage observable subscription for dependent observables?

This sample console application has 2 observables. The first one pushes numbers from 1 to 100. This observable is subscribed by the AsyncClass which runs a long running process for each number it gets. Upon completion of this new async process I want to be able to 'push' to 2 subscribers which would be doing something with this new value.
My attempts are commented in the source code below.
AsyncClass:
class AsyncClass
{
private readonly IConnectableObservable<int> _source;
private readonly IDisposable _sourceDisposeObj;
public IObservable<string> _asyncOpObservable;
public AsyncClass(IConnectableObservable<int> source)
{
_source = source;
_sourceDisposeObj = _source.Subscribe(
ProcessArguments,
ExceptionHandler,
Completed
);
_source.Connect();
}
private void Completed()
{
Console.WriteLine("Completed");
Console.ReadKey();
}
private void ExceptionHandler(Exception exp)
{
throw exp;
}
private void ProcessArguments(int evtArgs)
{
Console.WriteLine("Argument being processed with value: " + evtArgs);
//_asyncOpObservable = LongRunningOperationAsync("hello").Publish();
// not going to work either since this creates a new observable for each value from main observer
}
// http://rxwiki.wikidot.com/101samples
public IObservable<string> LongRunningOperationAsync(string param)
{
// should not be creating an observable here, rather 'pushing' values?
return Observable.Create<string>(
o => Observable.ToAsync<string, string>(DoLongRunningOperation)(param).Subscribe(o)
);
}
private string DoLongRunningOperation(string arg)
{
return "Hello";
}
}
Main:
static void Main(string[] args)
{
var source = Observable
.Range(1, 100)
.Publish();
var asyncObj = new AsyncClass(source);
var _asyncTaskSource = asyncObj._asyncOpObservable;
var ui1 = new UI1(_asyncTaskSource);
var ui2 = new UI2(_asyncTaskSource);
}
UI1 (and UI2, they're basically the same):
class UI1
{
private IConnectableObservable<string> _asyncTaskSource;
private IDisposable _taskSourceDisposable;
public UI1(IConnectableObservable<string> asyncTaskSource)
{
_asyncTaskSource = asyncTaskSource;
_asyncTaskSource.Connect();
_taskSourceDisposable = _asyncTaskSource.Subscribe(RefreshUI, HandleException, Completed);
}
private void Completed()
{
Console.WriteLine("UI1: Stream completed");
}
private void HandleException(Exception obj)
{
Console.WriteLine("Exception! "+obj.Message);
}
private void RefreshUI(string obj)
{
Console.WriteLine("UI1: UI refreshing with value "+obj);
}
}
This is my first project with Rx so let me know if I should be thinking differently. Any help would be highly appreciated!
I'm going to let you know you should be thinking differently... :) Flippancy aside, this looks like a case of bad collision between object-oriented and functional-reactive styles.
It's not clear what the requirements are around timing of the data flow and caching of results here - the use of Publish and IConnectableObservable is a little confused. I'm going to guess you want to avoid the 2 downstream subscriptions causing the processing of a value being duplicated? I'm basing some of my answer on that premise. The use of Publish() can achieve this by allowing multiple subscribers to share a subscription to a single source.
Idiomatic Rx wants you to try and keep to a functional style. In order to do this, you want to present the long running work as a function. So let's say, instead of trying to wire your AsyncClass logic directly into the Rx chain as a class, you could present it as a function like this contrived example:
async Task<int> ProcessArgument(int argument)
{
// perform your lengthy calculation - maybe in an OO style,
// maybe creating class instances and invoking methods etc.
await Task.Delay(TimeSpan.FromSeconds(1));
return argument + 1;
}
Now, you can construct a complete Rx observable chain calling this function, and through the use of Publish().RefCount() you can avoid multiple subscribers causing duplicate effort. Note how this separates concerns too - the code processing the value is simpler because the reuse is handled elsewhere.
var query = source.SelectMany(x => ProcessArgument(x).ToObservable())
.Publish().RefCount();
By creating a single chain for subscribers, the work is only started when necessary on subscription. I've used Publish().RefCount() - but if you want to ensure values aren't missed by the second and subsequent subscribers, you could use Replay (easy) or use Publish() and then Connect - but you'll want the Connect logic outside the individual subscriber's code because you just need to call it once when all subscribers have subscribed.

Resource Access by Parallel Threads

I have 2 threads to are triggered at the same time and run in parallel. These 2 threads are going to be manipulating a string value, but I want to make sure that there are no data inconsistencies. For that I want to use a lock with Monitor.Pulse and Monitor.Wait. I used a method that I found on another question/answer, but whenever I run my program, the first thread gets stuck at the Monitor.Wait level. I think that's because the second thread has already "Pulsed" and "Waited". Here is some code to look at:
string currentInstruction;
public void nextInstruction()
{
Action actions = {
fetch,
decode
}
Parallel.Invoke(actions);
_pc++;
}
public void fetch()
{
lock(irLock)
{
currentInstruction = "blah";
GiveTurnTo(2);
WaitTurn(1);
}
decodeEvent.WaitOne();
}
public void decode()
{
decodeEvent.Set();
lock(irLock)
{
WaitTurn(2);
currentInstruction = "decoding..."
GiveTurnTo(1);
}
}
// Below are the methods I talked about before.
// Wait for turn to use lock object
public static void WaitTurn(int threadNum, object _lock)
{
// While( not this threads turn )
while (threadInControl != threadNum)
{
// "Let go" of lock on SyncRoot and wait utill
// someone finishes their turn with it
Monitor.Wait(_lock);
}
}
// Pass turn over to other thread
public static void GiveTurnTo(int nextThreadNum, object _lock)
{
threadInControl = nextThreadNum;
// Notify waiting threads that it's someone else's turn
Monitor.Pulse(_lock);
}
Any idea how to get 2 parallel threads to communicate (manipulate the same resources) within the same cycle using locks or anything else?
You want to run 2 peaces of code in parallel, but locking them at start using the same variable?
As nvoigt mentioned, it already sounds wrong. What you have to do is to remove lock from there. Use it only when you are about to access something exclusively.
Btw "data inconsistencies" can be avoided by not having to have them. Do not use currentInstruction field directly (is it a field?), but provide a thread safe CurrentInstruction property.
private object _currentInstructionLock = new object();
private string _currentInstruction
public string CurrentInstruction
{
get { return _currentInstruction; }
set
{
lock(_currentInstructionLock)
_currentInstruction = value;
}
}
Other thing is naming, local variables name starting from _ is a bad style. Some peoples (incl. me) using them to distinguish private fields. Property name should start from BigLetter and local variables fromSmall.

static method vs instance method, multi threading, performance

Can you help explain how multiple threads access static methods? Are multiple threads able to access the static method concurrently?
To me it would seem logical that if a method is static that would make it a single resouce that is shared by all the threads. Therefore only one thread would be able to use it at a time. I have created a console app to test this. But from the results of my test it would appear that my assumption is incorrect.
In my test a number of Worker objects are constructed. Each Worker has a number of passwords and keys. Each Worker has an instance method that hashes it's passwords with it's keys. There is also a static method which has exactly the same implementation, the only difference being that it is static. After all the Worker objects have been created the start time is written to the console. Then a DoInstanceWork event is raised and all of the Worker objects queue their useInstanceMethod to the threadpool. When all the methods or all the Worker objects have completed the time it took for them all to complete is calculated from the start time and is written to the console. Then the start time is set to the current time and the DoStaticWork event is raised. This time all the Worker objects queue their useStaticMethod to the threadpool. And when all these method calls have completed the time it took until they had all completed is again calculated and written to the console.
I was expecting the time taken when the objects use their instance method to be 1/8 of the time taken when they use the static method. 1/8 because my machine has 4 cores and 8 virtual threads. But it wasn't. In fact the time taken when using the static method was actually fractionally faster.
How is this so? What is happening under the hood? Does each thread get it's own copy of the static method?
Here is the Console app-
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Threading;
namespace bottleneckTest
{
public delegate void workDelegate();
class Program
{
static int num = 1024;
public static DateTime start;
static int complete = 0;
public static event workDelegate DoInstanceWork;
public static event workDelegate DoStaticWork;
static bool flag = false;
static void Main(string[] args)
{
List<Worker> workers = new List<Worker>();
for( int i = 0; i < num; i++){
workers.Add(new Worker(i, num));
}
start = DateTime.UtcNow;
Console.WriteLine(start.ToString());
DoInstanceWork();
Console.ReadLine();
}
public static void Timer()
{
complete++;
if (complete == num)
{
TimeSpan duration = DateTime.UtcNow - Program.start;
Console.WriteLine("Duration: {0}", duration.ToString());
complete = 0;
if (!flag)
{
flag = true;
Program.start = DateTime.UtcNow;
DoStaticWork();
}
}
}
}
public class Worker
{
int _id;
int _num;
KeyedHashAlgorithm hashAlgorithm;
int keyLength;
Random random;
List<byte[]> _passwords;
List<byte[]> _keys;
List<byte[]> hashes;
public Worker(int id, int num)
{
this._id = id;
this._num = num;
hashAlgorithm = KeyedHashAlgorithm.Create("HMACSHA256");
keyLength = hashAlgorithm.Key.Length;
random = new Random();
_passwords = new List<byte[]>();
_keys = new List<byte[]>();
hashes = new List<byte[]>();
for (int i = 0; i < num; i++)
{
byte[] key = new byte[keyLength];
new RNGCryptoServiceProvider().GetBytes(key);
_keys.Add(key);
int passwordLength = random.Next(8, 20);
byte[] password = new byte[passwordLength * 2];
random.NextBytes(password);
_passwords.Add(password);
}
Program.DoInstanceWork += new workDelegate(doInstanceWork);
Program.DoStaticWork += new workDelegate(doStaticWork);
}
public void doInstanceWork()
{
ThreadPool.QueueUserWorkItem(useInstanceMethod, new WorkerArgs() { num = _num, keys = _keys, passwords = _passwords });
}
public void doStaticWork()
{
ThreadPool.QueueUserWorkItem(useStaticMethod, new WorkerArgs() { num = _num, keys = _keys, passwords = _passwords });
}
public void useInstanceMethod(object args)
{
WorkerArgs workerArgs = (WorkerArgs)args;
for (int i = 0; i < workerArgs.num; i++)
{
KeyedHashAlgorithm hashAlgorithm = KeyedHashAlgorithm.Create("HMACSHA256");
hashAlgorithm.Key = workerArgs.keys[i];
byte[] hash = hashAlgorithm.ComputeHash(workerArgs.passwords[i]);
}
Program.Timer();
}
public static void useStaticMethod(object args)
{
WorkerArgs workerArgs = (WorkerArgs)args;
for (int i = 0; i < workerArgs.num; i++)
{
KeyedHashAlgorithm hashAlgorithm = KeyedHashAlgorithm.Create("HMACSHA256");
hashAlgorithm.Key = workerArgs.keys[i];
byte[] hash = hashAlgorithm.ComputeHash(workerArgs.passwords[i]);
}
Program.Timer();
}
public class WorkerArgs
{
public int num;
public List<byte[]> passwords;
public List<byte[]> keys;
}
}
}
Methods are code - there's no problem with thread accessing that code concurrently since the code isn't modified by running it; it's a read-only resource (jitter aside). What needs to be handled carefully in multi-threaded situations is access to data concurrently (and more specifically, when modifying that data is a possibility). Whether a method is static or an instance method has nothing to do with whether or not it needs to ne serialized in some way to make it threadsafe.
In all cases, whether static or instance, any thread can access any method at any time unless you do explicit work to prevent it.
For example, you can create a lock to ensure only a single thread can access a given method, but C# will not do that for you.
Think of it like watching TV. A TV does nothing to prevent multiple people from watching it at the same time, and as long as everybody watching it wants to see the same show, there's no problem. You certainly wouldn't want a TV to only allow one person to watch it at once just because multiple people might want to watch different shows, right? So if people want to watch different shows, they need some sort of mechanism external to the TV itself (perhaps having a single remote control that the current viewer holds onto for the duration of his show) to make sure that one guy doesn't change the channel to his show while another guy is watching.
C# methods are "reentrant" (As in most languages; the last time I heard of genuinely non-reentrant code was DOS routines) Each thread has its own call stack, and when a method is called, the call stack of that thread is updated to have space for the return address, calling parameters, return value, local values, etc.
Suppose Thread1 and Thread2 calls the method M concurrently and M has a local int variable n. The call stack of Thread1 is seperate from the call stack of Thread2, so n will have two different instantiations in two different stacks. Concurrency would be a problem only if n is stored not in a stack but say in the same register (i.e. in a shared resource) CLR (or is it Windows?) is careful not to let that cause a problem and cleans, stores and restores the registers when switching threads. (What do you do in presence of multiple CPU's, how do you allocate registers, how do you implement locking. These are indeed difficult problems that makes one respect compiler, OS writers when one comes to think of it)
Being reentrant does not prove no bad things happen when two threads call the same method at the same time: it only proves no bad things happen if the method does not access and update other shared resources.
When you access an instance method, you are accessing it through an object reference.
When you access a static method, you are accessing it directly.
So static methods are a tiny bit faster.
When you instanciate a class you dont create a copy of the code. You have a pointer to the definition of the class, and the code is acceded through it. So, instance methods are accessed the sane way than static methods

Categories

Resources