I'm trying to use a dataflow block, and I need to spy on the items passing through for unit testing.
To do this, I'm using the AsObservable() method on the ISourceBlock<T> side of my TransformBlock<TInput, T>,
so that after execution I can check that each block of my pipeline has generated the expected values.
Pipeline
{
...
var observer = new MyObserver<string>();
_block = new TransformManyBlock<string, string>(MyHandler, options);
_block.LinkTo(_nextBlock);
_block.AsObservable().Subscribe(observer);
_block.Post("Test");
...
}
MyObserver
public class MyObserver<T> : IObserver<T>
{
public List<Exception> Errors = new List<Exception>();
public bool IsComplete = false;
public List<T> Values = new List<T>();
public void OnCompleted()
{
IsComplete = true;
}
public void OnNext(T value)
{
Values.Add(value);
}
public void OnError(Exception e)
{
Errors.Add(e);
}
}
So basically I subscribe my observer to the transform block, and I expect each value passing through to be registered in my observer's Values list.
But, while IsComplete is set to true and OnError() successfully registers exceptions,
the OnNext() method never gets called unless it is the last block of the pipeline...
I can't figure out why, because the next block linked to this source block successfully receives the data, proving that some data is exiting the block.
From what I understand, AsObservable is supposed to report every value exiting the block, not only the values that have not been consumed by other linked blocks...
What am I doing wrong?
Your messages are being consumed by _nextBlock before you get a chance to read them.
If you comment out the line _block.LinkTo(_nextBlock); it will likely work.
AsObservable's sole purpose is to allow a block to be consumed from Rx. It doesn't change the internal workings of the block to broadcast messages to multiple targets; there is a special block for that: BroadcastBlock.
I would suggest broadcasting to another block and using that one to Subscribe:
BroadcastBlock’s mission in life is to enable all targets linked from
the block to get a copy of every element published
var options = new DataflowLinkOptions {PropagateCompletion = true};
var broadcastBlock = new BroadcastBlock<string>(x => x);
var bufferBlock = new BufferBlock<string>();
var actionBlock = new ActionBlock<string>(s => Console.WriteLine("Action " + s));
broadcastBlock.LinkTo(bufferBlock, options);
broadcastBlock.LinkTo(actionBlock, options);
bufferBlock.AsObservable().Subscribe(s => Console.WriteLine("peek " + s));
for (var i = 0; i < 5; i++)
await broadcastBlock.SendAsync(i.ToString());
broadcastBlock.Complete();
await actionBlock.Completion;
Output
peek 0
Action 0
Action 1
Action 2
Action 3
Action 4
peek 1
peek 2
peek 3
peek 4
I have an extremely simple setup for sending message to Kafka:
var producerConfig = new ProducerConfig
{
BootstrapServers = "www.example.com",
SecurityProtocol = SecurityProtocol.SaslSsl,
SaslMechanism = SaslMechanism.ScramSha512,
SaslUsername = _options.SaslUsername,
SaslPassword = _options.SaslPassword,
MessageTimeoutMs = 1
};
var producerBuilder = new ProducerBuilder<Null, string>(producerConfig);
using var producer = producerBuilder.Build();
producer.Produce("Some Topic", new Message<Null, string>()
{
Timestamp = Timestamp.Default,
Value = "hello"
});
Before, this code was working fine. Today it has decided to stop working and I'm trying to figure out why. I'm trying to get the Producer to throw an exception when failing to deliver a message, but it never seems to crash. Even when I fill in a wrong username and password, the producer still doesn't crash. Not even a logline in my local output window. How can I debug my Kafka connection when the producer never shows any problems?
You can add SetErrorHandler() to the ProducerBuilder. It would look like this:
var producerBuilder = new ProducerBuilder<Null, string>(producerConfig)
.SetErrorHandler(errorMessageString => .....);
Set a breakpoint in that lambda and you can break on errors.
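For illustration, here is a rough sketch of a fuller handler setup, assuming the Confluent.Kafka client (where the error handler receives the producer instance and an Error, and SetLogHandler exposes the underlying client log lines):
var producerBuilder = new ProducerBuilder<Null, string>(producerConfig)
    .SetErrorHandler((producer, error) =>
    {
        // Connection/authentication problems surface here rather than as exceptions from Produce.
        Console.WriteLine($"Kafka error: {error.Code} - {error.Reason} (fatal: {error.IsFatal})");
    })
    .SetLogHandler((producer, logMessage) =>
    {
        // Low-level client log lines; useful when debugging connectivity.
        Console.WriteLine($"Kafka log [{logMessage.Level}] {logMessage.Name}: {logMessage.Message}");
    });
Raising the Debug setting in ProducerConfig (e.g. Debug = "all") should also produce more verbose output, though check that against the client version you're using.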
Produce is asynchronous and non-blocking; the function signature is
void Produce(string topic, Message<TKey, TValue> message, Action<DeliveryReport<TKey, TValue>> deliveryHandler = null)
To verify that a message was delivered without error,
you can add a delivery report handler function, e.g.
private void DeliveryReportHandler(DeliveryReport<int, T> deliveryReport)
{
if (deliveryReport.Status == PersistenceStatus.NotPersisted)
{
_logger.LogError($"Failed message delivery: error reason:{deliveryReport.Error?.Reason}");
_messageWasNotDelivered = true;
}
}
_messageWasNotDelivered = false;
_producer.Produce(topic,
new Message<int, T>
{
Key = key,
Value = entity
},
DeliveryReportHandler);
_producer.Flush(); // Wait until all outstanding produce requests and delivery report callbacks are completed
if (_messageWasNotDelivered) {
// handle non delivery
}
This code can be trivially adjusted for batch producing, like this:
_messageWasNotDelivered = false;
foreach(var entity in entities){
_producer.Produce(topic,
new Message<int, T>
{
Key = entity.Id,
Value = entity
},
DeliveryReportHandler);
}
_producer.Flush(); // Wait until all outstanding produce requests and delivery report callbacks are completed
if (_messageWasNotDelivered) {
// handle non delivery
}
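One extra note beyond the original answer: Flush also has an overload that takes a timeout, which avoids blocking indefinitely when the broker is unreachable; it returns the number of messages still awaiting delivery when the timeout expires. A minimal sketch, assuming the Confluent.Kafka client:
// Bound the wait instead of blocking forever.
int stillPending = _producer.Flush(TimeSpan.FromSeconds(10));
if (stillPending > 0)
{
    // Some messages were neither delivered nor failed within the timeout window.
}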
I need to import customer-related data from a legacy DB and perform several transformations during the process. This means a single entry needs to trigger additional "events" (synchronize products, create invoices, etc.).
My initial solution was a simple parallel approach.
It works okay, but sometimes it has issues. If the currently processed customers need to wait for the same type of events, their processing queues might get stuck and eventually time out, causing every underlying event to fail too (they depend on the one that failed). It doesn't happen all the time, yet it's annoying.
So I got another idea: work in batches. I mean not only limiting the number of customers being processed at the same time, but also the number of events which are broadcast to the queues. While searching around for ideas, I found this answer, which points to TPL Dataflow.
I made a skeleton to get familiar with it. I set up a simple pipeline, but I'm a bit confused about the usage of Complete() and awaiting Completion.
The steps are the following
Make a list of numbers (the ids of the customers to be imported) - this is outside the import logic, it's just there to trigger the rest of the logic
Create a BatchBlock (to be able to limit the number of customers to be processed at the same time)
Create a single MyClass1 item based on the id (TransformBlock<int, MyClass1>)
Perform some logic and generate a collection of MyClass2 (TransformManyBlock<MyClass1, MyClass2>) - as an example, sleep for 1 second
Perform some logic on every item of the collection (ActionBlock<MyClass2>) - as an example, sleep for 1 second
Here's the full code:
public static class Program
{
private static void Main(string[] args)
{
var batchBlock = new BatchBlock<int>(2);
for (var i = 1; i < 10; i++)
{
batchBlock.Post(i);
}
batchBlock.Complete();
while (batchBlock.TryReceive(null, out var ids))
{
var transformBlock = new TransformBlock<int, MyClass1>(delegate (int id)
{
Console.WriteLine($"TransformBlock(id: {id})");
return new MyClass1(id, "Star Wars");
});
var transformManyBlock = new TransformManyBlock<MyClass1, MyClass2>(delegate (MyClass1 myClass1)
{
Console.WriteLine($"TransformManyBlock(myClass1: {myClass1.Id}|{myClass1.Value})");
Thread.Sleep(1000);
return GetMyClass22Values(myClass1);
});
var actionBlock = new ActionBlock<MyClass2>(delegate (MyClass2 myClass2)
{
Console.WriteLine($"ActionBlock(myClass2: {myClass2.Id}|{myClass2.Value})");
Thread.Sleep(1000);
});
transformBlock.LinkTo(transformManyBlock);
transformManyBlock.LinkTo(actionBlock);
foreach (var id in ids)
{
transformBlock.Post(id);
}
// this is the point when I'm not 100% sure
//transformBlock.Complete();
//transformManyBlock.Complete();
//transformManyBlock.Completion.Wait();
actionBlock.Complete();
actionBlock.Completion.Wait();
}
Console.WriteLine();
Console.WriteLine("Press any key to continue...");
Console.ReadKey();
}
private static IEnumerable<MyClass2> GetMyClass22Values(MyClass1 myClass1)
{
return new List<MyClass2>
{
new MyClass2(1, myClass1.Id+ " did this"),
new MyClass2(2, myClass1.Id+ " did that"),
new MyClass2(3, myClass1.Id+ " did this again")
};
}
}
public class MyClass1
{
public MyClass1(int id, string value)
{
Id = id;
Value = value;
}
public int Id { get; set; }
public string Value { get; set; }
}
public class MyClass2
{
public MyClass2(int id, string value)
{
Id = id;
Value = value;
}
public int Id { get; set; }
public string Value { get; set; }
}
So the point I struggle with is the end, where I'd need to call Complete() or wait for Completion. I can't seem to find the right combination. I'd like to see an output as follows:
TransformBlock(id: 1)
TransformBlock(id: 2)
TransformManyBlock(myClass1: 1|Star Wars)
TransformManyBlock(myClass1: 2|Star Wars)
ActionBlock(myClass2: 1|1 did this)
ActionBlock(myClass2: 2|1 did that)
ActionBlock(myClass2: 3|1 did this again)
ActionBlock(myClass2: 1|2 did this)
ActionBlock(myClass2: 2|2 did that)
ActionBlock(myClass2: 3|2 did this again)
TransformBlock(id: 3)
TransformBlock(id: 4)
TransformManyBlock(myClass1: 3|Star Wars)
TransformManyBlock(myClass1: 4|Star Wars)
ActionBlock(myClass2: 1|3 did this)
ActionBlock(myClass2: 2|3 did that)
ActionBlock(myClass2: 3|3 did this again)
ActionBlock(myClass2: 1|4 did this)
ActionBlock(myClass2: 2|4 did that)
ActionBlock(myClass2: 3|4 did this again)
[the rest of the items]
Press any key to exit...
Can anyone point me in the right direction?
You're almost there: you need to call Complete on the first block in the pipeline, then await Completion on the last block. In your links you also need to propagate completion, like this:
private static async Task Main(string[] args) {
var transformBlock = new TransformBlock<int, MyClass1>(delegate (int id)
{
Console.WriteLine($"TransformBlock(id: {id})");
return new MyClass1(id, "Star Wars");
});
var transformManyBlock = new TransformManyBlock<MyClass1, MyClass2>(delegate (MyClass1 myClass1)
{
Console.WriteLine($"TransformManyBlock(myClass1: {myClass1.Id}|{myClass1.Value})");
Thread.Sleep(1000);
return GetMyClass22Values(myClass1);
});
var actionBlock = new ActionBlock<MyClass2>(delegate (MyClass2 myClass2)
{
Console.WriteLine($"ActionBlock(myClass2: {myClass2.Id}|{myClass2.Value})");
Thread.Sleep(1000);
});
//propagate completion
transformBlock.LinkTo(transformManyBlock, new DataflowLinkOptions() { PropagateCompletion = true });
transformManyBlock.LinkTo(actionBlock, new DataflowLinkOptions() { PropagateCompletion = true});
foreach(var id in ids) {
transformBlock.Post(id);
}
//Complete the first block
transformBlock.Complete();
//wait for completion to flow to the last block
await actionBlock.Completion;
}
You can also incorporate the batch block into your pipeline and remove the need for the TryReceive call, but that seems like another part of your flow.
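For illustration, a rough sketch of that wiring (an assumption about your flow, not part of the original answer; it reuses the types and handlers from the question, and a TransformManyBlock unpacks each batch back into individual ids so TryReceive is no longer needed):
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
var batchBlock = new BatchBlock<int>(2);
// Unpack each int[] batch produced by the BatchBlock into individual ids.
var unbatchBlock = new TransformManyBlock<int[], int>(batch => batch);
var transformBlock = new TransformBlock<int, MyClass1>(id => new MyClass1(id, "Star Wars"));
var transformManyBlock = new TransformManyBlock<MyClass1, MyClass2>(m => GetMyClass22Values(m));
var actionBlock = new ActionBlock<MyClass2>(m => Console.WriteLine($"ActionBlock(myClass2: {m.Id}|{m.Value})"));
batchBlock.LinkTo(unbatchBlock, linkOptions);
unbatchBlock.LinkTo(transformBlock, linkOptions);
transformBlock.LinkTo(transformManyBlock, linkOptions);
transformManyBlock.LinkTo(actionBlock, linkOptions);
for (var i = 1; i < 10; i++)
    batchBlock.Post(i);
batchBlock.Complete();        // complete the first block in the pipeline...
await actionBlock.Completion; // ...and await the last one
Note that this pushes every batch straight through as it arrives, so it does not pause between batches the way the while loop in the question did; if one batch must finish before the next starts, keeping the explicit loop (or constraining the blocks with BoundedCapacity / MaxDegreeOfParallelism) is the alternative.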
Edit
Example of propagating completion to multiple blocks:
public static async Task Main(string[] args) {
var sourceBlock = new BufferBlock<int>();
var processBlock1 = new ActionBlock<int>(i => Console.WriteLine($"Block1 {i}"));
var processBlock2 = new ActionBlock<int>(i => Console.WriteLine($"Block2 {i}"));
sourceBlock.LinkTo(processBlock1);
sourceBlock.LinkTo(processBlock2);
var sourceBlockCompletion = sourceBlock.Completion.ContinueWith(tsk => {
if(!tsk.IsFaulted) {
processBlock1.Complete();
processBlock2.Complete();
} else {
((IDataflowBlock)processBlock1).Fault(tsk.Exception);
((IDataflowBlock)processBlock2).Fault(tsk.Exception);
}
});
//Send some data...
sourceBlock.Complete();
await Task.WhenAll(sourceBlockCompletion, processBlock1.Completion, processBlock2.Completion);
}
I came across a back-pressure issue with Rx.NET that I can't find a solution for. I have an observable real-time stream of log messages.
var logObservable = /* Observable stream of log messages */
I want to expose this via a TCP interface that serializes the real-time log messages from the logObservable before they are sent over the wire. So I do the following:
foreach (var message in logObservable.ToEnumerable())
{
// 1. Serialize message
// 2. Send it over the wire.
}
The problem arises with .ToEnumerable() if a back-pressure scenario happens, e.g. if the client on the other end pauses the stream. The problem is that .ToEnumerable() caches the items, which results in a lot of memory usage. I'm looking for a mechanism, something like a DropQueue, which only buffers, let's say, the last 10 messages, e.g.
var observableStream = logObservable.DropQueue(10).ToEnumerable();
Is this the right way to solve this issue? And do you know how to implement such a mechanism to avoid possible back-pressure issues?
My DropQueue implementation:
public static IEnumerable<TSource> ToDropQueue<TSource>(
this IObservable<TSource> source,
int queueSize,
Action backPressureNotification = null,
CancellationToken token = default(CancellationToken))
{
var queue = new BlockingCollection<TSource>(new ConcurrentQueue<TSource>(), queueSize);
var isBackPressureNotified = false;
var subscription = source.Subscribe(
item =>
{
var isBackPressure = queue.Count == queue.BoundedCapacity;
if (isBackPressure)
{
queue.Take(); // Dequeue an item to make space for the next one
// Fire back-pressure notification if defined
if (!isBackPressureNotified && backPressureNotification != null)
{
backPressureNotification();
isBackPressureNotified = true;
}
}
else
{
isBackPressureNotified = false;
}
queue.Add(item);
},
exception => queue.CompleteAdding(),
() => queue.CompleteAdding());
token.Register(() => { subscription.Dispose(); });
using (new CompositeDisposable(subscription, queue))
{
foreach (var item in queue.GetConsumingEnumerable())
{
yield return item;
}
}
}
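For completeness, a minimal usage sketch of the extension method above, applied to the TCP-sending loop from the question:
// Keep at most the 10 most recent log messages while the consumer is slow.
foreach (var message in logObservable.ToDropQueue(
    queueSize: 10,
    backPressureNotification: () => Console.WriteLine("Back pressure: dropping oldest message")))
{
    // 1. Serialize message
    // 2. Send it over the wire.
}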
I have a class that receives standard .Net events from an external class.
These events have an address property (in addition to a lot of other properties, of course) that I can use to synchronize my events, so that I should be able to create a method to Get something, wait for the correct event, then return the data from the event in the Get method.
However, I'm fairly new to synchronization in C# and was hoping any of you could help me out. Below is somewhat pseudo code for what I want to accomplish:
Someone calls DoAsynchronousToSynchronousCall
That method waits until an event has been received with the same address (or until it times out)
The event handler checks against all current requests. If it finds a request with the same address, it lets DoAsynchronousToSynchronousCall know the reply has arrived
DoAsynchronousCall gets (or retrieves) the reply and returns it to the caller
public class MyMessage
{
public string Address { get; set; }
public string Data { get; set; }
}
public Main
{
externalClass.MessageReceived += MessageReceived;
}
public void MessageReceived(MyMessage message)
{
MyMessage request = _requestQueue.FirstOrDefault(m => m.Address == message.Address);
if (request != null)
{
// Do something to let DoAsynchronousToSynchronousCall() know the reply has arrived
}
}
private List<MyMessage> _requestQueue = new List<MyMessage>();
public MyMessage DoAsynchronousToSynchronousCall(MyMessage message)
{
_requestQueue.Add(message);
externalClass.Send(message);
// Do something to wait for a reply (as checked for above)
MyMessage reply = WaitForCorrectReply(timeout: 10000);
return reply;
}
I feel like I'm missing an opportunity to use async and await (yet I don't know how), and I hope you're able to understand what I'm trying to accomplish based on the information above.
You really can't have multiple calls on the fly and have synchronous responses. If you want synchronous responses for multiple calls then you need to do the calls synchronously too.
I would look at using Microsoft's Reactive Extensions (NuGet "Rx-Main") to make what you're doing as simple as possible. Rx lets you turn events into streams of values that you can query against.
Here's what I would do.
I would first define a stream of the received messages as IObservable<MyMessage> receivedMessages like this:
receivedMessages =
Observable
.FromEvent<MessageReceivedHandler, MyMessage>(
h => externalClass.MessageReceived += h,
h => externalClass.MessageReceived -= h);
(You didn't provide a class def so I've called the event delegate MessageReceivedHandler.)
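(Purely as an assumption, that delegate could be declared to match the MessageReceived method shown in the question:)
// Assumed shape of the event delegate, matching MessageReceived(MyMessage message).
public delegate void MessageReceivedHandler(MyMessage message);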
Now you can redefine DoAsynchronousToSynchronousCall as:
public IObservable<MyMessage> DoAsynchronousCall(MyMessage message)
{
return Observable.Create<MyMessage>(o =>
{
IObservable<MyMessage> result =
receivedMessages
.Where(m => m.Address == message.Address)
.Take(1);
IObservable<MyMessage> timeout =
Observable
.Timer(TimeSpan.FromSeconds(10.0))
.Select(x => (MyMessage)null);
IDisposable subscription =
Observable
.Amb(result, timeout)
.Subscribe(o);
externalClass.Send(message);
return subscription;
});
}
The result observable is the receivedMessages filtered for the current message.Address.
The timeout observable is a default value to return if the call takes longer than TimeSpan.FromSeconds(10.0) to complete.
Finally the subscription uses Observable.Amb(...) to determine which of result or timeout produces a value first and subscribes to that result.
So now to call this you can do this:
DoAsynchronousCall(new MyMessage() { Address = "Foo", Data = "Bar" })
.Subscribe(response => Console.WriteLine(response.Data));
So, if I make a simple definition of ExternalClass like this:
public class ExternalClass
{
public event MessageReceivedHandler MessageReceived;
public void Send(MyMessage message)
{
this.MessageReceived(new MyMessage()
{
Address = message.Address,
Data = message.Data + "!"
});
}
}
...I get the result Bar! printed on the console.
If you have a whole bunch of messages that you want to process you can do this:
var messagesToSend = new List<MyMessage>();
/* populate `messagesToSend` */
var query =
from message in messagesToSend.ToObservable()
from response in DoAsynchronousCall(message)
select new
{
message,
response
};
query
.Subscribe(x =>
{
/* Do something with each correctly paired
`x.message` & `x.response`
*/
});
You're probably looking for ManualResetEvent, which functions as a "toggle" of sorts to switch between thread-blocking and non-blocking behavior. DoAsynchronousToSynchronousCall would Reset and then WaitOne(int timeoutMilliseconds) the event to block the thread, and the code that checks whether the correct reply has arrived would call Set to let the waiting thread continue on its way.
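A rough sketch of that approach, assuming each pending request carries its own event and reply slot (the PendingRequest type and field names are illustrative, not from the question):
public class PendingRequest
{
    public string Address { get; set; }
    public MyMessage Reply { get; set; }
    public ManualResetEvent Done { get; } = new ManualResetEvent(false);
}
private readonly List<PendingRequest> _pending = new List<PendingRequest>();
public MyMessage DoAsynchronousToSynchronousCall(MyMessage message)
{
    var pending = new PendingRequest { Address = message.Address };
    lock (_pending) _pending.Add(pending);
    externalClass.Send(message);
    // Block this thread until MessageReceived signals the event, or give up after 10 seconds.
    bool replied = pending.Done.WaitOne(10000);
    lock (_pending) _pending.Remove(pending);
    return replied ? pending.Reply : null;
}
public void MessageReceived(MyMessage message)
{
    PendingRequest pending;
    lock (_pending)
        pending = _pending.FirstOrDefault(p => p.Address == message.Address);
    if (pending != null)
    {
        pending.Reply = message;
        pending.Done.Set(); // release the waiting DoAsynchronousToSynchronousCall
    }
}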
FWIW - I'm scrapping the previous version of this question in favor of a different one along the same lines, after asking for advice on meta.
I have a webservice that contains configuration data. I would like to call it at regular intervals Tok in order to refresh the configuration data in the application that uses it. If the service is in error (timeout, down, etc) I want to keep the data from the previous call and call the service again after a different time interval Tnotok. Finally I want the behavior to be testable.
Since managing time sequences and testability seems like a strong point of the Reactive Extensions, I started using an Observable that will be fed by a generated sequence. Here is how I create the sequence:
Observable.Generate<DataProviderResult, DataProviderResult>(
// we start with some empty data
new DataProviderResult() {
Failures = 0
, Informations = new List<Information>()},
// never stop
(r) => true,
// there is no iteration
(r) => r,
// we get the next value from a call to the webservice
(r) => FetchNextResults(r),
// we select time for next msg depending on the current failures
(r) => r.Failures > 0 ? tnotok : tok,
// we pass a TestScheduler
scheduler)
.Subscribe(r => HandleResults(r));
I have two problems currently:
It looks like I am creating a hot observable. Even when trying to use Publish/Connect, the subscribed action misses the first event. How can I create it as a cold observable?
myObservable = myObservable.Publish();
myObservable.Subscribe(r => HandleResults(r));
myObservable.Connect(); // doesn't call OnNext for the first element in the sequence
When I subscribe, the order in which the subscription and the generation happen seems off, since for any frame the subscription method is fired before the FetchNextResults method. Is that normal? I would expect the sequence to call the method for frame f, not f+1.
Here is the code that I'm using for fetching and subscription:
private DataProviderResult FetchNextResults(DataProviderResult previousResult)
{
Console.WriteLine(string.Format("Fetching at {0:hh:mm:ss:fff}", scheduler.Now));
try
{
return new DataProviderResult() { Informations = dataProvider.GetInformation().ToList(), Failures = 0};
}
catch (Exception)
{}
previousResult.Failures++;
return previousResult;
}
private void HandleResults(DataProviderResult result)
{
Console.WriteLine(string.Format("Managing at {0:hh:mm:ss:fff}", scheduler.Now));
dataResult = result;
}
Here is what I'm seeing, which prompted me to articulate these questions:
Starting at 12:00:00:000
Fetching at 12:00:00:000 < the result fetched here is not managed yet
Managing at 12:00:01:000 < managing before fetching for frame f
Fetching at 12:00:01:000
Managing at 12:00:02:000
Fetching at 12:00:02:000
EDIT: Here is a bare bones copy-pastable program that illustrates the problem.
using System;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using Microsoft.Reactive.Testing;
private static int fetchData(int i, IScheduler scheduler)
{
writeTime("fetching " + (i+1).ToString(), scheduler);
return i+1;
}
private static void manageData(int i, IScheduler scheduler)
{
writeTime("managing " + i.ToString(), scheduler);
}
private static void writeTime(string msg, IScheduler scheduler)
{
Console.WriteLine(string.Format("{0:mm:ss:fff} {1}", scheduler.Now, msg));
}
private static void Main(string[] args)
{
var scheduler = new TestScheduler();
writeTime("start", scheduler);
var datas = Observable.Generate<int, int>(fetchData(0, scheduler),
(d) => true,
(d) => fetchData(d, scheduler),
(d) => d,
(d) => TimeSpan.FromMilliseconds(1000),
scheduler)
.Subscribe(i => manageData(i, scheduler));
scheduler.AdvanceBy(TimeSpan.FromMilliseconds(3000).Ticks);
}
This outputs the following:
00:00:000 start
00:00:000 fetching 1
00:01:000 managing 1
00:01:000 fetching 2
00:02:000 managing 2
00:02:000 fetching 3
I don't understand why the managing of the first element is not picked up immediately after its fetching. There is one second between the sequence effectively pulling the data and the data being handed to the observer. Am I missing something here, or is it expected behavior? If so, is there a way to have the observer react immediately to the new value?
You are misunderstanding the purpose of the timeSelector parameter. It is called each time a value is generated and it returns a time which indicates how long to delay before delivering that value to observers and then generating the next value.
Here's a non-Generate way to tackle your problem.
private DataProviderResult FetchNextResult()
{
// let exceptions throw
return new DataProviderResult { Informations = dataProvider.GetInformation().ToList(), Failures = 0 };
}
private IObservable<DataProviderResult> CreateObservable(IScheduler scheduler)
{
// an observable that produces a single result then completes
var fetch = Observable.Defer(
() => Observable.Return(FetchNextResult()));
// concatenate this observable with one that will pause
// for "tok" time before completing.
// This observable will send the result
// then pause before completing.
var fetchThenPause = fetch.Concat(Observable
.Empty<DataProviderResult>()
.Delay(tok, scheduler));
// Now, if fetchThenPause fails, we want to consume/ignore the exception
// and then pause for tnotok time before completing with no results
var fetchPauseOnErrors = fetchThenPause.Catch(Observable
.Empty<DataProviderResult>()
.Delay(tnotok, scheduler));
// Now, whenever our observable completes (after its pause), start it again.
var fetchLoop = fetchPauseOnErrors.Repeat();
// Now use Publish(initialValue) so that we remember the most recent value
var fetchLoopWithMemory = fetchLoop.Publish(null);
// YMMV from here on. Lets use RefCount() to start the
// connection the first time someone subscribes
var fetchLoopAuto = fetchLoopWithMemory.RefCount();
// And lets filter out that first null that will arrive before
// we ever get the first result from the data provider
return fetchLoopAuto.Where(t => t != null);
}
public MyClass()
{
Information = CreateObservable(Scheduler.Default); // or pass a TestScheduler when testing
}
public IObservable<DataProviderResult> Information { get; private set; }
Generate produces cold observable sequences, so that is my first alarm bell.
I tried to pull your code into LINQPad* and run it, and changed it a bit to focus on the problem. It seems to me that you have the Iterator and ResultSelector functions confused. These are back-to-front. When you iterate, you should take the value from your last iteration and use it to produce your next value. The result selector is used to pick off (Select) the value from the instance you are iterating on.
So in your case, the type you are iterating on is the type you want to produce values of. Therefore keep your ResultSelector function as just the identity function x => x, and your IteratorFunction should be the one that makes the web service call.
Observable.Generate<DataProviderResult, DataProviderResult>(
// we start with some empty data
new DataProviderResult() {
Failures = 0
, Informations = new List<Information>()},
// never stop
(r) => true,
// we get the next value(iterate) by making a call to the webservice
(r) => FetchNextResults(r),
// there is no projection
(r) => r,
// we select time for next msg depending on the current failures
(r) => r.Failures > 0 ? tnotok : tok,
// we pass a TestScheduler
scheduler)
.Subscribe(r => HandleResults(r));
As a side note, try to prefer immutable types instead of mutating values as you iterate.
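For example, a minimal sketch of an immutable DataProviderResult (illustrative only, not code from the answer):
// Each iteration produces a new instance instead of mutating the previous one.
public class DataProviderResult
{
    public DataProviderResult(IReadOnlyList<Information> informations, int failures)
    {
        Informations = informations;
        Failures = failures;
    }
    public IReadOnlyList<Information> Informations { get; }
    public int Failures { get; }
    public DataProviderResult WithFailure() => new DataProviderResult(Informations, Failures + 1);
}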
*Please provide an autonomous working snippet of code so people can better answer your question. :-)