There's a source that emits items asynchronously, in my case in the order of 10 items withing some seconds.
I want to handle all of them.
In order of appearance
One at a time
Handling of each might take long (like 1-5 seconds)
There might be new items while one is being handled
At the moment I'm porting a Java app to C#. In the RxJava world there's the onBackpressureBuffer method, whereas in Rx.Net there are a bunch of different ones and cannot figure out the equivalent.
I guess I could use Buffer() with a zero delay and use the produced enumerable, but seems to be hacky.
edit: please read comments before voting negatively
Rx.NET doesn't support backpressure and it can't the way RxJava does: one has to design the protocol so that there is a request channel between a producer and its consumer. In concept, Async Enumerables can get you backpressure in the form of 1-by-1 item delivery (called async pull, promise per item, continuations, etc. in some cases).
There is a C# library that matches the features of the RxJava Flowable type relatively well (but not 100%): Reactive4.NET that can also interop with IObservable and thus with Rx.NET if needed.
Related
I want to implement "producer/two consumers" functionality.
Producer: scans directories recursively and adds directory information to some storage (I guess Queue<>)
Consumer 1: retrieves data about directory and writes it to XML-file.
Consumer 2: retrieves data about directory and add it to TreeNode.
So both (1 and 2) consumers have to work with a same data. Because if one of consumers call Dequeue(), the other one will miss this data.
The only idea I have - is to make 2 different Queue<> and Producer will fill them both with a same data. Then each consumer will work with different Queue object.
I hope you'll advise something more attractive
LMAX Disruptor is one solution to this problem.
Article: http://martinfowler.com/articles/lmax.html
Illustration of the single-producer, multithreaded consumer ring buffer: http://martinfowler.com/articles/images/lmax/disruptor.png
It is assumed that you will need good - nearly expert level - knowledge of how atomic instructions and lock-free algorithms work on your target platform.
The description below is different from LMAX - I adapted it to the OP's scenario.
The underlying structure could be either a ring buffer (fixed-capacity), or a lock-free linked list (unlimited capacity, but only available on platforms that supports certain kinds of multi-word atomic instructions).
The producer will just push stuff to the front.
Each consumer keeps an iterator to the item that they are processing. Each consumer advances its own iterator, at each's own pace.
Besides the consumers, there is also a trailing garbage collector which will also try to advance, but it will not advance past any of the consumer's iterators. Thus, it will eventually clean up items that both consumers have finished processing, and only those items.
You could use ZeroMQ, which has this functionality built into it (and a lot more) -
http://learning-0mq-with-pyzmq.readthedocs.org/en/latest/pyzmq/patterns/pushpull.html
The above example is with Python code, but there are .NET bindings -
http://zeromq.org/bindings:clr
We already have parallel fan-out working in our code (using ParallelEnumerable) which is currently running on a 12-core, 64G RAM server. But we would like to convert the code to use Rx so that we can have better flexibility over our downstream pipeline.
Current Workflow:
We read millions of records from a database (in a streaming fashion).
On the client side, we then use a custom OrderablePartitioner<T> class to group the database records into groups. Let’s call an instance of this class: partioner.
We then use partioner.AsParallel().WithDegreeOfParallelism(5).ForAll(group => ProcessGroupOfRecordsAsync(group));Note: this could be read as “Process all the groups, 5 at a time in parallel.” (I.e. parallel fan-out).
ProcessGroupOfRecordsAsync() – loops through all the records in the group and turns them into hundreds or even thousands of POCO objects for further processing (i.e. serial fan-out or better yet, expand).
Depending on the client’s needs:
This new serial stream of POCO objects are evaluated, sorted, ranked, transformed, filtered, filtered by manual process, and possibly more parallel and/or serial fanned-out throughout the rest of the pipeline.
The end of the pipeline may end up storing new records into the database, displaying the POCO objects in a form or displayed in various graphs.
The process currently works just fine, except that point #5 and #6 aren’t as flexible as we would like. We need the ability to swap in and out various downstream workflows. So, our first attempt was to use a Func<Tin, Tout> like so:
partioner.AsParallel
.WithDegreeOfParallelism(5)
.ForAll(group =>ProcessGroupOfRecordsAsync(group, singleRecord =>
NextTaskInWorkFlow(singleRecord));
And that works okay, but the more we flushed out our needs the more we realized we are just re-implementing Rx.
Therefore, we would like to do something like the following in Rx:
IObservable<recordGroup> rg = dbContext.QueryRecords(inputArgs)
.AsParallel().WithDegreeOfParallelism(5)
.ProcessGroupOfRecordsInParallel();
If (client1)
rg.AnalizeRecordsForClient1().ShowResults();
if (client2)
rg.AnalizeRecordsForClient2()
.AsParallel()
.WithDegreeOfParallelism(3)
.MoreProcessingInParallel()
.DisplayGraph()
.GetUserFeedBack()
.Where(data => data.SaveToDatabase)
.Select(data => data.NewRecords)
.SaveToDatabase(Table2);
...
using(rg.Subscribe(groupId =>LogToScreen(“Group {0} finished.”, groupId);
It sounds like you might want to investigate Dataflows in the Task Parallel Library - This might be a better fit than Rx for dealing with part 5, and could be extended to handle the whole problem.
In general, I don't like the idea of trying to use Rx for parallelization of CPU bound tasks; its usually not a good fit. If you are not too careful, you can introduce inefficiencies inadvertently. Dataflows can give you nice way to parallelize only where it makes most sense.
From MSDN:
The Task Parallel Library (TPL) provides dataflow components to help increase the robustness of concurrency-enabled applications. These dataflow components are collectively referred to as the TPL Dataflow Library. This dataflow model promotes actor-based programming by providing in-process message passing for coarse-grained dataflow and pipelining tasks. The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming. These dataflow components are useful when you have multiple operations that must communicate with one another asynchronously or when you want to process data as it becomes available. For example, consider an application that processes image data from a web camera. By using the dataflow model, the application can process image frames as they become available. If the application enhances image frames, for example, by performing light correction or red-eye reduction, you can create a pipeline of dataflow components. Each stage of the pipeline might use more coarse-grained parallelism functionality, such as the functionality that is provided by the TPL, to transform the image.
Kaboo!
As no one has provided anything definite, I'll point out that the source code can be browsed at GitHub at Rx. Taking a quick tour around, it looks like at least some of the processing (all of it?) is done on the thread-pool already. So, maybe it's not possibly to explicitly control the parallelization degree besides implementing your own scheduler (e.g. Rx TestScheduler), but it happens nevertheless. See also the links below, judging from the answers (especially the one provided by James in the first link), the observable tasks are queued and processed serially by design -- but one can provide multiple streams for Rx to process.
See also the other questions that are related and visible on the left side (by default). In particular it looks like this one, Reactive Extensions: Concurrency within the subscriber, could provide some answers to your question. Or maybe Run methods in Parallel using Reactive.
<edit: Just a note that if storing objects to database becomes a problem, the Rx stream could push the save operations to, say, a ConcurrentQueue, which would then be processed separately. Other option would be to let Rx to queue items with a proper combination of some time and number of items and push them to the database by bulk insert.
Most people seem to build a listener socket and will include "events" to be invoked for processing. EG: SocketConnected, DataReceived. The programmer initializes a listener and binds to the "events" methods to receive socket events to build the service.
I feel on a large scale implementation, it would be more efficient to avoid delegates in the listener. And to complete all the processing in the callback methods. Possibly using different call backs for receiving data based on the knowledge of knowing what command is coming next. (This is part of my Message Frame Structure)
I have looked around for highly scalable examples, but I only find the standard MSDN implementations for asynchronous sockets or variations from other programmers that replicate the MSDN example.
Does anyone have any good experience that could point me in the right direction?
Note> The service will hold thousands of clients and in most cases, the clients stayed connected and updates received by the service will be send out to all other connected clients. It is a synchronized P2P type system for an object orientated database.
The difference between an event call and a callback is negligible. A callback is just the invocation of a delegate (or a function pointer). You can't do asynchronous operation without some sort of callback and expect to get results of any kind.
With events, they can be multicast. This means multiple callback calls--so that would be more costly because you calling multiple methods. But, if you're doing that you probably need to do it--the alternative is to have multiple delegates and call them manually. So, there'd be no real benefit. Events can often include sender/eventargs; so, you've got that extra object and the creation of the eventargs instance; but I've never seen a situation where that affected performance.
Personally, I don't use the event-based asynchronous pattern--I've found (prior to .NET 4.5) that the asynchronous programming model to be more ubiquitous. In .NET 4.5 I much prefer the task asynchronous pattern (single methods that end in Async instead of two methods one starting with Begin and one starting with End) because they can be used with async/await and less wordy.
Now, if the question is the difference between new AsyncCallback(Async_Send_Receive.Read_Callback) e.g.:
s.BeginReceive(so.buffer, 0, StateObject.BUFFER_SIZE, 0,
new AsyncCallback(Async_Send_Receive.Read_Callback), so);
and just Async_Send_Receive.Read_Callback e.g.:
s.BeginReceive(so.buffer, 0, StateObject.BUFFER_SIZE, 0,
Async_Send_Receive.Read_Callback, so);
The second is just a short-hand of the first; the AsyncCallback delegate is still created under the covers.
But, as with most things; even if it's generally accepted not to be noticeably different in performance, test and measure. If one way has more benefits (included performance) than another, use that one.
My only advice to you is this: Go with the style that provides the most clarity.
Eliminating an entire language feature because of an unmeasured speed difference would be premature. The cost of method calls/delegate invocations is highly unlikely to be the bottleneck in your code. Sure, you could benchmark the relative cost of one versus another, but if your program is only spending 1% of its setting up method invocations, then even huge differences won't really affect your program.
My best advice to you if you really want to juice your server, just make sure that all your IO happens asynchronously, and never run long-running tasks in the threadpool. .net4.5 async/await really simplifies all of this... consider using it for more maintainable code.
I have worked with live betting systems using sockets and with two way active messaging. It is really easier to work with a framework to handle the socket layer like WCF P2P. It handles all the connection problems for you and you can concentrate on your bussiness logic.
I want a serializable continuation so I can pickle async workflows to disk while waiting for new events. When the async workflow is waiting on a let!, it would be saved away along with a record of what was needed to wake it up. Instead of arbitrary in-memory IAsyncResults (or Task<T>, etc.), it would have to be, for instance, a filter criterion for incoming messages along with the continuation itself. Without language support for continuations, this might be a feat. But with computation expressions taking care of the explicit CPS tranformation, it might not be too tricky and could even be more efficient. Has anyone tackled an approach like this?
You could probably use the MailboxProcessor, or Agent, type as a means of getting close to what you want. You'd could then use the agent.PostAndAsyncReply with a timeout to retrieve the current AgentState. As mentioned above, you'll need to make the objects you are passing around serializable, but even delegates are serializable. The internals are really unrelated to async computations, though. The async computation would merely allow you a way to interact with the various agents in your program in a non-blocking fashion.
Dave Thomas and I have been working on a library called fracture-io that will provide some out-of-the-box scenarios for working with agents. We hadn't yet discussed this exact scenario, but we could probably look at baking this in ... or take a commit. :)
I also noticed that you tagged your question with callcc. I posted a sample of that operator to fssnip, but Tomas Petricek quickly posted an example of how easy it is to break with async computations. So I don't think callcc is a useful solution for this question. If you don't need async, you can look in FSharpx for the Continuation module and the callcc operator in there.
Have you looked at Windows Workflow Foundation?
http://msdn.microsoft.com/en-us/netframework/aa663328.aspx
That's probably the technology you want, assuming the events/messages are arriving in periods of hours/days/weeks and you're serializing to disk to avoid using memory/threads in the meantime. (Or else why do you want it?)
I'm writing a c# application that can create a series of request messages. Each message could have a response, that needs to be waited on by a consumer.
Where the number of outstanding request messages is constrained, I have used the windows EVENT to solve this problem. However, I know there is a limit on how many EVENT objects can be created in a single process, and in this instance, its possible I might exceed that limit.
Does anyone know if there is a similar limit on creation of mutex objects or semaphores?
I know this can be solved by some sort of pool of shared resources, that are grabbed by consumers when they need to wait, but it would be more convenient if each request message could have its own sync object.
The system limits the total number of handles a process can create. Events, Mutexes, Semaphores etc. all contribute to the handle count so they will all be limited by the system.
That limit was 16*1024*1024, but I have been away from this stuff for a while so I do not know it it has changed with newer OSs and 64bit, but to be honest I doubt it since that is a huge number of handles to be creating, and probably constitutes a serious design flaw if you get anywhere near that.
Not having a very clear picture of what you are wanting, I might be wrong, but maybe you could look at something like the asyc event pattern?
http://msdn.microsoft.com/en-us/library/wewwczdw.aspx