I have a static collection, say of tasks that call a remote REST API:
static ConcurrentBag<Task<HttpResponseMessage>> _collection = new ConcurrentBag<Task<HttpResponseMessage>>();
static void Main(string[] args)
{
Task.Factory.StartNew(() => Produce());
Task.Factory.StartNew(() => Consume());
Console.ReadKey();
}
One thread adds new items into it:
private static void Produce()
{
while (true)
{
var task = HttpClientFactory.Create().GetAsync("http://example.com");
_collection.Add(task);
Thread.Sleep(500);
}
}
And another thread should process those items:
private static void Consume()
{
_collection.ToObservable()
.Subscribe(
t => Console.WriteLine("++"),
ex => Console.WriteLine(ex.Message),
() => Console.WriteLine("Done"));
}
But it runs only once and completes prematurely. So the output is:
++
Done
It would be interesting if it worked like that... but sadly it doesn't. The ToObservable extension method is defined on the IEnumerable<T> interface, so it's getting a point-in-time snapshot of the collection.
You need a collection that can be observed, such as ObservableCollection. With this, you can respond to add events to feed an Rx pipeline (perhaps by wiring the CollectionChanged event up with Observable.FromEventPattern). Bear in mind that this collection doesn't support concurrent adds. Such a technique is one way to "enter the monad" (i.e. obtain an IObservable<T>).
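For illustration, that wiring might look something like the following (a sketch only, assuming a single producer thread since ObservableCollection doesn't support concurrent adds; the variable names here are mine):
// requires System.Collections.ObjectModel, System.Collections.Specialized,
// System.Linq and System.Reactive.Linq
var pending = new ObservableCollection<Task<HttpResponseMessage>>();

IObservable<Task<HttpResponseMessage>> added =
    Observable.FromEventPattern<NotifyCollectionChangedEventHandler, NotifyCollectionChangedEventArgs>(
            h => pending.CollectionChanged += h,
            h => pending.CollectionChanged -= h)
        .Where(e => e.EventArgs.Action == NotifyCollectionChangedAction.Add)
        .SelectMany(e => e.EventArgs.NewItems.Cast<Task<HttpResponseMessage>>());

// every pending.Add(task) from the producer now flows into the Rx pipeline
added.Subscribe(t => Console.WriteLine("++"));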
An equivalent approach is adding your request payloads to a Subject. Either way, you can then project them into asynchronous requests. So say (for argument's sake) your Produce signature looked like this:
private static async Task<HttpResponseMessage> Produce(string requestUrl)
Then you might construct an observable to convert the requestUrls to async web requests using your Produce method like so:
var requests = new Subject<string>();
var responses = requests.SelectMany(
x => Observable.FromAsync(() => Produce(x)));
responses.Subscribe(
t => Console.WriteLine("++"),
ex => Console.WriteLine(ex.Message),
() => Console.WriteLine("Done"));
And submit each request with something like:
requests.OnNext("http://myurl");
If you need concurrent adds, see Observable.Synchronize.
If you need to control the thread(s) that handle the responses, use ObserveOn, which I wrote a lengthy explanation of here.
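Sketched out, those two pieces might look like this (Subject.Synchronize is the subject-side counterpart for making concurrent OnNext calls safe, and TaskPoolScheduler is just an example scheduler, not a recommendation):
var requests = new Subject<string>();
var safeRequests = Subject.Synchronize(requests);      // OnNext can now be called from any thread

var responses = safeRequests
    .SelectMany(x => Observable.FromAsync(() => Produce(x)))
    .ObserveOn(TaskPoolScheduler.Default);             // pick where responses are handled

responses.Subscribe(t => Console.WriteLine("++"));
safeRequests.OnNext("http://myurl");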
When using IObservable.LastAsync() to force my console app to wait on the result of an API call using Flurl, that API call is never made and the main thread deadlocks and never returns from LastAsync(). My goals are:
Since this is a console app, I can't really "subscribe" to the API call since that would allow the main thread to continue, likely causing it to exit prior to the API call completing. So I need to block until the value is obtained.
The API call should be deferred until the first subscriber requests a value.
Second and onward subscribers should not cause another API call, instead the last value from the stream should be returned (this is the goal of using Replay(1))
Here is an example that reproduces the issue:
public static class Program
{
public static async Task Main(string[] args)
{
var obs = Observable.Defer(() =>
"https://api.publicapis.org"
.AppendPathSegment("entries")
.GetJsonAsync()
.ToObservable())
.Select(x => x.title)
.Replay(1);
var title = await obs.LastAsync();
Console.WriteLine($"Title 1: {title}");
}
}
How can I modify my example to ensure that all 3 requirements above are met? Why does my example cause a deadlock?
Replay returns a "connectable" observable, and you need to call the Connect() method on it to set it going. Without that call, it does not subscribe to the underlying observable and does not emit items to its own subscribers, which is why you get a "deadlock".
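For completeness, here is a minimal sketch of the manual route, keeping the connectable observable and calling Connect() yourself (reusing the question's own pipeline):
var connectable = Observable.Defer(() =>
    "https://api.publicapis.org"
        .AppendPathSegment("entries")
        .GetJsonAsync()
        .ToObservable())
    .Select(x => x.title)
    .Replay(1);                            // IConnectableObservable<T>

using (connectable.Connect())              // starts the underlying subscription
{
    var title = await connectable.LastAsync();
    Console.WriteLine($"Title 1: {title}");
}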
In this case, instead of connecting manually, you can use the RefCount() extension method, which will automatically connect on the first subscriber and disconnect when the last subscriber unsubscribes. So:
public static async Task Main(string[] args) {
var obs = Observable.Defer(() =>
"https://api.publicapis.org"
.AppendPathSegment("entries")
.GetJsonAsync()
.ToObservable())
.Select(x => x.count)
.Replay(1)
.RefCount();
// makes request
var title = await obs.LastAsync();
Console.WriteLine($"Title 1: {title}");
// does not make request, obtains from replay cache
title = await obs.LastAsync();
Console.WriteLine($"Title 2: {title}");
}
You can also use the AutoConnect method:
.Replay(1)
.AutoConnect(1);
This will automatically connect on the first subscriber but will never disconnect (which in your case shouldn't matter).
Given: an extension method taking a Selenium IWebDriver instance and returning an IObservable
public static IObservable<ObservableCollection<WebElementWrapper>>
GetAllElementsAsObservable(this IWebDriver wd)
{
return Observable.Create<ObservableCollection<WebElementWrapper>>(
(IObserver<ObservableCollection<WebElementWrapper>> observer) =>
{
var eles = wd.FindElements(By.CssSelector("*"));
var list = eles.ToWebElementObservableCollection();
observer.OnNext(list);
observer.OnCompleted();
return Disposable.Create(() => { });
});
}
And the code that calls the method above (running on the GUI thread)...
//GUI Will Freeze on this call until OnCompleted is called
cd.GetAllElementsAsObservable().Subscribe((WEWList) =>
{
WebElementCollection = WEWList;
SetNavigationItems();
});
Can anyone help me determine the root cause of the GUI thread blocking until OnCompleted is called? I can stop the blocking if I use Task.Run in the first method, but then I have to marshal the collection back onto the GUI thread.
Does this block because the GUI thread spun up the WebDriver that the observable is using to extract elements?
Or is this due to the static methods being created at startup time on the GUI thread?
If ever you do this - Disposable.Create(() => { }) - you are doing something wrong. Using Observable.Create the way you are using it is a blocking operation: the code inside .Create is part of the subscription, and you're running the observer to completion during subscription, which is why it blocks.
Try doing something like this instead:
public static IObservable<ObservableCollection<WebElementWrapper>>
GetAllElementsAsObservable(this IWebDriver wd)
{
return Observable.Create<ObservableCollection<WebElementWrapper>>(observer =>
Observable
.Start(() =>
wd
.FindElements(By.CssSelector("*"))
.ToWebElementObservableCollection())
.Subscribe(observer));
}
For WPF, I've also found these two methods to work:
SomeObservable
.SubscribeOn(Scheduler.Default)
.ObserveOn(Scheduler.CurrentThread)
.Subscribe(item => { //do something on gui thread here });
I don't care for the method name SubscribeOn, but I look at it this way... I want the observable to SubscribeOn some scheduler. (I think a better name would have been "ScheduleOn".)
The ObserveOn method name makes sense. But note the "Scheduler.Dispatcher" built-in property.
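For reference, a sketch of that dispatcher variant as it looks in newer Rx versions (assuming the WPF scheduler types from System.Reactive; ObserveOnDispatcher() is the shorthand form):
SomeObservable
    .SubscribeOn(Scheduler.Default)             // do the work off the UI thread
    .ObserveOn(DispatcherScheduler.Current)     // marshal notifications back to the dispatcher
    .Subscribe(item => { /* safe to touch the UI here */ });

// or equivalently:
// SomeObservable.SubscribeOn(Scheduler.Default).ObserveOnDispatcher().Subscribe(...);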
I have some time-consuming code in a foreach that uses async/await.
It includes pulling data from the database, generating HTML, POSTing that to an API, and saving the replies to the DB.
A mock-up looks like this:
List<label> labels = db.labels.ToList();
foreach (var x in list)
{
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid == y.userid)
                                           .Select(y => y.ID)
                                           .Contains(q.id));
//Render the HTML
//do some fast stuff with objects
List<response> res = await api.sendMessage(object); //POST
//put all the responses in the db
foreach (var r in res)
{
db.responses.Add(r);
}
db.SaveChanges();
}
Time-wise, generating the HTML and posting it to the API seem to take most of the time.
Ideally it would be great if I could generate the HTML for the next item while the current POST is still in flight, and wait for that POST to finish before posting the next one.
Other ideas are also welcome.
How would one go about this?
I first thought of adding a Task above the foreach and waiting for it to finish before making the next POST, but then how do I handle the last iteration... it feels messy...
You can do it in parallel, but you will need a different context in each Task.
Entity Framework is not thread-safe, so you can't use one context across parallel tasks.
var tasks = myLabels.Select(async label => {
using (var db = new MyDbContext()) {
// do processing...
var response = await api.getresponse();
db.Responses.Add(response);
await db.SaveChangesAsync();
}
});
await Task.WhenAll(tasks);
In this case, all tasks will appear to run in parallel, and each task will have its own context.
If you don't create a new context per task, you will get the error mentioned in this question: Does Entity Framework support parallel async queries?
It's more an architecture problem than a code issue here, imo.
You could split your work into two separate parts:
Get data from database and generate HTML
Send API request and save response to database
You could run them both in parallel and use a queue to coordinate that: whenever your HTML is ready, it's added to the queue and another worker picks it up from there and sends it to the API.
Both parts can be done in a multithreaded way too, e.g. you can process multiple items from the queue at the same time by having a set of workers looking for items to process in the queue.
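As a rough sketch of that coordination (my own illustration using System.Threading.Channels as the queue; itemsFromDb, RenderHtml and api.sendMessage are placeholders for your data, HTML generation and POST/save code):
var queue = Channel.CreateBounded<string>(capacity: 100);

var renderer = Task.Run(async () =>
{
    foreach (var item in itemsFromDb)              // data pulled from the database
    {
        string html = RenderHtml(item);            // placeholder for the HTML generation
        await queue.Writer.WriteAsync(html);
    }
    queue.Writer.Complete();                       // no more work coming
});

var poster = Task.Run(async () =>
{
    await foreach (var html in queue.Reader.ReadAllAsync())
    {
        var res = await api.sendMessage(html);     // placeholder for the POST + DB save
    }
});

await Task.WhenAll(renderer, poster);
Starting several poster tasks that read from the same channel gives you the multi-worker variant.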
This screams for the producer/consumer pattern: one producer produces data at a different speed than the consumer consumes it. Once the producer does not have anything left to produce, it notifies the consumer that no more data is expected.
MSDN has a nice example of this pattern where several dataflow blocks are chained together: the output of one block is the input of another block.
Walkthrough: Creating a Dataflow Pipeline
The idea is as follows:
Create a class that will generate the HTML.
This class has an object of class System.Threading.Tasks.Dataflow.BufferBlock<T>
An async procedure creates all HTML output and awaits SendAsync to hand the data to the BufferBlock
The buffer block implements interface ISourceBlock<T>. The class exposes this as a get property:
The code:
class MyProducer<T>
{
private System.Threading.Tasks.Dataflow.BufferBlock<T> bufferBlock = new BufferBlock<T>();
public ISourceBlock<T> Output { get { return this.bufferBlock; } }
public async Task ProcessAsync()
{
while (somethingToProduce)
{
T producedData = ProduceOutput(...);
await this.bufferBlock.SendAsync(producedData);
}
// no data to send anymore. Mark the output complete:
this.bufferBlock.Complete();
}
}
A second class takes this ISourceBlock. It will wait at this source block until data arrives and processes it.
do this in an async function
stop when no more data is available
The code:
public class MyConsumer<T>
{
public ISourceBlock<T> Source { get; set; }
public async Task ProcessAsync()
{
while (await this.Source.OutputAvailableAsync())
{ // there is input of type T, read it:
var input = await this.Source.ReceiveAsync();
// process input
}
// if here, no more input expected. finish.
}
}
Now put it together:
private async Task ProduceOutput<T>()
{
var producer = new MyProducer<T>();
var consumer = new MyConsumer<T>() {Source = producer.Output};
var producerTask = Task.Run( () => producer.ProcessAsync());
var consumerTask = Task.Run( () => consumer.ProcessAsync());
// while both tasks are working you can do other things.
// wait until both tasks are finished:
await Task.WhenAll(new Task[] {producerTask, consumerTask});
}
For simplicity I've left out exception handling and cancellation. Stack Overflow has articles about exception handling and cancellation of Tasks:
Keep UI responsive using Tasks, Handle AggregateException
Cancel an Async Task or a List of Tasks
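For what it's worth, a minimal sketch of adding cancellation to the consumer above (my own addition, not taken from those articles): the dataflow receive methods have overloads that accept a CancellationToken.
public async Task ProcessAsync(CancellationToken token)
{
    try
    {
        while (await this.Source.OutputAvailableAsync(token))
        {
            var input = await this.Source.ReceiveAsync(token);
            // process input
        }
    }
    catch (OperationCanceledException)
    {
        // cancellation was requested; stop consuming
    }
}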
This is what I ended up using: (https://stackoverflow.com/a/25877042/275990)
List<ToSend> sendToAPI = new List<ToSend>();
List<label> labels = db.labels.ToList();
foreach (var x in list) {
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid == y.userid)
                                           .Select(y => y.ID)
                                           .Contains(q.id));
//Render the HTML
//do some fast stuff with objects
sendToAPI.add(the object with HTML);
}
int maxParallelPOSTs=5;
await TaskHelper.ForEachAsync(sendToAPI, maxParallelPOSTs, async i => {
using (NasContext db2 = new NasContext()) {
List<response> res = await api.sendMessage(i.object); //POST
//put all the responses in the db
foreach (var r in res)
{
db2.responses.Add(r);
}
db2.SaveChanges();
}
});
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body) {
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext()) {
await body(partition.Current).ContinueWith(t => {
if (t.Exception != null) {
string problem = t.Exception.ToString();
}
//observe exceptions
});
}
}));
}
Basically it lets me generate the HTML synchronously, which is fine since it only takes a few seconds to generate thousands of items, but lets me POST and save to the DB asynchronously, with as many threads as I predefine. In this case I'm posting to the Mandrill API; parallel posts are no problem.
I'm writing a networked application.
Messages are sent over the transport as such:
Network.SendMessage (new FirstMessage() );
I can register an event handler to be called when this message type arrives, like so:
Network.RegisterMessageHandler<FirstMessage> (OnFirstMessageReceived);
And the event gets fired:
public void OnFirstMessageReceived(EventArgs<FirstMessageEventArgs> e)
{
}
I'm writing a custom authentication procedure for my networked application, which requires around five messages to complete.
Without using the Task Parallel Library, I would be forced to code the next step of each procedure in the preceding event handler, like so:
public void OnFirstMessageReceived(EventArgs<FirstMessageEventArgs> e)
{
Network.SendMessage( new SecondMessage() );
}
public void OnSecondMessageReceived(EventArgs<SecondMessageEventArgs> e)
{
Network.SendMessage( new ThirdMessage() );
}
public void OnThirdMessageReceived(EventArgs<ThirdMessageEventArgs> e)
{
Network.SendMessage( new FourthMessage() );
}
public void OnFourthMessageReceived(EventArgs<FourthMessageEventArgs> e)
{
// Authentication is complete
}
I don't like the idea of jumping around the source code to code a portion of this and a portion of that. It's hard to understand and edit.
I hear the Task Parallel Library substantially simplifies this solution.
However, many of the examples I read using the Task Parallel Library were related to starting a chain of active tasks. What I mean by "active" is that each task starts when called explicitly, like so:
public void Drink() {}
public void Eat() {}
public void Sleep() {}
Task.Factory.StartNew( () => Drink() )
.ContinueWith( t => Eat() )
.ContinueWith( t => Sleep() );
This is the opposite of my event-based async pattern, in which each event handler method is called only when the message is received.
In other words, I can't do something like this (but I want to):
Task.Factory.StartNew( () => OnFirstMessageReceived() )
.ContinueWith( () => OnSecondMessageReceived() )
.ContinueWith( () => OnThirdMessageReceived() )
.ContinueWith( () => OnFourthMessageReceived() );
I've read this article, but I don't quite understand it. It seems like what I need has to do with TaskCompletionSource. If I wanted to make a task from my event-based async pattern like the code block above, what would it look like?
You're right about TaskCompletionSource; it's the key to transforming the EAP (event-based asynchronous pattern) into a TPL Task.
This is documented here: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/tpl-and-traditional-async-programming#exposing-complex-eap-operations-as-tasks
Here is the simplified code:
public static class Extensions
{
public static Task<XDocument> GetRssDownloadTask(
this WebClient client, Uri rssFeedUri)
{
// a TaskCompletionSource is an object which holds some state.
// it hands out a task that completes when that state turns "completed",
// or else is canceled or faulted with an exception
var tcs = new TaskCompletionSource<XDocument>();
// now we subscribe to the completed event. depending on the event result
// we set the TaskCompletionSource state to completed, canceled, or faulted
client.DownloadStringCompleted += (sender, e) =>
{
if(e.Cancelled)
{
tcs.SetCanceled();
}
else if(null != e.Error)
{
tcs.SetException(e.Error);
}
else
{
tcs.SetResult(XDocument.Parse(e.Result));
}
};
// now we start the asynchronous operation
client.DownloadStringAsync(rssFeedUri);
// and return the underlying task immediately
return tcs.Task;
}
}
Now, all you need to do to make a chain of those operations is to set up your continuations (which is not very convenient at the moment; the C# 5 async and await keywords will help a lot with that).
So, this code could be used like this:
public static void Main()
{
var client = new WebClient();
client.GetRssDownloadTask(
new Uri("http://blogs.msdn.com/b/ericlippert/rss.aspx"))
.ContinueWith( t => {
ShowXmlInMyUI(t.Result); // show first result somewhere
// start a new task here if you want a chain sequence
});
// or start it here if you want to get some rss feeds simultaneously
// if we had await now, we would add
// the async keyword to the Main method definition and then
XDocument feedEric = await client.GetRssDownloadTask(
new Uri("http://blogs.msdn.com/b/ericlippert/rss.aspx"));
XDocument feedJon = await client.GetRssDownloadTask(
new Uri("http://feeds.feedburner.com/JonSkeetCodingBlog?format=xml"));
// it's chaining - one task starts executing after
// another, but it is still asynchronous
}
Jeremy Likness has a blog entry titled Coroutines for Asynchronous Sequential Workflows using Reactive Extensions (Rx) that might interest you. Here is the question he tries to answer:
The concept is straightforward: there are often times we want an asynchronous set of operations to perform sequentially. Perhaps you must load a list from a service, then load the selected item, then trigger an animation. This can be done either by chaining the completed events or nesting lambda expressions, but is there a cleaner way?
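To make the quoted idea concrete, here is a small sketch of my own for a console program (LoadList, LoadItem and Animate are placeholders for the service calls the quote mentions), composing the steps with SelectMany so the asynchronous workflow reads top to bottom instead of nesting callbacks:
// placeholders for the asynchronous steps mentioned in the quote
static Task<string[]> LoadList() => Task.FromResult(new[] { "first", "second" });
static Task<string> LoadItem(string id) => Task.FromResult("item:" + id);
static Task Animate(string item) => Task.Delay(100);

static void Main()
{
    Observable.FromAsync(() => LoadList())
        .SelectMany(list => Observable.FromAsync(() => LoadItem(list[0])))
        .SelectMany(item => Observable.FromAsync(() => Animate(item)).Select(_ => item))
        .Subscribe(
            item => Console.WriteLine($"Workflow finished with {item}"),
            ex => Console.WriteLine(ex.Message));

    Console.ReadKey();
}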
I'm trying to use the Reactive Extensions (Rx) to buffer an enumeration of Tasks as they complete. Does anyone know if there is a clean built-in way of doing this? The ToObservable extension method will just make an IObservable<Task<T>>, which is not what I want; I want an IObservable<T> that I can then use Buffer on.
Contrived example:
//Method designed to be awaitable
public static Task<int> makeInt()
{
return Task.Run(() => 5);
}
//In practice, however, I don't want to await each individual task
//I want to await chunks of them at a time, which *should* be easy with Observable.Buffer
public static void Main()
{
//Make a bunch of tasks
IEnumerable<Task<int>> futureInts = Enumerable.Range(1, 100).Select(t => makeInt());
//Is there a built in way to turn this into an Observable that I can then buffer?
IObservable<int> buffered = futureInts.TasksToObservable().Buffer(15); //????
buffered.Subscribe(ints => {
Console.WriteLine(ints.Count()); //Should be 15
});
}
You can use the fact that a Task can be converted to an observable using another overload of ToObservable().
When you have a collection of (single-item) observables, you can create a single observable that contains the items as they complete using Merge().
So, your code could look like this:
futureInts.Select(t => t.ToObservable())
.Merge()
.Buffer(15)
.Subscribe(ints => Console.WriteLine(ints.Count));