I am making a bunch or asynchronous calls to Azure Table Storage. For obvious reasons insertion of these records are not in the same order as they were invoked.
I am planning to introduce ConcurrentQueue to ensure sequence. Following sample code written as a POC seems to achieve desired result.
I am wondering is this the best way I can ensure asynchronous calls
will be completed in sequence?
public class ProductService
{
ConcurrentQueue<string> ordersQueue = new ConcurrentQueue<string>();
//Place make calls here
public void PlaceOrder()
{
Task.Run(() =>
{
Parallel.For(0, 100, (i) =>
{
string item = "Product " + i;
ordersQueue.Enqueue(item);
Console.WriteLine("Placed Order: " + item);
Task.Delay(2000).Wait();
});
});
}
//Process calls in sequence, I am hoping concurrentQueue will be consistent.
public void Deliver()
{
Task.Run(() =>
{
while(true)
{
string productId;
ordersQueue.TryDequeue(out productId);
if (!string.IsNullOrEmpty(productId))
{
Console.WriteLine("Delivered: " + productId);
}
}
});
}
}
If you want to process records asynchronously and sequentially this sounds like a perfect fit for TPL Dataflow's ActionBlock. Simply create a block with the action to execute and post records to it. It supports async actions and keeps order:
var block = new ActionBlock<Product>(async product =>
{
await product.ExecuteAsync();
});
block.Post(new Product());
It also supports processing in parallel and bounded capacity if you need.
Try using Microsoft's Reactive Framework.
This worked for me:
IObservable<Task<string>> query =
from i in Observable.Range(0, 100, Scheduler.Default)
let item = "Product " + i
select AzureAsyncCall(item);
query
.Subscribe(async x =>
{
var result = await x;
/* do something with result */
});
The AzureAsyncCall call signature I used was public Task<string> AzureAsyncCall(string x).
I dropped in a bunch of Console.WriteLine(Thread.CurrentThread.ManagedThreadId); calls to ensure I was getting the right async behaviour in my test code. It worked well.
All the calls were asynchronous and serialized one after the other.
Related
In my Asp.Net Core WebApi Controller, I'm receiving a IFormFile[] files. I need to convert this to of List<DocumentData>. I first used foreach. It was working fine. But later decided to change to Parallel.ForEach as I'm receiving many(>5) files.
Here is my DocumentData Class:
public class DocumentData
{
public byte[] BinaryData { get; set; }
public string FileName { get; set; }
}
Here is my Parallel.ForEach Logic:
var documents = new ConcurrentBag<DocumentData>();
Parallel.ForEach(files, async (currentFile) =>
{
if (currentFile.Length > 0)
{
using (var ms = new MemoryStream())
{
await currentFile.CopyToAsync(ms);
documents.Add(new DocumentData
{
BinaryData = ms.ToArray(),
FileName = currentFile.FileName
});
}
}
});
For Example, even for two files as input, documents always gives one file as output. Am I missing something?
I initially had List<DocumentData>. I found that it's not thread safe and changed to ConcurrentBag<DocumentData>. But still I'm getting unexpected results. Please assist on where I'm wrong?
I guess it is because, Parallel.Foreach doesn't support async/await. It only takes Action as input and executes it for each item. And in case of async delegates it will execute them in a fire-and-forget manner.
In that case passed lambda will be considered as async void function and async void can't be awaited.
If there were overload which takes Func<Task> then it would work.
I suggest you to create Tasks with the help of Select and use Task.WhenAll for executing them at the same time.
For example:
var tasks = files.Select(async currentFile =>
{
if (currentFile.Length > 0)
{
using (var ms = new MemoryStream())
{
await currentFile.CopyToAsync(ms);
documents.Add(new DocumentData
{
BinaryData = ms.ToArray(),
FileName = currentFile.FileName
});
}
}
});
await Task.WhenAll(tasks);
Additionally you can improve that code with just returning DocumentData instance from that method, and in such case there is no need to modify documents collection. Task.WhenAll has overload which takes IEnumerable<Task<TResult> as input and produces Task of TResult array. So, the result will be so:
var tasks = files.Select(async currentFile =>
{
if (currentFile.Length > 0)
{
using (var ms = new MemoryStream())
{
await currentFile.CopyToAsync(ms);
return new DocumentData
{
BinaryData = ms.ToArray(),
FileName = currentFile.FileName
};
}
}
return null;
});
var documents = (await Task.WhenAll(tasks)).Where(d => d != null).ToArray();
You had the right idea with a concurrent collection, but misused a TPL method.
In short you need to be very careful about async lambdas, and if you are passing them to an Action or Func<Task>
Your problem is because Parallel.For / ForEach is not suited for the async and await pattern or IO bound tasks. They are suited for cpu bound workloads. Which means they essentially have Action parameters and let's the task scheduler create the tasks for you
If you want to run mutple tasks at the same time use Task.WhenAll , or a TPL Dataflow ActionBlock which can deal effectively with both CPU bound and IO bound works loads, or said more directly, they can deal with tasks which is what an async method is.
The fundimental issue is when you call an async lambda on an Action, you are essentially creating an async void method, which will run as a task unobserved. That's to say, your TPL method is just creating a bunch of tasks in parallel to run a bunch of unobserved tasks and not waiting for them.
Think of it like this, you ask a bunch of friends to go and get you some groceries, they in turn tell someone else to get your groceries, yet your friends report back to you and say thier job is done. It obviously isn't and you have no groceries.
I have some time consuming code in a foreach that uses task/await.
it includes pulling data from the database, generating html, POSTing that to an API, and saving the replies to the DB.
A mock-up looks like this
List<label> labels = db.labels.ToList();
foreach (var x in list)
{
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid ==y.userid))
.Select(y => y.ID)
.Contains(q.id))
//Render the HTML
//do some fast stuff with objects
List<response> res = await api.sendMessage(object); //POST
//put all the responses in the db
foreach (var r in res)
{
db.responses.add(r);
}
db.SaveChanges();
}
Time wise, generating the Html and posting it to the API seem to be taking most of the time.
Ideally it would be great if I could generate the HTML for the next item, and wait for the post to finish, before posting the next item.
Other ideas are also welcome.
How would one go about this?
I first thought of adding a Task above the foreach and wait for that to finish before making the next POST, but then how do I process the last loop... it feels messy...
You can do it in parallel but you will need different context in each Task.
Entity framework is not thread safe, so if you can't use one context in parallel tasks.
var tasks = myLabels.Select( async label=>{
using(var db = new MyDbContext ()){
// do processing...
var response = await api.getresponse();
db.Responses.Add(response);
await db.SaveChangesAsync();
}
});
await Task.WhenAll(tasks);
In this case, all tasks will appear to run in parallel, and each task will have its own context.
If you don't create new Context per task, you will get error mentioned on this question Does Entity Framework support parallel async queries?
It's more an architecture problem than a code issue here, imo.
You could split your work into two separate parts:
Get data from database and generate HTML
Send API request and save response to database
You could run them both in parallel, and use a queue to coordinate that: whenever your HTML is ready it's added to a queue and another worker proceeds from there, taking that HTML and sending to the API.
Both parts can be done in multithreaded way too, e.g. you can process multiple items from the queue at the same time by having a set of workers looking for items to be processed in the queue.
This screams for the producer / consumer pattern: one producer produces data in a speed different than the consumer consumes it. Once the producer does not have anything to produce anymore it notifies the consumer that no data is expected anymore.
MSDN has a nice example of this pattern where several dataflowblocks are chained together: the output of one block is the input of another block.
Walkthrough: Creating a Dataflow Pipeline
The idea is as follows:
Create a class that will generate the HTML.
This class has an object of class System.Threading.Tasks.Dataflow.BufferBlock<T>
An async procedure creates all HTML output and await SendAsync the data to the bufferBlock
The buffer block implements interface ISourceBlock<T>. The class exposes this as a get property:
The code:
class MyProducer<T>
{
private System.Threading.Tasks.Dataflow.BufferBlock<T> bufferBlock = new BufferBlock<T>();
public ISourceBlock<T> Output {get {return this.bufferBlock;}
public async ProcessAsync()
{
while (somethingToProduce)
{
T producedData = ProduceOutput(...)
await this.bufferBlock.SendAsync(producedData);
}
// no date to send anymore. Mark the output complete:
this.bufferBlock.Complete()
}
}
A second class takes this ISourceBlock. It will wait at this source block until data arrives and processes it.
do this in an async function
stop when no more data is available
The code:
public class MyConsumer<T>
{
ISourceBlock<T> Source {get; set;}
public async Task ProcessAsync()
{
while (await this.Source.OutputAvailableAsync())
{ // there is input of type T, read it:
var input = await this.Source.ReceiveAsync();
// process input
}
// if here, no more input expected. finish.
}
}
Now put it together:
private async Task ProduceOutput<T>()
{
var producer = new MyProducer<T>();
var consumer = new MyConsumer<T>() {Source = producer.Output};
var producerTask = Task.Run( () => producer.ProcessAsync());
var consumerTask = Task.Run( () => consumer.ProcessAsync());
// while both tasks are working you can do other things.
// wait until both tasks are finished:
await Task.WhenAll(new Task[] {producerTask, consumerTask});
}
For simplicity I've left out exception handling and cancellation. StackOverFlow has artibles about exception handling and cancellation of Tasks:
Keep UI responsive using Tasks, Handle AggregateException
Cancel an Async Task or a List of Tasks
This is what I ended up using: (https://stackoverflow.com/a/25877042/275990)
List<ToSend> sendToAPI = new List<ToSend>();
List<label> labels = db.labels.ToList();
foreach (var x in list) {
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid ==y.userid))
.Select(y => y.ID)
.Contains(q.id))
//Render the HTML
//do some fast stuff with objects
sendToAPI.add(the object with HTML);
}
int maxParallelPOSTs=5;
await TaskHelper.ForEachAsync(sendToAPI, maxParallelPOSTs, async i => {
using (NasContext db2 = new NasContext()) {
List<response> res = await api.sendMessage(i.object); //POST
//put all the responses in the db
foreach (var r in res)
{
db2.responses.add(r);
}
db2.SaveChanges();
}
});
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body) {
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext()) {
await body(partition.Current).ContinueWith(t => {
if (t.Exception != null) {
string problem = t.Exception.ToString();
}
//observe exceptions
});
}
}));
}
basically lets me generate the HTML sync, which is fine, since it only takes a few seconds to generate 1000's but lets me post and save to DB async, with as many threads as I predefine. In this case I'm posting to the Mandrill API, parallel posts are no problem.
I have a method (called via an AJAX request) that runs at the end of a sequence. In this method I save to the database, send emails, look up a bunch of info in other APIs/databases, correlate things together, etc.. I just refactored the original method into a second revision and used Tasks to make it asynchronous, and shaved off up to two seconds in wall time. I used Tasks mainly because it seemed easier (I'm not that experienced in async/await yet) and some tasks depend on other tasks (like task C D and E all depend on results from B, which itself depends on A). Basicalll all of the tasks are started at the same time (processing just zips down the to the Wait() call on the email task, which in one way or another requires all the others to complete. I generally do something like this except with something like eight tasks:
public thing() {
var FooTask<x> = Task<x>.Factory.StartNew(() => {
// ...
return x;
});
var BarTask<y> = Task<y>.Factory.StartNew(() => {
// ...
var y = FooTask.Result;
// ...
return y + n;
}
var BazTask<z> = Task<z>.Factory.StartNew(() => {
var y = FooTask.Result;
return y - n;
}
var BagTask<z> = Task<z>.Factory.StartNew(() => {
var g = BarTask.Result;
var h = BazTask.Result;
return 1;
}
// Lots of try/catch aggregate shenanigans.
BagTask.Wait();
return "yay";
}
Oh I also need to roll back previous things if something breaks, like remove a database row if the email fails to send, so there are a few levels of try/catches in there. Anyway, all of this works (amazingly, it all worked on the first try). My question is whether this sort of method would benefit from being rewritten to use async/await rather than Tasks. If so, how would the multiple-dependency scenario play out without re-running an async method that was already ran or awaited by another method? I guess some shared variable?
Update:
The // ... lines were supposed to indicate that the task was doing something, like looking up DB records. Sorry if that wasn't clear. About half of the tasks (there are 8 total) can take up to maybe five seconds to run, if the contexts aren't warmed up, and the other half of the tasks just collect/assemble/process/use that data.
I also need to roll back previous things if something breaks, like remove a database row if the email fails to send, so there are a few levels of try/catches in there.
You'll find that async/await (paired with Task.Run instead of StartNew) will make your code much cleaner:
var x = await Task.Run(() => {
...
return ...;
});
var y = await Task.Run(() => {
...
return x + n;
});
var bazTask = Task.Run(() => {
...
return y - n;
});
var bagTask = Task.Run(async () => {
...
var g = y;
var h = await bazTask;
return 1;
}
await bagTask;
return "yay";
You also have the option of using Task.WhenAll if you want to await multiple tasks completing. Error handling in particular is cleaner with await since it doesn't wrap exception in AggregateException.
However
called via an AJAX request
This is a bit of a problem. Both StartNew and Task.Run should be avoided on ASP.NET.
shaved off up to two seconds in wall time
Yes, parallel processing on ASP.NET (which is what the code is currently doing) will make individual requests execute faster, but at the expense of scalability. The server will be unable to handle as many requests if it is doing parallel processing on each one.
save to the database, send emails, look up a bunch of info in other APIs/databases
These are all I/O-bound operations, not CPU-bound. So the ideal solution is to create truly-async I/O methods and then just call them using await (and Task.WhenAll if necessary). By "truly-async", I mean calling the underlying asynchronous APIs (e.g., HttpClient.GetStringAsync instead of WebClient.DownloadString; or Entity Framework's ToListAsync instead of ToList, etc). Using StartNew or Task.Run is what I call "fake asynchrony".
Once you have asynchronous APIs, your top-level method really becomes simple:
X x = await databaseService.GetXFromDatabaseAsync();
Y y = await apiService.LookupValueAsync(x);
Task<Baz> bazTask = databaseSerivce.GetBazFromDatabaseAsync(y);
Task<Bag> bagTask = apiService.SecondaryLookupAsync(y);
await Task.WhenAll(bazTask, bagTask);
Baz baz = await bazTask;
Bag bag = await bagTask;
return baz + bag;
I would like to know if it's possible to improve this code for better performance. I'm new to the whole async thing on the serverside, so please bear with me here:
con.GetGame(id, game => {
foreach(Player p in game.Team1)
{
p.SomeExtraDetails = GetPlayerDetails(p.Id);
}
// I would like the player data to be set on all players
// before ending up here
});
private PlayerDetails GetPlayerDetails(double playerId)
{
var task = con.GetPlayer(playerId);
PlayerDetails ret = null;
Task continuation = task.ContinueWith(t =>
{
ret = t.Result;
});
continuation.Wait();
return ret;
}
If I got it right, continuation.Wait(); blocks the main thread.
Is there any way to make the tasks run simultaneously?
Ideally, you'd make these operations asynchronous all the way down:
private Task<PlayerDetails> GetPlayerDetailsAsync(double playerId)
{
return con.GetPlayer(playerId);
}
con.GetGame(id, game => {
var tasks = game.Team1
.Select(p => new { Player=p, Details=GetPlayerDetailsAsync(p.Id)})
.ToList(); // Force all tasks to start...
foreach(var t in tasks)
{
t.Player.SomeExtraDetails = await t.Details;
}
// all player data is now set on all players
});
If that isn't an option (ie: you're not using VS 2012), you could simplify your code to:
// This is a more efficient version of your existing code
private PlayerDetails GetPlayerDetails(double playerId)
{
var task = con.GetPlayer(playerId);
return task.Result;
}
con.GetGame(id, game => {
// This will run all at once, but block until they're done
Parallel.ForEach(game.Team1, p =>
{
p.SomeExtraDetails = GetPlayerDetails(p.Id);
});
});
consider using Parallel.ForEach in your GetGame page instead of Task.ContinueWith
Alternative solution without LINQ (although I like Reed Copsey's solution). However, beware that, as pointed out in the comments, this solution introduces an overhead by encapsulating the call to GetPlayerDetailsAsync() inside Tasks created by Task.Run().
Requires .NET 4.5 and C# 5.
con.GetGame(id, game => {
var tasks = new List<Task>();
foreach(Player p in game.Team1)
{
tasks.Add(Task.Run(async () => p.SomeExtraDetails = await GetPlayerDetailsAsync(p.Id)));
}
Task.WaitAll(tasks.ToArray());
});
private Task<PlayerDetails> GetPlayerDetailsAsync(double playerId)
{
return con.GetPlayerAsync(playerId);
});
Further, in order to catch up on the Task-based Asynchronous Pattern (TAP) with .NET 4.5 I highly recommend reading: Task-based Asynchronous Pattern - by Stephen Toub, Microsoft.
I have a function that sends requests to search for information from a url. The search criteria is a list and the search iterates through each item and requests info from the url. To speed it up I divide the list into x subsets, and create a task for each subset. Then each subset sends 3 simultaneous requests, as follows:
This is the main entry point:
Search search = new Search();
await Task.Run(() => search.Start());
The Start function:
public void Search()
{
//Each subset is a List<T> ie where T is certain search criteria
//If originalList.Count = 30 and max items per subset is 10, then subsets will be 3 lists of 10 items each
var subsets = CreateSubsets(originalList);
List<Task> tasks = new List<Task>(subsets.Count);
for (int i = 0; i < subsets.Count; i++)
tasks.Add(Task.Factory.StartNew(() => SearchSubset(subsets[i]));
Task.WaitAll(tasks.ToArray());
foreach (Task task in tasks)
if (task != null)
task.Dispose();
}
private void SearchSubset(List<SearchCriteria> subset)
{
//Checking that i+1 and i+2 is within subset.Count-1 has been omitted
for (int i = 0; i < subset.Count; i+=3)
{
Task[] tasks = {Task.Factory.StartNew(() => SearchCriteria(subset[i])),
Task.Factory.StartNew(() => SearchCriteria(subset[i+1])),
Task.Factory.StartNew(() => SearchCriteria(subset[i+2]))};
//Wait & dispose like above
}
}
private void SearchCriteria(SearchCriteria criteria)
{
//SearchForCriteria uses WebRequest and WebResponse (callback)
//to query the url and return the response.content
var results = SearchForCriteria(criteria);
//process results...
}
The above code works fine and the search is quite fast. However, does the above code create too much overhead, and is there is more cleaner (or simpler) way to achieve the same results?
This is not the most efficient method, but if this is for a desktop application, efficiency isn't your primary concern anyway. So, unless you are actually seeing performance degradation from this code, you shouldn't change it.
That said, I would have approached this differently.
You're using the TPL to parallelize I/O-bound operations. You're using dynamic parallelism, the most complex kind; as Jeff Mercado commented, your code would be simpler and slightly more efficient if you used a higher-level parallelism abstraction such as Parallel or PLINQ).
However, any parallel approach is going to waste thread pool threads by blocking them. Since this is I/O-bound, I would recommend using async/await to make them concurrent.
If you want to do simple throttling, you can use SemaphoreSlim. I don't think you need to do throttling like this in addition to your subsets, but if you want an async equivalent to your existing code, it would look something like this:
public Task SearchAsync()
{
var subsets = CreateSubsets(originalList);
return Task.WhenAll(subsets.Select(subset => SearchSubsetAsync(subset)));
}
private Task SearchSubsetAsync(List<SearchCriteria> subset)
{
var semaphore = new SemaphoreSlim(3);
return Task.WhenAll(subset.Select(criteria => SearchCriteriaAsync(criteria, semaphore)));
}
private async Task SearchCriteriaAsync(SearchCriteria criteria, SemaphoreSlim semaphore)
{
await semaphore.WaitAsync();
try
{
// SearchForCriteriaAsync uses HttpClient (async).
var results = await SearchForCriteriaAsync(criteria);
// Consider returning results rather than processing them here.
}
finally
{
semaphore.Release();
}
}