In my Asp.Net Core WebApi Controller, I'm receiving a IFormFile[] files. I need to convert this to of List<DocumentData>. I first used foreach. It was working fine. But later decided to change to Parallel.ForEach as I'm receiving many(>5) files.
Here is my DocumentData Class:
public class DocumentData
{
public byte[] BinaryData { get; set; }
public string FileName { get; set; }
}
Here is my Parallel.ForEach Logic:
var documents = new ConcurrentBag<DocumentData>();
Parallel.ForEach(files, async (currentFile) =>
{
if (currentFile.Length > 0)
{
using (var ms = new MemoryStream())
{
await currentFile.CopyToAsync(ms);
documents.Add(new DocumentData
{
BinaryData = ms.ToArray(),
FileName = currentFile.FileName
});
}
}
});
For Example, even for two files as input, documents always gives one file as output. Am I missing something?
I initially had List<DocumentData>. I found that it's not thread safe and changed to ConcurrentBag<DocumentData>. But still I'm getting unexpected results. Please assist on where I'm wrong?
I guess it is because, Parallel.Foreach doesn't support async/await. It only takes Action as input and executes it for each item. And in case of async delegates it will execute them in a fire-and-forget manner.
In that case passed lambda will be considered as async void function and async void can't be awaited.
If there were overload which takes Func<Task> then it would work.
I suggest you to create Tasks with the help of Select and use Task.WhenAll for executing them at the same time.
For example:
var tasks = files.Select(async currentFile =>
{
if (currentFile.Length > 0)
{
using (var ms = new MemoryStream())
{
await currentFile.CopyToAsync(ms);
documents.Add(new DocumentData
{
BinaryData = ms.ToArray(),
FileName = currentFile.FileName
});
}
}
});
await Task.WhenAll(tasks);
Additionally you can improve that code with just returning DocumentData instance from that method, and in such case there is no need to modify documents collection. Task.WhenAll has overload which takes IEnumerable<Task<TResult> as input and produces Task of TResult array. So, the result will be so:
var tasks = files.Select(async currentFile =>
{
if (currentFile.Length > 0)
{
using (var ms = new MemoryStream())
{
await currentFile.CopyToAsync(ms);
return new DocumentData
{
BinaryData = ms.ToArray(),
FileName = currentFile.FileName
};
}
}
return null;
});
var documents = (await Task.WhenAll(tasks)).Where(d => d != null).ToArray();
You had the right idea with a concurrent collection, but misused a TPL method.
In short you need to be very careful about async lambdas, and if you are passing them to an Action or Func<Task>
Your problem is because Parallel.For / ForEach is not suited for the async and await pattern or IO bound tasks. They are suited for cpu bound workloads. Which means they essentially have Action parameters and let's the task scheduler create the tasks for you
If you want to run mutple tasks at the same time use Task.WhenAll , or a TPL Dataflow ActionBlock which can deal effectively with both CPU bound and IO bound works loads, or said more directly, they can deal with tasks which is what an async method is.
The fundimental issue is when you call an async lambda on an Action, you are essentially creating an async void method, which will run as a task unobserved. That's to say, your TPL method is just creating a bunch of tasks in parallel to run a bunch of unobserved tasks and not waiting for them.
Think of it like this, you ask a bunch of friends to go and get you some groceries, they in turn tell someone else to get your groceries, yet your friends report back to you and say thier job is done. It obviously isn't and you have no groceries.
Related
Hi Recently i was working in .net core web api project which is downloading files from external api.
In this .net core api recently found some issues while the no of files is more say more than 100. API is downloading max of 50 files and skipping others. WebAPI is deployed on AWS Lambda and timeout is 15mnts.
Actually the operation is timing out due to the long download process
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachment)
{
try
{
bool DownloadFlag = false;
foreach (DownloadAttachment downloadAttachment in downloadAttachments)
{
DownloadFlag = await DownloadAttachment(downloadAttachment.id);
//update the download status in database
if(DownloadFlag)
{
bool UpdateFlag = await _DocumentService.UpdateDownloadStatus(downloadAttachment.id);
if (UpdateFlag)
{
await DeleteAttachment(downloadAttachment.id);
}
}
}
return true;
}
catch (Exception ext)
{
log.Error(ext, "Error in Saving attachment {attachemntId}",downloadAttachment.id);
return false;
}
}
Document service code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
return await _documentRepository.UpdateAttachmentDownloadStatus(AttachmentID);
}
And DB update code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
using (var db = new SqlConnection(_connectionString.Value))
{
var Result = 0; bool SuccessFlag = false;
var parameters = new DynamicParameters();
parameters.Add("#pm_AttachmentID", AttachmentID);
parameters.Add("#pm_Result", Result, System.Data.DbType.Int32, System.Data.ParameterDirection.Output);
var result = await db.ExecuteAsync("[Loan].[UpdateDownloadStatus]", parameters, commandType: CommandType.StoredProcedure);
Result = parameters.Get<int>("#pm_Result");
if (Result > 0) { SuccessFlag = true; }
return SuccessFlag;
}
}
How can i move this async task to run parallel ? and get the result? i tried following code
var task = Task.Run(() => DownloadAttachment( downloadAttachment.id));
bool result = task.Result;
Is this approach is fine? how can improve the performance? how to get the result from each parallel task and update to DB and delete based on success flag? Or this error is due to AWS timeout?
Please help
If you extracted the code that handles individual files to a separate method :
private async Task DownloadSingleAttachment(DownloadAttachment attachment)
{
try
{
var download = await DownloadAttachment(downloadAttachment.id);
if(download)
{
var update = await _DocumentService.UpdateDownloadStatus(downloadAttachment.id);
if (update)
{
await DeleteAttachment(downloadAttachment.id);
}
}
}
catch(....)
{
....
}
}
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachment)
{
try
{
foreach (var attachment in downloadAttachments)
{
await DownloadSingleAttachment(attachment);
}
}
....
}
It would be easy to start all downloads at once, although not very efficient :
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachment)
{
try
{
//Start all of them
var tasks=downloadAttachments.Select(att=>DownloadSingleAttachment(att));
await Task.WhenAll(tasks);
}
....
}
This isn't very efficient because external services hate lots of concurrent calls from a single source as you do, and almost certainly impose throttling. The database doesn't like lots of concurrent calls either, because in all database products concurrent calls lead to blocking one way or another. Even in databases that use multiversioning, this comes with an overhead.
Using Dataflow classes - Single block
One easy way to fix this is to use .NET's Dataflow classes to break the operation into a pipeline of steps, and execute each one with a different number of concurrent tasks.
We could put the entire operation into a single block, but that could cause problems if the update and delete operations aren't thread-safe :
var dlOptions= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10,
};
var downloader=new ActionBlock<DownloadAttachment>(async att=>{
await DownloadSingleAttachment(att);
},dlOptions);
foreach (var attachment in downloadAttachments)
{
await downloader.SendAsync(attachement.id);
}
downloader.Complete();
await downloader.Completion;
Dataflow - Multiple steps
To avoid possible thread issues, the rest of the methods can go to their own blocks. They could both go into one ActionBlock that calls both Update and Delete, or they could go into separate blocks if the methods talk to different services with different concurrency requirements.
The downloader block will execute at most 10 concurrent downloads. By default, each block uses only a single task at a time.
The updater and deleter blocks have their default DOP=1, which means there's no risk of race conditions as long as they don't try to use eg the same connection at the same time.
var downloader=new TransformBlock<string,(string id,bool download)>(
async id=> {
var download=await DownloadAttachment(id);
return (id,download);
},dlOptions);
var updater=new TransformBlock<(string id,bool download),(string id,bool update)>(
async (id,download)=> {
if(download)
{
var update = await _DocumentService.UpdateDownloadStatus(id);
return (id,update);
}
return (id,false);
});
var deleter=new ActionBlock<(string id,bool update)>(
async (id,update)=> {
if(update)
{
await DeleteAttachment(id);
}
});
The blocks can be linked into a pipeline now and used. The setting PropagateCompletion = true means that as soon as a block is finished processing, it will tell all its connected blocks to finish as well :
var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
downloader.LinkTo(updater, linkOptions);
updater.LinkTo(deleter,linkOptions);
We can pump data into the head block as long as we need. When we're done, we call the head block's Complete() method. As each block finishes processing its data, it will propagate its completion to the next block in the pipeline. We need to await for the last (tail) block to complete to ensure all the attachments have been processed:
foreach (var attachment in downloadAttachments)
{
await downloader.SendAsync(attachement.id);
}
downloader.Complete();
await deleter.Completion;
Each block has an input and (when necessary) an output buffer, which means the "producer" and "consumers" of the messages don't have to be in sync, or even know of each other. All the "producer" needs to know is where to find the head block in a pipeline.
Throttling and backpressure
One way to throttle is to use a fixed number of tasks through MaxDegreeOfParallelism.
It's also possible to put a limit to the input buffer, thus blocking previous steps or producers if a block can't process messages fast enough. This can be done simply by setting the BoundedCapacity option for a block:
var dlOptions= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10,
BoundedCapacity=20,
};
var updaterOptions= new ExecutionDataflowBlockOptions
{
BoundedCapacity=20,
};
...
var downloader=new TransformBlock<...>(...,dlOptions);
var updater=new TransformBlock<...>(...,updaterOptions);
No other changes are necessary
To run multiple asynchronous operations you could do something like this:
public async Task RunMultipleAsync<T>(IEnumerable<T> myList)
{
const int myNumberOfConcurrentOperations = 10;
var mySemaphore = new SemaphoreSlim(myNumberOfConcurrentOperations);
var tasks = new List<Task>();
foreach(var myItem in myList)
{
await mySemaphore.WaitAsync();
var task = RunOperation(myItem);
tasks.Add(task);
task.ContinueWith(t => mySemaphore.Release());
}
await Task.WhenAll(tasks);
}
private async Task RunOperation<T>(T myItem)
{
// Do stuff
}
Put your code from DownloadAttachmentsAsync at the 'Do stuff' comment
This will use a semaphore to limit the number of concurrent operations, since running to many concurrent operations is often a bad idea due to contention. You would need to experiment to find the optimal number of concurrent operations for your use case. Also note that error handling have been omitted to keep the example short.
I've been reading examples for a long time now, but unfortunately I've been unable to apply the solutions to the code I'm working with. Some quick Facts/Assorted Info:
1) I'm new to C#
2) The code posted below is modified from Amazon Web Services (mostly stock)
3) Purpose of code is to compare server info to offline already downloaded info and create a list of need to download files. This snip is for the list made from the server side, only option with AWS is to call async, but I need this to finish before moving forward.
public void InitiateSearch()
{
UnityInitializer.AttachToGameObject(this.gameObject);
//these are the access key and secret access key for credentials
BasicAWSCredentials credentials = new BasicAWSCredentials("secret key", "very secret key");
AmazonS3Config S3Config = new AmazonS3Config()
{
ServiceURL = ("url"),
RegionEndpoint = RegionEndpoint.blahblah
};
//Setting the client to be used in the call below
AmazonS3Client Client = new AmazonS3Client(credentials, S3Config);
var request = new ListObjectsRequest()
{
BucketName = "thebucket"
};
Client.ListObjectsAsync(request, (responseObject) =>
{
if (responseObject.Exception == null)
{
responseObject.Response.S3Objects.ForEach((o) =>
{
int StartCut = o.Key.IndexOf(SearchType) - 11;
if (SearchType == o.Key.Substring(o.Key.IndexOf(SearchType), SearchType.Length))
{
if (ZipCode == o.Key.Substring(StartCut + 12 + SearchType.Length, 5))
{
AWSFileList.Add(o.Key + ", " + o.LastModified);
}
}
}
);
}
else
{
Debug.Log(responseObject.Exception);
}
});
}
I have no idea how to apply await to the Client.ListObjectsAsync line, I'm hoping you all can give me some guidance and let me keep my hair for a few more years.
You can either mark your method async and await it, or you can call .Wait() or .Result() on the Task you're given back.
I have no idea how to apply await to the Client.ListObjectsAsync line
You probably just put await in front of it:
await Client.ListObjectsAsync(request, (responseObject) => ...
As soon as you do this, Visual Studio will give you an error. Take a good look at the error message, because it tells you exactly what to do next (mark InitiateSearch with async and change its return type to Task):
public async Task InitiateSearchAsync()
(it's also a good idea to add an Async suffix to follow the common pattern).
Next, you'd add an await everywhere that InitiateSearchAsync is called, and so on.
I'm assuming Client.ListObjectsAsync returns a Task object, so a solution for your specific problem would be this:
public async void InitiateSearch()
{
//code
var collection = await Client.ListObjectsAsync(request, (responseObject) =>
{
//code
});
foreach (var item in collection)
{
//do stuff with item
}
}
the variable result will now be filled with the objects. You may want to set the return type of InitiateSearch() to Task, so you can await it too.
await InitiateSearch(); //like this
If this method is an event handler of some sort (like called by the click of a button), then you can keep using void as return type.
A simple introduction from an unpublished part of the documentation for async-await:
Three things are needed to use async-await:
The Task object: This object is returned by a method which is executed asynchronous. It allows you to control the execution of the method.
The await keyword: "Awaits" a Task. Put this keyword before the Task to asynchronously wait for it to finish
The async keyword: All methods which use the await keyword have to be marked as async
A small example which demonstrates the usage of this keywords
public async Task DoStuffAsync()
{
var result = await DownloadFromWebpageAsync(); //calls method and waits till execution finished
var task = WriteTextAsync(#"temp.txt", result); //starts saving the string to a file, continues execution right await
Debug.Write("this is executed parallel with WriteTextAsync!"); //executed parallel with WriteTextAsync!
await task; //wait for WriteTextAsync to finish execution
}
private async Task<string> DownloadFromWebpageAsync()
{
using (var client = new WebClient())
{
return await client.DownloadStringTaskAsync(new Uri("http://stackoverflow.com"));
}
}
private async Task WriteTextAsync(string filePath, string text)
{
byte[] encodedText = Encoding.Unicode.GetBytes(text);
using (FileStream sourceStream = new FileStream(filePath, FileMode.Append))
{
await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
}
}
Some thing to note:
You can specify a return value from an asynchronous operations with Task. The await keyword waits till the execution of the method finishes, and returns the string.
the Task object contains the status of the execution of the method, it can be used as any other variable.
if an exception is thrown (for example by the WebClient) it bubbles up at the first time the await keyword is used (in this example at the line string result (...))
It is recommended to name methods which return the Task object as MethodNameAsync
For more information about this take a look at http://blog.stephencleary.com/2012/02/async-and-await.html.
We got an existing library where some of the methods needs to be converted to async methods.
However I'm not sure how to do it with the following method (errorhandling has been removed). The purpose of the method is to zip a file and save it to disk. (Note that the zip class doesn't expose any async methods.)
public static bool ZipAndSaveFile(string fileToPack, string archiveName, string outputDirectory)
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
using (var zip = new ZipFile())
{
zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestCompression;
zip.Comment = $"This archive was created at {System.DateTime.UtcNow.ToString("G")} (UTC)";
zip.AddFile(fileToPack);
zip.Save(archiveNameAndPath);
}
return true;
}
An implementation could look like this:
public static async Task<bool> ZipAndSaveFileAsync(string fileToPack, string archiveName, string outputDirectory)
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
await Task.Run(() =>
{
using (var zip = new ZipFile())
{
zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestCompression;
zip.Comment = $"This archive was created at {System.DateTime.UtcNow.ToString("G")} (UTC)";
zip.AddFile(fileToPack);
zip.Save(archiveNameAndPath);
}
});
return true;
}
Which just seems wrong. The client could just call the sync method using Task.Run
Please, anyone got any hints on how to transform it into a async method ?
Which just seems wrong. The client could just call the sync method
using Task.Run
Spot on. By wrapping synchronous code in Task.Run() the library is now using the client's threadpool resources without it being readily apparent. Imagine what could happen to the client's threadpool if all libraries took this approach? Long story short, just expose the synchronous method and let the client decide if it wants to wrap it in Task.Run().
Having said that, if the ZipFile object had async functionality (e.g. had a SaveAsync() method) then you could make the outer method async as well. Here's an example of how that would look:
public static async Task<bool> ZipAndSaveFileAsync(string fileToPack, string archiveName, string outputDirectory)
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
using (var zip = new ZipFile())
{
// do stuff
await zip.SaveAsync(archiveNameAndPath);
}
return true;
}
As a temporarily solution, I would introduce an extension method:
public static class ZipFileExtensions
{
public static Task SaveAsync(this ZipFile zipFile, string filePath)
{
zipFile.Save(filePath);
return Task.FromResult(true);
}
}
Then the usage would be:
public static async Task<bool> ZipAndSaveFileAsync(string fileToPack, string archiveName, string outputDirectory)
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
using (var zip = new ZipFile())
{
...
await zip.SaveAsync(archiveNameAndPath).ConfugureAwait(false);
}
return true;
}
Implementing synchronous tasks does not violate anything (talking about Task.FromResult)
Submit a request to https://github.com/jstedfast/Ionic.Zlib asking for an async support in the library due to IO operations
Hope that's done eventually, and then you can upgrade the Ionic.Zlib in your app, delete the ZipFileExtensions, and continue using async version of the Save method (this time built into the library).
Alternatively, you can clone the repo from GitHub, and add SaveAsync by yourself, the submit a pull request back.
It's just not possible to 'convert' a sync method to an async if a library does not support it.
From performance standpoint, this might not be the best solution, but from management point of view, you can decouple stories "Convert everything to async" and "Improve app performance by having Ionic.Zlib async", what makes your backlog more granular.
public static Task<bool> ZipAndSaveFileAsync(string fileToPack, string archiveName, string outputDirectory)
{
return Task.Run(() =>
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
using (var zip = new ZipFile())
{
zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestCompression;
zip.Comment = $"This archive was created at {System.DateTime.UtcNow.ToString("G")} (UTC)";
zip.AddFile(fileToPack);
zip.Save(archiveNameAndPath);
}
return true;
});
}
Then use like this
public async Task MyMethod()
{
bool b = await ZipAndSaveFileAsync();
}
Some of the answers suggest that zipping a file is not a process that you should do asynchronously. I don't agree with this.
I can imagine that zipping files is a process that might take some time. During this time you want to keep your UI responsive or you want to zip several files simultaneously, or you want to upload a zipped file while zipping the next one/
The code you show is the proper way to make your function asynchronous. You question whether it is useful to create such a small method. Why not let the users call Task.Run instead of call your async function?
The reason for this is called information hiding. By creating the async function you're hiding how you zip asynchronously, thus relieving others from knowing how to do this.
Besides, information hiding gives you the freedom to change the internals of the procedure as long as you don't change the pre- and postcondition.
One of the answers said that your function still is not asynchronous. That is not true. Callers of your function may call your async function without awaiting for it. While the task is zipping, the caller may do other things. As soon as it needs the boolean result of the task if can await for the task.
Example of usage:
private async Task DoSomethingSimultaneously()
{
var taskZipFileA = ZipAndSaveFileAsync(fileA, ...)
// while this task is zipping do other things,
// for instance start zipping file B:
var taskZipFileB = ZipAndSaveFileAsync(fileB, ...)
// while both tasks are zipping do other things
// after a while you want to wait until both files are finished:
await Task.WhenAll(new Task[] {taskZipFileA, taskZipFileB});
// after the await, the results are known:
if (taskZipFileA.Result)
{
// process the boolean result of taskZipFile A
}
Note the difference between Task.WaitAll and Task.WhenAll
In async - await you use Task.WhenAll. The return is a Task, so you can
await Task.WhenAll (...)
For proper async-await, all functions that call any async function need to be async themselves and return a Task (instead of void) or Task<TResult> instead of TResult. There is one exception: the event handler may return void.
private async void OnButton1_clicked(object sender, ...)
{
bool zipResult = await SaveAndZipFileAsync(...);
ProcessZipResult(zipResult);
}
Using this method your UI keeps responsive. You don't have to call Task.Run
If you have a non-async function and want to start zipping while doing something else, your non-async function has to call Task.Run. As the function is not async it can't use await. When it needs the result of task.Run it needs to use Task.Wait, or Task.WaitAll
private void NonAsyncZipper()
{
var taskZipFileA = Task.Run ( () => ZipAndSaveFileAsync(...);
// while zipping do other things
// after a while when the result is needed:
taskZipFileA.Wait();
ProcesZipResult(taskZipFileA.Result);
}
If it's possible to get the binary data from your Zip library after the compression, then instead of using this library to save the file, use .NET IO libraries to save it.
EDIT:
There is no point in using async for CPU bound operations (such as compression). In your case, the only benefit you can get from async is when you save the file to the disk. I thought that's what you were asking for.
I have some time consuming code in a foreach that uses task/await.
it includes pulling data from the database, generating html, POSTing that to an API, and saving the replies to the DB.
A mock-up looks like this
List<label> labels = db.labels.ToList();
foreach (var x in list)
{
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid ==y.userid))
.Select(y => y.ID)
.Contains(q.id))
//Render the HTML
//do some fast stuff with objects
List<response> res = await api.sendMessage(object); //POST
//put all the responses in the db
foreach (var r in res)
{
db.responses.add(r);
}
db.SaveChanges();
}
Time wise, generating the Html and posting it to the API seem to be taking most of the time.
Ideally it would be great if I could generate the HTML for the next item, and wait for the post to finish, before posting the next item.
Other ideas are also welcome.
How would one go about this?
I first thought of adding a Task above the foreach and wait for that to finish before making the next POST, but then how do I process the last loop... it feels messy...
You can do it in parallel but you will need different context in each Task.
Entity framework is not thread safe, so if you can't use one context in parallel tasks.
var tasks = myLabels.Select( async label=>{
using(var db = new MyDbContext ()){
// do processing...
var response = await api.getresponse();
db.Responses.Add(response);
await db.SaveChangesAsync();
}
});
await Task.WhenAll(tasks);
In this case, all tasks will appear to run in parallel, and each task will have its own context.
If you don't create new Context per task, you will get error mentioned on this question Does Entity Framework support parallel async queries?
It's more an architecture problem than a code issue here, imo.
You could split your work into two separate parts:
Get data from database and generate HTML
Send API request and save response to database
You could run them both in parallel, and use a queue to coordinate that: whenever your HTML is ready it's added to a queue and another worker proceeds from there, taking that HTML and sending to the API.
Both parts can be done in multithreaded way too, e.g. you can process multiple items from the queue at the same time by having a set of workers looking for items to be processed in the queue.
This screams for the producer / consumer pattern: one producer produces data in a speed different than the consumer consumes it. Once the producer does not have anything to produce anymore it notifies the consumer that no data is expected anymore.
MSDN has a nice example of this pattern where several dataflowblocks are chained together: the output of one block is the input of another block.
Walkthrough: Creating a Dataflow Pipeline
The idea is as follows:
Create a class that will generate the HTML.
This class has an object of class System.Threading.Tasks.Dataflow.BufferBlock<T>
An async procedure creates all HTML output and await SendAsync the data to the bufferBlock
The buffer block implements interface ISourceBlock<T>. The class exposes this as a get property:
The code:
class MyProducer<T>
{
private System.Threading.Tasks.Dataflow.BufferBlock<T> bufferBlock = new BufferBlock<T>();
public ISourceBlock<T> Output {get {return this.bufferBlock;}
public async ProcessAsync()
{
while (somethingToProduce)
{
T producedData = ProduceOutput(...)
await this.bufferBlock.SendAsync(producedData);
}
// no date to send anymore. Mark the output complete:
this.bufferBlock.Complete()
}
}
A second class takes this ISourceBlock. It will wait at this source block until data arrives and processes it.
do this in an async function
stop when no more data is available
The code:
public class MyConsumer<T>
{
ISourceBlock<T> Source {get; set;}
public async Task ProcessAsync()
{
while (await this.Source.OutputAvailableAsync())
{ // there is input of type T, read it:
var input = await this.Source.ReceiveAsync();
// process input
}
// if here, no more input expected. finish.
}
}
Now put it together:
private async Task ProduceOutput<T>()
{
var producer = new MyProducer<T>();
var consumer = new MyConsumer<T>() {Source = producer.Output};
var producerTask = Task.Run( () => producer.ProcessAsync());
var consumerTask = Task.Run( () => consumer.ProcessAsync());
// while both tasks are working you can do other things.
// wait until both tasks are finished:
await Task.WhenAll(new Task[] {producerTask, consumerTask});
}
For simplicity I've left out exception handling and cancellation. StackOverFlow has artibles about exception handling and cancellation of Tasks:
Keep UI responsive using Tasks, Handle AggregateException
Cancel an Async Task or a List of Tasks
This is what I ended up using: (https://stackoverflow.com/a/25877042/275990)
List<ToSend> sendToAPI = new List<ToSend>();
List<label> labels = db.labels.ToList();
foreach (var x in list) {
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid ==y.userid))
.Select(y => y.ID)
.Contains(q.id))
//Render the HTML
//do some fast stuff with objects
sendToAPI.add(the object with HTML);
}
int maxParallelPOSTs=5;
await TaskHelper.ForEachAsync(sendToAPI, maxParallelPOSTs, async i => {
using (NasContext db2 = new NasContext()) {
List<response> res = await api.sendMessage(i.object); //POST
//put all the responses in the db
foreach (var r in res)
{
db2.responses.add(r);
}
db2.SaveChanges();
}
});
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body) {
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext()) {
await body(partition.Current).ContinueWith(t => {
if (t.Exception != null) {
string problem = t.Exception.ToString();
}
//observe exceptions
});
}
}));
}
basically lets me generate the HTML sync, which is fine, since it only takes a few seconds to generate 1000's but lets me post and save to DB async, with as many threads as I predefine. In this case I'm posting to the Mandrill API, parallel posts are no problem.
I has a simple console app where I want to call many Urls in a loop and put the result in a database table. I am using .Net 4.5 and using async i/o to fetch the URL data. Here is a simplified version of what I am doing. All methods are async except for the database operation. Do you guys see any issues with this? Are there better ways of optimizing?
private async Task Run(){
var items = repo.GetItems(); // sync method to get list from database
var tasks = new List<Task>();
// add each call to task list and process result as it becomes available
// rather than waiting for all downloads
foreach(Item item in items){
tasks.Add(GetFromWeb(item.url).ContinueWith(response => { AddToDatabase(response.Result);}));
}
await Task.WhenAll(tasks); // wait for all tasks to complete.
}
private async Task<string> GetFromWeb(url) {
HttpResponseMessage response = await GetAsync(url);
return await response.Content.ReadAsStringAsync();
}
private void AddToDatabase(string item){
// add data to database.
}
Your solution is acceptable. But you should check out TPL Dataflow, which allows you to set up a dataflow "mesh" (or "pipeline") and then shove the data through it.
For a problem this simple, Dataflow won't really add much other than getting rid of the ContinueWith (I always find manual continuations awkward). But if you plan to add more steps or change your data flow in the future, Dataflow should be something you consider.
Your solution is pretty much correct, with just two minor mistakes (both of which cause compiler errors). First, you don't call ContinueWith on the result of List.Add, you need call continue with on the task and then add the continuation to your list, this is solved by just moving a parenthesis. You also need to call Result on the reponse Task.
Here is the section with the two minor changes:
tasks.Add(GetFromWeb(item.url)
.ContinueWith(response => { AddToDatabase(response.Result);}));
Another option is to leverage a method that takes a sequence of tasks and orders them by the order that they are completed. Here is my implementation of such a method:
public static IEnumerable<Task<T>> Order<T>(this IEnumerable<Task<T>> tasks)
{
var taskList = tasks.ToList();
var taskSources = new BlockingCollection<TaskCompletionSource<T>>();
var taskSourceList = new List<TaskCompletionSource<T>>(taskList.Count);
foreach (var task in taskList)
{
var newSource = new TaskCompletionSource<T>();
taskSources.Add(newSource);
taskSourceList.Add(newSource);
task.ContinueWith(t =>
{
var source = taskSources.Take();
if (t.IsCanceled)
source.TrySetCanceled();
else if (t.IsFaulted)
source.TrySetException(t.Exception.InnerExceptions);
else if (t.IsCompleted)
source.TrySetResult(t.Result);
}, CancellationToken.None, TaskContinuationOptions.PreferFairness, TaskScheduler.Default);
}
return taskSourceList.Select(tcs => tcs.Task);
}
Using this your code can become:
private async Task Run()
{
IEnumerable<Item> items = repo.GetItems(); // sync method to get list from database
foreach (var task in items.Select(item => GetFromWeb(item.url))
.Order())
{
await task.ConfigureAwait(false);
AddToDatabase(task.Result);
}
}
Just though I'd throw in my hat as well with the Rx solution
using System.Reactive;
using System.Reactive.Linq;
private Task Run()
{
var fromWebObservable = from item in repo.GetItems.ToObservable(Scheduler.Default)
select GetFromWeb(item.url);
fromWebObservable
.Select(async x => await x)
.Do(AddToDatabase)
.ToTask();
}