Getting error 409 "Conflict" because of Task.WhenAll() involving SqlDataReader - C#

I'm asking this mostly because I have no idea why the way I solved the issue works.
Here's the method that gives me the error (notice the return line):
public async Task<IEnumerable<ContractServiceResponse>> GetContractServices(Guid contractId)
{
    var services = await _translationManager.GetCachedResource<Service>();
    var rates = await _translationManager.GetCachedResource<Rate>();
    var contractServices = await _dbContext.ContractService
        .Where(cs => cs.ContractId == contractId)
        .ToListAsync();
    var serviceCenterBillableConcepts = await _billableConceptService.GetServiceCenterBillableConcepts(services.Select(s => s.Id).Distinct());
    var contractBillableConcepts = await _billableConceptService.GetContractBillableConcepts(contractId);
    var servicesResponse = contractServices.Select(async cs => new ContractServiceResponse
    {
        Id = cs.ServiceId,
        Service = services.FirstOrDefault(s => s.Id == cs.ServiceId).Translation,
        Enabled = cs.Billable,
        ExceptionReason = cs.BillableExceptionReason,
        BillableConcepts = await AddContractBillableConceptsToServices(cs.ServiceId, contractBillableConcepts, serviceCenterBillableConcepts, rates)
    });
    return await Task.WhenAll(servicesResponse);
}
And here's the code that works (notice the return line).
public async Task<IEnumerable<ContractServiceResponse>> GetContractServices(Guid contractId)
{
    var services = await _translationManager.GetCachedResource<Service>();
    var rates = await _translationManager.GetCachedResource<Rate>();
    var contractServices = await _dbContext.ContractService
        .Where(cs => cs.ContractId == contractId)
        .ToListAsync();
    var serviceCenterBillableConcepts = await _billableConceptService.GetServiceCenterBillableConcepts(services.Select(s => s.Id).Distinct());
    var contractBillableConcepts = await _billableConceptService.GetContractBillableConcepts(contractId);
    var servicesResponse = contractServices.Select(async cs => new ContractServiceResponse
    {
        Id = cs.ServiceId,
        Service = services.FirstOrDefault(s => s.Id == cs.ServiceId).Translation,
        Enabled = cs.Billable,
        ExceptionReason = cs.BillableExceptionReason,
        BillableConcepts = await AddContractBillableConceptsToServices(cs.ServiceId, contractBillableConcepts, serviceCenterBillableConcepts, rates)
    });
    return servicesResponse.Select(t => t.Result);
}
Why does the second one work, but not the first?
Thanks in advance.

First, this has nothing to do with SqlDataReader or any database. Databases don't return HTTP status codes. The 409 is returned by whatever is called inside await AddContractBillableConceptsToServices.
The first example starts as many calls as there are items in contractServices, all at the same time. If there are 100 items, it fires 100 concurrent requests. All of those requests try to allocate memory and access databases or other limited resources at the same time, causing blocking, queueing and possibly deadlocks. Making 100 concurrent calls can easily result in 100x degradation, if not an outright crash of the service.
That's why production services always implement throttling and queuing and, if a client is ill-behaved, will throw a 409 and block it for a while.
The second code runs sequentially. t.Result blocks until the task t completes, so the second version is no different from:
foreach (var cs in contractServices)
{
    var t = AddContractBillableConceptsToServices(cs.ServiceId, contractBillableConcepts, serviceCenterBillableConcepts, rates);
    t.Wait();
    yield return new ContractServiceResponse
    {
        ...
        BillableConcepts = t.Result
    };
}
Execute only N requests at a time
A real solution is to execute only a limited number of requests at a time, using e.g. Parallel.ForEachAsync:
var results = new ConcurrentQueue<ContractServiceResponse>();
await Parallel.ForEachAsync(contractServices, async (cs, ct) =>
{
    var concepts = await AddContractBillableConceptsToServices(cs.ServiceId, contractBillableConcepts, serviceCenterBillableConcepts, rates);
    var response = new ContractServiceResponse
    {
        ...
        BillableConcepts = concepts
    };
    results.Enqueue(response);
});
By default, ForEachAsync will make as many concurrent calls as there are cores. This can be changed through the ParallelOptions.MaxDegreeOfParallelism property.
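For example, a minimal sketch that caps the loop at 4 concurrent calls (the limit of 4 is an assumption; tune it to what the downstream service tolerates):

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
await Parallel.ForEachAsync(contractServices, options, async (cs, ct) =>
{
    // same body as in the snippet above
});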

You're executing everything in sequence. servicesResponse.Select(t => t.Result) won't do anything until the IEnumerable is enumerated by the consumer.
Task.WhenAll() will run the tasks in parallel. Your issue is that the code you're calling does not play nice with parallelism.
To solve this, don't use WhenAll; just await each call in turn.
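The sequential version could look like this (a sketch reusing the question's names):

var responses = new List<ContractServiceResponse>();
foreach (var cs in contractServices)
{
    responses.Add(new ContractServiceResponse
    {
        Id = cs.ServiceId,
        Service = services.FirstOrDefault(s => s.Id == cs.ServiceId).Translation,
        Enabled = cs.Billable,
        ExceptionReason = cs.BillableExceptionReason,
        BillableConcepts = await AddContractBillableConceptsToServices(cs.ServiceId, contractBillableConcepts, serviceCenterBillableConcepts, rates)
    });
}
return responses;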
Alternatively, you can pinpoint the code that breaks when run in parallel and serialize access to it with a SemaphoreSlim.
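A minimal SemaphoreSlim sketch (the gate size of 4 and the helper name WithGateAsync are assumptions; size the gate to what the failing dependency tolerates):

static readonly SemaphoreSlim Gate = new SemaphoreSlim(4, 4);

static async Task<T> WithGateAsync<T>(Func<Task<T>> action)
{
    await Gate.WaitAsync();
    try
    {
        return await action(); // at most 4 callers get here concurrently
    }
    finally
    {
        Gate.Release();
    }
}

Each parallel call would then go through the gate, e.g. BillableConcepts = await WithGateAsync(() => AddContractBillableConceptsToServices(...)).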

Related

Channel multiple producers and consumers

I have the below code:
var channel = Channel.CreateUnbounded<string>();

var consumers = Enumerable
    .Range(1, 5)
    .Select(consumerNumber =>
        Task.Run(async () =>
        {
            var rnd = new Random();
            while (await channel.Reader.WaitToReadAsync())
            {
                if (channel.Reader.TryRead(out var item))
                {
                    Console.WriteLine($"Consuming {item} on consumer {consumerNumber}");
                }
            }
        }));

var producers = Enumerable
    .Range(1, 5)
    .Select(producerNumber =>
        Task.Run(async () =>
        {
            var rnd = new Random();
            for (var i = 0; i < 10; i++)
            {
                var t = $"Message {i}";
                Console.WriteLine($"Producing {t} on producer {producerNumber}");
                await channel.Writer.WriteAsync(t);
                await Task.Delay(TimeSpan.FromSeconds(rnd.Next(3)));
            }
        }));

await Task.WhenAll(producers)
    .ContinueWith(_ => channel.Writer.Complete());

await Task.WhenAll(consumers);
This works as it should, however I want it to consume at the same time as it produces. However,
await Task.WhenAll(producers)
.ContinueWith(_ => channel.Writer.Complete());
blocks the consumer from running until it's complete, and I can't think of a way to get them both to run.
There are a couple of issues with the code, including forgetting to enumerate the producers and consumers enumerables. An IEnumerable is evaluated lazily, so until you actually enumerate it, e.g. with foreach or ToList, nothing is generated.
There's nothing wrong with ContinueWith when used properly either. It's definitely better and cheaper than using exceptions as control flow.
The code can be improved a lot by using some common Channel coding patterns.
The producer owns and encapsulates the channel
The producer exposes only Reader(s)
Plus, ContinueWith is an excellent choice to signal a ChannelWriter's completion, as we don't care at all which thread will do that. If anything, we'd prefer to use one of the "worker" threads to avoid a thread switch.
Let's say the producer function is:
Task Produce(ChannelWriter<string> writer, int producerNumber)
{
    return Task.Run(async () =>
    {
        var rnd = new Random();
        for (var i = 0; i < 10; i++)
        {
            var t = $"Message {i}";
            Console.WriteLine($"Producing {t} on producer {producerNumber}");
            await writer.WriteAsync(t);
            await Task.Delay(TimeSpan.FromSeconds(rnd.Next(3)));
        }
    });
}
Producer
The producer method can be:
ChannelReader<string> ProduceData(int dop)
{
    var channel = Channel.CreateUnbounded<string>();
    var writer = channel.Writer;
    var tasks = Enumerable.Range(0, dop)
        .Select(producerNumber => Produce(writer, producerNumber))
        .ToList();
    _ = Task.WhenAll(tasks).ContinueWith(t => writer.TryComplete(t.Exception));
    return channel.Reader;
}
Completion and error propagation
Notice the line :
_ = Task.WhenAll(tasks).ContinueWith(t => writer.TryComplete(t.Exception));
This says that as soon as the producers complete, the writer itself should complete with any exception that may be raised. It doesn't really matter what thread the continuation runs on as it doesn't do anything other than call TryComplete.
More importantly, t=>writer.TryComplete(t.Exception) propagates the worker exception(s) to downstream consumers. Otherwise the consumers would never know something went wrong. If you had a database consumer you'd want it to avoid finalizing any changes if the source aborted.
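For instance, on the consumer side the propagated failure surfaces once the remaining items have been drained (a sketch; Process is a hypothetical handler):

try
{
    await foreach (var item in reader.ReadAllAsync())
    {
        Process(item); // hypothetical handler
    }
}
catch (Exception exc)
{
    // The producers' exception ends up here; skip finalizing any changes
}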
Consumer
The consumer method can be:
async Task Consume(ChannelReader<string> reader, int dop, CancellationToken token = default)
{
    var tasks = Enumerable
        .Range(1, dop)
        .Select(consumerNumber =>
            Task.Run(async () =>
            {
                await foreach (var item in reader.ReadAllAsync(token))
                {
                    Console.WriteLine($"Consuming {item} on consumer {consumerNumber}");
                }
            }));
    await Task.WhenAll(tasks);
}
In this case await Task.WhenAll(tasks); enumerates the worker tasks thus starting them.
Nothing else is needed to produce all generated messages. When all producers finish, the writer is completed. When that happens, ReadAllAsync will keep offering any remaining messages to the consumers and then exit.
Composition
Combining both methods is as easy as:
var reader = ProduceData(10);
await Consume(reader, 5);
General Pattern
This is a general pattern for pipeline stages using Channels - read the input from a ChannelReader, write it to an internal Channel and return only the owned channel's Reader. This way the stage owns the channel which makes completion and error handling a lot easier:
static ChannelReader<TOut> Crunch<TIn, TOut>(this ChannelReader<TIn> reader, int dop = 1, CancellationToken token = default)
{
    var channel = Channel.CreateUnbounded<TOut>();
    var writer = channel.Writer;
    var tasks = Enumerable.Range(0, dop)
        .Select(i => Task.Run(async () =>
        {
            await foreach (var item in reader.ReadAllAsync(token))
            {
                try
                {
                    ...
                    await writer.WriteAsync(msg);
                }
                catch (Exception exc)
                {
                    //Handle the exception and keep processing messages
                }
            }
        }, token));
    _ = Task.WhenAll(tasks)
        .ContinueWith(t => writer.TryComplete(t.Exception));
    return channel.Reader;
}
This allows chaining multiple "stages" together to form a pipeline:
var finalReader = Producer(...)
    .Crunch1()
    .Crunch2(10)
    .Crunch3();
await foreach (var result in finalReader.ReadAllAsync())
{
    ...
}
Producer and consumer methods can be written in the same way, allowing, e.g., the creation of a data import pipeline:
var importTask = ReadFiles(somePath)
    .ParseCsv<string, Record[]>(10)
    .ImportToDb<Record>(connectionString);
await importTask;
With ReadFiles defined as:
static ChannelReader<string> ReadFiles(string folder)
{
    var channel = Channel.CreateUnbounded<string>();
    var writer = channel.Writer;
    var task = Task.Run(async () =>
    {
        foreach (var path in Directory.EnumerateFiles(folder, "*.csv"))
        {
            await writer.WriteAsync(path);
        }
    });
    task.ContinueWith(t => writer.TryComplete(t.Exception));
    return channel.Reader;
}
Update for .NET 6 Parallel.ForEachAsync
Now that .NET 6 is supported in production, one could use Parallel.ForEachAsync to simplify a concurrent consumer to:
static ChannelReader<TOut> Crunch<TIn, TOut>(this ChannelReader<TIn> reader,
    int dop = 1, CancellationToken token = default)
{
    var channel = Channel.CreateUnbounded<TOut>();
    var writer = channel.Writer;
    var options = new ParallelOptions
    {
        MaxDegreeOfParallelism = dop,
        CancellationToken = token
    };
    var task = Parallel.ForEachAsync(
        reader.ReadAllAsync(token),
        options,
        async (item, ct) =>
        {
            try
            {
                ...
                await writer.WriteAsync(msg);
            }
            catch (Exception exc)
            {
                //Handle the exception and keep processing messages
            }
        });
    task.ContinueWith(t => writer.TryComplete(t.Exception));
    return channel.Reader;
}
The consumers and producers variables are of type IEnumerable<Task>. This is a deferred enumerable that needs to be materialized in order for the tasks to be created. You can materialize it by chaining the ToArray operator on the LINQ queries. By doing so, the type of the two variables becomes Task[], which means that your tasks are instantiated and up and running.
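For example, a sketch of the fix on the producers query (the body is elided):

var producers = Enumerable
    .Range(1, 5)
    .Select(producerNumber => Task.Run(async () => { /* ... */ }))
    .ToArray(); // Task[] - all five tasks are now created and running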
As a side note, the ContinueWith method requires explicitly passing TaskScheduler.Default as an argument; otherwise you are at the mercy of whatever the TaskScheduler.Current may be (it might be the UI TaskScheduler, for example). This is the correct usage of ContinueWith:
await Task.WhenAll(producers)
.ContinueWith(_ => channel.Writer.Complete(), TaskScheduler.Default);
Code analyzer CA2008: Do not create tasks without passing a TaskScheduler
"[...] This is why in production library code I write, I always explicitly specify the scheduler I want to use." (Stephen Toub)
Another problem is that any exceptions thrown by the producers will be swallowed, because the tasks are not awaited. Only the continuation is awaited, which is unlikely to fail. To solve this problem, you could just ditch the primitive ContinueWith, and instead use async-await composition (an async local function that awaits the producers and then completes the channel). In this case not even that is necessary. You could simply do this:
try { await Task.WhenAll(producers); }
finally { channel.Writer.Complete(); }
The channel will Complete after any outcome of the Task.WhenAll(producers) task, and so the consumers will not get stuck.
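For completeness, the async-local-function variant mentioned above could look like this sketch, which also forwards a producer failure to the readers:

async Task CompleteWriterAsync()
{
    try
    {
        await Task.WhenAll(producers);
        channel.Writer.Complete();
    }
    catch (Exception ex)
    {
        channel.Writer.Complete(ex); // consumers observe the failure
    }
}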
A third problem is that a failure of some of the producers will cause the immediate termination of the current method, before awaiting the consumers. These tasks will then become fire-and-forget tasks. I am leaving it to you to find how you can ensure that all tasks can be awaited, in all cases, before exiting the method either successfully or with an error.

Running parallel async tasks and return result in .NET Core Web API

Hi, recently I was working on a .NET Core Web API project which downloads files from an external API.
I found some issues when the number of files is large, say more than 100: the API downloads at most 50 files and skips the others. The Web API is deployed on AWS Lambda and the timeout is 15 minutes.
The operation is actually timing out due to the long download process:
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    try
    {
        bool DownloadFlag = false;
        foreach (DownloadAttachment downloadAttachment in downloadAttachments)
        {
            DownloadFlag = await DownloadAttachment(downloadAttachment.id);
            //update the download status in database
            if (DownloadFlag)
            {
                bool UpdateFlag = await _DocumentService.UpdateDownloadStatus(downloadAttachment.id);
                if (UpdateFlag)
                {
                    await DeleteAttachment(downloadAttachment.id);
                }
            }
        }
        return true;
    }
    catch (Exception ext)
    {
        log.Error(ext, "Error in saving attachments");
        return false;
    }
}
Document service code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
return await _documentRepository.UpdateAttachmentDownloadStatus(AttachmentID);
}
And DB update code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
    using (var db = new SqlConnection(_connectionString.Value))
    {
        var Result = 0; bool SuccessFlag = false;
        var parameters = new DynamicParameters();
        parameters.Add("@pm_AttachmentID", AttachmentID);
        parameters.Add("@pm_Result", Result, System.Data.DbType.Int32, System.Data.ParameterDirection.Output);
        var result = await db.ExecuteAsync("[Loan].[UpdateDownloadStatus]", parameters, commandType: CommandType.StoredProcedure);
        Result = parameters.Get<int>("@pm_Result");
        if (Result > 0) { SuccessFlag = true; }
        return SuccessFlag;
    }
}
How can I move this async task to run in parallel and get the result? I tried the following code:
var task = Task.Run(() => DownloadAttachment(downloadAttachment.id));
bool result = task.Result;
Is this approach fine? How can I improve the performance? How do I get the result from each parallel task, update the DB and delete based on the success flag? Or is this error due to the AWS timeout?
Please help.
If you extracted the code that handles individual files to a separate method:
private async Task DownloadSingleAttachment(DownloadAttachment attachment)
{
    try
    {
        var download = await DownloadAttachment(attachment.id);
        if (download)
        {
            var update = await _DocumentService.UpdateDownloadStatus(attachment.id);
            if (update)
            {
                await DeleteAttachment(attachment.id);
            }
        }
    }
    catch (....)
    {
        ....
    }
}

public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    try
    {
        foreach (var attachment in downloadAttachments)
        {
            await DownloadSingleAttachment(attachment);
        }
    }
    ....
}
It would be easy to start all downloads at once, although not very efficient:
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachments)
{
    try
    {
        //Start all of them
        var tasks = downloadAttachments.Select(att => DownloadSingleAttachment(att));
        await Task.WhenAll(tasks);
    }
    ....
}
This isn't very efficient, because external services hate receiving lots of concurrent calls from a single source, and almost certainly impose throttling. The database doesn't like lots of concurrent calls either, because in all database products concurrent calls lead to blocking one way or another. Even in databases that use multiversioning, this comes with an overhead.
Using Dataflow classes - Single block
One easy way to fix this is to use .NET's Dataflow classes to break the operation into a pipeline of steps, and execute each one with a different number of concurrent tasks.
We could put the entire operation into a single block, but that could cause problems if the update and delete operations aren't thread-safe :
var dlOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10,
};

var downloader = new ActionBlock<DownloadAttachment>(async att =>
{
    await DownloadSingleAttachment(att);
}, dlOptions);

foreach (var attachment in downloadAttachments)
{
    await downloader.SendAsync(attachment);
}
downloader.Complete();
await downloader.Completion;
Dataflow - Multiple steps
To avoid possible thread issues, the rest of the methods can go to their own blocks. They could both go into one ActionBlock that calls both Update and Delete, or they could go into separate blocks if the methods talk to different services with different concurrency requirements.
The downloader block will execute at most 10 concurrent downloads. By default, each block uses only a single task at a time.
The updater and deleter blocks have their default DOP = 1, which means there's no risk of race conditions as long as they don't try to use e.g. the same connection at the same time.
var downloader = new TransformBlock<string, (string id, bool download)>(
    async id =>
    {
        var download = await DownloadAttachment(id);
        return (id, download);
    }, dlOptions);

var updater = new TransformBlock<(string id, bool download), (string id, bool update)>(
    async msg =>
    {
        if (msg.download)
        {
            var update = await _DocumentService.UpdateDownloadStatus(msg.id);
            return (msg.id, update);
        }
        return (msg.id, false);
    });

var deleter = new ActionBlock<(string id, bool update)>(
    async msg =>
    {
        if (msg.update)
        {
            await DeleteAttachment(msg.id);
        }
    });
The blocks can be linked into a pipeline now and used. The setting PropagateCompletion = true means that as soon as a block is finished processing, it will tell all its connected blocks to finish as well :
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
downloader.LinkTo(updater, linkOptions);
updater.LinkTo(deleter, linkOptions);
We can pump data into the head block for as long as we need. When we're done, we call the head block's Complete() method. As each block finishes processing its data, it will propagate its completion to the next block in the pipeline. We need to await the last (tail) block's completion to ensure all the attachments have been processed:
foreach (var attachment in downloadAttachments)
{
    await downloader.SendAsync(attachment.id);
}
downloader.Complete();
await deleter.Completion;
Each block has an input and (when necessary) an output buffer, which means the "producer" and "consumers" of the messages don't have to be in sync, or even know of each other. All the "producer" needs to know is where to find the head block in a pipeline.
Throttling and backpressure
One way to throttle is to use a fixed number of tasks through MaxDegreeOfParallelism.
It's also possible to put a limit on the input buffer, thus blocking previous steps or producers if a block can't process messages fast enough. This can be done simply by setting the BoundedCapacity option of a block:
var dlOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10,
    BoundedCapacity = 20,
};

var updaterOptions = new ExecutionDataflowBlockOptions
{
    BoundedCapacity = 20,
};

...

var downloader = new TransformBlock<...>(..., dlOptions);
var updater = new TransformBlock<...>(..., updaterOptions);
No other changes are necessary.
To run multiple asynchronous operations you could do something like this:
public async Task RunMultipleAsync<T>(IEnumerable<T> myList)
{
    const int myNumberOfConcurrentOperations = 10;
    var mySemaphore = new SemaphoreSlim(myNumberOfConcurrentOperations);
    var tasks = new List<Task>();
    foreach (var myItem in myList)
    {
        await mySemaphore.WaitAsync();
        var task = RunOperation(myItem);
        tasks.Add(task);
        task.ContinueWith(t => mySemaphore.Release());
    }
    await Task.WhenAll(tasks);
}

private async Task RunOperation<T>(T myItem)
{
    // Do stuff
}
Put your code from DownloadAttachmentsAsync at the 'Do stuff' comment.
This uses a semaphore to limit the number of concurrent operations, since running too many concurrent operations is often a bad idea due to contention. You will need to experiment to find the optimal number of concurrent operations for your use case. Also note that error handling has been omitted to keep the example short.

Retry Failed Tasks in C#.Net -WhenAll Async [duplicate]

I have a solution that creates multiple I/O-based tasks and I'm using Task.WhenAny() to manage them. But often many of the tasks fail due to network issues, request throttling, etc.
I can't seem to find a solution that enables me to successfully retry failed tasks when using a Task.WhenAny() approach.
Here is what I'm doing:
var tasks = new List<Task<MyType>>();
foreach (var item in someCollection)
{
    tasks.Add(GetSomethingAsync());
}
while (tasks.Count > 0)
{
    var child = await Task.WhenAny(tasks);
    tasks.Remove(child);
    ???
}
So the above structure works for completing tasks, but I haven't found a way to handle and retry the failing ones. The await Task.WhenAny throws an AggregateException rather than allowing me to inspect a task's status, and once in the exception handler I no longer have any way to retry the failed task.
I believe it would be easier to retry within the tasks, and then replace the Task.WhenAny-in-a-loop antipattern with Task.WhenAll.
E.g., using Polly:
var tasks = new List<Task<MyType>>();
var policy = ...; // See Polly documentation
foreach(var item in someCollection)
tasks.Add(policy.ExecuteAsync(() => GetSomethingAsync()));
await Task.WhenAll(tasks);
or, more succinctly:
var policy = ...; // See Polly documentation
var tasks = someCollection.Select(item => policy.ExecuteAsync(() => GetSomethingAsync()));
await Task.WhenAll(tasks);
If you don't want to use the Polly library for some reason, you could use the Retry method below. It accepts a task factory, and keeps creating and then awaiting a task until it completes successfully, or the maxAttempts have been reached:
public static async Task<TResult> Retry<TResult>(Func<Task<TResult>> taskFactory,
    int maxAttempts)
{
    int failedAttempts = 0;
    while (true)
    {
        try
        {
            var task = taskFactory();
            return await task.ConfigureAwait(false);
        }
        catch
        {
            failedAttempts++;
            if (failedAttempts >= maxAttempts) throw;
        }
    }
}
You could then use this method to download (for example) some web pages.
string[] urls =
{
    "https://stackoverflow.com",
    "https://superuser.com",
    //"https://no-such.url",
};

var httpClient = new HttpClient();

var tasks = urls.Select(url => Retry(async () =>
{
    return (Url: url, Html: await httpClient.GetStringAsync(url));
}, maxAttempts: 5));

var results = await Task.WhenAll(tasks);

foreach (var result in results)
{
    Console.WriteLine($"Url: {result.Url}, {result.Html.Length:#,0} chars");
}
Output:
Url: https://stackoverflow.com, 112,276 chars
Url: https://superuser.com, 122,784 chars
If you uncomment the third url then instead of these results an HttpRequestException will be thrown, after five failed attempts.
The Task.WhenAll method will wait for the completion of all tasks before propagating the error. In case it is preferable to report the error as soon as possible, you can find solutions in this question.
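A hedged fail-fast sketch, if abandoning still-running downloads on the first failure is acceptable: await the tasks as they finish, so the first fault is rethrown immediately.

var pending = tasks.ToList();
while (pending.Count > 0)
{
    var finished = await Task.WhenAny(pending);
    pending.Remove(finished);
    await finished; // rethrows right away if this task faulted
}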

Await async call in for-each-loop [duplicate]

This question already has answers here:
Using async/await for multiple tasks
(8 answers)
Closed 4 years ago.
I have a method in which I'm retrieving a list of deployments. For each deployment I want to retrieve an associated release. Because all calls are made to an external API, I now have a foreach-loop in which those calls are made.
public static async Task<List<Deployment>> GetDeployments()
{
    try
    {
        var depjson = await GetJson($"{BASEURL}release/deployments?deploymentStatus=succeeded&definitionId=2&definitionEnvironmentId=5&minStartedTime={MinDateTime}");
        var deployments = (JsonConvert.DeserializeObject<DeploymentWrapper>(depjson))?.Value?.OrderByDescending(x => x.DeployedOn)?.ToList();
        foreach (var deployment in deployments)
        {
            var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
            deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
        }
        return deployments;
    }
    catch (Exception)
    {
        throw;
    }
}
This all works perfectly fine. However, I do not like the await in the foreach-loop at all. I also believe this is not considered good practice. I just don't see how to refactor this so the calls are made parallel, because the result of each call is used to set a property of the deployment.
I would appreciate any suggestions on how to make this method faster and, whenever possible, avoid the await-ing in the foreach-loop.
There is nothing wrong with what you are doing now. But there is a way to call all tasks at once instead of waiting for a single task, then processing it and then waiting for another one.
This is how you can turn this:
wait for one -> process -> wait for one -> process ...
into
wait for all -> process -> done
Convert this:
foreach (var deployment in deployments)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
}
To:
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
First you take a list of unfinished tasks. Then you await it and you get a collection of results (reljsons). Then you have to deserialize them and assign them to Release.
By using await Task.WhenAll() you wait for all the tasks at the same time, so you should see a performance boost from that.
Let me know if there are typos, I didn't compile this code.
Fcin suggested starting all tasks, awaiting them all to finish and only then deserializing the fetched data.
However, if the first task has already finished while the second is still internally awaiting, the first task could already start deserializing. This would shorten the time that your process is idly waiting.
So instead of:
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
I'd suggest the following slight change:
// async fetch the Release data of a Deployment:
private async Task<Release> FetchReleaseDataAsync(Deployment deployment)
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    return JsonConvert.DeserializeObject<Release>(reljson);
}

// async fill the Release data of a Deployment:
private async Task FillReleaseDataAsync(Deployment deployment)
{
    deployment.Release = await FetchReleaseDataAsync(deployment);
}
Then your procedure is similar to the solution that Fcin suggested:
IEnumerable<Task> tasksFillDeploymentWithReleaseData = deployments
    .Select(deployment => FillReleaseDataAsync(deployment))
    .ToList();
await Task.WhenAll(tasksFillDeploymentWithReleaseData);
Now if the first task has to wait while fetching the release data, the 2nd task begins and the third etc. If the first task already finished fetching the release data, but the other tasks are awaiting for their release data, the first task starts already deserializing it and assigns the result to deployment.Release, after which the first task is complete.
If for instance the 7th task got its data, but the 2nd task is still waiting, the 7th task can deserialize and assign the data to deployment.Release. Task 7 is completed.
This continues until all tasks are completed. Using this method there is less waiting time, because as soon as one task has its data it is scheduled to start deserializing.
If I understand you right, you want to make the var reljson = await GetJson calls run in parallel. Note that Parallel.ForEach does not support async lambdas (they become async void and are not awaited); on .NET 6+ the equivalent is Parallel.ForEachAsync:
await Parallel.ForEachAsync(deployments, async (deployment, token) =>
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
});
you might limit the number of parallel executions such as:
await Parallel.ForEachAsync(
    deployments,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (deployment, token) =>
    {
        var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
        deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
    });
you might also want to be able to break the loop, which with ForEachAsync is done through cancellation (the break surfaces as an OperationCanceledException to catch at the call site):
var cts = new CancellationTokenSource();
await Parallel.ForEachAsync(deployments,
    new ParallelOptions { CancellationToken = cts.Token },
    async (deployment, token) =>
    {
        var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
        deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
        if (noFurtherProcessingRequired) cts.Cancel();
    });

Concurrent execution of async methods

Using the async/await model, I have a method which makes 3 different calls to a web service and then returns the union of the results.
var result1 = await myService.GetData(source1);
var result2 = await myService.GetData(source2);
var result3 = await myService.GetData(source3);
allResults = Union(result1, result2, result3);
Using typical await, these 3 calls will execute sequentially with respect to each other. How would I go about letting them execute concurrently and joining the results as they complete?
How would I go about letting them execute in parallel and join the results as they complete?
The simplest approach is just to create all the tasks and then await them:
var task1 = myService.GetData(source1);
var task2 = myService.GetData(source2);
var task3 = myService.GetData(source3);

// Now everything's started, we can await them
var result1 = await task1;
var result2 = await task2;
var result3 = await task3;
You might also consider Task.WhenAll. You need to consider the possibility that more than one task will fail... with the above code you wouldn't observe the failure of task3 for example, if task2 fails - because your async method will propagate the exception from task2 before you await task3.
I'm not suggesting a particular strategy here, because it will depend on your exact scenario. You may only care about success/failure and logging one cause of failure, in which case the above code is fine. Otherwise, you could potentially attach continuations to the original tasks to log all exceptions, for example.
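For example, one option is to await a combined WhenAll task and inspect its Exception property on failure, so that no fault goes unobserved (LogError is a hypothetical logger):

var all = Task.WhenAll(task1, task2, task3);
try
{
    await all;
}
catch
{
    // all.Exception is an AggregateException holding every failure,
    // not just the first one that await propagated
    foreach (var ex in all.Exception.InnerExceptions)
        LogError(ex); // hypothetical logger
}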
You could use the Parallel class:
Parallel.Invoke(
    () => result1 = myService.GetData(source1).Result,
    () => result2 = myService.GetData(source2).Result,
    () => result3 = myService.GetData(source3).Result
);
Note that GetData returns a Task, so each delegate here blocks on .Result, and Parallel.Invoke itself blocks the calling thread; this suits synchronous code better than async methods.
For more information visit: http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel(v=vs.110).aspx
As a more generic solution you can use the API I wrote below. It also allows you to define a real-time throttling mechanism for the maximum number of concurrent async requests.
The inputEnumerable will be the enumerable of your source, and asyncProcessor is your async delegate (myService.GetData in your example).
If the asyncProcessor - myService.GetData - returns void or just a plain Task, you can simply update the API to reflect that (just replace all Task<TOut> references with Task).
public static async Task<TOut[]> ForEachAsync<TIn, TOut>(
    IEnumerable<TIn> inputEnumerable,
    Func<TIn, Task<TOut>> asyncProcessor,
    int? maxDegreeOfParallelism = null)
{
    IEnumerable<Task<TOut>> tasks;
    if (maxDegreeOfParallelism != null)
    {
        SemaphoreSlim throttler = new SemaphoreSlim(maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value);
        tasks = inputEnumerable.Select(
            async input =>
            {
                await throttler.WaitAsync();
                try
                {
                    return await asyncProcessor(input).ConfigureAwait(false);
                }
                finally
                {
                    throttler.Release();
                }
            });
    }
    else
    {
        tasks = inputEnumerable.Select(asyncProcessor);
    }
    return await Task.WhenAll(tasks);
}
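Hypothetical usage with the question's service, capped at 2 concurrent calls:

var allResults = await ForEachAsync(
    new[] { source1, source2, source3 },
    source => myService.GetData(source),
    maxDegreeOfParallelism: 2);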
