I'm using ODP.NET, which doesn't provide any asych methods like the SQL driver does or other Oracle drivers.
I have lots of slow queries, sometimes I need to call several of them on a single MVC controller call. So I'm trying to wrap them in Task calls. So I'm thinking of using this pattern:
(This is somewhat contrived example I wouldn't call the same query 10 times for real, it would be some heterogeneous workload)
List<Task<DbDataReader>> list = new List<Task<DbDataReader>>(10);
for (int i = 0; i < 10; i++)
{
ERROR_LOG_PKG_DR package = new ERROR_LOG_PKG_DR();
//Could be a bunch of different queries.
//Returns a task that has the longrunningtask property set, so it's on it's own thread
list.Add(package.QUERY_ALL_ASYNC());
}
Task.WaitAll(list.ToArray());
And here is the task:
public Task<DbDataReader> QUERY_ALL_ASYNC()
{
CancellationToken ct = new CancellationToken();
return Task.Factory.StartNew(_ =>
{
System.Diagnostics.Debug.WriteLine("InTask: Thread: " +
Thread.CurrentThread.ManagedThreadId);
System.Diagnostics.Debug.WriteLine("InTask: Is Background: " +
Thread.CurrentThread.IsBackground);
System.Diagnostics.Debug.WriteLine("InTask: Is ThreadPool: " +
Thread.CurrentThread.IsThreadPoolThread);
return QUERY_ALL<DbDataReader>();
}, null, ct,TaskCreationOptions.LongRunning, TaskScheduler.Default);
}
This fires up 10 threads. What if I wanted some thread pool like behavior-- ie. only about 4 concurrent threads at a time-- re-use threads, etc, but I want it outside of the ASP.NET thread pool (which is being used to service requests)
How do I do that?
If you are happy to sacrifice your web app scalability, you could use SemaphoreSlim to throttle the number of parallel tasks:
const int MAX_PARALLEL_TASKS = 4;
DbDataReader GetData(CancellationToken token)
{
DbDataReader reader = ... // execute the synchronous DB API
return reader;
}
// this can be called form an async controller method
async Task ProcessAsync()
{
// list of synchronous methods or lambdas to offload to thread pool
var funcs = new Func<CancellationToken, DbDataReader>[] {
GetData, GetData2, ... };
Task<DbDataReader>[] tasks;
using (var semaphore = new SemaphoreSlim(MAX_PARALLEL_TASKS))
{
tasks = funcs.Select(async(func) =>
{
await semaphore.WaitAsync();
try
{
return await Task.Run(() => func(token));
}
finally
{
semaphore.Release();
}
}).ToArray();
await Task.WhenAll(tasks);
}
// process the results, e.g: tasks[0].Result
}
I think this is what you need - (msdn) How to: Create a Task Scheduler That Limits Concurrency.
Shortly put idea is to create custom TaskScheduler that limits number of concurrently runninga tasks.
Why not just use ThreadPool?
https://msdn.microsoft.com/en-us/library/kbf0f1ct.aspx
This provides system-aware throttling and has been around for a very long time
Related
Hi Recently i was working in .net core web api project which is downloading files from external api.
In this .net core api recently found some issues while the no of files is more say more than 100. API is downloading max of 50 files and skipping others. WebAPI is deployed on AWS Lambda and timeout is 15mnts.
Actually the operation is timing out due to the long download process
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachment)
{
try
{
bool DownloadFlag = false;
foreach (DownloadAttachment downloadAttachment in downloadAttachments)
{
DownloadFlag = await DownloadAttachment(downloadAttachment.id);
//update the download status in database
if(DownloadFlag)
{
bool UpdateFlag = await _DocumentService.UpdateDownloadStatus(downloadAttachment.id);
if (UpdateFlag)
{
await DeleteAttachment(downloadAttachment.id);
}
}
}
return true;
}
catch (Exception ext)
{
log.Error(ext, "Error in Saving attachment {attachemntId}",downloadAttachment.id);
return false;
}
}
Document service code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
return await _documentRepository.UpdateAttachmentDownloadStatus(AttachmentID);
}
And DB update code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
using (var db = new SqlConnection(_connectionString.Value))
{
var Result = 0; bool SuccessFlag = false;
var parameters = new DynamicParameters();
parameters.Add("#pm_AttachmentID", AttachmentID);
parameters.Add("#pm_Result", Result, System.Data.DbType.Int32, System.Data.ParameterDirection.Output);
var result = await db.ExecuteAsync("[Loan].[UpdateDownloadStatus]", parameters, commandType: CommandType.StoredProcedure);
Result = parameters.Get<int>("#pm_Result");
if (Result > 0) { SuccessFlag = true; }
return SuccessFlag;
}
}
How can i move this async task to run parallel ? and get the result? i tried following code
var task = Task.Run(() => DownloadAttachment( downloadAttachment.id));
bool result = task.Result;
Is this approach is fine? how can improve the performance? how to get the result from each parallel task and update to DB and delete based on success flag? Or this error is due to AWS timeout?
Please help
If you extracted the code that handles individual files to a separate method :
private async Task DownloadSingleAttachment(DownloadAttachment attachment)
{
try
{
var download = await DownloadAttachment(downloadAttachment.id);
if(download)
{
var update = await _DocumentService.UpdateDownloadStatus(downloadAttachment.id);
if (update)
{
await DeleteAttachment(downloadAttachment.id);
}
}
}
catch(....)
{
....
}
}
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachment)
{
try
{
foreach (var attachment in downloadAttachments)
{
await DownloadSingleAttachment(attachment);
}
}
....
}
It would be easy to start all downloads at once, although not very efficient :
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachment)
{
try
{
//Start all of them
var tasks=downloadAttachments.Select(att=>DownloadSingleAttachment(att));
await Task.WhenAll(tasks);
}
....
}
This isn't very efficient because external services hate lots of concurrent calls from a single source as you do, and almost certainly impose throttling. The database doesn't like lots of concurrent calls either, because in all database products concurrent calls lead to blocking one way or another. Even in databases that use multiversioning, this comes with an overhead.
Using Dataflow classes - Single block
One easy way to fix this is to use .NET's Dataflow classes to break the operation into a pipeline of steps, and execute each one with a different number of concurrent tasks.
We could put the entire operation into a single block, but that could cause problems if the update and delete operations aren't thread-safe :
var dlOptions= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10,
};
var downloader=new ActionBlock<DownloadAttachment>(async att=>{
await DownloadSingleAttachment(att);
},dlOptions);
foreach (var attachment in downloadAttachments)
{
await downloader.SendAsync(attachement.id);
}
downloader.Complete();
await downloader.Completion;
Dataflow - Multiple steps
To avoid possible thread issues, the rest of the methods can go to their own blocks. They could both go into one ActionBlock that calls both Update and Delete, or they could go into separate blocks if the methods talk to different services with different concurrency requirements.
The downloader block will execute at most 10 concurrent downloads. By default, each block uses only a single task at a time.
The updater and deleter blocks have their default DOP=1, which means there's no risk of race conditions as long as they don't try to use eg the same connection at the same time.
var downloader=new TransformBlock<string,(string id,bool download)>(
async id=> {
var download=await DownloadAttachment(id);
return (id,download);
},dlOptions);
var updater=new TransformBlock<(string id,bool download),(string id,bool update)>(
async (id,download)=> {
if(download)
{
var update = await _DocumentService.UpdateDownloadStatus(id);
return (id,update);
}
return (id,false);
});
var deleter=new ActionBlock<(string id,bool update)>(
async (id,update)=> {
if(update)
{
await DeleteAttachment(id);
}
});
The blocks can be linked into a pipeline now and used. The setting PropagateCompletion = true means that as soon as a block is finished processing, it will tell all its connected blocks to finish as well :
var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
downloader.LinkTo(updater, linkOptions);
updater.LinkTo(deleter,linkOptions);
We can pump data into the head block as long as we need. When we're done, we call the head block's Complete() method. As each block finishes processing its data, it will propagate its completion to the next block in the pipeline. We need to await for the last (tail) block to complete to ensure all the attachments have been processed:
foreach (var attachment in downloadAttachments)
{
await downloader.SendAsync(attachement.id);
}
downloader.Complete();
await deleter.Completion;
Each block has an input and (when necessary) an output buffer, which means the "producer" and "consumers" of the messages don't have to be in sync, or even know of each other. All the "producer" needs to know is where to find the head block in a pipeline.
Throttling and backpressure
One way to throttle is to use a fixed number of tasks through MaxDegreeOfParallelism.
It's also possible to put a limit to the input buffer, thus blocking previous steps or producers if a block can't process messages fast enough. This can be done simply by setting the BoundedCapacity option for a block:
var dlOptions= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10,
BoundedCapacity=20,
};
var updaterOptions= new ExecutionDataflowBlockOptions
{
BoundedCapacity=20,
};
...
var downloader=new TransformBlock<...>(...,dlOptions);
var updater=new TransformBlock<...>(...,updaterOptions);
No other changes are necessary
To run multiple asynchronous operations you could do something like this:
public async Task RunMultipleAsync<T>(IEnumerable<T> myList)
{
const int myNumberOfConcurrentOperations = 10;
var mySemaphore = new SemaphoreSlim(myNumberOfConcurrentOperations);
var tasks = new List<Task>();
foreach(var myItem in myList)
{
await mySemaphore.WaitAsync();
var task = RunOperation(myItem);
tasks.Add(task);
task.ContinueWith(t => mySemaphore.Release());
}
await Task.WhenAll(tasks);
}
private async Task RunOperation<T>(T myItem)
{
// Do stuff
}
Put your code from DownloadAttachmentsAsync at the 'Do stuff' comment
This will use a semaphore to limit the number of concurrent operations, since running to many concurrent operations is often a bad idea due to contention. You would need to experiment to find the optimal number of concurrent operations for your use case. Also note that error handling have been omitted to keep the example short.
I have bunch of async methods, which I invoke from Dispatcher. The methods does not perform any work in the background, they just waits for some I/O operations, or wait for response from the webserver.
async Task FetchAsync()
{
// Prepare request in UI thread
var response = await new WebClient().DownloadDataTaskAsync();
// Process response in UI thread
}
now, I want to perform load tests, by calling multiple FetchAsync() in parallel with some max degree of parallelism.
My first attempt was using Paralell.Foreach(), but id does not work well with async/await.
var option = new ParallelOptions {MaxDegreeOfParallelism = 10};
Parallel.ForEach(UnitsOfWork, uow => uow.FetchAsync().Wait());
I've been looking at reactive extensions, but I'm still not able to take advantage of Dispatcher and async/await.
My goal is to not create separate thread for each FetchAsync(). Can you give me some hints how to do it?
Just call Fetchasync without awaiting each call and then use Task.WhenAll to await all of them together.
var tasks = new List<Task>();
var max = 10;
for(int i = 0; i < max; i++)
{
tasks.Add(FetchAsync());
}
await Task.WhenAll(tasks);
Here is a generic reusable solution to your question that you can reuse not only with your FetchAsync method but for any async method that has the same signature. The api includes real time concurrent throttling support as well:
Parameters are self explanatory:
totalRequestCount: is how many async requests (FatchAsync calls) you want to do in total, async processor is the FetchAsync method itself, maxDegreeOfParallelism is the optional nullable parameter. If you want real time concurrent throttling with max number of concurrent async requests, set it, otherwise not.
public static Task ForEachAsync(
int totalRequestCount,
Func<Task> asyncProcessor,
int? maxDegreeOfParallelism = null)
{
IEnumerable<Task> tasks;
if (maxDegreeOfParallelism != null)
{
SemaphoreSlim throttler = new SemaphoreSlim(maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value);
tasks = Enumerable.Range(0, totalRequestCount).Select(async requestNumber =>
{
await throttler.WaitAsync();
try
{
await asyncProcessor().ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
}
else
{
tasks = Enumerable.Range(0, totalRequestCount).Select(requestNumber => asyncProcessor());
}
return Task.WhenAll(tasks);
}
I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
await sem.WaitAsync();
Task t = Task.Run(async () =>
{
var availabilityResponse = await client.QueryAvailability(serviceCopy));
// do your other stuff here with the result of QueryAvailability
}
t.ContinueWith(sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync) which subtracts one from the count. Calling release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}
I'm using Asp .Net 4.5.1.
I have tasks to run, which call a web-service, and some might fail. I need to run N successful tasks which perform some light CPU work and mainly call a web service, then stop, and I want to throttle.
For example, let's assume we have 300 URLs in some collection. We need to run a function called Task<bool> CheckUrlAsync(url) on each of them, with throttling, meaning, for example, having only 5 run "at the same time" (in other words, have maximum 5 connections used at any given time). Also, we only want to perform N (assume 100) successful operations, and then stop.
I've read this and this and still I'm not sure what would be the correct way to do it.
How would you do it?
Assume ASP .Net
Assume IO call (http call to web serice), no heavy CPU operations.
Use Semaphore slim.
var semaphore = new SemaphoreSlim(5);
var tasks = urlCollection.Select(async url =>
{
await semaphore.WaitAsync();
try
{
return await CheckUrlAsync(url);
}
finally
{
semaphore.Release();
}
};
while(tasks.Where(t => t.Completed).Count() < 100)
{
await.Task.WhenAny(tasks);
}
Although I would prefer to use Rx.Net to produce some better code.
using(var semaphore = new SemaphoreSlim(5))
{
var results = urlCollection.ToObservable()
.Select(async url =>
{
await semaphore.WaitAsync();
try
{
return await CheckUrlAsync(url);
}
finally
{
semaphore.Release();
}
}).Take(100).ToList();
}
Okay...this is going to be fun.
public static class SemaphoreHelper
{
public static Task<T> ContinueWith<T>(
this SemaphoreSlim semaphore,
Func<Task<T>> action)
var ret = semaphore.WaitAsync()
.ContinueWith(action);
ret.ContinueWith(_ => semaphore.Release(), TaskContinuationOptions.None);
return ret;
}
var semaphore = new SemaphoreSlim(5);
var results = urlCollection.Select(
url => semaphore.ContinueWith(() => CheckUrlAsync(url)).ToList();
I do need to add that the code as it stands will still run all 300 URLs, it just will return quicker...thats all. You would need to add the cancelation token to the semaphore.WaitAsync(token) to cancel the queued work. Again I suggest using Rx.Net for that. Its just easier to use Rx.Net to get the cancelation token to work with .Take(100).
Try something like this?
private const int u_limit = 100;
private const int c_limit = 5;
List<Task> tasks = new List<Task>();
int totalRun = 0;
while (totalRun < u_limit)
{
for (int i = 0; i < c_limit; i++)
{
tasks.Add(Task.Run (() => {
// Your code here.
}));
}
Task.WaitAll(tasks);
totalRun += c_limit;
}
I am trying to consume a service reference, making multiple requests at the same time using a task scheduler. The service includes an synchronous and an asynchronous function that returns a result set. I am a bit confused, and I have a couple of initial questions, and then I will share how far I got in each. I am using some logging, concurrency visualizer, and fiddler to investigate. Ultimately I want to use a reactive scheduler to make as many requests as possible.
1) Should I use the async function to make all the requests?
2) If I were to use the synchronous function in multiple tasks what would be the limited resources that would potentially starve my thread count?
Here is what I have so far:
var myScheduler = new myScheduler();
var myFactory = new Factory(myScheduler);
var myClientProxy = new ClientProxy();
var tasks = new List<Task<Response>>();
foreach( var request in Requests )
{
var localrequest = request;
tasks.Add( myFactory.StartNew( () =>
{
// log stuff
return client.GetResponsesAsync( localTransaction.Request );
// log some more stuff
}).Unwrap() );
}
Task.WaitAll( tasks.ToArray() );
// process all the requests after they are done
This runs but according to fiddler it just tries to do all of the requests at once. It could be the scheduler but I trust that more then I do the above.
I have also tried to implement it without the unwrap command and instead using an async await delegate and it does the same thing. I have also tried referencing the .result and that seems to do it sequentially. Using the non synchronous service function call with the scheduler/factory it only gets up to about 20 simultaneous requests at the same time per client.
Yes. It will allow your application to scale better by using fewer threads to accomplish more.
Threads. When you initiate a synchronous operation that is inherently asynchronous (e.g. I/O) you have a blocked thread waiting for the operation to complete. You could however be using this thread in the meantime to execute CPU bound operations.
The simplest way to limit the amount of concurrent requests is to use a SemaphoreSlim which allows to asynchronously wait to enter it:
async Task ConsumeService()
{
var client = new ClientProxy();
var semaphore = new SemaphoreSlim(100);
var tasks = Requests.Select(async request =>
{
await semaphore.WaitAsync();
try
{
return await client.GetResponsesAsync(request);
}
finally
{
semaphore.Release();
}
}).ToList();
await Task.WhenAll(tasks);
// TODO: Process responses...
}
Regardless of how you are calling the WCF service whether it is an Async call or a Synchronous one you will be bound by the WCF serviceThrottling limits. You should look at these settings and possible adjust them higher (if you have them set to low values for some reason), in .NET4 the defaults are pretty good, however In older versions of the .NET framework, these defaults were much more conservative than .NET4.
.NET 4.0
MaxConcurrentSessions: default is 100 * ProcessorCount
MaxConcurrentCalls: default is 16 * ProcessorCount
MaxConcurrentInstances: default is MaxConcurrentCalls+MaxConcurrentSessions
1.)Yes.
2.)Yes.
If you want to control the number of simultaneous requests you can try using Stephen Toub's ForEachAsync method. it allows you to control how many tasks are processed at the same time.
public static class Extensions
{
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
}
void Main()
{
var myClientProxy = new ClientProxy();
var responses = new List<Response>();
// Max 10 concurrent requests
Requests.ForEachAsync<Request>(10, async (r) =>
{
var response = await client.GetResponsesAsync( localTransaction.Request );
responses.Add(response);
}).Wait();
}