I have a list of items to process. I create a task for each one and then await them using Task.WhenAny(), following the pattern described here: Start Multiple Async Tasks and Process Them As They Complete.
I have changed one thing: I am using HashSet<Task> instead of List<Task>. But I notice that all the tasks end up getting the same id, so the HashSet only adds one of them, and hence I end up waiting for only one task.
I have a working example here in dotnetfiddle: https://dotnetfiddle.net/KQN2ow
Also pasting the code below:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace ReproTasksWithSameId
{
public class Program
{
public static async Task Main(string[] args)
{
List<int> itemIds = new List<int>() { 1, 2, 3, 4 };
await ProcessManyItems(itemIds);
}
private static async Task ProcessManyItems(List<int> itemIds)
{
//
// Create tasks for each item and then wait for them using Task.WhenAny
// Following Task.WhenAny() pattern described here: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/start-multiple-async-tasks-and-process-them-as-they-complete
// But replaced List<Task> with HashSet<Task>.
//
HashSet<Task> tasks = new HashSet<Task>();
// We map the task ids to item ids so that we have enough info to log if a task throws an exception.
Dictionary<int, int> taskIdToItemId = new Dictionary<int, int>();
foreach (int itemId in itemIds)
{
Task task = ProcessOneItem(itemId);
Console.WriteLine("Created task with id: {0}", task.Id);
tasks.Add(task);
taskIdToItemId[task.Id] = itemId;
}
// Add a loop to process the tasks one at a time until none remain.
while (tasks.Count > 0)
{
// Identify the first task that completes.
Task task = await Task.WhenAny(tasks);
// Remove the selected task from the list so that we don't
// process it more than once.
tasks.Remove(task);
// Get the item id from our map, so that we can log rich information.
int itemId = taskIdToItemId[task.Id];
try
{
// Await the completed task.
await task; // unwrap exceptions.
Console.WriteLine("Successfully processed task with id: {0}, itemId: {1}", task.Id, itemId);
}
catch (Exception ex)
{
Console.WriteLine("Failed to process task with id: {0}, itemId: {1}. Just logging & eating the exception {2}", task.Id, itemId, ex);
}
}
}
private static async Task ProcessOneItem(int itemId)
{
// Assume this method awaits on some asynchronous IO.
Console.WriteLine("item: {0}", itemId);
}
}
}
The output I get is this:
item: 1
Created task with id: 1
item: 2
Created task with id: 1
item: 3
Created task with id: 1
item: 4
Created task with id: 1
Successfully processed task with id: 1, itemId: 4
So basically the program exits after awaiting just the first task.
Why do multiple short Tasks end up getting the same id? BTW I also tested with a method that returns Task<TResult> instead of Task, and in that case it works fine.
Is there a better approach I can use?
The question's code is synchronous, so there's only one completed task going around. async doesn't make something run asynchronously; it's syntactic sugar that allows using await to wait for an already-running asynchronous operation to complete without blocking the calling thread.
As for the documentation example, that's what it is: a documentation example, not a pattern, and certainly not something that can be used in production except for simple cases.
What happens if you can only make 5 requests at a time to avoid flooding your network or CPU? You'd need to download only a fixed number of items at a time. What if you need to process the downloaded data? What if the list of URLs comes from another thread?
Those issues are handled by concurrent containers, pub/sub patterns and the purpose-built Dataflow and Channel classes.
Dataflow
The older Dataflow classes take care of buffering input and output and handling worker tasks automatically. The entire download code can be replaced with an ActionBlock:
var client=new HttpClient(....);
//Cancel if the process takes longer than 30 minutes
var cts=new CancellationTokenSource(TimeSpan.FromMinutes(30));
var options=new ExecutionDataflowBlockOptions(){
MaxDegreeOfParallelism=10,
BoundedCapacity=5,
CancellationToken=cts.Token
};
var block=new ActionBlock<string>(url=>ProcessUrl(url,client,cts.Token), options);
That's it. The block will use up to 10 concurrent tasks to perform up to 10 concurrent downloads. It will keep up to 5 urls in memory (it would buffer everything otherwise). If the input buffer becomes full, sending items to the block will await asynchronously, thus preventing slow downloads from flooding memory with URLs.
On the same or a different thread, the "publisher" of urls can post as many URLs as it wants, for as long as it wants.
foreach(var url in urls)
{
await block.SendAsync(url);
}
//Tell the block we're done
block.Complete();
//Wait until all downloads are complete
await block.Completion;
We can use other blocks like TransformBlock to produce output, pass it to another block and thus create a concurrent processing pipeline. Let's say we have two methods, DownloadUrlAsync and ParseResponse, instead of just ProcessUrl:
Task<string> DownloadUrlAsync(string url,HttpClient client)
{
return client.GetStringAsync(url);
}
void ParseResponse(string content)
{
var json=JObject.Parse(content);
DoSomethingWith(json);
}
We could create a separate block for each step in the pipeline, with different DOP and buffers:
var dlOptions=new ExecutionDataflowBlockOptions(){
MaxDegreeOfParallelism=5,
BoundedCapacity=5,
CancellationToken=cts.Token
};
var downloader=new TransformBlock<string,string>(
url=>DownloadUrlAsync(url,client),
dlOptions);
var parseOptions = new ExecutionDataflowBlockOptions(){
MaxDegreeOfParallelism=10,
BoundedCapacity=2,
CancellationToken=cts.Token
};
var parser=new ActionBlock<string>(ParseResponse, parseOptions);
downloader.LinkTo(parser, new DataflowLinkOptions{PropagateCompletion=true});
We can post URLs to the downloader now and wait until all of them are parsed. By using different DOP and capacities, we can balance the number of downloader and parser tasks to download only as many URLs as we can parse, and handle e.g. slow downloads or big responses.
foreach(var url in urls)
{
await downloader.SendAsync(url);
}
//Tell the block we're done
downloader.Complete();
//Wait until all urls are parsed
await parser.Completion;
Channels
System.Threading.Channels introduces Go-style channels. These are actually lower-level constructs than a Dataflow block. If Channels had been available back in 2012, the Dataflow classes would probably have been written using them.
An equivalent download method would look like this:
ChannelReader<string> Downloader(ChannelReader<string> urls,HttpClient client,
    int capacity,CancellationToken token=default)
{
    var channel=Channel.CreateBounded<string>(capacity);
    var writer=channel.Writer;
    _ = Task.Run(async ()=>{
        await foreach(var url in urls.ReadAllAsync(token))
        {
            var response=await client.GetStringAsync(url);
            await writer.WriteAsync(response);
        }
    }).ContinueWith(t=>writer.Complete(t.Exception));
    return channel.Reader;
}
That's more verbose but it allows us to do things like create the HttpClient in the method and reuse it. Using a ChannelReader as both input and output may look weird, but now we can chain such methods simply by passing an output reader as input to another method.
The "magic" is that we create a worker task that processes the incoming messages, and return a reader immediately. Whenever a result is produced, it's sent to the channel and on to the next step in the pipeline.
To use multiple worker tasks, we can use Enumerable.Range to start many of them and use Task.WhenAll to close the channel when all the workers are done:
ChannelReader<string> Downloader(ChannelReader<string> urls,HttpClient client,
    int capacity,int dop,CancellationToken token=default)
{
    var channel=Channel.CreateBounded<string>(capacity);
    var writer=channel.Writer;
    var tasks = Enumerable
        .Range(0,dop)
        .Select(_=> Task.Run(async ()=>{
            await foreach(var url in urls.ReadAllAsync(token))
            {
                var response=await client.GetStringAsync(url);
                await writer.WriteAsync(response);
            }
        }));
    _=Task.WhenAll(tasks)
        .ContinueWith(t=>writer.Complete(t.Exception));
    return channel.Reader;
}
Publishers can create their own channel and pass a reader to the Downloader method. They don't need to publish anything in advance either:
var channel=Channel.CreateUnbounded<string>();
var dlReader=Downloader(channel.Reader,client,5,5);
foreach(var url in someUrlList)
{
await channel.Writer.WriteAsync(url);
}
channel.Writer.Complete();
Fluent pipelines
This is so common that someone could create an extension method for it. Eg, to convert an IEnumerable<T> to a ChannelReader<T>, we don't need to wait, as all the results are already available:
static ChannelReader<T> Generate<T>(this IEnumerable<T> source)
{
var channel=Channel.CreateUnbounded<T>();
foreach(var item in source)
{
channel.Writer.TryWrite(item);
}
channel.Writer.Complete();
return channel.Reader;
}
If we convert the Downloader to an extension method too, we can use :
var pipeline= someUrls.Generate()
.Downloader(client,5,5);
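To consume the pipeline's output we can read from the final reader. A minimal sketch, assuming .NET Core 3.0+ where ChannelReader<T>.ReadAllAsync is available, and where ProcessResponse is just a placeholder for whatever handles each downloaded response:
await foreach(var response in pipeline.ReadAllAsync())
{
    ProcessResponse(response); // placeholder for the actual handling logic
}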
It's because ProcessOneItem isn't actually asynchronous: it's marked async but never awaits anything, so it completes synchronously.
You should see the following warning:
This async method lacks 'await' operators and will run synchronously. Consider using the 'await' operator to await non-blocking API calls, or 'await Task.Run(...)' to do CPU-bound work on a background thread.
Once you add await (...) to ProcessOneItem, the returned task will have a unique-ish id.
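For example, a minimal change to the question's ProcessOneItem that introduces a real await (Task.Delay here stands in for the actual asynchronous IO) is enough to make each call return a distinct task:
private static async Task ProcessOneItem(int itemId)
{
    // Stand-in for real asynchronous IO; once something is genuinely awaited,
    // the compiler no longer hands back a cached already-completed Task for every call.
    await Task.Delay(10);
    Console.WriteLine("item: {0}", itemId);
}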
From the documentation of Task.Id property:
Task IDs are assigned on-demand and do not necessarily represent the order in which task instances are created. Note that although collisions are very rare, task identifiers are not guaranteed to be unique.
From what I understand this property is mainly there for debugging purposes. You should probably avoid depending on it for production code.
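If the ids are only needed to log per-item success or failure, one alternative is to let each task carry its own item id and do the logging inside it, which avoids Task.Id (and the WhenAny loop) entirely. A minimal sketch using the question's ProcessOneItem and itemIds:
var tasks = itemIds.Select(async itemId =>
{
    try
    {
        await ProcessOneItem(itemId);
        Console.WriteLine("Successfully processed itemId: {0}", itemId);
    }
    catch (Exception ex)
    {
        Console.WriteLine("Failed to process itemId: {0}. Just logging & eating the exception {1}", itemId, ex);
    }
}).ToList();
await Task.WhenAll(tasks);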
Related
I have some tasks executing in a WhenAll(). I get a semantic error if a task returns an object and calls an async method inside its Run(). The async method fetches some string content from Blob storage, then constructs and returns an object.
Do you know how to solve this issue, while maintaining the batch download done by tasks?
I need a list with those FinalWrapperObjects.
Error message
Cannot convert async lambda expression to delegate type 'Func<FinalWrapperObject>'. An async lambda expression may return void, Task or Task<T>, none of which are convertible to 'Func<FinalWrapperObject>'.
...
List<FinalWrapperObject> finalReturns = new List<FinalWrapperObject>();
List<Task<FinalWrapperObject>> tasks = new List<Task<FinalWrapperObject>>();
var resultsBatch = fetchedObjects.Skip(i).Take(10).ToList();
foreach (var resultBatchItem in resultsBatch)
{
tasks.Add(
new Task<FinalWrapperObject>(async () => //!! errors here on arrow
{
var blobContent = await azureBlobService.GetAsync(resultBatchItem.StoragePath);
return new FinalWrapperObject {
BlobContent = blobContent,
CreationDateTime = resultBatchItem.CreationDateTime
};
})
);
}
FinalWrapperObject[] listFinalWrapperObjects = await Task.WhenAll(tasks);
finalReturns.AddRange(listFinalWrapperObjects);
return finalReturns;
Your code never starts any tasks. Tasks aren't threads anyway. They're a promise that something will complete and maybe produce a value in the future. Some tasks require a thread to run. These are executed using threads that come from a threadpool. Others, eg async IO operations, don't require a thread. Uploading a file is such an IO operation.
Your lambda is asynchronous and already returns a Task, so there's no reason to wrap it in a Task constructor or Task.Run. You can execute it once per item, collect the Tasks in a list and await all of them. That's the bare-bones way:
async Task<FinalWrapperObject> UploadItemAsync(BatchItem resultBatchItem)
{
var blobContent = await azureBlobService.GetAsync(resultBatchItem.StoragePath);
return new FinalWrapperObject {
BlobContent = blobContent,
CreationDateTime = resultBatchItem.CreationDateTime
};
}
...
var tasks=resultsBatch.Select(UploadItemAsync);
var results=await Task.WhenAll(tasks);
Using TPL Dataflow
A better option would be to use the TPL Dataflow classes to upload items concurrently and even construct a pipeline from processing blocks.
var options= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
var results=new BufferBlock<FinalWrapperObject>();
var uploader=new TransformBlock<BatchItem,FinalWrapperObject>(UploadItemAsync,options);
uploader.LinkTo(results);
foreach(var item in fetchedObjects)
{
uploader.Post(item);
}
uploader.Complete();
await uploader.Completion;
By default, a block only processes one message at a time. Using MaxDegreeOfParallelism = 10 we're telling it to process 10 items concurrently. This code will upload 10 items concurrently at a time, as long as there are items to post to the uploader block.
The results are forwarded to the results BufferBlock. The items can be extracted with TryReceiveAll:
IList<FinalWrapperObject> items;
results.TryReceiveAll(out items);
Dataflow blocks can be combined into a pipeline. You could have a block that loads items from disk, another to upload them and a final one that stores the response to another file or database:
var dop10= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10,
BoundedCapacity=4
};
var bounded= new ExecutionDataflowBlockOptions
{
BoundedCapacity=4
};
var loader=new TransformBlock<FileInfo,BatchItem>(LoadFile,bounded);
var uploader=new TransformBlock<BatchItem,FinalWrapperObject>(UploadItemAsync,dop10);
var dbLogger=new ActionBlock<FinalWrapperObject>(result => StoreResultAsync(result), bounded); // StoreResultAsync is a placeholder for whatever persists the result
var linkOptions=new DataflowLinkOptions {PropagateCompletion=true};
loader.LinkTo(uploader,linkOptions);
uploader.LinkTo(dbLogger,linkOptions);
var folder=new DirectoryInfo(rootPath);
foreach(var item in folder.EnumerateFiles())
{
await loader.SendAsync(item);
}
loader.Complete();
await dbLogger.Completion;
In this case, all files in a folder are posted to the loader block which loads files one by one and forwards a BatchItem. The uploader uploads the file and the results are stored by dbLogger. In the end, we tell loader we're finished and wait for all items to get processed all the way to the end with await dbLogger.Completion.
The BoundedCapacity is used to put a limit on how many items can be held at each block's input buffer. This prevents loading all files into memory.
I am trying to understand parallel programming and I would like my async methods to run on multiple threads. I have written something but it does not work like I thought it should.
Code
public static async Task Main(string[] args)
{
var listAfterParallel = RunParallel(); // Running this function to return tasks
await Task.WhenAll(listAfterParallel); // I want the program execution to stop until all tasks are returned or completed
Console.WriteLine("After Parallel Loop"); // But currently when I run the program, "After Parallel Loop" is printed first
Console.ReadLine();
}
public static async Task<ConcurrentBag<string>> RunParallel()
{
var client = new System.Net.Http.HttpClient();
client.DefaultRequestHeaders.Add("Accept", "application/json");
client.BaseAddress = new Uri("https://jsonplaceholder.typicode.com");
var list = new List<int>();
var listResults = new ConcurrentBag<string>();
for (int i = 1; i < 5; i++)
{
list.Add(i);
}
// Parallel for each branch to run await commands on multiple threads.
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, async (index) =>
{
var response = await client.GetAsync("posts/" + index);
var contents = await response.Content.ReadAsStringAsync();
listResults.Add(contents);
Console.WriteLine(contents);
});
return listResults;
}
I would like RunParallel function to complete before "After parallel loop" is printed. Also I want my get posts method to run on multiple threads.
Any help would be appreciated!
What's happening here is that you're never waiting for the Parallel.ForEach block to complete - you're just returning the bag that it will eventually pump into. Because Parallel.ForEach expects Action delegates, you've created a lambda which returns void rather than Task. While async void methods are valid, they return to the caller as soon as they await an incomplete Task and continue their remaining work on another thread, so Parallel.ForEach thinks each handler is done even though it has kicked that remaining work off elsewhere.
Instead, use a synchronous method here:
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, index =>
{
var response = client.GetAsync("posts/" + index).Result;
var contents = response.Content.ReadAsStringAsync().Result;
listResults.Add(contents);
Console.WriteLine(contents);
});
If you absolutely must use await inside, wrap it in Task.Run(...).GetAwaiter().GetResult():
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, index => Task.Run(async () =>
{
var response = await client.GetAsync("posts/" + index);
var contents = await response.Content.ReadAsStringAsync();
listResults.Add(contents);
Console.WriteLine(contents);
}).GetAwaiter().GetResult());
In this case, however, Task.Run generally goes to a new thread, so we've subverted most of the control of Parallel.ForEach; it's better to use async all the way down:
var tasks = list.Select(async (index) => {
var response = await client.GetAsync("posts/" + index);
var contents = await response.Content.ReadAsStringAsync();
listResults.Add(contents);
Console.WriteLine(contents);
});
await Task.WhenAll(tasks);
Since Select expects a Func<T, TResult>, it will interpret an async lambda with no return value as an async Task method instead of async void, and thus give us something we can explicitly await.
Take a look at this: There Is No Thread
When you are making multiple concurrent web requests it's not your CPU that is doing the hard work. It's the CPU of the web server that is serving your requests. Your CPU is doing nothing during this time. It's not in a special "Wait-state" or something. The hardware inside your box that is working is your network card, which writes data to your RAM. When the response is received, your CPU is notified about the arrived data so it can do something with it.
You need parallelism when you have heavy work to do inside your box, not when you want the heavy work to be done by the external world. From the point of view of your CPU, even your hard disk is part of the external world. So everything that applies to web requests also applies to requests targeting filesystems and databases. These workloads are called I/O bound, to distinguish them from the so-called CPU bound workloads.
For I/O bound workloads the tool offered by the .NET platform is the asynchronous Task. There are multiple APIs throughout the libraries that return Task objects. To achieve concurrency you typically start multiple tasks and then await them with Task.WhenAll. There are also more advanced tools like the TPL Dataflow library, which is built on top of Tasks. It offers capabilities like buffering, batching, configuring the maximum degree of concurrency, and much more.
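For example, a minimal sketch of concurrent I/O-bound work without any extra threads (the URLs are just placeholders):
var client = new HttpClient();
var urls = new[] { "https://example.com/a", "https://example.com/b", "https://example.com/c" };
// Start all requests up front; no thread is blocked while they are in flight.
var tasks = urls.Select(url => client.GetStringAsync(url)).ToList();
// A single await gathers all the responses once every request has completed.
string[] bodies = await Task.WhenAll(tasks);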
I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend... what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4, 4);
foreach (var service in RunData.Demand)
{
    await sem.WaitAsync();
    Task t = Task.Run(async () =>
    {
        var availabilityResponse = await client.QueryAvailability(service);
        // do your other stuff here with the result of QueryAvailability
    });
    t.ContinueWith(_ => sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync), which subtracts one from the count. Calling Release adds one back to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
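For the question's scenario, a possible usage (assuming RunData.Demand and client from the question) that keeps at most 4 items in flight per partition set would be:
await RunData.Demand.ForEachAsync(4, async service =>
{
    var availabilityResponse = await client.QueryAvailability(service);
    // Do some other stuff with the response here
});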
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}
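Usage against the question's data would look something like the following sketch (RunData.Demand and client are the question's objects):
await RunData.Demand.RunParallelAsync(async service =>
{
    var availabilityResponse = await client.QueryAvailability(service);
    // Do some other stuff with the response here
}, maxParallel: 4);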
Right now, I've got a C# program that performs the following steps on a recurring basis:
Grab current list of tasks from the database
Using Parallel.ForEach(), do work for each task
However, some of these tasks are very long-running. This delays the processing of other pending tasks because we only look for new ones at the start of the program.
Now, I know that modifying the collection being iterated over isn't possible (right?), but is there some equivalent functionality in the C# Parallel framework that would allow me to add work to the list while also processing items in the list?
Generally speaking, you're right that modifying a collection while iterating it is not allowed. But there are other approaches you could be using:
Use ActionBlock<T> from TPL Dataflow. The code could look something like:
var actionBlock = new ActionBlock<MyTask>(
task => DoWorkForTask(task),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
while (true)
{
var tasks = GrabCurrentListOfTasks();
foreach (var task in tasks)
{
actionBlock.Post(task);
await Task.Delay(someShortDelay);
// or use Thread.Sleep() if you don't want to use async
}
}
Use BlockingCollection<T>, which can be modified while consuming items from it, along with GetConsumingPartitioner() from ParallelExtensionsExtras to make it work with Parallel.ForEach():
var collection = new BlockingCollection<MyTask>();
Task.Run(async () =>
{
while (true)
{
var tasks = GrabCurrentListOfTasks();
foreach (var task in tasks)
{
collection.Add(task);
await Task.Delay(someShortDelay);
}
}
});
Parallel.ForEach(collection.GetConsumingPartitioner(), task => DoWorkForTask(task));
Here is an example of an approach you could try. I think you want to get away from Parallel.ForEaching and do something with asynchronous programming instead because you need to retrieve results as they finish, rather than in discrete chunks that could conceivably contain both long running tasks and tasks that finish very quickly.
This approach uses a simple sequential loop to retrieve results from a list of asynchronous tasks. In this case, you should be safe to use a simple non-thread safe mutable list because all of the mutation of the list happens sequentially in the same thread.
Note that this approach uses Task.WhenAny in a loop which isn't very efficient for large task lists and you should consider an alternative approach in that case. (See this blog: http://blogs.msdn.com/b/pfxteam/archive/2012/08/02/processing-tasks-as-they-complete.aspx)
This example is based on: https://msdn.microsoft.com/en-GB/library/jj155756.aspx
private async Task<ProcessResult> processTask(ProcessTask task)
{
// do something intensive with data
}
private IEnumerable<ProcessTask> GetOutstandingTasks()
{
// retrieve some tasks from db
}
private async Task ProcessAllData()
{
List<Task<ProcessResult>> taskQueue =
GetOutstandingTasks()
.Select(tsk => processTask(tsk))
.ToList(); // grab initial task queue
while(taskQueue.Any()) // iterate while tasks need completing
{
Task<ProcessResult> firstFinishedTask = await Task.WhenAny(taskQueue); // get first to finish
taskQueue.Remove(firstFinishedTask); // remove the one that finished
ProcessResult result = await firstFinishedTask; // get the result
// do something with task result
taskQueue.AddRange(GetOutstandingTasks().Select(tsk => processTask(tsk))); // add more tasks that need performing
}
}
I am trying to consume a service reference, making multiple requests at the same time using a task scheduler. The service includes a synchronous and an asynchronous function that returns a result set. I am a bit confused; I have a couple of initial questions, and then I will share how far I got with each. I am using some logging, the concurrency visualizer, and Fiddler to investigate. Ultimately I want to use a reactive scheduler to make as many requests as possible.
1) Should I use the async function to make all the requests?
2) If I were to use the synchronous function in multiple tasks what would be the limited resources that would potentially starve my thread count?
Here is what I have so far:
var myScheduler = new myScheduler();
var myFactory = new Factory(myScheduler);
var myClientProxy = new ClientProxy();
var tasks = new List<Task<Response>>();
foreach( var request in Requests )
{
var localrequest = request;
tasks.Add( myFactory.StartNew( () =>
{
// log stuff
return client.GetResponsesAsync( localTransaction.Request );
// log some more stuff
}).Unwrap() );
}
Task.WaitAll( tasks.ToArray() );
// process all the requests after they are done
This runs, but according to Fiddler it just tries to do all of the requests at once. It could be the scheduler, but I trust that more than I do the above.
I have also tried to implement it without the Unwrap command and instead using an async await delegate, and it does the same thing. I have also tried referencing .Result, and that seems to do it sequentially. Using the asynchronous service function call with the scheduler/factory, it only gets up to about 20 simultaneous requests at the same time per client.
Yes. It will allow your application to scale better by using fewer threads to accomplish more.
Threads. When you initiate a synchronous operation that is inherently asynchronous (e.g. I/O) you have a blocked thread waiting for the operation to complete. You could however be using this thread in the meantime to execute CPU bound operations.
The simplest way to limit the amount of concurrent requests is to use a SemaphoreSlim which allows to asynchronously wait to enter it:
async Task ConsumeService()
{
var client = new ClientProxy();
var semaphore = new SemaphoreSlim(100);
var tasks = Requests.Select(async request =>
{
await semaphore.WaitAsync();
try
{
return await client.GetResponsesAsync(request);
}
finally
{
semaphore.Release();
}
}).ToList();
await Task.WhenAll(tasks);
// TODO: Process responses...
}
Regardless of how you are calling the WCF service, whether it is an async call or a synchronous one, you will be bound by the WCF serviceThrottling limits. You should look at these settings and possibly adjust them higher (if you have them set to low values for some reason); in .NET 4 the defaults are pretty good, however in older versions of the .NET framework these defaults were much more conservative.
.NET 4.0
MaxConcurrentSessions: default is 100 * ProcessorCount
MaxConcurrentCalls: default is 16 * ProcessorCount
MaxConcurrentInstances: default is MaxConcurrentCalls+MaxConcurrentSessions
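If you control the service host, these limits can also be raised in code rather than in config. A sketch assuming a self-hosted service (MyService is a placeholder type), using ServiceThrottlingBehavior from System.ServiceModel.Description:
var host = new ServiceHost(typeof(MyService)); // MyService is a placeholder service type
var throttling = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
if (throttling == null)
{
    throttling = new ServiceThrottlingBehavior();
    host.Description.Behaviors.Add(throttling);
}
// Mirror the .NET 4 defaults explicitly, or raise them further if needed
throttling.MaxConcurrentCalls = 16 * Environment.ProcessorCount;
throttling.MaxConcurrentSessions = 100 * Environment.ProcessorCount;
throttling.MaxConcurrentInstances = throttling.MaxConcurrentCalls + throttling.MaxConcurrentSessions;
host.Open();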
1.) Yes.
2.) Yes.
If you want to control the number of simultaneous requests you can try using Stephen Toub's ForEachAsync method. It allows you to control how many tasks are processed at the same time.
public static class Extensions
{
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
}
void Main()
{
var myClientProxy = new ClientProxy();
var responses = new List<Response>();
// Max 10 concurrent requests
Requests.ForEachAsync<Request>(10, async (r) =>
{
var response = await myClientProxy.GetResponsesAsync(r);
responses.Add(response);
}).Wait();
}