How to run API calls and execute their responses in parallel in C#?

I come from a Ruby background and have a project that needs to be migrated to C#. It will make thousands of API service calls. In Ruby I use Typhoeus Hydra to run the requests in parallel and process the responses in parallel.
NOTE: each API call is independent; there are no dependencies between calls.
The Ruby template looks like this:
# typhoeus gem used to make api calls
QUEUE = Typhoeus::Hydra.new
(1..100).each do |val|
  request = Typhoeus::Request.new("http://api.com/?value=#{val}")
  request.on_complete do |response|
    # code to be executed after each call
  end
  QUEUE.queue(request)
end
# running the queue runs 100 api calls in parallel and executes each complete block in parallel
QUEUE.run
I have a rough idea that I need to work with async/await and the TPL in C#, but I need some good examples to get started.
Thanks in advance.

You should have a look at the Task Parallel Library (Parallel.ForEach).
You can make the requests like this:
Parallel.ForEach(Enumerable.Range(1, 100), (val) =>
{
    // make a synchronous api call
    var webClient = new WebClient();
    var result = webClient.DownloadString(string.Format("http://api.com/?value={0}", val));
    // work on the result
});

Parallel processing is an option; however, it blocks threads unnecessarily. Since your operation is I/O-bound (hitting an HTTP API), asynchronous concurrency is a better option.
First, you'd define your "download and process" operation:
private static HttpClient client = new HttpClient();

private static async Task DownloadAndProcessAsync(string value)
{
    var response = await client.GetStringAsync($"http://api.com/?value={value}");
    // Process response.
}
If you want to run them all concurrently, then a simple Task.WhenAll would suffice:
var source = Enumerable.Range(1, 100);
var tasks = source.Select(v => DownloadAndProcessAsync(v.ToString()));
await Task.WhenAll(tasks);
For more information about async/await, see my intro to async blog post (and the followup resources at the end of it).

Related

Are these webrequests actually concurrent?

I have a UrlList of only 4 URLs which I want to use to make 4 concurrent requests. Does the code below truly make 4 requests which start at the same time?
My testing appears to show that it does, but am I correct in thinking that there will actually be 4 requests retrieving data from the URL target at the same time or does it just appear that way?
static void Main(string[] args)
{
    var t = Do_TaskWhenAll();
    t.Wait();
}

public static async Task Do_TaskWhenAll()
{
    var downloadTasksQuery = from url in UrlList select Run(url);
    var downloadTasks = downloadTasksQuery.ToArray();
    Results = await Task.WhenAll(downloadTasks);
}

public static async Task<string> Run(string url)
{
    var client = new WebClient();
    AddHeaders(client);
    var content = await client.DownloadStringTaskAsync(new Uri(url));
    return content;
}
Correct, when ToArray is called, the enumerable downloadTasksQuery will yield a task for every URL, running your web requests concurrently.
await Task.WhenAll ensures your task completes only when all web requests have completed.
You can rewrite your code to be less verbose while doing effectively the same thing, like so:
public static async Task Do_TaskWhenAll()
{
    var downloadTasks = from url in UrlList select Run(url);
    Results = await Task.WhenAll(downloadTasks);
}
There's no need for ToArray because Task.WhenAll will enumerate your enumerable for you.
I advise you to use HttpClient instead of WebClient. With HttpClient, you won't have to create a new instance of the client for each concurrent request; it lets you reuse the same client for multiple concurrent requests.
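As a minimal sketch (my own illustration, not part of the original answer), the Run method from the question could be rewritten against a single shared HttpClient:
// One HttpClient instance shared by all concurrent requests.
private static readonly HttpClient client = new HttpClient();

public static async Task<string> Run(string url)
{
    // Headers that previously went through AddHeaders(client) can be set
    // once on client.DefaultRequestHeaders instead of per request.
    return await client.GetStringAsync(new Uri(url));
}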
The short answer is yes: if you generate multiple Tasks without awaiting each one individually, they can run simultaneously, as long as they are truly asynchronous.
When DownloadStringTaskAsync is awaited, a Task is returned from your Run method, allowing the next iteration to occur whilst waiting for the response.
So the next HTTP request is allowed to be sent without waiting for the first to complete.
As an aside, your method can be written more concisely:
public static async Task Do_TaskWhenAll()
{
    Results = await Task.WhenAll(UrlList.Select(Run));
}
Task.WhenAll has an overload that accepts IEnumerable<Task<TResult>> which is returned from UrlList.Select(Run).
No, there is no guarantee that your requests will be executed in parallel, or immediately.
Starting a task merely queues it to the thread pool. If all of the pool's threads are occupied, that task will necessarily wait until a thread frees up.
In your case, since there are a relatively large number of threads available in the pool, and you are queueing only a small number of items, the pool has no problem servicing them as they come in. The more tasks you queue at once, the more likely this is to change.
If you truly need concurrency, you need to be aware of what the thread pool size is, and how busy it is. The ThreadPool class will help you to manage this.
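As a rough sketch (my own illustration, not from the original answer), the pool's limits can be inspected and the minimum raised so queued work starts sooner:
// Inspect the current thread-pool limits.
ThreadPool.GetMinThreads(out int minWorker, out int minIo);
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
Console.WriteLine($"Worker threads: {minWorker}-{maxWorker}, IO threads: {minIo}-{maxIo}");

// Optionally raise the minimum so bursts of queued tasks are serviced sooner.
ThreadPool.SetMinThreads(Math.Max(minWorker, 16), minIo);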

Performance of multiple awaits compared to Task.WhenAll

General Information
I want to improve the performance of a program issuing multiple HTTP requests to the same external API endpoint. Therefore, I have created a console application to perform some tests. The method GetPostAsync sends an asynchronous HTTP request to the external API and returns the result as a string.
private static async Task<string> GetPostAsync(int id)
{
    var client = new HttpClient();
    var response = await client.GetAsync($"https://jsonplaceholder.typicode.com/posts/{id}");
    return await response.Content.ReadAsStringAsync();
}
Additionally, I have implemented the methods below to compare the execution time of multiple awaits and Task.WhenAll.
private static async Task TaskWhenAll(IEnumerable<int> postIds)
{
    var tasks = postIds.Select(GetPostAsync);
    await Task.WhenAll(tasks);
}

private static async Task MultipleAwait(IEnumerable<int> postIds)
{
    foreach (var postId in postIds)
    {
        await GetPostAsync(postId);
    }
}
Test Results
Using the integrated Stopwatch class, I have measured the timings of the two methods and interestingly enough, the approach using Task.WhenAll performed way better than its counterpart:
Issue 50 HTTP requests
TaskWhenAll: ~650ms
MultipleAwait: ~4500ms
Why is the method using Task.WhenAll so much faster, and are there any negative effects (e.g. exception handling) when choosing this approach over the other?
Why is the method using Task.WhenAll so much faster
It is faster because you are not awaiting each GetPostAsync call individually. Every time client.GetAsync($"https://jsonplaceholder.typicode.com/posts/{id}") is awaited, control returns to the caller, which can then start another HTTP request. Since an HTTP request takes much longer than creating the client, you effectively get concurrency: multiple HTTP requests are in flight at the same time. Task.WhenAll just creates a single suspension point and waits for all of the tasks to finish.
With the multiple-await approach, you make the HTTP requests sequentially, one by one, by awaiting GetPostAsync(postId) inside the foreach loop. You start a task, but you also create a suspension point and wait for it to finish before starting the next one.
are there any negative effects (e.g. exception handling) when
choosing this approach over the other?
There are no real negative effects. With async/await, exception handling is done as usual with a try-catch block. Task.WhenAll will aggregate the exceptions from every task that ends up in the Faulted state.
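As a hedged illustration (not part of the original answer), observing the failures around Task.WhenAll could look like this; awaiting the combined task rethrows only the first exception, while the task's Exception property exposes the full AggregateException:
var tasks = postIds.Select(GetPostAsync).ToList();
var whenAll = Task.WhenAll(tasks);
try
{
    await whenAll;
}
catch (Exception)
{
    // The combined task's AggregateException holds the exception
    // from every faulted request, not just the first one.
    foreach (var ex in whenAll.Exception.InnerExceptions)
    {
        Console.WriteLine(ex.Message);
    }
}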

What's the "right way" to use HttpClient synchronously?

I used quote marks around "right way" because I'm already well aware that the right way to use an asynchronous API is to simply let the asynchronous behavior propagate throughout the entire call chain. That's not an option here.
I'm dealing with a very large and complicated system designed specifically to do batch processing synchronously in a loop.
The reason why suddenly I'm using HttpClient is because prior to now all data for the batch processing was gathered from a SQL database, and now we're adding a Web API call to the mix.
Yes, we're calling a Web API in a synchronously executing loop. I know. Rewriting the whole thing to be async just isn't an option. This is actually what we want to do. (We're minimizing the number of API calls as much as possible)
I actually did try to propagate the async behavior up the call chain, but then I found myself 50 files deep in changes, still with hundreds of compiler errors to resolve, and lost all hope. I am defeated.
So then, back to the question, given Microsoft's recommendation to never use WebRequest for new development and to instead use HttpClient, which offers only an asynchronous API, what am I to do?
Here is some pseudo-code of what I'm doing...
foreach (var thingToProcess in thingsToProcess)
{
thingToProcess.ProcessStuff(); // This makes an API call
}
How do I implement ProcessStuff()?
My first implementation looked like this
public void ProcessStuff()
{
    var apiResponse = myHttpClient // this is an instance of HttpClient
        .GetAsync(someUrl)
        .Result;
    // do some stuff with the apiResponse
}
I was told however, that calling .Result in this manner can result in deadlocks when it's called from something like ASP.NET due to the synchronization context.
Guess what, this batch process will be kicked off from an ASP.NET controller. Yes, again, I know, this is silly. When it runs from ASP.NET it's only "batch processing" one item instead of the whole batch, but I digress, it still gets called from ASP.NET and thus I'm concerned about deadlocks.
So what's the "right way" to handle this?
Try the following:
var task = Task.Run(() => myHttpClient.GetAsync(someUrl));
task.Wait();
var response = task.Result;
Use it only when you cannot use an async method.
This method is completely deadlock free, as mentioned on the MSDN blog:
ASP.Net – Do not use Task.Result in main context.
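Applied to the ProcessStuff method from the question, that might look like the following sketch (myHttpClient and someUrl come from the question):
public void ProcessStuff()
{
    // Task.Run moves the call off the ASP.NET synchronization context,
    // so blocking on the result here does not deadlock.
    var apiResponse = Task.Run(() => myHttpClient.GetAsync(someUrl)).Result;
    // do some stuff with the apiResponse
}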
For anyone coming across this now, .NET 5.0 has added a synchronous Send method to HttpClient. https://github.com/dotnet/runtime/pull/34948
You can therefore use this instead of SendAsync. For example:
public string GetValue()
{
    var client = new HttpClient();
    var webRequest = new HttpRequestMessage(HttpMethod.Post, "http://your-api.com")
    {
        Content = new StringContent("{ 'some': 'value' }", Encoding.UTF8, "application/json")
    };

    var response = client.Send(webRequest);

    using var reader = new StreamReader(response.Content.ReadAsStream());
    return reader.ReadToEnd();
}
This code is just a simplified example; it's not production-ready.
You could also look at using Nito.AsyncEx, which is a NuGet package. I've heard of issues with using Task.Run(), and this addresses that. Here's a link to the API docs:
http://dotnetapis.com/pkg/Nito.AsyncEx/4.0.1/net45/doc/Nito.AsyncEx.AsyncContext
And here's an example for using an async method in a console app:
https://blog.stephencleary.com/2012/02/async-console-programs.html
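A minimal sketch of that approach, assuming the Nito.AsyncEx package is installed (AsyncContext.Run pumps a single-threaded context until the async work completes):
using System;
using System.Net.Http;
using Nito.AsyncEx;

class Program
{
    static readonly HttpClient client = new HttpClient();

    static void Main()
    {
        // Runs the async call to completion on a single-threaded context,
        // avoiding the ASP.NET-style deadlock you get with .Result.
        var body = AsyncContext.Run(() => client.GetStringAsync("http://your-api.com"));
        Console.WriteLine(body);
    }
}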

Sending multiple requests to a server using multithreading

I have a task where I form thousands of requests which are later sent to a server. The server returns the response for each request and that response is then dumped to an output file line by line.
The pseudo code goes like this:
//requests contains thousands of requests to be sent to the server
string[] requests = GetRequestsString();
foreach (string request in requests)
{
    string response = MakeWebRequest(request);
    ParseandDump(response);
}
Now, as can be seen, the server is handling my requests one by one. I want to make this entire process fast. The server in question is capable of handling multiple requests at a time. I want to apply multi-threading, send, let's say, 4 requests to the server at a time, and dump the responses in the same thread.
Can you please give me some pointers to possible approaches?
You can take advantage of Task from .NET 4.0 and the new HttpClient. The sample code below shows how to send the requests in parallel and then dump each response in a continuation using ContinueWith:
var httpClient = new HttpClient();
var tasks = requests.Select(r => httpClient.GetStringAsync(r).ContinueWith(t =>
{
    ParseandDump(t.Result);
}));
Task uses the ThreadPool under the hood, so you don't need to specify how many threads should be used; the ThreadPool will manage this for you in an optimized way.
The easiest way would be to use Parallel.ForEach like this:
string[] requests = GetRequestsString();
Parallel.ForEach(requests, request => ParseandDump(MakeWebRequest(request)));
.NET Framework 4.0 or greater is required to use Parallel.
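Since the question asks for about four requests at a time, here is a rough sketch (my own addition) that caps the concurrency with ParallelOptions.MaxDegreeOfParallelism:
string[] requests = GetRequestsString();

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(requests, options, request =>
{
    // At most four requests are in flight at any moment.
    ParseandDump(MakeWebRequest(request));
});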
I think this could be done with a producer-consumer pattern. You could use a ConcurrentQueue (from the namespace System.Collections.Concurrent) as a shared resource between the many parallel web requests and the dumping thread.
The pseudo code would be something like:
var requests = GetRequestsString();
var queue = new ConcurrentQueue<string>();

// Producer: run the web requests in parallel and enqueue the responses.
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(requests, currentRequest =>
    {
        queue.Enqueue(MakeWebRequest(currentRequest));
    });
});

// Consumer: dump responses on a single thread.
Task.Factory.StartNew(() =>
{
    while (true)
    {
        string response;
        if (queue.TryDequeue(out response))
        {
            ParseandDump(response);
        }
    }
});
Maybe a BlockingCollection might serve you even better, depending on how you want to go about synchronizing the threads to signal the end of incoming requests.
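A rough sketch of that variant (my own illustration, using the same GetRequestsString, MakeWebRequest, and ParseandDump helpers): CompleteAdding signals the consumer that no more responses are coming, and GetConsumingEnumerable blocks until items arrive or the collection is marked complete.
var requests = GetRequestsString();
var responses = new BlockingCollection<string>();

// Producer: fetch responses in parallel, then signal completion.
var producer = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(requests, currentRequest =>
    {
        responses.Add(MakeWebRequest(currentRequest));
    });
    responses.CompleteAdding();
});

// Consumer: dump responses on one thread until the producer is done.
var consumer = Task.Factory.StartNew(() =>
{
    foreach (var response in responses.GetConsumingEnumerable())
    {
        ParseandDump(response);
    }
});

Task.WaitAll(producer, consumer);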

Parallel batch file download from Amazon S3 using AWS S3 SDK for .NET

Problem: I would like to download 100 files in parallel from AWS S3 using their .NET SDK. The downloaded content should be stored in 100 memory streams (the files are small enough, and I can take it from there). I am getting confused between Task, IAsyncResult, Parallel.*, and other different approaches in .NET 4.0.
If I try to solve the problem myself, off the top of my head I imagine something like this pseudocode:
(edited to add types to some variables)
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
AmazonS3 _s3 = ...;
IEnumerable<GetObjectRequest> requestObjects = ...;
// Prepare to launch requests
var asyncRequests = from rq in requestObjects
                    select _s3.BeginGetObject(rq, null, null);
// Launch requests
var asyncRequestsLaunched = asyncRequests.ToList();
// Prepare to finish requests
var responses = from rq in asyncRequestsLaunched
                select _s3.EndGetObject(rq);
// Finish requests
var actualResponses = responses.ToList();
// Fetch data
var data = actualResponses.Select(rp =>
{
    var ms = new MemoryStream();
    rp.ResponseStream.CopyTo(ms);
    return ms;
});
This code launches 100 requests in parallel, which is good. However, there are two problems:
The last statement will download the files serially, not in parallel. There doesn't seem to be a BeginCopyTo()/EndCopyTo() method on Stream...
The preceding statement will not let go until all requests have responded. In other words none of the files will start downloading until all of them start.
So here I start thinking I am heading down the wrong path...
Help?
It's probably easier if you break the operation down into a method that will handle one request asynchronously and then call it 100 times.
To start, let's identify the final result you want. Since what you'll be working with is a MemoryStream it means that you'll want to return a Task<MemoryStream> from your method. The signature will look something like this:
static Task<MemoryStream> GetMemoryStreamAsync(AmazonS3 s3,
    GetObjectRequest request)
Because your AmazonS3 object implements the Asynchronous Design Pattern (Begin/End method pairs), you can use the FromAsync method on the TaskFactory class to generate a Task<T> from that pair, like so:
static Task<MemoryStream> GetMemoryStreamAsync(AmazonS3 s3,
    GetObjectRequest request)
{
    Task<GetObjectResponse> response =
        Task.Factory.FromAsync<GetObjectRequest, GetObjectResponse>(
            s3.BeginGetObject, s3.EndGetObject, request, null);
    // But what goes here?
So you're already in a good place, you have a Task<T> which you can wait on or get a callback on when the call completes. However, you need to somehow translate the GetObjectResponse returned from the call to Task<GetObjectResponse> into a MemoryStream.
To that end, you want to use the ContinueWith method on the Task<T> class. Think of it as the asynchronous version of the Select method on the Enumerable class: it's a projection into another Task<T>, except that each time you call ContinueWith you are potentially creating a new Task that runs that section of code.
With that, your method looks like the following:
static Task<MemoryStream> GetMemoryStreamAsync(AmazonS3 s3,
    GetObjectRequest request)
{
    // Start the task of downloading.
    Task<GetObjectResponse> response =
        Task.Factory.FromAsync<GetObjectRequest, GetObjectResponse>(
            s3.BeginGetObject, s3.EndGetObject, request, null
        );

    // Translate: copy the response stream into a MemoryStream.
    Task<MemoryStream> translation = response.ContinueWith(t =>
    {
        using (GetObjectResponse resp = t.Result)
        {
            var ms = new MemoryStream();
            resp.ResponseStream.CopyTo(ms);
            return ms;
        }
    });

    // Return the full task chain.
    return translation;
}
Note that in the above you can pass TaskContinuationOptions.ExecuteSynchronously to the overload of ContinueWith if the continuation does minimal work (I can't tell; the responses might be huge). When the work is so small that starting a new task to complete it would be detrimental, ExecuteSynchronously avoids wasting time creating new tasks for minimal operations.
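For reference, that overload is just an extra argument on the continuation shown above; a sketch:
// Run the continuation on the thread that completed the download,
// instead of scheduling a new task for the small copy operation.
Task<MemoryStream> translation = response.ContinueWith(t =>
{
    using (GetObjectResponse resp = t.Result)
    {
        var ms = new MemoryStream();
        resp.ResponseStream.CopyTo(ms);
        return ms;
    }
}, TaskContinuationOptions.ExecuteSynchronously);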
Now that you have the method that can translate one request into a Task<MemoryStream>, creating a wrapper that will process any number of them is simple:
static Task<MemoryStream>[] GetMemoryStreamsAsync(AmazonS3 s3,
    IEnumerable<GetObjectRequest> requests)
{
    // Just call Select on the requests, passing our translation into
    // a Task<MemoryStream>.
    // Also, materialize here, so that the tasks are "hot" when
    // returned.
    return requests.Select(r => GetMemoryStreamAsync(s3, r)).ToArray();
}
In the above, you simply take a sequence of your GetObjectRequest instances and it will return an array of Task<MemoryStream>. The fact that it returns a materialized sequence is important. If you don't materialize it before returning, then the tasks will not be created until the sequence is iterated through.
Of course, if you want this behavior, then by all means, just remove the call to .ToArray(), have the method return IEnumerable<Task<MemoryStream>> and then the requests will be made as you iterate through the tasks.
From there, you can process them one at a time (using the Task.WaitAny method in a loop) or wait for all of them to be completed (by calling the Task.WaitAll method). An example of the latter would be:
static IList<MemoryStream> GetMemoryStreams(AmazonS3 s3,
    IEnumerable<GetObjectRequest> requests)
{
    Task<MemoryStream>[] tasks = GetMemoryStreamsAsync(s3, requests);
    Task.WaitAll(tasks);
    return tasks.Select(t => t.Result).ToList();
}
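For the former option (handling each stream as soon as it finishes), a rough Task.WaitAny sketch over the same tasks might look like this:
static void ProcessAsTheyComplete(AmazonS3 s3,
    IEnumerable<GetObjectRequest> requests)
{
    var pending = GetMemoryStreamsAsync(s3, requests).ToList();
    while (pending.Count > 0)
    {
        // Block until any one of the remaining downloads finishes.
        int index = Task.WaitAny(pending.ToArray());
        MemoryStream ms = pending[index].Result;
        pending.RemoveAt(index);
        // ... process ms here ...
    }
}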
Also, it should be mentioned that this is a pretty good fit for the Reactive Extensions framework, as it is very well suited to an IObservable<T> implementation.
You can use Nexus.Tasks from the Nexus.Core package.
var response = await fileNames
.WhenAll(item => GetObject(item, cancellationToken), 10, cancellationToken)
.ConfigureAwait(false);
