Inside a C# project I'm making some calls to a web API. The thing is that I'm doing them within a loop in a method. Usually there are not that many, but even so I was thinking of taking advantage of parallelism.
What I am trying so far is:
public void DeployView(int itemId, string itemCode, int environmentTypeId)
{
using (var client = new HttpClient())
{
client.BaseAddress = new Uri(ConfigurationManager.AppSettings["ApiUrl"]);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
var agents = _agentRepository.GetAgentsByitemId(itemId);
var tasks = agents.Select(async a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
var response = await client.PostAsJsonAsync("api/postView", viewPostRequest);
});
Task.WhenAll(tasks);
}
}
But I wonder if that's the correct path, or whether I should try to parallelize the whole DeployView (i.e. even before using the HttpClient).
Now that I see it posted, I reckon I could also just remove the response variable and do the await without assigning the result to anything.
Thanks
Usually there is no need to parallelize the requests - one thread making async requests should be enough (even if you have hundreds of requests). Consider this code:
var tasks = agents.Select(a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
return client.PostAsJsonAsync("api/postView", viewPostRequest);
}).ToList(); // materialize the query, so the tasks are created once and not re-created on each enumeration
//now tasks is List<Task<HttpResponseMessage>>
await Task.WhenAll(tasks);
//now all the responses are available
foreach (HttpResponseMessage response in tasks.Select(p => p.Result))
{
    //do something with the response
}
However, you can utilize parallelism when processing the responses. Instead of the above 'foreach' loop you may use:
Parallel.ForEach(tasks.Select(p => p.Result), response => ProcessResponse(response));
But IMO, this is the best utilization of asynchrony and parallelism:
var tasks = agents.Select(async a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
var response = await client.PostAsJsonAsync("api/postView", viewPostRequest);
ProcessResponse(response);
});
await Task.WhenAll(tasks);
There is a major difference between the first and last examples:
In the first one, you have one thread launching async requests, waits (non blocking) for all of them to return, and only then processing them.
In the second example, you attach a continuation to each Task. That way, every response gets processed as soon as it arrives. Assuming the current TaskScheduler allows parallel (multithreaded) execution of Tasks, no response remains idle as in the first example.
*Edit - if you do decide to do it in parallel, you can use just one instance of HttpClient - it's thread-safe.
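For example, a minimal sketch of a shared client (the static field name is my own, not from the question):

// A single shared HttpClient; PostAsJsonAsync is safe to call concurrently.
private static readonly HttpClient SharedClient = new HttpClient
{
    BaseAddress = new Uri(ConfigurationManager.AppSettings["ApiUrl"])
};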
What you're introducing is concurrency, not parallelism. More on that here.
Your direction is good, though there are a few minor changes that I would make:
First, you should mark your method as async Task, since you're using Task.WhenAll, which returns an awaitable that you will need to asynchronously wait on. Next, you can simply return the operation from PostAsJsonAsync instead of awaiting each call inside your Select. This will save a little bit of overhead, as it won't generate the state machine for the async call:
public async Task DeployViewAsync(int itemId, string itemCode, int environmentTypeId)
{
using (var client = new HttpClient())
{
client.BaseAddress = new Uri(ConfigurationManager.AppSettings["ApiUrl"]);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(
new MediaTypeWithQualityHeaderValue("application/json"));
var agents = _agentRepository.GetAgentsByitemId(itemId);
var agentTasks = agents.Select(a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
return client.PostAsJsonAsync("api/postView", viewPostRequest);
});
await Task.WhenAll(agentTasks);
}
}
HttpClient is able to make concurrent requests (see @usr's link for more), thus I don't see a reason to create a new instance each time inside your lambda. Note that if you consume DeployViewAsync multiple times, perhaps you'll want to keep your HttpClient around instead of allocating one each time, and dispose of it once you no longer need its services.
HttpClient appears to be usable for concurrent requests. I have not verified this myself; this is just what I gather from searching. Therefore, you don't have to create a new client for each task that you are starting. Do whatever is most convenient for you.
In general I strive to share as little (mutable) state as possible. Resource acquisitions should generally be pushed inwards towards their usage. I think it's better style to create a helper CreateHttpClient and create a new client for each request here. Consider making the Select body a new async method. Then, the HttpClient usage is completely hidden from DeployView.
Don't forget to await the WhenAll task and make the method async Task. (If you do not understand why that is necessary you've got some research about await to do.)
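A minimal sketch of that style, with the Select body pulled out into its own async method (CreateHttpClient and PostViewAsync are hypothetical names, not from the question):

private static HttpClient CreateHttpClient()
{
    var client = new HttpClient
    {
        BaseAddress = new Uri(ConfigurationManager.AppSettings["ApiUrl"])
    };
    client.DefaultRequestHeaders.Accept.Clear();
    client.DefaultRequestHeaders.Accept.Add(
        new MediaTypeWithQualityHeaderValue("application/json"));
    return client;
}

// Each request acquires and disposes its own client;
// the HttpClient usage is completely hidden from DeployView.
private async Task PostViewAsync(int agentId, string itemCode, int environmentTypeId)
{
    using (var client = CreateHttpClient())
    {
        var viewPostRequest = new
        {
            AgentId = agentId,
            itemCode = itemCode,
            EnvironmentId = environmentTypeId
        };
        var response = await client.PostAsJsonAsync("api/postView", viewPostRequest);
        response.EnsureSuccessStatusCode(); // assumption: treat non-success as an error
    }
}

public async Task DeployView(int itemId, string itemCode, int environmentTypeId)
{
    var agents = _agentRepository.GetAgentsByitemId(itemId);
    await Task.WhenAll(agents.Select(a => PostViewAsync(a.AgentId, itemCode, environmentTypeId)));
}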
Related
I have a task that I want to call from the class constructor, but it's really slow to execute. Is there a way to force this task to complete?
private async Task GetExchange()
{
NewsStack.IsVisible = false;
SearchStack.IsVisible = false;
ExchangeStack.IsVisible = true;
try
{
var client = new HttpClient();
var request = new HttpRequestMessage
{
Method = HttpMethod.Get,
RequestUri = new Uri("https://coinlore-cryptocurrency.p.rapidapi.com/api/tickers/?start=0&limit=100"),
Headers =
{
{ "x-rapidapi-host", "coinlore-cryptocurrency.p.rapidapi.com" },
{ "x-rapidapi-key", "yourAPIkey" },
},
};
using (var response = await client.SendAsync(request))
{
var exchange = new Exchange();
response.EnsureSuccessStatusCode();
var body = await response.Content.ReadAsStringAsync();
var exchangeBody = JsonConvert.DeserializeObject<Exchange>(body);
exchange = exchangeBody;
this.exchangeBodyList = new List<SearchCrypto>();
foreach (var item in exchange.CryptoExchange)
{
this.exchangeBodyList.Add(new SearchCrypto()
{
Name = item.Name,
Symbol = item.Symbol
});
}
this.exchangeTest = exchange;
lstExchange.ItemsSource = exchangeBody.CryptoExchange;
}
dateTimeRefresh.Text = "Last Update: " + DateTime.Now.ToString("HH:mm:ss");
}
catch (Exception ex)
{
await DisplayAlert("Alert", "Please, check your internet connection.", "OK");
}
}
I call this task in the constructor like this:
Task.Run(() => this.GetExchange()).Wait();
I'm not sure if there's a way to force it in another way.
Also, I'm open to tips or examples for code optimization.
In general, asynchronous work is a poor fit for constructors. Ideally, constructors should be short and fast and do almost nothing - setting some member variables, perhaps doing some argument validation, that's about it.
Instead of trying to cram I/O into a constructor, consider using a factory pattern. So you create a factory, which can then create an instance of the type you want using an asynchronous method like async Task<MyType> CreateAsync(). CreateAsync can then call GetExchange naturally (i.e., asynchronously) and pass exchangeBodyList and exchangeTest into the constructor.
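A minimal sketch of that idea; the names below are illustrative assumptions, not APIs from the question:

public class ExchangePage
{
    private readonly List<SearchCrypto> exchangeBodyList;
    private readonly Exchange exchangeTest;

    // The constructor is short and fast: it only assigns fields.
    private ExchangePage(List<SearchCrypto> exchangeBodyList, Exchange exchangeTest)
    {
        this.exchangeBodyList = exchangeBodyList;
        this.exchangeTest = exchangeTest;
    }

    // The asynchronous I/O lives in the factory method instead of the constructor.
    public static async Task<ExchangePage> CreateAsync()
    {
        var exchange = await FetchExchangeAsync();
        var searchList = exchange.CryptoExchange
            .Select(item => new SearchCrypto { Name = item.Name, Symbol = item.Symbol })
            .ToList();
        return new ExchangePage(searchList, exchange);
    }

    private static async Task<Exchange> FetchExchangeAsync()
    {
        // ...the HttpClient request and JsonConvert deserialization from GetExchange go here...
        throw new NotImplementedException();
    }
}

The caller then awaits ExchangePage.CreateAsync() from an async context (for example an async event handler) instead of blocking with Task.Run(...).Wait() in a constructor.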
What are you trying to accomplish by forcing the API call to finish? Just like most things, the server will give a response when it has performed all its operations, not before. The only way to force the result early is to close the connection and not wait for an answer. If you just want it to finish quicker, then you'll need to speed up the server-side code and any DB calls.
Just like in any program, there's no way to force code to run faster. You can't make the computer run faster. You can force it to run a thread at a higher priority, but I'm pretty sure that's not going to make much of a speed difference, and it's probably not the form you need the code to run in.
Speeding up code isn't really on topic here, unless you have an actual, specific error or issue you want to fix, but a general "speed up my code" doesn't work here. It might be on topic on Code Review, maybe, but not here.
I am working on a protocol and trying to use as much async/await as I can to make it scale well. The protocol will have to support hundreds to thousands of simultaneous connections. Below is a little bit of pseudo code to illustrate my problem.
private static async void DoSomeWork()
{
var protocol = new FooProtocol();
await protocol.Connect("127.0.0.1", 1234);
var i = 0;
while(i != int.MaxValue)
{
i++;
var request = new FooRequest();
request.Payload = "Request Nr " + i;
var task = protocol.Send(request);
_ = task.ContinueWith(async tmp =>
{
var resp = await task;
Console.WriteLine($"Request {resp.SequenceNr} Successful: {(resp.Status == 0)}");
});
}
}
And below is a little pseudo code for the protocol.
public class FooProtocol
{
private int sequenceNr = 0;
private SemaphoreSlim ss = new SemaphoreSlim(20, 20);
public Task<FooResponse> Send(FooRequest fooRequest)
{
var tcs = new TaskCompletionSource<FooResponse>();
ss.Wait();
var tmp = Interlocked.Increment(ref sequenceNr);
fooRequest.SequenceNr = tmp;
// Faking some arbitrary delay. This work is done over sockets.
Task.Run(async () =>
{
await Task.Delay(1000);
tcs.SetResult(new FooResponse() {SequenceNr = tmp});
ss.Release();
});
return tcs.Task;
}
}
I have a protocol with request and response pairs, using asynchronous socket programming. The FooProtocol takes care of matching up requests with responses (sequence numbers) and also enforces the maximum number of pending requests (done in the pseudo code, and in my real code, with a SemaphoreSlim, so I am not worried about runaway requests). The DoSomeWork method calls Protocol.Send, but I don't want to await the response; I want to spin around and send the next one until I am blocked by the maximum number of pending requests. When the task does complete I want to check the response and maybe do some work.
I would like to fix two things
I would like to avoid using Task.ContinueWith() because it seems to not fit in cleanly with the async/await patterns
Because I have awaited on the connection, I have had to use the async modifier. Now I get warnings from the IDE "Because this call is not waited, execution of the current method continues before this call is complete. Consider applying the 'await' operator to the result of the call." I don't want to do that, because as soon as I do it ruins the protocol's ability to have many requests in flight. The only way I can get rid of the warning is to use a discard. Which isn't the worst thing but I can't help but feel like I am missing a trick and fighting this too hard.
Side note: I hope your actual code is using SemaphoreSlim.WaitAsync rather than SemaphoreSlim.Wait.
In most socket code, you do end up with a list of connections, and along with each connection is a "processor" of some kind. In the async world, this is naturally represented as a Task.
So you will need to keep a list of Tasks; at the very least, your consuming application will need to know when it is safe to shut down (i.e., all responses have been received).
Don't preemptively worry about using Task.Run; as long as you aren't blocking (e.g., SemaphoreSlim.Wait), you probably will not starve the thread pool. Remember that during the awaits, no thread pool thread is used.
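Putting that together, a minimal sketch (the SendAndProcessAsync helper is my own naming, reusing the FooProtocol types from the question):

private static async Task DoSomeWorkAsync()
{
    var protocol = new FooProtocol();
    await protocol.Connect("127.0.0.1", 1234);

    // Keep a list of tasks, one per in-flight request.
    var pending = new List<Task>();
    for (var i = 0; i < 1000; i++)
    {
        var request = new FooRequest { Payload = "Request Nr " + i };
        pending.Add(SendAndProcessAsync(protocol, request));
    }

    // Safe to shut down only after every response has been received and processed.
    await Task.WhenAll(pending);
}

// An async method replaces the ContinueWith continuation.
private static async Task SendAndProcessAsync(FooProtocol protocol, FooRequest request)
{
    var resp = await protocol.Send(request);
    Console.WriteLine($"Request {resp.SequenceNr} Successful: {(resp.Status == 0)}");
}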
I am not sure that it's a good idea to enforce the maximum concurrency at the protocol level. It seems to me that this responsibility belongs to the caller of the protocol. So I would remove the SemaphoreSlim, and let it do the one thing that it knows to do well:
public class FooProtocol
{
private int sequenceNr = 0;
public async Task<FooResponse> Send(FooRequest fooRequest)
{
var tmp = Interlocked.Increment(ref sequenceNr);
fooRequest.SequenceNr = tmp;
await Task.Delay(1000); // Faking some arbitrary delay
return new FooResponse() { SequenceNr = tmp };
}
}
Then I would use an ActionBlock from the TPL Dataflow library in order to coordinate the process of sending a massive number of requests through the protocol, by handling the concurrency, the backpressure (BoundedCapacity), the cancellation (if needed), the error handling, and the status of the whole operation (running, completed, failed etc.). Example:
private static async Task DoSomeWorkAsync()
{
var protocol = new FooProtocol();
var actionBlock = new ActionBlock<FooRequest>(async request =>
{
var resp = await protocol.Send(request);
Console.WriteLine($"Request {resp.SequenceNr} Status: {resp.Status}");
}, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = 20,
BoundedCapacity = 100
});
await protocol.Connect("127.0.0.1", 1234);
foreach (var i in Enumerable.Range(0, Int32.MaxValue))
{
var request = new FooRequest();
request.Payload = "Request Nr " + i;
var accepted = await actionBlock.SendAsync(request);
if (!accepted) break; // The block has failed irrecoverably
}
actionBlock.Complete();
await actionBlock.Completion; // Propagate any exceptions
}
The BoundedCapacity = 100 configuration means that the ActionBlock will store in its internal buffer at most 100 requests. When this threshold is reached, anyone who wants to send more requests to it will have to wait. The awaiting will happen in the await actionBlock.SendAsync line.
I am trying to understand parallel programming and I would like my async methods to run on multiple threads. I have written something but it does not work like I thought it should.
Code
public static async Task Main(string[] args)
{
var listAfterParallel = RunParallel(); // Running this function to return tasks
await Task.WhenAll(listAfterParallel); // I want the program execution to stop until all tasks are completed
Console.WriteLine("After Parallel Loop"); // but currently when I run the program, "After Parallel Loop" is printed first
Console.ReadLine();
}
public static async Task<ConcurrentBag<string>> RunParallel()
{
var client = new System.Net.Http.HttpClient();
client.DefaultRequestHeaders.Add("Accept", "application/json");
client.BaseAddress = new Uri("https://jsonplaceholder.typicode.com");
var list = new List<int>();
var listResults = new ConcurrentBag<string>();
for (int i = 1; i < 5; i++)
{
list.Add(i);
}
// Parallel for each branch to run await commands on multiple threads.
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, async (index) =>
{
var response = await client.GetAsync("posts/" + index);
var contents = await response.Content.ReadAsStringAsync();
listResults.Add(contents);
Console.WriteLine(contents);
});
return listResults;
}
I would like the RunParallel function to complete before "After Parallel Loop" is printed. I also want my get-posts requests to run on multiple threads.
Any help would be appreciated!
What's happening here is that you're never waiting for the Parallel.ForEach block to complete - you're just returning the bag that it will eventually pump results into. Because Parallel.ForEach expects Action delegates, you've created a lambda which returns void rather than Task. While async void methods are valid, they return to the caller as soon as they await a Task and continue their remaining work on another thread, so Parallel.ForEach thinks the handler is done even though that work is still in flight.
Instead, use a synchronous method here:
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, index =>
{
var response = client.GetAsync("posts/" + index).Result;
var contents = response.Content.ReadAsStringAsync().Result;
listResults.Add(contents);
Console.WriteLine(contents);
});
If you absolutely must use await inside, wrap it in Task.Run(...).GetAwaiter().GetResult():
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, index => Task.Run(async () =>
{
var response = await client.GetAsync("posts/" + index);
var contents = await response.Content.ReadAsStringAsync();
listResults.Add(contents);
Console.WriteLine(contents);
}).GetAwaiter().GetResult());
In this case, however, Task.Run generally goes to a new thread, so we've subverted most of the control of Parallel.ForEach; it's better to use async all the way down:
var tasks = list.Select(async (index) => {
var response = await client.GetAsync("posts/" + index);
var contents = await response.Content.ReadAsStringAsync();
listResults.Add(contents);
Console.WriteLine(contents);
});
await Task.WhenAll(tasks);
Since Select expects a Func<T, TResult>, it will interpret an async lambda with no return value as an async Task method instead of async void, and thus give us something we can explicitly await.
Take a look at this: There Is No Thread
When you are making multiple concurrent web requests it's not your CPU that is doing the hard work. It's the CPU of the web server that is serving your requests. Your CPU is doing nothing during this time. It's not in a special "Wait-state" or something. The hardware inside your box that is working is your network card, that writes data to your RAM. When the response is received then your CPU will be notified about the arrived data, so it can do something with them.
You need parallelism when you have heavy work to do inside your box, not when you want the heavy work to be done by the external world. From the point of view of your CPU, even your hard disk is part of the external world. So everything that applies to web requests, applies also to requests targeting filesystems and databases. These workloads are called I/O bound, to be distinguished from the so called CPU bound workloads.
For I/O-bound workloads the tool offered by the .NET platform is the asynchronous Task. There are multiple APIs throughout the libraries that return Task objects. To achieve concurrency you typically start multiple tasks and then await them with Task.WhenAll. There are also more advanced tools like the TPL Dataflow library, which is built on top of Tasks. It offers capabilities like buffering, batching, configuring the maximum degree of concurrency, and much more.
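For example, a minimal sketch of that pattern (the URLs are placeholders):

var urls = new[] { "https://example.com/a", "https://example.com/b" };
using (var client = new HttpClient())
{
    // Start all the I/O-bound operations first, then await them together.
    var tasks = urls.Select(url => client.GetStringAsync(url)).ToList();
    string[] bodies = await Task.WhenAll(tasks);
    // bodies[i] corresponds to urls[i]
}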
I have a Web API method that takes in a list of strings, performs a web request for each of those strings, and compiles all of the data into a list for returning.
The input list can be variable length, up into the thousands. Do I need to manually limit the number of concurrent tasks, by batching them into groups, or is it safe to create thousands of tasks and await them with Task.WhenAll()? Here is a snippet of what I am using now:
public async Task<List<Customer>> GetDashboard(List<string> customerIds)
{
HttpClient client = new HttpClient();
var tasks = new List<Task>();
var customers = new List<Customer>();
foreach (var customerId in customerIds)
{
string customerIdCopy = customerId;
tasks.Add(client.GetStringAsync("http://testurl.com/" + customerId)
.ContinueWith(t => {
customers.Add(new Customer { Id = customerIdCopy, Data = t.Result });
}));
}
await Task.WhenAll(tasks);
return customers;
}
HttpClient can perform concurrent requests efficiently, with the caveat that it limits the number of concurrent requests to a single server.
If your requests are all going to the same site, the excess requests will be put into a queue. When requests are in this queue, the request timeout is ticking down... before it ever tries to connect to the server. So, manage that carefully, and if appropriate maybe even turn the timeout off.
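On .NET Framework that could look like this sketch (both values are assumptions to tune for your scenario):

// Raise the default per-server limit on concurrent connections,
// and disable the client-side timeout so queued requests don't expire.
ServicePointManager.DefaultConnectionLimit = 100;
var client = new HttpClient { Timeout = Timeout.InfiniteTimeSpan };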
Beyond this, it is perfectly fine to launch thousands of requests at once.
If you think that'll affect you, you can use a SemaphoreSlim or maybe TPL Dataflow to limit the number of concurrent requests.
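A minimal sketch of the SemaphoreSlim option (the limit of 50 is an arbitrary assumption):

private static readonly SemaphoreSlim Throttle = new SemaphoreSlim(50);

private static async Task<Customer> GetCustomerAsync(HttpClient client, string customerId)
{
    await Throttle.WaitAsync(); // at most 50 requests in flight at a time
    try
    {
        var data = await client.GetStringAsync("http://testurl.com/" + customerId);
        return new Customer { Id = customerId, Data = data };
    }
    finally
    {
        Throttle.Release();
    }
}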
The first thing that comes to mind is to delegate all this "multithreading performance" work to the TPL. Use async in your requests instead of manually created tasks and ContinueWith.
It will also let C# take care of thread performance.
private async Task<Customer> GetDashboardAsync(string customerId)
{
    using (var httpClient = new HttpClient())
    {
        string data = await httpClient.GetStringAsync("http://testurl.com/" + customerId);
        return new Customer { Id = customerId, Data = data };
    }
}
public async Task<Customer[]> GetDashboardAsync(List<string> customerIds)
{
var tasks = customerIds
.Select(GetDashboardAsync)
.ToArray();
return await Task.WhenAll(tasks);
}
Do I need to manually limit the number of concurrent tasks, by batching them into groups, or is it safe to create thousands of tasks and await them with Task.WhenAll()?
If you just create tasks using a List, foreach and ContinueWith, it can cause a performance drop due to the excessive number of tasks.
However, if you use async/await in your code as above, then you don't need to bother about it. TPL will use asynchronous tasks and continuations to provide the best performance.
You can just try this out to make sure that it works :)
I have an API that must call 4 HttpClients in parallel, supporting a concurrency of 500 users per second (all of them calling the API at the same time).
There must be a strict timeout letting the API return a result even if not all the HttpClient calls have returned a value.
The endpoints are external third-party APIs and I don't have any control over them or know their code.
I did extensive research on the matter, but even if many solutions work, I need the one that consumes as little CPU as possible, since I have a low server budget.
So far I came up with this:
var conn0 = new HttpClient
{
Timeout = TimeSpan.FromMilliseconds(1000),
BaseAddress = new Uri("http://endpoint")
};
var conn1 = new HttpClient
{
Timeout = TimeSpan.FromMilliseconds(1000),
BaseAddress = new Uri("http://endpoint")
};
var conn2 = new HttpClient
{
Timeout = TimeSpan.FromMilliseconds(1000),
BaseAddress = new Uri("http://endpoint")
};
var conn3 = new HttpClient
{
Timeout = TimeSpan.FromMilliseconds(1000),
BaseAddress = new Uri("http://endpoint")
};
var list = new List<HttpClient>() { conn0, conn1, conn2, conn3 };
var timeout = TimeSpan.FromMilliseconds(1000);
var allTasks = new List<Task<Task>>();
//the async DoCall method just call the HttpClient endpoint and return a MyResponse object
foreach (var call in list)
{
allTasks.Add(Task.WhenAny(DoCall(call), Task.Delay(timeout)));
}
var completedTasks = await Task.WhenAll(allTasks);
var allResults = completedTasks.OfType<Task<MyResponse>>().Select(task => task.Result).ToList();
return allResults;
I use WhenAny and two tasks, one for the call, one for the timeout. If the call task is late, the other one returns anyway.
Now, this code works perfectly and everything is async, but I wonder if there is a better way of achieving this.
Every single call to this API creates a lot of threads, and with 500 concurrent users it needs an average of 8 (eight) D3_V2 Azure 4-core machines, resulting in crazy expenses; and the higher the timeout is, the higher the CPU use is.
Is there a better way to do this without using so many CPU resources (maybe Parallel LINQ is a better choice than this)?
Is the HttpClient timeout alone sufficient to stop the call and return if the endpoint does not reply in time, without having to use the second task in WhenAny?
UPDATE:
The endpoints are third-party APIs; I don't know the code or have any control over them. The calls are done in JSON and return JSON or a string.
Some of them reply after 10+ seconds once in a while, or get stuck and are extremely slow, so the timeout is there to free the threads and return, even if only with partial data from the ones that returned in time.
Caching is possible but only partially since the data change all the time, like stocks and forex real time currency trading.
Your approach using the two tasks just for the timeout does work, but you can do better: use a CancellationToken for the task and for getting the answers from the server:
var cts = new CancellationTokenSource();
// set the timeout to 1 second
cts.CancelAfter(1000);
// provide the token for your request
var response = await client.GetAsync(url, cts.Token);
After that, you can simply filter the completed tasks:
var allResults = completedTasks
    .Where(t => t.Status == TaskStatus.RanToCompletion) // skip the tasks cancelled by the timeout
    .Select(task => task.Result)
    .ToList();
This approach will cut the number of tasks you're creating at least in half, and will decrease the overhead on your server. It also gives you a simple way to cancel part of the handling, or even the whole thing. If your tasks are completely independent from each other, you may use Parallel.ForEach for calling the HTTP clients, yet still use the token for cancelling the operation:
ParallelLoopResult result = Parallel.ForEach(list, call => DoCall(call, cts.Token));
// handle the result of the parallel tasks
or, using the PLINQ:
var results = list
.AsParallel()
.Select(call => DoCall(call, cts.Token))
.ToList();