Calling HTTPRequest takes too long for large number of calls - c#

I have to call 100,000 urls and I don't need to wait for the response. I have searched a lot . some people say there is no way to call http request without waiting for the response, also there are few answers for questions like mine that say to set Method="POST".
The following is the source code based on all of them. I have tried to call the urls asynchronously using WhenAll.
The problem is this that when I see CPU usage in task manager, it is fully busy for 140 seconds and within that time the system is almost unusable.
protected async void btnStartCallWhenAll_Click(object sender, EventArgs e)
{
// Make a list of web addresses.
List<string> urlList = SetUpURLList(Convert.ToInt32(txtNoRecordsToAdd.Text));
// One-step async call.
await ProcessAllURLSAsync(urlList);
}
private async Task ProcessAllURLSAsync(List<string> urlList)
{
// Create a query.
IEnumerable<Task<int>> CallingTasksQuery =
from url in urlList select ProcessURLAsync(url);
// Use ToArray to execute the query and start the Calling tasks.
Task<int>[] CallingTasks = CallingTasksQuery.ToArray();
// Await the completion of all the running tasks.
int[] lengths = await Task.WhenAll(CallingTasks);
int total = lengths.Sum();
}
private async Task<int> ProcessURLAsync(string url)
{
await CallURLAsync(url);
return 1;
}
private async Task CallURLAsync(string url)
{
// Initialize an HttpWebRequest for the current URL.
var webReq = (HttpWebRequest)WebRequest.Create(url);
webReq.Method="POST";
// Send the request to the Internet resource and wait for the response.
Task<WebResponse> responseTask = webReq.GetResponseAsync() ;
}
private List<string> SetUpURLList(int No)
{
List<string> urls = new List<string>
{
};
for (int i = 1; i <= No; i++)
urls.Add("http://msdn.microsoft.com/library/windows/apps/br211380.aspx");
return urls;
}
By the way the compiler hints that "this async method lacks 'await' operators and will run synchronously consider using the await operator to await non-blocking api calls or await task.run(..) to do cpu bound work on background thread" for this line:
private async Task CallURLAsync(string url).
I don't know if this affect my problem or not but after searching same problem in other questions, they said that I need to disable compiler message before this line.

Related

C# Async and Sync functions confuse me

I have an app in which a button starts creating XMLs. In the end of each XML creation, the SendInvoice function sends it, receives the response and a function (ParseResponse) parses the responses and does the database operations needed.
The idea is that when all the XMLs are created and sent, the application must close.
The problem is that I have lost control with async and the application seems to close before it actually finishes all the jobs. Also XMLs are sent before the previous have been processed.
The ParseResponse function is not asynchronous.
Here is the SendInvoice function.
Can you suggest any good practise?
Thank you in advance.
public async void SendInvoice(string body)
{
Cursor.Current = Cursors.WaitCursor;
var client = new HttpClient();
var queryString = HttpUtility.ParseQueryString(string.Empty);
var uri = "https://xxxx.xxx/SendInvoices?" + queryString;
HttpResponseMessage response;
// Request body
byte[] byteData = Encoding.UTF8.GetBytes(body);
using (var content = new ByteArrayContent(byteData))
{
content.Headers.ContentType = new MediaTypeHeaderValue("application/xml");
response = await client.PostAsync(uri, content);
string responsebody = await response.Content.ReadAsStringAsync();
ParseResponse(response.ToString());
ParseResponse(responsebody);
}
}
The rest of the code
private void button1_Click(object sender, EventArgs e)
{
For
{
......
SendInvoice(xml)
}
System.Windows.Forms.Application.Exit();
}
Since you are calling the method from an Event Handler, this is a case where async void is acceptable, change your Button Click handler method signature to use async, I also added some ConfigureAwait(false) to async method calls - best-practice-to-call-configureawait-for-all-server-side-code and why-is-writing-configureawaitfalse-on-every-line-with-await-always-recommended:
private async void button1_Click(object sender, EventArgs e)
{
//since you are using a for-loop, I'd suggest adding each Task
//to a List and awaiting all Tasks to complete using .WhenAll()
var tasks = new List<Task>();
FOR
{
......
//await SendInvoice(xml).ConfigureAwait(false);
tasks.Add(SendInvoice(xml));
}
await Task.WhenAll(tasks).ConfigureAwait(false);
System.Windows.Forms.Application.Exit();
}
AND change your SendInvoice method signature to return a Task
public async Task SendInvoice(string body)
{
Cursor.Current = Cursors.WaitCursor;
var client = new HttpClient();
var queryString = HttpUtility.ParseQueryString(string.Empty);
var uri = "https://xxxx.xxx/SendInvoices?" + queryString;
HttpResponseMessage response;
// Request body
byte[] byteData = Encoding.UTF8.GetBytes(body);
using (var content = new ByteArrayContent(byteData))
{
content.Headers.ContentType = new MediaTypeHeaderValue("application/xml");
response = await client.PostAsync(uri, content).ConfigureAwait(false);
string responsebody = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
ParseResponse(response.ToString());
ParseResponse(responsebody);
}
}
I was very used to multithreaded programming and it took me some time to understand asynchronous programming, because it really has nothing to do with multithreading. It is about doing more with a single thread, or a small number of threads.
asynchronous code is beneficial when the CPU would otherwise be waiting for something besides processing. Examples are: waiting for a network response, waiting for data to be read from disk, waiting on a separate process such as a database server.
It provides a way for the thread you are running to do other things while you wait. C# does this using Task. A task is some work that is being done, and it can be running or it can be waiting, and when waiting it doesn't need a thread attached.
All asynchronous functions must return a Task to be useful. So your function should be:
public async Task SendInvoice() {
...
The async keyword is used by the compiler to automatically wrap your function in a task object so you don't need to worry about a lot of the details. You just use await when calling another async function. You could do more work yourself to create tasks or return a task from another async function, or even call multiple async functions and await all of them together.
If your async method returns a value, use the generic Task: Task<String>, for example.
The Task is returned from an async method before the task completes. That is what allows the thread to be used by something else, but it has to get back to that starting place, which is why asynchonous programming you'll hear "async all the way up". It doesn't really do any good until it gets back to a caller that has multiple tasks to balance, which is usually the entry point of your application or the web request.
You can make your C# Main method async, but it mostly won't matter unless your process is really doing multiple things at the same time. For a web application, that can just be handling multiple requests. For a standalone app, it means you can query multiple APIs, make multiple web requests or db queries at the same time, and await them all, just using a single thread. Obviously, that can make things faster (at least locally, the external resources may have more work to do).
For a simple way to keep your program from exiting, if you have an asynchronous main, just await the call to SendInvoice. If your main is not async, you can use something like:
SendInvoice().Wait()
or
SendInvoice().Result
Using Wait() or Result will lock the thread until the task completes. It typically will make that thread exclusively available to the task so the thread cannot be used for any other tasks. If there are more threads in the threadpool, other tasks may continue to run, but typically using Wait/Result on a single Task defeats the point of asynchronous programming, so keep that in mind.
EDIT
Now that you have posted your calling code, it appears your call is in a loop. This is a good opportunity to take advantage of async calls and send ALL the invoices at once.
private async void button1_Click(object sender, EventArgs e)
{
List<Task> tasks = new List<Task>();
FOR
{
......
t = SendInvoice(xml).ConfigureAwait(false);
tasks.Add(t)
}
await Task.WhenAll(tasks).ConfigureAwait(false);
System.Windows.Forms.Application.Exit();
}
That will send ALL the invoices, then return from the handler, and then exit once all the responses have been received.

Asynchronous parallel web requests in ASP.NET MVC(C#)

I have a mini-project that requires to download html documents of multiple websites using C# and make it perform as fast as possible. In my scenario I might need to switch IP using proxies based on certain conditions. I want to take advantage of C# Asynchronous Tasks to make it execute as many requests as possible in order for the whole process to be fast and efficient.
Here's the code I have so far.
public class HTMLDownloader
{
public static List<string> URL_LIST = new List<string>();
public static List<string> HTML_DOCUMENTS = new List<string>();
public static void Main()
{
for (var i = 0; i < URL_LIST.Count; i++)
{
var html = Task.Run(() => Run(URL_LIST[i]));
HTML_DOCUMENTS.Add(html.Result);
}
}
public static async Task<string> Run(string url)
{
var client = new WebClient();
//Handle Proxy Credentials client.Proxy= new WebProxy();
string html = "";
try
{
html = await client.DownloadStringTaskAsync(new Uri(url));
//if(condition ==true)
//{
// return html;
//}
//else
//{
// Switch IP and try again
//}
}
catch (Exception e)
{
}
return html;
}
The problem here is that I'm not really taking advantage of sending multiple web requests because each request has to finish in order for the next one to begin. Is there a better approach to this? For example, send 10 web requests at a time and then send a new request when one of those requests is finished.
Thanks
I want to take advantage of C# Asynchronous Tasks to make it execute as many requests as possible in order for the whole process to be fast and efficient.
You can use Task.WhenAll to get asynchronous concurrency.
For example, send 10 web requests at a time and then send a new request when one of those requests is finished.
To throttle asynchronous concurrency, use SemaphoreSlim:
public static async Task Main()
{
using var limit = new SemaphoreSlim(10); // 10 at a time
var tasks = URL_LIST.Select(Process).ToList();
var results = await Task.WhenAll(tasks);
HTML_DOCUMENTS.AddRange(results);
async Task<string> Process(string url)
{
await limit.WaitAsync();
try { return await Run(url); }
finally { limit.Release(); }
}
}
One way is to use Task.WhenAll.
Creates a task that will complete when all of the supplied tasks have
completed.
The premise is, Select all the tasks into a List, await the list of task with Task.WhenAll, Select the results
public static async Task Main()
{
var tasks = URL_LIST.Select(Run);
await Task.WhenAll(tasks);
var results = tasks.Select(x => x.Result);
}
Note : The result of WhenAll will be the collection of results as well
First change your Main to be async.
Then you can use LINQ Select to run the Tasks in parallel.
public static async Task Main()
{
var tasks = URL_LIST.Select(Run);
string[] documents = await Task.WhenAll(tasks);
HTML_DOCUMENTS.AddRange(documents);
}
Task.WhenAll will unwrap the Task results into an array, once all the tasks are complete.

Task.Run on method returning T VS method returning Task<T>

Scenario 1 - For each website in string list (_websites), the caller method wraps GetWebContent into a task, waits for all the tasks to finish and return results.
private async Task<string[]> AsyncGetUrlStringFromWebsites()
{
List<Task<string>> tasks = new List<Task<string>>();
foreach (var website in _websites)
{
tasks.Add(Task.Run(() => GetWebsiteContent(website)));
}
var results = await Task.WhenAll(tasks);
return results;
}
private string GetWebContent(string url)
{
var client = new HttpClient();
var content = client.GetStringAsync(url);
return content.Result;
}
Scenario 2 - For each website in string list (_websites), the caller method calls GetWebContent (returns Task< string >), waits for all the tasks to finish and return the results.
private async Task<string[]> AsyncGetUrlStringFromWebsites()
{
List<Task<string>> tasks = new List<Task<string>>();
foreach (var website in _websites)
{
tasks.Add(GetWebContent(website));
}
var results = await Task.WhenAll(tasks);
return results;
}
private async Task<string> GetWebContent(string url)
{
var client = new HttpClient();
var content = await client.GetStringAsync(url);
return content;
}
Questions - Which way is the correct approach and why? How does each approach impact achieving asynchronous processing?
With Task.Run() you occupy a thread from the thread pool and tell it to wait until the web content has been received.
Why would you want to do that? Do you pay someone to stand next to your mailbox to tell you when a letter arrives?
GetStringAsync already is asynchronous. The cpu has nothing to do (with this process) while the content comes in over the network.
So the second approach is correct, no need to use extra threads from the thread pool here.
Always interesting to read: Stephen Cleary's "There is no thread"
#René Vogt gave a great explanation.
There a minor 5 cents from my side.
In the second example there is not need to use async / await in GetWebContent method. You can simply return Task<string> (this would also reduce async depth).

Make the code asynchronous in C#

I have 2 methods: the first sends HTTP GET request on one address and the second calls it multiple times (so, it sends request to many IPs). Both methods are async, so they don't block code execution while the requests are proccessing remotely. The problem is, due to my poor C# knowledge, I don't know how to send all the requests simultaneously, not one after another (which my code does). That's my code:
public static async Task<string> SendRequest(Uri uri)
{
using (var client = new HttpClient())
{
var resp = await client.GetStringAsync(uri).ConfigureAwait(false);
return resp;
}
}
public static async Task<string[]> SendToAllIps(string req)
{
string[] resp = new string[_allIps.Length];
for (int i = 0; i < _allIps.Length; i++)
{
resp[i] = await SendRequest(new Uri(_allIps[i] + req));
}
return resp;
}
How to make SendToAllIps send requests without awaiting for previous task result? Also SendToAllIps must return an array of responses when all the requests are finished. As far as I understand, this can be done with Task.WaitAll, but how to use it in this particular situation?
You can use Task.WhenAll to await a collection of tasks:
public static async Task<string[]> SendToAllIps(string req)
{
var tasks = _allIps.Select(ip => SendRequest(new Uri(ip + req)));
return await Task.WhenAll(tasks);
}
Answers provided above provide the correct way of doing it but doesn't provide rationale, let me explain what's wrong with your code:
Following line creates an issue:
resp[i] = await SendRequest(new Uri(_allIps[i] + req));
Why ?
As you are awaiting each individual request, it will stop the processing of the remaining requests, that's the behavior of the async-await and it will be almost the synchronous processing of each SendRequest when you wanted then to be concurrent.
Resolution:
SendRequest being an async method returns as Task<string>, you need add that to an IEnumerable<Task<string>> and then you have option:
Task.WaitAll or Task.WhenAll
Yours is Web API (Rest Application), which needs a Synchronization Context, so you need Task.WhenAll, which provides a Task as result to wait upon and integrate all the task results in an array, if you try using Task.WaitAll it will lead to deadlock, as it is not able to search the Synchronization Context, check the following:
WaitAll vs WhenAll
You can use the Task.WaitAll only for the Console application not the Web Application, Console App doesn't need any Synchronization Context or UI Thread
public static async Task<string[]> SendToAllIps(string req)
{
var tasks = new List<Task<string>>();
for (int i = 0; i < _allIps.Length; i++)
{
// Start task and assign the task itself to a collection.
var task = SendRequest(new Uri(_allIps[i] + req));
tasks.Add(task);
}
// await all the tasks.
string[] resp = await Task.WhenAll(tasks);
return resp;
}
The key here is to collect all the tasks in a collection and then await them all using await Task.WhenAll. Even though I think the solution from Lee is more elegant...

ASP MVC4 await with httpclient

I'm learning how to use Tasks in MVC and I get completly lost. I need to download the source from selected webpage, find one element, get its value and return it. That's it.
For this I am using HtmlAgilityPack and HttpClient to fetch the webpage.
The problem that occurs is where nothing is waiting for response from httpClient and thus results that response generation completed when Task was still in progress. (An asynchronous module or handler completed while an asynchronous operation was still pending.)
I read lots of threads in here ,codeproj and some blogs, still don't understand what's the problem. Most common explanation is about resulting type of void in async method, but I cannot find any other way to return awaiting value, than this:
public float ReadPrice(Uri url)
{
switch (url.Host)
{
case "www.host1.xy":
return ParseXYZAsync(url).Result;
default:
return float.Parse("99999,99");
}
}
private Task<float> ParseXYZAsync(Uri url)
{
loadPage(url);
var priceNode = document.DocumentNode.SelectSingleNode(
#"//*[#id='pageWrapper']/div[4]/section[1]/div[4]/div[1]/div[1]/span");
var price = priceNode.InnerText;
...
return priceInFloat;
}
private async Task LoadPage(Uri url)
{
HttpClient http = new HttpClient();
var response = await http.GetByteArrayAsync(url);
String source = Encoding.GetEncoding("utf-8")
.GetString(response, 0, response.Length - 1);
source = WebUtility.HtmlDecode(source);
document.LoadHtml(source);
}
In order to figure out what's wrong you need to understand one key concept with async-await. When an async method hits the first await keyword, control is yielded back to the calling method. This means that when you do this:
loadPage(url);
The method will synchronously run until it hits:
var response = await http.GetByteArrayAsync(url);
Which will yields control back to ParseWebSite, which will continue execution and will probably end before the async operation has actually completed.
What you need to do is make LoadPage return a Task and await for it's completion:
private async Task<float> ParseWebsiteAsync(Uri url)
{
await LoadPageAsync(url);
var priceNode = document.DocumentNode.SelectSingleNode
(#"//*[#id='pageWrapper']/div[4]/section[1]/div[4]/div[1]/div[1]/span");
var price = priceNode.InnerText;
return priceInFloat;
}
private async Task LoadPageAsync(Uri url)
{
HttpClient http = new HttpClient();
var source = await http.GetAsStringAsync(url);
source = WebUtility.HtmlDecode(source);
document.LoadHtml(source);
}
Side Notes:
Do follow .NET naming conventions. Methods should be PascalCase and not camelCase
Do add the "Async" postfix for asynchronous methods. E.g, LoadPage should become LoadPageAsync.
private async void loadPage(Uri url) needs to return a Task:
private async Task loadPage(Uri url)
then you need to await it in the calling method:
private async Task<float> parseWEBSITEXY(Uri url)
{
await loadPage(url);
...
}
In you code load page starts and it returns immediately.
One more thing - it is not recommended to use async void, except for event handlers. Always return Task.

Categories

Resources