List<string> urls = this.populateRequestList();
this.Logger("Starting");
var reqs = urls.Select<string, WebRequest>(HttpWebRequest.Create).ToArray();
var iars = reqs.Select(req => req.BeginGetResponse(null, null)).ToArray();
var rsps = reqs.Select((req, i) => req.EndGetResponse(iars[i])).ToArray();
this.Logger("Done");
Things I noticed so far:
When I run this code, "Starting" shows up in my log, but "Done" never shows up. When I view the whole process in the debugger, it seems to skip over it like it's not even there. No exceptions are being thrown either. When reqs.Select is looping through req.EndGetResponse(iars[i]), it's like it freezes or skips over stuff. When I view it in the debugger, I don't get past 10-15 loops before it just skips to the end.
Questions:
How do I stop this from "skipping" sometime during var rsps = reqs.Select((req, i) => req.EndGetResponse(iars[i])).ToArray();?
How do I get the HTML from rsps? I think my problem doing that stems from the "skipping". I tried looping through each response and calling Response.GetResponseStream() etc., but nothing happens as soon as it skips.
The problem with your code is that BeginGetResponse(null, null) accepts a callback as the first argument which is invoked when the operation completes. This callback is where EndGetResponse should be called. When you call EndGetResponse, the operations are not yet completed.
Look at this article to see how async web requests can be made in C# using iterators: http://tomasp.net/blog/csharp-async.aspx.
If using the task parallel library or .NET 4 you can also do this:
var urls = new List<string>();
var tasks = urls.Select(url =>
{
var request = WebRequest.Create(url);
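// FromAsync returns a task that is already running; calling Start() on it would throw InvalidOperationException.
var task = Task.Factory.FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null);
return task;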
}).ToArray();
Task.WaitAll(tasks);
foreach (var task in tasks)
{
using (var response = task.Result)
using (var stream = response.GetResponseStream())
using (var reader = new StreamReader(stream))
{
var html = reader.ReadToEnd();
}
}
You are trying to use the asynchronous request methods to make a synchronous request, which doesn't work.
You are supposed to start the requests using BeginGetResponse with a callback method that handles each response. If you call EndGetResponse immediately after BeginGetResponse, it will fail because the response hasn't started to arrive yet.
If you want to make a synchronous request, use the GetResponse method instead.
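A minimal sketch of the callback approach described above, assuming you still want the calling code to wait until every response has been read. The names CallbackDownloader and DownloadAll are made up for illustration, and WebRequest is marked obsolete on newer .NET versions. Each response is disposed as soon as it has been read; on the .NET Framework the default per-host connection limit is small, so responses that are never closed may be one reason a loop like the one in the question stalls after a handful of requests.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;

public static class CallbackDownloader
{
    // Starts every request at once, then blocks until all callbacks have run.
    // Returns one body per URL, or null for a request that failed.
    public static string[] DownloadAll(IList<string> urls)
    {
        var bodies = new string[urls.Count];
        if (urls.Count == 0) return bodies;

        using (var pending = new CountdownEvent(urls.Count))
        {
            for (int i = 0; i < urls.Count; i++)
            {
                int index = i; // private copy for the callback to capture
                WebRequest request = WebRequest.Create(urls[index]);
                request.BeginGetResponse(ar =>
                {
                    try
                    {
                        // The callback runs once the operation has completed,
                        // so EndGetResponse returns without waiting.
                        using (WebResponse response = request.EndGetResponse(ar))
                        using (var reader = new StreamReader(response.GetResponseStream()))
                        {
                            bodies[index] = reader.ReadToEnd();
                        }
                    }
                    catch (WebException)
                    {
                        bodies[index] = null; // failure is left for the caller to handle
                    }
                    finally
                    {
                        pending.Signal();
                    }
                }, null);
            }
            pending.Wait(); // wait for every callback to finish
        }
        return bodies;
    }
}
```

Calling DownloadAll(urls) between the "Starting" and "Done" log lines would give you the HTML for each URL in the same order as the input list.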
As I read http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.endgetresponse.aspx you need to wait for the callback before you can use EndGetResponse?
Or use GetResponse: http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.getresponse.aspx
Related
Below is my code to get an HTML page
public static async Task<string> GetUrltoHtml(string url)
{
string s;
using (var client = new HttpClient())
{
var result = client.GetAsync(url).Result;
//Console.WriteLine("!!!"+result.StatusCode);
s = result.Content.ReadAsStringAsync().Result; //break point
}
return s;
}
the line
var result = client.GetAsync(url).Result;
causes the app to freeze for a few seconds and behave as if it were synchronous.
Your comment welcome
According to the docs
Accessing the property's get accessor blocks the calling thread until the asynchronous operation is complete; it is equivalent to calling the Wait method.
So getting Result is a blocking action. You should use await instead.
s = await result.Content.ReadAsStringAsync();
(Result is helpful when the result is ready and you just want to get it. Or in some cases you want to block the thread (but it's not recommended).)
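Putting that together, the whole method might look like the sketch below. Passing the HttpClient in as a parameter and the HtmlFetcher class name are assumptions of this sketch, not part of the original code.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class HtmlFetcher
{
    // The caller supplies the HttpClient so one instance can be reused across calls
    // (an assumption of this sketch; the original created a new client per call).
    public static async Task<string> GetUrltoHtml(HttpClient client, string url)
    {
        // await hands control back to the caller instead of blocking the thread like .Result.
        using (HttpResponseMessage result = await client.GetAsync(url))
        {
            return await result.Content.ReadAsStringAsync();
        }
    }
}
```

The caller has to await it as well, for example string html = await HtmlFetcher.GetUrltoHtml(client, url); calling .Result on the returned task would bring the blocking back.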
I've been reading Essential C# 6.0 recently. In the chapter where the author explains multithreading, he shows this method, and I don't understand two things about it that don't seem to be explained anywhere.
private static Task WriteWebRequestSizeAsync(string url)
{
StreamReader reader = null;
WebRequest webRequest = WebRequest.Create(url);
Task task = webRequest.GetResponseAsync()
.ContinueWith(antecedent =>
{
WebResponse response = antecedent.Result;
reader = new StreamReader(response.GetResponseStream());
return reader.ReadToEndAsync();
})
.Unwrap()
.ContinueWith(antecedent =>
{
if(reader != null) reader.Dispose();
string text = antecedent.Result;
Console.WriteLine(text.Length);
});
return task;
}
1. Why does the author use the ContinueWith() methods and call them essential? How is his way of doing it better than my approach, which does not use these methods?
private static Task WriteWebRequestSizeAsync(string url)
{
return Task.Run(() =>
{
WebRequest webRequest = WebRequest.Create(url);
WebResponse response = webRequest.GetResponseAsync().Result;
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
string text = reader.ReadToEndAsync().Result;
Console.WriteLine(text.Length);
}
});
}
2. Why does the author use the async variants of the methods and then access their results via the .Result property, instead of using the non-async variants, when the end result appears to be the same? Please note that I haven't changed this in my approach above.
Although you are calling GetResponseAsync() in your method, accessing .Result makes it a blocking call. As a result, the thread running your task sits blocked until the result is available, tying up a thread-pool thread that could be doing other work.
WebResponse response = webRequest.GetResponseAsync().Result; //blocking call
However, in the author's example, GetResponseAsync() is followed by ContinueWith(). This means that the thread on which GetResponseAsync() is called won't be blocked and can be used to do something else. When the result of GetResponseAsync() is available, the continuation will run.
webRequest.GetResponseAsync()
.ContinueWith(antecedent =>
{
WebResponse response = antecedent.Result;
reader = new StreamReader(response.GetResponseStream());
return reader.ReadToEndAsync();
})
The same example can also be written using async and await instead of continuations. This has a similar effect to continuations, but it is more natural to read.
var result = await webRequest.GetResponseAsync();
//do something with result now.
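For comparison, here is a sketch of the complete method written with await. GetWebRequestSizeAsync is a helper added here for illustration (it is not in the book), split out so the size can be returned as well as printed.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

public static class WebRequestSize
{
    // Hypothetical helper: downloads the resource and returns its length in characters.
    public static async Task<int> GetWebRequestSizeAsync(string url)
    {
        WebRequest webRequest = WebRequest.Create(url);
        // Each await plays the role of one ContinueWith in the original;
        // the using blocks replace the manual reader.Dispose() call.
        using (WebResponse response = await webRequest.GetResponseAsync())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string text = await reader.ReadToEndAsync();
            return text.Length;
        }
    }

    // Same signature as the method in the question.
    public static async Task WriteWebRequestSizeAsync(string url)
    {
        Console.WriteLine(await GetWebRequestSizeAsync(url));
    }
}
```

No thread is blocked while the response is in flight, and there is no need for the reader variable shared between two lambdas.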
It seems as if the author uses stacked continuations in order to split the operations according to the separation of concerns principle.
The main difference between your way and the author's is that the author's code starts the request on the same thread that called WriteWebRequestSizeAsync, while your code runs on some thread from the ThreadPool.
I don't know the context, so maybe that's essential.
About the second question: if the author called the non-async methods, he would not get tasks back and so could not attach ContinueWith to them.
I'm trying to learn the async and await mechanisms in C#.
The simplest example is clear to me.
The line
Task<string> getStringTask = client.GetStringAsync("http://msdn.microsoft.com");
triggers an asynchronous web call. The control returns to AccessTheWebAsync(). It is free to perform DoIndependentWork(). After doing this it waits for the completion of the task getStringTask and when this result is available the function executes the next line
return urlContents.Length;
So, as far as I understand the purpose of the async call is to let the caller execute other operations when the operation tagged with async is in progress.
However, I'm a bit confused by the example in this function.
private async Task<byte[]> GetURLContentsAsync(string url)
{
// The downloaded resource ends up in the variable named content.
var content = new MemoryStream();
// Initialize an HttpWebRequest for the current URL.
var webReq = (HttpWebRequest)WebRequest.Create(url);
// Send the request to the Internet resource and wait for
// the response.
using (WebResponse response = await webReq.GetResponseAsync())
// The previous statement abbreviates the following two statements.
//Task<WebResponse> responseTask = webReq.GetResponseAsync();
//using (WebResponse response = await responseTask)
{
// Get the data stream that is associated with the specified url.
using (Stream responseStream = response.GetResponseStream())
{
// Read the bytes in responseStream and copy them to content.
await responseStream.CopyToAsync(content);
// The previous statement abbreviates the following two statements.
// CopyToAsync returns a Task, not a Task<T>.
//Task copyTask = responseStream.CopyToAsync(content);
// When copyTask is completed, content contains a copy of
// responseStream.
//await copyTask;
}
}
// Return the result as a byte array.
return content.ToArray();
}
Inside the method GetURLContentsAsync(), there are two async invocations. However, the API waits with an await call on both. The caller is not doing anything between the trigger of the async operation and the receipt of the data. So, as far as I understand, the async/await mechanism brings no benefit here. Am I missing something obvious here?
Your code doesn't need to explicitly be doing anything between await'd async calls to gain benefit. It means that the thread isn't sitting waiting for each call to complete, it is available to do other work.
If this is a web application it can result in more requests being processed. If it is a Windows application it means the UI thread isn't blocked and the user has a better experience.
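A small sketch of that point, with no network involved: a TaskCompletionSource stands in for the download, and the names AwaitDemo and LengthAsync are made up for illustration. The async method returns to its caller at the first await on an unfinished task, so the calling thread is free even though the method itself does nothing between its awaits.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Awaits a task it did not start; stands in for GetURLContentsAsync awaiting the network.
    public static async Task<int> LengthAsync(Task<string> download)
    {
        string text = await download; // if download is unfinished, control returns to the caller here
        return text.Length;
    }

    public static void Main()
    {
        var download = new TaskCompletionSource<string>(); // simulated network operation
        Task<int> pending = LengthAsync(download.Task);

        // LengthAsync has already returned, so this thread can do other work.
        Console.WriteLine(pending.IsCompleted); // False

        download.SetResult("hello");            // the "response" arrives
        Console.WriteLine(pending.Result);      // 5
    }
}
```

In a UI application that free thread is the UI thread, and in a web application it goes back to the pool to serve other requests.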
However, the API waits with an await call on both.
You have to await both calls because the code in your method should execute sequentially. If you don't await the first call, the lines after it will run before it has finished, which is probably not what you expect or need.
Two reasons for awaiting both methods come to mind:
1. The result of the first async method may be used as a parameter in the second async method call.
2. The result of the first async method call may decide whether the second async method is called at all.
If either of those is the case, it is quite clear why you need to await every async method call inside your async method.
EDIT:
From the example you are pointing to, you can clearly see that the output of the first async method is used in the second async method call here:
using (WebResponse response = await webReq.GetResponseAsync())
// The previous statement abbreviates the following two statements.
//using (WebResponse response = await responseTask)
{
// Get the data stream that is associated with the specified url.
using (Stream responseStream = response.GetResponseStream())
{
// Read the bytes in responseStream and copy them to content.
await responseStream.CopyToAsync(content);
// The previous statement abbreviates the following two statements.
// CopyToAsync returns a Task, not a Task<T>.
//Task copyTask = responseStream.CopyToAsync(content);
// When copyTask is completed, content contains a copy of
// responseStream.
//await copyTask;
}
}
GetResponseAsync returns when the web server starts its response (by sending the headers), while CopyToAsync returns once all the data has been sent from the server and copied to the other stream.
If you add code to record how much time elapses between the start of the asynchronous call and the return to your function, you'll see that both methods take some time to complete (on a large file, at least.)
private static async Task<byte[]> GetURLContentsAsync(string url) {
var content = new MemoryStream();
var webReq = (HttpWebRequest)WebRequest.Create(url);
DateTime responseStart = DateTime.Now;
using (WebResponse response = await webReq.GetResponseAsync()) {
Console.WriteLine($"GetResponseAsync time: {(DateTime.Now - responseStart).TotalSeconds}");
using (Stream responseStream = response.GetResponseStream()) {
DateTime copyStart = DateTime.Now;
await responseStream.CopyToAsync(content);
Console.WriteLine($"CopyToAsync time: {(DateTime.Now - copyStart).TotalSeconds}");
}
}
return content.ToArray();
}
For a ~40 MB file on a fast server, the first await is quick while the second await takes longer.
https://ftp.mozilla.org/pub/thunderbird/releases/52.2.1/win32/en-US/Thunderbird%20Setup%2052.2.1.exe
GetResponseAsync time: 0.3422409
CopyToAsync time: 5.3175731
But for a server that takes a while to respond, the first await can take a while too.
http://www.fakeresponse.com/api/?sleep=3
GetResponseAsync time: 3.3125195
CopyToAsync time: 0
I want to read a XML file from the Web with following method.
public static async void Load_WinPhone(string URL)
{
HttpClient client = new HttpClient();
var httpResponseMessage = await client.GetAsync(new Uri(URL));
if (httpResponseMessage.StatusCode == System.Net.HttpStatusCode.OK)
{
var xmlStream = await httpResponseMessage.Content.ReadAsStreamAsync();
XDocument Xdoc = XDocument.Load(xmlStream);
var query = from data in Xdoc.Descendants("article")
select new MyClass
{
Title = data.Element("title").Value
};
foreach (MyClass x in query)
{
AnotherClass.List.Add(x);
}
}
}
This works, but after the method has finished, AnotherClass.List is still empty.
I think it is because of the async. I tried this in a console app without the async and it worked fine.
But now I want to do this on Windows Phone 8.1, and the list stays empty.
Can someone explain why, or suggest a workaround?
Yes, that's how await works - from the point of view of the caller, it's basically the same thing as a return. So when you call this method, it most likely returns on the first await - long before AnotherClass.List is modified.
The main problem you have is that your method is async void - you're throwing away all the information about the method's execution. Instead, you want to return Task - this allows you to await the method or bind a continuation to it.
Whenever you break the await chain, you also break the synchronicity of the code. Most of the time (especially in UI), you want to await all the way to the top - usually, the only thing that's async void is the event handlers, and even then it's only because event handlers must return void.
Overall, multi-threading and asynchronous code is a rather big topic - http://www.albahari.com/threading/ is a great start on understanding most of the fundamentals, as well as ways to handle it well in C#.
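A sketch of the async Task version suggested above. The ArticleLoader and LoadTitlesAsync names are made up for illustration, the method returns plain title strings rather than MyClass instances to stay self-contained, and the HttpClient is passed in so it can be reused.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class ArticleLoader
{
    // Returns Task<List<string>> instead of void, so the caller can await completion
    // and receives the titles directly instead of reading a shared static list.
    public static async Task<List<string>> LoadTitlesAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode(); // throws on a non-success status code
            using (Stream xmlStream = await response.Content.ReadAsStreamAsync())
            {
                XDocument doc = XDocument.Load(xmlStream);
                return doc.Descendants("article")
                          .Select(a => (string)a.Element("title"))
                          .ToList();
            }
        }
    }
}
```

The caller then writes var titles = await ArticleLoader.LoadTitlesAsync(client, url); and only touches the list after that line, which means the calling method has to be async too, all the way up to the event handler.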
Inside a C# project I'm making some calls to a web API, and I'm making them within a loop in a method. Usually there are not many of them, but even so I was thinking of taking advantage of parallelism.
What I am trying so far is
public void DeployView(int itemId, string itemCode, int environmentTypeId)
{
using (var client = new HttpClient())
{
client.BaseAddress = new Uri(ConfigurationManager.AppSettings["ApiUrl"]);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
var agents = _agentRepository.GetAgentsByitemId(itemId);
var tasks = agents.Select(async a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
var response = await client.PostAsJsonAsync("api/postView", viewPostRequest);
});
Task.WhenAll(tasks);
}
}
But I wonder if that's the correct path, or whether I should try to parallelize the whole DeployView (i.e. even before using the HttpClient).
Now that I see it posted, I reckon I can't just remove the variable response as well, just do the await without setting it to any variable
Thanks
Usually there is no need to parallelize the requests - one thread making async requests should be enough (even if you have hundreds of requests). Consider this code:
var tasks = agents.Select(a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
return client.PostAsJsonAsync("api/postView", viewPostRequest);
}).ToArray(); // materialize so the requests are started exactly once
//now tasks is Task<HttpResponseMessage>[]
await Task.WhenAll(tasks);
//now all the responses are available
foreach(HttpResponseMessage response in tasks.Select(p=> p.Result))
{
//do something with the response
}
However, you can utilize parallelism when processing the responses. Instead of the above 'foreach' loop you may use:
Parallel.ForEach(tasks.Select(p=> p.Result), response => ProcessResponse(response));
But IMO, this is the best use of asynchrony and parallelism:
var tasks = agents.Select(async a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
var response = await client.PostAsJsonAsync("api/postView", viewPostRequest);
ProcessResponse(response);
});
await Task.WhenAll(tasks);
There is a major difference between the first and last examples:
In the first one, a single thread launches the async requests, waits (without blocking) for all of them to return, and only then processes them.
In the last example, you attach a continuation to each Task. That way, every response gets processed as soon as it arrives. Assuming the current TaskScheduler allows parallel (multithreaded) execution of Tasks, no response sits idle as it does in the first example.
*Edit - if you do decide to do it parallel, you can use just one instance of HttpClient - it's thread safe.
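One caveat when a Select result is handed to Task.WhenAll and then enumerated again to read the results: Select is lazy, so each enumeration runs the lambda again, which here would mean starting the requests a second time. The sketch below shows the effect with a counter; FakePostAsync is a stand-in for client.PostAsJsonAsync and all names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LazySelectDemo
{
    public static int Calls; // how many times the simulated request was started

    // Stand-in for client.PostAsJsonAsync; no network involved.
    private static Task<int> FakePostAsync(int id)
    {
        Calls++;
        return Task.FromResult(id * 10);
    }

    public static async Task<int[]> RunAsync(bool materialize)
    {
        Calls = 0;
        var ids = new[] { 1, 2, 3 };
        IEnumerable<Task<int>> query = ids.Select(id => FakePostAsync(id));

        // With ToArray the lambda runs once per id; without it, every enumeration re-runs it.
        IEnumerable<Task<int>> tasks = materialize ? query.ToArray() : query;

        await Task.WhenAll(tasks);                    // first enumeration
        return tasks.Select(t => t.Result).ToArray(); // second enumeration
    }
}
```

After RunAsync(true) the counter equals the number of ids; after RunAsync(false) it is higher, because the second enumeration started the work again. Calling ToArray() or ToList() on the Select before WhenAll avoids this.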
What you're introducing is concurrency, not parallelism. More on that here.
Your direction is good, though a few minor changes that I would make:
First, you should mark your method as async Task, because you're using Task.WhenAll, which returns an awaitable that you need to wait on asynchronously. Next, you can simply return the task from PostAsJsonAsync instead of awaiting each call inside your Select. This saves a little overhead, as the compiler won't generate a state machine for the lambda:
public async Task DeployViewAsync(int itemId, string itemCode, int environmentTypeId)
{
using (var client = new HttpClient())
{
client.BaseAddress = new Uri(ConfigurationManager.AppSettings["ApiUrl"]);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(
new MediaTypeWithQualityHeaderValue("application/json"));
var agents = _agentRepository.GetAgentsByitemId(itemId);
var agentTasks = agents.Select(a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
return client.PostAsJsonAsync("api/postView", viewPostRequest);
});
await Task.WhenAll(agentTasks);
}
}
HttpClient is able to make concurrent requests (see @usr's link for more), so I don't see a reason to create a new instance each time inside your lambda. Note that if you call DeployViewAsync multiple times, you may want to keep your HttpClient around instead of allocating one each time, and dispose of it once you no longer need it.
HttpClient appears to be usable for concurrent requests. I have not verified this myself, this is just what I gather from searching. Therefore, you don't have to create a new client for each task that you are starting. You can do what is most convenient to you.
In general I strive to share as little (mutable) state as possible. Resource acquisitions should generally be pushed inwards towards their usage. I think it's better style to create a helper CreateHttpClient and create a new client for each request here. Consider making the Select body a new async method. Then, the HttpClient usage is completely hidden from DeployView.
Don't forget to await the WhenAll task and make the method async Task. (If you do not understand why that is necessary you've got some research about await to do.)