HttpWebResponse hangs on multiple requests - C#

I have an application that creates many web requests to download the news pages of a web site
(I've tested it against several web sites).
After a while the application slows down when fetching the HTML source, and I found out that HttpWebResponse fails to get the response. I'm posting only the function that does this job.
public PageFetchResult Fetch()
{
    PageFetchResult fetchResult = new PageFetchResult();
    try
    {
        HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(URLAddress);
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        Uri requestedURI = new Uri(URLAddress);
        Uri responseURI = resp.ResponseUri;
        if (Uri.Equals(requestedURI, responseURI))
        {
            string resultHTML = "";
            byte[] reqHTML = ResponseAsBytes(resp);
            if (!string.IsNullOrEmpty(FetchingEncoding))
                resultHTML = Encoding.GetEncoding(FetchingEncoding).GetString(reqHTML);
            else if (!string.IsNullOrEmpty(resp.CharacterSet))
                resultHTML = Encoding.GetEncoding(resp.CharacterSet).GetString(reqHTML);
            resp.Close();
            fetchResult.IsOK = true;
            fetchResult.ResultHTML = resultHTML;
        }
        else
        {
            URLAddress = responseURI.AbsoluteUri;
            relayPageCount++;
            if (relayPageCount > 5)
            {
                fetchResult.IsOK = false;
                fetchResult.ErrorMessage = "Maximum page redirection occured.";
                return fetchResult;
            }
            return Fetch();
        }
    }
    catch (Exception ex)
    {
        fetchResult.IsOK = false;
        fetchResult.ErrorMessage = ex.Message;
    }
    return fetchResult;
}
Any solution would be greatly appreciated.

The Fetch function is called recursively and always creates a new HttpWebRequest, but the response is released only when the URL matches. You have to close the request and response in the else branch as well.
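As an illustration, here is a minimal sketch of that fix, reusing the names from the question (PageFetchResult, URLAddress, relayPageCount, ResponseAsBytes and FetchingEncoding are the question's own members): wrapping the response in a using block releases it on every path, including the redirect branch, before the recursive call.
public PageFetchResult Fetch()
{
    PageFetchResult fetchResult = new PageFetchResult();
    try
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URLAddress);
        // Dispose the response on every path so the underlying connection
        // is returned to the pool even when we follow a redirect.
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            if (Uri.Equals(new Uri(URLAddress), resp.ResponseUri))
            {
                byte[] reqHTML = ResponseAsBytes(resp);
                string charset = !string.IsNullOrEmpty(FetchingEncoding) ? FetchingEncoding : resp.CharacterSet;
                fetchResult.IsOK = true;
                fetchResult.ResultHTML = string.IsNullOrEmpty(charset)
                    ? ""
                    : Encoding.GetEncoding(charset).GetString(reqHTML);
                return fetchResult;
            }
            // Redirected: remember the new address, then let the using block
            // close the response before we recurse.
            URLAddress = resp.ResponseUri.AbsoluteUri;
        }
        relayPageCount++;
        if (relayPageCount > 5)
        {
            fetchResult.IsOK = false;
            fetchResult.ErrorMessage = "Maximum page redirection occured.";
            return fetchResult;
        }
        return Fetch();
    }
    catch (Exception ex)
    {
        fetchResult.IsOK = false;
        fetchResult.ErrorMessage = ex.Message;
    }
    return fetchResult;
}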

I agree with @volody. Also, HttpWebRequest already has a property called MaximumAutomaticRedirections, which defaults to 50; you can set it to 5 to achieve automatically what this code does by hand. If the limit is exceeded it will raise an exception, which will be handled by your existing catch block anyway.
Just set:
request.MaximumAutomaticRedirections = 5;
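For example, a sketch of that setup (URLAddress is the question's field; the limit of 5 and the comments are only illustrative):
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URLAddress);
req.AllowAutoRedirect = true;               // let the framework follow redirects
req.MaximumAutomaticRedirections = 5;       // more than 5 hops throws a WebException
try
{
    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    {
        // resp.ResponseUri is the final address after any redirects
    }
}
catch (WebException)
{
    // "Too many automatic redirections were attempted." ends up here
}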

Related

HttpWebRequest.GetRequestStream() connection gets actively refused in executable but not in WPF standalone application

I am working with a web server, making calls to its API via HttpWebRequest. I wrote a standalone WPF application for testing purposes and all of my requests were functioning correctly. Since referencing the working project file in my production application, the request is being actively refused by the server.
public string Post(string xmlData, Transaction transaction)
{
    var result = "";
    try
    {
        var webReq = (HttpWebRequest)WebRequest.Create(BaseUrl);
        webReq.Accept = "application/xml";
        webReq.ContentType = "application/xml";
        webReq.Method = "POST";
        webReq.KeepAlive = false;
        webReq.Proxy = WebRequest.DefaultWebProxy;
        webReq.ProtocolVersion = HttpVersion.Version10;
        // If we passed in data to be written to the body of the request add it
        if (!string.IsNullOrEmpty(xmlData))
        {
            webReq.ContentLength = xmlData.Length;
            using (var streamWriter = new StreamWriter(webReq.GetRequestStream())) /**CONNECTION REFUSED EXCEPTION HERE**/
            {
                streamWriter.Write(xmlData);
                streamWriter.Flush();
                streamWriter.Close();
            }
        }
        else //Otherwise write empty string as body
        {
            webReq.ContentLength = 0;
            var data = "";
            using (var streamWriter = new StreamWriter(webReq.GetRequestStream()))
            {
                streamWriter.Write(data);
                streamWriter.Flush();
                streamWriter.Close();
            }
        }
        //Attempt to get response from web request, catch exception if there is one
        using (var response = (HttpWebResponse)webReq.GetResponse())
        {
            using (var streamreader =
                new StreamReader(response.GetResponseStream() ?? throw new InvalidOperationException()))
            {
                result = streamreader.ReadToEnd();
            }
        }
        return result;
    }
    catch (WebException e)
    {
        //Handle web exceptions here
    }
    catch (Exception e)
    {
        //Handle other exceptions here
    }
    return result;
}
Has anyone else encountered this problem?
After reviewing your Fiddler captures, I can say that the reason is probably the IP address difference:
you use 192.168.1.186:44000 the first time and 192.168.1.86:44000 the second time.
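If that turns out to be the cause, one way to keep the test and production builds pointed at the same endpoint is to read the base URL from configuration instead of hard-coding it in each project. A sketch only, assuming an appSettings key named ApiBaseUrl (the key name is made up for illustration):
// App.config, with the same value in both applications:
//   <appSettings>
//     <add key="ApiBaseUrl" value="http://192.168.1.186:44000/" />
//   </appSettings>
using System.Configuration;
using System.Net;

var baseUrl = ConfigurationManager.AppSettings["ApiBaseUrl"];
var webReq = (HttpWebRequest)WebRequest.Create(baseUrl);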

Getting 302 status and too many redirects for a valid URL while using the HttpWebRequest and WebClient classes

Hi guys, I need help. I am trying to make a console application which checks whether a website is available or not. I am also trying to get the title of the page.
To do this I am using the HttpWebRequest class (for getting the status) and the WebClient class (for getting the title).
Note: the page I am trying to get is on a private server.
The URL format is applicationname-environment.corporation.companyname.com,
for example: FIFO-dev.corp.tryit.com
When I try to get the status it always gives me 401, even though the page is up and running.
List<int> Web_Status = new List<int>();
foreach (var URL in WEB_URL)
{
    try
    {
        HttpWebRequest Web_Test = (HttpWebRequest)WebRequest.Create("http://" + URL);
        Web_Test.AllowAutoRedirect = true;
        HttpWebResponse Web_response = (HttpWebResponse)Web_Test.GetResponse();
        Web_Status.Add((int)Web_response.StatusCode);
        Web_response.Close();
    }
    catch (System.Net.WebException ex)
    {
        HttpWebResponse Web_response = (HttpWebResponse)ex.Response;
        Web_Status.Add((int)Web_response.StatusCode);
    }
}
Also note that when entering the URLs I am making sure not to re-enter http://.
The code below is giving this error:
"System.Net.WebException: The remote server returned an error: (401)
Unauthorized.
at System.Net.WebClient.DownloadDataInternal(Uri address,
WebRequest& request)
at System.Net.WebClient.DownloadString(Uri address)
at website_monitoring.Get_Title.Title(List`1 WEB_URL) in "
string source = "";
List<string> status = new List<string>();
WebClient x = new WebClient();
foreach (var item in WEB_URL)
{
    try
    {
        source = x.DownloadString("http://" + item);
        status.Add(Regex.Match(source, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", RegexOptions.IgnoreCase).Groups["Title"].Value);
    }
    catch (System.Net.WebException ex)
    {
        status.Add(ex.ToString());
    }
}
Sorry guys, I can't give the exact URL I am trying.
This code is working with all the common websites and blogs, like
"stackoverflow.com", "http://understandingarduino.blogspot.in/" and so on.
Update 1: following mammago's suggestion I was able to handle the 4xx issue, but now I'm getting "too many redirects" errors while getting the title.
I was able to handle the 302 status issue by allowing auto-redirect and raising the redirection limit to 1000:
List<int> Web_Status = new List<int>();
foreach (var URL in WEB_URL)
{
    try
    {
        HttpWebRequest Web_Test = (HttpWebRequest)WebRequest.Create("http://" + URL);
        // Set credentials to use for this request.
        Web_Test.Credentials = CredentialCache.DefaultCredentials;
        Web_Test.CookieContainer = new CookieContainer();
        Web_Test.AllowAutoRedirect = true;
        Web_Test.MaximumAutomaticRedirections = 1000;
        //Web_Test.UserAgent =
        HttpWebResponse Web_response = (HttpWebResponse)Web_Test.GetResponse();
        Web_Status.Add((int)Web_response.StatusCode);
        Web_response.Close();
    }
    catch (System.Net.WebException ex)
    {
        HttpWebResponse Web_response = (HttpWebResponse)ex.Response;
        Web_Status.Add((int)Web_response.StatusCode);
    }
}
Now all I need help with is how to handle the auto-redirect issue in this segment:
string source = "";
List<string> status = new List<string>();
WebClient x = new WebClient();
// let the website know a known user is accessing it (as if the website has automatic authentication)
x.Credentials = CredentialCache.DefaultCredentials;
foreach (var item in WEB_URL)
{
    try
    {
        source = x.DownloadString("http://" + item);
        status.Add(Regex.Match(source, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", RegexOptions.IgnoreCase).Groups["Title"].Value);
    }
    catch (System.Net.WebException ex)
    {
        status.Add(ex.ToString());
        //source = x.DownloadString("http://" + ex);
        //status.Add(Regex.Match(source, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", RegexOptions.IgnoreCase).Groups["Title"].Value);
    }
}
"System.Net.WebException: Too many automatic redirections were
attempted. at System.Net.WebClient.DownloadDataInternal(Uri
address, WebRequest& request) at
System.Net.WebClient.DownloadString(Uri address)

Why am I getting the exception "Too many automatic redirections were attempted" on WebClient?

At the top of Form1 I did:
WebClient Client;
Then in the constructor:
Client = new WebClient();
Client.DownloadFileCompleted += Client_DownloadFileCompleted;
Client.DownloadProgressChanged += Client_DownloadProgressChanged;
Then I have this method, which I call every minute:
private void fileDownloadRadar()
{
    if (Client.IsBusy == true)
    {
        Client.CancelAsync();
    }
    else
    {
        Client.DownloadProgressChanged += Client_DownloadProgressChanged;
        Client.DownloadFileAsync(myUri, combinedTemp);
    }
}
Every minute it downloads an image from a website, the same image each time.
It was all working for more than 24 hours with no problems until now, when it started throwing this exception in the download-completed event:
private void Client_DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
{
    if (e.Error != null)
    {
        timer1.Stop();
        span = new TimeSpan(0, (int)numericUpDown1.Value, 0);
        label21.Text = span.ToString(@"mm\:ss");
        timer3.Start();
    }
    else if (!e.Cancelled)
    {
        label19.ForeColor = Color.Green;
        label19.Text = "חיבור האינטרנט והאתר תקינים"; // Hebrew: "The internet connection and the site are OK"
        label19.Visible = true;
        timer3.Stop();
        if (timer1.Enabled != true)
        {
            if (BeginDownload == true)
            {
                timer1.Start();
            }
        }
        bool fileok = Bad_File_Testing(combinedTemp);
        if (fileok == true)
        {
            File1 = new Bitmap(combinedTemp);
            bool compared = ComparingImages(File1);
            if (compared == false)
            {
                DirectoryInfo dir1 = new DirectoryInfo(sf);
                FileInfo[] fi = dir1.GetFiles("*.gif");
                last_file = fi[fi.Length - 1].FullName;
                string lastFileNumber = last_file.Substring(82, 6);
                int lastNumber = int.Parse(lastFileNumber);
                lastNumber++;
                string newFileName = string.Format("radar{0:D6}.gif", lastNumber);
                identicalFilesComparison = File_Utility.File_Comparison(combinedTemp, last_file);
                if (identicalFilesComparison == false)
                {
                    string newfile = Path.Combine(sf, newFileName);
                    File.Copy(combinedTemp, newfile);
                    LastFileIsEmpty();
                }
            }
            if (checkBox2.Checked)
            {
                simdownloads.SimulateDownloadRadar();
            }
        }
        else
        {
            File.Delete(combinedTemp);
        }
        File1.Dispose();
    }
}
Now it stopped inside the if (e.Error != null) block,
on the line timer1.Stop();.
Looking at e.Error, this is the stack trace:
at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
at System.Net.WebClient.GetWebResponse(WebRequest request, IAsyncResult result)
at System.Net.WebClient.DownloadBitsResponseCallback(IAsyncResult result)
How can I solve this problem so it won't happen again? And why did it happen?
EDIT:
I tried changing the fileDownloadRadar method to the following, to release the client every time:
private void fileDownloadRadar()
{
    using (WebClient client = new WebClient())
    {
        if (client.IsBusy == true)
        {
            client.CancelAsync();
        }
        else
        {
            client.DownloadFileAsync(myUri, combinedTemp);
        }
    }
}
The problem is that the constructor uses Client while this method uses client, two different WebClient variables.
How can I solve this and the exception?
I'm still not sure why I got this exception after everything had been working with no problems for more than 24 hours.
I have now run the program again and it's working, but I wonder whether I will get this exception again tomorrow or in the next few hours.
This is the link to the site with the image I'm downloading every minute.
I had the same problem with WebClient and found the solution here:
http://blog.developers.ba/fixing-issue-httpclient-many-automatic-redirections-attempted/
Using HttpWebRequest and setting a CookieContainer solved the problem, for example:
HttpWebRequest webReq = (HttpWebRequest)HttpWebRequest.Create(linkUrl);
try
{
    webReq.CookieContainer = new CookieContainer();
    webReq.Method = "GET";
    using (WebResponse response = webReq.GetResponse())
    {
        using (Stream stream = response.GetResponseStream())
        {
            StreamReader reader = new StreamReader(stream);
            res = reader.ReadToEnd();
            ...
        }
    }
}
catch (Exception ex)
{
    ...
}
If you're getting an exception with a description that says there are too many redirections, it's because the Web site you're trying to access is redirecting to another site, which is directing to another, and another, etc. beyond the default redirections limit.
So, for example, you try to get an image from site A. Site A redirects you to site B. Site B redirects you to site C, etc.
WebClient is configured to follow redirections up to some default limit. Since WebClient is based on HttpWebRequest, it's likely that it is using the default value for MaximumAutomaticRedirections, which is 50.
Most likely, either there is a bug on the server and it's redirecting in a tight loop, or they got tired of you hitting the server for the same file once per minute and they're purposely redirecting you in a circle.
The only way to determine what's really happening is to change your program so that it doesn't automatically follow redirections. That way, you can examine the redirection URL returned by the Web site and determine what's really going on. If you want to do that, you'll need to use HttpWebRequest rather than WebClient.
Or, you could use something like wget with verbose logging turned on. That will show you what the server is returning when you make a request.
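A bare-bones way to do that inspection with HttpWebRequest might look like this (a sketch only; myUri is the question's field, and a 3xx response does not throw when auto-redirect is off):
var probe = (HttpWebRequest)WebRequest.Create(myUri);
probe.AllowAutoRedirect = false;                 // stop at the first redirect
using (var resp = (HttpWebResponse)probe.GetResponse())
{
    Console.WriteLine((int)resp.StatusCode);     // e.g. 301 or 302
    Console.WriteLine(resp.Headers["Location"]); // where the server wants to send you next
}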
Although this is an old topic, I couldn't help but notice that the poster was using WebClient, which sends no User-Agent when making the request. Many sites will reject or redirect clients that don't have a proper User-Agent string.
Consider setting WebClient.Headers["User-Agent"].
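For example (the User-Agent string below is only an illustration):
using (var client = new WebClient())
{
    client.Headers["User-Agent"] = "Mozilla/5.0 (compatible; RadarDownloader/1.0)";
    client.DownloadFile(myUri, combinedTemp);    // myUri and combinedTemp as in the question
}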
The problem can be solved by setting a cookie container and, most importantly, by setting webRequest.AllowAutoRedirect = false;, like this:
HttpWebRequest webRequest = (HttpWebRequest)HttpWebRequest.Create(url);
webRequest.CookieContainer = new CookieContainer();
webRequest.AllowAutoRedirect = false;
I had this error but found a simple fix.
You don't need all that code; all you need to do is download the cookie at the beginning of your application, like this (sorry, I work with VB, but it's pretty simple to convert):
[your application namespace].Application.GetCookie(New Uri("https://[site]"))
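In C#, assuming a WPF application (this static method lives on System.Windows.Application), the equivalent call would be:
System.Windows.Application.GetCookie(new Uri("https://[site]"));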
The easiest approach is to create a CookieAwareWebClient and override the creation of the WebRequest.
It looks like this:
public class CookieAwareWebClient : WebClient {
    public CookieContainer CookieContainer { get; set; }
    public Uri Uri { get; set; }

    public CookieAwareWebClient()
        : this(new CookieContainer()) {
    }

    public CookieAwareWebClient(CookieContainer cookies) {
        this.CookieContainer = cookies;
    }

    protected override WebRequest GetWebRequest(Uri address) {
        var request = base.GetWebRequest(address);
        if (request is HttpWebRequest) {
            (request as HttpWebRequest).CookieContainer = this.CookieContainer;
        }
        var httpRequest = (HttpWebRequest)request;
        httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        httpRequest.AllowAutoRedirect = true;
        httpRequest.MaximumAutomaticRedirections = 100;
        httpRequest.ContinueTimeout = 5 * 60 * 1000;
        httpRequest.Timeout = 5 * 60 * 1000;
        return httpRequest;
    }

    protected override WebResponse GetWebResponse(WebRequest request) {
        var response = base.GetWebResponse(request);
        var setCookieHeader = response.Headers[HttpResponseHeader.SetCookie];
        //if (setCookieHeader != null)
        //{
        //    Cookie cookie = new Cookie(); //create cookie
        //    cookie.Value = setCookieHeader;
        //    this.CookieContainer.Add(cookie);
        //}
        return response;
    }
}
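Usage is then the same as with a plain WebClient; a short sketch:
var client = new CookieAwareWebClient();
// The same CookieContainer is assigned to every request, so cookies set by
// the first response are resent on later requests and redirects, which is
// what breaks the redirect loop in many cases.
string html = client.DownloadString("http://example.com/");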
Here is the VB.Net version of @Ron.Eng's answer:
Public Function DownloadFileWithCookieContainerWebRequest(URL As String, FileName As String)
    Dim webReq As HttpWebRequest = HttpWebRequest.Create(URL)
    Try
        webReq.CookieContainer = New CookieContainer()
        webReq.Method = "GET"
        Using response As WebResponse = webReq.GetResponse()
            Using Stream As Stream = response.GetResponseStream()
                Dim reader As StreamReader = New StreamReader(Stream)
                Dim res As String = reader.ReadToEnd()
                File.WriteAllText(FileName, res)
            End Using
        End Using
    Catch ex As Exception
        Throw ex
    End Try
End Function

WebException when loading RSS feed

I am attempting to load a page I've received from an RSS feed and I receive the following WebException:
Cannot handle redirect from HTTP/HTTPS protocols to other dissimilar ones.
with an inner exception:
Invalid URI: The hostname could not be parsed.
I wrote code that attempts to load the URL via an HttpWebRequest. Following some suggestions I received, when the HttpWebRequest fails I set AllowAutoRedirect to false and basically loop through the redirects manually until I find out what ultimately fails. Here's the code I'm using; please forgive the gratuitous Console.Write/WriteLine calls:
Uri url = new Uri(val);
bool result = true;
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(url);
string source = String.Empty;
Uri responseURI;
try
{
    using (System.Net.WebResponse webResponse = req.GetResponse())
    {
        using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
        {
            responseURI = httpWebResponse.ResponseUri;
            StreamReader reader;
            if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
            {
                reader = new StreamReader(new GZipStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
            }
            else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
            {
                reader = new StreamReader(new DeflateStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
            }
            else
            {
                reader = new StreamReader(httpWebResponse.GetResponseStream());
            }
            source = reader.ReadToEnd();
            reader.Close();
        }
    }
    req.Abort();
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(source);
    result = true;
}
catch (ArgumentException ae)
{
    Console.WriteLine(url + "\n--\n" + ae.Message);
    result = false;
}
catch (WebException we)
{
    Console.WriteLine(url + "\n--\n" + we.Message);
    result = false;
    string urlValue = url.ToString();
    try
    {
        bool cont = true;
        int count = 0;
        do
        {
            req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(urlValue);
            req.Headers.Add("Accept-Language", "en-us,en;q=0.5");
            req.AllowAutoRedirect = false;
            using (System.Net.WebResponse webResponse = req.GetResponse())
            {
                using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
                {
                    responseURI = httpWebResponse.ResponseUri;
                    StreamReader reader;
                    if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
                    {
                        reader = new StreamReader(new GZipStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
                    }
                    else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
                    {
                        reader = new StreamReader(new DeflateStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
                    }
                    else
                    {
                        reader = new StreamReader(httpWebResponse.GetResponseStream());
                    }
                    source = reader.ReadToEnd();
                    if (string.IsNullOrEmpty(source))
                    {
                        urlValue = httpWebResponse.Headers["Location"].ToString();
                        count++;
                        reader.Close();
                    }
                    else
                    {
                        cont = false;
                    }
                }
            }
        } while (cont);
    }
    catch (UriFormatException uriEx)
    {
        Console.WriteLine(urlValue + "\n--\n" + uriEx.Message + "\r\n");
        result = false;
    }
    catch (WebException innerWE)
    {
        Console.WriteLine(urlValue + "\n--\n" + innerWE.Message + "\r\n");
        result = false;
    }
}
if (result)
    Console.WriteLine("testing successful");
else
    Console.WriteLine("testing unsuccessful");
Since this is currently just test code I hardcode val as http://rss.nytimes.com/c/34625/f/642557/s/3d072012/sc/38/l/0Lartsbeat0Bblogs0Bnytimes0N0C20A140C0A70C30A0Csarah0Ekane0Eplay0Eamong0Eofferings0Eat0Est0Eanns0Ewarehouse0C0Dpartner0Frss0Gemc0Frss/story01.htm
the ending url that gives the UriFormatException is: http:////www-nc.nytimes.com/2014/07/30/sarah-kane-play-among-offerings-at-st-anns-warehouse/?=_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&partner=rss&emc=rss&_r=6&
Now, I'm not sure if I'm missing something or if I'm doing the looping wrong, but if I take val and just put it into a browser, the page loads fine; and if I take the URL that causes the exception and put it in a browser, I get taken to an account login for nytimes.
I have a number of these RSS feed URLs that are resulting in this problem. I also have a large number of these RSS feed URLs that load with no problem at all. Let me know if there is any more information needed to help resolve this. Any help would be greatly appreciated.
Could it be that I need to have some sort of cookie capability enabled?
You need to keep track of the cookies while doing all your requests. You can use an instance of the CookieContainer class to achieve that.
At the top of your method I made the following changes:
Uri url = new Uri(val);
bool result = true;
// keep all our cookies for the duration of our calls
var cookies = new CookieContainer();
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(url);
// assign our CookieContainer to the new request
req.CookieContainer = cookies;
string source = String.Empty;
Uri responseURI;
try
{
And in the exception handler where you create a new HttpWebRequest, you do the assignment from our CookieContainer again:
do
{
    req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(urlValue);
    // reuse our cookies!
    req.CookieContainer = cookies;
    req.Headers.Add("Accept-Language", "en-us,en;q=0.5");
    req.AllowAutoRedirect = false;
    using (System.Net.WebResponse webResponse = req.GetResponse())
    {
This makes sure that on each successive call the cookies already present are sent again with the next request. If you leave this out, no cookies are sent, and therefore the site you are trying to visit assumes you are a fresh/new/unseen user and sends you down a kind of authentication path.
If you want to store/keep cookies beyond this method, you could move the cookie instance variable to a public static property so you can use all those cookies program-wide, like so:
public static class Cookies
{
    static readonly CookieContainer _cookies = new CookieContainer();

    public static CookieContainer All
    {
        get
        {
            return _cookies;
        }
    }
}
And to use it in a WebRequest:
var req = (System.Net.HttpWebRequest) WebRequest.Create(url);
req.CookieContainer = Cookies.All;

How to ensure the response for an image is complete?

I'm doing a web-scraping project in ASP.NET for a website. The site requires a captcha code, so I need to fetch the captcha image for users to key in before continuing.
So far the project is working fine, but the only problem I have found is that sometimes the captcha response is not captured entirely, so converting the response stream to an Image fails with the following error:
"Parameter is invalid."
I noticed that web browsers do not have this problem; they can always show the captcha nicely as long as the server is not down.
HttpWebRequest, however, is sometimes able to get it and sometimes not. Is there a way to ensure that the response stream is complete?
My code snippet is as follows:
public Image GetCaptchaCode()
{
    Image returnVal = null;
    Uri uri = new Uri(URL_CAPTCHA);
    HttpWebRequest request = null;
    HttpWebResponse response = null;
    try
    {
        // Get Cookies
        CookieCollection cookies = this.GetCookies();
        foreach (Cookie cookie in cookies)
        {
            Console.WriteLine(cookie.Name + ": " + cookie.Value);
        }
        // Get Captcha
        request = (HttpWebRequest)HttpWebRequest.Create(uri);
        request.ProtocolVersion = HttpVersion.Version11;
        request.Method = WebRequestMethods.Http.Get; // use GET for loading Captcha
        request.CookieContainer = this._cookies; // Store Cookies Info
        System.Net.ServicePointManager.Expect100Continue = false;
        // Add more cookies
        if (cookies != null)
        {
            request.CookieContainer.Add(cookies);
        }
        // Handle Gzip Compression
        request.Headers.Add(HttpRequestHeader.AcceptEncoding, HEADER_TYPE);
        request.AutomaticDecompression = DecompressionMethods.GZip;
        request.Referer = URL_REFERER;
        request.UserAgent = USER_AGENT;
        // Get Response
        response = (HttpWebResponse)request.GetResponse();
        returnVal = Image.FromStream(response.GetResponseStream());
    }
    catch (Exception ex)
    {
        string errMsg = ex.Message;
    }
    finally
    {
        if (uri != null) uri = null;
        if (request != null) request = null;
        if (response != null)
        {
            response.Close();
            response = null;
        }
    }
    return returnVal;
}
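One thing worth trying (a sketch, not a verified fix) is to drain the response into a MemoryStream first and only then hand it to Image.FromStream, so GDI+ never decodes from a half-read network stream; request and returnVal are the variables from the method above:
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream body = response.GetResponseStream())
{
    var buffer = new MemoryStream();
    body.CopyTo(buffer);        // read the body to the end before decoding
    buffer.Position = 0;
    // Keep the MemoryStream alive for the lifetime of the Image,
    // as Image.FromStream requires.
    returnVal = Image.FromStream(buffer);
}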
