I'm trying to fetch some webpages using the code below:
public static string FetchPage(string url)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = "GET";
    req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; sv-SE; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12 (.NET CLR 3.5.30729";
    req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    req.Headers.Add("Accept-Language", "sv-se,sv;q=0.8,en-us;q=0.5,en;q=0.3");
    req.Headers.Add("Accept-Encoding", "gzip,deflate");
    req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
    req.Headers.Add("Keep-Alive", "115");
    req.Headers.Add("Cache-Control: max-age=0");
    req.AllowAutoRedirect = true;
    req.IfModifiedSince = DateTime.Now;

    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    {
        using (Stream resStream = resp.GetResponseStream())
        {
            StreamReader reader = new StreamReader(resStream);
            return reader.ReadToEnd();
        }
    }
}
Some pages work (W3C, example.com) while most others I've tried do not (BBC.co.uk, CNN.com, etc.). Wireshark shows that I'm getting a proper response.
I've tried setting the encoding of the reader to the expected encoding of the response (CNN is UTF-8), as well as every possible combination, but I have had no luck.
What am I missing here?
The first bytes of my response are always "1f ef bf bd", in case you can tell something from that.
I suspect the most likely explanation is that you are getting compressed data and not decompressing it. Try using a stream filter to deflate/unzip it. See Rick Strahl's blog article for more info.
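A minimal sketch of such a filter, assuming the server used gzip or deflate and keeping the rest of your request code above (GZipStream and DeflateStream live in System.IO.Compression):

using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (Stream raw = resp.GetResponseStream())
{
    // Wrap the raw stream according to the Content-Encoding the server reports
    Stream decoded = raw;
    if (resp.ContentEncoding.ToLower().Contains("gzip"))
        decoded = new GZipStream(raw, CompressionMode.Decompress);
    else if (resp.ContentEncoding.ToLower().Contains("deflate"))
        decoded = new DeflateStream(raw, CompressionMode.Decompress);

    using (StreamReader reader = new StreamReader(decoded))
        return reader.ReadToEnd();
}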
Loading http://bbc.co.uk worked for me when leaving out the "Accept-Encoding" header:
req.Headers.Add("Accept-Encoding", "gzip,deflate");
Like I said, I'm trying to get some HTML content from a URL, but it's telling me "This error page might contain sensitive information". How can I solve it? My code:
try
{
    string siteContent = string.Empty;
    string url = "https://www.antalyaeo.org.tr/tr/nobetci-eczaneler";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.GZip;

    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream responseStream = response.GetResponseStream())
    using (StreamReader streamReader = new StreamReader(responseStream))
    {
        siteContent = streamReader.ReadToEnd();
    }
    return siteContent;
}
catch (WebException webex)
{
    WebResponse errResp = webex.Response;
    using (Stream respStream = errResp.GetResponseStream())
    {
        StreamReader reader = new StreamReader(respStream);
        string text = reader.ReadToEnd();
        return text;
    }
}
Error message:
This error page might contain sensitive information because ASP.NET is
configured to show verbose error messages using <customErrors
mode="Off"/>. Consider using <customErrors mode="On"/> or
<customErrors mode="RemoteOnly"/> in production environments.-->
What I tried: when I searched Google and Stack Overflow, the answers said to set customErrors mode="Off" (or something like that) under system.web in web.config, but my project already has both of those files. How can I solve it? Thanks.
You need to add a User-Agent header; otherwise it seems the server denies your request. For example:
HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
request.AutomaticDecompression = DecompressionMethods.GZip;
request.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
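The response can then be read exactly as in your original code, for example:

using (Stream responseStream = response.GetResponseStream())
using (StreamReader streamReader = new StreamReader(responseStream))
{
    siteContent = streamReader.ReadToEnd();
}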
I have the following code for getting a website and it works fine. The problem comes up when I try to get a web page developed in Angular.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201";
request.Method = "GET";
request.Timeout = 30000;

HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream flujo = response.GetResponseStream();
Encoding encode = Encoding.GetEncoding("utf-8");
StreamReader readStream = new StreamReader(flujo, encode);

String html;
try
{
    html = readStream.ReadToEnd();
}
catch (System.IO.IOException)
{
    return;
}
response.Close();
readStream.Close();

HtmlAgilityPack.HtmlDocument DOM = new HtmlAgilityPack.HtmlDocument();
DOM.LoadHtml(html);
I know Angular first supplies the skeleton of the page and then, on the client side, fetches the data and displays it.
When I try to get some info using HtmlAgilityPack, I get nothing.
My question is whether it's possible to set up HttpWebRequest, HttpWebResponse, or any other class so that it waits for the JavaScript to finish before getting the content, or something similar.
Also, I tried to get the content using WebBrowser and its LoadCompleted event (roughly as sketched below), and had the same problem.
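For reference, that attempt looked roughly like this (a sketch only, assuming a WPF WebBrowser named webBrowser1 already hosted in a window; the real code may differ in detail):

webBrowser1.LoadCompleted += (s, e) =>
{
    // Even at this point the Angular-rendered content was not in the document yet
    dynamic doc = webBrowser1.Document;   // COM HTMLDocument
    string html = doc.documentElement.outerHTML;
    var dom = new HtmlAgilityPack.HtmlDocument();
    dom.LoadHtml(html);
};
webBrowser1.Navigate(new Uri(url));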
Any help?
Thanks.
So I have a list of URLs that I'm processing, and in a couple of cases I run into an ArgumentException due to gzip encoding issues, so I drew up this code to resolve the gzip encoding problems.
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(uri);
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
req.Method = "GET";
req.UserAgent = "Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US))";
string source;

using (WebResponse webResponse = req.GetResponse())
// On the second iteration we never get beyond this line
{
    HttpWebResponse httpWebResponse = webResponse as HttpWebResponse;
    using (StreamReader reader = new StreamReader(httpWebResponse.GetResponseStream()))
    {
        source = reader.ReadToEnd();
    }
    httpWebResponse.Close();
}
req.Abort();
This works for the first URL that needs this processing. However, the second URL that needs processing times out. I'm not sure what I'm missing to get this to work consistently.
Now I do have the URLs being sent to the above method inside a foreach loop.
foreach (string uri in _UriAddresses)
{
    ProcessListItem(uri);
}
Let me know if there's anything that's not shown that would shed light on this issue.
I have looked through the forums and found many seemingly related questions, but nothing has helped thus far. I want to be able to get select PDFs from various websites. Here is a snippet that I'm using successfully for most of the documents I'm interested in.
if (!String.IsNullOrEmpty(filePaths[1]))
{
    var myRequest = (HttpWebRequest)WebRequest.Create(filePaths[1]);
    myRequest.Method = "GET";
    WebResponse myResponse = myRequest.GetResponse();
    var sr = new StreamReader(myResponse.GetResponseStream(), Encoding.UTF8);
    var fileBytes = sr.ReadToEnd();
    using (var sw = new StreamWriter("<localfilepath/name"))
    {
        sw.Write(fileBytes);
    }
}
The problem comes when I try to get this document: http://www.azdor.gov/LinkClick.aspx?fileticket=r_I2VeNlcCQ%3d&tabid=265&mid=921
If I use the above code, I get a DotNetNuke error. I tried utilizing a WebClient, as many other posts have suggested (roughly as sketched below), but I get the same error.
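The WebClient attempt was roughly along these lines (a sketch; the exact code may have differed):

using (var client = new WebClient())
{
    // Same source URL and target file as in the code below
    client.DownloadFile(filePaths[1], "WebDataTestdocs/testpdf.pdf");
}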
When I use this code:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.UserAgent = @"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0";
request.ContentType = "application/x-unknown";
request.Method = "GET";

using (WebResponse response = request.GetResponse())
{
    using (Stream stream = response.GetResponseStream())
    {
        var sr2 = new StreamReader(stream, Encoding.UTF8); //.ASCII);
        var srt = sr2.ReadToEnd();
        var a = srt.Length;
        using (var sw = new StreamWriter("WebDataTestdocs/testpdf.pdf"))
        {
            sw.Write(srt);
        }
    }
}
I get a file back, but it says it is corrupted. Also, using UTF8 makes the file bigger than the one I get when downloading from the site. If I use Encoding.ASCII, the file size is correct, but I still get the corrupted file error. I can see the English text in the file by opening it in Notepad, so I'm not sure what exactly is corrupted.
Any help that can be offered would be greatly appreciated, I've been at this for quite a while!
I have the following code that sends an HttpWebRequest to Bing. When I request the URL below, though, it returns what appears to be an empty response, when it should be returning a list of results.
var response = string.Empty;
var httpWebRequest = WebRequest.Create("http://www.bing.com/search?q=stackoverflow&count=100") as HttpWebRequest;
httpWebRequest.Method = WebRequestMethods.Http.Get;
httpWebRequest.Headers.Add("Accept-Language", "en-US");
httpWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)";
httpWebRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");

using (var httpWebResponse = httpWebRequest.GetResponse() as HttpWebResponse)
{
    Stream stream = null;
    using (stream = httpWebResponse.GetResponseStream())
    {
        if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
            stream = new GZipStream(stream, CompressionMode.Decompress);
        else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
            stream = new DeflateStream(stream, CompressionMode.Decompress);

        var streamReader = new StreamReader(stream, Encoding.UTF8);
        response = streamReader.ReadToEnd();
    }
}
It's pretty standard code for requesting and receiving a web page. Any ideas why the response is empty? Thanks in advance.
EDIT: I had left off a query string parameter in the URL; I also had &count=100, which I have now corrected. It seems to work for values of 50 and below, but returns nothing when larger. This works fine in the browser, but not for this web request.
It makes me think the issue is that the response is large and HttpWebResponse is not handling that for me the way I have it set up. Just a guess, though.
This works just fine on my machine. Perhaps you are IP banned from Bing?
Your code works fine on my machine.
I suggest you get yourself a copy of Fiddler and examine the actual HTTP session that occurs. It may be a proxy or firewall thing.