I have the following code that sends a HttpWebRequest to Bing. When I request the url below though it returns what appears to be an empty response when it should be returning a list of results.
var response = string.Empty;
var httpWebRequest = WebRequest.Create("http://www.bing.com/search?q=stackoverflow&count=100") as HttpWebRequest;
httpWebRequest.Method = WebRequestMethods.Http.Get;
httpWebRequest.Headers.Add("Accept-Language", "en-US");
httpWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)";
httpWebRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
using (var httpWebResponse = httpWebRequest.GetResponse() as HttpWebResponse)
{
Stream stream = null;
using (stream = httpWebResponse.GetResponseStream())
{
if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
stream = new GZipStream(stream, CompressionMode.Decompress);
else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
stream = new DeflateStream(stream, CompressionMode.Decompress);
var streamReader = new StreamReader(stream, Encoding.UTF8);
response = streamReader.ReadToEnd();
}
}
Its pretty standard code for requesting and receiving a web page. Any ideas why the response is empty? Thanks in advance.
EDIT I left off a query string parameter in the url. I also had &count=100 which I have now corrected. It seems to work for values of 50 and below but returns nothing when larger. This works ok when in the browser, but not for this web request.
It makes me think the issue is that the response is large and HttpWebResponse is not handling that for me the way I have it set up. Just a guess though.
This works just fine on my machine. Perhaps you are IP banned from Bing?
Your code works fine on my machine.
I suggest you get yourself a copy of Fiddler and examine the actual HTTP sesssion occuring. May be a proxy or firewall thing.
Related
I am developing a C# wpf application that has a functionality of logging into my website and download the file. This said website has an Authorize attribute on its action. I need 2 cookies for me to able to download the file, first cookie is for me to log in, second cookie(which is provided after successful log in) is for me to download the file. So i came up with the flow of keeping my cookies after my httpwebrequest/httpwebresponse. I am looking at my posting flow as maybe it is the problem. Here is my code.
void externalloginanddownload()
{
string pageSource = string.Empty;
CookieContainer cookies = new CookieContainer();
HttpWebRequest getrequest = (HttpWebRequest)WebRequest.Create("login uri");
getrequest.CookieContainer = cookies;
getrequest.Method = "GET";
getrequest.AllowAutoRedirect = false;
HttpWebResponse getresponse = (HttpWebResponse)getrequest.GetResponse();
using (StreamReader sr = new StreamReader(getresponse.GetResponseStream()))
{
pageSource = sr.ReadToEnd();
}
var values = new NameValueCollection
{
{"Username", "username"},
{"Password", "password"},
{ "Remember me?","False"},
};
var parameters = new StringBuilder();
foreach (string key in values.Keys)
{
parameters.AppendFormat("{0}={1}&",
HttpUtility.UrlEncode(key),
HttpUtility.UrlEncode(values[key]));
}
parameters.Length -= 1;
HttpWebRequest postrequest = (HttpWebRequest)WebRequest.Create("login uri");
postrequest.CookieContainer = cookies;
postrequest.Method = "POST";
using (var writer = new StreamWriter(postrequest.GetRequestStream()))
{
writer.Write(parameters.ToString());
}
using (WebResponse response = postrequest.GetResponse()) // the error 500 occurs here
{
using (var streamReader = new StreamReader(response.GetResponseStream()))
{
string html = streamReader.ReadToEnd();
}
}
}
When you get the WebResponse, the cookies returned will be in the response, not in the request (oddly enough, even though you need to CookieContainer on the request).
You will need to add the cookies from the response object to your CookieContainer, so it gets sent on the next request.
One simple way:
for(var cookie in getresponse.Cookies)
cookies.Add(cookie)
Since the cookies in response is already a cookies container, you can do this (might help to check for null in case all cookies were already there)
if (response.Cookies != null) cookies.Add(response.Cookies)
You may also have trouble with your POST as you need to set ContentType and length:
myWebRequest.ContentLength = parameters.Length;
myWebRequest.AllowWriteStreamBuffering = true;
If you have any multibyte characters to think about, you may have to address that as well by setting the encoding to UTF-8 on the request and the stringbuilder, and converting string to bytes and using that length.
Another tip: some web server code chokes if there is no user agent. Try:
myWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
And just in case you have any multibyte characters, it is better to do this:
var databytes = System.Text.Encoding.UTF8.GetBytes(parameters.ToString());
myWebRequest.ContentLength = databytes.Length;
myWebRequest.ContentType = "application/x-www-form-urlencoded; charset=utf-8";
using (var stream = myWebRequest.GetRequestStream())
{
stream.Write(databytes, 0, databytes.Length);
}
In C# Application (Server side Web API) Enable the C++ Exception and Common Language Run time Exceptions using (Ctrl+Alt+E) what is the Server side Exception it's throw.
First you check data is binding Properly. After you can see what it is Exact Exception. the Internal Server Error Mostly throw the data is not correct format and not properly managed Exception.
I have the following code for getting a website and it works fine. The problem come up when I try to get a web page developed in Angular.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201";
request.Method = "GET";
request.Timeout = 30000;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream flujo = response.GetResponseStream();
Encoding encode = Encoding.GetEncoding("utf-8");
StreamReader readStream = new StreamReader(flujo, encode);
String html;
try
{
html = readStream.ReadToEnd();
} catch(System.IO.IOException)
{
return;
}
response.Close();
readStream.Close();
HtmlAgilityPack.HtmlDocument DOM = new HtmlAgilityPack.HtmlDocument();
DOM.LoadHtml(html);
I know Angular first supply the skeleton of the page and in client side, fecth for info and display it.
When I try to get some info using HtmlAgilityPack, I get nothing.
My question is if it's possible to setup HttpWebRequest or HttpWebResponse or any other class to indicate to wait for javascript is done before getting the content or something similar.
Also, I tried to get content using WebBrowser and used the loadCompleted event and the same problem.
Any help?
Thanks.
I am doing webparsing using C# Console Application.
My code is:
var req = WebRequest.Create("http://watch.squidtv.net/");
req.BeginGetResponse(r =>
{
var response = req.EndGetResponse(r);
var stream = response.GetResponseStream();
var reader = new StreamReader(stream, true);
var str = reader.ReadToEnd();
Console.WriteLine(str);
}, null);
This Code is runing fine with other URLs but when I changed URL to http://watch.squidtv.net/ then two problems occurred-
First one- It is not downloading its html.
Second one- Its generates a sound of CPU.
Then I changed the code and used webClient like this -
string htmlCode = "";
htmlCode = client.DownloadString("http://watch.squidtv.net");
Console.WriteLine(htmlCode);
But the problem is same :(
what can be the problem ???
I found the the Solution
the probelm was HTML header in HTML header there is gzip object Encoding the httpwebrequest is not accepting the gzip header which causing the problem when i used this code the problem solved
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://watch.squidtv.net/");
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
req.Method = "GET";
req.UserAgent = "Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US))";
string htmlCode;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
htmlCode = reader.ReadToEnd();
}
Possibly you'll have to specify more in your WebRequest so that the SquidTV server can know to send you back the HTML for one idea.
Consider that in a browser there are lots of headers that get sent to the server. If you want to take a look use Fiddler or WireShark to see all the extra data that gets sent.
Firewalls could be another issue as you are sending out a request that may not be allowed and thus nothing is coming back. This would be where I'd likely suggest intermediate tools like WireShark or Fiddler that may be useful in seeing if the request is getting out at least.
I fetch webpages in order to feed data to my application. However, the pages contain a lot of images which I don't require at all. I only need the text data.
My problem is that the web requests take an unacceptable amount of time. I think the images also are fetch during a web request. Is there any way to eliminate the images and download only the text data?
The following is the code that I am using currently.
var httpWebRequest = HttpWebRequest.Create(url) as HttpWebRequest;
httpWebRequest.Method = "GET";
httpWebRequest.ProtocolVersion = HttpVersion.Version11;
httpWebRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
httpWebRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
httpWebRequest.Proxy = null;
httpWebRequest.KeepAlive = true;
httpWebRequest.Accept = "text/html";
string responseString = null;
var httpWebResponse = httpWebRequest.GetResponse() as HttpWebResponse;
using (var responseStream = httpWebResponse.GetResponseStream())
{
using (var streamReader = new StreamReader(responseStream))
{
responseString = streamReader.ReadToEnd();
}
}
Also, any other optimization suggestions are most welcome.
That is incorrect.
HttpWebRequest does not know anything about HTML or images; it just sends raw HTTP requests.
You can use Fiddler to see exactly what's going on.
I'm trying to fetch some webpages using the code below:
public static string FetchPage(string url)
{
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Method = "GET";
req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; sv-SE; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12 (.NET CLR 3.5.30729";
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
req.Headers.Add("Accept-Language", "sv-se,sv;q=0.8,en-us;q=0.5,en;q=0.3");
req.Headers.Add("Accept-Encoding", "gzip,deflate");
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
req.Headers.Add("Keep-Alive", "115");
req.Headers.Add("Cache-Control: max-age=0");
req.AllowAutoRedirect = true;
req.IfModifiedSince = DateTime.Now;
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
{
using (Stream resStream = resp.GetResponseStream())
{
StreamReader reader = new StreamReader(resStream);
return reader.ReadToEnd();
}
}
}
Some pages work (W3C, example.com) while most others I've tried do not (BBC.co.uk, CNN.com, etc). Wireshark shows that I'm getting a proper reponse.
I've tried setting the encoding of the reader to the expected encoding of the response (CNN - utf8) as well as every possible combination but I have had no luck.
What am I missing out on here?
The first bytes of my response are always "1f ef bf bd" if you're able to tell something based on that.
I suspect the most likely explanation is that you are getting compressed data and not uncompressing it. Try using a stream filter to deflate/unzip it. See Rick Strahl's blog article for more info.
Loading http://bbc.co.uk worked for me when leaving out the "Accept-Encoding" header:
req.Headers.Add("Accept-Encoding", "gzip,deflate");