C# WebClient downloads source for some pages, but not all

I currently have this code that is supposed to grab the HTML source of the website. Specifically, I am telling it to read the source of 4chan. It WILL get the source code for a board, such as /pol/ or /news/, but it will NOT get the source code for specific threads. It throws the error: [System.Net.WebException: 'The remote server returned an error: (403) Forbidden.']
Here is the code I am working with.
public string GetSource(string url)
{
    WebClient client = new WebClient();
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12; //tried with & without this
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/6.0;)");
    try
    {
        return client.DownloadString(url);
    }
    catch
    {
        Error(2); //error code 2
    }
    return "";
}
It will download the source of "https://boards.4chan.org/pol" for example.
It will not download the source of "https://boards.4chan.org/pol/thread/#"
I am completely lost as to how to proceed. I have a "user-agent" tag, and it works sometimes, so I don't know what the problem is. Any help would be appreciated. Thanks.
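The bare catch in GetSource hides which HTTP status the server actually returned. As a small diagnostic sketch (it does not fix the 403 itself), the status can be read from the WebException. The names WebErrors and StatusOf are hypothetical and not part of the original code.

```csharp
using System;
using System.Net;

public static class WebErrors
{
    // Returns the HTTP status code carried by a WebException, or null when
    // the failure happened before any HTTP response arrived (DNS, TLS, timeout).
    public static HttpStatusCode? StatusOf(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        return response == null ? (HttpStatusCode?)null : response.StatusCode;
    }
}
```

Inside GetSource this could be used as catch (WebException ex) { var status = WebErrors.StatusOf(ex); ... } so that a 403 on thread URLs can be told apart from a network failure.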

Related

C# HttpClient request fails to scrape (both on System.Net and Windows.Web http requests)

I am trying to scrape the news off this site: https://www.livescore.com/soccer/news/
using (Windows.Web.Http.HttpClient client = new Windows.Web.Http.HttpClient())
{
    client.DefaultRequestHeaders.Add("User-Agent",
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident / 6.0)");
    using (Windows.Web.Http.HttpResponseMessage response = await client.GetAsync(new Uri(pageURL)))
    using (Windows.Web.Http.IHttpContent content = response.Content)
    {
        try
        {
            string result = await content.ReadAsStringAsync();
            Debug.WriteLine(result);
        }
        catch (Exception ex)
        {
            // Log read failures so they are visible in the debug output.
            Debug.WriteLine(ex.Message);
        }
    }
}
I see that I am getting a response containing "Your browser is out of date or some of its features are disabled".
I moved to Windows.Web to handle certificates, since I am on UWP, and tried ignoring the following certificate errors:
HttpBaseProtocolFilter filter = new HttpBaseProtocolFilter();
filter.IgnorableServerCertificateErrors.Add(ChainValidationResult.Untrusted);
filter.IgnorableServerCertificateErrors.Add(ChainValidationResult.Expired);
filter.IgnorableServerCertificateErrors.Add(ChainValidationResult.IncompleteChain);
filter.IgnorableServerCertificateErrors.Add(ChainValidationResult.WrongUsage);
filter.IgnorableServerCertificateErrors.Add(ChainValidationResult.InvalidName);
filter.IgnorableServerCertificateErrors.Add(ChainValidationResult.RevocationInformationMissing);
filter.IgnorableServerCertificateErrors.Add(ChainValidationResult.RevocationFailure);
but I am still getting the same response from the server.
Any idea how to bypass this?
Edit: They do have the old server, unsecured, http://www.livescore.com/, where I guess I can scrape everything, but the news isn't there.
I think the problem is the user-agent string: you are telling the site that the browser you are using is Internet Explorer 10.
Look at this page, http://www.useragentstring.com/pages/useragentstring.php?name=Internet+Explorer, and try the user agent for Internet Explorer 11 (before doing this, open the page in your IE11 browser to check that it works properly).

Web Request Response From Nike.com Forbidden

All I'm trying to do is create a program that gets a web response from Nike's upcoming shoes page, but I keep running into an error saying the request is forbidden. No other threads on this topic have been of use to me. Is there anything I can do about this, or am I just screwed? This is the code:
WebRequest request = WebRequest.Create("https://www.nike.com/launch/?s=upcoming");
WebResponse response = request.GetResponse();
and this is the error:
System.Net.WebException: 'The remote server returned an error: (403) Forbidden.'
Seems like a header issue, try this:
WebClient client = new WebClient();
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
client.Headers.Add("Content-Type", "application/zip, application/octet-stream");
client.Headers.Add("Referer", "http://whatevs");
client.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
String someStuff = client.DownloadString("https://www.hassanhabib.com");
Console.WriteLine(someStuff);
Console.Read();
Removed the Accept-Encoding line, should be fine now.

Webclient 404 protocol error on valid url c#

I have a WebClient that calls a URL that works fine when I view it in a browser, which led me to believe I would need to add headers to my call.
I have done this, but I am still getting the error.
I do have other calls to the same API that work fine, and I have checked that all the parameters I am passing across are exactly as expected (case, spelling).
using (var wb = new WebClient())
{
    wb.Proxy = proxy;
    wb.Headers.Add("Accept-Language", " en-US");
    wb.Headers.Add("Accept", " text/html, application/xhtml+xml, */*");
    wb.Headers.Add("User-Agent", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)");
    byte[] response = wb.UploadValues("http://myserver/api/account/GetUser",
        new NameValueCollection()
        {
            { "email", register.Email },
        });
    userDetails = Encoding.UTF8.GetString(response);
}
Does anyone have an idea why I am still getting the protocol error on a call that works perfectly fine in a browser?
UploadValues uses an HTTP POST. Are you sure that is what you want? If you are viewing it in a browser it is likely a GET, unless you are filling out some sort of web form.
One might surmise that what you are trying to do is GET this response: "http://myserver/api/account/GetUser?email=blah#blah.com",
in which case you would just formulate that URL, with query parameters, and execute a GET using one of the DownloadString overloads.
using (var wb = new WebClient())
{
    wb.Proxy = proxy;
    userDetails = wb.DownloadString("http://myserver/api/account/GetUser?email=" + register.Email);
}
The Wikipedia article on REST has a nice table that outlines the semantics of each HTTP verb, which may help you choose the appropriate WebClient method for your use case.
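When a value such as an email address is concatenated straight into the URL, characters like '+' or '&' can change the meaning of the query string. Here is a minimal sketch of building the GET URL with an escaped parameter, assuming the API accepts a standard percent-encoded query string. QueryUrl and WithParameter are hypothetical names, not part of the original code.

```csharp
using System;

public static class QueryUrl
{
    // Appends a single query parameter to baseUrl, percent-encoding both the
    // name and the value so reserved characters do not alter the query string.
    // Assumes baseUrl does not already contain a query string.
    public static string WithParameter(string baseUrl, string name, string value)
    {
        return baseUrl + "?" + Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
    }
}
```

The result could then be passed to the same call as before, for example wb.DownloadString(QueryUrl.WithParameter("http://myserver/api/account/GetUser", "email", register.Email)).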

The remote server returned an error: (416) Requested Range Not Satisfiable in C#

Code which I tried:
string contents = string.Empty;
using (var wc = new System.Net.WebClient())
{
    contents = wc.DownloadString("http://www.bizjournals.com/albany/blog/health-care/2015/10/what-this-local-bank-did-to-control-health-care.html");
}
but it's throwing this error:
The remote server returned an error: (416) Requested Range Not Satisfiable
It appears that some webservers may return a 416 if your client does not send a User-Agent header. Try adding the header like this:
wc.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");

Getting error "This page uses frames, but your browser doesn't support"

This page uses frames, but your browser doesn't support them.
This error occurs when I am trying to get information from our SMS gateway site.
The code is as follows:
WebClient client = new WebClient();
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR1.0.3705;)");
string baseurl = "http://smsoutbox.in/?user=test&password=test#123";
Stream data = client.OpenRead(baseurl);
StreamReader reader = new StreamReader(data);
string s = reader.ReadToEnd();
data.Close();
reader.Close();
I am sending a request to the http://smsoutbox.in page, which asks for a username and password; if they are valid, it shows my gateway balance on the same page in a frame.
But when I get the response, I find this error instead of the balance in the response stream:
This page uses frames, but your browser doesn't support them.
How can I solve this?
View the source of the page yourself, and look at the frames being used. Open each one separately to determine which URL you need to retrieve.
The problem might be that WebClient already submits a UserAgent and by adding another "user-agent"-header you're not replacing the original header.
Use this modified WebClient that internally uses HttpWebRequest's UserAgent property:
http://codehelp.smartdev.eu/2009/05/08/improve-webclient-by-adding-useragent-and-cookies-to-your-requests/
Alternatively it should work to correctly modify the UserAgent like this:
client.Headers["user-agent"] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR1.0.3705;)";
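To illustrate what the indexer form relies on, here is a minimal sketch, assuming that assigning through the indexer replaces any value already stored under that header name. HeaderDemo and SetUserAgentTwice are hypothetical names, and no request is sent.

```csharp
using System.Net;

public static class HeaderDemo
{
    // Assigns the User-Agent header twice through the indexer and returns the
    // stored value; the second assignment replaces the first.
    public static string SetUserAgentTwice(string first, string second)
    {
        using (var client = new WebClient())
        {
            client.Headers["user-agent"] = first;
            client.Headers["user-agent"] = second;
            return client.Headers["user-agent"];
        }
    }
}
```

Reading the header back this way before calling OpenRead is a quick check of what the client is configured to send.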
