I have been trying to make a simple proxy checker...
WebProxy myProxy = default(WebProxy);
foreach (string proxy in Proxies)
{
    try
    {
        myProxy = new WebProxy(proxy);
        HttpWebRequest r = HttpWebRequest.Create("http://www.google.com");
        r.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36";
        r.Timeout = 3000;
        r.Proxy = myProxy;
        HttpWebResponse re = r.GetResponse();
        Console.WriteLine($"[+] {proxy} Good", ConsoleColor.Green);
    }
    catch (Exception)
    {
        Console.WriteLine($"[-] {proxy} Bad", ConsoleColor.Red);
    }
}
for some reason this line:
HttpWebRequest r = HttpWebRequest.Create("http://www.google.com");
I see a little red line under the http, and this is the error I get:
The best overload for Create does not have a parameter named 'http'
How can I fix it? And how can I make it check proxies really fast, not one proxy every 5 seconds?
The Create method takes the URL as a string. If you pass the URL without quotes, the compiler parses http: as a named argument (and // as the start of a comment), which is exactly the error you're getting. Quote it:
HttpWebRequest r = HttpWebRequest.Create("http://www.google.com");
However, there's actually no Create declared on HttpWebRequest, only the one inherited from WebRequest, and it returns a WebRequest, so your code is effectively this (which still won't compile):
HttpWebRequest r = WebRequest.Create("http://www.google.com");
But what you want is this:
HttpWebRequest r = (HttpWebRequest)WebRequest.Create("http://www.google.com");
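As for checking proxies quickly: your loop tests one proxy at a time, so every dead proxy costs you the full timeout. Here's a minimal sketch of running the checks concurrently with Parallel.ForEach; the Proxies list and the Google test URL come from your snippet, and the degree of parallelism is an arbitrary example:

    // Requires using System.Net; and using System.Threading.Tasks;
    Parallel.ForEach(Proxies, new ParallelOptions { MaxDegreeOfParallelism = 20 }, proxy =>
    {
        try
        {
            HttpWebRequest r = (HttpWebRequest)WebRequest.Create("http://www.google.com");
            r.Proxy = new WebProxy(proxy);
            r.Timeout = 3000;
            using (HttpWebResponse re = (HttpWebResponse)r.GetResponse())
            {
                Console.WriteLine("[+] " + proxy + " Good");
            }
        }
        catch (Exception)
        {
            Console.WriteLine("[-] " + proxy + " Bad");
        }
    });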
Related
I am trying to download the HTML from a site and parse it. I am actually only interested in the OpenGraph data in the head section. For most sites WebClient, HttpClient, or HtmlAgilityPack works, but for some domains I get a 403, for example: westelm.com
I have tried setting up the headers to be exactly the same as they are when I use the browser, but I still get a 403. Here is some code:
string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
var doc = new HtmlDocument();
using (WebClient client = new WebClient())
{
    client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36";
    client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
    client.Headers["Accept-Encoding"] = "gzip, deflate, br";
    client.Headers["Accept-Language"] = "en-US,en;q=0.9";
    doc.Load(client.OpenRead(url));
}
At this point, I am getting a 403.
Am I missing something, or is the site administrator protecting the site from automated requests?
How can I make this work? Is there a better way to get OpenGraph data from a site?
Thanks.
I used your question to resolve the same problem. I don't know if you've already fixed this, but I'll tell you how it worked for me.
A page was giving me a 403 for the same reason. The thing is: you need to emulate a web browser from the code by sending a lot of headers.
I also added one of your headers that I wasn't sending before (Accept-Language).
I didn't use WebClient, though; I used HttpClient to fetch the webpage:
private static async Task<string> GetHtmlResponseAsync(HttpClient httpClient, string url)
{
    // Browser-like headers; TryAddWithoutValidation bypasses .NET's restricted-header checks.
    using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
    request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");
    request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
    request.Headers.TryAddWithoutValidation("Accept-Charset", "UTF-8");
    request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");

    using var response = await httpClient.SendAsync(request).ConfigureAwait(false);
    if (response == null)
        return string.Empty;

    // Note: this assumes the server actually answers with gzip, since we advertised it above.
    using var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
    using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
    using var streamReader = new StreamReader(decompressedStream);
    return await streamReader.ReadToEndAsync().ConfigureAwait(false);
}
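For reference, a minimal usage sketch (inside an async method): calling the method above and pulling the OpenGraph tags out with HtmlAgilityPack. The og: XPath is my assumption about the markup you're after:

    // Usage sketch (assumes HtmlAgilityPack and the method above).
    using var httpClient = new HttpClient();
    string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
    string html = await GetHtmlResponseAsync(httpClient, url);

    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // OpenGraph data lives in <meta property="og:..."> tags in the head.
    var metaNodes = doc.DocumentNode.SelectNodes("//meta[starts-with(@property, 'og:')]");
    if (metaNodes != null)
    {
        foreach (var meta in metaNodes)
            Console.WriteLine($"{meta.GetAttributeValue("property", "")} = {meta.GetAttributeValue("content", "")}");
    }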
If it helps you, I'm glad. If not, I will leave this answer here to help someone else in the future!
I want to fill my multiline TextBox from a webpage. This is my code:
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
WebResponse response = request.GetResponse();
Stream data = response.GetResponseStream();
string html = String.Empty;
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
valuetxt.Text = htmlBody.InnerText;
This code works fine for some URLs, but for some (https) URLs it gives me an error:
Could not find file 'C:\Program Files\IIS Express\www.justdial.com
or:
The remote server returned an error: (403) Forbidden
Can anyone help me? Thanks in advance, sorry for my bad English.
Are you behind a proxy? Even on the open internet, depending on your network configuration, you might need to set credentials on your connection before requesting:
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
request.Credentials = new NetworkCredential("user", "password");
It seems your address doesn't have http:// or https:// at the beginning in the urltxt variable, and you get the 'could not find file' error because of relative addressing.
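A minimal sketch of guarding against that before creating the request (defaulting to https:// is an assumption; pick whichever scheme fits):

    string url = urltxt.Text.Trim();
    if (!url.StartsWith("http://") && !url.StartsWith("https://"))
        url = "https://" + url; // make the address absolute
    WebRequest request = WebRequest.Create(url);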
Also add a user agent to your request so https sites serve you properly. UserAgent lives on HttpWebRequest, so cast for it:
((HttpWebRequest)request).UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36";
I have an ASP.NET website. When I call the URL 'http://example.org/worktodo.ashx' from a browser, it works OK.
I have created an Android app, and if I call the above URL from the Android app, it also works OK.
I have created a Windows app in C#, and if I call the above URL from that Windows app, it fails with the error 403 Forbidden.
Following is the C# code.
try
{
    bool TEST_LOCAL = false;
    //
    // One way to call the url
    //
    WebClient client = new WebClient();
    string url = TEST_LOCAL ? "http://localhost:1805/webfolder/worktodo.ashx" : "http://example.org/worktodo.ashx";
    string status = client.DownloadString(url);
    MessageBox.Show(status, "WebClient Response");
    //
    // Another way to call the url
    //
    WebRequest request = WebRequest.Create(url);
    request.Method = "GET";
    request.Headers.Add("Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
    request.Headers.Add("Connection:keep-alive");
    request.Headers.Add("User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36");
    request.Headers.Add("Upgrade-Insecure-Requests:1");
    request.Headers.Add("Accept-Encoding:gzip, deflate, sdch");
    request.ContentType = "text/json";
    WebResponse response = request.GetResponse();
    string responseString = new System.IO.StreamReader(response.GetResponseStream()).ReadToEnd();
    MessageBox.Show(responseString, "WebRequest Response");
}
catch (WebException ex)
{
    string error = ex.Status.ToString();
}
The exception thrown is:
The remote server returned an error: (403) Forbidden.
StatusCode value is 'Forbidden'
StatusDescription value is 'ModSecurity Action'
Following is the Android app code (it uses the org.apache.http library):
Handler handler = new Handler() {
    Context ctx = context; // save context for use inside handleMessage()
    @SuppressWarnings("deprecation")
    public void handleMessage(Message message) {
        switch (message.what) {
            case HttpConnection.DID_START: {
                break;
            }
            case HttpConnection.DID_SUCCEED: {
                String response = (String) message.obj;
                JSONObject jobjdata = null;
                try {
                    JSONObject jobj = new JSONObject(response);
                    jobjdata = jobj.getJSONObject("data");
                    String status = URLDecoder.decode(jobjdata.getString("status"));
                    Toast.makeText(ctx, status, Toast.LENGTH_LONG).show();
                } catch (Exception e1) {
                    Toast.makeText(ctx, "Unexpected error encountered", Toast.LENGTH_LONG).show();
                    // e1.printStackTrace();
                }
            }
        }
    }
};
final ArrayList<NameValuePair> params1 = new ArrayList<NameValuePair>();
if (RUN_LOCALLY)
    new HttpConnection(handler).post(LOCAL_URL, params1);
else
    new HttpConnection(handler).post(WEB_URL, params1);
}
Efforts / Research done so far to solve the issue:
I found the following solutions that fixed the 403 Forbidden error for others, but they could not fix my problem:
Someone said the file needs appropriate 'rwx' permissions set, so I set 'rwx' permissions on the file.
Someone said specifying USER-AGENT worked; I tried it (ref. 'Another way to call the url' above).
Someone said a valid header fixed it; they used Fiddler to find a valid header to set. I used Chrome Developer Tools and set a valid header (ref. 'Another way to call the url' above).
Someone configured ModSecurity to fix it, but I don't have ModSecurity installed for my website, so that's not an option for me.
Many were having the problem with MVC and fixed it there, but I don't use MVC, so those solutions are not for me.
The ModSecurity reference manual says that to remove it from a website, add <modules><remove name="ModSecurityIIS" /></modules> to web.config. I did, but it didn't fix the issue.
My questions are:
Why does the C# Windows app fail where the Android app succeeds?
Why doesn't the Android app encounter the 'ModSecurity Action' error?
Why does the C# Windows app encounter the 'ModSecurity Action' error?
How do I fix the C# code?
Please help me solve the issue. Thank you all.
I found the answer. Below is the code that works as expected; the key difference appears to be that Accept and User-Agent are set through the dedicated HttpWebRequest properties rather than via Headers.Add (.NET restricts those headers to their properties), so the request goes out looking like a normal browser request.
bool TEST_LOCAL = false;
string url = TEST_LOCAL ? "http://localhost:1805/webfolder/worktodo.ashx" : "http://example.org/worktodo.ashx";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36";
request.ContentType = "text/json";
WebResponse response = request.GetResponse();
string responseString = new System.IO.StreamReader(response.GetResponseStream()).ReadToEnd();
MessageBox.Show(responseString, "WebRequest Response");
NOTE: requires using System.Net;
Edit: I'm also not convinced the HttpListener is doing anything.
So, to be clear for the next post: response headers != request headers.
Why does a browser begin with the correct request headers, while a simple GET HTTP/1.1 from my client doesn't look the same, even though browsers vary the request headers a lot per domain?
My client doesn't use the cookies properly either. Why is that?
How do I get that browser magic into my own requests?
*WebClient has no .RequestHeaders property.
*I'm comparing HttpWebRequest headers against Chrome/Fiddler sniffing.
using System.Net;

private void Form1_Load(object sender, EventArgs e)
{
    CookieContainer cookieJar = new CookieContainer();
    cookieJar.GetCookies(new Uri("https://www.google.com"));
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.google.com");
    request.CookieContainer = cookieJar;
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    this.Text = request.Headers.Count.ToString();
    WebHeaderCollection header = request.Headers;
    for (int i = 0; i < header.Count; i++)
    {
        richTextBox1.AppendText(header.GetKey(i) + ": " + header[i] + "\n");
    }
}
The Fiddler/Chrome combo shows 10 request headers; my client sends 2.
Also, why does the header "Accept-Encoding: gzip,deflate,sdch" always turn the response into a weird two-character flop of data?
It's not quite clear what you are trying to achieve, but WebClient has a Headers property that you can use to make the request headers look however you wish:
using (var client = new WebClient())
{
    client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22";
    client.Headers[HttpRequestHeader.AcceptLanguage] = "fr-FR,fr;q=0.8";
    // ... set whatever other headers you want here
    string result = client.DownloadString("http://www.google.com");
}
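As for the Accept-Encoding question: those weird two characters are the start of a compressed (gzip) body. If you advertise gzip you have to decompress the response yourself, or let the framework do it. A minimal sketch using HttpWebRequest's built-in decompression:

    // Setting AutomaticDecompression both sends the Accept-Encoding header
    // and transparently decompresses the response stream.
    var request = (HttpWebRequest)WebRequest.Create("https://www.google.com");
    request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        string body = reader.ReadToEnd(); // plain text, already decompressed
    }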
Is there a way to spoof a web request from C# code so it doesn't look like a bot or spam hitting the site? I am trying to web scrape my own website but keep getting blocked after a certain number of calls. I want to act like a real browser. I am using this code, from HTML Agility Pack:
var web = new HtmlWeb();
web.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11";
I do way too much web scraping, but here are the options:
I have a default list of headers I add, since all of these are expected from a browser:
wc.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11";
wc.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
wc.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
wc.Headers[HttpRequestHeader.AcceptEncoding] = "gzip,deflate,sdch";
wc.Headers[HttpRequestHeader.AcceptLanguage] = "en-GB,en-US;q=0.8,en;q=0.6";
wc.Headers[HttpRequestHeader.AcceptCharset] = "ISO-8859-1,utf-8;q=0.7,*;q=0.3";
(wc is my WebClient.)
As a further help - here is my webclient class that keeps cookies stored - which is also a massive help:
public class CookieWebClient : WebClient
{
    public CookieContainer m_container = new CookieContainer();
    public WebProxy proxy = null;

    protected override WebRequest GetWebRequest(Uri address)
    {
        try
        {
            ServicePointManager.DefaultConnectionLimit = 1000000;
            WebRequest request = base.GetWebRequest(address);
            request.Proxy = proxy;
            HttpWebRequest webRequest = request as HttpWebRequest;
            if (webRequest != null)
            {
                // Only touch HTTP-specific settings after the null check.
                webRequest.Pipelined = true;
                webRequest.KeepAlive = true;
                webRequest.CookieContainer = m_container;
            }
            return request;
        }
        catch
        {
            return null;
        }
    }
}
Here is my usual use of it. Add a static copy to your base site class, alongside all the parsing functions you likely have:
protected static CookieWebClient wc = new CookieWebClient();
And call it as such:
public HtmlDocument Download(string url)
{
    HtmlDocument hdoc = new HtmlDocument();
    HtmlNode.ElementsFlags.Remove("option");
    HtmlNode.ElementsFlags.Remove("select");
    Stream read = null;
    try
    {
        read = wc.OpenRead(url);
    }
    catch (ArgumentException)
    {
        read = wc.OpenRead(HttpHelper.HTTPEncode(url));
    }
    hdoc.Load(read, true);
    return hdoc;
}
The other main reason you may be crashing out is that the server is closing the connection because you have had it open for too long. You can prove this by adding a try/catch around the download part as above; if it fails, reset the WebClient and try the download again:
HtmlDocument d = new HtmlDocument();
try
{
    d = this.Download(prp.PropertyUrl);
}
catch (WebException e)
{
    this.Msg(Site.ErrorSeverity.Severe, "Error connecting to " + this.URL + " : Resubmitting..");
    wc = new CookieWebClient();
    d = this.Download(prp.PropertyUrl);
}
This saves my arse all the time: even if it was the server rejecting you, this can re-jig the lot. Cookies are cleared and you're free to roam again. If worst truly comes to worst, add proxy support and apply a new proxy every 50-ish requests, as sketched below.
That should be more than enough for you to kick your own and any other site's arse.
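A minimal sketch of that proxy rotation, reusing the public proxy field on the CookieWebClient above; the proxy list and the 50-request threshold are placeholders to tune:

    // Call this before each request; swaps in the next proxy every 50 requests.
    private static readonly string[] proxies = { "1.2.3.4:8080", "5.6.7.8:3128" }; // your own list
    private static int requestCount = 0;
    private static int proxyIndex = 0;

    private static void RotateProxyIfNeeded()
    {
        if (++requestCount % 50 == 0)
        {
            proxyIndex = (proxyIndex + 1) % proxies.Length;
            wc.proxy = new WebProxy(proxies[proxyIndex]); // picked up in GetWebRequest
        }
    }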
Use a regular browser and Fiddler (if the developer tools are not up to scratch) and take a look at the request and response headers.
Build up your requests and request headers to match what the browser sends (you can use a couple of different browsers to assess whether this makes a difference).
In regards to "getting blocked after a certain amount of calls": throttle your calls. Only make one call every x seconds (see the sketch after this answer). Behave nicely to the site and it will behave nicely to you.
Chances are good that they simply look at the number of calls from your IP address per second and if it passes a threshold, the IP address gets blocked.
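Here's a minimal throttling sketch, assuming a fixed delay between requests; the 2-second interval is an arbitrary example to tune:

    // Call before each download to keep requests at least minInterval apart.
    private static DateTime lastRequest = DateTime.MinValue;
    private static readonly TimeSpan minInterval = TimeSpan.FromSeconds(2);

    private static void Throttle()
    {
        TimeSpan elapsed = DateTime.UtcNow - lastRequest;
        if (elapsed < minInterval)
            System.Threading.Thread.Sleep(minInterval - elapsed);
        lastRequest = DateTime.UtcNow;
    }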