Get webpage content in asp using c#

Get webpage content in asp using c# - c#

I want to fill my MultiLine textbox from webpage's this is my code:
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
WebResponse response = request.GetResponse();
Stream data = response.GetResponseStream();
string html = String.Empty;
using (StreamReader sr = new StreamReader(data))
{
html = sr.ReadToEnd();
}
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
valuetxt.Text = htmlBody.InnerText;
This code is working fine for some url but for some url (https) this gave me an error:
Could not find file 'C:\Program Files\IIS Express\www.justdial.com
or:
The remote server returned an error: (403) Forbidden
Can anyone help me? Thanks in advance, sorry for my bad English.

Are you behind a proxy? Even on open internet, depending on your network configuration, you might need to set credentials in your connection before requesting.
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
request.Credentials = new NetworkCredential("user", "password");

It seems your address doesn't have http:// or https:// at the beginning; in the urltxt variable and you get error because of relative addressing.

Add a UserAgent to your request to connect https properly:
request.UserAgent = #"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36";
from here

Related

C# WebClient receives 403 when getting html from a site

I am trying to download the HTML from a site and parse it. I am actually interested in the OpenGraph data in the head section only. For most sites using the WebClient, HttpClient or HtmlAgilityPack works, but some domains I get 403, for example: westelm.com
I have tried setting up the Headers to be absolutely the same as they are when I use the browser, but I still get 403. Here is some code:
string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
var doc = new HtmlDocument();
using(WebClient client = new WebClient()) {
client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36";
client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
client.Headers["Accept-Encoding"] = "gzip, deflate, br";
client.Headers["Accept-Language"] = "en-US,en;q=0.9";
doc.Load(client.OpenRead(url));
}
At this point, I am getting a 403.
Am I missing something or the site administrator is protecting the site from API requests?
How can I make this work? Is there a better way to get OpenGraph data from a site?
Thanks.

I used your question to resolve the same problem. IDK if you're already fixed this but I tell you how it worked for me
A page was giving me 403 for the same reasons. The thing is: you need to emulate a "web browser" from the code, sending a lot of headers.
I used one of yours headers I wasn't using (like Accept-Language)
I didn't use WebClient though, I used HttpClient to parse the webpage
private static async Task<string> GetHtmlResponseAsync(HttpClient httpClient, string url)
{
using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");
request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
request.Headers.TryAddWithoutValidation("Accept-Charset", "UTF-8");
request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");
using var response = await httpClient.SendAsync(request).ConfigureAwait(false);
if (response == null)
return string.Empty;
using var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
using var streamReader = new StreamReader(decompressedStream);
return await streamReader.ReadToEndAsync().ConfigureAwait(false);
}
If it helps you, I'm glad. If not, I will leave this answer here to help someone else in the future!

'System.Net.WebException' when accessing WebClient. Works fine on browser

I want to go to download a string from a website, I made this php file to show an example.
(This won't work around my whole website)
The link http://swageh.co/information.php won't be downloaded using a webClient from any PC.
I prefer using a webClient.
No matter what I try, it won't downloadString.
It works fine on a browser.
It returns an error 500 An unhandled exception of type 'System.Net.WebException' occurred in System.dll
Additional information: The underlying connection was closed: An unexpected error occurred on a send. is the error

Did you change something on the server-side?
All of the following options are working just fine for me as of right now (all return just "false" with StatusCode of 200):
var client = new WebClient();
var stringResult = client.DownloadString("http://swageh.co/information.php");
Also HttpWebRequest:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://swageh.co/information.php");
request.GetResponse().GetResponseStream();
Newer HttpClient:
var client = new HttpClient();
var req = new HttpRequestMessage(HttpMethod.Get, "http://swageh.co/information.php");
var res = client.SendAsync(req);
var stringResult = res.Result.Content.ReadAsStringAsync().Result;

it's because your website is responding with 301 Moved Permanently
see Get where a 301 URl redirects to
This shows how to automatically follow the redirect: Using WebClient in C# is there a way to get the URL of a site after being redirected?
look at Christophe Debove's answer rather than the accepted answer.
Interestingly this doesn't work - tried making headers the same as Chrome as below, perhaps use Telerik Fiddler to see what is happening.
var strUrl = "http://theurl_inhere";
var headers = new WebHeaderCollection();
headers.Add("Accept-Language", "en-US,en;q=0.9");
headers.Add("Cache-Control", "no-cache");
headers.Add("Pragma", "no-cache");
headers.Add("Upgrade-Insecure-Requests", "1");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strUrl);
request.Method = "GET";
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.Accept = "text/html,application/xhtml+xml,application/xml; q = 0.9,image / webp,image / apng,*/*;q=0.8";
request.Headers.Add( headers );
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
var strLastRedirect = response.ResponseUri.ToString();
StreamReader reader = new StreamReader(dataStream);
string strResponse = reader.ReadToEnd();
response.Close();

C# WinApp throws (403) Fobidden exception while sending HTTP/GET request

I have ASP.NET website. When I call the url 'http://example.org/worktodo.ashx' from browser it works ok.
I have created one android app and if I call the above url from android app then also it works ok.
I have created windows app in C# and if I call the above url from that windows app then it fails with error 403 forbidden.
Following is the C# code.
try
{
bool TEST_LOCAL = false;
//
// One way to call the url
//
WebClient client = new WebClient();
string url = TEST_LOCAL ? "http://localhost:1805/webfolder/worktodo.ashx" : "http://example.org/worktodo.ashx";
string status = client.DownloadString(url);
MessageBox.Show(status, "WebClient Response");
//
// Another way to call the url
//
WebRequest request = WebRequest.Create(url);
request.Method = "GET";
request.Headers.Add("Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
request.Headers.Add("Connection:keep-alive");
request.Headers.Add("User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36");
request.Headers.Add("Upgrade-Insecure-Requests:1");
request.Headers.Add("Accept-Encoding:gzip, deflate, sdch");
request.ContentType = "text/json";
WebResponse response = request.GetResponse();
string responseString = new System.IO.StreamReader(response.GetResponseStream()).ReadToEnd();
MessageBox.Show(responseString, "WebRequest Response");
}
catch (WebException ex)
{
string error = ex.Status.ToString();
}
The exception thrown is:
The remote server returned an error: (403) Forbidden.
StatusCode value is 'Forbidden'
StatusDescription value is 'ModSecurity Action'
Following is android app code (uses org.apache.http library):
Handler handler = new Handler() {
Context ctx = context; // save context for use inside handleMessage()
#SuppressWarnings("deprecation")
public void handleMessage(Message message) {
switch (message.what) {
case HttpConnection.DID_START: {
break;
}
case HttpConnection.DID_SUCCEED: {
String response = (String) message.obj;
JSONObject jobjdata = null;
try {
JSONObject jobj = new JSONObject(response);
jobjdata = jobj.getJSONObject("data");
String status = URLDecoder.decode(jobjdata.getString("status"));
Toast.makeText(ctx, status, Toast.LENGTH_LONG).show();
} catch (Exception e1) {
Toast.makeText(ctx, "Unexpected error encountered", Toast.LENGTH_LONG).show();
// e1.printStackTrace();
}
}
}
}
};
final ArrayList<NameValuePair> params1 = new ArrayList<NameValuePair>();
if (RUN_LOCALLY)
new HttpConnection(handler).post(LOCAL_URL, params1);
else
new HttpConnection(handler).post(WEB_URL, params1);
}
Efforts / Research done so far to solve the issue:
I found following solutions that fixed 403 forbidden error for them but that could not fix my problem
Someone said, the file needs to have appropriate 'rwx' permissions set, so, I set 'rwx' permissions for the file
Someone said, specifying USER-AGENT worked, I tried (ref. Another way to call)
Someone said, valid header fixed it - used Fiddler to find valid header to be set, I used Chrome / Developer Tools and set valid header (ref.
another way to call)
Someone configured ModSecurity to fix it, but, I don't have ModSecurity installed for my website, so, not an option for me
Many were having problem with MVC and fixed it, but, I don't use MVC, so those solutions are not for me
ModSecurity Reference manual says, to remove it from a website, add <modules><remove name="ModSecurityIIS" /></modules> to web.config. I did but couldn't fix the issue
My questions are:
Why C# WinApp fails where as Android App succeeds?
Why Android App doesn't encounter 'ModSecurity Action' exception?
Why C# WinApp encounter 'ModSecurity Action' exception?
How to fix C# code?
Please help me solve the issue. Thank you all.

I found the answer. Below is the code that works as expected.
bool TEST_LOCAL = false;
string url = TEST_LOCAL ? "http://localhost:1805/webfolder/worktodo.ashx" : "http://example.org/worktodo.ashx";
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
request.Method = "GET";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36";
request.ContentType = "text/json";
WebResponse response = request.GetResponse();
string responseString = new System.IO.StreamReader(response.GetResponseStream()).ReadToEnd();
MessageBox.Show(responseString, "WebRequest Response");
NOTE: requires using System.Net;

Use HtmlAgilityPack to get information from web encounter error 403

I want to get some information from the web with HtmlAgilityPack, the application was normal before I use the application to get the data from this page, the number of the error is 403, And my code is as follows:
string wikipageurl = geturl.Text;
WebClient wc1 = new WebClient();
Stream stream1 = wc1.OpenRead(wikipageurl);
StreamReader sr1 = new StreamReader(stream1, Encoding.UTF8);
showhtml.Text = sr1.ReadToEnd();
I use showhtml textbox to show me the information the application got.

This is how you can do it using HtmlAgilityPack:
HtmlDocutment doc;
HtmlWeb web = new HtmlWeb();
web.OverrideEncoding = Encoding.UTF8;
web.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0";
doc = web.Load("http://zh.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC%E5%85%83%E5%B8%85%E5%88%97%E8%A1%A8");
showhtml.Text = doc.DocumentNode.OuterHtml;
If you want to do it using WebClient check Oscar Mederos answer

Just try to simulate you're accessing to it through a web browser. For that, you use the User-Agent header:
...
WebClient wc1 = new WebClient();
wc1.Headers.Add(
"User-Agent",
"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0"
);
Stream stream1 = wc1.OpenRead(wikipageurl);
...

.Net C# : Read attachment from HttpWebResponse

Is it possible to read an image attachment from System.Net.HttpWebResponse?
I have a url to a java page, which generates images.
When I open the url in firefox, the download dialog appears. Content-type is application/png.
Seems to work.
When I try this in c#, and make a GET request I retrieve the content-type: text/html and no content-disposition header.
Simple Code:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
response.GetResponseStream() is empty.
A try with java was successful.
Do I have to prepare webrequest or something else?

You probably need to set a User-Agent header.
Run Fiddler and compare the requests.

Writing something in the UserAgent property of the HttpWebRequest does indeed make a difference in a lot of cases. A common practice for web services seem to be to ignore requests with an empty UserAgent.
See: Webmasters: Interpretation of empty User-agent
Simply set the UserAgent property to a non-empty string. You can for example use the name of your application, assembly information, impersonate a common UserAgent, or something else identifying.
Examples:
request.UserAgent = "my example program v1";
request.UserAgent = $"{System.Reflection.Assembly.GetExecutingAssembly().GetName().Name.ToString()} v{System.Reflection.Assembly.GetExecutingAssembly().GetName().Version.ToString()}";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36";
And just to give a full working example:
using System.IO;
using System.Net;
void DownloadFile(Uri uri, string filename)
{
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(uri);
request.Timeout = 10000;
request.Method = "GET";
request.UserAgent = "my example program v1";
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
using (Stream receiveStream = response.GetResponseStream())
{
using (FileStream fileStream = File.Create(filename))
{
receiveStream.CopyTo(fileStream);
}
}
}
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Get webpage content in asp using c# - c#

Are you behind a proxy? Even on open internet, depending on your network configuration, you might need to set credentials in your connection before requesting. WebRequest request = WebRequest.Create(urltxt.Text.Trim()); request.Credentials = new NetworkCredential("user", "password");

It seems your address doesn't have http:// or https:// at the beginning; in the urltxt variable and you get error because of relative addressing.

Add a UserAgent to your request to connect https properly: request.UserAgent = #"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36"; from here

Related

C# WebClient receives 403 when getting html from a site

'System.Net.WebException' when accessing WebClient. Works fine on browser

C# WinApp throws (403) Fobidden exception while sending HTTP/GET request

Use HtmlAgilityPack to get information from web encounter error 403

.Net C# : Read attachment from HttpWebResponse

Categories

Resources