'System.Net.WebException' when accessing WebClient. Works fine on browser - c#

I want to download a string from a website; I made this PHP file to show an example.
(This doesn't affect my whole website.)
The link http://swageh.co/information.php can't be downloaded using a WebClient from any PC.
I'd prefer to use a WebClient.
No matter what I try, DownloadString fails.
It works fine in a browser.
It returns an error 500: An unhandled exception of type 'System.Net.WebException' occurred in System.dll
Additional information: The underlying connection was closed: An unexpected error occurred on a send.

Did you change something on the server-side?
All of the following options work just fine for me as of right now (all return just "false" with a StatusCode of 200):
var client = new WebClient();
var stringResult = client.DownloadString("http://swageh.co/information.php");
Also HttpWebRequest:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://swageh.co/information.php");
request.GetResponse().GetResponseStream();
Newer HttpClient:
var client = new HttpClient();
var req = new HttpRequestMessage(HttpMethod.Get, "http://swageh.co/information.php");
var res = client.SendAsync(req);
var stringResult = res.Result.Content.ReadAsStringAsync().Result;
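If the calls above still fail on your machine with "The underlying connection was closed: An unexpected error occurred on a send", on older .NET Framework versions (4.5/4.6) this is often a TLS negotiation problem, since TLS 1.2 is not enabled by default there. A minimal sketch, assuming .NET Framework 4.5 or later:

```csharp
// Sketch: enable modern TLS versions before the first request is made.
// On .NET Framework 4.5/4.6 the default protocol set may exclude TLS 1.2,
// which commonly surfaces as "An unexpected error occurred on a send."
using System;
using System.Net;

class TlsFix
{
    static void Main()
    {
        ServicePointManager.SecurityProtocol =
            SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;

        var client = new WebClient();
        var stringResult = client.DownloadString("http://swageh.co/information.php");
        Console.WriteLine(stringResult);
    }
}
```

This only needs to run once per process, before the first request; on .NET Framework 4.7+ the OS defaults are used and this is usually unnecessary.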

It's because your website is responding with 301 Moved Permanently;
see Get where a 301 URL redirects to.
This shows how to automatically follow the redirect: Using WebClient in C# is there a way to get the URL of a site after being redirected?
Look at Christophe Debove's answer rather than the accepted answer.
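For reference, a minimal sketch of inspecting the 301 yourself instead of auto-following it (assuming the server answers with a Location header):

```csharp
// Sketch: disable auto-redirect so the 301 and its Location header
// can be read directly. GetResponse does not throw on 3xx statuses.
using System;
using System.Net;

class RedirectProbe
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://swageh.co/information.php");
        request.AllowAutoRedirect = false; // keep the 301 instead of following it
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine((int)response.StatusCode);     // e.g. 301
            Console.WriteLine(response.Headers["Location"]); // the redirect target
        }
    }
}
```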
Interestingly, this doesn't work - I tried making the headers the same as Chrome's, as below; perhaps use Telerik Fiddler to see what is happening.
var strUrl = "http://theurl_inhere";
var headers = new WebHeaderCollection();
headers.Add("Accept-Language", "en-US,en;q=0.9");
headers.Add("Cache-Control", "no-cache");
headers.Add("Pragma", "no-cache");
headers.Add("Upgrade-Insecure-Requests", "1");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strUrl);
request.Method = "GET";
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
request.Headers.Add(headers);
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
var strLastRedirect = response.ResponseUri.ToString();
StreamReader reader = new StreamReader(dataStream);
string strResponse = reader.ReadToEnd();
response.Close();

Related

How to get redirect url of http://www.google.com site

I have tried this simple code to get a 307 redirect code, but it failed.
string urlRequest = "http://www.google.com";
HttpWebRequest request = WebRequest.Create(urlRequest) as HttpWebRequest;
request.AllowAutoRedirect = false;
var response = request.GetResponse();
I expected the response status code to be 307 and the AbsoluteUri to be "https://www.google.com", but it isn't - why?
Google does not initiate a redirect in this case because it cannot be sure that the client supports https. It seems that Google checks the User-Agent header of the request and only initiates a redirect when it can be sure the user agent supports https.
string urlRequest = "http://www.google.com";
HttpWebRequest request = HttpWebRequest.CreateHttp(urlRequest);
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0";
request.AllowAutoRedirect = false;
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
MessageBox.Show(response.StatusCode.ToString());
MessageBox.Show(response.Headers["Location"]);
Other request headers will also influence how Google behaves.

Get webpage content in asp using c#

I want to fill my MultiLine textbox from a webpage. This is my code:
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
WebResponse response = request.GetResponse();
Stream data = response.GetResponseStream();
string html = String.Empty;
using (StreamReader sr = new StreamReader(data))
{
html = sr.ReadToEnd();
}
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
valuetxt.Text = htmlBody.InnerText;
This code works fine for some URLs, but for some (https) URLs it gives me an error:
Could not find file 'C:\Program Files\IIS Express\www.justdial.com
or:
The remote server returned an error: (403) Forbidden
Can anyone help me? Thanks in advance, sorry for my bad English.
Are you behind a proxy? Even on the open internet, depending on your network configuration, you might need to set credentials on your connection before making the request.
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
request.Credentials = new NetworkCredential("user", "password");
It seems your address doesn't start with http:// or https:// in the urltxt variable, and you get the error because of relative addressing.
Add a UserAgent to your request to connect over https properly:
request.UserAgent = @"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36";
from here

Getting 403 Exception fetching web page programmatically even though web page is available via browser

I'm trying to fetch the HTML of a page through code:
WebRequest r = WebRequest.Create(szPageURL);
WebClient client = new WebClient();
try
{
WebResponse resp = r.GetResponse();
StreamReader sr = new StreamReader(resp.GetResponseStream());
szHTML = sr.ReadToEnd();
}
This code works when I use URLs like www.microsoft.com, www.google.com, or www.nasa.gov. However, when I put in www.epa.gov (using either 'http' or 'https' in the URL parameter), I get a 403 exception when executing r.GetResponse(). Yet I can easily fetch the page manually in a browser. The exception I'm getting is 403 (Forbidden), and the exception's Status member says "ProtocolError". What does that mean? Why am I getting this on a page that actually is available? Anyone have any ideas? Thanks!
BTW - I also tried this way:
string downloadString = client.DownloadString(szPageURL);
Got exact same exception.
Try this code; it works:
string Url = "https://www.epa.gov/";
CookieContainer cookieJar = new CookieContainer();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
request.CookieContainer = cookieJar;
request.Accept = @"text/html, application/xhtml+xml, */*";
request.Referer = @"https://www.epa.gov/";
request.Headers.Add("Accept-Language", "en-GB");
request.UserAgent = @"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)";
request.Host = @"www.epa.gov";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
String htmlString;
using (var reader = new StreamReader(response.GetResponseStream()))
{
htmlString = reader.ReadToEnd();
}

Webservice and HttpWebRequest

I have a website with the webservice active (PrestaShop).
The site requires authentication.
I use this code:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36";
request.Method = "GET";
request.Credentials = new NetworkCredential("key", "");
request.PreAuthenticate = true;
//request.Connection
request.Host = "localhost";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
String R = reader.ReadToEnd();
The code is OK, but my problem is that there is a login form for the webservice.
In fact, the HttpWebRequest object sends two requests:
the first answer is not authorized, while the second has OK status.
I checked this with Fiddler Web Debugger.
I apologize for my English.
If the form is submitted using the GET method, you must pass the form parameters in the URL query string, for instance http://url?username={0}&pass={1}. If it uses the POST method, you must pass the form data in the HTTP request body. There are a lot of examples of this on Stack Overflow. You must also handle cookies, which is achieved using the CookieContainer. In the first request, initialize the container:
request.CookieContainer = new CookieContainer();
When the request comes back with OK status, the cookies will be in response.Cookies, which is a CookieCollection instance. For subsequent requests you must pass these cookies along in order to retrieve the correct data:
request.CookieContainer = new CookieContainer();
request.CookieContainer.Add(userCookies);
Hope it helps!
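Putting the two steps above together, a minimal sketch of the round trip (the URLs here are placeholders, not the actual PrestaShop endpoints):

```csharp
// Sketch of the two-request flow: the first request collects the session
// cookies into a shared CookieContainer, the second request replays them.
using System;
using System.IO;
using System.Net;

class CookieFlow
{
    static void Main()
    {
        var cookies = new CookieContainer(); // shared across both requests

        // First request: the server's Set-Cookie headers land in the container.
        var first = (HttpWebRequest)WebRequest.Create("http://localhost/login");
        first.CookieContainer = cookies;
        using (var resp = (HttpWebResponse)first.GetResponse()) { }

        // Second request: the same container sends the cookies back.
        var second = (HttpWebRequest)WebRequest.Create("http://localhost/api/data");
        second.CookieContainer = cookies;
        using (var resp = (HttpWebResponse)second.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```

Note that response.Cookies is only populated when a CookieContainer was assigned to the request before it was sent.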

Screen scraping: unable to authenticate into a site utilizing ASP .NET Forms Authentication

Using a C# WebRequest, I am attempting to screen scrape a website utilizing ASP.NET Forms Authentication.
First, the application performs a GET to the login page and extracts the __VIEWSTATE and __EVENTVALIDATION keys from hidden input fields, and the .NET SessionId from its cookie. Next, the application performs a POST with the username, password, other required form fields, and the three aforementioned .NET variables to the form action.
From a Fiddler session using Chrome to authenticate into the website, I am expecting a 302 with a token stored in a cookie to allow navigation of the secure area of the site. I cannot understand why I keep getting 302s without a token, redirecting me to the website's non-authenticated home page. In Fiddler, my application's request looks exactly the same as the request made from within Chrome or Firefox.
// Create a request using a URL that can receive a post.
var request = (HttpWebRequest)WebRequest.Create(LoginUrl);
// Set the Method property of the request to POST.
_container = new CookieContainer();
request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.17 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers["Accept-Encoding"] = "gzip,deflate,sdch";
request.Headers["Accept-Language"] = "en-US,en;q=0.8";
var response = (HttpWebResponse)request.GetResponse();
_container.Add(response.Cookies);
string responseFromServer;
using (var decompress = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))
{
using (var reader = new StreamReader(decompress))
{
// Read the content.
responseFromServer = reader.ReadToEnd();
}
}
var doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);
var hiddenFields = doc.DocumentNode.SelectNodes("//input[@type='hidden']").ToDictionary(input => input.GetAttributeValue("name", ""), input => input.GetAttributeValue("value", ""));
request = (HttpWebRequest)WebRequest.Create(LoginUrl);
request.Method = "POST";
request.CookieContainer = _container;
// Create POST data and convert it to a byte array. Modify this line accordingly
var postData = String.Format("ddlsubsciribers={0}&memberfname={1}&memberpwd={2}&chkRemberMe=true&Imgbtn=LOGIN&__EVENTTARGET&__EVENTARGUMENT&__LASTFOCUS", Agency, Username, Password);
postData = hiddenFields.Aggregate(postData, (current, field) => current + ("&" + field.Key + "=" + field.Value));
ServicePointManager.ServerCertificateValidationCallback = AcceptAllCertifications;
var byteArray = Encoding.UTF8.GetBytes(postData);
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.17 Safari/537.36";
// Set the ContentType property of the WebRequest.
request.ContentType = "application/x-www-form-urlencoded";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers["Accept-Encoding"] = "gzip,deflate,sdch";
request.Headers["Accept-Language"] = "en-US,en;q=0.8";
// Set the ContentLength property of the WebRequest.
request.ContentLength = byteArray.Length;
// Get the request stream.
var dataStream = request.GetRequestStream();
// Write the data to the request stream.
dataStream.Write(byteArray, 0, byteArray.Length);
// Close the Stream object.
dataStream.Close();
// Get the response.
response = (HttpWebResponse)request.GetResponse();
_container.Add(response.Cookies);
// Clean up the streams.
dataStream.Close();
response.Close();
As it would turn out, some funky characters in the __EVENTVALIDATION variable were being encoded into a line break, and ASP.NET then threw out the session, assuming it had become corrupt. The solution was to escape the ASP.NET variables using Uri.EscapeDataString.
postData = hiddenFields.Aggregate(postData, (current, field) => current + ("&" + field.Key + "=" + Uri.EscapeDataString(field.Value)));
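To illustrate why the escaping matters: __EVENTVALIDATION is Base64-encoded, so it can contain '+', '/', and '=', all of which are mangled if pasted raw into a form-urlencoded body. A small sketch (the sample value below is made up, not a real validation token):

```csharp
// Sketch: Uri.EscapeDataString percent-encodes everything outside the
// RFC 3986 unreserved set, so Base64 payload characters survive a POST body.
using System;

class EscapeDemo
{
    static void Main()
    {
        string eventValidation = "/wEWAgK+abc+/dEF="; // hypothetical sample value
        Console.WriteLine(Uri.EscapeDataString(eventValidation));
        // '+' becomes %2B, '/' becomes %2F, '=' becomes %3D,
        // so the server decodes the exact original token.
    }
}
```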
