I want to download a string from a website, and I made this PHP file as an example.
(The same thing fails across my whole website.)
The link http://swageh.co/information.php can't be downloaded with a WebClient from any PC.
I would prefer to use a WebClient.
No matter what I try, DownloadString fails.
It works fine in a browser.
It returns an error 500. The exception is: An unhandled exception of type 'System.Net.WebException' occurred in System.dll
Additional information: The underlying connection was closed: An unexpected error occurred on a send.
Did you change something on the server-side?
All of the following options are working just fine for me as of right now (all return just "false" with StatusCode of 200):
var client = new WebClient();
var stringResult = client.DownloadString("http://swageh.co/information.php");
Also HttpWebRequest:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://swageh.co/information.php");
request.GetResponse().GetResponseStream();
Newer HttpClient:
var client = new HttpClient();
var req = new HttpRequestMessage(HttpMethod.Get, "http://swageh.co/information.php");
var res = client.SendAsync(req);
var stringResult = res.Result.Content.ReadAsStringAsync().Result;
It's because your website is responding with 301 Moved Permanently.
See "Get where a 301 URL redirects to".
This shows how to automatically follow the redirect: "Using WebClient in C# is there a way to get the URL of a site after being redirected?"
Look at Christophe Debove's answer rather than the accepted answer.
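To check the 301 explanation above, one option is to switch off automatic redirects and read the Location header directly. The following is a minimal sketch, not a definitive fix; the class name RedirectProbe and its method names are hypothetical, chosen for illustration.

```csharp
using System;
using System.Net;

static class RedirectProbe
{
    // Builds a GET request that will NOT follow redirects, so a 301/302
    // response can be inspected instead of being followed silently.
    public static HttpWebRequest CreateNoRedirectRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.AllowAutoRedirect = false;
        return request;
    }

    // Sends the request and returns the Location header, or null when the
    // server did not answer with a 3xx status.
    public static string GetRedirectTarget(string url)
    {
        var request = CreateNoRedirectRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            int code = (int)response.StatusCode;
            return (code >= 300 && code < 400) ? response.Headers["Location"] : null;
        }
    }
}
```

Calling RedirectProbe.GetRedirectTarget("http://swageh.co/information.php") would show where the server is sending the client. If that target is an https:// address, a TLS version mismatch is one common cause of "An unexpected error occurred on a send" on older .NET Framework versions; whether that applies to this server is only a guess.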
Interestingly, this doesn't work. I tried making the headers the same as Chrome's, as below; perhaps use Telerik Fiddler to see what is happening.
var strUrl = "http://theurl_inhere";
var headers = new WebHeaderCollection();
headers.Add("Accept-Language", "en-US,en;q=0.9");
headers.Add("Cache-Control", "no-cache");
headers.Add("Pragma", "no-cache");
headers.Add("Upgrade-Insecure-Requests", "1");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strUrl);
request.Method = "GET";
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
request.Headers.Add( headers );
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
var strLastRedirect = response.ResponseUri.ToString();
StreamReader reader = new StreamReader(dataStream);
string strResponse = reader.ReadToEnd();
response.Close();
I'm trying to fetch the HTML of a page through code:
WebRequest r = WebRequest.Create(szPageURL);
WebClient client = new WebClient();
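try
{
    WebResponse resp = r.GetResponse();
    StreamReader sr = new StreamReader(resp.GetResponseStream());
    szHTML = sr.ReadToEnd();
}
catch (WebException ex)
{
    // The 403 surfaces here; ex.Status is WebExceptionStatus.ProtocolError
    Console.WriteLine(ex.Status + ": " + ex.Message);
}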
This code works when I use URLs like www.microsoft.com, www.google.com, or www.nasa.gov. However, when I put in www.epa.gov (using either 'http' or 'https' in the URL parameter), I get a 403 exception when executing r.GetResponse(), yet I can easily fetch the page manually in a browser. The exception is 403 (Forbidden), and its Status member says "ProtocolError". What does that mean? Why am I getting this on a page that is actually available? Does anyone have any ideas? Thanks!
BTW - I also tried this way:
string downloadString = client.DownloadString(szPageURL);
Got exact same exception.
Try this code, it works:
string Url = "https://www.epa.gov/";
CookieContainer cookieJar = new CookieContainer();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
request.CookieContainer = cookieJar;
request.Accept = @"text/html, application/xhtml+xml, */*";
request.Referer = @"https://www.epa.gov/";
request.Headers.Add("Accept-Language", "en-GB");
request.UserAgent = @"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)";
request.Host = @"www.epa.gov";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
String htmlString;
using (var reader = new StreamReader(response.GetResponseStream()))
{
htmlString = reader.ReadToEnd();
}
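On the "ProtocolError" part of the question: that status means the server was reached and replied with an HTTP error code, as opposed to a timeout or a failed connection. A 403 for a page that loads in a browser often means the server rejects requests lacking browser-like headers such as User-Agent, which is what the code above adds. The sketch below shows one way to read the details from the exception; the WebErrors class and Describe method are hypothetical names.

```csharp
using System;
using System.Net;

static class WebErrors
{
    // Turns a WebException into a short description. ProtocolError means the
    // server answered with an HTTP error status (403, 404, 500, ...), and the
    // response object is then available on the exception.
    public static string Describe(WebException ex)
    {
        var http = ex.Response as HttpWebResponse;
        if (ex.Status == WebExceptionStatus.ProtocolError && http != null)
        {
            return "HTTP " + (int)http.StatusCode + " " + http.StatusDescription;
        }
        // Other statuses (Timeout, ConnectFailure, ...) carry no HTTP response here.
        return "No HTTP response: " + ex.Status;
    }
}
```

It would typically be called from a catch (WebException ex) block around GetResponse() or DownloadString().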
I am new to C#, but I need to log in programmatically to a particular web site in order to screen-scrape it in C#. I have done online research (this site has been particularly helpful) and have learnt that I need to use one of the following classes to log in: WebRequest/WebResponse, HttpWebRequest/HttpWebResponse, or WebClient, and that I need to pass the cookies I receive from the web site to subsequent (screen-scraping) requests. However, I have not been able to log in successfully, and at this point I have run out of ideas. I want to log in on the home page ------ and then screen-scrape a number of pages like this one: -------. The web site works like this: it allows anyone to access pages like the one I have referenced, but unless the user is logged in, it returns asterisks in some of the fields. I presume this means that the content is dynamically generated, which I suspect may be the underlying cause of my login troubles. I am including the code that I am using to log in to the web site:
class Program
{
private static string link_main_page = "-----------";
private static string link_target_page = "------------";
private static string authorization_param = "----------";
private static void LoginUsingTheHttpWebRequestClass()
{
HttpWebRequest MyLoginRequest = (HttpWebRequest)WebRequest.Create(link_main_page);
MyLoginRequest.Method = "POST";
MyLoginRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.1; .NET4.0C; .NET CLR 2.0.50727; .NET4.0E; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)";
MyLoginRequest.ContentType = "application/x-www-form-urlencoded";
MyLoginRequest.CookieContainer = new CookieContainer();
byte[] sentData = Encoding.UTF8.GetBytes(authorization_param);
MyLoginRequest.ContentLength = sentData.Length;
Stream sendStream = MyLoginRequest.GetRequestStream();
sendStream.Write(sentData, 0, sentData.Length);
HttpWebResponse MyLoginResponse = (HttpWebResponse)MyLoginRequest.GetResponse();
CookieCollection MyCookieCollection = new CookieCollection();
MyCookieCollection.Add(MyLoginResponse.Cookies);
foreach (Cookie MyCookie in MyCookieCollection)
{
Console.WriteLine("Cookie:");
Console.WriteLine("{0} = {1}", MyCookie.Name, MyCookie.Value);
}
HttpWebRequest MyGetRequest = (HttpWebRequest)WebRequest.Create(link_target_page);
MyGetRequest.ContentType = "application/x-www-form-urlencoded";
MyGetRequest.Method = "GET";
MyGetRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.1; .NET4.0C; .NET CLR 2.0.50727; .NET4.0E; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)";
MyGetRequest.CookieContainer = new CookieContainer();
MyGetRequest.CookieContainer.Add(MyCookieCollection);
HttpWebResponse MyGetResponse = (HttpWebResponse)MyGetRequest.GetResponse();
Stream stream;
stream = MyGetResponse.GetResponseStream();
string s;
using (StreamReader sr = new StreamReader(stream))
{
s = sr.ReadToEnd();
using (StreamWriter sw = File.CreateText("TheFile.htm"))
{
sw.Write(s);
sw.Close();
}
sr.Close();
}
}
private static void LoginUsingTheWebRequestClass()
{
WebRequest MyLoginRequest = WebRequest.Create(link_main_page);
MyLoginRequest.Method = "POST";
MyLoginRequest.ContentType = "application/x-www-form-urlencoded";
byte[] sentData = Encoding.UTF8.GetBytes(authorization_param);
MyLoginRequest.ContentLength = sentData.Length;
Stream sendStream = MyLoginRequest.GetRequestStream();
sendStream.Write(sentData, 0, sentData.Length);
WebResponse MyLoginResponse = MyLoginRequest.GetResponse();
string CookieHeader;
CookieHeader = MyLoginResponse.Headers["Set-cookie"];
Console.WriteLine("Cookie:");
Console.WriteLine(CookieHeader);
WebRequest MyGetRequest = WebRequest.Create(link_target_page);
MyGetRequest.ContentType = "application/x-www-form-urlencoded";
MyGetRequest.Method = "GET";
MyGetRequest.Headers.Add("Cookie", CookieHeader);
WebResponse MyGetResponse = MyGetRequest.GetResponse();
Stream stream;
stream = MyGetResponse.GetResponseStream();
string s;
using (StreamReader sr = new StreamReader(stream))
{
s = sr.ReadToEnd();
using (StreamWriter sw = File.CreateText("TheFile2.htm"))
{
sw.Write(s);
sw.Close();
}
sr.Close();
}
}
static void Main(string[] args)
{
Console.WriteLine("Login Using the HttpWebRequest Class");
LoginUsingTheHttpWebRequestClass();
Console.WriteLine("Login Using the WebRequest Class");
LoginUsingTheWebRequestClass();
Console.WriteLine("Done! Press any key to continue");
Console.ReadKey();
}
}
Neither the attempt to login using HttpWebRequest/HttpWebResponse nor the attempt to login using WebRequest/WebResponse works. The first one returns a cookie that looks like this:
PHPSESSID=hncrr0...
The second one returns a cookie that looks like this:
PHPSESSID=88dn1n9...; path=/
These cookies look suspicious to me. For one thing, they look different from the cookies in IE, but I do not know exactly what I should expect.
(I also tried to pass cookies that I received via a (Http)WebRequest/(Http)WebResponse to a WebClient but again to no avail - I am not including it here to save space).
I would very much appreciate any input. If someone wants to run the code, I can actually provide the actual login/password information (registration on that web site is free anyway).
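One simplification worth trying for the cookie handling described above is to hand the same CookieContainer to every request, so that cookies set by the login response are sent on later requests without being copied by hand. This is a sketch under that assumption only; it will not help if the form fields or the login URL are wrong. Session, CreateRequest and CookiesFor are hypothetical names.

```csharp
using System;
using System.Net;

static class Session
{
    // Creates a request bound to the given cookie jar. Passing the SAME
    // CookieContainer to every request lets cookies from the login response
    // flow into later requests automatically.
    public static HttpWebRequest CreateRequest(string url, string method, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.CookieContainer = jar;
        return request;
    }

    // Returns the Cookie header the jar would send to a URL (handy for debugging).
    public static string CookiesFor(CookieContainer jar, string url)
    {
        return jar.GetCookieHeader(new Uri(url));
    }
}
```

With one jar created up front, the login POST and the follow-up GET would both be built through Session.CreateRequest, and Session.CookiesFor(jar, link_target_page) could be printed before the GET to confirm that PHPSESSID is going to be sent.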
I have been trying to get an HttpWebRequest, or anything else, to give me the HTML of a webpage that requires a login. In a normal browser you do something like:
http://username:password@someURL.com
However, in C# this fails with a 401 Unauthorized.
I have also tried to set the credentials, but that fails too. I have tried enabling cookies and posing as a browser, and now I'm lost...
string credentials = _domain.UserInfo;
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(_domain);
request.Headers.Add("Authorization", "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes(credentials)));
request.PreAuthenticate = true;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Another try:
System.Net.WebRequest req = System.Net.WebRequest.Create(_domain);
if (_domain.UserInfo.Length > 0)
{
string[] creds = _domain.UserInfo.Split(new char[] { ':' });
req.Credentials = new System.Net.NetworkCredential(creds[0], creds[1], _domain.Authority);
}
req.ImpersonationLevel = System.Security.Principal.TokenImpersonationLevel.Delegation;
req.CachePolicy = new System.Net.Cache.RequestCachePolicy(System.Net.Cache.RequestCacheLevel.CacheIfAvailable);
System.Net.HttpWebRequest _httpReq = (HttpWebRequest)req;
_httpReq.CookieContainer = new CookieContainer();
_httpReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)";
_httpReq.UnsafeAuthenticatedConnectionSharing = true;
System.Net.WebResponse resp = req.GetResponse();
What am I doing wrong?
This behavior is by design, and there doesn't seem to be an easy way to change it. Instead, you will need to put your credentials into a System.Net.NetworkCredential object and assign it to the Credentials property of your web request.
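A minimal sketch of that suggestion, assuming the credentials arrive embedded in the URL as in the question. The UrlCredentials class and its methods are hypothetical helpers, not part of the framework.

```csharp
using System;
using System.Net;

static class UrlCredentials
{
    // Extracts "user:password" from a URI's user-info part and returns it as a
    // NetworkCredential, or null when the URI carries no credentials.
    public static NetworkCredential FromUri(Uri uri)
    {
        if (string.IsNullOrEmpty(uri.UserInfo)) return null;
        // Split on the first ':' only, so a password containing ':' survives.
        string[] parts = uri.UserInfo.Split(new[] { ':' }, 2);
        string user = Uri.UnescapeDataString(parts[0]);
        string password = parts.Length > 1 ? Uri.UnescapeDataString(parts[1]) : "";
        return new NetworkCredential(user, password);
    }

    // Returns the same URI with the user-info part removed.
    public static Uri StripUserInfo(Uri uri)
    {
        var builder = new UriBuilder(uri) { UserName = "", Password = "" };
        return builder.Uri;
    }

    // Builds a request whose Credentials come from the URL's user-info.
    public static HttpWebRequest CreateRequest(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(StripUserInfo(uri));
        request.Credentials = FromUri(uri);
        return request;
    }
}
```

Note that the third argument of the NetworkCredential constructor is a Windows domain rather than a host name, so passing _domain.Authority there, as the second attempt in the question does, may change the user name the server sees.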
I've been trying to automate a login to a website I frequent, www.bungie.net. The site is associated with Microsoft and Xbox Live, and as such makes use of the Windows Live ID API when people log in.
I am relatively new to creating web spiders/robots, and I worry that I'm misunderstanding some of the most basic concepts. I've simulated logins to other sites such as Facebook and Gmail, but live.com has given me nothing but trouble.
Anyways, I've been using Wireshark and the Firefox addon Tamper Data to try and figure out what I need to post, and what cookies I need to include with my requests. As far as I know these are the steps one must follow to log in to this site.
1. Visit https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1268167141&rver=5.5.4177.0&wp=LBI&wreply=http:%2F%2Fwww.bungie.net%2FDefault.aspx&id=42917
2. Receive the cookies MSPRequ and MSPOK.
3. Post the values from the form ID "PPSX", the values from the form ID "PPFT", your username, and your password to a changing URL similar to: https://login.live.com/ppsecure/post.srf?wa=wsignin1.0&rpsnv=11&ct=
(there are a few numbers that change at the end of that URL)
4. Live.com returns a page with more hidden forms to post. The client then posts the values from the form "ANON", the value from the form "ANONExp", and the values from the form "t" to the URL: http://www.bungie.net/Default.aspx?wa=wsignin1.0
5. After posting that data, the user is returned a variety of cookies, the most important of which is "BNGAuth", the login cookie for the site.
Where I am having trouble is on the fifth step, but that doesn't necessarily mean I've done all the other steps correctly. I post the data from "ANON", "ANONExp" and "t", but instead of being returned a BNGAuth cookie, I'm returned a cookie named "RSPMaybe" and redirected to the home page.
When I review the Wireshark log, I noticed something that instantly stood out to me as different between the log when I logged in with Firefox and when my program ran. It could be nothing but I'll include the picture here for you to review. I'm being returned an HTTP packet from the site before I post the data in the fourth step. I'm not sure how this is happening, but it must be a side effect from something I'm doing wrong in the HTTPS steps.
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Text;
using System.Net;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;
using System.Web;
namespace SpiderFromScratch
{
class Program
{
static void Main(string[] args)
{
CookieContainer cookies = new CookieContainer();
Uri url = new Uri("https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1268167141&rver=5.5.4177.0&wp=LBI&wreply=http:%2F%2Fwww.bungie.net%2FDefault.aspx&id=42917");
HttpWebRequest http = (HttpWebRequest)HttpWebRequest.Create(url);
http.Timeout = 30000;
http.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.8) Gecko/20100202 Firefox/3.5.8 (.NET CLR 3.5.30729)";
http.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
http.Headers.Add("Accept-Language", "en-us,en;q=0.5");
http.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
http.Headers.Add("Keep-Alive", "300");
http.Referer = "http://www.bungie.net/";
http.ContentType = "application/x-www-form-urlencoded";
http.CookieContainer = new CookieContainer();
http.Method = WebRequestMethods.Http.Get;
HttpWebResponse response = (HttpWebResponse)http.GetResponse();
StreamReader readStream = new StreamReader(response.GetResponseStream());
string HTML = readStream.ReadToEnd();
readStream.Close();
//gets the cookies (they are set in the eighth header)
string[] strCookies = response.Headers.GetValues(8);
response.Close();
string name, value;
Cookie manualCookie;
for (int i = 0; i < strCookies.Length; i++)
{
name = strCookies[i].Substring(0, strCookies[i].IndexOf("="));
value = strCookies[i].Substring(strCookies[i].IndexOf("=") + 1, strCookies[i].IndexOf(";") - strCookies[i].IndexOf("=") - 1);
manualCookie = new Cookie(name, "\"" + value + "\"");
Uri manualURL = new Uri("http://login.live.com");
http.CookieContainer.Add(manualURL, manualCookie);
}
//stores the cookies to be used later
cookies = http.CookieContainer;
//Get the PPSX value
string PPSX = HTML.Remove(0, HTML.IndexOf("PPSX"));
PPSX = PPSX.Remove(0, PPSX.IndexOf("value") + 7);
PPSX = PPSX.Substring(0, PPSX.IndexOf("\""));
//Get this random PPFT value
string PPFT = HTML.Remove(0, HTML.IndexOf("PPFT"));
PPFT = PPFT.Remove(0, PPFT.IndexOf("value") + 7);
PPFT = PPFT.Substring(0, PPFT.IndexOf("\""));
//Get the random URL you POST to
string POSTURL = HTML.Remove(0, HTML.IndexOf("https://login.live.com/ppsecure/post.srf?wa=wsignin1.0&rpsnv=11&ct="));
POSTURL = POSTURL.Substring(0, POSTURL.IndexOf("\""));
//POST with cookies
http = (HttpWebRequest)HttpWebRequest.Create(POSTURL);
http.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.8) Gecko/20100202 Firefox/3.5.8 (.NET CLR 3.5.30729)";
http.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
http.Headers.Add("Accept-Language", "en-us,en;q=0.5");
http.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
http.Headers.Add("Keep-Alive", "300");
http.CookieContainer = cookies;
http.Referer = "https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1268158321&rver=5.5.4177.0&wp=LBI&wreply=http:%2F%2Fwww.bungie.net%2FDefault.aspx&id=42917";
http.ContentType = "application/x-www-form-urlencoded";
http.Method = WebRequestMethods.Http.Post;
Stream ostream = http.GetRequestStream();
//used to convert strings into bytes
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
//Post information
byte[] buffer = encoding.GetBytes("PPSX=" + PPSX +"&PwdPad=IfYouAreReadingThisYouHaveTooMuc&login=YOUREMAILGOESHERE&passwd=YOURWORDGOESHERE" +
"&LoginOptions=2&PPFT=" + PPFT);
ostream.Write(buffer, 0, buffer.Length);
ostream.Close();
HttpWebResponse response2 = (HttpWebResponse)http.GetResponse();
readStream = new StreamReader(response2.GetResponseStream());
HTML = readStream.ReadToEnd();
response2.Close();
ostream.Dispose();
foreach (Cookie cookie in response2.Cookies)
{
Console.WriteLine(cookie.Name + ": ");
Console.WriteLine(cookie.Value);
Console.WriteLine(cookie.Expires);
Console.WriteLine();
}
//SET POSTURL value
string POSTANON = "http://www.bungie.net/Default.aspx?wa=wsignin1.0";
//Get the ANON value
string ANON = HTML.Remove(0, HTML.IndexOf("ANON"));
ANON = ANON.Remove(0, ANON.IndexOf("value") + 7);
ANON = ANON.Substring(0, ANON.IndexOf("\""));
ANON = HttpUtility.UrlEncode(ANON);
//Get the ANONExp value
string ANONExp = HTML.Remove(0, HTML.IndexOf("ANONExp"));
ANONExp = ANONExp.Remove(0, ANONExp.IndexOf("value") + 7);
ANONExp = ANONExp.Substring(0, ANONExp.IndexOf("\""));
ANONExp = HttpUtility.UrlEncode(ANONExp);
//Get the t value
string t = HTML.Remove(0, HTML.IndexOf("id=\"t\""));
t = t.Remove(0, t.IndexOf("value") + 7);
t = t.Substring(0, t.IndexOf("\""));
t = HttpUtility.UrlEncode(t);
//POST the Info and Accept the Bungie Cookies
http = (HttpWebRequest)HttpWebRequest.Create(POSTANON);
http.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.8) Gecko/20100202 Firefox/3.5.8 (.NET CLR 3.5.30729)";
http.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
http.Headers.Add("Accept-Language", "en-us,en;q=0.5");
http.Headers.Add("Accept-Encoding", "gzip,deflate");
http.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
http.Headers.Add("Keep-Alive", "115");
http.CookieContainer = new CookieContainer();
http.ContentType = "application/x-www-form-urlencoded";
http.Method = WebRequestMethods.Http.Post;
http.Expect = null;
ostream = http.GetRequestStream();
int test = ANON.Length;
int test1 = ANONExp.Length;
int test2 = t.Length;
buffer = encoding.GetBytes("ANON=" + ANON +"&ANONExp=" + ANONExp + "&t=" + t);
ostream.Write(buffer, 0, buffer.Length);
ostream.Close();
//Here lies the problem, I am not returned the correct cookies.
HttpWebResponse response3 = (HttpWebResponse)http.GetResponse();
GZipStream gzip = new GZipStream(response3.GetResponseStream(), CompressionMode.Decompress);
readStream = new StreamReader(gzip);
HTML = readStream.ReadToEnd();
//gets both cookies
string[] strCookies2 = response3.Headers.GetValues(11);
response3.Close();
}
}
}
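The code above pulls hidden field values out of the page with chains of IndexOf/Remove calls, which can pick up the wrong element (searching for "ANON" also matches "ANONExp"), and it URL-encodes ANON, ANONExp and t but not the login, password or PPFT values. The sketch below shows one possible alternative for both steps. It assumes each field is an input tag whose id attribute appears before its value attribute, both in double quotes; FormScraper and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class FormScraper
{
    // Returns the value attribute of the <input> with exactly this id, or null.
    // Assumes id comes before value inside the tag and both use double quotes.
    public static string HiddenValue(string html, string id)
    {
        string pattern = "<input\\b[^>]*\\bid=\"" + Regex.Escape(id) + "\"[^>]*\\bvalue=\"([^\"]*)\"";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Joins name/value pairs into an application/x-www-form-urlencoded body,
    // URL-encoding every name and value.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(WebUtility.UrlEncode(field.Key) + "=" + WebUtility.UrlEncode(field.Value));
        }
        return string.Join("&", parts);
    }
}
```

The values returned by HiddenValue are HTML-decoded and not yet URL-encoded, so they can be passed straight to BuildFormBody, which encodes each one exactly once. A regular expression is still fragile against markup changes; an HTML parser would be the sturdier choice if one is available.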
I'm not sure if you're still working on this or not but the Windows Live Development site has a lot of info on it to help with using the Live ID API. I've not had much of a dig into it but their Getting Started page has a load of info plus a link to download sample applications detailing how to use the service in a variety of languages (including C#).
You can download the sample application from there.
It sounds pretty interesting what you're trying to do, so much so that I quite fancy having a play with this myself!
Change your timing and see if you get the same results.
It is much easier to use a UI automation framework like WatiN than HttpWebRequest, unless that would break your requirements. With WatiN, you think about what is shown in the UI rather than what is in the HTML.