load a page with javascript disabled using HtmlAgilityPack/HttpWebRequest - c#

I was wondering if there was a way to load a page with javascript disabled (i.e. emulate a browser accessing a page with javascript disallowed).
I'm exploring a promising method using WebRequest and UserAgent:
HttpWebRequest Req = (HttpWebRequest)WebRequest.Create(url);
Req.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
HttpWebResponse resp = (HttpWebResponse)Req.GetResponse();
HtmlDocument doc = new HtmlDocument();
var resultStream = resp.GetResponseStream();
doc.Load(resultStream);
I want to say there's a way to initialize the user agent (in this case Firefox) with JavaScript disabled, but I'm not quite sure how.
If anyone knows how to do this just using HtmlAgilityPack, that would be extremely helpful as well.
Also, on a side note: to fill in a textbox using the HtmlAgilityPack, is it just:
HtmlNode textbox = doc.DocumentNode.SelectSingleNode("//input[@id='box']");
textbox.SetAttributeValue("value", "value to put in textbox");
?
Thank you very much!
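On the JavaScript question: HttpWebRequest and HtmlAgilityPack never execute scripts in the first place; they only download and parse the raw HTML, which is exactly what a browser with JavaScript disabled would receive. A minimal sketch (the URL and element id are placeholders) putting the pieces together, including the two-argument form of SetAttributeValue:

```csharp
using System;
using System.Net;
using HtmlAgilityPack; // third-party NuGet package

class NoScriptFetch
{
    static void Main()
    {
        var req = (HttpWebRequest)WebRequest.Create("http://example.com/");
        req.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";

        var doc = new HtmlDocument();
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (var stream = resp.GetResponseStream())
        {
            // No script engine runs here: this is the server's raw HTML,
            // i.e. what a JavaScript-disabled browser would parse.
            doc.Load(stream);
        }

        // Filling in a textbox: SetAttributeValue takes (name, value).
        var textbox = doc.DocumentNode.SelectSingleNode("//input[@id='box']");
        if (textbox != null)
            textbox.SetAttributeValue("value", "text to put in textbox");
    }
}
```

Note that setting the attribute only changes the in-memory DOM; to actually submit the form you would still need to POST the form fields yourself.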


Accessing ASP.NET_SessionId on HttpWebResponse C#

I'm trying to get ASP.NET_SessionId from an HttpWebResponse, but it seems no such data comes back in the response.
Basically I'm trying to simulate some steps over some pages where authentication is required. The problem is not the authentication; my problem is getting ASP.NET_SessionId after authenticating so I can use it in my future requests/steps.
In Chrome's developer tools > Network, I can see ASP.NET_SessionId in the headers, but it does not appear in my HttpWebResponse. Any idea why this is happening?
There is my code:
var httpWebRequest = (HttpWebRequest) WebRequest.Create(url);
httpWebRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36";
httpWebRequest.Method = "POST";
httpWebRequest.ContentType = "application/x-www-form-urlencoded";
httpWebRequest.ContentLength = 0;
var httpWebResponse = (HttpWebResponse) httpWebRequest.GetResponse();
After my request I should see an ASP.NET_SessionId Set-Cookie header, but no luck. Any idea?
I've seen some people say that
httpWebResponse.Headers["ASP.NET_SessionId"]
or
httpWebResponse.Headers["SESSION_ID"]
should work, but no: no session id header is set, nor any cookie.
After much research, the answer was here.
Basically, we have to keep the same CookieContainer object reference across all requests. I was extracting Set-Cookie headers from the responses and adding them to my requests, but now I don't need to do anything: the CookieContainer manages all of it transparently.
Set-Cookie headers from responses are stored in your request's CookieContainer. This is how they chose to address possible security issues, so don't lose any more time: just keep a reference to your CookieContainer, because you will not be able to access the session id directly (and you don't need to).
Here's an example of my code now:
var cookieContainer = new CookieContainer();
var httpWebRequest1 = (HttpWebRequest) WebRequest.Create(url);
httpWebRequest1.CookieContainer = cookieContainer;
// do the request and some logic
var httpWebRequest2 = (HttpWebRequest) WebRequest.Create(anotherUrl);
httpWebRequest2.CookieContainer = cookieContainer; // same cookieContainer reference
Everything is working great now, hope it helps someone.
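That said, if you ever do need to inspect what the container captured (for logging, say), CookieContainer.GetCookies will usually show it. This is just a sketch with a made-up cookie value, not part of the original answer:

```csharp
using System;
using System.Net;

class CookieDemo
{
    static void Main()
    {
        var cookieContainer = new CookieContainer();
        var uri = new Uri("http://somewebsite.com/");

        // Simulate what a Set-Cookie response header would do:
        cookieContainer.Add(uri, new Cookie("ASP.NET_SessionId", "abc123"));

        // Even though HttpWebResponse itself hides the session cookie,
        // the container still exposes it for the matching URI:
        CookieCollection cookies = cookieContainer.GetCookies(uri);
        Console.WriteLine(cookies["ASP.NET_SessionId"].Value); // prints abc123
    }
}
```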

XPath, htmlAgilityPack and the WebBrowser control

I can load a URL into a WebBrowser control and perform a login (forms based), and I see what I need to see. Great, now I want to use XPath to get the data I need.
Can't do that with a WebBrowser (unless you disagree?), so I use the Agility Pack to kick off a new session as per below:
var wc = new WebClient();
wc.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(wc.OpenRead(url), Encoding.UTF8);
var value = doc.DocumentNode.SelectSingleNode("//li[@data-section='currentPositionsDetails']//*[@class='description']");
My value is not retrievable because the website doesn't expose it to the public (it wants a logged in session). How can I "pass on" my WebBrowser control session to my WebClient()? Looking into some of the methods of how to POST my login information, it all seems awfully complicated.
Any ideas? - Thanks
You can retrieve the body html string with webBrowser1.Document.Body.OuterHtml and load it with HtmlAgilityPack:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(new StringReader(this.webBrowser1.Document.Body.OuterHtml));
OK, posting this as an answer as it seems to be answered/discussed elsewhere. It's not going to be easy for an amateur like me!
How to pass cookies to HtmlAgilityPack or WebClient?
HtmlAgilityPack.HtmlDocument Cookies
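For completeness, the approach those linked questions describe can be sketched roughly like this: copy the cookie string out of the WebBrowser control and send it as the Cookie header on the WebClient. The method name and usage here are illustrative, and note the caveat that HttpOnly cookies (which many authentication tickets are) will not appear in Document.Cookie, which is why this sometimes fails:

```csharp
using System;
using System.Net;
using System.Windows.Forms; // WebBrowser control lives here

class SessionHandoff
{
    // Hypothetical helper: fetch a URL reusing the WebBrowser's logged-in session.
    static string FetchWithBrowserSession(WebBrowser browser, string url)
    {
        using (var wc = new WebClient())
        {
            wc.Headers[HttpRequestHeader.UserAgent] =
                "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0";
            // document.cookie holds the non-HttpOnly cookies the
            // WebBrowser control accumulated during login.
            wc.Headers[HttpRequestHeader.Cookie] = browser.Document.Cookie;
            return wc.DownloadString(url);
        }
    }
}
```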

How to send GET/POST request programmatically to simple ASPX page?

I use following code to post querystring
string URI = "http://somewebsite.com/default.aspx";
string myParameters = "param1=value1&param2=value2&param3=value3";
using (WebClient wc = new WebClient())
{
wc.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
string HtmlResult = wc.UploadString(URI, myParameters);
}
But somehow default.aspx does not accept that post call.
The point is when I manually in browser go to http://somewebsite.com/default.aspx all code there is working fine.
My question is the following: what am I missing here to achieve the same result with WebClient as when I open the page manually?
Thank you in advance!
P.S. 1
I just tried a GET request to that URL and it also has no effect. How is that possible?
What is difference between manual navigation to page and sending GET/POST?
P.S. 2
I even tried this
wc.Headers["Accept"] = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
wc.Headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDC)";
and the Load event of Default.aspx is still not hitting. :(
From your description of what you want to achieve, I think you may have chosen the wrong WebClient method. Instead of UploadString, try DownloadString:
using (WebClient wc = new WebClient())
{
string HtmlResult = wc.DownloadString("http://somewebsite.com/default.aspx?param1=value1&param2=value2&param3=value3");
}
So this comment was the correct one:
"What is difference between manual navigation to page and sending
GET/POST?" - see for yourself, for example using Fiddler. –
CodeCaster
I checked all the requests with Fiddler and found that there is code in the base page class that redirects to the Index page, so the Load event never fires.
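A quick way to confirm that kind of redirect in code, without Fiddler, is to turn off automatic redirect following and inspect the response yourself. This is a generic sketch with a placeholder URL, not part of the original answer:

```csharp
using System;
using System.Net;

class RedirectProbe
{
    static void Main()
    {
        var req = (HttpWebRequest)WebRequest.Create(
            "http://somewebsite.com/default.aspx?param1=value1");
        // Surface the 302 instead of silently following it.
        // Note: 3xx responses do not throw, so GetResponse returns normally.
        req.AllowAutoRedirect = false;

        using (var resp = (HttpWebResponse)req.GetResponse())
        {
            Console.WriteLine((int)resp.StatusCode);     // e.g. 302 if redirected
            Console.WriteLine(resp.Headers["Location"]); // where the base page sends you
        }
    }
}
```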

C# HttpWebRequest Get Total Page Size

Hello, I am trying to get the total page size with HttpWebRequest, including images, scripts, etc. I need to calculate the total traffic of a web page when a user visits it via a web browser. The Content-Length header gives me the size of the content in bytes, but that is only the document length; I also need the traffic from images and scripts.
This is my code so far:
NetworkCredential cred = new NetworkCredential("username", "password");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
request.Proxy = new System.Net.WebProxy("proxy", true);
request.Proxy.Credentials = cred;
request.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
using (HttpWebResponse requestresponse = (HttpWebResponse)request.GetResponse())
{
Headers = requestresponse.Headers;
Url = requestresponse.ResponseUri;
int ContentLength;
if (int.TryParse(requestresponse.Headers.Get("Content-Length"), out ContentLength))
{
// This is one way I get the content length
WebPageSize = ContentLength;
}
// This is another way I get the content length
WebPageSize = requestresponse.ContentLength;
return ProcessContent(requestresponse);
}
Also, the server is not guaranteed to send the Content-Length header back at all.
Any suggestions?
If you just need to check the size once, why don't you try using the Net panel of Firebug (Firefox extension) or the equivalent tool of other browsers? It will tell you the size of all the requests performed while loading a webpage.
If you are the one doing the measurement and can guarantee there will be no other network traffic, you can try the following:
Get TotalBytesReceived from IPv4Statistics
Open the page in WebBrowserControl and wait until all resources are loaded.
Get the total bytes again.
From those two numbers, you can calculate the total size.
Here is an article about working with IPv4Statistics:
http://www.m0interactive.com/archives/2008/02/06/how_to_calculate_network_bandwidth_speed_in_c_/
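Another approach, if opening a real browser is not an option, is to parse the page with HtmlAgilityPack and issue a HEAD request per referenced resource, summing the Content-Length headers. This is only a rough sketch: it misses CSS-referenced assets and counts 0 for resources whose servers omit Content-Length:

```csharp
using System;
using System.Net;
using HtmlAgilityPack; // third-party NuGet package

class PageWeight
{
    // HEAD request; returns 0 when the server omits Content-Length
    // (chunked responses, for example).
    static long GetLength(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "HEAD";
        using (var resp = (HttpWebResponse)req.GetResponse())
            return Math.Max(0, resp.ContentLength);
    }

    static long MeasurePage(string url)
    {
        var doc = new HtmlWeb().Load(url);
        long total = GetLength(url); // the document itself

        // Add images and scripts; SelectNodes returns null when none match.
        var refs = doc.DocumentNode.SelectNodes("//img[@src] | //script[@src]");
        if (refs != null)
            foreach (var node in refs)
            {
                string src = node.GetAttributeValue("src", null);
                if (src != null)
                    total += GetLength(new Uri(new Uri(url), src).AbsoluteUri);
            }
        return total;
    }
}
```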

WebRequest NameResolutionFailure

I'm attempting to write a small screen-scraping tool for statistics aggregation in c#. I have attempted to use this code, (posted many times here but again for detail):
public static string GetPage(string url)
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
return reader.ReadToEnd();
}
}
However, some (not all) of the websites I attempt to connect to that use Ajax or server-side includes throw NameResolutionFailure exceptions, and I cannot read the data.
An example of this is : pgatour stats
I am led to believe the HttpWebRequest class emulates a browser when requesting information, so you get the post-generated HTML. Currently, the only way I can read the data is with an iMacro that grabs it from the page source after it runs through the browser. As said before, it works in the browser, so I don't think the error is related to a DNS issue, and the website does generate a response (HaveResponse is set).
Has anyone else encountered this issue, and what did you use to resolve it?
Thanks.
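One workaround I've seen for intermittent NameResolutionFailure (it is often a transient DNS or proxy hiccup rather than a real resolution problem) is to retry with a short backoff. A sketch based on the GetPage method above, with the retry count and delay as arbitrary choices:

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

class ResilientFetch
{
    static string GetPage(string url, int retries = 3)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                var request = (HttpWebRequest)WebRequest.Create(url);
                request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
                using (var response = (HttpWebResponse)request.GetResponse())
                using (var reader = new StreamReader(response.GetResponseStream()))
                    return reader.ReadToEnd();
            }
            catch (WebException ex) when (ex.Status == WebExceptionStatus.NameResolutionFailure
                                          && attempt < retries)
            {
                // DNS lookup failed; back off briefly and try again.
                Thread.Sleep(1000 * (attempt + 1));
            }
        }
    }
}
```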
