I've seen numerous examples of how to get the contents of a URI, and I've also used HtmlAgilityPack a lot.
What I want is to create a unit-testing environment for ASP websites.
I've seen the BrowserSession and this question, but although the process seems fine, they do not log in to a website. I tried numerous well-known websites.
Any ideas on how to browse through code?
It sounds like you want to submit a form on a web page and view the response HTML of the resulting page.
I have used the method below to perform password authentication on a web page and view the response after authentication. It takes a form target URL and submits a POST with the named arguments given in the parms dictionary. You will need to know the target URL and the form fields you wish to pass in the request.
private string SubmitRequest(string url, Dictionary<string, string> parms)
{
    var req = WebRequest.Create(url);
    req.Method = "POST";
    req.ContentType = "application/x-www-form-urlencoded";

    // URL-encode each key/value pair so special characters survive the round trip.
    string parmsString = string.Join("&",
        parms.Select(p => string.Format("{0}={1}",
            Uri.EscapeDataString(p.Key), Uri.EscapeDataString(p.Value))));

    // ContentLength must be the byte count, not the character count.
    byte[] body = Encoding.UTF8.GetBytes(parmsString);
    req.ContentLength = body.Length;

    using (var stream = req.GetRequestStream())
    {
        stream.Write(body, 0, body.Length);
    }

    var res = req.GetResponse();
    using (var reader = new StreamReader(res.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
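For illustration, calling the method might look like this (the URL and field names here are hypothetical; inspect the target form's HTML for the real ones):

```csharp
// Hypothetical form fields; use the names from the site's actual login form.
var fields = new Dictionary<string, string>
{
    { "username", "me@example.com" },
    { "password", "secret" }
};
string html = SubmitRequest("https://example.com/login.aspx", fields);
```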
If you are looking for something more specific, or this is not what you need, please post a comment.
My suggestion is to try some WebDriverJs tutorials and see if it works for you. It is mainly used for testing but can also be used for other purposes; I am using it to automate responses to users' queries on a shopping platform.
I have an external URL, like http://a.com/?id=5 (not in my project)
and I want my website to show this URL's contents,
ex.
My website(http://MyWebsite.com/?id=123) shows 3rd party's url (http://a.com/?id=5) contents
but I don't want the client side to see the real URL (http://a.com/?id=5); I'll check the auth first and then show the page.
I assume that you do not have control over the server at "http://a.com/?id=5". I think there's no way to completely hide the external link from users: they can always look at the HTML source code and the HTTP requests and trace back the original location.
One possible solution to partially hide that external site is to do the MVC equivalent of curl in your controller: after the user is auth-ed, request the page from "http://a.com/?id=5" and return it to your user (see ASP.NET MVC - Using cURL or similar to perform requests in application).
I assume the request to "http://a.com/?id=5" uses the GET method:
public string GetResponseText(string userAgent) {
    string url = "http://a.com/?id=5";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    request.UserAgent = userAgent;
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    using (StreamReader sr = new StreamReader(response.GetResponseStream())) {
        return sr.ReadToEnd();
    }
}
Then you just need to call this in your controller. Pass the same userAgent from the client so that they see the website exactly as if they had opened it in their own browsers:
return GetResponseText(request.UserAgent);
// request is the request passed to the controller for http://MyWebsite.com/?id=123
PS: I may not be using the correct MVC API, but the idea is there. Just look up HttpWebRequest in the MVC documentation to make it work correctly.
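As a rough sketch of the wiring (assuming an ASP.NET MVC controller; the controller and action names are made up):

```csharp
public class ProxyController : Controller
{
    // Serves http://MyWebsite.com/?id=123 without exposing http://a.com/?id=5.
    public ActionResult Show(int id)
    {
        if (!User.Identity.IsAuthenticated)      // your AUTH check goes here
            return new HttpUnauthorizedResult();

        string html = GetResponseText(Request.UserAgent);
        return Content(html, "text/html");
    }
}
```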
I'm using C# to download the HTML of a webpage, but when I check the actual code of the web page and my downloaded code, they are completely different. Here is the code:
public static string getSourceCode(string url) {
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = "GET";
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    using (StreamReader sr = new StreamReader(resp.GetResponseStream(), Encoding.UTF8)) {
        // return the data
        return sr.ReadToEnd();
    }
}
private void button1_Click(object sender, EventArgs e) {
string url = "http://www.booking.com/hotel/tr/nena.en-gb.html?label=gog235jc-hotel-en-tr-mina-nobrand-tr-com-T002-1;sid=fcc1c6c78f188a42870dcbe1cabf2fb4;dcid=1;origin=disamb;srhash=3938286438;srpos=5";
string sourceCode = Finder.getSourceCode(url);
StreamWriter sw = new StreamWriter("HotelPrice.txt"); // Here the downloaded code is completely different from the web page's code.
sw.Write(sourceCode);
sw.Close();
#region //Get Score Value
int StartIndex = sourceCode.IndexOf("<strong id=\"rsc_total\">") + 23;
sourceCode = sourceCode.Substring(StartIndex, 3);
#endregion
}
Most likely the cause of the difference is that when you use the browser to request the same page, the request is part of a session which is not established when you request the page using WebRequest.
Looking at the URL, it looks like the query parameter sid is a session identifier or a nonce of some sort. The page probably verifies it against the actual session id, and when it determines that they are different it gives you some sort of "Oops... wrong session" response.
In order to mimic the browser's request you will have to make sure you generate the proper request which may need to include one or more of the following:
cookies (previously sent to you by the webserver)
a valid/proper user agent
some specific query parameters (again depending on what the page expects)
potentially a referrer URL
authentication credentials
The best way to determine what you need is to follow a conversation between your browser and the web server serving that page from start to finish and see exactly which pages are requested, in what order, and what information is passed back and forth. You can accomplish this using Wireshark or Fiddler - both free tools!
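As a sketch of what the request might look like once you know what the server expects (the header values below are placeholders, not the real ones for this site):

```csharp
var cookies = new CookieContainer();             // holds cookies the server sets
var request = (HttpWebRequest)WebRequest.Create(url);
request.CookieContainer = cookies;               // cookies previously sent by the server
request.UserAgent = "Mozilla/5.0 (placeholder)"; // a valid/proper user agent
request.Referer = "http://www.example.com/";     // a referrer URL, if the page expects one
// Reuse the same CookieContainer on follow-up requests to stay in the session.
```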
I ran into the same problem when trying to use HttpWebRequest to crawl a page, and the page used ajax to load all the data I was after. In order to get the ajax calls to occur I switched to the WebBrowser control.
This answer provides an example of how to use the control outside of a WinForms app. You'll want to hook up to the browser's DocumentCompleted event before parsing the page. Be warned: this event may fire multiple times before the page is ready to be parsed. You may want to add something like this
if(browser.ReadyState == WebBrowserReadyState.Complete)
to your event handler, to know when the page is completely done loading.
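A minimal sketch of such a handler (assuming a WebBrowser instance named browser):

```csharp
browser.DocumentCompleted += (sender, e) =>
{
    // The event can also fire for frames; only parse once the whole page is ready.
    if (browser.ReadyState == WebBrowserReadyState.Complete)
    {
        string html = browser.Document.Body.InnerHtml;
        // parse html here
    }
};
browser.Navigate("http://example.com");
```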
Is there any chance to retrieve DOM results when I click older posts from the site:
http://www.facebook.com/FamilyGuy
using C# or Java? I heard that it is possible to execute a script with onclick and get the results. How can I execute this script:
onclick="(JSCC.get('j4eb9ad57ab8a19f468880561') && JSCC.get('j4eb9ad57ab8a19f468880561').getHandler())(); return false;"
I think the older posts link sends an Ajax request and appends the response to the page (I'm not sure; you should check the page source).
You can emulate this behavior in C#, Java, and JavaScript (you already have the code for javascript).
Edit:
It seems that Facebook uses some sort of internal API (JSCC) to load the content, and it's undocumented.
I don't know about the Facebook Developers APIs (you may want to check that first), but if you want to emulate exactly what happens in your browser then you can use TamperData to intercept the GET requests when you click on the more posts link, and find the request URL and its parameters.
After you get this information, you have to log in to your account in your application and get the authentication cookie.
C# sample code as you requested:
private CookieContainer GetCookieContainer(string loginURL, string userName, string password)
{
    // Cookie container shared across requests so the session survives.
    var cookies = new CookieContainer();

    // First GET the login page; some sites set a session cookie here that
    // must be echoed back in the POST.
    var webRequest = WebRequest.Create(loginURL) as HttpWebRequest;
    webRequest.CookieContainer = cookies;
    string responseData;
    using (var responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream()))
    {
        responseData = responseReader.ReadToEnd();
    }

    // Now you may need to extract some values from the login form and build the POST data with your username and password.
    // I don't know what exactly you need to POST, but again a TamperData observation will help you find out.
    string postData = String.Format("UserName={0}&Password={1}", userName, password); // I emphasize that this is just an example.

    // Post the login form.
    webRequest = WebRequest.Create(loginURL) as HttpWebRequest;
    webRequest.Method = "POST";
    webRequest.ContentType = "application/x-www-form-urlencoded";
    webRequest.CookieContainer = cookies;

    // Write the form values into the request message.
    using (var requestWriter = new StreamWriter(webRequest.GetRequestStream()))
    {
        requestWriter.Write(postData);
    }
    webRequest.GetResponse().Close();
    return cookies;
}
Then you can perform GET requests with the cookies you have, on the URL you got from analyzing those JSCC.get().getHandler() requests with TamperData, and eventually you'll get what you want in the response stream:
var webRequest = WebRequest.Create(url) as HttpWebRequest;
webRequest.CookieContainer = GetCookieContainer(url, userName, password);
var responseStream = webRequest.GetResponse().GetResponseStream();
You can also use Selenium for browser automation. It also has C# and Java APIs (I have no experience using Selenium).
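For what it's worth, a minimal Selenium sketch in C# might look like the following (the link text and URL handling are guesses; I haven't run this against Facebook):

```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

var driver = new FirefoxDriver();
driver.Navigate().GoToUrl("http://www.facebook.com/FamilyGuy");

// Click the "older posts" link; the locator below is hypothetical.
driver.FindElement(By.LinkText("Older Posts")).Click();

// PageSource reflects the DOM after the Ajax content has been appended.
string html = driver.PageSource;
driver.Quit();
```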
Facebook loads its content dynamically with AJAX. You can use a tool like Firebug to examine what kind of request is made, and then replicate it.
Or you can use a browser rendering engine like WebKit to process the JavaScript for you and expose the resulting HTML:
http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
I'm trying to login to a website using C# and the WebRequest class. This is the code I wrote up last night to send POST data to a web page:
public string login(string URL, string postData)
{
Stream webpageStream;
WebResponse webpageResponse;
StreamReader webpageReader;
byte[] byteArray = Encoding.UTF8.GetBytes(postData);
_webRequest = WebRequest.Create(URL);
_webRequest.Method = "POST";
_webRequest.ContentType = "application/x-www-form-urlencoded";
_webRequest.ContentLength = byteArray.Length;
webpageStream = _webRequest.GetRequestStream();
webpageStream.Write(byteArray, 0, byteArray.Length);
webpageResponse = _webRequest.GetResponse();
webpageStream = webpageResponse.GetResponseStream();
webpageReader = new StreamReader(webpageStream);
string responseFromServer = webpageReader.ReadToEnd();
webpageReader.Close();
webpageStream.Close();
webpageResponse.Close();
return responseFromServer;
}
and it works fine, but I have no idea how I can modify it to send POST data to a login script, save a cookie(?), and log in.
I have looked at my network transfers using Firebug on the websites login page and it is sending POST data to a URL that looks like this:
accountName=myemail%40gmail.com&password=mypassword&persistLogin=on&app=com-sc2
As far as I'm aware, to use my account with this website in my C# app I need to save the cookie that the web server sends and then use it on every request. Is this right? Or can I get away with no cookie at all?
Any help is greatly appreciated, thanks! :)
The login process depends on the concrete website. If it uses cookies, you need to use them.
I recommend using Firefox with an HTTP-header-watching plugin to see how the headers are sent to your particular website, and then implementing it the same way in C#. I answered a very similar question the day before yesterday, including an example with cookies. Look here.
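As a sketch, your login method can be made cookie-aware with a CookieContainer (you need to cast to HttpWebRequest, since the base WebRequest has no cookie support; loginUrl and protectedUrl are placeholders):

```csharp
var cookies = new CookieContainer();

var request = (HttpWebRequest)WebRequest.Create(loginUrl);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.CookieContainer = cookies;   // the server's Set-Cookie headers land here
// ... write the POST body and read the response as before ...

// Reuse the same container on every later request to stay logged in.
var next = (HttpWebRequest)WebRequest.Create(protectedUrl);
next.CookieContainer = cookies;
```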
I've had more luck using the HtmlElement class to manipulate websites.
Here is a cross post to an example of how logging in through code works (provided you're using a WebBrowser control).
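A rough sketch of that approach (the element IDs are hypothetical; find the real ones in the page source):

```csharp
// Assumes webBrowser1 has finished loading the login page.
HtmlDocument doc = webBrowser1.Document;
doc.GetElementById("username").SetAttribute("value", "me@example.com");
doc.GetElementById("password").SetAttribute("value", "secret");
doc.GetElementById("loginButton").InvokeMember("click"); // simulate the click
```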
I have a situation where I'm generating my connection string in an ASP page. I may need to rebuild this functionality from scratch in .NET, which is redundant. To avoid this, I want to get the connection string variable from the .asp page into the .NET page, i.e. the .aspx.cs. Is it possible to do this? A couple of options I found via Google are Server.Execute, and sending a web request through .NET to the .asp page to get the values. I would like to know the latency associated with these methods, if they are actually possible.
There is a file getconnstring.asp (a classic ASP file)
in which I'm constructing a connection string like
strACHConnection="Provider=MSDAORA.1;Password=..."
I want to use this variable's value in an ASP.NET website, e.g. in getconnstring.aspx.cs. Is it possible to do this using an Ajax request?
You can get the connection string, or any other information, from your .asp application by making a WebRequest from your ASP.NET application to your .asp app.
However, there will be latency depending on where the two reside with respect to each other. So I would get the info once, save it to a file or something, and read it from there the next time.
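A rough sketch of that idea (the URL is a placeholder, and it assumes getconnstring.asp writes the connection string as plain text):

```csharp
// Fetch the connection string once, then cache it on disk.
string cacheFile = Path.Combine(Path.GetTempPath(), "connstring.txt");
string connString;
if (File.Exists(cacheFile))
{
    connString = File.ReadAllText(cacheFile);   // read the cached copy
}
else
{
    var request = WebRequest.Create("http://legacyserver/getconnstring.asp");
    using (var reader = new StreamReader(request.GetResponse().GetResponseStream()))
        connString = reader.ReadToEnd();
    File.WriteAllText(cacheFile, connString);   // cache for next time
}
```

Note that persisting credentials to disk has obvious security implications; an in-memory cache may be preferable.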
I'm posting another answer so I can post some code that doesn't get garbled. Below is a Task based version.
var webRequest = WebRequest.Create("http://www.microsoft.com");
webRequest.GetResponseAsync().ContinueWith(t =>
{
if (t.Exception == null)
{
using (var sr = new StreamReader(t.Result.GetResponseStream()))
{
string str = sr.ReadToEnd();
}
}
else
System.Diagnostics.Debug.WriteLine(t.Exception.InnerException.Message);
});
And here is a sync version that's untested but should get you going.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.microsoft.com");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string str = reader.ReadToEnd();
}