Is there any way to retrieve the resulting DOM after clicking "older posts" on this page:
http://www.facebook.com/FamilyGuy
using C# or Java? I heard that it is possible to execute a script with onclick and get the results. How can I execute this script:
onclick="(JSCC.get('j4eb9ad57ab8a19f468880561') && JSCC.get('j4eb9ad57ab8a19f468880561').getHandler())(); return false;"
I think the "older posts" link sends an Ajax request and appends the response to the page. (I'm not sure; you should check the page source.)
You can emulate this behavior in C#, Java, or JavaScript (you already have the code for JavaScript).
Edit:
It seems that Facebook uses some sort of internal, undocumented API (JSCC) to load the content.
I don't know about the Facebook developer APIs (you may want to check those first), but if you want to emulate exactly what happens in your browser, you can use TamperData to intercept the GET requests sent when you click the "older posts" link and find the request URL and its parameters.
Once you have this information, your application has to log in to your account and get the authentication cookie.
C# sample code as you requested:
private CookieContainer GetCookieContainer(string loginURL, string userName, string password)
{
    // One cookie container for both requests, so cookies set by the login page are kept.
    var cookies = new CookieContainer();

    // Fetch the login page first; it may contain hidden form values you need to post back.
    var webRequest = (HttpWebRequest)WebRequest.Create(loginURL);
    webRequest.CookieContainer = cookies;
    string responseData;
    using (var responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream()))
    {
        responseData = responseReader.ReadToEnd();
    }

    // Now you may need to extract some values from the login form (responseData) and build the POST data with your username and password.
    // I don't know what exactly you need to POST, but again a TamperData observation will help you find out.
    // I emphasize that the field names here are just an example. The values are URL-encoded so characters like '&' or '@' survive.
    string postData = String.Format("UserName={0}&Password={1}",
        Uri.EscapeDataString(userName), Uri.EscapeDataString(password));

    // Post the login form.
    webRequest = (HttpWebRequest)WebRequest.Create(loginURL);
    webRequest.Method = "POST";
    webRequest.ContentType = "application/x-www-form-urlencoded";
    webRequest.CookieContainer = cookies;

    // Write the form values into the request body.
    using (var requestWriter = new StreamWriter(webRequest.GetRequestStream()))
    {
        requestWriter.Write(postData);
    }
    webRequest.GetResponse().Close();
    return cookies;
}
Then you can perform GET requests with that cookie container against the URL you found by analyzing the JSCC.get().getHandler() requests with TamperData, and you'll eventually get what you want as a response stream:
var webRequest = WebRequest.Create(url) as HttpWebRequest;
// Log in against the login URL, not the content URL, and reuse the resulting cookies here.
webRequest.CookieContainer = GetCookieContainer(loginURL, userName, password);
var responseStream = webRequest.GetResponse().GetResponseStream();
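When the login form has more than a couple of fields, building the POST body by hand gets error-prone, because every name and value has to be percent-encoded. A minimal sketch of a helper for that follows; FormEncoder and Encode are hypothetical names, and the field names in the usage note are only placeholders for whatever TamperData shows you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormEncoder
{
    // Builds an application/x-www-form-urlencoded body from name/value pairs.
    // Uri.EscapeDataString encodes a space as %20 (not '+'); servers generally accept both.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? string.Empty)));
    }
}
```

Calling FormEncoder.Encode with the pairs ("UserName", "me@example.com") and ("Password", "a&b=c") keeps the '&' and '=' inside the password from being read as field separators.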
You can also use Selenium for browser automation. It has C# and Java APIs (I have no experience using Selenium).
Facebook loads its content dynamically with AJAX. You can use a tool like Firebug to examine what kind of request is made, and then replicate it.
Or you can use a browser rendering engine like WebKit to process the JavaScript for you and expose the resulting HTML:
http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
Related
I have an external URL, like http://a.com/?id=5 (not part of my project),
and I want my website to show this URL's contents.
For example:
My website (http://MyWebsite.com/?id=123) shows the contents of the third-party URL (http://a.com/?id=5),
but I don't want the client to see the real URL (http://a.com/?id=5). I'll check authentication first and then show the page.
I assume that you do not have control over the server behind "http://a.com/?id=5". I think there's no way to completely hide the external link from users. They can always look at the HTML source code and the HTTP requests and trace back the original location.
One possible solution to partially hide the external site is to use the MVC equivalent of curl in your controller: once the user is authenticated, request the page from "http://a.com/?id=5" on the server and return it to your user:
ASP.NET MVC - Using cURL or similar to perform requests in application:
I assume the request to "http://a.com/?id=5" uses the GET method:
public string GetResponseText(string userAgent) {
    string url = "http://a.com/?id=5";
    string responseText = String.Empty;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    request.UserAgent = userAgent;
    // Dispose the response and the reader once the body has been read.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader sr = new StreamReader(response.GetResponseStream())) {
        responseText = sr.ReadToEnd();
    }
    return responseText;
}
Then you just need to call this in your controller. Pass along the client's user agent so that they see the website exactly as they would when opening it in their own web browser:
return Content(GetResponseText(Request.UserAgent), "text/html");
// Request is the request passed to the controller for http://MyWebsite.com/?id=123
PS: I may not be using the correct MVC API, but the idea is there. You just need to look up the MVC documentation on HttpWebRequest to make it work correctly.
I'm working on a mobile version of an archaic online learning system at my campus. I've been trying for weeks to scrape something from this website, but I need to log in first in order to get it. I have searched for everything, including HttpWebRequest, CookieAwareWebClient, etc.
My method so far is:
Find the "action" URL in the login form of the site.
Send a POST request to that URL.
Receive a response containing cookies in Headers["Set-Cookie"].
Create a new HttpWebRequest with the URL of the content (which requires being logged in first).
Copy the Set-Cookie headers into that request.
Run it (but it fails).
I have also tried using a CookieCollection in CookieAwareWebClient, but that didn't work either.
How do I do this properly? Do the cookies in an HttpWebRequest live only in the headers, or elsewhere in the HTTP packets? Where does the CookieCollection go? Is the CookieCollection included in the next request?
Thanks
You need to use a CookieContainer. That will process and hold the cookies for you between HttpWebRequest objects:
var cookieJar = new CookieContainer();
var loginWebRequest = WebRequest.Create(loginUrl) as HttpWebRequest;
loginWebRequest.CookieContainer = cookieJar;
// Execute the Web Request
var authRequiredWebRequest = WebRequest.Create(protectedUrl) as HttpWebRequest;
authRequiredWebRequest.CookieContainer = cookieJar;
// Execute the next request
// It will have the auth cookie set appropriately
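To see what the container does between the two requests, here is a minimal offline sketch. It feeds a made-up Set-Cookie value into a CookieContainer with SetCookies (roughly what happens when a response arrives) and then asks which Cookie header would be sent to a given URL. CookieJarDemo, the SESSION cookie, and the example.com URLs are assumptions for illustration only.

```csharp
using System;
using System.Net;

public static class CookieJarDemo
{
    // Simulates receiving "Set-Cookie: SESSION=abc123; path=/" from the login URL.
    public static CookieContainer JarAfterLogin()
    {
        var jar = new CookieContainer();
        jar.SetCookies(new Uri("http://example.com/login"), "SESSION=abc123; path=/");
        return jar;
    }

    // Returns the Cookie header value the container would attach to a request for requestUrl.
    public static string HeaderFor(CookieContainer jar, string requestUrl)
    {
        return jar.GetCookieHeader(new Uri(requestUrl));
    }
}
```

Asking for the header for another page on the same host returns the session cookie, while a different host gets nothing, which is why both requests must share the same container object.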
Say I am building a C# application.
The purpose of the application is to:
get a username and password from the user,
and show some information from a website.
In the background, after taking the username and password, it should:
log in to the website with those credentials,
click on the anchor link that appears after logging in,
find the span that holds the info,
get the info.
That was an example. I am actually building an app to show bandwidth usage information.
The server does not expose any API for that.
Is there any tutorial, info, or article available for a similar purpose? I just don't know what to search for.
Basic Introduction To HttpWebRequests
Firstly, you're going to need the right tools for the job. Go and download the Live HTTP Headers plugin for Firefox. This will allow you to view HTTP headers in real time, so you can see the POST data that is sent when you interact with the website. Once you know the data that is sent to the website, you can emulate the process by creating your own HTTP web requests programmatically.
Load Live HTTP Headers by navigating to Tools > Live HTTP Headers. Once you've loaded the GUI, navigate to the website you wish to log in to; I will use Facebook for demonstration purposes. Type in your credentials, but before you log in, clear the GUI text window and ensure that the check box labeled Capture is checked. Once you hit login, you will see the text window flood with information about the requests, including the POST data you need.
I find it best to click Save All... and then search for your username in the saved text document so that you can identify the POST data easily. For my request the POST data looked like this:
lsd=AVp-UAbD&display=&legacy_return=1&return_session=0&trynum=1&charset_test=%E2%82%AC%2C%C2%B4%2C%E2%82%AC%2C%C2%B4%2C%E6%B0%B4%2C%D0%94%2C%D0%84&timezone=0&lgnrnd=214119_mDgc&lgnjs=1356154880&email=myfacebookemail%40outlook.com&pass=myfacebookpassword&default_persistent=0
Which can then be defined in C# like so:
StringBuilder postData = new StringBuilder();
postData.Append("lsd=AVqRGVie&display=");
postData.Append("&legacy_return=1");
postData.Append("&return_session=0");
postData.Append("&trynum=1");
postData.Append("&charset_test=%E2%82%AC%2C%C2%B4%2C%E2%82%AC%2C%C2%B4%2C%E6%B0%B4%2C%D0%94%2C%D0%84");
postData.Append("&timezone=0");
postData.Append("&lgnrnd=153743_eO6D");
postData.Append("&lgnjs=1355614667");
postData.Append(String.Format("&email={0}", "CUSTOM_EMAIL"));
postData.Append(String.Format("&pass={0}", "CUSTOM_PASSWORD"));
postData.Append("&default_persistent=0");
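To compare captured bodies field by field, it can help to decode them into name/value pairs first. The lsd and lgnrnd values differ between the captured body and the code above, which suggests (though I can't confirm it) that some fields change per session and cannot simply be hardcoded. A minimal sketch of such a decoder follows; PostDataInspector is a hypothetical helper, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class PostDataInspector
{
    // Splits an application/x-www-form-urlencoded body into decoded name/value pairs,
    // preserving the order in which the fields appear.
    public static List<KeyValuePair<string, string>> Parse(string body)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (string pair in body.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? string.Empty : pair.Substring(eq + 1);
            result.Add(new KeyValuePair<string, string>(Decode(name), Decode(value)));
        }
        return result;
    }

    // Form encoding may use '+' for a space; convert it before percent-decoding.
    private static string Decode(string s)
    {
        return Uri.UnescapeDataString(s.Replace('+', ' '));
    }
}
```

Running Parse over two captures from separate login attempts and comparing the values makes it easy to spot which fields have to be read from the login page each time.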
I'm aiming to show you the relation between the POST data the web browser sends 'manually' and how we can use that data to emulate the request in C#. Understand that sending POST data is far from deterministic: different websites work in different ways and can throw all kinds of things your way. Below is a function I put together to validate that Facebook credentials are correct. I can't and shouldn't go into extraordinary depth here, as the classes and their members are well documented. You can find better information than I can offer about the methods used on MSDN, for example under the WebRequest.Method property.
private bool ValidateFacebookCredentials(string email, string password)
{
    CookieContainer cookies = new CookieContainer();
    HttpWebRequest request = null;
    HttpWebResponse response = null;
    string returnData = string.Empty;

    // Need to retrieve cookies first
    request = (HttpWebRequest)WebRequest.Create(new Uri("https://www.facebook.com/login.php?login_attempt=1"));
    request.Method = "GET";
    request.CookieContainer = cookies;
    response = (HttpWebResponse)request.GetResponse();
    response.Close(); // only the cookies are needed here; release the connection

    // Set up the request
    request = (HttpWebRequest)WebRequest.Create(new Uri("https://www.facebook.com/login.php?login_attempt=1"));
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
    request.Referer = "https://www.facebook.com/login.php?login_attempt=1";
    request.AllowAutoRedirect = true;
    request.KeepAlive = true;
    request.CookieContainer = cookies;

    // Format the POST data (the user-supplied values are URL-encoded)
    StringBuilder postData = new StringBuilder();
    postData.Append("lsd=AVqRGVie&display=");
    postData.Append("&legacy_return=1");
    postData.Append("&return_session=0");
    postData.Append("&trynum=1");
    postData.Append("&charset_test=%E2%82%AC%2C%C2%B4%2C%E2%82%AC%2C%C2%B4%2C%E6%B0%B4%2C%D0%94%2C%D0%84");
    postData.Append("&timezone=0");
    postData.Append("&lgnrnd=153743_eO6D");
    postData.Append("&lgnjs=1355614667");
    postData.Append(String.Format("&email={0}", Uri.EscapeDataString(email)));
    postData.Append(String.Format("&pass={0}", Uri.EscapeDataString(password)));
    postData.Append("&default_persistent=0");

    // Write the POST data to the stream
    using (StreamWriter writer = new StreamWriter(request.GetRequestStream()))
        writer.Write(postData.ToString());

    response = (HttpWebResponse)request.GetResponse();

    // Read the web page (HTML) that we retrieve after sending the request
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        returnData = reader.ReadToEnd();

    return !returnData.Contains("Please re-enter your password");
}
Sample Code on Grabbing Contents (Screen Scraping)
Uri uri = new Uri("http://www.microsoft.com/default.aspx");
if (uri.Scheme == Uri.UriSchemeHttp) // comparison, not assignment
{
    // WebRequest.Create returns a WebRequest, so cast to the HTTP-specific types.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    request.Method = WebRequestMethods.Http.Get;
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader reader = new StreamReader(response.GetResponseStream());
    string tmp = reader.ReadToEnd();
    response.Close();
    Response.Write(tmp); // assumes this runs inside an ASP.NET page
}
Sample Code on how to Post Data to remote Web Page using HttpWebRequest
Uri uri = new Uri("http://www.amazon.com/exec/obidos/search-handle-form/102-5194535-6807312");
// URL-encode the value so the space in "ASP.NET 2.0" is sent correctly.
string data = "field-keywords=" + Uri.EscapeDataString("ASP.NET 2.0");
if (uri.Scheme == Uri.UriSchemeHttp)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    request.Method = WebRequestMethods.Http.Post;
    // ContentLength is a byte count, not a character count (needs System.Text).
    request.ContentLength = Encoding.UTF8.GetByteCount(data);
    request.ContentType = "application/x-www-form-urlencoded";
    StreamWriter writer = new StreamWriter(request.GetRequestStream());
    writer.Write(data);
    writer.Close();
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader reader = new StreamReader(response.GetResponseStream());
    string tmp = reader.ReadToEnd();
    response.Close();
    Response.Write(tmp); // assumes this runs inside an ASP.NET page
}
Source
Any HTTP client implementation will do; there are tons of open-source libraries for that. Look at curl, for example. Someone made a .NET wrapper for it.
You can continue using WebClient to POST (instead of GET, which is the HTTP verb you're currently using with DownloadString), but I think you'll find it easier to work with the (slightly) lower-level classes WebRequest and WebResponse.
There are two parts to this: the first is to post the login form, and the second is to recover the "Set-Cookie" header and send it back to the server as "Cookie" along with your GET request. The server will use this cookie to identify you from now on (assuming it's using cookie-based authentication, which I'm fairly confident it is, as that page returns a Set-Cookie header which includes "PHPSESSID").
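If you do handle the headers by hand rather than with a CookieContainer, note that a Set-Cookie value carries attributes (path, expiry, and so on) that must not be echoed back; only the leading name=value pair belongs in the Cookie header. A minimal sketch of that conversion follows, assuming you already have each Set-Cookie value as a separate string. SetCookieParser and the sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetCookieParser
{
    // Extracts the leading "name=value" pair from one Set-Cookie header value,
    // dropping attributes such as Path, Expires, or HttpOnly.
    public static string ToCookiePair(string setCookieValue)
    {
        int semicolon = setCookieValue.IndexOf(';');
        string pair = semicolon < 0 ? setCookieValue : setCookieValue.Substring(0, semicolon);
        return pair.Trim();
    }

    // Joins several Set-Cookie values into a single Cookie request header value.
    public static string ToCookieHeader(IEnumerable<string> setCookieValues)
    {
        return string.Join("; ", setCookieValues.Select(ToCookiePair));
    }
}
```

A CookieContainer does this bookkeeping (plus domain, path, and expiry matching) for you, so the manual route is mostly useful for understanding what goes over the wire.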
Click Here to Check in Detail
I would like to grab some content from a website that is built with Drupal.
The challenge here is that I need to log in to this site before I can access the page I want to scrape. Is there a way to automate this login process in my C# code, so I can grab the secured content?
To access the secured content, you'll need to store and send cookies with every request to the server, starting with the request that sends your login info and then saving the session cookie that the server gives you (which is your proof that you are who you say you are).
You can use System.Windows.Forms.WebBrowser for an out-of-the-box solution that handles cookies for you, at the cost of less control.
My preferred method is to use System.Net.HttpWebRequest to send and receive all web data and then use the HtmlAgilityPack to parse the returned data into a Document Object Model (DOM) which can be easily read from.
The trick to getting System.Net.HttpWebRequest to work is that you must create a long-lived System.Net.CookieContainer that will keep track of your login info (and other things the server expects you to keep track of). The good news is that HttpWebRequest will take care of all of this for you if you provide the container.
You need a new HttpWebRequest for each call you make, so you must set its .CookieContainer to the same object every time. Here is an example:
UNTESTED
using System.Net;

public void TestConnect()
{
    CookieContainer cookieJar = new CookieContainer();

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/login.htm");
    request.CookieContainer = cookieJar;
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    // do page parsing and request setting here
    response.Close();

    request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/submit_login.htm");
    // add specific page parameters here
    request.CookieContainer = cookieJar;
    response = (HttpWebResponse)request.GetResponse();
    response.Close();

    request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/secured_page.htm");
    request.CookieContainer = cookieJar;
    // this will now work since you have saved your authentication cookies in 'cookieJar'
    response = (HttpWebResponse)request.GetResponse();
}
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.aspx
HttpWebRequest Class
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx
You'll have to use the Services module to do that. Also check out this link for a bit of explanation.
I'm trying to login to a website using C# and the WebRequest class. This is the code I wrote up last night to send POST data to a web page:
public string login(string URL, string postData)
{
    Stream webpageStream;
    WebResponse webpageResponse;
    StreamReader webpageReader;
    byte[] byteArray = Encoding.UTF8.GetBytes(postData);

    _webRequest = WebRequest.Create(URL);
    _webRequest.Method = "POST";
    _webRequest.ContentType = "application/x-www-form-urlencoded";
    _webRequest.ContentLength = byteArray.Length;

    webpageStream = _webRequest.GetRequestStream();
    webpageStream.Write(byteArray, 0, byteArray.Length);

    webpageResponse = _webRequest.GetResponse();
    webpageStream = webpageResponse.GetResponseStream();
    webpageReader = new StreamReader(webpageStream);
    string responseFromServer = webpageReader.ReadToEnd();

    webpageReader.Close();
    webpageStream.Close();
    webpageResponse.Close();
    return responseFromServer;
}
It works fine, but I have no idea how to modify it to send POST data to a login script and then save a cookie(?) and log in.
I have looked at my network transfers using Firebug on the website's login page, and it is sending POST data that looks like this:
accountName=myemail%40gmail.com&password=mypassword&persistLogin=on&app=com-sc2
As far as I'm aware, to be able to use my account with this website in my C# app, I need to save the cookie that the web server sends and then use it on every request. Is this right? Or can I get away with no cookie at all?
Any help is greatly appreciated, thanks! :)
The login process depends on the particular web site. If it uses cookies, you need to use them.
I recommend using Firefox with an HTTP header-watching plugin to see exactly which headers are sent to your particular web site, and then implementing it the same way in C#. I answered a very similar question the day before yesterday, including an example with cookies. Look here.
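One way to structure this is to keep a single CookieContainer for the lifetime of the session and attach it to every request. The sketch below follows the shape of the login method from the question; CookieSession and its members are hypothetical names, the URLs in the usage note are placeholders, and it assumes the site uses ordinary cookie-based sessions.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public class CookieSession
{
    // Shared by every request, so cookies set at login are sent on later requests.
    private readonly CookieContainer _cookies = new CookieContainer();

    // Creates (but does not send) a request bound to this session's cookies.
    public HttpWebRequest CreateRequest(string url, string method)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.CookieContainer = _cookies;
        return request;
    }

    // Sends URL-encoded form data and returns the response body.
    public string Post(string url, string postData)
    {
        byte[] body = Encoding.UTF8.GetBytes(postData);
        HttpWebRequest request = CreateRequest(url, "POST");
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
            stream.Write(body, 0, body.Length);
        return ReadBody(request);
    }

    // Fetches a page using whatever cookies the session has collected so far.
    public string Get(string url)
    {
        return ReadBody(CreateRequest(url, "GET"));
    }

    private static string ReadBody(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}
```

Usage would be along the lines of creating one CookieSession, calling Post with the login URL and the form data seen in Firebug, and then calling Get for the pages that require the login.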
I've had more luck using the HtmlElement class to manipulate websites.
Here is a cross-post to an example of how logging in through code would work (provided you're using a WebBrowser control)