Open webpage programmatically and retrieve its HTML content as a string - c#

I have a Facebook account and I would like to extract my friends' photos and personal details such as "Date of birth", "Studied at", and so on. I am able to extract the address of the first Facebook page for each of my friends' accounts, but I don't know how to programmatically open the webpage for each friend's first page and save its HTML content as a string, so that I can extract their personal details and photos. Please help! Thanks in advance!

You have three options:
1- Using a WebClient object.
WebClient webClient = new WebClient();
webClient.Credentials = new System.Net.NetworkCredential("UserName", "Password", "Domain");
string pageHTML = webClient.DownloadString("http://url");
2- Using a WebRequest. This is the best solution because it gives you more control over your request.
WebRequest myWebRequest = WebRequest.Create("http://URL");
WebResponse myWebResponse = myWebRequest.GetResponse();
Stream receiveStream = myWebResponse.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
StreamReader readStream = new StreamReader(receiveStream, encode);
string strResponse = readStream.ReadToEnd();
// strFilePath is the path where you want to save the page on disk.
StreamWriter oSw = new StreamWriter(strFilePath);
oSw.WriteLine(strResponse);
oSw.Close();
readStream.Close();
myWebResponse.Close();
3- Using a WebBrowser (I bet you don't wanna do that)
WebBrowser wb = new WebBrowser();
string pageHTML = "";
// Subscribe before navigating so the handler is attached when the page finishes loading.
wb.DocumentCompleted += (sender, e) => pageHTML = wb.DocumentText;
wb.Navigate("http://URL");
Excuse me if I mistyped any code; I improvised it and don't have a syntax checker to verify its correctness, but I think it should be fine.
EDIT: For Facebook pages, you may consider using the Facebook Graph API:
http://developers.facebook.com/docs/reference/api/
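For instance, here is a minimal sketch of reading a public object from the Graph API with a WebClient. The object id is a placeholder, and newer versions of the API also require an OAuth access token, so treat this as an illustration rather than working production code:
using System;
using System.Net;

class GraphApiSketch
{
    static void Main()
    {
        // Placeholder object id; current Graph API versions also expect
        // an access_token query parameter obtained via OAuth.
        string url = "https://graph.facebook.com/1089655429";
        using (var webClient = new WebClient())
        {
            string json = webClient.DownloadString(url);
            Console.WriteLine(json); // raw JSON describing the user
        }
    }
}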

Try this:
var html = new WebClient()
.DownloadString("the facebook account url goes here");
Also, once you have downloaded the HTML as a string I would highly recommend that you use the Html Agility Pack to parse it.
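For example, here is a small sketch of that combination, assuming the Html Agility Pack assembly is referenced; it grabs the page's <title> from the downloaded string:
using System;
using System.Net;
using HtmlAgilityPack;

class HapSketch
{
    static void Main()
    {
        var html = new WebClient().DownloadString("http://example.com");

        // Parse the raw HTML string into a queryable DOM.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath query against the parsed document.
        var titleNode = doc.DocumentNode.SelectSingleNode("//title");
        if (titleNode != null)
            Console.WriteLine(titleNode.InnerText);
    }
}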

There are in general two things you can do here. The first is called web scraping: you can download the HTML source with the following code:
string stringResponse;
var request = WebRequest.Create("http://example.com");
var response = request.GetResponse();
using (Stream responseStream = response.GetResponseStream())
{
    StreamReader reader = new StreamReader(responseStream);
    stringResponse = reader.ReadToEnd();
}
stringResponse then contains the HTML source of the website http://example.com.
However, this is probably not what you want to do. Facebook has an SDK that you can use to download this kind of information. You can read about it on the following page:
http://developers.facebook.com/docs/reference/api/user/
If you want to use the Facebook API, then I think it's worth changing your question or asking a new one about it, since it's quite a bit more complicated and requires some authorization and other coding. However, it's the best way, since it's unlikely that your code is ever going to break, and it safeguards the privacy of the people you want to get information from.
For example, if you query me with the api, you get the following string:
{
   "id": "1089655429",
   "name": "Timo Willemsen",
   "birthday": "08/29/1989",
   "education": [
      {
         "school": {
            "id": "115091211836927",
            "name": "Stedelijk Gymnasium Arnhem"
         },
         "year": {
            "id": "127668947248449",
            "name": "2001"
         },
         "type": "High School"
      }
   ]
}
You can see that I'm Timo Willemsen, 21 years old, and studied at Stedelijk Gymnasium Arnhem in 2001.
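As a sketch, that response could be read with Json.NET; the property names below come straight from the sample above, but the parsing code itself is my own illustration:
using System;
using Newtonsoft.Json.Linq;

class GraphJsonSketch
{
    static void Main()
    {
        // A compacted copy of the sample response above.
        string json = @"{""id"":""1089655429"",""name"":""Timo Willemsen"",
            ""birthday"":""08/29/1989"",""education"":[{""school"":
            {""name"":""Stedelijk Gymnasium Arnhem""},""year"":{""name"":""2001""},
            ""type"":""High School""}]}";

        JObject user = JObject.Parse(json);
        string name = (string)user["name"];
        string birthday = (string)user["birthday"];
        string school = (string)user["education"][0]["school"]["name"];
        Console.WriteLine("{0}, born {1}, studied at {2}", name, birthday, school);
    }
}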

Use Selenium 2.0 for C#: http://seleniumhq.org/download/
var driver = new FirefoxDriver();
driver.Navigate().GoToUrl("http://www.google.com");
String pageSource = driver.PageSource;

Related

Get Instagram full profile information without the Instagram Graph API

I want to get some data from Instagram users.
So I've used the Instagram Basic Display API, and the profile data I could receive was:
username
media count
account type
but I want this data:
username
name
media count
Profile Image
followers count
following count
How can I get this data without the Instagram Graph API (in any way) in C#?
Or is there any way to get this data with the WebClient class or anything like that?
Update for @Eehab's answer: I use RestClient and WebClient in this example and both of them give the same result.
Here is the WebClient example:
WebClient client = new WebClient();
string page = client.DownloadString("https://www.instagram.com/instagram/?__a=1");
Console.WriteLine(page);
Console.ReadKey();
I've also found that this link only works for logged-in users. I'm already logged in to my Instagram account in Chrome, but it seems WebClient needs to log in too.
Edit, following @Eehab's answer:
In this case, to use this URL (https://www.instagram.com/{username}/?__a=1), we can't do it without a logged-in Instagram browser profile. So we should log in to Instagram with Selenium and use the logged-in cookies for the URL requests. First install the Selenium web driver, then write something like the following (untested):
var driver = new ChromeDriver();
// Go to Instagram
driver.Url = "https://www.instagram.com/";
// Log in
var userNameElement = driver.FindElement(By.Name("username"));
userNameElement.SendKeys("Username");
var passwordElement = driver.FindElement(By.Name("password"));
passwordElement.SendKeys("Password");
var loginButton = driver.FindElement(By.Id("login"));
loginButton.Click();
//Get cookies
var cookies = driver.Manage().Cookies.AllCookies.ToList();
//Send request with given cookies :)
var url = "https://www.instagram.com/{username}/?__a=1";
var httpRequest = (HttpWebRequest)WebRequest.Create(url);
foreach (var cookie in cookies)
{
    httpRequest.Headers["Cookie"] += $"{cookie.Name}={cookie.Value}; ";
}
var httpResponse = (HttpWebResponse)httpRequest.GetResponse();
using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
{
var result = streamReader.ReadToEnd();
}
//...
If anyone can improve this question for more use cases, please edit it; I'd really appreciate it :)
You could do that using the open API, for example:
https://www.instagram.com/instagram/?__a=1
Example code, generated from Postman:
var client = new RestClient("https://www.instagram.com/instagram/?__a=1");
client.Timeout = -1;
var request = new RestRequest(Method.GET);
IRestResponse response = client.Execute(request);
Console.WriteLine(response.Content);
You could also use the HttpClient class. If you want to use WebClient, you can do it with the WebClient.DownloadString method, though I don't recommend WebClient for this kind of scraping. Keep in mind Instagram may block you; if you are blocked, you need residential proxies to bypass the block.
The response will be JSON data; use Json.NET or a similar library to deserialize it.
Just replace instagram with any username you want in the given URL.
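For completeness, here is a sketch of the HttpClient variant mentioned above, using the same endpoint from this answer; the same caveats about login cookies and blocking apply:
using System;
using System.Net.Http;
using System.Threading.Tasks;

class InstagramHttpClientSketch
{
    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Same undocumented endpoint as above; Instagram may require
            // login cookies and can block unauthenticated scraping.
            string json = await client.GetStringAsync(
                "https://www.instagram.com/instagram/?__a=1");
            Console.WriteLine(json);
        }
    }
}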

Simple GET/POST in ASP.NET, like in PHP

I ran into a problem. I am a .NET developer and don't know PHP. I am working on a CRM which has an API. My client says it should be a simple page that works with a simple POST. I don't understand how I can do a simple POST in .NET. I have created an ASP.NET WebForm, and all is working well. The only problem I have is that I have to return a list of parameters in the response. I am using
Response.Write("100 - Click Recorded Successfully.");
but this returns a full HTML document with the parameter string at the top. I saw one PHP API which returns only the parameter string, like this, without an HTML document:
response=1
&responsetext=SUCCESS
&authcode=123456
&transactionid=2154229522
&avsresponse=N
&cvvresponse=N
&orderid=3592
&type=sale
&response_code=100
Can someone suggest a better way to do this? I found many articles that explain how to do a simple GET/POST in .NET, but none of them solved my problem.
Update:
This is the code that I am using from another application to call the page and get the response stream:
string result = "";
WebRequest objRequest = WebRequest.Create(url + query);
objRequest.Method = "POST";
objRequest.ContentLength = 0;
objRequest.Headers.Add("x-ms-version", "2012-08-01");
objRequest.ContentType = "application/xml";
WebResponse objResponse = objRequest.GetResponse();
using (StreamReader sr =
new StreamReader(objResponse.GetResponseStream()))
{
result = sr.ReadToEnd();
// Close and clean up the StreamReader
sr.Close();
}
string temp = result;
where url + query is the address of my page. The result shows this: http://screencast.com/t/eKn4cckXc. I want to get only the header line, that is, "100 - Click Recorded Successfully."
You have two options. The first is to clear whatever response was already generated on the page, write the text, and then end the response so that nothing else is added:
Response.Clear();
Response.ClearHeaders();
Response.AddHeader("Content-Type", "text/plain");
Response.Write(Request.Url.Query);
Response.End();
That works if you want to process it on the Page. However, a better approach would be to implement an HTTP handler, in which case all you need to do is:
public void ProcessRequest(HttpContext context)
{
    context.Response.AddHeader("Content-Type", "text/plain");
    context.Response.Write(context.Request.Url.Query);
}
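For reference, here is a minimal sketch of a complete generic handler (an .ashx file); the class and file names are illustrative, but the IHttpHandler contract is standard ASP.NET:
using System.Web;

// Registered as e.g. Click.ashx; it returns only the plain-text line,
// with no surrounding HTML document.
public class ClickHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        context.Response.Write("100 - Click Recorded Successfully.");
    }

    public bool IsReusable
    {
        get { return false; }
    }
}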

WebRequest using Mozilla Firefox

I need access to the HTML of a Facebook page to extract some data from it, so I need to create a WebRequest.
My code worked well for other sites, but for Facebook I must be logged in before I can access the HTML.
How can I use Firefox data to create a WebRequest for a Facebook page?
I tried this:
List<string> HTML_code = new List<string>();
WebRequest request = WebRequest.Create(URL);
using (WebResponse response = request.GetResponse())
using (StreamReader stream = new StreamReader(response.GetResponseStream()))
{
string line;
while ((line = stream.ReadLine()) != null)
{
HTML_code.Add(line);
}
}
...but the resulting HTML is that of the Facebook home page when I am not logged in.
If what you are trying to do is retrieve the number of likes from a Facebook page, you can use Facebook's Graph API service. To keep it simple, this is what I basically did in the code:
1. Retrieve the Facebook page's data. In this case I used the Coke page's data, since it was an example FB had listed.
2. Parse the returned JSON using Json.NET. There are other ways to do this, but this keeps it simple, and you can get Json.NET over at CodePlex. The documentation I used for my code was from this page in the docs. Their documentation will also help you with parsing and serializing even more JSON if you need to.
That basically translates into the code below. Just note that I left out all the fancy exception handling to keep it simple, as networking is not always reliable! Also don't forget to include the Json.NET library in your project!
Usings:
using System.IO;
using System.Net;
using Newtonsoft.Json.Linq;
Code:
string url = "https://graph.facebook.com/cocacola";
WebClient client = new WebClient();
string jsonData = string.Empty;
// Load the Facebook page info
Console.WriteLine("Connecting to Facebook...");
using (Stream data = client.OpenRead(url))
{
using (StreamReader reader = new StreamReader(data))
{
jsonData = reader.ReadToEnd();
}
}
// Get number of likes from Json data
JObject jsonParsed = JObject.Parse(jsonData);
int likes = (int)jsonParsed.SelectToken("likes");
// Write out the result
Console.WriteLine("Number of Likes: " + likes);

How to fetch webpage title and images from URL?

I want to fetch the website title and images from a URL, as facebook.com does. How do I get images and the website title from a third-party link?
Use the Html Agility Pack. This is sample code to get the title:
using System;
using HtmlAgilityPack;
protected void Page_Load(object sender, EventArgs e)
{
string url = @"http://www.veranomovistar.com.pe/";
System.Net.WebClient wc = new System.Net.WebClient();
HtmlDocument doc = new HtmlDocument();
doc.Load(wc.OpenRead(url));
var metaTags = doc.DocumentNode.SelectNodes("//title");
if (metaTags != null)
{
string title = metaTags[0].InnerText;
}
}
Any doubts, post a comment.
At a high level, you just need to send a standard HTTP request to the desired URL. This will get you the site's markup. You can then inspect the markup (either by parsing it into a DOM object and then querying the DOM, or by running some simple regexp's/pattern matching to find the things you are interested in) to extract things like the document's <title> element and any <img> elements on the page.
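As a sketch of that approach, assuming the Html Agility Pack is available, here is one way to pull the <title> and the <img> sources out of the markup:
using System;
using System.Net;
using HtmlAgilityPack;

class TitleAndImagesSketch
{
    static void Main()
    {
        string html = new WebClient().DownloadString("http://example.com");
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The document's <title> element.
        var title = doc.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title != null ? title.InnerText : "(no title)");

        // The src attribute of every <img> on the page.
        var images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
            foreach (var img in images)
                Console.WriteLine(img.GetAttributeValue("src", ""));
    }
}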
Off the top of my head, I'd use an HttpWebRequest to go get the page and parse the title out myself, then use further HttpWebRequests to go get any images referenced on the page. There's a darn good chance, though, that there's a better way to do this and somebody will come along and tell you what it is. If not, it'd look something like this:
HttpWebResponse response = null;
try
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(<your URL here>);
response = (HttpWebResponse)request.GetResponse();
Stream responseStream = response.GetResponseStream();
StreamReader reader = new StreamReader(responseStream);
//use the StreamReader object to get the page data and parse out the title as well as
//getting locations of any images you need to get
}
catch
{
//handle exceptions
}
finally
{
if(response != null)
{
response.Close();
}
}
Probably the dumb way to do it, but that's my $0.02.
You just have to write JavaScript on the source body. For example, if you are using a master page, you just have to write the code on the master page, and it is reflected on all the pages. You can also use the image URL property in this script.

get data from page that a url points to

Given a URL, I'd like to be able to capture the title of the page this URL points to, as well as other info, e.g. a snippet of text from the first paragraph on the page, and maybe even an image from the page.
Digg.com does this nicely when you submit a url.
How could something like this be done in .Net c#?
You're looking for the HTML Agility Pack, which can parse malformed HTML documents.
You can use its HtmlWeb class to download a webpage over HTTP.
You can also download text over HTTP using .Net's WebClient class.
However, it won't help you parse the HTML.
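For instance, a minimal sketch of the HtmlWeb route, which downloads and parses in one step:
using System;
using HtmlAgilityPack;

class HtmlWebSketch
{
    static void Main()
    {
        // HtmlWeb both fetches the page over HTTP and parses it.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://example.com");

        var title = doc.DocumentNode.SelectSingleNode("//title");
        if (title != null)
            Console.WriteLine(title.InnerText);
    }
}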
You could try something like this:
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
namespace WebGet
{
class progMain
{
static void Main(string[] args)
{
ASCIIEncoding asc = new ASCIIEncoding();
WebRequest wrq = WebRequest.Create("http://localhost");
WebResponse wrp = wrq.GetResponse();
// Note: a single Read call is not guaranteed to fill the buffer for large pages.
byte[] responseBuf = new byte[wrp.ContentLength];
int status = wrp.GetResponseStream().Read(responseBuf, 0, responseBuf.Length);
Console.WriteLine(asc.GetString(responseBuf));
}
}
}
Once you have the buffer, you can process it looking for paragraph or image HTML tags to extract portions of the returned data.
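A rough sketch of that extraction step, scanning the returned string for image tags with a regular expression (brittle against malformed HTML, so a real parser such as the Html Agility Pack is usually the safer choice):
using System;
using System.Text.RegularExpressions;

class ImgScanSketch
{
    static void Main()
    {
        string html = "<html><body><img src='a.png'><p>Hi</p><img src=\"b.jpg\"></body></html>";

        // Naive pattern: capture the src attribute of each <img> tag.
        foreach (Match m in Regex.Matches(html,
            @"<img[^>]+src=[""']?([^""'\s>]+)", RegexOptions.IgnoreCase))
        {
            Console.WriteLine(m.Groups[1].Value);
        }
    }
}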
You can extract the title of a page with a function like the following. You would need to modify the regular expression to look for, say, the first paragraph of text but since each page is different, that may prove difficult. You could look for a meta description tag and take the value from that, however.
public static string GetWebPageTitle(string url)
{
// Create a request to the url
HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;
// If the request wasn't an HTTP request (like a file), ignore it
if (request == null) return null;
// Use the user's credentials
request.UseDefaultCredentials = true;
// Obtain a response from the server, if there was an error, return nothing
HttpWebResponse response = null;
try { response = request.GetResponse() as HttpWebResponse; }
catch (WebException) { return null; }
// Regular expression for an HTML title
string regex = @"(?<=<title.*>)([\s\S]*)(?=</title>)";
// If the correct HTML header exists for HTML text, continue
if (new List<string>(response.Headers.AllKeys).Contains("Content-Type"))
if (response.Headers["Content-Type"].StartsWith("text/html"))
{
// Download the page
WebClient web = new WebClient();
web.UseDefaultCredentials = true;
string page = web.DownloadString(url);
// Extract the title
Regex ex = new Regex(regex, RegexOptions.IgnoreCase);
return ex.Match(page).Value.Trim();
}
// Not a valid HTML page
return null;
}
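Usage is then a one-liner; by the function's own contract, null means the URL was not an HTML page or could not be fetched:
string title = GetWebPageTitle("http://www.example.com");
Console.WriteLine(title ?? "(not an HTML page)");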
You could use Selenium RC (open source, www.seleniumhq.org) to parse data etc. from the pages. It is a web test automation tool with a C# .NET lib.
Selenium has a full API to read out specific items on an HTML page.
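A rough sketch with the RC-era .NET client (class and method names from the old Selenium RC bindings; it needs a Selenium server running on port 4444):
using System;
using Selenium; // Selenium RC .NET client

class SeleniumRcSketch
{
    static void Main()
    {
        ISelenium selenium = new DefaultSelenium(
            "localhost", 4444, "*firefox", "http://www.example.com/");
        selenium.Start();
        selenium.Open("/");

        Console.WriteLine(selenium.GetTitle()); // page title
        string html = selenium.GetHtmlSource(); // full page markup
        Console.WriteLine(html.Length);
        selenium.Stop();
    }
}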
