I need access to the HTML of a Facebook page so that I can extract some data from it. To do that, I need to create a WebRequest.
Example:
My code works well for other sites, but for Facebook I must be logged in to access the HTML.
How can I use Firefox's data to create a WebRequest for a Facebook page?
I tried this:
List<string> HTML_code = new List<string>();
WebRequest request = WebRequest.Create(URL);
using (WebResponse response = request.GetResponse())
using (StreamReader stream = new StreamReader(response.GetResponseStream()))
{
string line;
while ((line = stream.ReadLine()) != null)
{
HTML_code.Add(line);
}
}
...but the resulting HTML is that of the Facebook home page shown when I am not logged in.
If what you are trying to do is retrieve the number of likes from a Facebook page, you can use Facebook's Graph API service. Just to keep it simple, this is basically what I did in the code:
Retrieve the Facebook page's data. In this case I used the Coke page's data since it was an example FB had listed.
Parse the returned JSON using Json.Net. There are other ways to do this, but this keeps it simple, and you can get Json.Net over at Codeplex. The Json.Net documentation I used for my code will also help you with parsing and serializing more JSON if you need to.
That basically translates into this code. Note that I left out all the fancy exception handling to keep it simple, even though networking is not always reliable! Also, don't forget to include the Json.Net library in your project!
Usings:
using System.IO;
using System.Net;
using Newtonsoft.Json.Linq;
Code:
string url = "https://graph.facebook.com/cocacola";
WebClient client = new WebClient();
string jsonData = string.Empty;
// Load the Facebook page info
Console.WriteLine("Connecting to Facebook...");
using (Stream data = client.OpenRead(url))
{
using (StreamReader reader = new StreamReader(data))
{
jsonData = reader.ReadToEnd();
}
}
// Get number of likes from Json data
JObject jsonParsed = JObject.Parse(jsonData);
int likes = (int)jsonParsed.SelectToken("likes");
// Write out the result
Console.WriteLine("Number of Likes: " + likes);
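The exception handling that was left out above could look roughly like this. It is only a minimal sketch: the names `LikesReader`, `TryParseLikes` and `TryFetchLikes` are made up for illustration, and it assumes the response still carries a numeric `likes` field as in the code above.

```csharp
using System;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper class, not part of any library
public static class LikesReader
{
    // Parses the JSON text and reads the "likes" field; returns false instead of throwing
    public static bool TryParseLikes(string json, out int likes)
    {
        likes = 0;
        try
        {
            JToken token = JObject.Parse(json).SelectToken("likes");
            if (token == null || token.Type != JTokenType.Integer) return false;
            likes = (int)token;
            return true;
        }
        catch (JsonException)
        {
            // The body was not a valid JSON object
            return false;
        }
    }

    // Downloads the data and parses it; a network failure is reported as false
    public static bool TryFetchLikes(string url, out int likes)
    {
        likes = 0;
        try
        {
            using (var client = new WebClient())
            {
                return TryParseLikes(client.DownloadString(url), out likes);
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed: " + ex.Message);
            return false;
        }
    }
}
```

Splitting the parsing from the download keeps the part that can be checked without a network connection separate from the part that cannot.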
I want to get some data about Instagram users.
I used the Instagram Basic Display API, and the only profile data I could receive were these:
username
media count
account type
but I want these data:
username
name
media count
Profile Image
followers count
following count
How can I get these data in C# without the Instagram Graph API (in any way)?
Or is there any way to get these data with the WebClient class or anything like that?
Update for @Eehab's answer: I used RestClient and WebClient in this example, and both of them give the same result.
Here is the WebClient example:
WebClient client = new WebClient();
string page = client.DownloadString("https://www.instagram.com/instagram/?__a=1");
Console.WriteLine(page);
Console.ReadKey();
I have also learned that this link is accessible only to logged-in users. I am already logged in to my Instagram account in Chrome, but I think WebClient needs to log in too.
Edit based on @Eehab's answer:
To use this URL (https://www.instagram.com/{username}/?__a=1), we need a logged-in Instagram browser profile. So we should log in to Instagram with Selenium and reuse the logged-in cookies for the URL requests. First install the Selenium WebDriver, then write the following code (untested):
var driver = new ChromeDriver();
//go to Instagram
driver.Url = "https://www.instagram.com/";
//Log in
var userNameElement = driver.FindElement(By.Name("username"));
userNameElement.SendKeys("Username");
var passwordElement = driver.FindElement(By.Name("password"));
passwordElement.SendKeys("Password");
var loginButton = driver.FindElement(By.Id("login"));
loginButton.Click();
//Get cookies
var cookies = driver.Manage().Cookies.AllCookies.ToList();
//Send request with given cookies :)
var url = "https://www.instagram.com/{username}/?__a=1";
var httpRequest = (HttpWebRequest)WebRequest.Create(url);
foreach(var cookie in cookies){
httpRequest.Headers["Cookie"] += $"{cookie.Name}={cookie.Value}; ";
}
var httpResponse = (HttpWebResponse)httpRequest.GetResponse();
using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
{
var result = streamReader.ReadToEnd();
}
//...
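As an alternative to concatenating the Cookie header by hand, the cookies could be handed to the request through a `CookieContainer`. The following is a sketch only: `CookieTransfer`, `ToContainer` and `CreateRequest` are hypothetical names, and it assumes the cookies are available as plain name/value pairs (for example taken from the `Name` and `Value` properties of the Selenium cookies above).

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper class for illustration
public static class CookieTransfer
{
    // Builds a CookieContainer from name/value pairs for the given site
    public static CookieContainer ToContainer(
        IEnumerable<KeyValuePair<string, string>> cookies, Uri site)
    {
        var container = new CookieContainer();
        foreach (var pair in cookies)
        {
            // Path "/" plus the site's host makes the cookie apply to every URL on that host
            container.Add(new Cookie(pair.Key, pair.Value, "/", site.Host));
        }
        return container;
    }

    // Creates a request that will send the cookies held in the container
    public static HttpWebRequest CreateRequest(string url, CookieContainer container)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = container;
        return request;
    }
}
```

With a container, the framework builds the Cookie header itself and also stores any cookies the server sets in its response.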
If anyone can improve this question so that it is more widely useful, please edit it. I would really appreciate it :)
You could do that using the open API, for example:
https://www.instagram.com/instagram/?__a=1
Example code generated by Postman:
var client = new RestClient("https://www.instagram.com/instagram/?__a=1");
client.Timeout = -1;
var request = new RestRequest(Method.GET);
IRestResponse response = client.Execute(request);
Console.WriteLine(response.Content);
You could also use the HttpClient class. If you want to use WebClient, you could do it with the WebClient.DownloadString method, although I don't recommend using WebClient for this kind of scraping. Keep in mind that Instagram may block you; if it does, you will need residential proxies to bypass the block.
The response will be JSON data; use Json.Net or a similar library to deserialize it.
Just replace instagram in the given URL with any username you want.
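The HttpClient and Json.Net route mentioned above might be sketched as follows. Note the assumptions: the JSON paths (`graphql.user`, `full_name`, `profile_pic_url`, `edge_followed_by.count`, `edge_follow.count`) are guesses at the response layout that you should check against a real response, and the `InstagramProfile` and `InstagramProfileParser` types are invented for this example.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

// Hypothetical container for the fields asked about in the question
public class InstagramProfile
{
    public string Username;
    public string FullName;
    public string ProfileImageUrl;
    public int Followers;
    public int Following;
}

public static class InstagramProfileParser
{
    // ASSUMPTION: the profile is nested under graphql.user with these field names
    public static InstagramProfile Parse(string json)
    {
        JToken user = JObject.Parse(json).SelectToken("graphql.user");
        if (user == null) return null; // the layout differs from what this sketch assumes

        return new InstagramProfile
        {
            Username = (string)user["username"],
            FullName = (string)user["full_name"],
            ProfileImageUrl = (string)user["profile_pic_url"],
            Followers = (int?)user.SelectToken("edge_followed_by.count") ?? 0,
            Following = (int?)user.SelectToken("edge_follow.count") ?? 0
        };
    }

    // Downloads the JSON for one username and parses it
    public static async Task<InstagramProfile> FetchAsync(HttpClient http, string username)
    {
        string url = "https://www.instagram.com/" + Uri.EscapeDataString(username) + "/?__a=1";
        string json = await http.GetStringAsync(url);
        return Parse(json);
    }
}
```

If the endpoint answers with a login page instead of JSON, `JObject.Parse` will throw, so a caller would probably want to catch `JsonException` around `FetchAsync`.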
I am trying to read the content of a page from a URL by using the code below in C# MVC:
var webRequest = WebRequest.Create(@"https://example.com/aa/aa");
webRequest.Method = "GET";
using (var response = webRequest.GetResponse())
using (var content = response.GetResponseStream())
using (var reader = new StreamReader(content))
{
var strContent = reader.ReadToEnd();
}
but I did not receive any response (the call never returned, so strContent was never set).
When I ran the same code with the URL https://google.com/, it worked fine.
I checked the source code for both pages, and found that https://google.com/ has a proper doctype and tags declared but the one I am hitting seems to be a properties file with no tags and doctype defined.
Any help will be appreciated.
using (var client = new WebClient())
{
// The address must be an absolute URL, including the scheme
string data = client.DownloadString("https://www.yourUrl.com");
}
In my .NET project, I have to use HTTP GET request to get weather info for my city from API. Because of my JavaScript background I thought "OK, so all I need is something like app.get(url, body)", so I started with something like this:
using (var client = new WebClient())
{
var responseString = client.DownloadString("http://www.webservicex.net/globalweather.asmx/GetWeather?CityName=" + city + "&CountryName=" + country);
string xmlString = DecodeXml(responseString);
return xmlString;
}
Unfortunately for me, it turned out that I have to use WCF to get the data. I searched the web for tutorials, but I couldn't find anything about getting data from external sources, only about creating your own API.
I'm not a native speaker, so maybe I'm just missing the right words to search for, but it would be awesome if you could give me some advice.
Assuming you are using Visual Studio, choose Add Service Reference, type "http://www.webservicex.net/globalweather.asmx" into the address box, and hit Go. It will auto-generate the endpoint for you to use.
Then the code is something like:
ServiceReference1.GlobalWeatherSoapClient client = new ServiceReference1.GlobalWeatherSoapClient("GlobalWeatherSoap");
string cities = client.GetCitiesByCountry("Hong Kong");
If you want to just use HTTP GET, you can do something like this:
var city = "Dublin";
var country = "Ireland";
WebRequest request = WebRequest.Create(
"http://www.webservicex.net/globalweather.asmx/GetWeather?CityName=" +
city +
"&CountryName=" + country);
request.Credentials = CredentialCache.DefaultCredentials;
WebResponse response = request.GetResponse();
Console.WriteLine(((HttpWebResponse)response).StatusDescription);
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
string responseFromServer = reader.ReadToEnd();
Console.WriteLine(responseFromServer);
reader.Close();
response.Close();
Console.ReadLine();
Please note that I have not HTML-decoded the response here; you can simply use HttpUtility.HtmlDecode for that.
Also, you will need to include the following using statements:
using System.IO;
using System.Net;
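The HTML decoding step mentioned above could be sketched like this. It uses `WebUtility.HtmlDecode` from System.Net, which does the same job as `HttpUtility.HtmlDecode` without a reference to System.Web. The class name `WeatherResponse` is hypothetical, and the element names used when calling it (such as `Temperature`) are assumptions about the payload, so check them against a real response.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

// Hypothetical helper class for illustration
public static class WeatherResponse
{
    // Turns entity-encoded markup such as "&lt;Temperature&gt;" back into "<Temperature>"
    public static string Decode(string encoded)
    {
        return WebUtility.HtmlDecode(encoded);
    }

    // Decodes the payload, parses it as XML and returns the text of one
    // direct child element of the root, or null if that element is absent
    public static string ReadElement(string encodedXml, string elementName)
    {
        XDocument document = XDocument.Parse(Decode(encodedXml));
        XElement element = document.Root.Element(elementName);
        return element == null ? null : element.Value.Trim();
    }
}
```

Once the text is decoded it is ordinary XML, so reading single values with `XDocument` tends to be less fragile than searching the string by hand.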
I have a Facebook account and I would like to extract my friends' photos and personal details such as "Date of birth", "Studied at" and so on. I am able to extract the address of the Facebook profile page for each of my friends' accounts, but I don't know how to programmatically open each friend's page and save the HTML content as a string so that I can extract their personal details and photos. Please help! Thanks in advance!
You have three options:
1- Using a WebClient object.
WebClient webClient = new WebClient();
webClient.Credentials = new System.Net.NetworkCredential("UserName", "Password", "Domain");
string pageHTML = webClient.DownloadString("http://url");
2- Using a WebRequest. This is the best solution because it gives you more control over your request.
WebRequest myWebRequest = WebRequest.Create("http://URL");
WebResponse myWebResponse = myWebRequest.GetResponse();
Stream ReceiveStream = myWebResponse.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
StreamReader readStream = new StreamReader( ReceiveStream, encode );
string strResponse=readStream.ReadToEnd();
StreamWriter oSw=new StreamWriter(strFilePath);
oSw.WriteLine(strResponse);
oSw.Close();
readStream.Close();
myWebResponse.Close();
3- Using a WebBrowser (I bet you don't wanna do that)
WebBrowser wb = new WebBrowser();
wb.Navigate("http://URL");
string pageHTML = "";
wb.DocumentCompleted += (sender, e) => pageHTML = wb.DocumentText;
Excuse me if I mistyped any code; I improvised it and I don't have a syntax checker to verify it. But I think it should be fine.
EDIT: For Facebook pages, you may consider using the Facebook Graph API:
http://developers.facebook.com/docs/reference/api/
Try this:
var html = new WebClient()
.DownloadString("the facebook account url goes here");
Also, once you have downloaded the HTML as a string I would highly recommend that you use the Html Agility Pack to parse it.
There are in general 2 things you can do here. The first thing you can do is called web scraping. That way you can download the source of the html with the following code:
var request = WebRequest.Create("http://example.com");
var response = request.GetResponse();
using (Stream responseStream = response.GetResponseStream())
{
StreamReader reader = new StreamReader(responseStream);
string stringResponse = reader.ReadToEnd();
}
stringResponse then contains the HTML source of http://example.com.
However, this is probably not what you want to do. Facebook has an SDK that you can use to download this kind of information. You can read about this on the following pages
http://developers.facebook.com/docs/reference/api/user/
If you want to use the Facebook API, then I think it's worth changing your question or asking a new one about it, since it's considerably more complicated and requires authorization and some additional coding. However, it's the best way, since your code is unlikely ever to break and it safeguards the privacy of the people you want to get information about.
For example, if you query me with the api, you get the following string:
{
"id": "1089655429",
"name": "Timo Willemsen",
"birthday": "08/29/1989",
"education": [
{
"school": {
"id": "115091211836927",
"name": "Stedelijk Gymnasium Arnhem"
},
"year": {
"id": "127668947248449",
"name": "2001"
},
"type": "High School"
}
]
}
You can see that I'm Timo Willemsen, 21 years old, and studied at Stedelijk Gymnasium Arnhem in 2001.
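Reading those values out of the JSON with Json.Net could look roughly like this. It is a sketch under assumptions: the class `ProfileJson` and its method `Describe` are invented for illustration, and the birthday is assumed to be always formatted as month/day/year as in the sample.

```csharp
using System;
using System.Globalization;
using Newtonsoft.Json.Linq;

// Hypothetical helper class for illustration
public static class ProfileJson
{
    // Builds a one-line summary from a profile shaped like the sample above
    public static string Describe(string json, DateTime today)
    {
        JObject profile = JObject.Parse(json);
        string name = (string)profile["name"];
        string school = (string)profile.SelectToken("education[0].school.name");
        string year = (string)profile.SelectToken("education[0].year.name");

        // The birthday in the sample is formatted as month/day/year
        DateTime birthday = DateTime.ParseExact(
            (string)profile["birthday"], "MM/dd/yyyy", CultureInfo.InvariantCulture);

        int age = today.Year - birthday.Year;
        if (today < birthday.AddYears(age)) age--; // birthday not reached yet this year

        return name + ", " + age + ", " + school + " (" + year + ")";
    }
}
```

Passing `today` in as a parameter, rather than reading `DateTime.Today` inside the method, keeps the age calculation repeatable.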
Use selenium 2.0 for C#. http://seleniumhq.org/download/
var driver = new FirefoxDriver();
driver.Navigate().GoToUrl("http://www.google.com");
String pageSource = driver.PageSource;
Given a URL, I'd like to be able to capture the title of the page that URL points to, as well as other info, e.g. a snippet of text from the first paragraph on the page, and maybe even an image from the page.
Digg.com does this nicely when you submit a url.
How could something like this be done in .Net c#?
You're looking for the HTML Agility Pack, which can parse malformed HTML documents.
You can use its HtmlWeb class to download a webpage over HTTP.
You can also download text over HTTP using .Net's WebClient class.
However, it won't help you parse the HTML.
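Pulling a title and a description out of a page with the HTML Agility Pack might look like the sketch below. The class `PagePreview` and its methods are hypothetical names; the library calls used are `HtmlDocument.LoadHtml`, `SelectSingleNode`, `GetAttributeValue` and `HtmlEntity.DeEntitize`.

```csharp
using HtmlAgilityPack;

// Hypothetical helper class for illustration
public static class PagePreview
{
    // Parses an HTML string; malformed markup is tolerated by the library
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Returns the text of the first <title> element, or null if there is none
    public static string GetTitle(HtmlDocument doc)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//title");
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    // Returns the content of <meta name="description">, or null if there is none
    public static string GetDescription(HtmlDocument doc)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
        return node == null ? null : node.GetAttributeValue("content", string.Empty);
    }
}
```

To work from a URL instead of a string, `new HtmlWeb().Load(url)` returns an `HtmlDocument` that can be passed to the same two methods.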
You could try something like this:
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
namespace WebGet
{
class progMain
{
static void Main(string[] args)
{
ASCIIEncoding asc = new ASCIIEncoding();
WebRequest wrq = WebRequest.Create("http://localhost");
using (WebResponse wrp = wrq.GetResponse())
using (MemoryStream buffer = new MemoryStream())
{
// Copy the whole body; a single Read call may return only part of it
wrp.GetResponseStream().CopyTo(buffer);
Console.WriteLine(asc.GetString(buffer.ToArray()));
}
}
}
}
Once you have the buffer, you can process it looking for paragraph or image HTML tags to extract portions of the returned data.
You can extract the title of a page with a function like the following. You would need to modify the regular expression to look for, say, the first paragraph of text but since each page is different, that may prove difficult. You could look for a meta description tag and take the value from that, however.
public static string GetWebPageTitle(string url)
{
// Create a request to the url
HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;
// If the request wasn't an HTTP request (like a file), ignore it
if (request == null) return null;
// Use the user's credentials
request.UseDefaultCredentials = true;
// Obtain a response from the server, if there was an error, return nothing
HttpWebResponse response = null;
try { response = request.GetResponse() as HttpWebResponse; }
catch (WebException) { return null; }
// Regular expression for an HTML title
string regex = @"(?<=<title.*>)([\s\S]*)(?=</title>)";
// If the correct HTML header exists for HTML text, continue
if (new List<string>(response.Headers.AllKeys).Contains("Content-Type"))
if (response.Headers["Content-Type"].StartsWith("text/html"))
{
// Download the page
WebClient web = new WebClient();
web.UseDefaultCredentials = true;
string page = web.DownloadString(url);
// Extract the title
Regex ex = new Regex(regex, RegexOptions.IgnoreCase);
return ex.Match(page).Value.Trim();
}
// Not a valid HTML page
return null;
}
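The function above contacts the server twice, once through `GetResponse` and once through `DownloadString`. If the HTML has already been downloaded, the title can be extracted from the string alone. The following is a sketch with a hypothetical class `TitleExtractor`; it uses a simpler pattern with a capturing group instead of the lookbehind above.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration
public static class TitleExtractor
{
    // Lazy match so that only the first <title>...</title> pair is captured;
    // Singleline lets the title text span several lines
    private static readonly Regex TitlePattern = new Regex(
        @"<title[^>]*>(.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the decoded, trimmed title, or null when the HTML has no title element
    public static string FromHtml(string html)
    {
        Match match = TitlePattern.Match(html);
        if (!match.Success) return null;
        return WebUtility.HtmlDecode(match.Groups[1].Value).Trim();
    }
}
```

A regular expression is fine for a single well-known tag like this one; for paragraphs or images a real HTML parser is the safer choice.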
You could use Selenium RC (open source, www.seleniumhq.org) to parse data etc. from the pages. It is a web test automation tool with a C# .NET library.
Selenium has a full API for reading specific items out of an HTML page.