HTML Agility pack help getting value in table

I'm trying to get values from a page, the code I'm using is this :
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("url");
var nodi = doc.DocumentNode.SelectNodes("//*[@id='hotellist_inner']/div");
if (nodi != null){
foreach (var item in nodi){
var prova = item.InnerText;
Console.WriteLine(prova);
}
}
It all works fine, as I can get most of the information I need, except for the price.
The url is This.
The XPath of the element I'm trying to access is
//*[@id="hotellist_inner"]/div[1]/div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong/b
But it looks like my code ignores anything in this section of the HTML.
Can anyone tell me what I am doing wrong?
Thanks in advance!

Related

C# Get data from website and show it in textbox

Hello, I am pretty new to C#. I want to make a small program that fetches data from a given page.
Here is a fragment of the website:
<h3 class="filmInfo__header cloneToCast cloneToOtherInfo" data-type="directing-header">reżyseria</h3>
<div class="filmInfo__info cloneToCast cloneToOtherInfo" data-type="directing-info"> <span itemprop="url" content="/person/Rupert+Sanders-1121101"></span> <span itemprop="name">Rupert Sanders</span> </div>
I want to find the element with data-type="directing-info" and get the result from title="Rupert Sanders".
Can somebody help me?
My very simple code:
private void button1_Click(object sender, EventArgs e)
{
var url = "https://www.filmweb.pl/film/Kr%C3%B3lewna+%C5%9Anie%C5%BCka+i+%C5%81owca-2012-600541";
var httpClient = new HttpClient();
var html = httpClient.GetStringAsync(url);
textBox1.Text = (html.Result);
}

Neither C# nor .NET offers native HTML parsing functionality. However, there are a handful of libraries that provide it. For example, you can use Html Agility Pack.
First, you need to install it into your project. If you use Visual Studio, you can easily install it with the NuGet Package Manager.
After that, you can use it like this with your input HTML:
private void button1_Click(object sender, EventArgs e)
{
var url = "https://www.filmweb.pl/film/Kr%C3%B3lewna+%C5%9Anie%C5%BCka+i+%C5%81owca-2012-600541";
var httpClient = new HttpClient();
var html = httpClient.GetStringAsync(url);
// Create a HtmlDocument and load your HTML into it.
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html.Result);
// Find your desired node inside it.
HtmlNode directingInfoNode = htmlDocument.DocumentNode.SelectSingleNode("//div[@data-type='directing-info']/a");
// Get the title attribute of that node.
HtmlAttribute titleAttribute = directingInfoNode.Attributes["title"];
textBox1.Text = (titleAttribute.Value);
}
Of course, you will need to add the necessary using directive at the top of your file:
using HtmlAgilityPack;
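SelectSingleNode returns null when nothing matches, so the code above throws if the page layout differs from what the XPath expects. Below is a minimal, null-safe sketch of the same lookup. It runs against an inline HTML string rather than the live site, and the sample markup is an assumption: it includes an <a> element with a title attribute, as the XPath in this answer implies, although the fragment quoted in the question does not show one.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical static markup standing in for the downloaded page.
const string html = @"
<div data-type=""directing-info"">
  <a href=""/person/Rupert+Sanders-1121101"" title=""Rupert Sanders"">
    <span itemprop=""name"">Rupert Sanders</span>
  </a>
</div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectSingleNode returns null when nothing matches, so guard the dereference.
HtmlNode? link = doc.DocumentNode.SelectSingleNode("//div[@data-type='directing-info']/a");

// GetAttributeValue returns the supplied default when the attribute is absent.
string director = link?.GetAttributeValue("title", "") ?? "";

Console.WriteLine(director.Length > 0 ? director : "Director not found");
```

In the button handler, the same guard would let you show a message in textBox1 instead of crashing with a NullReferenceException.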

Unable to get html element by using X-Path in HtmlAgilityPack C#

I am trying to get an element using its full XPath, but the result is null. This type of XPath works for me on other sites; it fails on only about 2% of sites. I also tried the XPath copied from Chrome, but whenever my XPath does not work, the Chrome XPath does not work either.
public static void Main()
{
string url = "http://www.ndrf.gov.in/tender";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(url);
var nodetest1 = htmlDoc.DocumentNode.SelectSingleNode("/html[1]/body[1]/section[2]/div[1]/div[1]/div[1]/div[1]/div[2]/table[1]"); // I want this type // not working
//var nodetest2 = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"content\"]/div/div[1]/div[2]/table"); // from Google Chrome // not working
//var nodetest3 = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"content\"]"); // by ID, but I don't want this type // working
Console.WriteLine(nodetest1.InnerText); // fails
//Console.WriteLine(nodetest2.InnerText); // fails
//Console.WriteLine(nodetest3.InnerText); // works, but I don't want this type
}

The answer that @QHarr suggested works perfectly. The reason you get null with a correct XPath is that a JavaScript file in the header of the site adds a wrapper div around the table. Since HtmlAgilityPack does not load or execute JavaScript, the XPath returns null.
What you observe after that script runs is:
<div class="view-content">
<div class="guide-text">
...
</div>
<div class="scroll-table1">
<!-- Your table is here -->
</div>
</div>
but what you actually get without that script is:
<div class="view-content">
<!-- Your table is here -->
</div>
Thus your XPath should be:
var nodetest1 = htmlDoc.DocumentNode.SelectSingleNode("/html[1]/body[1]/section[2]/div[1]/div[1]/div[1]/div[1]/table[1]");
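The effect of the wrapper div can be reproduced offline. The sketch below uses two simplified, hypothetical stand-ins for the page (not the real markup of the site): one as the server sends it and one as it looks after the script has added the wrapper. A positional XPath copied from the browser matches only the rendered version, while a relative XPath that skips the intermediate levels matches both.

```csharp
using System;
using HtmlAgilityPack;

// Simplified, hypothetical stand-ins for the two versions of the markup.
const string staticHtml =
    "<html><body><div class='view-content'>" +
    "<table><tr><td>Tender 1</td></tr></table>" +
    "</div></body></html>";
const string renderedHtml =
    "<html><body><div class='view-content'><div class='scroll-table1'>" +
    "<table><tr><td>Tender 1</td></tr></table>" +
    "</div></div></body></html>";

// Positional path as a browser would report it for the rendered DOM.
const string browserXPath = "/html[1]/body[1]/div[1]/div[1]/table[1]";
// Relative path that does not depend on how many wrappers sit in between.
const string relativeXPath = "//div[@class='view-content']//table";

// Parses the HTML and returns the first match, or null when nothing matches.
static HtmlNode? Find(string html, string xpath)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    return doc.DocumentNode.SelectSingleNode(xpath);
}

Console.WriteLine("browser XPath on rendered HTML:  " + (Find(renderedHtml, browserXPath) != null));
Console.WriteLine("browser XPath on static HTML:    " + (Find(staticHtml, browserXPath) != null));
Console.WriteLine("relative XPath on static HTML:   " + (Find(staticHtml, relativeXPath) != null));
Console.WriteLine("relative XPath on rendered HTML: " + (Find(renderedHtml, relativeXPath) != null));
```

Anchoring on a class or id and using // for the remaining levels tends to survive this kind of script-added wrapper better than a full positional path.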

Your XPath, when used in the browser, selects the entire table. You can shorten it and use it as follows (fiddle):
using System;
using HtmlAgilityPack;
public class Program
{
public static void Main()
{
string url = "http://www.ndrf.gov.in/tender";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(url);
var nodetest1 = htmlDoc.DocumentNode.SelectSingleNode("//table");
Console.WriteLine(nodetest1.InnerText);
}
}

Use Fizzler.Systems.HtmlAgilityPack.
Details here: https://www.nuget.org/packages/Fizzler.Systems.HtmlAgilityPack/
This library adds extension methods called QuerySelector and QuerySelectorAll, which take a CSS selector rather than an XPath.
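As a rough illustration of those two extension methods, here is a small sketch that queries an inline HTML string. The markup and the selectors are made up for the example; only QuerySelector and QuerySelectorAll come from the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

// Hypothetical markup; the real page will differ.
const string html =
    "<html><body><div class='view-content'>" +
    "<table><tr><td>Tender 1</td><td>Tender 2</td></tr></table>" +
    "</div></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// QuerySelector returns the first match, or null when nothing matches.
HtmlNode? table = doc.DocumentNode.QuerySelector("div.view-content table");

// QuerySelectorAll returns every match as a sequence of nodes.
var cells = doc.DocumentNode
    .QuerySelectorAll("div.view-content td")
    .Select(td => td.InnerText.Trim())
    .ToList();

Console.WriteLine(table != null ? "table found" : "table not found");
Console.WriteLine(string.Join(", ", cells));
```

Because the descendant combinator (the space in "div.view-content table") matches at any depth, the selector keeps working whether or not a wrapper div sits between the container and the table.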

Ali Bordbar caught it perfectly. The page adds a wrapper div when I navigate to the URL in a WebBrowser control, where all the JavaScript files are loaded,
but when I load the URL using HtmlWeb, none of the JavaScript files are loaded.
HtmlWeb retrieves the static HTML response that the server sends and does not execute any JavaScript, whereas a WebBrowser would.
So an XPath built from the WebBrowser control's DOM does not match the DOM that HtmlWeb produces.
The code below works for this situation:
HtmlWeb web = new HtmlWeb();
web.AutoDetectEncoding = true;
HtmlAgilityPack.HtmlDocument theDoc1 = web.Load("http://www.ndrf.gov.in/tender");
var HtmlDoc = new HtmlAgilityPack.HtmlDocument();
var bodytag = theDoc1.DocumentNode.SelectSingleNode("//html");
HtmlDoc.LoadHtml(bodytag.OuterHtml);
var xpathHtmldata = HtmlDoc.DocumentNode.SelectSingleNode(savexpath); // savexpath is my first XPath, built from the WebBrowser control's DOM; it works for most URLs
if (xpathHtmldata == null)
{
// take the last tag name from the first XPath
string mainele = savexpath.Substring(savexpath.LastIndexOf("/") + 1);
if (mainele.Contains("[")) { mainele = mainele.Remove(mainele.IndexOf("[")); }
// collect all elements whose tag name is stored in mainele
var taglist = HtmlDoc.DocumentNode.SelectNodes("//" + mainele);
foreach (var ele in taglist) // check the elements one by one
{
string htmltext1 = ele.InnerText;
htmltext1 = Regex.Replace(htmltext1, @"\s", "");
htmltext1 = htmltext1.Replace("&amp;", "&").Trim();
htmltext1 = htmltext1.Replace("&nbsp;", "").Trim();
string htmltext2 = saveInnerText; // the text matched by my previous XPath in the WebBrowser control's DOM
htmltext2 = Regex.Replace(htmltext2, @"\s", "");
if (htmltext1 == htmltext2) // if the text is equal, this element gives my new XPath
{
savexpath = ele.XPath;
break;
}
}
}

C# Downloading Instagram Profile As HTML

I have been trying to download a public Instagram profile to fetch stats such as followers and bio. I have been doing this in a C# console application, downloading the HTML using HTML Agility Pack.
Code:
string url = @"https://www.instagram.com/" + Console.ReadLine() + @"/?hl=en";
Console.WriteLine();
HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load(url);
document.Save(path1);
When I save it, though, all I get is a bunch of scripts and a blank screen.
I was wondering how to save the HTML once all the scripts have run and built the content.

When you retrieve content using a web request, it returns an HTML document, which is then rendered by the browser to display the content.
Right now, you're saving the HTML document given to you by the server. Instead, you need to render it before extracting the details. One way to do this is with a web browser control. If you set its URL to the Instagram URL and let the rendering engine handle it, you can get the rendered HTML output once the control fires its load event.
From there, you can deserialize it as an XmlDocument and identify exactly which details you need to retrieve from the rendered output.

public MainWindow()
{
InitializeComponent();
// Subscribe before navigating so the LoadCompleted event cannot be missed.
WB_1.LoadCompleted += wb_LoadCompleted;
WB_1.Navigate(@"https://www.instagram.com/" + Console.ReadLine() + @"/?hl=en");
}
void wb_LoadCompleted(object sender, NavigationEventArgs e)
{
dynamic doc = WB_1.Document;
string htmlText = doc.documentElement.InnerHtml;
}

ANSWER
Thanks for the suggestions on how to download the HTML! I managed to return some instagram information in the end. Here is the code:
//(This was done using HTML Agility Pack)
string url = @"https://www.instagram.com/" + Console.ReadLine() + @"/?hl=en";
HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load(url);
var metas = document.DocumentNode.Descendants("meta");
var followers = metas.FirstOrDefault(_ => _.HasProperty("name", "description"));
if (followers == null) { Console.WriteLine("Sorry, Can't Find Profile :("); return; }
var content = followers.Attributes["content"].Value.StopAt('-');
Console.WriteLine(content);
And here are the HasProperty() and StopAt() extension methods:
public static bool HasProperty(this HtmlNode node, string property, params string[] valueArray)
{
var propertyValue = node.GetAttributeValue(property, "");
var propertyValues = propertyValue.Split(' ');
return valueArray.All(c => propertyValues.Contains(c));
}
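public static string StopAt(this string input, char stopAt)
{
int x = input.IndexOf(stopAt);
// IndexOf returns -1 when the character is absent; return the whole string in that case.
return x < 0 ? input : input.Substring(0, x);
}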
NOTE:
However, this is still not the answer I am looking for. I still have a wreck of HTML that is not structured the same as the HTML I receive when I look at the page in Google Chrome. By searching through the content-less HTML, I managed to salvage a meta tag that contains the content. This is okay for now, but if I continue to find HTML content this way, the structure may not stay the same :(

How to get the Page Id in my Facebook Application page

My application is hosted on Facebook as a tab, and I want to get the page ID when my application is added, so that I can store it in my logic.
How can I get the page ID? I know it is stored in the URL, but when I tried to get it from the page as a server variable, I did not get it, even though my application is configured as an iFrame. Yet this is the standard way to get the parent URL.
C#:
string t = Request.ServerVariables["HTTP_REFERER"];
//doesn't get FB page url even if your app is configured as iframe ?!! #csharpsdk #facebook devs
Any help?
Thanks a lot.

Here is how I do it:
if (FacebookWebContext.Current.SignedRequest != null)
{
dynamic data = FacebookWebContext.Current.SignedRequest.Data;
if (data.page != null)
{
var pageId = (String)data.page.id;
var isUserAdmin = (Boolean)data.page.admin;
var userLikesPage = (Boolean)data.page.liked;
}
else
{
// not on a page
}
}

The Page ID is not stored in the URL; it is posted to your page within the signed_request form parameter. See this Facebook developer blog post for more details.
You can use the FacebookSignedRequest.Parse method within the Facebook C# SDK to parse the signed request (using your app secret). Once you have done this you can extract the Page ID from the Page JSON object as follows:
string signedRequest = Request.Form["signed_request"];
var DecodedSignedRequest = FacebookSignedRequest.Parse(FacebookContext.Current.AppSecret, signedRequest);
dynamic SignedRequestData = DecodedSignedRequest.Data;
var RawRequestData = (IDictionary<string, object>)SignedRequestData;
if (RawRequestData.ContainsKey("page") == true)
{
Facebook.JsonObject RawPageData = (Facebook.JsonObject)RawRequestData["page"];
if (RawPageData.ContainsKey("id") == true)
currentFacebookPageID = (string)RawPageData["id"];
}
Hope this helps.
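For readers who want to see what parsing a signed request involves, here is a minimal sketch that uses only the .NET base class library instead of the Facebook C# SDK. It assumes the documented signed_request layout: a base64url-encoded HMAC-SHA256 signature, a dot, and a base64url-encoded JSON payload, with the signature computed over the encoded payload using the app secret as the key. The helper names, the secret, and the sample payload are all hypothetical, and this is not how the SDK itself is implemented.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Decodes base64url (URL-safe alphabet, padding stripped) into bytes.
static byte[] FromBase64Url(string s)
{
    string b64 = s.Replace('-', '+').Replace('_', '/');
    b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');
    return Convert.FromBase64String(b64);
}

// Encodes bytes as base64url; used here only to build the sample request.
static string ToBase64Url(byte[] bytes) =>
    Convert.ToBase64String(bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_');

// Computes the HMAC-SHA256 of the encoded payload, keyed with the app secret.
static byte[] ComputeSignature(string encodedPayload, string appSecret)
{
    using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(appSecret));
    return hmac.ComputeHash(Encoding.UTF8.GetBytes(encodedPayload));
}

// Returns the page id, or null when the signature is wrong or no page is present.
static string? GetPageId(string signedRequest, string appSecret)
{
    string[] parts = signedRequest.Split('.');
    if (parts.Length != 2) return null;

    byte[] expected = ComputeSignature(parts[1], appSecret);
    if (!CryptographicOperations.FixedTimeEquals(expected, FromBase64Url(parts[0])))
        return null;

    string json = Encoding.UTF8.GetString(FromBase64Url(parts[1]));
    using JsonDocument doc = JsonDocument.Parse(json);
    if (doc.RootElement.TryGetProperty("page", out JsonElement page) &&
        page.TryGetProperty("id", out JsonElement id))
    {
        return id.ValueKind == JsonValueKind.String ? id.GetString() : id.GetRawText();
    }
    return null;
}

// Build a sample request with a made-up secret and payload, then parse it.
const string appSecret = "hypothetical-app-secret";
string payload = ToBase64Url(Encoding.UTF8.GetBytes(
    "{\"algorithm\":\"HMAC-SHA256\",\"page\":{\"id\":\"12345\",\"liked\":true,\"admin\":false}}"));
string signedRequest = ToBase64Url(ComputeSignature(payload, appSecret)) + "." + payload;

Console.WriteLine(GetPageId(signedRequest, appSecret) ?? "(no page id)");
Console.WriteLine(GetPageId(signedRequest, "wrong-secret") ?? "(rejected)");
```

In a real tab application the signedRequest string would come from Request.Form["signed_request"], as in the answer above, and using the SDK's own parser is usually the simpler choice.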

Here's the same solution as Andy Sinclair's, in VB, that worked for me:
Dim pageId as Int64 = 0
Dim signed_request As String = Request.Form("signed_request")
Dim req = FacebookSignedRequest.Parse(AppSettings("FacebookSecret"), signed_request)
Dim data As IDictionary(Of String, Object) = req.Data
If data.ContainsKey("page") Then
Dim RawPageData As Facebook.JsonObject = data("page")
If RawPageData.ContainsKey("id") Then
pageId = RawPageData("id")
End If
End If

Using the Facebook C# SDK for Fan Page Custom iFrame Tabs

I'm trying to create a simple iFrame custom tab on my fan page. I'm using the Facebook C# SDK and I need to read the signed_request value that Facebook passes to my iFrame page.
I can print the encoded signed_request value, so I know it's showing up, but when I try to decode it with the Facebook C# SDK I get an error. I'm using .NET 4.0 and dynamics.
Here's my code:
signedRequestString contains the Request value with the signed_param passed from Facebook.
var result = FacebookSignedRequest.Parse(FacebookContext.Current.AppSecret, signedRequestString);
dynamic signedRequestJson = result.Data;
dynamic page = signedRequestJson.page;
And the error I receive:
Microsoft.CSharp.RuntimeBinder.RuntimeBinderException: Cannot perform runtime binding on a null reference at CallSite.Target(Closure , CallSite , Object ) at System.Dynamic.UpdateDelegates.UpdateAndExecute1[T0,TRet](CallSite site, T0 arg0) at DecodeSignedRequest(String signedRequestString)
Any thoughts on why I would be getting a null? I set up my web.config properly (I think), but I'm guessing I'm missing an initialization step or something.

It's easier to use FacebookWebContext.Current.SignedRequest.
You can then access the information about the page:
if (FacebookWebContext.Current.SignedRequest != null)
{
dynamic data = FacebookWebContext.Current.SignedRequest.Data;
if (data.page != null)
{
var pageId = (String)data.page.id;
var isUserAdmin = (Boolean)data.page.admin;
var userLikesPage = (Boolean)data.page.liked;
}
else
{
// not on a page
}
}

You need to cast your signedRequestJson object to an IDictionary key/value pair before you can grab the page data.
You can do this as follows:
dynamic signedRequestJson = result.Data;
var RawRequestData = (IDictionary<string, object>)signedRequestJson;
You can then access the page data using the JSON keys (assuming you are referencing the Newtonsoft.Json.dll library):
Facebook.JsonObject RawPageData = (Facebook.JsonObject)RawRequestData["page"];
currentFacebookPageID = (string)RawPageData["id"];
Hope this helps.

I am using this. I hope it works for you:
Facebook.FacebookConfigurationSection s = new FacebookConfigurationSection();
s.AppId = "ApplicationID";
s.AppSecret = "ApplicationSecret";
FacebookWebContext wc = new FacebookWebContext(s);
dynamic da = wc.SignedRequest.Data;
dynamic page = da.page;
string pageid = page.id;
bool isLiked = page.liked;
bool isAdmin = page.admin;
