Html Agility Pack, SelectSingleNode - c#

This code works:
WebClient client = new WebClient();
client.Encoding = Encoding.UTF8;
string html = client.DownloadString("http://www.imdb.com/chart/moviemeter?ref_=nv_mv_mpm_8");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
MessageBox.Show(doc.DocumentNode.SelectSingleNode("//*[@id='main']/div/span/div/div/div[3]/table/tbody/tr[1]/td[2]/a").InnerText);
The HTML it reads is:
Split
The MessageBox shows the text "Split". But look at this HTML:
<div class="summary_text" itemprop="description">
Three girls are kidnapped by a man with a diagnosed 23 distinct personalities, and must try and escape before the apparent emergence of a frightful new 24th.
</div>
I want the MessageBox to show the text that starts with "Three girls are kidn...", so I wrote this code:
WebClient client2 = new WebClient();
client2.Encoding = Encoding.UTF8;
HtmlAgilityPack.HtmlDocument doc2 = new HtmlAgilityPack.HtmlDocument();
doc2.LoadHtml(client2.DownloadString("http://www.imdb.com/title/tt4972582/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084082&pf_rd_r=1QW31NGD6JSE46F79CKQ&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=moviemeter&ref_=chtmvm_tt_1"));
MessageBox.Show(doc2.DocumentNode.SelectSingleNode("//*[@id='title-overview-widget']/div[3]/div[1]/div[1]").InnerText);
When I run this code, an unhandled exception of type "System.NullReferenceException" occurs.
The XPaths are correct, I've checked them a hundred times, so what should I do?

Can you try this?
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://www.imdb.com/title/tt4972582/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084082&pf_rd_r=1QW31NGD6JSE46F79CKQ&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=moviemeter&ref_=chtmvm_tt_1");
// FirstOrDefault returns null when no div matches, so use ?. to avoid a NullReferenceException.
var desNodeText = doc.DocumentNode.Descendants("div").FirstOrDefault(o => o.GetAttributeValue("class", "") == "summary_text")?.InnerText;
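A NullReferenceException like the one in the question usually means the lookup matched nothing: SelectSingleNode returns null when the XPath has no match in the HTML that was actually downloaded, and reading InnerText on null then throws. Here is a minimal sketch of a guarded lookup, assuming the HtmlAgilityPack NuGet package is referenced. The inline HTML string and the helper name GetSummaryText are made up for illustration and are not part of the original code.

```csharp
using System;
using HtmlAgilityPack;

static class SummaryExample
{
    // Hypothetical helper: returns the trimmed summary text, or null when
    // the document contains no <div class="summary_text">.
    public static string GetSummaryText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='summary_text']");
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    static void Main()
    {
        // Stand-in markup; the real page would be downloaded first.
        string html = "<div class=\"summary_text\" itemprop=\"description\"> Three girls are kidnapped. </div>";

        Console.WriteLine(GetSummaryText(html) ?? "(summary not found)");
        Console.WriteLine(GetSummaryText("<p>no summary here</p>") ?? "(summary not found)");
    }
}
```

Matching on the class attribute rather than on a long positional path tends to be less fragile when the page layout changes.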

Related

HTMLAgilityPack not getting loading html of the webpage

I am trying to crawl https://www.adecco.ch/en-us/job-results, but I am not able to load the HTML from this page. Nothing usable ends up in the HTML document.
var html = await httpClient.GetStringAsync(url);
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
var divs = htmlDocument.DocumentNode.Descendants().ToList();
As mentioned in my comment, the site sends its content back compressed, and it was not being decompressed before you tried loading it, so you were basically loading gibberish. This code should work fine:
var handler = new HttpClientHandler();
// this is the important bit
handler.AutomaticDecompression = System.Net.DecompressionMethods.All;
var httpClient = new HttpClient(handler);
var html = await httpClient.GetStringAsync(url);
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
var divs = htmlDocument.DocumentNode.Descendants().ToList();

Get webpage source code with alt key code symbols using asp.net c#

I'm trying to get a webpage's source code using Html Agility Pack. This is my code to get the source code and put it into a multiline textbox:
var url = "http://www.example.com";
var web = new HtmlWeb();
var doc = web.Load(url);
sourcecodetxt.Text = doc.ToString();
The code works fine, but if my webpage contains "Alt code" symbols, each symbol is replaced with garbled characters (for example, ★ does not come through as ★).
My question is how to get the original symbol. Sorry for my bad English. Thanks in advance.
Try using WebClient and HtmlDocument's Load() method so you can specify the encoding:
WebClient client = new WebClient();
HtmlDocument doc = new HtmlDocument();
doc.Load(client.OpenRead("http://www.example.com"), Encoding.UTF8);

Empty Xml Document response from Api

I need information from the unofficial IMDb API "omdbapi". The link I am sending is correct, but when I get the response the document is null. I am using Html Agility Pack. What am I doing wrong?
Here is the direct link: http://www.omdbapi.com/?i=tt2231253&plot=short&r=xml
string url = "http://www.omdbapi.com/?i=" + ImdbID + "&plot=short&r=xml";
HtmlWeb source = new HtmlWeb();
HtmlDocument document = source.Load(url);
The response you expect is an XML document, not HTML. Try this instead:
string url = "http://www.omdbapi.com/?i=tt2231253&plot=short&r=xml";
WebClient wc = new WebClient();
XDocument doc = XDocument.Parse(wc.DownloadString(url));
Console.WriteLine(doc);

Convert "iso-8859-1" to "utf-8" with HTML Agility Pack and xpath

I'm trying to get a piece of a web page, but I have a problem with special characters. How can I convert the data so that it is read correctly? The website uses ISO 8859-1 and I must use UTF-8.
string url = "http://www.ta-meteo.fr/troyes.htm";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
HtmlNode bulletinMatin = doc.DocumentNode.SelectSingleNode("//*[@id='blockdetday0']/div[1]/p[1]");
MessageBox.Show(bulletinMatin.InnerText);
Thanks.
I solved the problem:
string url = "http://www.ta-meteo.fr/troyes.htm";
Encoding iso = Encoding.GetEncoding("iso-8859-1");
HtmlWeb web = new HtmlWeb()
{
    AutoDetectEncoding = false,
    OverrideEncoding = iso,
};
HtmlDocument doc = web.Load(url);
HtmlNode bulletinMatin = doc.DocumentNode.SelectSingleNode("//*[@id='blockdetday0']/div[1]/p[1]");
MessageBox.Show(bulletinMatin.InnerText);
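The fix above works because the bytes have to be decoded with the encoding they were written in. Once decoded, the text is an ordinary .NET string and can be written out as UTF-8 wherever needed. Here is a small offline sketch of that idea, assuming the HtmlAgilityPack NuGet package. The in-memory byte array and the sample sentence stand in for the server response and are not taken from the real page.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

static class EncodingExample
{
    static void Main()
    {
        Encoding iso = Encoding.GetEncoding("iso-8859-1");

        // Stand-in for the bytes a server would send: a fragment encoded as ISO 8859-1.
        byte[] bytes = iso.GetBytes("<p id=\"msg\">Météo à Troyes</p>");

        var doc = new HtmlDocument();
        using (var stream = new MemoryStream(bytes))
        {
            // Decode the stream with the encoding the bytes were written in.
            doc.Load(stream, iso);
        }

        // InnerText is now a regular .NET string, independent of the source encoding.
        string text = doc.DocumentNode.SelectSingleNode("//p[@id='msg']").InnerText;

        // Re-encode as UTF-8 when the text has to leave the program (file, response, ...).
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(text);
        Console.WriteLine(utf8.Length + " bytes as UTF-8, " + bytes.Length + " bytes as ISO 8859-1 markup");
    }
}
```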

HTML Agility Pack 2

I am trying to scrape this website.
The XPath expression below works fine with the FirePath extension for Firebug:
html/body/table/tbody/tr[3]/td
But with the same XPath expression, the code below gives me null:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("http://www.edb.gov.hk/templates/sch_list_print.asp?district=cw");
var collection= doc.DocumentNode.SelectNodes("html/body/table/tbody/tr[3]/td");
Can anyone help me with this? Thanks.
This works. Looking at the source of the page you are trying to scrape, there is no tbody inside the table.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("http://www.edb.gov.hk/templates/sch_list_print.asp?district=cw");
var collection= doc.DocumentNode.SelectNodes("html/body/table/tr[3]/td");
Change your XPath to:
html/body/table/tr[3]/td
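Browser tools show the parsed DOM, where a tbody element is inserted even if the page source has none, while Html Agility Pack works on the markup as it was sent. Here is a minimal sketch of the difference, assuming the HtmlAgilityPack NuGet package. The inline table is made-up stand-in markup, not the real page.

```csharp
using System;
using HtmlAgilityPack;

static class TbodyExample
{
    static void Main()
    {
        // Stand-in markup with no <tbody>, like the raw source described above.
        string html = "<html><body><table>" +
                      "<tr><td>a</td></tr><tr><td>b</td></tr><tr><td>c</td></tr>" +
                      "</table></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Browser-style path: there is no tbody in the parsed source,
        // and SelectNodes returns null when nothing matches.
        HtmlNodeCollection withTbody = doc.DocumentNode.SelectNodes("/html/body/table/tbody/tr[3]/td");
        Console.WriteLine(withTbody == null ? "with tbody: no match" : "with tbody: " + withTbody.Count);

        // Path that follows the raw source.
        HtmlNodeCollection withoutTbody = doc.DocumentNode.SelectNodes("/html/body/table/tr[3]/td");
        Console.WriteLine(withoutTbody == null ? "without tbody: no match" : "without tbody: " + withoutTbody[0].InnerText);
    }
}
```

Checking the result for null before iterating avoids the same NullReferenceException that SelectSingleNode can cause.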
