Empty Xml Document response from Api - c#

I need information from the unofficial IMDb API, "omdbapi". The link I am sending is correct, but when I get the response the document is null. I am using HtmlAgilityPack. What am I doing wrong?
Here is the direct link: http://www.omdbapi.com/?i=tt2231253&plot=short&r=xml
string url = "http://www.omdbapi.com/?i=" + ImdbID + "&plot=short&r=xml";
HtmlWeb source = new HtmlWeb();
HtmlDocument document = source.Load(url);

The response is an XML document, not HTML, so parse it as XML. Try this instead:
string url = "http://www.omdbapi.com/?i=tt2231253&plot=short&r=xml";
WebClient wc = new WebClient();
XDocument doc = XDocument.Parse(wc.DownloadString(url));
Console.WriteLine(doc);
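Once the response is parsed as XML, individual fields can be read with LINQ to XML. This is a minimal sketch, assuming the response has a `<root>` element containing a `<movie>` element whose fields are attributes. The sample string is a hypothetical stand-in for the downloaded text, and `ReadMovieAttribute` is an illustrative helper name, not part of any library.

```csharp
using System;
using System.Xml.Linq;

static class OmdbXmlSketch
{
    // Returns the value of one attribute on the first <movie> element,
    // or null if the element or the attribute is missing.
    public static string ReadMovieAttribute(string xml, string attributeName)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement movie = doc.Root?.Element("movie");
        return movie?.Attribute(attributeName)?.Value;
    }

    static void Main()
    {
        // Hypothetical sample; in real code this would be wc.DownloadString(url).
        string sample = "<root response=\"True\"><movie title=\"Example\" plot=\"A short plot.\" /></root>";
        Console.WriteLine(ReadMovieAttribute(sample, "title"));
    }
}
```

Returning null for a missing field keeps the caller in control of what to do when the assumed layout does not match the real response.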

Related

Html Agility Pack, SelectSingleNode

This code works
WebClient client = new WebClient();
client.Encoding = Encoding.UTF8;
html = client.DownloadString("http://www.imdb.com/chart/moviemeter?ref_=nv_mv_mpm_8");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
MessageBox.Show(doc.DocumentNode.SelectSingleNode("//*[@id='main']/div/span/div/div/div[3]/table/tbody/tr[1]/td[2]/a").InnerText);
Html codes here:
Split
The MessageBox shows the text "Split". But look at this HTML:
<div class="summary_text" itemprop="description">
Three girls are kidnapped by a man with a diagnosed 23 distinct personalities, and must try and escape before the apparent emergence of a frightful new 24th.
</div>
I want the MessageBox to show the text that starts with "Three girls are kidn...", so I wrote this code:
WebClient client2 = new WebClient();
client2.Encoding = Encoding.UTF8;
HtmlAgilityPack.HtmlDocument doc2 = new HtmlAgilityPack.HtmlDocument();
doc2.LoadHtml(client2.DownloadString("http://www.imdb.com/title/tt4972582/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084082&pf_rd_r=1QW31NGD6JSE46F79CKQ&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=moviemeter&ref_=chtmvm_tt_1"));
MessageBox.Show(doc2.DocumentNode.SelectSingleNode("//*[@id='title - overview - widget']/div[3]/div[1]/div[1]").InnerText);
When I run this code, an unhandled exception of type "System.NullReferenceException" occurs.
The XPaths are correct, I've checked them a hundred times, so what should I do?
Can you try this?
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://www.imdb.com/title/tt4972582/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084082&pf_rd_r=1QW31NGD6JSE46F79CKQ&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=moviemeter&ref_=chtmvm_tt_1");
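// Requires "using System.Linq;". The ?. avoids a NullReferenceException when no matching div exists.
var desNodeText = doc.DocumentNode.Descendants("div").FirstOrDefault(o => o.GetAttributeValue("class", "") == "summary_text")?.InnerText;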
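SelectSingleNode returns null when nothing matches, so calling .InnerText directly on its result raises the NullReferenceException described in the question. An XPath copied from a browser can also fail against the raw download, because the browser's DOM may differ from the HTML the server sent. The following is a minimal sketch of a null-checked lookup by class name, assuming HtmlAgilityPack is referenced. The inline HTML stands in for the downloaded page, and `GetSummary` is a hypothetical helper name.

```csharp
using System;
using HtmlAgilityPack;

static class SummarySketch
{
    // Returns the trimmed text of the first div with class "summary_text",
    // or null when the page contains no such div.
    public static string GetSummary(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='summary_text']");
        if (node == null)
        {
            return null; // nothing matched: report it instead of dereferencing null
        }

        // DeEntitize turns entities such as &amp; back into plain characters.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    static void Main()
    {
        // Hypothetical fragment; in real code this would be the downloaded page.
        string html = "<div class=\"summary_text\" itemprop=\"description\"> Three girls are kidnapped. </div>";
        Console.WriteLine(GetSummary(html) ?? "summary not found");
    }
}
```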

What is the fastest way to get an HTML document node using XPath and the HtmlAgilityPack?

In my application I need to get the URL of the image of a blog post. To do this I'm using the HtmlAgilityPack.
This is the code I have so far:
static string GetBlogImageUrl(string postUrl)
{
string imageUrl = string.Empty;
using (WebClient client = new WebClient())
{
string htmlString = client.DownloadString(postUrl);
HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(htmlString);
string xPath = "/html/body/div[contains(@class, 'container')]/div[contains(@class, 'content_border')]/div[contains(@class, 'single-post')]/main[contains(@class, 'site-main')]/article/header/div[contains(@class, 'featured_image')]/img";
HtmlNode node = htmlDocument.DocumentNode.SelectSingleNode(xPath);
imageUrl = node.GetAttributeValue("src", string.Empty);
}
return imageUrl;
}
The problem is that this is too slow. When I ran some tests, I noticed that it takes about three seconds to extract the URL of the image from a given page, which is a problem when I'm loading a feed and trying to read several articles.
I tried using the absolute XPath of the element I want to load, but I didn't notice any improvement. Is there a faster way to achieve this?
Can you try this code and see if it's faster or not?
string Url = "http://blog.cedrotech.com/5-tendencias-mobile-que-sua-empresa-precisa-acompanhar/";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
var featureDiv = doc.DocumentNode.Descendants("div").FirstOrDefault(_ => _.Attributes.Contains("class") && _.Attributes["class"].Value.Contains("featured_image"));
var img = featureDiv.ChildNodes.First(_ => _.Name.Equals("img"));
var imgUrl = img.Attributes["src"].Value; // .Value gives the URL string rather than the HtmlAttribute object
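To see whether the query itself is the slow part, it can help to time the parse and lookup separately from the download. The sketch below, assuming HtmlAgilityPack is referenced, runs a short relative XPath against an inline page fragment and measures it with Stopwatch. The fragment and the `GetImageUrl` name are hypothetical. If this step turns out to be fast, the three seconds are more likely spent on the network request, which is an assumption to verify rather than a measured result.

```csharp
using System;
using System.Diagnostics;
using HtmlAgilityPack;

static class FeaturedImageSketch
{
    // Finds the first <img> inside a div whose class contains "featured_image".
    // Returns an empty string when no such image exists.
    public static string GetImageUrl(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode img = doc.DocumentNode.SelectSingleNode(
            "//div[contains(@class, 'featured_image')]//img");
        return img?.GetAttributeValue("src", string.Empty) ?? string.Empty;
    }

    static void Main()
    {
        // Hypothetical fragment standing in for client.DownloadString(postUrl).
        string html = "<html><body><div class='featured_image wide'><img src='/img/post.jpg'></div></body></html>";

        var timer = Stopwatch.StartNew();
        string url = GetImageUrl(html);
        timer.Stop();

        Console.WriteLine($"{url} (parse + query: {timer.ElapsedMilliseconds} ms)");
    }
}
```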

Convert "iso-8859-1" to "utf-8" with HTML Agility Pack and xpath

I'm trying to read part of a web page, but I have a problem with special characters. How can I convert the data so that it reads correctly? The website uses ISO 8859-1 and I must use UTF-8.
string url = "http://www.ta-meteo.fr/troyes.htm";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
HtmlNode bulletinMatin = doc.DocumentNode.SelectSingleNode("//*[@id='blockdetday0']/div[1]/p[1]");
MessageBox.Show(bulletinMatin.InnerText);
thanks.
I solved the problem
string url = "http://www.ta-meteo.fr/troyes.htm";
Encoding iso = Encoding.GetEncoding("iso-8859-1");
HtmlWeb web = new HtmlWeb()
{
AutoDetectEncoding = false,
OverrideEncoding = iso,
};
HtmlDocument doc = web.Load(url);
HtmlNode bulletinMatin = doc.DocumentNode.SelectSingleNode("//*[@id='blockdetday0']/div[1]/p[1]");
MessageBox.Show(bulletinMatin.InnerText);
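The garbled characters appear when bytes written as ISO-8859-1 are decoded as if they were UTF-8. The sketch below uses only the base class library to show the difference on a small hypothetical byte sample; `Decode` is an illustrative helper name. Once the bytes are decoded with the right encoding, the resulting .NET string can be written out as UTF-8 wherever needed.

```csharp
using System;
using System.Text;

static class EncodingSketch
{
    // Decodes raw bytes using the named encoding.
    public static string Decode(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    static void Main()
    {
        // Hypothetical sample: "météo" as ISO-8859-1, where é is the single byte 0xE9.
        byte[] latin1Bytes = { 0x6D, 0xE9, 0x74, 0xE9, 0x6F };

        // Decoded with the encoding the page was actually written in.
        string correct = Decode(latin1Bytes, "iso-8859-1");

        // Decoded as UTF-8: 0xE9 is not a valid sequence here, so each one
        // is replaced with U+FFFD, the replacement character.
        string garbled = Decode(latin1Bytes, "utf-8");

        Console.WriteLine(correct);
        Console.WriteLine(garbled);
    }
}
```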

Why XDocument.Load(url) throws exception?

I am new to C# and I am trying to read XML from a URL.
The XML looks like this:
<posts>
<post>
<title>title1</title>
<des>des1</des>
</post>
<post>
<title>title2</title>
<des>des2</des>
</post>
.....
</posts>
And this is what I am using to parse it.
String uri = "url";
XDocument books = XDocument.Load(uri);
When the debugger reaches the XDocument line, it throws an exception and skips it.
How can I avoid this?
I think the URI for your XML is missing the file extension, which may be causing the problem. Please try using:
String uri = PATH + "url.xml";
// XDocument.Load is a static method, so call it on the type rather than on an instance.
XDocument books = XDocument.Load(uri);
To parse XML obtained from a URL, you can use:
string strURL = "http://<some-server>/<some-uri-path>";
string xmlStr;
WebClient wc = new WebClient();
xmlStr = wc.DownloadString(strURL);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlStr);
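The question does not say which exception is thrown, and finding that out is usually the first step. One common case is malformed XML, which raises XmlException with a line and position. The sketch below catches only that case and reports where parsing stopped; `DescribeParseFailure` is a hypothetical helper name, and the sample strings are made up for illustration. Download failures raise different exception types and are not covered here.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class XmlLoadSketch
{
    // Returns null when the text parses as XML, otherwise a description
    // of where and why parsing failed.
    public static string DescribeParseFailure(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return $"Malformed XML at line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        }
    }

    static void Main()
    {
        // Hypothetical inputs: one well-formed document, one with an unclosed <post>.
        string good = "<posts><post><title>title1</title><des>des1</des></post></posts>";
        string bad = "<posts><post><title>title1</title></posts>";

        Console.WriteLine(DescribeParseFailure(good) ?? "parsed without errors");
        Console.WriteLine(DescribeParseFailure(bad));
    }
}
```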

Html Agility Pack Load method issue

I am using the Html Agility Pack. When the Load method of the HtmlDocument class is passed a URL like "http://www.stackoverflow.com", it says the URI is not in the correct format.
doc.Load(TextBoxUrl.Text, Encoding.UTF8 );
The URL I tried is http://www.stackoverflow.com/questions/846994/how-to-use-html-agility-pack
HtmlDocument.Load cannot load from a URL, only from a file or a stream, and LoadHtml loads from a string. Use WebClient or HttpWebRequest to get the page.
For example:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
using (var wc = new WebClient())
{
doc.LoadHtml(wc.DownloadString(TextBoxUrl.Text));
}
