How to get div by class in HtmlAgilityPack?

How to get div by class in HtmlAgilityPack? - c#

I'm following this tutorial, but I have a problem, I don't know how to get htmlNode by class name .
HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(e.Result);
HtmlNode divContainer = htmlDoc.GetElementbyId("directoryItems");//My problem here,I want to get by class name html
if (divContainer != null)
{
HtmlNodeCollection nodes = divContainer.SelectNodes("//table/tr");
....
}

Try this:
HtmlNodeCollection divContainer = htmlDoc.DocumentNode.SelectNodes("//div[#class='myClass']");
this will return a collection of div nodes with class="myClass"

Assuming that you want to select a <div> element having class attribute value equals "directoryItems", and you know there will be only one element meets the criteria (or you want to simply select the first occurrence if there are more then one), you can use .SelectSingleNode() method with following XPath query :
HtmlNode divContainer = htmlDoc.DocumentNode
.SelectSingleNode("//div[#class='directoryItems']");

Related

Finding node using HTML agility pack

Here is the google chrome dev tool to get the elment im looking for.
Here are all the different ways I have tried to get the nodes..
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webObject.Html);
// HtmlNode footer = doc.DocumentNode.Descendants().SingleOrDefault(y => y. == "boardPickerInner");
// "//div[#class='boardPickerInner']"
//var y = (from HtmlNode node in doc.DocumentNode.SelectNodes("//")
// where node.InnerText == "boardPickerInner"
// select node.InnerHtml);
HtmlAgilityPack.HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("//nameAndIcons");
var xq = doc.DocumentNode.SelectSingleNode("//td[#class='nameAndIcons']");
var x = doc.DocumentNode.SelectSingleNode("");
HtmlNode nodes = doc.DocumentNode.SelectSingleNode("//[#class='nameAndIcons']");
var boards = nodes.SelectNodes("//*[#class='nameAndIcons']");
Can someone explain what I am doing wrong..?

It looks like you have multiple span elements with class="nameAndIcons". So in order to get them all you could use the SelectNodes function:
var nodes = doc.DocumentNode.SelectNodes("//span[#class='nameAndIcons'"])

Htmlagilitypack: create html text node

In HtmlAgilityPack, I want to create HtmlTextNode, which is a HtmlNode (inherts from HtmlNode) that has a custom InnerText.
HtmlTextNode CreateHtmlTextNode(string name, string text)
{
HtmlDocument doc = new HtmlDocument();
HtmlTextNode textNode = doc.CreateTextNode(text);
textNode.Name = name;
return textNode;
}
The problem is that the textNode.OuterHtml and textNode.InnerHtml will be equal to "text" after the method above.
e.g. CreateHtmlTextNode("title", "blabla") will generate:
textNode.OuterHtml = "blabla" instead of <Title>blabla</Title>
Is there any better way to create HtmlTextNode?

The following lines creates a outer html with content
var doc = new HtmlDocument();
// create html document
var html = HtmlNode.CreateNode("<html><head></head><body></body></html>");
doc.DocumentNode.AppendChild(html);
// select the <head>
var head = doc.DocumentNode.SelectSingleNode("/html/head");
// create a <title> element
var title = HtmlNode.CreateNode("<title>Hello world</title>");
// append <title> to <head>
head.AppendChild(title);
// returns Hello world!
var inner = title.InnerHtml;
// returns <title>Hello world!</title>
var outer = title.OuterHtml;
Hope it helps.

A HTMLTextNode contains just Text, no tags.
It's like the following:
<div> - HTML Node
<span>text</span> - HTML Node
This is the Text Node - Text Node
<span>text</span> - HTML Node
</div>
You're looking for a standard HtmlNode.
HtmlDocument doc = new HtmlDocument();
HtmlNode textNode = doc.CreateElement("title");
textNode.InnerHtml = HtmlDocument.HtmlEncode(text);
Be sure to call HtmlDocument.HtmlEncode() on the text you're adding. That ensures that special characters are properly encoded.

How to get text between two div tags with some class attribute with HTMLagility

I want to get some text from two html div from HTML file.
After some searches i decided to use HTMLAgility Pack for doing this.
I wrote this code :
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(result);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//*div[#class='item']");
string value = node.InnerText;
'result' is my content of the File.
But i get this exception : 'Expression must evaluate to a node-set'
And this is some of mt file's content :
<div class="Clear" style="height:15px;"></div>
<div class='Container Select' id="Container_1">
<div class='Item'><div class='Part Lable'>موضوع : </div><div class='Part ...

try either
"//*/div[#class='item']"
or simply
"//div[#class='item']"

have you tried using XPath
for example if I wanated to find a if a node is selected in my example I would do the following
string xpath = null;
XmlNode configNode = configDom.DocumentElement;
// collect selected nodes in node list
XmlNodeList nodeList =
configNode.SelectNodes(#"//*[#status='checked']");
in your case you would do the following
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(result);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//*/div[#class='item']");
string value = node.InnerText;

Html Agility Pack, SelectNodes from a node

Why does this pick all of my <li> elements in my document?
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
var travelList = new List<Page>();
var liOfTravels = doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']")
.SelectNodes("//li");
What I want is to get all <li> elements in the <div> with an id of "myTrips".

It's a bit confusing because you're expecting that it would do a selectNodes on only the div with id "myTrips", however if you do another SelectNodes("//li") it will performn another search from the top of the document.
I fixed this by combining the statement into one, but that would only work on a webpage where you have only one div with an id "mytrips". The query would look like this:
doc.DocumentNode.SelectNodes("//div[#id='myTrips'] //li");

var liOfTravels = doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']")
.SelectNodes(".//li");
Note the dot in the second line. Basically in this regard HTMLAgitilityPack completely relies on XPath syntax, however the result is non-intuitive, because those queries are effectively the same:
doc.DocumentNode.SelectNodes("//li");
some_deeper_node.SelectNodes("//li");

Creating a new node can be beneficial in some situations and lets you use the xpaths more intuitively. I've found this useful in a couple of places.
var myTripsDiv = doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']");
var myTripsNode = HtmlNode.CreateNode(myTripsDiv.InnerHtml);
var liOfTravels = myTripsNode.SelectNodes("//li");

You can do this with a Linq query:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
var travelList = new List<HtmlNode>();
foreach (var matchingDiv in doc.DocumentNode.DescendantNodes().Where(n=>n.Name == "div" && n.Id == "myTrips"))
{
travelList.AddRange(matchingDiv.DescendantNodes().Where(n=> n.Name == "li"));
}
I hope it helps

This seems counter intuitive to me aswell, if you run a selectNodes method on a particular node I thought it would only search for stuff underneath that node, not in the document in general.
Anyway OP if you change this line :
var liOfTravels =
doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']").SelectNodes("//li");
TO:
var liOfTravels =
doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']").SelectNodes("li");
I think you'll be ok, i've just had the same issue and that fixed it for me. Im not sure though if the li would have to be a direct child of the node you have.

HtmlAgilityPack selecting childNodes not as expected

I am attempting to use the HtmlAgilityPack library to parse some links in a page, but I am not seeing the results I would expect from the methods. In the following I have a HtmlNodeCollection of links. For each link I want to check if there is an image node and then parse its attributes but the SelectNodes and SelectSingleNode methods of linkNode seems to be searching the parent document not the childNodes of linkNode. What gives?
HtmlDocument htmldoc = new HtmlDocument();
htmldoc.LoadHtml(content);
HtmlNodeCollection linkNodes = htmldoc.DocumentNode.SelectNodes("//a[#href]");
foreach(HtmlNode linkNode in linkNodes)
{
string linkTitle = linkNode.GetAttributeValue("title", string.Empty);
if (linkTitle == string.Empty)
{
HtmlNode imageNode = linkNode.SelectSingleNode("/img[#alt]");
}
}
Is there any other way I could get the alt attribute of the image childnode of linkNode if it exists?

You should remove the forwardslash prefix from "/img[#alt]" as it signifies that you want to start at the root of the document.
HtmlNode imageNode = linkNode.SelectSingleNode("img[#alt]");

With an xpath query you can also use "." to indicate the search should start at the current node.
HtmlNode imageNode = linkNode.SelectSingleNode(".//img[#alt]");

Also, watch out for null checks; SelectNodes returns null instead of blank collection.
HtmlNodeCollection linkNodes = htmldoc.DocumentNode.SelectNodes("//a[#href]");
**if(linkNodes!=null)**
{
foreach(HtmlNode linkNode in linkNodes)
{
string linkTitle = linkNode.GetAttributeValue("title", string.Empty);
if (linkTitle == string.Empty)
{
**HtmlNode imageNode = linkNode.SelectSingleNode("img[#alt]");**
}
}
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How to get div by class in HtmlAgilityPack? - c#

Try this: HtmlNodeCollection divContainer = htmlDoc.DocumentNode.SelectNodes("//div[#class='myClass']"); this will return a collection of div nodes with class="myClass"

Related

Finding node using HTML agility pack

Htmlagilitypack: create html text node

How to get text between two div tags with some class attribute with HTMLagility

Html Agility Pack, SelectNodes from a node

HtmlAgilityPack selecting childNodes not as expected

Categories

Resources