Finding node using HTML agility pack - c#

Here is the google chrome dev tool to get the elment im looking for.
Here are all the different ways I have tried to get the nodes..
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webObject.Html);
// HtmlNode footer = doc.DocumentNode.Descendants().SingleOrDefault(y => y. == "boardPickerInner");
// "//div[#class='boardPickerInner']"
//var y = (from HtmlNode node in doc.DocumentNode.SelectNodes("//")
// where node.InnerText == "boardPickerInner"
// select node.InnerHtml);
HtmlAgilityPack.HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("//nameAndIcons");
var xq = doc.DocumentNode.SelectSingleNode("//td[#class='nameAndIcons']");
var x = doc.DocumentNode.SelectSingleNode("");
HtmlNode nodes = doc.DocumentNode.SelectSingleNode("//[#class='nameAndIcons']");
var boards = nodes.SelectNodes("//*[#class='nameAndIcons']");
Can someone explain what I am doing wrong..?

It looks like you have multiple span elements with class="nameAndIcons". So in order to get them all you could use the SelectNodes function:
var nodes = doc.DocumentNode.SelectNodes("//span[#class='nameAndIcons'"])

Related

Htmlagilitypack doesnt get nodes.

I am using Htmlagilitypack in c#. But when i want to select images in a div at the url bottom, there are nothing found in selector. But i think i write right selector.
Codes are in fiddle. Thanks.
https://dotnetfiddle.net/NNIC3X
var url = "https://fotogaleri.haberler.com/unlu-sarkici-imaj-degistirdi-gorenler-gozlerine/";
//I will get the images src values in .col-big div at this url.
var web = new HtmlWeb();
var doc = web.Load(url);
var htmlNodes = doc.DocumentNode.SelectNodes("//div[#class='col-big']//*/img");
//i am selecting all images in div.col-big. But there is nothing.
foreach (var node in htmlNodes)
{
Console.WriteLine(node.Attributes["src"].Value);
}
Your xpath is wrong because there is no div-tag that has class-attribtue with the value 'col-big'. There is however a div-tag that has a class attribute with the value 'col-big pull-left'. So try.
var htmlNodes = doc.DocumentNode.SelectNodes("//div[#class='col-big pull-left']//*/img");

How to get div by class in HtmlAgilityPack?

I'm following this tutorial, but I have a problem, I don't know how to get htmlNode by class name .
HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(e.Result);
HtmlNode divContainer = htmlDoc.GetElementbyId("directoryItems");//My problem here,I want to get by class name html
if (divContainer != null)
{
HtmlNodeCollection nodes = divContainer.SelectNodes("//table/tr");
....
}
Try this:
HtmlNodeCollection divContainer = htmlDoc.DocumentNode.SelectNodes("//div[#class='myClass']");
this will return a collection of div nodes with class="myClass"
Assuming that you want to select a <div> element having class attribute value equals "directoryItems", and you know there will be only one element meets the criteria (or you want to simply select the first occurrence if there are more then one), you can use .SelectSingleNode() method with following XPath query :
HtmlNode divContainer = htmlDoc.DocumentNode
.SelectSingleNode("//div[#class='directoryItems']");

How to use HtmlAgilityPack to get this two content?

Html code:
<div>
<div>Name</div>
Date
</div>
How to use HtmlAgilityPack to get the Name and Date values?
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
foreach(HtmlNode div in doc.DocumentElement.SelectNodes("//div"])
{
string parent = div.InnerText; //this will give you Name
foreach (HtmlNode child in div.ChildNodes)
{
string childDiv = child.InnerText; //this will give you the child
}
}
var doc = new HtmlDocument();
doc.LoadHtml("<div><div>Name</div>Date</div>");
var nodes = doc.DocumentNode
.DescendantNodes()
.Where(n => n.NodeType == HtmlNodeType.Text);
foreach (var node in nodes)
{
var value = node.InnerText;
}
Hard-coded:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<div><div>Name</div>Date</div>");
Console.WriteLine(((HtmlTextNode)doc.DocumentNode.ChildNodes[0].ChildNodes[0]).Text);
Console.WriteLine(((HtmlTextNode)doc.DocumentNode.ChildNodes[1]).Text);
If what you're looking for, is all the text nodes (as in Kevin Babcock's answer!). You have to remember that text in Xml (and with HtmlAgility) are nodes.

How to parse string with regular expression

How can I get List() from this text:
For the iPhone do the following:<ul><li>Go to AppStore</li><li>Search by him</li><li>Download</li></ul>
that should consist :Go to AppStore,
Search by him,
Download
Load the string up into the HTML Agility Pack then select all li elements inner text.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("following:<ul><li>Go to AppStore</li><li>Search by him</li><li>Download</li></ul>");
var uls = doc.DocumentNode.Descendants("li").Select(d => d.InnerText);
foreach (var ul in uls)
{
Console.WriteLine(ul);
}
Wrap in an XML root element and use LINQ to XML:
var xml = "For the iPhone do the following:<ul><li>Go to AppStore</li><li>Search by him</li><li>Download</li></ul>";
xml = "<r>"+xml+"</r>";
var ele = XElement.Parse(xml);
var lists = ele.Descendants("li").Select(e => e.Value).ToList();
Returns in lists:
Go to AppStore
Search by him
Download

Html Agility Pack, SelectNodes from a node

Why does this pick all of my <li> elements in my document?
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
var travelList = new List<Page>();
var liOfTravels = doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']")
.SelectNodes("//li");
What I want is to get all <li> elements in the <div> with an id of "myTrips".
It's a bit confusing because you're expecting that it would do a selectNodes on only the div with id "myTrips", however if you do another SelectNodes("//li") it will performn another search from the top of the document.
I fixed this by combining the statement into one, but that would only work on a webpage where you have only one div with an id "mytrips". The query would look like this:
doc.DocumentNode.SelectNodes("//div[#id='myTrips'] //li");
var liOfTravels = doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']")
.SelectNodes(".//li");
Note the dot in the second line. Basically in this regard HTMLAgitilityPack completely relies on XPath syntax, however the result is non-intuitive, because those queries are effectively the same:
doc.DocumentNode.SelectNodes("//li");
some_deeper_node.SelectNodes("//li");
Creating a new node can be beneficial in some situations and lets you use the xpaths more intuitively. I've found this useful in a couple of places.
var myTripsDiv = doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']");
var myTripsNode = HtmlNode.CreateNode(myTripsDiv.InnerHtml);
var liOfTravels = myTripsNode.SelectNodes("//li");
You can do this with a Linq query:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
var travelList = new List<HtmlNode>();
foreach (var matchingDiv in doc.DocumentNode.DescendantNodes().Where(n=>n.Name == "div" && n.Id == "myTrips"))
{
travelList.AddRange(matchingDiv.DescendantNodes().Where(n=> n.Name == "li"));
}
I hope it helps
This seems counter intuitive to me aswell, if you run a selectNodes method on a particular node I thought it would only search for stuff underneath that node, not in the document in general.
Anyway OP if you change this line :
var liOfTravels =
doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']").SelectNodes("//li");
TO:
var liOfTravels =
doc.DocumentNode.SelectSingleNode("//div[#id='myTrips']").SelectNodes("li");
I think you'll be ok, i've just had the same issue and that fixed it for me. Im not sure though if the li would have to be a direct child of the node you have.

Categories

Resources