First, sorry about my bad English.
My question is: how can I scrape a div nested inside another div with HtmlAgilityPack in C#?
This is my test HTML:
<html>
<div class="all_ads">
<div class="ads__item">
<div class="test">
test 1
</div>
</div>
<div class="ads__item">
<div class="test">
test 2
</div>
</div>
<div class="ads__item">
<div class="test">
test 3
</div>
</div>
</div>
</html>
How do I write a loop that gets all the ads, and then an inner loop that reads the test div inside each ad?
You can select all the ads__item nodes inside the all_ads div as follows (doc is the loaded HtmlDocument):
var res = doc.DocumentNode.SelectNodes("//div[@class='all_ads']/div[@class='ads__item']");
//div[@class='all_ads']/div[@class='ads__item'] selects every div with class ads__item that is a direct child of the div with class all_ads.
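For the nested loop the question asks about, one possible shape is an outer loop over the ad nodes with a relative XPath query inside each one. This is only a sketch: it assumes the HtmlAgilityPack NuGet package is referenced, and the class name AdScraper, the method ReadAds and the inline sample HTML are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class AdScraper // hypothetical helper, not part of HtmlAgilityPack
{
    // Returns the trimmed text of every "test" div found inside each "ads__item" div.
    public static List<string> ReadAds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();

        // Outer query: every ad directly under the all_ads container.
        var ads = doc.DocumentNode.SelectNodes("//div[@class='all_ads']/div[@class='ads__item']");
        if (ads == null) return results; // SelectNodes returns null when nothing matches

        foreach (HtmlNode ad in ads)
        {
            // Inner query: the leading "." keeps the search inside the current ad.
            var tests = ad.SelectNodes(".//div[@class='test']");
            if (tests == null) continue;

            foreach (HtmlNode test in tests)
                results.Add(test.InnerText.Trim());
        }
        return results;
    }

    public static void Main()
    {
        // Sample markup assumed for the demo, shaped like the HTML in the question.
        string html = "<div class='all_ads'>" +
                      "<div class='ads__item'><div class='test'> test 1 </div></div>" +
                      "<div class='ads__item'><div class='test'> test 2 </div></div>" +
                      "</div>";

        foreach (string text in ReadAds(html))
            Console.WriteLine(text);
    }
}
```

The inner path starts with "." on purpose: a path starting with "//" would search the whole document again, even when called on an ad node.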
You have to use this path: //div[contains(@class, 'test')]
It selects every div whose class attribute contains test.
Then take the inner HTML of each selected div, like this:
using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        string html = File.ReadAllText(@"Path to your html file");
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);
        // Trimmed inner HTML of every div whose class contains "test"
        var innerContent = doc.DocumentNode.SelectNodes("//div[contains(@class, 'test')]").Select(x => x.InnerHtml.Trim());
        foreach (var item in innerContent)
            Console.WriteLine(item);
        Console.ReadLine();
    }
}
Output:
test 1
test 2
test 3
Related
I have this html document:
<div class="link1">
link1
</div>
<div class="link2">
link2
</div>
<div class="link3">
link3
</div>
<div class="link3">
link4
</div>
<div class="link5">
link4
</div>
I want to show the elements with class "link3" in a WebBrowser control by getting the elements by class name.
This code works, but if two elements have the same class name it shows nothing!
foreach (HtmlElement elm in webBrowser1.Document.All)
    if (elm.GetAttribute("className") == "link3")
    {
        HtmlDocument doc = webBrowser1.Document;
        doc.Body.InnerHtml = elm.InnerHtml;
    }
Use this instead:
StringBuilder sb = new StringBuilder();
foreach (HtmlElement elm in webBrowser1.Document.All)
    if (elm.GetAttribute("className") == "link3")
        sb.Append(elm.InnerHtml);
HtmlDocument doc = webBrowser1.Document;
doc.Body.InnerHtml = sb.ToString();
I have tried a lot of approaches but couldn't solve my problem. I don't know how to write the path to my node for the SelectSingleNode method. I built an XPath in my C# code to reach the node, but when I run the code I get a NullReferenceException because of that path. How should I write the path, or is there another solution?
This is an example of the HTML:
<html>
<body>
<div id="container">
<div id="box">
<div class="box">
<div class="boxContent">
<div class="userBox">
<div class="userBoxContent">
<div class="userBoxElement">
<ul id ="namePart">
<li>
<span class="namePartContent">
</span>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
And this is my C# code:
namespace AgilityTrial
{
    class Program
    {
        static void Main(string[] args)
        {
            Uri url = new Uri("https://....");
            WebClient client = new WebClient();
            client.Encoding = Encoding.UTF8;
            string html = client.DownloadString(url);
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);
            string path = @"//html/body/div[@id='container']/div[@id='classifiedDetail']" +
                "/div[@class='classifiedDetail']/div[@class='classifiedDetailContent']" +
                "/div[@class='classifiedOtherBoxes']/div[@class='classifiedUserBox']" +
                "/div[@class='classifiedUserContent']/ul[@id='phoneInfoPart']/li" +
                "/span[@class='pretty-phone-part show-part']";
            var tds = doc.DocumentNode.SelectSingleNode(path);
            var date = tds.InnerHtml;
            Console.WriteLine(date);
        }
    }
}
Take your namePartContent span node as an example. To fetch its text you would simply do this:
doc.DocumentNode.SelectSingleNode(".//span[@class='namePartContent']")?.InnerText;
This searches for a single span node with namePartContent as its class, beginning at the root node, in your case <html>. The ?. operator yields null instead of throwing when no node matches.
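Because SelectSingleNode returns null when the path matches nothing, it can help to keep the XPath short and to check the result before reading it. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; NameFinder, FindName and the sample markup are hypothetical names used only for this illustration:

```csharp
using System;
using HtmlAgilityPack;

public static class NameFinder // hypothetical helper for illustration
{
    // Returns the trimmed text of the first span with class "namePartContent",
    // or null when the document contains no such span.
    public static string FindName(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // A short path anchored on the class is less fragile than spelling out every ancestor div.
        HtmlNode span = doc.DocumentNode.SelectSingleNode("//span[@class='namePartContent']");
        return span?.InnerText.Trim();
    }

    public static void Main()
    {
        // Sample markup assumed for the demo.
        string html = "<ul id='namePart'><li><span class='namePartContent'> John </span></li></ul>";

        Console.WriteLine(FindName(html) ?? "(no match)");
        Console.WriteLine(FindName("<p>nothing here</p>") ?? "(no match)");
    }
}
```

A long absolute path fails as soon as one id or class along the way differs from the real page, which is one common way to end up with a null result.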
I have a HTML file that looks like this:
<div class="user_meals">
<div class="name">Name Surname</div>
<div class="day_meals">
<div class="meal">First Meal</div>
</div>
<div class="day_meals">
<div class="meal">Second Meal</div>
</div>
<div class="day_meals">
<div class="meal">Third Meal</div>
</div>
<div class="day_meals">
<div class="meal">Fourth Meal</div>
</div>
<div class="day_meals">
<div class="meal">Fifth Meal</div>
</div>
This block repeats a few times.
I want to get the name and surname, which are inside the <div> tag with class "name".
This is my code using HtmlAgilityPack:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"C:\workspace\file.html");
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='name']"))
{
    string value = node.InnerText;
}
But it doesn't work. Visual Studio throws an exception:
An unhandled exception of type 'System.NullReferenceException'.
You are using the wrong method to load HTML from a path: LoadHtml expects HTML markup, not the location of a file. Use Load instead.
The error you are getting is quite misleading, because none of the properties are null and the standard tips from "What is a NullReferenceException, and how do I fix it?" don't apply.
Essentially it comes from the fact that SelectNodes correctly returns null when no elements match the query, and foreach throws on the null collection.
Fixed code:
HtmlDocument doc = new HtmlDocument();
// either doc.Load(@"C:\workspace\file.html") or pass HTML:
doc.LoadHtml("<div class='user_meals'><div class='name'>Name Surname</div></div> ");
var nodes = doc.DocumentNode.SelectNodes("//div[@class='name']");
// SelectNodes returns null if nothing is found, so check before iterating
if (nodes == null)
{
    throw new InvalidOperationException("Where all my nodes???");
}
foreach (HtmlNode node in nodes)
{
    string value = node.InnerText;
    value.Dump(); // Dump() is a LINQPad extension; use Console.WriteLine elsewhere
}
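If throwing is not wanted when nothing matches, one alternative is to fall back to an empty sequence so that the loop simply does not run. The following is a sketch under assumptions, not the answerer's code: it assumes the HtmlAgilityPack NuGet package is referenced, and MealReader, ReadNames and the sample markup are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class MealReader // hypothetical helper for illustration
{
    // Returns the text of every div with class "name"; an empty list when none match.
    public static List<string> ReadNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml takes markup; doc.Load(path) reads a file from disk

        // SelectNodes returns null when nothing matches, so substitute an empty sequence.
        IEnumerable<HtmlNode> nodes =
            (IEnumerable<HtmlNode>)doc.DocumentNode.SelectNodes("//div[@class='name']")
            ?? Enumerable.Empty<HtmlNode>();

        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        // Sample markup assumed for the demo.
        string html = "<div class='user_meals'><div class='name'>Name Surname</div></div>";

        foreach (string name in ReadNames(html))
            Console.WriteLine(name);

        // Passing a file path as markup matches nothing, but no longer throws.
        Console.WriteLine(ReadNames(@"C:\workspace\file.html").Count);
    }
}
```

Whether to throw or to return an empty result depends on whether a missing node is an error for the caller; the empty fallback hides the problem from the question, so it suits cases where "no matches" is a normal outcome.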
I'm trying to learn how to use CsQuery to traverse a DOM and get specific text.
The html looks like this:
<div class="featured-rows">
<div class="row">
<div class="featured odd" data-genres-filter="MA0000002613">
<div class="album-cover">
<div class="artist">
Half Japanese
</div>
<div class="title">
<div class="label"> Joyful Noise </div>
<div class="styles">
<div class="rating allmusic">
<div class="rating average">
<div class="headline-review">
</div>
<div class="featured even" data-genres-filter="MA0000002572, MA0000002613">
</div>
<div class="row">
<div class="row">
<div class="row">
My code attempt looks like this:
public void GetRows()
{
    var artistName = string.Empty;
    var html = GetHtml("http://www.allmusic.com/newreleases");
    var rows = html.Select(".featured-rows");
    foreach (var row in rows)
    {
        var odd = row.Cq().Find(".featured odd");
        foreach (var artist in odd)
        {
            artistName = artist.Cq().Text();
        }
    }
}
The first select for .featured-rows works, but then I don't know how to get down to .artist to read its text.
You should try something similar to this:
var html = GetHtml("http://www.allmusic.com/newreleases");
var query = CQ.Create(html);
var row = query[".artist>a"];
string link = row.Attr("href");
string text = row.Text();
CsQuery is a port of jQuery, so you can search for jQuery code and adapt it.
UPDATE:
To traverse the rows and get all artists and titles:
var rows = query[".featured.odd"];
foreach (var row in rows)
{
    var artistLink = row.Cq().Find(".artist>a");
    var title = row.Cq().Find(".title");
    // here do whatever you need with this
}
List<string> artists = html[".featured .artist a"].Select(dom=>dom.TextContent).ToList();
where html == your CQ object.
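var odd = row.Cq().Find(".featured odd");
should be
var odd = row.Cq().Find(".featured.odd");
With a space, the selector looks for an <odd> element nested inside .featured; without it, it matches an element that has both the featured and odd classes.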
I want to show a specific section of an HTML page in a text box in a WP7 app (C#). After a bit of searching online I found this:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml("http://www.positief-project.be/?p=532");
string links = doc.DocumentNode
    .Descendants("section")
    .Where(section => section.Attributes["class"] != null &&
                      section.Attributes["class"].Value == "article-content").ToString();
txbContent.Text = links;
This doesn't give an error, but doesn't work either. How can I make it show in the text box?
Is jQuery an option?
HTML
<div class="section">
<div class="article-content">some foo 1</div>
<div class="article-content">some foo 2</div>
<div class="article-content">some foo 3</div>
<div class="article-content">some foo 4</div>
</div>
<br>
<input type="text" id="tbContent" />
jQuery
$(document).ready(function () {
    // Start from an empty string; an uninitialized variable would prepend "undefined"
    var content = '';
    $('.article-content').each(function (i, obj) {
        content += obj.innerHTML;
    });
    $('#tbContent').val(content);
});
See this fiddle http://jsfiddle.net/rodhartzell/Fk2xM/
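For a route closer to the C# code in the question, two details are worth noting: LoadHtml expects markup rather than a URL, and calling ToString() on a LINQ query returns the name of the query's type rather than any page content. The sketch below works on an HTML string that is assumed to have been downloaded already (the download step is left out); it assumes the HtmlAgilityPack package is referenced, and ArticleExtractor, GetArticleText and the sample markup are hypothetical names for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ArticleExtractor // hypothetical helper for illustration
{
    // Returns the text of the first <section class="article-content">,
    // or an empty string when the markup has no such section.
    public static string GetArticleText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // html must be markup, not a URL

        HtmlNode section = doc.DocumentNode
            .Descendants("section")
            .FirstOrDefault(s => s.GetAttributeValue("class", "") == "article-content");

        // Read the node's text instead of calling ToString() on the query.
        return section == null ? string.Empty : section.InnerText.Trim();
    }

    public static void Main()
    {
        // Sample markup assumed for the demo.
        string html = "<html><body><section class='article-content'> some foo </section></body></html>";
        Console.WriteLine(GetArticleText(html));
    }
}
```

The returned string could then be assigned to the text box, for example txbContent.Text = ArticleExtractor.GetArticleText(html), once the page markup has been fetched.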