How to select specific node using Html Agility Pack? - c#

<div class="form-field wide-80 normal">1997-09-15</div>
I am trying to select the date inside it 1997-09-15. I tried this code but its giving an error of "Xpath Exception was Unhandled" what's wrong in the code please Help
string Url = "http://whois.domaintools.com/google.com";
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(Url);
var SpanNodes = doc.DocumentNode.SelectNodes("//div[#class=form-field wide-80 normal]");
if (SpanNodes != null)
{
foreach (HtmlNode SN in SpanNodes)
{
string text = SN.FirstChild.InnerText.Trim();
MessageBox.Show(text);
}
}

You forget 's
var SpanNodes =
doc.DocumentNode.SelectNodes("//div[#class='form-field wide-80 normal']");

Related

I want to parse this website url "https://www.171chryslerdodgejeepram.com/" to get value of DealerId but I have no idea how to do it

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//script").Where(x => x.InnerHtml.Contains("DealerId:")))
{
}
I am trying to get value of dealer id with the help of above code but it is not returning anything, but if you trying to find DealerId from page source of above website then it is there. please help me to achieve this or if anything wrong with above code then please correct me.
you can parse JS code with something like (https://github.com/sebastienros/jint) and extract the field you need from JS object
you can simple find this text (not elegant, but works):
UPD1: regex version
static void Main(string[] args)
{
var url = "https://www.171chryslerdodgejeepram.com/";
var web = new HtmlWeb();
var doc = web.Load(url);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//script").Where(x => x.InnerHtml.Contains("dealerId:")))
{
var s = node.InnerText;
Regex r = new Regex(#"dealerId:\s+'(\d+)'");
Match m = r.Match(s);
Console.WriteLine(m.Groups[1].Value);
}
Console.ReadKey();
}
var url = "https://www.171chryslerdodgejeepram.com/";
var web = new HtmlWeb();
var doc = web.Load(url);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//script").Where(x => x.InnerHtml.ToLower().Contains("dealerid")))
{
AdwordsAccount info = new AdwordsAccount();
var s = node.InnerText;
//Regex r = new Regex("dealerid(.*?)\",");
Regex r = new Regex(#"dealerId:\s+'(\d+)'");
Match m = r.Match(s.ToLower());
info.account = m.Groups[1].Value;
urlinfo.adwordsaccount.Add(info);
}
#cereberus this is my code

Scrape data with HtmlAgilityPack for a tag which doesn't have class

Here is my C# code what i am trying to do is to scrape data from a website by using HtmlAgilityPack but it's showing nothing found every time don't know what i am doing wrong a bit confused
HtmlAgilityPack.HtmlWeb webb = new HtmlAgilityPack.HtmlWeb();
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
HtmlAgilityPack.HtmlDocument doc = webb.Load("mywebsite");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//ul[#class='unstyled']//li//a");
if (nodes != null)
{
foreach (HtmlNode n in nodes)
{
q = n.InnerText;
q = System.Net.WebUtility.HtmlDecode(q);
q = q.Trim();
Console.WriteLine(q);
}
}
else
{
Console.WriteLine("nothing found");
}
Here is the picture of the tag from which i am trying to capture data i need data from <a> tag .
The XPath used to select the tag is incorrect.
HtmlNodeCollection nodes =
doc.DocumentNode.SelectNodes("//ul[#class='unstyled']/li/a");
This should select all the anchor nodes and then you can loop through the nodes to get the InnerHtml.
Working sample shown below
string s = "<ul class='unstyle no-overflow'><li><ul class='unstyled'><li><a href='http://www.smsconnexion.com'>SMS ConneXion</a></li></ul><ul class='unstyled'><li><a href='http://www.celusion.com'>Celusion</a></li></ul></li></ul>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(s);
HtmlNodeCollection nodes =
doc.DocumentNode.SelectNodes("//ul[#class='unstyled']/li/a");
foreach(var node in nodes)
{
Console.WriteLine(node.Attributes["href"].Value);
}
Console.ReadLine();

NodeSet exception while using agilitypack

private void ShowStatistics_Click(object sender, RoutedEventArgs e)
{
HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
HtmlWeb hw = new HtmlWeb();
doc = hw.Load("http://www.gamerankings.com/browse.html");
HtmlNodeCollection nodes= doc.DocumentNode.SelectNodes("//a/");
string result = "";
foreach (var item in nodes)
{
result += item.InnerText+Environment.NewLine;
}
Info.ItemsSource = result;
}
By pressing the button i want to get information from the webpage in a textbox called Info.
After pressing the button I get an exception saying that the result of expression should be NodeSet, what should I do? I'm using agility pack
Your XPATH is wrong. You can use this instead if you want to get all hyperlink elements
var nodes = doc.DocumentNode.Descendants("a");
In addition to #Hung Cao, you can actually shorten this/work around:
foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("Selector here")){
//your code here
}

how to get data from exact chtml class agility pack

I want to extract not the whole web-page but only text from one class, I want text from td class="result-neutral" and I don't know what is wrong with this code:
<td class="result-neutral" xseid="xz1nBfht">3 - 2 </td>
And this is C# code:
HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
HtmlWeb hw = new HtmlWeb();
doc = hw.Load("htt
var scoreNodes = doc.DocumentNode.Descendants("td").Where(d =>d.Attributes.Contains("class")&&d.Attributes["class"].Value.Contains("result-neutral"));
foreach (var item in scoreNodes)
{
result += item.OuterHtml + Environment.NewLine;
}
Info.Text = result;
}
The OuterHtml returns html with start & end of the element. Don't you want InnerHtml or InnerText?
EDIT:
This snippet works for me:
const string html = #"<html><body><table><tr><td class='result-neutral' xseid='xz1nBfht'><a href='/hockey/russia/khl/ska-st-petersburg-metallurg-magnitogorsk-xz1nBfht/'>3 - 2</a></td></tr></table></body></html>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
var scoreNodes = doc.DocumentNode.Descendants("td").Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("result-neutral"));
string result = "";
foreach (var item in scoreNodes) {
result += item.InnerText + Environment.NewLine;
}
result = result.TrimEnd(); // the result is "3-2"

c# using html agility pack URI formats not supported

I am trying to use HTML agility pack to get my program to read in a file and get all the image srcs from it. Heres what I got so far:
private ArrayList GetImageLinks(String html,String link)
{
//link = url of webpage
//html = a string of the html, just for testing will remove after
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.Load(link);
List<String> imgs = (from x in htmlDoc.DocumentNode.Descendants()
where x.Name.ToLower() == "img"
select x.Attributes["src"].Value).ToList<String>();
Console.Out.WriteLine("Hey");
ArrayList imageLinks = new ArrayList(imgs);
foreach (String element in imageLinks)
{
Console.WriteLine(element);
}
return imageLinks;
}
And this is the error im getting:
System.ArgumentException: URI formats are not supported.
HtmlDocument docHtml = new HtmlWeb().Load(url);

Categories

Resources