I want to scrape some data from a website using HtmlAgilityPack. The data is stored in an element with the attribute class="translateTxt". I use this code, but it returns null.
C# code:
HtmlAgilityPack.HtmlDocument doc = hw.Load(Url);
HtmlNodeCollection nodes1 = doc.DocumentNode.SelectNodes("//div[@class='translateTxt']");
foreach (HtmlNode node in nodes1)
{
string Txt = node.InnerText;
}
HTML code:
<div id="trans" class="tap_mt">
<div class="tr_brst clearfix">
<div class="tr_instyle">
<div class="tr_ext clearfix">
<div class="translateTxt">
hi
</div>
</div>
</div>
</div>
</div>
Try the following to get all descendant div tags whose class attribute contains translateTxt:
var findclasses = doc.DocumentNode.Descendants("div").Where(d =>
d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("translateTxt"));
Then loop over your findclasses variable.
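As a complementary sketch, the same lookup can be written with an XPath query. This assumes the HtmlAgilityPack NuGet package is referenced. The class name TranslateTextDemo, the helper ExtractTexts and the shortened sample HTML are made up for illustration. Note that SelectNodes returns null (not an empty collection) when nothing matches, so the result is checked before the loop:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TranslateTextDemo
{
    // Hypothetical helper: returns the trimmed text of every <div> whose
    // class list contains the given class name.
    public static List<string> ExtractTexts(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Matches the class as a whole word, even when the element has several classes.
        string xpath = "//div[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        var texts = new List<string>();
        if (nodes == null)
        {
            return texts; // SelectNodes yields null when there is no match
        }

        foreach (HtmlNode node in nodes)
        {
            texts.Add(node.InnerText.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        // Shortened version of the HTML from the question.
        string html = "<div id=\"trans\" class=\"tap_mt\"><div class=\"translateTxt\">\nhi\n</div></div>";
        foreach (string text in ExtractTexts(html, "translateTxt"))
        {
            Console.WriteLine(text);
        }
    }
}
```

If this still finds nothing on the live page, the element may be added by JavaScript after the page loads, in which case it is not in the HTML that HtmlAgilityPack downloads.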
Related
I use GeckoFX in C# to get the text content. Why can't I get the TextContent result if the HTML looks like this?
<div>
<div class="text1">
<div class="text2">Michael</div>
<div class="text2">Andrey</div>
</div>
</div>
output : null
whereas if the HTML looks like this:
<div class="text">
<div class="text1">
<div class="text2">Michael</div>
<div class="text2">Andrey</div>
</div>
</div>
output :
Michael, Andrey
I use GeckoFX code like this:
GeckoNodeCollection names = geckoWebBrowser1.Document.GetElementsByClassName("text1");
foreach (GeckoNode name in names)
{
Console.WriteLine(name.TextContent);
}
The only difference I see is between <div> and <div class="text">. I appreciate any help you can provide.
Does anyone know how to print the text of every element in a list with Selenium in C#? The code below prints blank values. If I pass the element itself to WriteLine, a value is displayed, but it is not the element's text. I would like to get the text.
Code:
IList<IWebElement> attachmentList = driver.FindElements(By.ClassName("comment-box"));
foreach (IWebElement element in attachmentList)
{
Console.WriteLine(element.Text);
}
HTML:
<div class="comment-box">
<!-- Comment Image -->
<div class="col-xs-2">
<div id="attachmentImgSFHD-24" class="attachmentImg">
<img src="downloadAttachment?attachmenturl=/secure/thumbnail/10111/_thumb_10111.png" />
</div>
</div>
<!-- Attachment details -->
<div class="col-xs-10">
<div class="commentContent">
<div class="topRow">
<div class="username">ApplicationLink.png</div>
<div class="commentTimeStamp">31400 KB</div>
</div>
<div class="bottomRow">
<div class="commentDisplay">
Download
</div>
</div>
</div>
</div>
</div>
<div class="comment-box">
<!-- Comment Image -->
<div class="col-xs-2">
<div id="attachmentImgSFHD-24" class="attachmentImg">
<img src="downloadAttachment?attachmenturl=/secure/thumbnail/10313/_thumb_10313.png" />
</div>
</div>
<!-- Attachment details -->
<div class="col-xs-10">
<div class="commentContent">
<div class="topRow">
<div class="username">test.jpg</div>
<div class="commentTimeStamp">7423 KB</div>
</div>
<div class="bottomRow">
<div class="commentDisplay">
Download
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br/>
The elements don't have text in the html, so element.Text is empty. Use
Console.WriteLine(element.GetAttribute("value"));
You can use the XPath below to get the attachment details:
XPath: //div[@class='comment-box']//div[@class='commentContent']//div[@class='username']
Code:
IList<IWebElement> attachmentList = driver.FindElements(By.XPath("//div[@class='comment-box']//div[@class='commentContent']//div[@class='username']"));
foreach (IWebElement element in attachmentList)
{
Console.WriteLine(element.Text); // Prints each attachment name, e.g. 'ApplicationLink.png' and 'test.jpg'
}
IList<IWebElement> attachmentList = driver.FindElements(By.ClassName("comment-box"));
foreach (IWebElement element in attachmentList)
{
System.Threading.Thread.Sleep(2000);
Console.WriteLine(element.Text);
}
It works fine after adding the Thread.Sleep call.
I want to scrape data with Html Agility Pack.
I used this:
string url = @"https://mobile.bet365.gr/#type=Coupon;key=1-1-13-40-141-0-0-0-1-0-0-4100-0-0-1-0-0-0-0-0-0;ip=0;lng=5;anim=1";
var webGet = new HtmlWeb();
var document = webGet.Load(url);
var nodes = document.DocumentNode.SelectNodes("//*[@id='Coupon']/div[1]/div[2]/div[1]/div/div[1]/div[1]/span");
int i = 0;
foreach (var node in nodes)
{
dataGridView1.Rows.Add();
dataGridView1.Rows[i].Cells[0].Value = i + 1;
dataGridView1.Rows[i].Cells[1].Value = node.InnerHtml;
i++;
}
The XPath is taken from FireXPath, but nothing appears in the grid.
The HTML snippet is this:
<div id="Coupon" class="C4 C4_1">
<div class="liveAlertKey enhancedPod cc_12_7" data-sportskey="1-1-13-40-141-0-0-0-1-0-0-4100-0-0-1-0-0-0-0-0-0" data-alertkey="NPower Champs">
<h1><em>Αγγλία - Τσάμπιονσιπ</em></h1>
<div class="podHeaderRow">
<div class="wideLeftColumn">Παρ 29 Σεπ</div>
<div class="priceColumn"><em>1</em></div>
<div class="priceColumn"><em>X</em></div>
<div class="priceColumn"><em>2</em></div>
</div>
<div data-fixtureid="67185688" data-plbtid="40" class="podEventRow cc_12_4 ippg-Market " data-nav="rw_spl_sc_1-1-8-67185688-3-0-0-0-1-0-0-0-0-0-1-0-0-0-0-0-0,MarketCount,1-1-8-67185688-3-0-0-0-1-0-0-0-0-0-1-0-0-0-0-0-0,False,1">
<div class="wideLeftColumn hasStatsIcon">
<div class="ippg-Market_GameDetail">
<div class="ippg-Market_GameItem ">
<div class="ippg-Market_CompetitorName">
<span class="ippg-Market_Truncator">ΚΠΡ</span>
</div>
<div class="ippg-Market_CompetitorScores">
<span class="ippg-PointNode"></span>
</div>
</div>
<div class="ippg-Market_GameItem ">
<div class="ippg-Market_CompetitorName">
<span class="ippg-Market_Truncator">Φούλαμ</span>
</div>
<div class="ippg-Market_CompetitorScores">
<span class="ippg-PointNode"></span>
</div>
</div>
<div class="ippg-Market_MetaContainer ">
<div class="ippg-Market_GameStartTime">20:45</div>
<div class="ippg-Market_GameInfo "></div>
<div class="ippg-Market_MarketCount">109</div>
<div id="FixtureIconsContainer">
<img src="/grfx/V6/Misc/pixel.gif" class="VideoIcon SSP-7">
</div>
<div id="StatsIconContainer">
<a class="icon-stats" target="_blank" data-nav="externalLink" href="http://www.stats.betradar.com/s4/?clientid=259&matchid=11868244&language=el"></a>
</div>
</div>
</div>
</div>
<div class="ippg-Market_Topic priceColumn" data-nav="pt=N#o=9/4#f=67185688#fp=1410316836#so=0#c=1#" data-inplaytopic="" data-pgfpid="1410316836" data-inplaymarkettopic="" data-inplayaltmarkettopic="">
<span class="ippg-Market_Odds">3.25</span>
</div>
<div class="ippg-Market_Topic priceColumn" data-nav="pt=N#o=13/5#f=67185688#fp=1410316839#so=0#c=1#" data-inplaytopic="" data-pgfpid="1410316839" data-inplaymarkettopic="" data-inplayaltmarkettopic="">
<span class="ippg-Market_Odds">3.60</span>
</div>
<div class="ippg-Market_Topic priceColumn" data-nav="pt=N#o=5/4#f=67185688#fp=1410316841#so=0#c=1#" data-inplaytopic="" data-pgfpid="1410316841" data-inplaymarkettopic="" data-inplayaltmarkettopic="">
<span class="ippg-Market_Odds">2.25</span>
</div>
</div>
</div>
</div>
Could anyone help me find the correct XPath? I have used this technique on other sites and got the results I wanted, but on this site I can't work out the correct XPath.
You can get your teams and odds from the HTML snippet like this:
HtmlDocument document = new HtmlDocument();
document.Load(Server.MapPath("xpath.html"));
// Teams
HtmlNodeCollection teamNodes = document.DocumentNode.SelectNodes("//div[@class='ippg-Market_CompetitorName']");
List<string> teams = new List<string>();
foreach (HtmlNode n in teamNodes)
{
HtmlNode nodeTeam = n.SelectSingleNode(".//span[@class='ippg-Market_Truncator']");
if (nodeTeam != null)
{
teams.Add(nodeTeam.InnerText);
}
}
// Odds
HtmlNodeCollection oddNodes = document.DocumentNode.SelectNodes("//span[@class='ippg-Market_Odds']");
List<string> odds = new List<string>();
foreach (HtmlNode o in oddNodes)
{
odds.Add(o.InnerText);
}
I want to get the link, title and price from this HTML (this is one of ten results):
<div class="listing-item">
<div class="block item-title">
<h3 id="title">
<span style="direction: ltr" class="title">
<a xtcltype="S" xtclib="listing_list_1_title_link" href="http://dubai.dubizzle.com/motors/used-cars/ford/explorer/2013/7/1/ford-explorer-2012-new-model-expat-leaving-2/?back=ZHViYWkuZHViaXp6bGUuY29tL21vdG9ycy91c2VkLWNhcnMv&pos=1">FORD EXPLORER - 2012 - NEW MODEL - EXPAT LEAV...</a>
</span>
</h3>
<div class="price">
AED 118,000
<br>
</div>
</div>
</div>
Here is my code:
var allCarResults = rootNode.SelectNodes("//div[normalize-space(@class)='listing-item']");
foreach (var carResult in allCarResults)
{
var dataNode = carResult.SelectSingleNode(".//div[@class='block item-title']");
var carNameNode = dataNode.SelectSingleNode(".//h3/a");
string carName = carNameNode.InnerText.Trim();
}
This gives me an object reference (null reference) error when I read carName. What am I doing wrong here?
dataNode.SelectSingleNode(".//h3/a"); tries to select an <a> node directly under an <h3> that is somewhere under dataNode.
However, in your case there is a <span> in between, so the call returns null. Use dataNode.SelectSingleNode(".//h3//a"); instead (note the // between h3 and a) to get an <a> node anywhere below an <h3>.
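A minimal sketch of the difference between / (direct child) and // (any descendant), and of reading the link, title and price the question asks for. It assumes the HtmlAgilityPack NuGet package. ListingDemo, LoadFirstListing and SampleHtml are illustrative names, and the sample markup is a shortened copy of the question's HTML with an abbreviated href:

```csharp
using System;
using HtmlAgilityPack;

public static class ListingDemo
{
    // Shortened copy of the listing markup from the question (href abbreviated).
    public const string SampleHtml =
        "<div class=\"listing-item\"><div class=\"block item-title\">" +
        "<h3 id=\"title\"><span class=\"title\">" +
        "<a href=\"http://dubai.dubizzle.com/motors/used-cars/\">FORD EXPLORER - 2012</a>" +
        "</span></h3>" +
        "<div class=\"price\"> AED 118,000 <br></div>" +
        "</div></div>";

    // Hypothetical helper: parses the HTML and returns the first listing node.
    public static HtmlNode LoadFirstListing(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode("//div[normalize-space(@class)='listing-item']");
    }

    public static void Main()
    {
        HtmlNode listing = LoadFirstListing(SampleHtml);

        // "/" requires <a> to be a direct child of <h3>; here a <span> sits in between.
        HtmlNode directChild = listing.SelectSingleNode(".//h3/a");
        Console.WriteLine(directChild == null); // True

        // "//" accepts <a> at any depth below <h3>.
        HtmlNode link = listing.SelectSingleNode(".//h3//a");
        HtmlNode price = listing.SelectSingleNode(".//div[@class='price']");

        if (link != null)
        {
            Console.WriteLine(link.InnerText.Trim());
            Console.WriteLine(link.GetAttributeValue("href", ""));
        }
        if (price != null)
        {
            Console.WriteLine(price.InnerText.Trim());
        }
    }
}
```

Checking each SelectSingleNode result for null before using it avoids the object reference error when a listing is missing one of the elements.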
I want to read each node in the collection, but when I call SelectSingleNode while iterating I keep getting the same object; only node.Id changes.
What I am trying to do is read the web response of a given site and extract some information, such as values and links, from specific elements.
int offSet = 0;
string address = "http://www.testsite.de/ergebnisliste.html?offset=" + offSet;
HtmlWeb web = new HtmlWeb();
//web.OverrideEncoding = Encoding.UTF8;
HtmlDocument doc = web.Load(address);
HtmlNodeCollection collection = doc.DocumentNode.SelectNodes("//div[@itemtype='http://schema.org/Posting']");
foreach (HtmlNode node in collection) {
string id = HttpUtility.HtmlDecode(node.Id);
string cpname = HttpUtility.HtmlDecode(node.SelectSingleNode("//span[@itemprop='name']").InnerText);
string cptitle = HttpUtility.HtmlDecode(node.SelectSingleNode("//span[@itemprop='title']").InnerText);
string cpaddress = HttpUtility.HtmlDecode(node.SelectSingleNode("//span[@itemprop='addressLocality']").InnerText);
string date = HttpUtility.HtmlDecode(node.SelectSingleNode("//div[@itemprop='datePosted']").InnerText);
string link = "http://www.testsite.de" + HttpUtility.HtmlDecode(node.SelectSingleNode("//div[@class='h3 title']//a[@href]").GetAttributeValue("href", "default"));
}
For example, this is the HTML for one iteration:
<div id="66666" itemtype="http://schema.org/Posting">
<div>
<a>
<img />
</a>
</div>
<div>
<div class="h3 title">
<a href="/test.html" title="Test">
<span itemprop="title">Test</span>
</a>
</div>
<div>
<span itemprop="name">TestName</span>
</div>
</div>
<div>
<div>
<div>
<div>
<span itemprop="address">Test</span>
</div>
<span>
<a>
<span><!-- --></span>
<span></span>
</a>
</span>
</div>
</div>
<div itemprop="date">
<time datetime="2013-03-01">01.03.13</time>
</div>
</div>
By writing
node.SelectSingleNode("//span[@itemprop='name']").InnerText
you are effectively writing
doc.DocumentNode.SelectSingleNode("//span[@itemprop='name']").InnerText
because an XPath that starts with // searches from the document root. To get what you want, write it like this: node.SelectSingleNode(".//span[@itemprop='name']").InnerText.
The leading dot (period) makes the search relative to the current node (node) instead of the whole document (doc).
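The effect can be seen in a small sketch, assuming the HtmlAgilityPack NuGet package. RelativeXPathDemo, ReadNames and the two-posting sample HTML are invented for illustration and are much simpler than the real page:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RelativeXPathDemo
{
    // Two simplified postings, each with its own name span.
    public const string SampleHtml =
        "<div id=\"1\" itemtype=\"http://schema.org/Posting\"><span itemprop=\"name\">First</span></div>" +
        "<div id=\"2\" itemtype=\"http://schema.org/Posting\"><span itemprop=\"name\">Second</span></div>";

    // Hypothetical helper: runs nameXPath against every posting node and collects the text.
    public static List<string> ReadNames(string html, string nameXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();
        HtmlNodeCollection postings =
            doc.DocumentNode.SelectNodes("//div[@itemtype='http://schema.org/Posting']");
        if (postings == null)
        {
            return names; // no postings found
        }

        foreach (HtmlNode posting in postings)
        {
            HtmlNode nameNode = posting.SelectSingleNode(nameXPath);
            names.Add(nameNode == null ? "(missing)" : nameNode.InnerText);
        }
        return names;
    }

    public static void Main()
    {
        // Absolute path: every iteration finds the first span in the whole document.
        Console.WriteLine(string.Join(", ", ReadNames(SampleHtml, "//span[@itemprop='name']")));  // First, First

        // Relative path: each iteration searches only inside the current posting.
        Console.WriteLine(string.Join(", ", ReadNames(SampleHtml, ".//span[@itemprop='name']"))); // First, Second
    }
}
```

The null check on nameNode also matters for the question's markup, where some itemprop values in the sample HTML (address, date) differ from the ones the code searches for (addressLocality, datePosted).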