My program needs to get the RSS link and then read the feed.
I found that when I parse down to the <div id="titleRSS_7224" class="rss"></div> layer,
the <a> inside it is gone.
I'm using HtmlAgilityPack.
I can see the <a> in Google Chrome:
<div id="titleRSS_7224" class="rss">
<a title="RSS 2.0" target="_blank" rel="nofollow" href="/rss/media/bz0xMiZmbHBsPTIxMjEzNjYsMjAsODQwLDAmZng9.rss"></a>
</div>
My code is:
HtmlDocument temNode= new HtmlDocument();
string temStr = page.DocumentNode.SelectSingleNode(longPath).InnerHtml;
temNode.LoadHtml(page.DocumentNode.SelectSingleNode(longPath).InnerHtml);
Then I checked both temStr and temNode, and the <a> is not in either of them.
I then had another idea, which is to do:
HtmlNode temNode = page.DocumentNode.SelectSingleNode("//a[@title='RSS 2.0']");
This works,
but I just want to know why the first method does not work.
Perhaps if you select the single node itself instead of its InnerHtml, you'll be able to enumerate the child nodes?
Just spitballing, though, as I'm not familiar with that API.
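A minimal sketch of that suggestion, assuming HtmlAgilityPack is referenced and using the markup quoted in the question as inline input (the class name RssLinkDemo is made up for illustration). It keeps the selected HtmlNode and walks its ChildNodes rather than re-parsing InnerHtml. If the <a> is still missing on the real page, the HTML the program downloads may simply differ from what Chrome displays; that is a guess, not something the question confirms.

```csharp
using System;
using HtmlAgilityPack;

class RssLinkDemo
{
    static void Main()
    {
        // Markup copied from the question; the page your program downloads may differ.
        const string html =
            "<div id=\"titleRSS_7224\" class=\"rss\">" +
            "<a title=\"RSS 2.0\" target=\"_blank\" rel=\"nofollow\" " +
            "href=\"/rss/media/bz0xMiZmbHBsPTIxMjEzNjYsMjAsODQwLDAmZng9.rss\"></a>" +
            "</div>";

        var page = new HtmlDocument();
        page.LoadHtml(html);

        // Select the node itself, not its InnerHtml string.
        HtmlNode div = page.DocumentNode.SelectSingleNode("//div[@id='titleRSS_7224']");
        if (div == null)
        {
            Console.WriteLine("div not found");
            return;
        }

        // Enumerate the direct children and print the href of each <a>.
        foreach (HtmlNode child in div.ChildNodes)
        {
            if (child.Name == "a")
            {
                Console.WriteLine(child.GetAttributeValue("href", ""));
            }
        }
    }
}
```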
I have dealt with an href button in the past and didn't find it too hard, but this button is being a pain. I have tried clicking it by XPath, class name, and link text; none have worked. I know there are plenty of similar questions out there, but most of the answers suggest things I am already trying. Below is the code I have. The one thing I haven't tried is JavascriptExecutor. Clicking the element from Chrome's console does work; I just can't get Selenium to do it. It throws an element-not-found error. It is also worth noting that I did not find any iframes that I need to switch to. The only thing that concerns me, and maybe I lack the Selenium knowledge to deal with it, is that the HTML uses header, main, section, div, ul, li, and a; I have seen all of these before except ul and li. Thank you for any help.
wait.Until(ExpectedConditions.ElementToBeClickable(By.ClassName("card-header-link float-md-right"))).Click();
wait.Until(ExpectedConditions.ElementToBeClickable(By.XPath("/html/body/div[1]/main/section[1]/div/ul/li[5]/a"))).Click(); //full xpath
wait.Until(ExpectedConditions.ElementToBeClickable(By.XPath("//*[@id='app']/main/section[1]/div/ul/li[5]/a"))).Click();
wait.Until(ExpectedConditions.ElementToBeClickable(By.LinkText("Security"))).Click();
HTML
<li data-v-91f16f3e="">
<a data-v-91f16f3e="" href="/security" class="">
<span data-v-91f16f3e="" class="icon icon-shield"></span>
<span data-v-91f16f3e="" class="text">Security</span></a>
</li>
Try this XPath:
//span[text()='Security']/..
or
//span[text()='Security']/parent::a[@href='/security']
In code:
wait.Until(ExpectedConditions.ElementToBeClickable(By.XPath("//span[text()='Security']/.."))).Click();
It is strange, though, that By.LinkText("Security") did not work.
Update 1:
Try this CSS selector:
div[class$='desktop'] li a[href$='security']
Code:
wait.Until(ExpectedConditions.ElementToBeClickable(By.CssSelector("div[class$='desktop'] li a[href$='security']"))).Click();
I see that the href attribute value is the lowercase "security", not "Security".
So please try this:
wait.Until(ExpectedConditions.ElementToBeClickable(By.XPath("//a[contains(@href,'security')]"))).Click();
I'm trying to find this span and click on it. There are multiple elements on the page with the same ID, so I need to find it by data-margin.
<span title="Add me!" onclick="addCalc(this)" id="chkSelectedPrice" class="glyphicon glyphicon-unchecked pointer-finger add-calc" data-productid="1534" data-margin="1.375" data-lpc="0" data-unadjustedplf="0.578" data-plf="0.578" data-isfixed="False" data-buyrate="0"></span>
I think you can achieve this with XPath. Selenium has built-in XPath support, so it should be pretty easy to query for your element.
https://www.guru99.com/xpath-selenium.html
Also, if you want to generate an XPath for elements on an existing website, you can try this plugin:
https://addons.mozilla.org/en-US/firefox/addon/truepath/
I haven't tested this, but I think your XPath will be something similar to
span[@data-margin="1.375"]
Here is the XPath:
//span[@data-margin="1.375"]
CSS:
span[data-margin="1.375"]
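For completeness, here is a small sketch of using that CSS selector from C#, assuming the Selenium WebDriver package is referenced and a driver is already on the right page. MarginLocator and ClickByMargin are hypothetical names made up for this example, not part of Selenium.

```csharp
using OpenQA.Selenium;

static class MarginLocator
{
    // Hypothetical helper: clicks the first span whose data-margin attribute
    // equals the given value, e.g. "1.375".
    public static void ClickByMargin(IWebDriver driver, string margin)
    {
        By locator = By.CssSelector($"span[data-margin='{margin}']");

        // FindElement returns the first match and throws NoSuchElementException
        // when nothing matches.
        driver.FindElement(locator).Click();
    }
}
```

Usage would be MarginLocator.ClickByMargin(driver, "1.375"). If several spans share the same data-margin, only the first one is clicked, so the selector may need a second attribute such as data-productid to be unique.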
I'm a bit confused about how to extract specific href links from an HTML page. There are certainly a good number of examples, but they seem to cover either gathering an href when there's just one on the page, or gathering all the links.
So I currently push the HTML document into a text file using HttpWebRequest, HttpWebResponse, and StreamReader.
Here's my little sample I'm working with, this just downloads the URL of my choice and saves it to a text file.
protected void btnURL_Click(object sender, EventArgs e)
{
    string url = txtboxURL.Text;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader sr = new StreamReader(response.GetResponseStream());
    //lblResponse.Text = sr.ReadToEnd();
    string urldata = sr.ReadToEnd();
    if (File.Exists(@"C:\Temp\test.txt"))
    {
        File.Delete(@"C:\Temp\test.txt");
    }
    File.Create(@"C:\Temp\test.txt").Close();
    File.WriteAllText(@"C:\Temp\test.txt", urldata);
    sr.Close();
    response.Close();
}
I can search the entire text file for a href, but there are a lot of them on each page, and the ones I'm looking for are sectioned in a <nav> tag, and then they are all in <div> tags with the same class, sort of like this:
<nav class="deptVertNav">
<div class="acTrigger">
<a href="*this is what I need to get*" ....
....
</a>
</div>
<div class="acTrigger">
<a href="*etc*" ....
....
</a>
</div>
<div class="acTrigger">
<a href="*etc*" ....
....
</a>
</div>
</nav>
Essentially I'm trying to create a text crawler/scraper to retrieve links. The current pages I'm working with start at a main page with links down the side on a navigation bar. Those links in the navigation bar are what I want to get to so I may download each of those page's content, and then retrieve the real data I'm looking for. So this is all just one big parse job, and I am terrible at parsing. If I can figure out how to parse this first main page then I will be able to parse the sub pages.
I don't want anyone to just give me the answer; I just want to know what a good method of parsing would be in this situation. That is, how do I narrow the parse down to just those tags, and what would be a good dynamic way to store those links so I can access them later? I hope this makes sense.
EDIT: Well I am now attempting to use HtmlAgilityPack with much confusion. To my knowledge this will retrieve all the nodes that are a <div class="acTrigger"> that are within the page I load:
var div = html.DocumentNode.SelectNodes("//div[@class='acTrigger']");
The next question is how I get inside the <div> tag and into the <a> tag, and then retrieve the href value, and store it.
Instead of trying to manually parse the text file, I would recommend placing the HTML in a HtmlDocument control (https://msdn.microsoft.com/en-us/library/system.windows.forms.htmldocument(v=vs.110).aspx) or WebBrowser control (https://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser(v=vs.110).aspx). This allows you to access the elements already parsed. From there you can easily find all DIV elements with the appropriate class, and then the A element inside of that.
Take a look at the Selenium Web Driver library. Then grab the urls as needed.
IWebElement anchorUrl1 = driver.FindElement(By.XPath("//nav[@class='deptVertNav']/div[1]/a[1]"));
string urlText1 = anchorUrl1.GetAttribute("href"); // .Text would return the link's visible text, not its URL
IWebElement anchorUrl2 = driver.FindElement(By.XPath("//nav[@class='deptVertNav']/div[2]/a[1]"));
string urlText2 = anchorUrl2.GetAttribute("href");
If all you want to do is click on them, then:
driver.FindElement(By.XPath("//nav[@class='deptVertNav']/div[1]/a[1]")).Click();
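If you would rather stay with HtmlAgilityPack, as in the question's edit, something along these lines should narrow the search to the anchors inside the nav and keep the href values in a list for later. This is only a sketch: it assumes the HtmlAgilityPack package is referenced, the sample markup and URLs in Main are invented placeholders, and NavLinkDemo and GetNavLinks are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class NavLinkDemo
{
    // Hypothetical helper: returns the href of every <a> directly inside a
    // <div class="acTrigger"> under <nav class="deptVertNav">.
    static List<string> GetNavLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes(
            "//nav[@class='deptVertNav']/div[@class='acTrigger']/a[@href]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", ""));
        }
        return links;
    }

    static void Main()
    {
        // Placeholder markup shaped like the snippet in the question.
        const string html =
            "<nav class=\"deptVertNav\">" +
            "<div class=\"acTrigger\"><a href=\"/dept/one\">One</a></div>" +
            "<div class=\"acTrigger\"><a href=\"/dept/two\">Two</a></div>" +
            "</nav>";

        foreach (string link in GetNavLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

A List<string> is probably enough for storage here; a Dictionary keyed by link text could work too if you need to look links up by name.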
I am trying to create a proper XPath expression in C# to click on a download button on the Amazon Business website. Nothing I have tried is able to find the button. Here are some of the things I've tried:
driver.FindElement(By.XPath("//button[@type='submit']")).Submit();
driver.FindElement(By.XPath("//span[contains(@class,'a-button-inner')][contains(text(),'downloadCSV_button-announce')]")).Submit();
driver.FindElement(By.XPath("//span[contains(@class,'a-button-inner')][contains(text(),'Download CSV')]")).Submit();
Below is the source code from the Amazon page. Can anyone help me design the proper XPath query to click this download button? Thank you.
<h1>Amazon Business Analytics</h1>
<div class="a-row a-spacing-medium a-grid-vertical-align a-grid-center">
<div class="a-column a-span12">
<span class="a-declarative" data-action="aba:download-csv" data-aba:download-csv="{}">
<span id="downloadCSV_button" class="a-button aok-float-right"><span class="a-button-inner"><input class="a-button-input" type="submit" aria-labelledby="downloadCSV_button-announce"><span id="downloadCSV_button-announce" class="a-button-text" aria-hidden="true">Download CSV</span></span></span>
</span>
You should try using WebElement#click() to click the element instead of Submit(), as below:
driver.FindElement(By.CssSelector("input.a-button-input[aria-labelledby = 'downloadCSV_button-announce']")).Click();
Or, if the span element is clickable, try:
driver.FindElement(By.Id("downloadCSV_button-announce")).Click();
Or:
driver.FindElement(By.Id("downloadCSV_button")).Click();
I have some HTML like the following:
<div class="class1 class2 class3">
<div class="class4 class5">
<span class="class6">GOAL STRING</span>
</div>
</div>
Now I want to find that GOAL STRING using HtmlAgilityPack.
How can I do that?
[With LINQ and without LINQ; please show both ways.]
Thanks in advance.
Well, you can use XPath to get the span directly:
document.DocumentNode.SelectSingleNode("//div[@class='class1 class2 class3']/div[@class='class4 class5']/span[@class='class6']").InnerText;
This is a good resource for XPath, specifically the table in the middle of the page:
http://www.codeproject.com/Articles/9494/Manipulate-XML-data-with-XPath-and-XmlDocument-C
Also, in Google Chrome you can right-click -> Inspect Element, then right-click the element that shows up in the tree and choose Copy XPath to get a starting point. These expressions can usually be simplified.
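Since the question asked for both styles, here is a rough side-by-side sketch, assuming HtmlAgilityPack is referenced and using the HTML from the question as inline input. The XPath is shortened to match on the span's class alone, which is a simplification of the expression above, and GoalStringDemo is a made-up class name.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class GoalStringDemo
{
    static void Main()
    {
        // Markup from the question.
        const string html =
            "<div class=\"class1 class2 class3\">" +
            "<div class=\"class4 class5\">" +
            "<span class=\"class6\">GOAL STRING</span>" +
            "</div></div>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Without LINQ: XPath lookup; returns null when nothing matches.
        HtmlNode viaXPath = document.DocumentNode
            .SelectSingleNode("//span[@class='class6']");

        // With LINQ: filter the descendant <span> elements by class attribute.
        HtmlNode viaLinq = document.DocumentNode
            .Descendants("span")
            .FirstOrDefault(s => s.GetAttributeValue("class", "") == "class6");

        Console.WriteLine(viaXPath?.InnerText);
        Console.WriteLine(viaLinq?.InnerText);
    }
}
```

Both lookups compare the whole class attribute, so they would miss a span whose class attribute lists more than one class; contains() in XPath or a Split-based check in LINQ would be needed for that case.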