Consider the following example, where a ul element's id is known, and we want to Click() the li element it contains whose li.Text equals a certain text.
Here are two working solutions to this problem:
Method 1: Using XPath
ReadOnlyCollection<IWebElement> lis = driver.FindElements(By.XPath("//ul[@id='id goes here']/li"));
foreach (IWebElement li in lis) {
    if (li.Text == text) {
        li.Click();
        break;
    }
}
Method 2: Using ID and TagName
IWebElement ul = driver.FindElement(By.Id("id goes here"));
ReadOnlyCollection<IWebElement> lis = ul.FindElements(By.TagName("li"));
foreach (IWebElement li in lis) {
    if (li.Text == text) {
        li.Click();
        break;
    }
}
My question is: When should we use XPath and when shouldn't we?
I prefer to use XPath only when necessary. For this specific example I think XPath is completely unnecessary, yet when I looked the problem up on Stack Overflow, the majority of users seemed to default to XPath.
In this particular case, XPath can even simplify the problem to a single line:
driver.FindElement(By.XPath(String.Format("//ul[@id='id goes here']/li[. = '{0}']", text))).Click();
In general though, if you can uniquely identify an element using a simple By.Id, By.TagName or other similarly "simple" locator, do it. XPath and CSS selector based locators usually either provide advanced ways to locate elements (going up/down/sideways in the tree, partial attribute matches, counting elements, determining their position, etc.) or make the element's location more concise, as in this particular situation.
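To illustrate a couple of those "advanced" capabilities, here are some hedged one-liners (the class name, index and text below are made-up placeholders, not taken from the example above):
// Partial attribute match: any li whose class attribute contains "item"
driver.FindElement(By.XPath("//li[contains(@class, 'item')]"));
// Position: the third li under the ul
driver.FindElement(By.XPath("//ul[@id='id goes here']/li[3]"));
// Going up the tree: the ul that contains an li with a given text
driver.FindElement(By.XPath("//li[. = 'some text']/ancestor::ul"));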
When you need to work with several similar web elements, use XPath.
When you need one particular element, use its id.
XPath has the additional advantage that ids sometimes get duplicated on a page.
This is my experience!
Related
I have this XPath:
/html/body/div[@id='page']/div[@id='page-inner']/div[@id='main-box']/div[@class='in1']/div[@id='content-and-context']/div[@id='content']/div[@class='under-bar']/table[@class='flights']/tbody/tr[@id='flight-932539']/td[2]
But the flight number changes. Can I find elements with a partial XPath?
I use a foreach() loop and write out the data for every flight.
First things first: don't use an absolute path. Even the smallest change in the HTML invalidates the path, especially in dynamic applications. Your XPath could simply be //tr[@id='flight-932539']/td[2]
As for your question, you can use contains() for a partial id match:
//tr[contains(@id, 'flight-')]/td[2]
As Guy mentioned with the XPath above, you can likewise use findElements to find all the flight details and then perform your actions in a loop.
List<WebElement> webElements = driver.findElements(By.xpath("//tr[contains(@id, 'flight-')]/td[2]"));
for (WebElement element : webElements) {
    // perform any operation on the cell here; for example, read its text
    element.getText();
}
The above example is in Java; you can do the same in C# as well.
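For reference, a minimal C# sketch of the same approach (assuming an IWebDriver instance named driver):
ReadOnlyCollection<IWebElement> cells = driver.FindElements(By.XPath("//tr[contains(@id, 'flight-')]/td[2]"));
foreach (IWebElement cell in cells)
{
    // perform any operation on the cell here; for example, read its text
    string flightDetails = cell.Text;
}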
C# Selenium Webdriver
So I need to ensure that none of my pages (around 200 pages) contains a particular known string. Is there any way I can scan a page for the existence of this string and, if it is found, return both the ID of that element and the entire string?
For example my source is like:
<a id="cancel_order_lnkCancel">Cancel Order</a>
I want to search for the word 'Cancel' on the whole page (<div id="sitewrapper">) and return both
cancel_order_lnkCancel;Cancel Order
Thanks
You can use XPath to find an element by its text, e.g.:
var element = driver.FindElement(By.XPath(string.Format("//*[contains(text(), '{0}')]", value)));
value being the string you are searching for.
Then to get the element's markup and content:
var html = element.GetAttribute("outerHTML");
var text = element.Text;
or
var text = element.GetAttribute("innerHTML");
I haven't worked with the C# binding, but you can use FindElements to get a list of all elements containing the text. You can no doubt use @Jarga's XPath. The good thing with FindElements is that it won't throw an exception (at least that is what happens in Java), though you have to handle the case where GetAttribute returns null because an element has no id. And if you iterate over the list, you can fetch all the texts using the getText method (the Text property in the C# binding).
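Putting those pieces together, a hedged C# sketch that prints the requested id;text pairs (scoping the search to the sitewrapper div; the variable names are assumptions):
var matches = driver.FindElements(By.XPath(string.Format("//div[@id='sitewrapper']//*[contains(text(), '{0}')]", value)));
foreach (var element in matches)
{
    // GetAttribute returns null when the element has no id attribute
    var id = element.GetAttribute("id") ?? "(no id)";
    Console.WriteLine(id + ";" + element.Text);
}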
I am having an issue with XPath syntax, as I don't understand how to use it to extract certain HTML statements.
I am trying to load a videos information from a channel page; http://www.youtube.com/user/CinemaSins/videos
I know there is a line that holds all the details: views, title, ID, etc.
Here is what I am trying to get from within the html:
That's line 2836:
<div class="yt-lockup clearfix yt-lockup-video yt-lockup-grid context-data-item" data-context-item-id="ntgNB3Mb08Y" data-context-item-views="243,456 views" data-context-item-time="9:01" data-context-item-type="video" data-context-item-user="CinemaSins" data-context-item-title="Everything Wrong With The Chronicles Of Riddick In 8 Minutes Or Less">
I'm not sure how, but I have the HTML Agility Pack added as a resource and have started attempts at using it.
Can someone explain how to get all of those details and the XPath syntax involved?
What I have attempted:
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='yt-lockup clearfix yt-lockup-video yt-lockup-grid context-data-item']//a"))
{
    if (node.ChildNodes[0].InnerHtml != String.Empty)
    {
        title.Add(node.ChildNodes[0].InnerHtml);
    }
}
^ The above code works, but it only gets the title of each video, and it also picks up a blank entry as well.
Your XPath is selecting the <a> element inside the <div>. If you want the attributes of the <div> too, then you need to either
a) select both elements and process them separately, or
b) run several XPath queries where you specify the exact attribute you want.
Let's go with (a) for this example.
var nodes = doc.DocumentNode.SelectNodes("//div[@class='yt-lockup clearfix yt-lockup-video yt-lockup-grid context-data-item']");
and get the attributes and title like so:
foreach (var node in nodes)
{
    foreach (var attribute in node.Attributes)
    {
        // ... Get the values of the attributes here.
    }
    // The leading dot keeps the search relative to the current node
    // (a bare //a would search the whole document again).
    var linkNodes = node.SelectNodes(".//a");
    // ... Get the InnerHtml as per your own example.
}
I hope this was clear enough. Good luck.
It seems the answer given to me did not help whatsoever, so after HEAPS of digging I finally understand how XPath works and managed to do it myself, as seen below:
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='yt-lockup clearfix yt-lockup-video yt-lockup-grid context-data-item']"))
{
    String val = node.Attributes["data-context-item-id"].Value;
    videoid.Add(val);
}
I just had to grab the content within the class. Knowing this made it a lot easier to use.
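The same pattern extends to the other details in that div; a hedged sketch (the views and titles lists are made-up names, and GetAttributeValue is used so a missing attribute returns a fallback instead of throwing):
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='yt-lockup clearfix yt-lockup-video yt-lockup-grid context-data-item']"))
{
    videoid.Add(node.GetAttributeValue("data-context-item-id", ""));
    views.Add(node.GetAttributeValue("data-context-item-views", ""));
    titles.Add(node.GetAttributeValue("data-context-item-title", ""));
}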
I have some source code on a webpage that I wish to extract (I've narrowed it down to exactly what is relevant here):
<div class="sideInfoPlayer">
  <a class="signLink" href="spieler.php?uid=12345" title="Profile">
    <span class="wrap">Wagamama</span>
  </a>
</div>
Now the trick here is that I want to get the word Wagamama into a message box, but that word changes on every page of the site, so I need to find the element, and there is no ID to hook onto. Therefore I was thinking of first searching for the class named "sideInfoPlayer" and then finding the "wrap" class within that block.
I have written the below to get the first one but do not know how to tackle the second one and then get the desired value.
HtmlElementCollection col = webBrowser1.Document.GetElementsByTagName("div");
foreach (HtmlElement element in col)
{
    string cls = element.GetAttribute("className");
    if (String.IsNullOrEmpty(cls) || !cls.Equals("sideInfoPlayer"))
        continue;
}
I hope you can help unstuck me on this one.
You have better options. Look at http://htmlagilitypack.codeplex.com/
And here: How can i parse html string
First you'll need to add reference to HtmlAgilityPack library by downloading it manually or with NuGet package manager.
// loading html into an HtmlDocument
var doc = new HtmlWeb().Load("http://website.com/mypage");
// walking through all nodes of interest
foreach (var node in doc.DocumentNode.SelectNodes("//div[@class='sideInfoPlayer']//span[@class='wrap']"))
{
    // here is your text: node.InnerText
}
//div[@class='sideInfoPlayer']//span[@class='wrap'] is called an XPath expression, and this one literally means "get me all span elements with class=wrap that are inside a div element with class=sideInfoPlayer" (the double slash before span matters, because the span is nested inside an a element rather than being a direct child of the div).
I didn't test it, but it should work.
I'm scraping a number of websites using HtmlAgilityPack. The problem is that it seems to insist on inserting text nodes in most places, which are either empty or just contain a mass of \n, whitespace and \r characters.
They tend to cause me issues when I'm counting child nodes, since Firebug doesn't show them but HtmlAgilityPack does.
Is there a way of telling HtmlAgilityPack to stop doing this, or at least a way of clearing out these text nodes? (I want to keep the USEFUL ones, though.) While we're at it, the same goes for comment and script tags.
You can use the following extension method:
static class HtmlNodeExtensions
{
public static List<HtmlNode> GetChildNodesDiscardingTextOnes(this HtmlNode node)
{
return node.ChildNodes.Where(n => n.NodeType != HtmlNodeType.Text).ToList();
}
}
And call it like this:
List<HtmlNode> nodes = someNode.GetChildNodesDiscardingTextOnes();
There is a difference between "no whitespace" and "some whitespace" between two nodes, so all-whitespace text nodes are still needed and significant.
Couldn't you preprocess the HTML and remove all the nodes that you do not need before starting the "real scraping"?
See also this answer for the "how to remove".
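A hedged sketch of such preprocessing, removing whitespace-only text nodes, comments and script elements before scraping (doc is assumed to be an already loaded HtmlDocument; note the caveat above about all-whitespace text nodes sometimes being significant):
// materialize the list first so we don't mutate the tree while enumerating it
var disposable = doc.DocumentNode
    .Descendants()
    .Where(n => (n.NodeType == HtmlNodeType.Text && string.IsNullOrWhiteSpace(n.InnerText))
             || n.NodeType == HtmlNodeType.Comment
             || n.Name == "script")
    .ToList();
foreach (var node in disposable)
    node.Remove();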
Create an extension method that operates on the "Child" collection (or similar) of a node and uses some LINQ to filter out unwanted nodes. Then, when you traverse your tree, do something like this (FilterNodes standing in for whatever filter you write):
myNode.Children.FilterNodes().ForEach(x => { });
I am looking for a better answer. Here is my current method with respect to child nodes such as table rows and table cells. The nodes are identified by their names TR, TH and TD, so I strip out #text every time.
List<HtmlNode> rows = table.ChildNodes.Where(w => w.Name != "#text").ToList();
Sure, it is tedious, but it works and could be improved with an extension method.