Selenium C# - Web element attribute is present but cannot be found? - c#

I'm running into an issue with a function I wrote for a Selenium test case. When I run a jQuery query on a web element ID (#AmountToggle), it displays all the attributes. I want to verify this one ("lastChild"),
but when I run this code:
driver.FindElement(By.CssSelector("#AmountToggle")).GetAttribute("lastChild")
it returns null.
Why is this, and how can I get the correct value of this attribute?

So it looks like "lastChild" is a property, not an attribute. Although I cannot assert that for sure with what you have above.
The difference being, an attribute would appear as something in the direct html, such as:
<a href="..." id="...">Link</a>
Where href and id are attributes. If lastChild doesn't appear in the html like the above examples, it won't be considered an attribute.
First try comparing these two in the javascript console:
$("#AmountToggle").attr("lastChild")
$("#AmountToggle").prop("lastChild")
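If the console shows that lastChild only comes back through .prop(), one way to read it from C# is to ask the browser for the property through Selenium's IJavaScriptExecutor. The following is only a minimal sketch under stated assumptions: the class and method names (DomPropertyReader, GetLastChildNodeName) are hypothetical, and it returns the node name of lastChild rather than the node itself, assuming a string is enough for the assertion.

```csharp
using OpenQA.Selenium;

public static class DomPropertyReader
{
    // Hypothetical helper: returns the nodeName of the element's lastChild,
    // or null when the element has no child nodes.
    public static string GetLastChildNodeName(IWebDriver driver, By locator)
    {
        IWebElement element = driver.FindElement(locator);
        var js = (IJavaScriptExecutor)driver;
        // Read the DOM property in the browser and send back a plain string.
        object result = js.ExecuteScript(
            "var c = arguments[0].lastChild; return c ? c.nodeName : null;",
            element);
        return result as string;
    }
}
```

It could be called as DomPropertyReader.GetLastChildNodeName(driver, By.CssSelector("#AmountToggle")).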
The following is a workaround when you have issues with Selenium finding things. This logic will allow you to find things inside of iframes with ease, and also will allow you to use pseudo-selectors to find elements:
public static string GetFullyQualifiedXPathToElement(string cssSelector, bool isFullJQuery = false, bool noWarn = false)
{
    // Treat the input as full jQuery if it already contains a $( call.
    if (cssSelector.Contains("$(") && !isFullJQuery) {
        isFullJQuery = true;
    }
    // JavaScript helper that walks up the DOM and builds an explicit XPath.
    string finder_method = @"
        function getPathTo(element) {
            if(typeof element == 'undefined') return '';
            if (element.tagName == 'HTML')
                return '/HTML[1]';
            if (element===document.body)
                return '/HTML[1]/BODY[1]';
            var ix= 0;
            var siblings = element.parentNode.childNodes;
            for (var i= 0; i< siblings.length; i++) {
                var sibling= siblings[i];
                if (sibling===element)
                    return getPathTo(element.parentNode)+'/'+element.tagName+'['+(ix+1)+']';
                if (sibling.nodeType===1 && sibling.tagName===element.tagName)
                    ix++;
            }
        }
    ";
    if (isFullJQuery) {
        cssSelector = cssSelector.TrimEnd(';');
    }
    // For full jQuery input the expression is used as-is; otherwise it is wrapped in $('...').
    string executable = isFullJQuery
        ? string.Format("{0} return getPathTo({1}[0]);", finder_method, cssSelector)
        : string.Format("{0} return getPathTo($('{1}')[0]);", finder_method, cssSelector.Replace("'", "\""));
    string xpath = string.Empty;
    try {
        xpath = BaseTest.Driver.ExecuteJavaScript<string>(executable);
    } catch (Exception) {
        if (!noWarn) {
            //Warn about failure with custom message.
        }
    }
    if (!noWarn && string.IsNullOrEmpty(xpath)) {
        //Warn about failure with custom message.
        //string.Format("Supplied cssSelector did not point to an element. Selector is \"{0}\".", cssSelector);
    }
    return xpath;
}
This method uses jQuery, which has more extensive search options than plain CSS selectors (such as pseudo-selectors) and, given a good search query, finds the element reliably. Once jQuery has found the element, the method generates an explicit XPath to that element in the DOM and returns it. With the explicit XPath, you can then tell Selenium to find the element using By.XPath.
It looks like the value of lastChild is an element itself. If that is true, here is how you might use this in your example:
driver.FindElement(By.XPath(GetFullyQualifiedXPathToElement("$($('#AmountToggle').prop('lastChild'))", true)));
Note three things here. The first is that I used "prop" in jQuery. Change that to "attr" if that was the correct call. Second, prop('lastChild') returns a regular JavaScript DOM node rather than a jQuery object, so the expression wraps it in $(...) again: the method above appends the [0] index itself to turn the jQuery object back into a regular DOM element. The final thing to note is the cssSelector value passed in. You can pass in just a selector to this method, such as "#SomeElementId > div", or you can pass in full jQuery, such as "$('#SomeElementId > div')".

Related

Adding conditional statements to an XPath query

I am able to retrieve data using C# & XPath and display it in a list, but I would like to know how to perform two unique actions.
To start with, my code example looks like this:
protected async override void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);
    string htmlPagePurchase = "";
    using (var client = new HttpClient())
    {
        htmlPagePurchase = await client.GetStringAsync(MyURI);
    }
    HtmlDocument htmlDocumentPurchase = new HtmlDocument();
    htmlDocumentPurchase.LoadHtml(htmlPagePurchase);
    foreach (var div in htmlDocumentPurchase.DocumentNode.SelectNodes("//div[contains(@id, 'odyContent')]"))
    {
        PurchaseDetails newPurchase = new PurchaseDetails();
        newPurchase.Expiry = div.SelectSingleNode(".//ex1").InnerText.Trim();
        Purchase.Add(newPurchase);
    }
    lstPurchase.ItemsSource = Purchase;
}
Firstly, if there is no "ex1" node within the page, can I request a null value be returned or for it to be ignored? I need to do this as some of the pages I use contain the data I want in an alternative node (I can't control this) and I don't want the app to crash if one of the nodes isn't contained within that page.
Secondly, if the node contains no text within it, can I force an output i.e. within a list of "ex1" nodes, some contain an expired date but one "ex1" node does not include any date as it hasn't expired yet. When that happens can I return my own value of 'hasn't expired', for example?
This is being compiled in a Windows Phone 8.0 Silverlight App.
This code should work by checking the node and value, and using your defaultValue if no real value is found.
var node = xmlDoc.SelectSingleNode(".//ex1");
return (node == null || string.IsNullOrEmpty((node.InnerText ?? "").Trim()) ? defaultValue : node.InnerText.Trim());
.NET Fiddle: https://dotnetfiddle.net/3DAjKH
UPDATE FOR INTEGRATING WITH PROVIDED CODE SAMPLE
This should work within your loop.
var exNode = div.SelectSingleNode(".//ex1");
if (exNode == null || string.IsNullOrEmpty((exNode.InnerText ?? "").Trim()))
    newPurchase.Expiry = "N/A"; // Default value
else
    newPurchase.Expiry = exNode.InnerText.Trim(); // Reuse the node already looked up
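The same check can be pulled into a small reusable helper. This is a minimal sketch, not the answerer's code: the names NodeText and GetTextOrDefault are hypothetical, and it uses System.Xml's XmlNode so that it compiles without extra packages. HtmlAgilityPack's HtmlNode has members with the same names (SelectSingleNode, InnerText), so the same shape should carry over, but that port is an assumption left to the reader.

```csharp
using System;
using System.Xml;

public static class NodeText
{
    // Returns the trimmed text of the first node matching xpath under parent,
    // or defaultValue when the node is missing or holds only whitespace.
    public static string GetTextOrDefault(XmlNode parent, string xpath, string defaultValue)
    {
        XmlNode node = parent.SelectSingleNode(xpath);
        if (node == null)
            return defaultValue;            // node not present on this page
        string text = (node.InnerText ?? "").Trim();
        return text.Length == 0 ? defaultValue : text; // empty node: use fallback
    }
}
```

Inside the loop this would read something like newPurchase.Expiry = NodeText.GetTextOrDefault(div, ".//ex1", "hasn't expired").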

Get HtmlAgilityPack Node using exact HTML search or Converting HTMLElement to HTMLNode

I have created a HTMLElement picker (DOM) by using the default .net WebBrowser.
The user can pick (select) a HTMLElement by clicking on it.
I want to get the HtmlAgilityPack.HTMLNode corresponding to the HTMLElement.
The easiest way (in my mind) is to use doc.DocumentNode.SelectSingleNode(EXACTHTMLTEXT) but it does not really work (because the function only accepts xpath code).
How can I do this?
A sample HTMLElement selected by a user looks like this (the OuterHtml code):
<a onmousedown="return wow" class="l" href="http://site.com"><em>Great!!!</em> <b>come and see more</b></a>
Of course, any element can be selected, that's why I need a way to get the HTMLNode.
Same concept, but a bit simpler because you don't have to know the element type:
HtmlNode n = doc.DocumentNode.Descendants().Where(x => x.OuterHtml.Equals(text, StringComparison.InvariantCultureIgnoreCase)).FirstOrDefault();
I came up with a solution. I don't know if it's the best one (if somebody knows a better way to achieve this, I would appreciate hearing about it).
Here is the method that will get the HtmlNode:
public HtmlNode GetNode(string text)
{
    if (text.StartsWith("<"))
    {
        // Get the type of the element (a, p, div, etc.).
        // The tag name ends at the first space, '>' or '/'.
        string type = "";
        for (int i = 1; i < text.Length; i++)
        {
            if (text[i] == ' ' || text[i] == '>' || text[i] == '/')
            {
                type = text.Substring(1, i - 1);
                break;
            }
        }
        try
        {
            // Check whether any node of that type has an OuterHtml equal to the
            // HtmlElement's OuterHtml. If such a node exists, that's the node we want.
            HtmlNode n = doc.DocumentNode.SelectNodes("//" + type).Where(x => x.OuterHtml == text).First();
            return n;
        }
        catch (Exception)
        {
            throw new Exception("Cannot find the HTML element in the HTML Page");
        }
    }
    else
    {
        throw new Exception("Invalid HTML Element supplied. The selected HTML element must start with <");
    }
}
The idea is that you pass the OuterHtml of the HtmlElement. Example:
HtmlElement el=....
HtmlNode N = GetNode(el.OuterHtml);

Find a string list whether containing same element more than once

I am writing my own specific web crawler for product-selling websites. Because they are coded so badly, I keep getting different URLs that point to the same page.
Example one
http://www.hizlial.com/bilgisayar/bilgisayar-bilesenleri/bilgisayar/yazicilar/samsung-scx-3200-tarayici-fotokopi-lazer-yazici_30.033.1271.0043.htm
For example the page above is same as below
http://www.hizlial.com/bilgisayar-bilesenleri/bilgisayar/yazicilar/samsung-scx-3200-tarayici-fotokopi-lazer-yazici_30.033.1271.0043.htm
As you can see, the first URL contains the "bilgisayar" element twice when you split it on the '/' character.
So what I want is to split URLs like this:
string[] lstSPlit = srURL.Split('/');
After that, I want to check whether the list contains the same element more than once (any element). If it does, I will skip the URL, because I will already have extracted the real URL from some other page. What is the best way of doing this?
Longer but working version
string[] lstSPlit = srHref.Split('/');
bool blDoNotAdd = false;
HashSet<string> splitHashSet=new HashSet<string>();
foreach (var vrLstValue in lstSPlit)
{
    if (vrLstValue.Length > 1)
    {
        if (splitHashSet.Contains(vrLstValue) == false)
        {
            splitHashSet.Add(vrLstValue);
        }
        else
        {
            blDoNotAdd = true;
            break;
        }
    }
}
if (list.Distinct().Count() < list.Count)
This ought to be faster than grouping. (I haven't measured)
You can make it even faster by writing your own extension method that adds items to a HashSet<T> and returns false immediately if Add() returns false.
You can even do that using a wicked shorthand:
if (!list.All(new HashSet<string>().Add))
if(lstSPlit.GroupBy(i => i).Where(g => g.Count() > 1).Any())
{
// found more than once
}
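The early-exit HashSet idea mentioned above can be sketched as a small helper. The names UrlSegments and HasRepeatedSegment are hypothetical, and dropping empty segments (such as the one between the two slashes after "http:") is an assumption made here; the asker's own version skips every segment of length 1 or less instead.

```csharp
using System;
using System.Collections.Generic;

public static class UrlSegments
{
    // Returns true as soon as any '/'-separated segment of the URL repeats.
    public static bool HasRepeatedSegment(string url)
    {
        var seen = new HashSet<string>();
        // Assumption: empty segments are ignored rather than counted as duplicates.
        foreach (string segment in url.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // HashSet<T>.Add returns false when the item is already in the set.
            if (!seen.Add(segment))
                return true;
        }
        return false;
    }
}
```

With the two URLs from the question, the first one (with "bilgisayar" twice) would be flagged and the second would not.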

HtmlElement.Parent returns wrong parent

I'm trying to generate CSS selectors for random elements on a webpage by means of C#. Some background:
I use a form with a WebBrowser control. While navigating one can ask for the CSS selector of the element under the cursor. Getting the html-element is trivial, of course, by means of:
WebBrowser.Document.GetElementFromPoint(<Point>);
The ambition is to create a 'strict' css selector leading up to the element under the cursor, a-la:
html > body > span:eq(2) > li:eq(5) > div > div:eq(3) > span > a
This selector is based on :eq operators since it's meant to be handled by jQuery and/or SizzleJS (these two support :eq; original CSS selectors don't. Thumbs up @BoltClock for helping me clarify this). So, you get the picture. In order to achieve this goal, we supply the retrieved HtmlElement to the method below and start ascending the DOM tree by asking for the Parent of each element we come across:
private static List<String> GetStrictCssForHtmlElement(HtmlElement element)
{
    List<String> familyTree;
    for (familyTree = new List<String>(); element != null; element = element.Parent)
    {
        string ordinalString = CalculateOrdinalPositionAmongSameTagSimblings(element);
        if (ordinalString == null) return null;
        familyTree.Add(element.TagName.ToLower() + ordinalString);
    }
    familyTree.Reverse();
    return familyTree;
}
private static string CalculateOrdinalPositionAmongSameTagSimblings(HtmlElement element, bool simplifyEq0 = true)
{
    int count = 0;
    int positionAmongSameTagSimblings = -1;
    if (element.Parent != null)
    {
        foreach (HtmlElement child in element.Parent.Children)
        {
            if (element.TagName.ToLower() == child.TagName.ToLower())
            {
                count++;
                if (element == child)
                {
                    positionAmongSameTagSimblings = count - 1;
                }
            }
        }
        if (positionAmongSameTagSimblings == -1) return null; // Couldn't find child in parent's offsprings!?
    }
    return ((count > 1) ? (":eq(" + positionAmongSameTagSimblings + ")") : ((simplifyEq0) ? ("") : (":eq(0)")));
}
This method has worked reliably for a variety of pages. However, there's one particular page which does my head in:
http://www.delicious.com/recent
Trying to retrieve the CSS selector of any element in the list (at the center of the page) fails for one very simple reason:
After the ascension hits the first SPAN element on its way up (you can spot it by inspecting the page with IE9's web-dev tools for verification), it tries to process it by calculating its ordinal position among its same-tag siblings. To do that we need to ask its Parent node for the siblings. This is where things get weird. The SPAN element reports that its Parent is a DIV element with id="recent-index". However, that's not the immediate parent of the SPAN (the immediate parent is LI class="wrap isAdv"). This causes the method to fail because, unsurprisingly, it fails to spot SPAN among the children.
But it gets even weirder. I retrieved and isolated the HtmlElement of the SPAN itself. Then I got it's Parent and used it to re-descend back down to the SPAN element using:
HtmlElement regetSpanElement = spanElement.Parent.Children[0].Children[1].Children[1].Children[0].Children[2].Children[0];
This led us back to the SPAN node we began with, with one twist however:
regetSpanElement.Parent.TagName;
This now reports LI as the parent X-X. How can this be? Any insight?
Thank you again in advance.
Notes:
I saved the Html code (as it's presented inside WebBrowser.Document.Html) and inspected it myself to be 100% sure that nothing funny is taking place (aka different code served to WebBrowser control than the one I see in IE9 - but that's not happening the structure matches 100% for the path concerned).
I am running WebBrowser control in IE9-mode using the instructions outlined here:
http://www.west-wind.com/weblog/posts/2011/May/21/Web-Browser-Control-Specifying-the-IE-Version
Trying to get WebBrowser control and IE9 to run as similarly as possible.
I suspect that the effects observed might be due to some script running behind my back. However my knowledge is not so far reaching in terms of web-programming to pin it down.
Edit: Typos
Relying on :eq() is tough! It is difficult to reliably re-select out of a DOM that is dynamic. Sure it may work on very static pages, but things are only getting more dynamic every day. You might consider changing strategy a little bit. Try using a smarter more flexible selector. Perhaps pop in some javascript like so:
predictCss = function(s, noid, noclass, noarrow) {
    var path, node = s;
    var psep = noarrow ? ' ' : ' > ';
    if (s.length != 1) return path; //throw 'Requires one element.';
    while (node.length) {
        var realNode = node[0];
        var name = (realNode.localName || realNode.tagName || realNode.nodeName);
        if (!name || name == '#document') break;
        name = name.toLowerCase();
        // Only qualify the name when it is ambiguous among its siblings.
        if (node.parent().children(name).length > 1) {
            if (realNode.id && !noid) {
                try {
                    var idtest = $(name + '#' + realNode.id);
                    if (idtest.length == 1) return name + '#' + realNode.id + (path ? '>' + path : '');
                } catch (ex) {} // just ignore the exception, it was a bad ID
            } else if (realNode.className && !noclass) {
                name += '.' + realNode.className.split(/\s+/).join('.');
            }
        }
        var parent = node.parent();
        if (name[name.length - 1] == '.') {
            name = name.substring(0, name.length - 1);
        }
        var siblings = parent.children(name);
        //// If you really want to use eq:
        //if (siblings.length > 1) name += ':eq(' + siblings.index(node) + ')';
        path = name + (path ? psep + path : '');
        node = parent;
    }
    return path;
};
And use it to generate a variety of selectors:
var elem = $('#someelement');
var epath = self.model.util.predictCss(elem, true, true, false);
var epathclass = self.model.util.predictCss(elem, true, false, false);
var epathclassid = self.model.util.predictCss(elem, false, false, false);
Then use each:
var relem= $(epathclassid);
if(relem.length === 0){
relem = $(epathclass);
if(relem.length === 0){
relem = $(epath);
}
}
And if your best selector still comes out with more than one element, you'll have to get creative in how you match a DOM element: perhaps Levenshtein distance, or perhaps there is some specific text, or you can fall back to eq. Hope that helps!
Btw, I assumed you have jQuery - due to the sizzle reference. You could inject the above in a self-executing anonymous function in a script tag appended to the last child of body for example.

How to get values from tag? [duplicate]

I am developing a Windows Forms application which is interacting with a web site.
Using a WebBrowser control I am controlling the web site and I can iterate through the tags using:
HtmlDocument webDoc1 = this.webBrowser1.Document;
HtmlElementCollection aTags = webDoc1.GetElementsByTagName("a");
Now, I want to get a particular text from the tag which is below:
<a href="issue?status=-1,1,2,3,4,5,6,7&amp;...&amp;assignedto=244&amp;...">Show Assigned</a><br>
For example, here I want to get the number 244, which is the value of assignedto in the tag above, and save it into a variable for further use.
How can I do this?
You can try splitting the string on '&' and ';' (the ';' also covers HTML-encoded "&amp;" separators), and then splitting each piece on '=', like this:
string aTag = ...;
foreach (var piece in aTag.Split('&', ';'))
{
    if (piece.Contains("="))
    {
        var parts = piece.Split('=');
        var leftSide = parts[0];
        var rightSide = parts[1];
        if (leftSide == "assignedto")
        {
            MessageBox.Show(rightSide); // It should be 244
            // Or...
            int num = int.Parse(rightSide);
        }
    }
}
Another option is to use regular expressions, which you can test here: www.regextester.com. Some more info on regexes: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
Hope it helps!
If all cases are similar to this and you don't mind a reference to System.Web in your Windows Forms application, you can do something like this:
using System;
public class Program
{
    static void Main()
    {
        string href = @"issue?status=-1,1,2,3,4,5,6,7&
@sort=-activity&@search_text=&@dispname=Show Assigned&
@filter=status,assignedto&@group=priority&
@columns=id,activity,title,creator,status&assignedto=244&
@pagesize=50&@startwith=0";
        // Undo any HTML encoding, then parse the text as a query string.
        href = System.Web.HttpUtility.HtmlDecode(href);
        var querystring = System.Web.HttpUtility.ParseQueryString(href);
        Console.WriteLine(querystring["assignedto"]);
    }
}
This is a simplified example, and you first need to extract the href attribute text, but that should not be complex. Once you have the href attribute text, you can take advantage of the fact that it is basically a query string and reuse code in .NET that already parses query strings.
To complete the example, to obtain the href attribute text you could do:
HtmlElementCollection aTags = webBrowser.Document.GetElementsByTagName("a");
foreach (HtmlElement element in aTags)
{
string href = element.GetAttribute("href");
}
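Putting the two steps together, a helper along these lines could take the href text and return one parameter. It is a sketch under assumptions: HrefQuery and GetQueryValue are hypothetical names, and cutting the string at the first '?' is an addition made here so that the first parameter is not parsed with the "issue?" prefix attached to its key.

```csharp
using System;
using System.Web;

public static class HrefQuery
{
    // Returns the value of the named query-string parameter in href,
    // or null when the parameter is not present.
    public static string GetQueryValue(string href, string key)
    {
        // Undo HTML entity encoding such as &amp; before parsing.
        string decoded = HttpUtility.HtmlDecode(href);
        // Keep only the part after the first '?', if there is one.
        int q = decoded.IndexOf('?');
        string query = q >= 0 ? decoded.Substring(q + 1) : decoded;
        return HttpUtility.ParseQueryString(query)[key];
    }
}
```

Inside the loop above this could be used as string assigned = HrefQuery.GetQueryValue(element.GetAttribute("href"), "assignedto"), followed by int.Parse if a number is needed.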
