Navigate HTML source while performing WatiN tests - C#

I am performing actions on the page during WatiN tests. What is the neatest method for asserting certain elements are where they should be by evaluating the HTML source? I am scraping the source but looking for a clean way to navigate the tags pulled back.
UPDATE: Right now I am thinking about grabbing certain elements within HTML source using regular expressions and then analysing that to see if other elements exist within. Other thoughts appreciated.

Is IE.ContainsText("myText") not enough for your scenario?

I would use XPath to navigate tags in HTML without using regexps.
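For instance, here's a minimal sketch using Html Agility Pack to assert structure with XPath. It assumes you've already pulled the raw source out of WatiN (ie.Html and the div id "results" are stand-ins, and the NUnit-style Assert is whatever your test framework provides):
using HtmlAgilityPack;
using NUnit.Framework;

// Sketch: load the scraped source into Html Agility Pack and assert
// structure with XPath instead of regular expressions. "ie.Html" is
// assumed to be the raw source you pulled from WatiN.
var doc = new HtmlDocument();
doc.LoadHtml(ie.Html);

// Does the div with id "results" exist, and does it contain links?
HtmlNode container = doc.DocumentNode.SelectSingleNode("//div[@id='results']");
Assert.IsNotNull(container, "results container is missing");

HtmlNodeCollection links = container.SelectNodes(".//a");
Assert.IsTrue(links != null && links.Count > 0, "no links inside the container");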

Related

Selenium: Finding if a WebElement is contained inside another WebElement?

I'm currently learning Selenium by building a test framework up to test Trello - for no good reason other than it's free and reasonably complex.
The problem I have right now is that I want to check if a card is in a certain column or not.
I have the card & column as WebElements so what I'm doing right now is:
column.FindElements(By.LinkText(_card.Text)).Count > 0;
But this doesn't work if there's no text and seems pretty brittle.
What I want is something like:
column.Contains(_card)
I've searched on SO but I've only seen solutions which pass an XPath - I don't have an XPath, I have the WebElement.
Any ideas?
Two things:
Relative XPath is fairly easy to learn and could probably take care of this for you.
CSS selectors should also easily identify the container regardless of the text. Without seeing the code, I can't help much more.
You should be able to find all elements matching a certain CSS selector.
Using Firefox with the Firebug extension, right click your element and go to Inspect Element with Firebug. Then, when the html of your element comes up in the window, right click the element and select Copy XPath. Now you have an XPath to use.
To use the CSS Selectors that others are talking about, you can select Copy CSS Path instead of Copy XPath.
Hope this helps.
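If you'd rather keep the WebElements you already have, one rough sketch (untested against Trello) is to enumerate the column's descendants and check whether the card is among them; in the .NET bindings, WebElement equality compares the underlying element IDs, so this should work as long as both references come from the same driver session:
using System.Linq;
using OpenQA.Selenium;

// Sketch of a Contains-style check: find every descendant of the
// column and see whether the card element is one of them.
public static bool Contains(IWebElement column, IWebElement card)
{
    return column.FindElements(By.XPath(".//*")).Contains(card);
}
This avoids relying on link text, at the cost of enumerating every descendant, so it may be slow on large columns.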

Finding element using Selenium Web Driver in C#

So I've been working on learning how to use Selenium in C# to do some automated testing for a project. However, I've hit a roadblock on this one. I have been trying to figure out a way to click the following link on this webpage.
Here is what I'm trying to target:
<A class='PortalLink' HREF="https://mywebsite.com/myprograms/launchprogram.jsp?" onClick="setUser('login','password');"><span>MyProgram</span></A>
Searching by ClassName hasn't turned up anything. Although there are multiple such links, I just wanted to see if I could detect their presence.
By.ClassName("PortalLink")
I tried a href based search using CssSelector, but this failed as well.
By.CssSelector("[href*='https://mywebsite.com/myprograms/launchprogram.jsp?']")
Lastly, I tried to use XPath and search by class and span content, but this failed to find the link as well.
By.XPath("//A[contains(#class,'PortalLink') and span[text()='MyProgram']]")))
The webpage in question contains two frames, and I've tried both.
I'm waiting 200 seconds before timing out. What am I doing incorrectly? Thanks in advance for any help!
Assuming that this element is not appended to the DOM during ajax, your statement should be
By.CssSelector("a.PortalLink[href*='launchprogram.jsp']")
If there are multiple of these links, then we'll need to go further up in the parent-child hierarchy since this link has no more attributes that make this link unique.
If you can post the parent html of this link then we can suggest more options,
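As a sketch of how that might be wired up (the frame name and timeout here are placeholders, not taken from the question):
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

// Sketch: switch into the frame hosting the link, then use an explicit
// wait instead of a fixed 200-second timeout. "contentFrame" is a
// hypothetical frame name.
driver.SwitchTo().DefaultContent();
driver.SwitchTo().Frame("contentFrame");

var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
IWebElement link = wait.Until(d =>
    d.FindElement(By.CssSelector("a.PortalLink[href*='launchprogram.jsp']")));
link.Click();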
Can you try these?
//span[contains(text(),'MyProgram')]
//span[contains(text(),'MyProgram')]/..

C# / HtmlAgilityPack parses differently than Firebug

I'm using HtmlAgilityPack to parse HTML nodes, and Firebug to find the node attributes I'm looking for, like a div with class name "ABC".
I've noticed that sometimes I get no result for the div I'm looking for. I debugged it and saw that the XPath from Firebug and from HtmlAgilityPack is different for the same node:
/html[1]/body[1]/div[2]/div[3]/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]/div[1]/table[1]/tr[1]/td[1]/div[1]/table[1]/tr[2]/td[1]/div[2]/table[1]/tr[1]/td[1]/div[1]/td[1]/div[1]
/html/body/div[3]/div[3]/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr/td/div/table/tbody/tr/td/div/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td/div/div/table/tbody/tr[3]/td/table/tbody/tr/td[2]/div
The first one is from Firebug. Does anyone know where I'm going wrong?
There are two possible reasons:
Html Agility Pack is not parsing the HTML correctly.
The web page has been altered by client script after the page was loaded. When you view it with Firebug, you are looking at the DOM, not the HTML source. HAP can only work with the HTML source.
You will notice in the paths you have shown that (for example) there are no TBODY tags in the HAP version. TBODY is optional in HTML markup, but still a required tag in a complete DOM. Browser HTML parsers will always add TBODY if it's missing; HAP will not. This can result in paths that work in a browser failing in HAP.
An alternative to HAP is CsQuery (on nuget), which uses a standards-compliant HTML parser (actually - the same parser as Firefox). CsQuery is a C# jquery port, it works with CSS selectors (not xpath). It should give you a DOM that matches the one the browser shows. This will not change anything if the problem is simply that javascript is altering the DOM, though.
Html Agility Pack only concentrates on markup. It has no idea of how things will be rendered. Firebug, I think, relies on the current in-Firefox-memory DOM, which can be dramatically different. That's why you see elements such as TBODY that only exist in the DOM, not in the markup (where they are optional).
Add to that the fact that there are infinitely many possible XPath expressions for a given XML node.
Anyway, in general, queries with Html Agility Pack don't need the full XPath expression that a tool would give. You just need to focus on discriminants, for example specific attributes (like the class), ids, etc. Your code will be much more resistant to changes, but it means you need to learn a bit about XPath (the XPath Tutorial is a good starting point). So you really want to build XPath expressions such as this:
//div[@class = 'ABC']
which will get all DIV elements whose CLASS attribute equals 'ABC'.
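In Html Agility Pack that kind of query is short; a sketch, assuming you already have the page source in a string:
using System;
using HtmlAgilityPack;

// Sketch: query by a discriminating attribute instead of a full path.
var doc = new HtmlDocument();
doc.LoadHtml(html); // 'html' holds the raw page source you fetched

var divs = doc.DocumentNode.SelectNodes("//div[@class = 'ABC']");
if (divs != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode div in divs)
        Console.WriteLine(div.InnerText.Trim());
}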

Html Parser & Object Model for .net/C#

I'm looking to parse HTML using .NET for the purposes of testing or asserting its content.
i.e.
HtmlDocument doc = GetDocument("some html");
List<Form> forms = doc.Forms();
Link link = doc.GetLinkByText("New Customer");
The idea is to allow people to write tests in C# similar to how they do in webrat (Ruby).
i.e.
visits('/')
fills_in "Name", "mick"
clicks "save"
I've seen the Html Agility Pack, SgmlReader, etc., but has anyone created an object model for this, i.e. a set of classes representing the HTML elements, such as form, button, etc.?
Cheers.
Here is a good library for HTML parsing. Objects like HtmlButton and HtmlInput are not created, but it is a good starting point, and you can create them yourself if you don't want to use the HTML DOM.
The closest thing to an HTML DOM in .NET, as far as I can tell, is the HTML DOM itself. You can use the Windows Forms WebBrowser control, load it with your HTML, and then access the DOM from the outside.
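A sketch of that approach; note the WebBrowser control needs an STA thread and a message loop, so this is easiest from a WinForms app or an [STAThread] test:
using System;
using System.Windows.Forms;

// Sketch: load raw HTML into the WinForms WebBrowser control and walk
// the resulting DOM from the outside.
var browser = new WebBrowser { ScriptErrorsSuppressed = true };
browser.DocumentText = "<html><body><a href='/new'>New Customer</a></body></html>";

// Wait for the control to finish parsing before touching the DOM.
while (browser.ReadyState != WebBrowserReadyState.Complete)
    Application.DoEvents();

foreach (HtmlElement link in browser.Document.Links)
{
    if (link.InnerText == "New Customer")
        Console.WriteLine(link.GetAttribute("href"));
}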
BTW, this is .NET. Any code that works for VB.NET would work for C#.
You have two major options:
Use a browser engine (e.g. Internet Explorer) that will parse the HTML for you and then give you access to the generated DOM. This option will require some interop with the browser engine (in the case of IE, it's simple COM).
Use a lightweight parser like HtmlAgilityPack.
It sounds to me like you are trying to do HTML unit tests. Have you looked into Selenium? It even has a C# library, so you can write your HTML unit tests in C#, assert that elements exist and have the correct values, and even click on links. It also works with JavaScript / AJAX sites.
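To give a flavour (a sketch only; the URL and locators are made up):
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

// Sketch of a webrat-style test in C# with Selenium WebDriver.
using (IWebDriver driver = new FirefoxDriver())
{
    driver.Navigate().GoToUrl("http://localhost/customers/new");
    driver.FindElement(By.Name("Name")).SendKeys("mick");
    driver.FindElement(By.Id("save")).Click();

    // Assert 'saved' with your test framework of choice.
    bool saved = driver.FindElements(By.CssSelector(".flash-success")).Count > 0;
}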
The best parser for HTML is the HTQL COM. You can use HTQL queries to retrieve HTML content.

Self learning regular expression or xpath query?

Is it possible to write code which generates a regular expression or XPath that parses links based on some HTML document?
What I want is to parse a page for some links. The only thing I know is that the majority of the links on the page are the ones I want.
For a simple example, take a Google search engine results page, for example this. The majority of the links are from the search results and look something like this:
<h3 class="r"><a onmousedown="return rwt(this,'','','res','1','AFQjCNERidL9Hb6OvGW93_Y6MRj3aTdMVA','')" class="l" href="http://stackoverflow.com/"><em>Stack Overflow</em></a></h3>
Is it possible to write code that learns this and recognizes this and is able to parse all links, even if Google changes their presentation?
I'm thinking of parsing out all links, looking X chars before and after each tag, and then working from that.
I understand that this also could be done with XPath, but the question is still the same. Can I parse this content and generate a valid XPath to find the serp links?
As I understand them, most machine learning algorithms work best when they have many examples from which they generalize an 'intelligent' behavior. In this case, you don't have many examples. Google isn't likely to change their format often. Even if it feels frequent to us, it's probably not enough for a machine learning algorithm.
It may be easier to monitor the current format and if it changes, change your code. If you make the expected format a configurable regular expression, you can re-deploy the new format without rebuilding the rest of your project.
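For example, the pattern could live in App.config so a format change becomes a config edit rather than a rebuild (a sketch; "LinkPattern" is a made-up setting name):
using System.Configuration; // requires a reference to System.Configuration
using System.Text.RegularExpressions;

// Sketch: read the link pattern from configuration, falling back to a
// default, so the format can change without recompiling.
string pattern = ConfigurationManager.AppSettings["LinkPattern"]
                 ?? "(?<=href=\")[^\"]+(?=\")";
var linkRegex = new Regex(pattern);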
If I understand your question, there's really no need to write a learning algorithm. Regular expressions are powerful enough to pick this up. You can get all the links in an HTML page with the following regular expression:
(?<=href=")[^"]+(?=")
Verified in Regex Hero, this regular expression uses a positive lookbehind and a positive lookahead to grab the url inside of href="".
If you want to take it a step further you can also look for the anchor tag to ensure you're getting an actual anchor link and not a reference to a css file or something. You can do that like this:
(?<=<a[^<]+href=")[^"]+(?=")
This should work fine as long as the page follows the href="" convention for the links. If they're using onclick events then everything becomes more complicated as you're going to be dealing with the unpredictability of Javascript. Even Google doesn't crawl Javascript links.
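In C# that might look like this (a sketch using the second pattern above; .NET's Regex supports the variable-length lookbehind it relies on):
using System;
using System.Text.RegularExpressions;

// Sketch: extract every href from anchor tags with the lookaround pattern.
string html = "<a class=\"l\" href=\"http://stackoverflow.com/\"><em>Stack Overflow</em></a>";
foreach (Match m in Regex.Matches(html, "(?<=<a[^<]+href=\")[^\"]+(?=\")"))
{
    Console.WriteLine(m.Value); // http://stackoverflow.com/
}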
Does that help?
