Selecting children of a parent, based on another child - C#

Using HtmlAgilityPack, I am trying to generate a list of clickable objects with the function FindElementsByXPath, based on the structure below.
<div class="table-container">
<div>
<strong>
<a>Txt<a/>
</strong>
</div>
<Table class="sc" style="display: None;">
</Table>
</div>
The problem however is that I only want to include the deepest-level a-tag if the table has the style-attribute set to "display: None;" (note that if the table is already expanded, the style attribute does not exist).
I am trying to generate an XPath expression that would help me achieve this. So far, I have made this:
//*[@class='table-container' and table[contains(@style,'display: None;')]]/div/strong/a
However, this is not working. I tried to search for the solution online and experimented with various settings, but no luck so far. I am new to XPath selectors and find myself stuck at this moment. Any help would be appreciated.

Solution
The following query should work:
//*[@class='table-container' and Table[contains(@style,'display: None;')]]/div/strong/a
It's very close to what you had.
Testing
I tested it on the following Xml:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<div class="table-container">
<div>
<strong>
<a>Txt</a>
</strong>
</div>
<Table class="sc" style="display: None;"/>
</div>
<div class="table-container">
<div>
<strong>
<a>Txt2</a>
</strong>
</div>
<Table class="sc"/>
</div>
</root>
and it returns
<a>Txt</a>
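The same filtering logic can be sketched outside of C#. Python's stdlib ElementTree supports only a small XPath subset (no contains()), so in this illustration the style check is done in code rather than in the expression; the XML is the test document above:

```python
import xml.etree.ElementTree as ET

xml = """<root>
<div class="table-container">
  <div><strong><a>Txt</a></strong></div>
  <Table class="sc" style="display: None;"/>
</div>
<div class="table-container">
  <div><strong><a>Txt2</a></strong></div>
  <Table class="sc"/>
</div>
</root>"""

root = ET.fromstring(xml)
matches = []
for container in root.findall(".//div[@class='table-container']"):
    table = container.find("Table")  # direct child, case-sensitive in XML
    # equivalent of: Table[contains(@style,'display: None;')]
    if table is not None and "display: None;" in table.get("style", ""):
        matches.extend(a.text for a in container.findall("./div/strong/a"))

print(matches)  # ['Txt']
```

Only the first container qualifies, because the second Table has no style attribute at all.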
Notes
Your query was basically correct. Note the following:
Xml parsers can be really finicky. Check the case of the items in the selectors. For example, table might not match, but Table might.
Xml parsers can be really fragile. Check that the markup that you're trying to parse is valid. In the posted snippet we had <a>Txt<a/>, which caused my parser to barf. Once I changed it to <a>Txt</a> it was fine.
There are often many different ways to do the same thing. The most appropriate will depend heavily on the structure of your actual Xml. For example, //div[Table[@style='display: None;']]//a works fine on the test data, but might not work "in real life". For example, if the Xml you're actually using varied between display:None and display: None (with a space after the colon), that would cause another problem.

I found the answer after returning from work and looking at it anew. Turns out if you hadn't clicked on the contained text in the a-tag, the table was simply not "there" as far as the XML was concerned. Only once you have clicked on it, it became visible in firebug with a distinguishing style being equal to either "display: None;" or being empty. For my application, I thus had to check if the table was present and, if not, click the a-tag. The definitive XPath was:
//*[@class='table-container' and not(Table)]/div/strong/a
Credit does have to go to Ezra for pointing out the nuances of XPath!
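The presence check in that final expression can also be sketched with Python's stdlib ElementTree. Its XPath subset lacks not(), so the absence test is done in code; the XML here is a hypothetical cut-down version of the structure above, with one collapsed container (no Table yet) and one expanded container:

```python
import xml.etree.ElementTree as ET

xml = """<root>
<div class="table-container">
  <div><strong><a>Collapsed</a></strong></div>
</div>
<div class="table-container">
  <div><strong><a>Expanded</a></strong></div>
  <Table class="sc"/>
</div>
</root>"""

root = ET.fromstring(xml)
# equivalent of: //*[@class='table-container' and not(Table)]/div/strong/a
to_click = [a.text
            for c in root.findall(".//div[@class='table-container']")
            if c.find("Table") is None  # no Table child yet -> needs a click
            for a in c.findall("./div/strong/a")]

print(to_click)  # ['Collapsed']
```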

Related

xPath working for only one of multiple div tags found at the same level

Writing tests with Selenium WebDriver in C#. I absolutely can't understand why only the first in a list of (same-level) div elements can be identified with XPath.
I have this HTML. I inspected two elements on the page, two different divs. I managed to copy the text of only the first element, by running this SIMPLE code:
IWebElement chapterElement = webDriver.FindElement(By.XPath("/html/body/div[3]/main/div[2]/div[3]/article/div[1]"));
...after which I can just type:
chapterElement.Text to find out the inner text.
And the other one is another div, at the same level as the first, the xPath I just copied from the HTML (copy entire xPath):
IWebElement chapterElement = webDriver.FindElement(By.XPath("/html/body/div[3]/main/div[2]/div[3]/article/div[2]"));
... and it doesn't fail, but it doesn't return the text either; the text is "" (empty string).
The only differences between the two divs are:
the last segment in the path: div[1] versus div[2].
the second div is actually hidden from the page (probably because it lacks the class "chapter_visible"), but does show up completely in the html with Inspect!
In case this helps:
"/html/body/div[3]/main/div[2]/div[3]/article/div[1]"
corresponds with:
<div class="chapter chapter chapter_visible" data-chapterno="0" data-chapterid="5e8798266cee070006f5a3d1" style="display: block;">
<h1>some text</h1>
<div class="chapter__content"><p>some text</p>
<p>some text</p>
<p>some text</p>
<ul>
<li>some text</li>
<li>some text</li>
<li>some text.</li>
</ul></div>
</div>
and
"/html/body/div[3]/main/div[2]/div[3]/article/div[2]" (the second xPath)
corresponds to the following (as is located at the same level as the first):
<div class="chapter chapter" data-chapterno="1" data-chapterid="5e8798436cee070006f5a3d2">
<h1>some text</h1>
<div class="chapter__content"><p>some text</p>
<p><strong>some text</strong></p>
<p>some text.</p>
<p>some text</p>
<p>some text</p></div>
</div>
This is my first experience playing around with XPath, and I'm a bit disappointed because I just copied the XPath; I didn't even write it manually. It was supposed to be fast and straightforward, right? Thank you.
Can you try this?
IWebElement chapterElement = webDriver.FindElement(By.XPath("//div[@class='chapter chapter']"));
If you want to read an attribute instead, note that GetAttribute returns a string, not an IWebElement:
string attributeValue = webDriver.FindElement(By.XPath("//div[@class='chapter chapter']")).GetAttribute("attribute_name");
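A side note on the selector above: [@class='chapter chapter'] is an exact string comparison, so it matches only the second div (whose class attribute is exactly "chapter chapter"), not the visible one with class "chapter chapter chapter_visible". A minimal sketch of that matching behavior using Python's stdlib ElementTree; the markup here is a cut-down stand-in for the posted HTML:

```python
import xml.etree.ElementTree as ET

html = """<article>
<div class="chapter chapter chapter_visible">visible chapter</div>
<div class="chapter chapter">hidden chapter</div>
</article>"""

root = ET.fromstring(html)
# [@class='chapter chapter'] is an exact attribute match, so the first div
# (class "chapter chapter chapter_visible") is NOT selected
texts = [d.text for d in root.findall("./div[@class='chapter chapter']")]

print(texts)  # ['hidden chapter']
```

Keep in mind that even with a correct selector, Selenium's Text property returns an empty string for elements that are not displayed; reading GetAttribute("textContent") is a common workaround for hidden elements.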

Class name, and then the tag name contains the distinguishing text

I have the class name active, and then there is unique text called "active text" in a span (which is nested). The class name active is unique among the other class names, and the nested text is unique too. How would I click on that? I have used the following methods.
FindElement(By.XPath("//li[@class='active']//*[contains(.,'active text')]"));
I also tried
FindElement(By.XPath("//li[@class='active']//div//div//div//span[contains(.,'active text')]"));
and also tried this
FindElement(By.XPath("//li[contains(@class,'active')] and //span[contains(.,'active text')]")).Text;
Every time I get "no such element found".
Any thoughts?
this is the html code
<li class="active">
<div class="a">
<div class="b">
<div class="c">
<h1></h1>
<h3 class="d"> some text</h3>
<div class="e">
<span class="f">
Active Text</span>
</div></div></div></div>
</li>
You can use either of the following Locator Strategies:
CssSelector:
FindElement(By.CssSelector("li.active span.f"));
XPath 1:
FindElement(By.XPath("//li[@class='active']//span[normalize-space()='Active Text']"));
XPath 2:
FindElement(By.XPath("//li[@class='active']//span[@class='f' and normalize-space()='Active Text']"));
FindElement(By.XPath("//li[@class='active']//span[contains(text(),'Active Text')]"));
OR
FindElement(By.XPath("//li[@class='active']//span[@class='f' and contains(text(),'Active Text')]"));
Please try the above code; both will work. Also, let me know if clarification is required.
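A side note on normalize-space(): it strips leading and trailing whitespace and collapses internal whitespace runs to single spaces, which is why XPath 1 matches even though the span's text begins on a new line in the posted HTML. A small Python sketch of the same rule:

```python
def normalize_space(s):
    # XPath normalize-space(): strip leading/trailing whitespace and
    # collapse internal runs of whitespace to a single space
    return " ".join(s.split())

# the span's text in the posted HTML starts with a newline and indentation
print(normalize_space("\n            Active Text"))  # 'Active Text'
```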
So what worked for me was this,
FindElement(By.CssSelector("li.active")).FindElement(By.XPath("//span[contains(.,'Active Text')]"));

Remove invalid/incorrectly placed tags from html string

I'm wondering if there is a good (or good enough) way to remove invalid or incorrectly placed HTML tags from an HTML string in C#?
Example 1: <div> </div> </div> should be changed to <div> </div>
Example 2: <div> </section> </div> should be changed to <div> </div>
Basically the transformed html string should be W3C validated markup. I understand that this may be a bit difficult to do, perhaps there is a library that does the job well?
Thanks!
I'd recommend using HTML Tidy.
Since you're using C#, there's the tidy.net project. I think there are DLLs that you can just reference and use in your C# code.
Or, you can just use the command-line version of HTML Tidy.
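For simple cases like the two examples in the question, the core idea can be sketched as a single stack-based pass that drops closing tags with no matching opener. This is a minimal illustration using Python's stdlib parser, not a replacement for HTML Tidy, which also closes unclosed tags, handles void elements, and much more:

```python
from html.parser import HTMLParser

class StrayCloserStripper(HTMLParser):
    """Rebuilds the markup, dropping closing tags that don't match
    the innermost currently-open tag."""
    def __init__(self):
        super().__init__()
        self.out = []
        self.stack = []

    def handle_starttag(self, tag, attrs):
        rendered = "".join(f' {k}="{v}"' for k, v in attrs)
        self.out.append(f"<{tag}{rendered}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            self.out.append(f"</{tag}>")
        # otherwise: stray or mismatched closer -> silently drop it

    def handle_data(self, data):
        self.out.append(data)

def strip_stray_closers(html):
    parser = StrayCloserStripper()
    parser.feed(html)
    return "".join(parser.out)

print(strip_stray_closers("<div> </div> </div>"))      # extra </div> dropped
print(strip_stray_closers("<div> </section> </div>"))  # </section> dropped
```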
I ended up fixing the root issue that generated the invalid HTML string. In such a scenario, it is much better to fix the main problem, if possible, than the symptoms.

Click a tab button on website

I want to click tab buttons on a website, which runs in a web browser control, with the code below. I put "step2Tab" and "group1step2" as the parameter but I am getting a NullReferenceException. How can I click these buttons?
webBrowser1.Document.GetElementById("step2Tab").InvokeMember("click");
Here is the html code
<a class="currentTab" href="javascript:donothing()" onclick="showTab(this,1,'step1')" id="group1step1">Step 1</a>
<div id="step1Tab" style="display: block;"></div>
<div id="step2Tab" style="display: block;"></div>
<div id="step3Tab" style="display: block;"></div>
I'm not familiar with the environment you're working in, so you may need to be more specific, but looking at other people having this issue (link, link) it appears you need to get elements by tag name:
webBrowser1.Document.GetElementsByTagName("step2Tab");
This returns a collection of elements with that tag. You will then need to compare by an attribute of the element:
GetAttribute("attribute")
I hope this is useful.

HtmlAgilityPack reading HTML in a wrong way?

I have been using HAP for a pretty long time. And now I have a really simple question.
How to correctly load a webpage?
The reason I'm asking is because there is a website and a specific part in the formatting messes up with HAP:
<div class="like-bar">
<div class="g-bar"><div class="green-bar" style="width:55.47%"/></div></div>
<div class="like-descr">76 Likes, 61 Dislikes</div>
</div>
So the part I'm having the problem with is "style="width:55.47%"/></div></div>". There is a closing tag for the g-bar div and a closing tag for the green-bar div, but the green-bar div is also self-closed (/>), leaving one closing tag too many. As you can imagine, this screws up the whole structure and makes it impossible to parse.
When I use Inspect in any browser, the "/>" is just not there. How can I figure out what puts it there? I download the page using the Load method of the HtmlWeb class.
Update #1
For some really strange reason, the following does not work:
<div class="like-bar">
<div class="g-bar">
<div class="green-bar" style="width:55.474452554745%"></div>
</div>
<div class="like-descr">
<span class="bold">76</span><span>Likes</span>, <span class="bold">61</span><span>Dislikes</span>
</div>
</div>
The last div (like-descr) is not associated with the like-bar element; instead it gets attached to a parent.
What's wrong with this?
Thank you for your attention!
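For what it's worth, when the original snippet is treated as well-formed XML, <div .../> is just a legal empty element and like-descr sits directly under like-bar. A quick check with Python's stdlib parser:

```python
import xml.etree.ElementTree as ET

snippet = """<div class="like-bar">
<div class="g-bar"><div class="green-bar" style="width:55.47%"/></div>
<div class="like-descr">76 Likes, 61 Dislikes</div>
</div>"""

root = ET.fromstring(snippet)
# direct children of the like-bar div
classes = [child.get("class") for child in root]

print(classes)  # ['g-bar', 'like-descr']
```

If a parser instead attaches like-descr to a parent of like-bar, that suggests it is not treating the self-closing green-bar div as an empty element the way an XML parser would.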
