How to ignore double white-spaces during text matching using XPath - C#

I have HTML like this:
<div class="main">
<div class ="first">
<p>just text</p>
</div>
<div class= "second">
<p>some text</p>
</div>
<div class= "third">
<p>some  text having double white-space</p>
</div>
</div>
and I use an XPath expression like this: //div/p[contains(text(),'some text')]
Unfortunately, double white-spaces can appear anywhere within the "some text" inside the p element, so I need to ignore them during matching. I know I can use an XPath expression like translate(normalize-space(//div/p), ' ', ''), but that applies to all p elements and only strips white-space; it does not match "some text".
Is it possible to match "some text" while ignoring double white-spaces at the same time?

When selecting a set of nodes using XPath 1.0, the XPath can't change the nodes that are returned in the result. You can only select nodes as they already are. You can use the following to ignore the double spaces when doing the selection:
//div/p[contains(normalize-space(), 'some text')]
This will return the set of p elements that you are looking for, but their text content will be kept as it originally was. If you then want to obtain the text values without the duplicate spaces, you can iterate through this node set and strip the extra spaces from the values one by one. You haven't told us anything about the code you're using to carry out these queries, so it's hard to say precisely how you would modify it. If you can show us your code, I can show you how to get it to do what you need.
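As a minimal sketch of the approach above, the following uses System.Xml.Linq and System.Xml.XPath (an assumption, since the question does not say which library is in use). The class name, the helper FindMatches, and the sample markup with a double space in the third paragraph are hypothetical and only for illustration. The phrase is concatenated into the XPath, so it must not contain a single quote.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NormalizeSpaceDemo
{
    // Sample markup modeled on the question; the double space in the
    // third paragraph is an assumption added for illustration.
    public const string Html =
        "<div class='main'>" +
        "<div class='first'><p>just text</p></div>" +
        "<div class='second'><p>some text</p></div>" +
        "<div class='third'><p>some  text having double white-space</p></div>" +
        "</div>";

    // Selects every p whose whitespace-normalized text contains the phrase,
    // then returns each match's text with runs of whitespace collapsed.
    public static List<string> FindMatches(string html, string phrase)
    {
        XDocument doc = XDocument.Parse(html);

        // normalize-space() without arguments works on the context node (each p).
        string xpath = "//div/p[contains(normalize-space(), '" + phrase + "')]";

        return doc.XPathSelectElements(xpath)
                  // The selected nodes still hold their original text,
                  // so the cleanup happens here, one value at a time.
                  .Select(p => Regex.Replace(p.Value, @"\s+", " ").Trim())
                  .ToList();
    }

    public static void Main()
    {
        foreach (string text in FindMatches(Html, "some text"))
        {
            Console.WriteLine(text);
        }
    }
}
```

The XPath only decides which nodes are selected; the Regex.Replace call is what produces the cleaned-up strings.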

Related

How to get xpath of different html values with the same properties

I'm working on Selenium and trying to get the values inside tags. The site that I'm working on is https://www.qnbfinansbank.enpara.com/doviz-kur-bilgileri/doviz-altin-kurlari.aspx. But the properties of the objects are the same. Therefore, the xpath scripts are the same. The values that I'm trying to get are like 5,615505 TL, 4,827450 TL, 187,389825 TL from
<div class="dlCont">
<span>5,615505 TL </span>
</div>
<div class="dlCont">
<span>4,827450 TL </span>
</div>
<div class="dlCont">
<span>187,389825 TL </span>
</div>
and so on. Is there any way to get the xpath of these values?

You can store all the values in a list and then retrieve them one by one.
Something like:
IList<IWebElement> allValues = driver.FindElements(By.CssSelector("div.dlCont span"));
foreach (IWebElement value in allValues)
{
    Console.WriteLine(value.Text);
}
Hope this will help.

You can use an XPath like this:
//span[contains(text(),'5,615505 TL')]

You can manually write the XPath for the DOM structure below:
<div class="dlCont">
<span>5,615505 TL </span>
</div>
A manually written XPath for the above DOM structure is "//div[@class='dlCont']/span".
If the page has many elements with the same DOM structure, this XPath will match all of those nodes.
Eight nodes match XPATH="//div[@class='dlCont']/span" at the URL https://www.qnbfinansbank.enpara.com/doviz-kur-bilgileri/doviz-altin-kurlari.aspx
If you want to fetch a particular web element, you need to specify the index value, as in "(//div[@class='dlCont']/span)[2]".
To do that, add an opening parenthesis at the start of the manually written XPath and a closing parenthesis at the end, then append the index value:
1. //div[@class='dlCont']/span
2. (//div[@class='dlCont']/span
3. (//div[@class='dlCont']/span)
4. (//div[@class='dlCont']/span)[1]
Hope this helps.
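To illustrate why the parentheses matter, here is a small sketch that evaluates the indexed XPath against an in-memory copy of the snippet. It uses System.Xml.XPath instead of Selenium so it can run without a browser; the class and method names are hypothetical. With Selenium, the same expression would be passed to By.XPath.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class IndexedXPathDemo
{
    // Three of the repeated blocks from the question, wrapped in a root element.
    public const string Html =
        "<body>" +
        "<div class='dlCont'><span>5,615505 TL </span></div>" +
        "<div class='dlCont'><span>4,827450 TL </span></div>" +
        "<div class='dlCont'><span>187,389825 TL </span></div>" +
        "</body>";

    // Returns the trimmed text of the n-th matching span (XPath indexes start at 1),
    // or null when there is no such node.
    public static string NthValue(string html, int index)
    {
        XDocument doc = XDocument.Parse(html);
        XElement span = doc.XPathSelectElement(
            "(//div[@class='dlCont']/span)[" + index + "]");
        return span?.Value.Trim();
    }

    // Without parentheses, [2] means "the second span inside its own div",
    // which never exists here because every div holds a single span.
    public static int CountWithoutParentheses(string html)
    {
        XDocument doc = XDocument.Parse(html);
        return doc.XPathSelectElements("//div[@class='dlCont']/span[2]").Count();
    }

    public static void Main()
    {
        Console.WriteLine(NthValue(Html, 2));
        Console.WriteLine(CountWithoutParentheses(Html));
    }
}
```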

How to sendkeys to a <p> tag through C# and Selenium

I want to send the keys "description" to what looks like a textarea. I have tried every approach I could think of, but none of them works.
HTML of the element:
<div class="ta-scroll-window ng-scope ta-text ta-editor form-control" ng-hide="showHtml">
<div class="popover fade bottom" style="max-width: none; width: 305px;">
<div class="arrow"></div>
<div class="popover-content"></div>
</div>
<div class="ta-resizer-handle-overlay">
<div class="ta-resizer-handle-background"></div>
<div class="ta-resizer-handle-corner ta-resizer-handle-corner-tl"></div>
<div class="ta-resizer-handle-corner ta-resizer-handle-corner-tr"></div>
<div class="ta-resizer-handle-corner ta-resizer-handle-corner-bl"></div>
<div class="ta-resizer-handle-corner ta-resizer-handle-corner-br"></div>
<div class="ta-resizer-handle-info"></div>
</div>
<div id="taTextElement737852736512107" contenteditable="true" ta-bind="ta-bind" ng-model="html" ta-keep-styles="true" class="ng-pristine ng-valid ta-bind ng-empty ng-touched" an-form-object-name="Açıklama" name="Açıklama">
<p>
<br>
</p>
</div>
</div>
Code trial :
Dim action2 = New Actions(driver)
Dim cekbul2 = driver.FindElement(By.XPath("//*@id=""taHtmlElement737852736512107""]"))
cekbul2.SendKeys("Açıklama")
Console.Write("textarea send description")
or
Dim cekbul2 = driver.FindElement(By.XPath("//textarea[@class='ng-pristine ng-untouched ng-valid ng-scope ta-bind ta-html ta-editor form-control ng-empty ng-hide' and @id='taHtmlElement737852736512107']"))
Neither works. The error is:
"no such element: Unable to locate element"

Your HTML does not contain a textarea input field.
When you use an XPath such as '//textarea', you are looking for an element with <textarea> </textarea> tags.
It looks like your HTML actually consists of divs that are styled to look like text areas.
That is why your second attempt will never work: you are looking for a textarea where none exists.
Typically, when a div is styled to work like a text area or textbox, you will find that the div has a backing input behind it.
These must be located between the <form> and </form> tags in the HTML, otherwise the server would never be able to receive the data. (HTML 5 provides new ways of working with this, but that is another story.)
Can you examine your full HTML and see whether you can find the actual textarea or input elements that end up containing the text content?
Type some dummy text, and use an HTML inspector tool in Chrome or Firefox to look for it.
If, however, the post is completed by JavaScript, you may find that the JavaScript does not use inputs or text areas to hold the text and instead posts it outside any form elements. This is common with rich-text emulators such as forum post pages.
If that is the case, you may need to experiment to find the appropriate HTML element to send keys to.
Also, could you try
Dim cekbul2 = driver.FindElement(By.XPath("//div[@id='taHtmlElement737852736512107']"))
I couldn't help but notice that your first attempt had an XPath syntax error: it had no opening square bracket [. Also, in programming it is sometimes considered lazy or bad practice to rely on wildcards. I recommend always using the tag type in your XPaths, as opposed to '//*'.
In the worst case, I would say that you could probably get around this by using JavaScript execution, e.g. directly setting the text instead of sending the keystrokes.
This does not emulate human behavior, but it may be a necessary evil depending on your situation.

To send text to the <p> tag you have to use the ExecuteScript() method from the IJavaScriptExecutor interface. You can use the following code block:
((IJavaScriptExecutor)driver).ExecuteScript("document.getElementsByTagName('p')[0].innerHTML = 'Hasan Sarıkaya';");

I want to highlight some points here.
Most probably the locator you are using is not correct.
There are three ways I know of to enter text using Selenium:
1) Use driver.findElement(yourLocator).sendKeys("Stringvalue");
2) Use the Actions class to send keys
3) Use the JavaScript executor to change the innerHTML
Personally I would not prefer the third solution: since we are testers, I believe changing the DOM directly is not good practice.
Hope this helps. Please let me know if you have any questions.

How to prevent tags from becoming separated by white-space?

I'm using XDocument to generate an XML document that will be parsed as XHTML. In some parts of it I have lists formatted as:
<root>
<div>
<span>Item 1</span>
</div>
<div>
<span>Item 2</span>
</div>
</root>
The whitespace between <div> and <span> (and respective terminators) is messing up my CSS. Is it possible to force it to NOT insert white-space in those cases, generating something like:
<root>
<div><span>Item 1</span></div>
<div><span>Item 2</span></div>
</root>
SaveOptions.DisableFormatting does work, but then the file becomes a pain for a human to read. So I need something else.

I think I found an answer. I will leave it here for others to comment on and find possible bugs before accepting it.
I inserted an empty XText as the first node in the div, which made XDocument treat it as mixed content (or something like that) and produce the output that I need.
div.AddFirst(new XText(""));
Does anyone have documentation on why it doesn't format mixed content, and whether that is indeed what is happening?
BTW, it has to be an empty XText; the line below on its own doesn't work:
div.AddFirst("");
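A small self-contained sketch of the trick described above, for anyone who wants to try it. The class and method names are hypothetical, and the item list mirrors the question. The result relies on the formatting behavior the answer reports, so it is worth checking the printed output on your own runtime.

```csharp
using System;
using System.Xml.Linq;

public static class MixedContentDemo
{
    // Builds the document from the question and returns its indented string form.
    public static string Build()
    {
        var root = new XElement("root");
        foreach (string item in new[] { "Item 1", "Item 2" })
        {
            var div = new XElement("div", new XElement("span", item));

            // The empty text node is the whole trick: with a text node among
            // its children, the div is written without indentation inside it.
            div.AddFirst(new XText(""));

            root.Add(div);
        }

        // ToString() uses the default (indented) formatting.
        return new XDocument(root).ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

Only the div elements receive the empty text node, so root and the div start tags are still indented as usual.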

Write query to parse HTML DOCUMENT with HtmlAgilityPack

I want to get the href of the <a> element inside span class="floatClear" for the entry whose rating is lowest, where the rating is given by
span class="star-img stars_4"
How can I use HtmlAgilityPack to achieve this? Here is the HTML source of my file:
<div class="businessresult"> //will repeat
<div class="rightcol">
<div class="rating">
<span class="star-img stars_4">
<img height="325" width="84" src="http://media1.px" alt="4.0 star rating" title="4.0 star rating">
</span>
</div>
</div>
<span class="floatClear">
<a class="ybtn btn-y-s" href="/writeareview/biz/KaBw8UEm8u6war_loc%NY">
</span>
</div>
The query I have written:
var lowestreview =
    from main in htmlDoc.DocumentNode.SelectNodes("//div[@class='rightcol']")
    from rating in htmlDoc.DocumentNode.SelectNodes("//div[@class='rating']")
    from ratingspan in htmlDoc.DocumentNode.SelectNodes("//span[@class='star-img stars_4']")
    from floatClear in htmlDoc.DocumentNode.SelectNodes("//span[@class='floatClear']")
    select new { Rate = ratingspan.InnerText, AHref = floatClear.InnerHtml };
But I do not know how to apply a condition in the last line of the LINQ query!

Don't select "rating" from the entire htmlDoc; select it from the previously found "main".
I guess you need something like:
var lowestreview =
    from main in htmlDoc.DocumentNode.SelectNodes("//div[@class='rightcol']")
    from rating in main.SelectNodes("//div[@class='rating']")
    from ratingspan in rating.SelectNodes("//span[@class='star-img stars_4']")
    from floatClear in ratingspan.SelectNodes("//span[@class='floatClear']")
    select new { Rate = ratingspan.InnerText, AHref = floatClear.InnerHtml };
I hope it will not crash if some of those divs and spans are not present: a previous version of HtmlAgilityPack returned null instead of an empty list when SelectNodes didn't find anything.
EDIT
You probably also need to change the XPath query for the inner selects: change the "//" into ".//" (extra . at the beginning) to signal that you really want a subnode. If HtmlAgilityPack works the same as regular XML XPath (I'm not 100% sure), then a "//" at the beginning will search from the root of the document, even if you call it on a subnode. A ".//" will always search from the node you are calling it on.
A main.SelectNodes("//div[@class='rating']") will (probably) also find <div class="rating"> elements outside the <div class="rightcol"> you found in the previous line.
A main.SelectNodes(".//div[@class='rating']") should fix that.
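The difference between "//" and ".//" can be checked with a short sketch. It uses System.Xml.XPath rather than HtmlAgilityPack (an assumption made to keep it dependency-free), so it shows the standard XPath behavior the answer refers to. The sample markup and names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class RelativeXPathDemo
{
    // Two result blocks, each with its own rating.
    public const string Html =
        "<body>" +
        "<div class='rightcol'><div class='rating'>4.0</div></div>" +
        "<div class='rightcol'><div class='rating'>2.5</div></div>" +
        "</body>";

    // Evaluates the given XPath with the FIRST rightcol div as the context node
    // and returns how many elements it selects.
    public static int CountFromFirstColumn(string html, string xpath)
    {
        XDocument doc = XDocument.Parse(html);
        XElement main = doc.XPathSelectElements("//div[@class='rightcol']").First();
        return main.XPathSelectElements(xpath).Count();
    }

    public static void Main()
    {
        // "//" starts at the document root, so ratings from both blocks are found.
        Console.WriteLine(CountFromFirstColumn(Html, "//div[@class='rating']"));

        // ".//" starts at the context node, so only its own rating is found.
        Console.WriteLine(CountFromFirstColumn(Html, ".//div[@class='rating']"));
    }
}
```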

Extract content with XPath?

I have html content that I am storing as an XML document (using HTML Agility Pack). I know some XPath, but I am not able to zero into the exact content I need.
In my example below, I am trying to extract the "src" and "alt" text from the large image. This is my example:
<html>
<body>
....
<div id="large_image_display">
<img class="photo" src="images/KC0763_l.jpg" alt="Circles t-shirt - Navy" />
</div>
....
<div id="small_image_display">
<img class="photo" src="images/KC0763_s.jpg" alt="Circles t-shirt - Navy" />
</div>
</body>
</html>
What is the XPath to get "images/KC0763_l.jpg" and "Circles t-shirt - Navy"? This is how far I got but it is wrong. Mostly pseudo code at this point:
\\div[@class='large_image_display']\img[1][@class='photo']@src
\\div[@class='large_image_display']\img[1][@class='photo']@alt
Any help in getting this right would be greatly appreciated.

The following XPath will get you to the src attributes of the img tags:
'//html/body/div/img[@class="photo"]/@src'
And similarly this will get you to the alt attributes:
'//html/body/div/img[@class="photo"]/@alt'
From there you can get to the attribute text. If you want to find only the ones that match 'large_image_display', then you would filter it further like this:
'//html/body/div[@id="large_image_display"]/img[@class="photo"]/@src'

Use the following XPath expressions:
/html/body/div[@id='large_image_display']/img/@src
and
/html/body/div[@id='large_image_display']/img/@alt
Always try to avoid using the // abbreviation, because it may result in very inefficient evaluation (causes the whole (sub)tree to be scanned).
In this particular case we know that the html element is the top element of the document and we can simply select it by /html -- not //html.
Your major problem was that in your expressions you were using \ and \\ and there are no such operators in XPath. The correct XPath operators you were trying to use are / and the // abbreviation.
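As a rough sketch, the two expressions can be evaluated like this with System.Xml.XPath. This assumes the markup is well-formed XML, which real-world HTML often is not (the question uses HTML Agility Pack for that reason). The class name and the helper FirstAttributeValue are hypothetical.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class AttributeXPathDemo
{
    // Well-formed version of the sample from the question (elided parts left out).
    public const string Html =
        "<html><body>" +
        "<div id='large_image_display'>" +
        "<img class='photo' src='images/KC0763_l.jpg' alt='Circles t-shirt - Navy' />" +
        "</div>" +
        "<div id='small_image_display'>" +
        "<img class='photo' src='images/KC0763_s.jpg' alt='Circles t-shirt - Navy' />" +
        "</div>" +
        "</body></html>";

    // Evaluates an XPath that selects attributes and returns the first value,
    // or null when nothing matches.
    public static string FirstAttributeValue(string html, string xpath)
    {
        XDocument doc = XDocument.Parse(html);

        // For a node-set result, XPathEvaluate returns a sequence of objects;
        // here each object is an XAttribute.
        var result = (IEnumerable)doc.XPathEvaluate(xpath);
        return result.Cast<XAttribute>().Select(a => a.Value).FirstOrDefault();
    }

    public static void Main()
    {
        Console.WriteLine(FirstAttributeValue(
            Html, "/html/body/div[@id='large_image_display']/img/@src"));
        Console.WriteLine(FirstAttributeValue(
            Html, "/html/body/div[@id='large_image_display']/img/@alt"));
    }
}
```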
