I have this Html element on the page:
<li id="city" class="anketa_list-item">
<div class="anketa_item-city">From</div>
London
</li>
I found this element:
driver.FindElement(By.Id("city"))
If I read driver.FindElement(By.Id("city")).Text, the result is "From\r\nLondon".
How can I get only "London" with WebDriver?
You could easily get it by using the class name:
driver.FindElement(By.ClassName("anketa_item-city")).Text;
or by using XPath:
driver.FindElement(By.XPath("//li/div")).Text;
You can try this:
var fromCityTxt = driver.FindElement(By.Id("city")).Text;
var city = Regex.Split(fromCityTxt, "\r\n")[1];
Sorry for my misleading earlier answer. The XPath I gave before ended in /text(), which selects a text node rather than an element, and FindElement can only return elements.
The approach for this situation is to get the parent's text, remove the child's text from it, and then trim the leftover whitespace and line breaks.
var parent = driver.FindElement(By.XPath("//li"));
var child = driver.FindElement(By.XPath("//li/div"));
var london = parent.Text.Replace(child.Text, "").Trim();
Notes:
If Trim() isn't working, the "spaces" are probably not spaces but some other non-printing character, possibly tabs. In that case, use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
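The parent-minus-child idea above can be sketched on plain strings, without a browser. This is only an illustration: OwnTextSketch and OwnText are hypothetical names, and the inputs stand in for the values that parent.Text and child.Text would return.

```csharp
using System;

public static class OwnTextSketch
{
    // Removes the first occurrence of the child's text from the parent's text,
    // then trims spaces, tabs and line breaks left at either end.
    public static string OwnText(string parentText, string childText)
    {
        int index = parentText.IndexOf(childText, StringComparison.Ordinal);
        string remainder = index < 0
            ? parentText
            : parentText.Remove(index, childText.Length);
        return remainder.Trim(' ', '\t', '\r', '\n');
    }
}
```

For example, OwnTextSketch.OwnText("From\r\nLondon", "From") leaves "London". Unlike Replace, this removes only the first occurrence, which may matter if the child's text also appears inside the remaining text.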
This worked for me.
driver.FindElement(By.XPath("//li[@id='city']/text()"));
AngleSharp 0.9.11
On the page in the browser, the text is displayed as:
String_1.
String_2.
String_3.
String_4.
Parsing result:
String_1.String_2.String_3.String_4.
Page layout:
<div class="adv-point view-adv-point"><span>String_1. <br><br>String_2.<br>String_3.<br>String_4.</span></div>
I use code to parse:
var items = document.QuerySelectorAll("div:nth-child(4) >div:nth-child(3) > div.adv-point.view-adv-point");
var text = items[0].TextContent.Trim();
Question
How to make the result of parsing with line breaks?
In other words, the result of the parsing should be:
String_1.
String_2.
String_3.
String_4.
I think if you use innerText here then it will work fine for you. Here is the code
var x = document.querySelectorAll("div:nth-child(4) >div:nth-child(3) > div.adv-point.view-adv-point");
console.log(x[0].innerText);
Try this-
var text=document.querySelectorAll(".view-adv-point span")[0].innerText;
If you log/alert text, you will see that the line break is present.
If you want to replace <br> with \n, then you can do this-
var text=document.querySelectorAll(".view-adv-point span")[0].innerHTML;
text = text.replace(/<br>/g, '\n');
But I believe this will return the same value as the first approach.
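On the C# side, the same replace-the-breaks idea can be sketched as a plain string helper. This assumes you read the span's InnerHtml (rather than TextContent) from the parser and pass it in; BreakSketch and BrToNewlines are hypothetical names, and the sketch does not strip other tags or decode entities.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BreakSketch
{
    // Replaces <br>, <br/> and <br /> (in any letter case) with "\n".
    public static string BrToNewlines(string innerHtml)
    {
        return Regex.Replace(innerHtml, @"<br\s*/?>", "\n", RegexOptions.IgnoreCase);
    }
}
```

Note that the doubled `<br><br>` in the sample markup becomes two newlines, so a blank line follows String_1. unless consecutive breaks are collapsed separately.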
I only see whitespace ("\r\n \r\n") in the InnerHtml and InnerText properties, not the actual content. Where am I going wrong?
RENDERED HTML:
<div id="urllist" runat="server">
http://test1t.com
<br></br>
http://test2.com
<br></br>
</div>
C#:
HtmlContainerControl list = (HtmlContainerControl)urllist;
string string1 = list.InnerHtml;
string string2 = list.InnerText;
//this didnt work either
string string1 = urllist.InnerHtml;
string string2 = urllist.InnerText;
If I remember correctly, you have to use Controls[0] to find the literal control that contains the text:
var div = (HtmlGenericControl) urllist;
var lit = (LiteralControl) div.Controls[0];
string text = lit.Text;
Update: tested, it works. This is text:
http://test1t.com
<br></br>
http://test2.com
<br></br>
However, I have now tested your approach as well, and it also works.
I would have added a comment, but I cannot add images in comments. See below, I've tested your code and it works:
Are you sure you don't check your result in a HTML page or that you are not altering your result in any way before you check it?
I have some HTML and I want to convert it to plain text, but keep only the tags for colored text.
For example, given the HTML below:
<body>
This is a <b>sample</b> html text.
<p align="center" style="color:#ff9999">this is only a sample<p>
....
and some other tags...
</body>
</html>
I want the output:
this is a sample html text.
<#ff9999>this is only a sample<>
....
and some other tags...
I'd use an HTML parser such as HtmlAgilityPack to parse the document, and a regular expression to find the color value in the attributes.
First, find all the nodes that contain style attribute with color defined in it by using xpath:
var doc = new HtmlDocument();
doc.LoadHtml(html);
var nodes = doc.DocumentNode
.SelectNodes("//*[contains(@style, 'color')]")
.ToArray();
Then the simplest regex to match a color value: (?<=color:\s*)#?\w+.
var colorRegex = new Regex(@"(?<=color:\s*)#?\w+", RegexOptions.IgnoreCase);
Then iterate through these nodes and if there is a regex match, replace the inner html of the node with html encoded tags (you'll understand why a little bit later):
foreach (var node in nodes)
{
var style = node.Attributes["style"].Value;
if (colorRegex.IsMatch(style))
{
var color = colorRegex.Match(style).Value;
node.InnerHtml =
HttpUtility.HtmlEncode("<" + color + ">") +
node.InnerHtml +
HttpUtility.HtmlEncode("</" + color + ">");
}
}
And finally get the inner text of the document and perform html decoding on it (this is because inner text strips all the tags):
var txt = HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);
This should return something like this:
This is a sample html text.
<#ff9999>this is only a sample</#ff9999>
....
and some other tags...
Of course you could improve it for your needs.
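The color-matching regex from this answer can be tried on its own, without HtmlAgilityPack. A minimal sketch, where ColorSketch and ExtractColor are hypothetical names and the input is just the value of a style attribute:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColorSketch
{
    // Same pattern as in the answer: whatever follows "color:" and optional
    // whitespace, with an optional leading '#'.
    private static readonly Regex ColorRegex =
        new Regex(@"(?<=color:\s*)#?\w+", RegexOptions.IgnoreCase);

    // Returns the color value from a style string, or null when none is found.
    public static string ExtractColor(string style)
    {
        Match match = ColorRegex.Match(style);
        return match.Success ? match.Value : null;
    }
}
```

One caveat with this simple pattern: it also matches inside "background-color:", so a stricter pattern may be needed if both properties can appear.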
It is possible to do it using regular expressions but... You should not parse (X)HTML with regex.
The first regexp I came with to solve the problem is:
<p(\w|\s|[="])+color:(#([0-9a-f]{6}|[0-9a-f]{3}))">(\w|\s)+</p>
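With this pattern, group 2 is the colour with its leading # (group 3 is just the 3 or 6 hexadecimal digits) and group 4 covers the text inside the tag. Because group 4 is repeated, it keeps only the last character it matched; wrap the repetition as ((?:\w|\s)+) to capture the whole text.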
Obviously, it's not the best solution as I'm not a regexp master and obviously it needs some testing and probably generalisation... But still it's a good point to start with.
This is what I tried:
string myURL= "http://mysite.com/articles/healthrelated";
String idStr = myURL.Substring(myURL.LastIndexOf('/') + 1);
I need to fetch "healthrelated", i.e. the text after the last slash in the URL. The problem is that my URL can also look like this:
"http://mysite.com/articles/healthrelated/"
i.e. with a slash at the end as well. Now the last slash is the one AFTER "healthrelated", so the result I get using
String idStr = myURL.Substring(myURL.LastIndexOf('/') + 1);
is an empty string.
What should my code look like so that I always get the text "healthrelated", whether or not there is a trailing slash?
Try this.
// Requires: using System; using System.Linq;
var lastSegment = myURL
    .Split(new string[] { "/" }, StringSplitOptions.RemoveEmptyEntries)
    .Last();
Why don't you use Uri class of .NET and use segments property:
http://msdn.microsoft.com/en-us/library/system.uri.segments.aspx
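A minimal sketch of the Uri.Segments suggestion, assuming the input is always an absolute URL. UrlSketch and LastSegment are hypothetical names. Each segment except the last keeps its trailing slash, so the slash is trimmed before returning.

```csharp
using System;

public static class UrlSketch
{
    // Returns the last path segment of an absolute URL, without a trailing slash.
    public static string LastSegment(string url)
    {
        string[] segments = new Uri(url).Segments;
        return segments[segments.Length - 1].TrimEnd('/');
    }
}
```

Both "http://mysite.com/articles/healthrelated" and "http://mysite.com/articles/healthrelated/" give "healthrelated". A URL with no path beyond "/" gives an empty string.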
You can either use a regex (which I'm not an expert on, but I'm sure other people here are ;) ) or a simple split, after trimming any trailing slash:
string[] urlParts = myURL.TrimEnd('/').Split('/');
and take the last string in this array.
In C#, Windows Form, how would I accomplish this:
07:55 Header Text: This is the data<br/>07:55 Header Text: This is the data<br/>07:55 Header Text: This is the data<br/>
So, as you can see, I have a returned string that can be rather long, and I want to format the data to look something like this:
<b><font color="Red">07:55 Header Text</font></b>: This is the data<br/><b><font color="Red">07:55 Header Text</font></b>: This is the data<br/><b><font color="Red">07:55 Header Text</font></b>: This is the data<br/>
As you can see, I essentially want to prepend <b><font color="Red"> to the front of the time and header text, and append </font></b> right before the : section.
So yeah lol, I'm kinda lost.
I have messed around with .Replace() and Regex patterns, but without much success. I don't really want to REPLACE text, just append/prepend at certain positions.
Is there an easy way to do this?
Note: the [] tags are actually <> tags, but I can't use them here lol
Just because you're using RegEx doesn't mean you have to replace text.
The following regular expression:
(\d+:\d+.*?:)(\s.*?\[br/\])
has two capturing groups. You can then replace each match with the following (\1 and \2 refer to the two groups; .NET's Regex.Replace writes them as $1 and $2):
[b][font color="Red"]\1[/font][/b]\2
Which should result in the following output:
[b][font color="Red"]07:55 Header Text:[/font][/b] This is the data[br/]
[b][font color="Red"]07:55 Header Text:[/font][/b] This is the data[br/]
[b][font color="Red"]07:55 Header Text:[/font][/b] This is the data[br/]
Edit: Here's some C# code which demonstrates the above:
var fixMe = @"07:55 Header Text: This is the data[br/]07:55 Header Text: This is the data[br/]07:55 Header Text: This is the data[br/]";
var regex = new Regex(@"(\d+:\d+.*?:)(\s.*?\[br/\])");
var matches = regex.Matches(fixMe);
var prepend = @"[b][font color=""Red""]";
var append = @"[/font][/b]";
string outputString = "";
foreach (Match match in matches)
{
outputString += prepend + match.Groups[1] + append + match.Groups[2] + Environment.NewLine;
}
Console.Out.WriteLine(outputString);
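The same substitution can also be done in one call with Regex.Replace, using .NET's $1 and $2 group references in the replacement string. A sketch with the answer's pattern and the [] placeholder tags; HeaderSketch and Highlight are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderSketch
{
    // Wraps each "time + header:" part in the bold/red tags and leaves
    // the data part untouched. No newline is added between entries.
    public static string Highlight(string input)
    {
        return Regex.Replace(
            input,
            @"(\d+:\d+.*?:)(\s.*?\[br/\])",
            @"[b][font color=""Red""]$1[/font][/b]$2");
    }
}
```

Unlike the loop above, this keeps the entries on one line. As in the answer's pattern, the colon ends up inside the closing tags.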
Have you tried string.Insert()?
Have you considered creating a style and setting the css class of each line by wrapping each line in a p or div tag?
Easier to maintain and to construct.
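The easiest way is probably to use string.Replace() and string.Split(). Say your input string is input (untested):
// Requires: using System; using System.Linq;
var output = string.Join("<br/>", input
    // Drop the empty entry left by the trailing <br/>.
    .Split(new[] { "<br/>" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(l => "<b><font color=\"Red\">" + l.Replace(": ", "</font></b>: "))
) + "<br/>";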