Populating a span with a string including a linked email [duplicate] - c#

What is the difference between innerHTML, innerText and value in JavaScript?

The examples below refer to the following HTML snippet:
<div id="test">
Warning: This element contains <code>code</code> and <strong>strong language</strong>.
</div>
The node will be referenced by the following JavaScript:
var x = document.getElementById('test');
element.innerHTML
Sets or gets the HTML syntax describing the element's descendants
x.innerHTML
// => "
// => Warning: This element contains <code>code</code> and <strong>strong language</strong>.
// => "
This is part of the W3C's DOM Parsing and Serialization Specification. Note it's a property of Element objects.
node.innerText
Sets or gets the text between the start and end tags of the object
x.innerText
// => "Warning: This element contains code and strong language."
innerText was introduced by Microsoft and was for a while unsupported by Firefox. In August of 2016, innerText was adopted by the WHATWG and was added to Firefox in v45.
innerText gives you a style-aware, representation of the text that tries to match what's rendered in by the browser this means:
innerText applies text-transform and white-space rules
innerText trims white space between lines and adds line breaks between items
innerText will not return text for invisible items
innerText will return textContent for elements that are never rendered like <style /> and `
Property of Node elements
node.textContent
Gets or sets the text content of a node and its descendants.
x.textContent
// => "
// => Warning: This element contains code and strong language.
// => "
While this is a W3C standard, it is not supported by IE < 9.
Is not aware of styling and will therefore return content hidden by CSS
Does not trigger a reflow (therefore more performant)
Property of Node elements
node.value
This one depends on the element that you've targeted. For the above example, x returns an HTMLDivElement object, which does not have a value property defined.
x.value // => null
Input tags (<input />), for example, do define a value property, which refers to the "current value in the control".
<input id="example-input" type="text" value="default" />
<script>
document.getElementById('example-input').value //=> "default"
// User changes input to "something"
document.getElementById('example-input').value //=> "something"
</script>
From the docs:
Note: for certain input types the returned value might not match the
value the user has entered. For example, if the user enters a
non-numeric value into an <input type="number">, the returned value
might be an empty string instead.
Sample Script
Here's an example which shows the output for the HTML presented above:
var properties = ['innerHTML', 'innerText', 'textContent', 'value'];
// Writes to textarea#output and console
function log(obj) {
console.log(obj);
var currValue = document.getElementById('output').value;
document.getElementById('output').value = (currValue ? currValue + '\n' : '') + obj;
}
// Logs property as [propName]value[/propertyName]
function logProperty(obj, property) {
var value = obj[property];
log('[' + property + ']' + value + '[/' + property + ']');
}
// Main
log('=============== ' + properties.join(' ') + ' ===============');
for (var i = 0; i < properties.length; i++) {
logProperty(document.getElementById('test'), properties[i]);
}
<div id="test">
Warning: This element contains <code>code</code> and <strong>strong language</strong>.
</div>
<textarea id="output" rows="12" cols="80" style="font-family: monospace;"></textarea>

Unlike innerText, though, innerHTML lets you work with HTML rich text and doesn't automatically encode and decode text. In other words, innerText retrieves and sets the content of the tag as plain text, whereas innerHTML retrieves and sets the content in HTML format.

InnerText property html-encodes the content, turning <p> to <p>, etc. If you want to insert HTML tags you need to use InnerHTML.

In simple words:
innerText will show the value as is and ignores any HTML formatting which may
be included.
innerHTML will show the value and apply any HTML formatting.

Both innerText and innerHTML return internal part of an HTML element.
The only difference between innerText and innerHTML is that: innerText return HTML element (entire code) as a string and display HTML element on the screen (as HTML code), while innerHTML return only text content of the HTML element.
Look at the example below to understand better. Run the code below.
const ourstring = 'My name is <b class="name">Satish chandra Gupta</b>.';
document.getElementById('innertext').innerText = ourstring;
document.getElementById('innerhtml').innerHTML = ourstring;
.name {
color:red;
}
<p><b>Inner text below. It inject string as it is into the element.</b></p>
<p id="innertext"></p>
<br>
<p><b>Inner html below. It renders the string into the element and treat as part of html document.</b></p>
<p id="innerhtml"></p>

var element = document.getElementById("main");
var values = element.childNodes[1].innerText;
alert('the value is:' + values);
To further refine it and retrieve the value Alec for example, use another .childNodes[1]
var element = document.getElementById("main");
var values = element.childNodes[1].childNodes[1].innerText;
alert('the value is:' + values);

In terms of MutationObservers, setting innerHTML generates a childList mutation due to the browsers removing the node and then adding a new node with the value of innerHTML.
If you set innerText, a characterData mutation is generated.

innerText property sets or returns the text content as plain text of the specified node, and all its descendants, whereas the innerHTML property gets and sets the plain text or HTML contents in the elements. Unlike innerText, innerHTML lets you work with HTML rich text and doesn’t automatically encode and decode text.

InnerText will only return the text value of the page with each element on a newline in plain text, while innerHTML will return the HTML content of everything inside the body tag, and childNodes will return a list of nodes, as the name suggests.

The innerText property returns the actual text value of an html element while the innerHTML returns the HTML content. Example below:
var element = document.getElementById('hello');
element.innerText = '<strong> hello world </strong>';
console.log('The innerText property will not parse the html tags as html tags but as normal text:\n' + element.innerText);
console.log('The innerHTML element property will encode the html tags found inside the text of the element:\n' + element.innerHTML);
element.innerHTML = '<strong> hello world </strong>';
console.log('The <strong> tag we put above has been parsed using the innerHTML property so the .innerText will not show them \n ' + element.innerText);
console.log(element.innerHTML);
<p id="hello"> Hello world
</p>

To add to the list, innerText will keep your text-transform, innerHTML wont.

#rule:
innerHTML
write: whatever String you write to the ele.innerHTML, ele (the code of the element in the html file) will be exactly same as it is written in the String.
read : whatever you read from the ele.innerHTML to a String, the String will be exactly same as it is in ele (the html file).
=> .innerHTML will not make any modification for your read/write
innerText
write: when you write a String to the ele.innerText, any html reserved special character in the String will be encoded into html format first, then stored into the ele.
eg: <p> in your String will become <p> in the ele
read : when you read from the ele.innerText to a String,
any html reserved special character in the ele will be decoded back into a readable text format,
eg: <p> in the ele will become back into <p> in your String
any (valid) html tag in the ele will be removed -- so it becomes "plain text"
eg: if <em>you</em> can in the ele will become if you can in your String
about invalid html tag
if there is an invalid html tag originally in the ele (the html code), and you read from.innerText, how does the tag gets removed?
-- this ("if there is an invalid html tag originally") should not (is not possible to) happen
but its possible that you write an invalid html tag by .innerHTML (in raw) into ele -- then, this may be auto fixed by the browser.
dont take (-interpret) this as step [1.] [2.] with an order
-- no, take it as step [1.] [2.] are executed at the same time
-- I mean, if the decoded characters in [1.] will form a new tag after the conversion, [2.] does not remove it
(-- cuz [2.] considers what characters are in the ele during the conversion, not the characters they become into after the conversion)
then stored into the String.
jsfiddle: with explanation
(^ this contains much more explanations in comments of the js file, + output in console.log
below is a simplified view, with some output.
(try out the code yourself, also there is no guarantee that my explanations are 100% correct.))
<p id="mainContent">This is a <strong>sample</strong> sentennce for Reading.</p>
<p id="htmlWrite"></p>
<p id="textWrite"></p>
// > #basic (simple)
// read
var ele_mainContent = document.getElementById('mainContent');
alert(ele_mainContent.innerHTML); // This is a <strong>sample</strong> sentennce for Reading. // >" + => `.innerHTML` will **not make any modification** for your read/write
alert(ele_mainContent.innerText); // This is a sample sentennce for Reading. // >" 2. any (valid) `html tag` in the `ele` will be **removed** -- so it becomes "plain text"
// write
var str_WriteOutput = "Write <strong>this</strong> sentence to the output.";
var ele_htmlWrite = document.getElementById('htmlWrite');
var ele_textWrite = document.getElementById('textWrite');
ele_htmlWrite.innerHTML = str_WriteOutput;
ele_textWrite.innerText = str_WriteOutput;
alert(ele_htmlWrite.innerHTML); // Write <strong>this</strong> sentence to the output. // >" + => `.innerHTML` will **not make any modification** for your read/write
alert(ele_htmlWrite.innerText); // Write this sentence to the output. // >" 2. any (valid) `html tag` in the `ele` will be **removed** -- so it becomes "plain text"
alert(ele_textWrite.innerHTML); // Write <strong>this</strong> sentence to the output. // >" any `html reserved special character` in the String will be **encoded** into html format first
alert(ele_textWrite.innerText); // Write <strong>this</strong> sentence to the output. // >" 1. any `html reserved special character` in the `ele` will be **decoded** back into a readable text format,
// > #basic (more)
// write - with html encoded char
var str_WriteOutput_encodedChar = "What if you have <strong>encoded</strong> char in <strong>the</strong> sentence?";
var ele_htmlWrite_encodedChar = document.getElementById('htmlWrite_encodedChar');
var ele_textWrite_encodedChar = document.getElementById('textWrite_encodedChar');
ele_htmlWrite_encodedChar.innerHTML = str_WriteOutput_encodedChar;
ele_textWrite_encodedChar.innerText = str_WriteOutput_encodedChar;
alert(ele_htmlWrite_encodedChar.innerHTML); // What if you have <strong>encoded</strong> char in <strong>the</strong> sentence?
alert(ele_htmlWrite_encodedChar.innerText); // What if you have <strong>encoded</strong> char in the sentence?
alert(ele_textWrite_encodedChar.innerHTML); // What if you have &lt;strong&gt;encoded&lt;/strong&gt; char in <strong>the</strong> sentence?
alert(ele_textWrite_encodedChar.innerText); // What if you have <strong>encoded</strong> char in <strong>the</strong> sentence?
// > #note-advance: read then write
var ele__htmlRead_Then_htmlWrite = document.getElementById('htmlRead_Then_htmlWrite');
var ele__htmlRead_Then_textWrite = document.getElementById('htmlRead_Then_textWrite');
var ele__textRead_Then_htmlWrite = document.getElementById('textRead_Then_htmlWrite');
var ele__textRead_Then_textWrite = document.getElementById('textRead_Then_textWrite');
ele__htmlRead_Then_htmlWrite.innerHTML = ele_mainContent.innerHTML;
ele__htmlRead_Then_textWrite.innerText = ele_mainContent.innerHTML;
ele__textRead_Then_htmlWrite.innerHTML = ele_mainContent.innerText;
ele__textRead_Then_textWrite.innerText = ele_mainContent.innerText;
alert(ele__htmlRead_Then_htmlWrite.innerHTML); // This is a <strong>sample</strong> sentennce for Reading.
alert(ele__htmlRead_Then_htmlWrite.innerText); // This is a sample sentennce for Reading.
alert(ele__htmlRead_Then_textWrite.innerHTML); // This is a <strong>sample</strong> sentennce for Reading.
alert(ele__htmlRead_Then_textWrite.innerText); // This is a <strong>sample</strong> sentennce for Reading.
alert(ele__textRead_Then_htmlWrite.innerHTML); // This is a sample sentennce for Reading.
alert(ele__textRead_Then_htmlWrite.innerText); // This is a sample sentennce for Reading.
alert(ele__textRead_Then_textWrite.innerHTML); // This is a sample sentennce for Reading.
alert(ele__textRead_Then_textWrite.innerText); // This is a sample sentennce for Reading.
// the parsed html after js is executed
/*
<html><head>
<meta charset="utf-8">
<title>Test</title>
</head>
<body>
<p id="mainContent">This is a <strong>sample</strong> sentennce for Reading.</p>
<p id="htmlWrite">Write <strong>this</strong> sentence to the output.</p>
<p id="textWrite">Write <strong>this</strong> sentence to the output.</p>
<!-- P2 -->
<p id="htmlWrite_encodedChar">What if you have <strong>encoded</strong> char in <strong>the</strong> sentence?</p>
<p id="textWrite_encodedChar">What if you have &lt;strong&gt;encoded&lt;/strong&gt; char in <strong>the</strong> sentence?</p>
<!-- P3 #note: -->
<p id="htmlRead_Then_htmlWrite">This is a <strong>sample</strong> sentennce for Reading.</p>
<p id="htmlRead_Then_textWrite">This is a <strong>sample</strong> sentennce for Reading.</p>
<p id="textRead_Then_htmlWrite">This is a sample sentennce for Reading.</p>
<p id="textRead_Then_textWrite">This is a sample sentennce for Reading.</p>
</body></html>
*/

innerhtml will apply html codes
innertext will put content as text so if you have html tags it will show as text only

1)innerHtml
sets all the html content inside the tag
returns all the html content inside the tag
includes styling + whitespaces
2)innerText
sets all the content inside the tag (with tag wise line breaks)
returns all html content inside the tag (with tag wise line breaks)
ignores tags (shows only text)
ignores styling + whitespaces
if we have style:"visibility:hidden;" inside tag
|_ innerText includes the styling -> hides content
3)textContent
sets all the content inside the tag (no tag wise line breaks)
returns all content inside the tag (no tag wise line breaks)
includes whitespaces
if we have style:"visibility:hidden;" inside tag
|_ textContent ignores the styling -> shows content
textContent has better performance because its value is not parsed as HTML.

Related

Agility Helper Html retrieving p/paragraphs text until another anchor is reached

I am using Agility Helper HTML and I have thus far a code as such:
var linkWeb = new HtmlWeb();
var linkDoc = web.Load(link);
foreach (HtmlNode l in linkDoc.DocumentNode.SelectNodes("//p"))
{
Console.WriteLine("text #"+ i++= + l.InnerText);
}
So this reads the web paragraph text just fine except, I want it to read all the paragraphs text combined until another anchor a tag is reached or if you can think of a better method.
<p>
PART 1
CONTENT1;
CONTENT2;
</p>
<p>CONTENT3.</p>
<p>
PART 2
CONTENT1
CONTENT2
CONTENT3
CONTENT4
</p>
<p>CONTENT5.</p>
<p>CONTENT6.</p>
<p>CONTENT8.</p>
<p>
PART 3
CONTENT1
CONTENT2
CONTENT3
CONTENT4.
</p>
So right now with the code I have, it reads the P text of each paragraph separately.
TEXT #1 is
CONTENT1
CONTENT2
TEXT # 2 is
CONTENT3.
I want this to read
TEXT #1 is
CONTENT1
CONTENT2
CONTENT3.
this is dynamic and # of paragraphs change.
Some kind of check to make sure before hitting the anchor it reads all paragraphs / InnerTexts and knows it is the supposed to be in the same Text #.
You could implement this like:
foreach (HtmlNode l in linkDoc.DocumentNode.SelectNodes("//p"))
{
if (l.ChildNodes.Any(node => node.Name == "a"))
{
Console.WriteLine();
Console.Write("text #" + i++);
}
Console.Write(l.InnerText + " ");
}

Unable to access div content from C#

I only see the space tags "\r\n "\r\n" for InnerHTML & InnerText properties and not the actual content. Where am i going wrong
RENDERED HTML:
<div id="urllist" runat="server">
http://test1t.com
<br></br>
http://test2.com
<br></br>
</div>
C#:
HtmlContainerControl list = (HtmlContainerControl)urllist;
string string1 = list.InnerHtml;
string string2 = list.InnerText;
//this didnt work either
string string1 = urllist.InnerHtml;
string string2 = urllist.InnerText;
If i remember correctly you have to use Controls[0] to find the literal control that contains the text:
var div = (HtmlGenericControl) urllist;
var lit = (LiteralControl) div.Controls[0];
string text = lit.Text;
Update: tested, it works. This is text:
http://test1t.com
<br></br>
http://test2.com
<br></br>
However, now i have tested it with your approach and it works also.
I would have added a comment, but I cannot add images in comments. See below, I've tested your code and it works:
Are you sure you don't check your result in a HTML page or that you are not altering your result in any way before you check it?

How to unescape special characters in c#

I have the following code
XElement element = new XElement("test", "a&b");
where
element.LastNode contains the value "a&b".
i wanted to be it "a&b".
How do i replace this?
Wait a moment,
<test>a&b</test>
is not valid XML. You cannot make XML that looks like this. This is clarified by the XML standard.
& has special meaning, it denotes an escaped character that may otherwise be invalid. An '&' character is encoded as & in XML.
for what its worth, this is invalid HTML for the same reason.
<!DOCTYPE html> <html> <body> a&b </body> </html>
If I write the code,
const string Value = "a&b";
var element = new XElement("test", Value);
Debug.Assert(
string.CompareOrdinal(Value, element.Value) == 0,
"XElement is mad");
it runs without error, XElement encodes and decodes to and from XML as necessary.
To unescape or decode the XML element you simply read XElement.Value.
If you want to make a document that looks like
<test>a&b</test>
you can but it is not XML or HTML, tools for working with HTML or XML won't intentionally help you. You'll have make your own Readers, Writers and Parsers.
The & is a reserved character so it will allways be encoded. So you have to decode:
Is this an option:
HttpUtility.HtmlDecode Method (String)
Usage:
string decoded = HttpUtility.HtmlDecode("a&b");
// returns "a&b"
Try following:
public static string GetTextFromHTML(String htmlstring)
{
// replace all tags with spaces...
htmlstring= Regex.Replacehtmlstring)#"<(.|\n)*?>", " ");
// .. then eliminate all double spaces
while (htmlstring).Contains(" "))
{
htmlstring= htmlstring.Replace(" ", " ");
}
// clear out non-breaking spaces and & character code
htmlstring = htmlstring.Replace(" ", " ");
htmlstring = htmlstring.Replace("&", "&");
return htmlstring;
}

Get colored texts within HTML code

I have a Html code and I want to Convert it to plain text but keep only colored text tags.
for example:
when I have below Html:
<body>
This is a <b>sample</b> html text.
<p align="center" style="color:#ff9999">this is only a sample<p>
....
and some other tags...
</body>
</html>
I want the output:
this is a sample html text.
<#ff9999>this is only a sample<>
....
and some other tags...
I'd use parser to parse HTML like HtmlAgilityPack, and use regular expressions to find the color value in attributes.
First, find all the nodes that contain style attribute with color defined in it by using xpath:
var doc = new HtmlDocument();
doc.LoadHtml(html);
var nodes = doc.DocumentNode
.SelectNodes("//*[contains(#style, 'color')]")
.ToArray();
Then the simplest regex to match a color value: (?<=color:\s*)#?\w+.
var colorRegex = new Regex(#"(?<=color:\s*)#?\w+", RegexOptions.IgnoreCase);
Then iterate through these nodes and if there is a regex match, replace the inner html of the node with html encoded tags (you'll understand why a little bit later):
foreach (var node in nodes)
{
var style = node.Attributes["style"].Value;
if (colorRegex.IsMatch(style))
{
var color = colorRegex.Match(style).Value;
node.InnerHtml =
HttpUtility.HtmlEncode("<" + color + ">") +
node.InnerHtml +
HttpUtility.HtmlEncode("</" + color + ">");
}
}
And finally get the inner text of the document and perform html decoding on it (this is because inner text strips all the tags):
var txt = HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);
This should return something like this:
This is a sample html text.
<#ff9999>this is only a sample</#ff9999>
....
and some other tags...
Of course you could improve it for your needs.
It is possible to do it using regular expressions but... You should not parse (X)HTML with regex.
The first regexp I came with to solve the problem is:
<p(\w|\s|[="])+color:(#([0-9a-f]{6}|[0-9a-f]{3}))">(\w|\s)+</p>
Group 5th will be the hex (3 or 6 hexadecimals) colour and group 6th will be the text inside the tag.
Obviously, it's not the best solution as I'm not a regexp master and obviously it needs some testing and probably generalisation... But still it's a good point to start with.

Extract content in paragraph Tags

I have following html in string and i have to extract the content only in Paragraph tags any ideas??
link is http://www.public-domain-content.com/books/Coming_Race/C1P1.shtml
I have tried
const string HTML_TAG_PATTERN = "<[^>]+.*?>";
static string StripHTML(string inputString)
{
return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
}
it removes all html tags but i dont want to remove all the tags because this is the way how i can get content like paragraph by tags
secondly it makes line breaks to \n in text and and applying replace("\n","") dose not helps
one problem is that when i apply
int UrlStart = e.Result.IndexOf("<p>"), urlEnd = e.Result.IndexOf("<p> </p></td>\r" );
string paragraph = e.Result.Substring(UrlStart, urlEnd);
extractedContent.Text = paragraph.Replace(Environment.NewLine, "");
<p> </p></td>\r this appears at the end of paragraph but urlEnd dose not makes sure only paragraph is shown
the string extracted is shown in visual studio is like this
this page is downloaded by Webclient
End of HTMLpage
We will provide ourselves with ropes of\rsuitable length and strength- and- pardon me- you must not\rdrink more to-night. our hands and feet must be steady and\rfirm tomorrow.\"\r<p> </p> </td>\r </tr>\r\r <tr>\r <td height=\"25\" width=\"10%\">\r \r </td><td height=\"25\" width=\"80%\" align=\"center\">\r <font color=\"#FFFFFF\">\r <font size=\"4\">1</font> \r </font></td>\r <td height=\"25\" width=\"10%\" align=\"right\">Next</td>\r </tr>\r </table>\r </center>\r</div>\r<p align=\"center\"><b>The Coming Race -by- Edward Bulwer Lytton</b></p>\r<P><B><center>Encyclopedia - Books - Religion<a/> - <A HREF=\"http://www.public-domain-content.com/links2.shtml\">Links - Home - Message Boards</B><BR>This Wikipedia content is licensed under the <a href=\"http://www.gnu.org/copyleft/fdl.html\">GNU Fr
Don't use regular expressions to parse HTML. Use the HTML Agility Pack (or something similar) instead.
A quick example, but you could do something like this:
HtmlDocument document = new HtmlDocument();
document.Load("your_file_here.htm");
foreach(HtmlNode paragraph in document.DocumentElement.SelectNodes("//p"))
{
// do something with the paragraph node here
string content = paragraph.InnerText; // or something similar
}

Categories

Resources