Get text from HTML [duplicate] - c#

This question already has answers here:
How do you convert Html to plain text?
(20 answers)
Closed 1 year ago.
I need a way to get all the text from my aspx files.
They may also contain JavaScript, but I only need this for the HTML code.
Basically I need to extract everything in Text or Value attributes, text within code, whatever...
Is there any parser API available?
Cheers!
Alex

As an alternative, you might consider playing with LINQ to XML to strip the interesting stuff out.
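A minimal sketch of the LINQ to XML idea, assuming the markup is well-formed XML. The sample string and variable names are hypothetical, and the code uses top-level statements (C# 9 or later). Real .aspx files usually contain `<%@ ... %>` directives and undeclared `asp:` prefixes, which XDocument will reject, so this only applies to markup that already parses as XML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, well-formed markup; not taken from the question.
string markup =
    "<div><span Text=\"Hello\">World</span><input Value=\"Go\" /></div>";

XDocument doc = XDocument.Parse(markup);

// Values of every attribute named Text or Value (case-insensitive).
var attributeValues = doc.Descendants()
    .Attributes()
    .Where(a => a.Name.LocalName.Equals("Text", StringComparison.OrdinalIgnoreCase)
             || a.Name.LocalName.Equals("Value", StringComparison.OrdinalIgnoreCase))
    .Select(a => a.Value)
    .ToList();

// Non-empty text nodes anywhere in the tree.
var textNodes = doc.DescendantNodes()
    .OfType<XText>()
    .Select(t => t.Value.Trim())
    .Where(s => s.Length > 0)
    .ToList();

Console.WriteLine(string.Join(", ", attributeValues));
Console.WriteLine(string.Join(", ", textNodes));
```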

Related

How to remove Only HTML tags in the program [duplicate]

This question already has an answer here:
Retrieving Inner Text of Html Tag C#
(1 answer)
Closed 3 years ago.
I want to remove HTML tags from some source text with C#.
Unfortunately, there is some content like <This is content>.
First, I tried the Regex class like this:
Regex.Replace(htmltext,"[\\x00-\\x1f<>:\"/\\\\|?*]" +
"|^(CON|PRN|AUX|NUL|COM[0-9]|LPT[0-9]|CLOCK\\$)(\\.|$)" +
"|[\\. ]$", String.Empty);
But in this case,
"<This is content>" was removed as well.
So can anyone please tell me how to remove only the HTML tags in the program?
Thanks and regards.
Don't try and parse HTML with Regex. It tends not to go well.
Use a parser, HTML Agility Pack is very popular.
Using HTML Agility Pack you can simply read the InnerText property to extract the contents without HTML tags.
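A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is referenced and top-level statements (C# 9 or later) are available. The sample markup is hypothetical. It also assumes the literal angle-bracket text is entity-encoded in the source (`&lt;This is content&gt;`); an unencoded `<This is content>` would most likely be treated as a tag by the parser and dropped.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

// Hypothetical input: real tags plus entity-encoded literal text.
string html = "<p>Keep <b>&lt;This is content&gt;</b> here</p>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// InnerText drops the tags but leaves entities encoded.
string text = doc.DocumentNode.InnerText;

// Decode entities so the literal angle brackets come back.
string decoded = WebUtility.HtmlDecode(text);

Console.WriteLine(decoded);
```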

replace string between delimiters [duplicate]

This question already has answers here:
Are there any CSV readers/writer libraries in C#? [closed]
(5 answers)
Closed 6 years ago.
I've been trying my luck with Regex but my understanding doesn't seem to be the best.
Problem
I have a .csv file given to me by a 3rd party. I cannot edit it but need to read the data into my application.
There are always 12 columns in the file. However, sometimes it will go like this:
text, text ,text,"text with comma,"
text, text, text, text....
text, text, text,"text with comma,","text with comma again", text...
What I need to do is replace all the commas between the "" with a -.
Any help would be appreciated!
This might do the trick for you:
foreach (Match match in Regex.Matches(YourCSV, "\"([^\"]*)\""))
    if (match.ToString().Contains(","))
        YourCSV = YourCSV.Replace(match.ToString(), match.ToString().Replace(",", "-"));
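The same idea can be sketched in a single pass with a match evaluator, which avoids repeated string replacement over the whole file. The helper name `ReplaceCommasInsideQuotes` is hypothetical, and the code assumes top-level statements (C# 9 or later). A dedicated CSV reader, as the linked duplicate suggests, is likely the more robust route for real files.

```csharp
using System;
using System.Text.RegularExpressions;

// Replace commas with '-' only inside double-quoted fields.
string ReplaceCommasInsideQuotes(string csv) =>
    Regex.Replace(csv, "\"[^\"]*\"", m => m.Value.Replace(',', '-'));

// Sample line taken from the question.
string line = "text, text ,text,\"text with comma,\"";
Console.WriteLine(ReplaceCommasInsideQuotes(line));
// prints: text, text ,text,"text with comma-"
```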

How can html be parsed as XML when containing '...&body='? [duplicate]

This question already has answers here:
What is the best way to parse html in C#? [closed]
(15 answers)
Closed 8 years ago.
I have an HTML file that is a well-formed XML document (tags are paired), but it contains an anchor like the one below:
link
The XML parser invoked by XDocument.Load throws an XmlException that says:
Additional information: '=' is an unexpected token. The expected token is ';'.
How can I tell the parser that '&body' is not an entity? Must I escape the '&' character?
Not all HTML is going to be valid XML, so you shouldn't try to parse it as such (although, in this case, it looks like you have some unescaped strings in the document that should probably be taken care of).
Instead, you should use something like the HTMLAgilityPack to parse your HTML and work with the document that way.
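For the narrower case where the document really is XML apart from the bare ampersand, escaping it as `&amp;` lets XDocument parse it. The sketch below uses a hypothetical mailto anchor (the original link was not preserved in the question) and assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical anchor with an unescaped '&' in the href.
string raw =
    "<p><a href=\"mailto:someone@example.com?subject=Hi&body=Hello\">link</a></p>";

// The raw markup is rejected because '&body' looks like an unterminated entity.
bool rawParses;
try { XDocument.Parse(raw); rawParses = true; }
catch (XmlException) { rawParses = false; }

// Escaping the ampersand makes the markup valid XML.
string escaped = raw.Replace("&body=", "&amp;body=");
XDocument doc = XDocument.Parse(escaped);

// The parsed attribute value contains the plain '&' again.
var href = doc.Descendants("a").First().Attribute("href")?.Value;

Console.WriteLine(rawParses);
Console.WriteLine(href);
```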

Easiest way to extract some html from string [duplicate]

This question already has answers here:
What is the best way to parse html in C#? [closed]
(15 answers)
Closed 9 years ago.
I have a long C# string of HTML code and I want to specifically extract the bullet points "<ul><li></li></ul>".
Say I have the following HTML string.
var html = "<div class=ClassC441AA82DA8C5C23878D8>Here is a text that should be ignored.</div>This text should be ignored too<br><ul><li>* Need this one</li><li>Another bullet point I need</li><li>A bulletpoint again that I want</li><li>And this is the last bullet I want</li></ul><div>Ignore this line and text</div><p>Ignore this as well.</p>Text not important.";
I need everything between the '<ul>' to '</ul>' tags. The '<ul>' tag can be excluded.
Regular expressions are not my strong suit, but if they can be used I need some help.
My code is in C#.
You should use the HtmlAgilityPack for things like this. I wrote a little introduction to it a while ago that may help you get going: http://colinmackay.scot/2011/03/22/a-quick-intro-to-the-html-agility-pack/
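As a rough sketch of what that looks like for the list case, assuming the HtmlAgilityPack NuGet package is referenced and top-level statements (C# 9 or later) are available. The shortened input string is hypothetical, and the XPath `//ul` simply picks the first list in the document.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Shortened, hypothetical version of the HTML from the question.
string html =
    "<div>Ignore me</div>Ignore too<br>" +
    "<ul><li>First</li><li>Second</li></ul>" +
    "<p>Ignore this as well.</p>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// First <ul> in the document; null if there is none.
HtmlNode ul = doc.DocumentNode.SelectSingleNode("//ul");

// Markup between <ul> and </ul>, without the <ul> tag itself.
string inner = ul.InnerHtml;

// Text of each bullet point.
var bullets = ul.SelectNodes("./li").Select(li => li.InnerText).ToList();

Console.WriteLine(inner);
foreach (string bullet in bullets) Console.WriteLine(bullet);
```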

How to search and highlight text containing special characters using javascript/jquery/c# [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Full text search in HTML ignoring tags / &
I did a lot of googling but didn't find any help.
I have a WebBrowser control which has an HTML body. The body contains data that also includes special characters. I want to add a search box which will search for text and highlight it in the page. The text can include special characters like \,/,?,$,^,&,<,> .
How should I achieve this using jQuery/JavaScript or C#?
Here's an answer I gave to a similar question:
https://stackoverflow.com/a/5887719/96100
However, window.find(), which the above answer relies on, is likely to be removed from browsers in the future and is not going to be replaced in the short term. That being the case, I've written a flexible search function for my Rangy library. Demo (with highlighting) here:
http://rangy.googlecode.com/svn/trunk/demos/textrange.html
