C# - How can I read a specific line from an HTML page

I have an HTML page with 2 lines. The 2 lines describe an update, but this update can be downloaded only by a specific user.
Basically, the HTML file contains:
beast
1.11
The 1st line is the user and the 2nd is the version. There are no HTML headers or other lines.
Now my problem is that I really have no idea how to read just the 2nd line.
A text document can be easily manipulated by:
File.ReadLines("Text.txt").ElementAt(1);
Is there a similar command which can be used for reading an HTML file line by line?
HUGE thanks for any reply!!!!

You can use that same exact code to read HTML.
HTML files are ordinary text files that happen to contain HTML tags.
If you want to parse the HTML tags, use HTML Agility Pack (on NuGet).
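If the two-line page is served by a web server rather than stored on disk, the same idea still applies: fetch the body as a string, then pick the line by index. The following is a minimal sketch under that assumption; the class name `PageLines`, both helper names, and the URL you pass in are hypothetical and not part of the original question.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageLines
{
    // Returns the zero-based line `index` of `content`.
    // Throws IndexOutOfRangeException if the content has fewer lines.
    public static string GetLine(string content, int index)
    {
        // Accept both Windows (\r\n) and Unix (\n) line endings.
        string[] lines = content.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        return lines[index];
    }

    // Downloads the page body and returns the requested line.
    // `url` is wherever the page is hosted (an assumption in this sketch).
    public static async Task<string> GetLineFromUrlAsync(string url, int index)
    {
        using (var client = new HttpClient())
        {
            string body = await client.GetStringAsync(url);
            return GetLine(body, index);
        }
    }
}
```

For the page in the question, `PageLines.GetLine(body, 0)` would give the user and `PageLines.GetLine(body, 1)` the version.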

Related

C# - Convert HTML String to formatted plain text

Is there a way to convert an HTML string, for example content saved using a WYSIWYG editor such as tinyMCE, to formatted plain text, i.e. retaining line breaks?
I've been looking at the HTML Agility Pack, but there are no examples that show how to achieve something like the above.
Basically, I want to send content saved using tinyMCE in an email notification, but I need to ensure that any styles and formatting saved by tinyMCE don't break the styles in my HTML email, hence the need to convert the content to plain text.
Any examples would be appreciated.

C# ASP.NET: How do I Turn Text into HTML

How would I show the output as HTML? I have tried HtmlDecode and it still didn't work.
@section Grid {
@Server.HtmlDecode(lister.gen(new System.IO.StreamReader(Server.MapPath("~/Grid.xml")).ReadToEnd()))
}
Edit: I am taking XML from output.InnerXml (an XmlDocument) and trying to put it into an HTML document as HTML (as in, <a> is a link and <img> is a picture, not text).
It turns out I had to add @Html.Raw along with HtmlDecode for it to display correctly:
@Html.Raw(Server.HtmlDecode(lister.gen(new System.IO.StreamReader(Server.MapPath("~/Grid.xml")).ReadToEnd())))
If you want to show the HTML source in an HTML page, you need to use HTML encode, not decode. Encoding replaces < and > (and other special characters) with entities such as &lt; and &gt;, so the browser displays the markup as text instead of rendering it.
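To make the difference between the two directions concrete, here is a small sketch. It uses `System.Net.WebUtility` as a stand-in for the `Server.HtmlDecode` call in the question, so it can run outside ASP.NET; the class and method names are hypothetical.

```csharp
using System;
using System.Net;

public static class EncodingDemo
{
    // Encoding escapes the markup, so a browser shows the tags as literal text.
    public static string ShowAsText(string markup)
    {
        return WebUtility.HtmlEncode(markup);
    }

    // Decoding turns escaped markup back into real tags that a browser renders.
    public static string ShowAsMarkup(string escaped)
    {
        return WebUtility.HtmlDecode(escaped);
    }
}
```

For example, `EncodingDemo.ShowAsText("<b>bold</b>")` yields `&lt;b&gt;bold&lt;/b&gt;`, and `EncodingDemo.ShowAsMarkup` reverses that.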
ADDED due to comment:
If showing XML as HTML is the goal, then you will end up extracting the XML from the DOM and putting it into the format you desire. You can bind XML to a table if you are simply trying to get it into a grid. If you need sorting, etc., LINQ to XML works nicely.
ADDED - second edit
Your XML appears to be XHTML, so you can simply throw it into the stream at the correct location. I would create a server control and then Response.Write the XML from the server control. You will want the decoded version based on what you posted. I was assuming you wanted to show the XML in the page, which was incorrect.
There is one minor bit of an issue with the XML, as it contains paragraph and div tags inside of an anchor tag. Not illegal, but not necessary in this case.

Hide part of text temporarily, show after user clicks certain element

I'm making a detail page about certain items.
This detail page can contain large blocks of text, and the customer would like to only show the first 100 letters and then put a " ... more " at the end.
When the user clicks this " ... more " the rest of the text can be shown.
Biggest problem: the text currently lives in a CMS and varies widely. Some of it is pure text, some of it contains HTML elements ...
I tried to cut the text and put the pieces in spans, so I could show or hide these spans as I please. The issue here is that the opening tag of an element can end up in the first span while its closing tag ends up in the second span. This breaks the DOM hierarchy and the result is never pretty.
Does anyone know another way to achieve this, or a library I can use?
To be able to extract "readable" characters you need to get the content into a plain text format (get rid of the mark-up).
Since the content is stored in a CMS, it is likely to be well formed, i.e. XHTML.
If that is the case, you can treat the content as XML. Get the root node and read its inner text property. Then you will have plain text, with no tags, and can easily cut it after the first 100 characters or whatever the requirement is.
Hopefully the content doesn't contain js/css!
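The plain-text approach described above could look roughly like this. It is a sketch that assumes the CMS content really is well-formed XML (an HTML-only entity such as `&nbsp;` or an unclosed tag makes `XElement.Parse` throw); the `Teaser` class and `Truncate` method are hypothetical names.

```csharp
using System;
using System.Xml.Linq;

public static class Teaser
{
    // Parses well-formed markup, drops the tags, and returns at most
    // maxLength characters of the remaining text.
    public static string Truncate(string xhtml, int maxLength)
    {
        // Wrap the fragment in a dummy root so content with several
        // top-level nodes (or bare text) still parses as one document.
        XElement root = XElement.Parse(
            "<root>" + xhtml + "</root>", LoadOptions.PreserveWhitespace);

        // Value is the concatenated text of all descendant text nodes.
        string text = root.Value;

        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }
}
```

Calling `Teaser.Truncate(content, 100)` gives the teaser text to show in front of the " ... more " link.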
Edit:
It seems that the markup must be retained.
Try the following xsl to transform and truncate the content:
https://gist.github.com/allen/65817

DiffPlex does not ignore html tags while comparing two html codes

I am using DiffPlex to compare two pieces of HTML code fetched from a database, but I want to ignore the HTML tags and their styling and compare only the content. Is there any way to achieve this?
I am waiting for your answer guys.
Thanks
I would suggest using the HtmlAgilityPack to strip all the html tags.
See the following question: https://stackoverflow.com/a/16875574/12919

Delete blank pages from WORD open XML

I have successfully generated a Word document using Open XML, but I get too many blank pages.
How can I remove them?
This depends on how those blank pages are represented in the Open XML; you may want to post a sample document to demonstrate exactly how your blank pages are represented.
But let's take the case of a Word document in which a user has inserted extra page breaks (by hitting ctrl-enter in Word), resulting in blank pages. These page breaks will be represented in the XML as:
<w:br w:type="page"/>
The page will still have plenty of tags in it for spacing, fonts, etc.; and the page may display headers and footers, too. But let's define a blank page as one which has no new paragraph text. In Open XML, new text is displayed with a w:t tag.
So, in order to remove blank pages created by extra page breaks with no text in between, we can run the following regular expression on the XML document, replacing with blank (""):
<w:br w:type="page"/>(.(?!<w:t>))*(?=<w:br w:type="page"/>)
This regex will search for a series of two or more page breaks with no new text in between, removing all but the last one.
(Note that this won't take care of blank pages at the end of the document, which is a bit trickier. Additionally, if you'd like to account for pages with images, textboxes, etc., the regex will have to be expanded to include the relevant items).
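Applied from C#, the replacement above might be sketched as follows. The pattern is the one given in this answer, unchanged. The `BlankPages` class, the `Collapse` method, and the choice of `RegexOptions.Singleline` (so that `.` also matches line breaks inside the XML) are assumptions of this sketch. Note that the pattern only looks for the bare `<w:t>` start tag, so text runs written as `<w:t xml:space="preserve">` would not be recognized as text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlankPages
{
    // Literal page-break element as it appears in the document XML.
    private const string PageBreak = @"<w:br w:type=""page""/>";

    // A page break, followed by content in which no <w:t> start tag
    // is encountered, up to (but not including) the next page break.
    private static readonly Regex ExtraBreaks = new Regex(
        PageBreak + @"(.(?!<w:t>))*(?=" + PageBreak + ")",
        RegexOptions.Singleline);

    // Removes every page break that is followed by another page break
    // with no text in between, keeping only the last break of each run.
    public static string Collapse(string documentXml)
    {
        return ExtraBreaks.Replace(documentXml, "");
    }
}
```

`documentXml` here is the raw XML of the main document part as a string; how you read and write that part is outside the scope of this sketch.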
