I want to add an HTML control in C# that selectively displays the text from an HTML page, along with the title given in that page.
Don't forget that when you are programming in ASP.NET, you are really programming in HTML. ASP.NET controls have their effect by generating HTML, which is then sent to the browser.
This changes your question. Your question is really, "how can I use HTML to display the contents of another web site, and how can I make ASP.NET generate the HTML that I need".
You can display the contents of another site by using an iframe:
<iframe id="myOtherSite" src="other site url"></iframe>
You can simply place that on your ASP.NET page. However, it doesn't solve your problem with the title. I expect you can do that with some JavaScript: provided both pages are served from the same origin, your main window can access the DOM of the iframe to pick up the title and put it where you want it.
You could always use a string reader to effectively 'scrape' the page content from the 3rd party site. Then use a simple regular expression check to grab the page title. You could then do with it as you want.
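As a rough illustration of the regular-expression step, here is a minimal sketch. The class name TitleScraper and the method ExtractTitle are hypothetical names chosen for this example, and it assumes the page has already been downloaded into a string. A regular expression is only a quick check; it is not a full HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of any library.
public static class TitleScraper
{
    // Singleline lets '.' span line breaks, in case the title is wrapped.
    private static readonly Regex TitlePattern = new Regex(
        @"<title[^>]*>(?<text>.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the decoded, trimmed title text, or an empty string if no
    // <title> element is found in the supplied HTML.
    public static string ExtractTitle(string html)
    {
        Match match = TitlePattern.Match(html);
        if (!match.Success)
        {
            return string.Empty;
        }

        // Decode entities such as &amp; so the title reads as plain text.
        return WebUtility.HtmlDecode(match.Groups["text"].Value).Trim();
    }
}
```

The HTML string itself could come from WebClient.DownloadString or HttpClient.GetStringAsync; the returned title can then be assigned to whatever control should show it.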
I'm looking for a simple way to get a string from a URL that contains all text actually displayed to the user.
I.e. anything loaded with a delay (using JavaScript) should be contained. Also, the result should ideally be free from HTML tags etc.
A straightforward approach with WebClient.DownloadString() and a subsequent HTML regex is pretty much pointless, because most content in modern web apps is not contained in the initial HTML document.
Most probably you can use Selenium WebDriver to fully load the page and then dump the full DOM.
I am trying to scrape the KB Urls from this page:
https://support.microsoft.com/en-us/kb/894199
On the page, there are URLs such as:
https://support.microsoft.com/kb/2976978
If you open up the developer tools in Chrome, it shows that data is contained like this:
<div class="indent">
<a id="kb-link-142" href="https://support.microsoft.com/kb/2976978" target="_self">https://support.microsoft.com/kb/2976978</a>
</div>
Now based on the above HTML, I believe I should be able to scrape the URLs from the href attribute like this:
// XPath selects attributes with "@", so "//a[@href]" matches every <a> that has an href.
foreach(HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    list.Add(link.GetAttributeValue("href", string.Empty));
}
The problem I am running into though, is that when I download the HTMLSource, the content changes. What I mean is that even though the Developer tools show the above HTML available on the page, if you right click the page and choose to View source, the HTML it shows at that point is totally different, and does not contain any of the URLs that the rendered page displays.
My theory is that there's some kind of file reference where the HTML loads a file somewhere and the file contains the details of the page that is rendered.
So how can I use HTMLAgilityPack to get the URLs that are on the rendered page, since the source doesn't seem to contain them?
Also - I realize my question Title may be really confusing. If there is a technical term for what this page is doing/how it works, let me know and I can update the title so it is more logical and others can search it out in the future.
Okay, I see the problem now. This page is using AngularJS directives and bindings, and the hrefs are loaded after the initial page load. The page we get back is the raw response, before a browser has parsed it or executed any scripts. This means that changes made to the page afterwards by DOM manipulation, JavaScript, or Ajax will not be included in the HtmlDocument response. I think the way to go about this would be to issue the request the way a web browser does, let the JavaScript and Ajax execute completely, and then fetch the content as advised here. Hope this helps!
I'm trying to parse a website. The only problem is that the site doesn't use a specific URL for the content I want to parse. The content is displayed on the same web page using JavaScript, so it differs depending on the search query.
Is it possible to choose a value from a dropdown-menu and then post that to the server and then parse the HTML-code in C#?
Clarification: The code is returned in HTML.
I know the name of the option from the dropdown i want to post, but how do I do that from code-behind?
Most sites do not really generate HTML in JavaScript. Much more often you see ASP.NET sites where JavaScript is used for a postback (and the name of the dropdown is posted back in the __EVENTTARGET field).
Then you can do the same in your application: you have to imitate filling in the form and pass all the fields to the server, including __VIEWSTATE and __EVENTTARGET.
Having said that, it might be against the site's terms of use.
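To make the "imitate filling in the form" step more concrete, here is a minimal sketch of collecting the fields for such a postback. PostbackForm and Build are hypothetical names for this example. It assumes the hidden inputs are rendered with type, name, and value attributes in that order, which is how ASP.NET WebForms usually emits them; a real page may need an HTML parser instead of a regular expression.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of ASP.NET.
public static class PostbackForm
{
    // Assumes attribute order type, name, value on each hidden input.
    private static readonly Regex HiddenInput = new Regex(
        "<input[^>]*type=\"hidden\"[^>]*name=\"(?<name>[^\"]+)\"[^>]*value=\"(?<value>[^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Copies every hidden field (such as __VIEWSTATE) from the downloaded
    // page, then adds the fields a dropdown-triggered postback would send.
    public static Dictionary<string, string> Build(
        string pageHtml, string dropdownName, string selectedValue)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in HiddenInput.Matches(pageHtml))
        {
            fields[m.Groups["name"].Value] =
                WebUtility.HtmlDecode(m.Groups["value"].Value);
        }

        fields["__EVENTTARGET"] = dropdownName;   // control that caused the postback
        fields["__EVENTARGUMENT"] = string.Empty;
        fields[dropdownName] = selectedValue;     // the option being selected
        return fields;
    }
}
```

The resulting dictionary could then be wrapped in a FormUrlEncodedContent and sent with HttpClient.PostAsync to the same URL the page was loaded from; the HTML that comes back is what you would parse.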
You definitely need to check out Selenium; it does exactly what you need. It is commonly used as a testing framework. However, you can use it to manipulate HTML tags even when the website uses JavaScript.
Note: Selenium allows you to open and manipulate a website using a browser such as Firefox, Chrome, IE, etc. However, I think what you need here is WebDriver driving a headless browser, which manipulates the website without opening a visible browser window. Most of my experience using Selenium is with Java, but I found multiple tutorials online for .NET too.
I have been tasked to create a layout editor for my company's internal Reporting System. The specifications they gave me indicate that templates must be able to be defined in .html files in a certain folder. These HTML files can have their own style etc. So it's a full HTML page with the html, head and body tags, with content areas that are indicated with a special syntax.
Now what's been bothering me is that I have to load this page with its styling etc. into a layout div (or iframe maybe?) where I need to be able to work on it with JavaScript (using jQuery) to insert the controls to manage how the data is displayed.
I can't seem to find a way to do this. Any ideas on how to achieve this according to the specifications? Any help will be appreciated.
The only way to load the page with all referenced stylesheets applied appropriately, while avoiding JavaScript conflicts, is to embed the HTML in an iframe.
This does, however, mean that your page will have to be served from the same domain as your application in order for you to be able to interact with the content in an easy way. As long as this is so (possibly using your app as a proxy for the pages), there is cross-browser support out there from jQuery (other JavaScript frameworks are available, I'm sure).
I realize this is probably a fundamental thing I should know, but I am teaching myself C# and ASP.NET, so I am a little lost at this point.
Right now I have 2 pages. One is an .aspx page (with its aspx.cs file included) that is blank; its HTML is generated from a Page_Load function in the cs file. The HTML is very simple: just an image and some text.
The second file is a shtml file which has lots of things: server-side includes, editable and non-editable areas. I want to put my web app into this file. My ASP.NET app uses Response.Write to just write out the HTML. This does not flow well with this page, because Response.Write runs first and so its output ends up at the top of the page.
How can I generate HTML code inside the page, for example within a specific DIV, so it does not mess up the page? What would be a good starting point for learning how to do that?
I should note that I do not need any interaction from the user. All of this should generate right away.
I think you need to read up on some basic ASP.NET documentation and tutorials. Response.Write is not the correct approach; you need to understand how the ASP.NET page lifecycle works and how WebControls are used to render the HTML.
ASP.NET tries to abstract away having to create your HTML manually for the most part.
So, if I have understood the question correctly:
You already have an existing page/application (the shtml file) that you want to extend with some new ASP.NET components by including output from the ASP.NET page in the existing page?
This is not something that is supported out of the box by ASP.NET, and you won't be able to execute the aspx page using SSI. But you can do the opposite: an ASP.NET page does support SSI. So if you are not using any other scripts in the shtml file, this might be a solution.
Otherwise the only common solutions would be either to use an AJAX framework and let it call the ASP.NET page from within the existing pages, or to use an iframe solution. In both cases the client will be responsible for making the calls to the ASP.NET pages and merging the results.
And then you have an issue with controlling the output from the ASP.NET page?
The Polymorphic Podcast has a good article on Controlling HTML in ASP.NET WebForms.
You can add a Literal control to the page inside the div:
<div>
<asp:Literal ID="litMarkup" runat="server" />
</div>
then in your code-behind:
litMarkup.Text = "<strong>Your markup</strong>";
I don't know how well this would work for you, but could you try using an iframe to house the ASP.NET page? This should keep it in the specified region and stop it from overwriting your shtml file. It may be something to think about.
If it is necessary that you generate your HTML output from C# code, and you would use this in more than one place, I think you may be thinking of something like what are called ASP.NET Custom Controls (not to be confused with "User Controls"-- though you probably could put together a solution with those as well, using a Literal control as another person suggested). The MSDN documentation would be a good starting point. In general, though, the writing-out-HTML-yourself-from-code model (like you would with, say, CGI applications), is not the usual ASP.NET model of development, as it largely defeats the point of using ASP.NET at all. You'd mostly want to do this sort of thing if you are writing your own web control, though this might be exactly what you are doing (hard to tell from the description).