How do I get the final rendered HTML of a remote URL? - c#

I'm trying to get the visible HTML of a remote page into a string, after the scripts (aspx, JavaScript) have executed.
To be exact, I want the same HTML I would see if I opened the page in Chrome and pressed F12: the visible, final rendered HTML.
I've been struggling with this for days, so it would be lovely to get some instructions here.
If you have any questions, or feel like my question isn't clear, ask away.

You need a WebView that renders the page and executes JavaScript.
Or try a headless engine like PhantomJS, which does the JavaScript rendering for you.
Afterwards you can get the full DOM of the HTML page.
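
As a rough sketch of the headless-engine route, the snippet below uses Selenium WebDriver with headless Chrome in place of PhantomJS (a swapped-in engine, picked only because Chrome is the browser named in the question). It assumes the Selenium.WebDriver NuGet package and a local Chrome install; RenderedPage and GetHtml are made-up names for illustration.

```csharp
using OpenQA.Selenium.Chrome;

// Hypothetical helper; the names are illustrative, not from any library.
public static class RenderedPage
{
    // Returns the DOM serialized after the page's scripts have run.
    public static string GetHtml(string url)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless"); // no visible browser window

        using (var driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(url); // waits for the page load to finish
            return driver.PageSource;       // current DOM, not the raw server response
        }
    }
}
```

Content that arrives through later Ajax calls may still be missing when the load finishes, so a real scraper would probably need an explicit wait for a known element before reading PageSource.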

Related

How to get URLs on page with HTMLAgilityPack, when the Source does not contain the URLs?

I am trying to scrape the KB Urls from this page:
https://support.microsoft.com/en-us/kb/894199
On the page, there are URLs such as:
https://support.microsoft.com/kb/2976978
If you open up the developer tools in Chrome, it shows that data is contained like this:
<div class="indent">
<a id="kb-link-142" href="https://support.microsoft.com/kb/2976978" target="_self">https://support.microsoft.com/kb/2976978</a>
</div>
Now based on the above HTML, I believe I should be able to scrape the URLs from the href attribute like this:
// XPath attribute tests use @, so //a[@href] selects every anchor that has an href.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    list.Add(link.GetAttributeValue("href", string.Empty));
}
The problem I am running into though, is that when I download the HTMLSource, the content changes. What I mean is that even though the Developer tools show the above HTML available on the page, if you right click the page and choose to View source, the HTML it shows at that point is totally different, and does not contain any of the URLs that the rendered page displays.
My theory is that there's some kind of file reference where the HTML loads a file somewhere and the file contains the details of the page that is rendered.
So how can I use HTMLAgilityPack to get the URLs that are on the rendered page, since the source doesn't seem to contain them?
Also - I realize my question Title may be really confusing. If there is a technical term for what this page is doing/how it works, let me know and I can update the title so it is more logical and others can search it out in the future.
Okay, I see the problem now. This page uses AngularJS directives and bindings, and the hrefs are filled in after the page loads. The page we are getting is the raw response, before any of the parsing and script execution a web browser would do. This means that changes made to the page by DOM manipulation, JavaScript or Ajax will not be included in the HtmlDocument. I think the way to go about this is to behave like a web browser: make the request, let the JavaScript and Ajax execute completely, and then fetch the content. Hope this helps!
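
Once the rendered HTML is available as a string (from a browser control or a headless browser), the extraction step could look roughly like the sketch below. It assumes the HtmlAgilityPack package; LinkExtractor and GetHrefs are hypothetical names. SelectNodes returns null rather than an empty collection when nothing matches, so a page with no matching anchors needs a null check.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
public static class LinkExtractor
{
    // Collects the href value of every <a href="..."> in the given HTML.
    public static List<string> GetHrefs(string renderedHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(renderedHtml);

        var list = new List<string>();
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
            return list; // no anchors found

        foreach (HtmlNode link in links)
            list.Add(link.GetAttributeValue("href", string.Empty));
        return list;
    }
}
```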

C#/.NET Webclient, wait for page to finish loading

I'm trying to retrieve the HTML of a page with some Ajax on it.
The problem is that WebClient.DownloadString() returns too fast, so the Ajax page hasn't finished loading and I'm not getting the right HTML :(
Is it possible to call another function or similar, so that I can, for example, request the page, wait a few seconds and then read the response? (That would allow the Ajax to finish loading before I retrieve the HTML.)
Thanks,
Louisa
The WebClient by default only fetches the (HTML) contents of a single URL. It does not parse HTML and thus does not know about any CSS, images or JavaScript used on the page. You are trying to emulate the functionality of a full-blown browser, for which the WebClient alone is insufficient.
To achieve your desired behaviour, you will have to not only retrieve the HTML, but then also parse it, retrieve and execute the JavaScript on the page, and then get the resulting DOM. This is most easily achieved through a library that provides the functionality of a web browser to your application. Examples include System.Windows.Forms.WebBrowser (WinForms), System.Windows.Controls.WebBrowser (WPF) or Awesomium.
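
A minimal sketch of the WinForms option, assuming a Windows machine and a reference to System.Windows.Forms. RenderedHtml, Fetch and the settleMs delay are invented for illustration; the fixed delay is only a guess at how long the Ajax calls need, and there is no timeout handling.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper, not part of any library.
public static class RenderedHtml
{
    // Loads url in a hidden WebBrowser, waits settleMs (must be >= 1) after the
    // document completes so Ajax callbacks can run, then returns the DOM as HTML.
    public static string Fetch(string url, int settleMs = 2000)
    {
        string html = null;

        // WebBrowser is a COM control: it needs an STA thread with a message loop.
        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            using (var timer = new System.Windows.Forms.Timer { Interval = settleMs })
            {
                timer.Tick += (s, e) =>
                {
                    timer.Stop();
                    // Serialize the live DOM, not the original server response.
                    var root = browser.Document.GetElementsByTagName("html");
                    html = root.Count > 0 ? root[0].OuterHtml : browser.DocumentText;
                    Application.ExitThread();
                };

                browser.DocumentCompleted += (s, e) =>
                {
                    // Fires once per frame; only react when the whole page is done.
                    if (browser.ReadyState == WebBrowserReadyState.Complete)
                        timer.Start();
                };

                browser.Navigate(url);
                Application.Run(); // pump messages until ExitThread is called
            }
        });
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return html;
    }
}
```

Usage would be something like string html = RenderedHtml.Fetch("http://example.com/"); where the URL is a placeholder. The control renders with the installed Internet Explorer engine, so the result can differ from what Chrome shows.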

How to get page source of IFrame in a subdomain in IE addon

I am making an IE add-on using BandObjects in C#. I make my web browser navigate to a page, say example.com. That page contains an IFrame whose src is sub.example.com, so the IFrame points to a subdomain. I am able to fetch the URL of the IFrame, but I am unable to get its page source. When I view it in the browser the data is there, but through code I can only see the script, no data.
I am pasting the IFrame:
<iframe height="40" src="http://sub.example.com/....php?style=web&ext=1305964161&hash=Ng1gwLG821-f" frameBorder="0" width="300" scrolling="no"></iframe>
When I view this element through Visual Studio, the HTML view shows me the data (an email address), and the Text view shows this. How do I get the HTML view, or rather the page source, of this IFrame?
So, overall I want the data contained in this IFrame, the browser executes it some way, but how can I do it with code?
I have visited lot of sites, forums, but couldn't get it to work.
Well, in the active browser the code is what the code is; there's no way of viewing the source of a frame within the active window unless the frame is large enough to right-click on and view its own source. Otherwise you're stuck with what's rendered in the browser as your source, and the iframe per se is irrelevant, in a manner of speaking. However, seeing as it's an add-on you're making, is it possible to load the URL of the iframe in a hidden window of sorts and then obtain the code that way, as you would for the active page? Can you use JavaScript anywhere in your add-on? I know that's a silly question, but I've never built an add-on for a browser.
If you can use JavaScript, however, you might be able to catch the source by getting the iframe's name, id or whatever identifies it and then using innerHTML on the element.

How do you get generated HTML from the Windows Phone WebBrowser control?

Before you jump all over me, I know about the SaveToString() method! The problem is that it only grabs the server-returned HTML. If it's further processed through script, I don't see a way to get at the actual rendered HTML. An easy example is a Tumblr page. If you view the SaveToString source on the phone you get a minimal link to the RSS feed and few other tags. Try to view the source in IE or Chrome and you'll see the full (and long) HTML. Any scripting call would replace the HTML, and there don't seem to be any other relevant methods. Any thoughts or ideas?
Have you looked into JavaScript?
The article "Calling JavaScript Functions in WP7" should do you just fine.
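
One way the JavaScript route is often sketched for the phone's WebBrowser control is to ask the page itself for its current DOM through InvokeScript. The snippet below is an assumption-laden illustration, not a confirmed fix: it assumes IsScriptEnabled was set to true before navigating and that it is called after the page has finished loading; WebBrowserExtensions and GetRenderedHtml are hypothetical names.

```csharp
using Microsoft.Phone.Controls;

// Hypothetical extension method for illustration only.
public static class WebBrowserExtensions
{
    // Call after LoadCompleted, with browser.IsScriptEnabled = true set before navigating.
    public static string GetRenderedHtml(this WebBrowser browser)
    {
        // eval runs inside the page and returns the serialized live DOM,
        // unlike SaveToString(), which gives the server-returned HTML.
        return browser.InvokeScript("eval", "document.documentElement.outerHTML") as string;
    }
}
```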

Html rendered under asp:updatepanel does not appear in page source

I am working with .net c#.
Is there a way to see the rendered html code under the updatepanel?
Thanks
more info:
I dynamically generate UI controls and place them in an asp:Panel control that sits inside an UpdatePanel. My page is initially almost empty, and I add about 50 new controls upon a button click. However, I cannot see the generated HTML in the page source. That is, I can see my text field on the screen, but I cannot see the corresponding code in the HTML source in my browser.
Thanks again.
What are you using to view the source? If you are using the View Source functionality in some browsers, this may only be showing you the initial server response, and anything dynamically inserted into the page in an AJAX call might not appear.
If you use a tool like Firebug you can watch the current state of the DOM, which will show you any dynamically inserted elements.
With Internet Explorer you can use the Developer Tools (IE8) to view the actual source, not just the initial source. As Tom said, Firebug will do the same thing in Firefox, and Safari has a similar option whose name I can't remember offhand.
Basically, you need to inspect the DOM instead of the HTML source. Add-ins like Firebug for Firefox and the Developer Tools for IE8 allow you to inspect the DOM and even update it dynamically.
If you need to view the HTML instead of the DOM representation, you can use Fiddler or Firebug's Net panel, which will let you debug HTTP traffic and see the responses returned for the Ajax calls.
It does appear over here, just like normal ASP.NET controls; there is just a little bit of Ajax code that does the updating. Can you be more specific about what you are looking for?
