Before you jump all over me, I know about the SaveToString() method! The problem is that it only grabs the server-returned HTML. If the page is further processed by script, I don't see a way to get at the actual rendered HTML. An easy example is a Tumblr page: if you view the SaveToString() source on the phone, you get a minimal link to the RSS feed and a few other tags. View the source in IE or Chrome and you'll see the full (and long) HTML. Any scripting call would replace the HTML, and there don't seem to be any other relevant methods. Any thoughts or ideas?
Have you had a look into JavaScript?
Calling JavaScript Functions in WP7
should do you just fine.
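For instance, a minimal sketch of that idea: with IsScriptEnabled set to true on the WP7 WebBrowser control, InvokeScript can evaluate a one-liner that hands back the live DOM as a string (the control name webBrowser1 and the event used are assumptions):

// Assumes webBrowser1 has IsScriptEnabled="True" and the page has finished
// loading (e.g. call this from the LoadCompleted event handler).
string renderedHtml = (string)webBrowser1.InvokeScript(
    "eval", "document.documentElement.outerHTML");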
I'm trying to get the visible HTML of a remote page into a string, after the scripts (ASPX, JavaScript) have executed.
To be exact, I want the same HTML as if I went to Chrome and pressed F12: the visible HTML, the final rendered HTML.
I've been struggling with this for days, so it would be lovely to get some instructions here.
If you have any questions, or feel like my question isn't clear, ask away.
You need a WebView that renders the page and executes JavaScript.
Or try an engine like PhantomJS, which does the JavaScript rendering for you.
Afterwards you can get the full DOM of the HTML page.
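As a minimal sketch of the first option in .NET, the WinForms WebBrowser control renders the page, runs its scripts, and then exposes the resulting DOM (the URL is a placeholder, and DocumentCompleted only signals the initial load, not any later AJAX):

using System;
using System.Windows.Forms;

class RenderedHtmlGrabber
{
    [STAThread] // the WebBrowser control requires an STA thread
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };
        browser.DocumentCompleted += (s, e) =>
        {
            // OuterHtml reflects the DOM after the page's scripts have run.
            Console.WriteLine(browser.Document.DocumentElement.OuterHtml);
            Application.ExitThread();
        };
        browser.Navigate("http://example.com");
        Application.Run(); // pump messages so the control can load the page
    }
}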
The problem is this: I need to parse links from a site. Everything would be fine, but the links are rendered by a script, so they are not in the source code. More precisely, they are there, but they are the old ones.
Here is the site: http://54.join.ru/resume?q=
I need to parse the links to the résumés. That works fine, but when you go to some other page, for example page 5, the résumés change while the source code still contains the old links, i.e. the ones that were on the first page.
Can anybody suggest how I can parse the new links? I'm writing in C# using the webBrowser control.
Use Selenium WebDriver.
Selenium-WebDriver was developed to better support dynamic web pages
where elements of a page may change without the page itself being
reloaded.
Thus you will be able to access elements on a web page that have been changed dynamically by JavaScript.
The following code, for example, finds an element by a given class name:
IWebElement we = driver.FindElement(By.ClassName("ra-elements-list__new-window-link"));
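Put together, a minimal sketch (assuming the Selenium WebDriver and ChromeDriver packages; the class name is the one from the example above, and you may still need to wait until the script has injected the links):

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class ResumeLinkScraper
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("http://54.join.ru/resume?q=");

            // Unlike the raw page source, elements found here reflect the DOM
            // after JavaScript has modified it.
            foreach (IWebElement link in
                     driver.FindElements(By.ClassName("ra-elements-list__new-window-link")))
            {
                Console.WriteLine(link.GetAttribute("href"));
            }
        }
    }
}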
I'm trying to retrieve the HTML of a page with some AJAX on it.
The problem is that WebClient.DownloadString() returns too fast, so the AJAX on the page hasn't finished loading, and I'm not getting the right HTML :(
Is it possible to call another function or similar, so that I can, for example, request the page, wait a few seconds and then read the response? (That way I allow the AJAX to finish loading before I retrieve the HTML.)
Thanks,
Louisa
The WebClient by default only fetches the (HTML) contents of a single URL. It does not parse HTML and thus does not know about any CSS, images or JavaScript used on the page. You are trying to emulate the functionality of a full-blown browser, for which the WebClient alone is insufficient.
To achieve your desired behaviour, you will have to not only retrieve the HTML, but also parse it, retrieve and execute the JavaScript on the page, and then read the resulting DOM. This is most easily achieved through a library that provides the functionality of a web browser to your application. Examples include System.Windows.Forms.WebBrowser (WinForms), System.Windows.Controls.WebBrowser (WPF) and Awesomium.
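A minimal sketch of the WinForms route, aimed at the original question: navigate, wait a few seconds after DocumentCompleted so the AJAX calls can finish, then read the DOM (the five-second delay and the URL are assumptions):

using System;
using System.Windows.Forms;

// Assumes this runs inside a WinForms app (STA thread with a message loop).
var browser = new WebBrowser { ScriptErrorsSuppressed = true };
browser.DocumentCompleted += (s, e) =>
{
    // Give the page's AJAX requests a few seconds to complete.
    var timer = new Timer { Interval = 5000 };
    timer.Tick += (ts, te) =>
    {
        timer.Stop();
        Console.WriteLine(browser.Document.DocumentElement.OuterHtml);
    };
    timer.Start();
};
browser.Navigate("http://example.com/ajax-page");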
In HtmlAgilityPack, how do I get data from a website that doesn't come back through the InnerHtml property? For example, at the link below:
https://www.theice.com/productguide/ProductSpec.shtml?specId=1496#expiry
the table starting with the contract symbol does not appear in the InnerHtml text. Please let me know how to get this table's data through HtmlAgilityPack.
Regards
You need to send a GET request to https://www.theice.com/productguide/ProductSpec.shtml?expiryDates=&specId=1496&_=1342907196619
The content is being loaded dynamically via JavaScript. Perhaps you can parse the InnerHtml text to see which link the JavaScript will send the GET request to.
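For example, a minimal sketch that requests that endpoint directly and loads the response into the Html Agility Pack (the XPath is the one suggested in a later answer):

using System;
using System.Net;
using HtmlAgilityPack;

using (var client = new WebClient())
{
    string html = client.DownloadString(
        "https://www.theice.com/productguide/ProductSpec.shtml?expiryDates=&specId=1496&_=1342907196619");

    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // The expiry table, if present, can now be queried with XPath.
    HtmlNode table = doc.DocumentNode.SelectSingleNode("//*[@id='right']/div/table");
    if (table != null)
        Console.WriteLine(table.InnerHtml);
}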
If it's not coming through InnerHtml, that would mean it's being put there by a script. I'm not able to check this page myself, so I'm not sure.
If it's coming from a script, you can't get it very easily. You can play around viewing the JavaScript and maybe manage to read the data as it comes in.
Basically, install Firebug in your browser and look at the data transfers being made. Sometimes you're lucky, sometimes you're not.
Or you can take the simple route and use the WinForms WebBrowser control: load the page in it, let it run the script, then scrape from there. Note that this will leak memory and GDI handles like crazy.
Please use this XPath to get the table you want: //*[@id="right"]/div/table
e.g.
HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='right']/div/table");
string html = node.InnerHtml;
I need to write C# code for grabbing the contents of a web page. The steps look like the following:
Browse to the login page.
I have a user name and a password; provide them programmatically and log in.
Then you are on the detail page.
You have to get some information there (product ID, Des, etc.).
Then you need to click (by code) on Detail View.
Then you can get the price for that product from there.
Once that is done, we can write a detail line into a text file like this:
ABC Printer::225519::285.00
Please help me with this. (Even VB.NET code is OK; I can convert it to C#.)
The WatiN library is probably what you want, then. Basically, it controls a web browser (native support for IE and Firefox, I believe, though they may have added more since I last used it) and provides an easy syntax for programmatically interacting with page elements within that browser. All you'll need are the names and/or IDs of those elements, or some unique way to identify them on the page.
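A minimal sketch of that flow with WatiN (the URL, field names, button value, link text, and price element ID are all placeholders; adjust them to the real page):

using System;
using WatiN.Core;

// WatiN drives a real IE instance, so the page's scripts run normally.
// Note: WatiN requires an STA thread.
using (var browser = new IE("http://example.com/login"))
{
    // Log in programmatically (field/button names are assumptions).
    browser.TextField(Find.ByName("username")).TypeText("myUser");
    browser.TextField(Find.ByName("password")).TypeText("myPassword");
    browser.Button(Find.ByValue("Login")).Click();

    // On the detail page, click Detail View and read the price.
    browser.Link(Find.ByText("Detail View")).Click();
    string price = browser.Span(Find.ById("price")).Text;

    // Build the detail line in the requested format.
    Console.WriteLine("ABC Printer::225519::" + price);
}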
You should be able to achieve this using the WebRequest class to retrieve pages, and the HTML Agility Pack to extract elements from HTML source.
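A minimal sketch of that combination (the URL and XPath are placeholders; note that a plain WebRequest will not execute JavaScript or fill in the login form for you):

using System;
using System.IO;
using System.Net;
using HtmlAgilityPack;

var request = (HttpWebRequest)WebRequest.Create("http://example.com/products");
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    var doc = new HtmlDocument();
    doc.LoadHtml(reader.ReadToEnd());

    // SelectNodes returns null when nothing matches, so check first.
    var rows = doc.DocumentNode.SelectNodes("//table//tr");
    if (rows != null)
        foreach (HtmlNode row in rows)
            Console.WriteLine(row.InnerText.Trim());
}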
Yeah, I downloaded that library. Nice one.
Thanks for sharing it with me. But I have an issue with it: the site I want to get data from has a CAPTCHA on its login page.
I could enter that value myself if the code can show the image and wait for my input.
Can we achieve that with this library? If so, I'd like to see a sample.
You should be able to achieve this by using two classes in C#: HttpWebRequest (to request the web pages) and perhaps XmlTextReader (to parse the HTML/XML response).
If you do not wish to use XmlTextReader, then I'd advise looking into regular expressions, as they are fantastically useful for extracting information from large bodies of text wherein patterns exist.
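For instance, a minimal sketch of the regular-expression route (the pattern is an assumption; it pulls href values out of double-quoted attributes, which is fragile against real-world HTML):

using System;
using System.Text.RegularExpressions;

string html = "<a href=\"/product/1\">Printer</a> <a href=\"/product/2\">Scanner</a>";

// Crude pattern for double-quoted href attributes; real pages often
// warrant a proper HTML parser instead.
foreach (Match m in Regex.Matches(html, "href=\"(?<url>[^\"]+)\""))
    Console.WriteLine(m.Groups["url"].Value);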
How to: Send Data Using the WebRequest Class