I have some web pages with a lot of tags. I want to download the page source only where the tag is a span and its class name is Something.
Is it possible to download just a part of the page (its source code) and not the whole page?
I know that I can do it with a WebBrowser control (for example, navigate to my destination page, search for the specific tag, and get its source code).
But with that, I must first get the whole page and only then get the specific tag.
Is there any way (for example, the WebClient class) to download just the source of my specific tag with a specific class name?
No, the HTTP protocol doesn't have any facility to do what you need. The only thing you can do is request a certain Range, but that requires you to know exactly where the data is, so it doesn't seem to help. You will have to download the entire page and then parse out what you need.
I'm afraid that you cannot download just parts of your page; you need to load the whole page first. To make it easier, you can parse the HTML as XML (if it is well formed) and then work with it, which is a lot easier.
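For illustration, here is a minimal sketch of the "download everything, then parse" approach described above. It assumes the HtmlAgilityPack NuGet package, a placeholder URL, and a class name that contains no quote characters; the class and method names (SpanExtractor, ExtractSpans) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack (assumed to be installed)

public static class SpanExtractor
{
    // Returns the outer HTML of every <span> whose class attribute contains className
    // as a whole token. className is assumed not to contain quote characters.
    public static List<string> ExtractSpans(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pad the class attribute with spaces so "Something" does not match "SomethingElse".
        string xpath = "//span[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";

        var result = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode node in nodes)
            result.Add(node.OuterHtml);
        return result;
    }

    public static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Placeholder URL: the whole page is still downloaded, only the parsing is selective.
            string html = await client.GetStringAsync("https://example.com/");
            foreach (string span in ExtractSpans(html, "Something"))
                Console.WriteLine(span);
        }
    }
}
```

The network cost is the same as loading the page in a WebBrowser control, but nothing is rendered and no scripts run, so it is usually much lighter.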
Related
I'm looking for a simple way to get a string from a URL that contains all the text actually displayed to the user.
That is, anything loaded with a delay (using JavaScript) should be included. Also, the result should ideally be free of HTML tags etc.
A straightforward approach with WebClient.DownloadString() and a subsequent HTML regex is pretty much pointless, because most content in modern web apps is not contained in the initial HTML document.
Most probably you can use Selenium WebDriver to fully load the page and then dump the full DOM.
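A rough sketch of that idea, assuming the Selenium.WebDriver and Selenium.Support NuGet packages plus a local Chrome installation. The class and method names are invented for this example, and waiting for document.readyState is only an approximation: content that scripts load later may need an explicit wait on a specific element.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI; // NuGet package: Selenium.Support (assumed)

public static class RenderedText
{
    // Loads the page in a real browser and returns the text the user would see.
    public static string GetVisibleText(string url, TimeSpan timeout)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless"); // no visible browser window

        using (IWebDriver driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(url);

            // Wait until the browser reports the document as fully loaded.
            var wait = new WebDriverWait(driver, timeout);
            wait.Until(d => ((IJavaScriptExecutor)d)
                .ExecuteScript("return document.readyState").Equals("complete"));

            // .Text yields rendered text without tags; driver.PageSource would give the full DOM.
            return driver.FindElement(By.TagName("body")).Text;
        }
    }

    public static void Main()
    {
        // Placeholder URL for illustration.
        Console.WriteLine(GetVisibleText("https://example.com/", TimeSpan.FromSeconds(15)));
    }
}
```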
With HtmlAgilityPack, how do I get data from a website when it does not show up in the InnerHtml of the parsed document? For example, at the link below:
https://www.theice.com/productguide/ProductSpec.shtml?specId=1496#expiry
The table starting with "contract symbol" does not appear in the InnerHtml text. Please let me know how to get this table's data through HtmlAgilityPack.
Regards
You need to send a GET request to https://www.theice.com/productguide/ProductSpec.shtml?expiryDates=&specId=1496&_=1342907196619
The content is being loaded dynamically via JavaScript. Perhaps you can parse the InnerHtml text to see which link the JavaScript will send the GET request to.
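As a sketch of requesting that endpoint directly: the code below assumes the response is an HTML fragment containing table rows, which I have not verified (it could just as well be JSON). It also assumes the HtmlAgilityPack NuGet package; TableFetcher and ParseRows are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack (assumed)

public static class TableFetcher
{
    // Turns every <tr> in the HTML into an array of trimmed, entity-decoded cell texts.
    public static List<string[]> ParseRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<string[]>();
        foreach (HtmlNode tr in doc.DocumentNode.Descendants("tr"))
        {
            string[] cells = tr.ChildNodes
                .Where(c => c.Name == "td" || c.Name == "th")
                .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToArray();
            if (cells.Length > 0) rows.Add(cells);
        }
        return rows;
    }

    public static async Task Main()
    {
        // URL taken from the answer above; the response format is an assumption.
        string url = "https://www.theice.com/productguide/ProductSpec.shtml?expiryDates=&specId=1496&_=1342907196619";
        using (var client = new HttpClient())
        {
            string fragment = await client.GetStringAsync(url);
            foreach (string[] row in ParseRows(fragment))
                Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```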
If it's not "coming in the InnerHtml", that would mean it's being put there by a script. I'm not able to check this page myself, so I'm not sure.
If it's coming from a script, you can't get it very easily. You can look through the JavaScript and may be able to read the data as it comes in.
Basically, install Firebug in your browser and look at the data transfers being made. Sometimes you're lucky, sometimes you're not.
Or you can take the simple route and use the WinForms WebBrowser control: load the page in it, let it run the script, then scrape from there. Note that this will leak memory and GDI handles like crazy.
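Please use this XPath to get the table you want: //*[@id='right']/div/table
e.g.
// SelectSingleNode returns null when nothing matches, e.g. if the table is only added later by script.
HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='right']/div/table");
string html = node != null ? node.InnerHtml : null;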
I need to write C# code to grab the contents of a web page. The steps look like this:
Browse to the login page
I have a user name and a password; provide them programmatically and log in
Then you are on the detail page
You have to get some information there (product ID, description, etc.)
Then click (by code) on Detail View
Then you can get the price for that product from there.
Now it is done, so we can write a detail line into a text file like this...
ABC Printer::225519::285.00
Please help me with this. (Even VB.NET code is OK; I can convert it to C#.)
The WatiN library is probably what you want, then. Basically, it controls a web browser (native support for IE and Firefox, I believe, though they may have added more since I last used it) and provides an easy syntax for programmatically interacting with page elements within that browser. All you'll need are the names and/or IDs of those elements, or some unique way to identify them on the page.
You should be able to achieve this using the WebRequest class to retrieve pages, and the HTML Agility Pack to extract elements from HTML source.
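A minimal sketch of that flow, using HttpClient (a newer relative of WebRequest) with a cookie container so the login session carries over to later requests. The URLs, form field names, and credentials are all hypothetical, and the extraction step is left as a comment because it depends entirely on the target page's markup.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class ProductScraper
{
    // Builds one output line in the "name::id::price" format from the question.
    public static string FormatDetailLine(string name, string productId, decimal price)
    {
        return string.Join("::", name, productId,
            price.ToString("0.00", CultureInfo.InvariantCulture));
    }

    public static async Task Main()
    {
        // The cookie container keeps the session cookie issued at login.
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        using (var client = new HttpClient(handler))
        {
            // Hypothetical login form fields; inspect the real form to find the actual names.
            var form = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                { "username", "myUser" },
                { "password", "myPassword" }
            });
            HttpResponseMessage login = await client.PostAsync("https://example.com/login", form);
            login.EnsureSuccessStatusCode();

            // Hypothetical detail page; parse detailHtml (e.g. with the HTML Agility Pack)
            // to obtain the real name, ID, and price instead of the sample values below.
            string detailHtml = await client.GetStringAsync("https://example.com/detail");

            string line = FormatDetailLine("ABC Printer", "225519", 285.00m);
            File.AppendAllText("products.txt", line + Environment.NewLine);
        }
    }
}
```

"Clicking" Detail View in this style means requesting whatever URL that link or button points to, since there is no browser involved.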
Yes, I downloaded that library. Nice one.
Thanks for sharing it with me. But I have an issue with that library: the site I want to get data from has a "captcha" on the login page.
I can enter that value myself if the library can show the image and wait for my input.
Can we achieve that with this library? If so, I would like to have a sample.
You should be able to achieve this by using two classes in C#, HttpWebRequest (to request the web pages) and perhaps XmlTextReader (to parse the HTML/XML response).
If you do not wish to use XmlTextReader, then I'd advise looking into regular expressions, as they are very useful for extracting information from large bodies of text in which patterns exist.
How to: Send Data Using the WebRequest Class
I have a web-based application. The content for the Home page is currently hard-coded in the HTML for the Home page using , and tags. To change the content at any time in the future, it has to be changed in the HTML code. :(
Is there a way to pick up the content from some external place and have it reflected on the website? This way, any required change can be made at the external location without touching the application's code.
Please advise if there is any solution for it.
Thanks.
You can
Use a database
Include external files using Server Side Includes
Read external files and write their contents and an alternative method
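As a small sketch of the "read an external file" option: the page keeps a placeholder, and the text is loaded from a file at request time. The placeholder token, the file name, and the class name are all assumptions made for this example, not part of any particular framework.

```csharp
using System;
using System.IO;
using System.Net;

public static class HomePageContent
{
    // Replaces the {{content}} placeholder with HTML-encoded text from outside the code.
    public static string Render(string template, string content)
    {
        return template.Replace("{{content}}", WebUtility.HtmlEncode(content));
    }

    public static void Main()
    {
        string template = "<html><body><p>{{content}}</p></body></html>";

        // Hypothetical external file that editors can change without touching the application.
        string content = File.ReadAllText("home-content.txt");

        Console.WriteLine(Render(template, content));
    }
}
```

Encoding the text keeps stray characters such as & or < from breaking the page; drop the encoding only if the external file is trusted to contain HTML.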
Sounds like you're looking for a Content Management System (CMS), which will allow your content editors access to modify only specific blocks of a page that you specify.
There are a ton out there to do what you want, so you don't have to start from scratch. Just Google 'CMS'.
Although I haven't used it myself, DotNetNuke is a popular one these days and has a free version.
I've been struggling to find an example of some C# code (I'm using Visual C# 2008 Express) that can programmatically save an entire web page (given a URL), including the images and formatting (e.g. CSS). The intention is that in a subsequent phase I'd ship this off (not sure how yet) so it could be viewed later in a browser.
Is there an example of the most simple approach (leveraging the .NET Framework methods) to save an entire web page? Saving as one page with a subdirectory for images, or otherwise. Basically the same as what you get with browsers when you say "save entire web page".
The simplest way is probably to add a WebBrowser Control to your application and point it at the page you want to save using the Navigate() method.
Then, when the document has loaded, call the ShowSaveAsDialog method. The user can then save the page as a single file, or a file with images in a subdirectory.
[Update]
Having now noticed "programmatically" in your question, the above approach is not ideal, as it requires either user involvement or delving into the Windows API to send input using SendKeys or similar.
There is nothing built-in to the .NET Framework that does all of what you ask.
So my revised approach would be:
Use System.Net.HttpWebRequest to get the main HTML document as a string or stream (easy).
Load this into an HtmlAgilityPack document, where you can easily query the document to get lists of all image elements, stylesheet links, etc.
Then make a separate web request for each of these files and save them to a subdirectory.
Finally, update all relevant links in the main page to point to the items in the subdirectory.
In effect you would be implementing a very simple web browser. You may run into issues with pages that use JavaScript to dynamically alter or request page content, but for most pages this should give acceptable results.
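The steps above can be sketched roughly as follows. This assumes the HtmlAgilityPack NuGet package and uses HttpClient in place of HttpWebRequest for brevity; it handles only img elements and stylesheet links, and the URL, file names, and the PageSaver/RewriteLinks names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack (assumed)

public static class PageSaver
{
    // Points every <img src> and stylesheet <link href> at a file in subDir and
    // returns the resources to download (absolute URL -> local relative path).
    public static Dictionary<Uri, string> RewriteLinks(HtmlDocument doc, Uri pageUri, string subDir)
    {
        var resources = new Dictionary<Uri, string>();
        foreach (HtmlNode node in doc.DocumentNode.Descendants().ToList())
        {
            string attr;
            if (node.Name == "img") attr = "src";
            else if (node.Name == "link" && string.Equals(node.GetAttributeValue("rel", ""),
                     "stylesheet", StringComparison.OrdinalIgnoreCase)) attr = "href";
            else continue;

            Uri absolute;
            if (!Uri.TryCreate(pageUri, node.GetAttributeValue(attr, ""), out absolute)) continue;
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps) continue;

            string local;
            if (!resources.TryGetValue(absolute, out local))
            {
                // Prefix with a counter so equal file names from different folders do not clash.
                local = subDir + "/" + resources.Count + "_" + Path.GetFileName(absolute.LocalPath);
                resources[absolute] = local;
            }
            node.SetAttributeValue(attr, local);
        }
        return resources;
    }

    public static async Task Main()
    {
        var pageUri = new Uri("https://example.com/"); // placeholder URL
        using (var client = new HttpClient())
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(await client.GetStringAsync(pageUri));

            Directory.CreateDirectory("page_files");
            foreach (KeyValuePair<Uri, string> item in RewriteLinks(doc, pageUri, "page_files"))
                File.WriteAllBytes(item.Value, await client.GetByteArrayAsync(item.Key));

            doc.Save("page.html"); // main document with links now pointing into page_files
        }
    }
}
```

Images referenced from inside CSS files, scripts, and anything injected by JavaScript are not covered by this sketch.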
From CodeProject: ZetaWebSpider
It's definitely not elegant, but you could navigate a System.Windows.Forms.WebBrowser to the URL and then call its ShowSaveAsDialog() method to save the page.