C# download PDF from AJAX-driven URL - c#

In a C# application I would like to open a URL and download a PDF.
When this URL is hit from the browser, the page loads quickly and begins what I believe are AJAX calls. After several seconds the browser's download prompt appears with the PDF file.
I have attempted to open this URL via WebClient. The stream I get back is not the PDF file; it is the actual HTML of the page.
How can I detect that the PDF file has loaded and download it?

If I'm not mistaken, WebClient has no clue about JavaScript. It won't run the AJAX code at all; it just gets the HTML of the page and leaves it at that.
And since the PDF's URL is likely built by the JavaScript, or the PDF is generated on demand or even streamed through JavaScript, you really need support for active content.
This seems like something Selenium would be good for: http://www.seleniumhq.org/
It will either spawn an actual browser and steer it to the content you need, or run the headless PhantomJS browser and fetch the content you want.
It might be a bit overkill, and a more knowledgeable person might have a better answer, but that's what I've used in an application that needs to fetch PDFs, CSVs and other files from many different websites.
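Before reaching for a full browser, it can help to confirm what a plain HTTP request actually returns. The sketch below is only an illustration, not the answerer's code: it uses HttpClient (swapped in for WebClient) and checks whether the response body starts with the "%PDF-" signature that PDF files normally begin with. The class name PdfCheck and the command-line arguments are hypothetical.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class PdfCheck
{
    // PDF files normally begin with the ASCII signature "%PDF-".
    private static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the payload starts with the PDF signature.
    public static bool LooksLikePdf(byte[] payload)
    {
        if (payload == null || payload.Length < Signature.Length)
        {
            return false;
        }
        for (int i = 0; i < Signature.Length; i++)
        {
            if (payload[i] != Signature[i])
            {
                return false;
            }
        }
        return true;
    }

    // Hypothetical usage: args[0] is the URL to probe, args[1] the output path.
    public static async Task Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: PdfCheck <url> <output.pdf>");
            return;
        }

        using var http = new HttpClient();
        byte[] body = await http.GetByteArrayAsync(args[0]);

        if (LooksLikePdf(body))
        {
            File.WriteAllBytes(args[1], body);
            Console.WriteLine("Saved PDF.");
        }
        else
        {
            // Most likely the HTML shell of the page: the PDF is produced by
            // script, so a scriptable browser such as Selenium is needed.
            Console.WriteLine("Response is not a PDF.");
        }
    }
}
```

If the check fails, the request returned the page rather than the file, which is the situation described in the question and the case where browser automation becomes necessary.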

Related

How to communicate with Google Chrome using C# or Python

I'm developing software in C# which has to get info from a website that the user opens in Chrome. The user has to input some data, and then the website returns a list of different items.
What I want is a way to access the source code of the page in order to get the info. I can't open the site myself, as it doesn't show anything because I didn't input any data, so I need to get it directly from Chrome.
How can I achieve this? A Chrome extension? Or can I access Chrome directly from my software?
Off the top of my head, I don't know any application that gets data directly from an open instance of Chrome. You'd have to write your own Chrome extension.
Alternatively, you can open the web browser from your application initially.
You can look into these libraries for doing so:
WatiN (My personal favourite)
Selenium
Awesomium (You'd have to roll out your own UI, it's invisible)
Cef
Essential Objects Web Browser
EDIT: I didn't think about using QA tools as the actual browser hook, as @TheAnathema mentions. That would probably work for your needs.
You're going to need to create it as a Chrome extension if you must depend on the user actually going to a specific web site (i.e. you are not able to make the requests yourself with either Selenium or standard web requests in Python).
A Chrome extension is required because of how bad it would be if any software could easily read the pages you browse. Banking, medical, email, etc. could all be accessed anonymously from any process if Google allowed any outside process to tap into the web page.
Even Chrome extensions have to ask for permission to be able to do what they want, but at least it is software the user knowingly installed and agreed to the permissions.
A quick search yielded this example of modifying a page's HTML with a Chrome extension: https://blog.lateral.io/2016/04/create-chrome-extension-modify-websites-html-css/
It sounds like you want to do web scraping. Here's a good tutorial to get you started: HTML Scraping.
And this answer has a good example of how to scrape data from a website where you need to submit a form to get access to the data.

Passing file from jQuery Ajax method to Webservice

I have a file upload control in my .aspx page. The control only allows Excel files. After a valid file is selected in that control, I read the control via JavaScript and want to pass the file to a web service, which has a method that accesses the file and does some calculations on it.
I read some articles about this. They say no functionality exists to transfer a file from the client side. Do you have any suggestions?
There is HTML5 file upload, but only Firefox, Chrome and Safari support these functions.
What you can do is use HTML5 upload with Flash or an iframe post as a fallback. A jQuery plugin that supports all browsers can be found here.

C# download web-content after submit

I need to automate the download of a file from this site: http://stats.smith.com/reports/Default.aspx?btnGo=View+Report. My problem is that once I click on the submit button I lose control and a download dialog pops up. Is there a way to download the file using C# and avoid the download dialog? I'm currently using the WebBrowser object in the Forms assembly to navigate through the page.
Take a look at the WebClient class
If you want to save a downloaded file to the filesystem from a web browser, there must be user interaction. A web page does not have permission to muck about in a client's file system.
If you want to display the page in the browser, you can try removing the "Content-Disposition: attachment; ..." server response header when the file is downloaded. This will only work if the client has the browser set to display such file types inside the browser.
Your question doesn't specify what you're using to download the file.
If you're asking if you can have a program that runs on a client (either a WinForms app, a console app, or a Windows Service) then you can download a file from a web site using the System.Net.WebClient class and calling the DownloadFile() method.
The accepted answer here (slightly different from your question, so it's not a duplicate) has a link that shows how to download a file that requires an HTTP POST first.
If you're trying to somehow automate Internet Explorer via a javascript from a web page you're hosting to force a file to download on a user without displaying the dialog box, then no. You can't.
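A client-side program can send the form post itself and write the response to disk, so no download dialog is involved. The following is a minimal sketch under stated assumptions, not the linked answer's code: it uses HttpClient instead of WebClient, the field name btnGo is taken from the URL in the question, and the class name, output path and any further fields the page may require (an ASP.NET page often expects hidden fields such as view state) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class ReportDownloader
{
    // Encodes fields the way a browser form post does
    // (application/x-www-form-urlencoded, spaces become '+').
    public static string EncodeForm(IDictionary<string, string> fields)
    {
        using var form = new FormUrlEncodedContent(fields);
        return form.ReadAsStringAsync().GetAwaiter().GetResult();
    }

    // Posts the form and saves whatever the server returns to 'path'.
    public static async Task SaveReportAsync(
        HttpClient http, string url, IDictionary<string, string> fields, string path)
    {
        using var form = new FormUrlEncodedContent(fields);
        using HttpResponseMessage response = await http.PostAsync(url, form);
        response.EnsureSuccessStatusCode();
        byte[] body = await response.Content.ReadAsByteArrayAsync();
        File.WriteAllBytes(path, body);
    }

    // Hypothetical usage: args[0] is the page URL, args[1] the output file.
    public static async Task Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: ReportDownloader <url> <output-file>");
            return;
        }

        // Assumption: the submit button is the only field the server needs.
        var fields = new Dictionary<string, string> { ["btnGo"] = "View Report" };

        using var http = new HttpClient();
        await SaveReportAsync(http, args[0], fields, args[1]);
        Console.WriteLine("Saved response to " + args[1]);
    }
}
```

If the saved file turns out to be an HTML error page, the server probably expects additional form fields or cookies that the browser sends along with the button click.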

Replicate steps in downloading file

I'm trying to automate the download of a file from a website. Normally to download the file, I login with a username and password. Navigate to a particular screen then click a button.
I've been trying to watch the sequence of POSTs using Chrome's developer mode and then replicate all the steps using the .NET WebClient class, but with no success. I've derived from the WebClient class and added cookie handling, which seems to be working. I go to the login page and post using WebClient.UploadValues. About half the time it seems to work. The next step appears to be another POST to a reporting URL. Once again I use WebClient.UploadValues, but the response from the server is a page showing an internal error.
I have a couple of questions.
1) Are there better tools than hand coding C# code to replicate a bunch of web browser interactions? I really only care about being able to download the file at a particular time each day onto a Windows box.
2) The WebClient does not seem to be the best class to use for this. Perhaps it's a bit too simplistic. I tried using HttpWebRequest, but it has no facilities for encoding POST requests. Any other recommendations?
3) Although Chrome's developer plugin appears to show all interaction, I find it a bit cumbersome to use. I'd be interested in seeing all of the raw communication (unencrypted though, the site is only accessed via https), so I can see if I'm really replicating all of the steps.
I can even post the exact code I'm using. The site I'm pulling data from, specifically is the Standard and Poors website. They have the ability to create custom reports for downloading historical data which I need for reporting, not republishing.
Using IE to download the file would be much easier than writing C# / Perl / Java code to replicate HTTP requests.
The reason is that even a slight change in the site's JavaScript code can break the flow.
With IE, you can automate it using COM. The following VBA example opens IE and performs a Google search:
    Sub Search_Google()
        Dim IE As Object
        Set IE = CreateObject("InternetExplorer.Application")
        IE.Navigate "http://www.google.com" 'load web page google.com
        'wait until IE is done loading the page (ReadyState 4 = complete)
        While IE.Busy Or IE.ReadyState <> 4
            DoEvents
        Wend
        IE.Document.all("q").Value = "what you want to put in text box"
        'click the button named "btnG", which is Google's "Google Search" button
        IE.Document.all("btnG").Click
        'wait until IE is done loading the results page
        While IE.Busy Or IE.ReadyState <> 4
            DoEvents
        Wend
    End Sub
3) Although Chrome's developer plugin appears to show all interaction, I find it a bit cumbersome to use. I'd be interested in seeing all of the raw communication (unencrypted though, the site is only accessed via https), so I can see if I'm really replicating all of the steps.
For this you can use Fiddler to view all the interaction going on and the raw data going back and forth. To make it work with HTTPS you will need to install Fiddler's certificates to enable decryption of the traffic.
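On question 2, one option to consider is HttpClient with a shared CookieContainer: the handler stores and replays cookies across requests, and FormUrlEncodedContent takes care of encoding POST bodies. The sketch below is an assumption-laden illustration, not code from the thread: the URLs come from the command line, and the field names username and password are hypothetical placeholders for whatever the browser trace (or Fiddler) shows the real login form sending.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class SessionSketch
{
    // One handler with one CookieContainer behaves like one browser session:
    // cookies set by the login response are sent on later requests.
    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AllowAutoRedirect = true
        };
        return new HttpClient(handler);
    }

    // Posts URL-encoded form fields and returns the raw response body.
    public static async Task<byte[]> PostFormAsync(
        HttpClient http, string url, IDictionary<string, string> fields)
    {
        using var form = new FormUrlEncodedContent(fields);
        using HttpResponseMessage response = await http.PostAsync(url, form);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsByteArrayAsync();
    }

    // Hypothetical usage: <loginUrl> <reportUrl> <user> <password> <output-file>
    public static async Task Main(string[] args)
    {
        if (args.Length < 5)
        {
            Console.WriteLine(
                "usage: SessionSketch <loginUrl> <reportUrl> <user> <password> <output-file>");
            return;
        }

        var cookies = new CookieContainer();
        using HttpClient http = CreateClient(cookies);

        // Step 1: log in. The field names are assumptions; copy the real ones
        // from the captured browser request.
        await PostFormAsync(http, args[0], new Dictionary<string, string>
        {
            ["username"] = args[2],
            ["password"] = args[3]
        });
        Console.WriteLine("Cookies after login: " + cookies.GetCookieHeader(new Uri(args[0])));

        // Step 2: request the report with the same session. An empty form is a
        // placeholder; the real request probably carries report parameters.
        byte[] report = await PostFormAsync(http, args[1], new Dictionary<string, string>());
        File.WriteAllBytes(args[4], report);
    }
}
```

Printing the cookie header after login gives a quick way to compare the session state with what Fiddler shows for the browser, which may help explain a login that only works some of the time.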

Access text generated with javascript

This website has a custom google search box:
http://ezinearticles.com/
The search results are generated by a piece of JS code. How would I access these results using wget and/or C#'s WebClient?
It looks like the searches on that page are normal Google site searches. Try wget with the following URL, where 'asdf' is your search term (the URL is quoted so the shell does not interpret any special characters):
wget "http://www.google.com/search?q=site:ezinearticles.com+asdf"
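The same request can be made from C#. This is a rough sketch that assumes the site-search URL above is what the page effectively uses. The class and method names are hypothetical, HttpClient stands in for WebClient, and Google may answer non-browser clients differently or refuse them.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SiteSearch
{
    // Builds a Google site-search URL, percent-encoding the query text.
    public static string BuildUrl(string site, string terms)
    {
        string query = "site:" + site + " " + terms;
        return "http://www.google.com/search?q=" + Uri.EscapeDataString(query);
    }

    // Hypothetical usage: args[0] is the search term.
    public static async Task Main(string[] args)
    {
        string terms = args.Length > 0 ? args[0] : "asdf";
        string url = BuildUrl("ezinearticles.com", terms);
        Console.WriteLine("Requesting " + url);

        using var http = new HttpClient();
        string html = await http.GetStringAsync(url);
        Console.WriteLine("Received " + html.Length + " characters of HTML.");
    }
}
```

The returned HTML still has to be parsed to pull out the individual results.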
You need to do what your web browser does: render the page. Alternatively, you may be able to extract the JS call to the web service that provides the results, execute that request yourself, and parse the output directly.
You need to access it with a programmable browser supporting JavaScript.
The HtmlUnit library for Java does this, and runs fine headless.
You can automate a real web browser, e.g. with WatiN on Windows, and access the page's content. This requires a GUI desktop though, because a real browser window is opened.
