I'm trying to automate the download of a file from a website. Normally, to download the file, I log in with a username and password, navigate to a particular screen, and then click a button.
I've been watching the sequence of POSTs in Chrome's developer tools and trying to replicate each step with the .NET WebClient class, but without success. I've derived from WebClient and added cookie handling, which seems to be working. I go to the login page and post using WebClient.UploadValues; about half the time it seems to work. The next step appears to be another POST to a reporting URL. Once again I use WebClient.UploadValues, but the server responds with a page showing an internal error.
I have a couple of questions.
1) Are there better tools than hand-coding C# to replicate a series of web browser interactions? I really only care about being able to download the file at a particular time each day onto a Windows box.
2) WebClient does not seem to be the best class to use for this; perhaps it's a bit too simplistic. I tried using HttpWebRequest, but it has no facilities for encoding POST requests. Any other recommendations?
3) Although Chrome's developer tools appear to show all of the interaction, I find them a bit cumbersome to use. I'd be interested in seeing all of the raw communication (unencrypted though, the site is only accessed via https), so I can see if I'm really replicating all of the steps.
I can even post the exact code I'm using. The site I'm pulling data from is the Standard & Poor's website. They have the ability to create custom reports for downloading historical data, which I need for reporting, not republishing.
Using IE to download the file would be much easier than writing C# / Perl / Java code to replicate the HTTP requests.
The reason is that even a slight change in the site's JavaScript code can break a hand-coded flow.
With IE, you can automate the browser using COM. The following VBA example opens IE and performs a Google search:
Sub Search_Google()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True 'show the window (it is hidden by default)
    IE.Navigate "http://www.google.com" 'load web page google.com
    While IE.Busy Or IE.ReadyState <> 4 '4 = READYSTATE_COMPLETE
        DoEvents 'wait until IE is done loading page.
    Wend
    IE.Document.all("q").Value = "what you want to put in text box"
    'click the button named "btnG", which is Google's "Google Search" button
    IE.Document.all("btnG").Click
    While IE.Busy Or IE.ReadyState <> 4
        DoEvents 'wait until IE is done loading page.
    Wend
End Sub
3) Although Chrome's developer plugin appears to show all interaction, I find it a bit cumbersome to use. I'd be interested in seeing all of the raw communication (unencrypted though, the site is only accessed via https), so I can see if I'm really replicating all of the steps.
For this you can use Fiddler to view all of the interaction and the raw data going back and forth. To make it work with HTTPS, you need to enable HTTPS decryption in Fiddler and install (trust) its root certificate so that it can decrypt the traffic.
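To compare what a script sends with what the browser sends, the script's own traffic can be routed through the same capturing proxy. Below is a minimal Python sketch, not a definitive setup. It assumes Fiddler is listening on 127.0.0.1:8888 (its usual default; check the port in Fiddler's options), and `build_proxied_opener` is a hypothetical helper name. For HTTPS, the script also has to trust Fiddler's root certificate, otherwise certificate verification will fail.

```python
import urllib.request

# Assumption: Fiddler's default listening endpoint; adjust if yours differs.
FIDDLER_PROXY = "http://127.0.0.1:8888"


def build_proxied_opener(proxy_url=FIDDLER_PROXY):
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)
```

Requests made with `build_proxied_opener().open(url)` should then show up in Fiddler's session list next to the browser's, which makes differences in headers, cookies, and form fields easier to spot.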
I have a C# WPF application developed in VS 2015, and I want the browser to read some data from it, just a short string. I can save it in a text file or in a variable, but it has to be visible to the browser (using JS, I suppose). Using file:/// doesn't work when the original page is hosted online, as in my case (cross-origin conflict). This should work in Opera and Firefox, but looking at their extensions, it seems you can only develop them with front-end technologies. That is not enough in my case, since I use WPF to look into the Windows OS and then need to share the result with the browser.
I suspect it's possible, and no, it's not to write a malicious piece of code. For instance, I might read the details of the graphics card for diagnostic purposes.
Please help, many thanks.
Browsers run in a security sandbox which is intended to stop them reading or writing files to the file system.
You could write to the user's appdata folder. There are various JavaScript frameworks which persist data there so they can provide offline or static data.
I don't think that is a good plan though.
I suggest your first candidate would be a cookie.
A quick Google search on how to do that finds:
How to create cookie in c#.net windows application?
From a web page you can use the content of a cookie dynamically. So you could change what you see in the web page after it's up and running from some process in your wpf app and do a counter or whatever.
I've not used this with windows apps and a browser but I have with a web app and Silverlight. I'm afraid I don't have that code to hand though.
I'm developing software in C# which has to get info from a website that the user opens in Chrome. The user has to input some data, and then the website returns a list of different items.
What I want is a way to access the source code of the page in order to get the info. I can't open the website myself, because it doesn't show anything when no data has been input, so I need to get it directly from Chrome.
How can I achieve this? A Chrome extension? Or can I access Chrome directly from my software?
Off the top of my head, I don't know any application that gets data directly from an open instance of Chrome. You'd have to write your own Chrome extension.
Alternatively, you can open the web browser from your application initially.
You can look into these libraries for doing so:
Watin (My personal favourite)
Selenium
Awesomium (You'd have to roll out your own UI, it's invisible)
Cef
Essential Objects Web Browser
EDIT: I didn't think about using QA tools as the actual browser hook as @TheAnathema mentions. That would probably work for your needs.
You're going to need to build it as a Chrome extension if you must depend on the user actually going to a specific website (i.e. you are not able to make the requests yourself with either Selenium or standard web requests in Python).
A Chrome extension is required because of how bad it would be if any software could easily read the pages you browse. Banking, medical, email, etc. could all be read silently by any process if Google allowed outside processes to tap into the web page.
Even Chrome extensions have to ask for permission to be able to do what they want, but at least it is software the user knowingly installed and agreed to the permissions.
A quick search yielded this example of modifying a page's HTML with a Chrome extension: https://blog.lateral.io/2016/04/create-chrome-extension-modify-websites-html-css/
It sounds like you want to do web scraping. Here's a good tutorial to get you started: HTML Scraping.
And this answer has a good example of how to scrape data from a website where you need to submit a form to get access to the data.
I understand how to log in to Gmail using C#, but when I then try to go to the webpage, it does not recognize that I have logged in to Gmail.
Overall, I need to log in to Gmail, then access a webpage once I'm logged in, and save its source code, all using C#, preferably without having to open a browser, just doing it all within the C# application.
Edit: I have logged in to Gmail successfully. But when I then go to the website, it doesn't recognize that I'm logged in. I need a way to do it in the same session. I tried researching but couldn't understand how to do it.
I'm pretty sure that you can't download the source code behind Gmail; it is most likely closely guarded for security reasons. You can perhaps get a response and try to download a list of mails if you want to make an Outlook lookalike.
If you need it all in one session, you need to find a way to share that session across programs. Your C# app runs in a separate environment from the browser and cannot interact with it directly; this has to be done through an API, socket communication, etc.
If all you want is to access Gmail through your own program, however, you can add the WebBrowser component from the toolbox to your form (if you have one).
This is just a blank area (it looks like a giant text box) that web pages can easily be loaded into: no URL bar and no controls of any kind, just completely blank.
You then control the pages through your source code.
But what I wonder is why you want this. Why not just make your browser log you in automatically?
To also use the same session in the browser you should transfer the session cookie to the browser. I don't know if this is possible. I don't even know if Gmail likes/allows this.
I would suggest you try something different (not trying to transfer sessions for example), like opening Gmail in the browser instead of your C# program.
You also can't download Gmail's source.
I am thinking about working with remote data, that is, sending data to and receiving data from external websites. There are many working examples on the web, such as free online tools for web statistics or Google's AdSense. In such services, the system generates some code for the publisher, the publisher puts the generated code in the BODY of their web page (the HTML file), and after that the system works: it can count visits to home pages, clicks on advertisements, and so on.
My question is: how do such systems work, and how can I research them to find out how to program them? Can you suggest some keywords? Which titles should I look for, and which technologies are relevant to this kind of programming? I want to find some relevant references to learn from and start experimenting with these systems. If my question is not clear, I can explain more. Please help, I am confused.
Consider that I am a programmer who wants to build such systems, not use them.
There are a few different ways to track clicks.
Redirection Tracking
One is to link the advertisement (or any link) to a redirection script. You would normally pass it some sort of ID so it knows which URL it should forward to. Before redirecting the user to that page, it first records the click in a database, where it can store the user's IP, a timestamp, browser information, etc. It then forwards the user (without them really noticing) to the specified URL.
Advertisement ---> Redirection Script (records click) ---> Landing Page
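The redirection flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular ad network does it: `DESTINATIONS`, `handle_click`, and the example URL are hypothetical, and an in-memory list stands in for the database.

```python
from datetime import datetime, timezone

# Hypothetical mapping from ad ID to landing page; a real tracker
# would look this up in a database.
DESTINATIONS = {"55": "http://www.example.com/landing"}


def handle_click(ad_id, ip, user_agent, click_log):
    """Record the click, then return (status, headers) for the redirect."""
    target = DESTINATIONS.get(ad_id)
    if target is None:
        return 404, {}  # unknown ID: nothing to record or redirect to
    click_log.append({
        "ad_id": ad_id,
        "ip": ip,
        "user_agent": user_agent,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    # A 302 response makes the browser follow the Location header.
    return 302, {"Location": target}
```

The important design point is the order: the click is stored before the redirect is returned, so the visit is counted even if the landing page fails to load.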
Pixel Tracking
Another way to do it is to use pixel tracking. This is where you put a "pixel" or a piece of JavaScript code into the body of a webpage. The pixel is just an image (or a script posing as an image) which the visitor's browser requests when the page loads. The tracker which hosts the pixel can record the relevant information from that image request. Some systems use JavaScript instead of an image (or use both) to track clicks, which may let them gather slightly more information through JavaScript's functions.
Advertisement ---> Landing Page ---> User requests pixel (records click)
Here is an example of a pixel: <img src="http://tracker.mydomain.com?id=55&type=png" />
I threw in the png at the end because some systems might require a valid image filetype.
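On the tracker side, serving the pixel might look roughly like the Python sketch below. The names `serve_pixel` and `hits` are hypothetical, the list stands in for a database, and the sketch always answers with a commonly used 1x1 GIF regardless of the `type` parameter.

```python
import base64
from urllib.parse import urlparse, parse_qs

# A commonly used 1x1 transparent GIF, decoded once at import time.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def serve_pixel(request_url, ip, user_agent, hits):
    """Log the request that fetched the pixel; return (content_type, body)."""
    params = parse_qs(urlparse(request_url).query)
    hits.append({
        "id": params.get("id", [None])[0],  # first value of ?id=..., if any
        "ip": ip,
        "user_agent": user_agent,
    })
    return "image/gif", PIXEL_GIF
```

The image itself is irrelevant; what matters is that the browser's request for it carries the ID, the visitor's IP, and the usual request headers.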
Hidden Tracking
If you do not want the user to know what the tracker is you can put code on your landing page to pass data to your tracker. This would be done on the backend (server side) so it is invisible to the user. Essentially you can just "request" the tracker URL while passing relevant data via the GET parameters. The tracker would then record that data with very limited server load on the landing page's server.
Advertisement ---> Landing Page requests tracker URL and concurrently renders page
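A rough Python sketch of that server-side call follows. It assumes a hypothetical tracker endpoint and helper names (`build_tracker_url`, `notify_tracker`); the request runs in a background thread so that rendering the landing page does not wait for the tracker.

```python
import threading
import urllib.request
from urllib.parse import urlencode


def build_tracker_url(base_url, **data):
    """Append the tracking data to the tracker URL as GET parameters."""
    return base_url + "?" + urlencode(data)


def notify_tracker(url, timeout=2.0):
    """Request the tracker URL in the background and return the thread."""
    def _send():
        try:
            urllib.request.urlopen(url, timeout=timeout).close()
        except OSError:
            pass  # a failing tracker must never break the landing page

    thread = threading.Thread(target=_send, daemon=True)
    thread.start()
    return thread
```

For example, `build_tracker_url("http://tracker.mydomain.com/hit", id=55, ip=visitor_ip)` produces the URL to pass to `notify_tracker`; the `/hit` path is made up for illustration.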
Your question really isn't clear I'm afraid.
Are you trying to find out information on who uses your site, how many clicks you get, and so on? Something like Google Analytics might be what you are after; take a look here: http://www.google.com/analytics/
EDIT: Adding more info in response to comment.
Ah, OK, so you want to know how Google tracks clicks on sites that use Google ads? A full discussion of how Google AdSense works is well beyond me, I'm afraid; you'll probably find some useful info on Google itself and on Wikipedia.
In a nutshell, and at a very basic level, Google ads work by directing the click to Google first. If you look at the URL for a Google ad (on this site, for example), you will see that it starts with "http://googleads.g.doubleclick.net..." (Google owns DoubleClick). The URL also contains a lot of other information, which allows Google to detect where the click came from and where to redirect you so that you see the actual website being advertised.
Google Analytics is slightly different in that it is a small chunk of JavaScript that runs in your page, but it too basically reports back to Google that the page was visited, when you landed there, and how long you spent on it.
Like I said a full discussion of this is beyond me I'm afraid, sorry.
I'm trying to build a C# console application to automate grabbing certain files from our website, mostly to save myself clicks and, frankly, just to have done it. But I've hit a snag for which I've been unable to find a working solution.
The website to which I'm trying to connect uses ASP.NET forms authentication, and I cannot figure out how to authenticate myself with it. This application is a complete hack, so I can hard-code my username and password or any other needed auth info, and the solution itself doesn't need to be viable enough to release to general users. In other words, if the only possible solution is a hack, I'm fine with that.
Basically, I'm trying to use HttpWebRequest to pull the page that has the list of files, iterate through that list, and then download what I need. So the actual work on the site is fairly trivial once I can get the website to consider me authorized.
I have dealt with something similar, and the hardest part is figuring out exactly what you needed to "fake" to get authorized. In my case it was authorizing into some Lotus Notes webservice, but the details are unimportant, the method is the same.
Essentially, we need to record a regular user session. I would recommend Fiddler (http://www.fiddler2.com), but if you're on Linux or something, you'll need to use Wireshark to figure some of the things out. I'm not sure whether there is a Firefox plugin that could be used.
Anyway, start up IE, then start up Fiddler. Complete the login process.
Stop what you're doing. Switch to the Fiddler pane, and examine the recorded sessions in detail. It should give you exactly what you need to fake using WebRequests.
This page should get you started. You first need to make a request to the page and then save the cookie to a container that you include in all later requests. That should keep you logged in and able to retrieve the files.
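The cookie-container idea is the same in any HTTP client. Here is a minimal Python sketch of it, under stated assumptions: the helper names, URLs, and form field names are all hypothetical, and the real field names have to come from the recorded Fiddler session. ASP.NET pages commonly include hidden fields such as `__VIEWSTATE` that may need to be posted back as well, which this sketch does not handle.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode


def make_session():
    """Return (opener, jar); all requests made via opener share the cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Build a form-encoded POST, as a browser sends when a form is submitted."""
    body = urlencode(fields).encode("ascii")
    return urllib.request.Request(
        login_url,
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def download(opener, login_url, fields, file_url):
    """Log in, then fetch file_url with the cookies collected along the way."""
    opener.open(login_url).close()  # initial GET picks up any session cookie
    opener.open(build_login_request(login_url, fields)).close()  # submit login
    with opener.open(file_url) as response:  # auth cookie is sent automatically
        return response.read()
```

Because the same opener is used for every request, the authentication cookie set by the login response is replayed on the later file requests without any extra code.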