I'm letting my users register an email account. The user fills in all the information in my program, and my program fills in the fields for them. Well, not really: it makes a POST request with the correct post data to the correct form/post URL.
However, the website requires a captcha. I simply want to show the captcha to my user; he enters the value and it gets sent along with the post data.
The register page is here: http://register.rediff.com/register/register.php?FormName=user_details
I can just get all the image URLs from the HTML, but when I copy the URL of the captcha image and go to it, it's a different image than the one I copied the URL from:
http://register.rediff.com/register/tb135/tb_getimage.php?uid=1312830635&start=JTNG
How do I do this using HttpWebRequest ?
I can just grab the html first:
string html = new WebClient().DownloadString("http://register.rediff.com/register/register.php?FormName=user_details");
Then get the image URL, but I don't know how to show the same captcha to the user.
Btw, it's not for a bot, and it's not something automated; it's just that I don't want to show the user the web interface...
Not really an answer, some advice instead:
If you're writing an app client to work with the website, a better approach would be to write a WCF/web service for the app to interact with directly; this can refer straight to your BL layer.
If you want the whole app to work on screen scraping, then that's a lot of work ahead, and your app will be dependent on the site not changing.
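That said, the reason the copied captcha URL shows a different image is almost certainly sessions: the second request carries no session cookie, so the server serves a fresh captcha. A minimal sketch of keeping one session across requests, assuming the site tracks sessions with cookies, is a WebClient subclass that reuses one CookieContainer (the class name and approach here are illustrative, not from the question):

```csharp
using System;
using System.Net;

// WebClient that carries one CookieContainer across requests,
// so the HTML page and the captcha image share the same session.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        if (request is HttpWebRequest http)
            http.CookieContainer = Cookies;  // attach the shared cookie jar
        return request;
    }
}
```

With one instance you would first DownloadString the register page, parse out the captcha's src, then DownloadData that URL: both requests carry the same session cookie, so the image should match the one embedded in the page.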
Related
I have a scenario where I would like to automate programmatically the following process:
Currently, I have to manually
Navigate to a webpage
Enter some text (an email) in a certain field on the webpage
Press the 'Search' button, which generates a new page containing a Table with the results on it.
Manually scroll through the generated results table and extract 4 pieces of information.
Is there a way for me to do this from a desktop WPF app using C#?
I am aware there is a WebClient type that can download a string, presumably of the content of the webpage, but I don't see how that would help me.
My knowledge of web based stuff is pretty non-existent so I am quite lost how to go about this, or even if this is possible.
I think a web driver is what you're looking for. I would suggest Selenium: you can navigate to sites and send input or clicks to specific elements in them.
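A minimal sketch of that with Selenium's C# bindings (the Chrome driver, the element IDs "email" and "searchButton", and the table selector are placeholders; inspect the real page for the actual ones):

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Sketch: fill the search field, click Search, read cells from the result table.
class SearchScraper
{
    static void Main()
    {
        using var driver = new ChromeDriver();
        driver.Navigate().GoToUrl("https://example.com/search");

        driver.FindElement(By.Id("email")).SendKeys("user@example.com");
        driver.FindElement(By.Id("searchButton")).Click();

        // Walk the generated results table and pull out the cell text.
        foreach (var row in driver.FindElements(By.CssSelector("table#results tr")))
        {
            foreach (var cell in row.FindElements(By.TagName("td")))
                Console.WriteLine(cell.Text);
        }
    }
}
```

This needs the Selenium.WebDriver NuGet package and a matching ChromeDriver on the PATH.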
Well, I'll write the algorithm for you, but you also need to do some homework.
Use WebClient to get the HTML page with the form you want to auto-fill and submit.
Use a regex to extract the action attribute of the form you want to auto-submit. That gets you the URL you want to submit your next request to.
Since you know the fields in that form, create a class corresponding to those fields; let's call the class AutoClass.
Create a new instance of your AutoClass and assign the values you want to auto-fill.
Use WebClient to send your new request to the URL you extracted from the form previously, attaching the object you want to send to the server (through serialization or any other method).
Send the request, wait for the response, then take further action.
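The steps above can be sketched roughly like this; the regex is deliberately simplistic, and the method names are invented for illustration:

```csharp
using System.Collections.Specialized;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class FormAutoSubmit
{
    // Step 2: pull the action attribute out of the first <form> tag.
    // A real page may need a proper HTML parser; this regex only handles
    // the simple case of action="..." inside the form tag.
    public static string ExtractFormAction(string html)
    {
        var match = Regex.Match(html, "<form[^>]*action\\s*=\\s*\"([^\"]+)\"",
                                RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }

    // Steps 4-6: send the field values to the extracted action URL.
    public static string Submit(string actionUrl, NameValueCollection fields)
    {
        using var client = new WebClient();
        byte[] response = client.UploadValues(actionUrl, "POST", fields);
        return Encoding.UTF8.GetString(response);
    }
}
```

UploadValues takes care of URL-encoding the fields, which covers the "attach your object" step for a plain HTML form without any serialization of your own.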
Either use a web driver like Puppeteer (Selenium is kinda dead) or use the HTTP protocol to make web requests yourself (if you don't get stopped by bot checks). I feel like you're looking for the latter method, because there is no reason to use a web driver in this case when a lighter method like HTTP requests can be used.
You can use RestSharp or the built-in libraries if you want. Here is a popular thread on the ways to send requests with the libraries built into C#.
To figure out what you need to send, you should use a tool like Fiddler or Chrome DevTools (specifically the Network tab) to see what the browser sends to achieve your goal.
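If you go with the built-in libraries, a minimal sketch with HttpClient might look like this (the URL and field names are placeholders; copy the real ones from the Network tab):

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class PostExample
{
    // Replicates a form post observed in Fiddler / the browser's Network tab.
    // The field names here are placeholders for whatever the real site expects.
    public static async Task<string> PostLoginAsync(string url,
                                                    string user, string password)
    {
        using var client = new HttpClient();
        var form = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["username"] = user,
            ["password"] = password,
        });

        HttpResponseMessage response = await client.PostAsync(url, form);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

FormUrlEncodedContent produces the same application/x-www-form-urlencoded body a browser form submit would, which is usually what the Network tab shows.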
Here's the full disclosure: I have full authorization to access one of our company's third-party servicer's websites, from which I download daily reports and perform certain repetitive tasks. I know how to automate IE to perform all of these duties for me, but logging in to the website requires the entry of a captcha phrase, and frankly I'm tired of entering captcha phrases.
When I refresh the page the captcha phrase doesn't change. I isolated the URL for the captcha picture, thinking that I could just (in C#) use a JpegBitmapDecoder to grab a picture of the captcha phrase, crack it (I already wrote some code that will crack it), then navigate to the log in page in IE and put the result in. However, the server considers the JpegBitmapDecoder and my IE page to be different sessions, so it throws a different captcha.
My goal is to find a way to grab the captcha image (it's just a JPEG) as it appears right off the IE page. I want to do it in such a way that the IE instance doesn't have to be visible (so preferably no "screen capture" methods). I've tried all sorts of ways using the HTML DOM and whatnot, but I can never get to the raw bytes of the image. I'd prefer not to have to read and decode packets either, if that's even possible. How else can this be done? Certainly the bytes representing the JPEG are stored locally somewhere.
Thanks,
You need to send session cookie headers with your JpegBitmapDecoder request.
Generally it works like this:
You enter the site. If you don't have a session cookie set in the browser (ssid=xxxxxxxxxxxxx), the server starts a new session for you by sending a Set-Cookie header in the response. From then on the browser knows which ssid to use and remembers it: every new request sent to that domain carries the ssid cookie value that matches the one the server handed out the first time. So you have to take that ssid and tell your JpegBitmapDecoder request to send it as well.
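A sketch of that flow in C#: one CookieContainer is created up front, the server's Set-Cookie from the first response lands in it automatically, and the image request reuses it, so the server serves the captcha that matches the page (class and method names are made up for illustration):

```csharp
using System.IO;
using System.Net;

public static class CaptchaSession
{
    // One cookie jar for the whole session; the ssid from the first
    // response's Set-Cookie header is captured into it automatically.
    public static readonly CookieContainer Cookies = new CookieContainer();

    public static string DownloadPage(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;
        using var response = (HttpWebResponse)request.GetResponse();
        using var reader = new StreamReader(response.GetResponseStream());
        return reader.ReadToEnd();
    }

    // Requested with the SAME container, so the server sees the same
    // session and serves the image that matches the page.
    public static void DownloadImage(string url, string savePath)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;
        using var response = (HttpWebResponse)request.GetResponse();
        using var stream = response.GetResponseStream();
        using var file = File.Create(savePath);
        stream.CopyTo(file);
    }
}
```

Call DownloadPage for the login page first, then DownloadImage for the captcha URL; the second request rides on the session the first one established.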
In my web application I want the page URL to point to the current user by his username/ID,
so when I change the username the browser takes me to that user's page.
For example, like this: www.XYZ.com/Member.aspx?userID=012345
I want to do this because I have a search method, and when I click on the resulting hyperlinks, the link takes me back to my own profile, not to the user I clicked on.
I'm using c# and ASP.NET.
I hope my question is clear.
HttpContext.Current.Request.QueryString["userID"]
Include System.Web in your file and use HttpContext to read parameters from the URL. Note that the QueryString collection is read-only, so to send the user to another profile you build the hyperlink (or call Response.Redirect) with the new userID rather than modifying the collection in place.
You said you're using C#, so this is an example of using C# in the code-behind file or controller. You could also accomplish this on the client using JavaScript, but I will wait for a response from you in regards to where you are trying to append to the URL.
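As a sketch of the search-results side of this (the page name Member.aspx and the userID parameter come from the question; the helper class is made up): build each result hyperlink with the clicked user's ID in the query string, then read it back on the profile page.

```csharp
using System;

public static class ProfileLinks
{
    // Build the href for a search-result hyperlink so it points at the
    // clicked user's profile instead of the current user's.
    public static string BuildProfileUrl(string userId)
    {
        return "Member.aspx?userID=" + Uri.EscapeDataString(userId);
    }
}

// In the code-behind of Member.aspx, read the parameter back, e.g.:
//
//   string userId = Request.QueryString["userID"];
//   if (!string.IsNullOrEmpty(userId))
//       LoadProfile(userId);   // your own lookup against the DB
```

Escaping the ID keeps the link valid even if your user identifiers ever contain characters that are special in URLs.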
I want to use the WebRequest class to post data to a website. This works fine, however the website I'm posting to requires cookies/sessions (it's a login form). After logging in I need to retrieve some account information (this is information on a specific page).
How can I make sure the login information is being stored? In AutoIT I did this using a hidden webbrowser, however I want to use a console application for it.
My current code (to login) is too long to post here, so it can be found here.
Take a look at my ASPX sessions scraper on Bitbucket. It does exactly what you are asking for, including some ASPX WebForms-specific extensions, like sending postbacks.
You need to store the cookie that you get after logging in and then send that cookie when you request pages containing personal information.
Here is an example of using cookies with WebRequest
It is possible that you can't connect because the session has ended; in that case you need to log in again.
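A sketch of that flow (the login URL and form field names below are placeholders): post the credentials once, keep the CookieContainer around, and reuse it for the account-info request; if a response comes back as the login page, the session has ended and you log in again.

```csharp
using System.IO;
using System.Net;
using System.Text;

public static class SessionClient
{
    // One container shared by every request = one logged-in session.
    public static readonly CookieContainer Cookies = new CookieContainer();

    public static string Post(string url, string formData)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = Cookies;   // login cookie captured here

        byte[] body = Encoding.UTF8.GetBytes(formData);
        using (var stream = request.GetRequestStream())
            stream.Write(body, 0, body.Length);

        using var response = (HttpWebResponse)request.GetResponse();
        using var reader = new StreamReader(response.GetResponseStream());
        return reader.ReadToEnd();
    }

    public static string Get(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;   // login cookie sent back here
        using var response = (HttpWebResponse)request.GetResponse();
        using var reader = new StreamReader(response.GetResponseStream());
        return reader.ReadToEnd();
    }
}
```

Usage would be something like SessionClient.Post("https://example.com/login", "user=me&pass=secret") followed by SessionClient.Get on the account page, in a plain console application with no hidden browser.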
I am trying to use domain masking to simulate multi-tenant access to my application. The plan right now is to read the subdomain portion of the domain ie: demo.mydomain.com and load settings from the DB using that name.
The issue I'm having is that request.url is getting the request url - NOT the url in the browser.
So if I have http://demo.mydomain.com forwarding to http://www.mydomain.com/controllername with masking, request.url grabs the latter, simply because of how masking works, I assume: by putting the masked site inside a frame.
Is it even possible to read the url in the browsers address bar? Thanks.
You can probably get the URL you want, but only on the client side...
So, do this:
Get the browser's URL using a JavaScript call like window.location.href.
Post that URL to the server side.
Cons:
This is a JavaScript-dependent solution; it will not work with JavaScript disabled.
This is ugly as hell.
Pros:
You probably do not have any other option.
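For what it's worth, assuming the page posts the value of window.location.href in a hidden form field (named, say, clientUrl; both that name and the helper below are made up), the server side only needs to pull the subdomain back out:

```csharp
using System;

public static class MaskedUrlHelper
{
    // Given the browser-side URL posted from window.location.href,
    // return the subdomain (e.g. "demo" from "http://demo.mydomain.com/...").
    public static string GetSubdomain(string postedUrl)
    {
        var host = new Uri(postedUrl).Host;          // demo.mydomain.com
        var parts = host.Split('.');
        return parts.Length > 2 ? parts[0] : null;   // no subdomain -> null
    }
}

// In the controller/code-behind, the posted value would be read with
// something like Request.Form["clientUrl"] and fed to GetSubdomain,
// then used to load the tenant's settings from the DB.
```

The Length > 2 check is a naive way to tell demo.mydomain.com apart from a bare mydomain.com; it would need adjusting for multi-part TLDs.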