FiddlerCore - multithreaded WebBrowser, getting the same session - C#

I have a very specific question, and I'll try to explain it as best I can. I'm using Windows Forms with WebBrowser controls, multiple threads, and FiddlerCore.
My application runs different forms on multiple threads. There is a WebBrowser control on each form, running at the same time as the WebBrowsers on the other forms.
Each of them uses FiddlerCore, and some of them navigate to the same website, searching for information.
In the FiddlerCore code, I use the FiddlerApplication.AfterSessionComplete event to capture all the traffic from the website (for each of the WebBrowsers).
The main problem is that FiddlerCore doesn't distinguish which thread a request was issued from, so sometimes information that is meant for one form ends up in another form that is navigating to the same page but searching for different things.
So what I really need is a way to check whether a Session I got from FiddlerCore was launched from a specific form.
If you need it, I can post some code, but I don't think it's actually necessary.
I appreciate any help.

I don't know FiddlerCore, but according to the documentation, FiddlerApplication.AfterSessionComplete hands you a Fiddler.Session object with an oRequest property of type ClientChatter, and ClientChatter has a headers property.
So my suggestion is to add a custom header with a unique identifier (a thread number, a GUID, ...) to the request, and check for this header when AfterSessionComplete fires in order to match request and response.
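A minimal sketch of that idea, assuming the WebBrowser.Navigate overload that takes additional headers. X-Form-Id is a made-up header name, and note that the extra header only rides on the top-level request, not on sub-requests the page triggers itself, so those would need a different correlation strategy:

// Each form tags its own navigation with a unique id.
private readonly string formId = Guid.NewGuid().ToString();

private void NavigateTagged(string url)
{
    // The extra header is attached only to this top-level request.
    webBrowser1.Navigate(url, null, null, "X-Form-Id: " + formId + "\r\n");
}

// When wiring up FiddlerCore, match sessions back to the form:
FiddlerApplication.AfterSessionComplete += oSession =>
{
    if (oSession.oRequest.headers["X-Form-Id"] == formId)
    {
        // This session belongs to this form's WebBrowser.
    }
};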

Related

Firing hCaptcha callback function for bypass token

I'm trying to bypass the hCaptcha in Discord's account registration using Selenium WebDriver in C#. I'm using the CapMonster Cloud API to solve the captcha itself, and as a response I get a bypass token.
The problem I currently have is that I can't locate the callback function that I need to call/submit in order to pass the hCaptcha.
I'm setting the bypass token into the "g-recaptcha-response" and "h-captcha-response" textareas, but I can't find a way to locate and call the callback function. There is no form to be submitted.
using Selenium WebDriver in C#
10/10 would recommend doing Discord captcha bypasses using:
PuppeteerExtraSharp/ExtraStealth
(as Selenium has some obvious tracers)
Puppeteer has a lot more freedom in its API, and 2captcha is a much more popular method for solving hCaptchas.
I know this doesn't answer your question, but I hope you look into this as a potentially better alternative if you don't receive a more traditional answer.
You can do that with the Anti-Captcha.com plugin, which does the job automatically. It injects its own callbacks, so when a token is ready it submits the form. If you ever have problems with the plugin, the support folks there will help you out.
Web communication has to happen via one of the methods defined on this page.
So if anything is being sent between a server and a browser, it has to use one of those methods. The most common ones are POST and GET.
The statement "There is no form to be submitted" is somewhat confusing. A form is just a display of fields to collect data from a user. If a website does not need user input, it does not show a form; it instead captures the required data and sends a POST request to the server (without the user ever noticing), much as a form would have sent the data. This is normal behavior for almost all major websites; Google Analytics code is one example.
So what you need to look for is the request (usually POST, possibly PUT or even GET, depending on the site) in which the data you are targeting is sent or received.
In your case there is indeed a form which displays the captcha (that is how you see it) and an associated POST request which does what you need.
The request that fetches the captcha is POST /getcaptcha?s=xxxxxxxx-xxxe-xxxx-xxxx-xxxxxxxxxxxx HTTP/3
The request where the token is sent is POST /api/v9/auth/register HTTP/3
These basics apply to any web communication and not just the website in question.
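To make the mechanics concrete, here is a minimal sketch of sending such a POST from C# with HttpClient. The URL, field name, and payload below are placeholders; the real ones come from your own traffic capture:

using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class PostSketch
{
    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Placeholder body; copy the real field names from the captured request.
            var content = new StringContent(
                "{\"some_field\":\"some value\"}", Encoding.UTF8, "application/json");
            HttpResponseMessage response =
                await client.PostAsync("https://example.com/api/endpoint", content);
            // Inspect the status code and body to see how the server reacted.
            string body = await response.Content.ReadAsStringAsync();
        }
    }
}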

Invoke code-behind method when browser is closing

I need to find a way to intercept the browser closing and invoke a method that updates a record in the DB with information about the logged-in user. It's very important that this record is updated when the user logs out or when they close the browser. Obviously, when the user clicks 'Logout' I handle the update in the server-side event, but what if the user simply exits the browser?
Someone suggested using the window.onbeforeunload event and making an asynchronous call to some web service or WebMethod to execute the code, but this doesn't convince me at all: the problem with onbeforeunload is that it shows a confirmation prompt. I need to avoid this message and simply invoke the method.
So I'm wondering if there is a 'server-side' solution that doesn't use AJAX or JavaScript.
For example, a way to trigger some event on session abandon or session clear, or some other way to solve this problem purely in code-behind...
There is no way a server-side-only solution can know about something that happens in the client browser.
I do not believe there is any way to do what you need server-side only; the client side is the only way. The server has no way of knowing when the browser window was closed; that is a limitation of the HTTP protocol.
Yes, you can put an event in Global.asax which fires when the session ends. If you need data from the client to update the DB, you'll need a way of getting it there; but if not, Session_End will do the trick.
Note: the session ending is slightly different from the browser closing, so this will depend on what you want the event to do.
How to handle session end in global.asax?
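A minimal sketch of that approach, assuming InProc session state (Session_End does not fire for the StateServer or SQL Server session modes); MarkUserOffline is a made-up helper:

// In Global.asax:
void Session_End(object sender, EventArgs e)
{
    // No request or client data is available here, only what you
    // stashed in the session earlier.
    object userId = Session["UserId"];
    if (userId != null)
    {
        // e.g. UPDATE Users SET IsOnline = 'n' WHERE Id = @UserId
        MarkUserOffline((int)userId);
    }
}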
I'd like to find a 'server-side' solution without using ajax or javascript.
I suspect that it's impossible with that requirement.
Maybe you could do something like:
Have a hidden IFRAME on the page
Set the Refresh header on this IFRAME (or use a META element) to contact the server every couple of seconds
If you do not hear from the client for some period of time, assume the browser has been closed.
However, I imagine that this solution will not scale well.
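For what it's worth, here is a rough sketch of that heartbeat page's code-behind; the five-second interval and the "LastSeen" bookkeeping are illustrative:

// Code-behind of the page the hidden IFRAME points at.
protected void Page_Load(object sender, EventArgs e)
{
    Response.AppendHeader("Refresh", "5");   // browser re-requests in 5 seconds
    Session["LastSeen"] = DateTime.UtcNow;   // or stamp a row in the DB
}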
Have you considered something like SignalR? I use it to detect when someone has a record open.
public class ChatHub : Hub
{
    // Fires whenever a client's connection drops, whether the tab was
    // closed, the browser crashed, or the network died.
    public override Task OnDisconnected()
    {
        // Broadcaster is application-specific code that records the drop.
        Broadcaster.Disconnected(Context.ConnectionId);
        return base.OnDisconnected();
    }
}
For the moment I have radically changed my approach to the problem.
To update pending rows, I implemented a timed job using the Quartz.NET framework that runs every night.
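For anyone wanting to do the same, here is a minimal sketch of such a nightly job, written against the current Quartz.NET 3.x API; the job class, cron expression, and update logic are all illustrative:

using System.Threading.Tasks;
using Quartz;
using Quartz.Impl;

public class UpdatePendingRowsJob : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        // UPDATE the pending rows here.
        return Task.CompletedTask;
    }
}

public static class SchedulerSetup
{
    public static async Task StartAsync()
    {
        IScheduler scheduler = await new StdSchedulerFactory().GetScheduler();
        await scheduler.Start();

        IJobDetail job = JobBuilder.Create<UpdatePendingRowsJob>().Build();
        ITrigger trigger = TriggerBuilder.Create()
            .WithCronSchedule("0 0 3 * * ?")   // every night at 03:00
            .Build();

        await scheduler.ScheduleJob(job, trigger);
    }
}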

Getting data from a webpage

I have an idea for an app that would really help me out at work, but I'm not sure if it's possible.
I want to run a C# desktop application that asks for a value. When a value is supplied, the application opens a browser, goes to a webpage, and enters the value into a form on the website. The form is then submitted, and a new page loads containing a table of results. I then want to extract that table from the page source and write code to parse the result values.
It is not important that the user sees this happen in an actual browser. In other words, if there's a way to do it by issuing HTTP requests directly, that's great.
The biggest problem I have is getting the values into the form and then retrieving the page source after the form is submitted and the next page loads.
Any help really appreciated.
Thanks
Provided that you're only using this in a legal context:
Usually, web forms are sent via a POST request to the web server, specifically to some script that handles them. You can look at the HTML of the form's page and find the destination of the form (the form's action attribute).
You can then use an HttpWebRequest in C# to "pretend to be the form", sending a POST request with all the required parameters in the request body.
As a result you will get the source code of the destination page, just as it would be sent to a browser, and you can parse that.
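A minimal sketch of that approach; the URL and field name below are placeholders taken from whatever the form's HTML actually declares:

using System.IO;
using System.Net;
using System.Text;

// URL = the form's "action"; field names = the form's input names.
var request = (HttpWebRequest)WebRequest.Create("https://example.com/search");
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";

byte[] body = Encoding.UTF8.GetBytes("query=myValue");
request.ContentLength = body.Length;
using (Stream stream = request.GetRequestStream())
    stream.Write(body, 0, body.Length);

using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd();   // the results page, ready to parse
}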
This is definitely possible and you don't need to use an actual web browser for this. You can simply use a System.Net.WebClient to send your HTTP request and get an HTTP response.
I suggest using Wireshark (or Firefox with Firebug); it lets you see the HTTP requests and responses. By looking at the HTTP traffic you can see exactly how you should construct your HTTP request and which parameters you should set.
You don't need to involve the browser with this. WebClient should do all that you require. You'll need to see what's actually being posted when you submit the form with the browser, and then you should be able to make a POST request using the WebClient and retrieve the resulting page as a string.
The docs for the WebClient constructor have a nice example.
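The WebClient version is even shorter; again, the URL and field name are placeholders:

using System.Collections.Specialized;
using System.Net;
using System.Text;

using (var client = new WebClient())
{
    var form = new NameValueCollection { { "query", "myValue" } };
    byte[] result = client.UploadValues("https://example.com/search", "POST", form);
    string html = Encoding.UTF8.GetString(result);   // parse this
}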
See e.g. this question for some pointers on at least the data-retrieval side. You're going to know a lot more about the HTTP protocol before you're done with this...
Why would you do this through web pages if you don't even want the user to do anything?
Web pages are purely for interaction with users, if you simply want data transfer, use WCF.
@Brian: using Wireshark may result in a very angry network manager; make sure you are actually allowed to use it.

C# WebClient - View source question

I'm using a C# WebClient to post login details to a page and read all the results.
The page I am trying to load includes Flash (which, in the browser, renders into HTML). I'm guessing it's Flash to avoid being picked up by search engines?
The Flash content I am interested in is just text (not an image/video), and when I "View Selection Source" in Firefox I do actually see the text, within HTML, that I want.
(Interestingly, when I view the source for the whole page, I do not see the text within the HTML that I want. Could this be related?)
Currently, after I have posted my login details and loaded the HTML back, I see the page without the Flash HTML (as if I had viewed the source for the whole page).
Thanks in advance,
Jim
PS: I should point out that the POST is actually working; my login is successful.
Fiddler (or a similar tool) is invaluable for tracking down screen-scraping problems like this. Using a normal browser with Fiddler active, look at all the requests being made as you go through the login and navigation process to get to the data you want. Along the way, you will likely see one or more things that your code is doing differently which the server is responding to, hence showing you different HTML than a real client would get.
The list below (think of it as "scraping 101") is what you want to look for. Most of it is probably stuff you're already doing, but I included everything for completeness.
In order to scrape effectively, you may need to deal with one or more of the following:
cookies and/or hidden fields. When you arrive at any page on a site, you'll typically get a session cookie and/or a hidden form field which (in a normal browser) is propagated back to the server on all subsequent requests. You will likely also get a persistent cookie. On many sites, if a request shows up without the proper cookie (or form field, for sites using "cookieless sessions"), the site will redirect the user to a "no cookies" UI, a login page, or some other undesirable location (from the scraper's perspective). Always capture the cookies set on the initial request and faithfully send them back on subsequent requests, except when a subsequent request changes a cookie (in which case propagate the new cookie instead). See the sketch after this list.
authentication tokens. A special case of the above is forms-authentication cookies or hidden fields. Make sure you're capturing the login token (usually a cookie) and sending it back.
POST vs. GET. This is obvious, but make sure you're using the same HTTP method that a real browser does.
form fields (esp. hidden ones!). I'm sure you're doing this already, but make sure to send all the form fields that a real browser does, not just the visible ones, and make sure the fields are HTML-encoded properly.
HTTP headers. You already checked this, but it may make sense to check again just to be sure the (non-cookie) headers are identical. I always start with the exact same headers, then pull headers out one by one and keep only those whose removal causes the request to fail or return bogus data. This approach simplifies your scraping code.
redirects. These can come either from the server or from client script (e.g. "if the user doesn't have the Flash plug-in loaded, redirect to a non-Flash page"). See WebRequest: How to find a postal code using a WebRequest against this ContentType="application/xhtml+xml, text/xml, text/html; charset=utf-8"? for a crazy example of how redirection can trip up a screen-scraper. Note that if you're using .NET for scraping, you'll need HttpWebRequest (not WebClient) for redirect-dependent scraping, because by default WebClient provides no way for your code to attach cookies and headers to the second (post-redirect) request. See the thread above for more details.
sub-requests (frames, ajax, flash, etc.). Often, page elements (not the main HTTP request) end up fetching the data you want to scrape. You'll be able to figure this out by looking at which HTTP response contains the text you want, then working backwards until you find what on the page is actually making the request for that content. A few sites do really crazy things in sub-requests, like requesting compressed or encrypted text via ajax and then using client-side script to decrypt it. If this is the case, you'll need to do a bit more work, like reverse-engineering what the client script does.
ordering. This one is obvious: make HTTP requests in the same order that a browser client does. That doesn't mean you need to make every request (e.g. images); typically you only need the requests which return a text/html content type, unless the data you want is not in the HTML but in an ajax/flash/etc. request.
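As a concrete illustration of the cookie point above (a sketch, not tied to any particular site): sharing a single CookieContainer across HttpWebRequests makes the session and auth cookies propagate automatically.

using System.Net;

// One cookie jar for the whole scraping session.
var cookies = new CookieContainer();

HttpWebRequest MakeRequest(string url)
{
    var req = (HttpWebRequest)WebRequest.Create(url);
    req.CookieContainer = cookies;   // same jar on every request
    return req;
}

// 1. GET the login page (picks up the session cookie).
// 2. POST the credentials with the same CookieContainer.
// 3. GET the page with your data; the auth cookie rides along.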
(Interestingly, when I view the source for the whole page, I do not see the text within the HTML that I want. Could this be related?)
This usually means that the discrepancy is caused by DOM manipulation via JavaScript after the page has loaded. Try turning off JavaScript and see what the raw page looks like.

Running server-side function as browser closes

Background: I'm creating a very simple chatroom-like ASP.NET page with C# code-behind. The current users and chat messages are displayed in controls inside an AJAX UpdatePanel and, using a Timer, they pull information from a DB every few seconds.
I'm trying to find a simple way to set a user's status to "Offline" when they exit their browser, as opposed to hitting the "Logoff" button. The "Offline" status is currently just a one-character (y/n) IsOnline flag.
So far I have looked into window.onbeforeunload in JavaScript, setting a hidden form variable from a function on this event. Of course the trouble is, I'd still have to test this hidden form variable somewhere in my code-behind to do the final server-side DB query that actually sets the user offline.
I may be completely overcomplicating this likely simple problem! And of course I'd appreciate any completely different alternative suggestions.
Thanks
I suspect you are barking up the wrong tree. Remember, the user could suddenly lose their internet connection, their browser could crash, or they could switch off their computer using the big red switch. There will be cases where the server simply never hears from the browser again.
The best way to do this is with a "dead man's switch." Since you said that they are pulling information from the database every few seconds, use that opportunity to store (in the database) a timestamp for the last time you heard from a given client.
Every minute or so, on the server, do a query to find clients that have not polled for a couple of minutes, and set the user offline... all on the server.
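A sketch of that server-side sweep, with made-up table and column names; the client's existing poll would stamp LastSeen on each request:

using System.Data.SqlClient;

// Run from a server-side timer every minute or so.
void MarkSilentUsersOffline(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        @"UPDATE Users SET IsOnline = 'n'
          WHERE IsOnline = 'y'
            AND LastSeen < DATEADD(minute, -2, GETUTCDATE())", conn))
    {
        conn.Open();
        cmd.ExecuteNonQuery();   // anyone silent for ~2 minutes goes offline
    }
}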
JavaScript cannot be reliable, because I can close my browser by killing the process.
A more reliable method might be to send periodic "hi I'm still alive" messages from the browser to the server, and have the server change the status when it stops receiving these messages.
I can only agree with Joel here. There is no reliable way for you to know when the HTTP agent wants to terminate the conversation.
