A browser inside a browser - C#

I want to create, with ASP.NET, a browser inside a web page, so that I can process the user's click events (for statistics analysis).
I know how to do it with WinForms, but I need a fully online solution, so that:
The user opens a standard browser and types in a start URL.
At this URL the menus and bars of the standard browser are hidden, and
the user sees a "simulated browser" with standard buttons
(back, reload, ...).
From the ASP.NET (C#) code-behind of this page I can start collecting click
data.
Thanks in advance, and keep up the good work.

What you want to collect (a heat map of clicks essentially) is doable, but I don't think the way you want to go about it is very feasible.
Try this out.
I think that using this kind of solution with frames, etc. is much more feasible than embedding a browser (that amounts to writing a browser that can be served up by some kind of Java/Silverlight technology, which is not trivial).
Another idea: since, I assume, you have your users' permission to track their clicks, you could write a Greasemonkey script (a Firefox plugin) based on the JavaScript in the link I provided above. You could then have all users install this plugin/script combination to send you their clicks.

Web browsers are normally designed to prevent this kind of cross-site scripting vulnerability. This would only be feasible if you had the complete cooperation of all sites involved.

I don't think browsers will allow you to do this, for the simple reason that it opens up a whole bunch of security holes. If you think about it, an attack site designed like this would be able to follow people around the net tracking their actions, stealing passwords, etc. without them even knowing it was there.

This is not so simple for a web app.
Your options are:
Create a plugin (or Greasemonkey script) for your favorite browser to collect click data.
JavaScript that tracks the user's cursor position. Keep in mind that this won't be reliable once your users go to other sites from within your site, because the same-origin policy stops your script from seeing pages served from a different origin.
You won't be able to make a "browser" control like you can on a desktop app because browsers intentionally don't allow web sites to be that powerful.
For the "browser in a browser" effect, you can use the tag. Remember, you'll only be able to track user actions in this iframe if the source is from the the same domain as the page it's included on.

Cross-domain scripting is impossible client-side. For obvious security reasons, you can't even read from a frame or iframe pointing somewhere outside your own site.
Maybe the solution here is to build something similar to the famous PHPProxy, or PHPBrowser; in this case an "ASP.NET proxy". It's not that hard to build, and you can Google for many examples of those little scripts.
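To give an idea of what such an "ASP.NET proxy" boils down to: fetch the requested page server-side and hand the markup back to the client, logging and rewriting as you go. A rough sketch only, assuming a generic handler named Proxy.ashx and a url query-string parameter (link/form rewriting, the hard part, is omitted):

    // Proxy.ashx.cs - bare-bones "browse through my server" handler.
    using System.IO;
    using System.Net;
    using System.Web;

    public class Proxy : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            // e.g. /Proxy.ashx?url=http://example.com/
            string target = context.Request.QueryString["url"];

            var request = (HttpWebRequest)WebRequest.Create(target);
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string html = reader.ReadToEnd();

                // A real proxy would rewrite links and form actions here so that
                // every subsequent click also goes through Proxy.ashx (and gets logged).
                context.Response.ContentType = "text/html";
                context.Response.Write(html);
            }
        }

        public bool IsReusable { get { return true; } }
    }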

While I doubt you can hide the original browser's toolbars, etc., you could set up a single page that does this (it certainly wouldn't handle everything, though).
This page would contain the buttons and textbox required to make up the inner browser UI, and a placeholder that would contain the page the user requested. Of course, the page in the placeholder would need all of its links replaced so that they can be tracked (I would use LinkButtons). I'm not sure how well form submits would work.
Personally I'd use a proxy if I had control of the computer.

Related

Interacting with web pages in C#

There is a website that was created using ColdFusion (not sure if this matters or not). I need to interact with this web site. The main things I need to do are navigate to different pages and click buttons.
I have come up with two ideas on how to do this. The first is to use the WebBrowser control. With this, I could certainly navigate pages, and click buttons (According to This).
The other way is to interact with the HTML directly. I'm not sure exactly how to do this, but I am assuming I could click buttons or use HTTP requests to interact with the page.
Does anyone have a recommendation on which way is better? Is there a better way that I haven't thought of?
I'd use Html Agility Pack to parse the HTML and then do POSTs and GETs appropriately with HttpWebRequest.
While it may be possible to use the WebBrowser control to simulate clicks and navigation, you get more control over what gets sent with Html Agility Pack and HttpWebRequest.
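Roughly, that combination looks like this (the URL, field names and XPath below are placeholders for whatever the real ColdFusion page uses):

    using System;
    using System.IO;
    using System.Net;
    using System.Text;
    using HtmlAgilityPack;

    class Scraper
    {
        static void Main()
        {
            // 1. GET the page and parse it with Html Agility Pack,
            //    e.g. to pull out a hidden token the form expects.
            HtmlDocument doc = new HtmlWeb().Load("http://example.com/page.cfm");
            HtmlNode hidden = doc.DocumentNode.SelectSingleNode("//input[@name='token']");
            string token = hidden != null ? hidden.GetAttributeValue("value", "") : "";

            // 2. POST the form fields yourself with HttpWebRequest
            //    (this is what the button click would have sent).
            var request = (HttpWebRequest)WebRequest.Create("http://example.com/page.cfm");
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";

            byte[] body = Encoding.UTF8.GetBytes("searchTerm=foo&token=" + Uri.EscapeDataString(token));
            using (Stream s = request.GetRequestStream())
                s.Write(body, 0, body.Length);

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
                Console.WriteLine(reader.ReadToEnd());
        }
    }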
Did you consider Selenium? The WebDriver API is quite good and permits a lot in terms of website automation.
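For reference, driving a page with Selenium's C# WebDriver bindings looks roughly like this (the element IDs are made up for illustration):

    using System;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Firefox;

    class SeleniumDemo
    {
        static void Main()
        {
            // Starts a real browser that WebDriver remote-controls.
            IWebDriver driver = new FirefoxDriver();
            try
            {
                driver.Navigate().GoToUrl("http://example.com/page.cfm");

                // Fill a field and click a button (IDs here are placeholders).
                driver.FindElement(By.Id("searchTerm")).SendKeys("foo");
                driver.FindElement(By.Id("searchButton")).Click();

                Console.WriteLine(driver.Title);
            }
            finally
            {
                driver.Quit();
            }
        }
    }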
Why not submit directly to the URL? That's what the button click will do.
Using WebRequest.Create you can submit directly to the URL; there's no need to load, parse and "click" the button.
HtmlAgilityPack is useful for pulling the web elements and finding tags easily. If you need to remotely "steer" a web session, though, I prefer to use WatiN. It bills itself as a web unit-testing framework, but it's very useful any time you need to fake a browser session. Further, it can remote-control different browsers well enough for most tasks you'll need (like finding a button and pushing it, or finding a text field and filling in text if you need a login).
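A WatiN session is roughly this; it drives a real Internet Explorer instance over COM, so the calling thread has to be STA, and the field/button locators below are placeholders:

    using System;
    using WatiN.Core;

    class WatinDemo
    {
        [STAThread] // WatiN automates IE via COM, which requires an STA thread.
        static void Main()
        {
            using (var browser = new IE("http://example.com/login"))
            {
                // Locators are placeholders for whatever the real page uses.
                browser.TextField(Find.ByName("username")).TypeText("me");
                browser.TextField(Find.ByName("password")).TypeText("secret");
                browser.Button(Find.ByValue("Log in")).Click();

                Console.WriteLine(browser.Title);
            }
        }
    }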

Is it possible to record and repeat user actions in WebBrowserControl (Windows Forms)?

I want to be able to use the .NET WebBrowserControl to record and repeat user actions to automate the collection and retrieval of text from web pages for a data extraction tool that I'm building, but am unsure about how to best approach this.
I specifically want to use the .NET WebBrowserControl as it can be embedded in a .NET form and also used within a server side process without a UI. I'm aware that there are other means of recording and repeating user actions such as Selenium, but for now I am interested in a solution around the web browser control (just to keep answers focused).
Actions to be recorded are those such as button clicks, drop down list selection, link clicks etc.
Potential solutions I have looked at so far:
(Please correct me if my notes based on brief evaluations are wrong)
iMacros (it doesn't appear to have a component that can be used within a project to record user actions; rather, the GUI has to be used).
WatiN - good for programmatic playback, but no recording facility that can be hooked up to the WebBrowser control?
I'm presuming this is possible, as services like Mozenda appear to make use of the WebBrowserControl, or some IE-like version based on mshtml.dll.
Are there any other options I can look at?
Any insight would be appreciated.
Yes - as in Mozenda, when the user records an action like go to main page > click on images > download image, etc., the XPath of each step is recorded along with the page URL into an XML file. So you could use a self-learning algorithm to produce that kind of XML in a better way than Mozenda does.
I have developed an application using JSoup and regular-expression parsing that works the same way Mozenda does. I created a configuration file which contains the XPath of all the items you want, which works great for me.
Hope this helps,
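For the recording half with the stock WebBrowser control itself, a minimal sketch: hook the document's Click event after each navigation, note which element was hit, and replay later with InvokeMember. Persisting the recorded steps to XML/XPath, as described above, is left out:

    using System.Windows.Forms;

    public class RecorderForm : Form
    {
        private readonly WebBrowser browser = new WebBrowser { Dock = DockStyle.Fill };

        public RecorderForm()
        {
            Controls.Add(browser);
            browser.DocumentCompleted += (s, e) =>
            {
                // Re-attach after every navigation, because the HtmlDocument is replaced.
                browser.Document.Click += OnDocumentClick;
            };
            browser.Navigate("http://example.com/");
        }

        private void OnDocumentClick(object sender, HtmlElementEventArgs e)
        {
            // Record which element was clicked; store its id (or build an XPath) for replay.
            HtmlElement clicked = browser.Document.GetElementFromPoint(e.ClientMousePosition);
            if (clicked != null)
                System.Diagnostics.Debug.WriteLine("Clicked: " + clicked.TagName + " id=" + clicked.Id);
        }

        // Replay: look the element up again and fire its click handler.
        public void ReplayClick(string elementId)
        {
            HtmlElement element = browser.Document.GetElementById(elementId);
            if (element != null)
                element.InvokeMember("click");
        }
    }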

How to alter the browser's back button functionality, similar to eCommerce sites, in ASP.NET

I have a web application in which the browser's back button functionality should be customized. Whenever we click the browser's back button, it should take us to the landing page (the login page) and display an error message saying 'Session expired. Please log in again.'
I have gone through many posts, including a few here on Stack Overflow, but nothing worked for me. As a temporary workaround I am using a JavaScript approach; basically this JavaScript never allows us to go back, and instead keeps us on the same page.
The JavaScript I have used: <script>history.go(1)</script>
Please help me customize the functionality of the browser's back button.
Any suggestions will be really helpful to me.
Short answer: You cannot
A little longer: You shouldn't even try.
But if you insist: A Thorough Examination of "Disabling the Back Button." (from 2000, but since it is ASP I guess still valid for you :)
Newer .NET: Restrict user go back to previous page after signout
Ignore the note that older browsers do not support location.replace - IE 3.2 is not considered "older" any more, but ancient.
For this you would need a custom solution, and disabling the back button will not help...
Usually you should not try to change the behavior of the back button. But since this is the requirement, I would suggest the following:
Approach 1:
This calls for creating a navigation framework where you know which is the current page in the flow... This is only possible if you have a sequence in which the pages will be called (like a wizard).
Approach 2:
Specific to your case: you can use jQuery/JavaScript to detect that the back button was clicked. If it was, you can make an AJAX call to the server to kill the session and then redirect the user to the login page; a rough server-side sketch follows below.
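On the server side the two pieces are plain ASP.NET: send no-cache headers so the back button cannot serve the page from the browser cache, and redirect to the login page once the session is gone (the AJAX endpoint that "kills" the session can simply call Session.Abandon()). A sketch; Login.aspx and the "User" session key are placeholders:

    // Page_Load of the protected pages (or a common base page / HTTP module).
    using System;
    using System.Web;
    using System.Web.UI;

    public partial class SecurePage : Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // Stop the browser from serving this page from its cache on Back.
            Response.Cache.SetCacheability(HttpCacheability.NoCache);
            Response.Cache.SetNoStore();
            Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(-1));

            // If the session is gone, send the user to the login page with a message.
            if (Session["User"] == null)
                Response.Redirect("~/Login.aspx?expired=1");
        }
    }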
You can programmatically manipulate browser history using something like this:
window.history.back();
window.history.forward();
window.history.go(2); etc.
In HTML5-ready modern browsers like Chrome you can also do more advanced things, including completely overriding the back button's behavior, using the history.pushState() and history.replaceState() methods.
(https://developer.mozilla.org/en/DOM/Manipulating_the_browser_history)
You can also go dirty and use JavaScript + AJAX calls to react specifically to back-button events, but this will not work in some browsers either. What you ask for is not a native part of the web, so no matter what you decide to use in the end, it won't be easy or widely supported.

What is the best WebBrowser control that allows one proxy per instance?

I am making a multi-threaded [workers] application. Each thread should have its own non-GUI WebBrowser that navigates to a web page, writes data to fields and clicks a button. I also need each WebBrowser to have its own proxy. I tried the classic Windows.Forms.WebBrowser, but I got stuck at the proxy part, as it depends on IE's global settings, which won't work in my case. Any recommendations are welcome.
Note: I tried doing it through HttpWebRequest/Response, but it will never work, as the data to be passed to the page contains a field called [ab_test_data] which gets its value from JavaScript code that calculates the value according to A/B testing, which I don't even fully understand. So a WebBrowser would be my best solution, unless someone can tell me how to convert the JavaScript code that calculates ab_test_data to C# code. The algorithm used by the page I am trying to access is really sophisticated.
Note 2: the ab_test_data value depends on Window.Event and Timestamp, which can't be simulated with an HttpWebRequest/Response.
Note 3: I tried Gecko, but it won't let me do anything to the web page unless the GeckoWebBrowser is drawn on the form (which I don't want).
Any solutions are welcome.
Edit: if you know of any WebBrowser that works the way I want in a different language (maybe Java), I would like to know.
Thanks in advance.
CefSharp: .Net binding for the Chromium Embedded Framework
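To give a feel for CefSharp in a non-GUI worker, here is a rough sketch using the CefSharp.OffScreen package. The proxy here is passed as a Chromium command-line switch, which applies to the whole CEF process, so the crude way to get "one proxy per instance" with this approach is one worker process per proxy; check the CefSharp documentation for per-RequestContext proxy support in the version you use.

    using System;
    using System.Threading.Tasks;
    using CefSharp;
    using CefSharp.OffScreen;

    class CefWorker
    {
        static void Main()
        {
            var settings = new CefSettings();
            // Chromium's proxy-server switch; this is process-wide, so run
            // one worker process per proxy if you go this route.
            settings.CefCommandLineArgs.Add("proxy-server", "127.0.0.1:8888");
            Cef.Initialize(settings);

            // Off-screen browser: no form, no visible window.
            var browser = new ChromiumWebBrowser("http://example.com/");

            // Toy example only: give the page time to load. Real code should
            // wait on the LoadingStateChanged event instead of sleeping.
            Task.Delay(5000).Wait();

            // Grab the rendered page source (after JavaScript has run).
            string html = browser.GetSourceAsync().Result;
            Console.WriteLine(html.Length);

            Cef.Shutdown();
        }
    }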
use http://webkitdotnet.sourceforge.net/
As a question that may help: I wonder why browsers don't allow a proxy per (say) window/tab? I think a lot of it is that the feature isn't useful enough to justify the development time.
It may also be because the browsers [presumably] have centralized engines for things like web requests and caches, etc. Perhaps allowing a proxy per window and/or tab would fundamentally alter the design of the modern browser or have negative performance impacts. I don't really know. To illustrate the point further, consider things like Incognito mode and Private Browsing. In these cases, the browsers have at least conceptually made separate caches per window... but I still bet an Incognito window and a standard window (in Chrome) use the same underlying web request engine.
Right now there are so many people who want a JavaScript and DOM parser and interpreter. Projects like the HtmlAgility Pack and Jint are helping, but there doesn't seem to be a unified and standard solution; at least not one with the simplicity of a web browser.
[rant below]...
Unfortunately, projects like Jint and HtmlAgility are worrisome. For one, they're not IE, Chrome, Safari or Firefox. You don't exactly know what you're getting yourself into. For instance, you know that in Chrome page xyz.com loads and renders perfectly. You can fire up Firefox and see that maybe something is not quite the same, and so on with the other browsers. But with these libraries you don't really know what, if anything, isn't working right (there's no visual display to do a quick check). Plus, who knows what pace they're being developed at? Do they keep up with HTML5? Do they lag behind the major browsers? What about performance? Even more so, browsers already have things like caching and performance enhancements, which I doubt you'll get with individual libraries.
The best browser control would of course be something like:
IWebBrowser browser = new IE();
IWebBrowser browser = new Chrome();
IWebBrowser browser = new Safari();
IWebBrowser browser = new FireFox();
I think that is a dream, unfortunately. For one, what if you ever wanted to load plug-ins with these? What about user profiles, user logins, and so on? I think most of us just want the muscle of the browsers without these extras.
I really do hope that you find a good Chrome solution. I don't know what, if any, luck you'll have in the FireFox realm - maybe you can keep us updated? These solutions are evolving so quickly - I had never even heard of CefSharp or WebKit.NET before today and I looked for the same thing (Chrome and/or FireFox .NET browsers) several months ago for my own use. It would be great if a lot of people got together, made a standard interface and then each company built their embedded browser against the spec. Here's to wishing.

How to hide view source

Could someone please help me hide the View Source option of a web page in .NET?
You can't, it is an option of the browser. The best you can do is obfuscate it.
Back in the Geocities era of the internet it wasn't uncommon for sites to use javascript to capture right clicks and popup a message box saying that you weren't allowed to view the source (or save an image or something).
This isn't quite so common nowadays for three main reasons:
It was futile. Preventing someone from using right click to view the source did nothing, as there are plenty of other ways to get at it. It was a minor inconvenience at best. If the browser can render the HTML, the user can get at it too.
It was annoying. Not just the modal message box whenever you accidentally right clicked. Arbitrarily removing functionality from the user's browser is a no-no.
It serves no purpose. If there is some reason that you really don't want the user to see the source of the website, then something is really wrong. If you are doing it to hide how bad the code is, fear not: terribly awful code makes it to production all the time. If you're doing it for security, then this is a majorly bad decision. Security through obscurity (on its own) is never the right choice.
That said, there are ways of obfuscating the code such that the browser can still parse it, but it is at least annoying for a human to do so. You can use JavaScript to write certain parts of the page (a la AJAX) so that viewing the vanilla source code doesn't show what actually got rendered. Or you can minify it, removing all formatting and renaming elements (once it goes to production), so that it is at the very least annoying to follow.
If you're dealing with Internet Explorer only, you can use Group Policy to disable the Internet Explorer View Source menu item.
See Group Policy entry: View menu: Disable Source menu option.
Group Policy modifications are usually made through gpedit.msc or Active Directory. However, in the most basic scenarios changes to Group Policy can be made via direct edits to the registry.
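For completeness, the registry fallback from C#. The value commonly cited for this policy is NoViewSource under the Internet Explorer Restrictions key; treat the exact path and name as an assumption and verify it against the Group Policy reference before relying on it:

    using Microsoft.Win32;

    class DisableViewSource
    {
        static void Main()
        {
            // Commonly cited location for the "View menu: Disable Source menu option" policy.
            // Assumption - verify against the Group Policy reference for your Windows/IE version.
            using (RegistryKey key = Registry.CurrentUser.CreateSubKey(
                @"Software\Policies\Microsoft\Internet Explorer\Restrictions"))
            {
                key.SetValue("NoViewSource", 1, RegistryValueKind.DWord);
            }
        }
    }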
