Select and manipulate elements in a web page

I'm looking for a way to manipulate the contents of a web page in a browser engine (C#) using jQuery-like methods.
Pseudo-code:
Browser.Document.Query("div.content> a")[0].Click();
I looked at the popular browser engines for .NET but haven't found anything like that. Any ideas?

From your description, it sounds like you want Selenium.
Selenium automates browsers. That's it. What you do with that power is entirely up to you. Primarily it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) also be automated as well.
Selenium has the support of some of the largest browser vendors who have taken (or are taking) steps to make Selenium a native part of their browser. It is also the core technology in countless other browser automation tools, APIs and frameworks.
It lets you automate a great many things; for example, you can navigate to pages within a browser (Firefox, IE, Chrome, etc.), find elements, click them, change their text and execute arbitrary JavaScript.
Selenium has APIs in various languages, one of which is indeed C#. You can query a page using many different methods, including CSS selectors (which you use in your example).

The Sizzle engine is what powers jQuery selectors; you might want to look at Fizzler, a CSS selector engine for .NET:
http://code.google.com/p/fizzler/
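To give an idea of how a selector engine resolves a query like div.content > a, here is a toy matcher over System.Xml.Linq. It is not Fizzler's or Sizzle's API: the ToySelector class and its QuerySelectorAll method are hypothetical, the input is assumed to be well-formed XHTML, and only "tag", ".class" and "tag.class" joined by ">" are supported. Like browser engines typically do, it checks the rightmost part of the selector first and then walks up the parent chain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Toy right-to-left selector matcher (hypothetical; not Fizzler's or Sizzle's API).
// Supports "tag", ".class" and "tag.class" joined by the child combinator ">".
public static class ToySelector
{
    // One compound selector; an empty string means "no constraint".
    private sealed class Compound
    {
        public string Tag = "";
        public string Class = "";
    }

    private static Compound ParseCompound(string text)
    {
        // "div.content" -> Tag "div", Class "content"; ".content" -> Class only.
        string[] parts = text.Split(new[] { '.' }, 2);
        return new Compound { Tag = parts[0], Class = parts.Length > 1 ? parts[1] : "" };
    }

    private static bool Matches(XElement element, Compound c)
    {
        if (c.Tag.Length > 0 && element.Name.LocalName != c.Tag) return false;
        if (c.Class.Length == 0) return true;
        string classes = (string)element.Attribute("class") ?? "";
        return classes.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Contains(c.Class);
    }

    public static List<XElement> QuerySelectorAll(XElement root, string selector)
    {
        Compound[] chain = selector.Split('>').Select(s => ParseCompound(s.Trim())).ToArray();
        var results = new List<XElement>();
        foreach (XElement candidate in root.DescendantsAndSelf())
        {
            // Check the rightmost compound against the candidate itself,
            // then each earlier compound against the next ancestor up.
            XElement current = candidate;
            bool matched = true;
            for (int i = chain.Length - 1; i >= 0; i--)
            {
                if (current == null || !Matches(current, chain[i])) { matched = false; break; }
                current = current.Parent;
            }
            if (matched) results.Add(candidate);
        }
        return results;
    }
}
```

Finding elements this way only covers half of the pseudo-code in the question: calling Click() on the result still needs a real browser that runs the page's scripts.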

You probably won't be able to call jQuery directly from C#, but you should be able to create a JavaScript function in the page that does what you want and then call that function from C#.

Related

How can I remotely take an accurate screenshot of a webpage by URL without WebBrowser?

I tried WebBrowser, but every WebBrowser-based solution I find uses exactly the same code: WebBrowser.DrawToBitmap.
The problem is that, depending on the URL, the result often doesn't match what the page actually looks like; sometimes it is even blank.
So I'm looking for an alternative to WebBrowser, if there is one.
It needs to run in the background, i.e. without opening a browser on screen: render the page with all its scripts and get an image.

If I understand what you're trying to do, you might be interested in a framework called PhantomJS, a WebKit "browser" engine that runs pages without displaying them. It can be used to capture screenshots.
This technique requires JavaScript, but there is something called Selenium WebDriver that can wrap it for you. Users here on SO have posted a simple example, and this comment looks useful as it contains the list of required packages.

Programmatically interact with the IE browser to fill in forms, navigate, etc.

I'd like to use C# to interact with the IE browser.
I have a feeling that shdocvw.dll will be involved, but there are so many classes in there that I don't know where to start, and maybe it's not even necessary to use it.
The goal here is to interact with a website, visiting its pages and "warming it up", not unlike what Kenneth Scott describes here. The thing is, JavaScript gets executed as you interact with a website, so it would be nice to be able to log in and submit forms exactly as you would on the website itself.
Plus, it would be nice to create a program that records my actions in IE and then lets me lightly automate and modify them.
Additionally, it would be nice if it could do all this in the background, without having to display the web page at all.
I'm not looking for third-party solutions; I want to do this myself (with your advice, of course).
Thanks.

You said you're not looking for a third-party solution; however, we have used WatiN at work with great success for automated UI testing.
It's open source, so if you want to see how they do it, you can.

Selenium and WatiN are very mature frameworks for doing exactly what you ask. Unless the point is to learn for yourself how to do this, I would use one of them.
WatiN is also a great way to learn how to do this in C#, as it is an open-source C# project.

Is it possible to record and repeat user actions in WebBrowserControl (Windows Forms)?

I want to be able to use the .NET WebBrowserControl to record and repeat user actions, to automate the collection and retrieval of text from web pages for a data extraction tool that I'm building, but I'm unsure how best to approach this.
I specifically want to use the .NET WebBrowserControl as it can be embedded in a .NET form and also used within a server side process without a UI. I'm aware that there are other means of recording and repeating user actions such as Selenium, but for now I am interested in a solution around the web browser control (just to keep answers focused).
The actions to be recorded are things like button clicks, drop-down list selections, link clicks, etc.
Potential solutions I have looked at so far:
(Please correct me if my notes, based on brief evaluations, are wrong.)
iMacros (doesn't appear to have a component that can be used within a project to record user actions; the GUI has to be used instead).
WatiN: good for programmatic playback, but no recording facility that can be hooked up to the web browser control?
I'm presuming this is possible, as services like Mozenda appear to use the WebBrowserControl, or some IE-like component based on mshtml.dll.
Are there any other options I can look at?
Any insight would be appreciated.

Yes. As in Mozenda, when the user performs an action (e.g. go to main page > click on images > download image), the XPath is recorded along with each page's URL in an XML file. So you could use a self-learning algorithm to build that kind of XML in a better way than Mozenda does.
I have developed an application using jsoup and regular-expression parsing that works the same way Mozenda does. I created a configuration file that contains the XPath of all the items you want, and it works great for me.
Hope this helps.
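The configuration-file approach can be sketched in C# with System.Xml. The format below is hypothetical, not Mozenda's: each item element pairs a name with an XPath, and the url attribute only records which page the items belong to (fetching the page is left out). The page is assumed to be well-formed XHTML with no default namespace; real-world HTML would first need a tolerant parser such as the Html Agility Pack.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class ConfigExtractor
{
    // configXml (hypothetical format): <site url="..."><item name="..." xpath="..."/>...</site>
    // pageXml: the page markup, assumed to be well-formed XHTML without a default namespace.
    public static Dictionary<string, string> Extract(string configXml, string pageXml)
    {
        var config = new XmlDocument();
        config.LoadXml(configXml);
        var page = new XmlDocument();
        page.LoadXml(pageXml);

        var result = new Dictionary<string, string>();
        foreach (XmlElement item in config.SelectNodes("/site/item"))
        {
            string name = item.GetAttribute("name");
            XmlNode match = page.SelectSingleNode(item.GetAttribute("xpath"));
            // Items whose XPath finds nothing map to null, so "not found" differs from "empty".
            result[name] = match?.InnerText?.Trim();
        }
        return result;
    }
}
```

Keeping the XPaths in data rather than code means a new site only needs a new configuration file, which is the appeal of the approach described above.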

What is the best WebBrowser control that allows one proxy per instance?

I am making a multi-threaded [workers] application. Each thread should have its own non-GUI WebBrowser that navigates to a web page, writes data to fields and clicks a button. I also need each WebBrowser to have its own proxy. I tried the classic System.Windows.Forms.WebBrowser, but I got stuck on the proxy part because it depends on IE's global settings, which won't work in my case. Any recommendations are welcome.
Note: I tried doing it through HttpWebRequest/Response, but it will never work, because the data to be passed to the page contains a field called [ab_test_data] that gets its value from JavaScript code that calculates it according to A/B testing, which I don't even fully understand. So a WebBrowser would be my best solution, unless someone can tell me how to convert the JavaScript code that calculates ab_test_data to C#. The algorithm used by the page I am trying to access is really sophisticated.
Note 2: the ab_test_data value depends on window.event and a timestamp, which can't be simulated with HttpWebRequest/Response.
Note 3: I tried Gecko, but it won't let me do anything to the web page unless the GeckoWebBrowser is drawn on the form (which I don't want).
Any solutions are welcome.
Edit: if you know of any WebBrowser that works the way I want in another language (maybe Java), I'd like to know.
Thanks in advance.

CefSharp: .Net binding for the Chromium Embedded Framework

Use WebKit.NET: http://webkitdotnet.sourceforge.net/

As a question that may help: why don't browsers allow a proxy per (say) window or tab? I think a lot of it is that the feature isn't useful enough to justify the development time.
It may also be because browsers [presumably] have centralized engines for things like web requests and caches. Perhaps allowing a proxy per window and/or tab would fundamentally alter the design of the modern browser and/or hurt performance. I don't really know. To illustrate the point further, consider Incognito mode and Private Browsing. In these cases the browsers have, at least conceptually, made separate caches per window... but I still bet an Incognito window and a standard window (in Chrome) use the same underlying web request engine.
Right now there are so many people who want a JavaScript and DOM parser and interpreter. Projects like the Html Agility Pack and Jint are helping, but there doesn't seem to be a unified, standard solution; at least not one with the simplicity of a web browser.
[rant below]...
Unfortunately, projects like Jint and the Html Agility Pack are worrisome. For one, they're not IE, Chrome, Safari or Firefox, so you don't know exactly what you're getting yourself into. For instance, you know that xyz.com loads and renders perfectly in Chrome. You can fire up Firefox and see that maybe something is not quite the same, and so on with the other browsers. But with these libraries you can't easily tell whether everything is working right (there's no visual display for a quick check). Plus, who knows what pace they're being developed at? Do they keep up with HTML5? Do they lag behind the major browsers? What about performance? Even more so, browsers already have things like caching and performance enhancements, which I doubt you'll get with individual libraries.
The best browser control would of course be something like:
IWebBrowser browser = new IE();
IWebBrowser browser = new Chrome();
IWebBrowser browser = new Safari();
IWebBrowser browser = new FireFox();
I think that is a dream, unfortunately. For one, what if you ever wanted to load plug-ins with these? What about user profiles, user logins, and so on? I think most of us just want the muscle of the browsers without these extras.
I really do hope that you find a good Chrome solution. I don't know what luck, if any, you'll have in the Firefox realm; maybe you can keep us updated? These solutions are evolving so quickly: I had never even heard of CefSharp or WebKit.NET before today, and I looked for the same thing (Chrome and/or Firefox .NET browsers) several months ago for my own use. It would be great if a lot of people got together and made a standard interface, and then each company built its embedded browser against the spec. Here's to wishing.

A browser inside a browser

I want to use ASP.NET to create a browser inside a web page, so that I can process the user's click events (for statistical analysis).
I know how to do it with WinForms, but I need a fully online solution, so that:
The user opens a standard browser and types in a start URL.
At this URL the menus and bars of the standard browser are hidden and the user sees a "simulated browser" with standard buttons (back, reload, ...).
From the ASP.NET (C#) code behind this page I can start collecting click data.
Thanks in advance, and keep up the good work.

What you want to collect (a heat map of clicks essentially) is doable, but I don't think the way you want to go about it is very feasible.
Try this out.
I think that using this kind of solution with frames, etc. is much more feasible than embedding a browser (that amounts to writing a browser that can be served up by some kind of Java or Silverlight technology, which is not trivial).
Another idea: since, I assume, you have your users' permission to track their clicks, you could write a Greasemonkey script (Greasemonkey is a Firefox add-on) based on the JavaScript in the link I provided above. You could then have all users run this add-on and script combination to send you their clicks.
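The server side of such a click heat map can be small. A minimal sketch, assuming some client-side script (such as the one in the link above) posts each click's page coordinates in pixels to the server; the ClickHeatMap class and the cell size are hypothetical choices, not part of any library. Clicks are counted per grid cell, and a lock is used because ASP.NET handles requests concurrently.

```csharp
using System;
using System.Collections.Generic;

// Counts clicks per grid cell (hypothetical helper for a click heat map).
public sealed class ClickHeatMap
{
    private readonly int _cellSize;
    private readonly Dictionary<(int Col, int Row), int> _counts = new Dictionary<(int Col, int Row), int>();
    private readonly object _gate = new object(); // requests may record clicks concurrently

    public ClickHeatMap(int cellSize)
    {
        if (cellSize <= 0) throw new ArgumentOutOfRangeException(nameof(cellSize));
        _cellSize = cellSize;
    }

    // x and y are page coordinates in pixels, as reported by the client script.
    public void Record(int x, int y)
    {
        if (x < 0 || y < 0) throw new ArgumentOutOfRangeException(x < 0 ? nameof(x) : nameof(y));
        var cell = (x / _cellSize, y / _cellSize);
        lock (_gate)
        {
            _counts.TryGetValue(cell, out int n);
            _counts[cell] = n + 1;
        }
    }

    public int CountAt(int col, int row)
    {
        lock (_gate)
        {
            return _counts.TryGetValue((col, row), out int n) ? n : 0;
        }
    }
}
```

A coarser cell size gives a smoother map from fewer clicks; rendering the counts as colors is left to the reporting page.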

Web browsers are normally designed to prevent this kind of cross-site scripting vulnerability. This would only be feasible if you had the complete cooperation of all sites involved.

I don't think browsers will allow you to do this, for the simple reason that it opens up a whole bunch of security holes. If you think about it, an attack site designed like this would be able to follow people around the net tracking their actions, stealing passwords, etc. without them even knowing it was there.

This is not so simple for a web app.
Your options are:
Create a plugin (or Greasemonkey script) for your favorite browser to collect click data.
JavaScript that tracks the user's cursor position. Keep in mind that this won't be reliable if your users go to other sites from within your site, because the browser's same-origin policy stops your script from seeing pages that come from a different origin.
You won't be able to make a "browser" control like you can on a desktop app because browsers intentionally don't allow web sites to be that powerful.
For the "browser in a browser" effect, you can use the <iframe> tag. Remember that you'll only be able to track user actions in the iframe if its source is from the same origin (scheme, host and port) as the page that contains it.
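The rule that limits iframe tracking compares scheme, host and port, not just the domain name. A small sketch of that comparison with System.Uri; the Origin class and IsSameOrigin method are hypothetical names, and real browsers have extra cases (such as document.domain relaxation and opaque origins) that this ignores.

```csharp
using System;

public static class Origin
{
    // Two URLs share an origin when scheme, host and port all match.
    public static bool IsSameOrigin(Uri a, Uri b) =>
        string.Equals(a.Scheme, b.Scheme, StringComparison.OrdinalIgnoreCase) &&
        string.Equals(a.Host, b.Host, StringComparison.OrdinalIgnoreCase) &&
        a.Port == b.Port; // Uri.Port reports the scheme's default port (e.g. 80) when none is given
}
```

So http://example.com/a and http://example.com:80/b count as the same origin, while https://example.com, http://sub.example.com and http://example.com:8080 are all different origins from http://example.com.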

Cross-domain scripting is impossible on the client side. For obvious security reasons, you can't even read from a frame or iframe that points somewhere other than your own site.
Maybe the solution here is to build something similar to the well-known PHPProxy or PHPBrowser, in this case an "ASP.NET proxy". It's not that hard to build; you can search for many examples of such small scripts.

While I doubt you can hide the original browser's toolbars, etc., you could set up a single page that does this (it certainly wouldn't handle everything, though).
This page would contain the buttons and text box required to make up the inner browser UI, plus a placeholder that holds the page the user requested. Of course, the page in the placeholder will need to have all its links replaced so that they can be tracked (I would use LinkButtons). I'm not sure how well form submissions would work.
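The link replacement step, which is also the core of an "ASP.NET proxy", could be sketched like this. Instead of LinkButtons, this version points every http(s) link at a tracking page that receives the original target in its query string; the /Track.aspx path and the url parameter are hypothetical, and a regular expression stands in for a proper HTML parser, so unquoted attributes and links created by script are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // href="..." or href='...'; group 1 is the quote character, group 2 the link target.
    private static readonly Regex Href =
        new Regex("href\\s*=\\s*([\"'])(.*?)\\1", RegexOptions.IgnoreCase);

    // Rewrites each http(s) link in html to proxyPath?url=<absolute target>.
    // pageUrl is the address the html came from, used to resolve relative links.
    public static string Rewrite(string html, Uri pageUrl, string proxyPath)
    {
        return Href.Replace(html, m =>
        {
            string quote = m.Groups[1].Value;
            if (!Uri.TryCreate(pageUrl, m.Groups[2].Value, out Uri target) ||
                (target.Scheme != Uri.UriSchemeHttp && target.Scheme != Uri.UriSchemeHttps))
            {
                return m.Value; // leave mailto:, javascript:, malformed links, etc. untouched
            }
            return "href=" + quote + proxyPath + "?url=" +
                   Uri.EscapeDataString(target.AbsoluteUri) + quote;
        });
    }
}
```

The tracking page would log the click and then fetch and rewrite the target page in turn, so every navigation passes through the server.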
Personally I'd use a proxy if I had control of the computer.
