I am working on a screen-scraping application, built as a Windows application.
I can automatically navigate through the login page and all the other pages using the WebBrowser methods, sometimes having to use '.Click' to trigger buttons on some of the pages.
Here's the problem. When I do the final 'click' to get my data, the web browser opens a new Explorer window (a pop-up window) that contains another link button, and I have to click this link button using C# to get my final data.
How can I access the new window (the pop-up window) to scrape it?
I am using the code below, and it opens the URL in a new pop-up window.
HtmlElement toollinkbutton = WebBrowser1.Document.Window.Document.Body.Document.GetElementsByTagName("a")[48];
toollinkbutton.InvokeMember("click");
The new window may be caused by target="_blank" or by JavaScript, and in either case calling InvokeMember will open it. Add a handler for the WebBrowser control's NewWindow event, cancel the pop-up there, and load the link in the existing control by calling Navigate() instead.
private string url = "";
public Form1()
{
InitializeComponent();
WebBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
WebBrowser1.NewWindow += new System.ComponentModel.CancelEventHandler(webBrowser1_NewWindow);
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlElementCollection links = WebBrowser1.Document.Links;
foreach (HtmlElement var in links)
{
var.AttachEventHandler("onclick", LinkClicked);
}
}
private void LinkClicked(object sender, EventArgs e)
{
HtmlElement link = WebBrowser1.Document.ActiveElement;
url = link.GetAttribute("href");
}
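void webBrowser1_NewWindow(object sender, System.ComponentModel.CancelEventArgs e)
{
// Suppress the pop-up; the target is loaded in the existing control instead.
e.Cancel = true;
// Fall back to the focused element's href if LinkClicked did not capture a URL.
HtmlElement link = ((WebBrowser)sender).Document.ActiveElement;
if (string.IsNullOrEmpty(url) && link != null)
{
url = link.GetAttribute("href");
}
if (!string.IsNullOrEmpty(url))
{
WebBrowser1.Navigate(url);
}
}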
Related
I am using C# to log in to a local web page.
I am using a WebBrowser control to display the page after the login.
First I navigate to the page, then I fill in the username and password, then I invoke a click. The element to be clicked is recognized, so I assume that the click happened. But the result page isn't showing; nothing appears when I run the program.
I tried this:
public WebBrowser webBrowser;
public MainWindow()
{
InitializeComponent();
webBrowser = new WebBrowser();
webBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(LoginEvent);
webBrowser.AllowNavigation = true;
webBrowser.Navigate("http://192.168.1.100/login.html");
}
private void LoginEvent(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser webBrowser = sender as WebBrowser;
//To execute the event just one time
webBrowser.DocumentCompleted -= LoginEvent;
//load page's document
HtmlDocument doc = webBrowser.Document;
doc.GetElementById("u").SetAttribute("value", "admin");
doc.GetElementById("pw").SetAttribute("value", "123456");
foreach (HtmlElement elem in doc.GetElementsByTagName("a"))
{
elem.InvokeMember("click");
}
}
Can anyone please help me figure out why the page isn't showing?
1) Your WebBrowser object is a local variable in your MainWindow() constructor.
That object goes out of scope, and can be garbage-collected, once the MainWindow constructor ends.
You need to declare the WebBrowser object as a class member.
2) Multiple DocumentCompleted events may be fired. You need to filter out the events raised for iframes and wait until the page is fully loaded:
private void LoginEvent(object sender, WebBrowserDocumentCompletedEventArgs e)
{
// filter out non main documents
if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
return;
//To execute the event just one time
webBrowser.DocumentCompleted -= LoginEvent;
//load page's document
HtmlDocument doc = webBrowser.Document;
doc.GetElementById("u").SetAttribute("value", "admin");
doc.GetElementById("pw").SetAttribute("value", "123456");
foreach (HtmlElement elem in doc.GetElementsByTagName("a"))
{
elem.InvokeMember("click");
}
}
I have a C# application with a WebBrowser component.
I set the DocumentText property to a string that contains a form and an auto-submit to another page:
In the application the code is:
private void carica_Click(object sender, EventArgs e)
{
browserRoar.DocumentText = formHTML;
browserRoar.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(ShowDocument);
}
private void ShowDocument(object sender,WebBrowserDocumentCompletedEventArgs e)
{
string newContent = (browserRoar.DocumentText);
}
In the WebBrowser I see that the new page is the one with the results, but when I check newContent I find the starting content. How can I get the new content?
Thanks
Fulvio
I have a WinForms app with the following functionality:
It has a multiline textbox that contains one URL on each line, about 30 URLs (each URL is different, but the webpage is the same; only the domain differs).
I have another textbox in which I can write a command, and a button that sends that command to an input field on the webpage.
I have a WebBrowser control (I would like to do everything in one control).
The webpage consists of a textbox and a button, which I want to be clicked after I insert a command in that textbox.
My code so far:
//get path for the text file to import the URLs to my textbox to see them
private void button1_Click(object sender, EventArgs e)
{
OpenFileDialog fbd1 = new OpenFileDialog();
fbd1.Title = "Open Dictionary(only .txt)";
fbd1.Filter = "TXT files|*.txt";
fbd1.InitialDirectory = @"M:\";
if (fbd1.ShowDialog(this) == DialogResult.OK)
path = fbd1.FileName;
}
//import the content of .txt to my textbox
private void button2_Click(object sender, EventArgs e)
{
textBox1.Lines = File.ReadAllLines(path);
}
//click the button from webpage
private void button3_Click(object sender, EventArgs e)
{
this.webBrowser1.Document.GetElementById("_act").InvokeMember("click");
}
//parse the value of the textbox and press the button from the webpage
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
newValue = textBox2.Text;
HtmlDocument doc = this.webBrowser1.Document;
doc.GetElementById("_cmd").SetAttribute("Value", newValue);
}
Now, how can I load all 30 URLs from my textbox into the same WebBrowser control, so that I can send the same command to the textbox on each webpage and then press the button on each of them?
//EDIT 1
So, I have adapted @Setsu's method and created the following:
public IEnumerable<string> GetUrlList()
{
string f = File.ReadAllText(path);
List<string> lines = new List<string>();
using (StreamReader r = new StreamReader(f))
{
string line;
while ((line = r.ReadLine()) != null)
lines.Add(line);
}
return lines;
}
Now, is this returning what it should return, so that I can process each URL?
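One thing worth checking in the snippet above: File.ReadAllText(path) returns the contents of the file, and the StreamReader(string) constructor expects a file path, so the reader is handed the text rather than the path. A minimal sketch of a helper that reads one URL per line is shown below. The names UrlListReader and ReadUrls are hypothetical, and skipping blank lines is an assumption about what the list should contain.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class UrlListReader
{
    // Reads the file at 'path' and returns one entry per non-empty line.
    // Hypothetical helper; the name and the blank-line filtering are assumptions.
    public static List<string> ReadUrls(string path)
    {
        return File.ReadLines(path)
            .Select(line => line.Trim())      // drop surrounding whitespace
            .Where(line => line.Length > 0)   // skip empty lines
            .ToList();                        // materialize before the file is closed
    }
}
```

GetUrlList() could then simply return UrlListReader.ReadUrls(path), since List<string> already implements IEnumerable<string>.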
If you want to keep using just 1 WebBrowser control, you'd have to sequentially navigate to each URL. Note, however, that the Navigate method of the WebBrowser class is asynchronous, so you can't just naively call it in a loop. Your best bet is to implement an async/await pattern detailed in this answer here.
Alternatively, you CAN have 30 WebBrowser controls and have each one navigate on its own; this is roughly equivalent to having 30 tabs open in modern browsers. Since each WebBrowser is doing identical work, you can just have 1 DocumentCompleted event written to handle a single WebBrowser, and then hook up the others to the same event. Do note that the WebBrowser control has a bug that will cause it to gradually leak memory, and the only way to solve this is to restart the application. Thus, I would recommend going with the async/await solution.
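The single-control async/await approach recommended above could look roughly like the sketch below. It is not the code from the linked answer. It wraps DocumentCompleted in a TaskCompletionSource so that each navigation can be awaited. The class and method names (SequentialNavigator, WaitForLoadAsync, SendCommandToAllAsync) are hypothetical, the element IDs "_cmd" and "_act" are taken from the question, and the sketch assumes that clicking the button triggers a page load. It needs C# 5 / .NET 4.5 or later and must be called from the UI thread.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Windows.Forms;

static class SequentialNavigator
{
    // Returns a task that completes the next time the control finishes loading a page.
    static Task WaitForLoadAsync(WebBrowser browser)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // DocumentCompleted also fires for frames; wait for the whole page.
            if (browser.ReadyState != WebBrowserReadyState.Complete) return;
            browser.DocumentCompleted -= handler;
            tcs.TrySetResult(true);
        };
        browser.DocumentCompleted += handler;
        return tcs.Task;
    }

    // Visits each URL in turn with one control, enters the command, and clicks the button.
    public static async Task SendCommandToAllAsync(
        WebBrowser browser, IEnumerable<string> urls, string command)
    {
        foreach (string url in urls)
        {
            Task loaded = WaitForLoadAsync(browser);
            browser.Navigate(url);
            await loaded;

            HtmlElement input = browser.Document.GetElementById("_cmd");
            HtmlElement button = browser.Document.GetElementById("_act");
            if (input == null || button == null) continue; // unexpected page; skip it

            input.SetAttribute("value", command);

            // Assumption: the click causes a page load; wait for it before moving on.
            Task submitted = WaitForLoadAsync(browser);
            button.InvokeMember("click");
            await submitted;
        }
    }
}
```

From an async button handler this might be called as await SequentialNavigator.SendCommandToAllAsync(webBrowser1, GetUrlList(), textBox2.Text);. If a page handles the click without reloading, the second await would never complete, so a timeout would be a sensible addition.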
UPDATE:
Here's a brief code sample of how to do the 30 WebBrowsers way (untested as I don't have access to VS right now):
List<WebBrowser> myBrowsers = new List<WebBrowser>();
public void btnDoWork(object sender, EventArgs e)
{
//This method starts navigation.
//It will call a helper function that gives us a list
//of URLs to work with, and naively create as many
//WebBrowsers as necessary to navigate all of them
IEnumerable<string> urlList = GetUrlList();
//note: be sure to sanitize the URLs in this method call
foreach (string url in urlList)
{
WebBrowser browser = new WebBrowser();
browser.DocumentCompleted += webBrowserDocumentCompleted;
browser.Navigate(url);
myBrowsers.Add(browser);
}
}
private void webBrowserDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
//check that the full document is finished
if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
return;
//get our browser reference
WebBrowser browser = sender as WebBrowser;
//get the string command from form TextBox
string command = textBox2.Text;
//enter the command string
browser.Document.GetElementById("_cmd").SetAttribute("Value", command);
//invoke click
browser.Document.GetElementById("_act").InvokeMember("click");
//detach the event handler from the browser
//note: necessary to stop endlessly setting strings and clicking buttons
browser.DocumentCompleted -= webBrowserDocumentCompleted;
//attach second DocumentCompleted event handler to destroy browser
browser.DocumentCompleted += webBrowserDestroyOnCompletion;
}
private void webBrowserDestroyOnCompletion(object sender, WebBrowserDocumentCompletedEventArgs e)
{
//check that the full document is finished
if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
return;
//I just destroy the WebBrowser, but you might want to do something
//with the newly navigated page
WebBrowser browser = sender as WebBrowser;
browser.Dispose();
myBrowsers.Remove(browser);
}
If someone clicks on a hyperlink inside a WebBrowser control, what method can I use to get the URL of that hyperlink and check whether it has an attribute that tells the browser to open the link in a new tab or window? This is a Windows Forms application.
You could obtain the element that currently has user input focus using the Document.ActiveElement property.
private void webBrowser1_NewWindow(object sender, CancelEventArgs e)
{
e.Cancel = true;
if (webBrowser1.Document != null)
{
HtmlElement currentElement = webBrowser1.Document.ActiveElement;
if (currentElement != null)
{
string targetPath = currentElement.GetAttribute("href");
//You can perform some logic here to determine if the targetPath conforms to your specification and if so...
MainForm newWindow = new MainForm();
newWindow.webBrowser1.Navigate(targetPath);
newWindow.Show();
//Otherwise
//webBrowser1.Navigate(targetPath);
}
}
}
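For the second half of the question, the link's target attribute can be read the same way as href, with currentElement.GetAttribute("target"). The sketch below shows one way to interpret the value. The names LinkTargets and RequestsNewWindow are hypothetical, and the check only recognizes "_blank". A custom target name can also open a new window when no frame of that name exists, which this sketch deliberately does not try to detect.

```csharp
using System;

static class LinkTargets
{
    // Returns true when the target attribute explicitly asks for a new window or tab.
    // Only "_blank" is recognized; named targets are an unhandled case (assumption).
    public static bool RequestsNewWindow(string target)
    {
        if (string.IsNullOrWhiteSpace(target)) return false;
        return string.Equals(target.Trim(), "_blank", StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside the NewWindow handler above this could be used as bool opensNew = LinkTargets.RequestsNewWindow(currentElement.GetAttribute("target"));.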
Does anybody know how to click on a link in the WebBrowser control in a WinForms application and then have that link open in a new tab inside my TabControl?
I've been searching for months, seen many tutorials/articles/code samples but it seems as though nobody has ever tried this in C# before.
Any advice/samples are greatly appreciated.
Thank you.
Based on your comments, I understand that you want to trap the "Open In New Window" action for the WebBrowser control, and override the default behavior to open in a new tab inside your application instead.
To accomplish this reliably, you need to get at the NewWindow2 event, which exposes ppDisp (a settable pointer to the WebBrowser control that should open the new window).
All of the other potential hacked-together solutions (such as obtaining the last link selected by the user before the NewWindow event) are not optimal and are bound to fail in corner cases.
Luckily, there is a (relatively) simple way of accomplishing this while still using the System.Windows.Forms.WebBrowser control as a base. All you need to do is extend the WebBrowser and intercept the NewWindow2 event while providing public access to the ActiveX Instance (for passing into ppDisp in new tabs). This has been done before, and Mauricio Rojas has an excellent example with a complete working class "ExtendedWebBrowser":
http://blogs.artinsoft.net/mrojas/archive/2008/09/18/newwindow2-events-in-the-c-webbrowsercontrol.aspx
Once you have the ExtendedWebBrowser class, all you need to do is set up handlers for NewWindow2 and point ppDisp to a browser in a new tab. Here's an example that I put together:
private void InitializeBrowserEvents(ExtendedWebBrowser SourceBrowser)
{
SourceBrowser.NewWindow2 += new EventHandler<NewWindow2EventArgs>(SourceBrowser_NewWindow2);
}
void SourceBrowser_NewWindow2(object sender, NewWindow2EventArgs e)
{
TabPage NewTabPage = new TabPage()
{
Text = "Loading..."
};
ExtendedWebBrowser NewTabBrowser = new ExtendedWebBrowser()
{
Parent = NewTabPage,
Dock = DockStyle.Fill,
Tag = NewTabPage
};
e.PPDisp = NewTabBrowser.Application;
InitializeBrowserEvents(NewTabBrowser);
Tabs.TabPages.Add(NewTabPage);
Tabs.SelectedTab = NewTabPage;
}
private void Form1_Load(object sender, EventArgs e)
{
InitializeBrowserEvents(InitialTabBrowser);
}
(Assumes a TabControl named "Tabs" and an initial tab containing a docked ExtendedWebBrowser child control named "InitialTabBrowser")
Don't forget to unregister the events when the tabs are closed!
private Uri _MyUrl;
System.Windows.Forms.WebBrowser browser = new System.Windows.Forms.WebBrowser();
browser.Navigating += new System.Windows.Forms.WebBrowserNavigatingEventHandler(browser_Navigating);
void browser_Navigating(object sender, System.Windows.Forms.WebBrowserNavigatingEventArgs e)
{
_MyUrl = e.Url;
e.Cancel = true; // cancel the navigation after recording the URL
}
The following code works, just follow the first reply for creating the ExtendedWebBrowser class.
I'm using this to open a new tab but it also works to open a new window using your browser and not IE.
Hope it helps.
private void Window_Loaded(object sender, RoutedEventArgs e)
{
if (current_tab_count == 10) return;
TabPage tabPage = new TabPage("Loading...");
tabpages.Add(tabPage);
tabControl1.TabPages.Add(tabPage);
current_tab_count++;
ExtendedWebBrowser browser = new ExtendedWebBrowser();
InitializeBrowserEvents(browser);
webpages.Add(browser);
browser.Parent = tabPage;
browser.Dock = DockStyle.Fill;
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
browser.DocumentTitleChanged += new EventHandler(Browser_DocumentTitleChanged);
browser.Navigated += Browser_Navigated;
browser.IsWebBrowserContextMenuEnabled = true;
}
public void InitializeBrowserEvents(ExtendedWebBrowser browser)
{
browser.NewWindow2 += new EventHandler<ExtendedWebBrowser.NewWindow2EventArgs>(Browser_NewWindow2);
}
void Browser_NewWindow2(object sender, ExtendedWebBrowser.NewWindow2EventArgs e)
{
if (current_tab_count == 10) return;
TabPage tabPage = new TabPage("Loading...");
tabpages.Add(tabPage);
tabControl1.TabPages.Add(tabPage);
current_tab_count++;
ExtendedWebBrowser browser = new ExtendedWebBrowser();
webpages.Add(browser);
browser.Parent = tabPage;
browser.Dock = DockStyle.Fill;
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
browser.DocumentTitleChanged += new EventHandler(Browser_DocumentTitleChanged);
browser.Navigated += Browser_Navigated;
tabControl1.SelectedTab = tabPage;
browser.Navigate(textBox.Text);
// Hand the new browser to the pop-up request so it opens in this tab.
e.PPDisp = browser.Application;
InitializeBrowserEvents(browser);
}
I did a bit of research on this topic and one does not need to do all the COM plumbing that is present in the ExtendedWebBrowser class, as that code is already present in the generated Interop.SHDocVw. As such, I was able to use the more natural construct below to subscribe to the NewWindow2 event. In Visual Studio I had to add a reference to "Microsoft Internet Controls".
using SHDocVw;
...
internal WebBrowserSsoHost(System.Windows.Forms.WebBrowser webBrowser,...)
{
ParameterHelper.ThrowOnNull(webBrowser, "webBrowser");
...
(webBrowser.ActiveXInstance as WebBrowser).NewWindow2 += OnNewWindow2;
}
private void OnNewWindow2(ref object ppDisp, ref bool Cancel)
{
MyTabPage tabPage = TabPageFactory.CreateNewTabPage();
tabPage.SetBrowserAsContent(out ppDisp);
}
Please read http://bit.ly/IDWm5A for more info. This is page #5 in the series, for a complete understanding I had to go back and read pages 3 -> 5.
You simply cancel the new window event and handle the navigation and tab stuff yourself.
Here is a fully working example. This assumes you have a tabcontrol and at least 1 tab page in place.
using System.ComponentModel;
using System.Windows.Forms;
namespace stackoverflow2
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
this.webBrowser1.NewWindow += WebBrowser1_NewWindow;
this.webBrowser1.Navigated += Wb_Navigated;
this.webBrowser1.DocumentText=
"<html>"+
"<head><title>Title</title></head>"+
"<body>"+
"<a href = 'http://www.google.com' target = 'abc' > test </a>"+
"</body>"+
"</html>";
}
private void WebBrowser1_NewWindow(object sender, CancelEventArgs e)
{
e.Cancel = true; //stop normal new window activity
//get the url you were trying to navigate to
var url= webBrowser1.Document.ActiveElement.GetAttribute("href");
//set up the tabs
TabPage tp = new TabPage();
var wb = new WebBrowser();
wb.Navigated += Wb_Navigated;
wb.Size = this.webBrowser1.Size;
tp.Controls.Add(wb);
wb.Navigate(url);
this.tabControl1.Controls.Add(tp);
tabControl1.SelectedTab = tp;
}
private void Wb_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
tabControl1.SelectedTab.Text = (sender as WebBrowser).DocumentTitle;
}
}
}
There is no tabbing in the WebBrowser control, so you need to handle the tabs yourself. Add a tab control above the WebBrowser control and create a new WebBrowser control whenever a new tab is opened. Catch and cancel the event raised when the user opens a new window, and open a new tab instead.