I am working on a web crawler. I am using the WebBrowser control for this purpose. I have the list of URLs stored in a database and I want to traverse all those URLs one by one and parse the HTML.
I used the following logic:
foreach (string href in hrefs)
{
    webBrowser1.Url = new Uri(href);
    webBrowser1.Navigate(href);
}
I want to do some work in the webBrowser1_DocumentCompleted event once the page has loaded completely. But webBrowser1_DocumentCompleted never gets control while the loop is running; it only gets control after the last URL in hrefs has been navigated and execution exits the loop.
What's the best way to handle such a problem?
Store the list somewhere in your state, as well as the index of where you've got to. Then in the DocumentCompleted event, parse the HTML and then navigate to the next page.
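For example, a minimal sketch of that approach (assuming a WinForms WebBrowser and the URL list loaded into a List<string>; the names here are illustrative):
private List<string> urls; // loaded from the database
private int index = 0;

private void StartCrawl()
{
    webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
    webBrowser1.Navigate(urls[index]);
}

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // parse webBrowser1.Document here...
    index++;
    if (index < urls.Count)
    {
        webBrowser1.Navigate(urls[index]);
    }
}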
(Personally I wouldn't use the WebBrowser control for web crawling... I know it means it'll handle the JavaScript for you, but it'll be a lot harder to parallelize nicely than using multiple WebRequest or WebClient objects.)
First of all, you are setting a new URL on the same WebBrowser control before it has loaded anything, so you will simply see the last URL in your browser. The browser certainly takes some time to load a URL, so I guess each navigation is cancelled well before DocumentCompleted can fire.
One way to do this simultaneously:
Use a tab control, open a new tab item for every URL, and give each tab item its own WebBrowser control whose URL you can set.
foreach (string href in hrefs)
{
    TabPage item = new TabPage(href);
    WebBrowser wb = new WebBrowser { Dock = DockStyle.Fill };
    wb.DocumentCompleted += wb_DocumentCompleted;
    wb.Url = new Uri(href);
    item.Controls.Add(wb);
    tabControl1.TabPages.Add(item);
}

private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // do your stuff...
}
In order to improve the above method, you should look at how to create multiple tab items on different UI threads; it's a pretty long topic to discuss here, but it is still possible.
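As a rough illustration of that idea - a sketch only, assuming each WebBrowser gets its own STA thread with its own message loop (the control will not raise events without one):
foreach (string href in hrefs)
{
    string url = href; // capture for the closure
    Thread t = new Thread(() =>
    {
        WebBrowser wb = new WebBrowser();
        wb.DocumentCompleted += (s, e) =>
        {
            // do your stuff, then shut down this thread's message loop
            Application.ExitThread();
        };
        wb.Navigate(url);
        Application.Run(); // pump messages so DocumentCompleted can fire
    });
    t.SetApartmentState(ApartmentState.STA); // WebBrowser requires STA
    t.Start();
}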
Another method is to use a queue:
private static Queue<string> queue = new Queue<string>();

// fill the queue, then kick off the first navigation
foreach (string href in hrefs)
{
    queue.Enqueue(href);
}
webBrowser1.Navigate(queue.Dequeue());

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // parse the finished page here, then move to the next one
    if (queue.Count > 0)
    {
        webBrowser1.Navigate(queue.Dequeue());
    }
}
I have a Page (showDocuments) that shows documents and folders (like Dropbox or Google Drive). When a user clicks on a folder, I try to navigate to a new instance of the showDocuments Page in order to show the content of the clicked folder. However, when I render the new information, both the new documents and the previous ones appear.
I could do it with just one page, cleaning it each time, but I need different pages in order to go back to the parent folders using frame.GoBack(), since that is much faster than using frame.Navigate(...) and computing and rendering everything again.
I'm not using an MVVM model; I just have a page, and I decide which objects to show in the xaml.cs file.
Should I use views instead of pages?
Thanks for your time.
Try handling the back button operation to catch the parameters when navigating back.
You could have global vars for the parent folder: the value (parentFolderVar) you pass to the showDocuments Page, and isReturn, which is set in the OnNavigatedTo method:
App.parentFolderVar = someValue;
So, when you handle the back operation in App.xaml.cs:
private void OnBackRequested(object sender, BackRequestedEventArgs e)
{
    if (rootFrame.SourcePageType == typeof(showDocuments))
    {
        App.isReturn = true;
    }
    e.Handled = true;
    Pause();
}
And in showDocuments navigation:
protected override void OnNavigatedTo(NavigationEventArgs e)
{
    if (App.isReturn)
    {
        // You know the parent folder from App.parentFolderVar
        // Make operations
        App.parentFolderVar = updatedParentFolderValue;
        App.isReturn = false;
    }
}
I'm writing an application that shows notifications through balloon tips over a NotifyIcon. There are two kinds of notifications I want to display - a normal balloon tip and a clickable balloon tip. I want clickable balloon tips to open some URL in a web browser. The problem is that handlers stack on the balloon tip's click event.
I'm not sure if this explanation says anything, so here's an example:
code:
NotifyIcon ni = new NotifyIcon();

void showClickableNotification(string title, string content, string url)
{
    ni.BalloonTipClicked += new EventHandler((sender, e) => ni_BalloonTipClicked(sender, e, url));
    ni.ShowBalloonTip(1, title, content, ToolTipIcon.Info);
}

void ni_BalloonTipClicked(object sender, EventArgs e, string url)
{
    Process.Start(url);
}
Every use of showClickableNotification assigns one more URL to the BalloonTipClicked event.
I want to clear the event after the notification hides, to prevent opening multiple tabs unassociated with the current notification.
Also, when a normal notification is shown after a clickable one, its click opens all the stacked URLs as well.
I tried to assign an empty function with ni.BalloonTipClicked += emptyFunction, but the += operator just adds another handler to the pool instead of overwriting it. -= does not work since I'm adding a new handler every time. I guess I could use some global variable that holds the current URL and avoid assigning a new handler every time (-= would work then), but that looks like a cheap workaround. Is there any (correct?) way to do it?
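One way that avoids the global-variable workaround is to keep a reference to the current handler in a field, so -= can detach it before the next notification attaches a new one. A minimal sketch (currentBalloonHandler is a name introduced here for illustration):
private EventHandler currentBalloonHandler;

void showClickableNotification(string title, string content, string url)
{
    if (currentBalloonHandler != null)
    {
        ni.BalloonTipClicked -= currentBalloonHandler; // detach the previous notification's handler
    }
    currentBalloonHandler = (sender, e) => Process.Start(url);
    ni.BalloonTipClicked += currentBalloonHandler;
    ni.ShowBalloonTip(1, title, content, ToolTipIcon.Info);
}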
I have a problem: I am trying to work with a custom web browser for a specific website.
The problem I am having with GeckoFX is that sometimes I need to wait for DocumentCompleted to continue with the execution of particular methods.
I don't want to put all my code into one large DocumentCompleted event, as that seems silly and wrong.
I got the code to work by using Application.DoEvents() as follows, but I read that this is not the right way to go, and that the web browser is best run asynchronously.
private void AddNewTab(string tmsAddress) // add a new browser to my form
{
    TabPage tab = new TabPage();
    browserTabControl.TabPages.Insert(browserTabControl.TabCount - 1, tab);
    GeckoWebBrowser browser = new GeckoWebBrowser();
    tab.Controls.Add(browser);
    browser.Dock = DockStyle.Fill;
    browser.DocumentCompleted += new EventHandler<Gecko.Events.GeckoDocumentCompletedEventArgs>(browser_DocumentCompleted);
    browser.Navigate(tmsAddress);
}
Navigation on the page is manual, i.e. users work with the page normally, but from time to time they can use a shortcut button to get somewhere.
private void someButton_MouseUp(object sender, MouseEventArgs e)
{
    GeckoWebBrowser browser = getCurrentBrowser();
    // get some data from the page here
    OpenPageInContentFrame(address + parameter1 + parameter2); // I need to wait for this page to load and then do HighlightItemRow()
    while (!eventHandled) // the DocumentCompleted event sets this flag to true
    {
        Application.DoEvents();
    }
    HighlightItemRow(browser, parameter1);
}
I wanted to go with a ManualResetEvent instead of while() and Application.DoEvents(), but using manResEvent.WaitOne() causes the whole application to freeze, including the navigation, so the page never actually loads. I think this must be because it's all on a single thread, but I don't know how to make it work - I have never used anything async.
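One async approach - a sketch only, reusing the event type from the snippet above and assuming C# 5+ - is to wrap DocumentCompleted in a TaskCompletionSource so the handler can await the page load without blocking the message loop:
private Task WhenDocumentCompletedAsync(GeckoWebBrowser browser)
{
    var tcs = new TaskCompletionSource<bool>();
    EventHandler<Gecko.Events.GeckoDocumentCompletedEventArgs> handler = null;
    handler = (s, e) =>
    {
        browser.DocumentCompleted -= handler; // one-shot: detach after the first completion
        tcs.TrySetResult(true);
    };
    browser.DocumentCompleted += handler;
    return tcs.Task;
}

private async void someButton_MouseUp(object sender, MouseEventArgs e)
{
    GeckoWebBrowser browser = getCurrentBrowser();
    OpenPageInContentFrame(address + parameter1 + parameter2);
    await WhenDocumentCompletedAsync(browser); // UI thread keeps pumping while the page loads
    HighlightItemRow(browser, parameter1);
}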
I'm working on a little automation project at the minute and have hit a brick wall. Firstly, I'd like to state that the only reason I'm using WebBrowser for this component of the project is that the site being scraped has obfuscated code and requires a JavaScript-enabled browser to display it; I've got another app using WebClient which works fine for other test sites, but unfortunately it can't be used on this target.
My problem arises when trying to programmatically configure the WebBrowser control.
The first problem I've discovered is that if I manually set the URL in the control's properties, it loads page 1 and the scraper works for that page. However, when I cleared the URL from the properties and set it manually in the Form1_Load method, it returns about:blank as the URL, despite the fact that I've verified the automated parameter being pulled in is fine and should be getting set without issue.
Here's what I'm using:
Note:
collection refers to an XML-serialized array of definitions
definition refers to the active definition for this target, the idea being to configure this for multiple targets
private void Form1_Load(object sender, EventArgs e)
{
PopulateScraperCollection();
webBrowser1.Url = new Uri(collection.ElementAt(b).AccessUrl);
NavigateToUrl(collection.ElementAt(b).AccessUrl);
}
public void PopulateScraperCollection()
{
string[] xmlFiles = Directory.GetFiles(@"E:\DealerConfigs\");
foreach (string xmlFile in xmlFiles)
{
collection.Add(ScraperDefinition.Deserialize(xmlFile));
}
}
public void NavigateToUrl(string url)
{
Console.WriteLine(collection.ElementAt(b).AccessUrl);
webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
webBrowser1.Navigate(webBrowser1.Url);
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = sender as WebBrowser;
Process(collection.ElementAt(b), 0);
b++;
}
Consequently this causes another issue when using DocumentCompleted to navigate to the paginated results. On the first page load I use a DocumentCompleted event to trigger the link extraction. When I attempt to set the URL for the next page - which is being picked out fine using XPath, and again verified - stepping over with F10 in debug indicates it hasn't been changed, and the DocumentCompleted event isn't being triggered.
My code to change the URL etc. is:
string nextPageUrl = string.Format(definition.NextPageUrlFormat, WebUtility.HtmlDecode(relativeUrl));
webBrowser1.Url = new Uri(nextPageUrl);
webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
webBrowser1.Navigate(webBrowser1.Url);
Any help, as always, is greatly appreciated. This is proving to be a nightmare to automate, not only because WebBrowser is so much slower than WebClient, but also because it's proving a pain to alter on the fly.
Regards
Barry
You should never really set webBrowser1.Url; you should just use the Navigate method, so:
private void Form1_Load(object sender, EventArgs e)
{
PopulateScraperCollection();
NavigateToUrl(collection.ElementAt(b).AccessUrl);
}
My guess as to why it isn't navigating is that collection.ElementAt(b).AccessUrl is null or about:blank.
I'm not really sure how else to answer your question, but the Navigate method should change it.
NB: the WebBrowser control is proper crap; you could try another browser control like Awesomium or GeckoFX.
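To make the rest of the posted code consistent with that advice, NavigateToUrl should also use the string it was given instead of reading back webBrowser1.Url - a possible correction, reusing the names from the question:
public void NavigateToUrl(string url)
{
    Console.WriteLine(url);
    // subscribe to DocumentCompleted once (e.g. in the constructor) rather than
    // on every call, otherwise the handler fires multiple times per page
    webBrowser1.Navigate(url);
}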
I understand the "why" controls vanish on postback, and up until now I have had great success just creating what I need to do dynamically in page init. However this fell apart for me when I had to add some controls to a asp.net page based on the value of an existing dropdownlist.
So my question is simple, and I don't seem to be able to find a good working code example. I need to add some controls to the page based on the value of a dropdownlist. Then persist these added controls across other postbacks (session is fine).
Here is a snippet to work off of:
protected void Page_Init(System.Object sender, System.EventArgs e)
{
RebuildPlaceholder();
}
protected void ddlGroup_Change(System.Object sender, System.EventArgs e)
{
ExampleDataContext ctxExample = new ExampleDataContext();
var aryExample = (from rslt in ctxExample.mvExample
where rslt.label.ToLower() == ddlGroup.SelectedValue
select rslt);
foreach (var objExample in aryExample)
{
TextBox txtCreated = new TextBox();
txtCreated.ID = "ddl" + objExample.ID;
plcExample.Controls.Add(txtCreated);
}
StorePlaceholder();
}
private void StorePlaceholder()
{
//Need code to store all controls in a placeholder.
}
private void RebuildPlaceholder()
{
//Need code to rebuild all of the controls from Session.
}
I found this related article: Dynamically Adding Controls, but I am struggling with the syntax for serializing all the controls, etc.
This can be limited to the child controls of a single placeholder that already exists on a page, just storing/restoring that placeholder's controls is what I am after.
Any version of ASP.NET is fine, if there is something that made this easy in 4.0 great.
Instead, try caching the dropdown list selection. Then, during the next page load, use the cache to set the selected value and load the new controls based on that selection.
Session["CacheKey"] = DropDownList1.SelectedValue;
Then to access the Session Cache:
var value = Session["CacheKey"];
Take a look at this Microsoft article on ASP.NET Caching.
I've found that DropDownList.SelectedValue is unavailable during Page.Init. But you can still get access to the value with Request[ddl.UniqueID] and then create and add all your dynamic controls.
It feels kind of like a hack, but the ASP.NET page lifecycle doesn't allow many alternatives, particularly if your controls are not serializable.
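A minimal sketch of that idea, reusing the names from the question (GetControlIdsFor is a hypothetical helper that maps the selection back to the control IDs created earlier, e.g. from Session):
protected void Page_Init(object sender, EventArgs e)
{
    // ddlGroup.SelectedValue isn't populated yet during Init,
    // but the raw posted value is available on the request
    string selected = Request[ddlGroup.UniqueID];
    if (!string.IsNullOrEmpty(selected))
    {
        // recreate the dynamic controls with the same IDs so view state
        // and posted values re-attach to them on this postback
        foreach (string id in GetControlIdsFor(selected)) // hypothetical lookup
        {
            TextBox txtCreated = new TextBox();
            txtCreated.ID = id;
            plcExample.Controls.Add(txtCreated);
        }
    }
}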