C# Wait for Web Page to Load Before Scraping

I am trying to make a Windows Forms app that logs into another web application, navigates through a few steps (clicks) until it reaches a specific page, and then scrapes some info (names and addresses).
The problem is that I am using DocumentCompletedEventHandler to make sure a page has loaded before I run the code that navigates to the next page (in order to reach the final web page).
When it fires, DocumentCompletedEventHandler fires multiple times.
When I reach the login page, it enters the credentials and then the message "Page loaded!" appears multiple times.
I press Enter and it appears again.
Then it navigates to the next page, and with that new page I have the same problem.
How can I make DocumentCompletedEventHandler fire only once instead of multiple times?
private void loadEvent(object sender, WebBrowserDocumentCompletedEventArgs e)
{
MessageBox.Show("Page loaded!");
}
private void loadLogin(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var inputElements = webBrowser1.Document.GetElementsByTagName("input");
foreach (HtmlElement i in inputElements)
{
if (i.GetAttribute("name").Equals("utilizator"))
{
i.InnerText = textBox1.Text;
}
if (i.GetAttribute("name").Equals("parola"))
{
i.Focus();
i.InnerText = textBox2.Text;
}
}
var buttonElements = webBrowser1.Document.GetElementsByTagName("input");
foreach (HtmlElement b in buttonElements)
{
if (b.GetAttribute("name").Equals("Intra"))
{
b.InvokeMember("Click");
}
}
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(loadEvent);
var inputElements1 = webBrowser1.Document.GetElementsByTagName("input");
foreach (HtmlElement i1 in inputElements1)
{
if (i1.GetAttribute("id").Equals("headerqstext"))
{
i1.Focus();
i1.InnerText = textBox3.Text;
}
}
var buttonElements1 = webBrowser1.Document.GetElementsByTagName("button");
foreach (HtmlElement b1 in buttonElements1)
{
if (b1.GetAttribute("title").Equals("Caută"))
{
b1.InvokeMember("Click");
}
}
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(loadEvent);
}
private void Button1_Click(object sender, EventArgs e)
{
webBrowser1.Navigate("http://10.1.104.23/ecris_cdms/");
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(loadLogin);
}
}
}

Try this :)
Uri last = null;
private void CompleteResponse(object sender, WebBrowserDocumentCompletedEventArgs e)
{
// Ignore repeated firings for the URL that was handled last.
if (last != null && last == e.Url)
return;
last = e.Url;
//your code here
}
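The same idea can be taken one step further. This is only a sketch, and it assumes that the extra firings come from frames, whose e.Url differs from webBrowser1.Url. CompletionGate and ShouldHandle are hypothetical names, not part of WinForms.

```csharp
using System;

// Hypothetical helper (not a WinForms type): decides whether a
// DocumentCompleted firing should be acted on.
public sealed class CompletionGate
{
    private Uri lastHandled;

    // eventUrl   : e.Url from WebBrowserDocumentCompletedEventArgs
    // browserUrl : webBrowser1.Url at the time the event fires
    public bool ShouldHandle(Uri eventUrl, Uri browserUrl)
    {
        if (eventUrl == null || browserUrl == null)
            return false;
        // Assumption: a frame completes with its own URL, not the page URL.
        if (eventUrl != browserUrl)
            return false;
        // Skip a repeat firing for the page that was handled last.
        if (eventUrl == lastHandled)
            return false;
        lastHandled = eventUrl;
        return true;
    }
}
```

Inside the handler this would be called as `if (!gate.ShouldHandle(e.Url, webBrowser1.Url)) return;`. Note that a page that legitimately reloads under the same URL (for example a failed login) would also be skipped, so the gate may need a reset before such a step.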

Related

How do I scrape web content async?

Here is what I have tried so far. It works, but the form freezes every time it updates.
private void timer1_Tick(object sender, EventArgs e)
{
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("https://www.roblox.com/catalog/527365852/Dominus-Praefectus");
foreach (var item in doc.DocumentNode.SelectNodes("//*[@id='item-details']/div[1]/div[1]/div[2]/div/span[2]"))
{
textBox1.Text = item.InnerText;
}
}
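The freeze most likely comes from the blocking download running on the UI thread inside the timer tick. A minimal sketch of moving that work to a thread-pool thread follows. BlockingFetch is a hypothetical stand-in for the web.Load call plus the XPath lookup, and AsyncScrapeSketch is an illustrative name only.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncScrapeSketch
{
    // Hypothetical stand-in for the blocking work (web.Load + SelectNodes).
    public static string BlockingFetch(string url)
    {
        Thread.Sleep(50); // simulates network latency
        return "price for " + url;
    }

    // Runs the blocking call on a thread-pool thread so the caller
    // can await it without blocking the UI thread.
    public static Task<string> FetchAsync(string url)
    {
        return Task.Run(() => BlockingFetch(url));
    }
}
```

With this in place the tick handler could be declared `private async void timer1_Tick(...)` and contain `textBox1.Text = await AsyncScrapeSketch.FetchAsync(url);`. The assignment to textBox1 then runs back on the UI thread after the await.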

DocumentCompleted not firing twice

I am scraping a web page. The page loads and calls a DocumentCompleted handler. Inside that handler, I invoke a JavaScript method to set a date and then invoke a click to get the new data (via POST). This all works correctly, except that the DocumentCompleted handler is only called once. The POST that goes back and "gets" a new page doesn't cause the handler to fire a second time.
I tried adding multiple handlers, removing the first and adding a second handler in the first handler. Didn't work. Also ran this as Administrator, didn't change anything.
Anyone have thoughts on how to proceed? I guess I can wait 60 seconds for it to load and then grab the text but that seems clunky.
public void FirstHandler(object sender,
WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = ((WebBrowser)sender);
string url = e.Url.ToString();
if (!(url.StartsWith("http://") || url.StartsWith("https://")))
{
// in AJAX
}
if (e.Url.AbsolutePath != webBrowser.Url.AbsolutePath)
{
// IFRAME Painting
}
else
{
// really really complete
wb.DocumentCompleted -= FirstHandler;
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(SecondHandler);
HtmlElement webDatePicker = wb.Document.GetElementById("ctl00_WebSplitter1_tmpl1_ContentPlaceHolder1_dtePickerBegin");
string szJava = string.Empty;
szJava = "a = $find(\"ctl00_WebSplitter1_tmpl1_ContentPlaceHolder1_dtePickerBegin\"); a.set_text(\"5/20/2017\");";
object a = wb.Document.InvokeScript("eval", new object[] { szJava });
if (webDatePicker != null)
webDatePicker.InvokeMember("submit");
HtmlElement button = wb.Document.GetElementById("ctl00$WebSplitter1$tmpl1$ContentPlaceHolder1$HeaderBTN1$btnRetrieve");
if (button != null)
{
button.InvokeMember("click");
}
}
}
public void SecondHandler(object sender,
WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = ((WebBrowser)sender);
string url = e.Url.ToString();
string d = string.Empty;
if (!(url.StartsWith("http://") || url.StartsWith("https://")))
{
// in AJAX
}
if (e.Url.AbsolutePath != webBrowser.Url.AbsolutePath)
{
// IFRAME Painting
}
else
{
d = wb.DocumentText;
System.IO.File.WriteAllText("Finally.htm", d);
wb.DocumentCompleted -= SecondHandler;
}
_fired = true;
}

WebBrowserDocumentCompletedEventHandler is repeating itself

I am trying to make a program that harvests data from a remote login site. I manage to log myself in, but when I try to navigate through 2 pages, my code makes the browser request and load p1, then p2, then p1, then p2, and so on.
I tried all the methods in this link: How to make WebBrowser wait till it loads fully?
It still gives me the same problem!
Here is my code:
webBrowser1.Document.GetElementById("user").InnerText = textBox1.Text.ToString();
webBrowser1.Document.GetElementById("pass").InnerText = textBox2.Text.ToString();
webBrowser1.Document.GetElementById("login").InvokeMember("click");
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(LookNew);
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(LookFind);
void LookNew(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (e.Url != webBrowser1.Url)
return;
else
FindLink(webBrowser1.DocumentText, "New").InvokeMember("Click");
}
void LookFind(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (e.Url != webBrowser1.Url)
return;
else
FindLink(webBrowser1.DocumentText, "find").InvokeMember("Click");
}

Winforms Webbrowser control URL Validation

I am trying to validate a WinForms WebBrowser control's URL when a button is clicked. I would like to check whether the browser's current URL matches a certain URL. When I try to run this code, the program freezes.
private void button_Click(object sender, EventArgs e)
{
// Check to see if web browser is at URL
if (webBrowser1.Url.ToString != "www.google.com" || webBrowser1.Url.ToString == null)
{
// Goto webpage
webBrowser1.Url = new Uri("www.google.ca");
}
else {
webBrowser1.Document.GetElementById("first").InnerText = "blah";
webBrowser1.Document.GetElementById("second").InnerText = "blah";
}
}
Here you go.
private void button1_Click(object sender, EventArgs e)
{
webBrowser1.Url = new Uri("https://www.google.ca");
// Check to see if web browser is at URL
if (webBrowser1.Url != null)
{
if (webBrowser1.Url.ToString() != "https://www.google.com" || webBrowser1.Url.ToString() == null)
{
// Goto webpage
webBrowser1.Url = new Uri("https://www.google.ca");
}
else
{
webBrowser1.Document.GetElementById("first").InnerText = "blah";
webBrowser1.Document.GetElementById("second").InnerText = "blah";
}
}
}
1) Include the scheme (for example https://) in the URL.
2) Call ToString() as a method, with parentheses.
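Both points can be checked without the browser control. The sketch below compares Uri values instead of strings, which avoids the trailing slash that Uri.ToString() adds to a bare host (new Uri("https://www.google.com").ToString() yields "https://www.google.com/"). UrlCheckSketch and IsAt are hypothetical names used only for illustration.

```csharp
using System;

public static class UrlCheckSketch
{
    // Returns true when 'current' is the same absolute URL as 'expected'.
    // A string without a scheme (e.g. "www.google.com") is not a valid
    // absolute URI, so TryCreate fails and the method returns false
    // instead of throwing UriFormatException like new Uri(...) would.
    public static bool IsAt(Uri current, string expected)
    {
        Uri target;
        if (current == null ||
            !Uri.TryCreate(expected, UriKind.Absolute, out target))
        {
            return false;
        }
        return current == target;
    }
}
```

In the click handler this could be used as `if (!UrlCheckSketch.IsAt(webBrowser1.Url, "https://www.google.com")) { ... }`, which also covers the case where webBrowser1.Url is still null.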

C# stopping an infinite foreach loop

This foreach loop checks a web page for images and downloads any it finds. How do I stop it? When I press the button, the loop continues forever.
private void button1_Click(object sender, EventArgs e)
{
WebBrowser browser = new WebBrowser();
browser.DocumentCompleted +=browser_DocumentCompleted;
browser.Navigate(textBox1.Text);
}
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser browser = sender as WebBrowser;
HtmlElementCollection imgCollection = browser.Document.GetElementsByTagName("img");
WebClient webClient = new WebClient();
int count = 0; //if available
int maximumCount = imgCollection.Count;
try
{
foreach (HtmlElement img in imgCollection)
{
string url = img.GetAttribute("src");
webClient.DownloadFile(url, url.Substring(url.LastIndexOf('/')));
count++;
if(count >= maximumCount)
break;
}
}
catch { MessageBox.Show("errr"); }
}
Use the break keyword to break out of a loop.
You do not have an infinite loop; an exception is being thrown because of how you write the file to disk.
private void button1_Click(object sender, EventArgs e)
{
WebBrowser browser = new WebBrowser();
browser.DocumentCompleted += browser_DocumentCompleted;
browser.Navigate("www.google.ca");
}
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser browser = sender as WebBrowser;
HtmlElementCollection imgCollection = browser.Document.GetElementsByTagName("img");
WebClient webClient = new WebClient();
foreach (HtmlElement img in imgCollection)
{
string url = img.GetAttribute("src");
string name = System.IO.Path.GetFileName(url);
string path = System.IO.Path.Combine(Environment.CurrentDirectory, name);
webClient.DownloadFile(url, path);
}
}
That code works fine in my environment. The issue you seemed to be having was that you were setting the DownloadFile file path to a value like '\myimage.png', and the WebClient could not find that path, so it threw an exception.
The above code saves each image into the current directory under its original file name.
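The difference between the two ways of deriving the file name can be seen in isolation. This is a sketch under the assumption that the src attribute holds an absolute URL. ImageNameSketch and FileNameFromUrl are hypothetical names, and going through Uri.AbsolutePath is an extra step (not in the code above) that drops any query string.

```csharp
using System;
using System.IO;

public static class ImageNameSketch
{
    // Extracts a plain file name from an absolute image URL.
    public static string FileNameFromUrl(string url)
    {
        // AbsolutePath excludes the query string and fragment,
        // e.g. "/img/myimage.png" for ".../img/myimage.png?x=1".
        var uri = new Uri(url, UriKind.Absolute);
        return Path.GetFileName(uri.AbsolutePath);
    }

    // The approach from the question: keeps the leading slash,
    // so the result is a rooted path rather than a file name.
    public static string SubstringName(string url)
    {
        return url.Substring(url.LastIndexOf('/'));
    }
}
```

For "http://example.com/img/myimage.png" the first method returns "myimage.png", while the second returns "/myimage.png".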
Maybe the browser.DocumentCompleted event is causing the problem: if the page refreshes, the event fires again. You could try unsubscribing from the event.
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser browser = sender as WebBrowser;
browser.DocumentCompleted -= browser_DocumentCompleted;
HtmlElementCollection imgCollection = browser.Document.GetElementsByTagName("img");
WebClient webClient = new WebClient();
foreach (HtmlElement img in imgCollection)
{
string url = img.GetAttribute("src");
string name = System.IO.Path.GetFileName(url);
string path = System.IO.Path.Combine(Environment.CurrentDirectory, name);
webClient.DownloadFile(url, path);
}
}
