Navigate URLs using WebBrowser DocumentCompleted - c#

This is the scenario:
1-Navigate to the admin page.
2-Enter the username and password.
3-Navigate to a new page.
4-Fill in some textareas etc. and post.
5-Repeat steps 3 and 4 until the loop ends.
The code below completes steps 1 and 2 successfully, but it reaches step 3 before the new page has loaded and throws "Object reference not set to an instance of an object" on this line: doc.GetElementById("title").SetAttribute("value", "check1");
I have been trying to get this working for the last 3 days but still can't get past step 3. Any help will be appreciated.
bool AdminPagework = false;
bool postnavigationdone = false;
public Form1()
{
    InitializeComponent();
    webBrowser1.DocumentCompleted +=
        new WebBrowserDocumentCompletedEventHandler(AdminPageCredentials);
    webBrowser1.Navigate("www.website.com/admin");
}
private void AdminPageCredentials(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (AdminPagework == false && (webBrowser1.ReadyState == WebBrowserReadyState.Complete))
    {
        HtmlDocument doc = webBrowser1.Document;
        doc.GetElementById("login").SetAttribute("value", "ADMIN");
        doc.GetElementById("pass").SetAttribute("value", "123");
        doc.GetElementById("submit").InvokeMember("click");
        AdminPagework = true;
        webBrowser1.DocumentCompleted +=
            new WebBrowserDocumentCompletedEventHandler(RedirectToPostPage);
        webBrowser1.Navigate("http://www.website.com/admin/post.php");
    }
}
public void RedirectToPostPage(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if ((postnavigationdone == false) && (webBrowser1.ReadyState == WebBrowserReadyState.Complete))
    {
        HtmlDocument doc = webBrowser1.Document;
        doc.GetElementById("title").SetAttribute("value", "check1");
        doc.GetElementById("content").SetAttribute("value", textBox2.Text);
        doc.GetElementById("post-format-video").InvokeMember("click");
        doc.GetElementById("in-category-64").InvokeMember("click");
        webBrowser1.Document.GetElementById("mm").SetAttribute("value", "01");
        webBrowser1.Document.GetElementById("jj").SetAttribute("value", "01");
        webBrowser1.Document.GetElementById("aa").SetAttribute("value", "2013");
        webBrowser1.Document.GetElementById("hh").SetAttribute("value", "01");
        webBrowser1.Document.GetElementById("mm").SetAttribute("value", "01");
        doc.GetElementById("publish").InvokeMember("click");
        postnavigationdone = true;
    }
}

var titleElement = doc.GetElementById("title");
titleElement.SetAttribute("value", "check1");
Try that and check whether titleElement is actually found (not null); the most likely reason for the failure is that the loaded document contains no element with the id "title".
I like using the ScrapySharp framework (you'll find it on NuGet) for web automation. Note that doc here is an HtmlAgilityPack document, not the WebBrowser's HtmlDocument:
var titleNodes = doc.DocumentNode.CssSelect("div#title").ToList();
foreach (var titleNode in titleNodes)
{
    titleNode.SetAttributeValue("value", "check1");
}
By the way, why do you want to change this attribute anyway? Just curious...
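One possible way to keep step 3 from running before the post page has loaded is to subscribe a single DocumentCompleted handler and track the current step explicitly, instead of stacking a second handler on top of the first. The following is only a sketch under assumptions: the class name AdminPoster, the Step enum and the SetValue helper are hypothetical, the element ids are taken from the question, and it assumes that submitting the login form causes a page load of its own.

```csharp
using System;
using System.Windows.Forms;

public class AdminPoster : Form
{
    // Hypothetical step tracker; the names are illustrative only.
    private enum Step { Login, OpenPostPage, FillPost, Done }

    private readonly WebBrowser browser = new WebBrowser { Dock = DockStyle.Fill };
    private Step step = Step.Login;

    public AdminPoster()
    {
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted; // subscribed exactly once
        browser.Navigate("http://www.website.com/admin");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Frames raise DocumentCompleted too; react only to the top-level page.
        if (browser.Url == null || e.Url != browser.Url) return;

        HtmlDocument doc = browser.Document;
        switch (step)
        {
            case Step.Login:
                if (!SetValue(doc, "login", "ADMIN") || !SetValue(doc, "pass", "123")) return;
                step = Step.OpenPostPage;
                doc.GetElementById("submit")?.InvokeMember("click");
                break;

            case Step.OpenPostPage:
                // Assumption: this event belongs to the page shown after logging in.
                step = Step.FillPost;
                browser.Navigate("http://www.website.com/admin/post.php");
                break;

            case Step.FillPost:
                // If "title" is missing, this is not the post page yet; wait for the next event.
                if (!SetValue(doc, "title", "check1")) return;
                step = Step.Done;
                doc.GetElementById("publish")?.InvokeMember("click");
                break;
        }
    }

    // Returns false instead of throwing when the element is not in the document.
    private static bool SetValue(HtmlDocument doc, string id, string value)
    {
        HtmlElement element = doc?.GetElementById(id);
        if (element == null) return false;
        element.SetAttribute("value", value);
        return true;
    }
}
```

To repeat steps 3 and 4 in a loop, the FillPost case could set the step back to OpenPostPage instead of Done until the list of posts is exhausted.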

Related

C# Wait for Web Page to Load Before Scraping

I am trying to make a Windows Forms app that logs in to another web application, navigates through a few steps (clicks) until it reaches a specific page, and then scrapes some info (names and addresses).
I am using the DocumentCompleted event so that each page has loaded before I run the code that navigates to the next page (in order to reach the final web page).
The problem is that the DocumentCompleted handler fires multiple times.
When I reach the login page, it enters the credentials and then the message "Page loaded!" appears multiple times.
I press Enter, and it appears again.
Then it navigates to the next page, and I have the same problem with that new page.
How can I make the DocumentCompleted handler fire only once instead of multiple times?
private void loadEvent(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    MessageBox.Show("Page loaded!");
}
private void loadLogin(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var inputElements = webBrowser1.Document.GetElementsByTagName("input");
    foreach (HtmlElement i in inputElements)
    {
        if (i.GetAttribute("name").Equals("utilizator"))
        {
            i.InnerText = textBox1.Text;
        }
        if (i.GetAttribute("name").Equals("parola"))
        {
            i.Focus();
            i.InnerText = textBox2.Text;
        }
    }
    var buttonElements = webBrowser1.Document.GetElementsByTagName("input");
    foreach (HtmlElement b in buttonElements)
    {
        if (b.GetAttribute("name").Equals("Intra"))
        {
            b.InvokeMember("Click");
        }
    }
    webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(loadEvent);
    var inputElements1 = webBrowser1.Document.GetElementsByTagName("input");
    foreach (HtmlElement i1 in inputElements1)
    {
        if (i1.GetAttribute("id").Equals("headerqstext"))
        {
            i1.Focus();
            i1.InnerText = textBox3.Text;
        }
    }
    var buttonElements1 = webBrowser1.Document.GetElementsByTagName("button");
    foreach (HtmlElement b1 in buttonElements1)
    {
        if (b1.GetAttribute("title").Equals("Caută"))
        {
            b1.InvokeMember("Click");
        }
    }
    webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(loadEvent);
}
private void Button1_Click(object sender, EventArgs e)
{
    webBrowser1.Navigate("http://10.1.104.23/ecris_cdms/");
    webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(loadLogin);
}
}
}
Try this :)
Uri last = null;
private void CompleteResponse(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Ignore repeated notifications for the URL that was handled last.
    if (last != null && last == e.Url)
        return;
    last = e.Url;
    // your code here
}
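Another common way to get one callback per navigation is to ignore the events raised by frames (their e.Url differs from the browser's top-level Url) and to unsubscribe the handler as soon as it has run. The extension method below is a sketch, not part of the WebBrowser API: the names WebBrowserExtensions and NavigateOnce are hypothetical.

```csharp
using System;
using System.Windows.Forms;

public static class WebBrowserExtensions
{
    // Hypothetical helper: navigates to url and calls onLoaded a single time,
    // when the top-level document (not a frame) has finished loading.
    public static void NavigateOnce(this WebBrowser browser, string url, Action<HtmlDocument> onLoaded)
    {
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // Frame documents report their own URL; skip them.
            if (browser.Url == null || e.Url != browser.Url) return;

            // Unsubscribe first so a later page load cannot trigger this callback again.
            browser.DocumentCompleted -= handler;
            onLoaded(browser.Document);
        };
        browser.DocumentCompleted += handler;
        browser.Navigate(url);
    }
}
```

With this in place, the button handler from the question could call webBrowser1.NavigateOnce("http://10.1.104.23/ecris_cdms/", doc => MessageBox.Show("Page loaded!")); and the message should appear once for that navigation.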

Getting <img src=""> attribute from AliExpress error

Today I was trying to load images from AliExpress products.
I was using this code: string NowImage = HJ.GetElementsByTagName("img")[0].GetAttribute("src");
It worked for the first 8 images but did not load the rest; for those it returned an empty string.
I checked the HTML of the AliExpress page and, as far as I can tell, it should work.
Can someone help me? Thanks for reading.
public bool Search()
{
    WB.DocumentCompleted += WB_SearchCompleted;
    WB.Navigate(URL);
    while (WB.ReadyState != WebBrowserReadyState.Complete)
        Application.DoEvents();
    return true;
}
private void WB_SearchCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    HtmlElementCollection HEC = WB.Document.GetElementsByTagName("li");
    foreach (HtmlElement HJ in HEC)
    {
        if (HJ.GetAttribute("qrdata") == "")
            continue;
        NowImage = HJ.GetElementsByTagName("img")[0].GetAttribute("src");
        // For the first 8 images this loaded perfectly; after that it
        // returned an empty string.
    }
}

Winforms Webbrowser control URL Validation

I am trying to validate the URL of a WinForms WebBrowser control when a button is clicked. I would like to check whether the browser's current URL matches a certain URL. When I try to run this code, the program freezes:
private void button_Click(object sender, EventArgs e)
{
    // Check to see if web browser is at URL
    if (webBrowser1.Url.ToString != "www.google.com" || webBrowser1.Url.ToString == null)
    {
        // Goto webpage
        webBrowser1.Url = new Uri("www.google.ca");
    }
    else
    {
        webBrowser1.Document.GetElementById("first").InnerText = "blah";
        webBrowser1.Document.GetElementById("second").InnerText = "blah";
    }
}
Here you go.
private void button1_Click(object sender, EventArgs e)
{
    webBrowser1.Url = new Uri("https://www.google.ca");
    // Check to see if web browser is at URL
    if (webBrowser1.Url != null)
    {
        // Uri.ToString() returns a site's root URL with a trailing slash.
        if (webBrowser1.Url.ToString() != "https://www.google.com/")
        {
            // Go to webpage (the scheme is required here as well)
            webBrowser1.Url = new Uri("https://www.google.ca");
        }
        else
        {
            webBrowser1.Document.GetElementById("first").InnerText = "blah";
            webBrowser1.Document.GetElementById("second").InnerText = "blah";
        }
    }
}
1) Always include the scheme (for example https://) in the URL; new Uri("www.google.ca") throws a UriFormatException.
2) Call ToString() as a method, with parentheses.
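Comparing full URL strings is fragile (trailing slashes, query strings, redirects), so it may be more robust to compare only the host. The helper below is a sketch; UrlCheck, IsAtHost and ToAbsoluteUri are hypothetical names and not part of WinForms.

```csharp
using System;

public static class UrlCheck
{
    // Hypothetical helper: true when current is an absolute URL whose host equals expectedHost.
    public static bool IsAtHost(Uri current, string expectedHost)
    {
        return current != null
            && current.IsAbsoluteUri
            && string.Equals(current.Host, expectedHost, StringComparison.OrdinalIgnoreCase);
    }

    // Hypothetical helper: builds an absolute Uri, adding https:// when no scheme is given.
    public static Uri ToAbsoluteUri(string text)
    {
        if (!text.Contains("://"))
        {
            text = "https://" + text;
        }
        return new Uri(text);
    }
}
```

In the button handler this could be used as: if (!UrlCheck.IsAtHost(webBrowser1.Url, "www.google.com")) webBrowser1.Navigate(UrlCheck.ToAbsoluteUri("www.google.com")); The null check inside IsAtHost also covers the case where the browser has not loaded any page yet.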

Trying to log in to a website through a C# program

I'm new to C#. I looked for this topic in other questions, but they didn't fit my case. I am trying to log in to my school's servers using a C# program that I'm writing. I know the HTML of the page, so I am using the C# WebBrowser control to navigate to it; then I just want to write the name and password into the input boxes, and this is where I am stuck. Can you please give me any advice?
If you want to look at the page: https://suis.sabanciuniv.edu/prod/twbkwbis.P_SabanciLogin
Thanks for your advice.
Here is how I used the code (Edit: I added an event handler, but this is my first time using one, and it gives me "Object reference not set to an instance of an object"):
private void buttonGo_Click(object sender, EventArgs e)
{
    try
    {
        string input = "https://suis.sabanciuniv.edu/prod/twbkwbis.P_SabanciLogin";
        webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(WebBrowser_DocumentCompleted);
        webBrowser1.Navigate(input);
        HtmlDocument doc = webBrowser1.Document;
        HtmlElement userName = doc.GetElementById("UserID");
        HtmlElement pass = doc.GetElementById("PIN");
        HtmlElement submit = doc.GetElementById("Login");
        userName.SetAttribute("value", textID.Text);
        pass.SetAttribute("value", textPASS.Text);
        submit.InvokeMember("Click");
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
public void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var webBrowser = sender as WebBrowser;
    webBrowser.DocumentCompleted -= WebBrowser_DocumentCompleted;
    MessageBox.Show(webBrowser.Url.ToString());
}
}
}
I finally solved the problem. I cheated a little, but it works. Here is the working code:
private void buttonGo_Click(object sender, EventArgs e)
{
    try
    {
        string input = "https://suis.sabanciuniv.edu/prod/twbkwbis.P_SabanciLogin";
        webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(WebBrowser_DocumentCompleted);
        webBrowser1.Navigate(input);
        HtmlDocument doc = webBrowser1.Document;
        // The commented-out lines did not work because the IDs of the elements were hidden;
        // they are kept here only to show what did not work.
        //HtmlElement userName = doc.GetElementById("UserID");
        //HtmlElement pass = doc.GetElementById("password");
        HtmlElement submit = webBrowser1.Document.Forms[0].Document.All["PIN"].Parent.Parent.Parent.NextSibling.FirstChild;
        //userName.SetAttribute("value", textID.Text);
        //pass.SetAttribute("value", textPASS.Text);
        webBrowser1.Document.Forms[0].All["UserID"].SetAttribute("value", textID.Text);
        webBrowser1.Document.Forms[0].All["PIN"].FirstChild.SetAttribute("value", textPASS.Text);
        submit.InvokeMember("Click");
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
public void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var webBrowser = sender as WebBrowser;
    webBrowser.DocumentCompleted -= WebBrowser_DocumentCompleted;
    MessageBox.Show(webBrowser.Url.ToString());
}
You first need to locate the username and password input boxes, either by their IDs or by walking the nodes. Then set their values like this:
HtmlDocument doc = webBrowser1.Document;
HtmlElement email = doc.GetElementById("email");
HtmlElement pass = doc.GetElementById("pass");
HtmlElement submit = doc.GetElementById("LoginButton");
email.SetAttribute("value", "InsertYourEmailHere");
//Same for password
submit.InvokeMember("Click");
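Note that webBrowser1.Document is only populated once the page has loaded, so reading it right after Navigate() is a likely source of the "object reference" error. A sketch of doing the fill inside the DocumentCompleted handler follows. The class LoginForm and its constructor parameters are hypothetical; the names "UserID", "PIN" and "Login" come from the question and have not been verified against the live page, and the lookup uses Document.All[...], which finds an element by id or by name.

```csharp
using System;
using System.Windows.Forms;

public class LoginForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly string userId;
    private readonly string pin;

    // userId and pin stand in for textID.Text and textPASS.Text from the question.
    public LoginForm(string userId, string pin)
    {
        this.userId = userId;
        this.pin = pin;
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += FillLoginForm;
        webBrowser1.Navigate("https://suis.sabanciuniv.edu/prod/twbkwbis.P_SabanciLogin");
    }

    private void FillLoginForm(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Wait for the top-level document; frames raise this event as well.
        if (webBrowser1.Url == null || e.Url != webBrowser1.Url) return;

        // Run only once, so the page shown after logging in is left alone.
        webBrowser1.DocumentCompleted -= FillLoginForm;

        HtmlDocument doc = webBrowser1.Document;
        HtmlElement user = doc.All["UserID"];
        HtmlElement pass = doc.All["PIN"];
        HtmlElement submit = doc.All["Login"]; // assumption: the submit control is named "Login"

        if (user == null || pass == null || submit == null)
        {
            MessageBox.Show("Login fields were not found in the loaded page.");
            return;
        }

        user.SetAttribute("value", userId);
        pass.SetAttribute("value", pin);
        submit.InvokeMember("click");
    }
}
```

The null checks replace the try/catch from the question: a missing field produces a readable message instead of a NullReferenceException.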

C# - Get variable from webbrowser generated by javascript

I have downloaded a page with the WebBrowser control and need to get the mail address from it, but the address is generated by JavaScript. In the page source I can find this script:
<script type="text/javascript" charset="utf-8">var i='ma'+'il'+'to';var a='impexta#impexta.sk';document.write(''+a+'');</script>
Everything I have read explains how to invoke a script function, but I don't know its name. What I want is the value of the variable "a".
EDIT: The code I had before:
...
WebBrowser wb = new WebBrowser();
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
wb.Navigate(url);
for (; wb.ReadyState != WebBrowserReadyState.Complete; )
{
    System.Windows.Forms.Application.DoEvents();
}
...
void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser wb = sender as WebBrowser;
    if (wb != null)
    {
        if (wb.ReadyState == WebBrowserReadyState.Complete)
        {
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.Load(wb.DocumentStream);
        }
    }
}
I found an easy solution: just find the right part of the string in the HTML code:
// root is assumed to be the HtmlAgilityPack document node (doc.DocumentNode).
foreach (HtmlNode link in root.SelectNodes("//script"))
{
    if (link.InnerText.Contains("+a+"))
    {
        // Cut out the text between "var a='" and "';document.write".
        string[] strs = new string[] { "var a='", "';document.write" };
        strs = link.InnerText.Split(strs, StringSplitOptions.None);
        outMail = System.Net.WebUtility.HtmlDecode(strs[1]);
        if (outMail != "")
        {
            break;
        }
    }
}
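The same idea can be written with a regular expression, which avoids depending on the exact text that follows the assignment. This is a sketch of an alternative, not the approach used in the answer above; ScriptVariableReader and ReadStringVariable are hypothetical names, and it only handles single-quoted literals without escaped quotes.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class ScriptVariableReader
{
    // Hypothetical helper: returns the single-quoted string assigned in
    // "var <name>='...'" inside scriptText, or null when there is no such assignment.
    public static string ReadStringVariable(string scriptText, string name)
    {
        string pattern = @"\bvar\s+" + Regex.Escape(name) + @"\s*=\s*'([^']*)'";
        Match match = Regex.Match(scriptText, pattern);
        return match.Success ? WebUtility.HtmlDecode(match.Groups[1].Value) : null;
    }
}
```

Inside the loop over the script nodes, this could be called as ScriptVariableReader.ReadStringVariable(link.InnerText, "a"), skipping nodes for which it returns null.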
