Winform WebBrowser Pass Cookie Then Process Links? - c#

I asked this question a while ago, but it seems there are no answers, so I tried an alternative solution. Now I am stuck; please see the following code:
WebBrowser objWebBrowser = new WebBrowser();
objWebBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(objWebBrowser_DocumentCompleted);
objWebBrowser.Navigate("http://www.website.com/login.php?user=xxx&pass=xxx");
objWebBrowser.Navigate("http://www.website.com/page.php?link=url");
And here is the event code:
WebBrowser objWebBrowser = (WebBrowser)sender;
String data = new StreamReader(objWebBrowser.DocumentStream).ReadToEnd();
Since it's impossible for me to use WebBrowser.Document.Cookies before a document is loaded, I first have to navigate to the login page, which stores a cookie automatically. After that I want to call the second Navigate in order to get a result. With the code above this doesn't work, because only the second navigation ever takes effect. Putting the second call in the event handler won't work for me either, because what I want is this:
Navigate to the login page and store the cookie, one time only.
Pass a different URL each time I want to get some results.
Can anybody suggest a solution?
Edit:
Maybe the code sample I provided was misleading. What I want is:
foreach (string url in urls)
{
WebBrowser1.Navigate(url);
// Then wait for the above to complete and get the result from the event, then continue
}

I think you want to simulate a blocking call to Navigate if you are not authorized. There are probably many ways to accomplish this and other approaches to get what you want, but here's some code I wrote up quickly that might help you get started.
If you have any questions about what I'm trying to do here, let me know. I admit it feels like "a hack" which makes me think there's a smarter solution, but anyway....
bool authorized = false;
bool navigated;
WebBrowser objWebBrowser = new WebBrowser();
void GetResults(string url)
{
if(!authorized)
{
NavigateAndBlockWithSpinLock("http://www.website.com/login.php?user=xxx&pass=xxx");
authorized = true;
}
objWebBrowser.Navigate(url);
}
void NavigateAndBlockWithSpinLock(string url)
{
navigated = false;
objWebBrowser.DocumentCompleted += NavigateDone;
objWebBrowser.Navigate(url);
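// Thread.Sleep alone would block the UI thread's message loop, so DocumentCompleted
// could never fire. Pump messages while waiting, for up to 10 seconds.
DateTime deadline = DateTime.Now.AddSeconds(10);
while(!navigated && DateTime.Now < deadline)
{
Application.DoEvents();
Thread.Sleep(50);
}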
objWebBrowser.DocumentCompleted -= NavigateDone;
if(!navigated)
throw new Exception("fail");
}
void NavigateDone(object sender, WebBrowserDocumentCompletedEventArgs e)
{
navigated = true;
}
void objWebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if(authorized)
{
WebBrowser objWebBrowser = (WebBrowser)sender;
String data = new StreamReader(objWebBrowser.DocumentStream).ReadToEnd();
}
}
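As an alternative to blocking, the "navigate, wait for completion, then continue" loop from the question can be sketched with async/await. This is only a sketch under assumptions: NavigateAsync and ProcessLinksAsync are hypothetical helper names (not part of the WebBrowser API), the code is assumed to be called from the UI thread, and there is no timeout, so a page that never completes would leave the task pending.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical helpers, not part of the original answer or of WinForms itself.
static class WebBrowserExtensions
{
    // Starts a navigation and completes the returned task with the page text
    // once DocumentCompleted fires for the top-level document.
    public static Task<string> NavigateAsync(this WebBrowser browser, string url)
    {
        var tcs = new TaskCompletionSource<string>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // Frames raise DocumentCompleted too; wait for the top-level document.
            if (e.Url != browser.Url)
                return;
            browser.DocumentCompleted -= handler;
            tcs.TrySetResult(browser.DocumentText);
        };
        browser.DocumentCompleted += handler;
        browser.Navigate(url);
        return tcs.Task;
    }

    // Logs in once (which stores the cookie), then visits each URL in order.
    public static async Task ProcessLinksAsync(WebBrowser browser, IEnumerable<string> urls)
    {
        await browser.NavigateAsync("http://www.website.com/login.php?user=xxx&pass=xxx");
        foreach (string url in urls)
        {
            string html = await browser.NavigateAsync(url);
            // Process html here before moving on to the next URL.
        }
    }
}
```

Because each await resumes on the UI thread's synchronization context, the message loop keeps running between navigations and no Thread.Sleep or DoEvents loop is needed.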

Related

How to have C# Webbrowser handle webpage login popup for webscraping

I'm trying to programmatically log in to a site like espn.com. The way the site is set up, once I click on the Log In button located on the homepage, a Log In popup window is displayed in the middle of the screen with the background slightly tinted. My goal is to programmatically obtain that popup box, supply the username and password, and submit it, hoping that a cookie is returned to me to use as authentication. However, because JavaScript is used to display the form, I don't necessarily have easy access to the form's input tags via the main page's HTML.
I've researched various solutions such as HttpClient and HttpWebRequest; however, it appears that a WebBrowser is best, since the login form is displayed using JavaScript and a WebBrowser seems the best way to capture the popup's input elements.
class ESPNLoginViewModel
{
private string Url;
private WebBrowser webBrowser1 = new WebBrowser();
private SHDocVw.WebBrowser_V1 Web_V1;
public ESPNLoginViewModel()
{
Initialize();
}
private void Initialize()
{
Url = "http://www.espn.com/";
Login();
}
private void Login()
{
webBrowser1.Navigate(Url);
webBrowser1.DocumentCompleted +=
new WebBrowserDocumentCompletedEventHandler(webpage_DocumentCompleted);
Web_V1 = (SHDocVw.WebBrowser_V1)this.webBrowser1.ActiveXInstance;
Web_V1.NewWindow += new SHDocVw.DWebBrowserEvents_NewWindowEventHandler(Web_V1_NewWindow);
}
//This never gets executed
private void Web_V1_NewWindow(string URL, int Flags, string TargetFrameName, ref object PostData, string Headers, ref bool Processed)
{
//I'll start determining how to code this once I'm able to get this invoked
}
private void webpage_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlElement loginButton = webBrowser1.Document.GetElementsByTagName("button")[5];
loginButton.InvokeMember("click");
//I've also tried the InvokeScript call below, to directly execute the JavaScript that runs
//when the Log In button is clicked; however, Web_V1_NewWindow still wasn't called.
//webBrowser1.Document.InvokeScript("buildOverlay");
}
}
I'm expecting the Web_V1_NewWindow handler to be invoked when the InvokeMember("click") method is called. However, code execution only runs through the webpage_DocumentCompleted handler without any calls to Web_V1_NewWindow. It might be that I need to use a different method than InvokeMember("click") to invoke the Log In button's click event handler. Or I might need to try something completely different altogether. I'm not 100% sure the Web_V1.NewWindow is the correct approach for my needs, but I've seen NewWindow used often when dealing with popups so I figured I should give it a try.
Any help would be greatly appreciated as I've spent a significant amount of time on this.
I know this is a late answer, but it may help someone else.
You can extract the value from an element inside a FRAME as follows:
// Get frame using frame ID
HtmlWindow frameWindow = (from HtmlWindow win
in WbBrowser.Document.Window.Frames select win)
.Where(x => string.Compare(x.WindowFrameElement.Id, "frm1") == 0)
.FirstOrDefault();
// Get first frame textbox with ID
HtmlElement txtElement = (from HtmlElement element
in frameWindow.Document.GetElementsByTagName("input")
select element)
.Where(x => string.Compare(x.Id, "txt") == 0).FirstOrDefault();
// Check whether txtElement is null
if(txtElement != null)
{
Label1.Text = txtElement.GetAttribute("value");
}
For more details, check this article.

C# WebBrowser in loop and Fiddlercore

I use FiddlerCore to capture multiple URLs at the same time inside a loop.
Example:
private void button1_Click(object sender, EventArgs e)
{
// I have 2 URLs
string[] arr = new string[]{ url1, url2 };
foreach(var url in arr)
{
new WebBrowser().Navigate(url);
}
Fiddler.FiddlerApplication.AfterSessionComplete
+= new Fiddler.SessionStateHandler(FiddlerApplication_AfterSessionComplete);
}
// I will catch 2 sessions whose URLs contain the same string "a/b/c", one from each WebBrowser in the loop
int Count = 0;
void FiddlerApplication_AfterSessionComplete(Fiddler.Session oSession)
{
if(oSession.fullUrl.Contains("a/b/c"))
{
Count+= 1;
richtextbox1.AppendText(oSession.fullUrl + "\n");
}
if(Count == 2)
{
Count = 0;
StopFiddler();
}
}
void StopFiddler()
{
Fiddler.FiddlerApplication.AfterSessionComplete
-= new Fiddler.SessionStateHandler(FiddlerApplication_AfterSessionComplete);
}
This works, but I have a problem: FiddlerCore stops capturing sessions, but the web browsers don't stop; they keep loading.
How can I stop the WebBrowser from loading after I get what I need?
Use WebBrowser.Stop() to stop all loading.
Cancels any pending navigation and stops any dynamic page elements, such as background sounds and animations.
Edit: Also, you need to save a reference to those WebBrowser controls you're creating, so that you can actually call the Stop method for them. The way you use them now is quite strange and might lead to problems down the line (actually it led to problems already).
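A minimal sketch of that advice, keeping a reference to every WebBrowser so each one can be stopped later. The class and method names (CaptureForm, StartBrowsers, RequestStop, StopBrowsers) are hypothetical. It also assumes the Fiddler handler may run on a non-UI thread, which is why the stop request is marshalled through BeginInvoke; that requires the form's handle to exist, i.e. the form has been shown.

```csharp
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical form, shown only to illustrate holding on to the controls.
class CaptureForm : Form
{
    // Every browser created in the loop is remembered here.
    private readonly List<WebBrowser> browsers = new List<WebBrowser>();

    public void StartBrowsers(IEnumerable<string> urls)
    {
        foreach (string url in urls)
        {
            var browser = new WebBrowser();
            browsers.Add(browser);
            browser.Navigate(url);
        }
    }

    // Safe to call from any thread, e.g. from the AfterSessionComplete handler.
    public void RequestStop()
    {
        BeginInvoke(new MethodInvoker(StopBrowsers));
    }

    // Runs on the UI thread: cancels pending navigation and releases the controls.
    private void StopBrowsers()
    {
        foreach (WebBrowser browser in browsers)
        {
            browser.Stop();
            browser.Dispose();
        }
        browsers.Clear();
    }
}
```

In the question's code, StopFiddler() would then be the natural place to call RequestStop() once both sessions have been seen.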

C# Thread not changing the text box values the second time

I am creating an application that involves using threads. Everything works until I click the button for the second time: nothing happens on the second click. It's as if all the data loads the first time and the values of the text boxes are then locked. The stuff in red is just private links that cannot be shown. It's not the links, because they work just fine the first time; they just won't work the second time. I hope what I just said wasn't too confusing.
name1, name2, name3 are all downloaded when the form is created, they're just bound to the textboxes when you press the button the first time.
_name1(), _name2(), _name3() methods are just object instantiations and have no side effects of any kind (put differently, they don't do anything).
And all the threading stuff is just fluff - you're calling methods that don't do anything and then aborting the threads (thereby aborting something that isn't doing anything anyway). This has zero effect on the execution in any way as the code is currently written, even when executed the first time.
The simple, synchronous fix for your code will look like this:
private void Button_Click(object sender, EventArgs e)
{
using (WebClient client = new WebClient())
{
textBox1.Text = client.DownloadString("<your URL here>");
textBox2.Text = client.DownloadString("<your URL here>");
textBox3.Text = client.DownloadString("<your URL here>");
}
}
Seeing as you're using threads, your goal is obviously non-blocking, asynchronous execution. The easiest way to achieve it while preserving the sequencing of operations is with async/await:
private async void Button_Click(object sender, EventArgs e)
{
// Disabling the button ensures that it's not pressed
// again while the first request is still in flight.
materialRaisedButton1.Enabled = false;
try
{
// HttpClient (System.Net.Http) is used here because a single WebClient
// instance throws NotSupportedException when operations run concurrently.
using (HttpClient client = new HttpClient())
{
// Execute async downloads in parallel:
Task<string>[] parallelDownloads = new[] {
client.GetStringAsync("<your URL here>"),
client.GetStringAsync("<your URL here>"),
client.GetStringAsync("<your URL here>")
};
// Collect results.
string[] results = await Task.WhenAll(parallelDownloads);
// Update all textboxes at the same time.
textBox1.Text = results[0];
textBox2.Text = results[1];
textBox3.Text = results[2];
}
}
finally
{
materialRaisedButton1.Enabled = true;
}
}

How to make WebBrowser wait till it loads fully?

I have a C# form with a web browser control on it.
I am trying to visit different websites in a loop.
However, I cannot control which URL gets loaded into the web browser control on my form.
This is the function I am using for navigating through URL addresses:
public String WebNavigateBrowser(String urlString, WebBrowser wb)
{
string data = "";
wb.Navigate(urlString);
while (wb.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
data = wb.DocumentText;
return data;
}
How can I make my loop wait until it fully loads?
My loop is something like this:
foreach (string urlAddresses in urls)
{
WebNavigateBrowser(urlAddresses, webBrowser1);
// I need to add a code to make webbrowser in Form to wait till it loads
}
Add this to your code:
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
Then fill in this handler:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
// DocumentCompleted also fires for each frame; this check makes the code run only once, for the top-level document
if (e.Url != webBrowser1.Url)
return;
// do your actual work here
}
After some time being frustrated by the flaky IE functionality, I came up with what I found to be the most accurate way to judge when a page has finished loading.
Never use the DocumentCompleted event (WebBrowserDocumentCompletedEventHandler);
use ProgressChanged (WebBrowserProgressChangedEventHandler) with some modifications, as seen below.
//"ie" is our web browser object
ie.ProgressChanged += new WebBrowserProgressChangedEventHandler(_ie);
private void _ie(object sender, WebBrowserProgressChangedEventArgs e)
{
int max = (int)Math.Max(e.MaximumProgress, e.CurrentProgress);
int min = (int)Math.Min(e.MaximumProgress, e.CurrentProgress);
if (min.Equals(max))
{
//Run your code here when page is actually 100% complete
}
}
Simple genius method of going about this, I found this question googling "How to sleep web browser or put to pause"
According to MSDN (contains sample source) you can use the DocumentCompleted event for that. Additional very helpful information and source that shows how to differentiate between event invocations can be found here.
What you experienced happened to me too. Checking ReadyState for Complete doesn't work in some cases. Here I use a bool set in DocumentCompleted to check the state:
private void button1_Click(object sender, EventArgs e)
{
//go site1
wb.Navigate("site1.com");
//wait for documentCompleted before continue to execute any further
waitWebBrowserToComplete(wb);
// set some values in html page
wb.Document.GetElementById("input1").SetAttribute("Value", "hello");
// then click submit. (submit does navigation)
wb.Document.GetElementById("formid").InvokeMember("submit");
// then wait for doc complete
waitWebBrowserToComplete(wb);
var processedHtml = wb.Document.GetElementsByTagName("HTML")[0].OuterHtml;
var rawHtml = wb.DocumentText;
}
// helpers
// Instead of checking ReadyState, we get the state from the DocumentCompleted event via a bool value.
// The field is static so that the static helper below can access it.
static bool webbrowserDocumentCompleted = false;
public static void waitWebBrowserToComplete(WebBrowser wb)
{
while (!webbrowserDocumentCompleted)
Application.DoEvents();
webbrowserDocumentCompleted = false;
}
private void form_load(object sender, EventArgs e)
{
wb.DocumentCompleted += (o, args) => {
webbrowserDocumentCompleted = true;
};
}

Get HtmlDocument after javascript manipulations

In C#, using the System.Windows.Forms.HtmlDocument class (or another class that allows DOM parsing), is it possible to wait until a webpage finishes its javascript manipulations of the HTML before retrieving that HTML? Certain sites add innerhtml to pages through javascript, but those changes do not show up when I parse the HtmlElements of the HtmlDocument.
One possibility would be to update the HtmlDocument of the page after a second. Does anybody know how to do this?
Someone revived this question by posting what I think is an incorrect answer. So, here are my thoughts to address it.
Non-deterministically, it's possible to get close to finding out if the page has finished its AJAX stuff. However, it completely depends on the logic of that particular page: some pages are perpetually dynamic.
To approach this, one can handle DocumentCompleted event first, then asynchronously poll the WebBrowser.IsBusy property and monitor the current HTML snapshot of the page for changes, like below.
The complete sample can be found here.
// get the root element
var documentElement = this.webBrowser.Document.GetElementsByTagName("html")[0];
// poll the current HTML for changes asynchronously
var html = documentElement.OuterHtml;
while (true)
{
// wait asynchronously, this will throw if cancellation requested
await Task.Delay(500, token);
// continue polling if the WebBrowser is still busy
if (this.webBrowser.IsBusy)
continue;
var htmlNow = documentElement.OuterHtml;
if (html == htmlNow)
break; // no changes detected, end the poll loop
html = htmlNow;
}
In general the answer is "no": unless script on the page notifies your code in some way, you simply have to wait some time and then grab the HTML. Waiting a second after the document-ready notification will likely cover most sites (e.g. jQuery's $(code) cases).
You need to give the application a second to process the JavaScript. Simply halting the current thread will delay the JavaScript processing as well, so your document will still come up outdated.
WebBrowserDocumentCompletedEventArgs cachedLoadArgs;
private void TimerDone(object sender, EventArgs e)
{
((Timer)sender).Stop();
respondToPageLoaded(cachedLoadArgs);
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
cachedLoadArgs = e;
System.Windows.Forms.Timer timer = new Timer();
int interval = 1000;
timer.Interval = interval;
timer.Tick += new EventHandler(TimerDone);
timer.Start();
}
What about using 'WebBrowser.Navigated' event?
I did it with WebBrowser; take a look at my class:
public class MYCLASSProduct: IProduct
{
public string Name { get; set; }
public double Price { get; set; }
public string Url { get; set; }
private WebBrowser _WebBrowser;
private AutoResetEvent _lock;
public void Load(string url)
{
_lock = new AutoResetEvent(false);
this.Url = url;
browserInitializeBecauseJavascriptLoadThePage();
}
private void browserInitializeBecauseJavascriptLoadThePage()
{
_WebBrowser = new WebBrowser();
_WebBrowser.DocumentCompleted += webBrowser_DocumentCompleted;
_WebBrowser.Dock = DockStyle.Fill;
_WebBrowser.Name = "webBrowser";
_WebBrowser.ScrollBarsEnabled = false;
_WebBrowser.TabIndex = 0;
_WebBrowser.Navigate(Url);
Form form = new Form();
form.Hide();
form.Controls.Add(_WebBrowser);
Application.Run(form);
_lock.WaitOne();
}
private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlAgilityPack.HtmlDocument hDocument = new HtmlAgilityPack.HtmlDocument();
hDocument.LoadHtml(_WebBrowser.Document.Body.OuterHtml);
this.Price = Convert.ToDouble(hDocument.DocumentNode.SelectNodes("//td[@class='ask']").FirstOrDefault().InnerText.Trim());
_WebBrowser.FindForm().Close();
_lock.Set();
}
}
If you're trying to do this in a console application, you need to put this attribute above your Main method, because Windows needs to communicate with COM components:
[STAThread]
static void Main(string[] args)
I don't like this solution, but I don't think there is a better one!
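When marking Main with [STAThread] is not an option, the same idea can be sketched with a dedicated STA thread that runs its own message loop. This is an illustration under assumptions, not the approach from the class above: StaBrowserDemo and LoadHtml are hypothetical names, and there is no timeout, so a page that never completes would block the caller.

```csharp
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper for console applications.
static class StaBrowserDemo
{
    // Loads a page on a dedicated STA thread and returns the document text.
    public static string LoadHtml(string url)
    {
        string html = null;
        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;
                browser.DocumentCompleted += (sender, e) =>
                {
                    // Ignore frame completions; wait for the top-level document.
                    if (e.Url != browser.Url)
                        return;
                    html = browser.DocumentText;
                    // Ends the message loop started by Application.Run() below.
                    Application.ExitThread();
                };
                browser.Navigate(url);
                // The control needs a running message loop to raise its events.
                Application.Run();
            }
        });
        // The underlying ActiveX control requires a single-threaded apartment.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return html;
    }
}
```

As with the other answers here, DocumentCompleted only signals that the document has loaded; content added later by scripts may still be missing from the returned text.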
