I am creating a Metro-style version of one of my apps, written in C#. I need to access a web page, get the URL and HTML code from that page, and then use them.
In the WinForms version, I used a WebBrowser control and its properties, .Url and .Document.ActiveElement.OuterHtml/InnerHtml.
For the Metro-style app, I used a WebView control to access the page, but it doesn't have such properties and I can't find anywhere how to get the URL and the HTML code.
Does anybody know how to do that? Thanks in advance!
Edit: something like this (winforms, C#):
webBrowser1.DocumentCompleted += delegate
{
    if (webBrowser1.Url.ToString().StartsWith("http://www.google.com/"))
    {
        string url = webBrowser1.Url.ToString();
        string htmlCode = webBrowser1.Document.ActiveElement.InnerHtml;
    }
};
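For the Metro-style WebView, a rough equivalent might look like the sketch below. It assumes a Windows 8 XAML page with a WebView named webView1 declared in the markup (the page class and control name are hypothetical). The URL is read from the LoadCompleted event arguments, and the HTML is fetched by asking the page to evaluate a script through InvokeScript. On Windows 8.1 the asynchronous counterparts (NavigationCompleted and InvokeScriptAsync) are the preferred way, so treat this as a starting point rather than a definitive implementation.

```csharp
using System;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Navigation;

public sealed partial class MainPage : Page
{
    public MainPage()
    {
        this.InitializeComponent();

        // webView1 is assumed to be declared in MainPage.xaml.
        webView1.LoadCompleted += WebView1_LoadCompleted;
        webView1.Navigate(new Uri("http://www.google.com/"));
    }

    private void WebView1_LoadCompleted(object sender, NavigationEventArgs e)
    {
        if (e.Uri == null)
        {
            return;
        }

        string url = e.Uri.ToString();
        if (url.StartsWith("http://www.google.com/"))
        {
            // "eval" runs the given script in the loaded page and returns its result as a string.
            string htmlCode = webView1.InvokeScript(
                "eval", new[] { "document.documentElement.outerHTML;" });
        }
    }
}
```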
I hope that this is the answer you need.
I'm trying to open a URL (some link) in a WebBrowser control.
The link returns an HTML page which contains a Google Graph, but my WebBrowser control is blank and doesn't display anything. It works fine with WebBrowserTask and on my PC, so there is no problem with the link itself, but it is blank in the WebBrowser control. Any idea how I can fix this?
public GraphPage()
{
    InitializeComponent();
    webBrowser1.Navigated += new EventHandler<System.Windows.Navigation.NavigationEventArgs>(Browser_Navigated);
    webBrowser1.Navigating += new EventHandler<NavigatingEventArgs>(Browser_Navigating);
    loadPage(getBaseUrl(graphType));
}

private void loadPage(String url)
{
    webBrowser1.IsScriptEnabled = true;
    webBrowser1.Source = new Uri("Link");
}
As mentioned by user112553, set IsScriptEnabled to true. This can be done either in the XAML or in the code-behind:
XAML
<phone:WebBrowser x:Name="Browser" IsScriptEnabled="True" />
Code-Behind
Browser.IsScriptEnabled = true;
I encountered a similar situation with Windows Phone 8 and an HTML page using jQuery.
IsScriptEnabled=true wasn't enough (the page didn't render properly).
I solved it by adding a doctype declaration to the HTML page:
<!DOCTYPE html>
<html>
...
It seems that the WebBrowser component refuses to render HTML5 pages unless the document type is explicitly declared.
Omitting this declaration is a common cause of rendering problems in IE versions before 11. There could be many reasons why my scripts didn't run; the most likely is a reference to an HTML5 tag that was not handled correctly.
ref: http://msdn.microsoft.com/en-us/hh779632.aspx
Since the browser in Windows Phone 8.0 is based on Internet Explorer 10, this makes sense. The confusing part when debugging this behavior is that Internet Explorer on the phone renders the page perfectly, while the WebBrowser component does not.
If this is documented in the API specifications, it should be easier to find. I was not able to find any information pointing me to this solution, mostly because my pages rendered in WebViews for Android and iOS without any problems.
Thanks to Antonio Pelleriti for providing this solution.
I am trying to extract some information from a website. But when I navigate to it, it uses JavaScript to connect me to a server before dynamically loading a PHP page. I can follow the sequence in Chrome with the developer tools. I figured it would be easiest to reproduce it in C# with the WebBrowser control and simply navigate to the website. The WebBrowser control must then contain all the JavaScript files, the text from the dynamically loaded PHP page, and so on. But is this true, and where in the control are they stored? I can't seem to find them.
Recreating the whole sequence you observed in Chrome would be a lot of work. However, "extract some information from a website" is something that can be done quite easily.
Disclaimer: I assumed this question was about WPF's WebBrowser control (it would be almost the same for WinForms).
You can get the HTMLDocument once the page is loaded, using:
using System.Windows;
using System.Windows.Navigation;
using mshtml; // <- don't forget to add the reference

public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();
        // Subscribe before navigating so the event cannot be missed.
        browser.LoadCompleted += browser_LoadCompleted;
        browser.Navigate("http://google.com/");
    }

    void browser_LoadCompleted(object sender, NavigationEventArgs e)
    {
        HTMLDocument doc = (HTMLDocument)browser.Document;
        string html = doc.documentElement.innerHTML;
        // from here, you should be able to parse the HTML
        // or sniff the HTMLDocument (using HTML Agility Pack for instance)
    }
}
From this HTMLDocument, you have access to a lot of properties, including HTML elements, CSS styles and scripts. I invite you to set a breakpoint and check out what best fits your needs.
Nevertheless, since the page you want to load uses JavaScript to fill in its content, the HTMLDocument will probably not be complete at the time LoadCompleted is raised.
In that case, I suggest using a timer to poll until the content is stable.
You could also use the HTMLDocument to inject your own JavaScript code and call C# methods through WebBrowser.ObjectForScripting, but this is going to be much more complicated and harder to maintain.
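The polling idea could be sketched roughly as follows. This is only an illustration under a few assumptions: StableHtmlPoller is a hypothetical helper (not part of WPF), "stable" is taken to mean that the HTML was identical on three consecutive checks, and the 500 ms interval is an arbitrary choice. There is no overall timeout, so a page that keeps changing forever would keep the timer running.

```csharp
using System;
using System.Windows.Controls;
using System.Windows.Threading;
using mshtml;

// Hypothetical helper: polls the loaded document until its HTML stops changing.
public sealed class StableHtmlPoller
{
    private const int RequiredUnchangedChecks = 3; // assumption: 3 equal reads means "stable"

    private readonly WebBrowser _browser;
    private readonly Action<string> _onStable;
    private readonly DispatcherTimer _timer;
    private string _lastHtml;
    private int _unchangedChecks;

    public StableHtmlPoller(WebBrowser browser, Action<string> onStable)
    {
        _browser = browser;
        _onStable = onStable;
        _timer = new DispatcherTimer { Interval = TimeSpan.FromMilliseconds(500) };
        _timer.Tick += OnTick;
    }

    // Call this from the LoadCompleted handler.
    public void Start()
    {
        _lastHtml = null;
        _unchangedChecks = 0;
        _timer.Start();
    }

    private void OnTick(object sender, EventArgs e)
    {
        var doc = _browser.Document as HTMLDocument;
        if (doc == null || doc.documentElement == null)
        {
            return; // nothing loaded yet, try again on the next tick
        }

        string html = doc.documentElement.innerHTML;
        if (html == _lastHtml)
        {
            _unchangedChecks++;
        }
        else
        {
            _lastHtml = html;
            _unchangedChecks = 0;
        }

        if (_unchangedChecks >= RequiredUnchangedChecks)
        {
            _timer.Stop();
            _onStable(html);
        }
    }
}
```

It would be started from browser_LoadCompleted, for example with new StableHtmlPoller(browser, html => { /* parse html here */ }).Start();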
I'm writing an automation program. I load a web page into a WebBrowser control on my Windows form. Then, I need to click on a link in the WebBrowser programmatically. How can I do this? For example:
Google Me
Facebook Me
The above are 2 different conditions. The first element does not have an id attribute while the second one does. Any idea on how to click each programmatically?
You have to find your element first, by its ID or other filters:
HtmlElement fbLink = webBrowser.Document.GetElementById("fbLink");
And to simulate "click":
fbLink.InvokeMember("click");
An example for finding your link by inner text:
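HtmlElement FindLink(string innerText)
{
    foreach (HtmlElement link in webBrowser.Document.GetElementsByTagName("a"))
    {
        // InnerText can be null for links without text, so compare from the argument side.
        if (innerText.Equals(link.InnerText))
        {
            return link;
        }
    }
    return null; // no matching link found
}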
You need a way to automate the browser, then.
One way to do this is to use WatiN (https://sourceforge.net/projects/watin/). It allows you to write a .NET program that controls the browser via a convenient object model. It is mainly used to write automated tests for web pages, but it can also be used to control the browser.
If you don't want to control the browser this way, you could write some JavaScript that you include on your page and that does the clicking, but I doubt that is what you are after.
Is it possible to scrape all the text from a site that was navigated to by WebBrowser control without looking at the source?
David Walker's method is great when you don't need any information from the header or from the non-main parts of the web page. If you need something outside the inner text, there are only two options. One is to parse the document with the "getElement" methods.
The other is to issue commands (Document.ExecCommand) to the WebBrowser to select everything and copy it to the clipboard:
wb.Document.ExecCommand("SelectAll", false, null);
wb.Document.ExecCommand("Copy", false, null);
Then, finally: string content = Clipboard.GetText();
Please note that the spelling and syntax may not be exactly right; I'm recalling this from memory.
string browserContents = webBrowser.Document.Body.InnerText;
You use the DocumentText property of the WebBrowser control.
This property is what holds the HTML of the site you have navigated to.
Update: (following comments)
If you want to parse the HTML and get the text parts of it, I suggest you use the HTML Agility Pack.
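As a minimal sketch of that suggestion, assuming the HtmlAgilityPack NuGet package is referenced, the text could be pulled out of the HTML string like this. HtmlTextExtractor and ExtractText are hypothetical names chosen for illustration, and dropping script and style elements is a design choice of this sketch, not something the library requires.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: returns the visible text of an HTML string.
public static class HtmlTextExtractor
{
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so check before iterating.
        var nonVisible = doc.DocumentNode.SelectNodes("//script|//style");
        if (nonVisible != null)
        {
            // Copy to a list first so nodes can be removed while iterating.
            foreach (var node in nonVisible.ToList())
            {
                node.Remove();
            }
        }

        // Decode entities such as &amp; and trim surrounding whitespace.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

It could then be called as string text = HtmlTextExtractor.ExtractText(webBrowser.DocumentText);. Note that HtmlAgilityPack.HtmlDocument shares its name with System.Windows.Forms.HtmlDocument, so a file that imports both namespaces needs to qualify the type.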
We are using the WebBrowser control in C# WinForms and need to be able to get information about the URL the cursor is positioned on.
We have a web page in design mode which contains multiple URLs. When the cursor is over one of them, I would like to call a method that returns the id of the link.
Thanks
You can use IHTMLCaret to get the cursor position; from there, using IMarkupPointer, you can get the element in the current scope.
The WebBrowser control has a Document property which has a Links collection. Each link is an HtmlElement which has events you can tap into. Again, I'm not sure what you mean by "cursor" because in the web world, unless you're in a textbox, there really isn't a "cursor" (which is what I meant to ask in my comment), but you can tap into the MouseOver event and other events like that.
Example:
foreach (HtmlElement element in this.webBrowser1.Document.Links)
{
    element.MouseOver += (o, ex) =>
    {
        Console.WriteLine(ex.ToElement.GetAttribute("HREF"));
    };
}
This will print out the actual URL that the mouse is over.
You can have a look at this article - Hosting a web browser component in a C# winform - which explains several ways to do that. Or go directly to this one - Hosting a webpage inside a Windows Form. Basically, what you need to do is handle the Click of the DOM object inside the COM WebBrowser of IE. You achieve this by handling the JavaScript events inside your C# code.
As far as I remember, this kind of customization must be done using the AxSHDocVw.AxWebBrowser COM object instead of the System.Windows.Forms.WebBrowser class from the newer versions of the .NET Framework.
I could send you more details about this; I did it in a project, just give me time to find it ;). In the meantime, try those links.
Bye!