I want to navigate to a specific website and then display only a portion of the page in the web browser, the part that starts with:
<div id="dex1" ...... </div>
I know I need to get the element by id, but first I tried writing this:
string data = webBrowser.Document.Body.OuterHtml;
So from data I need to grab the content of that id and display it, with everything else removed.
Any ideas on this?
webBrowser1.DocumentCompleted += (sender, e) =>
{
    webBrowser1.DocumentText = webBrowser1.Document.GetElementById("dex1").OuterHtml;
};
On second thoughts, don't do that: setting the DocumentText property causes the DocumentCompleted event to fire again. So maybe do:
webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Unsubscribe first so that setting DocumentText does not re-enter this handler
    webBrowser1.DocumentCompleted -= webBrowser1_DocumentCompleted;
    webBrowser1.DocumentText = webBrowser1.Document.GetElementById("dex1").OuterHtml;
}
Although in most real world cases I'd expect you'd get better results injecting some javascript to do the DOM manipulation, a la Andrei's answer.
Edit: to replace just the contents of the body tag, which might, if you're lucky, keep all the required styling and scripts (provided they are all in the head and don't reference any discarded content), you may have some joy with:
webBrowser1.Document.Body.InnerHtml = webBrowser1.Document.GetElementById("dex1").OuterHtml;
You will probably need a lot of external resources such as scripts and images, so you can instead add some custom JavaScript to modify the DOM however you like after the document has loaded from your website. Based on "How to update DOM content inside WebBrowser Control in C#?", it would look something like this:
HtmlElement headElement = webBrowser1.Document.GetElementsByTagName("head")[0];
HtmlElement scriptElement = webBrowser1.Document.CreateElement("script");
IHTMLScriptElement domScriptElement = (IHTMLScriptElement)scriptElement.DomElement;
domScriptElement.text = "function applyChanges(){ $('body > *').hide(); $('#dex1').show().prependTo('body');}";
headElement.AppendChild(scriptElement);
// Call the next line whenever you want to execute your code
webBrowser1.Document.InvokeScript("applyChanges");
This also assumes that jQuery is available on the page, so that you can do simple DOM manipulation.
The JavaScript code just hides all children of the body and then prepends the '#dex1' div to the body so that it's at the top and visible.
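If jQuery is not loaded on the target page, the same effect can probably be had with plain DOM calls. The sketch below only builds the script text. `ShowOnlyScript.Build` and its parameter names are made up for illustration, and the returned string would be assigned to `domScriptElement.text` in place of the jQuery version above.

```csharp
using System;

static class ShowOnlyScript
{
    // Hypothetical helper (not part of any library): returns the source of a
    // JavaScript function that hides every direct child of <body>, then moves
    // the element with the given id to the top of <body> and un-hides it.
    public static string Build(string functionName, string elementId)
    {
        // Escape backslashes and single quotes so the id is safe inside a JS string literal.
        string safeId = elementId.Replace("\\", "\\\\").Replace("'", "\\'");
        return
            "function " + functionName + "(){" +
            "var t=document.getElementById('" + safeId + "');" +
            "if(!t){return;}" +                       // nothing to show: leave the page alone
            "var c=document.body.children;" +
            "for(var i=0;i<c.length;i++){c[i].style.display='none';}" +
            "document.body.insertBefore(t,document.body.firstChild);" +
            "t.style.display='';" +                   // clear the inline 'none' set by the loop
            "}";
    }
}

static class Program
{
    static void Main()
    {
        // Print the generated script for the id used in the question.
        Console.WriteLine(ShowOnlyScript.Build("applyChanges", "dex1"));
    }
}
```

The generated function is invoked the same way as before, with webBrowser1.Document.InvokeScript("applyChanges").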
In my application I need to use a WebBrowser control to process user login. I use WPF and its WebBrowser control. The problem is that I want to display only the part of the web page that is under a certain div. I found a solution to this: I need to inject a JavaScript script into the loaded HTML page, but I have no clue how to do it :(
This is the script that I would like to inject into the page.
function showHide() {
    $('body >').hide();
    $('#div').show().prependTo('body');
}
So I could later call it with webbrowser1.InvokeScript("showHide");
I read a lot of posts on Stack Overflow, but all of them refer to the Windows Forms WebBrowser, and they do not work with the WPF one.
EDIT: I tried:
private void PageLoaded(object sender, NavigationEventArgs e)
{
    var webBrowser = sender as WebBrowser;
    webBrowser.Document.InvokeScript("execScript",
        new object[] {"$('body >').hide(); $('#" + _div + "').show().prependTo('body');"});
}
But webBrowser.Document is of type object, so I cannot call InvokeScript on it, and I have no clue what I should cast it to.
Thanks for help.
Add a reference to mshtml
In whatever event you want to inject the JavaScript:
// Cast<T>() and First() below need: using System.Linq;
var doc = (mshtml.HTMLDocument)webBrowser1.Document;
var head = doc.getElementsByTagName("head").Cast<mshtml.HTMLHeadElement>().First();
var script = (mshtml.IHTMLScriptElement)doc.createElement("script");
script.text = "function myFunction() { alert(\"Hello!\");}";
head.appendChild((mshtml.IHTMLDOMNode)script);
In whatever event you want to invoke the JavaScript from:
webBrowser1.InvokeScript("myFunction");
I have a C# application which has a web browser, navigating to a specified page by default.
What I want to do, once the document has completely loaded, is select an HTML element by tag name (not ID/class) and then delete the HTML outside of it. I have tried for some time and still haven't succeeded.
This is my event handler and how far I have got:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var elementToDelete = webBrowser1.Document.GetElementsByTagName("form");
}
I want to select that form element, which has no class or ID, and delete all the HTML outside of it, so that it is the only thing visible on the page.
You say that you want to delete an element, but then after your code you say that you want to delete everything outside of "form". I'm not sure which you actually want, but you can do the second with the following.
First, note that elementToDelete is actually a collection, not a single element, so we need to get a single element.
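var formElements = webBrowser1.Document.GetElementsByTagName("form");
// HtmlElementCollection is not generic, so Cast<HtmlElement>() (System.Linq) is needed before FirstOrDefault()
var elementToSave = formElements.Cast<HtmlElement>().FirstOrDefault();
if (elementToSave == null)
    throw new InvalidOperationException("No element named 'form'");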
You can then set the WebBrowser.DocumentText property to the InnerHtml property of "form". You should probably wrap up the inner HTML so that it's a valid page, but this should work:
webBrowser1.DocumentText = elementToSave.InnerHtml;
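The suggestion to wrap the fragment so that it forms a valid page could look roughly like the sketch below. `HtmlPage.Wrap` is a hypothetical helper, not something the control provides. The optional `baseUrl` parameter is an assumption on my part: a page loaded from a string has no site URL of its own, so a `<base>` tag is one way to keep relative links and images resolving.

```csharp
using System;
using System.Net;

static class HtmlPage
{
    // Hypothetical helper: wraps an HTML fragment in a minimal document so the
    // control receives a complete page rather than a bare fragment.
    public static string Wrap(string fragment, string baseUrl = null)
    {
        // Only emit <base> when a URL was supplied; encode it for use inside an attribute.
        string baseTag = string.IsNullOrEmpty(baseUrl)
            ? ""
            : "<base href=\"" + WebUtility.HtmlEncode(baseUrl) + "\">";

        return "<!DOCTYPE html><html><head>" + baseTag + "</head><body>"
             + fragment
             + "</body></html>";
    }
}

static class Program
{
    static void Main()
    {
        // Example fragment standing in for elementToSave.OuterHtml.
        Console.WriteLine(HtmlPage.Wrap("<form action=\"/login\"></form>", "http://example.com/"));
    }
}
```

In the handler this might be used as webBrowser1.DocumentText = HtmlPage.Wrap(elementToSave.OuterHtml, webBrowser1.Url.ToString()); using OuterHtml here keeps the form tag itself, which InnerHtml would drop.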
I'm trying to retrieve data from a webpage but I cannot do it by making a web request and parsing the resulting html file because the actual text that I'm trying to retrieve is not in the html file! I imagine that this text is pulled using some script and for that reason it's not in the html file. For all I know I'm looking at the wrong data, but assuming that my theory is correct, is there a straightforward way to retrieve whatever text is displayed by the browser (Firefox or IE) rather than attempt to fetch the text from the html file?
Assuming you are referring to text that has been generated by JavaScript in the browser, you can use PhantomJS to achieve this: http://phantomjs.org/
It is essentially a headless browser that will process JavaScript.
You may need to run this as an external program, but I'm sure you can do that from C#.
Your other option would be to open the web page in a WebBrowser object which should execute the scripts, and then you can get the HtmlDocument object and go from there.
Take a look at this example...
private void test()
{
    WebBrowser wBrowser1 = new WebBrowser();
    wBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wBrowser1_DocumentCompleted);
    // "Web Page URL" is a placeholder; replace it with the absolute URL of the page
    wBrowser1.Url = new Uri("Web Page URL");
}
void wBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    HtmlDocument document = (sender as WebBrowser).Document;
    // get elements and values accordingly.
}
I have trouble with the printing function of the WebBrowser control.
First I load my page and it is rendered correctly.
Then I set the header/footer/margins and the right printer:
webbrowser printing;
How do I programatically change printer settings with the WebBrowser control?
That works so far. Then I call myBrowser.Print();
But my website does not print correctly. Only the upper left corner is printed, a few centimeters, and then there are scrollbars.
I printed my website with IE9 and everything was all right. I also tried different browser modes and document modes. No problem.
And I thought the control and IE were technically the same...
Are there any parameters I have forgotten?
The website I want to print is old and has no doctype. But since the control displays it correctly, I expect it to print correctly too.
Edit:
I found out that it has to do with JavaScript on the website, which does not run for printing.
Is there a way to get the HTML from the manipulated DOM?
For the print function no external documents are processed. All documents for a website have to be merged into one printable site. To get the other documents you have to cast the myBrowser.Document.DomDocument to an IHTMLDocument2. From that IHTMLDocument2 you can extract the CSS or JS to put it into the html.
For Example:
void myBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Swap handlers so that the reload below triggers printing instead of this method again
    myBrowser.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(myBrowser_DocumentCompleted);
    myBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(myBrowser_DocumentPrintable);
    String mySource = myBrowser.DocumentText;
    // Get the CSS (IHTMLDocument2 and IHTMLStyleSheet come from the mshtml reference)
    IHTMLDocument2 doc = (myBrowser.Document.DomDocument) as IHTMLDocument2;
    IHTMLStyleSheet sheet = (IHTMLStyleSheet)doc.styleSheets.item(0);
    string myCSS = sheet.cssText;
    mySource = mySource.Replace("<link rel=\"stylesheet\" type=\"text/css\" href=\"/css/style.css\">", "<style type=\"text/css\">"+myCSS+"</style>");
    // Reload
    myBrowser.DocumentText = mySource;
}
void myBrowser_DocumentPrintable(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    myBrowser.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(myBrowser_DocumentPrintable);
    myBrowser.Print();
}
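The Replace call above only matches when the link tag is written exactly as in the string literal. A slightly more tolerant variant, sketched here under the assumption that the stylesheet is identified by its href, could use a regular expression instead. `CssInliner.Inline` is a hypothetical helper name, and the pattern is a rough match for simple link tags, not a full HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

static class CssInliner
{
    // Hypothetical helper: replaces every <link ...> tag whose href equals the
    // given value with an inline <style> block containing the supplied CSS.
    public static string Inline(string html, string href, string css)
    {
        // Accepts any attribute order and either quote style around the href value.
        string pattern = "<link\\b[^>]*\\bhref\\s*=\\s*[\"']" + Regex.Escape(href) + "[\"'][^>]*>";
        string styleBlock = "<style type=\"text/css\">" + css + "</style>";

        // A MatchEvaluator is used so that '$' characters in the CSS are not
        // interpreted as regex substitution patterns.
        return Regex.Replace(html, pattern, m => styleBlock, RegexOptions.IgnoreCase);
    }
}

static class Program
{
    static void Main()
    {
        string html = "<head><link rel=\"stylesheet\" type=\"text/css\" href=\"/css/style.css\"></head>";
        Console.WriteLine(CssInliner.Inline(html, "/css/style.css", "body { margin: 0; }"));
    }
}
```

In the handler above this would stand in for the Replace line, for example mySource = CssInliner.Inline(mySource, "/css/style.css", myCSS);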
I have a browser control embedded in a C# Windows app. I want to grab the rendered HTML (which could have been modified by JavaScript), not the original one.
Any suggestions?
You can get the HTML, and indeed set it, with WebBrowser.DocumentText.
Sheng is correct: DocumentText returns the streamed document before scripts run. His code doesn't compile, but it's essentially correct. I found that you need:
mshtml.HTMLDocument doc = webBrowser1.Document.DomDocument as mshtml.HTMLDocument;
string html = doc.documentElement.outerHTML;
DocumentText internally uses the document's IPersistStream interface, which returns the original HTML. Use webBrowser1.Document.DocumentElement.OuterHTML instead.
Add a Navigated event handler to your WebBrowser. Only then will your document be filled.
private void webBrowser1_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
    Console.WriteLine(webBrowser1.DocumentText);
}