Page is cut while printing with the Windows.Forms.WebBrowser control

I have trouble with the printing function of the WebBrowser control.
First I load my page, and it is rendered correctly.
Then I set the header, footer, margins and the right printer, as described in:
webbrowser printing;
How do I programatically change printer settings with the WebBrowser control?
That works so far. Then I call myBrowser.Print();
But my website does not print correctly. Only the upper left corner is printed, a few centimeters of it, and then there are scrollbars.
I printed my website with IE9 and everything was fine. I also tried different browser modes and document modes without any problem.
And I thought the control and IE were technically the same...
Are there any parameters I have forgotten?
The website I want to print is old and has no doctype. But since the control displays it correctly, I expect it to print correctly, too.
Edit:
I found out that it has to do with JavaScript on the website, which does not run for printing.
Is there a way to get the HTML from the manipulated DOM?

The print function does not process external documents. Everything the website needs has to be merged into one printable page. To get at the other documents, cast myBrowser.Document.DomDocument to an IHTMLDocument2. From that IHTMLDocument2 you can extract the CSS or JS and put it into the HTML.
For example:
// Requires a reference to Microsoft.mshtml (using mshtml;).
void myBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Swap handlers so the reload below triggers the print handler instead of this one.
    myBrowser.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(myBrowser_DocumentCompleted);
    myBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(myBrowser_DocumentPrintable);
    string mySource = myBrowser.DocumentText;
    // Get the CSS of the first stylesheet
    IHTMLDocument2 doc = (IHTMLDocument2)myBrowser.Document.DomDocument;
    object index = 0;
    IHTMLStyleSheet sheet = (IHTMLStyleSheet)doc.styleSheets.item(ref index);
    string myCSS = sheet.cssText;
    // Replace the external stylesheet link with an inline style block
    mySource = mySource.Replace("<link rel=\"stylesheet\" type=\"text/css\" href=\"/css/style.css\">", "<style type=\"text/css\">" + myCSS + "</style>");
    // Reload
    myBrowser.DocumentText = mySource;
}
void myBrowser_DocumentPrintable(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Print once the merged document has finished loading.
    myBrowser.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(myBrowser_DocumentPrintable);
    myBrowser.Print();
}
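
The Replace call in the handler only matches one exact link tag string, so a different attribute order or quoting style would leave the link untouched. A minimal sketch of a more tolerant variant follows, assuming a regular expression is good enough for the page in question. PrintHtml and InlineFirstStylesheet are hypothetical names for illustration, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

static class PrintHtml
{
    // Matches a <link ... rel="stylesheet" ...> tag; attribute order and quoting may vary.
    private static readonly Regex StylesheetLink = new Regex(
        @"<link\b[^>]*\brel\s*=\s*[""']?stylesheet[""']?[^>]*>",
        RegexOptions.IgnoreCase);

    // Replaces the first stylesheet <link> tag in html with an inline <style> block.
    // If no such tag is found, html is returned unchanged.
    public static string InlineFirstStylesheet(string html, string cssText)
    {
        string style = "<style type=\"text/css\">" + cssText + "</style>";
        // A MatchEvaluator is used so that '$' in the CSS is not read as a substitution token.
        return StylesheetLink.Replace(html, m => style, 1);
    }
}
```

In the handler this would be called as mySource = PrintHtml.InlineFirstStylesheet(mySource, myCSS); in place of the literal Replace. It only handles the first stylesheet; pages with several would need one pass per stylesheet.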

Related

How do I return all visible text on a web page as a big, unparsed string?

I am looking for a simple script that would basically do the equivalent of a user pressing Ctrl+A (select all) on a web page and then copying the text to the clipboard so I can pull it into a string from there.
The reason I want to emulate a user selecting all and then copying and pasting is that some pages are generated with JavaScript and do not have the visible text in the HTML.
In any case, I am just looking for the raw unparsed text. I do not care if the spacing/line breaks are messed up, etc. I just want a quick and dirty snapshot of all the selectable text on the page into a string.
I have tried the following as an example:
private void button3_Click(object sender, EventArgs e)
{
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load(@"https://mywebsite");
    string str = doc.DocumentNode.InnerText;
    MessageBox.Show(str);
}
but if the page uses JavaScript, the text it displays is not returned.

Instead of
doc.DocumentNode.InnerText;
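use this:
doc.DocumentNode.InnerHtml;
It returns the whole HTML, including JS and CSS. Note that HtmlAgilityPack does not execute scripts, so text that JavaScript generates at runtime will still be missing. Hope it helps.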

With jQuery: $(document).text() or $('body').text()

Display portion of a Website into a Web Browser

I want to navigate to a specific website and then display only a portion of it in the web browser, the part which starts with:
<div id="dex1" ...... </div>
I know I need to get the element by id, but first I tried writing this:
string data = webBrowser.Document.Body.OuterHtml;
So from data I need to grab the content of that "id" and display it, and the rest should be deleted.
Any ideas on this?

webBrowser1.DocumentCompleted += (sender, e) =>
{
    webBrowser1.DocumentText = webBrowser1.Document.GetElementById("dex1").OuterHtml;
};
On second thought, don't do that: setting the DocumentText property causes the DocumentCompleted event to fire again. So maybe do:
webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Unsubscribe first so the reload triggered below does not re-enter this handler.
    webBrowser1.DocumentCompleted -= webBrowser1_DocumentCompleted;
    webBrowser1.DocumentText = webBrowser1.Document.GetElementById("dex1").OuterHtml;
}
Although in most real-world cases I'd expect you'd get better results injecting some JavaScript to do the DOM manipulation, a la Andrei's answer.
Edit: to just replace everything inside the body tag, which might, if you're lucky, keep all the required styling and scripts (if they're all in the head and don't reference any discarded content), you may have some joy with:
webBrowser1.Document.Body.InnerHtml = webBrowser1.Document.GetElementById("dex1").OuterHtml;

You probably need a lot of external resources like scripts and images, so you can add some custom JavaScript to modify the DOM however you like after the document has loaded from your website. Based on "How to update DOM content inside WebBrowser Control in C#?", it would look something like this:
// Requires a reference to Microsoft.mshtml for IHTMLScriptElement.
HtmlElement headElement = webBrowser1.Document.GetElementsByTagName("head")[0];
HtmlElement scriptElement = webBrowser1.Document.CreateElement("script");
IHTMLScriptElement domScriptElement = (IHTMLScriptElement)scriptElement.DomElement;
domScriptElement.text = "function applyChanges(){ $('body > *').hide(); $('#dex1').show().prependTo('body');}";
headElement.AppendChild(scriptElement);
// Call the next line whenever you want to execute your code
webBrowser1.Document.InvokeScript("applyChanges");
This also assumes that jQuery is available on the page so you can do simple DOM manipulation.
The JavaScript code just hides all children of the body and then prepends the '#dex1' div to the body so that it is at the top and visible.

Change contents without redirecting to about:blank

I'm currently working on an application that modifies a specific web page to hide irrelevant information and then displays it in a WebBrowser control in the application window. Unfortunately, as soon as I set the DocumentText property of the WebBrowser, it navigates to about:blank and then displays the HTML content. Because it redirects to about:blank, all relative references in the web page become invalid, creating a very odd-looking web page with no stylesheet whatsoever.
Is there a way I can modify what the WebBrowser control displays without having it redirect to about:blank and therefore ruining all relative references?

This should work for injecting an HTML element into your page without resetting the rest of the DOM:
private void button1_Click(object sender, EventArgs e)
{
    HtmlElement myElem = webBrowser1.Document.CreateElement("input");
    // HtmlElement exposes SetAttribute directly, so no dynamic COM call is needed.
    myElem.SetAttribute("value", "Hello, World!");
    (webBrowser1.Document.GetElementsByTagName("body")[0]).AppendChild(myElem);
}
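
If the page really has to be replaced through DocumentText, a different technique that is sometimes used is to insert a base tag into the modified HTML so that relative URLs resolve against the original site rather than about:blank. The following is a minimal sketch under that assumption; BaseTagHelper and InsertBaseTag are hypothetical names, and whether the control honours the tag for every resource on a given page is not confirmed by the question.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class BaseTagHelper
{
    private static readonly Regex HeadOpen = new Regex(@"<head\b[^>]*>", RegexOptions.IgnoreCase);

    // Inserts <base href="..."> right after the opening <head> tag so that
    // relative URLs in html resolve against baseUrl.
    public static string InsertBaseTag(string html, string baseUrl)
    {
        // Encode the URL for use inside an HTML attribute value.
        string baseTag = "<base href=\"" + WebUtility.HtmlEncode(baseUrl) + "\">";
        if (HeadOpen.IsMatch(html))
        {
            // A MatchEvaluator avoids '$' in the URL being read as a substitution token.
            return HeadOpen.Replace(html, m => m.Value + baseTag, 1);
        }
        // No <head> tag found: prepend the tag and leave placement to the HTML parser.
        return baseTag + html;
    }
}
```

Usage would look like webBrowser1.DocumentText = BaseTagHelper.InsertBaseTag(modifiedHtml, "https://example.com/"); where modifiedHtml and the URL are placeholders for your own values.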

Is there a straightforward way to retrieve text that is rendered by the browser but is not hard-coded in the actual HTML file?

I'm trying to retrieve data from a webpage but I cannot do it by making a web request and parsing the resulting html file because the actual text that I'm trying to retrieve is not in the html file! I imagine that this text is pulled using some script and for that reason it's not in the html file. For all I know I'm looking at the wrong data, but assuming that my theory is correct, is there a straightforward way to retrieve whatever text is displayed by the browser (Firefox or IE) rather than attempt to fetch the text from the html file?

This assumes you are referring to text that has been generated by JavaScript in the browser.
You can use PhantomJS to achieve this: http://phantomjs.org/
It is essentially a headless browser that will process JavaScript.
You may need to run it as an external program, but I'm sure you can do that from C#.

Your other option would be to open the web page in a WebBrowser object, which should execute the scripts; then you can get the HtmlDocument object and go from there.
Take a look at this example...
private void test()
{
    WebBrowser wBrowser1 = new WebBrowser();
    wBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wBrowser1_DocumentCompleted);
    wBrowser1.Url = new Uri("Web Page URL");
}
void wBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    HtmlDocument document = ((WebBrowser)sender).Document;
    // get elements and values accordingly.
}

Get HTML source code from browser control embedded in C#

I have a browser control embedded in a C# Windows app. I want to grab the rendered HTML (which could have been modified by JavaScript), not the original one.
Any suggestions?

You can get the HTML, and indeed set it, with WebBrowser.DocumentText.
Sheng is correct: DocumentText returns the streamed document as it was before scripts ran. His code doesn't compile, but it's essentially correct. I found that you need:
mshtml.HTMLDocument doc = webBrowser1.Document.DomDocument as mshtml.HTMLDocument;
string html = doc.documentElement.outerHTML;

DocumentText internally uses the document's IPersistStream interface, which returns the original HTML. Use webBrowser1.Document.DocumentElement.OuterHTML instead.
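
The same idea can be sketched using only the managed System.Windows.Forms API, without a reference to mshtml. This is a minimal sketch, assuming the document has finished loading before the method is called; RenderedHtmlExtensions and GetRenderedHtml are hypothetical names.

```csharp
using System.Windows.Forms;

static class RenderedHtmlExtensions
{
    // Returns the current DOM serialized as HTML (including changes made by scripts),
    // or null if no document is loaded or it has no <html> element.
    public static string GetRenderedHtml(this WebBrowser browser)
    {
        if (browser == null || browser.Document == null)
        {
            return null;
        }
        // The root <html> element holds the live DOM, unlike DocumentText.
        HtmlElementCollection roots = browser.Document.GetElementsByTagName("html");
        return roots.Count > 0 ? roots[0].OuterHtml : null;
    }
}
```

A call such as string html = webBrowser1.GetRenderedHtml(); would typically go in a DocumentCompleted handler, or later if the page keeps modifying itself after loading.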

Add a Navigated event handler to your WebBrowser. Only then will your document be filled.
private void webBrowser1_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
    Console.WriteLine(webBrowser1.DocumentText);
}
