Print HTML Files from .NET 2.0 App - c#

What's the best method for printing a directory of HTML files in landscape orientation? I don't mind whether a print dialog is shown or not. I've tried several solutions (exhausting Google & StackOverflow), but they either print the HTML source as a string or can't print in landscape.
I'm using a .NET 2.0 Win Forms project to create HTML reports. Now I need to send them to the printer spool.
Thanks

Update:
Now that I understand the requirement correctly, the following should achieve it:
// The example uses the WebBrowser control (version 4.0.0.0 .NET component).
// MSDN: http://msdn.microsoft.com/en-us/library/2te2y1x6(v=vs.80).aspx
// Example 1: print an HTML string using the WebBrowser control
string htmlString = "<html><head><title>Printing from Win forms - Web Browser Control</title></head><body><h1>Hello World....</h1></body></html>";
webBrowser1.DocumentText = htmlString;
// Example 2: print a file or URL using the WebBrowser control
webBrowser1.Url = new Uri("http://www.stackoverflow.com/faq");
// Call the print function or print dialog once the document has loaded
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(PrintFile);
private void PrintFile(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Optionally set up the page first (e.g. set the orientation to landscape), then choose ONE of the print options below
    ((WebBrowser)sender).ShowPageSetupDialog();
    // Print the document now that it is fully loaded.
    ((WebBrowser)sender).Print();
    // OR
    ((WebBrowser)sender).ShowPrintDialog();
    // OR, even better, review the print options in the preview and then print
    ((WebBrowser)sender).ShowPrintPreviewDialog();
    // Dispose the WebBrowser now that the task is complete.
    ((WebBrowser)sender).Dispose();
}
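The question asks about a whole directory rather than a single file. The following is a minimal sketch of how the handler above might be applied to every report in a folder, not a tested solution. It assumes the reports use the .html extension and that the code is called from the UI (STA) thread of a running Windows Forms application. HtmlBatchPrinter, FindHtmlFiles and PrintDirectory are hypothetical names. It uses one WebBrowser per document and calls Print() followed by Dispose(), as in the handler above. It does not set landscape itself: the WebBrowser class has no orientation property, so the orientation still comes from the page setup dialog or the printer's defaults.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

// Hypothetical helper; the names are illustrative only.
static class HtmlBatchPrinter
{
    // Returns a file URI for every *.html file directly inside the directory.
    public static Uri[] FindHtmlFiles(string directory)
    {
        // GetFullPath makes the paths absolute so that new Uri(...) accepts them.
        string[] paths = Directory.GetFiles(Path.GetFullPath(directory), "*.html");
        Uri[] uris = new Uri[paths.Length];
        for (int i = 0; i < paths.Length; i++)
        {
            uris[i] = new Uri(paths[i]);
        }
        return uris;
    }

    // Loads each file in its own WebBrowser and prints it once it has loaded.
    // Must be called from the UI (STA) thread of a running WinForms application.
    public static void PrintDirectory(string directory)
    {
        foreach (Uri file in FindHtmlFiles(directory))
        {
            WebBrowser browser = new WebBrowser();
            browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(PrintAndDispose);
            browser.Url = file;
        }
    }

    private static void PrintAndDispose(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser browser = (WebBrowser)sender;
        // Sends the document to the default printer with the current page settings.
        browser.Print();
        browser.Dispose();
    }
}
```

From a form this could be called as HtmlBatchPrinter.PrintDirectory(reportFolder), where reportFolder is whatever path the reports were written to. If the dialogs are acceptable, replacing Print() with ShowPrintDialog() would let the user pick the orientation per document.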

Related

Html Agility Pack, Web scraping [duplicate]

How can I scrape data that are dynamically generated by JavaScript in html document using C#?
Using WebRequest and HttpWebResponse in the C# library, I'm able to get the whole html source code as a string, but the difficulty is that the data I want isn't contained in the source code; the data are generated dynamically by JavaScript.
On the other hand, if the data I want are already in the source code, then I'm able to get them easily using Regular Expressions.
I have downloaded HtmlAgilityPack, but I don't know if it would take care of the case where items are generated dynamically by JavaScript...
Thank you very much!
When you make the WebRequest you're asking the server to give you the page file. This file's content hasn't yet been parsed or executed by a web browser, so the JavaScript on it hasn't done anything yet.
You need to use a tool to execute the JavaScript on the page if you want to see what the page looks like after being parsed by a browser. One option you have is using the built-in .NET web browser control: http://msdn.microsoft.com/en-au/library/aa752040(v=vs.85).aspx
The web browser control can navigate to and load the page, and then you can query its DOM, which will have been altered by the JavaScript on the page.
EDIT (example):
Uri uri = new Uri("http://www.somewebsite.com/somepage.htm");
webBrowserControl.AllowNavigation = true;
// optional but I use this because it stops javascript errors breaking your scraper
webBrowserControl.ScriptErrorsSuppressed = true;
// you want to start scraping after the document is finished loading so do it in the function you pass to this handler
webBrowserControl.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowserControl_DocumentCompleted);
webBrowserControl.Navigate(uri);
private void webBrowserControl_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlElementCollection divs = webBrowserControl.Document.GetElementsByTagName("div");
foreach (HtmlElement div in divs)
{
//do something
}
}
You could take a look at a tool like Selenium for scraping pages that use JavaScript.
http://www.andykelk.net/tech/headless-browser-testing-with-phantomjs-selenium-webdriver-c-nunit-and-mono

Display PDF in WPF WebBrowser

On my PC this works fine, but on some PCs the file will not open: the WebBrowser displays an error, and the file opens in the default PDF program instead of in the WebBrowser.
My code:
Uri GuideURI = new Uri(String.Format("file:///{0}/../PDFs/" + link + ".pdf", Directory.GetCurrentDirectory()));
PDF_Web_Browser.Navigate(GuideURI);
One way to resolve this issue might be to not rely on the PC's PDF reader software.
You can use MuPDF as a library to extract the text from PDF and maybe write the content of it in XML format, then navigate to the file.
If you don't want to go this far, you can show an error message when trying to display a PDF file on a PC that doesn't have the required features to open it in the WebBrowser (source).
private void webBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
string url = e.Url.ToString();
if (url.StartsWith("res://ieframe.dll/navcancl.htm") && url.EndsWith("pdf"))
{
e.Cancel = true;
MessageBox.Show("Cannot open PDF!");
}
}
Or you can even mix the two approaches: in case the WebBrowser can't open the PDF file, you can show a message like "PDF addon not detected" and then display the XML file generated with the help of the MuPDF library.
Maybe it's because WebBrowser uses the Internet Explorer engine. If the user doesn't have the PDF extension for it installed, or has an older version of IE, they won't be able to open a PDF in the WebBrowser.
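Separately from the plug-in issue, the URI in the question is assembled by string concatenation around a "/../" segment. The following is a small sketch of building the same location with Path and Uri instead, assuming the PDFs folder sits next to the base directory as in the question. PdfLocator and BuildGuideUri are hypothetical names. This does not help on a PC without a PDF reader for Internet Explorer; it only rules out a malformed path as a cause.

```csharp
using System;
using System.IO;

// Hypothetical helper; the names are illustrative only.
static class PdfLocator
{
    // Resolves <baseDirectory>/../PDFs/<link>.pdf to an absolute file URI.
    public static Uri BuildGuideUri(string baseDirectory, string link)
    {
        // Nested two-argument Combine calls keep this compatible with .NET 2.0.
        string relative = Path.Combine("..", Path.Combine("PDFs", link + ".pdf"));
        // GetFullPath collapses the ".." segment into a normal absolute path.
        string fullPath = Path.GetFullPath(Path.Combine(baseDirectory, relative));
        return new Uri(fullPath);
    }
}
```

With this helper, the call in the question would become PDF_Web_Browser.Navigate(PdfLocator.BuildGuideUri(Directory.GetCurrentDirectory(), link)), and File.Exists(uri.LocalPath) could be checked first to tell a missing file apart from a missing plug-in.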

is there a straightforward way to retrieve text that is rendered by the browser but is not hard-coded in the actual html file?

I'm trying to retrieve data from a webpage but I cannot do it by making a web request and parsing the resulting html file because the actual text that I'm trying to retrieve is not in the html file! I imagine that this text is pulled using some script and for that reason it's not in the html file. For all I know I'm looking at the wrong data, but assuming that my theory is correct, is there a straightforward way to retrieve whatever text is displayed by the browser (Firefox or IE) rather than attempt to fetch the text from the html file?
I assume you are referring to text that has been generated by JavaScript in the browser.
You can use PhantomJS to achieve this: http://phantomjs.org/
It is essentially a headless browser that will process JavaScript.
You may need to run it as an external program, but I'm sure you can do that from C#.
Your other option would be to open the web page in a WebBrowser object which should execute the scripts, and then you can get the HtmlDocument object and go from there.
Take a look at this example...
private void test()
{
WebBrowser wBrowser1 = new WebBrowser();
wBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wBrowser1_DocumentCompleted);
wBrowser1.Url = new Uri("Web Page URL");
}
void wBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlDocument document = (sender as WebBrowser).Document;
// get elements and values accordingly.
}

Page is cut while printing with the Windows.Forms.WebBrowser control

I have trouble with the printing function of the WebBrowser control.
First I load my page, and it is rendered correctly.
Then I set the header/footer/margins and the right printer, as described in these questions:
"webbrowser printing"
"How do I programmatically change printer settings with the WebBrowser control?"
That works so far. Then I call myBrowser.Print();
But my website does not print correctly. Only the upper left corner is printed, a few centimeters, and then there are scrollbars.
I printed my website with IE9 and everything was all right. I also tried different browser modes and document modes. No problem.
And I thought the control and IE were technically the same...
Are there any parameters I have forgotten?
The website I want to print is old and has no doctype. But since the control displays it correctly, I expect it to print correctly, too.
Edit:
Found out that it has to do with javascript on the website, which does not run for printing.
Is there a way to get the HTML from the manipulated DOM?
The print function does not process external documents, so everything the page needs has to be merged into a single printable document. To get at the external resources, cast myBrowser.Document.DomDocument to an IHTMLDocument2. From that IHTMLDocument2 you can extract the CSS or JS and put it into the HTML.
For Example:
void myBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
myBrowser.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(myBrowser_DocumentCompleted);
myBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(myBrowser_DocumentPrintable);
String mySource = myBrowser.DocumentText;
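// Get the CSS (needs a reference to Microsoft.mshtml and "using mshtml;")
IHTMLDocument2 doc = (myBrowser.Document.DomDocument) as IHTMLDocument2;
// item() takes its index by reference and returns object, so cast the result
object firstSheet = 0;
string myCSS = ((IHTMLStyleSheet)doc.styleSheets.item(ref firstSheet)).cssText;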
mySource = mySource.Replace("<link rel=\"stylesheet\" type=\"text/css\" href=\"/css/style.css\">", "<style type=\"text/css\">"+myCSS+"</style>");
// Reload
myBrowser.DocumentText = mySource;
}
void myBrowser_DocumentPrintable(object sender, WebBrowserDocumentCompletedEventArgs e)
{
myBrowser.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(myBrowser_DocumentPrintable);
myBrowser.Print();
}
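The Replace call above only works when the link tag in the page matches the hard-coded string exactly. The following is a hedged sketch of a more tolerant variant, assuming a regular expression is acceptable for this one tag. CssInliner and InlineFirstStylesheetLink are hypothetical names, and only the first stylesheet link is replaced, to match the single cssText read from styleSheets.item(0).

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
static class CssInliner
{
    // Matches a <link ... rel="stylesheet" ...> tag regardless of case,
    // quoting style, or attribute order.
    private static readonly Regex StylesheetLink = new Regex(
        "<link\\b[^>]*rel\\s*=\\s*[\"']?stylesheet[\"']?[^>]*>",
        RegexOptions.IgnoreCase);

    // Replaces the first stylesheet link in the HTML with an inline <style> block.
    public static string InlineFirstStylesheetLink(string html, string css)
    {
        string style = "<style type=\"text/css\">" + css + "</style>";
        // A MatchEvaluator is used so that '$' characters in the CSS
        // are not interpreted as regex substitution patterns.
        return StylesheetLink.Replace(html, delegate(Match m) { return style; }, 1);
    }
}
```

In the handler above this would be used as mySource = CssInliner.InlineFirstStylesheetLink(mySource, myCSS); before assigning DocumentText.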

How does Internet Explorer prepare the print preview window?

I am wondering how Internet Explorer, Mozilla Firefox or any other browser generates the print preview window for a web page loaded into the browser.
The preview image differs from the page in various ways: banners and ads are removed, the background is white, the text is black, and so on.
We would like to implement a similar print preview window using the C# WebBrowser control, and I don't want to use the browser's default print preview feature, such as the ExecWB command.
Please shed some light on this.
Thanks,
Ramanand Bhat.
You could try to alter the styles by accessing and modifying the HTMLDocument LINK elements.
HtmlDocument document = WebBrowser1.Document;
foreach (HtmlElement element in document.GetElementsByTagName("LINK"))
{
string cssMedia = element.GetAttribute("Media");
if (cssMedia == "print")
element.SetAttribute("Media", "screen"); //sets print styles to display normally
else
element.SetAttribute("Media", "hidden"); //hides normal styles
}
This will change your print-styles to display in screen view (i.e. as a normal stylesheet without having to use the print-preview window) and your screen-styles to not be shown (as they don't have a Media type of screen anymore)
This is sample code so doesn't have any error checking. It might also have some syntax errors but it should be a start to achieve your goal.
To print a screen you need to set up a call to window.print() in JavaScript, for example from a "Print screen" link.
It will then use whatever CSS you have assigned as 'print' in the page to render the page as a preview.
As far as I know, the banners, advertisements, et cetera are not removed by the browser during a print preview. CSS governs the appearance when the media is print.
