I have problems printing HTML pages with mshtml.dll. My pages contain links to Bootstrap CSS. In the browser (IE, FF, Chrome) everything looks fine and prints fine, but when I try to print a page with MSHTML something goes wrong and the page looks as if the CSS were not applied. I am loading the HTML content from disk and then printing it:
var htmlContent = File.ReadAllText(filePath);
var htmlDocument = new HTMLDocument() as IHTMLDocument2;
htmlDocument.write(htmlContent);
htmlDocument.execCommand("Print", true, 0);
var styleSheets = htmlDocument.styleSheets; // contains all linked css files
Has anyone experienced this problem? Or are there other ways to print an HTML page without a browser?
I am using the SelectPDF library in my ASP.NET web application to convert dynamically built HTML to PDF and then print it. The conversion simply looks like this:
HtmlToPdf converter = new HtmlToPdf();
converter.Options.PdfPageSize = PdfPageSize.A5;
converter.Options.PdfPageOrientation = PdfPageOrientation.Landscape;
PdfDocument doc = converter.ConvertHtmlString(FormattedLabelHTML);
doc.Save(Response, false, "Label_" + hfDispatchID.Value + ".pdf");
doc.Close();
The HTML itself is fine: looking at the FormattedLabelHTML string in debug mode shows it's complete and looks as it should (it's just HTML tables for each page, with page breaks between them).
But if the number of pages that should be printed is more than 12, then the rendered content stops at 12 pages, leaving the 12th page half-rendered, and the rest completely missing. If the page count is under 12, then they all render properly.
No errors are thrown and everything executes properly. So does anyone have any tips as to why it can't print more than 12 pages?
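One way to narrow this down (a diagnostic sketch of my own, not something from this thread): render the same HTML to a file on disk instead of streaming it to the Response, then open the file and count the pages. That tells you whether the truncation happens during conversion or while writing to the Response. The file path is a placeholder.
HtmlToPdf converter = new HtmlToPdf();
converter.Options.PdfPageSize = PdfPageSize.A5;
converter.Options.PdfPageOrientation = PdfPageOrientation.Landscape;
PdfDocument doc = converter.ConvertHtmlString(FormattedLabelHTML);
doc.Save(@"C:\temp\Label_debug.pdf"); // placeholder path; inspect this file's page count
doc.Close();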
I am using HtmlAgilityPack to get all the href links on a page, but it doesn't return all of them.
I tried the page in a browser and saw that it doesn't show all the links until you scroll all the way down. Then I resized (zoomed) the browser window so that the whole page was visible without scrolling, and at that moment all the links appeared. Maybe the JavaScript needs to be triggered...
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument Doc = web.Load("https://www.verkkokauppa.com/fi/catalog/438b/Televisiot/products?page=1");
foreach (HtmlNode item in Doc.DocumentNode.SelectNodes("//li[@class='product-list-grid__grid-item']/a"))
{
Debug.WriteLine(item.GetAttributeValue("href", string.Empty));
}
One page has 24 product links but I get only 15 of them.
Check the Network tab in Chrome on that page. There are AJAX requests to https://www.verkkokauppa.com/resp-api/product?pids=467610, so the products are loaded using JavaScript.
You can't just trigger JavaScript here: HtmlAgilityPack is an HTML parser. If you want to work with dynamic content you need a browser engine. I think you should check out Selenium and PhantomJS.
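A rough sketch of that route with Selenium (ChromeDriver is used here just as one possible driver; the CSS selector mirrors the XPath above, and the fixed sleep stands in for a proper WebDriverWait):
using System;
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class ProductLinkScraper
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("https://www.verkkokauppa.com/fi/catalog/438b/Televisiot/products?page=1");

            // The products are injected by JavaScript, so give the scripts time to run.
            // A real scraper would use WebDriverWait instead of a fixed sleep.
            Thread.Sleep(3000);

            foreach (IWebElement link in driver.FindElements(By.CssSelector("li.product-list-grid__grid-item > a")))
            {
                Console.WriteLine(link.GetAttribute("href"));
            }
        }
    }
}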
If I use this
WebClient client = new WebClient();
String htmlCode = client.DownloadString("http://test.net");
I am able to use the Agility Pack to scan the HTML and get most of the tags that I need, but it's missing the HTML that is rendered by the JavaScript.
My question is: how do I get the final rendered page source using C#? Is there something more to WebClient that gets the final rendered source after the JavaScript has run?
The HTML Agility Pack alone is not enough to do what you want; you need a JavaScript engine as well. To do that, you may want to check out something like GeckoFX, which will allow you to embed a fully functional web browser into your application and then programmatically access the contents of the DOM after the page has rendered.
http://code.google.com/p/geckofx/
You need to wrap a browser in your application.
You are in luck! There is a .NET wrapper for WebKit. http://webkitdotnet.sourceforge.net/
You can use the WebBrowser Class from System.Windows.Forms.
using (WebBrowser wb = new WebBrowser())
{
//Code here
}
https://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser(v=vs.110).aspx
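For example, a minimal sketch of that approach (my own illustration, not taken from the documentation linked above): navigate, wait for DocumentCompleted so the scripts have had a chance to run, then read the rendered DOM. It assumes an STA thread with a message loop, which the WebBrowser control requires; the URL is the one from the question.
using System;
using System.Windows.Forms;

static class RenderedHtmlGrabber
{
    [STAThread]
    static void Main()
    {
        var wb = new WebBrowser { ScriptErrorsSuppressed = true };
        wb.DocumentCompleted += (s, e) =>
        {
            // By now the page's scripts have run, so this is the rendered DOM,
            // not the raw source that WebClient would give you.
            if (wb.Document != null && wb.Document.Body != null)
            {
                Console.WriteLine(wb.Document.Body.OuterHtml);
            }
            Application.ExitThread();
        };
        wb.Navigate("http://test.net"); // URL taken from the question above
        Application.Run(); // pump messages until DocumentCompleted fires
    }
}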
The Problem:
I'm running a WinForms application with an embedded WebBrowser control. I've used the magic registry setting to switch this control to IE 8 mode (as answered here: Will the IE9 WebBrowser Control Support all of IE9's features, including SVG?).
But now, if I navigate to a website which contains the meta tag X-UA-Compatible IE=9 (as per http://msdn.microsoft.com/en-us/library/cc288325(v=vs.85).aspx), my WebBrowser control switches to IE9 mode and ignores the registry setting.
I would like my control to stay in IE8 mode...
My solution attempts
I've tried to remove the meta tag after the control has loaded (DocumentCompleted) using IHTMLDOMNode.removeChild, but the control does not re-render the page.
I've tried to load the HTML content manually (using WebClient), remove the meta tag and feed it into the WebBrowser control (using Document.Write or DocumentText), but this way the control refuses to load any other content (like images).
Help
Now I'm out of ideas short of writing my own HTTP proxy and modifying the response on the way through (which I would rather not do).
Anyone any ideas?
I'm using .NET 4, I cannot change the website which will be displayed, and I need it to render in IE8 mode regardless of the X-UA-Compatible tag...
Thanks!
I had problems with DocumentText too - I gave up with it.
My solution was to write an in-process HTTP server and point the WebBrowser at that.
I wrote an article about it here: http://SimplyGenius.net/Article/WebBrowserEx
In my case, I was getting the content from the file system.
You'd have to change it to make calls to your target website, but it shouldn't be too much work.
Then you can modify the HTML as you like, and links will still work.
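Not the article's code, but a rough sketch of the shape of such a server using HttpListener; the port and the target site are placeholders, and the meta-tag removal is a crude regex just to show the idea. Point the WebBrowser control at http://localhost:8123/ and relative links keep working, because every request comes back through the listener.
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

class InProcessServer
{
    const string TargetSite = "http://example.com"; // placeholder: the site you need in IE8 mode

    static void Main()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:8123/"); // placeholder port
        listener.Start();

        while (true)
        {
            HttpListenerContext context = listener.GetContext();

            // Fetch the real page (or image, script, ...) on behalf of the WebBrowser control.
            byte[] body;
            string contentType;
            using (var client = new WebClient())
            {
                body = client.DownloadData(TargetSite + context.Request.Url.PathAndQuery);
                contentType = client.ResponseHeaders[HttpResponseHeader.ContentType] ?? "text/html";
            }

            // Only touch HTML responses: strip the X-UA-Compatible meta tag before serving.
            if (contentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
            {
                string html = Encoding.UTF8.GetString(body); // assumes UTF-8; adjust for your site
                html = Regex.Replace(html, "<meta[^>]*X-UA-Compatible[^>]*>", string.Empty,
                                     RegexOptions.IgnoreCase);
                body = Encoding.UTF8.GetBytes(html);
            }

            context.Response.ContentType = contentType;
            context.Response.OutputStream.Write(body, 0, body.Length);
            context.Response.Close();
        }
    }
}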
I don't know of a way to make the WebBrowser control ignore that tag and not override your registry setting. For a quick (dirty) workaround you could do the following.
Create a request for the site which you want to show in the WebBrowser control.
var requestUri = new Uri("http://stackoverflow.com/");
var request = (HttpWebRequest) WebRequest.Create(requestUri);
Get the response.
var response = request.GetResponse();
using (var stream = response.GetResponseStream())
using (var reader = new StreamReader(stream))
{
var html = reader.ReadToEnd();
//...
}
Use NuGet to install the HTMLAgilityPack.
http://nuget.org/packages/HtmlAgilityPack
Load the HTML you've just retrieved in an HtmlDocument instance.
var document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(html);
Select the tag. Here I use StackOverflow.com as an example and select its stylesheet nodes instead. When found, just remove them.
var nodes = document.DocumentNode.SelectNodes("//link[@rel=\"stylesheet\"]");
foreach(var node in nodes)
{
node.ParentNode.RemoveChild(node);
}
All that remains is to retrieve the modified HTML and feed it directly to the WebBrowser control.
html = document.DocumentNode.OuterHtml;
webBrowser.DocumentText = html;
The document should now load without the stylesheet; the control cannot interpret what's not there.
You could do the same to solve your issue: issue a request, get the response, modify the HTML and feed it to the WebBrowser control. I tested it, and it seems to load the rest of the document OK.
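For the X-UA-Compatible case from the question, the selection step would target the meta tag instead; the XPath below is my assumption of what that tag looks like, not something taken from the answer above.
// Same pattern as the stylesheet example, but aimed at the compatibility meta tag.
// SelectNodes returns null when nothing matches, hence the null check.
var metaNodes = document.DocumentNode.SelectNodes("//meta[@http-equiv='X-UA-Compatible']");
if (metaNodes != null)
{
    foreach (var meta in metaNodes)
    {
        meta.ParentNode.RemoveChild(meta);
    }
}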
Hi, I tried to read a page using HttpWebRequest like this:
string lcUrl = "http://www.greatandhra.com";
HttpWebRequest loHttp = (HttpWebRequest)WebRequest.Create(lcUrl);
loHttp.Timeout = 10000; // 10 secs
loHttp.UserAgent = "Code Sample Web Client";
HttpWebResponse loWebResponse = (HttpWebResponse)loHttp.GetResponse();
Encoding enc = Encoding.GetEncoding(1252); // Windows default Code Page
StreamReader loResponseStream =
new StreamReader(loWebResponse.GetResponseStream(), enc);
string lcHtml = loResponseStream.ReadToEnd();
mydiv.InnerHtml = lcHtml;
// Response.Write(lcHtml);
loWebResponse.Close();
loResponseStream.Close();
I am able to read that page and bind it to mydiv, but when I click on any of the links in that div nothing is displayed, because my application doesn't contain the entire site. So what should I do now?
Can somebody copy my code and test it, please?
Nagu
I'm fairly sure you can't insert a full page into a DIV without breaking something. In fact the whole head tag may be getting skipped altogether (and any JavaScript code there may not be run). Considering what you seem to want to do, I suggest you use an IFRAME with a dynamic src, which will also hopefully take some pressure off your server (it would no longer be in charge of fetching the HTML to be mirrored).
If you really want a whole page of HTML embedded in another, then the IFRAME tag is probably the one to use, rather than the DIV.
Rather than having to create a web request and all that code to retrieve the remote page, you can just set the src attribute of the IFRAME to point to the page you want it to display.
For example, something like this in markup:
<iframe src="<%=LcUrl %>" frameborder="0"></iframe>
where LcUrl is a property on your code-behind page that exposes your string lcUrl from your sample.
Alternatively, you could make the IFRAME runat="server" and set its src property programmatically, as sketched below (or even inject the innerHTML in a way similar to your code sample if you really wanted to).
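A minimal sketch of that server-side variant (the control ID is my own placeholder). In the markup:
<iframe id="remoteFrame" runat="server" frameborder="0"></iframe>
And in the code-behind:
// Point the server-side iframe at the remote page; lcUrl is the string from the question's sample.
remoteFrame.Attributes["src"] = lcUrl;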
The code you are putting inside .InnerHtml of the div contains the entire page (including <html>, <body>, </body> and </html>), which can cause a myriad of problems in any number of browsers.
I would either move to an iframe, or consider parsing the remote site's HTML and displaying a transformed version (i.e. strip the HTML, BODY and META tags, replace some link URLs, etc.).
But when I click on any of the links in that div nothing is displayed
Probably because the links in the downloaded page are relative... If you just copy the HTML into a DIV in your page, the browser considers the links relative to the current URL: it doesn't know about the origin of this content. I think the solution is to parse the downloaded HTML and convert the relative URLs in the href attributes to absolute URLs.
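Something along those lines with the Html Agility Pack (a rough sketch, not tested against that site; lcHtml and mydiv come from the question's code):
using System;
using HtmlAgilityPack;

var baseUri = new Uri("http://www.greatandhra.com");
var doc = new HtmlDocument();
doc.LoadHtml(lcHtml);

// Rewrite every href so relative links resolve against the original site
// instead of against your own application's URL.
var links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (var link in links)
    {
        string href = link.GetAttributeValue("href", string.Empty);
        link.SetAttributeValue("href", new Uri(baseUri, href).ToString());
    }
}

mydiv.InnerHtml = doc.DocumentNode.OuterHtml;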
If you want to embed it, you need to strip everything but the body part. That means you have to parse your string lcHtml for <body...> and remove everything up to and including the body tag. You must also strip away everything from </body> onwards. Then you need to parse the string for all occurrences of <a href="..."> that do not start with http:// and prepend http://www.greatandhra.com, or set <base href="http://www.greatandhra.com"> in your head section.
If you don't want to embed it, simply clear the response buffer and stream the lcHtml string back to the browser.
It sounds like what you are trying to do is display a different site embedded in your own. For this to work by dropping it into a div, you would have to extract the code between the body tags, as it wouldn't be valid with html and head tags in the middle of another page.
The links won't work because you've now taken that page out of its context, so you'd also have to rewrite any relative links (i.e. ones that don't start with http) to point to a page on your site which then fetches the other site's page and displays it back within your site. Alternatively, you could prepend the URL of the site you're grabbing to all the relative links so that they link back to that site.