Get a snapshot of posted HTML page? - c#

I'm using ExpertPDF to convert a couple of webpages to PDF, and there's one that I'm having difficulties with. This page only renders content when data is POSTed to it, and the content is text and a PNG graph (the graph is the most important piece).
I tried creating a page with a form that auto-submits on the body onload='' event. If I go to this page, it auto-posts to the 3rd party page and I get the page as I expect. But it appears ExpertPDF won't take a 'snapshot' if the page is redirected.
I tried using HTTPRequest/Response and WebClient, but have only been able to retrieve the HTML, which doesn't include the PNG graph.
Any idea how I can create a MemoryStream that includes the HTML AND the PNG graph, or post to the page and then somehow send ExpertPDF to that URL to take a snapshot of the posted results?
Help is greatly appreciated - I've spent too much time on this one *sniff*.
Thanks!

In HTML/HTTP the web page (the HTML) is a separate resource from any images it includes. So you would need to parse the HTML and find the URL that points to your graph, and then make a second request to that URL to get the image. (This is unless the page spits the image out inline, which is pretty rare, and if that were the case you probably wouldn't be asking.)
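The "parse the HTML and find the URL" step can be sketched roughly as follows, in Python for brevity; the same steps apply with .NET's HTTP and HTML-parsing classes. The class and function names and the example.com URLs are made up for illustration. The sketch only resolves the image URLs; fetching each one is the second request described above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URL of every <img> in a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the URL the HTML came from.
                self.image_urls.append(urljoin(self.base_url, src))


def find_image_urls(html, base_url):
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls
```

Each URL returned by find_image_urls would then be requested separately, typically with the same cookies or session as the original POST if the graph is generated per request.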
A quick look at ExpertPDF's FAQ page shows a question that deals specifically with your problem. I recommend you take a look at it.
** UPDATE **
Take a look at the second FAQ question:
Q: When I convert a HTML string to PDF, the external CSS files and images are not applied in the rendered PDF document.
You can take the original (single) response from your WebClient and convert that into a string and pass that string to ExpertPDF based on the answer to that question.
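One generic way to make relative image and CSS paths in an HTML string resolve is to insert a <base href> element pointing at the original site. The sketch below (Python, hypothetical function name) shows that idea only; whether ExpertPDF honors <base> or instead expects a base URL argument is an assumption to check against the FAQ answer.

```python
import html
import re


def add_base_href(page_html, base_url):
    """Insert <base href="..."> so relative URLs resolve against base_url."""
    base_tag = '<base href="%s">' % html.escape(base_url, quote=True)
    # Put the tag right after the opening <head ...> tag when there is one.
    new_html, count = re.subn(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + base_tag,
        page_html,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        # No <head> found: fall back to prepending the tag.
        new_html = base_tag + page_html
    return new_html
```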

Related

Load picture from a .NET assembly resource in a simple HTML file?

I have a .NET Server Control app that simply returns some HTML. I also need to embed several picture files into the assembly so that the HTML file can use them as its src= for each of them.
We will simply have a .HTML file that lives in the project as an embedded resource and the server control code will read this html and serve it up. Within THAT html, we will need to have all the picture src links (as well as CSS, js, etc) to point back to embedded resource files.
Does anyone know what code I would put in the HTML file for the pictures to make it point back to the embedded picture file?
I have to do this on a grand scale... hundreds of times. I really would like a programmatic approach to doing this so I can write a wrapper and never have to touch it again when we update the server control with new html, picture files, etc.
One might imagine a way to do this at compile time where I can loop through the embedded files with GetManifestResourceNames and then replace() the src links with the HTTP resource links I suppose?
Thank you for any guidance!
Hm, your question covers quite a few aspects. Let me repeat it to see if I got it: you have an assembly with a raw HTML file in it. This file references some items that are found within the same assembly, and you want to have them served to the client upon request as well.
One possible solution might be this.
Instead of a raw HTML file, use a templated one. Then feed all available resource names as proper URLs into the templating engine to replace the placeholders. You may want to look at DotLiquid for this.
Create an HTTP handler for each file type you want to serve. Inside the handler, you pull the item from the resources of the DLL and serve it.
Alternatively, if those resources are rather small, you may want to look into the data URI scheme to save the extra requests and omit the handler. With this you could replace the placeholders with the data URIs directly and serve a single HTML file with everything in it in the first place.
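A minimal sketch of the data URI variant, in Python for brevity. The {{name}} placeholder syntax and both function names are invented for this example and are not DotLiquid's API; in the .NET control the bytes would come from the assembly's embedded resources.

```python
import base64


def to_data_uri(data, mime_type):
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)


def fill_template(template, resources):
    """Replace each {{name}} placeholder with the data URI of that resource.

    resources maps a placeholder name to a (bytes, mime_type) pair.
    """
    page = template
    for name, (data, mime_type) in resources.items():
        page = page.replace("{{%s}}" % name, to_data_uri(data, mime_type))
    return page
```

Because every image is inlined, the page grows by roughly a third of each image's size (base64 overhead), which is why this suits small resources only.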
Another choice is to have your .NET Server Control app check for optional GET arguments and return the image instead of the HTML.
Your original HTML request might be a simple:
GET netServerApp
Which returns the HTML with normal embedded links.
The HTML image links in the HTML might look like this:
<img src="netServerApp?src=Image1.svg">
or the like. Your server app would then return the appropriate image, instead of the HTML.
It means several round trips to get everything, but that is normal for HTML anyway.
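The dispatch described above might look like the following sketch (Python for brevity). The handle_get function, the RESOURCES table, and the returned (status, content type, body) tuple are assumptions made for illustration; a real server control would read the bytes from its embedded resources.

```python
from urllib.parse import parse_qs, urlparse

# Stand-in for the embedded resources: name -> (bytes, MIME type).
RESOURCES = {
    "Image1.svg": (b"<svg xmlns='http://www.w3.org/2000/svg'/>", "image/svg+xml"),
}

PAGE = '<html><body><img src="netServerApp?src=Image1.svg"></body></html>'


def handle_get(url):
    """Return (status, content_type, body) for a GET request."""
    query = parse_qs(urlparse(url).query)
    names = query.get("src")
    if not names:
        # No src argument: serve the HTML page itself.
        return 200, "text/html", PAGE.encode("utf-8")
    # Look the name up in a fixed table so arbitrary paths cannot be requested.
    resource = RESOURCES.get(names[0])
    if resource is None:
        return 404, "text/plain", b"not found"
    data, mime_type = resource
    return 200, mime_type, data
```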

How to convert a HTML Table in an MVC View to PDF file, ITextSharp

I've been pointed in the direction of ITextSharp. When I went to download the package from NuGet I noticed something called RazorToPDF, only to discover unsolvable formatting issues due to the project no longer being supported.
After more research I was surprised to find there wasn't a similarly worded question as this on SO.
So guys, what's the best way to convert a HTML page/table in an MVC project to a PDF file?
What's the best way to convert a HTML page/table in an MVC project to a PDF file?
Generally, print it to a PDF from the web browser on the client.
The thing is, by relying on the end-user perspective of the view in this case, you're also relying on the end-user rendering of that view. It's a step that should be removed from this particular equation entirely.
Keep in mind that there are fundamental differences between how an HTML page renders and how a PDF renders. The two aren't 100% interchangeable. A PDF has a static page size and elements are placed absolutely, whereas HTML has dynamic sizes and elements are placed in a flow layout. There are additional considerations such as client-side DOM manipulation that may take place in that view. "Rendering" it quickly becomes a browser-based activity, which is something you shouldn't really need to do server-side.
Instead of thinking of the PDF as an extra step following the rendering of the view, think of it as a view in and of itself, parallel to the other view. One requested action results in the HTML view, another requested action results in the PDF "view". As such, you design the PDF template how you want it to look and populate it with data (using something like iTextSharp) before returning the file contents to the client.

asp.net screenshot

I need to work on a new feature where a user can take a screenshot of the content in the browser via an asp.net page and save it automatically as a JPEG. Is there an example, or can someone give me some idea of how I can do that?
I would really appreciate it.
Thanks in advance, Lazile
The long and short of it is that you'll need to render the page on the server and take a picture of it. Depending on your format needs, there are a variety of ways to do this.
Here's a link to a tutorial for getting a snapshot in jpg, bmp, png, etc.
If you need to get it to pdf format, I would recommend either using a program like wkhtmltopdf, or using the information from the tutorial and then pasting that image into a pdf.
Check this out. It's JavaScript on the client side. It might meet your needs with some tweaking.
Edit: caveat is that it doesn't work in <IE9 as it uses HTML 5 canvas.

Scraping content from webpage

I need to scrape a remote html page looking for images and links. I need to find an image that is "most likely" the product image on the page and links that are "near" that image. I currently do this with a javascript bookmarklet so that I am able to get the rendered x/y coordinates of images and links to help me determine if those are the ones that I want.
What I want is the ability to get this information by just using a URL and not the bookmarklet. The issue is that by using the URL and trying something like HttpWebRequest to get the HTML on the server, I will not have location values since the page wasn't rendered in a browser. I need the location of images and links to help me determine the images and links that I want.
So how can I get html from a remote site on the server AND use the rendered location values of the dom elements to help me locate images and links?
As you indicate, doing this purely through inspection of the html is a royal pain (especially when CSS gets involved). You could try using the WebBrowser control (which hosts IE), but I wonder if looking for an appropriate, supported API might be better (and less likely to get you blocked). If there isn't an API or similar, you probably shouldn't be doing this. So don't.
You can download the page with HttpWebRequest and then use the HtmlAgilityPack to parse out the data that you need.
You can download it from http://htmlagilitypack.codeplex.com/
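As a rough illustration of the "parse out the data" step, here is a Python sketch that lists images and links in the order they appear in the markup (it does not use HtmlAgilityPack, and the names are hypothetical). Source order is only a crude stand-in for rendered position, so it does not replace the x/y coordinates the question asks about.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Records ("img", src) and ("a", href) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.found.append(("img", attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.found.append(("a", attrs["href"]))


def collect_images_and_links(page_html):
    collector = ElementCollector()
    collector.feed(page_html)
    return collector.found
```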

How to render HTML chunk?

What's the best way to render a chunk of HTML in an application? We have a rich text editor control (from Karamasoft) in a web page, and need to generate a PDF with records saved from the control (with custom page headers, page footers, and record headers), so I need to be able to render the HTML so it can be "drawn" onto the page to be saved as a PDF. Is there any simple, straightforward way to do this?
HTML Renderer is a library of 100% managed code that draws beautifully formatted HTML.
Without using any libraries, you can use the Literal control that allows you to inject the HTML you wish to display to the user.
You may try PURE to render JSON data in HTML: http://beebole.com/pure/
This may be off topic, though.
I'm also interested in how you convert the HTML to PDF.
What technical steps are involved?
