Generate a pdf thumbnail (open source/free) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I looked at other posts on this but could not find an adequate solution for my needs. I just want to get the first page of a PDF document as a thumbnail. This will run as a server application, so I would rather not write the PDF to a file and then call a third-party application that reads it and generates the image on disk. Ideally it would look something like this:
doc = new PDFdocument("some.pdf");
page = doc.page(1);
Image image = page.image;
Thanks.

Matthew Ephraim released an open source wrapper for Ghostscript that sounds like it does what you want and is in C#.
Link to Source Code: https://github.com/mephraim/ghostscriptsharp
Link to Blog Posting: http://www.mattephraim.com/blog/2009/01/06/a-simple-c-wrapper-for-ghostscript/
You can call the GeneratePageThumb method to generate a single thumbnail, or GeneratePageThumbs with a start and end page number to generate thumbnails for several pages, each written to a separate output file. The default file format is JPEG, but you can change it, along with many other options such as page size, by calling the alternate GenerateOutput method.

I think the Windows API Code Pack for Microsoft .NET Framework might be the easiest way to do this. It can generate the same thumbnail that Windows Explorer shows (which is the first page), and you can choose among several sizes, up to 1024x1024, so it should be enough. It is quite simple: create a ShellObject with ShellObject.FromParsingName(filepath) and read its Thumbnail property.
The problem might be what your server is. This works on Windows 7, Windows Vista and, I guess, Windows Server 2008. Also, Windows Explorer must be able to show thumbnails on that machine. The easiest way to ensure that is to install Adobe Reader. If none of this is a problem, I think this is the most elegant way.
UPDATE: Adobe Reader has dropped support for thumbnails in recent versions, so a legacy version must be used.
UPDATE2: According to a comment from Roberto, you can still use the latest version of Adobe Reader if you turn on the thumbnails option in Edit - Preferences - General.

Download PDFLibNet and use the following code:
// Requires: using System; using System.Drawing; using System.Drawing.Imaging; using System.IO;
public void ConvertPDFtoJPG(string filename, string dirOut)
{
    PDFLibNet.PDFWrapper _pdfDoc = new PDFLibNet.PDFWrapper();
    try
    {
        _pdfDoc.LoadPDF(filename);
        for (int i = 0; i < _pdfDoc.PageCount; i++)
        {
            using (Image img = RenderPage(_pdfDoc, i))
            {
                string name = string.Format("{0}{1}.jpg", i, DateTime.Now.ToString("mmss"));
                // Pass the format explicitly; Save(path) alone would write PNG data
                img.Save(Path.Combine(dirOut, name), ImageFormat.Jpeg);
            }
        }
    }
    finally
    {
        _pdfDoc.Dispose();
    }
}
public Image RenderPage(PDFLibNet.PDFWrapper doc, int page)
{
    // CurrentPage is 1-based, the loop index is 0-based
    doc.CurrentPage = page + 1;
    doc.CurrentX = 0;
    doc.CurrentY = 0;
    doc.RenderPage(IntPtr.Zero);
    // create an image to draw the page into
    var buffer = new Bitmap(doc.PageWidth, doc.PageHeight);
    doc.ClientBounds = new Rectangle(0, 0, doc.PageWidth, doc.PageHeight);
    using (var g = Graphics.FromImage(buffer))
    {
        var hdc = g.GetHdc();
        try
        {
            doc.DrawPageHDC(hdc);
        }
        finally
        {
            g.ReleaseHdc();
        }
    }
    return buffer;
}

I used to do this kind of stuff with imagemagick (Convert) long ago.
There is a .Net Wrapper for that, maybe it's worth checking out :
http://imagemagick.codeplex.com/releases/view/30302

http://www.codeproject.com/KB/cs/GhostScriptUseWithCSharp.aspx
This works very well. The only dependencies are GhostScript's gsdll32.dll (you need to download GhostScript separately to get this, but there is no need to have GhostScript installed in your production environment), and PDFSharp.dll which is included in the project.

Related

HTML to PDF in a Windows universal app (UWP)

Is it possible to convert an HTML string to a PDF file in a UWP app?
I've seen lots of different ways it can be done in regular .NET apps (there seem to be plenty of third party libraries), but I've yet to see a way it can be done in a Universal/UWP app. Does anyone know how it can be done?
Perhaps there is some way to hook into the "Microsoft Print to PDF" option, if there is no pure code solution?
Or is there a roundabout way of doing it, maybe like somehow using Javascript and https://github.com/MrRio/jsPDF inside a C# UWP app? I'm not sure, clutching at straws...
EDIT
I have marked the solution provided by Grace Feng - MSFT as correct for proving that it IS possible to convert HTML to PDF, through the use of the Microsoft Print to PDF option in the print dialog. Thank you

Perhaps there is some way to hook into the "Microsoft Print to PDF" option, if there is no pure code solution?
Yes, but with this approach you first need to display your HTML string in a control such as RichEditBox or TextBlock, because only a UIElement can be printed.
You can also create the PDF file yourself using the basic PDF syntax. For example, the BT and ET operators delimit a text object, which you can use to create a paragraph. Here is a sample in C#:
StringBuilder sb = new StringBuilder();
sb.AppendLine("BT"); // BT = begin text object, with text-units the same as userspace-units
sb.AppendLine("/F0 40 Tf"); // Tf = start using the named font "F0" with size "40"
sb.AppendLine("40 TL"); // TL = set line height to "40"
sb.AppendLine("230.0 400.0 Td"); // Td = position text point at coordinates "230.0", "400.0"
sb.AppendLine("(Hello World)'");
sb.AppendLine("/F2 20 Tf");
sb.AppendLine("20 TL");
sb.AppendLine("0.0 0.2 1.0 rg"); // rg = set fill color to RGB("0.0", "0.2", "1.0")
sb.AppendLine("(This is StackOverflow)'");
sb.AppendLine("ET");
You can then create a PDF file and save this content into it. But since you want to convert HTML to PDF, this would be hard work, and I don't think you want to go this way.
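Saving the operators alone does not give a valid file: the content stream has to sit inside a page object, with a catalog, a page tree, a font resource, a cross-reference table and a trailer around it. The following is only a minimal sketch of that wrapping, not a general PDF writer. MinimalPdf and Build are hypothetical names, the US Letter page size (612 x 792 points) and the Helvetica font are assumptions, and only the font name F0 is defined, so a content stream that also uses /F2 (as the sample above does) would need a second font entry.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class MinimalPdf
{
    // Wraps a page content stream (ASCII text such as "BT ... ET")
    // in the smallest set of objects a one-page PDF needs.
    public static byte[] Build(string content)
    {
        string[] objects =
        {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            // Assumed page size: US Letter. Only font F0 is declared.
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] " +
                "/Contents 4 0 R /Resources << /Font << /F0 5 0 R >> >> >>",
            "<< /Length " + Encoding.ASCII.GetByteCount(content) + " >>\n" +
                "stream\n" + content + "\nendstream",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
        };

        using (var ms = new MemoryStream())
        {
            var offsets = new List<long>();
            Write(ms, "%PDF-1.4\n");
            for (int i = 0; i < objects.Length; i++)
            {
                // Remember where each object starts for the xref table
                offsets.Add(ms.Position);
                Write(ms, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
            }

            long xrefPosition = ms.Position;
            Write(ms, "xref\n0 " + (objects.Length + 1) + "\n");
            Write(ms, "0000000000 65535 f \n");
            foreach (long offset in offsets)
            {
                // Each xref entry is exactly 20 bytes long
                Write(ms, offset.ToString("D10") + " 00000 n \n");
            }
            Write(ms, "trailer\n<< /Size " + (objects.Length + 1) + " /Root 1 0 R >>\n");
            Write(ms, "startxref\n" + xrefPosition + "\n%%EOF\n");
            return ms.ToArray();
        }
    }

    private static void Write(Stream stream, string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

With this sketch, MinimalPdf.Build(sb.ToString()) would return the bytes to write to the target file. Converting real HTML this way would still mean doing all layout by hand, which is why printing is the more practical route.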
Or is there a roundabout way of doing it, maybe like somehow using Javascript and https://github.com/MrRio/jsPDF inside a C# UWP app? I'm not sure, clutching at straws...
To be honest, using a library or a web service to convert HTML to PDF is also an option. There are many, and I searched for them, but I couldn't find a free one that can be used in WinRT. So I think the most practical method here is the first one: hooking into Microsoft Print to PDF. To do this, you can check the official printing sample.
Update:
I used @Jerry Nixon - MSFT's code from "How do I print WebView content in a Windows Store App?", which is a great sample. I just added some code to add pages for printing, in the NavigationCompleted event of the WebView:
private async void webView_NavigationCompleted(WebView sender, WebViewNavigationCompletedEventArgs args)
{
MyWebViewRectangle.Fill = await GetWebViewBrush(webView);
MyPrintPages.ItemsSource = await GetWebPages(webView, new Windows.Foundation.Size(842, 595));
}
Then in the handler for the AddPages event, registered with printDoc.AddPages += PrintDic_AddPages; (printDoc is an instance of PrintDocument):
private void PrintDic_AddPages(object sender, AddPagesEventArgs e)
{
foreach (var item in MyPrintPages.Items)
{
var rect = item as Rectangle;
printDoc.AddPage(rect);
}
printDoc.AddPagesComplete();
}
For other code you can refer to the official printing sample.

Free library to import FDF into PDF [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I am trying to save a PDF file by merging the data from an FDF into a PDF template in my WPF application.
The situation is this: I have a PDFTemplate.pdf which serves as a template and has placeholders (fields). I generate an FDF file programmatically, which contains all the field names required to fill in the PDF template. The FDF also contains the file path of the PDF template, so that on opening it knows which PDF to use.
Now, when I double-click the FDF, it opens Adobe Acrobat Reader and displays the PDF template with the data filled in. But I can't save this file using the File menu, as it says the file will be saved without the data.
I would like to know if it is possible to import the FDF data into the PDF and save it without using a third-party component.
Also, if that is very difficult, what would be a possible solution in terms of a free library that can do it?
I just realized that iTextSharp is not free for commercial applications.

I have been able to achieve this using another library, PDFsharp.
It works somewhat like iTextSharp, although in some places iTextSharp is better and easier to use. I am posting the code in case someone wants to do something similar:
// Requires: using System; using System.IO; using System.Linq;
// fdfDataDictionary is assumed to be a collection of field name/value
// pairs (e.g. Dictionary<string, string>) built from the FDF
//Create a copy of the original PDF file from source
//to the destination location
File.Copy(formLocation, outputFileNameAndPath, true);
//Open the newly created PDF file
using (var pdfDoc = PdfSharp.Pdf.IO.PdfReader.Open(
    outputFileNameAndPath,
    PdfSharp.Pdf.IO.PdfDocumentOpenMode.Modify))
{
    //Get the fields from the PDF into which the data
    //is supposed to be inserted
    var pdfFields = pdfDoc.AcroForm.Fields;
    //To allow appearance of the fields
    if (pdfDoc.AcroForm.Elements.ContainsKey("/NeedAppearances") == false)
    {
        pdfDoc.AcroForm.Elements.Add(
            "/NeedAppearances",
            new PdfSharp.Pdf.PdfBoolean(true));
    }
    else
    {
        pdfDoc.AcroForm.Elements["/NeedAppearances"] =
            new PdfSharp.Pdf.PdfBoolean(true);
    }
    //To set the readonly flags for fields to their original values
    bool flag = false;
    //Iterate through the fields from PDF
    for (int i = 0; i < pdfFields.Count(); i++)
    {
        try
        {
            //Get the current PDF field
            var pdfField = pdfFields[i];
            flag = pdfField.ReadOnly;
            //Check if it is readonly and make it false
            if (pdfField.ReadOnly)
            {
                pdfField.ReadOnly = false;
            }
            pdfField.Value = new PdfSharp.Pdf.PdfString(
                fdfDataDictionary.Where(
                    p => p.Key == pdfField.Name)
                    .FirstOrDefault().Value);
            //Set the Readonly flag back to the field
            pdfField.ReadOnly = flag;
        }
        catch (Exception ex)
        {
            //Keep the original exception as the inner exception
            throw new Exception(ERROR_FILE_WRITE_FAILURE + ex.Message, ex);
        }
    }
    //Save the PDF to the output destination
    pdfDoc.Save(outputFileNameAndPath);
    pdfDoc.Close();
}
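The code above does not show how fdfDataDictionary is built. As a rough sketch only, assuming it is a Dictionary<string, string> and that the FDF stores each field as a /T (name) entry immediately followed by a /V (value) entry, both as literal strings, the pairs could be pulled out with a regular expression. FdfFields and Parse are hypothetical names. A real FDF may also use hex strings, nested /Kids, unescaped balanced parentheses or a different key order, none of which this handles.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FdfFields
{
    // Matches "/T (name) /V (value)" where both strings are literal
    // strings; backslash escapes such as \) are allowed inside them.
    private static readonly Regex FieldPattern = new Regex(
        @"/T\s*\((?<name>(?:\\.|[^\\()])*)\)\s*/V\s*\((?<value>(?:\\.|[^\\()])*)\)",
        RegexOptions.Singleline);

    // Returns field name -> value for every pair found in the FDF text.
    public static Dictionary<string, string> Parse(string fdfText)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match match in FieldPattern.Matches(fdfText))
        {
            string name = Unescape(match.Groups["name"].Value);
            string value = Unescape(match.Groups["value"].Value);
            fields[name] = value; // a later duplicate overwrites an earlier one
        }
        return fields;
    }

    // Handles only the escapes \( \) and \\; other escapes are left as-is.
    private static string Unescape(string text)
    {
        return Regex.Replace(text, @"\\([()\\])", "$1");
    }
}
```

The result of FdfFields.Parse(File.ReadAllText(fdfPath)), where fdfPath is a placeholder for the location of your FDF file, could then play the role of fdfDataDictionary in the loop above.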

Printing A PDF Automatically to a specific printer and tray

I have a C# application in which, when the user clicks Print, the application creates a PDF in a MemoryStream using iTextSharp. I need to print this PDF automatically to a specific printer and tray.
I have searched for this, but all I can find uses JavaScript, and it doesn't print to a specific tray.
Does anyone have an example of doing this?
Thank you.

You can change the printer tray with this code:
// Requires: using System; using System.Drawing.Printing;
// printDoc is assumed to be a System.Drawing.Printing.PrintDocument
string _paperSource = "TRAY 2"; // Printer Tray
string _paperName = "8x17"; // Printer paper name
//Tested code comment. The commented code was the one I tested, but when
//I was writing the post I realized that could be done with less code.
//PaperSize pSize = new PaperSize() //Tested code :)
//PaperSource pSource = new PaperSource(); //Tested code :)
// Find selected paperSource and paperName.
foreach (PaperSource _pSource in printDoc.PrinterSettings.PaperSources)
{
    if (string.Equals(_pSource.SourceName, _paperSource, StringComparison.OrdinalIgnoreCase))
    {
        printDoc.DefaultPageSettings.PaperSource = _pSource;
        //pSource = _pSource; //Tested code :)
        break;
    }
}
foreach (PaperSize _pSize in printDoc.PrinterSettings.PaperSizes)
{
    if (string.Equals(_pSize.PaperName, _paperName, StringComparison.OrdinalIgnoreCase))
    {
        printDoc.DefaultPageSettings.PaperSize = _pSize;
        //pSize = _pSize; //Tested code :)
        break;
    }
}
//printDoc.DefaultPageSettings.PaperSize = pSize; //Tested code :)
//printDoc.DefaultPageSettings.PaperSource = pSource; //Tested code :)

In the past I spent a lot of time searching the web for ways to print PDF files to specific printer trays.
My requirement was to collect several PDF files from a server directory and send each file to a different printer tray in a loop.
So I tested a lot of third-party tools (trials) and best practices found on the web.
Generally, all these tools fall into two categories: a) those that send PDF files to the printer directly (silently, without UI), and b) those that open PDF files in the UI using a built-in PDF previewer that works with the .NET PrintDocument.
The only solution that met my requirement was PDFPrint from VeryPDF (drawback: it is not free, but my company bought it). None of the other tools and solutions worked reliably. For example, calling their print routines with the parameter id = 258 (which identifies tray 2, as reported by the installed printer) printed the PDF from tray 3, or the PDF was opened in a print previewer (UI) with missing images or totally blank content, and so on.
Hope that helps a little bit.

There is a tool called pdfprint:
http://www.verypdf.com/pdfprint/index.html
And here they discuss some solutions:
http://social.msdn.microsoft.com/forums/en-US/csharpgeneral/thread/da99765f-2706-4bb6-aa0e-b90730294cb4

ASP.NET/ MVC/ C#/ jQuery to create a CMS front end and PDF Generator [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a few general ideas on how I want to do this.
What I am trying to do is create a very simple front-end CMS in which a report is generated from a template using jQuery (drag, drop, etc.). The report will include placeholders into which data is imported, e.g. name, address, etc. This data can be changed by different users who have access to it.
I was thinking I would need to convert this HTML into XSL-FO and then generate a PDF from it, as XSL-FO will give me a major advantage in custom display of data in the PDF, i.e. the data will appear how I want it to. This will also enable me to do a lookup in the XSL-FO using XSLT (or something?) to import the latest database values. The tool for converting XSL-FO into PDF that looks like it fits the bill is fo.net. Ultimately I will need to use some code that is already out there, but where I can avoid it, I would want to.
Keep in mind:
I need ultimate control over everything (eventually)
Free / open source alternatives that are flexible (with source code)
Questions:
Is jQuery the best thing to use for the CMS? As I will be having custom controls which will contain database data or placeholders for data to be imported into
Is XSL-FO the best intermediary language to port this template into for rendering/ converting into a PDF?
How do I convert html into xsl-fo? Does c#/.net have an API I can look at?
Have I overcomplicated things? Any simpler ways to do this?
Note
The HTML + CSS on the page may be very complicated/ flexible so I may need to use jQuery to add the CSS inline to the elements, hence why I am thinking of using XSL-FO as I may be able to generate tags that can read this data and place it on the PDF in a certain way, please keep this in mind when answering my question (if you choose to!) :)

I have found PDFsharp and MigraDoc to be great for pdf generation.
I have created a pdf utility...
using System;
using System.IO;
using System.Web;
using System.Web.Mvc;
using PdfSharp.Pdf;
// ActionResult that streams a PdfDocument to the browser
namespace Web.Utilities
{
    public class PdfResult : ActionResult
    {
        public String Filename { get; set; }
        protected MemoryStream pdfStream = new MemoryStream();
        public PdfResult(PdfDocument doc)
        {
            Filename = String.Format("{0}.pdf", doc.Info.Title);
            // false = keep the stream open after saving
            doc.Save(pdfStream, false);
        }
        public PdfResult(String pdfpath)
        {
            /* optional: load the PDF from the file system if required */
            throw new NotImplementedException("PdfResult is just an example and does not serve files from the filesystem.");
        }
        public override void ExecuteResult(ControllerContext context)
        {
            context.HttpContext.Response.Clear();
            context.HttpContext.Response.ContentType = "application/pdf";
            context.HttpContext.Response.AddHeader("Content-Disposition", "attachment; filename=" + Filename); // specify filename
            context.HttpContext.Response.AddHeader("content-length", pdfStream.Length.ToString());
            context.HttpContext.Response.BinaryWrite(pdfStream.ToArray());
            context.HttpContext.Response.Flush();
            pdfStream.Close();
            context.HttpContext.Response.End();
        }
    }
}
And then you can return the PDF from a controller action...
// Requires: using MigraDoc.DocumentObjectModel; using MigraDoc.Rendering;
public ActionResult Download()
{
    Document document = new Document();
    document.Info.Title = "Hello";
    Section section = document.AddSection();
    section.AddParagraph("Hello").AddFormattedText("World", TextFormat.Bold);
    PdfDocumentRenderer renderer = new PdfDocumentRenderer();
    renderer.Document = document;
    renderer.RenderDocument();
    return new PdfResult(renderer.PdfDocument);
}
I have found this to be a really neat and easy to control method of putting pdf into mvc.

To answer my own question, I have decided to use FO.NET, a C# port of Apache FOP. I will generate my XML file on the fly, transform it into an XSL-FO document, and then pass that on to create the PDF.
I have managed to do this quite successfully, this will enable me to throw out Fo.Net in the future and get another software or even write my own if needed. Hopefully over the next few months I will have a firmer answer to how flexible my choice actually was. :)
I will handle the front end with jQuery and jQuery UI.

Determine number of pages in a PDF file [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I need to determine the number of pages in a specified PDF file using C# code (.NET 2.0). The PDF file will be read from the file system, and not from a URL. Does anyone have any idea how this could be done? Note: Adobe Acrobat Reader is installed on the PC where this check will be carried out.

You'll need a PDF API for C#. iTextSharp is one possible API, though better ones might exist.
iTextSharp Example
You must add iTextSharp.dll as a reference. Download iTextSharp from SourceForge.net. This is a complete working program using a console application.
using System;
using iTextSharp.text.pdf;
namespace GetPages_PDF
{
    class Program
    {
        static void Main(string[] args)
        {
            // Right side of equation is location of YOUR pdf file
            string ppath = "C:\\aworking\\Hawkins.pdf";
            PdfReader pdfReader = new PdfReader(ppath);
            int numberOfPages = pdfReader.NumberOfPages;
            // Release the file once the count has been read
            pdfReader.Close();
            Console.WriteLine(numberOfPages);
            Console.ReadLine();
        }
    }
}

This should do the trick:
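// Requires: using System.IO; using System.Text.RegularExpressions;
public int getNumberOfPdfPages(string fileName)
{
    using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
    {
        // [^s] keeps "/Type /Pages" (page tree nodes) from being counted
        Regex regex = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regex.Matches(sr.ReadToEnd());
        return matches.Count;
    }
}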
From Rachael's answer and this one too.
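The trailing [^s] in that pattern matters because a PDF contains both /Type /Page objects (one per page) and /Type /Pages objects (page tree nodes), and only the former should be counted. The sketch below applies the same pattern to an in-memory string so the effect is easy to see. PageObjectCounter and Count are hypothetical names. The approach also assumes the page objects are stored uncompressed; files that pack their objects into compressed object streams would be undercounted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PageObjectCounter
{
    // Same pattern as in the answer: "/Type /Page" (with optional
    // whitespace) followed by any character other than "s".
    private static readonly Regex PageObject = new Regex(@"/Type\s*/Page[^s]");

    // Counts page objects in the raw text of a PDF file.
    public static int Count(string pdfText)
    {
        return PageObject.Matches(pdfText).Count;
    }
}
```

For a text containing one "/Type /Pages" node and the two page objects "<< /Type /Page /Parent 1 0 R >>" and "<</Type/Page/Parent 1 0 R>>", Count returns 2.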

I found a way at http://www.dotnetspider.com/resources/21866-Count-pages-PDF-file.aspx
It does not require purchasing a PDF library.

One Line:
int pdfPageCount = System.IO.File.ReadAllText("example.pdf").Split(new string[] { "/Type /Page" }, StringSplitOptions.None).Count()-2;
Recommended: iTextSharp

I have used pdflib for this.
p = new pdflib();
/* Open the input PDF */
indoc = p.open_pdi_document("myTestFile.pdf", "");
pageCount = (int) p.pcos_get_number(indoc, "length:pages");

Docotic.Pdf library may be used to accomplish the task.
Here is sample code:
PdfDocument document = new PdfDocument();
document.Open("file.pdf");
int pageCount = document.PageCount;
The library will parse as little as possible so performance should be ok.
Disclaimer: I work for Bit Miracle.

I have good success using CeTe Dynamic PDF products. They're not free, but are well documented. They did the job for me.
http://www.dynamicpdf.com/

I've used the code above that solves the problem using regex and it works, but it's quite slow. It reads the entire file to determine the number of pages.
I used it in a web app and pages would sometimes list 20 or 30 PDFs at a time and in that circumstance the load time for the page went from a couple seconds to almost a minute due to the page counting method.
I don't know if the 3rd party libraries are much better, I would hope that they are and I've used pdflib in other scenarios with success.
