I am using PDFsharp to read and write PDFs in our application. I am trying to get the paper size of a page to show in a metadata window, using the following code:
// read PDF document into memory
var pdfDocument = PdfReader.Open(pdfPath);
// if the PDF document has at least one page, get the page size from the first page
if (pdfDocument.Pages.Count > 0)
{
var pageSize = pdfDocument.Pages[0].Size.ToString();
} // if
But it always returns 'Undefined'. I even tried an A4 document created from MS Word, but it still returns 'Undefined'.
I also generated a PDF from HTML, and the page size still comes back as Undefined:
static void Main(string[] args)
{
// create PDF config to generate PDF
var config = new PdfGenerateConfig()
{
PageSize = PageSize.Letter,
PageOrientation = PageOrientation.Landscape
};
// load the HTML file
var html = System.IO.File.ReadAllText(@"C:\Users\mohit\Desktop\Temp\Ekta\Test.html");
// generate the PDF using the config above
var pdf = PdfGenerator.GeneratePdf(html, config);
// get the paper size of first page
var size = pdf.Pages[0].Size;
// save the pdf document
pdf.Save(@"C:\Users\mohit\Desktop\Temp\Ekta\A13.pdf");
Console.ReadKey();
}
The properties you have to use are Width and Height. Their values are in points (1/72 inch), so convert them to millimeters or inches if you do not want to show the size in points.
The Size property is a convenient way to set standard page sizes for new pages, as you can specify A4 or Letter. It is not set for imported PDF files.
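Since Size is not populated for imported files, a paper name has to be derived from Width and Height. The sketch below assumes both values are in points (in PDFsharp you could pass `page.Width.Point` and `page.Height.Point`). The class name `PaperSizeHelper`, the small lookup table and the 2 mm tolerance are illustrative assumptions, not part of PDFsharp.

```csharp
using System;

// Hypothetical helper (not part of PDFsharp): converts PDF points to inches/millimetres
// and guesses a standard paper name from a page's width and height in points.
public static class PaperSizeHelper
{
    const double PointsPerInch = 72.0;      // 1 pt = 1/72 in
    const double MillimetersPerInch = 25.4; // 1 in = 25.4 mm

    public static double PointsToInches(double points) => points / PointsPerInch;

    public static double PointsToMillimeters(double points) =>
        points / PointsPerInch * MillimetersPerInch;

    // toleranceMm absorbs rounding in the page box values written by different producers
    public static string GuessPaperName(double widthPt, double heightPt, double toleranceMm = 2.0)
    {
        // Compare short and long sides so orientation does not matter
        double shortMm = PointsToMillimeters(Math.Min(widthPt, heightPt));
        double longMm = PointsToMillimeters(Math.Max(widthPt, heightPt));

        var known = new (string Name, double ShortMm, double LongMm)[]
        {
            ("A3", 297.0, 420.0),
            ("A4", 210.0, 297.0),
            ("A5", 148.0, 210.0),
            ("Letter", 215.9, 279.4),
            ("Legal", 215.9, 355.6),
        };

        foreach (var k in known)
        {
            if (Math.Abs(shortMm - k.ShortMm) <= toleranceMm &&
                Math.Abs(longMm - k.LongMm) <= toleranceMm)
                return k.Name;
        }
        return "Custom";
    }
}
```

Comparing short and long sides rather than width and height means a landscape A4 page is still reported as A4.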
I have a PDF file with some images, which I want to replace with some other PDF. The code goes through the pdf and gets the image references:
PdfDocument pdf = new PdfDocument(new PdfReader(args[0]), new PdfWriter(args[1]));
for(int i=1; i<=pdf.GetNumberOfPages(); ++i)
{
PdfDictionary pageDict = pdf.GetPage(i).GetPdfObject();
PdfDictionary resources = pageDict.GetAsDictionary(PdfName.Resources);
PdfDictionary xObjects = resources.GetAsDictionary(PdfName.XObject);
foreach (PdfName imgRef in xObjects.KeySet())
{
// image reference
}
}
For each of my images I have a corresponding PDF that I would like to replace the image with. What I tried is to put the other PDF (which always has a single page) into the XObject dictionary:
PdfDocument other = new PdfDocument(new PdfReader("replacement.pdf"));
xObjects.Put(imgRef, other.GetFirstPage().GetPdfObject().Clone());
But while closing the PdfDocument an exception is thrown:
iText.Kernel.PdfException: 'Pdf indirect object belongs to other PDF document. Copy object to current pdf document.'
How can I replace the image with (the content of) another PDF?
Update
I also tried a few other approaches, which may have improved things. To get past the previous error message, I copy the page into the original PDF:
var page = other.GetFirstPage().CopyTo(pdf);
However, replacing the xObject doesn't work:
xObjects.Put(imgRef, page.GetPdfObject());
Results in a corrupted PDF.
To copy a page of another document so that it can be used as an image replacement, you can use PdfPage#CopyAsFormXObject.
So let's assume we have this PDF as a template and we want to replace the image of a desert with the contents of another PDF:
Let's also assume the PDF that we want to use as a replacement looks as follows:
The issue is that if we blindly replace the original image with the contents of the PDF, chances are we will get something like this:
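So everything appears to work, yet the visual result is wrong. The issue is that coordinates work differently for plain raster images and for vector form XObjects (the PDF replacements): an image XObject is always drawn into a 1 x 1 unit square that the page content stretches to the displayed size, whereas a form XObject is drawn in its own units (its bounding box), so the same stretching blows it up. So we also need to adjust the transformation matrix (/Matrix key) of our newly created XObject.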
So the code could look like this:
PdfDocument pdf = new PdfDocument(new PdfReader(@"template.pdf"), new PdfWriter(@"out.pdf"));
for(int i=1; i<=pdf.GetNumberOfPages(); ++i) {
PdfDictionary pageDict = pdf.GetPage(i).GetPdfObject();
PdfDictionary resources = pageDict.GetAsDictionary(PdfName.Resources);
PdfDictionary xObjects = resources.GetAsDictionary(PdfName.XObject);
IDictionary<PdfName, PdfStream> toReplace = new Dictionary<PdfName, PdfStream>();
foreach (PdfName imgRef in xObjects.KeySet()) {
PdfStream previousXobject = xObjects.GetAsStream(imgRef);
PdfDocument imageReplacementDoc =
new PdfDocument(new PdfReader(@"insert.pdf"));
PdfXObject imageReplacement = imageReplacementDoc.GetPage(1).CopyAsFormXObject(pdf);
toReplace[imgRef] = imageReplacement.GetPdfObject();
adjustXObjectSize(imageReplacement);
imageReplacementDoc.Close();
}
foreach (var x in toReplace) {
xObjects.Put(x.Key, x.Value);
}
}
pdf.Close();
UPD: Implementation of adjustXObjectSize (thanks mkl):
private void adjustXObjectSize(PdfXObject pageXObject) {
float scaleXobject = 1 / Math.Max(pageXObject.GetWidth(), pageXObject.GetHeight());
AffineTransform transform = new AffineTransform();
transform.Scale(scaleXobject, scaleXobject);
float[] matrix = new float[6];
transform.GetMatrix(matrix);
pageXObject.GetPdfObject().Put(PdfName.Matrix, new PdfArray(matrix));
}
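The arithmetic in adjustXObjectSize can be checked in isolation. The sketch below is plain C# with no iText dependency, and the names `FormXObjectScaling`, `UnitSquareMatrix` and `Transform` are hypothetical. It builds the same six-number /Matrix array [a b c d e f] = [s 0 0 s 0 0] with s = 1 / max(width, height), then applies it to a point to show that the larger side of the form shrinks to exactly 1 unit while the aspect ratio is kept.

```csharp
using System;

// Hypothetical, dependency-free helper reproducing the arithmetic of adjustXObjectSize.
public static class FormXObjectScaling
{
    // Builds the /Matrix entries [a b c d e f] for a uniform scale that maps the
    // larger side of a form XObject of size width x height onto 1 unit.
    public static float[] UnitSquareMatrix(float width, float height)
    {
        if (width <= 0f || height <= 0f)
            throw new ArgumentOutOfRangeException(nameof(width), "Width and height must be positive.");
        float s = 1f / Math.Max(width, height);
        // a = d = s (uniform scale), b = c = 0 (no rotation or skew), e = f = 0 (no translation)
        return new[] { s, 0f, 0f, s, 0f, 0f };
    }

    // Applies a PDF matrix to a point: x' = a*x + c*y + e, y' = b*x + d*y + f.
    public static (float X, float Y) Transform(float[] m, float x, float y) =>
        (m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]);
}
```

For a 200 x 100 form, s = 0.005, so its corner (200, 100) lands at (1, 0.5): the form now fits inside the unit square, and the page's existing transformation stretches it to the area the image used to occupy.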
And the visual result after running the above code on the samples I described would look like this:
I need to generate a pdf report from a URL in our application. Is it possible to have both Landscape and Portrait pages in the same pdf document that is generated?
I'd like to have the bar charts in Portrait and the tables in Landscape (horizontal). Looking at the EVO docs, I can't tell whether this is possible.
I know that you can define either Landscape or Portrait with
htmlToPdfConverter.PdfDocumentOptions.PdfPageOrientation
But this applies to the whole document. I'd like a way to mark a section in the HTML so that EVO prints it as Landscape.
You can have Portrait and Landscape sections in the same PDF. To do this, create a blank Document object and add a PDF page with the desired orientation to it. On the newly created PDF page, add an HtmlToPdfElement object to render the HTML; any further PDF pages it adds automatically get the same orientation as the page you started on. You can repeat the procedure with PDF pages of different orientations. There is a live sample with C# code for this approach in the Merge Multiple HTML Pages into a Single PDF demo. The code sample is also copied below:
protected void convertToPdfButton_Click(object sender, EventArgs e)
{
// Create the PDF document where to add the HTML documents
Document pdfDocument = new Document();
// Create a PDF page where to add the first HTML
PdfPage firstPdfPage = pdfDocument.AddPage();
try
{
// Create the first HTML to PDF element
HtmlToPdfElement firstHtml = new HtmlToPdfElement(0, 0, firstUrlTextBox.Text);
// Optionally set a delay before conversion to allow asynchronous scripts to finish
firstHtml.ConversionDelay = 2;
// Add the first HTML to PDF document
AddElementResult firstAddResult = firstPdfPage.AddElement(firstHtml);
PdfPage secondPdfPage = null;
PointF secondHtmlLocation = Point.Empty;
if (startNewPageCheckBox.Checked)
{
// Create a PDF page where to add the second HTML
secondPdfPage = pdfDocument.AddPage();
secondHtmlLocation = PointF.Empty;
}
else
{
// Add the second HTML on the PDF page where the first HTML ended
secondPdfPage = firstAddResult.EndPdfPage;
secondHtmlLocation = new PointF(firstAddResult.EndPageBounds.Left, firstAddResult.EndPageBounds.Bottom);
}
// Create the second HTML to PDF element
HtmlToPdfElement secondHtml = new HtmlToPdfElement(secondHtmlLocation.X, secondHtmlLocation.Y, secondUrlTextBox.Text);
// Optionally set a delay before conversion to allow asynchronous scripts to finish
secondHtml.ConversionDelay = 2;
// Add the second HTML to PDF document
secondPdfPage.AddElement(secondHtml);
// Save the PDF document in a memory buffer
byte[] outPdfBuffer = pdfDocument.Save();
// Send the PDF as response to browser
// Set response content type
Response.AddHeader("Content-Type", "application/pdf");
// Instruct the browser to open the PDF file as an attachment or inline
Response.AddHeader("Content-Disposition", String.Format("attachment; filename=Merge_Multipe_HTML.pdf; size={0}", outPdfBuffer.Length.ToString()));
// Write the PDF document buffer to HTTP response
Response.BinaryWrite(outPdfBuffer);
// End the HTTP response and stop the current page processing
Response.End();
}
finally
{
// Close the PDF document
pdfDocument.Close();
}
}
I'm trying to create a Table of Contents using MigraDoc and PDFsharp and I've gotten really close but the problem I'm currently having is that the links on the Table of Contents all take me to the very first page of the PDF. I'm trying to link them to their respective pages. PDFSharp bookmarks work fine but when trying to create a table of contents based on the merged PDF it's not working.
static void TableOfContents(PdfDocument document)
{
// Puts the Table of contents on the second page
PdfPage page = document.Pages[1];
XGraphics gfx = XGraphics.FromPdfPage(page);
gfx.MUH = PdfFontEncoding.Unicode;
// Create MigraDoc document + Setup styles
Document doc = new Document();
Styles.DefineStyles(doc);
// Add header
Section section = doc.AddSection();
Paragraph paragraph = section.AddParagraph("Table of Contents");
paragraph.Format.Font.Size = 14;
paragraph.Format.Font.Bold = true;
paragraph.Format.SpaceAfter = 24;
paragraph.Format.OutlineLevel = OutlineLevel.Level1;
// Add links - these are the PdfSharp outlines/bookmarks
// added previously when concatenating the pages
foreach (var bookmark in document.Outlines)
{
paragraph = section.AddParagraph();
paragraph.Style = "TOC";
paragraph.AddBookmark(bookmark.Title);
Hyperlink hyperlink = paragraph.AddHyperlink(bookmark.Title);
hyperlink.AddText($"{bookmark.Title}\t");
hyperlink.AddPageRefField(bookmark.Title);
}
// Render document
DocumentRenderer docRenderer = new DocumentRenderer(doc);
docRenderer.PrepareDocument();
docRenderer.RenderPage(gfx, 1);
gfx.Dispose();
}
Ideally I want it to return the file's name (which it's doing) and the page number (it's only returning the first page). This is what it's currently outputting.
Table of Contents
file name here......................... 1
file name here......................... 1
file name here......................... 1
file name here......................... 1
As I understand it, the name used for a hyperlink and its bookmark must be unique within the document.
Otherwise the link points to the first paragraph that carries the bookmark.
For a simple report I make, I just use a number that I increment for each entry:
private void DefineTOCLine(int level, string text, Paragraph linkTo)
{
var tocIndex = (tocindex++).ToString(CultureInfo.InvariantCulture);
var paragraph = tocsection.AddParagraph();
paragraph.Style = level == 1 ? "TOC1" : "TOC2";
var hyperlink = paragraph.AddHyperlink(tocIndex);
hyperlink.AddText(text + "\t");
hyperlink.AddPageRefField(tocIndex);
linkTo.AddBookmark(tocIndex);
}
You invoke hyperlink.AddPageRefField to set a reference, but as far as I can tell you never create the MigraDoc bookmark for the target of the reference by calling MigraDoc's AddBookmark method.
MigraDoc bookmarks are different from PDF file bookmarks.
I am converting an HTML page to PDF using SelectPDF's HtmlToPdf(). Since the HTML content is big, I split it in half and create two PDFs.
I am struggling to make {total_pages} in the footer show the actual total number of pages across both PDFs, not just the current document, and {page_number} show the actual page number in the context of both PDFs.
How can I access {page_number} and {total_pages} to calculate the proper values? All the examples I found use PdfDocument(), not HtmlToPdf().
Dim converter As New HtmlToPdf()
Dim text As New PdfTextSection(0, 10, "Page: {page_number} of {total_pages} ")
text.HorizontalAlign = PdfTextHorizontalAlign.Center
converter.Footer.Add(text)
I am tagging both C# and VB since SelectPDF is for both languages, and relevant sample from either one will work for me. Thank you
Today I stumbled upon the same issue and found a workaround. The converter can show page numbers for the document it generates, but it isn't aware of multiple generated files (you can't access the page properties), so all the pages I concatenated showed "Page 1 of 1".
First I define one PdfDocument (see it as the main document) and I use HtmlToPdf to append html converted files to this main document.
// Create converter
converter = new HtmlToPdf();
PdfTextSection text = new PdfTextSection(0, 10, "Page: {page_number} of {total_pages} ", new Font("Arial", 8));
text.HorizontalAlign = PdfTextHorizontalAlign.Right;
converter.Footer.Add(text);
// Create main document
pdfDocument = new PdfDocument();
Then I add pages (from HTML) using this method:
public void AddPage(string htmlPage)
{
PdfDocument doc = converter.ConvertHtmlString(htmlPage);
pdfDocument.Append(doc);
converter.Footer.TotalPagesOffset += doc.Pages.Count;
converter.Footer.FirstPageNumber += doc.Pages.Count;
}
This results in correct page numbers for the main document. The same trick could be used for splitting files and page numbers over multiple documents like you described.
EDIT: If you don't see any page numbering with the HtmlToPdf converter, don't forget to set the following property:
converter.Options.DisplayFooter = true;
There is an open-source library called iTextSharp that can get the total page count:
using System;
using iTextSharp.text.pdf;
namespace GetPages_PDF
{
class Program
{
static void Main(string[] args)
{
// Path to YOUR PDF file
string ppath = "C:\\aworking\\Hawkins.pdf";
PdfReader pdfReader = new PdfReader(ppath);
int numberOfPages = pdfReader.NumberOfPages;
// release the file once the page count has been read
pdfReader.Close();
Console.WriteLine(numberOfPages);
Console.ReadLine();
}
}
}
You can then also stamp text onto the page, but you will need to specify the position where it should go.
link: http://crmhunt.com/how-to-modify-pdf-file-using-itextsharp/
hope this helps in some way.
You should use the following properties:
FirstPageNumber - Controls the page number for the first page being rendered.
TotalPagesOffset - Controls the total number of pages offset in the generated pdf document.
More details here:
http://selectpdf.com/html-to-pdf/docs/html/HtmlToPdfHeadersAndFooters.htm
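For the original question (one HTML document split into two PDFs), {total_pages} needs the page count of every part, so the parts have to be measured before their footers are rendered, for example with a first conversion pass. The sketch below computes the two properties for each part. It assumes {page_number} starts at FirstPageNumber and {total_pages} equals the pages of the part being converted plus TotalPagesOffset; verify that against the SelectPDF documentation linked above. `FooterNumbering` and `Compute` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of SelectPDF): given the page count of every part
// that will be merged, returns the FirstPageNumber and TotalPagesOffset to set on
// converter.Footer before converting each part.
public static class FooterNumbering
{
    public static List<(int FirstPageNumber, int TotalPagesOffset)> Compute(IReadOnlyList<int> pageCounts)
    {
        int total = pageCounts.Sum();
        var result = new List<(int FirstPageNumber, int TotalPagesOffset)>();
        int pagesBefore = 0;

        foreach (int count in pageCounts)
        {
            // This part's pages are numbered after all earlier parts
            int firstPageNumber = pagesBefore + 1;
            // Assumption: {total_pages} = pages in this part + TotalPagesOffset
            int totalPagesOffset = total - count;
            result.Add((firstPageNumber, totalPagesOffset));
            pagesBefore += count;
        }
        return result;
    }
}
```

For parts of 2 and 3 pages this yields (1, 3) and (3, 2), so under the stated assumption every footer ends in "of 5".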
The answers above did not work for me as I was trying to merge multiple PDFs with different orientations. bonnoj's answer did add page numbers but they were incorrect and I couldn't find a way to correct them. So I took a different approach - I created a PDF, then for each HTML page I added a pdfPage and then added a PdfHtmlElement to that page. Finally I loop over the pages and add a custom footer to each page. This may not be the most efficient way to do this but it's the only way that I could find that added the footer in the correct place when mixing portrait and landscape pages. Hopefully it will save somebody else spending hours playing with different properties.
var pdfDocument = new PdfDocument(PdfStandard.Full);
foreach (var (html, pdfPageOrientation) in pages)
{
var page = pdfDocument.AddPage(PdfCustomPageSize.A4, new PdfMargins(marginLeft, marginRight, marginTop, marginBottom));
page.Orientation = pdfPageOrientation;
var pdfHtmlElement = new PdfHtmlElement(html, "");
page.Add(pdfHtmlElement);
}
var pdfFont = pdfDocument.AddFont(PdfStandardFont.Helvetica);
pdfFont.Size = 12;
foreach (PdfPage page in pdfDocument.Pages)
{
var customFooter = pdfDocument.AddTemplate(page.PageSize.Width, 30);
var pdfFooterTextElement = new PdfTextElement(0, 15,
pageFooterText,
pdfFont)
{
HorizontalAlign = PdfTextHorizontalAlign.Right,
VerticalAlign = PdfTextVerticalAlign.Bottom,
};
customFooter.Add(pdfFooterTextElement);
page.CustomFooter = customFooter;
}
pdfDocument.Save(stream);
I am generating a PDF from an HTML string.
When this string is really long, I would like to create a new page, split the text (without breaking the html) and so on.
Here is my code :
// instantiate Pdf object
Aspose.Pdf.Generator.Pdf pdf = new Aspose.Pdf.Generator.Pdf();
// specify the character encoding for the HTML file
pdf.HtmlInfo.CharSet = "UTF-8";
pdf.HtmlInfo.Margin.Left = 10;
pdf.HtmlInfo.Margin.Right = 10;
pdf.HtmlInfo.PageHeight = 1050;
pdf.HtmlInfo.PageWidth = 730;
pdf.HtmlInfo.ShowUnknownHtmlTagsAsText = true;
pdf.HtmlInfo.TryEnlargePredefinedTableColumnWidthsToAvoidWordBreaking = true;
pdf.HtmlInfo.CharsetApplyingLevelOfForce = Aspose.Pdf.Generator.HtmlInfo.CharsetApplyingForceLevel.UseWhenImpossibleDetectFromContent;
// bind the source HTML
pdf.BindHTML("MyVeryVeryLongHTML");
MemoryStream stream = new MemoryStream();
pdf.Save(stream);
byte[] pdfBytes = stream.ToArray();
This code works for the HTML, but overflow is not handled: the text continues past the bottom of the page. Is it possible to set a maximum page "height" that content must not cross, so that a new page is created when it does?
Hope it makes sense !
Thanks a lot
You can set the page height by selecting the type of PDF page you require, such as A1, A2, etc. Aspose will then take care of the page height automatically. For more details, see:
Aspose PDF Page Height
Update
Change pdf.HtmlInfo to pdf.PageSetup (or pdf.PageInfo) and also set a bottom margin.
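If you prefer to size the page yourself rather than rely on a predefined type, the ISO 216 A-series sizes can be derived from A0 (841 x 1189 mm) by repeatedly halving the long side and rounding down, then converted to points. The sketch below does that in plain C#. Whether pdf.PageSetup.PageWidth and PageHeight expect points is an assumption to check in the Aspose documentation, and `IsoPaper` is a hypothetical name.

```csharp
using System;

// Hypothetical helper: ISO 216 A-series sizes and a millimetre-to-point conversion.
public static class IsoPaper
{
    // Returns (short side, long side) in millimetres for A<n>.
    public static (int ShortMm, int LongMm) ASeriesMm(int n)
    {
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
        int shortMm = 841, longMm = 1189; // A0
        for (int i = 0; i < n; i++)
        {
            // Halving the long side (integer division rounds down) gives the new short side;
            // the old short side becomes the new long side.
            int newShort = longMm / 2;
            longMm = shortMm;
            shortMm = newShort;
        }
        return (shortMm, longMm);
    }

    // 1 in = 25.4 mm and 1 pt = 1/72 in
    public static double MmToPoints(double mm) => mm / 25.4 * 72.0;
}
```

For A4 this gives 210 x 297 mm, which is about 595.3 x 841.9 pt.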