The documentation for Select.Pdf for .Net v 2.22 uses the following code snippet to load an existing document. However, in the actual library there is no constructor with that takes a string as a parameter. Is anyone aware of how to load an existing .pdf? I am using the Community Edition of this product as well.
string file = Server.MapPath("~/files/doc1.pdf");
// load the pdf document
PdfDocument doc = new PdfDocument(file);
// add a new page to the document
PdfPage page = doc.AddPage();
// create a new pdf font (component standard font)
PdfFont font = doc.AddFont(PdfStandardFont.Helvetica);
font.Size = 20;
// create text element and add it to the new page
PdfTextElement text = new PdfTextElement(100, 100,
"Sample text added to an existing pdf document.", font);
page.Add(text);
// save pdf document
doc.Save(Response, false, "Sample.pdf");
// close pdf document
doc.Close();
As suggested by Jason, here are my comments as an answer:
I've downloaded the latest version (16.2) from the Select.Pdf website and the .net 4 DLLs do contain the PdfDocument(string filename) constructor, just as described in their documentation.
I've also downloaded the Community Edition and checked the same PdfDocument class. In this edition, the constructor has fewer overloads.
So I guess that the Community Edition is a much older version and you cannot apply the full version examples to the Community Edition.
Related
I've been attempting to find an easy solution to exporting a Canvas in my WPF Application to a PDF Document.
So far, the best solution has been to use the PrintDialog and set it up to automatically use the Microsoft Print the PDF 'printer'. The only problem I have had with this is that although the PrintDialog is skipped, there is a FileDialog to choose where the file should be saved.
Sadly, this is a deal-breaker because I would like to run this over a large number of canvases with automatically generated PDF names (well, programitically provided anyway).
Other solutions I have looked at include:
Using PrintDocument, but from my experimentation I would have to manually iterate through all my Canveses children and manually invoke the correct Draw method (of which a lot of my custom elements with transformation would be rather time consuming to do)
Exporting as a PNG image and then embedding that in a PDF. Although this works, TextBlocks within my canvas are no longer text. So this isn't an ideal situation.
Using the 3rd party library PDFSharp has the same downfall as the PrintDocument. A lot of custom logic for each element.
With PDFSharp. I did find a method fir generating the XGraphics from a Canvas but no way of then consuming that object to make a PDF Page
So does anybody know how I can skip or automate the PDF PrintDialog, or consume PDFSharp XGraphics to make
A page. Or any other ideas for directions to take this besides writing a whole library to convert each of my Canvas elements to PDF elements.
If you look at the output port of a recent windows installation of Microsoft Print To PDF
You may note it is set to PORTPROMP: and that is exactly what causes the request for a filename.
You might note lower down, I have several ports set to a filename, and the fourth one down is called "My Print to PDF"
So very last century methodology; when I print with a duplicate printer but give it a different name I can use different page ratios etc., without altering the built in standard one. The output for a file will naturally be built:-
A) Exactly in one repeatable location, that I can file monitor and rename it, based on the source calling the print sequence, such that if it is my current default printer I can right click files to print to a known \folder\file.pdf
B) The same port can be used via certain /pt (printto) command combinations to output, not just to that default port location, but to a given folder\name such as
"%ProgramFiles%\Windows NT\Accessories\WORDPAD.EXE" /pt listIN.doc "My Print to PDF" "My Print to PDF" "listOUT.pdf"
Other drivers usually charge for the convenience of WPF programmable renaming, but I will leave you that PrintVisual challenge for another of your three wishes.
MS suggest XPS is best But then they would be promoting it as a PDF competitor.
It does not need to be Doc[X]2PDF it could be [O]XPS2PDF or aPNG2PDF or many pages TIFF2PDF etc. etc. Any of those are Native to Win 10 also other 3rd party apps such as [Free]Office with a PrintTo verb will do XLS[X]2PDF. Imagination becomes pagination.
I had a great success in generating PDFs using PDFSharp in combination with SkiaSharp (for more advanced graphics).
Let me begin from the very end:
you save the PdfDocument object in the following way:
PdfDocument yourDocument = ...;
string filename = #"your\file\path\document.pdf"
yourDocument.Save(filename);
creating the PdfDocument with a page can be achieved the following way (adjust the parameters to fit your needs):
PdfDocument yourDocument = new PdfDocument();
yourDocument.PageLayout = PdfPageLayout.SinglePage;
yourDocument.Info.Title = "Your document title";
PdfPage yourPage = yourDocument.AddPage();
yourDocument.Orientation = PageOrientation.Landscape;
yourDocument.Size = PageSize.A4;
the PdfPage object's content (as an example I'm putting a string and an image) is filled in the following way:
using (XGraphics gfx = XGraphics.FromPdfPage(yourPage))
{
XFont yourFont = new XFont("Helvetica", 20, XFontStyle.Bold);
gfx.DrawString(
"Your string in the page",
yourFont,
XBrushes.Black,
new XRect(0, XUnit.FromMillimeter(10), page.Width, yourFont.GetHeight()),
XStringFormats.Center);
using (Stream s = new FileStream(#"path\to\your\image.png", FileMode.Open))
{
XImage image = XImage.FromStream(s);
var imageRect = new XRect()
{
Location = new XPoint() { X = XUnit.FromMillimeter(42), Y = XUnit.FromMillimeter(42) },
Size = new XSize() { Width = XUnit.FromMillimeter(42), Height = XUnit.FromMillimeter(42.0 * image.PixelHeight / image.PixelWidth) }
};
gfx.DrawImage(image, imageRect);
}
}
Of course, the font objects can be created as static members of your class.
And this is, in short to answer your question, how you consume the XGraphics object to create a PDF page.
Let me know if you need more assistance.
I'm adding a link to a file from a pdf document (created with itext)
this way:
Chunk chunk = new Chunk(fileName, font);
chunk.SetAnchor("./relative/path/to/file");
Link works great if I open document in Google Chrome or Adobe reader.
But it doesn't work if I open my PDF in Microsoft Edge.
Is it even possible to create a file link inside pdf with itext that will work in Microsoft Edge? If yes, then how?
Is it even possible to create a file link inside pdf with itext that will work in Microsoft Edge?
If yes, then how?
Having done some tests it appears that Edge does not support relative links in PDF documents.
It does support absolute links, though, given the full URI, e.g.
chunk = new Chunk("Only ASCII chars in target. Full path.");
chunk.SetAnchor("file:///C:/Repo/GitHub/testarea/itext5/target/test-outputs/annotate/Attachments/1.png");
doc.Add(new Paragraph(chunk));
In contrast to other PDF viewers (Adobe Reader, Chrome, cf. your previous question in this context) it does not support URL encoding of special characters like Cyrillic ones:
chunk = new Chunk("Cyrillic chars in target. URL-encoded. Full path. NOT WORKING");
chunk.SetAnchor("file:///C:/Repo/GitHub/testarea/itext5/target/test-outputs/annotate/" + WebUtility.UrlEncode("Вложения") + "/1.png");
doc.Add(new Paragraph(chunk));
But it does support the special characters in UTF-8 encoding. As UTF-8 PdfString encoding is a PDF-2.0 feature and iText 5 does not support PDF-2.0, one has to cheat a bit to inject strings in UTF-8 encoding here:
chunk = new Chunk("Cyrillic chars in target. Action manipulated. Full path.");
chunk.SetAnchor("XXX");
action = (PdfAction)chunk.Attributes[Chunk.ACTION];
action.Put(PdfName.URI, new PdfString(new UTF8Encoding().GetBytes("file:///C:/Repo/GitHub/testarea/itext5/target/test-outputs/annotate/Вложения/1.png")));
doc.Add(new Paragraph(chunk));
Tested with Edge 41.16299.666.0
Is it possible to use iText 7 to flatten an XFA PDF? I'm only seeing Java documentation about it (http://developers.itextpdf.com/content/itext-7-examples/itext-7-form-examples/flatten-xfa-using-pdfxfa).
It seems like you can use iTextSharp, however to do this.
I believe it's not an AcroForm PDF because doing something similar to this answer How to flatten pdf with Itext in c#? simply created a PDF that wouldn't open properly.
It looks like you have to use iTextSharp and not iText7. Looking at the NuGet version it looks like iTextSharp is essentially the iText5 .NET version and like Bruno mentioned in the comments above, the XFA stuff simply hasn't been ported to iText7 for .NET.
The confusion stemmed from having both iText7 and iTextSharp versions in NuGet and also the trial page didn't state that the XFA worker wasn't available for the .NET version of iText7 (yet?)
I did the following to accomplish what I needed at least for a trial:
Request trial copy here: http://demo.itextsupport.com/newslicense/
You'll be emailed an xml license key, you can just place it on your desktop for now.
Create a new console application in Visual Studio
Open the Project Manager Console and type in the following and press ENTER (this will install other dependencies as well)
Install-Package itextsharp.xfaworker
Use the following code:
static void Main(string[] args)
{
ValidateLicense();
var sourcePdfPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "<your_xfa_pdf_file>");
var destinationPdfPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "output.pdf");
FlattenPDF(sourcePdfPath, destinationPdfPath);
}
private static void ValidateLicense()
{
var licenseFileLocation = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "itextkey.xml");
iTextSharp.license.LicenseKey.LoadLicenseFile(licenseFileLocation);
}
private static void FlattenPDF(string sourcePdfPath, string destinationPdfPath)
{
using (var sourcePdfStream = File.OpenRead(sourcePdfPath))
{
var document = new iTextSharp.text.Document();
var writer = iTextSharp.text.pdf.PdfWriter.GetInstance(document, new FileStream(destinationPdfPath, FileMode.Create));
var xfaf = new iTextSharp.tool.xml.xtra.xfa.XFAFlattener(document, writer);
sourcePdfStream.Position = 0;
xfaf.Flatten(new iTextSharp.text.pdf.PdfReader(sourcePdfStream));
document.Close();
}
}
The trial will put a huge watermark on the resulting PDF, but at least you can get it working and see how the full license should work.
For IText 7 this could be done in the following way
LicenseKey.LoadLicenseFile(#"Path of the license file");
MemoryStream dest_File = new MemoryStream();
XFAFlattener xfaFlattener = new XFAFlattener();
xfaFlattener.Flatten(new MemoryStream( File.ReadAllBytes(#"C:\\Unflattened file")), dest_File);
File.WriteAllBytes("flatten.pdf", dest_File.ToArray());
I want to convert pdf file to text file but some of pdf files do not work with pdfbox dll as the version of acrobat in newer than Acrobat 5.x
Please tell me what i do?
output.WriteLine("Begin Parsing.....");
output.WriteLine(DateTime.Now.ToString());
PDDocument doc = PDDocument.load(path);
PDFTextStripper stripper = new PDFTextStripper();
output.Write(stripper.getText(doc));
Your first attempt should be to try with a current version of PDFBox. Your version 0.7.3 dates back to 2006! PDFBox meanwhile has become an Apache project and is located here: http://pdfbox.apache.org/ and the current version (as of May 2013) is 1.8.1. And I'm very sure that PDFBox nowerdays does support PDF object streams and cross reference streams which were new in PDF Reference version 1.5, the version Adobe Acrobat 6 has been built for
If that does not work, you might want to try other PDF libraries, e.g. iText (or iTextSharp in your case) version 5.4.x if the AGPL (or alternatively buying a license) is no problem for you.
Information on text parsing using iText(Sharp) can be found in chapter15 Marked content and parsing PDF of iText in Action — 2nd Edition. The samples from that chapter can be found online: Java and .Net.
For a first test the sample ExtractPageContentSorted2.cs / ExtractPageContentSorted2.java would be a good start. The central code:
PdfReader reader = new PdfReader(PDF_FILE);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= reader.NumberOfPages; i++) {
sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
}
If neither a current PDFBox version nor a current iText(Sharp) version can parse your PDF, you might want to post a sample for inspection; there are ways to drop all information required for text parsing from a PDF...
I have a report that has hundreds of pages. I need to create extract each individual page from this document into a new document. I have found that this is possible using INTEROP, however I'm trying avoid installing MS Office on the server. I've been using ASPOSE for most of the operations, but this functionality is doesn't appear to be supported.
Is there a way to seperate pages of a document into individual files without having MS Office Installed?
Aspose.Words does not have layout information like pages or line numbers. It maintains DOM. But we have written some utility classes to achieve such behavior. They split the word document into multiple sections, such that each page becomes one separate section. After that, it is easy to copy individual pages.
String sourceDoc = dataDir + "source.docx";
String destinationtDoc = dataDir + "destination.docx";
// Initialize the Document instance with source and destination documents
Document doc = new Document(sourceDoc);
Document dstDoc = new Document();
// Remove the blank default page from new document
dstDoc.RemoveAllChildren();
PageNumberFinder finder = new PageNumberFinder(doc);
// Split nodes across pages
finder.SplitNodesAcrossPages(true);
// Get separate page sections
ArrayList pageSections = finder.RetrieveAllNodesOnPages(1, 5, NodeType.Section);
foreach (Section section in pageSections)
dstDoc.AppendChild(dstDoc.ImportNode(section, true));
dstDoc.LastSection.Body.LastParagraph.Remove();
dstDoc.Save(destinationtDoc);
The PageNumberFinder class can be downloaded from here.
PS. I am a Developer Evangelist at Aspose.