Is it possible to verify if we can copy the content of a PDF document with iTextSharp?
I have a method that copy the content of the PDF and add a new page at the end with project's information but it throw a "System.ArgumentException: PdfReader not opened with owner password". I get this error when I do writer.GetImportedPage(reader, i);
Thanks for the help!
You should be able to just check the property PdfReader.IsOpenedWithFullPermissions.
PdfReader r = new PdfReader("YourFile.pdf");
if (r.IsOpenedWithFullPermissions)
{
//Do something
}
Related
I'm generating a pdf file from a template with iTextSharp, filling each field in this code portion:
PdfReader pdfReader = new PdfReader(templatePath);
try
{
using (FileStream newFileStream = new FileStream(newFilePath, FileMode.Create))
{
using (PdfStamper stamper = new PdfStamper(pdfReader, newFileStream))
{
// fill each field
AcroFields pdfFormFields = stamper.AcroFields;
foreach (KeyValuePair<string, string> entry in content)
{
if (!String.IsNullOrEmpty(entry.Value))
pdfFormFields.SetField(entry.Key, entry.Value);
}
//The below will make sure the fields are not editable in
//the output PDF.
stamper.FormFlattening = true;
stamper.Close();
}
}
}
finally
{
pdfReader.Close();
}
Everything goes fine, file looks ok, but when i try to reopen the file to merge it with some other files I've generated in a unique document i get this error:
2015-11-23 09:46:54,651||ERROR|UrbeWeb|System.IO.IOException: The process cannot access the file 'D:\Sviluppo\communitygov\MaxiAnagrafeImmobiliare\MaxiAnagrafeImmobiliare\cache\IMU\E124\admin\Stampe\Provvedimento_00223850306_2015_11_23_094654.pdf' because it is being used by another process.
Error occurs at this point
foreach (Documento item in docs)
{
string fileName = item.FilePath;
pdfReader = new PdfReader(fileName); // IOException
// some other operations ...
}
Edit: Using Process monitor as suggested I can see there is no close CloseFile operation as I would expect. Can this be the source of the issue?
I've been stuck on this for hours any help is really really appreciated.
Had the same issue with me. This helped a lot.
"You're problem is that you are writing to a file while you are also reading from it. Unlike some file types (JPG, PNG, etc) that "load" all of the data into memory, iTextSharp reads the data as a stream. You either need to use two files and swap them at the end or you can force iTextSharp to "load" the first file by binding your PdfReader to a byte array of the file."
PdfReader reader = new PdfReader(System.IO.File.ReadAllBytes(filePath));
Ref: Cris Haas answer to Cannot access the file because it is being used by another process
I had a similar problem with opening pdf files (for read only) with iTextSharp PdfReader. The first file gave no problem, the second one gave that exception (can not access the file, etc.).
After hours and googling and searching for complicate solutions and twisting my brain, only the simple following code resolved it fully:
iTextSharp_pdf.PdfReader pdfReader = null;
pdfReader = new iTextSharp_pdf.PdfReader(fileName);
Is it possible using IText to copy PDF pages from a full PDF document and return partial document based on a form field name? For example I need to copy the beginning of a pdf document and stop at a certain text field called [STOP_HERE], so whatever contents before this fields need to be extracted, the [STOP_HERE] field could be located on a different page for each document, so using page numbers wouldn't help here.
I searched online and all I can find is a way to copy only form fields from a document but not the whole document elements including images texts with their exact location and style.
Can IText do the job here?
EDIT: More details
[STOP_HERE] is an AcroForms text field which has been placed in a document by the PDF design person to indicate that everything before this element should be copied as is into a different document. The field itself is not important, I don't want to fill or do anything with it, it's just used as a signal to let the document parser stop there and copy all previous (upper) contents, I just don't know how to read all contents (without changing style, contents, etc) before this field.
Is it possible using IText to copy PDF pages from a full PDF document and return partial document based on a form field name? For example I need to copy the beginning of a pdf document and stop at a certain text field called [STOP_HERE]
Unfortunately the OP didn't tell whether the page containing the form field [STOP_HERE] is to be included or not. As that is a mere +/-1 matter, though, I simply assumed the page is to be included.
Thus, the task can be implemented like this:
PdfReader reader = new PdfReader(srcFile);
AcroFields.Item field = reader.AcroFields.Fields["[STOP_HERE]"];
if (field != null)
{
int firstPage = reader.NumberOfPages + 1;
for (int index = 0; index < field.Size; index++)
{
int page = field.GetPage(index);
if (page > 0 && page < firstPage)
firstPage = page;
}
if (firstPage <= reader.NumberOfPages)
{
reader.SelectPages("1-" + firstPage);
PdfStamper stamper = new PdfStamper(reader, new FileStream(dstFile, FileMode.Create, FileAccess.Write));
stamper.Close();
}
}
reader.Close();
The code opens the source file in a PdfReader and first looks for the field. If it exists, it iterates over all appearances of that field and determines the earliest page with an appearance of the field. If there is such a page, the code restricts the reader to the pages up to that page and stores this restriction using a PdfStamper.
I have two pdf files. On Sercurity tab both files have set Security Method: No Security and Document Assembly: Not Allowed and page Extraction: Not Allowed. Other items are allowed.
I using standart ITextSharp method to retrieve text from pdf:
PdfReader pdfReader = new PdfReader(fileName);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy(); //LocationTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
From first file i can get currentText wihtout any problem from second file I cannot retrieve text, currentText is empty. I was trying with LocationTextExtractionStrategy, but result is the same. I opened this file in SodaPDF and convert it to txt file but this file is empty too (while frist file is converted to txt without any problems).
It is possible to read text from this file from C# or with any other application? If I buy Adobe Reader I will convert this file to txt ?
What is difference between these two files ?
Thanks
There may be a lot of pdf which actually are images. You cannot extracttext from imaged pdf as Bruno Lowagie said. you need to go for third party OCR for this.
you ca use Adobe Acrobat to convert the pdf to editable format like word, html..
I work as Social Media Developer at Aspose. I would suggest you to download and try Aspose.Pdf for .NET to convert PDF to Text file. In case your file contains images and you need to extract the text from those images, you can use Aspose.Pdf to convert Pdf file to images and then perform OCR using Aspose.OCR for .NET.
Following is the sample code to convert PDf to Text using Aspose.Pdf for .NET
//open document
Document pdfDocument = new Document("input.pdf");
//create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
//accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
//get the extracted text
string extractedText = textAbsorber.Text;
// create a writer and open the file
TextWriter tw = new StreamWriter("extracted-text.txt");
// write a line of text to the file
tw.WriteLine(extractedText);
// close the stream
tw.Close();
Please download a free trial and try it.
I am generating a pdf using iTextSharp. If certain properties are true then I also want to insert an existing pdf with static content.
private byte[] GeneratePdf(DraftOrder draftOrder)
// create a pdf document
var document = new Document();
// set the page size, set the orientation
document.SetPageSize(PageSize.A4);
// create a writer instance
var pdfWriter = PdfWriter.GetInstance(document, new FileStream(file, FileMode.Create));
document.Open();
if(draftOrder.hasProperty){
//add these things to the pdf
var textToBeAdded = "<table><tr>....</table>";
}
FormatHtml(document, textToBeAdded , css);
if(someOtherProperty){
//add static pdf from file
document.NewPage();
var reader = new PdfReader("myPath/existing.pdf");
PdfImportedPage page;
for(var i = 0; i < reader.NumberOfPages; i++){
//It's this bit I don't really understand
//**how can I add the page read to the document being created?**
}
I can load the pdf from the source but when I iterate over the pages I can't seem to be able to add them to the document I am creating.
Cheers
Please read http://manning.com/lowagie2/samplechapter6.pdf
If you don't mind losing all interactivity, you can get the template from the writer object with the GetImportedPage() method and add it to the document with AddTemplate ().
This question has been answered many times on StackOverflow and you'll notice that I always warn about some dangers: you need to realize that the dimensions of the imported page can be different from the page size you initially defined. Because of this invisible parts of the imported page can become visible; visible parts can become invisible.
I'd prefer adding the extra page in a second ho using PdfCopy, but maybe that's just me.
I've generated a pdf using iTextSharp and I can preview it very well in ASP.Net but I need to send it directly to printer without a preview. I want the user to click the print button and automatically the document prints.
I know that a page can be sent directly to printer using the javascript window.print() but I don't know how to make it for a PDF.
Edit: it is not embedded, I generate it like this;
...
FileStream stream = new FileStream(Request.PhysicalApplicationPath + "~1.pdf", FileMode.Create);
Document pdf = new Document(PageSize.LETTER);
PdfWriter writer = PdfWriter.GetInstance(pdf, stream);
pdf.Open();
pdf.Add(new Paragraph(member.ToString()));
pdf.Close();
Response.Redirect("~1.pdf");
...
And here I am.
Finally I made it, but I had to use an IFRAME, I defined an IFrame in the aspx and didn't set the src property, in the cs file I made generated the pdf file and set the src property of the iFrame as the generated pdf file name, like this;
Document pdf = new Document(PageSize.LETTER);
PdfWriter writer = PdfWriter.GetInstance(pdf,
new FileStream(Request.PhysicalApplicationPath + "~1.pdf", FileMode.Create));
pdf.Open();
//This action leads directly to printer dialogue
PdfAction jAction = PdfAction.JavaScript("this.print(true);\r", writer);
writer.AddJavaScript(jAction);
pdf.Add(new Paragraph("My first PDF on line"));
pdf.Close();
//Open the pdf in the frame
frame1.Attributes["src"] = "~1.pdf";
And that made the trick, however, I think that i should implement your solution Stefan, the problem is that I'm new to asp.net and javascript and if I don't have a complete source code I could not code your suggestion but at least is the first step, I was very surprised how much code in html and javascript i need to learn. Thnx.
Is the pdf embedded in the page with embedd-tag or just opened in a frame or how are you showing it?
If its embedded, just make sure that the object is selected and then do a print().
Get the ref to the embedded document.
var x = document.getElementById("mypdfembeddobject");
x.click();
x.setActive();
x.focus();
x.print();
It's a little more tricky if you're using pdfsharp but quite doable
PdfDocument document = new PdfDocument();
PdfPage page = document.AddPage();
XGraphics gfx = XGraphics.FromPdfPage(page);
XFont font = new XFont("Verdana", 20, XFontStyle.BoldItalic);
// Draw the text
gfx.DrawString("Hello, World!", font, XBrushes.Black,
new XRect(0, 0, page.Width, page.Height),
XStringFormats.Center);
// real stuff starts here
// current version of pdfsharp doesn't support actions
// http://www.pdfsharp.net/wiki/WorkOnPdfObjects-sample.ashx
// so we got to get close to the metal see chapter 12.6.4 of
// http://partners.adobe.com/public/developer/pdf/index_reference.html
PdfDictionary dict = new PdfDictionary(document); //
dict.Elements["/S"] = new PdfName("/JavaScript"); //
dict.Elements["/JS"] = new PdfString("this.print(true);\r");
document.Internals.AddObject(dict);
document.Internals.Catalog.Elements["/OpenAction"] =
PdfInternals.GetReference(dict);
document.Save(Server.MapPath("2.pdf"));
frame1.Attributes["src"] = "2.pdf";
ALso, try this gem:
<link ref="mypdf" media="print" href="mypdf.pdf">
I havent tested it, but what I have read about it, it can be used in this way to let the mypdf.pdf be printed instead of page content whatever method you are using to print the page.
Search for media="print" to check out more.
You can embed javascript in the pdf, so that the user gets a print dialog as soon as their browser loads the pdf.
I'm not sure about iTextSharp, but the javascript that I use is
var pp = this.getPrintParams();
pp.interactive = pp.constants.interactionLevel.automatic;
this.print(pp);
For iTextSharp, check out http://itextsharp.sourceforge.net/examples/Chap1106.cs