I'm generating PDF files using PDFsharp, and I need to overlay the PDF I'm generating with a specific page from another PDF.
I've created this method:
private void ApplyOverlay(XGraphics graph, string overlaypdfPath, int pageNumberInOverlay, XRect coordinates)
{
var xPdf = XPdfForm.FromFile(overlaypdfPath);
if(xPdf.PageCount < pageNumberInOverlay)
throw new Exception("not enough pages");
//Here i need to take from xPdf just the page number -> pageNumberInOverlay
graph.DrawImage(xPdfPageN, coordinates);
}
I don’t know how to select only a specific page.
You can append the page number to the name of the PDF file, separated with a hash sign ("#").
To get page 7 of "sample.pdf", use the filename "sample.pdf#6" (zero-based page numbers).
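As an alternative sketch that avoids building a special filename, XPdfForm also exposes a zero-based PageIndex property (the AddBackground code further down this page uses it). The helper below only does the index arithmetic and range check; the class name OverlayPages and the method ToPageIndex are hypothetical, and the commented usage assumes the ApplyOverlay method from the question.

```csharp
using System;

public static class OverlayPages
{
    // Converts a one-based page number into a zero-based page index,
    // rejecting values that fall outside the document.
    public static int ToPageIndex(int pageNumber, int pageCount)
    {
        if (pageNumber < 1 || pageNumber > pageCount)
            throw new ArgumentOutOfRangeException(nameof(pageNumber),
                $"Expected a page between 1 and {pageCount}.");
        return pageNumber - 1;
    }
}

// Hypothetical use inside ApplyOverlay (requires PDFsharp):
//   var xPdf = XPdfForm.FromFile(overlaypdfPath);
//   xPdf.PageIndex = OverlayPages.ToPageIndex(pageNumberInOverlay, xPdf.PageCount);
//   graph.DrawImage(xPdf, coordinates);
```

Setting PageIndex before the DrawImage call selects which page of the external PDF is drawn, so the same XPdfForm object can be reused for several pages.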
I have a requirement to export an RDLC report to PDF, and the PDF should also show the company letterhead in the background. The problem I see is that an RDLC report has a header, body, and footer, so how do we apply a common background image across all three? Any ideas?
(Posted as an answer, since it's too long for a comment.)
I don't know of a method to add a page background directly in an RDLC. However, we had a similar issue with our report generator (MS Access, not RDLC), and solved it by (1) creating the PDF without letterhead and then (2) using PDFSharp to merge the resulting PDF with a letterhead ("background") PDF. Something like this might work for your use case as well.
We use the following code:
public static void AddBackground(string source, string background, string result)
{
    using (var formBackground = XPdfForm.FromFile(background))
    using (var pdf = PdfReader.Open(source, PdfDocumentOpenMode.Modify))
    {
        foreach (var page in pdf.Pages.Cast<PdfPage>())
        {
            var xg = XGraphics.FromPdfPage(page, XGraphicsPdfPageOptions.Prepend);
            xg.DrawImage(formBackground, 0, 0);
            if (formBackground.PageIndex < formBackground.PageCount - 1)
            {
                formBackground.PageIndex += 1;
            }
        }
        pdf.Save(result);
    }
}
All parameters are paths to the respective PDF files. If the background PDF has fewer pages than the source PDF, the last page of the background PDF is applied to all remaining source pages. This is useful if your first page needs a different background than the rest: you just need a 2-page background PDF for that.
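The page-selection rule described above can also be written as a small pure function, which may make the behaviour easier to check in isolation. This is only a sketch: BackgroundPages and IndexFor are hypothetical names, and all indexes are zero-based.

```csharp
using System;

public static class BackgroundPages
{
    // Zero-based index of the background page drawn behind a given source page.
    // Once the background runs out of pages, its last page is reused.
    public static int IndexFor(int sourcePageIndex, int backgroundPageCount)
    {
        if (sourcePageIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(sourcePageIndex));
        if (backgroundPageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(backgroundPageCount));
        return Math.Min(sourcePageIndex, backgroundPageCount - 1);
    }
}
```

With a 2-page background and a 5-page source, the source pages get background indexes 0, 1, 1, 1, 1, which matches what the loop in AddBackground does by incrementing PageIndex until it reaches the last page.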
I want to split a Word document into PDF files by searching each page for a specific word (a section may span one page or more).
For example, suppose a file has three pages, with one special word on the first page and the next special word on the third page. I want to save the first and second pages as one PDF and the third page as a separate PDF. Each PDF file should be named after the special word found on its first page.
My problem is that I don't know how to loop over the pages, read each page's content to find the special word, and save the pages as PDF.
Thank You
Here is how you can do it.
Paginate your Word document using DocumentModel.GetPaginator method.
Read the text content of each page using FrameworkElement.ToText extension method.
Save selected pages to PDF using DocumentModelPage.Save method.
In other words, try the following:
string search = "Your Specific Word";
string inputPath = "input.docx";

// Load Word document.
var document = DocumentModel.Load(inputPath);

// 1. Get document's pages.
var pages = document.GetPaginator().Pages;

for (int i = 0, count = pages.Count; i < count; ++i)
{
    // 2. Read page's text content.
    DocumentModelPage page = pages[i];
    string pageTextContent = page.PageContent.ToText();

    // 3. Save page as PDF.
    if (pageTextContent.Contains(search))
    {
        string outputPath = $"{search}_{i}.pdf";
        page.Save(outputPath);
    }
}
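The loop above saves only the pages that contain the word. For the case in the question, where pages without a special word belong to the preceding section, the grouping step could be sketched as plain list logic, independent of any document library. Everything here is an assumption for illustration: the names PageGrouping and Group are hypothetical, page texts are taken to be already extracted (for example with ToText as above), and saving each range as one PDF is left to whatever the library in use offers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageGrouping
{
    // Returns (keyword, first page, last page) ranges using zero-based,
    // inclusive page indexes. A page containing a keyword starts a new range;
    // pages before the first keyword are skipped.
    public static List<(string Keyword, int First, int Last)> Group(
        IReadOnlyList<string> pageTexts, IReadOnlyList<string> keywords)
    {
        var groups = new List<(string Keyword, int First, int Last)>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            var hit = keywords.FirstOrDefault(k => pageTexts[i].Contains(k));
            if (hit != null)
            {
                groups.Add((hit, i, i));
            }
            else if (groups.Count > 0)
            {
                // No keyword on this page: extend the current range.
                var current = groups[groups.Count - 1];
                groups[groups.Count - 1] = (current.Keyword, current.First, i);
            }
        }
        return groups;
    }
}
```

For the three-page example in the question, this yields one range covering the first two pages and a second range covering only the third page, each labelled with its keyword.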
I have a PDF that was produced by SSRS. I need to get this PDF as a byte array and then save the whole PDF as A4 landscape.
I tried:
string say = "hello world";
byte[] pdfArr = Encoding.UTF8.GetBytes(say);
var doc = new Document(iTextSharp.text.PageSize.A4.Rotate());
string path = Environment.CurrentDirectory;
PdfWriter.GetInstance(doc, new FileStream(path + "/pdfdoc.pdf", FileMode.Create));
doc.Open();
doc.Add(new Paragraph(Encoding.UTF8.GetString(pdfArr)));
doc.Close();
Process.Start(path + "/pdfdoc.pdf");
When I create a new PDF with iTextSharp, the code above works fine, but when I try it with the SSRS PDF bytes, the resulting PDF is filled with meaningless characters.
I also know that I can read and rotate the PDF page by page via PdfReader, but I don't want to process individual pages. The report's table is so long that it is split across pages, and I don't know how many pages one table will span, so my main aim is to show it horizontally (landscape) as one table.
Any suggestions or code snippets are welcome.
Thanks anyway.
Edit: As I explained above, I can't take pages with PdfReader or something similar, because I neither want to nor can change every page to landscape; that doesn't serve my aim. I just want to create the PDF in landscape so that all the long tables and data can be seen on one page.
I have a problem working with image PDF files (PDF files containing only an image, no text). There are two PDF files, img1 and img2, and I want to combine them into one A4-page PDF file.
I have tried the code below.
string Img1 = "C:/temp/image1.pdf";
string Img2 = "C:/temp/image2.pdf";
string MergedFile = "C:/temp/Combo.pdf";
//Create our PDF readers
PdfReader r1 = new PdfReader(Img1);
PdfReader r2 = new PdfReader(Img2);
//Our new page size, an A3 in landscape mode
iTextSharp.text.Rectangle NewPageSize = PageSize.A3.Rotate();
using (FileStream fs = new FileStream(MergedFile, FileMode.Create,
FileAccess.Write, FileShare.None))
{
//Create our document without margins
using (Document doc = new Document(NewPageSize, 0, 0, 0, 0))
{
using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
{
doc.Open();
//Get our imported pages
PdfImportedPage imp1 = w.GetImportedPage(r1, 1);
PdfImportedPage imp2 = w.GetImportedPage(r2, 1);
//Add them to our merged document at specific X/Y coords
w.DirectContent.AddTemplate(imp1, 0, 0);
w.DirectContent.AddTemplate(imp2, 0, -350);
doc.Close();
}
}
}
r1.Close();
r2.Close();
When I execute the code above, it combines the PDFs and places both images on one page, but only because I hard-coded the Y coordinate.
But I don't want to do that.
Here I am just giving an example with two images, but in reality there are more than 20 images (converted to PDFs).
So the files should be combined according to each image's size; I cannot give a fixed Y coordinate for each and every file.
Can anyone please help me combine multiple PDFs into a single one with no blank space?
Structurally, here is what you want to do:
Allocate a new page of the "right" size
Merge the content streams of the pages
Merge the resources of the pages
Adjust all the annotations (if any)
The first step is easy, the second is easy, the third not so much (and it will have the side effect of complicating step 2). I'll let you know ahead of time that I lied to you about the order.
Merging the content streams will be straightforward. What you will want to do is a four-step process (I'll inject here that I know PDF very well, but iTextSharp not too well):
Insert a gsave operator (q)
Insert a transform operator (cm) to transform to the location where you want content to appear. In your case it will be 1 0 0 1 X Y cm
Copy the content streams from the current page
Insert a grestore operator (Q)
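The four steps above can be sketched as a string-building helper. This is a minimal illustration, assuming the page content is already available as decoded text; ContentWrapper and Wrap are hypothetical names, and no resource renaming is applied here.

```csharp
using System;
using System.Globalization;

public static class ContentWrapper
{
    // Surrounds a page's content stream text with q ... Q and a pure
    // translation matrix (1 0 0 1 X Y cm), so the content is drawn at (x, y)
    // and the graphics state is restored afterwards.
    public static string Wrap(string content, double x, double y)
    {
        // PDF numbers always use '.' as the decimal separator.
        string tx = x.ToString("0.###", CultureInfo.InvariantCulture);
        string ty = y.ToString("0.###", CultureInfo.InvariantCulture);
        return $"q\n1 0 0 1 {tx} {ty} cm\n{content}\nQ\n";
    }
}
```

Using the invariant culture matters on machines whose locale writes decimals with a comma, since a comma would make the operands invalid in a content stream.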
To merge the resources, you have to look at your newly created page's resources and for the current page do one of three things for each resource in each class of resource in a PDF page (XObject, Font, ColorSpace, ExtGState, Pattern, Shading, ProcSet - although for procset, you could set each procset to be the entire suite and do no harm):
If the resource exists in the newly created page, but under a different name, mark it as renamed.
If the resource does not exist in the newly created page and there is no resource with the same name, copy it in.
If the resource does not exist in the newly created page and there is a name conflict, rename the resource to a synthetic name not in the newly created page and copy it in.
Now to get back to my lie. In the resource merging, you will likely need a map built for the current page that maps old resource name to new resource name. When in the process of copying the content stream from one to the next, you will need to map all resource names referenced in the content stream to the new names built in the resource merge step.
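A rough sketch of building that old-name to new-name map for one resource class follows. It is deliberately library-independent: each resource is modelled as a name mapped to an integer id standing in for the referenced PDF object, and ResourceMerger, Merge, and the synthetic "R0", "R1", ... names are all hypothetical choices made for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ResourceMerger
{
    // 'target' holds the merged page's resources, 'source' the current page's.
    // Returns old name -> new name for the current page and copies any
    // missing resources into 'target'.
    public static Dictionary<string, string> Merge(
        Dictionary<string, int> target, IReadOnlyDictionary<string, int> source)
    {
        var renames = new Dictionary<string, string>();
        int counter = 0;
        foreach (var entry in source)
        {
            // Already present (under the same or another name): reuse that name.
            var existing = target.Where(t => t.Value == entry.Value)
                                 .Select(t => t.Key)
                                 .FirstOrDefault();
            if (existing != null)
            {
                renames[entry.Key] = existing;
                continue;
            }

            // Not present: keep the name if it is free, otherwise pick a
            // synthetic name that is not yet used in the merged page.
            string name = entry.Key;
            while (target.ContainsKey(name))
                name = $"R{counter++}";
            target[name] = entry.Value;
            renames[entry.Key] = name;
        }
        return renames;
    }
}
```

The returned map is what the content stream copy would consult: every resource name operand in the copied stream gets replaced by its mapped name. In a real page this would run once per resource class (XObject, Font, and so on).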
To adjust annotations, you will have to move them to their new location by adjusting the Rect property in each. You will also need to reset the /Parent property. For any of the text markup annotations, you will need to adjust the Quads.
Now, here is where the works will get gummed up in all of that. If a page is rotated, this will not work. If a page has a crop box, you will have to look at it and adjust the clipping region to simulate the crop. If the page is rotated and has Text annotations, you will need to pay attention to the annotation flags to ensure that the aspect ratio is correct. If the document has link annotations on any of the pages with GoTo actions/destinations, you will need to adjust these.
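For the original goal of stacking pages with no blank space, the Y value for each page's transform could be derived from the page heights instead of being hard-coded. The following is a sketch under simplifying assumptions: every page is unrotated, has no crop box, and has its lower-left corner at the origin. PageStacker and Offsets are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageStacker
{
    // Returns the Y translation for each page so the pages sit top to bottom
    // without gaps. PDF user space has its origin at the bottom-left, so the
    // first page gets the largest offset and the last page gets 0.
    public static double[] Offsets(IReadOnlyList<double> heights)
    {
        double remaining = heights.Sum();
        var offsets = new double[heights.Count];
        for (int i = 0; i < heights.Count; i++)
        {
            remaining -= heights[i];
            offsets[i] = remaining;
        }
        return offsets;
    }
}
```

The height of the combined page would be the sum of the individual heights, and each offset could be used as Y in 1 0 0 1 0 Y cm or as the last argument of the AddTemplate call from the question.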
Here is how I split a large PDF (144 mb):
public int SplitAndSave(string inputPath, string outputPath)
{
FileInfo file = new FileInfo(inputPath);
string name = file.Name.Substring(0, file.Name.LastIndexOf("."));
using (PdfReader reader = new PdfReader(inputPath))
{
for (int pagenumber = 1; pagenumber <= reader.NumberOfPages; pagenumber++)
{
string filename = pagenumber.ToString() + ".pdf";
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileStream(outputPath + "\\" + filename, FileMode.Create));
document.Open();
copy.AddPage(copy.GetImportedPage(reader, pagenumber));
document.Close();
}
return reader.NumberOfPages;
}
}
For most PDFs (small ones, and I guess in an older format), everything works fine. But for bigger ones (which perhaps use something like refstreams for better compression), each split file opens as a single page, yet its size equals the original PDF's size. What can I do?
In case of your document Top_Gear_Magazine_2012_09.pdf the reason is indeed the one I mentioned: All pages refer to object 2 0 R as their /Resources, and the dictionary in 2 0 obj in turn references all images in the PDF.
To split that document into partial documents containing only the images required, you should preprocess the document by first finding out which images belong to which pages and then creating individual /Resources dictionaries for all pages.
As you already use iText in this context, you can also use it to find out which images are required for which pages. Use the iText parser package to initially parse the PDF page by page using a RenderListener implementation whose RenderImage method simply remembers which image objects are used on the current page. (As a special twist, iText hides the name of the image XObject in question; you get the indirect object, though, and can query its object and generation number which suffices for the next step.)
In a second step, you open the document in a PdfStamper and iterate over the pages. For each page you retrieve the /Resources dictionary and copy it, but only copy those XObject references that point to one of the image objects whose object number and generation you remembered for that page during the first step. Finally you set the diminished copy as the /Resources dictionary of the page in question.
The resulting PDF should split just fine.
PS: A very similar issue recently came up on the iText mailing list. In that thread the solution recipe given here has been improved. To get around the difficulties caused by iText hiding the XObject name, I would now propose intervening before the name is lost by using a different ContentOperator for "Do". Here is the Java version:
class Do implements ContentOperator
{
    public void invoke(PdfContentStreamProcessor processor, PdfLiteral operator, ArrayList<PdfObject> operands) throws IOException
    {
        PdfName xobjectName = (PdfName)operands.get(0);
        names.add(xobjectName);
    }

    final List<PdfName> names = new ArrayList<PdfName>();
}
This content operator simply collects the names of the used xobjects, i.e. the xobject resources to keep for the given page.
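Once the used names are known for a page, building the diminished XObject dictionary is a simple filter. The sketch below shows only that filtering step in C#, with plain dictionaries standing in for the PDF objects; ResourceFilter and KeepUsed are hypothetical names and nothing here calls iText.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ResourceFilter
{
    // Copies only those entries of the shared XObject dictionary whose names
    // were seen in the page's content stream.
    public static Dictionary<string, TRef> KeepUsed<TRef>(
        IReadOnlyDictionary<string, TRef> sharedXObjects, ISet<string> usedNames)
    {
        return sharedXObjects
            .Where(entry => usedNames.Contains(entry.Key))
            .ToDictionary(entry => entry.Key, entry => entry.Value);
    }
}
```

The filtered copy would then replace the XObject entry in that page's own /Resources dictionary, so a split page no longer drags along every image in the document.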