I need to generate a single PDF from several different PDFs of unknown page sizes:
1) Create a cover sheet from a template and write the text onto it.
2) Pull a PDF (unknown page size) and append it to the above.
3) Repeat until all required PDFs are attached.
Step 1 is not a problem and already works, so I have a cover sheet PDF generated. I now need a way to append the additional PDFs as described above. How can we achieve this using iTextSharp?
If you are trying to concatenate multiple PDF files into one, you may want to take a look at the following post.
I found a simple way to do this: the PdfCopy family of classes in iTextSharp. Here it is with PdfCopyFields:
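using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

// Appends every source PDF, in order, to the destination stream.
void MergePdfStreams(List<Stream> Source, Stream Dest)
{
    var copy = new PdfCopyFields(Dest);
    foreach (Stream source in Source)
    {
        var reader = new PdfReader(source);
        copy.AddDocument(reader);   // copies all pages of this document
    }
    copy.Close();                   // finishes writing the merged PDF
}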
Source: Is there a straight forward way to append one PDF doc to another using iTextSharp?
Related
I want to check whether a given text is present in an uploaded PDF file in ASP.NET (C#).
using (MemoryStream str = new MemoryStream(this.docUploadField.FileBytes))
{
    using (StreamReader sr = new StreamReader(str, Encoding.UTF8))
    {
        string line = sr.ReadToEnd();
    }
}
When I read the contents of the file, I get the output below as the file content.
Please help me with this.
You need a PDF-reading library; reading the raw bytes as UTF-8 text will not work, because a PDF is not a plain-text file.
The best-known ones are:
iText (iTextSharp, for those who remember it): https://github.com/itext/itext7-dotnet
PdfSharp: https://github.com/empira/PDFsharp
and many other free options.
With these you open the PDF file, read it, and extract the text you need.
They usually give you a collection of the PDF's elements (paragraphs, images, etc.), and you either loop through them or use a search function to find what you need.
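As a rough illustration of the "loop through them" step, here is a minimal C# sketch that works on text a library has already extracted, one string per page. `PdfTextSearch` and `FindPagesContaining` are hypothetical names, and how you obtain the page texts depends on the library you pick; only the standard library is used here.

```csharp
using System;
using System.Collections.Generic;

public static class PdfTextSearch
{
    // Hypothetical helper: pageTexts holds the text a PDF library extracted
    // from each page, in page order. Returns the 1-based page numbers whose
    // text contains searchTerm.
    public static List<int> FindPagesContaining(IReadOnlyList<string> pageTexts, string searchTerm)
    {
        var hits = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            // Ordinal, case-insensitive comparison (culture-independent).
            if (pageTexts[i].IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hits.Add(i + 1);
            }
        }
        return hits;
    }
}
```

An empty result means the text was not found. Note that extracted text can differ from what you see on screen (split words, extra spaces), so an exact substring match may miss some occurrences.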
I need to export an RDLC report to PDF, and the PDF should also contain the company letterhead in the background. The problem is that an RDLC report has a header, a body, and a footer, so how do we apply a common background image? Any ideas?
(Posted as an answer, since it's too long for a comment.)
I don't know of a method to add a page background directly in an RDLC. However, we had a similar issue with our report generator (MS Access, not RDLC), and solved it by (1) creating the PDF without letterhead and then (2) using PDFSharp to merge the resulting PDF with a letterhead ("background") PDF. Something like this might work for your use case as well.
We use the following code:
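using System.Linq;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// source: PDF without letterhead; background: letterhead PDF; result: output path
public static void AddBackground(string source, string background, string result)
{
    using (var formBackground = XPdfForm.FromFile(background))
    using (var pdf = PdfReader.Open(source, PdfDocumentOpenMode.Modify))
    {
        foreach (var page in pdf.Pages.Cast<PdfPage>())
        {
            // Prepend draws the background underneath the existing page content
            using (var xg = XGraphics.FromPdfPage(page, XGraphicsPdfPageOptions.Prepend))
            {
                xg.DrawImage(formBackground, 0, 0);
            }
            // Move to the next background page; stay on the last one when they run out
            if (formBackground.PageIndex < formBackground.PageCount - 1)
            {
                formBackground.PageIndex += 1;
            }
        }
        pdf.Save(result);
    }
}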
All parameters are paths to the respective PDF files. If the background PDF has fewer pages than the source PDF, the last page of the background PDF is used for all remaining source pages. This is useful when the first page needs a different background from the rest: you just need a two-page background PDF.
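The page-matching rule described above boils down to a clamped index. Here is a minimal C# sketch of that rule with a hypothetical helper name; it mirrors the `PageIndex` bookkeeping in `AddBackground` rather than being part of PDFsharp.

```csharp
using System;

public static class BackgroundPages
{
    // Hypothetical helper: returns the zero-based background page used for
    // zero-based source page sourcePageIndex. Background page i is used for
    // source page i until the background runs out; then its last page is reused.
    public static int BackgroundPageIndex(int sourcePageIndex, int backgroundPageCount)
    {
        if (backgroundPageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(backgroundPageCount), "The background PDF needs at least one page.");
        if (sourcePageIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(sourcePageIndex));
        return Math.Min(sourcePageIndex, backgroundPageCount - 1);
    }
}
```

With a two-page background, source page 0 gets background page 0, and every later page gets background page 1, which is the "different first page" case mentioned above.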
I have two PDF files and want to merge them into a single PDF using IronPDF (https://ironpdf.com/). Here is the code I am using:
var PDFs = new List<PdfDocument>();
foreach (var file in files)
{
    PDFs.Add(PdfDocument.FromFile(file));
}
PdfDocument PDF = PdfDocument.Merge(PDFs);
newFileName = Path.Combine(TEMP_PDF_FILESTORE_LOCATION, newFileName);
PDF.SaveAs(newFileName);
While merging the two files, it throws the error "Could not safely read page objects from AnotherPdfFile". One of the PDFs may contain images; some image PDFs merge fine, while others throw this error.
How can we fix this error?
I got the same error (Could not safely read page objects from AnotherPdfFile) when I tried to merge PDF documents that were constructed using streams that came from another service.
To solve this, I had to first copy each stream into a MemoryStream and then pass the memory stream into the PdfDocument constructor. Using memory streams, IronPDF was able to merge the documents.
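The copy step from this answer can be sketched with the standard library alone. `StreamBuffering.ToMemoryStream` is a hypothetical helper name; the answer then passes the resulting MemoryStream to IronPDF's PdfDocument constructor, which is not shown here. The answer does not say why the original streams failed; a plausible guess is that they were not seekable, but treat that as an assumption.

```csharp
using System;
using System.IO;

public static class StreamBuffering
{
    // Hypothetical helper: buffers whatever remains in `source` (from its
    // current position) into a new MemoryStream and rewinds it, so the
    // consumer receives a seekable stream with a known length.
    public static MemoryStream ToMemoryStream(Stream source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        var buffer = new MemoryStream();
        source.CopyTo(buffer);   // Stream.CopyTo is available since .NET 4.0
        buffer.Position = 0;     // rewind so the reader starts at byte 0
        return buffer;
    }
}
```

Because `CopyTo` starts at the source's current position, make sure the incoming stream has not already been partially read before buffering it.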
How can I read pdf files and save contents to a text file using Spire.PDF?
For example, here is a PDF file, and here is the desired text file from that PDF.
I tried the code below to read the file and save it to a text file:
using System.IO;
using System.Text;
using Spire.Pdf;

// Load the PDF and append the extracted text of each page to a buffer
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"C:\Users\Tamal\Desktop\101395a.pdf");
StringBuilder buffer = new StringBuilder();
foreach (PdfPageBase page in doc.Pages)
{
    buffer.Append(page.ExtractText());
}
doc.Close();
// Write the text out and open it with the default viewer
String fileName = @"C:\Users\Tamal\Desktop\101395a.txt";
File.WriteAllText(fileName, buffer.ToString());
System.Diagnostics.Process.Start(fileName);
However, the output text file is not properly formatted: it contains unnecessary whitespace, and complete paragraphs are broken into multiple lines.
How do I get the desired result as in the desired text file?
Additionally, is it possible to detect and mark (e.g., add a tag to) bold, italic, or underlined text? Things get even more problematic for pages with multiple columns of text.
Using iText
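import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.SimpleTextExtractionStrategy;

// Extract the text of page 1 with iText 7's simple strategy; the document is closed afterwards
try (PdfDocument pdfDocument = new PdfDocument(new PdfReader("input.pdf"))) {
    SimpleTextExtractionStrategy stes = new SimpleTextExtractionStrategy();
    PdfCanvasProcessor canvasProcessor = new PdfCanvasProcessor(stes);
    canvasProcessor.processPageContent(pdfDocument.getPage(1));
    System.out.println(stes.getResultantText());
}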
This is (as the code says) a basic/simple text extraction strategy.
More advanced examples can be found in the documentation.
Use IronOCR
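using System;
using System.IO;

// Verbatim strings (@"...") so that "\D" is not parsed as an escape sequence
var Ocr = new IronOcr.AutoOcr();
var Results = Ocr.ReadPdf(@"E:\Demo.pdf");
File.WriteAllText(@"E:\Demo.txt", Convert.ToString(Results));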
For reference: https://ironsoftware.com/csharp/ocr/
With this you should get formatted text output, though not exactly the output you want.
If you need the exact pre-interpreted output, look at paid OCR SDKs such as the OmniPage Capture SDK and the ABBYY FineReader SDK.
That is the nature of PDF. It basically says "go to this location on a page and place this character there." I'm not at all familiar with Spire.PDF (I work with Java and the PDFBox library), but any attempt to extract text from a PDF is heuristic and hence imperfect. This problem has received considerable attention, and some applications get better results than others, so you may want to survey the available options. Still, I think you'll have to clean up the result.
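Some of that clean-up can be done with simple heuristics. The C# sketch below is a hypothetical helper, not part of Spire.PDF: it assumes blank lines separate paragraphs and every other line break is a soft wrap, collapses runs of spaces and tabs, and rejoins wrapped lines. It does not handle multi-column layouts, hyphenated words, or bold/italic detection, which need position and font information from the extraction library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExtractedTextCleaner
{
    // Hypothetical helper: heuristic clean-up of text returned by a PDF extractor.
    // Assumption: blank lines separate paragraphs; other line breaks are soft wraps.
    public static string Normalize(string raw)
    {
        var paragraphs = new List<string>();
        var current = new List<string>();
        string[] lines = raw.Replace("\r\n", "\n").Split('\n');
        foreach (string line in lines)
        {
            // Collapse runs of spaces/tabs inside the line and trim both ends.
            string cleaned = Regex.Replace(line, @"[ \t]+", " ").Trim();
            if (cleaned.Length == 0)
            {
                // A blank line closes the current paragraph, if there is one.
                if (current.Count > 0)
                {
                    paragraphs.Add(string.Join(" ", current));
                    current.Clear();
                }
            }
            else
            {
                current.Add(cleaned);
            }
        }
        if (current.Count > 0)
        {
            paragraphs.Add(string.Join(" ", current));
        }
        // Paragraphs are separated by one blank line in the output.
        return string.Join("\n\n", paragraphs);
    }
}
```

In the Spire.PDF code from the question, this would be applied to `buffer.ToString()` before calling `File.WriteAllText`.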
I have a PDF produced by SSRS. I need to get this PDF as a byte array and then save the whole PDF as A4 landscape.
I tried:
string say = "hello world";
byte[] pdfArr = Encoding.UTF8.GetBytes(say);
// A4 rotated is A4 landscape
var doc = new Document(iTextSharp.text.PageSize.A4.Rotate());
string path = Path.Combine(Environment.CurrentDirectory, "pdfdoc.pdf");
PdfWriter.GetInstance(doc, new FileStream(path, FileMode.Create));
doc.Open();
// Decode the bytes back into a string and add it as a paragraph
doc.Add(new Paragraph(Encoding.UTF8.GetString(pdfArr)));
doc.Close();
Process.Start(path);
When I create a new PDF with iTextSharp, the code above works fine, but when I try it with the SSRS PDF, the output is filled with meaningless characters.
I also know I can read and rotate the PDF page by page with PdfReader, but I don't want to work page by page. The report's table is so long that it is split across pages, and I don't know how many pages one table will span, so my main aim is to show each table in landscape as a single table.
Any suggestions or code snippets are welcome.
Thanks anyway.
Edit: As I explained above, I can't take the pages with PdfReader or anything similar, because I don't want to (and can't) change every page to landscape; that doesn't serve my aim. I just want to create the PDF in landscape so that all the long tables and data can be seen on one page.
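The meaningless characters come from the decoding step: the SSRS byte array is a complete PDF file (binary structure, usually with compressed content streams), not UTF-8 text, so `Encoding.UTF8.GetString` followed by `new Paragraph(...)` just prints the file's raw bytes. The minimal C# sketch below, with a hypothetical helper name, checks for the `%PDF-` file header to tell the two cases apart. A byte array that passes this check has to be opened as a PDF (iTextSharp's `PdfReader` has a `byte[]` constructor), not decoded as text.

```csharp
using System;
using System.Text;

public static class PdfBytes
{
    private static readonly byte[] Header = Encoding.ASCII.GetBytes("%PDF-");

    // Hypothetical helper: true if the "%PDF-" header appears within the first
    // 1024 bytes (many readers tolerate a little leading junk before it).
    public static bool LooksLikePdf(byte[] data)
    {
        if (data == null) return false;
        int limit = Math.Min(data.Length, 1024) - Header.Length;
        for (int start = 0; start <= limit; start++)
        {
            bool match = true;
            for (int k = 0; k < Header.Length; k++)
            {
                if (data[start + k] != Header[k]) { match = false; break; }
            }
            if (match) return true;
        }
        return false;
    }
}
```

For the "hello world" test array this returns false, which is why that case looked fine; for the bytes returned by SSRS it should return true.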