At the moment we generate PDFs programmatically using Crystal Reports and save them to the database. Each PDF document contains a barcode image, and each file is 120-150 KB in size.
Everything runs fine, but lately we are facing a problem with huge growth in database size and storage requirements, because 100 to 1000 records are generated each day.
Is there any way to compress the PDF files before storing them? Are there any APIs or tools that can do this without damaging the barcode? Can we expect much reduction in size from compression?
Or is there a better alternative way of storing the data?
Any suggestions on this would be highly appreciated.
Thanks,
Sveerap
Unfortunately, you won't gain much by compressing a PDF as it is already compressed.
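One way to check how much a generic compressor would gain on your own files is to measure it directly. The following is a minimal sketch that uses only System.IO.Compression; the names CompressionProbe and GzipRatio are illustrative and not part of any PDF library. A ratio close to 1.0 suggests the bytes are already compressed, so wrapping them in GZip before storing them would save little.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressionProbe
{
    // Returns compressedLength / originalLength for the given bytes.
    public static double GzipRatio(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length == 0) return 1.0;

        using (var buffer = new MemoryStream())
        {
            // leaveOpen keeps the buffer usable after the GZip stream is flushed and closed.
            using (var gzip = new GZipStream(buffer, CompressionLevel.Optimal, leaveOpen: true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return (double)buffer.Length / data.Length;
        }
    }
}
```

For a stored report this could be called as CompressionProbe.GzipRatio(File.ReadAllBytes(path)), where path points at one of the generated PDFs.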
Many compressed PDF files can be compressed further.
Size of a PDF file can usually be decreased by:
removing unused objects (if any)
removing extra whitespace characters from the file (not from the visual content)
using object streams (a PDF 1.5 feature)
I do not know how well Crystal Reports compresses PDFs, but you might want to try the Docotic.Pdf library with the following code and see if your files can be compressed better.
public static void CompressExistingDocument(string original, string output)
{
    using (PdfDocument pdf = new PdfDocument(original))
    {
        pdf.SaveOptions.Compression = PdfCompression.Flate;
        pdf.SaveOptions.UseObjectStreams = true;
        pdf.SaveOptions.RemoveUnusedObjects = true;
        pdf.SaveOptions.WriteWithoutFormatting = true;
        pdf.Save(output);
    }
    FileInfo originalFileInfo = new FileInfo(original);
    FileInfo compressedFileInfo = new FileInfo(output);
    MessageBox.Show(
        String.Format("Original file size: {0} bytes;\r\nCompressed file size: {1} bytes",
            originalFileInfo.Length, compressedFileInfo.Length));
    System.Diagnostics.Process.Start(output);
}
Disclaimer: I work for the vendor of the library.
I have two PDF files and I want to merge them into a single PDF file using IronPDF (https://ironpdf.com/). Here is the code I am using:
var PDFs = new List<PdfDocument>();
foreach (var file in files)
{
    PDFs.Add(PdfDocument.FromFile(file));
}
PdfDocument PDF = PdfDocument.Merge(PDFs);
newFileName = Path.Combine(TEMP_PDF_FILESTORE_LOCATION, newFileName);
PDF.SaveAs(newFileName);
While merging the two PDF files, I get the error "Could not safely read page objects from AnotherPdfFile". One of the PDFs can contain an image. Some PDFs with images merge fine, while others throw this error.
How can we get rid of this error?
I got the same error (Could not safely read page objects from AnotherPdfFile) when I tried to merge PDF documents that were constructed from streams that came from another service.
To solve this, I had to first copy each stream into a MemoryStream and then pass the memory stream into the PdfDocument constructor. Using memory streams, IronPdf was able to merge the documents.
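The buffering step described above can be sketched with the base class library alone. The helper below is illustrative (StreamBuffering and ToMemoryStream are made-up names); it only produces the seekable copy. Handing that copy to the PdfDocument constructor is as the answer describes and is not shown or verified here.

```csharp
using System;
using System.IO;

public static class StreamBuffering
{
    // Copies the remaining bytes of a (possibly non-seekable) stream into memory
    // and rewinds the copy so a consumer can read it from the start.
    public static MemoryStream ToMemoryStream(Stream source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        var copy = new MemoryStream();
        source.CopyTo(copy);
        copy.Position = 0;
        return copy;
    }
}
```

The caller owns the returned MemoryStream and should dispose it once the merge has finished.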
In my app I generate a PDF file with PdfSharp.Xamarin, which I got from this site:
https://github.com/roceh/PdfSharp.Xamarin
Everything is working fine.
My PDF document contains many images, which are compressed.
But the file size of my PDF document is too large.
Is it possible to compress my PDF document before saving it?
How can I use the PdfSharp.SharpZipLib.Zip namespace to reduce the file size?
UPDATE:
Here is my Code:
document = new PdfDocument();
document.Info.Title = nameDok.Replace(" ", "");
document.Info.Author = "---";
document.Info.CreationDate = DateTime.Now;
document.Info.Subject = nameDok.Replace(" ", "");
//That is how i add Images:
XImage image = XImage.FromStream(lstr);
gfx.DrawImage(image, 465, YPrev - 2, newimagewf, newimagehf);
document.CustomValues.CompressionMode = PdfCustomValueCompressionMode.Compressed;
document.Options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;
document.Save(speicherPfad);
Thanks, everyone.
I only know the original PDFsharp, not the Xamarin port: images are deflated automatically using SharpZipLib.
Make sure to use appropriate source images (e.g. JPEG or PNG, depending on the image).
On the project start page they write:
"Currently all images created via XGraphics are converted to jpegs with 70% quality."
This could mean that images are re-compressed, maybe leading to larger files than before.
Take one JPEG file, convert it to PDF, and check the size of the image (in bytes) in the PDF file.
I have a PDF template that is 230 KB in size. In my Web API, for each of multiple users I take a copy of that template, push data into it, and merge the copies using the iTextSharp library. For 1500 users, the total file size reaches 320 MB.
I tried BitMiracle, which reduced the file size to 160 MB, but that is still a large file.
In Acrobat Pro I used the "Save As Other" > "Reduced Size PDF" option, and it reduced the file size to 25 MB.
I want to get the file size down to 25 MB from my Web API in C#, which will be hosted on a server later.
The user is not supposed to edit that PDF and will just store it as a record. Can I generate a PostScript file and then use Acrobat Distiller to decrease the size? If yes, how can I do it?
I am using Ghostscript.NET. I wrote this method, and it does not throw any error, but I am unable to find the path of the generated PostScript file:
public void convertToPs(string file)
{
    try
    {
        Process printProcess = new Process();
        printProcess.StartInfo.FileName = file;
        printProcess.StartInfo.Verb = "printto";
        printProcess.StartInfo.Arguments = "\"Ghostscript PDF\"";
        printProcess.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
        printProcess.StartInfo.CreateNoWindow = true;
        printProcess.Start();
        // Wait until the PostScript file is created
        try
        {
            printProcess.WaitForExit();
        }
        catch (InvalidOperationException) { }
        printProcess.Dispose();
    }
    catch (Exception)
    {
        // Rethrow without resetting the stack trace
        throw;
    }
}
Please help
Does the template have embedded fonts? The merge probably does not combine those fonts, so each copy carries its own. If you don't need embedded fonts, you could remove them. Adobe does a good job of combining embedded fonts.
If you want, you can send me one of these big PDF documents so that I can see why the file gets so big. I am a developer of a PDF library, and producing the smallest possible PDF is a use case I am working on.
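Since the question already involves Ghostscript, one option that is sometimes tried for this kind of size reduction is Ghostscript's pdfwrite device, which rewrites a PDF directly (no PostScript step) and can downsample images according to a -dPDFSETTINGS preset. This is a different technique from the font suggestion above. The sketch below only builds the command line and runs the console executable; the executable path, the /ebook preset, and the names GhostscriptShrink, BuildArguments, and Run are assumptions for illustration, and the result will not necessarily match what Acrobat produces.

```csharp
using System;
using System.Diagnostics;

public static class GhostscriptShrink
{
    // Builds the argument string for a pdfwrite run that re-encodes inputPdf into outputPdf.
    public static string BuildArguments(string inputPdf, string outputPdf, string preset = "/ebook")
    {
        return string.Join(" ",
            "-sDEVICE=pdfwrite",
            "-dCompatibilityLevel=1.4",
            "-dPDFSETTINGS=" + preset,
            "-dNOPAUSE", "-dQUIET", "-dBATCH",
            "-sOutputFile=\"" + outputPdf + "\"",
            "\"" + inputPdf + "\"");
    }

    // Runs the Ghostscript console executable and returns its exit code.
    // ghostscriptExe is an assumption, e.g. the full path to gswin64c.exe on Windows.
    public static int Run(string ghostscriptExe, string inputPdf, string outputPdf)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = ghostscriptExe,
            Arguments = BuildArguments(inputPdf, outputPdf),
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Because the output path is passed explicitly, there is no need to hunt for a file produced by a printer driver. Check the barcode and print quality of the result before adopting a preset.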
I have a SQL Server database with many, many rows. Each row has a column that contains a stored PDF.
The database is about a gigabyte in size, so we can expect that roughly half of that is due to the PDFs.
Now I have a requirement to join all those PDFs into one PDF. Don't ask why.
Can you suggest the best way forward and which component would be best suited for this job? There are many answers available:
How can I join two PDF's using iTextSharp?
Merge memorystreams to one itext document
How to merge multiple pdf files (generated in run time)?
as to how to join two (or more) PDFs. But what I'm asking about is performance. We are literally dealing with around 50,000 PDFs that need to be merged into one almighty PDF.
[Edit Solution] Brought time to merge 1000 pdfs from 4m30s to 21s
public void MergePDFs(string targetPDF, string sourceDir)
{
    using (FileStream stream = new FileStream(targetPDF, FileMode.Create))
    {
        var files = Directory.GetFiles(sourceDir);
        Document pdfDoc = new Document(PageSize.A4);
        PdfCopy pdf = new PdfCopy(pdfDoc, stream);
        pdfDoc.Open();
        Console.WriteLine("Merging files count: " + files.Length);
        int i = 1;
        var watch = System.Diagnostics.Stopwatch.StartNew();
        foreach (string file in files)
        {
            Console.WriteLine(i + ". Adding: " + file);
            // Close each reader once its pages are copied so source files are not held open
            PdfReader reader = new PdfReader(file);
            try
            {
                pdf.AddDocument(reader);
            }
            finally
            {
                reader.Close();
            }
            i++;
        }
        pdfDoc.Close();
        watch.Stop();
        var elapsedMs = watch.ElapsedMilliseconds;
        MessageBox.Show(elapsedMs.ToString());
    }
}
I just did a C#/WinForms project with PDFsharp that merged images into PDFs, and it worked phenomenally with a traditional folder structure. I imagine it would work similarly with PDFs stored in a database, so long as you can pull them into a memory stream first and then merge them.
Some suggestions:
1) Do it in a multi-threaded environment so you can work on multiple PDFs at a time.
2) Open only what you need and close it as soon as the operation is complete. Say you have three documents that need to be merged into one. Create a blank PDF. Open the first into a memory stream, open the blank. Append the first to the blank. Close the first, save the blank, close the blank. Repeat for the second and third. This way you control how much memory you are using at any one point in time. With this approach I was able to append millions of images while keeping memory usage under control.
3) Wrap disposable objects in using statements. This helps with memory cleanup and eliminates the need to call the garbage collector manually, which is frowned upon.
4) Separate your business logic (the work) from your UI as best you can, so you can cancel the operation at any point or view the current status as it progresses.
5) Log everything that is done so that you can go back and correct one-offs for the PDFs that didn't make it through the first pass.
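Suggestions 2, 3, and 5 can be sketched as a small driver loop. This is only an outline under stated assumptions: BatchRunner and ProcessAll are illustrative names, and processOne is a hypothetical delegate that would load one stored PDF into a using-scoped MemoryStream, append it to the target, and release it before the next item is touched.

```csharp
using System;
using System.Collections.Generic;

public static class BatchRunner
{
    // Processes items one at a time, logs every outcome, and returns the ids
    // that failed so they can be retried or fixed in a second pass.
    public static List<string> ProcessAll(
        IEnumerable<string> ids,
        Action<string> processOne,
        Action<string> log)
    {
        var failed = new List<string>();
        foreach (string id in ids)
        {
            try
            {
                processOne(id);
                log("OK: " + id);
            }
            catch (Exception ex)
            {
                // One bad PDF should not abort a run over tens of thousands of files.
                failed.Add(id);
                log("FAILED: " + id + " (" + ex.Message + ")");
            }
        }
        return failed;
    }
}
```

Keeping the loop free of UI code also fits suggestion 4, since the log delegate can write to a file, a progress window, or both.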
I'm currently converting some legacy code to create PDF files using iTextSharp. We're creating a largish PDF file that contains a number of images, which I'm inserting like so:
Document doc = new Document(PageSize.A4, 50, 50, 25, 25);
PdfWriter writer = PdfWriter.GetInstance(doc, myStream);
writer.SetFullCompression();
doc.Open();
Image frontCover = iTextSharp.text.Image.GetInstance(@"C:\MyImage.png");
//Scale down from a 96 dpi image to standard itextsharp 72 dpi
frontCover.ScalePercent(75f);
frontCover.SetAbsolutePosition(0, 0);
doc.Add(frontCover);
doc.Close();
Inserting an image (20.8 KB png file) seems to increase the PDF file size by nearly 100 KB.
Is there a way of compressing the image before entry (bearing in mind that this needs to be of reasonable print quality), or of further compressing the entire PDF? Am I even performing any compression in the above example?
The answer appears to have been that you need to set an appropriate version of the PDF spec to target and then set the compression as follows:
PdfWriter writer = PdfWriter.GetInstance(doc, ms);
PdfContentByte contentPlacer;
writer.SetPdfVersion(PdfWriter.PDF_VERSION_1_5);
writer.CompressionLevel = PdfStream.BEST_COMPRESSION;
This has brought my file size down considerably. I also found that PNGs gave me the best results with regard to the final size of the document.
I did some experiments this morning. My test image was 800x600 with a file size of 100.69K when saved as a PNG. I inserted this into a PDF (using iTextSharp and the usual GetInstance() method) and the file size increased from 301.71K to 402.63K. I then re-saved my test image as a raw bitmap with file size of 1,440,054. I inserted this into the PDF and the file size went DOWN to 389.81K. Interesting!
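The numbers in this experiment can be checked with a little arithmetic. The reported bitmap size of 1,440,054 bytes is what an uncompressed 800x600 image comes to if one assumes 24 bits per pixel and the common 54-byte BMP header (both assumptions, not stated in the post). The PDF grew by 402.63K - 301.71K = 100.92K with the PNG and by 389.81K - 301.71K = 88.10K with the raw bitmap. A minimal sketch of the size calculation, with the illustrative names BmpMath and UncompressedBmpSize:

```csharp
using System;

public static class BmpMath
{
    // Size in bytes of an uncompressed BMP, assuming the 14-byte file header
    // plus the 40-byte BITMAPINFOHEADER and rows padded to a multiple of 4 bytes.
    public static long UncompressedBmpSize(int width, int height, int bitsPerPixel = 24)
    {
        const int headerBytes = 14 + 40;
        long rowBytes = ((long)width * bitsPerPixel + 31) / 32 * 4;
        return headerBytes + rowBytes * height;
    }
}
```

BmpMath.UncompressedBmpSize(800, 600) evaluates to 1440054, which matches the file size reported above.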
I did some research on the web for a possible explanation, and, based on what I found, it looks like iTextSharp does not compress images as such, but rather compresses everything with some generic compression. In other words, the BMP is not actually converted to another file type; it's just compressed, very much like you would by ZIPping it. Whatever they're doing, it must be good, because it compressed better than the image with PNG compression. I assume iTextSharp would try to compress the PNG but would gain 0%, since it is already compressed. (This is inconsistent with the original author's observations, though... Paddy said his PDF size increased much more than the size of the PNG... not sure what to make of that. I can only go on my own experiments.)
Conclusions:
1) I don't need to add some fancy library to my project to convert my (eventual dynamically-created) image to PNG; it actually does better to leave it totally uncompressed and let iTextSharp do all the compression work.
2) I also read stuff on the web about iTextSharp saving images at a certain DPI. I did NOT see this problem... I used ScalePercent() method to scale the bitmap to 1% and the file size was the same and there was no "loss" in the bitmap pixels in the bitmap... this confirms that iTextSharp is doing a simple, nice, generic lossless compression.
It seems that PDF requires the png to be transcoded to something else, jpeg, most probably.
see here: http://forums.adobe.com/message/2952201
The only thing I can think of is to convert png to smallest jpeg first, including scaling down 75%, then importing that file without scaling.
use:
var image = iTextSharp.text.Image.GetInstance(srcImage, ImageFormat.Jpeg);
image.ScaleToFit(document.PageSize.Width, document.PageSize.Height);
//image.ScalePercent(75f);
image.SetAbsolutePosition(0, 0);
document.Add(image);
document.NewPage();