Magick.NET takes a long time loading PDF

I am using Magick.NET to grab the first page of a PDF and convert it to a thumbnail. It's working well, but for larger files with lots of images and many pages, it takes a long time to load up the PDF itself. Is there a way to tell Magick.NET to ignore any pages after the first one?
I am loading them in directly from a stream after a PDF is uploaded.

You can specify the pages to read with the FrameIndex and FrameCount properties of the MagickReadSettings object.
using (MagickImageCollection collection = new MagickImageCollection())
{
    MagickReadSettings settings = new MagickReadSettings();
    settings.FrameIndex = 0; // First page
    settings.FrameCount = 1; // Number of pages
    collection.Read("Snakeware.pdf", settings);
}
I have also updated the documentation here: https://magick.codeplex.com/wikipage?title=Convert%20PDF
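Since the question reads the PDF from an uploaded stream rather than a file, the same settings can be passed along with the stream. The following is only a minimal sketch under assumptions: the class and method names (PdfThumbnailSketch, FirstPageOnly, WriteThumbnail) and the 200x200 bounding box are made up for illustration, and the exact numeric types of FrameIndex and FrameCount differ between Magick.NET versions.

```csharp
using System.IO;
using ImageMagick;

public static class PdfThumbnailSketch
{
    // Read settings that ask for the first page only (same idea as above).
    public static MagickReadSettings FirstPageOnly()
    {
        return new MagickReadSettings
        {
            FrameIndex = 0, // first page
            FrameCount = 1  // read a single page
        };
    }

    // Reads page 1 from the uploaded stream and writes a PNG thumbnail.
    // The 200x200 bounding box is an arbitrary illustrative value.
    public static void WriteThumbnail(Stream pdf, Stream output)
    {
        using (var image = new MagickImage(pdf, FirstPageOnly()))
        {
            image.Thumbnail(200, 200);       // resize to fit the box, keeping aspect ratio
            image.Format = MagickFormat.Png; // encode the result as PNG
            image.Write(output);
        }
    }
}
```

Using a single MagickImage instead of a MagickImageCollection is a design choice for this sketch: with FrameCount set to 1 there is only one frame to hold.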

Related

Exporting WPF Canvas to PDF

I've been attempting to find an easy solution to exporting a Canvas in my WPF Application to a PDF Document.
So far, the best solution has been to use the PrintDialog and set it up to automatically use the 'Microsoft Print to PDF' printer. The only problem I have had with this is that although the PrintDialog is skipped, there is a FileDialog to choose where the file should be saved.
Sadly, this is a deal-breaker because I would like to run this over a large number of canvases with automatically generated PDF names (well, programmatically provided anyway).
Other solutions I have looked at include:
Using PrintDocument, but from my experimentation I would have to manually iterate through all my Canvas's children and manually invoke the correct Draw method (which, given how many of my custom elements have transformations, would be rather time-consuming).
Exporting as a PNG image and then embedding that in a PDF. Although this works, TextBlocks within my canvas are no longer text, so this isn't an ideal solution.
Using the third-party library PDFSharp, which has the same downside as PrintDocument: a lot of custom logic for each element.
With PDFSharp, I did find a method for generating the XGraphics from a Canvas, but no way of then consuming that object to make a PDF page.
So does anybody know how I can skip or automate the PDF PrintDialog, or consume PDFSharp XGraphics to make a page? Or are there any other ideas for directions to take this, besides writing a whole library to convert each of my Canvas elements to PDF elements?

If you look at the output port of Microsoft Print to PDF on a recent Windows installation, you may note it is set to PORTPROMPT: and that is exactly what causes the request for a filename.
You might note that, lower down, I have several ports set to a filename, and the fourth one down is called "My Print to PDF".
This is very last-century methodology: when I print with a duplicate printer under a different name, I can use different page ratios etc. without altering the built-in standard one. The output file will naturally be built:
A) Exactly in one repeatable location, where I can monitor the file and rename it based on the source calling the print sequence, such that if it is my current default printer I can right-click files to print to a known \folder\file.pdf
B) The same port can be used via certain /pt (printto) command combinations to output, not just to that default port location, but to a given folder\name such as
"%ProgramFiles%\Windows NT\Accessories\WORDPAD.EXE" /pt listIN.doc "My Print to PDF" "My Print to PDF" "listOUT.pdf"
Other drivers usually charge for the convenience of WPF programmable renaming, but I will leave you that PrintVisual challenge for another of your three wishes.
MS suggest XPS is best, but then they would, as they promote it as a PDF competitor.
It does not need to be Doc[X]2PDF; it could be [O]XPS2PDF or aPNG2PDF or many-page TIFF2PDF etc. Any of those are native to Windows 10, and other third-party apps such as [Free]Office with a PrintTo verb will do XLS[X]2PDF. Imagination becomes pagination.

I had great success generating PDFs using PDFSharp in combination with SkiaSharp (for more advanced graphics).
Let me begin from the very end:
you save the PdfDocument object in the following way:
PdfDocument yourDocument = ...;
string filename = @"your\file\path\document.pdf";
yourDocument.Save(filename);
creating the PdfDocument with a page can be achieved the following way (adjust the parameters to fit your needs):
PdfDocument yourDocument = new PdfDocument();
yourDocument.PageLayout = PdfPageLayout.SinglePage;
yourDocument.Info.Title = "Your document title";
PdfPage yourPage = yourDocument.AddPage();
// Orientation and size are properties of the page, not of the document
yourPage.Orientation = PageOrientation.Landscape;
yourPage.Size = PageSize.A4;
the PdfPage object's content (as an example I'm putting a string and an image) is filled in the following way:
using (XGraphics gfx = XGraphics.FromPdfPage(yourPage))
{
    XFont yourFont = new XFont("Helvetica", 20, XFontStyle.Bold);
    gfx.DrawString(
        "Your string in the page",
        yourFont,
        XBrushes.Black,
        new XRect(0, XUnit.FromMillimeter(10), yourPage.Width, yourFont.GetHeight()),
        XStringFormats.Center);
    using (Stream s = new FileStream(@"path\to\your\image.png", FileMode.Open))
    {
        XImage image = XImage.FromStream(s);
        // Scale the height so the image keeps its aspect ratio
        var imageRect = new XRect()
        {
            Location = new XPoint() { X = XUnit.FromMillimeter(42), Y = XUnit.FromMillimeter(42) },
            Size = new XSize() { Width = XUnit.FromMillimeter(42), Height = XUnit.FromMillimeter(42.0 * image.PixelHeight / image.PixelWidth) }
        };
        gfx.DrawImage(image, imageRect);
    }
}
Of course, the font objects can be created as static members of your class.
And this, in short, answers your question: it is how you consume the XGraphics object to create a PDF page.
Let me know if you need more assistance.

Exporting RDLC on letter head

I have a requirement to export RDLC report into PDF, it should also contain letter head of the company in background. Problem I see is that RDLC has a header, body and footer, how do we apply common image background? Any idea to this issue?

(Posted as an answer, since it's too long for a comment.)
I don't know of a method to add a page background directly in an RDLC. However, we had a similar issue with our report generator (MS Access, not RDLC), and solved it by (1) creating the PDF without letterhead and then (2) using PDFSharp to merge the resulting PDF with a letterhead ("background") PDF. Something like this might work for your use case as well.
We use the following code:
using System.Linq;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static void AddBackground(string source, string background, string result)
{
    using (var formBackground = XPdfForm.FromFile(background))
    using (var pdf = PdfReader.Open(source, PdfDocumentOpenMode.Modify))
    {
        foreach (var page in pdf.Pages.Cast<PdfPage>())
        {
            // Prepend draws the background beneath the existing page content
            var xg = XGraphics.FromPdfPage(page, XGraphicsPdfPageOptions.Prepend);
            xg.DrawImage(formBackground, 0, 0);
            // Advance to the next background page; stay on the last one once it is reached
            if (formBackground.PageIndex < formBackground.PageCount - 1)
            {
                formBackground.PageIndex += 1;
            }
        }
        pdf.Save(result);
    }
}
All parameters are paths to the respective PDF files. If the background PDF has fewer pages than the source PDF, then the last page of the background PDF is added to all remaining source PDF pages. This is useful if your first page has a different background than all remaining pages: you just need a 2-page background PDF for that.
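The page-selection rule described here can be written as a small library-free helper, which may make it easier to check for a given pair of page counts. This is only a sketch; the names BackgroundPages and IndexFor are hypothetical and do not come from PDFSharp.

```csharp
using System;

public static class BackgroundPages
{
    // Zero-based index of the background page drawn under the source page
    // at zero-based position sourceIndex. Once the background PDF runs out
    // of pages, its last page is reused.
    public static int IndexFor(int sourceIndex, int backgroundPageCount)
    {
        if (sourceIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(sourceIndex));
        if (backgroundPageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(backgroundPageCount));
        return Math.Min(sourceIndex, backgroundPageCount - 1);
    }
}
```

With a 2-page background, the first source page gets background page 0 and every later source page gets background page 1.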

iTextSharp GetTextFromPage Only Returns First Page

I am using iTextSharp Version 5.5.12
The code knows there are 10 pages in my pdf. In my loop, only the first page is returned.
PdfReader Pdf = new PdfReader(PATH_TO_PDF);
for (int intPageNum = 1; intPageNum <= Pdf.NumberOfPages; intPageNum++)
{
    ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
    string strPageText = PdfTextExtractor.GetTextFromPage(Pdf, intPageNum, strategy);
}
As I step through all ten iterations of the loop, only on the first iteration does strPageText have any text in it.
Any thoughts on what I am doing wrong?
Thanks in advance.

The "problem" appears to be a scanning software setting that combines multiple pdf files into one document (file).
Image Capture Plus software has a Job Setting, on the File tab, under OCR Settings for Searchable PDF. Make sure it is set to "All Pages".

PDFSharp compress filesize in c#

In my app I generate a PDF file with PDFSharp.Xamarin, which I got from this site:
https://github.com/roceh/PdfSharp.Xamarin
Everything is working fine.
My PDF document contains many images, which are compressed.
But the file size of my PDF document is too large.
Is it possible to compress my PDF document before saving it?
How can I work with the PdfSharp.SharpZipLib.Zip namespace to reduce the file size?
UPDATE:
Here is my Code:
document = new PdfDocument();
document.Info.Title = nameDok.Replace(" ", "");
document.Info.Author = "---";
document.Info.CreationDate = DateTime.Now;
document.Info.Subject = nameDok.Replace(" ", "");
//That is how i add Images:
XImage image = XImage.FromStream(lstr);
gfx.DrawImage(image, 465, YPrev - 2, newimagewf, newimagehf);
document.CustomValues.CompressionMode = PdfCustomValueCompressionMode.Compressed;
document.Options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;
document.Save(speicherPfad);
Thanks, everyone.

I only know the original PDFsharp, not the Xamarin port: images are deflated automatically using SharpZipLib.
Make sure to use appropriate source images (e.g. JPEG or PNG, depending on the image).
On the project start page they write:
"Currently all images created via XGraphics are converted to jpegs with 70% quality."
This could mean that images are re-compressed, maybe leading to larger files than before.
Take one JPEG file, convert it to PDF, and check the size of the image (in bytes) in the PDF file.

Merging PDFs and remove blank space with ITextSharp

I have a problem when working with image PDF files (PDF files that contain only an image, no text). There are two PDF files, img1 and img2, and I want to combine them into one A4-page PDF file.
I have tried the code below.
string Img1 = "C:/temp/image1.pdf";
string Img2 = "C:/temp/image2.pdf";
string MergedFile = "C:/temp/Combo.pdf";
//Create our PDF readers
PdfReader r1 = new PdfReader(Img1);
PdfReader r2 = new PdfReader(Img2);
//Our new page size, an A3 in landscape mode
iTextSharp.text.Rectangle NewPageSize = PageSize.A3.Rotate();
using (FileStream fs = new FileStream(MergedFile, FileMode.Create,
    FileAccess.Write, FileShare.None))
{
    //Create our document without margins
    using (Document doc = new Document(NewPageSize, 0, 0, 0, 0))
    {
        using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
        {
            doc.Open();
            //Get our imported pages
            PdfImportedPage imp1 = w.GetImportedPage(r1, 1);
            PdfImportedPage imp2 = w.GetImportedPage(r2, 1);
            //Add them to our merged document at specific X/Y coords
            w.DirectContent.AddTemplate(imp1, 0, 0);
            w.DirectContent.AddTemplate(imp2, 0, -350);
            doc.Close();
        }
    }
}
r1.Close();
r2.Close();
When I execute the code above, the two PDFs are combined and both images end up on one page, but only because I have hard-coded the Y coordinate.
I don't want to do that.
Here I am just giving an example with two images, but in reality there are more than 20 images (converted into PDFs).
So the files should be combined depending on the image size. I cannot give a fixed Y coordinate for each and every file.
Can anyone please help me combine multiple PDFs into a single one with no blank space?

Structurally, here is what you want to do:
Allocate a new page of the "right" size
Merge the content streams of the pages
Merge the resources of the pages
Adjust all the annotations (if any)
The first step is easy, the second is easy, the third not so much (and it will have the side effect of complicating step 2). I'll let you know ahead of time that I lied to you about the order.
Merging the content streams will be straightforward. What you will want to do is a four-step process (I'll inject here that I know PDF very well, but iTextSharp not too well):
Insert a gsave operator (q)
Insert a transform operator (cm) to transform to the location where you want content to appear. In your case it will be 1 0 0 1 X Y cm
Copy the content streams from the current page
Insert a grestore operator (Q)
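The four steps above amount to wrapping each page's operators in a saved graphics state with a translation in front. Below is a minimal sketch in plain C# (no PDF library), assuming the page content is already available as decoded operator text. The names ContentStreamSketch, Translate and StackOffsets are hypothetical. StackOffsets is an extra assumption that goes beyond the steps listed: it presumes every page's content fills its full height starting at the origin, and computes the Y value for each page so that the pages stack from the top with no gap.

```csharp
using System.Globalization;

public static class ContentStreamSketch
{
    // Wraps raw content-stream operators so they draw shifted by (x, y) points:
    // q (save state), cm (translate), the copied content, Q (restore state).
    public static string Translate(string pageContent, double x, double y)
    {
        string tx = x.ToString("0.###", CultureInfo.InvariantCulture);
        string ty = y.ToString("0.###", CultureInfo.InvariantCulture);
        return "q\n" + "1 0 0 1 " + tx + " " + ty + " cm\n" + pageContent + "\nQ\n";
    }

    // Y offsets (PDF origin is bottom-left) that stack pages of the given
    // heights from top to bottom on one page whose height is their sum.
    public static double[] StackOffsets(double[] heights)
    {
        double remaining = 0;
        foreach (double h in heights) remaining += h;

        var offsets = new double[heights.Length];
        for (int i = 0; i < heights.Length; i++)
        {
            remaining -= heights[i]; // space left below page i
            offsets[i] = remaining;
        }
        return offsets;
    }
}
```

For page heights 100, 200 and 50 the combined page is 350 points tall and the pages are placed at Y = 250, 50 and 0, so no Y coordinate has to be hard-coded.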
To merge the resources, you have to look at your newly created page's resources and for the current page do one of three things for each resource in each class of resource in a PDF page (XObject, Font, ColorSpace, ExtGState, Pattern, Shading, ProcSet - although for procset, you could set each procset to be the entire suite and do no harm):
If the resource exists in the newly created page, but under a different name, mark it as renamed.
If the resource does not exist in the newly created page and there is no resource with the same name, copy it in.
If the resource does not exist in the newly created page and there is a name conflict, rename the resource to a synthetic name not in the newly created page and copy it in.
Now to get back to my lie. In the resource merging, you will likely need a map built for the current page that maps old resource name to new resource name. When in the process of copying the content stream from one to the next, you will need to map all resource names referenced in the content stream to the new names built in the resource merge step.
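The three resource cases and the resulting rename map might look roughly like the following sketch. It uses plain dictionaries instead of real PDF objects: each dictionary maps a resource name to an identity key for the underlying object, which is a hypothetical stand-in (for example an indirect object reference). The class name, method name and the "_m" suffix scheme are also made up for illustration, and the method would be called once per resource class (XObject, Font, and so on).

```csharp
using System.Collections.Generic;

public static class ResourceMergeSketch
{
    // Adds the incoming page's resources to target and returns a map from
    // old resource name to new resource name for the incoming page.
    public static Dictionary<string, string> Merge(
        Dictionary<string, string> target,
        Dictionary<string, string> incoming)
    {
        var renames = new Dictionary<string, string>();
        foreach (var entry in incoming)
        {
            string name = entry.Key;
            string obj = entry.Value;

            // Case 1: the same object is already present, possibly under another name.
            string existing = null;
            foreach (var t in target)
            {
                if (t.Value == obj) { existing = t.Key; break; }
            }

            if (existing != null)
            {
                renames[name] = existing;
            }
            else if (!target.ContainsKey(name))
            {
                // Case 2: not present and no name conflict, so copy it in as is.
                target[name] = obj;
                renames[name] = name;
            }
            else
            {
                // Case 3: name conflict, so copy it in under a synthetic unused name.
                int n = 0;
                string fresh;
                do { fresh = name + "_m" + n++; } while (target.ContainsKey(fresh));
                target[fresh] = obj;
                renames[name] = fresh;
            }
        }
        return renames;
    }
}
```

The returned map is what the content-stream copy would consult to rewrite every resource name it encounters.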
To Adjust annotations, you will have to move them to their new location by adjusting the Rect property in each. You will also need to reset the /Parent property. For any of the text markup annotations, you will need to adjust the Quads.
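Because both Rect and the quad points are flat arrays of alternating x and y values, moving an annotation by the same offset used for the page content is a matter of shifting each coordinate. A minimal sketch, with the hypothetical names AnnotationSketch and Shift, and no handling of rotation or cropping:

```csharp
public static class AnnotationSketch
{
    // Shifts a flat coordinate array of alternating x, y values,
    // such as a Rect (4 numbers) or quad points (8 numbers per quad).
    public static double[] Shift(double[] coords, double dx, double dy)
    {
        var moved = new double[coords.Length];
        for (int i = 0; i < coords.Length; i++)
        {
            // Even positions hold x values, odd positions hold y values.
            moved[i] = coords[i] + (i % 2 == 0 ? dx : dy);
        }
        return moved;
    }
}
```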
Now, here is where the works will get gummed up in all of that. If a page is rotated, this will not work. If a page has a crop box, you will have to look at it and adjust the clipping region to simulate the crop. If the page is rotated and has Text annotations, you will need to pay attention to the annotation flags to ensure that the aspect ratio is correct. If the document has link annotations on any of the pages with GoTo actions/destinations, you will need to adjust these.
