HTML to PDF conversion using WkHtmlToXSharp Caching / Buffering Issue


I want to convert an HTML file to a PDF file, and I was using "wkhtmltopdf.exe".
Then we moved this application to a shared hosting server. This server doesn't allow running .exe files, so I have to use WkHtmlToXSharp.dll [a wrapper for the above exe].
It's working fine, but the problem is that it is caching the output somewhere, so every time I create a new PDF, it always gives me the first one.
I have called .Dispose() and set the converter to null, but to no avail.
After a certain time, though, it does produce the new PDF, which means it is caching or buffering the byte data somewhere.
Below is my code. Every time, I pass a new HTML file [htmlFullPath] with different images in it.
IHtmlToPdfConverter converter = new MultiplexingConverter();
converter.ObjectSettings.Page = htmlFullPath;
converter.ObjectSettings.Web.EnablePlugins = true;
converter.ObjectSettings.Web.EnableJavascript = true;
converter.ObjectSettings.Web.Background = true;
converter.ObjectSettings.Web.LoadImages = true;
converter.ObjectSettings.Load.LoadErrorHandling = LoadErrorHandlingType.ignore;
converter.GlobalSettings.Orientation = (PdfOrientation)Enum.Parse(typeof(PdfOrientation), orientation);
if (!string.IsNullOrEmpty(pageSize))
converter.GlobalSettings.Size.PageSize = (PdfPageSize)Enum.Parse(typeof(PdfPageSize), pageSize);
converter.GlobalSettings.Margin.Top = "0cm";
converter.GlobalSettings.Margin.Bottom = "0cm";
converter.GlobalSettings.Margin.Left = "0cm";
converter.GlobalSettings.Margin.Right = "0cm";
Byte[] bufferPDF = converter.Convert();
System.IO.File.WriteAllBytes(pdfUrl, bufferPDF);
converter.Dispose();
converter = null;

As I mentioned in the question, "every time, I pass a new HTML file [htmlFullPath] with different images in it".
The image for each HTML file is different, but the image file name was the same.
I renamed the image to include a timestamp, and now everything works fine.
That means images with the same name were the real problem. It may be a buffering issue in MultiplexingConverter or some setting in IIS, which I will investigate later.
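The renaming workaround described above could be sketched roughly as follows. This is only an illustration: `UniqueImageName` and its methods are hypothetical helpers, not part of WkHtmlToXSharp, and the actual cause of the caching (converter or IIS) is still unconfirmed.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper (not part of WkHtmlToXSharp): gives every render
// its own image file name so a stale cached copy cannot be picked up.
public static class UniqueImageName
{
    // Builds "<stem>_<UTC timestamp>_<guid><ext>" from the original file name.
    public static string Create(string originalFileName)
    {
        string stem = Path.GetFileNameWithoutExtension(originalFileName);
        string ext = Path.GetExtension(originalFileName);
        string stamp = DateTime.UtcNow.ToString("yyyyMMddHHmmssfff", CultureInfo.InvariantCulture);
        return stem + "_" + stamp + "_" + Guid.NewGuid().ToString("N") + ext;
    }

    // Copies the image next to the original under a unique name and
    // rewrites every reference to the old file name in the HTML text.
    public static string RewriteHtml(string html, string imagePath)
    {
        string oldName = Path.GetFileName(imagePath);
        string newName = Create(oldName);
        string dir = Path.GetDirectoryName(Path.GetFullPath(imagePath));
        File.Copy(imagePath, Path.Combine(dir, newName));
        return html.Replace(oldName, newName);
    }
}
```

The rewritten HTML would then be saved and passed as htmlFullPath. The GUID suffix is there only so that two renders within the same millisecond still get different names; the copied images need to be cleaned up separately.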

Related

NReco HTML-to-PDF Generator GeneratePdfFromFiles method throws exception

I have a fully working system for creating single-page PDFs from HTML, as shown below.
After initializing the converter:
var nRecoHTMLToPDFConverter = new HtmlToPdfConverter();
nRecoHTMLToPDFConverter = PDFGenerator.PDFSettings(nRecoHTMLToPDFConverter);
string PDFContents;
PDFContents is an HTML string which is populated elsewhere.
The following command works perfectly and gives me the byte[] which I can return:
createDTO.PDFContent = nRecoHTMLToPDFConverter.GeneratePdf(PDFContents);
The problem arises when I want to test and develop the multi-page functionality of the NReco library and convert an arbitrary number of HTML pages to PDF pages.
var stringArray = new string[]
{
PDFContents, PDFContents,
};
var stream = new MemoryStream();
nRecoHTMLToPDFConverter.GeneratePdfFromFiles(stringArray, null, stream);
var mybyteArray = stream.ToArray();
PDFContents is exactly the same as above. On paper, this should give me the byte array for two identical PDF pages; however, on the call to the GeneratePdfFromFiles method, I get the following exception:
WkHtmlToPdfException: Exit with code 1 due to network error: HostNotFoundError (exit code: 1)
Please help me resolve this if you have experience with this library and its complexities. I have a feeling that I'm not familiar with the proper use of a Stream object in this scenario. I've tested the working single-page line and the malfunctioning multi-page lines in the same method call, so their context is identical.
Many thanks

The GeneratePdfFromFiles method you used expects an array of file names (or URLs): https://www.nrecosite.com/doc/NReco.PdfGenerator/?topic=html/M_NReco_PdfGenerator_HtmlToPdfConverter_GeneratePdfFromFiles_1.htm
If you work with HTML content as .NET strings, you can simply save it to temp files, generate the PDF, and remove the files afterwards.
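The temp-file approach might look something like the sketch below. `TempHtmlFiles.Render` is a hypothetical helper, and the PDF call itself is passed in as a delegate so that the example depends only on the base class library; the three-argument GeneratePdfFromFiles call mentioned afterwards is the one from the question.

```csharp
using System;
using System.IO;

// Hypothetical helper: not part of NReco.PdfGenerator.
public static class TempHtmlFiles
{
    // Writes each HTML string to its own temp .html file, hands the file
    // paths to `generate`, and deletes the files afterwards, even on failure.
    public static byte[] Render(string[] htmlPages, Action<string[], Stream> generate)
    {
        var paths = new string[htmlPages.Length];
        try
        {
            for (int i = 0; i < htmlPages.Length; i++)
            {
                paths[i] = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
                File.WriteAllText(paths[i], htmlPages[i]); // UTF-8 by default
            }

            using (var output = new MemoryStream())
            {
                generate(paths, output);
                return output.ToArray();
            }
        }
        finally
        {
            // Entries are null for pages that were never written.
            foreach (string path in paths)
            {
                if (path != null && File.Exists(path))
                    File.Delete(path);
            }
        }
    }
}
```

With the converter from the question, the call would presumably be TempHtmlFiles.Render(new[] { PDFContents, PDFContents }, (files, stream) => nRecoHTMLToPDFConverter.GeneratePdfFromFiles(files, null, stream)). Since the files are written as UTF-8, the HTML should declare a matching charset if it contains non-ASCII text.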

how to add report rdlc image parameter to get the image from application settings?

In my WinForms application, different users work on the application. Each user uses their own logo and name, which I store in the app settings. Now I want to display the image from the app settings on the reports. I have googled it, but as far as I can tell no one discusses this approach. Here is what I found and tried:
this.reportViewer1.LocalReport.EnableExternalImages = true;
string imgFrom = new Uri(Properties.Settings.Default.system_img).AbsolutePath;
ReportParameter parameter = new ReportParameter("img", imgFrom);
this.reportViewer1.LocalReport.SetParameters(parameter);
Before this, I added a parameter to the report. But this doesn't work in my case.
Can anyone tell me how to do this?

After making two changes to the above code, it works just fine. Here are the changes:
this.reportViewer1.LocalReport.EnableExternalImages = true;
string imgFrom = new Uri(Properties.Settings.Default.system_img).AbsoluteUri; // changed from AbsolutePath to AbsoluteUri
ReportParameter rpImg = new ReportParameter("img", imgFrom);
this.reportViewer1.LocalReport.SetParameters(new ReportParameter[] { rp1, rp2, rpImg }); // add the parameter to another set of parameters that I had before.
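To illustrate the first change: AbsolutePath returns only the path portion of a Uri, while AbsoluteUri includes the scheme (file:// for a local file). As far as I know, an external report image needs a value with a scheme, which would explain why the original version failed. The small sketch below uses a hypothetical `ImageUriDemo` class and only System.Uri.

```csharp
using System;
using System.IO;

// Hypothetical demo class, only to show the difference between the two properties.
public static class ImageUriDemo
{
    // Full URI including the scheme, e.g. "file:///...": the form used in the fix above.
    public static string ToReportImageUri(string localPath)
    {
        return new Uri(Path.GetFullPath(localPath)).AbsoluteUri;
    }

    // Path portion only, without any scheme: the form used in the failing version.
    public static string PathOnly(string localPath)
    {
        return new Uri(Path.GetFullPath(localPath)).AbsolutePath;
    }
}
```

Both properties percent-encode characters such as spaces, so the stored path does not need to be escaped by hand.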

Process.StandardOutput Read method returns empty (sometimes)

I am using wkhtmltopdf to generate a PDF file from an HTML string. The code is pretty much as follows:
// ...
processStartInfo.UseShellExecute = false;
processStartInfo.CreateNoWindow = true;
processStartInfo.RedirectStandardInput = true;
processStartInfo.RedirectStandardOutput = true;
processStartInfo.RedirectStandardError = true;
// ...
process = Process.Start(processStartInfo);
using (StreamWriter streamWriter = process.StandardInput)
{
streamWriter.AutoFlush = true;
streamWriter.Write(htmlCode);
}
byte[] buffer = new byte[32768], file;
using (var memoryStream = new MemoryStream())
{
while (true)
{
int read = process.StandardOutput.BaseStream.Read(buffer, 0, buffer.Length);
if (read <= 0)
break;
memoryStream.Write(buffer, 0, read);
}
file = memoryStream.ToArray();
}
process.WaitForExit(60000);
process.Close();
return file;
This works as expected, but for one specific piece of HTML, the first call of the StandardOutput.BaseStream.Read method reads no bytes at all (it returns 0), in which case StandardOutput.EndOfStream is also true.
I would normally suspect that the wkhtmltopdf tool fails to process the HTML input for some reason, but the problem is that this only happens in about two out of five attempts, so I now suspect that it might have something to do with process buffering and output stream reading. However, I can't seem to figure out what the exact problem is.
What could cause this behavior?
Update
Reading StandardError was the obvious approach, but it did not help: it is always an empty string. Neither did process.ExitCode (-1073741819, i.e. 0xC0000005, the Windows access violation status), which, as far as I know, just states that "the process crashed".

After almost a year of production usage, wkhtmltopdf is doing its job, with the issue described above reported no more than five times so far.
The problem usually goes away when adding a DIV somewhere toward the end of the document, with a height large enough (say 20px) to push the last line of text onto the next page, if the page happens to be full.
We knew that the tool sometimes has trouble splitting the HTML content into pages properly, because in such cases it generated (say) seven pages while the page numbering reported only six, so the last page's number was "7 of 6". Maybe it sometimes fails completely and doesn't get to generate the pages at all, we thought. The document is generated from highly dynamic HTML content. Making a change that resulted in shorter or longer content without using dummy DIVs was relatively easy, and that's how we got through the errors so far.
Right now we are testing Puppeteer.
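Independently of the page-splitting workaround, the reading code from the question can be made to fail loudly instead of returning an empty array. The sketch below is not the original code: `PipedProcess` is a hypothetical helper that drains stdout and stderr concurrently, waits for the process to exit, and turns a non-zero exit code into an exception with the code shown in hex.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper; the command line is whatever the original call used.
public static class PipedProcess
{
    // Formats an exit code as unsigned hex, the form in which Windows status codes are usually documented.
    public static string ToHex(int exitCode)
    {
        return "0x" + unchecked((uint)exitCode).ToString("X8");
    }

    // Sends `input` to the process on stdin and returns everything it wrote to stdout.
    public static byte[] Run(string fileName, string arguments, string input, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
        };

        using (Process process = Process.Start(startInfo))
        using (var output = new MemoryStream())
        {
            // Drain both pipes concurrently so neither can fill up and block the child.
            Task copyOut = process.StandardOutput.BaseStream.CopyToAsync(output);
            Task<string> readErr = process.StandardError.ReadToEndAsync();

            process.StandardInput.Write(input);
            process.StandardInput.Close(); // signals end of input to the child

            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();
                throw new TimeoutException(fileName + " did not exit within " + timeoutMs + " ms");
            }
            Task.WaitAll(copyOut, readErr);

            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    fileName + " failed with exit code " + ToHex(process.ExitCode) + ": " + readErr.Result);
            }
            return output.ToArray();
        }
    }
}
```

For the exit code reported in the question, PipedProcess.ToHex(-1073741819) gives 0xC0000005. A caller can catch the exception and retry or log the HTML that triggered the crash, rather than passing on an empty PDF.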

c# printing through PDF drivers, print to file option will output PS instead of PDF

After struggling for a whole day, I identified the issue, but that didn't solve my problem.
In short:
I need to open a PDF, convert it to BW (grayscale), search for some words, and insert notes near the words found. At first glance it seems easy, but I discovered how hard PDF files are to process (they have no concept of "words", and so on).
Now the first task, converting to grayscale, just drove me crazy. I didn't find a working solution, either commercial or free. I came up with this solution:
open the PDF
print with Windows drivers, using one of the free PDF printers
This is quite ugly, since it forces users of the C# application to install such third-party software, but that is it for the moment. I tested FreePDF, CutePDF and PDFCreator. All of them work "stand-alone" as expected.
Now, when printing from C#, I obviously don't want the print dialog; I just want to select the BW option and print (i.e. convert).
The following code uses a PDF library and is shown for clarity only.
Aspose.Pdf.Facades.PdfViewer viewer = new Aspose.Pdf.Facades.PdfViewer();
viewer.BindPdf(txtPDF.Text);
viewer.PrintAsGrayscale = true;
//viewer.RenderingOptions = new RenderingOptions { UseNewImagingEngine = true };
//Set attributes for printing
//viewer.AutoResize = true; //Print the file with adjusted size
//viewer.AutoRotate = true; //Print the file with adjusted rotation
viewer.PrintPageDialog = true; //Do not produce the page number dialog when printing
////PrinterJob printJob = PrinterJob.getPrinterJob();
//Create objects for printer and page settings and PrintDocument
System.Drawing.Printing.PrinterSettings ps = new System.Drawing.Printing.PrinterSettings();
System.Drawing.Printing.PageSettings pgs = new System.Drawing.Printing.PageSettings();
//System.Drawing.Printing.PrintDocument prtdoc = new System.Drawing.Printing.PrintDocument();
//prtdoc.PrinterSettings = ps;
//Set printer name
//ps.PrinterName = prtdoc.PrinterSettings.PrinterName;
ps.PrinterName = "CutePDF Writer";
ps.PrintToFile = true;
ps.PrintFileName = @"test.pdf";
//
//ps.
//Set PageSize (if required)
//pgs.PaperSize = new System.Drawing.Printing.PaperSize("A4", 827, 1169);
//Set PageMargins (if required)
//pgs.Margins = new System.Drawing.Printing.Margins(0, 0, 0, 0);
//Print document using printer and page settings
viewer.PrintDocumentWithSettings(ps);
//viewer.PrintDocument();
//Close the PDF file after printing
What I discovered, and what seems to be poorly documented, is that if you set
ps.PrintToFile = true;
then, regardless of the C# PDF library or PDF printer driver, Windows will just bypass the PDF driver and output PS (PostScript) files instead of PDF files, which obviously will not be recognized by Adobe Reader.
Now the question (and I am sure others who want to print PDFs from C# have encountered this) is: how can I print to CutePDF, for example, and still suppress any file name dialog?
In other words, I just want to print silently from the C# application with a file name selected programmatically, or somehow convince "print to file" to go through the PDF driver rather than the Windows default PS driver.
Thanks very much for any hints.

I solved the conversion to grayscale with a commercial component, as described in the following post, where I also posted my complete solution in case anyone struggles like I did:
Converting PDF to Grayscale pdf using ABC PDF

Magick.NET takes a long time loading PDF

I am using Magick.NET to grab the first page of a PDF and convert it to a thumbnail. It's working well, but for larger files with lots of images and many pages, it takes a long time to load the PDF itself. Is there a way to tell Magick.NET to ignore any pages after the first one?
I am loading them directly from a stream after a PDF is uploaded.

You can specify the pages to read with the FrameIndex and FrameCount properties of the MagickReadSettings object.
using (MagickImageCollection collection = new MagickImageCollection())
{
MagickReadSettings settings = new MagickReadSettings();
settings.FrameIndex = 0; // First page
settings.FrameCount = 1; // Number of pages
collection.Read("Snakeware.pdf", settings);
}
I have also updated the documentation here: https://magick.codeplex.com/wikipage?title=Convert%20PDF
