I am using the following code to convert large HTML content to PDF using SelectPdf. None of the HTML pages are converted correctly: the data vanishes after 6 pages into the PDF.
public static void CreatePDFFromHTMLFile(string HtmlStream, string FileName)
{
try
{
// read parameters from the webpage
string htmlString = HtmlStream;
string baseUrl = "";
string pdf_page_size = "A4";
PdfPageSize pageSize = (PdfPageSize)Enum.Parse(typeof(PdfPageSize),
pdf_page_size, true);
string pdf_orientation = "Portrait";
PdfPageOrientation pdfOrientation =
(PdfPageOrientation)Enum.Parse(typeof(PdfPageOrientation),
pdf_orientation, true);
int webPageWidth = 1024;
int webPageHeight = 0;
HtmlToPdf converter = new HtmlToPdf();
// set converter options
converter.Options.PdfPageSize = pageSize;
converter.Options.PdfPageOrientation = pdfOrientation;
converter.Options.WebPageWidth = webPageWidth;
converter.Options.WebPageHeight = webPageHeight;
// create a new pdf document converting an url
PdfDocument doc = converter.ConvertHtmlString(htmlString, baseUrl);
doc.Save(FileName);
doc.Close();
}
catch (Exception ex)
{
Tracing.HandleException(ex);
}
}
That is happening because you are using the Community Edition of SelectPdf, which is free but limited: it can only convert documents up to 5-6 pages long. If you need more than that, you will need the paid edition, available here: https://selectpdf.com/downloads/
The problem is that you are using SelectPdf community edition. According to SelectPdf (http://selectpdf.com/community-edition/), the free version is limited to 5 pages.
If you want to convert longer pages, you need to use the commercial edition, but that is not free.
Related
I'm trying to convert HTML to PDF with PDFsharp and MigraDoc. I use the RenderDocument() function for Turkish characters, but after calling RenderDocument() I get this error:
System.InvalidOperationException: "'DocumentRenderer' must be set before calling 'PrepareDocumentRenderer'."
I wrote the code below by looking at the example in this link.
http://www.pdfsharp.net/wiki/HelloMigraDoc-sample.ashx
protected void btnGeneratePdf_Click(object sender, EventArgs e)
{
string html = "";
using (var client = new WebClient())
{
html = client.DownloadString("http://localhost:14670/WebForm6");
}
PdfGenerateConfig config = new PdfGenerateConfig();
config.PageSize = PageSize.A4;
config.SetMargins(20);
var doc = PdfGenerator.GeneratePdf(html, config);
PdfDocumentRenderer renderer = new PdfDocumentRenderer(true);
renderer.PdfDocument = doc;
renderer.RenderDocument();
var tmpFile = "C://Users//mutlu.ozkurt//Desktop//Files/tmp372A.pdf";
renderer.PdfDocument.Save(tmpFile);
Process.Start(tmpFile);
}
You are using HTML Renderer for PDF using PDFsharp, which creates a PDF file, not a MigraDoc document. You are mixing this with sample code from MigraDoc, and the two do not work together this way.
Take the doc variable you get from GeneratePdf and save the PDF from it directly, without calling any MigraDoc code.
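As a rough sketch of that advice, the method below saves the document returned by GeneratePdf and never touches PdfDocumentRenderer. The class name, method name and parameters are hypothetical, and the namespaces are assumed to be those of the HtmlRenderer.PdfSharp package; adjust them to your project.

```csharp
using PdfSharp;
using PdfSharp.Pdf;
using TheArtOfDev.HtmlRenderer.PdfSharp; // assumed namespace of the HtmlRenderer.PdfSharp package

static class HtmlToPdfSketch
{
    // Hypothetical helper: renders HTML and saves the result without any MigraDoc code.
    public static void SaveHtmlAsPdf(string html, string outputPath)
    {
        var config = new PdfGenerateConfig();
        config.PageSize = PageSize.A4;
        config.SetMargins(20);

        // GeneratePdf already returns a finished PDFsharp document.
        PdfDocument doc = PdfGenerator.GeneratePdf(html, config);

        // Save it directly; no PdfDocumentRenderer or RenderDocument() call is involved.
        doc.Save(outputPath);
    }
}
```

This only removes the call that throws the exception. Whether the Turkish characters then render correctly is a separate matter that the sketch does not address.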
I have a .NET Core 3 app that reads and splits a PDF containing paychecks for some companies I work for.
The app ran pretty well until the last builds. Now the PDF reader fails to parse the contents of any PDF.
The PDF contains only Italian words, with no special characters, plus a few tables and a single logo. I can't attach it for privacy reasons.
public PaycheckSplitter Read()
{
using (var reader = new PdfReader(new MemoryStream(this._stream)))
{
var doc = new PdfDocument(reader);
this.Paychecks = new PaychecksCollection();
for (int i = 1; i <= doc.GetNumberOfPages(); i++)
{
PdfPage page = doc.GetPage(i);
string text = PdfTextExtractor.GetTextFromPage(page, new LocationTextExtractionStrategy());
if (text.Contains(Consts.BpEnd)) break;
// trying to find something by regex... btw text contains only a sequence of \n\n\n\n...
string cf = Consts.CodFiscale.Match(text).Value;
this.Paychecks.Add(new Paycheck(cf), i);
}
doc.Close();
}
return this;
}
Is there anything I can do?
As far as I can see, the only (and best) free way to read PDF text is iText7...
I have a PDF file of almost 900 MB and I want to convert it to a Word document (.docx).
I've used SautinSoft.PdfFocus
with this code:
string pdfFile = @"d:\Coffee Table Book NPPNP (1).pdf";
string wordFile = @"d:\sample.docx";
// Convert PDF file to DOCX file
SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
f.OpenPdf(pdfFile);
if (f.PageCount > 0)
{
// You may choose output format between Docx and Rtf.
f.WordOptions.Format = SautinSoft.PdfFocus.CWordOptions.eWordDocument.Docx;
int result = f.ToWord(wordFile);
MessageBox.Show(result.ToString());
// Show the resulting Word document.
if (result == 0)
{
System.Diagnostics.Process.Start(wordFile);
}
}
After running this code, the application gets laggy.
Also, how do I know how many pages were converted?
String inputPath = @"d:\Coffee Table Book NPPNP (1).pdf";
String outputPath = @"d:\sample.docx";
PDFDocument doc = new PDFDocument(inputPath);
doc.ConvertToDocument(DocumentType.DOCX, outputPath);
It is better to use the code above, which reduces complexity.
While converting HTML code to PDF with SelectPdf, I'm getting an error saying "Conversion error: Could not open url". What base URL do I have to give?
using SelectPdf;
public partial class HtmlcodePrint : System.Web.UI.Page
{
string TxtHtmlCode;
protected void Page_Load(object sender, EventArgs e)
{
if (!IsPostBack)
{
TxtHtmlCode = @"<html>
<body>
Hello World from selectpdf.com.
</body>
</html>
";
}
}
protected void Btndownloadpdf_Click(object sender, EventArgs e)
{
// read parameters from the webpage
string htmlString = TxtHtmlCode;
string baseUrl = "http://localhost:51868/HtmlcodePrint.aspx";
string pdf_page_size ="A4";
PdfPageSize pageSize = (PdfPageSize)Enum.Parse(typeof(PdfPageSize),
pdf_page_size, true);
string pdf_orientation = "Portrait";
PdfPageOrientation pdfOrientation =
(PdfPageOrientation)Enum.Parse(typeof(PdfPageOrientation),
pdf_orientation, true);
int webPageWidth = 1024;
try
{
webPageWidth = Convert.ToInt32("1024");
}
catch { }
int webPageHeight = 0;
try
{
webPageHeight = Convert.ToInt32("777");
}
catch { }
// instantiate a html to pdf converter object
HtmlToPdf converter = new HtmlToPdf();
// set converter options
converter.Options.PdfPageSize = pageSize;
converter.Options.PdfPageOrientation = pdfOrientation;
converter.Options.WebPageWidth = webPageWidth;
converter.Options.WebPageHeight = webPageHeight;
// create a new pdf document converting an url
PdfDocument doc = converter.ConvertHtmlString(htmlString, baseUrl);
// save pdf document
doc.Save(Response, false, "Sample.pdf");
// close pdf document
doc.Close();
}
}
I know this is old, but I've been working with SelectPdf for a couple of days, so I'll throw in my 2 cents.
You probably don't need a baseUrl...
You don't have to give any baseUrl at all to the ConvertHtmlString function. You can just pass it the html string you want to convert and that's it.
Unless...
You only need to pass it a baseUrl if the html you're converting has relative paths in the external references (example: if you were referencing a stylesheet and wanted to use a relative path, you could provide the baseUrl to show where you wanted the stylesheet to be relative to). It's just so the converter can create the full absolute paths from the relative paths.
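To illustrate what resolving a relative path against a base URL means in general, here is a small sketch using System.Uri from the .NET base library. The base URL and the stylesheet path are made-up values, and this is not a claim about how SelectPdf resolves paths internally.

```csharp
using System;

class BaseUrlDemo
{
    static void Main()
    {
        // Hypothetical base URL and relative stylesheet path, for illustration only.
        var baseUrl = new Uri("http://localhost:51868/");
        var stylesheet = new Uri(baseUrl, "css/site.css");

        // Prints: http://localhost:51868/css/site.css
        Console.WriteLine(stylesheet.AbsoluteUri);
    }
}
```

If your HTML only contains absolute URLs, or no external references at all, there is nothing to resolve.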
So...
If you don't need that functionality or just don't have external references in your html, then you can just use
converter.ConvertHtmlString(htmlString);
Also...
doc.Save(Response, false, "Sample.pdf");
may not be what you're looking for either. I only say this because the comments look the same as the ones in the examples on the SelectPdf site, so I'm assuming you copied the code from there (which is what I originally did too). In that case, you should know that you don't have to save your PDF doc with that particular version of Save. It actually has several overloads that let you save your doc as:
a byte array (default)
a stream
a file
an HTTP response (the one you're using now, as shown in the examples from the site)
So, as I pointed out, you're using the one that writes the PDF to an HTTP response. If you want to save it as an actual PDF file directly, you'll need to change it to
doc.Save(fileName);
where fileName is the absolute or relative path of the file you want to save the PDF to.
Hope this helps
I am using HiQPdf to convert a list of HTML pages and combine them into one PDF document.
This is how I'm doing it:
public class HtmlToPdfEditor
{
private string _firstPage;
private string _secondPage;
//private const string _HiQPdfSerialNumber = "";
private PdfDocument _document;
public HtmlToPdfEditor(string firstPage, string secondPage)
{
_firstPage = firstPage;
_secondPage=secondPage;
}
public void ConvertAll(string outputPath)
{
HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
_document = new PdfDocument();
//_document.SerialNumber = _HiQPdfSerialNumber;
string firstPageDoc = GetDocument(_firstPage, "firstPage.pdf");
string secondPageDoc = GetDocument(_secondPage, "secondtPage.pdf");
this.JoinDocument(PdfFromFile(firstPageDoc));
this.JoinDocument(PdfFromFile(secondPageDoc));
_document.WriteToFile(outputPath);
_document.Close();
_document = null;
}
private PdfDocument PdfFromFile(string path)
{
return PdfDocument.FromFile(path);
}
private int JoinDocument(PdfDocument document)
{
var nbPages = _document.Pages.Count;
_document.AddDocument(document);
document.Close();
return nbPages;
}
private string GetDocument(string content, string outputFile)
{
var baseUrl = "";
var htmlToPdfConverter = GetPdfExporter();
htmlToPdfConverter.ConvertHtmlToFile(content, baseUrl, outputFile);
return outputFile;
}
public HtmlToPdf GetPdfExporter()
{
HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
//htmlToPdfConverter.SerialNumber = _HiQPdfSerialNumber;
htmlToPdfConverter.Document.PageSize = PdfPageSize.A4;
htmlToPdfConverter.Document.PageOrientation = PdfPageOrientation.Portrait;
htmlToPdfConverter.Document.Margins = new PdfMargins(2);
htmlToPdfConverter.HtmlLoadedTimeout = 60;
htmlToPdfConverter.TriggerMode = ConversionTriggerMode.WaitTime; //Time to load the html
htmlToPdfConverter.WaitBeforeConvert = 1;
return htmlToPdfConverter;
}
}
The issue is that, in the resulting document, the pages converted from HTML are displayed as empty pages. Only Google Chrome displays them correctly; in Firefox these pages stay in the loading state indefinitely.
Note that if I convert the HTML to a PdfDocument directly, instead of storing it to a file and then joining it, the resulting document is perfectly readable, but unfortunately I can't use this method.
Any help will be much appreciated! Thanks!
Yes, that's correct: the PDF documents you add to a main document must remain open until you close the main document.
If the PDF documents you merge are produced from HTML, there is actually an easier way to merge the HTML documents into a PDF, following the approach from the Convert Many HTML to PDF example:
// create an empty PDF document
PdfDocument document = new PdfDocument();
// add a page to document
PdfPage page1 = document.AddPage(PdfPageSize.A4, new PdfDocumentMargins(5),
PdfPageOrientation.Portrait);
try
{
// set the document header and footer before
// adding any objects to document
SetHeader(document);
SetFooter(document);
// layout the HTML from URL 1
PdfHtml html1 = new PdfHtml(textBoxUrl1.Text);
PdfLayoutInfo html1LayoutInfo = page1.Layout(html1);
// determine the PDF page where to add URL 2
PdfPage page2 = null;
System.Drawing.PointF location2 = System.Drawing.PointF.Empty;
if (checkBoxNewPage.Checked)
{
// URL 2 is laid out on a new page with the selected orientation
page2 = document.AddPage(PdfPageSize.A4, new PdfDocumentMargins(5),
GetSelectedPageOrientation());
location2 = System.Drawing.PointF.Empty;
}
else
{
// URL 2 is laid out immediately after URL 1 and html1LayoutInfo
// gives the location where the URL 1 layout finished
page2 = document.Pages[html1LayoutInfo.LastPageIndex];
location2 = new System.Drawing.PointF(html1LayoutInfo.LastPageRectangle.X,
html1LayoutInfo.LastPageRectangle.Bottom);
}
// layout the HTML from URL 2
PdfHtml html2 = new PdfHtml(location2.X, location2.Y, textBoxUrl2.Text);
page2.Layout(html2);
// write the PDF document to a memory buffer
byte[] pdfBuffer = document.WriteToMemory();
// inform the browser about the binary data format
HttpContext.Current.Response.AddHeader("Content-Type", "application/pdf");
// let the browser know how to open the PDF document
HttpContext.Current.Response.AddHeader("Content-Disposition",
String.Format("attachment; filename=LayoutMultipleHtml.pdf; size={0}",
pdfBuffer.Length.ToString()));
// write the PDF buffer to HTTP response
HttpContext.Current.Response.BinaryWrite(pdfBuffer);
// call End() method of HTTP response
// to stop ASP.NET page processing
HttpContext.Current.Response.End();
}
finally
{
document.Close();
}
OK, I resolved this issue by making sure that all the PDF documents to be merged are closed only after the final document is closed. In other words, the JoinDocument method no longer calls document.Close(). I call it later, after closing the final document (_document).
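A minimal sketch of that ordering, using only the HiQPdf calls that already appear in the question (PdfDocument.FromFile, AddDocument, WriteToFile, Close). The class name, method name and parameters are hypothetical, and the HiQPdf namespace is assumed.

```csharp
using System.Collections.Generic;
using HiQPdf; // assumed namespace of the HiQPdf library

public class PdfMergerSketch
{
    // Hypothetical helper: merges the given PDF files into one output file.
    public void Merge(IEnumerable<string> inputPaths, string outputPath)
    {
        var merged = new PdfDocument();
        var sources = new List<PdfDocument>();
        try
        {
            foreach (string path in inputPaths)
            {
                PdfDocument source = PdfDocument.FromFile(path);
                sources.Add(source);        // keep the source open for now
                merged.AddDocument(source);
            }
            merged.WriteToFile(outputPath);
        }
        finally
        {
            // Close the merged document first, then the documents that were added to it.
            merged.Close();
            foreach (PdfDocument source in sources)
            {
                source.Close();
            }
        }
    }
}
```

Keeping the source documents in a list is just one way to defer the Close() calls. The point is that none of them is closed before the merged document has been written and closed.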