Convert HTML file to PDF file using ITextSharp - c#

I'd like to accomplish the following:
Given the path name of an html file, and the desired pathname of a pdf file, convert the HTML file to PDF using ITextSharp. I've seen plenty of code samples which do close to this but not exactly what I need. I believe my solution will need to use the iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList() function but I'm having trouble getting this to work with an actual HTML file and outputting an actual PDF file.
public void GeneratePDF(string htmlFileName, string outputPDFFileName)
{...}
is the function I'd really like to get working properly.
Thanks in advance
Edit: Here's an example of what I've tried:
iTextSharp.text.Document doc = new Document();
PdfWriter.GetInstance(doc, new FileStream(Path.GetFullPath("fromHTML.pdf"), FileMode.Create));
doc.Open();
try
{
    List<IElement> list = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(new StringReader(File.ReadAllText(this.textBox1.Text)), null);
    foreach (IElement elm in list)
    {
        doc.Add(elm);
    }
}
catch (Exception ex)
{
    MessageBox.Show(ex.Message);
}
doc.Close();
Note that textBox1.Text contains the full path name of the html file I'm trying to convert to pdf and I want this to get output to "fromHTML.pdf"
Thanks!

I had the same requirement and was diverted to this page by Google but could not find a concrete answer.
But after some head-scratching and trial and error, I was able to convert HTML to PDF using the iTextSharp library, version 5.1.1.
The code I have shared also handles img tags with relative paths; iTextSharp throws an error if an img tag's src is not absolute.
You can find the code here:
http://am22tech.com/s/22/Blogs/post/2011/09/28/HTML-To-PDF-using-iTextSharp.aspx
Let me know if you need more information. The code is in c#.
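For reference, here is a minimal sketch of the GeneratePDF function from the question, assuming iTextSharp 5.x, where HTMLWorker.ParseToList returns a List<IElement>. It is not the code from the linked post, and the class name HtmlToPdfSketch is only illustrative. HTMLWorker supports a limited subset of HTML and was deprecated later in the 5.x line in favour of XMLWorker, so treat this as a starting point rather than a complete solution.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

// Hypothetical helper class; only GeneratePDF's signature comes from the question.
public static class HtmlToPdfSketch
{
    public static void GeneratePDF(string htmlFileName, string outputPDFFileName)
    {
        // Dispose both streams even if parsing fails.
        using (var output = new FileStream(outputPDFFileName, FileMode.Create))
        using (var html = new StreamReader(htmlFileName))
        {
            var doc = new Document();
            PdfWriter.GetInstance(doc, output);
            doc.Open();
            try
            {
                // Parse the HTML into iTextSharp elements (no stylesheet supplied)
                // and add them to the document one by one.
                foreach (IElement element in HTMLWorker.ParseToList(html, null))
                {
                    doc.Add(element);
                }
            }
            finally
            {
                // Close() finalizes the PDF; it throws if nothing was added.
                doc.Close();
            }
        }
    }
}
```

As noted above, img tags with relative src values still need to be rewritten to absolute paths before parsing.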

Related

How to find a text in the uploaded PDF file in ASP.NET c#

I want to find whether a text is present in the uploaded PDF file in ASP.NET c#.
using (MemoryStream str = new MemoryStream(this.docUploadField.FileBytes))
{
    using (StreamReader sr = new StreamReader(str, Encoding.UTF8))
    {
        string line = sr.ReadToEnd();
    }
}
I am getting the below as the file content when I read the contents of file.
Please help me with this
You will need a PDF-reading library. Reading the raw bytes as UTF-8 text will not work, because PDF content is usually compressed and encoded.
The best-known options are:
iText (formerly iTextSharp): https://github.com/itext/itext7-dotnet
PdfSharp: https://github.com/empira/PDFsharp
There are many other free options as well.
With one of these you open the PDF file, read it, and take the text you need.
Usually they expose the PDF's contents (text, images, and so on), which you can loop through or search for what you need.
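As a rough sketch of that approach, assuming the iText 7 for .NET package linked above, the uploaded bytes can be searched page by page. The class and method names (PdfTextSearch, ContainsText) are made up for illustration; PdfTextExtractor.GetTextFromPage is the iText 7 call doing the actual work.

```csharp
using System;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

// Hypothetical helper; not part of iText itself.
public static class PdfTextSearch
{
    // Returns true if searchText occurs on any page of the PDF (case-insensitive).
    public static bool ContainsText(byte[] pdfBytes, string searchText)
    {
        using (var stream = new MemoryStream(pdfBytes))
        using (var pdf = new PdfDocument(new PdfReader(stream)))
        {
            for (int page = 1; page <= pdf.GetNumberOfPages(); page++)
            {
                // Extract the text of one page and search it.
                string pageText = PdfTextExtractor.GetTextFromPage(pdf.GetPage(page));
                if (pageText.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the upload control from the question, this would be called as PdfTextSearch.ContainsText(this.docUploadField.FileBytes, "some text"). Note that scanned PDFs contain images rather than text, and a phrase split across two pages will not match with this page-by-page check.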

SelectPDF ConvertUrl Non-English Characters Error

In my demo project I'm using Selectpdf tool to convert html pages to pdf documents. These html pages are stored locally. So I'm using ConvertUrl function for conversion. Here is the inline code
`
string url = AppDomain.CurrentDomain.BaseDirectory + "HTML" + "\\OrderName_" + DateTime.Now.ToString("yyyy'-'MM'-'dd'_'HH'-'mm_") + MockOrderNo + ".html";
HtmlToPdf converter = new HtmlToPdf();
PdfDocument doc = converter.ConvertUrl(url);
`
Then I save the pdf document, using doc.Save(). Here is the pdf document result
As you can see, there is a problem displaying Turkish characters such as "İ,ı,ş,ğ...". How can I resolve this using SelectPdf? If it cannot be solved with SelectPdf, what other PDF conversion tools would be preferable that do not have this kind of problem?
Also, for my requirements I don't use the ConvertHtmlString function. I need to store HTML pages in a folder, convert these HTML pages to PDF, and store the PDF documents in another folder.
Thanks for your help
I just changed the encoding of html file to windows-1252. This solved the problem

Converting html to PDF using pdftron "ToPDF cannot convert this file format on this platform"

I get the error below when I try to convert an HTML file to PDF:
"ToPDF cannot convert this file format on this platform"
The file is available at the given location. I am simply trying to convert an HTML file to PDF.
bool err = false;
try
{
    PDFDoc pdfdoc = new PDFDoc();
    string input_file_path = Path.Combine(Directory.GetCurrentDirectory(), "test.html");
    pdftron.PDF.Convert.ToPdf(pdfdoc, input_file_path);
    pdfdoc.Destroy();
}
catch (Exception e)
{
    err = true;
}
HTML to PDF conversion on UWP is not currently available.
In fact, I am not aware of any completely serverless, device-only HTML to PDF conversion for UWP,
at least not one that can take arbitrary HTML input and convert it to PDF. There may be solutions if your HTML is a very narrow and known subset of HTML/CSS/JS.
Instead, for general HTML to PDF conversion on mobile, you would use a server component.

How to use C# to display pdf in onenote using com/interop api

I'm new to stack overflow, C# and onenote interop com api. I'm trying to display a pdf file in onenote using C# and the onenote com/interop api (I'd rather not use the REST API).
I am able to display a link to a pdf file using the tag < InsertedFile pathSource="[myfilepath]" preferredName = "[myPreferredName]"> in conjunction with the UpdatePageContent function in the interop API, but this doesn't display the PDF.
I have been able to get my program to display an image in onenote using the following code to create the image tag
private XElement createImageTag(Image image)
{
    string OneNoteNamespace = "http://schemas.microsoft.com/office/onenote/2013/onenote";
    var img = new XElement(XName.Get("Image", OneNoteNamespace));
    var data = new XElement(XName.Get("Data", OneNoteNamespace));
    data.Value = this.toBase64(image);
    img.Add(data);
    return img;
}
private string toBase64(Image image)
{
    using (var memoryStream = new MemoryStream())
    {
        image.Save(memoryStream, ImageFormat.Png);
        var binary = memoryStream.ToArray();
        return Convert.ToBase64String(binary);
    }
}
I tried altering this for a PDF instead of an image by converting the PDF to a byte array, converting that to base64, and assigning the result as data.Value in the createImageTag function, but it did not result in a displayed PDF either (presumably because OneNote was expecting an image and not a PDF). I'd like to avoid using third-party libraries or extensions to convert a PDF to an image if possible, and haven't found any other ways to convert a PDF to an image.
I used ONOMSpy to look for any other onenote/xml tags that might help me display a pdf in onenote, but didn't see others besides the Image and InsertedFile tags that looked like they were close to doing what I wanted.
So if you could help me either:
1) find an easy way to convert a pdf to an image using C# or
2) show me how to tell onenote to display the PDF
I'd really appreciate it. Thanks!

ITextSharp Multiple Page PDF from several templates

I have a requirement to generate a PDF from multiple different PDFs of unknown page size:
1) Create a cover sheet from a template and write the text onto it.
2) Pull a PDF (unknown page size) and append it to the above.
3) Repeat until all required PDFs are attached.
Step 1 is not a problem and is working, so I have a cover sheet PDF generated. I now need a way to append the additional PDFs as above. How can we achieve this using ITextSharp?
If you are trying to concatenate multiple PDF files into one you may take a look at the following post.
I found a simple way to do this using the PdfCopy family of classes in iTextSharp (PdfCopyFields here):
void MergePdfStreams(List<Stream> Source, Stream Dest)
{
    var copy = new PdfCopyFields(Dest);
    foreach (Stream source in Source)
    {
        var reader = new PdfReader(source);
        copy.AddDocument(reader);
    }
    copy.Close();
}
Source : Is there a straight forward way to append one PDF doc to another using iTextSharp?
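For comparison, here is a sketch of the same merge done with plain PdfCopy and file paths instead of streams, assuming iTextSharp 5.x. PdfCopy imports each page as-is, so pages keep their original sizes, which matters when the source page sizes are unknown. The class and method names (PdfMerger, MergePdfFiles) are illustrative only.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper; the cover sheet would simply be the first path in the list.
public static class PdfMerger
{
    public static void MergePdfFiles(IEnumerable<string> sourcePaths, string destPath)
    {
        using (var dest = new FileStream(destPath, FileMode.Create))
        {
            var doc = new Document();
            var copy = new PdfCopy(doc, dest);
            doc.Open();
            foreach (string path in sourcePaths)
            {
                var reader = new PdfReader(path);
                // iTextSharp page numbers are 1-based.
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }
                // Release the reader's resources once its pages are copied.
                copy.FreeReader(reader);
                reader.Close();
            }
            // Close() finalizes the merged PDF; it throws if no pages were added.
            doc.Close();
        }
    }
}
```

Called as PdfMerger.MergePdfFiles(new[] { "cover.pdf", "attachment1.pdf" }, "merged.pdf"), where the file names are placeholders.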
