PdfSharp.HtmlRenderer does not populate my PDF


When using TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(), my PDF comes out blank. Edit: I believe it is not rendering the styling, but I'm unsure how to solve it. We use .xsl classes to style the HTML, whereas GeneratePdf() expects to be passed a CSS class, or else it uses the default W3 styling. In theory, it should just work because our styling is all inline anyway (see the HTML sample I've attached below).
Source: PDFGenerator
I have tried rendering the HTML straight to PDF, and also HTML to a bitmap image, then to a PDF.
PdfDocument pdf = PdfGenerator.GeneratePdf(faxBody, PageSize.Letter,20,null,null,null);
string path = HostingEnvironment.MapPath("~/App_Data/mytestpdf.pdf");
pdf.Save(path);
And/Or
PdfDocument pdf = PdfGenerator.GeneratePdf(faxBody, PageSize.Letter,20,null,null,null);
Bitmap bitmap = new Bitmap(1200, 1800);
PdfPage page = new PdfPage();
XImage img = XImage.FromGdiPlusImage(bitmap);
pdf.Pages.Add(page);
XGraphics xgr = XGraphics.FromPdfPage(pdf.Pages[0]);
xgr.DrawImage(img, 0, 0);
string path = HostingEnvironment.MapPath("~/App_Data/mytestpdf.pdf");
pdf.Save(path);
I took the second snippet from another Stack Overflow thread to try. I get the same result either way: a blank PDF. The string "faxBody" definitely contains HTML here.
If I run this in the Immediate Window:
pdf.Save(HostingEnvironment.MapPath("~/App_Data/mytestpdf.pdf"));
I get this:
"Expression has been evaluated and has no value."
PDF: mytestpdf.pdf
HTML string: HTMLsample

Related

HTML text with image to PDF

We're having an issue producing a PDF with an image in iText.
We're able to produce the text of the document using iText, but the image is not coming through.
We are following the text in their Ebook at https://kb.itextpdf.com/home/it7kb/ebooks/itext-7-converting-html-to-pdf-with-pdfhtml/chapter-1-hello-html-to-pdf
The code in question is below:
void createPdf(string baseUri, string html, string dest)
{
ConverterProperties properties = new ConverterProperties();
properties.SetBaseUri(baseUri);
HtmlConverter.ConvertToPdf(html, new FileStream(dest, FileMode.Create), properties);
}
As far as we can see, the issue is with the string baseUri.
We assumed that this is the directory where the image is held in our C# project in Visual Studio, and so far we have tried the following strings, to no avail:
/Images/
/Images/NewLogo.png
http://localhost:64070/Images/NewLogo.png
None of these have produced the image in the PDF and any help or suggestions would be greatly appreciated.

We have found that if we set the BaseUri to the location of an image at a URL, we are able to produce an image in the PDF.
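The code above can be adapted so that the base URI points at a folder rather than at a single image. This is only a sketch under assumptions: the helper name CreatePdfWithLocalImages and the imagesDirectory parameter are hypothetical, the folder is assumed to be a physical path on disk (in ASP.NET, for example, something obtained from Server.MapPath), and the HTML is assumed to reference the image relative to that folder, e.g. <img src="NewLogo.png">. The iText calls are the same ones used in the question.

```csharp
using System.IO;
using iText.Html2pdf;

static class HtmlToPdfSketch
{
    // imagesDirectory is a hypothetical physical folder that contains the images
    // referenced by relative src attributes in the HTML.
    public static void CreatePdfWithLocalImages(string imagesDirectory, string html, string dest)
    {
        // Resolve to an absolute path and make sure it ends with a separator,
        // so that relative references are looked up inside the folder.
        string baseUri = Path.GetFullPath(imagesDirectory);
        if (!baseUri.EndsWith(Path.DirectorySeparatorChar.ToString()))
        {
            baseUri += Path.DirectorySeparatorChar;
        }

        ConverterProperties properties = new ConverterProperties();
        properties.SetBaseUri(baseUri);

        // Dispose the output stream even if the conversion throws.
        using (FileStream output = new FileStream(dest, FileMode.Create))
        {
            HtmlConverter.ConvertToPdf(html, output, properties);
        }
    }
}
```

A site-relative string such as /Images/ only means something to the web server, which may be why the values listed above did not resolve; that explanation is an assumption rather than something confirmed in this thread.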

I've been attempting to find an easy solution to exporting a Canvas in my WPF Application to a PDF Document.
So far, the best solution has been to use the PrintDialog and set it up to automatically use the Microsoft Print to PDF 'printer'. The only problem I have had with this is that although the PrintDialog is skipped, there is a FileDialog to choose where the file should be saved.
Sadly, this is a deal-breaker because I would like to run this over a large number of canvases with automatically generated PDF names (well, programmatically provided anyway).
Other solutions I have looked at include:
Using PrintDocument, but from my experimentation I would have to manually iterate through all my Canvas's children and manually invoke the correct Draw method (which would be rather time-consuming, since a lot of my custom elements have transformations).
Exporting as a PNG image and then embedding that in a PDF. Although this works, TextBlocks within my canvas are no longer text, so this isn't ideal.
Using the third-party library PDFSharp, which has the same downside as PrintDocument: a lot of custom logic for each element.
With PDFSharp, I did find a method for generating the XGraphics from a Canvas, but no way of then consuming that object to make a PDF page.
So does anybody know how I can skip or automate the PDF PrintDialog, or consume a PDFSharp XGraphics to make a page? Or are there any other ideas for directions to take this, besides writing a whole library to convert each of my Canvas elements to PDF elements?

If you look at the output port of a recent Windows installation of Microsoft Print To PDF,
you may note it is set to PORTPROMPT: and that is exactly what causes the request for a filename.
You might note lower down, I have several ports set to a filename, and the fourth one down is called "My Print to PDF"
So very last century methodology; when I print with a duplicate printer but give it a different name I can use different page ratios etc., without altering the built in standard one. The output for a file will naturally be built:-
A) Exactly in one repeatable location, that I can file monitor and rename it, based on the source calling the print sequence, such that if it is my current default printer I can right click files to print to a known \folder\file.pdf
B) The same port can be used via certain /pt (printto) command combinations to output, not just to that default port location, but to a given folder\name such as
"%ProgramFiles%\Windows NT\Accessories\WORDPAD.EXE" /pt listIN.doc "My Print to PDF" "My Print to PDF" "listOUT.pdf"
Other drivers usually charge for the convenience of WPF programmable renaming, but I will leave you that PrintVisual challenge for another of your three wishes.
MS suggest XPS is best But then they would be promoting it as a PDF competitor.
It does not need to be Doc[X]2PDF it could be [O]XPS2PDF or aPNG2PDF or many pages TIFF2PDF etc. etc. Any of those are Native to Win 10 also other 3rd party apps such as [Free]Office with a PrintTo verb will do XLS[X]2PDF. Imagination becomes pagination.

I had great success generating PDFs using PDFSharp in combination with SkiaSharp (for more advanced graphics).
Let me begin from the very end:
you save the PdfDocument object in the following way:
PdfDocument yourDocument = ...;
string filename = @"your\file\path\document.pdf";
yourDocument.Save(filename);
creating the PdfDocument with a page can be achieved the following way (adjust the parameters to fit your needs):
PdfDocument yourDocument = new PdfDocument();
yourDocument.PageLayout = PdfPageLayout.SinglePage;
yourDocument.Info.Title = "Your document title";
PdfPage yourPage = yourDocument.AddPage();
// Orientation and Size are properties of the page, not of the document
yourPage.Orientation = PageOrientation.Landscape;
yourPage.Size = PageSize.A4;
the PdfPage object's content (as an example I'm putting a string and an image) is filled in the following way:
using (XGraphics gfx = XGraphics.FromPdfPage(yourPage))
{
XFont yourFont = new XFont("Helvetica", 20, XFontStyle.Bold);
gfx.DrawString(
"Your string in the page",
yourFont,
XBrushes.Black,
new XRect(0, XUnit.FromMillimeter(10), yourPage.Width, yourFont.GetHeight()),
XStringFormats.Center);
using (Stream s = new FileStream(@"path\to\your\image.png", FileMode.Open))
{
XImage image = XImage.FromStream(s);
var imageRect = new XRect()
{
Location = new XPoint() { X = XUnit.FromMillimeter(42), Y = XUnit.FromMillimeter(42) },
Size = new XSize() { Width = XUnit.FromMillimeter(42), Height = XUnit.FromMillimeter(42.0 * image.PixelHeight / image.PixelWidth) }
};
gfx.DrawImage(image, imageRect);
}
}
Of course, the font objects can be created as static members of your class.
And this is, in short to answer your question, how you consume the XGraphics object to create a PDF page.
Let me know if you need more assistance.

Unable to read text in a specific location in a pdf file using iTextSharp

I've been asked to read the text of a PDF and do some processing after extracting it. I'm using iTextSharp to read the PDF. The problem is that PdfTextExtractor.GetTextFromPage doesn't give me all the contents of the page. For example:
In the above PDF I'm unable to read the text highlighted in blue. I'm able to read the rest of the characters. Below is the code that does this:
string filePath = "myFile path";
PdfReader pdfReader = new PdfReader(filePath);
for (int page = 1; page<=1; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
}
Any suggestions here?
I have gone through lots of questions and solutions on SO, but none are specific to this problem.

The reason for text extraction not extracting those texts is pretty simple: Those texts are not part of the static page content but form fields! But "Text extraction" in iText (and other PDF libraries I know, too) is considered to mean "extraction of the text of the static page content". Thus, those texts you miss simply are not subject to text extraction.
If you want to make form field values subject to your text extraction code, too, you first have to flatten the form field visualizations. "Flattening" here means making them part of the static page content and dropping all their form field dynamics.
You can do that by adding after reading the PDF in this line
PdfReader pdfReader = new PdfReader(filePath);
code to flatten this PDF and loading the flattened PDF into the pdfReader, e.g. like this:
MemoryStream memoryStream = new MemoryStream();
PdfStamper pdfStamper = new PdfStamper(pdfReader, memoryStream);
pdfStamper.FormFlattening = true;
pdfStamper.Writer.CloseStream = false;
pdfStamper.Close();
memoryStream.Position = 0;
pdfReader = new PdfReader(memoryStream);
Extracting the text from this re-initialized pdfReader will give you the text from the form fields, too.
Unfortunately, the flattened form text is added at the end of the content stream. As your chosen text extraction strategy SimpleTextExtractionStrategy simply returns the text in the order it is drawn, the former form fields contents all are extracted at the end.
You can change this by using a different text extraction strategy, i.e. by replacing this line:
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
Using the LocationTextExtractionStrategy (which is part of the iText distribution) already returns a better result; unfortunately the form field values are not exactly on the same base line as the static contents we perceive to be on the same line, so there are some unexpected line breaks.
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
Using the HorizontalTextExtractionStrategy (from this answer which contains both a Java and a C# version thereof) the result is even better. Beware, though, this strategy is not universally better, read the warnings in the answer text.
ITextExtractionStrategy strategy = new HorizontalTextExtractionStrategy();
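If only the field values are needed, and not their position within the page text, they can also be read directly from the form instead of flattening. This is a different technique from the one described above, shown as a minimal sketch: the class and method names are hypothetical, and it assumes the PDF uses a regular AcroForm (iTextSharp 5.x API).

```csharp
using System;
using iTextSharp.text.pdf;

static class FormFieldDump
{
    // Hypothetical helper: prints every form field name with its current value.
    public static void PrintFieldValues(string filePath)
    {
        PdfReader pdfReader = new PdfReader(filePath);
        try
        {
            AcroFields form = pdfReader.AcroFields;

            // Fields maps each fully qualified field name to its field item.
            foreach (string fieldName in form.Fields.Keys)
            {
                // GetField returns the field's value as a string.
                Console.WriteLine("{0} = {1}", fieldName, form.GetField(fieldName));
            }
        }
        finally
        {
            pdfReader.Close();
        }
    }
}
```

This yields name/value pairs only; to get the values interleaved with the surrounding static text, the flattening approach above is still the one to use.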

Superscript to PDF document

I'm looking for a solution, either paid or free...
I have superscript text stored in SQL in RTF format, and I need to print the superscript on a PDF document along with other text. So for example the PDF doc might read "Equation 1:" and then print the superscript text extracted from SQL.
I have been searching for an easy way to do this and have so far come up empty.
The current PDF docs are made with PDFSharp, but I'm happy to change that for a workable solution.
I thought of converting the rtf to an image with PdfConverter and then placing the image on the pdf doc but that doesn't seem to work.
I tried to do this with the following code however it throws an error "Parameter is not valid".
PdfConverter pdfConverter = new PdfConverter();
byte[] rtfstring = pdfConverter.GetPdfBytesFromRtfString(spec);
ImageConverter conv = new ImageConverter();
Image i = (Image)conv.ConvertFrom(rtfstring);

Print PDF from ASP.Net without preview

I've generated a pdf using iTextSharp and I can preview it very well in ASP.Net but I need to send it directly to printer without a preview. I want the user to click the print button and automatically the document prints.
I know that a page can be sent directly to printer using the javascript window.print() but I don't know how to make it for a PDF.
Edit: it is not embedded, I generate it like this;
...
FileStream stream = new FileStream(Request.PhysicalApplicationPath + "~1.pdf", FileMode.Create);
Document pdf = new Document(PageSize.LETTER);
PdfWriter writer = PdfWriter.GetInstance(pdf, stream);
pdf.Open();
pdf.Add(new Paragraph(member.ToString()));
pdf.Close();
Response.Redirect("~1.pdf");
...
And here I am.

I finally got it working, but I had to use an IFRAME. I defined an iframe in the aspx without setting its src property; in the cs file I generate the PDF file and set the iframe's src property to the generated PDF file name, like this:
Document pdf = new Document(PageSize.LETTER);
PdfWriter writer = PdfWriter.GetInstance(pdf,
new FileStream(Request.PhysicalApplicationPath + "~1.pdf", FileMode.Create));
pdf.Open();
//This action leads directly to printer dialogue
PdfAction jAction = PdfAction.JavaScript("this.print(true);\r", writer);
writer.AddJavaScript(jAction);
pdf.Add(new Paragraph("My first PDF on line"));
pdf.Close();
//Open the pdf in the frame
frame1.Attributes["src"] = "~1.pdf";
And that did the trick. However, I think I should implement your solution, Stefan. The problem is that I'm new to ASP.NET and JavaScript, and without complete source code I could not code your suggestion, but at least this is a first step. I was very surprised by how much HTML and JavaScript I need to learn. Thanks.

Is the PDF embedded in the page with an embed tag, or just opened in a frame? How are you showing it?
If it's embedded, just make sure that the object is selected and then do a print().
Get the ref to the embedded document.
var x = document.getElementById("mypdfembeddobject");
x.click();
x.setActive();
x.focus();
x.print();

It's a little more tricky if you're using pdfsharp but quite doable
PdfDocument document = new PdfDocument();
PdfPage page = document.AddPage();
XGraphics gfx = XGraphics.FromPdfPage(page);
XFont font = new XFont("Verdana", 20, XFontStyle.BoldItalic);
// Draw the text
gfx.DrawString("Hello, World!", font, XBrushes.Black,
new XRect(0, 0, page.Width, page.Height),
XStringFormats.Center);
// real stuff starts here
// current version of pdfsharp doesn't support actions
// http://www.pdfsharp.net/wiki/WorkOnPdfObjects-sample.ashx
// so we got to get close to the metal see chapter 12.6.4 of
// http://partners.adobe.com/public/developer/pdf/index_reference.html
PdfDictionary dict = new PdfDictionary(document);
dict.Elements["/S"] = new PdfName("/JavaScript");
dict.Elements["/JS"] = new PdfString("this.print(true);\r");
document.Internals.AddObject(dict);
document.Internals.Catalog.Elements["/OpenAction"] =
PdfInternals.GetReference(dict);
document.Save(Server.MapPath("2.pdf"));
frame1.Attributes["src"] = "2.pdf";

Also, try this gem:
<link rel="alternate" media="print" href="mypdf.pdf">
I haven't tested it, but from what I have read, it can be used this way to have mypdf.pdf printed instead of the page content, whatever method you use to print the page.
Search for media="print" to check out more.

You can embed javascript in the pdf, so that the user gets a print dialog as soon as their browser loads the pdf.
I'm not sure about iTextSharp, but the javascript that I use is
var pp = this.getPrintParams();
pp.interactive = pp.constants.interactionLevel.automatic;
this.print(pp);
For iTextSharp, check out http://itextsharp.sourceforge.net/examples/Chap1106.cs
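Combining this script with the PdfAction.JavaScript and AddJavaScript calls shown in the IFRAME answer above gives roughly the following. It is a sketch only: the class and method names and the outputPath parameter are hypothetical, and whether the dialog actually appears depends on the PDF viewer executing document-level JavaScript.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

static class AutoPrintPdfSketch
{
    // Hypothetical helper: writes a one-paragraph PDF that asks the viewer to print on open.
    public static void Create(string outputPath)
    {
        // The print script from the answer above, joined into one string.
        string script =
            "var pp = this.getPrintParams();" +
            "pp.interactive = pp.constants.interactionLevel.automatic;" +
            "this.print(pp);";

        Document pdf = new Document(PageSize.LETTER);
        using (FileStream stream = new FileStream(outputPath, FileMode.Create))
        {
            PdfWriter writer = PdfWriter.GetInstance(pdf, stream);
            pdf.Open();

            // Register the script as document-level JavaScript.
            writer.AddJavaScript(PdfAction.JavaScript(script, writer));

            pdf.Add(new Paragraph("My first PDF on line"));
            pdf.Close();
        }
    }
}
```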
