I want to render Chinese text when converting HTML to PDF with iTextSharp in C#.
The text displays properly in the HTML, but when I generate the PDF with the XML parser in iTextSharp, the Chinese characters do not appear.
UTF-8 encoding does not seem to work. I also passed Encoding.UTF8 explicitly, but that did not help either.
Below is my code to generate the PDF from HTML.
public static byte[] HtmlToPDFConvert(string baseHtml, Rectangle pageSize)
{
Stream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(baseHtml ?? ""));
Document pdfDoc = new Document(pageSize, 18f, 18f, 18f, 18f);
using (MemoryStream memoryStream = new MemoryStream())
{
PdfWriter writer = PdfWriter.GetInstance(pdfDoc, memoryStream);
pdfDoc.Open();
XMLWorkerHelper.GetInstance().ParseXHtml(writer, pdfDoc, htmlStream, null, Encoding.UTF8, FontFactory.FontImp);
pdfDoc.Close();
byte[] bytes = memoryStream.ToArray();
memoryStream.Close();
return bytes;
}
}
Since XML Worker has been deprecated in favor of pdfHTML, I've used pdfHTML instead.
The only trick is to point to a font that supports the glyphs you want to use.
ConverterProperties props = new ConverterProperties();
FontProvider fontProvider = new DefaultFontProvider(true, true, true);
fontProvider.AddFont("fonts/NotoSansCJKjp-Regular.otf");
props.SetFontProvider(fontProvider);
PdfDocument doc = new PdfDocument(new PdfWriter(DEST));
HtmlConverter.ConvertToPdf(new FileStream(ORIG, FileMode.Open), doc, props);
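If you need a byte array, as in the question, rather than a file, the same idea can be sketched in memory. This is only a sketch under assumptions: the class name CjkPdf, the method name HtmlToPdfBytes, and the fontPath parameter are illustrative and not from the original posts, and the iText 7 pdfHTML package is assumed to be referenced.

```csharp
using System.IO;
using iText.Html2pdf;
using iText.Html2pdf.Resolver.Font;
using iText.Layout.Font;

public static class CjkPdf
{
    // fontPath is a hypothetical parameter: any font file that contains
    // the glyphs you need, e.g. "fonts/NotoSansCJKjp-Regular.otf".
    public static byte[] HtmlToPdfBytes(string html, string fontPath)
    {
        var props = new ConverterProperties();
        FontProvider fontProvider = new DefaultFontProvider(true, true, true);
        fontProvider.AddFont(fontPath);
        props.SetFontProvider(fontProvider);

        using (var output = new MemoryStream())
        {
            // This overload takes the HTML as a string, so there is no
            // manual byte-encoding step that could drop characters.
            HtmlConverter.ConvertToPdf(html ?? "", output, props);
            // ToArray still works after the converter has closed the stream.
            return output.ToArray();
        }
    }
}
```

Usage would be along the lines of `byte[] pdf = CjkPdf.HtmlToPdfBytes(baseHtml, "fonts/NotoSansCJKjp-Regular.otf");`.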
I am using iText 7 pdfHTML (4.0.3) to convert HTML to PDF in memory. The method below takes the HTML in memory and returns an iText 7 PdfDocument object. I need to convert that PdfDocument object to a byte array or a stream.
Please let me know how this can be achieved.
private iText.Kernel.Pdf.PdfDocument CreatePdf( string html)
{
byte[] bytes = Encoding.ASCII.GetBytes(html);
ConverterProperties properties = new ConverterProperties();
properties.SetBaseUri(path);
MemoryStream myMemoryStream = new MemoryStream(bytes);
PdfWriter writer = new(myMemoryStream);
iText.Kernel.Pdf.PdfDocument pdf = new iText.Kernel.Pdf.PdfDocument(writer);
pdf.SetDefaultPageSize(PageSize.A4);
pdf.SetTagged();
HtmlConverter.ConvertToDocument(html,pdf,properties);
return pdf;
}
To @mkl's point, you're overdoing it a bit. Here's a simple example:
var html = "<h1>hi mom</h1>";
byte[] result;
using (var memoryStream = new MemoryStream())
{
var pdf = new PdfDocument(new PdfWriter(memoryStream));
pdf.SetDefaultPageSize(PageSize.A4);
pdf.SetTagged();
HtmlConverter.ConvertToPdf(html, pdf, new ConverterProperties());
result = memoryStream.ToArray();
}
File.WriteAllBytes(@"/tmp/file.pdf", result);
The memoryStream now holds the in-memory representation of the converted PDF. I added the WriteAllBytes call only so you can inspect the result yourself.
Another note: if you do not need to set any PdfDocument properties, you can use an even simpler version:
var html = "<h1>hi mom</h1>";
byte[] result;
using (var memoryStream = new MemoryStream())
{
HtmlConverter.ConvertToPdf(html, memoryStream);
result = memoryStream.ToArray();
}
File.WriteAllBytes(@"/tmp/file.pdf", result);
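One more detail from the question's code: it calls Encoding.ASCII.GetBytes(html). ASCII cannot represent characters outside the 7-bit range, so any such character in the HTML would be replaced by "?" if those bytes were ever used as the converter's input. The small check below uses only the .NET base library; the class and method names are illustrative and the sample string is made up.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Encodes the text to bytes and decodes it again with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        const string sample = "<p>中文</p>";
        // ASCII replaces each non-ASCII character with '?', so the text changes.
        Console.WriteLine(RoundTrip(sample, Encoding.ASCII) == sample); // False
        // UTF-8 can represent every Unicode character, so the text survives.
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) == sample);  // True
    }
}
```

Passing the HTML as a string, as in the examples above, avoids the encoding step altogether.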
I have a problem converting an HTML string to PDF using iText 5 for .NET and attaching the PDF byte array to an email.
I can complete the task without styling the table: the mail is sent with the PDF attached, but the table appears without its CSS styles.
The expected PDF table should look like this:
![Img of Expected pdf][2]
I apply the inline styles dynamically, so is there a way to get the expected PDF with the styles preserved?
The code below converts the HTML string to a PDF byte array:
public static byte[] HtmlToPdf(string sb) //here sb is the HTML string with inline styles
{
byte[] bytes;
StringReader sr = new StringReader(sb.ToString());
Document pdfDoc = new Document(PageSize.A4_LANDSCAPE, 10f, 10f, 10f, 0f);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
using (MemoryStream memoryStream = new MemoryStream())
{
PdfWriter writer = PdfWriter.GetInstance(pdfDoc, memoryStream);
pdfDoc.Open();
htmlparser.Parse(new StringReader(sb));
pdfDoc.Close();
bytes = memoryStream.ToArray();
memoryStream.Close();
}
return bytes;
}
The resulting byte array is then used as the attachment in the mail functionality.
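HTMLWorker is obsolete in iTextSharp 5 and understands only a small subset of CSS, which may explain the missing table styles. A minimal sketch of the same method using XML Worker instead could look like the following. It assumes the separate itextsharp.xmlworker package is referenced; the class name StyledPdf is illustrative. XML Worker expects well-formed XHTML and supports more CSS than HTMLWorker, though not all of it, so the output may still differ from a browser rendering.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class StyledPdf
{
    // html is the XHTML string with inline styles.
    public static byte[] HtmlToPdf(string html)
    {
        var pdfDoc = new Document(PageSize.A4_LANDSCAPE, 10f, 10f, 10f, 0f);
        using (var memoryStream = new MemoryStream())
        {
            PdfWriter writer = PdfWriter.GetInstance(pdfDoc, memoryStream);
            pdfDoc.Open();
            using (var reader = new StringReader(html))
            {
                // XML Worker parses the XHTML and applies the CSS it supports.
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, pdfDoc, reader);
            }
            pdfDoc.Close();
            return memoryStream.ToArray();
        }
    }
}
```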
I'm adding an image to an iTextSharp PDF like this:
iTextSharp.text.Document doc = new iTextSharp.text.Document(PageSize.A4, 0F,0F, 0F, 0F);
iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance("https://s3-eu-west-1.amazonaws.com/foo/bar.png");
img.ScaleToFit(595, 120);
MemoryStream ms = new MemoryStream();
PdfWriter.GetInstance(doc, ms);
doc.Open();
doc.Add(img);
doc.Close();
Then I convert it to a byte[] and then to a base64 string so that AWS API Gateway can handle it.
byte[] pdf = ms.ToArray();
var headersDic = new Dictionary<string, string>();
headersDic.Add("Content-type", "application/pdf");
headersDic.Add("Content-disposition", "inline;filename=file.pdf");
return new APIGatewayProxyResponse
{
Body = Convert.ToBase64String(pdf),
IsBase64Encoded = true,
Headers = headersDic,
StatusCode = 200
};
But the image is not shown in the PDF; only a thin, barely distinguishable line of it appears.
Any ideas? Could the conversions to byte[], then to base64, and back to binary be causing the problem?
When I add text, everything works fine.
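One way to narrow this down is to check whether the base64 step itself changes the bytes. The sketch below uses only the .NET base library; the class name, the method name, and the stand-in sample bytes are illustrative (in the question the input would be ms.ToArray()). It only covers the .NET side and says nothing about how the gateway treats the response body.

```csharp
using System;
using System.Linq;

public static class Base64Check
{
    // Returns true when the bytes are identical after a base64 round trip.
    public static bool SurvivesBase64(byte[] original)
    {
        string encoded = Convert.ToBase64String(original);
        byte[] decoded = Convert.FromBase64String(encoded);
        return original.SequenceEqual(decoded);
    }

    public static void Main()
    {
        // Stand-in data covering every possible byte value.
        byte[] sample = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
        Console.WriteLine(SurvivesBase64(sample)); // True
    }
}
```

If the check passes for the real PDF bytes as well, the base64 conversion in the code is unlikely to be what damages the image.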
I have an HTML page with inline styles that contains a few </br> tags, div tags, and two tables.
I am using iTextSharp 5.
I have converted my HTML page to PDF using iTextSharp. The problem is that the converted PDF does not have the exact styling of the HTML page. Below is my conversion code.
var htmlFile = System.IO.File.ReadAllText(HttpContext.Current.Server.MapPath("~/Templates/GIPConversion.HTML"));
StringReader sr = new StringReader(htmlFile.ToString());
Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 10f, 0f);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
using (MemoryStream memoryStream = new MemoryStream())
{
PdfWriter writer = PdfWriter.GetInstance(pdfDoc, memoryStream);
pdfDoc.Open();
htmlparser.Parse(sr);
pdfDoc.Close();
bytes = memoryStream.ToArray();
memoryStream.Close();
}
Please let me know if I am missing anything in the conversion. My final output should look the same as my HTML page.
Thanks.
I am converting some HTML into PDF using iTextSharp. First I fill a StringWriter with the HTML string, then I use the code below to convert it to a PDF byte array.
The problem is that Unicode characters (Arabic specifically) render as empty.
My code is:
var sw = new StringWriter();
sw = GetHtmlContent();// here i fetch html
byte[] data;
using (var sr = new StringReader(sw.ToString()))
{
using (var ms = new MemoryStream())
{
using (var pdfDoc = new Document())
{
//Bind a parser to our PDF document
using (var htmlparser = new HTMLWorker(pdfDoc))
{
//Bind the writer to our document and our final stream
using (var w = PdfWriter.GetInstance(pdfDoc, ms))
{
pdfDoc.Open();
//Parse the HTML directly into the document
htmlparser.Parse(sr);
pdfDoc.Close();
//Grab the bytes from the stream before closing it
data = ms.ToArray();
}
}
}
}
}
Response.Buffer = false;
Response.Clear();
Response.ClearContent();
Response.ClearHeaders();
Response.ContentType = "application/pdf";
Response.AddHeader("Content-Disposition", "attachment; filename=Test.pdf");
Response.BinaryWrite(data);
Response.End();
Please help me find what is wrong with it.
Follow these steps to display Unicode characters when converting HTML to PDF:
1. Create an HTMLWorker.
2. Register a Unicode font and assign it.
3. Create a style sheet and set the encoding to Identity-H.
4. Assign the style sheet to the HTML parser.
The code below shows these steps:
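TextReader reader = new StringReader(html);
Document document = new Document(PageSize.A4, 30, 30, 30, 30);
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(FileName, FileMode.Create));
HTMLWorker worker = new HTMLWorker(document);
document.Open();
// Register a font that contains the required glyphs (the path is machine-specific)
FontFactory.Register("C:\\Windows\\Fonts\\ARIALUNI.TTF", "arial unicode ms");
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
// Assumed from step 2: assign the registered font to the body tag
ST.LoadTagStyle("body", "face", "arial unicode ms");
ST.LoadTagStyle("body", "encoding", "Identity-H");
worker.Style = ST;
worker.StartDocument();
// Assumed remaining steps: parse the HTML, then finish and close everything
worker.Parse(reader);
worker.EndDocument();
worker.Close();
document.Close();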
See "Display Unicode characters in converting Html to Pdf" for more detail.
Hindi, Turkish, and special characters are also displayed when converting from HTML to PDF using this method.