ITextsharp not displaying ARABIC data when converting html table data to PDF - c#

I am making a report.following is a code and sample.I am using html table for reports.When I run the code pdf is successfully generated but Arabic is not showing.Can you guide me how can i embed Arabic in it.Can you modify my following code which shows arabic data.
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=TestPage.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
tblid1.RenderControl(hw);
StringReader sr = new StringReader(sw.ToString());
Document pdfDoc = new Document(PageSize.A4, 80f, 80f, -2f, 35f);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter writer = PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
htmlparser.Parse(sr);
pdfDoc.Close();
Response.Write(pdfDoc);
Response.End();
<table id="tblid1" runat="server">
<tr>
<td>سلطانالخارج</td>
<td>مسندم</td>
</tr>
</table>

You would need to embed a font into your pdf that supports arabic glyphs.
string fontpath = Environment.GetEnvironmentVariable( "SystemRoot" ) + "\\fonts\\arabtype.ttf";
BaseFont basefont = BaseFont.CreateFont( fontpath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED );
Font arabicFont = new Font( basefont, 10f, Font.NORMAL );
Answer found in thread: Itextsharp and arabic character?
EDIT: This is how I would do it based on the examples I could find and what you're trying to accomplish:
using(WebClient client = new WebClient()) {
string htmlString = client.DownloadString(url);
}
FontFactory.Register("c:/windows/fonts/arabtype.TTF");
StyleSheet style = new StyleSheet();
style.LoadTagStyle("body", "face", "%NAME OF ARABIC FONT%");
style.LoadTagStyle("body", "encoding", BaseFont.IDENTITY_H);
using (Document document = new Document(PageSize.A4, 80f, 80f, -2f, 35f)) {
PdfWriter writer = PdfWriter.GetInstance(
document, Response.OutputStream
);
document.Open();
foreach(IElement element in HTMLWorker.ParseToList(
new StringReader(htmlString), style))
{
document.Add(element);
}
}
Note that you would need to ensure you are registering the correct TTF file that contains the encoding for arabic characters and that you would need to replace %NAME OF ARABIC FONT% with the name of the font you're using, and replace %VARIABLE CONTAINING YOUR RAW HTML% with the actual HTML.

var arialFontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
FontFactory.Register(arialFontPath);
BaseFont bf = BaseFont.CreateFont(arialFontPath, BaseFont.IDENTITY_H, true);
iTextSharp.text.Font FontAr = new iTextSharp.text.Font(bf);
iTextSharp.text.FontFactory.RegisterDirectory(arialFontPath);
StyleSheet styles = new StyleSheet();
styles.LoadTagStyle(HtmlTags.DIV, HtmlTags.FONTSIZE, "16");
styles.LoadTagStyle(HtmlTags.DIV, HtmlTags.COLOR, "navy");
styles.LoadTagStyle(HtmlTags.DIV, HtmlTags.FONTWEIGHT, "bold");
styles.LoadTagStyle(HtmlTags.P, HtmlTags.INDENT, "30px");
styles.LoadTagStyle(HtmlTags.BODY, HtmlTags.FACE, "Arial Unicode MS");
styles.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, BaseFont.IDENTITY_H);
List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), styles);
for (int k = 0; k < htmlarraylist.Count; k++)
{
pdfDocument.Add((IElement)htmlarraylist[k]);
}
this piece of code worked for me, arabic language works perfect in this,
just pass your html text to my htmltext Variable..
Hopefully it will work fine.

Related

Html to PDf conversion unicode character rendering as empty

I am converting some html into pdf using itext sharp. First i have filled out some html string into String Writer then using below mentioned code to converty byte array into pdf
Problem is unicode character [arabic in specific] is rendering empty.
My code is
var sw = new StringWriter();
sw = GetHtmlContent();// here i fetch html
byte[] data;
using (var sr = new StringReader(sw.ToString()))
{
using (var ms = new MemoryStream())
{
using (var pdfDoc = new Document())
{
//Bind a parser to our PDF document
using (var htmlparser = new HTMLWorker(pdfDoc))
{
//Bind the writer to our document and our final stream
using (var w = PdfWriter.GetInstance(pdfDoc, ms))
{
pdfDoc.Open();
//Parse the HTML directly into the document
htmlparser.Parse(sr);
pdfDoc.Close();
//Grab the bytes from the stream before closing it
data = ms.ToArray();
}
}
}
}
}
Response.Buffer = false;
Response.Clear();
Response.ClearContent();
Response.ClearHeaders();
Response.ContentType = "application/pdf";
Response.AddHeader("Content-Disposition", "attachment; filename=Test.pdf");
Response.BinaryWrite(data);
Response.End();
Please help me what's wrong in it
Check below steps to display unicode characters in converting Html to Pdf
Create a HTMLWorker
Register a unicode font and assign it
Create a style sheet and set the encoding to Identity-H
Assign the style sheet to the html parser
Check below code
TextReader reader = new StringReader(html);
Document document = new Document(PageSize.A4, 30, 30, 30, 30);
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(FileName, FileMode.Create));
HTMLWorker worker = new HTMLWorker(document);
document.Open();
FontFactory.Register("C:\\Windows\\Fonts\\ARIALUNI.TTF", "arial unicode ms");
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
ST.LoadTagStyle("body", "encoding", "Identity-H");
worker.Style = ST;
worker.StartDocument();
Check below link for more understanding....
Display Unicode characters in converting Html to Pdf
Hindi, Turkish, and special characters are also display during converting from HTML to PDF using this method. Check below demo image.

Export div content to pdf format

I used the following code to export div content to pdf format using itextsharp
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=Panel.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
Panel1.RenderControl(hw);
StringReader sr = new StringReader(sw.ToString());
// Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 100f, 0f);
Document pdfDoc = new Document(new Rectangle(1000f, 1000f));
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
htmlparser.Parse(sr);
pdfDoc.Close();
Response.Write(pdfDoc);
Response.End();
it shows the following error
Could not find a part of the path 'C:\Program Files\Common Files\Microsoft Shared\DevServer\10.0\Images\logo.png'.`
if i hide the logo in desin then it exported to pdf but the alignment are missing. how to correct it.The page i need to export is follows
In order to export the content to any pdf file, you need to follow the basic conventions of pdf file, those are given in the appendix G of the following pdf reference,
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
Firstly install class in Nuget console: Install-Package itextsharp.xmlworker
Then add namespace: using iTextSharp.tool.xml;
Response.ContentType = "application/pdf";
//string pdf;
// pdf = Convert.ToInt32(hidTablEmpCode.ToString()).ToString() + ".Pdf";
Response.AddHeader("content-disposition", "attachment;filename=Panel.pdf");
//Response.AddHeader("content-disposition", "attachment;filename=Panel.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
// var example_html = #"<p>This is <span class="" headline="" style="">some</span> sample text<span style="">!!!</span></p>";
StringReader sr = new StringReader(lblMessageDetail.Text);
Document pdfDoc = new Document(PageSize.A4, 50f, 50f, 50f, 50f);
var writer = PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
for (int i = 0; i < 5; i++)
{
string imageFilePath2 = Server.MapPath(".") + "/images/GraphicNew.jpg";
iTextSharp.text.Image jpg3 = iTextSharp.text.Image.GetInstance(imageFilePath2);
jpg3.Alignment = iTextSharp.text.Image.UNDERLYING;
jpg3.ScaleToFit(3530, 835);
jpg3.SetAbsolutePosition(0, 0);
pdfDoc.Add(jpg3);
}
XMLWorkerHelper.GetInstance().ParseXHtml(writer, pdfDoc, sr);
string imageFilePath = Server.MapPath(".") + "/Bimages/ns5.png";
iTextSharp.text.Image jpg = iTextSharp.text.Image.GetInstance(imageFilePath);
GetEmp();
string imageURLS = Server.MapPath("~/Emp/Document/") + hidTablEmpCode.Value + "/" + hidSign.Value.ToString();
iTextSharp.text.Image jpg2 = iTextSharp.text.Image.GetInstance(imageURLS);
jpg2.ScaleToFit(100, 50);
jpg2.SetAbsolutePosition(370, 530);
jpg2.Alignment = Element.ALIGN_LEFT;
pdfDoc.Add(jpg2);
jpg.Alignment = iTextSharp.text.Image.ALIGN_LEFT;
jpg.ScaleToFit(100, 50);
jpg.SetAbsolutePosition(50, 735);
pdfDoc.Add(jpg);
pdfDoc.Close();
Response.Write(pdfDoc);
Response.End();

Itextsharp pdf GridView Positioning

I am working with asp.net application and in which i am using itexsharp to generate pdf file. I need to export my gridview to pdf and the problem is after generating pdf the gridview is always postioning at the top of the page. My requriement is i need the gridview at the centre of the page and at the top i will have some text entered throught the textbox from the front end..
Please help me to find solution for this and Thanks in Advance
here is the code for my button click:
Here is my code for the button click to generate PDF:
{
string name = TextBox1.Text;
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=UserDetails.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Chunk c = new Chunk("Export GridView to PDF Using iTextSharp \n",FontFactory.GetFont("Verdana", 15));
Paragraph p = new Paragraph();
p.Alignment = Element.ALIGN_CENTER;
p.Add(c);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
string s = sw.ToString();
GridView1.AllowPaging = false;
BindDDL();
GridView1.RenderControl(hw);
StringReader sr = new StringReader(sw.ToString());
Document pdfDoc = new Document(PageSize.A2, 7f, 7f, 7f, 0f);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
var wri = PdfWriter.GetInstance(pdfDoc, new FileStream("D:/xyz" + ".pdf", FileMode.Create));
pdfDoc.Open();
htmlparser.Parse(sr);
string contents = File.ReadAllText(Server.MapPath("~/bill.htm"));
contents = contents.Replace("[CNAME]", name);
var parsedHtmlElements = HTMLWorker.ParseToList(new StringReader(contents), null);
foreach (var htmlElement in parsedHtmlElements)
pdfDoc.Add(htmlElement as IElement);
pdfDoc.Add(p);
System.Xml.XmlTextReader xmlReader = new System.Xml.XmlTextReader(new StringReader(s));
htmlparser.Parse(sr);
pdfDoc.Close();
Response.Write(pdfDoc);
Response.End();
}
set gridview property as follows to position it in center
<asp:GridView ID="GridView1" RowStyle HorizontalAlign="Center" ...>
m not gettin wat you exactly need...
but place your txtbox and gridview inside the table

Showing Nothing in created PDF using iTextSharp?

i am trying to export my HTML page to PDF using iTextSharp lib. firstly i am trying to print table which is in my Aspx page.PDF is created successfully but there nothing in PDF.
i am using this code below :
protected void btnExportToPdf_Click(object sender, EventArgs e)
{
btnExportToPdf.Visible = false;
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=TestPage.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
exportTable.RenderControl(hw);
//this.Page.RenderControl(hw);
StringReader sr = new StringReader(sw.ToString());
Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 100f, 0f);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
htmlparser.Parse(sr);
pdfDoc.Close();
Response.Write(pdfDoc);
Response.End(); }
Now I'm not too seasoned, but the problem to me looks like you are never actually writing the strings to the file. I see a few different string objects you initialize, and I see you adding the HTML to it, but I never see you actually write them to the PDF. This is my usual process for parsing HTML to PDF, maybe you can look at it, and see what you are missing.
System.Text.StringBuilder store = new System.Text.StringBuilder();
string line;
while ((line = htmlReader.ReadLine()) != null)
{
store.Append(line + Environment.NewLine);
}
string html = store.ToString();
FileStream stream = new FileStream(newFileName, FileMode.Create, FileAccess.Write);
Document document = new Document(PageSize.LETTER, 15, 15, 35, 25);
PdfWriter writer = PdfWriter.GetInstance(document, stream);
document.Open();
System.Collections.Generic.List<IElement> htmlarraylist = new List<IElement>(HTMLWorker.ParseToList(new StringReader(html), new StyleSheet()));
foreach (IElement element in htmlarraylist)
{
document.Add(element);
}
document.Close();
That was what jumped out to me at least, was the fact that it doesn't look like you actually write the generated text to the PDF. I hope this helps.

Html to pdf some characters are missing (itextsharp)

I want to export gridview to pdf by using the itextsharp library. The problem is that some turkish characters such as İ,ı,Ş,ş etc... are missing in the pdf document. The code used to export the pdf is:
protected void LinkButtonPdf_Click(object sender, EventArgs e)
{
Response.ContentType = "application/pdf";
Response.ContentEncoding = System.Text.Encoding.UTF8;
Response.AddHeader("content-disposition", "attachment;filename=FileName.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
System.IO.StringWriter stringWrite = new StringWriter();
System.Web.UI.HtmlTextWriter htmlWrite = new HtmlTextWriter(stringWrite);
GridView1.RenderControl(htmlWrite);
StringReader reader = new StringReader(textConvert(stringWrite.ToString()));
Document doc = new Document(PageSize.A4);
HTMLWorker parser = new HTMLWorker(doc);
PdfWriter.GetInstance(doc, Response.OutputStream);
doc.Open();
parser.Parse(reader);
doc.Close();
}
public static string textConvert(string S)
{
if (S == null) { return null; }
try
{
System.Text.Encoding encFrom = System.Text.Encoding.UTF8;
System.Text.Encoding encTo = System.Text.Encoding.UTF8;
string str = S;
Byte[] b = encFrom.GetBytes(str);
return encTo.GetString(b);
}
catch { return null; }
}
Note: when I want to insert characters into the pdf document, the missing characters are shown in it. I insert the characters with this code:
BaseFont bffont = BaseFont.CreateFont("C:\\WINDOWS\\Fonts\\arial.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
Font fontozel = new Font(bffont, 12, Font.NORMAL, new Color(0, 0, 0));
doc.Add(new Paragraph("İİııŞŞşşĞĞğğ", fontozel));
Finaly I think I found the solution,I changed itextsharp source code a little in order to show turkish characters.(turkish character code is cp1254)
I add "public const string CP1254 = "Cp1254";" to [BaseFont.cs] in the source code.
After that I modify the [FactoryProperties.cs].I changed like this;
public Font GetFont(ChainedProperties props)
{
I don't write the whole code.I changed only code below;
------------Default itextsharp code------------------------------------------------------
if (encoding == null)
encoding = BaseFont.WINANSI;
return fontImp.GetFont(face, encoding, true, size, style, color);
-------------modified code--------------------------------------------
encoding = BaseFont.CP1254;
return fontImp.GetFont("C:\\WINDOWS\\Fonts\\arial.ttf", encoding, true, size, style, color);
}
.After I compile new dll ,and missing characters are shown.
No need to change the source code.
Try this:
iTextSharp.text.pdf.BaseFont STF_Helvetica_Turkish = iTextSharp.text.pdf.BaseFont.CreateFont("Helvetica","Cp1254", iTextSharp.text.pdf.BaseFont.NOT_EMBEDDED);
iTextSharp.text.Font fontNormal = new iTextSharp.text.Font(STF_Helvetica_Turkish, 12, iTextSharp.text.Font.NORMAL);
thank you very much all who posted the samples..
i use the below solution from codeproject , and there was the turkish char set problems due to font..
If you use htmlworker you should register font and pass to htmlworker
http://www.codeproject.com/Articles/260470/PDF-reporting-using-ASP-NET-MVC3
StyleSheet styles = new iTextSharp.text.html.simpleparser.StyleSheet();
styles.LoadTagStyle("h3", "size", "5");
styles.LoadTagStyle("td", "size", ".6");
FontFactory.Register("c:\\windows\\fonts\\arial.ttf", "Garamond"); // just give a path of arial.ttf
styles.LoadTagStyle("body", "face", "Garamond");
styles.LoadTagStyle("body", "encoding", "Identity-H");
styles.LoadTagStyle("body", "size", "12pt");
using (var htmlViewReader = new StringReader(htmlText))
{
using (var htmlWorker = new HTMLWorker(pdfDocument, null, styles))
{
htmlWorker.Parse(htmlViewReader);
}
}
I am not familiar with the iTextSharp library; however, you seem to be converting the output of your gridview component to a string and reading from that string to construct your PDF document. You also have a strange conversion from UTF-8 to UTF-8 going on.
From what I can see (given that your GridView is outputting characters correctly) if you are outputting the characters to a string they would be represented as UTF-16 in memory. You probably need to pass this string directly into the PDF library (like how you pass the raw UTF-16 .NET string "İııŞŞşşĞĞğğ" as it is).
You can use:
iTextSharp.text.pdf.BaseFont Vn_Helvetica = iTextSharp.text.pdf.BaseFont.CreateFont(#"C:\Windows\Fonts\arial.ttf", "Identity-H", iTextSharp.text.pdf.BaseFont.EMBEDDED);
iTextSharp.text.Font fontNormal = new iTextSharp.text.Font(Vn_Helvetica, 12, iTextSharp.text.Font.NORMAL);
For Turkish encoding
CultureInfo ci = new CultureInfo("tr-TR");
Encoding enc = Encoding.GetEncoding(ci.TextInfo.ANSICodePage);
If you're outputting HTML, try different DOCTYPE tags at the top of the page.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
Note if using HTML you may need to HTMLEncode the characters.
Server.HTMLEncode()
HttpServerUtility.HtmlEncode()
BaseFont bF = BaseFont.CreateFont("c:\\arial.ttf","windows-1254",true);
Font f = new Font(bF,12f,Font.NORMAL);
Chunk c = new Chunk();
c.Font = f;
c.Append("Turkish characters: ĞÜŞİÖÇ ğüşıöç");
document.Add(c);
In the first line, you may write these instead of "windows-1254". All works:
Cp1254
iso-8859-9
windows-1254
Don't change the source code of the iTextSharp. Define a new style:
var styles = new StyleSheet();
styles.LoadTagStyle(HtmlTags.BODY, HtmlTags.FONTFAMILY, "tahoma");
styles.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, "Identity-H");
and then pass it to the HTMLWorker.ParseToList method.
i have finally find a soultution for this problem , by this you can print all turkish character.
String htmlText = html.ToString();
Document document = new Document();
string filePath = HostingEnvironment.MapPath("~/Content/Pdf/");
PdfWriter.GetInstance(document, new FileStream(filePath + "\\pdf-"+Name+".pdf", FileMode.Create));
document.Open();
iTextSharp.text.html.simpleparser.HTMLWorker hw = new iTextSharp.text.html.simpleparser.HTMLWorker(document);
FontFactory.Register(Path.Combine(_webHelper.MapPath("~/App_Data/Pdf/arial.ttf")), "Garamond"); // just give a path of arial.ttf
StyleSheet css = new StyleSheet();
css.LoadTagStyle("body", "face", "Garamond");
css.LoadTagStyle("body", "encoding", "Identity-H");
css.LoadTagStyle("body", "size", "12pt");
hw.SetStyleSheet(css);
hw.Parse(new StringReader(htmlText));
I strongly suggest not to change itextsharp source code in order to solve this problem. Have a look at my other comment on the subject: https://stackoverflow.com/a/24587745/1138663
I solved the problem. I can provide my the other solution type...
try
{
BaseFont bf = BaseFont.CreateFont("c:\\windows\\fonts\\calibrib.ttf",
BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
Document document = new Document(PageSize.A4, 25, 25, 30, 30);
PdfWriter writer = PdfWriter.GetInstance(document, fs);
Font f = new Font(bf, 12f, Font.NORMAL);
// Open the document to enable you to write to the document
document.Open();
// Add a simple and wellknown phrase to the document
for (int x = 0; x != 100; x++)
{
document.Add(new Paragraph("Paragraph - This is a test! ÇçĞğİıÖöŞşÜü",f));
}
// Close the document
document.Close();
}
catch(Exception)
{
}

Categories

Resources