Hebrew text in PDF

Hebrew text in PDF - c#

I wrote a PDF document, and I try to write in Hebrew (UTF-8), and I can not in Windows Forms using C# and Visual Studio 2010 using the following code.
Document Doc = new Document(PageSize.LETTER);
//Create our file stream
using (FileStream fs = new FileStream("C:\\Users\\moshe\\Desktop\\Test18.pdf", FileMode.Create, FileAccess.Write, FileShare.Read))
{
//Bind PDF writer to document and stream
PdfWriter writer = PdfWriter.GetInstance(Doc, fs);
//Open document for writing
Doc.Open();
//Add a page
Doc.NewPage();
//Full path to the Unicode Arial file
string ARIALUNI_TFF = Path.Combine("C:\\Users\\moshe\\Desktop\\proj\\gold\\fop\\gold", "ARIAL.TTF");
//Create a base font object making sure to specify IDENTITY-H
BaseFont bf = BaseFont.CreateFont(ARIALUNI_TFF, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
//Create a specific font object
iTextSharp.text.Font f = new iTextSharp.text.Font(bf, 12);
//Write some text
Doc.Add(new Phrase("מה קורה", f));
//Write some more text
Doc.Add(new Phrase("תודה לכולם", f));
//Close the PDF
Doc.Close();
I put the font in the folder.
What do I need to do?

Use a PdfPTable, then you can set the right-to-left mode:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Document Doc = new Document(PageSize.LETTER);
//Create our file stream
using (FileStream fs = new FileStream(#"C:\Users\moshe\Desktop\Test18.pdf", FileMode.Create, FileAccess.Write, FileShare.Read))
{
//Bind PDF writer to document and stream
PdfWriter writer = PdfWriter.GetInstance(Doc, fs);
//Open document for writing
Doc.Open();
//Add a page
Doc.NewPage();
//Full path to the Arial file
string ARIALUNI_TFF = Path.Combine(#"C:\Users\moshe\Desktop\proj\gold\fop\gold", "ARIAL.TTF");
//Create a base font object making sure to specify IDENTITY-H
BaseFont bf = BaseFont.CreateFont(ARIALUNI_TFF, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
//Create a specific font object
iTextSharp.text.Font f = new iTextSharp.text.Font(bf, 12);
//Use a table so that we can set the text direction
PdfPTable T = new PdfPTable(1);
//Hide the table border
T.DefaultCell.BorderWidth = 0;
//Set RTL mode
T.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
//Add our text
T.AddCell(new Phrase("מה קורה", f));
//Add table to document
Doc.Add(T);
//Close the PDF
Doc.Close();
}
}
}
}

Related

ItextSharp Find text in pdf and highlight it

I am working on PDF functionality, I want to search text in PDF and highlight the found text in the PDF. For that I am using iTextsharp.
I did not get any solution yet, please provide me with a solution.
I have written following code;
public ActionResult Index1()
{
string outputFile = #"D:\Test.pdf";
using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
{
doc.Open();
doc.Add(new Paragraph("This is a test and sample pdf for test and wait for it"));
doc.Close();
}
}
}
List<int> pages = new List<int>();
PdfReader pdfReader = new PdfReader(outputFile);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
if (currentPageText.Contains("test"))
{
pages.Add(page);
}
}
pdfReader.Close();
//Create a new file from our test file with highlighting
string highLightFile = #"D:\Test1.pdf";
//Bind a reader and stamper to our test PDF
PdfReader reader = new PdfReader(outputFile);
using (FileStream fs = new FileStream(highLightFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (PdfStamper stamper = new PdfStamper(reader, fs))
{
iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(60.6755f, 749.172f, 94.0195f, 735.3f);
float[] quad = { rect.Left, rect.Bottom, rect.Right, rect.Bottom, rect.Left, rect.Top, rect.Right, rect.Top };
PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
//Set the color
highlight.Color = BaseColor.YELLOW;
//Add the annotation
stamper.AddAnnotation(highlight, 1);
}
}
return View();
}
Above code creates one PDF (test1.pdf)
And in another PDF it highlights some text with hard-coded coordinates, I need to find the coordinates of some text in the PDF.
But I could not find the coordinates of the text I'm looking for.

In iText7, we have implemented this functionality.
It can be found in the class RegexBasedLocationExtractionStrategy.
I suggest you have a look to see how it was done, since this functionality was not backported to iText5.

Not able to fill multiple rows in xfa pdf with FillXfaForm() using itextsharp

I was trying with below code to fill multiple rows in xfa pdf using FillXfaForm() method in C#.net. Anyone please guide me what is wrong on this code. We are using itexsharp.dll version 5.5.4.0.
public void manipulatePdf()
{
using (FileStream existingPdf = new FileStream "existingPdf.pdf",FileMode.Open))
using (FileStream sourceXml = new FileStream("sourceXml.xml", FileMode.Open))
using (FileStream newPdf = new FileStream("mymergedPdf.pdf", FileMode.Create))
{
// Open existing PDF
PdfReader pdfReader = new PdfReader(existingPdf);
// PdfStamper, which will create
PdfStamper stamper = new PdfStamper(pdfReader, newPdf);
AcroFields form = stamper.AcroFields;
XfaForm xfa = form.Xfa;
xfa.FillXfaForm(sourceXml);
stamper.Close();
pdfReader.Close();
}
}

arabic encoding in itextsharp

When I'm trying this code to create a PDF in Arabic using C#, the generated PDF file contains discrete characters. Any help so I can't get continuous characters?
//Create our document object
Document Doc = new Document(PageSize.LETTER);
//Create our file stream
using (FileStream fs = new FileStream(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf"), FileMode.Create, FileAccess.Write, FileShare.Read))
{
//Bind PDF writer to document and stream
PdfWriter writer = PdfWriter.GetInstance(Doc, fs);
//Open document for writing
Doc.Open();
//Add a page
Doc.NewPage();
//Full path to the Unicode Arial file
string ARIALUNI_TFF = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "arabtype.TTF");
//Create a base font object making sure to specify IDENTITY-H
BaseFont bf = BaseFont.CreateFont(ARIALUNI_TFF, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
//Create a specific font object
iTextSharp.text.Font f = new iTextSharp.text.Font(bf, 12, iTextSharp.text.Font.NORMAL);
//Write some text, the last character is 0x0278 - LATIN SMALL LETTER PHI
Doc.Add(new Phrase("This is a ميسو ɸ", f));
//Write some more text, the last character is 0x0682 - ARABIC LETTER HAH WITH TWO DOTS VERTICAL ABOVE
Doc.Add(new Phrase("Hello\u0682", f));
//Close the PDF
Doc.Close();
}

Right-to-left writing and Arabic ligatures are only supported in ColumnText and PdfPTable!
Take a look at the following examples:
http://itextpdf.com/examples/iia.php?id=205 with result: say_peace.pdf
http://itextpdf.com/examples/iia.php?id=215 with result: peace.pdf
Read the book from which these examples are taken to find out why not all words are rendered correctly in peace.pdf
Search http://tinyurl.com/itextsharpIIA2C11 for the corresponding C# version of these examples.
In any case, you need a font that knows how to display Arabic glyphs:
BaseFont bf = BaseFont.CreateFont(ARIALUNI_TFF, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font f = new Font(bf, 12);
You can now add Arabic text, for instance in a table:
PdfPCell cell = new PdfPCell();
cell.AddElement(new Phrase("Hello\u0682", f));
cell.RunDirection = PdfWriter.RUN_DIRECTION_RTL;

Pdf Merge/Overlap with iText

I have used iText for some various utility, such us merge and editing of pdf files with success. Now I need to overlap 2 pdf pages:
For Instance:
INPUT:
PDF#1 (1 Page)
PDF#2 (1 Page)
OUTPUT:
PDF#3 (1 Page: This is the result of the 2 Input Pages Overlapped)
I don't know if it's possible to do this with iText latest version. I am also considering to use one of the 2 input PDF Files as background for the PDF Output Files.
Thank you in advance.

It's actually pretty easy to do. The PdfWriter object has an instance method called GetImportedPage() which returns a PdfImportedPage object. This object can be passed to a PdfContentByte's AddTemplate() method.
GetImportedPage() takes a PdfReader object and the page number that you want to get. You can get a PdfContentByte from an instance of a PdfWriter's DirectContent property.
The code below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.2.0 that shows this all off. It first creates two files on the desktop, the first with just a solid red background color and the second with just a paragraph. It then combines those two files overlapping into a third document. See the code for additional comments.
using System;
using System.IO;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1 {
public partial class Form1 : Form {
public Form1() {
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e) {
//Folder that we'll work from
string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string pdf1 = Path.Combine(workingFolder, "pdf1.pdf");//PDF with solid red background color
string pdf2 = Path.Combine(workingFolder, "pdf2.pdf");//PDF with text
string pdf3 = Path.Combine(workingFolder, "pdf3.pdf");//Merged PDF
//Create a basic PDF filled with red, nothing special
using (FileStream fs = new FileStream(pdf1, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
PdfContentByte cb = writer.DirectContent;
cb.SetColorFill(BaseColor.RED);
cb.Rectangle(0, 0, doc.PageSize.Width, doc.PageSize.Height);
cb.Fill();
doc.Close();
}
}
}
//Create a basic PDF with a single line of text, nothing special
using (FileStream fs = new FileStream(pdf2, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
doc.Add(new Paragraph("This is a test"));
doc.Close();
}
}
}
//Create a basic PDF
using (FileStream fs = new FileStream(pdf3, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
//Get page 1 of the first file
PdfImportedPage imp1 = writer.GetImportedPage(new PdfReader(pdf1), 1);
//Get page 2 of the second file
PdfImportedPage imp2 = writer.GetImportedPage(new PdfReader(pdf2), 1);
//Add the first file to coordinates 0,0
writer.DirectContent.AddTemplate(imp1, 0, 0);
//Since we don't call NewPage the next call will operate on the same page
writer.DirectContent.AddTemplate(imp2, 0, 0);
doc.Close();
}
}
}
this.Close();
}
}
}

iTextSharp HTMLWorker ParseHTML Tablestyle and PDFStamper

Hi I have succesfully used a HTMLWorker to convert a gridview using asp.NET / C#.
(1) I have applied some limited style to the resulting table but cannot see how to apply tablestyle for instance grid lines or apply other formatting style such as a large column width for example for a particular column.
(2) I would actually like to put this text onto a pre-existing template which contains a logo etc. I've used PDF Stamper before for this but cannot see how I can use both PDFStamper and HTMLWorker at once. HTMLWorker needs a Document which implements iDocListener ... but that doesnt seem compatible with usign a PDFStamper. I guess what I am looking for is a way to create a PDFStamper, write title etc, then add the parsed HTML from the grid. The other problem is that the parsed content doesnt interact with the other stuff on the page. For instance below I add a title chunk to the page. Rather than starting below it, the parsed HTML writes over the top. How do I place / interact the parsed HTML content with the rest of what is on the PDF document ?
Thanks in advance
Rob
Here';s the code I have already
Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 30f, 0f);
HTMLWorker htmlWorker = new HTMLWorker(pdfDoc);
StyleSheet styles = new StyleSheet();
styles.LoadTagStyle("th", "size", "12px");
styles.LoadTagStyle("th", "face", "helvetica");
styles.LoadTagStyle("span", "size", "10px");
styles.LoadTagStyle("span", "face", "helvetica");
styles.LoadTagStyle("td", "size", "10px");
styles.LoadTagStyle("td", "face", "helvetica");
htmlWorker.SetStyleSheet(styles);
PdfWriter.GetInstance(pdfDoc, HttpContext.Current.Response.OutputStream);
pdfDoc.Open();
//Title - but this gets obsured by data, doesnt move it down
Font font = new Font(Font.FontFamily.HELVETICA, 14, Font.BOLD);
Chunk chunk = new Chunk(title, font);
pdfDoc.Add(chunk);
//Body
htmlWorker.Parse(sr);

Let me first give you a couple of links to look over when you get a chance:
ItextSharp support for HTML and CSS
How to apply font properties on while passing html to pdf using itextsharp
These answers go deeper into what's going on and I recommend reading them when you get a chance. Specifically the second one will show you why you need to use pt instead of px.
To answer your first question let me show you a different way to use the HTMLWorker class. This class has a static method on it called ParseToList that will convert HTML to a List<IElement>. The objects in that list are all iTextSharp specific versions of your HTML. Normally you would do a foreach on those and just add them to a document but you can modify them before adding which is what you want to do. Below is code that takes a static string and does that:
string file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File1.pdf");
using (FileStream fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
{
doc.Open();
//Our HTML
string html = "<table><tr><th>First Name</th><th>Last Name</th></tr><tr><td>Chris</td><td>Haas</td></tr></table>";
//ParseToList requires a StreamReader instead of just a string so just wrap it
using (StringReader sr = new StringReader(html))
{
//Create a style sheet
StyleSheet styles = new StyleSheet();
//...styles omitted for brevity
//Convert our HTML to iTextSharp elements
List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, styles);
//Loop through each element (in this case there's actually just one PdfPTable)
foreach (IElement el in elements)
{
//If the element is a PdfPTable
if (el is PdfPTable)
{
//Cast it
PdfPTable tt = (PdfPTable)el;
//Change the widths, these are relative width by the way
tt.SetWidths(new float[] { 75, 25 });
}
//Add the element to the document
doc.Add(el);
}
}
doc.Close();
}
}
}
Hopefully you can see that once you get access to the raw PdfPTable you can tweak it as necessary.
To answer your second question, if you want to use the normal Paragraph and Chunk objects with a PdfStamper then you need to use a PdfContentByte object. You can get this from your stamper in one of two ways, either by asking for one that sits "above" existing content, stamper.GetOverContent(int) or one that sits "below" existing content, stamper.GetUnderContent(int). Both versions take a single parameter saying what page to work with. Once you have a PdfContentByte you can create a ColumnText object bound to it and use this object's AddElement() method to add your normal elements. Before doing this (and this answers your third question), you'll want to create at least one "column". When I do this I generally create one that essentially covers the entire page. (This part might sound weird but we're essentially make a single row, single column table cell to add our objects to.)
Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0 that shows off everything above. First it creates a generic PDF on the desktop. Then it creates a second document based off of the first, adds a paragraph and then some HTML. See the comments in the code for any questions.
using System;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;
using System.IO;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
//The two files that we are creating
string file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File1.pdf");
string file2 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File2.pdf");
//Create a base file to write on top of
using (FileStream fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
{
doc.Open();
doc.Add(new Paragraph("Hello world"));
doc.Close();
}
}
}
//Bind a reader to our first document
PdfReader reader = new PdfReader(file1);
//Create our second document
using (FileStream fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (PdfStamper stamper = new PdfStamper(reader, fs))
{
StyleSheet styles = new StyleSheet();
//...styles omitted for brevity
//Our HTML
string html = "<table><tr><th>First Name</th><th>Last Name</th></tr><tr><td>Chris</td><td>Haas</td></tr></table>";
//ParseToList requires a StreamReader instead of just a string so just wrap it
using (StringReader sr = new StringReader(html))
{
//Get our raw PdfContentByte object letting us draw "above" existing content
PdfContentByte cb = stamper.GetOverContent(1);
//Create a new ColumnText object bound to the above PdfContentByte object
ColumnText ct = new ColumnText(cb);
//Get the dimensions of the first page of our source document
iTextSharp.text.Rectangle page1size = reader.GetPageSize(1);
//Create a single column object spanning the entire page
ct.SetSimpleColumn(0, 0, page1size.Width, page1size.Height);
ct.AddElement(new Paragraph("Hello world!"));
//Convert our HTML to iTextSharp elements
List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, styles);
//Loop through each element (in this case there's actually just one PdfPTable)
foreach (IElement el in elements)
{
//If the element is a PdfPTable
if (el is PdfPTable)
{
//Cast it
PdfPTable tt = (PdfPTable)el;
//Change the widths, these are relative width by the way
tt.SetWidths(new float[] { 75, 25 });
}
//Add the element to the ColumnText
ct.AddElement(el);
}
//IMPORTANT, this actually commits our object to the PDF
ct.Go();
}
}
}
this.Close();
}
}
}

protected void LinkPdf_Click(object sender, EventArgs e)
{
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=TestPage.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
this.Page.RenderControl(hw);
StringReader sr = new StringReader(sw.ToString());
Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 100f, 0f);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
htmlparser.Parse(sr);
pdfDoc.Close();
Response.Write(pdfDoc);
Response.End();
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Hebrew text in PDF - c#

Related

ItextSharp Find text in pdf and highlight it

Not able to fill multiple rows in xfa pdf with FillXfaForm() using itextsharp

arabic encoding in itextsharp

Pdf Merge/Overlap with iText

iTextSharp HTMLWorker ParseHTML Tablestyle and PDFStamper

Categories

Resources