iText C# exact text displaying - c#

I am trying to insert the exact text, including the spaces at the beginning of each line, but iText eats all the spaces before the first visible symbol (tabs don't work either).
I am using iText 7 Community edition.
C# code:
FileInfo file = new FileInfo(DEST);
file.Directory.Create();
//Initialize PDF writer
PdfWriter writer = new PdfWriter(DEST);
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document doc = new Document(pdf);
doc.Add(new Paragraph("Test\n\tTest\n Test\n Test 1 2 3"));
doc.Close();
That code displays the text in the output .pdf document as
Test
Test
Test
Test 1 2 3
Without any tabs or spaces before the first visible symbol of each line.
How can I change code to get
Test
Test
Test
Test 1 2 3
in the output document?

In your code example, (embedded) tabs wouldn't work in iTextSharp 5.xx.xx either, although spaces are respected. What's a little surprising, as you've proved, is that iText7 strips spaces following a newline. Not sure if you need support for either or both, so will give an example that handles each case separately:
First, preserving tabs:
Paragraph p = new Paragraph("Line 0\n")
.AddTabStops(new TabStop(8f))
// change to your needs ^^
.Add(new Tab())
.Add("Line 1");
doc.Add(p);
Second, preserving spaces immediately following a newline:
string[] lines = "0\n1\n 2\n 3\n".Split(
new string[] { "\n" },
StringSplitOptions.RemoveEmptyEntries
);
p = new Paragraph().AddStyle(
new Style().SetFont(PdfFontFactory.CreateFont(FontConstants.COURIER))
);
foreach (var l in lines)
{
if (Regex.IsMatch(l, @"^\s+"))
{
p.Add(" ") // all spaces stripped, whether one or more characters
.Add(l) // now leading whitespace preserved
.Add("\n");
}
else
{
p.Add(l).Add("\n");
}
}
doc.Add(p);
This is the first time I've looked at/written any iText7, so there's likely a different/better way, and I don't consider it anything but a workaround. Oddly, if you add any number of space characters following a newline and then immediately add a string that's also preceded with space characters, the first call strips space, but the second preserves them.
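Another workaround that is sometimes suggested is to swap the leading spaces for non-breaking spaces (U+00A0) before handing the text to iText. The sketch below covers only the string conversion. That iText 7 leaves non-breaking spaces untrimmed is an assumption worth verifying against your version, and the `LeadingWhitespace` class, the `Preserve` method, and the `tabWidth` parameter are hypothetical names, not part of iText.

```csharp
using System;
using System.Text;

public static class LeadingWhitespace
{
    // Replaces spaces and tabs at the start of every line with non-breaking
    // spaces (U+00A0). Whitespace inside a line is left untouched.
    // tabWidth (hypothetical parameter) is how many non-breaking spaces stand in for one tab.
    public static string Preserve(string text, int tabWidth = 4)
    {
        var result = new StringBuilder();
        bool atLineStart = true;
        foreach (char c in text)
        {
            if (atLineStart && c == ' ')
            {
                result.Append('\u00A0');
            }
            else if (atLineStart && c == '\t')
            {
                result.Append('\u00A0', tabWidth);
            }
            else
            {
                result.Append(c);
                // A newline starts a new line; any other character ends the leading run.
                atLineStart = (c == '\n');
            }
        }
        return result.ToString();
    }

    public static void Main()
    {
        string converted = Preserve("Test\n\tTest\n  Test\n   Test 1 2 3");
        // Show the non-breaking spaces as dots so they are visible in a console.
        Console.WriteLine(converted.Replace('\u00A0', '.'));
    }
}
```

The converted string would then be passed to `new Paragraph(...)` as in the question. The chosen font needs a glyph for U+00A0 for this to have any chance of working.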
As a side note, one thing I noticed right away and really like about the new API is you can use method chaining all over the place. :)

You should use Chunks to add text to a paragraph.
Then set the tab settings and add Chunk.TABBING where the tab should go:
p = new Paragraph();
p.setTabSettings(new TabSettings(56f));
p.add(Chunk.TABBING);
p.add(new Chunk("Hello World with tab."));
This sample is located at iText examples

Just try this.
Font bodyFont = FontFactory.GetFont("Times New Roman", 10, Font.NORMAL);
file.Directory.Create();
//Initialize PDF writer
PdfWriter writer = new PdfWriter(DEST);
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document doc = new Document(pdf);
doc.Add(new Paragraph("Test", bodyFont));
doc.Add(new Paragraph(" Test", bodyFont));
doc.Add(new Paragraph(" Test", bodyFont));
doc.Add(new Paragraph(" Test 1 2 2", bodyFont));
doc.Close();

Related

Itext7 not showing arabic text

I am trying to create a PDF document using iText 7. The table looks as expected, but I found a big problem: Arabic letters are not shown.
I've tried adding new fonts and changing the encoding.
I'm displaying Arabic letters in the wrong direction and they are separated, changing the base direction from right to left didn't help.
This is the part of the code:
string font = "naskh.ttf";
PdfFontFactory.Register(font);
FontProgram fontProgram = FontProgramFactory.CreateFont(font, true);
PdfFont f = PdfFontFactory.CreateFont(font,true);
Cell cell = new Cell(1, 3)
.Add(new Paragraph(" English عربي "))
.SetFont(f).SetFontScript(UnicodeScript.ARABIC)
.SetFontSize(33).SetBaseDirection(BaseDirection.RIGHT_TO_LEFT)
.SetFontColor(DeviceGray.WHITE)
.SetBackgroundColor(new DeviceRgb(80, 140, 80))
.SetTextAlignment(TextAlignment.CENTER);
The result is like this:
I've tried everything I could find online. Much of it is Java or targets older versions; I tried to adapt it to C# and iText 7, but still no result.
The closest I got was with PdfFont f = PdfFontFactory.CreateFont(alaw, "Identity-H", true);
where I got 3 letters in the wrong order.
I even tried to use \u0644\u0648\u0631\u0627\u0646\u0633 \u0627\u0644\u0639\u0631\u0628 (copied from an answer) as the string, but it is still not shown.
I can't use paid add-ons.
Is there any solution that lets me write Arabic?
Add a language processor:
LanguageProcessor languageProcessor = new ArabicLigaturizer();
and modify the cell or PdfDocument like this:
com.itextpdf.kernel.pdf.PdfDocument tempPdfDoc = new com.itextpdf.kernel.pdf.PdfDocument(new PdfReader(pdfFile.getPath()), TempWriter);
com.itextpdf.layout.Document TempDoc = new com.itextpdf.layout.Document(tempPdfDoc);
com.itextpdf.layout.element.Paragraph paragraph0 = new com.itextpdf.layout.element.Paragraph(languageProcessor.process("الاستماره الالكترونية--الاستماره الالكترونية--الاستماره الالكترونية--الاستماره الالكترونية"))
.setFont(f).setBaseDirection(BaseDirection.RIGHT_TO_LEFT)
.setFontSize(15);
The final code will be something like:
String font = "your Arabic font";
PdfFontFactory.register(font);
FontProgram fontProgram = FontProgramFactory.createFont(font, true);
PdfFont f = PdfFontFactory.createFont(fontProgram, PdfEncodings.IDENTITY_H);
LanguageProcessor languageProcessor = new ArabicLigaturizer();
com.itextpdf.kernel.pdf.PdfDocument tempPdfDoc = new
com.itextpdf.kernel.pdf.PdfDocument(new PdfReader(pdfFile.getPath()), TempWriter);
com.itextpdf.layout.Document TempDoc = new
com.itextpdf.layout.Document(tempPdfDoc);
com.itextpdf.layout.element.Paragraph paragraph0 = new
com.itextpdf.layout.element.Paragraph(languageProcessor.process("الاستماره الالكترونية--الاستماره الالكترونية--الاستماره الالكترونية--الاستماره الالكترونية"))
.setFont(f).setBaseDirection(BaseDirection.RIGHT_TO_LEFT)
.setFontSize(15);
// Note how I used setBaseDirection and did not use TextAlignment; it works without it

iText7 SetJustification(2) works partially

I have code that takes a PDF template, inserts some input values into it, and creates an output PDF file.
One of the fields in the PDF file is free text whose value can include Hebrew, English, digits, or symbols.
I'm using the following code to make the text readable in Hebrew with RTL display:
iText.Kernel.Pdf.PdfReader reader = new iText.Kernel.Pdf.PdfReader(pdfTemplatePath); //src);
iText.Kernel.Pdf.PdfWriter writer = new iText.Kernel.Pdf.PdfWriter(pdfOutPutFile); //dest);
iText.Kernel.Pdf.PdfDocument pdf = new iText.Kernel.Pdf.PdfDocument(reader, writer);
iText.Forms.PdfAcroForm form = iText.Forms.PdfAcroForm.GetAcroForm(pdf, true);
IDictionary<String, iText.Forms.Fields.PdfFormField> fields = form.GetFormFields();
// iText.Kernel.Font.PdfFont = iText.Kernel.Font.PdfFontFactory.CreateFont(iText.IO.Font.FontProgram
FontProgramFactory.RegisterFont(@"C:\Windows\Fonts\ARIALUNI.TTF", "arialUnicode");
iText.Kernel.Font.PdfFont myFont = PdfFontFactory.CreateRegisteredFont("arialUnicode", iText.IO.Font.PdfEncodings.IDENTITY_H, true);
pdf.GetFirstPage().GetResources().AddFont(pdf, myFont);
// Set Field value by Fields mapping
foreach (string fieldName in formFieldMap.Keys)
{
fields[fieldName].SetValue(formFieldMap[fieldName].ToString());
fields[fieldName].SetFont(myFont);
// displaying the text: 0 Left-justified 1 Centered 2 Right-justified
fields[fieldName].SetJustification(2);
}
My problem is that the text is not aligned to the right.
Field with text-align left:
What more can I do to set the text-align on the right?

How to find and replace the text in the footer of the word document using open XML SDK?

I tried the code below. It works for plain, left-aligned text. If the text contains square brackets, it corrupts the .docx, and if the footer text is center-aligned, the replacement doesn't happen. Please help me. Here is my code.
using (var file = WordprocessingDocument.Open(targetFileName, true))
{
string content = null;
using (StreamReader reader = new StreamReader(
file.MainDocumentPart.FooterParts.First().GetStream()))
{
content = reader.ReadToEnd();
}
Regex expression = new Regex("[name]");
content = expression.Replace(content,"replacement word");
using (StreamWriter writer = new StreamWriter(
file.MainDocumentPart.FooterParts.First().GetStream(FileMode.Create)))
{
writer.Write(content);
}
file.MainDocumentPart.Document.Save();
}
I want to replace multiple words in the footer like [name] | [email] | [telephone]
Document will be corrupted when the text to be replaced has [] in it.
Thanks in advance
In a regular expression, square brackets define a character class, so the pattern "[name]" matches any single "n", "a", "m", or "e" character, and each match is replaced with the entire string "replacement word". The XML markup of the footer part contains those characters too, so it is overwritten and the file is corrupted when the Regex runs the Replace function.
This can be fixed by escaping the [] characters as follows:
Regex expression = new Regex("\\[name\\]");
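For several placeholders such as [name] | [email] | [telephone], escaping each one by hand gets tedious. A minimal sketch using `Regex.Escape` follows; the `FooterPlaceholders` class, the `ReplacePlaceholder` helper, and the sample values are made up for illustration, and the sketch works on a plain string rather than on the footer part itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FooterPlaceholders
{
    // Replaces every literal occurrence of a placeholder such as "[name]".
    public static string ReplacePlaceholder(string text, string placeholder, string value)
    {
        // Regex.Escape makes the brackets match literally instead of forming a character class.
        // The lambda keeps "$" in the value from being read as a substitution pattern.
        return Regex.Replace(text, Regex.Escape(placeholder), _ => value);
    }

    public static void Main()
    {
        string footer = "[name] | [email] | [telephone]";
        footer = ReplacePlaceholder(footer, "[name]", "Jane Doe");
        footer = ReplacePlaceholder(footer, "[email]", "jane@example.com");
        footer = ReplacePlaceholder(footer, "[telephone]", "555-0100");
        Console.WriteLine(footer); // prints "Jane Doe | jane@example.com | 555-0100"
    }
}
```

Since the placeholders are fixed strings, plain `string.Replace` would do the same job without any regular expression.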

Extract words from a doc/docx file c#

I want to extract all the words from a Word file (doc/docx) and put them into a list. It seems that Microsoft.Office.Interop only works if I want to extract paragraphs and add them to a list.
List<string> data = new List<string>();
Microsoft.Office.Interop.Word.Application app = new
Microsoft.Office.Interop.Word.Application();
Document doc = app.Documents.Open(dlg.FileName);
foreach (Paragraph objParagraph in doc.Paragraphs)
data.Add(objParagraph.Range.Text.Trim());
((_Document)doc).Close();
((_Application)app).Quit();
I also found a way to extract word by word, but it doesn't work with a big document because the loop throws an exception.
Dictionary<int, string> motRap = new Dictionary<int, string>();
Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
Document document = application.Documents.Open("C:/Users/Titri/Desktop/test/test/bin/Debug/po.txt");
// Loop through all words in the document.
int count = document.Words.Count;
for (int i = 1; i <= count; i++)
{
string text = document.Words[i].Text;
motRap.Add(i, text);
}
// Close word.
application.Quit();
So my question is whether there is a way to extract words from a large Word file. I don't think Microsoft.Office.Interop is a good tool for extracting from a large file.
Sorry, my English is not good.
The object inside a paragraph is called a Run, though I don't know whether it is available in Interop. For better performance, I would suggest switching to the Open XML SDK if you have to process a large number of documents.
If you want to stick to Interop, why don't you just split each paragraph into an array (delimiter obviously space) and add all the words after that?
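The splitting idea can be sketched without Word at all, assuming the paragraph texts have already been collected into strings (for example from `objParagraph.Range.Text` as in the question). The `WordSplitter` class and `SplitIntoWords` method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class WordSplitter
{
    // Splits each paragraph on whitespace and collects the pieces into one list.
    public static List<string> SplitIntoWords(IEnumerable<string> paragraphs)
    {
        var words = new List<string>();
        foreach (string paragraph in paragraphs)
        {
            // A null separator array makes Split use every whitespace character
            // (spaces, tabs, and the trailing "\r" that Word paragraphs end with).
            words.AddRange(paragraph.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
        }
        return words;
    }

    public static void Main()
    {
        var paragraphs = new[] { "The quick  brown fox.\r", "jumps\tover\r" };
        List<string> words = SplitIntoWords(paragraphs);
        Console.WriteLine(words.Count); // prints 6
    }
}
```

Punctuation stays attached to the words ("fox." in the sample), so an extra trimming step would be needed if that matters.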

Insert List<string> into word document

I'm currently trying to find a way to read in, and insert data into a word document. So far this is what I have gotten:
class Program
{
static void Main(string[] args)
{
var FileName = @"C:\temp\test.DOC";
List<string> data = new List<string>();
Application app = new Application();
Document doc = app.Documents.Open(@"C:\temp\test.DOC");
foreach (Paragraph objParagraph in doc.Paragraphs)
{
data.Add(objParagraph.Range.Text.Trim());
}
//data.Insert
data.Insert(16, "Test 1");
data.Insert(16, "\tTest 2\tName\tAmount");
data.Insert(16, "Test 3");
data.Insert(16, "Test 4");
data.Insert(16, "Test 5");
data.Insert(16, "Test 6");
data.Insert(16, "Test 7");
data.Insert(16, "Test 8");
data.Insert(16, "Test 9");
data.Insert(16, "Test 10");
var x = doc.Paragraphs.Add();
x.Range.Text.Insert(0,"\tTest 2\tName\tAmount");
doc.SaveAs2(@"C:\temp\test3.DOC");
((_Document)doc).Close();
((_Application)app).Quit();
}
}
Now, this successfully populates the List data - but I'm trying to append each new test element at the [16]th index, and save it into the word document. Is there a simple way to accomplish this, or am I just over-thinking this issue?
I realize the string list is separate from the Document object which represents the word document.
I have a few other places in the document where I am using bookmarks to add data, but I don't think it is possible to use bookmarks for placing the data in this instance - or If I don't have to use bookmarks I'd like to stray away from that.
EDIT: I am trying to insert X amount of elements at the [16]th position within the data[].
EDIT 2:
Essentially I am sourcing the data dynamically, and I'm not sure how many records/rows I'll need to add to the document, so it could be as follows:
[15]
[16]\tName\tID\tAMOUNT
[17]\tName\tID\tAMOUNT
[18]\tName\tID\tAMOUNT
Since the headers will already be there (NAME,ID,AMOUNT), and each time I run the program I'm not sure how many elements I'll be inserting into the document - so as long as each element is placed under one another, and on the 16th line in the document template I have setup that should accomplish what I am trying to do.
Image 1 - Image into string array
Image 2 - Image after adding content into the string - this is what the resulting document. (this is to be saved)
I'm attempting to put each element ie: Test1 Test2 Test3 in their each own column each (see above)
Again I am totally confused as to why you want to read the word file into a string list array. This simply adds the text you show after line 15 into the word document. You do not specify WHERE Test 1, Test 2, Test3... are coming from.
Edit: Added a try-catch just in case the document does not have at least 16 paragraphs.
static void Main(string[] args)
{
List<string> data = new List<string>();
Application app = new Application();
Document doc = app.Documents.Open(@"C:\temp\test.DOC");
string testRows = "Test 1\n\tTest 2\tName\tAmount\nTest 3\nTest 4\nTest 5\nTest 6\nTest 7\nTest 8\nTest 9\nTest 10\n";
try
{
var x = doc.Paragraphs[16];
x = doc.Paragraphs.Add(x.Range);
x.Range.Text = testRows;
doc.SaveAs2(@"C:\temp\test3.DOC");
}
catch (System.Runtime.InteropServices.COMException e)
{
Console.WriteLine("COMException: " + e.StackTrace.ToString());
Console.ReadKey();
}
((_Document)doc).Close();
((_Application)app).Quit();
}
So what I figured out (for my purposes) is that it is easiest to insert a list of strings into makeshift columns separated by tabs by inserting at specific paragraphs.
Since I am using bookmarks to place text as well - I found it useful to work from a copy of a document instead of worrying about removing/creating bookmarks each time.
When populating the list that you are going to place at a specific paragraph mark, it is useful to append tab characters as well as newline characters on the fly. Later on this makes it easier to loop through the list and place the entries nicely in the document.
Depending on the way you are going to go about placing columns some logic will have to be determined to space everything correctly. I did this by creating maximum lengths for columns and trimming, and accommodating for smaller/larger lengths by adding specific amounts of tab characters.
So, my columns I am populating would look like:
myList.Add("\t12345678912345\tJohn Doe\t\t\t\t123456\r\n");
myList.Add("\t987654321654987\tJohn Smith\t\t\t\t98765\r\n");
These lines would be inserted at paragraph 17 and placed neatly under headers.
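The trimming and tab-padding logic described above might look roughly like the sketch below. It is not the code from this answer: the `RowBuilder` class, the `BuildRow` method, and the `nameWidth` and `tabWidth` values are hypothetical, and it assumes tab stops fall at a fixed number of characters, which only roughly holds for proportional fonts in Word.

```csharp
using System;

public static class RowBuilder
{
    // Builds one tab-separated row. The name column is trimmed to fit and then
    // padded with as many tabs as it takes to reach the end of the column.
    // nameWidth and tabWidth are measured in characters (assumed values).
    public static string BuildRow(string id, string name, string amount,
                                  int nameWidth = 40, int tabWidth = 8)
    {
        // Trim so that at least one tab always separates the name from the amount.
        if (name.Length > nameWidth - 1)
        {
            name = name.Substring(0, nameWidth - 1);
        }
        // Tab stops the column spans, minus the tab stops the name already passed.
        int tabs = nameWidth / tabWidth - name.Length / tabWidth;
        return "\t" + id + "\t" + name + new string('\t', tabs) + amount + "\r\n";
    }

    public static void Main()
    {
        // Tabs are shown as "->" so the padding is visible in a console.
        Console.Write(BuildRow("12345678912345", "John Doe", "123456").Replace("\t", "->"));
        Console.Write(BuildRow("987654321654987", "John Smith", "98765").Replace("\t", "->"));
    }
}
```

Each string returned by `BuildRow` could be added to the list in place of the hand-written literals, with the widths tuned to the template's actual tab stops.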
Lastly, I decided to use bookmarks to place single lines of text like the date,title, and signature values since those values don't need to be correctly spaced or anything.
At the end I delete the copy of the word document I'm working on, and delete the pdf (since in my case I'm sending it via email)
Thank you for the help @JohnG - I hope this answer might help others who come across it. I removed the try-catch since I'm working from the template as well.
File.Copy(sCurrentPath + "\\" + "testTemplate.DOC", sCurrentPath + "\\" + "test.DOC");
Application app = new Application();
Document doc = app.Documents.Add(sCurrentPath + "\\" + "test.DOC");
foreach (string sValue in myList)
{
// Insert a new paragraph before paragraph 17 and fill it with the current row
var paragraph = doc.Paragraphs[17];
paragraph = doc.Paragraphs.Add(paragraph.Range);
paragraph.Range.Text = sValue;
}
if (doc.Bookmarks.Exists("Date"))
{
object oBookMark = "Date";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text = DateTime.Now.ToString("MM/dd/yyyy");
}
if (doc.Bookmarks.Exists("Signature"))
{
object oBookMark = "Signature";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text = "My Name";
}
if (doc.Bookmarks.Exists("Title"))
{
object oBookMark = "Title";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text = "Title Here";
}
doc.ExportAsFixedFormat(sCurrentPath + "\\" + "test.pdf", WdExportFormat.wdExportFormatPDF);
File.Delete(sCurrentPath + "\\" + "testCopy.DOC");
File.Delete(sCurrentPath + "\\" + "test.pdf");
((_Document)doc).Close();
((_Application)app).Quit();
