Reversing Strings in Right To Left (BiDirectional) Languages in iTextSharp

I'm using iTextSharp (the C# port of iText) to create PDFs from text/HTML. Most of my text is in Hebrew, a right-to-left language.
My problem is that the PDFs show RTL text in reverse, so I need to reverse my strings in a way that reverses only the RTL text, without reversing any numbers or English text.
It is my understanding that fribidi allows doing that on Linux, but I couldn't find any solution to this problem for Windows.
I would welcome any suggestions, including an alternative to iTextSharp that would do this automatically (if one exists).

To show RTL text correctly with iTextSharp:
Set the font's encoding to BaseFont.IDENTITY_H.
Then use container elements that support RunDirection, such as PdfPCell or ColumnText, and set element.RunDirection = PdfWriter.RUN_DIRECTION_RTL on them.
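The two steps above could be put together roughly as follows. This is a minimal sketch assuming iTextSharp 5.x; the class name RtlPdfSketch, the method name and the font path are illustrative assumptions, and the font must actually contain Hebrew glyphs.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class RtlPdfSketch
{
    // fontPath must point to a TrueType font with Hebrew glyphs,
    // e.g. C:\Windows\Fonts\arial.ttf (an assumption; adjust for your machine).
    public static void Write(string outputPath, string fontPath, string text)
    {
        var document = new Document();
        using (var stream = new FileStream(outputPath, FileMode.Create))
        {
            PdfWriter.GetInstance(document, stream);
            document.Open();

            // Step 1: create the font with the IDENTITY_H encoding and embed it.
            BaseFont baseFont = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
            var font = new Font(baseFont, 12);

            // Step 2: use containers that support RunDirection and set it to RTL.
            var table = new PdfPTable(1);
            table.RunDirection = PdfWriter.RUN_DIRECTION_RTL;

            var cell = new PdfPCell(new Phrase(text, font));
            cell.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
            cell.Border = Rectangle.NO_BORDER;
            table.AddCell(cell);

            document.Add(table);
            document.Close();
        }
    }
}
```

The text is passed in logical order (as typed); with the run direction set, the string should not need to be reversed by hand.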

HTML stores Hebrew/Arabic text in logical order, and in the PDF you need to store it in visual order. What you need to do is convert from logical to visual order. There are libraries that do this (search for minibidi, which I believe is BSD licensed, or fribidi, which is GPL or LGPL).
My real suggestion would be to change direction. Write a very small application in Qt4 that takes the URL as its first argument and the PDF to write as its second. Since Qt4 has HTML support (via QtWebKit) and has the option to print to PDF (PostScript and SVG as well), this should be simpler than writing your own HTML->PDF solution.
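To make the logical-to-visual conversion mentioned above concrete, here is a deliberately naive sketch for a single line with a right-to-left base direction. It is not the Unicode Bidirectional Algorithm that fribidi or minibidi implement: the NaiveBidi class is hypothetical, only the Hebrew and Arabic blocks are treated as RTL, and combining marks, surrogate pairs, mirrored punctuation such as parentheses, and the special rules for numbers next to Latin text are all ignored.

```csharp
using System;

public static class NaiveBidi
{
    // Hebrew and Arabic blocks only; a real implementation uses Unicode bidi classes.
    static bool IsRtl(char c) => c >= '\u0590' && c <= '\u06FF';

    // Strong left-to-right for this sketch: any letter or digit that is not RTL.
    static bool IsLtr(char c) => char.IsLetterOrDigit(c) && !IsRtl(c);

    public static string ToVisual(string logical)
    {
        // Reverse the whole line first, as if everything were RTL.
        char[] chars = logical.ToCharArray();
        Array.Reverse(chars);

        // Then restore every left-to-right run (numbers, Latin words and the
        // spaces or punctuation between them) to its original order.
        int i = 0;
        while (i < chars.Length)
        {
            if (!IsLtr(chars[i])) { i++; continue; }

            int last = i;
            for (int j = i + 1; j < chars.Length && !IsRtl(chars[j]); j++)
            {
                if (IsLtr(chars[j])) last = j;
            }

            Array.Reverse(chars, i, last - i + 1);
            i = last + 1;
        }
        return new string(chars);
    }
}
```

For real documents a library that implements the full algorithm is the safer choice; the sketch only shows why numbers and English text have to be excluded from the reversal.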

PDFCreator is a free tool to create PDF files from nearly any Windows application.
It installs as a Windows printer driver, such that it can be used by any Windows program that has a print functionality.
You can treat your input as simple text strings to be printed; printing them from Notepad through this printer driver may already produce the correct PDF.
If you want to dive a little deeper into right-to-left printing in C#, pass a StringFormat with StringFormatFlags.DirectionRightToLeft to your Graphics.DrawString() calls.
A snippet from a PrintPage event handler:
// Lay the text out right to left
var lineFmt = new StringFormat(StringFormatFlags.DirectionRightToLeft);
e.Graphics.DrawString(textToPrint, font, Brushes.Black, startX, ypos, lineFmt);

Just put the string in a table cell:
PdfPCell cell1 = new PdfPCell(new Phrase("מספר", font));
cell1.HorizontalAlignment = Element.ALIGN_RIGHT; // 0 = left, 1 = center, 2 = right

Related

How can I programmatically format text in visio shapes (color, size, font)?

I can change the font, color, and size of characters in a shape, but only for rows that already exist in the VisSectionIndices.visSectionCharacter section of the shapesheet.
I can't create new rows in this section, and I can't change the number of characters each format applies to.
Any solution will suit me: any hack, any idea.
I have been racking my brain and don't know how to approach this.
How do I use several colors for the text inside one shape (for example black, green, and red)?

Welcome to Stack Overflow. You could check a Visio book such as "Developing Visio Solutions", a free Microsoft book that discusses these subjects.
There is also a great Visio forum, http://visguy.com/vgforum/, where you can find lots of Visio-specific questions already answered. There is a Russian Visio forum too: https://visio.getbb.ru/
Also, you can always use the macro recorder to generate the code: turn on recording, do the action manually, and get the generated code in the VBA IDE.
Anyway, you can use shape.Characters to modify the style of a text fragment, like this:
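' Draw a shape and get its Characters object
Set shp = ActivePage.DrawRectangle(0, 0, 1, 1)
Set chars = shp.Characters
chars.Text = "Something with Red Text"
' Narrow the Characters object to the range that should be recolored
chars.Begin = 10
chars.End = 19
' 2 is red in the default Visio color palette
chars.CharProps(visCharacterColor) = 2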

How to Get pantone color number from pdf?

I want to extract Pantone colors from a PDF using illustrator.dll or acrobat.dll. It is possible to get Pantone colors in Adobe Professional, but I don't know how to get them through code. I have tried Illustrator too. Can you please help me get Pantone colors using C# code?
I have already extracted the font size from a PDF using Illustrator. Is it possible to get the Pantone color as well? Thank you in advance.
Illustrator.TextFrame tF = doc.TextFrames[i];
Illustrator.TextFont objFont = tF.TextRange.CharacterAttributes.TextFont;

It would be very nice to obtain that kind of information and still be able to rely on it matching reality.
Usually Pantone colours can only be judged from the official catalogues, because colours shown on digital displays are converted to the screen's own colour format.
Anyway, I think there is even a comparison tool on their site, but my advice as a designer is not to rely on those colours, as a small tweak to a colour can sometimes ruin hours of work.
Here is a simple site I found: http://www.ginifab.com/feeds/pms/cmyk_to_pantone.php
Good luck ;)

calculate html page breaks (html 2 pdf) server side for precise print layout with headers and footers

We print PDF books generated through an HTML-to-PDF application.
There is a header and footer on each page, and we place content exactly, using production and translation restrictions (and layout variations) for different languages to ensure that the fixed content for each page fits.
So for example, although our content is dynamic, a paragraph is expected to take approximately the same amount of space at the same place in the book. We sometimes change style and layout attributes for translations, but the same rules about like sizes apply.
The entire book is rendered as one long HTML page, using CSS line breaking to force each header onto a new page. In other words, we control the fixed content height per page server side.
This works well, and we are very happy with the advantages that HTML affords us in presentation (designers rather than programmers can design pages, etc.). We are also heavily invested in this tech and in too deep to change direction now: we are using HTML-to-PDF and need to make it work as well as possible. That is not to say we could not mix technologies, but...
The problem is this: we now have some variable-sized content that we have no prior control over. To us it is text, so we control its formatting but not its quantity. We also have headings of different sizes.
We need a way to calculate page breaks, leaving as little white space as possible, and I would love to know how anyone else is dealing with this. I know this will not be an exact science, but I still need the best approach possible.
We have total control over the rendering/layout engine, which is always IE8 compatible, so different browsers need not be considered.
These are my thoughts, would love to hear yours:
1. Our current method: assign a number of lines per page (variable by font and font size to allow for different locales). Each block of content is costed as n lines, and this figure is used to calculate page breaks.
Pro: simple.
Con: inaccurate, since none of our fonts are monospaced; needs configuring for every locale.
2. Render each consecutive page of free-flow content into a web page, in a div of the exact page width, and let it flow to whatever vertical height it requires. Using an HTML-to-BMP solution, capture an image and use the height of the rendered image (edge detected and cropped if required) to calculate the required number of pages.
Pro: could be accurate; not too expensive if free-flow content is kept contiguous.
Con: incomplete solution. Once I know the required number of pages, how do I know where to break the HTML? Measuring each page using this method and edge detecting would be very expensive.
3. On a font-by-font basis, knowing in advance the font sizes, padding and margins of text and headings, calculate width, line breaks and height character by character, using width data extracted from the font file.
Pro: once all the data had been extracted, and margins had been added for differences in HTML rendering, this could likely be fairly accurate.
Con: highly intricate and sensitive to style sheet changes.
4. Could we use a WebBrowser control to somehow measure the content?
Love to hear your thoughts and suggestions.
EDIT....
Our PDF converter is Winnovative, which runs within a .NET Windows service; our HTML feed, however, is generated in PHP.

Please refer to the manual:
http://www.winnovative-software.com/manual%5CHTML%20to%20PDF%20Converter%20for%20.NET%20-%20Developer%27s%20Manual.htm
See point 5.1.
I hope this helps.
Note: the internal links are not working, so please navigate to that point manually.

This question is old, but I am doing the same basic thing as you. What I've found is that line counting is still important, but you can use the CSS line-height property to standardize the height of each line (or height on tr elements if the HTML is table based). This should give you a constant number of lines per page.
Did you come up with a solution that worked for you?
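With a fixed line height, the line-counting approach can be sketched as a simple greedy pass over the content blocks. This is only an illustration under stated assumptions: the PageBreaker class and its method names are hypothetical, each block's line count is taken as already estimated, and a block taller than a page is allowed to spill onto following pages without those internal breaks being reported.

```csharp
using System;
using System.Collections.Generic;

public static class PageBreaker
{
    // How many whole lines fit into the content area at a fixed CSS line height.
    public static int LinesPerPage(double contentHeightPt, double lineHeightPt)
    {
        return (int)Math.Floor(contentHeightPt / lineHeightPt);
    }

    // Returns the indexes of the blocks that should start a new page.
    public static List<int> BreakBefore(IReadOnlyList<int> blockLines, int linesPerPage)
    {
        if (linesPerPage <= 0) throw new ArgumentOutOfRangeException(nameof(linesPerPage));

        var breaks = new List<int>();
        int used = 0; // lines already consumed on the current page

        for (int i = 0; i < blockLines.Count; i++)
        {
            int lines = blockLines[i];

            // The block does not fit in what is left of the page: break before it.
            if (used > 0 && used + lines > linesPerPage)
            {
                breaks.Add(i);
                used = 0;
            }

            used += lines;

            // A block taller than a page spills over; keep only the lines on its last page.
            if (used > linesPerPage)
            {
                used %= linesPerPage;
                if (used == 0) used = linesPerPage;
            }
        }
        return breaks;
    }
}
```

For example, blocks of 10, 25, 10, 40 and 5 lines on 40-line pages get breaks before the third, fourth and fifth blocks. How the per-block line counts are estimated is the hard part and is left out here.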

iTextSharp - Copying elements from one PDF to another

I want to copy certain elements from one PDF to another using iTextSharp.
I want to read one PDF, extract its text elements, correct them, and create a new PDF from the updated text elements plus all the images etc. from the first PDF.
How can this be achieved?

This task is very complex. I wrote a program to do this for a large greeting card maker.
First you have to locate the text and calculate the glyph bounding boxes. Next you have to modify the contents stream to remove the text. The text may be broken into many pieces depending on the PDF creator. You have to remove those operators from the contents stream and adjust the CTM because some operators use relative positioning. Finally, you have to insert the replacement text, matching the original text's style (font, size, color, orientation, etc.)
As for copying elements from one PDF to another, most of the steps above are required plus you have to copy resources, eg. fonts, colorspaces, patterns, etc, to the new PDF.

Get height of rendered text and images in MS Word

I'm creating a newspaper authoring system. Today I'm using the Aspose.Words library to generate the newspaper in DOCX format as output, based on many other documents as input.
The basic idea is to load many article documents into a List, then generate a final DOCX containing the newspaper.
We need to get the total height of text (with images and tables) inside columns. Because libraries like Aspose.Words treat the DOCX format as a DOM, there is no way to know how text will be arranged inside columns, so I can't know the real height.
We've worked out our own way to get this height. I'm using the Graphics.MeasureString() method from the System.Drawing namespace. It returns the width and height used by a string, from which I can estimate how many lines (and points or inches) it will use inside a column.
But it is very crude and we need a better solution. We are thinking of using the Open XML SDK to get this height. Can we?
Aspose.Words doesn't provide a way to find it out, and all rendering classes are private to the library.
Can you think of another way to get this height?
Thank you,
Daniel Koch

This property isn't exposed in Open XML or the SDK (or VBA/VSTO for that matter). How exactly the height is calculated is not in any documentation. Possibly the way you are doing it is a way to proceed.
Another possible way is to put your TextColumns in a Table Column/Cell and grab that height (but if it is two text columns in the cell and the first one "fills" the cell top to bottom and the second one doesn't, you'll still have the issue of not being able to calculate the size of the second one).

I have almost the same problem as you, but in my case I'm dealing with questions inside an exam.
Currently we use RTF to build the questions and a RichTextBox to measure the height, just like this: (http://blogs.technet.com/david_bennett/archive/2005/04/06/403402.aspx).
I want to migrate to DOCX, but still no luck on how to measure a question with tables and images. :-(
Right now I'm studying the Document members (http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word._document_members.aspx) to try to do it with Word automation.
Regards,
Bruno

Thank you all for your answers.
I ended up replacing Aspose.Words with PDFLib. Now I can control pages, columns or anything else using PostScript points.
We keep Aspose.Words only for content import; it isn't suited to printing a newsletter.
