ABCPDF Arabic Text Rendering Incorrectly


When I render Arabic text on a report, it does not render correctly: the letters appear to be rendered individually rather than joined up.
The text is displayed right to left correctly (I've set dir=rtl on each element I add), which is what confuses me.
Any help anyone can give is appreciated.
I've added a screenshot of some text as an example.

So I emailed ABCpdf support directly and they told me this:
ABCpdf 8 supports Arabic, but does not support contextual ligatures with the Doc.AddHtml approach - only with regular HTML/CSS (i.e. using Doc.AddImageUrl or Doc.AddImageHtml).
Support for contextual ligatures with Doc.AddHtml was added in ABCpdf 9.1 and is present in the current live release, ABCpdf 10.
Further clarification:
Q: If I add an HTML file containing the specific Arabic text to my server, I should be able to access it and render the text in that file correctly?
A: That's correct. Please ensure you have the final ABCpdf 8 minor version (8123) from our Downloads page. You may also need to use the Gecko HTML engine; please see the HtmlOptions.Engine property.
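Following the support advice above, a minimal C# sketch of rendering the Arabic text through the regular HTML/CSS engine (Doc.AddImageHtml) instead of Doc.AddHtml might look like this. It assumes ABCpdf 8 (namespace WebSupergoo.ABCpdf8) is referenced; the class ArabicPdf and its helper BuildHtml are hypothetical names, and whether Gecko is actually needed for your text is something to test.

```csharp
using System.Net;            // WebUtility.HtmlEncode
using WebSupergoo.ABCpdf8;   // assumes ABCpdf 8 (8123) is referenced

// Hypothetical helper class; names are illustrative only.
static class ArabicPdf
{
    // Wraps the text in a complete UTF-8 page laid out right to left.
    public static string BuildHtml(string arabicText)
    {
        return "<!DOCTYPE html><html dir=\"rtl\"><head><meta charset=\"utf-8\"></head>"
             + "<body><p>" + WebUtility.HtmlEncode(arabicText) + "</p></body></html>";
    }

    // Renders through the HTML/CSS engine (AddImageHtml) rather than Doc.AddHtml,
    // which in ABCpdf 8 does not join Arabic letters according to support.
    public static void Render(string arabicText, string pdfPath)
    {
        using (Doc doc = new Doc())
        {
            doc.HtmlOptions.Engine = EngineType.Gecko; // as suggested by ABCpdf support
            int id = doc.AddImageHtml(BuildHtml(arabicText));

            // Usual ABCpdf paging loop: keep adding pages while the HTML overflows.
            while (doc.Chainable(id))
            {
                doc.Page = doc.AddPage();
                id = doc.AddImageToChain(id);
            }
            doc.Save(pdfPath);
        }
    }
}
```

If the HTML file is already on the server, as in the clarification, Doc.AddImageUrl with that file's URL could take the place of AddImageHtml.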

Related

iText 7 version of iTextSharp TextWithFontExtractionStrategy solution

I am trying to parse a PDF file that has two columns of text on most pages and no images. I tried the iTextSharp solution from the question "How can I get text formatting with iTextSharp?". It seemed to work at first, but then I noticed a serious problem: in some places on my PDF the text is returned out of order. I simply want the text parsed in the order it appears on each page (no special ordering), but that is not happening. Is there a version of the TextWithFontExtractionStrategy solution for iText 7 that does not exhibit this problem (or, for that matter, a version for iTextSharp that works correctly)? I would appreciate any assistance.

How to get coordinates of Run or other WordProcessing element via OpenXML?

I was wondering whether it is possible to find the coordinates of a specific Run (text only, not a drawing or other element that has offset parameters) on a page in a Word document using the OpenXML SDK. I know that OpenXML is basically... well, XML, and simple runs have no relative, numerical position embedded in them.
I read through the OpenXML SDK API and found no clues, but maybe I have missed something. By coordinates I mean any tuple that could be mapped to pixels if I generated an image of the page (imagine taking a screenshot of the page).
I suspect, if this is possible, it is not trivial.
Appreciate your help!

The Open XML SDK does not include this functionality. This would require a layout engine, which is not part of the SDK.

Word is not a page layout program; it's a word processor. Therefore:
No, it's not possible, because...
The Word application lays out a page dynamically when the document is opened. Exactly how it is laid out, and where things appear on screen (or on the printed page), depends on how Word calculates font size as well as line, character and paragraph spacing (in all directions) for the currently selected printer driver. Because the layout can vary, it cannot be saved in the Open XML file.

Alignment Issue in telerik editor

I am using the Telerik editor (in ASP.NET) to write some content. Some words are placed on the right-hand side by pressing the Tab key (i.e. four times).
At design time this displays properly, but when I print the document, the right-hand words move to the next line or sometimes end up in the middle of the page.
How can I solve this alignment problem?
(Screenshots not reproduced: before printing, i.e. at design time in the editor; after printing, i.e. the printed A4 page.)
thank you.

You have two options as I see it:
Define your own Print.css file and follow the instructions on this Telerik documentation page to ensure the print CSS is used. This will ensure printing is consistent and any stripping of CSS done by the browser is avoided.
Generate a PDF from the Telerik editor, and print the PDF. See this Telerik documentation page with demo code for more details on how to generate a PDF from the Telerik ASP.net RadEditor control.

Uneven character kerning in PDF when converted from Word via automation

I need your expertise in fixing a problem I have been facing for a week. It has already become a 'royal pain in the lower back side', and time is running out fast.
Problem
I have developed a C# script that I call from ColdFusion to convert Word documents to PDF. The script does the conversion, but the (justified) text in the paragraphs is not spaced properly: I get a non-selectable space next to some characters.
(Screenshots not reproduced: what it should look like, and what it actually looks like, with red marks added to show the extra spaces.)
If I open the file in Word manually and save it, I do not get this problem. What am I missing or doing wrong that causes this?
Details of my application flow:
I create a DOC (based on my design needs) and save it as HTML.
My CF application uses this HTML, manipulates the content based on some placeholders, and saves the final output as HTML again.
The xx.html file is renamed to xx.doc and passed to my C# converter, which does the DOC to PDF conversion via Word automation.
I'm happy to see well-formed PDF output, but sad that the text is a bit messy.
I have tried this with multiple fonts, and it only happens with certain fonts (in my case Palatino Linotype). What is the difference between converting manually and via automation? Is there a setting (like a boolean) that needs to be set for this, or some other hack?
My system configuration:
Windows Server 2008 R2 64-bit + .NET 4 + Office 2010
Note: I know that Office automation is bad, but at this point it is the only option I have to get the job done.

I found a workaround for this. It seems to depend on the selected printer!
First, open the print dialog (File / Print) and select "Microsoft XPS Document Writer" instead of your normal printer. You don't need to print anything.
Then export the PDF (File / Export / Create PDF).
Other printer drivers may also work. I found this solution in this thread: http://www.howtofixcomputers.com/forums/microsoft-office/bad-kerning-pdf-using-save-pdf-xps-add-244886.html
Notes:
I also installed Adobe PDF Writer before finding this; it's possible that this had an effect.
My system is Windows 8.1 & Office 2013 running under Fusion 5.0.3 on a Mac mini.
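Since the question converts through Word automation, the same workaround could presumably be applied in code by setting Application.ActivePrinter before calling Document.ExportAsFixedFormat. This is an untested C# sketch: it assumes the Word interop assembly (Microsoft.Office.Interop.Word) is referenced, that the XPS writer is installed under exactly that name, and that selecting the printer through automation has the same effect as choosing it in the print dialog. WordToPdf is a hypothetical name.

```csharp
using Microsoft.Office.Interop.Word; // assumes the Word interop assembly is referenced

// Hypothetical converter applying the printer workaround through automation.
static class WordToPdf
{
    public static void Convert(string docPath, string pdfPath)
    {
        var app = new Application();
        string originalPrinter = app.ActivePrinter;
        try
        {
            // Assumption: selecting the XPS writer here has the same effect as
            // choosing it in File / Print before exporting manually.
            app.ActivePrinter = "Microsoft XPS Document Writer";

            Document doc = app.Documents.Open(docPath, ReadOnly: true);
            try
            {
                doc.ExportAsFixedFormat(pdfPath, WdExportFormat.wdExportFormatPDF);
            }
            finally
            {
                // Cast to _Document to avoid the Close method/event name clash.
                ((_Document)doc).Close(WdSaveOptions.wdDoNotSaveChanges);
            }
        }
        finally
        {
            // Restore the previous printer, then quit Word without saving.
            app.ActivePrinter = originalPrinter;
            ((_Application)app).Quit(WdSaveOptions.wdDoNotSaveChanges);
        }
    }
}
```

The original printer is restored in the finally block because changing ActivePrinter can also change the machine's default printer.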

I suspect the problem could be the font being used. Please try:
changing the font
making sure that the language of the text (the LanguageID property) is correct
Alternatively, a special character may have been inserted, for example a "no-width optional break" that is being interpreted the wrong way. Try selecting the text, cutting and pasting it in Word, and showing non-printing characters; the character should then be visible.
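Because the document in the question starts life as HTML, one way to follow the last suggestion without opening Word is to scan the text for zero-width characters before conversion. This C# sketch assumes that Word's "no-width optional break" and "no-width non break" correspond to U+200C and U+200D; the list of suspect characters and the class name InvisibleChars are illustrative, not exhaustive.

```csharp
using System.Collections.Generic;

// Hypothetical helper: reports invisible characters that can upset
// justified text. The list is illustrative, not exhaustive.
static class InvisibleChars
{
    static readonly Dictionary<char, string> Suspects = new Dictionary<char, string>
    {
        { '\u00AD', "soft hyphen (optional hyphen)" },
        { '\u200B', "zero-width space" },
        { '\u200C', "zero-width non-joiner (assumed: Word's no-width optional break)" },
        { '\u200D', "zero-width joiner (assumed: Word's no-width non break)" },
    };

    // Returns (index, description) pairs for every suspect character in text.
    public static List<KeyValuePair<int, string>> Find(string text)
    {
        var hits = new List<KeyValuePair<int, string>>();
        for (int i = 0; i < text.Length; i++)
        {
            string description;
            if (Suspects.TryGetValue(text[i], out description))
                hits.Add(new KeyValuePair<int, string>(i, description));
        }
        return hits;
    }
}
```

If the HTML stores these characters as entities (for example &shy; or &#8204;), decoding it first with System.Net.WebUtility.HtmlDecode lets the scan see them as characters.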

How to generate an RTF document server side in c#

I have tried using the System.Windows.Documents.FlowDocument server side, but ran into a problem with images.
What I need to produce is a document with headings, section breaks, page breaks, images (with text wrapping around from the left or the right), tables and ideally some kind of table of contents.
I use C# and ASP.NET.
Is there a library that will do most of this?
RTF has been chosen because the document needs to be openable and editable in older versions of Word, and we can't run Word on the server.
Thank you

I have used MigraDoc in the past; it is a free library that can create PDFs or RTF documents. Just Google it.

I have started using .NET RTF Writer.
It produces clean RTF, but doesn't do everything I need.
There is pretty good documentation for RTF here.
I am working some things out for myself. For example, I needed to be able to wrap text around an image. Whilst the RTF writer above lets you add images to documents, it does so by putting each image in its own paragraph. What I need is a shape element.
In the RTF it ends up looking something like this (some of the numbers define the size and position of the image in twips):
{\shp{\*\shpinst\shpleft3801\shptop1\shpright8300\shpbottom4500\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr2\shpwrk0\shpfblwtxt0\shpz0
{\sp
{\sn pib}
{\sv
{\pict\pngblip\pichgoal4499\picwgoal4499
-- image binary data goes here --
}}}
{\sp
{\sn fLine}
{\sv 0}}}}
I sometimes just save something in Word and try to understand what it produced (but Word seems to add a lot of noise).
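The shape group shown above can be generated rather than written by hand. A minimal C# sketch, assuming a PNG image and the same control words as the sample (\shpwr2 to wrap text around the shape, horizontal position relative to the column and vertical position relative to the paragraph); RtfShape and its methods are hypothetical, and Word may expect additional properties (such as \picw and \pich) in some cases.

```csharp
using System;
using System.Text;

// Hypothetical builder for the floating-picture \shp group shown above.
static class RtfShape
{
    // Twips are 1/1440 inch, so at 96 DPI one pixel is 15 twips.
    public static int PixelsToTwips(int pixels, int dpi = 96)
    {
        return pixels * 1440 / dpi;
    }

    // left, top, width and height are in twips; \shpright and \shpbottom
    // are coordinates, so they are computed as left + width and top + height.
    public static string Build(byte[] png, int left, int top, int width, int height)
    {
        var sb = new StringBuilder();
        sb.Append(@"{\shp{\*\shpinst");
        sb.AppendFormat(@"\shpleft{0}\shptop{1}\shpright{2}\shpbottom{3}",
                        left, top, left + width, top + height);
        sb.Append(@"\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr2\shpwrk0\shpfblwtxt0\shpz0");
        // "pib" property holding the picture; {{ and }} are literal braces in AppendFormat.
        sb.AppendFormat(@"{{\sp{{\sn pib}}{{\sv{{\pict\pngblip\picwgoal{0}\pichgoal{1} ", width, height);
        sb.Append(BitConverter.ToString(png).Replace("-", "")); // hex-encoded PNG bytes
        // Close \pict, \sv and \sp, add fLine = 0 (no border), then close \shpinst and \shp.
        sb.Append(@"}}}{\sp{\sn fLine}{\sv 0}}}}");
        return sb.ToString();
    }
}
```

For example, RtfShape.Build(pngBytes, 3801, 1, 4499, 4499) yields the same \shpright8300 and \shpbottom4500 as the sample, with the picture's target size set through \picwgoal and \pichgoal.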
