Converting PDF, Doc and Docx to rtf in c#

I have a requirement for an application that takes Doc, Docx and PDF files and converts them to RTF.
The conversion is one-way; I do not need to convert back to Doc or PDF.
Has anyone done this, and can you recommend a library? I know there is Aspose, but it's way too pricey and the licenses are per year, so that's not going to work for the company I work for.
I'm OK with using a different library for each file type if that's what it takes.
Thanks in advance.

Telerik has a nice library to do this. They actually have an entire editor that looks like Microsoft Word. It can open multiple file formats and it saves natively as RTF (although it can also save as PDF, DOCX, etc.). The one thing I'm not sure of is whether the Telerik library can open a PDF and save it as RTF.
Here is a link to the library:
http://www.telerik.com/products/wpf/richtextbox.aspx
For a PDF to RTF library, you could use this:
http://www.sautinsoft.com/products/pdf-focus/index.php

GroupDocs.Conversion Cloud is a REST API that converts all common file formats from one format to another reliably and easily. Its free pricing plan offers 50 free credits per month.
Here is sample code for PDF to RTF from default storage:
// Get App Key and App SID from https://dashboard.groupdocs.cloud/
var configuration = new GroupDocs.Conversion.Cloud.Sdk.Client.Configuration(MyAppSid, MyAppKey);
var apiInstance = new ConvertApi(configuration);
try
{
    // Conversion settings: source file, target format and output path
    var settings = new GroupDocs.Conversion.Cloud.Sdk.Model.ConvertSettings
    {
        StorageName = null, // null selects the default storage
        FilePath = "02_pages.pdf",
        Format = "rtf",
        ConvertOptions = new RtfConvertOptions(),
        OutputPath = "02_pages.rtf"
    };
    // Convert to the specified format
    List<StoredConvertedResult> response = apiInstance.ConvertDocument(new ConvertDocumentRequest(settings));
    Console.WriteLine("Document converted successfully: " + response[0].Url);
}
catch (Exception e)
{
    Console.WriteLine("Exception when calling ConvertApi.ConvertDocument: " + e.Message);
}
I'm a developer evangelist at Aspose.

Related

c# , how to generate password protected arabic pdf or word document

I used iText 7 in my C# code to create a PDF file, as I said in my other question here:
Itext7 not showing arabic text
I gave up on trying to fix it, because it seems I need to pay for the add-on, and I can't do that.
I tried PDFsharp. It showed Arabic letters, but they were disconnected and reversed, and writing the Arabic backward did not make the letters connect.
I used the SautinSoft library and it created a Word document where Arabic works fine, but it adds a footer saying that it is a free version, so I can't use that one either.
The PDF created by this library also doesn't support Arabic.
So I think I can't write a PDF in Arabic; none of the libraries I tried supported it.
Is there any way to fix it?
Or can anyone please suggest another library that can create an Arabic PDF or Word document without watermarks or footers?

I found the solution, using Gembox pdf, it only allows 20 paragraphs, but that is more than enough

What about DocumentCore?
public static void SecureDocument()
{
    string filePath = @"ProtectedDocument.pdf";
    DocumentCore dc = new DocumentCore();
    // Let's create a simple document.
    dc.Content.End.Insert("Hello World!!!", new CharacterFormat() { FontName = "Verdana", Size = 65.5f, FontColor = Color.Orange });
    PdfSaveOptions so = new PdfSaveOptions();
    // Password protection
    so.EncryptionDetails.UserPassword = "12345";
    // Encryption algorithm
    so.EncryptionDetails.EncryptionAlgorithm = PdfEncryptionAlgorithm.RC4_128;
    // Available permissions: content copying, commenting, printing, changing the document, filling of form fields
    // Here only printing is allowed
    so.EncryptionDetails.Permissions = PdfPermissions.Printing;
    // Save the document as a PDF file with the security options.
    dc.Save(filePath, so);
    // Open the result for demonstration purposes.
    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(filePath) { UseShellExecute = true });
}

Converting html to PDF using pdftron "ToPDF cannot convert this file format on this platform"

I get the error below when I try this.
"ToPDF cannot convert this file format on this platform"
The file is available at that location. I am simply trying to convert an HTML file to PDF.
bool err = false;
try
{
    PDFDoc pdfdoc = new PDFDoc();
    string input_file_path = Path.Combine(Directory.GetCurrentDirectory(), "test.html");
    pdftron.PDF.Convert.ToPdf(pdfdoc, input_file_path);
    pdfdoc.Destroy();
}
catch (Exception e)
{
    err = true;
}

HTML to PDF conversion on UWP is not currently available.
In fact, I am not aware of any completely serverless, device-only HTML to PDF conversion for UWP.
At least not any solution that can take arbitrary HTML input and convert it to PDF. There may be solutions if your HTML is a very narrow and known subset of HTML/CSS/JS.
Instead, for general HTML to PDF conversion on mobile, you would use a server component.

Uploading a resume in a pdf file and displaying in asp.net

I would like to know how to upload a resume in a PDF file on an ASP.NET page. I know how to upload a simple text file whose fields are separated by ",". Here's my code.
using System.IO;
string uploadfile = Server.MapPath("~/uploads3/") + FileUpload1.FileName;
FileUpload1.PostedFile.SaveAs(uploadfile);
if (File.Exists(uploadfile))
{
    string inputline = "";
    using (StreamReader sr = File.OpenText(uploadfile))
    {
        while ((inputline = sr.ReadLine()) != null)
        {
            // Peel off one comma-separated field at a time
            string tempstr = inputline;
            string firstname = tempstr.Substring(0, tempstr.IndexOf(","));
            tempstr = tempstr.Substring(tempstr.IndexOf(",") + 1);
            string lastname = tempstr.Substring(0, tempstr.IndexOf(","));
            tempstr = tempstr.Substring(tempstr.IndexOf(",") + 1);
            (...)
Now, I have absolutely no idea how to do this with a PDF file containing a resume. How do I do that? Please explain your answers, I'm new to System.IO. Thanks again.

You will want to take a look at the open source iTextSharp library. It provides all the methods you will need for writing to a PDF. There are plenty of other PDF writing libraries that can do the same. As far as I know it isn't practical to do this using System.IO. You can still upload a CSV file, have the codebehind do the formatting and PDF creation, and then save it to the web server.

PDF is not an easy to read format. You will need a library to extract the needed information.
The iTextSharp library can work, but you will need to walk through the tree structure of the document.
A (sometimes) simpler alternative is to use the .NET port of PDFBox, as instructed in this article. PDFBox converts the PDF to a pure text representation that may be easier to parse. The downside of this approach is that the IKVM.NET library that PDFBox uses is huge, ~17MB.

Designing PDF component for easy access

I have looked at open source and commercial PDF components for .NET, I think almost every one available on the market. Strangely, none of them lets you check whether a document is protected through a property; every one reports it by throwing an exception instead. Is there a technical reason behind this?
I would expect
Component.Load("inputFile.pdf");
if (Component.isProtected)
{
    Component.Open("inputFile.pdf", password);
}
else
{
    Component.Open("inputFile.pdf");
}
instead of the following regular approach
try
{
    Component.Open("inputFile.pdf");
}
catch (Exception ex)
{
    // bad password
    // some other exception
}

All of this can be detected by checking for the respective "dictionaries", as described on pages 115 - 136 of the PDF spec: http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
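As a rough illustration of the dictionary check: an encrypted PDF carries an /Encrypt entry in its trailer dictionary (or in the cross-reference stream dictionary), and those dictionaries are stored unencrypted. The sketch below simply looks for that name token in the raw bytes. The class and method names are made up for this example, and this is a heuristic only: it does not parse the file, so an unencrypted PDF that happens to contain the text "/Encrypt" in an uncompressed stream would be misreported.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any PDF library.
public static class PdfEncryptionProbe
{
    // Heuristic: true when the raw bytes contain an "/Encrypt" name token.
    public static bool LooksEncrypted(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // cannot break the decoding and ASCII tokens stay searchable.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);
        return raw.IndexOf("/Encrypt", StringComparison.Ordinal) >= 0;
    }

    // Convenience overload that reads the file from disk first.
    public static bool LooksEncrypted(string path)
    {
        return LooksEncrypted(File.ReadAllBytes(path));
    }
}
```

With a check like this, the calling code can decide up front whether to ask for a password, and keep the try/catch only for a wrong password or a damaged file.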

This is possible with Aspose.Pdf for .NET, which is a commercial .NET component. It has a boolean property IsEncrypted for encrypted file detection. Sample code is given below.
// load the source PDF document
PdfFileInfo fileInfo = new PdfFileInfo(dataDir + "protected.pdf");
// determine that source PDF file is Encrypted with password
bool encrypted = fileInfo.IsEncrypted;
MessageBox.Show("Encrypted: " + encrypted);
I work for Aspose as a Developer Evangelist.

list of fonts for a pdf and .indd file

I have to build an application that writes the list of fonts used in a PDF and an .indd file to an Excel sheet. After a lot of research I concluded that this is not possible with C#. I came across the InDesign Navigator API, which can be integrated into the Visual Studio IDE. I know C# and JavaScript. Is there any way to build this so that it runs on both Mac and Windows? Thank you!

One way you could do this is by saving a text file with the font information out of InDesign and Acrobat. You could probably use ExtendScript to do this. The text file can then be imported easily into Excel as a CSV or text file (whitespace delimited).
You weren't very clear about what your intentions are, but here's an example of a JavaScript that pulls font information out of InDesign and saves a list of fonts for a document.
var doc = app.activeDocument;
var docFonts = doc.fonts.everyItem().getElements();
var fileContents = "";
// Collect one font name per line
for (var i = 0; i < docFonts.length; i++) {
    var font = docFonts[i];
    fileContents += font.name + "\n";
}
// Write the list next to the document as <name>_fonts.txt
var newFilePath = doc.filePath + "/" + doc.name.replace(/\.indd$/i, '') + "_fonts.txt";
var newFile = File(newFilePath);
newFile.open('w');
newFile.write(fileContents);
newFile.close();

Here is a possible approach...
It is possible to write out an XML representation of an InDesign file...
To generate IDML, choose File > Export and select the format InDesign Markup (IDML)...
The result is a zip file with all the information.
It contains a folder Resources, which holds Fonts.xml (Resources: Fonts.xml).
This can be parsed cross-platform because it is just XML...
Here you find a description of the anatomy of an IDML InDesign document...
http://www.indesignsecrets.com/downloads/Anatomy_of_IDML.pdf
Hope this helps...
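Reading Fonts.xml out of the IDML package might look roughly like the following in C#, using only the base class library. This is a sketch under assumptions: the class and method names are invented for this example, and it assumes each font is described by a Font element with a Name attribute, which should be checked against a real export. Elements are matched by local name so the sketch does not depend on how the package declares its namespaces.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any Adobe SDK.
public static class IdmlFonts
{
    // Returns the distinct font names found in Resources/Fonts.xml of an IDML package.
    public static List<string> ListFontNames(Stream idmlStream)
    {
        using (var zip = new ZipArchive(idmlStream, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("Resources/Fonts.xml");
            if (entry == null)
            {
                return new List<string>(); // no font resource in this package
            }
            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                return doc.Descendants()
                    .Where(e => e.Name.LocalName == "Font") // assumed element name
                    .Select(e => (string)e.Attribute("Name"))
                    .Where(n => !string.IsNullOrEmpty(n))
                    .Distinct()
                    .ToList();
            }
        }
    }
}
```

Writing the returned names one per line to a .csv file gives something Excel opens directly, and the same code should run on both Windows and Mac under .NET Core or later.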
