Replacing contents inside docx and pdf file using asp.net c#

In my application I am using some templates in docx and pdf format. I am storing these documents in the DB as bytes.
Before showing or sending these documents back to the user or application, I need to replace some content inside them. For example, if the document contains ##username##, I need to replace it with the actual username of the customer. I have not found a proper solution for this. Any good ideas?

For the docx file, your best bet is to use OpenXML and, instead of having special text like ##username##, use a content control that you can fill in.
Since you specified docx, you can use the OpenXML API, which works well. If it has to work with older doc files, then you'll have to automate Word (which should be avoided if at all possible).
For the PDF, your best bet is to create a PDF form and fill it in at runtime (using a tool like iTextSharp).
HTH,
Brian
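
The content-control suggestion above can be sketched without any third-party dependency, because a docx is a ZIP package whose main part is word/document.xml. This is only a minimal sketch under assumptions: the names ContentControlFiller and Fill are made up for illustration, the template is assumed to contain a plain-text content control whose Tag is set to the key (for example "username"), and only the main document part is handled, not headers or footers. It needs .NET 4.5 or later for ZipArchive. The Open XML SDK offers a typed API for the same job.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class ContentControlFiller
{
    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns a copy of the docx in which every content control tagged
    // with `tag` shows `value`. The input array is not modified.
    public static byte[] Fill(byte[] docx, string tag, string value)
    {
        using (var buffer = new MemoryStream())
        {
            buffer.Write(docx, 0, docx.Length);
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Update, leaveOpen: true))
            {
                ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
                if (entry == null)
                    throw new InvalidDataException("word/document.xml not found");

                using (Stream part = entry.Open())
                {
                    XDocument xml = XDocument.Load(part, LoadOptions.PreserveWhitespace);

                    foreach (XElement sdt in xml.Descendants(W + "sdt").ToList())
                    {
                        // <w:sdtPr><w:tag w:val="..."/></w:sdtPr> carries the Tag.
                        string sdtTag = (string)sdt.Element(W + "sdtPr")
                            ?.Element(W + "tag")?.Attribute(W + "val");
                        if (sdtTag != tag) continue;

                        List<XElement> texts = sdt.Element(W + "sdtContent")
                            ?.Descendants(W + "t").ToList();
                        if (texts == null || texts.Count == 0) continue;

                        // Put the value in the first text element, blank the rest.
                        texts[0].Value = value;
                        foreach (XElement extra in texts.Skip(1))
                            extra.Value = string.Empty;
                    }

                    // Overwrite the part with the modified XML.
                    part.SetLength(0);
                    xml.Save(part, SaveOptions.DisableFormatting);
                }
            }
            // The archive is rewritten into `buffer` when `zip` is disposed.
            return buffer.ToArray();
        }
    }
}
```

With the template bytes loaded from the DB, a call would look like ContentControlFiller.Fill(templateBytes, "username", customerName), and the returned array can be sent to the user as-is.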

For DOC / DOCX:
You can use the MS Word object model through the MS Word assembly reference (this works only on machines that have MS Office installed; otherwise you can use something like the Aspose.Words libraries, which do not need an MS Office installation on the server). You can programmatically trigger Word's Find and Replace through the library's API.
For PDF: You will need a third-party library for editing PDF files. Third-party libraries like ABCpdf are available (I am not sure whether Adobe itself has something for this).
The mechanism is the same as with the Word library, but I am not sure whether you can trigger a find-and-replace here or have to do something else. I have not used a PDF generation library.

Related

Convert Word to PDF in c# with free libs

I know some posts about this issue exist, but none covers my simple problem. I need to print my DOCX or XLSX files to PDF. First of all, DOCX.
I don't want to use Word Interop because it requires Word on the server machine.
I tried to use PDFCreator, but I can't pass my docx to PDFCreator, or at least I haven't been able to.
UPDATE
Using free libs and without Word Interop

I used NuGet to install something called FreeSpire.Doc, which claims to convert the first 500 paragraphs for free. It seems to be working so far.

File -> Export -> Create PDF/XPS-document

Is it possible to generate .docx files without having MS Word installed?

I want to use "OLE automation" (or whatever it's called now) to generate a Word document.
I assume that it's possible to perform the following programmatically:
Set page size (height, width, margin vals)
Set font type/name, style, and size
Add page numbering
Add pages
Insert page breaks
What I'm not sure of is if I need to have MS Word on my system to do this (to have the necessary DLLs, perhaps)? I use Open Office (I like it, and it's free), but I reckon controlling the creation of docs programmatically is probably easier/better documented for MS Word than it is for Open Office and/or Libre Office - that's why I'm strongly considering making this "rendezvous with Redmond."
This question is tangentially related to this one
If Google Docs is a possibility here, I'd be willing to have a "meeting with Mountain View" but I know nothing about that file format or whether it can be "automated" etc.
I need to end up with something that I can either convert to a PDF file or a DOCX file. Open Office can open DOCX and convert files to PDF, but I don't know about Google Docs.

I've found https://docx.codeplex.com/ to be very useful in dynamically building docx documents.

Yes, it is possible. Check this link: http://www.microsoft.com/en-us/download/details.aspx?id=30425
This is a library for Open XML documents (*.docx, *.xlsx, and PowerPoint files).

Yes, you can use Open XML. With Open XML you can also create Excel and PowerPoint files (PDF is not an Open XML format, so that needs a separate library).
Check this out

You can use this library to generate document by template:
https://github.com/StasClick/DocumentGenerator
'DocumentGenerator' can generate one leaflet, multiple leaflets in one document or registers.

How to add dynamic text to PDF toolbar using ITEXTSHARP

Hi All,
I am creating a PDF document using iTextSharp. I need to add some content to the PDF toolbar while creating the PDF document. How can I achieve this using C#? Please see the attached image for reference.
Thanks in advance.

iTextSharp is used to generate PDF files, not to modify the PDF viewer. If you need to modify toolbars and the like in Adobe Reader, that is definitely not something you can achieve with iTextSharp.

eh...
OK, so here is how to do it.
Make a template in Word.
Example of the Word template:
Name <FirstName>
Surname <LastName>
Job <JobType>
Salary <Salary>
When generating:
Open Word and replace <FirstName> and the other marks
Then make the PDF (with PDFCreator, for example)
Edit:
Okay, I'll show you the scheme; no ready code because I'm a little busy.
1) Create a Word template and store it in a safe place.
2) Copy the template to a temp folder.
3) Open it programmatically in C# and replace the marks (for example "<FirstName>") with your data, e.g. .Replace("<FirstName>", "Voon")
4) Programmatically print to PDF and save it.

Only a plugin can modify the acrobat/reader toolbar. There might be C# bindings for the acrobat API these days, but I wouldn't count on it.
PS: You can make Acrobat plugins for free. To "Reader Enable" a plugin requires Adobe's direct intervention, and $$$. They sign a version of the plugin, and only that signed version will run in Reader.
Your best bet is to go looking for some third-party PDF viewer. I still wouldn't count on this feature being available, but it's better odds than "0".

What is the best way to populate a Word 2007 template in C#?

I need to populate a Word 2007 document from code, including repeating table sections. Currently I use an XML transform on the document.xml portion of the docx, but this is extremely time consuming to set up (each time you edit the template document, you have to recreate the transform.xsl file, which can take up to a day for complex documents).
Is there any better way, preferably one that doesn't require you to run Word 2007 during the process?
Regards
Richard

I tried myself to write some code for that purpose, but gave up. Now I use a 3rd party product: Aspose Words and am quite happy with that component.
It doesn't need Microsoft Word on the machine.
"Aspose.Words enables .NET and Java applications to read, modify and write Word® documents without utilizing Microsoft Word®."
"Aspose.Words supports a wide array of features including document creation, content and formatting manipulation, powerful mail merge abilities, comprehensive support of DOC, OOXML, RTF, WordprocessingML, HTML, OpenDocument and PDF formats. Aspose.Words is truly the most affordable, fastest and feature rich Word component on the market."
DISCLAIMER: I am not affiliated with that company.

Since a DOCX file is simply a ZIP file containing a folder structure with images and XML files, you should be able to manipulate those XML files using our favorite XML manipulation API. The specification of the format is known as WordprocessingML, part of the Office Open XML standard.
I thought I'd mention it in case the 3rd party tool suggested by splattne is not an option.
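
To illustrate the ZIP-plus-XML approach described above, here is a minimal sketch that replaces literal tokens such as ##username## inside word/document.xml using only System.IO.Compression and LINQ to XML (.NET 4.5 or later). The names DocxPlaceholders and Replace are hypothetical, and the token convention is an assumption. One caveat: Word can split what looks like a single word across several runs, so a token is only found when it sits entirely inside one w:t element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxPlaceholders
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns a copy of the docx with every key of `values` replaced by
    // its value inside the text elements of the main document part.
    public static byte[] Replace(byte[] docx, IDictionary<string, string> values)
    {
        using (var buffer = new MemoryStream())
        {
            buffer.Write(docx, 0, docx.Length);
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Update, leaveOpen: true))
            {
                ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
                if (entry == null)
                    throw new InvalidDataException("word/document.xml not found");

                using (Stream part = entry.Open())
                {
                    XDocument xml = XDocument.Load(part, LoadOptions.PreserveWhitespace);

                    // Setting XElement.Value lets LINQ to XML escape
                    // characters such as & and < in the replacement text.
                    foreach (XElement text in xml.Descendants(W + "t").ToList())
                    {
                        string current = text.Value;
                        foreach (KeyValuePair<string, string> pair in values)
                            current = current.Replace(pair.Key, pair.Value);
                        if (current != text.Value)
                            text.Value = current;
                    }

                    // Overwrite the part with the modified XML.
                    part.SetLength(0);
                    xml.Save(part, SaveOptions.DisableFormatting);
                }
            }
            // The archive is rewritten into `buffer` when `zip` is disposed.
            return buffer.ToArray();
        }
    }
}
```

Working on the parsed XML rather than on the raw string keeps the markup well-formed even when a replacement value contains characters that would need escaping.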

Have you considered using the Open XML SDK from Microsoft? The only dependency is on .NET 3.5.
Documentation: http://msdn.microsoft.com/en-us/library/bb448854%28office.14%29.aspx
Download: http://www.microsoft.com/downloads/details.aspx?familyid=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en

Use the Invoke docx lib. It supports table data (http://invoke.co.nz/products/help/docx_tables.aspx). More info at http://invoke.co.nz/products/docx.aspx

Have you considered using VB? You could create a separate assembly to populate your document.
I know you are looking for a C# solution, but XML literal support is one area where VB could help you populate the document. Create a document in Word to serve as a template, unzip the docx, paste the relevant XML section you want to change into your VB code, and add code to fill in the parts you wish to change. It's difficult to say from your description whether this would meet your requirements, but I would suggest looking into it.

Parsing Office Documents

I'd like to be able to read the content of Office documents (for a custom crawler).
The Office versions that need to be readable are 2000 to 2007. I mainly want to crawl Word, Excel, and PowerPoint documents.
I don't want to retrieve the formatting, only the text.
The crawler is based on Lucene.NET, if that is of any help, and is written in C#.
I already use iTextSharp for parsing PDFs.

If you're already using Lucene.NET you might just want to take advantage of the various IFilters already available for doing this. Take a look at the open source SeekAFile project. It will show you how to use an IFilter to open and extract this information from any file type where an IFilter is available. There are IFilters for Word, Excel, PowerPoint, PDF, and most of the other common document types.

There is an excellent open source project, POI; the only drawback is that it is written for Java.
The .NET port is still very much in beta.

Here is a good list of various tools for converting Word documents to plaintext, which you can then do whatever with.

Here's a nice little post on c-sharpcorner by Krishnan LN that gives basic code to grab the text from a Word document using the Word Primary Interop Assemblies.
Basically, you get the "WholeStory" property out of the Word document, paste it to the clipboard, then pull it from the clipboard while converting it to text format. The clipboard step is presumably done to strip out formatting.
For PowerPoint, you do a similar thing, but you need to loop through the slides, then for each slide loop through the shapes, and grab the "TextFrame.TextRange.Text" property in each shape.
For Excel, since Excel can be an OleDb data source, it's easiest to use ADO.NET. Here's a good post by Laurent Bugnion that walks through this technique.
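
For the Word 2007 format specifically, text can also be pulled without Interop, because a .docx is a ZIP package with the body text in word/document.xml. The following is a minimal sketch under assumptions: the names DocxText and Extract are hypothetical, it covers only .docx (not the older binary .doc, .xls, or .ppt files), and it ignores headers, footers, tabs, and line breaks. It needs .NET 4.5 or later.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxText
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of the main document part, one line per
    // paragraph. The stream must be readable and is left open.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");

            using (Stream part = entry.Open())
            {
                XDocument xml = XDocument.Load(part, LoadOptions.PreserveWhitespace);

                // A paragraph (w:p) holds runs whose text lives in w:t
                // elements; concatenating them drops all formatting.
                var paragraphs = xml.Descendants(W + "p")
                    .Select(p => string.Concat(
                        p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

The returned string can then be handed to the indexer like any other plain text.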

You might also consider checking out DtSearch (www.DtSearch.com). Although it is primarily a searching tool, it does a great job of extracting text from a large number of file types and is considerably cheaper than other options like the Oracle/Stellent OutsideIn technology or the equivalent from Autonomy.
I've been using DtSearch for years and find it indispensable for this type of task.
