Editing a Word .docx file from C# application

Editing a Word .docx file from C# application - c#

I'm working on a C# project and I need to open a word doc and do a search/replace on it and save the result for later editing within Word itself.
This is to be a stand alone application and not a Word plugin.
Is there any simple code to get me started?
I've searched and not found anything helpful.
EDIT:
Looks like the nuget package DocX will do what I need.
http://docx.codeplex.com/
http://nuget.org/packages/DocX

If you save the doc as a .xml initially from within word you could open it as plain markup (as opposed to a binary) and do a (very rough) search and replace of the raw doc, you'll have to make sure you didn't mangle any tags containing the target words, but it would work.
You'll preserve all formatting and will be able to open/redistribute it as normal in word, the .xml is basically just an uncompressed .docx .
Edit: Giving this is a possible easy solution, not necessarily saying it's the best idea.

with Open XML you can open and manipulate a word document.

Related

Is it at all possible to convert a document to PDF or edit a PDF in C# using only free software?

I had this stupid idea of creating a template as a .docx or .rtf or .pdf and then replacing the text in that document to generate reports. This seemed like a better way of doing it than using paid reporting software.
Well, I believe I've tried just about everything now and I'm amazed at how impossible it is to do anything with pdfs.
Try 1
HTML -> PDF
A lot harder to design the template. It doesn't look the same when you print it. Never got it working outside of a command line example (not sure how well, say, iTextSharp-LGPL would even work or if it could handle base64 strings as I'm not sure how else you are going to tell it about images). In any case, doing it this way makes it too hard to design the template.
Try 2
OpenXml -> PDF
I stupidly assumed that because Word could save as PDF that OpenXml could to. I was wrong. It cannot save as a PDF.
Try 3
OpenOffice/LibreOffice (docX -> PDF)
It can't read OpenXml which is a problem because I was editing the template as OpenXml and then saving that result (as a .docx) but it can't read that saved document.
Try 4
iTextSharp LGPL
This one just doesn't work, lol. And apparently even though when you google "convert rtf to pdf" the ONLY thing that comes up is iText and its derivatives it doesn't convert rtf documents to pdf documents. I verified this myself (it only saves the text not the formatting) and later found this post to convince me I wasn't doing something wrong.
Try 5
PDF -> PDF
Since converting ANYTHING to a PDF seems to be impossible maybe I can save the template as a PDF and just do a text replace on that. Nope, lol, that is apparently a very difficult thing to do.
Try 6
Pandoc (.odt/.docx -> pdf), (.rtf -> .pdf not supported)
pandoc mockup2.odt -s -o mockup2.pdf
link to the files in the picture. *note, it messes up in the same way if you try converting .odt/.docx to .tex.
What do I do here? Buy software so that I can save a file as PDF? Is that the only option?

I have a solution. I'm not saying it's the best solution. LibreOffice (or possibly OpenOffice if you are so inclined) accepts command line arguments that will do the switch.
soffice.exe --headless --convert-to pdf mockup.odt
*note - this is after I added libreoffice to my path (C:\Program Files\LibreOffice\program). idk why it's called soffice.exe instead of libreoffice.exe.
where i found the answer
relevant documentation

I might have a working solution for you, if you are stuck with the docx-file for the template.
I found one free solution for docx to pdf conversions, without using microsoft.interop, etc.: See first answer in this stack overflow post
It uses two tools: The open xml power tools and DinkToPdf (Which is essentially a wkhtmltopdf wrapper). The html to pdf part works just fine, but the docx to html part looks like a catastrophe at first. You can fix this with custom css (There are some resources online).
Powertools-.NetStandard
DinkToPdf-GitHub
There are more possibilities for proprietary software, like Asposes.Words and Syncfusion file-formats. Most of the proprietary solutions are pretty expensive...
If you are just working on a Windows Environment, where MS-Office is installed, you can use Microsoft.Interop. It is by far the easiest solution (In this post, Interop is mentioned several times Stackoverflow Word to PDF
If you found another (better) working solution, please let me know. I still have not decided if I will use a proprietary or a free solution. :-)

Change Document template path in multiple Word Templates

I have about 500 word documents that I need to change the server name in the Document template path. I am not an expert with VBA but I have tried several solutions that have not worked for me. Is there a way to do this (perhaps with C#, with a foreach loop on the directory?) that I can do a very simple find and replace on this field ?
i.e.
\\ASDCFS\NtierFiles\...
becomes
\\NewServer\NtierFiles\...

You can't write to the field in the dialog box directly. In the object model the equivalent is Document.AttachedTemplate and yes, you can work with that. Using in the object model (whether using VBA or C#) you'd loop the documents in a folder, open each in Word, assign the correct path, save and close.
More efficient and less prone to "hiccups" if the original template path is already invalid, would be to edit the documents' Word Open XML directly, without using the Word application. The Open XML SDK would be a good tool for this. It provides the AttachedTemplate class (https://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessing.attachedtemplate(v=office.14).aspx).

You can use WTC to correct the template path in a bulk of documents. You find the source code and binary on Github: https://github.com/NeosIT/wtc

How to convert a word document to a text file in c# without using microsoft.office.interop?

I have plenty of different versions of word documents which have to be converted to text files.
I hope this link brings you right way
How to extract text from Word files using C#?
I want to read the content of the word document and remove all the formats(just have words in text files). I have done by using microsoft.office.interop(here, always instantiate a Word on the client) which is not recommended. So I am trying to create a c# project which should convert word to text automatically. Can anyone suggest me any 3rd party tool which should be efficient open source or reasonable price for all the versions of word to text file conversion in c#?
With Regards,
Shanthini

Finally I found solution which perfectly works for me at the moment. I haven't test with 10000 documents. Here you go., http://sourceforge.net/projects/word-reader/?source=dlp
Comments and suggestions are expecting about this solution...
Thank you,
Shanthini

Replacing contents inside docx and pdf file using asp.net c#

In my application I am using some templates in docx and pdf format. I am storing this docs to DB as Bytes.
Befor showing/sending this docs back to user or application I need to replace some contents inside the doc. eg:if the doc contain ##username## I need to replace this with the exact username of the customer. I am not getting a proper solution for this. Any good ideas?

For the docx file, your best bet is to use OpenXML, and instead of having special text like ##username##, replace it with a content control that you can fill in.
Since you specified docx, you can use OpenXML, which is great, it's an API. If it has to work with older doc files, then you'll have to automate Word (which should be avoided if at all possible).
For the PDF, your best bet is to create a PDF form, and fill it in a runtime (using a tool like itextsharp).
HTH,
Brian

For DOC / DOCX:
You should use the MSWord object model through MSWord assembly reference (will work only on machines which have msoffice installed.. or else you can use something like ASPOSE word libraries which wont need msoffice installation on server). You can programmatically trigger the Find-Replace context of word through the library's API.
For PDF: You will need a third party library for editing pdf files.. 3rd party libraries like ABCpdf are available.. (not sure whether Adobe itself has something for this)
The same mechanism like for word library.. but I am not sure whether you will be able to trigger the Find-replace context here or do something else... I have not used a pdf generation library.

Databinding in existing XPS document

I have an existing XPS file that I would like to use as a template and possibly bind data to it. I have tried several methods, but cannot seem to get it to work.
Does anyone have any experience altering an existing XPS file to add data at runtime and then print or save?
Any help is appreciated.

XPS documents conform to the Open XML standard. There is an SDK for working with these docs. Here is a How-to article by Beth Massi: "Accessing Open XML Document Parts with the Open XML SDK".
Since you are working with the internal doc structure you might also check out 'Open XML Package Editor" which lets you explore the doc with Visual Studio. Here is another How-to by Beth Massi: "Handy Visual Studio Add-In to View Office 2007 Files".
+tom

it's a bit of a challenge to do this with XPS, but it is possible.
You can do this with our NiXPS SDK.
I've posted an example on my blog a while ago:
XPS variable data example
Regards,
Nick

Bindings are evaluated during the process of writing to an XPS document. So you can't set up a {Binding} in a FixedDocument, Write that FD to an XpsDocument, and expect to get that original FD back again when you next open that saved doc.
Also, the standard XpsWriter does convert everything into Glyphs on canvases, so you can't, say, a textbox in the original and expect to be able to find it after its been saved to a document.
I've never used the NiXPS libraries, so if Nick says it can be done you might want to check it out.
One last possibility--You can create placeholders in a form that you will be able to find later. They'd have to be text (something like [[{{FORMFIELDHERELOL}}]]) with some kind of delimiter scheme to differentiate the text from everything else. You could then go spelunking in the XML looking for text that fits the delimeter pattern and switch out those glyphs for your binding text. Of course, the issue with THAT is that if you aren't putting X chars in place of X chars you might find you have to do some repositioning. As its all glyphs on canvas this might be slightly harder than, say, threading a needle with a shoelace.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.