Copy content from a Word document to another with the style - c#

I want to copy the content of a section in a Word document to a new document.
I do this to copy :
var docPath = @"C:\temp\myDoc.docx";
var doc = word.Documents.Open(FileName: docPath, ReadOnly: true);
var emptyDoc = word.Documents.Add();
doc.Sections.First.Range.Copy();
emptyDoc.Sections.First.Range.Paste();
This works well to copy content, but the style is not the same. How can I copy the complete section and have it rendered exactly the same way in the new document?
If there is a better solution involving the OpenXML SDK instead of VSTO, I am open to it.

You will find it much easier to automate Word if you do things manually first. That way you can get a better understanding of the various options available etc. You can also record a macro which will often, though not always, provide the answer.
In this instance you need to automate selecting 'Keep Source Formatting' from the context toolbar that appears after pasting. The code you need for that is:
emptyDoc.Sections.First.Range.PasteAndFormat(WdRecoveryType.wdFormatOriginalFormatting);
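Combined with the code from the question, the whole copy might look like the sketch below. It assumes a reference to Microsoft.Office.Interop.Word and a local Word installation; the output path is a made-up placeholder, not something from the question.

```csharp
using Word = Microsoft.Office.Interop.Word;

class CopySectionSketch
{
    static void Main()
    {
        var word = new Word.Application();
        try
        {
            var doc = word.Documents.Open(FileName: @"C:\temp\myDoc.docx", ReadOnly: true);
            var emptyDoc = word.Documents.Add();

            // Copy the first section to the clipboard ...
            doc.Sections.First.Range.Copy();
            // ... and paste it with "Keep Source Formatting".
            emptyDoc.Sections.First.Range.PasteAndFormat(
                Word.WdRecoveryType.wdFormatOriginalFormatting);

            // Hypothetical output path, for illustration only.
            emptyDoc.SaveAs(FileName: @"C:\temp\myDocCopy.docx");

            ((Word._Document)emptyDoc).Close(SaveChanges: false);
            ((Word._Document)doc).Close(SaveChanges: false);
        }
        finally
        {
            // Quit Word even if something above throws.
            ((Word._Application)word).Quit();
        }
    }
}
```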

Related

How to insert/fetch a cover page in word document using Microsoft.Office.Interop.Word C#

I'm creating an MS Office template application (WinForms) to insert content into and evaluate a Word document.
I want to insert a cover page and later, after the cover page has been changed, evaluate it using Interop in C#. I searched a lot on the internet but didn't find anything suitable.
Can anyone please help me?
Thanks
If your Word template is the same each time (that is, the document already exists), you essentially have to:
Copy The Template
Work On The Template
Save In Desired Format
Delete Template Copy
For each of the sections you are replacing within your Word document, insert a bookmark at that location (the easiest way to put text into a specific area).
I always create a function to accomplish this, and I end up passing in the path - as well as all of the text to replace my in-document bookmarks. The function call can get long sometimes, but it works for me.
// using Microsoft.Office.Interop.Word;
Application app = new Application();
Document doc = app.Documents.Open("sDocumentCopyPath.docx");

// Replace each bookmark's range with the new text, if the bookmark exists.
if (doc.Bookmarks.Exists("bookmark_1"))
{
    object oBookMark = "bookmark_1";
    doc.Bookmarks.get_Item(ref oBookMark).Range.Text =
        "My Text To Replace bookmark_1";
}
if (doc.Bookmarks.Exists("bookmark_2"))
{
    object oBookMark = "bookmark_2";
    doc.Bookmarks.get_Item(ref oBookMark).Range.Text =
        "My Text To Replace bookmark_2";
}

// Save the result as a PDF, then close the document and quit Word.
doc.ExportAsFixedFormat("myNewPdf.pdf", WdExportFormat.wdExportFormatPDF);
((_Document)doc).Close();
((_Application)app).Quit();
The above code will get text insertion working for you. Is there a reason you have to re-evaluate the document afterwards, given that you can add checks before you attempt each insert (for example, whether the bookmark exists)?
If you need some more explanation I can help as well :) My example saves it as a .pdf, but you can use any format you prefer.
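The four steps (copy the template, work on the copy, save in the desired format, delete the copy) could be wrapped in a function along these lines. This is only a sketch: TemplateFiller, FillAndExport and the parameter names are hypothetical, and it assumes a reference to Microsoft.Office.Interop.Word plus absolute paths.

```csharp
using System.Collections.Generic;
using System.IO;
using Word = Microsoft.Office.Interop.Word;

static class TemplateFiller
{
    // Hypothetical helper: the name and parameters are illustrative only.
    public static void FillAndExport(string templatePath, string pdfPath,
                                     IDictionary<string, string> bookmarkValues)
    {
        // 1. Copy the template so the original is never modified.
        string workingCopy = Path.Combine(Path.GetTempPath(),
                                          Path.GetRandomFileName() + ".docx");
        File.Copy(templatePath, workingCopy);

        var app = new Word.Application();
        try
        {
            // 2. Work on the copy: fill every bookmark that exists.
            Word.Document doc = app.Documents.Open(workingCopy);
            foreach (var pair in bookmarkValues)
            {
                if (!doc.Bookmarks.Exists(pair.Key)) continue;
                object name = pair.Key;
                doc.Bookmarks.get_Item(ref name).Range.Text = pair.Value;
            }

            // 3. Save in the desired format (PDF here).
            doc.ExportAsFixedFormat(pdfPath, Word.WdExportFormat.wdExportFormatPDF);
            ((Word._Document)doc).Close(SaveChanges: false);
        }
        finally
        {
            ((Word._Application)app).Quit();
            // 4. Delete the working copy.
            File.Delete(workingCopy);
        }
    }
}
```

A call would pass the template path, the PDF path, and a dictionary such as { "bookmark_1", "My Text To Replace bookmark_1" }, which keeps the call short however many bookmarks there are.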

How to parse text from MS Word document to string

I am trying to find a way to read a Word document's text into a string in my project. I have more than 600 Word (.doc) files, and for each one I need to get the text content (with the new lines and tabs if possible) and assign it to a string.
I've been reading stuff about the Open XML SDK but it looks quite complicated for something that looks so simple.
Open XML SDK is only for 2007 and newer formats and it is not trivial to use.
If performance is not an issue you could use Word Automation and have Word do this for you.
It will look something like this:
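// using System.Runtime.InteropServices; using Microsoft.Office.Interop.Word;
var app = new Application();
var doc = app.Documents.Open(documentLocation, ReadOnly: true);
string rangeText = doc.Range().Text;
((_Document)doc).Close(SaveChanges: false); // nothing was modified, so do not save
((_Application)app).Quit();                 // otherwise the Word process keeps running
Marshal.ReleaseComObject(doc);
Marshal.ReleaseComObject(app);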
Take a look at http://www.codeproject.com/Articles/18703/Word-2007-Automation or http://www.codeproject.com/Articles/21247/Word-Automation for more complete examples and instructions. Note that this may become a bit more tricky if your documents are more complex (footnotes, text boxes, tables...).
Another option is to have Word save the document as text and then read the text file. Take a look at this - http://msdn.microsoft.com/en-us/library/microsoft.office.tools.word.document.saveas(v=vs.80).aspx
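For several hundred files it is probably worth starting Word once and reusing it for every document. A rough sketch under that assumption follows; WordTextReader and ReadAll are hypothetical names, and error handling is kept to the minimum needed to shut Word down again.

```csharp
using System.Collections.Generic;
using System.IO;
using Word = Microsoft.Office.Interop.Word;

static class WordTextReader
{
    // Hypothetical helper: maps each file path to that document's text.
    public static Dictionary<string, string> ReadAll(string folder)
    {
        var texts = new Dictionary<string, string>();
        var app = new Word.Application();
        try
        {
            // Word expects absolute paths, so resolve the folder first.
            foreach (string path in Directory.GetFiles(Path.GetFullPath(folder), "*.doc"))
            {
                Word.Document doc = app.Documents.Open(FileName: path, ReadOnly: true);
                try
                {
                    // Paragraph marks come back as '\r' and tabs as '\t'.
                    texts[path] = doc.Content.Text;
                }
                finally
                {
                    ((Word._Document)doc).Close(SaveChanges: false);
                }
            }
        }
        finally
        {
            // One Quit for the single shared Word instance.
            ((Word._Application)app).Quit();
        }
        return texts;
    }
}
```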
You could give a look at NPOI:
This project is the .NET version of POI Java project at
http://poi.apache.org/. POI is an open source project which can help
you read/write xls, doc, ppt files. It has a wide application.
Take a look at this previous SO thread for more information.

Prepare doc and docx Files for Lucene indexing

I wanted to ask if there is a quick way of getting the content of a document into a single document field. All the examples I have seen use relatively short strings. I cannot save an entire journal article into a string and index that. Is there a quick way of telling Lucene to index all the words in a file? I am using Lucene.Net 3.0.3 for this application.
There is no easy way to pass just the file; you have to provide the entire content to Lucene for it to be indexed for searching. Here is an answer from a Q&A about indexing PDFs, but it is the same for every type of document: just open it and pass its content to Lucene.
You can just pass a System.IO.TextReader to a Field. If the file is plain text, or something like it, you should just be able to open the Reader on it, and pass it directly into the Field, like:
System.IO.TextReader reader = new StreamReader("path/to/my/file.txt");
Field field = new Field("fieldName", reader);
document.Add(field);
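In context, the reader-based field might be used as in the sketch below, written against Lucene.Net 3.0.3. FileIndexer, IndexTextFile and the field names "path" and "content" are assumptions for illustration. The input is assumed to be plain text already, so a .doc or .docx file would need its text extracted first. MaxFieldLength.UNLIMITED is passed so that a long article is not cut off at the writer's term limit.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

static class FileIndexer
{
    // Hypothetical helper; "path" and "content" are illustrative field names.
    public static void IndexTextFile(string indexDir, string textFile)
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        using (var dir = FSDirectory.Open(new DirectoryInfo(indexDir)))
        using (var writer = new IndexWriter(dir, analyzer,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        using (TextReader reader = new StreamReader(textFile))
        {
            var document = new Document();
            // Stored, untokenized field so a search hit can be traced to its file.
            document.Add(new Field("path", textFile,
                                   Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Reader-valued field: tokenized and indexed, but not stored.
            document.Add(new Field("content", reader));
            // The reader is consumed here, so it must still be open at this point.
            writer.AddDocument(document);
        }
    }
}
```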

How to replace text in a PDF with C#?

I saw a lot of solutions in here but none are clear or good answers.
Here is my simple question, hoping with a straight answer.
I have a PDF file (a template) which is created having text something like this:
{FIRSTNAME} {LASTNAME} {ADDRESS} {PHONENUMBER}
Is it possible to have C# code that replaces these placeholders with text of my choice?
No fields, no other complex stuff.
Is there any open source library that would help me achieve that?
This thread is dead, however I'm posting my solution for other lost souls that might face this problem in the future. Unfortunately my company doesn't allow posting code online so I'll describe the solution :).
So basically what you have to do is use PdfSharp and modify this sample to replace text in stream, but you must take into account that text may be split into many parentheses (convert stream to string to see what the format is).
Then, with code similar to this sample traverse through source pdf page by page and modify current page by searching for PdfContent items inside PdfReference items and replacing text in content's stream.
The 'problem' with PDF documents is that they are inherently not suitable for editing. Especially ones without fields. The best thing is to step back and look at your process and see if there is a way to replace the text before the PDF was generated. Obviously, you may not always have this freedom.
If you are able to replace text, you should be aware that there will be no automatic reflow of the text following the replaced text. Provided you are fine with that, there are still very few solutions that allow you to replace text.
I know that you are looking for an OpenSource solution so I feel reluctant to offer you a commercial solution. We offer one called PDFKit.NET. It allows you to extract all content on a page as so-called shapes (text, images, curves, etc.). See method Page.CreateShapes in the type reference. You can then programmatically navigate and edit this structure of shapes and then write it back to a PDF again.
Here it is:
http://www.tallcomponents.com/pdfkit
Disclosure: I am the founder of TallComponents, vendor of this component
For simple text replacement, use the iTextSharp library.
The code that replaces one string with another is below.
Note that this will replace only simple text and may not work in all cases.
// using System.IO; using iTextSharp.text.pdf;
void VerySimpleReplaceText(string OrigFile, string ResultFile, string origText, string replaceText)
{
    using (PdfReader reader = new PdfReader(OrigFile))
    {
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // Decode the page's content stream, replace the text, and write it back.
            byte[] contentBytes = reader.GetPageContent(i);
            string contentString = PdfEncodings.ConvertToString(contentBytes, PdfObject.TEXT_PDFDOCENCODING);
            contentString = contentString.Replace(origText, replaceText);
            reader.SetPageContent(i, PdfEncodings.ConvertToBytes(contentString, PdfObject.TEXT_PDFDOCENCODING));
        }
        // Closing the stamper writes the modified pages to ResultFile.
        new PdfStamper(reader, new FileStream(ResultFile, FileMode.Create, FileAccess.Write)).Close();
    }
}
As stated in a similar thread, this is not really possible in an easy way. The easier way seems to be to start from a DocX file, use the DocX library, which allows easy word swapping, and then convert the DocX to PDF (using the PDF Creator printer or similar).
Or use PDFsharp/MigraDoc to create new documents.
Updating a PDF is hard and dirty. So maybe adding content on top of the existing content will work for you as well, as it worked for me. If so, here's my primitive but working solution covering a lot of cases ("covering", indeed):
https://github.com/astef/PatchPdfText

C# WPF Open File and edit certain text

So let's say I have a program with just a text box and an okay button. The user types in whatever word he wants, and when he clicks ok, it opens a specific file called Test.doc and CTRL+F for the word "test" and replaces it with whatever the user entered into the text box. How can I open said file and replace instances of the word test with the user's defined word?
Ignoring the format of the document, you could literally use the following for any type of file:
var contents = System.IO.File.ReadAllText(@"C:\myDoc.doc");
contents = contents.Replace("Test", "Tested");
System.IO.File.WriteAllText(@"C:\myDoc.doc", contents);
The best way would be to use the ms office interop library though.
Andrew
A number of things:
I'd recommend using a FileDialog to get the file's location. This lets you select the file to edit, but also gives you functionality to only show the file types that you want to handle in this program.
If you're handling .doc's, I'd suggest you look into VSTO and opening word docs. Here's a guide I found after a quick search. I'd suggest using it as a place to start, but you'll need to look around for more specifics.
Lastly, the string.Replace("", ""); method is probably very helpful in the CTRL-F functionality. You should be able to extract a string of the text from whatever document you're analyzing and use that method.
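If you go the Interop route, Word's own find-and-replace can do the CTRL+F part without extracting the text first. The following is a minimal sketch, assuming a reference to Microsoft.Office.Interop.Word; WordReplacer and ReplaceAll are hypothetical names. As written it only searches the main document body, not headers or footers.

```csharp
using Word = Microsoft.Office.Interop.Word;

static class WordReplacer
{
    // Hypothetical helper: replaces every occurrence of findText in the body.
    public static void ReplaceAll(string docPath, string findText, string replacement)
    {
        var app = new Word.Application();
        try
        {
            Word.Document doc = app.Documents.Open(docPath);

            Word.Find find = doc.Content.Find;
            // Clear formatting left over from any earlier search.
            find.ClearFormatting();
            find.Replacement.ClearFormatting();
            find.Execute(FindText: findText, ReplaceWith: replacement,
                         Replace: Word.WdReplace.wdReplaceAll);

            doc.Save();
            ((Word._Document)doc).Close();
        }
        finally
        {
            ((Word._Application)app).Quit();
        }
    }
}
```

From the OK button's click handler this could be called with the file path, the word "test", and the text box's current text.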
