I wrote a utility for another team that recursively goes through folders and converts the Word documents it finds to PDF using Word Interop with C#.
The problem we're having is that the documents were created with date fields that update to today's date before they get saved out. I found a method to disable updating fields before printing, but I need to prevent the fields from updating on open.
Is that possible? I'd like to do the fix in C#, but if I have to do a Word macro, I can.
As described in Microsoft's endless maze of documentation, you can lock the field code. For example, in VBA, if I have a single date field in the body with the field code
{DATE \# "M/d/yyyy h:mm:ss am/pm" \* MERGEFORMAT }
I can run
ActiveDocument.Fields(1).Locked = True
Then if I make a change to the document, save, then re-open, the field code will not update.
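Example using C# Office Interop:
using Word = Microsoft.Office.Interop.Word;
Word.Application wordApp = new Word.Application();
// A new Application instance has no ActiveDocument, so open the file explicitly (inputFilePath is a placeholder).
Word.Document wordDoc = wordApp.Documents.Open(inputFilePath);
wordDoc.Fields.Locked = 1; // Fields.Locked is an int in the interop assembly rather than a bool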
You can place the code in the DocumentOpen event. I'm assuming you have an add-in which subscribes to the event. If not, clarify, as that can be a battle on its own.
EDIT: In my testing, locking fields this way locks them across all StoryRanges, so there is no need to get the field instances in headers, footers, footnotes, text boxes, etc. This is a surprising treat.
Well, I didn't find a way to do it with Interop, but my company did buy Aspose.Words and I wrote a utility to convert the Word docs to TIFF images. The Aspose tool won't update fields unless you explicitly tell it to. Here's a sample of the code I used with Aspose. Keep in mind, I had a requirement to convert the Word docs to single page TIFF images and I hard-coded many of the options because it was just a utility for myself on this project.
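using System;
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

private static bool ConvertWordToTiff(string inputFilePath, string outputFilePath)
{
    try
    {
        Document doc = new Document(inputFilePath);
        string directory = Path.GetDirectoryName(outputFilePath) ?? string.Empty;
        string baseName = Path.GetFileNameWithoutExtension(outputFilePath);
        string extension = Path.GetExtension(outputFilePath);

        for (int i = 0; i < doc.PageCount; i++)
        {
            // One single-page, LZW-compressed, 200 dpi black-and-white TIFF per page.
            ImageSaveOptions options = new ImageSaveOptions(SaveFormat.Tiff);
            options.PageIndex = i;
            options.PageCount = 1;
            options.TiffCompression = TiffCompression.Lzw;
            options.Resolution = 200;
            options.ImageColorMode = ImageColorMode.BlackAndWhite;

            // e.g. out.tif -> out-001.tif. Building the name from its parts avoids
            // String.Replace matching the extension text elsewhere in the path
            // (or throwing when the path has no extension).
            var pageNum = String.Format("-{0:000}", i + 1);
            var outputPageFilePath = Path.Combine(directory, baseName + pageNum + extension);
            doc.Save(outputPageFilePath, options);
        }
        return true;
    }
    catch (Exception ex)
    {
        LogError(ex); // my own logging helper
        return false;
    }
}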
I think a new question on SO is appropriate then, because this will require XML processing rather than just Office Interop. If you have both .doc and .docx file types to convert, you might require two separate solutions: one for WordML (Word 2003 XML format), and another for OpenXML (Word 2007/2010/2013 XML format), since you cannot open the old file format and save as the new without the fields updating.
Inspecting the OOXML of a locked field shows the w:fldLock="1" attribute. It can be inserted by appropriate XML processing against the document, such as through the Open XML SDK or a standard XSLT transform.
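A minimal sketch of that XML processing for .docx files, using only System.Xml.Linq and System.IO.Compression rather than the Open XML SDK. It assumes that setting w:fldLock="1" on every w:fldSimple element and on the "begin" w:fldChar of every complex field is enough for Word to treat those fields as locked; FieldLocker and its methods are hypothetical names, and .doc files would still need a separate route. Try it on a copy first and confirm in Word that the dates no longer update.

```csharp
using System;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class FieldLocker
{
    // WordprocessingML main namespace (ECMA-376 / OOXML).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Adds w:fldLock="1" to every simple field (w:fldSimple) and to the "begin"
    // w:fldChar of every complex field in one XML part. Returns how many were locked.
    public static int LockFields(XDocument part)
    {
        var fields = part.Descendants(W + "fldChar")
            .Where(fc => (string)fc.Attribute(W + "fldCharType") == "begin")
            .Concat(part.Descendants(W + "fldSimple"))
            .ToList(); // materialize before modifying the tree

        foreach (var field in fields)
        {
            field.SetAttributeValue(W + "fldLock", "1");
        }
        return fields.Count;
    }

    // Applies LockFields to every XML part under word/ in a .docx (body, headers,
    // footers, footnotes, ...) and rewrites only the parts that changed.
    public static int LockFieldsInDocx(string docxPath)
    {
        int total = 0;
        using (var zip = ZipFile.Open(docxPath, ZipArchiveMode.Update))
        {
            var parts = zip.Entries
                .Where(e => e.FullName.StartsWith("word/", StringComparison.Ordinal)
                         && e.FullName.EndsWith(".xml", StringComparison.Ordinal))
                .ToList();

            foreach (var entry in parts)
            {
                XDocument part;
                using (var input = entry.Open())
                {
                    part = XDocument.Load(input);
                }

                int locked = LockFields(part);
                if (locked == 0) continue;

                using (var output = entry.Open())
                {
                    output.SetLength(0); // overwrite the old part contents
                    part.Save(output);
                }
                total += locked;
            }
        }
        return total;
    }
}
```

You would call FieldLocker.LockFieldsInDocx(path) on each .docx before handing it to the Interop conversion; it returns the number of fields it locked across all parts under word/.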
Might be helpful: the question "How do I unlock a content control using the OpenXML SDK in a Word 2010 document?" might be a similar situation, but for content controls. You may be able to apply the same solution to fields, if the Lock and LockingValues types apply the same way to fields. I am not certain of this, however.
To give more confidence that this is the way to do it, see this vendor's example solution to the problem. If you need to develop this in-house, openxmldeveloper.org is a good place to start; look for Eric White's examples of manipulating fields like this.
I am trying to read through a word document using Open XML.
I am looking for key tags within the document in order to identify the values I need to pick up from the document.
I am looping through each paragraph, and then each run within the document to be able to find these.
However, it appears that the spelling and grammar check is causing problems: wherever it identifies an error, it splits the runs and inserts "ProofError" elements, which makes it difficult to parse the document correctly.
I have tried removing all ProofError elements and saving the document, but they appear to come back.
If I run the spelling and grammar check within MS Word manually, there is no issue, though this isn't practical.
Does anyone know a way I can get around this?
Sample text from doc:
Communication System: UID 0, CW (0); Frequency: 900 MHz;Duty Cycle: 1:1
Medium: 900MHz HSL Medium parameters used: f = 900 MHz; σ = 0.979 S/m; εr = 40.68; ρ = 1000 kg/m3
Code used to explore the document:
// isEditable must be true: changes made to a package opened read-only (false)
// cannot be saved back to the file.
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(openFileDialog.FileName, true))
{
    Body body = wordDocument.MainDocumentPart.Document.Body;
    foreach (Paragraph p in body.OfType<Paragraph>())
    {
        // Collect the ProofError children first so removing them doesn't disturb the enumeration.
        List<ProofError> errList = p.OfType<ProofError>().ToList();
        foreach (ProofError err in errList)
        {
            err.Remove();
        }
    }
    wordDocument.Save();
}
The code above removes the ProofError elements from each paragraph. I hoped that removing them and saving would merge adjacent runs back together, but the proof errors come back when saving.
Screenshot below should show you the children of a paragraph.
Link to an example document that throws up errors: these are due to the language being set incorrectly, but I have no control over the format of the documents coming in to me, and there will be other errors unrelated to language.
Sample File
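One common way around this, sketched here as an assumption rather than a tested fix: stop matching run by run and instead join the text of all runs in a paragraph, so w:proofErr markers and run boundaries no longer matter. The sketch works on the raw word/document.xml with System.Xml.Linq; ParagraphText, FindValue and the "value runs up to the next ';'" rule are hypothetical, chosen to fit the sample text above. If you stay with the Open XML SDK, a Paragraph's InnerText property also returns the concatenated text of its descendants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphText
{
    // WordprocessingML main namespace.
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each paragraph with run boundaries and w:proofErr markers ignored.
    // Only w:t elements whose nearest enclosing paragraph is p are used, so text in nested
    // paragraphs (e.g. inside text boxes) isn't counted twice. Tabs and breaks are dropped.
    public static List<string> GetParagraphTexts(XDocument documentXml)
    {
        return documentXml.Descendants(W + "p")
            .Select(p => string.Concat(
                p.Descendants(W + "t")
                 .Where(t => t.Ancestors(W + "p").First() == p)
                 .Select(t => t.Value)))
            .ToList();
    }

    // Hypothetical lookup: returns the text after the first occurrence of key,
    // up to the next ';' (or the end of the paragraph), trimmed. Null if not found.
    public static string FindValue(IEnumerable<string> paragraphs, string key)
    {
        foreach (var text in paragraphs)
        {
            int index = text.IndexOf(key, StringComparison.Ordinal);
            if (index < 0) continue;

            int start = index + key.Length;
            int end = text.IndexOf(';', start);
            string value = end < 0 ? text.Substring(start) : text.Substring(start, end - start);
            return value.Trim();
        }
        return null;
    }
}
```

With the sample line above, FindValue(texts, "Frequency:") yields "900 MHz" whether or not Word has split that paragraph into several runs.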
I have a Word document, letter.docx, that is a letter I intend to mail to hundreds of people for a party. The letter is already composed and has been formatted in its own special way with varying type sizes and fonts. It's set and ready to go, with placeholders where I have to fill out variables that change like Name, Address, phone number, etc.
Now, I would like to write a C# program where a user can type in variable things like Name, Address, etc., into a form, hit a button, and produce letter.docx with the right information filled in at the right places.
I understand Word has features that let you do this, but I really want to do this in C#.
Of course you can do it. Add a reference to Microsoft.Office.Interop.Word in your project.
First, bookmark all the fields you want updated in the document from the Insert tab (e.g., the name field is bookmarked as 'name_field'). Then, in your C# code, add the following:
using Microsoft.Office.Interop.Word;

Microsoft.Office.Interop.Word.Application wordApp = new Microsoft.Office.Interop.Word.Application();
wordApp.Visible = true;
Document wordDoc = wordApp.Documents.Open(@"C:\test.docx");
Bookmark bkm = wordDoc.Bookmarks["name_field"];
Microsoft.Office.Interop.Word.Range rng = bkm.Range;
rng.Text = "Adams Laura"; // get the value from anywhere; note that replacing the text removes the bookmark
Remember to save and close the document properly.
I don't know of anything built into the language, but the example here seems to do exactly what you want.
If you can provide specific examples of what you want to do (are the placeholders Fields? specifically name bits of text?), I can probably give you a more refined answer that directly targets your problem.
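If the placeholders are plain text markers in letter.docx, such as {Name} or {Address} (an assumption; the question doesn't say how they are marked), the file can also be filled without Interop by editing word/document.xml directly. This sketch uses System.Xml.Linq and System.IO.Compression; LetterFiller and its methods are hypothetical names. Because Word may split a marker across several runs, it joins each paragraph's run text before replacing, so runs inside an affected paragraph lose any formatting differences between them.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class LetterFiller
{
    // WordprocessingML main namespace.
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces every key of values (e.g. "{Name}") with its value, paragraph by paragraph.
    // The text of the paragraph's direct runs is joined first, so a marker that Word split
    // across runs is still found; the result goes into the first w:t and the rest are emptied.
    // Runs inside hyperlinks or content controls are not covered. Returns paragraphs changed.
    public static int FillPlaceholders(XDocument documentXml, IDictionary<string, string> values)
    {
        int changed = 0;
        foreach (var p in documentXml.Descendants(W + "p").ToList())
        {
            var texts = p.Elements(W + "r").Elements(W + "t").ToList();
            if (texts.Count == 0) continue;

            string original = string.Concat(texts.Select(t => t.Value));
            string replaced = values.Aggregate(original, (text, kv) => text.Replace(kv.Key, kv.Value));
            if (replaced == original) continue;

            texts[0].Value = replaced;
            texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve"); // keep edge spaces
            foreach (var t in texts.Skip(1))
            {
                t.Value = string.Empty;
            }
            changed++;
        }
        return changed;
    }

    // Copies the template to outputPath and fills placeholders in the copy's body
    // (word/document.xml only; headers and footers are separate parts).
    public static void FillDocx(string templatePath, string outputPath, IDictionary<string, string> values)
    {
        File.Copy(templatePath, outputPath, overwrite: true);
        using (var zip = ZipFile.Open(outputPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            XDocument body;
            using (var input = entry.Open())
            {
                body = XDocument.Load(input);
            }
            FillPlaceholders(body, values);
            using (var output = entry.Open())
            {
                output.SetLength(0); // overwrite the old part contents
                body.Save(output);
            }
        }
    }
}
```

A form's button handler could then call, for example, LetterFiller.FillDocx("letter.docx", "letter-adams.docx", new Dictionary<string, string> { ["{Name}"] = "Laura Adams" }), leaving the original letter untouched.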
Word provides COM objects that you can use from C#.
Add a reference to the Microsoft Office interop library under the COM tab in the Add Reference dialog.
Also, see this question:
Filling in fields in Word using C#
I had a situation where I needed to fill out some MS Word forms, so I used something similar to the following code (make sure you reference Microsoft.Office.Interop.Word; I used version 14, but you should adjust it to your own scenario):
// FormData is a custom container type that holds data... you'll have your own.
public static void FillOutForm(FormData data)
{
    var app = new Microsoft.Office.Interop.Word.Application();
    Microsoft.Office.Interop.Word.Document doc = null;
    try
    {
        var filePath = "Your file path.";
        // Documents.Add creates a new document based on the form, so the original isn't edited.
        doc = app.Documents.Add(filePath);
        doc.Activate();

        // Loop over the form fields and fill them out.
        foreach (Microsoft.Office.Interop.Word.FormField field in doc.FormFields)
        {
            switch (field.Name)
            {
                // Text field case. Setting Result keeps the form field in place;
                // assigning field.Range.Text would replace the field itself.
                case "textField1":
                    field.Result = data.SomeText;
                    break;
                // Check box case.
                case "checkBox1":
                    field.CheckBox.Value = data.IsSomethingTrue;
                    break;
                default:
                    // Throw an error or do nothing.
                    break;
            }
        }

        // Save a copy.
        var newFilePath = "Your new file path.";
        doc.SaveAs2(newFilePath);
    }
    catch (Exception e)
    {
        // Perform your error logging and handling here.
    }
    finally
    {
        // Make sure you close things out. doc is still null if Documents.Add failed.
        // I tend not to save over the original form, so I wouldn't save
        // changes to it -- hence the option I chose here.
        if (doc != null)
        {
            doc.Close(
                Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges);
        }
        app.Quit();
    }
}
As you can see, it's really not that hard at all. There are some other options on forms, so you'll have to research them, but the most general ones, the check box and the text box, are the ones I demonstrated here. If you didn't create a form, I suggest going through and making sure that you know all the fields, as that's what you'll need for this.
For a customer of mine, I need to force spell checking to use a certain language.
I have explored the MSDN documentation and found that calling the CheckSpelling() method on the active document invokes the spell check. This method has parameters for custom dictionaries.
My problem is that I can't find anything about those dictionaries or how to use them.
Of course, there may also be another way to do this entirely.
Can anybody point me in the right direction?
Found my solution:
foreach (Range range in activeDocument.Words)
{
    range.LanguageID = WdLanguageID.wdFrenchLuxembourg;
}
Edit after comment
Since my active document is held in a variable, I seem to lose the static Range property. I found a workaround by doing the following (lan is the variable where I keep my WdLanguageID):
object start = activeDocument.Content.Start;
object end = activeDocument.Content.End;
activeDocument.Range(ref start, ref end).LanguageID = lan;
Thanks @Adriano for all the help!
The spell checker uses the language of the text to select its rules and dictionaries.
You have to set the text's language to the one you need, and the spell checker will then use that language. Follow this link for more details:
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.language.aspx
I have been working with this lately and thought I would add a bit to the already given answers.
To get a list of spelling errors in the document for a certain language, doing the following would get you going:
// Set the proofing language
myDocument.Content.LanguageID = WdLanguageID.wdDanish;
// Get the spelling errors (returns a ProofreadingErrors collection)
var errors = myDocument.SpellingErrors;
// There is no "ProofreadingError" object -> errors are accessed as Ranges
foreach (Range proofreadingError in errors)
    Console.WriteLine(proofreadingError.Text);
As pointed out by Adriano, the key is to first set the language of the document content; then you can access the spelling errors for that language. I have tested this (Word Interop API version 15, Office 2013), and it works.
If you want to get suggestions for each of the misspelled words as well, I suggest you take a look at my previous answer to that issue: https://stackoverflow.com/a/14202099/700926
In that answer I provide sample code as well as links to relevant documentation for how that is done. In particular, the sample covers how to carry out spell checking of a given word in a certain language (of your choice) using Word Interop. The sample also covers how to access the suggestions returned by Word.
Finally, I have a couple of notes:
In contrast to the currently accepted answer (your own), this approach is much faster, since it does not have to iterate through each word. I have been working with Word Interop for reports (100+ pages), and trust me, you don't want to sit and wait for that iteration to finish.
Information regarding the SpellingErrors property can be found here.
Information regarding the non-existence of a ProofreadingError object can be found here.
Never use foreach statements when accessing Office objects. Most Office objects are COM objects, and using foreach leads to memory leaks: the enumerator and each item it returns are COM references that never get explicitly released.
The following is a piece of working code:
using System.Runtime.InteropServices;

Microsoft.Office.Interop.Word.ProofreadingErrors errorCollection = null;
try
{
    errorCollection = Globals.ThisAddIn.Application.ActiveDocument.SpellingErrors;
    // Indexes start at 1 in Office objects
    for (int i = 1; i <= errorCollection.Count; i++)
    {
        Microsoft.Office.Interop.Word.Range error = errorCollection[i];
        int start = error.Start;
        int end = error.End;
        // Each item is a COM object too, so release it as soon as you're done with it
        Marshal.ReleaseComObject(error);
    }
}
catch (Exception ex)
{
    MessageBox.Show(ex.Message);
}
finally
{
    // Release the COM objects here
    // as finally shall be always called
    if (errorCollection != null)
    {
        Marshal.ReleaseComObject(errorCollection);
        errorCollection = null;
    }
}
While I doubt it, if I open a Word document using the Open XML SDK in C# and add some info, is there any way for me to see whether it still fits on one page?
If it doesn't, I want to reduce the font size of specific items I added until it fits.
I could write this algorithm if I had the current size in relation to the page size, with margins and all that.
I ran across this example on another site. I don't know if it'll work in your case, as it requires the Office PIA:
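using Word = Microsoft.Office.Interop.Word;
var app = new Word.Application();
var doc = app.Documents.Open("path/to/file");
doc.Repaginate();
// ComputeStatistics returns the page count as an int ("as int" on BuiltInDocumentProperties doesn't compile).
int pageCount = doc.ComputeStatistics(Word.WdStatistic.wdStatisticPages);
doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
app.Quit();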
I have a situation where I need to copy all of the form fields from one PDF to another. The purpose is to automate the overlaying of the fields when small edits are made to the underlying Word pages.
I've been using the trial version of Aspose.Pdf.Kit, and I'm able to copy everything but radio buttons to a new form. However, Aspose doesn't support copying the radio buttons, which completely nullifies its usefulness, not to mention that their customer support has been subpar.
In any event, I'm looking for some sort of library or plug-in that does support copying all types of form fields.
Does anyone have any ideas?
Thanks,
~DJ
Yes, it is possible. No, setField() won't do the trick... madisonw's code will copy the field values, but not the fields themselves.
OTOH, it really isn't that hard.
Something like:
// outputFile is an OutputStream for the result.
PdfReader currentReader = new PdfReader( CURRENT_PDF_PATH ); // throws
PdfReader pdfFromWord = new PdfReader( TWEAKED_PDF_FROM_WORD_PATH ); // throws
PdfStamper stamper = new PdfStamper( currentReader, outputFile ); // throws

// Swap in each page of the new Word-generated PDF; the form fields stay on the stamper's document.
for( int i = 1; i <= pdfFromWord.getNumberOfPages(); ++i ) {
    stamper.replacePage( pdfFromWord, i, i );
}
stamper.close(); // throws
I'm ignoring a bunch of exceptions, and am writing in Java, but C# should look virtually identical.
Also, this code ignores the case where someone ADDS A PAGE... which would get quite thorny. Was it added before or after the pages with fields on them? Did those pages reflow at all, requiring you to move the fields? At that point you really need a manual process with Acrobat Pro.
I agree with Oded; iTextSharp should be able to do the job. I've used code similar to the following snippet and never had problems with any field types. I'm sure there must have been a radio button in the mix.
private void CopyFields(PdfStamper targetFile, PdfReader sourceFile)
{
    // Copies each field's value from sourceFile into the field with the same name in targetFile.
    foreach (string fieldName in targetFile.AcroFields.Fields.Keys)
    {
        targetFile.AcroFields.SetField(fieldName, sourceFile.AcroFields.GetField(fieldName));
    }
}