I have about 200 Word documents that I need to convert to PDF.
Obviously, I cannot convert them one by one: first, it would take ages, and second, I am sure it is not good practice.
I need to find a way to automate the conversion, since we will need to do this again and again.
I use C#. The solution does not have to be in C#, but that is preferred.
I have looked at a few libraries such as PDFCreator, the Office 2007 add-in, iTextSharp and so forth, and there is no clear answer on the forums.
PDFCreator has a C# sample, but it only works with txt files.
The Office 2007 add-in does not have document locking capabilities, which are a must for the automation.
Has anyone implemented such a scenario before? I would like to hear your suggestions.
Thanks in advance
regards
You can try the method in this blog post:
http://angrez.blogspot.com/2007/06/create-pdf-in-net-using-pdfcreator.html
I'm doing this to automate the conversion of our doc and docx documents to pdf:
private bool ConvertDocument(string file)
{
object missing = System.Reflection.Missing.Value;
OW.Application word = null;
OW.Document doc = null;
try
{
word = new OW.Application();
word.Visible = false;
word.ScreenUpdating = false;
Object filename = (Object)file;
doc = word.Documents.Open(ref filename, ref missing,
ref missing, ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing);
doc.Activate();
// Path.ChangeExtension swaps only the extension, regardless of its casing,
// and cannot touch a ".doc" that happens to appear elsewhere in the path.
file = Path.ChangeExtension(file, ".pdf");
doc.ExportAsFixedFormat(file, OW.WdExportFormat.wdExportFormatPDF, false, OW.WdExportOptimizeFor.wdExportOptimizeForPrint,
OW.WdExportRange.wdExportAllDocument, 1, 1, OW.WdExportItem.wdExportDocumentContent, true, true, OW.WdExportCreateBookmarks.wdExportCreateNoBookmarks,
true, true, false, ref missing);
}
catch(Exception ex)
{
return false;
}
finally
{
if (doc != null)
{
object saveChanges = OW.WdSaveOptions.wdDoNotSaveChanges;
((OW._Document)doc).Close(ref saveChanges, ref missing, ref missing);
doc = null;
}
if (word != null)
{
((OW._Application)word).Quit(ref missing, ref missing, ref missing);
word = null;
}
}
return true;
}
where OW is an alias for Microsoft.Office.Interop.Word.
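For a batch of around 200 files, the ConvertDocument method above can be driven from a simple folder scan. The sketch below is only one possible arrangement: the ConversionPlan class and its method names are hypothetical, not part of any library. It picks up .doc/.docx files, skips the temporary "~$" owner files that Word creates next to open documents, and derives each target name with Path.ChangeExtension.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: decides which files to convert and where the PDFs go.
public static class ConversionPlan
{
    // True for .doc/.docx files (any casing), false for Word's "~$" owner files.
    public static bool IsWordDocument(string path)
    {
        string name = Path.GetFileName(path);
        if (name.StartsWith("~$", StringComparison.Ordinal))
            return false;

        string ext = Path.GetExtension(path);
        return ext.Equals(".doc", StringComparison.OrdinalIgnoreCase)
            || ext.Equals(".docx", StringComparison.OrdinalIgnoreCase);
    }

    // Same folder and file name as the source, with a .pdf extension.
    public static string TargetPath(string sourcePath)
    {
        return Path.ChangeExtension(sourcePath, ".pdf");
    }

    // Lists every Word document found directly inside the given folder.
    public static List<string> FindSources(string folder)
    {
        return Directory.EnumerateFiles(folder)
                        .Where(IsWordDocument)
                        .ToList();
    }
}
```

Each path returned by FindSources can then be passed to ConvertDocument. Reusing a single Word instance for the whole batch, instead of starting one per file, would probably shorten the run considerably.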
Have you checked this MSDN article?
Edit:
Note that this "How To" sample will not work as-is because:
For some reason it overwrites the program parameters (ConvertDocCS.exe [sourceDoc] [targetDoc] [targetFormat]) in lines #77, #81 and #82.
I converted the project to VS 2010 and had to re-reference Microsoft.Office.Core. It's a COM reference called Microsoft Office 12.0 Object Library.
The sample does not accept a relative path.
I'm sure you will manage to overcome those obstacles :)
One last thing. If you are working with .NET 4, you don't need to pass all those annoying Missing.Value arguments, thanks to the wonder of optional parameters.
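As an illustration of the optional-parameter point, here is a minimal sketch of a conversion written for C# 4 or later. The PdfExporter class and Export method are hypothetical names; the code assumes a reference to the Word interop assembly and an installed copy of Word 2007 SP2 or later, so treat it as a starting point rather than a tested implementation.

```csharp
using System.IO;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper: exports one document to PDF without Missing.Value.
public static class PdfExporter
{
    public static void Export(Word.Application word, string sourcePath)
    {
        // Only the arguments that matter are supplied; the rest stay optional.
        Word.Document doc = word.Documents.Open(sourcePath, ReadOnly: true);
        try
        {
            doc.ExportAsFixedFormat(
                Path.ChangeExtension(sourcePath, ".pdf"),
                Word.WdExportFormat.wdExportFormatPDF);
        }
        finally
        {
            // The cast avoids the ambiguity between the Close method and the Close event.
            ((Word._Document)doc).Close(SaveChanges: false);
        }
    }
}
```

The caller is expected to create the Word.Application once, call Export for each file, and call Quit at the end.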
You may try Aspose.Words for .NET to convert DOC files to PDF. It can be used in any .NET application with C# or VB.NET like any other .NET assembly. It also works on any Windows OS, on both 32-bit and 64-bit systems.
Disclosure: I work as a developer evangelist at Aspose.
As HuBeZa said, if Word is installed on your workstation, you can use Word Automation to open your files one by one and save them as PDF.
All you need to do is reference the COM component "Microsoft Word Object Library" and work with the classes of this assembly.
The execution time will probably be a bit long, but your conversions will be automated.
You can also set fonts programmatically. In my solution for the same kind of application, I applied a single font to all generated documents, which saved me from opening each template manually and setting the font separately for every tag, heading and so on. Note that the code below uses the Open XML SDK rather than Word automation:
using (WordprocessingDocument wordProcessingDocument = WordprocessingDocument.Open(input, true))
{
// Get all top-level elements of the document body
List<DocumentFormat.OpenXml.OpenXmlElement> elements =
wordProcessingDocument.MainDocumentPart.Document.Body.ToList();
// Set the font on every RunProperties element found under each body element
foreach (var itm in elements)
{
try
{
List<RunProperties> list_runProperties =
itm.Descendants<RunProperties>().ToList();
foreach (var item in list_runProperties)
{
if (item.RunFonts == null)
item.RunFonts = new RunFonts();
item.RunFonts.Ascii = "Courier New";
item.RunFonts.ComplexScript = "Courier New";
item.RunFonts.HighAnsi = "Courier New";
item.RunFonts.Hint = FontTypeHintValues.ComplexScript;
}
}
catch (Exception)
{
//continue for other tags in document
//throw;
}
}
wordProcessingDocument.MainDocumentPart.Document.Save();
}
I think the straight answer to this is no.
It is possible through a workaround, though: use ImageMagick or a similar library, see whether it can render your Word document to images, and then use those images in iTextSharp to create the PDF.
I'm attempting to create a word document, but text between certain words should be crossed out. I tried looking online for solutions but the only solution I could find was:
Set myRange = document.Range(Start:= document.Words(1).Start, _
End:= document.Words(3).End)
myRange.Font.StrikeThrough = True
from here https://msdn.microsoft.com/en-us/vba/word-vba/articles/font-strikethrough-property-word for VBA. Unfortunately there was nothing for C#.
Does anyone know how you would add a strike-through to specific pieces of text before saving it all to a word document?
My code for reference:
//input is a StringBuilder received by method
try {
//Create an instance for word app
Microsoft.Office.Interop.Word.Application winword = new Microsoft.Office.Interop.Word.Application();
//Set animation status for word application
winword.ShowAnimation = false;
//Set status for word application is to be visible or not.
winword.Visible = false;
//Create a missing variable for missing value
object missing = System.Reflection.Missing.Value;
//Create a new document
Microsoft.Office.Interop.Word.Document document = winword.Documents.Add(ref missing, ref missing, ref missing, ref missing);
//adding text to document
document.Content.SetRange(0, 0);
document.Content.Text = input.ToString();
//Save the document
object filename = @"C:\Users\TempUser\Desktop\temp1.docx";
document.SaveAs2(ref filename);
document.Close(ref missing, ref missing, ref missing);
document = null;
winword.Quit(ref missing, ref missing, ref missing);
winword = null;
Console.WriteLine("Document created successfully!");
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
I've extracted the relevant part from your code to make it easier to follow.
My assumption with this sample is that text should be inserted at the end of the document and formatted as "strikethrough". Notice how I declare a Word.Range object and assign the body of the document to it. For understanding how it works, think of a Range like a Selection, but you can have more than one and it's not visible in the document.
The next line "collapses" the Range to its end-point - like pressing Right arrow. If you did not collapse the Range, the text assigned to it would replace what's in the document (like over-typing a selection). The text is then assigned to the Range and the Strikethrough applied.
Note that in the old Word Basic days "true" and "false" were not concepts used for setting font decoration. Word's object model still uses these old Word Basic commands. Under the covers they still use -1 for true and 0 for false (and sometimes 1 for something else). While the VB languages can use the "pseudo boolean" settings (true/false) that have been added to the object model for convenience, C# doesn't "see" them, so you need -1 for true.
//adding text to document
object oCollapseEnd = Word.WdCollapseDirection.wdCollapseEnd;
Word.Range rng = document.Content;
rng.Collapse(ref oCollapseEnd);
rng.Text = input.ToString();
rng.Font.StrikeThrough = -1; // 0 for false; note the capital T in the interop property name
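To strike out only the text between certain words, as in the VBA snippet from the question, the same Range idea can be applied to a span of the Words collection. This is a rough C# translation, assuming C# 4 or later and a reference to the Word interop assembly; StrikeHelper and StrikeWords are hypothetical names.

```csharp
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper: strikes through a span of words in a document.
public static class StrikeHelper
{
    // firstWord and lastWord are 1-based positions in document.Words, inclusive.
    public static void StrikeWords(Word.Document document, int firstWord, int lastWord)
    {
        // Build a Range from the start of the first word to the end of the last one.
        Word.Range range = document.Range(
            document.Words[firstWord].Start,
            document.Words[lastWord].End);

        range.Font.StrikeThrough = -1; // -1 means true, 0 means false
    }
}
```

For example, StrikeHelper.StrikeWords(document, 1, 3) mirrors the VBA call. Keep in mind that Word counts punctuation and paragraph marks as items in the Words collection, so the positions may not match a naive word count.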
I am developing a desktop application in C#. I have coded a function to merge multiple docx files but it does not work as expected. I don't get the content exactly as how it was in the source files.
A few blank lines are added in between, the content spills over onto the following pages, header and footer information is lost, page margins get changed, etc.
How can I concatenate the documents exactly as they are, without any changes? Any suggestions will be helpful.
This is my code.
public bool CombineDocx(string[] filesToMerge, string destFilepath)
{
Application wordApp = null;
Document wordDoc = null;
object outputFile = destFilepath;
object missing = Type.Missing;
object pageBreak = WdBreakType.wdPageBreak;
try
{
wordApp = new Application { DisplayAlerts = WdAlertLevel.wdAlertsNone, Visible = false };
wordDoc = wordApp.Documents.Add(ref missing, ref missing, ref missing, ref missing);
Selection selection = wordApp.Selection;
foreach (string file in filesToMerge)
{
selection.InsertFile(file, ref missing, ref missing, ref missing, ref missing);
selection.InsertBreak(ref pageBreak);
}
wordDoc.SaveAs( ref outputFile, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing);
return true;
}
catch (Exception ex)
{
Msg.Log(ex);
return false;
}
finally
{
if (wordDoc != null)
{
wordDoc.Close();
}
if (wordApp != null)
{
wordApp.DisplayAlerts = WdAlertLevel.wdAlertsAll;
wordApp.Quit();
Marshal.FinalReleaseComObject(wordApp);
}
}
}
In my opinion it's not so easy, so here are some tips.
I think you need to make the following changes to your code.
1. Instead of a page break, insert a section break of whichever type is most appropriate:
object sectionBreak = WdBreakType.wdSectionBreakNextPage;
// other section break types are also available
and use this new variable within your loop.
As a result, the text, footers and headers of each source document are all carried over into the new one.
2. However, you will still need to read the margin settings and apply them to your new document 'manually' using additional code. To do that, open the source document and read its margins like this:
float leftMargin = SourceDocument.Sections[1].PageSetup.LeftMargin;
// ...and similarly for the other margins
and then apply them to the appropriate section of the new document:
selection.Sections[1].PageSetup.LeftMargin = leftMargin;
3. Other document sections may require other techniques.
You could use the Open XML SDK and the DocumentBuilder tool.
See Merge multiple word documents into one Open Xml
I am using:
Office 2007
VC# Express 2010
1x Citrix virtual XP network environment accessed through Windows 7 laptop host
1x printer set to output to .prn in a given network-mapped drive
I am using C# and Word Interop to silently print a given set of files automatically. The application scans an input folder every 10 minutes for .doc / .docx files only, and adds their path and filename to a list. For each file found, it attempts to print via the following code:
public static Boolean PrintFoundFiles(List<string> foundFiles)
{
bool successful = false;
foreach (string file in foundFiles)
{
object fileAndPath = file; //declare my objects here, since methods want an object passed
object boolTrue = true; //
object boolFalse = false; //
object output = FormatOutputName(file); //
object missing = System.Type.Missing; //
object copies = "1"; //
object pages = ""; //
object items = Word.WdPrintOutItem.wdPrintDocumentContent; //
object range = Word.WdPrintOutRange.wdPrintAllDocument; //
object pageType = Word.WdPrintOutPages.wdPrintAllPages; //
Word.Application wordApp = new Word.Application(); //open word application
wordApp.Visible = false; //invisible
Word.Document doc = wordApp.Documents.Open(ref fileAndPath, ref missing, ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing); //opens the word document into application behind the scenes
doc.Activate(); //activates document, else when I print the system will throw an exception
wordApp.ActivePrinter = "my printer name"; //Specify printer I will print from
doc.PrintOut(ref boolTrue, ref boolFalse, ref range, ref output, ref missing, ref missing,
ref items, ref copies, ref pages, ref pageType, ref boolTrue, ref boolTrue,
ref missing, ref boolFalse, ref missing, ref missing, ref missing, ref missing);
doc.Close(SaveChanges: false);
doc = null;
((Word._Application)wordApp).Quit(SaveChanges: false); //kill word process the right way
wordApp = null; //reset to null
successful = true;
}
return successful;
}
After I receive the true boolean of "successful", I will back up the file automatically in a backup folder, delete it in the input folder, and look for the .prn in the output folder (it just sits here for processing later).
If I don't provide an output (see ref output on doc.PrintOut()), the output directory doesn't get updated or printed to at all. If I DO provide an output, the .prn is created, though it is a 0kb empty file.
The printer is set up as the default printer, and it has been configured to automatically output to said output folder. If I open Word manually with the same file I'm trying to automatically print from, hit print, it will create a 6kb .prn file in the output directory without having to hit anything other than CTRL + P, OK.
I'm fairly confident the file is being opened OK via "Word.Document doc = wordApp.Documents.Open()" because I did a doc.FullName and got the full path of the input word document in question. I just cannot for the life of me get the .prn to output correctly to the output folder.
If I start Word (2010) and record a macro of me pressing Ctrl+P and clicking Print, I get:
Application.PrintOut fileName:="", Range:=wdPrintAllDocument, Item:= _
wdPrintDocumentWithMarkup, Copies:=1, Pages:="", PageType:= _
wdPrintAllPages, Collate:=True, Background:=True, PrintToFile:=False, _
PrintZoomColumn:=0, PrintZoomRow:=0, PrintZoomPaperWidth:=0, _
PrintZoomPaperHeight:=0
Change your PrintOut to reflect what Word did and see if it solves your issue.
There's no reason to settle for "fairly confident". Just remove
wordApp.Visible = false
then debug your program and make certain the document really opens OK.
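A PrintOut call that mirrors the recorded macro could look roughly like the sketch below, using C# 4 named arguments so that only the relevant parameters are listed. PrintHelper is a hypothetical name. One deliberate difference from the macro is Background: false; this rests on the assumption that quitting Word immediately after a background print can cut the job short, which would be consistent with the 0 KB .prn file, but it has not been verified in the environment described.

```csharp
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper: prints a document the way the recorded macro does.
public static class PrintHelper
{
    public static void PrintLikeMacro(Word.Document doc)
    {
        doc.PrintOut(
            Background: false, // wait for spooling to finish before returning
            Range: Word.WdPrintOutRange.wdPrintAllDocument,
            Item: Word.WdPrintOutItem.wdPrintDocumentWithMarkup,
            Copies: 1,
            Pages: "",
            PageType: Word.WdPrintOutPages.wdPrintAllPages,
            PrintToFile: false, // let the printer driver decide where the .prn goes
            Collate: true);
    }
}
```

Unlike the code in the question, this passes neither OutputFileName nor PrintToFile: true, leaving the output location to the printer configuration, as the manual Ctrl+P test does.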
I'm working on a program that would classify files to groups based on certain text found in them. Most of the files are possibly going to be .doc or .docx.
My program should be able to compare a list of words with words in the files.
I'm new to C# and I only study programming on my own, and the whole "read .doc file" thing goes way over my head, so any help would be greatly appreciated!
So far the part of my code that has to do with office is:
if (Path.GetExtension(listBox1.SelectedItem.ToString()) == ".doc" ||
Path.GetExtension(listBox1.SelectedItem.ToString()) == ".docx")
{
Microsoft.Office.Interop.Word.Document doc =
new Microsoft.Office.Interop.Word.Document(listBox1.SelectedItem.ToString());
doc.Activate();
}
EDIT:
Sorry if the question wasn't clear enough.
My question is:
How can I find out whether the document contains any of the specific words listed in a text file?
I have read many other questions, answers and tutorials and it might be just me but I totally don't get it.
Here is an introduction on reading text out of a .docx file: http://www.codeproject.com/Articles/20529/Using-DocxToText-to-Extract-Text-from-DOCX-Files
You could convert the .doc files to .docx files and use the same process for both.
You seem to be using Microsoft's interop classes, so you can use the Find object from Microsoft.Office.Interop.Word:
MSDN description and HOW TO
The Execute method will return true if the document contains the word.
StringBuilder sb = new StringBuilder();
Word.Range rng = rodape.Range;
Word.Find find = rng.Find;
find.ClearFormatting();
find.Replacement.ClearFormatting();//Only required if you will replace the text
if (find.Execute("textToBeFound", false))
{
//The document contains the word
}
Another example, from Microsoft:
private void SelectionFind() {
object findText = "find me";
Application.Selection.Find.ClearFormatting();
if (Application.Selection.Find.Execute(ref findText,
ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing))
{
MessageBox.Show("Text found.");
}
else
{
MessageBox.Show("The text could not be located.");
}
}
But there are many other ways to do this.
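One of those other ways, for comparing a whole list of words against a document, is to pull the text out once and do the matching in plain C#. The sketch below assumes the text has already been obtained (for example from doc.Content.Text via interop) and that the keyword list was read from the text file, one single word per line. KeywordMatcher and FindMatches are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: reports which keywords occur in a document's text.
public static class KeywordMatcher
{
    // Returns the keywords that appear as whole words in documentText,
    // ignoring case, in the order they were supplied.
    public static List<string> FindMatches(string documentText, IEnumerable<string> keywords)
    {
        // Split on runs of non-word characters and keep each distinct word once.
        var wordsInDocument = new HashSet<string>(
            Regex.Split(documentText, @"\W+").Where(w => w.Length > 0),
            StringComparer.OrdinalIgnoreCase);

        return keywords.Where(k => wordsInDocument.Contains(k)).ToList();
    }
}
```

A file could then be assigned to a group whenever FindMatches returns a non-empty list for that group's keywords. Multi-word phrases would need a different approach, such as the Find.Execute call shown above.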
I have a WinForms application where I am using Word Automation to build documents via a template, and then save them to the database. After the document is created, I retrieve the document from the database, write it to the file system in a temp directory, and then open the document using the Word Interop services.
There is a list of documents loaded and the user can open only 1 instance of each document, but can open multiple different documents simultaneously. I don't have any problems with opening, saving, and closing when they open 1 document, however, when they open multiple documents simultaneously, I get the following error when closing any of my instances of Word:
The file is in use by another application or user. (C:\...\Templates\Normal.dotm)
This error is commonly encountered when a read lock is set on the file that you are attempting to open.
I am using the following code to open the document and handle the DocumentBeforeClose event:
public void OpenDocument(string filePath, Protocol protocol, string docTitle, byte[] document)
{
_protocol = protocol;
_documentTitle = docTitle;
_path = filePath;
if (!_wordDocuments.ContainsKey(_path))
{
FileUtility.WriteToFileSystem(filePath, document);
Word.Application wordApplication = new Word.Application();
wordApplication.DocumentBeforeClose += WordApplicationDocumentBeforeClose;
wordApplication.Documents.Open(_path);
_wordDocuments.Add(_path, wordApplication);
}
_wordApplication = _wordDocuments[_path];
_currentWordDocument = _wordApplication.ActiveDocument;
ShowWordApplication();
}
public void ShowWordApplication()
{
if (_wordApplication != null)
{
_wordApplication.Visible = true;
_wordApplication.Activate();
_wordApplication.ActiveWindow.SetFocus();
}
}
private void WordApplicationDocumentBeforeClose(Document doc, ref bool cancel)
{
if (!_currentWordDocument.Saved)
{
DialogResult dr = MessageHandler.ShowConfirmationYnc(String.Format(Strings.DocumentNotSavedMsg, _documentTitle), Strings.DocumentNotSavedCaption);
switch (dr)
{
case DialogResult.Yes:
SaveDocument(_path);
break;
case DialogResult.Cancel:
cancel = true;
return;
}
}
try
{
if (_currentWordDocument != null)
_currentWordDocument.Close();
}
finally
{
Cleanup();
}
}
public void Cleanup()
{
if (_currentWordDocument != null)
while(Marshal.ReleaseComObject(_currentWordDocument) > 0);
if (_wordApplication != null)
{
_wordApplication.Quit();
while (Marshal.ReleaseComObject(_wordApplication) > 0);
_wordDocuments.Remove(_path);
}
}
Does anyone see anything wrong that I am doing to allow opening of multiple documents at the same time? I am fairly new to Word Automation and the Word Interop services, so any advice is appreciated. Thanks.
I found the solution via this MSDN article: http://support.microsoft.com/kb/285885
You need to do this before calling Application.Quit();
_wordApplication.NormalTemplate.Saved = true;
This prevents Word from trying to save the Normal.dotm template. I hope this helps someone else.
I have used Word in a C# doc2pdf application. Before closing the document, set the save option like this:
object saveOption = Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges;
oDoc.Close(ref saveOption, ref oMissing, ref oMissing);
oWord.Quit(ref saveOption, ref oMissing, ref oMissing);
I have help links in my application and wanted to open a particular word doc to a particular bookmark. If the document is already open, it should not open it again. If Word is already open, it should not open a new instance of Word.
This code worked for me:
object filename = @"C:\Documents and Settings\blah\My Documents\chapters.docx";
object confirmConversions = false;
object readOnly = true;
object visible = true;
object missing = Type.Missing;
Application wordApp;
object word;
try
{
word = System.Runtime.InteropServices.Marshal.GetActiveObject("Word.Application");
}
catch (COMException)
{
Type type = Type.GetTypeFromProgID("Word.Application");
word = System.Activator.CreateInstance(type);
}
wordApp = (Application) word;
wordApp.Visible = true;
Console.WriteLine("There are {0} documents open.", wordApp.Documents.Count);
var wordDoc = wordApp.Documents.Open(ref filename, ref confirmConversions, ref readOnly, ref missing,
ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref visible,
ref missing, ref missing, ref missing, ref missing);
wordApp.Activate();
object bookmarkName = "Chapter2";
if (wordDoc.Bookmarks.Exists(bookmarkName.ToString()))
{
var bookmark = wordDoc.Bookmarks.get_Item(bookmarkName);
bookmark.Select();
}
Keep in mind that the code:
Word.Application wordApplication = new Word.Application();
will always spin up a new instance of Word, even if there's already an instance loaded.
Usually, you're better off checking for a loaded instance (via GetObject in VB, or Marshal.GetActiveObject in C#) and using it if there is one, and only spinning up a new instance if you need to.