I have a request to create a word document on the fly based on a template provided to me. I have done some research and everything seems to point at OpenXML. I have looked into that, but the cs file that gets created is over 15k lines and is breaking my VS 2010 (causing it to not respond every time I make a change).
I have been looking at this tutorial series on Open XML
http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/10/13/getting-started-with-open-xml-development.aspx
I have done things in the past with text files and Regular Expressions, but since Word encrypts everything, that does not work. Are there any other options that are fairly lightweight for creating word documents from templates.
//Hi, It is quite simple.
//First, you should copy your Template file into another location.
string SourcePath = "C:\\MyTemplate.dotx";
string DestPath = "C:\\MyDocument.docx";
System.IO.File.Copy(SourcePath, DestPath);
//After copying the file, you can open a WordprocessingDocument using your Destination Path.
WordprocessingDocument Mydoc = WordprocessingDocument.Open(DestPath, true);
//After openning your document, you can change type of your document after adding additional parts into your document.
mydoc.ChangeDocumentType(WordprocessingDocumentType.Document);
//If you wish, you can edit your document
AttachedTemplate attachedTemplate1 = new AttachedTemplate() { Id = "MyRelationID" };
MainDocumentPart mainPart = mydoc.MainDocumentPart;
MySettingsPart = mainPart.DocumentSettingsPart;
MySettingsPart.Settings.Append(attachedTemplate1);
MySettingsPart.AddExternalRelationship("http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate", new Uri(CopyPath, UriKind.Absolute), "MyRelationID");
//Finally you can save your document.
mainPart.Document.Save();
I am currently working on something along these lines and I have been making use of the Open XML SDK and the OpenXmlPowerTools The approach been taken is taking the actual template file opening it up and putting text into various place holders within the template document. I have been using content controls as the place markers.
The SDK tool to open up a document has been invaluable in being able to compare documents and see how it is constructed. However the code generated from the tool I have been refactoring heavily and removing sections that are not being used at all.
I can't talk about doc files but with docx files they are not encrypted they are just zip files that contain xml files
Eric White's blog has a large number of examples and code samples which have been very useful
Related
I am using WordprocessingDocument to Read and write content to a word document but when I am opening the document using MemoryStream, it is not showing me the images and header/footer which is already in the word document. Below is the code for the same.
private void AddReport(MainDocumentPart parent, MemoryStream report)
{
using (MemoryStream editingMemoryStream = new MemoryStream())
{
report.Position = 0;
report.CopyTo(editingMemoryStream);
editingMemoryStream.Position = 0;
using (WordprocessingDocument newDoc = WordprocessingDocument.Open(editingMemoryStream, true))
{
WP.Body Template = newDoc.MainDocumentPart.Document.Body;
var Main = newDoc.MainDocumentPart;
var cloneTemplate = Template.CloneNode(true);
parent.Document.Body.PrependChild(new WP.Paragraph(new WP.Run(cloneTemplate)));
parent.Document.Save();
}
}
}
Screenshot for the word document:
enter image description here
In this, the Parent document is the document where I am pre-pending the above document. Any help will be appreciated. Thanks in advance.
The headers, footers and images are not part of the document body, so won't be carried over to another document in the described scenario.
All this information is stored in separate "xml parts" contained within the Word file's "zip package". The Body part contains only refereces (relationship IDs listed in a "rel" part that point/link to the relevant xml part contained in the package).
This can be seen by opening the document in the Open XML SDK Productivity Tool and inspecting the underlying Word Open XML.
In order to copy such content to another document it's necessary to clone not only the body, but also each and every relevant xml part with content you want to have, while dynamically generating the necessary relationships - not a trivial undertaking. There are posts here and elsewhere in the Internet (including my blog, the WordMeister) that demonstrate the basics of how this is done which you could use as starting points for understanding the required approach.
Or, depending on what the Parent document is, it might make more sense to start with a copy of the "new" document and edit it with the other content.
FWIW and mentioned here for the sake of completeness: The COM object model will do what is described - copy the body of the document and paste to another document does carry over all this information. But the Word application is doing all the "heavy lifting" that the developer needs to code when using the Open XML SDK.
I am trying to copy entire content (including page numbers, and page layout) from a word document to another, using Microsoft.Office.Interop.Word.
I cannot use SaveAs method because the document in which I want to paste the contents is already created and it contains VBA code.
Also, I cannot use XML related code because the document in which I am copying the content is in the older format. This document is part of an old way of uploading a document to a server database, using VBA code.
Using VBA code, I can copy the entire content without any issue.
Selection.WholeStory
Selection.Copy
Windows("document.doc").Activate
Selection.WholeStory
Selection.PasteAndFormat (wdFormatOriginalFormatting)
For C#, I used Microsoft.Office.Interop.Word to replicate the VBA code.
Word.Application objWordOpen = new Word.Application();
objWordOpen.Visible = false;
Word.Document doclocal = objWordOpen.Documents.Open(filepath);
doclocal.ActiveWindow.Selection.WholeStory();
doclocal.ActiveWindow.Selection.Copy();
Document d1 = objWordOpen.Documents.Open(filepath2);
d1.Activate();
d1.ActiveWindow.Selection.WholeStory();
d1.ActiveWindow.Selection.PasteAndFormat(Word.WdRecoveryType.wdFormatOriginalFormatting);
I have also tried using range
Word.Range oRange = doclocal.Content;
oRange.Copy();
The content is copied into the document, but without headers and footers. Also, when using Selection.WholeStory() approach, the page margins settings don't get copied.
What changes should I make to the c# code in order to achieve my result?
MS Office applications have complicated relationships with the clipboard. Between various optimisations that may lead to cryptic prompts, and numerous formats they support, it is best to not do anything remotely funny between a Copy and a Paste.
The VBA code follows this advice, the C# code opens a document between copying and pasting.
Make sure you open the documents in advance and not in the middle of a copypaste.
I have to use OpenXML SDK 2.5 with C# to copy formulas from one word document then append them to another word document. I tried the below code, it ran successfully but when I tried to open the file, it said there's something wrong with the content. I opened it ignoring the warning but those formulas were not displayed. They are just blank blocks.
My code:
private void CreateNewWordDocument(string document, Exercise[] exercices)
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Create(document, WordprocessingDocumentType.Document))
{
// Set the content of the document so that Word can open it.
MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
SetMainDocumentContent(mainPart);
foreach (Exercise ex in exercices)
{
wordDoc.MainDocumentPart.Document.Body.AppendChild(ex.toParagraph().CloneNode(true));
}
wordDoc.MainDocumentPart.Document.Save();
}
}
// Set content of MainDocumentPart.
private void SetMainDocumentContent(MainDocumentPart part)
{
string docXml =
#"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
<w:body><w:p><w:r><w:t>Exercise list!</w:t></w:r></w:p></w:body>
</w:document>";
using (Stream stream = part.GetStream())
{
byte[] buf = (new UTF8Encoding()).GetBytes(docXml);
stream.Write(buf, 0, buf.Length);
}
}
This happens because not everything that can be referenced in the paragraph is copied when you clone the paragraph. The Word XML format consists of multiple files some of which reference each other. If you copy the paragraph from one document to another you need to also copy any relationships that may exist.
The OpenXML Productivity Tool is useful for diagnosing errors like these. You can open a document with the tool and ask it to validate the document.
I created a test document that just contained a hyperlink and ran your code to copy the contents to another document. I too got an error when I attempted to load it using Word so I opened it in the Productivity Tool and saw the following output:
This shows that the hyperlink is stored as a relationship rather than inline in the paragraph and my new file references a relationship that doesn't exist. Unzipping the original file and the new file and comparing the two shows what is going on:
document.xml from original:
.rels of original
document.xml of generated file
.rels of generated file
Note that in the generated file the hyperlink references relationship rId5 but that doesn't exist in the generated documents relationship file.
It's worth noting that for simple source documents the code worked without issue as there are no relationships that require copying.
There are two ways that you can solve this. The easiest way is to only copy the text of the paragraph (you'll lose all styles, images, hyperlinks etc) but it is very simple. All you need to do is change
wordDoc.MainDocumentPart.Document.Body.AppendChild(ex.toParagraph().CloneNode(true));
for
Paragraph para = wordDoc.MainDocumentPart.Document.Body.AppendChild(new Paragraph());
Run run = para.AppendChild(new Run());
run.AppendChild(new Text(ex.toParagraph().InnerText));
The more complex (and perhaps proper) way of achieving it is to find the relationships and copy them to the new document as well. The code for doing that is probably beyond the scope of what I can write here but there is an interesting article on the subject here http://blogs.msdn.com/b/ericwhite/archive/2009/02/05/move-insert-delete-paragraphs-in-word-processing-documents-using-the-open-xml-sdk.aspx.
Essentially the author of that blog post is using the Powertools for OpenXML to find relationships and copy them from one document to another.
I am just starting out with the OpenXML SDK 2.0 in Visual Studio 2010 (C#). I have automated office programs before using COM automation, which was painful.
I have a template made by one of our graphic designers, which will provide the foundation for my reports. In order to automate the simple things (plaintext items) I have added content controls to the template and bound a custom XML part to the doc. The content controls are as follows:
DayCount
AlternateJobTitle
Date
SignatureName
After making a copy of the template, I then edit the content controls and save the file with the following code:
//stand up object that reads the Word doc package
using (WordprocessingDocument doc = WordprocessingDocument.Open(docOutputPath, true))
{
//create XML string matching custom XML part
string newXml = "<root>" +
"<DayCount>42</DayCount>" +
"<AlternateJobTitle>Supervisor</AlternateJobTitle>" +
"<Date>9/24/2012</Date>" +
"<SignatureName>John Doe</SignatureName>" +
"</root>";
MainDocumentPart main = doc.MainDocumentPart;
main.DeleteParts<CustomXmlPart>(main.CustomXmlParts);
//add and write new XML part
CustomXmlPart customXml = main.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (StreamWriter ts = new StreamWriter(customXml.GetStream()))
{
ts.Write(newXml);
}
}
This all works well. However, my document is not made up solely of standard text and plaintext updates. The real meat of the report is in a number of tables that need to be added to each report as well. I have been searching like crazy for a good description on how this is done, but have really not found anything. Is there some way to delineate where to place a table using the same content control logic used for plaintext controls? Any code samples I have found of creating a table using OpenXML have just assumed that you want to append it to the end of the main document part. I would like to specify where the tables need to go in the template, generate the tables and place them in the specified regions of the template. Is this possible?
Any help is greatly appreciated.
There are a lot of OpenXml creation questions. But if you decide to take this path - answer is general - examine OpenXml Productivity Tool. At my PC it could be found at "C:\Program Files (x86)\Open XML SDK\V2.0\tool\OpenXmlSdkTool.exe". Just create in MsWord document which you want to create using OpenXml and reflect document's code using this tool. Good luck!
If you need to display tabled data, so far, the best thing I found is Word Document Generator at http://worddocgenerator.codeplex.com/.
currently i have been using the following code and i am using some dll files from pdfbox
FileInfo file = new FileInfo("c://aa.pdf");
PDDocument doc = PDDocument.load(file.FullName);
PDFTextStripper pdfStripper = new PDFTextStripper();
string text = pdfStripper.getText (doc);
richTextBox1.Text = qq;
using this code i can able to get text file but not in a correct format plz give me a some ideas
Extracting the text from a pdf file is anything but trivial.
To quote from th iTextSharp tutorial.
"The pdf format is just a canvas where
text and graphics are placed without
any structure information. As such
there aren't any 'iText-objects' in a
PDF file. In each page there will
probably be a number of 'Strings', but
you can't reconstruct a phrase or a
paragraph using these strings. There
are probably a number of lines drawn,
but you can't retrieve a Table-object
based on these lines. In short:
parsing the content of a PDF-file is
NOT POSSIBLE with iText."
There are several commercial applications which claim to be able to do it. Caveat Emptor.
There is also a free software library called Poppler http://poppler.freedesktop.org/ which is used by the pdf viewers of GNOME and KDE. It has a function called pdftotext() but I have no experience with it. It may be your best free option.
There is a blog article explaining the issues with PDF text extraction in general at http://pdf.jpedal.org/java-pdf-blog/bid/12670/PDF-text