copy individual pages from word document to new document c#


I have a report that has hundreds of pages. I need to extract each individual page from this document into a new document. I have found that this is possible using Interop; however, I'm trying to avoid installing MS Office on the server. I've been using Aspose for most of the operations, but this functionality doesn't appear to be supported.
Is there a way to separate the pages of a document into individual files without having MS Office installed?

Aspose.Words does not have layout information like pages or line numbers; it maintains a DOM. But we have written some utility classes to achieve such behavior. They split the Word document into multiple sections, such that each page becomes a separate section. After that, it is easy to copy individual pages.
using System;
using System.Collections;
using Aspose.Words;
String sourceDoc = dataDir + "source.docx";
String destinationDoc = dataDir + "destination.docx";
// Initialize the Document instance with source and destination documents
Document doc = new Document(sourceDoc);
Document dstDoc = new Document();
// Remove the blank default page from the new document
dstDoc.RemoveAllChildren();
PageNumberFinder finder = new PageNumberFinder(doc);
// Split nodes across pages
finder.SplitNodesAcrossPages(true);
// Get the separate page sections for pages 1 to 5
ArrayList pageSections = finder.RetrieveAllNodesOnPages(1, 5, NodeType.Section);
foreach (Section section in pageSections)
{
dstDoc.AppendChild(dstDoc.ImportNode(section, true));
}
dstDoc.LastSection.Body.LastParagraph.Remove();
dstDoc.Save(destinationDoc);
The PageNumberFinder class can be downloaded from here.
PS. I am a Developer Evangelist at Aspose.

Related

Calculate number of pages in docs file by dot net and C# and Syncfusion.DocIO.Net.Core Package

I have a .NET Core 3.1 Web API.
I want to get the number of pages in .docx and .doc files.
I am using the Syncfusion.DocIO.Net.Core package to work with docx/doc files, but it doesn't provide a feature to update the file's statistics; it only exposes the stored PageCount: document.BuiltinDocumentProperties.PageCount;
This value is not updated when the file changes. Could someone suggest how I can calculate the page count myself?

You can get the page count from the document's BuiltinDocumentProperties using DocIO, as below. This returns the page count that Microsoft Word stored when the document was created, and it still returns that same count after the document is modified with DocIO.
int count = document.BuiltinDocumentProperties.PageCount;
If you need the page count after modifying the Word document with DocIO, we suggest converting the Word document to PDF and then retrieving the page count as shown below.
WordDocument wordDocument = new WordDocument(fileStream, FormatType.Docx);
DocIORenderer render = new DocIORenderer();
//Sets Chart rendering Options.
render.Settings.ChartRenderingOptions.ImageFormat = ExportImageFormat.Jpeg;
//Converts Word document into PDF document
PdfDocument pdfDocument = render.ConvertToPDF(wordDocument);
int pageCount = pdfDocument.PageCount;
This is because a Word document is a flow document: its contents are not stored page by page but sequentially, section by section. Each section may extend across several pages depending on its contents, such as tables, text and images.
Essential DocIO, on the other hand, is a non-UI component that provides a full-fledged document object model for manipulating Word document contents. Hence it is not feasible to get the page count directly from a Word document using DocIO.
We have prepared a sample application that gets the page count from a Word document; it can be downloaded from the link below.
https://www.syncfusion.com/downloads/support/directtrac/general/ze/GetPageCount16494613
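The stored count is only metadata. In a .docx package, a built-in property such as PageCount presumably comes from the `<Pages>` element of the extended-properties part, which the saving application writes and nothing recalculates. A minimal sketch that reads that stored value with only the .NET base class library, assuming the part sits at its conventional location `docProps/app.xml` (strictly, the package's `_rels/.rels` decides where it lives); `StoredPageCount` is a hypothetical helper name:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class StoredPageCount
{
    // Namespace of the extended-properties part (docProps/app.xml).
    static readonly XNamespace Ep =
        "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Returns the <Pages> value written by the last application that saved the file,
    // or null when the part or element is missing. Nothing here lays out the document.
    public static int? Read(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        ZipArchiveEntry entry = zip.GetEntry("docProps/app.xml");
        if (entry == null) return null;

        using Stream s = entry.Open();
        XElement pages = XDocument.Load(s).Root?.Element(Ep + "Pages");
        return int.TryParse(pages?.Value, out int n) ? n : (int?)null;
    }
}
```

Because the number is only as current as the last save by Word, it goes stale as soon as another tool changes the content, which is why rendering to PDF is needed for an up-to-date count.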

Word document not displaying header, footer and images when using WordProcessingDocument

I am using WordprocessingDocument to read and write content in a Word document, but when I open the document from a MemoryStream, the images and header/footer that are already in the Word document are not shown. Below is the code:
private void AddReport(MainDocumentPart parent, MemoryStream report)
{
using (MemoryStream editingMemoryStream = new MemoryStream())
{
report.Position = 0;
report.CopyTo(editingMemoryStream);
editingMemoryStream.Position = 0;
using (WordprocessingDocument newDoc = WordprocessingDocument.Open(editingMemoryStream, true))
{
WP.Body Template = newDoc.MainDocumentPart.Document.Body;
var Main = newDoc.MainDocumentPart;
var cloneTemplate = Template.CloneNode(true);
parent.Document.Body.PrependChild(new WP.Paragraph(new WP.Run(cloneTemplate)));
parent.Document.Save();
}
}
}
Screenshot for the word document:
Here, the parent document is the document to which I am prepending the above document. Any help will be appreciated. Thanks in advance.

The headers, footers and images are not part of the document body, so won't be carried over to another document in the described scenario.
All this information is stored in separate "xml parts" contained within the Word file's "zip package". The Body part contains only references (relationship IDs listed in a "rels" part that point/link to the relevant xml part contained in the package).
This can be seen by opening the document in the Open XML SDK Productivity Tool and inspecting the underlying Word Open XML.
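The same relationships can also be listed without the Productivity Tool by opening the package as a zip archive and reading the main part's relationship part. A minimal sketch with System.IO.Compression and System.Xml.Linq, assuming the conventional locations `word/document.xml` and `word/_rels/document.xml.rels`; `PartRelationships` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class PartRelationships
{
    static readonly XNamespace Rel =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Lists (Id, Type, Target) for each relationship of the main document part.
    // Headers, footers and images appear here as separate parts referenced by Id.
    public static List<(string Id, string Type, string Target)> Read(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        ZipArchiveEntry entry = zip.GetEntry("word/_rels/document.xml.rels");
        if (entry == null) return new List<(string Id, string Type, string Target)>();

        using Stream s = entry.Open();
        return XDocument.Load(s).Root
            .Elements(Rel + "Relationship")
            .Select(r => (Id: (string)r.Attribute("Id"),
                          Type: (string)r.Attribute("Type"),
                          Target: (string)r.Attribute("Target")))
            .ToList();
    }
}
```

Entries whose Type ends in `/header`, `/footer` or `/image` are the parts that a body-only clone leaves behind; each would have to be copied into the target package and re-linked under a new relationship ID.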
In order to copy such content to another document it's necessary to clone not only the body, but also each and every relevant xml part with content you want to have, while dynamically generating the necessary relationships - not a trivial undertaking. There are posts here and elsewhere in the Internet (including my blog, the WordMeister) that demonstrate the basics of how this is done which you could use as starting points for understanding the required approach.
Or, depending on what the Parent document is, it might make more sense to start with a copy of the "new" document and edit it with the other content.
FWIW, and mentioned here for the sake of completeness: the COM object model will do what is described. Copying the body of the document and pasting it into another document does carry over all this information, but that is because the Word application is doing all the "heavy lifting" that the developer needs to code when using the Open XML SDK.

iTextSharp, add barcode to page 1 after generating document

I am using iTextSharp (5.5.5.90) to generate PDF files. I am using paragraphs and importing pages from readers and such. Here is how I create my document, from there I just append what I need:
FileStream fs = new FileStream("filename.pdf", FileMode.Create, FileAccess.Write, FileShare.None);
Document doc = new Document(new Rectangle(PageSize.LETTER), 58, 58, 100, 50);
PdfWriter writer = PdfWriter.GetInstance(doc, fs);
Once the file is created, I add paragraphs like this:
doc.Add(new Paragraph("Paragraph text"));
And import pages from readers like this:
writer.DirectContent.AddTemplate(writer.GetImportedPage(reader, page), 0, 0);
My question is how would I go back to page one after generating the entire document and add an element to page one? I will be adding a barcode (I know how to add barcodes, tables and such where I want them on the current page), but I don't know how to "go back" to page one to add an element.
Here is the full code, but you won't be able to compile it because of dependencies. Also, don't get caught up in the details of the full code, as this is a large project to create dynamically generated documents. https://pastebin.com/kABi7fzW

I can't speak to the exact iTextSharp calls, because we approached our document generation quite differently: open a Word template, add the data as DataTables, etc., do a MailMerge, then close, reopen and save as PDF. It sounds more involved, but it doesn't require the granular level of detail you're using to create the document paragraph by paragraph, and it lets the document generator worry about content rather than style and placement (which is handled in Word, manually and outside the application).
From experience with iTextSharp, you'll have a lot of trouble trying to float an element on top of a section to insert the barcode. The document generation tool has an annoying tendency to not quite work in this scenario. We endured many weeks of back and forth with iTextSharp support and a version upgrade and still couldn't get it to behave properly in all scenarios.
As discussed in the comments and given how you've already written your code (I doubt you'll scrap all that code and start with a MailMerge unless you really, really have to), you'll need to insert a placeholder block that you can locate via iTextSharp's PdfBuilder api. I'd imagine that setting a bookmark location would likely be the easiest way.
If it's possible (preferable?) to have the barcode on a page of its own, then you already have the code needed to do this (circa line 324 in your pastebin link) with:
// create doc...
// reopen doc and get page count
doc.NewPage();
// add barcode with page count + 1
// save

Using the OpenXml SDK 2.0 to insert tables in a word document

I am just starting out with the OpenXML SDK 2.0 in Visual Studio 2010 (C#). I have automated office programs before using COM automation, which was painful.
I have a template made by one of our graphic designers, which will provide the foundation for my reports. In order to automate the simple things (plaintext items) I have added content controls to the template and bound a custom XML part to the doc. The content controls are as follows:
DayCount
AlternateJobTitle
Date
SignatureName
After making a copy of the template, I then edit the content controls and save the file with the following code:
//stand up object that reads the Word doc package
using (WordprocessingDocument doc = WordprocessingDocument.Open(docOutputPath, true))
{
//create XML string matching custom XML part
string newXml = "<root>" +
"<DayCount>42</DayCount>" +
"<AlternateJobTitle>Supervisor</AlternateJobTitle>" +
"<Date>9/24/2012</Date>" +
"<SignatureName>John Doe</SignatureName>" +
"</root>";
MainDocumentPart main = doc.MainDocumentPart;
main.DeleteParts<CustomXmlPart>(main.CustomXmlParts);
//add and write new XML part
CustomXmlPart customXml = main.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (StreamWriter ts = new StreamWriter(customXml.GetStream()))
{
ts.Write(newXml);
}
}
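One caveat with building newXml by string concatenation: a value such as "R&D Lead" produces invalid XML. A minimal sketch using System.Xml.Linq, which escapes values automatically; the element names are the content controls listed above, and `CustomXmlBuilder.Build` is a hypothetical helper:

```csharp
using System;
using System.Xml.Linq;

public static class CustomXmlBuilder
{
    // Builds the <root> payload for the custom XML part.
    // XElement escapes characters such as & and < in the values.
    public static string Build(int dayCount, string alternateJobTitle, string date, string signatureName)
    {
        var root = new XElement("root",
            new XElement("DayCount", dayCount),
            new XElement("AlternateJobTitle", alternateJobTitle),
            new XElement("Date", date),
            new XElement("SignatureName", signatureName));
        // No indentation and no XML declaration, matching the hand-built string.
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Its result can be passed to `ts.Write(...)` in place of `newXml`.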
This all works well. However, my document is not made up solely of standard text and plaintext updates. The real meat of the report is in a number of tables that need to be added to each report as well. I have been searching like crazy for a good description on how this is done, but have really not found anything. Is there some way to delineate where to place a table using the same content control logic used for plaintext controls? Any code samples I have found of creating a table using OpenXML have just assumed that you want to append it to the end of the main document part. I would like to specify where the tables need to go in the template, generate the tables and place them in the specified regions of the template. Is this possible?
Any help is greatly appreciated.

There are a lot of Open XML creation questions, but if you decide to take this path, the general answer is: examine the Open XML SDK Productivity Tool. On my PC it can be found at "C:\Program Files (x86)\Open XML SDK\V2.0\tool\OpenXmlSdkTool.exe". Just create the document you want in MS Word and use this tool to reflect the document's code. Good luck!

If you need to display tabled data, so far, the best thing I found is Word Document Generator at http://worddocgenerator.codeplex.com/.
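For the table question, one approach is to mark each table's location with a block-level content control in the template and swap that control for a table. A minimal sketch that works on the raw WordprocessingML of `word/document.xml` with System.Xml.Linq rather than the SDK's typed classes; the helper name, the tag values and the bare table layout (no borders, style or widths) are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ContentControlTables
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces the block-level content control (a w:sdt directly under w:body) whose
    // w:tag value equals `tag` with a plain table holding `rows`.
    // Returns false when no such control exists or there are no rows to insert.
    public static bool ReplaceWithTable(XDocument documentXml, string tag, IList<string[]> rows)
    {
        XElement sdt = documentXml.Descendants(W + "sdt").FirstOrDefault(e =>
            e.Parent?.Name == W + "body" &&
            (string)e.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val") == tag);
        if (sdt == null || rows.Count == 0) return false;

        int columns = Math.Max(1, rows.Max(r => r.Length));
        var table = new XElement(W + "tbl",
            new XElement(W + "tblPr"),
            new XElement(W + "tblGrid",
                Enumerable.Range(0, columns).Select(_ => new XElement(W + "gridCol"))),
            rows.Select(r => new XElement(W + "tr",
                Enumerable.Range(0, columns).Select(i => new XElement(W + "tc",
                    // A table cell must contain at least one paragraph; short rows get empty cells.
                    new XElement(W + "p",
                        new XElement(W + "r",
                            new XElement(W + "t", i < r.Length ? r[i] : ""))))))));

        sdt.ReplaceWith(table);
        return true;
    }
}
```

The XDocument would be loaded from, and written back to, the main document part; the same pattern works with one tag per table placeholder in the template.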

Build Word Document from template

I have a request to create a word document on the fly based on a template provided to me. I have done some research and everything seems to point at OpenXML. I have looked into that, but the cs file that gets created is over 15k lines and is breaking my VS 2010 (causing it to not respond every time I make a change).
I have been looking at this tutorial series on Open XML
http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/10/13/getting-started-with-open-xml-development.aspx
I have done things in the past with text files and Regular Expressions, but since Word encrypts everything, that does not work. Are there any other fairly lightweight options for creating Word documents from templates?

//Hi, it is quite simple.
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
//First, copy your template file to another location.
string sourcePath = "C:\\MyTemplate.dotx";
string destPath = "C:\\MyDocument.docx";
System.IO.File.Copy(sourcePath, destPath, true);
//After copying the file, open a WordprocessingDocument on the destination path.
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(destPath, true))
{
//Change the package type from template (.dotx) to document (.docx).
myDoc.ChangeDocumentType(WordprocessingDocumentType.Document);
//If you wish, you can edit your document, e.g. attach the original template to it.
MainDocumentPart mainPart = myDoc.MainDocumentPart;
DocumentSettingsPart mySettingsPart = mainPart.DocumentSettingsPart;
AttachedTemplate attachedTemplate1 = new AttachedTemplate() { Id = "MyRelationID" };
mySettingsPart.Settings.Append(attachedTemplate1);
mySettingsPart.AddExternalRelationship("http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate", new Uri(sourcePath, UriKind.Absolute), "MyRelationID");
//Finally, save your document.
mainPart.Document.Save();
}

I am currently working on something along these lines, and I have been making use of the Open XML SDK and OpenXmlPowerTools. The approach taken is to open the actual template file and put text into various placeholders within the template document. I have been using content controls as the placeholders.
The SDK tool for opening up a document has been invaluable for comparing documents and seeing how they are constructed. However, I have been heavily refactoring the code generated by the tool and removing sections that are not used at all.
I can't speak for .doc files, but .docx files are not encrypted; they are just zip files that contain XML files.
Eric White's blog has a large number of examples and code samples which have been very useful
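To illustrate that .docx files are ordinary zip archives of XML, the following minimal sketch opens one with nothing but System.IO.Compression and System.Xml.Linq and pulls out the paragraph text. It assumes the main part is at `word/document.xml` (its usual location) and only reads `w:t` runs, so tabs, breaks and fields are ignored; `DocxText` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Opens a .docx as a plain zip archive and returns the text of each paragraph
    // in word/document.xml by concatenating its w:t elements.
    public static List<string> ReadParagraphs(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found");

        using Stream s = entry.Open();
        return XDocument.Load(s)
            .Descendants(W + "p")
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
            .ToList();
    }
}
```

The same archive can be opened in any zip tool to browse the parts by hand.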
