Splitting word document into separate pages using c#

For the past hour I have been searching for code that splits a Word document into separate pages, and I found this question.
Using the code from that thread:
static class PagesExtension {
public static IEnumerable<Range> Pages(this Document doc) {
int pageCount = doc.Range().Information[WdInformation.wdNumberOfPagesInDocument];
int pageStart = 0;
for (int currentPageIndex = 1; currentPageIndex <= pageCount; currentPageIndex++) {
var page = doc.Range(
pageStart
);
if (currentPageIndex < pageCount) {
//page.GoTo returns a new Range object, leaving the page object unaffected
page.End = page.GoTo(
What: WdGoToItem.wdGoToPage,
Which: WdGoToDirection.wdGoToAbsolute,
Count: currentPageIndex+1
).Start-1;
} else {
page.End = doc.Range().End;
}
pageStart = page.End + 1;
yield return page;
}
yield break;
}
}
I call the code above like this:
var app = new Microsoft.Office.Interop.Word.Application();
object missObj = System.Reflection.Missing.Value;
app.Visible = false;
var doc = app.Documents.Open(fileLocation);
int pageNumber = 1;
foreach (var page in doc.Pages())
{
Microsoft.Office.Interop.Word.Document newDoc = app.Documents.Add(ref missObj, ref missObj, ref missObj, ref missObj);
page.Copy();
var doc2 = app.Documents.Add();
doc2.Range().Paste();
object newDocName = pageNumber.ToString() + ".docx";
Console.WriteLine(newDocName);
doc2.SaveAs2(newDocName, Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatXMLDocument,
CompatibilityMode: Microsoft.Office.Interop.Word.WdCompatibilityMode.wdWord2010);
pageNumber++;
}
app.ActiveDocument.Close();
app.Quit();
But I get an error with one specific document. Here is the error:
This method or property is not available because no text is selected.
What is the reason for it? I checked the document and found that it contains a lot of spaces before the next page. How can I solve this?
Also, the code above does not copy the header and footer. Thank you.
Update: Error
This method or property is not available because no text is selected.
at Microsoft.Office.Interop.Word.Range.Copy()
at retrieveObjects(String location) in Document.cs:line 31
and this is the line that throws:
page.Copy();

Related

How to get and set value for /BSIColumnData of an annotation PDF itext c#

How can I get and set the value of /BSIColumnData for an annotation (markup) in a PDF using iText in C#, as in the attached file?
I'm using the iText 7 code below, but it fails at BSIColumnData:
public void BSIcontents ()
{
string pdfPath = @"C:\test PDF.pdf";
iText.Kernel.Pdf.PdfReader pdfReader = new iText.Kernel.Pdf.PdfReader(pdfPath);
iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(pdfReader);
int numberOfPages = pdfDoc.GetNumberOfPages();
int z = 0;
for (int i = 1; i <= numberOfPages; i++)
{
iText.Kernel.Pdf.PdfDictionary page = pdfDoc.GetPage(i).GetPdfObject();
iText.Kernel.Pdf.PdfArray annotArray = page.GetAsArray(iText.Kernel.Pdf.PdfName.Annots);
if (annotArray == null)
{
z++;
continue;
}
int size = annotArray.Size();
for (int x = 0; x < size; x++)
{
iText.Kernel.Pdf.PdfDictionary curAnnot = annotArray.GetAsDictionary(x);
if (curAnnot != null)
{
if (curAnnot.GetAsString(iText.Kernel.Pdf.PdfName.BSIColumnData) != null)
{
MessageBox.Show("BSIColumnData: " + curAnnot.GetAsString(iText.Kernel.Pdf.PdfName.BSIColumnData).ToString());
}
}
}
}
pdfReader.Close();
}
In Bluebeam Revu, it looks like this:
In iText RUPS 5.5.9, it looks like this:

I see two errors:
1. You try to use the BSIColumnData name like this:
iText.Kernel.Pdf.PdfName.BSIColumnData
This assumes that there is already a static PdfName member for your custom name. But there isn't one: PdfName only has predefined members for the standard names that iText itself uses. If you want to work with other names, you have to create a PdfName instance yourself and use that instance, e.g. like this:
var BSIColumnData = new iText.Kernel.Pdf.PdfName("BSIColumnData");
2. You try to retrieve the value of that name as a string:
curAnnot.GetAsString(iText.Kernel.Pdf.PdfName.BSIColumnData)
But it is clear from your RUPS screenshot that the value of that name is an array of strings. Thus, even after the correction described in the first item, GetAsString(BSIColumnData) will return null. Instead do:
var BSIColumnData = new iText.Kernel.Pdf.PdfName("BSIColumnData");
var array = curAnnot.GetAsArray(BSIColumnData);
After checking if (array != null), you can then access the strings at their respective indices using array.GetAsString(index).

Split PDF by chapters from Table Of Contents

I'm using GemBox.Pdf and I need to extract the individual chapters of a PDF file as separate PDF files.
The first page (and possibly the second page as well) contains the TOC (Table Of Contents), and I need to split the rest of the PDF pages based on it:
Also, the split PDF documents should be named after the chapters they contain.
I can split the PDF by a fixed number of pages per document (I figured that out using this example):
using (var source = PdfDocument.Load("Chapters.pdf"))
{
int pagesPerSplit = 3;
int count = source.Pages.Count;
for (int index = 1; index < count; index += pagesPerSplit)
{
using (var destination = new PdfDocument())
{
for (int splitIndex = 0; splitIndex < pagesPerSplit; splitIndex++)
destination.Pages.AddClone(source.Pages[index + splitIndex]);
destination.Save("Chapter " + index + ".pdf");
}
}
}
But I can't figure out how to read and process the TOC and split the chapters based on its items.

You should iterate through the document's bookmarks (outlines) and split it based on the bookmark destination pages.
For instance, try this:
using (var source = PdfDocument.Load("Chapters.pdf"))
{
PdfOutlineCollection outlines = source.Outlines;
PdfPages pages = source.Pages;
Dictionary<PdfPage, int> pageIndexes = pages
.Select((page, index) => new { page, index })
.ToDictionary(item => item.page, item => item.index);
for (int index = 0, count = outlines.Count; index < count; ++index)
{
PdfOutline outline = outlines[index];
PdfOutline nextOutline = index + 1 < count ? outlines[index + 1] : null;
int pageStartIndex = pageIndexes[outline.Destination.Page];
int pageEndIndex = nextOutline != null ?
pageIndexes[nextOutline.Destination.Page] :
pages.Count;
using (var destination = new PdfDocument())
{
while (pageStartIndex < pageEndIndex)
{
destination.Pages.AddClone(pages[pageStartIndex]);
++pageStartIndex;
}
destination.Save($"{outline.Title}.pdf");
}
}
}
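The range computation in the loop above can be looked at in isolation. The following is a minimal sketch that uses no GemBox types at all; the class name ChapterRanges and the method FromStarts are made up for illustration. It assumes the chapter start pages are zero-based and already sorted, as they would be if taken from pageIndexes[outline.Destination.Page] for each bookmark in order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of GemBox.Pdf.
static class ChapterRanges
{
    // Turns sorted, zero-based chapter start pages into half-open
    // [Start, End) ranges; the last chapter runs to pageCount.
    public static List<(int Start, int End)> FromStarts(IReadOnlyList<int> starts, int pageCount)
    {
        var ranges = new List<(int Start, int End)>();
        for (int i = 0; i < starts.Count; i++)
        {
            // A chapter ends where the next one begins, or at the end of the file.
            int end = i + 1 < starts.Count ? starts[i + 1] : pageCount;
            ranges.Add((starts[i], end));
        }
        return ranges;
    }
}
```

Each range then corresponds to one pass of the while loop that clones pages from pageStartIndex up to, but not including, pageEndIndex.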
Note that, judging from the screenshot, your chapter bookmarks include ordinal numbers (Roman numerals). If needed, you can easily remove those with something like this:
destination.Save($"{outline.Title.Substring(outline.Title.IndexOf(' ') + 1)}.pdf");
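One thing to watch with the Substring approach is that it drops the first word of every title, numeral or not, and that a title may contain characters that are not allowed in file names. Here is a small sketch of a more defensive variant. The helper ChapterFileNames.FromTitle is a made-up name, and the assumption that prefixes look like "II " or "IV. " is mine, not something confirmed by the question.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper for building output file names from bookmark titles.
static class ChapterFileNames
{
    // Assumed prefix shape: upper-case Roman numeral, optional dot, then whitespace.
    private static readonly Regex NumeralPrefix =
        new Regex(@"^[IVXLCDM]+\.?\s+", RegexOptions.Compiled);

    public static string FromTitle(string title)
    {
        // Strip the numeral prefix only when one is actually present.
        string name = NumeralPrefix.Replace(title.Trim(), "");

        // Replace characters the current platform rejects in file names.
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');

        return name + ".pdf";
    }
}
```

It could be used as destination.Save(ChapterFileNames.FromTitle(outline.Title)). A title whose first word happens to be made only of the letters I, V, X, L, C, D, M (for example "I Am Legend") would still lose that word, so check it against your real bookmark titles.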

How to use advanced find in MS Word with .NET C#

I have the following problem: I need to use advanced find to locate specific text (based on the font of the text, the text string, wildcards, ...) and write the page on which I found that text to a Notepad file.
I see that C# .NET has the Find.Execute method, but I don't know whether it can do this. I have googled around without success.
My idea is something like this code:
using Microsoft.Office.Core;
using Word = Microsoft.Office.Interop.Word;
using System.Reflection;
...
Word.Application oWord;
Word._Document oDoc;
oWord = new Word.Application();
oWord.Visible = false;
oDoc = oWord.Documents.Open(strPath, ReadOnly: true);
Word.Range findRange;
Word.Range resultRange;
int nPage;
//Get the range of the whole word document
int nEnd;
nEnd = oDoc.Paragraphs.Last.Range.Sentences.First.End;
findRange = oDoc.Range(0, nEnd);
//Setup find condition
//The color of the found text: RGB(243,99, 195) . Please help!
//Execute find --> Loop until not found anymore
{
//findRange.Find.Execute... Please help!
//Get the range of the found text
//resultRange = ... Please help!
//Get page of the result range
nPage = resultRange.get_Information(Word.WdInformation.wdActiveEndPageNumber);
//Do anything you like with nPage
}
//Close the process
oDoc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
((Word._Application)oWord).Quit(Word.WdSaveOptions.wdDoNotSaveChanges);
Thank you in advance.

Thank God, I found my solution.
After I read:
This article, to find out how to loop the find-next feature.
This article, to find out that I must use Find.Font.Color instead of Find.Font.TextColor.RGB.
This article, to get the page range (the code is pretty unclean, but usable).
OK, here it goes:
Word.Application oWord;
Word._Document oDoc;
oWord = new Word.Application();
oWord.Visible = false;
oDoc = oWord.Documents.Open(strWorkingPath, ReadOnly: true);
//===================Execute===================
/*Word 2013*/
oWord.ActiveWindow.View.ReadingLayout = false;
// Get pages count
Word.WdStatistic PagesCountStat = Word.WdStatistic.wdStatisticPages;
int nTotalPage = oDoc.ComputeStatistics(PagesCountStat);
int nEndOfTheDoc = oDoc.Paragraphs.Last.Range.Sentences.First.End;
int nStart = 0;
int nEnd = nEndOfTheDoc;
List<int> lstPage = new List<int>();
int color = 696969; // You can get this value by reading the Range's Font.Color in the debugger
Word.Range findRange;
object What = Microsoft.Office.Interop.Word.WdGoToItem.wdGoToPage;
object Which = Microsoft.Office.Interop.Word.WdGoToDirection.wdGoToAbsolute;
object nCrtPage;
object nNextPage;
bool bPageIsIn = false;
/*Loop the pages*/
for (int i = 1; i <= nTotalPage; i++)
{
/*Get the start and end position of the current page*/
nCrtPage = i;
nNextPage = i + 1;
nStart = oWord.Selection.GoTo(ref What,
ref Which, ref nCrtPage).Start;
nEnd = oWord.Selection.GoTo(ref What,
ref Which, ref nNextPage).End;
/*The last page: nStart will equal nEnd*/
if(nStart == nEnd)
{
/*Set nEnd for the last page*/
nEnd = nEndOfTheDoc;
}
/*Set default for Count page trigger*/
bPageIsIn = false;
/*Set the find range is the current page range*/
findRange = oDoc.Range(nStart, nEnd);
/*Set up find condition*/
findRange.Find.Font.Color = (Word.WdColor)color;
findRange.Find.Format = true;
findRange.Find.Text = "^?";
do
{
/*Loop find next*/
findRange.Find.Execute();
/*If found*/
if (findRange.Find.Found)
{
/*If found data is still in the page*/
if (findRange.End <= nEnd)
{
/*If found data is visible by human eyes*/
if (!string.IsNullOrWhiteSpace(findRange.Text))
{
/*Ok, count this page*/
bPageIsIn = true;
break;/*no need to find anymore for this page*/
}
}
}
else
break;/*no need to find anymore for this page*/
}while (findRange.End < nEnd);/*Make sure it is in that page only*/
if (bPageIsIn)
lstPage.Add(i);
}
//===================Close===================
oDoc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
((Word._Application)oWord).Quit(Word.WdSaveOptions.wdDoNotSaveChanges);
foreach (var item in lstPage)
{
builder.AppendLine(item.ToString());//Do anything you like with the list page
}
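About the color value: instead of reading it from the debugger, it can be computed from the RGB components. Word stores colors with red in the lowest byte and blue in the highest (for example wdColorRed is 255 and wdColorBlue is 16711680), so RGB(243, 99, 195) from the question is not simply the three numbers glued together. A minimal sketch follows; the class WordColor and its method names are made up for illustration.

```csharp
using System;

// Hypothetical helper; only plain integer arithmetic, no Interop types.
static class WordColor
{
    // Packs the components as 0x00BBGGRR: red low byte, blue high byte.
    public static int FromRgb(int r, int g, int b) => r | (g << 8) | (b << 16);

    // Unpacks a Word color value back into its red, green and blue parts.
    public static (int R, int G, int B) ToRgb(int color) =>
        (color & 0xFF, (color >> 8) & 0xFF, (color >> 16) & 0xFF);
}
```

With this, the find condition could be set as findRange.Find.Font.Color = (Word.WdColor)WordColor.FromRgb(243, 99, 195); and WordColor.ToRgb(696969) shows which RGB color the value used above stands for.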

Adding multiple word tables into word using interop

I'm trying to insert multiple tables into a Word document using C#, but when I add another block of code to add a second table, I get an error and the second table is not inserted. How do I move the range down to the bottom of the page and then add another table? I tried creating a new range using the end-of-document reference, but this doesn't seem to work. Can anyone help?
Word._Application objApp;
Word._Document objDoc;
try
{
object objMiss = System.Reflection.Missing.Value;
object objEndOfDocFlag = "\\endofdoc"; /* \endofdoc is a predefined bookmark */
//Start Word and create a new document.
objApp = new Word.Application();
objApp.Visible = true;
objDoc = objApp.Documents.Add(ref objMiss, ref objMiss,
ref objMiss, ref objMiss);
//Insert a paragraph at the end of the document.
Word.Paragraph objPara2; //define paragraph object
object oRng = objDoc.Bookmarks.get_Item(ref objEndOfDocFlag).Range; //go to end of the page
objPara2 = objDoc.Content.Paragraphs.Add(ref oRng); //add paragraph at end of document
objPara2.Range.Text = "Test Table Caption"; //add some text in paragraph
objPara2.Format.SpaceAfter = 10; //define some style
objPara2.Range.InsertParagraphAfter(); //insert paragraph
//Insert a table
Word.Table objTab1; //create table object
Word.Range objWordRng = objDoc.Bookmarks.get_Item(ref objEndOfDocFlag).Range; //go to end of document
objTab1 = objDoc.Tables.Add(objWordRng, 9, 2, ref objMiss, ref objMiss); //add table object in word document
objTab1.Range.ParagraphFormat.SpaceAfter = 6;
objTab1.Range.Borders[Word.WdBorderType.wdBorderBottom].LineStyle = Word.WdLineStyle.wdLineStyleThickThinLargeGap;
objTab1.Range.Borders[Word.WdBorderType.wdBorderHorizontal].LineStyle = Word.WdLineStyle.wdLineStyleDouble;
objTab1.Range.Borders[Word.WdBorderType.wdBorderTop].LineStyle = Word.WdLineStyle.wdLineStyleDouble;
objTab1.Range.Borders[Word.WdBorderType.wdBorderLeft].LineStyle = Word.WdLineStyle.wdLineStyleDouble;
objTab1.Range.Borders[Word.WdBorderType.wdBorderRight].LineStyle = Word.WdLineStyle.wdLineStyleDouble;
objTab1.Columns.Borders[Word.WdBorderType.wdBorderVertical].LineStyle = Word.WdLineStyle.wdLineStyleDouble;
objTab1.Columns[1].Shading.BackgroundPatternColor = Word.WdColor.wdColorGray20;
objTab1.Columns[1].Width = objApp.CentimetersToPoints(3.63f);
objTab1.Columns[2].Width = objApp.CentimetersToPoints(13.11f);
int iRow, iCols;
string[] col = new string[9];
col[0] = "Row1";
col[1] = "row2";
col[2] = "Row3";
col[3] = "row4";
col[4] = "row5";
col[5] = "row6";
col[6] = "row7";
col[7] = "row8";
col[8] = "tow9";
for (iRow = 1; iRow <= 9; iRow++)
{
objTab1.Rows[iRow].Range.Font.Bold = 1;
for (int i = 0; i <= col.Length; i++)
{
string s = col[i];
objTab1.Rows[iRow++].Range.Text = s;
objTab1.Rows[iRow].Range.Font.Bold = 1;
}
}
objApp.Selection.TypeParagraph();
//Insert a paragraph at the end of the document.
Word.Paragraph objPara3; //define paragraph object
object oRng2 = objDoc.Bookmarks.get_Item(ref objEndOfDocFlag).Range; //go to end of the page
objPara3 = objDoc.Content.Paragraphs.Add(ref oRng2); //add paragraph at end of document
objPara3.Range.Text = "hello"; //add some text in paragraph
objPara3.Format.SpaceAfter = 10; //define some style
objPara3.Range.InsertParagraphAfter(); //insert paragraph
//Insert a 2 x 2 table, (table with 2 row and 2 column)
Word.Table objTab2; //create table object
Word.Range objWordRng2 = objDoc.Bookmarks.get_Item(ref objEndOfDocFlag).Range; //go to end of document
objTab2 = objDoc.Tables.Add(objWordRng2, 9, 2, ref objMiss, ref objMiss); //add table object in word document
objTab2.Range.ParagraphFormat.SpaceAfter = 6;
object stylename2 = "Table Grid";
I get the following exception: "The requested member of the collection does not exist".

I don't fully follow how you want the layout to appear, but there are a couple of issues with the posted code. First, in the for loops where you add the text to the first table, I am not sure what you are doing with the following lines:
objTab1.Rows[iRow++].Range.Text = s;
objTab1.Rows[iRow].Range.Font.Bold = 1;
The iRow++ increment in the first line is going to throw off where the row is in the table. I am guessing you may want:
objTab1.Rows[iRow].Range.Font.Bold = 1;
objTab1.Rows[iRow].Range.Text = s;
iRow++;
The other issue is how the code gets the last paragraph, as shown below:
object oRng2 = objDoc.Bookmarks.get_Item(ref objEndOfDocFlag).Range;
objPara3 = objDoc.Content.Paragraphs.Add(ref oRng);
The oRng2 range is the end-of-document range; however, the next line uses oRng, which is the top of the document. Changing the Paragraphs.Add call to use the proper range should fix this:
objPara3 = objDoc.Content.Paragraphs.Add(ref oRng2);
Hope this helps.
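To make the row issue concrete: in the posted code the inner loop runs i from 0 through col.Length and increments iRow each time, so it eventually asks for a row past the ninth, which would fit the "requested member of the collection does not exist" message. A single loop that pairs each label with its 1-based row avoids that. The sketch below only builds the pairing and does not touch Word; TableFill.Plan is a made-up helper name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; pure C#, no Interop types.
static class TableFill
{
    // Pairs each label with its 1-based row number, never exceeding rowCount.
    public static List<(int Row, string Text)> Plan(string[] labels, int rowCount)
    {
        var plan = new List<(int Row, string Text)>();
        int n = Math.Min(labels.Length, rowCount);
        for (int i = 0; i < n; i++)
            plan.Add((i + 1, labels[i])); // array index i maps to table row i + 1
        return plan;
    }
}
```

In the question's code this would replace both for loops: iterate over TableFill.Plan(col, objTab1.Rows.Count) and, for each pair, set objTab1.Rows[row].Range.Text = text and objTab1.Rows[row].Range.Font.Bold = 1.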

How to execute all process parallel inside for loop in winforms C# application

I have created a function that splits a single file into multiple files. For example, I have one file that contains 100 pages and I would like to create a new file for every 15 pages. That produces 7 files: six with 15 pages each and a last one with the remaining 10 pages (100 / 15, rounded up, is 7).
My problem is that I have implemented the splitting logic using Thread, ISynchronizeInvoke and a delegate to keep the process and the user experience smooth. It works as expected, but I would like to perform the splits in parallel instead of one by one.
The following code is in the Splitter.cs file:
#region Private Variables
private NotifyProgress _notifyDelegate;
private Thread _thread;
private ISynchronizeInvoke _synchronizingObject;
//this is the definition of the progress delegate - it defines the "signature" of the routine...
public delegate void NotifyProgress(int TotalFiles, int ProcessFileIndex, int TotalPages, int PageIndex, string Size);
#endregion
private void splitFiles()
{
//Intialize a new PdfReader instance with the contents of the source Pdf file:
PdfReader Reader = new PdfReader(PDFFile);
Reader.RemoveUnusedObjects();
Reader.ConsolidateNamedDestinations();
for (int f = 0; f < Files.Count(); f++)
{
List<int> Pages = Files[f].Pages;
string FileName = (f + 1).ToString();
string folderPath = Path.Combine(OutputFolderPath, GeneratedFilesFolder);
string gereatedPath = Path.Combine(folderPath, string.Format(FileNameFormat, FileName));
//This method will create folder if the path doesn't exist
CreateFolder(folderPath);
PdfImportedPage importedPage = null;
Document currentDocument = new Document();
PdfSmartCopy pdfWriter = null;
bool bIsFirst = true;
long _size = 0;
for (int p = 0; p < Pages.Count; p++)
{
NotifyUI(Files.Count(), f, Pages.Count, p, _size);
if (bIsFirst)
{
bIsFirst = false;
currentDocument = new Document(Reader.GetPageSizeWithRotation(1));
pdfWriter = new PdfSmartCopy(currentDocument, new FileStream(gereatedPath, FileMode.Create));
pdfWriter.SetFullCompression();
//pdfWriter.CompressionLevel = PdfStream..BEST_COMPRESSION;
pdfWriter.PdfVersion = Reader.PdfVersion;
currentDocument.Open();
}
_size += pdfWriter.CurrentDocumentSize;
importedPage = pdfWriter.GetImportedPage(Reader, Pages[p]);
pdfWriter.AddPage(importedPage);
}
currentDocument.Close();
pdfWriter.Close();
FileInfo _f = new FileInfo(gereatedPath);
NotifyUI(Files.Count(), f, Pages.Count, Pages.Count - 1, _f.Length);
}
}
private void NotifyUI(int TotalFiles, int ProcessFileIndex, int TotalPages, int PageIndex, long Size)
{
//this method will fail because we're not telling the delegate which thread to run in...
object[] args = { TotalFiles, ProcessFileIndex + 1, TotalPages, PageIndex + 1, CalculateFileSize(Size) };
//call the delegate, specifying the context in which to run...
_synchronizingObject.Invoke(_notifyDelegate, args);
}
The following code is in the Form.cs file:
private void DelegateProgress(int TotalFiles, int ProcessFileIndex, int TotalPages, int PageIndex, string Size)
{
if (splitter != null && PDFSplitter.TotalPages > 0)
{
this.Invoke((MethodInvoker)delegate
{
int IndividualProgress = PageIndex * 100 / TotalPages;
lstFiles.Items[ProcessFileIndex - 1].SubItems[1].Text = Size;
TextProgressBar pb = (TextProgressBar)lstFiles.GetEmbeddedControl(3, ProcessFileIndex - 1);
pb.Text = string.Format("{0:00} %", IndividualProgress);
pb.Value = IndividualProgress;
int OverallProgress = ProcessFileIndex * 100 / TotalFiles;
ProgressStripItem statusProgrss = (ProgressStripItem)tsStatus.Items[3];
statusProgrss.TextProgressBar.Value = OverallProgress;
statusProgrss.TextProgressBar.Text = string.Format("{0:00}%", OverallProgress);
if (OverallProgress >= 100 && IndividualProgress >= 100)
{
tslblMessage.Text = "File has been split successfully.";
DialogResult dr = MessageBox.Show(tslblMessage.Text + "\nDo you want to open split files folder?", "Split Completed", MessageBoxButtons.YesNo, MessageBoxIcon.Information);
if (dr == System.Windows.Forms.DialogResult.Yes)
{
OpenSplitFilePath();
}
for (int i = 0; i < lstFiles.Items.Count; i++)
{
lstFiles.RemoveEmbeddedControl(lstFiles.GetEmbeddedControl(3, i));
}
lstFiles.Items.Clear();
splitter = null;
}
});
}
}

Have you tried a Parallel.For loop? Here's a simple example.
Parallel.For(0, 10, i =>
{
//What you would like to do simultaneously.
System.Diagnostics.Debug.WriteLine(i);
});
If you build and run this simple code, you will notice that the ten values 0 through 9 are written in no fixed order. One possible output looks like this:
8
4
5
7
2
3
9
1
6
0

This can probably be achieved with Parallel.ForEach. See the following link for how to use it: http://msdn.microsoft.com/en-us/library/dd460720.aspx
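Putting the two suggestions together for the splitting case, here is a rough sketch of grouping the pages into files and running one task per file. It leaves iTextSharp out entirely: ParallelSplit, Chunk, Run and the writeFile callback are made-up names, and the callback stands in for the body of the outer for loop in splitFiles. I am not certain that the PdfReader from the question can safely be shared between threads, so the sketch assumes each callback opens its own reader and writer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper; the actual PDF work happens in the caller's callback.
static class ParallelSplit
{
    // Groups pages 1..totalPages into consecutive chunks of at most pagesPerFile.
    public static List<List<int>> Chunk(int totalPages, int pagesPerFile)
    {
        var chunks = new List<List<int>>();
        for (int first = 1; first <= totalPages; first += pagesPerFile)
        {
            int count = Math.Min(pagesPerFile, totalPages - first + 1);
            chunks.Add(Enumerable.Range(first, count).ToList());
        }
        return chunks;
    }

    // Runs writeFile once per chunk in parallel. Each result is stored at its
    // chunk's index, so the result order does not depend on completion order.
    public static T[] Run<T>(List<List<int>> chunks, Func<int, List<int>, T> writeFile)
    {
        var results = new T[chunks.Count];
        Parallel.For(0, chunks.Count, i => results[i] = writeFile(i, chunks[i]));
        return results;
    }
}
```

For the 100-page example, ParallelSplit.Chunk(100, 15) gives seven groups, the last one holding pages 91 to 100. Progress updates would still have to go through _synchronizingObject.Invoke as in NotifyUI, since the callbacks run on worker threads, and the calls will now arrive interleaved across files.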
