c# / openxml merge of word documents to one, fails --closed--

c# / openxml merge of word documents to one, fails --closed-- - c#

I have to merge several word documents with a small c# console application. So far so good. The documents are generated in arcplan. Around 30 files are generated, but somehow some documents are corrupted, but still shows me Content.
If I merge now all files which are correct my document is fine but if i have a corrupted file in my bunch of files any corrupted generates an empty page. I debugged it of course, but i dont see anything going wrong which explains the empty page.
the arguments are like this:
"C:\temp\Report_C_01.docx" "C:\temp\Report_D_01.docx" "C:\temp\Report_E_01.docx"
here´s my Code:
public static void Merge(params String[] filepaths)
{
String pathName = Path.GetDirectoryName(filepaths[0]);
subfolder = Path.Combine(pathName, "Output\\"); //Wird für den gemergten File benötigt
if (filepaths != null && filepaths.Length > 1)
{
WordprocessingDocument myDoc = WordprocessingDocument.Open(#filepaths[0], true); //Wordfiles werden geöffnet
MainDocumentPart mainPart = myDoc.MainDocumentPart;
for (int i = 1; i < filepaths.Length; i++)
{
String altChunkId = "AltChunkId" + i;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
FileStream fileStream = File.Open(#filepaths[i], FileMode.Open);
chunk.FeedData(fileStream);
DocumentFormat.OpenXml.Wordprocessing.AltChunk altChunk = new DocumentFormat.OpenXml.Wordprocessing.AltChunk();
altChunk.Id = altChunkId;
//new page, if you like it...
mainPart.Document.Body.AppendChild(new Paragraph(new Run(new Break() { Type = BreakValues.Page } )));
//next document
mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
}
mainPart.Document.Save();
myDoc.Close();
for (int i = 0; i < 1; i++)
{
String fileNameWE = Path.GetFileName(filepaths[i]);
File.Copy(filepaths[i], subfolder + fileNameWE);
}
foreach (String fp in filepaths)
{
File.Delete(fp);
}
}
else
{
Console.WriteLine("Nur 1 Argument");
}
}
Hope someone can help me.
Best regards
Christian

Fixed it. It seems word cannot merge different Formats in one Document. So if you have 2 documents with a footer and other 3 without it just won´t work. Obviously it can happen that some customers have these kind of issues; at least the Code is fine

Related

Pdfsharp Out of Memory Exception when Combine Multi Pdf File

I have to convert into a single pdf a large number (but undefined) pdf into one for this, I'm using the code PDFsharp here.
// Get some file names
string[] files = filesToPrint.ToArray();
// Open the output document
PdfDocument outputDocument = new PdfDocument();
PdfPage newPage;
int nProcessedFile = 0;
int nMemoryFile = 5;
int nStepConverted = 0;
String sNameLastCombineFile = "";
// Iterate files
foreach (string file in files)
{
// Open the document to import pages from it.
PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import);
// Iterate pages
int count = inputDocument.PageCount;
for (int idx = 0; idx < count; idx++)
{
// Get the page from the external document...
PdfPage page = inputDocument.Pages[idx];
// ...and add it to the output document.
outputDocument.AddPage(page);
}
nProcessedFile++;
if (nProcessedFile >= nMemoryFile)
{
//nProcessedFile = 0;
//nStepConverted++;
//sNameLastCombineFile = "ConcatenatedDocument" + nStepConverted.ToString() + " _tempfile.pdf";
//outputDocument.Save(sNameLastCombineFile);
//outputDocument.Close();
}
}
// Save the document...
const string filename = "ConcatenatedDocument1_tempfile.pdf";
outputDocument.Save(filename);
// ...and start a viewer.
Process.Start(filename);
For small numbers of files the code works but then at some point
generates an exception of out of memory
is there a solution?
p.s
I was thinking of saving the files in step and then the remaining aggiungingere so liebrare memory but I can not find the way.
UPDATE1:
if (nProcessedFile >= nMemoryFile)
{
nProcessedFile = 0;
//nStepConverted++;
sNameLastCombineFile = "ConcatenatedDocument" + nStepConverted.ToString() + " _tempfile.pdf";
outputDocument.Save(sNameLastCombineFile);
outputDocument.Close();
outputDocument = PdfReader.Open(sNameLastCombineFile,PdfDocumentOpenMode.Modify);
}
UPDATE 2 versione 1.32
Complete example
Error on line:
PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import);
Text error:
Cannot handle iref streams. The current implementation of PDFsharp cannot handle this PDF feature introduced with Acrobat 6.
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
List<String> filesToPrint = new List<string>();
filesToPrint = Directory.GetFiles(#"D:\Downloads\RACCOLTA\FILE PDF", "*.pdf").ToList();
// Get some file names
string[] files = filesToPrint.ToArray();
// Open the output document
PdfDocument outputDocument = new PdfDocument();
PdfPage newPage;
int nProcessedFile = 0;
int nMemoryFile = 5;
int nStepConverted = 0;
String sNameLastCombineFile = "";
try
{
// Iterate files
foreach (string file in files)
{
// Open the document to import pages from it.
PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import);
// Iterate pages
int count = inputDocument.PageCount;
for (int idx = 0; idx < count; idx++)
{
// Get the page from the external document...
PdfPage page = inputDocument.Pages[idx];
// ...and add it to the output document.
outputDocument.AddPage(page);
}
nProcessedFile++;
if (nProcessedFile >= nMemoryFile)
{
nProcessedFile = 0;
//nStepConverted++;
sNameLastCombineFile = "ConcatenatedDocument" + nStepConverted.ToString() + " _tempfile.pdf";
outputDocument.Save(sNameLastCombineFile);
outputDocument.Close();
inputDocument = PdfReader.Open(sNameLastCombineFile , PdfDocumentOpenMode.Modify);
}
}
// Save the document...
const string filename = "ConcatenatedDocument1_tempfile.pdf";
outputDocument.Save(filename);
// ...and start a viewer.
Process.Start(filename);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.ReadKey();
}
}
}
}
UPDATE3
Code that generate exception out of memory
int count = inputDocument.PageCount;
for (int idx = 0; idx < count; idx++)
{
// Get the page from the external document...
newPage = inputDocument.Pages[idx];
// ...and add it to the output document.
outputDocument.AddPage(newPage);
newPage.Close();
}
I can not exactly which row general exception

I had a simular issue, saving, closing and reopening the PdfDocument did not really help.
I am adding al lot (100+) large (upto 5Mb) images (tiff, jpg, etc) to a pdf document where every images has its own page. It crashed around image #50. After the save-close-reopen it did finish the whole document but was still getting close to max memory, around 3Gb. Some more images and it would still crash.
After more refining, I implemented a using for the XGraphics object, it was a little better again but not much.
The big step forward was disposing of the XImage within the loop! After that the application never used more than 100-200Kb, I removed the save-close-reopen for the PdfDocument and it was no problem.

After saving and closing outputDocument (the code is commented out in your snippet), you have to open outputDocument again, using PdfDocumentOpenMode.Modify.
It could help to add using(...) for the inputDocument.
If your code is running as a 32-bit process, then switching to 64 bit will allow your process to use more than 2 GB of RAM (assuming your computer has more than 2 GB RAM).
Update: The message "Cannot handle iref streams" means you have to use PDFsharp 1.50 Prerelease, available on NuGet.

MailMerge with OpenXML

I have a sharepoint hosted application which contains a docx template file which has mailmerge fields like << Customer_Name >>.
It is a 1 Page document. I have to create a new docx file from this template which might contain multiple pages depending upon the number of customer. The content will be repeated and the merge fields has to be replaced with data from datatable for each page.
I tried using AltChunk but after using this method i cannot find and replace the text fields.
using (WordprocessingDocument template = WordprocessingDocument.Open(documentStream, true))
{
template.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
MainDocumentPart mainPart = template.MainDocumentPart;
for (int i = 0; i < dt.Rows.Count; i++)
{
if (dt.Rows[i][1].ToString() != "")
{
ReplaceText(mainPart, "«customer_Address»", dt.Rows[i][1].ToString());
string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 15);
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
chunk.FeedData(filestream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document
.Body
.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
}
I am using ReplaceText Method to Replace text from paragraph.
private static void ReplaceText(MainDocumentPart docPart, string match, string value)
{
var body = docPart.Document.Body;
foreach (var text in body.Descendants<Text>())
{
if (text.Text.Contains(match))
{
text.Text = text.Text.Replace(match, value);
}
}
}
This ReplaceText works fine for origional mainPart but does nothing for text added using AltChunk.
What would be easier way to generate multi page document in my case?

Import pdf at a specific page

I have the requirement to merge pdf together. I need to import a pdf at a specific page into another one.
Let me illustrate this to you.
I have two pdf, first one is 50 pages long and the second one is 4pages long. I need to import the second one at the 13th page of the first pdf.
I don't find any exemple. There are plenty exemple on how to merge pdf but nothing about merging at a specific page.
Based on this exemple it look like I need to iterate over all pages one by one and import them in a new pdf. That look a bit painfull espicially if you have big pdf and need to merge many. I would create x new pdf to merge x+1 pdf.
Is there something I don't understand or is it really the way to go?

Borrowing from the example, this should be easy to do with a few modifications. You just need to add all the pages before the merge, then all the pages from the second document, then all the rest of the original pages.
Try something like this (not tested or robust - just a starting point maybe):
// Used the ExtractPages as a starting point.
public void MergeDocuments(string sourcePdfPath1, string sourcePdfPath2,
string outputPdfPath, int insertPage) {
PdfReader reader1 = null;
PdfReader reader2 = null;
Document sourceDocument1 = null;
Document sourceDocument2 = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage = null;
try {
reader1 = new PdfReader(sourcePdfPath1);
reader2 = new PdfReader(sourcePdfPath2);
// Note, I'm assuming pages are 0 based. If that's not the case, change to 1.
sourceDocument1 = new Document(reader1.GetPageSizeWithRotation(0));
sourceDocument2 = new Document(reader2.GetPageSizeWithRotation(0));
pdfCopyProvider = new PdfCopy(sourceDocument1,
new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
sourceDocument1.Open();
sourceDocument2.Open();
int length1 = reader1.NumberOfPages;
int length2 = reader2.NumberOfPages;
int page1 = 0; // Also here I'm assuming pages are 0-based.
// Having these three loops is the key. First is pages before the merge.
for (;page1 < insertPage && page1 < length1; page1++) {
importedPage = pdfCopyProvider.GetImportedPage(reader1, page1);
pdfCopyProvider.AddPage(importedPage);
}
// These are the pages from the second document.
for (int page2 = 0; page2 < length2; page2++) {
importedPage = pdfCopyProvider.GetImportedPage(reader2, page2);
pdfCopyProvider.AddPage(importedPage);
}
// Finally, add the remaining pages from the first document.
for (;page1 < length1; page1++) {
importedPage = pdfCopyProvider.GetImportedPage(reader1, page1);
pdfCopyProvider.AddPage(importedPage);
}
sourceDocument1.Close();
sourceDocument2.Close();
reader1.Close();
reader2.Close();
} catch (Exception ex) {
throw ex;
}
}

Saving 2 copies of PDF template, 2nd file corrupt - iTextSharp

I have a PDF template file for 60 labels per page. My goal was to make copies of the template as needed, fill in the form data and then merge the files into a single PDF (or provide links to individual files...either works)
The problem is that the 2nd PDF copy comes out corrupt regardless of date.
The workflow is user selects a date. The lunch orders for that day are gathered into a generic list that in turn is used to fill in the form fields on the template. At 60, the file is saved as a temp file and a new copy of the template is used for the next 60 names, etc...
09/23/2013 through 09/25 have data. On the 25th there are only 38 orders, so this works as intended. On 09/24/2013 there are over 60 orders, the first page works, but the 2nd page is corrupt.
private List<string> CreateLabels(DateTime orderDate)
{
// create file name to save
string fName = ConvertDateToStringName(orderDate) + ".pdf"; // example 09242013.pdf
// to hold Temp File Names
List<string> tempFNames = new List<string>();
// Get path to template/save directory
string path = Server.MapPath("~/admin/labels/");
string pdfPath = path + "8195a.pdf"; // template file
// Get the students and their lunch orders
List<StudentLabel> labels = DalStudentLabel.GetStudentLabels(orderDate);
// Get number of template pages needed
decimal recCount = Convert.ToDecimal(labels.Count);
decimal pages = Decimal.Divide(recCount, 60);
int pagesNeeded = Convert.ToInt32(Math.Ceiling(pages));
// Make the temp names
for (int c = 0; c < pagesNeeded; c++)
{
tempFNames.Add(c.ToString() + fName); //just prepend a digit to the date string
}
//Create copies of the empty templates
foreach (string tName in tempFNames)
{
try
{ File.Delete(path + tName); }
catch { }
File.Copy(pdfPath, path + tName);
}
// we know we need X pages and there is 60 per page
int x = 0;
// foreach page needed
for (int pCount = 0; pCount < pagesNeeded; pCount++)
{
// Make a new page
PdfReader newReader = new PdfReader(pdfPath);
// pCount.ToString replicates temp names
using (FileStream stream = new FileStream(path + pCount.ToString() + fName, FileMode.Open))
{
PdfStamper stamper = new PdfStamper(newReader, stream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
StudentLabel lbl = null;
string lblInfo = "";
// fill in acro fields with lunch data
foreach (string fieldKey in fieldKeys)
{
try
{
lbl = labels[x];
}
catch
{
break;
} // if we're out of labels, then we're done
lblInfo = lbl.StudentName + "\n";
lblInfo += lbl.Teacher + "\n";
lblInfo += lbl.MenuItem;
form.SetField(fieldKey, lblInfo);
x++;
if (x % 60 == 0) // reached 60, time for new page
{
break;
}
}
stamper.Writer.CloseStream = false;
stamper.FormFlattening = true;
stamper.Close();
newReader.Close();
stream.Flush();
stream.Close();
}
}
return tempFNames;
}

Why are you pre-allocating your files? My guess is that's your problem. You're binding a PdfStamper to a PdfReader for input and an exact copy of the same pdf to a FileStream object for output. The PdfStamper will generate your output file for you, you don't need to help it. You're trying to append new data to an existing file and I'm not quite sure what happens in that case (as I've never actually seen anyone do it.)
So drop your whole File.Copy pre-allocation and change your FileStream declaration to:
using (FileStream stream = new FileStream(path + pCount.ToString() + fName, FileMode.Create, FileAccess.Write, FileShare.None))
You'll obviously also need to adjust how your return array gets populated, too.

Edit a specific Line of a Text File in C#

I have two text files, Source.txt and Target.txt. The source will never be modified and contain N lines of text. So, I want to delete a specific line of text in Target.txt, and replace by an specific line of text from Source.txt, I know what number of line I need, actually is the line number 2, both files.
I haven something like this:
string line = string.Empty;
int line_number = 1;
int line_to_edit = 2;
using StreamReader reader = new StreamReader(#"C:\target.xml");
using StreamWriter writer = new StreamWriter(#"C:\target.xml");
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
writer.WriteLine(line);
line_number++;
}
But when I open the Writer, the target file get erased, it writes the lines, but, when opened, the target file only contains the copied lines, the rest get lost.
What can I do?

the easiest way is :
static void lineChanger(string newText, string fileName, int line_to_edit)
{
string[] arrLine = File.ReadAllLines(fileName);
arrLine[line_to_edit - 1] = newText;
File.WriteAllLines(fileName, arrLine);
}
usage :
lineChanger("new content for this line" , "sample.text" , 34);

You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). If your files are small then reading the entire target file into memory and then writing it out again might make sense. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2; // Warning: 1-based indexing!
string sourceFile = "source.txt";
string destinationFile = "target.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read the old file.
string[] lines = File.ReadAllLines(destinationFile);
// Write the new file over the old file.
using (StreamWriter writer = new StreamWriter(destinationFile))
{
for (int currentLine = 1; currentLine <= lines.Length; ++currentLine)
{
if (currentLine == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(lines[currentLine - 1]);
}
}
}
}
}
If your files are large it would be better to create a new file so that you can read streaming from one file while you write to the other. This means that you don't need to have the whole file in memory at once. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2;
string sourceFile = "source.txt";
string destinationFile = "target.txt";
string tempFile = "target2.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read from the target file and write to a new file.
int line_number = 1;
string line = null;
using (StreamReader reader = new StreamReader(destinationFile))
using (StreamWriter writer = new StreamWriter(tempFile))
{
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(line);
}
line_number++;
}
}
// TODO: Delete the old file and replace it with the new file here.
}
}
You can afterwards move the file once you are sure that the write operation has succeeded (no excecption was thrown and the writer is closed).
Note that in both cases it is a bit confusing that you are using 1-based indexing for your line numbers. It might make more sense in your code to use 0-based indexing. You can have 1-based index in your user interface to your program if you wish, but convert it to a 0-indexed before sending it further.
Also, a disadvantage of directly overwriting the old file with the new file is that if it fails halfway through then you might permanently lose whatever data wasn't written. By writing to a third file first you only delete the original data after you are sure that you have another (corrected) copy of it, so you can recover the data if the computer crashes halfway through.
A final remark: I noticed that your files had an xml extension. You might want to consider if it makes more sense for you to use an XML parser to modify the contents of the files instead of replacing specific lines.

When you create a StreamWriter it always create a file from scratch, you will have to create a third file and copy from target and replace what you need, and then replace the old one.
But as I can see what you need is XML manipulation, you might want to use XmlDocument and modify your file using Xpath.

You need to Open the output file for write access rather than using a new StreamReader, which always overwrites the output file.
StreamWriter stm = null;
fi = new FileInfo(#"C:\target.xml");
if (fi.Exists)
stm = fi.OpenWrite();
Of course, you will still have to seek to the correct line in the output file, which will be hard since you can't read from it, so unless you already KNOW the byte offset to seek to, you probably really want read/write access.
FileStream stm = fi.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
with this stream, you can read until you get to the point where you want to make changes, then write. Keep in mind that you are writing bytes, not lines, so to overwrite a line you will need to write the same number of characters as the line you want to change.

I guess the below should work (instead of the writer part from your example). I'm unfortunately with no build environment so It's from memory but I hope it helps
using (var fs = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite)))
{
var destinationReader = StreamReader(fs);
var writer = StreamWriter(fs);
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
destinationReader .ReadLine();
}
line_number++;
}
}

The solution works fine. But I need to change single-line text when the same text is in multiple places. For this, need to define a trackText to start finding after that text and finally change oldText with newText.
private int FindLineNumber(string fileName, string trackText, string oldText, string newText)
{
int lineNumber = 0;
string[] textLine = System.IO.File.ReadAllLines(fileName);
for (int i = 0; i< textLine.Length;i++)
{
if (textLine[i].Contains(trackText)) //start finding matching text after.
traced = true;
if (traced)
if (textLine[i].Contains(oldText)) // Match text
{
textLine[i] = newText; // replace text with new one.
traced = false;
System.IO.File.WriteAllLines(fileName, textLine);
lineNumber = i;
break; //go out from loop
}
}
return lineNumber
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

c# / openxml merge of word documents to one, fails --closed-- - c#

Fixed it. It seems word cannot merge different Formats in one Document. So if you have 2 documents with a footer and other 3 without it just won´t work. Obviously it can happen that some customers have these kind of issues; at least the Code is fine

Related

Pdfsharp Out of Memory Exception when Combine Multi Pdf File

MailMerge with OpenXML

Import pdf at a specific page

Saving 2 copies of PDF template, 2nd file corrupt - iTextSharp

Edit a specific Line of a Text File in C#

Categories

Resources