I have a SharePoint-hosted application that contains a .docx template file with mail merge fields such as << Customer_Name >>.
It is a one-page document. I have to create a new .docx file from this template, which may contain multiple pages depending on the number of customers. The content is repeated, and the merge fields have to be replaced with data from a DataTable for each page.
I tried using AltChunk, but with this method I cannot find and replace the text fields.
using (WordprocessingDocument template = WordprocessingDocument.Open(documentStream, true))
{
template.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
MainDocumentPart mainPart = template.MainDocumentPart;
for (int i = 0; i < dt.Rows.Count; i++)
{
if (dt.Rows[i][1].ToString() != "")
{
ReplaceText(mainPart, "«customer_Address»", dt.Rows[i][1].ToString());
string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 15);
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
chunk.FeedData(filestream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document
.Body
.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
}
}
I am using a ReplaceText method to replace text in the paragraphs.
private static void ReplaceText(MainDocumentPart docPart, string match, string value)
{
var body = docPart.Document.Body;
foreach (var text in body.Descendants<Text>())
{
if (text.Text.Contains(match))
{
text.Text = text.Text.Replace(match, value);
}
}
}
This ReplaceText works fine for the original mainPart but does nothing for text added using AltChunk.
What would be an easier way to generate a multi-page document in my case?
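One possible direction, sketched here as an assumption rather than a confirmed fix: an AltChunk only embeds the imported file as a separate part, and Word merges it into the body when it next opens and saves the document, so the imported text never appears among the Text elements that ReplaceText walks. Cloning the template's body elements once per row and replacing the placeholders in each clone keeps everything in the main body. TemplateRepeater and FillCopies are hypothetical names, and the sketch assumes each placeholder sits inside a single Text element.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class TemplateRepeater
{
    // Replaces the body content with one filled-in copy of the template per row.
    // Each row maps a placeholder (e.g. "«Customer_Name»") to its replacement value.
    public static void FillCopies(Body body, IEnumerable<Dictionary<string, string>> rows)
    {
        // Keep the trailing section properties (page size, margins) in place.
        SectionProperties sectionProperties = body.Elements<SectionProperties>().LastOrDefault();

        // Detach the one-page template content so it can be cloned once per row.
        List<OpenXmlElement> template = body.ChildElements
            .Where(e => !(e is SectionProperties))
            .ToList();
        foreach (OpenXmlElement element in template)
        {
            element.Remove();
        }

        bool first = true;
        foreach (Dictionary<string, string> row in rows)
        {
            if (!first)
            {
                // Start every copy after the first one on a new page.
                Insert(body, sectionProperties,
                    new Paragraph(new Run(new Break { Type = BreakValues.Page })));
            }
            first = false;

            foreach (OpenXmlElement element in template)
            {
                OpenXmlElement copy = element.CloneNode(true);
                foreach (Text text in copy.Descendants<Text>())
                {
                    foreach (KeyValuePair<string, string> pair in row)
                    {
                        text.Text = text.Text.Replace(pair.Key, pair.Value);
                    }
                }
                Insert(body, sectionProperties, copy);
            }
        }
    }

    private static void Insert(Body body, SectionProperties sectionProperties, OpenXmlElement element)
    {
        if (sectionProperties != null)
        {
            body.InsertBefore(element, sectionProperties);
        }
        else
        {
            body.AppendChild(element);
        }
    }
}
```

Each DataTable row would first be converted into a placeholder-to-value dictionary, FillCopies would be called with mainPart.Document.Body, and the document saved afterwards. If Word stored the fields as real MERGEFIELD field codes, or split a placeholder across runs, the simple Text replacement above would miss them.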
I have searched a lot for a solution but can't find one.
I have a .docx file inside my MVC project folder that I want to open in order to overwrite some text, but I'm unable to do so.
Inside my project folder, I have a Template folder, and in this folder there is a genrated.docx file that I want to open. Here is my code:
using (WordprocessingDocument doc = WordprocessingDocument.Open
(@"~/Template/genrated.docx", true))
{
var body = doc.MainDocumentPart.Document.Body;
var paras = body.Elements<Paragraph>();
foreach (var para in paras)
{
foreach (var run in para.Elements<Run>())
{
foreach (var text in run.Elements<Text>())
{
if (text.Text.Contains("to-replace"))
{
text.Text = text.Text.Replace("to-replace", "replace-with");
run.AppendChild(new Break());
}
}
}
}
}
Please help me with this...
Your simplistic approach to replacing text only works in simple cases. Unfortunately, as soon as you use Microsoft Word to edit your template, your text "to-replace" might get split into multiple runs. This means that you can't find your text "to-replace" if you only look for it in a single Text instance.
The following unit test demonstrates that by creating a document with two paragraphs, one having a single Text instance with your text "to-replace" and another one in which that same text is split into two Run and Text instances.
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using Xunit;
namespace CodeSnippets.Tests.OpenXml.Wordprocessing
{
public class SimplisticTextReplacementTests
{
private const string ToReplace = "to-replace";
private const string ReplaceWith = "replace-with";
private static MemoryStream CreateWordprocessingDocument()
{
var stream = new MemoryStream();
const WordprocessingDocumentType type = WordprocessingDocumentType.Document;
using WordprocessingDocument wordDocument = WordprocessingDocument.Create(stream, type);
MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
mainDocumentPart.Document =
new Document(
new Body(
new Paragraph(
new Run(
new Text(ToReplace))),
new Paragraph(
new Run(
new Text("to-")),
new Run(
new Text("replace")))));
return stream;
}
private static void ReplaceText(MemoryStream stream)
{
using WordprocessingDocument doc = WordprocessingDocument.Open(stream, true);
Body body = doc.MainDocumentPart.Document.Body;
IEnumerable<Paragraph> paras = body.Elements<Paragraph>();
foreach (Paragraph para in paras)
{
foreach (Run run in para.Elements<Run>())
{
foreach (Text text in run.Elements<Text>())
{
if (text.Text.Contains(ToReplace))
{
text.Text = text.Text.Replace(ToReplace, ReplaceWith);
run.AppendChild(new Break());
}
}
}
}
}
[Fact]
public void SimplisticTextReplacementOnlyWorksInSimpleCases()
{
// Arrange.
using MemoryStream stream = CreateWordprocessingDocument();
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false))
{
Document document = wordDocument.MainDocumentPart.Document;
Paragraph firstParagraph = document.Descendants<Paragraph>().First();
Assert.Equal(ToReplace, firstParagraph.InnerText);
Assert.Contains(firstParagraph.Descendants<Text>(), t => t.Text == ToReplace);
Paragraph lastParagraph = document.Descendants<Paragraph>().Last();
Assert.Equal(ToReplace, lastParagraph.InnerText);
Assert.DoesNotContain(lastParagraph.Descendants<Text>(), t => t.Text == ToReplace);
}
// Act.
ReplaceText(stream);
// Assert.
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false))
{
Document document = wordDocument.MainDocumentPart.Document;
Paragraph firstParagraph = document.Descendants<Paragraph>().First();
Assert.Equal(ReplaceWith, firstParagraph.InnerText);
Assert.Contains(firstParagraph.Descendants<Text>(), t => t.Text == ReplaceWith);
Paragraph lastParagraph = document.Descendants<Paragraph>().Last();
Assert.NotEqual(ReplaceWith, lastParagraph.InnerText);
Assert.DoesNotContain(lastParagraph.Descendants<Text>(), t => t.Text == ReplaceWith);
}
}
}
}
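A minimal sketch of one way to handle the split-run case shown in the test above: work at the paragraph level instead of the Text level. ParagraphTextReplacer and ReplaceInParagraph are hypothetical names, not part of the Open XML SDK, and the approach is an assumption on my side rather than the only option. It merges the paragraph's text into the first Text element, so any formatting that differed between the runs is lost.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class ParagraphTextReplacer
{
    // Replaces every occurrence of `match` in the paragraph, even when Word has
    // split the text across several Run/Text elements. Returns true if the
    // paragraph was changed.
    public static bool ReplaceInParagraph(Paragraph paragraph, string match, string value)
    {
        var texts = paragraph.Descendants<Text>().ToList();
        if (texts.Count == 0)
        {
            return false;
        }

        // Look at the paragraph's text as one string, regardless of run boundaries.
        string combined = string.Concat(texts.Select(t => t.Text));
        if (!combined.Contains(match))
        {
            return false;
        }

        // Put the replaced text into the first Text element and empty the others.
        texts[0].Text = combined.Replace(match, value);
        texts[0].Space = SpaceProcessingModeValues.Preserve;
        foreach (Text text in texts.Skip(1))
        {
            text.Text = string.Empty;
        }

        return true;
    }
}
```

Calling it for every paragraph returned by body.Descendants<Paragraph>() would also cover the second paragraph in the test above, at the cost of flattening that paragraph's run formatting.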
I have to merge several Word documents with a small C# console application. So far so good. The documents are generated in arcplan. Around 30 files are generated, but some of them are somehow corrupted, although they still show content.
If I merge only the files that are correct, my document is fine, but if there is a corrupted file in the batch, each corrupted file produces an empty page. I debugged it, of course, but I don't see anything going wrong that would explain the empty page.
The arguments look like this:
"C:\temp\Report_C_01.docx" "C:\temp\Report_D_01.docx" "C:\temp\Report_E_01.docx"
Here's my code:
public static void Merge(params String[] filepaths)
{
String pathName = Path.GetDirectoryName(filepaths[0]);
subfolder = Path.Combine(pathName, "Output\\"); // Needed for the merged file
if (filepaths != null && filepaths.Length > 1)
{
WordprocessingDocument myDoc = WordprocessingDocument.Open(@filepaths[0], true); // Open the Word files
MainDocumentPart mainPart = myDoc.MainDocumentPart;
for (int i = 1; i < filepaths.Length; i++)
{
String altChunkId = "AltChunkId" + i;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
FileStream fileStream = File.Open(@filepaths[i], FileMode.Open);
chunk.FeedData(fileStream);
DocumentFormat.OpenXml.Wordprocessing.AltChunk altChunk = new DocumentFormat.OpenXml.Wordprocessing.AltChunk();
altChunk.Id = altChunkId;
//new page, if you like it...
mainPart.Document.Body.AppendChild(new Paragraph(new Run(new Break() { Type = BreakValues.Page } )));
//next document
mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
}
mainPart.Document.Save();
myDoc.Close();
for (int i = 0; i < 1; i++)
{
String fileNameWE = Path.GetFileName(filepaths[i]);
File.Copy(filepaths[i], subfolder + fileNameWE);
}
foreach (String fp in filepaths)
{
File.Delete(fp);
}
}
else
{
Console.WriteLine("Only 1 argument");
}
}
Hope someone can help me.
Best regards
Christian
Fixed it. It seems Word cannot merge documents with different formats into one document. So if you have two documents with a footer and three others without one, it just won't work. Apparently some customers can run into this kind of issue; at least the code is fine.
I have a report that I'm trying to generate using iTextSharp that includes html text entered by the user using tinymce on my web page. I then have a report and I want to insert a phrase that uses their markup.
While basic markup such as bold and underline works, lists, indents, and alignment do not. Any suggestions short of writing my own little HTML-to-PDF parser?
My code:
internal static Phrase GetPhraseFromHtml(string html, string fontName, int fontSize)
{
var returnPhrase = new Phrase();
// String.Replace returns a new string, so the result has to be assigned back.
html = html.Replace(Environment.NewLine, String.Empty);
//the string has to be well-formed html in order to work and has to specify the font since
//specifying the font in the phrase overrides the formatting of the html tags.
string pTag = string.Format("<p style='font-size: {0}; font-family:{1}'>", fontSize, fontName);
if (html.StartsWith("<p>"))
{
html = html.Replace("<p>", pTag);
}
else
{
html = pTag + html + "</p>";
}
html
= "<html><body>"
+ html
+ "</body></html>";
using (StringWriter sw = new StringWriter())
{
using (System.Web.UI.HtmlTextWriter hw = new System.Web.UI.HtmlTextWriter(sw))
{
var xmlWorkerHandler = new XmlWorkerHandler();
//Bind a reader to our text
using (TextReader textReader = new StringReader(html))
{
//Parse
XMLWorkerHelper.GetInstance().ParseXHtml(xmlWorkerHandler, textReader);
}
var addPhrase = new Phrase();
var elementText = new StringBuilder();
bool firstElement = true;
//Loop through each element
foreach (var element in xmlWorkerHandler.elements)
{
if (firstElement)
{
firstElement = false;
}
else
{
addPhrase.Add(new Chunk("\n"));
}
//Loop through each chunk in each element
foreach (var chunk in element.Chunks)
{
addPhrase.Add(chunk);
}
returnPhrase.Add(addPhrase);
addPhrase = new Phrase();
}
return returnPhrase;
}
}
}
I am using this library to convert HTML text to Word format.
Everything works perfectly.
I now need to style some of the text. To generate the document, I have a list of headings, subheadings, and their text. I use a foreach loop to output each heading and subheading with its text, but I want to apply Heading1 to each category and Heading2 to each subcategory. Here is what I have so far:
The foreach loop that gets the categories and subcategories with their text:
foreach (var category in ct)
{
strDocumentText.Append(category.ParentCat.CategoryName);
strDocumentText.Append("<br />");
if(category.DocumentText != null)
{
strDocumentText.Append(category.DocumentText);
}
if (category.Children != null)
{
foreach (var subCategoreis in category.Children)
{
strDocumentText.Append("<p />");
strDocumentText.Append(subCategoreis.ParentCat.CategoryName);
strDocumentText.Append("<br />");
if (subCategoreis.DocumentText != null)
{
strDocumentText.Append(subCategoreis.DocumentText);
}
}
}
}
Create the Word document:
StringBuilder strDocumentText = new StringBuilder();
string html = strDocumentText.ToString();
using (MemoryStream generatedDocument = new MemoryStream())
{
BuildDocument(generatedDocument, html);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
if (mainPart == null)
{
mainPart = wordDoc.AddMainDocumentPart();
new DocumentFormat.OpenXml.Wordprocessing.Document(new Body()).Save(mainPart);
}
HtmlConverter converter = new HtmlConverter(mainPart);
Body body = mainPart.Document.Body;
var paragraphs = converter.Parse(html);
for (int i = 0; i < paragraphs.Count; i++)
{
body.Append(paragraphs[i]);
}
mainPart.Document.Save();
}
fs.Close();
File.WriteAllBytes(saveFileDialog1.FileName, generatedDocument.ToArray());
}
First you need to add the style definitions to the document. The default styles are not included when you construct an OpenXml document. After you define the styles, you can reference them in the paragraph properties element (serialized as "pPr") or in the run properties element. Take a look at: http://msdn.microsoft.com/en-us/library/cc850838.aspx
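As a rough sketch of those two steps with the Open XML SDK (the HeadingStyles class, its method names, and the bold/font-size formatting are assumptions chosen for illustration, not something from the linked article):

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class HeadingStyles
{
    // Step 1: add a paragraph style definition to the styles part, creating the
    // part if the document does not have one yet.
    public static void AddHeadingStyle(MainDocumentPart mainPart, string styleId,
        string styleName, int fontSizeHalfPoints)
    {
        StyleDefinitionsPart stylesPart =
            mainPart.StyleDefinitionsPart ?? mainPart.AddNewPart<StyleDefinitionsPart>();
        if (stylesPart.Styles == null)
        {
            stylesPart.Styles = new Styles();
        }

        // Do nothing if a style with this id is already defined.
        if (stylesPart.Styles.Elements<Style>().Any(s => s.StyleId?.Value == styleId))
        {
            return;
        }

        var style = new Style { Type = StyleValues.Paragraph, StyleId = styleId };
        style.Append(new StyleName { Val = styleName });
        // Assumed formatting: bold text; w:sz is measured in half-points.
        style.Append(new StyleRunProperties(
            new Bold(),
            new FontSize { Val = fontSizeHalfPoints.ToString() }));

        stylesPart.Styles.Append(style);
        stylesPart.Styles.Save();
    }

    // Step 2: reference the style from the paragraph properties (pPr).
    public static Paragraph CreateStyledParagraph(string text, string styleId)
    {
        return new Paragraph(
            new ParagraphProperties(new ParagraphStyleId { Val = styleId }),
            new Run(new Text(text)));
    }
}
```

For example, AddHeadingStyle(mainPart, "Heading1", "heading 1", 32) followed by body.Append(HeadingStyles.CreateStyledParagraph("Category name", "Heading1")) would give a 16 pt bold heading. Paragraphs produced by the HTML converter would still need their ParagraphStyleId set in a similar way.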
I have an HTML table in a view. I'm using iTextSharp 4 to convert the HTML to a PDF using the HtmlParser. The table spans multiple pages. How do I get it to show the header on each page? Is there some setting I can turn on in the HTML so that iTextSharp can recognise it?
I don't have access to iTextSharp 4.0, but since the HTML parser writes directly to the document, I'm not sure it would be possible without modifying the original source. Is it an option to upgrade to 5.0, which completely replaced the HtmlParser with a much more robust HTMLWorker object?
To have a PdfPTable's headers repeat across multiple pages, you need to set its HeaderRows property to the number of rows in your header. Unfortunately, if you're using the HTMLParser or the HTMLWorker, they do not currently treat THEAD and TH tags differently from TBODY and TD tags. The solution is to modify the PdfPTable after parsing but before it is written to the document. I don't have 4.0 available here, but in 5.1.1.0 using the HTMLWorker you can easily do that and manually set the HeaderRows property:
//Output file
string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Table.pdf");
using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.Read))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
{
doc.Open();
doc.NewPage();
//Create some long text to force a new page
string longText = String.Concat(Enumerable.Repeat("Lorem ipsum.", 40));
//Create our table using both THEAD and TH which iTextSharp currently ignores
string html = "<table>";
html += "<thead><tr><th>Header Row 1/Cell 1</th><th>Header Row 1/Cell 2</th></tr><tr><th>Header Row 2/Cell 1</th><th>Header Row 2/Cell 2</th></tr></thead>";
html += "<tbody>";
for (int i = 3; i < 20; i++)
{
html += "<tr>";
html += String.Format("<td>Data Row {0}</td>", i);
html += String.Format("<td>{0}</td>", longText);
html += "</tr>";
}
html += "</tbody>";
html += "</table>";
using (StringReader sr = new StringReader(html))
{
//Get our list of elements (only 1 in this case)
List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, null);
foreach (IElement el in elements)
{
//If the element is a table manually set its header row count
if (el is PdfPTable)
{
((PdfPTable)el).HeaderRows = 2;
}
doc.Add(el);
}
}
doc.Close();
}
}
}
You should just be able to set: table.HeaderRows = 1;
This will repeat the header on each page.
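A minimal sketch of that setting on a table built directly in code, assuming iTextSharp 5.x (RepeatingHeaderTable, the column titles, and the row contents are made up for illustration):

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class RepeatingHeaderTable
{
    // Builds a two-column table whose first row is repeated at the top of
    // every page the table spans.
    public static PdfPTable Build(int dataRows)
    {
        var table = new PdfPTable(2);

        // The first HeaderRows rows added to the table are treated as the header.
        table.AddCell(new PdfPCell(new Phrase("Name")));
        table.AddCell(new PdfPCell(new Phrase("Value")));
        table.HeaderRows = 1;

        for (int i = 1; i <= dataRows; i++)
        {
            table.AddCell("Row " + i);
            table.AddCell("Value " + i);
        }

        return table;
    }
}
```

Adding the result with doc.Add(RepeatingHeaderTable.Build(200)) to an open Document should repeat the "Name"/"Value" row on each page. This only applies when you have the PdfPTable instance in hand, which is why the HTML-parsed case needs the post-parse step.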
Apply the "repeat-header" style and set it to "yes", like so:
<table style="repeat-header:yes;">