add html content in existing docx file using openxml in C# - c#

How do I add/append HTML content in an existing .docx file, using OpenXML in asp.net C#?
In an existing word file, I want to append the html content part.
For example:
In this example, I want to place "This is a Heading" inside a H1 tag.
Here its my code
protected void Button1_Click(object sender, EventArgs e)
{
try
{
using (WordprocessingDocument doc = WordprocessingDocument.Open(#"C:\Users\admin\Downloads\WordGenerator\WordGenerator\FTANJS.docx", true))
{
string altChunkId = "myId";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
var run = new Run(new Text("test"));
var p = new Paragraph(new ParagraphProperties(new Justification() { Val = JustificationValues.Center }), run);
var body = mainDocPart.Document.Body;
body.Append(p);
MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));
// Uncomment the following line to create an invalid word document.
// MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));
// Create alternative format import part.
AlternativeFormatImportPart formatImportPart =
mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
//ms.Seek(0, SeekOrigin.Begin);
// Feed HTML data into format import part (chunk).
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainDocPart.Document.Body.Append(altChunk);
}
}
catch (Exception ex)
{
ex.ToString ();
}
}

Add HTML content as Chunk should work, and you are almost there.
If I understand the question properly, this code should work.
//insert html content to H1 tag
using(WordprocessingDocument fDocx = WordprocessingDocument.Open(sDocxFile,true))
{
string sChunkID = "myhtmlID";
AlternativeFormatImportPart oChunk = fDocx.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, sChunkID);
using(FileStream fs = File.Open(sHtml,FileMode.OpenOrCreate))
{
oChunk.FeedData(fs);
}
AltChunk oAltChunk = new AltChunk();
oAltChunk.Id =sChunkID ;
//insert html to the tag of 'H1' and remove H1.
Body body = fDocx.MainDocumentPart.Document.Body;
Paragraph theParagraph = body.Descendants<Paragraph>().Where(p => p.InnerText == "H1").FirstOrDefault();
theParagraph.InsertAfterSelf<AltChunk>(oAltChunk);
theParagraph.Remove();
fDocx.MainDocumentPart.Document.Save();
}

The short answer is "You can't add HTML to a docx file".
Docx is an open format defined here. If you're using the Microsoft version they have a number of extensions.
In any case, the file contains XML, not HTML and you can't simply add HTML to a docx file. There are styles and formatting objects and pointers that all need to be updated.
If you need to modify a docx file and don't want to do a lot of research and a lot of coding, you'll need to find an existing library to work with.

Related

How to edit bookmarks in a Word template using DocumentFormat.OpenXml and save it as a new PDF file?

I'm really having trouble in editing bookmarks in a Word template using Document.Format.OpenXML and then saving it to a new PDF file.
I cannot use Microsoft.Word.Interop as it gives a COM error on the server.
My code is this:
public static void CreateWordDoc(string templatePath, string destinationPath, Dictionary<string, dynamic> dictionary)
{
byte[] byteArray = File.ReadAllBytes(templatePath);
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(stream, true))
{
var bookmarks = (from bm in wordDoc.MainDocumentPart.Document.Body.Descendants<BookmarkStart>()
select bm).ToList();
foreach (BookmarkStart mark in bookmarks)
{
if (mark.Name != "Table" && mark.Name != "_GoBack")
{
UpdateBookmark(dictionary, mark);//Not doing anything
}
else if (mark.Name != "Table")
{
// CreateTable(dictionary, wordDoc, mark);
}
}
File.WriteAllBytes("D:\\RohitDocs\\newfile_rohitsingh.docx", stream.ToArray());
wordDoc.Close();
}
// Save the file with the new name
}
}
private static void UpdateBookmark(Dictionary<string, dynamic> dictionary, BookmarkStart mark)
{
string name = mark.Name;
string value = dictionary[name];
Run run = new Run(new Text(value));
RunProperties props = new RunProperties();
props.AppendChild(new FontSize() { Val = "20" });
run.RunProperties = props;
var paragraph = new DocumentFormat.OpenXml.Wordprocessing.Paragraph(run);
mark.Parent.InsertAfterSelf(paragraph);
paragraph.PreviousSibling().Remove();
mark.Remove();
}
I was trying to replace bookmarks with my text but the UpdateBookmark method doesn't work. I'm writing stream and saving it because I thought if bookmarks are replaced then I can save it to another file.
I think you want to make sure that when you reference mark.Parent that you are getting the correct instance that you are expecting.
Once you get a reference to the correct Paragraph element where your content should go, use the following code to add/swap the run.
// assuming you have a reference to a paragraph called "p"
p.AppendChild<Run>(new Run(new Text(content)) { RunProperties = props });
// and here is some code to remove a run
p.RemoveChild<Run>(run);
To answers the second part of your question, when I did a similar project a few years ago we used iTextSharp to create PDFs from Docx. It worked very well and the API was easy to grok. We even added password encryption and embedded watermarks to the PDFs.

Word addin trying to show both XML and text but if the text have image the doc File corrupted?

I build word addin that have two check-boxes to swabbing between TEXT view and XML View.
And in the XML viewing I restrict the editing. When the user return back to TEXT viewing I remove the restriction for editing.
The code:
private void ShowDocBodyXML_Click(object sender, RibbonControlEventArgs e)
{
var doc = Globals.DLPAddIn.Application.ActiveDocument;
doc.Save();
string fileName = doc.FullName;
doc.Close();
using (WordprocessingDocument document = WordprocessingDocument.Open(fileName, true))
{
MainDocumentPart mainPart = document.MainDocumentPart;
Body body = mainPart.Document.Body;
string text = body.InnerXml;
body.RemoveAllChildren();
DocumentFormat.OpenXml.Wordprocessing.Paragraph para = body.AppendChild(new DocumentFormat.OpenXml.Wordprocessing.Paragraph());
Run run = para.AppendChild(new Run());
run.AppendChild(new Text(text));
}
Globals.DLPAddIn.Application.Documents.Open(fileName);
doc = Globals.DLPAddIn.Application.ActiveDocument;
object missing = System.Reflection.Missing.Value;
doc.Protect(Microsoft.Office.Interop.Word.WdProtectionType.wdAllowOnlyReading,
missing, missing, missing, missing);
doc.Save();
}
private void ShowDocBodyText_Click(object sender, RibbonControlEventArgs e)
{
var doc = Globals.DLPAddIn.Application.ActiveDocument;
object missing = System.Reflection.Missing.Value;
if (doc.ProtectionType != WdProtectionType.wdNoProtection)
doc.Unprotect(missing);
doc.Save();
string fileName = doc.FullName;
doc.Close();
using (WordprocessingDocument document = WordprocessingDocument.Open(fileName, true))
{
MainDocumentPart mainPart = document.MainDocumentPart;
Body body = mainPart.Document.Body;
string text = body.InnerText;
body.RemoveAllChildren();
body.InnerXml = text;
}
Globals.DLPAddIn.Application.Documents.Open(fileName);
}
The code is working with out any problem but if the Word Doc have an image when the user trying to return back to TEXT viewing the the code throw exception (The file was corrupted) in this code:
Globals.DLPAddIn.Application.Documents.Open(fileName);
I think when I restrict the editing it remove the image file.
If it's like that how can i solve this problem.
Your current approach does not work when there are images or other resources in the document because your strip all images when creating the Word document containing only the XML of the MainDocumentPart.
One solution to this is to show the OpenXML in the so-called Flat OPC format. This format describes the entire OpenXML (zip) package in a single XML document (without the hierarchical structure of the OpenXML package).
The easiest way to obtain the XML in Flat OPC format is to use the Document.WordOpenXML property:
var doc = Globals.DLPAddIn.Application.ActiveDocument;
var xml = doc.WordOpenXML;
var newDoc = Globals.DLPAddIn.Application.Documents.Add();
newDoc.Range.Text = xml

Value of a string for file's location is nil but a stored value says it isn't

I'm trying to convert secured PDFs to XPS and back to PDF using FreeSpire and then combine them using iTextSharp. Below is my code snippet for converting various files.
char[] delimiter = { '\\' };
string WorkDir = #"C:\Users\*******\Desktop\PDF\Test";
Directory.SetCurrentDirectory(WorkDir);
string[] SubWorkDir = Directory.GetDirectories(WorkDir);
//convert items to PDF
foreach (string subdir in SubWorkDir)
{
string[] samplelist = Directory.GetFiles(subdir);
for (int f = 0; f < samplelist.Length - 1; f++)
{
if (samplelist[f].EndsWith(".doc") || samplelist[f].EndsWith(".DOC"))
{
Spire.Pdf.PdfDocument doc = new Spire.Pdf.PdfDocument();
doc.LoadFromFile(sampleist[f], FileFormat.DOC);
doc.SaveToFile((Path.ChangeExtension(samplelist[f],".pdf")), FileFormat.PDF);
doc.Close();
}
. //other extension cases
.
.
else if (samplelist[f].EndsWith(".pdf") || sampleList[f].EndsWith(".PDF"))
{
PdfReader reader = new PdfReader(samplelist[f]);
bool PDFCheck = reader.IsOpenedWithFullPermissions;
reader.Close();
if (PDFCheck)
{
Console.WriteLine("{0}\\Full Permisions", Loan_list[f]);
reader.Close();
}
else
{
Console.WriteLine("{0}\\Secured", samplelist[f]);
Spire.Pdf.PdfDocument doc = new Spire.Pdf.PdfDocument();
string path = Loan_List[f];
doc.LoadFromFile(samplelist[f]);
doc.SaveToFile((Path.ChangeExtension(samplelist[f], ".xps")), FileFormat.XPS);
doc.Close();
Spire.Pdf.PdfDocument doc2 = new Spire.Pdf.PdfDocument();
doc2.LoadFromFile((Path.ChangeExtension(samplelist[f], ".xps")), FileFormat.XPS);
doc2.SaveToFile(samplelist[f], FileFormat.PDF);
doc2.Close();
}
The issue is I get a Value cannot be null error in doc.LoadFromFile(samplelist[f]);.I have the string path = sampleList[f]; to check if samplelist[f] was empty but it was not. I tried to replace the samplelist[f] parameter with the variable named path but it also does not go though. I tested the PDF conversion on a smaller scale it it worked (see below)
string PDFDoc = #"C:\Users\****\Desktop\Test\Test\Test.PDF";
string XPSDoc = #"C:\Users\****\Desktop\Test\Test\Test.xps";
//Convert PDF file to XPS file
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(PDFDoc);
doc.SaveToFile(XPSDoc, FileFormat.XPS);
doc.Close();
//Convert XPS file to PDF
PdfDocument doc2 = new PdfDocument();
doc2.LoadFromFile(XPSDoc, FileFormat.XPS);
doc2.SaveToFile(PDFDoc, FileFormat.PDF);
doc2.Close();
I would like to understand why I am getting this error and how to fix it.
There would be 2 solutions for the problem you are facing.
Get the Document in the Document Object not in PDFDocument. And then probably try to SaveToFile Something like this
Document document = new Document();
//Load a Document in document Object
document.SaveToFile("Sample.pdf", FileFormat.PDF);
You can use Stream for the same something like this
PdfDocument doc = new PdfDocument();
//Load PDF file from stream.
FileStream from_stream = File.OpenRead(Loan_list[f]);
//Make sure the Loan_list[f] is the complete path of the file with extension.
doc.LoadFromStream(from_stream);
//Save the PDF document.
doc.SaveToFile(Loan_list[f] + ".pdf",FileFormat.PDF);
Second approach is the easy one, but I would recommend you to use the first one as for obvious reasons like document will give better convertability than stream. Since the document have section, paragraph, page setup, text, fonts everything which need to be required to do a better or exact formatting required.

OpenXml-SDK: How to apply FontFamily/Size to AltChunk of Type [TextPlain]

Can anybody show me how to apply Fontfamily/size to an AltChunk of Type
AlternativeFormatImportPartType.TextPlain
This is my Code, but I can´t figure out how to do this at all (even Google doesn´t help)
MainDocumentPart main = doc.MainDocumentPart;
string altChunkId = "AltChunkId" + Guid.NewGuid().ToString().Replace("-", "");
var chunk = main.AddAlternativeFormatImportPart
(AlternativeFormatImportPartType.TextPlain, altChunkId);
using (var mStream = new MemoryStream())
{
using (var writer = new StreamWriter(mStream))
{
writer.Write(value);
writer.Flush();
mStream.Position = 0;
chunk.FeedData(mStream);
}
}
var altChunk = new AltChunk();
altChunk.Id = altChunkId;
OpenXmlElement afterThat = null;
foreach (var para in main.Document.Body.Descendants<Paragraph>())
{
if (para.InnerText.Equals("Notizen:"))
{
afterThat = para;
}
}
main.Document.Body.InsertAfter(altChunk, afterThat);
if I do it this way I get "Courier New" with a Size of "10,5"
UPDATE
This is the working Solution I came up with:
Convert Plaintext to RTF, change the Fontfamily/size and apply it to the WordProcessingDocument!
public static string PlainToRtf(string value)
{
using (var rtf = new System.Windows.Forms.RichTextBox())
{
rtf.Text = value;
rtf.SelectAll();
rtf.SelectionFont = new System.Drawing.Font("Calibri", 10);
return rtf.Rtf;
}
}
var chunk = main.AddAlternativeFormatImportPart
(AlternativeFormatImportPartType.Rtf, altChunkId);
using (var mStream = new MemoryStream())
{
using (var writer = new StreamWriter(mStream))
{
var rtf = PlainToRtf(value);
writer.Write(rtf);
writer.Flush();
mStream.Position = 0;
chunk.FeedData(mStream);
}
}
//proceed with creating AltChunk and inserting it to the Document...
How to apply FontFamily/Size to AltChunk of Type [TextPlain]
I am afraid this is NOT possible, in any case, not with OpenXml SDK.
Why?
altChunk (Anchor for Imported External Content) object is further designed for importing content in the document. They are 'temporary' objects: it is a just a reference to an external content, that is incorporated "as is" in the document, and then, when the document will be opened and saved with Word, Word converts this external content in valid OpenXml content.
So you can't, for a newly created document, loop into the paragraphs in order to retrieve it and apply a style.
If you import rtf content for example, the style must be applied to rtf before importing it.
In case of plain text TextPlain (= Text file .txt), there is no style conversion (there is no style attached to the text file, you can change the font in NotePad, it will apply to all documents, this is an Application Level property).
And I can confirm that Word creates by default a style with "Courier New 10,5" to display the content of the file. I just tested.
What can I do?
Apply style after the document has been open/saved with Word. Note you will have to retreive the paragrap(s), or you could try to retrieve the style created in the document and change the font here. This link could help to achieve this:
How to: Apply a style to a paragraph in a word processing document (Open XML SDK).
Or maybe it exists(?) a registry key something Like this that you can change to change Word's default behavior on your computer. And even if it is, it doesn't solve the problem for newly created document which is opened the first time on the client.
Note from the OP:
I think a possible Solution to the Problem could be, converting the PlainText to RTF apply StyleInformation and then append it to WordProcessingDocument as AltChunk.
I totally agreed. Just note when he says apply StyleInformation, it means at rtf level.

OpenXML SDK -Converting C# to C++/CLI

I've got C# code for creating a document, I want to write the same in C++/CLI.
private void HelloWorld(string documentFileName)
{
// Create a Wordprocessing document.
using (WordprocessingDocument myDoc =
WordprocessingDocument.Create(documentFileName,
WordprocessingDocumentType.Document))
{
// Add a new main document part.
MainDocumentPart mainPart = myDoc.AddMainDocumentPart();
//Create Document tree for simple document.
mainPart.Document = new Document();
//Create Body (this element contains
//other elements that we want to include
Body body = new Body();
//Create paragraph
Paragraph paragraph = new Paragraph();
Run run_paragraph = new Run();
// we want to put that text into the output document
Text text_paragraph = new Text("Hello World!");
//Append elements appropriately.
run_paragraph.Append(text_paragraph);
paragraph.Append(run_paragraph);
body.Append(paragraph);
mainPart.Document.Append(body);
// Save changes to the main document part.
mainPart.Document.Save();
}
}
Also please suggest me any links where I can find C++/CLI example for OpenXML SDK
Here's a direct translation:
private:
void HelloWorld(String^ documentFileName)
{
msclr::auto_handle<WordprocessingDocument> myDoc(
WordprocessingDocument::Create(
documentFileName, WordprocessingDocumentType::Document
)
);
MainDocumentPart^ mainPart = myDoc->AddMainDocumentPart();
mainPart->Document = gcnew Document;
Body^ body = gcnew Body;
Paragraph^ paragraph = gcnew Paragraph;
Run^ run_paragraph = gcnew Run;
Text^ text_paragraph = gcnew Text(L"Hello World!");
run_paragraph->Append(text_paragraph);
paragraph->Append(run_paragraph);
body->Append(paragraph);
mainPart->Document->Append(body);
mainPart->Document->Save();
}
msclr::auto_handle<> should generally be considered more idiomatic than try..finally, just as std::shared_ptr<> and std::unique_ptr<> are in C++.
I take it you have tried something? I don't have access to a compiler
http://en.wikipedia.org/wiki/C%2B%2B/CLI should get you started.
If you wonder about translating the using construct (good question, had you asked it!), I suggest something along the following lines (note the try {} finally { delete ... } idom)
private:
void HelloWorld(String^ documentFileName)
{
// Create a Wordprocessing document.
WordprocessingDocument ^myDoc = WordprocessingDocument::Create(documentFileName, WordprocessingDocumentType::Document);
try
{
// Add a new main document part.
MainDocumentPart mainPart = myDoc::AddMainDocumentPart();
//Create Document tree for simple document.
mainPart->Document = gcnew Document();
//Create Body (this element contains
//other elements that we want to include
Body body = gcnew Body();
//Create paragraph
Paragraph paragraph = gcnew Paragraph();
Run run_paragraph = gcnew Run();
// we want to put that text into the output document
Text text_paragraph = gcnew Text("Hello World!");
//Append elements appropriately.
run_paragraph->Append(text_paragraph);
paragraph->Append(run_paragraph);
body->Append(paragraph);
mainPart->Document->Append(body);
// Save changes to the main document part.
mainPart->Document->Save();
} finally
{
delete myDoc;
}
}
I want to repeat I have no compiler available at the moment, so it may be rough around the edges, but should provide some information nonetheless
OpenXML SDK C++ Examples
The syntax is basically the same. "new" needs to be replaced by gcnew, the "." by -> (e.g. body.Append(paragraph) will be body->Append(paragraph). The tricky part will be the "using" directive. To do it the C++ way, you need some kind of smart pointer that "deletes" the object at the end of the block (with delete meaning calling the IDisposable interface) - this is called RAII.

Categories

Resources