Search And Replace Text in OPENXML (Added file) - c#

I know there are a lot of posts on this, but nothing has worked for my problem.
I'm using Open XML to create a Word document, and I add some ready-made files to the document during creation. I want to change some text in an added file after the document is ready. This is what I tried.
First, creating the document:
fileName = HttpContext.Current.Server.MapPath("~/reports/"+fileName+".docx");
using (var doc = WordprocessingDocument.Create(
fileName, WordprocessingDocumentType.Document))
{
///add files and content inside the document
addContentFile("template1part1", HttpContext.Current.Server.MapPath("~/templates/template1part1.docx"), mainPart);
}
This is how I am adding the files:
private static void addContentFile(string id,string path, MainDocumentPart mainPart){
string altChunkId = id;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
using (FileStream fileStream = File.Open(path, FileMode.Open))
{
chunk.FeedData(fileStream);
fileStream.Close();
}
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document.Body.Append(altChunk);
mainPart.Document.Save();
}
And this is how I am trying to replace the text AFTER the file is created (after I have finished using the WordprocessingDocument).
First try:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
docText = sr.ReadToEnd();
docText = new Regex(findText, RegexOptions.IgnoreCase).Replace(docText, replaceText);
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
sw.Write(docText);
}
Second try:
using ( WordprocessingDocument doc =
WordprocessingDocument.Open(@"yourpath\testdocument.docx", true))
{
var body = doc.MainDocumentPart.Document.Body;
var paras = body.Elements<Paragraph>();
foreach (var para in paras)
{
foreach (var run in para.Elements<Run>())
{
foreach (var text in run.Elements<Text>())
{
if (text.Text.Contains("text-to-replace"))
{
text.Text = text.Text.Replace("text-to-replace", "replaced-text");
}
}
}
}
}
Neither of them worked, and I tried much more.
It works for text that I add to the document manually, but it does not work for text that comes from the ready-made files.
Is there a way to do it?

You are adding the files as altChunks, but you are trying to replace text as if it were already part of the resulting document's Open XML markup.
When you merge documents as altChunks, you are basically adding them to the original document as embedded external files, not as Open XML markup. That means you cannot treat the attached documents as part of the main document's Open XML.
To achieve what you are trying to do, you have to merge the documents as explained in my answer here: https://stackoverflow.com/a/18352219/860243. That makes the result a proper Open XML document, which you can then modify as you wish.
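If the text only needs to change once, another option (not taken from the linked answer) is to do the replacement in the template before it is imported as an altChunk. The following is a minimal sketch under that assumption. The helper name AddContentFileWithReplace and its findText/replaceText parameters are hypothetical, and it only matches text that sits entirely inside a single run.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class AltChunkHelper
{
    // Hypothetical helper: edits a copy of the template in memory,
    // then imports the edited copy as an altChunk.
    public static void AddContentFileWithReplace(
        string id, string path, MainDocumentPart mainPart,
        string findText, string replaceText)
    {
        using (var buffer = new MemoryStream())
        {
            // Copy the template into an expandable in-memory stream
            byte[] bytes = File.ReadAllBytes(path);
            buffer.Write(bytes, 0, bytes.Length);

            // The template is still a normal Open XML document at this point,
            // so its Text elements can be edited directly
            using (var template = WordprocessingDocument.Open(buffer, true))
            {
                var body = template.MainDocumentPart.Document.Body;
                foreach (var text in body.Descendants<Text>())
                {
                    if (text.Text.Contains(findText))
                    {
                        text.Text = text.Text.Replace(findText, replaceText);
                    }
                }
                template.MainDocumentPart.Document.Save();
            }

            // Rewind and feed the edited copy into the altChunk part
            buffer.Position = 0;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, id);
            chunk.FeedData(buffer);
        }

        mainPart.Document.Body.Append(new AltChunk { Id = id });
        mainPart.Document.Save();
    }
}
```

This keeps the altChunk approach from the question intact. It does not help if the text must be changed after the final document has already been written, in which case the merge approach above still applies.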

Related

ZIP related exception after saving WordprocessingDocument

I am trying to replace text in a docx document based on this sample, with some modifications: https://learn.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part#sample-code
However, the saved document is no longer valid. Word is able to repair the file, but a "Number of entries expected in End Of Central Directory does not correspond to number of entries in Central Directory." exception is thrown at System.IO.Compression.ZipArchive.ReadCentralDirectory() when I try to open the created file again with WordprocessingDocument.
My code looks like this:
using (var fs = new FileStream(fn, FileMode.Open, FileAccess.Read, FileShare.Read))
using (var ms = new MemoryStream())
{
await fs.CopyToAsync(ms);
using (var wordDoc = WordprocessingDocument.Open(ms, true))
{
string docText;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
/*Regex regexText = new Regex("text to replace");
docText = regexText.Replace(docText, "new text");*/
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
await File.WriteAllBytesAsync(target, ms.GetBuffer());
}
using (var wordDoc = WordprocessingDocument.Open(target, true))
{
}
The issue is not related to the replace itself. Even reading the MainDocumentPart in any way causes this exception to be thrown.
Why the streams? I want to create and modify a document from a template and then save it to a stream. But I haven't found a CreateFromTemplate overload or a Save/SaveAs overload that accepts a stream.
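One likely contributor, offered here as an assumption rather than a confirmed diagnosis, is the call to ms.GetBuffer(). MemoryStream.GetBuffer() returns the whole underlying buffer, including unused capacity past the end of the written data, whereas MemoryStream.ToArray() returns only the bytes actually written. Trailing bytes after the ZIP end-of-central-directory record can confuse a ZIP reader. The small sketch below only illustrates the difference between the two calls:

```csharp
using System;
using System.IO;

class BufferDemo
{
    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // Write three bytes into an expandable MemoryStream
            ms.Write(new byte[] { 1, 2, 3 }, 0, 3);

            byte[] exact = ms.ToArray();   // only the written bytes
            byte[] raw = ms.GetBuffer();   // the whole internal buffer

            Console.WriteLine(exact.Length);               // 3
            Console.WriteLine(raw.Length >= exact.Length); // True
        }
    }
}
```

Under that assumption, writing ms.ToArray() to the target file instead of ms.GetBuffer() would be the first thing to try.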

Convert docx to byte[] and save it to the disk using File.WriteAllBytes

I am reading a docx file using the DocumentFormat.OpenXml library.
I am manipulating the file and need to write it to disk.
Doing this with the Open XML library is a no-brainer; the problem is that I need to pass the file content (byte[]) to a different API in my code, and that API handles the save operation.
This API uses File.WriteAllBytes. When I try to save my file via File.WriteAllBytes, I get XML inside the doc instead of the doc's real content.
How can I extract the byte[] from the doc and save it to disk using File.WriteAllBytes?
var path = "path/to/doc.docx";
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
docText = new Regex("BBB").Replace(docText, "CCC!");
// here I will manipulate docText
MemoryStream ms = new MemoryStream();
using (WordprocessingDocument wordDocument =
WordprocessingDocument.Create(ms , WordprocessingDocumentType.Document, true))
{
MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
Body body = new Body(new Paragraph(new Run(new Text(docText))));
mainPart.Document = new Document(body);
}
File.WriteAllBytes("path/to/cloned.docx", ms.ToArray());
}
This should do the trick:
(tested with SampleDoc.docx from GitHub)
var path = @"path/to/doc.docx";
byte[] byteArray = File.ReadAllBytes(path);
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(stream, true))
{
Body body = wordDoc.MainDocumentPart.Document.Body;
foreach (var text in body.Descendants<Text>())
{
text.Text = text.Text.Replace("BBB", "CCC!");
}
wordDoc.Close();
}
File.WriteAllBytes(path+".mod.docx", stream.ToArray());
}
It looks like you want to manipulate the raw XML, so you actually shouldn't use DocumentFormat.OpenXml at all--just treat your docx like the raw ZIP file that it is. Here is some sample code:
using System.IO;
using System.IO.Compression;
public static byte[] Change(string path)
{
// Make a temporary directory
var myTempDir = new DirectoryInfo(Path.Join(Path.GetTempPath(), Path.GetRandomFileName() ));
myTempDir.Create();
// Extract all the XML files in the docx to that temporary directory
using (ZipArchive zipArchive = ZipFile.OpenRead(path))
zipArchive.ExtractToDirectory(myTempDir.FullName);
// Read in the main document XML
FileInfo docFile = new FileInfo(Path.Join(myTempDir.FullName, "word", "document.xml"));
string rawXML = File.ReadAllText(docFile.FullName);
// Manipulate it-- warning, this could break the whole thing
rawXML = rawXML.Replace("winter", "spring");
// Save the manipulated xml back over the old file
docFile.Delete();
File.WriteAllText(docFile.FullName, rawXML);
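// Zip our temporary directory back into a docx file.
// GetTempFileName creates an empty file, and CreateFromDirectory fails
// if the destination already exists, so delete the placeholder first.
FileInfo tempFile = new FileInfo(Path.GetTempFileName());
tempFile.Delete();
ZipFile.CreateFromDirectory(myTempDir.FullName, tempFile.FullName);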
// Read the raw bytes in from our new file
byte[] rawBytes = File.ReadAllBytes(tempFile.FullName);
return rawBytes;
}
You might want to delete all those temp files, too--but I'll leave that part to you.

Generate docx from dotx without Microsoft Interop

I am using this code to replace 2 words and produce a *.doc file (destinationFile) from a *.dotx file (sourceFile).
Dictionary<string, string> keyValues = new Dictionary<string, string>();
keyValues.Add("xxxxReplacethat1", "replaced1");
keyValues.Add("xxxxReplacethat2", "replaced2");
File.Copy(sourceFile, destinationFile, true);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(destinationFile, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
foreach (KeyValuePair<string, string> item in keyValues)
{
Regex regexText = new Regex(item.Key);
docText = regexText.Replace(docText, item.Value);
}
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
How can I modify this code to produce a *.docx? I need to append some lines to the *.docx file in another function.
I don't want to use Microsoft Interop because I don't want to install it on the server.
I did this myself a couple of weeks ago.
Copy the file first, then open the copy and change its document type:
var template = @"SourceTemplate.dotx";
var destinationFile = @"DestinationFile.docx";
File.Copy(template, destinationFile);
using (WordprocessingDocument document = WordprocessingDocument.Open(destinationFile, true)) {
// Change the document's type here
document.ChangeDocumentType(WordprocessingDocumentType.Document);
// Do any additional processing here
document.Close();
}

Error while parsing Tinymce Editor string

I am parsing the HTML with the help of the following code. The HTML comes from the TinyMCE editor. I have shortened my code. There can be any number of images in the EmailBody string, as it is chosen by the user in the TinyMCE editor.
Everything works well except when there is an <img src=""> tag in the email body.
I get the error on this line: htmlWorker.Parse(sr);
string EmailBody = @"<p><img src=""http://weknowyourdreams.com/images/smile/smile-07.jpg""></p>";
using (var ms = new MemoryStream())
{
//Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
using (var doc = new Document())
{
//Create a writer that's bound to our PDF abstraction and our stream
using (var writer = PdfWriter.GetInstance(doc, ms))
{
//Open the document for writing
doc.Open();
using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc))
{
//HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses)
using (var sr = new StringReader(EmailBody))
{
//Parse the HTML
htmlWorker.Parse(sr);
}
}
doc.Close();
}
}
bytes = ms.ToArray();
}
This gives me the following error:
Cannot access a closed Stream
How can I fix this error?

Editing custom XML part in word document sometimes corrupts document

We have a system that stores some custom templating data in a Word document. Sometimes, updating this data causes Word to complain that the document is corrupted. When that happens, if I unzip the docx file and compare the contents to the previous version, the only difference appears to be the expected change in the customXML\item.xml file. If I re-zip the contents using 7zip, it seems to work OK (Word no longer complains that the document is corrupt).
The (simplified) code:
void CreateOrReplaceCustomXml(string filename, MyCustomData data)
{
using (var doc = WordprocessingDocument.Open(filename, true))
{
var part = GetCustomXmlParts(doc).SingleOrDefault();
if (part == null)
{
part = doc.MainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
}
var serializer = new DataContractSerializer(typeof(MyCustomData));
using (var stream = new MemoryStream())
{
serializer.WriteObject(stream, data);
stream.Seek(0, SeekOrigin.Begin);
part.FeedData(stream);
}
}
}
IEnumerable<CustomXmlPart> GetCustomXmlParts(WordprocessingDocument doc)
{
return doc.MainDocumentPart.CustomXmlParts
.Where(part =>
{
using (var stream = doc.Package.GetPart(part.Uri).GetStream())
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd().Contains("Some.Namespace");
}
});
}
Any suggestions?
Since re-zipping works, it seems the content is well-formed.
So it sounds like the zip process is at fault. So open the corrupted docx in 7-Zip, and take note of the values in the "method" column (especially for customXML\item.xml).
Compare that value to a working docx - is it the same or different? Method "Deflate" works.
I faced the same issue and it turned out it was due to encoding.
Do you already specify the same encoding when serializing/deserializing?
A couple of suggestions:
a. Try doc.Package.Flush(); after you write the data back into the custom XML part.
b. You may have to delete all custom XML parts and add a new one. We are using the following code, and it seems to work fine.
public static void ReplaceCustomXML(WordprocessingDocument myDoc, string customXML)
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);
CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (StreamWriter ts = new StreamWriter(customXmlPart.GetStream()))
{
ts.Write(customXML);
ts.Flush();
ts.Close();
}
}
public static MemoryStream GetCustomXmlPart(MainDocumentPart mainPart)
{
foreach (CustomXmlPart part in mainPart.CustomXmlParts)
{
using (XmlTextReader reader =
new XmlTextReader(part.GetStream(FileMode.Open, FileAccess.Read)))
{
reader.MoveToContent();
if (reader.Name.Equals("aaaa", StringComparison.OrdinalIgnoreCase))
{
string str = reader.ReadOuterXml();
byte[] byteArray = Encoding.ASCII.GetBytes(str);
MemoryStream stream = new MemoryStream(byteArray);
return stream;
}
}
}
return null; //result;
}
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(ms, true))
{
StreamReader reader = new StreamReader(memStream);
string FullXML = reader.ReadToEnd();
ReplaceCustomXML(myDoc, FullXML);
myDoc.Package.Flush();
//Code to save file
}
