I am trying to replace text in a docx document based on this sample, with some modifications: https://learn.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part#sample-code
However, the saved document is not valid anymore. Word is able to correct the file, but there is a Number of entries expected in End Of Central Directory does not correspond to number of entries in Central Directory. exception is thrown at System.IO.Compression.ZipArchive.ReadCentralDirectory() when trying to open the created file again with WordprocessingDocument.
My code looks like this:
using (var fs = new FileStream(fn, FileMode.Open, FileAccess.Read, FileShare.Read))
using (var ms = new MemoryStream())
{
await fs.CopyToAsync(ms);
using (var wordDoc = WordprocessingDocument.Open(ms, true))
{
string docText;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
/*Regex regexText = new Regex("text to replace");
docText = regexText.Replace(docText, "new text");*/
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
await File.WriteAllBytesAsync(target, ms.GetBuffer());
}
using (var wordDoc = WordprocessingDocument.Open(target, true))
{
}
The issue is not repated to the replace itself. Even reading the MainDocumentPart in any way causes this exception to be thrown.
Why the streams? I want to create and modify a document from template and save it afterwards to a stream. But I haven't found any CreateFromTemplate overload neither a Save/SaveAs overload that accepts a stream.
Related
I am reading a docx file using DocumentFormat.OpenXml lib.
I am manipulating the file and need to write it to the disk.
Doing this using the openxml lib is no brainer, the problem is that I need to pass the file content (byte[]) to a different API in my code and this API is handling the save operation.
This api is using File.WriteAllBytes. When I try to save my file via File.WriteAllBytes I get XML inside the doc instead of the doc read content.
How can I extract the byte[] from the doc and save it to the disc using File.WriteAllBytes
var path = "path/to/doc.docx";
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
docText = new Regex("BBB").Replace(docText, "CCC!");
// here i will manipuldate docText
MemoryStream ms = new MemoryStream();
using (WordprocessingDocument wordDocument =
WordprocessingDocument.Create(ms , WordprocessingDocumentType.Document, true))
{
MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
Body body = new Body(new Paragraph(new Run(new Text(docText))));
mainPart.Document = new Document(body);
}
File.WriteAllBytes("path/to/cloned.docx", ms.ToArray());
}
this should do the trick:
(tested with SampleDoc.docx from Github)
var path = #"path/to/doc.docx";
byte[] byteArray = File.ReadAllBytes(path);
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(stream, true))
{
Body body = wordDoc.MainDocumentPart.Document.Body;
foreach (var text in body.Descendants<Text>())
{
text.Text = text.Text.Replace("BBB", "CCC!");
}
wordDoc.Close();
}
File.WriteAllBytes(path+".mod.docx", stream.ToArray());
}
debug output:
It looks like you want to manipulate the raw XML, so you actually shouldn't use DocumentFormat.OpenXml at all--just treat your docx like the raw ZIP file that it is. Here is some sample code:
using System.IO;
using System.IO.Compression;
public static byte[] Change(string path)
{
// Make a temporary directory
var myTempDir = new DirectoryInfo(Path.Join(Path.GetTempPath(), Path.GetRandomFileName() ));
myTempDir.Create();
// Extract all the XML files in the docx to that temporary directory
using (ZipArchive zipArchive = ZipFile.OpenRead(path))
zipArchive.ExtractToDirectory(myTempDir.FullName);
// Read in the main document XML
FileInfo docFile = new FileInfo(Path.Join(myTempDir.FullName, "word", "document.xml"));
string rawXML = File.ReadAllText(docFile.FullName);
// Manipulate it-- warning, this could break the whole thing
rawXML = rawXML.Replace("winter", "spring");
// Save the manipulated xml back over the old file
docFile.Delete();
File.WriteAllText(docFile.FullName, rawXML);
// Zip our temporary directory back into a docx file
FileInfo tempFile = new FileInfo(Path.GetTempFileName());
ZipFile.CreateFromDirectory(myTempDir.FullName, tempFile.FullName);
// Read the raw bytes in from our new file
byte[] rawBytes = File.ReadAllBytes(tempFile.FullName);
return rawBytes;
}
You might want to delete all those temp files, too--but I'll leave that part to you.
When I execute the below code to save the word file following exception occurs.please help me to correct the issue.thanks.
Cannot get stream with FileMode.Create, FileMode.CreateNew, FileMode.Truncate, FileMode.Append when access is FileAccess.Read.
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
//Regex regexText = new Regex("<var_Date>");
docText = docText.Replace("<var_Date>", DateTime.Now.ToString("MMM dd,yyyy"));
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
sw.Write(docText);
}
Right here:
WordprocessingDocument.Open(path, false))
The second argument is isEditable, you passed false. So you've opened it as read-only.
Ref: https://msdn.microsoft.com/en-us/library/cc562234.aspx
I know there is alot of posts on it, BUT nothing worked for my problem:
Im using OPENxml to create word document, and I am adding some ready files to the document during the creation. I want to change some text in the file that I am adding after the document is ready. So thats what I tried:
First creating the document:
fileName = HttpContext.Current.Server.MapPath("~/reports/"+fileName+".docx");
using (var doc = WordprocessingDocument.Create(
fileName, WordprocessingDocumentType.Document))
{
///add files and content inside the document
addContentFile("template1part1", HttpContext.Current.Server.MapPath("~/templates/template1part1.docx"), mainPart);
}
this is how I am adding the files:
private static void addContentFile(string id,string path, MainDocumentPart mainPart){
string altChunkId = id;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
using (FileStream fileStream = File.Open(path, FileMode.Open))
{
chunk.FeedData(fileStream);
fileStream.Close();
}
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document.Body.Append(altChunk);
mainPart.Document.Save();
}
And this is how I am trying to replace text AFTER I created the file (after i finished to use WordprocessingDocument)
First try:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
docText = sr.ReadToEnd();
docText = new Regex(findText, RegexOptions.IgnoreCase).Replace(docText, replaceText);
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
sw.Write(docText);
}
Second try:
using ( WordprocessingDocument doc =
WordprocessingDocument.Open(#"yourpath\testdocument.docx", true))
{
var body = doc.MainDocumentPart.Document.Body;
var paras = body.Elements<Paragraph>();
foreach (var para in paras)
{
foreach (var run in para.Elements<Run>())
{
foreach (var text in run.Elements<Text>())
{
if (text.Text.Contains("text-to-replace"))
{
text.Text = text.Text.Replace("text-to-replace", "replaced-text");
}
}
}
}
}
}
None of them worked, and I tried much more.
Its worked for text that I am manually add to the document, but its now working for text that I am adding from the ready files.
there is a way to do it?
The way you are adding the files are using altchuncks. But you are trying to replace things as if you are modifying the resulting document's openxml.
When you merge documents as altchuncks you are basically adding them as embedded external files to the original document but not as openxml markup. Which means you cannot treat the additional attached documents as openxml documents.
If you want to achieve what you are trying, you have to merge the documents as explained in my answer here - https://stackoverflow.com/a/18352219/860243 which makes the resulting document a proper openxml document. Which allows you to modify it later as you wish.
We have a system that stores some custom templating data in a Word document. Sometimes, updating this data causes Word to complain that the document is corrupted. When that happens, if I unzip the docx file and compare the contents to the previous version, the only difference appears to be the expected change in the customXML\item.xml file. If I re-zip the contents using 7zip, it seems to work OK (Word no longer complains that the document is corrupt).
The (simplified) code:
void CreateOrReplaceCustomXml(string filename, MyCustomData data)
{
using (var doc = WordProcessingDocument.Open(filename, true))
{
var part = GetCustomXmlParts(doc).SingleOrDefault();
if (part == null)
{
part = doc.MainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
}
var serializer = new DataContractSerializer(typeof(MyCustomData));
using (var stream = new MemoryStream())
{
serializer.WriteObject(stream, data);
stream.Seek(0, SeekOrigin.Begin);
part.FeedData(stream);
}
}
}
IEnumerable<CustomXmlPart> GetCustomXmlParts(WordProcessingDocument doc)
{
return doc.MainDocumentPart.CustomXmlParts
.Where(part =>
{
using (var stream = doc.Package.GePart(c.Uri).GetStream())
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd().Contains("Some.Namespace");
}
});
}
Any suggestions?
Since re-zipping works, it seems the content is well-formed.
So it sounds like the zip process is at fault. So open the corrupted docx in 7-Zip, and take note of the values in the "method" column (especially for customXML\item.xml).
Compare that value to a working docx - is it the same or different? Method "Deflate" works.
I faced the same issue and it turned out it was due to encoding.
Do you already specify the same encoding when serializing/deserializing?
Couple of suggestion
a. Try doc.Package.Flush(); after you write the data back into the custom xml.
b. You may have to delete all custom part and add a new custom part. We are using the following code and it seems working fine.
public static void ReplaceCustomXML(WordprocessingDocument myDoc, string customXML)
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);
CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (StreamWriter ts = new StreamWriter(customXmlPart.GetStream()))
{
ts.Write(customXML);
ts.Flush();
ts.Close();
}
}
public static MemoryStream GetCustomXmlPart(MainDocumentPart mainPart)
{
foreach (CustomXmlPart part in mainPart.CustomXmlParts)
{
using (XmlTextReader reader =
new XmlTextReader(part.GetStream(FileMode.Open, FileAccess.Read)))
{
reader.MoveToContent();
if (reader.Name.Equals("aaaa", StringComparison.OrdinalIgnoreCase))
{
string str = reader.ReadOuterXml();
byte[] byteArray = Encoding.ASCII.GetBytes(str);
MemoryStream stream = new MemoryStream(byteArray);
return stream;
}
}
}
return null; //result;
}
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(ms, true))
{
StreamReader reader = new StreamReader(memStream);
string FullXML = reader.ReadToEnd();
ReplaceCustomXML(myDoc, FullXML);
myDoc.Package.Flush();
//Code to save file
}
I have a RTF file that I want to open, replace a String "TEMPLATE_Name" and save. But after saving, the file cannot open correctly again. When I use MS Word, the file opens and shows the RTF raw code instead the text.
I am afraid I am breaking the format or the encoding but I don't really know how:
using (MemoryStream ms = new MemoryStream(1000))
using (StreamWriter sw = new StreamWriter(ms,Encoding.UTF8))
{
using (Stream fsSource = new FileStream(Server.MapPath("~/LetterTemplates/TestTemplate.rtf"), FileMode.Open))
using (StreamReader sr = new StreamReader(fsSource,Encoding.UTF8))
while (!sr.EndOfStream)
{
String line = sr.ReadLine();
line = line.Replace("TEMPLATE_Name", model.FirstName + " " + model.LastName);
sw.WriteLine(line);
}
ms.Position = 0;
using (FileStream fs = new FileStream(Server.MapPath("~/LetterTemplates/test.rtf"), FileMode.Create))
ms.CopyTo(fs);
}
Any idea about what could be the issue?
Thanks.
SOLUTION: One problem was what #BrokenGlass has pointed out, the fact I was not flushing the stream. The other was the encoding. In the fist line of the RTF file I can see:
{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\
So, even without understand anything about RTF, I set the encoding to code page 1252 and it works:
using (MemoryStream ms = new MemoryStream(1000))
using (StreamWriter sw = new StreamWriter(ms,Encoding.GetEncoding(1252)))
{
using (Stream fsSource = new FileStream(Server.MapPath("~/LetterTemplates/TestTemplate.rtf"), FileMode.Open))
using (StreamReader sr = new StreamReader(fsSource,Encoding.GetEncoding(1252)))
while (!sr.EndOfStream)
{
String line = sr.ReadLine();
line = line.Replace("TEMPLATE_Name", model.FirstName + " " + model.LastName);
sw.WriteLine(line);
}
sw.Flush();
ms.Position = 0;
using (FileStream fs = new FileStream(Server.MapPath("~/LetterTemplates/test.rtf"), FileMode.Create))
ms.CopyTo(fs);
}
StreamWriter is buffering content - make sure you call sw.Flush() before reading from your memory stream.
StreamWriter.Flush():
Clears all buffers for the current writer and causes any buffered data
to be written to the underlying stream.
Edit in light of comments:
A better alternative as #leppie alluded to is restructuring the code to use the using block to force flushing, instead of explicitly doing it:
using (MemoryStream ms = new MemoryStream(1000))
{
using (StreamWriter sw = new StreamWriter(ms,Encoding.UTF8))
{
//...
}
ms.Position = 0;
//Write to file
}
An even better alternative as #Slaks pointed out is writing to the file directly and not using a memory stream at all - unless there are other reasons you are doing this this seems to be the most straightforward solution, it would simplify your code and avoid buffering the file in memory.