EDIT:
The problem is now solved. The document's XML contains attributes named 'name', and I was accidentally changing those as well. The solution was to use a more obscure placeholder in the .docx file.
I am creating a program that modifies a Word document using Open XML, but every time the program runs the file ends up corrupt. I don't know why, or whether there is any way around it.
One suggestion I found was to make sure the connection to the file is closed. I tried that, but I'm not sure whether the connection is still open.
Edit:
Word reports the output file as corrupt, but once Word's recovery has run, the file is as it should be.
From the images/code:
The original file is copied to temp.docx and contains the text "name".
I need the program to replace "name" with another word.
The program is partly working: it changes the value in the document, but it also corrupts the document.
link to photos: https://drive.google.com/open?id=0B130JvN0ZPPRODJpZWZENTNUX0E
CODE
private void gen_btn_Click(object sender, EventArgs e)
{
    if (System.IO.File.Exists(@"C:\invoices\temp.docx"))
    {
        // Use a try block to catch IOExceptions, to
        // handle the case of the file already being
        // opened by another process.
        try
        {
            System.IO.File.Delete(@"C:\invoices\temp.docx");
        }
        catch (System.IO.IOException exception)
        {
            Console.WriteLine(exception.Message);
            return;
        }
    }
    File.Copy(@"C:\invoices\template.docx", @"C:\invoices\temp.docx");
    SearchAndReplace("name", "asdsadsadasdasdas");
}
public static void SearchAndReplace(string wordtoreplace, string replace)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(@"C:\invoices\temp.docx", true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }
        //Regex regexText = new Regex(wordtoreplace);
        docText = docText.Replace(wordtoreplace, replace);
        using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
        wordDoc.Close();
    }
}
The problem is that the document stream you are opening is an XML document. It contains much more than the words that are typed in your document. There are XML attributes named "name" that are being replaced by your code which make the document no longer validate against the schema.
You can continue doing a plain text replace if you use a more distinctive search term. For example, if your search term is "asdf", it would be pretty safe to replace because that value won't appear in the XML markup.
To do this correctly, you need to parse the XML document. The XML elements that contain the actual text are named "w:t". If you loop through all of the "w:t" XML elements, you can do your plain text replace on their "InnerText" values. This guarantees that the element and attribute names stay intact, so the XML remains valid.
Note that you will still have problems if you parse the XML directly. If you type your token text ("name" in this case) and then apply some kind of formatting (like bold) to the middle of the word, you will no longer be able to find "name" in a single "w:t" element. Applying the formatting splits the text "name" across more than one "w:t" element. To get this to work in my project, I added an intermediate step that merged the "w:t" elements before I searched for the tokens. The trick here is knowing when the elements can't be merged due to formatting differences.
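A minimal sketch of the "w:t" approach described above, assuming the main document part has already been read into a string. It uses only LINQ to XML; the class and method names (WordTextReplacer, ReplaceInTextElements) are hypothetical, and it does not attempt the merging of split "w:t" elements mentioned above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class WordTextReplacer
{
    // WordprocessingML main namespace, normally bound to the "w" prefix.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces 'search' with 'replacement' only inside w:t elements,
    // so element names and attribute values are never touched.
    public static string ReplaceInTextElements(string documentXml, string search, string replacement)
    {
        XDocument doc = XDocument.Parse(documentXml, LoadOptions.PreserveWhitespace);

        // Materialize the list first so the tree is not modified while it is being enumerated.
        foreach (XElement t in doc.Descendants(W + "t").ToList())
        {
            if (t.Value.Contains(search))
            {
                t.Value = t.Value.Replace(search, replacement);
            }
        }

        // Note: XDocument.ToString() does not emit the XML declaration.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

With input such as `<w:bookmarkStart w:id="0" w:name="name"/>` next to `<w:t>Dear name,</w:t>`, only the text inside "w:t" changes; the `w:name` attribute is left alone. Writing the result back into the document part (including the XML declaration) is left out of this sketch.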
Related
I have a Windows service that copies files to a folder and replaces text in Word documents. For the replacement I use the code from this sample: Find and Replace text in a Word document
The problem is that the files stay in use until I copy the next set of files to another folder (and fill in the next Word document).
My code for the Search and Replace looks like this:
using (var flatDocument = new FlatDocument(fullpath))
{
    flatDocument.FindAndReplace("ValueA", "ValueB");
    // Save document on Dispose.
}
If I skip this code, the service runs fine and the files are not in use after the copy. Why do they stay in use even after the using block has finished?
Maybe someone has a clue?
I think there might be a bug in the Developer Center sample code "Find and Replace text in a Word document".
In short, it keeps the file handle open because the FlatDocument class never calls Dispose on the underlying FileStream. This seems odd, as you would expect Package.Dispose to clean up the handle, yet it doesn't.
If you modify the FlatDocument class as I have done below, it should fix the problem.
In the constructor
private Stream _stream; // Add this
public FlatDocument(Stream stream)
{
    if (stream == null)
    {
        throw new ArgumentNullException("stream");
    }
    _stream = stream; // Add this
    documents = XDocumentCollection.Open(stream);
    ranges = new List<FlatTextRange>();
    CreateFlatTextRanges();
}
In Dispose
public void Dispose()
{
    documents.Dispose();
    _stream.Dispose(); // Add this
}
I'd like to do the following in C#:
1. Open a text file
2. Search for every occurrence of a string and replace it with a new string
Here is my code:
using (StreamReader sr = new StreamReader(@"S:\Personal Folders\A\TESTA.txt"))
{
    using (StreamWriter sw = new StreamWriter(@"S:\Personal Folders\A\TESTB.txt"))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            if (!line.Contains("before"))
            {
                sw.WriteLine(line);
            }
            else if (line.Contains("before"))
            {
                sw.WriteLine(line.Replace("before", "after"));
            }
        }
    }
}
The code above generates a new file with the replacements applied, but it does so by reading each line of the original file and writing it to a new file. This achieves my goal, but I am concerned about I/O overhead because it reads and writes line by line. I also cannot read all the lines into an array first and write them afterwards: the file is large, and loading it into a string[], replacing everything, and writing the array back causes memory problems.
Is there any way to locate just the specific lines, replace those, and keep all the rest? Or what is the best way to solve this problem? Thanks.
I don't know what IO issue you are worried about, but your code should work ok. You can code more concisely as follows:
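using (StreamReader sr = new StreamReader(@"S:\Personal Folders\A\TESTA.txt"))
{
    using (StreamWriter sw = new StreamWriter(@"S:\Personal Folders\A\TESTB.txt"))
    {
        // Declare the variable outside the loop condition; C# does not allow
        // a declaration inside a parenthesized assignment expression.
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            sw.WriteLine(line.Replace("before", "after"));
        }
    }
}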
This will run a bit faster because it searches for "before" only once per line. By default, StreamWriter buffers your writes and does not flush to disk on every WriteLine call, and the operating system buffers file I/O as well, so there is little reason to worry about I/O here.
In general, what you are doing is correct, possibly followed by some renames to replace the original file. If you do want to replace the original file, you should rename the original file to a temporary name, rename the new file to the original name, and then either leave or delete the original file. You must handle conflicts with your temporary name and errors in all renames.
Consider that you are replacing a six-character string with a five-character string. If you write back to the original file, what will you do with the extra characters? Files are stored on disk as raw bytes of data; there is no "text" file on disk. And if you replace a string with a longer one, you potentially have to move the entire rest of the file to make room for the longer line.
You can imagine the file on disk as letters written in the boxes of a sheet of graph paper. The end of each line is marked by a special character (or characters; on Windows, that is CRLF), and the characters fill all the boxes horizontally. If you tried to replace words on the graph paper, you would have to erase and rewrite lots of letters. Writing on a new sheet is easiest.
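The "write a new file, then swap it in" approach described above could look roughly like this. It is a sketch under assumptions: the class and method names (FileTextReplacer, ReplaceInFile) and the ".tmp" suffix are hypothetical, the temporary file is assumed not to clash with an existing file, and encoding detection is left to the StreamReader and StreamWriter defaults.

```csharp
using System.IO;

public static class FileTextReplacer
{
    // Streams 'path' line by line into a temporary file with the replacement
    // applied, then swaps the temporary file in place of the original.
    public static void ReplaceInFile(string path, string before, string after)
    {
        // Hypothetical temp name; kept next to the original so both are on the same volume.
        string tempPath = path + ".tmp";

        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // WriteLine uses Environment.NewLine, so line endings may be normalized.
                writer.WriteLine(line.Replace(before, after));
            }
        }

        // Replace the original with the new file; null means no backup copy is kept.
        File.Replace(tempPath, path, null);
    }
}
```

Only one line is held in memory at a time, so the size of the file does not matter. Error handling for a failed swap is omitted here.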
Well, your approach is basically fine, but I wouldn't check whether the line contains the word first. The extra check isn't worth it, because Replace simply returns the line unchanged when there is no match:
using (StreamReader sr = new StreamReader(@"S:\Personal Folders\A\TESTA.txt"))
{
    using (StreamWriter sw = new StreamWriter(@"S:\Personal Folders\A\TESTB.txt"))
    {
        String line;
        while ((line = sr.ReadLine()) != null)
            sw.WriteLine(line.Replace("before", "after"));
    }
}
Try the following, which replaces the first matching line and then copies the rest of the file unchanged:
else if (line.Contains("before"))
{
    sw.WriteLine(line.Replace("before", "after"));
    sw.Write(sr.ReadToEnd());
    break;
}
I am using the Open XML SDK to highlight a specific word inside a docx file, but I am not able to do it. After extensive research, I tried opening the document, changing the color of the word, and saving it again. Nothing appears to be saved, even though the document's last-modified time is updated to the current time.
What is wrong with this code?
void HighLightWord(string documentUrl, string word)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(documentUrl, true))
    {
        var body = wordDoc.MainDocumentPart.Document.Body;
        var paras = body.Elements<Paragraph>();
        DocumentFormat.OpenXml.Wordprocessing.Color color = new DocumentFormat.OpenXml.Wordprocessing.Color();
        foreach (var para in paras)
        {
            foreach (var run in para.Elements<Run>())
            {
                foreach (var text in run.Elements<Text>())
                {
                    if (text.Text.Contains(word))
                    {
                        color.Val = "365F91";
                        run.Append(color);
                        wordDoc.MainDocumentPart.Document.Save();
                        return;
                    }
                }
            }
        }
        wordDoc.Close(); // close the template file
    }
}
Create a simple document in the Word application with the formatting you need. Save and close it. Open the document in the Open XML SDK Productivity Tool, which will generate the code required to create the document. Then you can compare your code to the tool's.
FWIW, any kind of run formatting is a child element of the RunProperties, so appending a color directly to the Run cannot work. In addition, you need to create an object for the kind of formatting (it's not clear from your description whether you want to change the text color or apply highlight formatting). That object is what gets appended to the RunProperties. It's also important that your code first checks whether the Run already has RunProperties. If not, they need to be created before anything can be appended to them.
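As a rough illustration of the point about RunProperties, the sketch below shows both variants (text color and highlight). It assumes the Open XML SDK (DocumentFormat.OpenXml package) is referenced; the RunFormatting class and its method names are hypothetical helpers, not part of the SDK.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class RunFormatting
{
    // Returns the run's RunProperties, creating them first when the run has none.
    private static RunProperties GetOrCreateRunProperties(Run run)
    {
        RunProperties props = run.RunProperties;
        if (props == null)
        {
            props = new RunProperties();
            run.PrependChild(props); // w:rPr has to come before the text in w:r
        }
        return props;
    }

    // Changes the text color of the whole run, e.g. "365F91".
    public static void SetRunColor(Run run, string hexColor)
    {
        GetOrCreateRunProperties(run).Color = new Color { Val = hexColor };
    }

    // Applies highlight formatting (the marker-pen effect) to the whole run.
    public static void SetRunHighlight(Run run, HighlightColorValues highlight)
    {
        GetOrCreateRunProperties(run).Highlight = new Highlight { Val = highlight };
    }
}
```

In the question's loop, `run.Append(color)` would become something like `RunFormatting.SetRunColor(run, "365F91")`. Note that this formats the entire run, not just the matched word; formatting only the word would require splitting the run, which is not shown here.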
So far I've been able to set custom properties on a Word document by using VSTO and adding a package stream to the active document, as follows:
public static void SetCustomProperty(Microsoft.Office.Interop.Word.Document doc, string propertyName, object propertyValue)
{
    using (MemoryStream stream = new MemoryStream())
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document, true))
    {
        SetProperty(wordDoc, propertyName, propertyValue);
        // Flush the contents of the package.
        wordDoc.Package.Flush();
        // Convert back to flat OPC by using this in-memory package.
        XDocument xDoc = OpcHelper.OpcToFlatOpc(wordDoc.Package);
        // Return the xml string.
        string openxml = xDoc.ToString();
        // Add to Word doc
        doc.CustomXMLParts.Add(openxml);
    }
}
The SetProperty method works as explained here and the OpcHelper can be found here and is explained here.
The problem is that my custom property is inserted into an XML file (e.g. item1.xml) located in the document.zip\customXml folder of the Open XML package. Later on, when I want to read my custom property, I use WordprocessingDocument.CustomFilePropertiesPart, which is empty. In fact, I found that CustomFilePropertiesPart refers to the document.zip\docProps\custom.xml file.
So instead of doc.CustomXMLParts.Add(openxml), what should I use to populate the right XML file, i.e. document.zip\docProps\custom.xml?
EDIT
I had already tried the solution proposed by Mishra without success, i.e. custom properties were not always saved. However, after he posted it I tried again, and I found here that you first need to mark the document as unsaved:
doc.CustomDocumentProperties.Add("MyProp", false, MsoDocProperties.msoPropertyTypeNumber, 123);
doc.Saved = false;
doc.Save();
You can't set custom properties using the CustomXMLParts collection. If you already have the document open, keep it simple and use the CustomDocumentProperties collection; it's quite fast and easy. I would use Open XML on an open document only if the data to insert is very large.
Is there anything built in to determine whether an XML file is valid? One way would be to read the entire content and check whether the string represents valid XML. Even then, how do I determine whether a string contains valid XML data?
Create an XmlReader around a StringReader with the XML and read through the reader:
using (var reader = XmlReader.Create(new StringReader(xml)))
{
    while (reader.Read())
    {
        // Nothing to do; reading to the end is the check.
    }
}
If you don't get any exceptions, the XML is well-formed.
Unlike XDocument or XmlDocument, this will not hold an entire DOM tree in memory, so it will run quickly even on extremely large XML files.
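Wrapped into a helper, the XmlReader check might look like the sketch below. The XmlChecks class and IsWellFormed method are hypothetical names. The default XmlReaderSettings are assumed, which among other things prohibit DTDs, so a document with a DOCTYPE declaration is reported as not well-formed here.

```csharp
using System.IO;
using System.Xml;

public static class XmlChecks
{
    // Returns true when 'xml' can be read to the end without an XmlException.
    // This checks well-formedness only, not validity against a schema.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read())
                {
                    // Reading every node is the whole check.
                }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For a file rather than a string, XmlReader.Create also accepts a file path or a Stream, so the same loop applies without loading the content into memory first.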
You can try to load the XML into an XmlDocument and catch the exception.
Here is the sample code:
var doc = new XmlDocument();
try {
    doc.LoadXml(content);
} catch (XmlException e) {
    // put code here that should be executed when the XML is not valid.
}
Hope it helps.
Have a look at this question:
How to check for valid xml in string input before calling .LoadXml()