I am trying to replace a bookmark in a docx file with text in C++/CLI using the Open XML SDK.
The code below fetches the bookmarks from a Word document and checks whether each bookmark's name matches the string "VERSION". If it does, the bookmark is replaced with the string "0000" in the docx file.
Paragraph ^paragraph = gcnew Paragraph();
Run ^run = gcnew Run();
DocumentFormat::OpenXml::Wordprocessing::Text^ text = gcnew DocumentFormat::OpenXml::Wordprocessing::Text("0000");
run->AppendChild(text);
paragraph->AppendChild(run);
IDictionary<String^, BookmarkStart^> ^bookmarkMap =
    gcnew Dictionary<String^, BookmarkStart^>();
for each (BookmarkStart ^bookmarkStart in
    GlobalObjects::wordDoc->MainDocumentPart->RootElement->Descendants<BookmarkStart^>())
{
    if (bookmarkStart->Name->Value == "VERSION")
    {
        bookmarkStart->Parent->InsertAt<Paragraph^>(paragraph,3);
    }
}
The above code works fine in most scenarios (wherever we insert bookmarks), but sometimes it fails and I am not able to find the reason.
Also, if the bookmark is inserted at the start of a line, I am not able to open the docx file after execution; Word reports errors.
I tried passing 0 as the index to the InsertAt method, but that does not work either.
Please provide a solution for the above.
Thanks in advance.
See How to Retrieve the Text of a Bookmark from an OpenXML WordprocessingML Document for code that retrieves text. It is written in C#, but you could use the code directly from C++/CLI.
See Replacing Text of a Bookmark in an OpenXML WordprocessingML Document for an algorithm that you can use to replace text.
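One possible cause of the corrupt file (an assumption, since the failing document is not shown): a bookmarkStart usually sits inside a paragraph, so bookmarkStart->Parent is a w:p, and a w:p nested directly inside another w:p is not valid WordprocessingML. The sketch below shows the underlying XML change with the run inserted instead of a whole paragraph. It is written in Python with only the standard library rather than with the Open XML SDK, the function name and the sample XML are made up for illustration, and it is simpler than the algorithm in the linked articles: it only adds a run right after the bookmark start and does not remove any existing bookmarked content.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def insert_text_after_bookmark(root, bookmark_name, new_text):
    """Insert a w:r/w:t run right after each matching w:bookmarkStart.

    Only bookmarks whose parent is a paragraph (w:p) are handled, because a
    run is not a valid child of w:body. Returns the number of runs inserted.
    """
    inserted = 0
    # ElementTree has no parent pointers, so visit every element as a parent.
    for parent in list(root.iter()):
        if parent.tag != f"{{{W}}}p":
            continue
        for child in list(parent):
            is_bookmark = child.tag == f"{{{W}}}bookmarkStart"
            if is_bookmark and child.get(f"{{{W}}}name") == bookmark_name:
                run = ET.Element(f"{{{W}}}r")
                text = ET.SubElement(run, f"{{{W}}}t")
                text.text = new_text
                # Look the position up again, since earlier inserts shift it.
                position = list(parent).index(child)
                parent.insert(position + 1, run)
                inserted += 1
    return inserted


# Hypothetical one-paragraph document containing the VERSION bookmark.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:bookmarkStart w:id="0" w:name="VERSION"/>'
    '<w:bookmarkEnd w:id="0"/>'
    '</w:p></w:body></w:document>'
)

root = ET.fromstring(SAMPLE)
count = insert_text_after_bookmark(root, "VERSION", "0000")
print(count)
print(ET.tostring(root, encoding="unicode"))
```

In SDK terms the equivalent idea would be to insert the Run next to the BookmarkStart rather than inserting a Paragraph into its parent at a fixed index.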
I used iText7 in my C# code to create a PDF file, as I described in my other question here:
Itext7 not showing arabic text
I gave up on trying to fix it, because it seems I need to pay for the add-on, and I can't do that.
I tried PDFsharp. It showed Arabic letters, but they were disconnected and reversed, and writing the Arabic backward did not make the letters connect.
I used the SautinSoft library and it created a Word document where Arabic works fine, but it has a footer saying that it is a free version, so I can't use this one either.
The PDF created by this library also doesn't support Arabic.
So I think I can't write a PDF in Arabic; none of the libraries I tried supported it.
Is there any way to fix it?
Or can anyone please suggest another library that can create an Arabic PDF or Word document without watermarks or footers?
I found the solution: using the GemBox PDF library. It only allows 20 paragraphs, but that is more than enough.
What about DocumentCore?
public static void SecureDocument()
{
    string filePath = @"ProtectedDocument.pdf";
    DocumentCore dc = new DocumentCore();
    // Let's create a simple document.
    dc.Content.End.Insert("Hello World!!!", new CharacterFormat() { FontName = "Verdana", Size = 65.5f, FontColor = Color.Orange });
    PdfSaveOptions so = new PdfSaveOptions();
    // Password protection
    so.EncryptionDetails.UserPassword = "12345";
    // Encryption algorithm
    so.EncryptionDetails.EncryptionAlgorithm = PdfEncryptionAlgorithm.RC4_128;
    // Permissions: content copying, commenting, printing, changing the document, filling of form fields
    // Printing: allowed
    so.EncryptionDetails.Permissions = PdfPermissions.Printing;
    // Save the document as a PDF file with security options.
    dc.Save(filePath, so);
    // Open the result for demonstration purposes.
    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(filePath) { UseShellExecute = true });
}
I am trying to copy the entire content (including page numbers and page layout) from one Word document to another, using Microsoft.Office.Interop.Word.
I cannot use the SaveAs method because the document into which I want to paste the contents already exists and contains VBA code.
Also, I cannot use XML-related code because the document into which I am copying the content is in the older format. This document is part of an old way of uploading a document to a server database, using VBA code.
Using VBA code, I can copy the entire content without any issue.
Selection.WholeStory
Selection.Copy
Windows("document.doc").Activate
Selection.WholeStory
Selection.PasteAndFormat (wdFormatOriginalFormatting)
For C#, I used Microsoft.Office.Interop.Word to replicate the VBA code.
Word.Application objWordOpen = new Word.Application();
objWordOpen.Visible = false;
Word.Document doclocal = objWordOpen.Documents.Open(filepath);
doclocal.ActiveWindow.Selection.WholeStory();
doclocal.ActiveWindow.Selection.Copy();
Word.Document d1 = objWordOpen.Documents.Open(filepath2);
d1.Activate();
d1.ActiveWindow.Selection.WholeStory();
d1.ActiveWindow.Selection.PasteAndFormat(Word.WdRecoveryType.wdFormatOriginalFormatting);
I have also tried using a Range:
Word.Range oRange = doclocal.Content;
oRange.Copy();
The content is copied into the document, but without headers and footers. Also, when using the Selection.WholeStory() approach, the page margin settings don't get copied.
What changes should I make to the C# code in order to achieve my result?
MS Office applications have a complicated relationship with the clipboard. Between the various optimisations that may lead to cryptic prompts and the numerous formats they support, it is best not to do anything remotely funny between a Copy and a Paste.
The VBA code follows this advice; the C# code, by contrast, opens a document between copying and pasting.
Make sure you open both documents in advance and not in the middle of a copy and paste.
I have to use OpenXML SDK 2.5 with C# to copy formulas from one word document then append them to another word document. I tried the below code, it ran successfully but when I tried to open the file, it said there's something wrong with the content. I opened it ignoring the warning but those formulas were not displayed. They are just blank blocks.
My code:
private void CreateNewWordDocument(string document, Exercise[] exercices)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Create(document, WordprocessingDocumentType.Document))
    {
        // Set the content of the document so that Word can open it.
        MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
        SetMainDocumentContent(mainPart);
        foreach (Exercise ex in exercices)
        {
            wordDoc.MainDocumentPart.Document.Body.AppendChild(ex.toParagraph().CloneNode(true));
        }
        wordDoc.MainDocumentPart.Document.Save();
    }
}
// Set content of MainDocumentPart.
private void SetMainDocumentContent(MainDocumentPart part)
{
    string docXml =
        @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
<w:body><w:p><w:r><w:t>Exercise list!</w:t></w:r></w:p></w:body>
</w:document>";
    using (Stream stream = part.GetStream())
    {
        byte[] buf = (new UTF8Encoding()).GetBytes(docXml);
        stream.Write(buf, 0, buf.Length);
    }
}
This happens because not everything that can be referenced in the paragraph is copied when you clone the paragraph. The Word XML format consists of multiple files some of which reference each other. If you copy the paragraph from one document to another you need to also copy any relationships that may exist.
The OpenXML Productivity Tool is useful for diagnosing errors like these. You can open a document with the tool and ask it to validate the document.
I created a test document that just contained a hyperlink and ran your code to copy the contents to another document. I too got an error when I attempted to load it using Word so I opened it in the Productivity Tool and saw the following output:
This shows that the hyperlink is stored as a relationship rather than inline in the paragraph and my new file references a relationship that doesn't exist. Unzipping the original file and the new file and comparing the two shows what is going on:
document.xml from original:
.rels of original
document.xml of generated file
.rels of generated file
Note that in the generated file the hyperlink references relationship rId5, but that relationship doesn't exist in the generated document's relationship file.
It's worth noting that for simple source documents the code worked without issue as there are no relationships that require copying.
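If you want to check for this kind of dangling reference yourself, a rough sketch follows. It is in Python with only the standard library rather than C#, the function name and both XML samples are invented for illustration (they only mirror the rId5 situation described above), and it assumes that every attribute in the relationships namespace, such as r:id or r:embed, is a relationship reference.

```python
import xml.etree.ElementTree as ET

# Namespace of the r:* attributes used inside document.xml.
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Namespace of the <Relationship> elements inside the .rels part.
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def dangling_relationship_ids(document_xml, rels_xml):
    """Return the sorted relationship ids used in document_xml but not defined in rels_xml."""
    rels_root = ET.fromstring(rels_xml)
    defined = {
        rel.get("Id") for rel in rels_root.iter(f"{{{PKG_REL_NS}}}Relationship")
    }
    referenced = set()
    for element in ET.fromstring(document_xml).iter():
        for attr_name, value in element.attrib.items():
            # Namespaced attributes are stored as "{namespace}localname".
            if attr_name.startswith(f"{{{R_NS}}}"):
                referenced.add(value)
    return sorted(referenced - defined)


# Hypothetical document part: a hyperlink that points at relationship rId5.
DOCUMENT_XML = (
    '<w:document'
    ' xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
    f' xmlns:r="{R_NS}">'
    '<w:body><w:p><w:hyperlink r:id="rId5">'
    '<w:r><w:t>link</w:t></w:r>'
    '</w:hyperlink></w:p></w:body></w:document>'
)

# Hypothetical .rels part that defines only rId1.
RELS_XML = (
    f'<Relationships xmlns="{PKG_REL_NS}">'
    f'<Relationship Id="rId1" Type="{R_NS}/styles" Target="styles.xml"/>'
    '</Relationships>'
)

missing = dangling_relationship_ids(DOCUMENT_XML, RELS_XML)
print(missing)
```

With a real file you would read word/document.xml and word/_rels/document.xml.rels out of the unzipped package and pass their contents in; an empty result means every referenced id has a matching relationship.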
There are two ways that you can solve this. The easiest way is to only copy the text of the paragraph (you'll lose all styles, images, hyperlinks etc) but it is very simple. All you need to do is change
wordDoc.MainDocumentPart.Document.Body.AppendChild(ex.toParagraph().CloneNode(true));
for
Paragraph para = wordDoc.MainDocumentPart.Document.Body.AppendChild(new Paragraph());
Run run = para.AppendChild(new Run());
run.AppendChild(new Text(ex.toParagraph().InnerText));
The more complex (and perhaps proper) way of achieving it is to find the relationships and copy them to the new document as well. The code for doing that is probably beyond the scope of what I can write here but there is an interesting article on the subject here http://blogs.msdn.com/b/ericwhite/archive/2009/02/05/move-insert-delete-paragraphs-in-word-processing-documents-using-the-open-xml-sdk.aspx.
Essentially, the author of that blog post uses the PowerTools for Open XML to find relationships and copy them from one document to another.
I have a request to create a word document on the fly based on a template provided to me. I have done some research and everything seems to point at OpenXML. I have looked into that, but the cs file that gets created is over 15k lines and is breaking my VS 2010 (causing it to not respond every time I make a change).
I have been looking at this tutorial series on Open XML
http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/10/13/getting-started-with-open-xml-development.aspx
I have done things in the past with text files and regular expressions, but since Word encrypts everything, that does not work. Are there any other options that are fairly lightweight for creating Word documents from templates?
// Hi, it is quite simple.
// First, copy your template file to another location.
string SourcePath = "C:\\MyTemplate.dotx";
string DestPath = "C:\\MyDocument.docx";
System.IO.File.Copy(SourcePath, DestPath);
// After copying the file, open a WordprocessingDocument using your destination path.
using (WordprocessingDocument mydoc = WordprocessingDocument.Open(DestPath, true))
{
    // After opening your document, change its type from template to document.
    mydoc.ChangeDocumentType(WordprocessingDocumentType.Document);
    // If you wish, you can edit your document, for example to attach the template.
    AttachedTemplate attachedTemplate1 = new AttachedTemplate() { Id = "MyRelationID" };
    MainDocumentPart mainPart = mydoc.MainDocumentPart;
    DocumentSettingsPart mySettingsPart = mainPart.DocumentSettingsPart;
    mySettingsPart.Settings.Append(attachedTemplate1);
    // The relationship target is assumed to be the original template (SourcePath).
    mySettingsPart.AddExternalRelationship("http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate", new Uri(SourcePath, UriKind.Absolute), "MyRelationID");
    // Finally, save your document.
    mainPart.Document.Save();
}
I am currently working on something along these lines, and I have been using the Open XML SDK and the OpenXmlPowerTools. The approach I am taking is to open the actual template file and put text into various placeholders within the template document. I have been using content controls as the placeholders.
The SDK tool for opening a document has been invaluable for comparing documents and seeing how they are constructed. However, I have been heavily refactoring the code the tool generates and removing the sections that are not used at all.
I can't speak for doc files, but docx files are not encrypted. They are just zip files that contain XML files.
Eric White's blog has a large number of examples and code samples, which have been very useful.
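To see the "zip of XML files" point for yourself, here is a small sketch in Python (standard library only, rather than C#). The three-part package it builds in memory is a made-up minimal example for illustration, not a copy of any real template, and I have not checked that Word would open it. With a real file you would skip the first half and pass the path of your docx to zipfile.ZipFile.

```python
import io
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical minimal package: content types, package relationships, main document.
PARTS = {
    "[Content_Types].xml": (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels"'
        ' ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Override PartName="/word/document.xml"'
        ' ContentType="application/vnd.openxmlformats-officedocument'
        '.wordprocessingml.document.main+xml"/>'
        '</Types>'
    ),
    "_rels/.rels": (
        '<Relationships'
        ' xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1"'
        ' Type="http://schemas.openxmlformats.org/officeDocument/2006/'
        'relationships/officeDocument"'
        ' Target="word/document.xml"/>'
        '</Relationships>'
    ),
    "word/document.xml": (
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:r><w:t>Hello</w:t></w:r></w:p>'
        '</w:body></w:document>'
    ),
}

# Write the parts into an in-memory zip archive, which is all a docx container is.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
    for name, xml in PARTS.items():
        package.writestr(name, xml)

# Read it back the same way you would inspect a real .docx file.
buffer.seek(0)
with zipfile.ZipFile(buffer) as package:
    part_names = package.namelist()
    document_xml = package.read("word/document.xml").decode("utf-8")

print(part_names)
print(document_xml)
```

The text of the document is sitting in word/document.xml as plain XML, which is why placeholder replacement on that part is feasible without generating thousands of lines of SDK code.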
Currently I am using the following code, together with some DLL files from PDFBox:
FileInfo file = new FileInfo("c://aa.pdf");
PDDocument doc = PDDocument.load(file.FullName);
PDFTextStripper pdfStripper = new PDFTextStripper();
string text = pdfStripper.getText(doc);
richTextBox1.Text = text;
Using this code I am able to get the text, but not in the correct format. Please give me some ideas.
Extracting the text from a pdf file is anything but trivial.
To quote from the iTextSharp tutorial:
"The pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. In each page there will probably be a number of 'Strings', but you can't reconstruct a phrase or a paragraph using these strings. There are probably a number of lines drawn, but you can't retrieve a Table-object based on these lines. In short: parsing the content of a PDF-file is NOT POSSIBLE with iText."
There are several commercial applications which claim to be able to do it. Caveat Emptor.
There is also a free software library called Poppler (http://poppler.freedesktop.org/), which is used by the PDF viewers of GNOME and KDE. It comes with a command-line tool called pdftotext, but I have no experience with it. It may be your best free option.
There is a blog article explaining the issues with PDF text extraction in general at http://pdf.jpedal.org/java-pdf-blog/bid/12670/PDF-text