I want to create a document (*.docx) using the DocX library. I have HTML-formatted text from a rich text editor, and I want to save it with its formatting intact. I have not been able to find any help on this. My current code is very basic and is copied from their documentation.
My code looks like:
private string CreateDocumentFromText()
{
    string filePath = Server.MapPath("../DocXExample.docx");
    var document = Novacode.DocX.Create(filePath);
    document.InsertParagraph("<b>Test</b>");
    document.Save();
    // Return the path that was written (the original returned an undefined fileName).
    return filePath;
}
The document then contains the literal markup:
<b>Test</b>
Whereas I want it to be the word below, rendered in bold:
Test
You have to convert the HTML into the corresponding XML elements; passed as a string, it is inserted as literal text.
Try the DocumentFormat.OpenXml library.
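One way to do this with the Open XML SDK (not DocX) is an altChunk: the HTML is stored in the package as an alternative-format part, and Word converts it to real formatting when it opens the file. The sketch below assumes the DocumentFormat.OpenXml package is referenced; the names HtmlToDocx and CreateFromHtml are made up for the example. Because the conversion is done by Word itself, other viewers may not show the imported content.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class HtmlToDocx   // hypothetical helper name
{
    public static void CreateFromHtml(string filePath, string htmlFragment)
    {
        const string altChunkId = "htmlChunk1";   // any id unique within the main part
        // Wrap the fragment so the importer sees a complete HTML document.
        string html = "<html><body>" + htmlFragment + "</body></html>";

        using (var doc = WordprocessingDocument.Create(filePath, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = doc.AddMainDocumentPart();
            mainPart.Document = new Document(new Body());

            // Store the HTML as an alternative-format part inside the package.
            AlternativeFormatImportPart chunk =
                mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);
            using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunk.FeedData(stream);
            }

            // Reference the part from the body; Word replaces it with converted content on open.
            mainPart.Document.Body.Append(new AltChunk { Id = altChunkId });
            mainPart.Document.Save();
        }
    }
}
```

It could be called as HtmlToDocx.CreateFromHtml(filePath, "<b>Test</b>") in place of the InsertParagraph call from the question.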
Related
My need: I need to open an RTF file, read its content, and store it in a string variable.
What I have done: I did it using microsoft.office.interop.word.dll, i.e. Documents.Open(string FileName).
But my final requirement is: I need some other way to read the RTF file, because microsoft.office.interop.word.dll is not supported in an Azure Function, i.e. Word can't be installed on the server.
OpenXML: this is used to open Word, Excel and PowerPoint files. It cannot open an RTF file.
Any possible answer is welcome.
If you want to convert an RTF file to plain text, keeping only the text and losing all formatting and other non-text elements such as bitmaps, it is possible by using System.Windows.Forms.RichTextBox.
Note that you do not need an application with a user interface to do this; you can use RichTextBox in, for example, a service - but you will need to reference System.Windows.Forms.dll in order to do so.
The code to convert from an RTF file to a plain text string would look like this:
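using System.IO;            // File
using System.Windows.Forms; // RichTextBox

public static string RtfFileAsPlainText(string rtfPathName)
{
    using (var rtf = new RichTextBox())
    {
        // Assigning to Rtf parses the markup; Text then returns the unformatted text.
        rtf.Rtf = File.ReadAllText(rtfPathName);
        return rtf.Text;
    }
}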
I'm trying to read a file as a string, but the data seems to be corrupted.
string filepaths = Files[0].FullName;
System.IO.StreamReader myFile = new System.IO.StreamReader(filepaths);
string datas = myFile.ReadToEnd();
But datas contains "pk0101" etc. instead of the original data. I'm doing this so that I can replace a placeholder with this string, datas. When I do the replacement, the inserted text is "0101" etc. Is that because of the content in datas? How can I read the file as a string? Your help will be greatly appreciated. Thank you.
A *.docx file is not plain text: it is a ZIP package (hence the "PK" at the start of the raw data) whose parts are XML documents. Take a look here to become more familiar with this format definition.
For working with Office formats, Microsoft recommends the Open XML SDK (the DocumentFormat.OpenXml library).
Here is a great article for learning how to work with Word files.
It works as follows:
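// Requires: using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Wordprocessing;
using (var wordDocument = WordprocessingDocument.Open(filepaths, false)) // false = open read-only
{
    var body = wordDocument.MainDocumentPart.Document.Body;
    // Text of the first paragraph; body.InnerText gives the text of the whole body.
    var text = body.GetFirstChild<Paragraph>().InnerText;
}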
Also, take a look at this SO question: How do I read data from a word with format using the OpenXML Format SDK with c#?
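To make the package structure concrete, here is a minimal sketch that reads the text without the Open XML SDK, using only the ZIP and LINQ to XML classes from the base class library. It assumes the main part is stored at word/document.xml, which is the usual location but is strictly determined by the package relationships; DocxText and ReadPlainText are names invented for the example.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText   // hypothetical helper name
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ReadPlainText(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            // Assumption: the main document part is at the conventional path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in package.");

            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                // One output line per w:p, built from the w:t text nodes inside it.
                var paragraphs = xml.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join("\n", paragraphs);
            }
        }
    }
}
```

The returned string can then be used for the placeholder replacement from the question. On .NET Framework, ZipFile needs a reference to System.IO.Compression.FileSystem.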
So let's say that in Microsoft Word VBA I got the Open XML string using
Dim xmlstring As String
xmlstring = ActiveDocument.Range.WordOpenXML
And now I want to create a WordprocessingDocument from this string.
However, on the relevant web page (https://msdn.microsoft.com/en-us/library/documentformat.openxml.packaging.wordprocessingdocument_members(v=office.14).aspx) I cannot find any information about constructing a WordprocessingDocument from a string.
Can anyone show me how to do this?
A Word document is not just an XML file; it's an OPC (Open Packaging Conventions) package, so you can't simply "create the file from XML". The string returned by WordOpenXML is in the Flat OPC format, which holds every part of the package in a single XML document.
Eric White describes in his blog how to do that: http://blogs.msdn.com/b/ericwhite/archive/2008/09/29/transforming-flat-opc-format-to-open-xml-documents.aspx
The principle is to create an empty package and then parse your XML to fill in its parts.
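As a rough sketch of that principle (written for this thread, not copied from the linked article), the code below uses System.IO.Packaging: a first pass creates every part, a second pass recreates the relationships from the .rels parts. FlatOpc and SaveAsDocx are hypothetical names, and the sketch assumes well-formed Flat OPC input such as the WordOpenXML string.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class FlatOpc   // hypothetical helper name
{
    static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string RelsType = "application/vnd.openxmlformats-package.relationships+xml";

    public static void SaveAsDocx(string flatOpcXml, string docxPath)
    {
        var parts = XDocument.Parse(flatOpcXml).Root.Elements(Pkg + "part").ToList();

        using (Package package = Package.Open(docxPath, FileMode.Create))
        {
            // Pass 1: create every part except the relationship (.rels) parts.
            foreach (XElement p in parts.Where(x => (string)x.Attribute(Pkg + "contentType") != RelsType))
            {
                var uri = new Uri((string)p.Attribute(Pkg + "name"), UriKind.Relative);
                PackagePart part = package.CreatePart(
                    uri, (string)p.Attribute(Pkg + "contentType"), CompressionOption.Normal);

                using (Stream partStream = part.GetStream(FileMode.Create))
                {
                    XElement xmlData = p.Element(Pkg + "xmlData");
                    if (xmlData != null)
                    {
                        // XML part: write the single child of pkg:xmlData.
                        using (XmlWriter writer = XmlWriter.Create(partStream))
                            xmlData.Elements().First().WriteTo(writer);
                    }
                    else
                    {
                        // Binary part (for example an image): base64 in pkg:binaryData.
                        byte[] bytes = Convert.FromBase64String((string)p.Element(Pkg + "binaryData"));
                        partStream.Write(bytes, 0, bytes.Length);
                    }
                }
            }

            // Pass 2: recreate relationships; Package writes the .rels parts itself.
            foreach (XElement p in parts.Where(x => (string)x.Attribute(Pkg + "contentType") == RelsType))
            {
                string name = (string)p.Attribute(Pkg + "name");
                PackagePart owner = null;   // null means package-level (/_rels/.rels)
                if (name != "/_rels/.rels")
                {
                    // "/word/_rels/document.xml.rels" belongs to "/word/document.xml".
                    string dir = name.Substring(0, name.IndexOf("/_rels/", StringComparison.Ordinal));
                    string file = Path.GetFileNameWithoutExtension(name);   // strips ".rels"
                    owner = package.GetPart(new Uri(dir + "/" + file, UriKind.Relative));
                }

                foreach (XElement r in p.Descendants(Rel + "Relationship"))
                {
                    bool external = (string)r.Attribute("TargetMode") == "External";
                    var targetUri = new Uri((string)r.Attribute("Target"),
                        external ? UriKind.RelativeOrAbsolute : UriKind.Relative);
                    TargetMode mode = external ? TargetMode.External : TargetMode.Internal;
                    string type = (string)r.Attribute("Type");
                    string id = (string)r.Attribute("Id");

                    if (owner == null) package.CreateRelationship(targetUri, mode, type, id);
                    else owner.CreateRelationship(targetUri, mode, type, id);
                }
            }
        }
    }
}
```

The resulting file can then be opened with WordprocessingDocument.Open like any other .docx. On .NET Framework, System.IO.Packaging lives in WindowsBase; on newer runtimes it is a separate NuGet package.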
I am trying to get the content of an email attachment. It may be an Excel file, a Word document or a text file; whatever it is, I want to store its text in a database, so I am using this code:
foreach (FileAttachment file in em.Attachments)// Here em is type of EmailMessage class
{
Console.Write("Hello friends" + file.Name);
file.Load();
var stream = new System.IO.MemoryStream(file.Content);
var reader = new System.IO.StreamReader(stream, UTF8Encoding.UTF8);
var text = reader.ReadToEnd();
reader.Close();
Console.Write("Text Document" + text);
}
Printing file.Name shows the attachment's file name. Printing text to the console works if the attachment is a .txt file, but for a .doc or .xls file it shows only unreadable symbols and no text. Am I doing something wrong or missing something? I want the text of any kind of file attachment. Please help me, I am a beginner in C#.
What you are seeing is what is actually in the file. Try opening one with Notepad.
There is no built-in way in .NET to show the "text contents" of arbitrary file formats. You'll have to create (preferably using third-party libraries that already solve this problem) some kind of logic that extracts plaintext from rich text documents.
See for example How to extract text from Pdf, Word and Excel documents?, Extract text from pdf and word files, and so on.
First, what do you expect when reading a binary file?
Your result is exactly what is expected. A text file can be shown as a string, but a doc or xls file is a binary file. You will see the binary content of the file. You will need to use a tool/lib to get the text/content from a binary file in human readable format.
A TXT file is simple; DOC and XLS files are much more complex. You can read a TXT file because it is just text, whereas DOC, XLS, PPT and similar formats need to be interpreted by some other mechanism.
For example, a Word document can have different colors or font sizes, and an Excel document can contain a chart. How would you show that in a simple TextBox or RichTextBox? Short answer: you can't.
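Before choosing an extraction library, it can help to check what kind of container the attachment bytes actually are. The sketch below looks only at the leading signature bytes: ZIP-based formats such as .docx and .xlsx start with "PK", while the legacy binary .doc and .xls formats use the OLE compound file signature. AttachmentSniffer and GuessContainer are names invented for this example, and a match tells you only the container type, not that the file is a valid document.

```csharp
using System;

public static class AttachmentSniffer   // hypothetical helper name
{
    // Returns "zip" (e.g. .docx/.xlsx), "ole" (e.g. legacy .doc/.xls) or "unknown".
    public static string GuessContainer(byte[] content)
    {
        // ZIP local file header: 'P' 'K' 0x03 0x04.
        if (StartsWith(content, 0x50, 0x4B, 0x03, 0x04)) return "zip";
        // OLE compound file header.
        if (StartsWith(content, 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)) return "ole";
        return "unknown";
    }

    static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

In the loop from the question this could be called as AttachmentSniffer.GuessContainer(file.Content) to decide whether to decode the bytes as text or hand them to a format-specific reader.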
I am trying to replace a bookmark in a docx file with text, in C++/CLI, using the Open XML SDK.
The code below fetches the bookmarks from the Word document and checks whether a bookmark's name matches the string "VERSION"; if it does, the string "0000" is inserted in the docx file.
Paragraph ^paragraph = gcnew Paragraph();
Run ^run = gcnew Run();
DocumentFormat::OpenXml::Wordprocessing::Text^ text = gcnew DocumentFormat::OpenXml::Wordprocessing::Text("0000");
run->AppendChild(text);
paragraph->AppendChild(run);
IDictionary<String^, BookmarkStart^> ^bookmarkMap =
gcnew Dictionary<String^, BookmarkStart^>();
for each (BookmarkStart ^bookmarkStart in
GlobalObjects::wordDoc->MainDocumentPart->RootElement->Descendants<BookmarkStart^>())
{
if (bookmarkStart->Name->Value == "VERSION")
{
bookmarkStart->Parent->InsertAt<Paragraph^>(paragraph,3);
}
}
The above code works fine in most scenarios (wherever we insert bookmarks), but sometimes it fails and I am not able to find the reason.
If the bookmark is inserted at the start of a line, then after execution I am not able to open the docx file; Word reports errors.
I tried passing 0 as the index to the InsertAt method, but even that does not work.
Please provide a solution for the above.
Thanks in advance
See How to Retrieve the Text of a Bookmark from an OpenXML WordprocessingML Document for code that retrieves text. It is written in C#, but you could use the code directly from C++/CLI.
See Replacing Text of a Bookmark in an OpenXML WordprocessingML Document for an algorithm that you can use to replace text.
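A likely reason for the unreadable file is that the parent of a BookmarkStart is usually a paragraph, so the posted code places a paragraph inside a paragraph, which is not valid WordprocessingML. A minimal C# sketch of the simple case follows; it assumes the bookmark's start and end markers are siblings in the same paragraph (the linked articles deal with the general case), and BookmarkHelper and ReplaceBookmarkText are hypothetical names. It inserts a run, not a paragraph, next to the bookmark start.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BookmarkHelper   // hypothetical helper name
{
    // Returns false, leaving the document unchanged, if the bookmark is missing
    // or its end marker is not a following sibling of its start marker.
    public static bool ReplaceBookmarkText(WordprocessingDocument doc, string bookmarkName, string newText)
    {
        BookmarkStart start = doc.MainDocumentPart.RootElement
            .Descendants<BookmarkStart>()
            .FirstOrDefault(b => b.Name?.Value == bookmarkName);
        if (start == null) return false;

        // A bookmark's start and end markers share the same Id.
        BookmarkEnd end = start.ElementsAfter()
            .OfType<BookmarkEnd>()
            .FirstOrDefault(e => e.Id?.Value == start.Id?.Value);
        if (end == null) return false;

        // Drop the runs currently enclosed by the bookmark.
        foreach (Run run in start.ElementsAfter().TakeWhile(e => e != end).OfType<Run>().ToList())
        {
            run.Remove();
        }

        // Insert a run (not a paragraph) right after the start marker.
        start.InsertAfterSelf(new Run(new Text(newText)));
        return true;
    }
}
```

The same calls (Descendants, ElementsAfter, InsertAfterSelf) are available from C++/CLI with the usual syntax changes.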