What I need: I need to open an RTF file, read its content, and store it in a string variable.
What I have done: I have done this using "Microsoft.Office.Interop.Word.dll", i.e. Documents.Open(string fileName).
My final requirement: I need some other way to read the RTF file, because Microsoft.Office.Interop.Word.dll is not supported in Azure Functions, i.e. Word cannot be installed on the server.
Open XML: this is used to open Word, Excel, and PowerPoint files. It cannot open RTF files.
Any possible answer is welcome.
If you want to convert an RTF file to plain text, keeping only the text and losing all formatting and other non-text elements such as bitmaps, it is possible by using System.Windows.Forms.RichTextBox.
Note that you do not need an application with a user interface to do this; you can use RichTextBox in, for example, a service - but you will need to reference System.Windows.Forms.dll in order to do so.
The code to convert from an RTF file to a plain text string would look like this:
using System.IO;
using System.Windows.Forms;

public static string RtfFileAsPlainText(string rtfPathName)
{
    // RichTextBox parses the RTF markup; its Text property exposes the plain text.
    using (var rtf = new RichTextBox())
    {
        rtf.Rtf = File.ReadAllText(rtfPathName);
        return rtf.Text;
    }
}
I've been working on an application that reads images from multiple Word files and stores them in one single Word file, using Microsoft.Office.Interop.Word in C#.
EDIT: I also need to save a copy of the images on the file system, so I need the image in a Bitmap or similar object.
This is my implementation so far, which works fine:
foreach (InlineShape shape in doc.InlineShapes)
{
    shape.Range.Select();
    if (shape.Type == WdInlineShapeType.wdInlineShapePicture)
    {
        doc.ActiveWindow.Selection.Range.CopyAsPicture();
        ImageData = Clipboard.GetDataObject();
        object _ob1 = ImageData.GetData(DataFormats.Bitmap);
        bmp = (Bitmap)_ob1;
        images[i++] = bmp;
        /*
        bmp.Save("C:\\Users\\Akshay\\Pictures\\bitmaps\\test" + i.ToString() + ".bmp");
        */
    }
}
I have:
Selected the images as InlineShapes
Copied the shape into Clipboard
Stored the shape in the Clipboard in a DataObject
Extracted the shape from the DataObject in Bitmap format and stored in a Bitmap object.
I've been told to refrain from using Clipboard in Word automation and use the Word APIs instead.
I've read up on it and found an SO answer stating the same.
I looked up many implementations of reading images from Word files on MSDN, SO etc. but could not find any without using clipboard.
How do I read images from Word files using only the Word APIs in the Microsoft.Office.Interop.Word namespace, without using the Clipboard?
When a Word document in the Office Open XML file format is represented as a single XML stream, its images are encoded in Base64. So it should be possible to extract that information and convert/stream it to a file. You can access the information when the document is open in the Word application using the Range.WordOpenXML property.
string shapeBase64 = shape.Range.WordOpenXML;
This will return the entire Word Open XML in the flat file OPC format. In other words, it won't contain only the picture in Base64, but the entire zip package definition as XML that surrounds it. In my quick test, the tag that contains the actual Base64 is
<pkg:binaryData>
That's a child element of
<pkg:part pkg:name="/word/media/image1.jpg" pkg:contentType="image/jpeg" pkg:compression="store">
Note that it would also be possible for you to get the entire document's WordOpenXML in one step:
document.Content.WordOpenXML
but you might then need to understand the way the InlineShapes in the document body are linked to the actual information in the "media" part.
And it would be possible, of course, to work directly with the Zip Package (using the Open XML SDK, perhaps) instead of opening the document in the Word.Application.
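As a rough sketch of the extraction step described above, the flat OPC string can be parsed with LINQ to XML and each image part decoded from Base64. The class and method names here (FlatOpcImages, ExtractImages) are made up for illustration, and the code assumes every image part carries a pkg:contentType starting with "image/" and a pkg:binaryData child, as in the quick test above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlatOpcImages
{
    // Namespace bound to the "pkg" prefix in flat OPC output.
    static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Hypothetical helper: maps each image part name (e.g. "/word/media/image1.jpg")
    // to its decoded bytes. Input is the string returned by Range.WordOpenXML.
    public static Dictionary<string, byte[]> ExtractImages(string flatOpcXml)
    {
        var doc = XDocument.Parse(flatOpcXml);
        var images = new Dictionary<string, byte[]>();

        foreach (var part in doc.Descendants(Pkg + "part"))
        {
            var contentType = (string)part.Attribute(Pkg + "contentType") ?? "";
            var name = (string)part.Attribute(Pkg + "name");
            var data = part.Element(Pkg + "binaryData");

            // Skip parts that are not images or have no binary payload.
            if (name == null || data == null) continue;
            if (!contentType.StartsWith("image/", StringComparison.OrdinalIgnoreCase)) continue;

            // Convert.FromBase64String ignores the line breaks inside the element.
            images[name] = Convert.FromBase64String(data.Value);
        }
        return images;
    }
}
```

The returned bytes are the original image file, so they can be written with File.WriteAllBytes or loaded into a Bitmap through a MemoryStream.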
Hi,
I want to get the content of a Microsoft Word file without using the Microsoft.Office.Interop DLL.
I have tried the code below, but it only reads text from .xml and .txt files, not from .doc files.
using System.IO;

using (StreamReader streamReader = new StreamReader(filePath))
{
    string text = streamReader.ReadToEnd();
}
Office documents are more complex than simple XML/TXT files, since they contain much more text-related information (fonts, colors, locations, tables, images, and so on).
Starting with Office 2007, Microsoft uses the 'Office Open XML' format for saving Office files. A docx file is a zip archive: to look inside, rename its extension to zip (e.g. untitled1.docx.zip) and extract its contents (using any zip app/library).
You will get a few files and folders. Navigate to the 'word' folder and read the file named 'document.xml'.
This file contains all the textual information of the document (it is XML-formatted, so be sure to parse it correctly).
If you want to extract textual information from pre-2007 files (e.g. a 'doc' file), you will have to use the Microsoft Office Compatibility Pack, which migrates files to the new format (it can be used programmatically; read up on it).
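The docx steps above can also be done in code, without renaming anything. The following is a minimal sketch, assuming the standard System.IO.Compression and LINQ to XML classes are available; DocxText and ReadPlainText are illustrative names, and the code only collects w:t text runs per paragraph, ignoring headers, footnotes, tabs, and text boxes.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace (the "w" prefix in document.xml).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns one line of text per paragraph in the document body.
    public static string ReadPlainText(Stream docxStream)
    {
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            var entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a docx file?");

            using (var xmlStream = entry.Open())
            {
                var xml = XDocument.Load(xmlStream);
                var sb = new StringBuilder();
                foreach (var paragraph in xml.Descendants(W + "p"))
                {
                    // Concatenate all text runs that belong to this paragraph.
                    sb.AppendLine(string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value)));
                }
                return sb.ToString();
            }
        }
    }
}
```

For a file on disk, pass File.OpenRead(path) as the stream. This does not handle the older binary 'doc' format.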
Add a reference to the library via Add Reference --> Browse --> Code7248.word_reader.dll.
Download the DLL from this URL:
sourceforge.net/p/word-reader/wiki/Home/
(A simple .NET library for C#, compatible with .NET 2.0, 3.0, 3.5 and 4.0. It can currently extract only the raw text from a .doc or .docx file.)
Sample code for a simple C# console application:
using System;
using System.Collections.Generic;
using System.Text;
// add extra namespaces
using Code7248.word_reader;

namespace testWordRead
{
    class Program
    {
        private void readFileContent(string path)
        {
            TextExtractor extractor = new TextExtractor(path);
            string text = extractor.ExtractText();
            Console.WriteLine(text);
        }

        static void Main(string[] args)
        {
            Program cs = new Program();
            // Verbatim string, so the backslashes are not treated as escape sequences.
            string path = @"D:\Test\testdoc1.docx";
            cs.readFileContent(path);
            Console.ReadLine();
        }
    }
}
It works fine with doc and docx format files.
I want to create a document (*.docx) using the DocX library. I have HTML-formatted text from a rich text editor, and I want to save it as it is. I am not able to find any place where I can get help. My current code is very basic, and is copied and pasted from their help.
My code looks like:
private string CreateDocumentFromText()
{
    string filePath = Server.MapPath("../DocXExample.docx");
    var document = Novacode.DocX.Create(filePath);
    document.InsertParagraph("<b>Test</b>");
    document.Save();
    // Return the path that was actually written.
    return filePath;
}
Document has content as:
<b>Test</b>
Whereas I want it to be:
Test
You have to convert the HTML into the corresponding XML elements.
Try the DocumentFormat.OpenXml library.
I am trying to get the content of an attachment. It may be an Excel file, a Word document, or a text file; whatever it is, I want to store it in a database, so I am using this code:
foreach (FileAttachment file in em.Attachments) // Here em is of type EmailMessage
{
    Console.Write("Hello friends" + file.Name);
    file.Load();
    var stream = new System.IO.MemoryStream(file.Content);
    var reader = new System.IO.StreamReader(stream, UTF8Encoding.UTF8);
    var text = reader.ReadToEnd();
    reader.Close();
    Console.Write("Text Document" + text);
}
Printing file.Name shows the attachment's file name. Printing 'text' on the console works if the attachment is a .txt file, but if it is a .doc or .xls file it shows a string of symbols and I do not get any readable text. Am I doing something wrong or missing something? I want the text of any kind of file attachment. Please help me, I am a beginner in C#.
What you are seeing is what is actually in the file. Try opening one with Notepad.
There is no built-in way in .NET to show the "text contents" of arbitrary file formats. You'll have to create (preferably using third-party libraries that already solve this problem) some kind of logic that extracts plaintext from rich text documents.
See for example How to extract text from Pdf, Word and Excel documents?, Extract text from pdf and word files, and so on.
First, what do you expect when reading a binary file?
Your result is exactly what is expected. A text file can be shown as a string, but a doc or xls file is a binary file. You will see the binary content of the file. You will need to use a tool/lib to get the text/content from a binary file in human readable format.
The TXT type is simple; DOC and XLS are much more complex. You can read a TXT file because it is just text, whereas DOC, XLS, PPT and similar files need to be interpreted by some other mechanism.
For example, a Word document can have different colors or font sizes, and an Excel document can contain a chart. How can you show that in a simple TextBox or RichTextBox? Short answer: you can't.
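One way to act on the answers above is to look at the first bytes of each attachment before deciding how to read it. The sketch below is only an illustration: AttachmentKind and AttachmentSniffer are made-up names, and the signatures checked are the well-known ones for zip containers (used by docx/xlsx), OLE compound files (used by legacy doc/xls), RTF, and PDF. It identifies the container only; extracting the text still needs a format-specific library.

```csharp
using System.Linq;

public enum AttachmentKind { Unknown, ZipContainer, OleCompoundFile, Rtf, Pdf }

public static class AttachmentSniffer
{
    static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };                           // "PK\x03\x04"
    static readonly byte[] Ole = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };   // legacy Office
    static readonly byte[] Rtf = { 0x7B, 0x5C, 0x72, 0x74, 0x66 };                     // "{\rtf"
    static readonly byte[] Pdf = { 0x25, 0x50, 0x44, 0x46 };                           // "%PDF"

    // Hypothetical helper: classifies raw attachment bytes (e.g. FileAttachment.Content).
    public static AttachmentKind Detect(byte[] content)
    {
        if (content == null) return AttachmentKind.Unknown;
        if (StartsWith(content, Ole)) return AttachmentKind.OleCompoundFile;
        if (StartsWith(content, Zip)) return AttachmentKind.ZipContainer;
        if (StartsWith(content, Rtf)) return AttachmentKind.Rtf;
        if (StartsWith(content, Pdf)) return AttachmentKind.Pdf;
        // Anything else may be plain text, but that cannot be proven from a signature.
        return AttachmentKind.Unknown;
    }

    static bool StartsWith(byte[] data, byte[] prefix)
    {
        return data.Length >= prefix.Length && data.Take(prefix.Length).SequenceEqual(prefix);
    }
}
```

Only attachments that come back as Unknown are candidates for the StreamReader approach in the question; the others need a parser for that format.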
What I'm trying to accomplish
My app generates some tabular data
I want the user to be able to launch Excel and click "paste" to place the data as cells in Excel
Windows accepts a format called "CommaSeparatedValue" that is used with its APIs, so this seems possible
Putting raw text on the clipboard works, but trying to use this format does not
NOTE: I can correctly retrieve CSV data from the clipboard; my problem is with placing CSV data on the clipboard.
What I have tried that isn't working
Clipboard.SetText()
System.Windows.Forms.Clipboard.SetText(
"1,2,3,4\n5,6,7,8",
System.Windows.Forms.TextDataFormat.CommaSeparatedValue
);
Clipboard.SetData()
System.Windows.Forms.Clipboard.SetData(
    System.Windows.Forms.DataFormats.CommaSeparatedValue,
    "1,2,3,4\n5,6,7,8"
);
In both cases something is placed on the clipboard, but when pasted into Excel it shows up as one cell of garbage text: "–§žý;pC¦yVk²ˆû"
Update 1: Workaround using SetText()
As BFree's answer shows, SetText with tab-delimited text and TextDataFormat.Text serves as a workaround
System.Windows.Forms.Clipboard.SetText(
"1\t2\t3\t4\n5\t6\t7\t8",
System.Windows.Forms.TextDataFormat.Text
);
I have tried this and confirm that now pasting into Excel and Word works correctly. In each case it pastes as a table with cells instead of plaintext.
Still curious why CommaSeparatedValue is not working.
The .NET Framework places DataFormats.CommaSeparatedValue on the clipboard as Unicode text. But as mentioned at http://www.syncfusion.com/faq/windowsforms/faq_c98c.aspx#q899q, Excel expects CSV data to be a UTF-8 memory stream (it is difficult to say whether .NET or Excel is at fault for the incompatibility).
The solution I've come up with in my own application is to place two versions of the tabular data on the clipboard simultaneously as tab-delimited text and as a CSV memory stream. This allows the destination application to acquire the data in its preferred format. Notepad and Excel prefer the tab-delimited text, but you can force Excel to grab the CSV data via the Paste Special... command for testing purposes.
Here is some example code (note that it uses the WPF equivalents of the WinForms clipboard classes):
// Generate both tab-delimited and CSV strings.
string tabbedText = //...
string csvText = //...
// Create the container object that will hold both versions of the data.
var dataObject = new System.Windows.DataObject();
// Add tab-delimited text to the container object as is.
dataObject.SetText(tabbedText);
// Convert the CSV text to a UTF-8 byte stream before adding it to the container object.
var bytes = System.Text.Encoding.UTF8.GetBytes(csvText);
var stream = new System.IO.MemoryStream(bytes);
dataObject.SetData(System.Windows.DataFormats.CommaSeparatedValue, stream);
// Copy the container object to the clipboard.
System.Windows.Clipboard.SetDataObject(dataObject, true);
Use tabs instead of commas, i.e.:
Clipboard.SetText("1\t2\t3\t4\t3\t2\t3\t4", TextDataFormat.Text);
Just tested this myself, and it worked for me.
I have had success pasting into Excel using \t (see BFree's answer) as column separators and \n as row separators.
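For data that is already held as rows of strings, the tab-and-newline layout described above could be built along these lines. This is only a sketch: ClipboardTable and ToTabDelimited are hypothetical names, and replacing embedded tabs and line breaks with spaces is an assumption about what is acceptable for the pasted data.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ClipboardTable
{
    // Joins cells with '\t' and rows with '\n' so that Excel pastes them as a grid.
    public static string ToTabDelimited(IEnumerable<IEnumerable<string>> rows)
    {
        return string.Join("\n", rows.Select(row => string.Join("\t", row.Select(cell => Clean(cell)))));
    }

    // Embedded tabs or line breaks would shift cells, so they are replaced with spaces here.
    static string Clean(string cell)
    {
        return (cell ?? "").Replace('\t', ' ').Replace('\r', ' ').Replace('\n', ' ');
    }
}
```

The resulting string can then be passed to Clipboard.SetText with TextDataFormat.Text, as in the answer above.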
I got the most success defeating formatting issues by using a CSV library (KBCsv) to write the data into a CSV file in the temp folder then open it in Excel with Process.Start(). Once it is in Excel the formatting bit is easy(er), copy-paste from there.
string filePath = System.IO.Path.GetTempPath() + Guid.NewGuid().ToString() + ".csv";
using (var streamWriter = new StreamWriter(filePath))
using (CsvWriter csvWriter = new CsvWriter(streamWriter))
{
    // optional header
    csvWriter.WriteRecord(new List<string>(){"Heading1", "Heading2", "YouGetTheIdea" });
    csvWriter.ValueSeparator = ',';
    foreach (var thing in YourListOfThings ?? new List<OfThings>())
    {
        if (thing != null)
        {
            List<string> csvLine = new List<string>
            {
                thing.Property1, thing.Property2, thing.YouGetTheIdea
            };
            csvWriter.WriteRecord(csvLine);
        }
    }
}
Process.Start(filePath);
BYO error handling & logging.