Is there a way to append content to a RTF file format?
I need it for the following functionality. When adding a Paragraph to the FlowDocument, I need to write it to the RTF file, also. Till now I was saving the entire FlowDocument into the file. In addition, I don't want to keep the more than 10000 Paragraphs in memory.
Here is a piece of code I tried but it didn't work.
TextRange tr = new TextRange(flowDoc.ContentStart, flowDoc.ContentEnd);
FileStream file = new FileStream(path, FileMode.Append,FileAccess.ReadWrite, FileShare.Read);
tr.Save(file, DataFormats.Rtf);
TextWriter tw = new StreamWriter(file);
tw.Write(file.ToString());
tw.Close();
Thanks!
The only thing I could think of is to use a temp RichTextBox. Copy and paste into it all the formatted text you want to append, and then copy and paste from it. You will need another temp Richtextbox to convert from rtf code to formatted text before copying and pasting to the other one.
Related
Basically, I did ask already a question regarding on how to edit PDF File. But it seems that I need to pay a lot of money to achieve what I want. But here is the problem first,
I have a PDF Template containing a plain text like this {{Fullname}} like a placeholder and should be replaced with a certain value. Now, as I've said earlier, Editing the PDF File is a mess and tough because if it's not, I already saw a solution online for free.
Now, here is my Idea, What if I get the whole content of this template file and replace the placeholder with a certain value then create a new PDF File with the same content but the placeholder is already replaced?
I am trying to figure out this problem and here is my solution:
string destinationFile = Path.GetTempFileName();
string text = File.ReadAllText(sourceFile);
text = text.Replace("{{Fullname}}", "John Doe");
// Create a new PDF document
Document document = new Document();
PdfWriter.GetInstance(document, new FileStream(destinationFile, FileMode.Create));
document.Open();
// Add the text to the document
document.Add(new Paragraph(text));
// Close the document
document.Close();
Currently, I am using iTextSharp and does not meet what I expected. So, my question is, Is it possible to get all the content of the PDF File and modify it to create a new one? If yes, How?
How can I read pdf files and save contents to a text file using Spire.PDF?
For example: Here is a pdf file and here is the desired text file from that pdf
I tried the below code to read the file and save it to a text file
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(#"C:\Users\Tamal\Desktop\101395a.pdf");
StringBuilder buffer = new StringBuilder();
foreach (PdfPageBase page in doc.Pages)
{
buffer.Append(page.ExtractText());
}
doc.Close();
String fileName = #"C:\Users\Tamal\Desktop\101395a.txt";
File.WriteAllText(fileName, buffer.ToString());
System.Diagnostics.Process.Start(fileName);
But the output text file is not properly formatted. It has unnecessary whitespaces and a complete para is broken into multiple lines etc.
How do I get the desired result as in the desired text file?
Additionally, it is possible to detect and mark(like add a tag) to texts with bold, italic or underline forms as well? Also things get more problematic for pages have multiple columns of text.
Using iText
File inputFile = new File("input.pdf");
PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputFile));
SimpleTextExtractionStrategy stes = new SimpleTextExtractionStrategy();
PdfCanvasProcessor canvasProcessor = new PdfCanvasProcessor(stes);
canvasProcessor.processPageContent(pdfDocument.getPage(1));
System.out.println(stes.getResultantText());
This is (as the code says) a basic/simple text extraction strategy.
More advanced examples can be found in the documentation.
Use IronOCR
var Ocr = new IronOcr.AutoOcr();
var Results = Ocr.ReadPdf("E:\Demo.pdf");
File.WriteAllText("E:\Demo.txt", Convert.ToString(Results));
For reference https://ironsoftware.com/csharp/ocr/
Using this you should get formatted text output, but not exact desire output which you want.
If you want exact pre-interpreted output, then you should check paid OCR services like OmniPage capture SDK & Abbyy finereader SDK
That is the nature of PDF. It basically says "go to this location on a page and place this character there." I'm not familiar at all with Spire.PFF; I work with Java and the PDFBox library, but any attempt to extract text from PDF is heuristic and hence imperfect. This is a problem that has received considerable attention and some applications have better results than others, so you may want to survey all available options. Still, I think you'll have to clean up the result.
I have been having issues reading a file that contains a mix of Arabic and Western text. I read the file into a TextBox as follows:
tbx1.Text = File.ReadAllText(fileName.Text, Encoding.UTF8);
No matter what value I tried instead of "Encoding.UTF8" I got garbled characters displayed in place of the Arabic. The western text was displayed fine.
I thought it might have been an issue with the way the TextBox was defined, but on start up I write some mixed Western/Arabic text to the textbox and this displays fine:
tbx1.Text = "Start السلا عليكم" + Environment.NewLine + "Here";
Then I opened Notepad and copied the above text into it, then saved the file, at which point Notepad save dialogue asked for which encoding to use.
I then presented the saved file to my code and it displayed all the content correctly.
I examined the file and found 3 binary bytes at the beginning (not visible in Notepad):
The 3 bytes, I subsequently found through research represent the BOM, and this enables the C# "File.ReadAllText(fileName.Text, Encoding.UTF8);" to read/display the data as desired.
What puzzles me is specifying the " Encoding.UTF8" value should take care of this.
The only way I can think is to code up a step to add this data to a copy of teh file, then process that file. But this seems rather long-winded. Just wondering if there is a better way to do, or why the Encoding.UTF8 is not yielding the desired result.
Edit:
Still no luck despite trying the suggestion in the answer.
I cut the test data down to containing just Arabic as follows:
Code as follows:
FileStream fs = new FileStream(fileName.Text, FileMode.Open);
StreamReader sr = new StreamReader(fs, Encoding.UTF8, false);
tbx1.Text = sr.ReadToEnd();
sr.Close();
fs.Close();
Tried with both "true" and "false" on the 2nd line, but both give the same result.
If I open the file in Notepad++, and specify the Arabic ISO-8859-6 Character set it displays fine.
Here is what is looks like in Notepad++ (and what I would liek the textbox to display):
Not sure if the issue is in the reading from file, or the writing to the textbox.
I will try inspecting the data post read to see. But at the moment, I'm puzzled.
The StreamReader class has a constructor that will take care of testing for the BOM for you:
using (var stream = new FileStream(fileName.Text, FileAccess.Read))
{
using (var sr = new StreamReader(stream, Encoding.UTF8, true))
{
var text = sr.ReadToEnd();
}
}
The final true parameter is detectEncodingFromByteOrderMark:
The detectEncodingFromByteOrderMarks parameter detects the encoding by looking at the first three bytes of the stream. It automatically recognizes:
UTF-8
little-endian Unicode
and big-endian Unicode text
if the file
starts with the appropriate byte order marks. Otherwise, the
user-provided encoding is used. See the Encoding.GetPreamble method
for more information.
I am trying to get the content of attachment. It may be an excel file, Document file or text file whatever it is but I want to store it in database so here I am using this code: -
foreach (FileAttachment file in em.Attachments)// Here em is type of EmailMessage class
{
Console.Write("Hello friends" + file.Name);
file.Load();
var stream = new System.IO.MemoryStream(file.Content);
var reader = new System.IO.StreamReader(stream, UTF8Encoding.UTF8);
var text = reader.ReadToEnd();
reader.Close();
Console.Write("Text Document" + text);
}
So By printing file.name is showing attachment file name but while printing 'text' on the console it is working if the attachment is .txt type but if it is .doc or .xls type then it is showing some symbolic result. I am not getting any text result. Am I doing something wrong or missing something. I want text result of any kind of file attachment . Please help me , I am beginner in C#
What you are seeing is what is actually in the file. Try opening one with Notepad.
There is no built-in way in .NET to show the "text contents" of arbitrary file formats. You'll have to create (preferably using third-party libraries that already solve this problem) some kind of logic that extracts plaintext from rich text documents.
See for example How to extract text from Pdf, Word and Excel documents?, Extract text from pdf and word files, and so on.
First, what do you expect when reading a binary file?
Your result is exactly what is expected. A text file can be shown as a string, but a doc or xls file is a binary file. You will see the binary content of the file. You will need to use a tool/lib to get the text/content from a binary file in human readable format.
TXT type is simple,DOC or XLS are much more complex.You can see TXT because is just text,DOC or XLS or PPT or something else needs to be interpreted by other mechanism.
See,for example,you have different colors or font sizes on a Word document,or a chart in an Excel document,how can you show that in a simple TextBox or RichTextBox?Short answer,you can't.
i used a read stream to read an rtf file however it failed when this rtf file is opened by Microsoft word.
is there anyone know how to solve this problem?
The proper way to read a RTF file for a rich text box (has to be of type System.Windows.Forms.RichTextBox) is like this:
myRichTextBox.LoadFile(myFilename);
But, because you have a lock on the file, you have to do it this way (credit to #slaks):
myRichTextBox.LoadFile(new FileStream(myFilename, FileAccess.Read, FileSharing.ReadWrite));
And to save it, simply call this function:
myRichTextBox.SaveFile(myFilename);
Like this:
new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)