Generating RTF from a template in a C# WinForms app

Is there a simple process someone can recommend for generating an RTF document from a pre-built "template" and populating its fields?
I would prefer to avoid MS Word automation-type solutions, as I cannot guarantee which MS Office versions are installed.
The resulting file needs to be editable, so I can't use PDF.
Is it as simple as using something like NVelocity, or do I need to do something fancier?
Thanks

You can always use keywords as placeholders. For example:
This is a [text] that needs to be [replaced]
and then use a StringBuilder to replace the keywords:
StringBuilder sb = new StringBuilder(yourTemplate);
sb.Replace("[text]", "car");
sb.Replace("[replaced]", "washed");
So the final text will be:
This is a car that needs to be washed.
This is just one way of doing it.
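If the template is an .rtf file, one caveat with plain keyword replacement is that a value containing \, { or } breaks the RTF, and non-ASCII characters should be written as \uN? escapes. Below is a minimal sketch of the same StringBuilder approach with that escaping added, assuming the [keyword] placeholders appear verbatim in the raw .rtf (if the template was edited in Word, a keyword can get split by formatting codes, so check the file in a text editor). RtfTemplate, EscapeRtf and Fill are hypothetical names, not a library API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: fills [keyword] placeholders in an RTF template.
public static class RtfTemplate
{
    // Escapes a plain-text value so it can be inserted into RTF safely.
    public static string EscapeRtf(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);           // RTF control characters must be escaped
            else if (c == '\n')
                sb.Append(@"\par ");                 // line break -> new paragraph
            else if (c == '\r')
                continue;                            // drop CR from CRLF pairs
            else if (c > 127)
                sb.Append(@"\u").Append((short)c).Append('?'); // signed 16-bit code plus '?' fallback
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Replaces every [key] in the template with the escaped value.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        var sb = new StringBuilder(template);
        foreach (var pair in values)
            sb.Replace("[" + pair.Key + "]", EscapeRtf(pair.Value));
        return sb.ToString();
    }
}
```

With the template from the example above, RtfTemplate.Fill(@"{\rtf1\ansi This is a [text] that needs to be [replaced].}", new Dictionary<string, string> { ["text"] = "car", ["replaced"] = "washed" }) returns {\rtf1\ansi This is a car that needs to be washed.}, which you can write out with File.WriteAllText.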

Related

Is it possible to split a PDF into separate files based on text using C#?

I have a large single PDF document that consists of multiple records. Each record usually takes one page; however, some use two pages. A record always starts with the same predefined text.
My goal is to split this PDF into separate PDFs, with each split happening just before the "header text" is found.
Yes, it's possible.
TikaOnDotnet
Take a look at TikaOnDotnet.TextExtractor; it's a wrapper around the Tika text extraction Java library.
You can get the text from your PDF easily, like this:
var text = new TextExtractor().Extract(file.FullName).Text;
docs: https://github.com/KevM/tikaondotnet
nuget: https://www.nuget.org/packages/TikaOnDotnet.TextExtractor/
itext7
You can also use itext7
docs: https://api.itextpdf.com/iText7/dotnet/7.1.9/index.html
nuget: https://www.nuget.org/packages/itext7/
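Extracting the text only gets you halfway; you still need to decide where each record starts. Here is a minimal sketch of that step, assuming you have already extracted the text of each page into a list (for example with iText7's PdfTextExtractor.GetTextFromPage) and that the header text appears verbatim in the extracted text. RecordSplitter and FindRecordRanges are hypothetical names. Each returned range could then be copied into its own file, for instance with iText7's PdfDocument.CopyPagesTo.

```csharp
using System;
using System.Collections.Generic;

public static class RecordSplitter
{
    // Given the extracted text of every page (index 0 = page 1), returns one
    // 1-based, inclusive (Start, End) page range per record. A record starts on
    // each page that contains headerText and ends on the page before the next one.
    // Pages before the first header are ignored.
    public static List<(int Start, int End)> FindRecordRanges(IReadOnlyList<string> pageTexts, string headerText)
    {
        var ranges = new List<(int Start, int End)>();
        int start = 0; // 0 = no record open yet
        for (int i = 0; i < pageTexts.Count; i++)
        {
            int page = i + 1;
            if (pageTexts[i].Contains(headerText))
            {
                if (start != 0)
                    ranges.Add((start, page - 1)); // close the previous record
                start = page;
            }
        }
        if (start != 0)
            ranges.Add((start, pageTexts.Count)); // the last record runs to the end
        return ranges;
    }
}
```

Because a record boundary is just "the page where the header appears", two-page records need no special handling.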

Create and label a text block in Word using C#

I am trying to write an Add-In (VSTO) for Word to add some software requirement-management capabilities to it.
Assume a requirement follows this convention:
[REQ_<nr>] Specification item title
⌈ Specification item description. ⌋ (REQ_<nr1>, REQ_<nr2>)
Is there any way to declare this block and name it so that you can later find all requirements using OpenXML and C#?
Thanks
Word has two basic ways you can identify something in a document: Bookmarks and ContentControls. Both will work for OpenXML and for the Interop (C#).
Each bookmark must have a different name, so you'd use a naming convention with an incrementing counter (Req1, Req2 for example).
ContentControls are specifically designed to allow the same value in the Title property for multiple controls. The interop method Document.SelectContentControlsByTitle returns all content controls with a given title. OpenXML can do the same, of course. This approach is not available for bookmarks in the interop.
Another difference between the two is that content controls can be protected, in case that's important for you.
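On the OpenXML side, each content control is a w:sdt element and its Title is stored in w:sdtPr/w:alias. Here is a minimal sketch that finds all requirement blocks by title using plain System.Xml.Linq and System.IO.Compression instead of the OpenXML SDK (the SDK exposes the same elements as SdtElement and SdtAlias). It assumes each requirement was wrapped in a content control titled "Requirement" and that the main part is word/document.xml; RequirementFinder and that title are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class RequirementFinder
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of every content control (w:sdt) whose Title (w:alias) equals title.
    public static List<string> FindByTitle(XDocument documentXml, string title)
    {
        return documentXml.Descendants(W + "sdt")
            .Where(sdt => (string)(sdt.Element(W + "sdtPr")?.Element(W + "alias")?.Attribute(W + "val")) == title)
            .Select(sdt => string.Concat(sdt.Elements(W + "sdtContent").Descendants(W + "t").Select(t => t.Value)))
            .ToList();
    }

    // Opens a .docx (a ZIP package) and searches its main document part.
    // Assumes the part is at word/document.xml, which is where Word normally puts it.
    public static List<string> FindInDocx(string docxPath, string title)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        using (Stream stream = zip.GetEntry("word/document.xml").Open())
        {
            return FindByTitle(XDocument.Load(stream), title);
        }
    }
}
```

The text of a control is usually split across several runs (w:r/w:t), which is why the sketch concatenates all w:t elements inside w:sdtContent.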

Prepare doc and docx Files for Lucene indexing

I want to ask whether there is a quick way of getting the content of a document into a single document field. All the examples I have seen use relatively short strings. I cannot save an entire journal article into a string and index that. Is there a quick way of telling Lucene to index all the words in a file? I am using Lucene.Net 3.0.3 for this application.
There is no easy way to pass just the file; you have to provide the entire content to Lucene for it to be indexed for search. Here is an answer from a Q&A about indexing PDFs, but the same applies to every type of document: just open it and index its content with Lucene.
You can just pass a System.IO.TextReader to a Field. If the file is plain text, or something like it, you should just be able to open the Reader on it, and pass it directly into the Field, like:
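// Lucene.Net 3.0.3: a reader-based Field is tokenized and indexed, but not stored
using (System.IO.TextReader reader = new System.IO.StreamReader("path/to/my/file.txt"))
{
    document.Add(new Field("fieldName", reader));
    writer.AddDocument(document); // writer = your IndexWriter; the reader is consumed here, so keep it open until then
}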
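For .docx files there is no plain text to open directly, so the content has to be extracted first. Here is a minimal sketch, assuming .docx input only (legacy binary .doc files need a different extractor) and that the body lives in word/document.xml. It reads the w:t text runs, puts each w:p paragraph on its own line, and returns a TextReader that you could hand to the Field constructor shown above. DocxText is a hypothetical name.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Extracts paragraph text from WordprocessingML, one paragraph per line.
    public static string ExtractText(XDocument documentXml)
    {
        var sb = new StringBuilder();
        foreach (XElement paragraph in documentXml.Descendants(W + "p"))
        {
            // Only take runs whose nearest enclosing paragraph is this one,
            // so text in nested paragraphs (e.g. text boxes) is not duplicated.
            foreach (XElement t in paragraph.Descendants(W + "t")
                         .Where(x => x.Ancestors(W + "p").First() == paragraph))
                sb.Append(t.Value);
            sb.AppendLine();
        }
        return sb.ToString();
    }

    // Opens a .docx and wraps its text in a TextReader (e.g. for a Lucene Field).
    public static TextReader OpenReader(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        using (Stream stream = zip.GetEntry("word/document.xml").Open())
        {
            return new StringReader(ExtractText(XDocument.Load(stream)));
        }
    }
}
```

The whole document is read into memory here, which is fine for journal-article-sized files.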

How to replace text in a PDF with C#?

I have seen a lot of solutions here, but none of them are clear or good answers.
Here is my simple question; I'm hoping for a straight answer.
I have a PDF file (a template) that contains text like this:
{FIRSTNAME} {LASTNAME} {ADDRESS} {PHONENUMBER}
Is it possible to write C# code that replaces these placeholders with text of my choice?
No fields, no other complex stuff.
Is there any open-source library that can help me achieve that?
This thread is dead, but I'm posting my solution for other lost souls who might face this problem in the future. Unfortunately, my company doesn't allow posting code online, so I'll describe the solution :).
Basically, what you have to do is use PdfSharp and modify this sample to replace text in the content stream, but you must take into account that the text may be split into many parenthesized strings (convert the stream to a string to see what the format is).
Then, with code similar to this sample, traverse the source PDF page by page and modify the current page by searching for PdfContent items inside PdfReference items and replacing the text in each content's stream.
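To see why a plain string replace often misses a placeholder: a content stream can show {FIRSTNAME} as something like [({FIRST) -12 (NAME})] TJ, i.e. several literal strings separated by kerning numbers. Here is a minimal sketch that pulls the literal (parenthesized) strings out of a decoded content-stream fragment and joins them, so you can at least detect which TJ array holds a placeholder. It is a simplification that assumes a simple single-byte font encoding, does not handle unescaped nested parentheses (which PDF syntax allows), and leaves octal escapes such as \053 undecoded. PdfLiteralText is a hypothetical name, not part of PdfSharp.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class PdfLiteralText
{
    // Matches a PDF literal string: '(' ... ')' where a backslash escapes the next char.
    // Simplification: unescaped nested parentheses are not handled.
    static readonly Regex Literal = new Regex(@"\((?:\\.|[^\\)])*\)", RegexOptions.Singleline);

    // Concatenates the decoded contents of all literal strings in a content-stream fragment.
    public static string JoinLiterals(string contentFragment)
    {
        var sb = new StringBuilder();
        foreach (Match m in Literal.Matches(contentFragment))
            sb.Append(Unescape(m.Value.Substring(1, m.Value.Length - 2)));
        return sb.ToString();
    }

    // Decodes the simple escapes \( \) \\ \n \r \t; other escapes (e.g. octal) are kept as-is.
    static string Unescape(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '\\' && i + 1 < s.Length)
            {
                char next = s[++i];
                switch (next)
                {
                    case '(': case ')': case '\\': sb.Append(next); break;
                    case 'n': sb.Append('\n'); break;
                    case 'r': sb.Append('\r'); break;
                    case 't': sb.Append('\t'); break;
                    default: sb.Append('\\').Append(next); break;
                }
            }
            else
            {
                sb.Append(s[i]);
            }
        }
        return sb.ToString();
    }
}
```

Once you know which TJ array spells the placeholder, you can rewrite that whole array rather than trying to match the placeholder inside a single string.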
The 'problem' with PDF documents is that they are inherently unsuitable for editing, especially ones without fields. The best thing is to step back, look at your process, and see if there is a way to replace the text before the PDF is generated. Obviously, you may not always have this freedom.
If you are able to replace text, be aware that there will be no automatic reflow of the text following the replaced text. Assuming you are fine with that, there are very few solutions that allow you to replace text.
I know that you are looking for an open-source solution, so I feel reluctant to offer you a commercial one. We offer one called PDFKit.NET. It allows you to extract all content on a page as so-called shapes (text, images, curves, etc.). See method Page.CreateShapes in the type reference. You can then programmatically navigate and edit this structure of shapes and then write it back to a PDF again.
Here it is:
http://www.tallcomponents.com/pdfkit
Disclosure: I am the founder of TallComponents, vendor of this component
For simple text replacement, use the iTextSharp library.
The code below replaces one string with another.
Note that this will replace only simple text and may not work in all cases.
// using System.IO;
// using iTextSharp.text.pdf;
void VerySimpleReplaceText(string OrigFile, string ResultFile, string origText, string replaceText)
{
    using (PdfReader reader = new PdfReader(OrigFile))
    {
        // Page numbers in iTextSharp are 1-based
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // Content stream of the page, as bytes and then as a string
            byte[] contentBytes = reader.GetPageContent(i);
            string contentString = PdfEncodings.ConvertToString(contentBytes, PdfObject.TEXT_PDFDOCENCODING);
            // Only matches text stored as one unbroken literal string in the stream
            contentString = contentString.Replace(origText, replaceText);
            reader.SetPageContent(i, PdfEncodings.ConvertToBytes(contentString, PdfObject.TEXT_PDFDOCENCODING));
        }
        // Write the modified pages to the result file; Close() also closes the FileStream
        new PdfStamper(reader, new FileStream(ResultFile, FileMode.Create, FileAccess.Write)).Close();
    }
}
As stated in a similar thread, this is not really possible in an easy way. The easier route seems to be starting from a DocX file, using the DocX library (which allows easy word swapping), and then converting your DocX to PDF (using the PDFCreator printer or similar).
Or use PdfSharp/MigraDoc to create new documents.
Updating a PDF is hard and dirty, so maybe adding content on top of the existing content will work for you as well, as it did for me. If so, here's my primitive but working solution, which covers a lot of cases ("covering", indeed):
https://github.com/astef/PatchPdfText

Regex or XML Parser C#

I have some Word template files (.dot/.dotx) that contain XML tags along with plain text.
At run time, I need to replace the XML tags with their respective mail merge fields.
So I need to parse the document for these XML tags and replace them with merge fields.
I was using Regex to find and replace these XML tags, but it was suggested that I use an XML parser to find them (Regex for string enclosed in <*>, C#).
The sample document looks like:
Solicitor Letter
<Tfirm/>
<Tbuilding/>
<TstreetNumber/> <TstreetName/>
For the attention of: <TContact1/> <TEmail/>
Dear <TContact1/>
RE: <Pbuilding/> <PstreetNumber/> <PstreetName/> <Pvillage/> <PTown/>
We were pleased to hear that contracts have now been exchanged in the sale of the
above property on behalf of our mutual client/s. We now have pleasure in enclosing a
copy of our invoice for your kind attention upon completion.
....
One more note: the angle brackets are typed manually by the end user in the template.
I tried using XmlReader, but got an error because my documents have no root element.
Please advise whether I should stick to Regex or whether there is any way to use an XML parser.
Thank you!
Unless you can get it structured as an XML document, the tools in the .NET Libraries to read XML are going to be entirely useless.
What you have is not XML. Having a tag or two that would qualify as XML does not an XML document make. The problem is that it simply does not follow any of the rules of XML.
Moral of the story is that you will have to come up with your own method to parse this. If you like to drink the RegEx kool-aid, that'll be the best solution for ya. Of course, there are plenty of ways to skin this cat.
It looks like you aren't actually using XML, just using a token that looks similar to XML as a placeholder for replacement.
If that's the case, you should be using Regex.
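Here is a minimal sketch of that regex approach, assuming the tokens are always self-closing, start with a letter, and contain only letters, digits and underscores (as in the sample: <Tfirm/>, <TContact1/>). Replacing each token with «Name» only mimics how Word displays a merge field; it is placeholder text, and turning it into a real MERGEFIELD still has to be done through Word or the OpenXML SDK. Also note that in a .docx the typed < and > are stored as &lt; and &gt;, and a token may be split across several runs, so this is meant for text you have already extracted. TemplateTokens and the «Name» format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TemplateTokens
{
    // A self-closing pseudo-tag such as <TContact1/>; group 1 captures the name.
    static readonly Regex Token = new Regex(@"<([A-Za-z][A-Za-z0-9_]*)\s*/>");

    // Distinct token names, in order of first appearance.
    public static List<string> FindNames(string text)
    {
        var seen = new HashSet<string>();
        var names = new List<string>();
        foreach (Match m in Token.Matches(text))
            if (seen.Add(m.Groups[1].Value))
                names.Add(m.Groups[1].Value);
        return names;
    }

    // Replaces every token with whatever replace(name) returns.
    public static string ReplaceAll(string text, Func<string, string> replace) =>
        Token.Replace(text, m => replace(m.Groups[1].Value));
}
```

For example, TemplateTokens.ReplaceAll("Dear <TContact1/>", n => "«" + n + "»") gives "Dear «TContact1»". No root element or well-formed XML is needed, which is exactly what the XmlReader attempt was missing.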
I would suggest neither. Microsoft has a free library in C# specifically for modifying open xml format documents without an installation of Microsoft Office.
OpenXML SDK
This doesn't seem like XML processing to me. It's not an XML document. It looks like straight string replacement, and for that, you're better off with a regular expression.
An XML parser doesn't help you locate XML; it only helps you understand a given piece of XML. You will need some other mechanism, perhaps a Regex, to find the XML.
It seems that the authors of most replies didn't read the question carefully.
inutan is asking for something that will parse Word documents. If a Word document is saved in .docx format, it is actually a ZIP package of XML files that can be read with an XML reader or XPathReader; however, I would not recommend doing that.
Normally, mail merge with Word doesn't require any programming or XML parsing; see http://helpdesk.ua.edu/training/word/merg07.html
However if you still want to have XML-like fields in your Word templates and replace them with values, I would suggest using Word automation objects.
Below is an example of VBA code; for similar code in other languages, please refer to the MS Office development site http://msdn.microsoft.com/en-us/library/bb726434.aspx . For example, if you use .NET, you should use the Office interop assemblies, and it is best to install the Visual Studio Tools for Office development tools http://msdn.microsoft.com/en-us/library/5s12ew2x.aspx
With Selection.Find
.Text = "<TContact1/>"
.Replacement.Text = "TContact1"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
