How to replace text in a PDF with C#?

I have seen many solutions here, but none of the answers are clear or good.
Here is my simple question, which I hope has a straightforward answer.
I have a PDF file (a template) that contains text like this:
{FIRSTNAME} {LASTNAME} {ADDRESS} {PHONENUMBER}
Is it possible to write C# code that replaces these placeholders with text of my choice?
No fields, no other complex stuff.
Is there an open-source library that can help me achieve that?

This thread is dead, but I'm posting my solution for other lost souls who might face this problem in the future. Unfortunately, my company doesn't allow posting code online, so I'll describe the solution :).
Basically, use PdfSharp and modify this sample to replace text in the content stream. Keep in mind that a single piece of text may be split across several parenthesized strings (convert the stream to a string to see what the format looks like).
Then, with code similar to this sample, traverse the source PDF page by page and modify each page by searching for PdfContent items inside PdfReference items and replacing the text in each content stream.
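To illustrate the "split into many parentheses" problem, here is a minimal sketch in plain C# (no PDF library involved). It assumes the text operand has already been pulled out of a decoded content stream as a string, and that the font uses a simple one-byte encoding; many real PDFs use subset fonts or hex strings, which this does not cover. The class and method names are hypothetical.

```csharp
using System.Text;

static class ContentStreamText
{
    // Joins the literal strings of a text-array operand such as "[(Hel) -20 (lo)]".
    // Numbers between the strings are spacing adjustments and are skipped.
    // Simplification: a backslash keeps the next character literally, which is
    // right for \( \) and \\ but not for escapes such as \n or octal codes.
    public static string JoinLiteralStrings(string operand)
    {
        var result = new StringBuilder();
        int depth = 0; // nesting depth of unescaped parentheses
        for (int i = 0; i < operand.Length; i++)
        {
            char c = operand[i];
            if (depth > 0 && c == '\\' && i + 1 < operand.Length)
            {
                result.Append(operand[++i]);
            }
            else if (c == '(')
            {
                if (depth > 0) result.Append(c); // nested parenthesis is part of the text
                depth++;
            }
            else if (c == ')' && depth > 0)
            {
                depth--;
                if (depth > 0) result.Append(c);
            }
            else if (depth > 0)
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

Joining the pieces first makes it possible to check whether a placeholder such as {FIRSTNAME} is present even when the stream stores it as several strings. Writing the replacement back still requires rebuilding the operand.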

The 'problem' with PDF documents is that they are inherently not suitable for editing, especially ones without fields. The best thing is to step back, look at your process, and see if there is a way to replace the text before the PDF is generated. Obviously, you may not always have this freedom.
If you are able to replace text, be aware that the text following the replacement will not reflow automatically. If you are fine with that, there are still very few solutions that allow you to replace text.
I know that you are looking for an open-source solution, so I feel reluctant to offer you a commercial one. We offer one called PDFKit.NET. It allows you to extract all content on a page as so-called shapes (text, images, curves, etc.). See method Page.CreateShapes in the type reference. You can then programmatically navigate and edit this structure of shapes and write it back to a PDF.
Here it is:
http://www.tallcomponents.com/pdfkit
Disclosure: I am the founder of TallComponents, vendor of this component

For simple text replacement, use the iTextSharp library.
The code below replaces one string with another.
Note that this replaces only simple text and may not work in all cases, for example when the text is split into several strings inside the content stream.
// using System.IO;
// using iTextSharp.text.pdf;
void VerySimpleReplaceText(string origFile, string resultFile, string origText, string replaceText)
{
    using (PdfReader reader = new PdfReader(origFile))
    {
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // Decode the page's content stream, replace the text, and write it back.
            byte[] contentBytes = reader.GetPageContent(i);
            string contentString = PdfEncodings.ConvertToString(contentBytes, PdfObject.TEXT_PDFDOCENCODING);
            contentString = contentString.Replace(origText, replaceText);
            reader.SetPageContent(i, PdfEncodings.ConvertToBytes(contentString, PdfObject.TEXT_PDFDOCENCODING));
        }
        // Closing the stamper writes the modified document to resultFile.
        new PdfStamper(reader, new FileStream(resultFile, FileMode.Create, FileAccess.Write)).Close();
    }
}

As stated in a similar thread, there is no easy way to do this. An easier approach seems to be to start from a DocX file, use the DocX library, which allows easy word swapping, and then convert the DocX to PDF (using the PDFCreator printer or similar).
Alternatively, use PDFsharp/MigraDoc to create new documents.

Updating a PDF is hard and dirty. So maybe adding content on top of the existing content will work for you, as it did for me. If so, here's my primitive but working solution, which covers a lot of cases ("covering", indeed):
https://github.com/astef/PatchPdfText

Related

Is it possible to split a PDF into separate files based on text using C#?

I have a large single pdf document which consists of multiple records. Each record usually takes one page however some use 2 pages. A record starts with a defined text, always the same.
My goal is to split this pdf into separate pdfs and the split should happen always before the "header text" is found.

Yes it's possible.
TikaOnDotnet
Take a look at TikaOnDotnet.TextExtractor; it's a wrapper around the Tika text extraction Java library.
You can easily get the text from your PDF like this:
var text = new TextExtractor().Extract(file.FullName).Text;
docs: https://github.com/KevM/tikaondotnet
nuget: https://www.nuget.org/packages/TikaOnDotnet.TextExtractor/
itext7
You can also use itext7
docs: https://api.itextpdf.com/iText7/dotnet/7.1.9/index.html
nuget: https://www.nuget.org/packages/itext7/
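Whichever library extracts the text, the split decision itself is plain logic. The sketch below assumes you already have the text of each page in a list (per-page extraction is an assumption here; the one-liner above returns the text of the whole file) and computes which page ranges should become separate files. The names PdfSplitPlanner and PlanRanges are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class PdfSplitPlanner
{
    // Returns 1-based, inclusive page ranges, one per record.
    // A new range starts on every page (after the first) that contains headerText.
    // Pages that precede the first header end up in the first range.
    public static List<(int First, int Last)> PlanRanges(IReadOnlyList<string> pageTexts, string headerText)
    {
        var ranges = new List<(int First, int Last)>();
        int start = 1;
        for (int page = 2; page <= pageTexts.Count; page++)
        {
            if (pageTexts[page - 1].IndexOf(headerText, StringComparison.Ordinal) >= 0)
            {
                ranges.Add((start, page - 1)); // close the previous record
                start = page;
            }
        }
        if (pageTexts.Count > 0)
        {
            ranges.Add((start, pageTexts.Count)); // last record runs to the end
        }
        return ranges;
    }
}
```

Each resulting range can then be copied into its own output document with the PDF library of your choice.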

Copy content from a Word document to another with the style

I want to copy the content of a section in a Word document to a new document.
I do this to copy:
var docPath = @"C:\temp\myDoc.docx";
var doc = word.Documents.Open(FileName: docPath, ReadOnly: true);
var emptyDoc = word.Documents.Add();
doc.Sections.First.Range.Copy();
emptyDoc.Sections.First.Range.Paste();
This works well for copying the content, but the style is not the same. How can I copy the complete section and have it rendered exactly the same way in the new document?
If there is a better solution involving the OpenXML SDK instead of VSTO, I can take it.

You will find it much easier to automate Word if you do things manually first. That way you can get a better understanding of the various options available etc. You can also record a macro which will often, though not always, provide the answer.
In this instance you need to automate selecting 'Keep Source Formatting' from the context toolbar that appears after pasting. The code you need for that is:
emptyDoc.Sections.First.Range.PasteAndFormat(WdRecoveryType.wdFormatOriginalFormatting);

Manipulating SVG files with C#

I need to be able to edit the text and images of an SVG file that has been rendered in Adobe Illustrator.
How can I iterate through the elements of an SVG file, check for type = text, change the value, and save the file to disk? Is there any library available that could help me?
So far I've tried this basic library but it doesn't do well with complex SVG structures.

SVG RENDERING ENGINE
I used this one for a project.
There were a few flaws but it did the job.

This may be very late to answer, but for the sake of others who land on this page: you can use HtmlAgilityPack. Here is a link to a similar question: What is the best way to parse html in C#?
I used it in a case where I needed to edit an SVG string and replace some values, like this:
HtmlDocument theDocument = new HtmlDocument();
theDocument.LoadHtml(svgChartImg1);
HtmlNodeCollection theNodes = theDocument.DocumentNode.SelectNodes("//tspan");
Here, svgChartImg1 is an SVG XML string.
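Since SVG is XML, the same idea can also be sketched with the built-in System.Xml.Linq API instead of HtmlAgilityPack. This is an alternative technique, not what the answer above used. The sketch assumes the elements live in the standard SVG namespace and that each tspan holds plain text; the class and method names are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

static class SvgTextEditor
{
    static readonly XNamespace Svg = "http://www.w3.org/2000/svg";

    // Replaces the content of every <tspan> whose text equals oldValue
    // and returns the modified SVG markup.
    public static string ReplaceTspanText(string svgXml, string oldValue, string newValue)
    {
        XDocument doc = XDocument.Parse(svgXml);
        // ToList() takes a snapshot so the tree is not modified while it is being enumerated.
        foreach (XElement tspan in doc.Descendants(Svg + "tspan").Where(e => e.Value == oldValue).ToList())
        {
            tspan.Value = newValue;
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

To save the result to disk, pass the returned string to System.IO.File.WriteAllText, or call doc.Save(path) inside the method instead of returning a string.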

generating rtf from template in c# winforms app

Is there a simple process someone can recommend for generating an RTF document from a pre-built "template" and populating its fields?
I would prefer to avoid MS Word automation, as I cannot guarantee which MS Office version is installed.
The resulting file needs to be editable, so I can't use PDF.
Is it as simple as using something like NVelocity, or do I need to do something fancier?
Thanks

You can always use keywords. Example:
This is a [text] that needs to be [replaced]
and then use a StringBuilder to replace the keywords.
StringBuilder sb = new StringBuilder(yourTemplate);
sb.Replace("[text]", "car");
sb.Replace("[replaced]", "washed");
So the final text will be:
This is a car that needs to be washed
This is just one way of doing it.
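One thing to watch when the template is real RTF: backslashes and braces are control characters in RTF, so replacement values that contain them should be escaped first. The sketch below extends the keyword approach with that step. RtfTemplate, EscapeRtf and Fill are hypothetical names, and non-ASCII characters (which need their own RTF escapes) are not handled.

```csharp
using System.Collections.Generic;
using System.Text;

static class RtfTemplate
{
    // Escapes the three characters that have special meaning in RTF.
    // The backslash must be handled first so the added escapes are not escaped again.
    public static string EscapeRtf(string value)
    {
        return value.Replace("\\", "\\\\").Replace("{", "\\{").Replace("}", "\\}");
    }

    // Replaces every keyword in the template with its escaped value.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        var sb = new StringBuilder(template);
        foreach (KeyValuePair<string, string> pair in values)
        {
            sb.Replace(pair.Key, EscapeRtf(pair.Value));
        }
        return sb.ToString();
    }
}
```

If the template was saved from a word processor, it is worth opening it in a text editor to confirm that each keyword appears as one unbroken run; formatting changes in the middle of a keyword would split it and the replacement would not match.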

C# WPF Open File and edit certain text

So let's say I have a program with just a text box and an okay button. The user types in whatever word he wants, and when he clicks ok, it opens a specific file called Test.doc and CTRL+F for the word "test" and replaces it with whatever the user entered into the text box. How can I open said file and replace instances of the word test with the user's defined word?

Ignoring the format of the document, you could literally use the following for any type of file:
var contents = System.IO.File.ReadAllText(@"C:\myDoc.doc");
contents = contents.Replace("Test", "Tested");
System.IO.File.WriteAllText(@"C:\myDoc.doc", contents);
The best way would be to use the MS Office interop library, though.
Andrew

A number of things:
I'd recommend using a FileDialog to get the file's location. This lets you select the file to edit, but also gives you functionality to only show the file types that you want to handle in this program.
If you're handling .doc's, I'd suggest you look into VSTO and opening word docs. Here's a guide I found after a quick search. I'd suggest using it as a place to start, but you'll need to look around for more specifics.
Lastly, the string.Replace("", ""); method is probably very helpful in the CTRL-F functionality. You should be able to extract a string of the text from whatever document you're analyzing and use that method.
