Remove text from PDF document using Aspose.PDF library?

I need to delete some text from a PDF document. I am using Aspose.PDF for this, currently with TextFragmentAbsorber.
FYI, I cannot use any other third-party library.
Below is the code I am using:
private string DeleteMachineReadableCode(string inputFilePath)
{
var outputFilePath = Path.Combine(Path.GetTempPath(), string.Format(@"{0}.pdf", Guid.NewGuid()));
try
{
// Open document
Document pdfDocument = new Document(inputFilePath);
// Create TextAbsorber object to find all the phrases matching the regular expression
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("#START#((.|\r\n)*?)#END#");
// Set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
// Accept the absorber for all pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
// Loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
// Update text and other properties
textFragment.Text = string.Empty;
// Set to an instance of an object.
textFragment.TextState.Font = FontRepository.FindFont("Verdana");
textFragment.TextState.FontSize = 1;
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.White);
textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.White);
}
pdfDocument.Save(outputFilePath);
}
finally
{
if (File.Exists(inputFilePath))
File.Delete(inputFilePath);
}
return outputFilePath;
}
I am able to replace the content if the text to be deleted is on a single page.
My problem is that if the text spans multiple pages, TextFragmentAbsorber does not find it with the regex pattern above ("#START#((.|\r\n)*?)#END#").
Please suggest whether a change to the regex or some setting in Aspose can fix my issue.

As shared earlier, we cannot promise an early resolution of the issue you reported, because of an architecture limitation. However, we have modified the code snippet to meet your requirements.
The idea is to find text starting with '#START#' on one of the document pages, then to find text ending with '#END#' on one of the subsequent pages, and also to process all text fragments placed on the pages between those two pages (if any exist).
private string DeleteMachineReadableCodeUpdated(string inputFilePath)
{
string outputFilePath = Path.Combine(Path.GetTempPath(), string.Format(@"{0}.pdf", Guid.NewGuid()));
try
{
// Open document
Document pdfDocument = new Document(inputFilePath);
// Create TextAbsorber object to find all the phrases matching the regular expression
TextFragmentAbsorber absorber = new TextFragmentAbsorber("#START#((.|\r\n)*?)#END#");
// Set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
absorber.TextSearchOptions = textSearchOptions;
// Accept the absorber for all pages
pdfDocument.Pages.Accept(absorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = absorber.TextFragments;
// If pattern found on one of the pages
if (textFragmentCollection.Count > 0)
{
RemoveTextFromFragmentCollection(textFragmentCollection);
}
else
{
// In case nothing was found tries to find by parts
string startingPattern = "#START#((.|\r\n)*?)\\z";
string endingPattern = "\\A((.|\r\n)*?)#END#";
bool isStartingPatternFound = false;
bool isEndingPatternFound = false;
ArrayList fragmentsToRemove = new ArrayList();
foreach (Page page in pdfDocument.Pages)
{
// If ending pattern was already found - do nothing
if (isEndingPatternFound)
continue;
// If starting pattern was already found - activate textFragmentAbsorber with ending pattern
absorber.Phrase = !isStartingPatternFound ? startingPattern : endingPattern;
page.Accept(absorber);
if (absorber.TextFragments.Count > 0)
{
// In case something is found - add it to list
fragmentsToRemove.AddRange(absorber.TextFragments);
if (isStartingPatternFound)
{
// Both starting and ending patterns found - finish processing the document
isEndingPatternFound = true;
RemoveTextFromFragmentCollection(fragmentsToRemove);
}
else
{
// Only starting pattern found yet - continue
isStartingPatternFound = true;
}
}
else
{
// In case neither starting nor ending pattern are found on current page
// If starting pattern was found previously - get all fragments from the page
if (isStartingPatternFound)
{
absorber.Phrase = String.Empty;
page.Accept(absorber);
fragmentsToRemove.AddRange(absorber.TextFragments);
}
// Otherwise do nothing (continue)
}
}
}
pdfDocument.Save(outputFilePath);
}
finally
{
if (File.Exists(inputFilePath))
File.Delete(inputFilePath);
}
return outputFilePath;
}
private void RemoveTextFromFragmentCollection(ICollection fragmentCollection)
{
// Loop through the fragments
foreach (TextFragment textFragment in fragmentCollection)
{
textFragment.Text = string.Empty;
}
}
Note:
This code assumes that the document contains only one text block starting with '#START#' and ending with '#END#'. However, the code above can easily be modified to process several such blocks.
Instead of processing the text on intermediate pages, you may store their page numbers and then delete them using pdfDocument.Pages.Delete(pageNumber) before saving the document. This avoids 'blank' pages if they are undesirable.
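The start/end bookkeeping described above can be sketched without any PDF library. This is only an illustration under stated assumptions: pages are modelled as plain strings, and the names CrossPageBlockRemover and RemoveBlock are hypothetical, not part of Aspose.PDF. It also handles several blocks in one document, as the note suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CrossPageBlockRemover
{
    // Removes everything from startMarker to endMarker (inclusive),
    // even when the two markers sit on different pages.
    // Pages are plain strings here, not Aspose page objects.
    public static List<string> RemoveBlock(IReadOnlyList<string> pages, string startMarker, string endMarker)
    {
        var result = new List<string>(pages);
        bool insideBlock = false; // carried over from one page to the next

        for (int i = 0; i < result.Count; i++)
        {
            string page = result[i];
            var kept = new StringBuilder();
            int position = 0;

            while (position < page.Length)
            {
                if (!insideBlock)
                {
                    int start = page.IndexOf(startMarker, position, StringComparison.Ordinal);
                    if (start < 0)
                    {
                        // No block starts on the rest of this page: keep it all.
                        kept.Append(page, position, page.Length - position);
                        break;
                    }
                    kept.Append(page, position, start - position);
                    position = start + startMarker.Length;
                    insideBlock = true;
                }
                else
                {
                    int end = page.IndexOf(endMarker, position, StringComparison.Ordinal);
                    if (end < 0)
                    {
                        // The block continues on the next page: drop the rest.
                        break;
                    }
                    position = end + endMarker.Length;
                    insideBlock = false;
                }
            }

            result[i] = kept.ToString();
        }

        return result;
    }
}
```

A block that is opened but never closed swallows the rest of the document in this sketch; whether that is acceptable depends on the input.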

Related

C# and ANTLR4: Handling "include" directives when parsing a file

Using ANTLR, I'm trying to parse input files that contain references to other files, just like #include "[insert file name]" in C.
One suggested approach is:
1. Parse the root file, saving the references as nodes (so, specific grammar rules).
2. Visit the tree searching for "reference" nodes.
3. For each reference node, parse the referenced file and substitute the node with the newly generated tree.
4. Repeat this process recursively to handle multiple levels of inclusion.
The problem with this solution is that the referenced files might be only fragments (see includes inside the body of a C function). To parse such files, I would have to implement a different parser to handle the fragmented grammar.
Is there any valid/suggested approach to (literally) inject the new file into the ongoing parsing process?

One solution to this problem is to override the scanner's behavior, specifically the NextToken() method.
This is necessary because, to the best of my knowledge, the EOF token cannot be handled in the ANTLR lexer grammar: any actions attached to a lexer rule that recognizes EOF are simply ignored (as shown in the code below). Thus, this behaviour has to be implemented directly in the scanner method.
So assume we have a parser grammar
parser grammar INCParserGrammar;
@parser::members {
public static Stack<ICharStream> m_nestedfiles = new Stack<ICharStream>();
}
options { tokenVocab = INCLexerGrammar; }
/*
* Parser Rules
*/
compileUnit
: (include_directives | ANY )+ ENDOFFILE
;
include_directives : INCLUDEPREFIX FILE DQUOTE
;
A public static Stack<ICharStream> (m_nestedfiles above; the shorter snippets below call it mySpecialFileStack) should be introduced inside the grammar's members. This stack stores the character streams of the files that take part in the parsing. Character streams are pushed onto the stack as new files are encountered through include statements.
Next, the lexer grammar:
lexer grammar INCLexerGrammar;
@lexer::header {
using System;
using System.IO;
}
@lexer::members {
string file;
ICharStream current;
}
/*
* Lexer Rules
*/
INCLUDEPREFIX : '#include'[ \t]+'"' {
Mode(INCLexerGrammar.FILEMODE);
};
// The following rule always matches a shorter string than the rule above
ANY : ~[#]+ ;
ENDOFFILE : EOF {
// Any actions in this rule are ignored by the ANTLR lexer
};
////////////////////////////////////////////////////////////////////////////////////////////////////////
mode FILEMODE;
FILE : [a-zA-Z][a-zA-Z0-9_]*'.'[a-zA-Z0-9_]+ { file= Text;
StreamReader s = new StreamReader(file);
INCParserGrammar.m_nestedfiles.Push(_input);
current =new AntlrInputStream(s);
};
DQUOTE: '"' {
this._input = current;
Mode(INCLexerGrammar.DefaultMode); };
The overridden NextToken() method goes in the .g4.cs file, whose purpose is to extend the generated scanner class (this works because the generated class is declared with the "partial" keyword).
After the partial scanner class for the grammar is generated, open the source code of the ANTLR4 Lexer class from the ANTLR runtime, copy ALL of the original NextToken() code into the new method and, in the middle do-while block (right after the try-catch block), add the following code:
if (this._input.La(1) == -1)
{
if ( mySpecialFileStack.Count == 0 )
this._hitEOF = true;
else
this._input = mySpecialFileStack.Pop();
}
The full body of the NextToken() method override is
public override IToken NextToken() {
int marker = this._input != null ? this._input.Mark() : throw new InvalidOperationException("nextToken requires a non-null input stream.");
label_3:
try {
while (!this._hitEOF) {
this._token = (IToken)null;
this._channel = 0;
this._tokenStartCharIndex = this._input.Index;
this._tokenStartCharPositionInLine = this.Interpreter.Column;
this._tokenStartLine = this.Interpreter.Line;
this._text = (string)null;
do {
this._type = 0;
int num;
try {
num = this.Interpreter.Match(this._input, this._mode);
} catch (LexerNoViableAltException ex) {
this.NotifyListeners(ex);
this.Recover(ex);
num = -3;
}
if (this._input.La(1) == -1) {
if (INCParserGrammar.m_nestedfiles.Count == 0 ) {
this._hitEOF = true;
}
else
{
this._input = INCParserGrammar.m_nestedfiles.Pop();
}
}
if (this._type == 0)
this._type = num;
if (this._type == -3)
goto label_3;
}
while (this._type == -2);
if (this._token == null)
this.Emit();
return this._token;
}
this.EmitEOF();
return this._token;
} finally {
this._input.Release(marker);
}
}
Now, when you recognize a file inside your code that should be parsed, simply add the following action
FILE
: [a-zA-Z][a-zA-Z0-9_]*'.'[a-zA-Z0-9_]+ {
StreamReader s = new StreamReader(Text);
mySpecialFileStack.Push(_input);
_input = new AntlrInputStream(s);
};
DQUOTE: '"' { this._input = current;
Mode(INCLexerGrammar.DefaultMode); };
//***Warning:***
// Be careful when your file inclusion is enclosed inside quotes or other symbols, or if
// the filename-to-be-included is not the last token that defines an inclusion: `_input`
// should only be switched AFTER the inclusion detection is completely found (i.e. after
// the closing quote has been recognized).
Finally the main program is given below where it is apparent that the root file is added first in the ICharStream stack
static void Main(string[] args) {
var a = new StreamReader("./root.txt");
var antlrInput = new AntlrInputStream(a);
INCParserGrammar.m_nestedfiles.Push(antlrInput);
var lexer = new INCLexerGrammar(antlrInput);
var tokens = new BufferedTokenStream(lexer);
var parser = new INCParserGrammar(tokens);
parser.compileUnit();
}

Reading Mr. Grigoris's answer helped me discover another possible solution to my problem.
While trying to figure out how the suggested solution works, I stumbled upon the public virtual IToken EmitEOF() method. If the code that Mr. Grigoris provided is placed inside this function (with minor changes), everything seems to work as intended.
That gave me the opportunity to override EmitEOF() directly from the @members block of the lexer, without having to create a whole new file or understand how my current parser's NextToken() method works.
Lexer Grammar:
lexer grammar INCLexerGrammar;
@lexer::header {
using System;
using System.IO;
using System.Collections.Generic;
}
@lexer::members {
private Stack<ICharStream> _nestedFiles = new Stack<ICharStream>();
public override IToken EmitEOF(){
if (_nestedFiles.Count == 0 ) {
return base.EmitEOF();
};
this._hitEOF = false;
this._input = _nestedFiles.Pop();
return this.NextToken();
}
}
/////////////////////////////////////////////////////////////////////////////////////
// Default Mode /////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////////////
// Skipped because we want to hide INCLUDEPREFIX's existence from the parser
INCLUDEPREFIX : '#include'[ \t]+'"' { Mode(INCLexerGrammar.FILEMODE); } -> skip;
// This is the only valid token our Grammar accepts
ANY : ~[#]+ ;
/////////////////////////////////////////////////////////////////////////////////////
mode FILEMODE; //////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////////////
// Skipped because we want to hide FILE's existence from the parser
FILE : [a-zA-Z][a-zA-Z0-9_]*'.'[a-zA-Z0-9_]+ {
// Create new StreamReader from the file mentioned
StreamReader s = new StreamReader(Text);
// Push the old stream to stack
_nestedFiles.Push(_input);
// This new stream will be popped and used right after, on DQUOTE.
_nestedFiles.Push(new AntlrInputStream(s));
} -> skip;
// Skipped because we want to hide DQUOTE's existence from the parser
DQUOTE: '"' {
// Injecting the newly generated Stream.
this._input = _nestedFiles.Pop();
Mode(INCLexerGrammar.DefaultMode);
} -> skip;
Parser Grammar:
parser grammar INCParserGrammar;
options { tokenVocab = INCLexerGrammar; }
// Our Grammar contains only ANY tokens. Include directives
// and other Tokens exists only for helping lexer to
// inject the contents of other files inside the current
// scanning process.
compileUnit
: ANY+ EOF
;
Execution Calls:
// [...]
var myRootFile = new StreamReader("./root.txt");
var myAntlrInputStream = new AntlrInputStream(myRootFile);
var lexer = new INCLexerGrammar(myAntlrInputStream);
var tokens = new BufferedTokenStream(lexer);
var parser = new INCParserGrammar(tokens);
parser.compileUnit();
// [...]
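The stack-of-streams idea behind both answers can be illustrated without the ANTLR runtime. The sketch below is only an analogy under stated assumptions: it works line by line instead of token by token, it reads from a hypothetical in-memory dictionary of file contents instead of the disk, and the names IncludeExpander and Expand are made up for illustration. As in the answers, the current input is pushed when an include is seen and popped again when the included input is exhausted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class IncludeExpander
{
    // files: hypothetical in-memory "file system" mapping a file name to its text.
    public static string Expand(string rootName, IReadOnlyDictionary<string, string> files)
    {
        const string prefix = "#include \"";
        var suspended = new Stack<TextReader>(); // inputs waiting to be resumed
        TextReader current = new StringReader(files[rootName]);
        var output = new StringBuilder();

        while (true)
        {
            string line = current.ReadLine();
            if (line == null)
            {
                // End of the current input: resume the including file, if any.
                if (suspended.Count == 0) break;
                current = suspended.Pop();
                continue;
            }

            string trimmed = line.Trim();
            bool isInclude = trimmed.Length > prefix.Length + 1
                && trimmed.StartsWith(prefix, StringComparison.Ordinal)
                && trimmed.EndsWith("\"", StringComparison.Ordinal);

            if (isInclude)
            {
                // Switch inputs only after the whole directive, including
                // the closing quote, has been recognized.
                string name = trimmed.Substring(prefix.Length, trimmed.Length - prefix.Length - 1);
                suspended.Push(current);
                current = new StringReader(files[name]);
                continue;
            }

            output.Append(line).Append('\n');
        }

        return output.ToString();
    }
}
```

The sketch has no protection against a file that includes itself; a real implementation would probably track the names currently on the stack and reject cycles.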

How to detect that a Paragraph.Range is inside a TOC in a C# Word AddIn

I'm making an MS Word add-in with several features. One of them is to remove excessive blank lines (the current rule says that the document can't have more than two sequential blank lines and can't have blank lines after the last line of text).
I wrote this code to try to achieve it:
private void formatText() {
Microsoft.Office.Interop.Word.Paragraphs paragraphs = Globals.ThisAddIn.Application.ActiveDocument.Paragraphs;
Boolean isPreviousLineEmpty = false;
Boolean isLastLine = true;
for (int i = paragraphs.Count - 1; i > 0; i--) {
Microsoft.Office.Interop.Word.Paragraph paragraph = paragraphs[i];
if (paragraph.Range.Text.Trim().Equals("")) {
if (isLastLine) {
paragraph.Range.Delete();
continue;
}
if (isPreviousLineEmpty) {
paragraph.Range.Delete(); //This is the line where the error happens
}
isPreviousLineEmpty = true;
continue;
}
if (isLastLine) {
paragraph.Range.Text = paragraph.Range.Text.TrimEnd();
isLastLine = false;
}
isPreviousLineEmpty = false;
}
}
It was working until I added a "Table of Contents" (TOC) to the document. Now I get an error:
System.Runtime.InteropServices.COMException: 'Cannot edit Range.'
The reason is that there is a blank line in the TOC that my code tries to remove, and it can't. I've searched the documentation and the internet and tried everything I could think of to prevent my code from running on TOC lines, but nothing worked.
I need a way to know that I can skip that line, because I don't need to delete blank lines inside TOCs.
For the moment, what I can do is wrap the line that performs the deletion in a try/catch block, but I don't think this is the best solution (it merely silences the error, so I may be letting other errors go unnoticed).
Does anyone know the correct approach to this case?
UPDATE:
Following Freeflow's comment, I replaced all my method code with this:
private void formatText() {
Microsoft.Office.Interop.Word.Find find = Globals.ThisAddIn.Application.ActiveDocument.Range().Find;
Microsoft.Office.Interop.Word.Paragraphs paragraphs = Globals.ThisAddIn.Application.ActiveDocument.Paragraphs;
Boolean operationResult = true;
//Remove blank lines at the end of the document
for (int i = paragraphs.Count - 1; i > 0; i--) {
Microsoft.Office.Interop.Word.Paragraph paragraph = paragraphs[i];
if (paragraph.Range.Text.Trim().Equals("")) {
paragraph.Range.Delete();
continue;
}
paragraph.Range.Text = paragraph.Range.Text.TrimEnd();
break;
}
//Remove blank lines between paragraphs
while (operationResult) {
operationResult = find.Execute("^p^p^p", false, false, false, false, false, false, null, null, "^p^p",
Microsoft.Office.Interop.Word.WdReplace.wdReplaceAll);
}
}
It has been working very well so far. If any problem comes up, I'll post it here.
Thanks for your comment. If you turn it into an answer, I'll mark it as accepted.
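The rule stated at the top of the question (at most two consecutive blank lines, none after the last line of text) can be sketched on plain strings, independent of the Word object model. This is an illustration only: the names BlankLineNormalizer and Normalize are hypothetical, and each string stands in for one paragraph.

```csharp
using System.Collections.Generic;

public static class BlankLineNormalizer
{
    // Keeps at most maxBlankRun consecutive blank lines and removes
    // every blank line that follows the last line of text.
    public static List<string> Normalize(IEnumerable<string> lines, int maxBlankRun = 2)
    {
        var result = new List<string>();
        int blankRun = 0;

        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                blankRun++;
                if (blankRun <= maxBlankRun)
                {
                    result.Add(string.Empty);
                }
            }
            else
            {
                blankRun = 0;
                result.Add(line);
            }
        }

        // Drop blank lines after the last line of text.
        while (result.Count > 0 && result[result.Count - 1].Length == 0)
        {
            result.RemoveAt(result.Count - 1);
        }

        return result;
    }
}
```

The same limit could be expressed through the maxBlankRun parameter if the rule changes, for example to allow only a single blank line between paragraphs.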

The Word object model has a useful method: InRange, which allows checking whether one range of text is part of another. Logically, then, it's possible to compare whether a paragraph's location is within a TOC.
Below is a test example, originally written in VBA. I'm converting it on-the-fly to C#, so there may be some minor syntax errors...
public void TestRangeInToc()
{
Word.Document doc =Globals.ThisAddIn.Application.ActiveDocument;
bool HasToc = false;
if(doc.TablesOfContents.Count > 0)
{
HasToc = true;
}
foreach(Word.Paragraph para in doc.Paragraphs)
{
if(HasToc)
{
if(IsRangeInTOC(para.Range, doc))
{
Debug.Print("in range");
//skip this one
}
}
}
}
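public bool IsRangeInTOC(Word.Range rng, Word.Document doc)
{
// Check the range against every table of contents in the document
foreach(Word.TableOfContents toc in doc.TablesOfContents)
{
if(rng.InRange(toc.Range))
{
return true;
}
}
// The range is not inside any TOC
return false;
}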

Opening a hyperlink generated dynamically with a string command

Backstory:
I am working on a personal assistant program, and all my voice commands are translated into strings for parsing.
I have set up the ability to search Google and display the results in a text block as hyperlinks.
Now I want to be able to open these links by speech (string commands). So far I have the following.
This bit allows me to search using the Google Custom Search API with a custom "GoogleSearch" class.
public void search_google(string query) //Google Searching
{
#region link strings
string result_1 = "";
string result_2 = "";
string result_3 = "";
string result_4 = "";
string result_5 = "";
string result_6 = "";
string result_7 = "";
string result_8 = "";
string result_9 = "";
string result_10 = "";
#endregion
GoogleSearch search = new GoogleSearch()
{
Key = "{apikey}",
CX = "{cxkey}"
};
search.SearchCompleted += (a, b) =>
{
tab_control.SelectedIndex = 2;
int p = 1;
search_results.Text = String.Empty;
foreach (Item i in b.Response.Items)
{
Hyperlink hyperLink = new Hyperlink()
{
NavigateUri = new Uri(i.Link)
};
hyperLink.Inlines.Add(i.Title);
hyperLink.RequestNavigate += Hyperlink_RequestNavigate;
hyperLink.Name = "result_" + p;
//search_results.Inlines.Add(hyperLink.Name);
search_results.Inlines.Add(Environment.NewLine);
search_results.Inlines.Add(hyperLink);
search_results.Inlines.Add(Environment.NewLine);
search_results.Inlines.Add(i.Snippet);
search_results.Inlines.Add(Environment.NewLine);
search_results.Inlines.Add(Environment.NewLine);
p++;
};
};
search.Search(query);
}
It outputs my results in a series of hyperlinks and text snippets into a text block that I set up on the main window. The search process is triggered by my input parser which looks for the keywords "search" or "Google".
The next step would be the input parser checking for keyword "result" to look for the hyperlink to open. Here is the unfinished code for that.
if ((Input.Contains("result") || Input.Contains("Result")) && tab_control.TabIndex == 2)
{
int result_number = 0;
switch(result_number)
{
case 1:
if (Input.Contains("first") || Input.Contains("1st"))
{
// open hyperlink with name property result_1
}
break;
case 2:
// additional cases added up to 10 with similar syntax for parsing.
}
}

You can open a hyperlink in the default browser using:
Process.Start(myHyperlink);
EDIT
Based on your comments, it seems you are having trouble accessing result_1 (etc.).
You define result_1 as a variable local to the method search_google()
public void search_google(string query) //Google Searching
{
#region link strings
string result_1 = "";
That means result_1 is only visible within that method.
Your if and switch statements do not appear to be part of search_google(), so they can never see result_1. If those statements are in a different method, you can work around that issue by moving result_1 to the class level (outside of search_google()).
On a side note, rather than defining ten individual result strings, you probably want to use an array of strings or a list of strings.
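The list suggestion could look roughly like the sketch below. It is a minimal illustration under assumptions: the class name SearchResultIndex and its members are hypothetical, only the first three ordinals are mapped, and the instance is assumed to live at class level so that both the search callback and the input parser can reach it.

```csharp
using System;
using System.Collections.Generic;

public sealed class SearchResultIndex
{
    // Links in display order: index 0 is the first result.
    private readonly List<Uri> _links = new List<Uri>();

    // Spoken forms for each position; extend the table up to ten as needed.
    private static readonly string[][] Ordinals =
    {
        new[] { "first", "1st" },
        new[] { "second", "2nd" },
        new[] { "third", "3rd" }
    };

    public void Clear() => _links.Clear();

    public void Add(Uri link) => _links.Add(link);

    // Returns true and the matching link when the input names a known position.
    public bool TryResolve(string input, out Uri link)
    {
        link = null;
        for (int i = 0; i < Ordinals.Length && i < _links.Count; i++)
        {
            foreach (string word in Ordinals[i])
            {
                if (input.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    link = _links[i];
                    return true;
                }
            }
        }
        return false;
    }
}
```

In the search callback you would call Add(new Uri(i.Link)) for each item, and in the parser you would pass the resolved link's AbsoluteUri to Process.Start. On .NET Core and later, Process.Start needs a ProcessStartInfo with UseShellExecute set to true to open a URL in the default browser.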

Remove a specific line in text file with c#

I'm building an app for Windows 8 desktop. I'm reading in a text file and want to change one specific line, but I'm not sure how. What I have is a text file that says
username|false
username|false
username|false
I want to remove the middle line when something happens. This is what I have so far:
StorageFolder folder = ApplicationData.Current.LocalFolder;
StorageFile storageFile = await folder.GetFileAsync("students.txt");
var text = await Windows.Storage.FileIO.ReadLinesAsync(storageFile);
var list_false = "";
foreach (var line in text)
{
string name = "" + line.Split('|')[0];
string testTaken = "" + line.Split('|')[1];
if (your_name.Text == name)
{
if (testTaken == "false") {
pageTitle.Text = name;
enter_name_grid.Opacity = 0;
questions_grid.Opacity = 1;
var md = new MessageDialog("Enjoy the test");
await md.ShowAsync();
}
else
{
the_name.Text = "You have already taken the test";
var md1 = new MessageDialog("You have already taken the test");
await md1.ShowAsync();
}
return;
}
else
{
list_false = "You're not on the list";
}
}
if (list_false == "You're not on the list") {
var md2 = new MessageDialog("You're not on the list");
await md2.ShowAsync();
}
Help please, it reads in names perfectly and allows them to take the test, I just need it to remove the correct line. Thanks in advance!!

The important thing to consider is that you are modifying a file. So whatever you choose to change then you need to write it back to the file.
In your case you are opting to read the whole file into memory, this actually works in your favor for something like this as you can just remove any unwanted lines and write back to the file. However, you cannot remove an item while you are iterating through the list using a foreach loop.
The best practice for removing items from an array you are looping is to use a for loop and loop in reverse. It also makes it easier to remove items if we work with a List<string> too, like so:
var list = new List<string>(text);
// Iterate over the copy in reverse so that RemoveAt(i) does not shift unvisited items
for(int i = list.Count - 1; i >= 0; i--)
{
string line = list[i];
//rest of code
}
text = list.ToArray();
The next part of your task is to remove the line. You can do this in your else statement as this is the part that handles users already having taken the test. For example:
the_name.Text = "You have already taken the test";
list.RemoveAt(i);
Finally, after your loop you need to write the whole thing back to the file:
await Windows.Storage.FileIO.WriteLinesAsync(storageFile, text);

When you read the file, you could store the contents in a list. When your "something happens" you could remove the content at the appropriate index and save (overwrite) the list to the file.
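Both answers come down to the same steps: copy the lines into a list, remove the matching entry, write the list back. The removal step can be sketched on its own as below. The names StudentLines and RemoveStudent are hypothetical, and the "name|flag" line format is taken from the question.

```csharp
using System;
using System.Collections.Generic;

public static class StudentLines
{
    // Removes every "name|flag" line whose name part equals the given name.
    // Returns the number of lines removed.
    public static int RemoveStudent(List<string> lines, string name)
    {
        int removed = 0;

        // Walk backwards so RemoveAt does not shift lines not yet visited.
        for (int i = lines.Count - 1; i >= 0; i--)
        {
            string[] parts = lines[i].Split('|');
            if (string.Equals(parts[0], name, StringComparison.Ordinal))
            {
                lines.RemoveAt(i);
                removed++;
            }
        }

        return removed;
    }
}
```

In the app from the question, the list would be built with new List<string>(text) after ReadLinesAsync, and the modified list passed to FileIO.WriteLinesAsync(storageFile, list) afterwards.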

Replace tokens in an aspx page on load

I have an aspx page that contains regular HTML, some UI components, and multiple tokens of the form {tokenname}.
When the page loads, I want to parse the page content and replace these tokens with the correct content. The idea is that there will be multiple template pages using the same code-behind.
I have no trouble parsing the string data itself (see named string formatting, replace tokens in template); my trouble lies in when to read the data and how to write it back to the page.
What's the best way for me to rewrite the page content? I've been using a StreamReader and then replacing the page with Response.Write, but this is no good: a page containing other .NET components does not render correctly.
Any suggestions would be greatly appreciated!

Take a look at System.Web.UI.Adapters.PageAdapter method TransformText - generally it is used for multi device support, but you can postprocess your page with this.

I'm not sure if I'm answering your question, but...
If you can change your notation from
{tokenname}
to something like
<%$ ZeusExpression:tokenname %>
you could consider creating your own System.Web.Compilation.ExpressionBuilder.
After reading your comment...
There are other ways of getting access to the current page using ExpressionBuilder: just... create an expression. ;-)
Changing just a bit the sample from MSDN and supposing the code of your pages contain a method like this
public object GetData(string token);
you could implement something like this
public override CodeExpression GetCodeExpression(BoundPropertyEntry entry, object parsedData, ExpressionBuilderContext context)
{
Type type1 = entry.DeclaringType;
PropertyDescriptor descriptor1 = TypeDescriptor.GetProperties(type1)[entry.PropertyInfo.Name];
CodeExpression[] expressionArray1 = new CodeExpression[1];
expressionArray1[0] = new CodePrimitiveExpression(entry.Expression.Trim());
return new CodeCastExpression(
descriptor1.PropertyType,
new CodeMethodInvokeExpression(
new CodeThisReferenceExpression(),
"GetData",
expressionArray1));
}
This replaces your placeholder with a call like this
(string)this.GetData("tokenname");
Of course you can elaborate much more on this, perhaps using a "utility method" to simplify and "protect" access to data (access to properties, no special method involved, error handling, etc.).
Something that replaces instead with (e.g.)
(string)Utilities.GetData(this, "tokenname");
Hope this helps.

Many thanks to those that contributed to this question, however I ended up using a different solution -
Overriding the render function as per this page, except I parsed the page content for multiple different tags using regular expressions.
protected override void Render(HtmlTextWriter writer)
{
if (!Page.IsPostBack)
{
using (System.IO.MemoryStream stream = new System.IO.MemoryStream())
{
using (System.IO.StreamWriter streamWriter = new System.IO.StreamWriter(stream))
{
HtmlTextWriter htmlWriter = new HtmlTextWriter(streamWriter);
base.Render(htmlWriter);
htmlWriter.Flush();
stream.Position = 0;
using (System.IO.StreamReader oReader = new System.IO.StreamReader(stream))
{
string pageContent = oReader.ReadToEnd();
pageContent = ParseTagsFromPage(pageContent);
writer.Write(pageContent);
oReader.Close();
}
}
}
}
else
{
base.Render(writer);
}
}
Here's the regex tag parser
private string ParseTagsFromPage(string pageContent)
{
string regexPattern = "{zeus:(.*?)}"; //matches {zeus:anytagname}
string tagName = "";
string fieldName = "";
string replacement = "";
MatchCollection tagMatches = Regex.Matches(pageContent, regexPattern);
foreach (Match match in tagMatches)
{
tagName = match.ToString();
fieldName = tagName.Replace("{zeus:", "").Replace("}", "");
//get data based on my found field name, using some other function call
replacement = GetFieldValue(fieldName);
pageContent = pageContent.Replace(tagName, replacement);
}
return pageContent;
}
Seems to work quite well, as within the GetFieldValue function you can use your field name in any way you wish.
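A possible variation on the parser above, not the author's method: Regex.Replace accepts a match evaluator, so every tag can be substituted in a single pass over the content. In this sketch the names TokenReplacer and Replace are hypothetical, and the lookup function is passed in as a delegate standing in for GetFieldValue.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenReplacer
{
    // Matches {zeus:anytagname}; group 1 captures the tag name.
    private static readonly Regex TokenPattern =
        new Regex(@"\{zeus:(.*?)\}", RegexOptions.Compiled);

    // getFieldValue stands in for the GetFieldValue lookup mentioned above.
    public static string Replace(string pageContent, Func<string, string> getFieldValue)
    {
        return TokenPattern.Replace(
            pageContent,
            match => getFieldValue(match.Groups[1].Value));
    }
}
```

Because the content is scanned once, a replacement value that itself contains a {zeus:...} tag is not expanded again, which differs from repeated string.Replace calls on the whole page.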
