Dynamically Built Regular Expressions are running extremely slowly! - C#

I'm generating regular expressions dynamically by running through some XML structure and building up the pattern as I move through its node types. I'm using this regular expression as part of a Layout type that I defined. I then parse through a text file that has an Id at the beginning of each line. This Id points me to a specific layout, and I then try to match the data in that row against its regex.
Sounds fine and dandy, right? The only problem is that it matches strings extremely slowly. I have the expressions set as compiled to try and speed things up a bit, but to no avail. What is baffling is that these expressions aren't that complex. I am by no means a regex guru, but I know a decent amount about them to get things going well.
Here is the code that generates the expressions...
StringBuilder sb = new StringBuilder();
// get layout id and memberkey in there...
sb.Append(@"^([0-9]+)[ \t]{1,2}([0-9]+)");
foreach (ColumnDef c in columns)
{
    sb.Append(@"[ \t]{1,2}");
    switch (c.Variable.PrimType)
    {
        case PrimitiveType.BIT:
            sb.Append("(0|1)");
            break;
        case PrimitiveType.DATE:
            sb.Append(@"([0-9]{2}/[0-9]{2}/[0-9]{4})");
            break;
        case PrimitiveType.FLOAT:
            sb.Append(@"([-+]?[0-9]*\.?[0-9]+)");
            break;
        case PrimitiveType.INTEGER:
            sb.Append(@"([0-9]+)");
            break;
        case PrimitiveType.STRING:
            sb.Append(@"([a-zA-Z0-9]*)");
            break;
    }
}
sb.Append("$");
_pattern = new Regex(sb.ToString(), RegexOptions.Compiled);
The actual slow part...
public System.Text.RegularExpressions.Match Match(string input)
{
    if (input == null)
        throw new ArgumentNullException("input");
    return _pattern.Match(input);
}
A typical "_pattern" may have about 40-50 columns; I'll spare you the entire pattern. I group each case so that I can enumerate over the captures in the Match object later on.
Any tips or modifications that could drastically help? Or is this slowness to be expected?
EDIT FOR CLARITY: Sorry, I don't think I was clear enough the first time around.
I use an XML file to generate regexes for a specific layout. I then run through a file for a data import. I need to make sure that each line in the file matches the pattern it says it's supposed to. So, patterns could be checked against multiple times, possibly thousands.

You are parsing a 50 column CSV file (that uses tabs) with regex?
You should just remove duplicate tabs, then split the text on \t. Now you have all of your columns in an array. You can use your ColumnDef object collection to tell you what each column is.
Edit: Once you have things split up, you could optionally use regex to verify each value; this should be much faster than using the giant single regex.
Edit2: You also get the additional benefit of knowing exactly which column(s) are badly formatted, and you can produce an error like "Syntax error in column 30 on line 12, expected date format."
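A minimal sketch of that approach, reusing the question's ColumnDef/PrimitiveType types (the per-type checks here are illustrative, not the asker's exact patterns):
string[] fields = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
// fields[0] is the layout id and fields[1] the member key; the rest map to columns.
// Note: empty STRING fields would need special handling before RemoveEmptyEntries.
for (int i = 0; i < columns.Count; i++)
{
    string value = fields[i + 2];
    bool ok;
    switch (columns[i].Variable.PrimType)
    {
        case PrimitiveType.BIT:     ok = value == "0" || value == "1"; break;
        case PrimitiveType.INTEGER: ok = Regex.IsMatch(value, @"^[0-9]+$"); break;
        case PrimitiveType.DATE:    ok = Regex.IsMatch(value, @"^[0-9]{2}/[0-9]{2}/[0-9]{4}$"); break;
        case PrimitiveType.FLOAT:   ok = Regex.IsMatch(value, @"^[-+]?[0-9]*\.?[0-9]+$"); break;
        case PrimitiveType.STRING:  ok = Regex.IsMatch(value, @"^[a-zA-Z0-9]*$"); break;
        default:                    ok = false; break;
    }
    if (!ok)
        throw new FormatException("Syntax error in column " + i + ": '" + value + "'");
}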

Some performance thoughts:
use [01] instead of (0|1)
use non-capturing groups (?:expr) instead of capturing groups (if you really need grouping)
Edit: As your values seem to be separated by whitespace, why don't you just split on that?
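Applied to the builder from the question, those two tips would look like this (illustrative; keep capturing groups only where you actually enumerate the captures later):
case PrimitiveType.BIT:
    sb.Append("[01]");                             // character class instead of (0|1)
    break;
case PrimitiveType.DATE:
    sb.Append(@"(?:[0-9]{2}/[0-9]{2}/[0-9]{4})");  // non-capturing group
    break;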

Regular expressions are expensive to create and even more expensive if you compile them. So the problem is that you are creating many regular expressions but using each one only once.
You should cache them for reuse, and really don't compile them unless you use them very often. I have never measured it, but I could imagine that you would have to use a simple regular expression well over 100 times to outweigh the cost of the compilation.
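A minimal caching sketch (illustrative names): build each pattern's Regex once and reuse it, compiling only if the same pattern really will be matched thousands of times.
private static readonly Dictionary<string, Regex> _cache =
    new Dictionary<string, Regex>();

private static Regex GetRegex(string pattern)
{
    Regex regex;
    if (!_cache.TryGetValue(pattern, out regex))
    {
        regex = new Regex(pattern, RegexOptions.Compiled);
        _cache[pattern] = regex;
    }
    return regex;
}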
Performance test
Regex: "^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+(?:[a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)$"
Input: "www.stackoverflow.com"
Results in milliseconds per iteration
one regex, compiled, 10,000 iterations: 0.0018 ms
one regex, not compiled, 10,000 iterations: 0.0021 ms
one regex per iteration, not compiled, 10,000 iterations: 0.0287 ms
one regex per iteration, compiled, 10,000 iterations: 4.8144 ms
Note that even after 10,000 iterations the compiled and uncompiled regex are still very close in performance. With an increasing number of iterations, the compiled regex performs better.
one regex, compiled, 1,000,000 iterations: 0.00137 ms
one regex, not compiled, 1,000,000 iterations: 0.00225 ms
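The answer doesn't show its timing harness, but figures like these are typically collected along these lines (a sketch using System.Diagnostics.Stopwatch, with the pattern and input above):
// Sketch: measure ms per iteration for the "one regex, compiled" case.
var regex = new Regex(pattern, RegexOptions.Compiled);   // pattern as above
const int iterations = 10000;
var sw = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
    regex.IsMatch("www.stackoverflow.com");
sw.Stop();
Console.WriteLine("{0} ms per iteration", sw.Elapsed.TotalMilliseconds / iterations);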

Well, building the pattern using a StringBuilder will save a few cycles compared to concatenating strings, but any drastic (visibly measurable) optimization is most likely going to come from a different approach altogether.
Regular expressions are slow... powerful, but slow. Parsing through a text file and then comparing each line against regular expressions just to retrieve the right bits of data is not going to be very quick (depending on the host computer and the size of the text file).
Perhaps storing the data in some format other than a large text file would make it more efficient to parse (use XML for that as well?), or perhaps a comma-separated list.

A single expression with potentially 50 match groups is by nature going to be a bit slow. I would do a few things to try to pin down the slowdown.
Start by comparing a hard-coded pattern against the dynamically built one and see whether either has a performance benefit.
Look at your requirements and see if there is any way you can reduce the number of groupings that you need to evaluate.
Use a profiler tool if needed, such as ANTS Profiler, to see the location of the slowdown.

I would just build a lexer by hand.
In this case it looks like you have a bunch of fields separated by tabs, with each record terminated by a newline. The XML file appears to describe the sequence of columns and their types.
Writing code to recognize each case by hand is probably 5-10 lines of code in the worst case.
You would then simply generate an array of PrimitiveType values from the XML file and call the "GetValues" function below.
This should allow you to make a single pass through the input stream, which should give a big boost over using regexes.
You'll need to supply the "ScanXYZ" methods yourself. They should be easy to write, and it's best to implement them without using regexes.
public IEnumerable<object[]> GetValues(TextReader reader, PrimitiveType[] schema)
{
    while (reader.Peek() >= 0)
    {
        var values = new object[schema.Length];
        for (int i = 0; i < schema.Length; ++i)
        {
            switch (schema[i])
            {
                case PrimitiveType.BIT:
                    values[i] = ScanBit(reader);
                    break;
                case PrimitiveType.DATE:
                    values[i] = ScanDate(reader);
                    break;
                case PrimitiveType.FLOAT:
                    values[i] = ScanFloat(reader);
                    break;
                case PrimitiveType.INTEGER:
                    values[i] = ScanInt(reader);
                    break;
                case PrimitiveType.STRING:
                    values[i] = ScanString(reader);
                    break;
            }
        }
        EatTabs(reader);
        if (reader.Peek() == '\n')
        {
            reader.Read();    // consume the record terminator
        }
        else if (reader.Peek() >= 0)
        {
            throw new Exception("Extra junk detected!");
        }
        yield return values;
    }
}
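For illustration, two of the hand-written scanners the answer leaves to the reader might look like this (a sketch; error handling omitted):
static void EatTabs(TextReader reader)
{
    // Skip over the tab/space separators between fields.
    while (reader.Peek() == '\t' || reader.Peek() == ' ')
        reader.Read();
}

static int ScanInt(TextReader reader)
{
    EatTabs(reader);
    var sb = new StringBuilder();
    // Consume digits greedily, without any regex.
    while (reader.Peek() >= '0' && reader.Peek() <= '9')
        sb.Append((char)reader.Read());
    return int.Parse(sb.ToString());
}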

Regex ignore a pattern

I am trying to figure out a viable way to go about parsing this CSV file. Currently I am using FileHelpers, which is great, but with this CSV file it seems to be having issues.
Each record in the CSV file is contained in quotes and delimited by a comma.
The records have commas within them, and 1 record out of the 90,000 records I'm dealing with has one single " that mucks up the ReadLine.
The record looks like this "24" Blah ",
So I'm looking to write a regex to insert into the BeforeReadRecord that will go through and replace all instances of " with a space.
I'm newer to regex but I'm not finding any way to exclude three cases.
Case one: each line starts with a "
Case two: each line ends with a "
Case three: each field is separated by ","
I am trying to figure out how I could exclude those three cases and be left to just replace any straggler " .
So far I've been failing miserably and am not even sure if there is a way to accomplish this. Perhaps someone knows of a better csv parser that handles this one odd case as well?
EDIT: Well here's what I ended up with. It takes a little time to process (it also just changes any outlier " to ', which is fine since the data that contains quotes isn't needed for any queries), but I'm looking for any pitfalls I may be falling into that would make it faster. It seemed to be the quickest solution so far (took about 7 seconds for 92,000 records), but there doesn't seem to be any way around checking every line. My previous solution was a nasty nested if that seemed to add 30 seconds or so over the course of processing the records. It accounts for all scenarios except for where someone decides to put a random ", at the end of a field... hoping I don't run into a record like that, but it wouldn't surprise me.
// in its own method:
engine.BeforeReadRecord += (sender, args) =>
    args.RecordLine = checkQuote(args.RecordLine);
var records = engine.ReadFile(reportFilePath);

private static string checkQuote(string checkString)
{
    if (checkString.Substring(0, 1) == @"""")
    {
        string removeQuote = @"""" + checkString.Replace(@"""", "'")
                                                .Replace(@"','", @""",""")
                                                .Remove(checkString.Length - 1, 1)
                                                .Remove(0, 1) + @"""";
        return removeQuote;
    }
    else
        return checkString;
}
File format readers typically don't handle malformed input well. Why should they? If you give a CSV reader bad data, I would expect it to barf. I've rarely had good luck with computer software that makes assumptions about what I meant.
Do you really need a regular expression? If you define a straggler as the last quote character when the total number of quotes is odd, then it's trivial to handle: count the quotes and, if the count is odd, remove the last one.
For example:
var quoteCount = inputString.Count(c => c == '\"');
if ((quoteCount % 2) == 1)
{
    inputString = inputString.Remove(inputString.LastIndexOf('\"'), 1);
}
Done and done.
You could also do it in a single pass with a loop, but that's probably overkill. I strongly suspect that sanitizing the input is not a major bottleneck in your program.
For more complex patterns (i.e. you're looking for "," or for a quote at the start and end of the line), you just write a simple state machine. It's probably a dozen lines of code.
I realize that you might be able to do this with regular expressions. I find regex great for finding stuff and doing simple replacements. For more complicated rules like "replace quote with space unless the quote is at the beginning or end of line or next to a comma", I find it hard to come up with a good expression. For example, what about this case:
"first name","last name","","phone"
You have to take that blank field (i.e. "") into account. You also have to take into account spaces between fields (i.e. "first" , "last" , ""), and a whole host of other things. I'm reasonably sure that regex can do it. My experience has been that I can usually write the simple state machine and prove that it's correct faster than I can puzzle out the required regex. And it's certain that I'll more easily understand the state machine six months later.
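A rough single-pass sketch of that "simple state machine" idea (names are illustrative; it ignores the spaces-between-fields case deliberately): keep a quote if it opens or closes a field, otherwise replace it with a space.
static string FixStrayQuotes(string line)
{
    var sb = new StringBuilder(line.Length);
    for (int i = 0; i < line.Length; i++)
    {
        if (line[i] != '"')
        {
            sb.Append(line[i]);
            continue;
        }
        // A quote is legitimate at the start/end of the line or next to a comma.
        bool opensField = i == 0 || line[i - 1] == ',';
        bool closesField = i == line.Length - 1 || line[i + 1] == ',';
        sb.Append(opensField || closesField ? '"' : ' ');
    }
    return sb.ToString();
}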

What's the best way to optimise my regex matching?

I've got an app with a textbox in it. The user enters text into this box.
I have the following function triggered in an OnKeyUp event in that textbox
private void bxItemText_KeyUp(object sender, System.Windows.Input.KeyEventArgs e)
{
    // rules is an array of regex strings
    foreach (string rule in rules)
    {
        Regex regex = new Regex(rule);
        if (regex.IsMatch(text.Trim().ToLower()))
        {
            // matchedRule is a variable
            matchedRule = rule;
            break;
        }
    }
}
I've got about 12 strings in rules, although this can vary.
Once the length of the text in my textbox gets bigger than 80 characters, performance starts to degrade. Typing a letter after 100 characters takes a second to show up.
How do I optimise this? Should I match on every 3rd KeyUp ? Should I abandon KeyUp altogether and just auto match every couple of seconds?
How do I optimise this? Should I match on every 3rd KeyUp ? Should I abandon KeyUp altogether and just auto match every couple of seconds?
I would go with the second option, that is abandon KeyUp and trigger the validation every couple of seconds, or better yet trigger the validation when the TextBox loses focus.
On the other hand, I would suggest caching the regular expressions beforehand and compiling them, because it seems like you are using them over and over again. In other words, instead of storing the rules as strings in that array, store them as compiled Regex objects when they are added or loaded.
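A sketch of that suggestion, building the compiled Regex objects once rather than inside the KeyUp handler (text and matchedRule are assumed from the question; IgnoreCase stands in for the ToLower call):
private Regex[] compiledRules;

private void LoadRules(string[] rules)
{
    // Compile each rule once, when the rules are added or loaded.
    compiledRules = rules
        .Select(r => new Regex(r, RegexOptions.Compiled | RegexOptions.IgnoreCase))
        .ToArray();
}

private void bxItemText_KeyUp(object sender, System.Windows.Input.KeyEventArgs e)
{
    string input = text.Trim();
    foreach (Regex regex in compiledRules)
    {
        if (regex.IsMatch(input))
        {
            matchedRule = regex.ToString();
            break;
        }
    }
}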
Use static method calls instead of creating a new object each time; static calls use a caching feature: Optimizing Regular Expression Performance, Part I: Working with the Regex Class and Regex Objects.
That will be a major improvement in performance. Then you can post your regexes (rules) to see whether further optimization can be done within them.
Other resources :
Optimizing Regular Expression Performance, Part II: Taking Charge of Backtracking
Optimizing Regex Performance, Part 3
Combining the patterns into one regex via alternation will work faster than a foreach over separate patterns in code.
Combining two Regex to one
If you need pattern determination for each new symbol, and you care about performance, then a finite state machine (FSM) seems to be the best option...
That is the much harder way: for each symbol you specify the list of symbols that are allowed to follow it.
On each KeyUp you just walk to the next state, if possible, and at any point you know how many patterns the input text currently matches.
Some useful references that I could find (a toy FSM sketch follows them):
FSM example
Guy explaining how to convert Regex to FSM
Regex - FSM converting discussion
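For illustration, a toy transition-table sketch of that idea (hypothetical; it recognizes just "cat|car|cab" to keep it short):
class TinyFsm
{
    // transitions[state] maps an allowed input character to the next state;
    // reaching state 3 means one of the patterns has matched.
    private readonly Dictionary<int, Dictionary<char, int>> transitions =
        new Dictionary<int, Dictionary<char, int>>
        {
            { 0, new Dictionary<char, int> { { 'c', 1 } } },
            { 1, new Dictionary<char, int> { { 'a', 2 } } },
            { 2, new Dictionary<char, int> { { 't', 3 }, { 'r', 3 }, { 'b', 3 } } },
        };
    private int state = 0;

    // Call on each KeyUp with the new symbol; false means no pattern can match any more.
    public bool Advance(char c)
    {
        Dictionary<char, int> next;
        if (!transitions.TryGetValue(state, out next) || !next.ContainsKey(c))
            return false;
        state = next[c];
        return true;
    }

    public bool Matched { get { return state == 3; } }
}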
You don't need to create a new Regex object each time. Also, using the static call will cache the pattern if it has been used before (since .NET 2). Here is how I would rewrite it:
matchedRule = rules.FirstOrDefault(rule => Regex.IsMatch(text.Trim().ToLower(), rule));
Given that you seem to be matching keywords, can you just perform the match on the current portion of text that's been edited (i.e. in the vicinity of the cursor)? Might be tricky to engineer, especially for operations like paste or undo, but scope for a big performance gain.
Pre-compile your regexes (using RegexOptions.Compiled). Also, can you make the Trim and ToLower redundant by expanding your regex (e.g. RegexOptions.IgnoreCase instead of ToLower)? You're running Trim and ToLower once for each rule, which is inefficient even if you can't eliminate them altogether.
You can try and make your rules mutually exclusive - this should speed things up. I did a short test: matching against the following
"cat|car|cab|box|balloon|button"
can be sped up by writing it like this
"ca(t|r|b)|b(ox|alloon|utton)"

Parsing A String - Is There A More Efficient Method than Checking Each Line?

I am working on a project to parse out a text file. The file is output from networking equipment. The incoming string is anywhere from a few thousand to tens of thousands of lines long. There will be a variable number of entries with keywords like these:
fcN/N is up
Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN)
Port WWN is 20:52:00:0d:ec:ef:b0:40
Admin port mode is F, trunk mode is on
snmp link state traps are enabled
Port vsan is 10
fcipN is up
.....
port-channel-N is trunking
......
The N is a number. There will always be the 'fcN/N' entries; there may or may not be the other two. The 'fcip' and 'port-channel' entries will have similar status information after each one as the fcN/N entries. All entries of the same type will be grouped - there won't be an fc followed by an fcip followed by another fc. Also, as a general rule, all the fc entries are listed, then all the port-channel, then all the fcip, but I don't want to assume that.
At the moment I have about 7 different RegEx patterns I am looking for. I do this by examining each line in turn, but managing all those is cumbersome.
I thought about splitting the string on newline and then some kind of LINQ select to get all of each of the 3 types of entries, but that assumes they are always grouped in the same order. I also thought about 3 monster regexes to match everything from one entry to the next, but my experience is those are tough to get working and almost unreadable. Another thing I thought of was to first match the three keywords - fc, port-channel or fcip - then have an if statement that matches the patterns unique to each. That still matches each line against all 3 patterns, though.
To be clear, I have the Regex patterns working. I am looking for a more efficient way to do this than testing each line for 6 or 8 matches.
Any other ideas?
I have two thoughts:
(1) Your last approach of using if statements to first find the right regex to apply is likely to be quite efficient. I'd recommend it.
(2) You can compose regex's like this:
var pattern1 = @"abc";
var pattern2 = @"def";
var unionPattern = "((" + pattern1 + ")|(" + pattern2 + "))";
This makes it much more readable.
If you never want to find a match that spans lines you should split the file into lines first. That will improve efficiency because the regexes have smaller inputs and will backtrack less.
If your matches span multiple lines but they always start after a newline, you can split the string into chunks first, like this:
var chunks = Regex.Split(str, @"((fc\d)|(fcip\d)|(port-channel-\d))");
You might get clearer and more concise code by using a parser combinator library, such as Sprache.
Not being a C# programmer, I'm not intimately familiar with this library (and there may well be others for C# as well), but I've used Scala parser combinators to good effect, and they build on and use regular expression parsing.
Whether it makes your code more efficient likely depends on how inefficient your code is now.
Are you looking for raw speed, or efficiency? If the former, you can split the file into parts and have a thread parsing each part simultaneously. The trick will be finding a boundary to split on (so that each part contains only whole entries) quickly. You will also only want to go multithreaded if the total number of lines is large, or the overhead will outweigh the parallelization gains.

C# Code/Algorithm to Search Text for Terms

We have 5mb of typical text (just plain words). We have 1000 words/phrases to use as terms to search for in this text.
What's the most efficient way to do this in .NET (ideally C#)?
Our ideas include regexes (a single one, lots of them) plus even the String.Contains stuff.
The input is a 2mb to 5mb text string - all text. Multiple hits are good, as in for each term (of the 1000) that matches, we do want to know about it. Performance matters in terms of entire time to execute; we don't care about footprint. The current algorithm takes about 60+ seconds using naive string.Contains. We don't want 'cat' to provide a match with 'category' or even 'cats' (i.e. the entire term word must hit, no stemming).
We expect a <5% hit ratio in the text. The results would ideally just be the terms that matched (don't need position or frequency just yet). We get a new 2-5mb string every 10 seconds, so we can't assume we can index the input. The 1000 terms are dynamic, although they have a change rate of about 1 change an hour.
A naive string.Contains with 762 words (the final page) of War and Peace (3.13MB) runs in about 10s for me. Switching to 1000 GUIDs runs in about 5.5 secs.
Regex.IsMatch found the 762 words (much of which were probably in earlier pages as well) in about .5 seconds, and ruled out the GUIDs in 2.5 seconds.
I'd suggest your problem lies elsewhere...Or you just need some decent hardware.
Why reinvent the wheel? Why not just leverage something like Lucene.NET?
Have you considered the following:
Do you care about substrings? Let's say I am looking for the word "cat", nothing more and nothing less. Now consider the Knuth-Morris-Pratt algorithm, or string.Contains, for "concatenate". Both of these will return true (or an index). Is this OK?
Also you will have to look into the idea of the stemmed or "finite" state of the word. Let's look for "diary" where the test sentence is "there are many kinds of diaries". To you and me, the word "diaries" counts, but if so we will need to preprocess the sentence, converting the words to their stemmed form (diaries -> diary), so the sentence becomes "there are many kind of diary". Now we can say that "diary" is in the sentence (see the Porter stemmer algorithm).
Also, when it comes to processing text (aka Natural Language Processing) you can remove some words as noise. Take for example "a, have, you, I, me, some, to" <- these could be considered useless words and can be removed before any processing takes place. For example:
"I have written some C# today": if I have 10,000 keywords to look for, I would have to scan the entire sentence 10,000 x the number of words in the sentence. Removing noise beforehand will shorten the processing time.
"written C# today" <- noise removed; now there is a lot less to look through.
A great article on NLP can be found here: Sentence comparing
HTH
Bones
A modified Suffix tree would be very fast, though it would take up a lot of memory and I don't know how fast it would be to build it. After that however every search would take O(1).
Here's another idea: Make a class something like this:
class Word
{
    string Text;
    List<int> Positions;
}
For every unique word in your text you create an instance of this class. The Positions list stores the positions (counted in words, not characters) from the start of the text where the word was found.
Then make another two lists which will serve as indexes. One will store all these classes sorted by their texts, the other - by their positions in the text. In essence, the text index would probably be a SortedDictionary, while the position index would be a simple List<Word>.
Then to search for a phrase, you split that phrase into words. Look up the first word in the Dictionary (that's O(log(n))). From there you know what are the possible words that follow it in the text (you have them from the Positions array). Look at those words (use the position index to find them in O(1)) and go on, until you've found one or more full matches.
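A compact sketch of that index and lookup (names are illustrative; the answer's SortedDictionary refinement is omitted for brevity):
// words[i] is the i-th word of the text; byText maps each distinct word
// to every position where it occurs.
static Dictionary<string, List<int>> BuildIndex(string[] words)
{
    var byText = new Dictionary<string, List<int>>();
    for (int i = 0; i < words.Length; i++)
    {
        List<int> positions;
        if (!byText.TryGetValue(words[i], out positions))
            byText[words[i]] = positions = new List<int>();
        positions.Add(i);
    }
    return byText;
}

// Look up the first word of the phrase, then verify the words that follow.
static bool ContainsPhrase(string[] words, Dictionary<string, List<int>> byText, string[] phrase)
{
    List<int> starts;
    if (!byText.TryGetValue(phrase[0], out starts))
        return false;
    foreach (int start in starts)
    {
        bool match = true;
        for (int j = 1; j < phrase.Length && match; j++)
            match = start + j < words.Length && words[start + j] == phrase[j];
        if (match)
            return true;
    }
    return false;
}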
Are you trying to achieve a list of matched words, or are you trying to highlight them in the text by getting the start and length of each match? If all you're trying to do is find out whether the words exist, you could use subset theory to perform this fairly efficiently.
However, I expect you're trying to find each match's start position in the text... in which case this approach wouldn't work.
The most efficient approach I can think is to dynamically build a match pattern using a list and then use regex. It's far easier to maintain a list of 1000 items than it is to maintain a regex pattern based on those same 1000 items.
It is my understanding that Regex uses the same KMP algorithm suggested to efficiently process large amounts of data - so unless you really need to dig through and understand the minutiae of how it works (which might be beneficial for personal growth), then perhaps regex would be ok.
There's quite an interesting paper on search algorithms for many patterns in large files here: http://webglimpse.net/pubs/TR94-17.pdf
Is this a bottleneck? How long does it take? 5 MiB isn't actually a lot of data to search in. Regular expressions might do just fine, especially if you encode all the search strings into one pattern using alternations. This basically amortizes the overall cost of the search to O(n + m) where n is the length of your text and m is the length of all patterns, combined. Notice that this is a very good performance.
An alternative that's well suited for many patterns is the Wu Manber algorithm. I've already posted a very simplistic C++ implementation of the algorithm.
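A sketch of the "encode all the search strings into one pattern" suggestion (terms and bigText are assumed inputs; Regex.Escape guards any metacharacters in the terms, and \b enforces the whole-word requirement):
string pattern = @"\b(?:" + string.Join("|", terms.Select(Regex.Escape).ToArray()) + @")\b";
var matcher = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
// Every distinct term that actually occurs in the text:
var found = matcher.Matches(bigText).Cast<Match>()
                   .Select(m => m.Value.ToLowerInvariant())
                   .Distinct()
                   .ToList();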
Ok, current rework shows this as fastest (pseudocode):
foreach (var term in allTerms)
{
    string pattern = term.ToWord();   // wraps the term in \b word boundaries
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
    if (regex.IsMatch(bigTextToSearchForTerms))
    {
        result.Add(term);
    }
}
What was surprising (to me at least!) is that running the regex 1000 times was faster than a single regex with 1000 alternatives, i.e. "\bterm1\b|\bterm2\b|\btermN\b", and then trying to use regex.Matches.Count.
How does this perform in comparison? It uses LINQ, so it may be a little slower, not sure...
List<string> allTerms = new List<string> { "string1", "string2", "string3", "string4" };
List<string> matches = allTerms.Where(item => Regex.IsMatch(bigTextToSearchForTerms, item, RegexOptions.IgnoreCase)).ToList();
This uses classic predicates to implement the List.FindAll method, so it should be quicker than LINQ:
static bool Match(string checkItem)
{
    return Regex.IsMatch(bigTextToSearchForTerms, checkItem, RegexOptions.IgnoreCase);
}

static void Main(string[] args)
{
    List<string> allTerms = new List<string> { "string1", "string2", "string3", "string4" };
    List<string> matches = allTerms.FindAll(Match);
}
Or this uses the lambda syntax to implement the classic predicate, which again should be faster than the LINQ, but is more readable than the previous syntax:
List<string> allTerms = new List<string> { "string1", "string2", "string3", "string4" };
List<string> matches = allTerms.FindAll(checkItem => Regex.IsMatch(bigTextToSearchForTerms, checkItem, RegexOptions.IgnoreCase));
I haven't tested any of them for performance, but they all implement your idea of iteration through the search list using the regex. It's just different methods of implementing it.

Best way to replace tokens in a large text template

I have a large text template which needs tokenized sections replaced by other text. The tokens look something like this: ##USERNAME##. My first instinct is just to use String.Replace(), but is there a better, more efficient way or is Replace() already optimized for this?
System.Text.RegularExpressions.Regex.Replace() is what you seek - IF your tokens are odd enough that you need a regex to find them.
Some kind soul did some performance testing, and between Regex.Replace(), String.Replace(), and StringBuilder.Replace(), String.Replace() actually came out on top.
The only situation in which I've had to do this is sending a templated e-mail. In .NET this is provided out of the box by the MailDefinition class. So this is how you create a templated message:
MailDefinition md = new MailDefinition();
md.BodyFileName = pathToTemplate;
md.From = "test@somedomain.com";
ListDictionary replacements = new ListDictionary();
replacements.Add("<%To%>", someValue);
// continue adding replacements
MailMessage msg = md.CreateMailMessage("test@someotherdomain.com", replacements, this);
After this, msg.Body would be created by substituting the values in the template. I guess you can take a look at MailDefinition.CreateMailMessage() with Reflector :). Sorry for being a little off-topic, but if this is your scenario I think it's the easiest way.
Well, depending on how many variables you have in your template, how many templates you have, etc. this might be a work for a full template processor. The only one I've ever used for .NET is NVelocity, but I'm sure there must be scores of others out there, most of them linked to some web framework or another.
string.Replace is fine. I'd prefer using a Regex, but I'm *** for regular expressions.
The thing to keep in mind is how big these templates are. If it's really big, and memory is an issue, you might want to create a custom tokenizer that acts on a stream. That way you only hold a small part of the file in memory while you manipulate it.
But for the naive implementation, string.Replace should be fine.
If you are doing multiple replaces on large strings then it might be better to use StringBuilder.Replace(), as the usual performance issues with strings will appear.
Regular expressions would be the quickest solution to code up but if you have many different tokens then it will get slower. If performance is not an issue then use this option.
A better approach would be to define a token, like your "##", that you can scan for in the text. Then select what to replace from a hash table, using the text that follows the token as the key.
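A sketch of that scan-and-lookup idea (the Expand name and the ##NAME## format are illustrative):
static string Expand(string template, Dictionary<string, string> values)
{
    var sb = new StringBuilder(template.Length);
    int pos = 0;
    while (true)
    {
        int start = template.IndexOf("##", pos);
        if (start < 0) break;
        int end = template.IndexOf("##", start + 2);
        if (end < 0) break;
        sb.Append(template, pos, start - pos);            // text before the token
        string key = template.Substring(start + 2, end - start - 2);
        string value;
        // Unknown keys are left in place verbatim.
        sb.Append(values.TryGetValue(key, out value)
            ? value
            : template.Substring(start, end - start + 2));
        pos = end + 2;
    }
    sb.Append(template, pos, template.Length - pos);      // trailing text
    return sb.ToString();
}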
If this is part of a build script, then NAnt has a great feature for doing this called Filter Chains. The code for that is open source, so you could look at how it's done for a fast implementation.
Had to do something similar recently. What I did was:
make a method that takes a dictionary (key = token name, value = the text you need to insert)
Get all matches to your token format (##.+?## in your case I guess, not that good at regular expressions :P) using Regex.Matches(input, regular expression)
foreach over the results, using the dictionary to find the insert value for your token.
return result.
Done ;-)
If you want to test your regexes I can suggest the regulator.
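For comparison, the steps above can be collapsed into a single Regex.Replace with a MatchEvaluator (tokens is the assumed dictionary, keyed by the full ##NAME## text):
// Replace each ##NAME## token with its dictionary value, leaving unknown tokens as-is.
string result = Regex.Replace(input, "##.+?##",
    m => tokens.ContainsKey(m.Value) ? tokens[m.Value] : m.Value);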
FastReplacer implements token replacement in O(n*log(n) + m) time and uses 3x the memory of the original string.
FastReplacer is good for executing many Replace operations on a large string when performance is important.
The main idea is to avoid modifying existing text or allocating new memory every time a string is replaced.
We have designed FastReplacer to help us on a project where we had to generate a large text with a large number of append and replace operations. The first version of the application took 20 seconds to generate the text using StringBuilder. The second improved version that used the String class took 10 seconds. Then we implemented FastReplacer and the duration dropped to 0.1 seconds.
If your template is large and you have lots of tokens, you probably don't want to walk it and replace the tokens in the template one by one, as that would result in an O(N * M) operation, where N is the size of the template and M is the number of tokens to replace.
The following method accepts a template and a dictionary of the key/value pairs you wish to replace. By initializing the StringBuilder to slightly larger than the size of the template, it should result in an O(N) operation (i.e. it shouldn't have to grow itself log N times).
Finally, you can move the building of the tokens into a Singleton as it only needs to be generated once.
static string SimpleTemplate(string template, Dictionary<string, string> replacements)
{
    // parse the message into an array of tokens
    Regex regex = new Regex("(##[^#]+##)");
    string[] tokens = regex.Split(template);

    // build the new message from the tokens
    var sb = new StringBuilder((int)((double)template.Length * 1.1));
    foreach (string token in tokens)
        sb.Append(replacements.ContainsKey(token) ? replacements[token] : token);

    return sb.ToString();
}
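For example, usage might look like this (illustrative values; note the dictionary keys include the ## markers, matching what Regex.Split captures):
var replacements = new Dictionary<string, string>
{
    { "##USERNAME##", "jdoe" },
    { "##DATE##", DateTime.Now.ToShortDateString() }
};
string body = SimpleTemplate("Hello ##USERNAME##, today is ##DATE##.", replacements);
// body == "Hello jdoe, today is <today's date>."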
This is an ideal use of Regular Expressions. Check out this helpful website, the .Net Regular Expressions class, and this very helpful book Mastering Regular Expressions.
