Need to perform Wildcard (*,?, etc) search on a string using Regex - c#

I need to perform Wildcard (*, ?, etc.) search on a string.
This is what I have done:
string input = "Message";
string pattern = "d*";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
if (regex.IsMatch(input))
{
MessageBox.Show("Found");
}
else
{
MessageBox.Show("Not Found");
}
With the above code the "Found" block is hit, but it should not be!
"Found" should only be hit if my pattern is "e*".
My requirement is that a d* search should find text containing "d" followed by any characters.
Should I change my patterns to "d.*" and "e.*"? Is there any built-in wildcard support in .NET that does this internally when using the Regex class?

From http://www.codeproject.com/KB/recipes/wildcardtoregex.aspx:
public static string WildcardToRegex(string pattern)
{
return "^" + Regex.Escape(pattern)
.Replace(@"\*", ".*")
.Replace(@"\?", ".")
+ "$";
}
So something like foo*.xls? will get transformed to ^foo.*\.xls.$.
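To see the conversion in action, here is a small self-contained sketch. The class name WildcardDemo and the sample file names are only for illustration; the conversion itself is the one quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Escape everything, then turn the escaped wildcards back into
    // their regex equivalents and anchor both ends.
    public static string WildcardToRegex(string pattern)
    {
        return "^" + Regex.Escape(pattern)
                          .Replace(@"\*", ".*")
                          .Replace(@"\?", ".")
                   + "$";
    }

    public static void Main()
    {
        string regex = WildcardToRegex("foo*.xls?");
        Console.WriteLine(regex);                                // ^foo.*\.xls.$
        Console.WriteLine(Regex.IsMatch("foobar.xlsx", regex));  // True
        Console.WriteLine(Regex.IsMatch("foobar.xls", regex));   // False: ? needs exactly one char

        // The case from the question: "Message" does not start with d.
        Console.WriteLine(Regex.IsMatch("Message", WildcardToRegex("d*"),
                                        RegexOptions.IgnoreCase)); // False
    }
}
```

Because of the ^ and $ anchors the whole input has to match, which is what you usually want for file-name style wildcards.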

You can do a simple wildcard match without RegEx using a Visual Basic function called LikeString.
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
if (Operators.LikeString("This is just a test", "*just*", CompareMethod.Text))
{
Console.WriteLine("This matched!");
}
If you use CompareMethod.Text it will compare case-insensitive. For case-sensitive comparison, you can use CompareMethod.Binary.
More info here: http://www.henrikbrinch.dk/Blog/2012/02/14/Wildcard-matching-in-C
MSDN: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.compilerservices.operators.likestring%28v=vs.100%29.ASPX

The correct regular expression formulation of the glob expression d* is ^d, which means match anything that starts with d.
string input = "Message";
string pattern = @"^d";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
(The @ quoting is not necessary in this case, but it is good practice since many regexes use backslash escapes that need to be left alone, and it also indicates to the reader that this string is special.)

Windows and *nix treat wildcards differently. Windows processes *, ? and . in a very complex way: the presence or position of one can change the meaning of another. *nix keeps it simple and does just one plain pattern match. Besides that, Windows matches ? against 0 or 1 characters, while Linux matches it against exactly 1 character.
I didn't find authoritative documentation on this, so what follows is just my conclusion based on days of tests on Windows 8/XP (command line, the dir command to be specific; the Directory.GetFiles method uses the same rules) and Ubuntu Server 12.04.1 (ls command). I made dozens of common and uncommon cases work, although there are many failing cases too.
The current answer by Gabe works like *nix. If you also want a Windows-style matcher and are willing to accept its imperfections, here it is:
/// <summary>
/// <para>Tests if a file name matches the given wildcard pattern, uses the same rule as shell commands.</para>
/// </summary>
/// <param name="fileName">The file name to test, without folder.</param>
/// <param name="pattern">A wildcard pattern which can use char * to match any amount of characters; or char ? to match one character.</param>
/// <param name="unixStyle">If true, use the *nix style wildcard rules; otherwise use windows style rules.</param>
/// <returns>true if the file name matches the pattern, false otherwise.</returns>
public static bool MatchesWildcard(this string fileName, string pattern, bool unixStyle)
{
if (fileName == null)
throw new ArgumentNullException("fileName");
if (pattern == null)
throw new ArgumentNullException("pattern");
if (unixStyle)
return WildcardMatchesUnixStyle(pattern, fileName);
return WildcardMatchesWindowsStyle(fileName, pattern);
}
private static bool WildcardMatchesWindowsStyle(string fileName, string pattern)
{
var dotdot = pattern.IndexOf("..", StringComparison.Ordinal);
if (dotdot >= 0)
{
for (var i = dotdot; i < pattern.Length; i++)
if (pattern[i] != '.')
return false;
}
var normalized = Regex.Replace(pattern, @"\.+$", "");
var endsWithDot = normalized.Length != pattern.Length;
var endWeight = 0;
if (endsWithDot)
{
var lastNonWildcard = normalized.Length - 1;
for (; lastNonWildcard >= 0; lastNonWildcard--)
{
var c = normalized[lastNonWildcard];
if (c == '*')
endWeight += short.MaxValue;
else if (c == '?')
endWeight += 1;
else
break;
}
if (endWeight > 0)
normalized = normalized.Substring(0, lastNonWildcard + 1);
}
var endsWithWildcardDot = endWeight > 0;
var endsWithDotWildcardDot = endsWithWildcardDot && normalized.EndsWith(".");
if (endsWithDotWildcardDot)
normalized = normalized.Substring(0, normalized.Length - 1);
normalized = Regex.Replace(normalized, @"(?!^)(\.\*)+$", @".*");
var escaped = Regex.Escape(normalized);
string head, tail;
if (endsWithDotWildcardDot)
{
head = "^" + escaped;
tail = @"(\.[^.]{0," + endWeight + "})?$";
}
else if (endsWithWildcardDot)
{
head = "^" + escaped;
tail = "[^.]{0," + endWeight + "}$";
}
else
{
head = "^" + escaped;
tail = "$";
}
if (head.EndsWith(@"\.\*") && head.Length > 5)
{
head = head.Substring(0, head.Length - 4);
tail = @"(\..*)?" + tail;
}
var regex = head.Replace(@"\*", ".*").Replace(@"\?", "[^.]?") + tail;
return Regex.IsMatch(fileName, regex, RegexOptions.IgnoreCase);
}
private static bool WildcardMatchesUnixStyle(string pattern, string text)
{
var regex = "^" + Regex.Escape(pattern)
.Replace("\\*", ".*")
.Replace("\\?", ".")
+ "$";
return Regex.IsMatch(text, regex);
}
There's a funny thing, even the Windows API PathMatchSpec does not agree with FindFirstFile. Just try a1*., FindFirstFile says it matches a1, PathMatchSpec says not.

d* means that it should match zero or more "d" characters. So any string is a valid match. Try d+ instead!
In order to have support for wildcard patterns I would replace the wildcards with the RegEx equivalents. Like * becomes .* and ? becomes .?. Then your expression above becomes d.*
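The difference between these patterns is easy to check directly. A minimal sketch (the class name StarSemantics and the sample word "Media" are just for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class StarSemantics
{
    public static void Main()
    {
        // In a regex, d* means "zero or more d", so it matches the empty
        // string at position 0 of any input.
        Console.WriteLine(Regex.IsMatch("Message", "d*"));   // True

        // d+ requires at least one d somewhere in the input.
        Console.WriteLine(Regex.IsMatch("Message", "d+"));   // False

        // d.* means "a d followed by anything".
        Console.WriteLine(Regex.IsMatch("Message", "d.*"));  // False
        Console.WriteLine(Regex.IsMatch("Media", "d.*"));    // True
    }
}
```

Note that without a ^ anchor, d.* still matches a d anywhere in the text, not only at the start.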

You need to convert your wildcard expression to a regular expression. For example:
private bool WildcardMatch(String s, String wildcard, bool case_sensitive)
{
// Replace the * with an .* and the ? with a dot. Put ^ at the
// beginning and a $ at the end
String pattern = "^" + Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
// Now, run the Regex as you already know
Regex regex;
if(case_sensitive)
regex = new Regex(pattern);
else
regex = new Regex(pattern, RegexOptions.IgnoreCase);
return(regex.IsMatch(s));
}

You must escape special Regex symbols in the input wildcard pattern (for example, the pattern *.txt is equivalent to ^.*\.txt$).
So slashes, braces and many other special symbols must be replaced with @"\" + s, where s is the special Regex symbol.
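In .NET you do not have to do that replacement by hand, since Regex.Escape does it for you. A short sketch of why the escaping matters (the class name EscapeDemo and the sample strings are only illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    public static void Main()
    {
        // Regex.Escape puts a backslash in front of regex metacharacters.
        Console.WriteLine(Regex.Escape("*.txt"));  // \*\.txt

        // Without escaping, the dot in "a.b" matches any character.
        Console.WriteLine(Regex.IsMatch("axb", "^a.b$"));                         // True
        Console.WriteLine(Regex.IsMatch("axb", "^" + Regex.Escape("a.b") + "$")); // False

        // The converted wildcard *.txt from the example above.
        Console.WriteLine(Regex.IsMatch("notes.txt", @"^.*\.txt$"));  // True
        Console.WriteLine(Regex.IsMatch("notes_txt", @"^.*\.txt$"));  // False
    }
}
```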

I think @Dmitri has a nice solution at
Matching strings with wildcard https://stackoverflow.com/a/30300521/1726296
Based on his solution, I have created two extension methods (credit goes to him).
They may be helpful.
public static String WildCardToRegular(this String value)
{
return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}
public static bool WildCardMatch(this String value,string pattern,bool ignoreCase = true)
{
if (ignoreCase)
return Regex.IsMatch(value, WildCardToRegular(pattern), RegexOptions.IgnoreCase);
return Regex.IsMatch(value, WildCardToRegular(pattern));
}
Usage:
string pattern = "file.*";
var isMatched = "file.doc".WildCardMatch(pattern);
or
string xlsxFile = "file.xlsx";
var isMatched = xlsxFile.WildCardMatch(pattern);

None of the code above is entirely correct.
When searching for zz*foo* or zz* you will not get correct results.
And if you search for "abcd*" in Total Commander, it will find a file named abcd, so all the code above is wrong.
Here is the correct code:
public string WildcardToRegex(string pattern)
{
string result= Regex.Escape(pattern).
Replace(@"\*", ".+?").
Replace(@"\?", ".");
if (result.EndsWith(".+?"))
{
result = result.Remove(result.Length - 3, 3);
result += ".*";
}
return result;
}

You may want to use WildcardPattern from System.Management.Automation assembly. See my answer here.

The most accepted answer works fine for most cases and can be used in most scenarios:
"^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
However, if you allow escaping in your input wildcard pattern, e.g. "find \*", meaning you want to search for the string "find *" with a literal asterisk, it won't work. The already escaped * will be escaped to "\\\\\\*" and after replacing we have "^value\\ with\\\\.*$", which is wrong.
The following code (which for sure can be optimized and rewritten) handles that special case:
public static string WildcardToRegex(string wildcard)
{
var sb = new StringBuilder();
for (var i = 0; i < wildcard.Length; i++)
{
// If wildcard has an escaped \* or \?, preserve it like it is in the Regex expression
var character = wildcard[i];
if (character == '\\' && i < wildcard.Length - 1)
{
if (wildcard[i + 1] == '*')
{
sb.Append("\\*");
i++;
continue;
}
if (wildcard[i + 1] == '?')
{
sb.Append("\\?");
i++;
continue;
}
}
switch (character)
{
// If it's unescaped * or ?, change it to Regex equivalents. Add more wildcard characters (like []) if you need to support them.
case '*':
sb.Append(".*");
break;
case '?':
sb.Append('.');
break;
default:
//// Escape all other symbols because wildcard could contain Regex special symbols like '.'
sb.Append(Regex.Escape(character.ToString()));
break;
}
}
return $"^{sb}$";
}
A solution to this problem that uses only Regex substitutions is proposed here: https://stackoverflow.com/a/15275806/1105564

Related

Replace and remove a string via regex

This is the first time I am working with regex.
The string below
var value = "abc ltd as yes";
needs to be changed to
var value = "abc Limited";
I have the following code:
public static string Attempt_Prefix_Removal( string prefix, string replacement, bool remove = false)
{
if (remove == true)
{
var yesy = $"(?<!prefix )" + prefix + ".*";
var test = Regex.Replace(prefix.ToLower(), $"(?<!prefix )" + prefix + ".*", replacement );
}
var output = (remove == true) ? Regex.Replace(prefix.ToLower(), $"(?<!prefix )" + prefix + ".*", replacement) : Regex.Replace(prefix.ToLower(), $"(?<!prefix )" + prefix + "", replacement);
return output;
}
The values passed to the method are
prefix = "ltd", replacement = "Limited", remove = true
After running the code the result is
abc Limited as yes
What do I need to change to get rid of "as yes"?
Thanks
You may leverage this code:
var prefix ="ltd"; var replacement = "Limited";
var pat = $@"(?s)(?<!\w){Regex.Escape(prefix)}(?!\w){(remove ? ".*" : string.Empty)}";
return Regex.Replace(val, pat, replacement.Replace("$", "$$"));
The main points here are:
(?s) - allows . to match a newline (in case you use .* in the pattern)
(?<!\w){Regex.Escape(prefix)}(?!\w) - the (?<!\w) negative lookbehind fails the match if the current location is preceded by a word char, and the (?!\w) negative lookahead fails it if the prefix is followed by one (you may further tweak these patterns as per your requirements)
{(remove ? ".*" : string.Empty)} - this either appends .* (if remove is true) or nothing. The conditional must be parenthesized, because a bare : inside an interpolation hole starts a format specifier.
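Put together as a complete method, the idea could look like the sketch below. The method name ReplacePrefix, the class name and the parameter name val are assumptions for illustration; the pattern is built by concatenation here simply to keep the pieces visible.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixReplacer
{
    // Replaces the whole-word occurrence of prefix with replacement.
    // If remove is true, everything after the prefix is dropped as well.
    public static string ReplacePrefix(string val, string prefix, string replacement, bool remove)
    {
        string tail = remove ? ".*" : string.Empty;
        string pat = @"(?s)(?<!\w)" + Regex.Escape(prefix) + @"(?!\w)" + tail;

        // Double any $ so the replacement text is taken literally.
        return Regex.Replace(val, pat, replacement.Replace("$", "$$"));
    }

    public static void Main()
    {
        Console.WriteLine(ReplacePrefix("abc ltd as yes", "ltd", "Limited", true));   // abc Limited
        Console.WriteLine(ReplacePrefix("abc ltd as yes", "ltd", "Limited", false));  // abc Limited as yes
        Console.WriteLine(ReplacePrefix("abcltd x", "ltd", "Limited", true));         // abcltd x
    }
}
```

The last call leaves the input unchanged because "ltd" is preceded by a word character there.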
private string regexOp(string sentence, string word, string wordtoReplace, bool isRemove)
{
var retValue = sentence;
if (isRemove)
{
var Pattern = "^.*?(?=" + word + ")";
Match result = Regex.Match(sentence, @Pattern);
if (!string.IsNullOrEmpty(result.Value))
retValue = result + wordtoReplace;
}
else
retValue = Regex.Replace(sentence, word, wordtoReplace);
return retValue;
}
try this method, this will work as you expected with dynamic,
do not forget to mark it as answer if this really helped you,

regex match partial or whole word

I am trying to figure out a regular expression which can match either the whole word or a (predefined in length, e.g first 4 chars) part of the word.
For example, if I am trying to match the word between and my offset is set to 4, then
between betwee betwe betw
are matches, but not the
bet betweenx bet12 betw123 beta
I have created an example in regex101, where I am trying (with no luck) a combination of positive lookahead (?=) and a non-word boundary \B.
I found a similar question which proposes a workaround in its accepted answer. As I understand it, it overrides the matcher somehow to run all the possible regular expressions, based on the word and an offset.
My code has to be written in C#, so I am trying to convert the aforementioned code. As I see Regex.Replace (and I assume Regex.Match also) can accept delegates to override the default functionality, but I can not make it work.
You could take the first 4 characters, and make the remaining ones optional.
Then wrap these in word boundaries and parenthesis.
So in the case of "between", it would be
@"\b(betw)(?:(e|ee|een)?)\b"
The code to achieve that would be:
public string createRegex(string word, int count)
{
var mandatory = word.Substring(0, count);
// One alternative per possible continuation after the mandatory part: e, ee, een
var optional = "(" + String.Join("|", Enumerable.Range(1, word.Length - count).Select(i => word.Substring(count, i))) + ")?";
var regex = @"\b(" + mandatory + ")(?:" + optional + @")\b";
return regex;
}
The code in the answer you mentioned simply builds up this:
betw|betwe|betwee|between
So all you need is to write a function, to build up a string with a substrings of given word given minimum length.
static String BuildRegex(String word, int min_len)
{
String toReturn = "";
for(int i = 0; i < word.Length - min_len +1; i++)
{
toReturn += word.Substring(0, min_len+i);
toReturn += "|";
}
toReturn = toReturn.Substring(0, toReturn.Length-1);
return toReturn;
}
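For completeness, here is a sketch of the same alternation idea wrapped in word boundaries and checked against the words from the question. The names PartialWord and BuildPattern are made up for this example, and the Regex.Escape call is an extra precaution that the code above does not have.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PartialWord
{
    // Builds \b(?:betw|betwe|betwee|between)\b for ("between", 4).
    public static string BuildPattern(string word, int minLen)
    {
        var prefixes = Enumerable.Range(minLen, word.Length - minLen + 1)
                                 .Select(len => Regex.Escape(word.Substring(0, len)));
        return @"\b(?:" + string.Join("|", prefixes) + @")\b";
    }

    public static void Main()
    {
        string pattern = BuildPattern("between", 4);
        Console.WriteLine(pattern);

        foreach (var w in new[] { "between", "betwee", "betwe", "betw",
                                  "bet", "betweenx", "bet12", "betw123", "beta" })
        {
            Console.WriteLine(w + ": " + Regex.IsMatch(w, pattern));
        }
    }
}
```

The first four words print True and the rest print False, because the trailing \b rejects any candidate that continues with another word character.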
You can use this regex
\b(bet(?:[^\s]){1,4})\b
And replace bet and the 4 dynamically like this:
public static string CreateRegex(string word, int minLen)
{
string token = word.Substring(0, minLen - 1);
string pattern = @"\b(" + token + @"(?:[^\s]){1," + minLen + @"})\b";
return pattern;
}
Here's a demo: https://regex101.com/r/lH0oL2/1
EDIT: as for the bet1122 match, you can edit the pattern this way:
\b(bet(?:[^\s0-9]){1,4})\b
If you don't want to match some chars, just add them to the negated [^...] character class.
Demo: https://regex101.com/r/lH0oL2/2
For more info, see http://www.regular-expressions.info/charclass.html

Replace regular expression with regular expression

Consider two regular expressions:
var regex_A = @"Main\.(.+)\.Value";
var regex_B = "M_(.+)_Sp";
I want to be able to replace a string using regex_A as input, and regex_B as the replacement string. But also the other way around. And without supplying additional information like a format string per regex.
Specifically I want to create a replaced_B string from an input_A string. So:
var input_A = "Main.Rotating.Value";
var replaced_B = input_A.RegEx_Awesome_Replace(regex_A, regex_B);
Assert.AreEqual("M_Rotating_Sp", replaced_B);
And this should also work in reverse (that's the reason I can't use a simple string.Format for regex_B), because I don't want to supply a format string for every regular expression (I'm lazy).
var input_B = "M_Skew_Sp";
var replaced_A = input_B.RegEx_Awesome_Replace(regex_B, regex_A);
Assert.AreEqual("Main.Skew.Value", replaced_A);
I have no clue if this exists, or how to call it. Google search finds me all kinds of other regex replaces... not this one.
Update:
So basically I need a way to convert a regular expression to a format string.
var regex_A_format = Regex2Format(regex_A);
Assert.AreEqual("Main.$1.Value", regex_A_format);
and
var regex_B_format = Regex2Format(regex_B);
Assert.AreEqual("M_$1_Sp", regex_B_format);
So what should the RegEx_Awesome_Replace and/or Regex2Format function look like?
Update 2:
I guess the RegEx_Awesome_Replace should look something like (using some code from answers below):
public static class StringExtenstions
{
public static string RegExAwesomeReplace(this string inputString,string searchPattern,string replacePattern)
{
return Regex.Replace(inputString, searchPattern, Regex2Format(replacePattern));
}
}
Which would leave the Regex2Format as an open question.
There is no defined way for one regex to refer to a match found in another regex. Regexes are not format strings.
What you can do is to use Tuples of a format string together with its regex. e.g.
var a = new Tuple<Regex,string>(new Regex(@"(?<=Main\.).+(?=\.Value)"), @"Main.{0}.Value");
var b = new Tuple<Regex,string>(new Regex(@"(?<=M_).+(?=_Sp)"), @"M_{0}_Sp");
Then you can pass these objects to a common replacement method in any order, like this:
private string RegEx_Awesome_Replace(string input, Tuple<Regex,string> toFind, Tuple<Regex,string> replaceWith)
{
return string.Format(replaceWith.Item2, toFind.Item1.Match(input).Value);
}
You will notice that I have used zero-width positive lookahead assertion and zero-width positive lookbehind assertions in my regexes, to ensure that Value contains exactly the text that I want to replace.
You may also want to add error handling, for cases where the match can not be found. Maybe read about Regex.Match
Since you have already reduced your problem to where you need to change a Regex into a string format (implementing Regex2Format) I will focus my answer just on that part. Note that my answer is incomplete because it doesn't address the full breadth of parsing regex capturing groups, however it works for simple cases.
First thing needed is a Regex that will match Regex capture groups. There is a negative lookbehind to not match escaped bracket symbols. There are other cases that break this regex. E.g. a non-capturing group, wildcard symbols, things between square braces.
private static readonly Regex CaptureGroupMatcher = new Regex(@"(?<!\\)\([^\)]+\)");
The implementation of Regex2Format here basically writes everything outside of capture groups into the output string, and replaces the capture group value by {x}.
static string Regex2Format(string pattern)
{
var targetBuilder = new StringBuilder();
int previousEndIndex = 0;
int formatIndex = 0;
foreach (Match match in CaptureGroupMatcher.Matches(pattern))
{
var group = match.Groups[0];
int endIndex = group.Index;
AppendPart(pattern, previousEndIndex, endIndex, targetBuilder);
targetBuilder.Append('{');
targetBuilder.Append(formatIndex++);
targetBuilder.Append('}');
previousEndIndex = group.Index + group.Length;
}
AppendPart(pattern, previousEndIndex, pattern.Length, targetBuilder);
return targetBuilder.ToString();
}
This helper function writes pattern string values into the output, it currently writes everything except \ characters used to escape something.
static void AppendPart(string pattern, int previousEndIndex, int endIndex, StringBuilder targetBuilder)
{
for (int i = previousEndIndex; i < endIndex; i++)
{
char c = pattern[i];
if (c == '\\' && i < pattern.Length - 1 && pattern[i + 1] != '\\')
{
//backslash not followed by another backslash - it's an escape char
}
else
{
targetBuilder.Append(c);
}
}
}
Test cases
static void Test()
{
var cases = new Dictionary<string, string>
{
{ @"Main\.(.+)\.Value", @"Main.{0}.Value" },
{ @"M_(.+)_Sp(.*)", "M_{0}_Sp{1}" },
{ @"M_\(.+)_Sp", @"M_(.+)_Sp" },
};
foreach (var kvp in cases)
{
if (Regex2Format(kvp.Key) != kvp.Value)
{
Console.WriteLine("Test failed for {0} - expected {1} but got {2}", kvp.Key, kvp.Value, Regex2Format(kvp.Key));
}
}
}
To wrap up, here is the usage:
private static string AwesomeRegexReplace(string input, string sourcePattern, string targetPattern)
{
var targetFormat = Regex2Format(targetPattern);
return Regex.Replace(input, sourcePattern, match =>
{
var args = match.Groups.OfType<Group>().Skip(1).Select(g => g.Value).ToArray<object>();
return string.Format(targetFormat, args);
});
}
Something like this might work
var replaced_B = Regex.Replace(input_A, @"Main\.(.+)\.Value", @"M_$1_Sp");
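If you accept writing the replacement template by hand for each side, the round trip from the question can be sketched as below. The class and method names (RoundTrip, Convert) are hypothetical; ${1} is used instead of $1 only to make the group reference unambiguous next to the underscore.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RoundTrip
{
    // Applies a search pattern and a hand-written substitution template.
    public static string Convert(string input, string sourcePattern, string targetTemplate)
    {
        return Regex.Replace(input, sourcePattern, targetTemplate);
    }

    public static void Main()
    {
        const string regexA = @"^Main\.(.+)\.Value$";
        const string regexB = @"^M_(.+)_Sp$";
        const string templateA = "Main.${1}.Value";
        const string templateB = "M_${1}_Sp";

        Console.WriteLine(Convert("Main.Rotating.Value", regexA, templateB));  // M_Rotating_Sp
        Console.WriteLine(Convert("M_Skew_Sp", regexB, templateA));            // Main.Skew.Value
    }
}
```

This does not answer the Regex2Format part of the question, since each template is still supplied separately.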
Are you looking for something like this?
public static class StringExtenstions
{
public static string RegExAwesomeReplace(this string inputString,string searchPattern,string replacePattern)
{
Match searchMatch = Regex.Match(inputString,searchPattern);
Match replaceMatch = Regex.Match(inputString, replacePattern);
if (!searchMatch.Success || !replaceMatch.Success)
{
return inputString;
}
return inputString.Replace(searchMatch.Value, replaceMatch.Value);
}
}
The string extension method returns the string with replaced value for search pattern and replace pattern.
This is how you call:
input_A.RegExAwesomeReplace(regex_A, regex_B);

Extra delimiter in MAC Address reformat

I've looked at several questions on here about formatting and validating MAC addresses, which is where I developed my regex from. The problem I'm having is that when I go to update the field, either there are extra delimiters in the newly formatted MAC or, if no delimiter exists, the MAC fails to validate. I'm new to using regex, so can someone clarify why this is happening?
if (checkMac(NewMacAddress.Text) == true)
{
string formattedMAC = NewMacAddress.Text;
formattedMAC.Replace(" ", "").Replace(":", "").Replace("-", ""); //attempt to remove the delimiters before formatting
var regex = "(.{2})(.{2})(.{2})(.{2})(.{2})(.{2})";
var replace = "$1:$2:$3:$4:$5:$6";
var newformat = Regex.Replace(formattedMAC, regex, replace);
NewMacAddress.Text = newformat.ToString();
}
Here is the checkmac function
protected bool checkMac(string macaddress)
{
macaddress.Replace(" ", "").Replace(":", "").Replace("-", "");
Regex r = new Regex("^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$");
if (r.Match(macaddress).Success)
{
return true;
}
else
{
return false;
}
}
This is sample output for the extra delimiter that I was talking about. 00::5:0::56::b:f:00:7f
I was able to get the original MAC from a textbox. This also occurs with the MAC address I get from screen scrapes.
The reason your code is not working as intended is because:
String.Replace does not modify the string you pass in, but returns a new string instead (strings are immutable). You have to assign the result of String.Replace to a variable.
Your checkMac function only allows mac addresses with delimiters. You can simply remove this restriction to resolve your problems.
The working code then becomes something along the lines of:
string newMacAddress = "00::5:0::56::b:f:00:7f";
if (checkMac(newMacAddress) == true)
{
string formattedMAC = newMacAddress;
formattedMAC = formattedMAC.Replace(" ", "").Replace(":", "").Replace("-", ""); //attempt to remove the delimiters before formatting
var regex = "(.{2})(.{2})(.{2})(.{2})(.{2})(.{2})";
var replace = "$1:$2:$3:$4:$5:$6";
var newformat = Regex.Replace(formattedMAC, regex, replace);
newMacAddress = newformat.ToString();
}
protected static bool checkMac(string macaddress)
{
macaddress = macaddress.Replace(" ", "").Replace(":", "").Replace("-", "");
Regex r = new Regex("^([0-9A-Fa-f]{12})$");
if (r.Match(macaddress).Success)
{
return true;
}
else
{
return false;
}
}
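A more compact variant of the same idea, as a sketch only: strip the delimiters, validate the 12 remaining hex digits, then insert the colons. The names MacFormatter and TryNormalizeMac are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MacFormatter
{
    // Returns true and sets formatted to aa:bb:cc:dd:ee:ff form when the
    // input contains exactly 12 hex digits once delimiters are removed.
    public static bool TryNormalizeMac(string input, out string formatted)
    {
        // Strings are immutable, so the stripped result must be assigned.
        string hex = Regex.Replace(input, "[ :-]", "");

        if (!Regex.IsMatch(hex, @"\A[0-9A-Fa-f]{12}\z"))
        {
            formatted = string.Empty;
            return false;
        }

        // Put a colon after every pair except the last one.
        formatted = Regex.Replace(hex, "(..)(?!$)", "$1:");
        return true;
    }

    public static void Main()
    {
        string mac;
        Console.WriteLine(TryNormalizeMac("00::5:0::56::b:f:00:7f", out mac) + " " + mac); // True 00:50:56:bf:00:7f
        Console.WriteLine(TryNormalizeMac("005056BF007F", out mac) + " " + mac);           // True 00:50:56:BF:00:7F
        Console.WriteLine(TryNormalizeMac("00:50:56", out mac));                           // False
    }
}
```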
You're close. I'm first going to answer using Ruby because that's what I'm most familiar with at the moment, and it should be sufficient for you to understand how to get it working in C#. Maybe I can convert it to C# later.
using these elements:
\A - start of entire string
[0-9a-fA-F] - any hex digit
{2} - twice
[:-]? - either ":" or "-" or "" (no delimiter)
\Z - end of entire string, before ending newline if it exists
() - parenthetical match in order to reference parts of the regex, e.g. match[1]
This regex will do what you need:
mac_address_regex = /\A([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})\Z/
You can both validate and sanitize input using this regex:
match = mac_address_regex.match(new_mac_address.text)
if match.present?
sanitized_mac_addr = (1..6).map { |i| match[i] }.join(":") # join match[i] for i = (1,2,3,4,5,6)
sanitized_mac_addr.upcase! # uppercase
else
sanitized_mac_addr = nil
end

Regular Expression to Match Exact Word - Search String Highlight

I'm using the following 2 methods to highlight the search keywords. It is working fine but fetching partial words also.
For Example:
Text: "This is .net Programming"
Search Key Word: "is"
It is highlighting partial word from this and "is"
Please let me know the correct regular expression to highlight the correct match.
private string HighlightSearchKeyWords(string searchKeyWord, string text)
{
Regex exp = new Regex(@", ?");
searchKeyWord = "(\b" + exp.Replace(searchKeyWord, @"|") + "\b)";
exp = new Regex(searchKeyWord, RegexOptions.Singleline | RegexOptions.IgnoreCase);
return exp.Replace(text, new MatchEvaluator(MatchEval));
}
private string MatchEval(Match match)
{
if (match.Groups[1].Success)
{
return "<span class='search-highlight'>" + match.ToString() + "</span>";
}
return ""; //no match
}
You really just need @ before your "(\b" and "\b)", because in a regular string literal "\b" is a backspace character, not the regex word boundary \b you would expect. But I have also tried making another version with a replacement pattern instead of a full-blown method.
How about this one:
private string keywordPattern(string searchKeyword)
{
var keywords = searchKeyword.Split(',').Select(k => k.Trim()).Where(k => k != "").Select(k => Regex.Escape(k));
return @"\b(" + string.Join("|", keywords) + @")\b";
}
private string HighlightSearchKeyWords(string searchKeyword, string text)
{
var pattern = keywordPattern(searchKeyword);
Regex exp = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
return exp.Replace(text, @"<span class=""search-highlight"">$0</span>");
}
Usage:
var res = HighlightSearchKeyWords("is,this", "Is this programming? This is .net Programming.");
Result:
<span class="search-highlight">Is</span> <span class="search-highlight">this</span> programming? <span class="search-highlight">This</span> <span class="search-highlight">is</span> .net Programming.
Updated to use \b and a simplified replace pattern. (The old one used (^|\s) instead of the first \b and ($|\s) instead of the last \b, so it would also work on search terms that include more than just word characters.)
Updated to your comma notation for search terms.
Updated: forgot Regex.Escape, added now. Otherwise searches for "\w" would blow up the thing :)
Updated due to a comment ;)
Try this fixed line:
searchKeyWord = @"(\b" + exp.Replace(searchKeyWord, @"|") + @"\b)";
You need to enclose the keywords in a non-capturing group, otherwise you will get false positives (if you are using multiple keywords separated by commas as indicated in the sample)!
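The false positive comes from how alternation binds: without an inner group, \bis|this\b means "\bis" OR "this\b", so each alternative only keeps one of the two boundaries. A quick sketch (the class name and sample strings are only for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupingDemo
{
    public static void Main()
    {
        string loose = @"\bis|this\b";        // boundary applies to one side of each alternative
        string grouped = @"\b(?:is|this)\b";  // boundaries apply to every keyword

        // "island" starts with "is", so the loose pattern matches it.
        Console.WriteLine(Regex.IsMatch("island", loose));    // True
        Console.WriteLine(Regex.IsMatch("island", grouped));  // False

        // Whole-word keywords still match with the grouped pattern.
        Console.WriteLine(Regex.IsMatch("This is .net Programming", grouped,
                                        RegexOptions.IgnoreCase)); // True
    }
}
```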
private string EscapeKeyWords(string searchKeyWord)
{
string[] keyWords = searchKeyWord.Split(',');
for (int i = 0; i < keyWords.Length; i++) keyWords[i] = Regex.Escape(keyWords[i].Trim());
return String.Join("|", keyWords);
}
private string HighlightSearchKeyWords(string searchKeyWord, string text)
{
searchKeyWord = @"(\b(?:" + EscapeKeyWords(searchKeyWord) + @")\b)";
Regex exp = new Regex(searchKeyWord, RegexOptions.Singleline | RegexOptions.IgnoreCase);
return exp.Replace(text, @"<span class=""search-highlight"">$0</span>");
}
