Regex replace all matched tokens with lowercase - c#

Given the following HTML snippet:
<th>Member name:</th>
<td>$$FULLNAME$$</td>
<th>Club:</th>
<td>$$ClubName$$</td>
<th>Business Category:</th>
<td>$$SubCategory$$</td>
I am trying to lowercase all the tokens using C#, e.g. $$FULLNAME$$ becomes $$fullname$$. The output should be:
<th>Member name:</th>
<td>$$fullname$$</td>
<th>Club:</th>
<td>$$clubname$$</td>
<th>Business Category:</th>
<td>$$subcategory$$</td>
I have come up with this, which does not work correctly because the \L is not converting the matches to lowercase:
public static string TokenReplacer(string value)
{
var pattern = Regex.Escape("$$") + "(.*?)" + Regex.Escape("$$");
var regex = new Regex(pattern);
return regex.Replace(value, Regex.Unescape("$$$$") + @"\L$1" + Regex.Unescape("$$$$"));
}

var output = Regex.Replace(input, @"\$\$.+?\$\$", m => m.Value.ToLower());
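As far as I know, .NET replacement strings have no \L case-conversion operator, which is why the one-liner above does the lowercasing in a match callback instead. A minimal sketch of the asker's method rewritten that way follows; the names TokenDemo and LowercaseTokens are made up for illustration, and ToLowerInvariant is my own choice here to avoid culture-specific casing.

```csharp
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // Matches $$...$$ lazily so that each token is handled on its own.
    private static readonly Regex TokenPattern = new Regex(@"\$\$.+?\$\$");

    // Hypothetical helper: lowercases every $$TOKEN$$ and leaves the rest alone.
    public static string LowercaseTokens(string value)
    {
        // The lambda is a MatchEvaluator; its return value replaces the match.
        return TokenPattern.Replace(value, m => m.Value.ToLowerInvariant());
    }
}
```

Calling TokenDemo.LowercaseTokens("<td>$$FULLNAME$$</td>") should give back "<td>$$fullname$$</td>".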

Related

Get values from a string based on a format

I am trying to get some individual values from a string based on a format. This format can change, so ideally I want to specify it using another string.
For example, let's say my input is 1. Line One - Part Two (Optional Third Part). I would want to specify the format to match as %number%. %first% - %second% (%third%) and then get these values as variables.
The only way I could think of doing this was with RegEx groups, and I have very nearly got the RegEx working.
var input = "1. Line One - Part Two (Optional Third Part)";
var formatString = "%number%. %first% - %second% (%third%)";
var expression = new Regex("(?<Number>[^.]+). (?<First>[^-]+) - (?<Second>[^\\(]+) ((?<Third>[^)]+))");
var match = expression.Match(input);
Console.WriteLine(match.Groups["Number"].ToString().Trim());
Console.WriteLine(match.Groups["First"].ToString().Trim());
Console.WriteLine(match.Groups["Second"].ToString().Trim());
Console.WriteLine(match.Groups["Third"].ToString().Trim());
This results in the following output, so all good apart from that opening bracket.
1
Line One
Part Two
(Optional Third Part
I'm now a bit lost as to how I could translate my format string into a regular expression. There are no rules on this format, but it would need to be fairly easy for a user.
Any advice is greatly appreciated. Or perhaps there is another way that does not involve Regex?
Your pattern includes a couple of special characters (such as . and the parentheses) without escaping them, so Regex does not match them literally.
Here's a corrected version of your code:
using System.Text.RegularExpressions;
var input = "1. Line One - Part Two (Optional Third Part)";
var pattern = string.Format(
"(?<Number>{0})\\. (?<First>{1}) - (?<Second>{2}) \\((?<Third>{3})\\)",
"[^\\.]+",
"[^\\-]+",
"[^\\(]+",
"[^\\)]+");
var match = Regex.Match(input, pattern);
Console.WriteLine(match.Groups["Number"]);
Console.WriteLine(match.Groups["First"]);
Console.WriteLine(match.Groups["Second"]);
Console.WriteLine(match.Groups["Third"]);
Sample output:
1
Line One
Part Two
Optional Third Part
If you want to keep your syntax, you can leverage the Regex.Escape method. I have also written some code that parses all parameters enclosed in %:
using System.Text.RegularExpressions;
var input = "1. Line One - Part Two (Optional Third Part)";
var formatString = "%number%. %first% - %second% (%third%)";
formatString = Regex.Escape(formatString);
var parameters = new List<string>();
formatString = Regex.Replace(formatString, "%([^%]+)%", match =>
{
var paramName = match.Groups[1].Value;
var groupPattern = "(?<" + paramName + ">{" + parameters.Count + "})";
parameters.Add(paramName);
return groupPattern;
});
var pattern = string.Format(
formatString,
"[^\\.]+",
"[^\\-]+",
"[^\\(]+",
"[^\\)]+");
var match = Regex.Match(input, pattern);
foreach (var paramName in parameters)
{
Console.WriteLine(match.Groups[paramName]);
}
Further notes
You need to adjust the part where you specify the pattern for each group. Currently it is not generic and does not take into account how many parameters there are.
So finally, taking it all into account and cleaning up the code a little, you can use a solution like this:
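using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FormatBasedCustomRegex
{
    // Turns a format such as "%number%. %first%" into a regex pattern with one
    // named group per %parameter%, filled with the given subpatterns in order.
    public static string GetPattern(this string formatString,
        string[] subpatterns,
        out string[] parameters)
    {
        // Escape regex metacharacters in the literal parts of the format.
        formatString = Regex.Escape(formatString);
        formatString = formatString.ReplaceParams(out var @params);
        // Each parameter needs exactly one subpattern.
        if (@params.Length != subpatterns.Length)
        {
            throw new InvalidOperationException();
        }
        parameters = @params;
        return string.Format(
            formatString,
            subpatterns);
    }

    // Replaces each %name% with "(?<name>{i})", where i is the parameter's index,
    // so that string.Format can later insert the matching subpattern.
    private static string ReplaceParams(
        this string formatString,
        out string[] parameters)
    {
        var @params = new List<string>();
        var outputPattern = Regex.Replace(formatString, "%([^%]+)%", match =>
        {
            var paramName = match.Groups[1].Value;
            var groupPattern = "(?<" + paramName + ">{" + @params.Count + "})";
            @params.Add(paramName);
            return groupPattern;
        });
        parameters = @params.ToArray();
        return outputPattern;
    }
}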
and the main method would look like:
var input = "1. Line One - Part Two (Optional Third Part)";
var pattern = "%number%. %first% - %second% (%third%)".GetPattern(
new[]
{
"[^\\.]+",
"[^\\-]+",
"[^\\(]+",
"[^\\)]+",
},
out var parameters);
var match = Regex.Match(input, pattern);
foreach (var paramName in parameters)
{
Console.WriteLine(match.Groups[paramName]);
}
But it's up to you how you define the particular methods and which signatures suit your code best :)
You may use this regex:
^(?<Number>[^.]+)\. (?<First>[^-]+) - (?<Second>[^(]+)(?: \((?<Third>[^)]+)\))?$
RegEx Details:
^: Start
(?<Number>[^.]+): Match and capture 1+ of any char that is not .
\. : Match ". "
(?<First>[^-]+): Match and capture 1+ of any char that is not -
 - : Match " - "
(?<Second>[^(]+): Match and capture 1+ of any char that is not (
(?:: Start a non-capture group
\(: Match space followed by (
(?<Third>[^)]+): Match and capture 1+ of any char that is not )
\): Match )
)?: End optional non-capture group
$: End
Your format contains special characters that are becoming part of the regular expression. You can use the Regex.Escape method to handle that. After that, you can just use a Regex.Replace with a delegate to transform the format into a regular expression:
var input = "1. Line One - Part Two (Optional Third Part)";
var fmt = "%number%. %first% - %second% (%third%)";
var templateRE = new Regex(@"%([a-z]+)%", RegexOptions.Compiled);
var pattern = templateRE.Replace(Regex.Escape(fmt), m => $"(?<{m.Groups[1].Value}>.+?)");
var ansRE = new Regex(pattern);
var ans = ansRE.Match(input);
Note: You may want to place ^ and $ at the beginning and end of the pattern respectively, to ensure the format must match the entire input string.
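Putting the last answer and its note together, here is a rough sketch of a helper that escapes the format, turns each %name% into a lazy named group, anchors the result with ^ and $, and returns the captured values. FormatParser and Parse are hypothetical names, and restricting parameter names to ASCII letters is an assumption on my part, so treat it as a starting point rather than a finished parser.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FormatParser
{
    // Hypothetical helper: returns parameter name -> captured value,
    // or null when the input does not fit the format.
    public static Dictionary<string, string> Parse(string format, string input)
    {
        var names = new List<string>();

        // Escape first so that ".", "(" and ")" in the format are matched literally;
        // "%" is not a regex metacharacter, so the placeholders survive escaping.
        var body = Regex.Replace(Regex.Escape(format), "%([A-Za-z]+)%", m =>
        {
            names.Add(m.Groups[1].Value);
            return "(?<" + m.Groups[1].Value + ">.+?)";
        });

        // Anchors force the format to cover the whole input string.
        var match = Regex.Match(input, "^" + body + "$");
        if (!match.Success) return null;

        var values = new Dictionary<string, string>();
        foreach (var name in names)
        {
            values[name] = match.Groups[name].Value;
        }
        return values;
    }
}
```

With the format and input from the question, Parse should return the four values under the keys number, first, second and third. Because every group is lazy (.+?), a format with two adjacent placeholders and no separator between them would split the text in a way the user may not expect.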

Text between 2 optional strings with OR condition using Regex

I have a string with 2 possibilities:
var desc = "Keyword1: That text I want \r\n Keyword2: Value2 \r\n Keyword3: Value3 \r\n Keyword4: Value4"
var desc = "Keyword1: That text I want Keyword2: Value2 \r\n Keyword3: Value3 \r\n Keyword4: Value4"
The keywords that follow the text "That text I want" (Keyword2, Keyword3, Keyword4) can appear in any order, and they are all optional.
I tried with the Regex Keyword1:(\s+)(.*)(\W+?)(\r\n?)(?=Keyword2:|Keyword3:|Keyword4:)
It does not work. Not sure what is wrong in my regex.
Any help is highly appreciated.
Thanks in advance!
See here for the solution.
In your case you could simply use (regex between two strings):
(?<=Keyword1:)(.*)(?=Keyword2)
Hope it helps.
Assuming those \r\n are actual special characters in the string and not the literals, this should work:
Keyword1: (.*?)(Keyword2:|Keyword3:|Keyword4:|\r\n)
You need to get the first capture group from the match, i.e. match.Groups[1] (match.Groups[0] is the whole match).
This regex matches Keyword1:, followed by the minimum number of characters necessary, and then followed by either one of the other keywords or \r\n (special characters). If those are literals in your input string, you will need to double those backslashes.
You can check it here. Note that on the right, Group 1 contains your text in both cases.
var pattern = keywordName + @":\s+(.+?)\r?\n";
var regex = new Regex(pattern);
var match = regex.Match(description);
if (!match.Success) return null;
var firstMatch = match.Groups[1].Value;
//Find if there's another keyword in the extracted Value
var lstKeywords = Enum.GetValues(typeof(Keywords)).Cast<Keywords>().Where(k => k != keywordName);
//Add : to the last value so that it's recognized as a keyword
var sOtherKeywords = string.Join(":|", lstKeywords) + ":";
var pattern2 = @"(" + sOtherKeywords + @")(\s+)";
regex = new Regex(pattern2);
match = regex.Match(firstMatch);
//If there's no other keyword in the same line then return the expression that is extracted from the first regex
if (!match.Success) return firstMatch;
var secondMatch = match.Groups[1].Value;
var pattern3 = keywordName + @":\s+(.+)(\r?\n?)" + secondMatch;
regex = new Regex(pattern3);
match = regex.Match(description);
return match.Success ? match.Groups[1].Value.TrimEnd() : null;
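The multi-step approach above can probably be collapsed into a single pattern: a lazy capture followed by a lookahead that stops at any other keyword, a line break, or the end of the input. The sketch below assumes the keywords are passed in as plain strings (the code above uses an enum), and the names KeywordExtractor and Extract are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeywordExtractor
{
    // Hypothetical helper: returns the text that follows "keyword:" up to the next
    // keyword, line break, or end of input, or null if the keyword is not found.
    public static string Extract(string description, string keyword, params string[] otherKeywords)
    {
        // Everything that may terminate the wanted text.
        var stops = new List<string>();
        foreach (var other in otherKeywords)
        {
            stops.Add(Regex.Escape(other) + ":");
        }
        stops.Add(@"\r?\n");
        stops.Add("$");

        // Lazy capture, optional trailing whitespace, then a lookahead for a stop.
        var pattern = Regex.Escape(keyword) + @":\s*(.*?)\s*(?=" + string.Join("|", stops) + ")";
        var match = Regex.Match(description, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

For both sample strings in the question, Extract(desc, "Keyword1", "Keyword2", "Keyword3", "Keyword4") should return "That text I want" without the trailing space.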

Regex from a html parsing, how do I grab a specific string?

I'm trying to specifically get the string after charactername= and before " >. How would I use regex to allow me to catch only the player name?
This is what I have so far, and it's not working: it doesn't print anything. client.DownloadString returns a string like this:
<a href="https://my.examplegame.com/charactername=Atro+Roter" >
So I know it actually gets the string; I'm just stuck on the regex.
using (var client = new WebClient())
{
//Example of what the string looks like on Console when I Console.WriteLine(html)
//<a href="https://my.examplegame.com/charactername=Atro+Roter" >
// I want the "Atro+Roter"
string html = client.DownloadString(worldDest + world + inOrderName);
string playerName = "https://my.examplegame.com/charactername=(.+?)\" >";
MatchCollection m1 = Regex.Matches(html, playerName);
foreach (Match m in m1)
{
Console.WriteLine(m.Groups[1].Value);
}
}
I'm trying to specifically get the string after charactername= and before " >. 
So, you just need a lookbehind with lookahead and use LINQ to get all the match values into a list:
var input = "your input string";
var rx = new Regex(@"(?<=charactername=)[^""]+(?="")");
var res = rx.Matches(input).Cast<Match>().Select(p => p.Value).ToList();
The res variable should hold all your character names now.
I assume your issue is trying to parse the URL. Don't - use what .NET gives you:
var playerName = "https://my.examplegame.com/?charactername=NAME_HERE";
var uri = new Uri(playerName);
var queryString = HttpUtility.ParseQueryString(uri.Query);
Console.WriteLine("Name is: " + queryString["charactername"]);
This is much easier to read and no doubt more performant.
Working sample here: https://dotnetfiddle.net/iJlBKW
Escape all forward slashes with backslashes, like this: \/
string input = @"<a href=""https://my.examplegame.com/charactername=Atro+Roter"" >";
string playerName = @"https:\/\/my.examplegame.com\/charactername=(.+?)""";
Match match = Regex.Match(input, playerName);
string result = match.Groups[1].Value;
Result = Atro+Roter

What regular expression is good for extracting URLs from HTML?

I have tried using my own and using the top ones here on StackOverflow, but most of them matched more than was desired.
For instance, some would extract http://foo.com/hello?world<br (note <br at end) from the input ...http://foo.com/hello?world<br>....
Is there a pattern that can match just the URL more reliably?
This is the current pattern I am using:
@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&^]*)"
The most secure regex is to not use a regex at all and use the System.Uri class.
System.Uri
Uri uri = new Uri("http://myUrl/%2E%2E/%2E%2E");
Console.WriteLine(uri.AbsoluteUri);
Console.WriteLine(uri.PathAndQuery);
Your regex needs an escape for the dash "-" in the last character group:
@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+\-=\\\.&^]*)"
Essentially, you were allowing characters from + through =, which includes <
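To see the effect in isolation, here is a small sketch that reduces the character class to just the three characters involved. CharClassRangeDemo and its two methods are illustrative names only.

```csharp
using System.Text.RegularExpressions;

public static class CharClassRangeDemo
{
    // Unescaped dash: [\+-=] is read as the range '+' (U+002B) through '=' (U+003D),
    // which also covers digits and the characters , - . / : ; <
    public static bool UnescapedMatches(char c)
    {
        return Regex.IsMatch(c.ToString(), @"^[\+-=]$");
    }

    // Escaped dash: [\+\-=] is exactly the three characters '+', '-' and '='.
    public static bool EscapedMatches(char c)
    {
        return Regex.IsMatch(c.ToString(), @"^[\+\-=]$");
    }
}
```

UnescapedMatches('<') is true while EscapedMatches('<') is false, which is why the original pattern kept running into the <br tag.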
Try this:
public static string[] Parse(string pattern, string groupName, string input)
{
var list = new List<string>();
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
for (var match = regex.Match(input); match.Success; match = match.NextMatch())
{
list.Add(string.IsNullOrWhiteSpace(groupName) ? match.Value : match.Groups[groupName].Value);
}
return list.ToArray();
}
public static string[] ParseUri(string input)
{
const string pattern = @"(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*";
return Parse(pattern, string.Empty, input);
}

Regular Expression to Match Exact Word - Search String Highlight

I'm using the following two methods to highlight the search keywords. They work, but they also match partial words.
For Example:
Text: "This is .net Programming"
Search Key Word: "is"
It highlights the "is" inside "This" as well as the word "is".
Please let me know the correct regular expression to highlight the correct match.
private string HighlightSearchKeyWords(string searchKeyWord, string text)
{
Regex exp = new Regex(@", ?");
searchKeyWord = "(\b" + exp.Replace(searchKeyWord, @"|") + "\b)";
exp = new Regex(searchKeyWord, RegexOptions.Singleline | RegexOptions.IgnoreCase);
return exp.Replace(text, new MatchEvaluator(MatchEval));
}
private string MatchEval(Match match)
{
if (match.Groups[1].Success)
{
return "<span class='search-highlight'>" + match.ToString() + "</span>";
}
return ""; //no match
}
You really just need @ before your "(\b" and "\b)", because in a regular C# string "\b" is the backspace character, not the regex word boundary \b you would expect. But I have also tried making another version with a replacement pattern instead of a full-blown method.
How about this one:
private string keywordPattern(string searchKeyword)
{
var keywords = searchKeyword.Split(',').Select(k => k.Trim()).Where(k => k != "").Select(k => Regex.Escape(k));
return @"\b(" + string.Join("|", keywords) + @")\b";
}
private string HighlightSearchKeyWords(string searchKeyword, string text)
{
var pattern = keywordPattern(searchKeyword);
Regex exp = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
return exp.Replace(text, @"<span class=""search-highlight"">$0</span>");
}
Usage:
var res = HighlightSearchKeyWords("is,this", "Is this programming? This is .net Programming.");
Result:
<span class="search-highlight">Is</span> <span class="search-highlight">this</span> programming? <span class="search-highlight">This</span> <span class="search-highlight">is</span> .net Programming.
Updated to use \b and a simplified replace pattern. (The old one used (^|\s) instead of the first \b and ($|\s) instead of the last \b, so it would also work on search terms that include more than word characters.)
Updated to your comma notation for search terms.
Updated: I forgot Regex.Escape, added now. Otherwise searches for "\w" would blow up the thing :)
Updated due to a comment ;)
Try this fixed line:
searchKeyWord = @"(\b" + exp.Replace(searchKeyWord, @"|") + @"\b)";
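The difference between the two string forms is easy to check. In the sketch below (WordBoundaryDemo and both method names are made up for illustration), the regular-string version builds a pattern containing two literal backspace characters, so it can only match text that actually contains them.

```csharp
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // "\b" in a regular C# string is U+0008 (backspace), so the regex engine
    // looks for a literal backspace on each side of the word.
    public static bool RegularStringMatches(string text, string word)
    {
        return Regex.IsMatch(text, "\b" + Regex.Escape(word) + "\b");
    }

    // @"\b" keeps the backslash, so the regex engine sees a word boundary.
    public static bool VerbatimStringMatches(string text, string word)
    {
        return Regex.IsMatch(text, @"\b" + Regex.Escape(word) + @"\b");
    }
}
```

For the text "This is .net Programming" and the word "is", the verbatim version matches the standalone word only, and the regular-string version matches nothing.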
You need to enclose the keywords in a non-capturing group, otherwise you will get false positives (if you are using multiple keywords separated by commas, as indicated in the sample)!
private string EscapeKeyWords(string searchKeyWord)
{
string[] keyWords = searchKeyWord.Split(',');
for (int i = 0; i < keyWords.Length; i++) keyWords[i] = Regex.Escape(keyWords[i].Trim());
return String.Join("|", keyWords);
}
private string HighlightSearchKeyWords(string searchKeyWord, string text)
{
searchKeyWord = @"(\b(?:" + EscapeKeyWords(searchKeyWord) + @")\b)";
Regex exp = new Regex(searchKeyWord, RegexOptions.Singleline | RegexOptions.IgnoreCase);
return exp.Replace(text, @"<span class=""search-highlight"">$0</span>");
}
